Creating a splice annotation from a GTF file

Dana Wyman edited this page Jan 6, 2020 · 3 revisions

TranscriptClean was originally designed to correct noncanonical splice junctions using high-confidence splice junctions derived from mapping short reads to the genome with STAR. If you would rather use splice junctions from a GTF-formatted transcript annotation such as GENCODE, our accessory script, get_SJs_from_gtf.py, converts the GTF to the splice junction file format that TranscriptClean requires. Run it with this command:

python accessory_scripts/get_SJs_from_gtf.py --f /path/to/annotation.gtf \
                                             --g /path/to/reference_genome.fa \
                                             --o spliceJns.txt
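
To illustrate what such a conversion involves, here is a minimal sketch of deriving intron coordinates from the exons of a single transcript. It is not the code in get_SJs_from_gtf.py; the function name `introns_from_exons` and its input format are assumptions made for this example. It relies only on the fact that GTF exon coordinates are 1-based and inclusive.

```python
def introns_from_exons(exons):
    """Return 1-based, inclusive (start, end) intron intervals.

    `exons` is a list of (start, end) tuples for ONE transcript, in
    GTF-style 1-based inclusive coordinates. Order does not matter.
    This is a hypothetical helper, not part of TranscriptClean.
    """
    ordered = sorted(exons)
    introns = []
    # Pair each exon with the next one along the chromosome.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # The intron is the gap between the two exons; skip pairs
        # that are adjacent or overlapping (no gap).
        if next_start - prev_end > 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


if __name__ == "__main__":
    # Three exons give two introns.
    print(introns_from_exons([(100, 200), (301, 400), (501, 600)]))
```

The first and last intron bases computed this way correspond to columns 2 and 3 of the output file.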

The output file follows the STAR SJ.out.tab format, which is described in detail in section 4.4 of the STAR manual.
Columns:

  1. Chromosome
  2. First base of the intron (1-based)
  3. Last base of the intron (1-based)
  4. Strand (0: undefined, 1: +, 2: -)
  5. Intron motif code
    0: non-canonical
    1: GT/AG
    2: CT/AC
    3: GC/AG
    4: CT/GC
    5: AT/AC
    6: GT/AT
  6. Annotation status (0: unannotated, 1: annotated). This script assigns '1' to all junctions.
  7-9. Set to "NA" by this script. In STAR's own output, these columns hold the read counts and maximum overhang for the junction.
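
As a quick way to inspect the result, the following sketch decodes one line of the splice junction file into named fields using the column definitions above. The function name `parse_sj_line` and the dictionary keys are assumptions made for this example, and it assumes the file is tab-delimited, as STAR's SJ.out.tab is.

```python
# Lookup tables taken from the column descriptions above.
STRANDS = {"0": "undefined", "1": "+", "2": "-"}
MOTIFS = {
    "0": "non-canonical",
    "1": "GT/AG",
    "2": "CT/AC",
    "3": "GC/AG",
    "4": "CT/GC",
    "5": "AT/AC",
    "6": "GT/AT",
}


def parse_sj_line(line):
    """Decode one tab-delimited splice junction line into a dict.

    Hypothetical helper for illustration; only the first six
    columns are interpreted, since columns 7-9 are "NA" here.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated columns")
    chrom, start, end, strand, motif, annotated = fields[:6]
    return {
        "chrom": chrom,
        "intron_start": int(start),  # first base of the intron, 1-based
        "intron_end": int(end),      # last base of the intron, 1-based
        "strand": STRANDS[strand],
        "motif": MOTIFS[motif],
        "annotated": annotated == "1",
    }


if __name__ == "__main__":
    # Made-up example line, not real annotation data.
    example = "chr1\t12228\t12612\t1\t1\t1\tNA\tNA\tNA\n"
    print(parse_sj_line(example))
```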