--gene-key Error #3

Open
Rohit-Satyam opened this issue Oct 26, 2021 · 2 comments

Comments

@Rohit-Satyam

Hi !!

I was trying out the new feature in the updated version of PERF on one of my bacterial strains, but I am getting the following error:

PERF -i ../raw/Tenacibaculum_discolor_gca_003664185.fa --format fasta -a -g ../raw/Tenacibaculum_discolor_gca_003664185.ASM366418v1.49.gff3 --anno-format GFF --gene-key ID

ERROR:

Processing Ga0183463_112: 100%|█████████████████████████████████████████████████████████| 12/12 [00:04<00:00,  2.97it/s]

GeneKeyError:
The attribute "gene_id" is not among the attributes for gene. Please select a different one.
The available ones are [Parent, Name, constitutive, ensembl_end_phase, ensembl_phase, exon_id, rank]

My GFF file contains the following attributes in the last column, but changing the gene key to ID or any other attribute doesn't work:

ID=gene:C8N27_0080;biotype=protein_coding;description=cyclophilin family peptidyl-prolyl cis-trans isomerase;gene_id=C8N27_0080;logic_name=ena

When I use a GTF file, the error is:

Using length cutoff of 12
Processing Ga0183463_112: 100%|█████████████████████████████████████████████████████████| 14/14 [00:03<00:00,  3.66it/s]
Traceback (most recent call last):
  File "/home/rohit/miniconda3/bin/PERF", line 8, in <module>
    sys.exit(main())
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/core.py", line 162, in main
    ssr_native(args, length_cutoff=args.min_length)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/core.py", line 106, in ssr_native
    fasta_ssrs(args, repeats_info)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/rep_utils.py", line 253, in fasta_ssrs
    annotate(args)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 160, in annotate
    gffObject = process_annofile(anno_file, annotype, gene_id)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 112, in process_annofile
    attr_obj = process_attrs(attribute, annotype)
  File "/home/rohit/miniconda3/lib/python3.8/site-packages/PERF/annotation.py", line 66, in process_attrs
    attr_obj[attrName] = attr[1].strip()
IndexError: list index out of range

I am not sure what is used in the background to process GFF/GTF files, but I strongly recommend integrating PERF with AGAT, which is an excellent tool for processing and handling GTF/GFF files.
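
For reference, the `IndexError` at `attr[1].strip()` means that some attribute entry was split into fewer than two parts, which is consistent with an entry that has no key/value separator. Below is a minimal sketch of a more tolerant parser. `parse_attributes` is a hypothetical helper, not PERF's code, and it assumes simple attributes with no `;` inside quoted values.

```python
def parse_attributes(field, fmt="GFF"):
    """Parse the ninth column of a GFF3 or GTF line into a dict.

    Hypothetical helper for illustration; not PERF's implementation.
    GFF3 entries look like key=value, GTF entries like key "value".
    """
    sep = "=" if fmt.upper() == "GFF" else " "
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # a trailing ';' produces an empty entry
        key, found, value = item.partition(sep)
        if not found:
            continue  # no separator: skip rather than index past the end
        attrs[key.strip()] = value.strip().strip('"')
    return attrs
```

Using `str.partition` instead of indexing into the result of `split` means a malformed entry is skipped rather than raising an exception.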

@avvaruakshay
Collaborator

Hi,
Sorry you ran into this issue. I can see that you passed ID as the gene identifier. Can you please check whether any of the entries is missing an ID attribute? PERF uses an in-house script to parse GFF and GTF files, and it may be running into a problem with this file. Thank you for the suggestion to integrate AGAT with PERF. I'll certainly look into it.
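
A quick way to run that check is sketched below. It assumes a standard nine-column, tab-separated GFF3 file with `key=value` attributes; `features_missing_key` is a hypothetical helper written for this thread, not part of PERF.

```python
def features_missing_key(lines, key, feature="gene"):
    """Return (line_number, available_keys) for each `feature` row lacking `key`.

    Hypothetical helper; `lines` is any iterable of GFF3 lines,
    for example an open file handle.
    """
    missing = []
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue  # only inspect well-formed rows of the requested type
        keys = [item.split("=", 1)[0].strip()
                for item in cols[8].split(";") if item.strip()]
        if key not in keys:
            missing.append((line_no, keys))
    return missing
```

For a file on disk, something like `features_missing_key(open(path), "ID")` would list the gene rows without an `ID` attribute; for a gzipped file, `gzip.open(path, "rt")` can be passed instead.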

@avvaruakshay
Collaborator

avvaruakshay commented Oct 29, 2021

Hi,
Based on your input files, I downloaded the genome and GFF of "Tenacibaculum_discolor" from NCBI and ran PERF on them.

Command:

PERF -i GCF_003664185.1_ASM366418v1_genomic.fna.gz -g GCF_003664185.1_ASM366418v1_genomic.gff.gz --gene-key ID

Using length cutoff of 12
Processing NZ_RCCS01000003.1: 100%|██████████████████████| 12/12 [00:00<00:00, 18.09it/s]

Generating annotations for identified repeats..
100%|██████████████████████████████████| 2759/2759 [00:00<00:00, 32419.89it/s]

Output:

NZ_RCCS01000004.1	1021	1033	AAAATT	12	-	2	TTTAAT	gene-C8N27_RS00345427	1491	-	Genic	Promoter	-594
NZ_RCCS01000004.1	1452	1466	AACAC	14	-	2	TGTGT	gene-C8N27_RS00345427	1491	-	Genic	Promoter	-1025
NZ_RCCS01000004.1	2143	2155	AAAATG	12	-	2	CATTTT	gene-C8N27_RS003501668	4418	-	Genic	Promoter	-475
NZ_RCCS01000004.1	2301	2313	AAACG	12	-	2	TTCGT	gene-C8N27_RS003501668	4418	-	Genic	Promoter	-633

PERF does not seem to have hit any issue with these files. Can you please check your input file?
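
One way to check the input file is to list which attribute keys each feature type carries, then compare the file from the original report with the NCBI one. This is a minimal sketch that assumes a standard nine-column GFF3 file; `attribute_keys_by_feature` is a hypothetical helper, not part of PERF.

```python
from collections import defaultdict


def attribute_keys_by_feature(lines):
    """Map each feature type (column 3) to the sorted attribute keys seen for it.

    Hypothetical helper; `lines` is any iterable of GFF3 lines.
    """
    keys = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # skip comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # ignore blank or malformed rows
        for item in cols[8].split(";"):
            name = item.split("=", 1)[0].strip()
            if name:
                keys[cols[2]].add(name)
    return {feature: sorted(names) for feature, names in keys.items()}
```

If the keys reported for `gene` rows differ between the two files, or match another feature type's keys in the error message, that would narrow down where the parsing goes wrong.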
