Truncated proteins at genome end #32

Open
ilyavs opened this issue Mar 29, 2023 · 1 comment

@ilyavs
ilyavs commented Mar 29, 2023

Hi,
I am encountering cases where phanotate calls truncated proteins (missing a stop codon) at the genome end. In the most common case, there was another protein at the opposite end of the contig that was missing a start codon, so rotating the genome yielded a complete CDS.
However, after rotating the genome, I still see rare cases of truncated proteins at the genome end for which I can't find a logical continuation on the other side of the contig. Is this the intended behavior? Should I discard these proteins in post-processing?
Thanks,
Ilya.
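
One way to check whether a truncated protein at the genome end has a continuation on the other side is to keep reading its frame past the end of the sequence, as if the genome were circular. The sketch below is only an illustration and not phanotate's code: find_stop and STOPS are hypothetical names, it scans the forward strand only, and it assumes an uppercase DNA string and the standard genetic code stop codons.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons

def find_stop(seq, start, circular=False):
    """Return the 0-based exclusive end of the ORF that starts at `start`
    (forward strand), or None if no in-frame stop codon is found.

    With circular=True the frame may wrap once past the origin, so the
    returned end can exceed len(seq); take it modulo len(seq) to map it back.
    """
    n = len(seq)
    text = seq + seq if circular else seq
    limit = start + n if circular else n
    for i in range(start, limit - 2, 3):
        if text[i:i + 3] in STOPS:
            return i + 3
    return None  # truncated: the ORF runs off the end of the sequence


# A gene split across the origin: "ATGAAA" at the end, "AAATAA" at the start.
genome = "AAATAA" + "CCCCCC" + "ATGAAA"
print(find_stop(genome, 12))                 # None on a linear contig
print(find_stop(genome, 12, circular=True))  # 24, i.e. position 6 after wrapping
```

A call that only finds its stop codon after wrapping is a candidate for joining with the gene at the other end; one that finds no stop even then is the kind of case asked about here.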

@deprekate
Owner

It is standard for all gene callers to extend ORFs off the ends of the sequence, even without checking the other end for a possible connecting ORF. Prodigal, for example, marks such a gene with a chevron (>) in its location:

$ prodigal -i NC_006820.fna | grep CDS | tail -n5
CDS 194317..194925
CDS 194897..195151
CDS 195163..195336
CDS 195308..195595
CDS 195718..>196278

However, phanotate should also be adding chevrons to these locations to indicate that they extend off the ends, and currently it does not:

$ phanotate.py NC_006820.fna -f genbank | grep CDS | tail -n5
CDS 194317..194925
CDS 194897..195151
CDS 195163..195336
CDS 195305..195595
CDS 195718..196278

This is a bug that I will need to fix in the current main branch.
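
For reference, here is a minimal sketch of how such a location string could be built. It assumes 1-based inclusive coordinates and the GenBank/INSDC feature-table convention that < marks a feature continuing past its lower coordinate and > past its higher one. format_location and its parameters are hypothetical, not phanotate's API.

```python
def format_location(left, right, strand=1, open_left=False, open_right=False):
    """Return a GenBank-style location for 1-based inclusive coordinates.

    open_left / open_right mark that the feature continues past its lower /
    higher coordinate (for example, it runs off a contig end).
    """
    loc = f"{'<' if open_left else ''}{left}..{'>' if open_right else ''}{right}"
    return loc if strand >= 0 else f"complement({loc})"


# Forward-strand gene missing its stop codon at the contig end:
print(format_location(195718, 196278, open_right=True))    # 195718..>196278
# Reverse-strand gene whose stop codon would lie before base 1:
print(format_location(1, 300, strand=-1, open_left=True))  # complement(<1..300)
```

Taking explicit flags, rather than inferring partiality from whether a coordinate touches the contig end, avoids marking a complete gene that happens to start exactly at base 1.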

In version 2.0, I may add a -c/--circular command-line flag, so that genes extending off the ends are only included when that flag is given.
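
Until then, one possible post-processing step, relevant to the question of whether to discard these genes, is to drop any CDS whose location carries a chevron. This is a hypothetical sketch (complete_cds and keep_partial are not part of phanotate or prodigal). It parses lines like those in the grep output above and only works when the chevrons are present, which the comment above notes phanotate's main branch does not yet emit.

```python
import re

# Captures the location in lines such as "CDS 195718..>196278" or
# "     CDS             complement(<1..300)".
CDS_LINE = re.compile(r"^\s*CDS\s+(\S+)")

def complete_cds(lines, keep_partial=False):
    """Yield CDS locations, skipping partial ones ('<' or '>') unless keep_partial."""
    for line in lines:
        match = CDS_LINE.match(line)
        if not match:
            continue  # not a CDS feature line
        location = match.group(1)
        if keep_partial or not ("<" in location or ">" in location):
            yield location


prodigal_tail = [
    "CDS 194317..194925",
    "CDS 194897..195151",
    "CDS 195163..195336",
    "CDS 195308..195595",
    "CDS 195718..>196278",
]
print(list(complete_cds(prodigal_tail)))  # the last, partial CDS is dropped
```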
