Variant reported at wrong frequency #344

Open
insapathogenomics opened this issue Apr 23, 2024 · 3 comments

Comments

@insapathogenomics

Hello,

TB-Profiler v6.2.0 calls a variant in fbiC at 100% frequency, associated with resistance to delamanid and pretomanid, that is supported by only 2 or 3 reads.

1305494 Rv1173 fbiC frameshift_variant&stop_lost&splice_region_variant c.2565_*55delGGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGATGCGGCCCGTTGAGGAGCGGGGCAATCT 1.000 delamanid,pretomanid Assoc w R - Interim Confers DLM-PMD cross-resistance

For examples, see ENA isolates ERR2864287 and ERR3148548.

Thanks,
Miguel Pinto

[Image attached]

@jodyphelan
Owner

Hi @insapathogenomics,

Thanks for reporting this, I'll take a look.

@jodyphelan
Owner

jodyphelan commented Apr 24, 2024

Looking at ERR2864287, I think there might actually be a real deletion in this sample. In IGV, if you enable "Show soft-clipped bases", you should see a picture like this:

[Image: IGV view of ERR2864287 with soft-clipped bases shown]

You should be able to see that almost all of the reads either have the deletion in the middle or are clipped (the clipped bases show up as the highly coloured regions at the ends of the reads). Few reads fully span the deletion because the aligner will often reach a better alignment score by simply clipping the read than by introducing a large gap close to the edge of the read. The region seems to contain a tandem repeat (shown in the bottom track), with the repeat sequence being

GGCCTAGCCCCGGCGACGATGCCGGGTCGCGGGATGCGGCCCGTTGAGGAGCGGGGCAATCT

This is also the exact sequence which is deleted.
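To make the clipping argument concrete, here is a rough sketch of the score comparison for a single read. It assumes BWA-MEM's default scoring (match +1, gap open 6, gap extension 1 per base, clipping penalty 5) and a hypothetical 150 bp read; the thread does not say which aligner or settings produced these alignments, and the model ignores mismatches and the fact that, inside a tandem repeat, some bases past the breakpoint may still align.

```python
# Simplified score comparison: write a deletion as a gap, or clip the read?
# ASSUMPTION: BWA-MEM default scoring (-A 1, -O 6, -E 1, -L 5). The thread
# does not state the aligner or its settings.
MATCH = 1       # -A: score per matching base
GAP_OPEN = 6    # -O: gap open penalty
GAP_EXTEND = 1  # -E: a gap of length k costs GAP_OPEN + GAP_EXTEND * k
CLIP = 5        # -L: penalty for clipping one end of the read


def gapped_score(read_len: int, del_len: int) -> int:
    """Every read base aligns; the deletion is paid for as one gap."""
    return read_len * MATCH - (GAP_OPEN + GAP_EXTEND * del_len)


def clipped_score(read_len: int, bases_past_breakpoint: int) -> int:
    """The bases beyond the breakpoint are soft-clipped instead of aligned."""
    return (read_len - bases_past_breakpoint) * MATCH - CLIP


def preferred(read_len: int, del_len: int, bases_past_breakpoint: int) -> str:
    """Pick the higher-scoring option; ties go to clipping in this sketch."""
    gap = gapped_score(read_len, del_len)
    clip = clipped_score(read_len, bases_past_breakpoint)
    return "gap" if gap > clip else "clip"


# A 62 bp deletion in a 150 bp read, with the breakpoint d bases from the read end.
for d in (10, 40, 75):
    print(d, preferred(150, 62, d))  # 10 clip / 40 clip / 75 gap
```

Under these assumptions, the gapped alignment only scores strictly higher when more than 63 read bases lie on each side of the breakpoint; otherwise clipping the shorter side wins, which fits a pile-up made mostly of clipped reads.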

This is also backed up by an assembly of the data: aligning the two regions shows that one of them has the sequence deleted.
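The comment doesn't say which assembler or aligner was used. As a much simpler stand-in, the sketch below locates a single deletion between a reference segment and an assembled contig by walking their common prefix and checking that the rest of the contig matches the tail of the reference. The sequences are made-up toy examples, not the fbiC region, and `locate_single_deletion` is a hypothetical helper. Because a deletion of one tandem-repeat copy can be placed at any copy, prefix matching reports the rightmost equivalent position.

```python
def locate_single_deletion(ref: str, contig: str) -> tuple[int, str]:
    """Return (0-based start in ref, deleted sequence), assuming contig is
    ref with exactly one contiguous block removed (hypothetical helper)."""
    del_len = len(ref) - len(contig)
    if del_len <= 0:
        raise ValueError("contig must be shorter than the reference")
    # Walk the longest common prefix of the two sequences.
    p = 0
    while p < len(contig) and ref[p] == contig[p]:
        p += 1
    # Everything after the gap must match the tail of the reference.
    if ref[p + del_len:] != contig[p:]:
        raise ValueError("difference is not a single deletion")
    return p, ref[p:p + del_len]


# Toy sequences: a 7 bp repeat unit present twice in the reference, once in the contig.
LEFT, UNIT, RIGHT = "AACCTT", "GGCCTAG", "TTAACC"
ref = LEFT + UNIT + UNIT + RIGHT
contig = LEFT + UNIT + RIGHT
start, deleted = locate_single_deletion(ref, contig)
print(start, deleted, deleted == UNIT)  # 13 GGCCTAG True
```

As in the thread, the deleted sequence comes out as exactly one copy of the repeat unit.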

[Image: alignment of the two assembled regions]

Going back to the IGV screenshot, you can see that almost every read either spans the deletion or is clipped. Delly correctly identifies this, which is why the frequency in the result is 1.
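A toy calculation shows why the same pile-up can look like "2 or 3 reads" or "frequency 1" depending on how clipped reads are counted. With soft-clipped bases hidden, a clipped read looks like it matches the reference; counted as split-read support, it backs the deletion. The read counts below are hypothetical (not taken from ERR2864287), and the second estimate is a simplification in the spirit of split-read structural variant callers such as Delly, not Delly's actual genotyping model.

```python
from dataclasses import dataclass


@dataclass
class DeletionEvidence:
    """Read counts at a deletion site (hypothetical numbers)."""
    spanning_del: int  # reads whose alignment contains the full gap
    clipped: int       # reads soft-clipped at a deletion breakpoint
    spanning_ref: int  # reads aligned straight through the site, no gap or clip


def naive_af(ev: DeletionEvidence) -> float:
    """Only gapped reads count as ALT; clipped reads are treated as REF-like."""
    total = ev.spanning_del + ev.clipped + ev.spanning_ref
    return ev.spanning_del / total if total else 0.0


def split_read_af(ev: DeletionEvidence) -> float:
    """Clipped reads count as ALT support alongside the gapped reads."""
    alt = ev.spanning_del + ev.clipped
    total = alt + ev.spanning_ref
    return alt / total if total else 0.0


ev = DeletionEvidence(spanning_del=3, clipped=37, spanning_ref=0)
print(f"naive: {naive_af(ev):.3f}, split-read aware: {split_read_af(ev):.3f}")
# naive: 0.075, split-read aware: 1.000
```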

This is just a quick analysis, but it does seem to be a real variant at high frequency. The next question is whether it actually causes resistance. The IGV screenshot shows that it occurs right at the end of the gene, so most of the protein should be intact and might still function normally.

@jodyphelan
Owner

Update on this: I've looked into it a bit more, and because the deleted sequence is a tandem repeat with the stop codon inside the repeat, the gene will still produce the same protein sequence. So this should not give rise to any resistance.

Below, I've adjusted the aligned region so that the repeat is aligned to the left, and it still produces a stop codon at the same position (2571-2573).

[Images: adjusted alignment with the repeat aligned to the left, showing the stop codon at the same position]
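The same-protein argument can be checked on a toy sequence. In the sketch below (a made-up sequence, not fbiC), a repeat unit containing an in-frame TAG stop codon appears twice. Deleting the left copy removes an interval that overlaps the stop codon, which would explain a stop_lost-style annotation, yet it yields exactly the same sequence, and hence the same protein, as deleting the right copy that lies wholly after the stop. `equivalent_starts` uses the usual left/right shifting applied when normalising indels in repeats; the helper names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); product() yields codons in
# the order TTT, TTC, TTA, TTG, TCT, ... GGG, matching AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate from the first base, stopping at the first stop codon (not included)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete(seq: str, start: int, length: int) -> str:
    """Remove seq[start:start + length]."""
    return seq[:start] + seq[start + length:]


def equivalent_starts(seq: str, start: int, length: int) -> tuple[int, int]:
    """Leftmost and rightmost starts at which deleting `length` bases gives the same result."""
    left = start
    while left > 0 and seq[left - 1] == seq[left + length - 1]:
        left -= 1
    right = start
    while right + length < len(seq) and seq[right] == seq[right + length]:
        right += 1
    return left, right


# Toy gene: the repeat unit carries an in-frame TAG stop codon.
PREFIX = "ATGAAACCC"  # codons ATG AAA CCC -> M K P
UNIT = "GCATAGCT"     # in frame: GCA (A), then TAG (stop)
FLANK = "AATTCC"
ref = PREFIX + UNIT + UNIT + FLANK
stop_pos = len(PREFIX) + 3  # 0-based start of the TAG in the first copy

left, right = equivalent_starts(ref, len(PREFIX) + len(UNIT), len(UNIT))
print(left, right)                                                    # 9 17
print(left <= stop_pos < left + len(UNIT))                            # True: overlaps the stop
print(right <= stop_pos < right + len(UNIT))                          # False: lies after it
print(delete(ref, left, len(UNIT)) == delete(ref, right, len(UNIT)))  # True
print(translate(ref), translate(delete(ref, left, len(UNIT))))        # MKPA MKPA
```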
