Question: Is it possible to ignore certain characters? #15

nrminor · 2024-02-04T22:14:27Z

Hello,

I'm currently working on a tool that computes pairwise distances between large numbers of SARS-CoV-2 genome sequences. One of the issues with this kind of bioinformatics is that these sequences often contain many non-ATGC characters that represent an ambiguous base, e.g., "N" is a stand-in for any base. If my understanding is correct, the distance metrics in triple_accel would treat these characters as mismatches. Is there a way I could use triple_accel, as it's currently written, to ignore non-ATGC bases rather than counting them toward the total edit distance?

Thanks for your help and for the excellent crate!
--Nick
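For illustration, the behavior being asked about can be sketched as a plain Levenshtein dynamic program in which any non-ACGT byte acts as a wildcard. This is a hand-rolled sketch, not triple_accel's API; the function names `is_acgt`, `bases_match`, and `wildcard_levenshtein` are hypothetical, and it assumes that "ignore" means ambiguous bases never count as substitutions (indels still cost 1).

```rust
/// Hypothetical helper: true if `b` is one of the four unambiguous bases.
/// Assumption: every other byte (N, R, Y, '-', ...) is treated as ambiguous.
fn is_acgt(b: u8) -> bool {
    matches!(b.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T')
}

/// Two bases match if they are equal (case-insensitive) or if either is ambiguous.
fn bases_match(a: u8, b: u8) -> bool {
    !is_acgt(a) || !is_acgt(b) || a.eq_ignore_ascii_case(&b)
}

/// Hypothetical sketch: Levenshtein distance where ambiguous bases never
/// count as substitutions. Insertions and deletions still cost 1.
fn wildcard_levenshtein(a: &[u8], b: &[u8]) -> usize {
    // prev[j] holds the distance between a[..i-1] and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let sub_cost = if bases_match(a[i - 1], b[j - 1]) { 0 } else { 1 };
            curr[j] = (prev[j - 1] + sub_cost) // match or substitution
                .min(prev[j] + 1) // deletion from a
                .min(curr[j - 1] + 1); // insertion into a
        }
        // After the swap, `prev` holds the row just computed.
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

With this sketch, `wildcard_levenshtein(b"ACGT", b"ANGT")` is 0 while `wildcard_levenshtein(b"ACGT", b"AGGT")` is 1. One caveat of wildcard matching: the result is no longer a true metric, since A–N and N–C both cost 0 while A–C costs 1.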


Daniel-Liu-c0deb0t · 2024-02-05T01:34:05Z

Hey Nick! It is not currently possible to ignore certain bytes (at least without modifying the library code).

I would suggest using Block Aligner, my newer library. Note that unlike triple_accel, Block Aligner uses scoring (positive scores for matches, negative scores for mismatches and indels) instead of edit distance. It also supports custom substitution matrices, so you can ignore Ns. Finally, it is faster in many cases.
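To show how a substitution matrix lets Ns be ignored under a scoring scheme, here is a minimal hand-rolled Needleman-Wunsch global alignment score with a linear gap penalty. This is not Block Aligner's API; the constants `MATCH`, `MISMATCH`, `GAP` and the functions `substitution_score` and `global_score` are hypothetical, and the values (+1, -1, -1, with 0 for any pair involving an ambiguous base) are assumptions chosen for illustration.

```rust
// Hypothetical scoring parameters, for illustration only
// (not Block Aligner's defaults).
const MATCH: i32 = 1;
const MISMATCH: i32 = -1;
const GAP: i32 = -1;

/// Hypothetical substitution "matrix" as a function: any pair involving a
/// non-ACGT byte scores 0, so ambiguous bases neither help nor hurt.
fn substitution_score(a: u8, b: u8) -> i32 {
    let unambiguous = |x: u8| matches!(x.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T');
    if !unambiguous(a) || !unambiguous(b) {
        0
    } else if a.eq_ignore_ascii_case(&b) {
        MATCH
    } else {
        MISMATCH
    }
}

/// Hypothetical sketch: best global alignment score (Needleman-Wunsch,
/// linear gap penalty). Higher is better, unlike edit distance.
fn global_score(a: &[u8], b: &[u8]) -> i32 {
    // prev[j] holds the best score aligning a[..i-1] with b[..j].
    let mut prev: Vec<i32> = (0..=b.len()).map(|j| j as i32 * GAP).collect();
    let mut curr = vec![0i32; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i as i32 * GAP;
        for j in 1..=b.len() {
            curr[j] = (prev[j - 1] + substitution_score(a[i - 1], b[j - 1])) // align a[i-1] with b[j-1]
                .max(prev[j] + GAP) // gap in b
                .max(curr[j - 1] + GAP); // gap in a
        }
        // After the swap, `prev` holds the row just computed.
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

Under these assumed scores, `global_score(b"ACGT", b"ANGT")` is 3 (three matches plus a neutral N), whereas treating N as an ordinary mismatch would give 2.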

nrminor · 2024-02-05T16:32:51Z

Block Aligner looks great, Daniel! I will definitely give it a try in my tool. Thanks for the quick reply and for pointing me in that direction!
