I'm currently working on a tool that calculates pairwise distances between large numbers of SARS-CoV-2 genome sequences. One of the issues with this kind of bioinformatics is that these sequences often contain many non-ATGC characters that represent an ambiguous base, e.g., "N" is a stand-in for any base. If my understanding is correct, the distance metrics in triple_accel would treat these characters as mismatches. Is there a way I could use triple_accel, as it's currently written, to ignore non-ATGC bases rather than counting them toward the total edit distance?
Thanks for your help and for the excellent crate!
--Nick
Hey Nick! It is not currently possible to ignore certain bytes (at least without modifying the library code).
I would suggest using Block Aligner, my newer library. Note that unlike triple_accel, Block Aligner uses scoring (+ for matches, - for mismatches and indels) instead of edit distance. It also supports custom substitution matrices so you can ignore Ns. Finally, it is faster in many cases.
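To make the "ignore Ns" idea concrete, here is a minimal, std-only sketch of an edit distance in which any non-ATGC base costs nothing when substituted against another base. This is not the triple_accel or Block Aligner API. The names `is_acgt`, `sub_cost`, and `masked_edit_distance` are hypothetical, and the code is a plain scalar O(n·m) dynamic program, so it will be far slower than either SIMD library on full genomes. It also assumes that insertions and deletions still cost 1 even when the gapped base is ambiguous, which may or may not suit your use case.

```rust
/// True for an unambiguous nucleotide (A, C, G, T), case-insensitive.
fn is_acgt(base: u8) -> bool {
    matches!(base.to_ascii_uppercase(), b'A' | b'C' | b'G' | b'T')
}

/// Substitution cost (hypothetical helper): 0 when the bases match or when
/// either base is ambiguous (e.g. N), 1 for a real mismatch.
fn sub_cost(a: u8, b: u8) -> u32 {
    if !is_acgt(a) || !is_acgt(b) || a.eq_ignore_ascii_case(&b) {
        0
    } else {
        1
    }
}

/// Levenshtein-style distance where ambiguous bases never count as
/// mismatches. Gaps always cost 1 (an assumption of this sketch).
fn masked_edit_distance(a: &[u8], b: &[u8]) -> u32 {
    // Two-row dynamic program: `prev` is row i, `curr` is row i + 1.
    let mut prev: Vec<u32> = (0..=b.len() as u32).collect();
    let mut curr = vec![0u32; b.len() + 1];

    for (i, &x) in a.iter().enumerate() {
        curr[0] = i as u32 + 1;
        for (j, &y) in b.iter().enumerate() {
            let substitute = prev[j] + sub_cost(x, y);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr[j + 1] = substitute.min(delete).min(insert);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

The zero-cost entries in `sub_cost` play the same role that zeroed N entries would play in a custom substitution matrix, except that here lower is better (a distance) rather than higher (a score).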