ENH: Add Nextclade for phylogenetic informed QC #55

Open
corneliusroemer opened this issue Oct 10, 2023 · 1 comment

Comments

@corneliusroemer

Is your feature request related to a problem? Please describe.
This looks like a great pipeline that includes a lot of useful tools. I've noticed that you're still adding things to it.

One thing that could make it even better is integrating Nextclade (disclosure: I work on it).

Besides providing an alternative way to call Pango lineages for SARS-CoV-2, Nextclade generalizes clade calling beyond SARS-CoV-2, e.g. to influenza and mpox.

It also outputs helpful QC metrics for the final consensus genome:

  • Reversion mutations relative to the nearest neighbor on the reference tree (many of these indicate artefacts due to pipeline misconfiguration, contamination, or recombination)
  • Frameshifts -> often a sign that the pipeline is making off-by-one indel errors
  • Stop codons -> indicative of artefacts if in essential genes
  • Ambiguous mutations -> can indicate contamination
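
As a rough illustration of how a pipeline could consume these metrics, here is a minimal Python sketch that reads a Nextclade TSV and lists the QC rules each sample fails. The column names (`seqName`, `qc.privateMutations.status`, `qc.frameShifts.status`, `qc.stopCodons.status`, `qc.mixedSites.status`) and the `good` status value are assumptions based on Nextclade's TSV output and should be checked against the version in use. Mapping reversions to the private-mutations rule and ambiguous mutations to the mixed-sites rule is also an assumption. The function name `failed_qc_rules` and the sample rows are made up for the example.

```python
import csv
import io

# Assumed Nextclade TSV column names; verify against your Nextclade version.
STATUS_COLUMNS = {
    "private_mutations": "qc.privateMutations.status",  # assumed to cover reversions
    "frameshifts": "qc.frameShifts.status",
    "stop_codons": "qc.stopCodons.status",
    "mixed_sites": "qc.mixedSites.status",  # assumed to cover ambiguous mutations
}


def failed_qc_rules(tsv_text):
    """Map each sequence name to the QC rules whose status is not 'good'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    report = {}
    for row in reader:
        # An empty or missing status is treated as "rule not evaluated".
        failed = [
            rule
            for rule, column in STATUS_COLUMNS.items()
            if row.get(column, "") not in ("good", "")
        ]
        report[row["seqName"]] = failed
    return report


# Hypothetical rows, not real Nextclade output.
header = ["seqName"] + list(STATUS_COLUMNS.values())
rows = [
    ["sample_A", "good", "good", "good", "good"],
    ["sample_B", "bad", "good", "good", "mediocre"],
]
example_tsv = "\n".join("\t".join(fields) for fields in [header] + rows)

print(failed_qc_rules(example_tsv))
```

A report like this could feed a per-sample flag in a dashboard, with the thresholds themselves left to Nextclade's own QC configuration.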

Nextclade also places a sequence on a phylogenetic tree, which should be appreciated by end users.

Describe the solution you'd like
Integrate Nextclade into the pipeline.

Additional context
I'd be happy to assist with development of this feature.

@priesgo
Member

priesgo commented Oct 10, 2023

Thanks for your feedback @corneliusroemer, it is a nice compliment coming from someone working on Nextclade!

The QC metrics are very interesting; we will need to understand them better and think about how to handle them in the pipeline and the dashboard. Also, we have had some performance issues with pangolin when processing a massive number of samples. Do you have any comparison of the two tools in terms of computational performance?
