Extracting gene mutations using matUtils extract and generate subsequent tree #367

Open
Biophylo2001 opened this issue Feb 28, 2024 · 1 comment


Biophylo2001 commented Feb 28, 2024

Hi, is there any way to extract all mutations in a particular gene (say, Spike) for all samples and then build a phylogenetic tree from them?
Something like this?
matUtils extract -i my_pb_file.pb -m S_gene_mutation -t tree_with_S_gene_mutations.nwk

Or could the mutation-annotated JSONL tree file be used to filter mutations in a specific gene?

Thank you,

@AngieHinrichs
Contributor

Hi @Sanyukta2001, I don't think we have a single command to do that, but it could be done in three steps: extract a VCF from your protobuf tree file, filter the VCF to keep only the Spike mutations, and run usher-sampled on the filtered VCF to build a new tree. It would go something like this, assuming your tree's reference sequence is NC_045512.2 (Wuhan/Hu-1), so that Spike spans coordinates 21563-25384:

matUtils extract -i my_pb_file.pb -v my_pb_mutations.vcf

# Filter VCF to only mutations within the Spike gene coordinates
grep '^#' my_pb_mutations.vcf > my_pb_mutations.filtered.vcf
grep -v '^#' my_pb_mutations.vcf | awk '$2 >= 21563 && $2 <= 25384' >> my_pb_mutations.filtered.vcf

# Use filtered VCF to build a new tree
echo '()' > emptyTree.nwk
usher-sampled -t emptyTree.nwk -v my_pb_mutations.filtered.vcf -o my_pb_file.spikeOnly.pb
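
Before building the tree, it may be worth checking that the filter keeps what you expect. The sketch below runs a one-pass awk version of the same filter on a tiny VCF and counts what survives. The VCF records, IDs, and the sample name are made up purely for illustration; only the Spike coordinates (21563-25384) come from the steps above.

```shell
# Build a tiny, made-up VCF (illustrative records only, not real data).
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\n' >> "$vcf"
printf 'NC_045512.2\t241\tm1\tC\tT\t.\t.\t.\tGT\t1\n' >> "$vcf"      # before Spike
printf 'NC_045512.2\t21563\tm2\tA\tG\t.\t.\t.\tGT\t1\n' >> "$vcf"    # first Spike position
printf 'NC_045512.2\t23403\tm3\tA\tG\t.\t.\t.\tGT\t1\n' >> "$vcf"    # inside Spike
printf 'NC_045512.2\t25384\tm4\tA\tG\t.\t.\t.\tGT\t1\n' >> "$vcf"    # last Spike position
printf 'NC_045512.2\t28881\tm5\tG\tA\t.\t.\t.\tGT\t1\n' >> "$vcf"    # after Spike

# One-pass filter: keep every header line, plus records whose POS (column 2)
# falls inside the Spike coordinates (inclusive on both ends).
filtered=$(mktemp)
awk -F'\t' -v start=21563 -v end=25384 \
    '/^#/ || ($2 >= start && $2 <= end)' "$vcf" > "$filtered"

# Count what survived the filter.
kept=$(grep -vc '^#' "$filtered")
headers=$(grep -c '^#' "$filtered")
echo "kept $kept records, $headers header lines"
# prints: kept 3 records, 2 header lines
```

On your own data you would replace "$vcf" with my_pb_mutations.vcf; if the record count comes out as zero, the reference coordinates probably don't match your tree's reference.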

If your tree is large (say, more than 10,000 samples), it might be faster to add the usher-sampled option --optimization_radius 0, which prevents usher-sampled from interleaving rounds of optimization with adding samples from the VCF, and then to run matOptimize once after usher-sampled finishes.
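
For the large-tree case, the two steps could be wrapped up roughly like this. This is only a sketch: the function name and the output naming scheme are hypothetical, and the matOptimize flags (-i for the input tree, -o for the output tree) are my assumption about its interface, so please check `matOptimize --help` for your installed version.

```shell
# Sketch of the large-tree variant: place all samples without interleaved
# optimization, then optimize once at the end.
# Assumes usher-sampled and matOptimize are on PATH.
build_spike_tree_large() {
    vcf=$1        # filtered, Spike-only VCF
    prefix=$2     # output name prefix (hypothetical naming scheme)

    # Start from an empty tree, as in the steps above.
    echo '()' > "$prefix.emptyTree.nwk"

    # --optimization_radius 0 skips the optimization rounds during placement.
    usher-sampled -t "$prefix.emptyTree.nwk" -v "$vcf" \
        --optimization_radius 0 -o "$prefix.unoptimized.pb" || return 1

    # Assumed matOptimize flags: -i input tree, -o optimized output tree.
    matOptimize -i "$prefix.unoptimized.pb" -o "$prefix.spikeOnly.pb"
}
```

It would be called as `build_spike_tree_large my_pb_mutations.filtered.vcf my_pb_file`, which should leave the optimized tree in my_pb_file.spikeOnly.pb.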
