evolbioinfo/zika_Vietnam

ML analysis of Zika data

This folder contains Snakemake [Köster et al., 2012] pipelines for reconstructing the evolutionary history of Zika virus.

The pipeline steps are detailed below.

Pipeline

0. Input data

The input data are located in the data folder and contain (1) Vietnamese sequences in the file Vietnam.fa and (2) GenBank sequences in the file genbank_20200811_org_Zika_virus_len_8000_14000.fa, which were downloaded from GenBank [Benson et al. 2013] on 2020/08/11 with the keywords: organism “Zika virus”, and sequence length between 8000 and 14000 (full genome).

1. Metadata and MSA

Sampling dates and countries

The input GenBank sequences were annotated with the collection_date and country using Entrez [NCBI Resource Coordinators 2012].
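As a rough illustration of what this annotation step extracts, the sketch below pulls the collection_date and country qualifiers out of the text of a GenBank flat-file record. It is not the pipeline's own code: the function name `extract_qualifiers` and the record fragment are made up for the example, and it assumes each qualifier value fits on one line.

```python
import re

def extract_qualifiers(genbank_text, keys=("collection_date", "country")):
    """Return {key: value or None} for the requested feature qualifiers."""
    found = {}
    for key in keys:
        # Qualifiers in a GenBank flat file look like: /country="Viet Nam"
        match = re.search(r'/%s="([^"]*)"' % re.escape(key), genbank_text)
        found[key] = match.group(1) if match else None
    return found

# Made-up record fragment, for illustration only.
record = '''
     source          1..10000
                     /organism="Zika virus"
                     /country="Viet Nam"
                     /collection_date="2016"
'''

print(extract_qualifiers(record))
```

Records that lack one of the qualifiers yield `None` for it, so they can be flagged or excluded before dating and phylogeography.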

Types

The sequences were typed (African vs Asian) with Genome Detective [Vilsker et al. 2019], and those with type support < 100 were removed.
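The filtering rule can be sketched as follows. The row layout (identifier, type, support) and the name `keep_confident` are assumptions made for this example, not the Genome Detective output format.

```python
def keep_confident(typing_rows, min_support=100.0):
    """Keep only sequences whose type support reaches min_support.

    typing_rows: iterable of (sequence_id, type_name, support) tuples
                 (a hypothetical layout chosen for this sketch).
    """
    kept = {}
    for seq_id, seq_type, support in typing_rows:
        if support >= min_support:  # support < 100 is removed
            kept[seq_id] = seq_type
    return kept

# Toy values, for illustration only.
rows = [
    ("seq_A", "Asian", 100.0),
    ("seq_B", "African", 100.0),
    ("seq_C", "Asian", 87.0),
]
print(keep_confident(rows))
```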

MSA

The sequences were aligned against the reference [Theys et al. 2017] (which was then removed from the alignment) with MAFFT [Katoh and Standley 2013].
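Removing the reference after alignment amounts to dropping one record from the aligned FASTA. A minimal sketch, assuming a plain FASTA alignment and a hypothetical reference identifier `ref` (the helper names are also invented for this example):

```python
def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # id is the first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def drop_reference(alignment, reference_id):
    """Return the alignment without the reference sequence."""
    return {n: s for n, s in alignment.items() if n != reference_id}

# Toy alignment, for illustration only.
aligned = ">ref\nATG-CC\n>s1\nATGACC\n>s2\nATG-CT\n"
msa = drop_reference(read_fasta(aligned), "ref")
print(msa)
```

The remaining sequences keep the coordinate system imposed by the reference, so all rows still have the same length.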

DIY

The metadata extraction, sequence combining and alignment pipeline Snakefile_combined_MSA is available in the snakemake folder and can be rerun as (from the snakemake folder):

snakemake --snakefile Snakefile_combined_MSA --keep-going --use-singularity --singularity-args "--home ~"

MSA pipeline

2. Phylogeny reconstruction

We reconstructed a maximum likelihood tree from the DNA sequences, partitioning the alignment into two groups: codon positions 1-2, and codon position 3. The tree reconstruction was performed with 2 ML tools that support partitioning (GTRGAMMA+G6+I): RAxML-NG [Stamatakis, 2014] and IQ-TREE 2 [Minh et al., 2020]. This produced 2 trees with different topologies, and the better tree (in terms of likelihood) was then selected.
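To make the two ideas in this step concrete, the sketch below splits alignment columns into the two position groups and picks the tree with the higher log-likelihood. It assumes that the positions are codon positions and that the alignment starts at codon position 1. In practice the ML tools take a partition definition rather than split alignments, so this only illustrates the grouping. The function names and the log-likelihood values are made up.

```python
def split_codon_positions(alignment):
    """Split each sequence into (positions 1-2, position 3) column groups.

    Assumes column 0 of the alignment is codon position 1.
    """
    pos12, pos3 = {}, {}
    for name, seq in alignment.items():
        pos12[name] = "".join(c for i, c in enumerate(seq) if i % 3 != 2)
        pos3[name] = seq[2::3]
    return pos12, pos3

def pick_best_tree(log_likelihoods):
    """Return the label of the tree with the highest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)

pos12, pos3 = split_codon_positions({"s1": "ATGACC"})
print(pos12, pos3)

# Made-up log-likelihoods, for illustration only.
best = pick_best_tree({"raxml-ng": -1234.5, "iqtree2": -1230.1})
print(best)
```

Log-likelihoods are only comparable when both trees are evaluated on the same alignment under the same model and partitioning.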

The non-informative branches (<= 1/2 mutation) were then collapsed, and the tree was rooted with the African outgroup, which was then removed.
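One way to read the collapsing criterion: a branch whose length (in substitutions per site) multiplied by the alignment length is at most 0.5 carries less than one expected mutation. The sketch below applies that rule to internal branches of a tree stored as nested dictionaries. The data structure, the function name, and the choice to add the collapsed length to the children (which preserves root-to-tip distances) are assumptions of this example, not necessarily what the pipeline does.

```python
def collapse_short_branches(node, aln_len, max_mutations=0.5):
    """Return a copy of the tree with short internal branches collapsed.

    node: {"name": str, "length": float, "children": [...]} (root has no length).
    A child branch is collapsed when length * aln_len <= max_mutations.
    """
    new_children = []
    for child in node.get("children", []):
        child = collapse_short_branches(child, aln_len, max_mutations)
        is_internal = bool(child.get("children"))
        if is_internal and child["length"] * aln_len <= max_mutations:
            # Attach grandchildren directly to this node (creates a polytomy).
            for grandchild in child["children"]:
                grandchild["length"] += child["length"]
                new_children.append(grandchild)
        else:
            new_children.append(child)
    result = dict(node)  # copy, so the input tree is left untouched
    if new_children:
        result["children"] = new_children
    return result

# Toy tree, for illustration only: ((B,C)X,A)root
tree = {"name": "root", "children": [
    {"name": "A", "length": 0.001},
    {"name": "X", "length": 0.00004, "children": [
        {"name": "B", "length": 0.002},
        {"name": "C", "length": 0.003},
    ]},
]}
collapsed = collapse_short_branches(tree, aln_len=10000)
print([c["name"] for c in collapsed["children"]])
```

With an alignment of 10000 columns, branch X corresponds to 0.4 expected mutations and is therefore collapsed, leaving A, B and C attached to the root.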

DIY

The phylogeny reconstruction pipeline Snakefile_phylogeny is available in the snakemake folder and can be rerun as (from the snakemake folder):

snakemake --snakefile Snakefile_phylogeny --keep-going --use-singularity --singularity-args "--home ~"

phylogeny reconstruction pipeline

3. Dating and Phylogeography

The phylogeny was dated with LSD 2 [To et al., 2015] (with temporal outlier removal). For comparison, the phylogeny was also dated with TreeTime [Sagulenko et al., 2018]. We then reconstructed ancestral characters for country using PastML [Ishikawa et al., 2018], on the full dated tree and subsampled trees (to assess the robustness of the phylogeographic predictions).
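The subsampling scheme is not specified here. As one possible illustration, the sketch below caps the number of tips kept per country, drawing at random with a fixed seed so that each subsample is reproducible. The cap, the seed handling, and the function name are all assumptions of this example.

```python
import random
from collections import defaultdict

def subsample_tips(tip_to_country, max_per_country, seed=0):
    """Keep at most max_per_country randomly chosen tips for each country."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    by_country = defaultdict(list)
    for tip, country in sorted(tip_to_country.items()):
        by_country[country].append(tip)
    kept = []
    for country in sorted(by_country):
        tips = by_country[country]
        if len(tips) > max_per_country:
            tips = rng.sample(tips, max_per_country)
        kept.extend(sorted(tips))
    return kept

# Toy tip annotations, for illustration only.
tips = {"t1": "Brazil", "t2": "Brazil", "t3": "Brazil", "t4": "Vietnam"}
kept = subsample_tips(tips, max_per_country=2, seed=1)
print(kept)
```

Repeating the ancestral character reconstruction on trees pruned to several such subsamples (different seeds) shows whether the inferred locations depend on heavily sampled countries.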

DIY

To perform tree dating, from the snakemake folder, run the Snakefile_dating pipeline:

snakemake --snakefile Snakefile_dating --keep-going --use-singularity --singularity-args "--home ~"

Dating pipeline

To perform phylogeographic analysis, from the snakemake folder, run the Snakefile_phylogeography pipeline:

snakemake --snakefile Snakefile_phylogeography --keep-going --use-singularity --singularity-args "--home ~"

Phylogeographic pipeline