ML analysis of Zika data

This folder contains Snakemake [Köster et al., 2012] pipelines for reconstructing the evolutionary history of Zika virus.

The pipeline steps are detailed below.

Pipeline

0. Input data

The input data are located in the data folder and comprise (1) Vietnamese sequences in the file Vietnam.fa and (2) GenBank sequences in the file genbank_20200811_org_Zika_virus_len_8000_14000.fa, which were downloaded from GenBank [Benson et al. 2013] on 2020/08/11 using the filters organism “Zika virus” and sequence length between 8000 and 14000 (full genomes).

1. Metadata and MSA

Sampling dates and countries

The input GenBank sequences were annotated with the collection_date and country using Entrez [NCBI Resource Coordinators 2012].

Types

The sequences were typed (African vs Asian) with Genome Detective [Vilsker et al. 2019], and those with a type support < 100 were removed.
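The support filter described above can be sketched as follows. This is only an illustration, not the code used in the pipeline: the field names `name`, `type` and `support` are hypothetical and do not reflect the actual Genome Detective output format.

```python
def keep_fully_supported(typing_rows, min_support=100.0):
    """Return {sequence name: type} for rows whose support reaches min_support.

    typing_rows is an iterable of dicts with the (hypothetical) keys
    "name", "type" and "support".
    """
    kept = {}
    for row in typing_rows:
        # Sequences with support below the threshold are dropped.
        if float(row["support"]) >= min_support:
            kept[row["name"]] = row["type"]
    return kept


if __name__ == "__main__":
    rows = [
        {"name": "seq1", "type": "Asian", "support": "100"},
        {"name": "seq2", "type": "African", "support": "97.5"},
    ]
    print(keep_fully_supported(rows))
```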

MSA

The sequences were aligned against the reference [Theys et al. 2017] (which was then removed from the alignment) with MAFFT [Katoh and Standley 2013].

DIY

The metadata extraction, sequence combining and alignment pipeline Snakefile_combined_MSA is available in the snakemake folder and can be rerun as follows (from the snakemake folder):

snakemake --snakefile Snakefile_combined_MSA --keep-going --use-singularity --singularity-args "--home ~"

2. Phylogeny reconstruction

We reconstructed a maximum likelihood tree from the DNA sequences, partitioning the alignment into two groups: codon positions 1-2 and codon position 3. The tree was reconstructed with two ML tools that support partitioning (GTRGAMMA+G6+I): RAxML-NG [Stamatakis, 2014] and IQ-TREE 2 [Minh et al., 2020]. This produced two trees with different topologies, and the one with the higher likelihood was selected.
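As a rough illustration of the two-group partitioning and of the tree selection step, the sketch below writes RAxML-style partition lines using the `\3` stride notation and picks the tree with the highest log-likelihood. It rests on several assumptions: that the coding region starts at alignment column `start` (column 1 by default), that the model string can be passed through unchanged (the default here is only a placeholder), and that the log-likelihoods have already been read from the tool logs. The function names are hypothetical and are not taken from Snakefile_phylogeny.

```python
def codon_partition_lines(alignment_length, model="GTR+G6+I", start=1):
    """Build two RAxML-style partition lines: codon positions 1-2 and 3.

    Assumes codon position 1 falls on alignment column `start`.
    """
    end = alignment_length
    return [
        f"{model}, codon12 = {start}-{end}\\3, {start + 1}-{end}\\3",
        f"{model}, codon3 = {start + 2}-{end}\\3",
    ]


def pick_best_tree(log_likelihoods):
    """Return the key (e.g. tool name) with the highest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


if __name__ == "__main__":
    for line in codon_partition_lines(10000):
        print(line)
    # Illustrative values only, not results from the analysis.
    print(pick_best_tree({"raxml-ng": -100.5, "iqtree2": -99.0}))
```

Comparing likelihoods across tools is only meaningful when both trees are evaluated under the same model and partition scheme.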

Non-informative branches (length <= 1/2 mutation) were then collapsed, and the tree was rooted with the African outgroup, which was then removed.
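The collapsing criterion can be made concrete with a small sketch. It assumes that branch lengths are expressed in substitutions per site, so that half a mutation corresponds to a length of 0.5 divided by the alignment length. It also assumes that only internal branches are collapsed and that the collapsed length is added to the child branches to preserve root-to-tip distances. The nested-dict tree representation and the function names are hypothetical, not the pipeline's actual implementation.

```python
def collapse_threshold(alignment_length, max_mutations=0.5):
    """Branch length (substitutions per site) equal to max_mutations mutations."""
    return max_mutations / alignment_length


def collapse_short_branches(node, threshold):
    """Collapse internal branches with length <= threshold, in place.

    A node is a dict: {"name": str or None, "length": float, "children": list}.
    The children of a collapsed node are attached to its parent.
    """
    new_children = []
    for child in node.get("children", []):
        collapse_short_branches(child, threshold)
        if child.get("children") and child["length"] <= threshold:
            # Non-informative internal branch: promote the grandchildren.
            for grandchild in child["children"]:
                grandchild["length"] += child["length"]
                new_children.append(grandchild)
        else:
            new_children.append(child)
    node["children"] = new_children
    return node


if __name__ == "__main__":
    tree = {"name": "root", "length": 0.0, "children": [
        {"name": "A", "length": 0.01, "children": []},
        {"name": None, "length": 0.00001, "children": [
            {"name": "B", "length": 0.02, "children": []},
            {"name": "C", "length": 0.03, "children": []},
        ]},
    ]}
    collapse_short_branches(tree, collapse_threshold(10000))
    print([child["name"] for child in tree["children"]])
```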

DIY

The phylogeny reconstruction pipeline Snakefile_phylogeny is available in the snakemake folder and can be rerun as follows (from the snakemake folder):

snakemake --snakefile Snakefile_phylogeny --keep-going --use-singularity --singularity-args "--home ~"

3. Dating and Phylogeography

The phylogeny was dated with LSD 2 [To et al., 2015] (with temporal outlier removal). For comparison, the phylogeny was also dated with TreeTime [Sagulenko et al., 2018]. We then reconstructed ancestral characters for country using PastML [Ishikawa et al., 2018], on the full dated tree and subsampled trees (to assess the robustness of the phylogeographic predictions).

DIY

To perform tree dating, from the snakemake folder, run the Snakefile_dating pipeline:

snakemake --snakefile Snakefile_dating --keep-going --use-singularity --singularity-args "--home ~"

To perform phylogeographic analysis, from the snakemake folder, run the Snakefile_phylogeography pipeline:

snakemake --snakefile Snakefile_phylogeography --keep-going --use-singularity --singularity-args "--home ~"
