Cancer phylogenetics using single-cell RNA-seq data

This repository contains code to fully replicate the analysis in "Cancer phylogenetics using single-cell RNA-seq data" (Moravec et al. 2021). It can also be used to run a similar analysis on a new dataset.

Note that the analysis assumes relatively uniform cell populations; otherwise, the discretization method based on the Highest Density Interval will not work.
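To give a feel for what a Highest Density Interval (HDI) discretization can look like, here is a minimal Python sketch. It is not the code used in this repository: the function names, the labels "L", "N" and "H", and the `mass` parameter are all hypothetical, and the real pipeline may define its intervals and categories differently.

```python
import math


def hdi(values, mass=0.9):
    """Return the narrowest interval (low, high) that contains `mass` of the values."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("hdi() needs at least one value")
    # Number of sorted points the interval has to cover.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k consecutive points and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def discretize(values, mass=0.9):
    """Label each value as below ("L"), inside ("N") or above ("H") the HDI.

    The three labels are illustrative only, not the categories used by the pipeline.
    """
    low, high = hdi(values, mass)
    return ["L" if v < low else "H" if v > high else "N" for v in values]


# Toy expression values for one gene across ten cells (made-up numbers).
expression = [0.1, 4.8, 5.0, 5.1, 5.2, 5.3, 5.5, 5.6, 5.9, 12.0]
print(hdi(expression, mass=0.8))         # (4.8, 5.9)
print(discretize(expression, mass=0.8))  # ['L', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N', 'H']
```

In this sketch a single interval summarizes the whole sample, which is only meaningful when the values form one main cluster. With two clearly separated clusters, the interval would have to stretch across the gap between them, and most values would receive the same label.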

Requirements

System requirements

  • Linux operating system
  • at least 30 GB RAM
  • about 400 GB of free space for intermediate files and results
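Because the analysis runs for days, it may be worth checking the machine against these numbers before starting. The following is a rough sketch rather than part of the repository: the helper names are hypothetical, "GB" is assumed to mean decimal gigabytes, and the disk check looks at whichever file system holds the current directory.

```python
import os
import shutil

GB = 10**9  # assumption: decimal gigabytes; the requirements do not specify the unit


def shortfalls(ram_bytes, free_bytes, min_ram_gb=30, min_free_gb=400):
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    if ram_bytes < min_ram_gb * GB:
        problems.append(f"RAM: {ram_bytes / GB:.1f} GB found, {min_ram_gb} GB needed")
    if free_bytes < min_free_gb * GB:
        problems.append(f"disk: {free_bytes / GB:.1f} GB free, {min_free_gb} GB needed")
    return problems


def system_ram_bytes():
    """Total physical memory, read through POSIX sysconf (available on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    free_bytes = shutil.disk_usage(".").free  # free space where the script is run
    for problem in shortfalls(system_ram_bytes(), free_bytes):
        print("WARNING:", problem)
```

Run it from the directory where the intermediate files will be written, so that the free-space figure refers to the right disk.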

Required software

R, python3, Cellranger, bamtofastq, GATK, VCFtools, IQtree, BEAST2

R packages:

phyloRNA, beter, data.table, devtools

Python packages:

pysam
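A quick way to see which of the tools above are still missing is to look them up on the PATH. This is a sketch under assumptions: the executable names below are guesses at how each tool is commonly installed (for example `iqtree` may be `iqtree2`, and BEAST2 may be started through a different launcher), so adjust them to your setup. R packages are not covered and need to be checked from within R.

```python
import importlib.util
import shutil

# Assumed executable names; they may differ between installations.
EXECUTABLES = [
    "Rscript", "python3", "cellranger", "bamtofastq",
    "gatk", "vcftools", "iqtree", "beast",
]
PYTHON_MODULES = ["pysam"]


def missing_executables(names):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_python_modules(names):
    """Return the top-level module names that the current Python cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_executables(EXECUTABLES):
        print("missing executable:", name)
    for name in missing_python_modules(PYTHON_MODULES):
        print("missing Python module:", name)
```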

Required files:

Original data published in the GEO database under accession number GSE163210.

Human reference genome GRCh38v15, annotation and known variants.

Code from this repository.

Running the analysis

Once you have installed the required software and prepared your data, navigate to the analysis directory and run:

Rscript run.r

The analysis should finish after a few days.

Processed files

Pre-processed FASTA files, trees, and tests of phylogenetic clustering are available in the processed_files branch. These files are tracked with the Git Large File Storage (LFS) extension.

Detailed instructions

Need help?

If anything is unclear or you need help with the analysis, raise an issue.