Skip to content

Predicting gene essentiality from transposon insertion sequencing (TnSeq) genomics data with machine learning

Notifications You must be signed in to change notification settings

anuraglimdi/tnseq-essential-genes

Repository files navigation

Predicting essential genes from TnSeq data

Transposon sequencing (TnSeq) is an great approach for identifying essentiality of genes/genomic features. It involves genome-wide transposon insertion mutagenesis and high throughput sequencing based fitness assays. A lot of the downstream analysis of essentiality and fitness effects involves drawing an arbitrary cutoff for calling a gene as essential or not, typically based on prior knowledge of the biological system.

In this project, I apply machine learning classification algorithms to predict gene essentiality in a transposon library collection in E. coli (published here, using the E. coli K-12 Keio knockout collection and TraDIS as a ground truth.

Goals of the project:

  • Compare performance of different classification approaches, relative to a naive arbitrary cutoff, and examine how/why they make differing predictions
  • Potentially identify non-obvious combination of TnSeq data features which underlie whether a gene is essential or not
  • Develop an approach for gene essentiality classification that can account for variation in sequencing depth and other experimental parameters
  • As a final validation of the approach, predict the essential genes in Acinetobacter balyayi, which also has both TnSeq and single-gene deletion data

About

Predicting gene essentiality from transposon insertion sequencing (TnSeq) genomics data with machine learning

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published