
Code repository for an upcoming publication focused on pneumococcal genome characterization.

duncanberger/PGL_cgMLST

Pneumococcal genome library and core genome multilocus sequence typing scheme

This repository contains the data and the analysis workflow for the manuscript "Development of core genome multilocus sequence typing and taxonomic barcoding schemes to delineate global pneumococcal population structure."

The repository is divided into two sections:
1. Analysis - all the commands used to analyse the data.
2. Figures - the scripts used to plot the results (one script per figure).

Abstract

Investigating the genomic epidemiology of major bacterial pathogens is integral to understanding transmission, evolution, colonisation, disease, antimicrobial resistance, and vaccine impact. Furthermore, the recent accumulation of large numbers of whole genome sequences for many bacterial species enhances the development of robust genome-wide typing schemes to define the overall bacterial population structure and lineages within it. Using previously published data, we developed the Pneumococcal Genome Library (PGL), a curated dataset of 30,976 genomes and contextual data for carriage and disease pneumococci recovered between 1916 and 2018 in 82 countries. We leveraged the size and diversity of the PGL to develop a core genome multilocus sequence typing (cgMLST) scheme comprising 1,222 loci. Finally, using multilevel single-linkage clustering, we stratified pneumococci into hierarchical clusters based on allelic similarity thresholds, and defined these with a taxonomic life identification number (LIN) barcoding system. The PGL, cgMLST scheme, and LIN barcodes represent a high-quality genomic resource and fine-scale clustering approaches for the analysis of pneumococcal populations, which support the genomic epidemiology and surveillance of this leading global pathogen.
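
The idea of multilevel single-linkage clustering on allelic profiles can be illustrated with a small Python sketch. This is not the code used in the manuscript and it does not reproduce how LIN barcodes are assigned. The toy profiles (five loci instead of 1,222), the mismatch thresholds, the function names, and the choice to ignore loci with a missing allele are all assumptions made for illustration.

```python
from itertools import combinations


def allelic_mismatches(a, b):
    """Count loci where both profiles have a called allele and the alleles differ.

    Loci with a missing allele (None) in either profile are ignored (an assumption).
    """
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def single_linkage(profiles, threshold):
    """Single-linkage clustering: two isolates share a cluster if they are joined
    by a chain of pairs that each differ at <= threshold loci.

    Returns a dict mapping isolate name to an integer cluster label.
    """
    names = sorted(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for p, q in combinations(names, 2):
        if allelic_mismatches(profiles[p], profiles[q]) <= threshold:
            parent[find(q)] = find(p)

    labels = {}
    return {n: labels.setdefault(find(n), len(labels)) for n in names}


def multilevel_clusters(profiles, thresholds):
    """Cluster at every threshold, coarsest first, and return one tuple of
    cluster labels per isolate (one position per threshold)."""
    levels = [single_linkage(profiles, t)
              for t in sorted(thresholds, reverse=True)]
    return {n: tuple(level[n] for level in levels) for n in profiles}


# Hypothetical toy data: four isolates typed at five loci.
profiles = {
    "A": [1, 1, 1, 1, 1],
    "B": [1, 1, 1, 1, 2],
    "C": [1, 1, 2, 2, 2],
    "D": [9, 9, 9, 9, 9],
}

# Hypothetical thresholds, expressed as the maximum number of allelic mismatches.
codes = multilevel_clusters(profiles, thresholds=[2, 1, 0])

for name in sorted(codes):
    print(name, codes[name])
```

In this toy example, A and C differ at three loci but still share the coarsest cluster because each is linked to B, which is the chaining behaviour that distinguishes single linkage from stricter linkage criteria.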

Citations

Jansen van Rensburg, M.J., Berger, D.J., Fohrmann, A., Bray, J.E., Jolley, K.A., Maiden, M.C. and Brueggemann, A.B., 2023. Development of the Pneumococcal Genome Library, a core genome multilocus sequence typing scheme, and a taxonomic life identification number barcoding system to investigate and define pneumococcal population structure. bioRxiv, 2023.12.19.571883. (https://www.biorxiv.org/content/10.1101/2023.12.19.571883v1)
