This repository contains the data and the analysis workflow for the manuscript "Development of core genome multilocus sequence typing and taxonomic barcoding schemes to delineate global pneumococcal population structure."
The repository is divided into two sections:
1. Analysis - contains all the commands used to analyse the data.
2. Figures - contains the scripts used to plot the results (one script per figure).
Investigating the genomic epidemiology of major bacterial pathogens is integral to understanding transmission, evolution, colonisation, disease, antimicrobial resistance, and vaccine impact. Furthermore, the recent accumulation of large numbers of whole genome sequences for many bacterial species enhances the development of robust genome-wide typing schemes to define the overall bacterial population structure and the lineages within it. Using previously published data, we developed the Pneumococcal Genome Library (PGL), a curated dataset of 30,976 genomes and contextual data for carriage and disease pneumococci recovered between 1916 and 2018 in 82 countries. We leveraged the size and diversity of the PGL to develop a core genome multilocus sequence typing (cgMLST) scheme comprising 1,222 loci. Finally, using multilevel single-linkage clustering, we stratified pneumococci into hierarchical clusters based on allelic similarity thresholds, and defined these with a taxonomic life identification number (LIN) barcoding system. The PGL, cgMLST scheme, and LIN barcodes represent a high-quality genomic resource and fine-scale clustering approaches for the analysis of pneumococcal populations, which support the genomic epidemiology and surveillance of this leading global pathogen.
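The idea of multilevel single-linkage clustering on allelic profiles can be illustrated with a minimal Python sketch. This is a simplified illustration only, not the workflow used in the manuscript or the LIN code assignment algorithm itself: the function names, the toy 5-locus profiles, and the mismatch thresholds (2, 1, 0) are all hypothetical, and missing alleles are not handled.

```python
from itertools import combinations


def allelic_mismatches(a, b):
    """Count the loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(profiles, threshold):
    """Label isolates so that any two joined by a chain of pairs
    differing at <= threshold loci share the same cluster label."""
    names = list(profiles)
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allelic_mismatches(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    labels = {}  # cluster root -> integer label, in order of first appearance
    return {n: labels.setdefault(find(n), len(labels)) for n in names}


def hierarchical_codes(profiles, thresholds):
    """Join cluster labels from the loosest to the strictest threshold
    into one barcode-like string per isolate."""
    levels = [single_linkage(profiles, t) for t in sorted(thresholds, reverse=True)]
    return {n: "_".join(str(level[n]) for level in levels) for n in profiles}


# Hypothetical toy data: 4 isolates typed at 5 loci (allele numbers are made up).
profiles = {
    "iso1": (1, 1, 1, 1, 1),
    "iso2": (1, 1, 1, 1, 2),
    "iso3": (1, 1, 1, 2, 3),
    "iso4": (9, 9, 9, 9, 9),
}
codes = hierarchical_codes(profiles, thresholds=[2, 1, 0])
for name, code in codes.items():
    print(name, code)
```

Because single-linkage clusters at a strict threshold are always nested inside those at a looser threshold, isolates that share a longer prefix of the resulting code are more closely related. In the toy data, iso1 and iso2 share the first two positions, iso3 shares only the first, and iso4 shares none.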
Jansen van Rensburg, M.J., Berger, D.J., Fohrmann, A., Bray, J.E., Jolley, K.A., Maiden, M.C. and Brueggemann, A.B., 2023. Development of the Pneumococcal Genome Library, a core genome multilocus sequence typing scheme, and a taxonomic life identification number barcoding system to investigate and define pneumococcal population structure. bioRxiv, 2023.12.19.571883. (https://www.biorxiv.org/content/10.1101/2023.12.19.571883v1)