NCI-CGR/plco-analysis

plco-analysis: reproducible post-imputation association studies

Overview

This pipeline system enables reproducible post-imputation analysis, primarily with (G)LMM software packages, along with meta-analysis and QC plotting. The pipelines are written in Make and interface with computational clusters. The software was initially designed for the "PLCO Atlas" project, with the intent of running hundreds of association studies with as little person-power as possible, though with little effort it can also be used for smaller-scale standard GWAS analyses. It is highly configurable and extensible: association tools can be added modularly, either as Make pipelines or as tools written in other languages.
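
Because the pipelines are written in Make, a step is normally rerun only when its output file is missing or older than one of its inputs. The sketch below illustrates that general timestamp rule in Python. It is not code from this repository, and the function name and any file names used with it are hypothetical.

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if `target` is out of date under Make's timestamp rule.

    Hypothetical helper for illustration only; it assumes every
    prerequisite already exists on disk (real Make would first try
    to build a missing prerequisite).
    """
    # A missing target must always be built.
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # The target is stale if any prerequisite is newer than it.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Under this rule, rerunning a pipeline after an interruption skips any step whose output already exists and is newer than its inputs.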

Installation Instructions

See installation instructions on readthedocs

Development Schedule

v2.0 (platform-independent build for PLCO Atlas tranche 2)
  • BOLT-LMM support
  • fastGWA support
  • SAIGE support (binary traits)
  • SAIGE support (categorical traits)
  • meta-analysis with METAL
    • support for fastGWA
    • support for BOLT-LMM
    • support for SAIGE/binary
    • support for SAIGE/categorical
  • rsID support request
  • full resumability
    • resumable for cluster-submitted jobs
    • resumable for non-cluster jobs
  • full logging
    • logging for cluster-submitted jobs
    • logging for non-cluster jobs
  • SGE/qsub support
  • configuration via YAML
  • more efficient YAML access in preprocessor
  • testing via YAML
  • heuristic testing to support above
  • hunt down last untracked auxiliary files
  • complete (straightforward and documented) platform independence with conda
  • documentation: R-style vignette for generalized usage
  • this README
v3.0 (approximately corresponding with the end of PLCO Atlas)
  • POLMM/ordinal phenotype support
  • top-level parameter exposure for analysis tools
  • validated Slurm support
  • scalable testing with per-test dependency specification
  • force post-primary analysis tools to ignore analysis results absent from config
  • heuristic testing to support above
  • documentation: full installation for multiple platforms and clusters; possibly Docker
  • documentation: Doxygen support
  • this README
v4.0 (the Confluence build)
  • config-level parameter exposure for analysis tools
  • integration of external meta-analysis files
  • distributed meta-analysis best practice QC measures
  • LSF support
  • heuristic testing to support above
  • this README

Version History

  • 30 January 2021: remove manual dependency tracking in favor of conda and real installation instructions

  • 13 January 2021: urgent patches. v1.0.0 is merely for recordkeeping for the tranche 1 (T1) run

  • 12 January 2021: initial migration to CGR GitHub! v1.0.0, for tranche 1