A probably over-engineered bash-script for starting parallel runs of structure for different population sizes and numbers of replicates.

alkc/parallel-structure

parallel-structure

A glued-together bash script for running parallel runs of "Structure" (population genetics inference) across different values of K and numbers of replicates.

You will need both GNU parallel and structure installed, which is an easy task with conda:

conda install -c bioconda parallel structure
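To confirm that both tools ended up on your PATH after installation, a small check along these lines may help. This is only a sketch: `missing_tools` is a hypothetical helper and is not part of parallel-structure.sh.

```shell
# Hypothetical helper (not part of parallel-structure.sh):
# prints the names of the given commands that cannot be found on PATH.
missing_tools() {
  local tool missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Strip the leading space left by the concatenation above.
  echo "${missing# }"
}

missing=$(missing_tools parallel structure)
if [ -n "$missing" ]; then
  echo "Missing from PATH: $missing" >&2
else
  echo "parallel and structure found"
fi
```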

Example

To check that the script works, use the included example data set (testdata1). More information about the sample data can be found at: https://web.stanford.edu/group/pritchardlab/software/structure-data_v.2.3.1.html

Please run the following command from the script directory:

bash parallel-structure.sh example-data/mainparams example-data/extraparams example-data/testdata1 output_dir 1 3 5 8

The last four arguments of the above command set, in the following order: minimum K, maximum K, number of replicates, and number of parallel jobs.

The command runs structure for K=1 to K=3 with 5 replicates for each tested value of K (15 runs in total), with up to 8 jobs running in parallel.
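To see how the minimum K, maximum K, and replicate arguments expand into individual runs, the combinations can be enumerated as in the sketch below. `list_runs` is a hypothetical helper written for illustration; the script itself may build its job list differently (for instance by handing the two ranges to GNU parallel).

```shell
# Hypothetical helper (not taken from parallel-structure.sh):
# prints one line per (K, replicate) combination.
list_runs() {
  local min_k=$1 max_k=$2 reps=$3 k rep
  k=$min_k
  while [ "$k" -le "$max_k" ]; do
    rep=1
    while [ "$rep" -le "$reps" ]; do
      echo "K=$k rep=$rep"
      rep=$((rep + 1))
    done
    k=$((k + 1))
  done
}

# Same values as the example command: K=1..3, 5 replicates each.
list_runs 1 3 5
```

With the example values this prints 15 lines, one per structure run. The number of parallel jobs only limits how many of those runs execute at the same time.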

All output is saved to output_dir/

Citation

DOI

If this script has been useful to you and you think more researchers would benefit from knowing about it, then feel free to cite it as follows:

More importantly, you should probably cite both structure and parallel, on which this script relies.

For more info about how to cite Structure please refer to page 37 of the official Structure 3.4 manual (PDF)

For more info about how to cite GNU parallel, see https://doi.org/10.5281/zenodo.1146014 (or run parallel --citation at the bash prompt).

CHANGELOG

version 0.6.1

  • Prepare for release on Zenodo
  • UPDATED README with better description

version 0.6 <2021-04-13>

  • FIXED bug where replicate runs started with the same seed, which defeated the purpose of reps.
  • ADDED Ability to set min K, max K, number of reps and number of parallel jobs from the command line
  • ADDED More informative error messages if files missing at specified paths
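The seed bug fixed in version 0.6 is worth illustrating: if every replicate starts from the same seed, the replicates are not independent of each other. One way to give each run its own seed is to derive it from a base seed plus the K value and replicate number. The sketch below is an assumption about how this could be done, not the script's actual fix; `run_seed` is a hypothetical helper, and the formula assumes fewer than 1000 replicates per K.

```shell
# Hypothetical helper (not the script's actual implementation):
# derives a distinct seed for every (K, replicate) pair.
# Assumes fewer than 1000 replicates per K, so seeds cannot collide.
run_seed() {
  local base=$1 k=$2 rep=$3
  echo $((base + k * 1000 + rep))
}

# Seeds for three replicates of K=2 with a base seed of 42.
for rep in 1 2 3; do
  echo "K=2 rep=$rep seed=$(run_seed 42 2 "$rep")"
done
```

Each resulting value could then be supplied to structure as the seed for that run.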

TODO:

  • Add installation instructions?
  • Add long named parameters (probably requires moving away from using bash?)
  • Add some parameter validation (e.g. exit with informative error if input files do not exist)
  • Make the number of parallel jobs parameter optional (default to nproc - 1?)
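The last TODO item could be approached roughly as follows. This is a sketch only: `resolve_jobs` is a hypothetical helper that is not implemented in the script, and it assumes that falling back to a single job is acceptable when `nproc` is unavailable or reports one core.

```shell
# Hypothetical helper (not implemented in parallel-structure.sh):
# uses the requested job count if one was given, else nproc - 1 (minimum 1).
resolve_jobs() {
  local requested=${1:-} cores
  if [ -n "$requested" ]; then
    echo "$requested"
    return
  fi
  # Fall back to 1 core if nproc is not installed.
  cores=$(nproc 2>/dev/null || echo 1)
  if [ "$cores" -gt 1 ]; then
    echo $((cores - 1))
  else
    echo 1
  fi
}

echo "explicit: $(resolve_jobs 8)"
echo "default:  $(resolve_jobs)"
```

Leaving one core free keeps the machine responsive while the structure runs are in progress.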
