Enable call caching of TSV generation in GvsImportGenomes #7226

Merged
mmorgantaylor merged 23 commits into ah_var_store from mmt_call_cache_IG on Apr 23, 2021

@mmorgantaylor mmorgantaylor commented Apr 23, 2021

this PR:

  • changes CreateVariantIngestFiles to name its output files predictably: instead of the sample_id, the name uses the file name of the input gvcf. For example, pet_001_NA12878.tsv becomes pet_001_NA12878.haplotypeCalls.reblocked.vcf.gz.tsv

    • adds a test in CreateVariantIngestFilesIntegrationTest to assert that the files are named as expected
  • changes the GvsImportGenomes.wdl to:

    • check, for the given input gvcf file and for each of pet, vet, and sample_info, whether the output TSV already exists anywhere in the output directory, including its subdirectories
    • if the output TSV exists in a set_X subdirectory, move that file back into the parent directory so that subsetting works as desired when we get to LoadTables
    • if the output TSV exists in a done subdirectory, exit with an error
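
The naming rule and the three-way check described above can be sketched in a few lines. This is only an illustration in Python, not the bash that the WDL actually runs: the function names, the `table_id` argument, the assumption that sample_info follows the same naming pattern as pet and vet, and the "reuse" outcome for a TSV already sitting in the parent directory are all assumptions made for the example.

```python
from __future__ import annotations


def expected_tsv_name(prefix: str, table_id: str, gvcf_path: str) -> str:
    """Name the TSV after the input gvcf's file name rather than a sample_id."""
    gvcf_name = gvcf_path.rstrip("/").rsplit("/", 1)[-1]
    return f"{prefix}_{table_id}_{gvcf_name}.tsv"


def classify(expected_name: str, existing_paths: list[str]) -> str:
    """Decide what to do given a listing of everything under the output directory.

    Returns one of: 'generate', 'reuse', 'move_to_parent', 'error_done'.
    """
    # Directory that directly contains each file whose name matches.
    parents = []
    for path in existing_paths:
        parts = path.rstrip("/").split("/")
        if parts[-1] == expected_name:
            parents.append(parts[-2] if len(parts) > 1 else "")

    if not parents:
        return "generate"          # nothing cached: create the TSV
    if "done" in parents:
        return "error_done"        # already loaded: stop with an error
    if any(p.startswith("set_") for p in parents):
        return "move_to_parent"    # left over from an interrupted run
    return "reuse"                 # assumed: already in the parent directory


if __name__ == "__main__":
    name = expected_tsv_name(
        "pet", "001", "gs://bucket/in/NA12878.haplotypeCalls.reblocked.vcf.gz"
    )
    print(name)
    print(classify(name, [f"gs://bucket/out/set_1/{name}"]))
```

The bucket paths in the usage lines are placeholders. Checking for `done` before `set_` means a sample that was already loaded is reported as an error even if a stray copy also remains in a set_X subdirectory; that ordering is a choice made for this sketch.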

notes:

  • the check does not verify that an existing TSV belongs to the same table_id (e.g. pet_001 versus pet_002)

this has been tested as follows:

  • ran once with an exit 1 inserted before bq load, to simulate a permissions or other bq failure after the TSVs had been generated and put into set_X subdirectories
  • removed the LOCKFILE and the exit before bq load, then ran again: the TSVs were not regenerated, and the existing ones were moved into the parent directory and loaded properly into bq
  • ran a third time with the same samples: as expected, it errored out because the TSVs already existed in a done folder
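
The three runs above can be replayed against a temporary local directory. This is a rough simulation under stated assumptions, not the test that was actually run: a local directory stands in for the output bucket, file moves stand in for the set_X and done bookkeeping, the `done` directory is assumed to sit directly under the output directory, and `prepare_tsv` and `simulate_three_runs` are hypothetical names.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def prepare_tsv(output_dir: Path, tsv_name: str) -> str:
    """Look for tsv_name anywhere under output_dir and act on where it is found."""
    matches = [p for p in output_dir.rglob(tsv_name) if p.is_file()]
    if any(p.parent.name == "done" for p in matches):
        raise RuntimeError(f"{tsv_name} already exists in a done directory")
    for p in matches:
        if p.parent.name.startswith("set_"):
            # Move the cached file back so later subsetting sees it.
            shutil.move(str(p), str(output_dir / tsv_name))
            return "moved"
    if matches:
        return "reused"
    (output_dir / tsv_name).write_text("placeholder\n")
    return "generated"


def simulate_three_runs() -> list[str]:
    name = "pet_001_NA12878.haplotypeCalls.reblocked.vcf.gz.tsv"
    outcomes = []
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        # Run 1: the TSV is generated and placed in set_1, then the run
        # stops before the load (the simulated "exit 1").
        outcomes.append(prepare_tsv(out, name))
        (out / "set_1").mkdir()
        shutil.move(str(out / name), str(out / "set_1" / name))
        # Run 2: the TSV is found in set_1 and moved back; the load then
        # succeeds, so the file ends up in done.
        outcomes.append(prepare_tsv(out, name))
        (out / "done").mkdir()
        shutil.move(str(out / name), str(out / "done" / name))
        # Run 3: same sample again; the copy in done triggers the error.
        try:
            outcomes.append(prepare_tsv(out, name))
        except RuntimeError:
            outcomes.append("error")
    return outcomes


if __name__ == "__main__":
    print(simulate_three_runs())
```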

@ahaessly ahaessly left a comment

LGTM!

@kcibul kcibul left a comment

Looks great -- I think this pushes the limits of what we should do in bash (for future maintainers) but does the trick.

@mmorgantaylor mmorgantaylor merged commit ac1a9b6 into ah_var_store Apr 23, 2021
@mmorgantaylor mmorgantaylor deleted the mmt_call_cache_IG branch April 23, 2021 22:31
This was referenced Mar 17, 2023