Reference block storage and query support #7498

Merged: @kcibul merged 9 commits into ah_var_store from kc_ranges_extract on Nov 4, 2021


@kcibul (Contributor) commented on Oct 8, 2021

Overview: see this presentation

WDL

  • updated WDLs to support parameterized loading of PET and/or RANGES
  • enhanced the inline JSON schemas in the WDLs to allow declaring required fields
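For reference, a BigQuery JSON schema marks a column as mandatory with `"mode": "REQUIRED"` (the other modes are `NULLABLE` and `REPEATED`). The sketch below writes an inline schema with required fields from a task's command section. The function name `write_ref_ranges_schema` and the column list are hypothetical illustrations, not the schema this PR actually declares.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write an inline BigQuery JSON schema to the given path.
# HYPOTHETICAL column list: only the "mode": "REQUIRED" mechanism is the point here.
write_ref_ranges_schema() {
  local out="$1"
  cat > "$out" <<'EOF'
[
  {"name": "location",  "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "sample_id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "length",    "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "state",     "type": "STRING",  "mode": "REQUIRED"}
]
EOF
}
```

A schema file written this way can then be passed to `bq mk --table <dataset>.<table> schema.json`, where the dataset and table names are placeholders.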

Common

  • updated AvroFileReader to take a GATKPath instead of a String for the input file, which allows reading directly from gs:// paths
  • changed "mode" from EXOMES/GENOMES/ARRAYS (unused) to PET/RANGES
  • promoted GQStateEnum to top-level class (it was inside PetTsvCreator but used across the codebase)
  • added numerical GQ value to GQStateEnum
  • the maximum supported deletion size is 1000 bp
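The pairing of GQ states with numerical GQ values can be pictured as a two-way mapping between GQ bands and numbers. This is a sketch under an explicit assumption: single-digit state codes `0` to `6` stand for bands of width 10 (GQ 60 and above collapses into `6`), and the lower bound of each band is used as its numerical GQ. The real GQStateEnum constants and values may differ, and `gq_for_state` / `state_for_gq` are hypothetical helpers.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: state codes "0".."6" encode GQ bands [0,10), [10,20), ..., [60,inf).
# These codes and values are illustrative, not copied from GQStateEnum.

# Numerical GQ reported for a state: the lower bound of its band.
gq_for_state() {
  case "$1" in
    [0-6]) echo $(( $1 * 10 )) ;;
    *) echo "unknown state: $1" >&2; return 1 ;;
  esac
}

# State for a raw GQ value: integer-divide by 10 and cap at 6.
state_for_gq() {
  local band=$(( $1 / 10 ))
  if (( band > 6 )); then band=6; fi
  echo "$band"
}
```

Using the band's lower bound as the numerical value is a conservative choice: a block reported as state `3` is guaranteed to have GQ of at least 30 under these assumptions.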

Import

  • added flags to enable writing of PET and/or VET
  • added code to create RefRanges, with a pluggable writer and TSV and Avro implementations
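A minimal sketch of the TSV side of a pluggable ref-ranges writer, assuming each GVCF reference block arrives as (chromosome index, start, END, GQ, sample id). The row layout (`location`, `sample_id`, `length`, `state`), the `chromosome_index * 10^12 + position` location encoding, and the helper names are assumptions made for illustration. Only TSV is implemented; other formats such as Avro would plug into the same dispatch.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTIONS (illustrative only): a range row is
#   location <TAB> sample_id <TAB> length <TAB> state
# where location = chromosome_index * 10^12 + start, length = END - start + 1,
# and state is the GQ band (GQ / 10, capped at 6).
readonly CHROM_MULTIPLIER=1000000000000

# Emit one TSV row for the reference block [start, end] on chromosome index chrom_idx.
write_ref_range_tsv() {
  local chrom_idx="$1" start="$2" end="$3" gq="$4" sample_id="$5"
  local location=$(( chrom_idx * CHROM_MULTIPLIER + start ))
  local length=$(( end - start + 1 ))
  local state=$(( gq / 10 ))
  if (( state > 6 )); then state=6; fi
  printf '%s\t%s\t%s\t%s\n' "$location" "$sample_id" "$length" "$state"
}

# Dispatch on output format; each supported format gets one case branch.
write_ref_range() {
  local format="$1"; shift
  case "$format" in
    tsv) write_ref_range_tsv "$@" ;;
    *) echo "unsupported format in this sketch: $format" >&2; return 1 ;;
  esac
}
```

Keeping format selection in one dispatch function is what makes the writer pluggable: the code that walks reference blocks only calls `write_ref_range`, and adding a format means adding one branch.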

Extract

  • added a parameter to set the inferred GQ value
  • added support for reading VET/RANGES data from Avro files (to support testing)
  • the full implementation of RANGES support
  • note: there is a maximum supported deletion size; upstream deletions larger than this will not generate spanning indels at downstream positions
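The maximum deletion size matters because an extract over a region must also read variants that start upstream of it: a deletion beginning before the region can still cover positions inside it and should produce a spanning deletion there. A sketch of the look-back arithmetic, assuming 1-based positions and VCF-style deletions (the first REF base is retained); `query_start_for_region` and `deletion_spans` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Mirrors the 1000 bp limit described in this PR; function names are hypothetical.
readonly MAX_DELETION_SIZE=1000

# Widen the query start so deletions beginning upstream of the region
# (but no more than MAX_DELETION_SIZE bases away) are still read.
query_start_for_region() {
  local start="$1"
  local qstart=$(( start - MAX_DELETION_SIZE ))
  if (( qstart < 1 )); then qstart=1; fi
  echo "$qstart"
}

# Succeeds if a deletion at del_pos removing del_len bases covers position pos.
# VCF convention: the REF base at del_pos is kept; del_pos+1 .. del_pos+del_len are deleted.
deletion_spans() {
  local del_pos="$1" del_len="$2" pos="$3"
  (( pos > del_pos && pos <= del_pos + del_len ))
}
```

A deletion that starts before the widened query start is never read, so positions it covers inside the region get no spanning deletion. With a 1000 bp look-back, only deletions longer than 1000 bp can start that far upstream and still reach the region.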

Testing

  • added new integration test for ranges extract
  • added various unit tests
  • (IN PROCESS) scientific tieout against 1k
  • scale testing up to 90k once we've moved to v2 reblocking

How to perform scientific tieout

  1. Run the "GvsIngest" pipeline with load_ref_ranges = true; this loads both the PET and REF_RANGES tables
  2. Run Create Alt Allele, Training, etc. as normal
  3. Extract a callset twice -- once with mode = 'PET' (the default) and once with mode = 'RANGES'
  4. Compare the resulting VCFs
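The final comparison step can start with a coarse text-level check. A minimal sketch, assuming both extracts are single VCF files (plain or gzip/bgzip-compressed) and that identical records render identically: it drops header lines, sorts the records, and diffs them. `vcf_body` and `compare_vcfs` are hypothetical helpers; a field-aware comparison (for example, one tolerating reordered INFO keys) would need a VCF-aware tool.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a VCF's data lines (no '#' header lines), sorted, so two callsets can be
# compared regardless of header differences or record order.
# gzip -cdf decompresses .vcf.gz / bgzip files and passes plain text through unchanged.
vcf_body() {
  gzip -cdf "$1" | grep -v '^#' | LC_ALL=C sort
}

# Exits 0 if both VCFs contain identical records; otherwise diff prints the
# differing lines and exits 1.
compare_vcfs() {
  diff <(vcf_body "$1") <(vcf_body "$2")
}
```

For example, `compare_vcfs pet_mode.vcf.gz ranges_mode.vcf.gz` (placeholder file names) prints nothing and exits 0 when the two extracts contain the same records.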

@kcibul marked this pull request as ready for review on Oct 12, 2021, 15:20
@kcibul changed the title from "Kc ranges extract" to "Reference block storage and query support" on Oct 12, 2021
-SN $sample_name \
-SNM ~{sample_map} \
--ref-version 38

gsutil -m mv pet_*.tsv ~{output_directory}/pet_tsvs/
gsutil -m mv ref_ranges_*.tsv ~{output_directory}/ref_ranges_tsvs/
A member commented on the gsutil lines above:

I'm assuming that if load_ref_ranges is false, and there are therefore no ref_ranges files, gsutil either doesn't throw an error, or it does but doesn't fail the task?
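On that question: `gsutil` typically exits with a non-zero status and a "No URLs matched" error when a wildcard matches no files, and whether that fails the task depends on whether it is the last command in the script or `set -e` is in effect. One defensive option, sketched here with a hypothetical `move_if_present` helper, is to check for matches first with bash's `compgen -G` builtin and skip the move when there are none.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: move local files matching a glob with gsutil, but skip
# (successfully) when nothing matches, so an optional output cannot fail the task.
move_if_present() {
  local pattern="$1" dest="$2"
  # compgen -G (a bash builtin) succeeds only if the glob matches at least one path.
  if compgen -G "$pattern" > /dev/null; then
    # $pattern is deliberately unquoted so the shell expands the glob.
    gsutil -m mv $pattern "$dest"
  else
    echo "No files match '$pattern'; nothing to move." >&2
  fi
}
```

Inside the task's command, this could replace the two `gsutil -m mv` lines with `move_if_present 'pet_*.tsv' ~{output_directory}/pet_tsvs/` and `move_if_present 'ref_ranges_*.tsv' ~{output_directory}/ref_ranges_tsvs/`.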

@gatk-bot commented Nov 1, 2021

Travis reported job failures from build 36687. Failures in the following jobs (test type, JDK, job ID):

  • cloud, openjdk8, 36687.1
  • cloud, openjdk11, 36687.14

@gatk-bot commented Nov 1, 2021

Travis reported job failures from build 36689. Failures in the following jobs (test type, JDK, job ID):

  • cloud, openjdk8, 36689.1
  • cloud, openjdk11, 36689.14

@gatk-bot commented Nov 2, 2021

Travis reported job failures from build 36702. Failures in the following jobs (test type, JDK, job ID):

  • cloud, openjdk8, 36702.1
  • cloud, openjdk11, 36702.14

@rsasch left a comment:

Thanks for spending the time to go over all of this! (👍🏻, assuming the tests pass)

  • Set gcloud config directory in Travis
  • This fixes an issue when installing gcloud on Travis caused by permissions on the root directories
@kcibul merged commit f232412 into ah_var_store on Nov 4, 2021
@kcibul deleted the kc_ranges_extract branch on Nov 4, 2021, 13:58
This was referenced Mar 17, 2023

5 participants