
Add withdrawn and is_control columns [VS-70] [VS-213] #7736

Merged: 16 commits merged on Mar 30, 2022

@rsasch commented Mar 26, 2022

From https://docs.google.com/document/d/1YxYddVhQ-ZHEjRY9_XRTLCufDAUtq-MDZYpoO1ex0wA

  • add a withdrawn field (type: TIMESTAMP, nullable) to the sample_info table
  • add an is_control field (type: BOOLEAN, required) to the sample_info table
  • add a samples_are_controls boolean parameter (false by default) to GvsAssignIds, which will populate is_control at ingest
  • GvsCreateFilterSet.wdl (and the associated code in GATK) will check for withdrawn IS NULL
  • add a control_samples boolean parameter (false by default) to GvsPrepareRangesCallset.wdl
  • GvsPrepareRangesCallset.wdl (and the associated Python script) will only add sample info to the cohort __SAMPLES table if withdrawn IS NULL
  • add a control_samples boolean parameter (false by default) to GvsExtractCallset.wdl
  • add GenerateSampleListFile to GvsExtractCallset to create a sample list file when it is run on participants (not controls)
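The sample-selection rules in the list above (skip withdrawn samples, and pick controls or participants according to a boolean flag) can be sketched as a shell helper that writes the corresponding BigQuery query. This is a minimal, hypothetical sketch: the function name build_sample_query, its arguments, and the table name are assumptions for illustration, not the PR's actual code, which lives in the WDLs and the associated Python script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the PR's code): write a BigQuery query that selects
# the samples a callset should include, using the new sample_info columns.
#   $1 = fully qualified sample_info table (assumed name)
#   $2 = "true" to select control samples, "false" for participants
#   $3 = output file for the generated SQL
build_sample_query() {
  local sample_info_table="$1"
  local control_samples="$2"
  local out_file="$3"

  # Accept only the two literal boolean values so nothing else is spliced into the SQL.
  case "${control_samples}" in
    true|false) ;;
    *) echo "control_samples must be true or false" >&2; return 1 ;;
  esac

  # Overwrite (not append) so repeated runs do not concatenate queries.
  # Backticks are escaped so the heredoc emits literal BigQuery identifier quotes.
  cat > "${out_file}" <<EOF
SELECT sample_id, sample_name
FROM \`${sample_info_table}\`
WHERE withdrawn IS NULL
  AND is_control = ${control_samples}
EOF
}
```

For example, `build_sample_query "my-project.my_dataset.sample_info" false query.sql` would write a query for participants only (the default case). Validating the flag against the literals true and false keeps arbitrary text out of the generated SQL.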

Closes

Review comment on this change to the query:

  echo "SELECT i.sample_name FROM \`${INFO_SCHEMA_TABLE}\` p JOIN items i ON (p.partition_id = CAST(i.sample_id AS STRING)) WHERE p.total_logical_bytes > 0 AND (table_name like 'ref_ranges_%' OR table_name like 'vet_%' OR table_name like 'pet_%')" >> query.sql
  echo "UNION DISTINCT " >> query.sql
- echo "SELECT i.sample_name FROM items i WHERE i.is_loaded = True " >> query.sql
+ echo "SELECT i.sample_name FROM items i WHERE i.is_loaded = True AND i.withdrawn IS NULL " >> query.sql
Contributor:

Since we are filtering on this, would clustering the samples table by the withdrawn column give any benefit, or is it negligible?

Author (@rsasch):

I don't believe this adds clustering; it checks for samples that should have data loaded in the ref_ranges or vet tables.
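The author's point, that this query identifies samples expected to have data loaded rather than changing any clustering, is easier to see with the two UNION branches annotated. Below, the post-change lines of the hunk are gathered into a single here-document; the SQL is unchanged apart from added `--` comments. The helper name append_loaded_samples_query and its arguments are hypothetical. As in the original script, the `items` source must already be defined earlier in query.sql, and INFO_SCHEMA_TABLE is presumably a BigQuery INFORMATION_SCHEMA partitions view (an assumption).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): append the post-change sample query.
#   $1 = partitions view the original script calls INFO_SCHEMA_TABLE (assumed)
#   $2 = query file to append to (query.sql in the original script)
# Assumes the file already holds the earlier part of the query that defines
# `items`; like the original echo lines, this appends (>>) rather than overwrites.
append_loaded_samples_query() {
  local info_schema_table="$1"
  local query_file="${2:-query.sql}"

  cat >> "${query_file}" <<EOF
-- Samples whose partition in a ref_ranges_*, vet_*, or pet_* table holds data
SELECT i.sample_name FROM \`${info_schema_table}\` p JOIN items i ON (p.partition_id = CAST(i.sample_id AS STRING)) WHERE p.total_logical_bytes > 0 AND (table_name like 'ref_ranges_%' OR table_name like 'vet_%' OR table_name like 'pet_%')
UNION DISTINCT
-- Samples marked as loaded that have not been withdrawn
SELECT i.sample_name FROM items i WHERE i.is_loaded = True AND i.withdrawn IS NULL
EOF
}
```

A single here-document keeps the SQL readable and avoids repeating `>> query.sql` on every line, while still appending so earlier parts of the file are preserved.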

@rsasch merged commit 141023f into ah_var_store on Mar 30, 2022
@rsasch deleted the rsa_add_sample_columns branch on March 30, 2022 at 15:15
This was referenced Mar 17, 2023
2 participants