Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform full WGS cohort extract scientific tieout for 35 ACMG59 samples #7106

Merged
merged 5 commits into from
Mar 10, 2021

Conversation

kcibul
Copy link
Contributor

@kcibul kcibul commented Feb 25, 2021

Resolves https://github.com/broadinstitute/dsp-spec-ops/issues/239

See README.md in this PR for full details


To make this easier to review, the changes break down into a few sections

  1. Docs -- the README.md. Does it make sense? Could you follow it?

  2. Comparison Script (compare_data.py)-- is it clear? Obvs any bugs would be great. The Github Issue for this PR describes what is compared

  3. WDL changes -- should be straightforward to review, just minor changes

  4. Code changes (java) -- we can walk through this together if that's more effective

Copy link
Member

@mmorgantaylor mmorgantaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

covered your changes in python so far - still need to go through the full thing in more detail, but the changes all make sense

scripts/variantstore/tieout/compare_data.py Show resolved Hide resolved
@@ -261,9 +271,9 @@ def compare_sample_data(e1, e2):
log_difference(key, e1, e2)

# TODO: temporary until we decide what to do with spanning deletions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this #TODO comment still applicable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep -- tied to https://github.com/broadinstitute/dsp-spec-ops/issues/143, but I'll add that to the comment as well

Copy link
Contributor

@ahaessly ahaessly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looked at most of it except python

POS=$2

GVCF=$(cat legacy_wdl/sample_set_membership_v6.tsv | grep $SAMPLE | cut -f2)
gsutil cat $GVCF | gunzip | grep -C 10 $POS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a zgrep command that you can use so you don't have to gunzip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to know!

if((isIndel && totalAsQualApprox < INDEL_QUAL_THRESHOLD) || (!isIndel && totalAsQualApprox < SNP_QUAL_THRESHOLD)) {
// logger.info(contig + ":" + currentPosition + ": dropped for low QualApprox of " + totalAsQualApprox);
if ( printDebugInformation ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment about totalAsQualApprox. it is the sum of the QualApprox values for every sample with a variant at this site (but not allele specific). this seems strange. it seems like it should be an average or some other ratio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just the straight sum. I agree it seems like you would want a single high quality variant or some such... but this is exactly what is calculated in the warp pipeline.

Copy link
Contributor

@ahaessly ahaessly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -297,24 +325,19 @@ private double getQUALapproxFromSampleRecord(GenericRecord sampleRecord) {

String s = o.toString();

// TODO: KCIBUL -- unclear how QUALapproxes are summed from non-ref alleles... replicating what I saw but need to confirm with Laura
// Non-AS QualApprox (used for qualapprox filter) is simply the sum of the AS values (see GnarlyGenotyper)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace "is simply" but "is approximated by"

@kcibul kcibul merged commit c4484d6 into ah_var_store Mar 10, 2021
@kcibul kcibul deleted the kc_tieout branch March 10, 2021 13:56
mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021
…es (#7106)

* tieout changes
* tidy up, updated README.md
* PR comments
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants