array cohort extract #6666

Merged
merged 3 commits into from
Jun 19, 2020

Conversation

kcibul
Contributor

@kcibul kcibul commented Jun 17, 2020

No description provided.

@gatk-bot

gatk-bot commented Jun 17, 2020

Travis reported job failures from build 30710
Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| unit | openjdk11 | 30710.14 | logs |
| unit | openjdk8 | 30710.3 | logs |

Contributor

@ahaessly ahaessly left a comment

yay - a test!
looks good to me. once it's merged into my branch we can refactor common code to a common place. let me know if you have strong opinions - or go ahead if you get to it first.


static {
}

Contributor

is this static block a placeholder for something?

Contributor Author

nope just junk

static {
}

private static final Logger logger = LogManager.getLogger(ExtractCohortEngine.class);
Contributor

change the class passed to the logger to match the enclosing class name

Contributor Author

good catch!

* This value is used to construct the genotype information of those missing samples
* when they are merged together into a {@link VariantContext} object
*/
public static int MISSING_CONF_THRESHOLD = 60;
Contributor

i think we can remove this


final String contig = probeInfo.contig;
final long position = probeInfo.position;
final Allele refAllele = Allele.create(refSource.queryAndPrefetch(contig, position, position).getBaseString(), true);
Contributor

i wonder if it would be easier to get the ref allele from the probe_info table - since we already use it. and then we might not need the reference as an input?

Contributor Author

Something to think about, although we still need the reference when we initialize the vcf writer to get the sequence dictionary for the header

private void finalizeCurrentVariant(final List<VariantContext> unmergedCalls, final Set<String> currentVariantSamplesSeen, final String contig, final long start, final Allele refAllele) {

// TODO: this is where we infer missing data points... once we know what we want to drop
// final Set<String> samplesNotEncountered = Sets.difference(sampleNames, currentVariantSamplesSeen);
Contributor

do you think we are going to be dropping something we aren't already?

Contributor Author

maybe... we need to dig into more data, and when we get to imputed data I think we will
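
For reference, the commented-out line in the snippet above computes the samples that were not seen at the current site. Below is a minimal stand-alone sketch of that step using only `java.util` instead of Guava's `Sets.difference`. The class name `MissingSamples` and the method name `samplesNotEncountered` are hypothetical and do not appear in the PR.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the PR: finds cohort samples with no call at a site.
final class MissingSamples {
    private MissingSamples() {}

    // Returns the samples in allSamples that do not appear in seen, in sorted order.
    // Neither input set is modified; the result is an independent copy.
    static Set<String> samplesNotEncountered(final Set<String> allSamples, final Set<String> seen) {
        final Set<String> missing = new TreeSet<>(allSamples);
        missing.removeAll(seen);
        return missing;
    }
}
```

Guava's `Sets.difference` returns an unmodifiable view over the two input sets rather than a copy, so the copy shown here trades a little memory for a result that stays stable if the inputs change later.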

lrr = data.lrr;
baf = data.baf;

// Genotype -- what about no-call?
Contributor

is no-call the same as missing? there is an enum value for MISSING.
and if you mean how do you represent a missing allele in the genotype, there is a constant for that. i think it's Genotype.NO_CALL

}
} else {
// TODO: constantize
try {
Contributor

i think you can use the constants from the RawArrayFieldEnum

Contributor Author

yeah I think these might all go away once we're storing compressed data so I didn't think too much about it

// Get the query string:
final String sampleListQueryString =
"SELECT probeId, Name, Chr, Position, Ref, AlleleA, AlleleB" +
" FROM `" + fqProbeTableName + "`";
Contributor

we could add a constant in ProbeInfoSchema for the list of fields to get for extract. still need to figure out a package structure that makes sense since right now that class is in ingest
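
One way the suggested constant could look, as a rough sketch only: the class name `ProbeInfoSchemaSketch`, the constant `EXTRACT_FIELDS`, and the method `extractQuery` are hypothetical, and the real `ProbeInfoSchema` may be organized differently. The column names are copied from the SELECT statement in the diff above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a shared schema class; the real ProbeInfoSchema may differ.
final class ProbeInfoSchemaSketch {
    private ProbeInfoSchemaSketch() {}

    // Columns needed by extract, in the order used by the query in the diff.
    static final List<String> EXTRACT_FIELDS = Collections.unmodifiableList(
            Arrays.asList("probeId", "Name", "Chr", "Position", "Ref", "AlleleA", "AlleleB"));

    // Builds the probe-info query for a fully qualified table name.
    static String extractQuery(final String fqProbeTableName) {
        return "SELECT " + String.join(", ", EXTRACT_FIELDS)
                + " FROM `" + fqProbeTableName + "`";
    }
}
```

With the field list in one place, ingest and extract would read the same column names, so a rename only has to happen once.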

this.alleleA = alleleA;
this.alleleB = alleleB;
}

Contributor

i can merge the ProbeInfo class under ingest with this one

// Order is critical here, the ordinal is the int encoding
AA,AB, BB, NO_CALL
}

Contributor

again, i think we should merge this with the enum in the ingest code (i can do this after the PR is merged)
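
The "order is critical" comment in the snippet matters for such a merge: if the two enums list their constants in a different order, previously stored ints would decode to different genotypes. A minimal stand-alone sketch of ordinal-based encoding follows. The enum name `GenotypeCodeSketch` and the `encode`/`decode` methods are hypothetical and not taken from the PR.

```java
// Hypothetical stand-alone copy of the enum from the diff; the name is an assumption.
enum GenotypeCodeSketch {
    // Order is critical: the ordinal is the stored int encoding.
    AA, AB, BB, NO_CALL;

    // Cached once, because values() allocates a new array on every call.
    private static final GenotypeCodeSketch[] VALUES = values();

    // The int written to storage for this genotype.
    int encode() {
        return ordinal();
    }

    // Maps a stored int back to its genotype, rejecting out-of-range codes.
    static GenotypeCodeSketch decode(final int code) {
        if (code < 0 || code >= VALUES.length) {
            throw new IllegalArgumentException("Unknown genotype code: " + code);
        }
        return VALUES[code];
    }
}
```

An alternative design is to give each constant an explicit int field, which keeps the stored encoding stable even if someone later reorders or inserts constants.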

@kcibul kcibul merged commit 49eefc5 into ah_var_store Jun 19, 2020
@kcibul kcibul deleted the kc_ah_va_store_array_extract branch June 19, 2020 18:35
ahaessly pushed a commit that referenced this pull request Aug 13, 2020
* array cohort extract

* roundtripping with binary compression

* PR comments
ahaessly pushed a commit that referenced this pull request Sep 11, 2020
meganshand pushed a commit that referenced this pull request Oct 6, 2020
kcibul added a commit that referenced this pull request Jan 29, 2021
kcibul added a commit that referenced this pull request Jan 29, 2021
kcibul added a commit that referenced this pull request Feb 1, 2021
kcibul added a commit that referenced this pull request Feb 1, 2021
Marianie-Simeon pushed a commit that referenced this pull request Feb 16, 2021
Marianie-Simeon pushed a commit that referenced this pull request Feb 16, 2021
kcibul added a commit that referenced this pull request Mar 9, 2021
kcibul added a commit that referenced this pull request Mar 9, 2021
mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021
mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021
mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021
mmorgantaylor pushed a commit that referenced this pull request Apr 6, 2021
This was referenced Mar 17, 2023
3 participants