Tool for arrays QC metrics calculations #6812
```java
"SELECT * FROM `" + genotypeCountsTable + "`";

//Execute Query
final TableResult result = BigQueryUtils.executeQuery(genotypeCountQueryString);
```
Turns out it's MUCH faster (roughly 10-20x) to use the Storage API here, since you're pulling out the entire table. You can see what I do for probe_info in ArrayExtractCohort.
```java
thisRow.add(String.valueOf(excessHetPval));

Double callRate = 1.0 - ((double) noCalls / sampleCount);
thisRow.add(String.valueOf(callRate));
```
Do you want to format these to a fixed precision? I know some of the GATK tools use 3 decimal places. Same for all the doubles.
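A minimal sketch of fixed-precision output, assuming plain `String.format` is acceptable rather than whatever helper the GATK tools themselves use; the class `MetricFormat` and its methods are hypothetical. `Locale.ROOT` pins the decimal separator to `.` so the TSV reads the same regardless of the JVM's default locale.

```java
import java.util.Locale;

/** Hypothetical helpers for rendering QC metrics at a fixed precision. */
final class MetricFormat {
    private MetricFormat() {}

    /** Exactly {@code places} digits after the decimal point, e.g. 0.987654 -> "0.988". */
    static String formatFixed(double value, int places) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default locale.
        return String.format(Locale.ROOT, "%." + places + "f", value);
    }

    /** Scientific notation with {@code places} digits after the point, e.g. 1.23e-10 -> "1.230e-10". */
    static String formatScientific(double value, int places) {
        return String.format(Locale.ROOT, "%." + places + "e", value);
    }

    public static void main(String[] args) {
        System.out.println(formatFixed(0.987654, 3));
        System.out.println(formatScientific(1.23e-10, 3));
    }
}
```

One caveat for the excess het column: a fixed 3-decimal format turns any p-value below about 0.0005 into `0.000`, so a scientific format such as `%.3e` may preserve more information there, while `%.3f` suits call rate.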
```java
Double excessHetPval = ExcessHet.calculateEH(genotypeCounts, sampleCount).getRight();
thisRow.add(String.valueOf(excessHetPval));

Double callRate = 1.0 - ((double) noCalls / sampleCount);
```
Double or double?
Same for other types. If a value can be null, use the boxed type (Double) and be sure to handle the null case wherever you use it; if not, use the primitive (double) and you don't have to worry!
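To illustrate the suggestion, here is a sketch with hypothetical names (`CallRateExample`, `toTsvField`). The call rate is always defined once the sample count is positive, so a primitive `double` fits. A boxed `Double` only pays off when "missing" is a real state, and then every use needs a null check. Writing an empty field for null is an assumption made for the example, not something the PR specifies.

```java
/** Hypothetical illustration of primitive vs. boxed doubles for per-probe QC metrics. */
final class CallRateExample {
    private CallRateExample() {}

    /**
     * Primitive return type: the result can never be null, so callers need no null checks.
     * The (double) cast avoids integer division; without it, noCalls / sampleCount would be 0
     * whenever noCalls < sampleCount.
     */
    static double callRate(long noCalls, long sampleCount) {
        if (sampleCount <= 0) {
            throw new IllegalArgumentException("sampleCount must be positive");
        }
        return 1.0 - ((double) noCalls / sampleCount);
    }

    /**
     * Boxed parameter: only worthwhile when "no value" is meaningful (e.g. a metric that
     * could not be computed). The null case is handled explicitly here (assumed: empty field).
     */
    static String toTsvField(Double maybeValue) {
        return maybeValue == null ? "" : String.valueOf(maybeValue);
    }

    public static void main(String[] args) {
        System.out.println(callRate(5, 100));
        System.out.println("[" + toTsvField(null) + "]");
    }
}
```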
Pulls down a temp table of genotype counts, calculates excess het p-values and call rates, and writes them to a TSV for future upload.
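A sketch of the final "write them to a TSV" step, assuming the per-probe counts and the excess het p-value have already been computed (in the PR the p-value comes from `ExcessHet.calculateEH`, which is not reproduced here). The `ProbeMetrics` class, the `toTsv` method, and the column names are hypothetical; the number formats follow the fixed-precision question raised in the review.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: render per-probe QC metrics as tab-separated text. */
final class QcTsvWriter {
    private QcTsvWriter() {}

    /** Precomputed inputs for one probe; field names are assumptions. */
    static final class ProbeMetrics {
        final long probeId;
        final long noCalls;
        final long sampleCount;
        final double excessHetPval;

        ProbeMetrics(long probeId, long noCalls, long sampleCount, double excessHetPval) {
            this.probeId = probeId;
            this.noCalls = noCalls;
            this.sampleCount = sampleCount;
            this.excessHetPval = excessHetPval;
        }
    }

    /** Builds a header line plus one line per probe, each terminated by '\n'. */
    static String toTsv(List<ProbeMetrics> rows) {
        StringBuilder sb = new StringBuilder("probe_id\texcess_het_pval\tcall_rate\n");
        for (ProbeMetrics r : rows) {
            // Same call-rate formula as the PR; the cast avoids integer division.
            double callRate = 1.0 - ((double) r.noCalls / r.sampleCount);
            sb.append(r.probeId).append('\t')
              .append(String.format(Locale.ROOT, "%.3e", r.excessHetPval)).append('\t')
              .append(String.format(Locale.ROOT, "%.3f", callRate)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toTsv(Arrays.asList(new ProbeMetrics(42L, 25L, 100L, 0.5))));
    }
}
```

The returned string could then be written with `java.nio.file.Files.write(path, tsv.getBytes(StandardCharsets.UTF_8))` before upload; building a `String` keeps the formatting logic testable without touching the filesystem.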