CreateVariantIngestFiles robust to partially / fully loaded samples [VS-262] #7843

Merged: mcovarr merged 1 commit into ah_var_store from vs_262_partial_loading_independent on May 13, 2022

Collaborator

@mcovarr commented May 10, 2022

Makes CreateVariantIngestFiles robust to partially or fully loaded samples.

Commit 21828af is what I actually propose to merge, while commit de67320 randomly injects failures covering all the known failure modes. I tested these changes with both commits and verified that partially loaded samples were handled correctly on subsequent attempts to load them. (Unfortunately we can't prevent these partial loads from happening in the first place, because of preemptions among other possible reasons.)
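The failure-injection commit itself isn't shown here. As a rough illustration of the idea, a seeded random fault injector in Java might look like the sketch below; `FailureInjector`, the named failure points, and the one-in-N trigger are all hypothetical and are not taken from commit de67320.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of randomized fault injection for exercising retries.
// The failure-point names are illustrative, not taken from commit de67320.
final class FailureInjector {
    // Hypothetical points in a sample load where a failure would leave partial state.
    static final List<String> POINTS = List.of(
            "after_load_status_started",
            "after_ref_ranges_written",
            "after_vet_written");

    private final Random random;
    private final int oneIn; // fail roughly once per `oneIn` calls to maybeFail

    FailureInjector(long seed, int oneIn) {
        if (oneIn < 1) {
            throw new IllegalArgumentException("oneIn must be >= 1");
        }
        this.random = new Random(seed);
        this.oneIn = oneIn;
    }

    // Throws IllegalStateException when this call is selected for failure.
    void maybeFail(String point) {
        if (!POINTS.contains(point)) {
            throw new IllegalArgumentException("Unknown failure point: " + point);
        }
        if (random.nextInt(oneIn) == 0) {
            throw new IllegalStateException("Injected failure at " + point);
        }
    }
}
```

Because `java.util.Random` is seeded, the same seed and the same sequence of calls reproduce the same injected failures, which can help when replaying a failed shard.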

@mcovarr changed the title "Vs 262 partial loading independent" to "CreateVariantIngestFiles robust to partially / fully loaded samples [VS-262]" on May 10, 2022

codecov bot commented May 10, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@900651f).
The diff coverage is n/a.

❗ Current head dad538e differs from pull request most recent head 42bfa0b. Consider uploading reports for the commit 42bfa0b to get more accurate results

@@               Coverage Diff                @@
##             ah_var_store     #7843   +/-   ##
================================================
  Coverage                ?   86.304%           
  Complexity              ?     35190           
================================================
  Files                   ?      2170           
  Lines                   ?    164844           
  Branches                ?     17783           
================================================
  Hits                    ?    142267           
  Misses                  ?     16254           
  Partials                ?      6323           

Collaborator

@gbggrant left a comment


Looks good.

@@ -37,6 +37,10 @@ public final class RefCreator {
private static final String PREFIX_SEPARATOR = "_";
Collaborator

Not yours, but it distresses me to see 'private static final' followed by 'private final static'.

Collaborator Author

lol, given how nitpicky IntelliJ is in general, it seems strange that it doesn't complain about this.

@@ -37,6 +37,10 @@ public final class RefCreator {
private static final String PREFIX_SEPARATOR = "_";
private final static String REF_RANGES_FILETYPE_PREFIX = "ref_ranges_";

public static boolean doRowsExistFor(CommonCode.OutputType outputType, String projectId, String datasetName, String tableNumber, String sampleId) {
Collaborator

Is it possible to test this? Is there even a test framework for BigQuery?

Collaborator Author

I did test this manually in this run, where shards 2 and 3 exercised all of the injected failure conditions on their failed attempts, during which they would have called this code. There are some BQ "unit tests" in BigQueryUtilsUnitTest, but I'm not sure how to create an automated test for these conditions.
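One way to make a check like `doRowsExistFor` unit-testable without a live BigQuery dataset is to put the query execution behind a small interface and substitute a fake in tests. The sketch below assumes a hypothetical `RowCountQuery` seam, a `<prefix>_<tableNumber>` table naming scheme, and a `sample_id` column; none of these are confirmed by this PR.

```java
// Hypothetical seam: the only piece that would actually talk to BigQuery.
@FunctionalInterface
interface RowCountQuery {
    long count(String sql);
}

// Hypothetical, testable version of a "do rows exist for this sample?" check.
final class RowExistenceChecker {
    private final RowCountQuery query;

    RowExistenceChecker(RowCountQuery query) {
        this.query = query;
    }

    // Builds a COUNT query for one sample in one table. The table naming
    // (prefix + "_" + tableNumber) and the sample_id column are assumptions.
    static String countSql(String projectId, String datasetName, String tablePrefix,
                           String tableNumber, long sampleId) {
        String table = projectId + "." + datasetName + "." + tablePrefix + "_" + tableNumber;
        return "SELECT COUNT(*) FROM `" + table + "` WHERE sample_id = " + sampleId;
    }

    // True if the (real or fake) query reports at least one row for the sample.
    boolean doRowsExistFor(String projectId, String datasetName, String tablePrefix,
                           String tableNumber, long sampleId) {
        return query.count(countSql(projectId, datasetName, tablePrefix, tableNumber, sampleId)) > 0;
    }
}
```

Taking the sample id as a `long` keeps the concatenated SQL free of quoting issues; a production version would more likely pass it as a BigQuery query parameter.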


@@ -154,49 +158,6 @@ private void setCoveredInterval(String variantChr, int start, int end) {
previousInterval = new SimpleInterval(possiblyMergedGenomeLoc);
}

public List<List<String>> createRows(final long start, final long end, final VariantContext variant, final String sampleId) {
Collaborator Author

Drive-by deletion of code that IntelliJ flagged as unused.

@@ -217,9 +205,51 @@ public void onTraversalStart() {
logger.info("Sample id " + sampleId + " was detected as already loaded, exiting successfully.");
System.exit(0);
} else if (state == LoadStatus.LoadState.PARTIAL) {
Contributor

Do you want this to be PARTIAL || NONE? What happens if we've never attempted to load this sample before (which is by far the most common case)?

Collaborator Author

@mcovarr May 11, 2022

If the sample load state is NONE, there shouldn't be any ref ranges or vet rows, so there's no point running the expensive-ish queries below that check for rows. For load state NONE, the refRangesRowsExist and vetRowsExist variables on lines 193-194 remain at their initial values of false, which causes the ref ranges and vet *Creators to be constructed on lines 248 and 252.
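The reasoning above can be summarized as a small decision function: only a PARTIAL load pays for the row-existence queries, while NONE leaves both flags false. In this hypothetical sketch, `SampleLoadState` and `LoadPlan` stand in for the tool's `LoadStatus.LoadState` handling, and the COMPLETE case returns a flag instead of calling `System.exit(0)` so the logic can be exercised in a test.

```java
import java.util.function.BooleanSupplier;

// Hypothetical mirror of the load states discussed in review.
enum SampleLoadState { NONE, PARTIAL, COMPLETE }

// Hypothetical summary of what happens before any rows are written.
final class LoadPlan {
    final boolean alreadyLoaded;      // COMPLETE: nothing to do, exit successfully
    final boolean refRangesRowsExist; // true only if a PARTIAL load left ref ranges rows
    final boolean vetRowsExist;       // true only if a PARTIAL load left vet rows

    private LoadPlan(boolean alreadyLoaded, boolean refRangesRowsExist, boolean vetRowsExist) {
        this.alreadyLoaded = alreadyLoaded;
        this.refRangesRowsExist = refRangesRowsExist;
        this.vetRowsExist = vetRowsExist;
    }

    // Only PARTIAL runs the (relatively expensive) row-existence queries.
    // NONE leaves both flags false, so both writers would be constructed.
    static LoadPlan plan(SampleLoadState state,
                         BooleanSupplier refRangesRowsQuery,
                         BooleanSupplier vetRowsQuery) {
        switch (state) {
            case COMPLETE:
                return new LoadPlan(true, false, false);
            case PARTIAL:
                return new LoadPlan(false, refRangesRowsQuery.getAsBoolean(), vetRowsQuery.getAsBoolean());
            case NONE:
            default:
                return new LoadPlan(false, false, false);
        }
    }
}
```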

loadStatus.writeLoadStatusStarted(Long.parseLong(sampleId));
}

if (refCreator != null) {
if (enableReferenceRanges && refCreator != null) {
Contributor

The control logic feels a little complex here, between the flags and null checks combined with the check for rows (which represents its decision via a null refCreator?).

Collaborator Author

I can remove these specific "enable" flag reads, since they're redundant with the creation of the *Creators.
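The simplification discussed in this thread amounts to making the enable/rows-exist decision once, when the *Creators are constructed, so that later code only needs a null check. A minimal sketch, assuming a hypothetical `SampleWriter` in place of the real ref ranges and vet *Creators:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a ref ranges or vet *Creator.
final class SampleWriter {
    final String name;
    final List<String> written = new ArrayList<>();

    SampleWriter(String name) {
        this.name = name;
    }

    void apply(String record) {
        written.add(record);
    }
}

final class Writers {
    // The decision is made once, here: a writer exists only if its output is
    // enabled and no rows for this sample are already present.
    static SampleWriter maybeCreate(String name, boolean enabled, boolean rowsExist) {
        return (enabled && !rowsExist) ? new SampleWriter(name) : null;
    }

    // At the use site a null check suffices; re-reading `enabled` would be redundant.
    static void applyIfPresent(SampleWriter writer, String record) {
        if (writer != null) {
            writer.apply(record);
        }
    }
}
```

An `Optional<SampleWriter>` would make the absence explicit in the type, at the cost of a little more ceremony at each call site.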

Contributor

@kcibul left a comment

One style/clarity thing, and one major question about Load Status.

@mcovarr requested a review from kcibul on May 12, 2022 13:37
@mcovarr force-pushed the vs_262_partial_loading_independent branch 2 times, most recently from de67320 to 0096b7c, on May 12, 2022 15:19
Contributor

@kcibul left a comment

Looks great, assuming the testing code will be removed (i.e., the mod stuff).

@mcovarr force-pushed the vs_262_partial_loading_independent branch from 0096b7c to 42bfa0b on May 13, 2022 15:57
@mcovarr merged commit 17a5e5e into ah_var_store on May 13, 2022
@mcovarr deleted the vs_262_partial_loading_independent branch on May 13, 2022 17:25
This was referenced Mar 17, 2023

3 participants