Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ingest Error Handling Fixes [VS-261] #7841

Merged
merged 2 commits into from
May 13, 2022
Merged

Conversation

mcovarr
Copy link
Collaborator

@mcovarr mcovarr commented May 10, 2022

The main issue was that the StatusRuntimeExceptions that the baseline error handling code was trying to catch in practice always seem to be wrapped in at least one layer of exception of a different type. There was no catch handing for these wrapper exception types so the CreateVariantIngestFiles tool would simply crash.

The changes here also more generally try to follow the recommendations in the BQ Write API documentation, in particular closeing the JsonStreamWriter before retrying error codes not explicitly called out by the documentation.

EDIT: actually closing the writer didn't work out too well as we use the writer in PENDING mode and closing it seems to lose all pending writes. 😬 So in this circumstance we just throw and let WDL-level maxRetries start the data loading over from the beginning.

An exponential backoff was also added before retry attempts.

Parallel logic was also added to load status writing which should reduce (but not eliminate) the possibility of inconsistent sample status writes that require manual intervention. There is still the possibility of an inopportunely timed preemption, which is why VS-262 exists.

All of the WDL changes here are in support of a 2000-sample tieout, a large enough set that intermittent BigQuery errors are almost always observed. The tieout confirms that errors of the two major classes are seen (retryable and non-retryable) and that the number of rows per sample in the tieout dataset matches those in a reference dataset.

@codecov
Copy link

codecov bot commented May 10, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@17a5e5e). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff                @@
##             ah_var_store     #7841   +/-   ##
================================================
  Coverage                ?   86.293%           
  Complexity              ?     35189           
================================================
  Files                   ?      2170           
  Lines                   ?    164876           
  Branches                ?     17784           
================================================
  Hits                    ?    142277           
  Misses                  ?     16276           
  Partials                ?      6323           

@@ -85,63 +87,6 @@ workflow GvsQuickstartIntegration {
}
}

task Setup {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is almost completely a cut/paste of a task from the quickstart integration test to GvsUtils.wdl so it could be reused.

break;
} catch (Exception e) {
@SuppressWarnings("ThrowableNotThrown")
StatusRuntimeException se = handleCausalStatusRuntimeException(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this method doesn't actually handle anything, maybe rename to extract instead of handle or something… but only if you're making another commit to this PR anyways

@mcovarr mcovarr merged commit bfccaf6 into ah_var_store May 13, 2022
@mcovarr mcovarr deleted the vs_261_ingest_errors branch May 13, 2022 21:58
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants