Bulk Ingest #8301
Conversation
Codecov Report
Additional details and impacted files

@@             Coverage Diff              @@
##           ah_var_store   #8301   +/-  ##
================================================
  Coverage            ?    76.562%
  Complexity          ?      21800
================================================
  Files               ?       1390
  Lines               ?      83084
  Branches            ?      13237
================================================
  Hits                ?      63611
  Misses              ?      14308
  Partials            ?       5165
Minor comments, but otherwise LGTM. We're THAT much closer to breaking past the ~10k limit for single runs and greatly simplifying ingestion for AoU-scale data! :-D
## TODO I don't love that we are hardcoding them here and in the python -- they need to be params!
why not just do that now with defaulted params?
I want to update this whole Python script in my next PR, where I also address sample sets.
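A minimal sketch of the defaulted-params approach suggested above. The flag names (`--sample-table`, `--sample-id-column`) and their default values are hypothetical stand-ins for whatever is currently hardcoded, not the PR's actual code:

```python
import argparse

def build_parser():
    # Hypothetical parameter names and defaults standing in for the
    # values currently hardcoded in the script; callers can override
    # them on the command line without changing existing behavior.
    parser = argparse.ArgumentParser(description="bulk ingest column import")
    parser.add_argument("--sample-table", default="sample",
                        help="Terra data table holding the samples")
    parser.add_argument("--sample-id-column", default="sample_id",
                        help="column to use as the sample identifier")
    return parser

# No flags given -> defaults apply; any flag given -> it overrides.
defaults = build_parser().parse_args([])
override = build_parser().parse_args(["--sample-table", "participant"])
```

This keeps existing callers working unchanged while letting the WDL pass the values through as optional inputs later.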
scripts/variantstore/wdl/extract/bulk_ingest_test_files/shriners_columns_for_import.json
* add Aarons changes
* put terra token in python
* id not bucket
* hardcode for testing
* do we need a new docker image?
* set workspace info
* pull in name from rawls
* pass output locations
* add back prepare
* add GvsImportGenomes back
* update python for grabbing cols
* split methods for easier testing
* set defaults, but allow optional overrides for sample table and id
* add unit test for python column guessing
* clean up python for testing
* add proper docker
* is this where the loop is coming from?
* better names
* remove testing artifact
* add back problem lines to the test
* throw out columns with values other than strings
* set defaults in the right place
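The "throw out columns with values other than strings" step from the commit list above might look something like this sketch. The function name and the row format (a list of dicts from an entity query) are assumptions for illustration, not the PR's actual implementation:

```python
def guess_string_columns(rows):
    """Keep only columns whose value is a string in every row.

    `rows` is a list of dicts mapping column name -> value, e.g. rows
    pulled back from a Terra/Rawls entity query. Columns containing any
    non-string value (lists, numbers, None) are dropped, since only
    string-valued columns can be passed through to the import JSON.
    """
    if not rows:
        return []
    # Start from the first row's columns and intersect away any column
    # that holds a non-string value in any row.
    candidates = set(rows[0])
    for row in rows:
        candidates &= {col for col, val in row.items() if isinstance(val, str)}
    return sorted(candidates)

rows = [
    {"sample_id": "s1", "gvcf": "gs://bucket/a.g.vcf.gz", "count": 3},
    {"sample_id": "s2", "gvcf": "gs://bucket/b.g.vcf.gz", "count": 4},
]
columns = guess_string_columns(rows)  # "count" is dropped: its values are ints
```

A unit test over rows like these (mirroring the "add unit test for python column guessing" commit) would pin down the filtering behavior before the script is parameterized.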
Successful run here:
https://job-manager.dsde-prod.broadinstitute.org/jobs/41b11f26-9d55-45ad-b593-ddb5b8c78184
Bulk load data here:
https://console.cloud.google.com/bigquery?project=spec-ops-aou&ws=!1m25!1m4!4m3!1sgvs-internal!2sgg_quickstart1!3sgg-quickstart1_vat_12!1m4!1m3!1sspec-ops-aou!2sbquxjob_11f0b098_187d879575c!3sUS!1m4!4m3!1sspec-ops-aou!2sgg_quickstart!3svet_001!1m4!1m3!1sspec-ops-aou!2sbquxjob_aa2c57_187d8b92d34!3sUS!1m4!4m3!1sgvs-internal!2src_ingest_bulk_test_useability!3ssample_load_status
This is all that now needs to be input (assuming that we guess the additional parameters correctly).
This needs a new Docker image and an interactive rebase, lol.
documentation attempt:
https://docs.google.com/document/d/1fxu0EnNp7ie42BtFQsSSN6QUESUiDl3fm8F5AnNkKhw/edit#heading=h.s5k25ipaom03