Bulk Ingest #8301
Conversation
Codecov Report
Additional details and impacted files

@@             Coverage Diff              @@
##           ah_var_store   #8301   +/-  ##
================================================
  Coverage            ?    76.562%
  Complexity          ?      21800
================================================
  Files               ?       1390
  Lines               ?      83084
  Branches            ?      13237
================================================
  Hits                ?      63611
  Misses              ?      14308
  Partials            ?       5165
Minor comments, but otherwise LGTM. We're THAT much closer to breaking past the ~10k limit for single runs and greatly simplifying ingestion for AoU-scale data! :-D
## TODO I don't love that we are hardcoding them here and in the python -- they need to be params!
why not just do that now with defaulted params?
I want to update this whole Python script in my next PR, where I also address sample sets.
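A minimal sketch of the defaulted-params approach suggested above. The flag names (`--sample-table`, `--sample-id-column`) and their default values are hypothetical stand-ins for whatever is currently hardcoded, not the PR's actual code:

```python
import argparse

def build_parser():
    # Hypothetical parameter names and defaults standing in for the
    # values currently hardcoded in the script; callers can override
    # them on the command line without changing existing behavior.
    parser = argparse.ArgumentParser(description="bulk ingest column import")
    parser.add_argument("--sample-table", default="sample",
                        help="Terra data table holding the samples")
    parser.add_argument("--sample-id-column", default="sample_id",
                        help="column to use as the sample identifier")
    return parser

# No flags given -> defaults apply; any flag given -> it overrides.
defaults = build_parser().parse_args([])
override = build_parser().parse_args(["--sample-table", "participant"])
```

This keeps existing callers working unchanged while letting the WDL pass the values through as optional inputs later.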
scripts/variantstore/wdl/extract/bulk_ingest_test_files/shriners_columns_for_import.json
* add Aarons changes
* put terra token in python
* id not bucket
* hardcode for testing
* do we need a new docker image?
* set workspace info
* pull in name from rawls
* pass output locations
* add back prepare
* add GvsImportGenomes back
* update python for grabbing cols
* split methods for easier testing
* set defaults, but allow optional overrides for sample table and id
* add unit test for python column guessing
* clean up python for testing
* add proper docker
* is this where the loop is coming from?
* better names
* remove testing artifact
* add back problem lines to the test
* throw out columns with values other than strings
* set defaults in the right place
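The "throw out columns with values other than strings" step from the commit list above might look something like this sketch. The function name and the row format (a list of dicts from an entity query) are assumptions for illustration, not the PR's actual implementation:

```python
def guess_string_columns(rows):
    """Keep only columns whose value is a string in every row.

    `rows` is a list of dicts mapping column name -> value, e.g. rows
    pulled back from a Terra/Rawls entity query. Columns containing any
    non-string value (lists, numbers, None) are dropped, since only
    string-valued columns can be passed through to the import JSON.
    """
    if not rows:
        return []
    # Start from the first row's columns and intersect away any column
    # that holds a non-string value in any row.
    candidates = set(rows[0])
    for row in rows:
        candidates &= {col for col, val in row.items() if isinstance(val, str)}
    return sorted(candidates)

rows = [
    {"sample_id": "s1", "gvcf": "gs://bucket/a.g.vcf.gz", "count": 3},
    {"sample_id": "s2", "gvcf": "gs://bucket/b.g.vcf.gz", "count": 4},
]
columns = guess_string_columns(rows)  # "count" is dropped: its values are ints
```

A unit test over rows like these (mirroring the "add unit test for python column guessing" commit) would pin down the filtering behavior before the script is parameterized.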
Successful run here:
https://job-manager.dsde-prod.broadinstitute.org/jobs/41b11f26-9d55-45ad-b593-ddb5b8c78184
Bulk load data here:
https://console.cloud.google.com/bigquery?project=spec-ops-aou&ws=!1m25!1m4!4m3!1sgvs-internal!2sgg_quickstart1!3sgg-quickstart1_vat_12!1m4!1m3!1sspec-ops-aou!2sbquxjob_11f0b098_187d879575c!3sUS!1m4!4m3!1sspec-ops-aou!2sgg_quickstart!3svet_001!1m4!1m3!1sspec-ops-aou!2sbquxjob_aa2c57_187d8b92d34!3sUS!1m4!4m3!1sgvs-internal!2src_ingest_bulk_test_useability!3ssample_load_status
This is all that now needs to be input (assuming that we guess the additional parameters correctly).
This needs a new Docker image and an interactive rebase, lol.
documentation attempt:
https://docs.google.com/document/d/1fxu0EnNp7ie42BtFQsSSN6QUESUiDl3fm8F5AnNkKhw/edit#heading=h.s5k25ipaom03