VS-263 notes on ingest and beyond #7618

Merged

RoriCremer merged 5 commits into ah_var_store from rc-more-notes

Mar 23, 2022

Contributor

RoriCremer commented Dec 21, 2021

These are some of my notes from our discussions over the last week. This is not a final draft, but I wanted to get it into your hands.
Brickbats welcome.

Notes for things to add:
Sample sets need to be run with this.sample_set_id.

There's not a good way to make sample sets from the UI. Let's ask Morgan what her process is for making them outside the UI. Is there a script?

Rori Cremer and others added 2 commits

December 20, 2021 22:39


          notes in ingest

f45eecb


          default drop state

872e6ac

RoriCremer changed the title ~~notes on ingest and beyond~~ VS-263 notes on ingest and beyond

rsasch requested changes

scripts/variantstore/AOU_GVS_WORKSPACE.md Outdated

+              **Note:**
+              Samples that will be batched and loaded together must be put into a sample_set ahead of time, otherwise their loading may cause conflicts.
+              This workflow must be done piecemeal if over 4000 samples are to be loaded as only 4000 samples can be loaded in at a time. The best way to do this currently is to create sample_sets of 4000 samples each.
+              The workflow can then be run once for each sample_set. if the same sample_set is inadvertantly run twice the workflow will detect that the samples already exist in the system and the second duplicate workflow will fail.

rsasch Mar 2, 2022

Suggested change

      
-              The workflow can then be run once for each sample_set. if the same sample_set is inadvertantly run twice the workflow will detect that the samples already exist in the system and the second duplicate workflow will fail.
+              The workflow can then be run once for each sample_set. If the same sample_set is inadvertently run twice the workflow will detect that the samples already exist in the system and the workflow will fail.
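
For illustration, the batching rule in the note (at most 4000 samples per sample_set, one workflow run per set) can be sketched as below. This is a minimal sketch only: the sample IDs and the `batch_NNN` set names are made up, and the code just partitions a list of IDs. It does not create sample_sets in Terra.

```python
from typing import Iterator, List, Sequence

# Limit taken from the note above: only 4000 samples can be loaded at a time.
MAX_SAMPLES_PER_SET = 4000


def chunk_samples(
    sample_ids: Sequence[str], size: int = MAX_SAMPLES_PER_SET
) -> Iterator[List[str]]:
    """Yield consecutive slices of sample_ids, each holding at most `size` IDs."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(sample_ids), size):
        yield list(sample_ids[start:start + size])


# Hypothetical sample IDs, used only to demonstrate the split.
samples = [f"sample_{i:05d}" for i in range(9500)]

# Hypothetical naming scheme for the resulting sets.
sample_sets = {
    f"batch_{n:03d}": chunk
    for n, chunk in enumerate(chunk_samples(samples), start=1)
}

for name, members in sample_sets.items():
    print(name, len(members))
```

Each sample lands in exactly one set, so running the workflow once per set never submits the same sample twice.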

scripts/variantstore/AOU_GVS_WORKSPACE.md Outdated

Comment on lines 76 to 80

+              If any of the imports have failed on a single sample, check that all of the other samples have been loaded in that sample_set during that workflow. Sometimes a sample will fail during loading while there are still samples in the queue waiting for loading to begin. Because of the failure, these samples will not be loaded at all.
+              Keep track of the samples that have not been loaded whether because they failed, or because they were in the queue when another sample failed. They will need to be added later.
+              Once all sample_sets have been run, if there have been any failures, collect all non-loaded samples together in a new sample_set and load that in.

rsasch Mar 2, 2022

It probably makes sense to just add a query to check for samples that are in sample_sets but never made it to the sample_load_status table, to capture the ones that need to be put into a new sample_set. That way the user doesn't have to comb through past runs.

Contributor Author

RoriCremer Mar 4, 2022

Is that something we would do as part of the import genomes WDL? Should I make a ticket for this?

rsasch Mar 4, 2022

Like the other queries that you include in this doc, you could include a sample query that lists all the samples that are in the sample_info table but not in the sample_load_status, which means the loading was never kicked off. The user could use this instead of having to keep track during ingest in order to figure out which samples those were.
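
A minimal sketch of the kind of query described here, for illustration. The table names `sample_info` and `sample_load_status` come from this thread, but the column names (`sample_id`, `sample_name`, `status`) are assumptions, and an in-memory SQLite database stands in for the real dataset so that the example is self-contained. The query actually added to the doc may differ.

```python
import sqlite3


def find_unloaded_samples(conn: sqlite3.Connection) -> list:
    """Return (sample_id, sample_name) for rows in sample_info that have
    no matching row in sample_load_status (loading was never kicked off)."""
    query = """
        SELECT si.sample_id, si.sample_name
        FROM sample_info AS si
        LEFT JOIN sample_load_status AS sls
          ON si.sample_id = sls.sample_id
        WHERE sls.sample_id IS NULL
        ORDER BY si.sample_id
    """
    return conn.execute(query).fetchall()


# Toy data with assumed column names; real schemas may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample_info (sample_id INTEGER, sample_name TEXT);
    CREATE TABLE sample_load_status (sample_id INTEGER, status TEXT);
    INSERT INTO sample_info VALUES (1, 'sample_a'), (2, 'sample_b'), (3, 'sample_c');
    INSERT INTO sample_load_status VALUES (1, 'STARTED'), (1, 'FINISHED'), (2, 'STARTED');
    """
)

print(find_unloaded_samples(conn))
```

The `LEFT JOIN ... IS NULL` form is an anti-join: a sample with several status rows is still excluded, and only samples with no status row at all are returned.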

scripts/variantstore/AOU_GVS_WORKSPACE.md Outdated

               **Note:** This workflow does not use the Terra Entity model to run, so be sure to select `Run workflow with inputs defined by file paths`
+              Sometimes this workflow will fail because the Gaussians have not converged. Don't panic! It can happen to anyone's data!
+              The first step in this case will be to lower the number of Gaussians for the failed step (there are two possible steps: model creation for the SNPs and model creation for the indels).

rsasch Mar 2, 2022

Please include the specific input names for the Gaussian values to change so that the user can fill them out easily.

scripts/variantstore/AOU_GVS_WORKSPACE.md Outdated

Comment on lines 135 to 136

		You can then kick off the workflow again. If that still does not work, or you would prefer not to change the number of Gaussians, then you can remove a column from the model creation.
		TODO: How do you remove a column from the model creation?

rsasch Mar 2, 2022

I don't know if we want to include this in the docs, since we don't have a real process for it.

Contributor Author

RoriCremer Mar 4, 2022

Do you mean completely skip mentioning that a column could be removed, or specifically drop the TODO?

rsasch Mar 4, 2022

We don't have a real process for how to pick which column (if that's the same as annotation) to remove.

RoriCremer added 3 commits

March 7, 2022 11:39


          remove cost info

19f246a


          gaussian notes

edb0b37


          add query for mising samples

481510e

rsasch approved these changes

RoriCremer merged commit 3b4c5ba into ah_var_store

RoriCremer deleted the rc-more-notes branch

March 23, 2022 20:14

This was referenced Mar 17, 2023

lb merge gvs branch #8248

Closed

testing something, please ignore #8251

Closed
