First cut at a python notebook to validate inputs. #7845

Merged
gbggrant merged 1 commit into ah_var_store from gg_VS-310_FailFast on May 16, 2022

Conversation

gbggrant
Collaborator

Working for sample_set now.

gbggrant requested a review from rsasch May 10, 2022 20:04

codecov bot commented May 10, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@bfccaf6).
The diff coverage is n/a.

❗ Current head 96b1120 differs from pull request most recent head 3dc3916. Consider uploading reports for the commit 3dc3916 to get more accurate results

@@               Coverage Diff                @@
##             ah_var_store     #7845   +/-   ##
================================================
  Coverage                ?   86.305%           
  Complexity              ?     35190           
================================================
  Files                   ?      2170           
  Lines                   ?    164837           
  Branches                ?     17775           
================================================
  Hits                    ?    142263           
  Misses                  ?     16251           
  Partials                ?      6323           

gbggrant marked this pull request as ready for review May 11, 2022 21:12
scripts/variantstore/InputValidation.ipynb (outdated, resolved)
"sample_set = fapi.get_entity(ws_project, ws_name, \"sample_set\", sample_set_id).json()\n",
"if (\"attributes\" not in sample_set):\n",
" errors_seen = True\n",
" print(\"ERROR: Looking up \" + sample_set_id + \": ''\" + sample_set[\"message\"] + \"''\")\n",
Collaborator

not sure what's up with these double single quotes

Collaborator Author

I think I have cleaned those all up.

" errors_seen = True\n",
" print(\"ERROR: Looking up \" + sample_set_id + \": ''\" + sample_set[\"message\"] + \"''\")\n",
" \n",
"attributes = sample_set[\"attributes\"]\n",
Collaborator

isn't this going to blow up if the attribute wasn't present on line 136?

Collaborator Author

Yes! Thanks - have addressed this.
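
One way to address both points raised above (the stray double single quotes and the unguarded `sample_set["attributes"]` access) is sketched below. This is not the code that was merged; `extract_attributes` is a hypothetical helper, and the dictionaries stand in for the JSON returned by `fapi.get_entity(...)`. It assumes, as the diff suggests, that a failed lookup carries a `message` field.

```python
from typing import Optional, Tuple


def extract_attributes(sample_set: dict, sample_set_id: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (attributes, None) on success or (None, error_message) on failure."""
    if "attributes" not in sample_set:
        # Assumption: a failed lookup carries a "message" field, as in the diff above.
        detail = sample_set.get("message", "no message returned")
        # An f-string with plain single quotes avoids the '' artifacts.
        return None, f"ERROR: Looking up {sample_set_id}: '{detail}'"
    return sample_set["attributes"], None


# Hypothetical payloads standing in for fapi.get_entity(...).json()
ok_attrs, ok_err = extract_attributes({"attributes": {"samples": {"items": []}}}, "my_set")
bad_attrs, bad_err = extract_attributes({"message": "my_set does not exist"}, "my_set")
print(ok_attrs, ok_err)
print(bad_attrs, bad_err)
```

Returning the error instead of only printing it lets the caller stop before touching `attributes`, which is the failure the reviewer pointed out.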

"num_pages = int(math.ceil(float(entity_count) / page_size))\n",
"\n",
"# get entities by page where each page has page_size # of rows using API call\n",
"#print('Getting all {num_pages} pages of entity data.')\n",
Collaborator

debug cruft

Collaborator Author

Gone

"for page in tqdm(range(1, num_pages + 1)):\n",
" page_of_entitites = fapi.get_entities_query(ws_project, ws_name, etype, page=page,\n",
" page_size=page_size).json()#, sort_direction='asc',\n",
"# filter_terms=attribute_names).json()\n",
Collaborator

here too

Collaborator Author

Gone too.
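
For readers following the paging logic in the two snippets above, here is a minimal sketch of the same idea with the debug lines and commented-out arguments removed. The page fetch is injected as a callable so the example runs without a workspace; `fetch_page` and `fake_fetch` are hypothetical stand-ins for the `fapi.get_entities_query(...)` call, not part of the notebook.

```python
import math
from typing import Callable, Iterator, List


def page_count(entity_count: int, page_size: int) -> int:
    """Number of pages needed to cover entity_count rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(entity_count / page_size)


def iter_entities(entity_count: int, page_size: int,
                  fetch_page: Callable[[int, int], List[dict]]) -> Iterator[dict]:
    """Yield every entity, requesting pages 1..num_pages in order."""
    for page in range(1, page_count(entity_count, page_size) + 1):
        yield from fetch_page(page, page_size)


# In-memory stand-in for the real paged API call (hypothetical data).
rows = [{"name": f"sample_{i}"} for i in range(7)]


def fake_fetch(page: int, page_size: int) -> List[dict]:
    start = (page - 1) * page_size
    return rows[start:start + page_size]


fetched = list(iter_entities(len(rows), 3, fake_fetch))
print(len(fetched))
```

In Python 3, `math.ceil` already returns an `int` and `/` is true division, so the `int(...)` and `float(...)` wrappers in the original line are not needed.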

scripts/variantstore/InputValidation.ipynb (outdated, resolved)
"\n",
"# Inspect samples table - determine possibe names for reblocked_gvcfs.\n",
"etype = 'sample'\n",
"entity_types = fapi.list_entity_types(ws_project, ws_name).json()\n",
Collaborator

should we check that the entity types we care about are actually in here (sample_set, sample, etc)?

Collaborator Author

yes, good suggestion - incorporated in latest commit.
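
The check suggested here could look roughly like the following. It is a sketch, not the code from the latest commit: `missing_entity_types` is a hypothetical helper, and it assumes the JSON from `fapi.list_entity_types(...)` is a mapping keyed by entity type name.

```python
from typing import Iterable, List, Mapping


def missing_entity_types(entity_types: Mapping[str, object],
                         required: Iterable[str]) -> List[str]:
    """Return the required entity types that are absent from the workspace."""
    return [etype for etype in required if etype not in entity_types]


# Hypothetical response: the workspace has samples but no sample_set table.
example = {"sample": {"count": 10}, "participant": {"count": 10}}
missing = missing_entity_types(example, ["sample", "sample_set"])
for etype in missing:
    print(f"ERROR: Expected entity type '{etype}' was not found in the workspace")
```

Running this check before any table is queried lets the notebook fail fast with a clear message rather than with a lookup error later on.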

" reblocked_gvcf_index_name = reblocked_gvcf_index.split('/')[-1]\n",
" if (reblocked_gvcf_index_name != expected_reblocked_gvcf_index_name):\n",
" errors_seen = True\n",
" print(\"ERROR: Did not find expected index file (named: ''\" + expected_reblocked_gvcf_index_name + \n",
Collaborator

paren and double single quote issues

Collaborator Author

I think I have cleaned those all up.
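
As an illustration of the index-name comparison in the snippet above, written so that the parenthesis and quoting problems cannot arise, consider the sketch below. The diff does not show how `expected_reblocked_gvcf_index_name` is derived, so the rule used here (gVCF file name plus a `.tbi` suffix) is an assumption, and `check_index_name` and the bucket paths are hypothetical.

```python
from typing import Optional


def check_index_name(gvcf_path: str, index_path: str,
                     index_suffix: str = ".tbi") -> Optional[str]:
    """Return an error message if the index file name is not the expected one."""
    # Assumption: the index is named after the gVCF with index_suffix appended.
    expected = gvcf_path.split("/")[-1] + index_suffix
    actual = index_path.split("/")[-1]
    if actual != expected:
        return f"ERROR: Did not find expected index file (named: '{expected}'); found '{actual}'"
    return None


# Hypothetical paths for illustration only.
good = check_index_name("gs://bucket/dir/s1.rb.g.vcf.gz", "gs://bucket/dir/s1.rb.g.vcf.gz.tbi")
bad = check_index_name("gs://bucket/dir/s1.rb.g.vcf.gz", "gs://bucket/dir/s2.rb.g.vcf.gz.tbi")
print(good)
print(bad)
```

Building the message as a single f-string removes the string concatenation that made the unbalanced parenthesis and the `''` pairs easy to introduce.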

gbggrant requested a review from mcovarr May 12, 2022 15:09
" if (error_seen):\n",
" errors_seen = True\n",
"\n",
" # Inspect sample table - determine possible names for reblocked_gvcf indices.\n",
Collaborator

fun fact: three counts of indices, five counts of indexes 😄

Collaborator Author

I go back and forth on that.

" field_name = list(field_names_found)[0]\n",
" else:\n",
" error_seen = True\n",
" print(f\"ERROR: There are multiple columns in the 'samples' datatable {str(field_names_found)} that potentially contain reblocked gvcfs\")\n",
Collaborator

I think this is the sample datatable here and on line 68?
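
The column-selection logic under discussion (use the column if exactly one candidate is present, otherwise report an error) can be sketched as follows. The candidate names are hypothetical, since the diff does not list them, and `pick_single_column` is not a function from the notebook. The table name is a parameter so the message says 'sample' consistently, per the comment above.

```python
from typing import Iterable, Optional, Set, Tuple


def pick_single_column(attribute_names: Iterable[str], candidates: Set[str],
                       table: str = "sample") -> Tuple[Optional[str], Optional[str]]:
    """Return (column, None) if exactly one candidate column exists, else (None, error)."""
    found = sorted(set(attribute_names) & candidates)
    if len(found) == 1:
        return found[0], None
    if not found:
        return None, f"ERROR: No column in the '{table}' datatable appears to contain reblocked gVCFs"
    return None, (f"ERROR: There are multiple columns in the '{table}' datatable "
                  f"{found} that potentially contain reblocked gVCFs")


# Hypothetical candidate names and table columns.
candidates = {"reblocked_gvcf", "reblocked_gvcf_path"}
column, error = pick_single_column(["sample_id", "reblocked_gvcf"], candidates)
_, multi_error = pick_single_column(["reblocked_gvcf", "reblocked_gvcf_path"], candidates)
print(column, error)
print(multi_error)
```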

rsasch left a comment

Looks good except for some small nits (and one more for good measure, "sample set" or "sample_set" not both).

"This python notebook is intended to allow you to quickly validate the inputs for a Joint Call Set.\n",
"To run it:\n",
"\n",
"Define the variable `sample_set_id` (below) to the name of the sample_set (in the current workspace) containing the list of samples to process\n",

nit

Suggested change
- "Define the variable `sample_set_id` (below) to the name of the sample_set (in the current workspace) containing the list of samples to process\n",
+ "Define the variable `sample_set_id` (below) to the name of the sample_set (in the current workspace) containing the list of samples to process.\n",

"- the sample set that you have listed is found\n",
"- there are no duplicate samples in the sample set\n",
"- there are no empty sample names in the sample set\n",
"- each sample has a corresponding reblocked_gvcf index\n",

maybe

Suggested change
- "- each sample has a corresponding reblocked_gvcf index\n",
+ "- each sample has a reblocked gVCF and a corresponding reblocked gVCF index\n",

Working for sample_set now.
Added Introduction
Fixed error in looking up reblocked gvcf.
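
Two of the checks listed in the notebook introduction quoted earlier (no duplicate samples and no empty sample names in the sample set) are simple enough to sketch on a plain list of names. This is an illustration under the assumption that sample names have already been pulled out of the sample set; `check_sample_names` is a hypothetical helper, not the notebook's code.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def check_sample_names(names: Iterable[Optional[str]]) -> Tuple[List[str], int]:
    """Return (sorted duplicate names, number of empty or missing names)."""
    names = list(names)
    # Treat None and whitespace-only strings as empty names.
    empty_count = sum(1 for n in names if n is None or not n.strip())
    counts = Counter(n for n in names if n is not None and n.strip())
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    return duplicates, empty_count


duplicates, empty_count = check_sample_names(["a", "b", "a", "", "  ", "c"])
print(duplicates, empty_count)
```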
gbggrant merged commit f58e9b2 into ah_var_store May 16, 2022
gbggrant deleted the gg_VS-310_FailFast branch May 16, 2022 16:59
This was referenced Mar 17, 2023