Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add initial notebook copy pasta #8008

Merged
merged 5 commits into from
Sep 16, 2022
Merged

Conversation

RoriCremer
Copy link
Contributor

No description provided.

@codecov
Copy link

codecov bot commented Sep 1, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@d52f05d). Click here to learn what that means.
The diff coverage is n/a.

Additional details and impacted files
@@               Coverage Diff                @@
##             ah_var_store     #8008   +/-   ##
================================================
  Coverage                ?   86.243%           
  Complexity              ?     35196           
================================================
  Files                   ?      2173           
  Lines                   ?    165004           
  Branches                ?     17792           
================================================
  Hits                    ?    142304           
  Misses                  ?     16373           
  Partials                ?      6327           

Comment on lines 84 to 87
# copy the reference data to set in hail
# gsutil -m cp 'gs://hail-common/references/Homo_sapiens_assembly38.fasta.gz' .
# gsutil -m cp 'gs://hail-common/references/Homo_sapiens_assembly38.fasta.fai' .

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary, the add_sequence call on lines 87/88 reads the references from GCS directly.

Suggested change
# copy the reference data to set in hail
# gsutil -m cp 'gs://hail-common/references/Homo_sapiens_assembly38.fasta.gz' .
# gsutil -m cp 'gs://hail-common/references/Homo_sapiens_assembly38.fasta.fai' .

# gsutil -m cp 'gs://hail-common/references/Homo_sapiens_assembly38.fasta.gz' .
# gsutil -m cp 'gs://hail-common/references/Homo_sapiens_assembly38.fasta.fai' .

## Now RESTART the Kernal
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary to restart the kernel in the notebook when working from the terminal, which is the expectation per lines 68-69.

Suggested change
## Now RESTART the Kernal

Comment on lines 139 to 143





Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

## * Replace LGT with GT ( for easier calculations later )
filtered_vd = filtered_vds.variant_data
filtered_vd = filtered_vd.annotate_entries(GT=hl.vds.lgt_to_gt(filtered_vd.LGT, filtered_vd.LA) )
filtered_vds = hl.vds.VariantDataset(filtered_vds.reference_data, filtered_vd)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like filtered_vds needs to be assigned here, it's never read before being reassigned on line 125.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the logic of it there, but you are right that it's overkill


## * Respect the FT flag by setting all failing GTs to a no call
# TODO We dont seem to be using the dense matrix table here (TODO do we need to?)
filtered_vd = filtered_vd.annotate_entries(GT=hl.or_missing(hl.coalesce(filtered_vd.FT, True), filtered_vd.GT))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a comment for us non Hail wizards. 🙂 My (possibly incorrect) interpretation, working inside out:

coalesce returns the first non-missing value, so I think this is saying to return filtered_vt.FT if it is non-missing, otherwise True.

or_missing takes as its first argument a boolean predicate which if true returns the second argument, otherwise missing.

Putting it together:

filtered_vd.FT is True ⇒ GT keeps its current value
filtered_vd.FT is False ⇒ GT assigned missing
filtered_vd.FT is missing ⇒ GT keeps its current value

which actually doesn't seem right... 🤔 should this be ~filtered_vd.FT to keep GT if the filter is True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does (and should) keep GT if the filter is True

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I need to reverse the logic in my brain for how this works in Hail; work in progress... 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just keep reciting "filter false is filter fail"

@mcovarr mcovarr self-requested a review September 6, 2022 17:42
@RoriCremer RoriCremer marked this pull request as ready for review September 6, 2022 20:30
@gatk-bot
Copy link

Github actions tests reported job failures from actions build 3056000079
Failures in the following jobs:

Test Type JDK Job ID Logs
cloud 11 3056000079.11 logs

@RoriCremer RoriCremer merged commit 53cf8a7 into ah_var_store Sep 16, 2022
@RoriCremer RoriCremer deleted the rc-vs-616-notebook-for-hail branch September 16, 2022 14:42
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants