Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding the uber_monitor.py script #8268

Merged
merged 24 commits into from
Apr 16, 2023
Merged

Conversation

gbggrant
Copy link
Collaborator

Adding the uber_monitor.py script to GvsUtils.wdl
Threaded it into GvsCreateFilterSet
Included a unit test.

@gbggrant gbggrant requested review from mcovarr and rsasch March 28, 2023 20:27
scripts/variantstore/wdl/GvsCreateFilterSet.wdl Outdated Show resolved Hide resolved
scripts/variantstore/wdl/GvsUtils.wdl Outdated Show resolved Hide resolved
scripts/variantstore/wdl/GvsUtils.wdl Outdated Show resolved Hide resolved
scripts/variantstore/wdl/GvsUtils.wdl Outdated Show resolved Hide resolved
scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl Outdated Show resolved Hide resolved
scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl Outdated Show resolved Hide resolved
scripts/vcf_site_level_filtering_wdl/JointVcfFiltering.wdl Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Mar 28, 2023

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@0f24625). Click here to learn what that means.
The diff coverage is n/a.

Additional details and impacted files
@@               Coverage Diff                @@
##             ah_var_store     #8268   +/-   ##
================================================
  Coverage                ?   86.097%           
  Complexity              ?     35609           
================================================
  Files                   ?      2197           
  Lines                   ?    167119           
  Branches                ?     18006           
================================================
  Hits                    ?    143884           
  Misses                  ?     16800           
  Partials                ?      6435           

Int disk_size = if (defined(split_intervals_disk_size_override)) then select_first([split_intervals_disk_size_override]) else 10
Int disk_memory = if (defined(split_intervals_mem_override)) then select_first([split_intervals_mem_override]) else 16
Int disk_size = select_first([split_intervals_disk_size_override, 10])
Int disk_memory = select_first([split_intervals_mem_override, 16])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor fix - noticed by miniwdl

@@ -137,6 +145,15 @@ workflow JointVcfFiltering {
Array[File] indels_variant_scored_vcf_index = ScoreVariantAnnotationsINDELs.output_vcf_index
Array[File] snps_variant_scored_vcf = ScoreVariantAnnotationsSNPs.output_vcf
Array[File] snps_variant_scored_vcf_index = ScoreVariantAnnotationsSNPs.output_vcf_index
Array[File?] monitoring_logs = flatten(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pondering if the pattern should be that any given workflow also call and output uber_monitor's summary file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand what this is asking

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I have JointVcfFiltering itself call summarize_task_monitor_logs and generate its own report in addition to providing the outputs for the parent workflow to summarize the logs.

@@ -357,10 +358,44 @@ workflow GvsCreateFilterSet {
}
}

call Utils.UberMonitor as UberMonitorItAll {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will output a summary file for all tasks - used or not.

@gbggrant
Copy link
Collaborator Author

gbggrant commented Apr 4, 2023

Successful VQSR Lite Run (with monitoring summary output) here
Successful VQSR Classic Run (with monitoring summary output) here

@gbggrant gbggrant requested a review from mcovarr April 4, 2023 19:30
Copy link
Collaborator

@mcovarr mcovarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes to globals requested. Also both the uber script and its test could use some PEP 8 love as I expect PEP 8 to be among the validations we'll run automatically in the new repo. 🙂 IntelliJ / PyCharm should provide PEP8 warnings by default.

Comment on lines 6 to 10
global MaxCpu
global MaxMem
global MaxMemPct
global MaxDisk
global MaxDiskPct
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these need to be declared global here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of those

Comment on lines 68 to 77
global MaxCpu
MaxCpu = -100.0
global MaxMem
MaxMem = -100.0
global MaxMemPct
MaxMemPct = -100.0
global MaxDisk
MaxDisk = -100.0
global MaxDiskPct
MaxDiskPct = -100.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't the initializations happen where the variables are defined and this whole block deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need to be initialized here since they are per used per monitoring log fle

scripts/variantstore/wdl/extract/uber_monitor.py Outdated Show resolved Hide resolved
Copy link

@rsasch rsasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit, but I would suggest renaming "uber_monitor.py" and "test_uber_monitor.py" to better reflect what the script does, e.g. "collate_task_monitor_logs.py".

scripts/variantstore/wdl/GvsCreateFilterSet.wdl Outdated Show resolved Hide resolved
@gbggrant
Copy link
Collaborator Author

Added pep8 fixes, renamed script.

@gbggrant
Copy link
Collaborator Author

gbggrant commented Apr 10, 2023

Updated:
Successful VQSR Lite Run (with monitoring summary output) here
Successful VQSR Classic Run (with monitoring summary output) here

@gbggrant gbggrant requested a review from mcovarr April 11, 2023 02:06
Copy link
Collaborator

@mcovarr mcovarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few minor issues perhaps best reviewed in mobbing

@@ -398,12 +432,14 @@ task ExtractFilterTask {
}

String intervals_name = basename(intervals)

File monitoring_script = "gs://gvs_quickstart_storage/cromwell_monitoring_script.sh"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bucket with quickstart in its name might not be the best place for a script that's going to be used for non-quickstart runs. Maybe gs://gvs_internal?

def parse_monitoring_log_file(mlog_file, output):
eprint(f"Parsing: {mlog_file}")

if (os.stat(mlog_file).st_size == 0):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay for the PEP 8 fixups, but I'm still seeing a lot of non-PEP 8 warnings in IntelliJ. e.g. on this line "Remove redundant parantheses". If/when we go to our own repo there will likely be Python linting that will error on issues like this. Happy to review in mobbing to make sure we're seeing the same thing!

@@ -137,6 +145,15 @@ workflow JointVcfFiltering {
Array[File] indels_variant_scored_vcf_index = ScoreVariantAnnotationsINDELs.output_vcf_index
Array[File] snps_variant_scored_vcf = ScoreVariantAnnotationsSNPs.output_vcf
Array[File] snps_variant_scored_vcf_index = ScoreVariantAnnotationsSNPs.output_vcf_index
Array[File?] monitoring_logs = flatten(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand what this is asking

@gbggrant
Copy link
Collaborator Author

Okay, I think I've got most of it. Still want to move the monitoring script somewhere better.

@gbggrant
Copy link
Collaborator Author

After moving the monitoring script:
Successful VQSR Lite Run in AoU-land (with monitoring summary output) here
Successful VQSR Classic Run in non-AoU terra (with monitoring summary output) here

@gbggrant gbggrant requested a review from mcovarr April 14, 2023 11:28
@mcovarr
Copy link
Collaborator

mcovarr commented Apr 14, 2023

There are still 17 (!) references to the script in its previous location.

Is it possible to bring that number down?

@gbggrant
Copy link
Collaborator Author

Passing integration test here

@gbggrant gbggrant merged commit ce3a5c7 into ah_var_store Apr 16, 2023
@gbggrant gbggrant deleted the gg_VS-871_UberMonitor branch April 16, 2023 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants