Memory improvement when writing missing positions to pet #7098

Merged (8 commits) on Feb 24, 2021


mmorgantaylor
Member

related to issues #211 and #233. in CreateVariantIngestFiles, when writing missing positions to the pet TSV, we were looping through the blocks of missing intervals twice: once to construct the pet rows (holding them all in memory) and again to write them. this PR merges those two steps into one: for each missing block, we write the pet lines to the file as we loop through, so large blocks no longer need to be held in memory.

note that this does not resolve the memory problems that originally prompted issues #211 and #233, but it's still a small change that helps a lot when there are large numbers of missing positions.
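for illustration, here is a minimal sketch of the difference between the two shapes. it is not the actual CreateVariantIngestFiles code: the class name `MissingPositionsSketch`, the three-column layout (position, sample, state), the `"m"` state code, and passing a `Writer` as a parameter are all assumptions made to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: the class name, column layout, and "m" state code
// are assumptions, not the actual CreateVariantIngestFiles implementation.
class MissingPositionsSketch {

    // Old shape: build every row for the block first, so memory use grows
    // with the size of the missing block. The caller then loops again to write.
    static List<List<String>> createMissingRows(long start, long end, String sampleName) {
        List<List<String>> rows = new ArrayList<>();
        for (long position = start; position <= end; position++) {
            rows.add(Arrays.asList(String.valueOf(position), sampleName, "m"));
        }
        return rows;
    }

    // New shape: write each tab-separated line as soon as it is built,
    // so nothing is accumulated across the block.
    static void writeMissingPositions(Writer out, long start, long end, String sampleName)
            throws IOException {
        for (long position = start; position <= end; position++) {
            out.write(position + "\t" + sampleName + "\tm\n");
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeMissingPositions(out, 100L, 102L, "sample1");
        System.out.print(out);
    }
}
```

the single-loop version keeps only one line in flight at a time, which is why it matters most when a sample has very long runs of missing positions.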

Contributor

@ahaessly ahaessly left a comment

👍
there are actually 2 files that only have whitespace changes. they don't need to be in this PR

@@ -168,7 +168,7 @@ task CreateImportTsvs {
--mode GENOMES \
-SNM ~{sample_map} \
--ref-version 38

Contributor

you can remove this class from the PR - only whitespace changes

Contributor

@kcibul kcibul left a comment

looks great -- just small nits

public static List<List<String>> createMissingTSV(long start, long end, String sampleName) {
List<List<String>> rows = new ArrayList<>();

public void createMissingTSV(long start, long end, String sampleName) throws IOException {
Contributor

rename to writeMissingEntry or something else with write?

public static List<List<String>> createMissingTSV(long start, long end, String sampleName) {
List<List<String>> rows = new ArrayList<>();

public void createMissingTSV(long start, long end, String sampleName) throws IOException {
for (long position = start; position <= end; position ++){
Contributor

nit -- remove the space before the postincrement operator (i.e. position++)

@mmorgantaylor mmorgantaylor merged commit 382d5da into ah_var_store Feb 24, 2021
@mmorgantaylor mmorgantaylor deleted the mmt_CVIF_memory branch February 24, 2021 18:15
kcibul pushed a commit that referenced this pull request Mar 9, 2021
* WIP memory improvements - only do one loop

* move log to end of loop

* reduce memory/cpus for ImportGenomes, re-enable non-TSV outputs

* just kidding, let's not change the wdl yet

* remove redundant outputType inputs

* remove debugging log

* rename createMissingTSV to writeMissingPositions, minor edits

* revert whitespace-only changes
mmorgantaylor added a commit that referenced this pull request Apr 6, 2021
This was referenced Mar 17, 2023
3 participants