Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WeightedSplitInterval fixes [VS-384] [VS-332] #7795

Merged
merged 9 commits into from
Apr 25, 2022

Conversation

mcovarr
Copy link
Collaborator

@mcovarr mcovarr commented Apr 19, 2022

EDIT: restored the experimental tool warning as we're not sure it's currently okay to remove that

Fixes a few issues found with WeightedSplitIntervals:

  • Fixes a crash processing the Charlie data set when it initially seems like more intervals could be added to the interval list under construction, but upon further examination it turns out there cannot
  • Fixes a glitch in writing interval name lines for split intervals when the parent interval name was null
  • Removes the experimental tool warning

@@ -61,15 +61,15 @@ public WeightedInterval[] split(int basesInFirstInterval) {
this.getStart(),
this.getStart() + basesInFirstInterval - 1,
this.isNegativeStrand(),
this.getName() + "-1", // ensure names are unique
this.getName() == null ? null : this.getName() + "-1", // ensure non-null names are unique
Copy link
Collaborator Author

@mcovarr mcovarr Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interval list files were getting lines like

chr1	772635	877632	+	null-2|null-1

which should have been

chr1	772635	877632	+	.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch---but are nulls okay here? Seems odd we were getting nulls to begin with?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WeightedSplitIntervals only uses the two-arg version of the WeightedInterval constructor which means the WeightedIntervals it creates always have a null name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to give them names? Not necessary, but looks kind of odd without them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would defer to others on that, I am new enough to this domain that this didn't jump out as odd to me. 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I've always found the interval names in these files less than useful since unless your targeting biological constructs (ie gene names, etc) you typically end up doing something like "foo-XXX" which sort of is meaningless


if (basesToTake == 0) {
// We can't add any more bases to the current interval list so put this interval back.
iter.pushback(wi);
Copy link
Collaborator Author

@mcovarr mcovarr Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes the Charlie crashing issue and the resulting interval list files are identical to those produced by the 86f1116 version of the gatk jar used for Beta.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

@@ -41,7 +41,6 @@
programGroup = IntervalsManipulationProgramGroup.class
)
@DocumentedFeature
@ExperimentalFeature
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

experimental no more!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talk to @lbergelson -- the Experimental tag in GATK means something specific about support. Many significant tools (reblocking I believe) are in this category because there is not full frontline support for them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks... the team had expressed some concern about the seemingly alarming "experimental" warning message on startup in what now feels like a core part of a production(ish) pipeline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that, and I think it's how this came up the last time :D It was something akin to Gmail being in Beta. But Louis or @droazen can give us the scoop

@gatk-bot
Copy link

Travis reported job failures from build 38959
Failures in the following jobs:

Test Type JDK Job ID Logs
conda openjdk8 38959.5 logs

@gatk-bot
Copy link

Travis reported job failures from build 38961
Failures in the following jobs:

Test Type JDK Job ID Logs
conda openjdk8 38961.5 logs

@@ -62,7 +54,6 @@ public void testNoLossRealisticWgs() {

final int scatterCount = 100;
final File outputDir = createTempDir("output");
final Interval interval = new Interval("chr20", 1000000, 2000000);
Copy link
Collaborator Author

@mcovarr mcovarr Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this variable in an existing test wasn't being used which may or may not be concerning?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just a cut and paste from testNoLossSimple

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot, I think you're probably right.

@gatk-bot
Copy link

Travis reported job failures from build 38963
Failures in the following jobs:

Test Type JDK Job ID Logs
conda openjdk8 38963.5 logs

@gatk-bot
Copy link

Travis reported job failures from build 38965
Failures in the following jobs:

Test Type JDK Job ID Logs
conda openjdk8 38965.5 logs

@@ -62,7 +54,6 @@ public void testNoLossRealisticWgs() {

final int scatterCount = 100;
final File outputDir = createTempDir("output");
final Interval interval = new Interval("chr20", 1000000, 2000000);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just a cut and paste from testNoLossSimple

@@ -61,15 +61,15 @@ public WeightedInterval[] split(int basesInFirstInterval) {
this.getStart(),
this.getStart() + basesInFirstInterval - 1,
this.isNegativeStrand(),
this.getName() + "-1", // ensure names are unique
this.getName() == null ? null : this.getName() + "-1", // ensure non-null names are unique
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to give them names? Not necessary, but looks kind of odd without them.

@@ -88,6 +79,41 @@ public void testNoLossRealisticWgs() {

}

@Test
public void testHandleZeroBasesToTake() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you ran this test with the pre-fix code does it fail?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's how I discovered all the scatter widths that exercise the condition. Much easier to find if the code under test throws an exception, no need to set a breakpoint. 🙂

@mcovarr mcovarr changed the title WeightedSplitInterval fixes [VS-384] WeightedSplitInterval fixes [VS-384] [VS-332] Apr 20, 2022
Copy link
Contributor

@kcibul kcibul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work Miguel!


if (basesToTake == 0) {
// We can't add any more bases to the current interval list so put this interval back.
iter.pushback(wi);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

@mcovarr mcovarr force-pushed the weighted_split_interval_edge_case_fix branch from c7a7f8b to 553417e Compare April 22, 2022 13:36
@mcovarr mcovarr merged commit 614a0f7 into ah_var_store Apr 25, 2022
@mcovarr mcovarr deleted the weighted_split_interval_edge_case_fix branch April 25, 2022 13:05
This was referenced Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants