Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java 17 and Spark 3.3.0. #8035

Merged
merged 36 commits into from
Mar 2, 2023
Merged

Java 17 and Spark 3.3.0. #8035

merged 36 commits into from
Mar 2, 2023

Conversation

cmnbroad
Copy link
Collaborator

No description provided.

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 3128674181
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17 3128674181.0 logs

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 3130948787
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17 3130948787.0 logs

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 3132106780
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17 3132106780.0 logs

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 3136014671
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17 3136014671.0 logs

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 3236342061
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17 3236342061.0 logs

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 3237120012
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17 3237120012.0 logs

@gatk-bot
Copy link

gatk-bot commented Nov 7, 2022

Github actions tests reported job failures from actions build 3413577203
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17 3413577203.0 logs

@gatk-bot
Copy link

gatk-bot commented Jan 9, 2023

Github actions tests reported job failures from actions build 3877173679
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17 3877173679.0 logs

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 3987916695
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17 3987916695.0 logs

@cmnbroad
Copy link
Collaborator Author

cmnbroad commented Jan 24, 2023

@samuelklee I've worked through all of the issues with the other repos (Barclay, picard, etc.), so we're ready to move forward with Java 17 reviews, but the ModelSegments tests are still inconsistent. This branch has updated ModelSegments expected result files that were produced locally with Java 17 (based on our discussion from some months ago), but now the same tests still fail on the docker. They succeed when run locally or in the CI non-docker job.

At one point when I looked into these, the values produced when the tests fail look similar(/identical ?) to the values that the tests USED to produce (i.e., I suspect if we reverted expected results files, the tests might succeed on the docker and fail on the non-docker).

I'm not sure what is causing the instability, but this is going to become a blocking issue for this in the next couple of weeks. We also have the option of disabling them only on the docker (or non-docker), but I hate to do that. Can you take a look at this in the next week or so ?

There is also one WDL test (CNVSomaticPanelWorkflow.CreateReadCountPanelOfNormals) that fails pretty consistently - I suspect this is a resource/memory issue since it sometimes returns exit code 134. The default CI machines are pitiful things, 7GB IIRC.

The other test failures (doc and wdl gen) can be ignored for now - they require code that is on another branch that is dependent on a new Barclay version.

@samuelklee
Copy link
Contributor

samuelklee commented Jan 24, 2023

Thanks @cmnbroad, I can take a look starting Friday. In the meantime, could you highlight any possible differences between the Docker vs. non-Docker Java / native environments, if any come to mind?

As for the WDL test, I would be very surprised if it was a memory issue---I think those tests build a panel with 2 samples over fewer than 100 intervals. Doing PCA on a 2 x 100 matrix shouldn't require 7GB...!

@samuelklee
Copy link
Contributor

Also just dropping a link to #8107, some other relevant links there as well.

@samuelklee
Copy link
Contributor

Looks like the Docker environment uses Java 17.0.1+12, while the non-Docker environment just specifies Java 17? Not sure what the latter resolves to, so maybe this isn’t the issue, but I’ll do more digging later.

@samuelklee
Copy link
Contributor

samuelklee commented Jan 25, 2023

I did a simple experiment and changed the version of Java used in the non-Docker ("17", although again I'm not sure what this actually resolves to) to that used in the Docker (17.0.1+12). This causes both non-Docker and Docker tests to now fail, rather than just the Docker tests; see #8174 (comment). Moreover, the test failures produce exactly the same discrepant numerical results.

I think we can probably conclude that the expected test results were generated with "17" and that changing to 17.0.1+12 generates different results. This is not too unreasonable; see the Slack thread linked in #8111 (comment), for example, which shows that we might be getting into pretty hairy territory and that even changes to things like how HotSpot Intrinsics are implemented in each JVM can cause the numerical differences we see here.

So perhaps we can either 1) change the Docker version to the version corresponding to "17" or 2) change the non-Docker version to 17.0.1+12 and update the expected results?

Not sure about the failing WDL test yet, but hopefully this is enough to get us started!

@cmnbroad
Copy link
Collaborator Author

@samuelklee Thats great - at least it gives us a way to move forward. I'm not surprised that there are differences between 8/11/17, but its a bit surprising that there is so much difference between Java 17 point releases.

When I'm back next week I'll update my branch to use the same Java versions and expected results files. Thanks for looking into this!

@samuelklee
Copy link
Contributor

@cmnbroad could the failing WDL test simply be due to some Spark configuration issue, rather than memory? Locally, for both 1) the WDL test within the Docker and 2) CreateReadCountPanelOfNormalsIntegrationTest using 17.0.3 without the Docker, I seem to hit the exception discussed here: https://stackoverflow.com/questions/72724816/running-unit-tests-with-spark-3-3-0-on-java-17-fails-with-illegalaccesserror-cl

Not sure why CreateReadCountPanelOfNormalsIntegrationTest seems to pass in the CI environments, but perhaps it'll be more obvious to you?

Just for context, note that this tool relies on the Spark MLlib implementation of PCA.

@cmnbroad
Copy link
Collaborator Author

cmnbroad commented Jan 30, 2023

@samuelklee Oh interesting - the main jar and the test jar have directives - see for example the one for the main jar here - that should prevent that exception from happening (but only if you're running from a jar, otherwise you need to add a bunch of --add-exports manually to prevent that exception - I need to write up a wiki article on how to do that, since you need to do it to run tests from within the IDE). So it shouldn't happen on the docker, but if it is, maybe I missed a code path. I'll take a look.

@cmnbroad
Copy link
Collaborator Author

cmnbroad commented Feb 7, 2023

So the issue was that the manifest syntax for --add-opens that I was using was incorrect - its amazing that all the other tests pass that way - but I can now see how that happens. I have a fix and the test passes now. I just need to clean everything up. Thanks for looking @samuelklee.

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 4294252456
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17.0.6+10 4294252456.0 logs

@gatk-bot
Copy link

gatk-bot commented Feb 28, 2023

Github actions tests reported job failures from actions build 4294588163
Failures in the following jobs:

Test Type JDK Job ID Logs
variantcalling 17.0.6+10 4294588163.2 logs
integration 17.0.6+10 4294588163.0 logs

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 4295524165
Failures in the following jobs:

Test Type JDK Job ID Logs
variantcalling 17.0.6+10 4295524165.2 logs

@gatk-bot
Copy link

Github actions tests reported job failures from actions build 4296321287
Failures in the following jobs:

Test Type JDK Job ID Logs
variantcalling 17.0.6+10 4296321287.2 logs

@cmnbroad
Copy link
Collaborator Author

@Louisb I did wind up making the docgen and wdlgen and tab completion integration tests non-docker only, since they're failing when run on the docker. The tasks themselves seem to be working fine - these tests are really just here to do a quick smoke test, and to make it easy to debug issues in case of a failure. I suspect the failures are related to the way we run the tests on the docker using the 3 separate jars. So for now I made them non-docker only.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final changes seem sane. I'm in favor of merging this and then fixing up the base image after it goes in since it's taking longer than anticipated.

@lbergelson lbergelson marked this pull request as ready for review March 2, 2023 20:35
@lbergelson lbergelson merged commit 330c59a into master Mar 2, 2023
@lbergelson lbergelson deleted the cn_java_17 branch March 2, 2023 20:43
@bbimber
Copy link
Contributor

bbimber commented Mar 2, 2023

@lbergelson or @cmnbroad: would you mind kicking off a build on this, so I can see how DISCVR-seq builds against it? Are you planning a GATK release any time soon?

@lbergelson
Copy link
Member

lbergelson commented Mar 2, 2023 via email

@bbimber
Copy link
Contributor

bbimber commented Mar 2, 2023

ok, thanks. DISCVRseq does use the doclet stuff to build javadoc, and I recall this being a little wonky to get working. it'd be helpful to figure out how easy/complex this is going to be. i'm all for migrating though.

@jdmccauley
Copy link

Hi, I'm a GATK4 user and I apologize in advance if these questions can be answered in documentation: I could not find answers to these myself.

Are releases not made alongside master branch changes? Is the master branch not a stable branch? I am every excited to try GATK with Java 17, however I'm unsure if it's time to test since there is not a release for this pull request.
I also say this because I'm struggling to get gradle to build this on my machine for reasons unrelated to GATK, and I'm much more comfortable running an already-built .jar.

@bbimber
Copy link
Contributor

bbimber commented Mar 7, 2023

Thats a good idea. I don't know what our release plan is. I think we might do one more minor release on java 8 for someone internal and then give a bit of time before a java 17 release to let things shake out a bit.

@lbergelson: were you able to kick off a build? thanks in advance.

@lbergelson
Copy link
Member

@bbimber Ack, sorry, there's a snapshot up at: 4.3.0.0-47-g4ba4ab5-SNAPSHOT

@droazen
Copy link
Collaborator

droazen commented Mar 8, 2023

@jdmccauley Yes, we do releases from the master branch periodically -- there are 1 or 2 additional outstanding PRs we'd like to get merged before the next release, but it should not be much longer. In the mean time, if you're unable to build GATK from source, you could try running the latest prebuilt nightly docker image from https://hub.docker.com/r/broadinstitute/gatk-nightly/

@jdmccauley
Copy link

@droazen Understood, thanks! I just pulled the jar out of the latest nightly docker image and it seems to run with Java 17 on my machine. Thanks for the tip, and I'm looking forward to the official release after those PRs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants