
IllegalStateException in GenotypeGVCFs after GenomicsDBImport - GATK 4.2.6.1 #7938

Open
AJDCiarla opened this issue Jul 12, 2022 · 9 comments

Comments

@AJDCiarla


Similar issues appear to have been reported in #7639 and #7933. This is a follow-up to a report on the GATK Forum.

GATK Forum Post: (https://gatk.broadinstitute.org/hc/en-us/community/posts/6972994559643-java-lang-IllegalStateException-in-GenotypeGVCFs-after-GenomicsDBImport-GATK-4-2-6-1)


Bug Report

Tools/Methods

GenomicsDBImport --> GenotypeGVCFs

Affected version(s)

- GenomicsDBImport: GATK 4.2.4.0
- GenotypeGVCFs: GATK 4.2.6.1

Description

GenotypeGVCFs throws an IllegalStateException after GenomicsDBImport. The exception message states "Genotype has no likelihoods". The user has divided the reference into 50 intervals, and only some of them fail.

Stacktrace:

GENOMICSDB_TIMER,GenomicsDB iterator next() timer,Wall-clock time(s),74.14547183399837,Cpu time(s),67.38693261000097
[July 1, 2022 1:36:56 AM CST] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 104.22 minutes.
Runtime.totalMemory()=13973323776
java.lang.IllegalStateException: Genotype has no likelihoods: [COLI1040 TGAGC*/T GQ 39 DP 2 AD 1,1 {SB=[1, 0, 1, 0]}]
    at org.broadinstitute.hellbender.utils.GenotypeUtils.computeDiploidGenotypeCounts(GenotypeUtils.java:89)
    at org.broadinstitute.hellbender.tools.walkers.annotator.ExcessHet.calculateEH(ExcessHet.java:96)
    at org.broadinstitute.hellbender.tools.walkers.annotator.ExcessHet.annotate(ExcessHet.java:84)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.addInfoAnnotations(VariantAnnotatorEngine.java:355)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:334)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:306)
    at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.regenotypeVC(GenotypeGVCFsEngine.java:185)
    at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.callRegion(GenotypeGVCFsEngine.java:135)
    at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:283)
    at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$null$1(VariantLocusWalker.java:161)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
    at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$2(VariantLocusWalker.java:151)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.ReferencePipeline$Head.forEachOrdered(ReferencePipeline.java:590)
    at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:148)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
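
To make the failure mode concrete, here is a simplified, hypothetical Java model of the check that fails in `GenotypeUtils.computeDiploidGenotypeCounts` in the stack trace above. The class, record and method names are invented for illustration and this is not GATK's code: the real method works from normalized likelihoods and handles multiallelic sites, while this sketch only assigns each biallelic genotype to its lowest PL. What it does reproduce is the reported behavior: a genotype such as `COLI1040` that arrives without PLs makes the counting step throw `IllegalStateException`, which aborts the ExcessHet annotation and with it the GenotypeGVCFs run.

```java
import java.util.Arrays;
import java.util.List;

public class ExcessHetCheckSketch {

    // Hypothetical minimal genotype: pls is null when likelihoods were dropped upstream.
    record SimpleGenotype(String sample, int[] pls) {
        boolean hasLikelihoods() { return pls != null; }
    }

    // Simplified stand-in for diploid genotype counting (biallelic only).
    // Like the check in the report, it refuses to proceed without likelihoods.
    static int[] countDiploidGenotypes(List<SimpleGenotype> genotypes) {
        int[] counts = new int[3]; // index 0 = hom-ref, 1 = het, 2 = hom-var
        for (SimpleGenotype g : genotypes) {
            if (!g.hasLikelihoods()) {
                throw new IllegalStateException("Genotype has no likelihoods: " + g.sample());
            }
            // Count the genotype toward its most likely state (lowest phred-scaled PL).
            int best = 0;
            for (int i = 1; i < 3; i++) {
                if (g.pls()[i] < g.pls()[best]) {
                    best = i;
                }
            }
            counts[best]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<SimpleGenotype> withPls = List.of(
                new SimpleGenotype("S1", new int[]{0, 30, 300}),
                new SimpleGenotype("S2", new int[]{25, 0, 40}));
        System.out.println(Arrays.toString(countDiploidGenotypes(withPls))); // prints [1, 1, 0]

        // The sample from the report, modeled with its PLs missing.
        List<SimpleGenotype> missingPls = List.of(new SimpleGenotype("COLI1040", null));
        try {
            countDiploidGenotypes(missingPls);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Genotype has no likelihoods: COLI1040
        }
    }
}
```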

Exact Commands Used:

GenomicsDBImport:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms2G -Xmx20G -XX:+UseParallelGC -XX:ParallelGCThreads=2 -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenomicsDBImport --genomicsdb-workspace-path 007_Database_DBImport_VCFref/database_interval_9 --sample-name-map sample_name_map --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list --reader-threads 5 --batch-size 60 --tmp-dir TMPDIR --max-num-intervals-to-import-in-parallel 3 --merge-input-intervals

GenotypeGVCFs:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms4G -Xmx16G -XX:+UseParallelGC -XX:ParallelGCThreads=2 -jar MySoftwares/gatk-4.2.6.1/gatk-package-4.2.6.1-local.jar GenotypeGVCFs -R PigeonBatch5/000_DataLinks/000_RefSeq/Cliv2.1_genomic.fasta --intervals 006_IntervalsSplit_DBImport_VCFref/interval_9.list --force-output-intervals PigeonBatch4/008_RawVcfGz/MergeVcf/pigeonBatch1234_filtered.vcf.gz -V gendb://007_Database_DBImport_VCFref/database_interval_9 -O 008_RawVcfGz_DBImport_VCFref/001_DividedIntervals/interval_9.vcf.gz --tmp-dir TMPDIR --allow-old-rms-mapping-quality-annotation-data --only-output-calls-starting-in-intervals --verbosity ERROR

User Description of the Issue:

"I'm using GenotypeGVCFs on a GenomicsDBImport database. I've divided the reference into 50 intervals. Some intervals seem OK, but some report the following error.

I used a VCF file in "--force-output-intervals" for downstream analysis. I've never seen this error without "--force-output-intervals". I searched for the error message and changed my GATK version to 4.2.6.1, since a similar error was fixed as a bug in a recent update, but it still does not work on my dataset..."

@droazen and @samuelklee, any insight on this?

Thank you,

Anthony

@samuelklee
Contributor

samuelklee commented Jul 14, 2022

Just reiterating here what @lbergelson noted in office hours: it looks like the offending check was added in #7738, which ultimately affects both the ExcessHet and InbreedingCoeff annotations. @droazen reviewed that PR and might have more insight into the desired behavior for these annotations when PLs are missing because GenomicsDB dropped them upstream. Should we just not emit these annotations?
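
One possible answer to the question above, assuming the desired behavior is to omit the annotation rather than fail, is a guard that checks every genotype for likelihoods before computing anything. The sketch below is hypothetical: the names are invented, it is not GATK's annotation API, and it takes the statistic as a caller-supplied function instead of implementing the real ExcessHet or InbreedingCoeff calculations. It only illustrates the "skip instead of throw" design.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class SkipAnnotationSketch {

    // Hypothetical minimal genotype: pls is null when likelihoods were dropped upstream.
    record SimpleGenotype(String sample, int[] pls) {
        boolean hasLikelihoods() { return pls != null; }
    }

    // Emit the annotation only if every genotype carries likelihoods;
    // otherwise return an empty map so the record is written without it.
    // The compute function must return a non-null value (Map.of rejects nulls).
    static Map<String, Object> annotateIfLikelihoodsPresent(
            List<SimpleGenotype> genotypes,
            String key,
            Function<List<SimpleGenotype>, Object> compute) {
        boolean allHavePls = genotypes.stream().allMatch(SimpleGenotype::hasLikelihoods);
        if (!allHavePls) {
            return Collections.emptyMap();
        }
        return Map.of(key, compute.apply(genotypes));
    }

    public static void main(String[] args) {
        // Stand-in computation (NOT the real ExcessHet statistic): the number of samples.
        Function<List<SimpleGenotype>, Object> stub = gs -> gs.size();

        List<SimpleGenotype> complete = List.of(new SimpleGenotype("S1", new int[]{0, 30, 300}));
        List<SimpleGenotype> partial = List.of(
                new SimpleGenotype("S1", new int[]{0, 30, 300}),
                new SimpleGenotype("COLI1040", null));

        System.out.println(annotateIfLikelihoodsPresent(complete, "ExcessHet", stub)); // prints {ExcessHet=1}
        System.out.println(annotateIfLikelihoodsPresent(partial, "ExcessHet", stub));  // prints {}
    }
}
```

The trade-off of this design is silent omission: sites with dropped PLs simply lack the annotation, so a warning or counter may be worth adding if users need to know how often it happens.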

@droazen
Collaborator

droazen commented Jul 18, 2022

@AJDCiarla The user should try re-running GenotypeGVCFs with --max-genotype-count set to a value greater than 1024. This should prevent the PLs from being dropped and avoid the downstream error. The user may also need to increase --max-alternate-alleles.
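
The number of possible genotypes grows quickly with the number of alleles, which is why a limit of 1024 can be reached at highly multiallelic sites. The sketch below assumes the limit applies to the standard count of unordered genotypes, C(A + P - 1, P) for A alleles (reference plus alternates) and ploidy P; it does not model how GATK or GenomicsDB actually enforce the limit, and the class and method names are hypothetical. Under that assumption, diploid samples give 990 genotypes at 44 alleles and 1035 at 45, so a site with one reference and 44 alternate alleles would be the first to exceed 1024.

```java
public class GenotypeCountSketch {

    // Number of unordered genotypes for a ploidy and allele count (ref + alts):
    // C(alleleCount + ploidy - 1, ploidy), computed multiplicatively.
    // After step i, result equals C(alleleCount - 1 + i, i), so each division is exact.
    static long genotypeCount(int alleleCount, int ploidy) {
        long result = 1;
        for (int i = 1; i <= ploidy; i++) {
            result = result * (alleleCount - 1 + i) / i;
        }
        return result;
    }

    // Smallest allele count whose genotype count exceeds the given limit.
    static int firstAlleleCountExceeding(long maxGenotypeCount, int ploidy) {
        int alleles = 1;
        while (genotypeCount(alleles, ploidy) <= maxGenotypeCount) {
            alleles++;
        }
        return alleles;
    }

    public static void main(String[] args) {
        int ploidy = 2;
        long limit = 1024; // the value suggested above as the threshold to exceed
        for (int alleles : new int[]{7, 44, 45}) {
            System.out.println(alleles + " alleles -> "
                    + genotypeCount(alleles, ploidy) + " diploid genotypes");
        }
        // prints 28, 990 and 1035 genotypes respectively
        System.out.println("First allele count over " + limit + ": "
                + firstAlleleCountExceeding(limit, ploidy)); // prints 45
    }
}
```

Running the same calculation for the largest allele count expected in a dataset gives a lower bound for a --max-genotype-count value that keeps all PLs, again assuming the limit is applied this way.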

@droazen
Collaborator

droazen commented Jul 18, 2022

@AJDCiarla It would also be useful to know whether the error occurs when the user runs GenotypeGVCFs without the --force-output-intervals argument.

@bbimber
Contributor

bbimber commented Jul 24, 2022

@droazen, like Karina posted in #7933, with our inputs this issue only occurs when using --force-output-intervals. I tried increasing --max-alternate-alleles to 2048 with no change.

I just got back from a vacation, but this week I will try to debug this more closely to see what is causing the issue.

Have you had any further discussions beyond what @samuelklee suggested above?

@bbimber
Contributor

bbimber commented Jul 26, 2022

@droazen, I'm running a job using a JAR based on #7962 and it progressed beyond the previous failures.

@AJDCiarla
Author

We can close this; I have created a new ticket, #7966, for the --force-output-intervals bug. @droazen

bbimber added a commit to BimberLab/DiscvrLabKeyModules that referenced this issue Aug 9, 2022
bbimber added a commit to BimberLab/DiscvrLabKeyModules that referenced this issue Aug 24, 2022
* Skip GATK annotations to avoid broadinstitute/gatk#7938

@jigaoxiang

Hello, did you solve this problem? I am encountering it too.

@bbimber
Contributor

bbimber commented Sep 7, 2022

The error comes from two annotations: InbreedingCoeff and ExcessHet. One workaround is to add "-AX ExcessHet -AX InbreedingCoeff". It doesn't actually fix the problem, but it avoids hitting the problematic code.
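
In a scripted pipeline, this workaround amounts to appending one `-AX` (`--annotations-to-exclude`) flag per annotation to the GenotypeGVCFs arguments. The helper below is a hypothetical Java sketch: the class and method names, the reference path and the workspace name are placeholders, it only assembles the argument list, and actually starting the process is left commented out. Note the trade-off: the output VCF will not contain ExcessHet or InbreedingCoeff, so any downstream filter that relies on them needs adjusting.

```java
import java.util.ArrayList;
import java.util.List;

public class ExcludeAnnotationsSketch {

    // The two annotations that reach the genotype-count check which throws when PLs are missing.
    static final List<String> ANNOTATIONS_TO_EXCLUDE = List.of("ExcessHet", "InbreedingCoeff");

    // Hypothetical helper: copy the base arguments and append one -AX flag per annotation.
    static List<String> withExcludedAnnotations(List<String> baseArgs) {
        List<String> args = new ArrayList<>(baseArgs);
        for (String annotation : ANNOTATIONS_TO_EXCLUDE) {
            args.add("-AX");
            args.add(annotation);
        }
        return args;
    }

    public static void main(String[] args) {
        // Placeholder inputs; substitute the real reference, workspace and output paths.
        List<String> base = List.of("gatk", "GenotypeGVCFs",
                "-R", "reference.fasta",
                "-V", "gendb://my_workspace",
                "-O", "output.vcf.gz");
        List<String> cmd = withExcludedAnnotations(base);
        System.out.println(String.join(" ", cmd));
        // new ProcessBuilder(cmd).inheritIO().start(); // uncomment to actually run GATK
    }
}
```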

@jigaoxiang

Awesome! It is useful. Thank you very much!
