java.lang.ArrayIndexOutOfBoundsException when creating tabix index #7838

Open
bw2 opened this issue May 6, 2022 · 4 comments

bw2 commented May 6, 2022

Bug Report

Affected tool(s) or class(es)

gatk SortVcf

Affected version(s)

Mac OS X 10.16 x86_64; OpenJDK 64-Bit Server VM 1.8.0_322-b06; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.4.1

Description

SortVcf finishes sorting and writes out a VCF, but then fails with java.lang.ArrayIndexOutOfBoundsException when generating the tabix index. To work around this, I can run with --CREATE_INDEX false and then run tabix to generate the index.

INFO	2022-05-06 12:14:45	SortVcf	wrote       675,000 records.  Elapsed time: 00:00:03s.  Time for last 25,000:    0s.  Last read position: chr3:41,521,469
INFO	2022-05-06 12:14:45	SortVcf	wrote       700,000 records.  Elapsed time: 00:00:03s.  Time for last 25,000:    0s.  Last read position: chr3:61,833,861
INFO	2022-05-06 12:14:45	SortVcf	wrote       725,000 records.  Elapsed time: 00:00:03s.  Time for last 25,000:    0s.  Last read position: chr3:78,534,676
INFO	2022-05-06 12:14:45	SortVcf	wrote       750,000 records.  Elapsed time: 00:00:03s.  Time for last 25,000:    0s.  Last read position: chr3:100,707,682
INFO	2022-05-06 12:14:45	SortVcf	wrote       775,000 records.  Elapsed time: 00:00:03s.  Time for last 25,000:    0s.  Last read position: chr3:117,527,190
INFO	2022-05-06 12:14:45	SortVcf	wrote       800,000 records.  Elapsed time: 00:00:03s.  Time for last 25,000:    0s.  Last read position: chr3:134,613,380
INFO	2022-05-06 12:14:45	SortVcf	wrote       825,000 records.  Elapsed time: 00:00:03s.  Time for last 25,000:    0s.  Last read position: chr3:153,780,108
INFO	2022-05-06 12:14:45	SortVcf	wrote       850,000 records.  Elapsed time: 00:00:03s.  Time for last 25,000:    0s.  Last read position: chr3:173,329,831
INFO	2022-05-06 12:14:46	SortVcf	wrote       875,000 records.  Elapsed time: 00:00:03s.  Time for last 25,000:    0s.  Last read position: chr3:192,133,262
[Fri May 06 12:14:46 EDT 2022] picard.vcf.SortVcf done. Elapsed time: 0.36 minutes.
Runtime.totalMemory()=2855272448
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

java.lang.ArrayIndexOutOfBoundsException: 16799
	at htsjdk.samtools.BinningIndexBuilder.processFeature(BinningIndexBuilder.java:102)
	at htsjdk.tribble.index.tabix.TabixIndexCreator.finalizeFeature(TabixIndexCreator.java:106)
	at htsjdk.tribble.index.tabix.TabixIndexCreator.addFeature(TabixIndexCreator.java:92)
	at htsjdk.variant.variantcontext.writer.IndexingVariantContextWriter.add(IndexingVariantContextWriter.java:203)
	at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:242)
	at picard.vcf.SortVcf.writeSortedOutput(SortVcf.java:183)
	at picard.vcf.SortVcf.doWork(SortVcf.java:101)
	at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
	at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
	at org.broadinstitute.hellbender.Main.main(Main.java:292)
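
The workaround from the description (sort with --CREATE_INDEX false, then index with tabix) could be scripted roughly as below. This is a minimal sketch, not the reporter's actual commands: the file names are hypothetical, and it assumes that gatk and tabix are on the PATH and that the output name ends in .vcf.gz so that the sorted file is block-compressed, which tabix requires.

```python
import subprocess


def sort_vcf_command(input_vcf, output_vcf_gz):
    """Build the SortVcf call with index creation switched off."""
    return [
        "gatk", "SortVcf",
        "-I", input_vcf,
        "-O", output_vcf_gz,
        "--CREATE_INDEX", "false",  # skip the step that throws the exception
    ]


def tabix_command(output_vcf_gz):
    """Build the tabix call that indexes the sorted, bgzipped VCF."""
    return ["tabix", "-p", "vcf", output_vcf_gz]


def sort_then_index(input_vcf, output_vcf_gz):
    """Run both steps; check=True raises if either tool exits non-zero."""
    subprocess.run(sort_vcf_command(input_vcf, output_vcf_gz), check=True)
    subprocess.run(tabix_command(output_vcf_gz), check=True)


# Hypothetical usage:
# sort_then_index("unsorted.vcf", "sorted.vcf.gz")
```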

Expected output

There's almost certainly some format issue with my VCF, but ideally GATK would have a better error message than ArrayIndexOutOfBoundsException.

Member

lbergelson commented May 10, 2022

@bw2 I agree, this is an unhelpful error. We should fix it, but the fix probably has to be made in htsjdk (or in Picard, since this is a Picard tool we import).

I'm not 100% sure what the issue is; it seems like we're somehow resolving an invalid bin in the index. I would expect that to happen with a very long chromosome, but 193,000,000 shouldn't be too large. Are you using non-human data or something with an extremely long variant?
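
For reference, here is a minimal sketch of the standard binning scheme used by BAI and tabix indexes (the reg2bin calculation from the SAM specification, with 16,384-base bins at the finest level). It is not htsjdk's code. The stack trace does not say which array overflowed, so reading 16799 as a bin number is an assumption; under that assumption, the helper below shows which window of the chromosome the bin would cover.

```python
# First bin number at each level (finest first) and the matching bit shifts.
LEVEL_OFFSETS = [4681, 585, 73, 9, 1]
SHIFTS = [14, 17, 20, 23, 26]


def reg2bin(beg, end):
    """Return the bin for the zero-based, half-open interval [beg, end)."""
    end -= 1
    for offset, shift in zip(LEVEL_OFFSETS, SHIFTS):
        # The smallest bin that fully contains the interval wins.
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # only the root bin spans the whole interval


def finest_bin_interval(bin_number):
    """Return the zero-based, half-open interval covered by a finest-level bin."""
    if not 4681 <= bin_number <= 37448:
        raise ValueError("not a finest-level (16,384-base) bin")
    start = (bin_number - 4681) << 14
    return start, start + (1 << 14)


# Assumption: 16799 is a bin number rather than a linear-index slot.
print(finest_bin_interval(16799))
```

Under that reading, bin 16799 would cover bases from 198,541,312 up to 198,557,696, which lies past the end of chr3 in GRCh38 (198,295,559 bp). That would be consistent with a record whose position or END overruns the contig, but it remains a guess until a reproducing file is available.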

Author

bw2 commented May 15, 2022

Yes, this was human data. It might have been a long variant.

Collaborator

droazen commented Jun 6, 2022

@bw2 Do you have a small file that reproduces this issue? We'll need a runnable test case that reproduces this in order to debug further.
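
One possible way to hunt for a small reproducer is to scan the VCF for records that extend past the contig length declared in the header. This is only a sketch under an unconfirmed assumption: nothing in this thread establishes that such a record caused the failure. The function name is hypothetical, and it expects uncompressed VCF text lines with ##contig header entries that carry a length.

```python
import re


def records_past_contig_end(vcf_lines):
    """Yield (chrom, pos, end, contig_length) for records that overrun their contig."""
    lengths = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##contig="):
            # Collect declared contig lengths from the header.
            name = re.search(r"ID=([^,>]+)", line)
            length = re.search(r"length=(\d+)", line)
            if name and length:
                lengths[name.group(1)] = int(length.group(1))
            continue
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
        # Prefer an explicit END in INFO; otherwise derive it from the REF length.
        end_tag = re.search(r"(?:^|;)END=(\d+)", info)
        end = int(end_tag.group(1)) if end_tag else pos + len(ref) - 1
        limit = lengths.get(chrom)
        if limit is not None and end > limit:
            yield chrom, pos, end, limit
```

For a bgzipped file, the lines can be read with gzip.open(path, "rt"). Any record this reports, together with the header, would be a candidate for a minimal test file.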

Member

cwhelan commented Jun 6, 2022

I'm not sure if it'll fix or affect this issue, but I noticed this and want to note that @tedsharpe has an active pull request to fix issues with tabix index generation: #7858
