Block compressed gVCF file is rejected by GenomicsDBImport --bypass-feature-reader due to non-standard file extension #7691

lessdata · 2022-02-23T17:35:23Z

Bug Report

Affected tool(s) or class(es)

GenomicsDBImport

Affected version(s)

Latest public release version [4.2.5.0]

Description

My gVCF files are block compressed and indexed, but they use the file extension ".gvcf.gz" rather than ".vcf.gz". When I run GenomicsDBImport with --bypass-feature-reader, the ".gvcf.gz" files are not recognized as block compressed VCFs, because GenomicsDBImport decides whether an input is block compressed solely by checking whether its file name ends with ".vcf.gz":

    private static void assertVariantFileIsCompressedAndIndexed(final Path path) {
        if (!path.toString().toLowerCase().endsWith(FileExtensions.COMPRESSED_VCF)) {
            throw new UserException("Input variant files must be block compressed vcfs when using " +
                BYPASS_FEATURE_READER + ", but " + path.toString() + " does not appear to be");
        }
        Path indexPath = path.resolveSibling(path.getFileName() + FileExtensions.COMPRESSED_VCF_INDEX);
        IOUtils.assertFileIsReadable(indexPath);
    }
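The suffix test above can be reproduced in isolation to see which file names pass. This is a minimal sketch, assuming the two constants mirror htsjdk's `FileExtensions.COMPRESSED_VCF` (".vcf.gz") and `FileExtensions.COMPRESSED_VCF_INDEX` (".tbi"); `ExtensionCheckDemo` and its methods are hypothetical names, and the sketch never touches the filesystem:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class ExtensionCheckDemo {
    // Assumed to mirror htsjdk's FileExtensions.COMPRESSED_VCF and COMPRESSED_VCF_INDEX.
    static final String COMPRESSED_VCF = ".vcf.gz";
    static final String COMPRESSED_VCF_INDEX = ".tbi";

    /** Same suffix test as assertVariantFileIsCompressedAndIndexed, returning a boolean instead of throwing. */
    static boolean passesExtensionCheck(Path path) {
        return path.toString().toLowerCase(Locale.ROOT).endsWith(COMPRESSED_VCF);
    }

    /** Sibling index path that the quoted method then requires to be readable. */
    static Path expectedIndexPath(Path path) {
        return path.resolveSibling(path.getFileName() + COMPRESSED_VCF_INDEX);
    }

    public static void main(String[] args) {
        String[] names = {"sample.vcf.gz", "sample.g.vcf.gz", "sample.gvcf.gz", "sample.vcf.bgz"};
        for (String name : names) {
            Path p = Paths.get("data", name);
            System.out.println(name + " -> accepted=" + passesExtensionCheck(p)
                    + ", index=" + expectedIndexPath(p));
        }
    }
}
```

Only `sample.vcf.gz` and `sample.g.vcf.gz` are accepted. `sample.gvcf.gz` fails because its last seven characters are `gvcf.gz`, not `.vcf.gz`, so renaming such files (and their `.tbi` indexes) to the `.g.vcf.gz` convention used in GATK examples would satisfy the check.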

I understand that this is an issue on my side, since I did not name my gVCF files with the standard ".vcf.gz" extension. Would it be possible to make this check less stringent in a future release? For example, any ".gz"/".bgz" file could be accepted, or the presence of the ".tbi" index could be used to identify block compression (an existing index typically means the file is block compressed and indexed).
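A relaxed check along the lines suggested above might look roughly like this. It is a hypothetical sketch, not GATK's behavior: `RelaxedCheckSketch.looksCompressedAndIndexed` is an invented name, and it treats a readable sibling `.tbi` file as evidence of block compression without opening the data file itself:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

public class RelaxedCheckSketch {
    /**
     * Hypothetical relaxed check (not GATK's actual behavior): accept any ".gz" or ".bgz"
     * file that has a readable sibling ".tbi" index.
     */
    static boolean looksCompressedAndIndexed(Path path) {
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        boolean gzLike = name.endsWith(".gz") || name.endsWith(".bgz");
        Path index = path.resolveSibling(path.getFileName() + ".tbi");
        return gzLike && Files.isReadable(index);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("gvcf-demo");
        Path gvcf = Files.createFile(dir.resolve("sample.gvcf.gz"));
        // No index yet, so the file is rejected.
        System.out.println("without index: " + looksCompressedAndIndexed(gvcf));
        Files.createFile(dir.resolve("sample.gvcf.gz.tbi"));
        // With a readable sibling index the same name is accepted.
        System.out.println("with index: " + looksCompressedAndIndexed(gvcf));
    }
}
```

The `.gz` suffix alone does not distinguish BGZF from plain gzip, which is why the sketch leans on the index rather than the extension.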

Thank you.


droazen · 2022-02-23T19:22:37Z

@lessdata In many cases we need to rely on file extensions to check the file format, because actually opening the files and reading the first few bytes to determine the format gets expensive when the files are hosted in the cloud and there are many VCFs. I do agree, however, that this error message could be improved: it should mention the file extensions that are allowed.
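To make the trade-off concrete, the content-based alternative means opening every input and inspecting its header. A minimal sketch of such a sniffing check, based on the BGZF header layout in the SAM/BAM format specification (gzip magic `1f 8b`, deflate, the FEXTRA flag, and a `BC` extra subfield); `BgzfSniffer` is a hypothetical class, and it assumes the `BC` subfield is the first extra subfield, as BGZF writers normally emit it:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class BgzfSniffer {
    /**
     * Checks fixed-position BGZF header fields: gzip magic (1f 8b), CM=8 (deflate),
     * FLG with FEXTRA (0x04) set, and SI1='B', SI2='C' at offsets 12 and 13.
     * Simplification: assumes the BC subfield is the first extra subfield.
     */
    static boolean isBgzfHeader(byte[] h) {
        return h.length >= 16
                && (h[0] & 0xff) == 0x1f && (h[1] & 0xff) == 0x8b
                && h[2] == 8
                && (h[3] & 0x04) != 0
                && h[12] == 'B' && h[13] == 'C';
    }

    /** One small read per file: cheap locally, but a remote request per input on cloud storage. */
    static boolean looksLikeBgzf(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return isBgzfHeader(in.readNBytes(16));
        }
    }

    public static void main(String[] args) throws IOException {
        // A plain gzip stream has no FEXTRA field, so it is gzip but not BGZF.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write("##fileformat=VCFv4.2\n".getBytes(StandardCharsets.US_ASCII));
        }
        System.out.println("plain gzip is BGZF? " + isBgzfHeader(buf.toByteArray()));
    }
}
```

Locally this is a single 16-byte read, but with many VCFs in cloud storage it becomes one remote request per input before any import work starts, which is the overhead that an extension-based check avoids.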

droazen added a commit that referenced this issue Feb 23, 2022

Mention acceptable VCF file extension in GenomicsDBImport

e4d581c

Resolves #7691

droazen mentioned this issue Feb 23, 2022

Mention acceptable compressed VCF file extension in GenomicsDBImport error message #7692

Merged

droazen closed this as completed in #7692 Feb 23, 2022

droazen added a commit that referenced this issue Feb 23, 2022

Mention acceptable compressed VCF file extension in GenomicsDBImport error message (#7692)

77b725d

Resolves #7691
