GenomicsDB excessive logging "[E::faidx_adjust_position] The sequence "chrX" was not found" #7751

Open
1 of 2 tasks
ldgauthier opened this issue Apr 4, 2022 · 3 comments

@ldgauthier
Contributor

Bug Report

Affected tool(s) or class(es)

GenomicsDBImport

Affected version(s)

  • Latest public release version [version?]
  • Latest master branch as of Apr 4, 2022

Description

[E::faidx_adjust_position] The sequence "chrX" was not found
[E::faidx_adjust_position] The sequence "chrX" was not found
[E::faidx_adjust_position] The sequence "chrX" was not found
[E::faidx_adjust_position] The sequence "chrX" was not found

Steps to reproduce

Run the first test case for GnarlyGenotyperIntegrationTest::testUsingGenomicsDB() on the branch #7750

The test contains the argument --intervals chrX:1000000-5000000, but I'm not sure why that would be an issue. The tool runs fine and the output is valid.

Expected behavior

An informative warning or a single output of the existing warning

Actual behavior

Excessive logging

@nalinigans
Contributor

> The test contains the argument --intervals chrX:1000000-5000000, but I'm not sure why that would be an issue.

This is from htslib::faidx_fetch_seq_into_buffer because the reference for the test does not contain the contig chrX. We could just log this once and continue. Is this what you want? Or do you want an exception at this point?
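
To illustrate the "log this once and continue" option, here is a minimal sketch of the pattern in Java. It is only an illustration: the message in this issue is emitted by htslib in native code, so a real fix would live there or in GenomicsDB, and `OnceLogger` is a hypothetical name, not an existing GATK or GenomicsDB class.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical helper: forwards each distinct message to a sink only the first time it is seen. */
final class OnceLogger {
    // Thread-safe set of messages that have already been emitted.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final Consumer<String> sink;

    OnceLogger(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Returns true if the message was forwarded, false if it was a repeat and was suppressed. */
    boolean warnOnce(String message) {
        if (seen.add(message)) {   // Set.add returns false when the element is already present
            sink.accept(message);
            return true;
        }
        return false;
    }
}
```

With something like `new OnceLogger(System.err::println)`, repeated calls with the same "sequence was not found" text would print a single line, while a different missing contig would still get its own warning.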

@ldgauthier
Contributor Author

The VCF sequence dictionary does contain a chrX -- is that enough? A lot of our tools only need a dictionary and can get one from the header of various file types.

Otherwise I think an exception would be appropriate. If that was the only reference a user had, would they be able to query the GenomicsDB successfully?

@mlathara
Contributor

mlathara commented Apr 11, 2022

Sequence dictionary is not enough -- we actually need the reference because GenomicsDB uses that to fill in the reference base in some cases. For this reason, the reference is a required argument when reading from GenomicsDB, but as this issue outlines we probably should go one step further and validate that the intervals being queried are in the reference. We can add this to GenomicsDB but it's probably better to have a check done in GATK so that we fail fast.

It is interesting though that the results seem valid...presumably having the reference base as 'N' in some cases doesn't affect it?
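
A fail-fast check along these lines could look like the sketch below. It assumes the reference has a FASTA index (.fai), whose lines are tab-separated with the contig name in the first column and its length in the second. `IntervalReferenceCheck` and its methods are hypothetical names for illustration, not existing GATK code, and a real implementation would more likely go through the reference's sequence dictionary.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: check that a queried interval lies within a contig present in the reference. */
final class IntervalReferenceCheck {

    /** Builds a contig -> length map from .fai lines (column 1 = name, column 2 = length). */
    static Map<String, Long> parseFai(List<String> faiLines) {
        Map<String, Long> lengths = new HashMap<>();
        for (String line : faiLines) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] cols = line.split("\t");
            lengths.put(cols[0], Long.parseLong(cols[1]));
        }
        return lengths;
    }

    /** Throws if the contig is missing from the reference or the 1-based closed interval is out of bounds. */
    static void validate(String contig, long start, long end, Map<String, Long> lengths) {
        Long length = lengths.get(contig);
        if (length == null) {
            throw new IllegalArgumentException(
                "Interval contig \"" + contig + "\" was not found in the reference");
        }
        if (start < 1 || end < start || end > length) {
            throw new IllegalArgumentException(
                "Interval " + contig + ":" + start + "-" + end
                    + " is outside the contig bounds 1-" + length);
        }
    }
}
```

Run before any GenomicsDB query, a check like this would turn the `--intervals chrX:1000000-5000000` case from this issue into a single clear error instead of repeated htslib messages.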

4 participants