Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement -L system, enable access to it for tools that request it #4

Closed
akiezun opened this issue Dec 10, 2014 · 12 comments
Closed

Implement -L system, enable access to it for tools that request it #4

akiezun opened this issue Dec 10, 2014 · 12 comments
Assignees
Labels
Milestone

Comments

@akiezun
Copy link
Contributor

akiezun commented Dec 10, 2014

Implement -L system, enable access to it for tools that request it (define how they 'request it' - maybe by implementing an interface or calling a function or overriding some generic hook - part of this issue is to design it).

@jmthibault79
Copy link
Contributor

One limitation of the GATK interval implementation is that it doesn't handle variants with ranges (like GVCF reference blocks) correctly - and it would be hard to change it so it does. We should keep this use case in mind. The problem is that only the start position is considered, so an interval covering the middle or end of a variant range will not select it.

Example: http://gatkforums.broadinstitute.org/discussion/4925/combinegvcfs-fails-to-combine-calls-from-whole-genome-gvcfs-if-an-interval-file-is-provided

@cwhelan
Copy link
Member

cwhelan commented Dec 10, 2014

Just wanted to chime in and say that it would be great to handle ranged variants well from a structural variation perspective too..

@amilev
Copy link

amilev commented Dec 10, 2014

I second Joel, I had to produce base level resolution gVCF to avoid it which is an "ugly" workaround

@ldgauthier
Copy link
Contributor

I third Joel.

On Wed, Dec 10, 2014 at 10:20 AM, amilev [email protected] wrote:

I second Joel, I had to produce base level resolution gVCF to avoid it
which is an "ugly" workaround


Reply to this email directly or view it on GitHub
#4 (comment)
.

@vdauwera
Copy link
Contributor

I fourth Joel on behalf of the users.Also, would it be possible to allow overlapping intervals without merging? Users have asked for this in order to handle alternative transcripts in walkers like DepthOfCoverage.On Wed, Dec 10, 2014 at 10:22 AM, ldgauthier [email protected] wrote: I third Joel.

On Wed, Dec 10, 2014 at 10:20 AM, amilev [email protected] wrote:

I second Joel, I had to produce base level resolution gVCF to avoid it

which is an "ugly" workaround

Reply to this email directly or view it on GitHub

#4 (comment)

.

—Reply to this email directly or view it on GitHub.

@akiezun akiezun added the Engine label Dec 11, 2014
@akiezun
Copy link
Contributor Author

akiezun commented Dec 15, 2014

I'm going to assign it to @kshakir for now - to implement the simplest -L (one set of intervals, provided by a file). He may choose to split this issue into smaller ones for more granular features. The approach we're going to take is to implement only the features of -L that we need. The first milestone reflects the first feature to implement.

@droazen
Copy link
Collaborator

droazen commented Jan 13, 2015

This may require us to decide whether to support both GATK-style interval files and Picard-style interval files, or standardize on a single format.

We also need to improve the code that detects the interval file format (currently catches exception from Picard to detect GATK interval files)

@droazen
Copy link
Collaborator

droazen commented Jan 13, 2015

@yfarjoun @nh13 we'd like your opinion on this issue of whether to continue to support both Picard and GATK interval file formats

@droazen
Copy link
Collaborator

droazen commented Jan 13, 2015

Should also support -XL

@droazen
Copy link
Collaborator

droazen commented Jan 13, 2015

Note that right now the hellbender engine is hardcoded to merge adjacent/overlapping intervals when doing read traversals (as required by the htsjdk API) whereas in the GATK the user had the option of not merging intervals

@yfarjoun
Copy link
Contributor

I think that especially with the imminent switch over to the next reference, I would prefer the Picard version that includes a dictionary header. otherwise we are setting ourselves up for a lot of support questions whose answer is (after hours of debugging) "you are using the wrong reference".

@droazen
Copy link
Collaborator

droazen commented Jan 13, 2015

I'll add: the relevant code has mostly been moved over already (the IntervalUtils class). The current bad/unreliable method of distinguishing between GATK and Picard interval files can be seen in the intervalFileToList() method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

10 participants