Help subsetting samples #134

oyhel · 2019-09-17T15:27:50Z

vcfpy version:
0.12.1
Python version:
3.6.3
Operating System:
ubuntu 18.04

Description

I need to subset samples from a large VCF with approximately 100,000 samples. The extraction is the only part of the workflow that runs outside Python, so it would be very nice to be able to use vcfpy for it. I read the docs and also tried to deduce how the package works from the source. However, I am coming up short and would very much appreciate some help understanding how subsetting of samples works.

What I Did

import vcfpy

subset_samples = sel = ['sample123', 'sample124']
reader = vcfpy.Reader.from_path('myfile.vcf.gz', parsed_samples=subset_samples)

reader.parsed_samples returns the samples chosen to be parsed in the input:
['sample123', 'sample124']

When iterating over the records in the reader object:
for record in reader: record

I get:

<vcfpy.record.UnparsedCall object at 0x7f065fd30f98>

So I am assuming that the parser respects my list of samples to be parsed. However, I am struggling to get the calls for only these samples.

My goal is to extract all markers, but only for a few samples.

Any help would be very much appreciated!


kylec · 2020-01-15T20:28:00Z

I have the same problem. I set parsed_samples for a subset of samples like you did, but record.calls still returns calls for all the samples.

I got around it by finding the index of my samples.

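# name_to_idx maps a single sample name to its index, so look up each name separately
sample_indices = [reader.header.samples.name_to_idx[name] for name in subset_samples]
for record in reader:
    # keep only the calls that belong to the chosen samples
    calls = [record.calls[i] for i in sample_indices]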
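
The same name-to-index idea can be illustrated without vcfpy at all. The sketch below is not vcfpy code: subset_vcf_lines is a hypothetical helper written for this example, and it assumes plain tab-delimited VCF text with the nine fixed columns (CHROM through FORMAT) before the sample columns. It builds the index from the #CHROM header line and then keeps only the requested sample columns on every record, which is roughly the "all markers, few samples" extraction asked about above.

```python
import io


def subset_vcf_lines(lines, wanted):
    """Yield VCF lines, keeping only the sample columns named in `wanted`.

    Hypothetical helper for illustration; not part of vcfpy.
    """
    keep = None  # column indices to keep, known once the #CHROM line is seen
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns 0..8 are fixed (CHROM..FORMAT); sample names follow.
            name_to_idx = {name: i for i, name in enumerate(fields) if i >= 9}
            missing = [s for s in wanted if s not in name_to_idx]
            if missing:
                raise KeyError(f"samples not in header: {missing}")
            keep = list(range(9)) + [name_to_idx[s] for s in wanted]
        elif keep is None:
            raise ValueError("data line found before the #CHROM header")
        # Header and data lines are both reduced to the kept columns.
        yield "\t".join(fields[i] for i in keep)


# Tiny in-memory VCF with made-up genotypes, used only to exercise the helper.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tsample122\tsample123\tsample124\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\n"
)
result = list(subset_vcf_lines(demo, ["sample123", "sample124"]))
for out_line in result:
    print(out_line)
```

For a compressed file, the lines could come from gzip.open(path, "rt") instead of the in-memory demo. Note that this sketch only drops columns; it does not recompute INFO fields such as AC or AN, which may no longer match the reduced sample set.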

holtgrewe · 2020-01-16T21:17:08Z

Hi, I think that you are looking for Record.call_for_sample.

for record in reader:
    for sample in sel:
        # call_for_sample is an attribute of each record, keyed by sample name
        call = record.call_for_sample[sample]
