Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow sex dedup determination #114

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

afaulconbridge
Copy link
Contributor

When deduplicating sex chromosomes is a useful thing to do. However, in some conditions the default methods of sex detection can give incorrect results e.g. when homozygous reference alleles are omitted from the source such as in whole-exome or whole-genome. Therefore it is useful to be able to specify the sex detection method used when deciding how to deduplicate the sex chromosomes.

This PR both supports the existing use of a boolean for deduplicate_XY_chrom as well as using a String that can be passed to determine_sex() as the chrom parameter.

( I'll probably have another PR soon for improvements to determine_sex() itself. )

@codecov-io
Copy link

codecov-io commented Nov 24, 2020

Codecov Report

Merging #114 (e535507) into develop (91bfccf) will increase coverage by 1.37%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #114      +/-   ##
===========================================
+ Coverage    92.98%   94.36%   +1.37%     
===========================================
  Files            8        8              
  Lines         1468     1419      -49     
  Branches       252      252              
===========================================
- Hits          1365     1339      -26     
+ Misses          65       41      -24     
- Partials        38       39       +1     
Impacted Files Coverage Δ
src/snps/snps.py 95.49% <0.00%> (+0.88%) ⬆️
src/snps/io/reader.py 97.87% <0.00%> (-0.02%) ⬇️
src/snps/resources.py 94.73% <0.00%> (+0.03%) ⬆️
src/snps/ensembl.py 70.73% <0.00%> (+39.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91bfccf...87fc45d. Read the comment docs.

Copy link
Owner

@apriha apriha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @afaulconbridge . What are your thoughts about adding a new kwarg instead, perhaps determine_sex_chrom="X", which would set a member variable (e.g., _determine_sex_chrom), that would be used for this and calls to determine_sex via the sex property?

@afaulconbridge
Copy link
Contributor Author

Hmm, interesting idea. I like it because it only has to be set in one place, and can be used in many as a default. It would still have to be overridable on specific invocations, but that shouldn't be a problem.

I'll have a try with that approach and see how it works in practice.

@afaulconbridge
Copy link
Contributor Author

@apriha I've implemented the selection of sex detection method as an instance variable, so it can be shared between methods easily. Theres are some further improvements to be made (e.g. a separate threshold to detect low heterozygous X chromosomes as male, number of non-null Y snps as a proportion of the total snps) but they are separate issues for the future.

Copy link
Owner

@apriha apriha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @afaulconbridge - I like how the sex determination method and thresholds are accessible via the SNPs object.

Eventually it'd be nice to have a determination method (e.g., all) that checks all methods. A dict could store the sex determined for each method (perhaps accessible by a new property for later inspection).

s._sex_method = "Y"
self.assertEqual(s.sex, "") # no Y chromosome
s._sex_method = "XY"
self.assertEqual(s.sex, "") # can't detect sex, no Y present & X is not male
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the four occurrences of this comment be # can't detect sex, no Y present & X is not female?

@@ -95,6 +98,12 @@ def __init__(
processes to launch if multiprocessing
rsids : tuple, optional
rsids to extract if loading a VCF file
sex_method : string
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sex_method : string
sex_method : {"X", "Y", "XY", "YX"}

@@ -95,6 +98,12 @@ def __init__(
processes to launch if multiprocessing
rsids : tuple, optional
rsids to extract if loading a VCF file
sex_method : string
sex detection method code: "X", "Y", "XY" and "YX supported
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd briefly describe the behavior of each method here, e.g., to help understand when "XY" should be used vs. "X".

percentage Y SNPs that are not null; above this threshold, Male is determined
chrom : {"X", "Y"}
use X or Y chromosome SNPs to determine sex
def determine_sex(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only called from the sex property and it takes no parameters, this could be a private method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants