Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pedigreeCross fails with half founders #131

Open
AudreyAAMartin opened this issue Apr 11, 2023 · 3 comments
Open

pedigreeCross fails with half founders #131

AudreyAAMartin opened this issue Apr 11, 2023 · 3 comments

Comments

@AudreyAAMartin
Copy link
Contributor

This is an adapted example from the pedigreeCross() - we have added half-founders (individual 3 has missing mother, while individual 4 has a missing father)

> founderPop = quickHaplo(nInd=4, nChr=1, segSites=10)
> id = 1:12
> mother = c(0,0,0,1,1,3:9)
> father = c(0,0,2,0,2,3:9)
> pop2 = pedigreeCross(pop, id, mother, father, simParam=SP)
Error in .local(x, i, ...) : Trying to select invalid individuals
@gregorgorjanc
Copy link
Contributor

The issue is that when we have a partial pedigree, we get individuals that have one parent known and one unknown. For founders (both parents unknown) we need to create one founding genome and we assign that genome to the founder. For the half founders, we need to create one parent genome for the unknown parent, hence one more founder for every individual that has only mother missing and one more founder for every individual that has only father missing - and then we cross these additional founders with full founders or newly generated individuals so that we can generate genome of these half founders.

This issue is overlooked in the current implementation - probably due to limited testing. @AudreyAAMartin is working with real (=messy, partial) pedigree and found this issue.

I have drafted an implementation that handles the above logic (requiring one extra founder for every half founder in addition to full founders) at gregorgorjanc@2190026, though it currently does not work with matchID=TRUE. I can see how to accommodate this, but would like to get feedback from you @gaynorr before I move onward.

@gregorgorjanc
Copy link
Contributor

I have switched off random shuffle of founders as I think this can just create confusion - founder genomes are sampled anyway (hence there is no real order there), so best to avoid this.

gregorgorjanc@c1eae76

I get this now using the above

#Incomplete pedigree case (using matchID=FALSE)
founderPop = quickHaplo(nInd=8, nChr=1, segSites=10)
SP = SimParam$new(founderPop)$setTrackRec(TRUE)
pop = newPop(founderPop, simParam=SP)
id = 1:10
mother = c(0,0,1,0,1,3,3,0,0,NA)
father = c(0,0,0,2,2,4,0,4,0,NA)
pop2 = pedigreeCross(pop, id, mother, father, simParam=SP)
pullIbdHaplo(pop2)

with this output

     1_1 1_2 1_3 1_4 1_5 1_6 1_7 1_8 1_9 1_10
1_1    1   1   1   1   1   1   1   1   1    1
1_2    2   2   2   2   2   2   2   2   2    2
2_1    3   3   3   3   3   3   3   3   3    3
2_2    4   4   4   4   4   4   4   4   4    4
3_1    1   1   2   2   2   2   1   1   1    1
3_2   14  14  14  13  13  13  13  13  13   13
4_1   10  10  10  10  10  10  10  10  10   10
4_2    3   3   3   3   3   3   3   3   3    3
5_1    2   2   1   1   2   2   2   1   1    1
5_2    3   3   3   3   3   3   3   3   3    3
6_1    1   1   2   2   2   2   1   1   1    1
6_2    3   3   3   3   3  10  10  10  10   10
7_1   14  14  14   2   2   2   1   1   1    1
7_2   15  15  15  15  16  15  15  15  15   15
8_1   11  11  12  12  11  11  11  11  11   11
8_2    3   3   3   3   3   3   3   3   3    3
9_1    5   5   5   5   5   5   5   5   5    5
9_2    6   6   6   6   6   6   6   6   6    6
10_1   7   7   7   7   7   7   7   7   7    7
10_2   8   8   8   8   8   8   8   8   8    8

@gaynorr
Copy link
Owner

gaynorr commented Apr 12, 2023

This relates to #79. As Gregor said, pedigreeCross is pretty fragile because it hasn't been well tested. There are probably some other cases where the function breaks.

The intent of the random shuffling was to deal with imported real data. The idea was that this would make performing multiple replicates of a gene drop experiment easier when you were using matchID=FALSE. I wasn't really considering using it on simulated data. It's one of those functions that was developed for a very specific use case and only made available because it could potentially be more widely used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants