Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join two tables by index range but table length increased #87

Open
garyzhubc opened this issue Dec 22, 2023 · 0 comments
Open

Join two tables by index range but table length increased #87

garyzhubc opened this issue Dec 22, 2023 · 0 comments

Comments

@garyzhubc
Copy link

I'm trying to follow this answer that join two tables with range: https://stackoverflow.com/a/46341899/6636572

I want to join two tables where one has some ranges and another are numbers and I want to annotate the numbers by matching which range those numbers are in from another dataframe.

However the length of the table surprisingly increased. I expect it to stay the same as 2832. What's happening? How to trouble shoot this? The result doesn't change if I use fuzzy_left_join

> head(gene_list_selected)
  chr start_pos  end_pos  gene_name
1   1  55013806 55100417     ACOT11
2   1  55074849 55089200    FAM151A
3   1  55107412 55175940      MROH7
4   1  55107412 55208328 MROH7-TTC4
5   1  55181494 55208328       TTC4
6   1  55222570 55230226      PARS2
> head(df)
            rsid      pos
1 1:55013860:C:T 55013860
2 1:55013957:G:A 55013957
3 1:55014013:C:T 55014013
4 1:55014095:C:T 55014095
5 1:55014099:C:T 55014099
6 1:55014100:G:A 55014100
> nrow(gene_list_selected)
[1] 21
> nrow(df)
[1] 2832
> df_with_gene_name<-df %>% 
+     fuzzy_inner_join(gene_list_selected, by = c("pos"="start_pos","pos"="end_pos"), match_fun = list(`>=`, `<=`)) 
> 
> nrow(df_with_gene_name)
[1] 3298
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant