genome_cluster may have a bug. #5

Closed
wataken1127 opened this issue May 22, 2019 · 4 comments

wataken1127 commented May 22, 2019

Thank you for your great package.

genome_cluster does not work correctly when the ranges have different numbers of digits.
For example,

x2 <- data.frame(id = 1:3, bla=letters[1:3],
                 chromosome = c("chr1", "chr1", "chr1"),
                 start = c(1696, 2846, 945),
                 end = c(1700, 2850, 946))
genome_cluster(x2, by=c("chromosome", "start", "end"))

does not work.
The cluster_id of "a" and "c" is 0, and that of "b" is 1.
(They should be 0, 1, and 2, right?)

I guess genome_cluster cannot distinguish ranges with different numbers of digits, like 1700 and 945.
Do you have any idea what might be causing this?

Thanks,
Kentaro

@const-ae
Owner

Hi Kentaro,

You are absolutely right, that is a bug, and I am very sorry if it caused you any inconvenience.
The problem is not related to the number of digits, but seems to be related to the order of the ranges. I am pretty sure the offending function is cluster_interval.

I will try to fix this as soon as possible, but I haven't looked into this code for about two years, so it might take me a day or two to figure out what is going on.

I will comment again once I understand the cause.

Best Regards,
Constantin

@wataken1127
Author

Dear Constantin,
Thank you for your prompt reply.
I will wait for the update.

Regards,
Kentaro

const-ae added a commit that referenced this issue May 22, 2019
Initialize prev_end with smallest possible value and then count upwards
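
The idea in the commit message can be illustrated with a minimal sketch. This is not the package's actual cluster_interval code; the function name cluster_intervals_sketch is hypothetical, it handles a single chromosome only, and it assumes that a range opens a new cluster when its start lies strictly beyond the largest end seen so far.

```r
# Hypothetical helper for illustration only; not part of tidygenomics.
# Assigns a cluster id to each interval on a single chromosome.
cluster_intervals_sketch <- function(start, end) {
  ord <- order(start, end)       # visit intervals from left to right
  ids <- integer(length(start))  # results are stored in the original row order
  prev_end <- -Inf               # smallest possible value: the first interval always opens a cluster
  current_id <- -1L              # becomes 0 when the first interval is visited
  for (i in ord) {
    if (start[i] > prev_end) {
      # Assumption: a gap before this interval starts a new cluster
      current_id <- current_id + 1L
    }
    ids[i] <- current_id
    prev_end <- max(prev_end, end[i])  # track the furthest end seen so far
  }
  ids
}

# The three ranges from the report, in their original (unsorted) order
cluster_intervals_sketch(start = c(1696, 2846, 945), end = c(1700, 2850, 946))
#> [1] 1 2 0
```

Because the intervals are visited in sorted order while the ids are written back by original row index, the result does not depend on the order of the input rows.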
@const-ae
Owner

I have fixed the issue; the clustering now returns the expected numbers:

library(tidygenomics)
x2 <- data.frame(id = 1:3, bla=letters[1:3],
                 chromosome = c("chr1", "chr1", "chr1"),
                 start = c(1696, 2846, 945),
                 end = c(1700, 2850, 946))
genome_cluster(x2, by=c("chromosome", "start", "end"))
#> # A tibble: 3 x 6
#>      id bla   chromosome start   end cluster_id
#>   <int> <fct> <fct>      <dbl> <dbl>      <dbl>
#> 1     1 a     chr1        1696  1700          1
#> 2     2 b     chr1        2846  2850          2
#> 3     3 c     chr1         945   946          0

Created on 2019-05-23 by the reprex package (v0.2.1)

To get the latest version, install the package from GitHub:

devtools::install_github("const-ae/tidygenomics")

I will try to push the updated version to CRAN within the next few days. Thank you again for raising the issue.

@const-ae
Owner

The fixed version is on CRAN now.
