genome_cluster may have a bug. #5

wataken1127 · 2019-05-22T06:34:24Z

Thank you for your great package.

genome_cluster does not work correctly when the ranges have different numbers of digits.
For example,

library(tidygenomics)
x2 <- data.frame(id = 1:3, bla = letters[1:3],
                 chromosome = c("chr1", "chr1", "chr1"),
                 start = c(1696, 2846, 945),
                 end = c(1700, 2850, 946))
genome_cluster(x2, by = c("chromosome", "start", "end"))

This does not work:
the cluster_id of "a" and "c" is 0, and that of "b" is 1.
(It should be 0, 1, and 2, right?)

I guess genome_cluster cannot distinguish ranges with different numbers of digits, such as 1700 and 945.
Do you have any idea how to fix this?

Thanks,
Kentaro

const-ae · 2019-05-22T08:25:03Z

Hi Kentaro,

You are absolutely right, that is a bug, and I am very sorry if it caused you any inconvenience.
The problem is not related to the number of digits, but seems to be related to the order of the ranges. I am pretty sure the offending method is cluster_interval.

I will try to fix this as soon as possible, but I haven't looked at this code for about two years, so it might take me a day or two to figure out what is happening there.

I will comment again when I understand what is going on.

Best Regards,
Constantin

wataken1127 · 2019-05-22T08:34:40Z

Dear Constantin,
Thank you for your prompt reply.
I will wait for the update.

Regards,
Kentaro


const-ae · 2019-05-23T09:41:38Z

I have fixed the issue; the clustering now returns the expected cluster IDs:

library(tidygenomics)
x2 <- data.frame(id = 1:3, bla=letters[1:3],
                 chromosome = c("chr1", "chr1", "chr1"),
                 start = c(1696, 2846, 945),
                 end = c(1700, 2850, 946))
genome_cluster(x2, by=c("chromosome", "start", "end"))
#> # A tibble: 3 x 6
#>      id bla   chromosome start   end cluster_id
#>   <int> <fct> <fct>      <dbl> <dbl>      <dbl>
#> 1     1 a     chr1        1696  1700          1
#> 2     2 b     chr1        2846  2850          2
#> 3     3 c     chr1         945   946          0

Created on 2019-05-23 by the reprex package (v0.2.1)
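For anyone curious about how the fix works: the commit message "Initialize prev_end with smallest possible value and then count upwards" points to a sweep over intervals sorted by start position. Below is a minimal base R sketch under that assumption. `cluster_intervals_sketch` is a hypothetical helper, not the actual tidygenomics code; it handles a single chromosome only and assumes closed intervals, so ranges that merely touch end to start are put in the same cluster.

```r
# Hypothetical helper illustrating a sorted sweep; NOT the tidygenomics implementation.
# Assumes all intervals lie on one chromosome and are closed [start, end].
cluster_intervals_sketch <- function(start, end) {
  ord <- order(start)                 # visit intervals from left to right
  cluster_id <- integer(length(start))
  prev_end <- -Inf                    # smallest possible value: the first interval always opens a cluster
  current <- -1L                      # so the first cluster gets ID 0
  for (i in ord) {
    if (start[i] > prev_end) {
      current <- current + 1L         # gap after the previous interval: start a new cluster
      prev_end <- end[i]
    } else {
      prev_end <- max(prev_end, end[i])  # overlap: extend the current cluster
    }
    cluster_id[i] <- current          # write the ID back in the original row order
  }
  cluster_id
}

cluster_intervals_sketch(start = c(1696, 2846, 945), end = c(1700, 2850, 946))
```

Applied to the start and end columns of x2, the sketch returns c(1, 2, 0), the same IDs as the cluster_id column above: sorting first is what lets a single left-to-right pass assign the IDs correctly regardless of the input row order.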

To get the latest version, install the package from GitHub:

devtools::install_github("const-ae/tidygenomics")

I will try to push the updated version to CRAN within the next few days. Thank you again for raising the issue.

const-ae · 2019-05-27T17:31:59Z

The fixed version is on CRAN now.

const-ae added a commit that referenced this issue May 22, 2019

Fix issue #5

8b8320a

Initialize prev_end with smallest possible value and then count upwards

const-ae closed this as completed May 23, 2019
