In the newest version of fuzzyjoin, when joining data.tables, they lose the data.table attribute #75

Open
emilBeBri opened this issue Oct 30, 2020 · 0 comments

In the new version of fuzzyjoin, the result of joining data.tables is no longer a data.table.

I just updated to R 4.*, and as a result a lot of packages were updated as well. I had completely reinstalled my OS, so I don't know which versions I had before. With the new versions, joining data.tables with fuzzyjoin suddenly became a problem whenever later code relied on data.table syntax.

reprex:

library(data.table)
library(fuzzyjoin)
a1 <- data.table(name=c('suzy', 'suxy', 'John', 'Janni', 'Tom'))
b1 <- data.table(name=c('suzzy', 'johnn', 'Jannice', 'Tom'))
c1 <- stringdist_inner_join(a1, b1, by = 'name', method = 'lv', max_dist = 1,
                            ignore_case = TRUE, distance_col = 'fuzzy_dist')
is.data.table(c1)  # FALSE: the join result is no longer a data.table

You can easily restore the data.table class with:

setDT(c1)
is.data.table(c1)

So it's easy to work around, but it broke some matching functions I had written that relied on data.table syntax after stringdist_inner_join() was applied.
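
In case it helps anyone hitting the same thing, here is a minimal sketch of a wrapper that converts the join result back to a data.table when the first input was one. The name with_dt_class is hypothetical (it is not part of fuzzyjoin or data.table), and it uses as.data.table() rather than setDT(), so it makes a copy:

```r
library(data.table)

# Hypothetical helper, not part of fuzzyjoin: call any join function and
# convert the result back to a data.table if the first input was one.
with_dt_class <- function(join_fun, x, y, ...) {
  res <- join_fun(x, y, ...)
  if (is.data.table(x) && !is.data.table(res)) {
    # as.data.table() copies; setDT() would convert by reference instead
    res <- as.data.table(res)
  }
  res
}
```

With the reprex above, this would be called as with_dt_class(stringdist_inner_join, a1, b1, by = 'name', method = 'lv', max_dist = 1), assuming fuzzyjoin is loaded.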

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8    
 [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8   
 [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] fuzzyjoin_0.1.6   data.table_1.13.2

loaded via a namespace (and not attached):
 [1] stringdist_0.9.6.3 tidyr_1.1.2        crayon_1.3.4.9000  dplyr_1.0.2       
 [5] R6_2.4.1           lifecycle_0.2.0    magrittr_1.5       pillar_1.4.6      
 [9] stringi_1.5.3      rlang_0.4.8        vctrs_0.3.4        generics_0.0.2    
[13] ellipsis_0.3.1     tools_4.0.3        stringr_1.4.0      glue_1.4.2        
[17] purrr_0.3.4        parallel_4.0.3     compiler_4.0.3     pkgconfig_2.0.3   
[21] tidyselect_1.1.0   tibble_3.0.4 