Naming distance_col when matching along multiple variables #84

Open
spspitze opened this issue Mar 9, 2022 · 0 comments

spspitze commented Mar 9, 2022

I'm experimenting with matching on n variables (e.g., x1 and x2) and want to keep track of the distance for each variable (distance_col = "distance"). This works, but the resulting data frame contains n + 1 distance columns: one per variable, named with the corresponding prefix (e.g., x1.distance), plus the original distance column, which contains only NAs. It would be nice if that all-NA column were dropped automatically.

library(tidyverse)
library(fuzzyjoin)

ex_1 <- tibble(
  x1 = c("how", "now", "brown", "cow"),
  x2 = c("what", "do", "I", "know")
)

ex_2 <- tibble(
  x1 = c("hw", "nw", "brwn", "cw"),
  x2 = c("wht", "d", "I", "knw")
)

stringdist_inner_join(ex_1, ex_2, by = c("x1", "x2"),
                      method = "lv",
                      distance_col = "distance")
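
Until something like this is handled by the package, one possible workaround is to drop the all-NA column after the join. The sketch below is only an illustration: `drop_all_na_cols` is a hypothetical helper, not part of fuzzyjoin, and `joined` is a hand-built stand-in for the join result. Its `.x`/`.y` column names are an assumption, and only the `x1.distance`, `x2.distance`, and `distance` columns come from the description above.

```r
# Hypothetical helper (not part of fuzzyjoin): drop every column
# whose values are all NA, keeping the remaining columns in order.
drop_all_na_cols <- function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
}

# Stand-in for the join result. The .x/.y names are assumed; the three
# distance columns follow the layout described in this issue.
joined <- data.frame(
  x1.x = c("how", "brown"),
  x2.x = c("what", "I"),
  x1.y = c("hw", "brwn"),
  x2.y = c("wht", "I"),
  x1.distance = c(1, 1),   # Levenshtein distances for x1
  x2.distance = c(1, 0),   # Levenshtein distances for x2
  distance = c(NA_real_, NA_real_)  # the all-NA column to remove
)

cleaned <- drop_all_na_cols(joined)
names(cleaned)
```

The same helper could be applied directly to the output of `stringdist_inner_join()`. Note that it removes any column that is entirely NA, not only `distance`, so it may be too broad for data that legitimately contains such columns.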