R crashes when doing a difference_left_join on larger datasets #41

Open
pabuta opened this issue Apr 20, 2018 · 2 comments
pabuta commented Apr 20, 2018

I have the following setting:

DatasetA: approximately 20 thousand rows, coordinates are given as latA, lonA
DatasetB: approximately 2 million rows, coordinates are given as latB, lonB

Because the coordinates do not exactly match, I tried the following:

library(dplyr)
library(fuzzyjoin)

DatasetC <- DatasetA %>%
  difference_left_join(DatasetB, by = c("latA" = "latB", "lonA" = "lonB"), max_dist = 2)

This works when I take a sample (e.g. 10%) from DatasetA but repeatedly crashes when using the entire dataset. Did you experience similar behaviour?

@dylanbeaudette

PostGIS or a GEOS-based R package such as spdep or sf would be far more efficient for this kind of operation.
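For anyone who wants to try the sf route, here is a minimal sketch, not a tested recipe for this dataset. It assumes the coordinates are WGS84 longitude/latitude (EPSG:4326) and that the match threshold can be expressed in metres; both are assumptions, since the original call uses max_dist = 2 in raw coordinate units. The helper name join_within_distance and the dist_m argument are made up for illustration.

```r
library(sf)

# Hypothetical helper: left-join every row of `a` to all rows of `b` that lie
# within `dist_m` of it. Column names (latA/lonA, latB/lonB) follow the issue.
join_within_distance <- function(a, b, dist_m) {
  # remove = FALSE keeps the original coordinate columns next to the geometry.
  a_sf <- st_as_sf(a, coords = c("lonA", "latA"), crs = 4326, remove = FALSE)
  b_sf <- st_as_sf(b, coords = c("lonB", "latB"), crs = 4326, remove = FALSE)

  # st_join() passes `dist` on to st_is_within_distance(). With sf's default
  # spherical geometry on longitude/latitude data the distance is in metres.
  st_join(a_sf, b_sf, join = st_is_within_distance, dist = dist_m, left = TRUE)
}
```

Usage would look like DatasetC <- join_within_distance(DatasetA, DatasetB, dist_m = 1000). The result keeps the geometry of DatasetA only; latB and lonB survive as ordinary columns.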

@markbneal

If you like the fuzzy join approach, you could split both data frames by region, run the fuzzy join within each region, and then bind the resulting data frames back together.

#51 (comment)
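A rough sketch of the region-by-region idea above, assuming neither data frame has missing coordinates. Instead of fixed regions it sorts DatasetA, cuts it into chunks, and for each chunk keeps only the rows of DatasetB inside that chunk's bounding box padded by max_dist, so matches near a chunk boundary are not lost. The function name chunked_difference_left_join and the chunk_size argument are hypothetical; only difference_left_join comes from fuzzyjoin.

```r
library(dplyr)
library(fuzzyjoin)

# Hypothetical helper: the same join as in the issue, run one chunk of `a` at a time.
chunked_difference_left_join <- function(a, b, max_dist, chunk_size = 1000) {
  # Sorting keeps each chunk geographically compact, so its bounding box stays small.
  a <- a[order(a$latA, a$lonA), , drop = FALSE]
  chunk_id <- ceiling(seq_len(nrow(a)) / chunk_size)

  pieces <- lapply(split(a, chunk_id), function(a_chunk) {
    # Candidate rows of b: inside the chunk's bounding box padded by max_dist.
    keep <- b$latB >= min(a_chunk$latA) - max_dist &
      b$latB <= max(a_chunk$latA) + max_dist &
      b$lonB >= min(a_chunk$lonA) - max_dist &
      b$lonB <= max(a_chunk$lonA) + max_dist
    b_near <- b[keep, , drop = FALSE]

    if (nrow(b_near) == 0) {
      # No candidates at all: mimic a left join by padding with NA columns from b.
      return(bind_cols(a_chunk, b[rep(NA_integer_, nrow(a_chunk)), , drop = FALSE]))
    }

    difference_left_join(
      a_chunk, b_near,
      by = c("latA" = "latB", "lonA" = "lonB"),
      max_dist = max_dist
    )
  })

  bind_rows(pieces)
}
```

Usage would be DatasetC <- chunked_difference_left_join(DatasetA, DatasetB, max_dist = 2). This only limits how many pairs are compared at once; the result can still be very large if many rows of DatasetB fall within max_dist of each row of DatasetA.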
