Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Would be great for geo_join to use k-d trees #39

Open
dgrtwo opened this issue Feb 8, 2018 · 0 comments
Open

Would be great for geo_join to use k-d trees #39

dgrtwo opened this issue Feb 8, 2018 · 0 comments

Comments

@dgrtwo
Copy link
Owner

dgrtwo commented Feb 8, 2018

geo_join does a full m*n comparison of points. This works for moderately sized datasets but becomes intractable in both time and memory.

The right geo_join() implementation would likely use k-d trees to build a spatial index of one table, then look up points from the other, while taking advantage of the limited max_dist argument. This would turn it (on average) from O(m*n) time to O(m*log(n)) time. (Similar approaches could be used for functions such as distance_join, and on a related note b-k trees could be used for Levenshtein distance).

I may or may not have time to implement this but it would be a top-notch pull request.

@dgrtwo dgrtwo changed the title Would be great for geo_join to use kd trees Would be great for geo_join to use k-d trees Feb 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant