-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cannot use random projection trees #178
Comments
I agree, that's a bug. I think there are some somewhat sensible ways I can fix that, but I'm not sure when I can get to it. |
I just pushed some fixes for this to the master branch. If you have the time to check it out and see if it resolves your problem I would greatly appreciate it. |
I pip-uninstalled the version that I got via pip and installed the version from this repository. Now,
|
Ah yes, I think I see what the problem there would. I think I can fix that; hopefully I'll get to it in the next few days. |
Now, I looked into the code. If I understand correctly, the exception is raised because the code finds the hyperplanes, but their normal vectors are of dimension 1 (which is the dimension of dummy normal vectors Since the dimension of the normal vectors equals the dimension of the data, it seems that changing the signature of Of course, by doing so, we skip the "degenerate-tree" check in However, (why) do we even care whether trees contain more than one node, i.e., at least one hyperplane was found? I played around a bit (using EDIT: Also all the existing pytests pass (but there is no test yet that covers 1D data). |
A yes, I see. The 1D data is going to potentially be an issue. I'll add a test for 1D data and see if I can wring out the last issues. |
With regard to the changes you are proposing here. They seem to make sense, but I admit (without line references) I don't entirely follow exactly what you are proposing. I would certainly welcome a PR with these changes if you can manage it. |
By mistake, I have also included my implementation of (tested in |
Problem description:
My dataset cannot be represented as a
N_INSTANCES x N_DIMENSIONS
table of floats: the instances are rows from the main table in a relational database.Thus, my custom distance measure takes the ids of two instances (
i1
andi2
in the code below) as its arguments, and computes the distance between the corresponding instances by accessing data from many tables (imagine computing the distance between two movies, given the actors that appeared in the movies, the genres of the movies, etc.).This is why random projection trees cannot help: randomly dividing instances according to their ids does not make sense.
This is why I use
pynndescent.NNDescent(..., tree_init=False)
. However, whenindex.prepare()
is called, pynndescent again tries to grow some trees and the minimal working example (below) raises an error (no hyper planes found).An ugly solution:
It turns out that providing examples as
[id, id, ..., id]
instead of[id]
resolves the issue (the more examples we have, the more copies of the id are necessary), but I still think that the behaviour of theprepare()
should be considered a bug.A possible solution [needs to be implemented]:
Maybe we could start the search in a random node, or the closest among random
k
nodes, ork
random nodes.Traceback:
Minimal (not)working example:
The text was updated successfully, but these errors were encountered: