Hi. I was able to deploy LSHSuperBitNNDescentTextExample successfully on our Spark cluster. I really like the idea of pre-calculating the string profiles via ks.getProfile, and performance is good.
I am now testing NNCTPHExample and trying to feed NNCTPH the pre-calculated string profiles. Unfortunately, it seems that the NNCTPH constructor and .setSimilarity only accept String. Could NNCTPH be made to accept StringProfile or SparseIntegerVector? It is a lot slower than LSHSuperBitNNDescentTextExample, and I suspect it has to recalculate the profiles at every comparison. I also replaced Jaro-Winkler with the cheaper Jaccard index, which improved performance slightly.
thiakx changed the title from "NNCTPH take in StringProfile or SparseIntegerVector?" to "Make NNCTPH take in StringProfile or SparseIntegerVector?" on Jun 21, 2016
Your idea is good, but NNCTPH is currently not compatible with this approach:
NNCTPH requires a plain String as input so that it can compute a hash and bin the data into different buckets, whereas you would like to compute similarity between the profile representations of these strings.
One solution would be to refactor NNCTPH so that it takes an interface as input (instead of the Node class). I will run some tests and keep you informed...
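To make the interface idea concrete, here is a minimal, self-contained sketch. It is not the library's API: the `Hashable` interface, the `ProfiledString` class, and the `shingles` and `jaccard` helpers are all hypothetical names invented for illustration. The assumption is that the index only needs the raw string for hashing and bucketing, while the similarity function can work on a profile computed once per string.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ProfiledInputSketch {

    /** Hypothetical interface: the only thing the bucketing step needs from its input. */
    interface Hashable {
        String hashKey();
    }

    /** Hypothetical value type: keeps the raw string and its k-shingle profile, computed once. */
    static final class ProfiledString implements Hashable {
        final String raw;
        final Map<String, Integer> profile;

        ProfiledString(String raw, int k) {
            this.raw = raw;
            this.profile = shingles(raw, k); // computed here, never again
        }

        @Override
        public String hashKey() {
            return raw; // the hashing step still sees the plain string
        }
    }

    /** Counts every substring of length k in s. */
    static Map<String, Integer> shingles(String s, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= s.length(); i++) {
            counts.merge(s.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    /** Jaccard index over the sets of shingles, using only the cached profiles. */
    static double jaccard(ProfiledString a, ProfiledString b) {
        Set<String> union = new HashSet<>(a.profile.keySet());
        union.addAll(b.profile.keySet());
        if (union.isEmpty()) {
            return 1.0; // two strings with no shingles are treated as identical
        }
        Set<String> intersection = new HashSet<>(a.profile.keySet());
        intersection.retainAll(b.profile.keySet());
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        ProfiledString a = new ProfiledString("abcd", 2);
        ProfiledString b = new ProfiledString("abce", 2);
        System.out.println(a.hashKey() + " vs " + b.hashKey() + ": " + jaccard(a, b));
    }
}
```

With this shape, the hashing code would call `hashKey()` and the similarity callback would receive the whole object, so profiles are not rebuilt at every comparison. Whether NNCTPH can be refactored this way depends on how tightly it is coupled to the Node class.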