Fuzzy searching, all-in-SQLite version #103
Thanks! I'll play with this shortly :) |
Sorry for the delay :) This seems plenty fast enough. It'd be really nice if it could prioritize longer matches somehow, though. ...I'm not sure how best to do that using the |
No worries, there's a lot happening right now 😬 I'll look at fixing up the highlighting soon but here's how you'd prioritize longer matches in general, for results having the same rank: add length(cmd) DESC as a second sort key after rank DESC in the query's ORDER BY.
However, I don't think it's a good idea. Especially with fuzzy searching, shorter matches get filtered out naturally as you continue to type, so prioritizing longer matches makes the shorter all but unreachable by fuzzy searching alone.
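If anything, I'd want the opposite: typing cruns matches both cargo run -- search --fuzzy and cargo run -- search, and I'd want cargo run -- search to be the top result until you entered an f or a z. |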
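To make the "filtered out as you continue to type" point concrete, here is a minimal sketch of subsequence-style fuzzy matching. Both function names are hypothetical, and the idea that the :like parameter is built by putting % around every typed character is an assumption, not something confirmed in this thread. The matcher is case-sensitive, whereas SQLite's LIKE ignores ASCII case by default, and the pattern builder does not escape % or _ in the query.

```rust
/// Hypothetical helper: turns the typed text into a LIKE pattern that
/// allows any run of characters between the typed ones.
/// Assumption: this is one plausible way to fill the `:like` parameter.
fn fuzzy_like_pattern(query: &str) -> String {
    let mut pattern = String::from("%");
    for ch in query.chars() {
        pattern.push(ch);
        pattern.push('%');
    }
    pattern
}

/// Hypothetical in-memory equivalent of that pattern: true when every
/// character of `query` occurs in `cmd` in order, not necessarily adjacent.
fn is_fuzzy_match(query: &str, cmd: &str) -> bool {
    let mut rest = cmd.chars();
    // `any` advances `rest` past the matched character, so each query
    // character is searched for only in what remains of `cmd`.
    query.chars().all(|q| rest.any(|c| c == q))
}
```

With these, cruns matches both cargo run -- search and cargo run -- search --fuzzy, while crunsf only matches the second, which is why the shorter command drops out on its own once an f is typed.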
Agreed that shorter commands (or longer contiguous substring matches?) would be better. Since the rank from McFly is a float, though, I think the length would almost never be used in your implementation.
On Thu, Oct 15, 2020 at 6:11 AM Dian Fay ***@***.***> wrote:
No worries, there's a lot happening right now 😬
I'll look at fixing up the highlighting soon but here's how you'd
prioritize longer matches in general, for results having the same rank:
diff --git a/src/history/history.rs b/src/history/history.rs
index 6c2eca1..cea1d2f 100644
--- a/src/history/history.rs
+++ b/src/history/history.rs
@@ -247,7 +247,8 @@ impl History {
                   selected_occurrences_factor, occurrences_factor
                   FROM contextual_commands
                   WHERE cmd LIKE (:like)
-                  ORDER BY rank DESC LIMIT :limit";
+                  ORDER BY rank DESC, length(cmd) DESC
+                  LIMIT :limit";
         let mut statement = self
             .connection
             .prepare(query)
However, I don't think it's a good idea. Especially with fuzzy searching,
shorter matches get filtered out naturally as you continue to type, so
prioritizing longer matches makes the shorter all but unreachable by fuzzy
searching alone.
$ cruns
cargo run -- search --fuzzy
cargo run -- search
If anything, I'd want the opposite, so cargo run -- search would be the
top result until you entered an f or a z.
|
Serves me right for not checking; you could order by rank rounded to int, then length (ascending), then the full rank, but that's getting a bit kludgy. Weighting contiguous characters higher would be a good next step after basic fuzzy searching. That's beyond the capabilities of SQLite so it'd have to happen after results are returned. Results near the bottom could have worse quality than those that didn't get in under the |
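The three-key ordering described above (rounded rank, then length ascending, then full rank) can be sketched in memory as follows. The Candidate struct, the function name, and the ranks used with it are hypothetical; in SQL the same idea would be roughly ORDER BY round(rank) DESC, length(cmd) ASC, rank DESC, which is an illustration rather than the query from this PR.

```rust
/// Hypothetical stand-in for one row returned by the history query.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    cmd: String,
    rank: f64,
}

/// Sorts by rank rounded to the nearest integer (descending), then by
/// command length in characters (ascending), then by the full rank
/// (descending) as the final tiebreaker.
fn sort_candidates(candidates: &mut [Candidate]) {
    candidates.sort_by(|a, b| {
        b.rank
            .round()
            .total_cmp(&a.rank.round())
            .then_with(|| a.cmd.chars().count().cmp(&b.cmd.chars().count()))
            .then_with(|| b.rank.total_cmp(&a.rank))
    });
}
```

Rounding puts nearby float ranks into the same bucket so that length actually gets a say, which is the problem raised about float ranks almost never tying exactly.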
I'd be fine merging this as-is if we could fix highlighting, and then we can always work on improving the sorting. |
This implementation could be smartened up a little bit -- right now |
Fuzzy matches are now weighted by length, so a lower-ranked but shorter match has a chance to come in above a higher-ranked but longer match. For example, with the search text
The difference in the two match lengths (18 vs 111 characters) is factored into the weight calculation. Here the real ssh command gets a rank bump of +0.86 while the longer string's rank is increased only by +0.14, so the ssh command can overcome an unweighted rank disparity of up to 0.72. If the longer match were instead, say, 30 characters, the ssh command would gain +0.63 while the longer string would gain +0.38, so the ssh command's unweighted rank would have to be within 0.25 of the longer string's to jump ahead. |
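The quoted figures are consistent with each match receiving a bonus equal to the other match's share of the combined length, e.g. 111 / (18 + 111) ≈ 0.86 and 18 / (18 + 111) ≈ 0.14. The sketch below only reproduces that two-match arithmetic. The formula is inferred from the numbers in this comment, not taken from the PR's code, and both function names are hypothetical; how the real weighting extends to more than two results is not covered here.

```rust
/// Hypothetical two-match length bonus: the shorter a match is relative
/// to the other one, the larger its bonus. The two bonuses sum to 1.
/// Assumption: inferred from the figures in the comment above.
fn length_bonus(match_len: usize, other_len: usize) -> f64 {
    let total = match_len + other_len;
    if total == 0 {
        return 0.0; // avoid dividing by zero for two empty strings
    }
    other_len as f64 / total as f64
}

/// True when the shorter match ends up ahead once both bonuses are added
/// to the unweighted ranks.
fn shorter_overtakes(short_rank: f64, short_len: usize, long_rank: f64, long_len: usize) -> bool {
    short_rank + length_bonus(short_len, long_len) > long_rank + length_bonus(long_len, short_len)
}
```

For 18 vs 111 characters the bonuses come out near 0.86 and 0.14; for 18 vs 30 they are 0.625 and 0.375, which round to the +0.63 and +0.38 quoted, leaving a gap of 0.25.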
I think this is a very sensible approach. Ideally, it'd be nice to empirically derive some sort of weighting for fuzzy matches that takes into account how long the character runs are, how many different sections it's matching, etc., but to do this, I think we'd need a bunch of training data that we don't have. For the time being, this seems fine if it's working well for you in practice. |
I'll install this locally and start using it in fuzzy mode as well. The code looks good. Would you mind running |
Updated with formatting. I've been using it locally and am pretty happy with it; there's definitely room for further improvements but it's already quite useful. |
Been working for me! Thanks for the contribution :) |
Released in v0.5.1 :) |
A quarter-second faster than the skim version at ~100k entries; the one thing this needs is highlight detection, which still only looks for a contiguous match.
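As a rough idea of what non-contiguous highlight detection could look like, the sketch below finds the position of each typed character with a greedy left-to-right scan. The function name is hypothetical and this is not the highlighting code from the PR. It returns byte offsets into the command, and it always takes the earliest possible occurrence of each character, which is not necessarily the most readable highlight.

```rust
/// Hypothetical helper: returns the byte offset in `cmd` of each character
/// of `query`, matched in order and greedily (earliest occurrence first).
/// Returns None when `query` is not a subsequence of `cmd`.
fn fuzzy_match_indices(query: &str, cmd: &str) -> Option<Vec<usize>> {
    let mut indices = Vec::with_capacity(query.chars().count());
    let mut haystack = cmd.char_indices();
    for q in query.chars() {
        // `find` consumes the iterator up to and including the hit, so the
        // next query character is only searched for after this position.
        let (idx, _) = haystack.find(|&(_, c)| c == q)?;
        indices.push(idx);
    }
    Some(indices)
}
```

For the query cruns against cargo run -- search this yields offsets 0, 2, 7, 8 and 13: the c and r of cargo, the u and n of run, and the s of search.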