patch 9.1.1627: fuzzy matching can be improved
Problem: fuzzy-matching can be improved Solution: Implement a better fuzzy matching algorithm (Girish Palya) Replace fuzzy matching algorithm with improved fzy-based implementation The [current](https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/) fuzzy matching algorithm has several accuracy issues: * It struggles with CamelCase * It fails to prioritize matches at the beginning of strings, often ranking middle matches higher. After evaluating alternatives (see my comments [here](https://github.com/vim/vim/issues/17531#issuecomment-3112046897) and [here](https://github.com/vim/vim/issues/17531#issuecomment-3121593900)), I chose to adopt the [fzy](https://github.com/jhawthorn/fzy) algorithm, which: * Resolves the aforementioned issues. * Performs better. Implementation details This version is based on the original fzy [algorithm](https://github.com/jhawthorn/fzy/blob/master/src/match.c), with one key enhancement: **multibyte character support**. * The original implementation supports only ASCII. * This patch replaces ascii lookup tables with function calls, making it compatible with multibyte character sets. * Core logic (`match_row()` and `match_positions()`) remains faithful to the original, but now operates on codepoints rather than single-byte characters. Performance Tested against a dataset of **90,000 Linux kernel filenames**. Results (in milliseconds) show a **\~2x performance improvement** over the current fuzzy matching algorithm. ``` Search String Current Algo FZY Algo ------------------------------------------------- init 131.759 66.916 main 83.688 40.861 sig 98.348 39.699 index 109.222 30.738 ab 72.222 44.357 cd 83.036 54.739 a 58.94 62.242 b 43.612 43.442 c 64.39 67.442 k 40.585 36.371 z 34.708 22.781 w 38.033 30.109 cpa 82.596 38.116 arz 84.251 23.964 zzzz 35.823 22.75 dimag 110.686 29.646 xa 43.188 29.199 nha 73.953 31.001 nedax 94.775 29.568 dbue 79.846 25.902 fp 46.826 31.641 tr 90.951 55.883 kw 38.875 23.194 rp 101.575 55.775 kkkkkkkkkkkkkkkkkkkkkkkkkkkkk 48.519 30.921 ``` ```vim vim9script var haystack = readfile('/Users/gp/linux.files') var needles = ['init', 'main', 'sig', 'index', 'ab', 'cd', 'a', 'b', 'c', 'k', 'z', 'w', 'cpa', 'arz', 'zzzz', 'dimag', 'xa', 'nha', 'nedax', 'dbue', 'fp', 'tr', 'kw', 'rp', 'kkkkkkkkkkkkkkkkkkkkkkkkkkkkk'] for needle in needles var start = reltime() var tmp = matchfuzzy(haystack, needle) echom $'{needle}' (start->reltime()->reltimefloat() * 1000) endfor ``` Additional changes * Removed the "camelcase" option from both matchfuzzy() and matchfuzzypos(), as it's now obsolete with the improved algorithm. related: neovim/neovim#34101 fixes #17531 closes: #17900 Signed-off-by: Girish Palya <girishji@gmail.com> Signed-off-by: Christian Brabandt <cb@256bit.org>
This commit is contained in:
committed by
Christian Brabandt
parent
5ba6e41d37
commit
7e0df5eee9
@ -41723,6 +41723,8 @@ Functions: ~
|
||||
- Add the optional {opts} |Dict| argument to |getchar()| to control: cursor
|
||||
behaviour, return type and whether or not to simplify the returned key
|
||||
- |chdir()| allows to optionally specify a scope argument
|
||||
- |matchfuzzy()| and |matchfuzzypos()| use an improved fuzzy matching
|
||||
algorithm (same as fzy).
|
||||
|
||||
Others: ~
|
||||
- the regex engines match correctly case-insensitive multi-byte characters
|
||||
|
Reference in New Issue
Block a user