patch 9.1.1276: inline word diff treats multibyte chars as word char

Problem:  inline word diff treats multibyte chars as word char
          (after 9.1.1243)
Solution: treat all non-alphanumeric characters as non-word characters
          (Yee Cheng Chin)

Previously, inline word diff simply used Vim's definition of a keyword
to determine what a word is. As a result, multi-byte character classes
such as emoji and CJK (Chinese/Japanese/Korean) characters were all
classified as word characters, so entire sentences were grouped as a
single word, which does not provide meaningful information in a diff
highlight.

Fix this by treating all non-alphanumeric characters (with class number
above 2) as non-word characters, as there is usually no benefit in using
word diff on them. These include CJK characters, emojis, and also
subscript/superscript numbers. Meanwhile, multi-byte characters such as
Cyrillic and Greek letters continue to be considered word characters.
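
The rule above can be illustrated with a small sketch. This is not Vim's
implementation: char_class() is a hypothetical stand-in for Vim's
character class lookup, the code point ranges are a rough approximation
chosen for the example, and emitting blanks as their own tokens is an
assumption of the sketch. Only class 2 characters are grouped into
words; every other character becomes an individual token.

```python
def char_class(ch: str) -> int:
    """Hypothetical, simplified character classes (not Vim's real table).

    0 = blank, 1 = punctuation, 2 = alphanumeric word character,
    anything above 2 = a separate multi-byte class.
    """
    cp = ord(ch)
    if ch.isspace():
        return 0
    # Range checks come first: CJK ideographs also satisfy str.isalnum().
    if 0x4E00 <= cp <= 0x9FFF:
        return 0x4E00  # CJK ideographs (approximate range)
    if 0x1F300 <= cp <= 0x1FAFF:
        return 3  # emoji (approximate range)
    if 0x2070 <= cp <= 0x209F:
        return 0x2070  # superscripts and subscripts
    if ch.isalnum() or ch == "_":
        return 2  # includes Latin, Greek, and Cyrillic letters
    return 1


def split_inline_words(text: str) -> list[str]:
    """Group runs of class 2 characters; emit every other character alone."""
    tokens = []
    word = ""
    for ch in text:
        if char_class(ch) == 2:
            word += ch
            continue
        if word:
            tokens.append(word)
            word = ""
        tokens.append(ch)
    if word:
        tokens.append(word)
    return tokens


print(split_inline_words("foo 日本語 бар😀"))
```

With this rule, a CJK sentence is split into one token per character
instead of being compared as a single word, while "бар" stays together.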

Note that this is slightly inconsistent with how words are defined
elsewhere, as Vim usually considers any character with class >=2 to be
a "word".

related: #16881 (diff inline highlight)
closes: #17050

Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>
Yee Cheng Chin
2025-04-04 19:16:21 +02:00
committed by Christian Brabandt
parent b8d5c85099
commit 9aa120f7ad
6 changed files with 43 additions and 6 deletions

@@ -1,4 +1,4 @@
-*options.txt* For Vim version 9.1. Last change: 2025 Mar 28
+*options.txt* For Vim version 9.1. Last change: 2025 Apr 04
 VIM REFERENCE MANUAL by Bram Moolenaar
@@ -2989,7 +2989,10 @@ A jump table for the options with a short description can be found at |Q_op|.
 difference.
 word Use internal diff to perform a
 |word|-wise diff and highlight the
-difference.
+difference. Non-alphanumeric
+multi-byte characters such as emoji
+and CJK characters are considered
+individual words.
 internal Use the internal diff library. This is
 ignored when 'diffexpr' is set. *E960*