adam/vim - vim - Gitea: Git with a cup of tea


Author	SHA1	Message	Date
Christian Brabandt	f2b16986a1	patch 9.1.1258: regexp: max \U and \%U value is limited by INT_MAX Problem: regexp: max \U and \%U value is limited by INT_MAX but gives a confusing error message (related: v8.1.0985). Solution: give a better error message when the value reaches INT_MAX. When searching, Vim allows up to 8 hex characters in the /\U and /\%U regex atoms. However, with "/\UFFFFFFFF" the code point is already above what an integer variable can hold, which is 2,147,483,647. Since patch v8.1.0985, Vim has limited the max codepoint to INT_MAX (otherwise it caused a crash in the NFA regex engine), but instead of erroring out it silently fell back to parsing the number as a backslash value and not as a codepoint value, and as such "/[\UFFFFFFFF]" will happily find a "\" or a literal "F", and "/[\d127-\UFFFFFFFF]" will error out with "reverse range in character class". Interestingly, the max Unicode codepoint value is U+10FFFF, which still fits into an ordinary integer value, which means that we don't even need to parse 8 hex characters; 6 would have been enough. However, let's not limit Vim to searching for at most 6 hex characters (which would be a backward incompatible change), but instead allow all 8 characters and, only if the codepoint reaches INT_MAX, give a more precise error message (about what the max Unicode codepoint value is). This allows searching for "[\U7FFFFFFE]" (will likely return "E486 Pattern not found"), while "[\U7FFFFFFF]" now errors with "E1517: Value too large, max Unicode codepoint is U+10FFFF". While this change is straightforward on architectures where long is 8 bytes, it is not so simple on Windows or 32-bit architectures where long is 4 bytes (and therefore the test fails there). To account for that, let's make use of the vimlong_T number type, make a few corresponding changes in the regex engine code, and cast the value to the expected data type. This however may not work correctly on systems that don't have the long long datatype (e.g. OpenVMS), and the test will probably fail there. fixes: #16949 closes: #16994 Signed-off-by: Christian Brabandt <cb@256bit.org>	2025-03-29 09:08:58 +01:00
Christian Brabandt	22e8e12d9f	patch 9.1.0645: regex: wrong match when searching multi-byte char case-insensitive Problem: regex: wrong match when searching multi-byte char case-insensitive (diffsetter) Solution: Apply proper case-folding for characters and search-string This patch does the following 4 things: 1) When the regexp engine compares two UTF-8 codepoints case-insensitively, it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains. This is not necessarily true: because of case-folding, a multi-byte UTF-8 character can be considered equal to some single-byte value. Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the number of bytes in the pattern), assuming that since it matches, the length of both strings is the same. However, in that case it should only step over the single-byte value 's' by 1 byte and try to start matching after it again. So for the backtracking engine we need to ensure: * we try to match the correct length for the pattern and the text * in case of a match, we step over it correctly There is one tricky thing for the backtracking engine. We also need to correctly calculate the number of bytes to compare in the 2 different UTF-8 strings s1 and s2. So we count the number of characters in s1 that the specified byte length covers. Then we count the number of bytes needed to step over the same number of characters in string s2, and then we can correctly compare the 2 UTF-8 strings. 2) A similar thing can happen for the NFA engine when skipping to the next character to test for a match. We are skipping over the regstart pointer, but we do not consider that, because of case-folding, we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well. 3) A related issue turned up when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition. 4) When comparing characters using collections, we must also apply case folding to each character in the collection and not just to the current character from the search string. This doesn't apply to the NFA engine, because internally it converts collections to branches [abc] -> a\\|b\\|c fixes: #14294 closes: #14756 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-07-30 20:39:18 +02:00
Christian Brabandt	6043024cd4	patch 9.1.0412: typo in regexp_bt.c in DEBUG code Problem: typo in regexp_bt.c in DEBUG code, causing compile error (@kfleong7, after v9.1.0409) Solution: Replace bulen by buflen Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-05-14 11:19:47 +02:00
John Marriott	82792db631	patch 9.1.0409: too many strlen() calls in the regexp engine Problem: too many strlen() calls in the regexp engine Solution: refactor code to retrieve strlen differently, make use of bsearch() for getting the character class (John Marriott) closes: #14648 Signed-off-by: John Marriott <basilisk@internode.on.net> Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-05-12 00:07:17 +02:00
Christian Brabandt	c97f4d61cd	patch 9.1.0297: Patch 9.1.0296 causes too many issues Problem: Patch 9.1.0296 causes too many issues (Tony Mechelynck, @chdiza, CI) Solution: Back out the change for now Revert "patch 9.1.0296: regexp: engines do not handle case-folding well" This reverts commit `7a27c108e0` because it causes issues with syntax highlighting and breaks the FreeBSD and MacOS CI. It needs more work. fixes: #14487 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-04-10 16:22:17 +02:00
Christian Brabandt	7a27c108e0	patch 9.1.0296: regexp: engines do not handle case-folding well Problem: Regex engines do not handle case-folding well Solution: Correctly calculate byte length of characters to skip When the regexp engine compares two UTF-8 codepoints case-insensitively, it may match an adjacent character, because it assumes it can step over as many bytes as the pattern contains. This is not necessarily true: because of case-folding, a multi-byte UTF-8 character can be considered equal to some single-byte value. Let's consider the pattern 'ſ' and the string 's'. When comparing and ignoring case, the single character 's' matches, and since it matches Vim will try to step over the match (by the number of bytes in the pattern), assuming that since it matches, the length of both strings is the same. However, in that case it should only step over the single-byte value 's' by 1 byte and try to start matching after it again. So for the backtracking engine we need to ensure: - we try to match the correct length for the pattern and the text - in case of a match, we step over it correctly The same thing can happen for the NFA engine when skipping to the next character to test for a match. We are skipping over the regstart pointer, but we do not consider that, because of case-folding, we may need to adjust the number of bytes to skip over. So this needs to be adjusted in find_match_text() as well. A related issue turned up when prog->match_text is actually empty. In that case we should try to find the next match and skip this condition. fixes: #14294 closes: #14433 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-04-09 22:53:19 +02:00
Julio B	46fa3c7e27	patch 9.1.0217: regexp: verymagic cannot match before/after a mark Problem: regexp: verymagic cannot match before/after a mark Solution: Correctly check for the very magic check (Julio B) Fix regexp parser for \v%>'m and \v%<'m Currently \v%'m works fine, but it is unable to match before or after the position of mark m. closes: #14309 Signed-off-by: Julio B <julio.bacel@gmail.com> Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-03-28 10:23:37 +01:00
Christian Brabandt	d2cc51f9a1	patch 9.1.0011: regexp cannot match combining chars in collection Problem: regexp cannot match combining chars in collection Solution: Check for combining characters in regex collections for the NFA and BT Regex Engine Also, while at it, make debug mode work again. fixes #10286 closes: #12871 Signed-off-by: Christian Brabandt <cb@256bit.org>	2024-01-04 22:54:08 +01:00
Christian Brabandt	be07caa071	patch 9.0.1777: patch 9.0.1771 causes problems Problem: patch 9.0.1771 causes problems Solution: revert it Revert "patch 9.0.1771: regex: combining chars in collections not handled" This reverts commit `ca22fc36a4`. Signed-off-by: Christian Brabandt <cb@256bit.org>	2023-08-20 22:28:28 +02:00
Christian Brabandt	ca22fc36a4	patch 9.0.1771: regex: combining chars in collections not handled Problem: regex: combining chars in collections not handled Solution: Check for following combining characters for NFA and BT engine closes: #10459 closes: #10286 Signed-off-by: Christian Brabandt <cb@256bit.org>	2023-08-20 20:38:56 +02:00
RestorerZ	68ebcee023	patch 9.0.1594: some internal error messages are translated Problem: Some internal error messages are translated. Solution: Consistently do not translate internal error messages. (closes #12459)	2023-05-31 17:12:14 +01:00
Bram Moolenaar	ebfec1c531	patch 9.0.1234: the code style has to be checked manually Problem: The code style has to be checked manually. Solution: Add basic code style checks in a test. Fix or avoid uncovered problems.	2023-01-22 21:14:53 +00:00
Yegappan Lakshmanan	f97a295cca	patch 9.0.1221: code is indented more than necessary Problem: Code is indented more than necessary. Solution: Use an early return where it makes sense. (Yegappan Lakshmanan, closes #11833)	2023-01-18 18:17:48 +00:00
Bram Moolenaar	4c5678ff0c	patch 9.0.0977: it is not easy to see what client-server commands are doing Problem: It is not easy to see what client-server commands are doing. Solution: Add channel log messages if ch_log() is available. Move the channel logging and make it available with the +eval feature.	2022-11-30 18:12:19 +00:00
Bram Moolenaar	01105b37a1	patch 9.0.0951: trying every character position for a match is inefficient Problem: Trying every character position for a match is inefficient. Solution: Use the start position of the match ignoring "\zs".	2022-11-26 11:47:10 +00:00
Bram Moolenaar	753aead960	patch 9.0.0414: matchstr() still does not match column offset Problem: matchstr() still does not match column offset when done after a text search. Solution: Only use the line number for a multi-line search. Fix the test. (closes #10938)	2022-09-08 12:17:06 +01:00
Bram Moolenaar	75a115e8d6	patch 9.0.0407: matchstr() does match column offset Problem: matchstr() does match column offset. (Yasuhiro Matsumoto) Solution: Accept line number zero. (closes #10938)	2022-09-07 18:21:24 +01:00
Bram Moolenaar	13ed494bb5	patch 9.0.0228: crash when pattern looks below the last line Problem: Crash when pattern looks below the last line. Solution: Consider invalid lines to be empty. (closes #10938)	2022-08-19 13:59:25 +01:00
Bram Moolenaar	7f9969c559	patch 9.0.0067: cannot show virtual text Problem: Cannot show virtual text. Solution: Initial changes for virtual text support, using text properties.	2022-07-25 18:13:54 +01:00
Bram Moolenaar	509ce03831	patch 8.2.5137: cannot build without the +channel feature Problem: Cannot build without the +channel feature. (Dominique Pellé) Solution: Add #ifdef around ch_log() calls. (closes #10598)	2022-06-20 11:23:01 +01:00
Bram Moolenaar	616592e081	patch 8.2.5115: search timeout is overrun with some patterns Problem: Search timeout is overrun with some patterns. Solution: Check for timeout in more places. Make the flag volatile and atomic. Use assert_inrange() to see what happened.	2022-06-17 15:17:10 +01:00
Paul Ollis	6574577cac	patch 8.2.5057: using gettimeofday() for timeout is very inefficient Problem: Using gettimeofday() for timeout is very inefficient. Solution: Set a platform dependent timer. (Paul Ollis, closes #10505)	2022-06-05 16:55:54 +01:00
Bram Moolenaar	02e8d4e4ff	patch 8.2.5028: syntax regexp matching can be slow Problem: Syntax regexp matching can be slow. Solution: Adjust the counters for checking the timeout to check about once per msec. (closes #10487, closes #2712)	2022-05-27 15:35:28 +01:00
Christian Brabandt	360da40b47	patch 8.2.4978: no error if engine selection atom is not at the start Problem: No error if engine selection atom is not at the start. Solution: Give an error. (Christian Brabandt, closes #10439)	2022-05-18 15:04:02 +01:00
Bram Moolenaar	72bb10df1f	patch 8.2.4693: new regexp does not accept pattern "\%>0v" Problem: new regexp does not accept pattern "\%>0v". Solution: Do accept digit zero.	2022-04-05 14:00:32 +01:00
Bram Moolenaar	91ff3d4f52	patch 8.2.4688: new regexp engine does not give an error for "\%v" Problem: New regexp engine does not give an error for "\%v". Solution: Check for a value argument. (issue #10079)	2022-04-04 18:32:32 +01:00
Bram Moolenaar	b55986c52d	patch 8.2.4646: using buffer line after it has been freed Problem: Using buffer line after it has been freed in old regexp engine. Solution: After getting mark get the line again.	2022-03-29 13:24:58 +01:00
Bram Moolenaar	6456fae9ba	patch 8.2.4440: crash with specific regexp pattern and string Problem: Crash with specific regexp pattern and string. Solution: Stop at the start of the string.	2022-02-22 13:37:31 +00:00
Bram Moolenaar	424bcae1fb	patch 8.2.4273: the EBCDIC support is outdated Problem: The EBCDIC support is outdated. Solution: Remove the EBCDIC support.	2022-01-31 14:59:41 +00:00
Bram Moolenaar	b2810f123c	patch 8.2.4046: some error messages not in the right place Problem: Some error messages not in the right place. Solution: Adjust the errors file. Fix typo.	2022-01-08 21:38:52 +00:00
Bram Moolenaar	677658ae49	patch 8.2.4008: error messages are spread out Problem: Error messages are spread out. Solution: Move more error messages to errors.h.	2022-01-05 16:09:06 +00:00
Bram Moolenaar	a6f7929e62	patch 8.2.4005: error messages are spread out Problem: Error messages are spread out. Solution: Move more error messages to errors.h.	2022-01-04 21:30:47 +00:00
Bram Moolenaar	eaaac014a0	patch 8.2.3983: error messages are spread out Problem: Error messages are spread out. Solution: Move more error messages to errors.h.	2022-01-02 17:00:40 +00:00
Bram Moolenaar	74409f6279	patch 8.2.3970: error messages are spread out Problem: Error messages are spread out. Solution: Move more errors to errors.h.	2022-01-01 15:58:22 +00:00
Bram Moolenaar	d0819d11ec	patch 8.2.3962: build fails for missing error message Problem: Build fails for missing error message. Solution: Add changes in missed file.	2021-12-31 23:15:53 +00:00
Bram Moolenaar	12f3c1b77f	patch 8.2.3749: error messages are everywhere Problem: Error messages are everywhere. Solution: Move more error messages to errors.h and adjust the names.	2021-12-05 21:46:34 +00:00
Bram Moolenaar	d8e44476d8	patch 8.2.3197: error messages are spread out Problem: Error messages are spread out. Solution: Move a few more error messages to errors.h.	2021-07-21 22:20:33 +02:00
Bram Moolenaar	e29a27f6f8	patch 8.2.3190: error messages are spread out Problem: Error messages are spread out. Solution: Move error messages to errors.h and give them a clear name.	2021-07-20 21:07:36 +02:00
Bram Moolenaar	04db26b360	patch 8.2.3110: a pattern that matches the cursor position is complicated Problem: A pattern that matches the cursor position is a bit complicated. Solution: Use a dot to indicate the cursor line and column. (Christian Brabandt, closes #8497, closes #8179)	2021-07-05 20:15:23 +02:00
Bram Moolenaar	872bee557e	patch 8.2.2885: searching for \%'> does not match linewise end of line Problem: searching for \%'> does not match linewise end of line. (Tim Chase) Solution: Match end of line if column is MAXCOL. (closes #8238)	2021-05-24 22:56:15 +02:00
Bram Moolenaar	df36514a64	patch 8.2.2829: some comments are not correct or clear Problem: Some comments are not correct or clear. Solution: Adjust the comments. Add test for cursor position.	2021-05-03 20:01:45 +02:00
Bram Moolenaar	0b94e297af	patch 8.2.2716: the equivalent class regexp is missing some characters Problem: The equivalent class regexp is missing some characters. Solution: Update the list of equivalent characters. (Dominique Pellé, closes #8029)	2021-04-05 13:59:53 +02:00
Bram Moolenaar	a3d10a508c	patch 8.2.2181: valgrind warnings for using uninitialized value Problem: Valgrind warnings for using uninitialized value. Solution: Do not use "start" or "end" unless there is a match.	2020-12-21 18:24:00 +01:00
Bram Moolenaar	a7a691cc14	patch 8.2.2121: internal error when using \ze before \zs in a pattern Problem: Internal error when using \ze before \zs in a pattern. Solution: Check the end is never before the start. (closes #7442)	2020-12-09 16:36:04 +01:00
Bram Moolenaar	e83cca2911	patch 8.2.1633: some error messages are internal but do not use iemsg() Problem: Some error messages are internal but do not use iemsg(). Solution: Use iemsg(). (Dominique Pellé, closes #6894)	2020-09-07 18:53:21 +02:00
Bram Moolenaar	71ccd03ee8	patch 8.2.0967: unnecessary type casts for vim_strnsave() Problem: Unnecessary type casts for vim_strnsave(). Solution: Remove the type casts.	2020-06-12 22:59:11 +02:00
Bram Moolenaar	a80faa8930	patch 8.2.0559: clearing a struct is verbose Problem: Clearing a struct is verbose. Solution: Define and use CLEAR_FIELD() and CLEAR_POINTER().	2020-04-12 19:37:17 +02:00
Bram Moolenaar	f4140488c7	patch 8.2.0260: several lines of code are duplicated Problem: Several lines of code are duplicated. Solution: Move duplicated code to a function. (Yegappan Lakshmanan, closes #5330)	2020-02-15 23:06:45 +01:00
Bram Moolenaar	7c77b34967	patch 8.2.0033: crash when make_extmatch() runs out of memory Problem: Crash when make_extmatch() runs out of memory. Solution: Check for NULL. (Dominique Pelle, closes #5392)	2019-12-22 19:40:40 +01:00
Bram Moolenaar	9490b9a61c	patch 8.1.2010: new file uses old style comments Problem: New file uses old style comments. Solution: Change to new style comments. (Yegappan Lakshmanan, closes #4910)	2019-09-08 17:20:12 +02:00
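
The case-folding problem described in patch 9.1.0645 (pattern 'ſ' against text 's') can be illustrated with a small sketch. This is not Vim's C code: the function name `folded_match_bytes` is hypothetical, and Python's `str.casefold()` stands in for Vim's own folding routine, so treat it only as a model of why the step size has to be measured in the text rather than in the pattern.

```python
def folded_match_bytes(pattern: str, text: str):
    """Return how many UTF-8 bytes of `text` are consumed when `pattern`
    matches case-insensitively at the start of `text`, or None if it does
    not match. Hypothetical helper, not part of Vim."""
    if len(text) < len(pattern):
        return None
    consumed = 0
    for p_char, t_char in zip(pattern, text):
        # Compare character by character after folding case.
        if p_char.casefold() != t_char.casefold():
            return None
        # Advance by the byte length of the *text* character, which may
        # differ from the byte length of the pattern character.
        consumed += len(t_char.encode("utf-8"))
    return consumed


pattern, text = "ſ", "sx"
pattern_bytes = len(pattern.encode("utf-8"))  # 'ſ' (U+017F) takes 2 bytes
step = folded_match_bytes(pattern, text)      # 's' takes 1 byte
```

Stepping over `pattern_bytes` here would also skip the adjacent "x", which is the wrong match the commit message describes. Stepping over `step` leaves the next match attempt at the correct position.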


51 Commits
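
The range check described in patch 9.1.1258 can be sketched as follows. This is a minimal model that assumes only what the commit message states: up to 8 hex characters are accepted, and a value that reaches INT_MAX is rejected with the E1517 message. The function name `parse_codepoint` is hypothetical, and Vim's actual implementation is C code that uses `vimlong_T`.

```python
import string

INT_MAX = 2_147_483_647  # largest value of a 32-bit signed int


def parse_codepoint(hex_digits: str) -> int:
    """Parse 1 to 8 hex characters as a code point value.
    Hypothetical helper modelling the check from the commit message."""
    if not 1 <= len(hex_digits) <= 8:
        raise ValueError("expected 1 to 8 hex characters")
    if any(c not in string.hexdigits for c in hex_digits):
        raise ValueError("not a hex number")
    # Python integers do not overflow, so the comparison below is safe;
    # in C the value needs a type wider than a 32-bit int.
    value = int(hex_digits, 16)
    if value >= INT_MAX:
        raise ValueError(
            "E1517: Value too large, max Unicode codepoint is U+10FFFF")
    return value


accepted = parse_codepoint("7FFFFFFE")
try:
    parse_codepoint("FFFFFFFF")
    rejected = False
except ValueError:
    rejected = True
```

The need for a wider type is why the commit message notes that the change is harder on platforms where `long` is 4 bytes: "FFFFFFFF" cannot be held in a 32-bit signed value before it is compared against the limit.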