patch 9.0.1617: charidx() result is not consistent with byteidx()

Problem: charidx() and utf16idx() results are not consistent with byteidx().
Solution: When the index is equal to the length of the text, return the length of the text instead of -1. (Yegappan Lakshmanan, closes #12503)
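The boundary rule is easiest to see with a small model. The following Python sketch is not Vim's implementation, which is written in C. It uses two hypothetical helpers, `charidx_model` and `byteidx_model`, that mimic the documented post-patch behaviour of charidx() and byteidx() in byte-index mode. It assumes valid UTF-8 and counts every code point as one character, so it ignores composing-character grouping and the {countcc} and {utf16} arguments.

```python
# Hypothetical model of the charidx()/byteidx() boundary rules after
# patch 9.0.1617. Assumes valid UTF-8 and treats every code point as one
# character (no composing-character grouping, no {utf16} mode).

def byteidx_model(s: str, nr: int) -> int:
    """Byte index of character nr; the byte length if nr == char count."""
    if nr < 0 or nr > len(s):
        return -1
    return len(s[:nr].encode("utf-8"))


def charidx_model(s: str, idx: int) -> int:
    """Character index of the byte at idx; the char count if idx == byte length."""
    data = s.encode("utf-8")
    if idx < 0 or idx > len(data):
        return -1
    if idx == len(data):
        # Patched behaviour: before 9.0.1617 this case returned -1.
        return len(s)
    # Count lead (non-continuation) bytes up to and including byte idx;
    # the character that holds byte idx is the last one counted.
    starts = sum(1 for b in data[: idx + 1] if (b & 0xC0) != 0x80)
    return starts - 1


s = "a\u00a9b"  # 'a', U+00A9 (2 bytes in UTF-8), 'b': 4 bytes, 3 characters
print(charidx_model(s, 2))                    # byte 2 is inside U+00A9 -> 1
print(charidx_model(s, 4))                    # exactly 4 bytes -> 3
print(charidx_model(s, 5))                    # fewer than 5 bytes -> -1
print(charidx_model(s, byteidx_model(s, 3)))  # round trip at the end -> 3
```

The last line shows what the patch fixes: feeding byteidx()'s end-of-string result back into charidx() now yields the character count instead of -1.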
2023-06-08 17:09:45 +01:00
parent 5bf042810b
commit 577922b917
6 changed files with 132 additions and 52 deletions
--- a/runtime/doc/builtin.txt
+++ b/runtime/doc/builtin.txt
@@ -1528,11 +1528,13 @@ charidx({string}, {idx} [, {countcc} [, {utf16}]])
 		When {utf16} is present and TRUE, {idx} is used as the UTF-16
 		index in the String {expr} instead of as the byte index.

-		Returns -1 if the arguments are invalid or if {idx} is greater
-		than the index of the last byte in {string}.  An error is
-		given if the first argument is not a string, the second
-		argument is not a number or when the third argument is present
-		and is not zero or one.
+		Returns -1 if the arguments are invalid or if there are fewer
+		than {idx} bytes.  If there are exactly {idx} bytes, the length
+		of the string in characters is returned.
+
+		An error is given and -1 is returned if the first argument is
+		not a string, the second argument is not a number or when the
+		third argument is present and is not zero or one.

 		See |byteidx()| and |byteidxcomp()| for getting the byte index
 		from the character index and |utf16idx()| for getting the
@@ -10119,8 +10121,8 @@ uniq({list} [, {func} [, {dict}]])			*uniq()* *E882*
 <
 							*utf16idx()*
 utf16idx({string}, {idx} [, {countcc} [, {charidx}]])
-		Same as |charidx()| but returns the UTF-16 index of the byte
-		at {idx} in {string} (after converting it to UTF-16).
+		Same as |charidx()| but returns the UTF-16 code unit index of
+		the byte at {idx} in {string} (after converting it to UTF-16).

 		When {charidx} is present and TRUE, {idx} is used as the
 		character index in the String {string} instead of as the byte
@@ -10128,6 +10130,10 @@ utf16idx({string}, {idx} [, {countcc} [, {charidx}]])
 		An {idx} in the middle of a UTF-8 sequence is rounded upwards
 		to the end of that sequence.

+		Returns -1 if the arguments are invalid or if there are fewer
+		than {idx} bytes in {string}.  If there are exactly {idx} bytes,
+		the length of the string in UTF-16 code units is returned.
+
 		See |byteidx()| and |byteidxcomp()| for getting the byte index
 		from the UTF-16 index and |charidx()| for getting the
 		character index from the UTF-16 index.