CI: Manage multibyte characters in syntax tests
As reported in #16559, the bytes of a multibyte character may be written as separate U+FFFD characters in a ":terminal" window on a busy machine. The testing facilities currently offer an optional filtering step for each screendump file, carried out between reading and comparing the contents of two such files. This filtering has been resorted to (#14767 and #16560) to unconditionally replace known non-Latin-1 characters with an arbitrary substitute ASCII character, so that this rendering mishap would not lead to syntax test failures. However, it was overlooked at the time that the metadata description (in shorthand) that follows spurious U+FFFD characters may be *distinct*. This makes the remainder of such a line, ASCII characters and whatnot, also unequal between the compared screendump files. While it is straightforward to adapt the current filter files to ignore the line characters after the leftmost U+FFFD,

> It is challenging and error-prone to keep up to date filter
> files because moving around examples in source files will
> likely make redundant some previously required filter files
> and, at the same time, it may require creating new filter
> files for the same source file; substituting one multibyte
> character for another multibyte character will also demand
> a coordinated change for filter files.

Besides, unconditionally dropping arbitrary parts of a line is too blunt an instrument.

An alternative approach is to not use the supported filtering for this purpose. Instead, let a syntax test pass or fail initially. Then, *if* the same failure is imminent, drop the leftmost U+FFFD and the rest of the previously seen line (repeating this for all previously seen unequal lines) before another round of comparing file contents.
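The two-round comparison described above can be sketched as follows. This is a minimal Python illustration, not the Vim script code this commit changes. All names (`REPLACEMENT_CHAR`, `truncate_pair`, `dumps_match`) are hypothetical. It assumes that each screendump is a list of lines and that the reference copy of an unequal line is cut at the same character position as the leftmost U+FFFD in the new copy. It also simplifies the real retrying to a single pair of dumps.

```python
# Hypothetical sketch: screendumps are assumed to be lists of str lines.
REPLACEMENT_CHAR = "\uFFFD"


def truncate_pair(new_line: str, ref_line: str) -> tuple[str, str]:
    """Cut both copies at the leftmost U+FFFD found in the new copy.

    Assumption: the reference copy is cut at the same character
    position. If the new copy has no U+FFFD, both are kept whole.
    """
    idx = new_line.find(REPLACEMENT_CHAR)
    if idx < 0:
        return new_line, ref_line
    return new_line[:idx], ref_line[:idx]


def dumps_match(reference: list[str], new: list[str]) -> bool:
    """Compare two dumps; on failure, retry with unequal lines truncated."""
    if len(reference) != len(new):
        return False  # a different line count is never excused
    # First round: plain comparison, remembering every unequal line.
    seen = [i for i, (r, n) in enumerate(zip(reference, new)) if r != n]
    if not seen:
        return True
    # Second round: only the previously seen unequal lines are truncated.
    # Any difference *before* the leftmost U+FFFD still fails.
    for i in seen:
        n, r = truncate_pair(new[i], reference[i])
        if n != r:
            return False
    return True
```

For example, a reference line `"abc é"` against a new line `"abc \uFFFD\uFFFD"` passes the second round, but `"abd \uFFFD\uFFFD"` against the same reference still fails. The sketch also shows the disadvantage discussed next: anything after the leftmost U+FFFD is never compared.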
The obvious disadvantage of this filtering, unconditional or otherwise, is that a test can pass spuriously. This happens when there are consistent failures for _other reasons_, the unequal parts happen to come after U+FFFDs, and the stars align for _a particular test runner_. Hence syntax test authors should strive to write as little significant text after multibyte characters as is syntactically permissible, and to write multibyte characters closer to EOL in general. They should also make sure that their checked-in and published "*.dump" files do not have any U+FFFDs. It is also practical to refrain from attempting screendump generation if U+FFFDs can already be discovered. Instead, re-run the syntax test in hand from scratch, while accepting other recently generated screendumps without going through new rounds of verification.

Reference: https://github.com/vim/vim/pull/16470#issuecomment-2599848525

closes: #17704

Signed-off-by: Aliaksei Budavei <0x000c70@gmail.com>
Signed-off-by: Christian Brabandt <cb@256bit.org>

Committed by Christian Brabandt
parent 43b99c9376
commit 0fde6aebdd

@@ -105,6 +105,7 @@ enddef
"	    some dictionary with some state entries;
"	    the file contents of the newly generated screen dump;
"	    the zero-based number of the line whose copies are not equal.
" (See an example in runtime/syntax/testdir/runtest.vim.)
"
" The file name used is "dumps/{filename}.dump".
"