Character unescaping improvements #699

ForNeVeR · 2024-11-06T21:14:40Z

Some issues with the current code in Cesium.CodeGen.Ir.Expressions.Constants.CharConstant.UnescapeCharacter and Cesium.Parser.TokenExtensions.UnwrapStringLiteral:

These are two separate methods with different implementations. There should be only one.
UnescapeCharacter doesn't support \u and \U (universal-character-name in the standard).
UnescapeCharacter also has a bug in handling octal and hex sequences: both are assumed to have exactly two digits, with special treatment of \0. The standard, however, defines octal sequences as one, two, or three digits long, and hex escapes as arbitrarily long.
\0 should not be a special case in either of the methods; it is just an octal number.
UnwrapStringLiteral also seems to treat octal sequences weirdly: I only see support for octal numbers starting with 0, which is not correct (UnescapeCharacter handles these better).
Normal compiler behavior is to report a warning on an invalid sequence (e.g. \m) and treat it as the character itself. We don't do this: we either silently accept or break on such sequences.
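
To make the expected behavior concrete, here is a minimal sketch of a single shared unescaper covering the points above. It is written in Python for brevity rather than in Cesium's C#, and the name `unescape`, the error handling, and the use of `warnings.warn` are all hypothetical, not existing Cesium APIs. Values beyond the Unicode range (possible with long hex escapes) are not handled here.

```python
import warnings

# Simple escapes from the C standard mapped to their character values.
SIMPLE_ESCAPES = {
    "'": "'", '"': '"', "?": "?", "\\": "\\",
    "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v",
}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def unescape(text: str) -> str:
    """Hypothetical single unescaper for character and string literal bodies."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch != "\\" or i == len(text):
            out.append(ch)  # ordinary character (or a lone trailing backslash)
            continue
        kind = text[i]
        i += 1
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
        elif kind in OCTAL_DIGITS:
            # Octal: one to three digits; \0 is just the shortest case.
            start = i - 1
            while i < len(text) and i - start < 3 and text[i] in OCTAL_DIGITS:
                i += 1
            out.append(chr(int(text[start:i], 8)))
        elif kind == "x":
            # Hex: consume every following hex digit, however many there are.
            start = i
            while i < len(text) and text[i] in HEX_DIGITS:
                i += 1
            if i == start:
                raise ValueError("\\x used with no following hex digits")
            out.append(chr(int(text[start:i], 16)))
        elif kind in "uU":
            # Universal character name: exactly 4 (\u) or 8 (\U) hex digits.
            width = 4 if kind == "u" else 8
            digits = text[i:i + width]
            if len(digits) != width or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("incomplete universal character name")
            out.append(chr(int(digits, 16)))
            i += width
        else:
            # Unknown escape such as \m: warn and keep the character itself.
            warnings.warn(f"unknown escape sequence '\\{kind}'")
            out.append(kind)
    return "".join(out)
```

With this shape, `\0` needs no dedicated branch, `\1013` decodes to `A` followed by `3`, and `\m` produces a warning while yielding `m`.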
