-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(marchaler): escape runes outside the multilingual plane #937
Conversation
Thank you for your contribution! I'm a little confused about the need to marshal strings with those characters into TOML basic strings instead of literal strings. Looking at the spec, literal strings can contain the same characters as basic strings, except those that should be quoted:
So any other Unicode point, as long as it is correctly encoded in UTF-8, should be allowed in literal strings. Do you mind elaborating on why this is a bug? I'm sorry if I've missed something. |
@pelletier it is allowed, but when you unmarchal these, also in go they will be string literals. And there we get into issues as this is in our case then also printed literally instead of interpreted. You can see the effect in the linked issue. They aren't literal strings in the object, yet always exported as such. In our case, this struct: {
PowerlineSymbol: "\ue0b0"
} Is exported as: [[blocks.segments]]
powerline_symbol = '\ue0b0' while it should be: [[blocks.segments]]
powerline_symbol = "\ue0b0" I'd argue there's a difference between the above and: {
PowerlineSymbol: `\ue0b0`
} which is an actual string literal. |
The literal For go-toml to emit the TOML string `\ue0b0`
"\\ue0b0" I'm not familiar with oh-my-posh, but took at stab at looking that the config export code in that repo. This seems to be the relevant snippet: Looks like all three export formats follow a pattern. First export using a format-specific library, then all apply the same post-processing routine From what I gather, this function goes over all runes in the output, and transforms some unicode points to escape them in a This process probably only works in JSON, where strings are always represented in double quotes, and double quotes accept the Assuming this analysis is correct, I don't think there is an issue with go-toml here. Maybe move the the glyph escaping to a pre-processing step, before calling the encoders? That way it would avoid needing to be format-specific, and work for all three formats? |
@pelletier I can validate that indeed. Allow me to check and report back. I didn't think about that anymore as that part was somewhere in the past and did miss that during my analysis. |
@pelletier confirmed, and sorry for missing that part completely. If I need to get a string representation of these code points, I'll figure that out later. For now that's not as important. |
No worries, glad you figured out a solution to your issue! 🙌🏻 |
I recently switched to your amazing
toml
library and noticed it has an issue quoting strings as literal which ruines the rendering in oh-my-posh when exporting a configuration (see JanDeDobbeleer/oh-my-posh#4750).This PR ensure we only quote strings as literal when it only contains characters inside the multilingual plane and/or emoji.
There was a V1 test that, rightfully so, fails which I also adjusted for correctness.
Paste
benchstat
results here (running ATM, takes a loooong time). Will post when it's done.