Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test for parsing broken strings and use String#encode instead of rb_str_conv_enc() in parser #665

Merged
merged 4 commits into from
Oct 31, 2024

Conversation

eregon
Copy link
Member

@eregon eregon commented Oct 30, 2024

* rb_str_conv_enc() returns the source string unmodified
  if the conversion did not work. But we should be consistent with
  the generator here and only accept BINARY or convertible to UTF-8.
@eregon
Copy link
Member Author

eregon commented Oct 30, 2024

This is only for the parsing part.
I think there it's fair to treat BINARY as UTF-8 unconditionally, because in the end we're just parsing bytes.
If the encoding of the source string is unknown (== BINARY) it seems rather fair to assume it's actually UTF-8 for JSON.

For the generating part I think we could do the same: str.encoding == BINARY ? str.dup.force_encoding(UTF-8) : str.encode(UTF-8).
But it's less clear if that's good or bad, because we're potentially trying to dump something which cannot be dumped faithfully.
And it seems to already be an error in all previous JSON versions (except JSON 1.7.7 in Ruby 2.0!) when the String is broken, for ruby -rjson -e 'p JSON.dump(["\x80"])'. Let's keep that error (ref: #634).
When it's binary but not broken, ruby -rjson -e 'p JSON::VERSION; p JSON.dump(["é".b])', it seems only since 2.6.1, 2.5.1 and before gave Encoding::UndefinedConversionError.
Only the C extension as this behavior, so I think it's good to warn there (ref: #609).

TLDR: I think this change is good for the parsing part, for the generating part it seems good as-is and to add the warning in #642.

@eregon
Copy link
Member Author

eregon commented Oct 30, 2024

It's kind of funky that JSON used rb_str_export_to_enc() in #634 and rb_str_conv_enc() and both of these have error-prone semantics in case the conversion fails. At least now it should be consistent and just use String#encode, like the pure-Ruby version.

@byroot byroot merged commit dd8b3db into ruby:master Oct 31, 2024
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants