System.Convert.ToBase64String uses CR+LF on non-Windows platforms #32452
It is documented behavior:
|
Yes, I noted that. What I am questioning is whether this is the right, reasonable thing, though. Case in point:
I would personally prefer for the framework to be consistent in its handling of newlines across all corefx classes. To me, the fact that If changing the current semantics would be considered a "too big" breaking change, I think the second-best thing would be to introduce a new enum value like
Thinking out loud, this would perhaps be the "best" option in a way. It adds a little bit of code complexity but OTOH, it makes the method much more flexible and users can choose the semantics that fit their particular use case best. Comments on this idea? (Feel free to suggest better enum member names) |
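To make the idea of a caller-selectable line ending concrete, here is a minimal Python sketch. The helper `encode_with_line_breaks` and its `newline` parameter are hypothetical stand-ins for the proposed enum value; they are not an existing .NET or Python API, and the sketch makes no claim about how `Convert.ToBase64String` is implemented.

```python
import base64


def encode_with_line_breaks(data: bytes, newline: str = "\r\n", width: int = 76) -> str:
    """Base64-encode data, inserting `newline` between lines of `width` characters.

    Hypothetical helper: `newline` plays the role of the extra option floated
    above (CR+LF by default, LF or anything else on request).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    # The separator goes between lines only, so there is no trailing line break.
    return newline.join(lines)


payload = b"hello world, " * 10
crlf_text = encode_with_line_breaks(payload)               # RFC 2045 style
lf_text = encode_with_line_breaks(payload, newline="\n")   # Unix-tool style
```

With this shape, the default stays backward compatible and callers who want LF-only output opt in explicitly.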
My personal opinion is that There's new high performance API in |
@perlun with the point of view that the base64 encoded string is a kind of "contract" it makes sense to have CR + LF (note: it could also have been defined with just LF), so that decoding can be done with expected line endings on all platforms. See also Variants summary table. So according to the various RFCs about base64 the behavior is correct. Hence there is no need for Also look at HTTP, where line endings are CR + LF independent of the platform. For historical and practical (parsing) reasons.
...with the RFCs / standards 😉 |
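As a small illustration of the decoding side, the following Python sketch checks how one particular decoder treats the two line-ending styles. It only shows the behavior of Python's `base64.b64decode` with its default `validate=False`; stricter decoders on other platforms may well behave differently, which is the concern behind the "contract" argument.

```python
import base64

payload = b"hello world, hello world, hello world, hello world, hello world"
flat = base64.b64encode(payload).decode("ascii")

# Wrap at 76 characters with each candidate line ending.
chunks = [flat[i:i + 76] for i in range(0, len(flat), 76)]
with_crlf = "\r\n".join(chunks)
with_lf = "\n".join(chunks)

# With the default validate=False, b64decode discards characters outside the
# base64 alphabet, so both line-ending styles decode to the same bytes.
decoded_crlf = base64.b64decode(with_crlf)
decoded_lf = base64.b64decode(with_lf)
```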
Interesting. You are indeed correct: the RFCs speak about CR+LF, as you suggest. In practice, I don't think I've ever seen a Unix-oriented tool generate base64 with CR+LF though, which is why I was (perhaps incorrectly) assuming that LF would be kosher to use here. A couple of examples:
GNU coreutils
Note the single dot character in the stream at the
$ echo "hello world, hello world, hello world, hello world, hello world" | base64 | hexdump -C
00000000 61 47 56 73 62 47 38 67 64 32 39 79 62 47 51 73 |aGVsbG8gd29ybGQs|
00000010 49 47 68 6c 62 47 78 76 49 48 64 76 63 6d 78 6b |IGhlbGxvIHdvcmxk|
00000020 4c 43 42 6f 5a 57 78 73 62 79 42 33 62 33 4a 73 |LCBoZWxsbyB3b3Js|
00000030 5a 43 77 67 61 47 56 73 62 47 38 67 64 32 39 79 |ZCwgaGVsbG8gd29y|
00000040 62 47 51 73 49 47 68 6c 62 47 78 76 0a 49 48 64 |bGQsIGhlbGxv.IHd|
00000050 76 63 6d 78 6b 43 67 3d 3d 0a |vcmxkCg==.|
0000005a
$ base64 --version
base64 (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Simon Josefsson.
MRI Ruby
Also outputs LF as line separator. Note that it splits the lines at 60 characters instead of 76. It claims to be compliant with RFC 2045, but I can only see CRLF mentioned in that RFC.
$ ruby -r base64 -e "puts Base64.encode64('hello world, hello world, hello world, hello world, hello world');" | hexdump -C
00000000 61 47 56 73 62 47 38 67 64 32 39 79 62 47 51 73 |aGVsbG8gd29ybGQs|
00000010 49 47 68 6c 62 47 78 76 49 48 64 76 63 6d 78 6b |IGhlbGxvIHdvcmxk|
00000020 4c 43 42 6f 5a 57 78 73 62 79 42 33 62 33 4a 73 |LCBoZWxsbyB3b3Js|
00000030 5a 43 77 67 61 47 56 73 62 47 38 67 0a 64 32 39 |ZCwgaGVsbG8g.d29|
00000040 79 62 47 51 73 49 47 68 6c 62 47 78 76 49 48 64 |ybGQsIGhlbGxvIHd|
00000050 76 63 6d 78 6b 0a |vcmxk.|
00000056
$ ruby -v
ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux-gnu]
Python 3
Also outputs LF when splitting lines in base64-encoded content (around the
$ python3 -c "import base64; print(base64.encodebytes(b'hello world, hello world, hello world, hello world, hello world'));"
b'aGVsbG8gd29ybGQsIGhlbGxvIHdvcmxkLCBoZWxsbyB3b3JsZCwgaGVsbG8gd29ybGQsIGhlbGxv\nIHdvcmxk\n'
$ python3 --version
Python 3.7.3
To be honest, I challenge anyone reading this to find me one single Unix utility or programming language/platform (apart from .NET Core) that generates base64 content with CR+LF line endings. 🙂 I think it depends greatly on what the imagined use case is here. If it is for programmatically generating RFC-compliant content for inserting into a MIME-encoded email, sure; the current semantics are fine and valid. If it is for other use cases (e.g. printing the output to the console), producing something with extra characters isn't as useful.
Thanks, I must admit my ignorance, I wasn't actually aware of this (have obviously been doing too little wiresharking lately. 😉) Bottom line/TL;DR:
While I do agree that it can be seen as a violation of the RFCs, the de facto way tools work on Linux and other LF-only platforms is that they bend the RFCs to fit the platform's native world view better... 🙂
Being consistent with RFCs and standards is indeed important. What also makes sense is to carefully inspect how other tools & languages interpret those same RFCs. I don't think we should be too "wise in our own eyes", to use some biblical vocabulary. It might very well be better to assume all these other tools have good reasons for their choices and that we perhaps just haven't seen the light yet. 🙈 |
Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process. This process is part of our issue cleanup automation. |
Hi,
While debugging a failing test in my code tonight, I was made aware of the utterly surprising fact that a call to
Convert.ToBase64String(plainTextBytes, Base64FormattingOptions.InsertLineBreaks)
will insert not just LF linefeeds but CR+LF, even though I am running my code on Linux. This is definitely not what I would expect; I would have assumed that the class would use the newline format used by the platform in question (i.e. CR+LF on Windows, LF on Linux and macOS).
Here is the code in question (permalink):
runtime/src/libraries/System.Private.CoreLib/src/System/Convert.cs
Lines 2507 to 2524 in 03dc181
I realize that changing this behavior at this stage would be a slightly breaking change, so it's not something to be taken lightly. Still, it seems like the only sensible thing to do, since the current behavior definitely breaks the principle of least surprise. Please let me know what you think; I can probably manage to put together a PR if you are happy with the change per se.
(I will fix my failing test now and go to sleep. 😉)
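For a test that trips over this, one possible workaround is to normalize line endings on both sides before comparing. The sketch below is in Python purely for illustration; `normalize_newlines` is a hypothetical helper, and the two strings are made-up stand-ins for an LF-based fixture and CR+LF encoder output, not the actual data from the failing test.

```python
def normalize_newlines(text: str) -> str:
    """Collapse CR+LF and bare CR to LF so comparisons ignore line-ending style."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Stand-ins for an expected fixture (LF) and actual encoder output (CR+LF).
expected = "aGVsbG8g\nd29ybGQ="
actual = "aGVsbG8g\r\nd29ybGQ="

same_after_normalizing = normalize_newlines(actual) == normalize_newlines(expected)
```

This keeps the test independent of which line-ending convention the encoder uses, at the cost of no longer asserting the exact bytes produced.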