
http_post/4: codes(Type, Codes) option sometimes unexpectedly emits charset twice #11

Open
triska opened this issue Jul 29, 2015 · 3 comments

@triska
Member

triska commented Jul 29, 2015

Suppose I relay a POST request, and I want to use the exact same content_type as in the original request. I do:

...,
memberchk(content_type(Type), Request),
http_post(Target, codes(Type,Codes), _, [])

Unfortunately, this yields unintended results for example with pengines, because pengines sends:

Content-Type: application/json; charset=utf-8

and http_post/4 appends its own ; charset=UTF-8 to that, yielding the HTTP header field:

Content-Type: application/json; charset=utf-8; charset=UTF-8

and pengines itself cannot handle this unexpected duplicated parameter when it is sent.

If possible, please avoid appending ; charset=UTF-8 if the charset is already specified in the content type. Thank you!
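For illustration, the requested behaviour could be sketched roughly as follows. This is not the library's actual code (the real logic lives in content_type//2 in http_header.pl); add_charset/2 is a hypothetical helper name, and the check is a naive substring match:

```prolog
%% add_charset(+Type, -FullType)
%
%  Append "; charset=UTF-8" only when the given content type does
%  not already carry a charset parameter.  Sketch only: a naive,
%  case-sensitive sub_atom/5 check, not the library's implementation.
add_charset(Type, Type) :-
    sub_atom(Type, _, _, _, 'charset='), !.
add_charset(Type, Full) :-
    atom_concat(Type, '; charset=UTF-8', Full).
```

With this guard, 'application/json; charset=utf-8' would pass through unchanged, while a bare 'application/json' would still get the UTF-8 charset appended.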

@triska triska changed the title http_post/4: codes(Type, Codes) option sometimes incorrectly emits charset twice http_post/4: codes(Type, Codes) option sometimes unexpectedly emits charset twice Jul 30, 2015
@wouterbeek
Contributor

The culprit seems to be http_header.pl line 889, where UTF-8 is automatically assumed for the POST data format codes(Type,Codes). I'm not sure why a codes list must be UTF-8 encoded.

@JanWielemaker
Member

Difficult. codes(Type, Codes) must send UTF-8 because Codes are Unicode code points, and UTF-8 is the only representation thereof that all web tools and Prolog understand. Anything reading text in SWI-Prolog must ensure that, whatever encoding the input uses, the Prolog code list/atom/string is Unicode.

We could detect an existing charset in content_type//2, but I think that just unnecessarily complicates the code. In theory we could validate all header fields; that too merely complicates the code and hurts performance without adding much value.

If you just want to proxy, you should read the data as bytes and pass the Content-Type unmodified, with or without a charset. I guess the most sensible solution would be to add bytes(Type, Bytes) and demand that all Bytes are in 0..255. The latter is enforced anyway, as writing to the octet stream raises an exception if the data contains a character outside the 0..255 range.

I'm happy with a patch adding bytes(Type, Bytes).
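As a sketch of how the relay from the opening comment might look with the *proposed* option (bytes(Type, Bytes) does not exist yet, and the http_read_data/3 options shown are only an assumption about how the raw body would be read):

```prolog
%  Hypothetical: relies on the proposed bytes(Type, Bytes) option.
relay_post(Request, Target, Reply) :-
    memberchk(content_type(Type), Request),
    http_read_data(Request, Bytes, [to(codes)]),   % body as raw codes
    http_post(Target, bytes(Type, Bytes), Reply, []).
```

Since Bytes are passed through untouched and Type is sent verbatim, no charset is appended and the original Content-Type survives the relay.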

@triska
Member Author

triska commented Oct 29, 2015

I think this would be a very good solution to this and related issues like #9. As mentioned, one key aspect in these cases is that the (reverse) proxy cannot just copy everything the target emits, because some parts of the output (notable example: HTTP redirections) need to be rewritten to match the site topology. I think a bytes/2 option would solve such issues: it is efficient for simply copying the response, and it also allows parsing when necessary.
