
http_post/4: codes(Type, Codes) option sometimes unexpectedly emits charset twice #11

Open
triska opened this issue Jul 29, 2015 · 3 comments

@triska
Member

triska commented Jul 29, 2015

Suppose I relay a POST request, and I want to use the exact same content_type as in the original request. I do:

...,
memberchk(content_type(Type), Request),
http_post(Target, codes(Type,Codes), _, [])

Unfortunately, this yields unintended results for example with pengines, because pengines sends:

Content-Type: application/json; charset=utf-8

and http_post/4 appends its own ; charset=UTF-8 to that, yielding the HTTP header field:

Content-Type: application/json; charset=utf-8; charset=UTF-8

and pengines itself cannot handle this unexpected duplicated parameter when it is sent.

If possible, please avoid appending ; charset=UTF-8 if the charset is already specified in the content type. Thank you!
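For illustration, the requested behaviour could be sketched roughly as follows. This is not the library's actual code (the real logic lives in content_type//2 in http_header.pl); add_charset/2 is a hypothetical helper name, and the check is a naive substring match:

```prolog
%% add_charset(+Type, -FullType)
%
%  Append "; charset=UTF-8" only when the given content type does
%  not already carry a charset parameter.  Sketch only: a naive,
%  case-sensitive sub_atom/5 check, not the library's implementation.
add_charset(Type, Type) :-
    sub_atom(Type, _, _, _, 'charset='), !.
add_charset(Type, Full) :-
    atom_concat(Type, '; charset=UTF-8', Full).
```

With this guard, 'application/json; charset=utf-8' would pass through unchanged, while a bare 'application/json' would still get the UTF-8 charset appended.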

@triska triska changed the title http_post/4: codes(Type, Codes) option sometimes incorrectly emits charset twice http_post/4: codes(Type, Codes) option sometimes unexpectedly emits charset twice Jul 30, 2015
@wouterbeek
Contributor

The culprit seems to be http_header.pl line 889, where UTF-8 is automatically assumed for the POST data format codes(Type,Codes). I'm not sure why a codes list must be UTF-8 encoded.

@JanWielemaker
Member

Difficult. codes(Type, Codes) must send UTF-8 because Codes are Unicode code points, and UTF-8 is the only representation thereof that all web tools and Prolog understand. Anything reading text in SWI-Prolog must ensure that, whatever encoding the input uses, the Prolog code list/atom/string is Unicode.

We could detect an existing charset in content_type//2, but I think that just unnecessarily complicates the code. In theory we could validate all header fields; that too merely complicates the code and hurts performance without adding much value.

If you just want to proxy, you should read the data as bytes and pass the Content-Type unmodified, with or without a charset. I guess the most sensible solution would be to add bytes(Type, Bytes) and demand that all Bytes are in 0..255. The latter is enforced anyway, as writing to the octet stream raises an exception if the data contains a character outside the 0..255 range.

I'm happy with a patch adding bytes(Type, Bytes).
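As a sketch of how the relay from the opening comment might look with the *proposed* option (bytes(Type, Bytes) does not exist yet, and the http_read_data/3 options shown are only an assumption about how the raw body would be read):

```prolog
%  Hypothetical: relies on the proposed bytes(Type, Bytes) option.
relay_post(Request, Target, Reply) :-
    memberchk(content_type(Type), Request),
    http_read_data(Request, Bytes, [to(codes)]),   % body as raw codes
    http_post(Target, bytes(Type, Bytes), Reply, []).
```

Since Bytes are passed through untouched and Type is sent verbatim, no charset is appended and the original Content-Type survives the relay.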

@triska
Member Author

triska commented Oct 29, 2015

I think this would be a very good solution to this and related issues like #9. As mentioned, one key aspect in these cases is that the (reverse) proxy cannot just copy everything the target emits, because some parts of the output (notable example: HTTP redirections) need to be rewritten to match the site topology. I think a bytes/2 option would solve such issues: it is efficient for simply copying the response, and it also allows parsing when necessary.
