Feature request: Charset by extension #110
Comments
@triska There is a hook
Complicated issue. The good news is that more and more tools seem to encourage people adding encoding/charset declarations so we can reduce the guessing we need. As Wouter knows better than me, the claim is often wrong, so we still have a long way to go ... The hook in Probably the best way is to extend the mimetype library. I'm not entirely sure how though. There is no sensible default encoding for text files in general. There may be one for a particular deployment, probably defined on a combination of the current locale, content of the files and intended audience. Could we derive the default from the Prolog encoding/locale? Probably hard as locale names differ by OS and are not standardized AFAIK. On the other hand, if we have a text file and the Prolog encoding is Does this make sense?
In my opinion, at least for If there were, in addition, a way to map file extensions to charsets (in analogy to the already existing extension → content-type mapping), I would already consider it a huge improvement, because it would at least let users specify the encoding they are using for their files, whether it is ISO Latin-1, Shift JIS, UTF-8 or KOI8-R etc. For comparison, please see the Apache directive AddCharset:

    AddCharset EUC-JP .euc
    AddCharset ISO-2022-JP .jis
    AddCharset SHIFT_JIS .sjis

This directive lets you specify charsets by extension and even for particular files individually:

    <Files "example.html">
    AddCharset UTF-8 .html
    </Files>

One or two extensible predicates for corresponding settings in SWI-Prolog would be a very welcome addition!
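The extension → charset mapping suggested in the comment above can be sketched in plain Prolog. This is only an illustration of the idea, in analogy to Apache's AddCharset: the predicate names `ext_charset/2` and `content_type_with_charset/3` are hypothetical and are not part of the SWI-Prolog HTTP library, and the media type is assumed to have been determined elsewhere.

```prolog
% Hypothetical user-extensible table (NOT a library predicate):
% maps a lower-case file extension to the charset to advertise.
:- multifile ext_charset/2.

ext_charset(txt,  'UTF-8').
ext_charset(sjis, 'Shift_JIS').
ext_charset(euc,  'EUC-JP').

% content_type_with_charset(+File, +MediaType, -ContentType)
%
% Hypothetical helper: append "; charset=..." to MediaType when the
% extension of File has an entry in ext_charset/2; otherwise return
% MediaType unchanged.
content_type_with_charset(File, MediaType, ContentType) :-
    file_name_extension(_, Ext0, File),   % Ext0 is '' if File has no extension
    downcase_atom(Ext0, Ext),             % match extensions case-insensitively
    (   ext_charset(Ext, Charset)
    ->  format(atom(ContentType), '~w; charset=~w', [MediaType, Charset])
    ;   ContentType = MediaType
    ).
```

With this table, the goal `content_type_with_charset('notes.txt', 'text/plain', CT)` binds `CT` to `'text/plain; charset=UTF-8'`, while a file with an unlisted extension keeps its bare media type. Because the table is declared multifile, users could add their own clauses for other extensions.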
I agree this needs a solution. Just, how? Based on Wouter's comment I considered associating a I'm thinking of something like this:
Add this stuff to mimetype.pl. Does that cover what we want?
I think The other approaches you mention (in particular a tight coupling between content types and encodings) seem somewhat dubious and too inflexible to me. The following thread contains some interesting settings that people find useful in practice: https://stackoverflow.com/questions/913869/how-to-change-the-default-encoding-to-utf-8-for-apache Please consider in particular the following:

    <Files ~ "\.html?$">
    Header set Content-Type "text/html; charset=utf-8"
    </Files>

That's clearly much more flexibility than is needed for this concrete issue. However, I still find this very interesting: You can configure Apache to emit particular header fields based on file names. This flexibility could be useful in SWI-Prolog too. The
I pushed cd44ae4, which I think provides a fair default as well as the option to take it all into your own hands. Please have a look and close the issue if it solves your problem.
Thank you! The description of
1. Determine the media type using file_mime_type/2
2. Determine it is a text file using text_mimetype/1
3. Use the charset from the Prolog flag `default_charset`
I do not know whether the description or code is (in)correct, but it seems what is meant is:
The description of the hooks and flags seems also not to capture what is actually happening. In particular, I suggest:
To me personally, it is also a bit surprising that the
Thanks for the suggestions. Applied. The idea is to use
Thank you, it works! At least the https://github.com/SWI-Prolog/packages-http/blob/master/mimetype.pl#L49 If applicable, please consider removing the TBD or citing a different example.
Use case: I would like to serve UTF-8 encoded *.txt files.
When I use the following server:
then I can fetch *.txt files. However, the content-type in responses is:
whereas I would like to get responses such as:
Is there an easy way to configure the SWI infrastructure to specify charset=UTF-8 when serving *.txt files? Alternatively, would you please consider adding such a feature? Thank you!