On 14 December 2010 11:36, Alex Kapranoff <kapranoff@gmail.com> wrote:
On Tue, Dec 14, 2010 at 13:10, damien krotkine <dkrotkine@gmail.com> wrote:
I was willing to fix a minor thing (the fact that the feature provided in the commit works only with utf8, not any other encoding), then I realize that there are 2 concepts of charsets that are mixed in Dancer (at least I think) :

- the 'charset' setting is used to set the response headers' encoding and charset (see https://github.com/sukria/Dancer/blob/master/lib/Dancer/Handler.pm#L129 )
- the 'charset' setting is used to read files (with this new open_file feature)

These 2 concepts are legitimate, but having them bound to the same setting is error prone : let say you have latin1 template files, and you want your Dancer application to provide UTF8 encoding, you can't. And I can see people trying to fix that by mixing the charset setting, and the ENCODING options of D::P::TemplateToolkit for instance. That will be messy.

As the original author the 'charset' setting I definitely intended it to mean "outgoing chatset" — the one to generate HTTP responses in.
I absolutely agree that using it to mean different things might be very confusing.
 
The Dancer documentation says that the 'charset' setting is "The default charset of outgoing content". I think that it should be kept this way. the 'charset' setting should not be used for local file reading/writing.

Also, Dancer::Renderer::templates provides a default layout template ( https://github.com/sukria/Dancer/blob/master/lib/Dancer/Renderer.pm#L200 ), but set its charset as UTF-8 by default, which contradicts the 'charset' documentation : "Default value is empty which means don't do anything. HTTP responses without charset will be interpreted as ISO-8859-1 by most clients.". But maybe I missed something, especially the cases where Dancer::Renderer::templates is used.

Anyway, I can see the current situation bite us in the back. Unless I have misunderstood the code, in this case, let me know :)

These steps could fix the situation :

- open_file() should not refer to the same thing used by the Handler to set Dancer headers
- there should be a way to indicate to Dancer that local files read / written are in a given charset
- also, my personal opinion is that the 'charset' setting could be utf8 by default, as utf8 is cool :) But that's not mandatory to fix the issues

Sadly, I don't know how to do that without adding a new setting, something like 'local_charset', or 'files_charset'... which would be used y open_file, the 'charset' be left as it is now, and used only for Dancer responses.

I would strongly suggest limiting the support to UTF-8 only in as many places as possible. I don't see any need to support templates in any other encoding in 2010, for example. Dancer doesn't have the burden of huge legacy systems and it should use this as its advantage. I would also suggest defaulting 'charset' to UTF-8 but that will break things for people not careful enough with their 8bit data.

Well, regarding file i/o, as long as we allow users to set their encoding, we are not limited to utf8. I suspect that a lot of linux operating system are still not using utf8. And Windows variants are not. As for what content Dancer returns, I agree that utf8 could be the only charset supported. But I wouldn't want to make that call by myself.

In addition, there is a catch : there is no strict separation between what is read from the file system and what is served by Dancer : static files are served directly, while templates, being generated by templating system, can produce files in different charsets.

So it seems difficult to draw a line between what 'charset' does and 'file_io_charset' would do.

It's probably easier to consider that the files are in the same charset as the charset that Dancer outputs