[ proposal ] charset discussion
Hi, This email is to share some thoughts I had while reading this commit : https://github.com/sukria/Dancer/commit/6e30f2fd23a9ab00cb45d54e13556ee30b84... I was willing to fix a minor thing (the fact that the feature provided in the commit works only with utf8, not any other encoding), then I realize that there are 2 concepts of charsets that are mixed in Dancer (at least I think) : - the 'charset' setting is used to set the response headers' encoding and charset (see https://github.com/sukria/Dancer/blob/master/lib/Dancer/Handler.pm#L129 ) - the 'charset' setting is used to read files (with this new open_file feature) These 2 concepts are legitimate, but having them bound to the same setting is error prone : let say you have latin1 template files, and you want your Dancer application to provide UTF8 encoding, you can't. And I can see people trying to fix that by mixing the charset setting, and the ENCODING options of D::P::TemplateToolkit for instance. That will be messy. The Dancer documentation says that the 'charset' setting is "The default charset of outgoing content". I think that it should be kept this way. the 'charset' setting should not be used for local file reading/writing. Also, Dancer::Renderer::templates provides a default layout template ( https://github.com/sukria/Dancer/blob/master/lib/Dancer/Renderer.pm#L200 ), but set its charset as UTF-8 by default, which contradicts the 'charset' documentation : "Default value is empty which means don't do anything. HTTP responses without charset will be interpreted as ISO-8859-1 by most clients.". But maybe I missed something, especially the cases where Dancer::Renderer::templates is used. Anyway, I can see the current situation bite us in the back. Unless I have misunderstood the code, in this case, let me know :) These steps could fix the situation : - open_file() should not refer to the same thing used by the Handler to set Dancer headers - there should be a way to indicate to Dancer that local files read / written are in a given charset - also, my personal opinion is that the 'charset' setting could be utf8 by default, as utf8 is cool :) But that's not mandatory to fix the issues Sadly, I don't know how to do that without adding a new setting, something like 'local_charset', or 'files_charset'... which would be used y open_file, the 'charset' be left as it is now, and used only for Dancer responses. What do you think? I have a branch halfway done, but I'd like to see more inputs and opinions, so that we achieve a common and agreed strategy :) dams
На Tue, 14 Dec 2010 11:10:40 +0100 damien krotkine <dkrotkine@gmail.com> записано:
Sadly, I don't know how to do that without adding a new setting, something like 'local_charset', or 'files_charset'... which would be used y open_file, the 'charset' be left as it is now, and used only for Dancer responses.
May be we can have file_io_charset option, and if it is set it will override the charset option, but if it don't charset will be used. How's that?
What do you think? I have a branch halfway done, but I'd like to see more inputs and opinions, so that we achieve a common and agreed strategy :)
dams
On 14 December 2010 11:20, <noc@nelcom-voip.com> wrote:
На Tue, 14 Dec 2010 11:10:40 +0100 damien krotkine <dkrotkine@gmail.com> записано:
Sadly, I don't know how to do that without adding a new setting, something like 'local_charset', or 'files_charset'... which would be used y open_file, the 'charset' be left as it is now, and used only for Dancer responses.
May be we can have file_io_charset option, and if it is set it will override the charset option, but if it don't charset will be used. How's that?
I like the name 'file_io_charset'. However, see this fictive example : you are developing a Dancer App on Mac. The operating system has 'latin1' as global charset, and your files are all in latin1. If you set the 'charset' setting to 'UTF8', you'd want the files to continue being read as latin1, but Dancer output UTF8. That behavior is quite nice. So I would suggest that 'file_io_charset' defaults to nothing, so that unless specified, we let the operating system take care of the files charsets. But I like the 'file_io_charset' name :) dams
На Tue, 14 Dec 2010 11:58:29 +0100 damien krotkine <dkrotkine@gmail.com> записано:
you are developing a Dancer App on Mac. The operating system has 'latin1' as global charset, and your files are all in latin1. If you set the 'charset' setting to 'UTF8', you'd want the files to continue being read as latin1, but Dancer output UTF8. That behavior is quite nice.
So I would suggest that 'file_io_charset' defaults to nothing, so that unless specified, we let the operating system take care of the files charsets. But I like the 'file_io_charset' name :)
You mean unless file_io_charset specified to do nothing with files encoding in perl, right? That makes sense, actually. You know, I'm quite bothered to specify utf8 in so many places already, one more setting will not change the picture. I have specified utf8 already in DBD::Pg options, also I have to manually catch database errors and decode them as utf8, and to set use utf8 in every perl file also.
On Tue, Dec 14, 2010 at 13:10, damien krotkine <dkrotkine@gmail.com> wrote:
I was willing to fix a minor thing (the fact that the feature provided in the commit works only with utf8, not any other encoding), then I realize that there are 2 concepts of charsets that are mixed in Dancer (at least I think) :
- the 'charset' setting is used to set the response headers' encoding and charset (see https://github.com/sukria/Dancer/blob/master/lib/Dancer/Handler.pm#L129 ) - the 'charset' setting is used to read files (with this new open_file feature)
These 2 concepts are legitimate, but having them bound to the same setting is error prone : let say you have latin1 template files, and you want your Dancer application to provide UTF8 encoding, you can't. And I can see people trying to fix that by mixing the charset setting, and the ENCODING options of D::P::TemplateToolkit for instance. That will be messy.
As the original author the 'charset' setting I definitely intended it to mean "outgoing chatset" — the one to generate HTTP responses in. I absolutely agree that using it to mean different things might be very confusing.
The Dancer documentation says that the 'charset' setting is "The default charset of outgoing content". I think that it should be kept this way. the 'charset' setting should not be used for local file reading/writing.
Also, Dancer::Renderer::templates provides a default layout template ( https://github.com/sukria/Dancer/blob/master/lib/Dancer/Renderer.pm#L200), but set its charset as UTF-8 by default, which contradicts the 'charset' documentation : "Default value is empty which means don't do anything. HTTP responses without charset will be interpreted as ISO-8859-1 by most clients.". But maybe I missed something, especially the cases where Dancer::Renderer::templates is used.
Anyway, I can see the current situation bite us in the back. Unless I have misunderstood the code, in this case, let me know :)
These steps could fix the situation :
- open_file() should not refer to the same thing used by the Handler to set Dancer headers - there should be a way to indicate to Dancer that local files read / written are in a given charset - also, my personal opinion is that the 'charset' setting could be utf8 by default, as utf8 is cool :) But that's not mandatory to fix the issues
Sadly, I don't know how to do that without adding a new setting, something like 'local_charset', or 'files_charset'... which would be used y open_file, the 'charset' be left as it is now, and used only for Dancer responses.
I would strongly suggest limiting the support to UTF-8 only in as many places as possible. I don't see any need to support templates in any other encoding in 2010, for example. Dancer doesn't have the burden of huge legacy systems and it should use this as its advantage. I would also suggest defaulting 'charset' to UTF-8 but that will break things for people not careful enough with their 8bit data.
On 14 December 2010 11:36, Alex Kapranoff <kapranoff@gmail.com> wrote:
On Tue, Dec 14, 2010 at 13:10, damien krotkine <dkrotkine@gmail.com>wrote:
I was willing to fix a minor thing (the fact that the feature provided in the commit works only with utf8, not any other encoding), then I realize that there are 2 concepts of charsets that are mixed in Dancer (at least I think) :
- the 'charset' setting is used to set the response headers' encoding and charset (see https://github.com/sukria/Dancer/blob/master/lib/Dancer/Handler.pm#L129 ) - the 'charset' setting is used to read files (with this new open_file feature)
These 2 concepts are legitimate, but having them bound to the same setting is error prone : let say you have latin1 template files, and you want your Dancer application to provide UTF8 encoding, you can't. And I can see people trying to fix that by mixing the charset setting, and the ENCODING options of D::P::TemplateToolkit for instance. That will be messy.
As the original author the 'charset' setting I definitely intended it to mean "outgoing chatset" — the one to generate HTTP responses in. I absolutely agree that using it to mean different things might be very confusing.
The Dancer documentation says that the 'charset' setting is "The default charset of outgoing content". I think that it should be kept this way. the 'charset' setting should not be used for local file reading/writing.
Also, Dancer::Renderer::templates provides a default layout template ( https://github.com/sukria/Dancer/blob/master/lib/Dancer/Renderer.pm#L200), but set its charset as UTF-8 by default, which contradicts the 'charset' documentation : "Default value is empty which means don't do anything. HTTP responses without charset will be interpreted as ISO-8859-1 by most clients.". But maybe I missed something, especially the cases where Dancer::Renderer::templates is used.
Anyway, I can see the current situation bite us in the back. Unless I have misunderstood the code, in this case, let me know :)
These steps could fix the situation :
- open_file() should not refer to the same thing used by the Handler to set Dancer headers - there should be a way to indicate to Dancer that local files read / written are in a given charset - also, my personal opinion is that the 'charset' setting could be utf8 by default, as utf8 is cool :) But that's not mandatory to fix the issues
Sadly, I don't know how to do that without adding a new setting, something like 'local_charset', or 'files_charset'... which would be used y open_file, the 'charset' be left as it is now, and used only for Dancer responses.
I would strongly suggest limiting the support to UTF-8 only in as many places as possible. I don't see any need to support templates in any other encoding in 2010, for example. Dancer doesn't have the burden of huge legacy systems and it should use this as its advantage. I would also suggest defaulting 'charset' to UTF-8 but that will break things for people not careful enough with their 8bit data.
Well, regarding file i/o, as long as we allow users to set their encoding, we are not limited to utf8. I suspect that a lot of linux operating system are still not using utf8. And Windows variants are not. As for what content Dancer returns, I agree that utf8 could be the only charset supported. But I wouldn't want to make that call by myself. In addition, there is a catch : there is no strict separation between what is read from the file system and what is served by Dancer : static files are served directly, while templates, being generated by templating system, can produce files in different charsets. So it seems difficult to draw a line between what 'charset' does and 'file_io_charset' would do. It's probably easier to consider that the files are in the same charset as the charset that Dancer outputs
participants (3)
-
Alex Kapranoff -
damien krotkine -
noc@nelcom-voip.com