I'm getting data corruption when a UTF-8 encoded web page sends form data back to Dancer via POST. As far as I can tell, the browser is doing the right thing and submitting the data as UTF-8 (verified with Firebug), but when I try to retrieve the data with param(), Dancer treats it as something else.

To see this, say "dancer -a utf8form" and edit two of the files:

1. Replace the contents of utf8form/views/index.tt with:

    <form method="post" action="/submit">
        <input name="field" type="text"/>
    </form>

2. Add a POST handler route to utf8form/lib/utf8form.pm:

    post '/submit' => sub {
        debug "Got '", param('field'), "' from form.";
    };

Since my terminal uses UTF-8, and default Dancer apps have UTF-8 in config.yml and in the generated HTML templates, the debug() call shouldn't need any special help, IMO.

Try submitting "Español", for example.
Hello!

2014-06-26 16:49 GMT+03:00 Warren Young <warren@etr-usa.com>:
> I'm getting data corruption when a UTF-8 encoded web page sends form data
> back to Dancer via POST.
> ...
> 2. Add a POST handler route to utf8form/lib/utf8form.pm:
>
>     post '/submit' => sub { debug "Got '", param('field'), "' from form."; };
Actually, your code writes to STDERR here, and it seems to me that Dancer does not take care of its encoding. So in your ./bin/app.pl you should add this:

    binmode(STDERR, ":encoding(UTF-8)");

But this fixes only STDERR; any other output you have must be handled the same or a similar way. I assume you see UTF-8 correctly in the views; at least I did with your test case.

--
Wbr, All the best,
Gunnar
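To see why that binmode line helps, here is a minimal self-contained sketch (plain Perl, not Dancer code). It assumes, as WK's fix implies, that param('field') already hands back a decoded character string; the literal "Espa\x{F1}ol" stands in for it, and in-memory filehandles stand in for STDERR so the bytes actually written can be inspected:

```perl
use strict;
use warnings;

# Stand-in for what param('field') returns (assumption: an already-decoded
# character string). \x{F1} is "ñ", so this is "Español".
my $field = "Espa\x{F1}ol";

# In-memory handle with no encoding layer, like a plain STDERR.
open(my $plain, '>', \my $plain_bytes) or die "open: $!";
print {$plain} $field;        # "ñ" is written as the single Latin-1 byte 0xF1
close $plain;

# Same output through a UTF-8 layer, which is what
# binmode(STDERR, ":encoding(UTF-8)") adds to the real STDERR.
open(my $layered, '>:encoding(UTF-8)', \my $layered_bytes) or die "open: $!";
print {$layered} $field;      # "ñ" is written as the two bytes 0xC3 0xB1
close $layered;

print "no layer:    ", unpack('H*', $plain_bytes),   "\n";   # 45737061f16f6c
print "UTF-8 layer: ", unpack('H*', $layered_bytes), "\n";   # 45737061c3b16f6c
```

A UTF-8 terminal cannot display the lone byte 0xF1, which is why the console log looks corrupted. In the app itself, the fix is the single binmode line in bin/app.pl, placed before the final dance; call.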
On 6/26/2014 09:10, WK wrote:
> Actually, your code writes to STDERR here, and it seems to me that Dancer
> does not take care of its encoding. So in your ./bin/app.pl you should add this:
>
>     binmode(STDERR, ":encoding(UTF-8)");
Thanks. That solved the problem for me, indirectly.

I think Dancer::Logger::Console should do that when config.yml contains charset: "UTF-8". It seems that Dancer::Logger::File already does the right thing when logging param() data from a UTF-8 HTML form. I wonder why there is a difference.

My test case boiled things down too far. My real issue was different. It was yet another case of me thinking Perl somehow tags strings with their source encoding, so that data coming from a UTF-8 source and going out to a UTF-8 sink wouldn't need translation. I needed to add an explicit encode('utf-8', $s) wrapper to that code path.

Your reply was still helpful, WK, in that it reminded me that Perl doesn't tag encoding that way, so thanks!

As advanced as Perl is in terms of its Unicode support, it's still more primitive than it should be.
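The pattern Warren arrived at (decode at the input boundary, work with characters, encode explicitly at the output boundary) can be sketched with the core Encode module. This is only an illustration: the byte string stands in for whatever UTF-8 source and sink his real code path used, which the thread does not show.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode rules for uc() even on code points 128-255
use Encode qw(decode encode);

# "Español" as the raw bytes a UTF-8 source (browser, file, socket) delivers.
my $in_bytes = "Espa\xC3\xB1ol";

# Decode at the input boundary. From here on $chars holds characters, and Perl
# keeps no record that they originally arrived as UTF-8.
my $chars = decode('UTF-8', $in_bytes);

# Character-level work now behaves as expected: uc() turns "ñ" into "Ñ".
my $upper = uc $chars;

# Encode at the output boundary, because a UTF-8 sink needs bytes again.
# This is the same kind of explicit encode() wrapper Warren describes.
my $out_bytes = encode('UTF-8', $upper);

printf "%d bytes in, %d characters, %d bytes out\n",
    length $in_bytes, length $chars, length $out_bytes;   # 8 bytes in, 7 characters, 8 bytes out
print unpack('H*', $out_bytes), "\n";                    # 45535041c3914f4c
```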
2014-06-27 16:55 GMT+03:00 Warren Young <warren@etr-usa.com>:
> I think Dancer::Logger::Console should do that when config.yml contains charset: "UTF-8".

I agree.

> My test case boiled things down too far. My real issue was different. It was yet another case of me thinking Perl somehow tags strings with their source encoding, so that data coming from a UTF-8 source and going out to a UTF-8 sink wouldn't need translation. I needed to add an explicit encode('utf-8', $s) wrapper to that code path.
I have had a lot of problems with UTF-8. I think the main point that made things simpler for me is this: Perl uses _Unicode_ strings internally, so everything coming in or going out needs decoding or encoding. If you don't do that explicitly, all strings are treated as ASCII/Latin-1 encoded.

For me, the main problem is that nothing in core automatically converts every input and output from/to UTF-8. Every user has to write pretty boring boilerplate to cover all the cases. I use utf8::all for this, but some people criticize it.

--
Wbr, All the best,
Gunnar
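WK's point that undecoded input is treated as Latin-1 can be seen by reading the same UTF-8 bytes twice: once through a plain handle and once through a UTF-8 layer. A minimal sketch, with an in-memory handle standing in for a real file, pipe or socket:

```perl
use strict;
use warnings;

# "Español\n" as UTF-8 bytes, standing in for a file, pipe or socket.
my $input = "Espa\xC3\xB1ol\n";

# Read without an encoding layer: every byte becomes one character, which is
# the same as treating the data as Latin-1. "ñ" arrives as two characters, "Ã±".
open(my $raw, '<', \$input) or die "open: $!";
chomp(my $raw_line = <$raw>);
close $raw;

# Read through a UTF-8 layer: the bytes are decoded into the characters that were typed.
open(my $decoding, '<:encoding(UTF-8)', \$input) or die "open: $!";
chomp(my $decoded_line = <$decoding>);
close $decoding;

printf "raw:     %d characters, 5th is U+%04X\n", length $raw_line, ord substr($raw_line, 4, 1);
printf "decoded: %d characters, 5th is U+%04X\n", length $decoded_line, ord substr($decoded_line, 4, 1);
# raw:     8 characters, 5th is U+00C3
# decoded: 7 characters, 5th is U+00F1
```

Core Perl can make such layers the default with use open qw(:std :encoding(UTF-8)); which covers STDIN, STDOUT, STDERR and handles opened later in the same lexical scope. utf8::all bundles this together with use utf8 and decoding of @ARGV. Data that does not come through a filehandle, such as database results, still needs its own handling.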
Hello,

I'm French, and there are a lot of special characters in my Dancer app: é, à, ç, etc. I use this config.yml:

    charset: "UTF-8"
    template: "template_toolkit"
    engines:
      template_toolkit:
        encoding: 'utf8'
        start_tag: '[%'
        end_tag: '%]'
    ....
    plugins:
      Database:
        connections:
          db:
            driver: 'mysql'
            database: 'asav'
            host: 'localhost'
            port: 3306
            username: 'username'
            password: 'password'
            connection_check_threshold: 10
            dbi_params:
              RaiseError: 1
              AutoCommit: 1
              mysql_enable_utf8: 1
            charset: utf8
            log_queries: 1

and everything works completely transparently. I use this locale configuration on my CentOS 6.5:

    $ locale
    LANG=fr_FR.UTF-8
    LC_CTYPE="fr_FR.UTF-8"
    LC_NUMERIC="fr_FR.UTF-8"
    LC_TIME="fr_FR.UTF-8"
    LC_COLLATE="fr_FR.UTF-8"
    LC_MONETARY="fr_FR.UTF-8"
    LC_MESSAGES="fr_FR.UTF-8"
    LC_PAPER="fr_FR.UTF-8"
    LC_NAME="fr_FR.UTF-8"
    LC_ADDRESS="fr_FR.UTF-8"
    LC_TELEPHONE="fr_FR.UTF-8"
    LC_MEASUREMENT="fr_FR.UTF-8"
    LC_IDENTIFICATION="fr_FR.UTF-8"
    LC_ALL=

    mysql> show create database db;
    +----------+-------------------------------------------------------------+
    | Database | Create Database                                             |
    +----------+-------------------------------------------------------------+
    | db       | CREATE DATABASE `db` /*!40100 DEFAULT CHARACTER SET utf8 */ |
    +----------+-------------------------------------------------------------+
    1 row in set (0.00 sec)

I use debug to show data in the console while debugging, and everything works well.

I use Redis or YAML sessions. Please do not use JSON sessions: there is a bug with accents when you read back accented data stored in the session.

    #session: 'YAML'
    session: 'Redis'
    #session: "JSON"

I start each lib .pm file with "use utf8;".

Bye,
Hugues

On 27/06/2014 20:24, WK wrote:
_______________________________________________
dancer-users mailing list
dancer-users@dancer.pm
http://lists.preshweb.co.uk/mailman/listinfo/dancer-users
--
Regards
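Hugues's use utf8; only concerns the source code: it tells Perl that string literals in the file are UTF-8. A small sketch of the difference, assuming the file is saved as UTF-8. The pragma is lexically scoped, so each block below gets its own setting:

```perl
use strict;
use warnings;

# Assumes this file is saved as UTF-8, so "é" below is the two bytes C3 A9 on disk.
my ($without_utf8, $with_utf8);
{
    no utf8;                 # Perl's default: source bytes are read as Latin-1 characters
    $without_utf8 = "é";     # becomes two characters, U+00C3 U+00A9
}
{
    use utf8;                # declare that the source code in this block is UTF-8
    $with_utf8 = "é";        # one character, U+00E9
}

printf "without use utf8: %d characters\n", length $without_utf8;   # 2
printf "with use utf8:    %d characters\n", length $with_utf8;      # 1
```

It does nothing for form input, log output or database data; those still need the layers, Encode calls or driver options (such as mysql_enable_utf8) discussed in this thread.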
participants (3)
- Hugues
- Warren Young
- WK