[Dancer-users] from_json, utf8

Daniel Perrett dperrett at cambridge.org
Thu Mar 15 18:50:14 CET 2012


I have run into a problem when I POST characters above \x7F to Dancer 
1.3093 and then apply from_json to them.

I have discussed the issue with ambs, who reported an issue with to_json 
in versions prior to 1.3093. He has helpfully (and speedily) found a 
fix, but it is still unclear why the fix is needed and whether this is 
something that Dancer should fix, so I'm posting to the list for wider 
input, ideas and discussion.

To recreate:

Create a new app:

        dancer -a MyWeb::App

and replace your index.tt and App.pm with those attached (alternatively, 
use the diff attached).

Load it into your browser, click the button, and it sends {"q":"café"} to 
the server, which processes it fine and returns that word. All good so 
far.

Notice that in MyWeb/App.pm, the from_json call has a flag utf8 => 0. This 
is the mysterious fix.

Now, remove that flag, so the line reads

        my $data = from_json( param('json'));

... and reload the app, click the button, and you will get a 500 Internal 
Server Error response, reading:

        {
           "exception" : "malformed UTF-8 character in JSON string, at character offset 9 (before \"\\x{98bd}\") at /usr/lib/perl5/site_perl/5.10/JSON.pm line 171.\n",
           "error" : "malformed UTF-8 character in JSON string, at character offset 9 (before \"\\x{98bd}\") at /usr/lib/perl5/site_perl/5.10/JSON.pm line 171.\n"
        }

(NB: in earlier versions, such as 1.3072, you won't get an error.)

What puzzles me most here is the reference to \x{98bd}: I have no idea 
how from_json is getting \x{98bd}. What gets sent is 
json=%7B%22q%22%3A%22caf%C3%A9%22%7D, where C3 A9 is the UTF-8 encoding 
of \xE9, i.e. é.
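
One reading that fits the numbers in the error message, offered as a sketch rather than a confirmed diagnosis: if the parameter has already been decoded to characters, then offset 9 of {"q":"café"} holds the single character \xE9, followed by " (\x22) and } (\x7D). A lenient UTF-8 decoder that takes \xE9 for the lead byte of a three-byte sequence would combine those three values as below. The variable names are illustrative only, and whether JSON.pm's backend does exactly this is an assumption.

```perl
use strict;
use warnings;

# The decoded JSON text, with e-acute as the single character \xE9.
my $text = qq({"q":"caf\x{e9}"});

# Where e-acute sits in that text.
my $offset = index($text, "\x{e9}");

# The three characters starting at that offset: e-acute, double quote, closing brace.
my @units = map { ord } split //, substr($text, $offset, 3);

# Combine them as a three-byte UTF-8 sequence would be combined:
# 4 payload bits from the lead byte, 6 bits from each of the next two.
my $code_point = (($units[0] & 0x0F) << 12)
               | (($units[1] & 0x3F) << 6)
               |  ($units[2] & 0x3F);

printf "offset of e-acute: %d\n", $offset;
printf "combined value:    \\x{%x}\n", $code_point;
```

Both the offset and the combined value can be compared against the "character offset 9" and \x{98bd} quoted in the error.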

Ambs says "Now, why you need to make utf8 to false, because the string is 
in UTF8 but doesn't have the utf8 flag on. So, when asking to parse it as 
utf8 it will double encode the thing (I think)."
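
To make the byte/character distinction concrete, here is a small standalone sketch. It uses the core JSON::PP object interface as a stand-in for Dancer's from_json (an assumption: I expect the utf8 option to behave the same way, but this is not Dancer's own code path), and it assumes that param('json') holds already-decoded characters.

```perl
use strict;
use warnings;
use Encode qw(decode);
use JSON::PP;

# The POSTed value after URL-unescaping: 13 UTF-8 bytes, e-acute sent as C3 A9.
my $bytes = qq({"q":"caf\xC3\xA9"});

# The same text as 12 Perl characters. Assumption: this is roughly what
# param('json') holds once the framework has decoded the request.
my $chars = decode('UTF-8', $bytes);

# utf8(1) expects UTF-8 bytes; utf8(0) expects characters.
my $from_bytes = JSON::PP->new->utf8(1)->decode($bytes);
my $from_chars = JSON::PP->new->utf8(0)->decode($chars);

# Mismatched pairing: characters handed to a decoder that expects bytes.
my $mismatch = eval { JSON::PP->new->utf8(1)->decode($chars) };
my $error    = $@;

printf "bytes + utf8(1): q has %d characters\n", length($from_bytes->{q});
printf "chars + utf8(0): q has %d characters\n", length($from_chars->{q});
print  "chars + utf8(1): ", ($error ? "died: $error" : "decoded without error\n");
```

If the third case dies with a "malformed UTF-8" message, that would suggest the problem is a second decode of text that was already decoded, rather than anything wrong with the data that was sent.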

The question I have is "Should the utf8 flag be on anyway?" - is this 
something Dancer should be doing?

It seems odd to me that Dancer makes available to the user a utf8 string 
without the utf8 flag, but perhaps there is a good reason for it? (or I 
have misunderstood?)
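
For anyone who wants to check what they are actually holding, here is a minimal sketch that unescapes the posted body by hand and compares the byte string with the decoded character string. It deliberately avoids Dancer and any URI module, so it only approximates what the framework does. As far as I understand it, the internal utf8 flag is a diagnostic hint at best; the length and the code points say more about whether a string has been decoded.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Form body exactly as quoted in this message.
my $body = 'json=%7B%22q%22%3A%22caf%C3%A9%22%7D';

# Strip the field name and undo the percent-escapes by hand.
(my $bytes = $body) =~ s/^json=//;
$bytes =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# Decode the UTF-8 bytes into Perl characters.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, utf8 flag %s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'on' : 'off';
printf "chars: length %d, utf8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';
printf "character at offset 9 of chars: U+%04X\n", ord(substr($chars, 9, 1));
```

Running the same three printf lines on param('json') inside the route handler should show which of the two forms Dancer hands over.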

Possibly relevant links:

- https://github.com/sukria/Dancer/pull/740/files
- https://github.com/sukria/Dancer/issues/749
- https://github.com/sukria/Dancer/pull/726

Daniel

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: diff.txt
URL: <http://www.backup-manager.org/pipermail/dancer-users/attachments/20120315/6d86f312/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: index.tt
Type: application/octet-stream
Size: 270 bytes
Desc: not available
URL: <http://www.backup-manager.org/pipermail/dancer-users/attachments/20120315/6d86f312/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: App.pm
Type: application/octet-stream
Size: 256 bytes
Desc: not available
URL: <http://www.backup-manager.org/pipermail/dancer-users/attachments/20120315/6d86f312/attachment-0001.obj>

