Fellow Dancers,

I am mystified by the following issue. My Dancer-powered web site converts UTF-8 encoded plain-text files formatted with Markdown into HTML. My template is based on HTML5, using the boilerplate from html5boilerplate.com, and has the following incantation:

    <!doctype html>
    <!--
    hacked by Puneet Kishor, based on HTML Boilerplate by Paul Irish and Divya Manian
    all modifications released under a CC0 waiver by Puneet Kishor
    March 15, 2011
    See http://html5boilerplate.com/ for the original
    -->
    <!--[if lt IE 7 ]> <html class="no-js ie6" lang="en"> <![endif]-->
    <!--[if IE 7 ]>    <html class="no-js ie7" lang="en"> <![endif]-->
    <!--[if IE 8 ]>    <html class="no-js ie8" lang="en"> <![endif]-->
    <!--[if (gte IE 9)|!(IE)]><!--> <html class="no-js" lang="en"> <!--<![endif]-->
    <head>
      <meta charset="utf-8">

and my Dancer config setting has

    charset: utf-8

Yet a text file in Hindi that I created today shows up as gobbledegook in Safari. The text file is perfectly legible in two different Mac OS X text editors (Coda and TextWrangler) and in the terminal using `less` and `vim`. Safari is set to detect the encoding automatically, but it seems that the web server (Starman) is not sending the right encoding.

Any thoughts?

-- 
Puneet Kishor
<http://www.chiark.greenend.org.uk/~sgtatham/bugs.html#showmehow>

I am not able to reproduce the problem. Prepare a minimal example that exhibits the problem, and tell the exact steps we need to take to observe it.
On Dec 23, 2011, at 1:57 AM, Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯 wrote:
> <http://www.chiark.greenend.org.uk/~sgtatham/bugs.html#showmehow>
>
> I am not able to reproduce the problem. Prepare a minimal example that exhibits the problem, and tell the exact steps we need to take to observe it.
I am not sure how to prepare the example, because by the time the text reaches my browser, it is already messed up. I can send you what the browser renders. It looks like

    सà¥à¤°à¤ सॠà¤à¤¿à¤°à¤¨à¥ पर à¤à¤, à¤à¤à¤° à¤à¤¤ पर ठहरॠधà¥à¤ª

when it should actually look like

    सूरज से किरनो पर आई, आकर छत पर ठहरी धूप

So I inserted the following lines in my program:

    open my $fh, '>', '/Volumes/roller/Users/punkish/Sites_development/punkish/public/foo.txt'
        or die "Cannot open foo.txt: $!";
    say $fh $template_opts{'page_text'};
    close $fh;

    return template $template, \%template_opts, { layout => 'main' };

Well, guess what... foo.txt renders perfectly, just as I expect it to, but the same text shows up as gobbledegook in the browser. Safari's Web Inspector says that the response header is

• Connection: keep-alive
• Content-Length: 17880
• Content-Type: text/html; charset=utf-8
• Date: Fri, 23 Dec 2011 21:19:47 GMT
• Server: Perl Dancer 1.3072
• X-Powered-By: Perl Dancer 1.3072

From what I can decipher from the above, everything is fine before stuff leaves the Dancer program. Something happens between `return template $template` and viewing it in the browser. Assuming Dancer is not doing anything funky, it must be Starman.

-- 
Puneet Kishor
On 12/23/2011 03:39 AM, Puneet Kishor wrote:
> Fellow Dancers,
>
> I am mystified by the following issue. My Dancer-powered web site converts utf-8 encoded, plain text files formatted with Markdown into html.
How do you open these plain text files inside your Dancer application? If you use Perl's open function or File::Slurp, you have to tell them that your file is UTF-8. There is no way around that.

Regards
        Racke

-- 
LinuXia Systems => http://www.linuxia.de/
Expert Interchange Consulting and System Administration
ICDEVGROUP => http://www.icdevgroup.org/
Interchange Development Team
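What Racke describes can be seen by reading the same file with and without a decoding layer. The sketch below is illustrative only: the sample word and the temporary file are assumptions, not taken from Puneet's application. It writes four Devanagari characters to disk as UTF-8 (three bytes each) and reads them back both ways.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text: the first word of the Hindi line from this thread.
# The \x{...} escapes keep this source file pure ASCII.
my $hindi = "\x{0938}\x{0942}\x{0930}\x{091C}";    # 4 characters

# Write it to a temporary file as UTF-8 (3 bytes per Devanagari character).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} $hindi;
close $out or die "close: $!";

# 1. Read WITHOUT an encoding layer: every byte becomes its own character.
open my $raw_fh, '<', $path or die "open: $!";
my $raw = do { local $/; <$raw_fh> };
close $raw_fh;

# 2. Read WITH the decoding layer: the bytes are decoded back into characters.
open my $utf8_fh, '<:encoding(UTF-8)', $path or die "open: $!";
my $decoded = do { local $/; <$utf8_fh> };
close $utf8_fh;

printf "raw read:     %d characters\n", length $raw;       # 12 (one per byte)
printf "decoded read: %d characters\n", length $decoded;   # 4
print "decoded text matches: ", ($decoded eq $hindi ? "yes" : "no"), "\n";
```

Only the second read gives Markdown (and later Dancer) the four Hindi characters; the first hands over twelve unrelated one-byte characters.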
On Dec 23, 2011, at 3:28 PM, Stefan Hornburg (Racke) wrote:
> How do you open these plain text files inside your Dancer application? If you use Perl's open function or File::Slurp, you have to tell them that your file is UTF-8. There is no way around that.
That was it. Thanks. Here is what I had to do:

    - open my $fh, "<", $full_path_to_page
    + open my $fh, "<:encoding(UTF-8)", $full_path_to_page

Then I got an error in my customized Markdown.pm, where `md5_hex` croaked, so I had to change that too:

    - my $key = md5_hex($tag);
    + my $key = md5_hex(encode_utf8($tag));

It works now. So I got lulled by the documentation, which says that all I have to do is set `charset utf-8` in config.yml and Dancer will take care of everything.

Another interesting thing: before I made the above changes, as I noted in my earlier email, I wrote the output to a file on disk before sending it back to the browser. The file written to disk shows the text just fine. Any explanations why?

In any case, all's well now.

-- 
Puneet Kishor
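Puneet's second change follows from the first: once the file is decoded, `$tag` holds characters above 0xFF, and Digest::MD5 only works on bytes (its documentation says it croaks on such input). A minimal sketch, using a hypothetical `$tag` value rather than anything from Puneet's Markdown.pm:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# A decoded (character) string, as you get after reading with :encoding(UTF-8).
# Hypothetical value: the last word of the Hindi line, written with escapes.
my $tag = "\x{0927}\x{0942}\x{092A}";

# md5_hex works on bytes; a string containing characters above 0xFF makes it die.
my $ok = eval { md5_hex($tag); 1 };
print "md5_hex on characters: ", ($ok ? "ok\n" : "died: $@");

# Encoding to UTF-8 bytes first gives a well-defined digest.
my $key = md5_hex(encode_utf8($tag));
print "md5_hex on UTF-8 bytes: $key\n";
```

Encoding explicitly also means the digest is the MD5 of the UTF-8 bytes, which is what you would get from the same text stored in a UTF-8 file.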
On 12/23/2011 10:42 PM, Puneet Kishor wrote:
> Another interesting thing -- before I made the above changes, as I noted in my earlier email, I just wrote out the output to a file on disk before sending it back to the browser. The file written to the disk has the text rendered just fine. Any explanations why?
File systems don't care about file contents, so if you read something from a file, it is unfortunately our task to know its encoding.

Regards
        Racke
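Racke's answer also explains the foo.txt observation: reading and writing without any layer is a byte-for-byte copy, so the new file looks fine in an editor that reads it as UTF-8. A sketch under the assumption that no `use open` pragma or PERL_UNICODE setting adds default layers; the sample bytes (the UTF-8 encoding of the first Hindi word) are hypothetical input.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input: the 12 UTF-8 bytes of a 4-character Hindi word.
my $bytes_on_disk = "\xE0\xA4\xB8\xE0\xA5\x82\xE0\xA4\xB0\xE0\xA4\x9C";

my ($in, $in_path) = tempfile(UNLINK => 1);
binmode $in;                      # write the bytes exactly as given
print {$in} $bytes_on_disk;
close $in or die "close: $!";

# Read with no encoding layer: each byte becomes one character (0x00-0xFF).
open my $rfh, '<', $in_path or die "open: $!";
my $text = do { local $/; <$rfh> };
close $rfh;

# Write it back with no encoding layer: each character becomes one byte again.
my ($copy, $copy_path) = tempfile(UNLINK => 1);
print {$copy} $text;
close $copy or die "close: $!";

# Compare the two files byte for byte.
open my $cfh, '<:raw', $copy_path or die "open: $!";
my $copied = do { local $/; <$cfh> };
close $cfh;

printf "characters in Perl: %d\n", length $text;    # 12, not 4
print "copy identical: ", ($copied eq $bytes_on_disk ? "yes" : "no"), "\n";
```

The copy is identical even though Perl never saw a single Hindi character, which is why the file on disk looked right while the web page did not.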
"The file written to the disk has the text rendered just fine. Any explanations why?"

Because you wrote the file in the same way you read it: with no encoding specified, so the new file is a byte-for-byte copy of the original. But it *would* look wrong if you wrote the file as ">:encoding(UTF-8)".

Here's how it happens:

1. You tell Perl to read a file with Hindi without specifying the encoding, i.e. as if it contained only ANSI (strictly, Perl takes each byte as one character in the Latin-1 range). You now have a string of one-byte characters which all happen to appear in ANSI: Latin letter a with grave, currency symbol, etc.
2. You tell Perl to write the string to a new file as ANSI. It is identical to the original file.
3. You read the new file as UTF-8 in your text editor. Unlike Perl's read, this reads as UTF-8, so it interprets the byte sequences which in ANSI represent Latin letter a with grave, currency symbol, etc. as Hindi letters, not as separate letters from an 8-bit codepage.
4. Meanwhile, you tell Perl to write the string to a web page as UTF-8. Perl sends the UTF-8 values of characters like à and ¤, NOT their ANSI (byte) values.
5. You read the page in your browser. Your page displays Latin letter a with grave, currency symbol, etc., because you have sent the UTF-8 values for these and not the raw bytes. The 8-bit values the server sends are now actually

       à ¤¸à ¥‚à ¤°à ¤œ à ¤¸à ¥‡ à ¤•à ¤¿à ¤°à ¤¨à ¥‹ à ¤ªà ¤° à ¤†à ¤ˆ, à ¤†à ¤•à ¤° à ¤›à ¤¤ à ¤ªà ¤° à ¤ à ¤¹à ¤°à ¥€ à ¤§à ¥‚à ¤ª

   (which is also probably how your string is stored by Perl).

"So, I got lulled by the documentation that says that I all I have to do is to set the `charset utf-8` in config.yml, and Dancer would take care of everything."

The documentation is strictly correct, but you asked Dancer to do something you didn't mean. By the time you gave Dancer the string, there was no Hindi in it, just a load of currency symbols and accented Latin characters, which Dancer faithfully passed on to the browser, in UTF-8.
Dancer had no way of knowing that it came from a file which had been read as ANSI.

Daniel
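Daniel's steps 1, 4 and 5 can be reproduced in a few lines of Perl. This is a sketch: it assumes, as Daniel describes, that with `charset: utf-8` the response body is encoded to UTF-8 on the way out, and it uses `encode_utf8` from Encode to stand in for that step rather than calling Dancer itself. The sample bytes are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# UTF-8 bytes of a 4-character Hindi word, as they sit in the Markdown file.
my $file_bytes = "\xE0\xA4\xB8\xE0\xA5\x82\xE0\xA4\xB0\xE0\xA4\x9C";

# Step 1 (simulated): reading without a layer yields these 12 bytes as
# 12 one-byte characters (a with grave, currency sign, ...), not 4 Hindi ones.
my $misread = $file_bytes;

# Step 4: the body is encoded to UTF-8 on output (assumed, per Daniel).
# Each of the 12 characters is >= 0x80, so each becomes 2 bytes: 24 in total.
my $sent_wrong = encode_utf8($misread);

# The fix: decode on input, so the encoder sees the real characters.
my $sent_right = encode_utf8(decode_utf8($file_bytes));

printf "misread: %d bytes sent\n", length $sent_wrong;   # 24
printf "decoded: %d bytes sent\n", length $sent_right;   # 12
print "fixed output equals the file: ", ($sent_right eq $file_bytes ? "yes" : "no"), "\n";
```

Decoded as UTF-8 by the browser, the 24 bytes become the 12 characters à, ¤, ¸ and so on, which is the kind of text Puneet saw; with the decoding layer in place, the bytes sent are exactly the bytes in the file.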
participants (4)
- Daniel Perrett
- Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯
- Puneet Kishor
- Stefan Hornburg (Racke)