Tuesday, May 15, 2007

Bad bad (en)coding

The Microsoft home page declares two different encodings (charset) in its META tags: UTF-8 and UTF-16. If you take the page's byte array and convert it to a string using each of these encodings, the UTF-16 string comes out as junk characters. Now, browsers generally ignore the advisory encoding given in the HTML and figure it out using their own methods. Apparently, IE has better encoding detection than the Mozilla family. But if I am building a simple application that will not have such sophisticated pieces in place, this is a source of enormous grief. While working with some non-English sites, we came across a bunch of sites which seem to think "utf-8" is some innocent directive you can put in an HTML page. This is the bad thing about standards: there are just too many of them.
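To make the problem concrete, here is a minimal Python sketch of what happens when the same bytes are decoded under both declared charsets, plus a crude way a simple application might choose between them. The sample bytes, the function names `decode_candidates` and `pick_charset`, and the "does the result still contain markup?" heuristic are all assumptions made for illustration; this is not what any browser actually does, and it uses `utf-16-le` only so the byte order is deterministic.

```python
import codecs


def decode_candidates(raw: bytes, charsets: list[str]) -> dict[str, str]:
    """Decode the same bytes under each declared charset.

    Invalid sequences are replaced rather than raising, so every
    candidate yields a string we can inspect.
    """
    return {name: raw.decode(name, errors="replace") for name in charsets}


def pick_charset(raw: bytes, charsets: list[str]) -> str:
    """Crude heuristic (an assumption, not a standard algorithm):
    trust a byte order mark if present, otherwise take the first
    declared charset whose decoding still contains '<' and '>'.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    for name, text in decode_candidates(raw, charsets).items():
        if "<" in text and ">" in text:
            return name
    return "utf-8"  # last-resort default


# Hypothetical page bytes: plain ASCII markup, as served by many sites.
page = b'<html><head><meta charset="utf-8"></head></html>'

decoded = decode_candidates(page, ["utf-16-le", "utf-8"])
for name, text in decoded.items():
    print(name, "->", text[:20])

print("chosen:", pick_charset(page, ["utf-16-le", "utf-8"]))
```

Decoding ASCII markup as UTF-16 pairs up the bytes into unrelated code points, so no error is raised and the result is simply junk. That is why a strict decode alone cannot tell you which declaration is wrong, and why the sketch falls back on inspecting the decoded text.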

[Screenshots: ms-home-page-src, ms-encoding-utf8, ms-encoding-utf16]
