Character encoding is hard. Really. If I could point to one thing that causes feeds to be invalid more than anything else, it would be character encoding. When I first started working with RSS I was always surprised at the energy Bill Kearney put into character encoding. If there was one thing you could count on, it was that Bill would always jump into a conversation on character encoding. Two years later and I am finally coming to that place. That place where I jump into any discussion on character encoding. I finally get it, and I finally see what grief it causes with XML, and not just in RSS feeds, but in other areas too. Don't believe me? Not even DMOZ can get character encoding right [via diveintomark].
You see, this is one of the things about XML: a conformant XML processor is only required to accept "utf-8" and "utf-16". So it's possible that an XML processor could reject "Shift_JIS" or "ISO-2022-JP". Who knows, there might even be an XML processor out there that rejects well-formed XML encoded in "utf-32". The more I learn about character encoding, the more I like "utf-8".
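
To make that concrete, here is a rough Python sketch of how a feed could be checked for an encoding that every conformant processor has to accept. The names `declared_encoding` and `is_portable` are made up for this illustration, and the sketch assumes the XML declaration itself is written in ASCII-compatible bytes unless a UTF-16 or UTF-32 byte order mark is present. It ignores any charset supplied by HTTP headers, so treat it as a quick sanity check rather than a full implementation of XML encoding detection.

```python
import codecs
import re

# Encodings every conformant XML processor must accept.
REQUIRED_ENCODINGS = {"utf-8", "utf-16"}

# Matches the encoding pseudo-attribute in an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(xml_bytes: bytes) -> str:
    """Return a best guess at the document's encoding, lowercased.

    Hypothetical helper: looks at byte order marks first, then at the
    XML declaration, and falls back to utf-8 when neither is present.
    """
    # Check UTF-32 before UTF-16: the UTF-32 LE mark starts with the UTF-16 LE mark.
    if xml_bytes.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if xml_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if xml_bytes.startswith(codecs.BOM_UTF8):
        xml_bytes = xml_bytes[len(codecs.BOM_UTF8):]
    match = _DECL.match(xml_bytes)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def is_portable(xml_bytes: bytes) -> bool:
    """True if the guessed encoding is one every XML processor must accept."""
    return declared_encoding(xml_bytes) in REQUIRED_ENCODINGS
```

A feed declared as "Shift_JIS" would come back as not portable, which does not mean it is invalid, only that a conformant processor is allowed to refuse it.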
Hey Joe
you might like this one
http://www.joelonsoftware.com/articles/Unicode.html
Posted by Karl on 2004-03-27
Posted by Adriaan on 2004-03-24