If the documents use English tag names (say XHTML or DocBook or SOAP) in conjunction with Asian PCDATA, the difference is even smaller. At one point I experimented with switching between UTF-8 and UTF-16 depending on language, and was surprised to find it really didn't make a big difference. For one real-world example, I looked at the Japanese translation of the XML specification included in the W3C XML test suite. The UTF-8 version is 202K. The UTF-16 version is 305K, 50% larger! Of course, this can be highly dependent on the nature of the documents. An originally Japanese document with Japanese markup and no internal DTD subset might reverse these numbers, or at least bring them into parity.

Elliotte Rusty Harold on [www-tag]
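The effect described in the quote is easy to reproduce on a small scale. The sketch below is only an illustration: the two sample strings and the helper name `encoded_sizes` are made up for this post and are not taken from the W3C test suite. It compares the encoded size of Japanese text wrapped in ASCII markup with the size of the same text on its own.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (utf8_bytes, utf16_bytes) for text, with no byte order mark."""
    # "utf-16-le" is used so that a BOM does not inflate the UTF-16 count.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical sample: ASCII tag names wrapping Japanese character data.
markup_heavy = '<p class="note"><em>日本語</em></p>'
# Hypothetical sample: the same three characters with no markup at all.
text_only = "日本語"

for label, sample in (("markup-heavy", markup_heavy), ("text-only", text_only)):
    utf8_bytes, utf16_bytes = encoded_sizes(sample)
    print(f"{label}: UTF-8 {utf8_bytes} bytes, UTF-16 {utf16_bytes} bytes")
```

Each of these Japanese characters takes three bytes in UTF-8 and two in UTF-16, while every ASCII character takes one byte in UTF-8 and two in UTF-16. Once the markup outweighs the text, UTF-8 comes out smaller, which is presumably what happens in the XML specification example.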

The more I learn about Unicode and encodings, the more I like UTF-8.

I do everything in UTF-8.

Posted by Adriaan on 2004-03-12