Auto-detect encoding

I wonder if it would be efficient to use Bayesian methods to determine a document's encoding? You know, for those pesky XML files that come in without an encoding specified.
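
To make the idea concrete, here is a minimal sketch of what a naive Bayes guesser over raw bytes might look like. Everything in it is an assumption for illustration: the training sentence, the three candidate encodings, the uniform prior, and the names `train`, `log_likelihood`, and `guess_encoding` are all made up here, and a real detector would need far more training data and candidates.

```python
import math
from collections import Counter

# Hypothetical training text and candidate list, chosen only for illustration.
SAMPLE_TEXT = "Die Prüfung läuft über größere Straßen, café, naïve, señor, Äpfel und Öl."
CANDIDATES = ["utf-8", "latin-1", "utf-16-le"]


def train(sample, encodings):
    """Build one byte-frequency model per encoding from the same sample text."""
    models = {}
    for enc in encodings:
        data = sample.encode(enc)
        # Iterating over bytes yields ints 0..255, so this counts byte values.
        models[enc] = (Counter(data), len(data))
    return models


def log_likelihood(data, model):
    """Sum of log P(byte | encoding), with add-one smoothing over 256 byte values."""
    counts, total = model
    return sum(math.log((counts[b] + 1) / (total + 256)) for b in data)


def guess_encoding(data, models):
    """Pick the encoding with the highest likelihood (uniform prior assumed)."""
    return max(models, key=lambda enc: log_likelihood(data, models[enc]))


MODELS = train(SAMPLE_TEXT, CANDIDATES)
```

Calling `guess_encoding(raw_bytes, MODELS)` returns one of the candidate names. With a uniform prior, the highest likelihood is also the highest posterior, so this is Bayesian only in the simplest sense; a more serious version would weight encodings by how common they are and model byte sequences rather than single bytes.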

It's certainly possible to match, say, known words to their byte patterns, and thus to encodings. The question is: why?

Posted by Ziv Caspi on 2003-06-28