A part of me wants to rail against Mark Pilgrim's latest article, Parsing RSS At All Costs. On one hand, I think that introducing a liberal parser introduces a positive feedback loop into the system. That is, once you sink below Well-Formed, you end up in a race to the bottom. Who is to say what is 'liberal' enough? Tomorrow someone will introduce a new parser that is even more liberal than Mark's. When RSS feed quality declines even more, a whole new round of ever more liberal parsers will arrive to fill the need. Where does it end?
The designers of XML knew this all too well and designed a constraint into the system: the concept of Well-Formed. It is the minimum standard a document must meet to be called XML, and it effectively breaks the positive feedback loop referred to above. Well-Formedness wasn't an accident; it was intentionally put in to avoid such a "race to the bottom". That positive feedback loop may also reach beyond the RSS community and affect the larger XML community.
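To make the "minimum standard" concrete, here is a minimal sketch in Python of what a strict parser does with a feed that falls below Well-Formed. The helper name `is_well_formed` and the two sample feeds are made up for illustration; the only real API used is the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # A conforming XML parser stops at the first well-formedness error
        # instead of guessing what the author meant.
        return False


# Illustrative feeds, not taken from any real site.
good_feed = "<rss><channel><title>Example</title></channel></rss>"
bad_feed = "<rss><channel><title>Example</channel></rss>"  # <title> is never closed

print(is_well_formed(good_feed))
print(is_well_formed(bad_feed))
```

There is no dial to turn here. The document either parses or it doesn't, which is exactly what keeps the race to the bottom from starting.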
I don't care. Since I'm the guy who maintains The Well-Formed Web, you might think I'd be the first to attack the article. Instead I took my time responding to Mark's article because, while I think it will damage XML, I'm not convinced that's a bad thing.
Think back to the good/bad old days of the web, when we all learned HTML by viewing the source and experimenting. Browsers were liberal and documentation was scant. Now, there is something to be said for reminiscing about the good old days, but I think there is also an important lesson to be learned from the early versions of HTML. I am not talking about HTML as laid out in some specification but HTML as practiced. It was sloppy and ill-defined, but it had the following advantages:
- Built-in character entities.
- No need to include some stinking DTD, all the good entities were already pre-defined.
- No need to close all the tags; the parser could figure out what you meant most of the time.
- Just dump a new tag in for your custom data. Browsers were supposed to ignore any tags they didn't recognize, and we didn't have any fancy-schmancy validators back then.
- Built-in web awareness.
- HTML was aware of the web from day one, unlike XML, where web awareness is still being added in a patchwork fashion, à la XLink, XPointer, etc.
All that and you still ended up with a tree-structured document model.
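As a rough illustration of the list above, here is a sketch using the tolerant `html.parser` module from Python's standard library as a stand-in for an old liberal browser. The class name `EventCollector`, the function `parse_sloppy`, and the sample markup are all invented for this example. Note that it only collects the parse events; turning them into a tree would need extra rules about which tags implicitly close which, and that part is left out.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical collector: records start tags and text, never raises on sloppy input."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True, so entities are decoded
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Unknown tags such as <mytag> arrive here just like <p> or <li>.
        self.start_tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def parse_sloppy(markup):
    collector = EventCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered
    return collector.start_tags, collector.text


# Unclosed <p> and <li>, pre-defined entities with no DTD, and a made-up tag.
sloppy = "<p>Caf&eacute; &copy; 2003<li>one<li>two<mytag>custom</mytag>"
tags, text = parse_sloppy(sloppy)
print(tags)
print(text)
```

The same input would be rejected outright by a strict XML parser, both for the unclosed tags and for the entities that XML doesn't pre-define.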
So do I want a return to the good/bad old days of HTML? Not really. What I do want is the good parts of old HTML along with the experience of Well-Formed XML to strongly influence and inform the development of XML's successor, be that XML 2.0 or some other markup language.