More on Regex-able XML

Joe Gregorio

I have received good feedback on my Regex-able XML post. Good in this case doesn't mean people agree with me, just that the responses have been very intelligent and helpful. (On a side note, I will get comments working here before the week is out.)

To clarify, I do know that conforming parsers will wipe away the distinctions between the two example documents I posted. My point in presenting them was to show the complexity in XML, and that the format's complexity requires using such a parser.

Also it has been pointed out that this example:

<root:item xmlns:bc="http://purl.org/dc/elements/1.1/" xmlns:root="" >
  <root:title>MetaData</root:title>

  <bc:date>2003-01-12T00:18:05-05:00</bc:date>
  <root:link>http://bitworking.org/news/8</root:link>

  <description>Upon waking, the dinosaur...</description>
</root:item>

isn't valid XML. Turns out the Python parser minidom parses it just fine, but the .NET parser, to its credit, rightly picks up on the invalid namespace declaration. That is to say, rule #2 in my previous post is already a part of the XML Namespaces specification. Mea culpa.
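For what it's worth, this particular mistake is one a plain regex can catch. Here is a minimal sketch in Python, assuming the Namespaces in XML 1.0 rule that a prefix may not be bound to an empty namespace name. The function name and pattern are my own illustration, and the pattern is deliberately naive: it scans raw text, so it will also match inside comments and CDATA sections.

```python
import re

# Naive pattern: xmlns:prefix="" or xmlns:prefix='' anywhere in the text.
# Assumption: prefixes are simple ASCII names; full XML name rules are ignored.
EMPTY_PREFIX_DECL = re.compile(r"""\bxmlns:([A-Za-z_][\w.\-]*)\s*=\s*(?:""|'')""")


def empty_prefix_declarations(xml_text):
    """Return the prefixes that are bound to an empty namespace name."""
    return EMPTY_PREFIX_DECL.findall(xml_text)


sample = '<root:item xmlns:bc="http://purl.org/dc/elements/1.1/" xmlns:root="" >'
print(empty_prefix_declarations(sample))  # ['root']
```

It says nothing about whether the rest of the document is well-formed, only whether this one declaration rule is broken.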

Multiple readers fairly asked for more concrete examples of what I am trying to accomplish that XML is making so difficult. I will post some illustrative examples soon.

It was also pointed out to me that XSLT 2.0 includes support for regexes. That is cool and may be a step in the right direction. Maybe when 2.0 implementations are readily available I'll stop my whining.

I assume someone must have pointed you at this already, but I can't see a comment on this or the previous article (where I came in today from Tim Bray's blog). 'REX' gets mentioned on a regular basis in xml-dev. However as I'm sure other folk have pointed out, a second layer is required to deal with namespaces, entities, etc. I guess what you want is regexps that work intelligently in XML. Yeah, would be nice, wouldn't it :)

Posted by Baz on 2003-03-18
