DOM (Drudgery Object Model)

Joe Gregorio

I have been doing a lot of XML manipulation in the implementation of RESTLog recently and have come to really despise the DOM. In the best of situations it leads to awkward and verbose code; in the worst of times... well, we won't use those words in a public place.

A pretty good clue that an API is fundamentally broken is that you keep re-creating the missing pieces or obvious patches from platform to platform. This I have already done in C#, and I am now finding myself doing it again in Python: a function to get the raw text of an element as a single string, a function to set the raw text of an element, a function to robustly add a namespaced element and its value to a document, overwriting an existing element if it is already there, etc. The list could go on, but you get the idea. It stinks and I need something better.

For example, what is the shortest piece of code you can write to add a namespaced element to a document through a DOM API? To be even more specific, how much code does it take to add the element:

<dc:date>2003-01-27T22:52:04-05:00</dc:date>

as a child element of 'item' in this document, without adding a duplicate dc:date if one already exists, and instead replacing the existing 'dc:date' element's contents with the new value?

<item>
   <title>A sunny day</title>
   <link>http://example.com</link>
   <description>Insert witty prose here.</description>
</item>
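For a sense of scale, here is a rough sketch of the task using only standard DOM calls, written against Python's xml.dom.minidom. The helper name set_dc_date and the constants DC_NS and ITEM_XML are made up for illustration, and the sketch assumes 'item' is the document's root element. It is one way to do it, not necessarily the shortest.

```python
from xml.dom import minidom

DC_NS = "http://purl.org/dc/elements/1.1/"

ITEM_XML = """<item>
   <title>A sunny day</title>
   <link>http://example.com</link>
   <description>Insert witty prose here.</description>
</item>"""


def set_dc_date(doc, value):
    """Add dc:date under the root 'item', or replace the text of an existing one."""
    item = doc.documentElement  # assumption: 'item' is the root element
    # Only direct children of 'item' count as an existing dc:date.
    matches = [node for node in item.getElementsByTagNameNS(DC_NS, "date")
               if node.parentNode is item]
    if matches:
        date = matches[0]
        # The DOM has no "set text" call, so remove the old children by hand.
        while date.firstChild is not None:
            date.removeChild(date.firstChild)
    else:
        date = doc.createElementNS(DC_NS, "dc:date")
        # minidom does not write namespace declarations on its own.
        date.setAttribute("xmlns:dc", DC_NS)
        item.appendChild(date)
    date.appendChild(doc.createTextNode(value))
    return date


doc = minidom.parseString(ITEM_XML)
set_dc_date(doc, "2003-01-27T22:52:04-05:00")
set_dc_date(doc, "2003-01-27T22:52:04-05:00")  # second call replaces, no duplicate
print(doc.documentElement.toxml())
```

That is roughly fifteen lines of logic for a single element, and the get-text, set-text and namespace handling all have to be spelled out by hand.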

What I want is an API that would make it easy, for example:

insertElement(
  namespaces={"dc" : "http://purl.org/dc/elements/1.1/"}, 
  path="item/dc:date", 
  value="2003-01-27T22:52:04-05:00", 
  unique=True)
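To show that such a call is implementable, here is a minimal sketch of a helper with that shape, built on Python's xml.etree.ElementTree rather than the DOM. The name insert_element and the extra root argument are assumptions added for illustration (the call above does not say where the document comes from), and the path handling covers only plain "parent/child" steps, not full XPath.

```python
import xml.etree.ElementTree as ET


def insert_element(root, namespaces, path, value, unique=True):
    """Hypothetical helper: set the text of the element at `path`, creating it if needed."""

    def expand(step):
        # Turn "dc:date" into ElementTree's "{namespace-uri}date" form.
        if ":" in step:
            prefix, local = step.split(":", 1)
            return "{%s}%s" % (namespaces[prefix], local)
        return step

    steps = [expand(step) for step in path.split("/")]
    if len(steps) < 2 or steps[0] != root.tag:
        raise ValueError("path must start at the root element and name a child")

    node = root
    # Walk (or create) the intermediate elements between root and target.
    for tag in steps[1:-1]:
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child

    target = node.find(steps[-1]) if unique else None
    if target is None:
        target = ET.SubElement(node, steps[-1])
    else:
        # Replace the existing element's contents: drop any child elements.
        for child in list(target):
            target.remove(child)
    target.text = value
    return target


root = ET.fromstring("<item><title>A sunny day</title></item>")
insert_element(
    root,
    namespaces={"dc": "http://purl.org/dc/elements/1.1/"},
    path="item/dc:date",
    value="2003-01-27T22:52:04-05:00",
    unique=True)
```

With unique=True a second call with the same path overwrites the existing element's text instead of appending a sibling; with unique=False it always appends.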

Update:

Timothy Appnel sent me a link to Paul Prescod's explanation of PullDOM as implemented in Python, which is cool. I like pull-based parsers for reading XML; in fact, I used the pull-based parser in the .NET Framework when building the RSS parser for Aggie.

Erick Herring sent me a link to Elliotte Rusty Harold's XOM. The documentation for this API also includes a presentation entitled "What's Wrong with XML APIs (and how to fix them)". It is a good overview of the different APIs available and their strengths and weaknesses.

Second Update:

Dare Obasanjo has kindly offered up a two-line solution in C#. There are, however, two problems. The first is my fault, since I wasn't very clear about the behaviour I wanted if a 'dc:date' element was already present. I have added more verbiage to the description to clarify that if 'dc:date' already exists then that element's content should be replaced with the new content.

The second problem with Dare's example is that it's not valid DOM. The InnerXml property isn't a part of the DOM. Even if you were to use the DOM attribute 'nodeValue', the DOM doesn't allow you to set 'nodeValue' when the node is of type Element. I think it's great that Microsoft has added useful extensions to the DOM. My point is that they had to add them to make working with XML tolerable.
