Considering PATCH

Dare Obasanjo:

Given that Joe works for Google on GData, I have assumed that Joe's post is Google's attempt to float a trial balloon before extending AtomPub in this way.

As I explained in the comments on Dare's post, this is my personal blog and unless otherwise stated, my own thoughts and ideas. If I weren't just speaking for myself I would make that clear.

Like I am about to do now.

At Google we are considering using PATCH. One of the big open questions surrounding that decision is XML patch formats. What have you found for patch formats and associated libraries?

Unfortunately not much that's usable. All of the xml-diff approaches that have been floated over the years tend towards being grossly overcomplicated.

Posted by James Snell on 2008-02-20

Is XUpdate a viable option?

Posted by Stephen Bounds on 2008-02-20

There's an IETF draft out for something that fails one of James' tests (it's in XML), but seems to have some thought put into the structure and operations you'd need: http://www.ietf.org/internet-drafts/draft-ietf-simple-xml-patch-ops-04.txt

Posted by Gordon Weakliem on 2008-02-21

I am probably giving away my nievity here, but why not use a standard binary diff algorithm and use it equally for all media types?

Posted by Noah Slater on 2008-02-21

Binary diff for XML? Do I here canonicalization :)

Posted by Subbu Allamaraju on 2008-02-21

Gordon,

Yeah, that one, like all the ones I've found so far, tries to take on too much, such as manipulating namespace declarations, comments and processing instructions. In addition it doesn't present on algorithm on how to generate a patch from two documents, only a format for communicating the differences.

A good candidate would have a limited number of operations, and not only specify the patch format and how to apply it, but how to generate a patch document from a pair of documents. For example: flatten the DOM into a sequence of SAX-like events and then applying a line oriented diff to the two sequences. The operations should be very limited in scope: delete element, add element, edit text node, edit element attributes. It could be that simple if I don't care about multi-master synchronization or three-way merge.

Noah,

A binary diff presumes that two actors would serialize the same XML DOM in the same exact way, which as Subbu points out is the realm of XML canonicalization, and not anywhere you want to go.

Posted by Joe on 2008-02-21

Joe: "A good candidate would have a limited number of operations, and not only specify the patch format and how to apply it, but how to generate a patch document from a pair of documents. For example: flatten the DOM into a sequence of SAX-like events and then applying a line oriented diff to the two sequences. The operations should be very limited in scope: delete element, add element, edit text node, edit element attributes. It could be that simple if I don't care about multi-master synchronization or three-way merge."

This is a big part of where I was going with this. For the most part, existing solutions try to do way too much.

Posted by James Snell on 2008-02-21

Joe, as far as an algorithm, the standard for tree differences is the Zhang-Shasha algorithm. The Microsoft XML Diff toolkit implements that along with another "fast" algorithm, which is unpublished to my knowlege. That said, that's an implementation that solves the entire issue. Also, XML isn't strictly a tree (because of attribute axes), so there's that issue. Anyway, this blog post has a pretty good discussion of Zhang-Shasha. I'd blogged on that a couple years ago when I was trying to deal with merging OPML documents. In the end, I pretty much punted on solving my problem because for my particular case, it was gold plating a solution.

Posted by Gordon Weakliem on 2008-02-21

ISTR IBM had something in this space called TreeDiff...

(rummages about the internet)

Hmm, it looks like it's been 'retired':

http://www.alphaworks.ibm.com/tech/xmltreediff

The Wayback Machine has a bit more from before the retirement in late 2003:

http://web.archive.org/web/20031002102746/http://alphaworks.ibm.com/tech/xmltreediff

I guess that's not actually very helpful, sorry.

Posted by Michael R. Bernstein on 2008-02-22

Wouldn't XSLT be a viable option for "patching" an Atom entry? I realize it doesn't fit the traditional patch model, but since it is simple to copy everything and filter as needed, it seems like a simpler solution. For example, the PUT could contain an Atom entry with atom:link[rel='stylesheet'] where the server would take its copy and update it with the result. Just thinking aloud a bit.

Posted by Eric Larson on 2008-02-23

Yes, XSLT would work. However, XSLT is Turing-complete, and I’m not sure it’s a great idea to take code supplied by an arbitrary remote client and run it on the server. Although I guess as long as you impose the obvious limitations (disable the document function, limit the amount of memory and time a transform may hog, that sort of thing), that could work very well. But it does mean that the server has to lug an XSLT processor around, which is a much bigger component than, say, an XUpdate implementation. OTOH, unlike the XML patch formats, it has the upside that server and client need not agree exactly about the document’s Infoset: if elements change position or whitespace nodes shuffle around, the transform need not be affected at all.

Posted by Aristotle Pagaltzis on 2008-02-23