BitWorking

PATCH motivating examples

Sam Ruby:

Spend some time up front specifying the behaviors that you want to address.  In the case of Atom, adding an entry, deleting an entry, adding a category to an entry, fixing a typo in the content are examples of common scenarios.  Feel free to use the Atom wiki for this purpose.


To move the discussion of PATCH forward I've posted some examples on the Atom Wiki. They're good examples because they highlight the problems with coming up with a PATCH format for Atom. The largest of those problems is ordering in Atom. That is:

<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Atom-Powered Robots Run Amok</title>
  <link href="http://example.org/2003/12/13/atom03"/>
  <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <summary>Some text.</summary>
</entry>

and

<entry xmlns="http://www.w3.org/2005/Atom">
  <updated>2003-12-13T18:30:02Z</updated>
  <title>Atom-Powered Robots Run Amok</title>
  <link href="http://example.org/2003/12/13/atom03"/>
  <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
  <summary>Some text.</summary>
</entry>

are the same even though the <updated> element has moved. You could create a patch format that requires the server to always serialize elements in the same order. For example, we could number the elements of the first example in document order:

<entry xmlns="http://www.w3.org/2005/Atom">                #1
  <title>Atom-Powered Robots Run Amok</title>              #2
  <link href="http://example.org/2003/12/13/atom03"/>      #3
  <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>   #4
  <updated>2003-12-13T18:30:02Z</updated>                  #5
  <summary>Some text.</summary>                            #6
</entry>

Then our patch format becomes a matter of sending over the element number, and information on either the updated element value or updated attribute values. For example, to change the summary to "foo" we could send:

{
  "n": 6,
  "value": "foo"
}

But given that order is not significant for Atom we may not want to use that approach.
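To make the fragility concrete, here is a minimal Python sketch of how a server might apply such a numbered patch. The function names `apply_ordinal_patch` and `local_name` are hypothetical, made up for this illustration, and the numbering rule (document order, starting at 1 with the entry itself) is simply the one used in the example above, not part of any specification.

```python
import xml.etree.ElementTree as ET

ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Atom-Powered Robots Run Amok</title>
  <link href="http://example.org/2003/12/13/atom03"/>
  <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
  <updated>2003-12-13T18:30:02Z</updated>
  <summary>Some text.</summary>
</entry>"""

# The same entry with <updated> serialized first.
REORDERED = """<entry xmlns="http://www.w3.org/2005/Atom">
  <updated>2003-12-13T18:30:02Z</updated>
  <title>Atom-Powered Robots Run Amok</title>
  <link href="http://example.org/2003/12/13/atom03"/>
  <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
  <summary>Some text.</summary>
</entry>"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return element.tag.split("}", 1)[-1]


def apply_ordinal_patch(entry_xml, patch):
    """Apply a patch of the form {"n": <element number>, "value": <text>}.

    Hypothetical helper: elements are numbered in document order,
    starting at 1 with the <entry> element itself.
    """
    root = ET.fromstring(entry_xml)
    elements = list(root.iter())      # root first, then descendants in order
    target = elements[patch["n"] - 1]
    target.text = patch["value"]
    return root, target


# The patch from the text: element #6 is <summary>.
root, target = apply_ordinal_patch(ENTRY, {"n": 6, "value": "foo"})
print(local_name(target), target.text)

# The same element number points at different elements once the
# serialization order changes.
_, hit_original = apply_ordinal_patch(ENTRY, {"n": 2, "value": "x"})
_, hit_reordered = apply_ordinal_patch(REORDERED, {"n": 2, "value": "x"})
print(local_name(hit_original), local_name(hit_reordered))
```

Element #2 is the title in one serialization and the updated timestamp in the other, so a numbered patch is only safe if client and server agree on a single serialization order.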

Once you step outside of ordering you need some other way of addressing the elements and attributes that are changed. Note that relying on an already existing XML technology like XPath doesn't solve the problem. For example, here is a severely elided example from Google Calendar:

<?xml version='1.0' encoding='utf-8'?>
<entry xmlns='http://www.w3.org/2005/Atom'
       xmlns:batch='http://schemas.google.com/gdata/batch'
       xmlns:gCal='http://schemas.google.com/gCal/2005'
       xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://www.google.com/calenda...ervobk3ng</id>
  ...
  <gd:who rel='http://schemas.google.com/g/2005#event.attendee'
          valueString='Fred Flintstone'
          email='fred@example.com'>
    <gd:attendeeStatus value='http://schemas.google.com/g/2005#event.invited' />
  </gd:who>
  <gd:who rel='http://schemas.google.com/g/2005#event.organizer'
          valueString='Joe Gregorio'
          email='joe@bitworking.org'>
    <gd:attendeeStatus value='http://schemas.google.com/g/2005#event.accepted' />
  </gd:who>
  <gd:where />
</entry>

If I wanted to send back a patch to update Fred's gd:attendeeStatus, what XPath would I construct to isolate that element and attribute?

atom:entry/gd:who/gd:attendeeStatus/@value

No, since that will capture both attendees.

atom:entry/gd:who[position()=1]/gd:attendeeStatus/@value

No, since that relies on the ordering of the elements.

atom:entry/gd:who[@valueString="Fred Flintstone"]/gd:attendeeStatus/@value

Maybe, but is there any guarantee that valueStrings are unique?

atom:entry/gd:who[@email="fred@example.com"]/gd:attendeeStatus/@value

This is right, but how would a generic client know how to construct such an XPath?
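The difference between the unqualified path and the email-keyed path can be checked with a short Python sketch using the standard library's ElementTree, which supports a limited XPath subset. This is only an illustration under some assumptions: the entry is a cut-down copy of the one above with the elided parts left out, the leading atom:entry step is dropped because ElementTree paths are relative to the element they are called on, and the trailing /@value step is replaced by reading the attribute from the matched element. Attribute tests are written with a leading @.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used when resolving "gd:" in the paths below.
NS = {"gd": "http://schemas.google.com/g/2005"}

ENTRY = """<entry xmlns='http://www.w3.org/2005/Atom'
       xmlns:gd='http://schemas.google.com/g/2005'>
  <gd:who rel='http://schemas.google.com/g/2005#event.attendee'
          valueString='Fred Flintstone'
          email='fred@example.com'>
    <gd:attendeeStatus value='http://schemas.google.com/g/2005#event.invited'/>
  </gd:who>
  <gd:who rel='http://schemas.google.com/g/2005#event.organizer'
          valueString='Joe Gregorio'
          email='joe@bitworking.org'>
    <gd:attendeeStatus value='http://schemas.google.com/g/2005#event.accepted'/>
  </gd:who>
  <gd:where/>
</entry>"""

entry = ET.fromstring(ENTRY)

# The unqualified path captures the status of every attendee.
everyone = entry.findall("gd:who/gd:attendeeStatus", NS)

# Keying on the email attribute isolates Fred, but only because we
# happen to know, out of band, that email is unique per gd:who.
by_email = entry.findall(
    "gd:who[@email='fred@example.com']/gd:attendeeStatus", NS
)

print(len(everyone), len(by_email))

# Applying the update: mark Fred as having accepted.
ACCEPTED = "http://schemas.google.com/g/2005#event.accepted"
by_email[0].set("value", ACCEPTED)
print(by_email[0].get("value"))
```

The code works, but the knowledge that email is the right key lives in the programmer's head, not in the document, which is exactly the problem for a generic client.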

I am bringing this up to highlight the questions that need to be answered.

  1. Does the patch format work only for base Atom elements?
  2. Will it work for any and all extensions?
  3. Do we assume that all extensions are order independent?
  4. Do we presume that the server can always reconstruct the element ordering?
  5. Do we require out-of-band information to use or construct the patch representation?

The gd:attendeeStatus example points out that we either need out-of-band information, i.e. that @email uniquely identifies a gd:who element, or that order is preserved. The only other option is to include "in-band" information that makes constructing patches possible, for example, adding id attributes to each element.
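As a rough sketch of the "in-band" option, suppose the server put an xml:id attribute on every element it serves. That is purely an assumption for illustration: nothing in Atom requires it, and the ids, the patch shape {"id": ..., "value": ...} and the helper `apply_id_patch` below are all hypothetical. A patch could then address an element by id, independent of ordering and without extension-specific knowledge:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical serialization in which the server labels each element.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title xml:id="e1">Atom-Powered Robots Run Amok</title>
  <updated xml:id="e2">2003-12-13T18:30:02Z</updated>
  <summary xml:id="e3">Some text.</summary>
</entry>"""


def apply_id_patch(root, patch):
    """Set the text of the element whose xml:id equals patch["id"]."""
    for element in root.iter():
        if element.get(XML_ID) == patch["id"]:
            element.text = patch["value"]
            return element
    raise KeyError(patch["id"])


root = ET.fromstring(ENTRY)

# Reverse the child order; the ids travel with their elements.
root[:] = list(root)[::-1]

patched = apply_id_patch(root, {"id": "e3", "value": "foo"})
print(patched.tag.split("}", 1)[-1], patched.text)
```

The cost is that every representation grows by one attribute per element, and the server has to keep the ids stable between the GET and the PATCH.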

Maybe you could expand a bit more on what problem exactly PATCH is trying to solve. This is still unclear to me, and probably others as well, because I seem to see different interpretations. Is it about:
1) *patching* the entire resource (as in: PUT, but trying to save bandwidth)
2) updating part of the resource/a subresource (probably best solved by giving it a separate URI, and using PUT on that, I'll try not to repeat myself and point to a comment I made on one of your previous posts)
3) batch processing, in the sense of doing multiple updates from the previous point at the same time (there seem to be different ways you could approach this: you could do the equivalent of multiple PUTs on different Atom entries as a PATCH of the collection, or multiple updates on one entry as a BATCH of PUTs on subresources etc...)

So: why exactly PATCH? What are the use cases where PATCH is better than the other solutions and vice versa? And should an Atom PATCH format try to solve all these cases, or only those things that cannot be done using other approaches?

Posted by Steven Vereecken on 2008-03-04

Steven,

Not #3.

#1 and #2 are different solutions to the same problem, which is the one I want to address: how to update a small portion of a resource without re-transmitting the entire representation back in a PUT.

Posted by Joe on 2008-03-04

Joe,

You ask "5. Do we require out-of-band information to use or construct the patch representation?"

I'd argue that patches should be usable without out-of-band information. Providing out-of-band information to optimize the patch, however, seems eminently reasonable.

As a discussion starter, I've created a theoretical patch format which should work with any ordered or unordered XML file structure.

Posted by Stephen Bounds on 2008-03-04

I don't think it's a huge burden to assume the client has additional semantic knowledge about the elements (that email is a unique key in this case). In most cases it's constructing the XML from some other source (such as a web form) anyway, and so has to have explicit or implicit schema knowledge anyway.

Posted by John on 2008-03-05

Optimizations don't have to be 100% to be useful. A set of rules can be defined for Atom base elements and attributes. Individual extensions may provide additional rules for those extensions. Generic clients may simply opt to always use PUT. Or to only support the base Atom rules and a limited set of extensions, using PUT whenever any other extension is modified. While it may not be possible in this case, more tangible use cases (actually observed and captured existing usage would be ideal) would be helpful.

Posted by Sam Ruby on 2008-03-05

Joe,

You say "#1 and #2 are different solutions to the same problem", OK, no problem with that (well, I think they are conceptually different operations, but in practice, you'd use them for the same purpose, so it probably doesn't matter). But PATCH is solution #1, and I'm not yet sure why/when this would be desirable over the other solution.
(This is also why I put in #3: if the answer is: so that you can do updates to multiple parts at the same time, this could be categorized under #3)

The advantages of #2 over #1 would seem to me:
- simpler
- no need for new http method
- no need for a PATCH format (just representation of the subresource)
- the server side can actually update that part as a separate thing (if supported, but it can indicate edit-uri's for those things), as in: the title is updated, instead of: a patch is applied, which happens to result in an entry with a different title.
Disadvantages:
- less powerful (only applicable to what the server has implemented as subresources, not ANY change to the xml)
- problematic with multiple elements belonging together, but having no container element (eg. Atom categories: you can't have an identifier on the "set of categories", because there is no grouping element. So you'd either need to change Atom, or invent a group-edit-uri or something, so you'd know where to POST new categories to)

I'm not sure this clarifies what I meant, but I hope so ;-)

PS: inspired by this, but offtopic: some kind of asynchronous patch support in the browser, for responses rather than request... That would be a nice way to do "Restful Ajax": GET the new page by applying a patch to the old page. You'd get smooth dynamic page updates AND get to keep distinct URI's (and a fallback mechanism for browsers that don't support it).
Ok, I'll stop dreaming ;-)

Posted by Steven Vereecken on 2008-03-05

I respect you both immensely but if you think the answer to patch is line by line diffs you guys have seriously cracked.

Posted by rektide on 2008-03-05

Stephen Bounds

The xpatch:keys document in your proposal looks like out of band information to me.

Posted by Joe on 2008-03-05

Steven,

- no need for a PATCH format (just representation of the subresource)

Were you not paying attention? That's "Blechy" and I'm not going there again.

Posted by Joe on 2008-03-05

Joe, well, that's the problem: I feel I've missed the point where it was decided that PATCH was the best/only possible "non-blechy" solution and why... (or maybe I just wasn't convinced myself and didn't get that everyone else was) But I'll leave it...

Posted by Steven Vereecken on 2008-03-06

2008-03-04