BitWorking

How To Do RESTful Partial Updates

Update 1: Wow, apparently using Atom as an example was a bad idea given the number of people with their knickers in a twist.

Update 2: Fixed the URI Template to accommodate Sam's nose.

Update 3: Good feedback from Tim Bray, Mark Nottingham, and Rob Sayre.

There are times when you have a large representation of a resource and only want to edit a small part of it. Wikipedia is a good example: many entries in the encyclopedia are very long, and you don't want to wade through all that wiki markup to correct a typo in one sub-section. To make this easier, MediaWiki offers an edit link on each sub-section that lets you edit just that sub-section. The same problem comes up in many other contexts, for example editing a large Atom Entry via AtomPub, or working over a slow connection, as on a mobile device.

Just like the MediaWiki example, this can be done RESTfully. We'll construct just such a mechanism for AtomPub, and by the time we're done it should be obvious how you can also do this for JSON.

So here's the goal: to be able to update and/or delete multiple sub-sections of a resource with a single request.

Starting from the MediaWiki example, the easiest thing to do is define a URI for each sub-section we want to be able to update. That gets us most of the way there, but it only allows one sub-section to be updated per request. So instead of having a single URI for each sub-section, we'll construct a URI that represents the set of sub-sections we want to update. And how shall we construct such a URI? With URI Templates, of course.

Here's a concrete example, an Atom Entry from an AtomPub Collection that lives at the URI http://example.org/edit/first-post.atom and has the following representation:

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
    <title>Atom-Powered Robots Run Amok</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <author><name>John Doe</name></author>
    <content>Some text.</content>
    <link rel="edit"
        href="http://example.org/edit/first-post.atom"/>
</entry>

Now if I wanted to update part of this entry, say the title, using the mechanisms in RFC 5023, I would change the value of the title element and PUT the whole modified entry back to the URI http://example.org/edit/first-post.atom. This document isn't large, but we'll use it to demonstrate the concepts. The first thing we want to do is add a URI Template that allows us to construct a URI to PUT changes back to:

<?xml version="1.0"?>
<entry
        xmlns="http://www.w3.org/2005/Atom"
        xmlns:t="http://blah...">
    <t:link_template rel="sub"
        href="http://example.org/edit/first-post/{-listjoin|;|id}"/>
    <title>Atom-Powered Robots Run Amok</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <author><name>John Doe</name></author>
    <content>Some text.</content>
    <link rel="edit"
        href="http://example.org/edit/first-post.atom"/>
</entry>
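A client has to find the template before it can use it. The following is a minimal Python sketch, assuming the entry is parsed with the standard library's xml.etree.ElementTree, the template carries rel="sub", and the 't' namespace is the http://blah... placeholder used above; the function names are hypothetical. When no template is present, the client falls back to the ordinary edit link and a full PUT.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Placeholder namespace copied from the example above; not a real, registered URI.
T_NS = "http://blah..."


def partial_update_template(entry_xml: str, rel: str = "sub"):
    """Return the href of the partial-update URI Template, or None if absent."""
    root = ET.fromstring(entry_xml)
    for tmpl in root.findall(f"{{{T_NS}}}link_template"):
        if tmpl.get("rel") == rel:
            return tmpl.get("href")
    return None


def edit_uri(entry_xml: str):
    """Fallback: the ordinary AtomPub edit link, used for a full-entry PUT."""
    root = ET.fromstring(entry_xml)
    for link in root.findall(f"{{{ATOM}}}link"):
        if link.get("rel") == "edit":
            return link.get("href")
    return None
```

Returning None instead of raising leaves the "partial or full PUT" decision with the caller.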

Then we need to add ids to each of the pieces of the document we wish to be able to update individually. For this we'll use the W3C xml:id specification:

<?xml version="1.0"?>
<entry
        xmlns="http://www.w3.org/2005/Atom"
        xmlns:t="http://blah...">
    <t:link_template rel="sub" href="http://example.org/edit/first-post/{-listjoin|;|id}"/>
    <title xml:id="X1">Atom-Powered Robots Run Amok</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <author xml:id="X2"><name>John Doe</name></author>
    <content xml:id="X3">Some text.</content>
    <link rel="edit"
        href="http://example.org/edit/first-post.atom"/>
</entry>
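To see which sub-sections the server is offering for partial update, a client can simply collect the xml:id attributes. A minimal sketch, again assuming ElementTree, which maps the reserved xml: prefix to the http://www.w3.org/XML/1998/namespace namespace; updatable_parts is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to this namespace automatically.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def updatable_parts(entry_xml: str) -> dict:
    """Map each xml:id in the entry to the local name of the element carrying it."""
    root = ET.fromstring(entry_xml)
    parts = {}
    for el in root.iter():                  # document order, root included
        xid = el.get(XML_ID)
        if xid is not None:
            parts[xid] = el.tag.split("}")[-1]   # drop the "{namespace}" prefix
    return parts
```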

So if I wanted to update both the content and the title, I would construct the partial update URI using the ids of the elements I want to update:

http://example.org/edit/first-post/X1;X3
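The {-listjoin|;|id} form used here comes from an early URI Template draft; the final specification, RFC 6570, uses different syntax. A minimal sketch that expands only this one operator, assuming each value is percent-encoded before joining; expand_listjoin is a hypothetical name, not a library API:

```python
import re
from urllib.parse import quote

# Matches only the draft-era "{-listjoin|SEPARATOR|VARIABLE}" form used in this post.
LISTJOIN = re.compile(r"\{-listjoin\|([^|]*)\|([A-Za-z0-9_]+)\}")


def expand_listjoin(template: str, variables: dict) -> str:
    """Replace each listjoin expression with the list's values joined by the separator."""
    def repl(match):
        separator, name = match.group(1), match.group(2)
        values = variables.get(name, [])        # undefined list expands to ""
        return separator.join(quote(str(v), safe="") for v in values)
    return LISTJOIN.sub(repl, template)
```

For example, expand_listjoin("http://example.org/edit/first-post/{-listjoin|;|id}", {"id": ["X1", "X3"]}) produces the URI above.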

And then I would PUT an entry to the URI with only those child elements:

PUT /edit/first-post/X1;X3 HTTP/1.1
Host: example.org

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
    <title xml:id="X1">False alarm on the Atom-Powered Robots thing</title>
    <content xml:id="X3">Sorry about that.</content>
</entry>
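On the server, handling that request means swapping out the children named in the URI. Here is a minimal sketch, assuming the stored entry is an ElementTree element, the ids have already been split out of the URI path on ';', and only children of the document element carry ids; get_partial and put_partial are hypothetical names. An id named in the URI but absent from the body deletes that child, and a GET returns just the named children wrapped in an entry element.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def get_partial(stored: ET.Element, ids: list) -> ET.Element:
    """GET on a partial update URI: an atom:entry holding only the named children."""
    out = ET.Element(f"{{{ATOM}}}entry")
    for child in stored:
        if child.get(XML_ID) in ids:
            out.append(copy.deepcopy(child))
    return out


def put_partial(stored: ET.Element, ids: list, body_xml: str) -> ET.Element:
    """Apply a partial PUT and return the new entry; `stored` itself is not modified."""
    missing = set(ids) - {child.get(XML_ID) for child in stored}
    if missing:
        raise KeyError(f"unknown sub-section id(s): {sorted(missing)}")
    body = {child.get(XML_ID): child for child in ET.fromstring(body_xml)}
    updated = copy.deepcopy(stored)
    for child in list(updated):             # iterate over a snapshot while editing
        xid = child.get(XML_ID)
        if xid not in ids:
            continue                        # not named in the URI: leave it alone
        position = list(updated).index(child)
        updated.remove(child)
        if xid in body:                     # present in the body: replace in place
            updated.insert(position, copy.deepcopy(body[xid]))
        # absent from the body: the removal above is the delete
    return updated
```

Validating every id and working on a deep copy means an unknown id changes nothing, which is one way to get all-or-nothing behavior when several sub-sections are updated in one request.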

Notes:

  1. We keep the entry document element, ensuring that this is at least a well-formed XML document, though probably not a valid Atom Entry.
  2. The absence of an element whose id appears in the partial update URI means that the element is to be deleted.
  3. You could also do a GET on the partial update URI to retrieve the current state of the sub-sections it identifies.
  4. If no t:link_template with rel="sub" is found, then the server doesn't support partial updates and you fall back to doing a simple PUT of the entire representation as defined in AtomPub.
  5. This puts the server firmly in control of what sub-sections of a document it is willing to handle partial updates on.
  6. The use of URI templates also puts the server firmly in control of the shape of the URI used for partial updates.
  7. I didn't use a link element for the URI Template since URI Templates are not URIs. They become valid URIs after they're filled in, but the presence of the '{' and '}' characters means that they aren't valid URIs themselves.
  8. This doesn't solve all the partial update scenarios, for example this doesn't help if you have a long sub-list that you want to append to.
  9. You'll notice that I didn't give a good URI for the 't' namespace. I know better: if I did, there'd be an implementation of this by the end of the day. One of the reasons I don't want to see that happen is that there are some open questions that need to be answered first:

Open Issues:

  1. Do you have to include the xml:id attributes when you PUT back an update?
  2. Do the xml:id attributes appear when you do a GET on such a resource?
  3. Obviously the representation of a partial update resource is not a valid Atom Entry. What should be the mime-type of that resource?
  4. There are undoubtedly XML parsers that will choke on xml:id attributes even though, according to the Namespaces in XML specification, the 'xml' prefix is reserved and always bound, so it never needs to be declared. Are these problems widespread enough to kill the use of xml:id and warrant the creation of an id attribute in another namespace?
  5. Can t:link_template elements use the same IANA Atom Link Relation Registry, do they need their own registry, or do we just hold our noses and put the URI Template in an atom:link element? Obviously the set of t:link_template relations is a superset of the atom:link relations. The same problem also exists for using URI Templates in HTML link elements.
  6. How do you handle descendants that aren't children of the document element?

That last open question needs a little more explanation. Suppose we put the id on name instead of on author:

<?xml version="1.0"?>
<entry
        xmlns="http://www.w3.org/2005/Atom"
        xmlns:t="http://blah...">
    <t:link_template rel="sub" href="http://example.org/edit/first-post/{-listjoin|;|id}"/>
    <title xml:id="X1">Atom-Powered Robots Run Amok</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <author><name xml:id="X2">John Doe</name></author>
    <content xml:id="X3">Some text.</content>
    <link rel="edit"
        href="http://example.org/edit/first-post.atom"/>
</entry>

Then what does an update look like? Notice that name is a descendant of entry, but not a child. Do we include the author element in the update? That is, do we send:

PUT /edit/first-post/X2 HTTP/1.1
Host: example.org

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
    <name xml:id="X2">John Doe</name>
</entry>

or do we send:

PUT /edit/first-post/X2 HTTP/1.1
Host: example.org

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
    <author><name xml:id="X2">John Doe</name></author>
</entry>
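One possible answer, sketched here rather than proposed: the server could ignore the shape of the body and look the id up at any depth in both the stored entry and the body, so that either request above gives the same result and siblings without an id, such as a uri element, are left untouched. A minimal sketch assuming ElementTree; find_by_id and replace_by_id are hypothetical names:

```python
import copy
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_by_id(root: ET.Element, xid: str):
    """Return the first element at any depth whose xml:id equals xid, or None."""
    for el in root.iter():
        if el.get(XML_ID) == xid:
            return el
    return None


def replace_by_id(stored: ET.Element, xid: str, body_xml: str) -> ET.Element:
    """Replace the stored element with xml:id xid, wherever it sits, by the element
    carrying the same id anywhere in the body; delete it if the body has none."""
    updated = copy.deepcopy(stored)
    target = find_by_id(updated, xid)
    if target is None or target is updated:
        raise KeyError(xid)                 # unknown id, or an id on the document element
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in updated.iter() for child in parent}
    parent = parents[target]
    position = list(parent).index(target)
    parent.remove(target)
    replacement = find_by_id(ET.fromstring(body_xml), xid)
    if replacement is not None:
        parent.insert(position, copy.deepcopy(replacement))
    return updated
```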

So there you have it, outside of six open questions, a nice RESTful way to do partial updates.

7. What happens when one of the updates succeeds but another fails? I'd be interested in the actual use-case.

Posted by Tim Bray on 2008-02-15

Tim,

What happens in straight AtomPub when you PUT a full entry back to the edit URI and part of the update fails?

Posted by Joe on 2008-02-15

Q#1: Do you have to include the xml:id attributes when you PUT back an update?

Generally I wouldn't think so. However, consider the case where you are partially updating two or more elements of the same kind (e.g. atom:link). It shouldn't matter there either if the xml:ids are not meant to be persistent. However, I can imagine cases where you might want to preserve the xml:ids, such as another extension which ties two elements together via xml:id and (e.g.) xml:idref, or simply for external references to survive partial updates.

The agent doing the partial update wouldn't necessarily be in a position to know whether other extensions rely on the persistence of xml:id or whether there are external references to those xml:ids, and so shouldn't be tasked with that choice.

Thus, to answer "Do you have to include the xml:id attributes when you PUT back an update?", my answer is YES.

On Q#6: what happens if your example had the following instead:

<author>
    <name xml:id="X2">John Doe</name>
    <uri>http://example.org/</uri>
</author>

then what happens to the atom:uri when we PUT back <author><name xml:id="X2">Jane Doe</name></author>? Does the atom:uri get deleted or does it stay? What if we PUT back <author><name xml:id="X2">Jane Doe</name><uri>http://example.org/different</uri></author>? Do we also update the author's atom:uri (despite there being no xml:id on it), or do we ignore what was PUT?

I'm leaning towards just sending back a non-hierarchical bag of elements to be updated, using the xml:ids to put them all in the right place. One thing that approach disallows, though, is moving elements around (say, swapping the atom:uri of two contributors). Not a problem, though: a move is just a shortcut for deleting and creating.

Posted by eric scheid on 2008-02-15

I doubt many of the readers of this blog have much interest or fondness for the WS-ResourceTransfer specification, but there is plenty of similarity. If anyone cares, I took a quick stab at comparing them: http://stage.vambenepe.com/archives/164

Posted by William Vambenepe on 2008-02-15

@William: interesting notes. The race condition problem you noted has a known solution, being ETags. Joe is probably taking that as assumed in the discussion above since the Atom Publishing Protocol (RFC 5023) uses that solution too.

I've got an addendum to my earlier comment (since the 5-minute timer for edits runs from the initial post, not from subsequent edits):

Eschewing hierarchy also leads to the thought that the document element doesn't have to be atom:entry either, which goes some way towards answering Q#3. Perhaps you could define your own document element (and namespace and MIME type) appropriate for partial updates. Something like <x:update>.

eek .. someone stop me before I reinvent something ghastly .. you could even have:

<x:update>
    <x:parts><!-- as above --></x:parts>
    <x:callback>http://example.com/someuri</x:callback>
    <x:some-other-non-payload-metadata>[...]</x:some-other-non-payload-metadata>
</x:update>

Posted by eric scheid on 2008-02-15

I like the idea in general. However, I am concerned that you're trying to infer the content of the surroundings. Note that this doesn't happen with MediaWiki either; you only see the sub-section you're interested in, not some collection of 'and it's located in ...' elements. In fact, the only time you'd need to view the parent is if you wanted to move the sub-section somewhere else.

What's wrong with using # on the URI? That's a way of navigating straight to a different part e.g. thing.atom#foo. I think if the content type is XML, then using an xml:id is fine; but I don't necessarily think it has to be XML specific.

I think the interesting thing is being able to update two parts of the resource at once; but is that a common case? Why not just be able to use this to update one part of the content (and its children) at a time? If you want to update two different parts, use two requests. If they need to be simultaneous, put the parent instead (or a parent containing both).

In fact, isn't the REST approach to treat the URL of thing.atom#foo as just being a resource? The fact that it might be just *part* of a larger resource isn't really relevant, in much the same way that blog/2008/ might be a larger collection of blog/2008/02/ -- but when I'm PUTting an entry, I don't have to surround the blog data with AllBlogsEver/2008Blogs/02Blog/MyBlogEntry to be able to handle it.

What you've really described is a way of exposing child data of a document at a separate URI. You can use normal REST semantics to update that child resource on its own, regardless of where it actually sits in the document.

Posted by Alex on 2008-02-15

Sorry to nitpick, but you made this typo a number of times:
<entry xmlns="http://www.w3.org/2005/Atom"
Should be:
<entry xmlns="http://www.w3.org/2005/Atom">

Posted by Noah Slater on 2008-02-15

Noah,

Fixed, thanks!

Posted by Joe on 2008-02-15

Wow, apparently using Atom as an example was a bad idea given the number of people with their knickers in a twist.
Using Atom as an example was a bad idea because it's a bad solution. Perhaps you could take some time to actually address the issues that were mentioned?

Posted by James Snell on 2008-02-15

So, in a multiple-update scenario, if one fails for some reason then they all fail? OK, sounds like the only sane policy to me. -Tim

Posted by Tim on 2008-02-15

What you are suggesting is just a way of generating URIs to identify parts of the resources that could be added, deleted or modified. What would a GET to a URI like http://example.org/edit/first-post/X1;X3 return? A partial resource? How about idempotency? It is not clear from your examples, but is this idempotent?

Posted by Subbu Allamaraju on 2008-02-15

(After some more thinking - hope your spam filter won't catch this) One key problem I see is that this model requires encoding a diff format into the URI itself. I would be concerned about extending this model as a "general" RESTful solution for patching resources.

Posted by Subbu Allamaraju on 2008-02-15

Have you considered XCAP (http://ietfreport.isoc.org/idref/rfc4825/) ?

Posted by francois leygues on 2008-02-16

I started to reply here, but it got a bit long, so I moved my response over there.

Posted by Aristotle Pagaltzis on 2008-02-17

How can you do bulk updates? Example: add a default area code to all phone numbers that don't have one.
Or, continuing along that line, wouldn't you want a more expressive update query language, similar to SQL?

Posted by Karl Waclawek on 2008-02-17

You certainly don't need to add XML IDs to the source XML in order to be able to address it, as you can do this with XPath or even byte offsets. Really the problem here is that you're trying to use PUT to perform what Roy wanted to standardise as PATCH - the two are completely distinct and shouldn't be confused. On the other hand you can do this straightforwardly and RESTfully, using the same URL to refer to the resource (no URI templates malarkey), just using POST and some custom MIME type: either application/x-patch or some kind of XML diff format.

Posted by Chris Burdess on 2008-02-18

Why not keep a single (edit) URI per subresource, and advertise that instead of the xml:id + URI templates (not that I'm against URI templates per se)?
Then use something like the BATCH + application/http approach recently proposed by James Snell to send multiple separate PUTs to separate (sub)resources.

It seems like the batch stuff (or something like it) is going to be needed anyway, so why not use that for this case too and keep the subresource-updates themselves as simple as the MediaWiki example? (it might also solve the multiple status problem)

If the parts deserve to be separate resources, then they get their own URI and you use PUT. But it seems a bit far-fetched to me that every combination of parts would need to become a resource in its own right.
If you do not want the parts to be resources themselves: there may be room for PATCH, but some of the proposed diff formats are really blurring the line between sending content (delta encoding of the representation) and sending code. If you're going to say: "execute this javascript/xquery/whatever" instead of sending a new author or title, that seems almost more like RPC than REST. (otoh, it might not be so bad to use a hybrid approach, but you might as well just use POST then)

Posted by Steven Vereecken on 2008-02-18

GET /posts/12345?$expand=authors
GET /posts/12345?$partial=title,content
PUT /posts/12345?$partial=title,content

As ADO.NET seems to be adding $expand to AtomPub, I'd follow a similar route, simply with a $partial param.

Posted by Andy on 2008-02-20

2008-02-14