In which we narrowly save Dare from inventing his own publishing protocol

Joe Gregorio

Dare Obasanjo has come up with a number of issues he has with the Atom Publishing Protocol. I am left to wonder about the timing of his complaints, as the APP is close to getting an RFC number. What spurred this sudden bout of sour grapes?

For this reason, we will likely standardize on a different RESTful protocol which I'll discuss in a later post.

Ah, so if these issues just turn out to be misunderstandings on your part then Microsoft will just use the APP and not roll out its own protocol? I'm so glad to hear that.

The first complaint is that Dare doesn't understand that an Atom Entry is a document and not an envelope. He would like to cram everything under the sun into an Entry, which isn't how the format is supposed to be used. Note that I'm saying format here, not protocol, since that is where the problem lies: Dare should have the same complaints about syndicating the data as he has about authoring it.

This actually isn't a problem in the APP: if you have something that doesn't fit into an Entry, store it under a Media Collection. You could store vCards, vCalendars, PDFs, images, videos, etc. When you create such a collection member you actually create two resources: the media itself, and an associated Atom Entry that contains metadata about the media. The nice part is that those other formats keep their own MIME types, which is important and something I stressed when talking about WADL. Also see what Bill has to say about this.
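
To make that concrete, here is a rough sketch of what creating a Media Collection member looks like from the client side, in Python with httplib2. The collection URI is made up; a real APP server advertises its own collections in its service document.

```python
# A sketch of creating a Media Collection member. The collection URI is
# hypothetical; your server's service document tells you the real one.
import httplib2

COLLECTION = "http://example.org/media/"

h = httplib2.Http()
resp, content = h.request(
    COLLECTION,
    "POST",
    body=open("cat.jpg", "rb").read(),
    headers={
        "Content-Type": "image/jpeg",  # the media keeps its own MIME type
        "Slug": "cat",                 # a hint for the URI of the new member
    },
)

# On success the server creates two resources: the Media Resource itself and
# a Media Link Entry holding metadata about it. The Location header points at
# the new Media Link Entry, and the response body is that Atom Entry.
print(resp.status)       # expect 201 Created
print(resp["location"])  # URI of the Media Link Entry
print(content)           # an Atom Entry describing the media we just posted
```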

The second complaint is one of data loss:

The problem is that there is data loss if the entry has changed between the time the client downloaded it and when it tries to PUT its changes.

Fortunately, the only real problem is that Dare seems to have only skimmed the specification. From Section 9.3:

To avoid unintentional loss of data when editing Member Entries or Media Link Entries, Atom Protocol clients SHOULD preserve all metadata that has not been intentionally modified, including unknown foreign markup as defined in Section 6 of [RFC4287].

And further, from Section 9.5:

Implementers are advised to pay attention to cache controls, and to make use of the mechanisms available in HTTP when editing Resources, in particular entity-tags as outlined in [NOTE-detect-lost-update]. Clients are not assured to receive the most recent representations of Collection Members using GET if the server is authorizing intermediaries to cache them.

Hey look, we actually reference the lost update paper that specifies how to solve this problem, right there in the spec! And Section 9.5.1 even shows an example of just such a conditional PUT failing. Who knew? And just to make this crystal clear, you can build a server compliant with the APP that accepts only conditional PUTs. I did, and it performed quite well at the last APP Interop.
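
If it helps, here is a rough sketch of that edit cycle in Python with httplib2. The member URI is hypothetical and the actual editing of the entry is elided; the point is the entity-tag riding back on the PUT in an If-Match header.

```python
# A sketch of a lost-update-safe edit cycle using httplib2. The member URI
# is hypothetical, and the editing of the entry itself is elided.
import httplib2

MEMBER = "http://example.org/blog/entry/1"

h = httplib2.Http()

# 1. GET the Member Entry and remember its entity-tag.
resp, content = h.request(MEMBER, "GET")
etag = resp["etag"]  # assuming the server sends one

updated_entry = content  # ...modify the Atom Entry here, preserving foreign markup...

# 2. PUT it back conditionally. If someone else changed the entry in the
#    meantime the entity-tags won't match, and the server answers
#    412 Precondition Failed instead of silently clobbering their edit.
resp, content = h.request(
    MEMBER,
    "PUT",
    body=updated_entry,
    headers={
        "Content-Type": "application/atom+xml;type=entry",
        "If-Match": etag,
    },
)

if resp.status == 412:
    print("Someone edited the entry first; re-GET, merge, and try again.")
```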

The last complaint is a vague one about non-hierarchy. Of course, in the middle of his explanation of the problem, Dare actually admits it really isn't a problem:

This means if you want to represent an item that has children they must be referenced via a link instead of included inline.

Again, what we have here is a complaint about the format and not the protocol, as this applies just as well to syndication as it does to authoring. And yes, that's the way Atom the syndication format, and the protocol, represent relationships between items: via links. One simple, consistent, easy-to-explain mechanism, as opposed to a hybrid approach that allows both linking and inline inclusion; even if you allowed inlining you would still need to allow linking, because no one has found the one true hierarchy to rule them all.
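
To be clear about what "via links" means for a client, here is a rough sketch in Python using httplib2 and ElementTree. The URI and the "down" rel value are hypothetical, not something the spec mandates.

```python
# A sketch of a client following parent/child relationships expressed as
# links. The URI and the "down" rel value are hypothetical.
import httplib2
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

h = httplib2.Http()
resp, content = h.request("http://example.org/items/1", "GET")
entry = ET.fromstring(content)

# The entry doesn't inline its children; it points at them with links.
for link in entry.findall(ATOM + "link"):
    if link.get("rel") == "down":  # hypothetical rel meaning "my children"
        child_resp, child_doc = h.request(link.get("href"), "GET")
        print("fetched child:", link.get("href"), child_resp.status)
```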

In summary, the three issues are not issues at all and have very simple solutions.

  1. Use Media Collections when appropriate.
  2. Read the Spec.
  3. Use links.

Maybe Dare wrote that post over several editing sessions, but:
The second problem is more serious and should be of concern to anyone who's read Editing the Web: Detecting the Lost Update Problem Using Unreserved Checkout. The problem is that there is data loss if the entry has changed between the time the client downloaded it and when it tries to PUT its changes.
Dare needs to freshen up on conditional requests, I think. And a bit of lock-free algorithms: that last sentence is a dead give-away.

Posted by Luis Bruno on 2007-06-10

Luis,

Someone using your name called Dare an 'idiot' in his comment thread. While we might disagree with Dare's analysis and actions, that kind of name calling is unproductive.

Posted by joe on 2007-06-10

I'll own up to it. I'm not sure I regret doing it, as you could delete that first sentence and still read the "insult" between the lines.

I guess it's a cultural thing. I learn fast when people accuse me of idiocy — and substantiate that accusation.

Posted by Luis Bruno on 2007-06-11

Dismissing Dare's ideas out of hand is not smart. He raised some real issues that GYM-sized sites are dealing with. The standards can either adapt to these realities, or watch themselves go largely unused (hello, XQuery).

You say linking is just a question of format, but it's much more than that. We're talking about sites that carefully study the effects of DNS lookups and JS/CSS downloads on serialization of network transfers (to achieve maximum parallel throughput to the client). Making separate HTTP requests first to get the outer container and then to get each individual item in the container (not to mention the added complexity of partial retrieval when one of those items fails to download) is a perf-killer vs. just inlining the whole hierarchy. Of course, we deal with this already today, with the HTML page being a container with references to JS and CSS files (which themselves may reference more). Additional complexity such as ETags and caching is layered on top to try to reduce the impact, and before you know it the standard is a whole lot more complex than it started out.

APP would be stronger if it dealt with paging and inlining/depth right from the start. This complexity is going to show up regardless, so why not put it in the base protocol? Others, like Astoria, are doing that. It's good to see Microsoft paying attention to and learning from the diverse efforts happening in this space. I wish I could say the same for everyone else. Some people have latched onto particular technologies with religious fervor and declared those to be The Right Way, when really there's more than one way to do it, and each way has its pros and cons.

Posted by michael on 2007-06-11

michael,

Dismissing Dare's ideas out of hand is not smart.

Dare's idea was to write a new protocol.

Of course, we deal with this already today, with the HTML page being a container with references to JS and CSS files (which themselves may reference more).

You write that as if it was a bug and not a feature. As if we all know what a failure the web has turned out to be.

Additional complexity such as ETags and caching is layered on top to try to reduce the impact, and before you know it the standard is a whole lot more complex than it started out.

Caching and ETags are part of HTTP, and APP is built on top of HTTP, and takes advantage of HTTP, as opposed to some other "protocol independent" protocols you might be used to. Again, you talk about this as if it's a bug and not a feature.
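
To give a rough sense of what building on HTTP buys you, here is a sketch using httplib2 with a local cache directory; the collection URI is made up.

```python
# A sketch of HTTP caching doing the heavy lifting for an APP client.
# The collection URI is made up.
import httplib2

h = httplib2.Http(".cache")  # responses and their ETags get stored here

# First GET: a full transfer, cached along with its entity-tag.
resp, content = h.request("http://example.org/blog/", "GET")

# Second GET: httplib2 either serves straight from its cache or revalidates
# with If-None-Match; either way, an unchanged collection costs little or
# nothing on the wire.
resp, content = h.request("http://example.org/blog/", "GET")
print(resp.fromcache, resp.status)
```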

I just have to let these next two quotes stand in opposition to each other:

APP would be stronger if it dealt with paging and inlining/depth right from the start.

Some people have latched onto particular technologies with religious fervor and declared those to be The Right Way, when really there's more than one way to do it, and each way has its pros and cons.

As for my religious fervor, you appear to have missed my comment on Dare's blog, where I agree that there may be areas where such functionality would be useful, and that APP has well-defined extension points where that could be added. The tools for writing an I-D are here, and the atom-protocol mailing list is here. I look forward to your contributions.

Posted by joe on 2007-06-11

Joe,

Could you please expand a bit on your suggestion to "Use Media Collections when appropriate." Isn't a Media Collection ultimately a collection of references to items? How would one use APP to return a collection of these objects? If I were considering using APP to store Orders, say, or anything else that doesn't map well to an Atom Entry, the suggested model is to store an Atom Entry and the actual Order as its own resource. Where would APP come into play if I wanted to GET, for example, all open Orders? If I were to use APP, even with some as-yet-unspecified search interface, in this case it would return a set of references to the Orders, which would then need to be retrieved individually - which obviously isn't scalable. Thanks

Posted by Winter on 2007-06-12

Winter,

I know from previous comments you've left elsewhere that you disagree with this design decision in the APP.

...which obviously isn't scalable.

Maybe the non-scalability is obvious to you, but you are going to have to explain it to me. And as you explain it please remember michael's advice:

Some people have latched onto particular technologies with religious fervor and declared those to be The Right Way, when really there's more than one way to do it, and each way has its pros and cons.

And as I've said before, here in this very thread for example, there may be areas where such functionality would be useful, and that the APP has well-defined extension points where that could be added. The links for the tools for writing an I-D and the atom-protocol mailing list are in that previous comment. I look forward to your contributions.

Posted by joe on 2007-06-12

Joe,

Thanks for answering the question. First of all, this was a poor choice of words:

...which obviously isn't scalable.

Scalability has many dimensions and there is nothing inherently "unscalable" about APP.

What I was trying to understand is how or if APP can ameliorate what I see as a problem of working with large Media Collections - or if APP avoids this issue entirely through some usage pattern. If each entry involves another round trip to get the actual resource state, the larger the collection becomes, the more performance degrades.

In the APP implementations I've worked on, we've gone with the GData approach of mapping the entity to an Atom entry, but I've never been particularly happy about the fact that the media type is application/atom+xml. I was surprised to see the Media Collection model as the preferred approach in some cases and was wondering if I was missing something.

It would be nicer if there were a standard way to route on the entry content at the header level without having to crack open the document body.

Posted by Winter on 2007-06-13

Winter,

I was writing a response that kept getting longer and longer, so I turned it into its own blog post.

Posted by joe on 2007-06-13
