Distributed Extensibility

Joe Gregorio

XML namespaces are designed to provide distributed extensibility using URIs.

Distributed extensibility means that multiple people, or organizations, can extend an XML format without any communication between them, and if they follow the rules they will avoid syntactic collisions. That statement shouldn't be controversial; it's a statement of fact, the definition of distributed extensibility.
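To make the "no syntactic collisions" part concrete, here is a minimal sketch using Python's standard library. The two extension namespace URIs and the `rating` element are made up for illustration; only the Atom namespace is real. Two parties who never talked to each other both add an element called `rating`, and a namespace-aware parser still tells them apart.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Hypothetical extension namespaces, invented for this example.
ALICE = "http://example.com/ns/alice"
BOB = "http://example.org/ns/bob"

DOC = f"""<entry xmlns="{ATOM}" xmlns:a="{ALICE}" xmlns:b="{BOB}">
  <title>Hello</title>
  <a:rating>5</a:rating>
  <b:rating>PG-13</b:rating>
</entry>"""

entry = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace-uri}localname",
# so the two "rating" elements end up with different tags.
alice_rating = entry.find(f"{{{ALICE}}}rating").text
bob_rating = entry.find(f"{{{BOB}}}rating").text
tags = [child.tag for child in entry]

print(alice_rating, bob_rating)
print(tags)
```

Note that this only shows the syntactic guarantee. Nothing here tells you what either `rating` means, or whether anyone ever reviewed it.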

The problem is that everyone seems to accept the underlying assumption that distributed extensibility is a desirable property.

It isn't.

It seems like such a great idea: let a thousand flowers bloom, and all that. And maybe, in a utopian world, where unicorns fart butterflies, it might all work out, but back here in the real world, it just falls apart in a bunch of different ways.

The biggest problem with distributed extensibility is that it's a solution to a problem that doesn't exist. The idea that you need a system where anyone, anywhere, can add new extension elements to an XML document without consulting anyone, at any time, is just not a use case. Let's look at Atom as an example; the feedvalidator currently only knows about 80 extension namespaces. You don't need infinite extensibility to track 80 things. You need a napkin.

But let's put aside the mechanics. What are the social downsides of distributed extensibility?

The first downside is that with distributed extensibility you don't have to have anyone review the change you are proposing. There's a reason why internally at Google we require that all code get a code review before being checked in; it's important to get another set of eyes on what you're doing. The second downside is that there isn't a single place to look for where extensions are defined, which of course presumes they are even documented to begin with.

So what are the alternatives? Fortunately there are several examples of non-distributed extensibility.

Link relations in Atom are tracked in a central repository. Getting your link relation registered is a matter of sending an email.

Another example is HTML5 link relations, where you just need to edit a wiki page.

Notice that both of the above systems address the social failings of distributed extensibility by keeping a central repository and providing for reviews of proposed extensions.

If I started today to build a new format I would build it on JSON and not XML, but that's not terribly relevant to the discussion. If I started today to build a format and was forced into using XML, then I would not use namespaces as the extensibility mechanism. Instead I would pick a single namespace and require extensions to go through a registry-type system, like what was done for Atom or HTML5 link relations. If you wanted to add an extension then you'd have to propose it, either by email or by editing a wiki page, and then it would be added to the registry of all extensions to the format. All extensions would then live in the one namespace for the format. I would also pick a URI for the sole namespace that looked as little like an HTTP URI as possible, maybe use a urn:, or do like WebDAV did and just make up a four-character string: "DAV:"
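Here is a rough sketch of what the single-namespace-plus-registry approach could look like to a consumer of the format. Everything named here is hypothetical: the `urn:example:format` namespace, the core element names, the registry entries, and the `unregistered_elements` helper. In practice the registry would be the published list maintained by email or on a wiki, not a dict in code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sole namespace for the format; deliberately not an HTTP URI.
FORMAT_NS = "urn:example:format"

# Hypothetical core vocabulary defined by the format itself.
CORE = {"entry", "title"}

# Hypothetical registry: extension element name -> where it was reviewed.
REGISTRY = {
    "rating": "registered by email to the list",
    "geo": "registered on the wiki",
}


def unregistered_elements(xml_text):
    """Return the tags of elements that are neither core nor registered."""
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree tags look like "{namespace}local"; split them apart.
        ns, _, local = elem.tag.rpartition("}")
        ns = ns.lstrip("{")
        if ns != FORMAT_NS:
            # Anything outside the one namespace is not part of the format.
            problems.append(elem.tag)
        elif local not in CORE and local not in REGISTRY:
            problems.append(elem.tag)
    return problems


GOOD = f'<entry xmlns="{FORMAT_NS}"><title>Hi</title><rating>5</rating></entry>'
BAD = f'<entry xmlns="{FORMAT_NS}"><title>Hi</title><mood>grumpy</mood></entry>'

print(unregistered_elements(GOOD))
print(unregistered_elements(BAD))
```

The point of the sketch is that the check is a lookup in one known list. An unknown element is not a new flower blooming; it is a prompt to go propose the extension and get it reviewed.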

And now the caveats. First, this isn't a post about HTML5. I know in the past there was a kerfuffle about distributed extensibility in HTML5, but I stayed out of it, like I do with almost all discussion of HTML5. Also, this post is just about the concept of distributed extensibility; I have a whole other tirade about the mechanics of using URIs as the extension mechanism, which will be for another day.
