Two Identifiers

Joe Gregorio

Every Echo entry needs two identifiers, which we'll call, for lack of better names, 'post-id' and 'perma-link'. They need to be separate, and they need to be required.

There is still a pretty heavy debate going on in the wiki and in Sam's blog about perma-link versus post-id. I was initially in favor of a single URI that operated as both a perma-link and a unique id. I have since changed my mind, and here I outline the compelling reasons for my change of heart. Note also that this discussion is in the context of Echo as a syndication format. Echo will also be used as a publishing and possibly a commenting format, and the required-ness of these identifiers may be different in those contexts.

Before we start justifying we need some definitions:

perma-link
A URI that points to the post on the web. That needs some clarification. First, URI is a big concept and subsumes many other things: all URLs are URIs, which means links of the form http:, ftp:, mailto:, and freenet: are all URIs, and URNs are URIs too. Second, the perma-link should point to the story, not the source. For example, if you write a weblog entry about a story in the NYTimes, the perma-link needs to point to that entry on your weblog and not to the story in the NYTimes. The perma-link should be resolvable (an http: URI, for example); it may be non-resolvable, but that is strongly discouraged.
post-id
An identifier that uniquely identifies the post on the web. Again, that needs some clarification. If you write a weblog entry about a story in the NYTimes and post it to your weblog under two categories, the post-id will be the same regardless of which category it is published in. Also, the post-id is unique among all the Echo entries ever published, by anyone on the web, for all time. Once an item is published, its post-id never changes. If you edit your entry, the post-id does not change. If you re-categorize your post, it does not change. Unique across space and time. What if you want to include a link to the source material? That is the job of another Echo tag, possibly in an optional Echo module, that allows for citing multiple sources.
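To make the "never changes" rule concrete, here is a minimal Python sketch, assuming a CMS that mints the post-id once, at creation time, as a `urn:uuid:` URN. The `Entry` class and its fields are hypothetical; the only point is that editing or re-categorizing touches the content and never the identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical CMS record; the field names are illustrative only."""
    title: str
    body: str
    categories: list = field(default_factory=list)
    # Minted exactly once, when the entry is created; never regenerated.
    post_id: str = field(default_factory=lambda: uuid.uuid4().urn)

    def edit(self, new_body: str) -> None:
        # Changing the content does not change the identity.
        self.body = new_body

    def recategorize(self, categories) -> None:
        # Neither does moving the entry between categories.
        self.categories = list(categories)


entry = Entry("Story about the NYTimes", "first draft", ["science"])
minted = entry.post_id
entry.edit("second draft")
entry.recategorize(["science", "technology"])
assert entry.post_id == minted
```

A UUID is just one way to get "unique across space and time" without coordination; any scheme works as long as the value is assigned once and stored with the entry.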

A required perma-link

The perma-link should be required. This is a syndication format, and the perma-link points back to the thing you are really interested in. The only excuse for not being able to supply a perma-link is that the resource you are describing is not on the web. That's a pretty thin excuse, but for those extremely rare cases, you can stuff a URN or some other non-resolvable URI in this field. But really, if you can generate an XML Echo file that lives on the web that describes your resource, do you really have any excuse for not providing an HTML view of that same data?
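A feed validator could enforce this policy roughly as follows. This is a sketch under an assumption: which schemes count as "resolvable" is a local choice (the post only names http:), and the `check_perma_link` function is hypothetical, not part of any Echo draft.

```python
from urllib.parse import urlsplit

# Assumption: the set of "resolvable" schemes is a local policy choice;
# the post itself only gives http: as the resolvable example.
RESOLVABLE_SCHEMES = {"http", "https", "ftp"}


def check_perma_link(uri: str) -> str:
    """Return 'ok' for a resolvable perma-link and 'discouraged' for any other
    absolute URI (such as a URN); raise ValueError if it is not an absolute URI."""
    scheme = urlsplit(uri).scheme.lower()
    if not scheme:
        raise ValueError(f"perma-link must be an absolute URI: {uri!r}")
    return "ok" if scheme in RESOLVABLE_SCHEMES else "discouraged"


print(check_perma_link("http://example.org/2003/06/26/two-ids"))
print(check_perma_link("urn:isbn:0451450523"))
```

Note that a URN passes, but only with a warning: it is allowed, just strongly discouraged.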

A required post-id

Now that you have a required perma-link, do you really need a post-id? This is where I need to show three things.

  1. While a perma-link is a URI, it may not uniquely identify a weblog entry.
  2. A method to uniquely identify a weblog entry is necessary.
  3. Post-id must be required.

1. Perma-links aren't unique. The first point is easy to see if you consider categories. For example, I subscribe to the NYTimes RSS feeds, both the science and the technology feeds. There is overlap: some stories appear in both the science and the technology feeds, which means they show up twice in my aggregator. Similarly, MT users can turn on multiple archiving methods, which means that the same story can have multiple URIs. For each archiving method the story is the same but sits in a different context: it can sit in a weekly archive, a monthly archive, or in multiple category archives.

But if they are the same story, won't they have the same perma-link? No, the perma-link may point back to the story in its context. For example, if you are subscribed to an Echo feed that contains just posts from a certain category, the perma-link could bring you to a page that contains just posts from that category, and that's what you want to happen. So the same story could have multiple perma-links, and those perma-links may show up in different Echo feeds.
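Here is a minimal sketch of what an aggregator can do once every entry carries a post-id: collapse the overlapping science and technology feeds into one list of stories while keeping each context's perma-link. The `FeedEntry` shape and the `urn:example:` ids are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedEntry:
    """One entry as an aggregator sees it in a single feed (hypothetical shape)."""
    post_id: str
    perma_link: str


def merge_feeds(*feeds):
    """Collapse entries that share a post-id, keeping every distinct perma-link.

    Returns a dict mapping post_id -> list of perma-links, in first-seen order.
    """
    merged = {}
    for feed in feeds:
        for entry in feed:
            links = merged.setdefault(entry.post_id, [])
            if entry.perma_link not in links:
                links.append(entry.perma_link)
    return merged


science = [FeedEntry("urn:example:story-42", "http://example.org/science/#story-42")]
technology = [
    FeedEntry("urn:example:story-42", "http://example.org/technology/#story-42"),
    FeedEntry("urn:example:story-43", "http://example.org/technology/#story-43"),
]
stories = merge_feeds(science, technology)
# Two stories, not three; story-42 keeps both of its contextual perma-links.
```

Keying on the perma-link instead would report three stories, which is exactly the duplicate-in-my-aggregator problem described above.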

2. Uniqueness is required. Which brings us to the second question: do you really need a unique identifier? Yes, because it allows aggregator builders to track posts and lets the end user control whether they see the same item when it appears in multiple contexts. It will also allow aggregators to implement new functionality more easily and consistently. For example, with a guaranteed unique id I can track changes to an entry, possibly highlighting differences between versions. I can also do threading more easily and consistently if each entry has a unique id, and I can group Echo entries that are all about the same thing.
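Change tracking is a good example of what a stable id buys you. The sketch below assumes a hypothetical aggregator store, `EntryHistory`, that keeps every version it has fetched under the entry's post-id and uses Python's standard `difflib` to show what changed between fetches.

```python
import difflib


class EntryHistory:
    """Hypothetical aggregator store: every version seen, keyed by post-id."""

    def __init__(self):
        self._versions = {}

    def observe(self, post_id, content):
        """Record a fetched version and return a unified diff (list of lines)
        against the previous one; the list is empty for a new or unchanged entry."""
        versions = self._versions.setdefault(post_id, [])
        if versions and versions[-1] == content:
            return []
        diff = []
        if versions:
            diff = list(difflib.unified_diff(
                versions[-1].splitlines(), content.splitlines(),
                fromfile="previous", tofile="current", lineterm=""))
        versions.append(content)
        return diff

    def version_count(self, post_id):
        return len(self._versions.get(post_id, []))


history = EntryHistory()
history.observe("urn:example:story-42", "Echo needs\none identifier.")
changes = history.observe("urn:example:story-42", "Echo needs\ntwo identifiers.")
print("\n".join(changes))
```

Without a post-id that survives edits, the second fetch would look like a brand-new entry and there would be nothing to diff against.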

On the CMS vendor side, some need a unique id to track items, and the post-id, particularly in the form of a URN, gives them a place to store that information in an easy-to-parse format.

3. Benefits of being required. For the third point, a required element gives a couple of advantages. It makes the specification for Echo easier to write. There are two elements. They are both required. End of story. You don't have to worry about precedence or disambiguation, and it makes for a really simple case for simpler CMSs: just make the two elements the same. Since you are already generating a required perma-link, spitting out the same value in a different element is not a big hurdle to implementation.
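For a simple CMS with one URL per post, the "redundant printf" amounts to something like the sketch below. The element names are placeholders taken from the wiki's concept names (final tag names were still undecided), and `build_entry` is a hypothetical helper built on Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def build_entry(title, perma_link, post_id=None):
    """Emit both required identifiers. Element names are placeholders:
    'perma-link' and 'post-id' were concept names, not final tag names."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "perma-link").text = perma_link
    # A simple CMS with one URL per post just repeats the perma-link here.
    ET.SubElement(entry, "post-id").text = post_id if post_id is not None else perma_link
    return entry


xml = ET.tostring(build_entry("Two Identifiers", "http://example.org/2003/06/26/two-ids"),
                  encoding="unicode")
print(xml)
```

A CMS that does track its own ids passes `post_id` explicitly; nothing else about the output changes.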

Also, if post-id and perma-link are both required, this helps support a "view-source" paradigm. If you see both tags in every Echo feed, then you can be sure you'll include them in your own feed. If they're optional, you might never see a feed with both and consequently miss out on the advantages.

Summary

Echo needs both a required perma-link and a required post-id. Since both are URIs, if the posts from your CMS only have one URL then just set post-id = perma-link. Sure it's a little redundant, but it's easy to implement. If you have content that isn't on the web, then use a URN for perma-link, but think long and hard about justifying what should be an extremely rare situation. Both supply potentially unique information, with the perma-link preserving the context of the weblog entry while the post-id is the same regardless of the context.

I can't help but think that it would be interesting if Echo entries could reference unique ids of other entries with some relationship to them.

For example, if I were to write an entry about Joe's entry, I could say "Joe's entry has the relationship 'linked to' from this entry."

If my software has that entry on record, it could then provide me with some link between the two.

However, this assumes my software sees both entries. Being able to reference the Echo version of an entry from an Echo entry would be interesting, but I suspect unimplementable since it would require everyone to keep Echo entries for all of their entries perpetually.

Posted by Martin Atkins on 2003-06-26

I strongly feel that the redundancy should be handled by the aggregator rather than the CMS. I do not currently use a CMS on my weblog so I have a vested interest in keeping the Echo specification as simple as possible.

You say yourself that all that needs to happen if there is no defined post-id is to set its value equal to the permalink. Why then force a requirement onto producers of the Echo feed? If someone is posting across multiple categories then they should include the post-id; you could even specify "must". However, if a person is not posting across multiple categories, why require them to include redundant information when that information can easily be generated by the aggregator?

Posted by Ben Meadowcroft on 2003-06-26
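For comparison, the aggregator-side fallback Ben proposes would look roughly like this minimal sketch. It reuses the wiki's placeholder element names, and `effective_post_id` is a hypothetical helper. It shows the trade-off Joe's reply points at: every consumer now has to implement the same precedence rule.

```python
import xml.etree.ElementTree as ET


def effective_post_id(entry):
    """Use post-id when present and non-empty, otherwise fall back to the perma-link.
    Element names are the wiki's placeholder concept names, not final tags."""
    post_id = (entry.findtext("post-id") or "").strip()
    if post_id:
        return post_id
    perma_link = (entry.findtext("perma-link") or "").strip()
    if not perma_link:
        raise ValueError("entry has neither a post-id nor a perma-link")
    return perma_link


with_id = ET.fromstring(
    "<entry><perma-link>http://example.org/a</perma-link>"
    "<post-id>urn:example:a</post-id></entry>")
without_id = ET.fromstring("<entry><perma-link>http://example.org/b</perma-link></entry>")
print(effective_post_id(with_id), effective_post_id(without_id))
```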

Ben,
  "Why then force a requirement onto producers of the echo feed?"

1. Because it makes the spec that much simpler. There are two elements. They're both required. End of story. No precedence, no disambiguation, and a really simple case for simple CMSs: make them the same.

2. It is not an onerous hurdle. Really, we're talking about just one possibly redundant printf in your code. That's it. This will not melt the internet.

3. It supports the "view-source" paradigm. If you see both tags in every Echo feed then you can be sure you'll include them in your own feed. If they're optional you might not see a feed with both in there and miss out on the advantages.

Thanks for the questions, they have helped and I'm going to fold the above answers back into my post.

Posted by Joe on 2003-06-26

Joe, did you get a chance to review the following page? Does it affect the arguments you make?

http://www.intertwingly.net/wiki/pie/EntryIdentifier

Update: In particular, the definition of echo-identifier, which is comparable to permalink, but I believe addresses the issues of multiple references. One could adopt the definition of echo-identifier and call it "permalink" (which is part of the point I make). For multiple references, there can be "also-at" and "mirrored-at". Make sure to catch "Definitions -- Part Deux", which adapts the definitions to the existing terms "permalink" and "publisher-id" (or post-id).

Posted by Ken MacLeod on 2003-06-26

Although I can see the merit of perma-link and post-id, if you're trying for simplicity then duplicating redundant data seems like the least simple and sensible thing to do. If I look at my existing RSS 2.0 feed, the permalink URI is about 15% of the total item data (135 of 864 bytes). Adding a 15% overhead to my feed just to satisfy your notion of simplicity is hardly going to be a good thing for the internet.

Also, the term 'post-id' is way too blog-centric for something that is touted as some sort of open standard.  Or don't you want someone to re-use your constructs?

The fact that people have to discuss and rationalise the existence of a tag in the way that perma-link and post-id have been over the last few days ought to tell you that you've not yet identified the 'simple' solution.

Posted by Dave Meehan on 2003-06-27

Dave,
  Sorry, but one more tag in an Echo feed will not melt the internet.

  As for names, 'post-id' and 'perma-link' are the current "concept" names off the wiki. Final tag names are still to be decided.

And they aren't my constructs, this is all a community effort on the wiki.

"The fact that people have to discuss and rationalise the existence of a tag in the way that perma-link and post-id have been over the last few days ought to tell you that you've not yet identified the 'simple' solution"

The fact that people are openly discussing and recording the merits of various alternatives is how consensus is reached. The only other alternative is specification via benevolent dictator; we've tried that, it failed, and that's why we're here today.

Posted by Joe on 2003-06-27

I don't know if I agree one way or another about the use of two identifiers. However, in the case of perma-link, wouldn't a URI that changes based on assigned "categories" be going against the RESTful "opaque URIs" practice? There is no good reason for a perma-link to have categorization embedded in it, but I can think of at least one reason not to encourage this practice: the discussion of whether we should have a single identifier or not.

Posted by Seairth on 2003-06-30

Seairth,
  A single resource can have multiple URIs, and that really doesn't affect opaqueness. Opaque has a very narrow meaning: when you break a URI into its component parts (scheme, authority, path, query and fragment), the client doesn't peek into the wrong parts to determine how to display the resource representation or how to retrieve it. For example, if I serve up a file 'fred.svg', the client shouldn't use the 'path' to determine the mime-type. That is, it shouldn't look at the 'svg' extension and try to display it as an SVG file; the client needs to look at the content-type returned in the HTTP headers.

Posted by joe on 2003-06-30
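The 'fred.svg' example can be sketched in a few lines of Python. The renderer names are hypothetical. Python's `urllib.request` exposes response headers as an `email.message.Message` subclass, so the same `get_content_type()` call works on a live response; here a `Message` is built by hand so the sketch runs offline.

```python
from email.message import Message

# Hypothetical renderer names, keyed by media type.
RENDERERS = {
    "image/svg+xml": "svg-view",
    "text/html": "html-view",
}


def choose_renderer(url, headers):
    """Pick a renderer from the Content-Type header only.

    `url` is accepted but deliberately never inspected: a '.svg' ending in the
    path says nothing about what the server actually returned.
    """
    return RENDERERS.get(headers.get_content_type(), "save-to-disk")


headers = Message()
headers["Content-Type"] = "text/html; charset=utf-8"
renderer = choose_renderer("http://example.org/fred.svg", headers)
print(renderer)
```

Here fred.svg is rendered as HTML because that is what the server said it sent, which is the opaque-URI behavior described above.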

When a web page (or weblog) is written in correct HTML and proper use is made of an identifier for each news entry in the weblog (<div id="a659">...), then one could use at least three links in the RSS file.

As a child of the CHANNEL element, the LINK element with a URI to the web page (weblog) itself (http://www.mydomain/blog/blogindex.htm).

As a child of the ITEM element, the LINK element with a URI to the blog entry as it lives on the web page, using the ID of the entry's container DIV or table (http://www.mydomain/blog/blogindex.htm#a659).
As a child of the ITEM element, the GUID element with a URI to the blog entry's archived version (http://www.mydomain/blog/archives/20030703.htm#a659).
(Note that when the weblog is refreshed and the RSS file reflects that refresh, neither a LINK nor a GUID for the a659 blog entry is available unless the user saved the older RSS feed. And while the entry is still on the weblog's page, one can question the value of having both a link to the weblog entry and a link to the archived entry.)

If the RSS file accumulates blog entries over a longer time than the entries' lifetime on the weblog's page, then a guid (permalink) is useful.

Cybarber

Posted by cybarber on 2003-07-02
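Cybarber's item-level scheme can be read back with a short sketch like the one below. The `item_locations` helper and the item title are hypothetical; the URLs are the ones from the comment. In RSS 2.0 the guid's isPermaLink attribute defaults to "true" when absent, and only then is the guid usable as a link.

```python
import xml.etree.ElementTree as ET

ITEM = """<item>
  <title>Entry a659</title>
  <link>http://www.mydomain/blog/blogindex.htm#a659</link>
  <guid isPermaLink="true">http://www.mydomain/blog/archives/20030703.htm#a659</guid>
</item>"""


def item_locations(item_xml):
    """Return the entry's location on the live page and its archived permalink, if any."""
    item = ET.fromstring(item_xml)
    guid = item.find("guid")
    archived = None
    # In RSS 2.0, isPermaLink defaults to "true" when the attribute is absent.
    if guid is not None and guid.get("isPermaLink", "true") == "true":
        archived = (guid.text or "").strip() or None
    return {"live": item.findtext("link"), "archived": archived}


locations = item_locations(ITEM)
print(locations)
```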

It's surprising to me that there is even any debate about the need for a PostID or an ItemID that is separate from a permalink. Or that making them both required is a good thing. One is the address and one is the identifier. It is good hygiene to keep these things separate.

It most definitely keeps things more simple despite the protests of people who want to have the permalink be the postid in the absence of a separate postid. That's not simple. That's a pain in the ass!

A post id is just like a message id in a mail message or a news posting.

Where I think the debate should happen is over the format of the ID. It strikes me as inevitable that some bright bulb will decide that an Echo ID should be akin to an XML ID and thus won't be able to start with a number. Gah! Numbers make creating truly meaningless identifiers very easy, and we want the identifiers to be meaningless, as that provides maximum flexibility.

See this

http://home.att.net/~Flexibility/Identifiers/Part1.htm

for some info on persistent unique identifiers.

Posted by Chris Dent on 2003-07-07
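Chris's worry is easy to demonstrate. An XML ID value must be an XML Name, and an xml:id value must be an NCName (which also excludes colons); neither may start with a digit. The sketch below uses a hypothetical `usable_as_xml_id` check and a deliberately simplified, ASCII-only approximation of the NCName rule (the real production admits many non-ASCII letters too). Bare numbers fail, and so do URN-style ids.

```python
import re

# ASCII-only approximation of the XML NCName production; the real rule
# also admits many non-ASCII letters. An xml:id value must be an NCName.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def usable_as_xml_id(identifier):
    return _NCNAME_ASCII.match(identifier) is not None


for candidate in ["559", "a559", "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66"]:
    print(candidate, usable_as_xml_id(candidate))
```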
