Ocean boiling in the age of microformats

Joe Gregorio

Have I mentioned that syndicating microformats is hot? That it's important? That it's possibly one of the most important things in syndication? Ever?

Well, I have now.

"Avoiding plain XML and presentational markup" from Tantek Çelik is a collection of links and observations. The takeaway for me is that, given the choice between enhancing an already semantically rich format like XHTML to carry the data you want and creating new elements in new namespaces, you should choose the former.

The marketing message of XML has been for people to develop their own tags to express whatever they wanted, rather than being stuck with the limited predefined tag set in HTML. This approach has often been labeled "plain XML" or "generic XML" or "SGML, but easier, better, and designed just for the Web."

The problem with this approach is that while having the freedom to make up all your own tags and attributes sounds like a huge improvement over the (mostly perceived) limits of HTML, making up your own XML has numerous problems, both for the author, and for users / readers, especially when sharing with others (e.g. anything you publish on the Web) is important.

Of course, if you are syndicating microformats then that means that someone had to publish that information, so we also get a nice intersection of the Atom Publishing Protocol and microformats. I've outlined how microformats should interact with the APP on the [atom-protocol] mailing list.

If you have any doubt how important it is to get this right in the APP then go listen to Adam Bosworth's "Database Requirements in the Age of Scalable Services" [via lesscode.org].

To quote Sam Ruby:

My theory is that most of the interesting metadata is in the content.

but what about the namespace problem?

sure, profiles can be written and referenced, but what if a page uses two profiles which share a common symbol?

Posted by eric scheid on 2005-07-27

I'm hoping the last chapter of my book coming out next book will be well received.  In it, I throw together some Python scripts to convert from iCalendar to hCalendar to syndication feed to mod_event-enriched syndication feed... and back to iCalendar again.

Posted by l.m.orchard on 2005-07-27

(next book?  er next month, I mean.  :) )

Posted by l.m.orchard on 2005-07-27

Eric,
  Here is the xpath selection I use to pick out the microformat in my Secure Syndication script:

  //div[contains('encrypted blowfish', @class)]//div[contains('encdata', @class)]

  A lot more than a single class name would have to collide for this to fail.

Posted by Joe on 2005-07-28

l.m.orchard,
  That's great! I'm looking forward to your book.

Posted by Joe on 2005-07-28

"A lot more than a single class name would have to collide for this to fail." - ok in this case, but is everyone else taking such care? (I personally don't think it's likely to be a problem in practice).

I don't disagree on your (and Sam's) general point, but would suggest care when trying to generalise beyond syndication of content+metadata.

Sometimes -

<p class='age'>30</p>

is appropriate, but sometimes -

<x:age>30</x:age>

and sometimes even -

age = 30;

(I'm looking forward to the book too!)

Posted by Danny on 2005-07-28

Joe's XPath says:

//div[contains('encrypted blowfish', @class)]//div[contains('encdata', @class)]

But that will have a lot of false negatives: if the class is "blowfish encrypted", or "encrypted<newline>blowfish", or "encrypted<space><space>blowfish".  (Actually, if I read the HTML spec correctly, the newline should be simply ignored, and can therefore occur in the middle of an attribute name.)  Likewise, there are false positives; things like "unencrypted blowfish" should not match.

The XPath to match this correctly is more complex.  I think the correct "contains" clause looks something like [contains(concat(" ", translate(@class, " \r\n\t", "  "), " "), " encrypted ") and contains(concat(" ", translate(@class, " \r\n\t", "  "), " "), " blowfish ")].  Except that "\r", "\n", and "\t" have to be literal carriage return, newline, and tab characters, as far as I can tell.

Perhaps XPath is not the ideal solution to this problem.
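
A minimal sketch of the same selection done by splitting the class attribute into tokens instead of substring matching, in Python. It is not the Secure Syndication script itself; the names has_classes, find_encdata, and SAMPLE are hypothetical, and it assumes the page is well-formed XHTML so that the standard library's xml.etree.ElementTree can parse it.

```python
import xml.etree.ElementTree as ET


def has_classes(element, *wanted):
    """Return True if the element's class attribute holds every wanted token."""
    # str.split() with no argument splits on any run of spaces, tabs, or
    # newlines, so token order and extra whitespace do not matter.
    tokens = element.get("class", "").split()
    return all(name in tokens for name in wanted)


def find_encdata(root):
    """Collect div.encdata elements nested in a div.encrypted.blowfish."""
    found = []
    for outer in root.iter("div"):
        if not has_classes(outer, "encrypted", "blowfish"):
            continue
        for inner in outer.iter("div"):  # iter() also yields outer itself
            if inner is outer or not has_classes(inner, "encdata"):
                continue
            if not any(inner is seen for seen in found):  # skip duplicates
                found.append(inner)
    return found


# Hypothetical test page: reordered classes, a newline inside the attribute,
# doubled spaces, and an "unencrypted" near-miss that must not match.
SAMPLE = """<html><body>
<div class="blowfish
   encrypted"><div class="encdata">AAAA</div></div>
<div class="unencrypted blowfish"><div class="encdata">BBBB</div></div>
<div class="encrypted  blowfish extra"><p><div class="note encdata">CCCC</div></p></div>
</body></html>"""

print([div.text for div in find_encdata(ET.fromstring(SAMPLE))])
# ['AAAA', 'CCCC']
```

Whole-token comparison avoids both failure modes above: "blowfish encrypted" still matches, and "unencrypted blowfish" does not.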

Posted by Kragen Sitaker on 2005-07-28

Joe, great post.  What you're saying, what Sam Ruby is saying, and what others who are tired of waiting for complex solutions are saying all make sense to a lot of folks.

Eric, in response to your questions:

The namespace problem?  The short answer is that it's nothing more than academic chicken-littling.  The internet has worked just fine (at many levels, with many applications, specifications) without namespaces.  Namespaces try to solve a 1% problem while burdening the 99%, which is always bad economics.

What if a page uses two profiles?  This is clearly explained in the XMDP spec.  See http://gmpg.org/xmdp/description

Thanks,

Tantek
http://tantek.com/log/

Posted by Tantek on 2005-07-28

Tantek,
  Thanks! And thanks for summarizing so well the problem with namespaces.

Posted by Joe on 2005-07-29

I accept Tantek's point in the context of microformat documents. Naming clashes are highly improbable within an individual document. But I'm sorry, the Web does depend on namespaces, in the form of URIs.

(Having said that, the string "Tantek" will take you to his site, using the Google protocol ;-)

Posted by Danny on 2005-07-29

Namespaces exist implicitly for humans. When I read a weblog, or hear how someone uses words, I usually know after a while that when Tantek says something, it means this and that, with all these consequences (this is the tantek namespace). When Danny says another thing using the same words, the person, the previous words, etc. define a context for the expression.

It's the basis of all social communication, nothing academic in that :) This is another attempt to demonize something that is not evil, but completely natural and everyday.

A very good example of that is "friend" in English and "friend" (ami) in French: it doesn't have the same meaning at all. What most native English speakers call a friend is what the French call "someone you know" or an "acquaintance".

Namespaces on the Web are nothing more than that: they help you disambiguate things and give a bit more context. It's not rocket science.

It's just acknowledging diversity and helping to accept differences. Nothing academic again, just the plain reality of social relationships. :)

Posted by karl on 2005-08-01
