Bitworking - theories of software development

by Joe Gregorio

The Well-Formed Web

2002-10-01T23:46:12-04:00:00

Over a month ago Paul Ford published a great essay entitled "How Google beat Amazon and Ebay to the Semantic Web". After reading it the first time I thought it was a great introduction to the Semantic Web, an idea I had been trying to wrap my head around ever since encountering RDF as it is baked into RSS 1.0. I had seen the light and bought into the promise of the Semantic Web.

Time passes...

When Dave Winer floated the idea of RSS 2.0, discussions ensued about the RDF in RSS 1.0. After spending some time badgering poor Bill Kearney for a concrete benefit of having RDF in RSS 1.0, and not getting a really satisfactory answer, I went back and read Paul Ford's essay again. I wanted to get that old religious feeling back again. It didn't work. The magic was gone.

Jump back to a month ago: Mark Pilgrim and I were having a discussion about news aggregators accepting non-well-formed XML. Well-formed is a strictly defined term in the XML specification. It is a series of constraints a file must pass before it can be considered XML. Failing any one of the constraints means the file is not an XML file. It is the minimum threshold for XML, and an important measure, because it means that the XML file can be loaded into any number of XML tools or libraries and manipulated programmatically.
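
To make that threshold concrete, here is a minimal sketch in Python that asks the standard library's XML parser whether a string is well-formed. The function name is_well_formed and the two sample strings are made up for illustration; this is one way an aggregator might run the check, not a description of how any particular aggregator works.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser stops at the first well-formedness error it meets.
        return False


# Hypothetical samples: the second one never closes its <title> element.
good = "<item><title>Hello</title></item>"
bad = "<item><title>Hello</item>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that this only tests well-formedness. It says nothing about whether the document follows any particular format such as RSS.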

Now don't get me wrong, any text file can be manipulated programmatically: just load the file as a string and do search and replace using regular expressions. The advantage of XML is that it imposes a structure that you can navigate using XML tools. And if there are many files of the same format, for example RSS files, then it becomes easy to process a great many of these files at one time and extract useful information. That is just the kind of processing done by news aggregators. That's the idea of what I call the "Well-Formed Web": instead of a web of ill-formed and difficult to decipher HTML pages, the Well-Formed Web is all those HTML pages backed up with well-formed XML documents in well-known formats.
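
As a rough illustration of that kind of batch processing, the sketch below walks a handful of RSS documents and pulls out every item title, skipping any document that is not well-formed. The feeds are invented samples in the simple rss/channel/item layout, and the function name all_item_titles is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample feeds; the middle one is deliberately not well-formed.
FEED_A = """<rss version="0.92">
  <channel>
    <title>Example feed A</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
  </channel>
</rss>"""

BROKEN_FEED = "<rss><channel><item><title>Oops</channel></rss>"

FEED_B = """<rss version="0.92">
  <channel>
    <title>Example feed B</title>
    <item><title>Third post</title></item>
  </channel>
</rss>"""


def all_item_titles(feeds):
    """Collect item titles from every well-formed feed, in order."""
    titles = []
    for text in feeds:
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            continue  # not well-formed, so no XML tool can navigate it
        for title in root.findall("./channel/item/title"):
            titles.append(title.text or "")
    return titles


print(all_item_titles([FEED_A, BROKEN_FEED, FEED_B]))
```

The same dozen lines work on two feeds or two thousand, which is the whole appeal: the structure is already there, so nothing has to be guessed with regular expressions.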

It's the simple power of well-formed XML documents and the ability to easily process them that took the sheen off the Semantic Web for me. Go back and read Paul Ford's essay again, but this time every time you see "RDF" substitute it with "XML" and every time he mentions "Semantic Web" replace it with "Well-Formed Web".

Go read it. I'll wait...

So what would really be needed to make Paul's vision come true? Google already indexes XML documents. So we need an XML format for selling items. For example, the following file could be posted to my web site as http://bitworking.org/forsale.xml:

<forsale>
    <item id="guitar1">
        <minimumBid currency="dollars">300</minimumBid>
        <description>Guitar, Electric. Barely used.</description>
        <image>http://...jpg</image>
        <biddingEnds>2002-09-29T22:49:10-04:00:00</biddingEnds>
    </item>
    ...
    <item id="amp1">
    </item>
</forsale>
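
Reading such a file takes very little code. The following Python sketch parses a copy of the example above (with the elided "..." line left out) and lists what is for sale. The function name list_items and the dictionary keys are my own choices for illustration, and the sketch assumes the minimum bid is always a whole number.

```python
import xml.etree.ElementTree as ET

# A copy of the 'forsale' example, minus the elided "..." line.
FORSALE = """<forsale>
    <item id="guitar1">
        <minimumBid currency="dollars">300</minimumBid>
        <description>Guitar, Electric. Barely used.</description>
        <image>http://...jpg</image>
        <biddingEnds>2002-09-29T22:49:10-04:00:00</biddingEnds>
    </item>
    <item id="amp1">
    </item>
</forsale>"""


def list_items(forsale_text):
    """Return one dictionary per <item>, tolerating missing child elements."""
    root = ET.fromstring(forsale_text)
    items = []
    for item in root.findall("item"):
        min_bid = item.find("minimumBid")
        items.append({
            "id": item.get("id"),
            # Assumption: the minimum bid is an integer amount.
            "minimum_bid": int(min_bid.text) if min_bid is not None else None,
            "currency": min_bid.get("currency") if min_bid is not None else None,
            "description": item.findtext("description"),
        })
    return items


for entry in list_items(FORSALE):
    print(entry)
```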

A bid could be recorded in a format like the following, posted on the bidder's web site at http://iwantit.....org/bids.xml:

<bid>
    <bidder>
        <email>joe@bitw..</email>
    </bidder>
    <item>
        <reference>http://bitworking.org/forsale.xml#guitar1</reference>
        <offered currency="dollars">350</offered>
    </item>
</bid>
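
The interesting part of the bid is the reference: the URL names the 'forsale' document and the fragment after the "#" names the item's id. A small Python sketch, with the hypothetical function name read_bid, shows how software might pull those apart. It assumes the offered amount is a whole number and that every expected element is present.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# A copy of the 'bid' example, with the truncated email left as it is.
BID = """<bid>
    <bidder>
        <email>joe@bitw..</email>
    </bidder>
    <item>
        <reference>http://bitworking.org/forsale.xml#guitar1</reference>
        <offered currency="dollars">350</offered>
    </item>
</bid>"""


def read_bid(bid_text):
    """Split a bid into the document it targets, the item id, and the offer."""
    root = ET.fromstring(bid_text)
    offered = root.find("item/offered")
    # urldefrag separates "http://...forsale.xml" from the "guitar1" fragment.
    forsale_url, item_id = urldefrag(root.findtext("item/reference"))
    return {
        "forsale_url": forsale_url,
        "item_id": item_id,
        "amount": int(offered.text),  # assumption: integer amounts
        "currency": offered.get("currency"),
    }


print(read_bid(BID))
```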

Now the bidder could have found the 'forsale' file by searching Google, but how is he going to notify the seller that he's posted a bid for it? By using referer logs. That is, the application that creates and posts these XML files (you didn't expect to do this by hand, did you?) can also request the 'forsale' file from the seller's site, and when it does, it fills in the referer information with a URL that points back to the bidder's 'bid' file. Now the seller's software can do what Mark Pilgrim's Automatic linkback software does, collecting referer log entries every hour and updating the 'forsale' document to list the highest bids:

<forsale>
    <item id="guitar1">
        <minimumBid currency="dollars">300</minimumBid>
        <description>Guitar, Electric. Barely used.</description>
        <image>http://...jpg</image>
        <biddingEnds>2002-09-29T22:49:10-04:00:00</biddingEnds>
        <highestBid>http://iwantit.....org/bids.xml</highestBid>
    </item>
    ...
    <item id="amp1">
    </item>
</forsale>
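
Both halves of that exchange can be sketched in Python. The first function prepares, but does not send, the bidder's request with the Referer header pointing at the bid file. The second is a guess at what the seller's software might do once it has fetched the bid files named in its referer log: pick the largest offer at or above the minimum for each item and record that bid's URL. The function names are hypothetical, the sketch assumes integer amounts, and it ignores currency entirely. It is not a description of how Mark Pilgrim's linkback software works.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag


def build_forsale_request(forsale_url, bid_url):
    """Bidder side: a GET for the 'forsale' file that names the bid as referer."""
    return urllib.request.Request(forsale_url, headers={"Referer": bid_url})


def record_highest_bids(forsale_text, forsale_url, bids):
    """Seller side: bids maps each bid-file URL to the text fetched from it."""
    root = ET.fromstring(forsale_text)
    for item in root.findall("item"):
        min_bid = item.find("minimumBid")
        if min_bid is None:
            continue  # nothing to compare offers against
        # Start one below the minimum so an offer equal to it still qualifies.
        best_url, best_amount = None, int(min_bid.text) - 1
        for bid_url, bid_text in bids.items():
            try:
                bid = ET.fromstring(bid_text)
            except ET.ParseError:
                continue  # a bid that is not well-formed is simply ignored
            doc_url, item_id = urldefrag(bid.findtext("item/reference", ""))
            amount = int(bid.findtext("item/offered", "0"))
            if (doc_url == forsale_url and item_id == item.get("id")
                    and amount > best_amount):
                best_url, best_amount = bid_url, amount
        if best_url is not None:
            highest = item.find("highestBid")
            if highest is None:
                highest = ET.SubElement(item, "highestBid")
            highest.text = best_url
    return ET.tostring(root, encoding="unicode")
```

Passing the request object to urllib.request.urlopen would actually fetch the file and leave the bid's URL in the seller's referer log. Ties go to whichever bid the software happens to see first, which a real implementation would want to settle more carefully.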

Sure, not everybody has a web site to post 'bid's or 'forsale's, so web-hosting services can do "micro-hosting". For $5 a year they'll give you one of these selling/bidding apps to run from home and a minuscule amount of file space to post your files. The bigger and more frequently updated sites get searched by Google more frequently, and a whole new service category springs to life.

Distributed eBay. Now that's a web service. Very RESTian. No RDF. Hmmm, anybody got some angel money for me to implement this...:)

This is just one example of the possibilities of the Well-Formed Web. It can be built today with current tools and with no need for RDF, 3-tuples or ontologies.