BitWorking

This is Joe Gregorio's writings (archives), projects and status updates.

Meaning, Semantics and RDF

Many things in life are cyclical, and one of them is the recurring debate about RDF and the Semantic Web. I don't think the Semantic Web will ever work. Am I trying to discourage people from working on it? No. Keep working away; who knows, I could be wrong and something worthwhile may come of it, but I doubt it. This essay is about highlighting that doubt.

First, the XML serialization of RDF is cryptic. Tim Bray makes very good points in his two posts about the difficulty of working with the XML serialization of RDF. It is also a point I make in XHTML+XForms+XLink=Xanadu. In that essay I was talking about XHTML versus HTML, but the argument applies just as well to XML versus RDF: RDF in no way moves you up a level of abstraction.

But let's ignore the XML serialization, assume it gets fixed to something perfectly legible and amenable to "view-source". What of it? Well, I have also previously made the point that many of the promises that the RDF proponents make are realizable today with plain XML and we don't need to wait for RDF. This is the crux of the Well-Formed Web, both the essay and the web site, where I am demonstrating with working applications the practicality of the approach. But let's ignore that for now.

Finally, for the sake of argument, I'll also ignore that meanings change over time, people are stupid, people are lazy, people lie, and that there's more than one way to describe something.

Let's focus on the root of RDF, from Tim Berners-Lee, its raison d'être:

The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web. Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine processable form.

This document gives a road map - a sequence for the incremental introduction of technology to take us, step by step, from the Web of today to a Web in which machine reasoning will be ubiquitous and devastatingly powerful.

These are the basics of the Semantic Web, and for me it's very telling that TBL raises, then discards, artificial intelligence. It is telling because what he is talking about here is meaning. What does it mean? The whole Semantic Web initiative is based around the idea that you can boil human thought down into a machine-digestible format. I can see the allure. For example, search is hard. That is, if I do a search on Google for 'adagio', am I looking for the dictionary definition, the company, or the dog? The search engine is really searching for a web page that best matches what I mean when I say adagio. Now here's the crux: for me, meaning is what happens when data reaches an intelligence. Consider the word "adagio". What is it? Well, right now it's either a pattern of pixels on a screen or, if you printed this out, a pattern of ink on paper. Your eye detects the shapes, your brain processes the words, and your brain assigns them meaning. Nothing more. If this essay is loaded up by a computer, it is nothing more than a string of bytes to be processed. Machines can do some pattern matching, indexing, referencing, and various other calculations, but meaning only comes from an intelligent entity.

Yes, I said entity instead of person. First, there's a good chance that there are primates and other animals that may be capable of higher thought and reasoning, and they are capable of assigning meaning to symbols. Second, unlike Tim Berners-Lee, I'm not so quick to dismiss the AI aspect, because I do believe that someday we will have artificial intelligences, and they will have to navigate not just the formalized halls of an RDF world, but will have to interact with the messy, fuzzy, contradiction-filled world the rest of us live in. And believe me, there's no RDF out here.

For those who were wondering, "Adagio" is the name of my dog. It is also the name of a famous piece of music by Samuel Barber, a jazz club in Savannah, and a classical trio on mp3.com.

Posted by Mark on 2003-06-04

And Adagio is also a (quite good) online tea shop... :)

Posted by Ryan on 2003-06-04

Ken, "RDF places an emphasis on reusing the same terms where they already mean the same things." But the point is that *meaning* is subjective. Let's look at an example along the lines of the one you already chose: . Now is that a perma-link to the item, or a link to the story that the item is about? It's a trick question, because both meanings are in use today, in RSS 1.0, which is RDF. So why isn't the magic of RDF fixing this ambiguity of meaning in RSS 1.0?

And in response to your blog entry, I didn't set up an AI straw man here. The argument was about an intelligence, any intelligent being, and how that was the only measure of *meaning*.

"Later, we can talk about how a consistent, regular data structure has benefits compared to unstructured XML." Sorry Ken, but this is a 'line' I hear repeatedly from RDF advocates, and it is extremely irritating. Irritating for the following reasons.

1. It drips with condescension.

2. XML is unstructured? Compared to what? All the XML files I have ever used have a nice structure. Here's a challenge: RSS 2.0 is just plain old XML, barely above a plain text file by your definition, and then there's RSS 1.0, which is RDF. Please give me one concrete, real-world, can-put-my-hands-on-it-today benefit of RSS 1.0 over RSS 2.0. What are the benefits of its "consistent, regular data structure"? If you do come up with one benefit, make sure it's not one that I can already get with a database full of RSS 2.0 files that understands XQuery.

3. XML is about markup, not data structures. This isn't all a programming exercise. There are real humans out there typing this stuff in all day long. To quote Tim Bray, "It's the Syntax, Stupid". XML is not an Info-Set, it's a syntax, used for data interchange.

4. And finally, if your definition of 'unstructured' XML is mixed content, then consider this: there are currently 3 billion pages on the World-Wide Web. All of it could be passed through Tidy and made into XHTML, but all of it would still be 'mixed content' XML. Guess what. Nobody is going to translate those 3 billion pages into RDF. Nobody. Mixed content is messy, and the majority of the web is messy mixed content because the real world is messy, and can't be stuffed into triples.

Posted by Joe on 2003-06-05

Haystack appears to be the most complete RDF user application I've seen to date. In particular, it uses the RDF data model as its core, unlike Chandler which appears to be leaning towards RDF only as an externalization. It is a great simplification knowing that a dc:title in an email message is the same as a dc:title in a web page is the same as a dc:title in a calendar item. In contrast to somehow knowing that a is a title like <methodcall><params><param><struct><name>title</name><value>Some Title</value> is a title like <jabber>...<reply><subject>Some Subject</subject> is a title. XML Schema provides some basic data typing for individual XML datatypes, but it has no support for saying that a title in one schema is like (or is the same as) a title in another schema. Surely, one could build a layer alongside XML that provided that support. Or implement import/export facilities that perform the semantic mapping (as Haystack, like most other RDF applications, does with existing non-RDF formats). Currently, there is no emphasis on reuse of element names and attributes in "the well formed web" (separate from embedding and extending, which is reuse of XML fragments). RDF places an emphasis on reusing the same terms where they *already* mean the same things. <del>Later,</del> As a corollary, maybe in a different thread we can talk about how a consistent, regular data structure has benefits compared to unstructured XML. Haystack: http://haystack.lcs.mit.edu/</reply></jabber></struct></param></params></methodcall>

Posted by Ken MacLeod on 2003-06-06

Ken wrote, "RDF places an emphasis on reusing the same terms where they already mean the same things." Joe replied, "But the point is that meaning is subjective." The use of terms in RDF is intended to be far more objective than subjective. Whether it can succeed at doing so is a separate issue, and one that is covered well in Metacrap, as you linked.

Are the usages of RSS titles, HTML titles, Jabber subjects, and email subjects consistent with each other? Ignoring RDF for just this question, wouldn't it benefit XML processing if there were a way to indicate, outside of coding this information in every application, that those usages were consistent? Say, for example, by adding a notation in an XML Schema that said, "this element, in this schema, is a 'title', as defined by the Dublin Core Metadata Initiative." Then, with all those items dragged to a Folder within an XQueryable database of XML, one could say, "display all the values that are 'DC titles' in that folder".

I'm not a rabid supporter of RDF, just recognizing many of the good ideas among the bad. I know what its warts are, and it's got some bigger ones than some other technologies. I wouldn't at all mind discussing how to take its _good_ features, like the one above, and figuring out how to make them work elsewhere. Trying to overstate that feature, and somehow indicate that it is magic or divinable only by intelligent thought, and then dismiss the technique, and RDF because of it, as nothing but doomed to failure appears to be a straw man to me. The destructiveness of the straw man is that we cannot then even begin to discuss using the feature elsewhere. (reply on structured data to follow)

Posted by Ken MacLeod on 2003-06-06

Regarding consistent, regular data structure compared to unstructured XML, I'll start with the points I agree most strongly with:

"4. [...] if your definition of 'unstructured' XML is mixed content [...]" No, I mixed content. I've been using both presentational and descriptive markup in one form or another for my whole career. Norm Walsh's site (which rocks, btw) uses DocBook for each blog entry, and his knowing how to map the markup to extract metainformation is astounding in its thoroughness. Whether or how something like that can be done on a broader scale has huge untapped potential. Where it is today is best described by a Google search I did the other day: searching on two terms, the top two hits were because the terms were found in a "headlines box" that contained the words "earlier in the raw HTML" than the articles themselves linked through those headline boxes.

"3. XML is about markup, not data structures." I agree completely, yet the pattern of XML elements and attributes usually falls into one of three categories that I've termed "custom", "automatic", or "fixed": "custom" is where the (little-s) schema of the XML is specific to the information being interchanged (be it HTML, DocBook, or RSS 2), "automatic" is where the schema follows a common or standardized pattern, but uses element and attribute names drawn from the information from different applications using the schema (SOAP and RDF), and "fixed" is where the schema is explicit, and only the values differ among applications (WDDX and XML-RPC).

"2. XML is unstructured? Compared to what?" By "consistent, regular structure" I'm referring to either "automatic" or "fixed". By "unstructured" I'm referring to "custom", or unique-per-application schemas. There are three primary benefits to automatic and fixed representations: a) the information can be unmarshalled directly into internal representations without knowledge of the application that will ultimately use it. b) the internal representation is consistent and regular throughout also, so traversal across the information is the same regardless of where you are in the context (think of rows in a database, objects in a language, dictionaries and lists, or nodes and properties). Translations can and often do occur between fixed, automatic, and custom representations, both at the syntax level using tools like XSLT and in their internal representations using tools like DOM and SAX.

"If you do come up with one benefit, make sure it's not one that I can already do with a database full of RSS 2.0 files that understands XQuery." The example I gave in my previous message is a good one: given a "folder" (where a folder can be a list of URLs, a filesystem directory, a Python sequence of DOMs, or a select group of instances in an XML db) containing an RSS 2.0 item, a Jabber instant message, and an email message, what is the XQuery to retrieve the title-like value of each? Note that the question doesn't have anything to do with RDF. The same thing could be done using XSLT to import each schema into a "common" schema, where the XQuery would be simply "//my_folder/./dc:title". Thus, even at the syntax level, disregarding any "internal" representation, c) by using an agreed-upon common representation _with_ common terms, applications become much simpler. In current practice, by using a common representation and common terms, translations (like XSLT) into and out of the common representation can also be shared between applications, rather than being uniquely implemented by each application.

"1. It drips with condescension." My apologies, I literally meant "let's pick this up later". I've updated the post accordingly.
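The "folder" question above can be made concrete with a rough sketch. This is only an illustration under stated assumptions: the three sample documents, the TITLE_EXTRACTORS table, and the function names are all hypothetical, and the XML-RPC sample follows the simplified name/value shape used earlier in this thread rather than a full XML-RPC payload. It shows the per-format knowledge that has to live somewhere (in each application, or in a shared mapping) before a single "give me the title" query can work.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents standing in for items dragged into a "folder".
RSS_ITEM = "<item><title>Some Title</title><link>http://example.org/1</link></item>"
JABBER_MSG = "<message><subject>Some Subject</subject><body>hi</body></message>"
XMLRPC_STRUCT = ("<struct><member><name>title</name>"
                 "<value>Another Title</value></member></struct>")


def xmlrpc_title(root):
    """Find the member whose <name> is 'title' and return its <value> text."""
    for member in root.iter("member"):
        if member.findtext("name") == "title":
            return member.findtext("value")
    return None


# Shared mapping table (an assumption of this sketch): root tag -> extractor.
TITLE_EXTRACTORS = {
    "item": lambda root: root.findtext("title"),
    "message": lambda root: root.findtext("subject"),
    "struct": xmlrpc_title,
}


def title_of(xml_text):
    """Return the title-like value of a document, or None if the format is unknown."""
    root = ET.fromstring(xml_text)
    extractor = TITLE_EXTRACTORS.get(root.tag)
    return extractor(root) if extractor else None


folder = [RSS_ITEM, JABBER_MSG, XMLRPC_STRUCT]
titles = [title_of(doc) for doc in folder]
print(titles)
```

The design point is that the mapping table is data, so it could be shared between applications instead of being re-coded in each one; whether that table is written in Python, XSLT, or a schema annotation is a separate choice.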

Posted by Ken MacLeod on 2003-06-06

Joe, Your blog has interesting content, and it is formatted nicely in paragraphs and lists when I read it in a browser (IE). But when I read it in my RSS aggregator (BottomFeeder), each item is just one long string of text -- and much too much work to read. I looked at your RSS file to see if maybe my aggregator was at fault, but no, each item in the feed is just one long string. I know there is an ongoing debate about the use of HTML tags in item descriptions, but there are ways to do so. IMHO, RSS item formatting is a must! Regards...Rich Demers

Posted by Rich Demers on 2003-06-06

Rich, Are you seeing this problem on my main feed or on the comments feed?

Posted by Joe on 2003-06-06

Joe, The main feed, for sure. Not sure about comments, though my last comment had paragraphs separated by a blank line -- which have disappeared in the main feed. I am only subscribed to your main feed. Rich

Posted by Rich Demers on 2003-06-07

Speaking of comment feeds, Joe, could you please retitle your comment feed so that it won't have the same name as the main feed? I haven't got around to adding the "override original title" feature to Aggie yet...

Posted by Ziv Caspi on 2003-06-07

Paul Prescod (not known for being a fan of RDF) recently said something on an xml-dev thread on "document vs. data-oriented XML" that resonates with me on this topic:
If one could go back in time, one could approach the problem from scratch with the needs of document and data heads equally represented. It would not just be useful to combine them so we could reuse tools. It would be useful to combine them because most documents have a data-oriented subset (if only the "metadata" element at the top) and many data applications have a document-oriented subset (if only rich text fields). Another reason to combine them is that there is no clear boundary. There is a spectrum.
That's the fascination I find in Norm Walsh's site: he sees the spectrum, and coded it.

Posted by Ken MacLeod on 2003-06-07

Ken, First, sorry for not responding earlier. I have been ill and didn't want to respond with my body full of possibly mind-numbing chemicals. "The use of terms in RDF is intended to be far more objective than subjective. Whether it can succeed at doing so is a separate issue, and one that is covered well in Metacrap, as you linked." Ah, but that was the point of the whole essay, that I don't think it can be successful at being objective, because language and meaning are *so* subjective. "Surely, one could build a layer alongside XML that provided that support. Or implement import/export facilities that perform the semantic mapping (as Haystack, like most other RDF applications, does with existing non-RDF formats)." And this is a good thing, because there are huge amounts of information out there that will never be translated into RDF. This is one of the annoyances of the serialization of RDF, which I really didn't want to talk about, but RDF takes the wrong posture. That is, RDF adopts a sovereign posture versus a transient posture. I shouldn't have to re-design my XML formats to conform to an RDF world view. What would benefit RDF greatly is a transformation language that allowed you to specify all the triples in a given XML (non-RDF) format. An idea that I have been pushing for a while.
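To give a feel for what such a transformation language might look like, here is a minimal sketch, not an existing tool. The rule format, the RULES table, the extract_triples function, and the sample feed are all assumptions made for illustration; only the Dublin Core namespace URIs are real. Each rule names a path to the nodes of interest, the child element that supplies the subject, a predicate URI, and the child element that supplies the object. Using the item's link as the subject is itself an assumption, and exactly the kind of choice about meaning that the rest of this thread argues over.

```python
import xml.etree.ElementTree as ET

DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_DATE = "http://purl.org/dc/elements/1.1/date"

# Hypothetical plain RSS 2.0 feed; nothing about it is RDF.
RSS = """<rss version="2.0"><channel>
<item><link>http://example.org/a</link><title>First</title>
<pubDate>Tue, 03 Jun 2003 09:39:21 GMT</pubDate></item>
<item><link>http://example.org/b</link><title>Second</title></item>
</channel></rss>"""

# Hypothetical rule format:
# (path to nodes, child giving the subject, predicate URI, child giving the object)
RULES = [
    ("./channel/item", "link", DC_TITLE, "title"),
    ("./channel/item", "link", DC_DATE, "pubDate"),
]


def extract_triples(xml_text, rules):
    """Apply each rule to the document and collect (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    triples = []
    for node_path, subject_child, predicate, object_child in rules:
        for node in root.findall(node_path):
            subject = node.findtext(subject_child)
            obj = node.findtext(object_child)
            # Skip nodes that lack either side of the statement.
            if subject is not None and obj is not None:
                triples.append((subject, predicate, obj))
    return triples


triples = extract_triples(RSS, RULES)
for triple in triples:
    print(triple)
```

The XML format stays untouched and the rules live beside it, which is the "transient posture" argued for above: the triples are derived from the format rather than the format being redesigned around them.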

Posted by Joe on 2003-06-10

Rich and Ziv,

I have updated Bulu so that the comment feed now has a title and the comments should be displayed with all their formatting in place. Unfortunately, this will only work for comments going forward, it isn’t retroactive for old comments. While I was at it I also added full Textile support to the comments.

Posted by joe on 2003-06-11

This is a test of the _comment_ escaping <object>i</object>.

Posted by joe on 2003-06-22

2003-06-03