Reconsidering Search (Kinda) in the AtomAPI.

Joe Gregorio

Of all the facets of the AtomAPI, the search mechanism has caused the most angst. Indeed, there is a whole page on the wiki dedicated to discussing just this facet. The search mechanism has gone through many changes, and at this point I would like to re-introduce the search mechanism from the RESTLog API.

Actually, "search mechanism" is the wrong name for it: it is really a structured mechanism for browsing the archives of a site. In the case of the RESTLog Archive interface, the form and function of the browsing is completely under the control of the server.

Let's start by looking at the RESTLog Archive format:


<archives xmlns="http://www.purl.org/RESTLog/archives/1.0">
  <res href="http://wfw.org/news/5">RESTLog Interface</res>
  <res href="http://wfw.org/news/4">One step at a time</res>
  <res href="http://wfw.org/news/3">What's the point?</res>
  <res href="http://wfw.org/news/2">RESTLog Overview</res>
  <res href="http://wfw.org/news/1">Welcome to the Well-Formed Web</res>
</archives>

This is a very simple example of an archive file. It is just a list of res elements, each with an href attribute that points to an Entry and text content that is displayed to the user to help them choose an Entry.
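As a rough illustration of how little a client needs to read this simple form, here is a minimal Python sketch using only the standard library. The function name list_entries and the shortened sample document are assumptions made for the example; only the namespace and the res/href structure come from the format shown above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example archive above.
NS = {"a": "http://www.purl.org/RESTLog/archives/1.0"}

# Shortened copy of the simple archive, for illustration only.
SIMPLE_ARCHIVE = """\
<archives xmlns="http://www.purl.org/RESTLog/archives/1.0">
  <res href="http://wfw.org/news/5">RESTLog Interface</res>
  <res href="http://wfw.org/news/4">One step at a time</res>
</archives>
"""


def list_entries(archive_xml):
    """Return (label, href) pairs for the top-level res elements."""
    root = ET.fromstring(archive_xml)
    return [(res.text or "", res.get("href")) for res in root.findall("a:res", NS)]


for label, href in list_entries(SIMPLE_ARCHIVE):
    print(f"{label} -> {href}")
```

The label is whatever text the server chose to put in the res element, so the client simply displays it as-is.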

Here is a more complicated example:


<archives xmlns="http://www.purl.org/RESTLog/archives/1.0">
   <group title="Last Ten Stories">
      <res href="http://wfw.org/news/100">My Most Recent Post</res>
      <res href="http://wfw.org/news/99">My Next Most Recent Post</res>
      .
      .
      .
      <res href="http://wfw.org/news/91">Some Post In The Recent Past</res>
   </group>
   <more href="http://wfw.org/news/moreViews">All Items</more>
</archives>

Note that this example introduces the group element, which allows multiple resources to be grouped together. Multiple groups can be used, and they can also be nested.

In addition, this example introduces the more element, which points to another file in archive format. In this way the client can navigate around a set of archive files without having to retrieve the whole list at one time.

Let's go back and consider the group element. Think about what this would look like if you wanted to present it to a user. Because groups can nest, you would use a tree control, with folders for each of the group elements and files for each of the res elements. If you keep that analogy, then the more element is also just a folder, but one that doesn't get retrieved or displayed until the user clicks on it.
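The tree-control analogy can be sketched in a few lines of Python. This is only one possible rendering, not part of the RESTLog format: the "[-]" and "[+]" markers for open and closed folders, the helper names, and the shortened sample (the elided entries are left out) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the grouped archive; the elided entries are omitted.
GROUPED_ARCHIVE = """\
<archives xmlns="http://www.purl.org/RESTLog/archives/1.0">
   <group title="Last Ten Stories">
      <res href="http://wfw.org/news/100">My Most Recent Post</res>
      <res href="http://wfw.org/news/91">Some Post In The Recent Past</res>
   </group>
   <more href="http://wfw.org/news/moreViews">All Items</more>
</archives>
"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return element.tag.split("}", 1)[-1]


def render(element, depth=0):
    """Return indented lines: group = open folder, res = file, more = closed folder."""
    lines = []
    indent = "  " * depth
    for child in element:
        kind = local_name(child)
        if kind == "group":
            lines.append(f"{indent}[-] {child.get('title')}")
            lines.extend(render(child, depth + 1))  # groups may nest
        elif kind == "res":
            lines.append(f"{indent}{child.text}")
        elif kind == "more":
            # Not fetched here: a client would GET child.get('href') on click.
            lines.append(f"{indent}[+] {child.text}")
    return lines


tree_lines = render(ET.fromstring(GROUPED_ARCHIVE))
print("\n".join(tree_lines))
```

Nothing is retrieved for the more element until the user asks for it, which is what keeps the first response small.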

So how would this integrate into the AtomAPI? The Introspection file facet for searching, currently search-entries, would be renamed to the more appropriate browse-entries, and doing a GET on that URI would retrieve the first archive-formatted file. More GETs might follow as the client follows more links in that archive file, which lead to still further archive files.
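To make the sequence of GETs concrete, here is a small Python sketch of a client that opens one archive file at a time. Everything outside the archive format itself is hypothetical: the example.org URIs, the in-memory FAKE_SERVER dictionary that stands in for HTTP, and the function names. A real client would replace fake_get with an actual HTTP GET against the browse-entries URI found in the Introspection file.

```python
import xml.etree.ElementTree as ET

NS = {"a": "http://www.purl.org/RESTLog/archives/1.0"}

# Stand-in for the network: URI -> archive document (all made up for illustration).
FAKE_SERVER = {
    "http://example.org/browse-entries": """\
<archives xmlns="http://www.purl.org/RESTLog/archives/1.0">
  <res href="http://example.org/news/2">Second post</res>
  <more href="http://example.org/archives/older">Older entries</more>
</archives>
""",
    "http://example.org/archives/older": """\
<archives xmlns="http://www.purl.org/RESTLog/archives/1.0">
  <res href="http://example.org/news/1">First post</res>
</archives>
""",
}


def fake_get(uri):
    """Pretend GET; a real client would issue an HTTP request here."""
    return FAKE_SERVER[uri]


def open_archive(uri, get=fake_get):
    """Fetch ONE archive file; return its entries and its 'more' links, unfollowed."""
    root = ET.fromstring(get(uri))
    entries = [(r.text, r.get("href")) for r in root.findall(".//a:res", NS)]
    more = [(m.text, m.get("href")) for m in root.findall(".//a:more", NS)]
    return entries, more


# First GET: the browse-entries facet.
first_entries, first_more = open_archive("http://example.org/browse-entries")
# Second GET happens only when the user picks a 'more' link.
older_entries, older_more = open_archive(first_more[0][1])
print(first_entries, first_more)
print(older_entries, older_more)
```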

The advantage of this approach is that it puts the entire browsing experience in the hands of the server. The server could present very simple archive files, or it could present a rich and varied archive format. The server could present multiple views into the archive, with one or more groups presenting a hierarchy by date and another group presenting a hierarchy by subject, by poster, or by content-type. It really doesn't matter, as the server has complete control. In addition, the text of the res element isn't restricted: the server could put the post title there, or any other reasonable information the user might find useful.

This also puts the server in firm control of the amount of bandwidth the browsing interface uses. There is no "atom-all" list of all the feeds in the archive unless the server decides to produce it.

Admittedly, this does raise the bar for client developers, as they have to implement a more complicated interface, but the payback is a much richer browsing experience. This also lets server developers compete by producing different browsing strategies, all while using the same format and mechanisms.

So how would I use this to ask the server for the list of entries that contain the phrase "foo bar", or the list of entries that match various criteria?

Posted by Eric Scheid on 2003-10-17

The one problem I see is that software doesn't understand "navigation" until programmed to do so.  When a client starts or is done spidering all the archives, where does it begin to present it?

Posted by Ken MacLeod on 2003-10-17

Ken,
  The client displays one 'archive' at a time. It doesn't retrieve the next 'archive' until a user action, such as clicking on a folder or a 'more' element.

Posted by Joe on 2003-10-17

Joe - this is pretty close to what I have been trying to find a way to express, but so far have not been successful.  Perhaps it is time for me to start prototyping, but in any case, what I was thinking of is closer to what you see at:

http://www.python.org/doc/current/lib/module-re.html

Note the bar across the top with left, up, and right arrows, as well as special purpose links.  Of course the body of this page has a number of implicit "down" arrows.

If we could make this information machine readable, every Atom file would potentially be a directory and an introspection file.

Posted by Sam Ruby on 2003-10-17

Eric,
  As to the 'foo bar' search, that wouldn't be covered by this interface. Actually that wouldn't be covered by the current search interface either. I actually don't think a single blogging API has this functionality. Not to say that it isn't a good idea, just that if it's done it should be done under a different facet, say 'text-search'.
  As for other criteria, this format leaves it up to the server to present the data in useful categorizations. They are limited only by the developers' imagination.

Posted by Joe on 2003-10-17

Sam,
  The current archive format could handle that kind of navigation: the server could serve up an 'archive' file with 'previous', 'next' and 'up' pointers held in 'more' elements.

The only difference in navigation I see is that when following those links you would want the whole screen replaced with the contents of that archive and not just have the folder expanded. To accommodate that kind of navigation, let me propose the following change to the 'archive' format:

  The 'more' element can have one of two attributes, "href" or "src". If the attribute is "href", then the user agent should replace the current view with the 'archive' at the given URI. If the attribute is "src", then the 'archive' at the URI is to be expanded in place in the current archive view.
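As a sketch of how a client might dispatch on this proposal, the Python below maps each 'more' element to an action. The action names "replace" and "expand", the example.org URIs, and the choice to reject an element that has both attributes or neither are assumptions made for illustration; the proposal itself only says the element has one of the two.

```python
import xml.etree.ElementTree as ET

NS = {"a": "http://www.purl.org/RESTLog/archives/1.0"}

# Hypothetical archive using both forms of the 'more' element.
SAMPLE = """\
<archives xmlns="http://www.purl.org/RESTLog/archives/1.0">
  <more href="http://example.org/archive/up">Up</more>
  <more src="http://example.org/archive/2003-10">October 2003</more>
</archives>
"""


def more_action(more_element):
    """Map a 'more' element to ('replace' | 'expand', uri)."""
    href = more_element.get("href")
    src = more_element.get("src")
    # Assumption: exactly one of the two attributes must be present.
    if (href is None) == (src is None):
        raise ValueError("expected exactly one of 'href' or 'src'")
    if href is not None:
        return ("replace", href)  # swap the whole view for the target archive
    return ("expand", src)        # open the target archive in place, like a folder


actions = [more_action(m) for m in ET.fromstring(SAMPLE).findall("a:more", NS)]
print(actions)
```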

Posted by Joe on 2003-10-17

Joe, I apologize as this is clearly a case of me not quite finding the right words to express what I have in mind.  I imagine that what it will take is a concrete example for us to discuss.  I'm working on it.

In the URL I provided, that information is within the page itself.  There isn't a separate set of "directories" vs "content" pages.  Of course, some pages have a higher content vs hypertext link ratio than others, but that is by choice and not an inherent limitation of the format.

My atom feed contains the last 20 blog entries.  It could provide a link to the previous 20 blog entries (a.k.a. "left").  It could provide a link to each of the comment feeds for the 20 blog entries ("down").  It could also provide a link to the list of feeds that are provided for this site ("up").

Each of those pages could also have similar navigational information, as appropriate.

Posted by Sam Ruby on 2003-10-17

This static system looks nice, and is a good replacement for the just as static 'search'-part of the current Atom API.

I only see one caveat: The RESTLog Archive Interface requires the user to do something before the application can retrieve the information. Of course, the application can retrieve everything without knowing what it is, but what it retrieves doesn't mean anything. The application doesn't know what it is.

This might not be a problem as there are users in the end that are going to read this, but unless we settle on a common dialect to express different views of the entry-database, the entries will never be possible to e.g. index automatically.

I think the RESTLog format can be used as a simple view into the entry-db, but it should be recommended to provide a richer and more standardized API for searching, where queries are registered and given common and global meaning.

What the queries should look like, and how they should be given meaning, is not something I've thought about, but I think it's important to know what the entries you (you = the application) are looking at are, and why you are looking at them.

PS: I got an exception while trying to post my name with oslash instead of o.

Posted by Asbjorn Ulsberg on 2003-10-27
