Thoughts on the Google API

Joe Gregorio

It's been a while since Google released its SOAP-based API and all the ensuing discussion. I have only recently had a chance to play with the API, and it does raise a question.

Paul Prescod covered what the API would have looked like if it were formulated under REST, but his formulation has a major weakness: it encodes the Google key into the URI. The problem is that the URI may show up in referrer logs, increasing the chance that your key gets stolen.

On the other hand, Google opted for SOAP and thus embeds the key directly in the SOAP body of the request.

What both of these approaches ignore is that there are Six Places to store information in any HTTP request/response pair. In particular, they both ignore HTTP headers, which in this case are the perfect place to store the Google key.

So if you remember, the old unrestricted pre-SOAP Google interface was to just replace /search with /xml. If I were to search for the word 'adagio' using a REST version of the API along those lines, the request would look like this:

GET /xml?q=adagio HTTP/1.1
Host: www.google.com
Accept: application/xml 
X-Google-Key: 734981732987374940

Now the key isn't carried in the URI, and the API reverts to a simple GET, with no need to POST query parameters wrapped in a SOAP envelope.
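For the curious, here is a minimal sketch of that request using Python's standard urllib. It only builds the request so it can be inspected; the /xml endpoint and the X-Google-Key header are the hypothetical REST interface described above, not something Google actually offers, and build_search_request is an illustrative name.

```python
import urllib.parse
import urllib.request


def build_search_request(query: str, key: str) -> urllib.request.Request:
    """Build (but do not send) a GET for the hypothetical /xml search API."""
    # Only the query goes into the URI; the key stays out of it.
    url = "http://www.google.com/xml?" + urllib.parse.urlencode({"q": query})
    # The key travels in a custom header instead (hypothetical header name).
    return urllib.request.Request(
        url,
        headers={"Accept": "application/xml", "X-Google-Key": key},
    )


req = build_search_request("adagio", "734981732987374940")
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen() would send it. Since a Referer header only ever carries a URI, a key kept in its own header can't leak into someone else's referrer log.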

Is there any reason not to use HTTP Authentication / Authorization for the key? Something like this (for user bitworking.org with the Google key above):

GET /documents?q=adagio HTTP/1.1
Host: www.google.com
Accept: application/xml
Authorization: Basic Yml0d29ya2luZy5vcmc6NzM0OTgxNzMyOTg3Mzc0OTQw
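The Authorization value above is nothing more than the base64 encoding of "user:key", as Basic authentication defines it. A small Python sketch (basic_authorization is just an illustrative helper name) reproduces it:

```python
import base64


def basic_authorization(user: str, key: str) -> str:
    """Return an HTTP Basic Authorization header value for user:key."""
    # Basic auth joins the user and password with a colon, then base64-encodes it.
    token = base64.b64encode(f"{user}:{key}".encode("ascii")).decode("ascii")
    return "Basic " + token


print(basic_authorization("bitworking.org", "734981732987374940"))
```

Keep in mind that Basic is an encoding, not encryption: anyone who can see the header can read the key. What it buys you here is keeping the key out of URIs and referrer logs.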

/documents would work for both HTML and XML. As names, /xml and /search both miss the point IMHO: even Google's own message says "Your search - someStrangeWord - did not match any documents."
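One way a single /documents resource could serve both HTML and XML is content negotiation on the Accept header. The sketch below is an assumption about how a server might do it, not anything Google implements; choose_representation is a hypothetical helper, and it deliberately ignores wildcards and every media-type parameter except q.

```python
def choose_representation(accept_header: str,
                          available=("text/html", "application/xml")) -> str:
    """Pick the best available media type for /documents from an Accept header.

    Simplified: honours q-values, ignores wildcards, and falls back to the
    first available type (HTML) when nothing matches.
    """
    best, best_q = available[0], -1.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q <= 0:
            continue  # q=0 means "not acceptable"
        # Strictly greater: on a tie, the type listed first wins.
        if media in available and q > best_q:
            best, best_q = media, q
    return best
```

A script sending "Accept: application/xml" gets XML, while a typical browser header that lists text/html first at the default q=1.0 still gets HTML from the same URI.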

Posted by Arien on 2003-08-11

Yes, that is also a possibility.

The one downside to that arises if you are extremely worried about performance or server load. With HTTP authentication you end up doing a round-trip, since the server has to challenge the client to get the credentials. That performance hit is not present on subsequent calls to the same URI or any URI below it, since the client should automatically send the credentials on those requests.
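Python's urllib.request password manager implements the "same URI or anything below it" rule by matching stored credentials on path prefix. A sketch, assuming a hypothetical realm named "Google" that protects /documents (make_password_manager is an illustrative name):

```python
import urllib.request


def make_password_manager(user: str, key: str) -> urllib.request.HTTPPasswordMgr:
    """Store credentials for a hypothetical "Google" realm rooted at /documents."""
    mgr = urllib.request.HTTPPasswordMgr()
    # Any URI at or below /documents on this host will match these credentials.
    mgr.add_password("Google", "http://www.google.com/documents", user, key)
    return mgr


mgr = make_password_manager("bitworking.org", "734981732987374940")
print(mgr.find_user_password("Google", "http://www.google.com/documents/more"))
print(mgr.find_user_password("Google", "http://www.google.com/search?q=adagio"))
```

The first lookup finds the credentials and the second, outside the protected path, comes back empty. With the plain HTTPBasicAuthHandler these credentials are still only sent after a 401 challenge, which is the round-trip in question.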

With respect to /documents, /search and /xml, I was just copying what Google currently does, or did, as the case may be.

Posted by joe on 2003-08-11

As RFC 2617 clearly states, the client may send an Authorization header without the server asking for one, so the round-trip is in no way required.
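Python's urllib can do exactly this. The sketch below marks the /documents space as already authenticated so the Basic handler adds the header up front instead of waiting for a 401. The URI and credentials are the hypothetical ones from the post, preemptive_request is an illustrative name, and HTTPPasswordMgrWithPriorAuth needs Python 3.5 or later.

```python
import urllib.request


def preemptive_request(url: str, user: str, key: str) -> urllib.request.Request:
    """Return a request with Basic credentials attached before any challenge."""
    mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
    # is_authenticated=True: send credentials for /documents (and below) up front.
    mgr.add_password(None, "http://www.google.com/documents", user, key,
                     is_authenticated=True)
    handler = urllib.request.HTTPBasicAuthHandler(mgr)
    req = urllib.request.Request(url, headers={"Accept": "application/xml"})
    # An opener from build_opener(handler) runs this pre-processing step itself;
    # calling it directly lets us inspect the header without touching the network.
    return handler.http_request(req)


req = preemptive_request("http://www.google.com/documents?q=adagio",
                         "bitworking.org", "734981732987374940")
print(req.get_header("Authorization"))
```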

Posted by Arien on 2003-08-12

Arien,
  That might be true with Basic auth, but since Digest auth is a challenge-response mechanism, where the server challenges with a nonce value that is used in the response, the round-trip is required on the first request.
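To see why, here is a rough sketch of how RFC 2617 computes the Digest "response" for qop=auth with MD5. The nonce is an input, and it only arrives in the server's 401 WWW-Authenticate challenge, so the client has nothing to hash until it has been challenged once. The realm, nonce and cnonce values below are made up for illustration, and digest_response is a hypothetical helper name.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """RFC 2617 request-digest for qop=auth and the default MD5 algorithm."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")  # depends on the secret
    ha2 = md5_hex(f"{method}:{uri}")             # depends on the request
    # The server-supplied nonce ties the answer to one particular challenge.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


print(digest_response("bitworking.org", "Google", "734981732987374940",
                      "GET", "/documents?q=adagio",
                      nonce="abc123", nc="00000001", cnonce="0a4f113b"))
```

After that first challenge the client may keep reusing the nonce with an incremented nc, so only the first request pays for the extra round-trip.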

Posted by Joe on 2003-08-12

Good catch. :-)

But then, this is not a consequence of using HTTP authentication instead of the X-Google-Key header (as you said above): the round-trip would be required in either case when doing things Digest-style.

Anyway, I'm not trying to pick nits. I was just curious as to why you used a nonstandard header.

Posted by Arien on 2003-08-12

http://www.intertwingly.net/blog/1557.html#c1060610077

"And I can't help but note that I can't just include a link here to the validator output since the RDF validator uses POST for it's form instead of GET."

Paul's proposal retains this essential characteristic of HTTP GET (that you can link to the result). Yours destroys it.

If one really wants to superimpose request/response semantics over an interaction, I'd suggest using HTTP POST. That's what it was designed for.

Posted by Sam Ruby on 2003-08-12

Nice catch, Sam, but it only lends support to Arien's idea of using HTTP auth.

It also completely ignores the implementation costs. I could either use my current HTTP library and just add one custom header or bring in the entire SOAP processing model.

Posted by Joe on 2003-08-12

... where "entire SOAP processing model" reduces to "scan for mustUnderstand and reply with faults on errors".
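To make that concrete, here is a rough sketch of the mustUnderstand check for a SOAP 1.1 envelope using Python's ElementTree. It is simplified on purpose: it ignores the actor attribute, and the urn:example:key header block is made up (Google's API actually carried the key in the Body). must_understand_violations is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SOAP11_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def must_understand_violations(envelope_xml: str, understood: set) -> list:
    """Return qualified names of mustUnderstand="1" header blocks we don't handle.

    A non-empty result means the receiver should reply with a
    SOAP-ENV:MustUnderstand fault instead of processing the Body.
    (Simplified: the actor attribute is ignored.)
    """
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP11_ENV}}}Header")
    if header is None:
        return []
    violations = []
    for block in header:  # each child element is one header block
        flag = block.get(f"{{{SOAP11_ENV}}}mustUnderstand", "0")
        if flag == "1" and block.tag not in understood:
            violations.append(block.tag)
    return violations
```

Called with understood={"{urn:example:key}Key"} on an envelope whose only header block is that key, it returns an empty list; with an empty set it returns ["{urn:example:key}Key"], which is the cue to send the fault.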

HTTP Auth would have been a reasonable solution in this case. The issues with HTTP Auth tend to be on the server side, something that Google presumably could have handled.

Posted by Sam Ruby on 2003-08-12

... where the entire processing model includes the mandated PSVI (Post-Schema-Validation Infoset), where this:

<Amount>12.30</Amount>

turns into this:

<Amount>12.30000000000000</Amount>

Posted by Joe on 2003-08-13

Grep http://www.w3.org/TR/SOAP/ or http://www.w3.org/TR/soap12-part1/ for PSVI. You won't find it.

SOAP does not prereq WSDL or a schema.  Even if a schema is used, amount may simply be a string.  Take a close look at the Atom 0.2 specifications: how many floats do you see?

Even in the toolkits where xsd is used, you will find that such mappings have a lot more wiggle room than you might expect. See "To infinity and beyond - the quest for SOAP interoperability" for more insight on this subject.

Posted by Sam Ruby on 2003-08-14

J:\TMP>wget http://www.w3.org/TR/soap12-part1/

J:\TMP>grep -i -c infoset index.html
29

Posted by Joe on 2003-08-14

Just stumbled upon this type of URL (via comments at Jeremy Zawodny's blog):

http://www.google.com/keyword/adagio

Nice.

Posted by Arien on 2003-08-23
