Apparently Google does not recognize XHTML, at least not when served as application/xhtml+xml. Try this search, which should return exactly one hit pointing to an XHTML document. Notice that the file format is “unrecognized” and they offer to let you “View it as HTML”.
Ouch. I appear to be suffering from the same fate. Oddly enough, Sam doesn't seem to have that problem. The difference between us may be in content negotiation. Here is my code for determining when to use application/xhtml+xml or text/html:
# Only serve XHTML to those clients that can understand it.
if serialization in matching:
    best = mimeparse.best_match(matching.keys(),
                                environ.get('HTTP_ACCEPT', 'application/xhtml+xml'))
    (contenttype, serialization) = (best, matching[best])
    if serialization == 'xhtml' and environ.get('HTTP_USER_AGENT', '').find("MSIE") >= 0:
        (contenttype, serialization) = extensions['html']
Note the great bit of code for detecting and compensating for IE, which sends the equivalent of "*/*" in its Accept: header, yet doesn't handle XHTML.
The mimeparse module follows the algorithm given in RFC 2616 for matching the media type against the Accept: header, and it is covered in detail in my XML.com article, "Just use media types?".
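For the curious, here's roughly what that matching looks like in practice; the first call is the example from the module's own docstring, and the second just shows what an IE-style wildcard gives it to work with:
import mimeparse

# The example from the mimeparse docstring: the client's q values decide.
best = mimeparse.best_match(['application/xbel+xml', 'text/xml'],
                            'text/*;q=0.5,*/*; q=0.1')
# best == 'text/xml'

# An IE-style "Accept: */*" matches every supported type at the same
# quality, so best_match can't recover any real client preference from it.
best = mimeparse.best_match(['text/html', 'application/xhtml+xml'], '*/*')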
Compare this to Sam's detection algorithm, which is implemented in his .htaccess file:
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0\.?0*(\s|,|$)
RewriteCond %{REQUEST_URI} \.html$
RewriteRule .* - [T=application/xhtml+xml;charset=utf-8]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0\.?0*(\s|,|$)
RewriteCond %{REQUEST_URI} !\.
RewriteRule .* - [T=application/xhtml+xml;charset=utf-8]
The difference is that Sam will serve up his content as application/xhtml+xml only if that media type appears explicitly in the Accept: header and isn't given an explicit quality of zero (q=0, q=0.0, and so on). This neatly fixes the problem for IE and apparently also for the Google crawler. If I were a gambling man I'd wager that Google, just like IE, sends Accept: */* and doesn't understand XHTML.
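For comparison, Sam's rule translated into Python would look something like this sketch (accepts_xhtml is just a name I'm using here, not anything in my dispatcher):
import re

def accepts_xhtml(accept_header):
    # Sam's rule: the client must name application/xhtml+xml explicitly...
    if 'application/xhtml+xml' not in accept_header:
        return False
    # ...and must not zero out its quality (q=0, q=0.0, q=0.00, ...).
    if re.search(r'application/xhtml\+xml\s*;\s*q=0\.?0*(\s|,|$)', accept_header):
        return False
    return True

# IE sends "Accept: */*", so this returns False and text/html gets served:
#   if accepts_xhtml(environ.get('HTTP_ACCEPT', '')): ...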
I've warned against using content negotiation before and this is just another example of the kinds of problems it can cause.
At this point I will probably drop back to text/html for all my pages and explore other means of displaying SVG, or convert the SVG into PNGs on the fly.
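The on-the-fly PNG route could be as simple as shelling out to librsvg; a rough sketch, assuming rsvg-convert is installed on the server:
import subprocess

def svg_to_png(svg_path, png_path, width=None):
    # Rasterize an SVG to PNG with librsvg's command line tool.
    cmd = ['rsvg-convert', '-o', png_path]
    if width:
        cmd += ['-w', str(width)]
    cmd.append(svg_path)
    subprocess.check_call(cmd)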
Posted by Jeff on 2007-03-08
Posted by Patrick Mueller on 2007-03-08
How about using local quality preferences to weigh in on the content negotiation?
Your application can be configured to serve pages with MIME types in the following order of preference, from highest to lowest:
- text/html
- application/xhtml+xml
If the browser requests a resource with an Accept: header that matches both of these MIME types, then you will always serve up the MIME type with the highest local quality, i.e. text/html. Only if a browser explicitly requests application/xhtml+xml will it be served XHTML.
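With the mimeparse module from the post, that idea might look something like this (the negotiate function and the 0.5 weight are purely illustrative):
import mimeparse

# Server-side preference: plain HTML is weighted above XHTML.
LOCAL_QUALITY = [('text/html', 1.0), ('application/xhtml+xml', 0.5)]

def negotiate(accept_header):
    # Weight each type's client quality by the local preference and
    # pick the highest combined score, falling back to text/html.
    best_type, best_score = 'text/html', 0.0
    for media_type, server_q in LOCAL_QUALITY:
        score = mimeparse.quality(media_type, accept_header) * server_q
        if score > best_score:
            best_type, best_score = media_type, score
    return best_type

# An IE-style "Accept: */*" comes out as text/html; a client has to
# prefer application/xhtml+xml strongly (or not accept text/html at
# all) before it gets XHTML.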
Posted by Noah Slater on 2007-03-08
This problem was reported at bugzilla.mozilla.org something like six years ago, but unfortunately every Mozilla developer who sees it reacts with something like "mneeeh, if you can't guarantee your app will always output well-formed XML you shouldn't publish as XHTML!", as if the apps they write, much less *any* web application, actually do that. Even web services, which have to return well-formed XML to function at all, output plain text error messages in all sorts of conditions, under .NET, Java, and every other platform I've seen.
All they'd have to do is display the full XHTML source along with the invalid XML error and it would be much more workable. It's really unfortunate because Firefox is otherwise pretty good at providing informative error messages.
Posted by ticktock256 on 2007-03-08
Posted by Danny on 2007-03-08
Posted by Asbjørn Ulsberg on 2007-03-09
Google search for site:thebjoernhoehrmannproject.org.
Posted by Anne van Kesteren on 2007-03-08