When the number of hits on the sparklines web service topped 100,000 a week, I started poking around in the logs. I discovered a couple of things, including that my log statistics package wasn't giving me the whole story. Part of the problem may be that my sparklines web service returns an ETag with each response.
That ETag allows each client to do a conditional GET request. If the image hasn't changed then the server just returns an HTTP status code of 304 with no response body, which can potentially save a lot of bandwidth. Here is a diagram of how ETags work:

    Client                  Server
    |                            |
    +------ GET spark.cgi?d=1 -->|
    |                            |
    |<----- 200 Ok, ETag:"foo" --+
    |       [binary png]         |
    |                            |
    |    .... time passes ...    |
    |                            |
    +------ GET spark.cgi?d=1 -->|
    |       If-None-Match:"foo"  |
    |                            |
    |<----- 304 Not Modified ----+
    |                            |

On the first request the client receives the PNG it requested along with an ETag, which it stores. On subsequent requests to the same URI the client sends that ETag back in an If-None-Match: header. This turns a regular GET into a "conditional" GET (wonder twin powers activate).
When the server receives that conditional GET and the resource hasn't changed, it returns a 304 Not Modified with no response body. If the resource has changed, it returns a 200 OK with the new representation.
Of course, I've simplified the scenario slightly and you should read RFC 2616 for the full story.
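
The exchange above can be sketched in a few lines of Python. This is only an illustration, not the sparklines service's actual code: the names `make_etag` and `respond` are hypothetical, and hashing the response body is just one assumed way to produce an ETag. It also ignores weak validators (the `W/` prefix), which RFC 2616 covers.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the body (an assumed scheme)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(request_headers: dict, body: bytes):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    # If-None-Match may carry several comma-separated ETags, or "*".
    sent = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in sent.split(",")]
    if etag in candidates or "*" in candidates:
        # Resource unchanged: no body, just the validator.
        return "304 Not Modified", {"ETag": etag}, b""
    return "200 OK", {"ETag": etag, "Content-Type": "image/png"}, body


# Simulate the two requests from the diagram with placeholder bytes.
png = b"\x89PNG placeholder image bytes"
status, headers, payload = respond({}, png)
status2, headers2, payload2 = respond({"If-None-Match": headers["ETag"]}, png)
print(status, len(payload))
print(status2, len(payload2))
```

The second call sends back the ETag from the first response, so the server can skip the body entirely. A stale or unknown ETag falls through to the normal 200 path.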
I grepped around in my logs, counted the hits on the service over a stretch of five days, and categorized the results by status code:
Day | Status 200 | Status 304 | Ratio (304/200) |
---|---|---|---|
2006-12-11 | 5222 | 6195 | 1.2 |
2006-12-12 | 6754 | 15156 | 2.2 |
2006-12-13 | 3947 | 16807 | 4.3 |
2006-12-14 | 7667 | 27592 | 3.6 |
2006-12-15 | 7651 | 22201 | 2.9 |
From that table you can see that about three out of every four requests (roughly 74% over the five days) resulted in a 304. That represents a large savings in bandwidth and computation time, and it comes without turning on caching explicitly. That is, I send no Cache-Control: headers, which would probably increase the savings even more. You can also see that my log statistics package was under-reporting the hits and that I am receiving not 100,000, but over 200,000 hits a week.
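
A tally like the one in the table could be produced along these lines. This is a sketch under assumptions: the post doesn't say what log format or commands were used, so the code assumes Apache's Common Log Format, and `tally_by_status` and the sample lines are invented for illustration. The second half just redoes the arithmetic on the figures from the table.

```python
import re
from collections import Counter, defaultdict

# Assumes Common Log Format: ... [11/Dec/2006:10:00:00 -0500] "GET /x HTTP/1.1" 200 312
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def tally_by_status(lines, path_fragment="spark.cgi"):
    """Count hits per day and status code for requests matching path_fragment."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.search(line)
        if match and path_fragment in match.group("request"):
            counts[match.group("day")][match.group("status")] += 1
    return counts


# Made-up sample lines, only to exercise the function.
sample = [
    '10.0.0.1 - - [11/Dec/2006:10:00:00 -0500] "GET /spark.cgi?d=1 HTTP/1.1" 200 312',
    '10.0.0.1 - - [11/Dec/2006:10:05:00 -0500] "GET /spark.cgi?d=1 HTTP/1.1" 304 -',
    '10.0.0.2 - - [11/Dec/2006:10:06:00 -0500] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.3 - - [12/Dec/2006:09:00:00 -0500] "GET /spark.cgi?d=2 HTTP/1.1" 304 -',
]
counts = tally_by_status(sample)

# Figures copied from the table: day -> (status 200 hits, status 304 hits).
table = {
    "2006-12-11": (5222, 6195),
    "2006-12-12": (6754, 15156),
    "2006-12-13": (3947, 16807),
    "2006-12-14": (7667, 27592),
    "2006-12-15": (7651, 22201),
}
total_200 = sum(ok for ok, _ in table.values())
total_304 = sum(not_modified for _, not_modified in table.values())
share_304 = total_304 / (total_200 + total_304)
print(f"200s: {total_200}, 304s: {total_304}, share of 304s: {share_304:.1%}")
```

Over the five days in the table the 304 share works out to a little under 74%, which is where "three out of every four" comes from.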
In case you're wondering, 200,000 hits a week on the sparklines web service is hardly going to drive me into the poorhouse. Those images are small and the service represents less than 1% of the total number of bytes my site serves in a week.