ETags: This stuff matters

When the number of hits on the sparklines web service topped 100,000 a week, I started poking around in the logs. I discovered a couple of things, including that my log statistics package wasn't giving me the whole story. Part of the problem may be that my sparklines web service returns an ETag with each response.

That ETag allows each client to make a conditional GET request. If the image hasn't changed, the server just returns an HTTP status code of 304 with no response body, which can save a lot of bandwidth. Here is a diagram of how ETags work:

   Client                      Server
     |                            |
     +------ GET spark.cgi?d=1 -->|
     |                            |
     |<----- 200 OK, ETag:"foo" --+
     |       [binary png]         |
     |                            |

     .... time passes ...

     |                            |
     +------ GET spark.cgi?d=1 -->|
     |     If-None-Match:"foo"    |
     |                            |
     |<----- 304 Not Modified  ---+
     |                            |

On the first request the client receives the PNG it requested, along with an ETag that it stores. On subsequent requests to the same URI the client sends that ETag back in an If-None-Match: header. This turns a regular GET into a "conditional" GET (wonder twin powers activate). When the server receives that conditional GET and the resource hasn't changed, it returns a 304 Not Modified with no response body. If the resource has changed, it returns a 200 OK with the new representation and a new ETag. Of course, I've simplified the scenario slightly and you should read RFC 2616 for the full story.
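As a rough sketch of the server side of that exchange, here is how the check might look in Python. The function names and the SHA-1-of-the-body ETag scheme are assumptions for illustration, not what the sparklines service actually does, and the header parsing is deliberately naive (it ignores weak validators and tags that contain commas).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the response body (an assumed scheme)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def conditional_get(request_headers: dict, body: bytes):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    header = request_headers.get("If-None-Match", "")
    # The header may carry several comma-separated tags, or "*".
    candidates = [tag.strip() for tag in header.split(",") if tag.strip()]
    if "*" in candidates or etag in candidates:
        # The client's copy is still current: send no body at all.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "image/png"}, body


# First request: no validator, so the full image goes out.
image = b"\x89PNG pretend image bytes"
status, headers, payload = conditional_get({}, image)

# Later request: the client echoes the ETag back and gets a bodiless 304.
status_again, _, payload_again = conditional_get(
    {"If-None-Match": headers["ETag"]}, image
)
```

Hashing the body means the server still has to generate the image before it can answer, so this version saves bandwidth but not computation. Saving both would need a tag that can be derived from the request or from stored state.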

I grepped through my logs and counted the hits on the service over a stretch of five days, categorized by status code.

Hits on the sparklines web service
Day	Status 200	Status 304	Ratio (304/200)
2006-12-11	5222	6195	1.2
2006-12-12	6754	15156	2.2
2006-12-13	3947	16807	4.3
2006-12-14	7667	27592	3.6
2006-12-15	7651	22201	2.9
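A tally like the one above can be produced with a few lines of Python instead of grep. This is only a sketch: it assumes the logs are in Apache common or combined log format, which may not match the real logs, and both the tally function and the "spark.cgi" path fragment are illustrative choices.

```python
import re
from collections import Counter

# Assumed format: ... [11/Dec/2006:10:00:00 -0500] "GET /path HTTP/1.1" 200 312
LOG_LINE = re.compile(
    r'\[(?P<day>[^:]+):[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) '
)


def tally(lines, path_fragment="spark.cgi"):
    """Count hits per (day, status) for requests whose path contains path_fragment."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and path_fragment in match.group("path"):
            counts[(match.group("day"), match.group("status"))] += 1
    return counts


def ratio_304_to_200(counts, day):
    """Return the 304/200 ratio for one day, or None if there were no 200s."""
    ok = counts[(day, "200")]
    return counts[(day, "304")] / ok if ok else None
```

Calling tally on an open log file works because file objects yield one line at a time. Lines that don't match the assumed format are skipped silently.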

From that table you can see that about three out of every four requests resulted in a 304. That represents a large savings in bandwidth and computation time, and it comes without turning on caching explicitly. That is, I send no Cache-Control: headers, which would probably increase the savings even more. You can also see that my log statistics package was under-reporting the hits and that I am receiving not 100,000, but over 200,000 hits a week.
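The "three out of every four" figure can be checked directly from the table. The numbers below are copied from it; the variable names are just for illustration.

```python
# (status 200 hits, status 304 hits) per day, copied from the table above.
hits = {
    "2006-12-11": (5222, 6195),
    "2006-12-12": (6754, 15156),
    "2006-12-13": (3947, 16807),
    "2006-12-14": (7667, 27592),
    "2006-12-15": (7651, 22201),
}

total_200 = sum(ok for ok, _ in hits.values())
total_304 = sum(not_modified for _, not_modified in hits.values())

# Fraction of all requests over the five days that were answered with a 304.
share_304 = total_304 / (total_200 + total_304)

# Per-day ratio, rounded the same way as the table's last column.
daily_ratio = {day: round(nm / ok, 1) for day, (ok, nm) in hits.items()}

print(f"200s: {total_200}, 304s: {total_304}, 304 share: {share_304:.1%}")
```

Over the five days that comes to 31,241 full responses against 87,951 bodiless ones, so roughly 74 percent of requests were 304s.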

In case you're wondering, 200,000 hits a week on the sparklines web service is hardly going to drive me into the poorhouse. Those images are small and the service represents less than 1% of the total number of bytes my site serves in a week.