Instead of a list of changed user ids or URIs, we can represent the state as a sparse bit array corresponding to all of Flickr’s users. I don’t know exactly how many users there are at Flickr, but let’s be generous and estimate it at one million. One million bits seems like a lot, but it is only 122kB in an uncompressed array. Considering that this array will only contain 1s when an update has occurred within the last minute, my guess is that it would average under 1kB per representation.
I can just imagine people reading "sparse bit array" and thinking that I must be talking about some optimal data structure that only chief scientists would think to use on the Web. I’m not. Any black-and-white GIF or PNG image is just a sparse bit array, and they have the nice side-effect of being easy to visualize.
I'll see your GIF and raise you a Bloom Filter. Not that it in anyway detracts from the point of the article, which is to put more thought into resource design before jumping off into a new protocol.
Yes, just like Roy's GIF example, but with the possibility of false positives, which you trade off for much smaller representation, which in the case of the 1000x1000 black and white GIF might be hard to beat, but might be applicable in other areas.
Posted by Joe Gregorio on 2008-09-11
Posted by Pierre Phaneuf on 2008-09-12
Posted by anon on 2008-09-12
Posted by Markus on 2008-09-16
2008-09-11
Posted by Ian Bicking on 2008-09-11