The underlying assumption of Joe Armstrong's algorithms, which I wholeheartedly agree with, is that any system that is scalable, fault-tolerant, and upgradable is composed of N nodes, where N > 1.
The problem with current data storage systems, with rare exception, is that they are all "one box native" applications, i.e., from a world where N = 1. From Berkeley DB to MySQL, they were all designed initially to sit on one box. Even after several years of dealing with MegaData you still see painful stories like what the YouTube guys went through as they scaled up. All of this stems from an N = 1 mentality.
So when Sam answers Anant's call for patterns of information needs of Web 2.0 applications, I would add N > 1 as a prerequisite, regardless of which 'size' of database you choose.
The place where N > 1 is very hard, though, is when you have data you need to write with transactional guarantees.
I totally agree, but I would push in the other direction: we need something that is natively N > 1, that replicates data over nodes transparently, and that only supports a limited set of transactional guarantees. Yes, whatever that is will have a different programming model from SQL, and while not everybody needs BigTable today, more and more people will over time. The sooner we start building a common understanding of how to program against such a model, the better off we'll be.
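To make the idea concrete, here is a minimal sketch of what a "natively N > 1" store with limited guarantees might look like. All names (`Node`, `ReplicatedStore`) are hypothetical and not taken from BigTable or any real system: writes fan out transparently to every node, reads reconcile replicas by version, and the only guarantee offered is per-key last-writer-wins, with no multi-key SQL-style transactions.

```python
class Node:
    """One of N storage nodes; holds a (version, value) pair per key."""
    def __init__(self):
        self.data = {}

    def put(self, key, version, value):
        # Last-writer-wins on version: a replica ignores stale writes.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        return self.data.get(key)


class ReplicatedStore:
    """Client view: writes replicate transparently to all N nodes (N > 1)."""
    def __init__(self, n=3):
        assert n > 1, "the whole point: N > 1"
        self.nodes = [Node() for _ in range(n)]
        self.version = 0

    def put(self, key, value):
        self.version += 1
        for node in self.nodes:  # fan the write out to every node
            node.put(key, self.version, value)

    def get(self, key):
        # Read every replica and keep the newest version seen, so a
        # lagging or partially-failed node can't serve stale data.
        replies = [r for r in (n.get(key) for n in self.nodes)
                   if r is not None]
        if not replies:
            return None
        return max(replies)[1]


store = ReplicatedStore(n=3)
store.put("user:42", "alice")
print(store.get("user:42"))  # survives the loss of any single node's copy
```

The programming model is deliberately poorer than SQL: you get per-key atomicity and durability-through-copies, and nothing that spans keys.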
Posted by joe on 2007-07-22
Posted by Dave Tauzell on 2007-07-23
Posted by Nelson Minar on 2007-07-22