BitWorking

These are Joe Gregorio's writings (archives), projects and status updates.

Stonebraker on MapReduce

I normally point to Michael Stonebraker as a source of information on what comes next after the RDBMS, but after reading this article on MapReduce I may have to rethink that. The very premise of the article is flawed: comparing MapReduce to a database is a category error, and the rest of the article goes downhill from there. I don't need to say much more than that, as the article is pretty heavily panned in the comments. See also: Relational Database Experts Jump The MapReduce Shark

I think what they're comparing is two ways of building algorithms that parallelize work and then collect the results: using map/reduce, or feeding everything from the database server. You can write a Web crawler either way. So they're not directly comparing one to the other; it's more like treating restaurants and microwave ovens as two competing technologies to help you eat regularly, except they don't explain that they're concerned with eating. I still think they're wrong on every count, though.

Posted by Assaf on 2008-01-19

The authors of that article need to learn what an RDBMS is and what an algorithm is.

I don't know how they are making the jump, but they are somehow joining dots that lead them to say "map/reduce is a database". Why on earth would map/reduce be deficient because it doesn't have a bulk loader or indexing? It has nothing to do with bulk loading or indexing.

I would definitely put that article down to a moment of weakness, clouded thinking or brain fart.

Posted by Al on 2008-01-19

Al - are you suggesting that Michael Stonebraker needs to learn what an RDBMS is? That is so intensely ironic that I wish I had written it! The point, which I'm sure someone has pointed out, can be found by looking at the summary of their findings. Here are a few:

- A sub-optimal implementation, in that it uses brute force instead of indexing
- Missing most of the features that are routinely included in current DBMS
- Incompatible with all of the tools DBMS users have come to depend on

These read like a quote from The Innovator's Dilemma. People enjoy the benefits of MapReduce for large scale data processing /because/ of these points. The lack of support for these DBMS features and tools is the /reason/ it scales like a mother fucker. That is the feature people want. And it does it 10x better and cheaper than anything else. Those other things simply don't matter to them. It's a new audience and a new market.

Posted by MikeD on 2008-01-19

2008-01-19