Megadata and NoSQL

Don't get me wrong: I'm thrilled to see some of the ideas I've talked about finally gaining traction in the wider community, and pleased that someone came up with a better name than Megadata. But many of the discussions seem to focus far too much on the "SQL" part of NoSQL. Breaking away from treating data purely relationally is a good start, but it misses the larger picture: when you work with Megadata, really large data sets, the nature of both the problems and the solutions changes markedly.

At the simplest level, you have to get beyond N=1 thinking. You can't possibly store all that data on one machine, nor can you hope to process it all on one machine. Whatever software you build to house and handle that data must be an N > 1 endeavor.
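To make the N > 1 point a little more concrete, here is a minimal sketch (not from the original post) of one common technique, hash partitioning: each record's key is hashed to choose one of N machines, and a query is answered by having each shard compute a partial result that is then combined. The helper names `shard_for`, `partition`, and `distributed_sum` are hypothetical, and the "machines" are simulated inside a single process; a real system would also need replication, failure handling, and rebalancing.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick which of num_shards machines stores `key` (hypothetical helper).

    A stable hash (SHA-256) is used instead of Python's built-in hash(),
    which is randomized per process for strings and so would route the
    same key differently on different machines.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by the shard (simulated machine) responsible for them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


def distributed_sum(records, num_shards):
    """Scatter/gather: each shard sums only its own records, then the
    partial results are combined. Here the shards run sequentially in one
    process; in a real cluster each partial sum would run on its own node."""
    shards = partition(records.keys(), num_shards)
    partials = [sum(records[k] for k in members) for members in shards.values()]
    return sum(partials)


if __name__ == "__main__":
    records = {f"user:{i}": i for i in range(1000)}
    for shard, members in partition(records.keys(), 4).items():
        print(f"shard {shard}: {len(members)} keys")
    print("total:", distributed_sum(records, 4))
```

Simple modulo placement is easy to follow but moves most keys whenever N changes, which is why many distributed stores use consistent hashing instead.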

Also missing is the idea that some problems become simpler to solve, or only become possible to solve, once you have enough data. That brings me back to one of the most exciting announcements at Google I/O, the Prediction API; in conjunction with Google Storage for Developers, it puts the tools for leveraging Megadata into more hands.

So I'm glad to see NoSQL gaining traction, but I'm looking forward to the day when the discussion moves beyond what is being left behind and on to the new things we can build going forward.