Here are two great talks by Google engineers on working with large data sets and how that data can be used to improve your product.
The first is a talk by Peter Norvig - The Unreasonable Effectiveness of Data:
The point of the talk is that having more data beats having a cleverer algorithm.
And you can see this in action in this second talk by Douglas Merrill. In this clip, starting at 22 minutes, he talks about how the “did you mean” functionality in Google Search was built not by crawling the web, but by analyzing the queries of Google Search users:
The more people who use Google, the better the “did you mean” results become.
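To make the idea concrete, here is a toy sketch in Python of suggesting corrections from a query log instead of a dictionary or a web crawl. It is not Google's implementation. The function names, the sample log, and the `min_ratio` and `cutoff` thresholds are all assumptions made up for illustration. The only idea it borrows from the talk is that popular queries from real users can serve as the source of corrections.

```python
from collections import Counter
import difflib


def build_query_counts(query_log):
    """Count how often each normalized query appears in the log."""
    return Counter(q.strip().lower() for q in query_log)


def did_you_mean(query, counts, min_ratio=5, cutoff=0.8):
    """Return a similar-looking but much more popular query, or None.

    min_ratio and cutoff are arbitrary illustrative thresholds:
    a suggestion must be at least min_ratio times more frequent than
    the typed query and have a difflib similarity of at least cutoff.
    """
    query = query.strip().lower()
    own_count = counts.get(query, 0)
    # Look for logged queries that are spelled similarly to the input.
    candidates = difflib.get_close_matches(query, list(counts), n=5, cutoff=cutoff)
    best, best_count = None, 0
    for cand in candidates:
        if cand == query:
            continue
        cand_count = counts[cand]
        # Only suggest queries that are far more common than what was typed.
        if cand_count > best_count and cand_count >= min_ratio * max(own_count, 1):
            best, best_count = cand, cand_count
    return best


# Hypothetical query log: most users spell the name correctly, a few do not.
query_log = ["peter norvig"] * 40 + ["peter norvog"] * 2 + ["weather"] * 20
counts = build_query_counts(query_log)

print(did_you_mean("peter norvog", counts))
print(did_you_mean("peter norvig", counts))
```

With this sample log, the misspelled query is mapped to the spelling most users typed, and the correct spelling gets no suggestion. Adding more queries to the log sharpens the counts, which is the same effect described above: more usage means better suggestions.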