Month: February 2011

What Is ‘Big Data’?

I spent three glorious days at the Strata conference on “big data” earlier this month — in sunny Santa Clara, surrounded by statistics nerds. The confab, put on by the folks at O’Reilly, proved to be fertile ground for potential stories, as well as for new ways to convey them based on data.

But one question still nags me about this field: What is “big data” in the first place? After all, large data sets have been around for years — although it’s true that we’re now talking petabytes instead of lowly terabytes. Something else that isn’t so new: “data mining,” or the parsing of said data to find patterns, often using artificial intelligence. Furthermore, it’s not always the size of the data that matters; the visualization techniques being discussed at Strata, for example, could very well be used with smaller data sets.

What’s new isn’t just the size of the data involved, or even the fact that it’s being analyzed, but how important and accessible it now is. The point is that data is now everywhere, scattered like so many breadcrumbs. Tyler Bell at O’Reilly Radar has a good post on the many metaphors being used to describe the concept, such as “the new oil,” “data deluge” and my personal favorite, “data exhaust.”

Several folks at the conference proposed “data science” as an alternative term to “big data,” and I think that works. It certainly broadens the subject and seems more understandable.