What We’ve Got All Wrong About Big Data

Analytics Experts on Big Data Misconceptions: Big Data isn’t difficult, it isn’t expensive, but it does require thinking!


As mentioned yesterday, in preparation for the Useful Business Analytics Summit, I’ll be posting a series of analytics-related questions here on the blog so that fellow attendees can learn more about each other and the event. Today’s question is about Big Data:






Eight of our speakers responded to today’s question below. While the answers vary, I agree with several of the thoughts here. Big Data does not have to be as difficult or expensive as some seem to believe. However, useful analysis certainly does require serious thinking regardless of data size.



The biggest misconception to me is that it has to be complex; that extracting meaningful insights from big data requires complex modeling, etc. In reality, I’ve found that the most successful ways to leverage big data and show its value in the workplace are no different from those used with typical relational datasets. You still need to understand what sort of story you want to be able to tell and condense the analysis you perform to answer those questions. The data is complex enough; there’s no real reason to over-complicate (especially at first) with models and algorithms that your senior leadership team likely won’t understand. Start simple and build credibility for your practice.



By and large, Big Data means something different to everyone, which means that one person's conception is another's misconception.



That it is nothing new, simply a linear extension of the historical trend of data storage getting cheaper and cheaper. And the troubling thing is, this line comes from some of the wisest and best-qualified elder statesmen of the data world. In reality, data storage is an order of magnitude cheaper than it’s ever been before, and ordinary companies with ordinary budgets can now store and retrieve essentially the entire digital histories of their target clients from Hadoop clusters, and slice the data for insights using freeware R, without shelling out $50K for a SAS license. In reality, all hype aside, this is an authentic sea change…a clean break from how things were done before.



While I have heard and read many misconceptions about big data, such as that it is a new concept, that it is very expensive, or that it is an IT or a Hadoop thing, I feel the biggest misconception is that it’s just a tool or a technology of some sort. I believe that big data is a discipline and a practice that requires, very much like data management, a combined and specialized set of related processes, technologies and people to make it happen.



Big is a relative term. Fifteen years ago, people thought POS data was big enough to challenge the software, hardware and methods then in use. Then came user web navigation data, and Google developed non-tabular methods to extract value from it. Now we have much vaster amounts of data generated by machines without human intervention, like RFID tag data. What has changed is the extent to which companies and organizations must leverage this information in order not to fail.



That Big Data is new. In fact, it has always existed; it’s just getting bigger due to the digital world.


That it has a specific format. In fact, it might be in any format.


That there are many effective ways to use it. In fact, most Fortune 500 companies are only taking advantage of limited data.



The biggest misconception is what the word or the concept 'Big Data' itself means. It is more difficult than people think and it is also less difficult than people think. Essentially, to do this right, a piece of software or a packaged solution alone will not solve the problem. You need to rethink your approach to collecting, structuring, processing and analyzing the data. It will not magically solve all your business problems, it will require a lot of upfront work and investment, and at the end of the day it might not be the right solution for your business.



I can identify at least three misconceptions:


1. The term "Big". By focusing on the size of data, something challenging in and of itself since big is relative, we are diverting our attention from the problems and business challenges we are trying to address. Let's instead focus on problems first and determine how and what data can help us second, be it big, small or tiny.


2. The delusion that bigger=better. Usually, this is not the case because the ability to extract a valid signal from data depends on a lot more than just size. Big often brings other methodological challenges that, to this day, we don't know how to solve.


3. The illusion that one is working with the whole population, leading practitioners to ignore more fundamental quality and validity issues. This is especially true for things like selection bias, measurement errors, or plain nonsensical relationships.
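To make the second and third points concrete, here is a minimal simulation sketch. Everything in it is invented for illustration and comes from none of the speakers: the "monthly spend" population, the `simulate` function, and the capture rule under which higher spenders are more likely to end up in the big data set. It compares the average from a large but self-selected data set with the average from a small simple random sample.

```python
import random
import statistics


def simulate(seed=0, population_size=200_000, small_n=1_000):
    """Compare a large biased data set with a small random sample.

    All figures are hypothetical and chosen only to illustrate
    selection bias; they do not come from any real data.
    """
    rng = random.Random(seed)

    # Hypothetical population: monthly spend, roughly normal around 100.
    population = [rng.gauss(100, 20) for _ in range(population_size)]
    true_mean = statistics.fmean(population)

    # "Big" data set with selection bias: the chance of being captured
    # grows with spend (assumed rule: probability = spend / 200,
    # clipped to the range 0..1).
    big_biased = [
        x for x in population
        if rng.random() < min(1.0, max(0.0, x / 200))
    ]

    # Small data set: a simple random sample of the same population.
    small_random = rng.sample(population, small_n)

    return {
        "true_mean": true_mean,
        "big_biased_n": len(big_biased),
        "big_biased_mean": statistics.fmean(big_biased),
        "small_random_n": small_n,
        "small_random_mean": statistics.fmean(small_random),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the big data set holds many times more records than the random sample, yet its average sits further from the true population average, because collecting more records does nothing to remove the bias in how they were captured.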


There's no question that big data is here to stay. Nonetheless, we need to stop discriminating data by its size. Can we just call it data?


A big thank you to Alex Uher at L'Oréal Paris, Jonathan Isernhagen at Travelocity, Farouk Ferchichi from Toyota, Larry Shiller from Yale, Anthony Palella at Angie's List, Sofia Freyder from MasterCard, Thomas Speidel from Suncor Energy, and Deepak Tiwari from Google who answered the Big Data question today.

I’d love to hear your thoughts on whether you agree or disagree with our speakers in the comment section. I look forward to the possibility of meeting you at the event!

Check back over the next few days as I pose more questions to our esteemed speakers.


@TomHCanderson @OdinText



[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, developers of a patented Next Generation approach to text analytics known as OdinText. For more information and to inquire about software licensing visit OdinText INFO Request.]