Posts tagged data analytics
Support OdinText - Make Data Science Accessible!

Take 7 Seconds to Support the OdinText Mission: Help Make Data Science Accessible!

I’m excited to announce that OdinText will participate in the IIEX2016 Insight Innovation Competition!

The competition celebrates innovation in market research and provides a platform for young companies and startups to showcase truly novel products and services with the potential to transform the consumer insights field.

Marketing and research are becoming increasingly complex, and the skills needed to thrive in this environment have changed.

To that end, OdinText was designed to make advanced data analytics and data science accessible to marketers and researchers.

Help us in that mission. It only takes 7 seconds.

Please visit and cast a ballot for OdinText!

You can view and/or vote for the other great companies here if you like.

Thank you for your consideration and support!


Tom H. C. Anderson
Founder - OdinText Inc.
Info/Demo Request

ABOUT ODINTEXT
OdinText is a patented SaaS (software-as-a-service) platform for natural language processing and advanced text analysis. Fortune 500 companies such as Disney and Coca-Cola use OdinText to mine insights from complex, unstructured text data. The technology is available through the venture-backed Stamford, CT firm of the same name founded by CEO Tom H. C. Anderson, a recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research. The company is the recipient of numerous awards for innovation from industry associations such as ESOMAR, CASRO, the ARF and the American Marketing Association. Anderson tweets under the handle @tomhcanderson.


Big Data Insights

Big Data Insights presented by Silicon Alley Network

In Manhattan this Monday? If so, please join OdinText founder Tom H. C. Anderson along with Phil Leig Bjerknes of AllDayEveryDay and Afshim Goodarzi of 1010 data. It’s a great networking event for anyone interested in big data analytics and tech startups. This is an invitation-only event, so please contact us if you are interested in attending, and we’ll try to get you on the list if space allows. More info here.

Mr Big Data VS. Mr Text Analytics

[Interview re-posted w/ permission from Text Analytics News]

Mr. Big Data & Mr. Text Analytics Weigh In: Structured VS. Unstructured Big Data



If you pay attention to Big Data news, you’re sure to have heard of Kirk Borne, whose well-respected views on the changing landscape are often shared on social media. Kirk is professor of Astrophysics and Computational Science at George Mason University. He has published over 200 articles and given over 200 invited talks at conferences and universities worldwide. He serves on several national and international advisory boards and journal editorial boards related to big data.




Tom H. C. Anderson was an early champion of applied text analytics and gives over 20 conference talks on the topic each year, as well as lectures at Columbia Business School and other universities. In 2007 he founded the Next Gen Market Research online community, where over 20,000 researchers frequently share their experiences. Tom is founder of Anderson Analytics, developers of the text analytics software-as-a-service OdinText. He serves on the American Marketing Association’s Insights Council and was the first proponent of natural language processing in the marketing research/consumer insights field.


Ahead of the Text Analytics Summit West 2014, Data Driven Business caught up with them to gain perspectives on just how important and interlinked Big Data is with Text Analytics.


Q1. What was the biggest hurdle that you had to overcome in order to reach your current level of achievement with Big Data Analytics?

KB: The biggest hurdle for me has consistently been cultural -- i.e., convincing others in the organization that big data analytics is not "business as usual", that the opportunities and potential for new discoveries, new insights, new products, and new ways of engaging our stakeholders (whether in business, or education, or government) through big data analytics are now enormous.

After I accepted the fact that the most likely way for people to change their viewpoint is for them to observe demonstrated proof of these big claims, I decided to focus less on trying to sell the idea and focus more on reaching my own goals and achievements with big data analytics. After making that decision, I never looked back -- whatever successes that I have achieved, they are now influencing and changing people, and I am no longer waiting for the culture to change.

THCA: There are technical/tactical hurdles, and methodological ones. The technical scale/speed hurdles were relatively easy to deal with once we started building our own software, OdinText. Computing power continues to increase, and the rest is really about optimizing code.

The methodological hurdles are far more challenging. It’s relatively easy to look at what others have done, or even to come up with new ideas. But you do have to be willing to experiment, and beyond willingness, you need to have the time and the data to do it! There is a lot of software coming out of academia now. They like to mention their institution in every other sentence: “MIT this” or “UCLA that.” The problem they face is twofold. First, they don’t have access to enough real data to see if their theories play out. Second, they don’t have the real-world business experience and access to clients to know which things are actually useful and which are just novelty.

So, our biggest hurdle has been the time and effort invested through empirical testing. It hasn’t always been easy, but it’s put me and my company in an incredibly unique position.

Q2. Size of data, does it really matter? How much data is too little or too much?

THCA: Great question. With text analytics, size really does matter. It’s technically possible to get insights from very small data (for instance, during the elections one of my colleagues did a little analysis of the Romney VS. Obama debate transcripts on our blog). But text analytics really is data mining, and when you’re looking for patterns in text, the more data you have, the more interesting relationships you can find.

KB: Size of data doesn't really matter if you are just getting started. You should get busy with analytics regardless of how little data you have. The important thing is to identify what you need (new skills, technologies, processes, and data-oriented business objectives) in order to take advantage of your digital resources and data streams. As you become increasingly comfortable with those, then you will grow in confidence to step up your game with bigger data sets. If you are already confident and ready-to-go, then go! The big data revolution is like a hyper-speed train -- you cannot wait for it to stop in order to get on board -- it isn't stopping or slowing down! At the other extreme, we do have to wonder if there is such a thing as too much data. The answer to this question is "yes" if we dive into big data's deep waters blindly without the appropriate "swimming instruction" (i.e., without the appropriate skills, technologies, processes, and data-oriented business objectives). However, with the right preparations, we can take advantage of the fact that bigger data collections enable a greater depth of discovery, insight, and data-driven decision support than ever before imagined.

Q3. What is the one thing that motivates and inspires you the most in your Big Data Analytics work?

KB: Discovery! As a scientist, I was born curious. I am motivated and inspired to ask questions, to seek answers, to contemplate what it all means, and then to ask more questions. The rewards from these labors are the discoveries that are made along the way. In data analytics, the discoveries may be represented by a surprising unexpected pattern, trend, association, correlation, event, or outlier in the data set. That discovery then becomes an intellectual challenge (that I love): What does it mean? What new understanding does this discovery reveal about the domain of study (whether it is astrophysics, or retail business, or national security, or healthcare, or climate, or social, or whatever)? The discovery and the corresponding understanding are the benefits of all the hard work of data wrangling.

THCA: Anyone working with analytics has to be curious by nature. Satisfying that curiosity is what drives us. More specifically in my case, if our clients get excited about using our software and the insights they’ve uncovered, then that really gets me and my whole team excited. This can be challenging, and not all data is created equal.

It can be hard to tell someone who is excited about trying Text Analytics that their data really isn’t suitable. The opposite is even more frustrating, though: knowing that a client has some really interesting data but is apprehensive about trying something new, either because they have some old tools lying around that they haven’t used, or because they have a difficult time getting access to the data since it’s technically “owned” by some other department that doesn’t ‘get’ analytics. But helping them build a case and then helping them look good by making data useful to the organization really feeds into that basic curiosity. We often discover problems to solve we had no idea existed. And that’s very inspiring and rewarding.

Q4. Which big data analytics myth would you like to squash right here and now?

KB: Big data is not about data volume! That is the biggest myth and red herring in the business of big data analytics. Some people say that "we have always had big data", referring to the fact that each new generation has more data than the previous generation's tools and technologies are able to handle. By this reasoning, even the ancient Romans had big data, following their first census of the known world. But that's crazy. The truth of big data analytics is that we are now studying, measuring, tracking, and analyzing just about everything through digital signals (whether it is social media, or surveillance, or satellites, or drones, or scientific instruments, or web logs, or machine logs, or whatever). Big data really is "everything, quantified and tracked". This reality is producing enormously huge data volumes, but the real power of big data analytics is in "whole population analysis", signaling a new era in analytics: the "end of demographics", the diminished use of small samples, the "segment of one", and a new era of personalization. We have moved beyond mere descriptive analysis, to predictive, prescriptive, and cognitive analytics.

THCA: Tough one. There are quite a few. I’ll avoid picking on “social media listening” for a bit, and pick something else. One of the myths out there is that you have to be some sort of know-it-all ‘data scientist’ to leverage big data. This is no longer true. Along with this, there is a lot of dropping of buzzwords like “natural language processing” or “machine learning,” which really don’t mean anything at all.

If you understand smaller data analytics, then there really is no reason at all that you shouldn’t understand big data analytics. Don’t ever let someone impress you with a buzzword you’re not sure of. If they can’t explain to you in layman’s terms exactly how a certain piece of software works, how exactly an analysis is done, and what the real business benefit is, then you can be pretty sure they don’t actually have the experience you’re looking for and are trying to hide this fact.

Q5. What’s more important/valuable, structured or unstructured data?

KB: Someone said recently that there is no such thing as unstructured data. Even binary-encoded images or videos are structured. Even free text and sentences (like this one) are structured (through the rules of language and grammar). Even some meaning this sentence has. One could say that analytics is the process of extracting order, meaning, and understanding from data. That process is made easier when the data are organized into databases (tables with rows and columns), but the importance and value of the data are inherently no more or no less for structured or unstructured data. Despite these comments, I should say that the world is increasingly generating and collecting more "unstructured data" (text, voice, video, audio) than "structured data" (data stored in database tables). So, in that sense, "unstructured data" is more important and valuable, simply because it provides a greater signal on the pulse of the world. But I now return to my initial point: to derive the most value from these data sources, they need to be analyzed and mined for the patterns, trends, associations, correlations, events, and outliers that they contain. In performing that analysis, we are converting the inherent knowledge encoded in the data from a "byte format" to a "structured information format". At that point, all data really become structured.

THCA: A trick question. We all begin with a question and relatively unstructured data. The goal of text analytics is to structure data that is often the most unstructured.

That said, based on the data we often look at (voice of customer surveys, call center and email data, various other web based data), I’ve personally seen that the unstructured text data is usually far richer. I say that because we can usually take that unstructured data and accurately predict/calculate any of the available structured data metrics from it. On the other hand, the unstructured data usually contain a lot of additional information not previously available in the structured data. So unlocking this richer unstructured data allows us to understand systems and processes much better than before and allows us to build far more accurate models.

So yes, unstructured/text data is more valuable, sorry.

Q6. What do you think is the biggest difference between big data analysis being done in academia vs in business?

KB: Perhaps the biggest difference is that data analysis in academia is focused on design (research), while business is focused on development (applications). In academia, we are designing (and testing) the optimal algorithm, the most effective technique, the most efficient methodology, and the most novel idea. In business, you might be 100% satisfied to apply all of those academic results to your business objectives, to develop products and services, without trying to come up with a new theory or algorithm. Nevertheless, I am actually seeing more and more convergence (though that might be because I am personally engaged in both places through my academic and consulting activities). I see convergence in the sense that I see businesses who are willing to investigate, design, and test new ideas and approaches (those projects are often led by data scientists), and I see academics who are willing to apply their ideas in the marketplace (as evidenced by the large number of big data analytics startups with university professors in data science leadership positions). The data "scientist" job category should imply that some research, discovery, design, modeling, and hypothesis generation and testing are part of that person's duties and responsibilities. Of course, in business, the data science project must also address a business objective that serves the business needs (revenue, sales, customer engagement, etc.), whereas in academia the objective is often a research paper, or a conference presentation, or an educational experience. Despite those distinctions, data scientists on both sides of the academia-business boundary are now performing similar big data analyses and investigations. Boundary crossing is the new normal, and that's a very good thing.

THCA: I kind of answered that in the first question. I think academics have the freedom and time to pursue a research objective even if it doesn’t have an important real-world outcome. So they can pick something fun that may or may not be very useful, such as: are people happier on Tuesdays or Wednesdays? They’ll often try to solve these stated objectives in some clever ways (hopefully), though there’s a lot of “pop” research going on even in academia these days. They are also often limited in the data available to them, having to work with just a single data set that has somehow become available to them.

So, academia is different in that they raise some interesting fun questions, and sometimes the ideas borne out of their research can be applied to business.

Professional researchers have to prove an ROI in terms of time and money. Of course, we also technically have access to both more time and more money, and also a lot more data. So an academic team of researchers working on text analytics for 2-3 years is not going to be exposed to nearly as much data as a professional team.

That’s also why academic researchers often seem so in love with their models and accuracy. If you only have a single data set to work with, then you split it in half and use one half for validation. In business, on the other hand, if you are working across industries like we do, we certainly may build and validate models for a specific client, but we know that having a model that works across companies or industries is nearly impossible. But when we do find something that works, you can bet it’s going to be more likely to be useful.
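As a rough illustration of the single-data-set validation described in this answer, here is a minimal Python sketch. It assumes a toy data set of (score, label) pairs and a hypothetical cutoff "model". It is not OdinText’s method, only the split-half idea: fit on one half, then measure on the half the model never saw.

```python
import random

def accuracy(cutoff, rows):
    """Share of rows where the prediction 'score >= cutoff' matches the true label."""
    return sum((score >= cutoff) == label for score, label in rows) / len(rows)

def split_half(rows, seed=0):
    """Shuffle a copy of the data and split it into a training half and a holdout half."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def fit_cutoff(train):
    """Hypothetical toy model: choose the score cutoff that maximizes
    accuracy on the training half only."""
    candidates = sorted({score for score, _ in train})
    return max(candidates, key=lambda c: accuracy(c, train))

# Hypothetical data: (text-derived complaint score, did the customer churn?)
rows = [(i / 20, i >= 10) for i in range(20)]
train, holdout = split_half(rows)
cutoff = fit_cutoff(train)
print(f"cutoff={cutoff:.2f}")
print(f"train accuracy={accuracy(cutoff, train):.2f}")
print(f"holdout accuracy={accuracy(cutoff, holdout):.2f}")
```

A good holdout score only shows that the model generalizes within that one data set. It says nothing about how the model would do for other companies or industries, which is the gap this answer points to.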

Big Data, Text Analytics and Privacy

Disruptive Technologists Panel Discusses Big Unstructured Data and Text Mining

Last week Anderson Analytics – OdinText CEO Tom H. C. Anderson participated in a panel on Big Data Analytics at the Disruptive Technologists event in New York.


The experienced panelists discussed the opportunities and challenges surrounding use of big data, including the combination of text analytics with predictive analytics. The panel discussion is now available on YouTube here.

10 Useful Business Analytics Posts


A Recap of Our Client Side Q&A Series

In case you missed it last week, below are links to 10 Q&A blog posts ahead of the Useful Business Analytics Summit in Boston where I’ve been asked to moderate the panel on CRM and Loyalty Analytics.

We received a lot of positive feedback on a few of the posts below, especially the 'Top 10 Big Data Analytics Tips', and want to thank the speakers who participated:



Thomas Speidel – Statistician & Data Scientist, Suncor Energy
Deepak Tiwari – Head of Strategic Analytics and Insights, Google
Sofia Freyder – Director of Product Management/Product Leader, MasterCard
Anthony Palella – VP of Data Analytics, Angie’s List
Farouk Ferchichi – Executive Director, Toyota Financial Services
Jonathan Isernhagen – Director, Marketing Analysis, Travelocity
Larry Shiller – Data Management Consultant, Yale University
Alex Uher – Director, CRM & Analytics, L’Oréal Paris

Here are the Q&A posts in reverse chronological order:

  1. Useful Business Analytics Summit
  2. What We’ve Got All Wrong About Big Data
  3. Analytics – Keeping Up to Date
  4. Top 10 Big Data Analytics Tips
  5. Are All Data Created Equal?
  6. The Text Analytics Opportunity
  7. How to Select Your Analytics Software
  8. Selling Analytics – Top 5 Tips (Client POV)
  9. Five Analytics Business Startup Tips
  10. Talking Analytics to the CEO – Top 5 Tips

@TomHCAnderson @OdinText


[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, developers of patented Next Generation Text Analytics™ software platform OdinText. For more information and to inquire about software licensing visit ODINTEXT INFO REQUEST]

[Above also posted on the Next Gen Market Research blog]

Talking Analytics to the CEO – Top 5 Tips


How to Be an Analytics Star at That Next C-Suite Meeting

Wrapping up my interview series with my client side experts from the Useful Business Analytics Summit, in this 10th and final post I asked for advice on how to communicate analytics insights to the C-Suite. Here are the Top-5 C-Suite Analytics Tips:

  1. Dumb it down, way down! - No, the CEO isn't a five-year-old, but if your spouse falls asleep reading it, your CEO might too.
  2. It should be all about the Benjamins! - you better be talking increased sales, ROI, value, or savings…
  3. Follow the 2:1 Analysis to Communication Ratio – Spend at least half as much time thinking about how to best communicate the answer as you did finding the answer
  4. Be Prepared to Catch a Lot of Javelins – If you enjoy answering hard questions, this is your Olympics event
  5. Make sure it’s topical – Most CEOs don’t have ADD, but you better not be talking about what was important last month!

Q. What ways if any have you found to be most successful in communicating with C-Suite/Upper Management about Data and Analytics issues? [What really gets them to pay attention?]


Alex Uher, L’Oréal: Simplicity & ROI. I almost wanted to stop after those two words. Take the presentation that you think is simplified, and then have your spouse or partner read it over. Do they get it? Are they bored? Do they get the point? If not, it’s way too complex. Your PowerPoint shouldn’t be an aggregation of every fact and figure you have to share; it should be the visual highlights that you want to stand out. You can talk to the rest and follow up offline with supporting data.


Jonathan Isernhagen, Travelocity: Most of my interactions with the C-suite involve javelin-catching of hard questions related to standard reports, and come pre-packaged with all the attention I could hope for. That said, I try to proceed directly from their question, stick to what’s being asked, emphasize insights about the data vs. the data itself and put everything that might be of tangential interest into the appendix. There’s less emphasis on storytelling with a C-suite presentation than there is with a conference presentation. “Never Confuse a Memo With Reality” says to spend one minute thinking about how to communicate the answer for every two minutes you spent discovering the answer. I think that’s a good ratio.


Farouk Ferchichi, Toyota: I believe what works is balancing a highly conceptual/strategic view (discussing the short- and long-term possibilities of data) with the ability to support it with concrete examples they care about, addressing the business issues they are facing at the moment.


Anthony Palella, Angie’s List: Show that you think in terms of a business unit's financial success in order to truly partner with the C-level exec running that business unit.


Sofia Freyder, MasterCard: Give them specific data examples and explain how this data can be used to change the strategy or tactics of the company, with the resulting impact on the bottom line: revenue/profit gains within a specific period of time.


Larry Shiller, Yale: Make a value proposition and give an example of how you can increase sales, savings, profitability, and/or quality of brand/image.


Deepak Tiwari, Google: Show them the impact. That's all.


A big thank you to all our experts for taking part in this interview series:

Thomas Speidel – Statistician & Data Scientist, Suncor Energy
Deepak Tiwari – Head of Strategic Analytics and Insights, Google
Sofia Freyder – Director of Product Management/Product Leader, MasterCard
Anthony Palella – VP of Data Analytics, Angie’s List
Farouk Ferchichi – Executive Director, Toyota Financial Services
Jonathan Isernhagen – Director, Marketing Analysis, Travelocity
Larry Shiller – Data Management Consultant, Yale University
Alex Uher – Director, CRM & Analytics, L’Oréal Paris

I hope you've enjoyed the Q&A series and look forward to seeing some of you in one of my favorite cities, Boston, in June when I’ll be discussing CRM and Loyalty Analytics in far more detail with a few of our experts.

@TomHCAnderson @OdinText



[Above also posted on the Next Gen Market Research blog]

Top 10 Big Data Analytics Tips

As part of the interview series leading up to the Useful Business Analytics Summit, today we post the Top 10 Tips from our analytics experts. Whether you data-mine more structured data or, like me, more often work with unstructured or mixed data using text analytics, I think you’ll agree that the following 10 tips are critical.

  1. Keep It [ridiculously] Simple (10 times more so than is necessary to get your point across).
  2. Hypothesize/Put Problem First
  3. Don’t Assume Data is Good – Check/Validate!
  4. Automate repeat tasks & Carve out time to go exploring
  5. Set a Data Strategy – don’t just collect data for the sake of collecting it
  6. In a rapidly expanding field, work with people on the leading edge
  7. Be a Skeptic about models etc.
  8. Look for the pragmatic and cost effective solutions
  9. Don’t torture Data – in the end it will confess
  10. Think like a Business Owner – what would you like to know?

Below are more detailed tips from some of our client experts. We’d love to hear your tips, so if you’ve got one to add, please share it in the comments section.



Honestly, I think I’d boil it down to a single tip that is more important than all others, in my experience, but is the one most ignored and poorly executed. Keep it simple. Ridiculously simple. Ten times simpler than you think necessary. Just about then, you are actually getting your point across in a way that people are starting to follow. You can always increase the complexity from there, but the first time you realize that you’ve actually conveyed a complex analytical presentation to a group of C-suite execs, you’ll understand what you’ve been doing wrong all along. Hint: those head nods and blank stares aren’t what you are looking for…


- Understand that any problem is easier if you approach it correctly; don't necessarily take a cookie-cutter approach. Conventional wisdom is not so wise in a rapidly evolving field.

- Work with people who are able to work on the leading edge ...the people who are helping expand the envelope.



Automate anything you do more than once. It’s very easy to fill your time with routine pulls of data which lie just beyond the reach of the visualization tools available to business stakeholders. You can’t ignore these requests and it frankly feels great for us geeks to bask in the gratitude of camera-ready cool kids, but these tasks may not represent the highest-value use of your time. The more experience you have with the data, the more likely you are to be the only person with eyes on a particular business problem. So carve out time to go exploring. Think entrepreneurially like a business owner, and ask yourself “if I owned this P&L, what would I want to know?”


- Ensure you understand the purpose, i.e., why analytics is valuable to the organization. The purpose can come from a business sponsor, such as discovering new ways (e.g., products, markets) to increase revenue, retention, or profit, or to control costs. So ask the tough questions and align with executives’ mandates.

- Ensure clarity around how much effort you spend gathering data vs. designing experiments and mining and analyzing data. The need or urge to have data to accomplish a specific task can lead to disparate, disjointed data-gathering and management efforts that take over the work of the data scientist or analytics professional, and analytics can become an afterthought. So be a sponsor or an advocate for a data strategy.


1) Don't assume the data is good. Is the data lineage (with transformation rules) exposed? Is data quality measured and reportable as a trend?

2) Hypothesize and/or uncover non-time-based relationships: These are usually the richest.
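One way to make data quality "measured and reportable as a trend" is to compute a simple metric for each period, such as the share of records that are missing a given field. The sketch below assumes hypothetical `month` and `email` fields and treats `None` or an empty string as missing. Real quality checks would cover many more rules, for example valid ranges, duplicates, and broken lineage.

```python
from collections import defaultdict

def null_rate_trend(records, field, period_key):
    """Return {period: share of records whose `field` is missing}, ordered by period.

    'Missing' here means None or an empty string (an assumption for this sketch)."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        period = rec[period_key]
        totals[period] += 1
        if rec.get(field) in (None, ""):
            missing[period] += 1
    return {p: missing[p] / totals[p] for p in sorted(totals)}

# Hypothetical customer records tagged with the month they were loaded.
records = [
    {"month": "2014-01", "email": "a@example.com"},
    {"month": "2014-01", "email": None},
    {"month": "2014-02", "email": ""},
    {"month": "2014-02", "email": "b@example.com"},
    {"month": "2014-02", "email": "c@example.com"},
    {"month": "2014-02", "email": "d@example.com"},
]
for month, rate in null_rate_trend(records, "email", "month").items():
    print(month, f"{rate:.0%} missing email")
```

Tracking the same metric month over month makes a sudden drop in quality visible before it quietly skews an analysis.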



- Double-check your results using data from different sources
- Make sure it makes sense
- In case of discrepancies, use it directionally
- Reach out to experts to obtain their opinion
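The checks in this list can be sketched as a small reconciliation rule. The sketch assumes each source reports the same metric for two periods as a (before, after) pair. The 2% tolerance and the three labels are illustrative assumptions, not a standard. A "directional" result means the sources agree only on direction, and a "conflicting" result is the cue to reach out to experts.

```python
def pct_change(before, after):
    """Relative change between two periods (assumes a non-zero baseline)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before

def reconcile(source_a, source_b, tolerance=0.02):
    """Compare the same metric from two independent sources.

    Returns 'consistent' if the two relative changes agree within `tolerance`,
    'directional' if they disagree in size but agree in sign,
    and 'conflicting' otherwise."""
    change_a = pct_change(*source_a)
    change_b = pct_change(*source_b)
    if abs(change_a - change_b) <= tolerance:
        return "consistent"
    if change_a * change_b > 0:
        return "directional"
    return "conflicting"

# Hypothetical sales figures from two systems (e.g. point-of-sale vs. CRM).
print(reconcile((1000, 1100), (2000, 2190)))
print(reconcile((1000, 1100), (2000, 2400)))
print(reconcile((1000, 1100), (2000, 1900)))
```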



1. Think of the broader perspective. Take a step back. Understand the business and the problem before jumping into solutions.

2. Be an analyst: adopt a critical approach to all analytical problems. There is nothing wrong with a slight dose of skepticism about models and results. It is healthy.

3. Try to find pragmatic and cost-effective models/solutions. For example, you could probably use machine learning and neural networks to solve a lot of problems, but a linear regression might sometimes be enough.



 1. Be humble: sometimes data tells us nothing or, worse, will lie to us. Cognitive dissonance is the norm rather than the exception.

2. If you torture data it will confess to any sins (attributed to Frank Harrell).

3. Go ahead, ask questions, be curious, don't be afraid to cross cultures.


Big thanks again to our client side analytics experts. Feel free to check out our previous questions on Big Data and How to Keep Up on Analytics. Don’t forget to check back in for our next question about the value of various types of data… Look forward to seeing you at the Summit!




[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, developers of a patented Next Generation approach to text analytics known as OdinText. For more information and to inquire about software licensing visit OdinText INFO Request.]

Forget Big Data, Think Mid Data

Stop Chasing the Big Data; Mid Data Makes More Sense

After attending the American Marketing Association’s first conference on Big Data this week, I’m even more convinced of what I already suspected from speaking to hundreds of Fortune 1000 marketers over the last couple of years. Extremely few are working with anything approaching what would be called “Big Data”, and I believe they don’t need to. But many should start thinking about how to work with Mid Data!


“Big Data”, “Big Data”, “Big Data”. It seems like everyone is talking about it, but I find extremely few researchers are actually doing it. Should they be?

If you’re reading this, chances are that you’re a social scientist or business analyst working in consumer insights or a related area. I think it’s high time that we narrowed the definition of ‘Big Data’ a bit and introduced a new, more meaningful and realistic term, “MID DATA”, to describe what is really the beginning of Big Data.

If we introduce this new term, it only makes sense that we refer to everything that isn’t Big or Mid data as Small Data (I hope no one gets offended).

Small Data

I’ve included a chart, and for simplicity I will think of size here as the number of records, or sample size if you prefer.

‘Small Data’ can include anything from one individual interview in qualitative research to several thousand survey responses in longitudinal studies. At this size, quantitative and qualitative data can technically be lumped together, as neither fits the generally agreed-upon (and admittedly loose) definition of what is currently “Big Data”. You see, rather than being a specific size, the current definition of Big Data varies depending on the capabilities of the organization in question. The general rule is that Big Data is data that cannot be analyzed by commonly used software tools.

As you can imagine, this definition is an IT/hardware vendor’s dream, as it describes a situation where a firm does not have the resources to analyze (supposedly valuable) data without spending more on infrastructure, usually a lot more.

Mid Data

What then is Mid Data? At the beginning of Big Data, some of the same data sets we might call Small Data can quickly turn into Big Data. Take, for instance, the 30,000-50,000 records from a customer satisfaction survey, which can sometimes be analyzed in commonly available analytical software like IBM-SPSS without crashing. However, add text comments to this same data set and performance slows considerably. These same data sets will now often take too long to process or, more typically, crash.

If these text comments are also coded, as is the case in text mining, the variables this adds can increase the size of the data set significantly. This is currently viewed as Big Data, where more powerful software will be needed. However, I believe a more accurate description would be Mid Data, as it is really the beginning of Big Data, and there are many relatively affordable approaches to dealing with data of this size. But more about this in a bit…
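To make that growth concrete, here is a minimal Python sketch. It assumes a simple keyword-based code frame; the codes, keywords, and function names are hypothetical and are not how OdinText or any other product actually codes text. Each code becomes a new 0/1 variable on every record, so a code frame with a few hundred codes adds a few hundred columns to the data set.

```python
import re

# Hypothetical code frame: each code is triggered by any of its keywords.
CODE_FRAME = {
    "price": ["price", "expensive", "cheap", "cost"],
    "staff": ["staff", "employee", "rude", "helpful"],
    "wait_time": ["wait", "slow", "queue", "line"],
}

def code_comment(comment):
    """Return a 0/1 indicator for every code in the frame."""
    tokens = set(re.findall(r"[a-z']+", comment.lower()))
    return {code: int(bool(tokens & set(words))) for code, words in CODE_FRAME.items()}

def add_text_codes(records):
    """Append one indicator column per code to each survey record."""
    coded = []
    for record in records:
        row = dict(record)  # copy so the original record is left untouched
        row.update(code_comment(record["comment"]))
        coded.append(row)
    return coded

survey = [
    {"id": 1, "satisfaction": 9, "comment": "Helpful staff, no wait at all"},
    {"id": 2, "satisfaction": 3, "comment": "Too expensive and the line was slow"},
]
coded = add_text_codes(survey)
print(len(survey[0]), "->", len(coded[0]), "columns per record")
```

Here a three-code frame turns 3 columns into 6; the same logic applied to tens of thousands of records and a realistic code frame is the kind of growth described above.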

Big Data

Now that we’ve taken a chunk out of Big Data and called it Mid Data, let’s redefine Big Data, or at least agree on where Mid Data ends and ‘Really Big Data’ begins.

To understand the differences between Mid Data and Big Data we need to consider a few dimensions. Gartner analyst Doug Laney famously described Big Data as three-dimensional, that is, having increasing volume, variety, and velocity (now commonly referred to as the 3V model).

To understand the difference between Mid Data and Big Data, though, only two variables need to be considered: Cost and Value. Cost (whether in time or dollars) and expected value are, of course, what make up ROI. This could also be referred to as the practicality of Big Data analytics.

While we often know that some data is inherently more valuable than other data (100 customer complaints emailed to your office should be more relevant than 1,000 random tweets about your category), one thing is certain: data that is not analyzed has absolutely no value.

Unlike Mid Data, Really Big Data, at the far right of the Big Data range, lies beyond the point where an investment in analysis makes sense because of its cost (which includes the risk of not finding insights worth more than the dollars invested in the Big Data). Somewhere after Mid Data, big data analytics becomes impractical, both theoretically and for your firm in very real economic terms.
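The cost/value trade-off can be written as a simple ROI calculation. The sketch below assumes ROI = (expected value - cost) / cost, with the risk of finding nothing folded into expected value as a probability. The function names and all dollar figures are hypothetical, chosen only to contrast a Mid Data project with positive ROI and a Really Big Data project with negative ROI.

```python
def expected_value(p_insight, value_if_found):
    """Expected payoff of an analysis, discounted by the chance of finding nothing useful."""
    return p_insight * value_if_found

def analysis_roi(expected, cost):
    """ROI = (expected value - cost) / cost; cost should include analyst time in dollars."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (expected - cost) / cost

# Hypothetical figures for illustration only.
mid_data_roi = analysis_roi(expected_value(0.5, 200_000), cost=50_000)
really_big_roi = analysis_roi(expected_value(0.5, 1_200_000), cost=1_000_000)

print(f"Mid Data ROI: {mid_data_roi:.0%}")           # positive: worth doing
print(f"Really Big Data ROI: {really_big_roi:.0%}")  # negative: past the break-even point
```

The larger project has the bigger potential payoff, but once its cost outruns its expected value the ROI turns negative, which is the point at which Really Big Data stops making economic sense.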

Mid Data, on the other hand, can be viewed as the sweet spot of Big Data analysis: what is currently possible, worthwhile, and within budget.

So What?

Mid Data is where many of us in market research have a great opportunity. It is where very real and attainable insight gains await.

Really Big Data, on the other hand, may be well past a point of diminishing returns.

On a recent business trip to Germany I had the pleasure of meeting a scientist working on a real Big Data project, the famous Large Hadron Collider at CERN. Unlike CERN, consumer goods firms will not fund the software and hardware needed to analyze this level of Big Data. Data magnitudes common at the Collider (the output of 150 million sensors delivering data 40 million times per second) are not economically feasible for them, nor are they needed. In fact, even scientists at CERN do not analyze this amount of data. Instead, they filter out 99.999% of collisions, focusing on just 100 “collisions of interest” per second.
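As a quick check of the figures above, the Python arithmetic below takes the numbers in this paragraph at face value (roughly 40 million readouts per second reduced to about 100 collisions of interest per second). It is a back-of-the-envelope illustration, not CERN's actual filtering logic.

```python
# Figures as quoted in the paragraph above, treated as approximate.
readouts_per_second = 40_000_000
kept_per_second = 100

kept_fraction = kept_per_second / readouts_per_second
filtered_pct = 100 * (1 - kept_fraction)

print(f"Kept: {kept_fraction:.5%} of readouts")
print(f"Filtered out: {filtered_pct:.5f}%")
```

On these numbers roughly 99.99975% of the stream is discarded, in line with (slightly above) the 99.999% figure.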

The good news for those of us in business is that, if we’re honest, customers really aren’t that difficult to understand. There are now many affordable and excellent Mid Data software tools available, for both data and text mining, that do not require exabytes of data or massively parallel software running on thousands of servers. While magazines and conference presenters like to reference Amazon, Google and Facebook, even these somewhat rare examples sound more like IT sales science fiction, and they do not mention the sampling of data that occurs even at these companies.

As scientists at CERN have already discovered, it’s more important to properly analyze the fraction of the data that is important (“of interest”) than to process all the data.

At this point some of you may be wondering: if Mid Data is more attractive than Big Data, then isn’t Small Data even better?

The difference, of course, is that as data increases in size, we can not only be more confident in the results but also find relationships and patterns that would not have surfaced in traditional Small Data. In marketing research this may mean discovering a new niche product opportunity or quickly countering a competitor’s move. In pharma, it may mean discovering a link between a small population subgroup and a high risk of certain cancers, thus saving lives!
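The confidence point can be made concrete with the standard normal-approximation margin of error for a proportion, z * sqrt(p(1 - p) / n). The Python sketch below assumes a hypothetical niche subgroup making up 2% of respondents and compares a 1,000-record study with a 50,000-record one; the 2% share and the sample sizes are illustrative, not taken from any study.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

subgroup_share = 0.02  # hypothetical niche segment: 2% of respondents

for total_n in (1_000, 50_000):
    sub_n = round(total_n * subgroup_share)
    moe = margin_of_error(0.5, sub_n)  # p = 0.5 gives the widest (most conservative) interval
    print(f"total n={total_n:>6}: subgroup n={sub_n:>5}, margin of error ±{moe:.1%}")
```

With 1,000 records the subgroup has only 20 respondents and a margin of error of roughly ±22 points; at 50,000 records it has 1,000 respondents and roughly ±3 points.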

Mid Data could benefit from further definition and best practices. Ironically, some C-suite executives are currently asking their IT people to “connect and analyze all our data” (specifically the “varied” data in the 3V model), and in the process they are attempting to create Really Big (often bigger than necessary) data sets out of several Mid Data sets. This practice exemplifies the ROI problem I mentioned earlier: chasing after a Big Data holy grail will not guarantee any significant advantage. Those of us who are skilled in the analysis of Small or Mid Data clearly understand that conducting the same analysis across varied data is typically fruitless.

It makes as much sense to compare apples to cows as to compare accounting data to consumer respondent data. Comparing your customers in Japan to your customers in the US makes no sense for various reasons ranging from cultural differences to differences in very real tactical and operational options.

No, for most of us, Mid Data is where we need to be.


[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, which develops and sells the patent-pending data mining and text analytics software platform OdinText]


Text Analytics at the AMA Science Fair

Big Data Experts on Hand for Q&A

This is a bit last minute, I know, but I wanted to let Next Gen researchers on the West Coast know about the American Marketing Association’s event next week in San Diego. It’s called Analytics with Purpose: The Human Edge of Big Data, and what’s unique about this event is its ‘Science Fair’ component: one leader from each area of Big Data analytics has been selected as a subject matter expert and will answer any questions that come up among attendees during the event.

I was honored to be asked to participate as the text analytics expert, and I really look forward to being out on the West Coast, especially as, until very recently, all the text analytics-related events have taken place here on the East Coast (Boston or NYC).

I understand the event may already be sold out, but if you’re on the West Coast and plan to be there, please feel free to ask me anything. I’ll do my best to answer any question as honestly as I can. I’m obviously a bit biased in favor of OdinText, but that said, I quite often refer clients to other vendors if I see there isn’t a good match.

Hope to see you Monday.

@TomHCAnderson @OdinText

PS. If you can’t be at the event and have a question about text mining, you can contact me directly. Always happy to help if I can.

ROI of Big Data and Text Analytics

In Depth: Making text - and big data - analytics a default business solution

[Republished from Text Analytics News] 

We recently caught up with two text analytics experts ahead of our Text Analytics Summit in San Francisco to get their thoughts on the growing commercial importance of text and big data analytics, and where the next wave of gains will be realized.

Our contributors:

Ashok Srivastava, Principal Scientist, Data Mining and Systems Health Management, NASA

Tom H. C. Anderson, Founder and Managing Partner, Anderson Analytics OdinText


What needs to happen in the market, in the process, and in the translation of results in order for business (end users) to trust and depend on text analytics as a part of everyday business practice and budget allocation?


Ashok: My first reaction is that historically speaking, for the last 100 years or so, people have been thinking about how to use numeric data to solve problems, and statistics was built on this: averages and standard deviations of data. This is how people SHOULD think. But 1/3 of all business leaders make decisions without numeric data. Text is a step beyond this, because a lot of the mathematical machinery is hard to apply to text due to semantics. And there are many types of data that are easy for the human mind to process but not the machine mind. So we need improvements in semantic technologies that can help mark and tag text and see the underlying meaning in large text corpora. There are large companies that do that, but from what I’ve seen, they don’t lend themselves to easy analysis. You can tag your text and get stuff out of it, but it’s still up to the human to make the connections. So, for example, if we start developing techniques for the overlap, that would be of great help. You can imagine this doing a very large analysis, automatically discovering topics and looking for trends.

Tom: Well, I think about all the effort and cost that is currently being devoted to collecting and storing this kind of data. Currently there is a lot of buzz around “social media monitoring”, but most companies have a large amount of text coming in from people they know are their real customers.

So I would throw the question back and ask: what is the ROI of collecting and storing combinations of structured and unstructured information from sources such as customer call centers (call logs and email complaints and suggestions), customer satisfaction and brand tracking surveys, etc., if you’re not going to bother leveraging that information for insights?

The good news related to text analytics is that as long as your competitors aren’t doing it yet, the information advantage which can be gained through leveraging text analytics is even greater.

Our challenge as text analytics software developers has been to make the software as powerful and easy to use as possible. However, software is just half the battle. Equally important is allocating analysts’ time to think about how best to leverage the data and tools.

When the first calculator came out, no one expected it to create presentations ready for the C-suite without an investment in an analyst. A proper calculation of the return on investment in text analytics needs to include a human time investment factor as well.


How far away are we from these technologies reaching that level of reliability?


Ashok: We’re getting closer. We know reasoning technologies are still a ways out. IBM’s work in the way of automated query and analysis is notable. But as far as semantic reasoning and understanding go, we are still a few years out. Maybe 5-7 years.

Tom: Depending on whom you talk to, we are already there. Comparing human analysis of unstructured data to text analytics software in isolation is like comparing apples to cows. Because output from text analytics software is in fact 100% consistent, it is far more reliable than human analysis, and because of this it lends itself to statistical/mathematical analysis in a way that human text coding does not. Of course, as I mentioned earlier, a proper text analytics effort requires machine and human to work in concert.


How do we need to think differently – to ask questions differently – to encourage development towards greater ROI?


Ashok: You need a specific business problem in mind. So let’s call it a vertical application that targets the specific needs of the business community. If we were to digress, look at Hadoop. There is a real business problem that it solves, and that’s why it is seeing so much traction. And the distribution is open source.

Tom: Approach research involving text data the same way that you approach analysis of structured data. Too often we do not think sufficiently about the data we have available and about who would benefit from analysis of that data. Ask internal clients: What are you struggling with? What assumptions are you making? Does it make sense to leverage text analytics to explore, quantify, prove and model these assumptions? Often the answer is yes; sometimes it is no. It depends on both the objective and the data in question.


Within the cycle of, say, 5-7 years, what will the market be clamoring for; what demand will developers and engineers be striving to meet?


Ashok: My reaction is that it depends on the way problems are (being) solved. If they are solved in a vacuum, and we as technologists have solved (problems) without a larger business problem in mind, we will still not see a large degree of adoption. My view is that we start with a specific problem, and then we build tools and technology to solve that specific problem. I can understand hesitancy toward technologies at large.

 Tom: It’s hard for me to imagine all the ways text analytics can be used; I just know there will be many. My company’s expertise has been in the field of consumer insights/marketing research, therefore our software was designed for specific types of text data with specific types of analysis in mind.

To think that text analytics could or should only be used one way or that there will only be one dominant solution out there is sort of crazy to me. Almost like saying that math could only be applied one way to one profession. What is clear to me is that there will be many different kinds of implementation for very different purposes.


And if you look forward over the next 3 years, what progress would you like to see made to bring data and text analysis closer to being a household name?


Ashok: I’d like to see applications come out that allow for rapid ingestion of text and numeric data simultaneously, so that we can look at the combination of text and numeric data with ease, and also algorithms that can bring these two sources together easily. I think it could happen in 18 months for a specific vertical. It’s all just data in the end. And we need to give people the ability to analyze as fast as possible, with data stored in ways that automation can access and make decisions on in real time.

Tom: It could happen overnight. Again, if we are talking about a household name, I’m assuming you mean something that could be used or sold B2C rather than B2B. I think it will happen on the web first, for the obvious reason that this is where most people create and use the most data. Web search like Google, the various PC applications we use, email, and social media could analyze what we discuss, compare us to our peers based on this data, and suggest what we might be interested in. These are obvious ways in which I believe this could happen rather quickly.
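Ashok’s point about looking at text and numeric data together can be illustrated with a very small Python sketch: each hypothetical record carries a numeric satisfaction score plus codes extracted from its comment, and the function averages the score for each code. The records, codes, and function name are made up for illustration; this is not a description of either contributor’s software.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical coded records: a numeric rating plus codes extracted from a comment.
records = [
    {"score": 9, "codes": ["staff"]},
    {"score": 3, "codes": ["price", "wait_time"]},
    {"score": 4, "codes": ["wait_time"]},
    {"score": 8, "codes": ["staff", "price"]},
]

def mean_score_by_code(rows):
    """Average the numeric score across all records that mention each text code."""
    scores = defaultdict(list)
    for row in rows:
        for code in row["codes"]:
            scores[code].append(row["score"])
    return {code: mean(values) for code, values in scores.items()}

print(mean_score_by_code(records))
```

A record can carry several codes, so it contributes its score to each theme it mentions; here comments about wait time sit well below comments about staff.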


And what if we’re looking at a specific vertical?


Ashok: If we successfully address specific vertical problems, we can generate results that have repercussions beyond those problems, but we have to show an appreciable ROI on a real-life business problem (first). And in this way I differ from a lot of technologists.

Tom: As I noted earlier, I think this is the key, and it’s what I’ve been saying for years. The best way to achieve significant gains in the ROI of text analytics at this point is by incorporating industry/vertical/domain expertise. Text analytics is absolutely not something that will be owned by any one company or domain. Just like math, there are simply too many opportunities and ways it should be combined with the expertise in each vertical.

So yes, expect to see many different things in many different verticals.