Seven Text Analytics Myths Exposed at IIEX

Tom H. C. Anderson
June 15th, 2017

What I Learned from Attendees in IIEX Text Analytics Sessions

This week I had the opportunity to attend and to present at the Insights Innovation Exchange (IIEX) in Atlanta. This conference always provides a wonderful chance to connect with a lot of smart, forward-looking researchers.

For those who missed IIEX or weren’t able to attend my presentation, I provided a case study outlining how we conducted a massive international study in ten countries and eight languages for almost no cost, with results analyzed in just two hours. If you’d like to know more, feel free to contact us for a free e-book detailing the project.

My presentation aside, what I’d like to cover here today actually came out of the Text Analytics Information Sessions we were asked to host on Monday, and which I’m pleased to report were well attended—notably by representatives from more than a few major supplier and client brands.


I had originally anticipated more group conversation and peer-to-peer sharing, but it turned out that most attendees were less interested in talking than in learning, so the sessions involved quite a bit of Q&A, with my colleague Tim Lynch and me fielding more general questions about text analytics than expected.

What I took from these sessions was a sense that a lot of confusion and misperception around text analytics persists among researchers today and that the industry is urgently in need of more educational resources on the topic (more on this at the end of the blog).

I’ve cherry-picked for you here today the most common misconceptions revealed in these sessions. Hopefully, this will help dispel some persistent myths that do anyone interested in text analytics a huge disservice…

MYTH 1: Text analytics is synonymous with social media monitoring

As I feared, a common misconception about text analytics is that its primary application—and pretty much the extent of its practical utility—is for analyzing social media data. Nothing could be further from the truth!

While social media monitoring firms have done a great job marketing themselves, social media is just ONE SMALL SUBSET of the data text analytics can be applied to. Moreover, while everyone seems fixated on social media analysis, in my honest opinion social media monitoring is NOT where the greatest opportunity lies for using text analytics in market research.

And a word of caution: yes, text analytics platforms can easily handle social media data, but the same cannot be said about social media monitoring tools, so be careful not to limit yourself.

MYTH 2: Text analytics is perfect for analyzing qualitative transcripts

I cannot tell you how often I’ve been approached by researchers who want to use text analytics software to analyze focus group transcripts. My first response is always: why would you want to do that?

Just because focus group data contains a lot of text doesn’t mean you should run it through a text analytics platform, unless you have very large qualitative communities or run the same exact group 10 times within a category.

Bear in mind, text analytics can be applied quite effectively to small samples (I actually didn’t think so until I learned otherwise from a client), but using small sample IDIs or focus groups doesn’t typically make a lot of sense because text analytics is all about pattern identification.

If you talk to just 15 physicians, for example, you’ll still need to read each of their comments. Text analysis may add value, but it usually isn’t worthwhile UNLESS you have a large enough sample to mine for patterns and/or the data is extremely important and valuable (e.g., these are the top 15 MD-PhDs in their field working on a life-saving cure).

MYTH 3: Sentiment is REALLY important and useful

Sentiment has been COMPLETELY hyped. In the majority of our text analytics projects, sentiment isn’t even a factor. In fact, some firms purporting to offer “text analytics” only offer sentiment analysis. This is unbelievable to me. Having worked with text analytics for the past 15 years, I don’t understand why anyone would approach data that simplistically. There are so many other, potentially more useful and valuable ways to look at the data.

When thinking about text analytics, relevant feature/topic extraction is most important. Just as important is how the results can be turned into actionable advice or a recommended course of action. If you analyze data and come back to management with something as simplistic as “this is what makes people angry” (or happy), chances are you’ll soon be replaced by someone who can tell management how to increase return behavior and revenue.
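To make “feature/topic extraction” concrete, here is a deliberately naive sketch in Python. The comments, the stopword list, and the simple word-counting approach are all invented for illustration; real platforms are far more sophisticated, but the basic idea of surfacing recurring topics from open text is the same:

```python
from collections import Counter
import re

# Toy customer comments (invented for illustration)
comments = [
    "The checkout process was slow and the app crashed twice",
    "Love the new app design, but checkout is still slow",
    "Fast shipping, great support team",
    "Support team resolved my crash issue quickly",
]

# A tiny, hand-picked stopword list for this example
STOPWORDS = {"the", "and", "was", "is", "but", "a", "my", "still", "new"}

def extract_topics(texts, top_n=5):
    """Count content words across all comments as a crude proxy for topics."""
    words = []
    for text in texts:
        words += [w for w in re.findall(r"[a-z]+", text.lower())
                  if w not in STOPWORDS]
    return Counter(words).most_common(top_n)

print(extract_topics(comments))
# Recurring words like "checkout", "slow", "app", "support", "team"
# rise to the top -- candidate topics worth investigating.
```

Even this toy version hints at the real work: the analyst still has to decide which extracted topics matter and what action they imply, which is exactly the part that can’t be skipped.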

MYTH 4: Look for AI and Machine Learning

I’ve blogged about this before, and it still drives me nuts!

Everyone seems hung up on this year’s buzzwords, “artificial intelligence” (AI) and “machine learning,” and just about every vendor is touting them, whatever the solution they’re selling. For your purposes, I’m telling you they are meaningless.

This is not to say that AI and machine learning are not important—in fact, they’re integral components to the OdinText platform—but they’re terms that are misused, abused, and thrown about cavalierly without any explanation as to how or why they matter. If someone tells you their tool uses AI or machine learning, ask them what they mean by “AI” specifically and to explain precisely how that enables their tool to deliver differentiated results. I’ll wager you’ll walk away from that conversation without any better understanding of why AI is a feature they’re touting than you did before the conversation began. (For more information on this topic, again, read this post.)

Beware, also, of other technical-sounding terms (including sentiment, mentioned above) that frequently crop up around text analytics: NLP (natural language processing), ontologies, taxonomies, support vector machines… I could go on.

If a sales person is throwing jargon like this at you, chances are they are using it to conceal their own lack of knowledge about text analytics.

Conversations should instead focus on: How do I quickly identify the most important topics/ideas mentioned by my customers? How do I know they are important? How do they affect my KPIs? Show me with my data how I can quickly do these things.

MYTH 5: All text analytics are basically the same

Text analytics is not a commodity or a standardized product. Unlike the deliverables from panel companies or survey vendors, text analytics solutions vary widely, ranging from more linguistically based approaches to more mathematical/statistical ones.

Beyond this, though, practical experience in the given field of application also comes into play. What experience do the developers have in answering problems in your specific field? This will impact underlying thinking as well as user interface considerations.

DO NOT assume that just because a feature is listed on one company’s sell sheet (see buzzwords above, for example), it is a must-have or even a good-to-have, or that you should look for it across vendors.

Again, always fall back to your own data. How does this software tell me how customer group A is different than Group B? How will I know the impact of topics X, Y and Z on sales? These are the questions to ask.
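As a toy illustration of the kind of group comparison worth asking a vendor to demonstrate with your own data, here is a minimal Python sketch. All figures are hypothetical, and a real tool would add statistical testing and tie the topics back to KPIs:

```python
# Hypothetical per-group topic-mention counts (all numbers invented)
group_a = {"n": 200, "mentions_price": 90, "mentions_support": 20}
group_b = {"n": 250, "mentions_price": 50, "mentions_support": 95}

def mention_rate(group, topic):
    """Share of a group's comments that mention the given topic."""
    return group[f"mentions_{topic}"] / group["n"]

for topic in ("price", "support"):
    a = mention_rate(group_a, topic)
    b = mention_rate(group_b, topic)
    print(f"{topic}: Group A {a:.0%} vs Group B {b:.0%} "
          f"(difference {a - b:+.0%})")
```

A difference like “Group A mentions price at more than twice Group B’s rate” is the sort of finding that can drive a decision, which is precisely what sentiment scores alone rarely deliver.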

MYTH 6: Text analytics is as easy as just pressing a button and may be totally automated

I’m sorry, but again, no.

On the one hand, there are extremely involved and expensive human-powered, “mechanical turk”-style solutions you can purchase. Typically, one of these will require a few months to build a static dictionary for, say, your customer satisfaction data set, which is then dashboarded. You can easily expect to pay mid-six figures for something like this, and it won’t allow you to do any ad hoc analysis.

The other option is a pure AI/Machine learning solution like IBM’s Watson. It’s fast and cheap because it’s not valuable. (If it were, then IBM could charge a lot more for it.) Look for their case studies and actual customers who have been happy with their solutions. You won’t find many, if any.

Included in the same category as IBM Watson are Microsoft Azure, Amazon AWS and Google NLP tools, as well as vendors that do other things (surveys etc.); plug into one of these and they’ll claim they have “text analytics.” But these tools will not get management what it needs to make intelligent decisions.

The optimal solution is somewhere in between, where machine and human meet in the most effective and intuitive manner. This will mean high-value analysis. What you get back in terms of value of insights depends on the quality of data and the analytical thinking brought to bear by the analyst—just like on any quantitative data project!

MYTH 7: There are lots of great resources for learning about text analytics

Sadly, the net takeaway from Monday’s IIEX sessions was that we still don’t have ANY solid educational or training resources devoted to text analytics in this industry. NONE!!!

MR trade orgs don’t offer any; the top masters and MBA programs in research don’t offer much; Burke Institute (whose training I love, by the way) doesn’t offer any…

There aren’t any good books on the subject, either; they’re either way too academic and 10+ years behind, or sales tools in disguise, or just a chapter in a book written by a research generalist who doesn’t specialize in text analytics.

We need educational and training resources rather desperately, it seems.

I plan on continuing to do my part by lecturing on the subject at a few MBA classes each year. I’ve also offered to work with the Burke Institute and the University of Georgia’s Terry College of Business Master of Marketing Research program on developing resources.

BUT in the meantime, if you have any questions about text analytics, generally, and totally apart from OdinText, please consider me a resource. Feel free to ping me on LinkedIn or via the info request button here.

I hope this was helpful. Thanks for reading and I welcome your comments!


About Tom H. C. Anderson

Tom H. C. Anderson is the founder and managing partner of OdinText, a venture-backed firm based in Stamford, CT whose eponymous, patented SaaS platform is used by Fortune 500 companies like Disney, Coca-Cola and Shell Oil to mine insights from complex, unstructured and mixed data. A recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research, Anderson is the recipient of numerous awards for innovation from industry associations such as CASRO, ESOMAR and the ARF. He was named one of the “Four under 40” market research leaders by the American Marketing Association in 2010. He tweets under the handle @tomhcanderson.

2 thoughts on “Seven Text Analytics Myths Exposed at IIEX”

  1. Hi Tom – agree with a lot of this. I do think that “AI”/NLP etc. should not be downplayed though. You’re right to counsel users to ask more deeply about what they mean by “AI,” since the term is bandied about so loosely. As it applies to text analytics, though, what we’re generally talking about is machine learning: supervised, unsupervised or (in our case) semi-supervised, which keeps “humans in the loop.” Language is an intrinsically human trait and it’s impossible to divorce humans entirely from the process. One other point that I strongly advocate to those looking for solutions: when weeding through accuracy claims, demand an F1 score. The F1 score, as you know, combines precision (how well the algorithm matches human performance; we advocate three independent humans, because individual humans only agree with each other 65–80% of the time on average) with recall (how many “signals” the algorithm actually gets out of the data). The more signals the better, because sentence- or document-level analysis is often too coarse for any serious insight work. It’s easy to have high precision with low recall. If users demanded (and companies produced) F1 scores for their classifiers, it would demystify some of the questionable claims and give users assurance about the performance/accuracy of their data, whether for reporting or modeling. Thanks for writing this. Clearly a lot more education is needed.

  2. Thanks Rob. Definitely AI is very useful, interesting and important. But the terms you just mentioned, supervised, unsupervised or semi-supervised, also mean relatively little without specificity, which is why I wrote a blog post on AI a few months back.

    There certainly is a lot of interest in the topic, though, and I was recently asked to take part in an interview on the subject, so I plan on following up here soon with that interview once it’s been published. Hopefully it will be a bit more illuminating.
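For readers unfamiliar with the F1 score discussed in the comments above: it is simply the harmonic mean of precision and recall, and it is easy to compute yourself. A minimal sketch with invented example numbers:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; it penalizes imbalance,
    so high precision with low recall still scores poorly."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# High precision but low recall yields a mediocre F1 (the commenter's point):
print(round(f1_score(0.95, 0.30), 3))  # → 0.456
# A more balanced classifier scores better even with lower precision:
print(round(f1_score(0.80, 0.75), 3))  # → 0.774
```

This is why a vendor quoting only “accuracy” or only precision tells you half the story at best.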
