Text Analytics, The Difficult Future You Can’t Avoid

Tom H. C. Anderson
May 10th, 2012

Looking at Text Analytics by Area of Expertise
(Thoughts from the Sentiment Analysis Symposium 2012)

“Sentiment Analysis is a terribly difficult problem. The problem is in defining the problem – the input is the problem – but you can’t avoid it, IT IS the future, and I’m very optimistic about it!”

Professor Bing Liu opened the Sentiment Analysis Symposium yesterday with the statement above, and I couldn’t agree more. He then gave the pre-conference workshop audience a detailed 3.5-hour overview of the state of text analytics. It did not surprise me that almost a quarter of the audience were young developers (MacBook Pro in hand) hoping to learn how to build their own sentiment analysis engines into their business applications.

What did surprise me, though, given that text analytics is so clearly “the future”, was that only three ‘traditional’ marketing research firms were represented at the conference: Toluna, Survey Analytics, and my own firm, Anderson Analytics (OdinText).

To be honest, I attend and speak at different conferences for different reasons. University events can be a fun way to give back to the next generation of researchers. Marketing research industry events are an enjoyable and effective way to meet and network with colleagues and potential clients. I attend text analytics events such as the Sentiment Analysis Symposium to keep an eye on the potential competition.

In a growing and competitive field like text analytics, I don’t expect to learn much about what other vendors are doing. In fact, I would be very disappointed to learn that someone had gotten further than OdinText in our specific niche (market research). Wise suppliers are careful about what they choose to share, especially if they do not yet have sufficient patents in place. Still, it’s always surprising to see what some are willing to divulge.

Of course, one thing most text analytics vendors are willing to share is case studies. However, getting client permission is often difficult, even for those of us who work in the private sector (never mind those doing primarily military or defense work). That is why academics like Bing are such a valuable resource: they often have relatively broad and deep experience and are more willing to share it.

Don’t get me wrong, there are certain non-critical yet important things suppliers will share with each other. Best practices on whether to use machine translation before analysis, for instance, were the subject of one of several interesting presentations earlier today. Conference chair Seth Grimes usually does a great job vetting the speakers, which makes SAS12 one of the better conferences on text analytics.

So, what was my overall takeaway from this year’s event?
I think it’s becoming increasingly clear to everyone that text analytics benefits from some level of customization and expertise along three dimensions: the domain, the data source, and the objective of the research.

  • Domain Expertise – Using the same approach to sentiment across brands or industries, while possible, certainly isn’t as good as customizing it. But customization can be expensive and time-consuming.
  • Data Source Expertise – An open-ended survey response in marketing research is very direct, and, believe it or not, so is a tweet (because of its concise 140-character limit). Blogs or news articles, on the other hand, can be very indirect and require a very different approach.
  • Objective Expertise – An approach designed to gain actionable consumer insights is very different from one designed to detect fraud or terrorism, or to pick the best resumes out of a stack.

So do you need to find a text analytics software firm that specializes only in your industry, your specific data source, and your specific use case?

Not necessarily. Some approaches carry over well across similar data sources (such as the survey and Twitter data I mentioned earlier), and many industries can benefit from the same, or only slightly customized, sentiment algorithms. However, be wary of any firm that claims to be able to handle any kind of data, for any industry, for any use case. No one tool is a panacea. Luckily, there are likely to be many smart folks who have been working on your specific use case for some time.

So while I was surprised that no Honomichl Top 5 firms were represented at this year’s event, I can’t say I’m unhappy about it. There’s enough competition in the field, and I’m more than happy to license OdinText to these firms 😉