Posts tagged mixed data
IIEX 2016 Competition Showcases Innovation in Market Research

Artificial Intelligence, Mixed Data Analytics and Passive Listening Capture Minds - 2016 Insight Innovation Exchange

I’m just back from the IIEX conference in Atlanta, where OdinText competed in the Insight Innovation Competition. Although I was disappointed that we didn’t win, I’m pleased to report that the judges told me we placed a very close second.

IIeX 2016

Attending conferences like this affords me the opportunity to get a pulse on the industry, and I was struck by the fact that text analytics are no longer viewed as a shiny new toy in market research. In fact, as someone who has been working in the natural language processing field for so long, it’s actually somewhat remarkable to see how perceptions of text analytics have matured over just the last year. Text analytics have become a must-have, and the market has a new wave of competition as a result, which I think is further evidence of a healthy market.

Since OdinText goes beyond just text data and incorporates mixed data—text and quantitative—in our competition pitch we highlighted OdinText’s ability to essentially enable market researchers to do data science.

I strongly believe making data science more accessible is a huge opportunity that OdinText is uniquely positioned to seize, and it’s an area where market researchers can step up to meet a desperate need, as we currently have a shortage of about 200,000 data scientists in the US alone.

(Check out this 5-minute video of my IIEX competition pitch and let me know what YOU think!)

Download PDF

You are also most welcome to download a PDF of the PPT presentation >>>

“Machine learning” appears to be the new buzz phrase in research circles, and at IIEX I was hard pressed to find a single vendor not claiming to use machine learning in some respect, no matter where on the service chain they fit. Honestly, though, I got the sense that many use the term without entirely understanding what it means.

We continue to leverage machine learning where it makes sense at OdinText, and there are a few other vendors out there who also clearly have an excellent grasp of the technique.

One such company, which in fact took first place in the competition, was Remesh. They’re using machine learning in a novel way, by automating the role of an online moderator in a manner almost akin to a chat bot. They’ve positioned this as AI, and replacing humans completely with a computer is a holy grail for almost any industry.

I’m optimistic on AI in my field of data and text mining as well, but we’re still a ways off in terms of taking the human out of the mix, and so our goal at OdinText is to use the human as efficiently as possible.

While totally automating what a data scientist does is appealing, in the short term we’re happy with being able to allow a market researcher to do in a few hours what would take a typical data scientist with skills in advanced statistics, NLP, Python, R and C++ days or weeks to do.

Still I admit the prospect of AI replacing researchers completely is an interesting one—albeit not necessarily a popular one among the people who would be replaced—and it’s an area that I’m certainly thinking about.

Third place in the competition, I understand, went to Beatgrid Media, which uses smartphones (while using almost no battery life) to passively listen to audio streams from radio and TV, then overlays geodemographics on these panelists’ data to better predict advertising reach and efficacy. This is going to be a very hard field for a start-up to break into, as there are many big players in the space who want to own their own measurement. That may have been one of the reasons Beatgrid had trouble taking more than third, even though they admittedly have some very interesting technology that could perhaps also be applied in other ways.

Let me know what you think!

(And if you’re interested in a demo of OdinText, please contact us here!)

Tom H.C. Anderson | @TomHCanderson | @OdinText

To learn more about how OdinText can help you understand what really matters to your customers and predict actual behavior,  please contact us or request a Free Demo here >

[NOTE: Tom H. C. Anderson is Founder of Next Generation Text Analytics software firm OdinText Inc. Click here for more Text Analytics Tips]

Preventing Customer Churn with Text Analytics

3 Ways You Can Improve Your Lost Customer Analysis

Lapsed Customers, Customer Churn, Customer Attrition, Customer Defection, Lost Customers, Non-Renewals: whatever you call them, this kind of customer research is becoming more relevant everywhere, and we are seeing more and more companies turning to text analytics to better answer how to retain more customers longer. Why are they turning to text analytics? Because no structured survey data predicts customer behavior as well as actual voice-of-customer text comments!

Today’s post will highlight 3 mistakes we often see being made in this kind of research.

1. Most Customer Loss/Churn Analysis is done on the customers who leave, in isolation from customers who stay. Understandable since it would make little sense to ask a customer who is still with you a survey question such as “Why have you stopped buying from us?”. But customer churn analysis can be so much more powerful if you are able to compare customers who are still with you to those who have left. There are a couple of ways to do this:

  • Whether or not you conduct a separate lapsed customer survey among those who are no longer purchasing, also consider doing a separate post-hoc analysis of your customer satisfaction survey data. It doesn’t have to be current. Just take a time period of say the last 6-9 months and analyze the comment data from those customers who have left VS those who are still with you. What did the two groups say differently just before the lapsed customers left? Can these results be used to predict who is likely to churn ahead of time? The answer is very likely yes, and in many cases you can do something about it!
  • Whenever possible text questions should be asked of all customers, not just a subgroup such as the leavers. Here sampling as well as how you ask the questions both come into play.
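For readers who want to try the leavers-versus-stayers comparison outside of OdinText, here is a minimal Python sketch. It is not how OdinText works internally; the function names and the sample comments are hypothetical, and a simple "share of comments containing each word" stands in for real text analytics.

```python
import re
from collections import Counter

def term_share(comments):
    """Share of comments (0..1) in which each lowercase word appears at least once."""
    if not comments:
        return {}
    counts = Counter()
    for text in comments:
        # set() so a word repeated inside one comment is only counted once
        counts.update(set(re.findall(r"[a-z']+", text.lower())))
    return {term: c / len(comments) for term, c in counts.items()}

def compare_groups(left, stayed):
    """Rank words by (share among leavers - share among stayers), largest first."""
    a, b = term_share(left), term_share(stayed)
    diffs = {t: a.get(t, 0.0) - b.get(t, 0.0) for t in set(a) | set(b)}
    return sorted(diffs.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical comments, for illustration only
left = ["Delivery was late again", "Late delivery and rude support", "Prices went up"]
stayed = ["Fast delivery, fair prices", "Good support", "Prices are fair"]

ranked = compare_groups(left, stayed)
print(ranked[0])   # word most over-represented among customers who left
print(ranked[-1])  # word most over-represented among customers who stayed
```

Words at the top of the ranking are candidates for early-warning signals; with real data you would want far more comments per group before reading anything into the differences.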

Consider expanding your sampling frame to include not just customers who are no longer purchasing from you, but also customers who are still purchasing from you (especially those who are purchasing more) as well as those still purchasing, but purchasing less. What you really want to understand after all is what is driving purchasing – who gives a damn if they claim they are more or less likely to recommend you – promoter and detractor analysis is over hyped!

Reducing Customer Churn

You may also consider casting an even wider sampling net than just past and current customers. Why not use a panel sample provider and try to include some competitors’ customers as well? You will need to draw the line somewhere for scope and budget, but you get the idea. The survey should be short and concise and should have the text questions up front, starting very broad (top of mind, unaided) and then probing.

Begin with a question such as “Q. How, if at all, has your purchasing of Category X changed over the last couple of months?” and/or “Q. You indicated your purchasing of category X has changed, why? (Please be as specific as possible)”. Or perhaps even better, “Q. How, if at all, has your purchasing of category X changed over the past couple of months? If it has not changed please also explain why it hasn’t changed. (Please be as specific as possible)”. As you can see, almost anyone can answer these questions no matter how much or little they have purchased. This is exactly what is needed for predictive text analytics! Having only leavers’ data will be insufficient!

2. Include other structured (real behavior) data in the analysis. Some researchers analyze their survey data in isolation. Mixed data usually adds predictive power, especially if it’s real behavior data from your CRM database, and not just stated/recall behavior from your survey. In either case, the key to unlocking meaning and predictability is likely to come from the unstructured comment data. Nothing else can do a better job of explaining what happened to these customers.

3. PLEASE, PLEASE resist the urge to start your leaver survey with a structured question asking a battery of “check all that apply” reasons for leaving/shopping less. Your various pre-defined reasons, even if you include an “Other Specify_____” option, will have several negative effects on your data quality.

First, the customer will often forget their primary reason for their change in purchase frequency; they will assume, incorrectly, that you are most interested in the reasons you have pre-identified. Second, there will be no way for you to tell which of the several reasons they are now likely to check is truly the most important to them. Third, some customers will repeat themselves in the “Other Specify” field, while others will decide not to answer it at all since they checked so many of your boxes. Either way, you’ve just destroyed the best chance you had of accurately understanding why your customers’ purchasing has changed!

There are many other ways to improve your insights in lapsed customer survey research by asking fewer yet better comment questions in the right order. I hope the above tips have given you some things to consider. We’re happy to give you additional tips if you like, and we often find that as customers begin using OdinText, their use of survey data, both structured and unstructured, improves greatly along with their understanding of their customers.


Brand Analytics Tips – How Old is Your Brand?

Text Analytics Tips Answers, How Old Is Your Brand? - Using OdinText on Brand Mention Type Comment Data
By Tom H. C. Anderson

[METHODOLOGICAL NOTES (If you’re not a researcher feel free to skip down to the ‘Brands by Age’ section below): For our first official Text Analytics Tips I’ve started by exploring arguably one of the simplest types of unstructured/text data there is, the unaided top-of-mind ‘brand mention’ open-ended survey question. These kinds of questions are especially important to brand positioning, brand equity, brand loyalty and advertising effectiveness research. In this case we’ve allowed for more than one brand mention. The question reads “Q. When you think of brand names, what company’s product or service brand names first come to mind? [Please name at least 5]”. The question was fielded to n=1,089 US Gen Pop Representative survey respondents in the CriticalMix Panel in December of 2015. The confidence interval is +/-2.9% at the 95% confidence level.]

Making Good Use of Comment Data Can Be Easy and Insightful

An interesting and rather unique way to look at your brand is to understand for whom it is most likely to be top-of-mind.

Unfortunately, though they have proven more accurate than structured choice or Likert scale rating questions in predicting actual behavior, free form (open end) survey questions are rare due to the assumed difficulty in analyzing results. Even when they are included in a survey and analyzed, results are rarely expressed in anything more useful than a simple frequency ranked table (or worse, a word cloud). Thanks to the unique patented approach to unstructured and structured data in OdinText, analyzing this type of data is both fast and easy, and insights are limited only by the savviness of the analyst.

The core question asked here is rather simple, i.e. “When you think of brand names, what company’s product or service brand names first come to mind?”. However, when you ask this question of over a thousand people, the sheer volume of brands mentioned (in our case well over 500) can make even this ‘small data’ seem overwhelming.

The purpose of this post is to show you just how easy and fast, yet insightful, analysis of even this more specific and technically more basic comment data can be using Next Generation Text Analytics™.

After uploading the data into OdinText, there are numerous ways to look at this comment data, not only the somewhat more obvious frequency counts, but also several other statistics including any interesting relationships to available structured data. Today we will be looking at how brand mentions are related to just one such variable, the age of the respondent. [Come back tomorrow and we may take a look at a few other statistics and variable relationships.]

Text Analytics Tips Age OdinText

Brands by Age

Below is a sortable list of the most frequently mentioned brands ranked by the average age of those mentioning said brand. This is a direct export from OdinText. The best way to think about lists like these is comparatively (i.e. how old is my brand vs. other brands?). If showing a table such as this in a presentation, I would highly recommend color coding, which can be done either in OdinText (depending on your version) or in Excel using the conditional formatting tool.

[NOTE: For additional analytics notes and visualizations please scroll to the bottom of the table below]


Brand Name Average Age
Maxwell House 66
Hunts 66
Aspirin 66
Chrysler 64.6
Stouffers 63.7
Marie Callender's 63.7
Walgreen 63.7
Cooper (Mini) 63.7
Bayer 62.6
USAA 62.5
Epson 62.5
Brother 61.3
Aol 61.3
Comet 61.3
Snapple 61.3
Lowes 61.2
Marriott 60.3
Ritz 60.3
Hellman's 60.3
Ikea 60.3
Belk 60.3
State Farm 60.3
Oscar Mayer 60
Folgers 59.8
Libby's 59.8
Hormel 59.2
Depot 59.2
Heinz 59.2
Electric 59.2
Bordens 59.2
Nestles 59
Green Giant 59
Sargento 58.3
Del Monte 58
Prego 58
Kashi 58
Westinghouse 58
Stouffer 58
Taylor 58
Home Depot 57.6
Publix 57.5
Banquet (Frozen Dinners) 57.5
Buick 57
Krogers 57
Hellman's 57
Safeway 56.5
Purex 56.4
Hewlett 56.4
Unilever 56.1
RCA 56.1
Post 56.1
P&G 55.9
Budweiser 55.9
Yoplait 55.8
Chobani 55.7
Ragu 55.7
Campbell's 55.5
Wells Fargo 55.2
Hershey 55.1
Betty Crocker 55
Sharp 55
Hines 55
Trader Joe's 55
Palmolive 54.9
Kia 54.7
Lexus 54.7
Life 54.7
Hotpoint 54.7
Campbells 54.6
Oscar Mayer 54.5
Dial 54.4
Nissan 54.4
Hillshire Farms 54.3
Motorola 54.1
Keebler 54
CVS 53.8
Canon 53.8
Lakes 53.7
Pillsbury 53.3
Hilton 53.3
Faded Glory 53.3
Friskies 53.3
Duncan Hines 53.3
Puffs 53.3
Olay 52.8
Sketchers 52.5
Fred Meyer 52.5
Delta 52.5
Hunt 52.3
Bose 52.3
Ocean Spray 52.3
Ivory 52.3
Swanson 52.3
Dewalt 52.3
Firestone 51.8
Estee Lauder 51.5
Miller 51.5
Tide 51.4
Honda 51.3
Meijer 51.3
Perdue 51.3
Jeep 51.3
Head 51.3
Lee Jeans 51.3
Pantene 51
Chevrolet 51
Cannon 50.8
Chef Boyardee 50.8
Frito Lay 50.6
Avon 50.5
Motors 50.4
Kodak 50.4
General Mills 50.2
BMW 50
Lipton 49.8
Kohl's 49.8
Goodyear 49.7
Kraft 49.6
Craftsman 49.5
Sunbeam 49.4
IBM 49.3
Frigidare 49.1
Sears 49.1
Ford 49.1
Walgreens 49.1
Dole 49.1
Chevy 49
Wonder (Bread) 49
Dannon 49
JVC 49
Hyundai 49
Clinique 49
Marlboro 49
Mercedes 49
Gerber 49
Acme 49
Kleenex 48.8
Kelloggs 48.7
JC Penney 48.6
Louis Vuitton 48.5
Calvin 48.4
LL Bean 48.4
Gillette 48.4
Johnson & Johnson 48.3
Shell 48.3
Kenmore 48.1
Dawn 48
Hanes 48
Macdonalds 48
Tylenol 48
Colgate 47.5
Wrangler (Jeans) 47.3
Burger King 47.3
Whirlpool 47.1
GMC 47
Yahoo 46.9
Dish Network 46.8
Verizon 46.7
Hersheys 46.6
Whole Foods 46.5
Sara Lee 46.5
Hostess 46.5
Mazda 46.5
Toyota 46.4
Arm & Hammer 46.4
Nabisco 46.3
Tyson 46.1
Starbucks 46
Wal-Mart 45.9
Western Family 45.8
Wegmans 45.8
Dr Pepper 45.7
Hulu 45.7
Time Warner 45.7
Maybelline 45.7
MLB 45.7
Iams 45.7
Cox 45.7
Country Crock 45.7
Compaq 45.7
Sonoma 45.7
Quaker Oats 45.7
Nordstrom 45.4
Coca 45.3
Champion 45.3
Bass 45
Chrome 44.7
Coors 44.7
iPhone 44.6
Bounty 44.5
Dodge 44.4
Maytag 44.3
Black & Decker 44.2
Pfizer 44.2
Suave 44.2
HP 44
Scott 44
Subway 44
Skechers 44
Geico 44
Panasonic 43.9
Lays 43.8
KFC 43.8
Charmin 43.8
Dell 43.8
Polo 43.8
Windex 43.7
Burts Bees 43.5
Purina 43.5
Clorox 43.5
Columbia 43.3
Ralph Lauren 43.2
Visa 43.2
Pepsi 43
Crest 43
NFL 43
Sanyo 43
Dove 42.9
Intel 42.9
Wendy's 42.8
Kroger 42.8
Remington 42.3
Phillips 42.3
Mars 42.3
Cover Girl 42.3
Heb 42.3
Twitter 42.3
Amazon 42
Body Works 42
Best Buy 41.8
Costco 41.8
Banana Republic 41.8
Disney 41.7
Amway 41.7
Levi 41.5
Sony 41.4
Samsung 41.4
Macy's 41.1
Glade 41.1
Boost 41
Boost Mobile 41
Toshiba 40.8
Ebay 40.8
Comcast 40.7
Facebook 40.6
Walmart 40.5
Microsoft 40.5
Google 40.4
Kitchen 40.4
Nestle 39.8
Mcdonalds 39.5
Gucci 39.5
Vons 39.3
Philip Morris 39.3
Loreal 39.3
Mattel 39.1
Apple 39
Pepperidge Farm 39
Vizio 39
Lysol 39
Ugg 39
Tropicana 39
Sure 39
Fila 39
Tmobile 39
Coach 38.9
Acer 38.8
Tommy Hilfiger 38.6
Nike 38.1
Target 38
Old Navy 37.9
Chase 37.8
Michael Kors 37.7
K-Mart 37.5
Lenovo 37.5
Equate 37.2
Hoover 36.8
Under Armour 36.6
Windows 36.5
Asics 36.5
Kitchenaid 36.5
Victoria's Secret 36.2
Mac 36.1
Reebok 36.1
Android 36
Direct TV 36
Sprint 36
Netflix 35.9
Adidas 35.7
Citizen 35.7
New Balance 35.6
Guess 35.4
Bic 35.2
Great Value 35.2
Pizza Hut 35
Puma 34.9
Asus 34.4
Fox 34.3
Justice 34.3
North Face 34.1
Xbox 33.6
Gap 33.4
Doritos 33.4
HTC 33.4
Converse 33.3
Sprite 33.2
Febreeze 33
Axe 33
Kay 32.7
Glad 32.7
Mary Kay 32.7
Viva 32.7
Reese's 31.8
Lego 31.7
Amazon Prime 31.5
Nintendo 31.2
Vans 31.2
Taco Bell 31
Fisher Price 30.4
Chanel 29.7
Old Spice 29.7
Playstation 29.4
Eagle 29.4
Hamilton Beach 29.3
Footlocker 29.3
Pink 29.3
Swiffer 29.3
Timberlands 29.3
Naked Juice 29
Youtube 29
Bing 29
Air Jordans 28.4
Huggies 28.2
Aeropostale 27.7
Hollister 27.3
Prada 27.3
Carters 26.8
Kirkland 26.3
Forever 26.3
Aeropostle 26.3
Arizona 25.6
Pampers 24.5
Versace 24.5
Urban Outfitters 24.5


A few interesting points from the longer list of brands are:

The oldest brand, “Maxwell House Coffee”, has an average age of 66. (If anything, this mean age is actually conservative, as the age question gets coded as 66 for anyone answering that they are “65 or older”). This is a typical technique in OdinText, choosing the mid-point to calculate the mean if the data are in numeric ranges, as is often the case with survey or customer entry form based data.
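To make the mid-point technique concrete, here is a small Python sketch. Only the "65 or older" bracket and its code of 66 come from the post; the other bracket labels are assumptions for illustration, since the actual answer options of the age question are not listed here.

```python
def midpoint(low, high):
    """Mid-point of a closed age range, e.g. 25-34 -> 29.5."""
    return (low + high) / 2

# Bracket labels other than "65 or older" are hypothetical.
AGE_CODES = {
    "18-24": midpoint(18, 24),
    "25-34": midpoint(25, 34),
    "35-44": midpoint(35, 44),
    "45-54": midpoint(45, 54),
    "55-64": midpoint(55, 64),
    "65 or older": 66,  # open-ended top bracket, coded as 66 per the post
}

def mean_age(brackets):
    """Average coded age across the respondents who mentioned a brand."""
    values = [AGE_CODES[b] for b in brackets]
    return sum(values) / len(values)

# Three hypothetical respondents who all mentioned the same brand
mentions = ["65 or older", "65 or older", "55-64"]
print(round(mean_age(mentions), 1))
```

Because everyone in the top bracket is coded as 66 regardless of true age, a brand whose mentions come entirely from that bracket can never average above 66, which is why the figure is conservative.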

The youngest brand on the list, “Urban Outfitters”, with an average age of 24, probably skews even younger in actuality for a similar reason (as is standard in studies representative of the US General Population, only adults aged 18+ are included in the research).

Dr Pepper is in the exact middle of our list (46 years old). Brands like Dr Pepper which are in the middle (with an average age close to the upper range of Generation X) are of course popular not just among those 46 years old, but are likely to be popular across a wider range of ages. A good example is Coca-Cola, also near the middle: mentioned by 156 people with an average age of 45, it is pulling from both young and old. The most interesting thing then, as is usual in almost any research, is comparative analysis. Where is Pepsi relative to Coke, for instance? As you might suspect, Pepsi does skew younger, but only somewhat younger on average: it was mentioned by 107 consumers, yielding an average age for the brand of 43. As is the case with most data, relative differences are often more valuable than specific values.

If there are any high-level category trends here related to age, they seem to be that clothing brands like Urban Outfitters and Versace (both with the youngest average age of 24), Aeropostale (26), and Forever 21 (ironically, with an average age of 26), along with several others in the clothing retail category, tend to skew very young. Snack foods, especially drinks like Arizona Ice Tea (age 25) and Naked Juice (29), as well as web properties (Bing and YouTube, both 29) and electronics (PlayStation at 29 and the slightly older Nintendo at 31 being obvious examples), are associated with a younger demographic on average.

In the middle age group, other than products with a wide user base like major soda brands, anything related to the home, either entertainment like Time Warner Cable or even Hulu (both 45), or major retailers like Wegmans and Wal-Mart (also both 45), are likely to skew more middle age.

The scariest position for a brand manager is probably at the top of the list. With an average age of 66 for Maxwell House and Hunts, and 64 for Stouffers and Marie Callender's, the question has got to be: who will replace my customer base when they die? What we see in the data is in fact a slight negative correlation between age and number of mentions.
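A correlation like this can be checked with a few lines of Python. The sketch below computes a plain Pearson coefficient; the ages and mention counts in it are made-up illustration values, not the survey data, so only the sign and the mechanics should be taken from it.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical brands: average age of mentioners and number of mentions
avg_age = [30, 40, 50, 60]
mentions = [40, 35, 38, 20]

r = pearson(avg_age, mentions)
print(round(r, 2))  # a negative value means older-skewing brands get fewer mentions
```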

Again, it’s often the comparative differences that are interesting to look at, and of course the variance. Take Coca-Cola vs. Pepsi, for instance: while their mean ages are surprisingly close to each other at 45 and 43 respectively, looking at the variance associated with each gives us the spread (i.e. which brand is pulling from a broader demographic). Coca-Cola, with a standard deviation of 14.5 years, is pulling from a wider demographic than Pepsi, which has a standard deviation of 12.9 years. There are several ways to visualize these data and questions in OdinText, though some of our clients also like to use OdinText output in visualization software like Tableau, which can have more visualization options but little to no text analytics capabilities.
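As a rough illustration of why the spread matters, the Python sketch below summarizes two brands with similar mean ages but very different standard deviations. The brand names and ages are hypothetical, and the sample standard deviation from Python's `statistics` module is an assumption about which formula to use; the post does not say how OdinText computes it.

```python
from statistics import mean, stdev

def spread_summary(ages_by_brand):
    """Return {brand: (mean age, sample standard deviation)} for each brand."""
    return {brand: (mean(ages), stdev(ages)) for brand, ages in ages_by_brand.items()}

# Hypothetical coded ages of respondents mentioning each brand
ages_by_brand = {
    "Brand A": [22, 35, 48, 61, 66],  # pulls from young and old alike
    "Brand B": [38, 41, 44, 47, 50],  # concentrated in middle age
}

summary = spread_summary(ages_by_brand)
for brand, (avg, sd) in summary.items():
    print(f"{brand}: mean {avg:.1f}, SD {sd:.1f}")
```

Two brands can sit next to each other in a table ranked by mean age and still have very different audiences, which is what the standard deviation surfaces.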

Co-Occurrence (aka Market Basket Analysis)

Last but not least, it can certainly be interesting to look at which brands are often mentioned together, either because they are head-to-head competitors going after the exact same customers or because they may be complementary (market basket analysis type opportunities, if you will). Brands that co-occur frequently (are mentioned by the same customers) and are not competitors may in fact represent interesting opportunities for ‘co-opetition’. You may have noticed more cross-category partnering on advertising recently, as marketers seem to be catching on to the value of joining forces in this manner. Below is one such visualization created using OdinText, with just the Top 20 brand mentions visualized in an x-y plot using multi-dimensional scaling (MDS) to plot co-occurrence of brand names.

Text Analytics of Brands with OdinText
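The counting step behind a co-occurrence view is simple enough to sketch in Python. This is only an illustration under stated assumptions: the responses are invented, the function name is hypothetical, and the MDS plotting step (which would take a dissimilarity matrix derived from these counts) is not shown.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(responses):
    """Count how many respondents mentioned each unordered pair of brands."""
    pairs = Counter()
    for brands in responses:
        # sorted(set(...)) removes repeats and gives each pair one canonical order
        for a, b in combinations(sorted(set(brands)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical answers to the brand-mention question, one list per respondent
responses = [
    ["Coca-Cola", "Pepsi", "Nike"],
    ["Nike", "Adidas"],
    ["Coca-Cola", "Pepsi"],
    ["Pepsi", "Nike", "Coca-Cola"],
]

pairs = cooccurrence(responses)
for pair, count in pairs.most_common(3):
    print(pair, count)
```

Pairs with high counts that are not direct competitors are the ones worth a second look for partnership ideas.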

Hope you enjoyed today’s discussion of a very simple text question and what can be done with it in OdinText. Come back again soon as we will be giving more tips and mini analyses on interesting mixed data. In fact, if there is significant interest in today’s post we could look at one or two other variables and how they relate to brand awareness comment data tomorrow.

Of course if you aren’t already using OdinText, please feel free to request a demo here.


OdinText Wins 2015 CASRO Research Award

CASRO Honors OdinText’s Innovative Next Generation Text Analytics Software at 40th Annual Conference

OdinText, a provider of cloud-based analytics software, today announced that its Next Generation Text Analytics software-as-a-service (SaaS) product has been awarded the Research Entrepreneur of the Year award by CASRO, an organization that represents more than 300 companies and market research operations.

The award honors organizations that—through the excellence of their work, professionalism of their practice, and integrity of their conduct— exemplify the best work in the research industry. The award also acknowledges an organization that has introduced a new direction or service to its research business portfolio and provides leading-edge and innovative services that expand traditional market, opinion, and social research.

Recognized for its patented SaaS technology, OdinText allows companies to analyze large amounts of unstructured and mixed data. OdinText can be used across various types of data including but not limited to survey research, email and telephone data, discussion board ratings, and news articles.

“At OdinText, we don’t see a difference between structured and unstructured data - text mining and data mining – they are far more meaningful together,” said Tom H. C. Anderson, CEO of OdinText. “We are honored to be recognized by CASRO, an organization that has such a long history of championing innovative and sound research techniques.”

In addition to exploring patterns in the data and allowing users to confirm hypotheses, OdinText suggests key relationships in the data that may be overlooked by the user. The software also allows for one-step simulation and predictive analytics.

“Marketing research is evolving, getting both broader and deeper in terms of skill sets needed to succeed,” said Jim DeMarco, vice president of business intelligence and analytics at FreshDirect. “OdinText provides researchers with the capability to access more advanced analysis quicker and helps the business they work on gain an information advantage. This is exactly the kind of innovation our industry needs right now.”

The Coca-Cola Company and online grocer FreshDirect sponsored OdinText’s nomination, and the company received the award, along with the $5,000 prize, at CASRO’s 40th Annual Conference.

“The work of OdinText is indicative of the exciting new methodologies and technologies which are having an increased influence on our changing industry,” said Diane Bowers, president of CASRO. “Acknowledgement of this type of work and the financial support that accompanied this honor highlights our role as a leader in the future of our industry.”


About OdinText Inc.

OdinText’s Next Generation Text Analytics™ turns market researchers into data scientists. The powerful cloud-based software helps users discover patterns and trends in complex unstructured text data. Visit to learn more or schedule a demo. Backed by Connecticut Innovations and private investors, OdinText is a privately-held company based in Stamford, Conn. Request more information here.

Top 10 Big Data Analytics Tips

As part of the interview series leading up to the Useful Business Analytics Summit, today we post the Top 10 Tips from our analytics experts. Whether you mine more structured data or, like me, more often work with unstructured or mixed data using text analytics, I think you’ll agree that the following 10 tips are critical.

  1. Keep It [ridiculously] Simple (10 times more so than is necessary to get your point across).
  2. Hypothesize/Put Problem First
  3. Don’t Assume Data is Good – Check/Validate!
  4. Automate repeat tasks & Carve out time to go exploring
  5. Set a Data Strategy – don’t just collect data for the sake of collecting it
  6. In a rapidly expanding field, work with people on the leading edge
  7. Be a Skeptic about models etc.
  8. Look for the pragmatic and cost effective solutions
  9. Don’t torture Data – in the end it will confess
  10. Think like a Business Owner – what would you like to know?

Below are more detailed tips from some of our client experts. We’d love to hear your tips in the comments section if you’ve got one to add.



Honestly, I think I’d boil it down to a single tip that is more important than all others, in my experience, but is the one most ignored and poorly executed. Keep it simple. Ridiculously simple. Ten times more simple than what you think necessary. Just about then, you are actually getting your point across in a way that people are starting to follow you. You can always increase the complexity from there, but the first time you have an experience and realize that you’ve actually conveyed a complex analytical presentation to a group of C-suite execs, you’ll understand what you’ve been doing wrong this whole time before. Hint – those head nods and blank stares aren’t what you are looking for…


- Understand that any problem is easier if you approach it correctly; don't necessarily take a cookie-cutter approach. Conventional wisdom is not so wise in a rapidly evolving field.

- Work with people who are able to work on the leading edge ...the people who are helping expand the envelope.



Automate anything you do more than once. It’s very easy to fill your time with routine pulls of data which lie just beyond the reach of the visualization tools available to business stakeholders. You can’t ignore these requests and it frankly feels great for us geeks to bask in the gratitude of camera-ready cool kids, but these tasks may not represent the highest-value use of your time. The more experience you have with the data, the more likely you are to be the only person with eyes on a particular business problem. So carve out time to go exploring. Think entrepreneurially like a business owner, and ask yourself “if I owned this P&L, what would I want to know?”


-Ensure there is a purpose, and that you understand why analytics is valuable to the organization. The purpose can come from a business sponsor, such as discovering new ways (i.e. products, markets, etc.) to increase revenue, retention, or profit, or to control costs. So ask the tough questions and align with executives' mandates.

-Ensure clarity around the level of effort you spend gathering data vs. designing experiments, mining and analyzing data. The need / urge to have data to accomplish a specific task can lead to disparate / disjointed data gathering and management effort that can take over the data scientist or analytics professional work and analytics can become a second thought. So be a sponsor or an advocate for a data strategy.


1) Don't assume the data is good. Is the data lineage (with transformation rules) exposed? Is data quality measured and reportable as a trend?

2) Hypothesize and/or uncover non-time-based relationships: These are usually the richest.



Double check your results using data from different sources

Make sure it makes sense

In case of discrepancies use it directionally

Reach out to experts to obtain their opinion



1. Think of the broader perspective. Take a step back. Understand the business and the problem before jumping into solutions.

2. Be an analyst: Adopt a critical approach to thinking about all analytical problems. There is nothing wrong with a slight dose of skepticism about models and results. It is healthy.

3. Try to find pragmatic and cost-effective models / solutions. For example you can probably do machine learning and neural networks to solve a lot of problems but a linear regression might sometimes be enough.
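To show how little machinery the "linear regression might be enough" option needs, here is a minimal ordinary least squares sketch in plain Python. The data are invented and the single-predictor setup is an assumption chosen for brevity; it illustrates the tip and is not taken from the expert's own work.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: marketing touches (x) and orders (y) per customer
touches = [1, 2, 3, 4]
orders = [3, 5, 7, 9]

slope, intercept = fit_line(touches, orders)
print(slope, intercept)
```

If a line like this already explains the relationship well, a more complex model has to earn its extra cost.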



 1. Be humble: sometimes data tells us nothing or, worse, will lie to us. Cognitive dissonance is the norm rather than the exception.

2. If you torture data it will confess to any sins (attributed to Frank Harrell).

3. Go ahead, ask questions, be curious, don't be afraid to cross cultures.


Big thanks again to our client side analytics experts. Feel free to check out our previous questions on Big Data and How to Keep Up on Analytics. Don’t forget to check back in for our next question about the value of various types of data… Look forward to seeing you at the Summit!




[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, developers of a patented Next Generation approach to text analytics known as OdinText. For more information and to inquire about software licensing visit OdinText INFO Request.]

The N in Text Analytics: Text Mining with Different Sample Sizes

[Interview Reposted with Permission From Jeffrey Henning's ResearchAccess]

I recently had the opportunity to interview Tom H. C. Anderson, the founder of Anderson Analytics, about his ongoing application of text analytics to market research.

Q: What’s the process for optimally using text analytics with survey verbatim responses?

A: Well, that patented process is something that we’ve obviously put a lot of time and thought into with OdinText, and something that continues to evolve.

Generally speaking though I can say it’s important to look beyond the individual sentences, and not to get wrapped up in linguistically derived sentiment. The mistake I see being made most often is that text analytics is approached as a replacement for human coding. In our view they are apples and oranges. Yes, text analytics can replace human coding. But coding is just a small part of what we do: our real focus is on analytics, and often that means that the optimal use of verbatim responses is predictive analytics. That is the optimal use of survey verbatims.

Q: Is there a minimal sample size this makes sense for?

A: I wouldn’t say that there’s a minimum size per se, though I would say that the ROI of text analytics increases exponentially with the size of the data. From our point of view, “Natural Language Processing”, “Text Analytics”, “Text Mining” and even “Data Mining” are all synonyms, the last two of which are a better description of the process. What that means is that without a certain minimum size of data there will be no meaningful patterns to find (to mine).

Focus group data generally is not suitable for text analytics. It’s partly because the n is so small. But it is also because, although focus groups can produce a large amount of text in total, this text is heavily influenced by the moderator. It very much depends on the data, though. The smallest data size ever looked at in OdinText had a sample size of n=2. This was the Obama/Romney debates, and each candidate spoke for about 45 minutes. More typically, though, text analytics is used to analyze tens of thousands, or hundreds of thousands, of records. These data are either from customer satisfaction/loyalty survey trackers, customer service center telephone transcripts or emails, or yes, social media.

Many of our customers do find text analytics useful for smaller ad-hoc survey data with sample sizes around n = ~1,000 as well. Once you are up and running, it’s very easy and fast to use text analytics to get insights from data such as this, although you are somewhat more limited in the kinds of analysis that you can do with these smaller data sets. If you do enough of these ad-hoc projects, text analytics can certainly provide relatively good ROI here too.

Q: Is it better suited for tracking studies rather than one-off surveys?

A: Better ROI with bigger, better data. If you only do 5 to 10 ad-hoc surveys per year with an average of n=300, then text analytics may not be worth it. As you move beyond this, it becomes more and more valuable.
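One way to see why sample size matters for finding patterns is the textbook margin of error for a proportion, which shrinks with the square root of n. The sketch below is only an illustration under stated assumptions: the 5% topic share is hypothetical, the helper `margin_of_error` is not part of any product, and the formula is the usual normal approximation rather than anything specific to OdinText. The values of n are sample sizes mentioned in this interview.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p observed in n records
    (normal approximation; z=1.96 corresponds to a 95% confidence level)."""
    return z * math.sqrt(p * (1 - p) / n)


share = 0.05  # hypothetical: 5% of comments mention a given topic
for n in (300, 1_000, 400_000):
    moe = margin_of_error(share, n)
    print(f"n={n:>7,}: {share:.0%} +/- {moe:.2%}")
```

With a few hundred records, the uncertainty around a rarely mentioned topic is a large fraction of the estimate itself; with hundreds of thousands of records it becomes very small.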

Q: My initial impression after first hearing about your NPS work was simply that you improved the value of the survey by adding text analytics. But it seems like you are really about a holistic process, using CRM data and other information to build a predictive model. What are the data sources that you find produce the best value? While I think of OdinText as text analytics, is it actually a predictive analytics solution whose differentiation is its text analytics capabilities?

A: Well, yes, you are right that OdinText is a text analytics system. We are not trying to become the next SAS or SPSS, per se; both of them have some good packages for basic statistics. Where OdinText is best is when there is also text data, and when the data gets bigger. Our clients are often working with data sets so large that they would take too long to run or, more typically, crash SPSS and the like. Working with text data requires more computing power. That’s something we are able to offer through our SaaS model.

In the case you mentioned, Shell was using OdinText to analyze their n = ~400,000 Jiffy Lube Net Promoter survey data. We suggested that they add some data from their CRM database, so they added actual behavioral data: visits as well as sales.

This is a unique strength of OdinText. We don’t believe it makes much sense to analyze text in isolation. We are currently building more analytics capabilities into OdinText.
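As a rough illustration of what analyzing text together with structured data can look like, the sketch below pairs each comment with a behavioral field and compares average visits for comments that do and do not mention a term. Everything here is an assumption made for the example: the records are invented, the helper `mean_visits_by_mention` is hypothetical, and the naive substring match is not OdinText's patented process.

```python
from statistics import mean

# Invented records for illustration: a survey comment plus a behavioral field
# (visits) of the kind that might come from a CRM system.
records = [
    {"comment": "Fast service, friendly staff", "visits": 5},
    {"comment": "Long wait and too expensive", "visits": 1},
    {"comment": "Friendly and quick", "visits": 4},
    {"comment": "The wait was far too long", "visits": 2},
    {"comment": "Good price, no wait", "visits": 6},
]


def mean_visits_by_mention(records, term):
    """Split records by whether the comment mentions `term` (case-insensitive
    substring match) and return the average visits in each group."""
    term = term.lower()
    hit = [r["visits"] for r in records if term in r["comment"].lower()]
    miss = [r["visits"] for r in records if term not in r["comment"].lower()]
    return (mean(hit) if hit else None, mean(miss) if miss else None)


with_term, without_term = mean_visits_by_mention(records, "long")
print(with_term, without_term)
```

The point of the sketch is only that the text becomes more useful once it sits next to a behavioral outcome; a real analysis would need far more records and proper statistical testing.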

Q: The text analytics space is very crowded; I’ve personally looked at over 20 platforms. What sets OdinText apart from other systems?

A: Three things, really, all tied to our patented approach to text analytics:

  1. The way we allow you to use mixed data, not just text.
  2. The way we filter out ‘noise’ and alert the analyst to things they might not have considered.
  3. And finally, our approach, while powerful, is also intuitive. We recognized early on that most clients don’t have any relevant training data, and when they do, using it to build models would just be mimicking inferior human coding. So unlike other enterprise solutions that require a lot of custom set up, our approach was developed to work very well off the shelf: it’s far more nimble in being able to deal with different data sources.

Jeffrey Henning, PRC, is president of Researchscape International, which provides “Do It For You” custom surveys at Do It Yourself prices.  He is a Director at Large on the Marketing Research Association’s Board of Directors. You can follow him on Twitter @jhenning.

New Text Analytics Process Patented

Anderson Analytics’ OdinText Announces Patent for Powerful New Text Analytics Process  


Anderson Analytics today announced the granting of Patent No. 8,473,498 by the United States Patent and Trademark Office (USPTO) for the powerful new Natural Language Text Analytics process utilized in their OdinText software.

The approach leverages contextual data and provides a process for filtering out the noise which is so common in unstructured data. Both of these important benefits have been deficient in text analytics software until now.

"The problem with most approaches to text analytics out there is that they are focused on trying to do what humans do best rather than on what computers do best. They’re also focused on the individual document level, thus completely missing some of the greatest benefits that come from the increases in computing power, consideration of contextual information, and statistical techniques," explained founder Tom H. C. Anderson, adding, "In our approach, it’s not just about unstructured (text) data any more. It’s often about ‘mixed data’, both structured and unstructured, and this approach takes advantage of that whenever possible."

OdinText can read and analyze millions of customer comments and other text data in a matter of minutes. The process is extremely fast and leverages the 100% consistency of coding inherent in text analytics. The approach works with non-English language text as well. An international patent application has been filed, designating a large number of countries, including the European Patent Office.

Marketing researchers can use OdinText to monitor and improve their customer satisfaction programs or understand key drivers from various other survey data. The software is also being used by customer service departments to analyze customer complaints, praise and suggestions received via telephone or email. Finally, OdinText is also extremely powerful when analyzing various social media and web-based data, regardless of whether foreign languages, slang or acronyms are used.

In 2005, Anderson Analytics became the first marketing research firm to leverage text analytics and has been honored with several awards for innovation from trade organizations such as the American Marketing Association, the Advertising Research Foundation, and the European Society for Opinion and Market Research. The firm’s expertise ranges across several industries and includes traditional Fortune 500 companies from Disney to Unilever as well as new social media giants such as Facebook and LinkedIn.

About OdinText - Text Analytics Applied!(TM)

OdinText is the first text analytics software platform developed by market research professionals for market research professionals. OdinText leverages a decade of Anderson Analytics’ experience in actual Applied Text Analytics(TM). Those interested in finding out more about OdinText including how to request a demo may do so at OdinText Info Request.