Posts tagged trending
Why Text Analytics Needs to Move at the Speed of Slang

Do You Speak Teen? 10 Terms You May Not Know

Translating the words teens use has been a headache and a source of embarrassment for generations of parents. It’s as though the kids speak a different language. Let’s call it a “slanguage”. And you know you’re old when you need Google to understand it.

Nowadays, too, it’s much harder to bridge the communication gap because the Internet has dramatically increased the pace at which slanguage changes. In fact, every year hundreds of new slang words and phrases that originated on the Internet are added to the terrestrial dictionary.

"Slanguage" is a moving target

And thanks to social media, new terms, phrases and acronyms—which, in some cases, can describe an entire situation—crop up and go viral literally overnight.

In short, slanguage has become a moving target; it seems to change faster than we can pick it up. As soon as we’re proficient, we’re out of touch again.

For obvious reasons, this is not only a problem for parents; it’s particularly frustrating for anyone researching or marketing to youth.

The Problem with “Dictionaries”

Text analytics software has enabled us to monitor what young people are saying online, but it does us little good when the software can’t keep up with slanguage.

One of the primary weaknesses of most text analytics software platforms is that they rely on “dictionaries” to understand what is being discussed or to assign sentiment.

These dictionaries are only as good as the data used to create them. If the data changes in any way (e.g., new words are used or used in different ways) the software will miss it.

So in order to stay current using a conventional text analytics platform, one must manually identify new slang terms as they emerge and continually update the dictionary.

In contrast, OdinText is uniquely able to identify new, never-before-used terms—slang, acronyms, industry jargon, new product/competitor names, etc.—without user input.
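OdinText’s patented approach is, of course, proprietary, but the general idea of flagging never-before-seen terms can be illustrated with a minimal sketch: compare token frequencies in new data against a vocabulary built from a baseline corpus, and surface terms that are frequent now but absent before. (All example data below is hypothetical.)

```python
from collections import Counter

def find_emerging_terms(baseline_docs, new_docs, min_count=3):
    """Flag tokens frequent in new data but absent from a baseline vocabulary."""
    baseline_vocab = set()
    for doc in baseline_docs:
        baseline_vocab.update(doc.lower().split())
    new_counts = Counter()
    for doc in new_docs:
        new_counts.update(doc.lower().split())
    # A term is "emerging" if it clears a frequency bar and is new to the baseline
    return {t: c for t, c in new_counts.items()
            if c >= min_count and t not in baseline_vocab}

baseline = ["that party was cool", "the party was awesome"]
recent = ["the party was lit", "it's lit bruh", "so lit", "bruh that was lit"]
print(find_emerging_terms(baseline, recent, min_count=2))  # → {'lit': 4, 'bruh': 2}
```

A real system would need normalization, a much larger baseline, and statistical tests for frequency shifts, but the core comparison is this simple.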

Test Your Teenspeak Proficiency!

Staying abreast of changes in teenspeak requires some vigilance. You may be further out of touch than you realize. Let’s take a quiz: just for fun, I randomly pulled 10 terms that have become popular with post-Millennials (very roughly rank-ordered by use below).

If you’re not familiar with these terms or can’t define them, don’t worry; you’re not alone. I didn’t understand any of them either, and it’s not necessarily easy to figure out what many of these new terms mean.

A conventional, mainstream dictionary won’t be any help here, but the Urban Dictionary can be a lifesaver. You can also learn a lot by researching the images online that are associated with a new trending slang term (especially “memes”) for context. YouTube videos and music can be similarly helpful.

Triangulating using these sources and the most common context is often the best way to stay on top of these moving targets, which as I noted come and go relatively quickly.

Many of the ten I’ve listed below can have more than one meaning depending on context, and some may be used differently by different demographic groups, or even within the same group.

So, without further ado, here are the top 10 slang terms we’ve spotted circulating within the past few months.

Without skipping ahead to the answers, how many do you know?

  • One (or 1)

  • Dab

  • Schlonged (& other ‘Trumpisms’)

  • Bae

  • Fetch

  • Lit

  • BRUH

  • Fleek

  • Swag

  • Bazinga!


“One” or “1”

In teenspeak, “one” or “1” doesn’t always signify a quantity. It can also mean “One Love” and is used frequently in parting (like “goodbye”). It may be used in person, on the telephone or via digital communication.




“Dab”

You knew this was a verb meaning to pat or tap gently, but that’s not what the kids mean. The recent uptick in “Dab” was inspired by a dance move popularized in a 2014 video by Atlanta rapper Skippa da Flippa. It’s often used as a sort of victory swagger (“Keep dabbin' ... let the haters hate ... Dab on”). Check out this YouTube clip for more.



“Trumped” & “Schlonged”

“Trump” as a noun and as a verb traditionally referred to a stronger hand of cards or other competitive advantage. But due in no small measure to Donald J. Trump the presidential candidate’s ascendancy, various combinations of “Trump” and “Trumped” and several memes and other digital chat have been cropping up with a variety of meanings.

“Trump” has appeared as an adjective describing someone rich or spoiled. A couple of months ago we also saw a renewed interest in “Schlonged” again due to media coverage of candidate Trump. There was some debate on what the actual meaning was. Here again I think the Urban Dictionary is one of the best resources for you to make up your own mind.




“Bae”

According to our analysis, this one seems about twice as popular among women as among men, and also somewhat more popular in the Midwest. “Bae” is a pet name for one’s significant other. It may have been derived from “baby” (like “B” and “boo”), or it could be an acronym for “Before Anyone Else.”




“Fetch”

It’s not a command for a dog. Think slang predecessors like “cool” or “awesome.” This one can be traced to the cult hit “Mean Girls”. Ironically, in the film the term never catches on despite one character’s dogged attempts to popularize it.




“Lit”

A hit with the youngest demographic, and skewing somewhat more Northeast regionally, “Lit” has been turning up in recent songs and videos by rappers and other musical entertainers. It can mean a number of things, including that something is “hot” or popular, but also that someone is drunk or high. When used in a phrase like “It’s Lit,” it means exciting, good or worthwhile. “Come on down, it’s Lit!”




“BRUH”

It’s “bro” phonetically tweaked, and basically means “buddy” among guys, but it can also be an expression of surprise (usually of disappointment), as in “Damn!” The latter use seems to have originated at least partly thanks to a video that appeared on Vine featuring high school basketball star Tony Farmer collapsing as he is sentenced to prison.




“Fleek”

More popular among younger women, particularly in the South, “fleek” is a synonym for another popular slang phrase, "on point"—basically looking sharp, well-groomed or stylish. Recently, “fleek” has become specifically about eyebrows, in part due to a couple of Instagram videos, and mainstreamed when Kim Kardashian used it to describe a picture of her bleached eyebrows as #EyebrowsOnFleek.




“Swag”

“Swag” may actually already be on the way out, but it’s still quite popular. Derived from “swagger”—the supremely confident style of walking or strutting—“swag” has come to refer generally to an urban style and look associated with hip-hop. It could relate to a haircut or shoes, or simply an attitude or presence that exudes confidence and even arrogance. Example video: Soulja Boy Tell'em - Pretty Boy Swag




“Bazinga!”

This one comes courtesy of “The Big Bang Theory” character Sheldon Cooper and means “Gotcha” or “I fooled you.”

Don’t Let Words Fail You!

I hope you had some fun with this quiz and maybe picked up some new vocabulary, but I’d like to emphasize that slang isn’t the only terminology that changes. Keeping on top of new market entrants, drug names, etc., is important. If you don’t have a technology solution like OdinText that can identify new terms with implications for your business or category, make sure that you at least set up a manual process to regularly check for them.

Until next time – One!



PS. To learn more about how OdinText can help you learn what really matters to your customers and predict real behavior,  please contact us or request a Free Demo here >

[NOTE: Tom H. C. Anderson is Founder of Next Generation Text Analytics software firm OdinText Inc. Click here for more Text Analytics Tips]

What Your Customer Satisfaction Research Isn’t Telling You and Why You Should Care

Why most customer experience management surveys aren’t very useful


Most of your customers, hopefully, are not unhappy with you. But if you’re relying on traditional customer satisfaction research—or Customer Experience Management (CXM) as it’s come to be known—to track your performance in the eyes of your customers, you’re almost guaranteed not to learn much that will enable you to make a meaningful change that will impact your business.

That’s because the vast majority of companies are almost exclusively listening to happy customers. And this is a BIG problem.

Misconception: Most Customer Feedback Is Negative

To understand what’s going on here, we first need to recognize that the notion that most customer feedback is negative is a widespread myth. Most of us assume incorrectly that unhappy customers are proportionately far more likely than satisfied customers to give feedback.


In fact, the opposite is true. The distribution of satisfied to dissatisfied customers in the results of the average customer satisfaction survey skews heavily positive: most customers who respond to a customer feedback program are actually very happy with the company.

Generally speaking, for OdinText users that conduct research using conventional customer satisfaction scales and the accompanying comments, about 70-80% of the scores from their customers land in the Top 2 or 3 boxes. In other words, on a 10-point satisfaction scale or 11-point likelihood-to-recommend scale (i.e. Net Promoter Score), customers are giving either a perfect or very good rating.

That leaves only 20% or so of customers, about half of whom are neutral and half very dissatisfied.
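This distribution is easy to verify in your own data. A minimal sketch, assuming a list of 0–10 likelihood-to-recommend ratings; the bucket cut-offs below follow the standard NPS convention, and your own top-box definition may differ:

```python
def score_distribution(scores):
    """Bucket 0-10 likelihood-to-recommend ratings NPS-style."""
    n = len(scores)
    top = sum(1 for s in scores if s >= 9)   # promoters / top-2-box
    low = sum(1 for s in scores if s <= 6)   # detractors / dissatisfied
    mid = n - top - low                      # passives / neutral
    return {"top_box": top / n, "neutral": mid / n, "dissatisfied": low / n}

# Hypothetical ratings mirroring the 70/10/20 split described above
print(score_distribution([10, 10, 10, 9, 9, 9, 9, 8, 2, 1]))
# → {'top_box': 0.7, 'neutral': 0.1, 'dissatisfied': 0.2}
```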

So My Survey Says Most of My Customers Are Pretty Satisfied. What’s the Problem?

Our careful analyses of both structured (Likert scale) satisfaction data and unstructured (text comment) data have revealed a couple of important findings that most companies and customer experience management consultancies seem to have missed.

We first identified these issues when we analyzed almost one million Shell Oil customers using OdinText over a two-year period (view the video or download the case study here), and since then we have seen the same trends again and again, which frankly left us wondering how we could have missed these patterns in earlier work.

1.  Structured/Likert scale data is duplicative and nearly meaningless

We’ve seen that there is very little real variance in structured customer experience data. Variance is what companies should really be looking for.

The goal, of course, is to better understand where to prioritize scarce resources to maximize ROI, and to use multivariate statistics to tease out more complex relationships. Yet we hardly ever tie this data to real behavior or revenue. If we did, we would probably discover that it usually does NOT predict real behavior. Why?

2.  Satisficing: Everything gets answered the same way

The problem is that customers look at surveys very differently than we do. We hope our careful choice of which attributes to measure is going to tell us something meaningful. But the respondent has either had the pleasant experience she expected with you or, in some (hopefully) rare instances, a not-so-pleasant experience.


In the former case her outlook will be generally positive. This outlook will carry over to just about every structured question you ask her. Consider the typical set of customer sat survey questions…

  • Q. How satisfied were you with your overall experience?
  • Q. How likely to recommend the company are you?
  • Q. How satisfied were you with the time it took?
  • Q. How knowledgeable were the employees?
  • Q. How friendly were the employees? Etc…

Jane's Experience: Jane, who had a positive experience, answers the first two or three questions with some modicum of thought, but they really ask the same thing in a slightly different way, and therefore they get very similar ratings. Very soon the questions—none of which is especially relevant to Jane—dissolve into one single, increasingly boring exercise.

But since Jane did have a positive experience and she is a diligent and conscientious person who usually finishes what she starts, she quickly completes the survey with minimal thought giving you the same Top 1, 2 or 3 box scores across all attributes.

John's Experience: Next is John, who belongs to the fewer than 10% of customers who had a dissatisfying experience. He basically straightlines the survey like Jane did; only he checks the lower boxes. But he really wishes he could just tell you in a few seconds what irritated him and how you could improve.

Instead, he is subjected to a battery of 20 or 30 largely irrelevant questions until he finally gets an opportunity to tell you his problem in the single text question at the end. If he gets that far and has any patience left, he’ll tell you what you need to know right there.

Sadly, many companies won’t do much if anything with this last bit of crucial information. Instead they’ll focus on the responses from the Likert scale questions, all of which Jane and John answered with a similar lack of thought and differentiation between the questions.

3.  Text Comments Tell You How to Improve

So, structured data—that is, again, the aggregated responses from Likert-scale-type survey questions—won’t tell you how to improve. For example, a restaurant customer sat survey may help you identify a general problem area—food quality, service, value for the money, cleanliness, etc.—but the only thing that data will tell you is that you need to conduct more research.

For those who really do want to improve their business results, no other variable in the data can be used to predict actual customer behavior (and ultimately revenue) better than the free-form text response to the right open-ended question, because text comments enable customers to tell you exactly what they feel you need to hear.

4.  Why Most Customer Satisfaction or NPS Open-End Comment Questions Fail

Let’s assume your company appreciates the importance of customer experience management and you’ve invested in the latest text analytics software and sentiment tools. You’ve even shortened your survey because you recognize that the best and most predictive answers come from text questions and not from the structured data.

You’re all set, right? Wrong.

Unfortunately, we see a lot of clients make one final, common mistake that can be easily remedied. Specifically, they ask the recommended Net Promoter Score (NPS) or Overall Satisfaction (OSAT) open-end follow-up question: “Why did you give that rating?” And they ask only this question.

There’s nothing ostensibly wrong with this question, except that you get back what you ask for. So when you ask the 80% of customers who just gave you a positive rating why they gave you that rating, you will at best get a short positive comment about your business. The fewer than 10% who slammed you will certainly point to a problem area, but this gives you very little to work with other than a few pronounced problems that you probably already knew were important.

What you really need is information that you didn’t know and that will enable you to improve in a way that matters to customers and offers a competitive advantage.

An Easy Fix

The solution is actually quite simple: Ask a follow-up probe question like, “What, if anything, could we do better?”

This can then be text analyzed separately or, better yet, combined with the original comment question, which, as mentioned earlier, usually reads, “Why did you give the satisfaction score you gave?” and, due to the Poisson distribution in customer satisfaction, yields almost only positive comments with few ideas for improvement. Text analyzed together, this one-two question combination gives a far more complete picture of how customers view your company and how you can improve.

Final Tip: Make the comment question mandatory. Everyone should be able to answer this question, even if it means typing an “NA” in some rare cases.

Good luck!

Ps. To learn more about how OdinText can help you learn what really matters to your customers and predict real behavior,  please contact us or request a Free Demo here >


[NOTE: Tom H. C. Anderson is Founder of Next Generation Text Analytics software firm OdinText Inc. Click here for more Text Analytics Tips ]

Peaks and Valleys or Critical Moments Analysis

Text Analytics Tips: Peaks and Valleys (Critical Moments) Analysis, by Gosia


How can you gain interesting insights just from looking at descriptive charts based on your data?

  • Select a key metric of interest, like Overall Satisfaction (scale 1-5), and, using text analytics software that lets you plot text data as well as numeric data longitudinally (e.g., OdinText), view your metric averages across time.

  • Next, view the plot using different time intervals (e.g., daily, weekly, bi-weekly, or monthly overall satisfaction averages) and look for obvious “peaks” (sudden increases in the average score) or “valleys” (sudden decreases in the average score).

  • Note down the time periods in which you observed any peaks or valleys and try to identify reasons or events associated with these trends, e.g., changes in management, a new advertising campaign, customer service quality, etc.

  • Finally, plot average overall satisfaction scores for selected themes and see how they relate to the identified “peaks” or “valleys”, as these themes may provide potential answers to the critical moments in your longitudinal analysis.
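Outside of OdinText, the valley-spotting step can be sketched in a few lines. The daily averages here are made up, patterned on the example that follows, and the one-point drop threshold is an arbitrary choice:

```python
def find_valleys(daily_means, threshold=1.0):
    """Flag days whose average drops by more than `threshold` vs the prior day."""
    return [day for day in range(1, len(daily_means))
            if daily_means[day - 1] - daily_means[day] > threshold]

# Hypothetical daily overall-satisfaction averages
means = [5.0, 3.1, 3.5, 4.3, 4.2, 4.4, 4.3]
print(find_valleys(means))  # → [1]  (the drop of 1.9 points into day 2)
```

Peaks can be found symmetrically by flipping the sign of the comparison.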

In the figure below you can see how the average overall satisfaction of a sample company varied over approximately one month (each data point/column represents one day). Whereas no “peaks” were found in the average overall satisfaction curve, one significant “valley” was visible at the beginning of the studied month (see plot 1 in Figure 1): a sudden drop from an average satisfaction of 5.0 (day 1) to 3.1 (day 2) and 3.5 (day 3), before rising again and oscillating around an average satisfaction of 4.3 for the rest of the month. So what could be the reason for this sudden, deep drop in customer satisfaction?


Figure 1. Annotated OdinText screenshots showing an example of an exploratory analysis using longitudinal data (Overall Satisfaction).

Whereas a definite answer requires more advanced predictive analyses (also available in OdinText), a quick and very easy way to explore potential answers is possible simply by plotting the average satisfaction scores associated with a few themes identified earlier. In this sample scenario, average satisfaction scores among customers who mentioned “customer service” (green bar; second plot) overlap very well with the overall satisfaction trendline (orange line) suggesting that customer service complaints may have been the reason for lowered satisfaction ratings on days 2 and 3. Another theme plotted, “fast service” (see plot 3), did not at all follow the overall satisfaction trendline as customers mentioning this theme were highly satisfied almost on every day except day 6.

This kind of simple exploratory analysis can be very powerful in showing you what factors might have effects on customer satisfaction and may serve as a crucial step for subsequent quantitative analysis of your text and numeric data.




[NOTE: Gosia is a Data Scientist at OdinText Inc. Experienced in text mining and predictive analytics, she is a Ph.D. with extensive research experience in mass media’s influence on cognition, emotions, and behavior.  Please feel free to request additional information or an OdinText demo here.]

Brand Analytics Tips – How Old is Your Brand?

Text Analytics Tips: How Old Is Your Brand? Using OdinText on Brand-Mention Comment Data, by Tom H. C. Anderson

[METHODOLOGICAL NOTES (If you’re not a researcher, feel free to skip down to the ‘Brands by Age’ section below): In our first official Text Analytics Tips post, I’ve started by exploring one of the arguably simplest types of unstructured/text data there is: the unaided, top-of-mind ‘brand mention’ open-ended survey question. These kinds of questions are especially important to brand positioning, brand equity, brand loyalty and advertising effectiveness research. In this case we’ve allowed for more than one brand mention. The question reads, “Q. When you think of brand names, what company’s product or service brand names first come to mind? [Please name at least 5]”. The question was fielded to n=1,089 US Gen Pop representative survey respondents in the CriticalMix Panel in December 2015. The confidence interval is +/-2.9% at the 95% confidence level.]

Making Good Use of Comment Data Can Be Easy and Insightful

An interesting and rather unique way to look at your brand is to understand for whom it is most likely to be top-of-mind.

Unfortunately, though they have proven more accurate than structured choice or Likert scale rating questions in predicting actual behavior, free-form (open-ended) survey questions are rare due to the assumed difficulty of analyzing the results. Even when they are included in a survey and analyzed, results are rarely expressed in anything more useful than a simple frequency-ranked table (or worse, a word cloud). Thanks to OdinText’s unique patented approach to unstructured and structured data, analyzing this type of data is both fast and easy, and insights are limited only by the savviness of the analyst.

The core question asked here is rather simple, i.e., “When you think of brand names, what company’s product or service brand names first come to mind?” However, when you ask this question of over a thousand people, the sheer volume of brands mentioned (in our case, well over 500) can make even this ‘small data’ seem overwhelming.

The purpose of this post is to show just how easy, fast, and insightful the analysis of even this technically basic comment data can be using Next Generation Text Analytics™.

After uploading the data into OdinText, there are numerous ways to look at this comment data, not only the somewhat more obvious frequency counts, but also several other statistics including any interesting relationships to available structured data. Today we will be looking at how brand mentions are related to just one such variable, the age of the respondent. [Come back tomorrow and we may take a look at a few other statistics and variable relationships.]


Brands by Age

Below is a sortable list of the most frequently mentioned brands ranked by the average age of those mentioning said brand. This is a direct export from OdinText. The best way to think about lists like these is comparatively (i.e. how old is my brand vs. other brands?). If showing a table such as this in a presentation, I would highly recommend color coding, which can be done either in OdinText (depending on your version) or in Excel using the conditional formatting tool.
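For readers curious how such a table comes together, here is a minimal sketch of the underlying computation, mean respondent age (and its spread, used later in the post) per mentioned brand, using a few hypothetical responses rather than the actual survey data:

```python
from statistics import mean, stdev

def brand_age_table(responses):
    """responses: list of (age, [brands mentioned]) tuples.
    Returns (mean age, age std dev) per brand; std dev is None for a single mention."""
    ages_by_brand = {}
    for age, brands in responses:
        for brand in brands:
            ages_by_brand.setdefault(brand, []).append(age)
    return {b: (round(mean(a), 1), round(stdev(a), 1) if len(a) > 1 else None)
            for b, a in ages_by_brand.items()}

# Hypothetical respondents: (age, brands they named)
responses = [
    (66, ["Maxwell House", "Chrysler"]),
    (63, ["Chrysler", "Apple"]),
    (22, ["Apple", "Urban Outfitters"]),
    (27, ["Urban Outfitters", "Apple"]),
]
print(brand_age_table(responses))
```

Sorting the result by mean age reproduces the shape of the table below.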

[NOTE: For additional analytics notes and visualizations please scroll to the bottom of the table below]


Brand Name Average Age
Maxwell House 66
Hunts 66
Aspirin 66
Chrysler 64.6
Stouffers 63.7
Marie Callender's 63.7
Walgreen 63.7
Cooper (Mini) 63.7
Bayer 62.6
USAA 62.5
Epson 62.5
Brother 61.3
Aol 61.3
Comet 61.3
Snapple 61.3
Lowes 61.2
Marriott 60.3
Ritz 60.3
Hellman's 60.3
Ikea 60.3
Belk 60.3
State Farm 60.3
Oscar Mayer 60
Folgers 59.8
Libby's 59.8
Hormel 59.2
Depot 59.2
Heinz 59.2
Electric 59.2
Bordens 59.2
Nestles 59
Green Giant 59
Sargento 58.3
Del Monte 58
Prego 58
Kashi 58
Westinghouse 58
Stouffer 58
Taylor 58
Home Depot 57.6
Publix 57.5
Banquet (Frozen Dinners) 57.5
Buick 57
Krogers 57
Hellman's 57
Safeway 56.5
Purex 56.4
Hewlett 56.4
Unilever 56.1
RCA 56.1
Post 56.1
P&G 55.9
Budweiser 55.9
Yoplait 55.8
Chobani 55.7
Ragu 55.7
Campbell's 55.5
Wells Fargo 55.2
Hershey 55.1
Betty Crocker 55
Sharp 55
Hines 55
Trader Joe's 55
Palmolive 54.9
Kia 54.7
Lexus 54.7
Life 54.7
Hotpoint 54.7
Campbells 54.6
Oscar Mayer 54.5
Dial 54.4
Nissan 54.4
Hillshire Farms 54.3
Motorola 54.1
Keebler 54
CVS 53.8
Canon 53.8
Lakes 53.7
Pillsbury 53.3
Hilton 53.3
Faded Glory 53.3
Friskies 53.3
Duncan Hines 53.3
Puffs 53.3
Olay 52.8
Sketchers 52.5
Fred Meyer 52.5
Delta 52.5
Hunt 52.3
Bose 52.3
Ocean Spray 52.3
Ivory 52.3
Swanson 52.3
Dewalt 52.3
Firestone 51.8
Estee Lauder 51.5
Miller 51.5
Tide 51.4
Honda 51.3
Meijer 51.3
Perdue 51.3
Jeep 51.3
Head 51.3
Lee Jeans 51.3
Pantene 51
Chevrolet 51
Cannon 50.8
Chef Boyardee 50.8
Frito Lay 50.6
Avon 50.5
Motors 50.4
Kodak 50.4
General Mills 50.2
BMW 50
Lipton 49.8
Kohl's 49.8
Goodyear 49.7
Kraft 49.6
Craftsman 49.5
Sunbeam 49.4
IBM 49.3
Frigidare 49.1
Sears 49.1
Ford 49.1
Walgreens 49.1
Dole 49.1
Chevy 49
Wonder (Bread) 49
Dannon 49
JVC 49
Hyundai 49
Clinique 49
Marlboro 49
Mercedes 49
Gerber 49
Acme 49
Kleenex 48.8
Kelloggs 48.7
JC Penney 48.6
Louis Vuitton 48.5
Calvin 48.4
LL Bean 48.4
Gillette 48.4
Johnson & Johnson 48.3
Shell 48.3
Kenmore 48.1
Dawn 48
Hanes 48
Macdonalds 48
Tylenol 48
Colgate 47.5
Wrangler (Jeans) 47.3
Burger King 47.3
Whirlpool 47.1
GMC 47
Yahoo 46.9
Dish Network 46.8
Verizon 46.7
Hersheys 46.6
Whole Foods 46.5
Sara Lee 46.5
Hostess 46.5
Mazda 46.5
Toyota 46.4
Arm & Hammer 46.4
Nabisco 46.3
Tyson 46.1
Starbucks 46
Wal-Mart 45.9
Western Family 45.8
Wegmans 45.8
Dr Pepper 45.7
Hulu 45.7
Time Warner 45.7
Maybelline 45.7
MLB 45.7
Iams 45.7
Cox 45.7
Country Crock 45.7
Compaq 45.7
Sonoma 45.7
Quaker Oats 45.7
Nordstrom 45.4
Coca 45.3
Champion 45.3
Bass 45
Chrome 44.7
Coors 44.7
iPhone 44.6
Bounty 44.5
Dodge 44.4
Maytag 44.3
Black & Decker 44.2
Pfizer 44.2
Suave 44.2
HP 44
Scott 44
Subway 44
Skechers 44
Geico 44
Panasonic 43.9
Lays 43.8
KFC 43.8
Charmin 43.8
Dell 43.8
Polo 43.8
Windex 43.7
Burts Bees 43.5
Purina 43.5
Clorox 43.5
Columbia 43.3
Ralph Lauren 43.2
Visa 43.2
Pepsi 43
Crest 43
NFL 43
Sanyo 43
Dove 42.9
Intel 42.9
Wendy's 42.8
Kroger 42.8
Remington 42.3
Phillips 42.3
Mars 42.3
Cover Girl 42.3
Heb 42.3
Twitter 42.3
Amazon 42
Body Works 42
Best Buy 41.8
Costco 41.8
Banana Republic 41.8
Disney 41.7
Amway 41.7
Levi 41.5
Sony 41.4
Samsung 41.4
Macy's 41.1
Glade 41.1
Boost 41
Boost Mobile 41
Toshiba 40.8
Ebay 40.8
Comcast 40.7
Facebook 40.6
Walmart 40.5
Microsoft 40.5
Google 40.4
Kitchen 40.4
Nestle 39.8
Mcdonalds 39.5
Gucci 39.5
Vons 39.3
Philip Morris 39.3
Loreal 39.3
Mattel 39.1
Apple 39
Pepperidge Farm 39
Vizio 39
Lysol 39
Ugg 39
Tropicana 39
Sure 39
Fila 39
Tmobile 39
Coach 38.9
Acer 38.8
Tommy Hilfiger 38.6
Nike 38.1
Target 38
Old Navy 37.9
Chase 37.8
Michael Kors 37.7
K-Mart 37.5
Lenovo 37.5
Equate 37.2
Hoover 36.8
Under Armour 36.6
Windows 36.5
Asics 36.5
Kitchenaid 36.5
Victoria's Secret 36.2
Mac 36.1
Reebok 36.1
Android 36
Direct TV 36
Sprint 36
Netflix 35.9
Adidas 35.7
Citizen 35.7
New Balance 35.6
Guess 35.4
Bic 35.2
Great Value 35.2
Pizza Hut 35
Puma 34.9
Asus 34.4
Fox 34.3
Justice 34.3
North Face 34.1
Xbox 33.6
Gap 33.4
Doritos 33.4
HTC 33.4
Converse 33.3
Sprite 33.2
Febreeze 33
Axe 33
Kay 32.7
Glad 32.7
Mary Kay 32.7
Viva 32.7
Reese's 31.8
Lego 31.7
Amazon Prime 31.5
Nintendo 31.2
Vans 31.2
Taco Bell 31
Fisher Price 30.4
Chanel 29.7
Old Spice 29.7
Playstation 29.4
Eagle 29.4
Hamilton Beach 29.3
Footlocker 29.3
Pink 29.3
Swiffer 29.3
Timberlands 29.3
Naked Juice 29
Youtube 29
Bing 29
Air Jordans 28.4
Huggies 28.2
Aeropostale 27.7
Hollister 27.3
Prada 27.3
Carters 26.8
Kirkland 26.3
Forever 26.3
Aeropostle 26.3
Arizona 25.6
Pampers 24.5
Versace 24.5
Urban Outfitters 24.5


A few interesting points from the longer list of brands are:

The oldest brand, “Maxwell House Coffee”, has an average age of 66. (If anything, this mean age is actually conservative, as the age question gets coded as 66 for anyone answering that they are “65 or older”.) Choosing the mid-point to calculate the mean when the data are in numeric ranges, as is often the case with survey or customer entry form data, is a typical technique in OdinText.

The youngest brand on the list, “Urban Outfitters”, with an average age of 24, also probably skews even younger in actuality for a similar reason (as is standard in studies representative of the US general population, typically only adults aged 18+ are included in the research).

Dr Pepper is in the exact middle of our list (46 years old). Brands like Dr Pepper that sit in the middle (with an average age close to the upper range of Generation X) are of course popular not just among 46-year-olds, but are likely to be popular across a wider range of ages. A good example: Coca-Cola, also near the middle, mentioned by 156 people with an average age of 45, is pulling from both young and old. The most interesting thing, then, as is usual in almost any research, is comparative analysis. Where is Pepsi relative to Coke, for instance? As you might suspect, Pepsi does skew younger, but only somewhat, mentioned by 107 consumers yielding an average age of 43. As with most data, relative differences are often more valuable than specific values.

If there are any high-level category trends here related to age, it is that clothing brands like Urban Outfitters and Versace (both with the youngest average age, 24), Aeropostale (26), Forever 21 (ironically, with an average age of 26), and several others in the clothing retail category tend to skew very young. Snack foods, especially drinks like Arizona Iced Tea (25) and Naked Juice (29), as well as web properties (Bing and YouTube, both 29) and electronics (PlayStation at 29 and the slightly older Nintendo at 31, for example), are associated with a younger demographic on average.

In the middle-age group, other than products with a wide user base like major soda brands, anything related to the home, whether entertainment like Time Warner Cable or Hulu (both 45) or major retailers like Wegmans and Wal-Mart (also both 45), is likely to skew more middle-aged.

The scariest position for a brand manager is probably at the top of the list. With average ages of 66 for Maxwell House and Hunts and 64 for Stouffers and Marie Callender's, the question has got to be: who will replace my customer base when they die? What we see in the data is, in fact, a slight negative correlation between age and number of mentions.

Again, it’s often the comparative differences that are interesting to look at, and of course the variance. Take Coca-Cola vs. Pepsi, for instance: while their mean ages are surprisingly close at 45 and 43 respectively, looking at the variance associated with each gives us the spread (i.e., which brand is pulling from a broader demographic). Coca-Cola, with a standard deviation of 14.5 years, is pulling from a wider demographic than Pepsi, which has a standard deviation of 12.9 years. There are several ways to visualize these data and questions in OdinText, though some of our clients also like to use OdinText output in visualization software like Tableau, which can offer more visualization options but little to no text analytics capability.

Co-Occurrence (aka Market Basket Analysis)

Last but not least, it can be interesting to look at which brands are often mentioned together, either because they are head-to-head competitors going after the exact same customers or because they may be complementary (market basket analysis type opportunities, if you will). Brands that co-occur frequently (are mentioned by the same customers) and are not competitors may in fact represent interesting opportunities for ‘co-opetition’. You may have noticed more cross-category partnering on advertising recently, as marketers seem to be catching on to the value of joining forces in this manner. Below is one such visualization created using OdinText, with just the Top 20 brand mentions plotted in an x-y plot using multi-dimensional scaling (MDS) of brand-name co-occurrence.

Text Analytics of Brands with OdinText

Hope you enjoyed today’s discussion of a very simple text question and what can be done with it in OdinText. Come back again soon, as we will be giving more tips and mini analyses of interesting mixed data. In fact, if there is significant interest in today’s post, we could look at one or two other variables and how they relate to brand awareness comment data tomorrow.

Of course if you aren’t already using OdinText, please feel free to request a demo here.


OdinText Wins 2015 CASRO Research Award

CASRO Honors OdinText’s Innovative Next Generation Text Analytics Software at 40th Annual Conference. OdinText, a provider of cloud-based analytics software, today announced that its Next Generation Text Analytics software-as-a-service (SaaS) product has been awarded the Research Entrepreneur of the Year award by CASRO, an organization that represents more than 300 companies and market research operations.

The award honors organizations that—through the excellence of their work, professionalism of their practice, and integrity of their conduct— exemplify the best work in the research industry. The award also acknowledges an organization that has introduced a new direction or service to its research business portfolio and provides leading-edge and innovative services that expand traditional market, opinion, and social research.

Recognized for its patented SaaS technology, OdinText allows companies to analyze large amounts of unstructured and mixed data. OdinText can be used across various types of data including but not limited to survey research, email and telephone data, discussion board ratings, and news articles.

“At OdinText, we don’t see a difference between structured and unstructured data - text mining and data mining – they are far more meaningful together,” said Tom H. C. Anderson, CEO of OdinText. “We are honored to be recognized by CASRO, an organization that has such a long history of championing innovative and sound research techniques.”

In addition to exploring patterns in the data and allowing users to confirm hypotheses, OdinText suggests key relationships in the data that may be overlooked by the user. The software also allows for one-step simulation and predictive analytics.

“Marketing research is evolving, getting both broader and deeper in terms of skill sets needed to succeed,” said Jim DeMarco, vice president of business intelligence and analytics at FreshDirect. “OdinText provides researchers with the capability to access more advanced analysis quicker and helps the business they work on gain an information advantage. This is exactly the kind of innovation our industry needs right now.”

The Coca-Cola Company, as well as the online grocer FreshDirect, sponsored OdinText’s nomination, and the company received the award, along with a $5,000 prize, at CASRO’s 40th Annual Conference.

“The work of OdinText is indicative of the exciting new methodologies and technologies which are having an increased influence on our changing industry,” said Diane Bowers, president of CASRO. “Acknowledgement of this type of work and the financial support that accompanied this honor highlights our role as a leader in the future of our industry.”


About OdinText Inc. OdinText’s Next Generation Text Analytics™ turns market researchers into data scientists. The powerful cloud-based software helps users discover patterns and trends in complex unstructured text data. Visit to learn more or schedule a demo. Backed by Connecticut Innovations and private investors, OdinText is a privately-held company based in Stamford, Conn. Request more information here.

Mr. Big Data vs. Mr. Text Analytics

[Interview re-posted w/ permission from Text Analytics News]

Mr. Big Data & Mr. Text Analytics Weigh In on Structured vs. Unstructured Big Data



If you pay attention to Big Data news, you’re sure to have heard of Kirk Borne, whose well-respected views on the changing landscape are often shared on social media. Kirk is a professor of Astrophysics and Computational Science at George Mason University. He has published over 200 articles and given over 200 invited talks at conferences and universities worldwide. He serves on several national and international advisory boards and journal editorial boards related to big data.




Tom H. C. Anderson was an early champion of applied text analytics, and gives over 20 conference talks on the topic each year, as well as lecturing at Columbia Business School and other universities. In 2007 he founded the Next Gen Market Research community online, where over 20,000 researchers frequently share their experiences. Tom is founder of Anderson Analytics, developers of the text analytics software-as-a-service OdinText. He serves on the American Marketing Association’s Insights Council and was the first proponent of natural language processing in the marketing research/consumer insights field.


Ahead of the Text Analytics Summit West 2014, Data Driven Business caught up with them to gain perspectives on just how important and interlinked Big Data is with Text Analytics.


Q1. What was the biggest hurdle that you had to overcome in order to reach your current level of achievement with Big Data Analytics?

KB: The biggest hurdle for me has consistently been cultural -- i.e., convincing others in the organization that big data analytics is not "business as usual", that the opportunities and potential for new discoveries, new insights, new products, and new ways of engaging our stakeholders (whether in business, or education, or government) through big data analytics are now enormous.

After I accepted the fact that the most likely way for people to change their viewpoint is for them to observe demonstrated proof of these big claims, I decided to focus less on trying to sell the idea and focus more on reaching my own goals and achievements with big data analytics. After making that decision, I never looked back -- whatever successes that I have achieved, they are now influencing and changing people, and I am no longer waiting for the culture to change.

THCA: There are technical/tactical hurdles, and methodological ones. The technical scale/speed ones were relatively easy to deal with once we started building our own software OdinText. Computing power continues to increase, and the rest is really about optimizing code.

The methodological hurdles are far more challenging. It’s relatively easy to look at what others have done, or even to come up with new ideas. But you do have to be willing to experiment, and more than just willingness, you need to have the time and the data to do it! There is a lot of software coming out of academia now. They like to mention their institution in every other sentence, “MIT this” or “UCLA that”. The problem they face is twofold. First, they don’t have access to enough real data to see if their theories play out. Second, they don’t have the real-world business experience and access to clients to know which things are actually useful and which are just novelty.

So, our biggest hurdle has been the time and effort invested through empirical testing. It hasn’t always been easy, but it’s put me and my company in an incredibly unique position.

Q2. Size of data, does it really matter? How much data is too little or too much?

THCA: Great question; with text analytics, size really does matter. While it’s technically possible to get insights from very small data (for instance, on our blog during the elections one of my colleagues did a little analysis of Romney vs. Obama debate transcripts), text analytics really is data mining, and when you’re looking for patterns in text, the more data you have, the more interesting the relationships you can find.

KB: Size of data doesn't really matter if you are just getting started. You should get busy with analytics regardless of how little data you have. The important thing is to identify what you need (new skills, technologies, processes, and data-oriented business objectives) in order to take advantage of your digital resources and data streams. As you become increasingly comfortable with those, then you will grow in confidence to step up your game with bigger data sets. If you are already confident and ready-to-go, then go! The big data revolution is like a hyper-speed train -- you cannot wait for it to stop in order to get on board -- it isn't stopping or slowing down! At the other extreme, we do have to wonder if there is such a thing as too much data. The answer to this question is "yes" if we dive into big data's deep waters blindly without the appropriate "swimming instruction" (i.e., without the appropriate skills, technologies, processes, and data-oriented business objectives). However, with the right preparations, we can take advantage of the fact that bigger data collections enable a greater depth of discovery, insight, and data-driven decision support than ever before imagined.

Q3. What is the one thing that motivates and inspires you the most in your Big Data Analytics work?

KB: Discovery! As a scientist, I was born curious. I am motivated and inspired to ask questions, to seek answers, to contemplate what it all means, and then to ask more questions. The rewards from these labors are the discoveries that are made along the way. In data analytics, the discoveries may be represented by a surprising unexpected pattern, trend, association, correlation, event, or outlier in the data set. That discovery then becomes an intellectual challenge (that I love): What does it mean? What new understanding does this discovery reveal about the domain of study (whether it is astrophysics, or retail business, or national security, or healthcare, or climate, or social, or whatever)? The discovery and the corresponding understanding are the benefits of all the hard work of data wrangling.

THCA: Anyone working with analytics has to be curious by nature. Satisfying that curiosity is what drives us. More specifically in my case, if our clients get excited about using our software and the insights they’ve uncovered, then that really gets me and my whole team excited. This can be challenging, and not all data is created equal.

It can be hard to tell someone who is excited about trying Text Analytics that their data really isn’t suitable. The opposite is even more frustrating though, knowing that a client has some really interesting data but is apprehensive about trying something new because they have some old tools lying around that they haven’t used, or because they have a difficult time getting access to the data because it’s technically “owned” by some other department that doesn’t ‘Get’ analytics. But helping them build a case and then helping them look good by making data useful to the organization really feeds into that basic curiosity. We often discover problems to solve we had no idea existed. And that’s very inspiring and rewarding.

Q4. Which big data analytics myth would you like to squash right here and now?

KB: Big data is not about data volume! That is the biggest myth and red herring in the business of big data analytics. Some people say that "we have always had big data", referring to the fact that each new generation has more data than the previous generation's tools and technologies are able to handle. By this reasoning, even the ancient Romans had big data, following their first census of the known world. But that's crazy. The truth of big data analytics is that we are now studying, measuring, tracking, and analyzing just about everything through digital signals (whether it is social media, or surveillance, or satellites, or drones, or scientific instruments, or web logs, or machine logs, or whatever). Big data really is "everything, quantified and tracked". This reality is producing enormously huge data volumes, but the real power of big data analytics is in "whole population analysis", signaling a new era in analytics: the "end of demographics", the diminished use of small samples, the "segment of one", and a new era of personalization. We have moved beyond mere descriptive analysis, to predictive, prescriptive, and cognitive analytics.

THCA: Tough one. There are quite a few. I’ll avoid picking on “social media listening” for a bit and pick something else. One of the myths out there is that you have to be some sort of know-it-all ‘data scientist’ to leverage big data. This is no longer the truth. Along with this there is a lot of dropping of buzzwords like “natural language processing” or “machine learning”, which really don’t mean anything at all.

If you understand smaller data analytics, then there really is no reason at all that you shouldn’t understand big data analytics. Don’t ever let someone use some buzz word that you’re not sure of to impress you. If they can’t explain to you in layman’s terms exactly how a certain software works or how exactly an analysis is done and what the real business benefit is, then you can be pretty sure they don’t actually have the experience you’re looking for and are trying to hide this fact.

Q5.What’s more important/valuable, structured or unstructured data?

KB: Someone said recently that there is no such thing as unstructured data. Even binary-encoded images or videos are structured. Even free text and sentences (like this one) are structured (through the rules of language and grammar). Even some meaning this sentence has. One could say that analytics is the process of extracting order, meaning, and understanding from data. That process is made easier when the data are organized into databases (tables with rows and columns), but the importance and value of the data are inherently no more or no less for structured or unstructured data. Despite these comments, I should say that the world is increasingly generating and collecting more "unstructured data" (text, voice, video, audio) than "structured data" (data stored in database tables). So, in that sense, "unstructured data" is more important and valuable, simply because it provides a greater signal on the pulse of the world. But I now return to my initial point: to derive the most value from these data sources, they need to be analyzed and mined for the patterns, trends, associations, correlations, events, and outliers that they contain. In performing that analysis, we are converting the inherent knowledge encoded in the data from a "byte format" to a "structured information format". At that point, all data really become structured.

THCA: A trick question. We all begin with a question and relatively unstructured data. The goal of text analytics is structuring that data which is often most unstructured.

That said, based on the data we often look at (voice of customer surveys, call center and email data, various other web based data), I’ve personally seen that the unstructured text data is usually far richer. I say that because we can usually take that unstructured data and accurately predict/calculate any of the available structured data metrics from it. On the other hand, the unstructured data usually contain a lot of additional information not previously available in the structured data. So unlocking this richer unstructured data allows us to understand systems and processes much better than before and allows us to build far more accurate models.

So yes, unstructured/text data is more valuable, sorry.

Q6. What do you think is the biggest difference between big data analysis being done in academia vs in business?

KB: Perhaps the biggest difference is that data analysis in academia is focused on design (research), while business is focused on development (applications). In academia, we are designing (and testing) the optimal algorithm, the most effective technique, the most efficient methodology, and the most novel idea. In business, you might be 100% satisfied to apply all of those academic results to your business objectives, to develop products and services, without trying to come up with a new theory or algorithm. Nevertheless, I am actually seeing more and more convergence (though that might be because I am personally engaged in both places through my academic and consulting activities). I see convergence in the sense that I see businesses who are willing to investigate, design, and test new ideas and approaches (those projects are often led by data scientists), and I see academics who are willing to apply their ideas in the marketplace (as evidenced by the large number of big data analytics startups with university professors in data science leadership positions). The data "scientist" job category should imply that some research, discovery, design, modeling, and hypothesis generation and testing are part of that person's duties and responsibilities. Of course, in business, the data science project must also address a business objective that serves the business needs (revenue, sales, customer engagement, etc.), whereas in academia the objective is often a research paper, or a conference presentation, or an educational experience. Despite those distinctions, data scientists on both sides of the academia-business boundary are now performing similar big data analyses and investigations. Boundary crossing is the new normal, and that's a very good thing.

THCA: I kind of answered that in the first question. I think academics have the freedom and time to pursue a research objective even if it doesn’t have an important real-world outcome. So they can pick something fun that may or may not be very useful, such as: are people happier on Tuesdays or Wednesdays? They’ll often try to solve these stated objectives in some clever ways (hopefully), though there’s a lot of “pop” research going on even in academia these days. They are also often limited in the data available to them, having to work with just a single data set that has somehow become available to them.

So, academia is different in that they raise some interesting fun questions, and sometimes the ideas borne out of their research can be applied to business.

Professional researchers have to prove an ROI in terms of time and money. Of course, technically we also have access to both more time and more money, and also a lot more data. So an academic team of researchers working on text analytics for 2-3 years is not going to be exposed to nearly as much data as a professional team.

That’s also why academic researchers often seem so in love with their models and accuracy. If you only have a single data set to work with, then you split it in half and use one half for validation. In business, on the other hand, if you are working across industries like we do, while we certainly may build and validate models for a specific client, we know that having a model that works across companies or industries is nearly impossible. But when we do find something that works, you can bet it’s going to be more likely to be useful.
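The split-half validation mentioned above can be sketched as follows. The data and the threshold "model" here are hypothetical stand-ins; a real analysis would fit an actual model on the training half and score it on the holdout half:

```python
import random

random.seed(42)

# Hypothetical labeled dataset: (feature, label) pairs,
# where the true rule is simply "feature exceeds 50"
data = [(x, x > 50) for x in range(100)]

# Shuffle, then split in half: one half for training, one for validation
random.shuffle(data)
half = len(data) // 2
train, holdout = data[:half], data[half:]

# A trivial threshold "model" learned from the training half:
# predict True when the feature exceeds the training mean
threshold = sum(x for x, _ in train) / len(train)
correct = sum((x > threshold) == y for x, y in holdout)
print(f"holdout accuracy: {correct / len(holdout):.2f}")
```

The key point is that the holdout half never influences the threshold, so the reported accuracy estimates how the model would do on data it has not seen.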

Top 10 Big Data Analytics Tips

As part of the interview series leading up to the Useful Business Analytics Summit, today we post the top 10 tips from our analytics experts. Whether you data mine more structured data or, like me, more often work with unstructured or mixed data using text analytics, I think you’ll agree that the following 10 tips are critical.

  1. Keep It [ridiculously] Simple (10 times more so than is necessary to get your point across).
  2. Hypothesize/Put Problem First
  3. Don’t Assume Data is Good – Check/Validate!
  4. Automate repeat tasks & Carve out time to go exploring
  5. Set a Data Strategy – don’t just collect data for the sake of collecting it
  6. In a rapidly expanding field, work with people on the leading edge
  7. Be a Skeptic about models etc.
  8. Look for the pragmatic and cost effective solutions
  9. Don’t torture Data – in the end it will confess
  10. Think like a Business Owner – what would you like to know?

Below are more detailed tips from some of our client experts. We’d love to hear your tips if you’ve got one to add in the comments section.



Honestly, I think I’d boil it down to a single tip that is more important than all others, in my experience, but is the one most ignored and poorly executed. Keep it simple. Ridiculously simple. Ten times simpler than what you think necessary. Just about then, you are actually getting your point across in a way that people are starting to follow you. You can always increase the complexity from there, but the first time you have that experience and realize that you’ve actually conveyed a complex analytical presentation to a group of C-suite execs, you’ll understand what you’ve been doing wrong this whole time. Hint – those head nods and blank stares aren’t what you are looking for…


- Understand that any problem is easier if you approach it correctly; don't necessarily take a cookie-cutter approach. Conventional wisdom is not so wise in a rapidly evolving field.

- Work with people who are able to work on the leading edge ...the people who are helping expand the envelope.



Automate anything you do more than once. It’s very easy to fill your time with routine pulls of data which lie just beyond the reach of the visualization tools available to business stakeholders. You can’t ignore these requests and it frankly feels great for us geeks to bask in the gratitude of camera-ready cool kids, but these tasks may not represent the highest-value use of your time. The more experience you have with the data, the more likely you are to be the only person with eyes on a particular business problem. So carve out time to go exploring. Think entrepreneurially like a business owner, and ask yourself “if I owned this P&L, what would I want to know?”


- Ensure there is a purpose you understand for why analytics is valuable to the organization. Purpose can come from a business sponsor, like discovering new ways (i.e., products, markets, etc.) to increase revenue, retention, or profit, or to control costs. So ask the tough questions and align with executives' mandates.

- Ensure clarity around the level of effort you spend gathering data vs. designing experiments, mining, and analyzing data. The need/urge to have data to accomplish a specific task can lead to a disparate/disjointed data gathering and management effort that can take over the data scientist's or analytics professional's work, and analytics can become an afterthought. So be a sponsor or an advocate for a data strategy.


1) Don't assume the data is good. Is the data lineage (with transformation rules) exposed? Is data quality measured and reportable as a trend?

2) Hypothesize and/or uncover non-time-based relationships: These are usually the richest.



Double check your results using data from different sources

Make sure it makes sense

In case of discrepancies use it directionally

Reach out to experts to obtain their opinion



1. Think of the broader perspective. Take a step back. Understand the business and the problem before jumping into solutions.

2. Be an analyst: Adopt a critical approach to thinking about all analytical problems. There is nothing wrong with a slight dose of skepticism about models and results. It is healthy.

3. Try to find pragmatic and cost-effective models/solutions. For example, you can probably use machine learning and neural networks to solve a lot of problems, but a linear regression might sometimes be enough.



 1. Be humble: sometimes data tells us nothing or, worse, will lie to us. Cognitive dissonance is the norm rather than the exception.

2. If you torture data it will confess to any sins (attributed to Frank Harrell).

3. Go ahead, ask questions, be curious, don't be afraid to cross cultures.


Big thanks again to our client side analytics experts. Feel free to check out our previous questions on Big Data and How to Keep Up on Analytics. Don’t forget to check back in for our next question about the value of various types of data… Look forward to seeing you at the Summit!




[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, developers of a patented Next Generation approach to text analytics known as OdinText. For more information and to inquire about software licensing visit OdinText INFO Request.]

Analytics - Keeping Up to Date

Today I asked a few of the Useful Business Analytics Summit speakers how they keep up to date in their fields.


The answers range from the more common and expected, such as conferences, blogs and books, to the somewhat more novel, such as Coursera and MIT OpenCourseWare. LinkedIn groups were also mentioned as an excellent resource, and I’m sure many Next Gen Market Researchers would agree.

One of our speakers, Jonathan from Travelocity, also brought up some more specific resources such as Tnooz and Skift. Though I do a lot of work in the travel and hospitality industry, these were new to me. I’d love to hear what sources you find useful in the comments section.


Of course there are plenty of amazing resources internally within the company for us. A lot of people on our team also take targeted classes at Stanford and Berkeley. There is also MIT OpenCourseWare for math and computer science, which is very useful.


It might not be the most orthodox answer, but for me I’ve found switching jobs & industries somewhat frequently has kept me more informed on the latest analytical trends and tools more than anything else.

More conventionally, if you work in a larger city – and you don’t already have a network of friends who do the same work you do, take advantage of the endless opportunities for after-work cocktails and meetup events. Sometimes it’s as simple as a short conversation where you bring up your challenge that day -- perhaps it's how to collect names on a Facebook contest that is going live the next day, and the person who you just clinked glasses with suggests a vendor and saves you hours of research and vetting.


I don’t absorb new ideas deeply unless I can really dig into them and get my hands dirty, so I’ve been taking advantage of the Coursera Big Data courses, which are outstanding. LinkedIn has dozens of interest groups focused on data/analytics, which also have the advantages of connecting you with people who can advance your career and (if you participate constructively) raising your visibility to recruiters. In the travel industry generally, Tnooz and Skift are useful.


Partnerships with vendors, participation in financial services social networks (e.g., on LinkedIn) and forums, and summits like the one I am attending on June 10 and 11 – the Useful Business Analytics Summit.



Talking with leading practitioners, because it's not enough to keep up with the latest approaches; you have to understand ways to implement those approaches that give you most of the value for a fraction of the work ...the 80/20 rule.


Primary and secondary research on market trends, future intentions and competitive intelligence.

Primary research has to be performed in both channels: off-line and on-line

Internal financial performance analytics, customer database analytics.

It has to be a combination of tools!


Attending conferences gets me thinking out of my norm, which promotes more lateral thinking and helps identify holes in my linear thinking.


Blogs and books coupled with interesting problems to solve. I also like to follow what a few of the titans in my fields are working on.


Thanks again to all the speakers for their thoughtful answers. In case you missed it, yesterday’s question was on Big Data. Stop by next week for our speakers list of top tips for analytics professionals!

@TomHCAnderson @OdinText



[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, developers of a patented Next Generation approach to text analytics known as OdinText. For more information and to inquire about software licensing visit OdinText INFO Request.]

What We’ve Got All Wrong About Big Data

Analytics Experts on Big Data Misconceptions: Big Data isn’t difficult, it isn’t expensive, but it does require thinking!


As mentioned yesterday, in preparation for the Useful Business Analytics Summit, and so that fellow attendees will know more about each other and the event, I’ll be posting a series of questions related to analytics here on the blog. Today’s question is about Big Data:






Eight of our speakers responded to today’s question below. While answers are varied I agree with several of the thoughts here. Big Data does not have to be as difficult or expensive as some seem to believe. However, useful analysis certainly does require serious thinking regardless of data size.



The biggest misconception to me is that it has to be complex; that to extract meaningful insights from big data requires complex modeling, etc. In reality, I’ve found that the most successful ways to leverage big data and show its value in the workplace are no different from typical relational datasets. You still need to understand what sort of story you want to be able to tell and condense the analysis you perform to answer those questions. The data is complex enough, there’s no real reason to over-complicate (especially at first) with models and algorithms that your senior leadership team likely won’t understand. Start simple and build credibility for your practice.



By and large, Big Data means something different to everyone, which means that one person's conception is another's misconception.



That it is nothing new, simply a linear extension of the historical trend of data storage getting cheaper and cheaper. And the troubling thing is, this line comes from some of the wisest and best-qualified elder statesmen of the data world. In reality, data storage is an order of magnitude cheaper than it’s ever been before, and ordinary companies with ordinary budgets can now store and retrieve essentially the entire digital histories of their target clients from Hadoop clusters, and slice the data for insights using freeware R, without shelling out $50K for a SAS license. In reality, all hype aside, this is an authentic sea change…a clean break from how things were done before.



While I have heard and read many misconceptions about big data (like it being a new concept, or very expensive, or an IT or a Hadoop thing), I feel the biggest misconception is that it’s just a tool or a technology of some sort. I believe that big data is a discipline and a practice that requires, very much like data management, a combined and specialized set of related processes, technologies and people to make it happen.



Big is a relative term. Fifteen years ago, people thought POS data was big enough to challenge the software/hardware/methods in practice. Then came user web navigation data, and Google developed non-tabular methods to extract value from it. Now we have much vaster amounts of data generated by machines without human intervention, such as RFID tag data. What has changed is the extent to which companies and organizations must leverage this information in order not to fail.



That big data is new; in fact, it has always existed, it’s just getting bigger due to the digital world.


That it has a specific format; in fact, it might be in any format.


That there are many effective ways to use it; in fact, most Fortune 500 companies are only taking advantage of limited data.



The biggest misconception is what the word or the concept 'Big Data' itself means. It is both more difficult and less difficult than people think. Essentially, to do this right, a piece of software or a solution alone will not solve the problem. You need to re-think your approach to collecting data, structuring the data, and processing or analyzing the data. It will not magically solve all your business problems, it will require a lot of upfront work/investment, and at the end of the day it might not be the right solution for your business.



I can identify at least three misconceptions:


1. The term "Big". By focusing on the size of data, something challenging in and of itself since big is relative, we are diverting our attention from the problems and business challenges we are trying to address. Let's instead focus on problems first and determine how and what data can help us second, be it big, small or tiny.


2. The delusion that bigger = better. Usually this is not the case, because the ability to extract a valid signal from data depends on a lot more than just size. Big often brings other methodological challenges that, to this day, we don't know how to solve.


3. The illusion that one is working with the whole population, which leads practitioners to ignore more fundamental quality and validity issues. This is especially true for things like selection bias, measurement errors, or plain nonsensical relationships.
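The third point can be illustrated with a quick simulation. The numbers below are invented for illustration: a hypothetical population in which satisfied customers post online far less often than dissatisfied ones. The point it demonstrates is the speaker's: collecting vastly more data does not remove selection bias.

```python
import random

random.seed(0)

# Hypothetical population: 30% of customers are satisfied overall, but
# satisfied customers are far less likely to post about it online.
def simulate(sample_size):
    """Estimate the satisfied share among customers who post online."""
    observed = []
    while len(observed) < sample_size:
        satisfied = random.random() < 0.30
        # Selection bias: dissatisfied customers post five times more often.
        posts = random.random() < (0.10 if satisfied else 0.50)
        if posts:
            observed.append(satisfied)
    return sum(observed) / len(observed)

small = simulate(1_000)
big = simulate(1_000_000)
# Both estimates land near 0.08, far from the true 0.30 -- a thousand
# times more data, and the bias is exactly as large as before.
print(f"n=1,000:     {small:.2f}")
print(f"n=1,000,000: {big:.2f}")
```

The expected value among posters is 0.3×0.1 / (0.3×0.1 + 0.7×0.5) ≈ 0.08 regardless of sample size; only a correction for the sampling mechanism, not more volume, recovers the true 0.30.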


There's no question that big data is here to stay. Nonetheless, we need to stop discriminating data by its size. Can we just call it data?


A big thank you to Alex Uher at L'Oréal Paris, Jonathan Isernhagen at Travelocity, Farouk Ferchichi from Toyota, Larry Shiller from Yale, Anthony Palella at Angie's List, Sofia Freyder from MasterCard, Thomas Speidel from Suncor Energy, and Deepak Tiwari from Google, who answered the Big Data question today.

I’d love to hear in the comment section whether you agree or disagree with our speakers. I look forward to the possibility of meeting you at the event!

Check back over the next few days as I pose more questions to our esteemed speakers.


@TomHCanderson @OdinText



[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, developers of a patented Next Generation approach to text analytics known as OdinText. For more information and to inquire about software licensing visit OdinText INFO Request.]

Text Analytics World Interview

The Future Directions for Text Analytics [Text Analytics World Pre-Conference Interview with Tom H. C. Anderson, CEO of Anderson Analytics – OdinText, and Jeremy Bentley, CEO of Smart Logic. April 2, 2013 Q&A reposted with permission from Text Analytics World]

We asked two leading text analytics experts, Tom Anderson of Anderson Analytics – OdinText and Jeremy Bentley of Smart Logic, what their take was on some possible future directions for the field. Their answers are shown below:

Tom Reamy: What do you see as the major trends in text analytics in the next year or two?

Tom Anderson: Realizing that customization is key. I think we’re only at the tip of the iceberg. It’s great that we’re finally starting to leverage all the data (CRM, survey, etc.) that we’ve spent so much time and money collecting and storing. But over the next two years I predict we’ll be using it in several other areas that are hard for us to foresee now.

Tom Reamy: What are the problems and issues that are slowing down the field?

Tom Anderson: The infatuation with “social media monitoring,” which really is mainly “Twitter monitoring.” Until the walled gardens around Facebook and LinkedIn data come down (I’ve been waiting and waiting), there is limited usefulness in this area, and we may be better off concentrating more of our efforts elsewhere. As clients start realizing they’re just listening to the 8% of the population on Twitter or blogs, who often are somewhat different from typical customers, they begin to question the ROI here.

The reason this can be problematic is that clients are so wrapped up in thinking that they need to listen to “what people are saying about us on the Internet” that they don’t think about all the valuable data sources text analytics companies can help them with today.

For instance, many are already paying a lot of money to field incoming customer calls and emails and to store this data, yet they don’t take the time to listen to what these very real customers are saying.

This, in my opinion, is hindering the advancement of text analytics in some ways. The focus needs to be broader.

Tom Reamy: What new technologies and developments in text analytics or related fields (predictive analytics, machine learning, artificial intelligence, etc.) do you see or want to see in the next year or two?

Tom Anderson: I think data visualization today is incredibly poor. I can’t believe many of our competitors in the text analytics field still offer simple “word clouds” as output.

At the same time, I think clients have to realize that data visualization techniques are generally best used as exploration tools, not as a one-click export to a management-level PowerPoint slide.

There is currently an opportunity in finding the best ways to communicate insights from text analytics. Having powerful software and the right data is half the battle, but we also need more creative analysts who understand the respective business and data and who can communicate the findings effectively. This is more a problem of a shortage of good analysts with the time to use these tools than a need for additional technology.
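As a minimal sketch of exploration that goes beyond the word clouds criticized above, one could compare relative term frequencies between customer segments rather than staring at one cloud per corpus. The verbatims, stopword list, and promoter/detractor split below are all invented for illustration; this is not how any particular product does it.

```python
from collections import Counter

# Hypothetical customer verbatims, split into two segments (invented data).
promoters = [
    "love the fast checkout and friendly staff",
    "friendly staff and great prices",
    "fast shipping love the prices",
]
detractors = [
    "checkout was slow and the app kept crashing",
    "app crashing again slow support",
    "slow shipping and no support response",
]

STOPWORDS = {"the", "and", "was", "no", "a", "again"}

def term_freq(docs):
    """Relative frequency of each non-stopword term across a document set."""
    words = (w for doc in docs for w in doc.split() if w not in STOPWORDS)
    counts = Counter(words)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

p, d = term_freq(promoters), term_freq(detractors)

# Rank terms by the gap in relative frequency between segments -- a direct,
# comparable measure that a word cloud of either corpus by itself hides.
terms = sorted(set(p) | set(d), key=lambda w: d.get(w, 0) - p.get(w, 0), reverse=True)
for w in terms[:3]:
    print(f"{w}: detractors {d.get(w, 0):.2f} vs promoters {p.get(w, 0):.2f}")
```

On this toy data, "slow" tops the list because it appears in every detractor comment and no promoter comment, which is exactly the kind of contrast a single word cloud obscures.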

Tom Reamy: Do you see any revolutionary changes for text analytics on the horizon?

Tom Anderson: Yes, what I’ve been talking about a lot is domain expertise. OdinText, for instance, is focused on the use of text analytics for consumer insights. That is a very different thing from using text analytics for engaging with users on Twitter, or for detecting terrorists or fraud. Each of these requires special knowledge and its own rule and code modifications.

I think there will be fewer “enterprise” as well as “Twitter monitoring” firms, and a lot more domain- and industry-specific text analytics tools and firms.

Also, this technology will be incorporated by most of the companies that own sizeable amounts of unstructured data, so there will be more licensing and more acquisitions going on.

Tom Reamy: Is there anything else you would like to say about the future of text analytics?

Tom Anderson: I’m so glad I got into text analytics as early as I did. It’s still in its infancy, not in terms of its power or what we can already do with it, but in terms of adoption and of creatively thinking about how to leverage it in different ways. Very exciting times ahead!


Tom Reamy: What do you see as the major trends in text analytics in the next year or two?

Jeremy Bentley: To borrow from Big Data parlance, velocity, volume, and variety mean text analytics in real time, over a lot of data, in different formats and from different places. Content Intelligence (which includes text analytics) brings structure to unstructured information so it can be joined with the data world. Data tells you what happened, and content tells you why. Associating the what with the why is the major requirement for organizations that protect, value, and make money from their information.
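Associating the "what" with the "why" can be sketched as a simple join between structured records and text-derived themes. Everything below (customer IDs, fields, theme labels, and the assumed upstream text-analytics step that produced them) is invented for illustration:

```python
# Structured data records *what* happened per customer (hypothetical rows).
transactions = [
    {"customer": 101, "churned": True},
    {"customer": 102, "churned": False},
    {"customer": 103, "churned": True},
]

# Assumed output of a text-analytics step: one dominant theme per customer,
# derived from that customer's support emails or call transcripts.
text_themes = {101: "billing complaint", 102: "praise", 103: "billing complaint"}

# Join the two worlds on the customer key.
joined = [
    {**row, "theme": text_themes.get(row["customer"], "unknown")}
    for row in transactions
]

# Now the "why" can be cross-tabulated against the "what".
churn_themes = [r["theme"] for r in joined if r["churned"]]
print(churn_themes)  # both churned customers share the "billing complaint" theme
```

The join itself is trivial; the hard part, as Bentley notes, is producing reliable structure (the theme column) from unstructured content in the first place.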

Tom Reamy: What are the problems and issues that are slowing down the field?

Jeremy Bentley: The reality check that content today is not clean, properly managed, or sufficiently findable. Information overload (the often-cited big issue) is nothing but a filter problem: the filter parameters are simply not present in current information management systems such as CMSs, ERDMSs, and search engines. The gritty and unglamorous task of metadata management, and the automatic application of whatever metadata is needed for a particular view of the content at a particular point in time, must first be recognized. Once addressed, content becomes processable and valuable.
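One way to read the filter-problem argument: once metadata is applied automatically, any "view" of the content reduces to a metadata filter. A minimal sketch, assuming simple hand-written tagging rules and invented documents (real systems would use far richer classification than keyword regexes):

```python
import re

# Hypothetical tagging rules: label -> pattern that assigns that label.
RULES = {
    "invoice": re.compile(r"\binvoice|amount due\b", re.IGNORECASE),
    "contract": re.compile(r"\bagreement|hereinafter\b", re.IGNORECASE),
}

def tag(doc):
    """Automatically apply metadata: every label whose rule matches the text."""
    return {label for label, pattern in RULES.items() if pattern.search(doc)}

docs = [
    "Invoice #44: amount due by March 1",
    "This Agreement is made between the parties, hereinafter the Supplier",
    "Lunch menu for Friday",
]

tagged = [(d, tag(d)) for d in docs]

# A "view" of the content is now just a metadata filter.
invoices = [d for d, labels in tagged if "invoice" in labels]
print(invoices)
```

The untagged lunch menu simply falls out of every view, which is the filtering that keyword search alone, without applied metadata, cannot do reliably.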

Tom Reamy: What new technologies and developments in text analytics or related fields (predictive analytics, machine learning, artificial intelligence, etc.) do you see or want to see in the next year or two?

Jeremy Bentley: There is a balance to be drawn between what is fully automatic and what requires some human oversight. Classification and text analysis should be fully automatic, while the methods and rules used to drive the analysis should be subject to user oversight. Machine learning and AI have a role to play in the latter: as software becomes more sophisticated, the effort needed to achieve quality analytics and metadata derivation will go down.

Tom Reamy: Do you see any revolutionary changes for text analytics on the horizon?

Jeremy Bentley: Most users see text analytics as pretty cutting edge as it is, so for this question we have to widen the scope from text to content, in all of its forms, to see where the revolution comes from.

Content Intelligence for Big Data will revolutionize how organizations use their information to gain insight and competitive advantage. This is already happening in forward-thinking enterprises; increasingly, it will not just be the larger organizations that benefit from such an approach.

Tom Reamy: Is there anything else you would like to say about the future of text analytics?

Jeremy Bentley: Being able to process content as we do data in a database will seem standard in a decade’s time.