Posts tagged demographics
Can Text Analytics Shed Light On Trump's Appeal?

From tacit to more explicit insights, text analytics helps answer the 'whys' of voting

Because of the interest in yesterday’s post, I decided to continue on the topic of politics today. As a marketing researcher and data scientist, though, I found yesterday’s analysis a bit more interesting. Not because of the findings per se, but because we were able to use text analytics to accurately predict real attitudes and behavior, not just by ‘reading between the lines’ but by extrapolating a relationship between seemingly unrelated attitudes and opinions, which of course are related and predictive when you look more closely.

Of course text analytics can be interesting when used on more explicit data as well. So today I’ll take a look at two more open-ended comment questions from two different surveys.

In case you're wondering, the benefit of a text answer rather than several structured survey questions with rating scales is that unaided text questions give a much truer measure of which issues are actually important to a respondent. Rating-scale questions force respondents to express an opinion even when they have none, and thus structured survey questions (even popular ones like Net Promoter Score) are usually far less effective at predicting actual behavior than text data, in our experience.

Reason for Political Affiliation

Immediately after the self-description exercise in yesterday’s analysis, we obviously needed to ask what the respondents’ political affiliation was (so that we could understand what relationship, if any, there is between how we view ourselves and political affiliation).

Respondents were able to designate which party, if any, they were affiliated with, whether they considered themselves Independent, Tea Party, Green, or something else, and why.

mm1

WhyDemRep

The ability to get a good quantitative, relative measure on a “why” question is something unique to text analytics. Perhaps surprisingly, there were rather few mentions of specific campaign issues. Instead the tendency was to use far more general reasons to explain why one votes a certain way.

While Republicans and Democrats are equally unlikely to mention “Conservatism” and “Liberalism” when describing themselves (from yesterday's post), Republicans are about twice as likely to say they are affiliated with the Republican party because of their “Conservative” values (11% vs. 5% “Liberal” for Democrats).

Democrats say they vote the way they do because the Democratic party is “For the People”, “Cares about the Poor” and “the Middle [and] working class”.

Republicans on the other hand say they vote Republican because of “values” especially the belief that “you have to work for what you get”. Many also mention “God” and/or their “Christian” Faith as the reason. The desire for smaller/less government and greater Military/Defense spending are also significant reasons for Republicans.

Of course we could have probed deeper in the OE comments with a second question if we had wanted to. Still it is telling that specific issues like Healthcare, Education, Gay Rights and Taxes are less top-of-mind among voters than these more general attitudes about which party is right for them.

Describe Your Ideal President

As mentioned earlier, we are looking toward social media to understand and build models. Therefore, we also recently asked a separate sample of n=1,000 Americans, all of whom are active on Twitter, what qualities they felt the President of the United States (POTUS) should have.

mm44

TextAnalyticsPOTUS

The chart above is divided by those who said they tend to vote or at least typically skew toward that respective party.

The findings do help explain the current political climate a bit. Both Democrats and Republicans were most likely to mention “honesty” as a quality they look for, perhaps indicating a greater frustration with politics in general. The idea of “honesty” though is more important to voters who skew toward the GOP.

Those who favor the Democratic party are significantly more likely to value traits like Intelligence, Compassion/Empathy, skill, educational attainment of the candidate and open-mindedness.

Those who lean Republican however are significantly more likely to value a candidate who is perceived both as a strong leader in general, but also more specifically is strongly for America. Rather than educational attainment, softer more tacit skills are valued by this group, for instance Republican voters put greater emphasis on experience and “know how”. Not surprisingly, based on yesterday’s data on how voters view themselves, Republican voters also value Family values and Christian faith in their ideal POTUS.

Research has shown that people prefer leaders similar to themselves. Looking back to some of the self descriptions in yesterday's data we definitely see a few similarities in the above...

Thanks for all the feedback on yesterday’s post. Please join me week after next when I plan on sharing some more interesting survey findings not related to politics, but of course to text analytics.

@TomHCAnderson

WhyVoteTextAnalytics

Tom H.C. Anderson

To learn more about how OdinText can help you understand what really matters to your customers and predict actual behavior,  please contact us or request a Free Software Demo here >

[NOTE: Tom H. C. Anderson is Founder of Next Generation Text Analytics software firm OdinText Inc. Click here for more Text Analytics Tips]

Text Analysis Predicts Your Politics Without Asking

How What You Say Says Way More Than What You Said

Pretend for a moment that you had a pen pal overseas and they asked you to describe yourself. What would you tell them? What makes you “you”?

It turns out that which traits, characteristics and aspects of your identity you choose to focus on may say more than you realize.

For instance, they can be used to predict whether you are a Democrat or a Republican.

With the U.S. presidential race underway in earnest, I thought it would be interesting to explore what patterns, if any, in the way people describe themselves could be used to identify their political affiliation.

So we posed the question above verbatim to a nationally representative sample of just over n=1000 (sourced via CriticalMix) and ran the responses through OdinText.

Not surprisingly, responses to this open-ended question were as varied as the people who provided them, but OdinText was nevertheless able to identify several striking and statistically significant differences between the way Republicans and Democrats described themselves.

NOT About Demographics

Let me emphasize that this exercise had nothing to do with demographics. We’re all aware of the statistical demographic differences between Republicans and Democrats.

For our purposes, any specific demographic information people shared in describing themselves was pertinent only to the extent that it formed part of a broader response pattern that could predict political affiliation.

For example, we found that Republicans were significantly more likely than Democrats to say they have blonde hair.

Of course, this does not necessarily mean that someone with blonde hair is significantly more likely to be a Republican; rather, it simply means that if you have blonde hair, you are significantly more likely to feel it noteworthy to mention it when describing yourself if you are a Republican than if you are a Democrat.

Predicting Politics with Text Analytics

Self-Image: Significant Differences

OdinText’s analysis turned up several predictors of party affiliation; here are 15 examples, indexed below.

  • Republicans were far more likely to include their marital status, religion, ethnicity and education level in describing themselves, and to mention that they are charitable/generous.

  • Democrats, on the other hand, were significantly more likely to describe themselves in terms of friendships, work ethic and the quality of their smile.

Interestingly, we turned up quite a few more predictors for Republicans than Democrats, suggesting that the former may be more homogeneous in terms of which aspects of their identities matter most. This translates to a somewhat higher level of confidence in predicting affinity with the Republican Party.

As an example, if you describe yourself as both “Christian” and “married,” without knowing anything else about you I can predict with 90% accuracy that you vote Republican.

Again, this does not mean that Christians who are married are more than 90% likely to be Republicans, but it does mean that people who mention these two things when asked to tell a stranger about themselves are extremely likely to be Republicans.
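For illustration only (this is not OdinText's actual model, and `predict_party` is a hypothetical helper), a co-occurrence rule like the one described can be sketched as a tiny classifier that abstains whenever the rule does not fire:

```python
# Illustrative sketch only -- NOT OdinText's model. It encodes the single
# co-occurrence rule described above ("Christian" + "married" -> Republican)
# and abstains whenever the rule doesn't fire.
def predict_party(self_description):
    text = self_description.lower()
    if "christian" in text and "married" in text:
        return "Republican"  # the post reports ~90% accuracy for this rule
    return None              # abstain: the rule tells us nothing

print(predict_party("I'm a married Christian father of two"))   # Republican
print(predict_party("I love hiking and time with my friends"))  # None
```

A production model would combine many such term indicators rather than a single hand-written rule, but the mechanics are the same: presence or absence of self-description terms drives the prediction.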

So What?

While this exercise was exploratory and the results should not be taken as conclusive, it demonstrates that text analytics makes it entirely possible to read between the lines and determine far more about you than one would think possible.

Obviously, there is a simpler, more direct way to determine a respondent’s political affiliation: just ask them. We did. That’s how we were able to run this analysis. But it’s hardly the point.

The point is we don’t necessarily have to ask.

In fact, we’ve already built predictive models around social media profiles and Twitter feeds that eliminate the need to pose questions—demographic, or more importantly, psychographic.

Could a political campaign put this capability to work segmenting likely voters and targeting messages? Absolutely.

But the application obviously extends well beyond politics. With an exponentially increasing flood of customer experience feedback, CRM and consumer-generated text online, marketers could predictively model all manner of behavior with important business implications.

One final thought relating to politics: What about Donald Trump, whose supporters it has been widely noted do not all fit neatly into the conventional Republican profile? It would be pretty easy to build a predictive model for them, too! And that could be useful given the widespread reports that a significant number of people who plan to vote for him are reluctant to say so.

IIEX 2016 Competition Showcases Innovation in Market Research

Artificial Intelligence, Mixed Data Analytics and Passive Listening Capture Minds - 2016 Insight Innovation Exchange

I’m just back from the IIEX conference in Atlanta, where OdinText competed in the Insight Innovation Competition. Although I was disappointed that we didn’t win, I’m pleased to report that the judges told me we placed a very close second.

IIeX 2016

Attending conferences like this affords me the opportunity to take the pulse of the industry, and I was struck by the fact that text analytics are no longer viewed as a shiny new toy in market research. In fact, as someone who has been working in the natural language processing field for so long, it’s actually somewhat remarkable to see how perceptions of text analytics have matured over just the last year. Text analytics have become a must-have, and the market has gained a new wave of competition as a result, which I think is further evidence of its health.

Since OdinText goes beyond just text data and incorporates mixed data—text and quantitative—in our competition pitch we highlighted OdinText’s ability to essentially enable market researchers to do data science.

I strongly believe making data science more accessible is a huge opportunity that OdinText is uniquely positioned to address, and it’s an area where market researchers can step up to meet a desperate need: we currently have a shortage of about 200,000 data scientists in the US alone.

(Check out this 5-minute video of my IIEX competition pitch and let me know what YOU think!)

Download PDF

You are also most welcome to download a PDF of the PPT presentation >>>

“Machine learning” appears to be the new buzz phrase in research circles, and at IIEX I was hard pressed to find a single vendor not claiming to use machine learning in some respect, no matter where on the service chain they fit. Honestly, though, I got the sense that many use the term without entirely understanding what it means.

We continue to leverage machine learning where it makes sense at OdinText, and there are a few other vendors out there who also clearly have an excellent grasp of the technique.

One such company—which took first place in the competition, in fact—was Remesh. They’re using machine learning in a genuinely novel way: automating the role of an online moderator, almost akin to a chat bot. They’ve positioned this as AI, and replacing humans completely with a computer is a holy grail for almost any industry.

I’m optimistic about AI in my field of data and text mining as well, but we’re still a ways off from taking the human out of the mix, and so our goal at OdinText is to use the human as efficiently as possible.

While totally automating what a data scientist does is appealing, in the short term we’re happy with being able to allow a market researcher to do in a few hours what would take a typical data scientist with skills in advanced statistics, NLP, Python, R and C++ days or weeks to do.

Still I admit the prospect of AI replacing researchers completely is an interesting one—albeit not necessarily a popular one among the people who would be replaced—and it’s an area that I’m certainly thinking about.

Third place in the competition, I understand, went to Beatgrid Media, which leverages smartphones (using almost no battery life) to passively listen to audio streams from radio and TV, overlaying geo-demographics on these panelists’ data to better predict advertising reach and efficacy. This is admittedly going to be a very hard field for a start-up to break into, as there are many big players in the space who want to own their own measurement, and this may have been one of the reasons Beatgrid had trouble taking more than third, even though they have some very interesting technology that could perhaps also be applied in other ways.

Let me know what you think!

(And if you’re interested in a demo of OdinText, please contact us here!)

Tom H.C. Anderson | @TomHCanderson | @OdinText



Look Who’s Talking, Part 1: Who Are the Most Frequently Mentioned Research Panels?

Survey Takers Average Two Panel Memberships and Name Names

Who exactly is taking your survey?

It’s an important question beyond the obvious reasons and odds are your screener isn’t providing all of the answers.

Today’s blog post will be the first in a series previewing some key findings from a new study exploring the characteristics of survey research panelists.

The study was designed and conducted by Kerry Hecht, Director of Research at Ramius. OdinText was enlisted to analyze the text responses to the open-ended questions in the survey.

Today I’ll be sharing an OdinText analysis of results from one simple but important question: Which research companies are you signed up with?

Note: The full findings of this rather elaborate study will be released in June in a special workshop at IIEX North America (Insight Innovation Exchange) in Atlanta, GA. The workshop will be led by Kerry Hecht, Jessica Broome and yours truly. For more information, click here.

About the Data

The dataset we’ve used OdinText to analyze today is a survey of research panel members with just over 1,500 completes.

The sample was sourced in three equal parts from leading research panel providers Critical Mix and Schlesinger Associates and from third-party loyalty reward site Swagbucks, respectively.

The study’s author opted to use an open-ended question (“Which research companies are you signed up with?”) instead of a “select all that apply” variation for a couple of reasons, not the least of which is that the latter would’ve needed to list more than a thousand possible panel choices.

Only those panels that were mentioned by at least five respondents (0.3%) were included in the analysis. As it turned out, respondents identified more than 50 panels by name.
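Mechanically, that inclusion threshold is just a frequency cut on the coded mentions. A minimal sketch with made-up panel names (the real analysis was done in OdinText on ~1,500 responses):

```python
from collections import Counter

# Hypothetical parsed responses: each respondent's list of panel names,
# already normalized to lowercase (the real dataset has ~1,500 respondents)
responses = [
    ["swagbucks", "critical mix"],
    ["swagbucks"],
    ["schlesinger", "swagbucks", "critical mix"],
    ["critical mix"],
]

# Count respondents (not raw mentions) per panel, then apply the cut
counts = Counter(name for r in responses for name in set(r))
MIN_RESPONDENTS = 5  # the threshold used in the post (0.3% of ~1,500)
included = {name: n for name, n in counts.items() if n >= MIN_RESPONDENTS}
```

Counting per respondent (via `set(r)`) rather than per raw mention keeps a single verbose answer from inflating a panel's count.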

How Many Panels Does the Average Panelist Belong To?

The overwhelming majority of respondents—approx. 80%—indicated they belong to only one or two panels. (The average number of panels mentioned among those who could recall specific panel names was 2.3.)

Less than 2% told us they were members of 10 or more panels.

Finally, even fewer respondents told us they were members of as many as 20+ panels; others could not recall the name of a single panel when asked. Some declined to answer the question.

Naming Names…Here’s Who

Caption: To see the data more closely, please click this screenshot for an Excel file. 

In Figure 1 we have the 50 most frequently mentioned panel companies by respondents in this survey.

It is interesting to note that even though every respondent was signed up with at least one of the three companies from which we sourced the sample, a third of respondents failed to name that company.

Who Else? Average Number of Other Panels Mentioned

Caption: To see the data more closely, please click this screenshot for an Excel file.

As expected—and, again, taking into account the fact that the sample comes from just the three firms we mentioned earlier—larger panels are more likely than smaller, niche panels to contain respondents who belong to other panels (Figure 2).

Panel Overlap/Correlation

Caption: To see the data more closely, please click this screenshot for an Excel file.

Finally, we correlate the mentions of panels (Figure 3) and see that while there is some overlap everywhere, it looks to be relatively evenly distributed. In a few cases where correlation is higher, it may be that these panels tend to recruit in the same place online or that there is a relationship between the companies.
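The overlap analysis amounts to correlating 0/1 "mentioned this panel" indicators across respondents. A minimal sketch with made-up data (the source used OdinText, not pandas, so this is just an illustration of the statistic):

```python
import pandas as pd

# Hypothetical 0/1 indicators: one row per respondent, one column per panel,
# 1 if the respondent mentioned that panel
mentions = pd.DataFrame({
    "swagbucks":    [1, 1, 0, 1, 0, 0],
    "critical_mix": [1, 0, 1, 1, 0, 1],
    "schlesinger":  [0, 0, 1, 0, 1, 1],
})

# Pearson correlation on binary columns is the phi coefficient,
# a simple measure of panel-membership overlap
overlap = mentions.corr()
```

High off-diagonal values in `overlap` would flag panel pairs whose members tend to co-occur, exactly the pattern discussed above.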

What’s Next?

Again, all of the data provided above are the result of analyzing just a single, short open-ended question using OdinText.

In subsequent posts, we will look into what motivates these panelists to participate in research, as well as what they like and don’t like about the research process. We’ll also look more closely at demographics and psychographics.

You can also look forward to deeper insights from a qualitative leg provided by Kerry Hecht and her team in the workshop at IIEX in June.


Thank you for your readership. As always, I encourage your feedback and look forward to your comments!

@TomHCanderson @OdinText

Tom H.C. Anderson

PS. Just a reminder that OdinText is participating in the IIEX 2016 Insight Innovation Competition!

Voting ends Today! Please visit MAKE DATA ACCESSIBLE and VOTE OdinText!

 

[If you would like to attend IIEX feel free to use our Speaker discount code ODINTEXT]


 

Why Text Analytics Needs to Move at the Speed of Slang

Do You Speak Teen? 10 Terms You May Not Know

Translating the words teens use has been a headache and source of embarrassment for generations of parents. It’s as though the kids speak a different language. Let’s call it a “slanguage”. And you know you’re old when you need Google to understand it.

Nowadays, too, it’s much harder to bridge the communication gap because the Internet has dramatically increased the pace at which slanguage changes. In fact, every year hundreds of new slang words and phrases that originated on the Internet are added to the terrestrial dictionary.

"Slanguage" is a moving target

And thanks to social media, new terms, phrases and acronyms—which, in some cases, can describe an entire situation—crop up and go viral literally overnight.

In short, slanguage has become a moving target; it seems to change faster than we can pick it up. As soon as we’re proficient, we’re out of touch again.

For obvious reasons, this is not only a problem for parents, it’s particularly frustrating for anyone researching or marketing to youth.

The Problem with “Dictionaries”

Text analytics software has enabled us to monitor what young people are saying online, but it does us little good when the software can’t keep up with slanguage.

One of the primary weaknesses of most text analytics software platforms is that they rely on “dictionaries” to understand what is being discussed or to assign sentiment.

These dictionaries are only as good as the data used to create them. If the data changes in any way (e.g., new words are used or used in different ways) the software will miss it.

So in order to stay current using a conventional text analytics platform, one must manually identify new slang terms as they emerge and continually update the dictionary.
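To make the weakness concrete, here is a toy sketch (not how any particular product works, and the dictionary and posts are invented): a fixed sentiment dictionary silently ignores new slang, so the manual workaround amounts to scanning for frequent out-of-vocabulary tokens:

```python
import re
from collections import Counter

# A fixed "dictionary" built from yesterday's language...
sentiment_dict = {"awesome": 1, "terrible": -1, "cool": 1}
stop_words = {"that", "was", "so", "the", "a"}

posts = ["that party was lit", "bruh that movie was terrible", "so lit, bruh"]
tokens = [t for p in posts for t in re.findall(r"[a-z']+", p.lower())]

# ...misses every term it has never seen, so candidate new slang must be
# surfaced by hand: frequent tokens outside the dictionary and stop list
oov = Counter(t for t in tokens if t not in sentiment_dict and t not in stop_words)
candidates = [t for t, n in oov.most_common() if n >= 2]
print(candidates)  # ['lit', 'bruh']
```

In practice this surfacing step is exactly the manual maintenance burden described above: someone has to review the candidates and fold them into the dictionary before the software can score them.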

In contrast, OdinText is uniquely able to identify new, never-before-used terms—slang, acronyms, industry jargon, new product/competitor names, etc.—without user input.

Test Your Teenspeak Proficiency!

Staying abreast of changes in teenspeak requires some vigilance. You may be further out of touch than you realize. Let’s take a quiz: just for fun, I randomly pulled 10 terms that have become popular with post-Millennials (very roughly rank-ordered by use below).

If you’re not familiar with these terms or can’t define them, don’t worry. You’re not alone; I didn’t understand any of them at first either. And it’s not necessarily easy to figure out what many of these new terms mean.

A conventional, mainstream dictionary won’t be any help here, but the Urban Dictionary can be a lifesaver. You can also learn a lot by researching the images online that are associated with a new trending slang term (especially “memes”) for context. YouTube videos and music can be similarly helpful.

Triangulating using these sources and the most common context is often the best way to stay on top of these moving targets, which as I noted come and go relatively quickly.

Many of the ten I’ve listed below can have more than one meaning depending on context, and some may even be used differently by different demographic groups and even within the same demographic group.

So, without further ado, here are the top 10 slang terms we’ve spotted circulating within the past few months.

Without skipping ahead to the answers, how many do you know?

  • One (or 1)

  • Dab

  • Schlonged (& other ‘Trumpisms’)

  • Bae

  • Fetch

  • Lit

  • BRUH

  • Fleek

  • Swag

  • Bazinga!

one love

“One” or “1”

In teenspeak, “one” or “1” doesn’t always signify a quantity. It can also mean “One Love” and is used frequently in parting (like “goodbye”). It may be used in person, on the telephone or via digital communication.

dab

“Dab”

You knew this was a verb meaning to pat or tap gently, but that’s not what the kids mean. The recent uptick in “Dab” was inspired by a dance move popularized in a 2014 video by Atlanta rapper Skippa da Flippa. It’s often used as a sort of victory swagger (“Keep dabbin' ... let the haters hate ... Dab on”). Check out this YouTube clip for more.

trumped

“Trumped” & “Schlonged”

“Trump” as a noun and as a verb traditionally referred to a stronger hand of cards or other competitive advantage. But due in no small measure to Donald J. Trump the presidential candidate’s ascendancy, various combinations of “Trump” and “Trumped” and several memes and other digital chat have been cropping up with a variety of meanings.

“Trump” has appeared as an adjective describing someone rich or spoiled. A couple of months ago we also saw a renewed interest in “Schlonged” again due to media coverage of candidate Trump. There was some debate on what the actual meaning was. Here again I think the Urban Dictionary is one of the best resources for you to make up your own mind.

bae

“Bae”

According to our analysis, this one seems more popular among women—about twice so—and also somewhat more popular in the Midwest. “Bae” is a pet name for one’s significant other. It may have been derived from “baby” (like “B” and “boo”) or it could be an acronym for “Before Anyone Else.”

fetch

“Fetch”

It’s not a command for a dog. Think slang predecessors like “cool” or “awesome.”  This one can be traced to the cult hit “Mean Girls”. Ironically, in the film the term never catches on despite one character’s dogged attempts to popularize it.

lit

“Lit”

A hit with the youngest demographic, and skewing somewhat more Northeast regionally, “Lit” has been popping up in recent songs and videos by rappers and other musical entertainers. It can mean a number of things, including that something is “hot” or popular, but also that someone is drunk or high. When used in a phrase like “It’s Lit,” it means exciting, good or worthwhile. “Come on down, it’s Lit!”

bruh

“Bruh”

It’s “bro” phonetically tweaked, and it basically means “buddy” among guys, but it can also be an expression of surprise (usually disappointment), as in “Damn!” The latter use seems to have originated at least partly thanks to a Vine video featuring high school basketball star Tony Farmer collapsing as he was sentenced to prison.

fleek

“Fleek”

More popular among younger women, particularly in the South, “fleek” is a synonym for another popular slang phrase, "on point"—basically looking sharp, well-groomed or stylish. Recently, “fleek” has become specifically about eyebrows, in part due to a couple of Instagram videos, and mainstreamed when Kim Kardashian used it to describe a picture of her bleached eyebrows as #EyebrowsOnFleek.

swag

“Swag”

“Swag” may actually already be on the way out, but it’s still quite popular. Derived from “swagger”—the supremely confident style of walking or strutting—“swag” has come to refer generally to an urban style and look associated with Hip-Hop. It could relate to a haircut or shoes, or simply an attitude or presence that exudes confidence and even arrogance. Example video: Soulja Boy Tell'em - Pretty Boy Swag

bazinga

“Bazinga”

This one comes courtesy of “The Big Bang Theory” character Sheldon Cooper and means “Gotcha” or “I fooled you.”

Don’t Let Words Fail You!

I hope you had some fun with this quiz and maybe picked up some new vocabulary, but I’d like to emphasize that slang isn’t the only terminology that changes. Keeping on top of new market entrants, drug names, etc., is important. If you don’t have a technology solution like OdinText that can identify new terms with implications for your business or category, make sure that you at least set up a manual process to regularly check for them.

Until next time – One!

Tom

@TomHCanderson @OdinText


Brand Analytics Tips – Branding and Politics

Text Analytics Tips - Branding Ford = Donald Trump and Adidas = Bernie Sanders! Text Analysis of 500 Brands by Political Affiliation - by Tom H. C. Anderson (Final part of this week’s Brand Analytics post)

During the last two days we’ve analyzed a very simple open-ended/comment survey question, namely “Q. When you think of brand names, what company’s product or service brand names first come to mind?” While OdinText can handle this question easily right out of the box without any customization, doing the types of analysis we’ve been doing via human coding, any other text analytics software, or scripting in R, Python, etc. would have been difficult to impossible.

While we could certainly continue to analyze this brand awareness question, which can be very useful in brand equity tracking or advertising effectiveness research, I’m going to end the analysis of this question today by looking at a variable I thought would be interesting: political affiliation.

OdinTextAnalyticsPolitics2

Politics of Brands

Are major brand names politically affiliated? There has been some previous research into this question by looking at how corporations tend to make political donations. In fact, there’s even an iPhone app called, you guessed it, BuyPartisan which compiles campaign finance data from the top Fortune 500 companies and matches it with their products. But do Republicans and Democrats have different brand consideration sets?

Today we’ll look at our recent random sample of over a thousand Americans to see if there is a significant link between unaided top-of-mind brand awareness and political affiliation, in hopes of understanding which liquid soap, if any, Donald Trump supporters might be more likely to buy.

Political Polling Democrat Republican Text Analytics Chart

Though there isn’t a statistically significant difference between Democrats and Republicans across the majority of the 500+ brands mentioned in our study, there are a few notable exceptions.

The most Republican brand out there is Dawn. For some reason this liquid soap is TEN times more likely to be mentioned by Republicans than by Democrats (3% vs. 0.3%)!

Five other popular brands that skew significantly more Republican than Democrat are Ford (12% vs. 6%), Kellogg’s (4% vs. 1%), Palmolive and Wells Fargo (both 1.4% vs. 0%).

Conversely, brands that are more likely to be top of mind among Democrats are Air Jordan (2.3% vs. 0%), Target (7% vs. 3%), Adidas (5% vs. 2%), and Bose (1.4% vs. 0%).
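For readers who want to check skews like these themselves, the standard tool is a two-proportion z-test. A sketch, assuming roughly 500 respondents per party (the post does not report exact group sizes, so these are placeholder values):

```python
import math

def two_prop_z(p1, n1, p2, n2):
    """z statistic for the difference between two proportions (pooled SE)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Dawn: 3% of Republicans vs 0.3% of Democrats (group sizes assumed)
z = two_prop_z(0.03, 500, 0.003, 500)
print(round(z, 2))  # 3.35 -- well above 1.96, i.e. significant at 95%
```

Even with a rate as low as 3%, the tenfold gap clears the significance bar comfortably at these sample sizes, which is why a difference like Dawn's stands out despite the small percentages involved.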

Why some of these differences? Your guess is as good as mine. In some cases, like Dawn, perhaps it has to do with national distribution channels and more ‘red states’ getting stocked with this brand. Interestingly though, according to BuyPartisan, “buying Dawn dish soap will support the Republican party,” so perhaps there is more to some of these categories. In other cases, like Air Jordan and Adidas (the latter of which is German), these two brands at least are perhaps more likely to be seen in larger urban settings and therefore skew more ‘blue.’

Blogging about brands for the past three days was more fun than I thought. Several people reached out to us with further questions. Of course if anyone would like additional information on OdinText please fill out our simple text analytics demo request.

Please come back next week for more Text Analytics Tips as we plan to explore a very different data set with different insights.

Tom @OdinText

 

[NOTE: Tom H. C. Anderson is Founder of Next Generation Text Analytics software firm OdinText Inc.]

Brand Analytics Tips – How Old is Your Brand?

Text Analytics Tips Answers: How Old Is Your Brand? Using OdinText on Brand-Mention Comment Data, by Tom H. C. Anderson

[METHODOLOGICAL NOTES (if you’re not a researcher, feel free to skip down to the ‘Brands by Age’ section below): In our first official Text Analytics Tips post I’ve started by exploring one of the arguably simplest types of unstructured/text data there is, the unaided top-of-mind ‘brand mention’ open-ended survey question. These kinds of questions are especially important to brand positioning, brand equity, brand loyalty and advertising effectiveness research. In this case we’ve allowed for more than one brand mention. The question reads “Q. When you think of brand names, what company’s product or service brand names first come to mind? [Please name at least 5]”. The question was fielded to n=1,089 US gen-pop representative survey respondents in the CriticalMix panel in December of 2015. The confidence interval is +/-2.9% at the 95% confidence level.]
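The quoted confidence interval can be sanity-checked with the usual worst-case margin-of-error formula, z·√(p(1−p)/n) with p = 0.5:

```python
import math

n = 1089   # survey completes
z = 1.96   # z value for 95% confidence
p = 0.5    # worst-case proportion, which maximizes the margin

moe = z * math.sqrt(p * (1 - p) / n)
print(f"+/-{moe * 100:.1f} points")  # ~3.0, in line with the quoted +/-2.9%
```

For any single reported percentage the actual margin is smaller the further the estimate is from 50%, so the quoted figure is a conservative bound.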

Making Good Use of Comment Data Can Be Easy and Insightful

An interesting and rather unique way to look at your brand is to understand for whom it is most likely to be top-of-mind.

Unfortunately, though they have proven more accurate than structured choice or Likert scale rating questions in predicting actual behavior, free form (open end) survey questions are rare due to the assumed difficulty of analyzing the results. Even when they are included in a survey and analyzed, results are rarely expressed in anything more useful than a simple frequency-ranked table (or worse, a word cloud). Thanks to the unique patented approach to unstructured and structured data in OdinText, analyzing this type of data is both fast and easy, and insights are limited only by the savviness of the analyst.

The core question asked here is rather simple, i.e. “When you think of brand names, what company’s product or service brand names first come to mind?”. However, when asking this question of over a thousand people, because of the sheer volume of brands that will be mentioned (in our case well over 500), even this ‘small data’ can seem overwhelming.
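OdinText’s internals aren’t public, so purely as an illustration, the baseline frequency tally for an open-ended brand question can be sketched in a few lines of plain Python (the responses below are made up, not the survey data):

```python
from collections import Counter

# Hypothetical open-ended answers; each respondent names several brands.
responses = [
    "Coca-Cola, Pepsi, Nike, Apple, Ford",
    "Apple, Samsung, Nike, Coca-Cola, Toyota",
    "Pepsi, Ford, Apple, Google, Nike",
]

# Split each comment into individual brand mentions and tally them.
mentions = Counter()
for text in responses:
    for brand in text.split(","):
        mentions[brand.strip()] += 1

# The simple frequency-ranked table most analyses stop at.
for brand, count in mentions.most_common(5):
    print(f"{brand}: {count}")
```

Real responses would of course need fuzzier matching (spelling variants like “Kelloggs” vs. “Kellogg’s”), which is exactly where dedicated text analytics software earns its keep.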

The purpose of this post is to show just how easy, fast, and insightful analysis of even this technically basic comment data can be using Next Generation Text AnalyticsTM.

After uploading the data into OdinText, there are numerous ways to look at this comment data, not only the somewhat more obvious frequency counts, but also several other statistics including any interesting relationships to available structured data. Today we will be looking at how brand mentions are related to just one such variable, the age of the respondent. [Come back tomorrow and we may take a look at a few other statistics and variable relationships.]

Text Analytics Tips Age OdinText

Brands by Age

Below is a sortable list of the most frequently mentioned brands, ranked by the average age of those mentioning each brand. This is a direct export from OdinText. The best way to think about lists like these is comparatively (i.e. how old is my brand vs. other brands?). If showing a table such as this in a presentation, I would highly recommend color coding, which can be done either in OdinText (depending on your version) or in Excel using the conditional formatting tool.

[NOTE: For additional analytics notes and visualizations please scroll to the bottom of the table below]

 

Brand Name Average Age
Maxwell House 66
Hunts 66
Aspirin 66
Chrysler 64.6
Stouffers 63.7
Marie Callender's 63.7
Walgreen 63.7
Cooper (Mini) 63.7
Bayer 62.6
USAA 62.5
Epson 62.5
Brother 61.3
Aol 61.3
Comet 61.3
Snapple 61.3
Lowes 61.2
Marriott 60.3
Ritz 60.3
Hellman's 60.3
Ikea 60.3
Belk 60.3
State Farm 60.3
Oscar Mayer 60
Folgers 59.8
Libby's 59.8
Hormel 59.2
Depot 59.2
Heinz 59.2
Electric 59.2
Bordens 59.2
Nestles 59
Green Giant 59
Sargento 58.3
Del Monte 58
Prego 58
Kashi 58
Westinghouse 58
Stouffer 58
Taylor 58
Home Depot 57.6
Publix 57.5
Banquet (Frozen Dinners) 57.5
Buick 57
Krogers 57
Hellman's 57
Safeway 56.5
Purex 56.4
Hewlett 56.4
Unilever 56.1
RCA 56.1
Post 56.1
P&G 55.9
Budweiser 55.9
Yoplait 55.8
Chobani 55.7
Ragu 55.7
Campbell's 55.5
Wells Fargo 55.2
Hershey 55.1
Betty Crocker 55
Sharp 55
Hines 55
Trader Joe's 55
Palmolive 54.9
Kia 54.7
Lexus 54.7
Life 54.7
Hotpoint 54.7
Campbells 54.6
Oscar Mayer 54.5
Dial 54.4
Nissan 54.4
Hillshire Farms 54.3
Motorola 54.1
Keebler 54
CVS 53.8
Canon 53.8
Lakes 53.7
Pillsbury 53.3
Hilton 53.3
Faded Glory 53.3
Friskies 53.3
Duncan Hines 53.3
Puffs 53.3
Olay 52.8
Sketchers 52.5
Fred Meyer 52.5
Delta 52.5
Hunt 52.3
Bose 52.3
Ocean Spray 52.3
Ivory 52.3
Swanson 52.3
Dewalt 52.3
Firestone 51.8
Estee Lauder 51.5
Miller 51.5
Tide 51.4
Honda 51.3
Meijer 51.3
Perdue 51.3
Jeep 51.3
Head 51.3
Lee Jeans 51.3
Pantene 51
Chevrolet 51
Cannon 50.8
Chef Boyardee 50.8
Frito Lay 50.6
Avon 50.5
Motors 50.4
Kodak 50.4
General Mills 50.2
BMW 50
Lipton 49.8
Kohl's 49.8
Goodyear 49.7
Kraft 49.6
Craftsman 49.5
Sunbeam 49.4
IBM 49.3
Frigidare 49.1
Sears 49.1
Ford 49.1
Walgreens 49.1
Dole 49.1
Chevy 49
Wonder (Bread) 49
Dannon 49
JVC 49
Hyundai 49
Clinique 49
Marlboro 49
Mercedes 49
Gerber 49
Acme 49
Kleenex 48.8
Kelloggs 48.7
JC Penney 48.6
Louis Vuitton 48.5
Calvin 48.4
LL Bean 48.4
Gillette 48.4
Johnson & Johnson 48.3
Shell 48.3
Kenmore 48.1
Dawn 48
Hanes 48
Macdonalds 48
Tylenol 48
Colgate 47.5
Wrangler (Jeans) 47.3
Burger King 47.3
Whirlpool 47.1
GMC 47
Yahoo 46.9
Dish Network 46.8
Verizon 46.7
Hersheys 46.6
Whole Foods 46.5
Sara Lee 46.5
Hostess 46.5
Mazda 46.5
Toyota 46.4
Arm & Hammer 46.4
Nabisco 46.3
Tyson 46.1
Starbucks 46
Wal-Mart 45.9
Western Family 45.8
Wegmans 45.8
Dr Pepper 45.7
Hulu 45.7
Time Warner 45.7
Maybelline 45.7
MLB 45.7
Iams 45.7
Cox 45.7
Country Crock 45.7
Compaq 45.7
Sonoma 45.7
Quaker Oats 45.7
Nordstrom 45.4
Coca 45.3
Champion 45.3
Bass 45
Chrome 44.7
Coors 44.7
iPhone 44.6
Bounty 44.5
Dodge 44.4
Maytag 44.3
Black & Decker 44.2
Pfizer 44.2
Suave 44.2
HP 44
Scott 44
Subway 44
Skechers 44
Geico 44
Panasonic 43.9
Lays 43.8
KFC 43.8
Charmin 43.8
Dell 43.8
Polo 43.8
Windex 43.7
Burts Bees 43.5
Purina 43.5
Clorox 43.5
Columbia 43.3
Ralph Lauren 43.2
Visa 43.2
Pepsi 43
Crest 43
NFL 43
Sanyo 43
Dove 42.9
Intel 42.9
Wendy's 42.8
Kroger 42.8
Remington 42.3
Phillips 42.3
Mars 42.3
Cover Girl 42.3
Heb 42.3
Twitter 42.3
Amazon 42
Body Works 42
Best Buy 41.8
Costco 41.8
Banana Republic 41.8
Disney 41.7
Amway 41.7
Levi 41.5
Sony 41.4
Samsung 41.4
Macy's 41.1
Glade 41.1
Boost 41
Boost Mobile 41
Toshiba 40.8
Ebay 40.8
Comcast 40.7
Facebook 40.6
Walmart 40.5
Microsoft 40.5
Google 40.4
Kitchen 40.4
Nestle 39.8
Mcdonalds 39.5
Gucci 39.5
Vons 39.3
Philip Morris 39.3
Loreal 39.3
Mattel 39.1
Apple 39
Pepperidge Farm 39
Vizio 39
Lysol 39
Ugg 39
Tropicana 39
Sure 39
Fila 39
Tmobile 39
Coach 38.9
Acer 38.8
Tommy Hilfiger 38.6
Nike 38.1
Target 38
Old Navy 37.9
Chase 37.8
Michael Kors 37.7
K-Mart 37.5
Lenovo 37.5
Equate 37.2
Hoover 36.8
Under Armour 36.6
Windows 36.5
Asics 36.5
Kitchenaid 36.5
Victoria's Secret 36.2
Mac 36.1
Reebok 36.1
Android 36
Direct TV 36
Sprint 36
Netflix 35.9
Adidas 35.7
Citizen 35.7
New Balance 35.6
Guess 35.4
Bic 35.2
Great Value 35.2
Pizza Hut 35
Puma 34.9
Asus 34.4
Fox 34.3
Justice 34.3
North Face 34.1
Xbox 33.6
Gap 33.4
Doritos 33.4
HTC 33.4
Converse 33.3
Sprite 33.2
Febreeze 33
Axe 33
Kay 32.7
Glad 32.7
Mary Kay 32.7
Viva 32.7
Reese's 31.8
Lego 31.7
Amazon Prime 31.5
Nintendo 31.2
Vans 31.2
Taco Bell 31
Fisher Price 30.4
Chanel 29.7
Old Spice 29.7
Playstation 29.4
Eagle 29.4
Hamilton Beach 29.3
Footlocker 29.3
Pink 29.3
Swiffer 29.3
Timberlands 29.3
Naked Juice 29
Youtube 29
Bing 29
Air Jordans 28.4
Huggies 28.2
Aeropostale 27.7
Hollister 27.3
Prada 27.3
Carters 26.8
Kirkland 26.3
Forever 26.3
Aeropostle 26.3
Arizona 25.6
Pampers 24.5
Versace 24.5
Urban Outfitters 24.5

 

A few interesting points from the longer list of brands are:

The oldest brand, “Maxwell House Coffee”, has an average age of 66. (If anything, this mean age is actually conservative, as the age question gets coded as 66 for anyone answering that they are “65 or older”.) This is a typical technique in OdinText: choosing the mid-point to calculate the mean when the data come in numeric ranges, as is often the case with survey or customer entry form data.
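As a sketch of that mid-point recoding, assuming each respondent’s age arrives as a band (the bands, respondents and brand mentions below are invented, not the survey data):

```python
from collections import defaultdict

# Mid-points for ranged age answers; "65 or older" is coded as 66.
AGE_MIDPOINTS = {"18-24": 21, "25-34": 29.5, "35-44": 39.5,
                 "45-54": 49.5, "55-64": 59.5, "65+": 66}

respondents = [
    {"age_band": "65+",   "brands": ["Maxwell House", "Folgers"]},
    {"age_band": "55-64", "brands": ["Maxwell House"]},
    {"age_band": "18-24", "brands": ["Urban Outfitters"]},
]

# Collect the recoded age of everyone mentioning each brand.
ages = defaultdict(list)
for r in respondents:
    midpoint = AGE_MIDPOINTS[r["age_band"]]
    for brand in r["brands"]:
        ages[brand].append(midpoint)

# Average age per brand: the statistic shown in the table above.
mean_age = {brand: sum(a) / len(a) for brand, a in ages.items()}
print(mean_age["Maxwell House"])  # (66 + 59.5) / 2 = 62.75
```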

The youngest brand on the list, “Urban Outfitters”, with an average age of 24, also probably skews even younger in actuality for the same reason (as is standard in studies representative of the US general population, typically only adults aged 18+ are included in the research).

Dr Pepper is in the exact middle of our list (46 years old). Brands like Dr Pepper which are in the middle (with an average age close to the upper range of Generation X) are of course popular not just among those 46 years old, but are likely to be popular across a wider range of ages. A good example, Coca-Cola, also near the middle and mentioned by 156 people with an average age of 45, is pulling from both young and old. The most interesting thing then, as is usual in almost any research, is comparative analysis. Where is Pepsi relative to Coke, for instance? As you might suspect, Pepsi does skew younger, but only somewhat younger on average, mentioned by 107 consumers yielding an average for the brand of 43. As is the case with most data, relative differences are often more valuable than specific values.

If there are any high-level category trends here related to age, they seem to be that clothing brands like Urban Outfitters and Versace (both with the youngest average age of 24), Aeropostale (26), Forever 21 (ironically, with an average age of 26), and several others in the clothing retail category skew very young. Snack foods and especially drinks like Arizona Iced Tea (25) and Naked Juice (29), as well as web properties (Bing and YouTube, both 29) and electronics (PlayStation at 29 and the slightly older Nintendo at 31 being examples), are associated with a younger demographic on average.

In the middle age group, other than products with a wide user base like major soda brands, anything related to the home, whether entertainment like Time Warner Cable or Hulu (both 45) or major retailers like Wegmans and Wal-Mart (also both 45), is likely to skew more middle-aged.

The scariest position for a brand manager is probably at the top of the list. With average ages of 66 for Maxwell House and Hunts, and 64 for Stouffers and Marie Callender's, the question has got to be: who will replace my customer base when they die? What we see in the data is, in fact, a slight negative correlation between age and number of mentions.

Again, it’s often the comparative differences that are interesting to look at, and of course the variance. Take Coca-Cola vs. Pepsi, for instance: while their mean ages are surprisingly close at 45 and 43 respectively, looking at the variance associated with each gives us the spread (i.e. which brand is pulling from a broader demographic). Coca-Cola, with a standard deviation of 14.5 years, is pulling from a wider demographic than Pepsi, which has a standard deviation of 12.9 years. There are several ways to visualize these data and questions in OdinText, though some of our clients also like to use OdinText output in visualization software like Tableau, which can have more visualization options but little to no text analytics capability.
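The spread comparison itself needs only the standard library. A minimal sketch, with invented ages chosen to mirror the Coke-wider-than-Pepsi pattern rather than the actual survey data:

```python
import statistics

# Hypothetical ages of respondents mentioning each brand.
coke_ages  = [20, 31, 45, 52, 60, 68]
pepsi_ages = [28, 35, 41, 47, 52, 55]

# Similar means can hide different spreads; the (population) standard
# deviation shows which brand draws from a broader demographic.
coke_sd  = statistics.pstdev(coke_ages)
pepsi_sd = statistics.pstdev(pepsi_ages)

print(round(coke_sd, 1), round(pepsi_sd, 1))
assert coke_sd > pepsi_sd  # Coke pulls from the wider age range here
```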

Co-Occurrence (aka Market Basket Analysis)

Last but not least, looking at which brands are often mentioned together, either because they are head-to-head competitors going after the exact same customers or because they may be complementary (market basket analysis type opportunities, if you will), can certainly be interesting. Brands that co-occur frequently (are mentioned by the same customers) and are not competitors may in fact represent interesting opportunities for ‘co-opetition’. You may have noticed more cross-category partnering on advertising recently, as marketers seem to be catching on to the value of joining forces in this manner. Below is one such visualization created using OdinText, with just the Top 20 brand mentions plotted in an x-y chart using multi-dimensional scaling (MDS) of brand name co-occurrence.

Text Analytics of Brands with OdinText
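The MDS plot itself is OdinText output, but the co-occurrence counts underneath it are easy to picture. A minimal sketch, assuming each respondent’s mentions form a ‘basket’ (the baskets below are hypothetical):

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-respondent brand mention lists ("baskets").
baskets = [
    ["Coca-Cola", "Pepsi", "Nike"],
    ["Coca-Cola", "Nike", "Apple"],
    ["Pepsi", "Nike", "Adidas"],
    ["Coca-Cola", "Pepsi", "Apple"],
]

# Count how often each unordered pair of brands is mentioned together.
co_occurrence = Counter()
for brands in baskets:
    for pair in combinations(sorted(set(brands)), 2):
        co_occurrence[pair] += 1

print(co_occurrence[("Coca-Cola", "Pepsi")])  # mentioned together twice here
```

A matrix of these pair counts (or a distance measure derived from it) is what a multi-dimensional scaling routine would take as input to produce an x-y map like the one above.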

Hope you enjoyed today’s discussion of a very simple text question and what can be done with it in OdinText. Come back again soon as we will be giving more tips and mini analysis on interesting mixed data. In fact, if there is significant interest in today’s post we could look at one or two other variables and how they relate to brand awareness comment data tomorrow.

Of course if you aren’t already using OdinText, please feel free to request a demo here.

@TomHCAnderson

Mr Big Data VS. Mr Text Analytics

[Interview re-posted w/ permission from Text Analytics News]

Mr. Big Data & Mr. Text Analytics Weigh In on Structured vs. Unstructured Big Data

 

kirk_borne Text Analytics News

If you pay attention to Big Data news you’re sure to have heard of Kirk Borne, whose well-respected views on the changing landscape are often shared on social media. Kirk is professor of Astrophysics and Computational Science at George Mason University. He has published over 200 articles and given over 200 invited talks at conferences and universities worldwide. He serves on several national and international advisory boards and journal editorial boards related to big data.

 

 

tom_anderson Text Analytics News

Tom H. C. Anderson was an early champion of applied text analytics, and gives over 20 conference talks on the topic each year, as well as lectures at Columbia Business School and other universities. In 2007 he founded the Next Gen Market Research community online where over 20,000 researchers frequently share their experiences online. Tom is founder of Anderson Analytics, developers of text analytics software as a service OdinText. He serves on the American Marketing Association’s Insights Council and was the first proponent of natural language processing in the marketing research/consumer insights field.

 

Ahead of the Text Analytics Summit West 2014, Data Driven Business caught up with them to gain perspectives on just how important and interlinked Big Data is with Text Analytics.

 

Q1. What was the biggest hurdle that you had to overcome in order to reach your current level of achievement with Big Data Analytics?

KB: The biggest hurdle for me has consistently been cultural -- i.e., convincing others in the organization that big data analytics is not "business as usual", that the opportunities and potential for new discoveries, new insights, new products, and new ways of engaging our stakeholders (whether in business, or education, or government) through big data analytics are now enormous.

After I accepted the fact that the most likely way for people to change their viewpoint is for them to observe demonstrated proof of these big claims, I decided to focus less on trying to sell the idea and focus more on reaching my own goals and achievements with big data analytics. After making that decision, I never looked back -- whatever successes that I have achieved, they are now influencing and changing people, and I am no longer waiting for the culture to change.

THCA: There are technical/tactical hurdles, and methodological ones. The technical scale/speed ones were relatively easy to deal with once we started building our own software OdinText. Computing power continues to increase, and the rest is really about optimizing code.

The methodological hurdles are far more challenging. It’s relatively easy to look at what others have done, or even to come up with new ideas. But you do have to be willing to experiment, and more than just willingness, you need the time and the data to do it! There is a lot of software coming out of academia now. They like to mention their institution in every other sentence: “MIT this” or “UCLA that”. The problem they face is twofold. On the one hand, they don’t have access to enough real data to see if their theories play out. On the other, they don’t have the real-world business experience and access to clients to know which things are actually useful and which are just novelty.

So, our biggest hurdle has been the time and effort invested through empirical testing. It hasn’t always been easy, but it’s put me and my company in an incredibly unique position.

Q2. Size of data, does it really matter? How much data is too little or too much?

THCA: Great question; with text analytics, size really does matter. While it’s technically possible to get insights from very small data (for instance, on our blog during the elections one of my colleagues did a little analysis of Romney vs. Obama debate transcripts), text analytics really is data mining, and when you’re looking for patterns in text, the more data you have, the more interesting relationships you can find.

KB: Size of data doesn't really matter if you are just getting started. You should get busy with analytics regardless of how little data you have. The important thing is to identify what you need (new skills, technologies, processes, and data-oriented business objectives) in order to take advantage of your digital resources and data streams. As you become increasingly comfortable with those, then you will grow in confidence to step up your game with bigger data sets. If you are already confident and ready-to-go, then go! The big data revolution is like a hyper-speed train -- you cannot wait for it to stop in order to get on board -- it isn't stopping or slowing down! At the other extreme, we do have to wonder if there is such a thing as too much data. The answer to this question is "yes" if we dive into big data's deep waters blindly without the appropriate "swimming instruction" (i.e., without the appropriate skills, technologies, processes, and data-oriented business objectives). However, with the right preparations, we can take advantage of the fact that bigger data collections enable a greater depth of discovery, insight, and data-driven decision support than ever before imagined.

Q3. What is the one thing that motivates and inspires you the most in your Big Data Analytics work?

KB: Discovery! As a scientist, I was born curious. I am motivated and inspired to ask questions, to seek answers, to contemplate what it all means, and then to ask more questions. The rewards from these labors are the discoveries that are made along the way. In data analytics, the discoveries may be represented by a surprising unexpected pattern, trend, association, correlation, event, or outlier in the data set. That discovery then becomes an intellectual challenge (that I love): What does it mean? What new understanding does this discovery reveal about the domain of study (whether it is astrophysics, or retail business, or national security, or healthcare, or climate, or social, or whatever)? The discovery and the corresponding understanding are the benefits of all the hard work of data wrangling.

THCA: Anyone working with analytics has to be curious by nature. Satisfying that curiosity is what drives us. More specifically in my case, if our clients get excited about using our software and the insights they’ve uncovered, then that really gets me and my whole team excited. This can be challenging, and not all data is created equal.

It can be hard to tell someone who is excited about trying Text Analytics that their data really isn’t suitable. The opposite is even more frustrating though, knowing that a client has some really interesting data but is apprehensive about trying something new because they have some old tools lying around that they haven’t used, or because they have a difficult time getting access to the data because it’s technically “owned” by some other department that doesn’t ‘Get’ analytics. But helping them build a case and then helping them look good by making data useful to the organization really feeds into that basic curiosity. We often discover problems to solve we had no idea existed. And that’s very inspiring and rewarding.

Q4. Which big data analytics myth would you like to squash right here and now?

KB: Big data is not about data volume! That is the biggest myth and red herring in the business of big data analytics. Some people say that "we have always had big data", referring to the fact that each new generation has more data than the previous generation's tools and technologies are able to handle. By this reasoning, even the ancient Romans had big data, following their first census of the known world. But that's crazy. The truth of big data analytics is that we are now studying, measuring, tracking, and analyzing just about everything through digital signals (whether it is social media, or surveillance, or satellites, or drones, or scientific instruments, or web logs, or machine logs, or whatever). Big data really is "everything, quantified and tracked". This reality is producing enormously huge data volumes, but the real power of big data analytics is in "whole population analysis", signaling a new era in analytics: the "end of demographics", the diminished use of small samples, the "segment of one", and a new era of personalization. We have moved beyond mere descriptive analysis, to predictive, prescriptive, and cognitive analytics.

THCA: Tough one. There are quite a few. I’ll avoid picking on “social media listening” for a bit and pick something else. One of the myths out there is that you have to be some sort of know-it-all ‘data scientist’ to leverage big data. This is no longer true. Along with this comes a lot of buzzword dropping, like “natural language processing” or “machine learning”, terms which on their own really don’t mean anything at all.

If you understand smaller data analytics, then there really is no reason at all that you shouldn’t understand big data analytics. Don’t ever let someone use some buzz word that you’re not sure of to impress you. If they can’t explain to you in layman’s terms exactly how a certain software works or how exactly an analysis is done and what the real business benefit is, then you can be pretty sure they don’t actually have the experience you’re looking for and are trying to hide this fact.

Q5.What’s more important/valuable, structured or unstructured data?

KB: Someone said recently that there is no such thing as unstructured data. Even binary-encoded images or videos are structured. Even free text and sentences (like this one) are structured (through the rules of language and grammar). Even some meaning this sentence has. One could say that analytics is the process of extracting order, meaning, and understanding from data. That process is made easier when the data are organized into databases (tables with rows and columns), but the importance and value of the data are inherently no more or no less for structured or unstructured data. Despite these comments, I should say that the world is increasingly generating and collecting more "unstructured data" (text, voice, video, audio) than "structured data" (data stored in database tables). So, in that sense, "unstructured data" is more important and valuable, simply because it provides a greater signal on the pulse of the world. But I now return to my initial point: to derive the most value from these data sources, they need to be analyzed and mined for the patterns, trends, associations, correlations, events, and outliers that they contain. In performing that analysis, we are converting the inherent knowledge encoded in the data from a "byte format" to a "structured information format". At that point, all data really become structured.

THCA: A trick question. We all begin with a question and relatively unstructured data. The goal of text analytics is structuring that data which is often most unstructured.

That said, based on the data we often look at (voice of customer surveys, call center and email data, various other web based data), I’ve personally seen that the unstructured text data is usually far richer. I say that because we can usually take that unstructured data and accurately predict/calculate any of the available structured data metrics from it. On the other hand, the unstructured data usually contain a lot of additional information not previously available in the structured data. So unlocking this richer unstructured data allows us to understand systems and processes much better than before and allows us to build far more accurate models.

So yes, unstructured/text data is more valuable, sorry.

Q6. What do you think is the biggest difference between big data analysis being done in academia vs in business?

KB: Perhaps the biggest difference is that data analysis in academia is focused on design (research), while business is focused on development (applications). In academia, we are designing (and testing) the optimal algorithm, the most effective technique, the most efficient methodology, and the most novel idea. In business, you might be 100% satisfied to apply all of those academic results to your business objectives, to develop products and services, without trying to come up with a new theory or algorithm. Nevertheless, I am actually seeing more and more convergence (though that might be because I am personally engaged in both places through my academic and consulting activities). I see convergence in the sense that I see businesses who are willing to investigate, design, and test new ideas and approaches (those projects are often led by data scientists), and I see academics who are willing to apply their ideas in the marketplace (as evidenced by the large number of big data analytics startups with university professors in data science leadership positions). The data "scientist" job category should imply that some research, discovery, design, modeling, and hypothesis generation and testing are part of that person's duties and responsibilities. Of course, in business, the data science project must also address a business objective that serves the business needs (revenue, sales, customer engagement, etc.), whereas in academia the objective is often a research paper, or a conference presentation, or an educational experience. Despite those distinctions, data scientists on both sides of the academia-business boundary are now performing similar big data analyses and investigations. Boundary crossing is the new normal, and that's a very good thing.

THCA: I kind of answered that in the first question. I think academics have the freedom and time to pursue a research objective even if it doesn’t have an important real-world outcome. So they can pick something fun that may or may not be very useful, such as: are people happier on Tuesdays or Wednesdays? They’ll often try to solve these stated objectives in some clever ways (hopefully), though there’s a lot of “Pop” research going on even in academia these days. They are also often limited in the data available to them, having to work with just a single data set that has somehow become available to them.

So, academia is different in that they raise some interesting fun questions, and sometimes the ideas borne out of their research can be applied to business.

Professional researchers have to prove an ROI in terms of time and money. Of course, technically we also have access to both more time and more money, and also a lot more data. So an academic team of researchers working on text analytics for 2-3 years is not going to be exposed to nearly as much data as a professional team.

That’s also why academic researchers often seem so in love with their models and accuracy. If you only have a single data set to work with, then you split it in half and use that for validation. In business on the other hand, if you are working across industries like we do, while we certainly may build and validate models for a specific client, we know that having a model that works across companies or industries is nearly impossible. But when we do find something that works, you can bet it’s going to be more likely to be useful.

Text Analysis of 2012 Presidential Debates

Obama more certain and positive - Romney more negative and direct

Lately there's been a craze in analyzing 140-character Tweets to make all sorts of inferences about everything from brand affinity to political opinion. While I'm generally of the position that the best return on investment in text analytics is on large volumes of comments, I fear we often overlook other interesting data sources in favor of what a small percentage (about 8%) of the population says in tweets or blogs.

When the speakers are the current and possibly the next president of the US, looking at what, if anything, can be gained by leveraging text analytics on even very small data sets starts to become more interesting.

Therefore ahead of the final presidential debate between Obama and Romney we uploaded the last two presidential debates into our text analytics software, OdinText, to see what if anything political pundits and strategists might find useful. OdinText read and coded the debates in well under a minute, and below are some brief top-line findings for those interested.

[Note, typically text analytics should not be used in isolation from human domain expert analysis. However, in the spirit of curiosity, and in hopes of providing a quick and unbiased analysis we're providing these findings informally ahead of tonight's debate.]

The Devil in the Detail

Comments from a source like a debate are heavily influenced by the questions asked by the moderator. Unlike analysis of more free-flowing, unguided comments from the many, where the primary benefit of text analytics is often to understand what is being discussed and by how many, the benefit of analyzing a carefully moderated discussion between just two people is more likely to lie in the detail. Therefore, rather than focusing on the typical charts quantifying exactly which issues are discussed (which are largely controlled by the moderator), the focus of text analytics on these types of smaller data is on the details of exactly how things are said, as well as what often isn't said or avoided.

That's not the right answer for America. I'll restore the vitality that gets America working again (Governor Romney Debate #2)

In text analysis of the debates, the first findings often reveal frequency differences in specific terms and issues, such as the fact that Governor Romney is far more likely than President Obama to mention "America" when speaking (88 vs. 42 times across the first two debates). We make no assumptions in this analysis about whether this is a strategic consideration during the debates or a matter of personal style, nor about whether it has a beneficial impact on the audience.

However, the differences in frequency and repetition of certain terms mentioned by a speaker, such as "millions looking for work", obviously do reflect how important the speaker believes these issues to be. How Obama and Romney refer to the audience, the moderator, and US citizens is easy to quantify and may also play a role in how they are perceived. For instance, Romney prefers the term "people" (used 77 times in the second debate vs. Obama's 26), whereas Obama prefers the term "folks" (19 times vs. Romney's 2). Text analytics also quickly identified that, unlike in the first debate, Obama was twice as likely as Romney to mention the moderator, "Candy", by name in the second debate.
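Term-frequency comparisons like these need nothing more than tokenizing and counting. A toy sketch in Python (the two snippets below are invented stand-ins for the actual transcripts):

```python
import re
from collections import Counter

# Invented snippets standing in for each candidate's transcript.
obama  = "Folks, we know folks across America want taxes that are fair."
romney = "People want America working. People across America need jobs."

def term_counts(text):
    # Lowercase, tokenize on letters/apostrophes, then tally.
    return Counter(re.findall(r"[a-z']+", text.lower()))

o, r = term_counts(obama), term_counts(romney)
for term in ("america", "people", "folks"):
    print(term, o[term], r[term])
```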

Certain terms like "companies", "taxes" and "families" were favored by Obama and avoided by Romney. Conversely, Romney was significantly more likely to use measuring terms, though many were rather indefinite, such as "number", "high" and "half" (e.g., "...unemployment at chronically high level..."); we did, however, also see an attempt by Romney to reference specific percentages. Obviously, text analytics cannot fact-check quantitative claims; this is where domain expertise from a human analyst comes into play.

From Specific Terms to General Linguistic Differences

Taking text analytics a step beyond the specifics to analyze emotion and linguistic measures of speech can also be interesting...

Volume and Complexity (Obama more complex - Romney more verbose)

In both debates, Romney spoke approximately 500 more words than Obama (7% and 6% more words, respectively); this greater talkativeness sometimes reflects more competitive/aggressive behavior. Obama, on the other hand, used more sophisticated language than Romney in the first debate (7% more words with 6 or more letters; see the chart presenting percentage differences in the use of certain types of language by the two candidates; comparisons were done separately for the first and second debates). However, he reduced the use of such language in the second debate.
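The "words with 6 or more letters" measure is a common complexity proxy and is trivially computed; a sketch, with two made-up sample sentences:

```python
import re

def long_word_share(text, min_len=6):
    # Share of words with at least min_len letters: a rough complexity proxy.
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(w) >= min_len for w in words) / len(words)

plain    = "We will get it done and we will do it now"
complex_ = "Comprehensive economic policies require sustained bipartisan commitment"

print(long_word_share(plain), long_word_share(complex_))
assert long_word_share(complex_) > long_word_share(plain)
```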

Past, Present and Future Tense (Obama explains past - Romney focuses on future)

Both candidates were equally likely to speak in the present tense. However, in both debates Obama was significantly more likely than Romney to speak in the past tense (55% and 18% more often, respectively). Romney, on the other hand, was more likely to speak in the future tense in both debates (60% and 34% more often). This contrast between past and future orientation is of course partly explained by the candidates' differing status: Obama's prior presidential experience versus Romney's aspiration to be elected to the office.

Personal Pronouns (Obama Collectivist - Romney Direct)

Whereas both candidates expressed an individualistic tone equally often (i.e., the same frequency of 1st person singular pronouns, e.g., I, me, mine), Obama was more likely in both debates to use a collectivist tone (42% and 60% more, respectively). This use of 1st person plural pronouns (e.g., we, us, our) often suggests a stronger identification with a group, team, or nation. In part this may echo Obama's slogan from the first election ("Yes, we can."), which may reflect collectivist rather than individualist values.

In the second debate, Romney used direct language more often than Obama, addressing the president and/or the moderator directly. Romney was 57% more likely than Obama to use 2nd person pronouns (e.g., you, your), for instance in phrases like "Let me give you some advice. Look at your pension. You also have investments in Chinese companies (...)" or "Thank you Kerry for your question." Obama, on the other hand, reduced his use of such language from the first debate to the second (using 38% more direct language in the first debate than in the second).
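Pronoun-category rates of the kind cited in this section can be approximated with a small dictionary lookup. The category word lists below follow common LIWC-style conventions but are abbreviated and illustrative, not the exact dictionaries used in the analysis:

```python
import re
from collections import Counter

# Abbreviated LIWC-style pronoun categories (illustrative, not exhaustive).
PRONOUNS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second_person": {"you", "your", "yours", "yourself"},
}

def pronoun_profile(text: str) -> dict:
    """Count pronouns in each category, normalized per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        for category, members in PRONOUNS.items():
            if w in members:
                counts[category] += 1
    total = len(words) or 1  # avoid division by zero on empty input
    return {cat: 100.0 * counts[cat] / total for cat in PRONOUNS}

# Invented example: 2 first-plural and 2 second-person pronouns in 10 words.
print(pronoun_profile("We believe our future depends on you and your vote"))
```

Running a profile like this per speaker, per debate, and comparing the normalized rates gives relative differences such as the "42% more collectivist" figure above.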

Emotion (Obama more positive - Romney more negative)

The analysis of the emotional content of the debates revealed that the candidates' speech was often emotionally charged, but the focus on positive or negative affect differed between the candidates. Both candidates used positive emotions equally often in the first debate, and they used negative emotions equally often in the second.

The emotional tone of the candidates' speech could have had an important impact on how the audience perceived them. In particular, Romney's heavier use of negative affect in the first debate could have made voters pay more attention to him and possibly offer more support.

In the first debate, Romney used significantly more negative emotion in his speech (54% more often than Obama), and in particular he used more words pertaining to sadness (169% more often than Obama). Conversely, in the second debate, Obama's speech was significantly more likely than Romney's to contain positive emotions (12% more).

Complexity (Obama Ideas - Romney Details)

In both debates, Obama used cognitive language more often than Romney (10% and 13% more in the first and second debates, respectively). Cognitive language contains references to knowledge, awareness, thinking, etc. Obama was also more likely to use language pertaining to causation (75% and 30% more often in the first and second debates), and in the second debate he was also 47% more likely than Romney to express certainty in his speech. The latter may also partly reflect Obama's more confident tone during the second debate, in which his performance was widely deemed better than in the first.

In that same debate, Romney was 47% more likely than Obama to make references to insight and sources of knowledge. Relatedly, in both debates Romney's speech showed a greater insistence on numbers, quantitative data, and details (75% and 65% more often).

General Issues Focus (Obama Society & Family - Romney Healthcare & Jobs)

Even though the topics discussed during the debates were prompted and moderated, some patterns of heavier focus on certain issues by the two candidates emerged. Romney made significantly more references to health issues than Obama did during the first debate (43% more). In the first debate, Romney was also more likely to mention occupational issues (26% more often) as well as achievement (36% more often). Obama, on the other hand, referred to social relationships and family significantly more often than Romney in both debates (social relationships: 9% and 6% more often; family: 104% and 138% more often). Both candidates referred to financial issues equally often in both debates, though this area was mentioned less often during the second debate.

Linguistic Summary (Key Differences by Speaker in Debates)

As mentioned earlier, whether the candidates' specific use of language was intentional, part of a tactic, or merely a reflection of character and demographic background is unclear without deeper analysis by a domain expert. Nevertheless, some of the above linguistic differences may certainly have contributed to a candidate winning over more audience support in one or both of the debates. The diagram above presents in visual form which parts of speech differed significantly between the two candidates. Those marked in bold highlight speech categories that a candidate used significantly more often during only one of the debates, hinting at debate-specific language style. For instance, unique to the first debate was Obama's use of sophisticated language, whereas Romney relied more on negative emotions and sadness and focused more on health, occupational, and achievement issues. Neither candidate used these speech categories significantly more often in the second debate. In the latter debate, Obama relied more on positive emotions and certainty in his language, whereas Romney used more direct language and references to insight.

Conclusion (Negative VS Positive Emotion and Certainty Related to Specific Issues)

Debates are certainly a unique type of unstructured data. A debate follows a predetermined outline, is moderated, and we can assume both participants have invested time anticipating and practicing responses that their teams believe will have the maximum possible effect for their side. To what extent the types of speech used were intentional, or simply related to the different questions and political positions of the candidates, is hard to say without further research and analysis.

However, if I were on either candidate's political team, I think even this rather quick text analysis would be useful. As the general consensus is that Romney performed better in the first debate and Obama in the second, a strategic recommendation for Romney might be to counter Obama's sophistication on certain issues with negativity and to focus on areas Obama seems to want to avoid, such as health care and jobs. Conversely, I might counsel Obama to counter Romney's negative emotion with even greater positive emotion when possible, to continue encouraging Romney to go into detail, and to counter those details with the certainty present in his own speech from the second debate.

Further analysis would be needed to better understand exactly what impact the various speech patterns had in the debate. That said, it seems some tactics known to be successful in social and business situations have been used during the debates. For instance, Obama by using more 1st person plural pronouns (e.g., we, our, us) may be identifying better with the entire nation and thus may have created a feeling of unity, shared goals and beliefs with the public.

This simple tactic has been used by managers and orators for a long time. Sometimes more individualistic language can create too much separation and lose potential support. However, we also need to acknowledge that different strategies succeed for candidates at different stages. For instance, Romney's negative emotions likely result from his critique of the current state of affairs and of Obama's actions. Negative emotion here, used in moderation, may well be an appropriate choice of language for someone aspiring to change things.

Conversely, Obama responding to and reflecting on his past four years in office with more positive affect is an obvious way of presenting his experience and work as president in a better light.

A very exciting line of further research could explore the candidates' facial expressions during the debates. These may map onto findings from the text analysis (e.g., the amount of positive versus negative emotion) but may also reveal interesting discrepancies and tendencies. It would be an interesting analysis because body language can be as important a source of information as spoken language, and it can be a very powerful tool in winning over support. This avenue of research could help us understand which candidate received more support and whether that support was influenced by political attitudes, language, body language, or a combination of the three.

Ideally further analysis combining text analytics with other data from people meters, facial expressions, or other biometric measures could help answer some of these questions more definitively and provide insight into exactly how powerful language choice and style can be.

@TomHCAnderson @OdinText

PS. Special thanks to my colleague Dr. Gosia Skorek for indulging my idea and helping me run these data so quickly on a Saturday! ;)

[NOTE: There are several ways to text analyze this type of data. The power of text analytics depends on the volume and quality of data available, domain expertise, time invested and creativity of the analyst, as well as other methodological considerations on how the data can be processed using the software. Anderson Analytics - OdinText is not a political consultancy, and our focus is generally on much larger volumes of comments within the consumer insights and CRM domain. Those interested in more detail regarding the analysis may contact us at odintext.com]