Posts tagged Gosia
What Does the Co-Occurrence Graph Tell You?

Text Analytics Tips by Gosia

The co-occurrence graph in OdinText may look simple at first sight, but it is in fact a very complex visualization. Using an example, we are going to show you how to read and interpret this graph. See the attached screenshots of a single co-occurrence graph based on a satisfaction survey of 500 car dealership customers (Fig. 1-4).

The co-occurrence graph is based on multidimensional scaling techniques that let you view the similarity between individual elements of the data (e.g., automatic terms) while taking into account several aspects of the data (i.e., frequency of occurrence, co-occurrence, and relationship with the key metric). The graph represents the co-occurrence of words as the spatial distance between them, i.e., it plots terms that are often mentioned together as close to each other as possible (approximate overlap/concurrence).
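If you are curious about the mechanics, the layout can be sketched in a few lines: count how often term pairs co-occur in comments, convert the counts to dissimilarities, and project them into 2-D with multidimensional scaling. This is an illustrative sketch, not OdinText’s actual implementation; the comments and the distance formula are made up for the example.

```python
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.manifold import MDS

comments = [
    "the waiting room was luxurious with free coffee",
    "unprofessional manager and unprofessional employees",
    "the staff was nice and the waiting room had coffee",
]

# Count per-comment term occurrences (node size) and pairwise
# co-occurrences (line thickness).
terms = sorted({w for c in comments for w in c.split()})
occurs = Counter(w for c in comments for w in set(c.split()))
pairs = Counter(p for c in comments
                for p in combinations(sorted(set(c.split())), 2))

# Turn co-occurrence counts into dissimilarities: frequently
# co-mentioned terms get a small distance.
n = len(terms)
idx = {t: i for i, t in enumerate(terms)}
dist = np.ones((n, n))
np.fill_diagonal(dist, 0.0)
for (a, b), k in pairs.items():
    d = 1.0 / (1.0 + k)  # more co-occurrence -> closer together
    dist[idx[a], idx[b]] = dist[idx[b], idx[a]] = d

# Project the terms onto a 2-D plane with multidimensional scaling.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
```

In a real graph you would additionally scale node sizes by `occurs` and color each node by the average key metric of the commenters who mentioned that term.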

Figure 1. Co-occurrence graph (all nodes and lines visible).

The attached graph (Fig. 1 above) is based on the 50 most frequently occurring automatic terms (words) mentioned by the car dealership customers. Each node represents one term. The node’s size corresponds to the number of occurrences, i.e., in how many customer comments a given word was found (the larger the node, the greater the number of occurrences). In this example, green nodes correspond to higher overall satisfaction and red nodes to lower overall satisfaction among customers who mentioned a given term, whereas brown nodes reflect satisfaction scores close to the metric midpoint. Finally, the thickness of the line connecting two nodes shows how often the two terms are mentioned together (actual overlap/concurrence): the thicker the line, the more often they are mentioned together in a comment.

Figure 2. Co-occurrence graph (“unprofessional” node and lines highlighted).

So what are the most interesting insights based on a quick look at the co-occurrence graph of the car dealership customer satisfaction survey?

  • “Unprofessional” is the most negative term (red node) and it is most often mentioned together with “manager” or “employees” (Fig. 2 above).
  • “Waiting” is a relatively frequently occurring (medium-sized node) and neutral term (brown node). It is often mentioned together with “room” (another neutral term) as well as “luxurious”, “coffee”, and “best”, which correspond to high overall satisfaction (light green nodes). Thus, it seems that the luxurious waiting room with coffee available is highly appreciated by customers and makes the waiting experience less negative (Fig. 3 below).
  • The dealership “staff” is often mentioned together with such positive terms as “always”, “caring”, “nice”, “trained”, and “quick” (Fig. 4 below). However, staff is also mentioned with more negative terms, including “unprofessional”, “trust”, and “helpful”, suggesting a few negative customer evaluations related to these terms which may need attention and improvement.

    Figure 3. Co-occurrence graph (“waiting” node and lines highlighted).

    Figure 4. Co-occurrence graph (“staff” node and lines highlighted).

    Hopefully, this example helps you extract quick and valuable insights from your own data!

Gosia

Text Analytics Tips with Gosia

[NOTE: Gosia is a Data Scientist at OdinText Inc. Experienced in text mining and predictive analytics, she is a Ph.D. with extensive research experience in mass media’s influence on cognition, emotions, and behavior.  Please feel free to request additional information or an OdinText demo here.]

Customer Satisfaction: What do satisfied vs. dissatisfied customers talk about?

Text Analytics Tips by Gosia

In this post we are going to discuss one of the first questions most researchers tend to explore using OdinText: what do satisfied versus dissatisfied customers talk about? Many market researchers seek to find out not only what their survey respondents as a whole mention; it is even more critical for them to understand the strengths mentioned by customers who are happy and the problems mentioned by those who are less happy with the product or service.

To perform this kind of analysis you first need to identify “satisfied” and “dissatisfied” customers in your data. The best way to do this is with a satisfaction or satisfaction-related metric, e.g., Overall Satisfaction or NPS (Net Promoter Score) rating (i.e., likelihood to recommend). In this example, satisfied customers are those who answered 4 (“Somewhat satisfied”) or 5 (“Very satisfied”) on the Overall Satisfaction question (scale 1-5), and dissatisfied customers are those who answered 1 (“Very dissatisfied”) or 2 (“Somewhat dissatisfied”).

Next, you can compare the content of the comments provided by the two groups of customers (Group Comparison tab). I suggest you first select the frequency-of-occurrence statistic for your comparison. You can use a dictionary or create your own issues that are meaningful to you and see whether the two groups discuss these issues with different frequency, or you can look at differences in the frequency of the most commonly mentioned automatic terms (which OdinText has generated for you).
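As a rough illustration of what such a group comparison computes, here is a minimal pandas sketch. The column names, scale cutoffs, and comments are hypothetical stand-ins, not an OdinText API:

```python
from collections import Counter

import pandas as pd

# Hypothetical survey extract: a 1-5 satisfaction rating plus a comment.
df = pd.DataFrame({
    "overall_sat": [5, 4, 2, 1, 5],
    "comment": ["friendly staff", "clean waiting room", "rude manager",
                "manager was unprofessional", "friendly and quick"],
})

def term_freq(comments):
    """Share of comments in which each term appears at least once."""
    counts = Counter(w for c in comments for w in set(c.lower().split()))
    return {t: k / len(comments) for t, k in counts.items()}

# Satisfied = 4-5, dissatisfied = 1-2 on the Overall Satisfaction scale.
sat = term_freq(df.loc[df["overall_sat"] >= 4, "comment"])
dis = term_freq(df.loc[df["overall_sat"] <= 2, "comment"])
```

Comparing `sat` and `dis` term by term then shows which issues each group raises disproportionately.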

Figure 1. Frequency of issues mentioned by satisfied (Overall Satisfaction 4-5) versus dissatisfied (Overall Satisfaction 1-2) customers. Descending order of frequency for satisfied customers.

In the attached figure you can see a chart based on a simple group comparison using a dictionary of terms of a sample service company. There you go, lots of exciting insights to present to your colleagues based on a very quick analysis!

Gosia


Code by Hand? The Benefits of Automated and User-Guided Automated Customer Comment Coding

Text Analytics Tips by Gosia

Most researchers know very well that coding text data manually (using human coders who read the text and assign different codes) is very expensive, both in the time the coders need and in the money required to compensate them for this effort.

The major advantage of human coding, however, is coders’ strong grasp of the complex meaning of text, including sarcasm and jokes.

Usually at least two coders are required to code any type of text data, and the calculation of inter-rater reliability or inter-rater agreement is a must. This statistic shows how similarly any number of coders have coded the data, i.e., how often they agreed on using the exact same codes.
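For the statistically minded, percent agreement and Cohen’s kappa (a common agreement statistic for two coders) can be computed by hand. The codes below are invented for illustration:

```python
from collections import Counter

# Codes assigned by two hypothetical coders to the same ten comments.
coder_a = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "pos", "neu", "pos"]
coder_b = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "pos", "neu", "pos"]

n = len(coder_a)
# Raw percent agreement: how often the two coders used the same code.
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Chance agreement: probability both coders pick the same code at random,
# given each coder's own code distribution.
pa, pb = Counter(coder_a), Counter(coder_b)
expected = sum(pa[c] * pb[c] for c in pa) / n ** 2

# Cohen's kappa corrects observed agreement for chance agreement.
kappa = (observed - expected) / (1 - expected)
```

Here the coders agree 80% of the time, but kappa is noticeably lower because some of that agreement would occur by chance.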

Often, even with the simplest codes, the accuracy of human coding is low. No two human coders consistently code larger amounts of data the same way, because of differing interpretations of the text or simply due to error. The latter is the reason why no single coder will code the same text data identically a second time (perfect reliability for a single coder could be achieved in theory, e.g., for very small datasets that can be proofread multiple times).

Another limitation is that human coders can keep only a limited number of codes in their working memory while reading the text. Finally, any change to the codes requires repeating the entire coding process from the beginning. Because manual coding of larger datasets is expensive and unreliable, automated coding using computer software was introduced.

Automated or algorithm-based text coding solves many of the issues of human coding:

  1. It is fast: thousands of text comments can be read in seconds.
  2. It is cost-effective: automated coding should always be cheaper than human coding, as it requires much less time.
  3. It offers perfect consistency: the same rules are applied every time, without errors.
  4. It allows an unlimited number of codes, in theory (some software may have limitations).

However, this process also has disadvantages. As mentioned above, only humans can fully understand the complex meaning of text, and simple algorithms are likely to fail at this (although some newer algorithms under development can come close to human performance). Moreover, most software available on the market offers low flexibility, as the codes cannot be viewed or changed by the user.

Figure 1. Comparison of OdinText with “human coding” and “automated coding” approaches.

Therefore, OdinText’s developers decided to let users guide the automated coding. Users can view and edit the default codes and dictionaries, create and upload their own, or build custom dictionaries based on the exploratory results provided by the automated analysis. The codes can be very complex and specific, producing a good understanding of the meaning of the text, which is the key goal of any text analytics software.
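To make the idea concrete, the core of a user-guided dictionary coder can be reduced to a few lines: the user owns the dictionary, and the machine applies it identically to every comment. The dictionary and comment below are hypothetical, not OdinText’s defaults:

```python
# A minimal user-guided dictionary coder: each code is a set of trigger
# terms, and a comment receives every code whose terms it mentions.
# The same rules are applied to every comment, without errors.
DICTIONARY = {
    "staff": {"staff", "employees", "manager"},
    "waiting": {"waiting", "wait", "queue"},
    "negative_service": {"unprofessional", "rude", "slow"},
}

def code_comment(text):
    """Return the sorted list of codes triggered by a comment."""
    words = set(text.lower().split())
    return sorted(code for code, terms in DICTIONARY.items() if words & terms)

codes = code_comment("The manager was unprofessional while we were waiting")
```

Editing `DICTIONARY` and rerunning recodes the whole dataset in seconds, which is exactly the change-and-repeat step that is prohibitively expensive with human coders.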

OdinText is a user-guided automated text analytics solution, which has aspects and benefits of both fully automated and human coding. It is fast, cost-effective, accurate, and allows for an unlimited number of codes like many other automated text analytics tools. However, OdinText surpasses the capabilities of other software by providing high flexibility and customization of codes/dictionaries and thus a better understanding of the meaning of text. Moreover, OdinText allows you to conduct statistical analyses and create visualizations of your data in the same software.

Try switching from human coding to user-guided automated coding and you will be pleasantly surprised how easy and powerful it is!

Gosia


[NOTE: OdinText is NOT a tool for human assisted coding. It is a tool used by analysts for better and faster insights from mixed (structured and unstructured) data.]

Beyond Sentiment - What Are Emotions, and Why Are They Useful to Analyze?

Text Analytics Tips by Gosia

Emotions - Revealing What Really Matters

Emotions are short-term intensive and subjective feelings directed at something or someone (e.g., fear, joy, sadness). They are different from moods, which last longer, but can be based on the same general feelings of fear, joy, or sadness.

3 Components of Emotion: Emotions result from arousal of the nervous system and consist of three components: subjective feeling (e.g., being scared), physiological response (e.g., a pounding heart), and behavioral response (e.g., screaming). Understanding human emotions is key in any area of research because emotions are one of the primary causes of behavior.

Moreover, emotions tend to reveal what really matters to people. Therefore, tracking primary emotions conveyed in text can have powerful marketing implications.

The Emotion Wheel - 8 Primary Emotions

OdinText can analyze any psychological content of text but the primary attention has been paid to the power of emotions conveyed in text.

8 Primary Emotions: OdinText tracks the following eight primary emotions: joy, trust, fear, surprise, sadness, disgust, anger, and anticipation (see attached figure; primary emotions in bold).

Sentiment Analysis


Bipolar Nature: These primary emotions have a bipolar nature; joy is opposed to sadness, trust to disgust, fear to anger, and surprise to anticipation. Emotions in the blank spaces are mixtures of the two neighboring primary emotions.

Intensity: The color-intensity dimension suggests that each primary emotion can vary in intensity, with darker hues representing a stronger emotion (e.g., terror > fear) and lighter hues a weaker emotion (e.g., apprehension < fear). The analogy between the theory of emotions and the theory of color was adopted from the seminal work of Robert Plutchik in the 1980s. [All 32 emotions presented in the figure above are the basis for OdinText’s Emotional Sentiment tracking metric.]
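The wheel’s structure is easy to capture as data. The sketch below uses Plutchik’s standard labels for the weaker and stronger variant of each primary emotion; it is a reading of the published wheel, not OdinText’s internal metric:

```python
# Plutchik's emotion wheel as data: four bipolar pairs of primary
# emotions, each with a weaker and a stronger variant.
OPPOSITES = {"joy": "sadness", "trust": "disgust",
             "fear": "anger", "surprise": "anticipation"}

INTENSITY = {  # (weaker, stronger) variant of each primary emotion
    "joy": ("serenity", "ecstasy"),
    "trust": ("acceptance", "admiration"),
    "fear": ("apprehension", "terror"),
    "surprise": ("distraction", "amazement"),
    "sadness": ("pensiveness", "grief"),
    "disgust": ("boredom", "loathing"),
    "anger": ("annoyance", "rage"),
    "anticipation": ("interest", "vigilance"),
}

def opposite(emotion):
    """Return the bipolar opposite of a primary emotion."""
    inverse = {v: k for k, v in OPPOSITES.items()}
    return OPPOSITES.get(emotion) or inverse.get(emotion)
```

A structure like this is enough to aggregate detected emotion words upward (e.g., counting “terror” and “apprehension” toward fear) or to contrast opposing poles.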

Stay tuned for more tips giving details on each of the above emotions.

Gosia


How to Increase the Amount of Text Data for Analysis

Text Analytics Tips by Gosia

If you find yourself slightly disappointed by the quantity or quality of the text comments provided by your respondents, you are definitely not alone. This is a common problem, especially when survey respondents are not compensated for their answers and when they are allowed to leave open-ended questions unanswered.

However, don’t give up and immediately start collecting more data or designing a new survey. Your current dataset may still contain valuable information in the form of text comments. A good practice is to pool together all text comments from a number of text variables in your dataset. You can select all of them, or just a subset that makes the most sense to analyze together.


Figure 1. Pooling text data for a richer analysis.

In the attached figure, the bubble on the left represents probably the most frequently analyzed question in customer satisfaction surveys: the open-ended question following a key rating (e.g., Overall Satisfaction Rating or Net Promoter Score Rating). Most of these surveys will have at least one or more very good questions that can complement the answers given to that open-ended question (see the remaining bubbles on the right of the figure). So why not analyze them all together? To do that, simply merge these text variables in your data editor, remembering to leave a blank space between the contents of the columns you are merging.
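In pandas, for example, the merge amounts to one line: fill in missing answers, join the columns with a blank space, and trim the result. The column names below are hypothetical:

```python
import pandas as pd

# Hypothetical survey with two open-ended questions per respondent.
df = pd.DataFrame({
    "why_rating": ["Great service", None, "Slow repairs"],
    "anything_else": ["Loved the coffee", "Too expensive", None],
})

# Pool the open ends into one column, separated by a blank space;
# missing answers become empty strings so no comments are lost.
text_cols = ["why_rating", "anything_else"]
df["all_text"] = df[text_cols].fillna("").agg(" ".join, axis=1).str.strip()
```

The pooled `all_text` column can then be uploaded as a single, richer text variable.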

Conclusion: Enriching your data can be simple and powerful.

This very simple pooling of text data from various open-ended questions will allow you to significantly enrich your analysis in OdinText.

Gosia

 


Peaks and Valleys or Critical Moments Analysis

Text Analytics Tips by Gosia

 

How can you gain interesting insights just from looking at descriptive charts based on your data? Select a key metric of interest, like Overall Satisfaction (scale 1-5), and use text analytics software that can plot text as well as numeric data longitudinally (e.g., OdinText) to view your metric averages across time. Next, view the plot using different time intervals (e.g., daily, weekly, bi-weekly, or monthly overall satisfaction averages) and look for obvious “peaks” (sudden increases in the average score) or “valleys” (sudden decreases in the average score). Note down the time periods in which you observed any peaks or valleys and try to identify reasons or events associated with these trends, e.g., changes in management, a new advertising campaign, customer service quality, etc. The next step is to plot average overall satisfaction scores for selected themes and see how they relate to the identified “peaks” or “valleys”, as these themes may provide potential answers to the critical moments in your longitudinal analysis.
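As an illustration of the plotting step, here is a minimal pandas sketch (with invented data) that averages a satisfaction metric by day and flags candidate “valleys”:

```python
import pandas as pd

# Invented daily satisfaction records (several surveys per day).
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-03-01", "2016-03-02", "2016-03-02",
                            "2016-03-03", "2016-03-04"]),
    "overall_sat": [5.0, 3.0, 3.2, 3.5, 4.4],
}).set_index("date")

# View the metric averaged over different time intervals.
daily = df["overall_sat"].resample("D").mean()
weekly = df["overall_sat"].resample("W").mean()

# Flag "valleys": days whose average falls well below the period mean.
valleys = daily[daily < daily.mean() - 0.5]
```

Rerunning the same resampling on a theme-filtered subset of the data (e.g., only comments mentioning a given theme) produces the per-theme trendlines used to explain the critical moments.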

In the figure below you can see how the average overall satisfaction of a sample company varied during approximately one month of time (each data point/column represents one day in a given month). Whereas no “peaks” were found in the average overall satisfaction curve, there was one significant “valley” visible at the beginning of the studied month (see plot 1 in Figure 1). It represented a sudden drop from the average satisfaction of 5.0 (day 1) to 3.1 (day 2) and 3.5 (day 3) before again rising up and oscillating around the average satisfaction of 4.3 for the rest of the days that month. So what could be the reason for this sudden and deep drop in customer satisfaction?


Figure 1. Annotated OdinText screenshots showing an example of an exploratory analysis using longitudinal data (Overall Satisfaction).

Whereas a definitive answer requires more advanced predictive analyses (also available in OdinText), a quick and very easy way to explore potential answers is simply to plot the average satisfaction scores associated with the few themes identified earlier. In this sample scenario, the average satisfaction scores among customers who mentioned “customer service” (green bars; second plot) overlap very well with the overall satisfaction trendline (orange line), suggesting that customer service complaints may have been the reason for the lowered satisfaction ratings on days 2 and 3. Another theme plotted, “fast service” (see plot 3), did not follow the overall satisfaction trendline at all, as customers mentioning this theme were highly satisfied on almost every day except day 6.

This kind of simple exploratory analysis can be very powerful in showing you which factors might affect customer satisfaction, and it may serve as a crucial step toward subsequent quantitative analysis of your text and numeric data.

 


Key Driver Analysis: Top-down & Bottom-up Approach

Get a complete picture of your data: The ‘Top-Down and Bottom-Up Approach’

At OdinText we’ve found that the best way to identify all key drivers in any analysis, especially in customer experience management (including but not limited to KPIs such as OSAT, Net Promoter Score, Likelihood to Return, or other real behavior), is through a dual process combining a theory-driven (aka “top-down”) approach with a data-exploratory or data-driven (aka “bottom-up”) approach:

Top-Down

This approach requires you to identify important concepts or themes before even starting to explore and analyze your data. In customer satisfaction or brand equity research you can often start by identifying these key concepts by reviewing the strengths and weaknesses associated with your brand or product, or by listing the advantages and challenges that you believe may be prevalent (e.g., good customer service, poor management, professionalism etc.). This is an a priori approach where the user/analyst identifies a few things that they believe may be important.

Bottom-Up

This approach requires you to use a more advanced text analytics software, like OdinText, to mark and extract concepts or themes that are most frequently mentioned in customers’ text comments found in your dataset and that are relevant to your brand or product evaluation (e.g., high cost, unresponsiveness, love). Better analytics software should be able to automatically identify important things that the user/analyst didn’t know to look for.

Top-down vs. Bottom-up

The top-down approach does not reflect the content of your data, whereas the bottom-up approach, while purely data-based, can fail to include important concepts or themes that occur in your data less frequently or are abstracted in some way. For instance, in a recent customer satisfaction analysis, very few customer comments explicitly mentioned problems associated with the management of the local branches (therefore, “management” was not mentioned frequently enough to be identified as a key driver by the software using the bottom-up approach).

However, as the analyst had hypothesized that management might be an important issue, more subtle mentions associated with the concept of management were included in the analysis. Subsequently, predictive analytics revealed that “poor management” was in fact a major driver of customer dissatisfaction. This key driver was only “discovered” because the analyst had also used a top-down approach in the text analysis.

Some of the concepts or themes identified using the two approaches may overlap, but this only ensures that the most important concepts are included.

Remember that only by combining these two very different approaches can you confidently identify the complete range of key drivers of satisfaction or other important metrics.

I hope you found today’s Text Analytics Tip useful.

Please check back in the next few days as we plan to post a new interesting analysis similar to, but even more exciting than last week’s Brand Analysis.

-Gosia


Text Analytics Tips

Text Analytics Tips, with your Hosts Tom & Gosia: Introductory Post

Today, we’re blogging to let you know about a new series of posts starting in January 2016 called ‘Text Analytics Tips’. This will be an ongoing series, and our main goal is to help marketers understand text analytics better.

We realize Text Analytics is a subject with incredibly high awareness, yet sadly also a subject with many misconceptions.

The first generation of text analytics vendors overhyped the importance of sentiment as a tool, as well as ‘social media’ as a data source, often preferring the even vaguer term ‘Big Data’ (usually just referring to tweets). They offered no evidence of the value of either and usually ignored the much richer techniques and sources of data for text analysis. Little to no information or training is offered on how to actually gain useful insights via text analytics.

What are some of the biggest misconceptions in text analytics?

  1. “Text Analytics is Qualitative Research”

FALSE – Text Analytics IS NOT qualitative. Text Analytics = Text Mining = Data Mining = Pattern Recognition = Math/Stats/Quant Research

  2. “It’s automatic (artificial intelligence): you just press a button and look at the report/wordcloud”

FALSE – Text Analytics is a powerful technique made possible by tremendous processing power. It can be easy with the right tool, but just like any other powerful analytical tool, it is limited by the quality of your data and the resourcefulness and skill of the analyst.

  3. “Text Analytics is a Luxury” (i.e., structured data analysis is of primary importance and unstructured data is an extra)

FALSE – Nothing could be further from the truth. In our experience, when text data is available, it almost always outperforms the standard available quant data in terms of explaining and/or predicting the outcome of interest!

There are several other text analytics misconceptions of course and we hope to cover many of them as well.

While various OdinText employees and clients may be posting in the ‘Text Analytics Tips’ series over time, Senior Data Scientist, Gosia, and our Founder, Tom, have volunteered to post on a more regular basis…well, not so much volunteered as drawing the shortest straw (our developers made it clear that “Engineers don’t do blog posts!”).

Kidding aside, we really value education at OdinText, and it is our goal to make sure OdinText users become proficient in text analytics.

Though Text Analytics, and OdinText in particular, are very powerful tools, we will aim to keep these posts light and fun, yet interesting and insightful. If you’ve just started using OdinText or are interested in applied text analytics in general, these posts are certainly a good start for you.

During this long-running series we’ll be posting tips, interviews, and various fun short analyses. Please come back in January for our first post, which will deal with the analysis of a very simple unstructured survey question.

Of course, if you’re interested in more info on OdinText, no need to wait, just fill out our short Request Info form.

Happy New Year!

Your friends @OdinText

Text Analytics Tips with Tom & Gosia

[NOTE: Tom is Founder and CEO of OdinText Inc. A long-time champion of text mining, in 2005 he founded Anderson Analytics LLC, the first consumer insights/marketing research consultancy focused on text analytics. He is a frequent speaker and data science guest lecturer at university and research industry events.

Gosia is a Senior Data Scientist at OdinText Inc. A Ph.D. with extensive experience in content analytics, especially psychological content analysis (i.e., sentiment analysis and emotion in text), as well as predictive analytics using unstructured data, she is fluent in German, Polish, and Spanish.]

 

Text Analysis of 2012 Presidential Debates

Obama more certain and positive - Romney more negative and direct

Lately there’s been a craze for analyzing 140-character tweets to make all sorts of inferences about everything from brand affinity to political opinion. While I’m generally of the position that the best return on investment in text analytics comes from large volumes of comments, I fear we often overlook other interesting data sources in favor of what a small percentage (about 8%) of the population says in tweets or blogs.

When the speakers are the current and possibly next president of the US, looking at what, if anything, can be gained by applying text analytics to even very small datasets starts to become more interesting.

Therefore, ahead of the final presidential debate between Obama and Romney, we uploaded the last two presidential debates into our text analytics software, OdinText, to see what, if anything, political pundits and strategists might find useful. OdinText read and coded the debates in well under a minute, and below are some brief top-line findings for those interested.

[Note, typically text analytics should not be used in isolation from human domain expert analysis. However, in the spirit of curiosity, and in hopes of providing a quick and unbiased analysis we're providing these findings informally ahead of tonight's debate.]

The Devil in the Detail

Comments from sources like a debate are heavily influenced by the questions asked by the moderator. Unlike the analysis of free-flowing, unguided comments by the many, where the primary benefit of text analytics is often to understand what is being discussed and by how many, the benefit of analyzing a carefully moderated discussion between just two people is more likely to lie in the detail. Rather than focusing on the typical charts quantifying exactly which issues are discussed (which are technically controlled by the moderator), the focus of text analytics on these smaller datasets is on the details of exactly how things are said, as well as what often isn’t said or avoided.

That's not the right answer for America. I'll restore the vitality that gets America working again (Governor Romney Debate #2)

In text analysis of the debates, the first findings often reveal frequency differences in specific terms and issues, such as the fact that Governor Romney was far more likely than President Obama to mention “America” when speaking (88 vs. 42 times across the first two debates). We make no assumptions in this analysis about whether this is a strategic consideration during the debates or a matter of personal style, or whether it has a beneficial impact on the audience.

However, certainly the differences in frequency and repetition of certain terms mentioned by a speaker such as "Millions looking for work" obviously do reflect how important the speaker believes these issues may be. How Obama and Romney refer to the audience, the moderator and to US citizens is easy to quantify and may also play a role in how they are perceived. For instance Romney prefers the term "people" (used 77 times in the second debate vs. Obama's 26 times), whereas Obama prefers the term "folks" (19 times vs. Romney's 2 times). Text Analytics also quickly identified that unlike the case in the first debate, Obama was twice as likely as Romney to mention the moderator "Candy" by name in the second debate.

Certain terms like “companies”, “taxes”, and “families” were favored by Obama and avoided by Romney. Conversely, Romney was significantly more likely to mention measuring terms, though many were rather indefinite, such as “number”, “high”, and “half” (e.g., “...unemployment at chronically high level...”). We did, however, also see an attempt by Romney to reference specific percentages. Obviously, text analytics cannot fact-check quantitative claims; this is where domain expertise from a human analyst comes into play.
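Counting such term preferences requires nothing more than tokenizing each transcript and tallying words. The snippets below are toy stand-ins for the actual transcripts:

```python
import re
from collections import Counter

# Toy stand-ins for the two candidates' debate transcripts.
obama = "We need to help folks get back to work. Folks across America know this."
romney = "People want jobs. The people of America deserve jobs, and half of them wait."

def word_counts(text):
    """Lowercase word tally for one speaker's transcript."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

o, r = word_counts(obama), word_counts(romney)
# Per-speaker frequency gap for any chosen term, e.g. "folks":
folks_gap = o["folks"] - r["folks"]
```

On the real transcripts, the same tallies directly yield comparisons like 88 vs. 42 mentions of “America”.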

From Specific Terms to General Linguistic Differences

Taking text analytics a step beyond the specifics to analyze emotion and linguistic measures of speech can also be interesting...

Volume and Complexity (Obama more complex - Romney more verbose)

In both debates, Romney spoke approximately 500 more words than Obama (7% and 6% more words, respectively); this greater talkativeness sometimes reflects more competitive/aggressive behavior. Obama, on the other hand, used more sophisticated language than Romney in the first debate (7% more words with six or more letters; see the chart presenting percentage differences in the use of certain types of language by the two candidates; comparisons were done separately for the first and second debates). However, he reduced the use of such language in the second debate.
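The “sophistication” measure here, the share of words with six or more letters, is trivial to compute. This is a crude proxy sketched for illustration, not necessarily the exact metric used above:

```python
import re

def long_word_share(text):
    """Share of words with 6 or more letters, a crude complexity proxy."""
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(w) >= 6 for w in words) / len(words)

share = long_word_share("the economy is growing")
```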

Past, Present and Future Tense (Obama explains past - Romney focuses on future)

Both candidates were equally likely to speak in the present tense. However, Obama was significantly more likely than Romney to speak in the past tense in both debates (55% and 18% more often, respectively). Romney, on the other hand, was more likely to speak in the future tense in both debates (60% and 34% more often). This contrast between past and future orientation is of course partly explained by the candidates’ differing status, that is, Obama’s prior presidential experience and Romney’s aspiration to be elected to this office in the future.

Personal Pronouns (Obama Collectivist - Romney Direct)

Whereas both candidates expressed an individualistic tone equally often (i.e., the frequency of 1st person singular pronouns, e.g., I, me, mine), Obama was more likely in both debates to use a collectivist tone (42% and 60% more, respectively). This use of 1st person plural pronouns (e.g., we, us, our) often suggests a stronger identification with a group, team, or nation. In part this may coincide with Obama's slogan from the first election ("Yes, we can."), which may reflect collectivist rather than individualist values.

In the second debate, Romney used direct language more often than Obama by addressing the president and/or the moderator. Romney was 57% more likely than Obama to use 2nd person personal pronouns (e.g., you, your), for instance in phrases like "Let me give you some advice. Look at your pension. You also have investments in Chinese companies (...)" or "Thank you Kerry for your question." Obama, on the other hand, reduced his use of such language from the first to the second debate (using 38% more direct language in the first debate than in the second).
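Pronoun-category comparisons like these are typically done with dictionary lookups normalized by transcript length. The sketch below assumes small illustrative word lists in the spirit of LIWC-style category counting; the category names and per-1,000-words normalization are my own choices, not a documented OdinText feature.

```python
import re

# Illustrative pronoun categories; real tools use more complete lists.
PRONOUNS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second_person": {"you", "your", "yours", "yourself"},
}

def pronoun_rates(transcript: str) -> dict:
    """Count pronouns per category, normalized per 1,000 words so that
    speakers with different verbosity can be compared fairly."""
    words = re.findall(r"[a-z']+", transcript.lower())
    total = len(words)
    rates = {}
    for category, vocab in PRONOUNS.items():
        hits = sum(1 for w in words if w in vocab)
        rates[category] = 1000.0 * hits / total if total else 0.0
    return rates

# Toy example using a phrase quoted above:
print(pronoun_rates("Let me give you some advice. Look at your pension."))
```

A statement such as "57% more likely to use 2nd person pronouns" would then be the percentage difference between the two speakers' normalized rates.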

Emotion (Obama more positive - Romney more negative)

The analysis of the emotional content of the debates revealed that the candidates' speeches were often emotionally charged, but the focus on positive or negative affect differed between the candidates. Both candidates used positive emotions equally often in the first debate, and they used negative emotions equally often in the second debate.

The emotional tone of the candidates' speech could have had an important impact on how the audience perceived them. In particular, Romney's heavier use of negative affect in the first debate could have made voters pay more attention to him and possibly offer more support.

In the first debate, Romney used significantly more negative emotions in his speech (54% more often than Obama) and in particular he expressed more words pertaining to sadness (169% more often than Obama). Conversely, in the second debate, Obama's speech was significantly more likely to contain positive emotions than Romney's (12% more).
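Emotion scoring of this kind is usually lexicon-based. The sketch below uses tiny illustrative positive/negative word lists (real lexicons such as LIWC's contain thousands of entries) and shows how a figure like "54% more often" is derived as a relative difference between two counts; all names here are assumptions for illustration.

```python
import re

# Tiny illustrative emotion lexicons; not the actual lists used in the analysis.
POSITIVE = {"hope", "great", "better", "succeed", "strong", "good"}
NEGATIVE = {"fail", "worse", "crisis", "sad", "loss", "unemployed"}

def emotion_balance(transcript: str) -> dict:
    """Count positive and negative emotion words in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return {
        "positive": sum(1 for w in words if w in POSITIVE),
        "negative": sum(1 for w in words if w in NEGATIVE),
    }

def pct_more(a: float, b: float) -> float:
    """How many percent more often measure a occurs than measure b."""
    return 100.0 * (a - b) / b if b else float("inf")

# Toy example: a speaker with 77 negative-emotion hits vs. an opponent's 50
# would be reported as using negative emotion 54% more often.
print(pct_more(77, 50))
print(emotion_balance("The crisis made things worse and many lost hope."))
```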

Complexity (Obama Ideas - Romney Details)

In both debates, Obama used cognitive language more often than Romney (10% and 13% more in the first and second debates, respectively). Cognitive language contains references to knowledge, awareness, thinking, etc. Obama was also more likely to use language pertaining to causation (75% and 30% more often in the first and second debates), and in the second debate he was also 47% more likely than Romney to express certainty in his speech. The latter may also partly reflect Obama's more confident tone during the second debate, in which his performance was deemed better than in the first.

In this same debate, Romney was 47% more likely than Obama to make references to insight and sources of knowledge. Relatedly, in both debates Romney's speech indicated a greater insistence on numbers/quantitative data and details (75% and 65% more often).

General Issues Focus (Obama Society & Family - Romney Healthcare & Jobs)

Even though the topics discussed during the debates were prompted and moderated, some patterns of heavier focus on certain issues by the two candidates emerged. Romney made significantly more references to health issues than Obama during the first debate (43% more). In the first debate, Romney was also more likely to mention occupational issues (26% more often) as well as achievement (36% more often). Obama, on the other hand, referred to social relationships and family significantly more often than Romney in both debates (social relationships: 9% and 6% more often; family: 104% and 138% more often). Both candidates referred to financial issues equally often in both debates, though this area was mentioned less often during the second debate.

Linguistic Summary (Key Differences by Speaker in Debates)

As mentioned earlier, whether the two candidates' specific use of language was intentional, part of a tactic, or a mere reflection of their character and demographic background is unclear without deeper analysis by a domain expert. Nevertheless, some of the above linguistic differences may certainly have contributed to a candidate winning over more audience support in one or both of the debates. The diagram above presents in visual form which parts of speech differed significantly between the two candidates. Those marked in bold highlight speech categories that a candidate used significantly more often during only one of the debates, hinting at a debate-specific language style. For instance, unique to the first debate was Obama's use of sophisticated language, whereas Romney relied more on negative emotions and sadness and focused more on health, occupational, and achievement issues. These speech categories were not used significantly more often by either candidate in the second debate. In the latter debate, Obama relied more on positive emotions and certainty in his language, whereas Romney used more direct language and references to insight.

Conclusion (Negative vs. Positive Emotion and Certainty Related to Specific Issues)

Debates are certainly a unique type of unstructured data. A debate follows a predetermined outline, is moderated, and we can assume both participants have invested time anticipating and practicing responses which their teams believe will have the maximum possible effect for their side. To what extent the types of speech used were intentional, or simply related to the different questions and political positions of the candidates, is hard to say without further research and analysis.

However, if I were on either candidate's political team, I think even this rather quick text analysis would be useful. As the general consensus is that Romney performed better in the first debate and Obama in the second, a strategic recommendation might be for Romney to counter Obama's sophistication on certain issues with negativity and to focus on areas Obama seems to want to dwell on less, such as health care and jobs. Conversely, I might counsel Obama to counter Romney's negative emotion with even greater positive emotion when possible, to encourage Romney to go into more detail, and to counter those details with the certainty present in his own speech from debate #2.

Further analysis would be needed to better understand exactly what impact the various speech patterns had in the debates. That said, it seems some tactics known to be successful in social and business situations were used during the debates. For instance, by using more 1st person plural pronouns (e.g., we, our, us), Obama may identify more closely with the entire nation and thus may have created a feeling of unity and of shared goals and beliefs with the public.

This simple tactic has been used by managers and orators for a long time. Sometimes the use of more individualistic language may lead to too much separation and a loss of potential support. However, we also need to acknowledge that different strategies succeed for candidates at different stages. For instance, Romney's negative emotions likely result from his critique of the current state of affairs and of Obama's actions. Negative emotion, used here in moderation, may well be an appropriate choice of language for someone aspiring to change things.

Conversely, Obama responding to and reflecting on his past four years in office using more positive affect is an obvious way of presenting his experience and work as president in a better light.

A very exciting line of further research could explore the candidates' facial expressions during the debates. They may match the findings from the text analysis (e.g., the amount of positive versus negative emotion) but may also reveal interesting discrepancies and tendencies of the candidates. This would be an interesting analysis because body language can be as important a source of information as spoken language, and it can be a very powerful tool in winning over support. This new avenue of research could be very helpful in understanding which candidate received more support and whether that support was influenced by political attitudes, language, body language, or a combination of the three.

Ideally, further analysis combining text analytics with other data from people meters, facial expressions, or other biometric measures could help answer some of these questions more definitively and provide insight into exactly how powerful language choice and style can be.

@TomHCAnderson @OdinText

PS. Special thanks to my colleague Dr. Gosia Skorek for indulging my idea and helping me run these data so quickly on a Saturday! ;)

[NOTE: There are several ways to text analyze this type of data. The power of text analytics depends on the volume and quality of data available, domain expertise, time invested and creativity of the analyst, as well as other methodological considerations on how the data can be processed using the software. Anderson Analytics - OdinText is not a political consultancy, and our focus is generally on much larger volumes of comments within the consumer insights and CRM domain. Those interested in more detail regarding the analysis may contact us at odintext.com]