Text Analysis of 2012 Presidential Debates
Obama more certain and positive - Romney more negative and direct
Lately there's been a craze in analyzing 140 character Tweets to make all sorts of inferences in regard to everything from brand affinity to political opinion. While I'm generally of the position that the best return on investment of text analytics is on large volumes of comments, I fear we often overlook other interesting data sources in favor of what a small percentage (about 8%) of the population says in tweets or blogs.
When the speakers are the current and possibly next president of the US, looking at what, if anything, can be gained by applying text analytics to even very small data sets starts to become more interesting.
Therefore ahead of the final presidential debate between Obama and Romney we uploaded the last two presidential debates into our text analytics software, OdinText, to see what if anything political pundits and strategists might find useful. OdinText read and coded the debates in well under a minute, and below are some brief top-line findings for those interested.
[Note, typically text analytics should not be used in isolation from human domain expert analysis. However, in the spirit of curiosity, and in hopes of providing a quick and unbiased analysis we're providing these findings informally ahead of tonight's debate.]
The Devil in the Detail
Comments from sources like a debate are heavily influenced by the questions the moderator asks. With free-flowing, unguided comments from many people, the primary benefit of text analytics is often to understand what is being discussed and by how many. The benefit of analyzing a carefully moderated discussion between just two people is more likely to lie in the detail. Therefore, rather than producing the typical charts quantifying which issues are discussed (which the moderator largely controls), text analytics on this kind of smaller data set focuses on exactly how things are said, as well as on what is left unsaid or avoided.
That's not the right answer for America. I'll restore the vitality that gets America working again (Governor Romney Debate #2)
In text analysis of the debates, the first findings often reveal frequency differences in specific terms and issues, such as the fact that Governor Romney is far more likely than President Obama to mention "America" when speaking (88 vs. 42 times across the first two debates). We make no assumptions in this analysis about whether this is a strategic consideration during the debates or a matter of personal style, or about whether it has a beneficial impact on the audience.
However, differences in the frequency and repetition of certain terms mentioned by a speaker, such as "Millions looking for work", likely do reflect how important the speaker believes these issues to be. How Obama and Romney refer to the audience, the moderator and US citizens is easy to quantify and may also play a role in how they are perceived. For instance, Romney prefers the term "people" (used 77 times in the second debate vs. Obama's 26 times), whereas Obama prefers the term "folks" (19 times vs. Romney's 2 times). Text analytics also quickly identified that, unlike in the first debate, Obama was twice as likely as Romney to mention the moderator "Candy" by name in the second debate.
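For readers who want to try this kind of term counting themselves, here is a minimal Python sketch. It is not how OdinText works; it assumes you have already split a transcript into one block of text per speaker, and the speaker labels and sentences below are made-up toy input, not debate data. It only handles single-word terms, so a phrase like "Millions looking for work" would need a different approach.

```python
import re
from collections import Counter


def term_counts(text, terms):
    """Count case-insensitive, whole-word occurrences of each term in text."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {term: counts[term.lower()] for term in terms}


# Toy input only (hypothetical speakers and sentences, not the real transcripts).
toy_transcript = {
    "speaker_a": "People want jobs. People in America want answers for America.",
    "speaker_b": "Folks are working hard, and folks deserve a fair shot.",
}

for speaker, text in toy_transcript.items():
    print(speaker, term_counts(text, ["america", "people", "folks"]))
```

Raw counts like these are only comparable when both speakers talk for roughly the same amount of time; otherwise a rate per 100 or 1,000 words is the safer comparison.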
Certain terms like "companies", "taxes" and "families" were favored more by Obama and avoided by Romney. Conversely, Romney was significantly more likely to use measuring terms, though many were rather indefinite, such as "number", "high" and "half" (e.g., "...unemployment at chronically high level..."). We did, however, also see an attempt by Romney to reference specific percentages. Obviously, text analytics cannot fact-check quantitative claims; this is where domain expertise by a human analyst comes into play.
From Specific Terms to General Linguistic Differences
Taking text analytics a step beyond the specifics to analyze emotion and linguistic measures of speech can also be interesting...
Volume and Complexity (Obama more complex - Romney more verbose)
In both debates, Romney spoke approx. 500 more words than Obama (7% and 6% more words, respectively); this greater talkativeness sometimes reflects a more competitive/aggressive behavior. Obama on the other hand used more sophisticated language than Romney in the first debate (7% more words with 6 or more letters, see chart presenting percentage differences in the use of certain types of language by the two candidates; comparisons were done separately for the first and second debate). However, he reduced the use of such language in his speech during the second debate.
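The volume and complexity measures above are simple enough to sketch in a few lines of Python. This is an illustration under my own assumptions, not OdinText's actual method: words are runs of letters and apostrophes, "sophisticated" means six or more letters, and "X% more" is read as the difference relative to the other speaker's value. The function names and the toy sentences are hypothetical.

```python
import re


def words(text):
    """Split text into words (runs of letters and apostrophes)."""
    return re.findall(r"[A-Za-z']+", text)


def long_word_share(text, min_letters=6):
    """Fraction of words that have at least min_letters letters."""
    ws = words(text)
    if not ws:
        return 0.0
    # Apostrophes are not letters, so drop them before measuring length.
    long_ws = [w for w in ws if len(w.replace("'", "")) >= min_letters]
    return len(long_ws) / len(ws)


def pct_more(a, b):
    """How much larger a is than b, expressed as a percentage of b."""
    return (a - b) / b * 100.0


# Toy input only (made-up sentences, not the real transcripts).
speaker_a = "We need practical solutions that strengthen communities everywhere."
speaker_b = "We need jobs and we need them now, so let us get to work today."

print("word counts:", len(words(speaker_a)), len(words(speaker_b)))
print("long-word share:", long_word_share(speaker_a), long_word_share(speaker_b))
print("B spoke % more words than A:", pct_more(len(words(speaker_b)), len(words(speaker_a))))
```

Note that word length is only a rough proxy for sophistication; it is cheap to compute, which is why it is common in this kind of quick analysis.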
Past, Present and Future Tense (Obama explains past - Romney focuses on future)
Both candidates were equally likely to speak in the present tense. However, Obama was significantly more likely than Romney to speak in the past tense in both debates (55% and 18% more often, respectively). Romney, on the other hand, was more likely to speak in the future tense in both debates (60% and 34% more often). This contrast between past and future orientation is of course partly explained by their differing status, that is, Obama's prior presidential experience and Romney's aspiration to be elected to the office in the future.
Personal Pronouns (Obama Collectivist - Romney Direct)
Both candidates expressed an individualistic tone equally often in their speaking (i.e., in the frequency of 1st person singular pronouns, e.g., I, me, mine). Obama, however, was more likely to use a collectivist tone in both debates (42% and 60% more, respectively). This use of 1st person plural pronouns (e.g., we, us, our) often suggests a stronger identification with a group, team, or nation. In part this may coincide with Obama's slogan from his first election ("Yes, we can."), which may reflect collectivist rather than individualist values.
In the second debate, Romney used direct language more often than Obama, addressing the president and/or the moderator. Romney was 57% more likely than Obama to use 2nd person personal pronouns (e.g., you, your), for instance in phrases like "Let me give you some advice. Look at your pension. You also have investments in Chinese companies (...)" or "Thank you Kerry for your question." Obama, on the other hand, reduced the use of such language from the first to the second debate (using 38% more direct language in the first debate as compared to the second).
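The pronoun comparisons above come down to counting words from a small category list and normalizing by how much each speaker said. Here is a rough Python sketch of that idea. The category names and word lists are my own illustrative assumptions (real dictionaries used in this kind of analysis are larger), and this is not OdinText's implementation.

```python
import re

# Illustrative, deliberately small word lists (an assumption, not an official lexicon).
PRONOUNS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second_person": {"you", "your", "yours", "yourself", "yourselves"},
}


def pronoun_rates(text):
    """Return each pronoun category's rate per 100 words of text."""
    # Tokens are lowercase words, optionally with one apostrophe part ("i'll").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(tokens)
    rates = {}
    for name, vocab in PRONOUNS.items():
        # Match on the part before an apostrophe so "I'll" counts as "i"
        # and "we're" counts as "we".
        hits = sum(1 for t in tokens if t.split("'")[0] in vocab)
        rates[name] = 100.0 * hits / total if total else 0.0
    return rates


# A fragment quoted in this post, used only to show the mechanics.
print(pronoun_rates("Let me give you some advice. Look at your pension."))
```

Comparing two speakers is then a matter of taking the relative difference between their rates for a category, which is how a statement like "57% more likely" could be produced.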
Emotion (Obama more positive - Romney more negative)
The analysis of the emotional content of the debates revealed that the candidates' speech was often emotionally charged, but the focus on positive or negative affect differed between the candidates. Both candidates used positive emotions equally often in the first debate, and they used negative emotions equally often in the second debate.
Emotional tone of candidates' speech could have had an important impact on their perception by the audience. Especially, heavier use of negative affect by Romney in the first debate could have made the voters pay more attention to him and possibly offer more support.
In the first debate, Romney used significantly more negative emotions in his speech (54% more often than Obama) and in particular he expressed more words pertaining to sadness (169% more often than Obama). Conversely, in the second debate, Obama's speech was significantly more likely to contain positive emotions than Romney's (12% more).
Complexity (Obama Ideas - Romney Details)
In both debates, Obama used cognitive language more often than Romney (10% and 13% more in the first and second debates, respectively). Cognitive language contains references to knowledge, awareness, thinking, etc. Obama was also more likely to use language pertaining to causation (75% and 30% more often in the first and second debates) and in the second debate he was also 47% more likely than Romney to express certainty in his speech. The latter may also be partly reflective of a more confident tone of Obama during the second debate in which his performance has been deemed better than the first.
In this same debate, Romney was 47% more likely than Obama to make references to insight and sources of knowledge. Related to this, in both debates Romney's speech indicated a greater insistence on numbers/quantitative data and details (75% and 65% more often).
General Issues Focus (Obama Society & Family - Romney Healthcare & Jobs)
Even though the topics discussed during the debates were prompted and moderated, some patterns of heavier focus on certain issues by the two candidates emerged. Romney made significantly more references to health issues than Obama did during the first debate (43% more). In the first debate, Romney was also more likely to mention occupational issues (26% more often) as well as achievement (36%). Obama, on the other hand, referred to social relationships and family significantly more often than Romney in both debates (social relationships - 9% and 6% more often; family - 104% and 138% more often). Both candidates referred to financial issues equally often in both debates, though this area was mentioned less often during the second debate.
Linguistic Summary (Key Differences by Speaker in Debates)
As mentioned earlier, whether specific use of language by the two candidates was intentional or not, whether it was part of the candidate's tactic, or a mere reflection of the character and demographic background is unclear without deeper analysis by a domain expert. Nevertheless, some of the above linguistic differences may certainly have contributed to a candidate winning over more audience support in one or both of the debates. The diagram above presents in a visual form which parts of speech differed significantly between the two candidates. Those marked in bold highlight speech categories that were used by a candidate significantly more often during only one of the debates, hinting at debate-specific language style. For instance, unique during the first debate was Obama's use of sophisticated language, where Romney relied more on negative emotions, sadness and focused more on health, occupational, and achievement issues. These speech categories were not used significantly more often by either candidate in the second debate. In the latter debate, Obama relied more on the use of positive emotions and certainty in his language, whereas Romney used more direct language and references to insight.
Conclusion (Negative VS Positive Emotion and Certainty Related to Specific Issues)
Debates are certainly a unique type of unstructured data. The debate follows a predetermined outline, is moderated, and we can assume both participants have invested time anticipating and practicing responses which their teams believe will have the maximum possible effect for their side. To what extent the types of speech used were intentional or simply related to the different questions and political positions of the candidates is hard to say without further research and analysis.
However, if I were on either candidate's political team, I think even this rather quick text analysis would be useful. As the general consensus is that Romney performed better in the first debate and Obama in the second one, a strategic recommendation may be for Romney to counter Obama's sophistication on certain issues with negativity and to focus on areas where Obama seems to want to focus less, such as health care and jobs. Conversely, I might counsel Obama to counter Romney's negative emotion with even greater positive emotion when possible, and to continue to encourage Romney to go into more detail, countering these details with the certainty present in his speech from debate #2.
Further analysis would be needed to better understand exactly what impact the various speech patterns had in the debate. That said, it seems some tactics known to be successful in social and business situations have been used during the debates. For instance, Obama by using more 1st person plural pronouns (e.g., we, our, us) may be identifying better with the entire nation and thus may have created a feeling of unity, shared goals and beliefs with the public.
This simple tactic has been used by managers and orators for a long time. Sometimes the use of more individualistic language may lead to too much separation and loss of potential support. However, we also need to acknowledge that different strategies are successful for candidates at different stages. For instance, the negative emotions likely result from Romney's critique of the current state of affairs and Obama's actions. Negative emotion here, and in moderation, may well be an appropriate choice of language for someone aspiring to change things.
Conversely, Obama responding and reflecting on his past 4 years in office using more positive affect is an obvious way of presenting his experience and work as president in a better light.
A very exciting line of further research could explore the candidates' facial expressions during the debates. They may map onto findings from the text analysis (e.g., the amount of positive versus negative emotion) but may also reveal interesting discrepancies and tendencies of the candidates. It would be an interesting analysis because body language can be as important a source of information as spoken language, and it can be a very powerful tool in winning over support. This new avenue of research could be very helpful in understanding which candidate received more support and whether that support was influenced only by the political attitudes, language, or body language of the candidates, or by a combination of the three.
Ideally further analysis combining text analytics with other data from people meters, facial expressions, or other biometric measures could help answer some of these questions more definitively and provide insight into exactly how powerful language choice and style can be.
PS. Special thanks to my colleague Dr. Gosia Skorek for indulging my idea and helping me run these data so quickly on a Saturday! ;)
[NOTE: There are several ways to text analyze this type of data. The power of text analytics depends on the volume and quality of data available, domain expertise, time invested and creativity of the analyst, as well as other methodological considerations on how the data can be processed using the software. Anderson Analytics - OdinText is not a political consultancy, and our focus is generally on much larger volumes of comments within the consumer insights and CRM domain. Those interested in more detail regarding the analysis may contact us at odintext.com]