Text Analysis Predicts Your Politics Without Asking

Tom H. C. Anderson
August 2nd, 2016

How What You Say Says Way More Than What You Said

Pretend for a moment that you had a pen pal overseas and they asked you to describe yourself. What would you tell them? What makes you “you”?

It turns out that which traits, characteristics and aspects of your identity you choose to focus on may say more than you realize.

For instance, they can be used to predict whether you are a Democrat or a Republican.

With the U.S. presidential race underway in earnest, I thought it would be interesting to explore what patterns, if any, in the way people describe themselves could be used to identify their political affiliation.

So we posed the question above verbatim to a nationally representative sample of just over n=1000 (sourced via CriticalMix) and ran the responses through OdinText.

Not surprisingly, responses to this open-ended question were as varied as the people who provided them, but OdinText was nevertheless able to identify several striking and statistically significant differences between the way Republicans and Democrats described themselves.
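The kind of significance testing involved can be illustrated with a quick sketch. OdinText's internal method isn't described in the post, so the counts below are hypothetical; the sketch is a plain 2x2 chi-square test of independence on a single trait's mention rates across the two groups:

```python
# 2x2 chi-square test of independence for a trait's mention rate
# across two groups (hypothetical counts, for illustration only).

def chi_square_2x2(a, b, c, d):
    """a/b: group 1 mentions / non-mentions; c/d: group 2 mentions / non-mentions."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Say 60 of 400 Republicans mention a trait vs. 25 of 400 Democrats.
stat = chi_square_2x2(60, 340, 25, 375)
# Compare against the chi-square critical value 3.84 (p = .05, df = 1);
# anything above it would count as a statistically significant difference.
print(stat)
```

With these invented counts the statistic lands well above 3.84, so a difference of this size on n=800 would comfortably clear the usual significance bar.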

NOT About Demographics

Let me emphasize that this exercise had nothing to do with demographics. We’re all aware of the statistical demographic differences between Republicans and Democrats.

For our purposes, any specific demographic information people shared in describing themselves was only pertinent to the extent that it formed part of a broader response pattern that could predict political affiliation.

For example, we found that Republicans were significantly more likely than Democrats to say they have blonde hair.

Of course, this does not necessarily mean that someone with blonde hair is significantly more likely to be a Republican; rather, it simply means that if you have blonde hair, you are significantly more likely to feel it noteworthy to mention it when describing yourself if you are a Republican than if you are a Democrat.
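The distinction between the two directions of conditioning can be made concrete with hypothetical numbers (none of these counts come from the survey):

```python
# Direction of conditioning matters (hypothetical counts, illustration only).
republicans_mentioning_blonde = 30   # Republicans who *mention* blonde hair
republicans_total = 400
democrats_mentioning_blonde = 10
democrats_total = 400

# P(mentions blonde | party): Republicans mention it three times as often...
p_mention_given_rep = republicans_mentioning_blonde / republicans_total  # 0.075
p_mention_given_dem = democrats_mentioning_blonde / democrats_total      # 0.025

# ...but P(Republican | mentions blonde) is a different quantity, and says
# nothing about blonde-haired people in general -- only about the subset
# who chose to bring it up.
p_rep_given_mention = republicans_mentioning_blonde / (
    republicans_mentioning_blonde + democrats_mentioning_blonde)         # 0.75

print(p_mention_given_rep, p_mention_given_dem, p_rep_given_mention)
```

The survey measures the first pair of quantities (who volunteers the trait), not the base rate of the trait itself in either party.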

Predicting Politics with Text Analytics

Self-Image: Significant Differences

OdinText’s analysis turned up several predictors for party affiliation; here are 15 examples, indexed below.

  • Republicans were far more likely to include their marital status, religion, ethnicity and education level in describing themselves, and to mention that they are charitable/generous.
  • Democrats, on the other hand, were significantly more likely to describe themselves in terms of friendships, work ethic and the quality of their smile.

Interestingly, we turned up quite a few more predictors for Republicans than Democrats, suggesting that the former may be more homogeneous in terms of which aspects of their identities matter most. This translates to a somewhat higher level of confidence in predicting affinity with the Republican Party.

As an example, if you describe yourself as both “Christian” and “married,” without knowing anything else about you I can assume with 90% accuracy that you vote Republican.

Again, this does not mean that Christians who are married are more than 90% likely to be Republicans, but it does mean that people who mention these two things when asked to tell a stranger about themselves are extremely likely to be Republicans.
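As a sketch of how a rule like this would be scored, here is the precision of a two-keyword rule measured against self-reported affiliation. The toy responses and the substring-matching logic are illustrative assumptions, not OdinText's method:

```python
# Precision of a simple two-keyword rule against self-reported party
# (toy labeled responses; illustration only, not the survey data).
responses = [
    ("I'm a married Christian father of three", "Republican"),
    ("Married, Christian, love my community", "Republican"),
    ("Christian and married, small business owner", "Democrat"),
    ("Hard worker with great friends", "Democrat"),
]

def rule_fires(text):
    """Does the self-description mention both keywords?"""
    t = text.lower()
    return "christian" in t and "married" in t

# Among respondents the rule matches, what share are actually Republicans?
matched = [party for text, party in responses if rule_fires(text)]
precision = matched.count("Republican") / len(matched)
print(precision)
```

A figure like the 90% above is exactly this kind of precision number, computed on responses where the ground-truth affiliation was collected separately.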

So What?

While this exercise was exploratory and the results should not be taken as conclusive, it demonstrates that text analytics makes it entirely possible to read between the lines and determine far more about you than one would think possible.

Obviously, there is a simpler, more direct way to determine a respondent’s political affiliation: just ask them. We did. That’s how we were able to run this analysis. But it’s hardly the point.

The point is we don’t necessarily have to ask.

In fact, we’ve already built predictive models around social media profiles and Twitter feeds that eliminate the need to pose questions—demographic, or more importantly, psychographic.
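To make the idea of such a predictive model concrete, here is a minimal bag-of-words Naive Bayes classifier on toy self-descriptions. This is a generic sketch of one common text-classification technique, not OdinText's actual model, and the training data is invented:

```python
# Minimal bag-of-words Naive Bayes sketch (toy data; hand-rolled classifier).
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(docs):
    """docs: list of (text, label). Returns per-label word counts and label counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    vocab = {w for counts in word_counts.values() for w in counts}
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        total = sum(word_counts[label].values())
        # log prior + Laplace-smoothed log likelihood of each token
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for w in tokenize(text):
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train_docs = [
    ("married christian father", "Republican"),
    ("christian wife and mother", "Republican"),
    ("great friends strong work ethic", "Democrat"),
    ("friends say i have a nice smile", "Democrat"),
]
wc, lc = train(train_docs)
print(predict("christian and married", wc, lc))    # Republican on this toy data
print(predict("my friends and my smile", wc, lc))  # Democrat on this toy data
```

In practice one would train on a labeled sample and report accuracy on a held-out test sample; Laplace smoothing (the +1) keeps unseen words from zeroing out a class.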

Could a political campaign put this capability to work segmenting likely voters and targeting messages? Absolutely.

But the application obviously extends well beyond politics. With an exponentially increasing flood of customer experience feedback, CRM data and consumer-generated text online, marketers could predictively model all manner of behavior with important business implications.

One final thought relating to politics: What about Donald Trump, whose supporters, it has been widely noted, do not all fit neatly into the conventional Republican profile? It would be pretty easy to build a predictive model for them, too! And that could be useful given the widespread reports that a significant number of people who plan to vote for him are reluctant to say so.

12 thoughts on “Text Analysis Predicts Your Politics Without Asking”

  1. Indeed. While asking people to describe themselves on a strictly O/E text basis may have limited applications, I think you are suggesting that other text and unstructured data could be combined and analyzed that would achieve the same ends.

    If I said that knowing whether a vehicle was bought new or used, the miles or age of the vehicle when bought, the make, model, model year, and body style (2dr, 4dr, sedan, SUV, etc.) would be predictors of political party, would you believe me?

  2. Yes, exactly, Terry. We could use social media profiles, or better yet someone's entire Twitter feed, or I could use an email from you (or all the emails you've ever sent me), our text or SMS chat, or a transcript of a telephone conversation we're having in real time to predict all kinds of things about you.
    No I’m not at all surprised the data points you mention would be predictive of party. See our previous post on Brands and Politics here: http://odintext.com/blog/brand-analytics-tips-branding-and-politics/

  3. Interesting stuff, Tom. I suspect the distribution is not binary (all Reps vs. all Dems); there are probably sub-clusters within the two populations with substantially different characteristics, and interaction variables that impact these characteristics. For instance, I wonder what education level and geographic location splits do to the above chart.

  4. The chart (showing some of the predictors in the initial model) above is based on just those who identified themselves as either Republicans or Democrats (we removed any others). In terms of demographics, the sample was pulled to be US Rep in terms of GEO etc.

  5. Makes sense. There's convergent validity with all the stuff that's been written about this topic (see, for example, the Vox article on how whether you are a strong believer in "there's a right way to do things" predicts Trump support far better than demographics, and the WaPo follow-up). I'm wondering if you developed the self-description "traits" on a learning sample and ran predictions against a test sample, and how well it predicted (though that's hard to do with n=1000). What I'm worried about is whether "How would you describe yourself?" or other such questions will predict heavy usage of a given brand. What is the predictive validity? Beating demographics is easy. It's how well we predict that's the hard part.

  6. Getting a more robust data set now (and a social one). May share some more on this tomorrow if there is interest. On the very first go around with initial set of features, the model was already 62% accurate overall, but we also have people in there that classified themselves as Tea Party, Green, Independent and Other. Anyway, as I mentioned in the post, if you say something like “I’m White, I’ve been married for 10 years, I’m a good Husband..” right about there I’ve got about a 97% accuracy in guessing correctly that you are a Republican.

    RE branding etc. sure. However, we are not limited to the data in a profile. If on Twitter for instance, we could look back in time on your last 50, 100 or even more tweets to get more data (something more contextually relevant perhaps) and increase accuracy… It can also be used on emails, phone calls etc.

  7. This sounds interesting. However, I would like to ask what other functions this text analytics tool can perform. For instance, can it be used to analyse transcripts from FGDs or KIIs for a U&A study?

  8. @Ade Certainly it is great for all manner of survey research. We also have several other use cases from analysis of transcribed telephone calls, emails, discussion/rating and other social data etc. We do not recommend it for use on small qualitative projects (n<100) as there are too few data points typically, but there can be usefulness even there.

  9. @Chandra Getting a more robust data set now (and a social one). May share some more on this in the coming days. On the very first go around the model was already 62% accurate overall (vs the 50% random), but we also have people in there that classified themselves as Tea Party, Green, Independent and Other. Anyway, as I mentioned in the post, if you say something like “I’m White, I’ve been married for 10 years, I’m a good Husband..” right about there I’ve got about a 97% accuracy in guessing correctly that you are a Republican.

    RE branding etc. sure. However, we are not limited to the data in a profile. If on Twitter for instance, we could look back in time on your last 50, 100 or even more tweets to get more data (something more contextually relevant perhaps) and increase accuracy…

