Posts tagged Machine Translation
Analitica de Texto En Español

Analitica de Texto En Español – Spanish Text Analysis

“Analitica de Texto En Español”: I didn’t write that; it is a machine translation of “Text Analytics in Spanish.”

Mathematics has often been called the Universal Language, but in an age of instant machine translation, any text, or text data, is as understandable as math.

That’s one of the reasons I was very happy to take part in a special series of interviews in celebration of the Spanish Association of Market Research’s 50th Anniversary.

Several of our clients are analyzing non-English text with OdinText, but in some ways a single monolingual analyst being able to instantly analyze the comments of millions of customers speaking multiple foreign languages is even more exciting. And this isn’t science fiction; many of our global clients have been doing this for some time now.

The current issue of AEDMO’s Magazine (Asociación Española de Estudios de Mercado, Marketing y Opinión) celebrates technology in the world of research, and several prominent researchers have been invited to write on their core areas of expertise. I was honored to give an interview on text analytics.

If you don’t get their magazine you can read our Q&A on their blog here in Spanish or English.

Their Editor Xavier Moraño asked some very interesting and pertinent questions.

I’d love to hear your thoughts and questions.

Tom H. C. Anderson Chief Research Officer @OdinText

How Fear of Frexit Helped Macron Win the French Presidential Election
NEW Text Analytics Poll™ Shows a Trump-Style Le Pen Upset May Have Been Averted by Overwhelming Opposition to a Frexit

Last week on this blog, I reported findings from a Text Analytics Poll™ of 3,000 French citizens showing that Marine Le Pen’s positioning going into the runoff looked remarkably similar to that of another recent underdog candidate, Donald Trump, just days before his stunning U.S. election upset.

Indeed, a similar set of circumstances appeared to be in play, as noted by the New York Times in an article on Election Day: “Populist anger at the political establishment; economic insecurity among middle class voters; public alienation toward mainstream political parties; rising resentment toward immigrants.”

Yet on Sunday, the French people elected Emmanuel Macron president over Le Pen by about 66/34. So why wasn’t the race closer?

The answer may be in data we collected from French and British respondents, which shows that the prospect of a Le Pen “Frexit” probably figured highly in Macron’s victory.

Positioning: Voting Against a Candidate

Our data in the French presidential poll were eerily reminiscent of data we collected prior to the U.S. election, which suggested a victory may not so much amount to an endorsement of one candidate as a rejection of the other.

Our analysis showed that first and foremost, the French associated Le Pen with bigotry and hatemongering, but text analysis also showed that among the French she was strongly positioned around immigration reform and putting France first—a platform that worked effectively for Trump, who had also been labeled a bigot in the minds of many Americans. In fact, the perception of Trump as a bigot was only slightly lower among Americans than the perception of Le Pen as a bigot among the French (11% vs 15%, respectively).

In contrast, respondents most frequently associated Macron with “liberalism” (meaning economic liberalism favoring free markets), followed by capitalism, neither of which is necessarily an asset in terms of positioning in French politics, particularly for a wealthy investment banker at a time when job security is a major concern among middle-class voters.

But the main platform issue that people associated with Macron—which trailed just behind people’s view of him as a proponent of free markets/capitalism—was Europe/EU, in stark contrast to Le Pen, who was well known to strongly favor an EU “Frexit.” The EU is also synonymous with the free movement of commerce and people, which, of course, stands in contrast to the dual protectionist/anti-immigration platform championed by Le Pen.

This, naturally, raised the question: How important is EU membership to the French population?

If the mood of the French electorate were anything like that of British Brexit voters, then favoring EU membership could be a liability. So just days ahead of the election we ran a second Text Analytics Poll, once again a single question, only this time we polled 3,000 voters each in France and the UK:

  1. “What does the European Union mean to you?” (or “Qu'est ce que l'Union Européenne représente pour vous?” in French).

EU Membership Means “Hope”

It’s worth noting that turnout for this election was reportedly the lowest in 36 years. Those who stayed home were presumably voters who never would’ve cast a ballot for Le Pen, but who also could not be mobilized for Macron. In short, they were Macron’s to lose.

This new poll data helps explain why, in spite of inspiring lackluster confidence and support from anti-Le Pen voters, Macron nonetheless won the election by a sizable margin.


While a significant number of the French tell us the EU means nothing to them, that share is significantly lower than the share of Brits who say so.

Conversely, the French are more than five times as likely as Brits to say the EU means “Everything/A Lot” to them. The French are also far less likely than their UK counterparts to criticize the EU for corruption, wastefulness and such.

Instead, the French are extremely optimistic about the EU, with many indicating it provides “future hope” and keeps them out of wars and at “Peace” —something Brits are more likely to attribute to NATO.

High Positive Emotions for EU

Ultimately, emotions are what really drive behavior, and in the end, the French electorate’s highly positive emotional disposition toward the EU—notably their “Anticipation” and hopefulness—may have countered Macron’s relatively weak positioning in this election.


Closing Thoughts

I read some responses to our original analysis that I’d characterize as emotionally overwrought. I understand that this is an occupational hazard for anyone conducting political opinion research, but our duty is to present and report objectively what the data tells us—even if what we’re seeing in the data isn’t necessarily pleasant.

The job of these polls was to assess the candidates’ brand positioning in the minds of voters, and to review the potential opportunities and threats in the “marketplace” as we would for any brand.

I want to stress that I am not discounting people’s distaste for Marine Le Pen’s perceived bigotry as being a key factor behind her loss in this election, but I’ll emphasize again that it was only slightly higher (15% vs 11%) than what we saw for Donald Trump, who, as you know, is now the President of the United States.

And at the end of the day, the hard truth is that more than a third of those who voted in this election voted for a right-wing nationalist—a candidate whose background makes Donald Trump look like a civil rights activist by comparison. Moreover, 25% of the electorate were not sufficiently affronted by Madame Le Pen’s politics to at least vote against her by voting for Macron; instead, they just abstained.

Like many people, I am relieved by the outcome of this election, but it seems clear from the positioning of both candidates—as reported by French citizens, unaided, in their own words—and the data on EU membership from our second poll that the French people did not simply reject Marine Le Pen because she is positioned as a racist/hatemonger; she was on the wrong side of Frexit.


*Note: n=3,000 responses were collected via Google Surveys 3/3-5/5 2017. Google Surveys allow researchers to reach a validated French General Population Representative sample by intercepting people attempting to access high-quality online content or who have downloaded the Google Opinion Rewards mobile app. Results are accurate to within +/- 2.51% at the 95% confidence level.

Text Analytics Tips

About Tom H. C. Anderson

Tom H. C. Anderson is the founder and managing partner of OdinText, a venture-backed firm based in Stamford, CT whose eponymous, patented SaaS platform is used by Fortune 500 companies like Disney, Coca-Cola and Shell Oil to mine insights from complex, unstructured and mixed data. A recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research, Anderson is the recipient of numerous awards for innovation from industry associations such as CASRO, ESOMAR and the ARF. He was named one of the “Four under 40” market research leaders by the American Marketing Association in 2010. He tweets under the handle @tomhcanderson.

Text Analytics Reveals Potential French Election Upset

Text Analytics Poll Shows Le Pen Positioned to “Trump” Macron

To Americans following the French Presidential Election taking place in less than a week, it might appear as though recent history is repeating itself. And in many ways, it is.

Late last week we ran a Text Analytics Poll™ in France, and the results of our analysis bear a striking resemblance to those of an identical poll we ran in the US just a couple of days prior to the November 8 presidential election.

You may recall that in November, just a day before the US Presidential Election, we posted on this blog results from a Text Analytics Poll™ indicating that Hillary Clinton had a major positioning problem that could cost her the election, in contrast to conventional pollsters’ predictions that had almost universally and, it turns out, incorrectly forecast her winning by a sizable Electoral College margin.

Well, as was the case in our US poll, actual comment data from French respondents in their own words indicates a much, much closer race between Emmanuel Macron and Marine Le Pen than the 60/40 split pollsters have thus far predicted.

In fact, as of Sunday night when we closed this poll—exactly one week before the May 7 runoff election—Marine Le Pen looked a lot like Donald Trump.

About this Text Analytics Poll™

For this French election Text Analytics Poll™, we replicated our November US election poll, taking a French general population sample of 3,000, splitting it in half randomly, and asking each half the same single question, substituting only the candidate’s name:

“Without looking, off the top of your mind, what issues does [insert candidate name] stand for?”
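The split-sample design described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `split_sample`, the fixed seed, and the use of integer respondent IDs are hypothetical and are not taken from the actual fielding setup.

```python
import random

QUESTION = ("Without looking, off the top of your mind, "
            "what issues does {name} stand for?")

def split_sample(respondent_ids, candidates, seed=0):
    """Randomly split respondents into two halves, one per candidate.

    Hypothetical helper: shuffles the IDs, then assigns the first half
    to the first candidate and the remainder to the second.
    """
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return {candidates[0]: ids[:half], candidates[1]: ids[half:]}

groups = split_sample(range(3000), ["Emmanuel Macron", "Marine Le Pen"])
for name, ids in groups.items():
    # Each half sees the same single question with only the name swapped.
    print(len(ids), QUESTION.format(name=name))
```

Because assignment is random, differences between the two halves' answers can be attributed to the candidate named rather than to who happened to be asked.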

We then machine-translated the responses and analyzed them using the patented OdinText software platform, which identified and quantified potentially important themes/ideas/topics in people’s comments and also qualified and quantified the emotions expressed in those comments.
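To make the idea of quantifying themes concrete, here is a deliberately simple sketch that counts what share of (already translated) comments mention each theme. It is not how OdinText works; the keyword lexicon `THEMES`, the function `theme_prevalence`, and the sample comments are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical theme lexicon; a real analysis would be far richer.
THEMES = {
    "immigration": {"immigration", "immigrants", "borders"},
    "europe/eu": {"europe", "eu", "european"},
    "racism/hate": {"racism", "racist", "hate", "xenophobia"},
}

def theme_prevalence(comments, themes):
    """Return the percentage of comments that mention each theme."""
    counts = {theme: 0 for theme in themes}
    for comment in comments:
        words = set(re.findall(r"\w+", comment.lower()))
        for theme, keywords in themes.items():
            if words & keywords:  # at least one keyword appears
                counts[theme] += 1
    total = len(comments)
    if total == 0:
        return {theme: 0.0 for theme in themes}
    return {theme: 100.0 * n / total for theme, n in counts.items()}

# Made-up comments, not actual poll responses.
sample = [
    "Immigration and closing the borders",
    "Europe, the EU",
    "Racism",
    "No idea",
]
print(theme_prevalence(sample, THEMES))
```

A comment can count toward more than one theme, so the percentages need not sum to 100.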

We use this approach because we’ve found time and again that conventional quantitative survey questions—the sort used in political polls—are usually not terrific predictors of actual behavior.

We know that consumers (and, yes, voters) are generally not rational decision-makers; we rely on emotions and heuristics to make most of our decisions. Ergo, if I really want to understand what will drive actual behavior, the surest way to find out is by allowing you to tell me unaided, in your own words, off the top of your head. Oftentimes, we can accomplish this with one, well-designed question!

French Election Outsider vs. Reformer

Much as we saw in the US race, the French electorate appears to be in a decidedly anti-establishment mood. So it’s no surprise under the circumstances that both of the final contenders in the French presidential runoff could accurately be described as “outsiders,” but what voters may really be after is a reformer.

Bernie Sanders and Donald Trump were both considered outsiders and reformers, although unlike Trump, who successfully hijacked the Republican nomination, Sanders failed to pull off a similar grassroots coup in the Democratic primary. As a result, US voters were faced with a choice between a reformer/outsider and an establishment candidate.

Le Pen has been a member of the European Parliament for more than a decade and she held elected office at the regional level before that. She’s also the scion of a famous political family and, more importantly, the former president of a prominent, albeit right wing, political party, the National Front (FN). Le Pen’s relative “outsider” status stems from the fact that the FN has historically promoted a nationalist agenda and was until recently viewed as outside of the political mainstream (and outside the two major coalitions that have alternated in control of the French government for the last 30-plus years).

Emmanuel Macron, too, is a relative outsider. He’s a former Minister of the Economy and founded the “En Marche” (“Forward!”) political movement in 2016, but he has never held elected office and, as of our poll, remains something of a mystery to potential French voters save for the fact that it’s well known that he made a fortune in investment banking.

Whatever you think of her politics, Le Pen clearly qualifies as a reformer, whereas Macron, while an outsider, appears to have a positioning problem around reform. Let’s take a closer look…

It’s All About Brand Positioning… Again

Whether you’re a corporation or a candidate for office, properly positioning your brand in the mind of your target is arguably the single most important part of the marketing process.

As I noted, our US poll back in November strongly suggested that Hillary Clinton was in more trouble than any of the other polling data to that point had indicated, and the problem was one of positioning relative to the competition.


- The #1 most popular response for Hillary Clinton involved the perception of dishonesty/corruption.

- The #1 and #2 most popular responses for Donald Trump related to platform (immigration, followed by pro-USA/America First), followed thirdly by perceived racism/hatemongering.

Again, I’ll emphasize that these responses were not selected from a list of possible choices, but top-of-mind and unaided from voters in their own words.

What the comment data revealed was that Donald Trump’s campaign messaging was tightly focused on two issues (immigration and protectionism) and had been effective in galvanizing voters to whom these positions appealed; Hillary Clinton’s messaging was relatively scattered across a variety of issues, and therefore diluted, which made it difficult for voters to identify her with a key issue they could rally around.

And while, for both candidates, an alarmingly high proportion of responses to our question were emotionally charged character attacks, the negative emotional disposition toward Hillary Clinton was actually higher than for Donald Trump. In other words, the dislike among people who disliked Hillary Clinton outweighed the dislike among people who disliked Donald Trump. This probably had little to do with Trump campaign messaging (although they certainly capitalized on it) and was more a reflection of the fact that Hillary Clinton had been highly visible and active in national politics for decades and was already positioned in the minds of voters.

How does this relate to what we see in the French Election data?

The chart below depicts responses from the French to our single question after being analyzed by OdinText and sorted by prevalence of topics/themes (coded red for Macron and blue for Le Pen).


First, it’s important to note that there are inherently fewer issues with which politicians can differentiate themselves in French politics than there are in US politics. For example, issues like abortion, education, healthcare, gun ownership, etc., are not as hotly contested in France as they are in the States.

In France—like most European countries in the post-Brexit era—political debate centers primarily around economics internally and in relation to other countries (i.e. the EU), security, and, importantly, immigration.

Here, Le Pen’s positioning is unmistakable, as she was frequently associated with immigration, which works in her favor among those who view immigration as a problem. The issue is tied to security, as well, and given the 2015 Paris attacks, the heightened fear about terrorism coupled with domestic economic concerns could lead voters who might have been historically more sympathetic to pro-immigration platforms to actually vote for Le Pen.

That said, like Hillary Clinton, Marine Le Pen is well known to the French, and already positioned in their eyes. Although she has taken steps to soften the perception, respondents to our poll most frequently said she stands for racism/hate/xenophobia, which does not bode well for her candidacy in socially liberal France.

Macron, by contrast, remains a relative enigma to the French people. Almost twice as many French people said they aren’t sure what Macron stands for compared to Le Pen. In fact, Macron is not tied to any standout platform or issue of importance to the French, whereas Le Pen is positioned as a reformer on immigration to an electorate that, again, is not enamored with the status quo.

Moreover, respondents most frequently associated Macron with “liberalism,” followed by capitalism, which are nearly the same. Indeed, I put liberalism in quotes here to make a very important distinction that might have otherwise been lost on Americans who are not familiar with French politics: Liberalism in France actually refers to economic liberalism favoring free markets—almost the opposite of how the term is used in US politics!

Neither liberalism nor capitalism is necessarily an asset in terms of positioning in French politics, particularly for a wealthy investment banker at a time when job security is a major concern among voters. Macron has campaigned as a centrist, stating emphatically that ideologically he is neither left nor right, but our data suggests that he is positioned in the minds of the French as something of a neo-conservative and perhaps an elite. Indeed, the Le Pen campaign has been feeding this positioning and tying it to fears about globalization undermining the economic security of the French people.

We do see in the data that Le Pen’s positioning of Macron as a capitalist “sell-out” and instrument of status quo globalists has achieved some success, but it may be too little too late. While 7.8% of the French in our poll view Macron as capitalist/money man, nearly twice as many describe Le Pen as a hatemongering racist (15.3%).

Ironically, we noted in our US poll that Donald Trump was also described as a racist by more than 10% of Americans just days before the election; however, more than 12% of Americans said that Hillary Clinton was dishonest/“crooked.”

The combined chart below shows how both the French and the American candidates appeared in the eyes of respondents from their respective countries. (Again, note that “liberal” for Macron does not mean fiscal or socially liberal as it does in the context of US politics, but refers to free-market economic liberalism.)

French Election 4

Final Analysis

This upcoming election is actually a runoff, and the opponents have basically two weeks to position one another. To this point, the job of defining one’s opponent was much trickier because there were five candidates in the race. In US politics, obviously, candidates have a lot more time to cement positioning against a single opponent.

But French campaign strategists are accustomed to operating within this short timeline. The Macron campaign has enjoyed an advantage in that negative positioning around Le Pen was already firmly in place, whereas Macron was relatively unknown. Conversely, the Le Pen campaign now has a huge opportunity to negatively position Macron as an instrument of global bankers and the status quo and to sway voters with a message of protectionism and security at a time when both have high appeal.


The wild card here is the EU. An EU “Frexit” is generally accepted to be less appealing among the majority of French, and although Le Pen has been softening her rhetoric, she is known to strongly favor leaving the EU. Macron, however, is most assuredly opposed to a Frexit, and the data show that respondents understand this difference.

Much like we saw in the US election results foreshadowed by our own polling data, a victory in this election may not so much amount to an endorsement of one candidate as a rejection of the status quo. And of the two candidates, Le Pen is better positioned as the reformer. She could yet ride a wave of populism that Macron is not equipped to tap into.

In short, do not be surprised if Marine Le Pen pulls off a Trump-style upset in the French Presidential Election. The data strongly suggest she is positioned to do so!


*Note: n=3,000 responses were collected via Google Surveys 4/24-4/30 2017. Google Surveys allow researchers to reach a validated French General Population Representative sample by intercepting people attempting to access high-quality online content or who have downloaded the Google Opinion Rewards mobile app. Results are accurate to within +/- 2.51% at the 95% confidence level.
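For readers who want to see how sample size relates to a margin of error, the textbook formula for a proportion can be sketched as follows. The note does not say how its figure was derived, so this is not a reconstruction of it: the simple-random-sampling assumption, the worst-case proportion p = 0.5, and z = 1.96 for 95% confidence are all assumptions, and a survey platform's published figure may also reflect weighting or design effects.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Textbook margin of error for a proportion under simple random sampling.

    n: number of respondents
    p: assumed proportion (0.5 is the worst case)
    z: critical value (1.96 corresponds to 95% confidence)
    """
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(3000):.2%}")  # full sample: 1.79%
print(f"{margin_of_error(1500):.2%}")  # one half of a split sample: 2.53%
```

The comparison shows why a split-sample design matters for precision: each candidate's half-sample carries a wider margin than the full 3,000 would.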



Marketing Research Blooper Reveals Lots of Surprises and Two Important Lessons

April Foolishness: What Happens When You Survey People in the Wrong Language?

I’m going to break with convention today and, in lieu of an April Fool’s gag, I’m going to tell you about an actual goof we recently made that yielded some unexpected but interesting results for researchers.

As you know, last week on the blog we highlighted findings from an international, multilingual Text Analytics Poll™ we conducted around culture. This particular poll spanned 10 countries and eight languages, and when we went to field it we accidentally sent the question to our U.S. sample in Portuguese!

Shockingly, in many cases, people STILL took the time to answer our question! How?

First, bear in mind that these Text Analytics Polls™ consist of only one question and it’s open-ended, not multiple choice. The methodology we use intercepts respondents online and requires them to type an answer to our question before they can proceed to content they’re interested in.

Under the circumstances, you might expect someone to simply type “n/a” or “don’t understand” or even some gibberish in order to move on quickly, and indeed we saw plenty of that. But in many cases, people took the time to thoughtfully point out the error, and even with wit.

Verbatim examples [sic]:

“Are you kidding me, an old american who can say ¡adios!”

“Tuesday they serve grilled cheese sandwiches.”

“What the heck is that language?”

“No habla espanol”

“i have no idea what that means”

“2 years of Spanish class and I still don't understand”

Others expressed themselves more…colorfully…

“No, I don't speak illegal immigrant.”

“Speak English! I'm switching to News 13 Orlando. They have better coverage than FT.”

Author’s note: I suspect that last quote was from someone who was intercepted while trying to access a Financial Times article. ;-)

While a lot of people clearly assumed our question was written in Spanish, still others took the time to figure out what the language was and even to translate the question!

“I had to use google translate to understand the question.”

“what the heck does this mean i don't speak Portuguese”

But what surprised me most was that a lot of Americans actually answered our question—i.e., they complied with what we had asked—even though it was written in Portuguese. And many of those replies were in Spanish!!!

We caught our mistake quickly enough when we went to machine-translate the responses and we were told that replies to a question in Portuguese were now being translated from English to English, but two important lessons were learned here:

Takeaway One: Had we made this mistake with a multiple-choice instrument, we either might not have caught it until after the analysis or perhaps not at all. Not only would respondents not have been able to tell us that we had made a mistake, but they would’ve had the easy option of just clicking a response at random. And unless those random clicks amounted to a conspicuous pattern in the data, we could’ve potentially taken the data as valid!

Takeaway Two: The notion that people will not take the time to thoughtfully respond to an open-ended question is total bunk. People not only took the time to answer our question in detail when it was correctly served to them in their own language, but they even spared a thought for us when they didn’t understand the language!

I want to emphasize here that if you’re one of those researchers (and I used to be among this group, by the way) who thinks you can’t include an open-ended question in a quantitative instrument, compel the respondent to answer it, and get a meaningful answer to your question, you are not only mistaken but you’re doing yourself and your client a huge disservice.

Take it from this April fool, open-ended questions not only tell you what you didn’t know; they tell you what you didn’t know you didn’t know.

Thanks for reading. I’d love to hear what you think!


P.S. Find out how much more value an open-ended question can add to your survey using OdinText. Contact us to talk about it.


Text Analytics Explores Whether All Culture Is Becoming American? Part 3

Emotion Speaks Louder than Words Across 11 Cultures, 10 Countries and 8 Languages!

Welcome to Part 3 of our international, multilingual exploration of culture using text analytics!

In Part 1 of this series, I provided a topline analysis of comments from more than 15,500 people spanning 11 cultures in 10 countries and eight languages in response to one question:

“How would you explain <insert country> culture to someone who isn’t at all familiar with it?”

Part 2 took a deeper dive into the key similarities and differences among cultures in our study, revealing how respective members see themselves.

But things got really interesting when OdinText analyzed people’s comments for emotion. Here we have a bit of a surprise. One might expect people’s descriptions of their cultures to be generally positive and for the range of emotions to be fairly narrow, but this was hardly the case. In fact, the emotional analysis revealed much more than just people’s impressions of their own cultures; this exercise tapped into state of mind! You’ll see what I mean in the spider charts for our emotional analysis and verbatim* comments included below.

*Note: Verbatim comments are either translated or [sic]

U.S.A.  (High Positive Sentiment)

Americans are Angry! Twice as Angry as the international average. The Anger is accompanied by high levels of Fear/Anxiety and even Disgust, an emotion we don’t see often outside of food categories and which in this case appears to be related to the recent presidential election.

Joy is also lower than average (and trust is slightly below average), which raises the question: How could we also have a somewhat higher than average overall positive sentiment? The answer lies in a very polarized/divided populace, almost half of whom are bullish and joyful in their descriptions!


[USA Emotions Blue - International Mean Red]

Verbatim examples:

“Expect cordiality and indifference equally, as well as politeness and kindness that may turn to anger and malice. We are all different, and reflections of the world around us. We expect to be treated fairly and bear grudges beyond what is necessary. Racism is a dread poison that has seeped into the veins of our country. While none truly want to take the antidote. There is no standard in our country, people are all different as America breeds individuality.” – FEAR/ANXIETY (and mixed emotion)

“the expression of self in the most obnoxious form one can think of…Fat, dumb and ugly, Loud and obnoxious, Donald Trump - the ugly American” – DISGUST

“I honestly don't know what American culture is. We're such a large country, not at all homogenous. I think we have regional cultures and I would be comfortable explaining southern culture to someone. In the south, most people are neighborly, incredibly polite, and have a strong sense of pride for their region. I would have said a unifying feature of American citizenry was out unified devotion to country, but even that is questionable at times. Overall, I think it depends where in America one is.” – TRUST

“Freedom. Even with all the stuff going on, we still have the best country in the world because we have freedom of speech, choice, and worship…” – JOY

UNITED KINGDOM (Average Positive Sentiment)

In the UK, emotion around culture scored pretty average with one notable exception: Fear/Anxiety registered almost twice as high as the international average (although neither was as pronounced as what Americans expressed).


Verbatim examples:

“Difficult to say as different parts of Britain have different cultures… difficult to understand Polite hypocritical compassionate confusing people” – FEAR/ANXIETY

“We have a prime minister we didn't elect, England messed up Scotland independence, Brexit is a disaster but we never give up.” – FEAR/ANXIETY

“Unsure, confused and varied… It's dead” – FEAR/ANXIETY

[NOTE: comments like “unsure, confused and varied” are a common theme in many of the cultural descriptions, not just for the UK]

AUSTRALIA (Very High Positive Sentiment)

Australians described their culture as laid back, and the emotions they expressed back it up. Their comments contained far less (about half as much) Anger than the international average, lower Sadness and significantly higher Joy. Australian comments also don’t reflect much Surprise, with very few using terms such as “amazing.” Comments are more often relaxed (and often mention this term).


Verbatim examples:

“Its full of kindness, reslectfuly, courageness and happy” - JOY

“limited. But great mateship” – JOY

“Inclusive, relaxed, full of laughter” – JOY

“Laid back, relaxed and able to laugh at ourselves” – JOY

BRAZIL (Low Positive Sentiment)

Even though Carnival was a frequently mentioned feature in descriptions of Brazilian culture, life for Brazilians isn’t one big party. Brazilians’ culture comments are significantly more likely than average to contain Anger. They also contain fewer Trust mentions. Most of these sentiments involved frustration with corruption and/or crime. Paradoxically, at the same time, we found low instances of Anticipation and Fear/Anxiety, indicating Brazilians have somewhat resigned themselves or have grown accustomed to these conditions. Moreover, Joy is neither significantly lower nor higher than the international average.


Verbatim examples:

“…Because of the [income] distribution … very Robin Hood, ie acceptable to steal from large companies and also the government. So bank robberies without victims are not perceived negatively by the population, stealing TV signals, tax evasion, political and corruption in general is high, there is strong prejudice against the poor. unqualified civil servants are lazy (stealing their government salaries) High use of pesticides in food, eliminating its nutrients.”  – ANGER (multiple examples)

“A mixture of cultures, and now with evil people in charge making it very difficult to live with the current culture” – ANGER

“good. I believe in Brazil, that one day it will be great” - JOY

FRANCE (Average Positive Sentiment)

French comments contain less Surprise than average. In other words, they are less likely than average to use terms like “amazing” and “extraordinary” to describe their culture. This may be because French culture, conceptually, is so familiar and established in the minds of the French, yet the opposing emotion to Surprise—Anticipation—is also not significantly higher than average. French comments describing their culture are also somewhat less likely than average to contain Anger.


Verbatim example:

“We cannot explain French culture. We can only share its ideology, although doing so has evident limitations. I consequently, and personally, see it as wealth gained by mixing cultures: extraordinary traditions gained through the people who have lived here before us.” – SURPRISE (rare example)

MEXICO (High Positive Sentiment)

Mexicans exhibit a high level of positivity in describing their culture, with their comments containing almost twice the amount of Joy as the international mean. Similarly, their Anger is also almost half that of the ten-country aggregate. Mexicans are also notable for their amount of Surprise—almost three times the average!


Verbatim examples:

“It is very rich and has many very beautiful and amazing things, traditions are super beautiful and have much biodiversity” – JOY and SURPRISE

“As a wonderful and amazing and different gift to what can be seen elsewhere in the world.” – JOY and SURPRISE

“Full of diversity and incredible things that transport you back in time to a magical place” – SURPRISE

“Mexican culture as a set of traditions and art that defined not only the beauty but the feeling of the nation is very particular as we have a very cheerful culture.” – JOY

SPAIN (Low Positive Sentiment)

When describing Spanish culture, the Spanish are three times less likely than others to mention issues related to Trust. Surprisingly, they also exhibit almost twice the average level of Sadness. And importantly, we found significantly higher amounts of Anger in Spanish comments about their culture, often related to corruption.


Verbatim Examples:

“For me, the Spanish culture is summed up in the torture of an animal (bull) and very rich food like potato omelette and paella” - ANGER

“bulls, crisis, corruption, political thieves, injustice, cachondeo” – ANGER

“Culture rather low and in many cases ridiculous. Eat and drink like monkeys and hang as much as possible with whoever is around. Idiots, political vermin, thieves and plunderers posing as big cahunas, big wealthy guys, the magnates of oil companies. These guys at the oil companies, they are just clowns but because they work there they become very wealthy, they steal and get a lot of money from the oil companies. They are thieves, corrupt. They become rich. They call this success. Like Rafa Mora or Belen Esteban they are very mediocre people in this country. I am ashamed of these people.” – ANGER and SADNESS

GERMANY (Low Positive Sentiment)

Germans have far less positive sentiment in their descriptions and about half the proportion of Joy compared to the international average. As with the French, there is also very little Surprise in their comments. It’s not that negative emotions like Anger and Sadness are significantly higher, but rather that the lack of positive emotions is significant.


Verbatim Examples:

“Well organized, industrious, intelligent, technically well developed.” – JOY (infrequent example of German Joy)

“Conservative, many rules, precise but also pleasure in little things, family” – JOY (infrequent example of German Joy)

JAPAN (Very Low Positive Sentiment)

By “Very Low Positive Sentiment” we do not mean that Japanese sentiment was negative, but that the Japanese sentiment was absent. The Japanese are very reserved and conservative, so it should come as no shock that the degree to which they expressed emotions, generally, was significantly lower than average.


Verbatim Examples:

“Though it is the culture of an island isolated by sea, it is special in its ability to ‘mimic’, and therefore it has developed into a simultaneously unique and multifaceted culture” – JOY

“The origins of the great culture of the Samurai” – JOY

“Japanese culture is special in that it is mellow and refined and is characterized by many gorgeous things. For example, tea ceremony, calligraphy, flower arranging, etc., at first appear to be quite quiet and plain endeavors, however such an impression belies a perfection and refined beauty that exists therein.” – JOY

“Japanese culture is a culture of hospitality and care” – TRUST

CANADA-ENGLISH (High Positive Sentiment)

Peace of mind doesn’t appear to be much of a problem for English-speaking Canadians, whose comments reflected significantly low Anger and high Trust. They also exhibited significantly less Fear/Anxiety than the international average, and a modestly higher level of Joy.


Verbatim examples:

“Look great from the outside, is great on the inside. But does have its flaws. Not to mention prejudice, inequality and racism is still embedded in large portions of our culture. Media also does a great job of covering stories that don't matter and are not actually informing.” – JOY (mixed/modest)

“Canadians are usually warm and welcoming people. We are mostly immigrants and understand peoples needs and desires to strive for a better life. We tend to supposrt one another yet respect peoples privacy.” - TRUST

“Like America only with gun control, socialized health care, and French on the packaging. And a much cuter leader.” – TRUST

“Friendly, fair, safe and welcoming” – TRUST

CANADA-FRENCH (Lower Positive Sentiment)

The Quebecois’ level of Joy is significantly higher than the international average, but it’s accompanied by equally high levels of Anger and Fear/Anxiety. This combination was unique in our data, perhaps as it represents a strong, well-understood and distinct culture that is defensively positioned within a larger, somewhat opposing culture that sometimes feels threatening. Comments from French Canadians—in contrast to those of the actual French from France—contained quite a lot of emotion. There were also significantly higher levels of Trust, and Sadness scored slightly above average.


Verbatim examples:

“People welcoming, open and proud. rich and diverse culture.” – JOY

“Mixture of French European roots in a North American context. Culture which developed from a difficult kind, hard winter. But a warm and supportive culture, proud of its language on an English-speaking continent.” – JOY

“A welcoming culture, which focuses on French and fights for its rights. Who are past present and future is important. Which is multi-ethnic” – ANGER

“people proud of its language and its history. Quebec is slightly open, yet desperate to preserve its values.” – FEAR/ANXIETY

“We are tolerant but do not humiliate us. Our history is full of situations where we have been crushed but wounds heal slowly. We are proud revelers but we lack confidence in us. We need to assert ourselves in the world and we are receiving from everyone so obviously there is no danger for us.” – FEAR/ANXIETY


What Have We Learned?

First of all, thank you to everyone for the incredible interest you’ve shown and for joining us on this journey!

While I’ll leave the final word on the cultural impact of globalization to anthropologists and others specializing in the study of culture, this surface-level read strongly suggests that we are becoming more alike. Multiculturalism, in particular, has become an important component of cultural identities across many countries and cultures. The data also clearly show that significant differences endure, along which dimensions they fall, and the degree to which they matter.

Somewhat surprisingly, the hero today may have been the emotional analysis, which told us that cultural identity is not necessarily a static construct, and that how people think about their culture at a given point in time is strongly influenced by current affairs and circumstances, hence the variation in emotions expressed and their intensity.

But what’s really striking about this exercise is that we were able to run these analyses and visualizations and glean all of these insights from data collected from a SINGLE open-ended question.

Look at how much we learned!

Imagine for a moment trying to collect this same information using a multiple-choice instrument. You’d need more than one, and I still don’t think it would be possible to achieve the same insights.

Then there’s the scale to consider. We analyzed responses from more than 15,500 people in their own words.

Lastly, we accomplished this using OdinText across 11 cultures, 10 countries and eight languages in fewer than two hours! (It actually took longer to prepare this blog post than it did to translate and analyze the data!)

In summary, research innovation today is generally assessed in three questions:

  • Is it better? Yes! This approach yielded insight that would have been impossible to achieve with a conventional, multiple-choice survey.

  • Is it faster? Yes! Manual coding alone would’ve taken days or weeks. OdinText did it in fewer than two hours.

  • Is it cheaper? Yes! This international project was affordable enough to conduct on a whim.

Whatever the size of your organization or your resources, this project demonstrates that you can now conduct, translate and analyze a multinational, multilingual study among key consumers in key markets and capture meaningful insights quickly, affordably and easily without even getting up from your desk using OdinText.

Contact us here to talk about it.

Thanks for reading. I’d love to hear what you think!

@TomHCAnderson (@OdinText)

About Tom H. C. Anderson

Tom H. C. Anderson is the founder and managing partner of OdinText, a venture-backed firm based in Stamford, CT whose eponymous, patented SaaS platform is used by Fortune 500 companies like Disney, Coca-Cola and Shell Oil to mine insights from complex, unstructured and mixed data. A recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research, Anderson is the recipient of numerous awards for innovation from industry associations such as CASRO, ESOMAR and the ARF. He was named one of the "Four under 40" market research leaders by the American Marketing Association in 2010. He tweets under the handle @tomhcanderson.

Text Analytics Answers - Is All Culture Becoming American? Part 2

Defined in Their Own Words: 11 Cultures, 10 Countries & Eight Languages  

In Part 1 of this series, I provided a top line from our analysis of comments from more than 15,500 people spanning 11 cultures in 10 countries and eight languages in response to one question:

“How would you explain <insert country> culture to someone who isn’t at all familiar with it?”

After translating and analyzing these data with OdinText—an exercise that took fewer than two hours—we discovered that across cultures, by and large, one of the defining characteristics of almost every culture represented in our sample is that it is multicultural, suggesting that there may indeed be some validity to the argument that globalization is having a “melting pot” effect on cultures around the world.

Of course, multiculturalism/diversity was far from the only common attribute that people mentioned across cultures (it was simply the most prevalent one); it took quite a few commonalities mentioned across cultures to generate what we saw in the aggregate visualization we shared in Part 1, which showed by graphic proximity how alike or dissimilar the 11 cultures in our sample are and which, not coincidentally, put U.S. culture at the relative center of it all.

Not surprisingly, though, we also found that every culture retains unique characteristics in the eyes of its respective members. Today we’re going to look closely at what those similarities and differences are for each culture.

Cultural Characteristics in Their Own Words

Each of the charts below contains primary cultural descriptors—features/attributes/topics—identified by OdinText at the country/culture level compared to the mean aggregate for all countries/cultures studied in the sample.
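For readers who want to try this kind of comparison on their own data, here is a minimal sketch of how a culture's descriptor rates might be compared with the mean aggregate. It is not OdinText's implementation: the function names are made up for illustration, the mention rates are placeholder numbers rather than results from this study, and the aggregate is assumed to be a simple unweighted mean across cultures.

```python
# Hypothetical share of comments (in %) mentioning each descriptor, per culture.
# These numbers are placeholders for illustration, NOT results from the study.
mention_rates = {
    "USA":     {"freedom": 20.0, "multicultural": 18.0, "tradition": 4.0},
    "Germany": {"freedom": 5.0,  "multicultural": 9.0,  "tradition": 8.0},
    "Mexico":  {"freedom": 2.0,  "multicultural": 6.0,  "tradition": 24.0},
}


def aggregate_mean(rates):
    """Unweighted mean mention rate per descriptor across all cultures."""
    descriptors = sorted({d for per_culture in rates.values() for d in per_culture})
    n = len(rates)
    return {d: sum(r.get(d, 0.0) for r in rates.values()) / n for d in descriptors}


def difference_from_mean(rates, culture):
    """Percentage-point gap between one culture and the aggregate mean."""
    mean = aggregate_mean(rates)
    return {d: rates[culture].get(d, 0.0) - mean[d] for d in mean}


if __name__ == "__main__":
    for descriptor, gap in difference_from_mean(mention_rates, "USA").items():
        print(f"{descriptor:15s} {gap:+.1f} pts vs. aggregate")
```

A positive gap means the descriptor is mentioned more often in that culture than in the aggregate, which is the kind of contrast the charts are described as showing.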


Baseball, hotdogs, apple pie and Chevrolet are surprisingly NOT top-of-mind for Americans. In fact, only FOUR people out of 1,500 mentioned baseball. Instead, we found that Americans overwhelmingly view their cultural identity in terms of freedom and multiculturalism/melting pot.

Visualization is a powerful and important tool for telling a story through data in research today, so just to offer a little variety I rendered the same data in a spider chart. What do you think? What does this visualization say about U.S. culture compared to the international aggregate?


The Brits are well known for their humor, and apparently they consider it a key part of their cultural identity. They are also a little unusual in that Brits closely associate their culture with a culinary staple—fish & chips—something we had expected to see more of across cultures, but did not.



It’s almost cliché, but Aussies are laid back and they know it.



Brazilians are keenly aware of their cultural diversity and they think trendiness and sexiness set them apart.


Recall that in Part 1 yesterday I noted that terms like “French,” “American,” and “Spanish” turned up in people’s descriptions and that for our purposes here they weren’t terribly useful? Well, this is one case where the use of the term “French” speaks volumes.  The French are unusually self-aware and see their culture as being so distinctive and pronounced that little explanation is actually needed. It’s almost self-evident in their minds, so they assume that characterizing something as “French”—“French cuisine,” for example—is sufficient to describe their culture.


Mexicans explain their culture in terms of tradition and vibrancy—color, beauty, flavor.


Like their neighbors in France, Spaniards have a sense of their culture as being highly distinctive. Lifestyle featured prominently here—things like siesta, the beach and sunshine, etc. I personally found it interesting that the Spanish simultaneously see diversity/multiculturalism as a key facet of their culture.


Asked about their culture, Germans point to beer, but there isn’t much in the way of fun or frivolity beyond that.  They also consider their culture to be versatile/flexible, orderly/rule-abiding and efficacious.

More importantly, comments from our German sample had a conspicuously lower incidence of actual cultural features than those of other cultures. This would seem to indicate that Germans are somewhat uncomfortable talking about German culture, which isn’t entirely surprising. Obviously, there’s a great deal of sensitivity and angst around discussion of German identity today as a legacy of Nazism. Remember also that until relatively recently Germany was two different countries. What German culture is, exactly, post-reunification may not be entirely clear to Germans, themselves.


The Japanese were unique in many ways, not the least of which being that describing their culture proved exceedingly difficult—and a very different kind of difficult from what we see in the German analysis. A significant number of Japanese respondents characterize Japanese culture as something that almost defies description and must instead be experienced to be understood. The Japanese also see their culture as being rigid and extremely pronounced and comments suggest that the Japanese find great comfort in rules. Indeed, this is the only group in our sample where not a single person mentioned “freedom.”

CANADA (English)

I promised you 11 cultures. To illustrate, here’s a side-by-side comparison of French Canadians and English Canadians.

For residents of English-speaking Canada, multiculturalism is a huge facet of their culture, while tradition is apparently less important. Canadians also take their national pastime—Hockey—as a cultural hallmark (unlike their neighbors in the States who, again, hardly mentioned baseball).

CANADA (French)

French-speaking Canadians (aka the Québécois) are, of course, quite dissimilar from their English-speaking countrymen in many ways. First and foremost, they’re fiercely French—so much so that “Frenchness” is more important to their cultural identity than it is to the actual French in France!

This concludes Part 2 of our international culture expedition.

In Part 3, I’ll share the fascinating results of OdinText’s emotional analysis of this comment data. What people say about their respective cultures, analyzed for significant patterns of emotion, tells an entirely new story! Join us for Part 3 tomorrow.

Tomorrow: Part III – How Emotions Speak Louder than Words


@TomHCAnderson - @OdinText

PS. Have questions about today’s post? Feel free to post a comment or request more info here.

Text Analytics Identifies Globalization Impact on Culture

International Text Analytics Poll™ Explores 11 Cultures in 10 Countries and 8 Languages! [Part I]

When pundits declare that the western world is now in the throes of a globalization “backlash,” they’re generally referring to the reversal of decades of economic and trade policy, things like Brexit.

But what of other concerns typically associated with globalization? What about culture?

Specifically, there are those who argue that globalization will mean the end of cultures, that the various cultures of the world will over time dilute and blend until there is ultimately just one global melting pot culture.

They may be right.

When we think about culture, it’s often in terms of food, music, customs, etc., but it turns out that when you ask people in countries around the world to describe their own culture in their own words, one nearly universal and unexpected attribute rises to the top: diversity/multiculturalism.

In fact, multiculturalism/diversity was one of the primary and most frequently mentioned attributes used by over 15,500 people to describe 11 different cultures across 10 countries and eight languages!

Text Analytics on a Massive, Multilingual International Scale

Last week on this blog, we published the results of a Text Analytics Poll™ for the favorite movie of all time across six countries and five languages. The project generated a flood of inquiries.

Since everyone is so interested in what can be accomplished on an international scale, we increased the scope of this project significantly.

This time, we asked more than 15,500 people (at least n=1,500 per country) in 10 countries and eight languages the following:

“How would you explain <insert country> culture to someone who isn’t at all familiar with it?”

Then we ran their comments through OdinText, which identified the top 200 cultural markers or features from more than 15,500 text comments and also analyzed those comments for significant patterns of emotion.

How We Translated AND Analyzed the Data (In Less Than Two Hours)

Author’s note: If you’re not interested in methodology, please feel free to skip ahead to the results down below!

Many of you contacted us asking for more details last week, so I’ve provided some additional nuts and bolts here…

Step 1: Data Prep (Translation)

I usually limit total analytical time for any of these Text Analytics Poll™ projects to fewer than two hours. I admit that’s going to be a challenge today, as I’m looking at more than 15,500 comments across 11 cultures from 10 countries in eight languages.

The first challenge is translation. I happen to speak a few languages in addition to English, but in this case I’m faced with seven languages that I don’t understand well enough to analyze. If I did understand each of the languages, or were working with analysts who did, we could easily conduct the analysis in OdinText in the native form.

I’ll point out that while some corporations claim to be “global” in everything they do, in reality there is never enough language fluency at corporate to handle this type of analysis, so analyses are typically divvied up and entrusted to local divisions—a time-consuming and imperfect task, especially when the goal in this case is to make head-to-head comparisons across these countries.

Therefore translation is necessary. While less precise than human translation, machine translation lends itself quite well to a project like this and is more than sufficient for OdinText to identify patterns and even to determine which quotes should be of interest. Nothing has a better ROI. Case in point, it took two minutes to translate the data. For those keeping track, I’m at

Above we have an example of machine-translated raw data vs. the original French from the multi-country movie analysis I conducted last week. In the case above I’m looking at all mentions of “La Ligne Verte,” a title OdinText identified as appearing frequently among comments from French respondents. I don’t speak French, so I prefer to work with the machine-translated data on the left, which translated “La Ligne Verte” literally to “The Green Line” – the French title for the U.S. movie “The Green Mile.”

Step 2: Topic Identification

Using the top-down/bottom-up approach we teach in OdinText training and which we’ve blogged about here before, we identify 200 or so topics/features for analysis. This is a semi-supervised approach, and so a human is involved.
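To give a feel for what a top-down/bottom-up pass can look like, here is a rough sketch in plain Python. It is an assumption-laden illustration, not the OdinText workflow: the topic dictionary, stopword list and function names are all hypothetical, and the sample comments are a few of the English verbatims quoted in this series.

```python
import re
from collections import Counter

# Top-down: a hand-built dictionary of topics and trigger terms (hypothetical).
TOPIC_TERMS = {
    "multiculturalism": {"multicultural", "diverse", "diversity", "melting"},
    "freedom": {"freedom", "free", "liberty"},
    "laid back": {"relaxed", "laid", "easygoing"},
}

# A tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "to", "in", "it", "we",
             "are", "with", "very", "at"}

SAMPLE_COMMENTS = [
    "Inclusive, relaxed, full of laughter",
    "Laid back, relaxed and able to laugh at ourselves",
    "Friendly, fair, safe and welcoming",
]


def tokenize(comment):
    """Lowercase a comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def tag_topics(comment):
    """Top-down pass: return the predefined topics a comment touches."""
    tokens = set(tokenize(comment))
    return {topic for topic, terms in TOPIC_TERMS.items() if tokens & terms}


def candidate_terms(comments, top_n=5):
    """Bottom-up pass: frequent terms not yet covered by any topic,
    surfaced for a human analyst to review and possibly promote."""
    known = set().union(*TOPIC_TERMS.values())
    counts = Counter(
        tok
        for comment in comments
        for tok in tokenize(comment)
        if tok not in STOPWORDS and tok not in known
    )
    return counts.most_common(top_n)


if __name__ == "__main__":
    for comment in SAMPLE_COMMENTS:
        print(comment, "->", tag_topics(comment))
    print(candidate_terms(SAMPLE_COMMENTS))
```

The human-in-the-loop part is the review of `candidate_terms`: the analyst decides which frequent terms deserve to become topics, then reruns the top-down pass.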

Given this somewhat larger multi-country data set, I allowed about 45 minutes for this task, so we’re at 

Step 3: Artificial Intelligence and Structuring the Analysis

Structuring the analysis is the most important and the most difficult part of any project, especially an exploratory mission where you don’t know what you are looking for at the outset.

You may be surprised to know that artificial intelligence and advanced machine learning algorithms can be a lot less useful than one might think. They have a tendency to identify the obvious (the attribute/topic “tradition” in this case) or, in some cases, the unexplainable. For instance, terms like “French,” “American,” “Japanese,” “Spanish,” etc., came up in responses to our question. These are, of course, very useful if you’re building an algorithm to predict where comments originate, for example, but they aren’t terribly illuminating for us here.

Examples of other topics auto identified as ‘of interest’ by our AI include “friendliness,” “relaxed/laid back,” “freedom,” and “equality fraternity liberty.” (You can probably guess where that last one came from.) Some of these other, less expected ones warrant a closer look and will be included in the analysis.

We could move right into an exhaustive analysis of each country, but I’m looking to quickly find any interesting patterns in this data, so I elect to use a quick visualization first.

Cultural Differences and Similarities Visualized

Cultural Differences and Similarities Visualized (A Few Key Descriptive Dimensions Added)

These visualizations (above) plot cultures that were described in more similar terms by people closer together and those that were described more differently further apart, yielding some interesting patterns. The USA, UK, Brazil, France and even Spain look quite similar. Two countries—Germany and Japan—cluster slightly away from this main bunch, but very close to each other. Then there are those that appear to be most dissimilar from the rest—Mexico, French- and English-speaking Canada, respectively, and Australia.
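This post doesn't spell out how the plot positions are computed, so the following is only one plausible way to get at "described in similar terms": treat each culture as a vector of descriptor rates and measure cosine distance between vectors. The profiles below are hypothetical placeholder numbers, and the function names are my own. A layout method such as multidimensional scaling could then turn these pairwise distances into 2-D positions.

```python
import math

# Hypothetical descriptor profiles (share of comments mentioning each feature).
# Placeholder values for illustration only, NOT figures from the study.
PROFILES = {
    "Germany": {"beer": 0.27, "order": 0.16, "tradition": 0.10},
    "Japan":   {"tradition": 0.12, "history": 0.08, "order": 0.02},
    "Mexico":  {"tradition": 0.30, "color": 0.20, "food": 0.10},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse descriptor profiles."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def distance_matrix(profiles):
    """Pairwise cosine distance (1 - similarity) for every pair of cultures."""
    names = sorted(profiles)
    return {
        (x, y): 1.0 - cosine_similarity(profiles[x], profiles[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    for (x, y), dist in distance_matrix(PROFILES).items():
        if x < y:
            print(f"{x:8s} vs {y:8s} distance = {dist:.3f}")
```

Smaller distances correspond to cultures that would sit closer together on a plot like the ones above.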

To my earlier question about whether or not globalization is having a homogenizing effect on cultures, it would appear so at a glance. We’ve noted that several countries cluster closely around the U.S. But look again—the U.S. appears to occupy the center of the cultural universe here! That’s no coincidence, I suspect, as U.S. culture could in many ways be considered the “melting pot” model and, as we saw last week, culture is a major U.S. export.

Analytical time to review multiple visualizations and decide that this is a repeating pattern was 10 minutes. Total analytical time =

Given that we have a full hour left (remember I did not want to spend more than two hours on this analysis), as a next step we conducted a little bottom-up work to look at what makes each country unique from the international aggregate/total and to see whether the pattern in the visualization makes sense.

Example: Why do Germany and Japan look so similar to OdinText?

A glance at the two charts below shows significant differences between how the Japanese and Germans describe their cultures. For instance, the Japanese were 11 times more likely than Germans to say their culture was something that needed to be experienced in order to be understood, and they were four times more likely than Germans to mention their history. They were also 14 times less likely to mention certain places of interest and three times more likely than Germans to mention food.

In contrast, Germans were 27 times more likely to mention beer and eight times more likely to describe their culture as rule-abiding and orderly. (Of course, this does not mean that Japanese culture is any less rule-abiding or orderly; rather, it suggests that for the Japanese these are not defining cultural characteristics.)
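A quick note on what "27 times more likely" means arithmetically: it is a ratio of mention rates between two groups. The sketch below shows the calculation with made-up counts. Only the sample size of roughly 1,500 per country comes from this post; the mention counts are hypothetical and were picked simply so the ratio comes out at 27.

```python
def times_more_likely(mentions_a, n_a, mentions_b, n_b):
    """Ratio of group A's mention rate to group B's mention rate.

    Computed as (mentions_a / n_a) / (mentions_b / n_b), rearranged so the
    division happens once.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    if mentions_b == 0:
        raise ValueError("comparison group never mentioned the feature")
    return (mentions_a * n_b) / (mentions_b * n_a)


if __name__ == "__main__":
    # Hypothetical counts: 405 of 1,500 Germans vs. 15 of 1,500 Japanese
    # mention beer. These counts are invented for the example.
    ratio = times_more_likely(405, 1500, 15, 1500)
    print(f"Germans were {ratio:.0f} times more likely to mention beer")
```

Note that the ratio is undefined when the comparison group never mentions a feature, which is why the function raises an error in that case instead of returning a number.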

Respondents from both countries were more likely than average to mention language, tradition, and politeness, BUT the similarities between these two cultures actually lie primarily in the extent to which they both differ from the other cultures sampled, notably by how infrequently certain features mentioned by people from other cultures appeared in comments from German and Japanese respondents.

Total Analytical Time =

This concludes Part 1 of our cultural safari. In Part 2 tomorrow we’ll take a deeper dive into each of the 11 cultures in our study individually, exploring how their members define themselves and the extent to which key cultural drivers differ from or are similar to the international aggregate. Stay tuned!

Tomorrow: Part II – Key Cultural Drivers in Their Own Words

@TomHCAnderson - @OdinText

PS. Have questions about today's post? Feel free to post a comment or request more info here.


Making Film History with Text Analytics: Six Countries, Five Languages, and One Fill-in-the-Blank Question!

The U.S. trade deficit may have hit a five-year high in January, but at least one American export remains the undisputed world leader: Hollywood films.

Even with growing competition from China and Bollywood, and in spite of the recent wave of America-bashing that seems to have swept the globe, the appetite for this most American of cultural artifacts is robust as ever.

In fact, the most beloved films of moviegoers in the UK, continental Europe and Japan are overwhelmingly and almost exclusively American, and the top three favorite films of all time among these international audiences collectively are Titanic, Star Wars and Harry Potter (in that order).

We know this not because these films made a killing in theaters, but because for the first time ever we asked people and they told us so!


No other source that I’m aware of has compiled an international list of the most beloved films of all time because the project would’ve been too expensive, labor-intensive, and time-consuming to be worthwhile.

Global box office has until now been the one and only measure for a film’s popularity internationally, but even adjusted for inflation it isn’t a perfect proxy for the films people cherish.

So, we asked general population samples (n=1,500 per country) in the UK, France, Germany, Spain and Japan as well as 3,000 Americans the following:

“What is your favorite movie of all time (name up to three if you can)?”

Stating the obvious (because it’s important): This was not a multiple-choice question. People could have said anything. In fact, on average, close to 200 unique movie titles were mentioned from each country surveyed.

In total, we collected more than 10,500 text comments from six countries spanning five languages, which OdinText analyzed in just over one hour! Ta da!

If you’re not as impressed as I am by this feat, try fielding an international survey with an open-ended question in five languages (including a non-Roman alphabet like Japanese), then translating, coding and analyzing the responses manually.


Machine translation usually works very well with text analysis, though in a few instances may require a little human knowledge/tweaking. For example, the movie “The Green Mile” was renamed “The Green Line” in France (something that can easily be accounted for in OdinText).
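As an illustration of the kind of tweak involved, here is a small sketch of an alias table that folds a literal translation back into the title everyone else uses before counting. This is generic Python, not how OdinText handles it; the alias table and function names are hypothetical.

```python
from collections import Counter

# Hypothetical alias table: literal machine translations and original-language
# titles mapped to a single canonical title.
TITLE_ALIASES = {
    "the green line": "The Green Mile",   # literal translation of the French title
    "la ligne verte": "The Green Mile",   # original French title
}


def normalize_title(raw):
    """Map a raw title mention to its canonical form if an alias is known."""
    key = " ".join(raw.lower().split())   # lowercase and collapse whitespace
    return TITLE_ALIASES.get(key, raw.strip())


def count_titles(mentions):
    """Count mentions after alias normalization."""
    return Counter(normalize_title(m) for m in mentions)


if __name__ == "__main__":
    mentions = ["The Green Line", "La Ligne Verte", "The Green Mile", "Titanic"]
    print(count_titles(mentions))
```

Without the alias step, the French mentions would be counted as a separate film and the title would be undercounted in the cross-country totals.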


As a standard for comparison, I’ve included two lists below of the top-grossing films of all time worldwide. This first list below, sourced from Wikipedia and originally from Guinness World Records, presents the top 10 films adjusted for inflation as of 2014:

Highest-Grossing Films Worldwide (as of 2014, adjusted for inflation)

Rank | Title | Worldwide gross (2014 $) | Year
1 | Gone with the Wind | $3,440,000,000 | 1939
2 | Avatar | $3,020,000,000 | 2009
3 | Star Wars | $2,825,000,000 | 1977
4 | Titanic | $2,516,000,000 | 1997
5 | The Sound of Music | $2,366,000,000 | 1965
6 | E.T. the Extra-Terrestrial | $2,310,000,000 | 1982
7 | The Ten Commandments | $2,187,000,000 | 1956
8 | Doctor Zhivago | $2,073,000,000 | 1965
9 | Jaws | $2,027,000,000 | 1975
10 | Snow White and the Seven Dwarfs | $1,819,000,000 | 1937

This second list, at Box Office Mojo, is not adjusted for inflation like the Guinness list above, but it is current and more extensive. It is still very different from what we found when people were allowed to tell us their favorite movies.


So, how did people’s responses to our question stack up to global box office figures?


As you can see, the top four highest grossing films (adjusted for inflation) in our first chart—“Gone with the Wind,” “Avatar,” “Star Wars,” and “Titanic”—also appeared in the top 10 favorites, albeit not in the same order. The other six highest grossing films did not even make the top international 25 favorites. Similarly, six of the top 10 international favorites were not among the top 10 highest grossing films.
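The comparison above boils down to a set overlap, which is easy to check in a few lines. In this sketch the top-grossing list is the inflation-adjusted table from this post. The favorites list is deliberately partial: it contains only the six titles this post names as being in the aggregated top 10, and since the post says the remaining favorites are not top-grossing films, leaving them out should not change the overlap.

```python
# Inflation-adjusted top 10 from the table in this post.
TOP_GROSSING = {
    "Gone with the Wind", "Avatar", "Star Wars", "Titanic",
    "The Sound of Music", "E.T. the Extra-Terrestrial",
    "The Ten Commandments", "Doctor Zhivago", "Jaws",
    "Snow White and the Seven Dwarfs",
}

# Partial list: only the aggregated top-10 favorites that this post names.
# The other four titles are not given in the text, so they are left out.
FAVORITES_NAMED_IN_POST = {
    "Titanic", "Star Wars", "Harry Potter",
    "Avatar", "Gone with the Wind", "Your Name",
}

# Films that are both top-grossing and named favorites.
overlap = TOP_GROSSING & FAVORITES_NAMED_IN_POST

# Top-grossing films that are not among the named favorites.
grossing_only = TOP_GROSSING - FAVORITES_NAMED_IN_POST

if __name__ == "__main__":
    print("Both lists:", sorted(overlap))
    print("Top-grossing only:", sorted(grossing_only))
```
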


At the individual country levels you’ll note some differences. The top three favorites were very popular in every country, but some of the internationally highest-grossing films were not. “Avatar,” for example, is well-loved everywhere except in the UK and Japan.

And while it's perhaps not surprising that “Gone with the Wind” is a favorite among Americans, the fact that it made the top 10 in France was a surprise to me. GwtW just missed the top 25 for Japan, coming in at #26. It’s actually least popular in Germany, coming in at #35 there.

Moreover, not a single domestic film appeared in the top 25 favorites for either the UK or Spain. The closest to a domestic film for Spain was “A Monster Comes to See Me,” whose director is Spanish. German audiences named only one German favorite, and even its title contains a misspelled English curse word: “Fack ju Göhte.”


France and Japan (below) are particularly noteworthy for several reasons. France is renowned for its film making and its cultural pride, yet conspicuously only one French film appeared in the top 10 favorites of French movie watchers.


Japan, which has not only an established domestic film industry but arguably the most pronounced culture of the countries sampled, differed the most.

Japanese movie watchers bucked the international trend by listing three Japanese films among their top 10 favorites, most conspicuously by naming a non-U.S. film their number one favorite! (The popularity of the film “Your Name” in Japan was even sufficient to propel the title into the aggregated top 10 across countries!)

In addition, animated films like “Your Name” and “My Neighbor Totoro” figured prominently among Japanese favorites.

Favorites in other countries like “Dirty Dancing” and “Lord of the Rings” weren’t particularly well-liked by the Japanese, whereas surprises like “Resident Evil” and “Roman Holiday” were favorites.


There are a lot of ways to easily slice these data with OdinText. Just for fun, we asked OdinText what the gender split was for the top favorite:


And if there’s such a thing as a “chick flick,” then there’s also a male equivalent. OdinText identified a number of favorites that were only mentioned by men or by women!

“Little White Lies,” “Bridget Jones,” “Pearl Harbor,” “Fifty Shades of Grey,” and “Sweet Home Alabama” are among these ‘Chick Flick Only’ movies, whereas “The Good, the Bad and the Ugly,” “Transformers,” “Zulu,” “Das Boot,” and “Super Troopers” are examples of ‘Guy Only’ flicks.
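For readers who like to see the mechanics, a split like this can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the (gender, title) record layout, the "F"/"M" coding and the toy responses are made up, and it is not a description of OdinText's internals.

```python
from collections import defaultdict

def gender_exclusive_titles(responses):
    """Split titles into those mentioned only by women and only by men.

    `responses` is an iterable of (gender, title) pairs; the layout and the
    "F"/"M" coding are assumptions made for this sketch.
    """
    mentioned_by = defaultdict(set)
    for gender, title in responses:
        mentioned_by[title].add(gender)
    women_only = sorted(t for t, g in mentioned_by.items() if g == {"F"})
    men_only = sorted(t for t, g in mentioned_by.items() if g == {"M"})
    return women_only, men_only

# Toy, made-up responses (not survey data).
responses = [
    ("F", "Bridget Jones"),
    ("F", "Titanic"),
    ("M", "Titanic"),
    ("M", "Das Boot"),
    ("M", "Zulu"),
    ("F", "Bridget Jones"),
]
women_only, men_only = gender_exclusive_titles(responses)
```

Any title mentioned by both groups, like “Titanic” in the toy data, drops out of both lists.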


While we were at it, we asked a separate gen pop sample of Americans (n=1500) to name the worst movie ever made.


Likely due to the proximity to the Oscars when the survey was fielded, the number one was shockingly “La La Land”! In addition, some of the international favorites and highest grossing films globally—notably “Titanic,” “Avatar,” “Jaws,” and “Star Wars”—were also raspberries for a lot of people. Go figure!


So movie fans, I think you’ll agree we squeezed quite a bit out of one open-ended question here!

Not only were we able to provide this data for the first time ever, but we managed to collect, translate and analyze the data quickly, easily and affordably.

That’s six countries, five languages, one open-ended question. And it’s no Hollywood fairytale.

Although we used a direct response instrument to collect the data here, I'd like to point out that OdinText might have been able to do a similar analysis without a survey. For example, in our recent "Showhole" post, OdinText predicted what television shows people would like based on thousands of comments scraped from the Internet.

Thanks for reading. I’d love to hear what you think!

@TomHCAnderson (@OdinText)

P.S. Want to conduct your own multi-country, multilingual survey with an open-ended question? Contact us here to talk about it.

Why Communicating with Aliens is Easier than You Think – And What It Means for Your Company

The Movie “Arrival,” Text Analytics and Machine Translation

When I speak with prospective OdinText users who’ve been exposed to other text analytics software providers, I find they tend to mention and ask about things like POS tagging, taxonomies, ontologies, etc.

These terms come from linguistics, the discipline upon which many of the text analytics software platforms in the market today are predicated.

But you may be surprised to learn that as a basis for text analytics, linguistics is shockingly inefficient compared to approaches that rely on mathematics/statistics.

One of the most popular movies in theaters right now, “Arrival,” inadvertently makes this case rather well.

Understanding Alien Languages is Easy (Provided You’re Not a Linguist)



“Arrival” begins with a flock of spaceships touching down in locations around the world. Linguistics professor Louise Banks (Amy Adams) is then recruited to lead an elite team of experts in a race against time to find a way to communicate with the extraterrestrial visitors and avert a global war.

The film proceeds to build a lot of drama around a pretty minor problem of language analysis and translation (conveniently consuming several months during which the plot can thicken) when, in fact, the task of understanding an alien language like the one in the movie would be quite EASY.

I daresay in all modesty that I could have done this in a fraction of the time with OdinText and with a much smaller team than Adams’ character had!



It Only Takes a Few Words

In her first conversation with the aliens, Louise introduces herself by writing the word “human” on a little whiteboard she carries, to which the aliens respond by introducing themselves in their language.

After this initial exchange, in the real world, only a few more words would be necessary to start creating and applying a code book (a taxonomy or ontology in linguistics speak), which would allow one to quickly translate anything else said and to then communicate via a small, imperfect but highly effective vocabulary.

For example, a little later in the movie, one of the aliens tells Louise that another alien who is missing from their meeting that day is “in the death process,” which, of course, means the other alien is absent because he is dying.

Everyone in the audience gets what the alien means by “in the death process.” Indeed, communicating successfully with a small, imperfect vocabulary like this is far more efficient and reliable than one might assume. My two-year-old son and I are quite good at communicating in these sorts of two- or three-word phrases. And no part-of-speech tagging is necessary (nor would it be very helpful here).

I’ll come back to this idea of small, imperfect but surprisingly efficient vocabularies in a bit. But first, let’s consider a related but more challenging matter: breaking code.

How the Allies Used Text Analytics to Break the German Code

Compared to translating an alien language, cracking the Nazi Enigma code today using OdinText would be only slightly more difficult (though honestly not that much more difficult). Breaking that code is what helped the Allies win WWII.

Why more difficult? Because unlike the aliens in “Arrival,” who actually want the humans to learn their language in order to communicate, the Nazis wanted their encrypted language to stay indecipherable.



In the 2014 movie “The Imitation Game,” Benedict Cumberbatch stars as Alan Turing, the genius British mathematician, logician, cryptologist and computer scientist who led the effort to crack the German code.

In contrast to “Arrival,” the drama in “The Imitation Game” centers on Turing’s determination to build a decryption machine, instead of attempting to decode Enigma by hand like every other scientist assigned to the task.

When his boss refuses to fund his machine’s construction, Turing writes to Churchill, who arranges the funding and names him team leader. Turing subsequently fires the key linguists from the project and the linguistic approach to this text analysis (i.e., code breaking) is chucked in favor of computational mathematics.

Turing’s machine is, of course, critical to the solution (though the technology is simple by today’s standards), but the real breakthrough happens when the scientists realize that the machine can be sped up by recognizing routinely used phrases like “Heil Hitler” (again providing a basic code frame or taxonomy).
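To make the value of a known phrase concrete, here is a small Python sketch. It is my own illustration, not something shown in the film and not OdinText code. It relies on a well-documented property of Enigma that this post does not otherwise cover: the machine never encrypted a letter to itself. Any alignment where a letter of the guessed phrase sits under the same letter in the ciphertext can therefore be ruled out before a machine tests anything. The ciphertext below is made up.

```python
def possible_crib_offsets(ciphertext: str, crib: str):
    """Return the offsets at which `crib` could be the underlying plaintext.

    Assumes the Enigma property that no letter encrypts to itself, so an
    alignment is impossible if any crib letter equals the ciphertext letter
    at the same position.
    """
    ciphertext = ciphertext.upper()
    crib = crib.upper()
    offsets = []
    for start in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[start:start + len(crib)]
        if all(c != p for c, p in zip(window, crib)):
            offsets.append(start)
    return offsets

# Made-up ciphertext for illustration only.
ciphertext = "QHXILKHITRERAB"
crib = "HEILHITLER"  # the routinely used phrase, spaces removed
candidates = possible_crib_offsets(ciphertext, crib)
```

In this toy case, three of the five possible alignments are eliminated immediately, which is the sense in which a small, known vocabulary shrinks the search.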

The Turing Test: Did You Know You Were Talking to a Computer?

In computer engineering classes on artificial intelligence there is an oft-mentioned thought experiment called “The Chinese Room,” which is used to think about the differences between human and computer cognition. It’s often referenced when discussing the Turing Test, which assesses computer intelligence based on whether a human being can distinguish between a computer’s and a human being’s replies to the same questions.

Going back now to my earlier point about a small taxonomy being sufficient for communication, and keeping in mind that today’s far more powerful computers running Google Translate or OdinText can process unstructured text data in any language orders of magnitude faster than any human or Turing’s machine, I think The Chinese Room analogy is not just an interesting AI thought experiment, but a good way to explain why translating the alien language in “Arrival” should have been so much easier than the film made it out to be.

The Chinese Room

Imagine for a moment a room with no windows, only a door with a small mail slot.

In the room, we find an average English speaker recruited randomly off the street, someone without any advanced education or background in foreign languages or linguistics.

This person has been paid to spend the day in this room and given a code book for a “squiggly language” he/she has been tasked with translating. In the story, it’s typically Chinese, but it could be any foreign language with which the person is totally unfamiliar. Let’s assume Chinese to stay close to the original story.

After giving him/her this code book—basically an English-to-Chinese/Chinese-to-English dictionary—we tell this person that on occasion we may pass them a note written in Chinese and that they will need to use the code book to figure out what the message means in English. Likewise, if they need anything—water, food, bathroom break, etc.—they will need to pass the request in a note written in Chinese back through the mail slot to us.

Note that this person has ABSOLUTELY NO TRAINING in the syntax or grammar of Chinese. His/her notes may be rudimentary, but certainly they will still be understood.
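The code book idea is simple enough to sketch in Python. The four-word code book and the function names below are hypothetical and only meant to show the mechanics: pure lookup, with no grammar and no part-of-speech tagging anywhere.

```python
# Hypothetical miniature code book: English -> Chinese.
EN_TO_ZH = {"i": "我", "want": "要", "water": "水", "food": "食物"}
ZH_TO_EN = {zh: en for en, zh in EN_TO_ZH.items()}

def encode(note: str) -> str:
    """Translate an English note word by word; unknown words are dropped."""
    return "".join(EN_TO_ZH[w] for w in note.lower().split() if w in EN_TO_ZH)

def decode(note: str) -> str:
    """Translate a Chinese note by greedy longest-match lookup in the code book."""
    words, i = [], 0
    longest = max(len(k) for k in ZH_TO_EN)
    while i < len(note):
        for size in range(longest, 0, -1):
            chunk = note[i:i + size]
            if chunk in ZH_TO_EN:
                words.append(ZH_TO_EN[chunk])
                i += size
                break
        else:
            i += 1  # skip characters that are not in the code book
    return " ".join(words)

outgoing = encode("I want some water")
incoming = decode("我要食物")
```

The outgoing note drops the word “some” because it is not in the code book, yet the request for water still gets through, which is exactly the point about rudimentary notes being understood.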

What’s more, if a native Chinese speaker walked by and observed the notes coming out, they would probably assume that there was a Chinese speaker in the room.

Now, instead of a code book, suppose the person in the room was using a computer program like Google Translate or OdinText, which can instantaneously translate or otherwise process any number of words coming out of the room, making it even more likely that the Chinese-speaking passerby assumes the person in the room speaks Chinese.

Think about this the next time you’re wondering whether data translated by machine—which is so much faster and cheaper than human translation—is sufficient for text analytics purposes (i.e. understanding what hundreds or hundreds of thousands of humans are saying in some foreign language).

My strong belief is yes, definitely. Whether I’m looking at Swedish or Chinese, I’m always rather impressed by how on point today’s computer translation is, and how irrelevant any nuance is, especially at the aggregate level, which is usually where we need to be.

You don’t need a team of NASA scientists, nor a month to do it. You can have it ready by morning! The technology is already here!


To learn more about how OdinText can help you learn what really matters to your customers and predict real behavior here on Earth, please contact us or request a FREE demo using your own data here!

[Key Terms: AI, Artificial Intelligence, Machine Translation, Text Analytics, Linguistics, Computational Linguistics, Taxonomies, Ontologies, Natural Language Processing, NLP]



Tom H. C. Anderson OdinText Inc.


OdinText is a patented SaaS (software-as-a-service) platform for advanced analytics. Fortune 500 companies such as Disney and Shell Oil use OdinText to mine insights from complex, unstructured text data. The technology is available through the venture-backed Stamford, CT firm of the same name founded by CEO Tom H. C. Anderson, a recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research. Anderson is the recipient of numerous awards for innovation from industry associations such as ESOMAR, CASRO, the ARF and the American Marketing Association. He tweets under the handle @tomhcanderson.