International Text Analytics Poll™ Explores 11 Cultures in 10 Countries and 8 Languages! [Part I]
When pundits declare that the western world is now in the throes of a globalization “backlash,” they’re generally referring to the reversal of decades of economic and trade policy, things like Brexit.
But what of other concerns typically associated with globalization? What about culture?
Specifically, there are those who argue that globalization will mean the end of cultures, that the various cultures of the world will over time dilute and blend until there is ultimately just one global melting pot culture.
They may be right.
When we think about culture, it’s often in terms of food, music, customs, etc., but it turns out that when you ask people in countries around the world to describe their own culture in their own words, one nearly universal and unexpected attribute rises to the top: diversity/multiculturalism.
In fact, multiculturalism/diversity was one of the primary and most frequently mentioned attributes used by over 15,500 people to describe 11 different cultures across 10 countries and eight languages!
Text Analytics on a Massive, Multilingual International Scale
Since everyone is so interested in what can be accomplished on an international scale, we increased the scope of this project significantly.
This time, we asked more than 15,500 people (at least n=1,500 per country) in 10 countries and eight languages the following:
“How would you explain <insert country> culture to someone who isn’t at all familiar with it?”
Then we ran their comments through OdinText, which identified the top 200 cultural markers or features from more than 15,500 text comments and also analyzed those comments for significant patterns of emotion.
How We Translated AND Analyzed the Data (In Less Than Two Hours)
Author’s note: If you’re not interested in methodology, please feel free to skip ahead to the results down below!
Many of you contacted us asking for more details last week, so I’ve provided some additional nuts and bolts here…
Step 1: Data Prep (Translation)
I usually limit total analytical time for any of these Text Analytics Poll™ projects to less than two hours. I admit that’s going to be a challenge today, as I’m looking at more than 15,500 comments across 11 cultures from 10 countries in eight languages.
The first challenge is translation. I happen to speak a few languages in addition to English, but in this case I’m faced with seven languages that I don’t understand well enough to analyze. If I did understand each of the languages, or were working with analysts who did, we could easily conduct the analysis in OdinText in the native form.
I’ll point out that while some corporations claim to be “global” in everything they do, in reality there is never enough language fluency at corporate to handle this type of analysis, so analyses are typically divvied up and entrusted to local divisions—a time-consuming and imperfect task, especially when the goal in this case is to make head-to-head comparisons across these countries.
Therefore, translation is necessary. While less precise than human translation, machine translation lends itself quite well to a project like this and is more than sufficient for OdinText to identify patterns and even to determine which quotes should be of interest. Nothing has a better ROI. Case in point, it took two minutes to translate the data. For those keeping track, I’m at two minutes so far.
Above we have an example of machine translated raw data vs. the original French from the multi-country movie analysis I conducted last week. In the case above I’m looking at all mentions of “La Ligne Verte,” a title OdinText identified as appearing frequently among comments from French respondents. I don’t speak French, so I prefer to work with machine translated data on the left, which translated “La Ligne Verte” literally to “The Green Line,” the French title for the U.S. movie “The Green Mile.”
Step 2: Topic Identification
Using the top-down/bottom-up approach we teach in OdinText training and which we’ve blogged about here before, we identify 200 or so topics/features for analysis. This is a semi-supervised approach, and so a human is involved.
Given this somewhat larger multi-country data set, I allowed about 45 minutes for this task, so we’re at roughly 47 minutes.
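For readers who like to see the idea in code, here is a minimal sketch of what a top-down/bottom-up pass can look like. It is not OdinText's implementation: the topic dictionary, the stopword list and the sample comments are all made up for illustration. The top-down part tags comments with topics from a predefined dictionary, and the bottom-up part surfaces frequent terms that no topic covers yet, which a human analyst would then review.

```python
import re
from collections import Counter

# Hypothetical top-down dictionary: topic -> trigger terms (illustrative only).
TOPIC_TERMS = {
    "diversity": {"diverse", "diversity", "multicultural"},
    "food": {"food", "cuisine", "beer"},
    "tradition": {"tradition", "traditions", "traditional"},
}

# Tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "and", "the", "is", "of", "our", "we", "are", "very", "with", "in", "to"}


def tokenize(comment):
    """Lowercase a comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def tag_topics(comment):
    """Top-down: return the predefined topics whose terms appear in the comment."""
    tokens = set(tokenize(comment))
    return {topic for topic, terms in TOPIC_TERMS.items() if tokens & terms}


def candidate_terms(comments, top_n=5):
    """Bottom-up: most frequent non-stopword terms not covered by any topic yet."""
    known = set().union(*TOPIC_TERMS.values())
    counts = Counter(
        token
        for comment in comments
        for token in tokenize(comment)
        if token not in STOPWORDS and token not in known
    )
    return counts.most_common(top_n)


# Made-up comments, standing in for translated survey responses.
comments = [
    "We are a very diverse and multicultural country",
    "Great food and old traditions",
    "Friendly people, friendly towns, good beer",
]

for comment in comments:
    print(sorted(tag_topics(comment)), "<-", comment)
print("Candidates for new topics:", candidate_terms(comments, top_n=3))
```

The human step is deciding which of the bottom-up candidates deserve to become topics of their own, which is what makes the approach semi-supervised.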
Step 3: Artificial Intelligence and Structuring the Analysis
Structuring the analysis is the most important and the most difficult part of any project, especially an exploratory mission where you don’t know what you are looking for at the outset.
You may be surprised to know that artificial intelligence and advanced machine learning algorithms can be a lot less useful than one might think. They have a tendency to identify the obvious (the attribute/topic “tradition” in this case) or, in some cases, the unexplainable. For instance, terms like “French,” “American,” “Japanese,” “Spanish,” etc., came up in responses to our question. These are, of course, very useful if you’re building an algorithm to predict where comments originate, for example, but they aren’t terribly illuminating for us here.
Examples of other topics auto identified as ‘of interest’ by our AI include “friendliness,” “relaxed/laid back,” “freedom,” and “equality fraternity liberty.” (You can probably guess where that last one came from.) Some of these other, less expected ones warrant a closer look and will be included in the analysis.
We could move right into an exhaustive analysis of each country, but I’m looking to quickly find any interesting patterns in this data, so I elect to use a quick visualization first.
Cultural Differences and Similarities Visualized
Cultural Differences and Similarities Visualized (A Few Key Descriptive Dimensions Added)
These visualizations (above) plot cultures that people described in similar terms closer together and those described differently further apart, yielding some interesting patterns. The USA, UK, Brazil, France and even Spain look quite similar. Two countries, Germany and Japan, cluster slightly away from this main bunch but very close to each other. Then there are those that appear to be most dissimilar from the rest: Mexico, French-speaking Canada, English-speaking Canada, and Australia.
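To give a feel for how "described in similar terms" can be turned into distances, here is a small sketch. It assumes each culture is summarized by the share of its comments mentioning each topic and compares those profiles with cosine distance. That is one common choice, not necessarily what OdinText uses for its visualizations, and the culture names and rates below are invented placeholders rather than results from this study.

```python
import math

# Made-up mention rates (share of comments mentioning each topic); illustrative only.
TOPICS = ["diversity", "food", "tradition", "politeness"]
profiles = {
    "culture_a": [0.30, 0.20, 0.10, 0.05],
    "culture_b": [0.28, 0.22, 0.12, 0.04],
    "culture_c": [0.05, 0.10, 0.35, 0.30],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity: 0 for identical profiles, larger when they differ."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(profiles):
    """Pairwise distances between all cultures, keyed by (name, name)."""
    names = sorted(profiles)
    return {
        (a, b): cosine_distance(profiles[a], profiles[b])
        for a in names
        for b in names
    }


def nearest_neighbor(name, profiles):
    """The culture whose topic profile is closest to the given one."""
    others = [other for other in profiles if other != name]
    return min(others, key=lambda other: cosine_distance(profiles[name], profiles[other]))


for (a, b), dist in distance_matrix(profiles).items():
    if a < b:
        print(f"{a} vs {b}: {dist:.3f}")
print("Closest to culture_a:", nearest_neighbor("culture_a", profiles))
```

A distance matrix like this is the usual input to a 2-D layout method such as multidimensional scaling, which places items with small distances near one another on the plot.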
To my earlier question about whether or not globalization is having a homogenizing effect on cultures, it would appear so at a glance. We’ve noted that several countries cluster closely around the U.S. But look again—the U.S. appears to occupy the center of the cultural universe here! That’s no coincidence, I suspect, as U.S. culture could in many ways be considered the “melting pot” model and, as we saw last week, culture is a major U.S. export.
Analytical time to review multiple visualizations and decide that this is a repeating pattern was 10 minutes. Total analytical time = roughly 57 minutes (2 + 45 + 10).
Given that we have a full hour left (remember I did not want to spend more than two hours on this analysis), as a next step we conducted a little bottom-up work to look at what makes each country unique from the international aggregate/total and to see whether the pattern in the visualization makes sense.
Example: Why do Germany and Japan look so similar to OdinText?
A glance at the two charts below shows significant differences between how the Japanese and Germans describe their cultures. For instance, the Japanese were 11 times more likely than Germans to say their culture was something that needed to be experienced in order to be understood, and they were four times more likely than Germans to mention their history. They were also 14 times less likely to mention certain places of interest and three times more likely than Germans to mention food.
In contrast, Germans were 27 times more likely to mention beer and eight times more likely to describe their culture as rule-abiding and orderly. (Of course, this does not mean that Japanese culture is any less rule-abiding or orderly; rather, it suggests that for the Japanese these are not defining cultural characteristics.)
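A quick note on how "X times more likely" figures like these can be read: they are ratios of mention rates between two groups. The sketch below shows that arithmetic under the assumption that each rate is simply mentions divided by respondents. The function names and the counts are hypothetical, chosen only to make the calculation easy to follow, and are not taken from the study.

```python
def mention_rate(mentions, respondents):
    """Share of respondents in a group whose comment mentions a topic."""
    if respondents <= 0:
        raise ValueError("respondents must be positive")
    return mentions / respondents


def times_more_likely(mentions_a, n_a, mentions_b, n_b):
    """Ratio of group A's mention rate to group B's mention rate."""
    rate_b = mention_rate(mentions_b, n_b)
    if rate_b == 0:
        raise ZeroDivisionError("group B never mentions the topic; ratio is undefined")
    return mention_rate(mentions_a, n_a) / rate_b


# Hypothetical counts: 120 of 1,500 respondents in group A mention a topic,
# versus 30 of 1,500 in group B.
ratio = times_more_likely(120, 1500, 30, 1500)
print(f"Group A is {ratio:.1f} times more likely to mention the topic")
```

Because the ratio compares rates rather than raw counts, it stays meaningful when the two groups have different sample sizes. It also becomes unstable when the comparison group rarely mentions a topic, so very large ratios are worth checking against the underlying counts.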
Respondents from both countries were more likely than average to mention language, tradition, and politeness, BUT the similarities between these two cultures actually lie primarily in the extent to which they both differ from the other cultures sampled, notably by how infrequently certain features mentioned by people from other cultures appeared in comments from German and Japanese respondents.
Total Analytical Time =
This concludes Part 1 of our cultural safari. In Part 2 tomorrow we’ll take a deeper dive into each of the 11 cultures in our study individually, exploring how their members define themselves and the extent to which key cultural drivers differ from or are similar to the international aggregate. Stay tuned!
PS. Have questions about today's post? Feel free to post a comment or request more info here.
About Tom H. C. Anderson
Tom H. C. Anderson is the founder and managing partner of OdinText, a venture-backed firm based in Stamford, CT whose eponymous, patented SaaS platform is used by Fortune 500 companies like Disney, Coca-Cola and Shell Oil to mine insights from complex, unstructured and mixed data. A recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research, Anderson is the recipient of numerous awards for innovation from industry associations such as CASRO, ESOMAR and the ARF. He was named one of the "Four under 40" market research leaders by the American Marketing Association in 2010. He tweets under the handle @tomhcanderson.