Posts tagged Natural Language Processing
A New Trend in Qualitative Research

Almost Half of Market Researchers are doing Market Research Wrong! - My Interview with the QRCA (And a Quiet New Trend - Science Based Qualitative).

Two years ago I shared some research on research about how market researchers view Quantitative and Qualitative research. I stated that almost half of researchers don’t understand what good data is. Some ‘Quallies’ tend to rely on and work almost exclusively with comment data from extremely small samples (about 25% of market researchers surveyed). Conversely, there is a large group of ‘Quant Jockeys’ who, while working with larger, more representative sample sizes, purposefully avoid any unstructured data such as open-ended comments because they don’t want to deal with coding and analyzing it, or don’t believe in its accuracy and ability to add to the research objectives. In my opinion both researcher groups have it totally wrong, and are doing a tremendous disservice to their companies and clients. Today, I’ll be focusing on just the first group above, those who tend to rely primarily on qualitative research for decisions.

Note that today’s blog post is related to a recent interview, which I was asked to take part in by the QRCA’s (Qualitative Research Consultants Association) Views Magazine. When they contacted me I told them that in most cases (with some exceptions), Text Analytics really isn’t a good fit for Qualitative Researchers, and asked if they were sure they wanted to include someone with that opinion in their magazine. I was told that yes, they were ok with sharing different viewpoints.

I’ll share a link to the full interview in the online version of the magazine at the bottom of this post. But before that, a few thoughts to explain my issues with qualitative data and how it’s often applied as well as some of my recent experiences with qualitative researchers licensing our text analytics software, OdinText.

The Problem with Qualitative Research

IF Qual research was really used in the way it’s often positioned, ‘as a way to inform quant research’, that would be ok. The fact of the matter is though, Qual often isn’t being used that way, but instead as an end in and of itself. Let me explain.

First, there is one exception to this rule of only using Qual as pilot feedback for Quant. If you had a product, for instance, which was specifically made only for US State Governors, then your total population is only N=50. And of course it is highly unlikely that you would ever get all the Governors of each and every US State to participate in any research (which would be a census of all governors), and so if you were fortunate enough to have a group of say 5 Governors who were willing to give you feedback on your product or service, you would and should obviously hang on to and overanalyze every single comment they gave you.

IF however you have even a slightly more common mainstream product, I’ll take a very common product like hamburgers as an example, and you are relying on 5-10 focus groups of n=12 to determine how different parts of the USA (North East, Mid-West, South and West) like their burgers, and rather than feeding the findings directly into some quantitative research instrument with a greater sample, you issue a ‘Report’ that you share with management; well then you’ve probably just wasted a lot of time and money for some extremely inaccurate and dangerous findings. Yet surprisingly, this happens far more often than one would imagine.

Cognitive Dissonance Among Qual Researchers when Using OdinText

How do I know this you may ask? Good Text Analytics software is really about data mining and pattern recognition. When I first launched OdinText we had a lot of inquiries from Qualitative researchers who wanted some way to make their lives easier. After all, they had “a lot” of unstructured/text comment data which was time consuming for them to process, read, organize and analyze. Certainly, software made to “Analyze Text” must therefore be the answer to their problems.

The problem was that the majority of Qual researchers work with tiny projects/samples, interviews and groups between n=1 and n=12. Even if they do a number of groups like in the hamburger example I gave above, we’re still talking about a total of just around n=100 representing four or more regional groups of interest, and therefore roughly n=25 or fewer per group. It is impossible to get meaningful/statistically comparable findings and identify real patterns between the key groups of interest in this case.
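To put rough numbers on why n=25 per group is so limiting, here is a minimal Python sketch of the approximate 95% margin of error for a percentage. It assumes simple random sampling and the normal approximation, which focus groups rarely satisfy, so the real uncertainty is likely worse. The function name `margin_of_error` is my own for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p observed in a sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

# Group sizes mentioned in this post, plus a typical survey size for contrast.
for n in (12, 25, 100, 1000):
    print(f"n={n:>4}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

At n=25 the margin is close to 20 percentage points in either direction, so a regional "preference" of 60% is statistically compatible with anything from roughly 40% to 80%.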

The Little Noticed Trend In Qual (Qual Data is Getting Bigger)

However, slowly across the past couple of years or so, for the first time I’ve seen a movement of some ‘Qualitative’ shops and researchers, toward Quant. They have started working with larger data sets than before. In some cases, it has been because they have been pulled in to manage larger ongoing community/boards, in some cases larger social media projects, and in others, they have started using survey data mixed with qual, or even better, employing qualitative techniques in quant research (think better open-ends in survey research).

For this reason, we now have a small but growing group of ‘former’ Qual researchers using OdinText. These researchers aren’t our typical mixed data or quantitative researchers, but qualitative researchers that are working with larger samples.

And guess what, “Qualitative” has nothing to do with whether data is in text or numeric format; instead it has everything to do with sample size. And so perhaps unknowingly, these ‘Qualitative Researchers’ have taken the step across the line into Quantitative territory, where often for the first time in their career, statistics can actually be used. And it can be shocking!

My Experience with ‘Qualitative’ Researchers going Quant/using Text Analytics

Let me explain what I mean. Recently several researchers that come from a clear ‘Qual’ background have become users of our software OdinText. The reason is that the amount of data they had was quickly getting “bigger than they were able to handle”. They believe they are still dealing with “Qualitative” data because most of it is text based, but actually because of the volume, they are now Quant researchers whether they know it or not (text or numeric data is irrelevant).

Ironically, for this reason, we also see much smaller data sizes/projects than ever before being uploaded to the OdinText servers. No, not typically single focus groups with n=12 respondents, but still projects that are often right on the line between quant and qual (n=100+).

The discussions we’re having with these researchers as they begin to understand the quantitative implications of what they have been doing for years are interesting.

Let me preface this with the fact that I have a great amount of respect for the ‘Qualitative’ researchers that begin using OdinText. Ironically, the simple fact that we have mutually determined that an OdinText license is appropriate for them means that they are no longer ‘Qualitative’ researchers (as I explained earlier). They are in fact crossing the line into Quant territory, often for the first time in their careers.

The data may be primarily text based, though usually mixed, but there’s no doubt in their minds nor ours that one of the most valuable aspects of the data is the customer commentary in the text, and this can be a strength.

The challenge lies in getting them to quickly accept and come to terms with quantitative/statistical analysis, and thereby also the importance of sample size.

What do you mean my sample is too small?

When you have licensed OdinText you can upload pretty much any data set you have. So even though they may have initially licensed OdinText to analyze some projects with say 3,000+ comments, there’s nothing to stop them from uploading that survey or set of focus groups with just n=150 or so.

Here’s where it sometimes gets interesting. A sample size of n=150 is right on the borderline. It depends on what you are trying to do with it of course. If half of your respondents are doctors (n=75) and half are nurses (n=75), then you may indeed be able to see some meaningful differences between these two groups in your data.

But what if these n=150 respondents are hamburger customers, and your objective was to understand the difference between the 4 US regions in the example I referenced earlier? Then you have about n=37 in each subgroup of interest, and you are likely to have very few, IF ANY, meaningful patterns or differences.
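The contrast between the two splits can be sketched with a standard two-proportion z-test. This is only an illustration under assumptions of my own: a hypothetical 60% versus 40% preference between two groups, equal group sizes, and the normal approximation. It is not how OdinText computes anything.

```python
import math

def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value for the difference between two proportions,
    using a pooled z-test with equal group sizes."""
    pooled = (p1 + p2) / 2
    std_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(p1 - p2) / std_error
    return math.erfc(z / math.sqrt(2))

# Hypothetical 60% vs. 40% preference, checked at both subgroup sizes.
for n in (75, 37):
    p_value = two_proportion_p_value(0.60, 0.40, n)
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"n={n} per group: p={p_value:.3f} ({verdict} at the 5% level)")
```

The same 20-point gap that clears the conventional 5% threshold with n=75 per group fails to clear it with n=37. With four regions there are also six possible pairwise comparisons, which makes the bar for a trustworthy difference higher still.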

Here’s where that cognitive dissonance can happen --- and the breakthroughs if we are lucky.

Picture a former ‘Qual Researcher’ who has spent the last 15 years of their career making ‘management level recommendations’ on how to market burgers differently in different regions based on data like this. For the first time, they are looking at software which says that there are maybe just two or three small differences or, even worse, NO MEANINGFUL PATTERNS OR DIFFERENCES WHATSOEVER in their data. They may be in shock!

How can this be? They’ve analyzed data like this many times before, and they were always able to write a good report with lots of rich detailed examples of how North Eastern Hamburger consumers preferred this or that because of this and that. And here we are, looking at the same kind of data, and we realize, there is very little here other than completely subjective thoughts and quotes.

Opportunity for Change

This is where, to their credit, most of our users start to understand the quantitative nature of data analysis. They, unlike the ‘Quant Only Jockeys’ I referenced at the beginning of the article, already understand that many of the best insights come from text data in free form, unaided, non-leading, yet creative questions.

They only need to start thinking about their sample sizes before fielding a project. To understand the quantitative nature of sampling. To think about the handful of structured data points that they perhaps hadn’t thought much about in previous projects and how they can be leveraged together with the unstructured data. They realize they need to start thinking about this first, before the data has all been collected and the project is nearly over and ready for the most important step, the analysis, where rubber hits the road and garbage in really should mean garbage out.

If we’re lucky, they quickly understand it’s not about Quant and Qual any more. It’s about Mixed Data, it’s about having the right data, it’s about having enough data to generate robust findings and then superior insights!

Final Thoughts on the Two Nearly Meaningless Terms of ‘Quant and Qual’

As I’ve said many times before here and on the NGMR blog, the terms “Qualitative” and “Quantitative”, at least the way they are commonly used in marketing research, are already passé.

The future is Mixed Data. I’ve known this to be true for years, and almost all our patent claims involve this important concept. Our research shows time and time again, that when we use both structured and unstructured data in our analysis, models and predictions, the results are far more accurate.

For this reason we’ve been hard at work developing the first ever truly Mixed Data Analytics Platform. We’ll be officially launching it three months from now, but many of our current customers already have access. [For those who are interested in learning more or would like early access you can inquire here: OdinText.com/Predict-What-Matters].

In the meantime, if you’re wondering whether you have enough data to warrant advanced mixed data and text analysis, check out the online version of the article in QRCA Views magazine here. Robin Wedewer at QRCA really did an excellent job in asking some really pointed questions that forced me to answer more honestly and clearly than I might otherwise have.

I realize not everyone will agree with today’s post nor my interview with QRCA, and I welcome your comments here. I just please ask that you read both the post above, as well as the interview in QRCA before commenting solely based on the title of this post.

Thank you for reading. As always, I welcome questions publicly in the comments below or privately via LinkedIn or our Inquiry form.

@TomHCAnderson

Share Your Text Analytics Success with us at The Sentiment Analysis Symposium

Emotion—Influence—Activation: Call for Speakers, 2017 Sentiment Analysis Symposium  Writing today to OdinText users as well as other fellow practitioners, especially those on the client side.

I’m working with Seth Grimes, Chairman of the Sentiment Analysis Symposium, to get the call out for speakers as well as panelists for an interesting and interactive discussion at the event this summer.

OdinText has been a longtime supporter of the event, which this year takes place June 27-28 in New York. The Sentiment Analysis Symposium tackles the business value of sentiment, opinion, and emotion in our big data world.

Emotion is one of the keys to customer (and patient, voter, and market) understanding. The symposium is _the_ place to stay current with the technologies and their research and insights applications. Please join us in June, as either an attendee or presenter...

The key to a great conference is great speakers. Whether you're a business visionary, experienced user, technologist, or consultant, please consider presenting. You may submit your proposal here. Choose from among the suggested topics or surprise us. Help us build on our track record of bringing attendees useful, informative technical and business content (along with excellent networking opportunities). Submit by January 31 if possible.

We're inviting talks that focus on customer experience, brand strategy, market research, media & publishing, social insights, healthcare, and financial markets. On the tech side, show off what you know about natural language processing, machine learning, speech and emotion AI, and the data economy.

Please help us create another great symposium! I look forward to seeing you at the event. Feel free to reach out if you have any questions. @TomHCAnderson

About Tom H. C. Anderson

Tom H. C. Anderson is the founder and managing partner of OdinText, a venture-backed firm based in Stamford, CT whose eponymous, patented SaaS platform is used by Fortune 500 companies like Disney, Coca-Cola and Shell Oil to mine insights from complex, unstructured and mixed data. A recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research, Anderson is the recipient of numerous awards for innovation from industry associations such as CASRO, ESOMAR and the ARF. He was named one of the “Four under 40” market research leaders by the American Marketing Association in 2010. He tweets under the handle @tomhcanderson.

Why Communicating with Aliens is Easier than You Think – And What It Means for Your Company

The Movie “Arrival,” Text Analytics and Machine Translation

When I speak with prospective OdinText users who’ve been exposed to other text analytics software providers, I find they tend to mention and ask about things like POS tagging, taxonomies, ontologies, etc.

These terms come from linguistics, the discipline upon which many of the text analytics software platforms in the market today are predicated.

But you may be surprised to learn that as a basis for text analytics, linguistics is shockingly inefficient compared to approaches that rely on mathematics/statistics.

One of the most popular movies in theaters right now, “Arrival,” inadvertently makes this case rather well.

Understanding Alien Languages is Easy (Provided You’re Not a Linguist)

“Arrival” begins with a flock of spaceships touching down in locations around the world. Linguistics professor Louise Banks (Amy Adams) is then recruited to lead an elite team of experts in a race against time to find a way to communicate with the extraterrestrial visitors and avert a global war.

The film proceeds to build a lot of drama around a pretty minor problem of language analysis and translation—conveniently consuming several months during which the plot can thicken—when, in fact, the task of understanding an alien language like in the movie would be quite EASY.

I daresay in all modesty that I could have done this in a fraction of the time with OdinText and with a much smaller team than Adams’ character had!

It Only Takes a Few Words

In her first conversation with the aliens, Louise introduces herself by writing the word “human” on a little whiteboard she carries, to which the aliens respond by introducing themselves in their language.

After this initial exchange, in the real world, only a few more words would be necessary to start creating and applying a code book (a taxonomy or ontology in linguistics speak), which would allow one to quickly translate anything else said and to then communicate via a small, imperfect but highly effective vocabulary.

For example, a little later in the movie, one of the aliens tells Louise that another alien who is missing from their meeting that day is “in the death process,” which, of course, means the other alien is absent because he is dying.

Everyone in the audience gets what the alien means by “in the death process.” Indeed, communicating successfully with a small, imperfect vocabulary like this is far more efficient and reliable than one might assume. My two-year-old son and I are quite good at communicating in these sorts of two- or three-word phrases. And no part-of-speech tagging is necessary (nor would it be very helpful here).

I’ll come back to this idea of small, imperfect but surprisingly efficient vocabularies in a bit. But first, let’s consider a related but more challenging matter: breaking code.

How the Allies Used Text Analytics to Break the German Code

Compared to translating an alien language, it would be only slightly more difficult (though honestly not that much more difficult) to crack the Nazi Enigma code today using OdinText. Breaking that code helped the Allies win WWII.

Why more difficult? Because unlike the aliens in “Arrival,” who actually want the humans to learn their language in order to communicate, the Nazis wanted their encrypted language to stay indecipherable.

BENEDICT CUMBERBATCH stars in THE IMITATION GAME

In the 2014 movie “The Imitation Game,” Benedict Cumberbatch stars as Alan Turing, the genius British mathematician, logician, cryptologist and computer scientist who led the effort to crack the German code.

In contrast to “Arrival,” the drama in “The Imitation Game” centers on Turing’s determination to build a decryption machine, instead of attempting to decode Enigma by hand like every other scientist assigned to the task.

When his boss refuses to fund his machine’s construction, Turing writes to Churchill, who arranges the funding and names him team leader. Turing subsequently fires the key linguists from the project and the linguistic approach to this text analysis (i.e., code breaking) is chucked in favor of computational mathematics.

Turing’s machine is, of course, critical to the solution (though the technology is simple by today’s standards), but the real breakthrough happens when the scientists realize that the machine can be sped up by recognizing routinely used phrases like “Heil Hitler” (again providing a basic code frame or taxonomy).
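One well documented reason a known phrase helps so much is that an Enigma machine never encrypted a letter as itself, so any alignment where the guessed phrase and the ciphertext share a letter in the same position can be ruled out immediately. The Python sketch below illustrates only that elimination step. The function name and the ciphertext are made up for illustration, and this is neither a model of Turing’s machine nor anything OdinText does.

```python
def possible_crib_positions(ciphertext, crib):
    """Return the offsets where the guessed phrase (crib) could line up,
    i.e. where no crib letter equals the ciphertext letter above it."""
    positions = []
    for start in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[start:start + len(crib)]
        if all(c != p for c, p in zip(window, crib)):
            positions.append(start)
    return positions

# Made-up ciphertext, purely for illustration (not a real intercept).
ciphertext = "XHQILWITLRRAHEMLOP"
print(possible_crib_positions(ciphertext, "HEILHITLER"))
```

Every offset that is eliminated is a chunk of the search space the machine never has to test, which is the sense in which a small, known vocabulary speeds things up.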

The Turing Test: Did You Know You Were Talking to a Computer?

In computer engineering classes on artificial intelligence there is an oft-mentioned thought experiment called “The Chinese Room,” which is used to think about the differences between human and computer cognition. It’s often referenced when discussing the Turing Test, which assesses computer intelligence based on whether a human being can distinguish between a computer’s and a human being’s replies to the same questions.

Going back now to my earlier point about a small taxonomy being sufficient for communication, and keeping in mind that today’s far more powerful computers running Google Translate or OdinText can process unstructured text data in any language orders of magnitude faster than any human or Turing’s machine, I think The Chinese Room analogy is not just an interesting AI thought experiment, but a good way to explain why translating the alien language in “Arrival” should have been so much easier than the film made it out to be.

The Chinese Room

Imagine for a moment a room with no windows, only a door with a small mail slot.

In the room, we find an average English speaker recruited randomly off the street, someone without any advanced education or background in foreign languages or linguistics.

This person has been paid to spend the day in this room and given a code book for a “squiggly language” he/she has been tasked with translating. In the story, it’s typically Chinese, but it could be any foreign language with which the person is totally unfamiliar. Let’s assume Chinese to stay close to the original story.

After giving him/her this code book—basically an English-to-Chinese/Chinese-to-English dictionary—we tell this person that on occasion we may pass them a note written in Chinese and that they will need to use the code book to figure out what the message means in English. Likewise, if they need anything—water, food, bathroom break, etc.—they will need to pass the request in a note written in Chinese back through the mail slot to us.

Note that this person has ABSOLUTELY NO TRAINING in the syntax or grammar of Chinese. His/her notes may be rudimentary, but certainly they will still be understood.

What’s more, if a native Chinese speaker walked by and observed the notes coming out, they would probably assume that there was a Chinese speaker in the room.
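The person in the room is essentially performing a dictionary lookup, which can be sketched in a few lines of Python. The four-word code book below is a toy vocabulary of my own choosing, and the `translate` helper is hypothetical. Real machine translation is far more sophisticated than this.

```python
# Toy code book: a tiny English-to-Chinese vocabulary, assumed for illustration.
CODE_BOOK = {"i": "我", "want": "要", "water": "水", "food": "食物"}
REVERSE_BOOK = {chinese: english for english, chinese in CODE_BOOK.items()}

def translate(message, book):
    """Word-for-word lookup with no grammar at all.
    Words missing from the code book are flagged instead of guessed."""
    words = message.lower().split()
    return " ".join(book.get(word, f"[{word}?]") for word in words)

note_out = translate("I want water", CODE_BOOK)
print(note_out)                           # note passed out through the mail slot
print(translate(note_out, REVERSE_BOOK))  # same note read back with the code book
```

The output is rudimentary and ignores syntax entirely, yet the request is still understood, which is the point of the thought experiment.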

Now, instead of a code book, suppose the person in the room was using a computer program like Google Translate or OdinText, which can instantaneously translate or otherwise process any number of words coming out of the room, making it even more likely that the Chinese-speaking passerby assumes the person in the room speaks Chinese.

Think about this the next time you’re wondering whether data translated by machine—which is so much faster and cheaper than human translation—is sufficient for text analytics purposes (i.e. understanding what hundreds or hundreds of thousands of humans are saying in some foreign language).

My strong belief is yes, definitely. Whether I’m looking at Swedish or Chinese, I’m always rather impressed by how on point today’s computer translation is, and how irrelevant any nuance is, especially at the aggregate level, which is usually where we need to be.

You don’t need a team of NASA scientists, nor a month to do it. You can have it ready by morning! The technology is already here!

@TomHCAnderson

  1. To learn more about how OdinText can help you learn what really matters to your customers and predict real behavior here on Earth, please contact us or request a FREE demo using your own data here!

[Key Terms: AI, Artificial Intelligence, Machine Translation, Text Analytics, Linguistics, Computational Linguistics, Taxonomies, Ontologies, Natural Language Processing, NLP]

Tom H. C. Anderson OdinText Inc. www.odintext.com

ABOUT ODINTEXT

OdinText is a patented SaaS (software-as-a-service) platform for advanced analytics. Fortune 500 companies such as Disney and Shell Oil use OdinText to mine insights from complex, unstructured text data. The technology is available through the venture-backed Stamford, CT firm of the same name founded by CEO Tom H. C. Anderson, a recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research. Anderson is the recipient of numerous awards for innovation from industry associations such as ESOMAR, CASRO, the ARF and the American Marketing Association. He tweets under the handle @tomhcanderson.