Posts tagged longitudinal analysis
Peaks and Valleys or Critical Moments Analysis

Text Analytics Tips by Gosia

 

How can you gain interesting insights just from looking at descriptive charts of your data? Select a key metric of interest, such as Overall Satisfaction (1-5 scale), and, using text analytics software that can plot text data alongside numeric data longitudinally (e.g., OdinText), view your metric's averages across time. Next, view the plot at different time intervals (e.g., the plot could display daily, weekly, bi-weekly, or monthly overall satisfaction averages) and look for obvious “peaks” (sudden increases in the average score) or “valleys” (sudden decreases in the average score). Note the time periods in which you observe any peaks or valleys and try to identify reasons or events associated with these trends, e.g., changes in management, a new advertising campaign, or customer service quality. The next step is to plot average overall satisfaction scores for selected themes and see how they relate to the identified “peaks” or “valleys,” as these themes may point to explanations for the critical moments in your longitudinal analysis.
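The scan described above can be sketched in a few lines of Python. Everything here is illustrative: the ratings, the day labels, and the flagging threshold are hypothetical stand-ins shaped like the example discussed below, not output from any real tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily ratings: (day, overall satisfaction on a 1-5 scale),
# shaped like the valley described in this post (day 1 high, days 2-3 low).
records = [
    (1, 5.0), (1, 5.0),
    (2, 3.0), (2, 3.2),
    (3, 3.4), (3, 3.6),
    (4, 4.2), (4, 4.4),
    (5, 4.3), (5, 4.3),
]

def daily_averages(records):
    """Average the metric within each time interval (here: days)."""
    by_day = defaultdict(list)
    for day, score in records:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in by_day.items()}

def critical_moments(averages, threshold=1.0):
    """Flag interval-over-interval rises (peaks) and drops (valleys)
    larger than the chosen threshold."""
    days = sorted(averages)
    flags = []
    for prev, curr in zip(days, days[1:]):
        change = averages[curr] - averages[prev]
        if change >= threshold:
            flags.append((curr, "peak", round(change, 2)))
        elif change <= -threshold:
            flags.append((curr, "valley", round(change, 2)))
    return flags

print(critical_moments(daily_averages(records)))  # the day-2 drop is flagged as a valley
```

Re-running `critical_moments` with a smaller threshold (or on weekly rather than daily averages) is the programmatic equivalent of eyeballing the plot at different time intervals.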

In the figure below you can see how the average overall satisfaction of a sample company varied during approximately one month (each data point/column represents one day in the given month). Whereas no “peaks” were found in the average overall satisfaction curve, there was one significant “valley” visible at the beginning of the studied month (see plot 1 in Figure 1). It represented a sudden drop from an average satisfaction of 5.0 (day 1) to 3.1 (day 2) and 3.5 (day 3) before rising again and oscillating around an average of 4.3 for the rest of the month. So what could be the reason for this sudden, deep drop in customer satisfaction?

[Annotated OdinText screenshots: Text Analytics Tips 2a, 2b, and 2c]

Figure 1. Annotated OdinText screenshots showing an example of an exploratory analysis using longitudinal data (Overall Satisfaction).

Whereas a definitive answer requires more advanced predictive analyses (also available in OdinText), a quick and very easy way to explore potential answers is simply to plot the average satisfaction scores associated with a few themes identified earlier. In this sample scenario, average satisfaction scores among customers who mentioned “customer service” (green bars; second plot) overlap very well with the overall satisfaction trendline (orange line), suggesting that customer service complaints may have been the reason for the lowered satisfaction ratings on days 2 and 3. Another theme plotted, “fast service” (see plot 3), did not follow the overall satisfaction trendline at all, as customers mentioning this theme were highly satisfied on almost every day except day 6.
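One quick way to quantify how tightly a theme's curve tracks the overall trendline is a simple Pearson correlation. The sketch below uses illustrative daily averages shaped like the plots described above, not the actual study data:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

# Illustrative daily averages (hypothetical, not the actual study data):
overall          = [5.0, 3.1, 3.5, 4.3, 4.3, 4.4]  # the metric with the valley
customer_service = [4.9, 3.0, 3.4, 4.2, 4.4, 4.3]  # tracks the overall trendline
fast_service     = [4.8, 4.7, 4.9, 4.8, 4.6, 3.9]  # flat, with its own day-6 dip

# A theme whose curve follows the overall line is a candidate explanation:
print(pearson(overall, customer_service))  # close to +1
print(pearson(overall, fast_service))      # weak or negative
```

A high correlation is only a lead for the follow-up predictive analysis the post mentions; it does not by itself establish that the theme caused the valley.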

This kind of simple exploratory analysis can be very powerful in showing you what factors might have effects on customer satisfaction and may serve as a crucial step for subsequent quantitative analysis of your text and numeric data.

 

Text Analytics Tips with Gosia

 

[NOTE: Gosia is a Data Scientist at OdinText Inc. Experienced in text mining and predictive analytics, she is a Ph.D. with extensive research experience in mass media’s influence on cognition, emotions, and behavior.  Please feel free to request additional information or an OdinText demo here.]

Text analysis answers: Is the Quran really more violent than the Bible? (Part 2 of 3)

by Tom H. C. Anderson

Part II: Emotional Analysis Reveals Bible is “Angriest”

In my previous post, I discussed our potentially hazardous plan to perform a comparative analysis using an advanced data mining platform—OdinText—across three of the most important texts in human history: The Old Testament, The New Testament and the Quran.

Author’s note: For more details about the data sources and methodology, please see Part I of this series.

The project was inspired by the ongoing public debate around whether or not terrorism connected with Islamic fundamentalism reflects something inherently and distinctly violent about Islam compared to other major religions.

Before sharing the first set of results with you here today, due to the sensitive nature of this topic, I feel obliged to reiterate that this analysis represents only a cursory, superficial view of just the texts, themselves. It is in no way intended to advance any agenda or to conclusively prove anyone’s point.

Step 1: Sentiment Analysis

We started with a high-level look at Sentiment—positive and negative—and overall results were fairly similar: approx. 30% positive and 20% negative sentiment for each of the three texts. The Old Testament looked to have slightly more negative sentiment than either the New Testament or the Quran, but let’s come back to that later in more detail…

Staying at a high level, I was curious to see what the longitudinal pattern looked like across each of the three texts. Looking for any positive emotion in the texts from beginning to end allows us to get a sense of how they progress longitudinally. (See figure 1)

Author’s note: Unlike the Old and New Testaments, the Quran’s chapters (suras) are arranged roughly in order of length, not in chronological order.

[Figure 1: “Any Positive Sentiment” plotted from beginning to end of the Old Testament, New Testament, and Quran]

While there is some fluctuation in positive sentiment throughout each text, the New Testament appears to be unique in that it peaks on positive sentiment (Corinthians) and ends on a less positive note (Revelation).
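This beginning-to-end scan can be sketched with a toy positive-word lexicon. The actual analysis used OdinText's sentiment model; the word list and segment count below are purely illustrative:

```python
# Toy positive-word lexicon; real sentiment models are far richer.
POSITIVE = {"love", "joy", "hope", "peace", "blessed", "rejoice", "praise"}

def positive_trend(text, n_segments=10):
    """Split a long text into equal word-count segments and score each
    segment by its share of positive-lexicon words."""
    words = [w.strip(".,;:!?'\"").lower() for w in text.split()]
    size = max(1, len(words) // n_segments)
    scores = []
    for i in range(0, len(words), size):
        segment = words[i:i + size]
        scores.append(sum(w in POSITIVE for w in segment) / len(segment))
    return scores
```

Plotting the returned scores in order gives the kind of beginning-to-end curve shown in figure 1; a late low segment would correspond to a text that "ends on a less positive note."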

It’s also worth noting that positive and negative sentiment are usually highly correlated. In other words when there is more emotion in text, usually, though not always, there is both more positive and negative sentiment.

But let’s look deeper into emotions, beyond simple positive vs. negative sentiment (which is rarely very interesting) and into the eight major human emotion categories: Joy, Anticipation, Anger, Disgust, Sadness, Surprise, Fear/Anxiety and Trust.

Author’s note: These eight major emotion categories were derived from widely-accepted theory in modern psychology.

Step 2: Emotional Analysis

A look at the combined Old and New Testaments—the Bible—compared to the Quran reveals similarities and differences. The Bible and Quran are fairly uniform in ‘Surprise’, ‘Sadness’ and ‘Disgust’. But the Bible registers higher in ‘Anger’ and the Quran rates higher in ‘Joy’ but also in ‘Fear/Anxiety’ and ‘Trust’.
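Lexicon-based emotion tagging of this kind can be sketched as follows. The word lists are illustrative stand-ins in the spirit of Plutchik's eight categories, not the lexicon used in the actual analysis:

```python
# Toy lexicon for the eight emotion categories; illustrative only.
EMOTION_LEXICON = {
    "joy":          {"joy", "rejoice", "glad", "delight"},
    "anticipation": {"hope", "wait", "await", "expect"},
    "anger":        {"anger", "wrath", "vengeance", "fury"},
    "disgust":      {"detest", "detestable", "abomination", "unclean"},
    "sadness":      {"weep", "mourn", "sorrow", "grief"},
    "surprise":     {"amazed", "astonished", "awe", "marvel"},
    "fear":         {"fear", "terror", "dread", "doubt"},
    "trust":        {"trust", "faith", "believe", "disbelief"},
}

def emotion_counts(text):
    """Count lexicon hits per emotion category in a passage."""
    words = [w.strip(".,;:!?'\"").lower() for w in text.split()]
    counts = {emotion: 0 for emotion in EMOTION_LEXICON}
    for word in words:
        for emotion, vocab in EMOTION_LEXICON.items():
            if word in vocab:
                counts[emotion] += 1
    return counts
```

Running `emotion_counts` over each text and normalizing by word count yields the kind of per-category comparison shown in the charts below.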

[Figure: eight emotion categories, Bible vs. Quran]

As we mentioned yesterday, we decided to split the Old and New Testaments for analysis for a couple of reasons. Here’s what they look like:

[Figure 2: eight emotion categories, Old Testament vs. New Testament vs. Quran]

Comparing our three religious texts across the eight major emotions we find that the Old Testament is the ‘Angriest’ (including most mentions of ‘Disgust’); it also contains the least amount of ‘Joy’.

Here’s an example of a passage that registered under ‘Anger’:

But the LORD said to him, “Not so; if anyone kills Cain, he will suffer vengeance seven times over.” Then the LORD put a mark on Cain so that no one who found him would kill him.

Genesis 4:15

In text analytics, ‘Disgust’ rarely appears outside of food categories; however, it appears in Leviticus several times:

…whether among all the swarming things or among all the other living creatures in the water—you are to detest.

And since you are to detest them, you must not eat their meat and you must detest their carcasses.

Anything living in the water that does not have fins and scales is to be detestable to you.

These are the birds you are to detest and not eat because they are detestable: the eagle, the vulture, the black vulture

Leviticus 11:10-13

The Quran, on the other hand, contains the most ‘Fear/Anxiety’ and also the most ‘Trust/Belief’ content. In this case ‘Fear/Anxiety’ is highly linked to ‘Trust’: terms such as “doubt” and “disbelief” appear repeatedly in the Quran and are relevant to both of these primary emotions.

Or like abundant rain from the cloud in which is darkness, and thunder and lightning; they put their fingers into their ears because of the thunder-peal, for fear of death. And Allah encompasses the disbelievers.

Quran, Sūrat al-Baqarah 2:19

As noted in figure 2 above, the New Testament has relatively more ‘Anticipation’ and ‘Surprise’:

But if we hope for what we do not yet have we wait for it patiently.

Romans 8:25

Everyone was amazed and gave praise to God. They were filled with awe and said, “We have seen remarkable things today.”

Luke 5:26

Tomorrow in Part 3, we’ll take a deeper dive to understand some of the underlying reasons for these differences in greater detail and we’ll look into which, if any, of these texts is significantly more violent. Stay tuned!

Up Next: Part III – Violence, Mercy and Non-Believers

Text Analytics for 2015 – Are You Ready?

OdinText SaaS Founder Tom H. C. Anderson is on a mission to educate market researchers about text analytics. [Interview reposted from Greenbook]

Judging from the growth of interest in text analytics tracked in GRIT each year, those not using text analytics in market research will soon be a minority. But still, is text analytics for everyone?

Today on the blog I’m very pleased to be talking to text analytics pioneer Tom Anderson, the Founder and CEO of Anderson Analytics, which develops one of the leading Text Analytics software platforms designed specifically for the market research field, OdinText.

Tom’s firm was one of the first to leverage text analytics in the consumer insights industry, and they have remained a leader in the space, presenting case studies at a variety of events every year on how companies like Disney and Shell Oil are leveraging text analytics to produce remarkably impactful insights.

Lenny: Tom, thanks for taking the time to chat. Let’s dive right in! I think that you, probably more so than anyone else in the MR space, have witnessed the tremendous growth of text analytics within the past few years. It’s an area we post about often here on GreenBook Blog, and of course track via GRIT, but I wonder, is it really the panacea some would have us believe?

Tom: Depends on what you mean by panacea. If you think about it as a solution to dealing with one of the most important types of data we collect, then yes, it can and should be viewed exactly that way. On the other hand, it can only be as meaningful and powerful as the data you have available to use it on.

Lenny: Interesting, so I think what you’re saying is that it depends on what kind of data you have. What kind of data then is most useful, and which is not at all useful?

Tom: It’s hard to give a one-size-fits-all rule. I’m most often asked about the size of the data. We have clients who use OdinText to analyze millions of records across multiple languages; on the other hand, we have clients who use it on small concept tests. It is helpful, though, to keep in mind that Text Analytics = Text Mining = Data Mining, and that data mining is all about pattern recognition. So if you are talking about interviews with five people, you don’t have a lot of data, and there aren’t going to be many patterns to discover.

Lenny: Good point! I’ve been really impressed with the case studies you’ve released in the past year or two on how clients have been using your software. One in particular was the NPS study with Shell Oil. A lot of researchers (and more importantly CMOs) really believed in the Net Promoter Score before that case study. Are those kinds of insights possible with social media data as well?

Tom: Thanks Lenny. I like to say that “not all data are created equal”. Social media is just one type of data that our clients analyze, often there is far more interesting data to analyze. It seems that everyone thinks they should be using text analytics, and often they seem to think all it can be used for is social media data. I’ve made it an early 2015 new year’s resolution to try to help educate as many market researchers as I can about the value of other text data.

Lenny: Is the situation any different than it was last year?

Tom: Awareness of text analytics has grown tremendously, but knowledge about it has not kept up. We’re trying to offer free mini consultations with companies to help them understand exactly what (if any) data they have are good candidates for text analytics.

Lenny: What sources of data, if any, don’t you feel text analytics should be used on?

Tom: It seems the hype cycle has been focused on social media data, but our experience is that these tools can often be applied much more effectively to a variety of other sources of data.

However, we often get questions about IDI (in-depth interview) and focus group data. While text analytics could theoretically help you discover things like emotions in this smaller-scale qualitative data, there aren’t really many patterns to find because the data is so small. So we usually counsel against using text analytics for qual, in part due to the lower ROI.

Often it’s about helping our clients take an inventory of what data they have, and helping them understand where, if at all, text analytics makes sense.

Many times we find that a client really doesn’t have enough text data to warrant text analytics. This is sad in cases where we also find out they run a considerable number of ad-hoc surveys and/or longitudinal trackers that go out to tens of thousands of customers, and they’ve purposefully decided to exclude open ends because they don’t want to deal with looking at them later. Human coding is a real pain: it takes a long time, it’s inaccurate, and it’s expensive, so I understand their sentiment.

But this is awful in my opinion. Even if you aren’t going to do anything with the data right now, an open ended question is really the only question every single customer who takes a survey is willing and able to answer. We usually convince them to start collecting them.

Lenny: Do you have any other advice about how to best work with open ends?


Tom: Well, we find that our clients who start using OdinText end up completely changing how they leverage open ends. Usually they get far wiser about their survey real estate and end up asking fewer closed-ended questions AND fewer open-ended questions. It’s like a light bulb goes on, and everything they learned about survey research is questioned.

Lenny: Thanks Tom. Well, I love what your firm is doing to help companies do some really interesting things that I don’t think could have been done with traditional research techniques.

Tom: Thanks for having me Lenny. I know a lot of our clients find your blog useful and interesting.

If any of your readers want a free expert opinion on whether or not text analytics makes sense for them, we’re happy to talk to them about it. Best way to do so is probably to hit the info request button on our site, but I always try my best to respond directly to anyone who reaches out to me personally on LinkedIn as well.

Lenny: Thanks Tom, always a pleasure to chat with you!

For readers interested in hearing more of Tom’s thoughts on Text Analytics in market research, here are two videos from IIeX Atlanta earlier this year that are chock full of good information:

Panel: The Great Methodology Debate: Which Approaches Really Deliver on Client Needs?

Discussing the Future of Text Analytics with Tom Anderson of Odin Text

Forget Big Data, Think Mid Data

Stop chasing Big Data; Mid Data makes more sense.

After attending the American Marketing Association’s first conference on Big Data this week, I’m even more convinced of what I already suspected from speaking to hundreds of Fortune 1000 marketers over the last couple of years: extremely few are working with anything approaching what would be called “Big Data,” and I believe they don’t need to. But many should start thinking about how to work with Mid Data!

[Chart: Small Data, Mid Data, Big Data]

“Big Data”, “Big Data”, “Big Data”. It seems like everyone is talking about it, but I find extremely few researchers are actually doing it. Should they be?

If you’re reading this, chances are that you’re a social scientist or business analyst working in consumer insights or a related area. I think it’s high time we narrowed the definition of “Big Data” a bit and introduced a new, more meaningful and realistic term, “Mid Data,” to describe what is really the beginning of Big Data.

If we introduce this new term, it only makes sense that we refer to everything that isn’t Big or Mid data as Small Data (I hope no one gets offended).

Small Data

I’ve included a chart, and for simplicity will think of size here as the number of records, or sample size if you prefer.

‘Small Data’ can include anything from a single individual interview in qualitative research to several thousand survey responses in longitudinal studies. At this size, quantitative and qualitative data can technically be lumped together, as neither fits the generally agreed upon (and admittedly loose) definition of what is currently “Big Data”. You see, rather than a specific size, the current definition of Big Data varies depending on the capabilities of the organization in question. The general rule is that Big Data is data which cannot be analyzed by commonly used software tools.

As you can imagine, this definition is an IT/hardware vendor’s dream, as it describes a situation where a firm does not have the resources to analyze (supposedly valuable) data without spending more on infrastructure, usually a lot more.

Mid Data

What then is Mid Data? At the beginning of Big Data, some of the same data sets we might call Small Data can quickly turn into Big Data. Take, for instance, the 30,000-50,000 records from a customer satisfaction survey, which can usually be analyzed in commonly available analytical software like IBM SPSS without crashing. Add text comments to this same data set, however, and performance slows considerably; the data will now often take too long to process or, more typically, crash the software.

If these text comments are also coded, as is the case in text mining, the additional variables can increase the size of the dataset significantly. This is currently viewed as Big Data, where more powerful software will be needed. However, I believe a more accurate description would be Mid Data, as it is really just the beginning of Big Data, and there are many relatively affordable approaches to dealing with data of this size. But more about this in a bit…
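A back-of-the-envelope calculation shows why coded text inflates a dataset so quickly. The record and variable counts below are hypothetical, chosen only to match the scale of the tracker example above:

```python
def dataset_cells(n_records, n_numeric_vars, n_text_codes=0):
    """Rough cell count for a flat survey dataset: coding each open end
    against a set of themes adds one indicator column per theme."""
    return n_records * (n_numeric_vars + n_text_codes)

# Hypothetical 40,000-record tracker with 200 closed-ended variables:
base = dataset_cells(40_000, 200)        # 8,000,000 cells
# Coding the open ends against 500 themes more than triples the table:
coded = dataset_cells(40_000, 200, 500)  # 28,000,000 cells
```

A table that fit comfortably in desktop statistical software can thus cross into crash territory purely through text coding, without a single new respondent.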

Big Data

Now that we’ve taken a chunk out of Big Data and called it Mid Data, let’s redefine Big Data, or at least agree on where Mid Data ends and when ‘Really Big Data’ begins.

To understand the differences between Mid Data and Big Data we need to consider a few dimensions. Gartner analyst Doug Laney famously described Big Data as three-dimensional, that is, having increasing volume, variety, and velocity (now commonly referred to as the 3V model).

To understand the difference between Mid Data and Big Data though, only two variables need to be considered, namely Cost and Value. Cost (whether in time or dollars) and expected value are of course what make up ROI. This could also be referred to as the practicality of Big Data Analytics.

While we often know that some data is inherently more valuable than other data (100 customer complaints emailed to your office should be more relevant than 1,000 random tweets about your category), one thing is certain: data that is not analyzed has absolutely no value.

To the far right of Mid Data lies Really Big Data: the point beyond which an investment in analysis no longer makes sense, because the cost (including the risk of not finding insights worth more than the dollars invested) outweighs the expected value. Somewhere after Mid Data, big data analytics becomes impractical, both theoretically and, for your firm, in very real economic terms.

Mid Data on the other hand then can be viewed as the Sweet Spot of Big Data analysis. That which may be currently possible, worthwhile and within budget.

So What?

Mid Data is where many of us in market research have a great opportunity. It is where very real and attainable insight gains await.

Really Big Data, on the other hand, may be well past a point of diminishing returns.

On a recent business trip to Germany I had the pleasure of meeting a scientist working on a real Big Data project: the famous Large Hadron Collider at CERN. Unlike CERN, consumer goods firms will not fund the software and hardware needed to analyze this level of Big Data. Data magnitudes common at the Collider (150 million sensors delivering data 40 million times per second) are not economically feasible to process, nor are they needed. In fact, scientists at CERN do not analyze this amount of Big Data. Instead, they filter out 99.999% of collisions, focusing on just 100 “collisions of interest” per second.

The good news for those of us in business is that, if we’re honest, customers really aren’t that difficult to understand. There are now many affordable and excellent Mid Data software packages available, for both data and text mining, that do not require exabytes of data or massively parallel software running on thousands of servers. While magazines and conference presenters like to reference Amazon, Google and Facebook, even these somewhat rare examples sound more like IT sales science fiction, and they fail to mention the sampling of data that occurs even at these companies.

As scientists at CERN have already discovered, it’s more important to properly analyze the fraction of the data that is important (“of interest”) than to process all the data.

At this point some of you may be wondering, well if Mid Data is more attractive than Big Data, then isn’t small data even better?

The difference of course is that as data increases in size we can not only be more confident in the results, but we can also find relationships and patterns that would not have surfaced in traditional small data. In marketing research this may mean the difference between discovering a new niche product opportunity or quickly countering a competitor’s move. In pharma, it may mean discovering a link between a smaller population subgroup and a certain high cancer risk, thus saving lives!

Mid Data could benefit from further definition and best practices. Ironically some C-Suite executives are currently asking their IT people to “connect and analyze all our data” (specifically the “varied” data in the 3-D model), and in the process they are attempting to create Really Big (often bigger than necessary) Data sets out of several Mid Data sets. This practice exemplifies the ROI problem I mentioned earlier. Chasing after a Big Data holy grail will not guarantee any significant advantage. Those of us who are skilled in the analysis of Small or Mid Data clearly understand that conducting the same analysis across varied data is typically fruitless.

It makes as much sense to compare apples to cows as accounting data to consumer respondent data. Comparing your customers in Japan to your customers in the US makes no sense for various reasons ranging from cultural differences to differences in very real tactical and operational options.

No, for most of us, Mid Data is where we need to be.

@TomHCAnderson

[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics which develops and sells patent pending data mining and text analytics software platform OdinText]