Posts tagged data mining
Let’s Connect at IIEX 2016!

OdinText Presentations at 2016 Insight Innovation Exchange

I’m looking forward to the Insight Innovation Exchange (IIEX) in Atlanta this coming week.

In just a few years it’s become one of the best marketing research trade events and probably my favorite when it comes to meeting those interested in Next Generation Market Research.


If you’re attending, please let me know. My colleague Sean Timmins and I would love to meet up briefly, say hello in person, hear what you’re working on and see whether OdinText might help you get to better insights faster.

[PSST If you would like to attend IIEX feel free to use our Speaker discount code ODINTEXT!]

There are so many cool sessions at the conference, and the venue and the neighborhood are great (love the Atlanta food options).  In case you are still considering which sessions to attend I’d love to invite you to our sessions:

1. Monday 2:00-3:00 pm / Making Data Science More Accessible

Monday, 2:00-3:00 pm, in the Grand Ballroom: please come support our mission of making data science more accessible in the Insight Innovation Competition. If you are at IIEX, this is THE session you don’t want to miss! [We blogged about this exciting session earlier here].

2. Tuesday 12:00-2:00 pm / Interactive Roundtable

Tuesday, 12:00-2:00 pm, also in the Grand Ballroom: I will be hosting an interactive roundtable on Text Analytics & Text Mining. Expect an informative and lively discussion on where and how this very powerful technology is best deployed now and how it will change the future of analytics. This affects everything from social media monitoring and survey data to email and call center log analysis and a whole lot more…

3. Tuesday 5:00 pm / Special Panel 

Tuesday, 5:00 pm: I will be joining Kerry Hecht Labsuirs, Director of Research Services at Recollective, and Jessica Broome, Research Guru at Jessica Broome Research, for a special investigation of survey panelists. The session is entitled Exploring the Participant Experience. (sneak peek here!)

OdinText was used to analyze the unstructured data from this research, and so I will help by reviewing some of those findings briefly. You can read about some of the initial results here on the blog. We plan to follow up with a second post after the conference.

Again, we really hope to see you at the conference. Please reach out ahead of time and let us know if you’ll be there so we can plan to grab a coffee. If you can’t make it to the event and any of the above interests you, let us know; I’d be happy to schedule a call.

See you in Atlanta!


Tom H.C. Anderson

@TomHCanderson @OdinText


To learn more about how OdinText can help you understand what really matters to your customers and predict actual behavior,  please contact us or request a Free Demo here >

[NOTE: Tom H. C. Anderson is Founder of Next Generation Text Analytics software firm OdinText Inc. Click here for more Text Analytics Tips ]

 

Code by Hand? The Benefits of Automated and User-Guided Automated Customer Comment Coding

Why you should not code text data by hand: Benefits of automated and user-guided automated coding

Text Analytics Tips by Gosia

Most researchers know very well that coding text data manually (using human coders who read the text and assign codes) is very expensive, both in the time coders need and in the money needed to compensate them for the effort.

However, the major advantage of human coders is their strong grasp of the complex meaning of text, including sarcasm and jokes.

Usually at least two coders are required to code any type of text data and the calculation of inter-rater reliability or inter-rater agreement is a must. This statistic enables us to see how similarly any number of coders has coded the data, i.e., how often they have agreed on using the exact same codes.
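As a rough illustration of the statistic described above, here is a minimal sketch for the two-coder case. It computes simple percent agreement and Cohen's kappa, which corrects that agreement for chance. The code labels and the six example comments are made up for illustration, and more than two coders would call for a different statistic.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items on which both coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: probability that both coders pick the same code
    # if each simply followed their own overall code frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same six comments.
coder_1 = ["price", "service", "price", "other", "service", "price"]
coder_2 = ["price", "service", "other", "other", "service", "service"]

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

In this toy case the coders agree on four of six comments, yet kappa is noticeably lower than the raw agreement because some of those matches would be expected by chance alone.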

Often even with the simplest codes the accuracy of human coding is low. No two human coders consistently code larger amounts of data the same way because of different interpretations of text or simply due to error. The latter is a reason why no single coder will code the same text data identically when done for the second time (perfect reliability for a single coder could be achieved in theory though, e.g., for very small datasets that can be proofread multiple times).

Another limitation is that human coders can only keep a limited number of codes in their working memory while reading the text. Finally, any change to the codes will require repeating the entire coding process from the beginning. Because manual coding of larger datasets is expensive and unreliable, automated coding using computer software was introduced.

Automated or algorithm-based text coding solves many of the issues of human coding:

  1. it is fast (thousands of text comments can be read in seconds)
  2. it is cost-effective (automated coding should always be cheaper than human coding as it requires much less time)
  3. it offers perfect consistency (the same rules are applied every time without errors)
  4. it allows an unlimited number of codes in theory (some software might have limitations)

However, this process also has disadvantages. As mentioned above, humans are the only ones who can fully understand the complex meaning of text, and simple algorithms are likely to fail at it (although some newer algorithms now under development can be almost as good as humans). Moreover, most software available on the market has low flexibility, as its codes cannot be seen or changed by the user.

Figure 1. Comparison of OdinText with “human coding” and “automated coding” approaches.

Therefore, OdinText developers decided to let users guide the automated coding. Users can view and edit the default codes and dictionaries, create and upload their own, or build custom dictionaries based on the exploratory results provided by the automated analysis. The codes can be very complex and specific, producing a good understanding of the meaning of text, which is the key goal of any text analytics software.
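To make the idea of user-editable dictionaries concrete, here is a minimal sketch of dictionary-based coding. It is not OdinText's implementation; the dictionary contents, the `code_comment` helper and the sample comments are all hypothetical. It shows why a change to the codes is cheap in an automated setting: the user edits the dictionary and simply reruns the coding.

```python
import re

# Hypothetical code dictionary: code name -> terms that trigger the code.
dictionary = {
    "price": {"price", "expensive", "cheap", "cost"},
    "service": {"service", "staff", "support"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def code_comment(text, dictionary):
    """Return the sorted list of codes whose terms appear in the comment."""
    tokens = set(tokenize(text))
    return sorted(code for code, terms in dictionary.items() if tokens & terms)


comments = [
    "Too expensive for what you get",
    "Friendly staff, but delivery was slow",
]

before = [code_comment(c, dictionary) for c in comments]

# The user adds a new code; recoding the whole dataset is just a rerun.
dictionary["delivery"] = {"delivery", "shipping", "late", "slow"}
after = [code_comment(c, dictionary) for c in comments]

print(before)
print(after)
```

The same rules are applied to every comment on every run, which is where the consistency of automated coding comes from.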

OdinText is a user-guided automated text analytics solution, which has aspects and benefits of both fully automated and human coding. It is fast, cost-effective, accurate, and allows for an unlimited number of codes like many other automated text analytics tools. However, OdinText surpasses the capabilities of other software by providing high flexibility and customization of codes/dictionaries and thus a better understanding of the meaning of text. Moreover, OdinText allows you to conduct statistical analyses and create visualizations of your data in the same software.

Try switching from human coding to user-guided automated coding and you will be pleasantly surprised how easy and powerful it is!

Gosia


[Gosia is a Data Scientist at OdinText Inc. Experienced in text mining and predictive analytics, she is a Ph.D. with extensive research experience in mass media’s influence on cognition, emotions, and behavior.  Please feel free to request additional information or an OdinText demo here.]

[NOTE: OdinText is NOT a tool for human assisted coding. It is a tool used by analysts for better and faster insights from mixed (structured and unstructured) data.]

How to Increase the Amount of Text Data for Analysis

How to Increase the Amount of Text Data for Analysis

Text Analytics Tips by Gosia

If you find yourself slightly disappointed by the quantity or quality of text comments provided by your respondents you are definitely not alone. This is a common problem especially when survey respondents are not compensated for their answers and when they are allowed to leave open-ended questions unanswered.

However, don’t give up and immediately start collecting more data or designing a new survey. Your current dataset may still contain valuable information in the form of text comments. A good practice is to pool together all text comments from a number of text variables in your dataset. You can select all of them or just a subset that makes the most sense to analyze together.


Figure 1. Pooling text data for a richer analysis.

In the attached figure, the bubble on the left represents probably the most frequently analyzed question in customer satisfaction surveys: the open-ended question following a key rating (e.g., Overall Satisfaction Rating or Net Promoter Score Rating). Most of these surveys will have one or more very good questions that can complement the answers given to the open-ended question on the left (see the remaining bubbles on the right of the figure). So why not analyze them all together? To do that, simply merge these text variables in your data editor, remembering to leave a blank space between the contents of the columns you are merging.
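If you prefer to do the merge in a script rather than a data editor, the step might look like the sketch below. The column names (`why_rating`, `improve`, `other`) and the sample rows are hypothetical; the only points it illustrates are joining the columns with a single space and skipping empty answers.

```python
def pool_text(row, text_columns):
    """Join the non-empty answers from the chosen columns with a space."""
    parts = [(row.get(col) or "").strip() for col in text_columns]
    return " ".join(p for p in parts if p)


# Hypothetical survey rows with three open-ended text variables.
rows = [
    {"id": 1, "why_rating": "Great support.", "improve": "", "other": "Lower the price"},
    {"id": 2, "why_rating": None, "improve": "Faster shipping", "other": None},
]
text_columns = ["why_rating", "improve", "other"]

for row in rows:
    row["pooled_text"] = pool_text(row, text_columns)

print([row["pooled_text"] for row in rows])
```

Skipping empty answers avoids stray double spaces, and the separating space keeps the last word of one answer from fusing with the first word of the next.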

Conclusion: Enriching your data can be simple and powerful.

This very simple pooling of text data from various open-ended questions will allow you to significantly enrich your analysis in OdinText.

Gosia

 


[NOTE: Gosia is a Data Scientist at OdinText Inc. Experienced in text mining and predictive analytics, she is a Ph.D. with extensive research experience in mass media’s influence on cognition, emotions, and behavior.  Please feel free to request additional information or an OdinText demo here.]

Text analysis answers: Is the Quran really more violent than the Bible? (3of3)

Text analysis answers: Is the Quran really more violent than the Bible? by Tom H. C. Anderson


Part III: The Verdict

To recap…

President Obama in his State of the Union last week urged Congress and Americans to “reject any politics that target people because of race or religion”—clearly a rebuke of presidential candidate Donald Trump’s call for a ban on Muslims entering the United States.

This exchange, if you will, reflects a deeper and more controversial debate that has wended its way into not only mainstream politics but the national discourse: Is there something inherently and uniquely violent about Islam as a religion?

It’s an unpleasant discussion at best; nonetheless, it is occurring in living rooms, coffee shops, places of worship and academic institutions across the country and elsewhere in the world.

Academics of many stripes have interrogated the texts of the great religions and no doubt we’ll see more such endeavors in the service of one side or the other in this debate moving forward.

We thought it would be an interesting exercise to subject the primary books of these religions—arguably the core of their philosophy and tenets—to comparison using the advanced data mining technology that Fortune 500 corporations, government agencies and other institutions routinely use to comb through large sets of unstructured text to identify patterns and uncover insights.

So, we’ve conducted a surface-level comparative analysis of the Quran and the Old and New Testaments using OdinText to uncover with as little bias as possible the extent to which any of these texts is qualitatively and/or quantitatively distinct from the others using metrics associated with violence, love and so on.

Again, some qualifiers…

First, I want to make very clear that we have not set out to prove or disprove that Islam is more violent than other religions.

Moreover, we realize that the Old and New Testaments and the Quran are neither the only literature in Islam, Christianity and Judaism, nor do they constitute the sum of these religions’ teachings and protocols.

I must also reemphasize that this analysis is superficial and the findings are by no means intended to be conclusive. Ours is a 30,000-ft, cursory view of three texts: the Quran and the Old and New Testaments, respectively.

Lastly, we recognize that this is a deeply sensitive topic and hope that no one is offended by this exercise.

 

Analysis Step: Similarities and Dissimilarities

Author’s note: For more details about the data sources and methodology, please see Part I of this series.

In Part II of the series, I shared the results of our initial text analysis for sentiment—positive and negative—and then broke that down further across eight primary human emotion categories: Joy, Anticipation, Anger, Disgust, Sadness, Surprise, Fear/Anxiety and Trust.

The analysis determined that of the three texts, the Old Testament was the “angriest,” which obviously does not appear to support an argument that the Quran is an especially violent text relative to the others.

The next step, again staying at a very high level, was to look at the terms frequently mentioned in the texts to see what, if anything, these three texts share and where they differ.

Similarity Plot

Text Analytics Similarity Plot 2

This is yet another iterative way to explore the data from a Bottom-Up data-driven approach and identify key areas for more in-depth text analysis.

For instance—and not surprisingly—“Jesus” is the most unique and frequently mentioned term in the New Testament, and when he is mentioned, he is mentioned positively (color coding represents sentiment).

“Jesus” is also mentioned a few times in the Quran, and, for obvious reasons, not mentioned at all in the Old Testament. But when “Jesus” is mentioned in the New Testament, terms that are more common in the Old Testament—such as “God” and “Lord”—often appear with his name; therefore the placement of “Jesus” on the map above, though definitely most closely associated with the New Testament, is still more closely related to the Old Testament than the Quran because these terms appear more often in the former.

Similarly, it may be surprising to some that “Israel” is mentioned more often in the Quran than the New Testament, and so the Quran and the Old Testament are more textually similar in this respect.

So…Is the Quran really more violent than the Old and New Testaments?

Old Testament is Most Violent

A look into the verbatim text suggests that the content in the Quran is not more violent than its Judeo-Christian counterparts. In fact, of the three texts, the content in the Old Testament appears to be the most violent.

Killing and destruction are referenced slightly more often in the New Testament than in the Quran (2.8% vs. 2.1%), but the Old Testament clearly leads—more than twice that of the Quran—in mentions of destruction and killing (5.3%).
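For readers curious how a figure of this kind can be produced, here is a minimal sketch. It assumes the percentage is the share of verses that mention at least one term from a concept dictionary; the post does not spell out the exact unit of analysis, so treat that as an assumption. The term list and the toy sentences are invented and are not taken from any of the three texts.

```python
import re


def concept_share(verses, terms):
    """Percentage of verses containing at least one of the terms."""
    if not verses:
        return 0.0
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    hits = sum(1 for verse in verses if pattern.search(verse))
    return 100.0 * hits / len(verses)


# Hypothetical concept dictionary for killing and destruction.
violence_terms = ["kill", "killed", "destroy", "destroyed"]

# Invented toy sentences standing in for the verses of two sources.
sources = {
    "text_a": [
        "They shall destroy the city",
        "He gave them bread",
        "Peace be upon the house",
        "The king will kill his enemy",
    ],
    "text_b": [
        "Love your neighbour",
        "The walls were destroyed",
        "Give thanks",
        "Sing a new song",
        "Rest in the evening",
    ],
}

shares = {name: concept_share(verses, violence_terms) for name, verses in sources.items()}
print(shares)
```

Normalizing by the number of verses matters here because the sources differ greatly in size, so raw counts would not be comparable.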

New Testament Highest in ‘Love’, Quran Highest in ‘Mercy’

The concept of ‘Love’ is more often mentioned in the New Testament (3.0%) than either the Old Testament (1.9%) or the Quran (1.26%).

But the concept of ‘Forgiveness/Grace’ actually occurs more often in the Quran (6.3%) than the New Testament (2.9%) or the Old Testament (0.7%). This is partly because references to “Allah” in the Quran are frequently accompanied by “The Merciful.” Some might dismiss this as a tag or title, but we believe it’s meaningful because mercy was chosen above other attributes like “Almighty” that are arguably more closely associated with deities.

Text Analytics Plot 3

‘Belief/ Faith’, ‘Non-Members’ and ‘Enemies’

A key difference emerged immediately among the three texts around the concept of ‘Faith/Belief’.

Here the Quran leads with references to ‘believing’ (7.6%), followed by the New Testament (4.8%) and the Old Testament a distant third (0.2%).

Taken a step further, OdinText uncovered what appears to be a significant difference with regard to the extent to which the texts distinguish between ‘members’ and ‘non-members’.

Both the Old and New Testaments use the term “gentile” to signify those who are not Jewish, but the Quran is somewhat distinct in referencing the concept of the ‘Unbeliever’ (e.g., “disbelievers,” “disbelieve,” “unbeliever,” “rejectors,” etc.).

And in two instances, the ‘Unbeliever’ is mentioned together with the term “enemy”:

“And when you journey in the earth, there is no blame on you if you shorten the prayer, if you fear that those who disbelieve will give you trouble. Surely the disbelievers are an open enemy to you.”

 An-Nisa 4:101

“If they overcome you, they will be your enemies, and will stretch forth their hands and their tongues towards you with evil, and they desire that you may disbelieve.”

Al-Mumtahina 60:2

That said, the concept of “Enemies” actually appears most often in the Old Testament (1.8%).

And while the concept of “Enemies” occurs more often in the Quran than in the New Testament (0.7% vs. 0.5%, respectively), there is extremely little difference in how they are discussed (i.e., who they are and how to deal with them) with one exception: the Quran is slightly more likely than the New Testament to mention “the Devil” or “evil” as being an enemy (0.2% vs. 0.1%).

Conclusion

While A LOT MORE can be done with text analytics than what we’ve accomplished here, it appears safe to conclude that some commonly-held assumptions about and perceptions of these texts may not necessarily hold true.

Those who have not read or are not fairly familiar with the content of all three texts may be surprised to learn that no, the Quran is not really more violent than its Judeo-Christian counterparts.

Personally, I’ll admit that I was a bit surprised that the concept of ‘Mercy’ was most prevalent in the Quran; I expected that the New Testament would rank highest there, as it did in the concept of ‘Love’.

Overall, the three texts rated similarly in terms of positive and negative sentiment, as well, but from an emotional read, the Quran and the New Testament also appear more similar to one another than either of them is to the significantly “angrier” Old Testament.

Of course, we’ve only scratched the surface here. A deep analysis of unstructured data of this complexity requires contextual knowledge, and, of course, some higher level judgment and interpretation.

That being said, I think this exercise demonstrates how advanced text analytics and data mining technology may be applied to answer questions or make inquiries objectively and consistently outside of the sphere of conventional business intelligence for which our clients rely on OdinText.

I hope you found this project as interesting as I did and I welcome your thoughts.

Yours fondly,

Tom @OdinText


 

Text analysis answers: Is the Quran really more violent than the Bible? (Part 2 of 3)

Text analysis answers: Is the Quran really more violent than the Bible? (Part 2 of 3) by Tom H. C. Anderson

Part II: Emotional Analysis Reveals Bible is “Angriest”

In my previous post, I discussed our potentially hazardous plan to perform a comparative analysis using an advanced data mining platform—OdinText—across three of the most important texts in human history: The Old Testament, The New Testament and the Quran.

Author’s note: For more details about the data sources and methodology, please see Part I of this series.

The project was inspired by the ongoing public debate around whether or not terrorism connected with Islamic fundamentalism reflects something inherently and distinctly violent about Islam compared to other major religions.

Before sharing the first set of results with you here today, due to the sensitive nature of this topic, I feel obliged to reiterate that this analysis represents only a cursory, superficial view of just the texts, themselves. It is in no way intended to advance any agenda or to conclusively prove anyone’s point.

Step 1: Sentiment Analysis

We started with a high-level look at Sentiment—positive and negative—and overall results were fairly similar: approx. 30% positive and 20% negative sentiment for each of the three texts. The Old Testament looked to have slightly more negative sentiment than either the New Testament or the Quran, but let’s come back to that later in more detail…

Staying at a high level, I was curious to see what the longitudinal pattern looked like across each of the three texts. Looking for any positive emotion in the texts from beginning to end allows us to get a sense how they progress longitudinally. (See figure 1)

Author’s note: Unlike the books of the Old and New Testaments, the chapters (suras) of the Quran are arranged roughly in order of decreasing length and not in chronological order.
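A longitudinal view like the one described above can be approximated by cutting a text into consecutive segments and computing the share of units in each segment that contain any positive term. The sketch below assumes exactly that; the five-word lexicon and the toy sentences are hypothetical, and a real sentiment model would be far richer.

```python
import re

# Hypothetical, deliberately tiny positive-sentiment lexicon.
POSITIVE_TERMS = {"joy", "love", "praise", "blessed", "hope"}


def has_positive(text):
    """True if the text contains at least one positive term."""
    return bool(set(re.findall(r"[a-z]+", text.lower())) & POSITIVE_TERMS)


def positive_share_by_segment(units, n_segments):
    """Split ordered units into consecutive segments and return the
    share of units with any positive term in each segment."""
    if n_segments < 1 or n_segments > len(units):
        raise ValueError("n_segments must be between 1 and len(units)")
    shares = []
    for i in range(n_segments):
        start = i * len(units) // n_segments
        end = (i + 1) * len(units) // n_segments
        segment = units[start:end]
        shares.append(sum(has_positive(u) for u in segment) / len(segment))
    return shares


# Invented sentences in reading order, standing in for verses.
units = [
    "In the beginning there was darkness",
    "They wandered without rest",
    "Then hope returned to the people",
    "They sang with joy and praise",
    "The storm came upon the city",
    "Blessed are those who wait",
]

shares = positive_share_by_segment(units, 3)
print(shares)
```

Plotting these shares in segment order gives a curve from the beginning of a text to its end, in the order the text is arranged.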

Any Positive Sentiment

Sentiment Analysis 1

Sentiment Analysis 2

Sentiment Analysis 3

While there is some fluctuation throughout each in terms of positive sentiment, the New Testament appears to be unique in that it peaks on positive sentiment (Corinthians) and ends on a less positive note (Revelation).

It’s also worth noting that positive and negative sentiment are usually highly correlated. In other words when there is more emotion in text, usually, though not always, there is both more positive and negative sentiment.

But let’s look deeper into emotions, beyond simple positive vs. negative sentiment (which is rarely very interesting) and into the eight major human emotion categories: Joy, Anticipation, Anger, Disgust, Sadness, Surprise, Fear/Anxiety and Trust.

Author’s note: These eight major emotion categories were derived from widely-accepted theory in modern psychology.

Step 2: Emotional Analysis

A look at the combined Old and New Testaments—the Bible—compared to the Quran reveals similarities and differences. The Bible and Quran are fairly uniform in ‘Surprise’, ‘Sadness’ and ‘Disgust’. But the Bible registers higher in ‘Anger’ and the Quran rates higher in ‘Joy’ but also in ‘Fear/Anxiety’ and ‘Trust’.

Sentiment Analysis Bible Quran

As we mentioned yesterday, we decided to split the Old and New Testaments for analysis for a couple of reasons. Here’s what they look like:

Sentiment Analysis 5

Comparing our three religious texts across the eight major emotions we find that the Old Testament is the ‘Angriest’ (including most mentions of ‘Disgust’); it also contains the least amount of ‘Joy’.

Here’s an example of a passage that registered under ‘Anger’:

But the LORD said to him "Not so; if anyone kills Cain he will suffer vengeance seven times over." Then the LORD put a mark on Cain so that no one who found him would kill him.

Genesis 4:15

In text analytics, ‘Disgust’ rarely appears outside of food categories; however, it appears in Leviticus several times:

…whether among all the swarming things or among all the other living creatures in the water—you are to detest.

And since you are to detest them, you must not eat their meat and you must detest their carcasses.

Anything living in the water that does not have fins and scales is to be detestable to you.

'These are the birds you are to detest and not eat because they are detestable: the eagle the vulture the black vulture

Leviticus 11:10-13

The Quran, on the other hand, contains the most ‘Fear/Anxiety’ and ‘Trust/Belief’ issues. In this case ‘Fear/Anxiety’ is highly linked to ‘Trust’. Terms such as “doubt” and “disbelief” appear repeatedly in the Quran and are relevant to and affect both of these two primary emotions.

Or like abundant rain from the cloud in which is darkness, and thunder and lightning; they put their fingers into their ears because of the thunder-peal, for fear of death. And Allah encompasses the disbelievers.

Quran, Sūrat al-Baqarah 2:19

As noted in figure 2 above, the New Testament has relatively more ‘Anticipation’ and ‘Surprise’:

But if we hope for what we do not yet have we wait for it patiently.

Romans 8:25

Everyone was amazed and gave praise to God. They were filled with awe and said, “We have seen remarkable things today.”

Luke 5:26

Tomorrow in Part 3, we’ll take a deeper dive to understand some of the underlying reasons for these differences in greater detail and we’ll look into which, if any, of these texts is significantly more violent. Stay tuned!

Up Next: Part III – Violence, Mercy and Non-Believers

Text analysis answers: Is the Quran really more violent than the Bible?

Text Analytics Tips: Is the Quran really more violent than the Bible? by Tom H. C. Anderson

Part I: The Project

With the proliferation of terrorism connected to Islamic fundamentalism in the late-20th and early 21st centuries, the question of whether or not there is something inherently violent about Islam has become the subject of intense and widespread debate.

Even before 9/11, notably with the publication of Samuel P. Huntington’s “Clash of Civilizations” in 1996, pundits have argued that Islam incites followers to violence on a level that sets it apart from the world’s other major religions.

The November 2015 Paris attacks and the politicking of a U.S. presidential election year (particularly candidate Donald Trump’s call for a ban on Muslims entering the country and President Obama’s response in the State of the Union address last week) have reanimated the dispute in the mainstream media, and proponents and detractors alike have marshalled “experts” to validate their positions.

To understand a religion, it’s only logical to begin by examining its literature. And indeed, extensive studies in a variety of academic disciplines are routinely conducted to scrutinize and compare the texts of the world’s great religions.

We thought it would be interesting to bring to bear the sophisticated data mining technology available today through natural language processing and unstructured text analytics to objectively assess the content of these books at the surface level.

So, we’ve conducted a shallow but wide comparative analysis using OdinText to determine with as little bias as possible whether the Quran is really more violent than its Judeo-Christian counterparts.

A few words of caution…

Due to the sensitive nature of this subject, I must emphasize that this analysis is by no means exhaustive, nor is it intended to advance any agenda or to conclusively prove anyone’s point.

The topic and data sources selected for this project constitute a significant departure from the consumer intelligence use cases for which clients typically turn to text analytics, so we thought this would be an interesting opportunity to demonstrate how this tool can be much more broadly applied to address questions and issues outside the realm of market research and business intelligence.

Again, this is only a cursory analysis. I believe there is more than one Ph.D. thesis awaiting students of theology, literature or political science who want to take a much deeper dive into this data.

About the “Data” Sources

First off, it seemed sensible and appropriate to analyze the Old and New Testaments separately. (The Jewish Torah makes up the first five books of the Christian Old Testament, of course, while the New Testament is unique to Christianity.)

We decided to split them for analysis for a couple of reasons: 1) They were written hundreds of years apart and 2) their combined size relative to the Quran.

Though all data (Old Testament, New Testament and Quran) were combined and read into OdinText as a single file, the Old Testament is the largest with over 23K verses and about 623K words, followed by the New Testament with just under 8K verses and 185K words, and then the Quran with just over 6K verses and less than 78K words.

Secondly, there are obviously multiple versions and translations of the texts available for study. We’ve selected the ones that were most accessible and best suited for this kind of analysis.

With regard to the Christian Bible, instead of the King James version, we opted to use the New International Version (NIV) because the somewhat updated language should be easier to work with.

In selecting an English translation of the Quran, we considered the Tafsir-ul-Quran (1957) by the Indian scholar Abdul Majid Daryabadi, but decided to go with The Holy Qur'an (1917, 4th rev. ed. 1951) by Maulana Muhammad Ali because this version is more widely used and the data are more easily accessed.

We do not believe the text in either of these choices to differ materially.

Approach: A ‘Top-Down/Bottom-Up’ Inquiry

We recommend and OdinText employs a  ‘Top-Down/Bottom-Up’ approach to text analysis.

This means that identification of issues for investigation will be partly a priori or ‘Top-Down’ (i.e. the analyst determines specific topic areas to explore such as “violence”).

But there will also be a data-driven or ‘Bottom-Up’ aspect in which the software helps to identify topics or areas that may not have occurred to the analyst, but which could be important given the data.

For example…

OdinText looks for sentiments and emotions in the data as soon as it has been uploaded to our servers; however, as this particular data set is rather unique, certain custom dictionary definitions—what we refer to as “issues”—will also need to be created through the Top-Down/Bottom-Up approach.

One simple and unbiased way to do this is to allow the process by which these definitions are created to be as data-driven as possible. There are several ways to look to the data for information. For instance, we might start by looking at the top words mentioned in each source to understand what concepts cut across our data, and how they might be defined. (See figure 1)

3WayTextAnalyticsComparison
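As a rough sketch of this bottom-up first step, the snippet below counts the most frequent non-stopword terms per source. The stopword list and the two sample strings are hypothetical placeholders, and this simple counter is only a stand-in for whatever OdinText does internally.

```python
import re
from collections import Counter

# Hypothetical, very short stopword list for illustration.
STOPWORDS = {"the", "and", "of", "to", "in", "a", "is", "his", "for"}


def top_words(text, n=3):
    """Return the n most frequent non-stopword terms with their counts."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return Counter(words).most_common(n)


# Invented sample strings standing in for two sources.
sources = {
    "source_a": "the lord spoke and the lord gave; the people heard the lord",
    "source_b": "god is merciful and god is forgiving to the believers",
}

top_by_source = {name: top_words(text, n=1) for name, text in sources.items()}
print(top_by_source)
```

Comparing the top terms side by side is what suggests which source-specific words (here, two different names) might belong to one shared concept.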

In this way, an overarching concept for comparison in each of the three sources can then be developed. For instance, a concept like “God” would need to include all common terms for this concept in each text source.

We can name such a concept something like “God All Inclusive,” allowing all common definitions/terms for God in each of the texts to be picked up under this concept.

Accordingly, “God All Inclusive” would include any mention of “Lord” (28%) or “God” (11%) in the Old Testament, as well as any mentions of “Jesus” (17%), “God” (16%), “Lord” (8%) or “Christ” (7%) in the New Testament, and any mentions of “Allah” (30%) or “Lord” (14%) in the Quran.
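One way to picture such an all-inclusive concept is as a single set of terms applied identically to every source. The sketch below reports the share of units mentioning each individual term and the share mentioning any term in the concept. The `term_breakdown` helper and the toy sentences are hypothetical, and the percentages in the paragraph above are not reproduced here.

```python
import re

# The concept groups source-specific names under one label.
GOD_ALL_INCLUSIVE = {"lord", "god", "jesus", "christ", "allah"}


def term_breakdown(units, terms):
    """Share of units mentioning each term, plus the share mentioning any."""
    n = len(units)
    token_sets = [set(re.findall(r"[a-z]+", u.lower())) for u in units]
    per_term = {t: sum(t in tokens for tokens in token_sets) / n for t in sorted(terms)}
    any_term = sum(bool(tokens & terms) for tokens in token_sets) / n
    return per_term, any_term


# Invented sentences standing in for verses from mixed sources.
units = [
    "The Lord spoke to the people",
    "God and the Lord were praised",
    "They built a city",
    "Allah is merciful",
]

per_term, any_term = term_breakdown(units, GOD_ALL_INCLUSIVE)
print(per_term)
print(any_term)
```

Note that the concept-level share is not the sum of the per-term shares, because a single unit can mention more than one term.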

As mentioned earlier, in order to keep this analysis as unbiased as possible (and in order to do it as quickly as possible), we will also rely on OdinText’s built in functionality to understand broader concepts such as positive and negative sentiment as well as other psychological constructs and emotion in text.  In other words, when we look at positive and negative emotion we will be using this broad-based metric across the three texts without any customization at all.

Now that I’ve laid the groundwork for this project, please join me tomorrow as we take a look at the initial results!

P.S. Considering many people take at least a year to read just one of these texts, you may find it interesting that it took OdinText less than 120 seconds to read, parse and analyze all three texts at once!

 

Up Next: Part II – One of these texts is angrier!

 

Text Analytics Tips

Text Analytics Tips, with your Hosts Tom & Gosia: Introductory Post

Today, we’re blogging to let you know about a new series of posts starting in January 2016 called ‘Text Analytics Tips’. This will be an ongoing series and our main goal is to help marketers understand text analytics better.

We realize Text Analytics is a subject with incredibly high awareness, yet sadly also a subject with many misconceptions.

The first generation of text analytics vendors overhyped the importance of sentiment as a tool, as well as ‘social media’ as a data source, often preferring to use the even vaguer term ‘Big Data’ (usually just referring to tweets). They offered no evidence of the value of either, and have usually ignored the much richer techniques and sources of data for text analysis. Little to no information or training is offered on how to actually gain useful insights via text analytics.

What are some of the biggest misconceptions in text analytics?

  1. “Text Analytics is Qualitative Research”

FALSE – Text Analytics IS NOT qualitative. Text Analytics = Text Mining = Data Mining = Pattern Recognition = Math/Stats/Quant Research

  2. “It’s Automatic (artificial intelligence); you just press a button and look at the report / wordcloud”

FALSE – Text Analytics is a powerful technique made possible thanks to tremendous processing power. It can be easy if you use the right tool, but just like any other powerful analytical tool, it is limited by the quality of your data and the resourcefulness and skill of the analyst.

  3. “Text Analytics is a Luxury (i.e. structured data analysis is of primary importance and unstructured data is an extra)”

FALSE – Nothing could be further from the truth. In our experience, when text data is available, it almost always outperforms the standard quant data available in terms of explaining and/or predicting the outcome of interest!

There are several other text analytics misconceptions of course and we hope to cover many of them as well.

While various OdinText employees and clients may be posting in the ‘Text Analytics Tips’ series over time, Senior Data Scientist, Gosia, and our Founder, Tom, have volunteered to post on a more regular basis…well, not so much volunteered as drawing the shortest straw (our developers made it clear that “Engineers don’t do blog posts!”).

Kidding aside, we really value education at OdinText, and it is our goal to make sure OdinText users become proficient in text analytics.

Though Text Analytics, and OdinText in particular, are very powerful tools, we will aim to keep these posts light, fun yet interesting and insightful. If you’ve just started using OdinText or are interested in applied text analytics in general, these posts are certainly a good start for you.

During this long-running series we’ll be posting tips, interviews, and various fun short analyses. Please come back in January for our first post, which will deal with the analysis of a very simple unstructured survey question.

Of course, if you’re interested in more info on OdinText, no need to wait, just fill out our short Request Info form.

Happy New Year!

Your friends @OdinText


[NOTE: Tom is Founder and CEO of OdinText Inc. A longtime champion of text mining, in 2005 he founded Anderson Analytics LLC, the first consumer insights/marketing research consultancy focused on text analytics. He is a frequent speaker and data science guest lecturer at university and research industry events.

Gosia is a Senior Data Scientist at OdinText Inc. A PhD with extensive experience in content analytics, especially psychological content analysis (i.e. sentiment analysis and emotion in text), as well as predictive analytics using unstructured data, she is fluent in German, Polish and Spanish.]

 

OdinText Wins 2015 CASRO Research Award

CASRO Honors OdinText’s Innovative Next Generation Text Analytics Software at 40th Annual Conference

OdinText, a provider of cloud-based analytics software, today announced that its Next Generation Text Analytics software-as-a-service (SaaS) product has been awarded the Research Entrepreneur of the Year award by CASRO, an organization that represents more than 300 companies and market research operations.

The award honors organizations that—through the excellence of their work, professionalism of their practice, and integrity of their conduct— exemplify the best work in the research industry. The award also acknowledges an organization that has introduced a new direction or service to its research business portfolio and provides leading-edge and innovative services that expand traditional market, opinion, and social research.

Recognized for its patented SaaS technology, OdinText allows companies to analyze large amounts of unstructured and mixed data. OdinText can be used across various types of data including but not limited to survey research, email and telephone data, discussion board ratings, and news articles.

“At OdinText, we don’t see a difference between structured and unstructured data - text mining and data mining – they are far more meaningful together,” said Tom H. C. Anderson, CEO of OdinText. “We are honored to be recognized by CASRO, an organization that has such a long history of championing innovative and sound research techniques.”

In addition to exploring patterns in the data and allowing users to confirm hypotheses, OdinText suggests key relationships in the data that may be overlooked by the user. The software also allows for one-step simulation and predictive analytics.

“Marketing research is evolving, getting both broader and deeper in terms of skill sets needed to succeed,” said Jim DeMarco, vice president of business intelligence and analytics at FreshDirect. “OdinText provides researchers with the capability to access more advanced analysis quicker and helps the business they work on gain an information advantage. This is exactly the kind of innovation our industry needs right now.”

The Coca-Cola Company and online grocer FreshDirect sponsored OdinText’s nomination, and the company received the award, along with the $5,000 prize, at CASRO’s 40th Annual Conference.

“The work of OdinText is indicative of the exciting new methodologies and technologies which are having an increased influence on our changing industry,” said Diane Bowers, president of CASRO. “Acknowledgement of this type of work and the financial support that accompanied this honor highlights our role as a leader in the future of our industry.”

 

About OdinText Inc.

OdinText’s Next Generation Text Analytics™ turns market researchers into data scientists. The powerful cloud-based software helps users discover patterns and trends in complex unstructured text data. Visit www.odintext.com to learn more or schedule a demo. Backed by Connecticut Innovations and private investors, OdinText is a privately-held company based in Stamford, Conn. Request more information here.

When Text Analytics is Your Brand

What I learned about personal branding at IIEX

I’m just back from Insight Innovation Exchange (IIEX) in Atlanta this week and thought I’d blog briefly about the two panel sessions on Personal/Digital Branding in which I participated.

 

Text Analytics

My main reason for attending IIEX was actually to give a brief presentation on the dramatic improvements we've made to our OdinText text analytics software, and how it brings value to untapped consumer text data (open-ends, NPS reasons, customer feedback, website comments, etc.), and how it can really turn any market research analyst into a powerful Data Scientist. Because of IIeX’s stellar reputation, this was the first time we’ve ever given any kind of demo of OdinText in public. Usually our presentations are approved case studies about how our clients like Coca-Cola, Disney, Shell Oil, etc. are using the tool. Also, as text analytics remains a very competitive field, we prefer to share details around the software with those we know have the kind of data where OdinText can be useful.

 

However, we are launching a new version of OdinText, and Lenny Murphy assured me that, contrary to what I believed, most attendees actually want to see software demos rather than just hear use cases. In case you missed it, I've posted a brief teaser video below, along with a shameless plug before I go on. If you regularly collect comment-type text data, we’d love to hear from you and get you more info about OdinText (Request Info Here). Shameless ad plug over.


 

 

Personal Branding

Other than showing off OdinText though, I was also honored to be asked to sit on a personal branding panel with prolific market research tweeters Tom Ewing and Annie Pettit, as well as Dave McCaughan who is a well-known name in East Asian and Australian market research circles.

On the Summer Friday (at 5:30pm no less) before our Monday morning session, Annie Pettit came up with the idea to field an impromptu convenience sample survey, and to my surprise by Sunday afternoon we already had about 150 comments relating to the panelists. Lenny Murphy who has also accumulated a loyal #MRX following on Twitter and on the Greenbook blog was also included in the survey which asked something like “Q. What three things first come to mind when you hear each of these names/personal brands?”.

Though this sample is a bit on the small side for OdinText I quickly visualized the comments to give us some idea of how similar/different each of these 5 ‘brands’ are and what specific topics most frequently co-occur with each of them.
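For readers curious about the mechanics, here is a minimal sketch in Python of one way to count which terms co-occur with each ‘brand’ and how similar two brands' vocabularies are. It is not the OdinText method. The stopword list, the function names, the use of Jaccard overlap as the similarity measure, and the sample comments and panelist labels are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "and", "the", "of", "to", "in"}


def tokenize(text):
    """Lowercase the comment and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def cooccurrence_by_brand(responses):
    """Count, per brand, how many comments mention each term.

    `responses` is an iterable of (brand, comment) pairs.
    """
    counts = defaultdict(Counter)
    for brand, comment in responses:
        # set() so a term is counted once per comment, not once per repetition.
        counts[brand].update(set(tokenize(comment)))
    return counts


def jaccard(terms_a, terms_b):
    """Overlap between two brands' vocabularies, from 0.0 (none) to 1.0 (identical)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical comments and panelist labels, for illustration only.
responses = [
    ("Panelist A", "text analytics"),
    ("Panelist A", "text analytics pro"),
    ("Panelist B", "social media junkie"),
    ("Panelist B", "analytics geek"),
]
counts = cooccurrence_by_brand(responses)
print(counts["Panelist A"].most_common(2))
print(jaccard(counts["Panelist A"], counts["Panelist B"]))
```

With only a few comments per brand, the counts are tiny and the similarity score swings a lot from one extra response, which is why a sample of about 150 comments is on the small side for this kind of analysis.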

[Image: How similar are they? IIEX OdinText visualization]

I’m sure all of us were equally interested in the findings, because let’s face it, while EVERYONE has a personal brand (even if unfortunately not everyone recognizes it), few of us ever get an insight into what it really means to people in this unaided top-of-mind market research sort of way.

We agreed not to share any of each other’s raw data, but I’m fine sharing the first 40 responses I received (good, bad and ugly) below, sorted alphabetically:

American linked in conversationalist

Analytical, ever-present, helpful analytics omnipresent

analytics geek

Arrogant

beard, omnipresence and self publicity

Clever

controversial

Cool Guy

Entreprenuer

Fun honest text analytics

Hans Christian Anderson

He's all about new, cool & hip in the quant world

His banner ads pursue me remorselessly around the web marketing

inconsiderate

Innovative

know his name but can't recall...

Lover of anything that reminds him of the Swedish socialist utopia

money-face

MR. NGMR!

next gen guy

odin text - text pro

OdinText Text Analytics, smart, trustworthy

Opinionated

rebel

Research

respected, helpful, innovative smart

Self promoter

Social media junkie

straight shooter. willing to challenge hyped claims. maybe falling too in love with his own methodology

text analysis

text analytics

text analytics odintext

Text analytics pro

Text Analytics, expert, outspoken, industry leader,

text analytics, NGMR, vikings

Text master, text Analytics

The first to advocate Next Gen Market Research, especially Text Analytics and Data Mining,

The first market researcher to truly understand social, AND bold enough to stand up against trade orgs on behalf of mid-small research firms. A true research hero

Tom is a great example of focusing on one thing you really care about and want to make better, and then actually doing that..

Tweeted this survey


The first thing that struck me, looking at the responses for my ‘brand’ as well as those of the others on the panel, was that the negative comments, while few overall, were also rather consistent proportionately across all of us.

I think this may have come as a surprise to some of the others, but I expected a few negative remarks related to some of the positions I’ve taken about market research. While I believe the majority of US researchers agree with me, my positions weren’t as welcome to an outspoken few researchers more closely associated with, or working for, these trade organizations. So the question is, as it relates to our personal brands, should we shy away from controversy (as long as it’s not personal or destructive in nature)? My answer is no. I don’t think it’s hurt my brand at all; controversy often leads to change, and usually change for the better. I'm happy to be associated with these issues, and do not fear ruffling feathers.

Of greater importance, and more surprising to me, was that our company brands were almost never mentioned for any of us. I’ve been concerned whether my comments related to other areas of consumer insights research have taken away from what I really want to be known for, OdinText and Text Analytics. The good news was that when market researchers who know me think of me, they think “Text Analytics”. The bad news was that few mention the brand OdinText. But how bad is this really?

A few months ago I wrote about personal branding and Kristin Luck (someone else who I definitely think should have been on the panel). You can read that piece here. I think the main point, however, is that personal brands undoubtedly create a different and more complex association network in the minds of people than corporate brands or logos do.

 

This can’t be a bad thing; I believe they are complementary. If people think Tom H. C. Anderson = Text Analytics, they also are likely to think Text Analytics = Tom H. C. Anderson, and so when they have a need for text analytics, some will think of me, and then OdinText (even if the brand OdinText doesn’t first come to mind).

I’m not sure what the association network is for uber personal brands like Bill Gates or the late Steve Jobs, but I would venture to guess it’s similar. Surprisingly perhaps, Microsoft and Apple may well not be the first thing that comes to mind when someone first thinks about these two individual brands. Both really are far more complex than either of the company brands Microsoft and Apple. The individuals stand for so much more (philanthropy, design, success, strength, perseverance, intelligence, innovation…).

Definitely an interesting area, and one that could use more research, aided by text analytics of course, and OdinText ideally.

My takeaway and advice to other market researchers is that personal branding is a good thing. It’s a complex thing, and that’s a good thing. Unlike a simple company product or logo, we as people are deeper and have the ability to encompass far more dimensions. I believe these personal brands, as I know from experience is the case for both myself and Kristin Luck, have been very beneficial to the companies associated with us. It’s a truism that this is a people business, and people buy from people.

I encourage everyone to give some thought to their personal brands. Unlike corporate brands they don’t have to be perfect. If they were, they would be very boring and one dimensional. Just be you – and let others know it!

Tom

 

[Tom H. C. Anderson is Founder & CEO of Text Analytics SaaS firm OdinText (www.OdinText.com). He tweets under @TomHCAnderson, blogs at www.tomhhcanderson.com and manages one of the largest and most engaged market research groups on LinkedIn, Next Gen Market Research.]

Text Analytics for 2015 – Are You Ready?

OdinText SaaS Founder Tom H. C. Anderson is on a mission to educate market researchers about text analytics  [Interview Reposted from Greenbook]

Judging from the growth of interest in text analytics tracked in GRIT each year, those not using text analytics in market research will soon be a minority. But still, is text analytics for everyone?

Today on the blog I’m very pleased to be talking to text analytics pioneer Tom Anderson, the Founder and CEO of Anderson Analytics, which develops one of the leading Text Analytics software platforms designed specifically for the market research field, OdinText.

Tom’s firm was one of the first to leverage text analytics in the consumer insights industry, and they have remained a leader in the space, presenting case studies at a variety of events every year on how companies like Disney and Shell Oil are leveraging text analytics to produce remarkably impactful insights.

Lenny: Tom, thanks for taking the time to chat. Let’s dive right in! I think that you, probably more so than anyone else in the MR space, have witnessed the tremendous growth of text analytics within the past few years. It’s an area we post about often here on GreenBook Blog, and of course track via GRIT, but I wonder, is it really the panacea some would have us believe?

Tom: Depends on what you mean by panacea. If you think about it as a solution to dealing with one of the most important types of data we collect, then yes, it can and should be viewed exactly that way. On the other hand, it can only be as meaningful and powerful as the data you have available to use it on.

Lenny: Interesting, so I think what you’re saying is that it depends on what kind of data you have. What kind of data then is most useful, and which is not at all useful?

Tom: It’s hard to give a one-size-fits-all rule. I’m most often asked about size of data. We have clients who use OdinText to analyze millions of records across multiple languages; on the other hand, we have other clients who use it on small concept tests. I think it is helpful though to keep in mind that Text Analytics = Text Mining = Data Mining, and that data mining is all about pattern recognition. So if you are talking about interviews with five people, well, since you don’t have a lot of data, there aren’t really going to be many patterns to discover.

Lenny: Good point! I’ve been really impressed with the case studies you’ve released in the past year or two on how clients have been using your software. One in particular was the NPS study with Shell Oil. A lot of researchers (and more importantly CMOs) really believed in the Net Promoter Score before that case study. Are those kinds of insights possible with social media data as well?

Tom: Thanks Lenny. I like to say that “not all data are created equal”. Social media is just one type of data that our clients analyze, often there is far more interesting data to analyze. It seems that everyone thinks they should be using text analytics, and often they seem to think all it can be used for is social media data. I’ve made it an early 2015 new year’s resolution to try to help educate as many market researchers as I can about the value of other text data.

Lenny: Is the situation any different than it was last year?

Tom: Awareness of text analytics has grown tremendously, but knowledge about it has not kept up. We’re trying to offer free mini consultations with companies to help them understand exactly what (if any) data they have are good candidates for text analytics.

Lenny: What sources of data, if any, don’t you feel text analytics should be used on?

Tom: It seems the hype cycle has been focused on social media data, but our experience is that often these tools can be applied much more effectively to a variety of other sources of data.

However, we often get questions about IDI (In-Depth Interview) and focus group data. With this smaller-scale qualitative data, text analytics could theoretically help you discover things like emotions, but there aren’t really many patterns in the data because it’s so small. So we usually counsel against using text analytics for qual, in part due to lower ROI.

Often it’s about helping our clients take an inventory of what data they have, and helping them understand where, if at all, text analytics makes sense.

Many times we find that a client really doesn’t have enough text data to warrant text analytics. However, this is sad in cases where we also find out they do a considerable amount of ad-hoc surveys and/or even longitudinal trackers that go out to tens of thousands of customers, and they’ve purposefully decided to exclude open ends because they don’t want to deal with looking at them later. Human coding is a real pain, takes a long time, is inaccurate and expensive; so I understand their sentiment.

But this is awful in my opinion. Even if you aren’t going to do anything with the data right now, an open ended question is really the only question every single customer who takes a survey is willing and able to answer. We usually convince them to start collecting them.

Lenny: Do you have any other advice about how to best work with open ends?


Tom: Well, we find that our clients who start using OdinText end up completely changing how they leverage open ends. Usually they get far wiser about their real estate and end up asking fewer of both closed-ended AND open-ended questions. It’s like a light bulb goes off, and everything they learned about survey research is questioned.

Lenny: Thanks Tom. Well I love what your firm is doing to help companies do some really interesting things that I don’t think could have been done with any other traditional research techniques.

Tom: Thanks for having me Lenny. I know a lot of our clients find your blog useful and interesting.

If any of your readers want a free expert opinion on whether or not text analytics makes sense for them, we’re happy to talk to them about it. Best way to do so is probably to hit the info request button on our site, but I always try my best to respond directly to anyone who reaches out to me personally on LinkedIn as well.

Lenny: Thanks Tom, always a pleasure to chat with you!

For readers interested in hearing more of Tom’s thoughts on Text Analytics in market research, here are two videos from IIeX Atlanta earlier this year that are chock full of good information:

Panel: The Great Methodology Debate: Which Approaches Really Deliver on Client Needs?

Discussing the Future of Text Analytics with Tom Anderson of Odin Text