Posts tagged Natural Language Processing (NLP)
IIEX 2016 Competition Showcases Innovation in Market Research

Artificial Intelligence, Mixed Data Analytics and Passive Listening Capture Minds - 2016 Insight Innovation Exchange

I’m just back from the IIEX conference in Atlanta, where OdinText competed in the Insight Innovation Competition. Although I was disappointed that we didn’t win, I’m pleased to report that the judges told me we placed a very close second.

IIeX 2016

Attending conferences like this gives me the opportunity to take the pulse of the industry, and I was struck by the fact that text analytics is no longer viewed as a shiny new toy in market research. In fact, as someone who has been working in the natural language processing field for so long, I find it somewhat remarkable how much perceptions of text analytics have matured over just the last year. Text analytics has become a must-have, and the market has a new wave of competition as a result, which I think is further evidence of a healthy market.

Since OdinText goes beyond text data and incorporates mixed data (text and quantitative), our competition pitch highlighted OdinText’s ability to essentially enable market researchers to do data science.

I strongly believe making data science more accessible is a huge opportunity that OdinText is uniquely positioned to seize, and it’s an area where market researchers can step up to meet a desperate need, as we currently have a shortage of about 200,000 data scientists in the US alone.

(Check out this 5-minute video of my IIEX competition pitch and let me know what YOU think!)

You are also most welcome to download a PDF of the PPT presentation >>>

“Machine learning” appears to be the new buzz phrase in research circles, and at IIEX I was hard pressed to find a single vendor not claiming to use machine learning in some respect, no matter where on the service chain they fit. Honestly, though, I got the sense that many use the term without entirely understanding what it means.

We continue to leverage machine learning where it makes sense at OdinText, and there are a few other vendors out there who also clearly have an excellent grasp of the technique.

One such company, which in fact took first place in the competition, was Remesh. They’re using machine learning in a novel way, automating the role of an online moderator in a manner akin to a chat bot. They’ve positioned this as AI, and replacing humans completely with a computer is a holy grail for almost any industry.

I’m optimistic on AI in my field of data and text mining as well, but we’re still a ways off in terms of taking the human out of the mix, and so our goal at OdinText is to use the human as efficiently as possible.

While totally automating what a data scientist does is appealing, in the short term we’re happy with being able to allow a market researcher to do in a few hours what would take a typical data scientist with skills in advanced statistics, NLP, Python, R and C++ days or weeks to do.

Still I admit the prospect of AI replacing researchers completely is an interesting one—albeit not necessarily a popular one among the people who would be replaced—and it’s an area that I’m certainly thinking about.

I understand that third place in the competition went to Beatgrid Media, which uses smartphones (with almost no drain on battery life) to passively listen to audio streams from radio and TV, and overlays geo-demographics on its panelists’ data to better predict advertising reach and efficacy. This is admittedly going to be a very hard field for a start-up to break into, as there are many big players in the space who want to own their own measurement. That may have been one of the reasons Beatgrid had trouble placing higher than third, even though they have some very interesting technology that could perhaps also be applied in other ways.

Let me know what you think!

(And if you’re interested in a demo of OdinText, please contact us here!)

Tom H.C. Anderson | @TomHCanderson | @OdinText

To learn more about how OdinText can help you understand what really matters to your customers and predict actual behavior, please contact us or request a Free Demo here >

[NOTE: Tom H. C. Anderson is Founder of Next Generation Text Analytics software firm OdinText Inc. Click here for more Text Analytics Tips]

Code by Hand? The Benefits of Automated and User-Guided Automated Customer Comment Coding

Why you should not code text data by hand: Benefits of automated and user-guided automated coding

Text Analytics Tips by Gosia

Most researchers know very well that coding text data manually (using human coders who read the text and assign codes) is very expensive, both in the time coders need and in the money needed to compensate them for the effort.

However, the major advantage of human coders is their strong grasp of complex meaning in text, including sarcasm and jokes.

Usually at least two coders are required to code any type of text data and the calculation of inter-rater reliability or inter-rater agreement is a must. This statistic enables us to see how similarly any number of coders has coded the data, i.e., how often they have agreed on using the exact same codes.
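As a rough illustration of such an agreement statistic, here is a minimal Python sketch of Cohen's kappa, one common choice for two coders (the post does not say which statistic is meant). The function name and the sample codes are made up for this example, and the sketch assumes each coder assigns exactly one code per comment.

```python
from collections import Counter

def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per comment."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of comments")
    n = len(codes_a)
    # Observed agreement: share of comments where both coders chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: probability of a match if each coder picked codes
    # at random according to their own code frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two coders to the same six comments.
coder_1 = ["price", "service", "price", "other", "service", "price"]
coder_2 = ["price", "service", "other", "other", "service", "service"]

kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 2))  # 0.52
```

Here the coders agree on four of six comments (67%), but kappa is only 0.52 once chance agreement is discounted, which is why raw percent agreement tends to flatter the coders.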

Often even with the simplest codes the accuracy of human coding is low. No two human coders consistently code larger amounts of data the same way because of different interpretations of text or simply due to error. The latter is a reason why no single coder will code the same text data identically when done for the second time (perfect reliability for a single coder could be achieved in theory though, e.g., for very small datasets that can be proofread multiple times).

Another limitation is that human coders can keep only a limited number of codes in working memory while reading the text. Finally, any change to the codes requires repeating the entire coding process from the beginning. Because manual coding of larger datasets is expensive and unreliable, automated coding using computer software was introduced.

Automated or algorithm-based text coding solves many of the issues of human coding:

  1. It is fast (thousands of text comments can be read in seconds).
  2. It is cost-effective (automated coding should always be cheaper than human coding, as it requires much less time).
  3. It offers perfect consistency (the same rules are applied every time without errors).
  4. It allows an unlimited number of codes in theory (some software might have limitations).

However, this process also has disadvantages. As mentioned above, humans are the only ones who can perfectly understand the complex meaning of text, and simple algorithms are likely to fail when trying to understand it (although some newer algorithms under development can be almost as good as humans). Moreover, most software available on the market has low flexibility, as its codes cannot be viewed or changed by the user.

Figure 1. Comparison of OdinText with “human coding” and “automated coding” approaches.

Therefore, OdinText developers decided to let users guide the automated coding. Users can view and edit the default codes and dictionaries, create and upload their own, or build custom dictionaries based on the exploratory results provided by the automated analysis. The codes can be very complex and specific, producing a good understanding of the meaning of text, which is the key goal of any text analytics software.
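To make the idea of an editable code dictionary concrete, here is a small Python sketch of dictionary-based coding. It is not OdinText's implementation. The codebook, the function names and the sample comments are all hypothetical, and real dictionaries would be far richer than simple word lists.

```python
import re

# Hypothetical codebook: code name -> terms that trigger the code.
CODEBOOK = {
    "price": ["price", "expensive", "cheap", "cost"],
    "service": ["service", "staff", "support"],
}

def build_patterns(codebook):
    """Compile one case-insensitive whole-word pattern per code."""
    return {
        code: re.compile(
            r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
        )
        for code, terms in codebook.items()
    }

def code_comment(comment, patterns):
    """Return every code whose dictionary matches the comment."""
    return sorted(code for code, pattern in patterns.items() if pattern.search(comment))

comments = [
    "Too expensive for what you get.",
    "The staff were friendly and the price was fair.",
    "Delivery arrived late.",
]

coded = [code_comment(c, build_patterns(CODEBOOK)) for c in comments]
print(coded)  # [['price'], ['price', 'service'], []]

# The user notices uncoded comments, adds a new code, and simply reruns.
CODEBOOK["delivery"] = ["delivery", "shipping", "late"]
recoded = [code_comment(c, build_patterns(CODEBOOK)) for c in comments]
print(recoded)  # [['price'], ['price', 'service'], ['delivery']]
```

The point of the sketch is the last step: changing the dictionary and recoding every comment takes one rerun, whereas a change to the codes in manual coding means starting over.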

OdinText is a user-guided automated text analytics solution, which has aspects and benefits of both fully automated and human coding. It is fast, cost-effective, accurate, and allows for an unlimited number of codes like many other automated text analytics tools. However, OdinText surpasses the capabilities of other software by providing high flexibility and customization of codes/dictionaries and thus a better understanding of the meaning of text. Moreover, OdinText allows you to conduct statistical analyses and create visualizations of your data in the same software.

Try switching from human coding to user-guided automated coding and you will be pleasantly surprised how easy and powerful it is!

Gosia

[Gosia is a Data Scientist at OdinText Inc. Experienced in text mining and predictive analytics, she is a Ph.D. with extensive research experience in mass media’s influence on cognition, emotions, and behavior.  Please feel free to request additional information or an OdinText demo here.]

[NOTE: OdinText is NOT a tool for human assisted coding. It is a tool used by analysts for better and faster insights from mixed (structured and unstructured) data.]

Support OdinText - Make Data Science Accessible!

Take 7 Seconds to Support the OdinText Mission: Help Make Data Science Accessible! I’m excited to announce that OdinText will participate in the IIEX2016 Insight Innovation Competition!

The competition celebrates innovation in market research and provides a platform for young companies and startups to showcase truly novel products and services with the potential to transform the consumer insights field.

Marketing and research are becoming increasingly complex, and the skills needed to thrive in this environment have changed.

To that end, OdinText was designed to make advanced data analytics and data science accessible to marketers and researchers.

Help us in that mission. It only takes 7 seconds.

Please visit http://www.iicompetition.org/idea/view/387 and cast a ballot for OdinText!

You can view and/or vote for the other great companies here if you like.

Thank you for your consideration and support!

Tom

Tom H. C. Anderson Founder - OdinText Inc. www.odintext.com Info/Demo Request

ABOUT ODINTEXT OdinText is a patented SaaS (software-as-a-service) platform for natural language processing and advanced text analysis. Fortune 500 companies such as Disney and Coca-Cola use OdinText to mine insights from complex, unstructured text data. The technology is available through the venture-backed Stamford, CT firm of the same name founded by CEO Tom H. C. Anderson, a recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research. The company is the recipient of numerous awards for innovation from industry associations such as ESOMAR, CASRO, the ARF and the American Marketing Association. Anderson tweets under the handle @tomhcanderson.

 

Why Text Analytics Needs to Move at the Speed of Slang

Do You Speak Teen? 10 Terms You May Not Know

Translating the words teens use has been a headache and source of embarrassment for generations of parents. It’s as though the kids speak a different language. Let’s call it a “slanguage”. And you know you’re old when you need Google to understand it.

Nowadays, too, it’s much harder to bridge the communication gap because the Internet has dramatically increased the pace at which slanguage changes. In fact, every year hundreds of new slang words and phrases that originated on the Internet are added to the terrestrial dictionary.

"Slanguage" is a moving target

And thanks to social media, new terms, phrases and acronyms (which, in some cases, can describe an entire situation) crop up and go viral literally overnight.

In short, slanguage has become a moving target; it seems to change faster than we can pick it up. As soon as we’re proficient, we’re out of touch again.

For obvious reasons, this is not only a problem for parents, it’s particularly frustrating for anyone researching or marketing to youth.

The Problem with “Dictionaries”

Text analytics software has enabled us to monitor what young people are saying online, but it does us little good when the software can’t keep up with slanguage.

One of the primary weaknesses of most text analytics software platforms is that they rely on “dictionaries” to understand what is being discussed or to assign sentiment.

These dictionaries are only as good as the data used to create them. If the data changes in any way (e.g., new words are used or used in different ways) the software will miss it.

So in order to stay current using a conventional text analytics platform, one must manually identify new slang terms as they emerge and continually update the dictionary.

In contrast, OdinText is uniquely able to identify new, never-before-used terms—slang, acronyms, industry jargon, new product/competitor names, etc.—without user input.
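For readers curious what spotting unfamiliar terms can look like in its simplest form, here is a Python sketch that flags words missing from a known vocabulary once they appear often enough. This is only an illustration and not how OdinText works. The function name, the vocabulary, the `min_count` threshold and the sample comments are all assumptions.

```python
import re
from collections import Counter

def find_new_terms(comments, known_vocabulary, min_count=2):
    """Return (term, count) pairs for words absent from the known vocabulary."""
    known = {word.lower() for word in known_vocabulary}
    counts = Counter(
        token
        for comment in comments
        for token in re.findall(r"[a-z']+", comment.lower())
        if token not in known
    )
    # Keep only terms seen at least min_count times, to filter out one-off typos.
    return [(term, count) for term, count in counts.most_common() if count >= min_count]

# Hypothetical vocabulary and comments.
known_vocabulary = ["that", "party", "was", "so", "my", "is", "the", "best", "this", "song"]
comments = [
    "That party was so lit",
    "This song is lit",
    "My bae is the best",
]

new_terms = find_new_terms(comments, known_vocabulary)
print(new_terms)  # [('lit', 2)]
```

A frequency check like this only surfaces words it has never seen. It would miss a familiar word that picks up a new meaning, such as “one” used as a farewell.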

Test Your Teenspeak Proficiency!

Staying abreast of changes in teenspeak requires some vigilance. You may be further out of touch than you realize. Let’s take a quiz: just for fun, I randomly pulled 10 terms that have become popular with post-Millennials (very roughly rank-ordered by use below).

If you’re not familiar with these terms or can’t define them, don’t worry. You’re not alone. I didn’t understand any of them either. It’s not necessarily easy to figure out what many of these new terms mean, either.

A conventional, mainstream dictionary won’t be any help here, but the Urban Dictionary can be a lifesaver. You can also learn a lot by researching the images online that are associated with a new trending slang term (especially “memes”) for context. YouTube videos and music can be similarly helpful.

Triangulating using these sources and the most common context is often the best way to stay on top of these moving targets, which as I noted come and go relatively quickly.

Many of the ten I’ve listed below can have more than one meaning depending on context, and some may even be used differently by different demographic groups and even within the same demographic group.

So, without further ado, here are 10 of the top slang terms we’ve spotted circulating within the past few months.

Without skipping ahead to the answers, how many do you know?

  • One (or 1)

  • Dab

  • Schlonged (& other ‘Trumpisms’)

  • Bae

  • Fetch

  • Lit

  • BRUH

  • Fleek

  • Swag

  • Bazinga!

“One” or “1”

In teenspeak, “one” or “1” doesn’t always signify a quantity. It can also mean “One Love” and is used frequently in parting (like “goodbye”). It may be used in person, on the telephone or via digital communication.

“Dab”

You knew this was a verb meaning to pat or tap gently, but that’s not what the kids mean. The recent uptick in “Dab” was inspired by a dance move popularized in a 2014 video by Atlanta rapper Skippa da Flippa. It’s often used as a sort of victory swagger (“Keep dabbin' ... let the haters hate ... Dab on”). Check out this YouTube clip for more.

“Trumped” & “Schlonged”

“Trump” as a noun and as a verb traditionally referred to a stronger hand of cards or other competitive advantage. But due in no small measure to the ascendancy of presidential candidate Donald J. Trump, various combinations of “Trump” and “Trumped” have been cropping up in memes and other digital chat with a variety of meanings.

“Trump” has appeared as an adjective describing someone rich or spoiled. A couple of months ago we also saw a renewed interest in “Schlonged” again due to media coverage of candidate Trump. There was some debate on what the actual meaning was. Here again I think the Urban Dictionary is one of the best resources for you to make up your own mind.

“Bae”

According to our analysis, this one seems about twice as popular among women, and also somewhat more popular in the Midwest. “Bae” is a pet name for one’s significant other. It may have been derived from “baby” (like “B” and “boo”) or it could be an acronym for “Before Anyone Else.”

“Fetch”

It’s not a command for a dog. Think slang predecessors like “cool” or “awesome.”  This one can be traced to the cult hit “Mean Girls”. Ironically, in the film the term never catches on despite one character’s dogged attempts to popularize it.

“Lit”

A hit with the youngest demographic and skewing somewhat more Northeast regionally, “Lit” has been used by rappers and other musicians in recent songs and videos. It can mean a number of things, including that something is “hot” or popular, but also that someone is drunk or high. When used in a phrase like “It’s Lit,” it means exciting, good or worthwhile. “Come on down, it’s Lit!”

“Bruh”

It’s “bro” phonetically tweaked—basically means “buddy” among guys—but it can also be an expression of surprise (and usually as a disappointment) as in “Damn!” The latter use seems to have originated at least partly thanks to a video that appeared on Vine featuring high school basketball star Tony Farmer being sentenced to prison and consequently collapsing.

“Fleek”

More popular among younger women, particularly in the South, “fleek” is a synonym for another popular slang phrase, "on point"—basically looking sharp, well-groomed or stylish. Recently, “fleek” has become specifically about eyebrows, in part due to a couple of Instagram videos, and mainstreamed when Kim Kardashian used it to describe a picture of her bleached eyebrows as #EyebrowsOnFleek.

“Swag”

“Swag” may actually already be on the way out, but it’s still quite popular. Derived from “swagger” (the supremely confident style of walking or strutting), “swag” has come to refer generally to an urban style and look associated with Hip-Hop. It could relate to a haircut or shoes, or simply an attitude or presence that exudes confidence and even arrogance. Example video: Soulja Boy Tell'em - Pretty Boy Swag

“Bazinga”

This one comes courtesy of “The Big Bang Theory” character Sheldon Cooper and means “Gotcha” or “I fooled you.”

Don’t Let Words Fail You!

I hope you had some fun with this quiz and maybe picked up some new vocabulary, but I’d like to emphasize that slang isn’t the only terminology that changes. Keeping on top of new market entrants, drug names, etc., is important. If you don’t have a technology solution like OdinText that can identify new terms with implications for your business or category, make sure that you at least set up a manual process to regularly check for them.

Until next time – One!

Tom

@TomHCanderson | @OdinText

PS. To learn more about how OdinText can help you learn what really matters to your customers and predict real behavior,  please contact us or request a Free Demo here >

[NOTE: Tom H. C. Anderson is Founder of Next Generation Text Analytics software firm OdinText Inc. Click here for more Text Analytics Tips]

Attensity Sells IP to InContact: What Does it Mean?

Goodbye to Attensity, an early text analytics pioneer

Fellow text analytics colleagues,

I wanted to direct your attention to some industry news that broke today. It appears that Attensity—an OdinText competitor and a long time name in the text analytics software space—is selling its IP assets to InContact, a provider of call center solutions. (Read about it here.)

For people who’ve been watching the text analytics sector, this probably comes as no big surprise. It is well known that Attensity has been in the throes of some financial difficulty for a while. And just last month they sold their European business to an investment consortium. (Read about it here.)

It is my guess that Attensity’s text analysis software will be bundled into InContact’s portfolio as a value add for call center customers—essentially using the former’s basic NLP for a voice-to-text application and then to code results.

Whether or not Attensity’s product continues to exist in a standalone capacity, too, or what this means for their existing customers remains to be seen.

I think this development is noteworthy in part because Attensity was one of the earliest players in the text analytics space. In fact, even Claraview—the company from which Clarabridge was later spun off in 2006—initially licensed Attensity’s technology, before developing their own very similar tool.

As I noted in a very recent blog post, Attensity and Clarabridge both adhere to a rules-based approach that requires costly and time-consuming customization.

At the risk of being self-serving, it seems to me that as the text analytics market continues to mature and buyers become better informed, we’ll see increasing demand for more flexible solutions that are faster to get up and running and easier to use with a better total cost of ownership.

That’s good news for OdinText, but as the situation with Attensity suggests, it doesn’t bode well for competitors with eight-figure debt selling dated approaches.

That said, it is an extremely exciting time for the industry as a whole, with adoption continuing to increase as more and more use cases move onto clients’ ‘must-have’ lists.

@TomHCAnderson

 

[NOTE: Tom is Founder and CEO of OdinText Inc. A long-time champion of text mining, in 2005 he founded Anderson Analytics LLC, the first consumer insights/marketing research consultancy focused on text analytics. In 2015 he founded OdinText SaaS, which takes a new, Next Generation approach to text analytics. He is a frequent speaker and data science guest lecturer at university and research industry events.]

OdinText Wins American Marketing Association Lavidge Global Marketing Research Prize

AMA Honors Cloud-Based Text Analytics Software Provider OdinText for Making Data Science Accessible to Marketers

OdinText Inc., developer of the Next Generation Text Analytics SaaS (software-as-a-service) platform of the same name, today was named winner of the American Marketing Association’s 2016 Robert J. Lavidge Global Marketing Research Prize for innovation in the field.

The Lavidge Prize, which includes a $5,000 cash award, globally recognizes a marketing research/consumer insight procedure or solution that has been successfully implemented and has a practical application for marketers.

According to Chris Chapman, President of the AMA Marketing Insights Council, OdinText earned the award for its contribution to advancing the practice of marketing by making data science accessible to non-data scientists.

“Consumers are creating oceans of unstructured text data, but putting this tremendously valuable information to practical use has posed a significant challenge for marketers and companies,” said Chapman.

“The nominations for OdinText highlighted how the company has distilled very complex applied analytics processes into an intuitive tool that enables marketers to run sophisticated predictive analyses and simulations by themselves, quickly and easily. This is exactly the kind of practical advancement we look for in awarding the Lavidge Prize,” added Chapman.

The cloud-based OdinText software platform enables marketers with no advanced training or data science expertise to harness vast quantities of complex, unstructured text data—survey open-ends, call center transcripts, email, social media, discussion boards—and to rapidly mine valuable insights that would not have been otherwise obtainable without a data scientist.

“Marketing is evolving, getting both broader and deeper in terms of skill sets needed to succeed,” said FreshDirect Vice President of Business Intelligence and Analytics Jim DeMarco, who nominated OdinText for the Lavidge Prize.

“OdinText provides marketers with the capability to access more advanced analysis faster and helps the business they work on gain an information advantage. This is exactly the kind of innovation our industry needs right now,” DeMarco said.

The Lavidge Prize was presented in a special ceremony today at the AMA’s 2016 Analytics with Purpose Conference in Scottsdale, AZ. OdinText CEO Tom H. C. Anderson—a recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research—accepted the award on behalf of the firm.

“One of our goals in creating OdinText was to build the tool from an analyst’s perspective, not a software developer’s, so that a marketer armed with OdinText could derive the same insights but faster than a data scientist using traditional techniques and tools,” said Anderson.

“To be recognized for this achievement by the AMA—one of the largest and most prestigious professional associations for marketers in the world, which has devoted itself to leading the way forward into a new era of marketing excellence—is deeply gratifying,” said Anderson.

 

ABOUT ODINTEXT

OdinText is a patented SaaS (software-as-a-service) platform for natural language processing and advanced text analysis. Fortune 500 companies such as Disney and Shell Oil use OdinText to mine insights from complex, unstructured text data easily and rapidly. The technology is available through the venture-backed Stamford, CT firm of the same name founded by CEO Tom H. C. Anderson, a recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research. He tweets under the handle @tomhcanderson.

For more information, visit OdinText Info Request

ABOUT THE AMERICAN MARKETING ASSOCIATION

With a global network of over 30,000 members, the American Marketing Association (AMA) serves as one of the largest marketing associations in the world.  The AMA is the leading professional association for marketers and academics involved in the practice, teaching, and study of marketing worldwide.  Members of the AMA count on the association to be their most credible marketing resource, helping them to establish valuable professional connections and stay relevant in the industry with knowledge, training, and tools to enhance lifelong learning.

For more information, visit www.ama.org

Text analysis answers: Is the Quran really more violent than the Bible?

Text Analytics Tips: Is the Quran really more violent than the Bible? by Tom H. C. Anderson Part I: The Project

With the proliferation of terrorism connected to Islamic fundamentalism in the late-20th and early 21st centuries, the question of whether or not there is something inherently violent about Islam has become the subject of intense and widespread debate.

Even before 9/11, notably with the publication of Samuel P. Huntington’s “Clash of Civilizations” in 1996, pundits have argued that Islam incites followers to violence on a level that sets it apart from the world’s other major religions.

The November 2015 Paris attacks and the politicking of a U.S. presidential election year (particularly candidate Donald Trump’s call for a ban on Muslims entering the country and President Obama’s response in the State of the Union address last week) have reanimated the dispute in the mainstream media, and proponents and detractors alike have marshalled “experts” to validate their positions.

To understand a religion, it’s only logical to begin by examining its literature. And indeed, extensive studies in a variety of academic disciplines are routinely conducted to scrutinize and compare the texts of the world’s great religions.

We thought it would be interesting to bring to bear the sophisticated data mining technology available today through natural language processing and unstructured text analytics to objectively assess the content of these books at the surface level.

So, we’ve conducted a shallow but wide comparative analysis using OdinText to determine with as little bias as possible whether the Quran is really more violent than its Judeo-Christian counterparts.

A few words of caution…

Due to the sensitive nature of this subject, I must emphasize that this analysis is by no means exhaustive, nor is it intended to advance any agenda or to conclusively prove anyone’s point.

The topic and data sources selected for this project constitute a significant departure from the consumer intelligence use cases for which clients typically turn to text analytics, so we thought this would be an interesting opportunity to demonstrate how this tool can be much more broadly applied to address questions and issues outside the realm of market research and business intelligence.

Again, this is only a cursory analysis. I believe there is more than one Ph.D. thesis awaiting students of theology, literature or political science who want to take a much deeper dive into this data.

About the “Data” Sources

First off, it seemed sensible and appropriate to analyze the Old and New Testaments separately. (The Jewish Torah makes up the first five books of the Christian Old Testament, of course, while the New Testament is unique to Christianity.)

We decided to split them for analysis for a couple of reasons: 1) they were written hundreds of years apart, and 2) their combined size is far larger than that of the Quran.

All data (Old Testament, New Testament and Quran) were combined and read into OdinText as a single file. The Old Testament is the largest, with over 23K verses and about 623K words, followed by the New Testament with just under 8K verses and 185K words, and then the Quran with just over 6K verses and fewer than 78K words.

Secondly, there are obviously multiple versions and translations of the texts available for study. We’ve selected the ones that were most accessible and best suited for this kind of analysis.

With regard to the Christian Bible, instead of the King James version, we opted to use the New International Version (NIV) because the somewhat updated language should be easier to work with.

In selecting an English translation of the Quran, we considered the Tafsir-ul-Quran (1957) by the Indian scholar Abdul Majid Daryabad, but decided to go with The Holy Qur'an (1917, 4th rev. ed. 1951) by Maulana Muhammad Ali because this version is more widely used and the data are more easily accessed.

We do not believe the text in either of these choices to differ materially.

Approach: A ‘Top-Down/Bottom-Up’ Inquiry

We recommend, and OdinText employs, a ‘Top-Down/Bottom-Up’ approach to text analysis.

This means that identification of issues for investigation will be partly a priori or ‘Top-Down’ (i.e. the analyst determines specific topic areas to explore such as “violence”).

But there will also be a data-driven or ‘Bottom-Up’ aspect in which the software helps to identify topics or areas that may not have occurred to the analyst, but which could be important given the data.

For example…

OdinText looks for sentiments and emotions in the data as soon as it has been uploaded to our servers; however, as this particular data set is rather unique, certain custom dictionary definitions—what we refer to as “issues”—will also need to be created through the Top-Down/Bottom-Up approach.

One simple and unbiased way to do this is to allow the process by which these definitions are created to be as data-driven as possible. There are several ways to look to the data for information. For instance, we might start by looking at the top words mentioned in each source to understand what concepts cut across our data, and how they might be defined. (See figure 1)
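As a simple illustration of this data-driven first step, the Python sketch below counts the most mentioned words in each source and reports the share of verses that mention each one. It is a toy example, not the OdinText process. The stopword list and the function name are assumptions, and the “verses” are invented placeholder lines rather than quotations from any of the texts.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "of", "to", "in", "a", "is", "for", "he", "his"}

def top_words(verses, n=3):
    """Return the n most mentioned words with the share of verses mentioning each."""
    mentions = Counter()
    for verse in verses:
        # set() counts a word once per verse, however often it repeats there.
        mentions.update(set(re.findall(r"[a-z]+", verse.lower())) - STOPWORDS)
    # Sort by count (descending), then alphabetically so ties are deterministic.
    ranked = sorted(mentions.items(), key=lambda item: (-item[1], item[0]))
    return [(word, count / len(verses)) for word, count in ranked[:n]]

# Invented placeholder lines standing in for verses from two sources.
sources = {
    "source_a": ["the lord spoke", "the lord is king", "a king of the land"],
    "source_b": ["god is light", "the light of god", "god and the people"],
}

for name, verses in sources.items():
    print(name, top_words(verses, n=2))
```

Comparing the resulting lists side by side shows which concepts cut across sources and which terms each source uses for them.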

Figure 1. Three-way comparison of the top words mentioned in each source.

In this way, an overarching concept for comparison in each of the three sources can then be developed. For instance, a concept like “God” would need to include all common terms for this concept in each text source.

We can name such a concept something like “God All Inclusive,” allowing all common definitions/terms for God in each of the texts to be picked up under this concept.

Accordingly, “God All Inclusive” would include any mention of “Lord” (28%) or “God” (11%) in the Old Testament, as well as any mentions of “Jesus” (17%), “God” (16%), “Lord” (8%) or “Christ” (7%) in the New Testament, and any mentions of “Allah” (30%) or “Lord” (14%) in the Quran.
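A concept like this can be thought of as a named list of terms that is matched against every verse. The Python sketch below shows one way that might look. The terms come from the paragraph above, but the function name, the matching rule (case-insensitive whole words) and the placeholder verses are assumptions for illustration, not OdinText's internals.

```python
import re

# The concept and its terms, taken from the terms listed above.
CONCEPTS = {
    "God All Inclusive": ["lord", "god", "jesus", "christ", "allah"],
}

def concept_share(verses, terms):
    """Share of verses that mention at least one of the concept's terms."""
    if not verses:
        return 0.0
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    hits = sum(1 for verse in verses if pattern.search(verse))
    return hits / len(verses)

# Invented placeholder lines, not quotations from the texts.
verses_by_source = {
    "source_1": ["placeholder verse naming the Lord", "placeholder verse about a journey"],
    "source_2": [
        "placeholder verse naming Jesus Christ",
        "placeholder verse naming God",
        "placeholder verse about bread",
    ],
    "source_3": ["placeholder verse naming Allah", "placeholder verse naming the Lord"],
}

shares = {
    source: concept_share(verses, CONCEPTS["God All Inclusive"])
    for source, verses in verses_by_source.items()
}
print(shares)
```

Note that a verse naming both “Jesus” and “Christ” counts once toward the concept, so the concept’s share is generally smaller than the sum of the individual term percentages.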

As mentioned earlier, in order to keep this analysis as unbiased as possible (and in order to do it as quickly as possible), we will also rely on OdinText’s built-in functionality to understand broader concepts such as positive and negative sentiment, as well as other psychological constructs and emotion in text. In other words, when we look at positive and negative emotion we will be using this broad-based metric across the three texts without any customization at all.

Now that I’ve laid the groundwork for this project, please join me tomorrow as we take a look at the initial results!

P.S. Considering many people take at least a year to read just one of these texts, you may find it interesting that it took OdinText less than 120 seconds to read, parse and analyze all three texts at once!

 

Up Next: Part II – One of these texts is angrier!

 

Mr Big Data VS. Mr Text Analytics

[Interview re-posted w/ permission from Text Analytics News]

Mr. Big Data & Mr. Text Analytics Weigh In on Structured vs. Unstructured Big Data

 


If you pay attention to Big Data news, you’re sure to have heard of Kirk Borne, whose well-respected views on the changing landscape are often shared on social media. Kirk is a professor of Astrophysics and Computational Science at George Mason University. He has published over 200 articles and given over 200 invited talks at conferences and universities worldwide. He serves on several national and international advisory boards and journal editorial boards related to big data.

 

 


Tom H. C. Anderson was an early champion of applied text analytics, and gives over 20 conference talks on the topic each year, as well as lectures at Columbia Business School and other universities. In 2007 he founded the Next Gen Market Research community online, where over 20,000 researchers frequently share their experiences. Tom is founder of Anderson Analytics, developers of the text analytics software-as-a-service OdinText. He serves on the American Marketing Association’s Insights Council and was the first proponent of natural language processing in the marketing research/consumer insights field.

 

Ahead of the Text Analytics Summit West 2014, Data Driven Business caught up with them to gain perspectives on just how important and interlinked Big Data is with Text Analytics.

 

Q1. What was the biggest hurdle that you had to overcome in order to reach your current level of achievement with Big Data Analytics?

KB: The biggest hurdle for me has consistently been cultural -- i.e., convincing others in the organization that big data analytics is not "business as usual", that the opportunities and potential for new discoveries, new insights, new products, and new ways of engaging our stakeholders (whether in business, or education, or government) through big data analytics are now enormous.

After I accepted the fact that the most likely way for people to change their viewpoint is for them to observe demonstrated proof of these big claims, I decided to focus less on trying to sell the idea and focus more on reaching my own goals and achievements with big data analytics. After making that decision, I never looked back -- whatever successes that I have achieved, they are now influencing and changing people, and I am no longer waiting for the culture to change.

THCA: There are technical/tactical hurdles, and methodological ones. The technical scale/speed ones were relatively easy to deal with once we started building our own software OdinText. Computing power continues to increase, and the rest is really about optimizing code.

The methodological hurdles are far more challenging. It’s relatively easy to look at what others have done, or even to come up with new ideas. But you do have to be willing to experiment, and more than just willingness, you need to have the time and the data to do it! There is a lot of software coming out of academia now. They like to mention their institution in every other sentence: “MIT this” or “UCLA that.” The problem they face is twofold. First, they don’t have access to enough real data to see if their theories play out. Second, they don’t have the real-world business experience and access to clients to know which things are actually useful and which are just novelty.

So, our biggest hurdle has been the time and effort invested through empirical testing. It hasn’t always been easy, but it’s put me and my company in an incredibly unique position.

Q2. Size of data, does it really matter? How much data is too little or too much?

THCA: Great question. With text analytics, size really does matter. It’s technically possible to get insights from very small data; for instance, on our blog during the elections one of my colleagues did a little analysis of the Romney vs. Obama debate transcripts. But text analytics really is data mining, and when you’re looking for patterns in text, the more data you have, the more interesting relationships you can find.

KB: Size of data doesn't really matter if you are just getting started. You should get busy with analytics regardless of how little data you have. The important thing is to identify what you need (new skills, technologies, processes, and data-oriented business objectives) in order to take advantage of your digital resources and data streams. As you become increasingly comfortable with those, then you will grow in confidence to step up your game with bigger data sets. If you are already confident and ready-to-go, then go! The big data revolution is like a hyper-speed train -- you cannot wait for it to stop in order to get on board -- it isn't stopping or slowing down! At the other extreme, we do have to wonder if there is such a thing as too much data. The answer to this question is "yes" if we dive into big data's deep waters blindly without the appropriate "swimming instruction" (i.e., without the appropriate skills, technologies, processes, and data-oriented business objectives). However, with the right preparations, we can take advantage of the fact that bigger data collections enable a greater depth of discovery, insight, and data-driven decision support than ever before imagined.

Q3. What is the one thing that motivates and inspires you the most in your Big Data Analytics work?

KB: Discovery! As a scientist, I was born curious. I am motivated and inspired to ask questions, to seek answers, to contemplate what it all means, and then to ask more questions. The rewards from these labors are the discoveries that are made along the way. In data analytics, the discoveries may be represented by a surprising unexpected pattern, trend, association, correlation, event, or outlier in the data set. That discovery then becomes an intellectual challenge (that I love): What does it mean? What new understanding does this discovery reveal about the domain of study (whether it is astrophysics, or retail business, or national security, or healthcare, or climate, or social, or whatever)? The discovery and the corresponding understanding are the benefits of all the hard work of data wrangling.

THCA: Anyone working with analytics has to be curious by nature. Satisfying that curiosity is what drives us. More specifically in my case, if our clients get excited about using our software and the insights they’ve uncovered, then that really gets me and my whole team excited. This can be challenging, and not all data is created equal.

It can be hard to tell someone who is excited about trying Text Analytics that their data really isn’t suitable. The opposite is even more frustrating though, knowing that a client has some really interesting data but is apprehensive about trying something new because they have some old tools lying around that they haven’t used, or because they have a difficult time getting access to the data because it’s technically “owned” by some other department that doesn’t ‘Get’ analytics. But helping them build a case and then helping them look good by making data useful to the organization really feeds into that basic curiosity. We often discover problems to solve we had no idea existed. And that’s very inspiring and rewarding.

Q4. Which big data analytics myth would you like to squash right here and now?

KB: Big data is not about data volume! That is the biggest myth and red herring in the business of big data analytics. Some people say that "we have always had big data", referring to the fact that each new generation has more data than the previous generation's tools and technologies are able to handle. By this reasoning, even the ancient Romans had big data, following their first census of the known world. But that's crazy. The truth of big data analytics is that we are now studying, measuring, tracking, and analyzing just about everything through digital signals (whether it is social media, or surveillance, or satellites, or drones, or scientific instruments, or web logs, or machine logs, or whatever). Big data really is "everything, quantified and tracked". This reality is producing enormously huge data volumes, but the real power of big data analytics is in "whole population analysis", signaling a new era in analytics: the "end of demographics", the diminished use of small samples, the "segment of one", and a new era of personalization. We have moved beyond mere descriptive analysis, to predictive, prescriptive, and cognitive analytics.

THCA: Tough one. There are quite a few. I’ll avoid picking on “social media listening” for a bit, and pick something else. One of the myths out there is that you have to be some sort of know-it-all ‘data scientist’ to leverage big data. This is no longer true. Along with this you have a lot of dropping of buzzwords like “natural language processing” or “machine learning,” which really don’t mean anything at all.

If you understand smaller data analytics, then there really is no reason at all that you shouldn’t understand big data analytics. Don’t ever let someone use some buzz word that you’re not sure of to impress you. If they can’t explain to you in layman’s terms exactly how a certain software works or how exactly an analysis is done and what the real business benefit is, then you can be pretty sure they don’t actually have the experience you’re looking for and are trying to hide this fact.

Q5. What’s more important/valuable, structured or unstructured data?

KB: Someone said recently that there is no such thing as unstructured data. Even binary-encoded images or videos are structured. Even free text and sentences (like this one) are structured (through the rules of language and grammar). Even some meaning this sentence has. One could say that analytics is the process of extracting order, meaning, and understanding from data. That process is made easier when the data are organized into databases (tables with rows and columns), but the importance and value of the data are inherently no more or no less for structured or unstructured data. Despite these comments, I should say that the world is increasingly generating and collecting more "unstructured data" (text, voice, video, audio) than "structured data" (data stored in database tables). So, in that sense, "unstructured data" is more important and valuable, simply because it provides a greater signal on the pulse of the world. But I now return to my initial point: to derive the most value from these data sources, they need to be analyzed and mined for the patterns, trends, associations, correlations, events, and outliers that they contain. In performing that analysis, we are converting the inherent knowledge encoded in the data from a "byte format" to a "structured information format". At that point, all data really become structured.

THCA: A trick question. We all begin with a question and relatively unstructured data. The goal of text analytics is structuring that data which is often most unstructured.

That said, based on the data we often look at (voice of customer surveys, call center and email data, various other web based data), I’ve personally seen that the unstructured text data is usually far richer. I say that because we can usually take that unstructured data and accurately predict/calculate any of the available structured data metrics from it. On the other hand, the unstructured data usually contain a lot of additional information not previously available in the structured data. So unlocking this richer unstructured data allows us to understand systems and processes much better than before and allows us to build far more accurate models.

So yes, unstructured/text data is more valuable, sorry.

Q6. What do you think is the biggest difference between big data analysis being done in academia vs in business?

KB: Perhaps the biggest difference is that data analysis in academia is focused on design (research), while business is focused on development (applications). In academia, we are designing (and testing) the optimal algorithm, the most effective technique, the most efficient methodology, and the most novel idea. In business, you might be 100% satisfied to apply all of those academic results to your business objectives, to develop products and services, without trying to come up with a new theory or algorithm. Nevertheless, I am actually seeing more and more convergence (though that might be because I am personally engaged in both places through my academic and consulting activities). I see convergence in the sense that I see businesses who are willing to investigate, design, and test new ideas and approaches (those projects are often led by data scientists), and I see academics who are willing to apply their ideas in the marketplace (as evidenced by the large number of big data analytics startups with university professors in data science leadership positions). The data "scientist" job category should imply that some research, discovery, design, modeling, and hypothesis generation and testing are part of that person's duties and responsibilities. Of course, in business, the data science project must also address a business objective that serves the business needs (revenue, sales, customer engagement, etc.), whereas in academia the objective is often a research paper, or a conference presentation, or an educational experience. Despite those distinctions, data scientists on both sides of the academia-business boundary are now performing similar big data analyses and investigations. Boundary crossing is the new normal, and that's a very good thing.

THCA: I kind of answered that in the first question. I think academics have the freedom and time to pursue a research objective even if it doesn’t have an important real outcome. So they can pick something fun that may or may not be very useful, such as: are people happier on Tuesdays or Wednesdays? They’ll often try to solve these stated objectives in some clever ways (hopefully), though there’s a lot of “Pop” research going on even in academia these days. They are also often limited in the data available to them, having to work with just a single data set that has somehow become available to them.

So, academia is different in that they raise some interesting fun questions, and sometimes the ideas borne out of their research can be applied to business.

Professional researchers have to prove an ROI in terms of time and money. Of course, technically we also have access to both more time and more money, and also a lot more data. So an academic team of researchers working on text analytics for 2-3 years is not going to be exposed to nearly as much data as a professional team.

That’s also why academic researchers often seem so in love with their models and accuracy. If you only have a single data set to work with, then you split it in half and use that for validation. In business on the other hand, if you are working across industries like we do, while we certainly may build and validate models for a specific client, we know that having a model that works across companies or industries is nearly impossible. But when we do find something that works, you can bet it’s going to be more likely to be useful.

Text (AkA Buzz) Analytics

[Reposted from Next Gen Market Research Blog] First Ever Text Analytics Cartoon

It's been a while since I've posted a cartoon here on the blog. However, all the buzzwords (Big Data, Hadoop, Natural Language Processing and Machine Learning etc.) constantly bandied about in our field inspired me - plus I don't think I've ever seen a text analytics cartoon before.

Hope you like it?

@TomHCAnderson

[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, developers of patented Next Generation Text Analytics™ software platform OdinText. For more information and to inquire about software licensing visit ODINTEXT INFO REQUEST]

Selecting the Best Text Analytics Software

The Non-Dummies Guide for Selecting a Text Analytics (or any other) Partner

Text Analytics is a Process, Not an End!

What would you say should be the goal of good text analytics software?

Based on the questions we get from clients investigating text analytics solutions, there seems to be no small amount of confusion. The fault isn’t theirs; it’s the fault of the early text analytics and social media monitoring vendors who overpromised and underdelivered.

Rather than explaining to clients what kind of analysis and insights they should rightfully expect, they chose to hide the fact that they knew very little themselves about how text analytics can and should actually be applied. Instead, most text analytics sales staff preferred to talk theoretically, using as many technical buzzwords like “natural language processing” as possible.

Here are questions you can safely set aside when investigating the right text analytics solution. They have next to no meaning whatsoever in terms of efficacy for your use case:

-How do you handle xyz stemming, semantic ABC, Ontologies and ______? [Insert other favorite buzz word you’ve heard but don’t really understand]

-What does the output look like, do you have a pretty dashboard? [If you buy text analytics software for pie charts and word clouds you’ll be in trouble. Dashboards, even if you find they make sense, need serious customization]

-Do you have a cool black sci-fi looking background with neon colored maps? [If you plan to put a bunch of monitors up and pretend you are on the bridge of the starship Enterprise I suppose this may make sense?!?!]

Instead, these kinds of questions are what you should be asking:

-Tell me about a client with the same kind of data that I have. How have they benefited from the tool? [They better be darn specific]

-Show me how it works with my own data!? [It’s easy to give a demo of poorly working software with canned data. Always make them use your data and never give them more than a day or two max to set it up]

Even better, text analytics tools are becoming easier to use, and I admit, keeping OdinText intuitive as we add more features is challenging. However, one of the biggest single misconceptions about text analytics software is that it somehow has this magical “artificial intelligence” power: some sort of power to discern everything and automatically write the report for you. I’m really not exaggerating.

Text analytics is not an end, it is a process. Find a vendor who understands this and whose software is not a black box. Here simple is better. If how the software does its coding is hidden in a black box, and the salesperson throws buzzwords at you to make you feel safe/confused about the fact that you have no idea how the sausage is made, it’s not because they have valuable “linguistic” or “machine learning” rules (more buzzwords); those can only be developed after carefully studying your own data. It’s because their software doesn’t actually work too well and will require a lot of expensive and time-consuming customization for unproven performance.

After choosing a text analytics software tool that is powerful and intuitive, a software that you can trust, then the fun begins. You or your analyst should be able to learn how to use the tool relatively quickly, but as with anything, you should expect to get better with experience.

Remember the early statistical software tools like SPSS and SAS. They worked very well on smaller data and you could trust that they actually did what you expected them to. However, you still needed to know what clustering and factor analysis were, and why to look at a mean vs. a median. Just like these tools, text analytics software also requires an analyst who can think about the data and how to get the most valuable insights for management.

Unfortunately, people who have never analyzed big data or conducted text analytics for real clients are building text analytics and “social listening” software. Find a vendor who understands your business. Their products will make you a data scientist. You’ll have to do a little more than press one button to understand the data, but since when has anything worthwhile been that easy?

To answer the question I posed earlier - what should be the goal of good text analytics software? – the answer depends on what field you’re in…

If you’re a marketer, then the main question you should be asking is how will this text analytics software help me sell more product to more customers less expensively?

@TomHCAnderson

 


[Above also posted on the Next Gen Market Research blog]