Posts tagged machine learning
What You Need to Know Before Buying AI/Machine Learning

7 Things to Know About AI/Machine Learning (Boiled Down to Two Even More Important Cliff Notes).

In case you missed our session on Artificial Intelligence and Machine Learning (AI/ML) at the Insights Association’s NEXT conference last week, I thought I would share a bit on the blog about what you missed. We had a full room, with some great questions both during and after the session. However, 30 minutes wasn’t enough time to cover everything thoroughly. In the end we agreed on four takeaways:

  • AI is part of how research & insights pros will address the ever-increasing demand for fast research results
  • AI helps focus on the most important data
  • AI can’t compensate for bad data
  • AI isn’t perfect

So today I thought I would share seven additional points about AI/ML that I often get questions on, and then at the end of this post I’m going to share the ‘Cliff Notes’, i.e. just the two most important things you really need to know. So, unless you want to geek out with me a bit, feel free to scroll to the bottom.

OK, first, before we can talk about anything, we need to define what Artificial Intelligence (AI) is and isn’t.

1. AI/ML definition is somewhat fuzzy

AI, and more specifically machine learning (ML), is a term that is abused almost as often as it is used. On the one hand this is because a lot of folks are inaccurately claiming to use it; on the other, not unlike big data, its definitions can be a bit unclear and don’t always make perfect sense.

Let’s take this common 3-part regression analysis process:

  1. Data Prep (pre-processing including cleaning, feature identification, and dimension reduction)
  2. Regression
  3. Analysis of process & reporting

This process, even if automated, would not be considered machine learning. However, switch out regression with a machine learning technique like Neural Nets, SVM, Decision Trees or Random Forests and bang, it’s machine learning. Why?

Regression models are also created to predict something, and they also require training data. If the data is linear, then there is no way any of these other models will beat regression in terms of ROI. So why would regression not be considered machine learning?

Who knows. Probably just because the authors of the first few academic papers on ML referenced these techniques, and not regression, as ML. It really doesn’t make much sense.
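To make the oddity concrete, here is a minimal sketch (pure Python, made-up numbers) showing that the three-step process is structurally identical whether step 2 is ordinary least-squares regression or a textbook "machine learning" method like nearest neighbour:

```python
def clean(rows):
    # Step 1: data prep -- drop rows with missing values
    return [(x, y) for x, y in rows if x is not None and y is not None]

def fit_linear(data):
    # Step 2, option A: simple least-squares regression (usually NOT called ML)
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    return lambda x: my + slope * (x - mx)

def fit_nearest(data):
    # Step 2, option B: 1-nearest-neighbour, a textbook "machine learning" method
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

rows = [(1, 2.0), (2, 4.1), (None, 9.9), (3, 5.9)]
train = clean(rows)
for fit in (fit_linear, fit_nearest):
    model = fit(train)   # train a model on the same prepped data
    print(model(2.5))    # Step 3: report a prediction
```

Both options consume training data and output predictions; only the historical label differs.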

2. There are basically 2 types of ML

Some ML approaches are binary, like SVM (Support Vector Machines), for predicting something like male or female, while others, like Decision Trees, handle multi-class classification.

If you are using decision trees to predict an NPS rating on an 11-point scale, then that’s a multi-class problem. However, you can ‘trick’ binary techniques like SVM into solving the multi-class problem by setting them up to run multiple times.

Either way, you are predicting something.
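For the curious, the "run it multiple times" trick (often called one-vs-rest) can be sketched in a few lines of pure Python. The toy binary "classifier" below just scores by distance to the mean of the positive class; it is a hypothetical stand-in for a real SVM, and the data is made up:

```python
def train_binary(examples):
    # Toy binary "classifier": scores a point by closeness to the mean
    # of the positive examples (a stand-in for a real SVM).
    pos = [x for x, label in examples if label]
    centre = sum(pos) / len(pos)
    return lambda x: -abs(x - centre)  # higher score = more likely positive

def one_vs_rest(data, classes):
    # Train one binary model per class, relabelling the data each time
    # as "this class vs everything else".
    models = {
        c: train_binary([(x, label == c) for x, label in data])
        for c in classes
    }
    # Predict by taking the class whose binary model scores highest.
    return lambda x: max(classes, key=lambda c: models[c](x))

data = [(1, "low"), (2, "low"), (5, "mid"), (6, "mid"), (9, "high")]
predict = one_vs_rest(data, ["low", "mid", "high"])
print(predict(5.4))  # prints "mid"
```

Three binary runs instead of one, but the end result is a multi-class prediction.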

3. ML can be slow

Depending on the approach used, like Neural Nets for instance, training a model can take several days on a normal computer. There are other issues with Neural Nets as well, like the difficulty for humans to understand and control what they are doing.

But let’s focus on speed for now. Of course, if you can apply a previously trained model to very similar data, then results will be very fast indeed. This isn’t always possible though.

If your goal is to insert ML into a process to solve a problem which a user is waiting for, then training an algorithm might not be a very good solution. If another technique, ‘machine learning’ or not, can solve the problem much faster with similar accuracy, then that should be the approach to use.

4. Neural Nets are not like the brain

I’ll pick on Neural Nets a bit more, because they are almost a buzzword unto themselves. That’s because a lot of people have claimed they work like the human brain. This isn’t true. If we’re going to be honest, we’re not sure how the human brain works. In fact, what we do know about the human brain makes me think it is quite different.

The human brain contains nearly 90 billion neurons, each with thousands of synapses. Some of these fire and send information for a given task, some will not fire, and yet others fire and do not send any information. The fact is we don’t know exactly why. This is something we are still working on with hopes that new more powerful quantum computers may give us some insight.

We can however map some functions of the brain to robotics to do things like lift arms, without knowing exactly what happens in between.

There is one problematic similarity between the brain and Neural Nets though. That is, we’re not quite sure how Neural Nets work either. When running a Neural Net, we cannot easily control or explain what happens in the intermediary nodes. So, this (along with speed I mentioned earlier) is more of a reason to be cautious about using Neural Nets.

5. Not All Problems Are Best Solved with Machine Learning

Are all problems best solved with ML? No, probably not.

Take pricing as an example. People have been solving this problem for years, and there are many different solutions depending on your unique situation. These solutions can factor in everything from supply and demand to cost.

Introducing machine learning, or even just a simpler non-ML automated technique, can sometimes cause unexpected problems. As an example, consider the automated real-time pricing model Uber used, which took supply and demand as inputs. When fares skyrocketed to over $1,000 as drunk people were looking for a ride on New Year’s Eve, the model created a lot of angry customers and bad press.

More on dangers of AI/ML in a bit…

6. It’s harder to beat humans than you think

One of the reasons ML is often touted as a solution is because of how much better computers allegedly are than humans. While theoretically there is truth to this, when applied to real-world situations we often see a less ideal picture.

Take self-driving cars as an example. Until recently they were touted as “safer than humans”. That was until they began crashing and blowing up.

Take the recent Tesla crash as an example. The AI/ML accidentally latched onto an older, faded lane line rather than the newly painted correct lane line and proceeded, without braking, at full speed into a head-on collision with a divider. A specific, fatal mistake no human would have been likely to make.

The truth is if we remove driving under the influence and falling asleep from the statistics (two things that are illegal anyway), then human accident statistics are incredibly low.

7. ML is Context Specific!

This is an important one. IBM Watson might be able to Google Lady Gaga’s age quickly, but Watson will be completely useless in identifying her in a picture. Machine learning solutions are extremely context specific.

This context specificity also comes into play when training any type of model. The model will only be as good as the training data used to create it, and its similarity to the future data it is used on for predictions.

Model validation methods only test the accuracy of the model on the same exact type of data (typically a random portion of the same dataset); they do not test the quality of the data itself, nor the performance of the model on future data that differs from the training data.
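As a concrete illustration of that limitation, here is a minimal holdout-validation sketch (pure Python, made-up data): the test rows are drawn at random from the same dataset, so whatever accuracy you measure on them says nothing about data from a different context.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=42):
    # Shuffle a copy, then slice off the last test_fraction as the holdout.
    # Both halves come from the SAME dataset -- that is the whole point.
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]   # (train, test)

rows = list(range(100))             # stand-in for 100 survey responses
train, test = holdout_split(rows)
print(len(train), len(test))        # prints "80 20"
```

Nothing in this procedure ever sees data from outside `rows`, which is why validation scores can look great while the model fails on next quarter's respondents.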

Be wary of anyone who claims their AI does all sorts of things well, or does them with 100% accuracy.

My Final Point About Machine Learning & Two Cliff Notes…

If some of the above points make it sound as if I’m not bullish on machine learning, I want to clarify that in fact I am. At OdinText we are continuously testing and implementing ML when it makes sense. I’m confident that we as an industry will get better and better at machine learning.

In the case of Tesla above, there are numerous ways to make the computers more effective, including using special paint that would be easier for computer cameras to see, and traffic lights that send signals telling the computer “I am red”, “I am green”, etc., rather than having to guess via color/light sensing. Things will certainly change, and AI/ML will play an important part.

However, immediately after my talk at the Insights Association I had two very interesting conversations on how to “identify the right AI solution”. In both instances, the buyer was evaluating vendors that made a lot of claims. Way too many, in my opinion.

If you forget everything else from today’s post, please remember these two simple Cliff Notes on AI:

  1. You don’t buy AI; you buy a solution that does a good job solving your need (which may or may not involve AI)
  2. Remember AI is context specific, and not perfect. Stay away from anyone who says otherwise. Select vendors you know you can trust.

There’s no way to know whether something is AI or not without looking at the code.

Unlike academics, who share everything under peer review, companies protect their IP, trade secrets, and code, so there will be no way for you to evaluate whether something actually is “AI” or not.

However, the good news is, this makes your job easier. Rather than reviewing someone’s code, your job is still simply to decide whether the product solves your needs well or not.

In fact, in my opinion it is far more important to choose a vendor who is honest with you about what they can do to solve your problems. If a vendor claims they have AI everywhere that solves all kinds of various needs, and does so with 100% accuracy, run!

@TomHCAnderson

AI and Machine Learning NEXT at The Insights Association
Insight practitioners from Aon, Conagra and Verizon speak out on what they think about AI and Machine Learning

Artificial Intelligence and Machine Learning are hot topics today in many fields, and marketing research is no exception. At the Insights Association’s NEXT conference on May 1 in NYC I've been asked to take part in a practitioner panel on AI to share a bit about how we are using AI in natural language processing and analytics at OdinText.

While AI is an important part of what data mining and text analytics software providers like OdinText do, before the conference I thought I’d reach out to a couple of client-side colleagues to see what they think about the subject.

With me today I have David Lo, Associate Partner at the Scorpio Partnership (a collaboration between McLagan and the Aon Hewitt Corporation); Thatcher Schulte, Sr. Director, Strategic Insights at Conagra Brands; and Jonathan Schwedel, Consumer & Marketplace Insights at Verizon, all of whom will also be speaking at NEXT.

THCA: Artificial Intelligence means different things to different people and companies. What does it mean to you, and how, if at all, are you planning to use it in your departments?

Thatcher Schulte – Conagra:

Artificial intelligence is like many concepts we discuss in business: it’s a catch-all that loses its meaning as more and more people use it.  I’ve even heard people refer to “Macros” as AI.  To me it means trying to make machines make decisions like people would, but that begs the question of whether it would be “intelligent.”  I make stupid decisions all the time.

We’re working with Voice to make inferences on what help consumers might need as they make decisions around food.

Jonathan Schwedel – Verizon:

I'm not a consumer insight professional - I'm a data analyst who works in the insights department, so my perspective is different. There are teams in other parts of Verizon who are doing a lot with more standard artificial intelligence and machine learning approaches, so I want to be careful not to conflate the term with broader advanced analytics. I have this image of cognitive scientists sitting in a lab, and am tempted to reduce "AI" to that.

For our specific insights efforts, we work on initiatives that are AI-adjacent - with automation, predictive modeling, machine learning, and natural language processing, but with a few exceptions those efforts are not scaled up, and are ad hoc on a project by project basis. We dabble with a lot of the techniques that are highlighted at NEXT, but I'm not knowledgeable enough about our day to day custom research efforts to speak well to them. One of the selling points of the knowledge management system we are launching is that it's supposed to leverage machine learning to push the most relevant content to our researchers and partners around our company.

David Lo – Scorpio Partnership/McLagan:

Working in the financial services space, and specifically within wealth management, AI is a hot topic as it relates to how it will change advice delivery.

[we are looking at using it for] Customer journey mapping through the various touchpoints they have with an organization.

 

THCA: There’s a lot of hype these days around AI. What is your impression on what you’ve been hearing, and about the companies you’ve been hearing it from, is it believable?

Thatcher Schulte - Conagra:

I don’t get pitched on AI a lot except through email, which frankly hurts the purpose of those people pitching me solutions.  I don’t read emails from vendors.

Jonathan Schwedel – Verizon:

It's easy to tell if someone does not have a minimum level of domain expertise. The idea that any tool or platform can provide instant shortcuts is fiction. Most of the value in these techniques is very matter-of-fact and practical. Fantastic claims demand a higher level of scrutiny. If instead the conversation is about how much faster, cheaper, or easier they are, those are at least claims that can be quickly evaluated.

David Lo – Scorpio Partnership/McLagan:

Definitely a lot of hype.  I think as it relates to efficiency, the hype is real.  We will continue to see complex tasks such as trade execution optimized through AI.

 

THCA: For the Insights function specifically, how ready do you think the idea of completely unsupervised vs. supervised/guided AI is? In other words, do you think that the one-size-fits-all AI provided by the likes of Microsoft, Amazon, Google and IBM is very useful for research, or does AI need to be more customized and fine-tuned/guided before it can be very useful to you?

And related to this, which areas of market research do you think are currently better suited to AI?

 Thatcher Schulte - Conagra:

Data sets are more important to me than the solutions that are in the market.  Food decision making is specialized and complex, and it varies greatly by what life stage you are in and where you live. Valid data around those factors is frankly more important than the company we push the data through.

David Lo – Scorpio Partnership/McLagan:

Guard rails are always important, particularly as it relates to unique customer needs.

[In terms of usefulness to market research] Data mining.

Jonathan Schwedel – Verizon:

Most custom quantitative research studies use small sample sizes, which often makes bespoke advanced analytics infeasible. When you are working with much larger data sets (the kind you'd see in analytics as a function, as opposed to insights), AWS and Azure let you scale, especially with limited resources. A good general approach is to use algorithmic techniques with brand-new data sets, and then start customizing when you hit the point of diminishing returns, in a way that your work can later be automated at scale.

[In regard to marketing research] It depends how you're defining research - are we broadening that to customer experience? Then text analytics is a most prominent area, because there are many prominent use cases for large companies at the enterprise level. If "market research" covers broader buckets of customer data, then there's potentially a lot you can do.

 

THCA: OK, so which areas are currently less well suited to AI?

David Lo – Scorpio Partnership/McLagan:

Hard to say, but probably less suited toward qualitative research.  In my line of business we do a lot of work among UHNW investors where sample sizes are very small and there isn’t a lot of activity in the online space.

Jonathan Schwedel – Verizon:

I think sample size is often an issue when talking about research studies. Then it comes down to the research design. Is the machine learning component going to be baked in from the start, or is it just bolted on? A lot of these efforts are difficult to quantify. Verizon's insights group learns things all the time from talking to and observing consumers that we would not have otherwise thought to ask.

 

THCA: Does anyone have thoughts on usefulness of chat bots and/or other social media/twitter bots currently?

Jonathan Schwedel – Verizon:

They could potentially allow you to collect a lot more data, and reach under-represented consumer groups in the channels that they want to be in. A lot of our team's focus at Verizon is on the user experience and building a great digital experience for our customers. I think they will be important tools to understand and improve in that area.

 

THCA: Realistically where do you see AI in market research being 3-4 years from now?

David Lo – Scorpio Partnership/McLagan:

Integrated more fully with traditional quantitative research techniques, with researchers re-focusing their efforts on the more creative and thoughtful interpretations of the output.

Jonathan Schwedel – Verizon:

They will provide some new techniques that will be important for specific use cases, but I think the bulk of the fruitful efforts will come from automation and improved scalability. The desire to do more with less is pretty universal, and there's a good roadmap there. The prospect of genuinely groundbreaking insights offers a lot more uncertainty, but it would be great if we do see that level of innovation.

 

Big thanks to Jonathan, David and Thatcher for sharing their insights and opinions on AI.

If you’re interested in further discussion on AI and Machine Learning please feel free to post a comment here, or join me for the 'What’s New & What’s Ahead for AI & Machine Learning?' panel on May 1st. I will be joined by John Colias of Decision Analyst, Andrew Konya of remesh, and moderator Kathryn Korostoff of Research Rockstar.

-Tom H. C. Anderson @OdinText

 

PS. If you would like to learn more about how OdinText can help you better understand your customers and employees, feel free to request more info here. If you’re planning on attending the conference, feel free to use my speaker code for a $150 discount [ODINTEXT]. I look forward to seeing some of you at the event!

 

Artificial Intelligence in Consumer Insights

A Q&A session with ESOMAR’s Research World on Artificial Intelligence, Machine Learning, and implications in Marketing Research  [As part of an ESOMAR Research World article on Artificial Intelligence, OdinText Founder Tom H. C. Anderson recently took part in a Q&A-style interview with ESOMAR’s Annelies Verheghe. For more thoughts on AI, check out other recent posts on the topic, including Why Machine Learning is Meaningless, and Of Tears and Text Analytics. We look forward to your thoughts or questions via email or in the comments section.]

 

ESOMAR: What is your experience with Artificial Intelligence & Machine Learning (AI)? Would you describe yourself as a user of AI or a person with an interest in the matter but with no or limited experience?

TomHCA: I would describe myself as both a user of Artificial Intelligence as well as a person with a strong interest in the matter even though I have limited mathematical/algorithmic experience with AI. However, I have colleagues here at OdinText who have PhD's in Computer Science and are extremely knowledgeable as they studied AI extensively in school and used it elsewhere before joining us. We continue to evaluate, experiment, and add AI into our application as it makes sense.

ESOMAR: For many people in the research industry, AI is still unknown. How would you define AI? What types of AI do you know?

TomHCA: Defining AI is a very difficult thing to do because people, whether they are researchers, data scientists, salespeople, or customers, will each have a different definition. A generic definition of AI is a set of processes (whether hardware, software, mathematical formulas, algorithms, or something else) that give anthropomorphically cognitive abilities to machines. This is evidently a wide-ranging definition. A more specific definition of AI pertaining to Market Research is a set of knowledge representation, learning, and natural language processing tools that simplifies, speeds up, and improves the extraction of meaningful data.

The most important type of AI for Market Research is Natural Language Processing. While extracting meaningful information from numerical and categorical data (e.g., whether there is a correlation between gender and brand fidelity) is essentially an easy and now-solved problem, doing the same with text data is much more difficult and still an open research question studied by PhDs in the field of AI and machine learning. At OdinText, we have used AI to solve various problems such as Language Detection, Sentence Detection, Tokenizing, Part of Speech Tagging, Stemming/Lemmatization, Dimensionality Reduction, Feature Selection, and Sentence/Paragraph Categorization. The specific AI and machine learning algorithms that we have used, tested, and investigated range a wide spectrum from Multinomial Logit to Principal Component Analysis, Principal Component Regression, Random Forests, Minimum Redundancy Maximum Relevance, Joint Mutual Information, Support Vector Machines, Neural Networks, and Maximum Entropy Modeling.
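Two of the simpler steps in that list, tokenizing and stemming, can be sketched in a few lines. This is a deliberately crude pure-Python illustration (the suffix rules are toy ones; real stemmers and lemmatizers use far richer rules and dictionaries):

```python
import re

def tokenize(text):
    # Split lowercased text into word tokens.
    return re.findall(r"[a-z']+", text.lower())

def stem(token):
    # Crude rule-based stemming: strip a few common English suffixes.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

print([stem(t) for t in tokenize("Customers loved the fast shipping")])
# prints "['customer', 'lov', 'the', 'fast', 'shipp']"
```

Even this toy version shows why the downstream steps matter: "lov" and "shipp" are not words a human wants to read, which is exactly where the heavier ML comes in.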

AI isn’t necessarily something everyone needs to know a whole lot about. I blogged recently about how I felt it was almost comical how many people were mentioning AI and machine learning at MR conferences I was speaking at, seemingly without any idea what it means. http://odintext.com/blog/machine-learning-and-artificial-intelligence-in-marketing-research/

In my opinion, a little AI has already found its way into a few of the applications out there, and more will certainly come. But if it is successful, it won’t be called AI for long. If it’s any good it will just be a seamless integration helping to make certain processes faster and easier for the user.

ESOMAR: What concepts should people that are interested in the matter look into?

TomHCA: Unless you are an Engineer/Developer with a PhD in Computer Science, or someone working closely with someone like that on a specific application, I’m not all that sure how much sense it makes for you to be ‘learning about AI’. Ultimately, in our applications, they are algorithms/code running on our servers to quickly find patterns and reduce data.

Furthermore, as we test various algorithms from academia, and develop our own to test, we certainly don’t plan to share any specifics about this with anyone else. Once we deem something useful, it will be incorporated as seamlessly as possible into our software so it will benefit our users. We’ll be explaining to them what these features do in layman’s terms as clearly as possible.

I don’t really see a need for your typical marketing researcher to know too much more than this in most cases. Some of the algorithms themselves are rather complex to explain and require strong mathematical and computer science backgrounds at the graduate level.

ESOMAR: Which AI applications do you consider relevant for the market research industry? For which task can AI add value?

TomHCA: We are looking at AI in areas of Natural Language Processing (which includes many problem subsets such as Part of Speech Tagging, Sentence Detection, Document Categorization, Tokenization, and Stemming/Lemmatization), Feature Selection, Data Reduction (i.e., Dimensionality Reduction) and Prediction. But we've gone well beyond that. As a simple example, take key driver analysis. If we have a large number of potential predictors, which are the most important in driving a KPI like customer satisfaction?
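A minimal sketch of that kind of key-driver screen, in pure Python with made-up survey numbers, ranks candidate predictors by the strength of their correlation with the KPI. (Real implementations use richer techniques like the random forests and mutual-information methods named above; the ranking idea is the same.)

```python
def corr(xs, ys):
    # Pearson correlation between two equal-length lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def key_drivers(predictors, kpi):
    # predictors: {name: [values]}; rank by absolute correlation with the KPI.
    return sorted(predictors, key=lambda p: -abs(corr(predictors[p], kpi)))

satisfaction = [7, 9, 4, 8, 3]           # hypothetical KPI scores
drivers = key_drivers(
    {"wait_time": [5, 2, 9, 3, 10],      # strongly (negatively) related
     "price":     [6, 4, 5, 3, 7],       # weakly related
     "staff":     [8, 9, 5, 8, 4]},      # strongly (positively) related
    satisfaction,
)
print(drivers)  # strongest driver first
```

The names and numbers here are purely illustrative, but the shape of the question is the real one: many candidate predictors in, a ranked shortlist out.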

ESOMAR: Can you share any inspirational examples from this industry or related industries (advertisement, customer service) that can illustrate these opportunities?

TomHCA: As one quick example, a user of OdinText I recently spoke to used the software to investigate which text comments were most likely to drive membership in one of several predefined important segments. The nice thing about AI is that it can be very fast. The not-so-nice thing is that sometimes, at first glance, some of the items identified in the output can either be too obvious or, at the other extreme, not make any sense whatsoever. The gold is in the items somewhere in the middle. The trick is to find a way for the human to interact with the output that gives them confidence in and understanding of the results.

A human is not capable of correctly analyzing thousands, hundreds of thousands, or even millions of comments/datapoints, whereas AI will do it correctly in a few seconds. The downside of AI is that some outcomes are correct but not humanly insightful or actionable. It’s easier for me to give examples of when it didn’t work so well, since it’s hard for me to share info on how our clients are using it. For instance, AI recently found that people mentioning ‘good’ three times in their comments was the best driver of NPS score – this is evidently correct but not useful to a human.

In another project, a new AI approach we were testing reported that one of the most frequently discussed topics was “Colons”. But this wasn’t medical data! It turns out the plural of colon is cola; I didn’t know that. Anyway, people were discussing Coca-Cola, and the AI read that as colons…  This is exactly the part of AI that needs work to be more prevalent in Market Research.

Since I can’t talk too much about how our clients use our software on their data, in a way it’s easier for me to give a non-MR example. Imagine getting into a totally autonomous car (notice I didn’t have to use the word AI to describe that). Anyway, you know it’s going to be traveling 65 mph down the highway, changing lanes, accelerating and stopping along with other vehicles, etc.

How comfortable would you be stepping into that car today if we had painted all the windows black so you couldn’t see what was going on?  Chances are you wouldn’t want to do it. You would worry at every turn that you might be a casualty of oncoming traffic or a tree.  I think that’s partly what AI is like right now in analytics. Even if we’re able to perfect the output to be 99 or 100% correct, without knowing how we got there it will make you feel a bit uncomfortable.  Yet showing you exactly what the algorithm did to arrive at the solution is very difficult.

Anyway, the upside is that in a few years perhaps (not without some significant trial and error and testing), we’ll all just be comfortable enough to trust these things to AI. In my car example, you’d be perfectly fine getting into an Autonomous car and never looking at the road, but instead doing something else like working on your pc or watching a movie.

The same could be true of a marketing research question. Ultimately the end goal would be to ask the computer a business question in natural language, written or spoken, and the computer deciding what information was already available, what needed to be gathered, gathering it, analyzing it, and presenting the best actionable recommendation possible.

ESOMAR: There are many stories on how smart or stupid AI is. What would be your take on how smart AI is nowadays? What kinds of research tasks can it perform well? Which tasks are hard for bots to take over?

TomHCA: You know I guess I think speed rather than smart. In many cases I can apply a series of other statistical techniques to arrive at a similar conclusion. But it will take A LOT more time. With AI, you can arrive at the same place within milliseconds, even with very big and complex data.

And again, the fact that we choose the technique based on which one takes a few milliseconds less to run, without losing significant accuracy or information really blows my mind.

I tell my colleagues working on this that hey, this can be cool, I bet a user would be willing to wait several minutes to get a result like this. But of course, we need to think about larger and more complex data, and possibly adding other processes to the mix. And of course, in the future, what someone is perfectly happy waiting for several minutes today (because it would have taken hours or days before), is going to be virtually instant tomorrow.

ESOMAR: According to an Oxford study, there is a 61% chance that the market research analyst job will be replaced by robots in the next 20 years. Do you agree or disagree? Why?

TomHCA: Hmm. 20 years is a long time. I’d probably have to agree in some ways. A lot of things are very easy to automate, others not so much.

We’re certainly going to have researchers, but there may be fewer of them, and they will be doing slightly different things.

Going back to my example of autonomous cars for a minute: I think it will take time for us to learn, improve and trust more in automation. At first, autonomous cars will have a human capable of taking over at any time. It will be like cruise control is now, an accessory at first. Then we will move more and more toward trusting less in individual human actors, and once we’ve got enough statistics on computers being safe, we may even decide to take away the ability for humans to intervene in driving the car, as a safety measure. They would have to reach a level of safety way beyond humans for this to happen though, probably 99.99% or more.

Unlike cars though, marketing research usually can’t kill you. So, we may well be comfortable with a far lower accuracy rate for AI here.  Anyway, it’s a nice problem to have, I think.

ESOMAR: How do you think research participants will react towards bot researchers?

TomHCA: Theoretically they could work well. Realistically I’m a bit pessimistic. Bots can be used for spam, phishing and fraud in a global online wild west (it cracks me up how certain countries think they can control the web and make it safer), and that is a problem no government or trade organization will be able to prevent.

I’m not too happy when I get a phone call or email about a survey now. But with the slower, more human aspect, it seems a little less dangerous; you have more time to feel comfortable with it. I guess I’m playing devil’s advocate here, but I think we already have so many ways to get various interesting data that I have time to wait regarding bots. If they truly are going to be very useful and accepted, it will be proven in other industries way before marketing research.

But yes, theoretically it could work well. But then again, almost anything can look good in theory.

ESOMAR: How do you think clients will feel about the AI revolution in our industry?

TomHCA: So, we were recently asked to use OdinText to visualize what 3,000 marketing research suppliers and clients thought about why certain companies were innovative or not in the 2017 GRIT Report. One of the analyses/visualizations we ran, which I thought was most interesting, showed the differences between why clients claimed a supplier was innovative vs. why suppliers said these firms were innovative.

I published the chart on the NGMR blog for those who are interested [ http://nextgenmr.com/grit-2017 ], and the differences couldn’t have been starker. Suppliers kept on using buzzwords like “technology”, “mobile” etc. whereas clients used real end result terms like “know how”, "speed" etc.

So I’d expect to see the same thing here. And certainly, as AI is applied as I said above, and is implemented, we’ll stop thinking about it as a buzz word, and just go back to talking about the end goal. Something will be faster and better and get you something extra, how it gets there doesn’t matter.

Most people have no idea how a gasoline engine works today. They just want a car that will look nice and get them there with comfort, reliability and speed.

After that it’s all marketing and brand positioning.

 

[Thanks for reading today. We’re very interested to hear your thoughts on AI as well. Feel free to leave questions or thoughts below, request info on OdinText here, or Tweet to us @OdinText]

Text Analytics Identifies Globalization Impact on Culture

International Text Analytics Poll™ Explores 11 Cultures in 10 Countries and 8 Languages! [Part I]

When pundits declare that the western world is now in the throes of a globalization “backlash,” they’re generally referring to the reversal of decades of economic and trade policy, things like Brexit.

But what of other concerns typically associated with globalization? What about culture?

Specifically, there are those who argue that globalization will mean the end of cultures, that the various cultures of the world will over time dilute and blend until there is ultimately just one global melting pot culture.

They may be right.

When we think about culture, it’s often in terms of food, music, customs, etc., but it turns out that when you ask people in countries around the world to describe their own culture in their own words, one nearly universal and unexpected attribute rises to the top: diversity/multiculturalism.

In fact, multiculturalism/diversity was one of the primary and most frequently mentioned attributes used by over 15,500 people to describe 11 different cultures across 10 countries and eight languages!

Text Analytics on a Massive, Multilingual International Scale

Last week on this blog, we published the results of a Text Analytics Poll™ for the favorite movie of all time across six countries and five languages. The project generated a flood of inquiries.

Since everyone is so interested in what can be accomplished on an international scale, we increased the scope of this project significantly.

This time, we asked more than 15,500 people (at least n=1,500 per country) in 10 countries and eight languages the following:

“How would you explain <insert country> culture to someone who isn’t at all familiar with it?”

Then we ran their comments through OdinText, which identified the top 200 cultural markers or features from more than 15,500 text comments and also analyzed those comments for significant patterns of emotion.

How We Translated AND Analyzed the Data (In Less Than Two Hours)

Author’s note: If you’re not interested in methodology, please feel free to skip ahead to the results down below!

Many of you contacted us asking for more details last week, so I’ve provided some additional nuts and bolts here…

Step 1: Data Prep (Translation)

I usually limit total analytical time for any of these Text Analytics Poll™ projects to fewer than two hours. I admit that’s going to be a challenge today, as I’m looking at more than 15,500 comments across 11 cultures from 10 countries in eight languages.

The first challenge is translation. I happen to speak a few languages in addition to English, but in this case I’m faced with seven languages that I don’t understand well enough to analyze. If I did understand each of the languages, or were working with analysts who did, we could easily conduct the analysis in OdinText in the native form.

I’ll point out that while some corporations claim to be “global” in everything they do, in reality there is never enough language fluency at corporate to handle this type of analysis, so analyses are typically divvied up and entrusted to local divisions—a time-consuming and imperfect task, especially when the goal in this case is to make head-to-head comparisons across these countries.

Therefore translation is necessary. While less precise than human translation, machine translation lends itself quite well to a project like this and is more than sufficient for OdinText to identify patterns and even to determine which quotes should be of interest. Nothing has a better ROI. Case in point: it took two minutes to translate the data. For those keeping track, the clock so far reads two minutes.

Above we have an example of machine-translated raw data vs. the original French from the multi-country movie analysis I conducted last week. In the case above I’m looking at all mentions of “La Ligne Verte,” a title OdinText identified as appearing frequently among comments from French respondents. I don’t speak French, so I prefer to work with the machine-translated data on the left, which translated “La Ligne Verte” literally to “The Green Line” – the French title for the U.S. movie “The Green Mile.”
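As a rough illustration of this pre-processing step, here is a minimal Python sketch of batch machine translation. The `machine_translate` function is a hypothetical stand-in (a tiny glossary lookup), not a real translation API and not what OdinText actually calls:

```python
# Sketch of Step 1 (translation pre-processing). `machine_translate` is a
# hypothetical stub; a real implementation would call a translation service.
def machine_translate(text: str, source: str, target: str = "en") -> str:
    glossary = {"La Ligne Verte": "The Green Line"}  # illustrative only
    return glossary.get(text, text)

# Comments grouped by source language, as they might arrive from fieldwork.
comments = {
    "fr": ["La Ligne Verte"],
    "en": ["The Green Mile"],
}

# Translate every comment into English before analysis.
translated = [
    machine_translate(c, source=lang)
    for lang, batch in comments.items()
    for c in batch
]
print(translated)  # ['The Green Line', 'The Green Mile']
```

The point of the pattern is simply that all languages are normalized into one working language before any topic identification happens.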

Step 2: Topic Identification

Using the top-down/bottom-up approach we teach in OdinText training and which we’ve blogged about here before, we identify 200 or so topics/features for analysis. This is a semi-supervised approach, and so a human is involved.

Given this somewhat larger multi-country data set, I allowed about 45 minutes for this task, so we’re at roughly 47 minutes of total analytical time.
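For readers curious what a semi-supervised, top-down/bottom-up topic pass might look like in miniature, here is a hedged Python sketch. The comments, stopword list and topic dictionary are illustrative inventions, not OdinText’s actual implementation:

```python
from collections import Counter
import re

comments = [
    "Our culture is diverse and multicultural, with great food",
    "Tradition and politeness matter; the food is amazing",
    "A multicultural melting pot with strong traditions",
]

# Bottom-up: surface the most frequent content words as candidate topics
# for a human analyst to review.
stopwords = {"our", "is", "and", "with", "the", "a", "of"}
tokens = [w for c in comments for w in re.findall(r"[a-z]+", c.lower())
          if w not in stopwords]
candidates = Counter(tokens).most_common(10)

# Top-down: an analyst-defined dictionary mapping topics to trigger terms.
topics = {
    "diversity": {"diverse", "multicultural", "diversity"},
    "food": {"food", "cuisine"},
    "tradition": {"tradition", "traditions", "traditional"},
}

def tag(comment: str) -> set[str]:
    """Assign each comment the topics whose trigger terms it contains."""
    words = set(re.findall(r"[a-z]+", comment.lower()))
    return {t for t, terms in topics.items() if words & terms}

print([tag(c) for c in comments])
```

The “semi-supervised” part is exactly this loop: the machine proposes frequent terms, and a human curates them into a topic dictionary that then tags every comment.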

Step 3: Artificial Intelligence and Structuring the Analysis

Structuring the analysis is the most important and the most difficult part of any project, especially an exploratory mission where you don’t know what you are looking for at the outset.

You may be surprised to know that artificial intelligence and advanced machine learning algorithms can be a lot less useful than one might think. They have a tendency to identify the obvious—the attribute/topic “tradition” in this case—or, in some cases, the unexplainable. For instance, terms like “French,” “American,” “Japanese,” “Spanish,” etc., came up in responses to our question. These are, of course, very useful if you’re building an algorithm to predict where comments originate, but they aren’t terribly illuminating for us here.

Examples of other topics auto identified as ‘of interest’ by our AI include “friendliness,” “relaxed/laid back,” “freedom,” and “equality fraternity liberty.” (You can probably guess where that last one came from.) Some of these other, less expected ones warrant a closer look and will be included in the analysis.

We could move right into an exhaustive analysis of each country, but I’m looking to quickly find any interesting patterns in this data, so I elect to use a quick visualization first.

Cultural Differences and Similarities Visualized

Cultural Differences and Similarities Visualized (A Few Key Descriptive Dimensions Added)

These visualizations (above) plot cultures that were described in more similar terms by people closer together and those that were described more differently further apart, yielding some interesting patterns. The USA, UK, Brazil, France and even Spain look quite similar. Two countries—Germany and Japan—cluster slightly away from this main bunch, but very close to each other. Then there are those that appear to be most dissimilar from the rest—Mexico, French- and English-speaking Canada, respectively, and Australia.

To my earlier question about whether or not globalization is having a homogenizing effect on cultures, it would appear so at a glance. We’ve noted that several countries cluster closely around the U.S. But look again—the U.S. appears to occupy the center of the cultural universe here! That’s no coincidence, I suspect, as U.S. culture could in many ways be considered the “melting pot” model and, as we saw last week, culture is a major U.S. export.
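For the technically curious, a similarity map like the ones above can be sketched generically: build per-country topic-rate vectors, compute pairwise distances, and project to two dimensions with classical multidimensional scaling (MDS). This is an illustrative sketch with made-up numbers, not OdinText’s actual algorithm:

```python
import numpy as np

# Toy topic-mention rates (rows: countries, cols: topics); values invented.
countries = ["USA", "UK", "Germany", "Japan", "Mexico"]
X = np.array([
    [0.30, 0.10, 0.05, 0.20],
    [0.28, 0.12, 0.06, 0.18],
    [0.10, 0.05, 0.30, 0.25],
    [0.08, 0.06, 0.28, 0.30],
    [0.15, 0.30, 0.05, 0.10],
])

# Pairwise Euclidean distances between countries' topic profiles.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Classical MDS: double-center the squared distances, then use the top two
# eigenvectors (scaled by sqrt of eigenvalues) as 2-D coordinates.
n = len(countries)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)          # ascending eigenvalues
order = np.argsort(vals)[::-1]          # largest first
coords = vecs[:, order[:2]] * np.sqrt(np.maximum(vals[order[:2]], 0))

for name, (x, y) in zip(countries, coords):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

Countries described in similar terms end up with similar topic vectors, small pairwise distances, and therefore nearby points on the 2-D map, which is exactly the clustering pattern discussed above.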

Analytical time to review multiple visualizations and decide that this is a repeating pattern was 10 minutes, putting total analytical time at roughly 57 minutes.

Given that we have a full hour left (remember I did not want to spend more than two hours on this analysis), as a next step we conducted a little bottom-up work to look at what makes each country unique from the international aggregate/total and to see whether the pattern in the visualization makes sense.

Example: Why do Germany and Japan look so similar to OdinText?

A glance at the two charts below shows significant differences between how the Japanese and Germans describe their cultures. For instance, the Japanese were 11 times more likely than Germans to say their culture was something that needed to be experienced in order to be understood, and they were four times more likely than Germans to mention their history. They were also 14 times less likely to mention certain places of interest and three times more likely than Germans to mention food.

In contrast, Germans were 27 times more likely to mention beer and eight times more likely to describe their culture as rule-abiding and orderly. (Of course, this does not mean that Japanese culture is any less rule-abiding or orderly; rather, it suggests that for the Japanese these are not defining cultural characteristics.)

Respondents from both countries were more likely than average to mention language, tradition, and politeness, BUT the similarities between these two cultures actually lie primarily in the extent to which they both differ from the other cultures sampled, notably in how infrequently certain features mentioned by people from other cultures appeared in comments from German and Japanese respondents. Total analytical time: still within the two-hour budget.
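The “X times more likely” figures above are simply ratios of mention rates between countries. A minimal sketch, with illustrative counts rather than the study’s actual data:

```python
# Ratio of the share of respondents in each country whose comments mention
# a topic. The counts here are invented to illustrate the arithmetic.
def mention_rate(n_mentioning: int, n_respondents: int) -> float:
    return n_mentioning / n_respondents

germany = {"beer": 270, "respondents": 1500}
japan = {"beer": 10, "respondents": 1500}

ratio = (mention_rate(germany["beer"], germany["respondents"])
         / mention_rate(japan["beer"], japan["respondents"]))
print(f"Germans were {ratio:.0f}x more likely to mention beer")
```

With equal sample sizes the ratio reduces to a simple count ratio (270/10 = 27), which is how a “27 times more likely” headline statistic arises.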

This concludes Part 1 of our cultural safari. In Part 2 tomorrow we’ll take a deeper dive into each of the 11 cultures in our study individually, exploring how their members define themselves and the extent to which key cultural drivers differ from or are similar to the international aggregate. Stay tuned!

Tomorrow: Part II – Key Cultural Drivers in Their Own Words

@TomHCAnderson - @OdinText

PS. Have questions about today's post? Feel free to post a comment or request more info here.

About Tom H. C. Anderson

Tom H. C. Anderson is the founder and managing partner of OdinText, a venture-backed firm based in Stamford, CT whose eponymous, patented SaaS platform is used by Fortune 500 companies like Disney, Coca-Cola and Shell Oil to mine insights from complex, unstructured and mixed data. A recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research, Anderson is the recipient of numerous awards for innovation from industry associations such as CASRO, ESOMAR and the ARF. He was named one of the "Four under 40" market research leaders by the American Marketing Association in 2010. He tweets under the handle @tomhcanderson.

 

Shop Talk on Research Trends: Our Interview with the Industry’s Top Pundit!

GreenBook Interview Covers Partnering, AI/Machine Learning and the Latest Insights Applications for Text Analytics

“We should be less worried about each other and more worried about the potential new entrants to this industry.”

That’s what I told GreenBook Blog Editor-in-Chief Leonard Murphy in a recent interview when he asked me about the trend toward partnering and collaboration between research providers.

It’s not often that one gets to talk shop at length with the industry’s top pundit, so Tim Lynch and I were delighted when Lenny invited us for a frank and broad-based discussion that covered some important ground, including:

  • Why partnering and collaboration among research companies is becoming a critically important factor in today’s marketplace;
  • What the buzz around AI and machine learning is really about and what researchers need to know;
  • How text analytics are being deployed in powerful and novel ways to produce insights that either were not accessible or couldn’t be obtained practically in the past.

Check out Lenny’s post about it here and have a look at the interview below:

 

Special thanks again to Lenny Murphy for a great interview and for your efforts to keep us all informed and to help us get better at what we do!

@TomHCAnderson  - @OdinText

P.S. Want to know more about anything we covered in the interview? Contact us here.

 


 

Why Machine Learning is Meaningless

Beware These Buzzwords! The Truth About "Machine Learning" and "Artificial Intelligence"

Machine learning, artificial intelligence, deep learning… Unless you’ve been living under a rock, chances are you’ve heard these terms before. Indeed, they seem to have become a must for market researchers.

Unfortunately, so many precise terms have never meant so little!

For computer scientists these terms entail highly technical algorithms and mathematical frameworks; to the layman they are synonyms; but as far as most of us should be concerned, increasingly, they are meaningless.

My engineers would severely chastise me if I used these words incorrectly—an easy mistake to make since there is technically no correct or incorrect way to use these terms, only strict and less strict definitions.

Nor, evidently, is there any regulation about how they’re used for marketing purposes.

(To simplify the rest of this blog post, let’s stick with the term “machine learning” as a catch-all.)

Add to this ambiguity the fact that no sane company would ever divulge the specifics underpinning their machine learning solution for fear of intellectual property theft. Still others may just as easily hide behind an IP claim.

Bottom line: It is simply impossible for clients to know what they are actually getting from companies that claim to offer machine learning unless the company is able and chooses to patent said algorithm.

It’s an environment that is ripe for unprincipled or outright deceitful marketing claims.

A Tale of Two Retailers

Not all machine learning capabilities are created equal. To illustrate, let’s consider two fictitious competing online retailers who use machine learning to increase their add-on sales:

  • The first retailer suggests other items that may be of interest to the shopper by randomly picking a few items from the same category as the item in the shopper’s cart.

 

  • The second retailer builds a complex model of the customer, incorporating spending habits, demographic information and historical visits, then correlates that information with millions of other shoppers who have a similar profile, and finally suggests a few items of potential interest by analyzing all of that data.

In this simplistic example, both retailers can claim they use machine learning to improve shoppers’ experiences, but clearly the second retailer employs a much more sophisticated approach. It’s simply a matter of the standard to which they adhere.
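A toy sketch of the two approaches makes the gap concrete. All product SKUs, categories and purchase histories below are invented for illustration, and the second retailer is reduced to a bare-bones co-occurrence recommender rather than the full customer model described above:

```python
import random
from collections import Counter

# Invented catalog: SKU -> category.
catalog = {
    "sku1": "shoes", "sku2": "shoes", "sku3": "shoes",
    "sku4": "hats", "sku5": "hats",
}

# Retailer 1: random picks from the same category as the cart item.
def suggest_random(cart_item: str, k: int = 2) -> list[str]:
    category = catalog[cart_item]
    pool = [s for s, c in catalog.items() if c == category and s != cart_item]
    return random.sample(pool, min(k, len(pool)))

# Retailer 2 (toy version): recommend what similar shoppers bought.
purchase_history = {
    "alice": {"sku1", "sku4"},
    "bob": {"sku1", "sku5"},
    "carol": {"sku2", "sku4"},
}

def suggest_collaborative(shopper_basket: set, k: int = 2) -> list[str]:
    # Score items by how often they co-occur with the shopper's basket
    # among other shoppers with overlapping purchases.
    scores = Counter()
    for other in purchase_history.values():
        if other & shopper_basket:
            for item in other - shopper_basket:
                scores[item] += 1
    return [item for item, _ in scores.most_common(k)]

print(suggest_collaborative({"sku1"}))
```

Both functions could be marketed as “machine learning,” yet only the second actually learns anything from shopper behavior, which is the whole point of the example.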

This is precisely what I’m seeing in the insights marketplace today.

At the last market research conference I attended, I was stunned by how many vendors—no matter what they were selling—claimed their product leveraged advanced machine learning and artificial intelligence.

Many of the products being sold would not even benefit from what I would classify as machine learning because the problems they are solving are so simple.

Why run these data through a supercomputer and subject them to very complicated algorithms only to arrive at the same conclusions you could come to with basic math?

Even if all these companies actually did what they claimed, in many cases it would be silly or wasteful.

Ignore Buzzwords, Focus on Results

In this unregulated, buzzword-heavy environment, I urge you to worry less about what it’s called and focus instead on how the technology solves problems and meets your needs.

At OdinText, we use advanced algorithms that would be classified as machine learning/AI, yet we refrain from using these buzzwords because they don’t really say anything.

Look instead for efficacy, real-world results and testimonials from clients who have actually used the tool.

And ALWAYS ask for a real-time demo with your ACTUAL data!

Yours truly,

@TomHCanderson

Ps. See firsthand how OdinText can help you learn what really matters to your customers and predict real behavior. Contact us for a demo using your own data here!


65 CEOs Share Thoughts on Insights

Insight Association’s Inaugural CEO Summit: Future Tied to Collaboration and Technology

Writing this at the Miami Airport, as I’ve just finished up a great three-day meeting of the minds at the new Insights Association’s first official event, the Marketing Research CEO Summit.

Though this event was formerly part of the Marketing Research Association (MRA), after the merger between the MRA and the Council of American Survey Research Organizations (CASRO) it is now part of the brand-new Insights Association. This is also the reason I chose to attend the event for the first time this year. I, like many others, am eager for positive change in our industry and optimistically welcome new initiatives (as I mentioned in a post on their founding earlier this month).

Steve Schlesinger, CEO of Schlesinger Associates and Merrill Dubrow of M/A/R/C Research did a great job putting together and hosting the event.

While the obvious benefit of any event like this is the attendees rather than the speakers, we had some interesting and well-respected client guests, including Walmart’s Urvi Bhandari, Merck’s Lisa Courtade, Electrolux’s Brett Townsend and Humana’s Dhan Kashyap. Their very candid evaluations of how well the industry is delivering (hint: not nearly as well as we think) were worth the cost of attendance.

Getting back to the attendees, though: market researchers as a breed are a cautious bunch, and CEOs in any industry are likely to be “alphas.” Quickly gaining trust and enabling sharing among this audience of would-be competitors is not an easy task. It was made possible in part by a fun case study competition sponsored by La Quinta CEO Keith Cline, who also spoke at the event.

Another interesting aspect of the event was the Hot Seat interviews, wherein a handful of the CEOs in attendance were asked a series of tough and sometimes semi-personal questions. I was one of those selected for this impromptu exercise and was asked what I thought about various aspects of the future of marketing research, including digital/social (which I like to separate from other text analytics) and, of course, the topic of machine learning/AI, which seems to be on everyone’s mind. For that reason I’ve decided to do a short blog post on AI and machine learning later this week.

What I’d like to end this post with, though, is re-answering a question that I think Merrill indirectly asked me, and that a couple of other attendees asked as well. I think the question is also related to the future of research: do you think of yourself as a marketing research company CEO or a software CEO? [Prior to founding OdinText Inc. in 2015 I ran the boutique research firm Anderson Analytics for 10 years.]

I admit it’s a tricky question, and obviously if I didn’t consider myself at least in part a marketing research CEO I wouldn’t have attended. Yet many of our software users definitely aren’t market researchers.

So here goes: I think we as an industry have an important skill set and an understanding of our clients that no outsider has. I’m proud of this background. As other speakers, including ZappiStore’s CRO Ryan Barry and Dan Foreman of Hatted, pointed out, the future is not in resisting technology, nor is it necessarily in building your own technology, which can be time-consuming and wasteful. It’s about embracing technology, often by learning to rent or partner with technology experts, and adding what you are best at (often data and, as importantly, consultative insights and strategy).

Several of the CEOs I spoke with separately admitted having tried various internal technology builds that either weren’t right or, in some cases, may have been right when the effort began but didn’t evolve quickly enough and so were outdated by the time they came to market.

Yet it was also quite clear to most of these CEOs that while it’s critical to watch out for new technology-oriented entrants into the market research space, more often than not they simply do not have the knowledge necessary to deliver truly actionable insights. IBM Watson, for instance, certainly carries a strong brand name in computing, but its offering as a plug-in for marketing research APIs is sorely lacking, to say the least.

The point is, knowledge and trust are what we have in good supply, both at the event and in our industry overall. The key to evolving is to remember the knowledge and best practices our industry was built on while being open to outside technologies and ideas, yet resisting the urge simply to copy them. Importantly, as Merrill Dubrow pointed out, there are tremendous benefits in overcoming your fear of collaborating and partnering with other research and technology companies.

This is the idea I’m most optimistic about coming away from the conference. I made several new friends at the event, and I welcome anyone who attended to reach out with any questions regarding text analytics and data mining software, or to discuss potentially mutually beneficial relationships.

Until Next Year!

@TomHCAnderson

 

ABOUT ODINTEXT

OdinText is a patented SaaS (software-as-a-service) platform for advanced analytics. Fortune 500 companies such as Disney and Shell Oil use OdinText to mine insights from complex, unstructured text data. The technology is available through the venture-backed Stamford, CT firm of the same name founded by Tom H. C. Anderson, a recognized authority and pioneer in the field of text analytics with more than two decades of experience in market research. Anderson and OdinText have received numerous awards for innovation from industry associations such as ESOMAR, CASRO, the ARF and the American Marketing Association. He tweets under the handle @tomhcanderson. Request OdinText Info or a free demo here.

IIEX 2016 Competition Showcases Innovation in Market Research

Artificial Intelligence, Mixed Data Analytics and Passive Listening Capture Minds at the 2016 Insight Innovation Exchange

I’m just back from the IIEX conference in Atlanta, where OdinText competed in the Insight Innovation Competition. Although I was disappointed that we didn’t win, I’m pleased to report that the judges told me we placed a very close second.

IIeX 2016

Attending conferences like this affords me the opportunity to take the pulse of the industry, and I was struck by the fact that text analytics is no longer viewed as a shiny new toy in market research. In fact, as someone who has been working in the natural language processing field for so long, it’s actually somewhat remarkable to see how perceptions of text analytics have matured over just the last year. Text analytics has become a must-have, and the market has gained a new wave of healthy competition as a result.

Since OdinText goes beyond just text data and incorporates mixed data—text and quantitative—our competition pitch highlighted OdinText’s ability to essentially enable market researchers to do data science.

I strongly believe making data science more accessible is a huge opportunity that OdinText is uniquely positioned to seize, and it’s an area where market researchers can step up to meet a desperate need: the US alone currently faces a shortage of about 200,000 data scientists.

(Check out this 5-minute video of my IIEX competition pitch and let me know what YOU think!)

Download PDF


You are also most welcome to download a PDF of the PPT presentation >>>

“Machine learning” appears to be the new buzz phrase in research circles, and at IIEX I was hard pressed to find a single vendor not claiming to use machine learning in some respect, no matter where on the service chain they fit. Honestly, though, I got the sense that many use the term without entirely understanding what it means.

We continue to leverage machine learning where it makes sense at OdinText, and there are a few other vendors out there who also clearly have an excellent grasp of the technique.

One such company—which took first place in the competition, in fact—was Remesh. They’re using machine learning in a truly novel way, automating the role of an online moderator almost akin to a chat bot. They’ve positioned this as AI, and replacing humans completely with a computer is a holy grail for almost any industry.

I’m optimistic about AI in my field of data and text mining as well, but we’re still a ways off from taking the human out of the mix, so our goal at OdinText is to use the human as efficiently as possible.

While totally automating what a data scientist does is appealing, in the short term we’re happy with being able to allow a market researcher to do in a few hours what would take a typical data scientist with skills in advanced statistics, NLP, Python, R and C++ days or weeks to do.

Still I admit the prospect of AI replacing researchers completely is an interesting one—albeit not necessarily a popular one among the people who would be replaced—and it’s an area that I’m certainly thinking about.

Third place in the competition, I understand, went to Beatgrid Media, which leverages smartphones (using almost no battery life) to passively listen to audio streams from radio and TV, overlaying geo-demographics on these panelists’ data to better predict advertising reach and efficacy. Admittedly, this will be a very hard field for a start-up to break into, as there are many big players in the space who want to own their own measurement. This may have been one of the reasons Beatgrid had trouble taking more than third, even though they have some very interesting technology that could perhaps also be applied in other ways.

Let me know what you think!

(And if you’re interested in a demo of OdinText, please contact us here!)

Tom H.C. Anderson | @TomHCanderson | @OdinText

Tom H.C. Anderson


To learn more about how OdinText can help you understand what really matters to your customers and predict actual behavior,  please contact us or request a Free Demo here >

[NOTE: Tom H. C. Anderson is Founder of Next Generation Text Analytics software firm OdinText Inc. Click here for more Text Analytics Tips]

Mr. Big Data vs. Mr. Text Analytics

[Interview re-posted w/ permission from Text Analytics News]

Mr. Big Data & Mr. Text Analytics Weigh In: Structured vs. Unstructured Big Data

 


If you pay attention to Big Data news, you’re sure to have heard of Kirk Borne, whose well-respected views on the changing landscape are often shared on social media. Kirk is a professor of Astrophysics and Computational Science at George Mason University. He has published over 200 articles and given over 200 invited talks at conferences and universities worldwide. He serves on several national and international advisory boards and journal editorial boards related to big data.

 

 


Tom H. C. Anderson was an early champion of applied text analytics; he gives over 20 conference talks on the topic each year and lectures at Columbia Business School and other universities. In 2007 he founded the Next Gen Market Research community online, where over 20,000 researchers frequently share their experiences. Tom is the founder of Anderson Analytics, developer of the text analytics software-as-a-service OdinText. He serves on the American Marketing Association’s Insights Council and was the first proponent of natural language processing in the marketing research/consumer insights field.

 

Ahead of the Text Analytics Summit West 2014, Data Driven Business caught up with them to gain perspectives on just how important and interlinked Big Data is with Text Analytics.

 

Q1. What was the biggest hurdle that you had to overcome in order to reach your current level of achievement with Big Data Analytics?

KB: The biggest hurdle for me has consistently been cultural -- i.e., convincing others in the organization that big data analytics is not "business as usual", that the opportunities and potential for new discoveries, new insights, new products, and new ways of engaging our stakeholders (whether in business, or education, or government) through big data analytics are now enormous.

After I accepted the fact that the most likely way for people to change their viewpoint is for them to observe demonstrated proof of these big claims, I decided to focus less on trying to sell the idea and focus more on reaching my own goals and achievements with big data analytics. After making that decision, I never looked back -- whatever successes that I have achieved, they are now influencing and changing people, and I am no longer waiting for the culture to change.

THCA: There are technical/tactical hurdles, and methodological ones. The technical scale/speed ones were relatively easy to deal with once we started building our own software OdinText. Computing power continues to increase, and the rest is really about optimizing code.

The methodological hurdles are far more challenging. It’s relatively easy to look at what others have done, or even to come up with new ideas. But you do have to be willing to experiment, and more than just willingness, you need to have the time and the data to do it! There is a lot of software coming out of academia now; they like to mention their institution in every other sentence: “MIT this” or “UCLA that.” The problem they face is twofold. On the one hand, they don’t have access to enough real data to see whether their theories play out. On the other, they don’t have the real-world business experience and access to clients to know which things are actually useful and which are just novelty.

So, our biggest hurdle has been the time and effort invested through empirical testing. It hasn’t always been easy, but it’s put me and my company in an incredibly unique position.

Q2. Size of data, does it really matter? How much data is too little or too much?

THCA: Great question; with text analytics, size really does matter. While it’s technically possible to get insights from very small data (for instance, during the elections one of my colleagues did a little analysis of Romney vs. Obama debate transcripts on our blog), text analytics really is data mining, and when you’re looking for patterns in text, the more data you have, the more interesting relationships you can find.

KB: Size of data doesn't really matter if you are just getting started. You should get busy with analytics regardless of how little data you have. The important thing is to identify what you need (new skills, technologies, processes, and data-oriented business objectives) in order to take advantage of your digital resources and data streams. As you become increasingly comfortable with those, then you will grow in confidence to step up your game with bigger data sets. If you are already confident and ready-to-go, then go! The big data revolution is like a hyper-speed train -- you cannot wait for it to stop in order to get on board -- it isn't stopping or slowing down! At the other extreme, we do have to wonder if there is such a thing as too much data. The answer to this question is "yes" if we dive into big data's deep waters blindly without the appropriate "swimming instruction" (i.e., without the appropriate skills, technologies, processes, and data-oriented business objectives). However, with the right preparations, we can take advantage of the fact that bigger data collections enable a greater depth of discovery, insight, and data-driven decision support than ever before imagined.

Q3. What is the one thing that motivates and inspires you the most in your Big Data Analytics work?

KB: Discovery! As a scientist, I was born curious. I am motivated and inspired to ask questions, to seek answers, to contemplate what it all means, and then to ask more questions. The rewards from these labors are the discoveries that are made along the way. In data analytics, the discoveries may be represented by an unexpected pattern, trend, association, correlation, event, or outlier in the data set. That discovery then becomes an intellectual challenge (that I love): What does it mean? What new understanding does this discovery reveal about the domain of study (whether it is astrophysics, or retail business, or national security, or healthcare, or climate, or social, or whatever)? The discovery and the corresponding understanding are the benefits of all the hard work of data wrangling.

THCA: Anyone working with analytics has to be curious by nature. Satisfying that curiosity is what drives us. More specifically in my case, if our clients get excited about using our software and the insights they’ve uncovered, then that really gets me and my whole team excited. This can be challenging, and not all data is created equal.

It can be hard to tell someone who is excited about trying text analytics that their data really isn’t suitable. The opposite is even more frustrating, though: knowing that a client has some really interesting data but is apprehensive about trying something new, either because they have some old tools lying around that they haven’t used, or because they have a difficult time getting access to the data because it’s technically “owned” by some other department that doesn’t ‘get’ analytics. But helping them build a case, and then helping them look good by making data useful to the organization, really feeds into that basic curiosity. We often discover problems to solve that we had no idea existed. And that’s very inspiring and rewarding.

Q4. Which big data analytics myth would you like to squash right here and now?

KB: Big data is not about data volume! That is the biggest myth and red herring in the business of big data analytics. Some people say that "we have always had big data", referring to the fact that each new generation has more data than the previous generation's tools and technologies are able to handle. By this reasoning, even the ancient Romans had big data, following their first census of the known world. But that's crazy. The truth of big data analytics is that we are now studying, measuring, tracking, and analyzing just about everything through digital signals (whether it is social media, or surveillance, or satellites, or drones, or scientific instruments, or web logs, or machine logs, or whatever). Big data really is "everything, quantified and tracked". This reality is producing enormously huge data volumes, but the real power of big data analytics is in "whole population analysis", signaling a new era in analytics: the "end of demographics", the diminished use of small samples, the "segment of one", and a new era of personalization. We have moved beyond mere descriptive analysis, to predictive, prescriptive, and cognitive analytics.

THCA: Tough one. There are quite a few. I’ll avoid picking on “social media listening” for a bit and pick something else. One of the myths out there is that you have to be some sort of know-it-all ‘data scientist’ to leverage big data. That is no longer true. Along with this comes a lot of buzzword dropping: terms like “natural language processing” or “machine learning” that, thrown around out of context, really don’t mean anything at all.

If you understand smaller data analytics, then there really is no reason at all that you shouldn’t understand big data analytics. Don’t ever let someone impress you with a buzzword you’re not sure of. If they can’t explain to you in layman’s terms exactly how a certain piece of software works, how an analysis is done, and what the real business benefit is, then you can be pretty sure they don’t actually have the experience you’re looking for and are trying to hide that fact.

Q5. What’s more important/valuable, structured or unstructured data?

KB: Someone said recently that there is no such thing as unstructured data. Even binary-encoded images or videos are structured. Even free text and sentences (like this one) are structured (through the rules of language and grammar). Even some meaning this sentence has. One could say that analytics is the process of extracting order, meaning, and understanding from data. That process is made easier when the data are organized into databases (tables with rows and columns), but the importance and value of the data are inherently no more or no less for structured or unstructured data. Despite these comments, I should say that the world is increasingly generating and collecting more "unstructured data" (text, voice, video, audio) than "structured data" (data stored in database tables). So, in that sense, "unstructured data" is more important and valuable, simply because it provides a greater signal on the pulse of the world. But I now return to my initial point: to derive the most value from these data sources, they need to be analyzed and mined for the patterns, trends, associations, correlations, events, and outliers that they contain. In performing that analysis, we are converting the inherent knowledge encoded in the data from a "byte format" to a "structured information format". At that point, all data really become structured.

THCA: A trick question. We all begin with a question and relatively unstructured data. The goal of text analytics is to bring structure to the data that is most unstructured.

That said, based on the data we typically look at (voice-of-customer surveys, call center and email data, various other web-based data), I’ve personally seen that the unstructured text data is usually far richer. I say that because we can usually take that unstructured data and accurately predict/calculate any of the available structured data metrics from it. Moreover, the unstructured data usually contains a lot of additional information not previously available in the structured data. So unlocking this richer unstructured data allows us to understand systems and processes much better than before and allows us to build far more accurate models.

So yes, unstructured/text data is more valuable, sorry.
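The claim above, that structured metrics can often be recovered from free text, can be illustrated with a deliberately simplified sketch. Everything here is invented for illustration (the comments, the keyword lexicons, and the scoring rule); real text analytics platforms use far richer statistical models than keyword counting.

```python
# Toy illustration: predicting a "structured" 1-5 satisfaction score
# from unstructured comment text via a tiny keyword lexicon.
# All data and word lists below are hypothetical.

POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "rude", "terrible", "waited"}

def predict_satisfaction(comment: str) -> int:
    """Map free text to a 1-5 score: start at a neutral 3, add one per
    positive keyword, subtract one per negative keyword, then clamp."""
    words = comment.lower().split()  # naive tokenization; ignores punctuation
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(1, min(5, 3 + score))

comments = [
    "Great service, love the fast response",
    "Terrible experience, waited forever and staff were rude",
]
print([predict_satisfaction(c) for c in comments])  # high score, then low score
```

Even this crude rule separates clearly positive from clearly negative comments, which is the intuition behind recovering structured ratings from text.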

Q6. What do you think is the biggest difference between big data analysis being done in academia vs in business?

KB: Perhaps the biggest difference is that data analysis in academia is focused on design (research), while business is focused on development (applications). In academia, we are designing (and testing) the optimal algorithm, the most effective technique, the most efficient methodology, and the most novel idea. In business, you might be 100% satisfied to apply all of those academic results to your business objectives, to develop products and services, without trying to come up with a new theory or algorithm. Nevertheless, I am actually seeing more and more convergence (though that might be because I am personally engaged in both places through my academic and consulting activities). I see convergence in the sense that I see businesses who are willing to investigate, design, and test new ideas and approaches (those projects are often led by data scientists), and I see academics who are willing to apply their ideas in the marketplace (as evidenced by the large number of big data analytics startups with university professors in data science leadership positions). The data "scientist" job category should imply that some research, discovery, design, modeling, and hypothesis generation and testing are part of that person's duties and responsibilities. Of course, in business, the data science project must also address a business objective that serves the business needs (revenue, sales, customer engagement, etc.), whereas in academia the objective is often a research paper, or a conference presentation, or an educational experience. Despite those distinctions, data scientists on both sides of the academia-business boundary are now performing similar big data analyses and investigations. Boundary crossing is the new normal, and that's a very good thing.

THCA: I partly answered that in the first question. I think academics have the freedom and time to pursue a research objective even if it doesn’t have an important real-world outcome. So they can pick something fun that may or may not be very useful, such as: are people happier on Tuesdays or Wednesdays? They’ll often try to solve these stated objectives in some clever ways (hopefully), though there’s a lot of “pop” research going on even in academia these days. They are also often limited in the data available to them, having to work with a single data set that has somehow become available to them.

So, academia is different in that academics raise some interesting, fun questions, and sometimes the ideas born of their research can be applied to business.

Professional researchers have to prove an ROI in terms of time and money. Of course, we also have access to more time, more money, and a lot more data. So an academic team of researchers working on text analytics for 2-3 years is not going to be exposed to nearly as much data as a professional team.

That’s also why academic researchers often seem so in love with their models and their accuracy. If you only have a single data set to work with, then you split it in half and use one half for validation. In business, on the other hand, if you work across industries like we do, we certainly may build and validate models for a specific client, but we know that a model that works across companies or industries is nearly impossible. So when we do find something that works, you can bet it’s that much more likely to be useful.
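The split-half validation described above can be sketched in a few lines. This is a minimal illustration on hypothetical synthetic data with a deliberately simple one-parameter model, not a recipe for a real validation study:

```python
# Split-half (holdout) validation sketch: fit on one half of a single
# data set, measure error on the other half. Data are synthetic.
import random

random.seed(42)

# Hypothetical data: y is roughly 2*x plus uniform noise in [-1, 1]
data = [(x, 2 * x + random.uniform(-1, 1)) for x in range(100)]
random.shuffle(data)

train, holdout = data[:50], data[50:]

# "Fit" a one-parameter model: least-squares slope of a line through the origin
slope = sum(x * y for x, y in train) / sum(x * x for x, y in train)

# Validate on the half the model never saw, using mean absolute error
mae = sum(abs(y - slope * x) for x, y in holdout) / len(holdout)
print(f"fitted slope={slope:.3f}, holdout MAE={mae:.3f}")
```

Because the holdout half played no role in fitting, its error is an honest estimate of how the model generalizes, which is exactly the check a single-data-set academic study relies on.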

Text (AKA Buzz) Analytics

[Reposted from the Next Gen Market Research Blog] First Ever Text Analytics Cartoon

It's been a while since I've posted a cartoon here on the blog. However, all the buzzwords (Big Data, Hadoop, Natural Language Processing, Machine Learning, etc.) constantly bandied about in our field inspired me - plus I don't think I've ever seen a text analytics cartoon before.

Hope you like it!

@TomHCAnderson

[Full Disclosure: Tom H. C. Anderson is Managing Partner of Anderson Analytics, developers of the patented Next Generation Text Analytics™ software platform OdinText. For more information and to inquire about software licensing visit ODINTEXT INFO REQUEST]