Code by Hand? The Benefits of Automated and User-Guided Automated Customer Comment Coding

May 25th, 2016


Why you should not code text data by hand: Benefits of automated and user-guided automated coding – Text Analytics Tips by Gosia

Most researchers know very well that coding text data manually (using human coders who read the text and assign codes) is expensive, both in the time it takes coders and in the money needed to compensate them for the effort.

However, the major advantage of human coding is coders' deep understanding of the complex meaning of text, including sarcasm or jokes.

Usually at least two coders are required to code any type of text data, and calculating inter-rater reliability or inter-rater agreement is a must. This statistic shows how similarly any number of coders have coded the data, i.e., how often they agreed on using the exact same codes.
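For readers who want to see what such an agreement statistic looks like in practice, here is a minimal sketch in Python using Cohen's kappa for two coders; the codes and comments below are invented purely for illustration and are not tied to any particular tool:

```python
from sklearn.metrics import cohen_kappa_score

# Codes assigned to the same ten customer comments by two independent coders
# (hypothetical labels, for illustration only).
coder_a = ["price", "service", "price", "other", "service",
           "price", "other", "service", "price", "service"]
coder_b = ["price", "service", "other", "other", "service",
           "price", "price", "service", "price", "other"]

# Cohen's kappa corrects raw percent agreement for the agreement expected
# by chance: 1.0 means perfect agreement, 0 means chance-level agreement.
kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa between the two coders: {kappa:.2f}")
```

Even on a toy example like this, two careful coders rarely reach a kappa of 1.0, which is exactly the reliability problem described next.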

Often, even with the simplest codes, the accuracy of human coding is low. No two human coders consistently code larger amounts of data the same way, because of differing interpretations of the text or simply due to error. The latter is also why no single coder will code the same text data identically the second time around (in theory, perfect reliability for a single coder could be achieved, e.g., for very small datasets that can be proofread multiple times).

Another limitation is that human coders can keep only a limited number of codes in working memory while reading the text. Finally, any change to the coding scheme requires repeating the entire coding process from the beginning. Because manual coding of larger datasets is expensive and unreliable, automated coding using computer software was introduced.

Automated or algorithm-based text coding solves many of the issues of human coding (see the short sketch after this list):

  1. it is fast (thousands of text comments can be read in seconds)
  2. it is cost-effective (automated coding should always be cheaper than human coding because it requires much less time)
  3. it offers perfect consistency (the same rules are applied every time without error)
  4. it allows an unlimited number of codes in theory (some software may have limitations)
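To make concrete what algorithm-based coding typically means, here is a generic keyword-dictionary sketch in Python. This is not OdinText's engine; the dictionary, codes, and comments are hypothetical, but it shows how the same rules are applied identically to every comment, no matter how many there are:

```python
import re

# A hypothetical code dictionary: each code fires when one of its
# keyword patterns appears in a comment.
CODE_DICTIONARY = {
    "price":   [r"\bprice\b", r"\bexpensive\b", r"\bcheap\b"],
    "service": [r"\bservice\b", r"\bstaff\b", r"\brude\b", r"\bhelpful\b"],
    "speed":   [r"\bslow\b", r"\bfast\b", r"\bwait(ed|ing)?\b"],
}

def code_comment(text: str) -> set:
    """Return every code whose keyword pattern appears in the comment."""
    text = text.lower()
    return {code for code, patterns in CODE_DICTIONARY.items()
            if any(re.search(p, text) for p in patterns)}

comments = [
    "The staff was helpful but the product is too expensive.",
    "Fast shipping, fair price.",
]
for c in comments:
    print(code_comment(c), "<-", c)
```

Rules like these run over thousands of comments in well under a second, which is where the speed, cost, and consistency benefits in the list above come from.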

However, this process also has disadvantages. As mentioned above, only humans can fully understand the complex meaning of text, and simple algorithms are likely to fail when trying to interpret it (although some newer algorithms under development can come close to human performance). Moreover, most software on the market offers little flexibility, as the codes can be neither seen nor changed by the user.

Figure 1. Comparison of OdinText with “human coding” and “automated coding” approaches.

Therefore, OdinText's developers decided to let users guide the automated coding. Users can view and edit the default codes and dictionaries, create and upload their own, or build custom dictionaries based on the exploratory results of the automated analysis. The codes can be very complex and specific, producing a good understanding of the meaning of the text, which is the key goal of any text analytics software.
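Continuing the generic sketch from above (again, hypothetical rules rather than OdinText's actual interface), user-guided coding means the analyst can inspect the dictionary, replace a broad code with more specific ones of their own design, and recode the entire dataset in seconds:

```python
# The analyst reviews the default rules, drops the broad "service" code, and
# defines two narrower, hypothetical codes of their own.
CODE_DICTIONARY.pop("service")
CODE_DICTIONARY["staff_positive"] = [
    r"\bstaff (was|were) (helpful|friendly|polite)\b",
]
CODE_DICTIONARY["staff_negative"] = [
    r"\bstaff (was|were) (rude|unhelpful)\b",
]

# Because coding is automated, nothing has to be re-read by hand:
# the revised scheme is simply re-applied to every comment.
recoded = [code_comment(c) for c in comments]
print(recoded)
```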

OdinText is a user-guided automated text analytics solution that combines the strengths of both fully automated and human coding. It is fast, cost-effective, accurate, and allows an unlimited number of codes, like many other automated text analytics tools. However, OdinText surpasses the capabilities of other software by providing a high degree of flexibility and customization of codes/dictionaries, and thus a better understanding of the meaning of the text. Moreover, OdinText allows you to conduct statistical analyses and create visualizations of your data in the same software.

Try switching from human coding to user-guided automated coding and you will be pleasantly surprised how easy and powerful it is!


Text Analytics Tips with Gosia

[Gosia is a Data Scientist at OdinText Inc. Experienced in text mining and predictive analytics, she holds a Ph.D. and has extensive research experience in mass media’s influence on cognition, emotions, and behavior. Please feel free to request additional information or an OdinText demo here.]

[NOTE: OdinText is NOT a tool for human assisted coding. It is a tool used by analysts for better and faster insights from mixed (structured and unstructured) data.]

5 thoughts on “Code by Hand? The Benefits of Automated and User-Guided Automated Customer Comment Coding”

  1. Tom, you sent me a promotional email leading to your blog post. However, I think you might be overstepping your area of expertise in your zeal to monetize your product. An intelligent human interpreter will beat a software program hands down when it comes to understanding context and generating insight. Simplifying large, simple, manual tasks is what computers are great for, but marketers are often looking for insight rather than just computation. In addition, not just handling but understanding complexities like color, visuals, photos, audiovisual, and their interaction and placement is often very important, but also not something software handles as agilely as a good interpreter. Finally, all coding is not the same. A good anthropologist will look at data in a way very different than the content analysis mode you seem to assume is your competition. Products like yours have their uses, but making claims that they are as good as a PhD or always better than human coders shows that you have a lot to learn when it comes to understanding how to scientifically compare approaches and methods. I hope you are honest enough to print this.

    1. Thanks a lot for this comment!
      I think to an extent you have actually reiterated the point made in the above post. The anthropologist IS better than a computer, and that is why we need user-guided coding. We do state clearly that only humans can understand the meaning of text perfectly (see figure). But the computer is needed because the anthropologist is limited in the same way every other human is, and he is bound to make mistakes due to fatigue, memory, etc. This article makes a careful distinction between types of coding and simply suggests a new approach combining the strengths of existing methods: humans and computers.
      To make the point clearer: an anthropologist will do just fine reading a few interviews and will understand and interpret them perfectly. However, when the same anthropologist has to read 10,000 interviews or compare them across several groups, he will be limited in the ways described above. Then, with the help of software like OdinText, the anthropologist will be able to use his in-depth knowledge of the population he is studying to design very detailed and restrictive codes to capture, quantify, and analyze the content he is interested in. The codes can be very similar to the specific rules the anthropologist applies when reading the interviews himself, even though the broader context of the text may be harder for the software to grasp. Hope this helps.

  2. Thanks for your comment, Robert,

    I assume Gosia will probably reply as well since this is her post, but I will add my 2 cents, as this is something I feel very strongly about too.
    A PhD in Sociology would most definitely do a better job understanding the intricacies of 30-minute interviews with five village Chieftains. They would bring their knowledge to bear, and five interviews is something that a human mind can easily manage. We can look for the similarities and differences between what these five say and do a far better job at that than any computer.

    Just so you understand how the use case is different. We do not recommend using OdinText OR ANY text analytics for qualitative research among a handful of participants in a focus group. While it would provide some benefit, there really are no patterns (not statistically anyway) among what five people say, and as I already said, anything that is there is very easy for the human to spot.

    Now if you take a use case like 1,000 survey responses (and certainly 10,000, 100,000, or 1M+), this is when text analytics starts outperforming human coding. [And we have done head-to-head testing across tens of thousands of data points to prove this.]

    But here is the issue: with the OdinText approach, the software, when looking across these n=1,000 survey responses, can also easily take into consideration the satisfaction rating each commenter gave, as well as their income, gender, etc. (far too much info for the PhD to manage). The software will also code with 100% consistency. Our PhD could not say the same. I know I myself would grow tired, hungry, and bored a few hours into coding. Not only that, my understanding of the problem changes as I go, and this too is likely to affect my coding. And ironically it probably should, except that would mean I would need to start from scratch and recode everything with this new, superior understanding of the codes. But of course no human coder has time for that iteration; with text analytics, codes can be changed and everything can be recoded under the new assumptions within seconds.

    There is a lot more I could say, but this is already a bit lengthy for a blog reply. I may follow up with a blog post of my own.
    I think the main point, other than the above, is that in fact OdinText is a tool for the analyst/PhD in Sociology. Rather than passing off the 10,000 interviews to a junior assistant or some human coding shop (as is common to do in consumer insights research), the researcher guides OdinText to get the best of both worlds.

  3. Interesting responses, Gosia and Tom, and an interesting dialog. I think the big thing you both miss, however, is my comment about insight. It really depends on what you are looking for, and what you want to do. In some, maybe many cases, if I want to construct an ad campaign, design a new product, or understand competitive positioning, I am far better off having one really smart person look through a limited number of well-chosen relevant texts, say 100 (or even 10, as I do in my research). Their insight is going to be far more valuable to me for a task like constructing an ad, making sure the image and language and visuals are just right. Looking at 10,000+ texts will give you a big picture, but often it isn’t all that useful for something like ad construction or NPD, where understanding and insight are key. It is, however, very useful when you want to be able to describe mass Internet response to something, or to code massive amounts of data in a consistent manner, like in survey response recognition. I hope that makes the point clearer. As well, humans have a huge edge in things like visual and audio, as I said before. The internet is only partially text. Skilled people still have a huge role to play in analyzing data. Probably even more so now because there is so much of it.

    1. Thanks for bringing this up again, Robert. You are absolutely right that in some cases, especially qualitative research, an in-depth analysis of text and/or visuals and audio by a skilled analyst is best. However, we are positive that in quantitative research with larger samples, only a computer-assisted approach can yield answers that are reliable, consistent, fast, and replicable. As Tom mentioned above, a lot of our own as well as academic research has confirmed this. Perhaps we mean different things by “coding”. It is, after all, a different process in quantitative than in qualitative analyses, and the above article focuses on the former.

Comments are closed.

