What Does the Co-Occurence Graph Tell You?

Gosia
October 23rd, 2016

Text Analytics Tips - Branding

What does the co-occurrence graph tell you?– Text Analytics Tips by Gosia

The co-occurrence graph in OdinText may look simple at first sight but it is in fact a very complex visualization. Based on an example we are going to show you how to read and interpret this graph. See the attached screenshots of a single co-occurrence graph based on a satisfaction survey of 500 car dealership customers (Fig. 1-4).

The co-occurrence graph is based on multidimensional scaling techniques that allow you to view the similarity between individual cases of data (e.g., automatic terms) taking into account various aspects of the data (i.e., frequency of occurrence, co-occurrence, relationship with the key metric). This graph plots the co-occurrence of words represented by the spatial distance between them, i.e., it plots as well as it can terms which are often mentioned together right next to each other (aka approximate overlap/concurrence).

Figure 1. Co-occurrence graph (all nodes and lines visible).
Figure 1. Co-occurrence graph (all nodes and lines visible).

The attached graph (Fig. 1 above) is based on 50 most frequently occurring automatic terms (words) mentioned by the car dealership customers. Each node represents one term. The node’s size corresponds to the number of occurrences, i.e., in how many customer comments a given word was found (the greater node’s size, the greater the number of occurrences). In this example, green nodes correspond to higher overall satisfaction and red nodes to lower overall satisfaction given by customers who mentioned a given term, whereas brown nodes reflect satisfaction scores close to the metric midpoint. Finally, the thickness of the line connecting two nodes highlights how often the two terms are mentioned together (aka actual overlap/concurrence); the thicker the line, the more often they are mentioned together in a comment.

Figure 2. Co-occurrence graph (“unprofessional” node and lines highlighted).
Figure 2. Co-occurrence graph (“unprofessional” node and lines highlighted).

So what are the most interesting insights based on a quick look at the co-occurrence graph of the car dealership customer satisfaction survey?

  • “Unprofessional” is the most negative term (red node) and it is most often mentioned together with “manager” or “employees” (Fig. 2 above).
  • “Waiting” is a relatively frequently occurring (medium-sized node) and a neutral term (brown node). It is often mentioned together with “room” (another neutral term) as well as “luxurious”, “coffee”, and “best”, which are corresponding to high overall satisfaction (light green node). Thus, it seems that the luxurious waiting room with available coffee is highly appreciated by customers and makes the waiting experience less negative (Fig. 3 below).
  • The dealership “staff” is often mentioned together with such positive terms as “always”, “caring”, “nice”, “trained”, and “quick” (Fig. 4 below). However, staff is also mentioned with more negative terms including “unprofessional”, “trust”, “helpful” suggesting a few negative customer evaluations related to these terms which may need attention and improvement.
    Figure 3. Co-occurrence graph (“waiting” node and lines highlighted).
    Figure 3. Co-occurrence graph (“waiting” node and lines highlighted).
    Figure 4. Co-occurrence graph (“staff” node and lines highlighted).
    Figure 4. Co-occurrence graph (“staff” node and lines highlighted).
    Hopefully, this quick example can help you extract quick and valuable insights based on your own data!

Gosia

Text Analytics Tips with Gosi

[NOTE: Gosia is a Data Scientist at OdinText Inc. Experienced in text mining and predictive analytics, she is a Ph.D. with extensive research experience in mass media’s influence on cognition, emotions, and behavior.  Please feel free to request additional information or an OdinText demo here.]

Search