Text analysis answers: Is the Quran really more violent than the Bible?
Text Analytics Tips: Is the Quran really more violent than the Bible? by Tom H. C. Anderson
Part I: The Project
With the proliferation of terrorism connected to Islamic fundamentalism in the late 20th and early 21st centuries, the question of whether or not there is something inherently violent about Islam has become the subject of intense and widespread debate.
Even before 9/11, notably with the publication of Samuel P. Huntington’s “Clash of Civilizations” in 1996, pundits argued that Islam incites followers to violence on a level that sets it apart from the world’s other major religions.
The November 2015 Paris attacks and the politicking of a U.S. presidential election year have reanimated the dispute in the mainstream media, particularly after candidate Donald Trump’s call for a ban on Muslims entering the country and President Obama’s response in last week’s State of the Union address. Proponents and detractors alike have marshalled “experts” to validate their positions.
To understand a religion, it’s only logical to begin by examining its literature. And indeed, extensive studies in a variety of academic disciplines are routinely conducted to scrutinize and compare the texts of the world’s great religions.
We thought it would be interesting to bring to bear the sophisticated data mining technology available today through natural language processing and unstructured text analytics to objectively assess the content of these books at the surface level.
So, we’ve conducted a shallow but wide comparative analysis using OdinText to determine with as little bias as possible whether the Quran is really more violent than its Judeo-Christian counterparts.
A few words of caution…
Due to the sensitive nature of this subject, I must emphasize that this analysis is by no means exhaustive, nor is it intended to advance any agenda or to conclusively prove anyone’s point.
The topic and data sources selected for this project constitute a significant departure from the consumer intelligence use cases for which clients typically turn to text analytics. We thought this would be an interesting opportunity to demonstrate how this tool can be applied much more broadly to questions and issues outside the realm of market research and business intelligence.
Again, this is only a cursory analysis. I believe there is more than one Ph.D. thesis awaiting students of theology, literature or political science who want to take a much deeper dive into this data.
About the “Data” Sources
First off, it seemed sensible and appropriate to analyze the Old and New Testaments separately. (The Jewish Torah makes up the first five books of the Christian Old Testament, of course, while the New Testament is unique to Christianity.)
We decided to split them for analysis for a couple of reasons: 1) they were written hundreds of years apart, and 2) combined, they are far larger than the Quran.
All data (Old Testament, New Testament and Quran) were combined and read into OdinText as a single file. The Old Testament is the largest source, with over 23K verses and about 623K words, followed by the New Testament with just under 8K verses and 185K words, and then the Quran with just over 6K verses and fewer than 78K words.
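Because the three sources differ so much in size, raw counts are hard to compare directly. The short Python sketch below is not part of OdinText; it simply takes the rounded figures quoted above (treated as approximations, not exact counts) and derives words per verse and each source's share of the combined word count. The names `SIZES` and `size_summary` are illustrative only.

```python
# Approximate sizes as quoted in the text (rounded figures, not exact counts).
SIZES = {
    "Old Testament": {"verses": 23_000, "words": 623_000},
    "New Testament": {"verses": 8_000, "words": 185_000},
    "Quran": {"verses": 6_000, "words": 78_000},
}


def size_summary(sizes):
    """Return words per verse and share of all words for each source."""
    total_words = sum(s["words"] for s in sizes.values())
    summary = {}
    for name, s in sizes.items():
        summary[name] = {
            "words_per_verse": s["words"] / s["verses"],
            "share_of_words": s["words"] / total_words,
        }
    return summary


if __name__ == "__main__":
    for name, row in size_summary(SIZES).items():
        print(
            f"{name}: {row['words_per_verse']:.1f} words/verse, "
            f"{row['share_of_words']:.1%} of all words"
        )
```

With these rounded inputs, the Old Testament accounts for roughly 70% of all words, which is one reason results are better expressed as percentages within each source than as raw counts.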
Secondly, there are obviously multiple versions and translations of the texts available for study. We’ve selected the ones that were most accessible and best suited for this kind of analysis.
With regard to the Christian Bible, instead of the King James Version, we opted to use the New International Version (NIV) because the somewhat updated language should be easier to work with.
In selecting an English translation of the Quran, we considered the Tafsir-ul-Quran (1957) by the Indian scholar Abdul Majid Daryabadi, but decided to go with The Holy Qur’an (1917, 4th rev. ed. 1951) by Maulana Muhammad Ali because this version is more widely used and the data are more easily accessed.
In either case, we do not believe the text of the version we chose differs materially from the alternative.
Approach: A ‘Top-Down/Bottom-Up’ Inquiry
A ‘Top-Down/Bottom-Up’ inquiry means that identification of issues for investigation will be partly a priori or ‘Top-Down’ (i.e., the analyst determines specific topic areas to explore, such as “violence”).
But there will also be a data-driven or ‘Bottom-Up’ aspect in which the software helps to identify topics or areas that may not have occurred to the analyst, but which could be important given the data.
OdinText looks for sentiments and emotions in the data as soon as it has been uploaded to our servers; however, as this particular data set is unusual, certain custom dictionary definitions (what we refer to as “issues”) will also need to be created through the Top-Down/Bottom-Up approach.
One simple and unbiased way to do this is to allow the process by which these definitions are created to be as data-driven as possible. There are several ways to look to the data for information. For instance, we might start by looking at the top words mentioned in each source to understand what concepts cut across our data, and how they might be defined. (See figure 1)
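To make the idea of "top words mentioned in each source" concrete, here is a minimal Python sketch. It is not OdinText's method, which is not described here. It assumes the data arrive as (source, verse text) pairs, counts each word at most once per verse, and reports the share of verses in each source that mention the word. The stopword list and the function names are placeholders chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "that", "for"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words_by_source(records, n=5):
    """records: iterable of (source, verse_text) pairs.

    Returns {source: [(word, share_of_verses), ...]} with the n words
    that appear in the largest share of that source's verses.
    """
    verse_counts = Counter()
    word_verse_counts = defaultdict(Counter)
    for source, text in records:
        verse_counts[source] += 1
        # set() ensures a word is counted once per verse, not once per use.
        for word in set(tokenize(text)) - STOPWORDS:
            word_verse_counts[source][word] += 1

    result = {}
    for source, total in verse_counts.items():
        counts = word_verse_counts[source]
        # Sort by descending count, then alphabetically for stable ties.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[source] = [(word, count / total) for word, count in ranked[:n]]
    return result
```

Calling `top_words_by_source(records, n=10)` on the combined file would give one ranked list per source, which is the kind of view from which cross-cutting concepts can be spotted.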
In this way, an overarching concept for comparison in each of the three sources can then be developed. For instance, a concept like “God” would need to include all common terms for this concept in each text source.
We can name such a concept something like “God All Inclusive,” allowing all common definitions/terms for God in each of the texts to be picked up under this concept.
Accordingly, “God All Inclusive” would include any mention of “Lord” (28%) or “God” (11%) in the Old Testament, as well as any mentions of “Jesus” (17%), “God” (16%), “Lord” (8%) or “Christ” (7%) in the New Testament, and any mentions of “Allah” (30%) or “Lord” (14%) in the Quran.
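A concept like this can be thought of as a small dictionary of terms. The Python sketch below is an assumption about how such a definition might be expressed, not OdinText's actual implementation. It pools the terms listed above into one pattern and computes, for each source, the share of verses that mention at least one of them. `CONCEPTS`, `compile_concept` and `concept_share` are hypothetical names.

```python
import re

# Terms taken from the text above; the structure itself is illustrative.
CONCEPTS = {
    "God All Inclusive": ["lord", "god", "jesus", "christ", "allah"],
}


def compile_concept(terms):
    """Build one case-insensitive, whole-word pattern from a list of terms."""
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def concept_share(records, terms):
    """records: iterable of (source, verse_text) pairs.

    Returns {source: share of that source's verses mentioning any term}.
    """
    pattern = compile_concept(terms)
    totals, hits = {}, {}
    for source, text in records:
        totals[source] = totals.get(source, 0) + 1
        # A verse counts once, however many of the terms it contains.
        if pattern.search(text):
            hits[source] = hits.get(source, 0) + 1
    return {source: hits.get(source, 0) / totals[source] for source in totals}
```

One consequence of counting each verse once: if the percentages quoted above are shares of verses, the share for the combined concept cannot be obtained by simply adding them, since a single verse may mention both “Lord” and “God.”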
As mentioned earlier, in order to keep this analysis as unbiased as possible (and in order to do it as quickly as possible), we will also rely on OdinText’s built-in functionality to understand broader concepts such as positive and negative sentiment, as well as other psychological constructs and emotions in text. In other words, when we look at positive and negative emotion, we will use this broad-based metric across the three texts without any customization at all.
Now that I’ve laid the groundwork for this project, please join me tomorrow as we take a look at the initial results!
Considering many people take at least a year to read just one of these texts, you may find it interesting that it took OdinText less than 120 seconds to read, parse and analyze all three texts at once!
Up Next: Part II – One of these texts is angrier!