An Automatic Similarity Detection Engine Between Sacred Texts Using Text Mining and Similarity Measures
General Material Designation
[Thesis]
First Statement of Responsibility
Salha Hassan Muhammed Qahl
Subsequent Statement of Responsibility
Fokoue, Ernest
PUBLICATION, DISTRIBUTION, ETC.
Name of Publisher, Distributor, etc.
Rochester Institute of Technology
Date of Publication, Distribution, etc.
2014
PHYSICAL DESCRIPTION
Specific Material Designation and Extent of Item
104
GENERAL NOTES
Text of Note
Committee members: Chen, Linlin; Parody, Robert
NOTES PERTAINING TO PUBLICATION, DISTRIBUTION, ETC.
Text of Note
Place of publication: United States, Ann Arbor; ISBN=978-1-321-40085-4
DISSERTATION (THESIS) NOTE
Dissertation or thesis details and type of degree
M.S.
Discipline of degree
Applied Statistics
Body granting the degree
Rochester Institute of Technology
Text preceding or following the note
2014
SUMMARY OR ABSTRACT
Text of Note
Is there any similarity between the contexts of the Holy Bible and the Holy Quran, and can this be proven mathematically? Using the Bible and the Quran as our corpus, we explore the performance of various feature extraction and machine learning techniques. The unstructured nature of text data adds an extra layer of complexity to the feature extraction task, and the inherently sparse nature of the corresponding data matrices makes text mining a distinctly difficult task. Among other things, we assess the difference between domain-based syntactic feature extraction and domain-free feature extraction, and then use a variety of similarity measures, such as Euclidean, Hellinger, Manhattan, cosine, Bhattacharyya, symmetric Kullback-Leibler, Jensen-Shannon, probabilistic chi-square, and Clark, to identify similarities and differences between sacred texts.
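To make the measures named in the abstract concrete, here is a minimal Python sketch of six of them (Euclidean, Manhattan, cosine, Hellinger, Bhattacharyya, Jensen-Shannon) applied to word-frequency distributions. It is an illustration under stated assumptions, not the thesis's implementation: this record does not give the exact definitions or preprocessing used, the textbook forms are assumed, and the helper `term_distribution`, the vocabulary, and the two toy strings are hypothetical.

```python
import math
from collections import Counter


def term_distribution(text, vocabulary):
    """Relative frequency of each vocabulary word in `text`.

    Hypothetical helper: whitespace tokenization, lowercase only.
    Assumes at least one vocabulary word occurs in the text.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocabulary)
    return [counts[w] / total for w in vocabulary]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def hellinger(p, q):
    # Bounded in [0, 1] for probability vectors.
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


def bhattacharyya(p, q):
    # Negative log of the Bhattacharyya coefficient; infinite when the
    # two distributions share no support.
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.inf if coefficient == 0 else -math.log(coefficient)


def _kl(p, q):
    # Kullback-Leibler divergence; terms with p_i = 0 contribute 0.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p, q):
    # Symmetrized, smoothed KL against the midpoint distribution.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


# Toy inputs, invented for illustration (not quotations from either text).
vocabulary = ["light", "mercy", "people", "lord"]
p = term_distribution("lord mercy light lord people", vocabulary)
q = term_distribution("mercy mercy lord people people", vocabulary)

for name, fn in [("euclidean", euclidean), ("manhattan", manhattan),
                 ("cosine similarity", cosine_similarity),
                 ("hellinger", hellinger), ("bhattacharyya", bhattacharyya),
                 ("jensen-shannon", jensen_shannon)]:
    print(f"{name}: {fn(p, q):.4f}")
```

Note that cosine is a similarity (larger means closer), while the others are distances or divergences (smaller means closer). The plain and symmetric Kullback-Leibler divergences are undefined when one distribution has a zero where the other does not, which is one reason sparse text data usually needs smoothing before those measures are applied.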
TOPICAL NAME USED AS SUBJECT
Mathematics; Statistics; Computer science
UNCONTROLLED SUBJECT TERMS
Subject Term
Pure sciences; Applied sciences; Data mining; Machine learning; Sacred texts; Similarity measures