Fine-Grained Quran Dataset

Mohamed Osman Hegazi; Anwer Hilal; Mohammad Alhawarat

doi:10.14569/IJACSA.2015.061241

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 6 Issue 12, 2015.

Abstract: Extracting knowledge from text documents has become one of the main topics in Natural Language Processing (NLP) in the era of information explosion. Arabic NLP is considered immature for several reasons, including the scarcity of available resources. At the same time, automatically extracting reliable knowledge from specialized data sources such as holy books is considered a highly challenging task, but one of great benefit. In this context, this paper provides a comprehensive Quranic dataset as the first part (foundation) of ongoing research that attempts to lay the groundwork for approaches and applications that explore the holy Quran. The paper presents the algorithms and approaches that were designed to extract aggregated data from massive Arabic text sources, including the holy Quran and closely associated books. The text of the holy Quran is transformed into structured multi-dimensional data records, starting at the chapter level and continuing to the word level and then the character level. All of these records are linked with interpretations and meanings, parsing, translations, intonation, and the roots and stems of words, all drawn from authentic and reliable sources. The final dataset is represented as Excel sheets and database records. The paper also presents models of the dataset at all levels. The Quranic dataset presented in this paper was designed to be suitable for database, data mining, text mining, and Artificial Intelligence applications; it is also designed to serve as a comprehensive encyclopedia of the holy Quran and the Quranic Science books.
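As an illustration of the chapter, word, and character levels described in the abstract, the following is a minimal Python sketch of how such nested records might be modeled and flattened into sheet-like rows. It is not the authors' schema: the class names, field names, the intermediate verse level, and the whitespace tokenization are all assumptions made for this example, and the sample input in the usage note is placeholder text rather than Quranic text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CharRecord:
    position: int  # 1-based position of the character within its word
    char: str


@dataclass
class WordRecord:
    position: int  # 1-based position of the word within its verse
    text: str
    root: str = ""  # hypothetical field; would be filled from an external source
    stem: str = ""  # hypothetical field; would be filled from an external source
    chars: List[CharRecord] = field(default_factory=list)


@dataclass
class VerseRecord:
    chapter: int  # chapter number
    verse: int    # verse number (the verse level is an assumption of this sketch)
    text: str
    words: List[WordRecord] = field(default_factory=list)


def build_verse(chapter: int, verse: int, text: str) -> VerseRecord:
    """Split a verse into word and character records using whitespace tokenization."""
    record = VerseRecord(chapter, verse, text)
    for w_pos, token in enumerate(text.split(), start=1):
        chars = [CharRecord(c_pos, ch) for c_pos, ch in enumerate(token, start=1)]
        record.words.append(WordRecord(w_pos, token, chars=chars))
    return record


def flatten(record: VerseRecord) -> List[Tuple[int, int, int, str, int, str]]:
    """Flatten to rows of (chapter, verse, word_pos, word, char_pos, char)."""
    return [
        (record.chapter, record.verse, w.position, w.text, c.position, c.char)
        for w in record.words
        for c in w.chars
    ]
```

For instance, `flatten(build_verse(1, 1, "ab cde"))` produces one row per character, each carrying its chapter, verse, and word coordinates, which is the kind of shape that can be written to a spreadsheet or a database table. Linked attributes such as interpretations or translations would be additional columns keyed on the same coordinates.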

Keywords: Arabic Language; Holy Quran; Quranic Dataset; Text Mining; NLP

Mohamed Osman Hegazi, Anwer Hilal and Mohammad Alhawarat, “Fine-Grained Quran Dataset,” International Journal of Advanced Computer Science and Applications (IJACSA), 6(12), 2015. http://dx.doi.org/10.14569/IJACSA.2015.061241

@article{Hegazi2015,
title = {Fine-Grained Quran Dataset},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2015.061241},
url = {http://dx.doi.org/10.14569/IJACSA.2015.061241},
year = {2015},
publisher = {The Science and Information Organization},
volume = {6},
number = {12},
author = {Mohamed Osman Hegazi and Anwer Hilal and Mohammad Alhawarat}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
