Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.
Digital Object Identifier (DOI): 10.14569/IJACSA.2015.061241
Article Published in International Journal of Advanced Computer Science and Applications (IJACSA), Volume 6 Issue 12, 2015.
Abstract: Extracting knowledge from text documents has become a major topic in Natural Language Processing (NLP) in the era of information explosion. Arabic NLP is considered immature for several reasons, including the scarcity of available resources. At the same time, automatically extracting reliable knowledge from specialized data sources such as holy books is a highly challenging task, but one of great benefit to all humans. In this context, this paper provides a comprehensive Quranic dataset as the first part (the foundation) of an ongoing research effort that attempts to lay the groundwork for approaches and applications that explore the holy Quran. The paper presents the algorithms and approaches that have been designed to extract aggregated data from massive Arabic text sources, including the holy Quran and closely associated books. The text of the holy Quran is transformed into structured multi-dimensional data records at the chapter level, then the word level, and then the character level. All of these records are linked with interpretations and meanings, parsing, translations, intonation, and the roots and stems of words, all from authentic and reliable sources. The final dataset is represented as Excel sheets and database records. The paper also presents models of the dataset at all levels. The Quranic dataset presented in this paper is designed to be suitable for database, data mining, text mining and Artificial Intelligence applications; it is also designed to serve as a comprehensive encyclopedia of the holy Quran and the Quranic Science books.
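As a rough illustration of the chapter, word and character levels mentioned in the abstract, the following Python sketch splits a single verse into word-level and character-level records. It is not the authors' algorithm or schema: the names `WordRecord`, `CharRecord` and `split_verse`, the field layout, and the choice to split on whitespace and on Unicode code points are all assumptions made for this example. The sample text is the first verse written without diacritics; with vocalized text, each diacritic mark would become its own character record under this simple scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordRecord:
    """Hypothetical word-level record (not the paper's actual schema)."""
    chapter: int
    verse: int
    position: int  # 1-based position of the word within the verse
    text: str


@dataclass(frozen=True)
class CharRecord:
    """Hypothetical character-level record linked back to its word."""
    chapter: int
    verse: int
    word_position: int
    char_position: int  # 1-based position of the character within the word
    char: str


def split_verse(chapter, verse, text):
    """Split one verse into word records and character records.

    Assumption: words are separated by whitespace and every Unicode
    code point counts as one character.
    """
    words = []
    chars = []
    for w_pos, word in enumerate(text.split(), start=1):
        words.append(WordRecord(chapter, verse, w_pos, word))
        for c_pos, ch in enumerate(word, start=1):
            chars.append(CharRecord(chapter, verse, w_pos, c_pos, ch))
    return words, chars


# Chapter 1, verse 1, written without diacritics.
words, chars = split_verse(1, 1, "بسم الله الرحمن الرحيم")
print(len(words), "word records,", len(chars), "character records")
```

Because every record carries its chapter, verse and position keys, records like these could be exported as spreadsheet rows or database tables and joined with further attributes such as translations or roots.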
Mohamed Osman Hegazi, Anwer Hilal and Mohammad Alhawarat, “Fine-Grained Quran Dataset,” International Journal of Advanced Computer Science and Applications (IJACSA), 6(12), 2015. http://dx.doi.org/10.14569/IJACSA.2015.061241