QTID: Quran Text Image Dataset

Mahmoud Badry; Hesham Hassan; Hanaa Bayomi; Hussien Oakasha

doi:10.14569/IJACSA.2018.090351

International Journal of Advanced Computer Science and Applications (IJACSA), Volume 9, Issue 3, 2018.

Abstract: Improving the accuracy of Arabic text recognition in images requires a large, modern dataset, since data is the fuel for many modern machine learning models. This paper proposes a new dataset, QTID (Quran Text Image Dataset), the first Arabic dataset that includes Arabic marks. It consists of 309,720 distinct annotated Arabic word images, each 192x64 pixels, taken from the Holy Quran and containing 2,494,428 characters in total. These finely annotated images were randomly divided into training, validation, and testing sets of 90%, 5%, and 5%, respectively. To analyze QTID, various dataset statistics are presented. Experimental evaluation shows that current leading Arabic text recognition engines, such as Tesseract and ABBYY FineReader, perform poorly on word images from the proposed dataset.
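
The split sizes and per-word character count implied by the abstract's figures can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a seeded shuffle of image indices followed by slicing at integer percentages. The function name `split_indices`, the seed, and the rule that any rounding remainder goes to the last (test) set are hypothetical choices, not the authors' published procedure. The mean of about 8.05 characters per word image is derived from the two totals and is not itself reported in the abstract.

```python
import random

# Totals quoted in the abstract.
NUM_IMAGES = 309_720
NUM_CHARS = 2_494_428


def split_indices(n, percents=(90, 5, 5), seed=0):
    """Shuffle indices 0..n-1 and cut them into consecutive slices.

    Hypothetical helper: the seed and the remainder rule (leftover items
    go to the last split) are assumptions made for illustration.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding surprises.
    sizes = [n * p // 100 for p in percents[:-1]]
    sizes.append(n - sum(sizes))
    splits, start = [], 0
    for size in sizes:
        splits.append(indices[start:start + size])
        start += size
    return splits


train, val, test = split_indices(NUM_IMAGES)
print(len(train), len(val), len(test))  # prints 278748 15486 15486
print(f"average characters per word image: {NUM_CHARS / NUM_IMAGES:.2f}")  # 8.05
```

Because 309,720 is divisible by 20, an exact 90/5/5 split is possible here and no remainder is produced; the actual QTID split counts may still differ if the authors rounded differently.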

Keywords: HDF5 dataset; Arabic script; Holy Quran text image; handwritten text recognition; Arabic OCR; text image datasets

Mahmoud Badry, Hesham Hassan, Hanaa Bayomi and Hussien Oakasha, “QTID: Quran Text Image Dataset,” International Journal of Advanced Computer Science and Applications (IJACSA), 9(3), 2018. http://dx.doi.org/10.14569/IJACSA.2018.090351

@article{Badry2018,
title = {QTID: Quran Text Image Dataset},
journal = {International Journal of Advanced Computer Science and Applications},
doi = {10.14569/IJACSA.2018.090351},
url = {http://dx.doi.org/10.14569/IJACSA.2018.090351},
year = {2018},
publisher = {The Science and Information Organization},
volume = {9},
number = {3},
author = {Mahmoud Badry and Hesham Hassan and Hanaa Bayomi and Hussien Oakasha}
}

Copyright Statement: This is an open access article licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, even commercially, as long as the original work is properly cited.