A Machine Learning Approach for Recognizing the Holy Quran Reciter

The Holy Quran is the holy book of all Muslims. Reading it is governed by specific rules and is called recitation. Reading or listening to the Holy Quran is one of the essential activities of Muslims. In this paper, a machine learning approach for recognizing the reader of the Holy Quran (the reciter) is proposed. The proposed system follows the traditional phases of a recognition system: data acquisition, pre-processing, feature extraction, and classification. A dataset is created for ten well-known reciters, who are the prayer leaders in the holy mosques in Mecca and Madinah. The audio dataset is analyzed using the Mel Frequency Cepstral Coefficients (MFCC). Both the K nearest neighbor (KNN) classifier and the artificial neural network (ANN) classifier are applied for classification. The pitch is used as the features with which the ANN and the KNN are trained for classification. Two chapters of the Holy Quran are selected for system validation. Using the ANN, the proposed system gives 97.62% accuracy for chapter 18 and 96.7% accuracy for chapter 36. Using the KNN, it gives 97.03% accuracy for chapter 18 and 96.08% accuracy for chapter 36.
Keywords: Holy Quran audio analysis; MFCC; KNN; ANN; Machine learning


I. INTRODUCTION
The Holy Quran is the holy book of all Muslims. It contains 30 parts and 114 chapters, where each chapter is known as a surah. The chapters differ in length because each surah contains a different number of verses. Some surahs are long: Al-baqarah and Al Omran have 286 and 200 verses, respectively. Others are short: Alikhlas, Alfalaq, and Alnas have 4, 5, and 6 verses, respectively. Reading the Holy Quran is a special reading mechanism, known as Tajweed, that requires certain rules to be applied during the reading. Quran reciter recognition has not been investigated as thoroughly as speech and speaker recognition in the English language. Recognition requires a unique biometric signal for each reciter, and the lag behind general speech recognition may be attributed to the lack of uniform Quran reciter datasets [1] [2]. In this paper, the speech recognition concept is exploited to recognize the reciters of the Holy Quran. Speech recognition has many fields of application, including but not limited to voice calling/dialing, speech dataset indexing, exams/tests, and forensics [3]. Most speech recognition research has been conducted on English-language datasets. Little research has been conducted on the Arabic language, and Arabic speech recognition systems are limited compared to those for other languages.
Quran reciter recognition is a challenging task since the recitation is performed with Tajweed. Each Quran reciter has his own unique signal, which changes dynamically over time in terms of pronunciation and the way of reciting, known as Tarteel [4]. Recognizing and classifying the reciters of the Holy Quran is considered an area of voice and speech recognition. This paper focuses on recognizing the reciters of the Holy Quran. A new method is applied to extract the features using the MFCC. The remainder of this paper is organized as follows: Section 2 presents the related work, while Section 3 describes the proposed system in detail. The experimental results are presented in Section 4. Finally, Section 5 presents brief conclusions and future work.

II. RELATED WORK
Research on Quran reciters is limited. Al-Ayyoub et al. [1] addressed identifying the correct application of the main rules in reciting the Holy Quran. Three different features were used: the MFCC, the Hidden Markov Model-based Spectral Peak Location (HMM-SPL), and the Wavelet Packet Decomposition (WPD). Three different classifiers were used as well: the support vector machine (SVM), the Random Forest (RF), and the K-Nearest Neighbors (KNN). By applying deep learning techniques, the obtained accuracy was 97.7%. Adnan Qayyoum et al. [5] introduced a deep learning approach for identifying the Quran reciter based on a recurrent neural network (RNN) with bidirectional long short-term memory (BLSTM), and reported a significant result. Nahar et al. [6] introduced Quran reciter identification using the SVM and the artificial neural network (ANN). The MFCC features of 15 reciters were extracted and mapped into the two classifiers. The system accuracy was 96.59% using the SVM and 86.1% using the ANN. Bezoui et al. [7] extracted MFCC features from Quranic verses. Hussaini et al. [8] presented an automatic reciter recognition system using the MFCC with a text-independent speaker recognition technique. Using their own dataset of 20 reciters, they achieved an accuracy of 86.5%. Muhammad et al. [9] developed a system (E-hafiz) for orally testing people who have memorized the Quran. They extracted the features using the MFCC technique and matched them against the features of the collected training data. Mismatches were pointed out as errors, and encouraging results were obtained. Alshayeb et al. [10] presented an iPhone application based on audio fingerprinting for identifying the reciter details. Their system could display and play the surah of the identified reciter.

III. PROPOSED SYSTEM
A. Introduction
The main part of the proposed system is the feature extraction step. The MFCC technique is applied to extract the features, which are then mapped into the ANN or the KNN classifier for training and testing to identify the Quran reciter. Fig. 1 summarizes the proposed system model. First, the trained model was built after extracting the features. Samples from ten reciters were collected to create the dataset. The reciters are the prayer leaders in both mosques in Mecca and Madinah. Reading or listening to the Holy Quran is one of the essential activities of Muslims. It is recommended to read chapter 18 every Friday, and it is customary to read chapter 36 for the deceased. The same two chapters, chapters 18 and 36, were therefore chosen for all ten reciters. Chapter 18 is surah Alkahf with 110 verses, and chapter 36 is surah Yasin with 83 verses. These two chapters are commonly read by Muslims. Table I lists the ten reciters.
In this paper, the corpus is constructed for ten reciters and two chapters. Each reciter recited both chapters, and each verse of each chapter was saved in a separate wave file. Chapter 18 therefore yields 1100 wave files (110 verses × 10 reciters), and chapter 36 yields 830 wave files (83 verses × 10 reciters). In both chapters, the speech/audio signal of each reciter was divided into frames of 20 ms. This frame size was chosen empirically since it achieved the best recognition rate; larger frame sizes did not improve the quality of recognition.
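The 20 ms framing step can be illustrated with a minimal sketch. The function name `frame_signal`, the non-overlapping framing, and the 16 kHz sampling rate are assumptions made for illustration only; the sampling rate and any frame overlap used for the corpus are not stated here.

```python
def frame_signal(samples, sample_rate, frame_ms=20):
    """Split a 1-D sequence of samples into consecutive, non-overlapping frames.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# One second of a hypothetical 16 kHz recording (silence, for illustration).
signal = [0.0] * 16000
frames = frame_signal(signal, sample_rate=16000)
print(len(frames), len(frames[0]))
```

At the assumed 16 kHz rate, each 20 ms frame holds 320 samples, so one second of audio yields 50 frames.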
The length of the verses differs within each chapter. To handle this variation, the MFCC technique was applied to extract the features of each verse (wave file): 20 features were extracted from each of the 1100 and 830 wave files. All the features were then combined into two separate feature matrix files, one for chapter 18 and one for chapter 36. Each verse feature vector was mapped to the ANN or the KNN.
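The following sketch shows one way verses of different lengths could each be reduced to a fixed-length feature vector and stacked into a feature matrix. How the per-frame coefficients are reduced to one vector per verse is not specified here; averaging over frames is assumed, as it is a common choice. The names `verse_feature_vector` and `build_feature_matrix` and the toy values are hypothetical.

```python
def verse_feature_vector(frame_coeffs):
    """Average per-frame coefficient lists into one fixed-length vector.

    Assumption: mean pooling over frames; other reductions are possible.
    """
    if not frame_coeffs:
        raise ValueError("verse has no frames")
    n_coeffs = len(frame_coeffs[0])
    n_frames = len(frame_coeffs)
    return [sum(frame[j] for frame in frame_coeffs) / n_frames
            for j in range(n_coeffs)]


def build_feature_matrix(verses):
    """verses: list of (reciter_label, per-frame coefficient lists)."""
    rows, labels = [], []
    for label, frame_coeffs in verses:
        rows.append(verse_feature_vector(frame_coeffs))
        labels.append(label)
    return rows, labels


# Toy data: two coefficients per frame instead of 20, to keep it readable.
verses = [
    ("reciter_A", [[1.0, 2.0], [3.0, 4.0]]),              # verse with two frames
    ("reciter_B", [[0.0, 0.0], [3.0, 6.0], [6.0, 3.0]]),  # verse with three frames
]
rows, labels = build_feature_matrix(verses)
print(rows, labels)
```

Both verses end up with vectors of the same length regardless of how many frames they contain, which is what allows them to share one feature matrix.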

B. Mel-Frequency Cepstrum Coefficients (MFCC)
In general, extracting features is the first step in developing any speech recognition system. The goal of feature extraction is to identify the main components that represent the speech signal and to discard all the redundant data in it. Understanding the speech/audio signal helps in extracting robust features, and this requires knowledge of the sound shape generated by the reciters. An accurate determination of the sound shape yields an accurate representation of the wave file (phoneme). The shape itself manifests in the short-time power spectrum. The wave file features were extracted using the MFCC technique, which is widely used in speech recognition systems. The Mel frequency scale is based on the variation of the bandwidths with which the human ear captures the characteristics of speech [11]. In reciting the Quran, the voice and the pronunciation vary from one reciter to another, which results in tone variation.
The cepstral coefficients $C_i$ of order $p$ are computed from the log-energy outputs $X_k$ of $N$ filters, which are obtained from the magnitude coefficients of the discrete Fourier transform (DFT):
$$C_i = \sum_{k=1}^{N} X_k \cos\left[\frac{i\,(k - \tfrac{1}{2})\,\pi}{N}\right], \qquad i = 1, 2, \ldots, p \qquad (1)$$
Here $N = 20$. By using equation 1, the 20 MFCC features were found for each reciter [6].
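As an illustration, the cosine transform from log filter-bank energies to cepstral coefficients can be sketched as below. The sketch assumes the standard form $C_i = \sum_{k=1}^{N} X_k \cos[i (k - 1/2) \pi / N]$ for $i = 1, \ldots, p$; the function name `cepstral_coefficients` and the input values are hypothetical, and the filter-bank stage that produces the log energies is not shown.

```python
import math


def cepstral_coefficients(log_energies, p):
    """Return [C_1, ..., C_p] from the log-energy outputs X_1..X_N of N filters.

    C_i = sum over k = 1..N of X_k * cos(i * (k - 0.5) * pi / N)
    """
    n = len(log_energies)
    return [
        sum(x_k * math.cos(i * (k - 0.5) * math.pi / n)
            for k, x_k in enumerate(log_energies, start=1))
        for i in range(1, p + 1)
    ]


# Hypothetical log energies from N = 20 filters; p = 20 coefficients requested.
log_energies = [math.log(1.0 + k) for k in range(20)]
mfcc = cepstral_coefficients(log_energies, p=20)
print(len(mfcc))
```

A useful sanity check is that a flat spectrum (identical log energies in every filter) gives coefficients that are all zero up to rounding, since the cosine terms cancel over the $N$ filters.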

C. KNN Classification
The KNN classification algorithm is applied in this paper. The KNN is considered a fast machine learning algorithm. It is a supervised learning algorithm: unknown test samples are classified against a labeled training set. To classify a reciter, the MFCC features of a test sample are compared with the training features according to their distance. The predicted class of the test sample is then determined by the minimum Euclidean distance between the test sample and the training samples. That is, given a query instance, the K nearest training instances are found according to the distance function, and the query is assigned the most common class among them. The Euclidean distance $D$ between two feature vectors $X$ and $Y$ is defined as follows:
$$D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
where $x_i$ and $y_i$ are the elements of $X$ and $Y$, and $n$ is the number of features.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 7, 2020, www.ijacsa.thesai.org
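The nearest-neighbor rule with the Euclidean distance described in this section can be sketched as follows. The value K = 3, the function names, and the two-dimensional toy vectors are assumptions for illustration; the value of K used in the experiments is not stated here, and the real feature vectors have 20 elements.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_predict(train_x, train_y, query, k=3):
    """Return the most common label among the k training samples nearest to query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set with two hypothetical reciters.
train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_x, train_y, [0.05, 0.1]))
print(knn_predict(train_x, train_y, [5.0, 5.1]))
```

There is no training phase beyond storing the labeled vectors; all the work happens at query time, when the distances to the stored samples are computed and ranked.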

D. Artificial Neural Networks
Classification techniques depend on the nature of the extracted features. In this paper, the artificial neural network (ANN) is applied as a classifier. The ANN has been applied successfully to speech recognition systems. The ANN is a nonlinear computational model that consists of many processing elements. In general, it consists of three main layer types: the input layer, the hidden layer, and the output layer. The input layer receives the initial data for the ANN, which are the extracted features. The hidden layer is the intermediate layer between the input and the output layers, and the intermediate computations of the ANN take place there. The output layer generates the output for the given inputs, which are the classes. Typically, the ANN accepts inputs and generates outputs based on its predefined activation function. There are various types of ANNs. In this paper, the multilayer perceptron (MLP) with the backpropagation learning algorithm was applied [12]. The ANN is used for training and recognizing the Quran reciters. Fig. 2 shows the ANN architecture.
The topology of the MLP designed for Quran reciter recognition had the following parameters:
The input layer contains the features (the feature matrix).
The first hidden layer contains 20 neurons.
The second hidden layer contains 20 neurons.
The output layer contains 15 neurons.
It is worth noting that all the training was performed using the backpropagation learning algorithm.
Fig. 2. The ANN Architecture [12].
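To make the listed topology concrete, the following sketch runs a forward pass through an MLP with two hidden layers of 20 neurons and an output layer of 15 neurons. Several details are assumptions for illustration: an input size of 20 (one value per MFCC feature), sigmoid hidden activations, a softmax output, and random untrained weights. The activation functions are not stated here, and backpropagation training is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer):
    """Weighted sum plus bias for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    peak = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, layers):
    """Sigmoid on hidden layers, softmax on the output layer (assumed)."""
    activations = features
    for layer in layers[:-1]:
        activations = [sigmoid(z) for z in dense(activations, layer)]
    return softmax(dense(activations, layers[-1]))


rng = random.Random(0)
sizes = [20, 20, 20, 15]  # input, hidden 1, hidden 2, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
probs = forward([0.1] * 20, layers)
print(len(probs), max(range(len(probs)), key=probs.__getitem__))
```

The index of the largest output value would be taken as the predicted class once the weights have been trained.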

IV. EXPERIMENTAL RESULTS
The proposed Quran reciter recognition system depends mainly on the MFCC features extracted from the wave files, on the learning algorithm, and on the classification. The ANN was trained with ten Quran reciters. The 20 MFCC features of the ten reciters were combined into one file per chapter, giving two files in total: one for chapter 18 and another for chapter 36. In our experiments, cross-validation was used to verify the performance of the proposed system: 80% of the data were used for training and the remaining 20% for testing, with both sets chosen randomly from the combined file. The training and testing experiments were repeated six times, each with a new random 80%/20% selection. Table II shows the average ANN and KNN recognition results for the reciter Ali AlHothaify in each chapter. Fig. 3 summarizes the recognition rates of the six experiments and their average for the same reciter in each chapter.
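The evaluation protocol, six random 80%/20% splits followed by averaging, can be sketched as below. The function names, the fixed random seed, and the `majority_baseline` scorer are hypothetical placeholders; in practice the scorer would train the ANN or the KNN on the training part and return its accuracy on the test part.

```python
import random
from collections import Counter


def repeated_holdout(samples, labels, train_and_score,
                     runs=6, train_frac=0.8, seed=0):
    """Repeat a random train/test split and return (scores, mean score)."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    cut = round(train_frac * len(indices))  # size of the training part
    scores = []
    for _ in range(runs):
        rng.shuffle(indices)
        train_idx, test_idx = indices[:cut], indices[cut:]
        scores.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx], [labels[i] for i in test_idx]))
    return scores, sum(scores) / len(scores)


def majority_baseline(train_x, train_y, test_x, test_y):
    """Placeholder scorer: always predict the most common training label."""
    guess = Counter(train_y).most_common(1)[0][0]
    return sum(1 for y in test_y if y == guess) / len(test_y)


# Dummy stand-in for chapter 18: 1100 samples spread over ten labels.
samples = [[float(i)] for i in range(1100)]
labels = [i % 10 for i in range(1100)]
scores, average = repeated_holdout(samples, labels, majority_baseline)
print(len(scores), round(average, 3))
```

With the 1100 verse files of chapter 18, each run trains on 880 files and tests on the remaining 220.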
Similar experiments were conducted for the remaining reciters, and the average recognition rate of each reciter is recorded in Table III. Table III shows that the average ANN accuracy is 97.6% and 96.7% for chapters 18 and 36, respectively; that is, about 2.4% and 3.3% of the data were misclassified. Furthermore, it shows that the average KNN accuracy is 97.03% and 96.08% for chapters 18 and 36, respectively. The main source of misclassification was the way of reading and reciting different verses. In addition, the tone variation and the emotion the reciter expresses with the verse are critical. Since all reciters follow the same Tajweed rules, their recitations are similar, which makes them harder to distinguish.
It is difficult to compare the performance of the proposed system directly with the existing similar systems in [2,5,6], since they used different criteria, different reciters, and different datasets. Table IV summarizes the performance of the existing systems and the proposed system.

V. CONCLUSION
In this paper, a machine learning approach for Quran reciter recognition was proposed using the KNN and ANN classifiers, and the performance of both classifiers was reported. The MFCC features were extracted and mapped into the ANN and the KNN for both training and testing. Two common chapters recited by ten famous reciters in Mecca and Madinah were selected. Using the ANN, the average recognition rates over all reciters were 97.6% and 96.7% for chapters 18 and 36, respectively. Using the KNN, they were 97.03% and 96.08% for chapters 18 and 36, respectively. In the future, to improve the system accuracy, it is suggested to apply sliding window features to the wave signals together with other classification algorithms, such as Hidden Markov Models (HMMs).