On Arabic Character Recognition Employing Hybrid Neural Network

Arabic characters carry intricate, multidimensional, and cursive visual information, which makes developing a machine learning system for Arabic character recognition an exciting research problem. This paper presents a neural computing approach to Arabic Optical Character Recognition (OCR). The method samples the image of each character locally into a selected feature matrix and feeds these matrices into a Bidirectional Associative Memory followed by a Multilayer Perceptron (BAMMLP) trained with the back-propagation learning algorithm. The system has been evaluated on different test patterns of Arabic characters. Experimental results show that the system recognizes Arabic characters with an overall accuracy of more than 82%.

Keywords—Arabic characters; Arabic OCR; image histogram; BAMMLP; hybrid neural network


I. INTRODUCTION
The Arabic language plays a significant role in mass communication. Over 200 million people speak Arabic as their mother tongue [1], and more than one billion people use it for various religious purposes. Arabic character recognition has therefore become an active area of research. Despite the growing interest in this area, no fully satisfactory solution has been presented, owing to the distinct and intricate characteristics of Arabic script.
Numerous research articles have been published on the recognition of English, Chinese, Japanese, Latin, Indian, and Bangla characters [2]-[8]. Comparatively little progress, however, has been made in the recognition of Arabic characters, principally owing to their cursive nature [9]. Abdelwadood et al. [10] proposed a simple Arabic character recognition system in which characters were segmented by dynamic windowing and recognized by correlation. AbdelRaouf et al. offered a comprehensive study of a multimodal Arabic corpus for OCR development [11]. Dreuw et al. proposed a hidden Markov model based OCR system [12]. Oujaoura et al. proposed a Zernike moments based Walsh transformation for feature extraction and employed neural networks to classify Arabic characters [13]. Abulnaja and Batawi proposed a fault-tolerant method to increase the success rate of Arabic character recognition [14]. For cursive styles, Alkhateeb et al. [15] employed a hidden Markov model to identify Arabic letters. Vaseghi et al. [16] presented a holistic approach to recognizing handwritten Farsi/Arabic words that employs a discrete Markov chain and a Kohonen feature map. Al-Taani et al. [17] analyzed the structural features of Arabic characters and built a decision tree learning approach for character identification.
AbdelRaouf et al. [18] proposed a Haar cascade classifier approach that uses differences between rectangular sub-windows to collect features of Arabic characters. Although characters with diagonal shapes were prominent when rotated features were considered, characters with other orientations were poorly recognized by their method. Elnagar and Bentrcia [19] used a neural network to validate the over-segmentation problem in Arabic character recognition and proposed a heuristic-based rule to accumulate strokes for accurate segmentation of characters. Supriana and Nasution [20] applied binarization and a median filter for Arabic character recognition. They employed the Hilditch operator for thinning, combined with two templates: one to prevent redundant tails and the other to eliminate redundant interest points. For segmentation, they performed line segmentation by horizontal projection with connected pixel components, and letter segmentation by the Zidouri algorithm. For feature extraction, they used 24 features. Parvez and Mahmoud [21] segmented Arabic text into words and sub-words to extract the dots, and developed an Arabic handwriting recognition system by means of morphological procedures and a fuzzy polygon matching algorithm. Mohammad et al. [22] employed three hidden Markov model skewed windows (aligned to the left, to the right, and vertically) and combined their outputs using several schemes: addition law, majority vote, and a multilayer perceptron. Al-Helali and Mahmoud [23] processed the delayed strokes of Arabic characters and proposed a framework for Arabic character recognition. Although they evaluated the statistical features of Arabic characters, they did not consider connectivity problems, variability, or style changes in the text.
This paper proposes a BAMMLP approach for Arabic character recognition that starts from local image sampling, converting each Arabic character into a selected M×N feature matrix. The system combines a Bidirectional Associative Memory (BAM) with a Multilayer Perceptron (MLP). The remainder of the article is organized as follows: Section II describes the salient features of Arabic script, Section III describes the proposed Arabic OCR algorithm, Section IV presents the architecture of the BAMMLP network, Section V reports the experimental results, and Section VI concludes the article.

www.ijacsa.thesai.org

II. SALIENT FEATURES OF ARABIC CHARACTERS
Arabic script is written from right to left and is always cursive [24], [25]. There are 28 basic characters, and each character has multiple forms depending on its position in the word. Table 1 shows the 28 Arabic characters in their Isolated, Beginning, Middle, and End forms. When written separately, a character takes its Isolated form; when joined with other characters, it takes one of the three other forms. Fig. 1 shows some characters whose Isolated forms differ from their Beginning, Middle, and End forms. Characters that share the same shape but differ in the number of dots have similar characteristics. Arabic script has the following features [1]:
1) The text is written from right to left.
2) Different characters have different sizes.
3) Different characters have different numbers of dots. Some characters have dots above the body and some below; some contain one dot, some two, some three, and some have no dots at all.
4) The same character appears in different shapes depending on its position in the word.
5) Within a word, every character is usually joined to the succeeding character. However, six characters do not attach to the succeeding character. These characters have only the Isolated and End forms.
6) Some Arabic words consist of sub-words. For example, the word رسول contains three sub-words: the character ر, the sub-word سو, and finally the character ل.
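
The sub-word rule in features 5 and 6 can be illustrated with a short sketch. It assumes that the six characters in question are ا, د, ذ, ر, ز, and و, and that a sub-word ends after each of them; the function name `split_subwords` is hypothetical and is not part of the system described in this paper.

```python
# Assumption: these six basic letters do not join to the letter that follows them.
NON_JOINING = set("ادذرزو")


def split_subwords(word):
    """Split a word into sub-words, closing a sub-word after each non-joining letter."""
    parts = []
    current = ""
    for ch in word:
        current += ch
        if ch in NON_JOINING:
            # The cursive stroke breaks here, so the sub-word is complete.
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts


print(split_subwords("رسول"))
```

For the example word above, the sketch yields the same three sub-words listed in feature 6.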

III. ARABIC CHARACTER RECOGNITION
Since Arabic letters take different shapes at different positions in a word and most letters contain one, two, or three dots, the proposed Arabic OCR algorithm employs a two-stage method: the first stage identifies the dots, and the second stage recognizes the main shape of the character. Dot identification reduces the complexity of the problem domain: some characters have the same basic shape but differ in the number of dots above or below the skeleton, as shown in Fig. 3, so counting the dots and identifying the basic shape reduces the search space. To recognize the main shape of a character, the system follows a three-step procedure, as shown in Fig. 4.
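
How the two stages combine can be sketched as a table lookup: once the main shape is recognized and the dots are counted and located, the pair selects the final character. The shape labels (`"beh-like"`, `"hah-like"`), the table `DOTTED_FORMS`, and the function `resolve_character` are illustrative assumptions; the paper does not specify how the two results are merged.

```python
# (main shape label, number of dots, dot position) -> character.
# Only two shape families are listed; the labels are hypothetical.
DOTTED_FORMS = {
    ("beh-like", 1, "below"): "ب",
    ("beh-like", 2, "above"): "ت",
    ("beh-like", 3, "above"): "ث",
    ("hah-like", 0, None): "ح",
    ("hah-like", 1, "below"): "ج",
    ("hah-like", 1, "above"): "خ",
}


def resolve_character(main_shape, dot_count, dot_position=None):
    """Combine the stage-2 main shape with the stage-1 dot information."""
    return DOTTED_FORMS.get((main_shape, dot_count, dot_position))


print(resolve_character("beh-like", 2, "above"))
```

With this arrangement the shape classifier only has to separate the dotless skeletons, which is the search-space reduction described above.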

‫ب‬ ‫ت‬ ‫خ‬ ‫ى‬
Step 1: Image acquisition: The proposed Arabic OCR system begins with an image acquisition process that scans the text at 600 dots per inch and saves the resulting images as .pgm files. Popular Arabic words are used to build the image database. After scanning, the character images undergo an affine transformation (scaling, translation, and rotation) [26].
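
The scaling and translation parts of this normalization can be sketched as cropping a character to its bounding box and resampling it onto a fixed grid. This is a minimal sketch, assuming a binary image stored as a list of rows with 1 for ink, nearest-neighbour resampling, and at least one ink pixel; rotation is not handled, and the function names are hypothetical.

```python
def bounding_box(image):
    """Return (top, bottom, left, right) of the ink pixels (value 1)."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return rows[0], rows[-1], cols[0], cols[-1]


def normalize(image, size=20):
    """Crop to the bounding box (translation) and resample to size x size (scaling)."""
    top, bottom, left, right = bounding_box(image)
    height = bottom - top + 1
    width = right - left + 1
    return [
        [image[top + (r * height) // size][left + (c * width) // size]
         for c in range(size)]
        for r in range(size)
    ]


glyph = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
for row in normalize(glyph, size=4):
    print(row)
```

The default `size=20` matches the 20×20 matrix used as BAM input in Section IV.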
Step 2: Image pre-processing: The input images may be corrupted by various sources of noise. If the noise is not suppressed, it may lead to incorrect results. Therefore, the images are filtered with a median filter to remove noise and then converted into binary images for processing.
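
A minimal sketch of this pre-processing step is given below. The 3×3 window, the fixed threshold of 128, and the convention that dark pixels map to 1 are assumptions made for illustration; the paper does not state the window size or the binarization rule it uses.

```python
def median_filter(image, radius=1):
    """Replace each pixel by the median of its neighbourhood (clipped at the borders)."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            window = sorted(
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            )
            # Upper median for even-sized windows keeps the result an integer.
            out[r][c] = window[len(window) // 2]
    return out


def binarize(image, threshold=128):
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    return [[1 if value < threshold else 0 for value in row] for row in image]


# A white page with a single dark noise pixel in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0
clean = binarize(median_filter(noisy))
print(clean)
```

The isolated noise pixel is removed by the filter, so the binarized page contains no ink.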
Step 3: Image recognition: This step involves word segmentation, character segmentation, and recognition.
Arabic characters are segmented by histogram analysis and baseline detection. The baseline consists of one or more rows that contain a higher number of black pixels than the other rows. Baselines are detected by constructing a histogram that counts the number of black pixels, followed by white pixels, in each row, as shown in Fig. 5. Subsequently, each line is considered separately for segmenting the words.
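
The row histogram and baseline detection described above can be sketched as follows, assuming a binary image with 1 for black pixels. Splitting text lines at empty rows and taking the first row with the maximum count as the baseline are simplifying assumptions, and the function names are hypothetical.

```python
def row_histogram(image):
    """Number of black pixels (value 1) in each row."""
    return [sum(row) for row in image]


def find_baseline(image):
    """Index of the first row with the highest black-pixel count."""
    histogram = row_histogram(image)
    return histogram.index(max(histogram))


def segment_lines(image):
    """Return (first_row, last_row) for each run of rows that contain ink."""
    lines = []
    start = None
    for r, count in enumerate(row_histogram(image)):
        if count > 0 and start is None:
            start = r
        elif count == 0 and start is not None:
            lines.append((start, r - 1))
            start = None
    if start is not None:
        lines.append((start, len(image) - 1))
    return lines


page = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(row_histogram(page), find_baseline(page), segment_lines(page))
```

In practice `find_baseline` would be applied to each segmented line separately, in keeping with the per-line processing described above.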

IV. BAMMLP NETWORK
The BAMMLP is a hybrid of two neural networks: 1) a Bidirectional Associative Memory (BAM) network and 2) a Multilayer Perceptron (MLP). The design of the BAMMLP network [27] is illustrated in Fig. 6.
Once image pre-processing is done, each Arabic character is represented as a 20×20 matrix and presented to the input of the BAM network. The matrix pattern is thus treated as a vector of 400 neurons. The BAM accepts an input pattern as a vector and generates an associated vector of reduced size. To build the BAM, a correlation matrix is created for each pattern pair. The BAM propagates the input vector A to the B layer, where the net input of each unit k = 1, 2, …, N is computed and the output value is set by a thresholding function.
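
For reference, a common textbook form of the discrete BAM, which the description above appears to follow, is sketched below. The symbols $\mathbf{W}$, $a_j$, $b_k$, $J$, and $P$ are notational assumptions (with $J = 400$ for the 20×20 matrix) and are not taken from the paper.

```latex
% Correlation (weight) matrix summed over P bipolar pattern pairs (A_p, B_p)
\mathbf{W} = \sum_{p=1}^{P} \mathbf{B}_p \mathbf{A}_p^{T}

% Net input to unit k of the B layer
\mathrm{net}_k^{B} = \sum_{j=1}^{J} w_{kj}\, a_j , \qquad k = 1, 2, \ldots, N

% Thresholded output of unit k
b_k(t+1) =
\begin{cases}
  +1,      & \mathrm{net}_k^{B} > 0 \\
  b_k(t),  & \mathrm{net}_k^{B} = 0 \\
  -1,      & \mathrm{net}_k^{B} < 0
\end{cases}
```

The backward pass uses the transposed weights, $\mathrm{net}_j^{A} = \sum_{k=1}^{N} w_{kj}\, b_k$, with the same thresholding rule.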
The pattern B formed in the Y layer is then propagated back to the X layer, where the net input is computed and the output values are decided by thresholding. The output of the BAM layer is fed to the input of the MLP. The Multilayer Perceptron (MLP) is trained by the back-propagation algorithm [28], [29].
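
The forward and backward passes of a BAM can be sketched in a few lines. This is a minimal illustration assuming bipolar (+1/-1) patterns, a weight matrix formed as the sum of outer products of the pattern pairs, and a threshold that keeps the previous state when the net input is zero; the toy patterns and all function names are hypothetical and much smaller than the 400-element vectors used in the paper.

```python
def threshold(net, previous):
    """Bipolar threshold; a zero net input leaves the unit unchanged."""
    if net > 0:
        return 1
    if net < 0:
        return -1
    return previous


def train_bam(pairs):
    """Correlation matrix: sum over pattern pairs of the outer product B A^T."""
    n_b, n_a = len(pairs[0][1]), len(pairs[0][0])
    w = [[0] * n_a for _ in range(n_b)]
    for a, b in pairs:
        for k in range(n_b):
            for j in range(n_a):
                w[k][j] += b[k] * a[j]
    return w


def forward(w, a, b_prev):
    """Propagate A to the B layer."""
    return [threshold(sum(wkj * aj for wkj, aj in zip(row, a)), bp)
            for row, bp in zip(w, b_prev)]


def backward(w, b, a_prev):
    """Propagate B back to the A layer using the transposed weights."""
    return [threshold(sum(w[k][j] * b[k] for k in range(len(b))), a_prev[j])
            for j in range(len(a_prev))]


def recall(w, a, n_b, max_iters=10):
    """Alternate forward and backward passes until the pair stops changing."""
    b = [1] * n_b
    for _ in range(max_iters):
        new_b = forward(w, a, b)
        new_a = backward(w, new_b, a)
        if new_a == a and new_b == b:
            break
        a, b = new_a, new_b
    return a, b


A1, B1 = [1, -1, 1, -1, 1, -1], [1, 1]
A2, B2 = [1, 1, 1, -1, -1, -1], [1, -1]
W = train_bam([(A1, B1), (A2, B2)])
noisy_A1 = [1, -1, 1, -1, 1, 1]  # last element flipped
print(recall(W, noisy_A1, n_b=2))
```

In this toy case the 6-element input is mapped to a 2-element associated vector, mirroring how the BAM stage reduces the pattern size before the MLP.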
Step 1: Initialization: Initialize the network with all the weights and threshold parameters of the MLP to small random numbers.
In the activation equations of Step 2, sigmoid is the sigmoidal activation function; $w_{kl}$ is the weight between neuron k in the input layer of the MLP and neuron l in the hidden layer; $w_{lm}$ is the weight between neuron l in the hidden layer and neuron m in the output layer; and $\theta_l$ and $\theta_m$ are the threshold values of the respective neurons.
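
Under the usual thresholded-sum form of back-propagation networks, the Step 2 activations described by these symbols would read as follows. This is a sketch: the input symbol $x_k$, the outputs $y_l$ and $y_m$, and the layer sizes $K$ and $L$ are assumed notation, and the paper's own equations may differ in detail.

```latex
% Hidden-layer activation of neuron l
y_l = \operatorname{sigmoid}\!\left( \sum_{k=1}^{K} x_k\, w_{kl} - \theta_l \right)

% Output-layer activation of neuron m
y_m = \operatorname{sigmoid}\!\left( \sum_{l=1}^{L} y_l\, w_{lm} - \theta_m \right)

% Sigmoidal activation function
\operatorname{sigmoid}(z) = \frac{1}{1 + e^{-z}}
```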
Step 3: Weight modification: Update the weights of the MLP by propagating the errors in the backward direction.
Step 4: Iteration: Increase iteration i by one, loop back to Step 2 and repeat the process until the error value reduces to the desired level.
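
The four training steps can be put together in a compact sketch. It assumes one hidden layer, online (per-sample) updates, a sum-of-squared-errors stopping criterion, and the thresholded-sum activation form; the learning rate `alpha`, the weight range, the toy OR data set, and the function name `train_mlp` are illustrative assumptions rather than the settings used in the experiments.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(samples, n_hidden, alpha=0.5, max_iters=2000,
              target_error=0.01, seed=0):
    """Train a one-hidden-layer MLP; return the per-iteration squared error."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    # Step 1: initialize weights and thresholds to small random numbers.
    w_kl = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    w_lm = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
    th_l = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    th_m = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    history = []
    for _ in range(max_iters):
        sse = 0.0
        for x, d in samples:
            # Step 2: activate the hidden (l) and output (m) layers.
            y_l = [sigmoid(sum(x[k] * w_kl[k][l] for k in range(n_in)) - th_l[l])
                   for l in range(n_hidden)]
            y_m = [sigmoid(sum(y_l[l] * w_lm[l][m] for l in range(n_hidden)) - th_m[m])
                   for m in range(n_out)]
            err = [d[m] - y_m[m] for m in range(n_out)]
            sse += sum(e * e for e in err)
            # Step 3: propagate the error backward and update the parameters.
            delta_m = [y_m[m] * (1 - y_m[m]) * err[m] for m in range(n_out)]
            delta_l = [y_l[l] * (1 - y_l[l]) *
                       sum(delta_m[m] * w_lm[l][m] for m in range(n_out))
                       for l in range(n_hidden)]
            for l in range(n_hidden):
                for m in range(n_out):
                    w_lm[l][m] += alpha * y_l[l] * delta_m[m]
            for m in range(n_out):
                th_m[m] -= alpha * delta_m[m]
            for k in range(n_in):
                for l in range(n_hidden):
                    w_kl[k][l] += alpha * x[k] * delta_l[l]
            for l in range(n_hidden):
                th_l[l] -= alpha * delta_l[l]
        history.append(sse)
        # Step 4: repeat until the error falls to the desired level.
        if sse <= target_error:
            break
    return history


or_samples = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
errors = train_mlp(or_samples, n_hidden=2)
print(len(errors), errors[0], errors[-1])
```

The returned list corresponds to an error-versus-iteration curve of the kind discussed in Section V.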
To recognize Arabic characters, it is not necessary to train the network on every character of the Arabic alphabet. Only the basic (main-shape) characters, without dots, need to be trained. All other characters can be resolved using the position and number of their dots.

V. EXPERIMENTAL RESULTS
The efficacy of the approach has been validated on numerous Arabic texts of different resolutions. The system is capable of segmenting and identifying characters in images of various orientations and background conditions. Experiments were carried out on a PC with an Intel Core™ i5-2390T CPU @ 2.70 GHz and 4 GB of RAM. The Arabic character recognition system was implemented in the Visual C++ programming language. Fig. 7 shows a snapshot of the program recognizing a typical individual Arabic character. The error-versus-iteration graphs show that the errors decrease exponentially. For the BAMMLP, the error value falls to 0.01 at 1996 iterations, whereas for the MLP alone it remains 0.264 even after 35,000 iterations. The graphs reveal that the BAMMLP network outperforms the MLP in terms of the number of iterations needed to train on the Arabic characters. Fig. 9 shows the error versus iteration graph for the BAMMLP with the number of hidden-layer neurons set to 50% and 70% of the number of input-layer neurons, respectively. With fewer neurons in the hidden layer, the computational cost is lower and recognition becomes faster, but accuracy calls for more neurons in the hidden layer, so there is always a trade-off in choosing the hidden-layer size. In this experiment, the learning process reached the expected threshold level in fewer than 5,000 iterations when the number of hidden-layer neurons was 50% of the number of input-layer neurons. In contrast, with 70%, the same threshold level was reached only after 30,000 iterations. Therefore, the number of neurons in the hidden layer was chosen as 50% of the number of neurons in the input layer for recognizing Arabic characters. The BAMMLP hybrid neural network was then used to recognize randomly chosen characters, as shown in Fig. 10. The recognition rate for different Arabic characters in isolated form is shown in Fig. 11.
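
The computational side of the hidden-layer trade-off discussed above can be made concrete by counting trainable parameters. The sketch below assumes a single hidden layer with one threshold per neuron; the input size of 100 and the output size of 28 are purely illustrative values, not figures reported in this paper, and `mlp_parameter_count` is a hypothetical helper.

```python
def mlp_parameter_count(n_input, hidden_percent, n_output):
    """Weights plus thresholds for a one-hidden-layer MLP."""
    n_hidden = n_input * hidden_percent // 100
    weights = n_input * n_hidden + n_hidden * n_output
    thresholds = n_hidden + n_output
    return weights + thresholds


for percent in (50, 70):
    print(percent, mlp_parameter_count(100, percent, 28))
```

Under these assumptions the 70% configuration has about 40% more parameters to update per iteration than the 50% configuration, in addition to needing more iterations to reach the threshold.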

VI. CONCLUSIONS
An efficient Arabic character recognition system has been presented, based on a hybrid neural network that consists of a BAM and a multilayer perceptron. The system is fast, carrying out recognition in less than 1 ms for all forms of Arabic characters, which suggests that the method is suitable for real-time applications. Our next step will be Arabic number plate recognition for applications including black-lists, white-lists, and alarm functions.

Fig. 1. Characters whose isolated forms are distinguished from their Beginning, Middle, and End forms.

Fig. 2 illustrates a concise summary of the striking features of Arabic script: 1) it is written from right to left; 2) different characters have different sizes; 3) different characters have different numbers of dots, and some characters contain no dots at all; 4) the same character appears with different profiles; 5) some characters are not connected to the succeeding characters; 6) some words consist of sub-words.

Step 2: Activation: Activate the MLP by applying the training set and computing the activations of the neurons in the l and m layers.

Fig. 7. Snapshot of the software interface for Arabic character recognition.

Fig. 10. Recognition of three test images.

Experiments were conducted separately for Arabic character recognition in four different forms (Isolated, Beginning, Middle, and End), and the outcomes are given in Table 2.

Fig. 11. Recognition rate for different Arabic characters in isolated form.

TABLE II. ACCURACY FOR DIFFERENT FORMS OF ARABIC CHARACTERS