Off-Line Arabic (Indian) Numbers Recognition Using Expert System

This paper proposes an effective approach to automatic recognition of printed Arabic numerals which are extracted from digital images. First, the input image is normalized and pre-processed to an acceptable form. From the preprocessed image, components of the words are segmented into individual objects representing different numbers. Second, the numerical recognition is performed using an expert system based on a set of if-else rules, where each set of rules represents the categorization of each number. Finally, rigorous experiments are carried out on 226 random Arabic numerals selected from 40 images of Iraqi car plate numbers. The proposed method attained an accuracy of 97%.

Keywords: Arabic numeral character recognition; Image Processing; Pattern Recognition; Feature Extraction; Object Segmentation; Expert System


I. INTRODUCTION
Automatic character recognition is becoming very important in many practical applications such as postcode identification and car plate number recognition. A traffic police officer may want to document the license plate numbers of approaching vehicles. Performing this task manually would be very laborious and would take a significant amount of time. Conversely, an automated process that uses a camera to capture the plate numbers and recognizes them using a predictive model would not only be beneficial in terms of computational time, but would also reduce the human effort required for such a task. Furthermore, these systems can be used for surveillance or monitoring of specific events using numbers.
However, most research studies in this area concentrate on Latin character recognition. In this paper, an effective method for Arabic character recognition is presented, which is also applicable to the Kurdish and Persian languages [1].
Arabic is the first language in all Arabic countries. In total, the estimated population of these countries is 280 million, and other countries that consider Arabic a second language have an estimated population of 250 million. Moreover, Arabic is ranked fifth among the most commonly used languages in the world.
Due to the wide usage of the Arabic language, it is highly desirable to develop an effective and automatic Arabic character recognition system. Therefore, in this paper, a technique to recognize Arabic numerals using handcrafted features and an expert system for decision making is proposed. This method involves extracting geometric features for each object, which is then classified using the expert system, as discussed in depth in the subsequent sections. It is worth noting that in this paper, the term Indian numerals denotes the numerals used in Arabic, Kurdish and Persian writing. There are 10 basic Indian numerals, ranging from 0 to 9. The rest of the paper is organized as follows. Section 2 reviews previous research works in this domain. Section 3 outlines the methods used for developing the proposed numeral character recognition system. Section 4 describes the experiment implementation. Section 5 provides the analysis of results and discussions on the main findings of the paper. Finally, this research is summarized in Section 6.

II. LITERATURE REVIEW
Many research studies have been conducted on automatic Arabic numeral recognition. Some studies focus on handwritten recognition, while others concentrate on printed materials. A recognition system for offline handwritten Arabic numerals that exploits the properties of Hindi (Arabic) numerals as a powerful set of features is proposed in [3]. This method is mainly based on image processing operations and a decision-making stage that uses if-else statements to determine the appropriate character output [3]. Also, a Latin number recognition system for number plate localization and segmentation is presented in [4,5]. Here, the authors adopted a skeletonization method for feature extraction, and recognition of the characters is based on a Support Vector Machine (SVM). It is claimed that the method is invariant to translation and illumination variation [4,5].
An efficient shift and scale invariant approach for offline machine-printed decimal digit recognition, which computes the correlation factor between the reference and test image to perform recognition, is described in [6]. Also, in [7], a technique utilizes a number of statistical methods to perform machine-print recognition. In addition, several approaches based on Neural Networks and Support Vector Machines (SVM) have been investigated for recognition of on-line and off-line handwritten Arabic and Hindi numerals [8][9][10][11][12][13][14][15]. Likewise, Hidden Markov Models have been adopted for recognition of off-line handwritten numerals [16]. Furthermore, in [17], genetic programming is used to recognize handwritten digits; however, it falls short in terms of recognition rate. In [18], translational motion estimation has been examined for the recognition of offline machine-printed Hindi digits.

Olasimbo Ayodeji Arigbabu contributed to this work while he was a graduate student at Universiti Putra Malaysia. www.ijacsa.thesai.org

A. Framework Design
The overall steps involved in the proposed method are illustrated in Figure 1. The process starts with an input image that contains number objects. Initially, the image is converted to black and white. Then, image normalization (cropping) is performed to localize the region of interest (ROI), which is mainly composed of the number section. In addition, an image complement operation is applied to the normalized image to enable further processing on image pixels with a value of 1.

Fig. 2. Car number plate image before and after the normalization operation

Further, preprocessing operations including noise removal and image enhancement are performed. Afterward, image segmentation is applied using the region labeling method [19], and finally, the proposed algorithm for number recognition is implemented to decide each number's identity.
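As a rough illustration of the region labeling step, the following Python sketch groups 8-connected object pixels into separate objects. The function name `label_regions`, the use of 8-connectivity and the left-to-right ordering of the returned objects are assumptions made for this example; the method in [19] may differ in its details.

```python
from collections import deque

def label_regions(img):
    """Group 8-connected foreground pixels of a binary image (rows of 0/1).

    Returns one list of (row, col) pixels per region, ordered left to right
    by the leftmost column of each region (an assumed ordering).
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited object pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            regions.append(pixels)
    regions.sort(key=lambda px: min(x for _, x in px))
    return regions

# Toy plate with two separate objects (object pixels = 1).
plate = [
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1],
]
objects = label_regions(plate)
```

Each returned pixel list can then be cropped and passed on to feature extraction one object at a time.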

B. Preprocessing
The normalized input image is converted to binary and the complement of the binary image is derived, as shown in Fig. 3a. Then, a median filter with a 3 x 3 kernel is used for noise removal, as depicted in Fig. 3b. A morphological closing operation is performed on the filtered image using a 7 x 7 structuring element, and an opening operation is used to remove unwanted small groups of pixels from the binary image. Figure 3 depicts the outputs of the four operations adopted in this research for preprocessing before proceeding to number recognition.
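The first two preprocessing operations can be sketched in plain Python as follows. This is a minimal illustration, assuming the binary image is stored as nested lists and that pixels outside the border count as 0; the names `complement` and `median_filter_3x3` are hypothetical, and the closing and opening steps are omitted for brevity.

```python
def complement(img):
    """Invert a binary image so that object pixels take the value 1."""
    return [[1 - p for p in row] for row in img]

def median_filter_3x3(img):
    """3 x 3 median filter for a binary image (zero padding assumed)."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ones = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = r + dy, c + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        ones += img[ny][nx]
            # The median of nine binary values is 1 when at least five are 1.
            out[r][c] = 1 if ones >= 5 else 0
    return out

# Dark object on a light background, plus one isolated noise pixel at (4, 4).
binary = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
inverted = complement(binary)
cleaned = median_filter_3x3(inverted)
```

In this toy image the isolated pixel is removed while the interior of the object is kept; the corners of small objects are also rounded off, which is a known side effect of median filtering.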

C. Expert System
An expert system is a computer system that emulates the decision-making ability of a human expert [22]. Expert systems are suitable tools for implementing structural pattern recognition techniques, and they help to solve difficult pattern recognition problems. More rules and human experience can be added easily using rule-based systems, especially in closed-system applications with precise inputs and logical outputs [23,24]. Expert systems have a number of major system components and interface with individuals who interact with the system in various roles, as shown in figure 4. In rule-based expert systems, there are two basic inference techniques: forward chaining and backward chaining. The domain knowledge is represented by a set of IF-THEN production rules, and the data is represented by a set of facts about the current situation. Matching the IF parts of the rules to the facts produces inference chains. An inference chain indicates how an expert system applies the rules to reach a conclusion. The inference engine must decide when the rules have to be fired [24]. An inference engine using forward chaining searches the inference rules until it finds one whose IF clause is known to be true. Forward chaining is used in this paper because it matches the data-driven reasoning of the proposed methodology. The reasoning starts from the known data and proceeds forward with that data. Each time, only the topmost rule is executed, and when fired, the rule adds a new fact to the database. Any rule can be executed only once, and the match-fire cycle stops when no further rules can be fired [25,26].
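The match-fire cycle described above can be sketched as a small forward-chaining loop. This is an illustrative sketch only: the function name `forward_chain` and the fact names in the example rules are hypothetical, and facts are modelled simply as strings in a set.

```python
def forward_chain(facts, rules):
    """Run the match-fire cycle until no further rule can fire.

    facts: iterable of strings known to be true.
    rules: ordered list of (conditions, conclusion), conditions being a set.
    In each cycle only the topmost matching rule fires, and each rule fires
    at most once.
    """
    facts = set(facts)
    fired = set()
    while True:
        for index, (conditions, conclusion) in enumerate(rules):
            if index not in fired and conditions <= facts:
                facts.add(conclusion)  # firing adds a new fact to the database
                fired.add(index)
                break                  # restart matching from the top
        else:
            return facts               # no rule fired in this cycle: stop

# Hypothetical facts for illustration.
rules = [
    ({"euler_is_zero", "mid_flips_at_least_3"}, "object_is_5"),
    ({"object_is_5"}, "report_digit"),
]
result = forward_chain({"euler_is_zero", "mid_flips_at_least_3"}, rules)
```

Starting from the two measured facts, the first rule adds `object_is_5`, which in turn allows the second rule to fire in the next cycle.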

D. Feature Extraction and Recognition
This section discusses the features that are useful for recognition of each numerical object, as well as how the features are obtained or extracted. Prior to that, it is essential to mention that the recognition operation is performed by processing the objects one at a time. Therefore, the object segmentation operation is crucial in the proposed method. Details of the segmentation operation are elaborated in [27]. For instance, figure 5 shows some examples of segmented Arabic numbers for feature extraction, where each number is processed separately. The feature extraction and recognition of each number are performed sequentially. In other words, the features of a particular number are extracted using the proposed algorithm, and then the number's identity is decided based on the set of rules in the expert system. The process examines every possible match of the facts provided in the inference engine to determine the expected identity of the number. For instance, a random number can be predicted by first checking whether its feature properties align with the facts about number 5 in the inference engine. If an alignment is not found, the system examines the possibility with the facts about number 9. The process is repeated until it reaches number 6; as soon as a match is found, the expected identity is presented to the user.
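The ordered checking described above can be sketched as a rule cascade over a dictionary of already-extracted features. The thresholds below follow the prose conditions stated for each number in the remainder of this section; the dictionary keys, the names `RULES` and `classify`, and the example feature values are hypothetical and chosen only for illustration.

```python
# Rules in the order used by the paper: 5, 9, 8, 7, 3, 2, 0, 4, 1, 6.
RULES = [
    ("5", lambda f: f["euler"] == 0 and f["flips_mid"] >= 3),
    ("9", lambda f: f["euler"] == 0 and f["aspect"] > 0.6 and f["flips_low"] <= 2),
    ("8", lambda f: f["euler"] == 1 and f["w_low"] > f["w_mid"] > f["w_up"]),
    ("7", lambda f: f["euler"] == 1 and f["w_low"] < f["w_mid"] < f["w_up"]),
    ("3", lambda f: f["euler"] == 1 and f["flips_top"] >= 6 and f["flips_bottom"] == 2),
    ("2", lambda f: f["euler"] == 1 and f["ratio"] >= 0.25
                    and f["angle_p1p3"] > 0 and f["flips_2"] == 2),
    ("0", lambda f: f["euler"] == 1 and f["solidity"] > 0.9 and f["aspect"] > 0.5),
    ("4", lambda f: f["euler"] == 1 and f["flips_vertical"] >= 4),
    ("1", lambda f: f["euler"] == 1 and f["ratio"] < 0.25 and f["angle_p1p3"] < 0),
    ("6", lambda f: f["euler"] == 1 and f["ratio"] >= 0.25 and f["angle_p1p2"] < 0),
]

def classify(features):
    """Return the first digit whose rule matches, or None if none does."""
    for digit, rule in RULES:
        if rule(features):
            return digit
    return None

# Made-up feature values: no hole, and widths growing from top to bottom.
example = {
    "euler": 1, "flips_mid": 2, "aspect": 0.4, "flips_low": 2,
    "w_up": 3, "w_mid": 8, "w_low": 14,
    "flips_top": 2, "flips_bottom": 2, "flips_2": 2, "flips_vertical": 2,
    "ratio": 0.8, "angle_p1p3": 50.0, "angle_p1p2": -50.0, "solidity": 0.3,
}
digit = classify(example)
```

Because the first match wins, the order of the rules matters: an object that satisfies the conditions of several numbers is assigned to the one examined earliest.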

Number 5 Recognition:
In order to assign the identity '5' to an object, two conditions (facts) should be satisfied. Firstly, the Euler number should be zero. The Euler number describes the relation between the number of contiguous parts and the number of holes in a shape. Let S denote the number of contiguous parts and N the number of holes in a shape. Thus, the Euler number E is determined as in (1):

$$E = S - N \tag{1}$$

For example, the Euler number for shape (B) is -1, for shape (9) it is 0, and for shape (3) it is 1. Secondly, the number of flips should be greater than or equal to 3. The number of flips is computed by scanning the object from left to right at the mid-level of the image, as shown in figure 6. The pseudo code for number five is as follows:

If [(Euler Number = 0) && (Flip_number_mid_horizontally >= 3)] Then
The Object is '5';

In figure 6, the arrow indicates three flips when scanning from left to right at the mid-level of the image. The number of flips is simply a count of the alternating transitions of pixel values from "1" to "0" or vice versa. Below is a pseudo code for extracting the number of flips.
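The two features used for number '5', the Euler number and the flip count along a scan line, can be sketched in Python as below. This is a hedged illustration rather than the authors' code: the helper names are hypothetical, object pixels are assumed to be 1, the object is assumed to be 8-connected with 4-connected holes, and the middle row is taken as `len(img) // 2`.

```python
from collections import deque

def count_components(img, value, eight_connected):
    """Count connected regions of pixels equal to `value`."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    if eight_connected:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and img[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def euler_number(img):
    """E = S - N: contiguous object parts minus holes."""
    # Pad with background so that the outer background forms a single region.
    width = len(img[0]) + 2
    padded = [[0] * width] + [[0] + list(row) + [0] for row in img] + [[0] * width]
    parts = count_components(padded, 1, eight_connected=True)
    holes = count_components(padded, 0, eight_connected=False) - 1
    return parts - holes

def count_flips(scan_line):
    """Count transitions between 0 and 1 along one scan line."""
    return sum(1 for a, b in zip(scan_line, scan_line[1:]) if a != b)

# Toy ring-shaped object with one hole.
ring = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
e = euler_number(ring)
mid_flips = count_flips(ring[len(ring) // 2])
```

How flips are counted at the image border (for example, whether a scan line that starts on an object pixel contributes a flip) is not specified in the text, so the counts here may differ by one from those shown in the figures.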
first_value= (

Number 9 Recognition:
In case the conditions for number '5' are not satisfied, the object will be examined with the conditions for number '9'. The first condition is that the Euler number should be zero. The second is that the aspect ratio (calculated by dividing the minor axis by the major axis, as shown in figure 7) should be more than 0.6. The aspect ratio formula is specified in (2):

$$\text{Aspect ratio} = \frac{\text{minor axis length}}{\text{major axis length}} \tag{2}$$

The third condition is that the flip count obtained by scanning the object from left to right at its lower part should be less than or equal to 2. As illustrated in figure 7, the arrow indicates the scanning direction used to count the number of flips, which in this case equals 2. The pseudo code for number 9 recognition is:

If [(Euler Number = 0) && (Aspect ratio > 0.6) && (flips_lower_horizontal <= 2)] Then
The Object is '9';

Number '8' Recognition:
Three conditions are used to examine whether the object is number '8' when the conditions for numbers '5' and '9' are not satisfied. Firstly, the Euler Number should be equal to one. Secondly, the widths of the object at the upper, middle, and lower segments are checked: the width at the upper segment should be less than the widths at the middle and lower segments. The final condition is that the middle width should be less than the width at the lower segment of the object. The pseudo code is as follows:

If [(Euler Number = 1) && (lower_dist > middle_dist > upper_dist)] Then
The Object is '8';

Number 7 Recognition:
Three conditions are also considered to determine whether the preprocessed object is number 7 when the conditions for '5', '9', and '8' are not satisfied. The Euler Number of the object should be equal to one. The width of the object at the upper segment should be greater than the widths of the middle and lower segments. The final condition is that the width at the middle segment should be larger than the width at the lower segment. The pseudo code is as follows:

If [(Euler Number = 1) && (lower_dist < middle_dist < upper_dist)] Then
The Object is '7';

Number 3 Recognition:
When the conditions for numbers '5', '9', '8' and '7' are not satisfied, the object will be examined with the following conditions to determine whether the number is '3'. Firstly, the Euler Number should be equal to 1. Then, the algorithm calculates the number of flips for two separate parts. The first part is located in the top quarter of the number object, as highlighted with the upper arrow in figure 10. Here, the number of flips must be greater than or equal to 6. This feature is quite discriminating in comparison to the features of the other numbers, since 3 is the only object whose number of flips reaches 6 at this position. The third condition is obtained by calculating the flips in the lower quarter of the object, and the result should be equal to 2. The pseudo code for the number three object is as follows:

If [(Euler Number = 1) && (flip_top_quarter >= 6) && (flip_bottom_quarter = 2)] Then
The Object is '3';

Number 2 Recognition:
Now, conditions are described to determine the identity of the input image as 2 when the conditions for numbers '5', '9', '8', '7' and '3' are not satisfied. Prior to that, it is important to mention that morphological processing based on skeletonization [28], as shown in figure 11, is adopted in order to extract features that are peculiar to number 2 and to ease the subsequent feature computations. The four conditions are as follows. Firstly, the Euler Number should be equal to one. Secondly, the ratio fraction of the object should be greater than or equal to 0.25. This factor is determined by dividing the width (W) distance by the height (H) distance. In order to compute each of these distances, three basic points are located on the object as illustrated in figure 11: P1 is positioned at the bottom of the object, P2 at the upper left, and P3 at the upper right. Since each point has coordinates x and y as P(x,y), the height (H), width (W) and slope distance are determined according to the Euclidean distance:

$$d(P_a, P_b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2} \tag{3-5}$$

where the height (H), the width (W) and the slope distance are each obtained by applying this distance to the corresponding pair of points. Then the ratio is computed as:

$$\text{Ratio} = \frac{W}{H} \tag{6}$$

The third condition is that the angle between the slope line (P1-P3) and the x-axis must be positive (larger than zero) to indicate that the object is number '2'. Let P3(x3,y3) and P1(x1,y1) denote the two points of the slope line as shown in figure 11; then the angle is computed using (7) and (8):

$$\theta = \frac{y_3 - y_1}{x_3 - x_1} \tag{7}$$

$$\text{Angle} = \tan^{-1}(\theta) \tag{8}$$

The fourth condition is obtained by counting the number of flips, which should be equal to 2 in order to differentiate the number from number three. The following is the pseudo code for the four conditions:

If [(Euler Number = 1) && (Ratio >= 0.25) && (Angle > 0) && (flips = 2)] Then
The Object is '2';

Number 0 Recognition:
In case the conditions for numbers '5', '9', '8', '7', '3' and '2' are not satisfied, the object will be examined with the conditions for number '0', which are described as follows. Firstly, the Euler Number should be equal to one. Secondly, the solidity (S) factor should be greater than 0.9 (S > 0.9). Solidity (S) is a scalar specifying the proportion of the pixels in the convex hull that are also in the region, as shown in figure 12. It is computed in (9) as follows:

$$S = \frac{\text{Area}_S}{\text{Area}_{CH}} \tag{9}$$

where the area of the object (Area_S) is calculated according to the conditional equation in (10), with f(x,y) = 1 if the pixel at (x,y) belongs to the object and 0 otherwise:

$$\text{Area}_S = \sum_{x} \sum_{y} f(x,y) \tag{10}$$

and the area of the convex hull is calculated in the same way over the pixels c(x,y) that lie inside the convex hull of the object:

$$\text{Area}_{CH} = \sum_{x} \sum_{y} c(x,y) \tag{11}$$

The third condition is that the aspect ratio, which is the division of the minor axis by the major axis, should be more than 0.5. This factor is chosen as both of the mentioned axes sum up to 1, thus 0.5 is considered to take the worst case. The pseudo code of the three conditions is:

If [(Euler Number = 1) && Solidity > 0.9 && Aspect_Ratio > 0.5] Then
The Object is '0';

Number 4 Recognition:
In case the conditions for numbers '5', '9', '8', '7', '3', '2' and '0' are not satisfied, the object will be examined with the two conditions for number '4'. Firstly, the Euler number should be equal to one. Secondly, the number of flips should be greater than or equal to 4.
To retrieve the number of flips, the object is scanned from the top middle point to the bottom of the image, as shown in figure 13.

Number 1 Recognition:
If, after examining the object with the aforementioned conditions, its identity cannot be decided, the following conditions describing the facts about number '1' are considered. As before, skeletonization is utilized to enable extraction of detailed information about number 1. The three conditions are as follows. Firstly, the Euler Number must be equal to one. Secondly, the ratio fraction of the object must be less than 0.25 (the opposite of the number 2 and 6 recognition). This factor is determined by dividing the width distance by the height distance. In order to compute each distance, three basic points are located on the object as in figure 14: P1 is positioned at the bottom of the object, P2 at the upper left, and P3 at the upper right.
As before, each point has coordinates x and y as P(x,y). The height, width and slope distances are determined according to the Euclidean distance and depicted in figure 14, as explained in equations (3), (4), (5) and (6). Thirdly, the angle between the slope line (P1-P3) and the x-axis must be negative (less than zero) to indicate that the object is number '1'. Let P3(x3,y3) and P1(x1,y1) be the two points of the slope line as shown in figure 14. Then, the angle is computed by taking the inverse tangent of theta as explained in equations (7) and (8). The pseudo code for the three conditions is as follows:

If [(Euler Number = 1) && (Ratio < 0.25) && (Angle < 0)] Then
The Object is '1';

Number 6 Recognition:
Finally, if the conditions for numbers '5', '9', '8', '7', '3', '2', '0', '4', and '1' have not been satisfied, the object will be examined with the facts about number '6', which are described in this paragraph. In this case, too, skeletonization is initially used to preprocess the object. The three conditions are as follows. Firstly, the Euler Number must be equal to one. Secondly, the ratio fraction of the object should be greater than or equal to 0.25 (the opposite of number '1' recognition). This factor is determined by dividing the width distance by the height distance. In order to compute each distance, three basic points are located on the object as depicted in figure 15: P1 is positioned at the bottom of the object, P2 at the upper left, and P3 at the upper right. Since each point has coordinates x and y as P(x,y), the height distance (H), width distance (W), slope distance and width-to-height ratio are determined based on the Euclidean distance using equations (12), (13), (14) and (15):

$$d(P_a, P_b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2} \tag{12-14}$$

$$\text{Ratio} = \frac{W}{H} \tag{15}$$

The third condition is that the angle between the slope line (P1-P2) and the x-axis should be negative (less than zero) to indicate that the object is number '6'. It is computed as follows:

$$\theta = \frac{y_2 - y_1}{x_2 - x_1} \tag{16}$$

$$\text{Angle} = \tan^{-1}(\theta) \tag{17}$$

The pseudo code for the three conditions is as follows:

If [(Euler Number = 1) && (Ratio >= 0.25) && (Angle < 0)] Then
The Object is '6';
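As a hedged illustration (not the authors' pseudo code), the geometric quantities used in the rules for numbers '2', '1' and '6' can be computed as in the Python sketch below. Several details are assumptions made for this example because the text does not state them: the point coordinates are invented, the y-axis is taken to increase upward, the width is measured between P2 and P3, and the height between P1 and P2.

```python
import math

def distance(p, q):
    """Euclidean distance between two points given as (x, y)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def slope_angle(p_start, p_end):
    """Angle in degrees, obtained as the inverse tangent of the slope."""
    dx = p_end[0] - p_start[0]
    dy = p_end[1] - p_start[1]
    if dx == 0:
        # A vertical line has no finite slope; treat it as +/-90 degrees.
        return 90.0 if dy >= 0 else -90.0
    return math.degrees(math.atan(dy / dx))

# Hypothetical skeleton points: P1 at the bottom, P2 upper left, P3 upper right.
p1, p2, p3 = (2.0, 0.0), (0.0, 10.0), (6.0, 10.0)

width = distance(p2, p3)    # assumed point pair for W
height = distance(p1, p2)   # assumed point pair for H
ratio = width / height

angle_p1_p3 = slope_angle(p1, p3)   # sign tested for numbers '2' and '1'
angle_p1_p2 = slope_angle(p1, p2)   # sign tested for number '6'
```

With these example points the ratio exceeds 0.25, the P1-P3 angle is positive and the P1-P2 angle is negative. If image coordinates with y increasing downward are used instead, the signs of both angles flip, so the convention has to match the one used when the thresholds were chosen.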

IV. EXPERIMENT AND IMPLEMENTATION
To test the proposed algorithm and ascertain its ability to generalize to any random input number, the algorithm is evaluated with car plate numbers in this experiment. The dataset consists of 226 Arabic numerals collected randomly from 40 images of Iraqi car plates. It is important to mention that the images are captured in real-world conditions, where several imaging factors such as illumination, shadow, camera view and incidental lighting are not constrained. Normally, in verification or identification comparisons, there are two possible error measures: the False Accept Rate (FAR), which results from forged templates that the system falsely accepts during testing, and the False Rejection Rate (FRR), which results from genuine templates that the system wrongly rejects [29,30]. Finally, the total accuracy of the system is calculated by subtracting the average error rate from 100% as in (18):

$$\text{Accuracy} = 100\% - \text{average error rate} \tag{18}$$

In this research, the FAR error does not exist, as there are no forged templates in this experiment. Therefore, FAR is equal to zero. FRR, however, is used as the testing measure to estimate the recognition rate: the Arabic numbers are considered genuine templates, so if they are wrongly recognized by the system, the FRR increases. For example, number '5' is deemed a genuine template; if the system recognizes it as 5, the FRR remains zero, otherwise the FRR increases. Finally, the equations used to estimate the accuracy are (19) and (20):

$$\text{FRR} = \frac{\text{number of wrongly recognized numerals}}{\text{number of queried numerals}} \times 100\% \tag{19}$$

$$\text{Accuracy} = 100\% - \text{FRR} \tag{20}$$

The proposed algorithm is summarized in figure 16.

V. RESULT AND DISCUSSION
Two different results are reported as the outcomes of this research. The first result is the recognition error for each distinct Arabic number among the 226 samples, which is obtained from the system output for a specific numeral type divided by the total input (queried) of the same numeral type in the randomly collected dataset. For example, as shown in Table 1, numeral 8 occurs 25 times with no error in the recognition output (FRR = 0). For the remaining numeral types, Table 1 lists the number of occurrences in the dataset together with the system output and the resulting accuracy.
It can be seen in Table 1 that the highest False Rejection Rate (FRR) is 11.53%, which belongs to the Arabic number 6. In decreasing order, number 2 has an error of 6.66%, number 3 has 4.16% and number 1 has an FRR of 2.17%. These errors are mainly due to noise in the car plate images, while numbers zero, five, seven, eight, and nine have no error at all during testing. The second result reported in this research is obtained by calculating the recognition rate and FRR for each car plate number, which may consist of 4, 5 or 6 numerals. Figure 17 shows a bar chart with each of the 40 images on the x-axis and its accuracy in percent on the y-axis. Here, the overall accuracy is 97%, which is the average recognition rate over the 40 car plate numbers. It is clear in figure 17 that images 7, 17, 18, 19, 20, 22 and 23 have an accuracy of 83% because one of the numbers is not recognized correctly due to the presence of noise in the input image. This rate (83%) is calculated as follows, for a plate with six numerals of which five are recognized correctly:

$$\frac{5}{6} \times 100\% \approx 83\%$$

The proposed method requires no training dataset to match against. Instead, it works by extracting facts through geometric feature extraction, and these facts are applied to a set of if-else rules in the manner of an expert system.
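The arithmetic behind the reported rates can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes that the seven plates at 83% each contain six numerals with one error, and that the remaining 33 plates are recognized without error.

```python
def frr_percent(queried, correct):
    """False Rejection Rate in percent for one numeral type."""
    return 100.0 * (queried - correct) / queried

def plate_accuracy(digits_on_plate, digits_correct):
    """Recognition rate in percent for a single car plate."""
    return 100.0 * digits_correct / digits_on_plate

# Numeral 8: 25 occurrences, all recognized correctly.
frr_eight = frr_percent(25, 25)

# Assumed split of the 40 plates: 33 fully correct, 7 with five of six correct.
plate_scores = [100.0] * 33 + [plate_accuracy(6, 5)] * 7
overall = sum(plate_scores) / len(plate_scores)
```

Under these assumptions, the per-plate rate rounds to 83% and the average over the 40 plates rounds to 97%, in line with the figures quoted above.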

VI. SUMMARY
An effective algorithm for offline recognition of printed Arabic numbers is proposed in this research. Several preprocessing operations are initially applied to the input image, such as conversion to a binary image, noise removal, morphological filtering, and segmentation, before the data enters the recognition system. The proposed approach is based on extraction of both local features, such as the number of flips in only the upper part of the object, and global geometric features, such as the overall aspect ratio, width, height and orientation of the number object. The features are further quantified into a set of facts or conditions that are used for classification based on a set of rules, as in an expert system. The experiment has been conducted on 226 random numbers collected from 40 Iraqi car plate numbers. The output showed that the recognition error rate in terms of False Rejection Rate (FRR) is 3%, or equivalently that the overall accuracy is 97%. However, this algorithm is not robust against object translation and rotation.
Finally, investigating and improving the use of the proposed algorithm for handwritten rather than printed Arabic number recognition is considered as future work.

Fig. 1. Flowchart of the proposed offline print-written number recognition system

Fig. 6. Scanning the Arabic number five to count the number of flips

Fig. 7. Scanning the Arabic number nine to count the number of flips

Fig. 9. Three scanning positions of the Arabic number seven

Fig. 10. Two scanning positions of the Arabic number three

Fig. 15. The Arabic number six

Fig. 17. The 40 car plate numbers on the x-axis with their corresponding accuracies on the y-axis

TABLE I. Character number types, computer system outputs, FRR and successful accuracy in percentage