Hybrid of Rough Neural Networks for Arabic/Farsi Handwriting Recognition

Handwritten character recognition is one of the most active research areas in the field of pattern recognition. In this paper, a hybrid rough neural network model is developed for recognizing isolated Arabic/Farsi handwritten digits. Using rough sets and dissimilarity analysis, it addresses two known problems of neural networks: proneness to overfitting and the empirical nature of model development. Moreover, perturbation in the input data is handled by rough neurons. The technique is an evolutionary rough neural network approach that involves hierarchical feature extraction, data clustering and classification. A comparative study against a conventional neural network is presented, and the details and limitations of the approach are discussed.


I. INTRODUCTION
There are various ways in which computers take input directly from the human information system, such as Optical Character Recognition (OCR), speech recognition and symbolic (icons, windows) interactive communication. Such systems are difficult to design and do not provide completely error-free operation. A general solution for speech or character recognition is very complex: even the fastest computers need much more computational time than the human response time to perform the same job. To obtain a working solution for this class of problems, domain-specific solutions should be much more efficient [9]. Combining different techniques is also needed, so that each compensates for the shortcomings of the others. In the handwritten character recognition problem, the domain is reduced to a small subset of characters written in a specific style. This subset is then further classified into smaller subsets, where each group represents a character [6,9].
According to previous research [6,15], the complexity of handwritten character recognition is greatly increased by noise. It is also affected by the almost infinite variability of handwriting, which results from the writer and the nature of the writing. These characteristics make handwritten recognition more complex and difficult than typewritten recognition. Thus, a good and effective tool for dealing with vagueness and uncertainty of information is needed. In this paper, Rough Sets [10] are used in the pre-processing stage to deal with inconsistency and to reduce noise in the handwritten characters, since Rough Sets can handle missing feature values and inconsistency in the pattern [10,12]. Moreover, through the concept of a reduct, rough sets are able to capture the most essential part of the knowledge with a minimum number of features. The feature (quality) representation yielded by rough sets dominates in typewritten character recognition, and it is well suited to all data analysis methods. Because of the nature of handwriting, however, the perturbation in the specific input data should be measured and treated in a pairwise representation, so a cluster-based technique is needed. The quality representation can be translated into a pairwise representation using a rough clustering method and a problem-dependent similarity measure. The main goal of this translation is that the pairwise representation captures the structure that is captured by the quality data [1,11]. When quality data are transferred into a pairwise representation, the interpretation of the significance of an individual feature is lost; it is only implicitly embedded in the resulting dissimilarity measure.
Since real-life data sets are relatively erroneous, there is an increasing need for effective tools that are able to deal with non-linear problems. Incorporating architectural changes also improves the accuracy of approximation. So far, non-linear problems have been dealt with using Artificial Neural Networks (ANN) [15]. ANN has been used for pattern recognition because of its parallel processing and a certain degree of fault tolerance [9,15]. However, this method also has weaknesses: as the number of dimensions increases, the learning time soars and the network easily sinks into a local minimum [4].
In order to optimize the ANN structure and improve the learning efficiency, a hybrid model based on Rough Sets and a Rough Neural Network (RS-RNN) is proposed. A drawback of the rough neural network is that input neurons with zero activation energy negatively affect processing time and space, so a rough set reduction algorithm is essential for removing these superfluous neurons. Moreover, the perturbation in the sample data set is described by a lower approximation and an upper approximation, which introduces the idea of the rough neuron [5]. The method uses Rough Sets as a front-end processing system for the ANN in order to simplify its structure and to reduce the number of attributes and the number of samples. A practical method with theoretical support is thus provided for establishing the Arabic/Farsi handwriting recognition system.
Although a conventional neural network has a good ability to detect all possible interactions between predictor variables, and multiple training algorithms are available for it, it still suffers from problems such as proneness to overfitting and the empirical nature of model development. Combining it with Rough Sets addresses these problems and enhances its performance by discovering and removing input neurons with zero weights. In addition, the rough neuron treats the problem of data perturbation. A comparative study against ANN is presented. The experiments use a database of handwritten Arabic/Farsi samples, IFHCDB (Isolated Farsi Handwritten Character Database), which was created at Amirkabir University of Technology (AUT), together with isolated typewritten characters represented by 10 × 10 pixels [14].
This paper is organized as follows. Section 2 gives a brief introduction to the underlying fields (rough sets and rough neural networks). Section 3 describes the fundamentals of the RS-RNN method and its performance. Section 4 examines the application and guides the user in using it. Section 5 concludes with the purpose of the paper and its results.

II. PRELIMINARIES
This section briefly reviews the basic notions of rough sets and rough neurons used in this paper; detailed definitions can be found in [1,3,10,12].

A. Rough Sets theory
Rough set theory is a new mathematical approach to imperfect knowledge [10,12]. The principal notion of Rough Sets is that lowering the precision of data representation makes it possible to uncover patterns in the data that may otherwise be obscured by too many details. At the basis of Rough Sets theory is the analysis of the limits of discernibility of subsets X of objects from the universe of discourse U.
Let U be a set of objects (the universe of discourse) and A be a set of attributes. A decision system is an attribute-value table DT in which one attribute, the decision attribute d, is distinguished. The aim of Rough Sets is to obtain irreducible but essential parts of the knowledge encoded by the given information system; these constitute the reducts of the system. So one is, in effect, looking for minimal sets of attributes taken from the initial set (A, say) which induce the same partition on the domain as A. In other words, the essence of the information remains intact, and superfluous attributes are removed. Reducts have been nicely characterized in [12] by discernibility matrices and discernibility functions.
A principal task in the method of rule generation is to compute reducts relative to a particular kind of information system, the decision system. Relative reducts and d-discernibility matrices are used for this purpose [13]. … for each object is the lower approximation of X with respect to $I_i$. The coefficient … = 1 − … is called the inconsistency degree of DT [13].
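The lower and upper approximations referred to throughout this section can be illustrated with a minimal Python sketch. It assumes the standard rough set definitions: objects with identical values on the chosen attributes form an indiscernibility class; the lower approximation of X collects the classes fully contained in X, and the upper approximation collects the classes that intersect X. The function name `approximations` and the toy table are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target` as sets of object ids.

    objects:    dict mapping object id -> dict of attribute values
    attributes: attribute names used to build the indiscernibility classes
    target:     set of object ids (the concept X)
    """
    # Group objects that share the same values on all chosen attributes.
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # class lies entirely inside X
            lower |= block
        if block & target:       # class overlaps X
            upper |= block
    return lower, upper

# Hypothetical toy table: x1 and x2 are indiscernible on (a, b).
objects = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 0},
    "x3": {"a": 0, "b": 1},
    "x4": {"a": 0, "b": 0},
}
X = {"x1", "x3"}
lower, upper = approximations(objects, ["a", "b"], X)
```

In this toy case x3 belongs to X with certainty, whereas x1 cannot be separated from x2, so both fall in the boundary region between the two approximations.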

B. Rough Neuron
The rough neuron [5,16] was developed with the aim of classifying a set of objects into three parts based on a given condition, i.e. into the lower, the upper and the negative regions. Rough neural networks [5] consist of both conventional and rough neurons connected in a fully connected fashion. A rough neuron consists of two individual neurons, called the upper bound neuron and the lower bound neuron, which share information as demonstrated in Figure 1. The lower bound neuron deals only with the definite or certain part of the input data and generates an output signal called the lower boundary signal. The upper bound neuron processes only that part of the input data which lies in the upper boundary region, evaluated on the basis of rough set concepts, and generates an output called the upper boundary signal. This interpretation of the upper and lower boundary regions is limited to the learning or training stage of the neural network.

III. HANDWRITING RECOGNITION BASED ON ROUGH NEURON
Arabic/Farsi handwriting recognition is widely accepted as a means of document authentication, authorization and personal verification in the modern societies of the Middle East. For legal purposes, most documents, such as bank cheques, passports and academic certificates, need authorized real-time handwritten verification. Thus there is a need for automatic verification, despite the difficulties faced in the visual assessment of different types and different fonts [9].
In order to recognize handwritten characters, the position of the path of a drawing tool on a writing surface has to be digitized. A primary requirement of a handwriting recognition system is to allow handwriting anywhere on the writing surface and characters of any size.
The program requires the user to draw one or more copies of each character. As characters are drawn, a bounding rectangle is calculated for each character. A hand-drawn character can be of any size because the bounding rectangle is used to normalize the image of the character to fit into a small two-dimensional grid that is used as input to a neural network [15]. As a result of normalization, missing values and noise appear. Moreover, inconsistency among different patterns drawn by the same user has to be discovered. Thus, a good and effective tool for dealing with vagueness and uncertainty of information is needed to extract the local features from a pattern, together with a classifier that treats the inconsistency and the perturbation among patterns. Hence, Rough Sets [18] are used for noise reduction and for discovering the most admissible local features, the core features. Also, using the rough sets methodology of dissimilarity analysis [17], the differences among patterns are localized, i.e. the architecture of the neural network for the whole set of patterns is designed. Based on the dissimilarity measure, the locations where the data are perturbed are identified, and these define the positions of the rough neurons. As a result, the optimal architecture of the rough neural network is determined: superfluous neurons are removed by the rough sets data reduction, and the rough neurons in the input pattern are located by the dissimilarity analysis. The outputs of a rough neuron r are then calculated from equations (4) and (5). Finally, the conventional neurons in the input pattern are placed in correspondence with the remaining local features, which are without perturbation. The output of the conventional neuron is a function of the combined outputs of the rough neuron, as in equation (7):

$$O = \frac{O_{\overline{r}} - O_{\underline{r}}}{\operatorname{average}(O_{\overline{r}},\, O_{\underline{r}})} \qquad (7)$$

where $O_{\overline{r}}$ and $O_{\underline{r}}$ denote the outputs of the upper and lower neurons. This function takes the difference between the outputs of the upper and lower neurons and normalizes it by the average of these two outputs. As a result, each input grid point (local feature) represents the value of an input neuron in the neural network.
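The behaviour of a single rough neuron described above can be sketched in a few lines of Python. The sketch assumes a standard logistic sigmoid as the transfer function, takes the upper output as the maximum and the lower output as the minimum of the two transferred inputs, and then normalizes their difference by their average. The function names and the sample inputs are hypothetical.

```python
import math

def sigmoid(u):
    """Standard logistic transfer function (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-u))

def rough_neuron_outputs(input_upper, input_lower):
    """Outputs of the upper and lower neurons: max and min of the transferred inputs."""
    a, b = sigmoid(input_upper), sigmoid(input_lower)
    return max(a, b), min(a, b)

def combined_output(out_upper, out_lower):
    """Difference of the two outputs normalized by their average."""
    average = (out_upper + out_lower) / 2.0
    return (out_upper - out_lower) / average

# Hypothetical net inputs of the upper and lower neurons.
upper, lower = rough_neuron_outputs(0.2, 1.5)
spread = combined_output(upper, lower)
```

Because a logistic output is always positive, the average in the denominator never vanishes; a combined value of zero means that the two bounds coincide, i.e. the neuron behaves like a conventional one.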
In the training stage, hand-drawn characters can be of any size, but they are down-sampled to a grid of a specific size. The program must determine when the user has finished drawing an individual character. This is done by recording the current time in milliseconds whenever a mouse-down movement is recorded. When no mouse-down movement has occurred for 400 milliseconds, it is assumed that the training character is no longer being drawn.
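A minimal sketch of the acquisition steps described above is given below: the drawn points are normalized by their bounding rectangle and mapped onto a small binary grid, and a character is considered finished after 400 milliseconds without drawing activity. The function names, the default 10 × 10 grid and the sample stroke are assumptions made for illustration; the original program may differ in all of these details.

```python
def downsample(points, grid_size=10):
    """Map drawn (x, y) points into a grid_size x grid_size binary grid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left, top = min(xs), min(ys)
    # Bounding rectangle; guard against a zero-width or zero-height drawing.
    width = max(max(xs) - left, 1)
    height = max(max(ys) - top, 1)
    grid = [[0] * grid_size for _ in range(grid_size)]
    for x, y in points:
        col = min(int((x - left) * grid_size / width), grid_size - 1)
        row = min(int((y - top) * grid_size / height), grid_size - 1)
        grid[row][col] = 1
    return grid

def character_finished(last_draw_ms, now_ms, timeout_ms=400):
    """True when no drawing movement has been recorded for timeout_ms."""
    return now_ms - last_draw_ms >= timeout_ms

# Hypothetical diagonal stroke drawn anywhere on the surface, at any size.
stroke = [(100 + t, 50 + 2 * t) for t in range(41)]
grid = downsample(stroke, grid_size=10)
```

The stroke above is twice as tall as it is wide, yet after normalization it fills the diagonal of the grid, which is the size independence the text asks for.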

A. Rough Sets in the Pre-Processing
Hand-drawn characters can be of any size, but they are down-sampled to a grid of a specific size. In this stage, we concentrate on Arabic/Farsi digits, which can be represented by eleven features taken from the typewritten digits, as demonstrated in Figure 2.
Some of the attributes that represent Arabic/Farsi digits are superfluous. The superfluous attributes can be removed using the rough set reduction algorithm (based on the discernibility matrix); the result is then clustered and applied to the rough neural network.

Fig. 2. Template For Arabic/Farsi Digits

We now use rough sets for data analysis. Rough Sets theory provides tools for expressing inexact dependencies within data. The Minimum Description Length Principle (MDL principle) explains why rough sets are used to reduce the input features of the data. It states, generally speaking, that rules of the simplest construction that preserve consistency with the data are likely to classify so far unseen objects with the lowest risk of error. Therefore, to classify more objects with high accuracy, features that are a source of redundant information should be neglected, i.e. a reduct of the attributes should be used. A reduct is a subset of attributes such that it is enough to consider only the features that belong to this subset and still have the same amount of information. An attribute a belongs to the core of the attribute set C if and only if there exist two objects that have the same value for each attribute from C except a. This statement may be expressed by means of the matrix elements $c_{ij}$, as given in Figure 3.
The input data to the model are quantized first, i.e. the features that define the problem are identified and labeled. If the input data $u_i$ are given, one has to divide the data into distinct sets and introduce new logical input variables $s_k$ such that: … The "-Min" test performs an operation that is analogous to checking for prime implicants of a Boolean function. The returned value is true if the argument R does not contain redundant attributes. With the reduced set of attributes, the robustness of the neural network increases, according to the following theorem [2].
Theorem 1: Let $F(u_1, u_2, \ldots, u_n)$ be an arbitrary linearly separable function and let π be the hyperplane separating its vertices. Then, when the dimensionality is decreased from n to n−1, the distance of the vertices from π in n−1 dimensions cannot decrease.

Thus:
Corollary 2: Decreasing the number of input neurons increases pattern robustness and reduces tolerance.
Proof: It is proved in [2].
At present, the reduction algorithm focuses on reducing attributes and aims at obtaining the best attribute reduction. Potential knowledge contained in the data is always targeted when the database is analysed. The complexity of the information system can be reduced by attribute reduction; however, not all attribute values of each rule are necessary in the reduced information table, so a dissimilarity analysis among different objects is needed [10].

Fig. 3.
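The discernibility-matrix reduction used in this subsection can be sketched as follows. The sketch treats every object as its own class, as with one typewritten template per digit, builds the matrix of attributes on which each pair of objects differs, reads the core from the singleton entries, and finds the smallest attribute subsets that still discern every pair by brute force. The function names and the three-attribute toy table are hypothetical; the search is exponential and is only meant for a handful of attributes.

```python
from itertools import combinations

def discernibility_matrix(rows, attributes):
    """Map each object pair (i, j) to the set of attributes on which they differ."""
    return {
        (i, j): {a for a in attributes if rows[i][a] != rows[j][a]}
        for i, j in combinations(range(len(rows)), 2)
    }

def core(matrix):
    """Attributes that are the only difference between some pair of objects."""
    return {next(iter(entry)) for entry in matrix.values() if len(entry) == 1}

def discerns_all(subset, matrix):
    """True if `subset` shares an attribute with every non-empty matrix entry."""
    return all(entry & subset for entry in matrix.values() if entry)

def minimal_reducts(matrix, attributes):
    """Smallest attribute subsets that discern all pairs (brute force)."""
    for size in range(1, len(attributes) + 1):
        found = [set(c) for c in combinations(attributes, size)
                 if discerns_all(set(c), matrix)]
        if found:
            return found
    return []

# Hypothetical Boolean attribute table with four patterns.
attributes = ["a", "b", "c"]
rows = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 0, "b": 1, "c": 0},
]
matrix = discernibility_matrix(rows, attributes)
core_attributes = core(matrix)
reducts = minimal_reducts(matrix, attributes)
```

In the toy table attribute a is in the core because two pairs of patterns differ in a alone, while b and c are interchangeable, so either may be dropped without losing the ability to tell the patterns apart.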

B. Rough Sets in Pattern Clustering
Since handwriting differs from one writer to another, the challenge is to discover the pattern inherent among different patterns. Because changes in general lead to uncertainty, appropriate approaches to uncertainty modelling are considered in order to capture, model and predict the phenomena of interest in a dynamic environment. As a consequence, the combination of dynamic data mining and soft computing is very promising. Rough clustering [3] detects such changing data structures and assigns each pixel to its correct cluster.
The appropriate clusters are determined in accordance with the features retained by rough sets, i.e. the reduct set. A new data point h in pattern i is considered to be located in cluster k if it is near to the existing cluster center $\upsilon_k$, which is defined by the following inequality:

$$d_{hk} \le \frac{1}{2} \min_{j \ne k} d(\upsilon_j, \upsilon_k) \qquad (9)$$

where $d(\upsilon_j, \upsilon_k)$ is the distance between adjacent cluster centers $\upsilon_j$ and $\upsilon_k$, and $d_{hk}$ is the distance between the pixel data h and the cluster center $\upsilon_k$.
If the new data h in pattern i do not fit well into the existing clusters, i.e. the pixels are far away from the current cluster centers $\upsilon_k$, new clusters should be formed. When only a few new pixels are discovered far away from the existing cluster centers, they may be noise that should be removed from the pattern. The number of noise pixels $n_i$ that are far away from the existing cluster centers should be less than the average number of pixels in the lower approximations of the existing clusters, weighted by a multiplier $\lambda$:

$$n_i < \lambda\, \frac{N_i}{K_i} \qquad (10)$$

where $N_i$ is the total number of pixels in the lower approximations (those satisfying the strict inequality of equation (9)) in pattern i, and $K_i$ is the total number of clusters in pattern i.
This criterion requires the setting of the multiplier $\lambda$. The smaller $\lambda$ is, the smaller the number of noise pixels and thus the greater the number of pixels needed to establish a new cluster.
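The two clustering criteria above can be sketched in Python as follows. A pixel is assigned to a cluster when its distance to that cluster center is at most half the distance from that center to its nearest neighbouring center; pixels that fit no cluster are treated as noise only while their number stays below the multiplier times the average size of the lower approximations. Euclidean distance, the function names and the sample values are assumptions of this sketch, and at least two cluster centers are required.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2-D points (assumed metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def assign_pixel(pixel, centers):
    """Index of the first cluster whose center satisfies the half-distance rule, else None."""
    for k, center in enumerate(centers):
        nearest_other = min(dist(center, other)
                            for j, other in enumerate(centers) if j != k)
        if dist(pixel, center) <= 0.5 * nearest_other:
            return k
    return None

def is_noise(num_far_pixels, lower_counts, multiplier):
    """True if the far pixels are fewer than multiplier * average lower-approximation size."""
    average = sum(lower_counts) / len(lower_counts)
    return num_far_pixels < multiplier * average

# Hypothetical cluster centers and pixels.
centers = [(0.0, 0.0), (10.0, 0.0)]
near = assign_pixel((2.0, 1.0), centers)
far = assign_pixel((5.0, 8.0), centers)
```

With lower approximations of 20 and 30 pixels and a multiplier of 0.2, up to four stray pixels would be discarded as noise, while six or more would call for a new cluster.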

C. Dissimilarity in Data Analysis
Without knowledge of the domain, and specifically of the data set description, finding an appropriate weighting that gives a reasonable result would be computationally expensive. Rough sets are able to measure the dissimilarity between records of Boolean values and to compute the knowledge that represents each pattern: attributes that have the same value for all records are disregarded and are measured as zero distance, whereas attributes with different values are considered dissimilar. There are many dissimilarity and similarity measures that can be used for comparing objects in the space of measured variables [7,11].
To understand the descriptor based on the centroid distance function, it is first essential to understand how the centroid is computed. The formulas in this section are taken from [19]. The position of the centroid, the center of gravity of pattern i, is fixed in relation to the shape in cluster k. The centroid can be calculated by taking the average of all the points that lie inside a cluster. Under the assumption that the shape in cluster k is simply connected, the centroid can be computed using only the boundary points. Let N be the total number of points on the border of cluster k, with $n \in [0, N-1]$. The x and y coordinates of the centroid, denoted by $g_x^k$ and $g_y^k$ respectively, are given by:

$$g_x^k = \frac{1}{6A}\sum_{n=0}^{N-1}(x_n + x_{n+1})(x_n y_{n+1} - x_{n+1} y_n), \qquad g_y^k = \frac{1}{6A}\sum_{n=0}^{N-1}(y_n + y_{n+1})(x_n y_{n+1} - x_{n+1} y_n) \qquad (11)$$

Here $(x_n, y_n)$ are the boundary points, taken in counter-clockwise order with $(x_N, y_N) = (x_0, y_0)$, and the area of the shape, A, is given by the following equation:

$$A = \frac{1}{2}\sum_{n=0}^{N-1}(x_n y_{n+1} - x_{n+1} y_n) \qquad (12)$$

The centroid distance function expresses the distances of the boundary points from the centroid $(g_x^k, g_y^k)$ of a cluster with center $\upsilon_k = (x_{\upsilon_k}, y_{\upsilon_k})$; it is given by equation (13). For each pattern i, $d_k^i$ represents the perturbation of the current data in cluster k, taking into account the cluster significance, where $\gamma_k$ is the inconsistency coefficient of cluster k. The level of significance (tolerance) for cluster k at pattern i is given by equation (14), where $d(\upsilon_j, \upsilon_k)$ is the distance between two adjacent cluster centers $\upsilon_j$ and $\upsilon_k$. Hence the accepted variations captured in each pattern i form interval-valued features, as expressed by equation (15). A superfluous neuron, identified by the rough sets reduction, is therefore considered to have zero input, and the others are considered to be rough neurons. If $d_k^i$ is approximately zero, the corresponding neuron is a conventional neuron with input equal to one; otherwise the inputs of the lower and upper neurons are determined by the interval bounds of equation (15).
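The centroid computation from boundary points can be sketched with the standard polygon (shoelace) formulas, which are assumed here to be the ones intended by the text. The sketch returns the centroid and the signed area of a simple polygon and then the distance of each boundary point from the centroid. The function names and the sample rectangle are hypothetical, and the weighting by the inconsistency coefficient is not included.

```python
import math

def polygon_centroid(boundary):
    """Centroid (gx, gy) and signed area of a simple polygon given as ordered (x, y) points."""
    n = len(boundary)
    cross_sum = cx = cy = 0.0
    for i in range(n):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]   # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        cross_sum += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = cross_sum / 2.0               # positive for counter-clockwise order
    return cx / (6.0 * area), cy / (6.0 * area), area

def centroid_distances(boundary):
    """Distance of every boundary point from the polygon centroid."""
    gx, gy, _ = polygon_centroid(boundary)
    return [math.hypot(x - gx, y - gy) for x, y in boundary]

# Hypothetical cluster boundary: a 4 x 2 rectangle in counter-clockwise order.
rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
gx, gy, area = polygon_centroid(rectangle)
distances = centroid_distances(rectangle)
```

Using the signed area in the denominator makes the centroid independent of the orientation of the boundary, since the sign cancels between numerator and denominator.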

D. Adapting the Rough Neural Network
The proposed system uses a back propagation (BP) rough neural network (RNN) for the classification process, in which the captured characters are learned by the rough neural network using the back propagation technique. Back propagation, or propagation of error, is a common method of teaching artificial neural networks how to perform a given task [4,8].
The structure of the RNN consists of an input layer, a hidden layer and an output layer [5]. Neurons in the input layer are rough neurons. The number of nodes in the input layer depends on the dimensionality of the feature vector: it is equal to the number of clusters corresponding to each digit pattern. Each input neuron is given its value by equation (15). The hidden layer consists of conventional neurons, and the number of neurons in the hidden layer is approximately double the input layer size. The input of a hidden layer neuron is given by equation (7). The output layer consists of four crisp neurons that give the binary representation corresponding to each Arabic/Farsi digit.
Back propagation is a supervised learning method and an implementation of the Delta rule. It requires a teacher that knows, or can calculate, the desired output for any given input. It is most useful for feed-forward networks (networks that have no feedback, or simply, no connections that loop). The term is an abbreviation for "backwards propagation of errors". Back propagation requires that the activation function used by the artificial neurons (or "nodes") is differentiable. A generalized sigmoid function, equation (16), has been chosen in order to accommodate any non-linearity during the modeling process; its argument is the input applied to the node, and it has three node parameters. Further, the input to a succeeding layer is a linear combination of the outputs of all neurons belonging to the preceding layer, based on the weighted connections between the respective neurons. In the learning phase, the algorithm changes the weights until a specified number of epochs is reached or the error becomes zero. Each new weight is adjusted using the first-order derivative of equation (16) and the learning constant.
The learning rate is an important parameter in the learning process. It always lies between 0 and 1. In a rough neural network the training goes in two tiers, i.e. two parallel processes run through the network during parameter approximation, one through the lower neuron and the other through the upper neuron [20]. The learning rates of the individual neurons are different; they are generally time-varying and decrease with the number of iterations. The learning rate of the lower neuron is expected to be higher than that of the upper neuron, as it carries significant information that helps in discerning the patterns. By the properties of the sigmoid function, the error term is expressed in terms of the actual output and the desired output of the neuron.
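The weight adjustment described above can be illustrated for a single output neuron. The sketch below assumes a standard logistic sigmoid instead of the generalized sigmoid of equation (16), whose parameters are not reproduced here, and a fixed learning rate; the function names and the sample values are hypothetical. Each step moves the weights along the negative gradient of the squared error, which is the Delta rule for a differentiable activation.

```python
import math

def sigmoid(u):
    """Standard logistic function (stand-in for the generalized sigmoid)."""
    return 1.0 / (1.0 + math.exp(-u))

def delta_rule_step(weights, inputs, desired, learning_rate):
    """One gradient step for a single sigmoid neuron; returns (new_weights, actual_output)."""
    net = sum(w * x for w, x in zip(weights, inputs))
    actual = sigmoid(net)
    # For the logistic function, f'(net) = f(net) * (1 - f(net)).
    delta = (desired - actual) * actual * (1.0 - actual)
    new_weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, actual

# Hypothetical training of one neuron on one pattern.
weights = [0.0, 0.0]
errors = []
for epoch in range(200):
    weights, actual = delta_rule_step(weights, [1.0, 0.5],
                                      desired=0.9, learning_rate=0.5)
    errors.append(abs(0.9 - actual))
```

In a rough neural network the same kind of step would be applied separately to the lower and upper neurons, each with its own learning rate, as the text describes.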
The neural network space N is defined together with a mapping that transforms one neural network in N into another neural network in N. The main task is to find the connection between the way in which the reduction is made and the characteristics of the network that is constructed after reduction. The weights of the connections between the input units (rough or conventional neurons) and the hidden units are denoted by $w_i^h$, i = 1, 2, …, n, h = 1, 2, …, H, and the weights of the connections between the hidden units and the output units by $v_h^o$, h = 1, 2, …, H, o = 1, 2, …, O, where H is the number of hidden units, n is the dimensionality of the input pattern and O is the number of output units.
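The output of the network is given by [4,8,20]

$$y^o = f\left(\sum_{h=1}^{H} v_h^o\, f\left(\sum_{i=1}^{n} w_i^h x_i\right)\right) \qquad (20)$$

where f is a sigmoid function, $x_i$ is the i-th input, $w_i^h$ is the weight of the connection between the i-th input unit and the h-th hidden unit, and $v_h^o$ is the weight of the connection between the h-th hidden unit and the o-th output unit. Since our goal is to find and eliminate as many unneeded network neurons as possible, it is important to identify the effect that removing connections of neurons has on the output of the network.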
Let the network output be considered as a function of the single variable w corresponding to the connection between the i-th input unit and the h-th hidden unit. The derivative of the output with respect to this weight is computed first. By the Mean Value Theorem, the output at w is then related to the output at w equal to zero through this derivative evaluated at an intermediate point, whose parameter lies strictly between 0 and 1. The activation function is taken to be a sigmoid function with values in the interval [-0.5, 0.5]. From equations (24) and (25) it follows that inequality (26) holds; it gives an upper bound on the change in the output of the network when the weight between the i-th input unit and the h-th hidden unit is eliminated. Since the quantity involved lies in [0, 1], the bound simplifies to equation (27). Similarly, by considering the output as a function of the single variable that corresponds to the connection between the h-th hidden unit and the o-th output unit, the change in the output of the network after this weight is eliminated is bounded by equation (29). Equations (27) and (29) give the maximum error that occurs in the network if a connection is eliminated from any layer of the model. Hence, a neuron that does not exist in the reduct set is removed, together with its connections, if the errors calculated by equations (27) and (29) are less than some threshold. A structure adaptation is performed in each iteration of network learning. The learning process finishes when the network no longer yields significantly better classification results.
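The Mean Value Theorem argument above can be written out under explicit assumptions. The derivation below is a sketch, not a restatement of the numbered inequalities: it assumes the shifted logistic activation $f(u) = 1/(1+e^{-u}) - 1/2$, so that $|f| \le 1/2$ and $0 < f' \le 1/4$, inputs $x_i \in [0,1]$, and the notation $y^o$ for the network output, $w_i^h$ for an input-to-hidden weight and $v_h^o$ for a hidden-to-output weight. With a different sigmoid the constants change, so they may differ from those of equations (27) and (29).

```latex
% Output of the network for one output unit o (net^o and net_h are the net inputs):
%   y^o = f(net^o),  net^o = \sum_h v_h^o f(net_h),  net_h = \sum_i w_i^h x_i

% Eliminating an input-to-hidden weight w_i^h:
\frac{\partial y^o}{\partial w_i^h}
  = f'(\mathrm{net}^o)\, v_h^o\, f'(\mathrm{net}_h)\, x_i ,
\qquad
\left| \frac{\partial y^o}{\partial w_i^h} \right|
  \le \frac{1}{4}\,|v_h^o|\,\frac{1}{4}\,|x_i|
  = \frac{|v_h^o|\,|x_i|}{16}

% Mean Value Theorem between w = 0 and w = w_i^h, with 0 < \theta < 1:
y^o(w_i^h) - y^o(0)
  = w_i^h \left. \frac{\partial y^o}{\partial w} \right|_{w = \theta w_i^h}
\;\Longrightarrow\;
\left| y^o(w_i^h) - y^o(0) \right|
  \le \frac{|w_i^h\, v_h^o|\,|x_i|}{16}
  \le \frac{|w_i^h\, v_h^o|}{16}
  \quad (x_i \in [0,1])

% Eliminating a hidden-to-output weight v_h^o:
\frac{\partial y^o}{\partial v_h^o}
  = f'(\mathrm{net}^o)\, f(\mathrm{net}_h),
\qquad
\left| y^o(v_h^o) - y^o(0) \right|
  \le \frac{1}{4}\cdot\frac{1}{2}\,|v_h^o|
  = \frac{|v_h^o|}{8}
```

Under these assumptions a connection is a candidate for removal when the corresponding bound falls below the chosen threshold, which is the pruning test described in the text.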

IV. TESTING HANDWRITING RECOGNITION
This section provides an insight into the results of the method described in the earlier sections, and may also be regarded as a place to give a reasonable justification for certain trends. The system has two phases, a learning phase and a test phase. The learning phase has two stages: the first is applied to a typewritten data set, the second to the handwritten data set. First, the Arabic/Farsi typewritten digits are clustered in accordance with eleven features, as demonstrated earlier in Figure 2, and can be represented by the attribute-value table in Table 1.

Table 1: Data Attribute Table For Arabic/Farsi Digits
a b c d e f g h i j k
0 0 0 0 0 0 0 0 0 0 1
1 1 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 1 1 0 0 0
1 1 0 0 0 1 0 0 1 1 1
1 1 1 0 0 0 0 1 1 0 0
1 1 1 1 1 1 1 1 0 0 0
0 0 0 0 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0 0 0 0
1 1 0 0 1 1 1 1 0 0 0
1 0 0 0 1 1 1 1 1 0 1

Using the rough set discernibility matrix, these attributes are reduced, in which {…} is the most significant attribute associated with the reduct set {…, …, …, h, …}, where three of the associated coefficients equal 0.4, one equals 0.6 and one equals 0.2. By the dissimilarity analysis of rough sets [10], the differences among patterns and the optimal structure of the NN are discovered, as illustrated in Figure 4. The number of nodes in the input layer is chosen in accordance with the reduct set. The zero pattern differs from the three pattern in one attribute, while the zero pattern is characterized by only one attribute. Thus, the three pattern is characterized by two attributes.

Fig. 4. The Dis-Similarity Diagram Among Different Arabic/Farsi Digits

Moreover, the center of gravity calculated by equation (11) for the typewritten data set represents the center of the cluster, $\upsilon_k$. Applying this reduction algorithm to the typewritten data reduces the number of learning epochs as a result of reducing the number of input neurons, as shown in Figure 5, which compares the number of iterations before and after reduction.
To evaluate the proposed hybrid model on isolated Arabic/Farsi handwritten characters, IFHCDB, which was created at Amirkabir University of Technology, has been used. The RS-RNN model is applied to Arabic/Farsi handwritten digits. IFHCDB contains a set of images of Arabic/Farsi digits that are divided into 88 training data and 30 test data. First, rough sets are used for clustering the handwritten data set, where the noise is discovered and removed by equation (10). Second, dissimilarity analysis is applied to compute the perturbation in the input data, by equation (15). Hence the rough neurons are discovered and the optimal architecture is represented by the rough sets' clusters. Finally, the back propagation algorithm is implemented, as shown in Figure 6, with a structure adaptation performed in each learning epoch. The learning process finishes when the error function is at most 0.01. Moreover, a comparison in terms of recognition accuracy between RS-RNN and other approaches [15], such as genetic algorithms, simple object modeling, statistical methods and rough sets with neural networks, is shown in Figure 8. As shown there, the RS-RNN approach recognizes the Arabic/Farsi handwritten digits more efficiently than the others.

V. CONCLUSIONS
Handwritten number recognition, i.e. recognizing isolated digits as standard characters, is a challenging problem because different users have their own handwriting styles. Moreover, it is affected by the noise introduced during acquisition and normalization. The main goal of this paper is to recognize isolated Arabic/Farsi digits that exist in different forms.
This paper presents a hybrid model that starts with acquiring and normalizing an image containing Arabic/Farsi digits. The digitized image is treated by rough sets. Rough sets play an important role in reducing the feature attributes to a reduct and in discovering the dissimilarity among different patterns. Rough sets also segment the user pattern into different clusters in accordance with the Arabic/Farsi digit pattern. Moreover, the noise is eliminated and the dissimilarity between each user cluster and its corresponding feature is measured.
In this way, the optimal architecture of the rough neural network is discovered. Finally, the rough neurons are adapted during the learning phase using the back propagation algorithm.
The results were tested on standard data and demonstrated the efficiency of our method. The approach efficiently chooses a segmentation method to fit our demands, and successfully designs and implements a rough neural network without further demands. RS-RNN is then able to recognize the Arabic/Farsi numbers that were manually written by the users.

Fig. 1. Rough neuron structure

Though the lower and upper neurons each have a conventional sigmoid transfer function, the actual outputs of the upper and lower neurons are given by the following functions:

$$O_{\overline{r}} = \max\left(f(I_{\overline{r}}),\, f(I_{\underline{r}})\right) \qquad (4)$$

$$O_{\underline{r}} = \min\left(f(I_{\overline{r}}),\, f(I_{\underline{r}})\right) \qquad (5)$$

where f is the sigmoid transfer function and $I_{\overline{r}}$, $I_{\underline{r}}$ are the inputs of the upper and lower neurons, defined by equation (6). Training in a rough neural network is similar to training in a conventional neural network [16]. During training, the network uses the inductive learning principle to learn from the training set. In supervised training the desired output of the output neurons is known for the training set, and the weights are modified using the learning equation. The neural network uses the back propagation technique for training. Training with rough back propagation performs gradient descent in weight space on an error function.

Fig. 5. Comparing Number Of Iterations Using Neural Network With Number Of Iterations After Applying Rough Sets

Fig. 6. The Implementation Of The RS-RNN

At the end of the study, the ability of our system is demonstrated by the recognition accuracy results. The errors for each digit pattern obtained from training by the neural network are compared against those obtained from RS-RNN, as demonstrated in Figure 7.

Fig. 7. Number Of Recognized Patterns By NN And RS-RNN

Fig. 8. The accuracy measure among different theories and RS-RNN

If the Decision table has n objects, so by