Zernike Moment Feature Extraction for Handwritten Devanagari (Marathi) Compound Character Recognition

Compound character recognition for the Devanagari script is a challenging task, since the characters are structurally complex and can be modified by combining two or more characters. Compound characters account for 12 to 15% of the characters in Devanagari script. Moment-based techniques have been successfully applied to several image processing problems and are a fundamental tool for generating feature descriptors; the Zernike moment in particular has a rotation invariance property that has been found desirable for handwritten character recognition. This paper discusses the extraction of features from handwritten compound characters using the Zernike moment feature descriptor and proposes an SVM and k-NN based classification system. The proposed system preprocesses 27000 handwritten character images, normalizes them to 30x30 pixels and divides them into zones. Pre-classification produces three classes depending on the presence or absence of a vertical bar. Zernike moment features are then extracted from each zone. The overall recognition rates of the proposed system using the SVM and k-NN classifiers are up to 98.37% and 95.82%, respectively.


I. INTRODUCTION
Handwritten character recognition has been gaining popularity for many years and attracts researchers because of its potential applications, which reduce the cost of human effort and save time. Such application areas include bank automation and postal automation [1]-[3]. Similarly, biometric and criminal identification systems use scanned handwritten script for forensic and Historic Document Analysis (HDA), which represents an excellent study area within the research fields of biometrics and forensic science.
The technical challenge in handwritten character recognition comes from three sources. Symbol: ideal shapes occur in a hierarchy, and symbols are arranged in complex forms at different levels of organization. Deformation: the shape of each symbol varies as it undergoes geometric transformations (translation, rotation, scaling, stretching) and complex representation. Defect: flaws in the image owing to printing, scanning, quantization, binarization, etc. Handwritten and printed characters demand different approaches: handwriting consists of extended strokes, whereas print consists of normally shaped blobs.
Research in handwritten character recognition follows two main approaches, on-line and off-line. In on-line character recognition, sensors capture data during the writing process, which makes the stroke information dynamically available. Off-line character recognition, in contrast, works on static images that are captured or scanned after the writing on paper is complete. Both tasks are challenging for automatic character recognition, but off-line recognition requires more effort for various reasons: large variations in character shape due to pen ink, pen width and device accuracy; stroke size and location; and the effect of the writer's physical and mental state on writing style, which in turn affects recognition accuracy.
Character recognition becomes even more challenging, both on-line and off-line, for Indian language scripts, for several reasons [4]. Indian scripts have character sets with a large number of characters. The shapes of the characters are more complex and may carry modifiers, which can appear above, below or in-line with the character. The modifiers are vowels that change their shapes when they are connected to consonants. The scripts also contain character pairs that look alike and cause difficulty in classification. Some Indian scripts, such as Devanagari and Bangla, have the specific problem of compound characters, where two or more consonants join to form a special character [5], [6].
Research on character recognition of the Devanagari script started in 1970, when Sinha and Mahabala [7] presented a syntactic pattern analysis system for the recognition of Devanagari characters (DC). The first research report on handwritten Devanagari characters (HDC) was published in 1977 by Sethi and Chatterjee [8]. Very little work on OCR was reported in the literature after that until, in the next decade, S. Kumar et al. contributed more to this domain [9]. Extensive research on printed and handwritten Devanagari characters was carried out by Bansal [10]-[12] and Reena et al. [13], [14], respectively. Recognition of characters in different languages using Zernike moments was reported in [9], [15]-[22]. Researchers have proposed chain code histogram and directional gradient based feature extraction in [22]-[24]. In a significant contribution, Arora et al. proposed feature extraction techniques, namely intersection, shadow, chain code histogram and straight line fitting features, in [25]-[28]. Deshpande et al. [29] proposed fine classification and recognition of Devanagari characters. S. Kumar [30] also extracted various features and compared them using SVM and MLP classifiers. Pal et al. proposed an SVM and MQDF based scheme for recognition of Devanagari characters [31]. U. Pal and T. Wakabayashi [32] gave a comparative study of different Devanagari character recognizers that extract features based on curvature and gradient information. Sushama Shelke et al. [33] presented a novel approach for recognition of unconstrained handwritten Marathi characters. Baheti M.J. et al. [34] proposed a method based on Affine Invariant Moments (AIM) for Gujarati numerals using k-NN and PCA classifiers. An elastic matching (EM) technique based on Eigen Deformation (ED) for recognition of handwritten Devanagari characters was proposed by V. Mane et al. [35]. Recognition of handwritten Bangla compound characters was attempted by U. Pal et al. [36] using gradient features. S. Shelke and S. Apte have reported work on handwritten Marathi compound characters using a multi-stage multi-feature classifier [5], [6].
The literature shows that moments can be considered potential features for the recognition of characters and numerals, which motivated us to explore several orthogonal and discrete moment features and to test their efficacy for compound characters. While significant advances have been achieved in recognizing Roman-based scripts such as English, ideographic scripts such as Chinese, Japanese and Korean, and Arabic, only a few works on major Indian scripts such as Devanagari, Bangla, Gurumukhi, Tamil and Telugu are available in the literature [37]-[41].
This paper proposes a novel Zernike moment based feature descriptor followed by SVM and k-NN classifiers for recognition of the basic and compound characters of the Marathi script derived from Devanagari. The paper is organized as follows. Section II deals with the properties of the Devanagari-derived Marathi script. Database design and the proposed system are discussed in Section III. Section IV deals with the Zernike moment based feature extraction technique. Details of the SVM and k-NN approaches used in the character recognition system are elaborated in Section V. The experimental results are discussed in Section VI, and Section VII concludes the paper.

II. PROPERTIES OF DEVANAGARI BASIC AND COMPOUND CHARACTERS
The basic set of symbols of the Devanagari script consists of 12 vowels (swar) and 36 consonants (vyanjan). The alphabet of the modern Devanagari script consists of 14 vowels and 33 consonants, also called basic characters. Devanagari is written from left to right, and the concept of upper and lower case is absent. In this script, a vowel following a consonant takes a modified shape; these modified shapes are called modified characters. A consonant or vowel following a consonant sometimes takes a compound orthographic shape, which we call a compound character. Compound characters can combine two consonants, or a consonant and a vowel. They are joined in various ways: by removing the vertical line of one character and joining it to the other character from the left side, or by placing the characters side by side or one above the other. Examples of compound characters are shown in Fig. 1. A split character is the half form of a basic character that gets connected to other characters; examples of split components of compound characters are shown in Fig. 2. Compounds of three or four characters also exist in the script. There are about 280 compound characters in the Devanagari script [4], [31]. Marathi is written in a script derived from Devanagari and is the official language of Maharashtra. The Marathi script consists of 16 vowels and 36 consonants, making 52 letters. Like Devanagari, it is written from left to right, has no upper and lower case characters, and has nearly the same kind of compound characters. However, compound characters occur at a rate of about 11 to 12% in Marathi, compared with about 5 to 7% in other Devanagari-based scripts [42].

A. Database
At present, no dataset of handwritten compound characters is available for the Marathi script derived from Devanagari. We have therefore created a handwritten compound character dataset for this work and tested our proposed system on it; the dataset itself is a new contribution to the literature. Details of the database are provided in Table I.

The database of handwritten Marathi characters in the Devanagari script was created for the purpose of this work and contains basic characters, compound characters and split components of compound characters. The characters were written on special paper sheets by 250 volunteers of different age groups (between 20 and 40 years old). The sheets were then scanned with a flatbed scanner at 300 dpi. Each character image is 90x90 pixels and is stored in TIFF format.

B. Proposed System
The proposed system aims at recognizing handwritten Marathi Devanagari compound characters. This is done by Zernike moment feature extraction followed by SVM and k-NN classification. Fig. 3 shows the basic block diagram of the proposed recognition system, which consists of the following phases: input of character images, preprocessing, pre-classification of the characters, Zernike moment based feature extraction and character recognition. A brief phase-wise explanation of the recognition system follows.

C. Preprocessing
Preprocessing is performed on the character image to remove noise and to minimize variations in character style. Occasionally the documents were not clean when scanned, which produced small dots in the images; noise generated by shaded areas and dots must be filtered out during preprocessing. Moreover, the characters in the scanned images may be skewed, slanted, or of varying size due to cropping, and these effects are also handled in this step. The typical flow of the preprocessing step is shown in Fig. 4.

1) RGB to Gray Image:
The database contains color character images. In preprocessing, the character images are first converted to grayscale images using the rgb2gray function in MATLAB.
2) Thresholding: This preprocessing step, also termed binarization, converts the pixels above the threshold to white and those below the threshold to black. We set the threshold value Th = 190, which produced good-quality binarized images.
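Steps 1) and 2) can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code: it assumes an image is a list of rows of (R, G, B) tuples in 0..255, uses the luminance weights documented for MATLAB's rgb2gray, and marks ink pixels (gray value below Th = 190) as 1 so that later steps can treat 1 as foreground. The function names are hypothetical.

```python
def rgb_to_gray(rgb_image):
    """Weighted sum 0.2989 R + 0.5870 G + 0.1140 B, rounded to an integer (as rgb2gray does)."""
    return [
        [round(0.2989 * r + 0.5870 * g + 0.1140 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, th=190):
    """Pixels below th (dark ink, 'black') become 1; pixels at or above th ('white') become 0."""
    return [[1 if value < th else 0 for value in row] for row in gray_image]
```

For example, `binarize(rgb_to_gray(page))` turns a scanned color character into a 0/1 image with the ink as foreground.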

3) Filtering: Filtering is applied to remove the noise present in the binarized image. We used a median filter to remove small black spots in the image and the black shading appearing at the edges. Further, the documents were cropped at the edges. The thresholding and filtering steps often resulted in some broken characters; to rejoin them, an image dilation operation was performed on the filtered images.
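A minimal sketch of the filtering step on a binary image (1 = ink), assuming a 3x3 window for both the median filter and the dilation; the paper does not state the window size, and the helper names are hypothetical. On a binary image the 3x3 median behaves like a majority vote, so isolated specks disappear while thicker strokes survive, and dilation then closes one-pixel gaps.

```python
def neighbourhood(img, y, x):
    """Values in the 3x3 window centred on (y, x), clipped at the image border."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def median_filter3(img):
    """3x3 median filter; border windows are smaller, and the upper median is taken."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            window = sorted(neighbourhood(img, y, x))
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def dilate3(img):
    """3x3 dilation: a pixel becomes ink if any pixel in its window is ink."""
    return [[max(neighbourhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]
```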

4) Boundary Tracing:
Boundary tracing identifies the connected components of the characters in the filtered image and stores them in an array. To find the connected components, the algorithm traverses the rows of the filtered image searching for a foreground pixel; when it finds one, it marks and picks that pixel. All neighbours of the found pixel are then marked in all search directions, and this continues until all the pixels of the possible character have been traversed and marked; otherwise, the search continues in the next row. If a picked connected component is smaller than the required minimum size, the algorithm treats it as noise and discards it.
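The row-by-row search for connected components can be sketched as a breadth-first flood fill over 8-connected foreground pixels. This is an illustrative sketch, not the authors' implementation; the minimum component size `min_size` is a hypothetical parameter, since the paper only says that components that are too small are treated as noise.

```python
from collections import deque


def connected_components(img, min_size=2):
    """Return 8-connected components of ink pixels (1) as lists of (y, x);
    components with fewer than min_size pixels are treated as noise and dropped."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):                      # traverse the rows
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Unmarked foreground pixel found: mark it and grow its component.
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):       # mark neighbours in all eight directions
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(pixels) >= min_size:
                components.append(pixels)
    return components
```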

5) Normalization:
During normalization, the slant of each character is removed and the character is resized to a fixed window. Slant is the average deviation of the vertical strokes of the character from the vertical direction. To remove the slant, the character was rotated with imrotate over a range of angles θ, and at each angle the sum of the vertical projection of the transformed character was calculated. The angle with the maximum sum of the vertical projection is taken as the estimated slant angle and is used to perform the final shear transformation on the character.
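A minimal sketch of projection-based slant estimation, assuming a binary image with ink = 1 and row indices growing downward. Two details differ from the text and are assumptions: candidate angles are applied as a shear (x' = x + y tan θ) instead of with imrotate, and each angle is scored by the sum of squared column counts, because under a shear the plain sum of the vertical projection equals the pixel count for every angle. The angle range, step and function names are hypothetical.

```python
import math


def shear_pixels(pixels, angle_deg):
    """Shear (y, x) ink coordinates horizontally: x' = round(x + y * tan(angle))."""
    t = math.tan(math.radians(angle_deg))
    return [(y, round(x + y * t)) for y, x in pixels]


def projection_score(pixels, angle_deg):
    """Sum of squared vertical-projection counts; larger when strokes line up in columns."""
    counts = {}
    for _, x in shear_pixels(pixels, angle_deg):
        counts[x] = counts.get(x, 0) + 1
    return sum(c * c for c in counts.values())


def deslant(img, max_angle=45, step=5):
    """Return (estimated slant angle, sheared image). A positive angle corrects a stroke
    whose column index decreases as the row index increases (a right-leaning stroke)."""
    pixels = [(y, x) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    if not pixels:
        return 0, [row[:] for row in img]
    angle = max(range(-max_angle, max_angle + 1, step),
                key=lambda a: projection_score(pixels, a))
    moved = shear_pixels(pixels, angle)
    min_x = min(x for _, x in moved)
    width = max(x for _, x in moved) - min_x + 1
    out = [[0] * width for _ in img]
    for y, x in moved:
        out[y][x - min_x] = 1
    return angle, out
```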

6) Skeletonization:
In skeletonization, the strokes of the character are thinned to a width of one pixel. We applied the thinning operation to the character, taking care not to break it. These operations were used not only to find the vertical bar and its position in the character, but also to extract the end points and junctions of the character. These features help in the pre-classification of the characters. A sample output of the preprocessed character is shown in Fig. 5.
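The paper does not name its thinning algorithm. As an assumption, the sketch below uses the classical Zhang-Suen method, which repeatedly removes boundary pixels in two sub-iterations while preserving connectivity, so strokes end up one pixel wide. Images are lists of rows with ink = 1; the function name is hypothetical.

```python
def zhang_suen_thin(img):
    """Zhang-Suen thinning of a binary image (1 = ink) to one-pixel-wide strokes."""
    h, w = len(img), len(img[0])
    # Pad with a background border so every pixel has eight neighbours.
    grid = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]

    def neighbours(y, x):
        # P2..P9, clockwise starting from north.
        return [grid[y - 1][x], grid[y - 1][x + 1], grid[y][x + 1], grid[y + 1][x + 1],
                grid[y + 1][x], grid[y + 1][x - 1], grid[y][x - 1], grid[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h + 1):
                for x in range(1, w + 1):
                    if grid[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of ink neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))  # 0->1 transitions
                    p2, _, p4, _, p6, _, p8, _ = p
                    if step == 0:   # remove south-east boundary and north-west corner points
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:           # remove north-west boundary and south-east corner points
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:  # deletions within a sub-iteration are simultaneous
                grid[y][x] = 0
            if to_delete:
                changed = True
    return [row[1:-1] for row in grid[1:-1]]
```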

D. Pre-classification
Character images after the preprocessing stage contain some global and local features. The global features consist of the presence of a vertical line, the position of the vertical bar in the
(IJARAI) International Journal of Advanced Research in Artificial Intelligence, www.ijarai.thesai.org
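The global feature mentioned for pre-classification, the presence and position of a vertical bar, can be sketched as follows. The three labels used here (no bar, bar at the right end, bar elsewhere) are an assumption based on a common Devanagari pre-classification scheme, as are the thresholds `min_ratio` (fraction of the character height that a column's longest ink run must cover) and `end_zone` (relative horizontal position beyond which the bar counts as an end bar). Input is a binary image with ink = 1.

```python
def longest_run(values):
    """Length of the longest run of consecutive ink pixels."""
    best = run = 0
    for v in values:
        run = run + 1 if v else 0
        best = max(best, run)
    return best


def vertical_bar_class(img, min_ratio=0.7, end_zone=0.6):
    """Return 'no_bar', 'end_bar' or 'middle_bar' for a binary character image."""
    rows = [y for y, row in enumerate(img) if any(row)]
    if not rows:
        return "no_bar"
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    height = bottom - top + 1
    # A bar column carries an unbroken ink run over most of the character height.
    bar_cols = [x for x in range(left, right + 1)
                if longest_run(img[y][x] for y in range(top, bottom + 1)) >= min_ratio * height]
    if not bar_cols:
        return "no_bar"
    centre = sum(bar_cols) / len(bar_cols)
    relative = (centre - left) / max(1, right - left)  # 0 = left edge, 1 = right edge
    return "end_bar" if relative >= end_zone else "middle_bar"
```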

IV. ZERNIKE MOMENT BASED FEATURE EXTRACTION
Zernike moments are complex numbers obtained by mapping an image onto a set of two-dimensional complex Zernike polynomials. The magnitude of a Zernike moment is used as a rotation-invariant feature to represent character image patterns [43]. Zernike moments are a class of orthogonal moments and have been shown to be effective for image representation. Because the Zernike polynomials are orthogonal to each other, each moment contributes unique information about the image, independent of the other moments, so the data are represented with no redundancy or overlap of information between the moments [26]. Owing to these characteristics, Zernike moments have been used as feature sets in applications such as pattern recognition [27] and content-based image retrieval [28], and these properties make them a promising choice for extracting features of handwritten compound characters. Teague [16] introduced the use of Zernike moments to overcome the information redundancy of geometric moments; the underlying Zernike polynomials were first proposed by Zernike in 1934 [44]. This moment formulation appears to be one of the most popular, outperforming the alternatives [45] in terms of noise resilience, information redundancy and reconstruction capability. Complex Zernike moments [46] are constructed using a set of complex polynomials that form a complete orthogonal basis defined on the unit disc $x^2 + y^2 \le 1$. The two-dimensional Zernike moment of order $m$ and repetition $n$ is

$$A_{mn} = \frac{m+1}{\pi}\iint_{x^{2}+y^{2}\le 1} f(x,y)\,\left[V_{mn}(x,y)\right]^{*}\,dx\,dy \qquad (1)$$

where $m = 0, 1, 2, \ldots, \infty$ defines the order, $f(x, y)$ is the function being described and $^{*}$ denotes the complex conjugate, while $n$ is an integer (positive or negative) depicting the angular dependence, or rotation, subject to the conditions

$$m - |n| \ \text{even}, \qquad |n| \le m \qquad (2)$$

and $A^{*}_{mn} = A_{m,-n}$ holds. The Zernike polynomials $V_{mn}(x, y)$ [20] expressed in polar coordinates are

$$V_{mn}(r,\theta) = R_{mn}(r)\,e^{jn\theta} \qquad (3)$$

where $(r, \theta)$ are defined over the unit disc, $j = \sqrt{-1}$, and $R_{mn}(r)$ is the orthogonal radial polynomial, defined as

$$R_{mn}(r) = \sum_{s=0}^{(m-|n|)/2} (-1)^{s}\,\frac{(m-s)!}{s!\left(\frac{m+|n|}{2}-s\right)!\left(\frac{m-|n|}{2}-s\right)!}\,r^{m-2s} \qquad (4)$$

where $R_{mn}(r) = R_{m,-n}(r)$; note that if the conditions in Eq. (2) are not met, then $R_{mn}(r) = 0$. The first six orthogonal radial polynomials are

$$R_{00}(r) = 1,\quad R_{11}(r) = r,\quad R_{20}(r) = 2r^{2}-1,\quad R_{22}(r) = r^{2},\quad R_{31}(r) = 3r^{3}-2r,\quad R_{33}(r) = r^{3} \qquad (5)$$

So for a discrete image, if $P_{xy}$ is the current pixel, Eq. (1) becomes

$$A_{mn} = \frac{m+1}{\pi}\sum_{x}\sum_{y} P_{xy}\,\left[V_{mn}(x,y)\right]^{*}, \qquad x^{2}+y^{2}\le 1 \qquad (6)$$

To calculate the Zernike moments, the image (or region of interest) is first mapped to the unit disc using polar coordinates, with the centre of the image as the origin of the unit disc. Pixels falling outside the unit disc are not used in the calculation. Each coordinate is then described by $r$, the length of the vector from the origin to the coordinate point, and $\theta$, the angle from the x axis to that vector, by convention measured counter-clockwise from the positive x axis. The mapping from Cartesian to polar coordinates is

$$r = \sqrt{x^{2}+y^{2}} \qquad (7)$$

$$\theta = \tan^{-1}\!\left(\frac{y}{x}\right) \qquad (8)$$

However, in practice $\tan^{-1}$ is often defined over the interval $(-\pi/2, \pi/2)$, so care must be taken as to which quadrant the Cartesian coordinates appear in. Translation and scale invariance can be achieved by normalising the image using the Cartesian moments prior to calculating the Zernike moments [47]. Translation invariance is achieved by moving the origin to the image's centre of mass (COM), so that $m_{01} = m_{10} = 0$.
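The radial polynomial and the discrete Zernike moment defined above can be checked with a short pure-Python sketch. It follows the discrete sum literally (without a pixel-area factor, which some implementations add and which only rescales the features), maps pixel centres onto the unit disc with the image centre as origin, and uses `math.atan2` to avoid the quadrant ambiguity of the inverse tangent. The function names are hypothetical; this is not the authors' code.

```python
import cmath
import math


def radial_poly(m, n, r):
    """Orthogonal radial polynomial R_mn(r); returns 0 when |n| > m or m - |n| is odd."""
    n = abs(n)
    if n > m or (m - n) % 2:
        return 0.0
    total = 0.0
    for s in range((m - n) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(m - s)
                 / (math.factorial(s)
                    * math.factorial((m + n) // 2 - s)
                    * math.factorial((m - n) // 2 - s)))
        total += coeff * r ** (m - 2 * s)
    return total


def zernike_moment(img, m, n):
    """Discrete A_mn = (m+1)/pi * sum of P_xy * conj(V_mn) over pixels inside the unit disc."""
    rows, cols = len(img), len(img[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2   # image centre = origin of the disc
    half = max(rows, cols) / 2                # scale so the disc touches the image borders
    total = 0j
    for row in range(rows):
        for col in range(cols):
            p = img[row][col]
            if not p:
                continue
            x = (col - cx) / half
            y = (cy - row) / half             # flip so that y points up
            r = math.hypot(x, y)
            if r > 1.0:
                continue                      # pixels outside the unit disc are ignored
            theta = math.atan2(y, x)          # quadrant-aware inverse tangent
            # conj(V_mn) = R_mn(r) * exp(-j n theta)
            total += p * radial_poly(m, n, r) * cmath.exp(-1j * n * theta)
    return (m + 1) / math.pi * total
```

Magnitudes up to order 8 could then be collected with `[abs(zernike_moment(img, m, n)) for m in range(9) for n in range(m + 1) if (m - n) % 2 == 0]`; negative repetitions are skipped because $|A_{m,-n}| = |A_{mn}|$.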
Following this, scale invariance is produced by altering each object so that its area (or pixel count, for a binary image) is $m_{00} = \beta$, where $\beta$ is a predetermined value. For a binary image, both invariance properties can be achieved using

$$h(x,y) = f\!\left(\frac{x}{a}+\bar{x},\ \frac{y}{a}+\bar{y}\right), \qquad \bar{x} = \frac{m_{10}}{m_{00}},\quad \bar{y} = \frac{m_{01}}{m_{00}} \qquad (9)$$

$$a = \sqrt{\frac{\beta}{m_{00}}} \qquad (10)$$

where $h(x, y)$ is the new translated and scaled function. The error involved in the discrete implementation can be reduced by interpolation: if the coordinate calculated by Eq. (9) does not coincide with an actual grid location, the pixel value associated with it is interpolated from the four surrounding pixels. As a result of the normalization, the Zernike moments $|A_{00}|$ and $|A_{11}|$ are set to known values. $|A_{11}|$ is set to zero because the shape is translated to the centre of the coordinate system, although in a discrete implementation this is affected by the mapping error, which decreases as the size (pixel resolution) of the mapped shape increases. $|A_{00}|$ depends on $m_{00}$, and thus on $\beta$. Further, the magnitude of a Zernike moment is rotation invariant, as reflected in the mapping of the image to the unit disc: a rotation of the shape around the unit disc is expressed as a phase change. If $\phi$ is the angle of rotation, $A^{R}_{mn}$ is the Zernike moment of the rotated image and $A_{mn}$ is the Zernike moment of the original image, then

$$A^{R}_{mn} = A_{mn}\,e^{-jn\phi}, \qquad \left|A^{R}_{mn}\right| = \left|A_{mn}\right| \qquad (11)$$

Moment-based features are extracted from each zone of the scaled character bitmap: the image is partitioned into zones and features are extracted from each zone. In this paper, Zernike moment based feature extraction is proposed for off-line handwritten Devanagari basic and compound characters. To obtain the feature sets, the image is first normalized to 30x30 pixels and partitioned into the feature sets listed below; the first eight orders of Zernike moments are given in Table IV.
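A minimal sketch of the translation and scale normalization described above, with bilinear interpolation from the four surrounding pixels. It assumes an image stored as a list of rows, with x as the column index and y as the row index; `beta` and the output grid `size` are free parameters chosen by the caller, and the function names are hypothetical.

```python
import math


def raw_moments(img):
    """Cartesian moments m00, m10, m01 (x = column index, y = row index)."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    return m00, m10, m01


def bilinear(img, x, y):
    """Interpolate img at real-valued (x, y) from the four surrounding pixels (0 outside)."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xx, yy):
        return img[yy][xx] if 0 <= xx < w and 0 <= yy < h else 0

    return ((1 - fx) * (1 - fy) * px(x0, y0) + fx * (1 - fy) * px(x0 + 1, y0)
            + (1 - fx) * fy * px(x0, y0 + 1) + fx * fy * px(x0 + 1, y0 + 1))


def normalize(img, beta, size):
    """h(x, y) = f(x / a + x_bar, y / a + y_bar) with a = sqrt(beta / m00), sampled on a
    size x size grid whose centre corresponds to the image's centre of mass."""
    m00, m10, m01 = raw_moments(img)
    x_bar, y_bar = m10 / m00, m01 / m00
    a = math.sqrt(beta / m00)
    c = (size - 1) / 2
    return [[bilinear(img, (x - c) / a + x_bar, (y - c) / a + y_bar)
             for x in range(size)] for y in range(size)]
```

With `beta` equal to the original pixel count the scale factor is 1, so two shifted copies of the same shape normalize to the same output grid.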
Feature set 1: Fig. 7 (a) is considered as a whole character image.
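The zoning can be sketched as follows. Only feature set 1 (the whole image) is spelled out in the text, so the 3x3 grid of 10x10 zones used here is a hypothetical example, and the per-zone feature is a pluggable function: in the paper it would be the Zernike moment magnitudes of each zone, while `ink_density` is a placeholder used purely for illustration.

```python
def split_zones(img, zone_rows, zone_cols):
    """Split an image into zone_rows x zone_cols equal zones, in row-major order."""
    zh, zw = len(img) // zone_rows, len(img[0]) // zone_cols
    return [[row[c * zw:(c + 1) * zw] for row in img[r * zh:(r + 1) * zh]]
            for r in range(zone_rows) for c in range(zone_cols)]


def feature_vector(img, zone_feature, grids=((1, 1), (3, 3))):
    """Concatenate per-zone features; the (1, 1) grid is the whole image (feature set 1)."""
    features = []
    for zone_rows, zone_cols in grids:
        for zone in split_zones(img, zone_rows, zone_cols):
            features.extend(zone_feature(zone))
    return features


def ink_density(zone):
    """Placeholder zone feature (fraction of ink pixels), for illustration only."""
    return [sum(map(sum, zone)) / (len(zone) * len(zone[0]))]
```

For a 30x30 character, `feature_vector(img, ink_density)` yields one value for the whole image followed by nine per-zone values.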

V. CLASSIFICATION AND RECOGNITION
The classification stage is the decision-making part of a recognition system and uses the features extracted in the previous stage. We have used a Support Vector Machine (SVM) and k-NN for classification and recognition.

A. Support Vector Machine (SVM)
The support vector machine (SVM) is capable of learning and of achieving good generalization performance. Given a finite amount of training data, the SVM strikes a balance between the goodness of fit on the training data and the ability to generalize to unseen test data. SVMs have shown a strong ability to achieve low recognition error, and on this basis they have proved to achieve good generalization performance with no prior knowledge of the data. The SVM nonlinearly maps the input data onto a higher-dimensional feature space and determines a separating hyperplane with maximum margin between the two classes. A support vector machine is thus a maximal-margin hyperplane in feature space built by using a kernel function, which results in a nonlinear boundary in the input space. By evaluating kernel functions in the input space, the optimal separating hyperplane can be determined without any explicit computation in the higher-dimensional feature space [48].
The SVM produces a model, based on the training data, that predicts the target values of the test data features. Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, 2, \ldots, l$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$, the SVM requires the solution of the following optimization problem:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}w^{T}w + C\sum_{i=1}^{l}\xi_i \quad \text{subject to} \quad y_i\left(w^{T}\phi(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0$$

Here the training vectors $x_i$ are mapped into a higher-dimensional space by the function $\phi$, and $C > 0$ is the penalty parameter of the error term. The SVM finds the optimal hyperplane that maximizes the distance, or more specifically the margin, between the nearest examples of both classes. These nearest examples are called support vectors (SVs). Furthermore, $K(x_i, x_j) \equiv \phi(x_i)^{T}\phi(x_j)$ is called the kernel function. We have used the radial basis function (RBF) kernel, given by

$$K(x_i, x_j) = \exp\left(-\gamma\,\lVert x_i - x_j\rVert^{2}\right), \qquad \gamma > 0$$

A search is applied to find the value of $\gamma$, the parameter of the RBF kernel. Values of $\gamma$ are searched in the range (0, 1) and values of the cost $C$ in the range (0, 1000], and the recognition rate is examined for each combination.
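The paper does not say which SVM implementation was used, and solving the optimization problem is normally left to a library. This sketch only shows how a trained binary RBF SVM scores a new feature vector, assuming the support vectors, the products $\alpha_i y_i$ and the bias $b$ are already known; the numbers in the usage example are made up. Recognizing many character classes would additionally need a multi-class scheme such as one-vs-one voting over binary SVMs.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b for a trained binary RBF SVM."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


def svm_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Class +1 or -1 from the sign of the decision function."""
    return 1 if svm_decision(x, support_vectors, dual_coefs, bias, gamma) >= 0 else -1
```

For instance, with hypothetical support vectors `[[0.0, 0.0], [2.0, 2.0]]`, coefficients `[1.0, -1.0]`, bias 0 and γ = 0.5, a point near the origin is assigned to class +1.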

B. k-Nearest Neighbor (k-NN) Classifier
k-NN classification relies on the assumption that similar observations belong to similar classes. A test feature vector is assigned to a class according to its nearest neighbours, where nearness is measured by the minimum Euclidean distance. The stored training feature vectors are used to determine the k nearest neighbours of the given feature vector. The most common similarity measure for k-NN classification is the Euclidean distance metric, defined between feature vectors $x$ and $y$ as

$$d(x, y) = \sqrt{\sum_{i=1}^{f}\left(x_i - y_i\right)^{2}}$$

where $f$ represents the number of features. Smaller distance values represent greater similarity [18], [19].
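A minimal k-NN sketch using the Euclidean distance above and majority voting among the k = 3 nearest training samples, the value of k reported in the experiments. The training set is assumed to be a list of (feature vector, label) pairs; the labels in the test are hypothetical. Ties in the vote are broken in favour of the label met first, i.e. the nearer one, which is only one of several possible rules.

```python
import math
from collections import Counter


def euclidean(a, b):
    """d(a, b) = sqrt(sum over the f features of (a_i - b_i)^2)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label); majority vote among the k nearest."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common lists equal counts in first-seen order, i.e. nearest first.
    return votes.most_common(1)[0][0]
```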

VI. EXPERIMENTS AND RESULTS
The performance of the SVM- and k-NN-based classification has been evaluated on the database of handwritten Devanagari Marathi characters. The training dataset consists of 9600 basic characters, 9000 compound characters and 3000 split components of compound characters. The test images are preprocessed and pre-classified as discussed in Section III (A), which yields a 30x30-pixel image of each character. Depending on the zones determined in the preprocessing step, the feature sets are formed as discussed in Section IV. Moment-based features are then extracted from each zone of the scaled character bitmap. Table IV

VII. CONCLUSION
This paper presents a system for off-line recognition of handwritten simple and compound characters of the Marathi script derived from Devanagari. A large dataset of basic and compound characters was collected from writers of various age groups and used to create a database named KVKPR2013, which was then used for classification and recognition, specifically of compound characters. Prior to feature extraction, each character is pre-classified into three categories using structural features. Features of the compound characters in the database were extracted with the Zernike moment approach and used successfully for classification and recognition with SVM and k-NN classifiers. Zernike moment features gave better results for Devanagari compound characters. The proposed system improves the recognition rate by 0.37% over other handwritten character recognition systems. The system has been evaluated on a large handwritten character database, i.e. 12000 basic and 15000 compound characters collected in our laboratory. Since no work has been reported on Devanagari compound character recognition in the last decade, the system addresses the problem with structural and statistical features of compound characters. The system handles

Figure 3. Basic block diagram of the recognition system. The handwritten Devanagari characters are scanned to obtain a digitized document. From it, a particular character is selected, and the character image is cropped and resized to a fixed number of rows and columns. Each block of the recognition system is elaborated in the following sections.

Figure 4. Steps in Preprocessing of Image

Figure 5. Character Images after Preprocessing

Structural Classification: The local features are detected on the basis of the end points of the character. We first partitioned the character into a 3x3 grid, i.e. 9 blocks, and extracted the end points and junctions in each individual block, as shown in Fig. 6.
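A minimal sketch of these local structural features, assuming a one-pixel-wide skeleton with ink = 1: a pixel with exactly one 8-neighbour is counted as an end point and one with three or more as a junction, and the counts are accumulated per block of a 3x3 partition. This neighbour-count rule is a simple heuristic chosen for illustration, not necessarily the authors' rule; it marks several adjacent pixels around a single junction, so a practical version would merge them. The function name is hypothetical.

```python
def endpoint_junction_grid(skel):
    """Return a 3x3 grid of [end_points, junctions] counts for a binary skeleton."""
    h, w = len(skel), len(skel[0])
    grid = [[[0, 0] for _ in range(3)] for _ in range(3)]
    for y in range(h):
        for x in range(w):
            if not skel[y][x]:
                continue
            # Number of ink pixels among the 8 neighbours (window sum minus the pixel itself).
            n = sum(skel[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))) - 1
            by, bx = y * 3 // h, x * 3 // w   # block of the 3x3 partition
            if n == 1:
                grid[by][bx][0] += 1          # end point
            elif n >= 3:
                grid[by][bx][1] += 1          # junction (heuristic)
    return grid
```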

Figure 6. Presence of End Points in Partition Blocks of a Character

Figure 8. Recognition Rate of Basic Characters through SVM and k-NN Classifiers
Figure 9.

Table II. CLASSIFICATION OF DEVANAGARI BASIC CHARACTERS

Table IV. THE FIRST 8 ORDER ZERNIKE MOMENTS

Table IV shows the first eight orders of Zernike moments extracted from each character using the Zernike moment equations. The Zernike moment features are evaluated with five-fold cross-validation for each of the Devanagari Marathi basic and compound characters shown in Tables II and III. We experimented with different values of the RBF parameter γ and the cost C; the selected values are γ = 0.5 and C = 1000. For k-NN, k = 3 is selected. The results are promising for both basic and compound characters: the overall recognition accuracy is 98.37% with SVM and 95.82% with k-NN for basic characters, and 98.32% with SVM and 95.42% with k-NN for compound characters. Results on some samples are given in Tables V and VI. The recognition rate of the proposed method is compared with other reported work in Table VII; on this basis, the proposed method shows an improved recognition rate of 98.37% for basic characters and 98.32% for compound characters.
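The parameter selection described above can be sketched as a grid search scored by five-fold cross-validation. The `evaluate` callback, which would train an RBF SVM on the training indices and return its accuracy on the test indices, is hypothetical, as are the candidate grids; the folds are assigned round-robin without shuffling or stratification, which a real experiment would normally add.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) for k folds assigned round-robin."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def grid_search(n_samples, evaluate, gammas, costs, k=5):
    """Return (mean_accuracy, gamma, cost) with the best k-fold mean accuracy."""
    best = None
    for gamma in gammas:
        for cost in costs:
            scores = [evaluate(train, test, gamma, cost)
                      for train, test in k_fold_splits(n_samples, k)]
            mean = sum(scores) / len(scores)
            if best is None or mean > best[0]:
                best = (mean, gamma, cost)
    return best
```

With a toy scorer that peaks at γ = 0.5 and C = 1000, for example `grid_search(100, lambda tr, te, g, c: 1.0 - abs(g - 0.5) - abs(c - 1000) / 1000, [0.125, 0.25, 0.5, 0.75], [1, 10, 100, 1000])`, the search returns that pair; in practice the scorer would be the cross-validated recognition rate.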

Table VI. RECOGNITION RESULT FOR COMPOUND CHARACTERS
Average recognition rate: SVM 98.32%, k-NN 95.42%

Table VII. COMPARISON OF RESULTS OF PROPOSED METHOD WITH OTHER METHODS IN LITERATURE