A Review on Feature Extraction and Feature Selection for Handwritten Character Recognition

Abstract: The development of handwritten character recognition (HCR) is an active area of pattern recognition. An HCR system consists of several stages: preprocessing, feature extraction, and classification, followed by the actual recognition. It is generally agreed that one of the main factors influencing HCR performance is the selection of an appropriate set of features for representing input samples. The central issue is choosing the relevant features that yield the minimum classification error. To address this issue and maximize classification performance, many techniques have been proposed for reducing the dimensionality of the feature space in which the data are processed. These techniques, generally referred to as feature reduction, fall into two main categories: feature extraction and feature selection. A large number of research papers and reports have already been published on this topic. This paper provides an overview of some of the methods and approaches for feature extraction and selection, and analyzes them in order to identify current trends. It also reviews the metaheuristic harmony search algorithm (HSA).


I. INTRODUCTION
Handwritten character recognition (HCR) is the ability of a computer to receive and interpret intelligible handwritten input, which can then be passed to automated processing systems. Generally, HCR can be divided into three steps: preprocessing, feature extraction, and classification (recognition). The preprocessing stage produces a clean character image that can be used directly and efficiently by the feature extraction stage. The feature extraction stage removes redundancy from the data. The classification stage recognizes characters or words. This paper concentrates on the feature extraction stage.
HCR is a challenging problem because the same character varies with font and size. These differences in font type and size make the recognition task difficult and degrade recognition accuracy.
Feature extraction in HCR is an important topic in image processing and object recognition. The fundamental components of characters are called features. The basic task of feature extraction and selection is to find the group of features that is most effective for classification, that is, to compress a high-dimensional feature space into a low-dimensional one so that the classifier can be designed effectively [1].
Accordingly, this study reviews and examines feature extraction and selection methods for HCR and investigates current trends in these approaches. The paper is organized as follows. Section I is the introduction. Section II gives an overview of HCR. Section III gives an overview of feature extraction, followed by current trends in feature extraction in Section IV. Sections V and VI give an overview of feature selection and current trends in feature selection, respectively. Section VII briefly presents the discussion and future work. The last section concludes the paper.

II. AN OVERVIEW ON HCR
Handwriting recognition is defined as the transformation of a language from its visual marks into a symbolic representation [3]. Its goal is to interpret handwritten input, which may be sentences, words, or characters; character recognition is therefore one part of the handwriting recognition problem. The development of handwritten character recognition (HCR) is an active area of pattern recognition and is sometimes referred to more specifically as optical character recognition (OCR). According to Arica and Yarman-Vural's review of character recognition (CR), CR systems have evolved in three stages [4]. The early stage covers the period 1900-1980. OCR is said to have begun with the objective of developing reading machines for the blind. These early systems for automatic character recognition concentrated either on machine-printed text or on small sets of well-distinguished handwritten text or symbols. In the second period, from the 1980s to the 1990s, the explosion of information technology drove rapid growth in OCR. CR research focused mainly on shape recognition techniques that used no semantic information. Although an upper limit on the recognition rate was reached, it was not sufficient for many practical applications. The period from the 1990s onward is referred to as the advancement era [5], in which real progress in OCR systems has been achieved. At the beginning of this period, image processing and pattern recognition techniques were efficiently combined with artificial intelligence (AI) methodologies, and complex algorithms for character recognition systems were developed. There is, however, still a long way to go to reach the ultimate goal of machine simulation of fluent human reading, especially for unconstrained on-line and off-line handwriting [4].
HCR can be divided into two categories: on-line and off-line. On-line character recognition involves identifying characters while they are being written [6]; it deals with time-ordered sequences of data, pen-up and pen-down movements, and pressure-sensitive pads that record the pen's pressure and velocity [7]. Off-line character recognition, on the other hand, involves recognizing already written character patterns in a scanned digital image. Off-line character recognition is more complex and requires more research than on-line character recognition.
HCR is a very complex task because different writing styles and handwriting variability can produce extreme differences between characters [8,9]. Handwriting recognition has been developed for many kinds of handwritten characters, such as digits, numerals, and cursive script, in languages and scripts including English, Tamil, Chinese, Bangla, Devanagari, Persian, Arabic and others. The difficulties of the handwriting recognition task can be classified into four categories: the nature of the handwriting signals, handwriting styles, writer dependency, and vocabulary size [10].
Most current approaches to HCR consist of three main stages: preprocessing, feature extraction, and classification.

A. Preprocessing
The preprocessing stage aims to extract the relevant textual parts and prepare them for segmentation and recognition. Its main objectives are noise reduction, normalization of the data, and compression of the amount of information to be retained [11]. For noise reduction alone there are hundreds of available techniques, which can be categorized into three major groups: filtering, morphological operations, and noise modeling [12][13]. Filters can be designed for smoothing [14], sharpening [15], thresholding [16], removing slightly textured backgrounds [17], and contrast adjustment [18]. Morphological operations can be designed to connect broken strokes [19], decompose connected strokes [20], smooth contours, prune wild points, thin characters [21], and extract boundaries [22]. The preprocessing stage thus produces a clean character image that can be used directly and efficiently by the feature extraction stage.
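As an illustration only, two of the simplest preprocessing steps, thresholding and cropping to the character, can be sketched as follows. The sketch assumes a grayscale image stored as rows of 0-255 integers with dark ink on a light background; the function names and the fixed threshold of 128 are hypothetical choices, not taken from the cited works.

```python
def binarize(gray, threshold=128):
    """Global thresholding: dark pixels (below threshold) become ink = 1."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop_to_ink(binary):
    """Crop a binary image to the bounding box of its ink pixels."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return []  # blank image: nothing to crop
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


gray = [
    [255, 255, 255, 255],
    [255,   0,  10, 255],
    [255, 255,  20, 255],
    [255, 255, 255, 255],
]
clean = crop_to_ink(binarize(gray))  # [[1, 1], [0, 1]]
```

A practical system would typically add noise filtering and size normalization before passing the image to feature extraction.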

B. Feature Extraction
The feature extraction stage removes redundancy from the data. Before building the feature extraction procedure, two distinct problems must be clarified: feature extraction and feature selection. Feature extraction concerns which technique is used to extract features from the character image as its representation. Feature selection, on the other hand, searches for the most relevant features in order to improve classification accuracy. This paper concentrates on the feature extraction and selection stage. The next section discusses feature extraction briefly.

C. Classification
The classification stage recognizes characters or words. After the features that represent the raw input data have been extracted, the classification stage uses them to assign a class based on their properties. Many classification techniques are available, ranging from template matching [23][24][25] and statistical approaches [26][27][28] to syntactic methods [29] and neural networks [30].

III. AN OVERVIEW ON FEATURE EXTRACTION
Feature extraction can be defined as extracting the most representative information from the raw data, which minimizes the within-class pattern variability while enhancing the between-class pattern variability. For this purpose, a set of features is extracted for each class that helps distinguish it from other classes while remaining invariant to characteristic differences within the class [31]. A good survey of feature extraction methods for character recognition can be found in [32].
Generally, there are two kinds of features: statistical features and structural features [33][34][35][36]. Statistical features include pixel density, moments, mathematical transformations, and so on. Structural features include strokes, contours, the number of bifurcation points, the number of circles, and so on. Most researchers agree that statistical features can be obtained quickly with simple methods and give good recognition results, especially on closed test data; however, they are easily affected by deformation of the symbols and are therefore hard to extend to further applications. Structural features conform more closely to human intuition and are therefore more robust to deformation of the symbols, but they usually rely on human-summarized rules for the recognition algorithm. When new symbols are introduced into an application, revising the algorithm is more costly [37].

A. Statistical Features
Representing a document image by the statistical distribution of points accommodates style variations to some extent. Although this type of representation does not allow the original image to be reconstructed, it reduces the dimension of the feature set, providing high speed and low complexity. The major statistical features used for character representation are the following.
• Zoning: The frame containing the character is divided into several overlapping or non-overlapping zones. The densities of the points or of some features in the different regions are analyzed [38].
• Crossings and Distances: A popular statistical feature is the number of crossings of a contour by a line segment in a specified direction. The character frame is partitioned into a set of regions in various directions, and features are then extracted from each region.
• Projections: Characters can be represented by projecting the pixel gray values onto lines in various directions. This representation creates a one-dimensional signal from a two-dimensional image, which can be used to represent the character image [39].
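The three statistical features above can be sketched in a few lines. This is a minimal illustration, assuming a binary character image stored as a list of rows with ink = 1; the function names are hypothetical, the zones are non-overlapping, and crossings are simplified to background-to-ink transitions along a single scan line rather than true contour crossings.

```python
def zoning(img, zone_rows, zone_cols):
    """Ink density in each of zone_rows x zone_cols non-overlapping zones."""
    h, w = len(img), len(img[0])
    feats = []
    for i in range(zone_rows):
        r0, r1 = i * h // zone_rows, (i + 1) * h // zone_rows
        for j in range(zone_cols):
            c0, c1 = j * w // zone_cols, (j + 1) * w // zone_cols
            ink = sum(img[r][c] for r in range(r0, r1) for c in range(c0, c1))
            area = (r1 - r0) * (c1 - c0)
            feats.append(ink / area if area else 0.0)
    return feats


def projections(img):
    """Horizontal (per-row) and vertical (per-column) ink counts."""
    horizontal = [sum(row) for row in img]
    vertical = [sum(col) for col in zip(*img)]
    return horizontal, vertical


def crossings(line):
    """Number of background-to-ink transitions along one scan line."""
    previous = [0] + list(line[:-1])
    return sum(1 for p, q in zip(previous, line) if p == 0 and q == 1)


# A small "O"-like glyph.
glyph = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
features = zoning(glyph, 2, 2) + projections(glyph)[0] + [crossings(glyph[1])]
```

Concatenating the individual features into one vector, as in the last line, is one simple way to build the input for a classifier.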

B. Structural Features
Structural features are based on topological and geometrical properties of the character. Various global and local properties of characters can be represented by geometrical and topological features with high tolerance to distortions and style variations. This type of representation may also encode some knowledge about the structure of the object or about the sort of components that make up the object. The various topological and geometrical representations can be grouped into four categories:
• Extracting and Counting Topological Structures: In this category, the features considered are lines, curves, splines, extreme points, maxima and minima, cups above and below a threshold, openings to the right, left, up and down, cross (X) points, branch (T) points, line ends (J), loops (O), the direction of a stroke from a special point, inflection between two points, isolated dots, a bend between two points, horizontal curves at top or bottom, straight strokes between two points, ascending, descending and middle strokes, and relations among the strokes that make up a character [40][41].
• Measuring and Approximating the Geometrical Properties: In this category, the characters are represented by measurements of geometrical quantities such as the ratio between the width and height of the bounding box of a character, the relative distance between the last point and the last y-min, the relative horizontal and vertical distances between the first and last points, the distance between two points, comparative lengths between two strokes, the width of a stroke, the upper and lower masses of words, word length, and curvature or change in curvature [42][43][44][45].
• Coding: One of the most popular coding schemes is Freeman's chain code. This coding is essentially obtained by mapping the strokes of a character into a 2-dimensional parameter space, which is made up of codes. There are many versions of chain coding. The character frame is divided by a left-right sliding window, and each region is coded by the chain code [44], [46][47][48].
• Graphs and Trees: Words or characters are first partitioned into a set of topological primitives, such as strokes, holes, and cross points. These primitives are then represented using attributed or relational graphs. The image is represented either by graph coordinates of the character shape or by an abstract representation with nodes corresponding to the strokes and edges corresponding to the relationships between the strokes. Trees can also be used to represent words or characters with a set of features that have a hierarchical relation [49][50][51].
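To make the coding category concrete, the following is a minimal sketch of an 8-direction Freeman chain code. It assumes the stroke or contour has already been traced into an ordered list of 8-connected (row, column) points, which is a separate step not shown here. The direction numbering (0 = east, counting counter-clockwise) is one common convention, and the function names are hypothetical.

```python
# (d_row, d_col) -> Freeman code. 0 = east, codes increase counter-clockwise;
# rows grow downward, as in image coordinates.
FREEMAN = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_code(points):
    """Encode an ordered, 8-connected point sequence as Freeman codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in FREEMAN:
            raise ValueError(f"points {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(FREEMAN[step])
    return codes


def direction_histogram(codes):
    """Normalized 8-bin histogram of the codes: a fixed-length feature vector."""
    total = len(codes) or 1
    return [codes.count(k) / total for k in range(8)]


# A closed unit square traced east, south, west, north.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
codes = chain_code(square)  # [0, 6, 4, 2]
```

The raw code sequence varies in length with the stroke, so a histogram such as the one above is one simple way to obtain a fixed-length feature from it.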

IV. CURRENT TRENDS IN FEATURE EXTRACTION
Instead of relying on a feature vector based on a single representation of a character, the current trend is to combine different types of features extracted from different representations of the same character. The advantage of combining such different kinds of features is that they offer a wider range of identification clues, which helps improve recognition accuracy. For example, Heutte et al. [52] combine different statistical and structural features for the recognition of handwritten characters. They construct a 124-variable feature vector comprising the following seven families of features: 1) intersections of the character with horizontal and vertical straight lines, 2) invariant moments, 3) holes and concave arcs, 4) extrema, 5) end points and junction points, 6) profiles, and 7) projections. Aurora et al. [53] combine different feature extraction techniques, such as intersection-based features, shadow features, chain code, and curve-fitting features, for the Indian Devnagari script. Kimura et al. [54] propose a genetic-algorithm-based strategy for finding a suitable combination of features from a large pool of features, with minimization of the classification error as the objective criterion. Other combined or hybrid methods for feature extraction are shown in Table I.

TABLE I (only the last two entries are recoverable):
[66]: Multi Zoning of the character array (i.e., dividing it into overlapping or non-overlapping regions, computing the moments of the black pixels of the character, the n-tuples of black or white or joint occurrence, the characteristic loci, and crossing distances)
Likforman-Sulem et al. (2012) [67]: Structural and statistical features

V. AN OVERVIEW ON FEATURE SELECTION
In HCR, feature selection is a technique for selecting the features that are relevant to the classification stage. The goal of feature selection (FS) is to reduce the number of features considered in the classification stage. This is done by removing irrelevant or noisy features from the whole set of available ones. Feature selection aims to minimize the information loss caused by reducing the feature set; thus, at least in principle, the selection process should not reduce classification performance. The feature selection process consists of three basic steps (see Fig. 1): a search procedure, a subset evaluation, and a stopping criterion. A typical search procedure uses a search strategy to find the optimal solution according to a previously chosen subset evaluation criterion. The search procedure is repeated until the stopping criterion is satisfied. The feature selection problem thus amounts to selecting, from the whole set of available features, the subset with the most discriminative power. Choosing a good feature subset is crucial in any classification process for several reasons. First, if the feature set does not include all the information needed to discriminate samples belonging to different classes, the achievable performance may be unsatisfactory regardless of how effective the learning algorithm is. Second, the size of the feature set used to describe the samples determines the search space to be explored during the learning phase; irrelevant and noisy features therefore enlarge the search space and increase the complexity of the process. Finally, the computational cost of classification depends on the number of features used to describe the patterns.
When the cardinality N of the candidate feature set Y is high, finding the optimal feature subset according to a given evaluation function becomes computationally intractable because the search space, made up of all 2^N possible subsets of Y, grows exponentially. Heuristic algorithms therefore become necessary for finding near-optimal solutions [75]. Such algorithms require both a strategy for selecting feature subspaces and a function for evaluating the goodness of each selection, i.e., how well the classes are separated in the selected feature subspace.
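The exponential growth can be seen directly in a brute-force search. The sketch below is illustrative only: `exhaustive_search` and the toy scoring function are hypothetical, and the empty subset is skipped, so 2^N - 1 subsets are evaluated.

```python
from itertools import combinations


def exhaustive_search(n_features, evaluate):
    """Evaluate every non-empty feature subset and return the best one."""
    best_subset, best_score, visited = (), float("-inf"), 0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            visited += 1
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score, visited


def toy_score(subset):
    """Toy criterion: features 0 and 2 are useful, every feature has a cost."""
    return 10 * len(set(subset) & {0, 2}) - len(subset)


best, score, visited = exhaustive_search(4, toy_score)
# best == (0, 2), score == 18, visited == 15 == 2**4 - 1
```

With N = 4 only 15 subsets are visited, but the count doubles with every added feature. For the 124-variable feature vector of Heutte et al. [52], it would be 2^124 - 1, roughly 2.1 x 10^37, which is why heuristic search is needed.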
As regards evaluation methods, those proposed in the literature can be divided into two broad classes: i) filter methods, which evaluate a feature subset independently of the classifier and are usually based on some statistical measure of distance between the samples belonging to different classes; and ii) wrapper methods, which are based on the classification results achieved by a given classifier. Filter methods are usually faster than wrapper methods, as the latter require the classifier to be retrained at each evaluation. Moreover, filter-based evaluations are more general, as they exploit statistical information about the data, while wrapper methods depend on the classifier used.
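The difference between the two classes can be illustrated with one evaluation function of each kind. This is a sketch under stated assumptions: the filter criterion is a Fisher-style ratio of between-class to within-class scatter, and the wrapper criterion is the leave-one-out accuracy of a 1-nearest-neighbour classifier. Both are common choices but are not the specific measures used in the cited works, and the function names and data are hypothetical.

```python
def fisher_score(X, y, subset):
    """Filter criterion: summed between-class / within-class scatter per feature."""
    score = 0.0
    for f in subset:
        values = [row[f] for row in X]
        overall = sum(values) / len(values)
        between = within = 0.0
        for c in set(y):
            vc = [row[f] for row, label in zip(X, y) if label == c]
            mean_c = sum(vc) / len(vc)
            between += len(vc) * (mean_c - overall) ** 2
            within += sum((v - mean_c) ** 2 for v in vc)
        score += between / (within + 1e-12)  # small constant avoids division by zero
    return score


def loo_1nn_accuracy(X, y, subset):
    """Wrapper criterion: leave-one-out accuracy of a 1-NN classifier."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in subset),
        )
        if y[nearest] == y[i]:
            correct += 1
    return correct / len(X)


# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 1.2], [1.1, 5.1], [1.2, 8.9]]
y = [0, 0, 0, 1, 1, 1]
```

On this toy data both criteria rank the subset (0,) above (1,). The filter needs only class statistics, whereas the wrapper classifies every sample again for each subset it evaluates.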
As for search strategies, many heuristic algorithms have been proposed in the literature for finding near-optimal solutions: greedy selection [76], branch and bound (B&B) [77], and floating search [77]. These algorithms use greedy stepwise strategies that incrementally generate feature subsets by adding the feature that produces the highest increment of the evaluation function. Since these algorithms do not take into account complex interactions among several features, in most cases they lead to sub-optimal solutions. An alternative way to cope with the search problem is to use genetic algorithms (GAs), which have proved to be effective search tools for finding near-optimal solutions in complex and non-linear search spaces [78]. These properties also make GAs suitable for solving feature selection problems [79][80]. Comparative studies have demonstrated the superiority of GAs in feature selection problems involving large numbers of features [81].
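The greedy stepwise strategy described above can be sketched as sequential forward selection. This is a minimal illustration, not the exact procedure of [76] or [77]; the function names, the stopping rule (stop when no addition improves the score), and the toy evaluation functions are assumptions.

```python
def sequential_forward_selection(n_features, evaluate, max_size=None):
    """Greedily add the feature giving the largest gain until nothing improves."""
    selected, best_score = [], float("-inf")
    remaining = set(range(n_features))
    limit = max_size or n_features
    while remaining and len(selected) < limit:
        # Search step: score every subset obtained by adding one more feature.
        candidate, candidate_score = max(
            ((f, evaluate(selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        # Stopping criterion: the best addition does not improve the score.
        if candidate_score <= best_score:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


def interacting_score(subset):
    """Toy criterion: features 1 and 2 are only useful together."""
    s = set(subset)
    if s == {1, 2}:
        return 10
    return 3 if 0 in s else 1


def additive_score(subset):
    """Toy criterion with no interactions: each feature has a fixed weight."""
    weights = [5, -1, 2]
    return sum(weights[f] for f in subset)


print(sequential_forward_selection(3, additive_score))     # ([0, 2], 7)
print(sequential_forward_selection(3, interacting_score))  # ([0], 3)
```

With the additive criterion the greedy search finds the best subset. With the interacting criterion it stops at feature 0 (score 3) and never reaches the pair {1, 2} (score 10), which is the kind of sub-optimal result that motivates population-based searches such as GAs.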

VI. CURRENT TREND IN FEATURE SELECTION
Recently, interest in feature selection has been increasing, and an abundance of algorithms has been derived. Feature selection algorithms can be classified into two groups: heuristic and metaheuristic approaches. Many heuristic algorithms have been proposed in the literature for finding near-optimal solutions [76][77]. GA is one metaheuristic approach and has been widely used to solve feature selection problems [82][83][84][85][86][87].

VII. DISCUSSION AND FUTURE WORK
We have reviewed the introduction, concepts, and stages in the development of HCR. The goal of handwriting recognition is to identify input characters or images correctly so that they can then be passed to automated processing systems. A handwritten character recognition system consists of a number of preprocessing steps, feature extraction, and classification. This paper concentrates on feature extraction and selection methods.
One of the most important phases in achieving successful character recognition is feature extraction and selection. Feature extraction concerns which technique is used to extract features from the character image as its representation. Feature selection, on the other hand, searches for the most relevant features in order to improve classification accuracy.
Generally, there are two kinds of extracted features: statistical features and structural features. We have investigated and analyzed the methods researchers use for feature extraction. Based on this analysis, the approach most used recently is to combine different types of features extracted from different representations of the same character instead of focusing on a single representation, as shown in Table I.
The main goal of feature selection, on the other hand, is to choose from the extracted feature set a number of features that yields the minimum classification error, that is, to select the features that are relevant to the classification stage. This is done by removing irrelevant or noisy features from the whole set of available ones. In general, feature selection finds a subset of features that improves recognition accuracy. The process has two main phases. The first phase is a search strategy that selects one feature subset among all possible subsets. The second phase is a method for evaluating the selected subsets by assigning a fitness value to them; such methods are generally divided into two types, filter and wrapper methods.
We have also investigated and analyzed the methods researchers use for feature selection. Many methods have been used as search strategies for feature selection, and we divided them into two categories: heuristic and metaheuristic approaches. Based on this analysis, many recent studies use metaheuristic rather than heuristic approaches for feature selection, as shown in Table II.
Metaheuristic algorithms now hold an important place in the field of optimization. A metaheuristic algorithm is an approach to solving optimization problems by searching for the best of all possible solutions. There are many metaheuristic algorithms, such as the genetic algorithm (GA), simulated annealing (SA), particle swarm optimization (PSO), and others. As Table II shows, researchers have used metaheuristic approaches such as Artificial Bee Colony (ABC) [94], GA and ant colony optimization (ACO) [88], and Axiomatic Fuzzy Set (AFS) [97].
As future work, we intend to propose a feature selection method based on HSA. To the best of our knowledge, HSA has not yet been applied to the feature selection problem. According to the literature, harmony search (HS) has several advantages over traditional optimization techniques [121]:
• HS is a simple population-based metaheuristic algorithm and does not require initial value settings for the decision variables;
• HS uses stochastic random searches;
• HS does not need derivative information;
• HS has few parameters;
• HS can be easily adopted for various types of optimization problems [122].
These features increase the flexibility of the HS algorithm in producing better solutions. HS has been applied successfully in many areas, such as computer science, electrical engineering, civil engineering, mechanical engineering, and biomedical applications, as shown in Table III. Based on these considerations, we will use HSA for feature selection in HCR in our future work.
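To indicate how HS might be adapted to feature selection, the following is a speculative sketch and not the method proposed as future work. It assumes each harmony is a bit vector (1 = feature selected), implements pitch adjustment as a bit flip, and uses the usual HS parameters: harmony memory size (`hms`), harmony memory considering rate (`hmcr`), and pitch adjusting rate (`par`). The function name, default values, and toy evaluation function are hypothetical.

```python
import random


def harmony_search_fs(n_features, evaluate, hms=10, hmcr=0.9, par=0.3,
                      iterations=200, seed=0):
    """Binary harmony search sketch: maximize evaluate(bits) over bit vectors."""
    rng = random.Random(seed)
    # Harmony memory: hms random bit vectors with their scores.
    memory = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(hms)]
    scores = [evaluate(h) for h in memory]
    for _ in range(iterations):
        new = []
        for d in range(n_features):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[d]   # memory consideration
                if rng.random() < par:
                    bit = 1 - bit             # pitch adjustment as a bit flip (assumption)
            else:
                bit = rng.randint(0, 1)       # random selection
            new.append(bit)
        new_score = evaluate(new)
        # Replace the worst harmony if the new one is better.
        worst = min(range(hms), key=scores.__getitem__)
        if new_score > scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


# Toy criterion: agreement with a hidden "ideal" feature mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]


def match(bits):
    return sum(1 for b, t in zip(bits, target) if b == t)


best_bits, best_score = harmony_search_fs(len(target), match, seed=1)
```

In an HCR setting, `evaluate` would be replaced by a filter or wrapper criterion computed on the selected features. The search uses only evaluation scores, not derivatives, and has few parameters, in line with the advantages listed above.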

TABLE III.
SUMMARY OF APPLICATIONS OF THE HARMONY SEARCH ALGORITHM