Data fusion based framework for the recognition of Isolated Handwritten Kannada Numerals

Combining classifiers appears as a natural step forward when a critical mass of knowledge of single classifier models has been accumulated. Although there are many unanswered questions about matching classifiers to real-life problems, classifier combination is growing rapidly and enjoys a lot of attention from the pattern recognition and machine learning communities. For any pattern classification task, an increase in data size, number of classes, dimension of the feature space and interclass separability affects the performance of any classifier. It is essential to know the effect of the training dataset size on the recognition performance of a feature extraction method and classifier. In this paper, an attempt is made to measure the performance of the classifier by testing it with two datasets of different sizes. In practical classification applications, if the number of classes and multiple feature sets for pattern samples are given, a desirable recognition performance can be achieved by data fusion. In this paper, we propose a framework based on the combined concepts of decision fusion and feature fusion for the classification of isolated handwritten Kannada numerals. The proposed method improves the classification result. The experimental results show an increase of 13.95% in the recognition accuracy.


I. INTRODUCTION
Achieving the best possible classification performance for a given problem domain has become the ultimate goal of designing pattern recognition systems. This objective has traditionally led to the development of different classification schemes for any pattern recognition problem to be solved. In the recent past, studies have been done to obtain the optimal feature set and classifier set.
Features play a very important role in any pattern classification task. A set of bad features can deteriorate the performance of a good classifier. With the increase in noise and dimensionality, feature selection becomes an essential step.
A feature that carries more confusing (contradictory) information than the rest of the set should be avoided, as such features confuse the classifiers. To reduce the noise in the data, features which are weakly correlated with the class information should be removed. The "curse of dimensionality" is also a major motivation for feature selection. Too many features increase the computational time during the training phase without any significant change in performance [1].
In practical classification applications, if the number of classes and multiple feature sets for pattern samples are given, a desirable recognition performance can be achieved based on these sets of features using data fusion [2]. Fusion strategies are mainly classified into information fusion (low-level fusion/pixel-level fusion), feature fusion (intermediate-level fusion), and decision fusion (high-level fusion) [3].
Information fusion combines several sources of raw data to produce new raw data that is expected to be more informative and synthetic than the inputs. Feature fusion deals with the selection and combination of features to remove redundant and irrelevant features. If two features have similar or nearly similar distributions, one of them is redundant. A feature is said to be irrelevant if it correlates poorly with the class information. The final set of features is fused together to obtain a better feature set, which is given to a classifier to obtain the final result. Feature fusion is an advancement of information fusion. Decision fusion uses a set of classifiers to provide a better and unbiased result. The classifiers can be of the same or different types and can also have the same or different feature sets.
There are different classifiers such as KNN, SVM, ANN etc., and a single classifier may not be well suited for a particular application. Hence a set of classifiers is merged by various methods to obtain the final output [1]. It has been found that a consensus decision of several classifiers can give better accuracy than any single classifier [4]. Therefore, combining classifiers has become a popular research area in recent years. The goal of combining classifiers is to form a consensus decision based on the opinions provided by different base classifiers. Combined classifiers have been applied to several classification tasks, for example face recognition, handwritten character identification, and fingerprint verification [5,6].
In this paper we propose a framework which combines the concepts of feature fusion and decision fusion. First, a feature selection method is presented to find the best feature set from a set of features. Next, the best feature set is applied to two training datasets of different sizes and different complexity to determine the effect of the training dataset size on the recognition performance of the feature extraction method and the classifier used. After finding the best feature set, this feature set is combined with the other feature sets to form fused feature sets (union vectors). These fused feature sets are classified using the K-NN classifier, and the fused feature set with the highest recognition accuracy is chosen for the next level. Lastly, the decision fusion method is used to obtain better classification results. Here, we have applied the proposed framework to the recognition of isolated handwritten Kannada numerals. Results are presented using our own handwritten Kannada numeral datasets.
www.ijacsa.thesai.org
Over the last few years, extensive research has been carried out on Handwritten Character Recognition (HCR) systems in the academic and production fields. A Handwritten Character Recognition system can be either online or offline. The process of finding letters and words present in a digital image of handwritten text is called off-line handwritten recognition. A number of methods for the recognition of English, Latin, Arabic and Chinese scripts are excellently reviewed in [7,8,9,10]. An HCR system has various applications, such as a reading aid for the blind, the processing of bank cheques, and automatic pin code reading for sorting postal mail.
A lot of work has been done on the recognition of printed characters of Indian languages. On the other hand, attempts made at the recognition of handwritten characters are few. Most of the research in this area is concentrated on the recognition of off-line handwritten characters for the Devanagari and Bangla scripts. From the literature survey it is seen that there is a lot of demand for character recognition systems for Indian scripts, and an excellent review of OCR for Indian languages is given in [11]. A detailed study and analysis of OCR research on South Indian scripts can be found in [12].
A method for the recognition of isolated Devanagari handwritten numerals based on Fourier descriptors has been proposed by Rajput and Mali in [13]. Another method, proposed in [14], involves computing the zone centroid and further dividing the image into equal zones. The average distance from the zone centroid to each pixel present in the zone is computed. This process is repeated for all the zones present in the image of the numeral. Finally, n such features are extracted and considered for classification and recognition. F-ratio based weighted feature extraction for similar-shape character recognition for different scripts such as Arabic/Persian, Devanagari, English, Bangla, Oriya, Tamil, Kannada and Telugu can be found in [15].
The key factor in achieving a high recognition rate in character/numeral recognition systems is the selection of a suitable feature extraction method. A survey of feature extraction methods for character recognition is given in [16].
The curvelet transform is used as one of the feature extraction methods in [17, 18, 19]. Here the curvelet transform is applied to the given image and the coefficients are obtained. The obtained coefficients are used in the feature vector for that particular image.
The literature survey shows that the automatic recognition of handwritten digits has been the subject of intensive research during the last few decades. Digit identification is very important in applications such as the interpretation of ID numbers, vehicle registration numbers, pin codes, etc. In the Indian context, handwritten numeral recognition remains a fascinating area of research for the design of robust optical character recognition (OCR), in particular for handwritten Kannada numerals.
The paper is organized as follows: in section II, we discuss the properties of the Kannada numerals and their complexity. Section III deals with the generation of handwritten Kannada numeral datasets. The need for feature selection is presented in section IV. Section V details the proposed methodology used for the recognition. The experimental results and discussions are presented in section VI, followed by the conclusion in section VII.

II. KANNADA NUMERALS AND THEIR COMPLEXITY
Kannada or Canarese, the official language of the southern Indian state of Karnataka, is described as 'sirigannada'. Kannada has now received Classical Language status in India. It has a history of more than 1500 years and is also spoken in the neighboring states of Tamil Nadu, Andhra Pradesh and Maharashtra. An expatriate population of Kannada origin is also present in the USA, Australia, Asia Pacific and Africa. The Kannada-speaking population is no more than 70 million. The script includes 10 different Kannada numerals of the decimal number system, as in Table I.
Kannada characters have a complex structure and are curved in shape. There are a large number of similar character groups (Table II).
The challenging part of Kannada handwritten character recognition is the distinction between similarly shaped components. Sometimes a very small part is the distinguishing mark between two characters or numerals. These small distinguishing parts increase the recognition complexity and decrease the recognition accuracy. The style of writing characters varies widely and they come in various sizes and shapes (Fig. 1). The same character may take different shapes and, conversely, two or more different characters of a script may take a similar shape.
Kannada lacks a standard test bed of character images for OCR performance evaluation. A major obstacle to research on handwritten character recognition of Indian scripts is the non-existence of standard/benchmark databases. From the literature review, it can be seen that most of the experimentation is reported on the basis of small databases collected in laboratory environments. Several standard databases such as NIST, MNIST, CEDAR and CENPARMI are available for Latin numerals [20]. To the best of our knowledge, however, only two such standard databases, namely those for the Bangla and Devanagari scripts, are available in the Indian context until now. Hence we have made an attempt to create a database of our own for the experimentation, with reference to [20].
The handwritten Kannada numeral database consists of two datasets, original and synthetic.Each dataset is randomly divided into respective training and test sets in the ratio of 8:2.
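A random 8:2 division can be sketched as follows. This is only an illustration: the function name `split_8_2`, the fixed seed and the placeholder sample names are assumptions, and the text does not say whether the split was stratified per numeral (this sketch is not).

```python
import random

def split_8_2(samples, seed=0):
    """Shuffle a list of samples and split it 80% / 20%."""
    rng = random.Random(seed)      # fixed seed only so the sketch is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # first 80% for training, rest for testing
    return shuffled[:cut], shuffled[cut:]

# Placeholder for dataset 1: 1000 (image name, numeral label) pairs.
dataset1 = [(f"img_{i:04d}", i % 10) for i in range(1000)]
train, test = split_8_2(dataset1)
print(len(train), len(test))
```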

A. Generation of original dataset
Samples for the database have been collected using plain A4 paper and a tabular form designed for data collection, so that both constrained and unconstrained samples could be collected, as shown in Fig. 2. The only restriction imposed on the writers was to write one numeral per box in the case of the tabular form. These samples have been collected from a wide spectrum of the population across various age groups, including school and college students, housewives and employees. Some of the samples were collected from people with no knowledge of the Kannada language. There was no restriction on the type of pen or the color of the ink used.
The collected documents were scanned at 300 dpi using an HP flatbed scanner and stored in jpg format. The individual numerals were extracted manually from the scanned documents and labeled. The images were not size normalized. Thus 100 different samples of each numeral were created, for a total of 1000 samples. This dataset is considered as dataset 1 for our experimentation.

B. Generation of synthetic dataset
Hand-printed patterns come from different writers and possess great variations. Recognition of hand-printed patterns is difficult when compared to machine-printed patterns. Some factors that complicate the recognition process in hand-printed character recognition in noise-less situations are discussed in [21]. Various strategies are followed in a recognition system in order to reduce the variability caused by slanted writing. Some of them are as follows: 1) the slanted word/character is normalized before recognition; 2) the slant is compensated during the training process by having a dataset covering as many slant angles as possible; 3) a slant-invariant feature extraction method is used.
In order to increase the dataset size while covering as many slant angles as possible, we generated synthetic data. Synthetic data was generated by subjecting the original data to two transformations, namely blurring and rotation, thereby increasing the number of samples by a factor of 10 [20].
In the first step, all the samples in the original dataset were blurred by applying a Gaussian blurring kernel. Thus, the volume of the database was doubled (with and without blurring).
In the next step, both the blurred and original images were rotated by angles of −5°, −10°, +5° and +10°. Thus, the volume of the dataset was increased five times. The total increase in volume was therefore ten times the original number, i.e., for each original sample 9 synthetic images were generated (as shown in Fig. 3), giving a total of ten images per sample and taking the total number of samples to 10,000. This dataset is considered as dataset 2 for our experimentation.
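The blur-then-rotate scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the experiments: the kernel width (`sigma`, `radius`), the nearest-neighbour rotation that keeps the original image size, and all function names are assumptions, since the text specifies only a Gaussian kernel and the four angles.

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=2):
    """1-D Gaussian kernel normalised to sum to 1 (sigma and radius are assumed values)."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter along the rows, then along the columns."""
    k = gaussian_kernel(sigma, radius)
    rows = np.apply_along_axis(np.convolve, 1, img.astype(float), k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def rotate(img, angle_deg, fill=0.0):
    """Rotate about the image centre by inverse mapping with nearest-neighbour sampling."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # For every output pixel, locate the source pixel it is taken from.
    src_x = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    src_y = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(img.shape, fill, dtype=float)
    out[inside] = img[sy[inside], sx[inside]]
    return out

def augment(img, angles=(-10, -5, 5, 10)):
    """Return 10 images: the original, its blurred copy, and four rotations of each."""
    base = [img.astype(float), blur(img)]
    return base + [rotate(b, a) for b in base for a in angles]

# Toy "numeral": a small bright block on a dark background.
img = np.zeros((32, 32))
img[12:20, 14:18] = 1.0
variants = augment(img)
```

Applied to each of the 1000 original samples, `augment` would yield the 10,000 images of dataset 2 (2 blur states × 5 orientations, counting the unrotated one).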

IV. NEED OF FEATURE SELECTION
Feature selection is a well-known problem and has been much explored, especially in the areas of data mining and content-based image retrieval. The problem deals with finding an appropriate subset of features from a given set, for some particular application domain, to improve the accuracy. This involves finding a minimal subset that represents the whole set, or ranking the features of the overall set based on their importance.
Feature selection methods are mainly classified into filter, wrapper and hybrid methods [22]. In the filter approach, the feature set is evaluated at once, independently of any clustering algorithm or classifier. The wrapper method, on the other hand, calls the clustering algorithm or the classifier for each subset evaluation to find the final subset. While the filter method is unbiased and fast, the wrapper method gives better results for a particular clustering algorithm or classifier. The hybrid method is a fusion of the filter and wrapper methods. In this paper, we have used the wrapper-based feature selection method.
V. PROPOSED METHODOLOGY
In this section, the selection of the feature set and the recognition of isolated handwritten numerals using a framework based on the combined concepts of feature fusion and decision fusion are proposed, as shown in Fig. 4. The proposed method consists of three stages. In the first stage, a framework for the selection of a better feature set is proposed; in the second stage, the fused feature vector is selected; and in the last stage, the improvement in accuracy is shown using the decision fusion approach applied to the fused feature vector.

A. Framework for feature selection
The framework designed for the feature selection has various steps described as follows:

1) Preprocessing
Initially the color images were converted to gray scale, and the gray scale images were in turn converted to binary using the global threshold method. Thinning was applied on the binary image. Thinning is an image preprocessing operation performed to make the image crisper by reducing the binary-valued image regions to lines that approximate the skeletons of the regions. Region labeling was then performed on the thinned binary image of the numeral and a minimum bounding rectangle was placed over the numeral. The bounding box image would be of variable size due to the different styles and sizes of the numerals. Hence this image was resized to 256×256 and thinning was applied again. These preprocessed samples are used in the next stage, i.e., feature extraction.
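Part of this pipeline can be illustrated with NumPy alone. The sketch below covers gray-scale conversion, global thresholding, bounding-box cropping and resizing; the two thinning passes are deliberately left out because they need a skeletonisation routine from an image processing library. The channel average, the mean-intensity threshold, the nearest-neighbour resize and all function names are assumptions, not the settings used in the experiments.

```python
import numpy as np

def to_gray(rgb):
    """Average the colour channels (a simple stand-in for a weighted conversion)."""
    return rgb[..., :3].mean(axis=2)

def binarize(gray, threshold=None):
    """Global threshold; ink is assumed darker than paper, so ink pixels become True."""
    if threshold is None:
        threshold = gray.mean()  # assumed rule; an Otsu threshold is a common alternative
    return gray < threshold

def crop_to_bounding_box(mask):
    """Crop to the smallest rectangle containing all ink pixels (mask must not be empty)."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return mask[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def resize_nearest(mask, size=256):
    """Nearest-neighbour resize to size x size."""
    h, w = mask.shape
    r = (np.arange(size) * h) // size
    c = (np.arange(size) * w) // size
    return mask[np.ix_(r, c)]

# Toy scan: white page with a dark 10 x 20 rectangle standing in for a numeral.
page = np.full((40, 60, 3), 255, dtype=np.uint8)
page[10:20, 5:25] = 0
mask = binarize(to_gray(page))
cropped = crop_to_bounding_box(mask)
sample = resize_nearest(cropped)
```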

2) Feature extraction
The feature extraction technique that we have used to extract the features of a numeral is the curvelet transform. We have used the curvelet transform because it extracts features efficiently from images which contain a large number of $C^2$ curves (i.e., images which have a large number of long edges) [23].
We have applied the wrapping-based discrete curvelet transform using Curvelab-2.1.2, a toolbox implementing the Fast Discrete Curvelet Transform, to find the coefficients of every 256×256 image in the database. These coefficients are used as the feature vectors for those images. In this experiment we have used the default orientation and 5 levels of discrete curvelet decomposition. Hence, for an image of size 256×256, curvelet coefficients in five different scales were obtained. Thus we have five different feature sets.

3) Dimensionality reduction using standard deviation
The curvelet coefficients obtained for each sample are numeric. In this implementation, we have chosen wavelets at the finest level of the curvelet transform, because the use of wavelets reduces the redundancy factor [24]. One subband at the coarsest level and one subband at the finest level of curvelet decomposition are obtained after the application of the curvelet transform in Scale 1 to the input. The number of subbands obtained at each level for the other scales of curvelet decomposition is different. The number of coefficients obtained after application of the curvelet transform is very high. Hence, if all the coefficients obtained are used in the feature vector, the size of the feature vector and the time taken for feature vector formation increase drastically. Therefore, to extract the best features and also decrease the size of the feature vector for each sample, we use the standard deviation as the dimension reduction technique [23].
The standard deviations of the coarsest and the finest levels are calculated first using equation (1). Then, we calculate the standard deviation of the first half of the total subbands at each of the remaining scales.
We consider only the first half of the total subbands at a resolution level for feature calculation because the curvelet at angle θ produces the same coefficients as the curvelet at angle (θ+π) in the frequency domain, i.e., these subbands are symmetric in nature. Hence, considering half of the total number of subbands at each scale reduces the total computation time for feature vector formation without loss of image information. For the finest and the coarsest subbands, the calculated standard deviation is used directly in the feature vector, but for the other subbands the sum of the standard deviations is calculated and stored in the feature vector. By applying the standard deviation, the number of features is reduced as shown in Table III.
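For reference, the standard deviation of a subband with coefficients $x_1, \dots, x_n$ is commonly written as below. The $n-1$ normalisation is an assumption (it is the default of Matlab's `std`); the population form with $1/n$ differs only by a constant factor.

```latex
\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}} \qquad (1)
```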
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and $n$ is the number of elements in the sample.
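One plausible reading of this reduction step is sketched below. It assumes the curvelet coefficients are already available as nested lists (`scales[0]` holding the single coarsest subband, `scales[-1]` the single finest one, and each intermediate scale a list of directional subbands), that the coefficients are real, that the sum of standard deviations is taken per scale, and that features are ordered from coarse to fine. None of these details is fixed by the text, and the function names are hypothetical.

```python
from statistics import stdev  # sample standard deviation (n - 1 denominator)

def subband_std(subband):
    """Standard deviation over all coefficients of one subband (a list of rows)."""
    flat = [c for row in subband for c in row]
    return stdev(flat)

def curvelet_std_features(scales):
    """Reduce curvelet coefficients to one value per scale.

    Coarsest and finest subbands contribute their standard deviation directly;
    each intermediate scale contributes the sum of the standard deviations of
    the first half of its subbands (the second half is symmetric).
    """
    features = [subband_std(scales[0][0])]
    for scale in scales[1:-1]:
        half = scale[: len(scale) // 2]
        features.append(sum(subband_std(sb) for sb in half))
    features.append(subband_std(scales[-1][0]))
    return features

# Toy coefficients: 1 coarsest subband, 4 directional subbands, 1 finest subband.
toy_scales = [
    [[[1, 2], [3, 4]]],
    [[[0, 2]], [[1, 1, 1]], [[5, 9]], [[7, 7]]],
    [[[0, 2], [0, 2]]],
]
features = curvelet_std_features(toy_scales)
```

Each image is thus described by one number per scale instead of thousands of raw coefficients.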

4) Classification
The classifier used in the proposed method is the k-nearest neighbor classifier [25]. The nearest neighbor classifier is an efficient technique to use when the classification problem has pattern classes that display a reasonably limited degree of variability. It considers each input pattern given to it and assigns it to a class by calculating the distance between the input pattern and the training patterns. It takes into account only the k nearest prototypes to the input pattern during classification. Here, the city block measure is used as the distance and 'nearest' is used as the rule. The decision is generally based on the majority of the class values of the k nearest neighbors.
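A minimal k-NN sketch with the city block distance is given below for illustration. It is not the Matlab routine used in the experiments; the function names are hypothetical, and ties in the vote are resolved in favour of the class of the closest neighbour, which is an assumed reading of the 'nearest' rule.

```python
from collections import Counter

def cityblock(a, b):
    """City block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k=1, distance=cityblock):
    """Label the query by majority vote among its k nearest training patterns."""
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # Counter keeps insertion order for equal counts, so a tie goes to the
    # class of the nearest neighbour.
    return votes.most_common(1)[0][0]

# Toy training set with two well-separated classes.
train_X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
train_y = [0, 0, 0, 1, 1, 1]
label_a = knn_predict(train_X, train_y, [1, 1], k=3)
label_b = knn_predict(train_X, train_y, [5, 4], k=3)
```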

5) Subset evaluation and selection of final feature set
A subset of features is evaluated based on its recognition accuracy, i.e., its capability for class separability. The subset which gives the highest recognition accuracy for the given dataset is selected as the best feature set for classification. The scale 1 feature (subset 1) has the highest recognition accuracy and is selected as the final feature set.
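The wrapper idea, calling the classifier once per candidate subset and keeping the subset with the highest accuracy, can be sketched as follows. The 1-nearest-neighbour evaluator, the toy one-dimensional data and all names are assumptions made only to keep the example self-contained.

```python
def cityblock(a, b):
    """City block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nn_accuracy(train_X, train_y, test_X, test_y):
    """Accuracy of a 1-nearest-neighbour classifier with city block distance."""
    correct = 0
    for x, y in zip(test_X, test_y):
        nearest = min(range(len(train_X)), key=lambda i: cityblock(train_X[i], x))
        correct += train_y[nearest] == y
    return correct / len(test_X)

def select_best_subset(subsets, train_y, test_y):
    """subsets maps a name to (train_X, test_X) computed from that feature subset."""
    scores = {
        name: nn_accuracy(tr, train_y, te, test_y)
        for name, (tr, te) in subsets.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy data: "scale 1" separates the two classes, "scale 2" does not.
train_y, test_y = [0, 0, 1, 1], [0, 1]
subsets = {
    "scale 1": ([[0.0], [0.1], [1.0], [1.1]], [[0.05], [0.95]]),
    "scale 2": ([[0.5], [0.4], [0.5], [0.6]], [[0.6], [0.4]]),
}
best, scores = select_best_subset(subsets, train_y, test_y)
```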

B. Combining classifiers (decision fusion)
A traditional approach in classification is the use of a single classifier, which assigns a class label to each feature vector describing the image content. The decision functions produced by different classification principles differ from each other, which makes the classification accuracy vary. The feature patterns obtained from real-world images are non-homogeneous, noisy and overlapping, which may cause variations in the decision boundaries of different classifiers. For these reasons, different classifiers may classify the same image in different ways. As features or classifiers of different types are able to complement one another in classification performance, the consensus decision of several classifiers can yield improved performance compared to individual classifiers [26].

1) Need for classifier combination
To improve the accuracy and efficiency of the classification system, a multi classifier system is preferred over a single classifier for reasons such as the following [1,26]:
• In certain applications, the volume of data to be analyzed is too large to be handled by a single classifier. Training a classifier with such a vast amount of data is usually not practical. A multi classifier system is an efficient approach in this case: the data is partitioned into smaller subsets, different classifiers are trained on different subsets, and the outputs are combined.
• A single classifier cannot perform well when the nature of the features is different. Using multiple classifiers, each with a subset of the features, may provide better performance.
• Another reason for combining classifiers is to improve the generalization performance: a classifier may not perform well for a certain input when it is trained with a limited dataset. Finding a single classifier that works well for all test data is difficult. Instead, multiple classifiers can be combined to give a better output than a single classifier. The combination may not necessarily outperform the single best classifier, but on average its accuracy will be better than that of the individual classifiers.

2) Categories of multiple classifier systems
In a multiple classifier system, it is common that there are several base classifiers that are combined using a particular classifier combination strategy. A combination of base classifiers with identical errors does not improve the classification; hence, base classifiers with decorrelated errors are preferred. Consequently, the base classifiers should differ from each other in some manner. This type of classifier combination can be achieved in one of the following ways [1, 26, 27]:
• Variation of the initial parameters of the classifiers: a set of classifiers can be created by varying the initial parameters with which each classifier is trained on the same training data. For example, in K-NN classification, the value of k needs to be selected. By using different parameter values, it is possible to obtain differently behaving classifiers.
• Variations in the architectures: in several kinds of classifiers, the architecture can be selected. For example, the size of the neural networks in the base classifiers can be varied.
• Variations in the feature sets: the base classifiers may use separate feature sets as their inputs. These feature sets may describe different properties of the object to be classified.
Once the base classifiers have been constructed, it is necessary to combine their opinions using some combination strategy. Classifier combination strategies are mainly classified into classifier fusion and classifier selection. In classifier fusion, every classifier is provided with complete information on the feature space, and the outputs from the different classifiers are combined, so that every classifier contributes to the final decision. In classifier selection methods, every classifier is an expert in a specific domain of the feature space and the local expert alone decides the output of the ensemble. Classifier fusion is further categorized based on the output of the classifiers, and classifier selection is classified into dynamic and static classifier selection. Some of the different classifier combination strategies are [26, 27, 28]:
• Strategies based on probabilities: these methods are also known as fixed combining rules. These strategies utilize the fact that the base classifier outputs are not just class numbers, but also include the confidence of the classifier.
• Voting-based strategies: the basic idea behind these methods is to make a consensus decision based on the base classifier opinions using voting. Hence, the class labels provided by the base classifiers are regarded as votes, and the final class is the class that receives the majority or most of the votes. The benefit of these methods is that the decision can be made solely on the basis of the class labels provided by the base classifiers.
• Strategies employing the class labels: in addition to the voting and the probability-based classifier combination methods, various classifier combination methods have been proposed that utilize the base classifier outputs in other ways than voting. In the most common case, these outputs are the class labels given by the base classifiers, though in certain cases other outputs such as probability distributions are employed. Some examples of these methods are class ranking, stacked generalization and error-correcting output codes.
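To make the difference between the first two families concrete, the sketch below contrasts two common fixed combining rules (mean and product of the class confidences) with plurality voting on the labels alone. The confidence values are invented purely for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

def fixed_rule(posteriors, rule="mean"):
    """Combine per-classifier class confidences with a fixed rule; return the winning class index."""
    n_classes = len(posteriors[0])
    if rule == "mean":
        scores = [sum(p[c] for p in posteriors) / len(posteriors) for c in range(n_classes)]
    elif rule == "product":
        scores = [math.prod(p[c] for p in posteriors) for c in range(n_classes)]
    else:
        raise ValueError("unknown rule: " + rule)
    return max(range(n_classes), key=scores.__getitem__)

def plurality_vote(labels):
    """Return the label that received the most votes."""
    return Counter(labels).most_common(1)[0][0]

# Three base classifiers, two classes. The first classifier almost rules out class 0.
posteriors = [[0.01, 0.99], [0.9, 0.1], [0.9, 0.1]]
labels = [max(range(2), key=p.__getitem__) for p in posteriors]

by_mean = fixed_rule(posteriors, "mean")
by_product = fixed_rule(posteriors, "product")
by_vote = plurality_vote(labels)
```

Here the mean rule and the vote both select class 0, while the product rule selects class 1, because a single near-zero confidence acts as a veto under multiplication.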

C. Feature fusion
Once the best feature subset is obtained from the original set, we can use the derived set or derive a new feature based on two or more of the selected features for the task of classification. Based on this concept, there are two existing techniques of feature combination: serial and parallel combination [2].
Feature combination (feature fusion) is the general technique in which two features $f_1$ and $f_2$ are concatenated together [2]. If $m$ and $n$ are the weights of $f_1$ and $f_2$, respectively, then according to serial fusion the combined feature is the concatenation $(m f_1, n f_2)$. In parallel combination, a complex variable is used to combine the two features into a complex feature, and the absolute value of the complex feature is taken as the final feature. Hence, if $m$ and $n$ are the weights of $f_1$ and $f_2$, respectively, the combined feature is set as $|m f_1 + i\,n f_2|$. In these cases, the weights $m$ and $n$ can be decimal or binary values. In the latter case, 0 as the weight of a particular feature denotes that the feature is discarded, while 1 denotes that the feature is selected for the final subset of features [1].
To improve the accuracy we have proposed a method based on feature fusion. Here we have used serial feature combination, where the weights of the features are taken as 1. Features from the selected feature subset are serially combined with the features of the other extracted feature subsets to form a union vector. For example, the selected feature set 1 is combined with feature subset 2 to form the union vector (1,2). Similarly, we have obtained four such union vectors: (1,2), (1,3), (1,4) and (1,5).
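Serial and parallel combination can be sketched as below. The names `f1`, `f2`, `serial_fusion` and `parallel_fusion` are hypothetical, the feature values are invented, and zero-padding the shorter vector in the parallel case is an assumption about how features of unequal length are handled.

```python
def serial_fusion(f1, f2, m=1.0, n=1.0):
    """Concatenate the weighted feature vectors into one longer vector."""
    return [m * x for x in f1] + [n * y for y in f2]

def parallel_fusion(f1, f2, m=1.0, n=1.0):
    """Combine element-wise as m*f1 + i*n*f2 and keep the absolute value."""
    size = max(len(f1), len(f2))
    a = list(f1) + [0.0] * (size - len(f1))  # zero-pad the shorter vector (assumed)
    b = list(f2) + [0.0] * (size - len(f2))
    return [abs(complex(m * x, n * y)) for x, y in zip(a, b)]

# Union vectors (1,2) ... (1,5): the selected set 1 serially fused with each other set.
feature_sets = {1: [0.5, 1.5], 2: [2.0], 3: [3.0, 3.5], 4: [4.0], 5: [5.0, 5.5]}
union_vectors = {
    (1, k): serial_fusion(feature_sets[1], feature_sets[k]) for k in (2, 3, 4, 5)
}
```

With both weights equal to 1, serial fusion reduces to plain concatenation, which is the form used for the union vectors above.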

VI. RESULTS AND DISCUSSIONS
The experiments were carried out in Matlab 7.5.0 on a 64-bit 2.67 GHz Intel i5 processor with 4 GB RAM. The curvelet transformation was done using the CurveLab 2.1.2 toolbox, available from http://www.curvelet.org. The morphological operations were performed using Matlab's Image Processing Toolbox.
The curvelet transform is used to extract the features from the numeral samples in dataset 1. All five subsets of extracted features (scale 1, scale 2, …, scale 5) are applied to this dataset and classified. The recognition accuracy is calculated for each of the scales, i.e., for each feature subset (as shown in Fig. 5). The feature subset with the highest recognition accuracy is considered as the final or selected feature set for the next level of our methodology.
In our case scale 1 (subset 1) has the highest recognition accuracy, 91%, and is selected as the final feature set. The selected feature set (scale 1 features) from the feature selection framework is tested on the two datasets, with 1000 and 10,000 samples respectively, using the K-nearest neighbor classifier with city block as the distance measure. We obtained a recognition accuracy of 91% for dataset 1 and 65.65% for dataset 2, showing that for any pattern classification task an increase in data size affects the performance of the classifier [1].
To improve the accuracy we have proposed a method based on the concepts of decision fusion and feature fusion.
Here we have built a classifier combination based on the variation of the initial parameters of the classifier. The K-NN classifier is used to build the multi classifier system. A set of differently behaving classifiers can be created by varying initial parameters of the K-NN classifier such as the value of k and the distance measure.
For our experimentation we have varied the distance measure of the K-NN classifier to build the multi classifier system. We have applied four distance measures (Euclidean, city block, cosine and correlation) to build the base classifiers. The feature vector selected as the best by the feature selection process, that is, the scale 1 features (subset 1), was given to K-NN classifiers with different distance measures. The results of all these classifiers were combined and a vote was taken to find the class to which the sample was assigned the maximum number of times; this was considered as the class to which the sample belonged (plurality voting).
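The multi classifier system described above can be sketched as follows. This is an illustration under stated assumptions, not the Matlab code used in the experiments: each base classifier is a 1-nearest-neighbour rule, the distance functions are written out by hand, ties in the vote go to the label produced by the earlier distance in the `DISTANCES` dictionary, and all names and data are hypothetical. The cosine and correlation distances are undefined for zero-norm or constant vectors, which the sketch does not guard against.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cityblock(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    """One minus the cosine of the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def correlation(a, b):
    """Cosine distance between the mean-centred vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

DISTANCES = {
    "euclidean": euclidean,
    "cityblock": cityblock,
    "cosine": cosine,
    "correlation": correlation,
}

def nearest_label(train_X, train_y, query, distance):
    """Label of the single nearest training pattern under the given distance."""
    best = min(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    return train_y[best]

def ensemble_predict(train_X, train_y, query):
    """Plurality vote over one base classifier per distance measure."""
    votes = [nearest_label(train_X, train_y, query, d) for d in DISTANCES.values()]
    return Counter(votes).most_common(1)[0][0], votes

# Toy feature vectors for two classes.
train_X = [[1, 2, 3.5], [1.2, 2.1, 3.9], [9, 7, 2], [8.5, 7.5, 1]]
train_y = ["a", "a", "b", "b"]
label_1, votes_1 = ensemble_predict(train_X, train_y, [1.1, 2.0, 3.6])
label_2, votes_2 = ensemble_predict(train_X, train_y, [9, 7.2, 1.5])
```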
The multi classifier system was tested on dataset 2, and the results show an increase in the accuracy for this dataset.
An increase in the recognition accuracy is seen from the experimental results, as shown in Table IV. Next, a method based on feature fusion was used. Features from the selected feature set are combined with the features of one of the other subsets to form a union vector. In our case we have obtained four such union vectors: (1,2), (1,3), (1,4) and (1,5). Using these union vectors, the experiment was repeated on dataset 2 (the synthetic dataset). The recognition accuracy for each of these union vectors is shown in Fig. 6.
Another important observation is that when a combination of scales is used for classification, the recognition rates are better for dataset 2, and the rate is best (78.45%), with a 0% rejection rate, when scales 1 and 5 are used together. One possible reason for this result is the use of image information from partitions of different sizes and from several scales. An increase in the recognition accuracy is seen from the experimental results, as shown in Table IV.
Finally, we combined the decision fusion and feature fusion concepts into a new framework. First we obtained the fused feature set which gave the highest recognition rate, and this fused feature set was given to K-NN classifiers with different distance measures.
The results of all these classifiers were combined and a vote was taken to find the class to which the sample was assigned the maximum number of times; this was considered as the class to which the sample belonged (plurality voting). The results show an increase in the recognition accuracy, as shown in Fig. 7.
From the experimental results (Table V), we observe that the average time required for recognition is very small, on the order of seconds, and does not affect the efficiency of the proposed method. This can be attributed to the fact that the entire coefficient set obtained is reduced using the standard deviation, which reduces the dimensionality of the feature vector and hence the time taken for recognition.

VII. CONCLUSION
In practical classification applications, if the number of classes and multiple feature sets for pattern samples are given, a desirable recognition performance can be achieved based on these sets of features using data fusion. Data fusion is an ever-growing field with a wide scope for interdisciplinary research across computer science, mathematics, statistics and machine learning. In this paper, we have proposed a framework based on the combined concepts of decision fusion and feature fusion for the classification of isolated handwritten Kannada numerals. The proposed method improves the classification result. The experimental results show an increase of 13.95% in the recognition accuracy.

Fig. 1. Samples showing the style of writing characters with different sizes and shapes

Fig. 4. A framework for the improvement of the recognition accuracy based on the combined concepts of feature fusion and decision fusion.
• Preprocessing
• Feature extraction
• Dimensionality reduction using standard deviation
• Classification
• Subset evaluation and selection of final feature set

• Variations of the training dataset of the classifiers: multi classifier systems can be built by training the same classifier with different training datasets. The type of training in the two-level scheme can be either training the individual classifiers and applying fusion, or training the individual classifiers followed by training the fusion.
• Variations in the number of individual classifiers used: training different types of classifiers like SVM, ANN, etc., with the same training dataset.

TABLE III. THE REDUCTION IN THE NUMBER OF FEATURES

TABLE IV. IMPROVEMENT IN RECOGNITION ACCURACY (%) USING OUR PROPOSED FRAMEWORK

Dataset | Recognition accuracy (%): Before fusion | After decision fusion only | After feature fusion only | After feature and decision fusion

Fig. 6. Recognition accuracy of different fused feature sets on dataset 2

TABLE V. AVERAGE RECOGNITION TIME