An Improved K-Nearest Neighbor Algorithm for Pattern Classification

Abstract—This paper proposes a "Locally Adaptive K-Nearest Neighbor (LAKNN) algorithm" for pattern classification problems that mitigates the curse of dimensionality. To compute neighborhoods, local linear discriminant analysis provides an effective metric that determines the local decision boundaries from centroid information. KNN is a widely used approach in many classification problems of data mining and machine learning. KNN assumes locally constant class conditional probabilities for an unfamiliar pattern; with limited training data in a high-dimensional feature space this assumption becomes invalid because of the curse of dimensionality. To normalize feature values measured on dissimilar scales, KNN uses the standard Euclidean distance, which can be misled when finding a proper subset of nearest points of the pattern to be predicted. To overcome the effect of high dimensionality, LAKNN uses a new variant of the standard Euclidean distance metric: a flexible metric for computing neighborhoods is estimated based on Chi-squared distance analysis. The Chi-squared metric ascertains the most significant features when finding the k-closest points among the training patterns. This paper also shows that LAKNN outperforms four other models of KNN and other machine-learning algorithms in both training and accuracy.


I. INTRODUCTION
The nearest neighbor classifier is one of the simplest, oldest and most widely used methods for classification. It classifies an unidentified pattern by choosing the closest example in the training set, as measured by a distance metric. It is one of the most common instance-based learning methods. Simplicity, transparency and fast training time are the advantages of this algorithm. Instances of nearest neighbor are denoted as points in a Euclidean space. It is a conceptual method that can be used to approximate real-valued or discrete-valued target functions. The k nearest neighbor algorithm is best suited for small data sets with few features. The algorithm assumes a close relationship between similar things; in other words, an object is considered to belong to the class of its most similar neighbors. For example, if a mango's appearance is more similar to an apple, an orange and a guava (fruits) than to a horse, a dog and a cat (animals), then most likely the mango is a fruit.
In a pattern recognition problem, a feature vector x = (x_1, ..., x_q) ∈ R^q is considered as an object belonging to one of J classes, and the goal is to form a classifier that allots x to the correct class from a given set of N training samples. The simplest and most appealing approach to this problem is K Nearest Neighbor (KNN) [1][2] classification. Rather than fixed data points, this method works on continuous and overlapping neighborhoods [3]. It uses a different neighborhood for each single query, so that all points in the neighborhood are as close to the query as possible [4][5][6]. KNN uses the straight Euclidean distance to discover the k closest points to the query point [7][8][9][10]. This can give a genuinely less important feature more influence than others when classifying a pattern, and can misclassify the pattern when the feature values are measured on dissimilar scales [11][12]. The effect is severe for training sets with a high-dimensional feature space [13]. Several biases are introduced in KNN for a high-dimensional input feature space with limited samples [14].
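For reference, the following is a minimal sketch of the plain KNN baseline described above, using the straight Euclidean distance; the function name and variables are illustrative and are not taken from any cited implementation.

```python
# Minimal sketch of the standard KNN baseline with straight Euclidean distance.
# All names are illustrative; X_train and y_train are assumed to be NumPy arrays.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))  # straight Euclidean distance
    nearest = np.argsort(dists)[:k]                         # indices of the k closest points
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]
```

With dissimilarly scaled features, the ranking of raw distances is dominated by the features with the largest numeric range, which is exactly the problem the variance-normalized metric of Section III is meant to address.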
A modified metric based on the standard Euclidean distance is proposed here, which uses the variance of each feature so that all dissimilarly scaled feature values have an identical influence on the decision [15]. The distance is further weighted with a chi-squared metric that discovers the most relevant features when finding the k closest points to the pattern under consideration in the training space [16].
A locally adaptive form of nearest neighbor classification (LANN) is proposed here to mitigate the curse of dimensionality [17]. An effective metric is used to compute neighborhoods: it determines the local decision boundaries from centroid information, then shrinks neighborhoods in directions orthogonal to these local decision boundaries and extends them parallel to the boundaries [18][19][20].
To give all features equal influence on the pattern classification, a variance-based Euclidean distance metric is used in the proposed algorithm instead of the straight Euclidean distance metric. The variance of each feature is calculated during training. Fig. 1 shows an example. There are two classes, and the data of both classes are produced from a bivariate standard normal distribution. The radius of class one data is less than or equal to 1.15, while the radius of class two is greater than 1.15; as a result, class one is surrounded by class two. Fig. 1(a) shows the nearest neighborhood of size 50 of a query located at (0, -1) near the class boundary, computed using the Euclidean distance metric. Fig. 1(b) displays the neighborhood of the same size computed by the adaptive nearest neighbor classification algorithm: the amended neighborhood is elongated along the direction of the true decision boundary and constricted along the direction orthogonal to it, which is the most relevant direction for the given query. This paper proposes an algorithm that can be used in many practical applications of pattern recognition in machine learning for pattern classification tasks. It has been compared experimentally with KNN, DANN and C4.5 in a large number of artificial and natural learning domains. The experimental results show that the use of the variance-based Euclidean distance metric and the FRW removes the problem of assuming constant class conditional probabilities in KNN and improves the performance of KNN.
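The synthetic setting of Fig. 1 can be reproduced with a short sketch along the following lines; the sample size, random seed and variable names are assumptions made here for illustration only.

```python
# Sketch of the Fig. 1 setting: bivariate standard normal samples, class 1 if
# the radius is <= 1.15, class 2 otherwise; then the 50-nearest Euclidean
# neighborhood of the query (0, -1). Sample size and seed are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))           # bivariate standard normal samples
radius = np.linalg.norm(X, axis=1)
y = np.where(radius <= 1.15, 1, 2)           # class 1 inside the circle, class 2 outside

query = np.array([0.0, -1.0])
dists = np.linalg.norm(X - query, axis=1)    # straight Euclidean distances to the query
neighborhood = np.argsort(dists)[:50]        # indices of the 50 nearest points
print(np.bincount(y[neighborhood]))          # class composition of the neighborhood
```

Plotting X[neighborhood] would reproduce the roughly circular region of Fig. 1(a); the adaptive metric of Section III would instead stretch this region along the decision boundary and shrink it in the orthogonal direction.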

II. LITERATURE REVIEW
Locally adaptive KNN algorithms select the value of k that should be used to classify a query by assessing the outcomes of cross-validation calculations in the local neighborhood of the query [21][22]. Local KNN procedures have been shown to perform comparably to KNN in experiments with twelve commonly used data sets.
Deepti et al. [ ] ... Armand et al. [25] proposed a metaheuristic search algorithm named Simulated Annealing to choose an optimal k, thus eliminating the need for an exhaustive search for the optimal k. The result is compared with four different classification methods, showing a substantial improvement in computational efficiency compared to the KNN methods.
D. Maruthi et al. [26] introduced an effective classification system for MRI brain tumors that assigns a grade to brain tumor images. The images are classified using the adaptive k nearest neighbor classifier. The classification and segmentation methods are evaluated by accuracy, sensitivity and specificity.
Jieying et al. [27] proposed a precise image interpolation with adaptive KNN searching on the input image patches and a nonlinear mapping between low-resolution and high-resolution image patches.
Jianping et al. [28] offered a local mean representation based k nearest neighbor classifier (LMRKNN) to increase classification performance and overcome the primary issues of KNN classification. They used the UCI and KEEL databases and three other common databases to compare LMRKNN with KNN-based methods; the results show that LMRKNN significantly outperforms the KNN-based methods. The previous works on K-Nearest Neighbor algorithms for pattern classification discussed above are summarized in Table I. Apart from these, to the best of our knowledge no similar work exists on this topic. Our primary focus is to propose an algorithm that can be used in many practical applications of pattern recognition in machine learning for pattern classification tasks. It has been compared experimentally with KNN, DANN and C4.5 in a large number of artificial and natural learning domains, but no existing work compares these methods across the AI and NLP domains. Furthermore, as far as we know, no previous study has combined a variance-based Euclidean distance metric with FRW, which removes the problem of assuming constant class conditional probabilities in KNN and improves the performance of KNN.

III. METHODOLOGY
LANN has three main components: a variance-based Euclidean distance metric, a Feature Relevance Weight (FRW), and selection of the best K value using the majority voting scheme [12][13]. LANN uses the variance-based Euclidean distance metric to find the nearest neighbors of a query point in the training space, and the class is then assigned by the majority class of those neighbors. The component of each feature in the distance is normalized using its variance. While finding the nearest points, the distance component of each feature is weighted with a chi-squared distance metric to work out the most relevant features.
TABLE I. SUMMARY OF RELATED WORK

[25] Proposed a metaheuristic search algorithm to choose an optimal K value and eliminate the need for an exhaustive search. Methods: KNN, adaptive algorithms, parameter optimization. Limitation: the adaptive KNN method cannot achieve good performance.
[26] Introduced an effective classification system to classify MRI brain tumors. Methods: AKNN, median filter. Limitation: does not address the computational complexity of the optimization.
[27] Proposed an accurate image interpolation with adaptive KNN searching and nonlinear regression. Methods: AKNN. Limitation: does not explore deep learning models.
[28] Proposed a k nearest neighbor classifier based on local mean representation.
[29] Introduced a method named density based adaptive k nearest neighbor.

The main steps of the algorithm and the working procedure are as follows (Fig. 3):

Step-1: Start several Leave-One-Out tests (test index "T") from a single neighbor (T=1) to a threshold value (T=10). For each Leave-One-Out test, each example in the training space is classified according to Steps 2 to 7.
Step-2: For each test point x_0 in the training space in each Leave-One-Out test (query point index "j" for each "T" value; given input parameters K_0, K_1, K_2, L), initialize the feature relevance weight w_i to 1 for each feature component in the Euclidean distance measure of Equation (1).
D(x, y) = sqrt( Σ_{i=1}^{q} w_i (x_i − y_i)^2 / σ_i^2 ),   with σ_i^2 = (1/N) Σ_{n=1}^{N} (x_{ni} − x̄_i)^2    (1)

where x̄_i is the mean value of the i-th feature, q is the number of features of each point, and σ_i^2 is the variance of the i-th feature computed on the training data set. x and y are two data points, D(x, y) is the distance between them, and x_i and y_i are the i-th feature values of x and y, respectively. Equation (1) measures the Euclidean distance with each feature component normalized by the variance of that feature in the training data set; w_i is the feature relevance weight of each feature.
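As a hedged sketch, the variance-normalized weighted distance of Equation (1), under the form reconstructed above, could be computed as follows; the function names are illustrative.

```python
# Sketch of Equation (1): Euclidean distance with each feature component
# normalized by its training-set variance and scaled by its relevance weight.
# Names are illustrative; the exact published form of Eq. (1) is assumed here.
import numpy as np

def feature_variances(X_train):
    """Per-feature variance computed once on the training data."""
    return X_train.var(axis=0) + 1e-12            # small constant guards against zero variance

def weighted_distance(x, y, var, w):
    """D(x, y) = sqrt(sum_i w_i * (x_i - y_i)^2 / var_i)."""
    return np.sqrt(np.sum(w * (x - y) ** 2 / var))
```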
Step-3: Compute the K_0 nearest neighbors of x_0 by means of the variance-based weighted Euclidean distance metric of Equation (1), with w_i = 1.
where N(x_0) denotes the neighborhood of x_0 holding the K_0 nearest training points. r_i(x_0) denotes the capability of feature i to predict Pr(j|z) at x_i = z_i: the nearer Pr(j | x_i = z_i) is to Pr(j|z), the more information feature i carries for predicting the class posterior probabilities locally at z. Pr(j | x_i = z_i) is the conditional expectation of p(j|x) given that x_i assumes the value z_i, where x_i represents the i-th feature of x. Pr(j|z) and Pr(j | x_i = z_i) are estimated as follows:

Pr(j|z) = Σ_{x ∈ N_1(z)} 1(y(x) = j) / K_1

Pr(j | x_i = z_i) = Σ_{x ∈ N_2(z)} 1(|x_i − z_i| ≤ Δ_i) 1(y(x) = j) / Σ_{x ∈ N_2(z)} 1(|x_i − z_i| ≤ Δ_i)

Here 1(.) is the indicator function, which returns 1 if its argument is true and 0 otherwise. N_1(z) is the neighborhood centered at z containing the K_1 nearest training points, and N_2(z) is the neighborhood centered at z containing the K_2 nearest training points; the interval |x_i − z_i| ≤ Δ_i on feature i is selected so that it contains a fixed number L of points of N_2(z).

Step-5: Update the Feature Relevance Weight (FRW) w_i according to Equations (8) and (9). The FRW is calculated from R_i(x_0), which is derived from the relevance measure r_i over the neighborhood of x_0, with t = 1, 2, where t = 2 gives a quadratic weighting scheme. In all our experiments we obtained optimal values for the input parameters K_1 = 5, K_2 = 10% of N and K_0 = 15% of N; L is set to half of K_2.
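Since Equations (2)-(9) are not reproduced here, the following sketch only illustrates the kind of computation described in Steps 4-5: empirical class posteriors over N_1(z), a chi-squared-style comparison restricted to an interval of L points on each feature over N_2(z), and a weighting of the form w_i = R_i^t / Σ_l R_l^t. The exact relevance measure and the derivation of R_i from r_i are assumptions made for illustration.

```python
# Hedged sketch of the local feature-relevance computation (Steps 4-5).
# The chi-squared-style relevance and the normalization w_i = R_i^t / sum_l R_l^t
# are assumed forms, not the paper's exact Equations (2)-(9).
import numpy as np

def class_posteriors(neigh_labels, classes):
    """Pr(j|z) estimated as the fraction of each class inside a neighborhood."""
    return np.array([(neigh_labels == c).mean() for c in classes])

def feature_relevance(X2, y2, z, L, classes, p_z):
    """For each feature i, estimate Pr(j | x_i = z_i) over the L points of N_2(z)
    whose i-th feature is closest to z_i, and compare it with Pr(j|z)."""
    q = X2.shape[1]
    r = np.zeros(q)
    for i in range(q):
        near_i = np.argsort(np.abs(X2[:, i] - z[i]))[:L]   # interval of L points on feature i
        p_cond = class_posteriors(y2[near_i], classes)
        r[i] = np.sum((p_z - p_cond) ** 2 / (p_cond + 1e-12))
    return r

def update_frw(R, t=2):
    """Feature Relevance Weight: w_i = R_i^t / sum_l R_l^t (t = 2: quadratic scheme)."""
    Rt = R ** t
    return Rt / (Rt.sum() + 1e-12)
```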
Step-6: Iterate Steps 2 to 5 again; at this point each feature has an FRW value.
Step-7: Using Steps 2 to 6, an FRW for each feature is obtained.
Step-8: All examples in the training space are classified following Steps 2 to 7.
Step-9: The error rate is calculated for the T-th Leave-One-Out test.
Step-10: All (T=10) Leave-One-Out tests are completed and the error rate is recorded for each test. The test with the minimum error rate gives the best k-value for the training data set.
Step-11: Using the best k-value, classify any query point following Steps 2 to 7.
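Putting the steps together, a condensed sketch of the overall flow (Steps 1-11) might look as follows; it reuses the illustrative helpers sketched earlier (feature_variances, weighted_distance) and takes the per-query FRW computation as a callable, so it is an outline under stated assumptions rather than the authors' implementation.

```python
# Condensed sketch of the LANN flow: Leave-One-Out tests over T = 1..10 select
# the best neighborhood size, which is then used to classify new queries.
# Depends on the illustrative feature_variances / weighted_distance sketches above.
import numpy as np
from collections import Counter

def lann_neighbors(X, var, w, x0, k):
    d = np.array([weighted_distance(x0, x, var, w) for x in X])
    return np.argsort(d)[:k]

def choose_best_k(X, y, compute_frw, t_max=10):
    """Leave-One-Out over T = 1..t_max; the T with the lowest error rate wins."""
    var = feature_variances(X)          # variance computed once on all data, for simplicity
    best_t, best_err = 1, np.inf
    for t in range(1, t_max + 1):
        errors = 0
        for j in range(len(X)):
            mask = np.arange(len(X)) != j                 # leave example j out
            w = compute_frw(X[mask], y[mask], X[j])       # Steps 2-6: FRW for this query
            nn = lann_neighbors(X[mask], var, w, X[j], t)
            pred = Counter(y[mask][nn]).most_common(1)[0][0]
            errors += int(pred != y[j])
        err = 100.0 * errors / len(X)
        if err < best_err:
            best_t, best_err = t, err
    return best_t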
The algorithm of LANN appears to be complex, but the core of LANN is the application of three main components: the variance-based Euclidean distance metric, the feature relevance weight, and the choice of the best k-value. The concluding part of the pseudocode reads:

      end for
    end for
    Compute the weighted distance of x_j from D using the new FRW w_i by Equation (1)   // Label-2
    E = Choose "T" neighbors from D
    Apply majority voting on E and classify x_j
  end for
  Calculate the error rate for test "T"
end for
K = Choose the best T with the lowest error rate
Compute an FRW for x_0 following the steps from Label-1 to Label-2
F = Choose "K" neighbors from D_training − x_0
Class C = Apply majority voting on "F"
Return "C"

IV. EXPERIMENTAL RESULTS
The Vowel, Sonar, Hepatitis, Wine, Segmentation, Lymphography, Liver-Disorder and Lung-Cancer data are taken from the UCI Machine Learning Repository [4]. For all the datasets we perform a Leave-One-Out test to measure performance (Table II). Table III depicts the Leave-One-Out error rates for the four methods under consideration on the twelve real-world data sets.
The above table shows error rates (%) for different K-values. Column 1 of Table III shows that the minimum error rate is 2.43 for K=4 on the Breast Cancer dataset. Column 2 shows a minimum error rate of 3.33 for K=2 on the Diabetes dataset. The minimum error rate for the Iris dataset, shown in column 3, is 3.33, so the best K-value is 6. The minimum error rate for the Glass dataset is 24.76 for K=4 (column 4), and 9.13 is the minimum error rate for K=4 on the Sonar dataset (column 5). For K=2 a minimum error rate of 0.56 is found for the Vowel dataset (column 6). Column 7 shows the minimum error rate of the Hepatitis dataset, which is 21.33 for K=2. The minimum error rates of the Wine, Segment, Lymphography, Liver-Disorder and Lung-Cancer datasets are 1.68 for K=2, 1.63 for K=4, 8.10 for K=2, 22.31 for K=4 and 37.5 for K=4, shown in columns 8, 9, 10, 11 and 12, respectively. After completion of all Leave-One-Out tests we calculate the error rate of LANN as

Error rate (%) = (number of misclassified examples / total number of examples) × 100.

Fig. 9 compares the error rates of the four algorithms.

V. DISCUSSIONS
Pattern classification basically consists of two parts. In the first part, an algorithm creates a feature vector from a given image; in the second part, these features are used to train a machine to classify an unknown pattern.
These two parts are not completely independent: machine learning algorithms may benefit from knowing how the features are extracted from an image, and feature extraction may be more fruitful if the type of machine learning algorithm is known. However, a limitation of this paper is that it only explores the second part. That is, this work emphasizes building a system that can classify an unknown image or pattern by machine learning from a given database in which every example has already been reduced to a feature vector by an image processing algorithm. For example, the Segment dataset used in this work is an image classification problem. After applying the proposed algorithm (LANN) to the Segment dataset, the classification error rate is observed to be 1.6%, whereas the error rates for C4.5, DANN and KNN are 3.7%, 2.5% and 3.6%, respectively. This indicates that LANN performs better than the other existing algorithms on image classification problems (Fig. 17).

VI. CONCLUSION
LANN presents a new variant of the nearest neighbor method to classify patterns effectively. To produce neighborhoods, it uses a flexible metric that elongates them along the less relevant feature dimensions and constricts them along the most influential ones. With this technique, the class conditional probabilities tend to be more homogeneous in the modified neighborhoods. The experimental results clearly show that LANN can potentially improve the performance of K-NN and recursive partitioning methods in some classification problems. The results are also in favor of LANN over other methods such as C4.5 and DANN.