Supervised Hyperspectral Image Classification using SVM and Linear Discriminant Analysis

Hyperspectral images are used to recognize and identify objects on the earth's surface. Such an image contains a large number of spectral bands, which makes classification a difficult task. The problems caused by the high number of spectral dimensions are addressed through feature extraction and reduction. However, accuracy and computational time remain important challenges in the classification of hyperspectral images. Hence, in this paper, a supervised method is developed to classify hyperspectral images using a support vector machine (SVM) and linear discriminant analysis (LDA). In this work, the spectral features of the image are extracted and reduced using LDA. These spectral features are then classified using an SVM with an RBF kernel into classes such as buildings, vegetation fields, etc. The simulation results show that the SVM algorithm combined with LDA achieves good accuracy with less computational time. Furthermore, the classification accuracy is enhanced by incorporating spatial features using edge-preserving filters.
Keywords: Linear discriminant analysis; support vector machine; guided image filtering; bilateral filter


I. INTRODUCTION
Hyperspectral image classification helps in identifying the various objects present on the earth's surface. Important applications of this classification include urban planning and development, flood monitoring, forest management, resource monitoring, monitoring of vegetation fields, etc. A hyperspectral image is captured at many different wavelengths and therefore contains a large number of spectral bands. Classification of a hyperspectral image is based on the spectral signatures of the various materials. A spectral signature describes the light reflected and absorbed by a material at different wavelengths of the electromagnetic spectrum. Various materials on the earth's surface, such as vegetation, buildings, roads, etc., have different reflection and absorption at different wavelengths. These spectral signatures are measured using a spectrometer. The number of spectral signature values across the wavelength bands determines the spectral dimension of the hyperspectral image. High spectral dimensionality in classification leads to the Hughes phenomenon [1]: for a fixed number of training samples, the classification accuracy decreases as the number of spectral bands increases. To overcome this problem, researchers have developed various band reduction techniques to reduce the number of spectral bands in hyperspectral images.
Principal component analysis (PCA) [2] is an unsupervised band reduction technique that is widely used in data analysis for pattern classification. It is designed to minimize the mean square error between the original data and its reduced-dimension representation. However, PCA does not maximize or minimize any metric related to automatic target recognition. The first problem in pattern classification is encountered when the training set is large compared with the dimensionality of the feature space considered for classification and the feature space contains very little redundant information. Second, it is difficult to model the patterns from very few training samples, and the covariance matrix becomes rank-deficient when the features are highly redundant. Experimental evidence for hyperspectral images shows that PCA-based dimensionality reduction is not very effective for small-sample-size problems, in which the number of training pixels is less than the dimensionality of the feature space. On the other hand, techniques such as the linear discriminant analysis (LDA) transformation have been developed to address the small-sample-size problem. LDA is designed to maximize the between-class scatter and minimize the within-class scatter.
Hyperspectral image classification is divided into supervised and unsupervised classification. Supervised classification can achieve higher classification accuracy than unsupervised classification because of the learning phase that uses training samples. In the literature, various supervised classifiers based on statistical parameters have been developed for the classification of hyperspectral images, such as artificial neural networks [3], minimum distance [4], k-nearest neighbor [5], parallelepiped, AdaBoost [6], Mahalanobis distance [7], Gaussian maximum likelihood [8], decision trees [9], logistic regression [10], and convolutional neural networks [11].
The methods mentioned above are pixel-wise classifiers and do not provide very good classification results because of the complexity of hyperspectral images. The number of training samples in the dataset is very limited compared to the number of spectral bands in the image, which decreases classification accuracy. Moreover, hyperspectral bands are highly correlated, which introduces a certain kind of noise between the bands. However, certain techniques have been adopted to overcome the difficulties in classifying hyperspectral data. One approach is to develop a spectral-spatial classifier, which combines the pixel-wise classifier with the spatial information of the image and results in more accurate classification.
www.ijacsa.thesai.org
SVM [12] is one of the most powerful classifiers for hyperspectral images, as it has good accuracy compared to the other classifiers mentioned above. To achieve higher accuracy in SVM classification, several variations of SVM have been proposed, including transductive SVM and SVM with composite kernels. Both labeled and unlabeled samples are exploited in the transductive SVM. The spectral features are directly incorporated into the SVM kernels. SVM is a pixel-based classifier that uses only the spectral features for classification and does not include the spatial information of the image. To overcome this lack of spatial features in the SVM, many researchers have tried to enhance the reliability of classification by incorporating both spatial and spectral features [13] using image fusion techniques [14] and edge-preserving filters.
The guided image filter [15] and the bilateral filter are edge-preserving filters that add spatial features to the SVM algorithm. These edge-preserving filters improve the accuracy of the SVM algorithm by correcting the misclassified samples in the hyperspectral image.
In this paper, the hyperspectral image is classified using SVM and LDA. LDA is used as a band reduction technique and to extract the spectral features of the image. These spectral features contain the reflected and absorbed light information of the earth's surface. Furthermore, to improve the performance of SVM classification, two edge-preserving filters, the guided image filter and the bilateral filter, are used as post-processing. Both filters use the first component of the linear discriminant analysis as the guidance image. This paper is organized as follows: Section II presents the feature extraction and reduction of the image based on LDA. Section III presents the classification of images with the help of SVM and the RBF kernel using spectral features. Section IV presents the edge-preserving filters, namely the guided image filter and the bilateral filter, which enhance the classification accuracy by incorporating spatial features. Section V discusses the flowcharts and implementation of hyperspectral classification. The results and analysis for the Indian Pines image are discussed in Section VI, and the conclusion is given in Section VII.
II. LINEAR DISCRIMINANT ANALYSIS
LDA is a supervised dimensionality reduction method used for hyperspectral images. It reduces the number of spectral bands in the image and extracts spectral features for classification. The technique converts the high-dimensional spectral feature space into a low-dimensional spectral feature space using the known class labels in the image. This enhances class separation, helps to avoid overfitting by reducing the parameter estimation error, and reduces the computation time of the classification. Even though the spectral bands are reduced, the information required for classification is retained in a very small number of linear discriminant components.
The steps followed in computing the LDA for feature reduction are:
1) Calculate the mean spectral signature of each class in the hyperspectral image.
2) Obtain the within-class and between-class scatter matrices of the data.
The within-class scatter matrix is calculated using equation (1):
$$S_W = \sum_{i=1}^{c} S_i \qquad (1)$$
where c is the total number of classes in the hyperspectral dataset and
$$S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T, \qquad m_i = \frac{1}{n_i} \sum_{x \in D_i} x \qquad (2)$$
where x is a spectral value (the spectral vector of a pixel), $D_i$ is the set of samples in class i, $m_i$ is the mean of class i, and $n_i$ is the number of spectral values in class i.
The between-class scatter matrix is calculated using equation (3):
$$S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T \qquad (3)$$
where m is the overall mean of all samples.
3) Calculate the eigenvectors and eigenvalues of the scatter matrices (in the standard formulation, of $S_W^{-1} S_B$).
4) Arrange the eigenvalues and their eigenvectors in decreasing order of eigenvalue. Then form the matrix W, of dimension d × k, by selecting the k eigenvectors with the highest eigenvalues.
5) Create the new sample subspace using the d × k eigenvector matrix. The transformation is a matrix multiplication, Y = X × W, where X contains the n samples (an n × d matrix) and Y contains the newly transformed subspace.
The new sample subspace contains the linear discriminant component values, which are used for the classification of hyperspectral images.
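The five steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the Matlab implementation used in this work: the function name `lda_fit`, the synthetic data, and the use of a pseudo-inverse (to guard against a singular within-class scatter matrix) are assumptions made for the example.

```python
import numpy as np

def lda_fit(X, y, k):
    """Return the d x k projection matrix W (hypothetical helper).

    X: (n, d) array of spectral signatures, y: (n,) integer class labels.
    """
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)                 # step 1: class mean
        centered = X_c - mean_c
        S_w += centered.T @ centered              # step 2: within-class
        diff = (mean_c - overall_mean)[:, None]
        S_b += X_c.shape[0] * (diff @ diff.T)     # step 2: between-class
    # Step 3: eigen-decomposition of pinv(S_w) @ S_b (pinv is an assumption
    # made here for numerical safety when S_w is close to singular).
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    # Step 4: keep the k eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:k]].real

# Synthetic stand-in data: 3 classes, 50 samples each, 10 "bands".
rng = np.random.default_rng(0)
class_means = rng.normal(scale=5.0, size=(3, 10))
X = np.vstack([rng.normal(loc=m, size=(50, 10)) for m in class_means])
y = np.repeat(np.arange(3), 50)

W = lda_fit(X, y, k=2)
Y = X @ W          # step 5: project the samples onto the new subspace
print(Y.shape)     # (150, 2)
```

With three classes, at most two eigenvalues are non-zero, which is why k = 2 is used in the example.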

III. SUPPORT VECTOR MACHINE
SVM is a supervised classification algorithm developed by Vapnik in 1998, which can be used for the classification of hyperspectral images. SVM classification depends on the optimal hyperplane generated from the support vectors. SVM was originally designed for binary classification; multiclass SVM is built from the original binary SVM. The SVM model requires training spectral signatures and testing spectral signatures with their corresponding class labels. The training spectral signatures with labels are used to train the model, and the trained model is tested using the testing spectral signatures.
Let X be the input spectral signature data and Y the corresponding class labels for each spectral signature. The training samples for the model are {(x1, y1), (x2, y2), ..., (xm, ym)}. The model is trained to identify an acceptable value of y ∈ Y for a given value x ∈ X. The SVM model is created from the training samples and the kernel function parameters. These parameters must be identified properly for accurate classification. SVM can be modeled with different kinds of kernel functions, such as linear, polynomial, and radial basis functions (RBF). The literature reports that the RBF kernel achieves the highest classification accuracy for hyperspectral images.
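The SVM decision function is given by equation (5):
$$f(x) = \operatorname{sign}\left( \sum_{(x_i, y_i) \in S} \alpha_i y_i K(x_i, x) + b \right) \qquad (5)$$
where $K(x_i, x)$ is the kernel function, S is the set of training features and labels, and $\alpha_i$ and b are the coefficients and bias learned during training.
Radial basis function:
$$K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^2 \right) \qquad (6)$$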
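As a small illustration of the RBF kernel and a kernel-based decision function, the following NumPy sketch evaluates both on a toy problem. The function names, the two training points, and the coefficient values `alpha` and `b` are hypothetical values chosen for the example; in practice they are obtained by training the SVM.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2) for row vectors in A and B."""
    sq_dist = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def decision_function(X_new, X_train, y_train, alpha, b, gamma):
    """Weighted kernel sum plus bias for each new sample (binary case)."""
    K = rbf_kernel(X_train, X_new, gamma)      # shape (m, n_new)
    return (alpha * y_train) @ K + b

# Toy binary problem with hand-picked (hypothetical) coefficients.
X_train = np.array([[0.0, 0.0], [2.0, 2.0]])
y_train = np.array([-1.0, 1.0])
alpha = np.array([1.0, 1.0])   # assumed values, normally learned
b = 0.0                        # assumed value, normally learned

X_new = np.array([[0.1, 0.0], [1.9, 2.1]])
scores = decision_function(X_new, X_train, y_train, alpha, b, gamma=0.5)
labels = np.sign(scores)
print(labels)   # [-1.  1.]
```

Each new sample takes the label of the training point it is closest to, because the RBF kernel decays quickly with distance.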

IV. EDGE PRESERVING FILTER
Two edge-preserving filters are considered here, namely the bilateral filter and the guided image filter. These filters are used to remove the noise present in the classified image. Noise in the classified image is mainly caused by the wrong classification of individual pixels. These misclassified pixels are corrected using the bilateral and guided image filters. Both filters use smoothing to correct a misclassified pixel by comparing it with the neighboring correctly classified pixels. The guided image filter behaves better than the bilateral filter near the edges of the image during smoothing, because the bilateral filter can introduce gradient reversal artifacts in detailed regions.

A. Guided Image Filter
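The guided image filter adds spatial features to the classified image and acts as an edge-preserving filter. The output of the filter is computed for each pixel from the input image p and the guidance image I. The guidance image can be the input image itself or any other image, depending on the application. The output of the filter is calculated for each pixel i within a local window $\omega_k$ as shown in equation (7). It is a linear transformation of the guidance image, calculated for each pixel of the input image.
$$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k \qquad (7)$$
where $a_k$ and $b_k$ are the linear coefficients, which are calculated by minimizing the cost function between the filter input p and the filter output q:
$$E(a_k, b_k) = \sum_{i \in \omega_k} \left( (a_k I_i + b_k - p_i)^2 + \epsilon a_k^2 \right) \qquad (8)$$
Here, $\epsilon$ is the regularization parameter, and linear regression is used to calculate the solution to equation (8):
$$a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon} \qquad (9)$$
$$b_k = \bar{p}_k - a_k \mu_k \qquad (10)$$
Here, $\mu_k$ and $\sigma_k^2$ are the mean and variance of the guidance image in $\omega_k$, $|\omega|$ is the total number of pixels in $\omega_k$, and $\bar{p}_k$ is the mean of p in $\omega_k$.
The linear model is applied to every local window, and the filter output is computed by equation (11):
$$q_i = \bar{a}_i I_i + \bar{b}_i \qquad (11)$$
where $\bar{a}_i$ and $\bar{b}_i$ are the averages of the coefficients over all windows that contain pixel i.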
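The guided filtering procedure described in this subsection (per-window linear coefficients followed by averaging over overlapping windows) can be sketched in NumPy as follows. This is an illustrative sketch under several assumptions: the helper names, the edge-replicating border handling, the value of the regularization parameter, and the toy binary class map are not taken from this work.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window; borders use edge replication."""
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (2 * r + 1) ** 2

def guided_filter(I, p, r=1, eps=1e-3):
    """Filter input p using guidance image I (both 2-D arrays)."""
    I = np.asarray(I, dtype=float)
    p = np.asarray(p, dtype=float)
    mean_I = box_mean(I, r)
    mean_p = box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I ** 2
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)       # per-window linear coefficient a_k
    b = mean_p - a * mean_I          # per-window linear coefficient b_k
    # Average the coefficients of all windows covering each pixel.
    return box_mean(a, r) * I + box_mean(b, r)

# Toy example: the guidance image has two regions (left = 0, right = 1).
guide = np.zeros((8, 8))
guide[:, 4:] = 1.0
# Binary map of one class with a single misclassified pixel on the left.
class_map = guide.copy()
class_map[2, 1] = 1.0

q = guided_filter(guide, class_map, r=1, eps=1e-3)
corrected = (q > 0.5).astype(float)
print(corrected[2, 1], corrected[2, 6])   # 0.0 1.0
```

With r = 1 the window is 3x3, the window size used in the methodology. The isolated pixel is pulled toward the value of its neighbors, while the boundary between the two regions follows the guidance image.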

B. Bilateral Filter
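The bilateral filter output at each pixel is computed as a weighted average of the nearby pixels. The bilateral filter preserves edges while smoothing by taking into account the difference in values between neighboring pixels. In this subsection, p and q denote pixel positions and S is the set of pixels in the filter window. The bilateral filter is defined by equation (12):
$$BF[I]_p = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\lVert p - q \rVert) \, G_{\sigma_r}(|I_p - I_q|) \, I_q \qquad (12)$$
where $W_p = \sum_{q \in S} G_{\sigma_s}(\lVert p - q \rVert) \, G_{\sigma_r}(|I_p - I_q|)$ is the normalization factor, which ensures that the weights sum to 1.
Here, the parameters $\sigma_s$ and $\sigma_r$ define the amount of filtering for the image I.
$G_{\sigma_s}$ is the spatial Gaussian weight that reduces the effect of distant pixels. $G_{\sigma_r}$ is a range Gaussian that reduces the effect of a pixel q whose intensity value differs from $I_p$.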
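A direct, unoptimized NumPy sketch of the bilateral filter is given below for illustration. The function name, the parameter defaults, and the optional `guide` argument are assumptions made for this example; the `guide` argument computes the range weights from a separate guidance image (a joint bilateral filter), which is one possible way to use a guidance image with this filter.

```python
import numpy as np

def bilateral_filter(p, guide=None, r=1, sigma_s=1.0, sigma_r=0.1):
    """Weighted average of neighbors in a (2r+1) x (2r+1) window.

    Range weights come from `guide` if given, otherwise from p itself.
    """
    p = np.asarray(p, dtype=float)
    g = p if guide is None else np.asarray(guide, dtype=float)
    h, w = p.shape
    p_pad = np.pad(p, r, mode="edge")
    g_pad = np.pad(g, r, mode="edge")
    num = np.zeros((h, w))
    norm = np.zeros((h, w))   # normalization factor (sum of weights)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            p_q = p_pad[r + dy:r + dy + h, r + dx:r + dx + w]
            g_q = g_pad[r + dy:r + dy + h, r + dx:r + dx + w]
            # Spatial Gaussian: down-weights distant pixels.
            w_s = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
            # Range Gaussian: down-weights pixels with different values.
            w_r = np.exp(-((g - g_q) ** 2) / (2.0 * sigma_r ** 2))
            weight = w_s * w_r
            num += weight * p_q
            norm += weight
    return num / norm

# A sharp step edge is preserved by the plain bilateral filter.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
smoothed = bilateral_filter(step)

# Joint use: a class map with one wrong pixel, guided by the step image.
class_map = step.copy()
class_map[2, 1] = 1.0
filtered = bilateral_filter(class_map, guide=step)
corrected = (filtered > 0.5).astype(float)
print(corrected[2, 1], corrected[2, 6])   # 0.0 1.0
```

The double loop runs over window offsets rather than over pixels, so each iteration processes the whole image at once.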
V. METHODOLOGY
The methodology followed in the classification of the hyperspectral image consists of the implementation of the SVM algorithm, the LDA algorithm, and the edge-preserving filters, namely the guided image filter and the bilateral filter. Matlab with various toolboxes is used to implement the algorithms. The implementation of the hyperspectral image classification is shown in Fig. 1. Here, a hyperspectral image with 224 spectral bands has been considered for classification.
LDA extracts and reduces the spectral features of the hyperspectral image. These spectral features contain the reflected and absorbed light information at different wavelengths. The high-dimensional spectral bands of the image are converted to low-dimensional spectral features to reduce the computational complexity of classification. The implementation of the LDA algorithm is shown in Fig. 2. LDA is a supervised dimensionality reduction algorithm that uses the spectral signatures of the image and the known labels of the dataset as input data. LDA calculates the mean, the scatter matrices, and the eigenvalues and eigenvectors of the spectral signatures of the materials present in the hyperspectral image, using the known labels in the dataset. Finally, k eigenvectors are selected as the feature vectors, and the new set of data is derived by multiplying the spectral signatures by these feature vectors. The resulting data vectors are the linear discriminant components of the spectral signatures. The maximum number of linear discriminant components that can be obtained is bounded by the number of classes present in the image (at most c - 1 for c classes). The number of linear discriminant components used depends on the selection of eigenvectors. The linear components contain the spectral feature values, and these values are used for SVM classification.
These spectral features, along with the labels of each class, are divided into training and testing samples for SVM classification. The SVM model is trained using the RBF kernel, with the cost parameter C and the gamma parameter determined by five-fold cross-validation.
The implementation of the five-fold cross-validation method is shown in Fig. 3. This method determines the cost parameter C and the gamma parameter g required for the RBF kernel from the training samples and labels, by selecting the pair of C and g values that gives the maximum classification accuracy over the iterations. The test samples are then predicted using the trained SVM model, and the overall classification accuracy is calculated from the confusion matrix. Some pixels of the hyperspectral image are correctly classified and some are wrongly assigned to different classes. The misclassified pixels in the image are corrected using the edge-preserving filters, namely the guided image filter and the bilateral filter. The implementation of these filters is shown in Fig. 4 and Fig. 5. The edge-preserving filters add spatial features to the classification of the image. Misclassified pixels in the SVM classification are treated as noise, and the filter removes this noise by comparing neighboring pixels. Both filters use the SVM classified image as the input image and the first linear discriminant component image as the guidance image. The filter coefficients required for both edge-preserving filters are determined based on (9) and (10) for the selected window of 3x3. The output of the filter is derived based on (11) and (12). The output of the filter contains the pixels with corrected class labels. Finally, the overall classification accuracy of SVM with each edge-preserving filter is measured from the confusion matrix, and the simulation results are compared with the SVM outputs obtained without filtering.
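The five-fold cross-validation search for C and gamma described above can be sketched with scikit-learn. This is an illustrative sketch rather than the Matlab implementation used in this work: the synthetic features stand in for the linear discriminant components, and the candidate grids for C and gamma are assumed values.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for LDA features: 3 classes, 5 components each.
rng = np.random.default_rng(0)
centers = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 4.0, 0.0, 0.0],
])
X = np.vstack([rng.normal(loc=c, size=(60, 5)) for c in centers])
y = np.repeat(np.arange(3), 60)

# Candidate values for the cost parameter C and the RBF gamma (assumed grid).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

# Five folds: each (C, gamma) pair is scored by its mean validation accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After the search, `search.best_estimator_` holds an SVM refitted on all training samples with the selected parameters, which can then be used to predict the test samples.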

A. Hyperspectral Image
The Indian Pines image shown in Fig. 6(a) was captured by the AVIRIS sensor (Airborne Visible / Infrared Imaging Spectrometer) over the North-Western Indiana test site. The recorded image comprises 220 bands of spectral information. The spectral range of the AVIRIS sensor is 0.4 to 2.5 µm. The size and spatial resolution of the image are 145x145 pixels and 20 m per pixel, respectively. The image consists of forest land, agricultural land, and various vegetation fields, and contains 16 classes in total. The spectral bands in the water absorption regions are removed, which reduces the total number of bands to 200. The color image and the ground truth data indicating all the classes are shown in Fig. 6. The parameters used to evaluate the performance of the developed methods are the Kappa coefficient, average accuracy (AA), and overall accuracy (OA).
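For reference, the three evaluation parameters can be computed from a confusion matrix as in the following NumPy sketch. The function name and the small two-class confusion matrix are hypothetical and are not results from this work; rows are assumed to hold the true classes and columns the predicted classes.

```python
import numpy as np

def accuracy_metrics(conf):
    """Return (OA, AA, Kappa) from a confusion matrix (rows = true class)."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    # Overall accuracy: fraction of all samples on the diagonal.
    oa = np.trace(conf) / total
    # Average accuracy: mean of the per-class accuracies.
    aa = (np.diag(conf) / conf.sum(axis=1)).mean()
    # Expected agreement by chance, from the row and column totals.
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa

# Hypothetical two-class example with unequal class sizes.
conf = [[90, 10],
        [5, 15]]
oa, aa, kappa = accuracy_metrics(conf)
print(round(oa, 3), round(aa, 3), round(kappa, 3))
```

OA and AA differ when the class sizes are unequal, which is why both are usually reported together with the Kappa coefficient.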

B. Classification Results
The Indian Pines image shown in Fig. 6(a) is classified using an SVM with an RBF kernel, and the simulation results of the different proposed approaches are shown in Table I. The training data used to generate the SVM model is 10% of the total samples. Fig. 7 shows the classified image of the different vegetation fields obtained from the trained SVM model with an RBF kernel using spectral features. The overall classification accuracy obtained for the classified image is 81.15%, and the computation time required to simulate the algorithm is 155 seconds. The computation time of SVM classification is reduced by removing unnecessary spectral information using LDA. The computation time is reduced to 25 seconds with the same classification accuracy if only 15 linear components are used to train and test the SVM model. It can be inferred from Fig. 7 that a few pixels in the resultant image are wrongly assigned to different classes. These pixels appear as noise in the resultant image, which degrades the overall classification accuracy. This noise is removed using the edge-preserving filters, namely the guided image filter and the bilateral filter.
The overall classification accuracy of the SVM model is improved by adding spatial features to the SVM classified image. These features are added using the guided image filter and the bilateral filter, with the first linear component of LDA as the guidance image. Fig. 8 and Fig. 9 show the SVM classified image after applying the guided image filter and the bilateral filter, respectively. From the figures, it can be inferred that the noise in the resultant image is removed when compared to the previous SVM result. The removal of noise increases the overall classification accuracy. The guided image filter improves the overall classification accuracy of SVM from 81.15% to 93.41%. The bilateral filter improves the overall classification accuracy of SVM from 81.15% to 92.41%. The guided image filter performs better than the bilateral filter and behaves better near edges. The linear components are used to reduce the computational time while maintaining the same accuracy level of the SVM model. Fig. 10 shows the variation of the overall classification accuracy of SVM with the number of linear components used to train the SVM model.
A small number of linear components is enough to achieve the same accuracy as an SVM that uses all the features. From the simulation experiments, it can be inferred that using more than 7 linear components gives almost the same classification accuracy. Using very few linear components reduces the computation time for classification, and Fig. 11 shows the computation time against the number of linear components considered for SVM classification. From the figure, it can be inferred that the first few linear components require more computation time for classification. The computation time is at its minimum for 6 to 8 linear components and increases for higher numbers of linear components. This computation time includes reading the image data, extracting the spectral features using LDA, and training and testing the SVM model.
VII. CONCLUSION
In this paper, hyperspectral image classification was developed based on spectral and spatial features. An SVM with an RBF kernel was used in the proposed approach for classification. The SVM requires more computation time to determine its kernel parameters and to classify if all the spectral features of the image are considered. The spectral features of the hyperspectral image are therefore reduced by linear discriminant analysis, and only a few linear components are considered for SVM classification in order to reduce the computational time. Furthermore, the classification accuracy was increased by incorporating spatial features into the classification. These spatial features are included by using edge-preserving filters, namely the guided image filter and the bilateral filter. The simulation results show that the guided image filter improves the classification accuracy from 81.15% to 93.41% and the bilateral filter improves the classification accuracy to 92.41%.
In future, computation time and accuracy of SVM can be improved by using various dimensionality reduction algorithms.