Hyperspectral Image Classification using Convolutional Neural Networks



I. INTRODUCTION
Recent developments in optics and photonics have produced hyperspectral imaging sensors with better spectral and spatial resolution. The spatial and spectral information can be exploited to identify materials and objects on the earth's surface. Spectral signatures are modelled so that they differentiate the various objects and materials. The identification of different materials, objects and ground cover classes based on their reflectance properties can therefore be treated as a classification task, i.e. the classification of image pixels based on their spectral characteristics. Hyperspectral image classification has been used in a broad range of applications, such as agriculture, environmental science, astronomy, surveillance and biomedical imaging. However, the classification of hyperspectral images has its own particular problems, including (i) high dimensionality, (ii) the small number of labelled samples, and (iii) significant spatial variation of spectral signatures.
Much of the recent work on hyperspectral data classification follows the traditional pattern recognition approach, which comprises two distinct steps: first, detailed handcrafted features are extracted from the original input data, and second, classifiers such as Support Vector Machines (SVM), Neural Networks (NN) [1], maximum likelihood [2], parallelepiped classification, k nearest neighbours [3], minimum distance [4], and logistic regression [5] are trained on the extracted features. The "curse of dimensionality" affects the majority of the algorithms mentioned above. Several dimensionality reduction-based classification approaches [6] have been suggested to manage the high dimensionality and small training sets of hyperspectral data. Band selection and transformation are other methods available to deal with dimensionality. In general, statistical learning techniques have been used to handle the high dimensionality and variability of hyperspectral data when few training samples are available.
The support vector machine, a popular method for hyperspectral data classification, is presented in [7]. SVM is resistant to the Hughes phenomenon and has low sensitivity to high dimensionality. In certain situations, SVM-based classifiers achieve better classification accuracy than other commonly used pattern recognition strategies. These classifiers were the state of the art for a long time.
In recent years, spatial information has become highly relevant for hyperspectral data classification. Spatial-spectral classification methods provide substantial benefits in classification performance. Many new methods take spatial information into account to deal with the spatial variability of spectral signatures. In [8], a technique for classifying hyperspectral images using SVM and a guided image filter is presented. The guided image filter is used to incorporate the spatial features into the SVM classifier. In [9], edge-preserving filters such as the bilateral filter and the guided image filter are used to incorporate the spatial features into the SVM classifier.
However, because of the high diversity of the represented materials, it is essential to know which features are most relevant for the classification algorithms. Various deep learning models [10,11] have been developed for classification purposes. These models learn features at several levels: the high-level features are built from low-level features. Compared with conventional pattern recognition techniques, these models automate the extraction of features for the problem at hand. In addition, deep learning systems tend to fit and resolve classification problems more effectively with larger datasets and large images with high spatial and spectral resolution. Deep learning approaches have already shown promising results for detecting real-world artefacts, such as man-made objects or vehicles, as well as for classifying hyperspectral data.
More precisely, a deep learning system for the classification of hyperspectral data was used in [12] with accurate results. Specifically, autoencoders are used to design a deep architecture that hierarchically extracts the high-level spectral features required for the classification of each pixel in the image. The spectral features were coupled with spatial-dominated information in a separate stage and fed to a logistic regression classifier as input.
In the same way, we propose a deep learning system for the classification of hyperspectral data into various classes. Our method, however, is based on a unified structure that incorporates spectral and spatial information in a single stage, creating high-level spectral-spatial features at the same time.
Specifically, we propose the use of a modified Convolutional Neural Network (CNN) that constructs high-level features and a Multi-Layer Perceptron (MLP) that classifies the image. Under this design, the framework builds spectral-spatial features at once and, at the same time, performs real-time prediction of the classes in the image because of the feed-forward nature of CNNs and MLPs. This paper is organized into five sections: Section II gives a brief introduction to and background on the convolutional neural network. Section III presents the implementation methodology of the convolutional neural network for hyperspectral image classification. Section IV discusses the image classification results, and Section V concludes the paper.

II. CONVOLUTIONAL NEURAL NETWORK (CNN): BACKGROUND
The convolutional neural network (CNN) is the standard deep learning algorithm for image classification and image recognition problems. CNNs are also widely used to recognize human faces and classify objects.
The CNN algorithm takes the input image, extracts features in the learning phase and classifies the image into one of several classes or categories. The algorithm sees the input image as an array of pixel values whose size depends on the image resolution: height x width x 3 for an RGB image and height x width x 1 for a grayscale image. In principle, the CNN model is created by passing the input images through a series of convolution layers with filters (kernels) to extract features, followed by pooling layers and fully connected layers, and finally applying the softmax function to assign each object class a probability between 0 and 1. Fig. 1 shows the entire CNN flow diagram.

A. Convolution Layer
The features in the input images are extracted by a convolution layer. The relationship between the image's pixels is preserved by learning features from small squares of input data. Convolution is a mathematical operation that takes two inputs, an image matrix and a filter or kernel: the kernel is slid over the image, and at each position the overlapping values are multiplied element by element and summed. The convolution operation of the image is shown in Fig. 2.
Consider an example of an image whose pixel matrix is 5 x 5 with values 0 and 1 and a filter matrix or kernel of size 3 x 3. The convolution layer convolves the image matrix with the filter matrix to produce the convolved feature, or feature map. The convolution operations are shown in Fig. 3 and Fig. 4.
The various types of filters or kernels listed in Table I can be used for the convolution operation. These filters perform different operations such as sharpening the image, blurring the image and detecting edges in the image.
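The 5 x 5 image and 3 x 3 kernel example above can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in this paper: the pixel and kernel values are assumptions chosen for the example, and the function name `convolve2d_valid` is hypothetical. As in most CNN libraries, the kernel is applied without flipping.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the
    element-wise products at each position, as a CNN layer does."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Illustrative 5 x 5 binary image and 3 x 3 kernel (assumed values).
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

Swapping in a different kernel (for example an edge-detection or blur kernel) changes the feature map without changing the sliding-window procedure.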

B. Strides
Strides are used in the convolution operation and determine the number of pixels by which the filter kernel is shifted over the input image. For example, if the stride value is 1, the filter kernel is shifted by one pixel at a time over the image matrix. If the stride value is 2, the filter kernel is shifted by two pixels at a time. Fig. 5 shows the convolution operation with a stride of 2 pixels.
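The effect of the stride on the output size can be illustrated with a short sketch. The 7 x 7 input, the all-ones kernel and the function names are assumptions made for this example only; the output-size expression is the standard one for an unpadded convolution.

```python
import numpy as np

def conv_output_size(n, k, stride=1, padding=0):
    """Number of filter positions along one axis."""
    return (n + 2 * padding - k) // stride + 1

def convolve2d_strided(image, kernel, stride=1):
    """Unpadded convolution in which the kernel moves `stride` pixels per step."""
    kh, kw = kernel.shape
    oh = conv_output_size(image.shape[0], kh, stride)
    ow = conv_output_size(image.shape[1], kw, stride)
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            top, left = r * stride, c * stride
            out[r, c] = np.sum(image[top:top + kh, left:left + kw] * kernel)
    return out

image = np.arange(49).reshape(7, 7)   # hypothetical 7 x 7 input
kernel = np.ones((3, 3))              # hypothetical 3 x 3 kernel

out_stride1 = convolve2d_strided(image, kernel, stride=1)
out_stride2 = convolve2d_strided(image, kernel, stride=2)
print(out_stride1.shape)  # (5, 5)
print(out_stride2.shape)  # (3, 3)
```

A larger stride therefore produces a smaller feature map from the same input.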

C. Padding
In many cases, the chosen filter does not fit the input image exactly. In those cases, one of the following options is used.
• Zero padding: the input image is padded with zeros so that the filter fits.
• Dropping: the part of the image where the filter does not fit is dropped (often called valid padding).
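Both options can be sketched with NumPy. The helper names `zero_pad` and `dropped_pixels` and the input sizes are hypothetical, chosen only to illustrate the two cases; `zero_pad` assumes an odd filter size and a stride of 1.

```python
import numpy as np

def zero_pad(image, kernel_size):
    """Pad a 2-D image with zeros so that a stride-1 filter of the given
    (odd) size yields an output with the same height and width as the input."""
    pad = (kernel_size - 1) // 2
    return np.pad(image, pad_width=pad, mode="constant", constant_values=0)

def dropped_pixels(n, kernel_size, stride):
    """Pixels along one axis that no filter position covers when the
    image is left unpadded (the 'drop' option)."""
    positions = (n - kernel_size) // stride + 1
    covered = (positions - 1) * stride + kernel_size
    return n - covered

image = np.arange(1, 26).reshape(5, 5)      # hypothetical 5 x 5 input
padded = zero_pad(image, kernel_size=3)
print(padded.shape)                          # (7, 7)

# A 6-pixel axis with a 3-pixel filter and stride 2 leaves one pixel uncovered.
print(dropped_pixels(6, kernel_size=3, stride=2))  # 1
```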

D. Non Linearity (ReLU)
Rectified Linear Unit (ReLU) performs the non-linear computation in the convolution network. The output of ReLU is $f(x) = \max(0, x)$.
The goal of ReLU is to introduce non-linearity into the convolution network: negative values are set to zero and non-negative values pass through unchanged. Fig. 6 shows the ReLU operation in CNN. Tanh and sigmoid are other non-linear functions that can be employed instead of ReLU, but ReLU generally performs better than these two.
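As a brief illustration, ReLU can be applied element-wise to a feature map with NumPy. The input values below are assumptions chosen to show both negative and non-negative entries.

```python
import numpy as np

def relu(x):
    """Element-wise rectified linear unit: max(0, x)."""
    return np.maximum(0, x)

feature_map = np.array([[-3.0, 2.0],
                        [0.5, -1.0]])   # hypothetical convolution output
activated = relu(feature_map)
print(activated)
# [[0.  2. ]
#  [0.5 0. ]]
```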

E. Pooling Layer
The pooling layer reduces the number of output values of the convolution layer and thereby the complexity of processing large images. Spatial pooling, also called subsampling or downsampling, decreases the spatial dimension of each feature map while retaining most of the important information. There are three different categories of spatial pooling: max pooling, average pooling and sum pooling.
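A minimal sketch of spatial pooling follows, assuming the three categories are the commonly used max, average and sum pooling and that pooling windows do not overlap. The function name `pool2d` and the 4 x 4 input are hypothetical.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows of a 2-D array.
    Rows and columns that do not fill a complete window are discarded."""
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h2 * size, :w2 * size].reshape(h2, size, w2, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    if mode == "sum":
        return blocks.sum(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])   # hypothetical feature map

print(pool2d(x, mode="max"))      # [[6 8] [3 4]]
print(pool2d(x, mode="average"))  # [[3.25 5.25] [2. 2.]]
print(pool2d(x, mode="sum"))      # [[13 21] [8 8]]
```

Each 2 x 2 window is reduced to a single value, so the 4 x 4 map becomes 2 x 2.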

F. Fully Connected Layer
The two-dimensional matrix produced by the pooling layer is flattened into a one-dimensional vector. This vector is fed into a fully connected layer, as in a conventional neural network. Fig. 8 shows the operation of a fully connected layer. The output feature matrix is converted to a vector $(x_1, x_2, \ldots, x_n)$. These features are combined by the fully connected layers to construct a model, and finally a softmax or sigmoid function is used to classify the outputs.
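The flatten, fully connected and softmax steps can be sketched as below. The shapes, the random weights and the number of classes are assumptions for illustration only; in a trained network the weights would be learned.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: outputs lie in (0, 1) and sum to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
pooled = rng.standard_normal((2, 2, 4))   # hypothetical pooled feature maps
x = pooled.reshape(-1)                    # flatten to a vector of 16 values

n_classes = 3                             # assumed number of classes
W = rng.standard_normal((n_classes, x.size)) * 0.1   # untrained weights
b = np.zeros(n_classes)

probs = softmax(W @ x + b)                # one probability per class
predicted_class = int(np.argmax(probs))
print(probs, predicted_class)
```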

III. METHODOLOGY
The implementation methodology of hyperspectral image classification using the CNN approach is shown in Fig. 9. The hyperspectral image is defined as a 3-D matrix of size $h \times w \times c$, where $h$ and $w$ represent the image's height and width and $c$ represents the number of spectral channels in the image. The hundreds of channels in the hyperspectral image increase the computation time and memory requirements of the training and prediction processes. On the other hand, statistical analysis shows that the variance of the spectral response is minimal among the pixels that belong to the same class.
This means that, for every channel, pixels with the same class label have similar spectral values, and pixels with different class labels have different spectral values. Based on these properties, a dimensionality reduction procedure can be used to decrease the dimensionality of the input data and thus speed up the training and classification processes. Principal component analysis (PCA) is the dimensionality reduction algorithm used here; it reduces the hyperspectral image's spectral dimension while retaining most of the image information. After dimensionality reduction, the hyperspectral image is split into small patches to make it compatible with the nature of a CNN. Each patch contains the spectral and spatial features for a single pixel.
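The spectral dimensionality reduction step can be sketched with a PCA computed through the singular value decomposition. This is an assumption-laden illustration rather than the exact procedure of this paper: the cube is synthetic random data, and the function name `pca_reduce` is hypothetical.

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Project each pixel's spectrum onto the leading principal components.
    `cube` has shape (height, width, channels)."""
    h, w, c = cube.shape
    pixels = cube.reshape(-1, c).astype(float)
    pixels -= pixels.mean(axis=0)                 # centre each channel
    _, s, vt = np.linalg.svd(pixels, full_matrices=False)
    reduced = pixels @ vt[:n_components].T        # (h*w, n_components)
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return reduced.reshape(h, w, n_components), explained

rng = np.random.default_rng(0)
cube = rng.random((20, 20, 50))    # synthetic stand-in for a hyperspectral image
reduced, explained = pca_reduce(cube, n_components=30)
print(reduced.shape)               # (20, 20, 30)
print(f"fraction of variance retained: {explained:.3f}")
```

The returned variance fraction indicates how much of the spectral variability survives the reduction; on real hyperspectral data, where neighbouring bands are strongly correlated, a small number of components typically captures most of it.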
More precisely, a square patch of dimension $s \times s$ centred at pixel $p_{x,y}$ is used to classify the pixel at location $(x, y)$ on the image plane, combining spectral and spatial feature data. Let us denote the class label of the pixel at location $(x, y)$ as $l_{x,y}$ and the patch centred at pixel $p_{x,y}$ as $t_{x,y}$. We then create a dataset $D = \{(t_{x,y}, l_{x,y})\}$, where $x = 1, \ldots, w$ and $y = 1, \ldots, h$, with $h$ and $w$ the image height and width. The patch $t_{x,y}$ is a 3-D matrix of size $s \times s \times c_r$, where $c_r$ is the number of spectral channels remaining after dimensionality reduction; it holds the spatial and spectral data for each pixel. Furthermore, this matrix is split into $c_r$ matrices of size $s \times s$, which are given as input to the CNN. The CNN builds the high-level features that encapsulate the pixel's spectral and spatial information. The CNN architecture contains several layers, as shown in Fig. 10. The first layer in the CNN structure is a convolution layer with trainable filters of dimensions 3 x 3. This layer outputs feature matrices of dimensions 3 x 3. The second layer is also a convolution layer with trainable filters. The output of the second convolution layer is passed to a flatten layer and a fully connected layer. The flow chart for the implementation of hyperspectral image classification using a convolutional neural network is shown in Fig. 11.
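The construction of the patch dataset can be sketched as follows. Two details are assumptions not stated in this paper: border pixels are handled by zero-padding the image, and a label of 0 is treated as unlabelled background and skipped. The function name `extract_patches` and the toy data are hypothetical.

```python
import numpy as np

def extract_patches(cube, labels, s):
    """Return one s x s x channels patch per labelled pixel, plus its label.
    Assumptions: zero padding at the borders; label 0 means unlabelled."""
    m = s // 2
    padded = np.pad(cube, ((m, m), (m, m), (0, 0)), mode="constant")
    h, w, _ = cube.shape
    patches, targets = [], []
    for y in range(h):
        for x in range(w):
            if labels[y, x] == 0:
                continue
            # In padded coordinates this window is centred on pixel (y, x).
            patches.append(padded[y:y + s, x:x + s, :])
            targets.append(labels[y, x])
    return np.stack(patches), np.array(targets)

rng = np.random.default_rng(0)
cube = rng.random((6, 7, 4))            # toy reduced image: 6 x 7 pixels, 4 channels
labels = np.ones((6, 7), dtype=int)
labels[0, 0] = 0                        # one unlabelled pixel

patches, targets = extract_patches(cube, labels, s=5)
print(patches.shape)                    # (41, 5, 5, 4)
```

Each entry of `patches` pairs with the same index in `targets`, giving the (patch, label) pairs that form the training set.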

IV. RESULTS AND DISCUSSIONS
The proposed algorithm is evaluated on a hyperspectral image. The Indian Pines dataset is a hyperspectral image captured over the north-western Indiana test site using the AVIRIS sensor, and its specification is shown in Table II. The overall accuracy, test loss, precision, recall and F1-score are the parameters used to analyze the performance of the algorithm.
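For reference, the accuracy-based measures listed above can be computed from a confusion matrix as in the sketch below. The labels are made-up toy values, not results from this paper, and the function name `classification_metrics` is hypothetical.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Overall accuracy plus per-class precision, recall and F1-score."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                         # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    overall_accuracy = tp.sum() / cm.sum()
    return overall_accuracy, precision, recall, f1

y_true = [0, 0, 1, 1, 2, 2]   # toy ground-truth labels
y_pred = [0, 1, 1, 1, 2, 0]   # toy predictions
oa, precision, recall, f1 = classification_metrics(y_true, y_pred, n_classes=3)
print(oa, precision, recall, f1)
```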
In the simulation, PCA reduces the spectral bands, and 30 principal components are chosen for the classification. The patch size is initially set to 5 so as to consider the nearest 24 neighbours of each pixel; after dimensionality reduction, each patch therefore has dimensions 5 x 5 x 30. Each patch is given to the CNN architecture for classification.
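The relation between patch size, number of neighbours and input size per patch is simple arithmetic, sketched below for the patch sizes evaluated in this section and the 30 principal components stated above. The function name `patch_summary` is hypothetical.

```python
def patch_summary(s, n_components=30):
    """Neighbours of the centre pixel and total values in an s x s patch."""
    neighbours = s * s - 1
    values = s * s * n_components
    return neighbours, values

for s in (5, 7, 9, 11, 13):
    neighbours, values = patch_summary(s)
    print(f"{s} x {s}: {neighbours} neighbours, {values} input values per patch")
```

The input size grows quadratically with the patch size, which is consistent with the longer simulation times reported for larger patches.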
The classification accuracy obtained for the 5 x 5 patch is 84.12%, with a simulation time of 1142.26 seconds, and the corresponding test loss is 47.33%. A screenshot of the results and the classified image, together with the precision, recall and F1-score of the individual classes, is shown in Fig. 12. The patch size is then increased to 7, 9, 11 and 13 to increase the number of neighbours for each pixel. Increasing the patch size improves the classification accuracy but also increases the simulation time. The results of CNN classification with different patch sizes are shown in Fig. 13, and the accuracy and computation time are given in Table III. From the table, it can be seen that the CNN achieves its highest classification accuracy, and its longest computation time, for the patch size 13 x 13. No further improvement in classification accuracy is observed for patch sizes larger than 13; in fact, the accuracy of the classifier deteriorates while the demand on computational resources increases.
The proposed CNN algorithm for the classification of the hyperspectral image is compared with the support vector machine classifier in Table IV. The support vector machine uses the spectral features for classification, and spatial features are added to the SVM output by using the guided image filter and the bilateral filter. In addition, the dimensions of the spectral features are reduced using PCA and LDA. Combining SVM with the guided image filter and bilateral filter increases the classification accuracy, and combining SVM with PCA and LDA decreases the computation time. The CNN algorithm combines spectral and spatial features at the same time without using the guided image filter or the bilateral filter. The CNN algorithm achieves higher classification accuracy than the SVM classifiers, at the cost of increased computation time; the SVM algorithm consumes less computation time than the CNN algorithm.

V. CONCLUSION
The hyperspectral image is classified using a convolutional neural network. Both spectral and spatial features are considered; including the spatial features improves the classification. The CNN achieves the highest classification accuracy, 98.28%, compared with the support vector machine classifier and the other methods. The CNN accuracy depends on the patch size considered for the classification, which determines the number of spatial features considered. It is observed that a patch size of 13 x 13 is sufficient to achieve the highest accuracy. The CNN consumes more computation time for training and testing than the other classifiers. The proposed method avoids the use of an edge-preserving filter to incorporate spatial features into the classification.