Naïve Bayes Classification of High-Resolution Aerial Imagery

In this study, the performance of Naïve Bayes classification on a high-resolution aerial image captured from a UAV-based remote sensing platform is investigated. K-means clustering of the study area is initially performed to assist in selecting the training pixels for the Naïve Bayes classification. The Naïve Bayes classification is performed using linear and quadratic discriminant analyses, with training set sizes varied from 10 through 100 pixels. The results show that a training set size of 20 gives the highest overall classification accuracy and Kappa coefficient for both discriminant analysis types. The linear discriminant analysis, with 94.44% overall classification accuracy and a Kappa coefficient of 0.9395, outperforms the quadratic discriminant analysis, with 88.89% overall classification accuracy and a Kappa coefficient of 0.875. Further investigation of the producer accuracy and area size of individual classes shows that the linear discriminant analysis produces a more realistic classification than the quadratic discriminant analysis, particularly because homogeneous training pixels are limited for certain objects.
Keywords: Naïve Bayes; k-means; classification accuracy; training set size; discriminant analysis


I. INTRODUCTION
In remote sensing, classification is the process of assigning a pixel to a particular type of land cover. Classification typically uses a measurement vector, or feature vector, $\omega$ of data acquired from a spaceborne or airborne acquisition system. It aims to assign a pixel associated with the measurement $\omega$ at position x to a particular class i, where $1 \le i \le M$ and M is the total number of classes. The classes are defined from supporting data, such as maps and ground data for test sites. Two types of classification are commonly used: supervised and unsupervised. Supervised classification starts from a known set of classes, learns the statistical properties of each class and then assigns the pixels based on these properties. Unsupervised classification is a two-step operation of grouping pixels into clusters based on the statistical properties of the measurements, and then labelling the clusters with the appropriate classes. Because supervised classification classifies pixels based on known properties of each cover type, it requires representative land cover information in the form of training pixels [1], [2], [3]. Signatures generated from the training data take different forms depending on the classifier type used. Examples of supervised classifiers include Naïve Bayes, Maximum Likelihood, Mahalanobis Distance, Parallelepiped and support vector machines. In unsupervised classification, on the other hand, the clustering process produces clusters that are statistically separable, giving a natural grouping of the pixels [4]. Land cover information is then used in the subsequent labelling process, where clusters are assigned to classes based on the available land cover information. This has the disadvantages that (1) a cluster may represent a mixture of different land cover types and (2) a single land cover type may be split into several clusters.
Furthermore, the assignment of clusters to classes, also known as the labelling process, requires manual input using available knowledge and needs to be carefully performed after the clustering in order to label the clusters correctly. Examples of unsupervised classification methods are K-means and ISODATA. These unsupervised and supervised methods have been used extensively on satellite images; however, there has been limited effort to investigate their performance on high-resolution aerial images [1], [2], [3], [4]. In this study, the performance of Naïve Bayes classification on a high-resolution aerial image is investigated, with K-means clustering initially performed to determine the training pixels.

A. K-means Clustering
The K-Means algorithm is an iterative method for partitioning a given dataset into a user-specified number of clusters, K. Its objective is to minimize the average squared Euclidean distance of the data points from their cluster centres [5]. Let $\mu_k$ denote the mean (centre) of cluster k and let $c_n$ denote the cluster to which data point $\omega_n$ is assigned; the K-Means objective function can then be written as:
$$J(c, \mu) = \sum_{n} \lVert \omega_n - \mu_{c_n} \rVert^2 \tag{1}$$
where J measures the sum of squared distances between each training example and the cluster centroid to which it has been assigned. The inner loop of K-Means repeatedly minimizes J with respect to c while holding $\mu$ fixed, and then minimizes J with respect to $\mu$ while holding c fixed. With this function defined, the process can be split into several steps to achieve the intended result. The starting point is a large set of data entries and a chosen number of centres, K.
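The alternating minimisation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function names `sq_dist`, `kmeans` and `objective`, the fixed random seed and the stopping test on unchanged centres are all assumptions made for the example, and pixels are represented as plain tuples of band values.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(pixels, k, max_iter=100, seed=0):
    """Partition a list of feature tuples into k clusters.

    Returns (centres, labels). The seed and max_iter defaults are
    illustrative assumptions, not values taken from the study.
    """
    rng = random.Random(seed)
    centres = rng.sample(pixels, k)  # random initial cluster centres
    labels = [0] * len(pixels)
    for _ in range(max_iter):
        # Assignment step: minimise the objective over the labels,
        # holding the centres fixed.
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                  for p in pixels]
        # Update step: minimise the objective over the centres,
        # holding the labels fixed.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                new_centres.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centres.append(centres[c])  # keep centre of empty cluster
        if new_centres == centres:  # centres unchanged: converged
            break
        centres = new_centres
    return centres, labels


def objective(pixels, centres, labels):
    """Sum of squared distances from each pixel to its assigned centre."""
    return sum(sq_dist(p, centres[lab]) for p, lab in zip(pixels, labels))
```

Calling `kmeans(pixels, 5)` on a list of RGB triples would mirror the five-cluster setting reported later, although the result depends on the random initial centres.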

B. Naïve Bayes Classification
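Generally, from the conditional probability theorem, the probability that an event A occurs given that event B has already occurred is equal to the probability of the intersection of events A and B divided by the probability of event B [6], [7]. This can be expressed as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{2}$$
In the same way, the probability that an event B occurs given that event A has already occurred can be expressed as:
$$P(B \mid A) = \frac{P(B \cap A)}{P(A)} \tag{3}$$
From the commutative law, it can easily be shown that:
$$P(A \cap B) = P(B \cap A) \tag{4}$$
Therefore, (3) can also be written as
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \tag{5}$$
and
$$P(A \cap B) = P(B \mid A)\,P(A) \tag{6}$$
Hence, (2) can be expressed as:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{7}$$
This is popularly known as Bayes' theorem. $P(A \mid B)$ is also known as the a posteriori probability of A given B. Event B is the evidence or feature. P(A) is the a priori, or prior, probability of A. In real-world problems, multiple features are typically considered. For n features, B can be expressed as a feature vector:
$$B = (B_1, B_2, \ldots, B_n)$$
When these features are independent, Bayes' rule can be extended to Naïve Bayes:
$$P(A \mid B_1, \ldots, B_n) = \frac{P(B_1, \ldots, B_n \mid A)\,P(A)}{P(B_1, \ldots, B_n)}$$
Since, under this independence assumption, the likelihood can be expanded into:
$$P(B_1, \ldots, B_n \mid A) = \prod_{j=1}^{n} P(B_j \mid A)$$
it follows that
$$P(A \mid B_1, \ldots, B_n) = \frac{P(A) \prod_{j=1}^{n} P(B_j \mid A)}{P(B_1, \ldots, B_n)}$$
In remote sensing, the probability distributions of the data may take a variety of forms, but very frequently they are assumed to be Gaussian, that is, to follow a normal distribution [8], [9]. When each class obeys a multivariate normal distribution in N spectral dimensions, that is, the number of bands used, the probability that feature vector $\omega$ occurs in a specified class i can be defined as:
$$P(\omega \mid i) = \frac{1}{(2\pi)^{N/2}\,\lvert \Sigma_i \rvert^{1/2}} \exp\left[-\frac{1}{2}(\omega - \mu_i)^T \Sigma_i^{-1} (\omega - \mu_i)\right]$$
where
$$\mu_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \omega_j, \qquad \Sigma_i = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (\omega_j - \mu_i)(\omega_j - \mu_i)^T$$
and $\mu_i$ is the class mean vector, $\Sigma_i$ is the class covariance matrix for class i, $n_i$ is the number of pixels in class i, $\omega_j$ is the feature vector of the jth pixel and $\lvert \cdot \rvert$ denotes the determinant. This assumption is likely to be suitable for data that come directly from spectral band measurements, but should not be used if the feature vector contains more general types of data, e.g. band ratios, without first testing its validity.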
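As an illustration of how the class mean vector and class covariance matrix might be estimated from training pixels, the following Python sketch computes both for a single class. The function name `class_statistics` is hypothetical, and the use of the unbiased normalisation (dividing by the number of pixels minus one) is an assumption of this example rather than something stated in the text.

```python
def class_statistics(training_pixels):
    """Estimate the mean vector and covariance matrix of one class.

    training_pixels: list of equal-length feature tuples (one value per
    band); at least two pixels are needed for the covariance estimate.
    """
    n = len(training_pixels)
    bands = len(training_pixels[0])
    # Mean of each band over the training pixels of this class.
    mean = [sum(p[b] for p in training_pixels) / n for b in range(bands)]
    # Covariance between every pair of bands (assumed n - 1 normalisation).
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b])
                for p in training_pixels) / (n - 1)
            for b in range(bands)]
           for a in range(bands)]
    return mean, cov
```

For a three-band RGB image the result is a mean vector of length 3 and a 3 by 3 covariance matrix per class.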
The Naïve Bayes classifier is based on Bayes' theorem of probability. In classification, the concern is to predict the class given the measurements from different spectral bands [9], [10]. Therefore, the probability that class i occurs given the spectral measurement $\omega$, $P(i \mid \omega)$, needs to be determined. From Bayes' theorem, the a posteriori distribution $P(i \mid \omega)$, which is the probability that a pixel with feature vector $\omega$ belongs to class i, is given by:
$$P(i \mid \omega) = \frac{P(\omega \mid i)\,P(i)}{P(\omega)}$$
where $P(\omega \mid i)$ is the likelihood function, $P(i)$ is the a priori information, that is, the probability that class i occurs in the study area before $\omega$ is known, and $P(\omega)$ is the probability that $\omega$ is observed. $P(\omega)$ can be expressed as:
$$P(\omega) = \sum_{i=1}^{M} P(\omega \mid i)\,P(i)$$
where M is the number of classes.
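A small numerical sketch may help to show how the a posteriori probabilities follow from the priors and likelihoods. The function name `posteriors` and the three-class numbers below are purely illustrative assumptions and are not taken from the study.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes' theorem for one observed pixel.

    priors[i] is the prior probability of class i and likelihoods[i] is
    the likelihood of the observed measurement under class i.
    """
    # Evidence: total probability of the measurement over all classes.
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]


# Hypothetical example with three classes.
example = posteriors([0.5, 0.3, 0.2], [0.02, 0.10, 0.05])
best_class = max(range(len(example)), key=lambda i: example[i])
```

Because the evidence term is the same for every class, the class with the largest product of prior and likelihood is also the class with the largest a posteriori probability.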
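For Naïve Bayes, expanding $P(\omega \mid i)$ over the N spectral bands, with $\omega^{(b)}$ denoting the measurement in band b, gives:
$$P(\omega \mid i) = \prod_{b=1}^{N} P(\omega^{(b)} \mid i) \tag{17}$$
and
$$P(\omega) = \sum_{k=1}^{M} P(k) \prod_{b=1}^{N} P(\omega^{(b)} \mid k)$$
Hence,
$$P(i \mid \omega) = \frac{P(i) \prod_{b=1}^{N} P(\omega^{(b)} \mid i)}{P(\omega)}$$
Since $P(\omega)$ is constant given the input, the following classification rule can be used:
$$P(i \mid \omega) \propto P(i) \prod_{b=1}^{N} P(\omega^{(b)} \mid i)$$
Naïve Bayes classification is possible if the prior information P(i) is available. This is the most powerful use of Bayes' theorem. Pixel x is assigned to class i by the rule:
$$x \in i \quad \text{if} \quad P(i \mid \omega) > P(k \mid \omega) \ \text{for all} \ k \neq i \tag{22}$$

III. METHODOLOGY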

A. Personal Remote Sensing System (PRSS) Workflow
Image acquisition is carried out using an aerial imaging system known as the Personal Remote Sensing System, or PRSS [11], [12]. The PRSS was developed in previous research to overcome the limitations of space-borne remote sensing satellites in terms of resolution as well as cloud and haze effects [1], [13], [14], [15], [27]. This system consists of 1) an aerial segment, 2) a ground segment and 3) a user segment. The aerial segment consists of a quad-rotor UAV that is equipped with GPS and telemetry facilities and mounted with a high-resolution RGB camera [16], [17], [18]. Images are captured automatically at a set time interval and stored on the camera's storage card. Upon completion of an image acquisition mission, the images on the card are transferred to the ground segment for subsequent image processing tasks. The ground segment consists of a laptop installed with software for controlling and tracking the UAV as well as processing the captured images [19]. The processed images are finally uploaded to cloud-based geospatial databases, which can be accessed and personalised using a smartphone at the user segment. A user can make further requests to the ground segment for images of other areas or objects. Upon receiving a request, the ground segment prepares a new mission plan and sends it to the aerial segment for a new image acquisition mission to take place. The image used in this study was acquired on 28 March 2016 at 0956 local time. The UAV was flown at an altitude of 180 m between 0900 and 1100 MST (Malaysian Standard Time) under clear sky conditions. The size of the image is 3000 rows by 4000 columns and the image format is JPG. Fig. 1 illustrates the PRSS workflow.

B. Image Classification
The acquired image was initially processed using the K-Means clustering algorithm [8]. The K-Means clustered image is then used together with the existing information about the study area to select the training pixels for the Naïve Bayes classification. The K-Means clustering algorithm is as follows.
1) An initial mean vector (point) is randomly specified for each of the K clusters. These points are to be the centre for each of the K clusters.
2) Next, the distances between every point of the image pixels and those centres are computed.
3) Each pixel is assigned to the cluster whose mean vector is the closest to the pixel vector. This leads to the formation of the first set of decision boundaries.
4) Based on the pixel vectors within each boundary, a new set of cluster mean vectors is then calculated and the pixels are reassigned according to these new mean vectors.
5) The iterations are continued until there is no significant change in pixel assignments from one iteration to the next. Specifically, the magnitude of change in the cluster mean vectors from iteration (t − 1) to iteration t, summed over all K clusters, can be expressed as:
$$\Delta\mu(t) = \sum_{k=1}^{K} \lVert \mu_k^{t} - \mu_k^{t-1} \rVert$$
The clustered image produced from the K-Means clustering is used to assist in collecting the training pixels for Naïve Bayes classification. The general procedures in Naïve Bayes classification are as follows:
1) The number of land cover types within the study area is determined.
2) The training pixels for each of the desired classes are chosen using land cover information for the study area together with the cluster map produced from the K-Means clustering.
3) The training pixels are then used to estimate the mean vector and covariance matrix of each class.
4) Finally, every pixel in the image is classified into one of the desired land cover types based on the predefined discriminant functions.
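In Naïve Bayes classification, each class is enclosed in a region in spectral space where its discriminant function is larger than that of all other classes. These class regions are separated by decision boundaries, where the decision boundary between classes i and j occurs when:
$$g_i(\omega) = g_j(\omega)$$
For Gaussian class distributions, the discriminant function is the logarithm of the product of the likelihood and the prior with the class-independent constant dropped, $g_i(\omega) = \ln P(i) - \frac{1}{2}\ln\lvert\Sigma_i\rvert - \frac{1}{2}(\omega - \mu_i)^T \Sigma_i^{-1} (\omega - \mu_i)$, where $\mu_i$ and $\Sigma_i$ are the mean vector and covariance matrix of class i and P(i) is its prior probability. In this study, the linear discriminant function and quadratic discriminant function are utilised.
For the linear discriminant function, $\Sigma_i = \Sigma_j = \Sigma$, thus:
$$\ln P(i) - \frac{1}{2}(\omega - \mu_i)^T \Sigma^{-1} (\omega - \mu_i) = \ln P(j) - \frac{1}{2}(\omega - \mu_j)^T \Sigma^{-1} (\omega - \mu_j)$$
which can be rewritten as:
$$(\mu_i - \mu_j)^T \Sigma^{-1} \omega - \frac{1}{2}\left(\mu_i^T \Sigma^{-1} \mu_i - \mu_j^T \Sigma^{-1} \mu_j\right) + \ln\frac{P(i)}{P(j)} = 0$$
This is a linear function in N dimensions that forms the decision boundary between classes i and j.
For the quadratic discriminant function, $\Sigma_i \neq \Sigma_j$, thus:
$$\ln P(i) - \frac{1}{2}\ln\lvert\Sigma_i\rvert - \frac{1}{2}(\omega - \mu_i)^T \Sigma_i^{-1} (\omega - \mu_i) = \ln P(j) - \frac{1}{2}\ln\lvert\Sigma_j\rvert - \frac{1}{2}(\omega - \mu_j)^T \Sigma_j^{-1} (\omega - \mu_j)$$
which can be rewritten as:
$$-\frac{1}{2}\omega^T\left(\Sigma_i^{-1} - \Sigma_j^{-1}\right)\omega + \left(\Sigma_i^{-1}\mu_i - \Sigma_j^{-1}\mu_j\right)^T \omega - \frac{1}{2}\left(\mu_i^T \Sigma_i^{-1} \mu_i - \mu_j^T \Sigma_j^{-1} \mu_j\right) - \frac{1}{2}\ln\frac{\lvert\Sigma_i\rvert}{\lvert\Sigma_j\rvert} + \ln\frac{P(i)}{P(j)} = 0$$
This is a quadratic function in N dimensions that forms the decision boundary between classes i and j.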
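The difference between the linear and quadratic cases can be sketched in Python as follows. This is a simplified illustration, not the implementation used in this study: it assumes a diagonal covariance matrix (independent bands, the usual Naïve Bayes simplification), equal priors unless others are supplied, and non-zero variance in every band. The function names `fit` and `classify` and the sample pixel values are hypothetical.

```python
import math


def fit(training, pooled):
    """Estimate per-class band means and variances.

    training: dict mapping class name -> list of feature tuples.
    pooled=True  shares one variance per band across classes (linear case).
    pooled=False keeps separate variances per class (quadratic case).
    Assumes a diagonal covariance matrix, i.e. independent bands.
    """
    bands = len(next(iter(training.values()))[0])
    means = {c: [sum(p[b] for p in px) / len(px) for b in range(bands)]
             for c, px in training.items()}

    def sum_sq(c, b):
        # Sum of squared deviations of band b within class c.
        return sum((p[b] - means[c][b]) ** 2 for p in training[c])

    if pooled:
        total = sum(len(px) for px in training.values())
        shared = [sum(sum_sq(c, b) for c in training) / (total - len(training))
                  for b in range(bands)]
        variances = {c: shared for c in training}
    else:
        variances = {c: [sum_sq(c, b) / (len(training[c]) - 1)
                         for b in range(bands)]
                     for c in training}
    return means, variances


def classify(pixel, means, variances, priors=None):
    """Return the class with the largest Gaussian log-discriminant."""
    best, best_score = None, -math.inf
    for c in means:
        prior = priors[c] if priors else 1.0 / len(means)
        score = math.log(prior)
        for x, m, v in zip(pixel, means[c], variances[c]):
            # Log of a one-dimensional Gaussian, constant term dropped.
            score -= 0.5 * (math.log(v) + (x - m) ** 2 / v)
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical RGB training pixels for two classes.
training = {
    "grass": [(60, 120, 50), (64, 126, 54), (58, 118, 49), (62, 124, 55)],
    "road": [(130, 128, 125), (138, 135, 133), (126, 125, 120), (134, 132, 130)],
}
linear_model = fit(training, pooled=True)
quadratic_model = fit(training, pooled=False)
```

With pooled variances the variance terms are identical for all classes, so the boundary between two classes is linear in the pixel values; with class-specific variances the squared terms no longer cancel and the boundary becomes quadratic.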

C. Classification Accuracy
Classification accuracy is one of the key parameters required to judge the quality of a land cover classification and can be defined as the degree to which the derived image classification conforms to the "truth" [20]. One of the most important components in accuracy assessment is the set of reference pixels [21]. This study makes use of Google Maps and the available ground truth knowledge of the study area in collecting the reference pixels [22]. To do so, systematic sampling is performed, in which the chosen reference pixels are distributed in a predefined pattern. The most widely used technique for analysing reference data is the confusion, or error, matrix [23]. A confusion matrix compares the classification result with reference information, and accuracy is conveyed in terms of the percentage overall classification accuracy and producer accuracy [24], [25]. The minimum acceptable overall accuracy is 85%, with no class less than 70% accurate [26]. Kappa statistics have been used since as early as the 1980s as an additional classification accuracy measure to compensate for chance agreement [23].
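As a minimal sketch of how these accuracy measures can be obtained from a confusion matrix, the Python function below computes the overall accuracy, the producer accuracy of each class and the Kappa coefficient. The function name `accuracy_measures`, the convention that rows hold classified labels and columns hold reference labels, and the small 3-class matrix are assumptions made for illustration; the matrix is not data from this study.

```python
def accuracy_measures(cm):
    """Compute accuracy measures from a square confusion matrix.

    cm[a][b] is the number of reference pixels of class b (column) that
    were classified as class a (row). Every class is assumed to have at
    least one reference pixel.
    Returns (overall accuracy in %, producer accuracies in %, kappa).
    """
    u = len(cm)                                   # number of classes
    q = sum(sum(row) for row in cm)               # total number of pixels
    diag = sum(cm[a][a] for a in range(u))        # correctly classified
    row_sums = [sum(cm[a]) for a in range(u)]
    col_sums = [sum(cm[a][b] for a in range(u)) for b in range(u)]
    overall = 100.0 * diag / q
    # Diagonal element divided by the total of its column.
    producer = [100.0 * cm[a][a] / col_sums[a] for a in range(u)]
    # Chance agreement term built from row and column sums.
    chance = sum(r * c for r, c in zip(row_sums, col_sums))
    kappa = (q * diag - chance) / (q * q - chance)
    return overall, producer, kappa


# Hypothetical 3-class confusion matrix.
overall, producer, kappa = accuracy_measures([[2, 0, 0],
                                              [0, 1, 0],
                                              [0, 1, 2]])
```

Unlike the overall accuracy, the Kappa coefficient also depends on the row and column totals, so it changes when off-diagonal errors are redistributed even if the diagonal stays the same.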
Producer accuracy is a measure of the accuracy of a particular classification scheme and shows the percentage of a particular ground class that has been correctly classified. The minimum acceptable accuracy for a class is 70% [26]. This is calculated by dividing each of the diagonal elements in the table by the total of the column in which it occurs:
$$\text{Producer accuracy of class } a = \frac{c_{aa}}{c_{\cdot a}} \times 100\%$$
where $c_{aa}$ is the element in the a-th row and a-th column of the confusion matrix and $c_{\cdot a}$ is the sum of column a. A measure of the overall behaviour of a classification can be determined by the overall accuracy, which is the total percentage of pixels correctly classified:
$$\text{Overall accuracy} = \frac{\sum_{a=1}^{U} c_{aa}}{Q} \times 100\%$$
where Q and U represent the total number of pixels and classes respectively. The minimum acceptable overall accuracy is 85% [28]. The Kappa coefficient $\kappa$ is a second measure of classification accuracy, which incorporates the off-diagonal elements as well as the diagonal terms to give a more robust assessment of accuracy than overall accuracy. This is computed as:
$$\kappa = \frac{Q \sum_{a=1}^{U} c_{aa} - \sum_{a=1}^{U} c_{a\cdot}\, c_{\cdot a}}{Q^2 - \sum_{a=1}^{U} c_{a\cdot}\, c_{\cdot a}}$$
where $c_{a\cdot}$ is the row sum and $c_{\cdot a}$ is the column sum.
Fig. 2 shows the study area displayed in (a) RGB, (b) red, (c) green and (d) blue channels with the corresponding histograms. The study area has two main groups, namely natural and artificial land covers or objects. This is indicated by the bimodal nature of the red, green and blue channel histograms. In all histograms, the separation of the natural and artificial objects occurs at the valley at a DN of about 120, with natural object pixels corresponding to the lower DN values and artificial object pixels to the higher DN values. Fig. 3 shows the result of K-Means clustering for 5 clusters. By comparison with the RGB image in Fig. 2(a), most of the objects have been sensibly clustered. Because K-Means clustering is based merely on the statistical properties of the image, there are, as expected, clusters containing more than one object and objects split across more than one cluster. Shrub clusters (green) can be seen at the top right and bottom right of the image. There appear to be two road clusters: the low-level road cluster (violet) stretches from the lower left to the upper right of the image, while the high-level road cluster (dark green) stretches from near the bottom middle to the top right of the image. The grassy ground cluster (maroon) can be seen mostly between the shrub and low-level road clusters. Finally, the vehicle cluster (turquoise) can be seen on both roads.
The outcome of the K-Means clustering is used to assist in selecting the training pixels for Naïve Bayes classification. To do so, both the RGB and K-Means clustering images are displayed side by side and zoomed in on the targeted objects. The zoomed-in images for vehicle, grassy ground, shrub and road are shown in Fig. 4(a), (b), (c) and (d), respectively. This provides a practical way for the spatial and spectral homogeneity criteria to be met in selecting the training pixels [8].

A. Naïve Bayes Classification using Linear Discriminant Analysis
For Naïve Bayes classification, the 9 classes identified are 1) Bright Vehicle, 2) Dark Vehicle, 3) Grassy Ground, 4) Shrub, 5) Road (Normal), 6) Road (Bright), 7) Road (Shadow), 8) Road Mark and 9) Steel Bridge. Due to the high image resolution, three labels have been used for the Road class to represent three different illumination conditions of the road. Fig. 5 shows plots of overall classification accuracy (top) and Kappa coefficient (bottom) versus training set size for the Naïve Bayes classification based on linear discriminant analysis. A training set size of 20 gives the highest overall classification accuracy (94.44%) and Kappa coefficient (0.9395) compared with the other sizes. Plots of classification accuracy (producer accuracy) versus training set size for all classes are shown in Fig. 6. It can be seen that Grassy Ground, Shrub and Road (Shadow) have the most stable accuracies across all training set sizes, while the least stable classes are Road Mark, Steel Bridge and Bright Vehicle. This is because the stable classes have more abundant homogeneous pixels than the least stable classes, as can be seen visually in the K-means clustering image in Fig. 3. For the rest of the classes, higher classification accuracies are generally obtained at smaller rather than larger training set sizes. Fig. 7 shows the Naïve Bayes classified image using linear discriminant analysis. From visual comparison with the RGB image in Fig. 2(a), most objects are correctly classified except for Road Mark, Steel Bridge and Bright Vehicle. There are Bright Vehicle and Steel Bridge pixels that have been incorrectly assigned to the Road Mark class, which is also indicated by the confusion matrix in Table I. There are also Road Mark pixels that have been incorrectly assigned to the Steel Bridge and Bright Vehicle classes. Table II shows

B. Naïve Bayes Classification using Quadratic Discriminant Analysis
For the Naïve Bayes classification using quadratic discriminant analysis (Fig. 8), a gradual decrease in the overall accuracy can be seen as the training set size increases, compared with the trend for the linear discriminant analysis. The highest overall classification accuracy of 88.89% and the highest Kappa coefficient of 0.875 are shared by training set sizes 10 and 20. In terms of individual class classification accuracy (producer accuracy) in Fig. 9, the most stable classes are Shrub, Road (Shadow) and Grassy Ground, while the least stable classes are Road Mark, Steel Bridge and Road (Bright). An unexpected increasing trend occurs for Road (Bright). Comparing the linear and quadratic discriminant analysis plots, the quadratic trend overall looks smoother than the linear trend, which is likely due to the more flexible decision space of the quadratic discriminant. The classes with broadly similar producer accuracy trends are Shrub, Grassy Ground, Road (Shadow) and Road (Normal), owing to their abundant homogeneous training pixels. The classes with the most distinct trends are Road (Bright), Dark Vehicle and Steel Bridge, owing to their less homogeneous training pixels. Fig. 10 shows the Naïve Bayes classified image using quadratic discriminant analysis. There are clearly more incorrectly assigned pixels than with the linear discriminant analysis. There are Road (Bright) pixels that have been incorrectly assigned to the Steel Bridge class, which is also indicated by the confusion matrix in Table III. Table IV The classes having the most distinct area sizes are Grassy Ground, Steel Bridge and Road (Bright). The linear discriminant analysis shows a more realistic area percentage than the quadratic discriminant analysis, particularly because the Steel Bridge area in the quadratic result is larger than that of more abundant objects such as Grassy Ground and Road (Bright).

V. CONCLUSION
In this study, Naïve Bayes classification of a high-resolution aerial image has been performed. K-means clustering with five clusters has been used as a guide in selecting the training pixels for the Naïve Bayes classification. The classification has been tested for training set sizes of 10 through 100 for linear and quadratic discriminant analysis. From the classification outcomes, a training set size of 20 has been chosen because it gives the highest overall classification accuracy and Kappa coefficient, where the linear discriminant analysis, with 94.44% overall classification accuracy and a Kappa coefficient of 0.9395, outperforms the quadratic discriminant analysis, with 88.89% overall classification accuracy and a Kappa coefficient of 0.875. The producer accuracies of individual classes show that classes with abundant homogeneous training pixels have similar trends under linear and quadratic discriminant analysis, whereas classes with less homogeneous training pixels have distinct trends. The linear discriminant analysis has been found to produce more realistic class area percentages of the study area than the quadratic discriminant analysis, particularly for Steel Bridge. Nevertheless, the performance of Naïve Bayes classification is greatly influenced by the way the training pixels are sampled, which is not investigated in this study. Therefore, future work will investigate the effects of different patterns of systematic sampling of training pixels on classification performance.