Combined Non-parametric and Parametric Classification Method Depending on Normality of PDF of Training Samples

A classification method that combines nonparametric and parametric classification, depending on the normality of the Probability Density Function (PDF) of the training samples, is proposed. The proposed method also exploits spatial information for high spatial resolution images from satellite-based optical sensors: it takes into account not only spectral but also spatial features of LANDSAT-4 and 5 Thematic Mapper (TM) data. Treatment of the spatial-spectral variability within a region becomes more important for such high spatial resolution imagery. Standard deviations in small cells, such as 2x2, 3x3 and 4x4 pixels, were used as measures of this spatial-spectral variability. This information can be used together with conventional spectral features in a unified way in a traditional classifier such as the pixelwise Maximum Likelihood Decision Rule (MLHDR). The study focuses on the classification performance for new clear cuts and alpine meadows, which are very close in spectral space and difficult to distinguish by conventional methods. Experiments show a substantial improvement in overall classification accuracy for TM forestry data. The Probability of Correct Classification (PCC) for the new clear cuts and the alpine meadows classes rose by 7% to 97% correct. The confusion between alpine meadows and new clear cuts was reduced from 9% to 3%.
Keywords—Spectral information; spatial information; maximum likelihood decision rule; satellite image; image classification; classification performance; instantaneous field of view


I. INTRODUCTION
Classification based on the Maximum Likelihood Decision Rule (MLHDR) is widely used for satellite imagery data. The MLHDR rests on several assumptions, for example (1) there is no correlation among pixels, and (2) the Probability Density Function (PDF) of the training samples is a normal distribution. These assumptions, however, are not always true: there are pixel-to-pixel correlations, and the spatial information that can be extracted from satellite imagery and used for classification often follows a non-normal distribution. The proposed image classification method uses not only spectral but also spatial information and allows classification when the PDF of the training samples is non-normal. The spatial resolution of spaceborne optical sensors has improved remarkably. Classification performance, however, tends to degrade, because the variances of the class categories grow as the spatial resolution becomes finer, which increases the overlap among class categories in the feature space. Classification performance might therefore be improved by including spatial information in the classification, since the amount of spatial information also grows with spatial resolution.
Due to the increased spatial resolution of TM (30 m compared with 80 m for the Multi Spectral Scanner: MSS), the number of ground cover spectral classes included in one Instantaneous Field of View (IFOV) decreases comparatively. This implies that the spatial-spectral variability of TM data is larger than that of MSS data. With the increase in the number of spectral bands and quantization bits of the Landsat Thematic Mapper: TM (30 m IFOV, i.e., spatial resolution), the discrimination ability, or classification accuracy, for surface observation objects has improved [1], [2], [3]. However, improvement in spatial resolution does not always have a favorable effect on classification accuracy. The spatial frequency components of an observed object are widely distributed, and image data are obtained by integrating them over an area corresponding to the spatial resolution and then sampling. As the spatial resolution becomes finer, less of this variability is averaged out within each pixel, so the variance of the data increases. Therefore, with the improvement of spatial resolution, the class variance in the feature space increases and the classification accuracy tends to deteriorate [4]. Various methods to cope with this have been proposed [5]-[14]; here, the author proposes a method that uses, together with the spectral information, the spatial information that has increased along with the spatial resolution. Spatial information can be roughly divided into (1) context and (2) texture, both of which use feature values defined within a relatively large window. This is because the magnitude of such features depends on the window size, and a large window was required at the spatial resolution of the MSS.
In TM images, however, the spatial information has increased, so even a small window is effective, and the physical meaning of the variation in spectral information between a pixel and its surrounding pixels is clear. Furthermore, previously proposed definitions of spatial information and their application to classification methods are often complex and require considerable processing time.
310 | Page www.ijacsa.thesai.org
Based on the above, this paper defines the standard deviation of pixel values in a small window (a cell consisting of a few pixels) as spatial information, adds it as a dimension of the spectral feature space, and applies maximum likelihood classification. Although the standard deviation within a cell was proposed by Strahler et al. [15], the basis for using the standard deviation was not clear [16], and consideration of the optimal cell size, the probability density function and its normality was lacking [17]. In this paper, the author clarifies these points and discusses the effectiveness and practicality of the method based on its statistical features [18].
In the following section, related research works and the research background, including the motivation of the research, are described. Then, the proposed context classification method is described, followed by the experimental method and the experimental results. Finally, concluding remarks and some discussion are given.

II. RELATED RESEARCH WORKS
Classification by re-estimating statistical parameters based on an auto-regressive model is proposed for purification of training samples [19]. Meanwhile, multi-temporal texture analysis in TM classification is proposed for high spatial resolution optical sensor images [20]. Maximum Likelihood (MLH) TM classification taking into account pixel-to-pixel correlation is also proposed [21].
Supervised TM classification with purification of training samples is proposed [22], together with TM classification using local spectral variability [23]. A classification method with spatial-spectral variability is also proposed [24], together with TM classification using local spectral variability [25].
Application of inversion theory to image analysis and classification is proposed [26]. Meanwhile, polarimetric SAR image classification using the maximum curvature of the trajectory in the eigen space domain of the polarization signature is proposed [27]. A hybrid supervised classification method for multi-dimensional images using color and textural features is also proposed [28].
Polarimetric SAR image classification with the high frequency component derived from wavelet Multi Resolution Analysis (MRA) is proposed [29]. A comparative study of polarimetric SAR classification methods, including the proposed method using the maximum curvature of the trajectory of the backscattering cross section in ellipticity and orientation angle space, is conducted and reported [30].
A comparative study of wavelet-based discrimination methods for identifying dangerous red tide species is conducted [31]. A multispectral image classification method with selection of independent spectral features through correlation analysis is proposed [32]. An image retrieval and classification method based on the Euclidean distance between normalized features, including a wavelet descriptor, is proposed [33].
An image classification method considering a probability density function based on a simplified beta distribution is proposed and its performance evaluated [34]. Meanwhile, maximum likelihood classification based on the classified result of boundary mixed pixels for high spatial resolution satellite images is proposed [35]. Context classification based on mixing ratio estimation by means of inversion theory is also proposed [36]. The optimum spatial resolution of satellite-based optical sensors for maximizing MLH classification performance is discussed [37].

A. Problems in Classifying High Spatial Resolution Images
When the classes defined for a Multi Spectral Scanner: MSS image (80 m IFOV, from the optical sensor onboard the same Landsat satellite) are applied directly to a TM image, for example by setting the same training area on both data sets obtained by observing the same object at the same time, the TM image shows a larger class variance. This is due to the higher radiometric and spatial resolution of TM. Both data sets are obtained by sampling and quantizing an observation object that is continuous both spatially and radiometrically, and when the sampling frequency and the number of quantization bits increase, the variance of the observed values increases.
As a result, the class variance also increases, which lowers the degree of separation between classes and thus the classification accuracy. On the other hand, the spatial information contained in the image increases as the sampling frequency and the number of quantization bits increase. Therefore, an improvement in accuracy can be expected by considering spatial information in the classification of high resolution images.

B. Classification Method Considered
Image classification methods can be divided into the following four categories.
In this paper, the author examined the supervised pixelwise maximum likelihood method, which was effective for Landsat MSS data, for the following reasons.
1) With unsupervised classification, the physical meaning of the resulting classes is difficult to interpret, so it is suitable only as a preliminary classification; establishing correspondence with classes defined by the purpose of use, such as wheat fields, paddy fields and soybean fields in agricultural areas, is difficult.
2) Since the radiometric resolution of TM is improved compared to MSS, spectrally homogeneous spatial regions become narrower, so cell-based classification is not effective.
3) Nonparametric methods generally require more computation time and storage capacity than parametric methods.
On the other hand, when applying the maximum likelihood method using only pixelwise spectral information to TM images, the following points need to be considered.
1) Definition of spatial information and its application to classification methods.
2) Multidimensional normality of probability density function of observation vector.
In this paper, especially for (1), the author proposes a method that defines the standard deviation in the cell as spatial information and uses it as one dimension of the feature space.

A. Process Flow of the Proposed Classification Method
The proposed method is based on the MLHDR, which assumes that the probability density function is a multivariate normal distribution [16], as expressed by equations (1) and (2).
where Dij denotes the separability between classes i and j, p is the number of dimensions, Σi is the covariance matrix of class i, x is the observation vector, μi is the mean vector of class i, and the superscripts t and -1 denote the transpose and the inverse of a matrix, respectively.
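Equations (1) and (2) did not survive extraction. Assuming that (1) is the standard multivariate normal density used by the MLHDR and (2) is the usual pairwise divergence between two normal classes (consistent with the divergence threshold used in the process flow), they would read as follows in the symbols defined above. This is a reconstruction from the definitions, not the author's typeset equations.

```latex
p(\mathbf{x}\mid\omega_i) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_i|^{1/2}}
  \exp\!\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{t}\,\Sigma_i^{-1}\,(\mathbf{x}-\boldsymbol{\mu}_i)\right] \tag{1}

D_{ij} = \tfrac{1}{2}\,\mathrm{tr}\!\left[(\Sigma_i-\Sigma_j)(\Sigma_j^{-1}-\Sigma_i^{-1})\right]
  + \tfrac{1}{2}\,\mathrm{tr}\!\left[(\Sigma_i^{-1}+\Sigma_j^{-1})(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)(\boldsymbol{\mu}_i-\boldsymbol{\mu}_j)^{t}\right] \tag{2}
```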
The degree of separation between classes is first tested using only spectral information, and the following procedure is applied only to classes that do not reach a satisfactory value (for example, a divergence of 500 or more). The spectral band exhibiting the largest variance is selected, the spatial frequency components of the class in that band are examined to determine the optimal cell size, and the standard deviation within the cell is calculated while moving the cell one pixel at a time. The result is added to the spectral space as a new dimension (this is called the integrated feature space). The multidimensional normality in the integrated feature space is then tested again and the degree of separation is confirmed, after which maximum likelihood classification is applied in this space to obtain the final classification image. Fig. 1 shows the process flow of the proposed classification method.
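The process flow above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the author's implementation: each cell is anchored at its top-left pixel and the band is edge-padded so the texture band keeps the image size, the population standard deviation (ddof=0) is used, classes have equal priors, the texture band defaults to the band with the largest variance over the whole image, and all function and class names (moving_std, to_8bit, divergence, GaussianML, integrated_features) are hypothetical. The window extraction uses numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later), and to_8bit mirrors the min/max rescaling to 0 to 255 described in the experiments.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_std(band: np.ndarray, k: int = 2) -> np.ndarray:
    """Std of pixel values in a k x k cell moved one pixel at a time.

    Assumption: each cell is anchored at its top-left pixel and the band is
    edge-padded on the bottom/right, so the output keeps the band's shape.
    """
    padded = np.pad(band.astype(float), ((0, k - 1), (0, k - 1)), mode="edge")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    return windows.std(axis=(-2, -1))  # population std (ddof=0)


def to_8bit(img: np.ndarray) -> np.ndarray:
    """Linearly rescale the min/max of an image to 0..255."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round(255.0 * (img - lo) / (hi - lo)).astype(np.uint8)


def divergence(mu_i, cov_i, mu_j, cov_j) -> float:
    """Pairwise divergence between two multivariate normal classes."""
    cov_i, cov_j = np.asarray(cov_i, float), np.asarray(cov_j, float)
    inv_i, inv_j = np.linalg.inv(cov_i), np.linalg.inv(cov_j)
    dmu = (np.asarray(mu_i, float) - np.asarray(mu_j, float)).reshape(-1, 1)
    cov_term = 0.5 * np.trace((cov_i - cov_j) @ (inv_j - inv_i))
    mean_term = 0.5 * np.trace((inv_i + inv_j) @ dmu @ dmu.T)
    return float(cov_term + mean_term)


class GaussianML:
    """Pixelwise maximum likelihood classifier with equal class priors."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "GaussianML":
        self.classes_ = np.unique(y)
        self.means_, self.covs_ = [], []
        for c in self.classes_:
            Xc = X[y == c]
            self.means_.append(Xc.mean(axis=0))
            self.covs_.append(np.cov(Xc, rowvar=False))  # needs >= 2 features
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        scores = []
        for mu, cov in zip(self.means_, self.covs_):
            d = X - mu
            _, logdet = np.linalg.slogdet(cov)
            maha = np.einsum("nj,jk,nk->n", d, np.linalg.inv(cov), d)
            scores.append(-0.5 * (logdet + maha))  # log-likelihood up to a constant
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]


def integrated_features(bands: np.ndarray, texture_band=None, k: int = 2) -> np.ndarray:
    """Stack bands (B, H, W) with the 8-bit cell std of one band; return (H*W, B+1)."""
    if texture_band is None:
        # Assumed reading of the flow: use the band with the largest variance.
        texture_band = int(np.argmax(bands.reshape(bands.shape[0], -1).var(axis=1)))
    std_img = to_8bit(moving_std(bands[texture_band], k)).astype(float)
    stacked = np.concatenate([bands.astype(float), std_img[None]], axis=0)
    return stacked.reshape(stacked.shape[0], -1).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bands = rng.normal(100.0, 5.0, size=(3, 64, 64))  # hypothetical 3-band image
    print(integrated_features(bands, k=2).shape)  # (4096, 4)
```

In use, one would compute the divergence on the spectral bands first and build the integrated feature space only for class pairs below the chosen threshold, then fit GaussianML on the training pixels of the integrated features.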

B. Standard Deviation within the Cell as Texture Information
The variance of pixel values in a cell is, up to a constant factor, equal to the "contrast" c, one of the well-known texture measures.
where i and j are the pixel and line numbers, N is the number of pixels in the cell, Nρ is the number of pixel pairs in the cell, xij is the value at pixel position (i, j), and x̄ is the average value in the cell.
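The "constant factor" can be checked numerically under one assumption: that the contrast c is the mean squared difference over all Nρ = N(N-1)/2 unordered pixel pairs in the cell (the exact form of the author's equations is not recoverable here). In that case the identity Σ_{a<b}(x_a - x_b)² = N Σ_a(x_a - x̄)² gives c = 2N/(N-1) · σc², where σc² is the population variance of the cell. The helper names cell_contrast and cell_variance are hypothetical.

```python
import itertools

import numpy as np


def cell_contrast(cell) -> float:
    """Mean squared difference over all unordered pixel pairs (assumed definition)."""
    vals = np.asarray(cell, dtype=float).ravel()
    pairs = list(itertools.combinations(vals, 2))  # N_rho = N(N-1)/2 pairs
    return float(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


def cell_variance(cell) -> float:
    """Population variance of the pixel values in the cell (ddof=0)."""
    return float(np.asarray(cell, dtype=float).var())


cell = np.array([[12, 15], [11, 18]])  # hypothetical 2 x 2 cell
n = cell.size
c = cell_contrast(cell)    # 20.0
var = cell_variance(cell)  # 7.5
print(c / var, 2 * n / (n - 1))  # both 8/3 for a 2 x 2 cell
```

For a 4 × 4 cell the same ratio would be 2 · 16/15 = 32/15, so within a fixed cell size the two measures carry the same information.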
According to Parseval's theorem, the following relation holds between the power spectrum and σc, where f(x) is the value at pixel position x and F(s) is its Fourier transform in the frequency domain. Eq. (5) shows that the variance of the pixel values in the cell is closely related to both the power spectrum and the contrast as texture information.
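The Parseval relation can be illustrated with NumPy's discrete Fourier transform. With the unnormalized DFT computed by numpy.fft.fft2, Σ|f|² = (1/N)Σ|F|², and the DC term F(0) equals N times the cell mean, so the cell variance equals the power summed over all non-DC frequencies divided by N². This sketches the discrete form only; the normalization in the author's Eq. (5) may differ, and variance_from_spectrum is a hypothetical helper name.

```python
import numpy as np


def variance_from_spectrum(cell) -> float:
    """Cell variance computed from its 2-D power spectrum (Parseval)."""
    f = np.asarray(cell, dtype=float)
    power = np.abs(np.fft.fft2(f)) ** 2
    power[0, 0] = 0.0  # the DC term carries only the cell mean
    return float(power.sum() / f.size ** 2)


cell = np.array([[12, 15], [11, 18]])  # hypothetical 2 x 2 cell
print(variance_from_spectrum(cell), cell.var())  # both approximately 7.5
```

Removing the DC term here is the same operation as subtracting the window mean before displaying a power spectrum.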

C. Probability Density Functions of the Standard Deviation and Variance within the Cell
Assuming that the population of observed values follows a multidimensional normal distribution, the probability density functions of the sample variance and the sample standard deviation within the cell are obtained, and the normality of each is tested. Fig. 2 shows an example of the calculated probability density functions (see Reference [17]) for population variances σ² of 10 to 60, with the cell size n as a parameter. Here, assuming a square cell, n = 4 (2 × 2) and 16 (4 × 4) were selected. Normality improves as n approaches infinity. From the viewpoint of normality, it is known that the standard deviation (square root transformation) is superior to the variance, and that the cube root transformation is optimal. (From the viewpoint of spatial information, on the other hand, the tendency is the opposite, so as long as normality is satisfied it is better to use the square root transformation or the variance itself than the cube root transformation.) The standard deviation was therefore used here, and the normality of a single variable was judged to be satisfied if the χ² value was less than 5 in the χ² test.
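The ranking of the transformations can be checked by simulation under the same assumption as the text, namely independent normal pixel values in each cell (real neighboring pixels are correlated, so this is only indicative). For n independent normal values, (n - 1)s²/σ² follows a χ² distribution with n - 1 degrees of freedom, so the skewness of each transformed statistic does not depend on σ², and for the variance it is sqrt(8/(n - 1)) in theory. The sketch estimates the skewness of the variance, the standard deviation and the cube root of the variance; function names and the number of simulated cells are illustrative choices.

```python
import numpy as np


def skewness(v) -> float:
    """Sample skewness (third standardized moment)."""
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)


def transform_skewness(n_pixels: int, sigma2: float = 30.0,
                       n_cells: int = 200_000, seed: int = 0) -> dict:
    """Skewness of cell variance, std and cube-root variance for i.i.d. normal pixels."""
    rng = np.random.default_rng(seed)
    cells = rng.normal(0.0, np.sqrt(sigma2), size=(n_cells, n_pixels))
    var = cells.var(axis=1, ddof=1)  # sample variance in each cell
    return {
        "variance": skewness(var),
        "std (square root)": skewness(np.sqrt(var)),
        "cube root": skewness(np.cbrt(var)),
    }


for n in (4, 16):  # 2 x 2 and 4 x 4 cells
    print(n, transform_skewness(n))
```

Skewness is only one symptom of non-normality; the χ² goodness-of-fit test used in the text also depends on the binning, which is not specified here.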

A. Data used
The data used for the analysis are TM data acquired by Landsat-5 on August 15, 1984, radiometrically and geometrically corrected into Geocoded data by MOSAICS, an image processing system maintained at the Canada Center for Remote Sensing: CCRS. Geocoded data were introduced for the first time in Canada; they are corrected, map-referenced data whose pixel intervals are chosen so that they can easily be combined not only with data from other sensors (aircraft, satellites, etc.) but also with topographic maps and administrative information.
The analysis area is a Spruce-Balsam forest area in Cranbrook, British Columbia, Canada. In addition to forest, it includes logging areas, alpine meadows, rocks, rivers, small lakes, etc. Fig. 3 shows a forest polygon overlay (dark blue) on the Landsat-5 TM data. Spruce-Balsam polygons were extracted and used as a mask to cluster the TM data; the inhomogeneity of the polygons is demonstrated by the multiple classes (red, yellow, white, blue) found inside them (arrow). Logging areas are further divided into two types: those less than 5 years after logging, called new logging areas, and those 5 to 40 years after logging, called old logging areas. The former consist of topsoil, grassland, stumps, clumps of felled trees, roads, low young trees, etc.; the latter consist of relatively tall young trees, grassland, roads, etc. Therefore, four classes were set up here: (1) new logging area, (2) old logging area, (3) alpine meadows and (4) forest.

B. Separability between Classes
Given their components and the proportions of those components, the spectral characteristics of the new logging area and the alpine meadows are expected to be very close. Fig. 4(a) shows the class distributions in the feature space of TM bands 4 and 5 (TM-4, TM-5), confirming this expectation. Here, the training fields of the new logging area, old logging area, alpine meadows and forest contain 956, 270, 435 and 1377 pixels, respectively.
Each contour in the figure corresponds to twice the standard deviation of the two-dimensional normal distribution. The figure shows that the new logging area and the alpine meadows lie close to each other and have relatively large dispersions, so their degree of separation is low. Fig. 4(b) shows the distribution of each class in the feature space spanned by TM-4 and the standard deviation of the pixel values in 2x2 cells of the TM-4 image, suggesting that the degree of separation could be improved. Fig. 5 shows, for a typical 16 × 16 pixel window in the training field of each class, the power spectrum of TM-4 computed after subtracting the window's DC component from each pixel value.
This display makes the differences in spectrum between classes easy to see, because the D.C. component, which dominates the power spectrum, has been removed. Table I shows the mean and standard deviation of TM-4 and of the power spectrum.
From the above, it can be seen that it is difficult to separate the new logging area from the alpine meadows using only spectral information, but the degree of separation can be improved by adding spatial information. Here, the standard deviation image (512 × 512 pixels) was linearly rescaled from its minimum/maximum values to 0 to 255 (8 bits). Fig. 6 shows the maximum likelihood classification images obtained using only the spectral information of TM-1 to TM-5 and TM-7, and with the normalized standard deviation within 2 × 2 cells of TM-4 added. Table II shows the discrimination efficiency matrices for both data sets. Adding the standard deviation improves the average discrimination efficiency from 94.81% to 98.28% (a relative improvement of about 3.7%) and reduces the misclassification rate between the new logging area and the alpine meadows from 9% to 3%.
When the spectral information was limited to TM-4 and TM-5, the discrimination efficiencies* for the new logging area and the alpine meadows were 65.3% and 87.1%, respectively. Adding the standard deviation raised them to 83.2% and 91.4%, relative improvements of 27.4% and 4.9%.
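The reported gains are consistent with relative rather than percentage-point improvements, as a quick calculation from the numbers above shows (relative_gain is a hypothetical helper name).

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent: 100 * (after - before) / before."""
    return 100.0 * (after - before) / before


reported = {
    "average, TM-1 to TM-5 and TM-7": (94.81, 98.28),
    "new logging area, TM-4/5": (65.3, 83.2),
    "alpine meadows, TM-4/5": (87.1, 91.4),
}
for name, (before, after) in reported.items():
    print(f"{name}: {after - before:.2f} points, {relative_gain(before, after):.1f}% relative")
# average, TM-1 to TM-5 and TM-7: 3.47 points, 3.7% relative
# new logging area, TM-4/5: 17.90 points, 27.4% relative
# alpine meadows, TM-4/5: 4.30 points, 4.9% relative
```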
* The discrimination efficiency is an element of the discrimination efficiency matrix (confusion matrix): for each class, the percentage of the pixels in its training region and in the region set aside for evaluating discrimination efficiency that are assigned to a given class.
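A sketch of how such a matrix can be formed from pixel counts, assuming rows are the reference (training or evaluation) classes and columns the assigned classes, so that each row is expressed as percentages of that class's reference pixels and the diagonal gives the per-class discrimination efficiency. The counts are hypothetical, not the paper's data, and averaging the diagonal with equal class weights is an assumption, since the paper does not state how its average was formed.

```python
import numpy as np


def discrimination_efficiency(counts) -> np.ndarray:
    """Row-normalize a confusion matrix of pixel counts to percentages.

    counts[i, j] = number of reference pixels of class i assigned to class j.
    """
    counts = np.asarray(counts, dtype=float)
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)


# Hypothetical counts for two classes (not taken from Table II)
counts = [[90, 10],
          [3, 97]]
de = discrimination_efficiency(counts)
per_class = np.diag(de)            # [90., 97.]
average = float(per_class.mean())  # 93.5
print(per_class, average)
```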

D. Effect of Cell Size
The standard deviation within a cell increases with cell size. Therefore, within regions of uniform spectral characteristics, the discrimination efficiency of each class increases monotonically with cell size, as shown by the solid line in Fig. 7. However, if a cell straddles a boundary between classes, its spatial information is not class specific, so the discrimination efficiency of pixels within one cell size of the boundary is affected. This effect depends on the difference between the spatial information of the adjacent classes, the boundary shape and so on, and can be reduced by reducing the cell size. If the evaluation area is set to include boundaries and the relationship between cell size and discrimination efficiency is determined, the result is as shown by the broken line in Fig. 7: the discrimination efficiency peaks at a particular cell size. For this reason, a 2 × 2 cell is set as the optimum here, although the optimum cell size is generally determined from the spatial frequency components of the classes to be classified.
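The boundary side of this trade-off can be illustrated with a hypothetical image made of two spectrally homogeneous regions separated by a vertical edge. Inside each region the cell standard deviation is zero, so any nonzero value marks a cell that straddles the boundary. Assuming top-left anchored k × k cells with edge padding (an assumed convention, not stated in the paper), k - 1 pixel columns are contaminated, so the affected area grows linearly with cell size. The sketch does not model the within-class gain from larger cells shown by the solid line, and moving_std and boundary_affected are hypothetical helper names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_std(band, k: int) -> np.ndarray:
    """Std in top-left anchored k x k cells, edge-padded to keep the shape."""
    padded = np.pad(np.asarray(band, dtype=float), ((0, k - 1), (0, k - 1)), mode="edge")
    return sliding_window_view(padded, (k, k)).std(axis=(-2, -1))


def boundary_affected(k: int, size: int = 32, edge_col: int = 16) -> int:
    """Count pixels whose cell straddles the edge between two flat regions."""
    img = np.zeros((size, size))
    img[:, edge_col:] = 100.0  # two spectrally homogeneous regions
    return int(np.count_nonzero(moving_std(img, k) > 0))


for k in (2, 3, 4):
    print(k, boundary_affected(k))
# 2 32
# 3 64
# 4 96
```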

VI. CONCLUSION
The proposed method, using the spectral information and the standard deviation of the pixel values in a small cell, has the following features:
1) It is effective for classes whose spectral characteristics are very similar but whose spatial information differs.
2) Unlike other texture measures, the standard deviation of the pixel values in a cell is relatively close to a normal distribution, so it can simply be used together with spectral information.
3) Therefore, compared with classification methods using other spatial information, it does not require complicated processing and is practical.
Also, the proposed image classification method can classify images whose PDF does not follow a normal distribution, because it supports nonparametric classification.

VII. FUTURE RESEARCH WORKS
Applying the proposed method to real earth observation satellite imagery data and developing it into a more usable classification method remain subjects for future work.