Image Classification Considering Probability Density Function based on Simplified Beta Distribution

A method for image classification that considers a Probability Density Function (PDF) based on the simplified beta distribution is proposed. This paper is concerned with image classification for Synthetic Aperture Radar (SAR) data. In particular, the PDF of SAR data follows not a multivariate normal distribution but a Chi-Square-like distribution. It is, however, not always true that the PDF of SAR data follows the Chi-Square distribution. Due to the mismatch between the Chi-Square distribution and the actual distribution, classification performance gets worse. In this paper, a simplified beta distribution is assumed for the PDF of the SAR data. Furthermore, texture information is added to the SAR data when the Maximum Likelihood classification is applied; specifically, the "Contrast" texture feature is added. Through experiments with real SAR data, it is found that the matching error between the real PDF and the proposed simplified beta distribution is smaller than that of the normal distribution. It is also found that the proposed distribution-adaptive maximum likelihood method using the simplified beta distribution achieves a classification accuracy of 94.7%, an improvement of 12.1%.
Keywords—Synthetic aperture radar (SAR); maximum likelihood classification (MLH); probability density function (PDF); simplified beta distribution


I. INTRODUCTION
Image classification methods are roughly divided into supervised and unsupervised classification methods. Maximum Likelihood classification (MLH) 1 is the most commonly used supervised classification method. In maximum likelihood image classification, a Probability Density Function (PDF) 2 is assumed for the training samples extracted from the pixel data and is used as the likelihood function; each pixel is then classified into the class that maximizes this function.
As the probability density function, a multidimensional normal distribution is usually assumed, because multispectral images and synthetic aperture radar images (assuming a distributed target, which holds in many cases) follow a multidimensional normal distribution both in practice and in theory. Because of its clarity and the simplicity of its mathematical treatment, the multidimensional normal distribution is often used exclusively.
However, the probability density function of spatial information derived from data (images) that follow a multidimensional normal distribution, especially texture information based on second-order statistics, is known to theoretically be a Chi-Square distribution 3 . To apply such information to classification, considerations such as using the Chi-Square distribution as the likelihood function are necessary. Furthermore, the Chi-Square distribution has a domain from zero to infinity, whereas actual second-order statistics are finite, and their lower limit is not always zero but usually higher.
1 https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
2 https://en.wikipedia.org/wiki/Probability_density_function
Therefore, when the Chi-Square distribution is assumed as the likelihood function, classification errors caused by its difference from the actual probability density function are expected. To improve this situation, we adopt the beta distribution, whose domain can be set arbitrarily and which can approximate the Chi-Square distribution sufficiently well, and thereby obtain a likelihood function closer to the real distribution.
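Because the beta distribution lives on the unit interval, a feature with a finite range has to be mapped onto it before such a likelihood can be evaluated. The following is a minimal sketch of one way to do this; the helper name `to_unit_interval`, the clipping margin `eps`, and the sample values are hypothetical and do not come from the paper.

```python
def to_unit_interval(values, lower, upper, eps=1e-6):
    """Linearly map values from [lower, upper] into the open interval (0, 1)."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    span = upper - lower
    scaled = [(v - lower) / span for v in values]
    # Keep results strictly inside (0, 1): power-law terms such as
    # x**a or (1 - x)**b can vanish or diverge exactly at the endpoints.
    return [min(max(s, eps), 1.0 - eps) for s in scaled]


# Illustrative feature values and bounds (not taken from the paper).
feature = [12.0, 27.0, 42.0]
print(to_unit_interval(feature, lower=12.0, upper=42.0))
```

In practice the bounds would presumably be taken from the training samples of each class, which is what allows a lower limit above zero to be represented.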
Generally, second-order statistics of spatial information derived from optical sensor data follow the Chi-Square distribution (because the original optical sensor data usually follow a multivariate normal distribution). On the other hand, Synthetic Aperture Radar (SAR) 4 data theoretically and essentially follow a distribution that resembles the Chi-Square distribution [1]. However, the actual distribution is not exactly Chi-Square. Therefore, classification performance is not good enough because of the mismatch between the assumed and the actual probability density functions.
Through investigation of the PDF of SAR imagery data, it is found that a simplified beta distribution is more appropriate for MLH classification than the multivariate normal distribution or the Chi-Square distribution assumed in conventional MLH-based classification. A classification method that achieves higher classification accuracy is therefore proposed here.
The following section describes related research works and the research background. Then the proposed method is described, followed by the experiment. After that, the conclusion is given together with some discussion.

II. RELATED RESEARCH WORKS
A classification method that re-estimates statistical parameters based on an auto-regressive model is proposed in [2]. Multitemporal texture analysis in TM classification is proposed in [3]. Meanwhile, Maximum Likelihood (MLH) TM classification taking into account pixel-to-pixel correlation is proposed and validated in [4]. Supervised TM classification with purification of training samples is proposed in [5]. Moreover, a classification method with spatial spectral variability is proposed in [6].
www.ijacsa.thesai.org
On the other hand, polarimetric SAR image classification with the maximum curvature of the trajectory in the eigen space domain of the polarization signature is proposed in [7], together with polarimetric SAR image classification with the high frequency component derived from wavelet Multi Resolution Analysis (MRA) [8].
A comparative study of polarimetric SAR classification methods, including the proposed method with the maximum curvature of the trajectory of the backscattering cross section in ellipticity and orientation angle space, is reported in [9], together with a comparative study of discrimination methods for identifying dangerous red tide species based on wavelet-utilized classification methods [10].
A multispectral image classification method with selection of independent spectral features through correlation analysis is proposed and validated in [11]. These methods are basically based on MLH under the assumption that the PDF follows a multivariate normal distribution [12].
The probability density function of texture features based on second-order statistics is commonly used for image classification. The fact that the probability density function of the local variance, as the spatial feature of a multispectral image, follows the Chi-Square distribution is detailed in the literature [13]. The same is true for the probability density function of second-order texture features as the spatial feature of a synthetic aperture radar image.
As described in [13], it is known that the normalized backscattering cross section coefficient (the received power of the synthetic aperture radar) of a so-called distributed target composed of multiple targets follows a multidimensional normal distribution. (However, in the case of an urban area, the cardinal effect is large, and it is doubtful whether this assumption is satisfied.) Therefore, the probability density function of a second-order texture feature (for example, Contrast, Chi-Square, etc.) used as the spatial feature follows the Chi-Square distribution.
As described above, the histogram of the training sample in the feature space based on texture features is biased toward the maximum or minimum value of the domain, and its PDF differs from the normal distribution, showing a shape close to the Chi-Square distribution. However, the domain of the Chi-Square distribution is infinite, whereas the domain of the training sample is finite. Therefore, we use the beta distribution, which has a finite domain and can approximate the Chi-Square distribution sufficiently well, to approximate the probability density function.

A. Multidimensional Normality
The advantages of the maximum likelihood method using the multidimensional normal distribution include those described in the previous section. The features are as follows.

• The distribution is symmetric.
• Because the domain is infinite, a separate criterion is required to judge pixels as unclassified.
• Classification errors occur for training samples with a steep distribution shape, because the normal distribution falls off relatively gently at values far from the average.
If the training sample has a shape far from the normal distribution because of these features, classification accuracy may be reduced. In particular, in classification with a synthetic aperture radar image and its feature space, the population has a shape far from the normal distribution; if the probability density function is approximated by a normal distribution, the approximation is insufficient and the classification accuracy is considered to decrease.

B. Beta Distribution
The beta distribution 6 is expressed in equation (1).
$$f(x) = \mathrm{beta}(p, q)^{-1} \, x^{1-p} (1-x)^{1-q} \tag{1}$$
where beta(p, q) is the beta function. Even if this is simplified as follows, the arbitrariness of the domain and the degree of approximation are not inferior.
$$f(x) = x^{1-p} (1-x)^{1-q} \tag{2}$$
The advantages of the simplified beta distribution are that it can follow steep distributions, that it makes it easy to determine which pixels should be left unclassified, and that its domain is finite and it can handle asymmetric distributions.
Since there are two coefficients, function approximation can be performed relatively easily. For the above reasons, it is expected that the approximation degree of the probability density function of the training sample in the feature space by the texture feature amount will increase.
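To make the shape of the simplified form in equation (2) concrete, the sketch below evaluates it on a grid and locates its peak. The function takes the two exponents directly (named `a` and `b` here, standing for 1 - p and 1 - q as printed in equation (2)); this naming, the grid, and the example values are assumptions for illustration. Note that the textbook beta density is usually written with exponents p - 1 and q - 1, so the mapping between `a`, `b` and p, q depends on which convention is intended.

```python
def simplified_beta(x, a, b):
    """Evaluate x**a * (1 - x)**b on (0, 1); zero outside the finite domain.

    a and b stand for the exponents of Eq. (2), i.e. a = 1 - p and b = 1 - q
    as printed there (an assumption about how to read the equation).
    """
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** a * (1.0 - x) ** b


# Illustrative exponents (not from the paper): a skewed, steep shape.
a, b = 2.0, 6.0
grid = [i / 100 for i in range(1, 100)]
values = [simplified_beta(x, a, b) for x in grid]
peak = grid[values.index(max(values))]
print(peak)  # the analytic mode is a / (a + b)
```

With unequal exponents the peak moves away from the centre of the domain, which is the asymmetry that a normal distribution cannot reproduce.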

C. Classification Method
The proposed classification method is based on the MLH method with the simplified beta distribution as the assumed PDF. The traditional MLH-based classification method assumes a multidimensional normal distribution. This, however, does not work well for SAR classification because the PDF of SAR imagery data does not follow a multidimensional normal distribution. Therefore, the proposed simplified beta distribution is assumed as the most appropriate PDF for SAR imagery data.
The simplified beta distribution is taken as an alternative to the multidimensional normal distribution as the probability density function. The pixel values of a training sample of the texture features extracted from the synthetic aperture radar image are written as follows.
$$\mathbf{x} = (x_1, x_2, \ldots, x_n)^t \tag{3}$$
where n is the number of dimensions. The likelihood function in the maximum likelihood method is expressed in equation (4),
where θ represents a class.
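As a rough illustration of how such a classifier could be organised, the sketch below scores a feature vector against each class and picks the maximum, leaving the pixel unclassified when it falls outside the finite domain or below a threshold. Several things here are assumptions rather than the paper's definition: the per-class score is taken as a product of simplified beta terms over the dimensions (i.e. the dimensions are treated as independent), the exponents `a`, `b` stand for the exponents of equation (2), and the class names, exponent values, and the `min_log_score` threshold are hypothetical.

```python
import math


def log_score(x, exponents):
    """Sum of log simplified-beta terms, one (a, b) exponent pair per dimension."""
    total = 0.0
    for xj, (a, b) in zip(x, exponents):
        if not 0.0 < xj < 1.0:
            return -math.inf  # outside the finite domain of the distribution
        total += a * math.log(xj) + b * math.log(1.0 - xj)
    return total


def classify(x, class_exponents, min_log_score=-math.inf):
    """Return the class with the largest score, or "unclassified"."""
    best_class, best_score = "unclassified", -math.inf
    for name, exponents in class_exponents.items():
        score = log_score(x, exponents)
        if score > best_score:
            best_class, best_score = name, score
    return best_class if best_score >= min_log_score else "unclassified"


# Hypothetical exponents for two feature dimensions, each scaled to (0, 1).
classes = {
    "sea": [(1.0, 8.0), (1.0, 8.0)],
    "forest": [(4.0, 4.0), (4.0, 4.0)],
    "urban": [(8.0, 1.0), (8.0, 1.0)],
}
print(classify([0.10, 0.15], classes))
print(classify([0.90, 0.85], classes))
print(classify([1.20, 0.50], classes))
```

Working with logarithms avoids underflow when many small terms are multiplied. Because equation (2) omits the normalizing beta function, the class scores in this sketch are compared without normalization.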

D. Method for Estimating Coefficients of Simplified beta Distribution
The coefficients of the simplified beta distribution were estimated by the least squares method, which minimizes the squared difference between the distributions expressed by the following equation.
where i runs over all training samples. This expression is partially differentiated with respect to p and q, and the derivatives are set to 0. The two resulting equations form simultaneous equations, which were solved to find the parameters p and q.
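One way to obtain two simultaneous equations of this kind is to fit in the log domain, where the simplified beta form becomes linear in its exponents. The sketch below does that with the closed-form 2x2 normal equations. It is an assumption that this matches the estimation procedure described above; the function name `fit_exponents`, the bin layout, and the synthetic histogram are hypothetical. The exponents `a` and `b` stand for 1 - p and 1 - q as printed in equation (2).

```python
import math


def fit_exponents(centers, heights):
    """Least-squares fit of log h = a*log(x) + b*log(1 - x); returns (a, b)."""
    suu = suv = svv = suy = svy = 0.0
    for x, h in zip(centers, heights):
        if h <= 0.0 or not 0.0 < x < 1.0:
            continue  # empty bins carry no information in the log domain
        u, v, y = math.log(x), math.log(1.0 - x), math.log(h)
        suu += u * u
        suv += u * v
        svv += v * v
        suy += u * y
        svy += v * y
    # Setting both partial derivatives of the squared error to zero gives
    # two linear equations in a and b, solved here by Cramer's rule.
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("not enough distinct bins to determine both exponents")
    a = (suy * svv - svy * suv) / det
    b = (svy * suu - suy * suv) / det
    return a, b


# Synthetic histogram generated from known exponents (illustrative only).
centers = [(i + 0.5) / 20 for i in range(20)]
heights = [x ** 2.0 * (1.0 - x) ** 5.0 for x in centers]
a, b = fit_exponents(centers, heights)
p, q = 1.0 - a, 1.0 - b  # back to p and q under the exponents of Eq. (2)
print(round(a, 6), round(b, 6))
```

A direct fit of the squared difference between the histogram and the curve would be nonlinear in the exponents and would need an iterative solver, so the log-domain version is shown only because it stays self-contained.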
V. EXPERIMENT

A. Data used
An experiment was conducted on SAR image data of Numazu, which was classified into three classes: urban area, forest, and sea. Classification using the simplified beta distribution additionally yields an unclassified class. The first dimension of the feature space was the original synthetic aperture radar image, and the second dimension was the contrast texture feature. The contrast texture feature is defined as the sum of squares of the probabilities of the Grey Level Co-occurrence Matrix (GLCM) defined by Haralick 7 .
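For readers unfamiliar with the GLCM, the following sketch builds a normalized, symmetric co-occurrence matrix for one pixel offset and computes two texture measures from it. In Haralick's own list of features, "contrast" is the sum of (i - j)^2 P(i, j), while the sum of squared probabilities is called the angular second moment; both are computed here because the wording above matches the latter. The function names, the offset, and the small test image are illustrative assumptions, not the configuration used in the experiment.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric grey-level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum over (i, j) of (i - j)**2 * P(i, j)."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))


def sum_of_squared_probabilities(p):
    """Sum of P(i, j)**2, known as angular second moment in Haralick's list."""
    return sum(v * v for row in p for v in row)


# Small 4-level test image (illustrative only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = glcm(image, levels=4)
print(contrast(P), sum_of_squared_probabilities(P))
```

For SAR data the grey levels would first have to be quantized to a small number of levels, and the measure would be computed in a sliding window to give one texture value per pixel.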

B. Approximation Degree of Probability Density Function
Fig. 3 shows how well the simplified beta distribution (a continuous function) approximates the probability density function (a discrete function, shown as a bar graph) of the texture feature (contrast is shown as an example) as a second-order statistic obtained from the synthetic aperture radar image.
This figure is from a training sample in an urban area, and the two agree well. The figure also shows the approximation by the normal distribution with a broken line, and the difference between this and the approximation by the simplified beta distribution is obvious. As a method for evaluating the degree of approximation of the probability density function, the following Kolmogorov-Smirnov test is generally used.
where F is the probability density function of the texture feature (contrast is shown as an example) as a second-order statistic obtained from the synthetic aperture radar image, and C is the simplified beta distribution or the multidimensional normal distribution approximating it.
In the case of Fig. 2, the value for the simplified beta distribution is 0.04, whereas that for the multidimensional normal distribution reaches 0.95, indicating that the simplified beta distribution gives a close approximation. Also, when the square error is evaluated, as shown in Table IV, it can be seen that the simplified beta distribution has a higher degree of approximation than the multidimensional normal distribution.
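The Kolmogorov-Smirnov statistic is conventionally the largest absolute gap between two cumulative distributions. The sketch below computes it for two binned distributions by accumulating their normalized heights. The function name `ks_statistic` and the example histograms are hypothetical, and it is an assumption that the values quoted above were computed in this binned form.

```python
def ks_statistic(observed, model):
    """Largest absolute gap between the cumulative sums of two binned distributions."""
    total_observed, total_model = sum(observed), sum(model)
    cum_observed = cum_model = 0.0
    largest_gap = 0.0
    for o, m in zip(observed, model):
        cum_observed += o / total_observed  # running empirical CDF
        cum_model += m / total_model        # running model CDF
        largest_gap = max(largest_gap, abs(cum_observed - cum_model))
    return largest_gap


# Illustrative bin heights (not from the paper).
histogram = [1.0, 3.0, 4.0, 2.0]
fitted = [2.0, 3.0, 3.0, 2.0]
print(ks_statistic(histogram, fitted))
```

A value near 0 means the fitted curve tracks the histogram closely, and a value near 1 means the two distributions barely overlap.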

C. Comparison of Classification Performance
Tables II, III and IV show the discrimination efficiency matrices of the maximum likelihood method based on, respectively, the normal distribution using only the SAR original data, the multidimensional normal distribution using the SAR original data and the contrast texture feature, and the simplified beta distribution.
The author also tested the significance of the proportion for forests, the class with the largest improvement in discrimination efficiency. Taking as the null hypothesis that the proportion in the discrimination efficiency matrix is not significant, and as the alternative hypothesis that it is significant, a two-sided test of the proportion was performed at the 5% significance level. The results were as follows. Since all the results fall into the rejection region, the proportions in this discrimination efficiency matrix are significant. In addition, the null hypothesis that the difference between the proportions of these two discrimination efficiency matrices is not significant was subjected to a two-sided test of the difference between proportions at the 5% significance level, against the alternative hypothesis that it is significant. The test statistic was z = 18.21 > 1.96; since this falls in the rejection region, the difference can be said to be statistically significant.
Percent Correct Classification (PCC) is the metric of classification performance. The PCC of the MLH based on the normal distribution with SAR imagery data only is 80.8%, while the PCC of the MLH based on the normal distribution with SAR imagery data and the contrast texture feature is 84.5%. On the other hand, the PCC of the MLH based on the beta distribution with SAR imagery data and the contrast texture feature is 94.6%. Table V shows the accuracy of approximation of the PDF of the original SAR data and the Contrast texture feature for the multivariate normal and simplified beta distributions. In summary, applying the proposed distribution-adaptive maximum likelihood method using the simplified beta distribution achieved a classification accuracy of 94.7%, an improvement of 12.1%.
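The quantities discussed above can be reproduced in outline as follows: PCC is computed from a confusion (discrimination efficiency) matrix, and a pooled two-proportion z statistic is compared with the two-sided 5% critical value of 1.96. The matrix entries and sample sizes below are hypothetical, since the actual counts are not reproduced in this text, and the pooled form of the test is an assumption about which variant was used. The last lines show one reading of the reported figures under which 12.1% is the gain of 94.7% relative to 84.5%.

```python
import math


def percent_correct(matrix):
    """PCC: correctly classified pixels (the diagonal) as a percentage of all pixels."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def two_proportion_z(correct_a, n_a, correct_b, n_b):
    """Pooled z statistic for the difference between two proportions."""
    prop_a, prop_b = correct_a / n_a, correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return (prop_a - prop_b) / std_err


# Hypothetical 3-class confusion matrix (rows: true class, columns: assigned class).
confusion = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 5, 95],
]
print(round(percent_correct(confusion), 1))

# Hypothetical sample sizes of 1000 pixels per method.
z = two_proportion_z(947, 1000, 845, 1000)
print(z > 1.96)  # True means rejection at the two-sided 5% level

# Relative gain of 94.7% over 84.5%, in percent.
relative_gain = round((94.7 - 84.5) / 84.5 * 100.0, 1)
print(relative_gain)
```

The size of the z statistic depends strongly on the number of test pixels, so the value obtained from these hypothetical counts is not comparable with the 18.21 reported above.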

VI. CONCLUSION
It is confirmed that the proposed adaptive maximum likelihood image classification using the simplified beta distribution is more accurate than the conventional maximum likelihood method assuming a multidimensional normal distribution. When the maximum likelihood classification is applied only to the SAR original data, the classification accuracy is 80.8%, and even when the contrast texture feature is added to the SAR original data, the classification accuracy is only 84.5%. In contrast, the proposed distribution-adaptive maximum likelihood method using the simplified beta distribution achieves a classification accuracy of 94.7%, an improvement of 12.1%.

VII. FUTURE RESEARCH WORKS
Further research is required on alternative probability density functions for the Contrast texture feature and for other second-order texture features. Experimental studies with other SAR data are also needed. Moreover, a comparative study between the proposed simplified beta distribution method and other classification methods, such as the Support Vector Machine, is required.

ACKNOWLEDGMENT
The author would like to thank Prof. Dr. Hiroshi Okumura and Prof. Dr. Osamu Fukuda of Saga University for their valuable comments and suggestions.