Decision Tree Classification of Remotely Sensed Satellite Data using Spectral Separability Matrix

In this paper, an attempt has been made to develop a decision tree classification algorithm for remotely sensed satellite data using the separability matrix of the spectral distributions of probable classes in the respective bands. The spectral distance between any two classes is calculated as the difference between the minimum spectral value of a class and the maximum spectral value of its preceding class in a particular band. The decision tree is then constructed by recursively partitioning the spectral distributions in a top-down manner: using the separability matrix, a threshold and a band are chosen that partition the training set in an optimal manner. The computational efficiency is measured in terms of a computational complexity measure. The proposed algorithm is coded in Visual C++ 6.0 to develop user-friendly software for decision tree classification that requires a bitmap image of the area of interest as the basic input. For the classification of the image, training sets are chosen for the different classes and the corresponding spectral separability matrix is obtained. To evaluate the accuracy of the proposed method, a confusion matrix analysis is employed, and the kappa coefficient along with the errors of omission and commission are determined. Lastly, the classified image is compared with the image classified by the classical Maximum Likelihood Classifier (MLC). The overall accuracy was found to be 98% using the decision tree method and 95% using the maximum likelihood method, with kappa values of 97% and 94%, respectively.


I. INTRODUCTION
Image classification is one of the primary tasks in geocomputation, used to categorize pixels for further analysis such as land management, potential mapping, forecast analysis and soil assessment. Image classification is the method by which labels or class identifiers are attached to individual pixels on the basis of their characteristics. These characteristics are generally measurements of their spectral response in various bands. Traditionally, classification tasks have been based on statistical methodologies such as Minimum Distance-to-Mean (MDM), Maximum Likelihood Classification (MLC) and Linear Discriminant Analysis (LDA). These classifiers are generally characterized by an explicit underlying probability model, which provides a probability of membership in each class rather than simply a classification. The performance of this type of classifier depends on how well the data match the predefined model. If the data are complex in structure, modelling them in an appropriate way can become a real problem.
To overcome this problem, non-parametric classification techniques such as Artificial Neural Networks (ANN) and rule-based classifiers are increasingly being used. Decision tree classifiers have, however, not been used widely by the remote sensing community for land use classification, despite their non-parametric nature and their attractive properties of simplicity, flexibility, and computational efficiency [1] in handling non-normal, non-homogeneous and noisy data, as well as non-linear relations between features and classes, missing values, and both numeric and categorical inputs [2].
In this paper, an attempt has been made to develop a decision tree classification algorithm specifically for the classification of remotely sensed satellite data using the separability matrix of the spectral distributions of probable classes. The computational efficiency is measured in terms of a computational complexity measure. The proposed algorithm is coded in Visual C++ 6.0 to develop user-friendly software for decision tree classification that requires a bitmap image of the area of interest as the basic input. For the classification of the image, training sets are chosen for the different classes and the corresponding spectral separability matrix is obtained. To evaluate the accuracy of the proposed method, a confusion matrix analysis was employed, and the kappa coefficient along with the errors of omission and commission were also determined. Lastly, the classified image is compared with the image classified by the classical maximum likelihood method (MLC).

II. RELATED WORK
The idea of using decision trees to identify and classify objects was first reported by Hunt et al. [3]. Morgan and Sonquist [4] developed the AID (Automatic Interaction Detection) program, followed by THAID, developed by Morgan and Messenger [5]. Breiman et al. [6] proposed CART (Classification and Regression Trees) to solve classification problems. Quinlan [7] developed a decision tree software package called ID3 (Induction of Decision Tree) based on the recursive-partitioning greedy algorithm and information theory, followed by the improved version C4.5 addressed in [2]. Buntine [8] developed the IND package using the standard algorithms from Breiman's CART and Quinlan's ID3 and C4; it also introduced the use of Bayesian and minimum-length-encoding methods for growing trees and graphs. Sreerama Murthy [9] reported the decision tree package OC1 (Oblique Classifier 1), which is designed for applications where instances have numeric continuous feature values. Friedl and Brodley [10] used decision tree classification to classify land cover with univariate, multivariate and hybrid decision trees as the base classifiers and found that the hybrid decision tree outperforms the other two. Mahesh Pal and Paul M. Mather [11], [12] suggested boosting techniques for the base classifier when classifying remotely sensed data to improve the overall accuracy. Min Xu et al. [13] suggested a decision tree regression approach to determine class proportions within pixels so as to produce a soft classification for remote sensing data. Michael Zambon et al. [14] used rule-based classification with CTA (Classification Tree Analysis) for classifying remotely sensed data and found that the Gini and class-probability splitting rules perform well compared to the twoing and entropy splitting rules. Mahesh Pal [15] suggested ensemble approaches that include boosting, bagging, DECORATE and random subspace with a univariate decision tree as the base classifier and found that the latter three approaches work even better than boosting. Mingxiang Huang et al. [16] suggested a Genetic Algorithm (GA) based decision tree classifier for SPOT-5 remote sensing data and found that the GA-based decision tree classifier outperforms the classical classification methods used in remote sensing. Xingping Wen et al. [17] used the CART and C5.0 decision tree algorithms for remotely sensed data. Abdelhamid A. Elnaggar and Jay S. Noller [18] also found that decision tree analysis is a promising approach for mapping soil salinity in a more productive and accurate way than classical classification approaches.

III. DECISION TREE APPROACH
A decision tree is defined as a connected, acyclic, undirected graph with a root node, zero or more internal nodes (all nodes except the root and the leaves), and one or more leaf nodes (terminal nodes with no children); it is termed an ordered tree if the children of each node are ordered, normally from left to right (Coreman et al. [19]). A tree is termed univariate if it splits each node using a single attribute, or multivariate if it uses several attributes. A binary tree is an ordered tree such that each child of a node is distinguished either as a left child or a right child, and no node has more than one left child or more than one right child. For a binary decision tree, the root node and all internal nodes have two child nodes. All non-terminal nodes contain splits.
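As an illustration of this structure, a minimal C++ sketch of a univariate binary decision tree node is given below (C++ being the implementation language used in this paper); the field names are illustrative assumptions, not the data structures of the actual DTCUSM software.

#include <memory>

// A univariate binary decision tree node: internal nodes carry a
// split (spectral band and threshold), leaf nodes carry a class label.
struct TreeNode {
    bool isLeaf = false;
    int  classLabel = -1;     // meaningful only at a leaf
    int  band = -1;           // spectral band tested at this node
    double threshold = 0.0;   // pixels with value <= threshold go left
    std::unique_ptr<TreeNode> left;   // left child  (value <= threshold)
    std::unique_ptr<TreeNode> right;  // right child (value >  threshold)
};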
A decision tree is built from a training set, which consists of objects, each of which is completely described by a set of attributes and a class label. Attributes are a collection of properties containing all the information about one object. Unlike the class, each attribute may have either ordered values (integer or real) or unordered values (e.g. Boolean).
Several methods (Breiman et al. [6], Quinlan [2], [7]) have been proposed to construct decision trees. These algorithms generally use a recursive-partitioning algorithm whose input is a set of training examples, a splitting rule, and a stopping rule. Partitioning of the tree is determined by the splitting rule, and the stopping rule determines whether the examples in the training set can be split further. If a split is still possible, the examples in the training set are divided into subsets by performing a set of statistical tests defined by the splitting rule. The test that results in the best split is selected and applied to the training set, dividing it into subsets. This procedure is repeated recursively for each subset until no more splitting is possible.
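The recursive-partitioning scheme just described can be sketched in C++ as follows, reusing the TreeNode structure above. The helper chooseSplit stands for whatever splitting rule is in force and is an assumed placeholder, not part of any library.

#include <memory>
#include <utility>
#include <vector>

struct Sample { std::vector<double> x; int label; };  // one training pixel

// Assumed placeholder for the splitting rule: returns the (band,
// threshold) pair of the best test for this training subset.
std::pair<int, double> chooseSplit(const std::vector<Sample>& data);

// Purity stopping rule: all samples carry the same class label.
static bool isPure(const std::vector<Sample>& s) {
    for (const auto& e : s)
        if (e.label != s.front().label) return false;
    return true;
}

// Top-down recursive partitioning with two stopping rules (purity and
// maximum depth), as described in the text.
std::unique_ptr<TreeNode> grow(const std::vector<Sample>& data,
                               int depth, int maxDepth) {
    auto node = std::make_unique<TreeNode>();
    if (isPure(data) || depth >= maxDepth) {
        node->isLeaf = true;
        node->classLabel = data.front().label;
        return node;
    }
    auto split = chooseSplit(data);
    std::vector<Sample> l, r;
    for (const auto& e : data)
        (e.x[split.first] <= split.second ? l : r).push_back(e);
    if (l.empty() || r.empty()) {          // degenerate split: stop here
        node->isLeaf = true;
        node->classLabel = data.front().label;
        return node;
    }
    node->band = split.first;
    node->threshold = split.second;
    node->left  = grow(l, depth + 1, maxDepth);
    node->right = grow(r, depth + 1, maxDepth);
    return node;
}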
Stopping rules vary from application to application, and multiple stopping rules can be used within one application. One stopping rule is to test for the purity of a node: if all the examples in the training set at a node belong to the same class, the node is considered pure (Breiman et al. [6]) and no more splitting is performed. Another stopping rule looks at the depth of the node, defined by the length of the path from the root to that node (Aho et al. [20]): if splitting the current node would produce a tree with a depth greater than a predefined threshold, no more splitting is allowed. Another common stopping rule is the example size: if the number of examples at a node falls below a certain threshold, splitting is not allowed.

Four widely used splitting rules for segregating data are gini, twoing, entropy and class probability. The gini index is defined as $\mathrm{Gini}(t) = 1 - \sum_i p(i|t)^2$, where $p(i|t)$ is the relative frequency of class $i$ at node $t$, and node $t$ represents any node (parent or child) at which a given split of the data is performed (Apte and Weiss [21]). The gini splitting rule attempts to find the largest homogeneous category within the dataset and isolate it from the remainder of the data. Subsequent nodes are then segregated in the same manner until further divisions are not possible. An alternative measure of node impurity is the twoing index, $\frac{p_L p_R}{4}\left[\sum_i |p(i|t_L) - p(i|t_R)|\right]^2$, where $L$ and $R$ refer to the left and right sides of a given split respectively, $p_L$ and $p_R$ are the proportions of data sent to each side, and $p(i|t_L)$ and $p(i|t_R)$ are the relative frequencies of class $i$ on each side (Breiman [22]). Twoing attempts to segregate data more evenly than the gini rule, separating whole groups of data and identifying groups that make up 50 percent of the remaining data at each successive node. Entropy is a measure of the homogeneity of a node and is defined as $\mathrm{Entropy}(t) = -\sum_i p(i|t)\log p(i|t)$, where $p(i|t)$ is the relative frequency of class $i$ at node $t$ (Apte and Weiss [21]). The entropy rule attempts to identify splits where as many groups as possible are divided as precisely as possible, and forms groups by minimizing the within-group diversity (De'ath and Fabricius [23]). Class probability is also based on the gini equation, but the results focus on the probability structure of the tree rather than on the classification structure or prediction success. The rule attempts to segregate the data based on probabilities of response and uses class-probability trees to perform class assignment (Venables and Ripley [24]).
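The three impurity-based rules above translate directly into code. A C++ sketch of the gini, entropy and twoing measures is given below; each function takes the relative class frequencies $p(i|t)$ as input, matching the definitions just quoted.

#include <cmath>
#include <cstddef>
#include <vector>

// Gini impurity at node t: 1 - sum_i p(i|t)^2.
double gini(const std::vector<double>& p) {
    double s = 0.0;
    for (double pi : p) s += pi * pi;
    return 1.0 - s;
}

// Entropy at node t: -sum_i p(i|t) * log2 p(i|t).
double entropy(const std::vector<double>& p) {
    double h = 0.0;
    for (double pi : p)
        if (pi > 0.0) h -= pi * std::log2(pi);
    return h;
}

// Twoing value of a split: (pL * pR / 4) * (sum_i |p(i|tL) - p(i|tR)|)^2,
// where pL and pR are the fractions of samples sent left and right.
double twoing(double pL, double pR,
              const std::vector<double>& pLeft,
              const std::vector<double>& pRight) {
    double d = 0.0;
    for (std::size_t i = 0; i < pLeft.size(); ++i)
        d += std::fabs(pLeft[i] - pRight[i]);
    return (pL * pR / 4.0) * d * d;
}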
From the above discussion, it is evident that a decision tree can be used to classify a pixel by starting at the root of the tree and moving through it until a leaf is encountered. At each non-leaf decision node, the outcome of the test at the node is determined and attention shifts to the root of the sub-tree corresponding to this outcome. This process proceeds until a leaf is encountered. The class associated with the leaf is the output of the tree. A class is one of the categories to which pixels are to be assigned at each leaf node; the number of classes is finite, their values must be established beforehand, and the class values must be discrete. A tree misclassifies a pixel if the class label output by the tree does not match the true class label. The proportion of pixels correctly classified by the tree is called accuracy, and the proportion of pixels incorrectly classified by the tree is called error (Coreman et al. [19]).
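This traversal takes only a few lines of code. A sketch, again using the TreeNode type introduced earlier:

#include <vector>

// Classify one pixel by walking from the root to a leaf, taking the
// branch selected by the band/threshold test at each internal node;
// the label stored at the leaf is the class assigned to the pixel.
int classify(const TreeNode* node, const std::vector<double>& pixel) {
    while (!node->isLeaf)
        node = (pixel[node->band] <= node->threshold)
                   ? node->left.get()
                   : node->right.get();
    return node->classLabel;
}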

IV. THE PROPOSED CRITERIA
To construct a classification tree, it is assumed that the spectral distributions of each class are available. The decision tree is then constructed by recursively partitioning the spectral distributions into purer, more homogeneous subsets on the basis of tests applied to feature values at each node of the tree, employing a recursive divide-and-conquer strategy. This approach to decision tree construction thus corresponds to a top-down greedy algorithm that makes locally optimal decisions at each node. The steps involved in this process can be summarized as follows:
- Optimal band selection: using the separability matrix, a threshold and a band are chosen that partition the training set in an optimal manner.
- Based on a partitioning strategy, the current training set is divided into two training subsets by taking into account the value of the threshold.
- When the stopping criterion is satisfied, the training subset is declared a leaf.
The separability matrices are obtained for the respective bands by calculating the spectral distance between the pairs of classes indexed by row and column. The spectral distance between any two classes is calculated as the difference between the minimum spectral value of a class and the maximum spectral value of its preceding class in a particular band. The greater the spectral distance, the greater the separability; a spectral distance less than zero indicates overlap between the classes.
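A sketch of this computation for one band in C++ is given below, under the assumption (made explicit in Section IV-A) that the classes have already been arranged in ascending order of their midpoints; only the upper-triangular entries are filled, and the structure names are illustrative.

#include <cstddef>
#include <vector>

struct Range { double minVal, maxVal; };  // spectral range of one class

// Separability matrix for one band: entry (i, j), j > i, is the
// spectral distance min(class j) - max(class i) between a class and
// the classes preceding it; a negative entry indicates overlap.
std::vector<std::vector<double>>
separabilityMatrix(const std::vector<Range>& cls) {
    const std::size_t n = cls.size();
    std::vector<std::vector<double>> m(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            m[i][j] = cls[j].minVal - cls[i].maxVal;
    return m;
}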

A. Splitting Criteria
An attempt has been made to define splitting rules by calculating the separability matrix so as to split the given set of classes into two subsets between which the separability is maximum, or the amount of spectral overlap is minimum. That is, the split groups together a number of classes that are similar in some characteristic near the top of the tree, and isolates single classes at the bottom of the tree. First, the spectral classes are arranged in ascending order of the values of their midpoints: the lower the midpoint, the lower the class order. Here the midpoint of a class is calculated as (Min + Max)/2. This ordering reduces the matrix computation and memory allocation, since only the upper-diagonal elements need be considered when finding the threshold. The threshold value is calculated as the midpoint between the spectral distributions of the classes in the band where the separability is maximum or the overlap is minimum. Once the split is found, it is used to form the subsets at each node.
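To make the ordering and threshold choice concrete, the following C++ sketch selects, within a single band, the midpoint threshold across the widest gap between consecutive class distributions; the full algorithm repeats this over all K bands and keeps the best band. The structure and function names are illustrative assumptions.

#include <algorithm>
#include <cstddef>
#include <vector>

struct ClassRange { int id; double minVal, maxVal; };

// Within one band: sort classes by midpoint (Min + Max) / 2, then put
// the threshold at the midpoint of the widest gap between consecutive
// distributions, i.e. where separability is maximum (or overlap is
// minimum). splitAfter reports how many classes fall left of the split.
double bestThresholdInBand(std::vector<ClassRange> cls,
                           std::size_t& splitAfter) {
    std::sort(cls.begin(), cls.end(),
              [](const ClassRange& a, const ClassRange& b) {
                  return a.minVal + a.maxVal < b.minVal + b.maxVal;
              });
    double bestGap = -1e300, thr = 0.0;
    splitAfter = 0;
    for (std::size_t i = 0; i + 1 < cls.size(); ++i) {
        const double gap = cls[i + 1].minVal - cls[i].maxVal;
        if (gap > bestGap) {
            bestGap = gap;
            thr = (cls[i].maxVal + cls[i + 1].minVal) / 2.0;
            splitAfter = i + 1;   // classes 0..i go to the left subset
        }
    }
    return thr;
}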

B. Determining the Terminal Node or Stopping Criteria
When a subset of classes becomes pure, create a node and label it with the class number of the pure subset containing only one class.
Step 3: Find the threshold value that divides the set of classes S into two subsets, say $S_L$ and $S_R$, such that the separability between the pair of classes is maximum. The procedure is as follows.
Case 1: In all the K matrices, consider only the rows in which every element to the right of the diagonal is positive. If no such row exists in any of the matrices, go to Case 2. Find the minimum element of each such row, and then the maximum of all such minima. Let this element be $m^b_{rc}$, lying in the $r$-th row and $c$-th column of matrix $M_b$; the split is then made between the classes represented by row $r$ and column $c$ in band $b$. Compute the threshold T as the midpoint between these two class distributions, let BAND ← $b$, and go to Step 4.

Case 2: In all K matrices, consider only the rows having at least one positive element. Find the minimum element of each such row, and then the maximum of all such minima. Let this element be $m^b_{rc}$, lying in the $r$-th row and $c$-th column of matrix $M_b$; the split is then made between the classes represented by row $r$ and column $c$ in band $b$. Find the threshold (T ← T2) as in Case 1 and let BAND ← BAND2 ← $b$. Check whether the threshold T2 lies in any of the spectral ranges in band $b$ other than those of the class distributions represented by $r$ and $c$:
i) If no, go to Step 4.
ii) If yes, compute the corresponding spectral distances, find the maximum of all such minima, call it EF2, and go to Case 3.

Case 3: From all the upper-triangular elements of all K matrices, find the minimum negative element, say $m^b_{rc}$, lying in the $r$-th row and $c$-th column of matrix $M_b$. Find the maximum of all such minima and call it EF3. If EF2 ≥ EF3, let T ← T2 and BAND ← BAND2, and go to Step 4.
Step 4: Assign the threshold (T) & BAND to the node N.
Step 5: Find the subsets of classes, say the left subset ($S_L$) and the right subset ($S_R$), as follows.

Step 6: $S_L$ ← the set of classes whose distributions have range maximum or midpoint ≤ T in band BAND; let $n_L$ be the number of classes in $S_L$. $S_R$ ← the set of classes whose distributions have range minimum or midpoint > T in band BAND; let $n_R$ be the number of classes in $S_R$. Initialize the left node ($N_L$) of N with all the classes in $S_L$ and the right node ($N_R$) of N with all the classes in $S_R$.

A. Special Case
When n = 2 and the two classes overlap in all the bands, we can construct a decision tree in two ways. We choose way 1 when node A is the left child of its parent, and way 2 when it is the right child, because when node A is on the left branch and the threshold in node A and the threshold in its parent are chosen from the same band, the error in classification is reduced; the same reasoning applies to the other case.

VI. APPLICATION AND CASE STUDY
To validate the applicability of the proposed decision tree algorithm, a case study carried out on a sample IRS-1C/LISS III image with 23 m resolution is presented in this section. The FCC (False Color Composite) of the input image (Figure 1) covers the Mayurakshi reservoir (latitude 24°09'47.27"N, longitude 87°17'49.41"E) of Jharkhand state, India, and the bands used to prepare the FCC are Red (R), Green (G) and Near Infrared (NIR). The IRS-1C/LISS III data of the Mayurakshi reservoir were acquired in October 2004. The input image is first converted into a bitmap and then used as input to the DTCUSM software. As per the software requirements, the training set and test set were selected. For the present study, eleven classes are considered, viz. Turbid Water, Clear Water, Forest, Dense Forest, Upland Fallow, Lateritic Cappings, River Sand, Sand, Drainage, Fallow, and Wetland. The training set for these eleven classes was chosen using prior knowledge of the hue, tone and texture of these classes; in addition, physical verification on the ground was carried out for each class. The spectral class distributions for the training set taken from the input image are shown in Table I.
Once the training sets for all the classes are chosen, the proposed decision tree algorithm is applied to classify the image. The decision tree construction steps for the spectral class distributions given in Table I are shown in Figure 2. The output image after applying the proposed decision tree classification method is shown in Figure 3.
To assess the accuracy of the proposed technique, the confusion matrix, the errors of omission and commission, the overall accuracy and the kappa coefficient (Jensen, 1996) are obtained and shown in Table II. For comparison, the same image is classified again using the Maximum Likelihood Classifier (Jensen, 1996) with the same training sets as used in the decision tree classification. The image classified by the maximum likelihood method is shown in Figure 4. The percentage of pixels in each class is given for both cases in Figure 5.
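For reference, the overall accuracy and kappa coefficient used here follow the standard confusion-matrix definitions (Jensen, 1996); a minimal C++ sketch of both computations is given below.

#include <cstddef>
#include <vector>

// Overall accuracy and kappa coefficient from a square confusion
// matrix cm, where cm[i][j] counts pixels of reference class i that
// were assigned to class j by the classifier.
void accuracyAndKappa(const std::vector<std::vector<long>>& cm,
                      double& overall, double& kappa) {
    const std::size_t n = cm.size();
    double total = 0.0, diag = 0.0;
    std::vector<double> row(n, 0.0), col(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            total  += cm[i][j];
            row[i] += cm[i][j];
            col[j] += cm[i][j];
            if (i == j) diag += cm[i][j];
        }
    double chance = 0.0;                    // agreement expected by chance
    for (std::size_t i = 0; i < n; ++i) chance += row[i] * col[i];
    const double po = diag / total;         // observed agreement
    const double pe = chance / (total * total);
    overall = po;
    kappa = (po - pe) / (1.0 - pe);         // kappa coefficient
}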

VII. SUMMARY AND CONCLUSION
In this paper, a decision tree classification algorithm for remotely sensed satellite data using the separability matrix of the spectral distributions of probable classes has been developed. To test and validate the proposed decision tree algorithm, the sample image considered is a multi-spectral IRS-1C/LISS III image of the Mayurakshi reservoir of Jharkhand state. The proposed decision tree classifier can also be used for hyperspectral remote sensing data by taking the best bands as input for preparing the spectral class distributions. The sample image is classified by both the decision tree method and the maximum likelihood method, and the overall accuracy and kappa coefficients were calculated. The overall accuracy for the sample test image was found to be 98% using the decision tree method and 95% using the maximum likelihood method, with kappa values of 97% and 94%, respectively. The high accuracy may to some extent be attributed to the fact that part of the training set is considered as ground truth instead of independent data. Since the accuracy of the results depends only upon the test set chosen, the efficiency of an algorithm should not be judged on the accuracy measure alone. Of the eleven classes considered for the sample image, many were found to match closely in both methods; however, differences are observed for certain classes. The classified images were also compared with the input image (FCC) and with ground-truth information collected in the field. From the comparison, it is found that both methods are equally efficient, but the decision tree algorithm has an edge over its statistical counterpart because of its simplicity, flexibility and computational efficiency.


Figure 2. Decision tree construction steps for the distributions given in Table I.

Figure 3. Classified image using the proposed decision tree classifier.

Figure 4. Classified image using the standard maximum likelihood classifier.

Figure 5. Class-wise percentage of pixels for the decision tree and maximum likelihood methods.

TABLE I. SPECTRAL CLASS DISTRIBUTIONS FOR THE TRAINING SET TAKEN FROM THE IMAGE IN FIGURE 1.

TABLE II. CONFUSION MATRIX AND ACCURACY CALCULATIONS FOR THE DECISION TREE METHOD.