New Divide and Conquer Method on Endmember Extraction Techniques

In hyperspectral imagery, endmember extraction (EE) is a main stage of the hyperspectral unmixing process. Its role is to extract distinct spectral signatures, called endmembers, from a hyperspectral image; these are the main input to unsupervised hyperspectral unmixing, which generates the abundance fractions for every pixel in the hyperspectral data. The EE process faces two difficulties: some endmembers are less distinct than their mixed background, and some occur only rarely in the data. In this paper, we propose a new technique that applies the divide and conquer method to the EE process in order to find these difficult (rare or less distinct) endmembers. The method divides the hyperspectral scene into multiple divisions, treats each division as a standalone scene so that endmember extraction algorithms (EEAs) can extract difficult endmembers more easily, and finally combines the endmembers extracted from all divisions. We implemented this method on a real dataset using three EEAs (ATGP, VCA, and SGA); for all three algorithms, the recorded results outperform those of the usual endmember extraction approach.

Keywords—Endmember extraction algorithm (EEA); endmember extraction (EE); automatic target generation process (ATGP); hyperspectral imagery; simplex growing algorithm (SGA); hyperspectral unmixing; vertex component analysis (VCA); divide and conquer method

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 8, No. 7, 2017, www.ijacsa.thesai.org


I. INTRODUCTION
Endmember extraction is an important and crucial step in hyperspectral data exploitation. A pixel in hyperspectral data may be either a pure pixel or a mixed pixel. A pure pixel represents an endmember (EM) that exists in the scene. A mixed pixel contains contributions from a group of different endmembers that exist in the scene. An endmember is therefore considered a pure signature for a class [1]. Generally, an endmember is not a pixel; it is a spectral signature that is specified completely by the spectrum of a single material substance.
Several endmember extraction methods have been developed to extract pure pixels from hyperspectral data. Here we use three different algorithms. The first is the Automatic Target Generation Process (ATGP), which finds its targets by using a sequence of orthogonal subspaces with the maximal orthogonal projections [2], [5], [7], [8]; ATGP is considered the unsupervised version of the Orthogonal Subspace Projection (OSP) algorithm. The second is the Simplex Growing Algorithm (SGA) [3], [8], which finds its endmembers by growing a simplex, vertex by vertex, until it reaches the required number of endmembers, represented by the vertices of the simplex. The last is Vertex Component Analysis (VCA) [4], [8], an orthogonal projection (OP)-based EEA that reduces computational complexity by replacing simplex volume calculation with OP and by growing nonnegative convex hulls, vertex by vertex, until it builds a p-vertex convex hull (p denotes the number of endmembers to be extracted).
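To make the ATGP description above concrete, the following is a minimal pure-Python sketch of target selection by successive orthogonal projection. It is an illustration under stated assumptions, not the implementation used in this paper: the function names `dot` and `atgp` are hypothetical, pixels are assumed to be plain lists of band values, and the orthogonal-complement projection is carried out by Gram-Schmidt deflation rather than by forming a projection matrix explicitly.

```python
def dot(u, v):
    """Inner product of two equal-length spectra."""
    return sum(a * b for a, b in zip(u, v))


def atgp(pixels, p):
    """Return the indices of up to p target pixels (illustrative sketch).

    pixels: list of spectra, each a list of L band values (assumed layout).
    The first target is the pixel with the largest norm; each later target
    is the pixel with the largest norm after projection onto the orthogonal
    complement of the targets found so far.
    """
    # residuals[i] holds pixel i with the span of the found targets removed.
    residuals = [list(r) for r in pixels]
    targets = []
    for _ in range(p):
        idx = max(range(len(residuals)),
                  key=lambda i: dot(residuals[i], residuals[i]))
        energy = dot(residuals[idx], residuals[idx])
        if energy <= 1e-12:
            break  # every pixel already lies in the span of the targets
        targets.append(idx)
        norm = energy ** 0.5
        q = [x / norm for x in residuals[idx]]  # new orthonormal direction
        deflated = []
        for r in residuals:
            c = dot(r, q)
            deflated.append([x - c * y for x, y in zip(r, q)])
        residuals = deflated
    return targets
```

Because each step only removes one new direction from every pixel, the cost of a run grows with both the number of pixels and p, which is consistent with ATGP being deterministic: the same input always yields the same targets.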
The authors of [6] present several EEAs, including ATGP, VCA, and SGA, and demonstrate their efficiency using criteria such as sequential or parallel implementation and dimensionality reduction. ATGP, VCA, and SGA are the most widely used algorithms in EE [8]. They are similar in design but differ in their preprocessing steps. Some research uses both the spatial and the spectral information of hyperspectral data to enhance EEAs. An over-segmentation-based method introduced in [9] exploits spatial and spectral information to enhance the computational performance of EEAs. An enhancement suggested in [10] guides the EE process toward spatially homogeneous regions and consequently improves the performance of the unmixing process.
This paper contributes a method that enables EEAs to find difficult endmembers that they could not find on their own. The paper is organized as follows. Section 2 introduces the linear mixture model. Section 3 describes the proposed method. The dataset used is introduced in Section 4. Results and discussions are provided in Section 5. The conclusion is given in Section 6.

II. LINEAR MIXTURE MODEL
The linear mixture model is a well-known approach for the determination and quantification of materials in hyperspectral images. A hyperspectral image consists of pixels, where every pixel is represented by a vector with one value per spectral band; each value is the reflectance of the material at a specific wavelength.
Let r be an L × 1 column vector in a hyperspectral image, where L refers to the number of bands. Suppose that there are p materials in the hyperspectral image and M = [m_1 m_2 ... m_p] is an L × p matrix of material signatures, where m_j is an L × 1 column vector of the j-th material signature in the hyperspectral image. Assume that a is a p × 1 abundance column vector, denoted (a_1, a_2, ..., a_p)^T, that is associated with r (a_k represents the abundance fraction of the k-th signature present in the pixel vector r).
Linear unmixing can solve this mixed pixel problem. It assumes that the spectral signature r can be represented by a linear regression model as in (1), where r is linearly mixed from p material signatures.

 r = Ma + n    (1)
where n is noise. In the unsupervised hyperspectral unmixing process, each image pixel r is known, while M and a are unknown. Endmember extraction algorithms extract the matrix M from the hyperspectral image; M is then used as input to a linear unmixing method, which estimates the unknown abundance fractions by inverting the linear mixture model.
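As a small worked instance of (1), the sketch below synthesizes a mixed pixel from a signature matrix and an abundance vector. The function name `mix`, the row-wise list layout of M, and the Gaussian noise model are assumptions made for illustration; the paper does not prescribe a noise distribution.

```python
import random


def mix(M, a, noise_std=0.0, seed=0):
    """Return r = M a + n for one pixel (illustrative sketch).

    M: L x p signature matrix as a list of L rows, each with p entries.
    a: abundance vector with p entries.
    noise_std: standard deviation of the assumed zero-mean Gaussian noise n.
    """
    rng = random.Random(seed)
    r = []
    for row in M:
        value = sum(m_lk * a_k for m_lk, a_k in zip(row, a))
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        r.append(value + noise)
    return r


# Two materials (p = 2) observed in three bands (L = 3).
M = [[1.0, 0.0],
     [0.0, 1.0],
     [0.5, 0.5]]
a = [0.25, 0.75]
r = mix(M, a)  # noise-free mixed pixel: [0.25, 0.75, 0.5]
```

Unmixing runs this relation in reverse: given r and an extracted M, it estimates a, which is why the quality of the extracted endmembers directly bounds the quality of the abundance estimates.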

III. PROPOSED METHOD
From the spectral viewpoint, endmembers in the scene have distinct signatures. These endmembers are the target of any EEA, regardless of its design and implementation. The EEAs used extract all vertices of the simplex as endmembers, as shown in Fig. 1, where the vertices of the larger triangle are E1, E4, and E5 but the vertices of the small triangle are E1, E2, and E3. From the EEAs' viewpoint, the endmember set of the small triangle differs from that of the larger triangle. Notice that E2 and E3 cannot be extracted from the larger triangle unless we divide the data into sections, which raises the probability of extracting them with the EEAs used.
In this section, a new technique that uses the divide-and-conquer method in endmember extraction algorithms is proposed.
Not all extracted pixels are necessarily pure pixels that represent a material signature resident in the hyperspectral scene. Usually, some of the pixels extracted by the EEAs are mixed. This is normal because each EEA has its own strategy for finding the endmember set. As a result, EEAs often fail to find all material signatures. The proposed technique tries to solve this problem and enhance the EEA results. To test the method, we used a real dataset (as explained in the next section) along with its ground truth abundance matrix. Fig. 2 explains the workflow used in the proposed technique. There are five stages in the workflow, beginning with the hyperspectral image (HSI). First, the HSI is divided spatially according to an N×N scheme to create N² equivalent segments, where every segment is considered a standalone HSI.
Following this, the EEAs are applied to every segment to extract p endmembers, which are recorded in a p-list (p refers to the expected number of endmembers in the HSI).
In the subsequent stage, all p-lists are assembled into an overall-list that contains p × N² endmembers. Next, each endmember in the overall-list is validated using the abundance map.
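The divide, extract, and assemble stages described above can be sketched as follows. This is a minimal illustration rather than the authors' code: `segment_bounds` and `divide_and_conquer` are hypothetical names, the image is assumed to be a nested list `image[row][col]` of spectra, and `eea` stands for any extraction routine that returns indices into the pixel list it is given. When the scene size is not divisible by N, the integer edges used here make the segments differ by at most one row or column, which is an assumption about how "equivalent segments" are formed.

```python
def segment_bounds(rows, cols, n):
    """Split a rows x cols scene into an n x n grid.

    Returns a list of half-open bounds (r0, r1, c0, c1), one per segment.
    """
    r_edges = [rows * i // n for i in range(n + 1)]
    c_edges = [cols * j // n for j in range(n + 1)]
    return [(r_edges[i], r_edges[i + 1], c_edges[j], c_edges[j + 1])
            for i in range(n) for j in range(n)]


def divide_and_conquer(image, n, p, eea):
    """Apply eea to each of the n x n segments and assemble the overall-list.

    image[row][col] is a spectrum; eea(pixels, p) returns indices into pixels.
    Returns (row, col) positions in the full scene, at most p * n * n entries.
    """
    rows, cols = len(image), len(image[0])
    overall = []
    for r0, r1, c0, c1 in segment_bounds(rows, cols, n):
        coords = [(r, c) for r in range(r0, r1) for c in range(c0, c1)]
        pixels = [image[r][c] for r, c in coords]
        p_list = [coords[k] for k in eea(pixels, p)]  # this segment's p-list
        overall.extend(p_list)
    return overall
```

Keeping the positions in full-scene coordinates means the overall-list can be checked against a ground truth abundance map without any further bookkeeping per segment.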
A. The different schemes used in the proposed method are as follows:
• No division scheme: the dataset remains as one segment, the EEAs are applied to the full dataset, and the results are recorded.
Jasper Ridge is one of the popular datasets used in hyperspectral data analysis [11], [12]. Jasper Ridge is a data cube consisting of 512 rows × 614 columns × 224 bands. Its spectral range extends from 0.38 to 2.5 microns. For simplicity, we cut a subset of 100 rows × 100 columns from the original dataset, as shown in Fig. 3. This subset starts at the pixel in the 105th row and 269th column of the whole dataset. Because of atmospheric effects and water vapor absorption, 26 bad bands are discarded from the total of 224 bands, as follows: 1:3, 108:112, 154:166, 220:224. The remaining 198 bands were used for analysis.
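The band bookkeeping above can be checked with a few lines of Python. This is only a sketch: `parse_ranges` is a hypothetical helper, and the band ranges are assumed to be 1-based and inclusive at both ends, which is the reading under which the stated counts of 26 and 198 are consistent.

```python
def parse_ranges(spec):
    """Expand a spec such as "1:3,108:112" into a set of band numbers.

    Ranges are assumed to be 1-based and inclusive at both ends.
    """
    bands = set()
    for part in spec.split(","):
        lo, hi = (int(x) for x in part.split(":"))
        bands.update(range(lo, hi + 1))
    return bands


TOTAL_BANDS = 224
BAD = parse_ranges("1:3,108:112,154:166,220:224")
KEPT = [b for b in range(1, TOTAL_BANDS + 1) if b not in BAD]

print(len(BAD), len(KEPT))  # 26 198
```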
There are four endmembers in the Jasper Ridge data: Tree, Water, Soil, and Road. Their abundance images are shown in Fig. 4(a). The Jasper Ridge dataset has an abundance map that is restricted by the Abundance Non-negativity Constraint (ANC) and the Abundance Sum-to-one Constraint (ASC). Because of noise and other calibration problems, we consider a pixel whose abundance fraction for one endmember is greater than 90% to be a pure pixel. Fig. 4(b) illustrates the pure pixels for every endmember in the map.

V. RESULTS AND DISCUSSIONS
This section describes the experiments executed on the Jasper Ridge dataset and a synthetic dataset. It gives an overall analysis demonstrating that the results of EEAs using the proposed D&C method outperform the results of EEAs without D&C.
Three EEAs are used in the experiments (ATGP, SGA, and VCA). ATGP is a deterministic algorithm that extracts the same set of endmembers in every run, so it was executed only once. In contrast, VCA and SGA are random algorithms, so each was executed in three different runs and the results were recorded separately.
Applying an EEA to any dataset under the No Division Scheme gives a set of p extracted endmembers (where p is the number of endmembers resident in the dataset). Under Scheme 2×2, any EEA yields 4 × p extracted endmembers; likewise, Scheme 3×3 and Scheme 4×4 yield 9 × p and 16 × p extracted endmembers, respectively. Fig. 5 illustrates the different division schemes used on the Jasper dataset. The workflow introduced earlier is applied to every scheme, and the extracted endmember sets are gathered into one overall set for each scheme. All pixels extracted by the EEAs from the Jasper dataset are validated against the abundance image of the Jasper Ridge dataset as follows:
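A minimal sketch of this validation step is given below, assuming the purity rule stated in the dataset section (an abundance fraction greater than 90% marks a pure pixel). The names `classify_pixel` and `found_materials`, the nested-list layout of the abundance map, and the ordering of the four materials within an abundance vector are illustrative assumptions, not details taken from the dataset files.

```python
# Assumed ordering of the abundance vector entries.
MATERIALS = ("Tree", "Water", "Soil", "Road")


def classify_pixel(abundances, threshold=0.9):
    """Return the material name if one fraction exceeds threshold, else "Mixed"."""
    best = max(range(len(abundances)), key=lambda k: abundances[k])
    return MATERIALS[best] if abundances[best] > threshold else "Mixed"


def found_materials(extracted, abundance_map, threshold=0.9):
    """Return the set of materials for which at least one pure pixel was extracted.

    extracted: (row, col) positions returned by an EEA.
    abundance_map[row][col]: ground truth abundance vector of that pixel.
    """
    labels = {classify_pixel(abundance_map[r][c], threshold)
              for r, c in extracted}
    return labels - {"Mixed"}
```

An EEA run can then be summarized by comparing `found_materials(...)` with the full set of four materials; any material missing from the result had no pure pixel among the extracted ones.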

A. No Divisions Scheme Results
Table 2 shows the results extracted by the EEAs using the No Divisions Scheme. ATGP extracted four pixels; two of them were pure pixels, one for Tree and one for Soil. The other two were mixed pixels, and ATGP could not extract any pure pixels for Water or Road.
VCA #1 and VCA #2 each extracted two pure pixels, one for Water and one for Soil, but could not extract any pure pixels for Tree or Road. VCA #3, however, extracted three pure pixels among its four extracted pixels; only Road could not be extracted.
All SGA runs extracted the same two pure pixels, Tree and Soil, while no pure pixels were found for Water or Road.

B. 2×2 Divisions Scheme Results
Table 3 shows the results of applying the EEAs using the 2×2 Divisions Scheme. ATGP extracted four groups of four pixels each, for a total of 16 pixels extracted as endmembers. Five of the sixteen were pure pixels, representing only Tree and Soil; the other 11 were mixed pixels. ATGP was still unable to find pure pixels for the Water and Road signatures. All VCA runs gave the same results, extracting all material signatures except Road. SGA, like ATGP, could not find the Water and Road signatures.

C. 3×3 Divisions Scheme Results
The results extracted by the EEAs using the 3×3 Divisions Scheme are listed in Table 4. ATGP was able to find two pure pixels for Road. The results of VCA also improved, and all material signatures were extracted. SGA, like ATGP, extracted a pure pixel for the Road signature (one in this case) but remained unable to extract any pure pixels for the Water spectral signature.

D. 4×4 Divisions Scheme Results
Finally, Table 5 shows the pixels extracted by the EEAs using the 4×4 Divisions Scheme. This scheme set appropriate conditions for the different EEAs to find pure pixels for all material signatures in the dataset.

E. Computation Time
Different division schemes divide the dataset into different numbers of divisions, as shown in Table 1; as the number of divisions increases, the division size gets smaller. This section describes the change in computational time across the divisions used. Table 6 lists the computational time consumed, in seconds, by the different EEAs under the different division schemes; the same data are plotted in Fig. 6.
For ATGP, the computational time under the N×N Divisions Scheme declines as N increases, and for SGA it decreases slightly. VCA behaves differently: its time increases slightly as N increases.

F. No Division Scheme vs. Other Schemes from the Viewpoint of Extracted p
In the first experiments, the No Division scheme was used to extract only 4 endmembers (p = 4), since the expected number of endmembers in the dataset is 4 (Tree, Water, Soil, and Road). Each division in the other division schemes was also used to extract 4 endmembers.
This is a fair comparison among the division schemes in the sense that each division, considered as a standalone scene, is given the same chance to extract p endmembers. It is not a fair comparison, however, in terms of the total number of endmembers extracted, which equals p × N² for the N×N Divisions Scheme.
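For the schemes used here, the totals p × N² work out as in the short sketch below (the function name `total_extracted` is illustrative). These totals are the values of p that a no-division run would need in order to extract as many endmembers as the corresponding division scheme.

```python
def total_extracted(p, n):
    """Total endmembers extracted by an n x n scheme with p per division."""
    return p * n * n


matched_p = {n: total_extracted(4, n) for n in (2, 3, 4)}
print(matched_p)  # {2: 16, 3: 36, 4: 64}
```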
In this experiment, the No Division scheme is used to extract the same total number of endmembers as the other division schemes. Based on experiments conducted on the whole dataset (No Division Scheme) with p = 16, 36, and 64, we discuss the extracted results and the computational time taken in the following two sections.
Table 7 reports an experiment conducted on the whole Jasper dataset with p = 16. ATGP and SGA extracted pure pixels for all material signatures except Water. Although 16 pixels were extracted, 10 of them are mixed pixels. The VCA results varied: VCA #1 extracted all material signatures, VCA #2 extracted all except Road, and VCA #3 extracted only the Tree and Soil signatures.
Table 8 reports an experiment with p = 36. As with p = 16, ATGP and SGA agreed on the same results: Water was still not extracted, but all other material signatures were. VCA extracted all material signatures, sometimes including Water and sometimes without it.
Finally, Table 9 lists the results for p = 64. ATGP and SGA were unable to extract the Water signature, while VCA was able to extract pure pixels for all material signatures.

2) Computational time consumed using different values for p
The computational times for the experiments conducted with different values of p (p = 16, 36, and 64) are tabulated in Table 10. Comparing the computational time consumed by the different EEAs, we found that VCA's time increased linearly, making it the slowest-growing algorithm in computational time. ATGP's time increased greatly with p. SGA's time increased dramatically, which indicates the difficulty of its implementation as p increases. For all EEAs, computational time increased with p when the data were not divided spatially. In contrast, with the division schemes, computational time declined (ATGP and SGA) or at most increased slightly (VCA). The division schemes thus consumed less computational time while also improving the results of the EEAs.

VI. CONCLUSION
The unsupervised hyperspectral unmixing process first needs an endmember extraction process to extract the endmembers resident in the hyperspectral scene. EEAs have difficulty finding less distinct and scarce endmembers in the scene. Our proposed method divided the dataset into equivalent sections, treated each section as a standalone dataset, applied the EEAs to each section, and grouped the extracted endmember sets of the same division scheme into one overall set. VCA could find pure pixels representing all material signatures once the data were divided into smaller, more homogeneous divisions, while ATGP and SGA required even smaller divisions.
By comparing the overall sets for the different division schemes, we found that dividing the data into sections can help EEAs find rare and less distinct endmembers, while the computational time decreases (ATGP and SGA) or at most increases slightly (VCA).
One often needs to increase the value of p to make EEAs more capable of finding pure pixels in a hyperspectral image. However, this takes considerable computational time and does not guarantee finding pure pixels that represent all material signatures in the scene. Results could instead be enhanced by using different division schemes, which not only improve the finding of pure pixels but also decrease the computational time consumed.
We divided the data into 4, 9, and 16 sections and did not need more divisions. An open question is when to stop dividing the data; this work can be extended by defining a stopping condition for further divisions.

Fig. 6. Computational time consumed for different EEAs using different division schemes. (* Average computational time of three runs.)

Table 1 illustrates the number of divisions and the number of expected extracted endmembers for the different schemes used on the Jasper Ridge dataset. Tables 2 to 4 demonstrate the results of applying the EEAs with different division schemes to the Jasper dataset.

TABLE III. EXTRACTED PURE PIXELS USING 2×2 SCHEME

TABLE V
* Average computational time of three runs

TABLE VII. EXTRACTED PURE PIXELS USING NO DIVISION SCHEME (P=16)

TABLE IX. EXTRACTED PURE PIXELS USING NO DIVISION SCHEME (P=64)
* Average computational time of three runs