Design of Multi-View Graph Embedding for Features Selection and Remotely Sensing Signal Classification

Signal processing remains an intensive and challenging area of research. Various strategies have been suggested to address semi-supervised learning, feature selection, and unlabeled-sample challenges. Most existing work exploits a single kind of feature (view) of the original data. Recently, advanced techniques have aimed to explore signals from different views and to properly integrate divergent kinds of interdependent features. In this paper, we propose a novel design of a multi-view graph embedding for feature selection that allows a convenient integration of complementary weighted features. The proposed framework combines the singular properties of each feature space to accomplish a physically meaningful, cooperative low-dimensional selection of the input data. This allows us not only to perform semi-supervised classification, but also to propagate the limited class information to unlabeled samples when only partial labeling knowledge is available. This paper makes the following contributions: (i) a feature selection schema for data refinement; and (ii) the adaptation of a multi-view graph-based approach to better tackle semi-supervised and dimensionality issues. Our experimental results, obtained using a mixture of complementary features on aerial image datasets, demonstrate the effectiveness of the proposed framework without significantly increasing computational complexity.
Keywords: Signal processing; remote sensing images; feature selection; graph embedding; unlabeled samples


I. INTRODUCTION
In both the signal processing and remote sensing communities, the number and size of available images have grown considerably, which poses a noticeable challenge to traditional signal processing techniques [1]. When dealing with remotely sensed images, many types of features (e.g., spectral, texture, or spatial features) can be used to characterize pixels from different kinds of views [2]. In fact, owing to increasing computational capabilities and the steady advancement of feature extraction methods, signals are now described by disparate aspects [3]. Hence, a single pixel can be seen from different "views", each of which is composed of a particular subspace of descriptors.
With such a large set of available features, the multiple feature views generally have different statistical properties. The concatenation of these views makes it possible to take advantage of the complementarity between these spaces and consequently to increase the classification accuracy. It is therefore attractive to handle the whole input space, including all available descriptors. Nevertheless, simple concatenation without proper modeling leads to problems such as feature biasing [4]. In other terms, this investigation confronts two major challenges. The first is the computational complexity, which grows with the size of the input space. The second is that the sensitivity of the classification model is affected by insignificant features, which should be ignored [5]. Multi-view learning has recently received noticeable attention in signal processing applications [6]. While there has been progress in multi-view classification, most existing techniques are based only on the classical vector representation of the features of each view and on their fusion in the classification step. Nonetheless, the complex structure of most common signal forms and the inadequacy of vector modeling introduce significant challenges [7]. A better-adapted modeling of the extracted features allows a finer capture of the inherent structural information. To address this issue, we propose in this work a novel approach involving graph-based classification enhanced by a feature selection schema. The main intention of this investigation is to find low-dimensional representations as a gateway for graph embedding while preserving the inherent structure of the original data.
To outline, our principal contributions are threefold:
• A novel graph embedding schema is introduced for multi-view classification, and an alternating model based on the SSMF algorithm [8] is presented to efficiently optimize remote sensing signal classification.
• The feature selection step, considered in previous works as a pre-processing step, is incorporated into the proposed algorithm, thus enabling us to eliminate superfluous features while controlling the dimensionality issue.
• Extensive experiments conducted on two different datasets demonstrate the capability of the proposed technique.
The remainder of the paper is organized as follows. Section II is dedicated to problem formulation and literature review. Section III presents the proposed approach in greater detail. Section IV presents the experiments on a synthetic/real data set and highlights the obtained outcomes. Section V presents the discussion. Finally, Section VI draws the conclusions and outlines the future directions of our work.

II. RELATED WORKS
The classification approaches dedicated to image processing can be divided into three main families: supervised, semi-supervised, and unsupervised techniques. These approaches can also be divided into hard and soft techniques. The so-called full-pixel techniques, which assume that each pixel is associated with a pure land cover type, have been found inappropriate for the processing of mixed pixels. A widely used alternative is soft classification [9]. These methods do not assign a pixel to exactly one class; instead, they generate a set of fractions that reflects, for each pixel, the degree of membership in each class. This makes it necessary to include different types of features/views in order to better classify this kind of pixel.
When tackling image processing from a multi-view perspective, various refinements can be envisaged by considering remotely sensed images as a valuable source of information. In fact, combining information from various data sources has become a prominent research topic in machine learning. Nevertheless, representing images is not a straightforward exercise in signal processing, as the number of candidate image features is practically unlimited [10]. The choice of features/model is generally related to the target application. Features can be divided into spectral, textural, and structural groups. The textural group includes features such as SIFT and Gabor texture features. The spectral group includes radiometric features and the Gray Level Co-Occurrence Matrix (GLCM). The third group includes Gaussian wavelet features, shape features, etc. [11]. Most works adopt the intuitive idea that increasing the number of features will necessarily increase the accuracy.
The associated literature has largely focused on applying the same linear fusion schema across views to all objects. Zhand et al. [12] propose an object-oriented segmentation combined with support vector machines. Nweke et al. propose a multiple-classifier system based on k-nearest neighbors [13]. Wu et al. introduce a deep learning algorithm for multi-view medical image processing [14]. Alhumaidi et al. use a serious gaming approach to manage interference in ad hoc femtocells, where a Stackelberg competition is used to elect the best FAPs without making a random choice [15]. Unfortunately, the proper integration of features with different input spaces is missing. Moreover, classifier performance is not automatically improved (and is reduced in some cases) by the profusion of multi-band images [16]. This phenomenon is a consequence of the increase in spectral class variability. Hughes showed that classification performance decreases as further features are involved. This phenomenon is known as "the curse of dimensionality" [17].
As an extension of the works discussed above, an emerging family of multi-view approaches is becoming an interesting area of research. The existing approaches adopt a "one-combo-fits-all" schema. The final model, based on feature concatenation and manifold ranking, is obtained by linearly combining the multi-view spaces, which leads to a single similarity graph [18]. Alternatively, the propagation fusion approach performs label propagation for each feature space, and a fusion step then combines the obtained outcomes. Furthermore, the feature selection step is not included in the manifold ranking, which induces some errors in the final result. In order to boost the classification accuracy, a balance should be struck between the dimensionality of the input space and the Hughes phenomenon.
The main objective is to study the effectiveness of a multi-view graph-based approach while applying a soft classification schema. Given an image with training pixels seen from multiple views, including both labeled and unlabeled samples, multi-view graph learning intends to build a classifier by incorporating the complementary information from the different views. Graph theory has recently been adopted for remotely sensed image processing. Lio et al. [19] propose a technique associating both feature fusion and decision for multi-sensor data classification. A general graph-embedding (GE) framework was proposed in [20]. In this framework, each algorithm is regarded as an undirected weighted graph that incorporates the desired properties of the original data set. Yu et al. [21] investigate the relevance scores among neighboring pixels, which drive the classification process, with a hypergraph learning schema. Our work is inspired by the success of these previous approaches and builds on it in a novel direction.
One of the prominent issues of remotely sensed images is feature redundancy, which is quite common with sensors covering wide areas. This naturally leads to an overabundance of features that seriously affects classifier accuracy. To overcome this problem, feature selection focuses on choosing a subset of the initial feature space according to a selection criterion. It is a well-established approach widely applied in pattern recognition. It reduces dimensionality by eliminating irrelevant and superfluous features, and thus improves the algorithms applied afterwards, for example by improving classification precision, boosting the readability of results, and reducing computational complexity. Depending on how label information is exploited, feature selection algorithms can be categorized as supervised, unsupervised, or semi-supervised [22]. From the viewpoint of selection design, feature selection approaches are chiefly divided into three families: filter, wrapper, and embedded. The filter approach weighs features independently, without recourse to any learning algorithm. The wrapper approach involves a learning schema to extract features and weigh their quality. Finally, the embedded approach designs the feature selection process as an integral part of the learning algorithm and uses the associated objective to guide the search for significant features. In summary, the reviewed approaches may either select a subspace of the original space or return feature weights that evaluate their relevance.
Another problem that may arise when dealing with images concerns the lack of labeled samples. This inevitably leads to low accuracy, particularly with a missing or incomplete ground truth. To overcome this problem, Shi et al. propose a hierarchical multi-view learning framework based on CNNs [23]. Aydav et al. use granular computing to improve classification [24]. The problem of missing labeled samples should be incorporated in the learning process. While the great majority of works are based on transfer learning, one trending approach, active learning, can also offer a solution to these issues.

III. PROPOSED APPROACH
The proposed approach is illustrated in Fig. 1. It begins with a pre-processing step. Next, a feature extraction stage from different views is carried out. Then, a graph embedding learning approach is proposed. The learning process includes a feature selection step, which allows redundant features to be eliminated. The first step of the proposed approach is the extraction of multi-view low-level visual features: spectral, textural, and structural. The textural view is a combination of SIFT and 2D log-Gabor texture features. The practical expression of this feature is given by Equation 1.
The spectral view is the concatenation of radiometric features and the Grey Level Co-Occurrence Matrix (GLCM). The third view includes Hu's invariant shape moments (seven forms invariant to rotation, translation, and scaling) [25]. The first four moments are given by Equation ??. The final target is to combine them in the form of a compound-feature structure. The majority of related works try to exploit the whole panoply of available features without paying attention to the curse of dimensionality, a shortcoming which certainly affects the precision of the learning model. We propose here an innovative feature selection strategy based on an embedded graph to extract efficient and suitable features from remote sensing data.
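As an illustration of the shape view, the first four Hu moments can be computed from normalized central moments using their standard textbook definitions. The sketch below is not the authors' implementation: the function name `hu_first_four` and the choice of a plain NumPy grayscale array as input are assumptions made for the example.

```python
import numpy as np

def hu_first_four(img):
    """Return the first four Hu invariant moments of a 2-D grayscale array."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    # Intensity-weighted centroid; centering gives translation invariance.
    xc = (xs * img).sum() / m00
    yc = (ys * img).sum() / m00

    def eta(p, q):
        # Normalized central moment of order (p, q); the normalization
        # by a power of m00 gives (approximate) scale invariance.
        mu = ((xs - xc) ** p * (ys - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2.0)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    phi3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    phi4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    return np.array([phi1, phi2, phi3, phi4])
```

Calling `hu_first_four` on an image patch and on a shifted or 90-degree-rotated copy of it should return the same four values up to floating-point error, which is what makes these descriptors usable as a structural view.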

1) Graph based modelling:
Each of the extracted views, presented in Fig. 1, models a particular aspect of the original data. In order to profit from the complementarity of these views, a combination approach becomes unavoidable. As discussed previously, the concatenation of these features in a single vector amplifies the dimensionality problem and affects the classification accuracy. In addition, the lack of labeled data complicates the classifier's task. In order to overcome these challenges, we propose here to model this combination using a graph learning model. We use the following notation:
• $N$: the number of pixels;
• $M$: the number of views;
• $C$: the number of classes.
Let $X = \{x_1, x_2, \ldots, x_N\}$ be a set of pixels seen from $M$ views.
Hence, the proposed approach aims to model each of these views as a graph. For each graph, we set pixels as vertices and specify edges based on the similarity between samples. An edge between nodes $i$ and $j$ is drawn if $x_i$ and $x_j$ are "proximate". Edges may be weighted based on similarity scores [?]. Therefore, we construct $M$ graphs, each using a specific kind of feature. $G_g$ denotes a $k$-NN graph built on $X$ using the $g$-th feature. Precisely, $G_g$ is constructed by linking two vertices $x_i$ and $x_j$ if one is among the $k$ nearest neighbors of the other. We denote by $W_g$ the edge affinity matrix of $G_g$. Each entry $W_g(i, j)$ of $W_g$ reflects the similarity between $x_i$ and $x_j$ according to the $g$-th feature view. If the similarity is not null, there is an edge in $G_g$ between $x_i$ and $x_j$. Otherwise, $W_g(i, j)$ is zero.
To compute $W_g(i, j)$, the Gaussian kernel, a widely used measure of similarity between data instances, is used to compute edge weights as shown in Equation 3. It has been shown to outperform other distances for data classification [?].
where $d_A(x_i, x_j)$ is the distance measure between instances $x_i$ and $x_j$, and $\sigma$ is the kernel bandwidth parameter. Owing to the high dimensionality of the extracted features, the performance of the $k$-NN rule depends strongly on the adopted metric, and this choice cannot always be optimal.
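The construction of one view's affinity matrix can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's exact formulation: it assumes that $d_A$ is the Euclidean distance and that the Gaussian kernel takes the common form $\exp(-d_A(x_i, x_j)^2 / (2\sigma^2))$; the function name `knn_gaussian_affinity` and its default parameters are hypothetical.

```python
import numpy as np

def knn_gaussian_affinity(X, k=5, sigma=1.0):
    """Build a symmetric k-NN affinity matrix with Gaussian edge weights.

    X: (n, d) array holding one view's features, one row per pixel.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances (assumed choice for d_A).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)          # exclude self-loops from the neighbor search
    nn = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest neighbors of each row
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nn.ravel()] = True
    mask |= mask.T                        # link i and j if either is a neighbor of the other
    W = np.zeros((n, n))
    W[mask] = np.exp(-d2[mask] / (2.0 * sigma ** 2))
    return W
```

Running this once per view yields the $M$ matrices $W_g$. The dense $n \times n$ distance matrix keeps the sketch short but scales quadratically with the number of pixels, so a sparse neighbor search would be preferable for full images.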
2) Feature selection: The significance of a feature can be assessed through the following question: how well does it respect the graph structure? A legitimate criterion for selecting features is based on the minimization of the Laplacian score of the $g$-th feature through the following equation:
where $f_r = [f_{r1}, f_{r2}, \cdots, f_{rn}]^T$.
With the graph $G = (V, E, W)$ constructed, we can now perform classification over the graph and assign labels to all remaining samples. Afterwards, manifold ranking adjusts the graph: the more similar two samples are, the more probably they have analogous labels. This property is called local smoothness. The labeled samples incrementally propagate their label relevance scores to unlabeled ones via graph edges until convergence. The final goal of graph learning, called "global consistency", is the consistency of the obtained labeling with the initial label information [8].
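The Laplacian score criterion mentioned in the feature selection step can be illustrated with its standard definition: each feature vector $f_r$ is centered by its degree-weighted mean, and the score is the ratio $\tilde{f}_r^T L \tilde{f}_r / \tilde{f}_r^T D \tilde{f}_r$, with $D$ the degree matrix and $L = D - W$. Whether the paper uses exactly this form cannot be confirmed from the text, so the sketch below is an assumption, as is the function name `laplacian_scores`.

```python
import numpy as np

def laplacian_scores(F, W):
    """Laplacian score of each feature column of F given affinity matrix W.

    F: (n, d) feature matrix of one view; W: (n, n) symmetric affinity matrix.
    Lower scores indicate features that better preserve the graph structure.
    """
    F = np.asarray(F, dtype=float)
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1)                            # diagonal of the degree matrix D
    lap = np.diag(deg) - W                         # unnormalized graph Laplacian L = D - W
    # Remove the degree-weighted mean from every feature column.
    F_c = F - (deg @ F) / deg.sum()
    num = np.einsum("ir,ij,jr->r", F_c, lap, F_c)  # f^T L f for each feature
    den = np.einsum("ir,i,ir->r", F_c, deg, F_c)   # f^T D f for each feature
    return num / np.maximum(den, 1e-12)            # guard against constant features
```

Features can then be ranked by ascending score and the lowest-scoring ones retained; the number of features to keep is a free parameter that this sketch does not choose.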
Let $D_g$ be the diagonal degree matrix of $G_g$, where each component $D_g(i, i)$ is specified as $D_g(i, i) = \sum_{j=1}^{n} W_g(i, j)$. In a semi-supervised context, the first $m$ samples $x_i$ $(i = 1, 2, \ldots, m)$ are labeled and the remaining samples are unlabeled. Let $L \in \mathbb{R}^{n \times c}$ be the relevance labeling matrix with $L(i, j) = 1$ if $x_i$ is marked by label $j$, denoted by $L(x_i) = j$ $(1 \le j \le c)$, and $0$ otherwise. Similarly, let $R_g \in \mathbb{R}^{n \times c}$ hold the relevance scores of each unlabeled sample $x_u$ assigned to class $j$ with respect to the $g$-th view. The final expression of the optimal $R_g$ is attained by minimizing the following objective function:
Algorithm 1: The proposed algorithm
Input: ..., $\alpha_g$, $\lambda$, $\varepsilon$ (convergence threshold)
Output: $R^*_{SSMF}$: final label relevance matrix.
for $g = 1, 2, \ldots, M$ do
    $t = 0$
    compute $W_g$
$t = 1$
for $g = 1, 2, \ldots, M$ do
until $c \le \varepsilon$ /* the change is smaller than the threshold */
return the converged relevance label matrix
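The propagation loop with a convergence threshold can be sketched for a single view as follows. Since the objective function and the SSMF update are not recoverable from the text, the sketch substitutes the classical normalized label propagation update $R \leftarrow \alpha S R + (1 - \alpha) L$ with $S = D^{-1/2} W D^{-1/2}$; it is therefore not the authors' algorithm. The function name `propagate_labels` and the parameters `alpha`, `eps`, and `max_iter` are assumptions.

```python
import numpy as np

def propagate_labels(W, L, alpha=0.9, eps=1e-6, max_iter=1000):
    """Iterative label propagation on one view's graph.

    W: (n, n) symmetric affinity matrix; L: (n, c) initial label matrix
    with L[i, j] = 1 for labeled sample i of class j and zero rows for
    unlabeled samples. Returns the (n, c) relevance score matrix.
    """
    W = np.asarray(W, dtype=float)
    L = np.asarray(L, dtype=float)
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]    # D^{-1/2} W D^{-1/2}
    R = L.copy()
    for _ in range(max_iter):
        # Spread scores along edges, then pull them back toward the initial labels.
        R_new = alpha * (S @ R) + (1.0 - alpha) * L
        change = np.abs(R_new - R).max()
        R = R_new
        if change <= eps:                            # the change is smaller than the threshold
            break
    return R
```

Each unlabeled pixel can then be assigned the class with the highest relevance score, for example with `R.argmax(axis=1)`. How the per-view matrices $R_g$ are fused into the final relevance matrix is left open here, because the fusion rule is not specified in the surviving text.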

A. Data Sets
Experiments were conducted on two datasets: 1) UC Merced land use scenes and 2) Taif city aerial images. These datasets include a variety of spatial and textural patterns, which makes the classification task more challenging.
1) UC Merced dataset: This dataset comprises 21 land-use classes selected from aerial imagery. It contains 100 images of 256x256 pixels for each of the 21 categories. This data set is depicted in Fig. 2.
2) Taif region dataset: For a real-case scenario, the studied area is Al-Taif city, located in the south-eastern part of the Makkah region. The centroid of the study area is at 32126 14.1828 N and 4030 45.7704 E. The series of Landsat images used in this experiment was collected from the USGS library through the GloVis viewer. This data set is depicted in Fig. 3.

B. Results
Fig. 4 and Fig. 5 show the confusion matrices of the proposed approach for the two datasets. As noted, there is some overlap between certain land-cover types. For example, some pixels belonging to the agriculture area are classified as baseball diamond. This is generally acceptable given the resemblance between them.
To allow a deeper assessment, we compared the performance of the proposed approach against some conventional methods: adaptive nearest neighbor clustering (CAN) and LS-SVM. To evaluate the efficiency of the proposed approach with different amounts of labeled training samples, Fig. 6 illustrates how the different methods react to a changing number of labeled/unlabeled samples in terms of Ranking Loss and Average Precision. From these outcomes, we notice that the total precision increases with the number of labeled samples. Compared with the other methods, our algorithm achieves the best precision in most cases for the different rates on all available datasets.
This indicates that our algorithm can attain better efficiency for a fixed ratio of labeled training samples, and supports the performance of the proposed methodology. Finally, to better assess the performance of the proposed algorithm, the overall accuracy (OA) and kappa coefficient (Kappa) are computed for all approaches and compared in Table I. The average processing time for the simulated images was 94 s on the Mac OS computer. It depends on the image, and the complexity is about $O(N^2)$.
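For reference, the overall accuracy and the kappa coefficient can both be derived from a confusion matrix using their standard definitions: OA is the fraction of correctly classified samples, and kappa corrects that fraction for chance agreement. The sketch below assumes rows index true classes and columns index predicted classes; the function name `overall_accuracy_and_kappa` is hypothetical.

```python
import numpy as np

def overall_accuracy_and_kappa(confusion):
    """Compute overall accuracy and Cohen's kappa from a confusion matrix.

    confusion[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    p_o = np.trace(cm) / total                                   # observed agreement (OA)
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2   # agreement expected by chance
    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa
```

For an illustrative two-class matrix `[[45, 5], [10, 40]]` (made-up counts, not results from the paper), the observed agreement is 0.85, the chance agreement is 0.5, and the kappa coefficient is therefore 0.7.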

V. DISCUSSIONS
The contribution of the proposed algorithm lies in establishing a multi-view graph. The results show that the proposed approach is effective in remotely sensed image classification. It produces significantly higher precision than state-of-the-art techniques. Hence, it is potentially an advantageous alternative when dealing with multi-view and feature selection scenarios.
We have also proposed an efficient multi-label embedding graph, which offers a feasible solution to multi-label ranking, in contrast to common approaches that adopt binary classifiers for the unlabeled multi-view learning problem. This investigation enables us to capture the relationships between labels.
Despite the gratifying outcomes obtained by the proposed approach, several aspects need further investigation. First, our algorithm is developed for a semi-supervised scenario. To expand its application areas, we intend to propose a new version able to deal with more difficult inputs such as completely unlabeled data. Second, we plan to propose a faster implementation of our approach using parallel computing. Finally, motivated by previous works, we hope to integrate big data and deep learning aspects to extend the number of views.

VI. CONCLUSION
In this paper, we proposed a novel approach for remotely sensed image classification based on multi-view classification and feature selection in a unified framework. This investigation allows a flexible integration of label information, a better handling of the discrepancy between views, and a modeling of the non-linearity between the data samples. The improved discrimination rate shown by the experimental results on various datasets demonstrates the significant contribution of each step of the proposed approach. Notably, the graph modeling of the nonlinear complex structure in multi-view features helped increase the recognition rate. Furthermore, the outcomes also demonstrate that our approach retains its advantage while maintaining a reasonable time complexity.