Insights on Deep Learning based Segmentation Schemes Towards Analyzing Satellite Imageries

Satellite imageries are essentially a complex form of an image when subjected to critical analytical operation. The analytical process applied on remotely sensed satellite imageries are utilized for generating the land cover map. With an abundance of traditional techniques evolved to date, deep learning-based schemes are progressively gaining pace for identifying and classifying a terrestrial object in satellite images. However, different variants of deep learning approaches have different operations, and so are the consequences. At the same time, there is no reported literature to highlight the issues, trends, and effectiveness much on a generalized scale concerning segmentation. Therefore, this paper reviews some of the recent segmentation approaches using deep learning to contribute towards review findings in the form of research trends, research gaps, and essential learning outcomes. The paper offers a compact and distinct picture of deep learning approaches used to boost segmentation for satellite images. Keywords—Deep learning; landcover; map generation; remotely sense image; satellite image; segmentation


I. INTRODUCTION
The satellite images of various resolutions are used for generating maps for land cover [1]. Analysis of satellite images potentially assists in developing multiple classes of landcover, e.g., water, impervious surface, vegetation surface, residential area, etc. [2]. To construct an accurate landcover map, it is essential to classify various elements, e.g., trees, individual buildings, roads, cars, etc. In this process of building a landcover map, spatial resolution is proven more efficient than spectral resolution [3] [4]. It will also mean that image pixels with finer resolution are more beneficial than the maximized number of spectral bands [5]. This is the prime justification behind the increased usage of remote sensing satellite images to enhance the terrestrial object's visibility [6]. In this direction towards generating an accurate landcover map of the satellite image, most existing studies are now emphasized towards pixel-based analysis where segmentation plays a significant role. This is also boosted by applying a deep learning approach towards land cover map generation and classification of objects [7]- [9]. With the availability of various wireless technologies, satellite images are transmitted from a satellite to the earth receiving center, where processing is carried out [10] [11]. However, the problem surfaces towards the quality of received signal where signal fluctuations are witnessed due to electrical signals in wireless transmission systems [12]- [14]. This finally leads to various errors and artifacts within the received satellite images [15] [16]. Most of these artifacts are in the form of noises, while it is challenging to eliminate coupled noises. Existing filters are not capable of controlling these forms of noises [17]- [19].
The available processing algorithms for satellite images are required to be proven for their efficiencies concerning computational complexities (space and time) that are significantly missing from literature towards wireless image transmission. The pixel-based analysis approach also leads to issues with the increase of spatial resolution; the complex patterns start to surface for spectral response originated from multiple objects of much smaller dimension in an urban region. The prime reason is using a similar object to develop various structures in landcover while emission of similar spectral response is continued. In object-based analysis, multiple segments are generated from an image indexed with different attributes, followed by subjecting it to rules of classification operation (e.g., texture, size, area, length, etc.). However, both pixel and object-based analyses have inefficiencies towards classification, making way for the deep learning approach to contribute to the segmentation process. However, deep learning cannot be fully considered an end solution, although it shows potential progress in analyzing satellite images. Despite some dedicated research approaches towards segmentation problems, deep learning has shown promising results, as well as there are also pitfalls associated with this approach.
The paper presents a discussion of existing deep learning approaches towards improving segmentation mechanism of satellite images. With an increasing proliferation towards adopting deep learning techniques, it is essential to understand its rate of success for analyzing satellite imageries, which itself is one complex form of signal. The significant problem under consideration is that there is no standard reporting for implementation effectiveness of using deep learning methods towards addressing segmentation problem, which is one of the www.ijacsa.thesai.org essential steps in analyzing satellite images. Inspite of availability of different variants of deep learning, the research question arises are i) what are the identified advantages and limitation of different deep learning schemes towards segmentation problem in satellite images? ii) which is the most preferred dataset of satellite imageries considered for existing evaluation, iii) what is the existing direction of research trend of exploiting the potential of deep learning over analyzing satellite images?. Hence, based on the above stated research question.
Hence, this paper studies the effectiveness of existing deep learning schemes for the segmentation of satellite images. The contribution of this review paper are multi-fold viz. i) a compact briefing of satellite imageries concerning conventional segmentation and its associated challenges, ii) exhaustive recent reviews of frequently adopted deep learning segmentation schemes on satellite images, iii) compact briefing of the frequently adopted dataset by the existing researcher, iv) Discussion of research trends of existing deep learning segmentation methods, v) exclusive highlights of research gap and essential findings of this review study of recent papers. The paper's organization is: Section II discusses satellite imageries while existing literature of different deep learning approaches is discussed in Section III. Section IV discusses data adoption while Section V highlights research trends with various perspectives. Section VI discusses review contribution, while Section VII outlines the paper's conclusion.

II. SATELLITE IMAGERIES
Satellite imageries are considered one of the meteorologists' primary sets of information to predict the atmosphere's behavior. Satellite images are of three types, as shown in Fig. 1 viz. water-vapor images, infrared images, and visible images [20] [21]. Some of the actual applications of remotely-sensed ideas are viz. i) tracking cloud for weather prediction, ii) monitoring growth of city area, iii) identifying changes in forest and farmland over some time, iv) Mapping and exploring the topography of ocean bed, v) forest fire, etc. Apart from this, satellite images have broader applications, e.g., anomaly hunting, regional planning, cartography, geology, oceanography, agriculture, forestry, etc. [22]. With the modernization, there has been a change in the forms and types of satellite images based on capturing it. Visual sensors play a significant role in this regard, integrated into modernday satellites to generate remote sensing images [24]. This is a process of identifying different physical characteristics specific to a monitoring region based on emitted radiation from the air-borne vehicle or the satellite. This paper has discussed the case of satellite images in remote sensing, which is characterized by five types of resolution, i.e., geometric, radiometric, temporal, spectral, and spatial [25].
The resolution of satellite images largely depends upon the orbit altitude and types of the instrument being used. This manuscript will not illustrate fundamental theories of satellite images, as information can be easily accessed from various online articles, e.g. [26] [27]. Instead, the proposed study will emphasize the challenges of processing it and understanding the existing literature's effectiveness. A series of image processing is carried out to process satellite images, e.g., enhancement, feature extraction, segmentation, fusion, detection of changes, compression, classification, and feature detection [28]. Out of all these processes, the most critical function that significantly contributes to accuracy in prediction is the segmentation process. The process of segmenting satellite images targets to obtain a distinct segment of boundaries and their objects.
Some of the conventional methods of satellite image segmentation are briefed in Table I which exhibits conventional approaches of segmentation over satellite images e.g. Gabor filter and graph-based [29], firefly algorithm [30], deep learning [31], Markov Random Field [32], and Cuckoo Search [33]. From the preliminary outcomes, it can be seen that performance (especially with respect to accuracy) is higher for deep learning compared to other conventional methods irrespective of any dataset being used. The segmentation techniques mentioned above in Table I are considered as a baseline by various research work; however, the segmentation approaches are still found not to exhibit better predictive accuracy as they should be. There is a reason for this trade-off which is complexities associated with segmenting satellite images which are as follows: i) limitation of coverage are of satellite images, ii) availability of limited information from current satellite data [34], iii) higher possibilities of degradation of image quality during retrieval process [35], and iv) possibilities of generation of artifact-data when aggregated from multi-instrument data. Hence, there are significant challenges in performing proper segmentation for any complex remote sensing satellite images in all the cases mentioned above.
Further, it is noticed that landcover classification is associated with various challenges viz. multitemporal images, presence of clouds, classification of the object, small scale benchmarks, etc. Landcover images of the satellite are considered to be massive, and hence the mining community has already started to utilize the Big Data concept towards mapping and classifying crops [36]. In this respect, deep learning is a potential player to contribute to classification. It is because. However, this is also another factor of motivation to take an interest in working on this topic in the current era. Hence, this paper explores the impact of existing deep learning on the segmentation of satellite images.

III. DEEP LEARNING SCHEMES
Deep learning schemes are constructed based on the neural network that consists of neurons in the form of multiple layers capable of transforming the input of satellite images to an outcome image in the form of identified land covers. This is achieved by progressively learning the high-level features. This section discusses different taxonomies and approaches of deep learning schemes that have been used for image segmentation towards the satellite data as follows:

A. Convolution Neural Network
The generalized framework of Convolution Neural Network (CNN) consists of harnessing the softmax layer using different blocks with distinct architecture and ensemble of its outcome. The conventional practices of CNN comprise different layers, e.g., input, convolution, pooling, completely connected layer, and outcome of the soft-max layer. The computation of the filter adopted in the convolution layer is mathematically represented as follow: In the above expression (1), the outcome of the filter is dependent on multiple variables: The variable g represents nonlinear activation function while the first component Δa represents ( ) where variable x is input during the second component Represents bias term considering weight w with m th filter corresponding to l th layer. Further feature maps are produced from progressive usage of pooling and convolution layer that is finally transformed into onedimensional features leading to final prediction using soft-max layer. Fig. 2 highlights the sequence of the process mentioned above in CNN.

Convolution Layer
Pooling Layer Further, the process of training in CNN can be boosted by using Adam Optimizer [37], stochastic gradient descent, batch normalization [38], dropout [39], parametric rectified linear unit [40], etc. At present, there are various approaches of CNN being applied towards image segmentation of satellite data.
One of the significant beneficial characteristics of the CNN model is that it doesn't have any dependencies towards tuning the parameters. This fact was investigated in the study of Wurm et al. [41]. The study also analyzes the capabilities of transferring a trained network to a different type of dataset. The idea of this implementation is to carry out semantic segmentation of landcover using CNN. The presented study emphasizes the transfer learning operation of fully CNN.
The study of segmentation carried out by Zhang et al. [42] has extracted road area of land from satellite images using CNN. A raster map is constructed from satellite signal trajectories, where the outcome shows better accuracy. A recent work carried out by Li et al. [43] used CNN for extracting features considering the case study of developing footprint maps of buildings. This mechanism also uses a graph model to consider the spatial correlation of data to retain boundary-related information. Preprocessing is carried out using co-registration and truncated signed distance labels. The CNN inputs satellite image and ground truth, leading to extracted feature where a segmentation probability is computed and pairwise potential extraction. This combined yields a new graph model that finally generates multi-class results emphasizing object detection. A similar category of work has also been carried out by Saetchnikov et al. [44]. Multiple deep neural networks of CNN have been used for comparative analysis to perform segmentation leading to object detection and tracking.
Wang et al. [45] have developed a unique segmentation process that is iterative in its operation to preprocess remote sensing images for detecting ships. The region detection of the ship is carried out using multivariate Gaussian distribution. The training is carried out using optical panchromatic data during a hardware synthesis of the model over a Field Programmable Gate Array. The work carried out by Jiang et al. [46] has captured geographic information of road using remote sensing technology of satellite imageries. This technique uses CNN to classify satellite images into two categories, i.e., road and non-road sections. Further optimization is carried out to address the inclusion issues of non-road noises owing to natural scene factors using wavelet packets. Another unique work was carried out by Persello et al. [47], where the identification of information settlement over a landcover has been investigated. The study has emphasized over-extraction of spatial feature and texture information. The authors have used a complete convolution network where labeling towards the pixel of satellite images has been carried out. The higher representation of the data is autonomously subjected to a learning algorithm considering six layers of convolution network. Different from conventional studies towards land cover, Wang et al. [48] have carried out the study using CNN to find the ice concentration. Observation: From this discussion, it can be seen that CNN is used as a standalone and in combination with other schemes to boost the segmentation performance. For the majority of the implementation scheme, the performance of the CNN remains nearly similar with respect to method simplification and accuracy. It is also seen that the usage of CNN is found highly efficient for extracting potential features, classification of the scene, and detection of a specific form of land in satellite images. However, it is observed that features often tend to diminish while using the pooling layer. This will potentially affect the computational performance. In contrast, the outcome of the feature map and predictive resultants is not much improved during satellite image segmentation using CNN. Table II highlights the strength and weaknesses of the existing segmentation approaches facilitated by CNN to understand the contribution of existing literature.

B. Recurrent Neural Network
Recurrent Neural Network (RNN) formulates a directed / undirected graph obtained from node connection in neural network considering temporal sequence. Input with variable length sequence is processed using internal state of RNN which is a network class for infinite impulse response. Used over wide variety of application in current time, RNN is another frequently used supervised learning model deployed towards segmentation of satellite images. Essentially meant to carry out analysis of discrete sequences, it is found that RNN can generate deep feedforward networks. The conventional architecture of RNN is shown in Fig. 2.
According to Fig. 3, the conventional RNN model usually connects the outcome of all the neurons to the input to construct a topology of the network. RNN is of different types, i.e., one-to-one, one-to-many, many-to-one, and many-tomany. The common activation function in RNN is sigmoid, Tanh, and relu. In this case, the outcome of the previous step is considered an input for the existing step. One of the significant advantages of RNN in the segmentation process is its memory system, which retains information about all its calculations, thereby reducing the complexity of attributes not present in other neural networks. The discussion presented by Taberner et al. [49] gave a good insight into using deep learning over time-series datasets of satellite imageries. The most recent work carried out by Turkoglu et al. [50] has used RNN with multiple layers where gated cells were drawn. The most significant contribution of this study is findings that state a change in gradient magnitude when it moves through the cell over a deep lattice of RNN. The study has used MNIST dataset which is benchmarked dataset consisting of satellite imageries. The work carried out by Ienco et al. [51] has harnessed the potential of RNN to carry out the classification of landcover from satellite images. This technique has carried out segmentation using a multi-temporal stack to obtain information associated with multi-temporal layers of an object. The researcher has used a multiresolution segmentation approach followed by applying statistical evaluation over its features. Maggiori et al. [52] present a similar direction of work, where RNN has been used as an iterative and semantic segmentation process. Sun et al. [53] have implemented Long Short Term Memory RNN, which harnesses temporal factors of crops captured from satellite images over a time series.   Observation: RNN is a good option for boosting segmentation performance; however, not much recent research has been carried out in this perspective. Table III highlights the summary.

C. Long Short Term Memory
Long Short Term Memory, also called LSTM, is a typical case of RNN to offer extensive dependencies of long-term learning. It is capable of recording information over a more extended period without much effort. It is noticed that RNN consists of iterative modules formed in the chain for the neural network. In conventional RNN, a single tanh layer is used as a simplified structure for the iterative module. A similar chainbased system also exists in LSTM; however, the structure slightly differs concerning iterative modules. LSTM offers four layers of the interactive structure instead of a single layer. Fig. 4 highlights the conventional architecture of LSTM with multiple blocks of operation, i.e., neural network layer, pointwise operator, vector transfer, concatenation, and copy function.
The prime notion of LSTM is basically about a cell state that linearly runs over the entire chain letting the network simplify the flow of information. A specialized structure is used in LSTM called gates, which can add or eliminate information over the cell state. Usually, conventional gates in LSTM consist of pointwise multiplication operations and use a sigmoid neural network. From the image processing viewpoint, LSTM can be applied to images only after anticipated features have been extracted from them. At present, the applicability of LSTM towards improving the segmentation process of satellite images has been researched upon by various investigators. It has been noticed that although deep learning is preferred for classifying remote sensing images, simultaneous recognition of multiple objects and extracting their spatial relationship is yet a bigger problem in deep learning. This problem is addressed in Cui et al. [54], where a novel deep learning model is constructed by combining LSTM with fully CNN. The model has used natural language to define the spatial relationship of the remote objects using attention-based LSTM. The model has carried out semantic segmentation of multi-scale using CNN and U-Net entirely for object recognition of remotely sensed images. Nearly similar work is also carried out by Ghosh et al. [55]. A bidirectional LSTM is used along with UNet to extract temporal and spatial features of the satellite images for landcover mapping. The presented study has also used attention-based aggregation for all hidden representations of Spatio-temporal factors using a feedforward neural network followed by softmax normalization and spatial averaging. One of the significant advantages of this model is that it can perform segmentation even if different forms of atmospheric disturbances cover the images. Another recent work carried out by Kalinicheva et al. [56] has carried out a similar direction of study towards analyzing satellite images considering dynamic land cover changes. However, the authors have developed a better version of LSTM by introducing a unique unsupervised approach with an evolution graph. This technique makes use of image segmentation over the changed areas using the graph-based tree-merging method. Combined usage of LSTM and fully CNN has been reported in Sefrin et al. [57], which offers the benefit of using multitemporal information. Apart from this, a better classification is presented due to adopted preprocessing techniques. Another work carried out by Zhu et al. [58] has developed a hybrid model where the semantic segmentation process is integrated with relearning of post classification. After feature extraction, the model performs an object-based voting system for controlling fluctuation in different classes. Table IV briefs of comparison of existing LSTM based approaches.
Observation: LSTM has been proven for better classification and segmentation performance for satellite images associated with landcover. However, the models don't emphasize its practicality as not many computing units can be used over a system with limited memory or channel capacity. This challenge remains unattended. www.ijacsa.thesai.org

D. Staked Auto Encoders
An autoencoder is an unsupervised learning structure characterized by input, hidden, and output layers, while the training operation in autoencoders consists of encoding and decoding. The encoder carries out the mapping of the input data into hidden representation, while the decoder carries out the reconstruction of input data from the hidden representation. The dependable parameters for the encoding process are encoding function, weight matrix, and bias vector. In contrast, dependable parameters for the decoding process are still the same, i.e., decoding function, weight matrix of decoder, and bias vector. To control the reconstruction error, an objective function exists for optimizing it considering the loss function. So, stacked autoencoders are basically about stacking a specific number of n autoencoders into the same n number of hidden layers using a supervised learning approach, as shown in Fig. 5.
Further supervised learning scheme is used for fine-tuning it. At present, there is the evolution of specific schemes that use autoencoders to analyze satellite images. In most existing approaches, the first line of action is to train the initial autoencoders to extract the trained feature vector. The second step is to use that feature vector for the next layer as an input, and it's iterated till the completion of training. Finally, after all the hidden layers are trained, cost minimization is carried out, followed by updating weights. Existing approaches have reported the use of stacked auto encoders for change detection in landcover. The work carried out by Jing et al. [59] has used stacked auto encoders where multi-scale image segmentation is deployed over temporal images followed by the adoption of CNN to obtain a change map. A stacked autoencoder is used for classification. Usage of a similar principle was also reported in the work of Protopapadakis et al. [60] to evaluate targets over massive unlabeled data. Further denoising auto encoders have been reported in the work of Zhang et al. [61], where a spanning tree has been used for segmentation. The model can extract texture, spatial, and spectral features for all the identified objects, contributing to higher accuracy. Table V summarizes the existing contribution of stacked autoencoders towards segmentation process.  Observation: Deployment of stacked autoencoders is relatively a new scheme used for the segmentation of satellite images. Hence, there are few implementation studies towards stacked auto encoders to improve segmentation performance. Unfortunately, all the analyses using this approach don't find the appropriate number of stacks sufficient for encoding. The computational complexity raised towards maintaining the accumulation of information is not yet resolved in the existing system.

IV. ADOPTION OF DATASET
There are various types of a dataset of satellite images/remote sensing images used for analysis for landcover analysis. One of the essential datasets is the QuickBird dataset [62], which provides massive images of 0.60m Ground Sample Distance resolution and consists of four further multispectral bands with very high resolution ranging from 0.30-0.60m. The images were captured from 2001 to 2015 with an orbit height of 450 km. The following important dataset is Google Earth Satellite Imageries [63] which includes a raster dataset of satellites easier to be processed by any scripting environment. This is a massive dataset of satellite imageries consisting of a different crop type digital map, vegetation map, oil map, terrestrial field plots, elevation, human population, forest, water cover, etc. It also consists of Landsat satellite images with 30 meters resolution considered highly updated thermal and multispectral data [64]. MODIS [65] and Sentinel [66] is another dataset developed in collaboration with the Google dataset itself. MODIS dataset consists of satellite images ranging from 250m-1000m of snow cover, surface temperature, surface reflectance, leaf area index, and thermal anomalies, usually retrieved on 16days duration from Aqua and Terra spacecraft.
Further, the Sentinel dataset is a part of the European Space Agency consists of optimal high-resolution images (from Sentinel 1A/1B), land-ocean-climate images (Sentinel-3), and air quality images (from Sentinel-5P). They are frequently used in current research to analyze climatic change, emergency management, atmospheric monitoring, Marine monitoring, land monitoring. SPOT-5 dataset is another contribution for the European Space Agency [67], where the images were collected between 2002 and 2015 with an orbit height of 832 km and an orbit duration of 101 min. Similar organization of European Space Agency also offers RADARSAT-2 dataset whose resolution ranges from 9.0-160m [68]. This is the most updated dataset captured between 2008 and 2021, with both medium and very high resolution of wavelength between 5.2-7.7 cm. Inria Aerial image dataset consists of labeled remotely sensed images with 810 square kilometers [69]. With a spatial resolution of 0.3m, this dataset has color imageries with ground truth data and two semantics classes. This dataset consists of alpine towers, densely populated areas, and irregular urban settlements. A sample dataset for Inria is shown in Fig. 6, which exhibits sample Chicago landcover (Fig. 6(a)) and its reference as ground truth (Fig. 6(b)). The presence of reference/ground truth image assists in evaluating the correctness of analysis models, and hence this dataset is widely adopted. Existing studies have also been carried out considering the MNIST dataset, a benchmarked dataset for satellite images as a part of Kaggle [70]. This labeled dataset is managed in the form of images and CSV files. Kaggle dataset also consists of a DSTL dataset consisting of image identity with class type over its labeled area [71]. The dataset is maintained in 3/16band satellite images with a resolution range of 0.31-7.5 m. The next dataset found in the current study is the ISPRS dataset, consisting of an indoor scene, old buildings, aerial images of specific locations, and satellite images of different parts of the earth [72]. Another dataset adopted in the current study is remote Sensing Image Captioning Dataset (RSICD) collected from the different applications of Google with all images with 224x224 pixels and 10921 remote sensing images [73]. Apart from the above-mentioned standard dataset, existing literature has also reported usage of another dataset too viz.  [80]. It should be noted that all the dataset has the different characteristic of data of satellite images.  Table VI highlights the comparative characteristic of different satellite image dataset with respect to spatial resolution. It should be noted that different dataset has different types of characteristic which is mainly based on the process of acquisition of signal. Hence, the proposed study considers highlighting spatial resolution of the images being captured to be mentioned in Table VI. (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 11, 2021 126 | P a g e www.ijacsa.thesai.org  Fig. 8, it can be seen that the number of adoption of deep learning has been spontaneously increasing in the last five years. There are more probabilities towards the continuation of similar trends in upcoming years. From Fig. 9, it is noticed that CNN is the dominant DL approach compared to other DL variants, i.e., RNN, LSTM, Stacked Autoencoders, Fully Convolution Network (FCN), and Deep Belief Network (DBN). Also, there is increasing adoption of LSTM and FCN approaches; however, they are significantly less in numbers. Hence, chances are more for CNN to be dominant in coming years too.
From Fig. 10, it is noticed that urban-based application is more investigated, followed by a water-based application using DL methods. The urban-based application will include identifying land covers, mainly exhibiting that it will focus on the research area.   This outcome eventually shows a higher scope of continued research work using DL methods towards satellite images in more progressive order. These findings of the research trend are one of the essential contributions of this manuscript.

VI. REVIEW CONTRIBUTION
From the prior section, it can be seen that various degree of research work is being carried out towards the segmentation of satellite images. It should be noted that not all deep learning mechanisms have implemented segmentation towards the input image. This paper has discussed only the research work where segmentation has been applied. The scope of this paper is i) the paper considers discussion of approaches published in last five years, ii) the paper emphasizes assessing the impact of different deep learning models towards segmentation. After reviewing different taxonomies of deep learning methods towards segmentation, different points of research gap are concluded that are briefed as follows:

A. Research Gap
The essential research gap explored after reviewing existing approaches are as follows:

 Unattended Computational Problems in Deep
Learning: Despite the frequently adopted technique, the studies using deep learning have witnessed an evident trade-off between achieving simplification in learning (positive point) and poor computational performance (negative point). Almost all the CNN www.ijacsa.thesai.org approaches studied in this paper are witnessed with lower performance scalability or increased computational complexities. Unless lower computational complexities characterize the framework; it is eventually not feasible to prove its application over real-life resourced defined computing devices. This fact is also witnessed for almost all deep learning techniques.
 Complexities of Satellite Images: Irrespective of any dataset of satellite imageries, it is now known that such images are accompanied by various challenges, from analyzing the raw sensed image to obtaining higher accuracy by extracting essential features. Although, deep learning methods can only be applied if the preprocessing is effectively carried out, which is missing in the most implementations. Hence, it is essential to preprocess the satellite image before feature extraction because it consists of a large amount of information. The deep learning method can automate the process for better classification; however, it will still depend on practical preprocessing input images before training.
 Biased Focused on Segmentation Approach: A closer look into existing approaches shows a majority of semantic segmentation methods used over satellite images. Such a technique labels each pixel concerning the associated class of satellite images, further subjected to dense prediction. One of the pitfalls of such an approach is that instances of the same classes are not segmented, potentially affecting landcover applications. Apart from this, various methods reported in this paper using deep learning don't include object irregularity, illumination factor, poor contrast, noise, etc. The exclusion of these points will eventually affect the accuracy of classification using deep learning.
 Less Emphasis towards Computational Performance: Although there are more comprehensive deep learning approaches to analyze satellite images, it is essential to identify the proper model. Different models have different performance patterns, and there is no fullproof deep learning model generalized towards analyzing satellite images. Inappropriate selection of deep learning model towards segmentation is a complex task as segmentation operation to be applied wholly subjected and application-oriented. This eventually leads to computational complexities, evident from limiting points found in existing studies discussed in this paper. Without addressing computational performance, it is quite questionable to understand their applicability.

B. Discussion
A closer look into the deep learning approaches shows many methods for analyzing satellite images, but not much work is reported towards segmentation approaches. One interesting observation noted in all segmentation-based approaches is that almost every research work has adopted different datasets with different properties and implemented them. Although Sentinel, MODIS, and QuickBird are frequently adopted datasets, they differ in addressing different problems. A better form of progressive work towards a deep learning approach is required considering a large scale similar dataset first, which can be compared with other existing datasets later. But this is not the case with existing methods. Another observation is that existing approaches are not witnessed much with scalable and consistent performance. LSTM, which is considered a better variant of RNN, is seen with time consumption and increased complexities in many cases. This is because architecture towards an extensive memory system is theoretically proven, and its performance doesn't scale up when exposed to a complex and challenging environment. Apart from this, CNN has implemented either individually or in combination with other training models. The standalone implementation of CNN towards the segmentation process is witnessed with various challenges that are not attended. The first challenge in standalone CNN implementation is associated with the drastically slower operation that consumes enough training time for the larger size of the satellite image. None of the existing studies has reported overcoming the dependencies of a resourceful graphical processing unit for supporting extensive layers of training in CNN. Apart from this, after the object is identified from satellite images using the CNN technique, it must be encoded for better accuracy. However, it is not feasible for CNN to encode pixel position and identify changes in object orientation. This will potentially affect the accuracy. Hence, there is a need to address the inherent issues of deep learning and attend to other matters.

VII. CONCLUSION
Remote sensing and satellite images have become essential applications for change detection and classification. With an increasing rise of deep learning-based approaches for analyzing satellite images, the idea of this paper is to review the existing approaches associated with segmentation. The novelty points of this paper are i) existing review papers has reviewed semantic segmentation, segmentation with specific application, whereas the proposed review paper has explicitly discussed all the recent segmentation approaches using deep learning with various application grounds offering more technical insights, ii) proposed review contributes to updated research trends to understand most dominant deep learningbased technique suitable for segmentation, iii) each study has been discussed concerning good points and limiting factors for offering more granularity in review findings, iv) the proposed review work contributes towards research gap followed by a discussion to know less spoken information about strength and weakness of existing schemes. Future work will be further carried out to address the research gap identified in this paper. A computational framework can be designed to consider various artifact inclusion combined that has never been done before. This will offer a scope to introduce a novel preprocessing approach that can potentially contribute to the enriched feature extraction process. Further, a novel deep learning model can be framed, emphasizing reduced training demands, reduced processing time, and optimal computational performance.