Application of Deep Learning in Satellite Image-based Land Cover Mapping in Africa

Deep Learning Networks (DLN), in particular Convolutional Neural Networks (CNN), have achieved state-of-the-art results in various computer vision tasks, including automatic land cover classification from satellite images. However, despite their remarkable performance and broad use in developed countries, using these advanced machine learning algorithms has remained a huge challenge in developing continents such as Africa. This is because the tools, techniques, and technical skills needed to utilize DL networks are scarce or expensive. Recently, new approaches to satellite image-based land cover classification with DL have yielded significant breakthroughs, offering novel opportunities for its further development and application. These can be taken advantage of in low-resource continents such as Africa. This paper first reviews some of the notable challenges to the application of DL for satellite image-based classification tasks in developing continents, and then reviews the emerging solutions as well as the prospects of their use. Harnessing the power of satellite data and deep learning for land cover mapping will help many developing continents make informed policies and decisions to address some of their most pressing challenges, including urban and regional planning, environmental protection and management, agricultural development, forest management, and disaster and risk mitigation.
Keywords—Deep learning; satellite image classification; land cover mapping; Africa


I. INTRODUCTION
Africa is the world's second largest and second most populous continent, with an estimated population of about 1.3 billion (1,300,000,000) people. It has a total land area of approximately 30,365,000 square km. Common land cover/use classes, the physical features that describe how land areas are used, include water bodies, forest, agricultural land, barren land, and built-up areas/settlements.
Population growth, socio-economic activities, urbanization, and other environmental forces have led to the exploitation, degradation, and subsequent alteration of these land features [1]. Together, these forces constantly change the land cover, as illustrated by K. Kalra et al. [2], who show the land cover change of Abuja, Nigeria across 1987, 2002, and 2017 (Fig. 1).
These changes have contributed to the enormous challenges Africa faces today, including food scarcity, degradation of habitats, outbreaks of epidemics, environmental hazards and, consequently, climate change and global warming. The impact of these challenges is complex and adversely affects the people, the economy, and the environment, causing concern to individuals and governments, locally and internationally.
To mitigate the effects of these challenges, it is critical to map the earth's surface in order to collect information on the status of the different classes of bio-physical cover and their changes over time, a process known as land cover mapping. The data obtained also provide valuable information for developing sustainable policies or strategies to mitigate the impacts and to meet the increasing demand for basic human needs and welfare. Land cover mapping is regarded as one of the most important applications of remote sensing.
Globally, scientists spend a large share of their time and money on fieldwork mapping land cover. In Africa, this fieldwork typically involves traveling to distant and sometimes isolated locations for long periods of time. It also requires braving adverse weather conditions, inaccessible terrain, and security risks, and performing physically demanding tasks. With the growing demand for real-time land information, it is becoming increasingly difficult to physically visit every location. While field surveys remain fundamental, their high cost, the slow generation of geo-information, the subjectivity of human interpretation of field observations, and inconsistent land cover maps are some of the factors that limit total reliance on them.
Recently, images of the earth's surface acquired with highly accurate remote sensing technology, operated by governments or commercial businesses around the world, have been shown to contain detailed information about the earth's land cover. The proliferation of such imagery over the past few decades has radically improved our understanding of the planet's landforms, vegetation, and resources. Today, it is considered one of the most important data sources for earth observation because of its ability to cover very large areas. It has the potential to provide more accurate, reliable, and faster land cover information. Additionally, it reduces the manual effort required to conduct field surveys.
www.ijacsa.thesai.org
The advent of Machine Learning (ML), a subset of Artificial Intelligence (AI), has brought a lot of hope to computer vision applications. ML is concerned with algorithms and techniques that allow computers to analyze data, learn from the data, and make predictions on new data using computational and statistical methods. Several papers have been published introducing the basic concepts of ML to the remote sensing and earth science community. However, earlier experiences with automated classification in Mali, Senegal and Niger produced disappointing results [2]. Important land use and land cover types, such as agriculture in the Sahel, could not be uniquely differentiated from other types based on their spectral reflectance properties. Automated methods of image classification are based on spectral image data and are often plagued by misclassification. The spectral reflectance of land surfaces measured by remote sensors is quantitative but not absolute, so it is not necessarily unique to a land cover type. This often leads to high variation within land cover types and poses a huge challenge for mapping and analyzing land cover types based solely on their spectral properties.
As the growth of ML continued to bridge the gap between the capabilities of humans and machines, it gave birth to deep learning, the fastest growing field of machine learning. Compared to traditional ML techniques, it has impressive learning capability, does not require domain expertise for manual feature design, and is time-saving. Its potential in remote sensing tasks has been demonstrated by many researchers in developed countries for solving problems in geological mapping, land cover mapping, land use planning, geological image classification, infrastructural development, mineral resources exploration, etc. Despite its broad use in developed countries, performing land cover classification with deep learning has been very challenging in developing countries. Recently, new approaches have yielded significant breakthroughs, offering novel opportunities for further research and development. These can be taken advantage of in low-resource environments such as Africa; however, the shortage of scientific papers discussing such emerging approaches in an easy and commonly understood way remains a major obstacle to their application. Inspired by the above, the main contributions of this paper are to:
1) Provide a brief overview of a typical DL model.
2) Provide a brief overview of satellite image classification with CNN.
3) Discuss the challenges it poses to developing countries.
4) Discuss the emerging solutions for satellite image classification in low-resource environments.
5) Discuss the potential of DL techniques for land cover mapping in Nigeria.

A. Overview of Satellite Image-based Land Cover Classification with CNN
This section gives an overview of the processes involved in classification of satellite images using CNN.
Land cover classification from satellite images is the process of assigning a land cover class or theme to each pixel in a satellite image based on its features, otherwise known as spectral information. To achieve this with deep learning, a DL model, usually a convolutional neural network, is designed. A set of pre-labeled training data is fed into the CNN model. The model attempts to learn the visual features in the training images associated with each label, and then classifies unlabeled or unseen images accordingly. The steps below describe the process from input image to classification result:
1) Design the architecture for the CNN model: The first step in land cover classification with CNN is to design the CNN architecture. There are two ways to do this: design a CNN model from scratch (given a large dataset and good computational resources), or repurpose an existing model using transfer learning. In the first case, you choose the types of layers and how they are arranged and connected to each other. In the latter case, you choose an existing architecture, known as a pre-trained model, that has already been trained on large datasets, and continue your design from it. Designing a CNN architecture from scratch can be challenging and time-consuming because the large number of architectural design choices demands expert knowledge and effort [3].
2) Hyperparameter selection and optimization: At this step, the configuration variables that determine the network architecture or how the network is trained are selected, initialized, and optimized. These variables are known as hyperparameters, and they play a big role in the design time and prediction accuracy of the classification task. Unlike a model's parameters, which are learnt or estimated from data during training, the hyperparameters must be selected before training and optimized during the training process. Usually, an empirical process involving a lot of trial and error is used to optimize them. However, this process is time-consuming, requires domain expertise and, when the number of hyperparameters in a CNN is large, is difficult to carry out manually.
Although a variety of algorithms, including grid search, random search, Bayesian optimization, gradient-based optimization, and evolutionary optimization, have been proposed and used for optimizing the values of these hyperparameters, the development of efficient hyperparameter optimization algorithms remains an active research area.
Common hyperparameters, their functions and common values are shown in Table I.
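To make the difference between two of the search strategies named above more concrete, the following is a minimal sketch of grid search and random search. The search space, its values, and the `validation_score` function are hypothetical stand-ins chosen for illustration; in practice the score would come from training a CNN and measuring its validation accuracy.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [16, 32, 64],
    "dropout": [0.2, 0.5],
}


def validation_score(config):
    """Stand-in for 'train the CNN and return its validation accuracy'.

    This toy formula is an assumption: it simply peaks at
    learning_rate=0.01, batch_size=32, dropout=0.5.
    """
    penalty = abs(config["learning_rate"] - 0.01) * 5
    penalty += abs(config["batch_size"] - 32) / 100
    penalty += abs(config["dropout"] - 0.5)
    return 1.0 - penalty


def grid_search(space):
    """Evaluate every combination in the search space and keep the best."""
    names = list(space)
    configs = [dict(zip(names, values))
               for values in itertools.product(*(space[n] for n in names))]
    return max(configs, key=validation_score)


def random_search(space, n_trials, seed=0):
    """Evaluate only n_trials randomly sampled combinations."""
    rng = random.Random(seed)
    configs = [{name: rng.choice(values) for name, values in space.items()}
               for _ in range(n_trials)]
    return max(configs, key=validation_score)


best_grid = grid_search(SEARCH_SPACE)
best_random = random_search(SEARCH_SPACE, n_trials=6)
print("grid search best:", best_grid)
print("random search best:", best_random)
```

Grid search evaluates all 18 combinations, whereas random search evaluates only 6 here, trading a possibly worse result for a smaller training budget, which matters when each evaluation is a full training run.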
3) Select dataset: An important step in land cover classification with CNN is sourcing the dataset. The dataset comprises a collection of N images, each labeled with one of K distinct classes. It is usually divided into three subsets, a training set, a validation set, and a test set, commonly in the ratio 80%:10%:10%. The data can be pre-processed (resizing, rescaling, etc.). Training a high-performing CNN model requires a large quantity of data as well as an equal distribution of the known classes in the dataset (a balanced dataset).
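The 80/10/10 division can be sketched as follows. This is a minimal illustration that splits each class separately (a stratified split) so that the class balance is preserved in every subset; the class names and the function `stratified_split` are assumptions made for the example, not part of any particular library.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) lists of sample indices.

    Each class is split separately so that its share of every subset
    matches its share of the full dataset.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * ratios[0])
        n_val = int(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]   # remainder goes to the test set
    return train, val, test


# Toy balanced dataset: 100 image labels for each of three hypothetical classes.
labels = ["water"] * 100 + ["forest"] * 100 + ["built_up"] * 100
train_idx, val_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(val_idx), len(test_idx))
print(Counter(labels[i] for i in val_idx))
```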

4) Model training:
The training phase is when the network "learns" from the data it is fed. CNN models are trained using an optimizer (optimization algorithm) through a back-propagation process. The weights and biases in the network are adjusted to minimize a cost function. The error between the network output and the ground-truth value is calculated by a predefined loss function. The larger the loss, the farther the prediction is from the correct answer. The optimizer updates the model's parameters based on the output of the loss function, with the goal of reducing the loss as much as possible. The errors are back-propagated using partial derivatives, after which each weight is adjusted according to its error term. The iterative process is illustrated by the steps below:
• Input a batch of data.
• Forward propagate it through the neural network.
• Compute the loss (i.e., the error between the predicted value and the actual value).
• Back propagate to calculate the gradient.
• Update the weights using the gradient.
• Repeat the above steps until the network has learned or converged.
The forward and backward process is illustrated in Fig. 2.
This is done using an optimization strategy such as gradient descent. The gradient descent algorithm measures how the error changes with respect to a change in each weight and updates the weights accordingly, as shown in formula (1):
$$w^{+} = w - \eta \, \nabla C \qquad (1)$$
where $w^{+}$ is the new weight, $w$ is the current weight, $\eta$ denotes the learning rate, a parameter that determines how much an updating step influences the current value of the weights, i.e., how much the model learns in each step, and $\nabla C$ is the gradient of the cost function. The gradient can be thought of as the slope of a function. The higher the gradient, the steeper the slope and the faster a model can learn. If the slope is zero, the model stops learning. To calculate the error of the model during the optimization process, a loss function must be chosen. Training is commonly done using the Python programming language with either Google TensorFlow or PyTorch. Both are extended by a variety of APIs, cloud computing platforms, and model repositories.
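The iterative training steps and the gradient descent update can be sketched in plain Python. This is a minimal illustration on a one-variable linear model rather than a CNN, assuming a mean squared error loss; the function name, learning rate, and toy data are assumptions made for the example.

```python
import random


def train_linear_model(xs, ys, learning_rate=0.1, epochs=500, batch_size=4, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on the MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]       # input a batch of data
            preds = [w * x + b for x, _ in batch]        # forward propagation
            # Prediction errors; the loss is the mean of their squares.
            errors = [p - y for p, (_, y) in zip(preds, batch)]
            # Gradients of the mean squared error with respect to w and b.
            grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            # Update: new weight = current weight - learning rate * gradient.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 3x + 1, so training should recover w≈3, b≈1.
xs = [i * 0.25 for i in range(8)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small makes convergence slow, which is why it is usually treated as a hyperparameter to tune.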

5) Model evaluation:
At this step, the model's accuracy is evaluated. The confusion matrix, accuracy, precision, recall, and F1 score are commonly used metrics for quantitatively evaluating the performance of a model, and they are widely adopted as classification performance indicators by the research community. The validation and test datasets are used to evaluate the model. The validation dataset gives an estimate of the model's performance while its hyperparameters are being fine-tuned, whereas the test set provides an unbiased evaluation of the final model fit on the training dataset. A well-performing trained neural network is expected to correctly predict the labels of unseen images.
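As an illustration of how these metrics relate to one another, the sketch below derives accuracy and per-class precision, recall, and F1 from a confusion matrix. The class names and the predicted labels are made-up toy values, and the helper functions are written for this example rather than taken from a library.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in classes]
            for actual in classes]


def classification_report(y_true, y_pred, classes):
    """Overall accuracy plus per-class precision, recall, and F1."""
    matrix = confusion_matrix(y_true, y_pred, classes)
    correct = sum(matrix[i][i] for i in range(len(classes)))
    report = {"accuracy": correct / len(y_true)}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(classes))) - tp  # column minus diagonal
        fn = sum(matrix[i]) - tp                                  # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Made-up ground truth and predictions for eight test patches.
classes = ["water", "forest", "built_up"]
y_true = ["water", "water", "forest", "forest", "forest",
          "built_up", "built_up", "built_up"]
y_pred = ["water", "forest", "forest", "forest", "built_up",
          "built_up", "built_up", "water"]

matrix = confusion_matrix(y_true, y_pred, classes)
report = classification_report(y_true, y_pred, classes)
print(matrix)
print(report["accuracy"])
```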

A. Challenges
Although deep learning has been around for decades, its use in developing countries is still relatively new. This section discusses some of the challenges to using deep learning for satellite image classification tasks in scarce-resource environments:
1) Lack of training data: Despite deep learning's powerful feature extraction capability, in practice it is difficult to train CNN models with small datasets [4]. This is because a deep neural network has many layers with many nodes in each layer, which results in a very large number of parameters to tune. Without enough data, these parameters cannot be learned effectively. Unlike the huge number of everyday training images (clothes, furniture, cars, animals, etc.) used in popular deep learning models, satellite image data is expensive to obtain. As a result, there are not enough labeled satellite image patches to train a very deep network. The lack of sufficient training data in the remote sensing domain is a significant obstacle to utilizing the full power of deep learning models, especially in developing countries [5].
2) Lack of technical expertise: Technical expertise is an important factor in the research, development, and application of DL for satellite image classification. Taking full advantage of this emerging technology requires considerable technical expertise. This is because satellite image-based land cover mapping with DL lies at the intersection of distinct disciplines such as information technology, remote sensing, mathematics, image processing, and programming, which means that a fairly high skill level is required.
Unfortunately, finding professionals with the appropriate combination of skills is a huge task in Africa [6]. Furthermore, this domain is very dynamic, with new technologies emerging constantly, making it difficult for professionals to keep abreast of the latest trends. This challenge can put the technique out of reach of many researchers and operational experts who wish to use it in land cover mapping.
3) Image quality (spatial and spectral resolution): Two fundamental characteristics of a satellite image are its spatial and spectral resolution. Both play an important role in determining the performance of a satellite image-based classification task using DL. Spatial resolution refers to the size of the smallest object that can be depicted in an image. It is usually given as a single value representing the length of one side of a square pixel on the ground. Spectral resolution refers to the number and width of the distinct wavelength intervals (bands) of electromagnetic radiation that a sensor is capable of measuring.
The higher these resolutions, the more detail an image provides and the higher the cost of obtaining it. Purchasing such images is very expensive and unaffordable in most developing countries. Most researchers in Africa have made use of freely accessible satellite images from the earth observation programs of the European Space Agency (ESA) and the United States National Aeronautics and Space Administration (NASA) [7]. Such images usually have limited spatial and spectral resolution, which limits the achievable accuracy. Lack of access to high-resolution images is a major limiting factor to the use of DL in land cover classification.
In addition, some researchers in Africa use images from different sources with varying resolutions. Although such multi-resolution sources are useful, extracting information from them using deep learning techniques is still an open research problem.

4) Lack of image pre-processing techniques:
Satellite data often contain cloud cover, noise, and other distortions arising from imaging systems, sensors, and observing conditions. Pre-processing is therefore required to deal with these defects. Image pre-processing applies operations to an image to improve the image data by suppressing unwanted distortions and/or enhancing important image features, so that useful information can be extracted from it. Despite its high potential value to DL accuracy, the automation of data pre-processing has been mostly overlooked by the machine learning community [8]. The dearth of methodological knowledge in this area continues to limit DL usage, especially in developing continents.
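As one small example of suppressing an unwanted distortion, the sketch below applies a 3x3 median filter to a single image band to remove an isolated noisy pixel. The median filter is chosen here only as a simple, well-known noise suppression technique; it is not a method prescribed by the cited works, it does not address cloud cover, and the band values are made up.

```python
from statistics import median


def median_filter_3x3(band):
    """Return a copy of a 2D band where each pixel is the median of its 3x3 window.

    Windows are clipped at the image border, so edge pixels use fewer neighbours.
    """
    rows, cols = len(band), len(band[0])
    filtered = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [band[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(median(window))
        filtered.append(out_row)
    return filtered


# Made-up 4x4 band with one isolated noisy pixel (value 255).
band = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
clean = median_filter_3x3(band)
for row in clean:
    print(row)
```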

5) Lack of data repositories:
In most developed countries, several remote sensing image datasets have been introduced by different groups to enable machine learning-based research, development, and application for satellite image classification [9]. These datasets include UC Merced (UCM), PatternNet, NWPU-RESISC45, the Aerial Image Dataset (AID), EuroSAT, and most recently BigEarthNet. Labeling satellite images is very expensive since it requires trained professionals, so optimally leveraging existing data repositories is essential. Unfortunately, such data repositories are not available in Africa, and this has hindered the research, development, and application of DL in satellite image classification tasks. Moreover, the heterogeneous appearance of satellite images, variation in geography, and differences in the spatial detail of different images limit the use of the existing repositories in developing countries.

6) Computational resources limitation:
A typical DL application requires huge computational resources owing to (i) the large number of multiply-and-accumulate (MAC) operations and memory access operations it executes, (ii) the huge number of parameters (weights) to learn, and (iii) the repeated model optimization needed to select the best hyperparameters for optimal performance.
In the last decade, the non-availability of computers with such high processing capability posed a huge challenge to the development and application of deep learning. The advent of graphics processing units (GPUs) has given hope to machine learning practitioners. Their highly parallel structure for distributed computation, large memory bandwidth for accommodating large datasets, and other GPU resources allow them to process large amounts of data faster and more efficiently. However, due to the cost of GPUs, this powerful hardware is out of reach of most DL researchers in developing countries.
The widely available CPUs are generally too slow to rely on for training deep learning models. Consequently, the non-availability of computational resources remains a huge challenge to the development and application of deep learning in developing continents.
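To give a sense of scale for points (i) and (ii), the following sketch counts the parameters and MAC operations of standard 2D convolution layers. The three-layer stack and the 64x64 patch size are hypothetical values chosen for illustration, and the count covers a single forward pass over a single image.

```python
def conv2d_cost(in_channels, out_channels, kernel_size, out_height, out_width):
    """Return (parameters, MACs) for one standard 2D convolution layer with bias."""
    weights_per_filter = kernel_size * kernel_size * in_channels
    parameters = (weights_per_filter + 1) * out_channels   # +1 for each filter's bias
    # Every output value needs one multiply-accumulate per filter weight.
    macs = weights_per_filter * out_channels * out_height * out_width
    return parameters, macs


# Hypothetical 3-layer stack on a 64x64 RGB patch (stride 1, output size preserved).
layers = [
    # (in_channels, out_channels, kernel_size, out_height, out_width)
    (3, 32, 3, 64, 64),
    (32, 64, 3, 64, 64),
    (64, 128, 3, 64, 64),
]
total_params = sum(conv2d_cost(*layer)[0] for layer in layers)
total_macs = sum(conv2d_cost(*layer)[1] for layer in layers)
print(f"parameters: {total_params:,}")
print(f"MACs per image (forward pass): {total_macs:,}")
```

Even this small stack needs hundreds of millions of MACs per image, and training repeats the forward and backward passes over every image for many epochs, which is why highly parallel hardware makes such a difference.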

7) Political and economic constraints:
Currently, Nigeria lacks access to emerging tools and techniques for satellite image-based classification with DL due to budgetary constraints, licensing issues, or bandwidth limitations. Although significant investments in geo-information technologies have been made in the past through multilateral and bilateral aid projects from developed countries, national governments in Africa have generally not supported the development of RS applications [6]. Most universities and government organizations in Africa are poorly funded and therefore lack the capacity to utilize emerging technology in remote sensing [10].

B. Emerging Solutions
A number of notable attempts have been made in the past to solve some of the challenges mentioned above. This section discusses some of these emerging solutions to satellite image classification using Deep learning.

1) Augmentation:
The classification performance of DL models relies heavily on the training procedure and on the quantity and diversity of training data. Data augmentation is a powerful method for addressing the problem of having too little data to train accurate and robust classifiers. It artificially expands the size of a training dataset by creating variations of the images using a range of operations. The augmented data represent a more comprehensive set of possible data points, which improves the model's ability to generalize what it has learned to new images and helps prevent overfitting [11]. Basic augmentation approaches, which can make a model invariant to changes in size, translation, occlusion, viewpoint, illumination, etc., include:
a) Transformation of the original images by basic operations such as cropping, stretching, rotation, rescaling, and horizontal and vertical flips.
b) Altering the intensities of the RGB channels of the raw data, proposed by Krizhevsky et al. [12].
c) Random Erasing, introduced by Zhong et al. [13]. This technique specifically targets the problem of occlusion, which limits the generalization ability of CNNs. An occlusion occurs when part of an image is covered or blocked off.
d) Generative Adversarial Network (GAN)-based augmentation, proposed by C. Bowles et al. [14], which offers a way to unlock additional information from a dataset by generating synthetic samples with the appearance of real images.
Most of these operations are useful in remote sensing because they do not alter the spectral or topological information in the satellite images, which is important for consistent classification results. In most research papers, experiments with augmentation outperformed the same deep model architecture trained on the original dataset. Recently, there has been extensive use of data augmentation to improve CNN task performance [15].
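The basic geometric operations in item a) can be sketched in a few lines. The example below is a minimal illustration on a tiny single-band patch stored as a list of rows; the function names are chosen for this example, and real pipelines typically apply the same operations to multi-band arrays.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original patch plus its flipped and rotated variants."""
    rot90 = rotate_90_clockwise(image)
    rot180 = rotate_90_clockwise(rot90)
    rot270 = rotate_90_clockwise(rot180)
    return [image, flip_horizontal(image), flip_vertical(image),
            rot90, rot180, rot270]


patch = [[1, 2],
         [3, 4]]
variants = augment(patch)
for variant in variants:
    print(variant)
```

Each variant keeps exactly the same pixel values as the original patch and only rearranges them, so one labeled patch yields six training samples with the same label.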

2) Transfer learning (TL):
Transfer learning is another emerging design methodology for addressing the challenge of limited training data. It involves training a CNN model on a base dataset for a specific task and then using the learned features/model parameters as the initial weights in a new classification task. Instead of starting the learning process from scratch, you start from patterns learnt while solving a different task. It relies on the fact that features learned in the lower layers of a CNN, such as edges, curves, or colors, may be general enough to be useful for other classification tasks. Transfer learning lets us take advantage of the expensive resources (expertise, training data, and computational power) that were used to train the original model.
To mitigate the problem of limited labelled training data in remote sensing, most researchers have used transfer learning to leverage either satellite image data repositories such as UC Merced (UCM) [16], EuroSAT [7], and more recently BigEarthNet [17], or natural image data repositories such as ImageNet or CIFAR-10. For instance, R. P. De Lima et al. [18] successfully used transfer learning to address a suite of geologic interpretation tasks, and M. Xie et al. [4] used a sequence of transfer learning steps to design a novel ML approach based on satellite images.
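The core idea, reusing previously learned features and training only a new classification layer, can be sketched without any deep learning library. In the toy example below, `FROZEN_WEIGHTS` is a hypothetical stand-in for a pre-trained layer and is never updated; only the small logistic regression "head" is trained on the new task. The weights, samples, and labels are all made up for illustration and do not reproduce the setup of any cited work.

```python
import math

# Hypothetical "pre-trained" layer: stands in for features learned on a large
# base dataset. These weights are frozen, i.e. never changed during training.
FROZEN_WEIGHTS = [
    [1.0, -1.0],   # feature 0 responds when input[0] > input[1]
    [-1.0, 1.0],   # feature 1 responds when input[1] > input[0]
]


def extract_features(x):
    """Frozen layer: linear map followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]


def train_head(samples, labels, learning_rate=0.5, epochs=200):
    """Train only the new classification head (logistic regression)."""
    head_w, head_b = [0.0, 0.0], 0.0
    features = [extract_features(x) for x in samples]
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(w * v for w, v in zip(head_w, f)) + head_b
            p = 1.0 / (1.0 + math.exp(-z))
            error = p - y   # gradient of the cross-entropy loss with respect to z
            head_w = [w - learning_rate * error * v for w, v in zip(head_w, f)]
            head_b -= learning_rate * error
    return head_w, head_b


def predict(x, head_w, head_b):
    """Classify a sample using the frozen features and the trained head."""
    f = extract_features(x)
    z = sum(w * v for w, v in zip(head_w, f)) + head_b
    return 1 if z > 0 else 0


# Made-up two-value samples for a new binary task with very few labels.
samples = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
labels = [1, 1, 0, 0]
head_w, head_b = train_head(samples, labels)
print([predict(x, head_w, head_b) for x in samples])
```

Because only the three head values are trained, four labeled samples are enough here; training the frozen layer as well would add parameters that such a small dataset could not constrain.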
3) Improvement in architectural design: As seen in Fig. 1, a typical CNN architecture is formed by stacking multiple non-linear processing neurons in layers. Different variations of CNN architectures have been proposed in an attempt to solve some of the challenges faced in using DL in low-resource environments. These include:
a) Improving the layer structure of CNNs to accommodate limited training data [19].
b) Encoding rotation equivariance in the network structure [20]. Rotation equivariance is the ability of a network to detect features consistently when the input is rotated, which is key to the generalization ability of CNNs.
c) Capsule networks (CapsNet) [21] and light convolutional neural networks (LCNN) [22], which were designed to suit low computational resources and small numbers of training samples.
d) Other notable techniques incorporated in CNN architectures, including regularization, batch normalization, and the addition of shortcut connections between layers.
4) Improvement in image quality: To address the non-availability of high-resolution images, the use of images with different resolutions, and poor-quality images caused by various distortions, researchers have successfully used techniques such as:
a) Super-resolution (SR) techniques: These techniques enhance or produce a high-resolution image from one or more low-resolution images using algorithmic means. Several SR techniques have been proposed and successfully used.
b) Multi-source data fusion: These techniques combine data from multiple sources to produce a high-quality representation of the data and thereby improve model training [23].
Improving image quality using various techniques remains an active area of research.

5) Computational resources optimization:
To address the challenge of limited computational resources, which is a limiting factor to the use of DL, a number of notable solutions have been proposed and successfully used. These approaches can be beneficial to researchers in developing countries.
a) Hyperparameter optimization: It has become popular to use different optimization strategies to design network architectures that are computationally efficient to train [1].
b) Algorithmic design: Reducing the consumption of computational resources through CNN model design [24].
c) Cloud-powered DL: This is another active research area [24]. It allows deep learning training and evaluation on cloud-based platforms, thereby limiting the need for high-end local computational resources.

6) Open-source developmental tools:
The emergence of open-source development tools and libraries such as PyTorch, Caffe, Keras, TensorFlow, and Theano for developing and training models is also speeding up progress in deep learning [24]. These tools and libraries allow the re-use of existing state-of-the-art models, thereby taking advantage of the expensive resources (expertise, training data, and computational power) used in their development. Other benefits of these open-source tools include:
a) Ease of configuring the learning process.
b) Ease of plotting accuracy graphs, which provide useful insights into the training of the model.
c) The capability to register callbacks when training a deep learning model. (Callbacks allow saving model weights when the model's performance on tracked metrics improves, or inspecting a model while it is training.)
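The callback mechanism in item c) can be illustrated with a library-independent sketch. The class `BestWeightsCheckpoint`, the `fit` loop, and the validation accuracy values below are all hypothetical and written for this example; the libraries named above provide their own callback classes with different interfaces.

```python
import copy


class BestWeightsCheckpoint:
    """Keep a copy of the weights whenever the tracked metric improves."""

    def __init__(self):
        self.best_metric = float("-inf")
        self.best_weights = None
        self.best_epoch = None

    def on_epoch_end(self, epoch, weights, metric):
        if metric > self.best_metric:
            self.best_metric = metric
            self.best_weights = copy.deepcopy(weights)
            self.best_epoch = epoch


def fit(epochs, train_one_epoch, callbacks):
    """Generic loop: train for one epoch, then notify every registered callback."""
    weights = {"w": 0.0}
    for epoch in range(epochs):
        metric = train_one_epoch(epoch, weights)
        for callback in callbacks:
            callback.on_epoch_end(epoch, weights, metric)
    return weights


# Made-up validation accuracy curve that peaks at epoch 2 and then degrades,
# as happens when a model starts to overfit.
FAKE_VAL_ACCURACY = [0.60, 0.72, 0.81, 0.78, 0.75]


def fake_train_one_epoch(epoch, weights):
    """Stand-in for a real training epoch: changes the weights, returns a metric."""
    weights["w"] += 1.0
    return FAKE_VAL_ACCURACY[epoch]


checkpoint = BestWeightsCheckpoint()
final_weights = fit(5, fake_train_one_epoch, [checkpoint])
print("final weights:", final_weights)
print("best epoch:", checkpoint.best_epoch, "best weights:", checkpoint.best_weights)
```

The checkpoint retains the weights from the best epoch rather than the last one, so a long and expensive training run does not have to be repeated when later epochs get worse.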
GitHub [25], a web-based collaborative platform for software developers, is another interesting development that can aid the application of DL in satellite image classification tasks in low-resource areas. It is a free-to-use platform that allows developers to collaborate on a project. African researchers can take advantage of the platform to learn from the resources shared on it. It also provides cloud resources, programming tools, etc.

C. Potentials of Land Cover Mapping Applications
The benefits of using intelligent systems such as deep learning to automatically classify satellite images are becoming more evident in land cover mapping applications. These applications will aid the management of long-term challenges faced in Africa. In this section, we present an overview of some land cover mapping applications.

1) Land use planning:
Land use planning is the process of allocating and reallocating land resources for different purposes such as infrastructure distribution, recreational and industrial use, and transportation routes. This is usually done by the government to ensure efficient use of land resources for orderly development. Although humans have been modifying land to obtain food and other essentials for thousands of years, the current frequency and intensity of land use/land cover (LULC) changes in Africa are far greater than ever before in history, driving unprecedented changes in ecosystems and environmental processes at local, regional, and global scales [26]. The use of deep learning techniques for automatic land cover classification is critical for studies involving extensive land mass, such as is found in most parts of Africa. Other benefits include cost and time savings compared with other machine learning techniques. Additionally, the digital information is easy to retrieve, update, edit, and store.
2) Agricultural development: Despite conscientious efforts by governments to increase the gross domestic product (GDP) in Africa, the number of malnourished people is still on the increase. This problem can be addressed by expanding food production through agriculture supported by satellite images and deep learning techniques. Ways through which the above can be achieved include:
3) Population estimation: Population estimation involves obtaining a reliable estimate of the number of persons in a given area at a particular time. Accurate mapping of population distribution is essential for policy-making, urban planning, administration, and risk management in hazardous areas [27]. Although the census (a count of the population) has been used in the past to collect this important information in Africa, the high cost of carrying out the exercise has made it difficult to sustain. For instance, in Nigeria, the last census was conducted in 2006. The automatic classification of land cover using satellite images and DL techniques will provide a cost-effective alternative for estimating the population. It will also allow for large-coverage mapping. As seen in [28], the CNN model's estimates were comparable to the census county-level population projections. The population can be estimated by mapping the number of settlement areas, the area of urban land, the area occupied by different land uses, or even directly from the spectral and textural information available at the pixel scale [27].
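One of the estimation routes mentioned above, using the area occupied by different land uses, can be sketched as a simple calculation on a classified map. Everything numeric in this example is an assumption made for illustration: the 1,000 m pixel size (chosen so that each pixel covers 1 square km), the tiny 4x4 map, and the population densities per class are made-up values, not measurements.

```python
from collections import Counter


def class_areas_km2(classified_map, pixel_size_m):
    """Area covered by each land cover class, in square kilometres."""
    pixel_area_km2 = (pixel_size_m * pixel_size_m) / 1_000_000
    counts = Counter(label for row in classified_map for label in row)
    return {label: n * pixel_area_km2 for label, n in counts.items()}


def estimate_population(classified_map, pixel_size_m, persons_per_km2):
    """Population = sum over classes of (class area x assumed density)."""
    areas = class_areas_km2(classified_map, pixel_size_m)
    return sum(area * persons_per_km2.get(label, 0.0)
               for label, area in areas.items())


# Made-up classified map: each cell holds the predicted class of one pixel.
classified_map = [
    ["built_up", "built_up", "forest", "forest"],
    ["built_up", "built_up", "forest", "water"],
    ["built_up", "forest", "forest", "water"],
    ["built_up", "forest", "forest", "water"],
]
# Hypothetical densities (persons per square km); classes not listed count as 0.
assumed_density = {"built_up": 5000.0, "forest": 10.0}

areas = class_areas_km2(classified_map, pixel_size_m=1000)
population = estimate_population(classified_map, 1000, assumed_density)
print(areas)
print(population)
```

The quality of such an estimate depends on both the classification accuracy and the assumed densities, which would normally be calibrated against census or survey data for the region.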

4) Geoscience:
In recent years, deep learning networks have become an increasingly important interdisciplinary tool that has advanced several fields such as healthcare, image processing, and speech recognition. However, their adoption in geoscience has been relatively low [29]. Mapping large land areas with satellite data and deep learning techniques can be applied in various geoscience tasks, including:
a) Exploring and preparing maps quickly to help evaluate the geo-potential of any specific area.
b) Studying the general physical characteristics of rocks (lithology).
c) Geological and geo-structural mapping.
d) Mineral exploration.
e) Borehole drilling.
f) Geo-hazard monitoring.
5) Disaster management: According to a World Bank report, "Developing countries suffer more than 95 percent of all deaths caused by natural disasters and losses (as a percentage of GDP) are 20 times greater." Africa has experienced its fair share as a result of its high population densities, poor infrastructure, unstable landforms, and severe weather conditions. These disasters, both natural and man-made, include drought, floods, earthquakes, desertification, and climate change.
Satellite images cover wide areas and provide a massive amount of land cover information. Analysis of these images using deep learning techniques is imperative for effective mitigation and management of these risks.
Its uses in disaster management include:
a) Mapping fire-, flood-, and hazard-prone areas for disaster monitoring and mitigation.
b) Weather forecasting.
c) Disaster response planning.
d) Impact/damage assessment.

IV. CONCLUSION
Africa is confronted with serious developmental challenges arising from unplanned and unguided use of its land resources. It is therefore critical to provide accurate and up-to-date information about its land surface. The benefits of using intelligent systems such as deep learning to map very large land areas are becoming more evident. This information will aid the management of the long-term challenges faced in Africa. Furthermore, it will provide a more cost-effective and time-efficient solution than visual interpretation or the other machine learning techniques (unsupervised, supervised, and object-based) currently in use in Nigeria. This work aims to deepen the understanding of the application of CNNs to satellite image-based land cover mapping in developing continents, and to encourage African researchers and scientists to leverage these new digital technologies to drive large-scale transformation and competitiveness in earth observation applications.