Spatial Cloud Detection and Retrieval System for Satellite Images

In the last decade we have witnessed a large increase in data generated by earth observing satellites. Hence, intelligent processing of the huge amount of data received by hundreds of earth receiving stations, with approaches oriented specifically to satellite images, presents itself as a pressing need. One of the most important steps in the early stages of satellite image processing is cloud detection. Satellite images with a large percentage of cloud cannot be used in further analysis. While there are many approaches that deal with different semantic meanings, few deal specifically with cloud detection and retrieval. In this paper we introduce a novel approach that spatially detects and retrieves clouds in satellite images using their unique properties. Our approach is developed as a spatial cloud detection and retrieval system (SCDRS) that introduces a complete framework for a specific semantic retrieval system. It uses a query by polygon (QBP) paradigm for the content of interest instead of the more conventional rectangular query by image approach. First, we extract features from the satellite images at multiple tile sizes using spatial and textural properties of cloud regions. Second, we retrieve our tiles using a parametric statistical approach within a multilevel refinement process. Our approach has been experimentally validated against the conventional ones, yielding enhanced precision and recall rates while at the same time giving more precise detection of cloud coverage regions.


I. INTRODUCTION
Satellite images have become a common component of our daily life, whether on the Internet, in cars, or on our hand-held mobile handsets. A huge volume of image content appears every second through multiple competing satellite systems [1]. Manual interaction with this large volume of data is becoming more and more impractical, which creates an urgent need for automatic treatment to store, organize and retrieve this content [2].
Traditional textual meta-data such as geographic coverage, time of acquisition, sensor parameters and manual annotation are now insufficient to retrieve images of interest when we target a specific visual concept such as desert, rock, crops or clouds [3]. In many fields, we need specific contents from the satellite images, such as specific crops, geological structures or climate changes.
Manual annotation requires a human to annotate every region: users enter descriptive words after the image is downloaded from the satellite. However, it is a labor intensive and tedious process [4]. Therefore we need to retrieve images that contain our intended contents automatically. The challenge of the content based image retrieval (CBIR) approach is how to bridge the gap between the low level features that describe the scenes and our human understandable semantic concepts. This gap of understanding is called the semantic gap [5] [6]. In addition, these semantic concepts themselves may be defined differently, e.g. each of us interprets what he sees from his own point of view.
The most commonly used features include those reflecting color, texture, shape, and salient points in an image. For instance, in a color layout approach, an image is divided into a small number of sub-images and the average color components (e.g. red, green, and blue intensities) are computed for every sub-image [7]. Texture features are intended to capture the granularity and repetitive patterns of surfaces within an image.
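The color layout idea described above can be illustrated with a minimal Python sketch. The function name `color_layout`, the nested-list image representation and the grid parameters are assumptions made for illustration only; they are not taken from [7] or from the system presented in this paper.

```python
from statistics import fmean


def color_layout(image, grid_rows, grid_cols):
    """Return the mean (R, G, B) of each cell of a grid_rows x grid_cols grid.

    `image` is a list of rows; each row is a list of (r, g, b) tuples.
    The image is assumed to be at least as large as the grid.
    """
    height, width = len(image), len(image[0])
    layout = []
    for gr in range(grid_rows):
        # Row bounds of this grid cell (integer division spreads the remainder).
        r0, r1 = gr * height // grid_rows, (gr + 1) * height // grid_rows
        row_cells = []
        for gc in range(grid_cols):
            c0, c1 = gc * width // grid_cols, (gc + 1) * width // grid_cols
            pixels = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            # Average each color channel over the pixels of the sub-image.
            row_cells.append(
                tuple(fmean([p[ch] for p in pixels]) for ch in range(3))
            )
        layout.append(row_cells)
    return layout
```

Each sub-image is thus reduced to three numbers, which is what makes the color layout a compact but coarse descriptor.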
The traditional satellite cloud image search method was based on the file name and the sensor parameters of every image. The disadvantages of this method are that it cannot describe image contents such as cloud shape [8] and that it makes retrieving images inconvenient [9].
We compiled statistics on Spot4 satellite observations of the Middle East from the NARSS archive to determine the percentage of clouds in these scenes for the period from January 2006 to December 2009. There were about 170000 scenes covering the receiving station area. Normally, for each scene, an expert has to decide manually the percentage of cloud coverage.
The different percentages of cloud coverage during each year are shown in figure 1 and table I.

II. REVIEW OF RELATED WORK
During the last decade many approaches have been proposed to retrieve satellite images using their content in general. Less effort has been devoted specifically to clouds, despite their importance in satellite image processing and in meteorological management and observation. F. Acqua and P. Gamba presented a tool for shape similarity evaluation for query-by-shape searching in meteorological image archives based on the point diffusion technique [8]. R. Holowczak et al. reported a system that can automatically determine whether a region of interest is visible in the image, free from cloud, and can incorporate this into the meta-data of individual images to enhance searching capability [10]. T. Nauss et al. proposed an algorithm based on the analytical solutions of the radiative transfer equations valid for optically thick, weakly absorbing cloud layers [11]. D. Fu and L. Xu used the 2D Gabor wavelet in satellite image classification [12]. D. Upreti used the gray level co-occurrence matrix (GLCM) and a histogram quantization technique to retrieve cloud patterns in order to discover tropical cyclones [13].
The previous approaches were concerned with cloud retrieval. We made the following observations about them: (1) most of the previous work was directed at meteorological observation images with very low resolution; (2) it does not address the cloud removal preprocessing operation, which is still done manually; (3) it does not handle the spatial distribution of cloud within the scene. Our proposed approach covers these missing points. Detecting and retrieving these clouds makes it possible, as a further process, to remove them and replace the cloudy sub-images with clear ones.

III. SYSTEM OVERVIEW
Our system is composed of two main stages, as shown in figure 2. The first stage is the cloud signature database building stage, which is responsible for building the feature vectors for different cloud patterns. The second stage is the cloud detection and retrieval stage, which determines, for each satellite scene, where the clouds are and what percentage of the scene they cover. We have used two strategies in our system [1]. The first is the query by polygon strategy, where we build our signature database using cloud polygons instead of rectangular shapes. The second is the multiple size tiling strategy, where we break down our scene into tiles of different sizes and then extract features to obtain feature vectors. According to these strategies, the two stages pass through different sub-processes, starting with tiling and followed by feature extraction to form feature vectors. This is done for each level of retrieval.

A. Cloud Signature Database Building Stage
There are many forms in which clouds appear in satellite images, as shown in figure 3. These forms differ depending on the altitude and density of the clouds [14], ranging from low-density water vapor to high-density clouds at different altitudes. Besides clouds, there are also their shadows, which should be taken into account during retrieval. The first stage of the cloud retrieval process is to determine the cloud signature, as shown in figure 4. This is done using the query by polygon approach, where we first determine the different types of clouds and then draw georeferenced polygons that contain these clouds. These different types of clouds are used to form signature databases according to the tile size used. Using our proposed feature extraction algorithm, we compute the feature vectors of the cloud polygon tiles.

B. Cloud Detection and Retrieval Stage
After building our cloud signature database, we have to build the feature vectors for each scene, as shown in figure 5. Our approach is based on breaking down the whole image into small sub-image sections called tiles. The number of resulting tiles is determined by their size.
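The relation between tile size and tile count can be illustrated with a short Python sketch. It assumes square tiles expressed in pixels and clips the tiles on the right and bottom edges; the function name `tile_scene` and this edge handling are illustrative assumptions, not details given in the paper.

```python
def tile_scene(height, width, tile_size):
    """Yield (row, col, row_end, col_end) pixel bounds of square tiles.

    Tiles on the bottom and right edges are clipped to the scene,
    so they may be smaller than tile_size.
    """
    for row in range(0, height, tile_size):
        for col in range(0, width, tile_size):
            yield (
                row,
                col,
                min(row + tile_size, height),
                min(col + tile_size, width),
            )
```

Halving the tile size roughly quadruples the number of tiles, which is why the smaller tile size is costlier to process for the same scene.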
Following the two-stage hierarchy used in [1] for the retrieval process, we have rebuilt the system. Instead of starting with feature databases and obtaining query features for each semantic, we have reversed the order: we begin by building the cloud signature database, and then the input scene is treated as the query image. The two stages of the hierarchy, the candidate selection stage and the refinement stage, are used. In the candidate selection stage, we define the primary candidate areas for clouds. In the refinement stage, we refine the first-stage areas using their neighborhoods with a smaller tile size.

V. MAIN SYSTEM PROCESSES
The main system processes (feature extraction, retrieval and evaluation) have some key points to be included in the two-level hierarchy to enhance the cloud detection and retrieval system.

A. Features Extraction Process
We have relied on various domains to obtain the tile signature, either for the cloud example dataset or for the input satellite image. These domains capture the spectral and textural characteristics of images. To build our feature vector database, we start by determining the components of our feature vector for each tile and its length. For each multispectral tile image, we form the feature vector of each band from different spectral and textural characteristics of the image. We used the mean and standard deviation statistics of each feature domain for each band. The feature domains we used are the histogram, Daubechies wavelet transform coefficients, discrete cosine transform coefficients and discrete Fourier transform coefficients [15]. Using these domains, we build one feature vector per domain. For each multispectral tile, we build the domain feature vector for each domain as in equation 1.

[Equation 1]
We then use these domain feature vectors to form the domain feature database for m tiles, as in equation 2.
[Equation 2]
Using the feature vectors of all tiles, we formulate our cloud signature database or input scene feature vectors using all domains, as in equation 3.
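As a rough illustration of the per-band statistics described in this subsection, the following Python sketch builds one domain feature vector for a tile by taking the mean and standard deviation of a domain representation of each band. It is a sketch under stated assumptions rather than the implementation used in SCDRS: the names `domain_feature_vector` and `dft_magnitudes`, the flat-list band layout, the use of raw intensities as the default domain, and the naive one-dimensional DFT are all illustrative choices.

```python
import cmath
from statistics import fmean, pstdev


def dft_magnitudes(values):
    """Magnitudes of a naive O(n^2) 1-D discrete Fourier transform."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, v in enumerate(values)))
        for k in range(n)
    ]


def domain_feature_vector(tile_bands, transform=list):
    """Concatenate (mean, std) of the transformed values of every band.

    `tile_bands` holds one flat list of pixel values per spectral band.
    `transform` maps a band to its domain representation; the default
    (assumed here) keeps the raw intensities.
    """
    features = []
    for band in tile_bands:
        coeffs = transform(band)
        features.extend([fmean(coeffs), pstdev(coeffs)])
    return features
```

With this layout, a tile with b bands yields a vector of length 2b per domain, and stacking the vectors of all tiles row by row gives the per-domain feature database. A call such as `domain_feature_vector(bands, transform=dft_magnitudes)` would give the corresponding Fourier-domain vector.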

B. Retrieval Process
The retrieval process, as shown in figure 6, has two sub-stages, as mentioned in [1]: the candidate selection stage and the refinement stage.
In the candidate selection stage, we use the features of the larger tile size to find the tiles that best match cloud. In the refinement stage, we use the features of the smaller tile size for the first-stage results and their neighborhoods to obtain our final results.
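The two sub-stages can be sketched in Python as a coarse-to-fine search over a tile grid. This is a minimal sketch, assuming that each coarse tile splits into `split` x `split` fine tiles and that the neighborhood is the 8-connected one; the callables `is_cloud_coarse` and `is_cloud_fine` are hypothetical stand-ins for the tile classifiers at the two tile sizes.

```python
def neighbours(cell, n_rows, n_cols):
    """Yield the cell itself and its 8-connected neighbours inside the grid."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n_rows and 0 <= cc < n_cols:
                yield rr, cc


def two_stage_retrieval(n_rows, n_cols, is_cloud_coarse, is_cloud_fine, split=2):
    """Return the set of fine-grid cells labelled as cloud.

    Stage 1 selects candidate coarse tiles; stage 2 re-examines the
    candidates and their neighbourhoods at the finer tile size.
    """
    # Stage 1: candidate selection on the coarse grid.
    candidates = {
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if is_cloud_coarse(r, c)
    }
    # Stage 2: refinement of candidates and their neighbours.
    to_refine = set()
    for cell in candidates:
        to_refine.update(neighbours(cell, n_rows, n_cols))
    result = set()
    for r, c in to_refine:
        for fr in range(r * split, (r + 1) * split):
            for fc in range(c * split, (c + 1) * split):
                if is_cloud_fine(fr, fc):
                    result.add((fr, fc))
    return result
```

Only the candidates and their neighbors are examined at the fine tile size, so small cloud patches far from any candidate are not recovered in this sketch; that is the trade-off for not classifying every fine tile in the scene.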
We have used a retrieval engine based on a statistical parametric paradigm using the normal distribution [16] rather than the traditional nearest neighbor approach. The statistical parametric paradigm aims to determine the parameters of the statistical distribution that the data follows, such as the mean and standard deviation. We define the training dataset, which represents the cloud example tile set and the non-cloud example tile set, as in equation 4.
[Equation 4]
This is done for every tile size. Therefore, our global signature data is formed from all sizes used in our system, as in equation 5.

[Equation 5]
After we have built our statistical model, SCDRS is ready to receive satellite images as input.
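A parametric classifier of this kind can be sketched in Python as follows. The sketch fits one normal distribution per feature for the cloud and the non-cloud example sets and labels a tile by comparing log-likelihoods. Treating the features as independent, the lower bound on the standard deviation, and the function names are assumptions made here for illustration; the paper does not specify these details.

```python
import math
from statistics import fmean, pstdev


def fit_gaussian(samples, min_std=1e-6):
    """Per-feature (mean, std) of a list of equal-length feature vectors."""
    columns = list(zip(*samples))
    # min_std avoids a zero variance when a feature is constant.
    return [(fmean(col), max(pstdev(col), min_std)) for col in columns]


def log_likelihood(x, params):
    """Log-density of x under independent normal distributions."""
    total = 0.0
    for value, (mu, sigma) in zip(x, params):
        total -= math.log(sigma * math.sqrt(2 * math.pi))
        total -= (value - mu) ** 2 / (2 * sigma ** 2)
    return total


def is_cloud(x, cloud_params, clear_params):
    """Label a feature vector as cloud if the cloud model explains it better."""
    return log_likelihood(x, cloud_params) > log_likelihood(x, clear_params)
```

Unlike a nearest neighbor search, classification here needs only the stored means and standard deviations, not the full set of example tiles, so one such model can be kept for every tile size.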

C. System Evaluation Process
Our evaluation process is carried out in terms of recall and precision (equations 6 and 7 respectively) using relevant areas in the database. We use map coordinates (i.e. latitude and longitude) instead of file coordinates (pixels), because map coordinates are universal and continuous whereas file coordinates are file specific. The global coordinate system is independent of the pixel size, whatever the scanning satellite or stored file. The percentage of cloud area in the input scene is given by equation 8, and the actual cloud percentage retrieved is calculated as shown in equation 9.
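The area-based measures can be illustrated with a short Python sketch that uses the standard definitions of recall (overlap area divided by relevant area) and precision (overlap area divided by retrieved area). The representation of regions as sets of equal-area map cells, the function name `evaluate` and the returned keys are assumptions for illustration; they are not the paper's equations 6 to 9.

```python
def evaluate(retrieved, relevant, cell_area_km2, scene_area_km2):
    """Area-based recall, precision and cloud percentages.

    `retrieved` and `relevant` are sets of map-grid cells (for example,
    tuples of latitude and longitude indices) of equal ground area.
    """
    overlap_area = len(retrieved & relevant) * cell_area_km2
    retrieved_area = len(retrieved) * cell_area_km2
    relevant_area = len(relevant) * cell_area_km2
    return {
        "recall": overlap_area / relevant_area if relevant_area else 0.0,
        "precision": overlap_area / retrieved_area if retrieved_area else 0.0,
        # Cloud cover according to the reference (test) polygons.
        "reference_cloud_percent": 100.0 * relevant_area / scene_area_km2,
        # Cloud cover according to the tiles returned by the system.
        "retrieved_cloud_percent": 100.0 * retrieved_area / scene_area_km2,
    }
```

Because the cells are indexed in map coordinates, the same computation applies to scenes from different sensors regardless of their pixel size.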

VI. EXPERIMENTAL RESULTS
In our experiments we used Spot4 satellite scenes with different cloud cover percentages, which cover about [...]. Each scene covers 60 [...] of the earth's surface in Egypt with a pixel size of [...]. We also used a Landsat archive image database with different cloud coverage percentages. These scenes cover about [...] with a [...] pixel size.
Each scene has been divided into sub-images of [...] and [...]. The experiment scenes formed more than [...] sub-images, which are preclassified cloud images. We used samples of different cloud types to form our cloud signature database, which is composed of [...] sub-images acting as cloud examples.

VII. RESULTS ANALYSIS
For our semantic concept, which is cloud, we first used two categories of polygon shapes: one used for building the cloud signature database, and the other tied to each input scene and used for evaluation. An example result of our system is depicted in figure 7.
The results for each input scene can be evaluated in two ways. The first is the input test polygon for cloud, which determines exactly the positions of clouds in the scene and the area of clouds compared to the whole scene area. The second is the expert's estimate used in the ground station, which gives the range of cloud cover as explained in table I.
Table II presents the results of the two successive stages of the system. It shows how the different types of feature domains affect the results.
To determine the cloud coverage percentage, we calculated the total area of the output cloud tiles with respect to the whole scene area, as in equation 8. We considered precision the most important parameter, since the output results must be accurate and the number of non-cloud tiles returned must be reduced. Therefore, the cloud examples must be selected so that they contain cloud only. Table III shows the recall and precision results using the different feature domains. The accuracy of the different features is very comparable. The results indicate that the key point here is the processing time, where histogram features perform best because they are the least complex. As the tile becomes smaller, the spectral characteristics become more sufficient than the textural characteristics to distinguish between tiles.

VIII. CONCLUSION
In this paper, a new approach was developed to detect the percentage of clouds and retrieve their positions within satellite images using two stages: the Cloud Signature Database Building stage and the Cloud Detection and Retrieval stage. The two stages used a multilevel framework hierarchy of candidate selection and candidate refinement processes. This is done using spatial and textural features and a parametric statistical approach for the retrieval process. The capability of the developed system was tested using dedicated satellite images and assessed in terms of cloud coverage percentage together with the traditional precision and recall measurements. Results show that the developed system enhanced the precision and recall and at the same time gives an assessment of cloud coverage closer to the real area calculations. They also show that the spectral features have higher accuracy than textural features. As future work, we propose a system for detecting different types of clouds using more robust retrieval algorithms integrated with GIS systems.

Fig. 1 Clouds coverage percentages

TABLE II :
DIFFERENT RECALL AND PRECISION FOR TWO

TABLE III :
DIFFERENT RECALL (R) AND PRECISION (P) FOR DIFFERENT TYPES OF FEATURES USING 0.5 KM TILE SIZE AND PROCESSING TIME (PT)