Remotely Sensed Data Processing on Grids by using GreenLand Web Based Platform

Developing applications for analyzing and processing remotely sensed data is essential for environmental prediction and management strategies. Applications that monitor the environment and natural resources must process large data sets and respond quickly. These requirements usually imply high computing power, which can be achieved through the parallel and distributed capabilities of the Grid infrastructure. This paper presents the GreenLand application, a user-friendly web-based platform that lets environmental specialists run remote sensing applications on Grid computing technology. The theoretical concepts and basic functionalities of the GreenLand platform were tested in two detailed case studies: a land cover/land use analysis of Istanbul (Turkey) using vegetation indices and density slice classification on Landsat 5 Thematic Mapper (TM) imagery, and the retrieval of large remote sensing product datasets from the Moderate Resolution Imaging Spectroradiometer (MODIS) for the entire Black Sea Catchment. All the image processing scenarios used in the reported experiments were developed through the enviroGRIDS project, which targets the Black Sea Catchment (BSC) area.

Keywords: GreenLand; Landsat; MODIS; image processing; Grid computing


I. INTRODUCTION
Environmental applications require large input data sets that mainly consist of remotely sensed images (some of them up to 1 GB in size). In addition, most applications in the Earth Science domain use algorithms with large sets of parameters that have to be combined in a specific way to obtain accurate results [1]. Remote sensing image processing is very demanding in terms of data manipulation and computing power. As a result, it is rarely possible to obtain reasonable processing times for environmental applications on a stand-alone machine. The Grid infrastructure addresses this problem by providing parallel and distributed computation.
The Grid infrastructure [2] is the execution environment where all the data processing takes place. This emerging technology provides access to computing power and data storage capacity distributed over the globe [3].
Grid computing is the use of multiple computers to solve a single problem at the same time, usually a scientific problem that requires a great number of computer processing cycles or access to large amounts of data [4].
The approach presented in this paper uses the Grid infrastructure, whose high-performance machines allow parallel and distributed execution of satellite image processing tasks. The GreenLand platform partitions the main use case into smaller tasks automatically at runtime. The GreenLand application is conceived as free Geographic Information System (GIS) software for geospatial data management and analysis, image processing, graphics/maps production, spatial modelling, and visualization [1]. The Grid-based GreenLand platform, produced within the enviroGRIDS project [5] and available through the Black Sea Catchment Observation System (BSC-OS) Portal [6], offers scalability when dealing with a large number of users and/or a large volume of data to process.
The capabilities of the Grid-based platform were tested in two case studies using remotely sensed data from Landsat 5 TM (Thematic Mapper) and MODIS (Moderate Resolution Imaging Spectroradiometer). In the first case study, the land use/land cover categories of Istanbul were derived using remote sensing vegetation indices, namely the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI), together with density slice classification. In the second case study, a workflow was developed to retrieve two MODIS products, MOD15 (Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation (FPAR)) and MOD16 (Surface Resistance and Evapotranspiration (ET)), at the scale of the BSC. The GRASS (Geographic Resources Analysis Support System) library was used as the source of the image processing code [7].
The GreenLand web-based application provides smart solutions by automating repetitive processes and using distributed or Grid computing technology when needed. Thanks to the computing and storage capabilities offered by the Grid infrastructure, workflow execution times are significantly reduced in comparison with standalone/cluster processing. This makes the tool useful for the sustainable management of the Black Sea catchment using remote sensing technology.

A. Study Area
Case Study I: Determination of Land Cover/Land Use of Istanbul, Turkey
Istanbul is located in north-western Turkey within the Marmara Region, on a total area of 5,343 square kilometers. There are several reasons why Istanbul was chosen as the test case for deriving land cover categories. Human activity is increasingly disturbing natural resources, ecosystems and the environment in the city.
As a result, the city is facing serious water quality problems, deforestation, desertification, soil erosion, degradation of land productivity, and the loss of biodiversity and sensitive regions. There is an urgent need to determine and monitor the land cover types of the megacity. It is therefore important to derive land cover/land use information using freely available remotely sensed data and freely available image processing platforms [1].

Case Study II: MODIS Mosaic at Black Sea Catchment
The Black Sea Catchment area covers more than 2 million square kilometers, extending entirely or partially over 24 countries. Approximately one hundred and sixty million inhabitants live in this area, which is visited by millions of tourists every year.
One of the aims of the enviroGRIDS project was to assess water resources in the past, the present and the future using the Soil and Water Assessment Tool (SWAT) [8] for the entire catchment. Combined with in-situ data, remote sensing products could be a valuable source of information for improving the modeling of such a large and complex environment, because they provide homogeneous datasets over broad areas with high temporal resolution.

B. Data
In this paper, two case studies are highlighted. In case study I, Landsat 5 TM data from 2009 were used to derive the land cover/land use categories of Istanbul using vegetation indices and density slicing classification in the GreenLand platform. The Landsat 5 TM sensor acquires data in seven spectral bands that cover a wavelength range from 450 nm to 2350 nm with a spatial resolution of 30 m. The remotely sensed data were obtained from NASA through the Warehouse Inventory Search Tool (WIST) [9].
In the second case study, two MODIS level 4 products, MOD15 and MOD16 [10], were selected to develop a workflow that facilitates their retrieval at the scale of the Black Sea catchment. These products are multilayer stacks at 1 km resolution, derived from the EOS (Earth Observation Services) instrument and freely provided by NASA on an 8-day basis in .hdf format. These high-level processed products are used especially for monitoring wildfire danger and crop/range drought, and for describing the canopy structure. The MOD15 Leaf Area Index (LAI) defines the one-sided leaf area per unit ground area (value between 0 and 8), while the Fraction of Photosynthetically Active Radiation (FPAR) measures the proportion of available radiation (400 to 700 nm) that a canopy absorbs (value between 0 and 1). MOD16 consists of surface resistance and evapotranspiration.

III. GREENLAND OVERVIEW
The GreenLand [11] Currently the platform is used in two major case studies, concerning the Istanbul geographic area and the Black Sea catchment region.
When studying complex use cases (like the ones presented in this paper) it is hard to model and simulate them as a whole. Instead, domain specialists need to divide the use cases into smaller modules and analyze them separately; only then are they able to assemble the global results.
The solution implemented in the GreenLand platform represents complex use cases using notions from graph theory. Each node of the graph represents one of the algorithms within the use case (e.g. NDVI, EVI, density slicing), while the edges specify the relations between these algorithms.
Complex use cases usually take a long time to execute, and for this reason the Grid-based data processing solution was adopted. Because the scenarios are described as workflows, GreenLand can optimize the entire execution by creating groups of nodes of similar complexity that are processed in parallel on different Grid machines.
The nodes of the workflow are not independent of one another; their inputs and outputs are connected through uni-directional edges. This means that at runtime some nodes wait until the nodes they depend on complete their execution. Only then can they start to process the data.
Consequently, the execution of a workflow always generates multiple partial results, and it is up to the GreenLand platform to combine them and create the final outputs that correspond to the main workflow.
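The dependency behaviour described above can be illustrated with a small sketch. It is not the GreenLand scheduler, whose internals are not given here; it only shows one common way to group the nodes of a directed acyclic workflow into levels whose members have all their inputs available and could therefore run in parallel. The node names and the function `execution_levels` are hypothetical.

```python
from collections import defaultdict


def execution_levels(nodes, edges):
    """Group workflow nodes into levels (Kahn-style topological layering).

    Every node in a level depends only on nodes from earlier levels, so the
    members of one level could be dispatched to different machines at once.
    """
    indegree = {node: 0 for node in nodes}
    children = defaultdict(list)
    for source, target in edges:
        children[source].append(target)
        indegree[target] += 1

    # Start with the nodes that wait for nothing.
    level = sorted(node for node in nodes if indegree[node] == 0)
    levels = []
    processed = 0
    while level:
        levels.append(level)
        processed += len(level)
        next_level = []
        for node in level:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:  # all inputs of the child are ready
                    next_level.append(child)
        level = sorted(next_level)

    if processed != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return levels


# Hypothetical workflow: correct two bands, compute NDVI, then classify.
nodes = ["dos_red", "dos_nir", "ndvi", "density_slice"]
edges = [("dos_red", "ndvi"), ("dos_nir", "ndvi"), ("ndvi", "density_slice")]
levels = execution_levels(nodes, edges)
```

In this example the two band corrections form the first level, while the NDVI node has to wait for both of them and the density slicing node has to wait for the NDVI node.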
The complexity of processing data over the Grid infrastructure is hidden from the user by special interactive techniques implemented in the graphical user interface, which allow the creation, instantiation, and execution of the use cases, represented as workflows.
Funded by the European Commission seventh research framework through the enviroGRIDS project (Grant Agreement n° 226740).
Among other features provided by the GreenLand platform, the following are the most important:
• Automatic data retrieval from remote repositories, using the Open Geospatial Consortium (OGC) [12] standards and the File Transfer Protocol (FTP);
• Manual upload of spatial data from the user's local machine;
• Parallel and distributed processing of satellite images over the Grid infrastructure. This also involves partitioning the use case into smaller processes and scheduling the tasks over the physical machines;
• Execution optimization by creating groups of workflow nodes that have similar complexities. In this way a balanced Grid processing is achieved;
• Standardized data processing through the OGC WPS service, meaning that external systems are able to access the exposed workflows and execute them remotely;
• GRASS [13] support for developing new geospatial algorithms that can then be used as nodes within the workflows;
• Data level interoperability with other platforms through the Web Map Service (WMS) and Web Coverage Service (WCS), which are part of the OGC standards;
• Dynamic data visualization, by overlaying the execution results directly on interactive maps.
From the graphical user interface, the specialist can execute several workflows simultaneously during the same working session. GreenLand uses the project concept, which can be defined as a virtual container that stores groups of workflows defined by the user.
Fig. 1 highlights a project example that contains four distinct workflows that were selected from the right side list.
Before starting the Grid execution the user must instantiate all these items with specific data inputs.

A. Case Study I: Determination of Land Cover/Land Use of Istanbul, Turkey
The Istanbul case study applies remote sensing image processing steps to derive land cover/land use categories, especially vegetation areas, in Istanbul, Turkey using the GreenLand platform.
As depicted in Fig. 2, the case study starts with data preprocessing, followed by the calculation of vegetation indices. Then, after classification, the accuracy assessment steps are executed and finally the results are presented as thematic maps.
Satellite data pre-processing consists of radiometric calibration (atmospheric correction) of the 2009 Landsat TM data. The objective of radiometric correction is to recover the "true" radiance and/or reflectance of the target of interest [14]. Conversion from Digital Number (DN) to radiance (analogue signal) was conducted using calibration parameters such as gain and offset. These are available in published sources and image header files [15].
Equation (1) is used for the calculation of radiance values from DN values:
$$L = C_0 + C_1 \cdot DN \qquad (1)$$
where $L$ is the top of atmosphere (TOA) upwelling radiance, $C_0$ and $C_1$ (mW cm$^{-2}$ sr$^{-1}$ μm$^{-1}$) are the Offset and Gain values, and $DN$ is the digital number.
L was converted to TOA reflectance, R (unitless), using (2):
$$R = \frac{\pi \cdot L_\lambda \cdot d^2}{ESUN_\lambda \cdot \cos Z} \qquad (2)$$
where $R$ is the planetary reflectance, $d$ is the Earth-Sun distance, $L_\lambda$ is the at-sensor radiance, $Z$ is the solar zenith angle in degrees, and $ESUN_\lambda$ is the mean solar exoatmospheric irradiance at the top of the atmosphere.
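The two conversion steps can be sketched for a single pixel as follows. This is only an illustration under stated assumptions: the radiance is taken to be the linear combination of offset and gain described in the text, and the reflectance uses the commonly published TOA expression built from the quantities listed above. The function names are hypothetical, and the numeric values in the example are placeholders rather than real Landsat 5 TM calibration constants.

```python
import math


def dn_to_radiance(dn, offset, gain):
    """TOA radiance from a digital number: L = offset + gain * DN."""
    return offset + gain * dn


def radiance_to_toa_reflectance(radiance, esun, earth_sun_distance, solar_zenith_deg):
    """Unitless TOA reflectance: R = pi * L * d^2 / (ESUN * cos(Z))."""
    if not 0.0 <= solar_zenith_deg < 90.0:
        raise ValueError("solar zenith angle must be in [0, 90) degrees")
    cos_z = math.cos(math.radians(solar_zenith_deg))
    return math.pi * radiance * earth_sun_distance ** 2 / (esun * cos_z)


# Placeholder values for illustration only (not real calibration constants).
radiance = dn_to_radiance(dn=100, offset=-0.15, gain=0.08)
reflectance = radiance_to_toa_reflectance(
    radiance, esun=155.0, earth_sun_distance=1.0, solar_zenith_deg=35.0
)
```

In practice the offset, gain, solar zenith angle and acquisition date (which determines the Earth-Sun distance) would be read from the image header files mentioned in the text.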
The Dark Object Subtraction (DOS) method was used to correct for atmospheric scattering in the path [16]. DOS is an image-based approach that assumes that dark objects (such as water bodies) exist within an image and should have values very close to zero, so that radiance values greater than zero over these areas can be attributed to atmospheric scattering and subtracted from all pixel values in the image. The correction is applied by subtracting the minimum
The DOS method was implemented as a workflow within the GreenLand platform, and corrects satellite images affected by atmospheric conditions. This workflow takes a single input, representing one band of the satellite image, and generates an atmospherically corrected band that maintains the original size, projection, and location.
The DOS algorithm is implemented as a combination of GRASS functions. The advantage of using these functions inside the GreenLand platform is that they can take part in the parallel and distributed executions over the Grid infrastructure. This means that at runtime the platform transfers the input band specified by the user, together with the GRASS scripts, to the Grid machines.
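As a rough illustration of the idea behind DOS, the sketch below works on one band stored as a list of rows. It assumes, for simplicity, that the dark-object value is the smallest value found in the band. The GreenLand workflow relies on GRASS functions instead, so this pure-Python version, and the name `dark_object_subtraction`, are hypothetical stand-ins.

```python
def dark_object_subtraction(band):
    """Subtract the darkest value of a band from every pixel.

    `band` is a list of rows of numeric pixel values. The output has the same
    shape as the input, and results are clamped at zero.
    """
    # Assumption: the darkest pixel represents the atmospheric scattering offset.
    dark_value = min(min(row) for row in band)
    return [[max(value - dark_value, 0) for value in row] for row in band]


# Tiny made-up band: the value 5 plays the role of the dark object.
band = [[5, 7], [6, 10]]
corrected = dark_object_subtraction(band)
```

Because the operation is applied independently to each band, several bands can be corrected at the same time on different machines.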
A vegetation index is a number generated from some combination of remote sensing image bands that may have some relationship to the amount of vegetation in a given image pixel. The vegetation indices tested in this study, the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI), are described in Table 1. The NDVI is one of the oldest, most well-known, and most frequently used VIs (Vegetation Indices). The combination of its normalized difference formulation and its use of the highest absorption and reflectance regions of chlorophyll makes it robust over a wide range of conditions. It can, however, saturate in dense vegetation conditions when LAI becomes high. The value of this index ranges from -1 to 1. The common range for green vegetation is 0.2 to 0.8 [17].
The Enhanced Vegetation Index (EVI) was developed as an alternative vegetation index to address some of the limitations of the NDVI. The EVI was specifically developed to be more sensitive to changes in areas having high biomass (a serious shortcoming of NDVI), to reduce the influence of atmospheric conditions on vegetation index values, and to correct for canopy background signals. EVI tends to be more sensitive than NDVI to plant canopy differences such as LAI, canopy structure, and plant phenology and stress, whereas NDVI generally responds just to the amount of chlorophyll present. The value of this index ranges from -1 to 1. The common range of EVI values for green vegetation is 0.2 to 0.8. Both indices (NDVI and EVI) were implemented within the GreenLand platform as independent workflows. The development of these algorithms is based on the formulas described in Table 1.
For this experiment the 2009 Landsat 5 TM satellite image bands were used. The NDVI requires the Red and NIR bands as input, while the EVI workflow expects valid inputs for the Blue, Red, and NIR layers (Table 2).
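The two indices can be sketched per pixel as shown below. Table 1 is not reproduced in this text, so the sketch assumes the commonly published formulations: NDVI = (NIR - Red) / (NIR + Red), and EVI with the widely used coefficients G = 2.5, C1 = 6, C2 = 7.5 and L = 1. The function names and the sample reflectance values are hypothetical.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index for one pixel."""
    denominator = nir + red
    return (nir - red) / denominator if denominator != 0 else 0.0


def evi(blue, red, nir, gain=2.5, c1=6.0, c2=7.5, soil=1.0):
    """Enhanced Vegetation Index for one pixel (assumed standard coefficients)."""
    denominator = nir + c1 * red - c2 * blue + soil
    return gain * (nir - red) / denominator if denominator != 0 else 0.0


def ndvi_band(red_band, nir_band):
    """Apply NDVI pixel by pixel to two bands of identical shape."""
    return [
        [ndvi(red, nir) for red, nir in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


# Made-up reflectance values for a vegetated pixel.
sample_ndvi = ndvi(red=0.1, nir=0.5)
sample_evi = evi(blue=0.05, red=0.1, nir=0.5)
```

The keyword arguments make the role of each band explicit. Passing the Red and NIR values in the opposite order would still return a number, but with the wrong sign.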

Outputs:
1. EVI image band (GeoTIFF). Each pixel is in the [0, 1] range.
It is worth mentioning that the order in which the user specifies the inputs is very important, and should be identical to the one used inside the algorithms (see the two computation functions in Table 1). If the inputs are switched, the workflow will not fail at runtime, but will generate an erroneous result. Fig. 3 highlights how these concepts are mapped for the NDVI workflow. The input specification process is similar for the rest of the existing resources. When executing the NDVI workflow, the system automatically generates its internal representation, which is used for the parallel and distributed execution over the Grid infrastructure.
Based on the NDVI formula, a graphic representation is highlighted in Fig. 4, together with the XML-based structure.
Density slicing allows the user to define sub-intervals for characterizing the data. The advantage of density slicing is that it reveals a greater degree of brightness variability within the remotely sensed image than is apparent in the original image (e.g. black and white imagery). The method works best if the range of brightness values covers a single band of frequencies. Each interval is then assigned a color value. The intervals may be defined based on the application. Each range of input pixel values is assigned a single output pixel value in a density sliced image.
The range of pixel values may be defined by the user. Density slicing is most effective when the values of particular pixels are related to a physical variable.
The density slicing workflow acts like a pseudo-color algorithm, used in the creation process of the thematic maps that can be shared and analyzed by the scientific communities.
Whereas the DOS, NDVI, and EVI workflows take satellite images as their input data type, the density slicing algorithm required a new input type that represents the class ranges chosen by the user in the graphical interface.
Once this information is specified, the algorithm loops through all the pixels of the satellite image and assigns a specific color to each of them. In the end a thematic map is generated.
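The pixel loop described above can be sketched in a few lines. The class ranges and colors below are invented for illustration and are not the ones used in the Istanbul case study; the function name `density_slice` and the choice of half-open intervals are also assumptions.

```python
def density_slice(band, slices, nodata_color=(0, 0, 0)):
    """Map every pixel value to the color of the interval that contains it.

    `slices` is a list of (lower, upper, color) tuples. The lower bound is
    inclusive and the upper bound is exclusive. Values that fall outside all
    intervals receive `nodata_color`.
    """
    def classify(value):
        for lower, upper, color in slices:
            if lower <= value < upper:
                return color
        return nodata_color

    return [[classify(value) for value in row] for row in band]


# Hypothetical class ranges for an NDVI image, with RGB colors.
slices = [
    (-1.0, 0.0, (0, 0, 255)),      # e.g. water
    (0.0, 0.4, (128, 128, 128)),   # e.g. built-up or bare surfaces
    (0.4, 1.0, (0, 160, 0)),       # e.g. vegetation
]
ndvi_image = [[-0.2, 0.1], [0.5, 0.79]]
thematic_map = density_slice(ndvi_image, slices)
```

Each output pixel carries a single color per class, which is what turns the continuous index image into a thematic map.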
If more than one domain specialist is involved in the scenario development process (e.g. the Istanbul case study), the GreenLand platform provides a collaborative environment for developing new algorithms and workflows, and also for visualizing and analyzing the results.
The members of the scientific community do not always use the same applications for the local development of scenarios. For such situations it is appropriate to create standard services that can be used by all these tools. This is the case for the visualization and interpretation of the GreenLand results, which can be accessed through the OGC standards.
For sharing the density slicing results among the scientific community members, the GreenLand platform provides the Web Map Service (WMS), which exposes the output in a standardized format. Its two operations (GetCapabilities and GetMap) allow users to periodically query the data repository and to retrieve, as a static image, the data they are interested in.
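A GetMap request is an ordinary HTTP query, so a client can assemble it with the standard library. The sketch below uses the WMS 1.1.1 parameter names; the endpoint URL, the layer name and the bounding box are placeholders, since the actual GreenLand service address and layer identifiers are not given in this text.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    `bbox` is (min_x, min_y, max_x, max_y) in the units of `srs`.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(coordinate) for coordinate in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, layer and extent (not the real GreenLand service).
url = build_getmap_url(
    "https://example.org/wms",
    layer="ndvi_density_slice",
    bbox=(28.0, 40.8, 29.9, 41.6),
    width=512,
    height=256,
)
```

A GetCapabilities request is built the same way with `REQUEST=GetCapabilities`, and its response lists the layers and coordinate systems that the server accepts for GetMap.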
In order to create a dynamic visualization environment, the image returned by the GetMap operation is overlaid on an interactive map. Once the image is displayed, the user is able to extract relevant information from a specific area within the image boundaries. The area selection is also interactive, and can be performed directly with the mouse. When higher precision is needed, a set of input fields is provided where the user can specify the area in more detail.
Classification accuracy is the main measure of the quality of the thematic maps produced and required by users, typically to help evaluate the fitness of a map for a particular purpose [20]. Ground truth and classified classes are compared to assess classification accuracy. An error matrix is constructed for this comparison [21]. Each row of the matrix is reserved for one of the information classes used by the classification algorithm. Each column displays the corresponding ground truth classes in an identical order. The diagonal elements of the error matrix show the number of pixels classified correctly in each class. In this research, the accuracy of the classification is assessed using the error matrix and some common measures derived from it, namely overall accuracy, user's accuracy, producer's accuracy and the kappa coefficient. The confusion matrix is a simple cross-tabulation of the mapped class label against that observed in the ground or reference data for a sample of cases at specified locations [21].
The accuracy assessment workflow, implemented in the GreenLand platform, determines the quality of the Istanbul land cover/land use classification, by comparing the obtained results with the ground based measurements.
The accuracy assessment workflow implements a set of GRASS functions, in order to obtain the required statistics.It expects two inputs: the classified image and a vector file that contains the ground based measurements.
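The measures named above can be derived from an error matrix with a few sums. The sketch follows the layout described in the text (rows are classified classes, columns are ground truth classes) and uses the usual definitions of overall accuracy, user's accuracy, producer's accuracy and Cohen's kappa. The matrix in the example is invented and is not the one obtained in the study; the GreenLand workflow itself relies on GRASS functions rather than on this code.

```python
def accuracy_measures(matrix):
    """Compute accuracy statistics from a square error matrix.

    Rows are classified classes and columns are ground truth classes.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    column_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    correct = sum(matrix[i][i] for i in range(size))

    overall = correct / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, column_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    users = [matrix[i][i] / row_totals[i] if row_totals[i] else 0.0
             for i in range(size)]
    producers = [matrix[j][j] / column_totals[j] if column_totals[j] else 0.0
                 for j in range(size)]
    return {"overall": overall, "kappa": kappa,
            "users": users, "producers": producers}


# Invented 3-class error matrix with 25 reference points.
error_matrix = [
    [8, 1, 0],
    [2, 9, 0],
    [0, 0, 5],
]
measures = accuracy_measures(error_matrix)
```

User's accuracy is read along the rows and producer's accuracy along the columns, so swapping the matrix orientation exchanges the two measures while leaving overall accuracy and kappa unchanged.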

B. Case Study II: MODIS Mosaic at Black Sea Catchment
A large number of remote sensing products are freely available through the web. In this paper, the challenge was to find freely available products of scientific quality, covering the entire Black Sea catchment and with long temporal availability, which are needed especially for improving crop type monitoring and for helping to validate SWAT results. Two MODIS products, MOD15 (Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation (FPAR)) and MOD16 (Surface Resistance and Evapotranspiration (ET)), proved to be of great interest for validating SWAT results in the Vit river basin in Bulgaria [22]. The workflow developed here can therefore support the validation of SWAT models applied in the BSC at local and regional scales.
A specific workflow, "BlackSeaMosaicPDG", was developed in the GreenLand platform to retrieve these products directly at the scale of the Black Sea catchment. Twelve tiles are necessary to cover the entire area of interest. The flowchart (Fig. 6) consists of downloading a one-year time series from an FTP server, then extracting the bands separately and mosaicking adjacent tiles together in a single operation.
There are several disadvantages in retrieving large datasets without any automation. Besides requiring specific software for the analysis, repetitive processes are tedious and time consuming, and require powerful computing and storage capabilities. Moreover, results processed on a standalone computer and not published through a web-based application are not visible to others, so the reuse of the retrieved datasets remains unlikely.
The MOD15 and MOD16 products are used in the implementation of this case study. They differ in the organization of their bands, in the repositories where they are stored, and in the structure inside those repositories. When implementing the case study within the GreenLand platform, the main goal was to provide an easy-to-use graphical interface that hides the entire data retrieval and Grid execution processes from the user.
In the graphical user interface of the application, the user is required to specify only the processing year and the bands of the two MODIS products that they are interested in. In the background, the application automatically retrieves the related satellite images from the corresponding remote data repository and transfers them to the Grid machines, where the tasks are processed.
In order to optimize the execution process, the GreenLand platform partitions the use case into groups of tasks, where each group integrates five BlackSeaMosaicPDG workflows, instantiated with different input data sets. Each workflow processes the selected bands of one MODIS product for an entire year.
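The grouping step can be pictured as simple chunking of workflow instances. The sketch below is an assumption about how such groups could be formed, not a description of the platform's scheduler. Each instance is represented by a hypothetical (product, year) pair, and the years are placeholders.

```python
def group_workflows(instances, group_size=5):
    """Split a list of workflow instances into consecutive groups."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [instances[start:start + group_size]
            for start in range(0, len(instances), group_size)]


# Hypothetical instances: one workflow per MODIS product and year.
instances = [(product, year)
             for product in ("MOD15", "MOD16")
             for year in range(2001, 2007)]
groups = group_workflows(instances)
```

With twelve instances and a group size of five, the last group holds only the two remaining workflows.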

A. Case Study I: Determination of Land Cover/Land Use of Istanbul, Turkey
The selected region used in this study contains diverse land cover types, including vegetated areas, high and medium density built-up spaces (artificial surfaces-other), and water surfaces. Fig. 7 shows that the NDVI and EVI values for the area are consistent with the theoretical values. Accuracy assessment was performed following Foody (2002) [21], using 25 randomly selected points. Using the error matrix, the overall accuracy (OA) was calculated as 0.92 and Kappa as 0.87 for the classified NDVI and EVI images. Although errors and confusion exist because of the mixing problem, the two indices (NDVI and EVI) gave satisfactory classification results (OA and Kappa > 0.80).

B. Case Study II: MODIS mosaic at Black Sea catchment
After a couple of hours of processing time on powerful servers, 45 dates (one year) of four MODIS mosaicked products are available in GeoTIFF format as input for the application, for download, or for direct export into the enviroGRIDS Unified Resource Management (URM) portal [23] using OGC standards.
Such development simplifies access to the LAI (Fig. 9), ET and FPAR MODIS product collections at the scale of the Black Sea catchment, considerably reducing data processing time without requiring any particular remote sensing skills or specific software, while benefiting from Grid technology to process and store voluminous datasets.

VI. CONCLUSION
Grid technologies provide powerful tools for sharing huge volumes of remotely sensed data and for high performance processing. After an overview of recent initiatives for 'gridifying' satellite image processing, two specific usage scenarios in which the Grid is conceived as a powerful computing resource were analyzed. The GreenLand web-based platform and application provide smart solutions by automating repetitive processes and using distributed or Grid computing technology when needed. Moreover, this application is linked to the enviroGRIDS URM geoportal, to which processed results can be exported directly according to OGC standards, increasing the visibility of existing datasets and encouraging the reuse of processed and available data. Studies to extend the capabilities of the GreenLand application are in progress. The case studies presented above show that GreenLand is a useful and flexible platform for implementing open web-based remote sensing applications.

Fig. 4. The XML and graphical representations of the NDVI
Digital image classification uses the spectral information represented by the DN in one or more spectral bands, and attempts to classify each individual pixel based on this spectral information. The resulting classified image is composed of a mosaic of pixels, each of which belongs to a particular theme, and is essentially a thematic "map" of the original image [19]. Density slicing, also known as double threshold, is a classification technique using computer processing of digital data [1].

Fig. 5 exemplifies the visualization of the NDVI result, after applying the density slicing algorithm, based on the WMS service.

Fig. 5. The standardized visualization of the NDVI density sliced result.

Fig. 7 indicates that the NDVI values ranged from -0.40 to +0.80. In the figure, the values between 0.4 and 0.8 indicate the green areas in Istanbul. The positive values (bright pixels) less