Repository System for Geospatial Software Development and Integration

The integration of geospatial software components has recently received considerable attention owing to the need for rapid growth in GIS applications and development environments. However, finding appropriate source code components that can be incorporated into a system under development requires considerable verification to ensure the source code works correctly. This paper therefore describes the design of a repository system that employs a new specification language, namely SpecJ2, to address the challenges involved in integrating and operating software components. SpecJ2 was designed to represent the architectural attributes of source code components and to abstract their complexity by applying the notion of separation of concerns, a key consideration when designing software systems. The results of the experiment showed that SpecJ2 is capable of defining the different architectural attributes of source code components and can facilitate their integration and interaction at run-time. Thus, SpecJ2 can classify software components according to their identified types.

Keywords: Open-Source software; geographic information system; repository system; specification language; components integration


I. INTRODUCTION
There are many open-source GIS projects now actively running and most have reached a high level of maturity in applying their tools to the provision of information that can feed into decision-making processes [1]. GIS applications have evolved rapidly by integrating different components to generate a fully functional system that serves a specific domain [2]. Business requirements are the key driver in defining the architecture of any GIS application in terms of identifying the functional components related to: data collection and remote sensing components; storage and retrieval components; semantic analyses and data geoprocessing components; and presentation and reporting components. Moreover, certain GIS applications might need to be integrated as a whole into different types of systems to address certain performance, usability, and reliability issues.
Despite the functional advantages of open-source GIS-component integration, ensuring the interoperability of different components is a very challenging task. In technical terms, a comprehensive environment is required to define the necessary integration frameworks and avoid potential mismatches between GIS components, both syntactically and semantically [3]. Moreover, the diversity of available OSS-GIS solutions might confuse normal users and complicate the process of identifying the best GIS tool for users in terms of the functionality, usability, and integration of applications with other platforms.
This paper therefore aims to establish a general-purpose repository system that identifies, classifies, integrates, and develops open-source GIS components to fulfill the requirements of GIS business applications. Specifically, the paper addresses the difficulties involved in component integration, as this is the key element underpinning the development of GIS applications. The terms "source code components" and "software components" will be used interchangeably throughout the paper as both refer to source code fragments.

II. RELATED WORK
The integration of components has been a research topic in different application domains from early work by Allen et al. [4] through to the present day, where further investigations into components or services integration continue to be reported.
For instance, Suri et al. [5] examined modularity and interoperability aspects for software systems in industry from an integration perspective. They discriminated between source code behavior and the execution logic within the systems. They utilized UML to bridge the gap between behavioral modelling and the execution of systems. Kaur and Singh [6] developed a web service called GlueCode to mediate the interaction between components written in different programming languages, such as Java-based components and .NET components, and the data source Cloud.IO. Their primary focus was on the data exchange patterns and signature matching between components. Farcas et al. [7] developed a new real-time component model to address the problem of component integration. They identified the key distinguishing factors of software components that need to be addressed to ensure successful integration, such as component behavior and a logical execution environment. Fatima et al. [8] conducted a semi-systematic survey to identify risk factors for the integration of software components. They concluded that a lack of interoperability standards, glue code, and format variation are the key reasons for failure to integrate. Schorp and Sommer [9] defined a new component model in the domain of automotive ICT architecture. They contended that a successful integration of software components can be accomplished if functional interdependencies and non-functional requirements are clearly addressed. Their component model facilitated integration based on the discovery of interaction between features. Dogra et al. [10] investigated the reasons for component integration failure and concluded that such failure is primarily attributable to architectural mismatches between software components. Furthermore, they highlighted the fact that a lack of knowledge and expertise regarding software components might also cause problems with integration.
Overall, most of the reported work has thus identified component architecture as the key hindrance to successful integration. Few studies have shown that functional interdependencies might also cause integration failure, which means this area of research requires further investigation. This work proposes a new methodology to document and facilitate component interaction by considering the architectural attributes of source code components. It reports our ongoing development of a software development environment that facilitates the identification and integration of software components to build a functional GIS application.
www.ijacsa.thesai.org

III. REPOSITORY SYSTEM DESIGN FOR OSS-GIS TOOLS
A repository system is a development environment that is equipped with the necessary tools for the automatic identification, classification, and storage of software components. Users can retrieve components from the repository in accordance with their functional requirements by conducting a free-text search, browsing, or providing a detailed formal system specification. In this section, we describe our proposed repository solution for open-source GIS software systems. We also explain the main architecture of the repository system. The main objectives when designing this repository system were to:
The behavior of the repository system is described as follows. The source code of a GIS component is deposited into the repository system either manually by uploading code to the system or by providing a GitHub URL from which to import the source code. Once the source code is uploaded into the repository, the identifier sub-system analyzes the code to identify its architecture. Based on this analysis, the component might be classified under a matching category represented in the classifier sub-system. If the source code cannot be categorized under any of the available categories, it is discarded from the repository workflow and stored as an "Undefined Type" in the repository for further consideration.
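The deposit, identify, and classify steps can be sketched as follows. This is a minimal illustration written in Python for brevity, not the repository's implementation: the `Component` record, the keyword-overlap heuristic in `identify`, and the contents of `CATEGORIES` are all hypothetical stand-ins for the identifier and classifier sub-systems, whose actual analysis is architectural.

```python
from dataclasses import dataclass

# Hypothetical category table; the keywords are illustrative only.
CATEGORIES = {
    "presentation": {"render", "display", "map_view"},
    "processing": {"geocode", "buffer", "overlay"},
    "management": {"store", "query", "connect"},
}

UNDEFINED = "Undefined Type"


@dataclass
class Component:
    name: str
    methods: frozenset  # method names extracted from the deposited source
    category: str = UNDEFINED


def identify(component):
    """Return the category sharing the most method names, or None."""
    best, best_overlap = None, 0
    for category, keywords in CATEGORIES.items():
        overlap = len(component.methods & keywords)
        if overlap > best_overlap:
            best, best_overlap = category, overlap
    return best


def deposit(repository, component):
    """Classify a component and store it; unmatched code is kept as undefined."""
    component.category = identify(component) or UNDEFINED
    repository.setdefault(component.category, []).append(component)
    return component.category


repo = {}
print(deposit(repo, Component("Geocoder", frozenset({"geocode", "overlay"}))))
print(deposit(repo, Component("Misc", frozenset({"foo"}))))
```

Components that match no category are not dropped from storage; they are filed under the undefined bucket so that they remain available for further consideration.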
From a user's perspective, the repository system provides the capability to search for available source code or subsystems by providing an XML description of component types using the developed specification language described in Section V. The matcher sub-system compiles the XML description provided by the user to identify a match to the components in the repository. Matching specifications result in finding either exact matches to the description or partial matches. If exact matching components are found, they are listed to users for further investigation. If partially matching components are found, the repository system refactors the source code to fulfill the XML description that was provided. In cases where the available source code in the repository lacks some of the required interfaces to match the user's specification, the repository generates the necessary interfaces in the form of skeleton code to satisfy these requirements. However, the code generated by the refactoring process must be examined by the user to confirm that the new packaged component works and provides the expected behavior.
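The distinction between exact and partial matches, and the generation of skeleton code for missing interfaces, can be illustrated with a small sketch. It assumes, purely for illustration, that a specification reduces to a list of required method signatures given as strings; the function names `match` and `skeleton` and the example signatures are hypothetical and do not reflect the matcher sub-system's actual interface.

```python
def match(required, offered):
    """Compare required method signatures with those a component offers."""
    missing = sorted(set(required) - set(offered))
    if not missing:
        return "exact", missing
    if len(missing) < len(set(required)):
        return "partial", missing
    return "none", missing


def skeleton(signatures):
    """Emit placeholder Java-style stubs for the missing interfaces."""
    return "\n".join(
        f"public {sig} {{ throw new UnsupportedOperationException(); }}"
        for sig in signatures
    )


required = ["void log(String message)", "void setLevel(int level)"]
offered = ["void log(String message)"]

status, missing = match(required, offered)
print(status)             # partial
print(skeleton(missing))  # stub for the one missing method
```

The generated stubs only satisfy the signature; as noted above, their behavior still has to be supplied and checked by the user.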

IV. REPOSITORY CLASSIFICATION SCHEME
The GIS system architecture, like many information systems, commonly conforms to the N-Tier architecture [11], which is characterized by three main layers: the interaction and presentation layer, the processing layer, and the management layer. The overall architecture is depicted in Fig. 2.
These three layers are the building blocks of many GIS systems, whether they are proprietary GIS software systems or open-source GIS software systems. Our classification scheme was primarily built on these layers to identify high-level functional areas and their facets for the classification of GIS tools. It is necessary to understand these layers and define their interfaces in order to facilitate the potential integration of different components, such as those found in other GIS tools.
As highlighted by Dempsey [12], many OSS-GIS tools are available to support these three layers. For example, according to Alkazemi et al. [13], in the information management layer, common tools include PostGIS and Geodatabase, both of which serve as a data source and database for other tools. PostGIS and Geodatabase make it possible to store GIS data in a central location for easy access and management. Grass, Sextante, and MapWindow are some of the common tools used for the human interaction layer; these facilitate communication between the information system and external users, which are either people or computer systems such as a web browser. Hadoop [14] is one of the OSS tools available on the market and is classified under the processing layer. It is an Apache top-level project that is being built and used by a global community of contributors and users [15].
However, to work correctly, source code components must comply with standard characteristics. Thus, source code components might be characterized by:
Signature of methods or functions: defines the name of the method, input and output parameters, and their datatypes.
Programming language: adds more filtration to the searching text to obtain a more accurate result.
Certain source codes may not be used alone and can be incorporated with other codes or applications. Therefore, it is necessary to understand the sequence in which a method is executed so that it runs as expected in the application under development. The attributes of source code components, especially their architectural attributes, are always hard to document and represent as they differ from one programming style to another.
To avoid the complexity of matching source code characteristics, we developed a specification language, namely SpecJ2, to summarize and document the necessary attributes of source code components. SpecJ2 formalizes some of the architectural characteristics of software components and this also applies to GIS-component integration. SpecJ2 thus serves as a verification mechanism that checks whether source code conforms to the required properties of a system in the OSS-GIS repository system. Table 1 describes the syntax of the SpecJ2 language and identifies the key elements that represent the architectural properties of components. Some of the attributes may be null values and therefore might be omitted in the description file. The key attributes are data input and output as these handle data exchange between the components of the system. SpecJ2 can therefore be considered the adapter layer between any two GIS components designed to interoperate with each other, as it handles component interoperability and ensures that data are exchanged in a standard manner between the different types of components. This layer is generated automatically by the builder component within our repository system to facilitate the simultaneous integration of tools or components. The conceptual view of SpecJ2 is presented in Fig. 3.
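The adapter role described above can be sketched in a few lines. The sketch does not use SpecJ2 syntax, which is defined in Table 1; the `Wrapper` record, its field names, the `link` function, and the dummy geocoding and formatting components are all hypothetical and serve only to illustrate how declared data input and output attributes can govern whether two components may be connected.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Wrapper:
    """Uniform facade over a component; all attribute names are hypothetical."""
    name: str
    call: Callable[[Any], Any]      # the wrapped component's entry point
    data_input: str                 # declared input type
    data_output: str                # declared output type
    language: Optional[str] = None  # optional attribute, may be omitted


def link(producer, consumer):
    """Connect two wrappers only if the declared output and input agree."""
    if producer.data_output != consumer.data_input:
        raise TypeError(
            f"{producer.name} -> {consumer.name}: "
            f"{producer.data_output} != {consumer.data_input}"
        )
    return lambda value: consumer.call(producer.call(value))


# Dummy components: the coordinates returned here are placeholders.
geocode = Wrapper("geocode", lambda address: (0.0, 0.0), "address", "coordinates")
formatter = Wrapper(
    "format", lambda c: f"{c[0]:.2f},{c[1]:.2f}", "coordinates", "text"
)

pipeline = link(geocode, formatter)
print(pipeline("any address"))
```

Because callers only see the wrapper, a mismatch between components surfaces when they are linked rather than when data are exchanged at run-time.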
SpecJ2 represents the intermediate layer (i.e., a wrapper) between source code components and the underlying framework of the system to be built. It hides the complexity of the implementation and the differences between software components within the framework. Thus, if a developer compiles the system under development, all the components will be considered the same because the SpecJ2 layer hides component types from the underlying system compiler. Furthermore, SpecJ2 defines the linkage between components that will exchange messages by connecting the interfaces of methods together, which facilitates data exchange at run-time. For example, if the system under development was built using the Java language and a developer needed to incorporate a component written in another programming language, say a PHP class, they can either treat them as services and handle data exchange at run-time or use SpecJ2 to handle environmental difference parameters. In SpecJ2 we described the geocoding module of ArcGrid, which is a generic functional model in many forms of geospatial software as it interprets coordinates (i.e., latitude, longitude) based on their corresponding addresses, either by querying the database of stored addresses (e.g., Google API) or by reading addresses from points on the map.
To demonstrate our approach, in Fig. 4 we provide a description of the logging component of the geocoding facility in SpecJ2. The SpecJ2 description captures part of the logging capability, which is a generic feature in many GIS applications. We conducted our experimental work at this stage by identifying how many components obtained from open-source repositories can fit as a logging module, and hence can be reused in GIS applications. We therefore obtained 50 source code samples for each component type from GitHub; these were defined as geospatial-related components from solutions including uDig, ArcGrid, and deegree [12]. However, we limited the experiment to Java-based solutions. The selection of the source code was carried out manually by downloading all the corresponding JAR files of the solutions and then applying the sampling technique defined by Kamal et al. [16] to ensure we covered as many of the test samples as possible. We then ran SpecJ2-compiler to scan through the source code to identify matching results. The process of compiling source code is illustrated in Fig. 5.
Source code is first examined using the extraction tool that identifies the signature of the methods within the JAR file provided. The extracted methods are then sent to the SpecJ2-compiler to compile the source code against a generated JUnit test class based on the XML component description provided. Fig. 6 presents the generated JUnit test class used for compiling test samples. In cases where the deposited source code does not match any component types, re-scoping of the source code fragment was performed to include more attributes for the next round. Re-scoping was initially set for four rounds. If components failed to compile after the first round they were discarded from the system.

VII. RESULTS AND DISCUSSION
The results obtained for the experiment are summarized in Table 2. We categorized these results into fully matched, partially matched, and no match. Fully matched refers to when all the attributes defined by the source code component matched the corresponding SpecJ2 description, hence the component can be used without any modifications. However, if none of the attributes were identified in the selected source code, the code fragment is categorized as no match. Midway between both extremes are partially matched components, which require further investigation. We counted the number of matching and non-matching attributes to assess the level of modification needed.
The experiment produced striking outcomes with respect to the identification of component types. Overall, SpecJ2 yielded significant results in terms of matching components to the types defined in the repository. Compared to the matched samples, the number of unmatched components was minimal with an overall average ratio of 0.124 (i.e., for each "no match" there were four matched components on average). We therefore conclude that SpecJ2 is useful in representing source code components and can also be used to mediate the interaction between various types of components.
The results of the partially matched components were twofold, as the overall percentage of matched attributes counted was more significant than the percentage of unmatched attributes except in the cases of openJUMP and gvSIG. We investigated the source code for these component types by hand and observed that openJUMP needed to operate in conjunction with the OSGE framework to provide a complete set of attributes. However, gvSIG was slightly different as the available components were mainly plugins, hence the attributes examined were an extension of the main framework. The other missing attributes were coded in the main Factory class within the gvSIG package. Thus, the unmatched percentages indicated that they were missed by SpecJ2 due to a lack of support for inheritance, which will be included in the new release of the language.
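The three result categories and the attribute counts used to gauge the required modification can be expressed compactly. The sketch below is an illustration only: the `categorize` function and the attribute names are hypothetical, and the values are not taken from Table 2.

```python
def categorize(spec_attrs, found_attrs):
    """Classify a component by how many specified attributes it exhibits.

    Returns the category together with the counts of matched and
    unmatched attributes.
    """
    spec = set(spec_attrs)
    matched = spec & set(found_attrs)
    if matched == spec:
        return "fully matched", len(matched), 0
    if not matched:
        return "no match", 0, len(spec)
    return "partially matched", len(matched), len(spec - matched)


# Hypothetical attribute names for a logging component description.
spec = ["signature", "data_input", "data_output"]

print(categorize(spec, ["signature", "data_input", "data_output"]))
print(categorize(spec, ["signature"]))
print(categorize(spec, ["unrelated"]))
```

For a partially matched component, the second and third values give the matched and unmatched attribute counts, which is the quantity used above to assess how much modification a candidate needs.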

VIII. CONCLUSION AND FUTURE WORK
The integration of software components is a key element of the component-based software development paradigm. The architectural and the behavioral features represent the backbone of any integration process and must be described precisely. The development of GIS applications is no different as it involves various forms of component integration.
In this work, we developed SpecJ2 as a specification language to address the complex interoperability and execution of software components. SpecJ2 complemented the design of the repository system proposed in this work to examine the feasibility of identifying component types and classifying them according to their attributes. The results obtained in this work supported the design considerations of SpecJ2 and proved that it was capable of identifying potential mismatches between software components. Such identification is significant as it can help developers verify components prior to reusing them in their systems. The next step in this work is to automate the refactoring mechanism of software components to transform those which are partially matched into fully matched candidates. Moreover, we plan to consider a wider range of component types in different programming languages.
Tier GIS Application Architecture.

V. INTEGRATION OF GIS COMPONENTS
Software components can interact with each other as services if they share common characteristics as a data exchange model