Extracting Code Resource from OWL by Matching Method Signatures using UML Design Document

Software companies develop projects in various domains but rarely archive the programs for future use. In the approach described here, the method signatures are stored in the OWL and the source code components are stored in HDFS. Reusing archived code through the OWL can reduce software development cost considerably. The design phase generates many artifacts. One such artifact is the UML class diagram for the project, which contains classes, methods, attributes, relations, etc., as metadata. Methods needed for the project can be extracted from the OWL using this UML metadata. The UML class diagram is given as input and the metadata about each method is extracted. The OWL is searched for method prototypes similar to the method signature, and the appropriate code components are extracted from the HDFS and reused in the project. This process reduces the time, manpower, system resources and cost of software development.
Keywords: Unified Modeling Language, XML, XML Metadata Interchange, Metadata, Web Ontology Language, Jena framework.


I. INTRODUCTION
The World Wide Web has changed the way people communicate with each other. The term Semantic Web comprises techniques that dramatically improve the current web and its use. Today's Web content is huge and not well suited for human consumption. The machine-processable Web is called the Semantic Web. The Semantic Web will not be a new global information highway parallel to the existing World Wide Web; instead it will gradually evolve out of the existing Web [1]. Ontologies are built in order to represent generic knowledge about a target world [2]. In the Semantic Web, ontologies can be used to encode meaning into a web page, which enables intelligent agents to understand the contents of the page. Ontologies increase the efficiency and consistency of describing resources by enabling more sophisticated functionality in knowledge management and information retrieval applications. From the knowledge management perspective, current technology is weak at searching, extracting, maintaining and viewing information. The aim of the Semantic Web is to allow much more advanced knowledge management systems.
To develop such a knowledge management system, software companies can make use of code that has already been developed, that is, develop new software projects from reusable code. The concept of reuse is not a new one. It is, however, relatively new to the software profession. Every engineering discipline, including mechanical, industrial, hydraulic and electrical engineering, understands the concept of reuse. Software engineers, however, often feel the need to be creative and like to design "one time use" components. In effect, they come up with a unique solution for every problem. Reuse is a process, an applied concept and a paradigm shift for most people. There are many definitions of reuse. In plain and simple words, reuse is "the process of creating new software systems from existing software assets rather than building new ones".
Systematic reuse of previously written code is a way to increase software development productivity as well as the quality of the software [3,4,5]. Reuse of software has been cited as the most effective means of improving productivity in software development projects [6,7]. Many artifacts can be reused, including code, documentation, standards, test cases, objects, components and design models. Few organizations dispute the benefits of reuse. These benefits will certainly vary from organization to organization and, to a degree, in economic rationale. Some general reusability guidelines, which are often similar to general software quality guidelines, include [8] ease of understanding, functional completeness, reliability, good error and exception handling, information hiding, high cohesion and low coupling, portability and modularity. Reuse could provide improved profitability, higher productivity and quality, reduced project costs, quicker time to market and better use of resources. The challenge is to quantify these benefits.
For every new project, software teams design new components and code by employing new developers. If the company archives the completed code and components, they can be reused with no further testing, unlike open source code and components. This has a cascading effect on the time needed for development, testing and deployment, and on the number of developers. So there is a basic need for a system that minimizes these factors. Code reusability addresses this problem: it avoids redeveloping and retesting work that already exists. Because the archived code has already gone through a rigorous software development life cycle, it can be expected to be robust and largely free of errors. There is no need to reinvent the wheel. To reuse the code, a tool can be created that extracts metadata such as function, definition, type, arguments, brief description, author, and so on from the source code and stores it in the OWL. The source code itself can be stored in the HDFS repository. For a new project, the developer can search for components in the OWL and retrieve them with ease. The OWL represents the company's knowledge base of reusable code.
http://ijacsa.thesai.org/
The projects are stored in the OWL and the source code is stored in the Hadoop Distributed File System (HDFS) [9]. The client and the developer decide on and approve the design document. In this paper, the UML class diagram is the design document taken as the input to the system. The method metadata is extracted from the UML and passed to a SPARQL query that extracts the available methods from the OWL. Once the appropriate method is selected from the list, the code component is retrieved from the HDFS. The purpose of using a UML diagram as input is that, before software development starts, the tool can be used to estimate how many methods remain to be developed. The UML diagram is a powerful tool that mediates between the developer and the user; it acts like a contract in which both parties agree on the software to be developed. After the methods are extracted from the UML diagram, they are matched against the OWL. From the retrieved methods the developer can determine how many are already available in the repository and how many are still to be developed. The more methods are retrieved, the shorter the development time. To obtain more method matches, the company should store more projects. As more projects are uploaded to the OWL and HDFS, the corporate knowledge grows and developers reuse more code instead of writing it themselves. With reused code, development cost comes down, development time becomes shorter, resource utilization decreases and quality goes up.
The paper begins with a note on the related technology in Section 2. The detailed features and framework of the Source Code Retriever are described in Section 3. The Keyword Extractor for UML is described in Section 4. The Method Retriever, based on the Jena framework, is described in Section 5. The Source Retriever for the HDFS is described in Section 6. The implementation scenario is given in Section 7. Section 8 deals with the findings and future work of the paper.

A. Metadata
Metadata is defined as "data about data", or descriptions of stored data. Metadata definition is about defining, creating, updating, transforming, and migrating all types of metadata that are relevant and important to a user's objectives. Some metadata can be seen easily by users, such as file dates and file sizes, while other metadata can be hidden. Metadata standards include not only those for modeling and exchanging metadata, but also the vocabulary and knowledge for ontology [10]. Many efforts have been made to standardize metadata, but each of these efforts belongs to some specific group or class. The Dublin Core Metadata Initiative (DCMI) [11] is perhaps the largest effort in defining metadata. It is a simple yet effective element set for describing a wide range of networked resources and comprises 15 elements. Dublin Core is more suitable for document-like objects. IEEE LOM [12] is a metadata standard for learning objects. It has approximately 100 fields to define any learning object. Medical Core Metadata (MCM) [13] is a standard metadata scheme for health resources. MPEG-7 [14] multimedia description schemes provide metadata structures for describing and annotating multimedia content. A standard knowledge ontology is also needed to organize such types of metadata as content metadata and data usage metadata.

B. Hadoop & HDFS
The Hadoop project promotes the development of open source software and supplies a framework for developing highly scalable distributed computing applications [15]. Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment, and it also supports data-intensive distributed applications. Hadoop is designed to process large volumes of information efficiently [16]. It connects many commodity computers so that they can work in parallel, tying smaller, low-priced machines into a compute cluster. It offers a simplified programming model that allows the user to write and test distributed systems quickly. It distributes data and work across machines efficiently and automatically, and in turn exploits the underlying parallelism of the CPU cores. The monitoring system re-replicates the data in response to system failures that would otherwise result in partial storage. Even though the file parts are replicated and distributed across several machines, they form a single namespace, so their contents are universally accessible. MapReduce [17] is a functional abstraction that provides an easy-to-understand model for designing scalable, distributed algorithms.

C. Ontology
The key component of the Semantic Web is the collections of information called ontologies. Ontology is a term borrowed from philosophy that refers to the science of describing the kinds of entities in the world and how they are related. Gruber defined an ontology as a specification of a conceptualization [18]. An ontology defines the basic terms and their relationships comprising the vocabulary of an application domain, and the axioms for constraining the relationships among terms [19]. This definition explains what an ontology looks like [20]. The most typical kind of ontology for the Web has a taxonomy and a set of inference rules. The taxonomy defines classes of objects and relations among them. Classes, subclasses and relations among entities are a very powerful tool for Web use.

III. SOURCE CODE RETRIEVER FRAMEWORK
The Source Code Retriever makes use of the OWL constructed for the project, and the source code of the project is stored in the HDFS [21]. All the project information of a software company is stored in the OWL. The project sources can run to terabytes in size and the corporate branches are spread over various geographical locations, so the source is stored in a Hadoop repository to provide a distributed computing environment. The Source Code Retriever is a framework that takes a UML class diagram or XMI (XML Metadata Interchange) file as input from the user and suggests reusable methods for the given class diagram. The Source Code Retriever consists of three components: Keyword Extractor for UML, Method Retriever and Source Retriever. The process of the Source Code Retriever framework is presented in Fig. 1.
XMI is an Object Management Group (OMG) standard for exchanging metadata information using XML. The initial proposal of XMI "specifies an open information interchange model that is intended to give developers working with object technology the ability to exchange programming data over the Internet in a standardized way, thus bringing consistency and compatibility to applications created in collaborative environments." The main purpose of XMI is to enable easy interchange of metadata between modeling tools, and between tools and metadata repositories, in distributed heterogeneous environments. XMI integrates three key industry standards: (a) XML, a W3C standard; (b) UML, an OMG modeling standard; and (c) MOF (Meta Object Facility), an OMG modeling and metadata repository standard. The integration of these three standards into XMI marries the best of OMG and W3C metadata and modeling technologies, allowing developers of distributed systems to share object models and other metadata over the Internet.
The process flow of the Keyword Extractor for UML is given in Fig. 2. The XMI or UML file is parsed with the help of the SAX (Simple API for XML) parser. SAX is a sequential-access parser API for XML that provides a mechanism for reading data from an XML document. SAX loads the XMI or UML file and reports each tag by name as it is encountered. The attribute values of a tag are obtained with the attributes.getValue(<name of the attribute>) method. The items used to retrieve the attributes are parse(), Attributes and getValue(nameOfAttribute). The parse() method parses the XMI file. The Attributes object holds the attribute values of the current tag. The getValue(nameOfAttribute) method returns the value of the named attribute, from which the class information, method information and parameter information are obtained.

[Fig. 2 labels: UML Extractor; Class Name (name, scope); Method Information (name, type); Parameter Information]
The XMI file consists of XML tags. The class information, method information and parameter information are identified by the appropriate tags, as given in Table I. Using these tags, the metadata of the UML or XMI file is extracted. The extracted metadata, such as classes, methods and attributes, is passed to the Method Retriever component.
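The SAX-based extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each UML:Operation element is nested inside a UML:Class element, that parameters appear as UML:Parameter child elements (a tag name not listed in Table I), and that names and types are carried in "name" and "type" attributes. The class name XmiKeywordExtractor and the output format are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects method signatures from an XMI document with SAX.
final class XmiKeywordExtractor extends DefaultHandler {
    private final List<String> signatures = new ArrayList<>();
    private final List<String> currentParams = new ArrayList<>();
    private String currentClass;
    private String currentMethod;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // The default SAX factory is not namespace-aware, so qName carries the "UML:" prefix.
        switch (qName) {
            case "UML:Class":
                currentClass = attributes.getValue("name");
                break;
            case "UML:Operation":
                currentMethod = attributes.getValue("name");
                currentParams.clear();
                break;
            case "UML:Parameter": // assumed tag and attribute names
                if (currentMethod != null) {
                    currentParams.add(attributes.getValue("name") + ":" + attributes.getValue("type"));
                }
                break;
            default:
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // An operation is complete once its closing tag is seen.
        if ("UML:Operation".equals(qName)) {
            signatures.add(currentClass + "." + currentMethod
                    + "(" + String.join(", ", currentParams) + ")");
            currentMethod = null;
        }
    }

    /** Parses the XMI text and returns signatures such as Class.method(param:type). */
    static List<String> extract(String xmi) {
        XmiKeywordExtractor handler = new XmiKeywordExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XMI input", e);
        }
        return handler.signatures;
    }
}
```

Calling XmiKeywordExtractor.extract(...) on an XMI fragment that declares a class Login with an operation validateLogin and one parameter username of type string would yield a single entry in the form used in the case study. A real Umbrello export contains many more elements; the handler simply ignores tags it does not recognise.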

V. METHOD RETRIEVER
The Method Retriever component interacts with the OWL and returns the methods available in the OWL for the given class diagram; the process is shown in Fig. 3. The information extracted from the UML file by the Keyword Extractor for UML is passed to the Method Retriever component, which retrieves the matching method information from the OWL using a SPARQL query. SPARQL is a query language for RDF, and the SPARQL query is executed on the OWL file. Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL and SPARQL and includes a rule-based inference engine. It supports manipulating ontologies defined in RDFS and OWL Lite [22], and is a leading Semantic Web toolkit [23] for Java programmers. Jena1 and Jena2 were released in 2000 and August 2003 respectively. The main contribution of Jena1 was the rich Model API. Around this API, Jena1 provided various tools, including I/O modules for RDF/XML [24], [25], N3 [26] and N-triple [27], and the query language RDQL [28]. Jena2 has a more decoupled architecture than Jena1 and provides inference support for both the RDF semantics [29] and the OWL semantics [30].
SPARQL is an RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language. Here it is used to retrieve information from the OWL. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports extensible value testing and constraining queries by source RDF graph. The results of SPARQL queries can be result sets or RDF graphs.

A. Query processor
A query processor executes the SPARQL query and retrieves the matched results. The SPARQL Query Language for RDF [31] and the SPARQL Protocol for RDF [32] are increasingly used as a standardized query API for providing access to datasets on the public Web and within enterprise settings. The SPARQL query takes the method parameters and returns the results. The retrieved results contain project details, such as the name and version of the project, and method details, such as the name of the package, the name of the class, the method name, the method return type and the method parameters. The query processor takes the extracted method name and method parameters as input and retrieves the method and project information from the OWL.
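As an illustration of the query construction step, the sketch below assembles a SPARQL query string from a method name and its parameter types. The paper does not list the ontology's vocabulary, so the namespace http://example.org/code-reuse# and every property name (cr:projectName, cr:methodName, cr:parameterType, and so on) are assumptions made for this example, as is the class name MethodQueryBuilder. The query matches the method name case-insensitively and requires each given parameter type to be present; it does not check parameter order.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: builds a SPARQL SELECT query for a method signature.
final class MethodQueryBuilder {
    // Assumed namespace; replace with the IRI of the actual ontology.
    static final String NS = "http://example.org/code-reuse#";

    /** Wraps a value in quotes, escaping backslashes and double quotes for a SPARQL literal. */
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Returns a query selecting project and method details for matching methods. */
    static String build(String methodName, List<String> parameterTypes) {
        StringBuilder q = new StringBuilder();
        q.append("PREFIX cr: <").append(NS).append(">\n");
        q.append("SELECT ?project ?version ?package ?class ?method ?returnType\n");
        q.append("WHERE {\n");
        // Assumed flat model: every detail is a property of the method node ?m.
        q.append("  ?m cr:projectName ?project ;\n");
        q.append("     cr:projectVersion ?version ;\n");
        q.append("     cr:packageName ?package ;\n");
        q.append("     cr:className ?class ;\n");
        q.append("     cr:methodName ?method ;\n");
        q.append("     cr:returnType ?returnType .\n");
        for (String type : parameterTypes) {
            q.append("  ?m cr:parameterType ").append(literal(type)).append(" .\n");
        }
        // Case-insensitive comparison of the stored method name.
        q.append("  FILTER (LCASE(STR(?method)) = ")
         .append(literal(methodName.toLowerCase(Locale.ROOT))).append(")\n");
        q.append("}\n");
        return q.toString();
    }
}
```

With Jena on the classpath, the resulting string could be passed to QueryFactory.create and executed against the loaded OWL model through QueryExecutionFactory.create; that step is left out here so the sketch stays free of external dependencies.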

VI. SOURCE RETRIEVER
The Source Retriever component retrieves the source code of the user-selected method from the HDFS. HDFS is the primary storage system used by Hadoop applications. It creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations. The location of the source code file in the Hadoop repository is obtained from the OWL, and the file is retrieved from the HDFS by the copyToLocal(FromFilepath,localFilePath) method.
QDox is a high-speed, small-footprint parser for extracting class, interface and method definitions from source files. When a Java source file, or a folder containing Java source files, is loaded into QDox, it iterates over the sources automatically. The loaded information is stored in the JavaBuilder object. From the JavaBuilder object, the list of packages is returned as an array of strings. This package list is looped over to get the class information, and from the class information the method information is extracted as an array of JavaMethod. From each JavaMethod, information such as the scope of the method, the name of the method, the return type of the method and the parameter information is extracted.
QDox finds the methods in the source code. The file retrieved from the HDFS is stored in a local temporary file. This file is passed to the QDox addSource() method for parsing. Through QDox, the methods are retrieved one by one and compared with the method that the user requested. If a method matches, its source code is retrieved by the getSourceCode() method. The temporary file is deleted after the process. In the Hadoop repository, files are organized in the same hierarchy as the Java source folders, so the framework gets the source location from the OWL and retrieves the Java source file into a temporary file. The temporary file is loaded into QDox to identify the methods. Each method is compared with the method being searched for. If it matches, the source code of the method is retrieved by the getMethodSourceCode() method.
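The comparison step can be sketched without QDox as follows. The MethodInfo class is a hypothetical stand-in for the information that the parser returns for each method; it is not a QDox type. The sketch assumes that a match requires the same method name and the same parameter types in the same order, and it compares type names case-insensitively so that a UML type such as "string" can match the Java type "String". The paper does not specify the exact matching rule, so this is one possible reading.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: finds the source code of a requested method among parsed methods.
final class SignatureMatcher {

    /** Stand-in for one parsed method (name, parameter types, source text). */
    static final class MethodInfo {
        final String name;
        final List<String> parameterTypes;
        final String sourceCode;

        MethodInfo(String name, List<String> parameterTypes, String sourceCode) {
            this.name = name;
            this.parameterTypes = parameterTypes;
            this.sourceCode = sourceCode;
        }
    }

    /** True when both lists have the same length and equal type names, ignoring case. */
    static boolean sameTypes(List<String> a, List<String> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).equalsIgnoreCase(b.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Returns the source of the first method matching the requested name and parameter types. */
    static Optional<String> findSource(List<MethodInfo> parsed, String name, List<String> parameterTypes) {
        for (MethodInfo m : parsed) {
            if (m.name.equals(name) && sameTypes(m.parameterTypes, parameterTypes)) {
                return Optional.of(m.sourceCode);
            }
        }
        return Optional.empty();
    }
}
```

Returning an Optional makes the "no match" case explicit, which corresponds to a method that still has to be developed.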

VII. CASE STUDY
The input to the framework is a UML class diagram; a sample class diagram is used for the case study. The entire process of the framework is given in Table II. The Keyword Extractor for UML takes the class diagram and retrieves the method validateLogin(username:string). The output is given to the Method Retriever, which generates the SPARQL query and extracts the matched methods listed in Table III. From the list the appropriate method is selected, and QDox retrieves the source code obtained from the HDFS and displays the method definition of the selected method, as shown in the output of the Source Retriever in Table II.
To test the performance of this framework, reusable OWL files are created by uploading completed projects. The first OWL file is uploaded with the first Java project. The second OWL file is uploaded with the first and second Java projects. The third OWL file is uploaded with the first, second and third Java projects. Five OWL files are constructed in this way. The purpose of creating these OWL files is to show how reusability increases as the knowledge base grows. A sample new project containing ten methods to be developed is considered. The OWL files are listed with the number of packages, number of classes, number of methods and number of parameters. The ten methods are matched against the OWL files, and the number of matches is listed in Table IV; each row of Table IV shows the number of matched methods. The reusability graph in Fig. 4 shows how the number of matches increases as the number of projects in the OWL grows. For the graph, only five new method names are used instead of the ten listed in Table IV. The X-axis represents the OWL file numbers and the Y-axis represents the number of methods matched for each new method in the legend. This progression suggests that uploading more projects into the knowledge base can provide nearly one hundred percent of the methods for reuse during software development.
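The estimate of how many methods are already available can be expressed as a simple ratio. The sketch below is illustrative only: the paper does not define a formula for the reuse rate, so the definition used here (matched methods divided by required methods, as a percentage) is an assumption, and the class name ReuseEstimator is hypothetical. Methods are compared by name for brevity.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper: estimates how much of a new project is covered by the repository.
final class ReuseEstimator {

    /** Counts how many required method names already exist in the repository. */
    static int countMatches(List<String> required, Set<String> repository) {
        int matches = 0;
        for (String name : required) {
            if (repository.contains(name)) {
                matches++;
            }
        }
        return matches;
    }

    /** Share of required methods found in the repository, as a percentage (assumed definition). */
    static double reusePercent(List<String> required, Set<String> repository) {
        if (required.isEmpty()) {
            return 0.0;
        }
        return 100.0 * countMatches(required, repository) / required.size();
    }
}
```

Running the same required-method list against each successive OWL file gives one value per file, which is the kind of series a reusability graph plots.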

VIII. CONCLUSION
The paper presents a framework to extract method code components from the OWL using the UML design document. OWL is semantically much more expressive than is needed for the results of our searching. With these sample tests, the paper argues that it is indeed possible to extract code from the OWL using the UML class diagram. The purpose of the paper is to achieve code reusability in software development. The OWL for the source code has already been created; this paper searches for and extracts the code and components and reuses them to shorten the software development life cycle. Before the coding phase of development starts, the framework helps the software development team to assess how much code can be reused and how much code needs to be developed. This assessment can help the project manager to allot resources to the project and reduce cost, time and resources. Software companies can make use of this framework to develop projects quickly and to win projects at a lower cost than their competitors.
After the OWL ontology has been developed and the source code stored in the HDFS, the code components can be reused. This paper takes the design document from the user as input, extracts the method signatures, and searches for matches in the OWL. As the knowledge base is loaded with more and more projects, the reuse rate also rises. Future work can take the SRS as input; text mining can be performed to extract the keywords as classes and the processes as methods. The SRS artifact is produced in a much earlier phase than the UML, so a considerable amount of time can be saved compared with using the UML as input. The method prototype can be used to search and match against the OWL, and the required method definition can be retrieved from the HDFS. The purpose of storing the metadata in the OWL is to minimize factors such as development time, testing time, deployment time and the number of developers. Creating the OWL with this framework can reduce these factors.

Figure 1. Process of Source Retriever
... retrieves the matched methods from the repository. The Method Retriever constructs a SPARQL query to retrieve the matched results. The user selects the appropriate method from the list of methods and retrieves the source code through the Source Retriever component, which interacts with the HDFS and displays the source code.

IV. KEYWORD EXTRACTOR FOR UML
Unified Modeling Language (UML) is a visual language for specifying, constructing, and documenting the artifacts of systems. It is a standardized general-purpose modeling language in the field of software engineering. To create the UML class diagram, the open source tool Umbrello UML Modeller is used. The diagram is stored in XMI format. Umbrello UML Modeller is a Unified Modeling Language diagram program for KDE. UML allows the user to create diagrams of software and other systems in a standard format. Umbrello can support the software development process, especially during the analysis and design phases. UML is the diagramming language used to describe such models. Software ideas can be represented in UML using different types of diagrams. Umbrello UML Modeller 1.2 supports Class Diagram, Sequence Diagram, Collaboration Diagram, Use

Figure 2. Process of Keyword Extractor for UML
Figure 3. Method Retriever Process

TABLE I. TAGS USED TO EXTRACT METADATA FROM XMI FILE
Tag | Description
UML:Class | It holds the information of the class, like name of the class, visibility of the class, etc.
UML:Attribute | Attribute is a sub tag of class. It holds the information of the class attributes, like name of the attribute, type of the attribute, visibility of the attribute, etc.
UML:Operation | It holds the method information of the class, like name of the method, return type of the method, visibility of the method.

TABLE II. PROCESS FLOW OF THE FRAMEWORK

TABLE III. METHOD RETRIEVER OUTPUT