OntoDI: The Methodology for Ontology Development on Data Integration

Department of Information Systems, Faculty of Computing and Information Technology Rabigh, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
Department of Information Technology, Faculty of Computing and Information Technology Rabigh, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
Department of Computer Science, Faculty of Computing and Information Technology Rabigh, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
Department of Informatics, Faculty of Computer Science and Information Technology, Mulawarman University, Indonesia
Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
Master in Computer Science Program, Budi Luhur University, Jakarta 12260, Indonesia
School of Electrical Engineering, Politeknik Negeri Ujung Pandang, Makassar, Indonesia


I. INTRODUCTION
Data integration still poses many unsolved problems. Sharing and integrating data from loosely coupled sources, heterogeneous data representations, and mapping data across different data sources are among the most serious of them [1-4]. Moreover, big data, which is likely to be heterogeneous, gives rise to data conflicts, especially semantic conflicts between different data representations and sources [3,5-7]. In the last few years these phenomena have become more common and are now among the main challenges in implementing data integration [3,6,8-14].
The semantic aspect concerns the meaning of each word or term in a particular context or system [6,15]. Two kinds of semantic data problem are possible [16]. The first is data that have different names but the same meaning. For example, consider two data sources that belong to different applications in the education domain and both store data about students. One data source stores student data under the name pupil, and the other stores it under the name learner. This produces a semantic data conflict between pupil and learner, because both data sources store the same student information.
The second possible problem concerns homonyms: data with the same name but different meanings. For example, two data sources that belong to different applications in the education domain may both use the name "book". In the first data source, "book" refers to information about a book for reading, while in the other it refers to the status of a reservation. The ontology approach is a promising solution to these kinds of problems because it constructs semantic relationships between the conflicting terms.
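The two kinds of conflict can be illustrated with a minimal Python sketch. It assumes that every field in a data source has already been annotated with the concept it denotes; the source names, field names and concept labels are hypothetical and only mirror the pupil/learner and "book" examples.

```python
from itertools import combinations

# Hypothetical annotations: source -> {field name: concept it denotes}.
SOURCES = {
    "app_a": {"pupil": "student", "book": "reading_material"},
    "app_b": {"learner": "student", "book": "reservation"},
}


def find_conflicts(sources):
    """Return (synonyms, homonyms) found between every pair of sources."""
    synonyms, homonyms = [], []
    pairs = combinations(sorted(sources.items()), 2)
    for (src1, fields1), (src2, fields2) in pairs:
        for name1, concept1 in fields1.items():
            for name2, concept2 in fields2.items():
                if name1 != name2 and concept1 == concept2:
                    # Different names, same meaning.
                    synonyms.append(((src1, name1), (src2, name2), concept1))
                elif name1 == name2 and concept1 != concept2:
                    # Same name, different meanings.
                    homonyms.append(((src1, name1), (src2, name2)))
    return synonyms, homonyms


synonyms, homonyms = find_conflicts(SOURCES)
print("synonyms:", synonyms)
print("homonyms:", homonyms)
```

In practice the concept annotations are exactly what is missing, which is why the paper turns to an ontology to supply them.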
Methodologies for ontology development have been evolving in recent years. Every proposed ontology development method is based on specific objectives and domain areas for the implementation of the ontology knowledge [17-19]. Section II of this paper reviews and analyzes the existing ontology development methodologies and, as a result, briefly summarizes their limitations.
The aim of this research is to propose improved method phases for ontology development, specifically in the data integration domain (OntoDI), as illustrated in Section III. OntoDI is developed from the review and analysis in Section II and improves on the ontology development methods of our previous work. Section IV describes in detail the experiment of ontology development on data integration (OntoDI) in the education area, while Section V presents the results and discussion of OntoDI. Section VI concludes the paper and briefly outlines the future work of this research.
www.ijacsa.thesai.org

II. EXISTING METHODOLOGIES FOR ONTOLOGY DEVELOPMENT
This paper studies sixteen methodologies for ontology development, published between 1989 and 2017 [17,19-33]. The existing methodologies are reviewed and analyzed against four criteria. Table 1 summarizes the review by the name and year of publication, the purpose of the methodology, the category of the method, and the main steps involved in the methodology.
The second column of Table 1 presents the purpose of each methodology. The majority of the researchers developed methodologies for constructing or involving ontology knowledge [17, 19-23, 25-27, 30, 32, 33]. A few researchers developed ontologies by creating an enterprise model [28,31], and a few others focused on data integration [24,29]. It can be concluded that every proposed ontology development method is based on specific objectives and domain areas for implementing the ontology knowledge.
The third column of Table 1 classifies the development methodologies into three categories. The first is methodologies that do not consider collaborative and distributed construction (NoCoDi). The second is methodologies that consider both collaborative and distributed construction (CoDi). The third is methodologies that can be reengineered (Reeng).
Moreover, some methodologies combine CoDi and Reeng [20,24]. The NeOn methodology [32] is both CoDi and Reeng because it includes processes for reusing and reengineering ontological resources, which also places it in the reengineering category.
The fourth column of Table 1 shows the steps used to develop the ontology. The steps are very diverse because they depend on the goal of the ontology in its specific implementation domain. Only CoMOn [19] discusses the common steps of ontology development methods. From the review and analysis of the steps in Table 1, the most common steps in ontology development are: specification, conceptualization, formalization, implementation, evaluation and documentation.
The specification process identifies the purpose and the domain of the ontology development. The conceptualization process organizes and structures the domain knowledge. The formalization process then transforms the conceptual model into a formal model. It is followed by the implementation process, in which the ontology is built. Subsequently, the evaluation process verifies and validates the ontology. In the documentation process, all activities and results are recorded and filed.
The overall review and analysis of the methodologies in Table 1 shows that many issues in the implementation of data integration relate to the semantic aspects [8-11, 13, 14, 34, 35]. One important aspect of ontology development for data integration is the data sources (resources) [36]. In Table 1, only two methodologies (NeOn and OmMAS) discuss resources.
The NeOn methodology [20] includes phases that reuse and reengineer non-ontological resources. However, it has no ontology evaluation and validation phase to check the consistency of the ontology knowledge. NeOn also lacks an ontology refinement phase, which is needed to edit and improve the ontology knowledge when inconsistency errors occur. The OmMAS methodology [17] has a phase that identifies resources from a multi-agent system, but its large number of phases (nine altogether) can make it less efficient. Therefore, a methodology with a reasonable number of phases is required so that the process becomes more effective.

III. METHODOLOGY FOR ONTOLOGY DEVELOPMENT ON DATA INTEGRATION (ONTODI)
This research focuses on building an improved method for ontology development, specifically for the implementation of data integration, called ontology development on data integration domain (OntoDI). The main purpose of OntoDI is to develop ontology knowledge that handles semantic aspect problems, with a reasonable number of phases, in order to support the implementation of data integration.

| Methodology, year | Purpose | Category | Main steps |
|---|---|---|---|
|  | To build ontologies from scratch | NoCoDi | Requirement specification, conceptualization, formalization, implementation, maintenance, knowledge acquisition, documentation and evaluation |
| SENSUS, 1997 [26] | To provide a broad conceptual structure to develop translator machine | NoCoDi | Terms are taken as seed, terms are linked to SENSUS, all concepts from new terms in the path are included, relevant terms are added, the relevant nodes in the subtree are added and new domain terms are added |
| (KA)2, 1999 [25] | To design knowledge acquisition using ontology development in a joint effort by a group of people from different locations, using the same templates and language | CoDi | Ontological engineering to build an ontology of the subject matter, characterizing the knowledge in terms of the ontology and providing intelligent access to the knowledge |
| Ontology Integration, 2001 [24] | To reuse and integrate existing ontologies for a specific purpose | CoDi and Reeng | Identify candidate ontologies, select the candidate ontologies, study the ontologies, choose the most acceptable source ontologies, apply the integration and analyse the resulting ontology |
| On-To-Knowledge, 2001 [23] | To provide application-driven ontology development for knowledge management | NoCoDi | Feasibility study, kick-off, refinement, evaluation and maintenance |
| DILIGENT, 2004 [22] | To support specific domain experts in a distributed setting to engineer and evolve ontologies | CoDi | Building, local adaptation, analyse activity, adjustment, and local update |
| Semi-automatic creation ontologies, 2010 [21] | To develop an ontology from company databases to integrate information sources and to contribute to the logical treatment | NoCoDi | Requirements analysis, collection of metadata, building, improvement, testing, and feedback |
| NeOn, 2012 [20] | To develop ontologies embedded in ontology networks with complex settings, built collaboratively by reusing and reengineering knowledge resources | CoDi and Reeng | Specification task to implement, reuse and reengineer non-ontological resources, reuse the ontological resources, reuse and reengineer ontological resources, reuse and merge ontological resources, reuse, merge and reengineer ontological resources, reuse the ontological design patterns, restructure the ontological resources and localize the ontological resources |
| CoMOn, 2013 [19] | To develop ontology knowledge specific to compliance management | NoCoDi and CoDi | Identification, build the ontology, evaluate the ontology, improve the ontology and create documentation |
| OmMAS, 2017 [17] | To build the ontology knowledge in multi-agent system development | NoCoDi | Define the purpose of ontology development, identify the resources from the multi-agent system, re-engineer and reuse the identified resources, conceptualize all the terms and relationships, restructure resources, formalize all terms and relationships into a diagram design, implement all terms and relationships into the ontology, evaluate and validate the ontology, refine the ontology and create ontology documentation |

According to Badr et al. [18], several common phases are essential to develop ontology knowledge: definition, conceptualization, formalization, implementation, evaluation and documentation. OntoDI adds further phases to improve the existing processes. Fig. 1 illustrates the methodology for ontology development on data integration domain (OntoDI). OntoDI has three main parts: pre-development, core-development and post-development. Each part contains several phases. The first is the pre-development part, which contains two phases: the definition of the purpose of ontology development and the identification of resources.
The second is the core-development part. It comprises three phases: the conceptualization and formalization of the ontology knowledge, the development of the ontology knowledge using specific tools, and the evaluation and validation of the ontology knowledge. To refine the ontology, these steps may need to be repeated over many iterations.
The third is the post-development part, which contains two activities: ontology refinement and completion of documentation. In practice, the documentation process starts in the first phase of the pre-development part and continues through all phases of OntoDI. It involves compiling the steps necessary in each phase and the interrelated processes.
We claim that the phases of OntoDI follow the standard common phases proposed by Badr et al. [18] and are more efficient than those of OmMAS [17]. Table 2 maps the common phases of Badr et al. and the phases of OmMAS to the proposed phases of OntoDI. OntoDI has seven phases, six of which are common phases, which is a reduction to a reasonable number compared with OmMAS.
OntoDI fulfills the important aspects of ontology development for data integration: it considers the data sources through the resources identification phase; it checks the consistency of the ontology knowledge through the ontology evaluation and validation phase; and it can edit and refine the ontology knowledge when inconsistency errors occur through the ontology refinement phase. The phases of OntoDI are fewer (compared with OmMAS) and simpler, so that the implementation of data integration becomes more efficient.
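The structure described in this section can be written down as a small Python data structure. This is only an illustrative sketch: the phase names follow the text, while the dictionary layout and the function name are assumptions.

```python
# Parts and phases of OntoDI as described in this section.
ONTODI_PARTS = {
    "pre-development": [
        "definition of the purpose of ontology development",
        "resources identification",
    ],
    "core-development": [
        "ontology conceptualization and formalization",
        "ontology development",
        "ontology evaluation and validation",
    ],
    "post-development": [
        "ontology refinement",
        "completion of documentation",
    ],
}


def ordered_phases(parts):
    """Flatten the parts into one ordered list of (part, phase) pairs."""
    return [(part, phase) for part, phases in parts.items() for phase in phases]


phases = ordered_phases(ONTODI_PARTS)
for number, (part, phase) in enumerate(phases, start=1):
    print(f"{number}. [{part}] {phase}")
```

The flattened list contains the seven phases in the order in which they are carried out.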

IV. EXPERIMENT OF ONTOLOGY DEVELOPMENT ON DATA INTEGRATION (ONTODI)
This section describes the implementation of OntoDI in a specific domain for data integration, following the methodology described in Section III, and explains the OntoDI steps and phases in detail. The main purpose of OntoDI is to develop ontology knowledge that handles semantic aspect problems in order to support the implementation of data integration.
Fig. 2. Data Source on SES and GS.

A. Definition of the Purposes of Ontology Development
This is the first phase of OntoDI's pre-development part. The experiment in this research concerns the implementation of data integration in the electronic learning system domain. Therefore, the purpose of the ontology development in this research is to produce learning knowledge for sharing and integrating learning information between different systems.

B. Resources Identification
The second phase of OntoDI's pre-development part is to identify and select the specific data resources that require integration. Many data sources exist in different systems in the education domain. This research focuses on two systems: the Student Evaluation System (SES) and the Grading System (GS), as shown in Fig. 2. Four attributes are selected from SES, namely student, student2, questions and mark, and three attributes are selected from GS, namely student, student_undergraduate and grade.
From our observations, two semantic aspect problems occur between these two systems. The first is the semantic problem between mark and grade. These two resources contain the same data item, the student mark, but use different names. The semantic issue in this situation is therefore: different names with the same meaning.
The second semantic problem occurs in the student records of SES and GS. These two data sources have the same name but contain different student information. In SES the student record contains undergraduate information, while in GS it contains postgraduate information. Consequently, the semantic issue in this situation is: the same name with different meanings.

C. Ontology Conceptualization and Formalization
Conceptualization is the first phase in the core-development part. It is the process of generating and reforming all terms and relationships. In other words, all candidate table and field names in the database system are represented as class and subclass terms for the ontology knowledge.
Then, the formalization process is conducted to produce meaningful models at the knowledge level. In this process, semantic relationships are defined between the class and subclass terms. Table 3 lists all relationships that can be used within the ontology knowledge. Table 4 shows all possible terms in SES and GS that are candidate classes and subclasses for the ontology knowledge. This phase resolves the semantic problems identified in the resources identification phase. Two semantic aspect problems are solved here. The first, between the two tables named grade and mark from two different data sources, is resolved by formalizing both as the class Score. The second, between the two tables that share the name Student, is resolved by formalizing them as the class LearningPerson with the subclasses StudentUndergraduate and StudentPostgraduate.
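The outcome of this formalization can be sketched in Python as a mapping from source tables to ontology classes. The class names Score, LearningPerson, StudentUndergraduate and StudentPostgraduate come from the text, and the assignment of each student table to a subclass follows the resources identification phase; the dictionary layout and the function name are assumptions made for illustration.

```python
# Subclass -> parent class, using the classes named in the text.
SUBCLASS_OF = {
    "StudentUndergraduate": "LearningPerson",
    "StudentPostgraduate": "LearningPerson",
}

# (data source, table) -> ontology class.
TABLE_TO_CLASS = {
    ("SES", "mark"): "Score",
    ("GS", "grade"): "Score",
    ("SES", "student"): "StudentUndergraduate",
    ("GS", "student"): "StudentPostgraduate",
}


def resolve(source, table):
    """Return the class of a table followed by its ancestors, most specific first."""
    chain = [TABLE_TO_CLASS[(source, table)]]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


# Different names, same meaning: both tables resolve to Score.
print(resolve("SES", "mark"), resolve("GS", "grade"))
# Same name, different meaning: the two student tables get distinct subclasses.
print(resolve("SES", "student"), resolve("GS", "student"))
```

The shared parent LearningPerson keeps the two student tables related while the subclasses keep their meanings apart.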

D. Ontology Development
Ontology development is the second phase in the core-development part. It is the process of developing ontology knowledge for a specific domain and purpose, using a suitable tool or application.
In this research, the ontology is developed with the Protégé tool. Protégé is recommended because it is free and has reasoner features that can evaluate and validate the ontology knowledge. The result of the ontology development is expressed in the Web Ontology Language (OWL), which can be used from programming languages such as Java. Protégé also provides other useful features, such as saving the ontology knowledge in RDF/XML, OWL/XML, OWL Functional Syntax, KRSS2 Syntax, OBO Format and Manchester OWL Syntax.
The second partition of Fig. 4 illustrates the property assertions of Student1. Nine semantic relationships are asserted as object properties and one as a data property for Student1. The purpose of the ontology knowledge is to create semantic relationships between the individuals in the ontology knowledge.
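To give an impression of what such an OWL document contains, the following Python sketch renders a few axioms in OWL 2 Functional-Style Syntax using plain string formatting. It is not Protégé output: the base IRI is hypothetical, the function is an illustration only, and just the class names and the Student1 facts described for Fig. 4 are taken from the text.

```python
def to_functional_syntax(base_iri, classes, subclass_of, individuals, types, different):
    """Render a small ontology as OWL 2 Functional-Style Syntax text."""
    lines = [f"Prefix(:=<{base_iri}#>)", f"Ontology(<{base_iri}>"]
    lines += [f"  Declaration(Class(:{name}))" for name in classes]
    lines += [f"  SubClassOf(:{child} :{parent})" for child, parent in subclass_of]
    lines += [f"  Declaration(NamedIndividual(:{name}))" for name in individuals]
    lines += [f"  ClassAssertion(:{cls} :{name})" for name, cls in types]
    lines += [f"  DifferentIndividuals(:{a} :{b})" for a, b in different]
    lines.append(")")
    return "\n".join(lines)


document = to_functional_syntax(
    base_iri="http://example.org/ontodi",  # hypothetical IRI
    classes=["LearningPerson", "StudentUndergraduate", "StudentPostgraduate", "Score"],
    subclass_of=[
        ("StudentUndergraduate", "LearningPerson"),
        ("StudentPostgraduate", "LearningPerson"),
    ],
    individuals=["Student1", "Student2"],
    types=[("Student1", "StudentUndergraduate")],
    different=[("Student1", "Student2")],
)
print(document)
```

A real ontology would also declare the object and data properties of Student1, which are omitted here because the text does not name them.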

E. Ontology Evaluation and Validation
The evaluation and validation stage verifies whether the ontology knowledge reaches an acceptable level of consistency. Consistency concerns the semantic terms and relationships used in the ontology: the check determines whether the ontology still contains inconsistencies or whether all semantic terms and relationships are consistent. The evaluation and validation of the ontology are performed using the reasoner feature of the Protégé tool. Several standard reasoners are available in Protégé, such as FaCT++, HermiT and Pellet. Fig. 5 shows the evaluation and validation result using FaCT++ in Protégé.
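One kind of inconsistency that a reasoner reports can be imitated with a toy Python check: an individual that is inferred to belong to two disjoint classes. This sketch is far simpler than what FaCT++, HermiT or Pellet actually do, and the disjointness axiom between the two student subclasses is an assumption that the text does not state.

```python
def ancestors(cls, subclass_of):
    """Return cls together with all of its superclasses."""
    seen = {cls}
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls in seen:
            break  # guard against a cyclic hierarchy
        seen.add(cls)
    return seen


def find_inconsistencies(subclass_of, disjoint, assertions):
    """List (individual, class_a, class_b) where the inferred classes are disjoint."""
    inferred = {}
    for individual, cls in assertions:
        inferred.setdefault(individual, set()).update(ancestors(cls, subclass_of))
    problems = []
    for individual, classes in sorted(inferred.items()):
        for a, b in disjoint:
            if a in classes and b in classes:
                problems.append((individual, a, b))
    return problems


SUBCLASS_OF = {
    "StudentUndergraduate": "LearningPerson",
    "StudentPostgraduate": "LearningPerson",
}
DISJOINT = [("StudentUndergraduate", "StudentPostgraduate")]  # assumed axiom

consistent = [("Student1", "StudentUndergraduate")]
broken = consistent + [("Student1", "StudentPostgraduate")]  # hypothetical error

print(find_inconsistencies(SUBCLASS_OF, DISJOINT, consistent))
print(find_inconsistencies(SUBCLASS_OF, DISJOINT, broken))
```

An empty result corresponds to the consistent case; any reported triple would send the ontology to the refinement phase.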

F. Ontology Refinement
Refinement is one of the phases in the post-development part. It is performed when the evaluation and validation phase, run with the Protégé reasoner, yields erroneous results. Fig. 5 shows the selection interface of the Protégé reasoner.
The ontology refinement phase is an iterative process that involves editing and improving the ontology knowledge to obtain better ontology results. The process stops when the results reach the accepted level of consistency.
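The iteration between validation and refinement can be sketched as a generic Python loop. The `check` and `repair` functions below are hypothetical stand-ins: in the paper the check is run by the Protégé reasoner and the repair is a manual edit of the ontology, and the disjointness axiom and the faulty assertions are invented for the example.

```python
def refine(ontology, check, repair, max_rounds=10):
    """Repeat check -> repair until check reports no errors.

    Returns the refined ontology and the number of repair rounds used.
    """
    for rounds in range(max_rounds + 1):
        errors = check(ontology)
        if not errors:
            return ontology, rounds
        if rounds == max_rounds:
            break
        ontology = repair(ontology, errors)
    raise RuntimeError(f"still inconsistent after {max_rounds} rounds: {errors}")


DISJOINT = ("StudentUndergraduate", "StudentPostgraduate")  # assumed axiom


def check(assertions):
    """Return the individuals asserted into both disjoint classes."""
    by_individual = {}
    for individual, cls in assertions:
        by_individual.setdefault(individual, set()).add(cls)
    return sorted(i for i, classes in by_individual.items() if set(DISJOINT) <= classes)


def repair(assertions, errors):
    """Stand-in for a manual edit: drop one faulty assertion per round."""
    return assertions - {(errors[0], "StudentPostgraduate")}


start = frozenset({
    ("Student1", "StudentUndergraduate"),
    ("Student1", "StudentPostgraduate"),  # hypothetical error
    ("Student2", "StudentUndergraduate"),
    ("Student2", "StudentPostgraduate"),  # hypothetical error
})
refined, rounds = refine(start, check, repair)
print(rounds, sorted(refined))
```

The `max_rounds` limit is a design choice of the sketch so that a repair step that never converges is reported instead of looping forever.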

G. Completion of Documentation
The documentation process is a continuous activity that is conducted from the first phase of OntoDI until the end. The documentation is important because it helps to recognize the current state of a process and helps this research to maintain standards and consistency.
In the last phase of the post-development part, the final version of the documentation is compiled and completed. This documentation helps the client or user of the ontology to understand the processes and makes the ontology easier to maintain for future improvements.

V. RESULTS AND DISCUSSIONS
The development of ontology knowledge using OntoDI has been completed and implemented in the education domain. We claim that, with OntoDI, the development of ontology knowledge has simpler phases, complete steps and clear documentation for the ontology client, and follows the standard common ontology development phases proposed by Badr et al. [18]. OntoDI is expected to improve on the existing methodologies by adding and customizing suitable ontology development phases, and to become one of the promising solutions for implementing data integration.
In addition, OntoDI supports the development of ontology knowledge. With ontology knowledge, semantic aspect problems can be resolved when they occur during the sharing and integration process in the education domain. One crucial phase added in OntoDI is the resources identification phase, which identifies possible semantic aspect problems in the data sources. All tables that have semantic aspect problems, such as different names with the same meaning or the same name with different meanings, are identified so that they can be resolved. This phase must be completed before the next phase, ontology conceptualization and formalization.
In the experiment, OntoDI was able to identify and select the specific data or information that need to be integrated and to resolve them in the conceptualization and formalization phase. In this phase, all terms are turned into classes and subclasses from an ontology perspective. After that, all data related to the resources are formalized using semantic relationships.
Another advantage of OntoDI is its documentation phase. The ontology developer starts to document the process in the first phase of OntoDI and continues through all phases until the last. This enables the developer to revise the process as it goes along and helps to identify inconsistencies or inefficient results. Moreover, the documentation helps the user of OntoDI to understand the processes better and allows timely changes when necessary.

VI. CONCLUSIONS AND FUTURE WORKS
Ontology has become a popular research area in recent years, because many semantic aspect problems arise during the implementation of a domain system. In the implementation of data integration, ontology is one of the solutions to semantic aspect problems. This research has developed an improved method for ontology development in data integration (OntoDI). The ultimate goal of OntoDI is to customize, improve and simplify existing methodologies in order to obtain better ontology development results in the data integration area. In this paper, we have shown that OntoDI can be applied in the education domain and is able to resolve the semantic aspect problems.
In future work, OntoDI will be examined on other real case studies, and more critical evaluations will be conducted to improve OntoDI for better ontology development.

CoDi: Collaborative and distributed construction category
NoCoDi: Category that does not consider collaborative and distributed construction

Fig. 3 shows the ontology knowledge in a diagram view exported by the OntoGraf feature of Protégé. In this view, users can easily see the attributes of Student1. In this example, Student1 has nine object properties, one type (an ontology class or subclass), one different individual and one data property. Moreover, Fig. 4 shows the detailed attributes of Student1, divided into two partitions. The upper partition is the description of Student1. Fig. 4 shows that Student1 is an individual of StudentUndergraduate and that Student1 is different from Student2.

TABLE I. EXISTING METHODOLOGIES FOR ONTOLOGY DEVELOPMENT

TABLE IV. ALL POSSIBLE TERMS IN SES AND GS