Fully Automated Ontology Increment’s User Guide Generation

This research focuses on domain- and schema-independent user guide generation for ontology increments. A user guide, catalogue or manual is vital for quick and effective knowledge dissemination, and generating one for an ontology offers several advantages. Stakeholders can scan the user guide of the ontology and verify its suitability for the intended purposes. A user guide also supports the ontology's version management and knowledge verification requirements. However, because ontology construction is iterative and incremental, there will be several intermediate versions before the ontology reaches its fine-tuned final version, so creating the user guide manually is tedious and impractical. Consequently, this research proposes a novel algorithmic approach to domain- and schema-independent ontology verbalization. A special algorithm is created to alter the functionality of AliceBot so that it works as a verbalizer instead of a chatterbot. Artificial Intelligence Markup Language (AIML) is used to create the templates for the ontology-specific knowledge embeddings. The entire process is fully automated by the proposed algorithm, which is a key contribution of this research. Finally, the user guide generation tool is evaluated on three different domains with the involvement of fifteen stakeholders, yielding an average acceptance of 82%.

Keywords: AliceBot; Artificial Intelligence Markup Language; ontologist; verbalizing


I. INTRODUCTION
Ontologies are recognized as domain-rich conceptualizations [1] which are both machine- and human-readable [2][3]. Because of this capacity for conceptualization, they are ideal for encoding specialized human knowledge [4]. The encoded knowledge is machine-readable and can serve a wide range of domain-specific reasoning and knowledge representation needs. Because of these features, the popularity of ontology-based applications has grown considerably, and there are now thousands of applied ontologies in domains such as biology, agriculture, bioinformatics, law and management [5][6]. Applied ontologies are used to address problems arising in non-computing domains, and the benefits described above allow most of those problems to be resolved effectively [30]. Constructing an accurately defined applied ontology is a complex process, which requires ontologists and domain specialists to work hand-in-hand with mutual understanding throughout. As a methodical workflow for bridging domain specialists and ontologists, "collaborative ontology engineering" has emerged as a separate niche under the umbrella of ontology engineering [7][8][9].
One of the crucial requirements in collaborative ontology engineering is a proper bond between domain specialists and ontologists [10]. Without effective participation and collective contribution, reaching an error-free applied ontology is not realistic. Researchers have noted that ontologists need reasonable insight into the domain to be modelled and, conversely, that domain specialists should have a sound understanding of the basics of the semantic web and knowledge modelling. Only once this state is achieved can collective and effective participation of both parties be expected, leading to the construction of an error-free applied ontology schema [11][12].
However, there is a critical bottleneck caused by the limited comprehension of semantic concepts among non-computing domain specialists such as lawyers, medical doctors and agricultural specialists [13][14][15].
Because ontology construction is a complex, iterative and incremental task, domain specialists are expected to cross-reference and verify, at the end of each iteration, that the knowledge they provided has been accurately and consistently modelled and embedded in the ontology by the ontologists [16][17]. Errors can then be corrected as soon as they are located, before they grow into complex conceptual flaws and an erroneous schema. Limited or no literacy in semantic concepts is a strong hindrance to this requirement. To understand an ontological taxonomy properly, the user needs reasonable knowledge of basic object-oriented concepts such as inheritance and of semantic concepts such as triples, data properties, object properties, disjoint classes, symmetric properties and individuals. Even if the person understands those aspects, the next step is writing an appropriate SPARQL or SQWRL query to verify the accuracy of the knowledge embeddings. Understanding the schema of the ontology and forming an accurate SPARQL query with the correct syntax is challenging even for a computer scientist at the first attempt. It is therefore unrealistic to expect this from a non-computing domain specialist such as a medical doctor, business manager or lawyer. Even if the SPARQL query is written and executed, the results are returned as triples containing URIs, which must be post-processed to derive answers in plain English; this, too, is unlikely to be feasible at the competency level of a non-computing domain specialist [13][14][15].
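To make the post-processing burden concrete, the following minimal Python sketch turns one URI-based triple, of the kind a SPARQL result row contains, into a plain English phrase. The example URIs and the helper names `local_name`, `humanize` and `triple_to_english` are hypothetical and are not part of the proposed tool; the sketch also assumes that identifiers follow a camelCase or underscore naming convention.

```python
import re


def local_name(uri: str) -> str:
    """Return the fragment after '#' or, failing that, the last path segment."""
    return re.split(r"[#/]", uri.rstrip("/"))[-1]


def humanize(identifier: str) -> str:
    """Split camelCase and underscores into lower-case words."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier.replace("_", " "))
    return spaced.lower()


def triple_to_english(subject: str, predicate: str, obj: str) -> str:
    """Join the humanized local names of a triple into one sentence."""
    parts = [humanize(local_name(term)) for term in (subject, predicate, obj)]
    return " ".join(parts).capitalize() + "."


# Hypothetical result row; real SPARQL results would come from a query engine.
row = (
    "http://example.org/law#Burglary",
    "http://example.org/law#isPunishedBy",
    "http://example.org/law#Imprisonment",
)
print(triple_to_english(*row))
```

Even this small step assumes a naming convention that a real ontology may not follow, which illustrates why the task is not trivial for a non-computing domain specialist.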
All these obstacles constrain the involvement of domain specialists in knowledge verification, which is a crucial step in collaborative ontology engineering [9,18]. In addition, creating a catalogue (i.e., a user guide) for the ontology's structure is very important for its later maintenance and knowledge diffusion requirements [21][22]. However, fulfilling this task manually increases the workload of the ontologists. Furthermore, if the schema is altered later, the entire catalogue needs to be rewritten or updated accordingly, which is a tedious and effort-consuming task for the ontologists [21][22].
This research focuses on a fully automated mechanism to verbalize the entire ontology (i.e., to output the knowledge encoded in the ontology in natural language form), regardless of its domain or schema [19][20]. This resolves the technological challenges faced by non-computing domain specialists and removes the burden of manual user guide creation from the ontologists' workload. Both domain specialists and ontologists therefore benefit from this contribution. The key contributions of the proposed technique over the existing mechanisms are:
• The proposed technique can work with any domain (domain independent).
• The proposed technique can work with any schema (schema independent).
• No external or manual configuration is required.
• Semantic contents of the ontology increments are converted into English that a layperson can understand.
II. LITERATURE REVIEW
In collaborative ontology engineering, the collective opinion of both the ontologists and the domain specialists is vital, and a proper workflow is needed to derive that collective opinion [23][24]. Otherwise, in a collaborative environment, multiple people will raise multiple viewpoints and hold to their own perspectives, which ultimately leads to the "Tragedy of the Commons" [25][26]. To address this, Shneiderman's "Information Seeking Mantra" sets out a workflow for methodically integrating the dispersed viewpoints of the stakeholders into an overall collective insight [27]. However, the Information Seeking Mantra requires specialist tool support to fulfil its workflow steps. It first asks the stakeholders to gain an overall idea of the problem of concern, for which Shneiderman suggests both visualization and verbalization tool support. The second step is to zoom in on the required information only; verbalization and specially defined visualization techniques can fulfil this step as well. The third and fourth steps focus on filtering out unnecessary information and looking up information on demand, respectively [27][28][29]. Although the prototype resulting from this research addresses all requirements of Shneiderman's "Information Seeking Mantra", the scope of this paper is limited to its verbalization feature.
Technically, verbalization is defined as the process of translating the axioms defined in an ontology into natural language [19][20]. Most existing verbalization systems rely on the Natural Language Generation (NLG) pipeline to convert axioms into natural language [31]. This is a complex technological pipeline in which every phase must be completed accurately to obtain understandable natural language output. The phases are content selection, discourse planning, lexicalization, aggregation, generation of referring expressions and, finally, linguistic realization [32].
Among these steps, discourse planning is vital for achieving coherent verbalization output. The discourse planning step uses Rhetorical Structure Theory (RST) for the coherent organization of the text [65]. RST is based on two main notions, the nucleus and the satellite. The nucleus represents the significant axioms of the domain considered, and the satellite represents the associated properties linked to the nucleus that are required to elaborate it [65]. Therefore, if the nucleus and satellite are not identified in a domain-specific manner, the clarity of the verbalized content suffers [31,33]. For that reason, there is a manual phase in which the domain specialist and the ontology engineer assign weights to the axioms defined in the domain. Afterwards, pre-defined rule sets automate the RST, assuring appropriate discourse planning and leading to accurate and coherent verbalization.
The problem that arises here is that the same verbalizer cannot be used for any other domain. Complex prior configuration, referred to as portal configuration, is a must. This makes a verbalization-ready framework domain-dependent [32], which is a key limitation of the existing verbalizing techniques.
The next restriction is the need for annotations to enrich the semantic realization of the concepts in the ontology. This again requires additional effort from the ontology engineers, and in most cases suitable de facto standard metamodels (e.g., Dublin Core, FOAF) need to be incorporated into the ontology, because the majority of the existing verbalization frameworks are configured to link only with the predefined annotated endpoints of those metamodels. This places an additional load on the ontologists and also acts as a modelling restrictor [34][35]. The freedom of the ontologists and domain specialists is therefore restrained, as they need to plan everything to fit the de facto standard metamodels.
As the final disadvantage, most of the existing verbalizers produce Controlled Natural Language (CNL), which resembles assembly language rather than the colloquial English that laypeople can understand. Therefore, another Natural Language Processing (NLP) layer must be introduced to convert the technical English constructs into their colloquial form, which is a further processing overhead. One of the main causes is that existing verbalizers have evolved only up to the level of Attempto Controlled English (ACE), which is a form of CNL. In CNL, verbalizers attempt to extract the triple formulations in the ontology and convert them exactly into English, so the contextual connectivity and colloquialism are lost [36][37][38]. This is the main reason why the verbalized output looks very primitive and its flow is inconvenient for end-users to interpret.
218 | Page www.ijacsa.thesai.org
Most of the deficiencies associated with verbalizers are schema- and domain-specific [39][40][41]; some of the general deficiencies are reviewed above. It can be concluded that the existing verbalization mechanisms do not properly resolve the research gaps of domain dependence, excessive human involvement in configuration and the limited colloquialism of CNL-based output. Hence, the emphasis of this research is on a novel approach to overcome these shortcomings. Table I compares well-known existing verbalization tools, their limitations and why they cannot resolve the issue of domain- and schema-independent verbalization that produces a colloquial user guide.

Table I. Existing verbalization tools and their limitations
Tool | Limitations
LODE [53] | The ontology increment must be published on the web and the access URL must follow the Cool URIs format. The XSLT script of LODE is configured to work only with standardized metamodels such as FOAF and Dublin Core. This acts as a modelling restrictor.
SWAT Tools [54] | Extensive redundancies in the HTML link sequences generated for the ontology. Split information problem, with information dispersed across many places. Makes the role of the domain specialist very difficult.
MIKAT [55] | The verbalizer is statically attached to the breast cancer domain and cannot work with any other domain. Fully domain-dependent.
Further, several existing verbalization algorithms were also reviewed to recognize their deficiencies. Table II contains the details of this algorithmic analysis.
According to the discussion in the literature review, it is apparent that there is a research gap to be resolved. The following sections of the paper discuss the steps followed to address it.

III. METHODOLOGY
After completing an extensive systematic literature assessment [42], it was concluded that the aforesaid research gaps have still not been properly resolved.
Subsequently, a blend of the think-aloud protocol [43] and systems thinking [44] was used to brainstorm collectively on the problem, and AIML (Artificial Intelligence Markup Language) was selected to implement the proposed solution, as AIML is recognized as an ideal technology for creating natural language software agents with an XML dialect [44][45][46][47][48]. Table III contains the results of the technology review.
According to the brainstorming results logged in Table III, AIML was determined to be the ideal technology platform for resolving the research problem, because it has broad external integration support and requires no domain-related training datasets to train a domain-specific model. Hence, AIML matches the domain and schema independence requirement.
Design science research methodology was selected because of the investigative nature of this research [47]. The implemented prototype is assessed quantitatively and qualitatively on its functionality. Fig. 1 shows the workflow of the design science research methodology as applied in this research.
The importance and timeliness of the research problem were already justified by the literature review. Additionally, a comprehensive literature analysis was conducted on existing verbalization tools and algorithms to justify explicitly the deficiencies that remain unresolved. Consequently, the consolidated research objective to be accomplished was recognized as:
a) Domain and schema independent verbalization.
b) Zero configuration effort.
c) Fully automated user guide generation for the iterative and incremental ontology increments.
Subsequently, after an adequate amount of collective brainstorming, the following algorithm was designed. The proposed algorithm comprises three main operational phases, as depicted in Fig. 2.

Table III (excerpt). Technology review
Technology | Observations
(preceding row, technology name not recovered) | Integration with third-party resources is difficult. Steep learning curve. Costly.
AIML / ALICE [45][46][47][48] | No datasets required for training purposes. Intelligence is extracted from the knowledge scripts, which can be auto-generated via axiom extraction from the ontology. Manual integrations and expansions are also supported. ALICE contains a robust collection of AIML scripts to make the bot more intelligent. Freely available. Easy to use. A lot of potential for external integrations. The stimulus-response model can be used to organize knowledge. Regular expression support for pattern matching is also available.

Start
  Upload the RDF / OWL version of the ontology increment to be verbalized.
  Check whether the format is RDF or OWL.
  Trigger the format-specific knowledge extraction logic.
  While [ EOF is not reached ]
    Extract class information
    Extract data properties
    Extract object properties
    Extract class-specific individuals (if existing)
    Stow them appropriately in different relations of the RDBMS.
  End While
The ontology increment to be verbalized can be uploaded directly to Phase-I of the algorithm. Phase-I contains code for knowledge extraction from both RDF (Resource Description Framework) and OWL (Web Ontology Language) formats. The extracted knowledge elements are then stored separately in the database relations as classes, data properties, object properties, etc. The implementation of Phase-I is depicted in the code snippet in Fig. 3.
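The Phase-I steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation shown in Fig. 3: the sample RDF/XML increment, the table names and the `extract` function are assumptions made for the example, and the sketch only handles entities declared with `rdf:about`.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical ontology increment in RDF/XML, used only for illustration.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="http://example.org/law#Offence"/>
  <owl:Class rdf:about="http://example.org/law#Burglary"/>
  <owl:DatatypeProperty rdf:about="http://example.org/law#maxSentenceYears"/>
  <owl:ObjectProperty rdf:about="http://example.org/law#isPunishedBy"/>
  <owl:NamedIndividual rdf:about="http://example.org/law#Case001"/>
</rdf:RDF>"""

# OWL element name -> relation (table) in which it is stored (assumed names).
KINDS = {
    "Class": "classes",
    "DatatypeProperty": "data_properties",
    "ObjectProperty": "object_properties",
    "NamedIndividual": "individuals",
}


def extract(rdf_xml: str, conn: sqlite3.Connection) -> None:
    """Store the URI of every declared entity in the relation for its kind."""
    root = ET.fromstring(rdf_xml)
    for tag, table in KINDS.items():
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (uri TEXT PRIMARY KEY)")
        for element in root.iter(f"{{{OWL}}}{tag}"):
            uri = element.get(f"{{{RDF}}}about")
            if uri:
                conn.execute(f"INSERT OR IGNORE INTO {table} VALUES (?)", (uri,))
    conn.commit()


conn = sqlite3.connect(":memory:")
extract(SAMPLE, conn)
for table in KINDS.values():
    rows = [r[0] for r in conn.execute(f"SELECT uri FROM {table} ORDER BY uri")]
    print(table, rows)
```

Keeping one relation per kind of knowledge element mirrors the separation described in the text and lets the later phases query classes, properties and individuals independently.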

Phase-II -[Auto-generation of the ontology specific AIML template]
The main purpose of this phase is to generate an AIML template specific to the ontology increment. This can be identified as a critical contribution of this research. The reasons for choosing AIML technology are elaborated in Table III. As depicted in Fig. 4, the placeholder contents of the generic baseline AIML template structure are filled with the information extracted in Phase-I of the algorithm. The generalized baseline AIML template thereby becomes specific to the ontology increment. Phase-II of the algorithm does this in a fully automated manner.
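A rough idea of how such a template could be filled is sketched below. The `category`, `pattern` and `template` elements are standard AIML; everything else, including the `build_aiml` function, the `DESCRIBE ...` pattern wording and the sentence templates, is a hypothetical stand-in for the baseline template in Fig. 4.

```python
import xml.etree.ElementTree as ET


def build_aiml(classes: dict) -> str:
    """Build AIML text; `classes` maps a class label to its data property labels."""
    root = ET.Element("aiml", version="1.0.1")
    for label, properties in classes.items():
        category = ET.SubElement(root, "category")
        # AIML patterns are conventionally upper-case text without punctuation.
        ET.SubElement(category, "pattern").text = f"DESCRIBE {label.upper()}"
        if properties:
            detail = f"It is described by: {', '.join(properties)}."
        else:
            detail = "It has no data properties in this increment."
        ET.SubElement(category, "template").text = (
            f"{label} is a class in the ontology. {detail}"
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical knowledge elements, as they might be read back from the database.
aiml_text = build_aiml({"Burglary": ["max sentence years"], "Offence": []})
print(aiml_text)
```

Because the categories are generated from whatever knowledge elements were extracted, nothing in this step depends on a particular domain or schema.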

Phase-III -[Verbalization Process]
The main purpose of Phase-III is to generate the verbalized user guide. The pseudocode of Phase-III begins as follows.
Start
  Load AliceBot Engine (Fig. 5)
The initial step of Phase-III is to load the AliceBot engine, as depicted in Fig. 5. AliceBot is a freely available chatbot engine built on AIML. Through this algorithm, however, the behaviour of AliceBot is altered from a general chatterbot to a verbalization engine, which is a significant contribution of Phase-III.
Next, the contents to be verbalized are supplied as a request to the AliceBot engine. AliceBot then traverses the customized AIML template generated by Phase-II of the algorithm (i.e., Fig. 4) and conducts the verbalization according to the information in the AIML template, filling the placeholder values with the contents extracted from the database. This process is depicted in the code snippet in Fig. 6.
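The request-and-response traversal described above can be imitated with a very small stimulus-response matcher. The sketch below is not AliceBot and does not use its API; it is a simplified stand-in, assuming a hypothetical AIML fragment and supporting only exact patterns and a trailing `*` wildcard.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology-specific AIML, standing in for the Phase-II output.
AIML = """<aiml version="1.0.1">
  <category><pattern>DESCRIBE BURGLARY</pattern>
    <template>Burglary is a kind of offence.</template></category>
  <category><pattern>DESCRIBE *</pattern>
    <template>No description is available for this concept.</template></category>
</aiml>"""


def load_categories(aiml_text: str) -> dict:
    """Map each pattern to its template text."""
    root = ET.fromstring(aiml_text)
    return {
        c.findtext("pattern").strip(): c.findtext("template").strip()
        for c in root.iter("category")
    }


def respond(categories: dict, request: str) -> str:
    """Return the template of the best matching pattern for the request."""
    words = request.upper().split()
    exact = " ".join(words)
    if exact in categories:
        return categories[exact]
    # Fall back to a trailing '*' wildcard; the longest matching prefix wins.
    for cut in range(len(words) - 1, 0, -1):
        key = " ".join(words[:cut]) + " *"
        if key in categories:
            return categories[key]
    return categories.get("*", "")


def verbalize(categories: dict, concepts: list) -> str:
    """Issue one request per concept and join the responses into a guide."""
    return "\n".join(respond(categories, f"describe {c}") for c in concepts)


categories = load_categories(AIML)
guide = verbalize(categories, ["Burglary", "Arson"])
print(guide)
```

The point of the sketch is the change of role: instead of waiting for a user to type a question, the program itself issues one request per extracted concept and collects the responses into a document.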

IV. RESULTS AND DISCUSSIONS
A sample portion of the verbalized report generated for a criminal law ontology increment is depicted in Fig. 7. The entire verbalized user guide is hyperlinked to facilitate navigation across the document. To prevent information overload and clutter, the verbalized contents are laid out segment by segment, which improves readability. AliceBot is a chatterbot developed on the foundations of AIML technology. With the help of the algorithm described above, the operation of the chatterbot is altered to that of an automated verbalizer, which is a significant research contribution.
To enhance the validity of the experiment, the same verbalized user guide generation process was conducted for three different domains: a COVID-19 ontology increment, an aquaculture ontology increment and the criminal law ontology increment. Snapshots of the taxonomical structures of those ontology increments are included in the appendix of this paper.
The proposed evaluation workflow is depicted in Fig. 8. The evaluation framework used in this research can be regarded as another contribution of this research. It blends quantitative and qualitative emphases.
First, the operationalization step was carried out. In this step, several open-ended questions were compiled against the research objectives. Mapping the questions in the questionnaire to the research objectives is the process of operationalization [64]. It ensures that the responses collected through the questionnaire are relevant to and aligned with the requirements of the research.
The list of open-ended questions defined for this evaluation is as mentioned below:

1) Have you been notified about the existing verbalization mechanisms?
2) In contrast to those, have you identified any positive capabilities of the proposed mechanism?
3) Do you think it will facilitate the role of ontology increment verification?
4) Can you elaborate on how it will facilitate the role of the inspectors?
5) What are the deficiencies you identified in the proposed verbalization mechanism?
In the pre-warm-up setup, all stakeholders were shown a specially created synoptic video clip about the research. This phase acts as a retrospective and summarizes the significant aspects of the research for all stakeholders involved in the evaluation. It was done before the official commencement of the evaluation process in order to resolve any unclear areas associated with the experimental setup.
In the controlled interview sessions, a series of face-to-face interviews was conducted with nine domain specialists from the criminal law, COVID-19 and aquaculture domains and six ontologists. The five questions listed above were the main basis for controlling the interview sessions with the fifteen stakeholders. All controlled interview sessions were video recorded to facilitate later interpretation. The recordings were made with the prior approval and consent of all participants and were used only for research purposes and not for any personal gain.
During the thematic extraction phase, all recorded interviews were transcribed into text. The transcripts were then examined iteratively, over several rounds, by the research staff involved. All information gathered through this repeated analysis was segregated into a few general themes. At the start of the analysis, new themes emerged at a rapid rate; by the ninth transcript, the same themes were recurring and no new ones appeared. This was recognized as the saturation state of the interview findings.
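The notion of theme saturation can be expressed as a simple check over the coded transcripts: saturation is reached after the last transcript that contributes a theme not seen before. The sketch below is illustrative only; the `saturation_point` function is hypothetical and the toy theme sets are invented for the example, not taken from the interviews in this study.

```python
def saturation_point(coded_transcripts):
    """Return the 1-based index of the last transcript that introduced a new theme."""
    seen = set()
    last_new = 0
    for index, themes in enumerate(coded_transcripts, start=1):
        new = set(themes) - seen
        if new:
            last_new = index
            seen |= new
    return last_new


# Toy data (hypothetical): each set holds the themes coded in one transcript.
toy = [
    {"accuracy", "offline use"},
    {"accuracy", "comprehension"},
    {"offline use"},
    {"comprehension", "accuracy"},
]
print(saturation_point(toy))
```

In this toy example no new theme appears after the second transcript, so the remaining transcripts only repeat themes that are already known.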
The theme extraction made it possible to recognize the most insightful areas of the research. It was not feasible to collect all important opinions through quantitative means alone. Therefore, the qualitative phase, carried out through controlled interview sessions, created the opportunity to recognize significant user insights.
Next, a series of close-ended questions was compiled to gather more insight into the identified themes. This makes it possible to examine specific aspects in detail with a numerical emphasis. A special rating grid was used to extract stakeholder opinions, as depicted in Fig. 9.
The following five questions were provided in a close-ended format and the stakeholders were requested to rate them:
1) Natural English verbalization of the technical semantics of the ontology increment is accurate.
2) Acts as a manual / user guide for the ontology increment, enforcing offline usage as well.
3) Boosts comprehension when concepts are verbalized in layman terms.
4) Domain and schema independent, configuration free operation.
5) How would you rate the verbalization assistance provided by the tool support?
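One plausible way to turn such ratings into a single acceptance percentage is to average the ratings per question, scale them by the maximum rating and then average across questions. The sketch below assumes a five-point scale and uses invented toy ratings; the actual rating grid in Fig. 9 and the calculation used in this study may differ, and `acceptance_percent` is a hypothetical helper.

```python
from statistics import mean


def acceptance_percent(ratings, scale_max=5):
    """Average per-question acceptance, expressed as a percentage of scale_max."""
    per_question = [mean(q) / scale_max * 100 for q in ratings.values()]
    return round(mean(per_question), 1)


# Toy ratings (hypothetical): three raters answering three questions on a 1-5 scale.
toy_ratings = {"Q1": [5, 4, 4], "Q2": [4, 4, 3], "Q3": [5, 5, 4]}
print(acceptance_percent(toy_ratings))
```

Averaging per question first gives every question the same weight, even if some questions were answered by fewer stakeholders.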
The themes recovered from Table V include:
• Boosts blended comprehension when visualization and verbalization are integrated during analysis.
• Offline information reviewing facility is important, as the user does not need to stay at a computer screen all the time.
• Acts as a user guide/manual for each version of the ontology increment.
• Natural language (i.e., English) representation of the semantic concepts in a domain specialist friendly manner.
As the final phase of the evaluation process, the iterative framework was applied to reflectively assess the accomplishment of the research objective. The iterative framework [49][50][51] is an established framework for logically evaluating how well a research objective has been accomplished. Its operation is governed by three different but interconnected questions, and reflective evidence must be provided for each. The discussion associated with the iterative framework steps is logged in Table VI. In the quantitative phase of the evaluation, the verbalization prototype was exposed to multiple ontology increments in three different domains. In all those experiments, quantitative metrics were calculated to determine the overall efficacy of the verbalization prototype, and the overall operation yielded successful results.
In the qualitative assessment phase, stakeholders' opinions were thematically assessed, and the refined outcomes are tabulated in Table V.
Both the quantitative and the qualitative evaluation phases, conducted on the criterion of domain/schema independent verbalization, yielded successful outcomes.
Therefore, as the overall final reflection under the iterative framework rationale, it can be concluded that there is a positive and satisfactory link between step-01 and step-02, which reflects the overall efficacy of the verbalization prototype/algorithm resulting from this research.

V. CONCLUSION
Effective synchronization between ontologists and domain specialists is a must for achieving the goals of collaborative ontology engineering. To fulfil this purpose, domain specialists should be closely involved in verifying the knowledge embeddings present in each ontology increment, because constructing an error-free, domain-oriented applied ontology is an iterative and incremental operation. Hence, it is an extremely effective practice to expose the ontology increment for cross-validation of the knowledge embeddings at the end of each iteration.
Verbalization is recognized as a very effective procedure for fulfilling this need, as it is a natural language representation of the internal knowledge embeddings of the ontology of concern.
Usually, a lack of technical literacy in semantic concepts and querying skills acts as a barrier for non-computing domain specialists (e.g., lawyers, medical doctors, business consultants, bankers) to complete their cross-validation role in the collaborative ontology engineering workflow.
However, as discussed in this paper, almost all of the existing verbalizers have many limitations: their operation is confined to one statically attached domain, they require complex configuration phases, or they produce technically complex verbalized results that are not easy to interpret. Therefore, this research focused on addressing those shortcomings of the existing verbalizers and raising their operational efficacy by making them operate in a domain- and schema-independent manner.
In this research, the operational flow of the AIML-based AliceBot is transformed into that of a verbalizer through a newly defined, fully automated algorithm, which is a significant technical contribution of this research. Its operational accuracy was evaluated quantitatively and qualitatively, and both mechanisms yielded an overall acceptance of 82%.
Domain- and schema-independent ontology verbalization with no manual configuration, and fully automated user guide construction for the ontology of concern, are two critical application-level contributions of this research.
As one limitation, it can be concluded that this verbalizer will only work for ontologies with a lexicon-based schematic structure, as the backbone of this prototype is developed on top of a chatterbot's architecture.