Arbitrary Verification of Ontology Increments using Natural Language

In parallel with the growth of practical computing applications, the trend toward collaborative ontology engineering is accelerating. Collaborative ontology engineering requires domain experts and ontologists to work together. However, most domain experts (e.g. lawyers, medical doctors, bankers) are not computer experts. Question and Answer on Linked Data (QALD) is a suggested method that lets noncomputer domain experts engage with ontology increments as they evolve. Existing QALD methods and systems, however, have a number of drawbacks, including significant setup requirements, domain dependence, and user discomfort. This paper therefore presents a new QALD algorithm and QALD system based on First Order Logic (FOL) to address the shortcomings of current QALD mechanisms. The proposed FOL-based QALD mechanism was tested quantitatively and qualitatively over three distinct ontology increments. The experiment yielded an overall acceptance rate of 79 percent across all stakeholders.

Keywords: First order logic; linked data; ontologist; iterative framework


I. INTRODUCTION
Ontology increment verification using QALD is critical for evaluating the correctness and relevance of a given ontology increment. Existing QALD methods, on the other hand, have a number of flaws that will be discussed in this article.
Both domain experts and ontologists must work in unison and with mutual understanding during collaborative ontology engineering [1]. Specialists in particular domains help ontologists by sharing their domain expertise (e.g. COVID-19, Criminal Law, Aquaculture) [23]. Ontologists, in turn, conceive and build ontologies using the information collected from domain experts [24]. The information contained in the resulting ontology thus becomes both human and machine readable [2], which opens up a wide variety of application options.
However, the process of developing an ontology is iterative and incremental [3]. At the conclusion of each cycle, the ontologists generate an ontology increment, which the domain experts must then evaluate. When information is transferred from domain experts to ontologists, misinterpretations and ambiguities can arise and result in cognitive glitches. As a result, the ontologists may not always replicate the precise cognitive interpretations conveyed by the domain experts, because ontologists are not domain experts and domain experts are not ontologists. Consequently, there are many ways for knowledge errors to result in an incorrect schematic conceptualization at the ontology level. This might be hazardous if such ontologies were published directly into the production environment and produced illogical outcomes [4,25]. QALD is a favored method for bridging this knowledge gap between domain experts and ontologists. Effectively built QALD systems may significantly aid domain experts in evaluating ontology increments. As a result, domain experts and ontologists may collaborate to debate and implement necessary improvements to the ontology increment under review [5][6][7].
However, current QALD systems have a number of problems and restrictions that limit their potential. Most of them have a steep technical learning curve that effectively excludes noncomputer domain experts such as bankers, attorneys, medical practitioners, and marketers [8][9][10]. Therefore, this study introduces a new domain specialist-friendly, domain- and schema-independent, configuration-free algorithm to aid the QALD process in a more effective manner.

II. LITERATURE REVIEW
SPARQL or SQWRL querying is a high-level capability that cannot be acquired overnight. Even if that barrier is overcome, an individual cannot construct a valid SPARQL or SQWRL query without first understanding the schematic structure of the corresponding ontology increment. To understand the schematic structure of an ontology increment, one must be familiar with the semantic web's fundamental notions, such as triples, data and object properties, and individuals. All of these are extra and unnecessary costs for domain experts, which may discourage their participation [8][9][10]. However, in collaborative ontology engineering, the domain experts' participation in evaluating ontology increments is critical [2]. Table I summarizes an evaluation of various recently implemented QALD systems and their shortcomings.

| QALD System | Deficiency |
|---|---|
| Schema-Agnostic QALD [5] | Using similarity assessment logic, SPARQL queries are generated based on the contents of the natural language query; accuracy is very low, and the results produced tend to be very ambiguous. |
| Question and Answering on Linked Data (QALD) [11] | The QALD tool is statically associated with a single ontology and cannot work with any other ontology; extremely domain-specific, whereas the requirement is for domain independence. |
| Regular Language to SPARQL questions [12] | Based on the tool's domain-specific rule sets; generates numerous SPARQL queries, even for a single purpose; extensive operational overhead and extremely low precision; intense human interaction throughout the setup process. |

In addition, an assessment of some of the latest existing QALD algorithms was conducted, as depicted in Table II.

TABLE II. EXISTING QALD ALGORITHM ANALYSIS

| QALD Algorithm | Deficiency |
|---|---|
| SPE Algorithm [14] | An extensive domain-specific configuration effort needs the intense involvement of domain experts; it is necessary to create predefined question and response pairings. |
| Conversational Question and Answering (CQA) with BERT [15] | This is accomplished via the use of a sophisticated machine learning model that has been trained on 104 languages. The BERT architecture is not suitable for querying linked data using SPARQL or SQWRL. |
| Visual Genome and Visual Question Answering [16] | This is based on the visual genome dataset and a deep neural network trained on it. As a result, it is statically bound to a domain. |
| QA Optimization pipeline Algorithm [17] | The Frankenstein Framework is the basis for this algorithm. It is intended for determining the optimum QA pipeline from 360 configurations and is not tailored to QALD needs. |
| Template-Based Question and Answering [18] | Defining templates is a time-consuming manual process that needs significant human participation. |

As shown in Tables I and II, the current QALD methods have a number of drawbacks. The primary weaknesses of current QALD methods include the following:
3) Extensive effort required for manual configuration.
4) Inconvenience to the user.
The purpose of this study is to develop a more user-friendly, domain- and schema-independent QALD method that does not require knowledge of SPARQL or SQWRL to verify ontology increments. The method is also free of lengthy manual setup procedures.

III. METHODOLOGY
The research methodology used in this study is shown in Fig. 1.
After numerous brainstorming sessions with ontologists and domain experts, the nature of the issue and the importance of resolving it were justified. Following that, a systematic evaluation of the most recent available tools and algorithms was performed to identify unsolved gaps. Thus, the objective of "need for a more user-friendly QALD method for verifying ontology increments" was created. Finally, the brainstorming findings resulted in the introduction of the following algorithm. As shown in Fig. 2, the suggested method is divided into four distinct stages.
The algorithm's first phase is responsible for extracting information from the associated ontology increment file, which the method requires as its input. The ontology increment file may be in OWL (Web Ontology Language) or RDF (Resource Description Framework) format. Phase I extracts the knowledge and stores it in a relational database. Below is a representation of the pseudocode for Phase I execution.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 7, 2021, www.ijacsa.thesai.org

End While End
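As a rough illustration of what Phase I does, the following Python sketch reads a tiny RDF/XML ontology increment and stores one row per entity in a relational table. It is a minimal sketch under stated assumptions, not the authors' implementation (which is shown only in Fig. 3): the sample ontology, the table name `entity`, the kind labels, and the helper names `local_name` and `extract_to_db` are all hypothetical, and SQLite stands in for whichever RDBMS the system actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical miniature ontology increment in RDF/XML (not taken from the paper).
SAMPLE_OWL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/uni#Lecturer"/>
  <owl:Class rdf:about="http://example.org/uni#Course"/>
  <owl:ObjectProperty rdf:about="http://example.org/uni#teach"/>
  <owl:DatatypeProperty rdf:about="http://example.org/uni#courseName"/>
  <owl:NamedIndividual rdf:about="http://example.org/uni#Lect-1"/>
</rdf:RDF>
"""

# OWL element tag -> kind label stored in the database (labels are assumptions).
KINDS = {
    f"{{{OWL}}}Class": "class",
    f"{{{OWL}}}ObjectProperty": "object_property",
    f"{{{OWL}}}DatatypeProperty": "data_property",
    f"{{{OWL}}}NamedIndividual": "individual",
}


def local_name(iri):
    """Return the fragment after '#' (or after the last '/') of an IRI."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def extract_to_db(owl_text, conn):
    """Walk the top-level RDF/XML elements and store one row per known entity."""
    conn.execute("CREATE TABLE IF NOT EXISTS entity (kind TEXT, name TEXT)")
    root = ET.fromstring(owl_text)
    for element in root:  # iterate until no elements remain
        kind = KINDS.get(element.tag)
        about = element.get(f"{{{RDF}}}about")
        if kind and about:
            conn.execute(
                "INSERT INTO entity VALUES (?, ?)", (kind, local_name(about))
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
extract_to_db(SAMPLE_OWL, conn)
rows = conn.execute("SELECT kind, name FROM entity ORDER BY rowid").fetchall()
print(rows)
```

A real increment would also carry domain, range, and property assertions; those would need further tables, which this sketch omits for brevity.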
The portion of the code snippet associated with the practical implementation of the Phase I pseudocode is depicted in Fig. 3.

Phase II is in charge of creating the FOL (First Order Logic) version of the ontology increment by accessing the relevant knowledge embeddings in the database. This step is completely automated, with no human intervention required. The pseudocode below illustrates the execution of Phase II. The portion of the code snippet associated with the practical implementation of the Phase II pseudocode is depicted in Fig. 4. The resulting FOL version of the ontology increment under inspection, autogenerated entirely by Phase II of the algorithm, is depicted in Fig. 5.
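The idea behind Phase II can be sketched in Python as a function that reads rows from the database and emits one FOL fact per row, using the fact shapes that appear in this paper (Class, oProp with five arguments, dProp with four arguments). This is only an illustrative sketch: the table and column names are hypothetical, the meaning of the fourth dProp argument (taken here to be the domain) is an assumption, and the class predicate is written in lowercase on the assumption that the facts are loaded by a Prolog engine, where predicate names start with a lowercase letter.

```python
import sqlite3


def quote(value):
    """Wrap a value in double quotes, escaping backslashes and embedded quotes."""
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_fol_facts(conn):
    """Turn stored rows into fact strings, one fact per row."""
    facts = []
    for (name,) in conn.execute("SELECT name FROM classes ORDER BY rowid"):
        facts.append(f"class({quote(name)}).")
    for row in conn.execute(
        "SELECT subj, prop, obj, dom, rng FROM object_assertions ORDER BY rowid"
    ):
        facts.append("oProp(" + ", ".join(quote(v) for v in row) + ").")
    for row in conn.execute(
        "SELECT ind, prop, val, dom FROM data_assertions ORDER BY rowid"
    ):
        facts.append("dProp(" + ", ".join(quote(v) for v in row) + ").")
    return facts


# Hypothetical schema and sample rows standing in for the Phase I output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classes (name TEXT);
CREATE TABLE object_assertions (subj TEXT, prop TEXT, obj TEXT, dom TEXT, rng TEXT);
CREATE TABLE data_assertions (ind TEXT, prop TEXT, val TEXT, dom TEXT);
INSERT INTO classes VALUES ('Lecturer'), ('Course');
INSERT INTO object_assertions VALUES ('Lect-1', 'teach', 'Course-1', 'Lecturer', 'Course');
INSERT INTO data_assertions VALUES ('Course-1', 'courseName', 'Financial Accounting', 'Course');
""")

facts = build_fol_facts(conn)
print("\n".join(facts))
```

Writing the returned lines to a text file would give a knowledge base file of the kind the paper shows in Fig. 5, although the exact layout of that file is not reproduced here.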

Phase-II -[Auto-generation of the First Order Logic Knowledge base]
Phase II transforms the RDF/OWL file's semantic components into a series of standardized FOL formats. For instance, the specified standard formats include the following:
1) Classes are converted as := "Class(Class Name)".
6) Object properties are converted as := "oProp(Individual-1, obj_prop_name, Individual-2, Domain, Range)".

The values retrieved and stored in the RDBMS during Phase I are substituted for the corresponding parameters in the standardized FOL rules listed above. These standardized FOL rules, with their parameters replaced, are then used to autogenerate the FOL file shown in Fig. 5. Phase II of the algorithm thus converts the entire contents of the ontology increment into its respective FOL format.

The algorithm's third phase is responsible for locating the FOL components that correspond to the plain language queries. Here, domain experts may ask the necessary queries directly in English, obviating the requirement for SPARQL or SQWRL literacy entirely. The following pseudocode illustrates the execution of Phase III.

Phase-III -[Natural Language Query Mapping]
Start
Accept natural language (i.e. English) user query.
Activate Part of Speech (POS) tagging for the user query classification.
Derive POS sequences for the user query.
Locate the positions of various lexicons and special terms and update flag variables accordingly to classify the user query.
Remove all prepositions, modal verbs, and pronouns, so that only nouns and verbs remain.
Locate the nouns and verbs of concern by executing a verification Prolog query on the generated Prolog knowledge base (i.e. Fig. 5).

The code snippet in Fig. 6 illustrates the categorization and updating of flag variables depending on the Part of Speech (POS) of the user query. This technique can be used to extract particular requests for data or object properties from a natural language user query. Fig. 7 illustrates the updating of POS sequence-specific flag variables and the creation of chained Prolog queries. The chained Prolog query structures are parameterized: an appropriate parameterized chained Prolog query is initiated based on the POS sequences in the natural language user query. At present, the parameterized Prolog query structures mapped to the user query's POS sequences are capable of extracting nearly all object and data property queries.

Phase IV is in charge of processing the output in natural language (i.e. English) and ensuring that it is readily understandable by domain experts. The pseudocode below illustrates the execution of Phase IV.

Start
Use StringBuilder to append all returned results with "\n" as the delimiter.
Split the appended output by the delimiter "\n".
Return the result in natural language (i.e. English).
End

The QALD interface designed is shown in Fig. 8.
The user just has to enter the desired question in English and click the query button. Then, as visible, the relevant findings will be presented in natural language.
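The Phase IV output step described above (append every returned result to a newline-delimited buffer, then split it back into display lines) can be sketched in Python as follows. This is a minimal sketch rather than the system's actual code: the function name `format_results`, the "name: value" line layout, and the sample binding are assumptions, and a plain list joined with "\n" plays the role of the StringBuilder named in the pseudocode.

```python
def format_results(bindings):
    """Join (property name, value) pairs with a newline delimiter, then split
    the buffer back into clean, non-empty display lines."""
    buffer = []  # stands in for the StringBuilder of the pseudocode
    for name, value in bindings:
        buffer.append(f"{name}: {str(value).strip()}")
    joined = "\n".join(buffer)  # "\n" is the delimiter
    return [line for line in joined.split("\n") if line.strip()]


# Hypothetical binding returned by a query about a lecturer.
lines = format_results([("name", "Lecturer One")])
print(lines)
```

An empty result set yields an empty list, so the interface can show a "no results" message instead of a blank line.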
Let us walk through the execution of this algorithm using a simple real-world scenario. Assume you have been given a university ontology increment that includes information about professors who teach various courses. The domain expert may simply ask a question in plain language (i.e. English) to verify that the correct mappings are included in the ontology increment.
For instance, who teaches Financial Accounting?
After the first levels of processing, the English question above is reduced to the structure teach [ VERB ] "Financial Accounting" [ NOUN ]. A simple Prolog verification query on the Phase II-generated 'Prologue.pl' file then validates "teach" as an object property and "Financial Accounting" as a data property. Because "teach" is an object property, it has both a domain and a range, whereas "Financial Accounting" has only a domain. Given this axiomatic difference, object and data properties can readily be distinguished with a simple IF condition. Thus, the extracted knowledge elements may be represented using the specified standard FOL representation rules. For example, "teach" can be represented as:
oProp(Individual-1, "teach", Individual-2, Domain, Range)
Since the question asks for a person, the unnecessary fields of the Prolog query can be replaced with the anonymous variable (_), as shown below.
dProp(Individual, _, "Financial Accounting", _).
oProp(Individual-1, "teach", Individual-2, _, _)
Hence, a chained Prolog query can be formulated as:
"teach",X,_,_),dProp(Y,B,Z,_).
Once the above chained Prolog query is executed against the fact base, the following conclusion can be derived.

dProp("Lect-1",B,Z,_).
Here, B represents the data property names and Z represents the data property values.
The algorithm's fourth phase then formats the returned results as cleaned string outputs, yielding the names of the individuals who teach Financial Accounting.
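The walk-through above can be imitated end to end in a short Python sketch. Several parts are assumptions made for illustration: the fact base is hypothetical, the tiny `LEXICON` is a stand-in for a real POS tagger, and the full chained query evaluated here, dProp(X,_,Value,_), oProp(Y,Verb,X,_,_), dProp(Y,B,Z,_), is a guessed reconstruction, since only the tail of the query appears in the text. Nested loops stand in for the backtracking search a Prolog engine would perform.

```python
# Hypothetical fact base in the oProp/dProp shapes used in the text.
O_PROPS = [
    ("Lect-1", "teach", "Course-1", "Lecturer", "Course"),
    ("Lect-2", "teach", "Course-2", "Lecturer", "Course"),
]
D_PROPS = [
    ("Course-1", "courseName", "Financial Accounting", "Course"),
    ("Course-2", "courseName", "Criminal Law", "Course"),
    ("Lect-1", "name", "Lecturer One", "Lecturer"),
    ("Lect-2", "name", "Lecturer Two", "Lecturer"),
]

# Toy lexicon: token -> (POS tag, lemma). Unknown tokens are treated as nouns.
LEXICON = {"who": ("PRON", "who"), "teaches": ("VERB", "teach")}


def simplify(question):
    """Keep lemmatised verbs and noun phrases; drop every other word class."""
    verbs, nouns, phrase = [], [], []
    for token in question.rstrip("?").split():
        pos, lemma = LEXICON.get(token.lower(), ("NOUN", token))
        if pos == "NOUN":
            phrase.append(token)  # adjacent nouns form one phrase
            continue
        if phrase:
            nouns.append(" ".join(phrase))
            phrase = []
        if pos == "VERB":
            verbs.append(lemma)
    if phrase:
        nouns.append(" ".join(phrase))
    return verbs, nouns


def chained_query(verb, value):
    """Evaluate dProp(X,_,value,_), oProp(Y,verb,X,_,_), dProp(Y,B,Z,_)."""
    answers = []
    for x, _, v, _ in D_PROPS:
        if v != value:
            continue
        for y, prop, obj, _, _ in O_PROPS:
            if prop != verb or obj != x:
                continue
            for ind, b, z, _ in D_PROPS:
                if ind == y:
                    answers.append((y, b, z))
    return answers


verbs, nouns = simplify("Who teaches Financial Accounting?")
answers = chained_query(verbs[0], nouns[0])
print(verbs, nouns, answers)
```

With this sample data the only binding found is for "Lect-1", which matches the shape of the conclusion dProp("Lect-1",B,Z,_) given in the text.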
According to Lampa's comprehensive experiment report [19], Prolog queries are straightforward and close to 10 times faster than SPARQL or SQWRL queries. The primary reason is that the Prolog reasoning engine uses a backtracking search technique to explore the fact base in search of the necessary axioms, whereas SPARQL requires the generation of a parse tree after finding the ontology's schema, which is a highly complicated and resource-intensive operation [20]. As a result, our SPARQL-free natural language interrogation technique represents a significant contribution.

IV. RESULT AND DISCUSSION
The algorithm described above was applied to three distinct ontology increments: one for COVID-19, one for aquaculture, and one for criminal law. The experiment included a total of fifteen stakeholders, of whom nine were domain experts and the remaining six were professional ontologists. The appendix of this article contains snapshots of the three ontology increments used in this experiment. The increments themselves are not described in detail, since they are outside the scope of this article.
The evaluation workflow for the suggested new algorithm is depicted in Fig. 9. This suggested evaluation workflow took into account both quantitative and qualitative perspectives.
Prior to it, the operationalization phase was accomplished. We prepared a list of open-ended questions regarding the study's goals. Operationalization entails matching questionnaire items to the research objective [21]. This guarantees that the questionnaire's questions elicit highly relevant and consistent answers. The following is a collection of open-ended questions that correspond to the assessment's study aim:

1) Have you been informed of the NLI processes that are currently in place?
2) In comparison to them, what are the favorable characteristics of this mechanism that you identified?
3) Do you believe it will ease the ontology increment verification function?
4) Could you expand on how this would ease the inspectors' work?
5) What flaws did you discover in the suggested verbalization mechanism?
Both ontologists and domain specialists were shown a specially created synoptic video clip about the research as part of the pre-warm-up setup. This phase acts as a retrospective, summarizing the major findings of the research performed by the evaluation's stakeholders. This was done prior to the formal commencement of the assessment process in order to clear up any remaining questions about the procedure.
A series of face-to-face interviews with nine domain experts in criminal law, COVID-19, and aquaculture was conducted during the controlled interview session. The five questions outlined above served as the basic foundation for these interviews. All controlled interview sessions were videotaped to aid subsequent analysis. All participants gave their prior consent to the recording, which was used only for research purposes and not for personal benefit.

Fig. 9. Evaluation Workflow.

During the thematic extraction process, all recorded interviews were transcribed into text. The research team then analyzed the transcribed texts iteratively over several rounds. The data gathered over the course of this repeated analysis were categorized into a few broad themes. At the outset, new themes emerged rapidly; however, as the analysis progressed to the ninth transcription, the emergence of new themes slowed significantly, while the same motifs reappeared repeatedly. This pattern was identified as approaching saturation.
The extraction of themes facilitated the identification of the most intriguing aspects of the research. It was difficult to elicit all important views simply via quantitative procedures. As a consequence of the qualitative phase, which was conducted through controlled interview sessions, important user insights were identified.
Following that, we created a series of closed-ended questions to extract more information on the highlighted themes. This enabled us to focus on particular subjects while also quantifying their importance. Fig. 10 illustrates the process of extracting quantitative stakeholder views using a customized rating grid. The items below were given in a closed-ended manner, with respondents invited to rate their level of agreement with the quantitative inspection criteria.
1) SPARQL / SQWRL is not required, and interrogations may be initiated using plain language (i.e. English).
2) Addresses needs for on-demand, ad-hoc knowledge verifications.
3) Operations are domain and schema-independent and configuration-free.
4) The returned findings were accurate.
5) How would you evaluate the tool support's NLI assistance?
The mean answer scores received from the nine domain experts from three distinct domains and the ontologists participating in this research are summarized in Table III. Table IV summarizes the qualitative interpretations elicited during the controlled interview session.

The iterative framework was used as the last stage of the assessment process to keep the emphasis on the research objective. The iterative framework [22] is a well-established method for evaluating how effectively research objectives have been rationally accomplished. Three different but connected questions control the iterative framework's functioning, and each part must provide reflective proof. Table V summarizes the discussion of the iterative framework measurements.

The QALD method was applied to multiple ontology increments in three different domains throughout the quantitative assessment phase. Quantitative metrics were used to evaluate the overall efficacy of the QALD algorithm in each of these tests, and it was apparent that the overall operation was a success. Throughout the qualitative assessment process, the views of stakeholders were analyzed thematically, and the distilled findings are summarized in Table IV. The quantitative and qualitative assessment stages were both completed successfully.
As a consequence of the iterative framework's reasoning, it is feasible to infer that the connection between stages 01 and 02 is positive and acceptable, indicating the effectiveness of the newly suggested QALD algorithm.

V. CONCLUSION
QALD mechanisms are very helpful throughout the collaborative ontology engineering process for performing ad-hoc verifications of the knowledge embeddings contained in the ontology increments. As previously mentioned, mutual understanding between domain experts and ontologists is critical for the successful development of applied ontologies. Domain experts such as medical physicians, attorneys, bankers, and marketers, however, are not computer specialists fluent in SPARQL, SQWRL, or semantic web principles. Thus, establishing English-language-based QALD mechanisms is critical for domain experts to communicate successfully with ontologists and accomplish their assigned tasks.
As previously stated, current QALD methods have a number of flaws, as shown in Tables I and II. This study therefore proposes a new algorithm that addresses those problems by enabling domain- and schema-independent, configuration-free QALD intervention. The proposed method was evaluated in three distinct domains and produced successful results, with an overall acceptance rate of 79% from the involved stakeholders, as shown in Tables III, IV, and V. It may therefore be characterized as a new and important addition to the field of semantic technologies. In the future, we plan to test the algorithm's effectiveness across a variety of other domains and to incorporate chatbot capability to further enhance human-computer interaction.