A Chatbot for Automatic Processing of Learner Concerns in an Online Learning Platform

In this article, we present a chatbot model that automatically responds to learners' concerns on an online training platform. The proposed model relies on an adaptation of the Dice similarity measure to understand learners' concerns. The first phase selects, from a knowledge base of questions pre-established by the teacher, the k questions closest to the one posed by the learner. The second phase selects, among these k candidates, the most appropriate question using a similarity measure built on the concept of domain keywords. Experiments with the chatbot prototype show that it can find adequate answers. When the learner's question matches one of the teacher's questions, the learner is asked whether the identified question is the one he meant. If he answers in the affirmative, the instructions associated with his request are sent to him; otherwise, his concern is forwarded to the human tutor. This hybridization of the chatbot with a human agent enriches the chatbot's initial knowledge base. The results obtained with the domain-keyword concept are encouraging: the comprehension rate of learners' questions is above 50% with the domain-keyword concept, whereas it is below 50% with the Dice measure.

Keywords—Metadata; ontologies; semantic similarity; natural language; semantic web; chatbot


I. INTRODUCTION
Chatbots are interactive virtual characters whose mission is to assist people in high-profile environments. Previous research has shown that this technology seems to have a positive influence on learning [1]. In addition, interactive virtual agents (chatbots) that take on the role of guardian [2] seem to have positive effects on student engagement [3] and on teaching effectiveness [4]. In Côte d'Ivoire's education system, the number of graduates is growing steadily without a corresponding increase in the capacity of higher education institutions [5]. To address this situation, the government has opted to integrate information and communication technologies (ICT) into education by interconnecting the universities and public schools of Côte d'Ivoire [6]. This project was meant to relieve overcrowded university lecture halls through distance learning and to facilitate access to teaching resources. However, the infrastructure of this e-Education project has not been operational since 2015.
In this context, the State has turned to e-learning by creating the Université Virtuelle de Côte d'Ivoire (UVCI) [7]. One of UVCI's missions is to develop distance education in Côte d'Ivoire. This type of teaching relies on a set of platforms that give learners access to learning resources. In UVCI's pedagogical model, the human tutor acts as a supervisor and ensures the pedagogical follow-up of the training. However, physical tutors are slow to respond, and the high number of students per tutor degrades the quality of the training. This sometimes leaves some students feeling abandoned.
To remedy this, we propose a chatbot that handles students' concerns on a permanent basis. The goal is to lighten the workload of teachers and tutors while contributing to the supervision and effective management of students' concerns. In the next section, we describe the role of metadata and ontologies in how chatbots work. We then discuss the mechanism the chatbot uses to understand sentences. Finally, we present the experimentation with the chatbot prototype and its results.

II. LITERATURE REVIEW
Information systems must constantly evolve, so agility is a major requirement for them. Software architectures must therefore provide real flexibility and reusability to adapt to change. Newer software architectures are able to evolve in order to integrate changes in response to the complex integration needs of information systems. It is in this context that the new generation of formal metadata technologies and the Semantic Web, derived from the Service-Oriented Architecture paradigm, aims to address the interoperability issues related to the agility of chatbot systems.
www.ijacsa.thesai.org

A. Semantic Web Technology
The term Semantic Web, attributed to Tim Berners-Lee [8] at the W3C, first refers to a vision of the Web of tomorrow as a vast space in which humans and machines exchange resources, allowing a qualitatively superior exploitation of large volumes of information and varied services. Unlike the Web we know today, this virtual space should relieve users of a large part of the work of searching for, building and combining results, thanks to machines' increased ability to access resources and reason about them. The Semantic Web is structured in layers. These layers correspond to different categories of formalisms grouped into three levels: the naming/addressing level, the syntactic level and the semantic level. Fig. 1 shows this layered architecture of Semantic Web components. Most of the languages standardized by the W3C as part of the Semantic Web, such as RDF and RDFS, have an XML serialization. RDFS provides basic elements for defining ontologies or vocabularies that structure RDF resources. SPARQL is a query language for RDF: just as SQL is used for relational databases, or XPath and XQuery for XML documents, SPARQL is used to retrieve information from RDF data. Building ontologies and metadata requires consensus in order to avoid lexical ambiguities due to hypernymy and polysemy. Metadata is literally data about data, that is, a structured set of information describing a resource [9]. The principle of metadata is to associate a number of fields with a resource and to assign a value to each field. These values may be given in a free format or conform to well-defined data formats. In practice, this is done with tags inserted into files or expressed in appropriate programming languages. Tags make information searches more efficient than full-text searches. It is important to note that tagged digital resources carry their own metadata with them when they are downloaded, copied, replicated or sent by email. This approach promotes interoperability and makes digital resources easier to exploit. Several standardization bodies have proposed and published metadata schemas intended to be used as widely as possible; the main ones are listed below.
 LOM (Learning Object Metadata) [10]
 EAD (Encoded Archival Description) [11]
 Dublin Core [11]
The concept of metadata requires defining a core of standard information together with context-dependent information. This can make metadata difficult to exploit in a learning model. To address this, the metadata schema used in learning models is enhanced with semantic technologies: metadata is associated with knowledge domains that can be conceptualized in ontologies.
Ontologies are a highly reliable and structured source of knowledge. For this reason, and thanks to Semantic Web initiatives that led to the creation of thousands of domain ontologies, ontologies have been widely exploited in knowledge-based systems, and more specifically for computing semantic similarity. An ontology is formally defined as a pair (O, Lex), where O is an abstract ontology and Lex is a lexicon for O [12]. Let L be a logical language with formal semantics in which inference rules can be expressed. An abstract ontology is a structure O = (C, ≤C, R, σ, ≤R, IR); its components, and those of the lexicon Lex, are listed in the formal definition given at the end of this article. Ontologies from different fields support the design of learning systems, including DogOnt, SOUPA (Standard Ontology for Ubiquitous and Pervasive Applications), CoBrA and CoDAMoS [13], [14].
Several languages have been used in the literature to describe ontologies, including the eXtensible Markup Language (XML) [13], the Resource Description Framework (RDF) [15], DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer) [16] and OWL (Web Ontology Language) [14]. These languages offer different levels of expressiveness. Being available to answer questions about distance learning activities related to a training module followed on a teaching platform is far from trivial, especially when the number of learners is large. Hence our idea of integrating a chatbot that collaborates and cooperates with the human tutor.

B. ChatBot
A chatbot is a computer program capable of simulating a conversation with one or more users through voice or text exchanges. It acts as an assistant that answers the questions put to it while imitating human behavior [17]. A virtual guardian agent operates in three stages:
 The learner first sends the questions he would like to address to the agent.
 The agent receives the learner's question.
 The agent analyzes the question by consulting its knowledge base and finally provides an answer to the learner.
Chatbots can be classified into two main categories:
 Virtual recommendation agents: these agents make proposals to users in a virtual environment [18].
 Feedback chatbots: these agents give feedback after the user has performed an activity in a virtual environment [19].
Sassi et al. [18] propose a virtual recommendation agent that assists users in their daily tasks without any explicit request from them. The agent perceives the state of the environment and interacts effectively according to the user's needs.
Joanna's work [19] focuses on feedback chatbots: it proposes a chatbot that gives users feedback after they perform an activity in a virtual environment. Both the chatbot's feedback and its interpretation of user feedback rely on knowledge of the virtual environment. Our analysis of these works shows that none of the proposed chatbots takes the online learning environment into account. In the next section, we present some approaches for comparing texts, which we refer to as measures of similarity between texts. These approaches were selected because they best fit our context; this article does not claim to give an exhaustive list of existing methods, but rather an overview of the methods most used in the context of our study.

C. Similarity Measures Between Sentences
In natural language processing, similarity measurement plays an important role and is one of the fundamental tasks. Automatically understanding a sentence requires several abilities from the web agent: recognizing words and associating them with lexical information (morphological analysis), structuring the sentence with a grammar (parsing), understanding the sentence with semantic rules (semantic analysis), and taking the context into account (pragmatic analysis). Huang [20] showed that syntactic similarity based on the Jaccard index and on the Dice index perform very similarly, and significantly better than the Euclidean distance and the Levenshtein distance. The Levenshtein distance is widely used in linguistics and bioinformatics, as well as for recognizing blocks of text. Unfortunately, its computation time is quadratic when it is applied to two sequences of approximately the same size, which is an obstacle in many practical applications. Christine [21] proposes a method for measuring the semantic similarity between character strings that combines the Levenshtein distance with the Jaccard index. This method has shortcomings when the strings are names made of several words, and it requires a perfect match between each string of the two sets of strings. Hai-Hieu Vu and Jeanne Villaneau [22] proposed another method for measuring the semantic similarity between sentences that uses Wikipedia as its only linguistic resource. It relies on a vector representation and uses random indexing to reduce the dimension of the manipulated spaces. However, their method does not return a precise answer to the user: it returns a Wikipedia article containing elements of an answer, which the user must then analyze to find the answer to his concern. Goutam Majumder and Partha Pakray [23] propose a method for computing the semantic similarity between sentences based on the WordNet taxonomy, which indexes, classifies and relates the semantic and lexical content of the English language. This method is not suited to our context.
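To make the comparison above concrete, here is a minimal Python sketch of the three measures on lemmatized term sets and on strings; the function names are ours and the sketch is not taken from [20] or [21]. For any pair of sets, the Dice coefficient is a monotonic function of the Jaccard index, D = 2J/(1 + J), so both rank candidate sentences in the same order, which may help explain why their performance is so close. The Levenshtein function uses the standard dynamic-programming table, whose cost grows with the product of the two string lengths.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| (0 when both sets are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) (0 when both sets are empty)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def levenshtein(s: str, t: str) -> int:
    """Edit distance by dynamic programming: O(len(s) * len(t)) time."""
    prev = list(range(len(t) + 1))          # distances from "" to prefixes of t
    for i, cs in enumerate(s, start=1):
        curr = [i]                          # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]
```

With two questions of n words each, the Levenshtein table has about n² cells, which is the quadratic cost mentioned above.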
The similarity methods proposed in these research works rely on TF-IDF (term frequency-inverse document frequency), a weighting method used for information retrieval in a corpus. TF-IDF requires preprocessing the corpus to determine the discriminating power of each word, and this preprocessing consumes significant resources and lengthens query processing time. The proposed chatbot model instead adapts the Dice measure, using the concept of domain keywords, to understand learners' concerns. The hybridization of the chatbot with a human agent enriches the chatbot's initial knowledge base.
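As an illustration of the preprocessing cost mentioned above, the following sketch computes the textbook TF-IDF weighting, tf(t, d) · log(N / df(t)), on a toy tokenized corpus. It is a generic example, not the method of any cited work, and smoothed variants are common in practice. The corpus-wide document-frequency pass in `idf_table` is the step that must be redone whenever the corpus (here, the knowledge base) changes.

```python
import math
from collections import Counter


def idf_table(corpus: list[list[str]]) -> dict[str, float]:
    """Corpus-wide preprocessing: idf(t) = log(N / df(t))."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Weight each term of one document by tf(t, d) * idf(t).

    Terms never seen in the corpus get weight 0 (an arbitrary choice here).
    """
    counts = Counter(doc)
    total = len(doc)
    return {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}
```

A rare word such as "devoir" receives a higher weight than a word like "date" that appears in several documents.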

III. MECHANISM USED BY THE CHATBOT TO UNDERSTAND SENTENCES
We propose an adaptation of the Dice measure to compute the similarity between sentences. This approach combines the Dice index with a similarity measure based on domain keywords. We describe the principle of the algorithm and then the process of computing the similarity between sentences.

A. Principle of the Algorithm
 The learner sends a question to the chatbot.
 The chatbot receives the learner's question.
 The chatbot analyzes the learner's question.
 Lemmatization (conversion of words into lemmas).
 Selection of k questions (comparison of the words in common and selection of the questions closest to the learner's question).
 Domain-keyword similarity (search, among the selected questions, for the one semantically closest to the learner's question).
 Proposal to the learner of the semantically closest question.
 The learner confirms whether or not the proposal corresponds to his/her concern.
 If the learner answers "NO", his question is forwarded to a human agent.
 If the learner answers "YES", the chatbot provides the answer to the learner's question.
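The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the prototype (which was written in PHP with MySQL): the stopword list, the lemma table, the example questions and the `Chatbot` class are hypothetical, and the keyword step reuses a Dice score restricted to the domain keywords MC, which is our reading of the method rather than a confirmed formula.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stopword list and lemma table (the paper does not publish its own).
STOPWORDS = {"the", "a", "an", "of", "is", "are", "when", "what", "for", "to", "my"}
LEMMAS = {"exams": "exam", "assignments": "assignment", "dates": "date"}


def analyze(text: str) -> set[str]:
    """Analysis step: tokenize, drop stopwords, replace each word by its lemma."""
    words = text.lower().replace("?", " ").split()
    return {LEMMAS.get(w, w) for w in words if w not in STOPWORDS}


def dice(a: set[str], b: set[str]) -> float:
    """2|A ∩ B| / (|A| + |B|), defined as 0 when both sets are empty."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


@dataclass
class Chatbot:
    faq: dict[str, str]        # knowledge base: teacher question -> answer
    keywords: set[str]         # MC: domain keywords of the course
    k: int = 3
    human_queue: list[str] = field(default_factory=list)

    def propose(self, concern: str) -> Optional[str]:
        terms = analyze(concern)
        # Selection of k questions: rank teacher questions by plain Dice.
        scores = {q: dice(terms, analyze(q)) for q in self.faq}
        ranked = sorted(scores, key=scores.get, reverse=True)[: self.k]
        candidates = [q for q in ranked if scores[q] > 0]
        if not candidates:
            return None
        # Domain-keyword step (assumed form): Dice restricted to MC.
        kw = terms & self.keywords
        return max(candidates, key=lambda q: dice(kw, analyze(q) & self.keywords))

    def handle(self, concern: str, confirm: Callable[[str], bool]) -> Optional[str]:
        """Propose a question; YES returns its answer, NO escalates to the human agent."""
        question = self.propose(concern)
        if question is not None and confirm(question):
            return self.faq[question]
        self.human_queue.append(concern)
        return None
```

The `confirm` callback stands in for the learner's YES/NO reply, so the same class can be driven by a web form or by a test.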

B. Calculation of the Similarity between Sentences
The similarity between sentences is computed in the following steps. Phrase labeling: this step processes every sentence in the corpus (see Fig. 2) and converts each of its terms into a lemma. Lemmatization consists of finding the root of inflected verbs and bringing plural and/or feminine words back to the masculine singular form (see Fig. 3).
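A minimal sketch of dictionary-based lemmatization, assuming a hand-written lookup table (`LEMMA_TABLE`, hypothetical) that maps a few inflected French forms to their lemma; a real system would rely on a complete morphological lexicon. Unknown words are left unchanged.

```python
import re

# Hypothetical, hand-written lemma table (inflected form -> canonical form).
LEMMA_TABLE = {
    "examens": "examen",      # plural -> singular
    "inscrite": "inscrit",    # feminine -> masculine
    "inscrites": "inscrit",   # feminine plural -> masculine singular
    "rendons": "rendre",      # conjugated verb -> infinitive
    "rendu": "rendre",        # past participle -> infinitive
}


def tokenize(sentence: str) -> list[str]:
    """Lowercase and keep runs of letters (accented letters included, digits dropped)."""
    return re.findall(r"[^\W\d_]+", sentence.lower())


def lemmatize(sentence: str) -> list[str]:
    """Replace every token by its lemma; unknown words are kept unchanged."""
    return [LEMMA_TABLE.get(tok, tok) for tok in tokenize(sentence)]
```

For example, "Quand rendons-nous les examens ?" becomes ["quand", "rendre", "nous", "les", "examen"].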
Selection of k questions: a first similarity measure selects the k questions closest to the learner's concern (see Fig. 4). It is based on the Dice measure, which computes the similarity between two sentences $Q_E$ and $Q_S$ from the number of terms they have in common (see Fig. 4):
$$\mathrm{Dice}(Q_E, Q_S) = \frac{2\,|Q_E \cap Q_S|}{|Q_E| + |Q_S|}$$
where $Q_E$ is the set of terms of the student's question, $|Q_E|$ is the number of terms of the student's question after lemmatization, and $Q_S$ is the set of terms of the teacher's s-th question in the knowledge base. Analyzing the terms common to $Q_E$ and $Q_S$ makes it possible to retain the k questions $Q_S$ closest to $Q_E$. A similarity method based on the domain keywords then retains the single question $Q_S$ closest to $Q_E$ (see Fig. 5).
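$S_S$ is the number of lemmatized terms of the teacher's s-th question that belong to both $Q_S$ and MC, i.e. $S_S = |Q_S \cap MC|$, and likewise $S_E = |Q_E \cap MC|$ for the student's question. The domain-keyword similarity is then computed from the keywords that the two questions have in common:
$$\mathrm{Sim}_{MC}(Q_E, Q_S) = \frac{2\,|Q_E \cap Q_S \cap MC|}{S_E + S_S}$$

Lemmatization:
Lemmatization refers to the lexical analysis of the content of a text that groups together words of the same family. Each word of the content is thus reduced to an entity called a lemma (its canonical form).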

Stopword:
A stopword is a word that carries no significant meaning in a text, as opposed to a full (content) word. The significance of a word is evaluated from its distribution, in the statistical sense, across a collection of texts.
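The sketch below shows both views of stopwords described above, under assumptions: a fixed, hypothetical French stopword list, and a distributional heuristic that flags words appearing in at least a given share of the documents (the 0.8 threshold is an arbitrary illustration, not a value from the paper).

```python
from collections import Counter

# Hypothetical French stopword list; a deployed system would use a fuller one.
STOPWORDS = frozenset({"le", "la", "les", "l", "de", "du", "des", "un", "une",
                       "et", "à", "est", "je", "nous", "quand", "quel", "quelle"})


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Keep only full (content) words, preserving their order."""
    return [t for t in tokens if t not in STOPWORDS]


def candidate_stopwords(corpus: list[list[str]], min_doc_ratio: float = 0.8) -> set[str]:
    """Distributional view: words present in most documents carry little meaning."""
    df = Counter(t for doc in corpus for t in set(doc))
    return {t for t, n in df.items() if n / len(corpus) >= min_doc_ratio}
```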

Concern in the adapted format:
The learner's concern is converted into a format that allows the chatbot to understand it.

Knowledge base:
The knowledge base gathers knowledge specific to the field of the Université Virtuelle de Côte d'Ivoire in a form usable by the chatbot. It contains rules for structuring the data.
Selection of k questions in the knowledge base: this module searches the knowledge base for the teacher's questions that are close to the learner's concern and retains the k questions with the highest scores.
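A minimal sketch of this module, assuming that each teacher question is stored as a set of lemmatized terms keyed by a hypothetical identifier. `dice` implements the Dice measure 2|Q_E ∩ Q_S| / (|Q_E| + |Q_S|), and `select_k` keeps the k best-scoring questions, discarding those that share no term with the learner's question.

```python
def dice(q_e: set[str], q_s: set[str]) -> float:
    """Dice similarity 2|Q_E ∩ Q_S| / (|Q_E| + |Q_S|), 0 when both are empty."""
    if not q_e and not q_s:
        return 0.0
    return 2 * len(q_e & q_s) / (len(q_e) + len(q_s))


def select_k(q_e: set[str], knowledge_base: dict[str, set[str]],
             k: int) -> list[tuple[str, float]]:
    """Score every teacher question and keep the k best (stable sort keeps ties in order)."""
    scored = [(qid, dice(q_e, terms)) for qid, terms in knowledge_base.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [pair for pair in scored[:k] if pair[1] > 0]
```

Scanning the whole base is linear in its size, which is acceptable for a course-level FAQ; an inverted index would be a natural next step for larger bases.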

Keyword concept of the domain to select the question closest to the learner's preoccupation:
Once the k closest questions are selected, we apply a similarity measure based on domain keywords. This step selects the teacher's question closest to the learner's concern.
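The sketch below illustrates this second phase under an explicit assumption: the keyword score is taken to be a Dice measure computed only on terms that belong to the course keyword set MC, that is 2|Q_E ∩ Q_S ∩ MC| / (S_E + S_S) with S_E = |Q_E ∩ MC| and S_S = |Q_S ∩ MC|. The example question sets and keywords are hypothetical; they show a case where plain Dice would favor a question about the wrong course, while the keyword score picks the one about the right topic.

```python
from typing import Optional


def keyword_similarity(q_e: set[str], q_s: set[str], mc: set[str]) -> float:
    """Dice computed on domain keywords only.

    ASSUMPTION: a Dice-style reading of the S_E / S_S definitions,
    not a confirmed formula from the paper.
    """
    s_e = q_e & mc                      # keywords of the learner's question
    s_s = q_s & mc                      # keywords of the teacher's question
    if not s_e and not s_s:
        return 0.0
    return 2 * len(s_e & s_s) / (len(s_e) + len(s_s))


def best_question(q_e: set[str], candidates: dict[str, set[str]],
                  mc: set[str]) -> Optional[str]:
    """Among the k preselected questions, return the one with the highest keyword score."""
    if not candidates:
        return None
    return max(candidates, key=lambda qid: keyword_similarity(q_e, candidates[qid], mc))


q_e = {"comment", "valider", "module", "algorithmique"}   # learner: validating the algorithmics module
candidates = {
    "q1": {"comment", "valider", "module", "anglais"},     # about the English module
    "q2": {"valider", "algorithmique", "examen"},          # about the algorithmics exam
}
mc = {"algorithmique", "anglais", "examen"}                # hypothetical course keywords
print(best_question(q_e, candidates, mc))                  # prints q2
```

Plain Dice scores q1 at 0.75 and q2 at about 0.57 because q1 shares more generic words; restricting the comparison to MC reverses the ranking.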
Proposal of the question closest to the request: this module proposes the teacher's question to the learner, who is asked to confirm it. If he answers YES, the answer associated with the teacher's question is returned to him; if he answers NO, his concern is sent to a human agent for processing.
Human agent: when the chatbot does not have the answer to the learner's concern, the concern is sent to the human agent, who analyzes it and returns the appropriate answer. The human agent's responses enrich the knowledge base. Fig. 6 shows the overall operation of the chatbot and its hybridization with the human agent to enrich the knowledge base.
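A minimal sketch of this hybrid loop, assuming an in-memory store (the `KnowledgeBase` class and its `escalate` and `resolve` methods are hypothetical; the prototype used MySQL): an unanswered concern is queued for the human agent, and once answered it is stored as a new question-answer pair that later queries can match.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Teacher questions and answers; the human agent's replies are appended to it."""
    entries: dict[str, str] = field(default_factory=dict)   # question -> answer
    pending: list[str] = field(default_factory=list)         # concerns awaiting a human

    def escalate(self, concern: str) -> None:
        """Learner said NO (or nothing matched): queue the concern for the human tutor."""
        self.pending.append(concern)

    def resolve(self, concern: str, answer: str) -> None:
        """Human tutor answers; the pair becomes a new entry, enriching the base.

        Raises ValueError if the concern was never escalated.
        """
        self.pending.remove(concern)
        self.entries[concern] = answer
```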

V. EXPERIMENTATION
The experiment concerns the overall operation of the chatbot prototype. The learner logs in to his workspace (Fig. 7) and submits his concern to the chatbot. When he presses the Enter key or clicks the Submit button, his concern is converted into a query (Fig. 8). Stopwords are then removed and the remaining terms are lemmatized.
The resulting query is analyzed to retrieve the teacher's questions close to the learner's question, and a second treatment then finds the teacher's question closest to it. The selected question is sent to the view and displayed to the learner (Fig. 9). The learner's reply is then analyzed, and the instructions returned depend on it:
 Yes: the appropriate instructions are sent (Fig. 9).
 No: the learner's concern is sent to a human tutor for analysis (Fig. 10).
The experiment was run on a computer with a Core i7 processor, 12 GB of RAM and a 1 TB hard drive, using object-oriented PHP and the MySQL database management system. A welcome message is displayed first; the learner then types his concern into the input field and clicks the Submit button to send it to the chatbot. As shown in Fig. 8, submitting the concern triggers the processing of the request, after which the chatbot offers a candidate answer that the learner can confirm. The item returned depends on this confirmation: if the learner answers YES, he receives the answer adapted to his concern (Fig. 9); if he answers NO, the concern is forwarded to the human agent, who provides an appropriate answer (Fig. 10). These figures illustrate the process used to respond to the learner. When the learner's question corresponds to one of the teacher's questions, the learner is asked whether the identified question is the one he meant. If he answers yes, the instructions associated with his request are sent to him; if not, his concern is sent to the human tutor.
Experiments with the chatbot prototype show that applying our semantic similarity method yields adequate answers to the queries submitted by learners.

A. Evaluating the Performance of the Chatbot Prototype
This assessment measures how well the chatbot understands learners' concerns. Table I gives the comprehension rate as a function of the number of questions asked by the learners.
NQ: number of questions
MT: method based on the Dice index and the concept of domain keywords
DICE: Dice measure
CMC: concept based on domain keywords
TC-DICE: comprehension rate of the questions with the Dice index, as a function of the number of questions
TC-CMC: comprehension rate of the questions with the domain-keyword concept, as a function of the number of questions
Fig. 11 plots the comprehension rate of the questions obtained with the Dice index and with the domain-keyword concept (an improvement of the Dice measure).

B. Results
Fig. 11 shows that the comprehension rate of the learners' questions is below 50% with the Dice index method, and that it decreases as the number of questions increases. In contrast, the comprehension rate is above 50% with the domain-keyword concept, which is an improvement of the Dice index. The results obtained with the domain-keyword concept are therefore encouraging.

VI. CONCLUSION
In this paper, we presented work based on similarity measures that enables a chatbot to provide adequate responses when interacting with learners. We described the steps for implementing the prototype of the proposed chatbot, which is an adaptation of the Dice index, as well as its overall operation and the process used to address the learner's concern. Future work consists of integrating the chatbot into the teaching at the Université Virtuelle de Côte d'Ivoire and evaluating learner satisfaction.
Formal definition of an abstract ontology and its lexicon [12]. The abstract ontology O = (C, ≤C, R, σ, ≤R, IR) consists of:
 Two disjoint sets C and R whose elements are called concepts and relations, respectively;
 A partial order ≤C on C, called the concept hierarchy or taxonomy;
 A function σ: R → C × C, called the signature;
 A partial order ≤R on R, called the relation hierarchy, where r1 ≤R r2 implies σ(r1) ≤C×C σ(r2), for r1, r2 ∈ R;
 A set IR of inference rules expressed in the logical language L;
 The function dom: R → C, with dom(r) = Π1(σ(r)), which returns the domain of r;
 The function range: R → C, with range(r) = Π2(σ(r)), which returns the range of r.
A lexicon for an abstract ontology O = (C, ≤C, R, σ, ≤R, IR) is a structure Lex := (SC, SR, RefC, RefR) consisting of:
 Two sets SC and SR whose elements are called signs for concepts and relations, respectively;
 Two relations RefC ⊆ SC × C and RefR ⊆ SR × R, called lexical reference assignments for concepts and relations, respectively.
From RefC we define, for all s ∈ SC, RefC(s) = {c ∈ C | (s, c) ∈ RefC}, and RefC⁻¹
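To make the definition concrete, the following sketch encodes an abstract ontology and the RefC part of its lexicon in Python. The class and function names and the example concepts (Resource, Module, Learner) are hypothetical; the concept hierarchy is given by its generating pairs, and ≤C is evaluated as their reflexive-transitive closure, assuming the pairs contain no cycle. The relation hierarchy ≤R and the inference rules IR are omitted.

```python
from dataclasses import dataclass


@dataclass
class AbstractOntology:
    concepts: set[str]                        # C
    relations: set[str]                       # R
    signature: dict[str, tuple[str, str]]     # σ: R -> C × C
    hierarchy: set[tuple[str, str]]           # generating pairs (c1, c2) of c1 ≤C c2

    def __post_init__(self) -> None:
        if self.concepts & self.relations:
            raise ValueError("C and R must be disjoint")

    def dom(self, r: str) -> str:
        """dom(r) = Π1(σ(r)): the domain of relation r."""
        return self.signature[r][0]

    def range(self, r: str) -> str:
        """range(r) = Π2(σ(r)): the range of relation r."""
        return self.signature[r][1]

    def leq(self, c1: str, c2: str) -> bool:
        """c1 ≤C c2 under the reflexive-transitive closure (assumes no cycles)."""
        if c1 == c2:
            return True
        return any(self.leq(parent, c2) for child, parent in self.hierarchy if child == c1)


def ref_c(ref: set[tuple[str, str]], sign: str) -> set[str]:
    """RefC(s) = {c ∈ C | (s, c) ∈ RefC}: the concepts a sign can refer to."""
    return {c for s, c in ref if s == sign}


onto = AbstractOntology(
    concepts={"Resource", "Module", "Learner"},
    relations={"follows"},
    signature={"follows": ("Learner", "Module")},
    hierarchy={("Module", "Resource")},
)
lexicon_ref_c = {("cours", "Module"), ("module", "Module"), ("apprenant", "Learner")}
```

Here the French signs "cours" and "module" both refer to the concept Module, which is the kind of synonymy the lexicon layer is meant to capture.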

Fig. 4. Common terms analysis process.
Similarity measure built on the concept of domain keywords: this method finds the question $Q_S$ closest to $Q_E$ by integrating the principle of domain keywords. MC is the set of keywords of the teacher's course, $Q_E$ is the set of terms of the student's question, and $S_E$ is the number of lemmatized terms of the student's question that belong to both $Q_E$ and MC, i.e. $S_E = |Q_E \cap MC|$.

Fig. 7. Window allowing the learner to submit his concern to the chatbot.


Fig. 8. Suggested question after the analysis of the learner's concern.

Fig. 10. The proposed question does not match the learner's concern.
The tests apply the Dice index method and the domain-keyword concept (the improvement of Dice) to a series of learners' concerns, in order to compute the rate at which the chatbot understands them. With the Dice index, the comprehension rate of a question is the ratio of the number of the learner's terms understood by the chatbot to the number of terms in the learner's question:
$$TC = \frac{TEC}{TE} \tag{7}$$
where TE is the number of terms in the learner's question, TEC is the number of the learner's terms understood, and TC is the comprehension rate. For example, with TEC = 2 and TE = 5, TC = 2/5 = 40%.
With the domain-keyword concept (an improvement of the Dice measure), the comprehension rate is the ratio of the number of the teacher's terms understood by the chatbot to the number of terms in the teacher's question:
$$TC = \frac{TSC}{TS} \tag{8}$$
where TS is the number of terms of the teacher's question that corresponds to TE, TSC is the number of the teacher's terms understood, and TC is the comprehension rate. For example, with TSC = 3 and TS = 7, TC = 3/7 ≈ 43%.

Fig. 11. Comprehension rate of the questions based on the Dice measure and on the concept of domain keywords.

TABLE I. RATE OF UNDERSTANDING OF THE QUESTIONS BASED ON THE DICE INDEX AND THE CONCEPT OF DOMAIN KEYWORDS