State of the Art in Intent Detection and Slot Filling for Question Answering System: A Systematic Literature Review

Abstract: A Question Answering System (QAS), also known as a chatbot, is a Natural Language Processing (NLP) application that automatically provides accurate responses to questions posed by humans in natural language. Intent Detection and Classification are crucial elements in NLP, especially in task-oriented dialogue systems. In this paper, we conduct a systematic literature review that compares the techniques and algorithms being implemented for intent detection and classification with slot filling. The goals of this paper are to identify the distribution, methodology, techniques or algorithms, and evaluation methods that can be used to develop a model of intent detection and classification with slot filling. The review covers academic documents published from 2019 to 2023, selected through a four-step process of identification, screening, eligibility, and inclusion. To examine these documents, a systematic review was conducted and four main research questions were answered. The results discuss the methodology that can be used to implement intent detection and classification with slot filling, along with the techniques, algorithms, and evaluation methods that are widely used by other researchers.


I. INTRODUCTION
A Question Answering System (QAS) draws on a collection of natural language texts or a pre-structured database to automatically provide correct and accurate responses to questions posed by humans in natural language [1,2]. A QAS is more capable and more efficient in answering the user query than most search engines such as Google, Yahoo, Live.com, Ask, YouTube, Facebook, and Microsoft Bing [3].
Search engines have a remarkable capability; however, they return lists of related websites and resources, including documents that match the user's query [2], without regard to the user's real intention or the question actually being asked. Hence, instead of the user having to search through the results for the one most relevant to their query, a QAS displays only the necessary information by detecting, recognizing, and classifying the intents of the user from their natural language, and thus provides a corresponding and accurate response directly [2,4].
Natural Language Processing (NLP) is one of the branches of Artificial Intelligence (AI) that studies human-computer interactions [5]. NLP has two subsets, which are Natural Language Understanding (NLU) and Natural Language Generation (NLG). Detecting the intent of an utterance has been a central issue in NLU [6], since NLU uses syntactic and semantic analysis of text and speech to determine the meaning of a sentence.
Intent Detection (ID) and Classification (IC) are critical elements in NLU, especially in task-oriented dialogue systems [1,6,7,8]. In particular, ID is usually related to the keywords of an utterance, its entities or slots, and its intent. ID mainly aims to identify the user's true intent from a given utterance [8], although, according to [9], the user's true intention does not always determine the meaning of their utterance, and vice versa.
Slot filling extracts the arguments associated with an utterance, such that every word in the utterance is assigned an entity or slot label. The most common representation is the IOB representation [10], where "B" marks the beginning of a slot, "I" marks a continuation of the slot opened by the preceding "B", and "O" is a null label that commonly marks the other general words in an utterance. The slots or entities are defined by the researchers, hence they are meaningful and easy for humans to understand.
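The IOB scheme described above can be illustrated with a minimal Python sketch. The utterance, the slot names (author, genre), and the helper extract_slots are hypothetical and chosen only for illustration; they are not taken from any of the reviewed papers.

```python
from typing import List, Optional, Tuple


def extract_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (slot_name, slot_text) pairs."""
    slots: List[Tuple[str, str]] = []
    current_name: Optional[str] = None
    current_words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A "B-" tag always opens a new slot, closing any open one.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_name == tag[2:]:
            # An "I-" tag extends the slot opened by the preceding "B-" tag.
            current_words.append(token)
        else:
            # "O" (or a stray "I-" tag) ends any open slot.
            if current_name is not None:
                slots.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
    if current_name is not None:
        slots.append((current_name, " ".join(current_words)))
    return slots


# Hypothetical utterance and slot labels, for illustration only.
tokens = ["find", "books", "by", "jane", "austen", "about", "romance"]
tags = ["O", "O", "O", "B-author", "I-author", "O", "B-genre"]
slots = extract_slots(tokens, tags)
print(slots)  # [('author', 'jane austen'), ('genre', 'romance')]
```

Every token carries exactly one label, so the tag sequence has the same length as the utterance, which is what allows slot filling to be treated as a sequence labelling task.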
The intent detected by the ID model is another crucial part that defines the context of the text: intents classify and capture the true intention of the user, that is, what they are trying to search for or convey [11] to the application or system. There are numerous possible intents, depending on the utterance. For an utterance related to books, for example, possible intents include SearchBook, PurchaseBook, BookAuthor, BookGenre, and so on. Many researchers have investigated intent recognition in the English language with very good accuracy. However, to the best of our knowledge, there has been little work done on the Malay language for question answering, with exceptions such as [12].
www.ijacsa.thesai.org
Within this paper, several methods and methodologies for developing ID and IC with a slot-filling model are identified. For instance, [11] has performed several experiments on ID and IC using various techniques such as Seq2seq, Slot-Gated, Capsule NLU, SF-ID, StackPropagation, SlotRefine, GL-GIN, and Joint-BERT. In addition, a model of ID and IC with slot filling called JointIDSF has been proposed, which combines JointBERT with a Conditional Random Field (CRF) and uses both XLM-R and PhoBERT as utterance encoders [1]. This paper will contribute to better understanding, and thus to developing and implementing, the methods of ID and IC with slot filling in various fields.
In addition, we review the existing literature on the development and implementation of ID and IC with slot filling. Our main objective is to explore which methods or techniques are the most suitable and can achieve higher accuracy in terms of IC performance. The following are the contributions of our review.
• The previous research that has been published regarding ID and IC with slot filling is investigated.
• The methodologies or approaches proposed by previous research are identified and discussed.
• The techniques or algorithms that can be implemented on ID and IC with slot filling are identified.
• A discussion on the validation process and main results of the previous research is conducted.
On the other hand, [8,15,17,20,24,26,28,31,33] have integrated Bidirectional Encoder Representations from Transformers (BERT) with various other techniques such as CRF, LSTM, BiLSTM, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), or Regular Expressions (RE). On top of that, there are also published papers with comprehensive reviews [50-51, 53, 55]. The aim of [50,51] is to conduct a literature review of intents, intention mining, and IC, focusing on the algorithms, models, and tools that have been implemented in intention mining. A review of ID methods in human-machine dialogue systems seeks to advance the study of multi-intent detection methods based on Recurrent Neural Networks [52] and deep neural networks [53]. It primarily analyses, compares, and summarizes the deep learning methods used in ID research in recent years, and also considers how to apply deep learning models to multi-intent detection tasks. The third paper [55] introduces the methods for the two tasks, ranging from independent models to joint models. It focuses on joint modelling methods based on deep neural networks and analyzes the current problems and future development trends of the two sub-tasks.
Therefore, one of the aims of this paper is to conduct a systematic literature review of ID and IC with slot filling that does not focus on implementation, but instead investigates and discusses in depth the frameworks, methodologies, and techniques or algorithms that can be implemented for ID and IC with slot filling. Furthermore, our review differs from [51,53,55], which review intention mining, ID methods in human-machine dialogue systems, and methods for the two tasks from independent to joint models, respectively.
This paper is organized as follows: Section II presents the systematic review methodology, which consists of the definition of research questions, search phases, inclusion and exclusion criteria, and paper eligibility screening. Section III presents the results of the systematic review, consisting of answers to the research questions. Finally, Section IV presents the conclusion of the systematic review and our thoughts on directions for future work.

II. RESEARCH METHOD
The systematic review (SR) in this paper was conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach as done in [13]. PRISMA is an evidence-based minimum set of items used to guide the development and structure of SRs and other meta-analyses. According to [14], PRISMA is designed to help researchers perform literature reviews systematically and transparently, and report how the review was done in a manner that leads to the findings. By implementing the PRISMA approach in our paper, the reviewing protocol includes three steps, namely definition of the research questions, search phases, and specification of inclusion and exclusion criteria. These steps are described in the following sections.

A. Definition of Research Question
This SR is organized to encompass the range of research examined by classifying and evaluating previous related articles. To correctly explain the coverage rate of existing works, the research questions must first be defined. We can gain various insights by examining comparable works, which can subsequently assist researchers in coming up with new insights. Table I lists the research questions that were considered in our SR.

B. Search Phase
Defining the information sources is the initial stage in conducting our SR. As shown in Table II, several academic databases, digital libraries, and open-access search engines were consulted. The following stage entails creating procedures for examining the scientific and technical documentation that these searches produced, in order to locate publications that are pertinent to our setting. The process is built around two steps: (i) the search phrases are first determined from the research questions in order to create a list of keywords; (ii) the queries are then created by combining these keywords with the Boolean operators AND/OR, in order to find and gather all results related to ID and IC with slot filling. A total of 221 papers were found in the first phase with search terms that might be most pertinent in the title. The search terms used for this paper are shown in Table III.

TABLE I. RESEARCH QUESTIONS
RQ1. What is the distribution per year, domain application, and publisher of the published papers related to ID and IC with slot filling?
The answer to this question allows us to identify the work's domain, and when and where the research studies have been conducted.
RQ2. What is the methodology used to develop ID and IC with slot filling?
The answer to this question illustrates the steps and phases in developing ID and IC with slot filling.
RQ3. What are the techniques or algorithms that can be used to implement ID and IC with slot filling?
The answer to this question helps to identify the most suitable techniques or algorithms that can be adopted into the implementation of ID and IC with slot filling.
RQ4. Which evaluation method was used and what are the main results that have been drawn based on the evaluation method used?
The answer to this question identifies the methods used to evaluate the performance of ID and IC with slot filling and presents the main results or outcomes of the studied works.

C. Inclusion and Exclusion Criteria
We employed a set of inclusion criteria and exclusion criteria to identify pertinent papers and to narrow search results (see Table IV). Papers that do not meet the inclusion criteria are disregarded, and a screening procedure is used to identify papers that are pertinent to our setting. The screening procedure consists of the following three steps:
1) Abstract-based step: Using details and keywords from publication abstracts, we eliminate irrelevant results. Articles were kept for additional screening if their abstracts met at least 60% of the inclusion criteria.
2) Full-text-based step: We eliminate any results that did not address or make reference to the research terms listed in Table III, i.e., any publications that only cover a small portion of the search terms mentioned in their abstracts.
3) Quality-analysis-based step: We perform quality analysis on the remaining results and discard any that do not meet the requirements listed below:
• C1: The paper discusses a comprehensive approach and methodology to ID and IC with slot filling.
• C2: The paper includes the technical implementation of the proposed solution.
• C3: The paper references additional works.
• C4: The paper discusses the outcomes that were found.

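The abstract-based and quality-analysis-based steps above can be read as two simple decision rules. The sketch below is only an illustration under stated assumptions: the function names, the dictionary layout, and the example inclusion-criterion labels are hypothetical, and the review does not describe any tooling used for screening.

```python
from typing import Dict


def passes_abstract_step(inclusion_flags: Dict[str, bool], threshold: float = 0.60) -> bool:
    """Keep a paper if its abstract meets at least `threshold` of the inclusion criteria."""
    met = sum(inclusion_flags.values())
    return met / len(inclusion_flags) >= threshold


def passes_quality_step(quality_flags: Dict[str, bool]) -> bool:
    """Keep a paper only if all of the requirements C1 to C4 are met."""
    return all(quality_flags.get(c, False) for c in ("C1", "C2", "C3", "C4"))


# Hypothetical paper: criterion labels and values are made up for illustration.
abstract_check = {"published": True, "answers_rqs": True, "found_by_search": False}
quality_check = {"C1": True, "C2": True, "C3": True, "C4": False}

kept_after_abstract = passes_abstract_step(abstract_check)  # 2 of 3 criteria met
kept_after_quality = passes_quality_step(quality_check)     # C4 is not met
print(kept_after_abstract, kept_after_quality)  # True False
```

With three inclusion criteria, the 60% rule amounts to requiring at least two of them, whereas the quality step is stricter because a single unmet requirement discards the paper.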
D. Systematic Search Strategy Procedure
The PRISMA systematic search-strategy procedure comprises four core processes (identification, screening, eligibility, and inclusion), which were used to choose the pertinent publications for this review. The identification procedure initially yielded 221 records (n=221). After the screening procedure, in which duplicates were eliminated, the results were pared down to 149 (n=149). These were next subjected to eligibility criteria based on the title and abstract, leaving 69 (n=69); finally, eligibility criteria based on the complete text left 25 pertinent studies. A thorough analysis of these 25 studies was conducted to obtain the findings reported in the next section. All 25 papers discuss ID and IC with slot filling using various techniques or algorithms. Fig. 1 illustrates the number of papers at each stage of the SR process.
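The record counts reported for the four PRISMA stages imply how many records were removed at each transition. The short sketch below only performs that arithmetic on the reported figures (221, 149, 69, 25); the stage labels and variable names are our own.

```python
# Record counts per PRISMA stage, as reported in this section.
stages = [
    ("identification", 221),
    ("screening (duplicates removed)", 149),
    ("eligibility (title and abstract)", 69),
    ("inclusion (full text)", 25),
]

# Number of records removed when moving from one stage to the next.
removed = [
    (stages[i + 1][0], stages[i][1] - stages[i + 1][1])
    for i in range(len(stages) - 1)
]
for stage_name, count in removed:
    print(f"removed at {stage_name}: {count}")  # 72, 80, 44

# Share of the initially identified records that reached the final review.
retention = stages[-1][1] / stages[0][1]
print(f"retained: {retention:.1%}")  # retained: 11.3%
```

In other words, 72 records were dropped as duplicates, 80 on the basis of title and abstract, and 44 on the basis of the full text.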

III. RESULTS AND DISCUSSION
This section discusses the review's findings with respect to the research questions proposed earlier. The review is made up of 25 publications that were carefully chosen to address the topic of ID and IC with slot filling. The answers help us to know the related recent literature, the methodologies used, the techniques or algorithms that can be used to implement ID and IC with slot filling, as well as the methods that can be used to evaluate the performance of the model.
Answer to research question RQ1: What is the distribution per year, domain application, and publisher of the published papers related to ID and IC with slot filling?
The papers that have been gathered and which relate to ID and IC with slot filling originated from several domains, including medical, education, electrical, music, economy, and airline. Fig. 2 and Fig. 3 display the distribution of the selected papers by publication year and source respectively.
Answer to research question RQ2: What is the methodology used to develop ID and IC with slot filling?
Numerous methodologies were used for developing ID and IC with slot filling. Table V displays the list of methodology names and the phases that are involved in developing ID and IC with slot filling. After reviewing and analyzing all of the papers, we identified that the majority of the papers integrate several methods, resulting in a new methodology or approach. Papers 1, 22, and 25 have integrated Bidirectional Long Short-Term Memory (BiLSTM), Bidirectional Encoder Representations from Transformers (BERT), and Joint BERT with Conditional Random Field (CRF) respectively. In addition, since ID and IC with slot filling have been implemented in many different ways, almost all of the papers' implementations differ from one another, even though they might seem similar.
The second paper implements a novel non-aggressive joint model, and the fourth paper implements a GloVe approach. Some papers implement a unique methodology or approach: paper 9 implements a Capsule Network or Capsule-NLU, paper 13 implements a Multi-level Shared-private Framework, paper 14 implements a Deep Concurrent Multi-Task Paradigm, paper 16 implements a dual pseudo-labelling and dual learning approach, paper 17 implements a Generative and Classification-based approach, and paper 23 implements an Attention-based RNN and Slot-Gated mechanism. The methodologies implemented by the remaining papers may be viewed in Table V and Table VI.
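In addition, we removed papers [38-47, 54] and [2-15, 49-50, 56-70] from the list due to their related survey paper status. Furthermore, papers [2-15, 49, 56-70, 81] were removed due to their unrelatedness to ID and IC with slot filling or QAS topic relevance.
Answer to research question RQ3: What are the techniques or algorithms that can be used to implement ID and IC with slot filling?
There are several techniques or algorithms that can be implemented for ID and IC with slot filling. According to Table V, these 25 papers have implemented various techniques. Some papers have similarities in their approach with one another but still differ and report unique results.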
Long Short-Term Memory (LSTM) networks are an extension of the Recurrent Neural Network (RNN), a type of neural network that is specially designed for sequence prediction problems since it imposes an order on the observations that must be preserved when training models and making predictions [72]. The LSTM architecture consists of a set of recurrently connected subnets known as memory blocks, which can be thought of as a differentiable version of the memory chips in a digital computer [73]. LSTM has been primarily implemented for problems such as speech modelling and language translation [72], and it has achieved state-of-the-art performance for IC and slot filling [74]. An LSTM network is a special variant of an RNN that overcomes the stability bottlenecks encountered in traditional RNNs, thus enabling its practical application [72]. Furthermore, an LSTM can utilize its internal memory so that its predictions are conditional on the recent context in the input sequence, not merely on what has just been presented as the current input to the network. For example, an LSTM model can be shown one observation at a time and learn which of the observations it has seen previously are relevant, and how they bear on the prediction [72].
Bidirectional Long Short-Term Memory (BiLSTM) is a further refinement of LSTM [24] that integrates a forward hidden layer and a backward hidden layer [71], and can therefore access both the previous and the subsequent context. Since LSTM exploits only the historical context, BiLSTM is better than LSTM at resolving the sequential modelling problem [71]. LSTM and BiLSTM have been used to classify texts and have achieved some progress [19,22,24,26-28].
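The difference between the historical context of an LSTM and the two-sided context of a BiLSTM can be made concrete with a deliberately tiny sketch. It assumes a scalar LSTM cell (input, hidden state, and cell state are single numbers) with arbitrary, untrained weights; the names lstm_step, run_lstm, run_bilstm, and make_weights are hypothetical, and a practical model would use vector-valued states from a deep learning library.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h: float, c: float, w: Dict[str, float]) -> Tuple[float, float]:
    """One step of a scalar LSTM cell; returns the new hidden and cell state."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate memory
    c_new = f * c + i * g           # keep part of the old memory, add new content
    h_new = o * math.tanh(c_new)    # expose part of the memory as the output
    return h_new, c_new


def run_lstm(xs: List[float], w: Dict[str, float]) -> List[float]:
    """Hidden state at each position, using only the inputs seen so far."""
    h, c = 0.0, 0.0
    states = []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        states.append(h)
    return states


def run_bilstm(xs: List[float], w_fwd: Dict[str, float], w_bwd: Dict[str, float]) -> List[Tuple[float, float]]:
    """Pair a left-to-right state with a right-to-left state at every position."""
    forward = run_lstm(xs, w_fwd)
    backward = run_lstm(xs[::-1], w_bwd)[::-1]
    return list(zip(forward, backward))


def make_weights(value: float) -> Dict[str, float]:
    # Arbitrary, untrained constants used only to make the sketch runnable.
    names = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
    return {name: value for name in names}


xs = [0.5, -1.0, 0.25, 2.0]
states = run_bilstm(xs, make_weights(0.5), make_weights(0.3))
changed = run_bilstm(xs[:-1] + [-2.0], make_weights(0.5), make_weights(0.3))

# Changing the last input leaves the forward state at position 0 untouched,
# but alters the backward state there, because it summarises the later inputs.
print(states[0][0] == changed[0][0], states[0][1] == changed[0][1])  # True False
```

The final comparison shows why the bidirectional variant suits slot filling: the label of an early word can depend on words that appear later in the utterance, and only the backward pass carries that information.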
According to our review, the most frequently implemented technique is BiLSTM, which was used by 10 papers (7, 10, 12, 14, 15, 16, 19, 22, 23, and 24). These papers have integrated BiLSTM with various other techniques, including Conditional Random Field (CRF), Convolutional Neural Network (CNN), and Bidirectional Encoder Representations from Transformers (BERT), to perform both ID and IC with slot-filling tasks. Some of these papers, such as papers 15 and 23, implement BiLSTM as their sole main technique. Papers 7, 22, and 24 have integrated BiLSTM with CRF, papers 10 and 14 have integrated BiLSTM with CNN, and papers 12 and 16 have integrated BiLSTM with BERT. Paper 19 integrates BiLSTM with two more techniques, CNN and BERT, to perform ID and IC with slot filling. The integration of two to three techniques may help to improve the proposed model's performance.

TABLE V (rows for Papers 15 to 25)
Paper | Proposed framework | Techniques
Paper 15 [27] | No specific framework name mentioned | BiLSTM
Paper 16 [28] | Dual semi-supervised NLU with Semantic-to-sentence Generation (SSG) | BiLSTM+CRF+BERT
Paper 17 [29] | No specific framework name mentioned | JointBERT+XLM-Roberta
Paper 18 [30] | No specific framework name mentioned | KoBERT, KLUE-RoBERTa, mBERT
Paper 19 [31] | Multitask Learning with Knowledge Base for Joint Slot-Filling and Intent-Detection (MTL) | BERT+BiLSTM+CNN
Paper 20 [32] | No specific framework name mentioned | CNN+LSTM+Rules
Paper 21 [33] | Tagger and Classifier | LSTM+BERT
Paper 22 [34] | MTL-Fully Shared Network (MTL-FSN) and Hierarchical-MTL (H-MTL) | BiLSTM+CRF
Paper 23 [35] | Slot-Gated Modeling | BiLSTM
Paper 24 [36] | SF-ID Network | BiLSTM+CRF
Paper 25 [37] | No specific framework name mentioned | JointBERT+CRF

In recent times, the BERT framework has been investigated for jointly identifying the intent and slots of an utterance [74,75]. The model architecture of BERT is a multi-layer bidirectional Transformer encoder based on the original Transformer model [74,76,77], which jointly conditions on both left and right contexts [75,77]. Furthermore, the input representation for BERT is the sum of WordPiece embeddings [8], positional embeddings, and segment embeddings [74].
The BERT model is pre-trained on large-scale unlabeled text with two strategies, the Masked Language Model (MLM) and Next Sentence Prediction (NSP) [74,75,77]. MLM randomly masks some of the input tokens so that a token cannot observe itself in the multi-layered bidirectional context, whereas NSP aims to capture useful information for sentence pair-oriented tasks [75]. BERT models have contributed greatly to advances in NLP, and BERT is the most commonly used transformer architecture [78]. Furthermore, the BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as QAS and language inference, without substantial task-specific architecture modifications [77,78,80].
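The masking idea behind MLM can be sketched in a few lines. This is a simplified illustration, not BERT's actual preprocessing code: it assumes a 15% selection rate and always substitutes the [MASK] token, whereas the original BERT recipe additionally replaces some selected tokens with a random token or leaves them unchanged. The function name mask_tokens and the example utterance are hypothetical.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def mask_tokens(
    tokens: List[str], mask_prob: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[Optional[str]]]:
    """Return the masked sequence and, per position, the token to predict (or None)."""
    rng = random.Random(seed)
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for token in tokens:
        if token not in SPECIAL_TOKENS and rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)   # the model must recover the original token
        else:
            masked.append(token)
            labels.append(None)    # position is ignored by the MLM objective
    return masked, labels


# Hypothetical tokenised utterance, for illustration only.
tokens = ["[CLS]", "find", "books", "by", "jane", "austen", "[SEP]"]
masked, labels = mask_tokens(tokens, mask_prob=0.15, seed=0)
print(masked)
print(labels)
```

Because the hidden token is absent from the input, the model has to infer it from the words on both sides, which is what lets the encoder condition on left and right context at once.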

The most implemented technique other than BiLSTM is Bidirectional Encoder Representations from Transformers (BERT), which has been implemented by nine papers (2, 3, 5, 8, 12, 14, 16, 19, and 21). These papers have also integrated BERT with CRF, LSTM, BiLSTM, CNN, RNN, or Regular Expressions (RE). Paper 5 has implemented BERT as its sole technique for ID and IC with slot filling. Of the remaining papers, papers 2 and 8 have integrated BERT with CRF, paper 21 has integrated BERT with LSTM, and paper 12 has integrated BERT with BiLSTM. Papers 14 and 19 have integrated BERT with BiLSTM and CNN, while paper 16 has integrated BERT with BiLSTM and CRF. Paper 3 has integrated BERT with four more techniques (LSTM, CNN, RNN, and RE), making a total of five techniques including BERT.
In addition, BERT has several variants and extensions, such as JointBERT, RoBERTa, KoBERT, mBERT, and KLUE-RoBERTa. Even though these techniques build on BERT, they differ in terms of performance. JointBERT has been implemented by papers 1, 11, 17, and 25. Paper 11 implements only JointBERT, papers 1 and 25 have integrated JointBERT with CRF, and paper 17 has integrated JointBERT with RoBERTa. RoBERTa has been implemented by papers 13 and 17, and paper 18 has integrated KoBERT, KLUE-RoBERTa, and mBERT in its implementation of ID and IC with slot filling.
The remaining papers (3, 4, 6, 9, and 20) have implemented techniques that are not widely used, such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Deep Learning Modified Neural Network (DLMNN), Mayfly Optimization (MFO), and Named Entity Recognition (NER), in their implementation of ID and IC with slot filling. More details on the techniques or algorithms implemented by each of these papers can be viewed in Table V.
Answer to research question RQ4: Which evaluation method was used and what are the main results that have been drawn based on the evaluation method used?
There are various ways of evaluating the performance of a model after it has been implemented using various techniques. However, there are several main evaluation metrics or methods that are generally used to evaluate the performance of an ID and IC with slot filling model. These include accuracy, F1-score, precision, and recall, which are applied to both the ID and the slot filling task. Besides that, other evaluations are also conducted within the 25 papers, such as sentence accuracy, semantic error, G-measure, and so on. Model evaluation is crucial to check and validate whether the model functions correctly according to the specifications and requirements.
The majority of these papers evaluate the performance of their ID and IC with slot filling model by using accuracy and F1-score for intent, sentence, and slot performance.
Accuracy is the most used empirical measure and can be defined as the ratio of accurately classified data items to the total number of observations [48,80], making it one of the most suitable methods to evaluate the performance of ID and IC. Despite being the most used measure for testing, accuracy does not distinguish between the number of correct labels of different classes [79] and is valid only when the class distribution is well-balanced and not skewed. Hence, it is not the most appropriate performance metric in some situations, especially when the target variable classes in the dataset are unbalanced. This is because, when the model predicts that every point belongs to the majority class label, the accuracy will be high even though the model fails on the minority classes.
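The majority-class pitfall can be reproduced with a small, made-up example. The label counts below are hypothetical (95 utterances of one intent and 5 of another) and serve only to show how a model that always predicts the majority intent still obtains a high accuracy.

```python
# Hypothetical, imbalanced intent labels: 95 majority-class and 5 minority-class utterances.
y_true = ["SearchBook"] * 95 + ["PurchaseBook"] * 5

# A degenerate model that always predicts the majority intent.
y_pred = ["SearchBook"] * 100

# Accuracy: correctly classified items divided by all items.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Share of the minority-class utterances that the model actually recognises.
minority_total = sum(t == "PurchaseBook" for t in y_true)
minority_hits = sum(
    t == p == "PurchaseBook" for t, p in zip(y_true, y_pred)
)
minority_recall = minority_hits / minority_total

print(accuracy)         # 0.95
print(minority_recall)  # 0.0
```

An accuracy of 0.95 hides the fact that none of the PurchaseBook utterances is recognised, which is why per-class measures are reported alongside accuracy.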
Next is precision, a measure of the correctness achieved in positive predictions: out of all the predictions that are positive, how many are actually positive. Precision is calculated as the ratio of the total number of correctly classified positive instances to the total number of predicted positive instances [68,80]. However, precision is most appropriate when the cost of a False Positive (FP) is higher than that of a False Negative (FN) [48,68,79], and it is usually suitable for a system that has a yes or no, true or false result, such as e-mail spam detection.
The following evaluation measure is recall, which measures how many of the actual positive observations are predicted correctly. Recall is also called sensitivity and is the most suitable evaluation measure when the researcher wants to capture as many positives as possible [48,68,80]. Recall is the ratio of the total number of correctly classified positive instances to the total number of actual positive instances, or in other words, out of the observations that are positive, how many of them have been predicted as positive by the algorithm [48].
On the other hand, the F1-score, also known as the F-measure, uses both precision and recall and is one of the best evaluation measures for the performance of an algorithm [48,80], especially in the evaluation of text classification and identification tasks [59,80]. It balances precision and recall, so that if either the precision or the recall is low, the F1-score is also low. In addition, the F1-score has generally been used to evaluate the performance of the slot-filling task. Therefore, in this paper, we have identified the evaluation methods used in these papers, and Table VII depicts the list of evaluation methods and their results.
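Precision, recall, and the F1-score can all be computed from the counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below uses the standard definitions; the function name and the example counts are hypothetical, and the reviewed papers may compute slot-level scores with their own tooling.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; each score falls back to 0.0 when its denominator is zero."""
    # Precision: correctly predicted positives out of all predicted positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: correctly predicted positives out of all actual positives.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct positives, 2 false alarms, 8 missed positives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(precision, recall, round(f1, 3))  # 0.8 0.5 0.615
```

Although the precision is 0.8, the low recall of 0.5 pulls the F1-score down to about 0.615, which illustrates the balancing behaviour described above.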

IV. CONCLUSION
This paper summarizes the distribution of papers per year, domain application, and source of the published papers related to ID and IC with slot filling, along with the proposed frameworks, the techniques or algorithms used, and the methodology and methodology phases implemented in each paper. A systematic review was conducted using the PRISMA approach, and its selection process of identification, screening, eligibility, and inclusion was reported in detail. A total of 25 works were selected from the 221 works that were initially extracted, based on their relevance to the four main research questions we developed. From the review, we discovered that the techniques and algorithms that are most widely used to implement ID and IC with slot filling are Bidirectional Long Short-Term Memory (BiLSTM), Bidirectional Encoder Representations from Transformers (BERT), and Conditional Random Field (CRF). For the evaluation methods, we looked at various evaluation techniques for ID and IC with slot filling and identified that the majority of the past works' models have been evaluated using accuracy and F1-score. Our review has led us to conclude that ID and IC with slot filling remain crucial and are still in need of further development, especially for low-resource languages other than English, such as Malay, Chinese, Tamil, or Vietnamese. Hence, the development of ID and IC with slot filling for low-resource languages requires further studies, implementation, and optimization, which offers timely future work opportunities for researchers who are interested in this integrative field.
In the future, this research on intent and slot-filling recognition aims to make these systems more accurate, adaptable to different domains, and responsive in real-time interactions.The focus will be on combining various data sources, like text, speech, and images, to better understand user intents, making conversations more natural.Personalizing models for individual user preferences and ensuring ethical considerations, such as minimizing biases, will be crucial.Additionally, efforts will be directed toward making models interpretable and capable of handling multiple languages seamlessly.

Fig. 2. Distribution of selected papers by publication year.
and [2-15, 49-50, 56-70] from the list due to their related survey paper status.Furthermore, papers [2-15, 49, 56-70, 81] were removed due to their unrelatedness to ID and IC with slot filling or QAS topic relevance.Answer to research question RQ3: What are the techniques or algorithms that can be used to implement ID and IC with slot filling?

TABLE IV. LIST OF INCLUSION CRITERIA (IC) AND EXCLUSION CRITERIA (EC)
Inclusion criteria:
• Studies should be published/in-press at a journal or conference.
• Studies should provide answers to the research questions.
• The search is performed based on the title, abstract, and full text.
Exclusion criteria:
• Studies with missing full text.
• Papers that are not directly relevant to the ID and IC with slot-filling topics.

TABLE V. LIST OF PROPOSED FRAMEWORK AND TECHNIQUES

TABLE VI. LIST OF METHODOLOGIES AND THEIR RESULT

TABLE VIII. LIST OF DATASET, DOMAIN AND LANGUAGE