Question Answering Systems: A Systematic Literature Review

Question answering systems (QAS) are developed to answer questions presented in natural language by extracting the answer. The development of QAS is aimed at making the Web better suited to human use by eliminating the need to sift manually through many search results to determine the correct answer to a question. Accordingly, the aim of this study was to provide an overview of the current state of QAS research. It also aimed to highlight the key limitations and gaps in the existing body of knowledge relating to QAS. Furthermore, it intended to identify the most effective methods utilized in the design of QAS. The systematic literature review was selected as the most appropriate methodology for studying the research topic. This method differs from the conventional literature review in that it is more comprehensive and objective. Based on the findings, QAS is a highly active area of research, with scholars taking diverse approaches in the development of their systems. The limitations observed in these studies include the narrowly focused nature of current QAS, weaknesses associated with the models that are used as building blocks for QAS, the need for standard datasets and question formats, which limits the applicability of QAS in practical settings, and the failure of researchers to evaluate their QAS solutions comprehensively. The most effective methods for designing QAS include focusing on syntax and context, utilizing word encoding and knowledge systems, leveraging deep learning, and using other elements of machine learning and artificial intelligence. Going forward, modular designs ought to be encouraged to foster collaboration in the creation of QAS.


I. INTRODUCTION
Information retrieval has undergone tremendous transformation in the recent past. Today, modern information access systems enable us to retrieve documents that may be linked to the input we supply to the system. However, in most cases, the user is left to extract the important information from the retrieved documents. For example, the question "Who has finished a marathon race in under two hours?" should return the answer "Eliud Kipchoge". Instead, the user is supplied with a list of associated documents to explore in order to find the correct answer. Regardless of this limitation, the Question Answering System (QAS) has been noted as an area with great potential in modern computing [3,4,5]. QAS allows access to information in a very natural way, that is, by asking questions and getting related responses in natural language [9,10,12,14,41,70].

A. Current Research Limitations
Because of the great potential and usefulness of QAS, it is likely to be a subject of major advancement in the coming years. Nonetheless, significant challenges must be resolved if the full potential of QAS is to be attained. One of these challenges is the existing asymmetry between natural language and machine language. This asymmetry has hindered the ability of question answering systems to understand natural-language input from users and interpret it correctly for accurate retrieval of responses [43, 44, 45, and 46]. The language asymmetry is compounded by several factors, such as classification, construction of correct questions, ambiguity resolution, deficiencies in semantic equivalence recognition, and poor identification of sequential association in complex queries [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, and 71]. Besides, there has been a lack of an accurate validation mechanism to guarantee the accuracy of the responses produced by QAS (689). The unresolved nature of most of these challenges in the QAS literature is the most significant limitation of QAS research.

B. Research Motivations
Despite the above limitation, and the other challenges facing the development of a perfect QAS, researchers have not given up on the quest to develop more accurate systems. Throughout history, human beings have exhibited an acute thirst for information, and a gap between this information and knowledge has always existed [17,18,19,25,32,33,35,37,38,40]. Owing to this gap, sustained research has led to the maturity of modern information retrieval systems such as web search, which give users access to information related to their interests at their fingertips. QAS is a modern and specialized method of information access that seeks to bridge the gap between the information that users may have and the relevant knowledge. In a typical browsing session, an internet user is not interested in the webpages that a search returns; rather, the interest is in the answers to the questions behind the search. Bridging the gap between questions and answers has been the main motivation of modern QAS research, and the great potential of QAS makes it an exciting field to explore.

C. Problem Statement
The working mechanism of modern question answering systems can be broken down into three broad stages, namely question analysis, document analysis, and answer analysis. The question analysis stage entails the parsing and classification of questions, as well as the development of queries that can be interpreted by the machine. The second stage of document analysis involves the extraction of documents that are relevant to the questions as interpreted by the machine and identification of suitable answers. The third stage of answer analysis involves a further breakdown of the individual documents to extract candidate answers and rank them according to their relevancy to the question.
In all three stages, a combination of techniques from AI, NLP, statistical processing, pattern identification and matching, and information retrieval and extraction is used [2, 74, and ]. The majority of modern question answering systems incorporate most, if not all, of the above techniques to deliver improved accuracy of results. In fact, the taxonomy of modern QAS is derived from the techniques that form the basis of the systems at different working stages. These include linguistic-based, statistical-based, and pattern-based QAS approaches. Even the most sophisticated of these approaches still faces the challenge of language asymmetry, among other challenges that have delayed the attainment of a perfect QAS. The desire to resolve these challenges has been a significant impetus for QAS research. A review of the literature on QAS is needed to understand how far current QAS research has gone in resolving the outstanding challenges to the attainment of the perfect question answering system.

D. Research Objectives
This paper provides a systematic review of the literature on modern question answering systems. The areas of interest are the current state of QAS research, the most significant gaps and limitations in the reviewed studies, and the most effective techniques used in designing QAS. These three objectives are expressed as three research questions.

E. Research Questions
RQ1: What is the current state of QAS research?
RQ2: Which are the most significant gaps and limitations in the reviewed studies?
RQ3: What are the most effective techniques used in designing QAS?

F. Summary
The remainder of the paper is organized as follows. The next section reviews the recent literature on QAS and is followed by the research methodology. The research methodology section emphasizes planning, the review process, and reporting the results of the systematic review. Lastly, the paper offers a conclusion relevant to the three research questions and the reviewed literature.

II. RELATED WORKS
This section of the paper provides a review of modern studies on QAS. However, it is important to appreciate that the development of modern question answering systems has not been a single event but a process with a rich history. Modern studies have built on the findings of older studies to improve modern information retrieval systems. Advanced research on QAS extends back four decades and has grown in parallel with the whole natural language processing field. Over the last four decades, hundreds of question answering systems have been developed following tremendous research efforts.
QASs are information retrieval-based systems that process questions posed in natural language by using pre-organized databases or a large corpus of documents written in natural language. In other words, a QAS accepts questions in natural language and returns a collection of related responses in natural language. The demand for systems with this capability has grown exponentially owing to the growing quest for precision in the wake of increasing data and information. This has transformed QAS into a field of growing academic research interest around the world.
Like any other technical field in modern computing, QAS has key terms of its own. Defining these terms helps in understanding the concepts used in QAS. One of the key terms in the QAS literature is "Question Phrase", which is the section of the question that contains the search items (636). Another term is the "Question Type", which identifies the kind of question given its purpose (636). In the QAS literature, the "Answer Type" means the classification of the items that the question is seeking (636). "Question Focus" is the property related to the items sought by the question, and "Candidate Passages" are the items identified by the search system as relevant to the search question. A candidate passage can be anything from a document to a sentence in natural language that is retrieved by the search system. A "Candidate Answer" is a response ranked among the most suitable answers to the search question.
The QAS literature divides the working mechanism of question answering systems into three broad modules, namely question processing, document processing, and answer processing. As noted earlier, the question processing stage entails the parsing and classification of questions, as well as the development of queries that can be interpreted by the machine. The goal of the first stage is to identify the question type, which defines the focus of the question. The focus is identified by classifying the question as a "What", "Why", "Who", "Where", or "How" question so that the expected answer can be determined (141). This is important for improving answer detection, which ultimately leads to acceptable accuracy of the returned answers.
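As a rough illustration of the question classification step described above, the following Python sketch assigns a coarse question type from the first interrogative word it finds. The function name `classify_question` and the mapping from question words to expected answer types are assumptions made for illustration; they are not taken from any of the reviewed systems, which typically use far richer classifiers.

```python
import re

# Hypothetical mapping from interrogative word to an expected answer type.
ANSWER_TYPE_BY_QUESTION_WORD = {
    "who": "PERSON",
    "where": "LOCATION",
    "why": "REASON",
    "how": "MANNER",
    "what": "ENTITY",
}


def classify_question(question: str) -> tuple[str, str]:
    """Return (question_type, expected_answer_type) for a question."""
    tokens = re.findall(r"[a-z]+", question.lower())
    # The first interrogative word found decides the question type.
    for token in tokens:
        if token in ANSWER_TYPE_BY_QUESTION_WORD:
            return token, ANSWER_TYPE_BY_QUESTION_WORD[token]
    # Questions without a known interrogative word are left unclassified.
    return "other", "UNKNOWN"
```

For the example question used in the introduction, `classify_question("Who has finished a marathon race in under two hours?")` yields the type "who" and the assumed answer type "PERSON", which later stages could use to filter candidate answers.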
The role of the document processing stage is to select a collection of documents that are related to the question posed by the user. The document processing stage also involves extracting a few paragraphs from the selected documents that conform to the focus of the question. In modern QAS, this stage generates a dataset or a neural model which acts as the pseudocode for the process of answer extraction [13, 76, 77, and 78]. The data retrieved in the second stage is organized so that preference is given to the items that are most relevant to the question.
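The relevance ordering described above can be sketched with a very simple lexical scorer. In the Python example below, a passage is scored by the number of distinct question terms it contains; this term-overlap score, the stopword list, and the names `tokenize` and `rank_passages` are illustrative assumptions that stand in for the retrieval models used in real systems.

```python
import re

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "in", "of", "has", "is", "to", "who", "what"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def rank_passages(question: str, passages: list[str], top_k: int = 2) -> list[tuple[int, str]]:
    """Return the top_k passages as (score, passage), highest score first."""
    query_terms = set(tokenize(question))
    scored = []
    for passage in passages:
        passage_terms = set(tokenize(passage))
        # Score = number of distinct question terms found in the passage.
        scored.append((len(query_terms & passage_terms), passage))
    # sorted() is stable, so equally scored passages keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


passages = [
    "Eliud Kipchoge finished a marathon in under two hours in Vienna.",
    "The marathon is a long-distance race.",
    "Bread is baked in an oven.",
]
ranked = rank_passages("Who has finished a marathon race in under two hours?", passages)
```

Here the first passage shares five terms with the question and the second shares two, so they are returned in that order and the unrelated third passage is dropped.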
The third stage of answer processing involves a further breakdown of the individual documents to extract candidate answers and rank them according to their relevancy to the question (5). This is often the most challenging task of the three. It involves further analysis of the output of the document analysis stage to select the most suitable answer to the question. The complexity emanates from the need to make the answer as simple as possible even when it requires combining information from different neural models.
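One simple way to picture candidate answer extraction and ranking is redundancy-based voting, sketched below in Python. Two heuristics are assumed purely for illustration and do not come from the reviewed studies: a candidate answer is approximated as a run of two or more capitalized words, and candidates are ranked by how many times they occur across the retrieved passages.

```python
import re
from collections import Counter

# Assumed heuristic: a candidate answer is two or more consecutive capitalized words.
CANDIDATE_PATTERN = re.compile(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+\b")


def extract_candidates(passage: str) -> list[str]:
    """Return all capitalized multi-word spans found in a passage."""
    return CANDIDATE_PATTERN.findall(passage)


def rank_answers(passages: list[str]) -> list[tuple[str, int]]:
    """Rank candidates by how often they occur across all passages."""
    votes = Counter()
    for passage in passages:
        votes.update(extract_candidates(passage))
    return votes.most_common()


retrieved = [
    "Eliud Kipchoge ran a marathon in under two hours.",
    "The record attempt by Eliud Kipchoge took place in Vienna.",
    "Kenenisa Bekele is another famous marathon runner.",
]
ranking = rank_answers(retrieved)
```

With these three passages, "Eliud Kipchoge" receives two votes and "Kenenisa Bekele" one, so the former is ranked first. A production system would combine such evidence with the expected answer type and with learned scoring models.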

III. RESEARCH METHOD
This systematic review followed a guideline of eight steps, amongst which the most significant are defining the purpose of the literature review, searching the literature, screening the literature, quality evaluation, and data abstraction. The following sections outline the stages of the systematic literature review conducted in completing the paper.

A. Planning the Review
The systematic review of literature started with the creation of an elaborate plan. The key components of the plan included identifying the resources required and defining the timeframes for the completion of the process. In addition to identifying the various scholarly databases, other components of the plan entailed timelines for deriving research questions from the topic, creating the search strategy, determining the search terms and strings, implementing the search strategy, selecting the most relevant studies, reviewing those studies, and writing the research paper. The detailed phases are provided in the following sections.

B. Specifying the Research Questions (RQs)
Over the years, we have seen an exponential expansion of digital databases, which has increasingly called for sophisticated tools of information retrieval. With the data at hand, the challenge has always been to develop efficient techniques for consuming it, which involves extracting knowledge from the digital information. One such technique is the question answering system, which allows human users to interact with computers in the most natural way as they seek answers to their questions from a large corpus of unstructured data. In this review, we formulated three questions, as listed in Section I-E, to guide this systematic review of literature, which seeks to understand the current state of QAS research and to identify the most significant gaps and limitations in the reviewed studies.

C. Defining Search Strategy
1) Data retrieval: The first step of the review was to index the journals and papers, including conference proceedings, written and published in English. A date filter limited the year of publication to between 2015 and 2020. The journals and papers, including conference proceedings, were pulled from five digital libraries: ACM Digital Library, IEEE Xplore, ScienceDirect (Elsevier), SpringerLink, and Wiley. This was achieved using a conceptual search string containing the keywords in the research questions.
2) Screening of the journals and papers: The choice of the papers included in this review was determined using inclusion and exclusion criteria to ensure that the study was explicit about the journals and papers included in the research. Only papers that met the criteria were included in the review. Table I provides more details about the inclusion and exclusion criteria used in selecting the reviewed literature.

D. Defining Data Sources (Eligibility of the Journals and Papers)
To ensure that only relevant papers were included in the review, a quality assessment scheme was applied. The scheme involved answering quality assessment questions with yes = 1.5, partial = 0.5, or no = 0, depending on a preliminary analysis of the individual papers. The papers that had been preselected but had a score of less than 0.5 were excluded from the review, which resulted in a sample of 130 papers. A purposive exclusion of 50 papers was then made, leaving 80 papers.
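The scoring scheme above can be expressed as a short Python sketch. The answer weights and the 0.5 cut-off are taken from the text; the paper does not state how many assessment questions were asked or whether the cut-off applies to a total or a per-question score, so summing the weights over all questions is an assumption, as are the function names.

```python
# Weights and threshold as stated in the text.
ANSWER_WEIGHTS = {"yes": 1.5, "partial": 0.5, "no": 0.0}
EXCLUSION_THRESHOLD = 0.5


def quality_score(answers: list[str]) -> float:
    """Sum the weights of a paper's quality assessment answers (assumed aggregation)."""
    return sum(ANSWER_WEIGHTS[answer] for answer in answers)


def is_included(answers: list[str]) -> bool:
    """A paper is excluded only when its score is below the threshold."""
    return quality_score(answers) >= EXCLUSION_THRESHOLD
```

Under these assumptions, a paper answered `["yes", "partial", "no"]` scores 2.0 and is kept, whereas a paper answered "no" on every question scores 0 and is excluded.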

E. Defining Search Keywords
The initial search words were derived from the three research questions. Additional search words were determined based on the results of the initial search. Table II highlights

F. Conducting Review Process
The process entailed identifying records, screening them, determining their eligibility, and listing the included studies in accordance with PRISMA. Fig. 1 summarizes the search protocol.

G. Selection of Study
The articles chosen for this study were determined using two-level inclusion and exclusion criteria. The identification process produced a total of 350 articles, with 320 being found through database searching whereas 30 were identified using other sources. After the exclusion of duplicates, 189 studies remained. The screening process resulted in the elimination of 59 studies. The resultant 130 studies were assessed for eligibility and 50 were excluded with reasons. Accordingly, eighty studies were included in the systematic review: 15 were qualitative whereas 65 were quantitative.
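The selection counts reported above can be checked with simple arithmetic. The Python sketch below encodes the reported figures and derives the intermediate totals; the class and field names are illustrative and are not part of the PRISMA guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionFlow:
    """Counts reported for each step of the study selection process."""
    from_databases: int
    from_other_sources: int
    after_deduplication: int
    excluded_at_screening: int
    excluded_at_eligibility: int

    @property
    def identified(self) -> int:
        return self.from_databases + self.from_other_sources

    @property
    def duplicates_removed(self) -> int:
        return self.identified - self.after_deduplication

    @property
    def assessed_for_eligibility(self) -> int:
        return self.after_deduplication - self.excluded_at_screening

    @property
    def included(self) -> int:
        return self.assessed_for_eligibility - self.excluded_at_eligibility


# Figures as reported in the text.
flow = SelectionFlow(
    from_databases=320,
    from_other_sources=30,
    after_deduplication=189,
    excluded_at_screening=59,
    excluded_at_eligibility=50,
)
```

The derived values agree with the text: 350 records identified, 130 assessed for eligibility, and 80 included, which matches the reported split of 15 qualitative and 65 quantitative studies. The figures also imply that 161 duplicates were removed.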

A. Primary Studies Overview
Eighty studies formed the basis of the study: 20 on QAS based on syntax and context; 20 on QAS based on word encoding and knowledge systems; 20 based on forms of deep learning [73, 74]; and 20 based on modern components of machine learning and artificial intelligence. It was imperative to select an equal number of studies in each of the four categories to highlight the main directions in QAS research. In addition to being relevant to the research questions, the selected studies adhered to the inclusion and exclusion criteria. This means that the selected studies were authored in English, published after January 1, 2018, and were academic research work found in scholarly journals and conferences. Out of the eighty studies surveyed, only sixty-nine were primary studies.

B. Answering the Research Questions
This section discusses the relationship between the selected studies and the research questions. The relevant research articles extracted are utilized to answer each research question as shown in Table III. This research paper conducted a systematic review of literature, rather than the conventional literature review, due to the need to attain scholarly rigor. In addition to enabling the researcher to obtain the most relevant studies, systematic reviews of literature adopt an objective perspective, which limits biases and enhances the usability of the research findings. Generally, systematic reviews of literature are implemented to summarize existing evidence in a given area, identify gaps for further investigation, and provide a framework for positioning new research activities. In this study, appropriate search terms were utilized in conjunction with various Boolean operators and search strategies to obtain studies to answer the three research questions.
The first research question (RQ1) focused on providing insights concerning the current state of QAS research. The idea is to provide the current understanding of QAS systems in terms of the approaches utilized, their effectiveness and accuracy, and potential areas of improvement. Accordingly, this study explored studies published after January 1, 2018. In total, sixty-nine studies were identified and examined to provide contemporary understanding of QAS research.
The second research question (RQ2) aimed at identifying the most significant gaps and limitations in the reviewed studies. One of the major gaps and limitations is the inability of the developed QAS systems to be utilized for a variety of tasks [1, 4, 6, 7, 15, 24, 30, 74, 75, 80, 81, 82, 83, 84, and 85]. From an ideal standpoint, a QAS system should be applicable to different questions and settings. For example, Utomo [4] developed a QAS system that can only be used with the Quran. Another critical limitation is that a typical QAS system exhibits weaknesses associated with the model or algorithm used [4,23,29,31,42,47,60]. For example, Jovita developed a model that required a long time to give an answer (about 29 seconds) [42]. Besides efficiency, some models are associated with poor precision. Deep learning models, in particular, require large volumes of high-quality training data [47]. Thirdly, some of the QAS systems require question templates, selection of hot terms, and standard datasets, which limits their applicability in the practical environment [8,13,16,20,21,22,27,28,34,36,39,48,49,62,69]. Other limitations of the studies relate to the thoroughness of the assessments and explanations of the QAS systems developed [11, 26, 40, 63, 72, 79]. For instance, Abdiansah developed a QAS system that was tested on only three search engines [11].
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 3, 2021, www.ijacsa.thesai.org
The final research question (RQ3) aimed at identifying the most effective techniques utilized in the design of QAS systems. Based on the search conducted, the four most effective approaches are syntax and context; word encoding and knowledge systems; forms of deep learning; and components of machine learning and artificial intelligence. Each of the four approaches was studied using 20 research articles. The syntax and context approach places questions within their context, both in terms of the semantic information carried by noun, preposition, and verb phrases and other syntactic entities, as well as the discourse roles relating to the entire question-answering activity. The word encoding and knowledge technique entails utilizing knowledge bases, in combination with question encoding at the character level or using word embedding, to answer a question. The deep learning technique encompasses multiple layers of algorithms to progressively extract high-level features from a question to enable accurate answering. Finally, some question answering systems employ diverse components of machine learning and artificial intelligence, other than deep learning, to answer questions.
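To make the word encoding and knowledge technique more concrete, the Python sketch below encodes a question at the character level as trigram counts and matches it against a toy knowledge base by cosine similarity. The knowledge base contents, the trigram encoding, and all function names are assumptions chosen for illustration; the reviewed systems generally rely on learned embeddings and much larger knowledge bases.

```python
import math
from collections import Counter

# Toy knowledge base of (subject, relation, object) triples; purely illustrative.
KNOWLEDGE_BASE = [
    ("Eliud Kipchoge", "nationality", "Kenyan"),
    ("Vienna", "country", "Austria"),
]


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Encode text as a bag of character n-grams (lowercased, whitespace-normalized)."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(question: str) -> str:
    """Return the object of the triple whose subject and relation best match the question."""
    question_vector = char_ngrams(question)
    best = max(
        KNOWLEDGE_BASE,
        key=lambda triple: cosine(question_vector, char_ngrams(f"{triple[0]} {triple[1]}")),
    )
    return best[2]
```

Character-level encoding tolerates small spelling variations that would defeat exact word matching, which is one reason it is paired with knowledge bases. In this toy setting, asking "What is the nationality of Eliud Kipchoge?" selects the first triple.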

V. FINDINGS
This systematic review of literature demonstrates that the current state of QAS research is highly divergent. It appears that different scholars are setting out to develop their individual QAS systems from scratch. This trend could be explained by the fact that QAS is an emerging field. The diversity in QAS techniques means that it is almost impossible to compare them objectively. There is also an emerging trend of combining different components, which makes it difficult to evaluate the effect of each component individually. Accordingly, the adoption of a modular approach could be helpful as it would enable the scientific community to contribute by developing new plugins to improve or replace existing ones. Despite the challenges, QAS research is making positive strides towards the creation of accurate question answering systems.
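A minimal sketch of what such a modular design could look like is given below in Python. Each stage is a plain callable, so a research group could replace one stage without touching the others. The class `ModularQAS`, the stage signatures, and the toy plugins are hypothetical and do not describe any existing framework.

```python
from typing import Callable

# Hypothetical stage interfaces: any callable with the right signature is a plugin.
QuestionAnalyzer = Callable[[str], list[str]]
DocumentRetriever = Callable[[list[str]], list[str]]
AnswerExtractor = Callable[[str, list[str]], str]


class ModularQAS:
    """Chains three interchangeable stages into one question answering pipeline."""

    def __init__(self, analyze: QuestionAnalyzer, retrieve: DocumentRetriever,
                 extract: AnswerExtractor) -> None:
        self.analyze = analyze
        self.retrieve = retrieve
        self.extract = extract

    def ask(self, question: str) -> str:
        terms = self.analyze(question)
        passages = self.retrieve(terms)
        return self.extract(question, passages)


def keyword_analyzer(question: str) -> list[str]:
    """Toy plugin: keep words longer than three characters as query terms."""
    return [w.strip("?.,").lower() for w in question.split() if len(w) > 3]


def make_corpus_retriever(corpus: list[str]) -> DocumentRetriever:
    """Toy plugin factory: return passages containing any query term."""
    def retrieve(terms: list[str]) -> list[str]:
        return [doc for doc in corpus if any(term in doc.lower() for term in terms)]
    return retrieve


def first_passage_extractor(question: str, passages: list[str]) -> str:
    """Toy plugin: return the first retrieved passage as the answer."""
    return passages[0] if passages else "no answer found"


corpus = [
    "Eliud Kipchoge ran a marathon in under two hours.",
    "Bread is baked in an oven.",
]
qas = ModularQAS(keyword_analyzer, make_corpus_retriever(corpus), first_passage_extractor)
```

Because the stages share only their signatures, the effect of one component can be measured by swapping it while holding the other two fixed, which addresses the comparison difficulty noted above.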
The review also highlights various significant gaps and limitations in QAS research. A key limitation identified is the highly focused nature of the QAS developed. In addition, the models utilized have weaknesses, which limits the accuracy and efficiency of the entire QAS. Deep learning models, while suited to QAS applications, require vast amounts of quality training data during their development [61, 62, 63, 64, 65, 66, 67, and 68]. Without such data, their effectiveness and applicability reduce significantly. In research studies, particularly those targeting machine learning, the availability of unbiased training data is often a challenge. Moreover, some of the QAS developed only work well with standard datasets. Accordingly, when testing them, researchers are likely to obtain high accuracies. However, in practical settings, standard datasets are unavailable. Furthermore, the research methodologies adopted by the different scholars were deficient as some of the QAS developed were not evaluated comprehensively.
The design of QAS can take one of the four approaches identified. The first one encompasses examining the syntax of the question and the context in which it is placed to enable accurate answering. The second approach involves encoding words contained in the question and then utilizing knowledge bases to find the correct answer. The third method entails utilizing some forms of deep learning, which enables a progressive extraction of information from questions during the answering process. Fourthly, artificial intelligence and machine learning are routinely applied in question answering systems.

A. Research Limitations
The findings of this study must be understood within the limitations encountered. One major weakness is that the systematic review of literature was limited to studies published in English. While this requirement was necessary to ensure that the selected studies were understandable to the author, it is possible that some helpful studies were eliminated. Another limitation is that only QAS studies published after January 1, 2018 were surveyed. This criterion might also have excluded older studies that were nonetheless potentially helpful. Besides, the systematic review of literature included studies that had different methodological weaknesses, including inadequate QAS system evaluation. Accordingly, limitations in individual studies affected the overall strength of this systematic review.

B. Research Conclusion
The objectives of this study were to provide a picture of the current state of QAS research, discuss gaps and limitations in QAS research, and explore effective methods utilized in the design of QAS. The study adopted the systematic review of literature research methodology and encompassed examining relevant studies published in English after January 1, 2018. A total of eighty studies were selected for the research study. However, only 69 were relevant to the first research question, 55 were relevant to the second research question, and 41 answered the final research question. Based on the findings, QAS research literature is growing but is highly divergent as scholars adopt different techniques. The main techniques include syntax and context, word encoding and knowledge systems, deep learning [21,28,62,66,73,83], and artificial intelligence and machine learning. Some of the significant gaps identified include ineffectiveness and inefficiencies associated with the models adopted, the highly focused nature of the QAS systems developed, the reduced practicality of QAS due to the need for standard datasets or question formats, and the failure to test QAS thoroughly. Future research ought to focus on the development of QAS based on modular approaches to enhance collaboration within the scientific community. Future studies should also examine the applicability of some of the developed QAS in practical environments.