Software Security Static Analysis False Alerts Handling Approaches

False Positive Alerts (FPA) generated by Static Analysis Tools (SAT) reduce the effectiveness of automatic code review and leave these tools underused in practice. Researchers have conducted many experiments to improve SAT accuracy while keeping the FPA rate low, using different simulated and production datasets to validate their proposed methods. This paper surveys recent approaches to FPA filtering, compares them, and discusses their usefulness. It also studies the datasets used to validate the identified methods and shows how well they cover most program defects. This study focuses mainly on the security bugs covered by the datasets and handled by the existing methods.
Keywords—Software security; static analysis; false alert reduction; source code dataset; security bugs


I. INTRODUCTION
Software coding and implementation have grown rapidly in recent years, driven by rapid digitization and the extensive use of digital technologies. The more important software applications become, the more essential the security assurance of programs gets. However, software security defects have increased because implementations fail to follow secure coding best practices. Software faults that escape into later stages of development increase maintenance cost [1], [2]. Moreover, after an application is deployed, attackers will try to find these coding vulnerabilities and exploit them to achieve their goals. Code review and auditing are therefore essential tasks before software is put into use.
Static Analysis Tools (SAT) play an essential role in automatically detecting these vulnerabilities and alerting the programmer, which reduces auditing time, effort, and cost. An SAT examines the code for programming defects without executing it and generates alerts about possible errors. Alerts provide the auditor with useful information such as the location of the purported defect in the source code, the nature of the fault, and additional contextual information. However, SATs still suffer from several issues that leave them underused in practice. Among these, this study focuses on the large number of warnings SATs generate: most of them are false positives, and reviewing them all is a time-consuming and painstaking task.
One way to deal with a large number of FPAs is to process source code unsoundly. Almost all existing SATs are uniformly unsound [3]. Loops and calls to unknown external libraries, for instance, are significant sources of imprecision. An unsound SAT considers only a fixed number of loop iterations, ignoring the rest, and assumes that any call to an unknown external library has a predefined behavior such as skip [3]. This unsoundness regarding loops and unknown library calls reduces false-positive alerts, but it causes the analysis to miss a significant number of real bugs.
In this study, any paper that sacrifices SAT soundness to reduce false-positive alerts is therefore excluded. Ideally, an SAT should be precise and scalable while avoiding false positives.
Existing efforts to reduce false-positive alerts face several challenges, mainly:
• Handling a large code base decreases SAT precision, since most SATs perform better on small problems. Moreover, processing a large code base causes the SAT to over-approximate the behavior of the input program, which may lead it to report correct program properties as errors.
• Making an SAT detect more potential bugs raises many more false-positive alerts. The challenge is to keep a high detection rate without raising FPAs.
• The SAT cannot obtain knowledge about the software architecture, its dependencies, or how data flows through the system, which may lead it to raise FP alerts on code it wrongly considers erroneous [4].
Researchers therefore try to address one or more of these challenges to reduce false-positive alerts.
To the best of our knowledge, these approaches have not been studied rigorously and comprehensively. The objective of this paper is therefore to investigate current methods for eliminating false alerts. It presents the most significant efforts in this field over the last ten years and examines their scalability, and it defines new criteria for comparing the approaches. This study also identifies the most useful datasets in the literature and provides statistics about them. Finally, the paper discusses the advantages and shortcomings of FPA handling approaches and presents recommendations to improve SATs. The paper is organized into eight sections: after the introduction in Section I, Section II presents related work. Section III describes the research methodology used to select the relevant articles, and Section IV presents the identified categories of approaches. Section V compares the different methods used to reduce false alerts. Section VI provides an overview of the datasets used, Section VII discusses shortcomings and proposes recommendations, and Section VIII concludes the paper.

II. RELATED WORKS
In [5], the authors studied existing efforts that combine static analysis with dynamic quality assurance techniques to improve SAT bug detection while reducing false alerts, and selected 51 articles for their mapping study. They included only papers that integrate the combined techniques so that the output of one method is the input of the other. In contrast, this paper covers all categories of approaches, and any combination of them, used to improve SAT precision or reduce FP alerts.
Heckman et al. [6] investigated 18 research efforts on actionable alert identification techniques, categorizing the approaches as classification or ranking methods. The authors also conducted a comparative study to identify the approaches with the best accuracy. The present effort additionally studies articles that improve SAT precision to reduce FP alerts, not only those that improve the bug detection rate.
Similar to [5], the authors of [7] identified 51 papers for their mapping study, focusing on existing static analysis tools and techniques to reduce false alerts. This article, however, covers only methods that handle false alerts.
The survey in [8] covers 79 articles that handle the enormous number of FP alerts after they are generated, focusing on methods that reduce SAT alert reports. This study, in contrast, considers all kinds of approaches that help minimize FP alerts, whether the method refines the software source code, improves SAT precision, or post-handles the SAT alert report.
It is worth noting that all the papers reviewed by the above surveys date from 2016 or earlier, since the most recent of these surveys, [8], was to the best of our knowledge published in 2016. This effort therefore focuses, as far as possible, on recent papers that fit the selection requirements, to provide researchers with an up-to-date and accurate literature review.
This study goes beyond the above surveys by:
• selecting and presenting relevant datasets to test and validate SAT tools, collecting the different open-source datasets along with information about their size and features (see Section VI).
• presenting the features used by the identified methods for model training and alert prediction or classification (see Section V).
• providing the reader with the different types of security bugs handled by the identified approaches, alongside the paper references (see Section VI-B).
• comparing the different false alert handling techniques according to their scalability, to study their ease of integration and application (see Section V).
• depicting ongoing projects and competitions aimed at boosting research to improve SATs and at providing accurately labeled datasets (see Section VI).
Several other existing studies, such as [9], [10], [11], [12], [13], evaluate SATs in terms of precision and alert handling and compare them. This paper has a different objective: it only presents the approaches that improve SAT alert handling, without testing their precision.

III. RESEARCH METHODOLOGY
This survey starts by identifying relevant papers that deal with false alert reduction. Fig. 1 depicts the main steps to select pertinent articles and extract information from them.

A. Research Questions
Selecting relevant papers starts with a precise definition of the research topic, which makes it possible to identify the keywords for the scientific database search. This study aims to answer the following questions:

B. Used keywords and Search Engine Configuration
The relevant keywords are determined from the research questions identified in Section III-A. They are classified into four categories representing the most used terms and their synonyms, and each search round uses a combination of one keyword from each category. Table I lists the keywords used. The first category encompasses the most common names of program errors. The second contains different terms for alerts, focusing on false-positive alerts. The third includes the names of static analysis that different researchers may use, and the last contains the keywords used to describe alert reduction.
This study therefore runs 108 = 3 × 3 × 3 × 4 separate search rounds on Google Scholar, which ranks research papers by relevance. The search is restricted to articles published after 2010 to ensure that the selected documents consider recent programming technologies and new SAT trends. The first 50 papers matching the searched keyword combinations are chosen. In total, this paper identified 540 articles before proceeding with the selection process.
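The 108 search strings are simply the Cartesian product of the four keyword categories. The sketch below shows this construction with Python's `itertools.product`. The terms are hypothetical placeholders, not the actual Table I entries, and which category holds four terms is also an assumption; only the category sizes (3, 3, 3, 4) come from the text.

```python
from itertools import product

# Hypothetical placeholder terms: Table I lists the real keywords. Only the
# category sizes (3, 3, 3, 4) come from the text.
ERROR_TERMS = ["bug", "defect", "vulnerability"]
ALERT_TERMS = ["false positive", "false alarm", "false alert"]
ANALYSIS_TERMS = ["static analysis", "static analyzer", "static code analysis"]
REDUCTION_TERMS = ["reduction", "filtering", "elimination", "ranking"]


def build_search_strings():
    """Return one quoted query per combination of one term from each category."""
    return [
        " ".join(f'"{term}"' for term in combo)
        for combo in product(ERROR_TERMS, ALERT_TERMS, ANALYSIS_TERMS, REDUCTION_TERMS)
    ]


queries = build_search_strings()
print(len(queries))   # 108
print(queries[0])     # "bug" "false positive" "static analysis" "reduction"
```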

C. Relevant Papers' Selection Process
This section presents the paper selection process, which consists mainly of a quick review followed by a peer review of the candidate articles from the previous step. In the second filtering round, papers are filtered quickly based only on their title, abstract, evaluation, and conclusion. Only papers satisfying the following criteria are included in the final peer review:
• papers that explicitly aim to reduce false alerts, whether by improving the precision of the static analyzer, modifying the software source code, or post-handling SAT alert reports.
• papers that evaluate and test their approach.
This study also excludes papers that:
• sacrifice the soundness of the SAT to reduce false-positive alerts.
• aim only to detect true positive alerts without reducing FPAs.
• only survey existing efforts without providing any new technique or approach to reduce FPAs.
• mostly use techniques and datasets similar to those of an already selected paper, in order to keep each chosen article unique and original.
After this process, 30 relevant articles covering almost all approaches and efforts dealing with SAT false alert handling are selected. Table II shows the distribution of the chosen papers across scientific publisher databases.

D. Information Extraction Process
In this step, this study peer-reviews the identified papers to extract the relevant, targeted information, namely:
• the approaches or techniques used.
• the application level of the approach, i.e., whether the proposed method improves SAT precision, modifies the software source code before it is analyzed, or post-handles SAT reports.
• the bug coverage of the approach, since several articles only reduce false alerts generated by specific bugs.
• the human intervention effort during false alert filtering.
• the FPA reduction percentage, whether explicitly mentioned or deducible from the other metrics presented in the paper. In some articles, the FPA reduction rate cannot be extracted because specific measures are lacking.
• the programming language of the examined application.
• the SAT used for code examination.
• the dataset used to evaluate the proposed approach.
All gathered information is saved in an Excel spreadsheet created to facilitate its mining. The extracted data contain the information required to answer this study's research questions.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 11, 2021

IV. FALSE ALERTS HANDLING APPROACHES: A CLASSIFICATION
To answer RQ1, the paper first identifies the approach used in each article and categorizes the articles by the similarity of their techniques. This study distinguishes seven main categories, as shown in

A. Machine Learning-based Approaches
Machine Learning (ML) is the science of teaching a computer to learn from data and build a model that is then used to predict or classify new data [14]. It works mainly with algorithms, not raw data. ML is widely used in static analysis to improve SAT precision or to post-handle SAT-generated alarms and predict whether they are true or false.
The authors of [15], [16] present similar works that build a new classifier on additional learning features, namely program structure patterns that correlate with similar false alarms. They mainly use Naïve Bayes, LSTM (long short-term memory), and SVM to predict new alerts. In [17] and [18], the authors propose a clustering-based approach to classify and correlate similar alerts generated by the SAT. They formalize new methods to find dependencies between alarms caused by buffer overflow errors, place dependent warnings in the same cluster, and then tag each group based on its dominant alerts. In [19], the authors train decision trees with ensemble learning (i.e., training several weak classifiers and combining them into a stronger model; they use AdaBoost) to classify alerts, using a labeled training dataset generated from multiple SATs. Their approach relies only on SAT reports, which gives their solution better scalability, since no code preprocessing is required to apply it. The authors of [20] propose an approach that merges the alerts of several SATs to extract features for the prediction model, and they compare four machine learning techniques to identify the one that best reduces false alarms. The paper [3] addresses unsound static analysis and the trade-off between the False Negative Rate (FNR) and the False Positive Rate (FPR): reducing the FPR increases the more critical FNR, and vice versa. The authors propose to train their SVM model selectively on harmless code structures only, so that it predicts only FPAs. In [21] and [22], the authors use ML techniques to reduce false alarms, training their models on typestate variables and software engineering metrics to predict false alerts.
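The clustering-and-tagging idea described for [17], [18] can be sketched as follows, under stated assumptions: alarm dependencies are given as pairs (computing them is the hard part of those papers), clusters are the connected components of the dependency graph found with a union-find structure, and each unreviewed alarm inherits the majority label of the reviewed alarms in its cluster. The majority vote is our simplification, not necessarily the tagging rule of [17], [18], and all names are hypothetical.

```python
from collections import Counter


class UnionFind:
    """Disjoint-set forest used to build clusters of dependent alarms."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_and_tag(alarms, dependencies, reviewed):
    """alarms: alarm ids; dependencies: (a, b) pairs of dependent alarms;
    reviewed: partial mapping alarm id -> 'TP' or 'FP' from manual review.
    Each unreviewed alarm inherits the majority label of its cluster."""
    uf = UnionFind(alarms)
    for a, b in dependencies:
        uf.union(a, b)
    clusters = {}
    for alarm in alarms:
        clusters.setdefault(uf.find(alarm), []).append(alarm)
    tags = {}
    for members in clusters.values():
        votes = Counter(reviewed[m] for m in members if m in reviewed)
        majority = votes.most_common(1)[0][0] if votes else "UNREVIEWED"
        for m in members:
            tags[m] = reviewed.get(m, majority)   # never override a human label
    return tags


alarms = ["a1", "a2", "a3", "a4", "a5", "a6"]
dependencies = [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]
reviewed = {"a1": "FP", "a2": "FP", "a3": "TP", "a5": "TP"}
print(cluster_and_tag(alarms, dependencies, reviewed))
# {'a1': 'FP', 'a2': 'FP', 'a3': 'TP', 'a4': 'FP', 'a5': 'TP', 'a6': 'UNREVIEWED'}
```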
The authors of [23] train a CNN classifier on human-labeled lexical tokens of the code to reduce false alerts, and they propose a continuous mechanism for code integration after review.
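To make the ensemble-learning idea used in [19] concrete, the following is a minimal sketch of discrete AdaBoost over decision stumps (one-level decision trees), written with the standard library only. The three numeric alert features (a severity level, the number of SATs reporting the same line, and a code-churn count) and the toy training data are hypothetical; [19] trains full decision trees on its own report-derived features.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """One-level decision tree: compares a single feature with a threshold."""
    feature: int
    threshold: float
    polarity: int          # +1: predict +1 when x[feature] >= threshold
    alpha: float = 0.0     # vote weight assigned by AdaBoost

    def predict(self, x):
        return self.polarity if x[self.feature] >= self.threshold else -self.polarity


def train_adaboost(X, y, rounds=10):
    """X: feature vectors; y: +1 for a true alert, -1 for a false positive."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for p in (1, -1):
                    stump = Stump(f, t, p)
                    err = sum(w for w, xi, yi in zip(weights, X, y)
                              if stump.predict(xi) != yi)
                    if err < best_err:
                        best, best_err = stump, err
        best_err = min(max(best_err, 1e-10), 1 - 1e-10)  # avoid log(0)
        best.alpha = 0.5 * math.log((1 - best_err) / best_err)
        model.append(best)
        if best_err <= 1e-10:   # perfect stump: later rounds add nothing
            break
        # Misclassified alerts gain weight so the next stump focuses on them.
        weights = [w * math.exp(-best.alpha * yi * best.predict(xi))
                   for w, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def classify(model, x):
    """Weighted vote of all stumps: +1 = likely true alert, -1 = likely false."""
    score = sum(s.alpha * s.predict(x) for s in model)
    return 1 if score >= 0 else -1


# Hypothetical features: [severity, tools_reporting_same_line, churn]
X = [[3, 3, 1], [2, 2, 0], [1, 1, 5], [0, 1, 2], [2, 3, 4], [1, 1, 0]]
y = [1, 1, -1, -1, 1, -1]
model = train_adaboost(X, y)
print([classify(model, x) for x in X])   # [1, 1, -1, -1, 1, -1]
```

Decision stumps are the usual weak learner for AdaBoost; swapping in deeper trees would follow [19] more closely at the cost of a longer sketch.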

B. Root Causes based Approaches
Root cause analysis is the process of identifying and investigating the causes of events so that investigators can specify effective corrective measures [24]. This technique is used to identify the root causes of SAT false alerts in order to eliminate or filter them. In [25], the authors manually inspect 30 JavaScript web application alerts generated by the static analyzer and derive seven root causes of alarms. They then apply a different technique to each identified root cause to eliminate the alerts it generates. The authors of [26] aim to reduce false alerts by reporting the root causes of alarms to the SAT user for inspection, instead of the alarms themselves. They also ask the user questions about the root causes in order to fix the error, until no more alarms are triggered. Their approach requires extensive human interaction to validate root causes and define corrective measures. The paper [27] aims to overcome the issues of the alert propagation technique, which inserts new alerts before or after the location of their causes and removes the original alarms generated by the SAT. Because this technique may increase the number of warnings in several cases, [27] instead repositions alerts to their causes rather than creating new alerts and then removing the original ones.
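Assuming each alarm has already been linked to the candidate root causes it depends on (the analysis-specific part), the idea of reporting root causes instead of alarms, as in [26], can be sketched as follows: causes are ranked by how many alarms they explain, and once the user confirms a cause as benign, every alarm that depends only on benign causes is filtered out. The data layout and names are hypothetical.

```python
from collections import defaultdict


def group_by_root_cause(alarm_causes):
    """alarm_causes maps an alarm id to the set of root-cause ids it depends on.
    Returns (cause, alarms) pairs, causes explaining the most alarms first."""
    by_cause = defaultdict(set)
    for alarm, causes in alarm_causes.items():
        for cause in causes:
            by_cause[cause].add(alarm)
    return sorted(by_cause.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def dismissed_alarms(alarm_causes, benign_causes):
    """Alarms whose every root cause the user confirmed as benign can be filtered."""
    return {alarm for alarm, causes in alarm_causes.items()
            if causes and causes <= benign_causes}


alarms = {"A1": {"c1"}, "A2": {"c1", "c2"}, "A3": {"c2"}, "A4": {"c3"}}
print([cause for cause, _ in group_by_root_cause(alarms)])  # ['c1', 'c2', 'c3']
print(dismissed_alarms(alarms, {"c1"}))                      # {'A1'}
```

Ranking causes by the number of alarms they explain means a single user answer can dismiss several alarms at once, which is the effort saving this family of approaches targets.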

C. Model Checking based Approaches
Model Checking (MC) is a formal verification technique that explores all possible states of a given system based on a model defining the system's behavioral properties; an MC verification is only as accurate as the model representing the system [28]. SATs widely use MC techniques to reduce false-positive alerts by verifying the alerts against the predefined model. The paper [29] implements a software analyzer that can process large code bases with high precision at the expense of completeness, possibly missing potential defects. Its main idea is the use of specialized abstractions based on both data and predicate abstraction, built on several model checkers. Similarly, Microsoft uses an MC-based static analyzer to review its code: its SLAM2 tool applies a model checking approach to abstracted C program statements to identify program defects and eliminate false warnings [30]. In [31], the authors benchmark the LABMC model checker for false alert reduction, adding loop abstraction before running LABMC. The authors of [32] detect FPAs using deductive verification to check whether the source code at the position reported by an alert conforms to a coding standard such as SEI CERT C or ANSI/ISO. The authors of [33] aim to improve the scalability of model checking so that it can handle the massive number of false-positive alerts generated by SATs. They introduce a new kind of variable, named complete-range non-deterministic values (cnv), to reduce and avoid redundant verification calls to the model checker, which are mostly responsible for generating false-positive alerts. Another use of system verification techniques is the employment of Satisfiability Modulo Theories (SMT) solvers to identify true and false alerts. In [34], the authors first use an abstraction-based analysis to review the code quickly and then link alarms to the related code snippets. After transforming the alerts into SMT formulae, they use a solver to check whether the warnings are valid.
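How a model checker confirms or refutes an alarm can be illustrated with a deliberately tiny explicit-state sketch. Assuming an SAT has raised a possible buffer-overflow alarm on the access `buf[i]` inside a loop, we hand-build a transition system for that loop (with a hypothetical fixed bound `N = 4`), enumerate every reachable state by breadth-first search, and consider the alarm false when no reachable state violates the bounds check. Real tools such as SLAM2 or LABMC work on automatically derived abstractions rather than hand-written models like this one.

```python
from collections import deque

N = 4  # hypothetical: buf has N elements and the loop bound is N


def find_violation(initial_states, successors, is_error):
    """Explicit-state model checking: breadth-first search of all reachable
    states. Returns a counterexample path to an error state, or None."""
    seen = set(initial_states)
    queue = deque((s, [s]) for s in initial_states)
    while queue:
        state, path = queue.popleft()
        if is_error(state):
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None


def loop_model(inclusive_bound):
    """Transition relation of: for (i = 0; i < N (or <= N); i++) buf[i] = 0;
    A state is (program_counter, i)."""
    def successors(state):
        pc, i = state
        if pc == "head":
            stay = i <= N if inclusive_bound else i < N
            return [("body", i)] if stay else [("exit", i)]
        if pc == "body":           # the access buf[i] happens here
            return [("head", i + 1)]
        return []                  # "exit" has no successors
    return successors


def out_of_bounds(state):
    pc, i = state
    return pc == "body" and not (0 <= i < N)


start = [("head", 0)]
print(find_violation(start, loop_model(False), out_of_bounds))     # None: alarm is false
print(find_violation(start, loop_model(True), out_of_bounds)[-1])  # ('body', 4): alarm is real
```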

D. Data Mining based Approaches
Data Mining (DM) techniques are used to identify hidden, potential, and valuable patterns in large amounts of data [35]. DM is designed to extract rules from vast amounts of data for use by humans or other automated techniques [36]. It is frequently combined with ML techniques, which use DM-generated patterns as features to train the ML model [37]. SATs use DM techniques to identify false-positive alert patterns for further filtering. The authors of [38] use a frequency-based algorithm to discover similar patterns in SAT warnings: they transform the generated warnings into composed traces, compute their similarity with a DM technique based on the frequencies of similar patterns, and then use the patterns to filter false alerts. In [39], the authors use the Stochastic Gradient Descent (SGD) technique to reduce the complexity of finding patterns in the set of important alerts from several SAT reports, and then use AdaBoost to build a stronger classifier trained on the SGD output.
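The frequency-based filtering idea attributed to [38] can be sketched as follows, assuming each warning has already been turned into a trace of program events (the event names below are hypothetical). Event pairs (contiguous 2-grams) occurring in at least a minimum fraction of the traces of known false alerts become patterns, and a new warning whose trace contains such a pattern is flagged as a likely false alert. This simple n-gram support count stands in for the more elaborate similarity computation of [38].

```python
from collections import Counter


def ngrams(trace, k):
    """Set of contiguous length-k event sequences in a warning trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def frequent_patterns(traces, k=2, min_support=0.5):
    """Patterns that occur in at least min_support of the given traces."""
    counts = Counter()
    for trace in traces:
        counts.update(ngrams(trace, k))    # a set, so each trace counts once
    threshold = min_support * len(traces)
    return {pattern for pattern, c in counts.items() if c >= threshold}


def likely_false(trace, patterns, k=2):
    """Flag a new warning whose trace contains a known false-alert pattern."""
    return bool(ngrams(trace, k) & patterns)


# Hypothetical traces of warnings already confirmed as false positives.
known_false = [
    ["alloc", "null_check", "deref"],
    ["alloc", "null_check", "use", "deref"],
    ["read", "null_check", "deref"],
]
patterns = frequent_patterns(known_false)
print(sorted(patterns))  # [('alloc', 'null_check'), ('null_check', 'deref')]
print(likely_false(["param", "null_check", "deref"], patterns))  # True
print(likely_false(["alloc", "deref"], patterns))                # False
```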

E. Rule based Approaches
Rule-based approaches manipulate knowledge to interpret information in a useful way. Rules are either provided by humans or generated automatically by machine learning algorithms [38]; the latter, called rule-based machine learning, is widely used to enhance SAT precision and reduce FPAs. The authors of [40] design and implement bug detection software based on a set of rules extracted from manual inspection of software patches. They refine the rules with a feedback-based approach, improving them each time their SAT reports a false alert. In [41], the authors propose an extension to industrial static analyzers that uses expert knowledge, in the form of rules, to fix the multiple locations of frequent warnings. The extension handles only one false alert type: it detects the alert's name and applies a rule-based knowledge algorithm to check whether the alert is true. The authors of [4] propose a new algorithm to distinguish true positive from false-positive alerts by identifying the connection between CWE entries and false positives in order to extract new rule-based patterns.
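A rule-based filter in the spirit of [41], where expert knowledge decides whether a given alert type is true, can be sketched as a list of predicates over alerts. The alert kinds, context keys, and the two rules below are hypothetical; with feedback-based refinement as in [40], a new rule would be appended each time a reviewer confirms a false alert.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Alert:
    kind: str                                   # the checker's alert name
    line: int
    context: Dict[str, bool] = field(default_factory=dict)


Rule = Callable[[Alert], bool]   # returns True when the alert is a known false positive

RULES: List[Rule] = [
    # Hypothetical rule: a null dereference guarded by a null check is spurious.
    lambda a: a.kind == "NULL_DEREF" and a.context.get("null_checked", False),
    # Hypothetical rule: an overflow alert whose index was proven in range is spurious.
    lambda a: a.kind == "BUFFER_OVERFLOW" and a.context.get("index_in_range", False),
]


def apply_rules(alerts: List[Alert], rules: List[Rule]) -> Tuple[List[Alert], List[Alert]]:
    """Split alerts into (kept, filtered) using the expert rules."""
    kept, filtered = [], []
    for alert in alerts:
        (filtered if any(rule(alert) for rule in rules) else kept).append(alert)
    return kept, filtered


alerts = [
    Alert("NULL_DEREF", 10, {"null_checked": True}),
    Alert("NULL_DEREF", 42),
    Alert("BUFFER_OVERFLOW", 7, {"index_in_range": True}),
]
kept, filtered = apply_rules(alerts, RULES)
print([a.line for a in kept], [a.line for a in filtered])   # [42] [10, 7]
```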

F. Semantics based Approaches
Semantics refers to the meaning of language constructs. According to Euzenat, it "provides the rules for interpreting the syntax which does not provide the meaning directly but constrains the possible interpretations of what is declared" [42]. The semantic approach uses mathematical logic to build rules describing the constructs and relations identified in the program code. In [43], the authors use the logic programming language Datalog to build URSA, a declarative static analyzer that asks the user interactive questions to identify alarm root causes. This tool augments the semantics of Datalog to control its over-approximation. The authors of [44] define new abstract domains that specify software violations; they determine these domains using finite state machines and use them with a semantics-based static analyzer. In [45], the authors propose an algorithm that generates a program graph, which is used together with a static analyzer report to prioritize true bugs and reduce false alerts. Their main contribution is extracting semantic information to calculate the severity level of warnings and then using the graph algorithm to prioritize SAT alerts.
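The declarative style of [43], where analysis facts are derived by logic rules, can be illustrated with a naive bottom-up evaluation of two Datalog-style rules written directly in Python: `tainted(X) :- source(X).` and `tainted(Y) :- assign(Y, X), tainted(X).` An alarm is raised at a sink that receives a tainted variable. This hypothetical taint example was chosen for brevity; it is not URSA and ignores URSA's user interaction and over-approximation control.

```python
def tainted_fixpoint(sources, assigns):
    """Naive bottom-up evaluation: apply the rules until no new fact is derived.
    Rule 1: tainted(X) :- source(X).
    Rule 2: tainted(Y) :- assign(Y, X), tainted(X)."""
    tainted = set(sources)                   # rule 1
    changed = True
    while changed:                           # iterate to the least fixpoint
        changed = False
        for dst, src in assigns:             # rule 2
            if src in tainted and dst not in tainted:
                tainted.add(dst)
                changed = True
    return tainted


def alarms(sinks, tainted):
    """alarm(S) :- sink(S, X), tainted(X)."""
    return {site for site, var in sinks if var in tainted}


assigns = [("b", "a"), ("c", "b"), ("e", "d")]   # assign(Y, X) facts, i.e. y = x
sinks = [("query@12", "c"), ("query@30", "e")]    # hypothetical sink sites
facts = tainted_fixpoint({"a"}, assigns)
print(sorted(facts), alarms(sinks, facts))  # ['a', 'b', 'c'] {'query@12'}
```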

G. Slicing based Approaches
Program slicing is mainly used to reduce the complexity of code analysis by reducing the original program to a minimal form, called a slice, that keeps the same behavior with respect to a point of interest [46]. It computes the set of program statements, called a program slice, that may affect the values at that point of interest. Slicing is widely used in program debugging to locate errors more easily [47]. There are two types of slicing techniques: static and dynamic program slicing. A static slice, according to Weiser's original definition, consists of all statements in a program that may affect the value of a specific variable at a certain statement [46]. In contrast, a dynamic program slice "contains all statements that actually affect the value of a variable at a program point for a particular execution of the program rather than all statements that may have affected the value of a variable at a program point for any arbitrary execution of the program" [48].
The slicing approach proposed in [49] decomposes the program into several executable slices and runs dynamic analysis over each of them, which reduces processing time and complexity and consequently reduces false alarms. The authors of [50] focus directly on the code slice related to each alarm and verify its correctness: after applying static analysis to Java EE code, they slice the code based on the linked alert, transform it into executable slices, and verify the code again while filtering out false alarms.
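Weiser-style static backward slicing can be sketched for the simplest case: straight-line code without branches or loops, where only data dependences matter. Each statement is modeled as the variable it defines plus the variables it uses; walking backwards from the slicing criterion keeps every statement that defines a still-relevant variable. This program representation is a simplification assumed for illustration; real slicers also follow control dependences and handle aliasing.

```python
from typing import List, Set, Tuple

Statement = Tuple[str, Set[str]]   # (defined variable, used variables)

PROGRAM: List[Statement] = [
    ("n", set()),                  # 0: n = read()
    ("total", set()),              # 1: total = 0
    ("count", set()),              # 2: count = 0
    ("total", {"total", "n"}),     # 3: total = total + n
    ("count", {"count"}),          # 4: count = count + 1
    ("avg", {"total", "count"}),   # 5: avg = total / count
    ("msg", set()),                # 6: msg = "done"
]


def backward_slice(program: List[Statement], criterion_line: int,
                   variables: Set[str]) -> List[int]:
    """Indices of statements that may affect `variables` right after `criterion_line`."""
    relevant = set(variables)
    kept = []
    for line in range(criterion_line, -1, -1):
        defined, used = program[line]
        if defined in relevant:
            kept.append(line)
            relevant.discard(defined)   # this definition satisfies the demand...
            relevant.update(used)       # ...but creates demands for its operands
    return sorted(kept)


print(backward_slice(PROGRAM, 6, {"total"}))   # [0, 1, 3]
print(backward_slice(PROGRAM, 6, {"avg"}))     # [0, 1, 2, 3, 4, 5]
```

Running an SAT only on the slice for an alarm's variable, as in [50], means analyzing statements 0, 1, and 3 instead of the whole program when the alarm concerns `total`.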

V. COMPARISON AND ANALYSIS OF RECENT EXISTING EFFORTS
This section presents the data extracted from the 30 selected papers after several rounds of peer review, as depicted in Table III. The analysis starts with the bugs processed by each paper, called bug coverage, to determine whether the proposed approach deals with all security bugs or focuses only on some types. According to Table III, 53.3% of the approaches filter all kinds of defects in general. However, a considerable share, about 46%, focuses only on specific types of defects, so these approaches cannot be used without being combined with other methods. Also, only 43% of the papers explicitly aim to reduce false-positive alerts while maintaining high accuracy in detecting true security bugs.
Next, the paper shows the categorization of the approaches detailed in Section IV. ML-based approaches are clearly the most widely used to reduce false alerts, which is expected since ML techniques outperform other methods when treating big data. However, the main issue of ML-based approaches is that they need a large amount of labeled data to reach satisfactory accuracy. ML-based techniques are also combined with model checking methods to better verify source code properties, extract features, and predict or classify alerts. All identified papers that use ML techniques apply them to the software source code or to SAT alert reports.
Data mining-based approaches are used in four papers to reduce false-positive alerts, and none of the identified articles using DM methods is applied at the SAT source code level. This can be explained by the fact that SATs check software source code with verification techniques based on knowledge rules rather than with ML- or DM-based models.
Model checking-based approaches use logical rules to verify source code properties or whether alerts are true. MC methods are applied at all integration levels.
Root cause-based approaches are used to locate the causes of alerts in the examined software source code. Thus, all papers using root cause-based approaches apply their methods to both the software source code and the SAT alert reports.
Semantics-based approaches are generally used to extract source code properties that the SAT then uses as patterns and features.
Slicing-based approaches are mostly used to reduce source code complexity by decomposing the code into small slices with the same behavior and then running the SAT over the reduced programs, which improves its soundness without raising a large number of false alerts.
Rule-based approaches are used only with software source code, to extract rule patterns that ML- or DM-based techniques then use to predict or classify alerts.
The supported languages feature shows which programming languages research has focused on. Since C is memory-unsafe, most SATs are dedicated to analyzing C code. Consequently, most approaches to FP alert handling are derived from the static analysis of applications written in C.
The scalability feature describes how easily a proposed approach can be adopted by most users. Fig. 3 distinguishes three application levels of false alert handling approaches: the software source code level, the static analyzer source code level, and the static analyzer alert report level. Fig. 3 also summarizes the most used approach categories at each application level. This paper considers approaches that handle false alerts only from SAT reports to be the most scalable, because they process SAT reports directly without any preprocessing, which avoids inconvenience when adopting the approach. Meanwhile, approaches already integrated into SAT tools are also easy to use, since the only difficulty, the integration step, has already been handled by the approach's authors.
The human effort feature shows the extent to which an approach relies on human intervention. A method that does not require human interaction is considered more effective, scalable, and time- and cost-saving. Almost all ML- and DM-based approaches require moderate to extensive human intervention because of the labeling effort needed to train the models. A few ML-based approaches require less human effort because they use clustering to label the dataset and then use it to train an ML model. Table III shows that almost all the proposed approaches that do not require human intervention are applied to the SAT alert report.
The false alerts reduction rate feature records the reduction rate of false alerts reported by the papers' authors. Some papers explicitly present the reduction rate, while for other articles the FPA reduction rate is deduced. Almost all approaches stay below a 90% reduction rate, except [41], which reaches a 100% reduction rate but for only one type of alert.
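For reference, the reduction rate compared above can be written as follows. This common definition is our assumption, since the papers do not all state how they compute it; FP_before and FP_after denote the number of false alerts before and after applying an approach, and the numeric example uses hypothetical counts.

```latex
R_{\mathrm{FPA}} = \frac{FP_{\mathrm{before}} - FP_{\mathrm{after}}}{FP_{\mathrm{before}}} \times 100\%,
\qquad \text{e.g.}\quad \frac{200 - 15}{200} \times 100\% = 92.5\%
```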

VI. AN OVERVIEW OF THE USED DATASETS
Finding or creating an effective dataset that reflects the real issues and complexity of software source code analysis to validate SATs is of paramount importance. To facilitate the identification of valuable datasets, this study extracts the relevant datasets used by the selected papers and presents their related information. It is worth noting that several papers do not explicitly provide the dataset used, while others use anonymized datasets for privacy reasons. This article therefore presents only the papers that provide open datasets. Table IV gives the dataset name or the reference of the paper using it. All the programs can be found through a quick Google search.
Where the papers' authors provide them, this study also lists the features extracted from each dataset to train models that predict or classify alerts. The number of Lines Of Code (LOC) of each program is also given to indicate the dataset's size. Notably, almost all datasets are unlabeled, and authors do not share their manual labeling of SAT alert reports. Only two datasets, from NIST and OWASP, provide guidelines for labeling SAT alerts.

A. Interesting Dataset Projects and Competitions
This section presents interesting community projects aiming to provide accurate datasets and enhance static analysis verification research.

1) Juliet Dataset:
The National Institute of Standards and Technology (NIST) provides the Software Assurance Reference Dataset (SARD) to users, researchers, and security assurance developers for evaluating SATs and testing their methods. SARD includes a set of well-known security flaws as test cases covering all phases of the software development life cycle. It also covers a large variety of vulnerabilities, languages, platforms, and compilers. The dataset suits a wide range of user needs, since it includes in-the-wild, synthetic, and academic test cases, and it is intended to be a broad effort contributed from many sources.

2) OWASP Benchmark Project:
The OWASP Benchmark Project aims to address the difficulty of testing software defect detection tools and to study their weaknesses, strengths, and analysis time. OWASP provides a Java test suite designed to evaluate the accuracy, coverage, and speed of software vulnerability analysis and detection tools. The benchmark provides test cases covering many kinds of vulnerabilities and a scoring tool that scores SAT-generated alerts and computes the percentages of true positive, false negative, true negative, and false positive alerts.
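The kind of computation performed by such a scoring tool can be sketched as follows, assuming each test case carries a ground-truth flag (real vulnerability or not) and a flag saying whether the SAT reported it. The overall score shown is the true positive rate minus the false positive rate, which, to our understanding, is the figure reported on OWASP Benchmark scorecards; this sketch is not the OWASP scoring tool itself, and its input format is hypothetical.

```python
def score_results(results):
    """results: iterable of (is_real_vulnerability, sat_reported) pairs,
    one per benchmark test case."""
    tp = fn = tn = fp = 0
    for real, reported in results:
        if real and reported:
            tp += 1          # real flaw found
        elif real:
            fn += 1          # real flaw missed
        elif reported:
            fp += 1          # false alert on safe code
        else:
            tn += 1          # safe code correctly left alone
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp,
            "TPR": tpr, "FPR": fpr, "score": tpr - fpr}


# Hypothetical run: 4 vulnerable and 4 safe test cases.
results = [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 3
print(score_results(results))
# {'TP': 3, 'FN': 1, 'TN': 3, 'FP': 1, 'TPR': 0.75, 'FPR': 0.25, 'score': 0.5}
```

Subtracting the FPR penalizes a tool that flags everything: such a tool reaches a TPR of 1 but also an FPR of 1, for a score of 0.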

3) Competition on Software Verification:
The European Joint Conferences on Theory and Practice of Software (ETAPS) has organized an international competition on software verification every year since 2012 to encourage the invention of new methods, technologies, and tools that improve the software analysis process.
In the training phase, the organizers provide SAT developers with several benchmark programs covering a wide range of CWE weaknesses. In the evaluation phase, the submitted verifiers are executed, and the number of solved instances and the runtime are measured. Researchers can find valuable programs to use as datasets on the ETAPS website and in each year's competition results.

B. Papers' Identified Security Bugs
Table V presents the security bugs handled in the different papers to enhance SAT precision in detecting potential security bugs without increasing the FPA rate. The paper [19] is the only one that considers almost all security bugs during the FPA reduction process.
This section answers research questions RQ4 and RQ5 by presenting the datasets used, the identified bug types, and the relevant projects and competitions.

VII. DISCUSSION: SHORTCOMINGS AND RECOMMENDATIONS
Reviewing the identified approaches, this study found several shortcomings that decrease the effectiveness of false alert handling methods, mainly:
• almost all cited papers use open programs that they labeled themselves without sharing their alert labeling datasets, which prevents other researchers from reproducing the proposed methods.
• only a few papers consider combining more than one technique to handle alerts.
• almost all proposed methods rely on refining and analyzing the software source code, which is less scalable than handling the SAT report directly.
• machine learning-based techniques require labeling effort to examine and validate the proposed approach, which discourages many researchers from using ML techniques.
• most of the proposed SAT false alert reduction approaches cover C, C++, and Java, whereas languages such as Python, which is used extensively with big data, rarely receive researchers' attention.
To address these shortcomings, this study recommends focusing more on:
• combining several techniques, in parallel or sequentially, to obtain better accuracy and a lower FPA rate; existing studies on method combination show promising results [5].
• the slicing approach, to decompose large applications into small slices, since SATs are most useful on small programs.
• processing SAT reports directly, to provide better scalability and easier testing; the large size of alert reports makes them a suitable dataset for deep learning techniques.
• labeling SAT alert reports with active learning techniques to reduce human effort [52].
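Uncertainty sampling, one common active learning strategy (chosen here for illustration; [52] may use a different one), can be sketched as follows: given a model's estimated probability that each unlabeled alert is a true positive, the alerts the model is least sure about (probability closest to 0.5) go to human reviewers first, and the model is re-scored on the new labels before the next round. The alert ids, probabilities, and reviewer are hypothetical, and any probabilistic classifier could supply `predict_proba`.

```python
def select_for_review(probabilities, batch_size):
    """probabilities: dict alert_id -> estimated P(true positive).
    Returns the batch_size alerts with the most uncertain predictions."""
    return sorted(probabilities, key=lambda alert: abs(probabilities[alert] - 0.5))[:batch_size]


def active_labeling(unlabeled, predict_proba, ask_human, rounds, batch_size):
    """Score the unlabeled alerts, ask a human to label the most uncertain
    ones, and repeat. predict_proba receives the labels gathered so far so
    the caller's model can be retrained on them."""
    labels = {}
    for _ in range(rounds):
        pool = [a for a in unlabeled if a not in labels]
        if not pool:
            break
        probs = predict_proba(pool, labels)
        for alert in select_for_review(probs, batch_size):
            labels[alert] = ask_human(alert)
    return labels


probs = {"a1": 0.95, "a2": 0.52, "a3": 0.10, "a4": 0.40}
print(select_for_review(probs, 2))   # ['a2', 'a4']

# Toy loop: a fixed "model" and a reviewer who labels everything FP (both hypothetical).
labels = active_labeling(list(probs), lambda pool, known: {a: probs[a] for a in pool},
                         lambda alert: "FP", rounds=2, batch_size=2)
print(labels)   # {'a2': 'FP', 'a4': 'FP', 'a3': 'FP', 'a1': 'FP'}
```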

VIII. CONCLUSION
This paper studied recent efforts dealing with SAT alert handling. It provides a new categorization of the techniques used as well as a comparison of the proposed methods. It then presents the datasets used to test and validate the different approaches, along with information about their size, features, and contained bugs. Finally, it summarizes the shortcomings of existing approaches and offers recommendations for future research to improve SAT false alert handling.
As future work, an in-depth investigation of slicing SAT alert reports and processing them with ML-based techniques should help preserve SAT scalability while benefiting from the high classification accuracy of ML-based methods.