Statistical Analysis for Revealing Defects in Software Projects: Systematic Literature Review

Abstract: Defect detection in software is the process of identifying the parts of software that may contain defects. Software companies continually seek to improve the quality and efficiency of their software projects and to deliver them to their users free of defects and on time. Revealing defects early also helps avoid project failure and saves cost, team effort, and time. These companies therefore need an intelligent model capable of detecting software defects accurately and efficiently. In this review, the search string was applied to all electronic repositories looking for papers published between 2015 and 2021, which resulted in 627 publications. The results focus on three points obtained by linking the manuscript analysis to the results of the bibliometric analysis. First, the number of defects and the number of lines of code are among the most important factors used in revealing software defects. Second, neural networks and regression analysis are among the most important intelligent and statistical methods used for this purpose. Finally, the accuracy metric and the error rate are among the most important metrics used to compare the efficiency of statistical and intelligent models.


I. INTRODUCTION
Software companies aim to improve the quality of software projects in terms of accuracy and efficiency. They spend from 50% to 75% of the total budget of a software project on finding and fixing defects [1]. The projects covered in the CHAOS report vary in size (small, medium, and large) and therefore in cost, and they use a variety of software development methods, such as waterfall and agile. Several software projects failed because of problems in the development and testing phases, as shown in Table I. A standard software development cycle has six phases: planning, analysis, design, implementation, testing, and maintenance. In the development phase, developers modify source code, which may introduce many defects into a software project, so they must take care not to produce new defects when making modifications. The testing phase is crucial: it is responsible for delivering the final product to customers efficiently, without defects, and on time. Many metrics, such as the McCabe and Halstead metrics, help developers find and fix defects, as shown in Table II. Nevertheless, these metrics are difficult to apply in medium and large-scale projects. Thus, developers need a statistical or intelligent model capable of predicting defects in software projects accurately and efficiently.
Many reasons lead to the failure of software development projects, including the project team's lack of experience, lack of knowledge of the programming language, and insufficient experience in the field. Software defects in the development phase are among the most critical problems facing software companies because a large number of defects leads to project failure. Avoiding software defects helps gain clients' trust by providing a quality product. According to the CHAOS report, many software projects still fail for the reasons mentioned above [2]. However, the direct reason for these failures is the emergence of many software defects, as shown in Table I [2].
A comprehensive study of the related work was performed using the PRISMA methodology. The PRISMA statement gives the minimum set of items for reporting a systematic review. It comprises a checklist and a four-phase flow diagram, and its Explanation and Elaboration document provides examples and explanations that clarify the meaning and rationale of each checklist item. Reading the Explanation and Elaboration document is strongly recommended for a clear understanding of PRISMA. The PRISMA flow diagram depicts the flow of information through the different phases of a systematic review. It maps out the number of records identified, included, and excluded, and the reasons for exclusions. Many researchers, such as [3], [4], and [5], have suggested factors for detecting software defects. However, to date, there is no formal study that determines the critical factors that would help software companies detect software defects with a reasonable degree of accuracy. Most researchers, such as [6], also used scientific methods and models to detect software defects, but these models were weak in accuracy and results. Thus, software companies need a formal study that determines the critical factors for building a statistical model capable of detecting software defects with high accuracy.
The paper is organized as follows. Section 2 presents the materials and methods, PRISMA, search questions, and search strategy. Section 3 presents the results with an analysis, and discussion, visualizing analysis and analysis per topic. Section 4 presents the methodology. Finally, in Section 5, we discuss the conclusion.

II. MATERIALS AND METHODS
The methodology is composed of three steps. First, PRISMA was used to find appropriate manuscripts in our research based on the manuscript title and the experimental results of the manuscripts. Second, bibliometric analysis was used to find the common terms that influence the revealing of software defects in terms of critical factors, performance metrics, and intelligent and statistical methods. Finally, the manuscripts were analyzed in detail to extract the most important factors and statistical methods used in detecting software defects and linking them to the results of the bibliometric analysis.
The systematic literature survey presents an evaluation of the scientific community's contributions to the topic of revealing software defects by using a rigorous and auditable methodology based on the PRISMA approach.
The PRISMA method is composed of five phases, as follows:
 Identification of relevant manuscripts of the domain or domains.
 Screening of titles, abstracts, papers without experiments, and position papers.
 Eligibility analysis.
 Full-text screening exclusion.
 Final papers to be analyzed in detail.
A bibliometric map was also adopted to find the relationships between common terms in the software defects domain [8]. To this end, three phases were followed, evaluating the following quantities:
 Word frequency.
 Most common words.
 Frequency of these common words in the final manuscripts of the study.

A. Research Questions
The study aims to provide a state-of-the-art review of current research efforts in revealing defects in software projects. It starts by introducing the reader to specific topics concerning the research objectives and the methods employed. In particular, the survey addresses the following research questions, which aim to identify the techniques that have been applied in the overall domain of revealing software defects (SD):
 RQ1: What kinds of metrics have been adopted in SD?
 RQ2: Which statistical or intelligent techniques have been adopted for SD?
 RQ3: What performance metrics have been adopted in the literature in the prediction of SD?

B. Search Strategy
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 11, 2021
A literature survey generally recommends searching several available journal and conference paper repositories to determine whether similar work has already been performed, which aids in locating potentially relevant studies. The papers were searched in two electronic repositories, Scopus and Web of Science. The topics covered by this study were multidisciplinary, including Software, Computer Science, Engineering, Mathematics, Environmental Science, Telecommunications, and Multidisciplinary Sciences. The analysis showed that most of the publications from Web of Science were in Scopus as well; nevertheless, both repositories were used. A repeated search process was performed to identify publications that have in their titles, abstracts, or keywords the following expressions: "software-defects" (or software defects, or defect, or projects defects) and "machine learning", as shown in Fig. 1.
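The keyword search described above can be sketched as a simple boolean filter over title, abstract, and keywords. This is only an illustration: the record fields, the sample records, and the two shortened term groups are assumptions, and the actual query syntax of Scopus or Web of Science is not reproduced here.

```python
# Hypothetical sketch: a record matches when it mentions at least one
# defect-related term AND at least one technique-related term.
DEFECT_TERMS = ["software-defects", "software defects", "defect"]
TECHNIQUE_TERMS = ["machine learning", "data mining", "neural network", "statistical"]


def matches(record):
    """Return True if title, abstract, or keywords satisfy both term groups."""
    text = " ".join([
        record.get("title", ""),
        record.get("abstract", ""),
        " ".join(record.get("keywords", [])),
    ]).lower()
    has_defect = any(term in text for term in DEFECT_TERMS)
    has_technique = any(term in text for term in TECHNIQUE_TERMS)
    return has_defect and has_technique


# Made-up records, used only to exercise the filter.
records = [
    {"title": "Defect prediction with machine learning",
     "abstract": "We predict faulty modules.",
     "keywords": ["software defects"]},
    {"title": "A survey of agile methods",
     "abstract": "Process improvement.",
     "keywords": ["agile"]},
    {"title": "Statistical analysis of network traffic",
     "abstract": "",
     "keywords": []},
]

selected = [r for r in records if matches(r)]
for r in selected:
    print(r["title"])
```

Plain substring matching keeps the sketch short; a real search would also need to handle word boundaries and repository-specific field codes.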
In phase 1, the search string was applied to all electronic repositories, looking for papers published between 2015 and 2021, which resulted in 627 publications. Phase 2 followed a 5-step approach. In step 1, we excluded manuscripts based on titles (e.g., software defects, regression, and machine learning), which narrowed the set to 211 publications. In step 2, we excluded manuscripts based on abstract screening, which resulted in 117 publications. In step 3, we excluded manuscripts reporting research without experiments, resulting in 83 publications.
Subsequently, in step 4 of phase 2, we excluded position manuscripts, which gave us the final figure of 29 publications, as shown in Fig. 2. In phase 3, manuscripts underwent a full-text reading and review, which led to no exclusions (the result of phase 4).
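The selection funnel reported above can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the text and derives how many manuscripts each step removed; the step labels are paraphrases chosen for illustration.

```python
# Record counts reported in the text after each selection step.
funnel = [
    ("search string (2015-2021)", 627),
    ("title screening", 211),
    ("abstract screening", 117),
    ("papers without experiments removed", 83),
    ("position papers removed", 29),
]


def exclusions(steps):
    """Return (step name, number excluded at that step) for consecutive steps."""
    return [
        (name, previous - count)
        for (_, previous), (name, count) in zip(steps, steps[1:])
    ]


for name, removed in exclusions(funnel):
    print(f"{name}: excluded {removed}")

total_removed = sum(removed for _, removed in exclusions(funnel))
print("total excluded:", total_removed)
```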
As a result of our paper selection approach, the final list included 29 manuscripts (phase 5), which are analyzed in detail in this paper. These were further divided into the following four categories, as shown in Tables III and IV.
 Regression analysis studies to reveal Software Defects.
 Studies of Software Defects Prediction.

Fig. 1. Search string: (software-defects OR defect OR projects) AND (OR "data mining" OR forecasting OR "machine learning" OR "neural network" OR "clustering" OR "artificial intelligence" OR "prediction" OR "predictive" OR "statistical" OR analysis")

III. RESULTS
This section introduces two main parts: the bibliometric analysis and the detailed analysis of previous works. The first part shows the relationships between common terms in intelligent and statistical techniques and the performance metrics used in previous studies. The second part seeks to find the scientific gap among the manuscripts in this study in order to build a novel model that overcomes the issues in revealing defects in software projects.

A. Visualizing Analysis
VOSviewer ("VOS viewer," n.d.), a tool for visualizing bibliometric networks, was used to find common terminology in two areas, software defects and statistical techniques, across the 29 manuscripts under analysis. This tool supported the study with visual information that enabled us to explore the relations between the domains of software defects and statistical techniques. Moreover, it helped to find the most common dimensions, clusters, and techniques able to answer the research questions. Fig. 3 shows a network map that displays the relations between the most popular terms and how they are linked. Each node represents a term, and its size represents the number of times the term appeared in the manuscripts, so larger nodes correspond to more popular terms. VOSviewer splits the terms into clusters according to their relevance to each other.
The analysis was performed on titles and abstracts using a binary counting method over 759 examined keywords with a minimum threshold of 2 occurrences, resulting in 57 terms, as shown in the figure. The largest nodes, which represent the most important term of each cluster in the network map, are "regression" (red), "cluster" (yellow), "software engineering" (green), "neural network" (blue), and "software defect prediction" (purple). Looking closer at the network map in Fig. 3, we can see that the 5 clusters are connected to one another. For instance, the term "regression" is connected to "fault prediction model" in the same red cluster, to "cluster" and "accuracy" in the yellow cluster, to "software engineering" and "recall" in the green cluster, to "neural network" and "feature selection" in the blue cluster, and to "software defect prediction" and "defect density" in the purple cluster. Besides, the term "software engineering" in the green cluster is connected to "cluster" in the yellow cluster, "regression" in the red cluster, and "neural network" in the blue cluster. Moreover, the terms "random forest" and "feature selection" are connected to "neural network" in the blue cluster, "recall" and "software engineering" in the green cluster, "cluster" in the yellow cluster, "regression" and "fault prediction model" in the red cluster, and "software defect prediction" and "defect density" in the purple cluster.
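Binary counting with a minimum-occurrence threshold, as used for the term map, can be illustrated with a small sketch. It is not VOSviewer's implementation: the documents and vocabulary are made up, and terms are matched by plain substring search. Under binary counting a term is counted once per document, no matter how often it is repeated there.

```python
from collections import Counter
from itertools import combinations


def binary_counts(documents, vocabulary):
    """Count, for each term, the number of documents containing it at least once."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        for term in vocabulary:
            if term in lowered:
                counts[term] += 1  # binary counting: at most one per document
    return counts


def cooccurrence(documents, terms):
    """Count how many documents mention each pair of terms together."""
    links = Counter()
    for text in documents:
        lowered = text.lower()
        present = sorted(t for t in terms if t in lowered)
        for a, b in combinations(present, 2):
            links[(a, b)] += 1
    return links


# Made-up titles/abstracts, for illustration only.
docs = [
    "Regression and neural network models for software defect prediction. Regression wins.",
    "Neural network accuracy in software defect prediction",
    "Regression analysis of defect density",
]
vocab = ["regression", "neural network", "software defect prediction", "defect density"]

counts = binary_counts(docs, vocab)
kept = {term for term, n in counts.items() if n >= 2}  # minimum threshold of 2
links = cooccurrence(docs, kept)
print(sorted(kept))
print(links.most_common(1))
```

The pair counts play the role of link strengths between nodes in the network map, while the per-term document counts determine which terms pass the threshold.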
Finally, by analyzing the network map in Fig. 3, it was possible to identify the important terms in each cluster, as follows:
 In the red cluster: "regression" and "fault prediction model."
 In the yellow cluster: "cluster" and "accuracy."
 In the green cluster: "recall" and "software engineering."
 In the blue cluster: "random forest", "feature selection" and "neural network."
 In the purple cluster: "software defect prediction" and "defect density."

B. Analysis per Topic
RQ1 led us to look for metrics, data sources, and critical factors able to reveal software defects. Our review of papers S1 to S26 allowed us to extract such critical factors. Table V shows the variety of metrics used in predicting defects in software projects. The studies S1, S4, and S16 relied on the team dimension (team size and the number of developers) to predict defects in software projects. The studies S2, S3, S4, S7, S8, S12, S15, and S26 relied on software status dimensions (software complexity, number of defects, and software size) to detect defects in those projects. Moreover, the studies S5, S6, S9, S15, S16, S18, and S21 relied on the OOP dimension (coupling between object classes, depth of inheritance tree, and number of methods) to reveal defects. The studies S7, S8, S10, S11, S13, S16, S19, S22, and S26 relied on McCabe metrics (line count of code, cyclomatic complexity, essential complexity, and design complexity) to find the optimal intelligent techniques for predicting defects in software projects. Finally, the studies S1, S3, S11, S13, S16, S17, and S25 relied on Halstead metrics (total operators + operands, effort to write the program, number of delivered bugs, count of lines of comments, and time to write a program) to forecast defects in various software projects. We observed that four factors are the most used in predicting defects in software projects: the number of defects, depth of inheritance tree, number of methods, and line count of code.
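For readers unfamiliar with the McCabe and Halstead metrics named above, the following sketch computes them from basic counts using their standard textbook definitions. The function names and the example counts are illustrative assumptions; the delivered-bugs estimate uses the common V/3000 approximation, and the time estimate assumes the usual constant of 18 elementary decisions per second.

```python
import math


def halstead(n1, n2, N1, N2):
    """Halstead measures from distinct (n1, n2) and total (N1, N2)
    operator and operand counts."""
    vocabulary = n1 + n2
    length = N1 + N2                      # total operators + operands
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume          # effort to write the program
    return {
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
        "time_seconds": effort / 18,      # time to write the program
        "delivered_bugs": volume / 3000,  # estimated number of delivered bugs
    }


def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe cyclomatic complexity V(G) = E - N + 2P of a control-flow graph."""
    return edges - nodes + 2 * components


# Hypothetical module: 4 distinct operators, 4 distinct operands,
# 8 operator occurrences, and 8 operand occurrences.
measures = halstead(n1=4, n2=4, N1=8, N2=8)
print(measures)
print(cyclomatic_complexity(edges=9, nodes=8))
```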

ID | Ref. | Title | Authors
S1 | | Software testing defect prediction model-a practical |
S2 | [11] | A Prediction Model for System Testing Defects using Regression Analysis | M.D. Suffian et al.
S3 | [12] | Integrated Approach to Software Defect Prediction | A. Felix et al.
S4 | [13] | Prediction of defect density for open source software using repository metrics | Verma et al.
S5 | [14] | Identification of latent variables using factor analysis and multiple linear regression for software fault prediction | Sharma et al.
S6 | [15] | Use of Logistic Regression Analysis for Bug Prediction | Sari et al.
S7 | | Metrics as Identifiers of Defect Occurrence Severity | Mausa et al.
S8 | | |
S9 | [18] | An empirical model for fault prediction on the basis of regression analysis | M. Dhillon et al.
S10 | [19] | An empirical model for fault prediction on the basis of regression analysis | Chen et al.
S11 | [7] | Extracting software static defect models using data mining | H.
S12 | [20] | Statistical analysis of metrics for software quality improvement | Karuna P et al.
S13 | | An enhanced evolutionary model for software defect prediction |
S14 | | Defect prediction on a legacy industrial software: a case study on software with few defects |
S15 | [23] | An effective fault prediction model developed using an extreme learning machine with various kernel methods | Kumar et al.
S16 | [24] | Towards building a universal defect prediction model | Zhang et al.
S17 | [25] | An approach of statistical methods for improving software quality | Marandi et al.
S18 | | The impact of using regression models to build defect classifiers |
S19 | [27] | Predicting the number of faults in a software system using genetic programming | S. Rathore et al.
S20 | [28] | Comparative Analysis of Software Defect Prediction Techniques | M. Sirshar et al.
S21 | [29] | Software defect prediction models for quality improvement: a literature study | M. Rawat et al.
S22 | [30] | Software defect prediction models for quality improvement: a literature study | Feng et al.
S23 | [31] | Software defect prediction models for quality improvement: a literature study | Patil et al.
S24 | [32] | Software defect prediction models for quality improvement: a literature study | Jiarpakdee et al.
S25 | [33] | Software defect prediction models for quality improvement: a literature study | Bangash et al.
S26 | [34] | Software defect prediction models for quality improvement: a literature study | S. Morasca et al.

While addressing RQ2, we examined the techniques applicable to predicting defects in software projects. With this goal, we analyzed manuscripts S1 to S26 and noticed that techniques such as multiple linear regression, logistic regression, and machine learning are the most adopted, as shown in Table VI. Multiple linear regression was adopted by 23% of the analyzed manuscripts, whereas statistical analysis and data mining were the choices in 27% of manuscripts. Logistic regression accounted for 27% of the reviewed manuscripts, and machine learning techniques accounted for 19%. Finally, the remaining 4% corresponded to other intelligent techniques. We noticed the following points.
Firstly, the studies S1, S2, S4, S5, and S21 relied on multiple linear regression. S1 presented a model to predict defects in software projects in order to enhance the quality of software testing. It seeks a suitable prediction model that saves software companies effort, cost, and time. Its results show an R square of 0.91 and a standard error of 5.90%. S2 presented a model for predicting defects in software projects to improve the testing process; its adjusted R square in multiple linear regression is 90%. S4 presented a framework to predict defect density in open-source software projects, with an R square of 0.86 in multiple linear regression. S5 presented a model to predict faults in software projects, with an R square of 83% and an adjusted R square of 80%. S21 presented a review study on detecting defects in software projects. It also seeks an optimal model that detects defects efficiently to save cost and time, and it confirmed that regression models achieve high accuracy in detecting defects in software projects.
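Since several of these studies report R square and adjusted R square, a minimal sketch of how the two quantities relate may help. It uses the standard definitions; the defect counts and predictions below are made-up values, not data from any reviewed study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R square for the number of predictors in the regression."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical observed defect counts and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, fitted)
adj = adjusted_r_squared(r2, n_samples=len(observed), n_predictors=1)
print(round(r2, 3), round(adj, 3))
```

The adjusted value is never larger than the plain R square, which is why studies that use many metrics as predictors often report both.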
Secondly, the studies S6, S7, S8, S9, and S10 relied on logistic regression. S6 presented an approach to improve the quality of software projects by detecting bugs efficiently; the standard error of the proposed statistical technique is 0.24. S7 presented a study to detect defects in software projects at an early stage to save effort, money, and time. It depends on statistical techniques such as correlation coefficients and logistic regression, and its results show an accuracy of 91.2% for logistic regression and a correlation coefficient of 0.95. S8 presented a model for predicting defects in software projects; the standard error in logistic regression is 0.19. S9 presented an empirical model to predict faults in software projects using binary logistic regression. Its precision, recall, and F1 measure are 0.65, 0.9, and 0.79, respectively. S10 presented a study to predict software defects just in time by using logistic regression. Its results show that the proposed technique is better than the state-of-the-art methods in terms of accuracy, with an accuracy of 0.73.
Thirdly, the studies S3, S11, S12, S14, S17, S24, and S25 relied on statistical analysis and data mining techniques. S3 presented an approach to forecasting defects in software projects that depends on statistical regression, such as multiple linear regression. Its adjusted R square is 98.6%, and the correlation coefficient is 0.98. S11 presented a model to extract software static defects by using data mining techniques. The accuracy of Association Rules, Decision Tree, Naive Bayes, and Neural Network is 77.2%, 76.6%, 73.2%, and 73.2%, respectively. Thus, Association Rules is better than Decision Tree, Naive Bayes, and Neural Network in terms of accuracy. S12 presented a study to improve the quality of software projects using statistical analysis. Its results were evaluated in terms of the projection of errors (total errors) and the cumulative projection of errors by severity (e.g., serious, moderate, and minor). It shows that total errors in 2016 exceed those in 2015 by 1.5% and that most errors are of minor severity. S14 presented a study to predict defects in legacy industrial software using data mining techniques. The area under the curve (AUC) of Random Forest, Logistic Regression, Decision Tree, Naive Bayes, and a combination of Random Forest + Logistic Regression is 0.73, 0.72, 0.66, 0.67, and 0.75, respectively. Thus, the combination of Random Forest + Logistic Regression is better than Random Forest, Logistic Regression, Decision Tree, and Naive Bayes alone. S17 presented an approach to improve software quality and minimize cost using statistical analysis. Its results were evaluated in terms of standard error, which is 0.13 in the statistical model. S24 presented a study to evaluate the impact of automated feature selection techniques on the interpretation of defect models. It investigated 12 automated feature selection techniques in terms of consistency, correlation, performance, and computational cost. By analyzing 14 publicly available defect datasets, the results showed that the most important inconsistent metrics are highly correlated, with a Spearman correlation of 0.85-1. S25 presented a study to predict defects in software models. It applied the Matthews Correlation Coefficient (MCC) to avoid defects in software models. MCC in F-score is less than 0.01. Therefore, the proposed technique is better than the state-of-the-art methods in terms of MCC.
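As background for the MCC values mentioned for S25, the coefficient can be computed from the four cells of a confusion matrix. The sketch below follows the standard definition; the counts are hypothetical, and returning 0 when the denominator vanishes is a common convention assumed here rather than something taken from the reviewed study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical results for 100 modules: 40 defective modules found,
# 45 clean modules confirmed, 5 false alarms, and 10 missed defects.
print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))
```

Unlike accuracy, the coefficient stays close to 0 for a classifier that labels every module as clean, which matters because defective modules are usually the minority class.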
Fourthly, the studies S13, S15, S16, S18, and S20 relied on machine learning techniques. S13 presented a model to predict software defects by using an enhanced genetic algorithm. Its results were evaluated in terms of precision: the precision of the enhanced genetic algorithm, the genetic algorithm, and the neural network is 0.93, 0.81, and 0.80, respectively. Thus, the enhanced genetic algorithm is better than the genetic algorithm and the neural network. S15 presented a model to predict faults in software projects using extreme learning machines with various kernel methods (e.g., linear kernel, polynomial kernel, and sigmoid kernel). Its results were evaluated in terms of accuracy: the accuracy of the linear, polynomial, and sigmoid kernels is 0.88, 0.93, and 0.91, respectively. Thus, an extreme learning machine using the polynomial kernel is better than one using the linear or sigmoid kernel. S16 presented a model to predict universal defects in software projects using clustering techniques. Its results were evaluated in terms of AUC, which is 0.76 for K-means clustering. S18 presented a model to detect defects in a software project. It depends on object-oriented metrics and relies on several intelligent techniques: linear regression (LR), logistic regression (LG), random forest (RF), support vector machine (SVM), and neural network (NN). Its results were evaluated in terms of AUC: the AUC of LR, LG, RF, SVM, and NN is 0.86, 0.94, 0.91, 0.90, and 0.90, respectively. Thus, LG is better than LR, RF, SVM, and NN. S20 presented a review analysis of defect prediction in software projects. It depends on many metrics, such as product and process metrics, and it introduced a comparative analysis of the Neural Network, Naive Bayes, and Deep Forest techniques based on previous works. It confirmed that Deep Forest is better than Neural Network and Naive Bayes in terms of error rate.
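Several of the studies above compare models by AUC. One way to read that number is as the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one. The sketch below computes it by direct pairwise comparison, which is a simple equivalent of the area under the ROC curve; the labels and scores are made up for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical defect labels (1 = defective) and predicted defect scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))
```

The double loop is quadratic in the number of modules, which is fine for a small illustration; rank-based formulas give the same value more efficiently on large datasets.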
Fifthly, the studies S19, S22, S23, and S26 relied on other intelligent and statistical techniques. S19 presented an approach to predict the number of faults in a software system by using a genetic algorithm. Its results were evaluated in terms of error rate and recall, which are 0.11 and 0.91, respectively. S22 presented a new technique for software defect prediction based on Complexity-based Oversampling. It relied on three main factors: lines of code, number of children, and weighted methods per class. According to the results, the proposed technique is better than the other oversampling techniques under the Wilcoxon rank-sum test and Cliff's effect size. S23 presented a framework to predict software defect type using concept-based classification. Its main objective is to minimize the dependence on labeled training data when automating the software defect type classification task. The results show that the proposed framework outperforms the state-of-the-art semi-supervised approach [LeDEx] in terms of the F1 score, which is 63.16% for the proposed framework and 62.30% for LeDEx. S26 presented a study to assess software prediction models by using the Receiver Operating Characteristic. The results showed that the proposed technique is better than all other state-of-the-art methods in terms of recall and accuracy by 0.4 and 0.8, respectively.
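The Cliff's effect size mentioned for S22 measures how often values from one group exceed values from another. A minimal sketch of Cliff's delta under its standard definition is given below; the two samples are invented and do not come from the reviewed study.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical scores of two techniques over three runs each.
proposed = [3, 4, 5]
baseline = [1, 2, 3]
print(round(cliffs_delta(proposed, baseline), 3))
```

The value ranges from -1 to 1: 1 means every value of the first group exceeds every value of the second, and 0 means the two groups overlap completely.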
Our research helped us to determine several research gaps. It was only possible to identify a few manuscripts (S11 and S13) tackling specific metrics that impact defects in software projects. For example, some studies (S5, S6, S9, S18, and S21) concentrate on OOP metrics in general, with no mention of the line count of code or the number of developers. Only a few manuscripts (S14, S20, S23, and S24) address finding defects in all types of software projects (small, medium, and large). However, stakeholders in software companies seem to find this topic pertinent: they want not only to enhance software efficiency in those projects but also to predict defects early in order to save costs. The results of this survey also showed a significant gap in the field of "intelligent and statistical models," particularly relating to the automatic prediction of defects in software projects. Some of the most promising algorithms are not yet being utilized.
Only a few studies (S18 and S21) tackle the application of "hybrid statistical and intelligent techniques, for instance, logistic regression with multiple linear regression and regression analysis with deep learning," which is a promising technique for forecasting defects in software projects. Moreover, there is a lack of formal studies that identify the critical factors influencing defects in software projects.

IV. METHODOLOGY
This section proposes an approach for a statistical model able to predict defects in software projects. The approach has been used in several data science studies, as is the case of [7]. As shown in Fig. 4, the proposed model covers the following phases:
 State-of-the-art analysis: review the literature to extract the important metrics, data sources, and mathematical and computational approaches used for predicting defects in software projects.
 Data collection: data is collected from the NASA data sets available online. There are two reasons for selecting the NASA data sets. First, it is too hard to collect large amounts of data from software companies to reveal the defects in software projects. Second, the NASA data is vast and of high quality. It describes the static measures and other variables used to detect static defects in software projects, and it includes a binary variable indicating whether a module is defective or not.
 Data analysis and pre-processing: analyze the data in detail and, if necessary, transform it to better expose its information content. Different mathematical techniques may be used, namely outlier removal, discretization, and reduction of the number of variables and/or dimensionality (adopting regression models).
 Feature selection: determine the critical metrics for detecting defects that will be adopted in the proposed IST study by using logistic regression and multiple linear regression. Create a mapping between logistic regression and multiple linear regression to determine the final list of critical metrics capable of predicting defects in software projects.
 Build a model: present a statistical model capable of predicting defects in software projects using multiple linear regression and logistic regression.
 Training and verification of the model: train the model with the data set and verify its ability to predict defects in software projects.
 Comparison: compare logistic regression and multiple linear regression, using the final list of critical metrics, to determine which is better in terms of accuracy, precision, recall, F1 measure, and error rate.
Following this holistic approach, we built a methodology composed of five phases, as shown in Fig. 4.
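The training, verification, and comparison phases can be illustrated with a toy example. The sketch below is not the proposed model: it assumes a single standardized metric as the only predictor, uses invented data instead of the NASA data sets, fits logistic regression with plain batch gradient descent (a training procedure chosen here for brevity), and then derives accuracy, precision, recall, F1 measure, and error rate from the confusion matrix.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit weight w and bias b for one feature by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def confusion(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels, where 1 means defective."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def report(tp, tn, fp, fn):
    """Performance metrics derived from the confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "error_rate": 1 - accuracy}


# Invented data: a standardized complexity score per module and a defect flag.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
predicted = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
metrics = report(*confusion(ys, predicted))
print(metrics)
```

For brevity the sketch evaluates on the training data itself; a real verification step would hold out part of the data set. The same `report` function could be applied to the predictions of a multiple linear regression model (thresholded to a binary label) to compare the two techniques on equal terms.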

V. CONCLUSION
This paper presented a systematic review on the topic of revealing defects in software projects, concentrating on finding answers to our research questions. A bibliometric map was used to find the most used terminology in the statistical techniques and software projects domains. Following the PRISMA approach in our systematic review, we started by identifying 627 papers and ended with a detailed analysis of 26 papers. The research questions covered three major points. Firstly, we identified the factors and metrics that influence revealing defects in software projects. Secondly, we concentrated on identifying the prediction techniques used in this context. Thirdly, we determined the evaluation criteria used by those techniques. There is still room for improvement in the use of statistical and intelligent techniques to reveal defects in software projects.
Finally, a new methodology based on a statistical model able to predict defects in software projects was proposed.
This study succeeded in identifying the critical factors that affect the detection of defects in programs. The statistical analysis is carried out with four methods: MLR-CDF, MLR-PLSDF, LR-CDF, and LR-PLSDF. LR-CDF outperforms all the other proposed methods in terms of accuracy and standard error. In addition, LR-CDF outperforms the state-of-the-art methods (association rules, decision tree, Naive Bayes, and neural network) in accuracy by 9.1%, 10.3%, 13.1%, and 13.1%, respectively.
The study has some limitations. It was restricted by the search keywords selected and by the publication period of the manuscripts (last six years). In addition, it used a fixed number of electronic sources. Furthermore, this study only handled scientific papers written in English, and we cannot guarantee that we picked up all the relevant material for our review.
It is recommended as future work to utilize other techniques in terms of improving the model accuracy and identifying critical factors for revealing defects in software projects.