Effective Prediction of Software Defects using Random-tree Entropy based Feature Selection Framework

Software systems have grown in size and complexity, which makes software errors harder to prevent. As a result, forecasting the frequency of software module failures is critical to a developer's efficiency. Many methods for detecting and correcting defects exist, yet Machine Learning (ML) classification performance still needs to be greatly improved. In this study, a novel approach is therefore proposed for predicting the number of software defects from relevant variables using ML. First, the entropy of each raw feature is computed, and the features used by an un-pruned random tree are identified. The relevant features are then selected as those that appear in both the entropy-based and the un-pruned random-tree results. Finally, the National Aeronautics and Space Administration (NASA) PC-1 software defect dataset is passed to an ML-based model to estimate the number of faults. The initial PC-1 dataset comprises 37 raw features, of which only 8 critical features are used to enhance the ML model. The experimental results show that the random tree feature selection strategy is accurate and can potentially outperform existing methods. The proposed method outperformed current ML models, obtaining an accuracy of 97.76% with the Random Forest (RF) model.


I. INTRODUCTION
In recent years, researchers have sought different techniques and tools for improving the quality, dependability, and reliability of software systems [1]. A software defect can cause anything from minor inconvenience to catastrophic failure. Recent research in software fault prediction (SFP) supports pre-deployment fault prediction for testing. Object-oriented programming is harder than procedural programming because of inheritance. By identifying faulty software modules before the start of the testing process, software defect prediction can help enhance software quality and testing efficiency. These findings help software engineers allocate scarce resources to the modules that are more prone to failure. Complex software applications can deliver highly efficient, accurate, and powerful work to modern organizations [2]. Software defect prediction (SDP) has grown in popularity during the previous two decades. The results of SDP assist in allocating resources for software testing. However, defect prediction is often employed for activities with a high degree of precision. It is difficult to ensure resource allocation prior to software testing or without prior execution data. Machine learning is used to identify problematic modules, as it reveals hidden patterns in software properties [3]. The feature selection activity removes features that contribute little to classification performance [4]. The variant selection activity selects the best versions of classification methods for their ensemble [5].
A data collection method based on regular expressions and bug-code linking is proposed in [6]. In terms of accuracy and consistency, this strategy outperforms other commonly used data collection methods and their publicly available datasets [7]. Around 65 publicly available base datasets containing Chidamber and Kemerer (CK) and other inheritance indicators were used to determine the effect of inheritance on SFP [8]. The authors investigate the degree to which an inheritance metric accurately predicts software fault proneness. Additionally, they choose CK measures and inheritance metrics for predicting software problems. In SFP experiments, metrics such as exclusive usage and inheritance viability are analyzed [9]. The authors combed publicly available data sets and discovered approximately 40 that contained inheritance metrics. Their initial cleanup included nine metrics relating to inheritance.
They preprocessed the selected data sets and then merged them using all possible combinations of inheritance metrics. The study [10] examined defect prediction datasets. No memory data management strategy is proposed, nor is a mechanism for defect detection. The proposed technique for defect prediction keeps track of the error rate performance. The defect prediction detector initiates the generation of defect, warning, and control flags. The proposed technique outperforms the conventional technique (p-value 0.05), and within-group comparisons yield statistically significant effect sizes. The authors observe that increasing the error rate results in DP, which results in suboptimal prediction performance. To overcome the difficulties associated with zero-value thresholds, a spectral classifier based on the median absolute deviation threshold was developed [11]. Rather than using a measure of central tendency, this method makes use of the dispersion of eigenvector values. The report's baseline technique is a zero-value threshold spectral classifier, and the entity class is predicted using a heuristic technique. The highest co-entropy criteria [12] successfully handle the non-Gaussian noise for SDP. A new classifier is created after instance filtering, feature selection, and reduction. It also finds a non-normal distribution for the 21 most significant software indicators. The hybrid feature selection (HFS) [13] is divided into two stages: it first clusters features using hierarchical agglomerative clustering and then eliminates un-normalized and duplicate features using two wrapper methods. Three distinct classifiers were evaluated empirically on 11 well-studied NASA programs using four performance metrics: accuracy, precision, recall, and F-measure.
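The idea of thresholding on dispersion rather than on zero, as described for the spectral classifier in [11], can be illustrated with a small sketch. This is not the method of [11]: the cutoff rule (median plus `k` times the median absolute deviation), the function names, and the parameter `k` are all assumptions made purely for illustration.

```python
from statistics import median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def mad_threshold_labels(values, k=2.0):
    """Flag values lying more than k MADs above the median.

    Hypothetical rule: the cutoff median + k * MAD is an assumption,
    not the threshold definition used in [11].
    """
    cutoff = median(values) + k * median_absolute_deviation(values)
    return [v > cutoff for v in values]


# Example: stand-in scores for six entities (for instance, eigenvector entries).
scores = [1, 2, 3, 4, 5, 20]
flags = mad_threshold_labels(scores, k=2.0)
```

Because the cutoff follows the spread of the scores, it does not depend on the scores being centred at zero, which is the difficulty a zero-value threshold runs into.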

II. RELATED WORK
To predict defects on NASA datasets, decision tree (DT), random forest (RF), Naive Bayes (NB), multi-layer perceptron (MLP), radial basis function (RBF), support vector machine (SVM), and k-nearest neighbour classifiers are used [14]. Precision, recall, F-measure, accuracy, Matthews correlation coefficient, and ROC area are used to evaluate classification performance. A two-stage data pre-processing method for software failure prediction models, together with semi-supervised deep fuzzy C-mean clustering feature extraction, is presented in [15]. The main goal is to optimise intra-cluster class and feature using deep multi-clusters of unlabelled and labelled data sets. A new strategy called conditional domain adversarial adaptation (CDAA) [16] can help with a variety of SDP problems. The CDAA has a generator, a discriminator, and a classifier. The generator learns to move between spaces, the discriminator learns to spot the generator's bogus instances, and the classifier learns to classify occurrences appropriately. In the CDAA, both the classifier and the discriminator loss functions propagate to the generator. The enhanced wrapper feature selection (EWFS) [17] method selects features in stages while keeping previous choices in mind. This feature selection improves subset assessment while maintaining model performance. On software defect datasets of various granularities, the DT and NB classifiers were used to evaluate EWFS. In the experiments, this feature selection outperformed existing metaheuristics and sequential search-based WFS techniques.
For feature exploration and categorization, neural forest (NF) [18] combines deep neural network with decision forest. After the neural network, a decision forest is connected to perform classification and guide feature representation learning. For efficient defect prediction, NF combines NN and decision forests, and the performance of this hybrid method is examined [19]. The hybrid approach [20] improved classification accuracy compared to existing methods. This method investigates the relationship between defect density, velocity, and introduction time. An integrated machine learning approach is used in ten PROMISE data sets with 22838 instances.
To see how filter-based ranking feature selection (FRFS) methods [21] affect software defect prediction when other feature selection methods are too computationally costly, the authors empirically examine three large-scale web applications. They then build SVP models using a random forest classifier and seven FRFS methods. To address the prediction model's low classification rates, a hybrid strategy called DELT (diverse ensemble learning technique) [22] is presented, in which unlabelled test modules are predicted by majority voting. The DPAHM (Defect Prediction based association hierarchy method) [23] is used to allocate resources for coarse-level activities. FAHP (Fuzzy Analytical Hierarchy Process) is a prevalent multi-criteria decision-making method [24]. Conversely, this evaluation methodology employs a wide range of performance indicators, so that researchers can place more trust in study findings, avoid misleading conclusions, and set realistic restrictions. The authors employed 11 defect classifiers and 22 prominent performance measurements. The study used the KNIME data mining platform and 12 NASA MDP software defect data sets.
KMFOS addresses the class imbalance problem [25]. It creates additional faulty instances by interpolating between two clusters, so that the new instances spread out across the space of the defective dataset. To reduce the noise, CLNI uses cluster-based oversampling. To develop a heterogeneous defect prediction (HDP) model, a structured unsupervised deep domain adaptation is applied [26]. The authors start by combining data from both source and target projects into one statistic. They then develop an SNN (simple neural network) model to manage the various and class-imbalanced difficulties in SDP. The hybrid defect prediction model [27] uses the cross-entropy loss function as the classification loss function to reduce distribution mismatch. A heterogeneous defect prediction approach [28], [29] addresses the issue of extreme class imbalance in real-world software datasets. In the first step, minority samples in the defect data are balanced using the Majority technique based on Mahalanobis distance. In the second stage, ensemble learning and joint similarity measurement are used to identify the most relevant and representative features across the source and target projects. Finally, knowledge is transferred from the source to the target project inside the Grassmann manifold space.
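The interpolation step mentioned for KMFOS [25] can be pictured with a short sketch. It is only an illustration of generating synthetic minority-class instances on the line segment between two existing ones. How KMFOS forms its clusters and chooses pairs is not described here, so the two input clusters, the uniform random pairing, and the function name are assumptions.

```python
import random


def interpolate_between_clusters(cluster_a, cluster_b, n_new, seed=0):
    """Create n_new synthetic instances between two clusters of defective modules.

    Assumption: one instance is drawn uniformly from each cluster and a new
    point is placed at a random position on the segment joining them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(cluster_a)
        b = rng.choice(cluster_b)
        t = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Example: two small clusters of defective modules with two metrics each.
cluster_a = [[0.0, 0.0], [1.0, 0.0]]
cluster_b = [[10.0, 10.0], [11.0, 10.0]]
new_instances = interpolate_between_clusters(cluster_a, cluster_b, n_new=5)
```

Each synthetic instance falls between the two clusters rather than on top of an existing instance, which is what lets the new faulty instances spread through the minority-class region.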
The PROMISE Source Code (PSC) dataset was created to expand the CNN research's initial PSC dataset [30]. The study used 30-repetition holdout and 10-fold cross-validation. An improved CNN model was then proposed and compared to previous CNN findings and an empirical study. It is used to identify contributing elements and independent variables [31]. Defect-free modules have their bugs replaced by a negative number, while faulty modules have their bugs left alone. The false values of defect-free modules are negated while the false values of defective modules are increased. In the next step, algorithms from NASA, SoftLab, and Promise are used. RKEE [32] is preceded by feature selection and rough set-based KNN noise filtering. In the first stage, redundant features are removed using the feature ranking algorithm. In the second stage, a rough-KNN noise filter removes noisy samples from both minority and majority classes. Both the minority and majority classes deal with ambiguity and overlap. NASA and Eclipse data sets have been used to test the technique. There are considerable discrepancies in data sharing between the source and destination projects, which leads to inconsistencies in metrics. First, a clustering-based metric matching approach is presented. An extracted multi-granularity metric feature vector then unifies the metric dimension while keeping maximum information. A strategy for predicting cross-project defects is proposed in [33]: it converts the project's original feature space into a manifold space, then uses that manifold space to train a superior naive Bayes prediction model. The FSLBDA (few-shot learning based balanced distribution adaptation) technique [34] is proposed for unique defect prediction. Under-sampling can correct class imbalance in defect datasets, but it reduces the size of the training datasets. The authors remove redundant measurements from the datasets using extreme gradient boosting.
Dual innovative approaches [35] are proposed for learning from imbalanced data sets to improve minority class forecasting accuracy. These strategies try to distinguish between oversampling and misclassification costs. Experimental findings, measured by G-mean and AUC, showed that identifying problematic modules accurately reduced detection system costs. Instance weight is determined by information gravity among the source and destination domains, whereas feature load is determined by high correlation with the learning goal, low correlation with other features, and low domain difference. Using 25 real-world datasets, the suggested methodology outperforms existing CPDP (cross-project defect prediction) approaches [36]. The suggested approach builds a better CPDP model by allocating weights based on the varying contribution of characteristics and cases to the predictor.

III. PROPOSED METHOD
This section summarizes the software defect classification framework, as well as the significance of each feature. The dataset has 37 software defect attributes in total, of which 8 significant features are chosen for model performance evaluation. The system block diagram of the proposed framework is shown in Fig. 1. The NASA software defect dataset is analyzed using machine learning models. Six classification methods are trained in this experiment: DT, EB, RF, SVM, LM, and NN. The experiments are carried out in the R programming language, which is used to train the models that classify software defects. The RF random tree and DT entropy computations take the feature values of each measurement and the measurement class as inputs.
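The selection idea of keeping only the features that both an entropy ranking and un-pruned random trees agree on can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments, which were run in R. Python is used here for brevity, and the function names, the binary threshold splits, the vote counting over `n_trees` trees, and the `top_k` cutoff are all hypothetical choices.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, feature):
    """Best (information gain, threshold) for a binary split on one numeric feature."""
    base = entropy(labels)
    best = (0.0, None)
    # Skip the largest value so that the right-hand side is never empty.
    for threshold in sorted({row[feature] for row in rows})[:-1]:
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - remainder > best[0]:
            best = (base - remainder, threshold)
    return best


def random_tree_features(rows, labels, n_candidates, rng, used=None):
    """Grow one un-pruned random tree and return the set of features it splits on."""
    used = set() if used is None else used
    if len(set(labels)) <= 1:
        return used  # pure node: nothing left to split
    candidates = rng.sample(range(len(rows[0])), n_candidates)
    gain, threshold, feature = max(
        (best_split(rows, labels, f) + (f,) for f in candidates),
        key=lambda item: item[0],
    )
    if threshold is None:
        return used  # no candidate feature gives a positive information gain
    used.add(feature)
    for keep_left in (True, False):
        part = [(row, y) for row, y in zip(rows, labels)
                if (row[feature] <= threshold) == keep_left]
        random_tree_features([row for row, _ in part], [y for _, y in part],
                             n_candidates, rng, used)
    return used


def select_features(rows, labels, top_k, n_trees=25, n_candidates=2, seed=0):
    """Indices of features chosen by BOTH the entropy ranking and the random trees."""
    gains = {f: best_split(rows, labels, f)[0] for f in range(len(rows[0]))}
    by_entropy = set(sorted(gains, key=gains.get, reverse=True)[:top_k])
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_trees):
        votes.update(random_tree_features(rows, labels, n_candidates, rng))
    by_tree = {f for f, _ in votes.most_common(top_k)}
    return sorted(by_entropy & by_tree)
```

For the PC1 setting, `rows` would hold the 37 metric values per module and `labels` the defect class, with `top_k` set to the number of features to retain. Taking the intersection is the design choice that mirrors the "identical existence" criterion in the abstract; a union or a weighted vote would retain more features.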

A. Dataset Description
The publicly available NASA PC1 defect dataset, presented in Table I, was used in this study. The dataset contains 759 samples and 37 features. The features include lines of code, normalised cyclomatic complexity, cyclomatic density, essential complexity, maintenance severity, Halstead content, Halstead difficulty, parameter count, and other metrics, as presented in Table II.

IV. RESULT AND DISCUSSION
This section summarizes the experimental results obtained by the various machine-learning models. The experiments are conducted on the NASA PC1 dataset. The results obtained from the various ML models are shown in Table III. First, the ML methods are applied to the dataset with all features, and the resulting confusion matrix is shown in Fig. 2. The accuracy and precision are also calculated and are shown in Fig. 2 and Fig. 3. The sensitivity and specificity are shown in Fig. 4. In the next stage, the ML methods are applied to the dataset with only the significant features, and the resulting confusion matrix is shown in Fig. 6. The accuracy and precision with significant features are presented in Fig. 7, and the sensitivity and specificity are shown in Fig. 8. Although this result is obtained using only 8 of the 37 features, the proposed approach is time-consuming because of the large number of parameters in feature selection. The proposed model uses only the important features and avoids features that do not have a high impact. ML models are used to find the optimal feature selection, and a significant improvement in results is achieved by using the Random Forest method in the selection process.
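For reference, the four measures reported in this section follow directly from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the function names, the choice of `1` as the defective class, and the example labels are illustrative assumptions and are not taken from the PC1 experiments.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the defective class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Safe division: returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity (recall), and specificity."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Made-up example: 3 defective and 5 defect-free modules.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
scores = classification_metrics(*confusion_counts(actual, predicted))
```

On imbalanced defect data, accuracy alone can look high even when few defective modules are found, which is why sensitivity and specificity are reported alongside it.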
The results reveal that the proposed method performs better than existing methods without significant features. With all features, the ML models obtained an accuracy of 97.50% using the Random Forest method. On the same dataset with significant features, the accuracy improved to 97.76%. Six distinct models are investigated for software defect data classification with selected features, and the results of all six classification methods are compared using the outputs of the suggested feature ranking algorithms as input. The experimental results in Table IV show that the suggested features with an RF model have the greatest accuracy score of all six models.
Software defect prediction plays an important role, and preventing and predicting bugs in software at an early stage is difficult and challenging. This work uses machine learning models to evaluate defect prediction with all the features of the NASA dataset. The ML models DT, EB, RF, SVM, LM, and NN are used. The evaluation is carried out with both the significant features and all features. The experimental results are analyzed and summarized based on the confusion matrix, accuracy, precision, sensitivity, and specificity. Accuracy plays a major role, and the error rate is also evaluated; the results are improved by using random forest. The comparison of ML models with all features and with significant features shows improvements. As future work, many more ML models will be applied and compared to obtain more optimal results.

ACKNOWLEDGMENT
The author wishes to thank the College of Computer Sciences and Information Technology, King Faisal University, Saudi Arabia, for providing the infrastructure for this study.