Bayesian Hyperparameter Optimization and Ensemble Learning for Machine Learning Models on Software Effort Estimation

Abstract: In recent decades, various software effort estimation (SEE) algorithms have been proposed. Unfortunately, achieving high estimation accuracy remains a major challenge in SEE. Traditional techniques and parametric approaches are often inaccurate because their estimates are biased and subjective. Meanwhile, no single machine learning method has performed consistently well. This study applies the AdaBoost ensemble learning method and random forest (RF), and uses Bayesian optimization to determine the hyperparameters of the model. The PROMISE repository and the ISBSG dataset were used to build the SEE model. The developed model was comprehensively compared with four machine learning methods (classification and regression tree, k-nearest neighbor, multilayer perceptron, and support vector regression) under 3-fold cross-validation (CV). The results show that the RF method based on AdaBoost ensemble learning and Bayesian optimization outperforms these approaches. In addition, the AdaBoost-based model assigns a feature importance rating, which makes it a promising tool for software effort prediction.


I. INTRODUCTION
Software effort estimation (SEE) is the process of estimating the effort required to build a software system, expressed in person-months or person-hours [1] [2]. Uncertainty and imprecision characterize software effort estimation environments [3] [4]. SEE is generally framed as a regression problem [5]. Various SEE models in the literature still show considerable performance deviations and are extremely dataset-dependent. SEE has traditionally been performed via expert judgment, analogy, decomposition and recomposition, and parametric techniques [6]. However, these standard SEE methodologies can produce erroneous estimates due to human bias and subjectivity [7]. Machine learning (ML) algorithms are very effective at modeling uncertainty and thereby support better decision-making [8]. However, no single learning method is ideal for all supervised learning tasks [9]. Likewise, a single sophisticated algorithm may not suffice for building current SEE models, because the performance of any model depends mainly on the characteristics of the dataset used, such as its size, outliers, categorical features, and missing values.
Several machine learning techniques have been widely applied in the SEE context and are considered standard choices, such as Case-Based Reasoning (CBR), Artificial Neural Networks (ANN), and Support Vector Regression (SVR) [10], while other ML methods, such as Random Forest (RF), Classification And Regression Tree (CART), and K-Nearest Neighbor (kNN), are still largely ignored. RF is a powerful ensemble learner that is easy to train even on large datasets [11]. RF is widely used in the data mining domain and has achieved good performance on regression and classification problems [12]; it can overcome overfitting and is less affected by outliers [13]. In addition, RF improves prediction accuracy without a significant increase in computational cost, and it is not sensitive to multicollinearity [14]. Some researchers have also applied RF to regression problems in the context of SEE, for example [15][16] [17]. Unfortunately, the RF model can only be extended horizontally, because its decision trees exist in parallel and have equal weight in voting for the final prediction even though some of them may underperform.
Ensemble learning combines several algorithms that process different hypotheses so that their combined predictions perform well [18]. According to Lessmann et al. (2015), ensemble methods are better than single ML and other statistical methods [19]. Moreover, the proper use of an ensemble method can outperform single learners [20], and ensembles are among the best methods for increasing the accuracy and stability of estimation [21]. Averaging, voting, and bagging are three common ensemble approaches that have attracted the interest of machine learning researchers. Meanwhile, more advanced ensemble methods such as stacked generalization, the AdaBoost algorithm, and gradient tree boosting have not been widely tried [22]. Ren et al. (2016) investigated the use of ensembles in classification and regression, including the success of AdaBoost in regression [23]. AdaBoost, a popular boosting algorithm, combines weak estimators and fits them on repeatedly reweighted versions of the data, merging their outputs by weighted majority voting (hard voting) [24]. However, AdaBoost may not offer high accuracy when the dataset is heavily contaminated with noise [25]. On the other hand, according to Martin-Diaz et al. (2017), AdaBoost is also known to achieve a significant reduction in bias [26] and low error variance [24]. Also, it does not easily overfit during training [27].
Unfortunately, the method needs to be fine-tuned to fit the scenario at hand. Automatic hyperparameter tuning saves time and effort when experimenting with different machine learning model configurations, improves algorithm accuracy, and increases reproducibility. Hyperparameter tuning is a well-known approach for achieving optimal performance in machine learning models [28] [29], because determining the best hyperparameter combination can improve the ML model's performance [28] [31]. Several studies have shown that accuracy in SEE is strongly influenced by how the estimator is configured through parameter tuning techniques [30]. However, much of the work seems to implicitly assume that tuning the parameters will not significantly change the results [32]. In the worst case, improper parameter settings may lead to inferior performance [33]. Moreover, default hyperparameter settings may not produce consistent results across datasets [34]. According to [35], there is still limited work investigating the impact of parameter settings for SEE methods on prediction accuracy.
Manual search, grid search, and random search are among the most used hyperparameter optimization strategies [36]. When the hyperparameter space is large, however, these methods are time-consuming and impractical [6]. Manual search requires a high level of professional knowledge, is difficult to carry out without prior experience, and takes time [37]. Grid search suffers from the curse of dimensionality: as the number of hyperparameters, the complexity of the search space, or the range of hyperparameter values increases, the algorithm's efficiency declines dramatically [37]. Random search is more effective in high-dimensional spaces [38], but it is not reliable for training complex models [38]. Although random search automates tuning, it is not guaranteed to reach the global optimum of the objective function.
Compared with these classic hyperparameter optimization techniques, Bayesian optimization is a very successful algorithm for optimizing computationally expensive functions [39]. Bayesian hyperparameter optimization can further improve generalization accuracy [40]. Bayesian optimization is effective for problems with few hyperparameters that are difficult to parallelize [6]. It is therefore a promising way to encourage further use of hyperparameter tuning in SEE.
Motivated by the benefits mentioned above, an AdaBoost ensemble learning method based on RF was developed to capture the relationships between features in an SEE context. Because classic tuning strategies are time-consuming and impractical, Bayesian optimization is used to find suitable model hyperparameters. The model in this paper was built on datasets from the PROMISE and ISBSG repositories. Based on the literature review, AdaBoost and RF ensemble learning methods have seen only limited use in building SEE models. The remainder of this paper begins with related work, followed by the experimental design. The results and discussion, including the comparison of models, come next. The paper ends with conclusions and future work.

II. RELATED WORK
The literature on offline learning does not offer a supervised procedure for automatic parameter setting. In the context of offline SEE, the use of Classification and Regression Tree (CART) with the addition of more innovative features can improve accuracy [28]; the researchers used a grid search strategy to obtain optimal parameters for five machine learning techniques (KNN, Regression Trees (RT), Multilayer Perceptrons (MLP), Bagging+RT, and Bagging+MLP) without generating ensembles. The results revealed that while RT, Bagging+RT, and KNN were unaffected by tuning settings, MLP and Bagging+MLP were affected. Minku (2019) found that Linear Regression in Logarithmic Scale (LogLR) results in more stable prediction performance [41]. Minku and Yao (2013) investigated the RT, Bagging+MLP, and Bagging+RT techniques, which were shown to perform well across several datasets. This suggests that the best parameters to use with a machine learning approach may change over time [35]. In the context of SEE, parameter tuning in Support Vector Regression (SVR) is critical; in particular, a tabu search has been proposed to find the best SVR parameter settings [42]. Elish (2013) conducted an empirical study of five machine learning approaches (KNN, SVR, MLP, Decision Tree (DT), and Radial Basis Function Network (RBFN)) and suggested a heterogeneous ensemble, because a single approach is unreliable owing to its irregular and unstable performance across the studied datasets. Furthermore, the five machine learning approaches were trained with the same configuration across all datasets, and no explanation of parameter tuning was supplied [43].
Villalobos-Arias and Quesada-López (2021) investigated CART, SVR, and ridge regression (RR) using random search and grid search together with six bio-inspired algorithms. The results demonstrate that the Flash+Log+SVR model is the most accurate on most datasets, while the Hyperband+Log+RR model is the most stable on most datasets [44]. In another study, the stacked ensemble that offers the best overall accuracy takes the average of the effort values estimated by Bagging, RF, ABE, AdaBoost, gradient boosting machine, and ordinary least squares regression, all optimized using grid search [25]. Zakrani et al. (2018) used the grid search (GS) method to improve SVR; the results show that this approach can help improve the SVR technique's performance [45]. In [6], stacking ensemble learning is tuned with two hyperparameter optimizers (Particle Swarm Optimization (PSO) and Genetic Algorithms) over the base learners (linear regression, MLP, RF, and AdaBoost regressor); the experimental results reveal that estimation accuracy is higher when hyperparameters are set using PSO [6]. ROME (Rapid Optimizing Methods for Estimation) is a configuration technique that uses sequential model-based optimization (SMO) to identify configuration settings for KNN, SVR, RF, and CART. For both traditional and current tasks, ROME outperforms sophisticated approaches [46]. The accuracy and stability of SVR in SEE were also tested to see how a random search hyperparameter tuning strategy affected them. According to the findings, SVR tuned by random search performed similarly to SVR tuned by grid search [47].
This paper differs from the previous empirical investigations in the context of SEE. An RF-based AdaBoost ensemble learning method is reinforced by Bayesian optimization, which is used to find appropriate model hyperparameters. Four ML techniques, namely classification and regression tree (CART), k-nearest neighbor (k-NN), multilayer perceptron (MLP), and support vector regression (SVR), are used as baselines against which the proposed SEE model is compared.

III. EXPERIMENTAL DESIGN
This section describes in depth the datasets, ensemble learning, hyperparameter tuning, ML parameter settings, and evaluation measures used in this paper.

A. Datasets
The most widely used datasets in the SEE context come from the PROMISE and ISBSG repositories. This paper uses 9 datasets from the public PROMISE repository (also known as SEACRAFT), as well as two sub-datasets from the ISBSG10 and ISBSG18-IFPUG repositories. Table I lists the datasets used in this paper, including the number of projects, features, and categorical features. The china, albrecht, maxwell, nasa93, cocomo81, kitchenham, kemerer, desharnais, and ISBSG10 datasets follow the preprocessing rules used in [28][1], the UCP dataset follows the rules in [48], and ISBSG18-IFPUG follows [44], for eleven datasets in total.

B. Data Preprocessing
Data preprocessing was applied in this study to improve the final prediction accuracy. Data preprocessing is an effective option for effort estimation [50] and a crucial step in improving machine learning performance [51]. Following Famili et al. (1997), the first step removes features of the dataset that are not relevant; machine learning algorithms perform better when irrelevant features are removed [52]. The next step converts categorical data into numeric form. Ordinal encoding assigns a unique numeric code to each category [53]; its advantage is that the dimensionality of the problem space does not increase, because each categorical feature is represented as a single input [54]. Missing data are handled using k-nearest neighbor imputation (kNNI), which has proved efficient for estimating missing attribute values in various software engineering datasets [55]. Finally, the data were normalized to the range [0, 1]. Mensah et al. (2018) found that, across all datasets, the normalized Z-score [0,1] produced the best prediction accuracy compared with Box-Cox and log transformations [56]. Each dataset is split at random into two subsets, training (80%) and testing (20%), to evaluate the predictions of the trained model.
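The preprocessing steps described above can be illustrated with a small standard-library sketch. It covers ordinal encoding, scaling to [0, 1], and the random 80/20 split; kNN imputation is left out for brevity. The function names, the toy records, and the fixed seed are illustrative assumptions, not the authors' implementation.

```python
import random

def ordinal_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows with a fixed seed and hold out test_fraction of them."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

# Hypothetical toy records: development language and size in function points.
languages = ["java", "cobol", "java", "c", "cobol"]
sizes = [120.0, 300.0, 210.0, 90.0, 450.0]

lang_codes, mapping = ordinal_encode(languages)
scaled_sizes = min_max_scale(sizes)
rows = list(zip(lang_codes, scaled_sizes))
train_rows, test_rows = train_test_split(rows, test_fraction=0.2, seed=0)
```

Each categorical feature stays a single column after encoding, which is the dimensionality argument made in the text.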

C. Random Forest Algorithm
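For regression purposes, a random forest is a collection of tree predictors $h(\mathbf{x}; \theta_k)$, $k = 1, \dots, K$, where $\mathbf{x}$ represents the observed input (covariate) vector of length $p$ with associated random vector $X$, and the $\theta_k$ are independent and identically distributed (iid) random vectors. The regression setting has a numeric outcome, $Y$, but shares multiple points of contact with the classification problem (categorical outcome). The observed (training) data are assumed to be drawn independently from the joint distribution of $(X, Y)$ and consist of $n$ $(p+1)$-tuples $(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)$. For regression, the random forest prediction is the unweighted mean over the collection, $\bar{h}(\mathbf{x}) = \frac{1}{K} \sum_{k=1}^{K} h(\mathbf{x}; \theta_k)$. As $K \to \infty$, the Law of Large Numbers ensures [57]:
$$E_{X,Y}\big(Y - \bar{h}(X)\big)^2 \to E_{X,Y}\big(Y - E_{\theta} h(X; \theta)\big)^2 \tag{1}$$
The quantity on the right is the prediction (or generalization) error for a random forest, designated $PE^*_f$. Convergence in Eq. (1) implies that the random forest does not overfit [57]. Define the mean prediction error for an individual tree $h(X; \theta)$ as:
$$PE^*_t = E_{\theta} E_{X,Y}\big(Y - h(X; \theta)\big)^2 \tag{2}$$
Assume that for all $\theta$ the tree is unbiased, i.e., $E\,Y = E_X h(X; \theta)$. This yields:
$$PE^*_f \le \bar{\rho}\, PE^*_t \tag{3}$$
where $\bar{\rho}$ is the weighted correlation between the residuals $Y - h(X; \theta)$ and $Y - h(X; \theta')$ for independent $\theta, \theta'$.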
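To make the unweighted-mean prediction concrete, the following sketch builds a tiny forest of one-split regression stumps on bootstrap samples and averages their outputs. It is a deliberate simplification under stated assumptions: the input is one-dimensional, so there is no random feature selection, and the names `fit_stump`, `fit_forest`, and `forest_predict` are hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D inputs by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:
            continue  # the largest value cannot split the sample
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:
        # All inputs identical in this bootstrap sample: predict the mean target.
        m = mean(ys)
        return lambda x: m
    _, t, l, r = best
    return lambda x: l if x <= t else r

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def forest_predict(trees, x):
    """Forest prediction: the unweighted mean of the individual tree predictions."""
    return mean(tree(x) for tree in trees)

xs = [1, 2, 3, 4, 5, 6]
ys = [10, 10, 10, 20, 20, 20]
trees = fit_forest(xs, ys)
low, high = forest_predict(trees, 1), forest_predict(trees, 6)
```

Every tree contributes with equal weight here, which is the property the introduction identifies as a limitation of RF and which boosting addresses by reweighting.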

D. AdaBoost Ensemble Learning
AdaBoost is a popular variation of the original boosting scheme [27]. It is a robust ensemble approach that fits a collection of weak learners to repeatedly reweighted versions of the dataset. The predictions of the weak learners are merged using weighted summation to produce the final prediction [58]. The AdaBoost regressor is a high-accuracy ensemble learner used to tackle regression problems [59]; AdaBoost.R2, a modified version of AdaBoost.R, is used for regression [27]. Given a weak learner applied to the training set, the algorithm reweights the training data in each boosting iteration. In the first iteration, all training samples carry the same weight. After the weights are updated, the learning algorithm is re-applied to the newly weighted data. The weights of training samples that were predicted poorly in the previous iteration are increased, while the weights of samples that were predicted well are reduced. As a result, each weak learner in the sequence is compelled to concentrate on the samples that the preceding one missed [27]. The AdaBoost.R2 steps, following [60][61], are:
1) Set the initial weight $w_i^{(1)} = 1/N$ for each of the $N$ samples in the training set.
2) As the training set, define the original data's input ($x$) and target ($y$) variables.
3) Fit the regression model $f_t(x)$ to the weighted training set, where $t$ is the boosting iteration.
4) Find the loss value for the $i$-th training sample:
$$l_i^{(t)} = \ell\big(\lvert f_t(x_i) - y_i \rvert\big) \tag{4}$$
Any function whose range lies in $[0, 1]$ can be used as the loss function $\ell$.
5) Calculate the average loss using the normalized weights $p_i^{(t)} = w_i^{(t)} / \sum_{j=1}^{N} w_j^{(t)}$:
$$\bar{L}_t = \sum_{i=1}^{N} l_i^{(t)}\, p_i^{(t)} \tag{5}$$
When $\bar{L}_t$ is smaller than 0.5, an appropriate forecast is produced.
6) If $\bar{L}_t$ is smaller than 0.5, update the weights using:
$$w_i^{(t+1)} = w_i^{(t)}\, \beta_t^{\,1 - l_i^{(t)}}, \qquad \beta_t = \frac{\bar{L}_t}{1 - \bar{L}_t} \tag{6}$$
7) Repeat steps 3-6 until the average loss leaves the required range ($\bar{L}_t \ge 0.5$) or the maximum number of boosting iterations is reached.
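One boosting round of the weight update can be sketched as follows. This follows the standard AdaBoost.R2 formulation with a linear loss (absolute error divided by the largest absolute error), which is an assumption because the text allows any loss with range in [0, 1]. The function name `adaboost_r2_update` and the toy values are hypothetical.

```python
def adaboost_r2_update(weights, y_true, y_pred):
    """Run one AdaBoost.R2 weight update with a linear loss.

    Returns (average_loss, beta, new_weights). beta is None when the average
    loss is at least 0.5, in which case boosting would stop and the
    (normalized) weights are returned unchanged.
    """
    total = sum(weights)
    probs = [w / total for w in weights]

    abs_err = [abs(p - y) for p, y in zip(y_pred, y_true)]
    max_err = max(abs_err)
    if max_err == 0:
        # Perfect fit: every loss is zero and no reweighting is needed.
        return 0.0, 0.0, probs

    # Linear loss, scaled into [0, 1] by the largest absolute error.
    losses = [e / max_err for e in abs_err]
    avg_loss = sum(l * p for l, p in zip(losses, probs))
    if avg_loss >= 0.5:
        return avg_loss, None, probs

    beta = avg_loss / (1.0 - avg_loss)
    # Small loss gives exponent near 1, so the weight shrinks by about beta;
    # large loss gives exponent near 0, so the weight is almost unchanged.
    raw = [p * beta ** (1.0 - l) for p, l in zip(probs, losses)]
    norm = sum(raw)
    return avg_loss, beta, [r / norm for r in raw]

# Four samples with equal weights; only the last one is predicted badly.
avg_loss, beta, new_weights = adaboost_r2_update(
    [0.25, 0.25, 0.25, 0.25], [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]
)
```

After normalization the poorly predicted sample holds half of the total weight, so the next weak learner is pushed to concentrate on it.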

E. Bayesian Optimization of Model Hyperparameters with a Gaussian Process
Bayesian optimization is a useful strategy for locating the extrema of computationally expensive functions [39]. In this paper, Bayesian optimization is used to find the optimum of an unknown function $f$ in the model hyperparameter configuration problem [62] [63]. The objective function $f$ is computationally expensive to evaluate over the compact hyperparameter domain $\mathcal{X}$, and the aim is to minimize $f$ without using gradient information:
$$x^* = \arg\min_{x \in \mathcal{X}} f(x) \tag{7}$$
Thus, the hyperparameter mapping in Bayesian optimization depends on the objective function. The target value is predicted from the historical results $\{(x_i, f(x_i))\}_{i=1}^{n}$. The hyperparameters are found through the following steps: 1) define the model of the historical results, 2) find the optimal parameter, 3) apply the detected hyperparameter to the objective function, 4) update the model with the new result, and 5) repeat steps 2-4 until the threshold value is reached or the time limit is exceeded.
First, a small sample of points $x_1, \dots, x_n$ is drawn uniformly at random from $\mathcal{X}$, and the value of the function is computed at those locations, $f(x_1), \dots, f(x_n)$. Then $f$ is modeled using a probabilistic model, here a Gaussian process. Because the Gaussian process is so flexible and simple to use, Bayesian optimization uses it to fit the data and update the posterior distribution [37]. Given only a finite collection of points $x_1, \dots, x_n$, the posterior provides a probability distribution over functions. The Gaussian process posits that the values $(f(x_1), \dots, f(x_n))$ follow a multivariate Gaussian distribution specified by the mean function $\mu(x)$ and the covariance function $k(x, x')$, where $k$ is a positive definite kernel function (such as the squared exponential kernel, the rational quadratic kernel, or the Matérn kernel). Modeling $f$ with the Gaussian process yields the posterior predictive mean function $\mu_n(x)$ and the posterior predictive marginal variance function $\sigma_n^2(x)$, both defined over the domain $\mathcal{X}$ and computed in closed form [62].
Then the next sampling point, $x_{n+1}$, is chosen from $\mathcal{X}$ to locate the minimum of the function. This choice is controlled by optimizing a proxy, the acquisition function $a(x)$:
$$x_{n+1} = \arg\max_{x \in \mathcal{X}} a(x) \tag{8}$$
This paper uses the expected improvement (EI) acquisition function, which is defined as follows [64]:
$$EI(x) = E\big[\max\big(f_{\min} - f(x),\, 0\big)\big] \tag{9}$$
Here $f_{\min} = \min_{i \le n} f(x_i)$ is the minimum observed value of $f$, and $E$ denotes the expectation over the random variable $f(x)$ under the posterior. Thus, the reward equals the improvement $f_{\min} - f(x)$ when $f(x)$ is less than $f_{\min}$, and there is no reward otherwise. Given the Gaussian process predicted mean and variance functions, the right-hand side of Eq. (9) can be written as:
$$EI(x) = \big(f_{\min} - \mu_n(x)\big)\, \Phi(z) + \sigma_n(x)\, \phi(z) \tag{10}$$
where $\Phi$ is the standard normal cumulative distribution function, $\phi$ is its derivative (the standard normal density), and
$$z = \frac{f_{\min} - \mu_n(x)}{\sigma_n(x)} \tag{11}$$
Based on the above analysis, the basic framework of Random Forest and AdaBoost embedded in Bayesian Optimization is shown in Fig. 1.
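The closed-form expected improvement for a minimization problem can be evaluated with the standard library alone. The sketch below assumes the surrogate has already produced a posterior mean and standard deviation for each candidate; the function names and the candidate tuples are hypothetical and are not tied to any particular optimization library.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_min):
    """Expected improvement over the best observed value f_min (minimization)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    return (f_min - mu) * normal_cdf(z) + sigma * normal_pdf(z)

def next_candidate(candidates, f_min):
    """Pick the candidate (x, mu, sigma) that maximizes expected improvement."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], f_min))[0]

# Two candidates with the same predicted mean but different uncertainty.
chosen = next_candidate([("a", 1.0, 0.1), ("b", 1.0, 1.0)], f_min=1.0)
```

With equal predicted means, the more uncertain candidate is preferred, which is how the acquisition function trades exploration against exploitation.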

F. ML Parameter Settings
Classification And Regression Tree (CART), Multilayer Perceptron (MLP), Support Vector Regression (SVR), K-Nearest Neighbor (KNN), and Random Forest (RF) were employed in the experiment. Table II shows the hyperparameter search space used by Bayesian optimization to set the parameter values of each single approach.

G. Cross-Validation
Cross-validation is a widely known method for revealing a model's true performance, and it is highly recommended by researchers [58]. For the PROMISE and ISBSG R10/R18-IFPUG datasets, 3-fold cross-validation repeated ten times is applied. The data are divided at random into three folds. One fold is chosen as the test set, and the remaining two folds are joined to form the training set. For each scheme (a combination of machine learning algorithm, ensemble learning, and hyperparameter tuner), the model is built on the training set. Within this framework, AdaBoost functions as a meta-regressor for the ensemble, and Bayesian optimization provides automatic tuning aimed at finding appropriate model hyperparameters. Because the tuner does not have access to the test set, the training set is sub-partitioned via 3-fold cross-validation during tuning. Once the optimal hyperparameter values have been identified, the model is retrained on the entire training set with these parameters. Finally, the evaluation metrics are used to assess the model. The flowchart of the proposed regression scheme based on AdaBoost and Bayesian hyperparameter tuning is summarized in Fig. 2.
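The repeated 3-fold splitting described above can be sketched as an index generator. This is an illustrative standard-library version with a hypothetical function name and a fixed seed; it shows only how samples are assigned to folds, not the tuning that happens inside each training set.

```python
import random

def repeated_kfold(n_samples, n_splits=3, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        folds = [indices[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            test_idx = folds[k]
            # The remaining folds are joined to form the training set.
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, k, train_idx, test_idx

splits = list(repeated_kfold(n_samples=12))
```

Ten repeats of three folds give thirty train/test pairs, and within a repeat every sample appears in exactly one test fold.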

H. Evaluation Metrics
Mean absolute error (MAE), root mean square error (RMSE), and R-squared ($R^2$) are the metrics used in the evaluation. The lower the MAE and RMSE and the higher the $R^2$, the better the model.

IV. RESULTS AND DISCUSSION
This section describes the experiments in depth and presents the findings of each experiment. All of the experiments were run in a Python environment. Eleven software engineering repository datasets with various dimensions and attributes were employed. Table I lists the details of the datasets, including the number of samples, characteristics, and target value. In the first step, data preprocessing was carried out: missing data were handled by Missing Data Treatment (MDT) using k-NNI, and categorical data were converted into numeric form using the ordinal encoder. In the next step, the data were normalized to a scale between 0 and 1. Through these preprocessing steps, the data were converted to a format on which powerful machine learning algorithms can be deployed to create accurate predictions in the SEE context.

A. Model with Default Parameter Tuning
After the data preprocessing stage, the next step compares five ML methods (CART, MLP, SVR, KNN, and RF) using their default parameters, without hyperparameter tuning. The purpose of this comparison is to assess which algorithm is more likely to perform best on different problems without tuning. Each machine learning technique is trained on the training data with its default settings. The ML approach with the lowest MAE and RMSE gives the most accurate estimates, and the higher the $R^2$ value, the better the fit. The performance of each ML method is reported in terms of MAE, RMSE, and $R^2$, with the best value marked in bold and the worst value marked in italics.
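The three metrics used in this comparison follow their standard definitions, sketched below with the standard library only. The function names and the toy values are illustrative assumptions.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: the average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large residuals more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), r2_score(y_true, y_pred))
```

In this toy case a single large residual raises RMSE (1.0) well above MAE (0.5), which is why the two error measures are reported side by side.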
Tables III to V list the best performance values for each model and dataset, as well as the tuner that achieved them (without tuning, where the parameters are set randomly within the range). Each model's performance changes depending on the dataset. The PROMISE and ISBSG datasets are grouped by small/medium/large effect sizes [49]. In this first experimental stage, the base learners are run with their default settings. The experimental results show that RF almost always outperforms the other methods. In particular, when measured in MAE (Table III), RF achieved the best average rating, CART came second, followed by k-NN and SVR with slightly worse average ratings, and MLP performed the worst among all methods. Nonetheless, no significant differences could be found among the three methods k-NN, MLP, and SVR. RF has advantages over the other methods on most datasets with medium/large effect sizes but performs worse than CART, k-NN, and SVR on many datasets with small effect sizes. In terms of RMSE (Table IV) and $R^2$ (Table V), the results likewise show that the algorithm with the best performance on almost all datasets is RF. RF obtained the highest accuracy on china, desharnais, IFPUG, ISBSG10, kitchenham, maxwell, and nasa93. Its best accuracy values were obtained on the kitchenham dataset, with MAE (0.0044), RMSE (0.0098), and $R^2$ (0.8474), although desharnais has the best $R^2$ value. CART takes second position, with the best accuracy on albrecht and kemerer. Meanwhile, KNN, MLP, and SVR have almost similar performance.
The lack of parameter adjustment in this situation can explain the worse performance of CART, KNN, MLP, and SVR. SVR is able to perform effectively with limited datasets; however, it handles outliers in the training data poorly, and outliers are a common occurrence in real-world applications. As a result, some outliers degrade the regression. For MLP, a small dataset without appropriate parameter tuning can reduce the number of hidden nodes, which decreases its approximation ability [65]. KNN is extremely sensitive to irrelevant or redundant features. Because it is unclear which type of distance and which attributes give the best results in distance-based KNN learning, and because the distance from each query instance to all training samples must be calculated, the computational cost is relatively large [66]. CART performs worse on smaller datasets than on bigger ones. As a result, using CART without considering the size of the data is not recommended [67].

B. Comparison Model with Hyperparameter Tuning and Ensemble Learning
Next, the models are tuned with Bayesian hyperparameter optimization based on a Gaussian process. Each ML method is trained on the training data with hyperparameter tuning. All tuners are used as optimization methods combined with a cross-validation procedure, configured with cross-validation (cv: 3), verbose: 3, scoring: 'mean_squared_error', and 10 iterations. Because the datasets in this scenario are small, the search space is narrowed to the most promising values based on previous research.
Next, the same experiment was repeated using AdaBoost ensemble learning, again with 10 iterations to find the best convergence. The AdaBoost ensemble is configured with the maximum number of estimators at which boosting is terminated (n_estimator: 200), learning_rate: 1, and random_state: 0. The algorithms are then compared to assess which is more likely to be efficient and how this efficiency varies with hyperparameter tuning and with reinforcement by ensemble learning on different problems. Fig. 3 shows the performance of Bayesian hyperparameter tuning and AdaBoost ensemble learning on the ML models in terms of MAE, RMSE, and $R^2$.
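For reference, the reported settings can be collected as plain dictionaries. The paper does not name the library it used, so the key spellings below assume scikit-learn style keyword arguments: `n_estimators` is the scikit-learn spelling of the text's "n_estimator", `n_iter` is a hypothetical name for the 10 tuner iterations, and current scikit-learn releases name the squared-error scorer 'neg_mean_squared_error' rather than 'mean_squared_error'.

```python
# Settings as reported in the text; key names are assumptions (see lead-in).
ADABOOST_SETTINGS = {
    "n_estimators": 200,   # upper bound on boosting rounds
    "learning_rate": 1.0,  # shrinkage applied to each weak learner
    "random_state": 0,     # fixed seed for reproducibility
}

TUNER_SETTINGS = {
    "cv": 3,               # 3-fold cross-validation inside the tuner
    "verbose": 3,
    "n_iter": 10,          # number of hyperparameter settings evaluated
    # Negated so that a larger score is better, as scikit-learn scorers expect.
    "scoring": "neg_mean_squared_error",
}
```

Keeping the settings in one place makes it easier to rerun the default, tuned, and boosted variants under identical conditions.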
The performance of each model varies depending on the dataset. The base learners tuned with Bayesian optimization produce the most accurate predictions. Fig. 3 shows that the algorithm with the best performance on almost all datasets is RF, which achieved the highest accuracy on cocomo81, ISBSG10, kitchenham, maxwell, nasa93, and UCP. Meanwhile, CART, KNN, MLP, and SVR have almost similar performance. These results show that CART, KNN, MLP, and SVR are not very sensitive to parameter tuning, while RF is very sensitive to parameter tuning, which results in stable prediction performance. This suggests that the best parameters to use with a machine learning approach may change over time.
The base learners combined with AdaBoost ensemble learning show different results. The best-performing algorithm is CART, followed by RF, while KNN, MLP, and SVR have almost similar performance. CART obtains the highest accuracy on albrecht, china, cocomo81, ISBSG10, IFPUG, kemerer, and UCP. This analysis of different optimization approaches reveals that AdaBoost ensemble learning is the clear winner, as it can create a model with the highest test accuracy for the eleven datasets. To summarize, the meta-parameter analysis shows that AdaBoost, used to strengthen the basic CART model, significantly outperforms the other models on these datasets.

C. The Best Model using AdaBoost with Bayesian Hyperparameter Optimization
The same experiment was then repeated with the AdaBoost ensemble learning parameters of each ML algorithm set by Bayesian hyperparameter optimization. The experiment was repeated with the number of iterations ranging from 10 to 200 to find the best convergence. The effect of the ML model on setting the Bayesian hyperparameter values of the AdaBoost ensemble model is presented in Tables VI to VIII.
In particular, when measured in MAE, RMSE, and $R^2$ (Tables VI to VIII), RF and SVR achieve the best average ratings, followed by MLP and CART, while k-NN has a slightly worse average rating than the other methods. RF, SVR, and MLP have advantages over the other methods on most datasets with medium/large effect sizes but perform worse than CART and k-NN on many datasets with small effect sizes. CART and k-NN perform best on datasets with small effect sizes. No significant differences could be found among RF, SVR, and MLP, which had similar overall performance and were superior to CART and k-NN on datasets with medium/large effect sizes, depending on the dataset. Nonetheless, the RF method is the most consistent among the best methods regardless of the metric.
This experiment shows that, overall, the five machine learning models strengthened by Bayesian optimization with a Gaussian process and AdaBoost ensemble learning have almost the same performance on all datasets used. However, RF, SVR, and MLP achieve the best results.

V. CONCLUSION
This study evaluated the impact of an enhanced hyperparameter tuning approach for an ensemble learning algorithm on model accuracy and stability. The tuner was used to adjust the parameters of five machine learning models trained on eight datasets from the PROMISE repository and two subsets of data from the ISBSG R10/R18-IFPUG dataset. The study applies a state-of-the-art method that combines Bayesian optimization based on Gaussian processes with AdaBoost ensemble learning to improve ML performance in an SEE context, using tuning, training, evaluation, and cross-validation. The findings show that optimizing machine learning models can considerably improve their performance. The implementation of AdaBoost ensemble learning and Bayesian hyperparameter optimization can improve the performance of the RF method, and RF outperformed the other methods on almost all datasets. As such, AdaBoost ensemble learning is the optimization that most affects machine learning model performance across all datasets in this scenario. In addition, the Bayesian optimization approach based on the Gaussian process can achieve high accuracy in some cases when used to improve the performance of machine learning prediction models.
More empirical research could be conducted in the future to support the conclusions of this study and to gain further knowledge using different datasets. In addition, other optimization strategies could be compared or investigated, particularly for classification problems. It is also important to test the efficacy of various feature selection approaches, combined with optimization tuning, when estimating software effort.