Hyperparameter Optimization of Support Vector Regression Algorithm using Metaheuristic Algorithm for Student Performance Prediction

Abstract: Improving student learning performance requires proper preparation and strategy so that it raises the quality of education. One preparatory step is to build a prediction model of student performance. Accurate student performance prediction models are needed to help teachers develop the potential of diverse students. This research aims to create a predictive model of student performance with hyperparameter optimization of the Support Vector Regression (SVR) Algorithm. The hyperparameters are optimized with Metaheuristic Algorithms, namely Particle Swarm Optimization (PSO) and Genetic Algorithm (GA). After the best SVR hyperparameters are obtained, student performance is modeled with two models, PSVR Modeling and GSVR Modeling. The resulting models are compared with five student performance prediction models from previous research: Support Vector Regression, Naïve Bayes, Neural Network, Decision Tree, and Random Forest. The modeling results are evaluated with the regression performance metric Root Mean Square Error (RMSE). The test results show that PSVR Modeling produces the smallest RMSE value, 1.608, compared with the student performance predictions of previous researchers, so the proposed prediction model can be used to predict student performance in the future.


I. INTRODUCTION
Educators need predictions of student performance to improve student achievement. Such predictions serve as material for evaluating student learning, so that the diverse potential of students can be accommodated, both by supporting those who excel academically [1] [2] and by detecting students at risk of failure [3]. Accurate prediction of student performance can also support sound policy decisions in educational institutions [4].
Machine Learning Algorithms have been applied to student performance prediction by comparing their accuracy on both classification and regression tasks [5]. The algorithms used include Neural Networks, Decision Trees, Naïve Bayes, SVM, KNN, and Logistic Regression [2][5] [6]. Support Vector Machine (SVM) is a Machine Learning Algorithm that can be used to predict student performance [5] [7][8] [9]. For regression problems, SVM is better known as Support Vector Regression (SVR) [10]. SVR has good generalization ability, can be applied to high-dimensional non-linear data, and has low computational complexity [11]. Other advantages of SVR are its resistance to overfitting and its ability to make predictions from datasets that are not too large [12]. Given these advantages, SVR is used in this study to predict student performance [11]. A common difficulty with SVR arises on large-scale data, where the heavy computation makes it challenging to determine optimal hyperparameter values [11] [13].

Optimal selection of hyperparameters in Machine Learning Algorithms has been carried out using various Metaheuristic Algorithm approaches, namely Particle Swarm Optimization (PSO) [14], Artificial Bee Colony (ABC) [15], and Genetic Algorithm (GA) [16]. Because SVR is itself a Machine Learning Algorithm, optimizing its hyperparameters is expected to increase modeling accuracy [16] [17].

II. RELATED WORK
Many researchers have researched student performance prediction using Machine Learning Algorithms. Tomasevic et al. [6] predict student performance by comparing Machine Learning Algorithm modeling, namely KNN, SVM, ANN, Decision tree, Naïve Bayes, and Logistic Regression with classification and regression models. This study used data on students' past learning achievements, learning engagement, search activities, discussion participation, and demographics. The results of this study show that ANN outperforms other Machine Learning Algorithms with the best accuracy.
Xu et al. [18] predicted student performance using activity data from as many as 4,000 students, covering online duration, traffic volume, and connection frequency. The resulting classification model predicts whether a student passes or fails. The Machine Learning Algorithms used include Decision Trees, ANN, and SVM. The study showed that ANN and SVM were the most accurate algorithms for predicting student achievement.
Cortez et al. [5] compared the accuracy of predicting student performance with classification and regression modeling using Neural Network, Decision Tree, Naïve Bayes, Random Forest, and SVM Algorithms to predict student performance in mathematics and Portuguese. The experimental results show that in the classification case, the Naïve Bayes Algorithm produces the best accuracy for predicting student performance in mathematics and the Decision Tree Algorithm produces the best accuracy for predicting student performance in Portuguese. In the regression case, the Random Forest Algorithm has the best accuracy for predicting student performance in mathematics, while the Naïve Bayes and Random Forest Algorithms produce the best accuracy for predicting student performance in Portuguese.
This study uses a dataset from previous research, namely the performance of high school students in Portugal in mathematics [5], and develops the SVR Algorithm on it. SVR is chosen because it can overcome overfitting and make predictions from datasets that are not too large [12]. The development consists of finding optimal SVR hyperparameters using the Metaheuristic Algorithm, namely PSO and GA [14] [16]. With optimal hyperparameters, the SVR Algorithm can increase the accuracy of predictive modeling [16] [17]. The proposed contribution of this research is therefore a student performance prediction model based on the SVR Algorithm with hyperparameter optimization using the Metaheuristic Algorithm, which previous researchers have not done.

III. MATERIAL AND METHOD
This study predicts student performance in several stages: collecting the student performance dataset, splitting the dataset into training and testing data, optimizing the SVR hyperparameters using the Metaheuristic Algorithm, and modeling the student performance prediction. Fig. 1 shows the stages carried out in this study. The first step is collecting the dataset, which can be downloaded from the UCI Machine Learning Repository website. After processing, the dataset is split into training and testing data, with 90% used for training and 10% for testing. Training data is used to train the Algorithms and search for suitable models, while testing data is used to determine the performance of the resulting model. The next step is to model student performance predictions using the SVR Algorithm with hyperparameter optimization. This study uses two models: GSVR Modeling and PSVR Modeling. After the predictive models are generated, their performance is evaluated with the regression performance metric RMSE and compared with the Machine Learning Algorithm models built by previous researchers on the same dataset [5].
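The 90%/10% split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes scikit-learn's `train_test_split`, uses a synthetic stand-in with the same shape as the dataset (395 instances, 32 predictors), and the random seeds are arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the dataset's shape: 395 rows, 32 predictors.
# In practice X and y would come from the student performance CSV.
rng = np.random.default_rng(0)
X = rng.normal(size=(395, 32))
y = rng.integers(0, 21, size=395).astype(float)  # placeholder target values

# Hold out 10% of the rows for testing; the remaining 90% are used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 395 instances, scikit-learn rounds the test share up, giving 40 test rows and 355 training rows.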

A. Data for Student Performance Prediction
This study uses a dataset of student performance at secondary schools in Portugal from the UCI Machine Learning Repository website. The data come from two secondary schools, the Gabriel Pereira School and the Mousinho da Silveira School, and cover the mathematics subject. The dataset consists of 395 instances and 33 attributes describing demographic, social, financial, and academic data [5]. One of the 33 attributes, G3, is the students' final exam score in mathematics and is used as the target for modeling student performance predictions, leaving 32 attributes as predictors of student performance. The student performance dataset is described in Table I.

B. Machine Learning Algorithms
This research will focus on developing the SVR Algorithm by optimizing hyperparameters using the Metaheuristic Algorithm. At the end of the development, we will compare with other Machine Learning Algorithms used by previous researchers to predict student performance, namely Support Vector Regression (SVR), Naïve Bayes, Neural Networks, Decision Trees, and Random Forests [5].

1) Support vector regression (SVR): SVR is a development of the SVM Algorithm introduced by Vladimir Naumovich Vapnik in 1995 [19]. SVR shows good performance in solving regression problems [11]. SVR applies the Structural Risk Minimization (SRM) method, which focuses on finding the optimal hyperplane while minimizing the error on the training data under an ε-insensitive loss function, resulting in a continuous, real-valued output [20]. In this study, the hyperparameters used are C, gamma, and epsilon.
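As an illustration of how the three hyperparameters enter an SVR model, the sketch below fits scikit-learn's `SVR` on synthetic data. Several choices are assumptions made for the example and are not stated in the text: the RBF kernel, the feature standardization step, the synthetic data, and the particular values of C, gamma, and epsilon.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 10 + 3 * np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

# C controls the penalty on errors, gamma the width of the (assumed) RBF kernel,
# and epsilon the half-width of the tube inside which errors are ignored.
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=100.0, gamma=0.005, epsilon=0.005),
)
model.fit(X[:150], y[:150])

# Evaluate on the held-out rows with RMSE.
pred = model.predict(X[150:])
rmse = float(np.sqrt(np.mean((y[150:] - pred) ** 2)))
print(f"hold-out RMSE: {rmse:.3f}")
```

Changing any of the three values changes the fitted function, which is why their joint selection is treated as an optimization problem.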
2) Naïve Bayes: Naïve Bayes is a simple probabilistic classifier that calculates a set of probabilities by summing the frequencies and combinations of values from the given dataset [21]. This Algorithm uses Bayes' theorem and assumes that all attributes are independent of one another, given the value of the class variable [22]. Naïve Bayes is based on the simplifying assumption that attribute values are conditionally independent given the output value [23]. In other words, given the output value, the probability of observing the attribute values together is the product of their individual probabilities [24].
3) Neural networks: Neural networks are information processing Algorithms inspired by the workings of the biological nervous system, especially in human brain cells, in processing information [25]. Neural networks consist of many information-processing elements that are connected and work together to solve a particular problem, which is generally a classification or prediction problem [26].

4) Decision tree: A Decision tree is a predictive modeling technique used for classification and prediction tasks [27]. A Decision tree divides the problem search space into subproblems [28]. The decision tree process converts tabular data into a tree model, which then generates rules that are simplified [29].
5) Random forest: Random Forest is a supervised learning Algorithm released by Breiman [30]. Random Forest is commonly used to solve classification and regression problems. This Algorithm combines several tree predictors, that is, decision trees, where each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest [31]. The Random Forest prediction is the result produced by the majority of the decision trees [32].

C. Metaheuristic Algorithm for Optimizing Hyperparameters in the SVR Algorithm
A Metaheuristic can be defined as an iterative generation process that guides subordinate heuristics by intelligently combining different concepts to exploit the search space used to organize information in efficiently finding a near-optimal solution [33]. A Metaheuristic Algorithm is used to help optimally find hyperparameters to produce the best accuracy value in predictive modeling [12] [17]. Determining hyperparameters in a Machine Learning Algorithm is a significant step in modeling [16]. The optimal hyperparameter is determined based on the fitness function. The fitness function is as follows.
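One form consistent with the symbol definitions that follow and with the RMSE metric of Section III-D is sketched here. Treating the fitness as the RMSE is an assumption made for illustration, not a formula confirmed by the text; the index $i$ and the name $\mathrm{fitness}$ are likewise introduced only for this sketch.

```latex
\mathrm{fitness}(C, \gamma, \varepsilon)
  = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}}
```

Under this reading, a lower fitness value corresponds to a better hyperparameter combination, which matches the minimization performed by the optimizers described below.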
Here $\hat{y}_i$ is the predicted value, $y_i$ is the original value of the sample dataset, and $n$ is the total number of samples. In this study, the Metaheuristic Algorithms used to find the optimal SVR hyperparameters are PSO and GA, which are discussed as follows.

1) Particle swarm optimization (PSO): PSO was developed by Kennedy and Eberhart as an optimization Algorithm [34]. PSO is based on observations of the social behavior of flocks of birds and schools of fish moving toward a specific position to get food, which is then referred to as the best position in the multidimensional search space [35][36]. The term particle denotes a bird in a flock whose intelligence both influences and is influenced by that of the group [37]. As a particle moves through the search area with a given velocity, it stores the best position it has found as Pbest and the best position found by the swarm as Gbest [38]. PSO aims to get the optimal solution by minimizing the fitness function [39].
In this study, PSO is applied as an optimizer for SVR hyperparameters under the name PSVR Modeling. The first step is to initialize the PSO parameters: particle velocity, initial particle position, and the number of iterations. The particles then update their position and velocity memory to obtain the Pbest and Gbest values [11]. The best fitness value reached within the iteration limit gives the best SVR hyperparameter combination of C, gamma, and epsilon in PSVR Modeling.
2) Genetic algorithm (GA): GA is an evolutionary Algorithm inspired by the mechanism of natural selection based on Charles Darwin's theory [40]. GA was introduced in 1975 at the University of Michigan by John Holland [41]. GA is widely used to solve optimization problems [42]. GA searches for the optimum solution simultaneously at several points in one generation and then manipulates the population structure symbolically to arrive at the best solution [43]. In GA, a solution is a chromosome, and a group of chromosomes is called a population. Chromosomes from one population form a new population based on the objective function or the best fitness value [44].
This study also applies GA as an optimizer for SVR hyperparameters under the name GSVR Modeling. The first step is to initialize the GA parameters: the initial population and the iteration limit. In the selection stage, individuals from the initial population are selected in order of their fitness values [16]. Next comes the cross-over stage, in which genes are exchanged between one chromosome and another according to the crossover rate parameter. The following stage is mutation, in which one or more genes of the resulting chromosome are replaced at random with other genes [45]. In the final stage, new individuals are generated, and the best fitness value reached within the iteration limit gives the best SVR hyperparameter combination of C, gamma, and epsilon in GSVR Modeling.

D. Evaluation Method
In the final stage, the developed model is evaluated using a regression performance metric, whose function is to measure the accuracy of the student performance prediction model. The metric used in this study is the Root Mean Square Error (RMSE). RMSE can be defined as the square root of the average of the squared errors between the actual value and the forecast value [36]:
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}} \]
where $\hat{y}_i$ is the predicted value of student performance, $y_i$ is the original value in the student performance sample dataset, and $n$ is the total number of samples.
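The definition of RMSE can be checked with a short worked example. The function name `rmse` and the four sample scores below are made up for illustration; they are not taken from the dataset.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean of the squared differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical scores: errors are -1, 0, 2, -1, so the squared errors sum to 6.
actual = [10, 12, 15, 8]
predicted = [11, 12, 13, 9]
print(rmse(actual, predicted))  # sqrt(6 / 4) = sqrt(1.5), about 1.2247
```

Because the errors are squared before averaging, a single large error raises the RMSE more than several small ones of the same total size.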

IV. RESULT AND DISCUSSION
This study uses the student performance dataset by Cortez [5], focusing on students' final exam scores in Mathematics. This student performance dataset from two secondary schools consists of 395 instances and 33 attributes. This dataset is a file with the Comma Separated Values (CSV) format in excel. After checking each row and column of data, no empty data is found in this dataset, so it can be stated that this dataset has been filled in completely. The next step is to model student performance predictions by optimizing hyperparameters in SVR Algorithm using the Metaheuristic Algorithm, which is then compared with modeling student performance predictions that have been carried out by previous researchers using Machine Learning Algorithms.

A. Results of Previous Research using Machine Learning Algorithms
Cortez et al. [5] modeled student performance prediction using Machine Learning Algorithms, including SVR, Naïve Bayes, Neural Networks, Decision Trees, and Random Forests. The modeling results are shown in Table II.
TABLE II. RESULT OF PREVIOUS RESEARCH [5]
Table II shows that the best student performance prediction model was obtained with the Random Forest Algorithm, with an RMSE value of 1.75, while the worst was obtained with the SVR Algorithm, with an RMSE value of 2.09. The previous study only compared the accuracy of student performance prediction models built with Machine Learning Algorithms and did not optimize the hyperparameters of those Algorithms. This study therefore aims to improve the accuracy of student performance prediction with the SVR Algorithm by optimizing its hyperparameters using the Metaheuristic Algorithm.

B. Results of Optimization Hyperparameter SVR using Metaheuristic Algorithm and Modeling
This stage is the initial stage in developing the SVR Algorithm to predict student performance. Hyperparameter optimization is performed to determine the best hyperparameter composition of the SVR Algorithm for the predictive model to be developed. The hyperparameters to be optimized are C, gamma, and epsilon, each searched between a lower and an upper limit: C in [100, 1000], gamma in [0.001, 0.009], and epsilon in [0.001, 0.009]. The predictive modeling of student performance resulting from hyperparameter optimization using the Metaheuristic Algorithm is as follows.
1) Optimization hyperparameter SVR using particle swarm optimization (PSVR modeling): In this study, PSO is applied as an optimizer for the SVR hyperparameters C, gamma, and epsilon. As an optimization Algorithm, PSO has its own parameters, which are set at initialization: 50 particles, C1 = 1.0, C2 = 2.0, and a weight W = 0.5. The number of iterations is varied over 50, 100, 250, and 500. Based on the PSVR Modeling results in Table III, the optimal SVR hyperparameter combination, [C, gamma, epsilon] = [103, 0.002, 0.001], was obtained at the 100th iteration with the best RMSE value of 1.608. The optimal hyperparameters in PSVR Modeling result from searching the predetermined SVR hyperparameter ranges using the PSO search stages. In the PSO method, each candidate hyperparameter combination is evaluated for its fitness value, and the Pbest and Gbest values are tracked over the iterations under the predetermined PSO parameters, so that the hyperparameter combination with the smallest RMSE value is obtained.
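The PSO search described above can be sketched in plain Python. The swarm size, C1, C2, W, and the search ranges follow the values given in the text; everything else is an assumption for illustration. In particular, `toy_fitness` is a hypothetical stand-in with a known minimum, whereas the study's fitness would train an SVR with the candidate hyperparameters and return its error. Zero initial velocities and clipping to the bounds are also choices made for this sketch.

```python
import random

# Search ranges for (C, gamma, epsilon) as stated in the text.
BOUNDS = [(100.0, 1000.0), (0.001, 0.009), (0.001, 0.009)]


def toy_fitness(params):
    """Hypothetical stand-in for 'train SVR, return RMSE'; lower is better."""
    c, gamma, eps = params
    return (((c - 300.0) / 900.0) ** 2
            + ((gamma - 0.004) / 0.008) ** 2
            + ((eps - 0.002) / 0.008) ** 2)


def pso(fitness, bounds, n_particles=50, n_iter=100,
        w=0.5, c1=1.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Pbest: best position seen by each particle; Gbest: best seen by the swarm.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus pulls toward Pbest and Gbest.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clip it back into the search range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        history.append(gbest_fit)
    return gbest, gbest_fit, history


best, best_fit, history = pso(toy_fitness, BOUNDS, n_iter=50)
print(best, best_fit)
```

Because Gbest is replaced only when a particle improves on it, the recorded best fitness never increases from one iteration to the next.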
2) Optimization hyperparameter SVR using genetic algorithm (GSVR modeling): In this study, GA is applied as an optimizer for the SVR hyperparameters C, gamma, and epsilon. As an optimization Algorithm, GA has its own parameters, which are set at initialization: 50 individuals, a mutation coefficient of 0.01, and a cross-over coefficient of 0.5. The number of iterations is varied over 50, 100, 250, and 500. Based on the GSVR Modeling results in Table IV, the best RMSE value was 1.830 at the 250th iteration, with the optimal SVR hyperparameter combination [C, gamma, epsilon] = [100, 0.001, 0.008]. As in PSVR Modeling, the optimal hyperparameters in GSVR Modeling result from searching the predetermined SVR hyperparameter ranges, here using the GA search stages. In the GA method, each candidate hyperparameter combination is evaluated for its fitness value, and the best individual is obtained over the iterations through the GA cycle of selection, cross-over, and mutation, so that the hyperparameter combination with the smallest RMSE value is obtained.
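The GA cycle of selection, cross-over, and mutation can be sketched in plain Python. The population size, mutation coefficient, cross-over coefficient, and search ranges follow the text; the rest is assumed for illustration. `toy_fitness` is a hypothetical stand-in for training an SVR and returning its error, and the specific operators (keeping the better half as parents, one-point cross-over, re-drawing a mutated gene uniformly within its range, and carrying the best individual forward unchanged) are choices made for this sketch, not details confirmed by the text.

```python
import random

# Search ranges for (C, gamma, epsilon) as stated in the text.
BOUNDS = [(100.0, 1000.0), (0.001, 0.009), (0.001, 0.009)]


def toy_fitness(params):
    """Hypothetical stand-in for 'train SVR, return RMSE'; lower is better."""
    c, gamma, eps = params
    return (((c - 300.0) / 900.0) ** 2
            + ((gamma - 0.004) / 0.008) ** 2
            + ((eps - 0.002) / 0.008) ** 2)


def ga(fitness, bounds, pop_size=50, n_gen=100,
       cx_rate=0.5, mut_rate=0.01, seed=0):
    rng = random.Random(seed)
    # Each chromosome is one candidate [C, gamma, epsilon].
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    best = min(pop, key=fitness)[:]
    best_fit = fitness(best)

    for _ in range(n_gen):
        # Selection: rank by fitness and keep the better half as parents.
        ranked = sorted(pop, key=fitness)
        parents = ranked[: pop_size // 2]
        # Carry the current best individual forward unchanged.
        children = [ranked[0][:]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = a[:]
            # Cross-over: with probability cx_rate, splice genes from b.
            if rng.random() < cx_rate:
                cut = rng.randrange(1, len(bounds))
                child = a[:cut] + b[cut:]
            # Mutation: each gene is re-drawn at random with probability mut_rate.
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < mut_rate:
                    child[d] = rng.uniform(lo, hi)
            children.append(child)
        pop = children
        gen_best = min(pop, key=fitness)
        gen_fit = fitness(gen_best)
        if gen_fit < best_fit:
            best, best_fit = gen_best[:], gen_fit
    return best, best_fit


best, best_fit = ga(toy_fitness, BOUNDS, n_gen=20)
print(best, best_fit)
```

With a mutation coefficient as low as 0.01, most of the search in this sketch is driven by selection and cross-over among the initial candidates, so the result depends strongly on the initial population.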
The experiments thus yield two student performance models, PSVR Modeling and GSVR Modeling, which are then compared with the student performance models of previous studies.

C. Comparing and Analysis Results
This study modeled student performance predictions using the SVR Algorithm with hyperparameters optimized by the Metaheuristic Algorithm, producing two proposed models: PSVR Modeling (Table III) and GSVR Modeling (Table IV). With optimal SVR hyperparameters, PSVR Modeling produced an RMSE value of 1.608 and GSVR Modeling produced an RMSE value of 1.830. Fig. 2 compares these results with the Machine Learning Algorithm models from previous research in Table II. Fig. 2 shows that PSVR Modeling gets the best result, with the smallest RMSE value of 1.608, compared with GSVR Modeling and with the student performance prediction models of previous studies. The SVR Algorithm gets the highest RMSE value, 2.09, among the Machine Learning Algorithms used in previous studies. These results indicate that optimizing the hyperparameters of the SVR Algorithm reduces the error value, that is, increases the accuracy of student performance prediction modeling.
Fig. 3 shows the increase in accuracy obtained by optimizing the hyperparameters of the SVR Algorithm with the Metaheuristic Algorithm. The experimental results show that both proposed models improve on the SVR Algorithm. GSVR Modeling shows an increase in accuracy of 12.44%, with RMSE decreasing from 2.09 to 1.830 compared to the SVR Algorithm. The best improvement is PSVR Modeling, which shows an increase in accuracy of 23.06%, with RMSE decreasing from 2.09 to 1.608 compared to the SVR Algorithm.
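The reported percentages are consistent with reading the increase in accuracy as the relative reduction in RMSE against the SVR baseline of 2.09. That reading is inferred from the numbers rather than stated in the text, and the function name below is hypothetical.

```python
def rmse_reduction_pct(baseline, improved):
    """Relative RMSE reduction against a baseline, in percent."""
    return (baseline - improved) / baseline * 100.0


# SVR baseline from previous research versus the two proposed models.
print(round(rmse_reduction_pct(2.09, 1.830), 2))  # GSVR: 12.44
print(round(rmse_reduction_pct(2.09, 1.608), 2))  # PSVR: 23.06
```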
V. CONCLUSION
An accurate Machine Learning model of student performance allows appropriate strategies to be determined for improving student learning outcomes. Previous research [5] compared student performance prediction models built with several machine learning algorithms. This study has developed a predictive model of student performance by optimizing the SVR hyperparameters with the Metaheuristic Algorithm, namely PSO and GA, producing two proposed models, PSVR Modeling and GSVR Modeling. PSVR Modeling gives the best prediction accuracy, with an RMSE value of 1.608, compared with the other Machine Learning Algorithms. PSVR Modeling also improves accuracy, measured by RMSE, by 23.06% over student performance prediction with the SVR Algorithm. This experiment shows that the proposed model can be used to predict student performance in the future. In this study, selecting optimal hyperparameters for the SVR Algorithm increased the accuracy of student performance prediction. Future research is expected to experiment with a more varied range of values for C, gamma, and epsilon, which may yield even better predictive accuracy.

VI. FUTURE WORK
In future research, student performance prediction modeling will be developed further using the Feature Selection Method with Metaheuristic Algorithms. Feature selection is expected to identify the features that influence student performance predictions and to increase the accuracy of the resulting model.

ACKNOWLEDGMENT
The first author is a doctoral student at the Faculty of Engineering, Universitas Sriwijaya. The authors would like to thank Universitas Sriwijaya for their support in carrying out this research.