The Most Efficient Classifiers for the Students’ Academic Dataset

Educational institutions hold vast collections of data accumulated over years, yet it is difficult to use this data to solve problems related to the progress of the educational process and to contribute to achieving quality. For this reason, data mining techniques help to extract hidden knowledge that supports the decisions necessary to develop education and achieve quality requirements. The data for this study were obtained from the College of Business and Economics at Qassim University. Three classifiers were compared in this study: Decision Tree, Random Forest and Naïve Bayes. The results showed that Random Forest outperforms the other algorithms, with 71.5% Precision, 71.2% F1-score, and 71.3% Recall and Classification Accuracy (CA). This study helps reduce failure by providing an academic advisor to students who have weaknesses in achieving a high Grade Point Average (GPA). It also helps in developing the educational process by discovering and overcoming weaknesses.

Keywords—Data mining; student performance; classification algorithms; evaluation


I. INTRODUCTION
Over the past few years, the world has seen rapid progress in technology. This progress has led to the accumulation of data and its availability in the systems of all sectors, such as education, health, and social services. This data can be used to discover and analyze the obstacles and problems facing these sectors by employing data mining techniques [1], [2]. Data mining is considered an interdisciplinary approach and an essential step in knowledge discovery [1]. It is used to extract useful and hidden information from large databases. Through data mining tasks, it is also possible to answer questions that cannot be answered through other techniques such as queries or reports [3]. The essential function of data mining is to use various algorithms, such as classification, clustering, regression and association rules, to discover hidden patterns that support many important decisions [2]. Classification is a supervised learning task in which the data are assigned to pre-defined classes. It is a frequently used method for creating models that predict future patterns. Examples of classification algorithms are Decision Tree, Naïve Bayes, Logistic Regression, K-nearest neighbor, Neural Networks, etc. [4]. The process of discovering knowledge involves several steps: collecting data, cleaning data, pre-processing data, and then applying data mining techniques. This paper aims to use data mining techniques to examine student performance; classification algorithms will be used to classify students' GPA using historical data from the College of Business and Economics at Qassim University from 2014 to 2018.
Moreover, the importance of these results lies in their practical application in educational institutions: the best classification model can be applied to any academic data, such as the data of university students, institutes, and schools, to predict students' performance. Furthermore, classifying students along many dimensions assists the institution in enhancing the educational process.
The remainder of the paper is divided into the following sections: the second section reviews the work related to this study, the third section describes the methodology of the study, the fourth section presents the results and the discussion, and the last section presents the conclusions.

II. RELATED WORKS
The research by Ramaphosa et al. [5] considered primary school students from four cities in South Africa. The goal of their study was to identify a predictive algorithm to detect learners' performance and make appropriate decisions for improvement. They analyzed the data with the WEKA tool using the classification algorithms Naive Bayes, BayesNet, J48 and JRip. Their results showed that the J48 algorithm was the best prediction model compared to the other algorithms, with 99.13% classification accuracy. Ultimately, they reported that their study assists schools in discovering learners' academic performance early and enables stakeholders to improve the results of weak students.
According to the study by Abu Amrieh et al. [6], there is a relationship between a student's academic performance and the student's behavior (interaction with the e-learning system). In their study, they used the dataset from Kalboard 360, which contains 500 records and 16 attributes. Student performance was predicted by applying the classification algorithms Decision Tree, Artificial Neural Network and Naive Bayes using the WEKA tool. In addition, to improve classifier performance, they implemented the ensemble methods Boosting, Random Forests and Bagging. Their results showed a strong relationship between academic performance and student behavior, where the predictive model with behavioral attributes achieved higher accuracy than the model without them. Furthermore, they observed an improvement in accuracy when they used ensemble techniques. Finally, they explained that this model supports stakeholders in understanding students, identifying weaknesses and developing their learning process, in addition to reducing failure. Another study, by Al-Noshan et al., concentrated on a set of important factors affecting students' performance in the first year of university [7], as did Al-Rofiyee et al. [8]. Also, in [9], [10], [11] the authors compared the accuracy of classifiers, but using a medical dataset.
On the other hand, Rahman and Islam [12] applied four traditional classification algorithms: K-NN, Naïve Bayes, Decision Tree and Artificial Neural Network. In addition, they used the bagging and boosting ensemble methods and, finally, an ensemble filtering technique, which helps extract hidden knowledge from student data and makes it easier for educational establishments to improve their quality of education. Their results indicated that the ensemble filtering technique obtained the best accuracy among all the algorithms.
Roy and Garg [13] applied J48, Naïve Bayes, and MLP. The results showed that J48 obtained 73.92% accuracy, the highest among the algorithms used. The objective of their study was to identify and predict the factors that affect students' academic performance, which can be influenced by different attributes such as school-related, social and demographic ones.
The aim of the study by Guerra et al. [14] was to predict the performance of students in specific courses. In their study, they applied Decision Tree techniques to the dataset of IFMS in Brazil from 2012 to 2015 using the WEKA tool. Their results showed that the J4.8 classification algorithm achieved the best results, with 75.8% accuracy using cross-validation and pruning.
Ahmed and Elaraby [15] applied classification techniques to predict students' performance in the final assessment. They collected the dataset from the information systems department for the years 2005 to 2010. The tool used in this work was WEKA, applying a Decision Tree algorithm (ID3). Through their results, they explained that their study helps improve student performance as well as identify students who need advice to guide them and make appropriate decisions.
The study by Tsiakmaki et al. [16] aimed to predict students' marks in the final exams of the second-semester courses based on the first-semester grades. They used a dataset from the Business Administration department of the TEI of Western Greece from 2013 to 2017, which covers 592 students. They applied only regression methods using the WEKA tool, namely Linear Regression, Bagging, the M5 algorithm, Gaussian Processes (GPs), M5-Rules, Sequential Minimal Optimization (SMO), Random Forest and 5NN. The evaluation measure used in their study was the MAE. After all the experiments on the dataset, they concluded that all the algorithms achieved fair accuracy.
Pérez et al. [17] presented initial results on predicting student attrition using a large dataset of Systems Engineering (SE) undergraduate students six years after registration at a Colombian university; the dataset includes 762 students. In their study, they applied four algorithms: Decision Tree, Random Forest, Naive Bayes and Logistic Regression. They found that performance in SE courses is linked to performance in mathematics and physics courses, and they obtained the best AUC, 97%, from Random Forest in the 3rd semester. These results showed plainly that the courses related to Systems Engineering have a dominant effect in predicting dropout.
The objective of the study by Adekitan and Salau [18] was to perform a predictive analysis to determine the final CGPA at graduation using the GPA of the previous three years, as well as to define the class to which the student belongs at graduation. They applied six algorithms, namely Decision Tree, Tree Ensemble, Random Forest, Naive Bayes, Logistic Regression and the Probabilistic Neural Network, to a dataset of engineering students gathered from Covenant University in Nigeria. The tools used in their study were KNIME and MATLAB. Their results demonstrated that logistic regression obtained the best accuracy, 89.15%. Hence, they pointed out that students' results in the last year of their study can be predicted using their performance in the previous three years. On the other hand, a few studies included large datasets, with records ranging from 14,333 to 21,314.
Yulianto et al. [19] applied classification algorithms to student data to identify the features that affect student achievement. They also expected the results of the analysis to reveal the reasons that delayed some students' study period. They used two algorithms, k-Nearest Neighbor and Decision Tree C4.5, and concluded that k-Nearest Neighbor achieved better accuracy. Quinn and Gray [20] used data from Moodle to predict students' grades, i.e., whether they will pass or fail the course. They applied the classification algorithms Random Forest, Gradient Boosting, k-Nearest Neighbours and Linear Discriminant Analysis using R. They concluded that using Moodle data enables early detection of students at risk. The aim of the study by Walia et al. [21] was to build classification models to predict students' academic performance using the classification algorithms Naive Bayes, Decision Tree, Random Forest, JRip, and ZeroR. The results indicated that school and study time were influential factors in students' final grades.
Comparisons of classification algorithms have been made in several fields, such as emotion classification, precipitation estimation, and spatial modelling of storm dust provenance. Fauziastuti et al. [22] classified students' graduation as on time or delayed by using two classification algorithms, Naive Bayes and K-Nearest Neighbor, to compare their performance, but on a small dataset.
A similar research paper in emotion classification sought the best classifier among a set of classifiers [23], whereas in this paper we concentrate on extracting the hidden knowledge embedded in the academic data of undergraduate students with a set of classifiers, to find the best classifier for this kind of data. The study by Lazri et al. [24] focused on estimating precipitation from Meteosat Second Generation images by combining six classification models; they also used a linear regression model. Likewise, a study conducted by Gholam et al. [25] applied eight classification algorithms to produce spatial maps predicting the source of dust in Khuzestan.

III. METHODOLOGY
The methodology used focuses on the use of classification algorithms in analyzing student performance to discover hidden patterns that help officials make the necessary decisions in the educational process.
The knowledge discovery process consists of four phases: data collection, pre-processing, data mining technique (classification), and interpretation of results, as in Fig. 1. The tool used in this study is the Orange data mining platform.

A. Data Collection
Data was collected from the College of Business and Economics from 2014 to 2018 and contains 72,259 records for male and female students from several majors. The dataset contains the following attributes: Semester, Course code, Course name, CRD hours, Gender name, Entry date, Confirmed mark, Grade, Cumulative GPA (CGPA), Semester GPA (SGPA), Student status, Major name and Student level.

B. Data Preprocessing
Real data is usually incomplete and inconsistent due to human or computer errors. Therefore, before applying data mining techniques, pre-processing of the data is required. Data pre-processing first involved cleaning the data of missing values; this was done in Orange using the Impute widget, after which the number of records became 52,430. Second, in the data transformation step, the students' GPA was classified into five categories using the Feature Constructor widget on the Orange platform, based on the CUM_GPA attribute; the new attribute was named Class_GPA. The first class is Excellent, the second Very Good, the third Good, the fourth Average and the fifth Fail.
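As a concrete sketch of this transformation step, the rule below mirrors what a Feature Constructor expression might compute. The GPA cut-offs are illustrative assumptions on a 5-point scale; the paper does not state the exact boundaries used.

```python
# Illustrative sketch of the Class_GPA transformation. The cut-offs below
# are hypothetical (the paper does not give the exact boundaries); they
# assume CUM_GPA is on a 5-point scale.
def classify_gpa(cum_gpa: float) -> str:
    if cum_gpa >= 4.5:
        return "Excellent"
    if cum_gpa >= 3.75:
        return "Very Good"
    if cum_gpa >= 2.75:
        return "Good"
    if cum_gpa >= 2.0:
        return "Average"
    return "Fail"
```

In Orange, the same rule would be written as a single expression in the Feature Constructor widget rather than as a Python function.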

C. Data Mining Techniques (Classification)
Classification is known as supervised learning, where the data are organized into given, known classes. The dataset in classification is divided into a training set and a test set. The classification algorithm is trained on the training set to build a model, and the model is tested on the test set; this model is later used to classify new data [26]. For example, students' performance can be predicted by classifying GPA as good or bad. Algorithms used in classification include Decision Trees, Random Forests and Bayes models. The classification techniques used in this study are Decision Tree, Naïve Bayes and Random Forest.
The classification algorithms are connected to the Predictions widget to show the models' predictions on the data. To evaluate the performance of the models, we focused on four metrics: CA, F1-score, Precision and Recall, given in Equations (1)-(4), where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively. Moreover, the dataset was divided into a training set and a test set using a fixed proportion of the data in the Data Sampler widget: 75% of the data were used for the training set and 25% for the test set. The target variable is Class_GPA. Fig. 2 shows the workflow of the classification task.
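The fixed-proportion split performed by the Data Sampler widget can be sketched in plain Python (a minimal illustration under the assumption of simple random sampling; Orange also supports stratified and cross-validation sampling):

```python
import random

def fixed_proportion_split(records, train_frac=0.75, seed=42):
    """Shuffle the records and split them into a training set (75%)
    and a test set (25%), as done by the Data Sampler widget."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```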

CA = (TP + TN) / (TP + TN + FP + FN)  (1)

Precision = TP / (TP + FP)  (2)

Recall = TP / (TP + FN)  (3)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)  (4)

The Predictions widget was used to obtain the models' predictions on the data. The Data Sampler widget samples the data using a fixed proportion: the dataset was divided into 75% training data and 25% test data. The Data Sampler widget sent the training sample to the three algorithm widgets so that each could produce its model; the models were then sent to the Predictions widget, while the remaining data was sent directly to the Predictions widget.
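As a sketch, the four evaluation measures (CA, Precision, Recall and F1-score) can be computed directly from the confusion counts; the function below shows the binary form (for the multi-class case, Orange reports averages of the per-class values):

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int):
    """Compute CA, Precision, Recall and F1-score from confusion counts."""
    ca = (tp + tn) / (tp + tn + fp + fn)                # Equation (1)
    precision = tp / (tp + fp)                          # Equation (2)
    recall = tp / (tp + fn)                             # Equation (3)
    f1 = 2 * precision * recall / (precision + recall)  # Equation (4)
    return ca, precision, recall, f1
```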

IV. RESULTS AND DISCUSSION
In this section, we present the evaluation results of the Decision Tree, Random Forest and Naïve Bayes classifiers. Fig. 3 shows the Predictions widget, which presents the data with the added predictions and the results of testing the classification algorithms.
The widget received the dataset, constructed a predictive model with the Decision Tree, Random Forest and Naïve Bayes widgets, and computed the prediction probabilities. Table I presents the evaluation results of the classification. As the table shows, Random Forest was the best classifier, with 71.5% Precision, 71.3% Recall and CA, and 71.2% F1-score, while the worst algorithm was Naïve Bayes, with 60.5% Precision, 59.4% CA and Recall, and 59.5% F1-score.
The confusion matrix is displayed to assess the predictive performance of the models for each class by identifying TP, FP, TN, and FN predictions. The class labels are Excellent, Good, Acceptable and Fail. Table II, Table III, and Table IV illustrate the confusion matrices for Naïve Bayes, Random Forest and Decision Tree, respectively. Fig. 4 shows the evaluation of the three models Naïve Bayes, Random Forest and Decision Tree, using the four measures CA, F1-score, Precision and Recall. The figure shows that Random Forest outperforms the other algorithms in all measures, followed by the Decision Tree algorithm, while the worst model was Naïve Bayes.
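For reference, the per-class TP, FP, FN and TN counts that such a confusion-matrix analysis relies on can be derived as follows (an illustrative sketch; rows of `matrix` are actual classes and columns are predicted classes):

```python
def per_class_counts(matrix, labels):
    """Derive TP, FP, FN and TN for each class from a square
    confusion matrix (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    counts = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                                      # correct predictions
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp  # column total minus TP
        fn = sum(matrix[i]) - tp                               # row total minus TP
        tn = total - tp - fp - fn                              # everything else
        counts[label] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn}
    return counts
```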

V. CONCLUSIONS
Educational institutions often require an analysis of student data to obtain useful knowledge that contributes to enhancing the learning process and achieving quality in education. For this reason, data mining techniques were used to extract hidden knowledge from student data, and a comparison was made between three classifiers: Naïve Bayes, Random Forest and Decision Tree. Experimental results showed that Random Forest exceeded the other classifiers with an accuracy of 71.3%, followed by the Decision Tree with 69.8%, and finally Naïve Bayes with 59.4%. This study helps to assess students' performance in advance, relying on previous results, to improve their future achievement. Educational institutions should also provide an academic adviser to failing students to enhance their academic performance.