Predicting Students’ Performance of the Private Universities of Bangladesh using Machine Learning Approaches

Every year, thousands of students get admitted into different universities in Bangladesh. Among them, a large number of students complete their graduation with low-scoring results, which affects their careers. By predicting their grades before the final examination, they can take essential measures to improve their grades. This article proposes different machine learning approaches for predicting the grade of a student in a course, in the context of the private universities of Bangladesh. Using different features that affect the result of a student, seven classifiers have been trained, namely: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Logistic Regression, Decision Tree, AdaBoost, Multilayer Perceptron (MLP), and Extra Tree Classifier, for classifying the students' final grades into four quality classes: Excellent, Good, Poor, and Fail. Afterwards, the outputs of the base classifiers have been aggregated using the weighted voting approach to attain better results. This study has achieved an accuracy of 81.73%, with the weighted voting classifier outperforming the base classifiers.

Keywords—Prediction; machine learning; weighted voting approach; private universities of Bangladesh


I. INTRODUCTION
Nowadays, various statistical and machine learning algorithms are applied in different fields, such as marketing, health and medical issues, weather forecasting, and socioeconomic behavior analysis. These techniques have also spread to educational data. From the perspective of Bangladesh, the number of private universities is much higher than the number of public universities; currently, there are one hundred and five private universities in Bangladesh [1]. As a result, the number of students in the private universities is much higher than in the public universities. Since the development of higher education cannot be imagined without improving the quality of the private universities, it is necessary to focus on their students.
The databases of different universities store a large volume of data, including data about the students, teachers, and employees of the universities. By analyzing these data, different patterns can be derived that are helpful for decision making. Using diverse machine learning and data mining techniques on these data, many kinds of knowledge can be discovered, which can be used to predict the enrollment status of the students in a course, to detect illegal activities in online examinations, to identify unusual marks in the result sheet, etc. [2]. Different statistical analyses and machine learning algorithms can be applied to the data of university students for predicting the grades of the courses they have taken at the undergraduate level. There is a massive growth in the number of students who are getting admitted into different public and private universities of Bangladesh. A vast portion of these students cannot gather proper skills and knowledge during their four-year tenure of university life, and a huge number of them come out of the universities with low-scoring results. As they lack both theoretical and practical knowledge, it becomes very hard for them to place themselves in the job market. If the students can predict their grades or results before their final examinations, they can take necessary actions to improve their results. The teachers can then also identify which students are at risk, guide the weak students properly, and help them to recover [3]. Predictive modeling can be used for predicting the performance of the students. Several methods can be used for building a predictive model, such as classification, regression, and categorization; among these, classification is the most popular [4].
This research helps to identify different approaches for predicting students' final grades, as well as to determine the best approach for performing the prediction.
The main objectives of this research are: to predict the final grades of the students in a course using different machine learning algorithms, to forecast whether a student is at risk of failure in the final examination or not, and to compare the results of different machine learning algorithms for identifying which algorithm gives the best performance.
The remainder of the paper is structured as follows: Section II describes the related works, Section III exhibits the entire methodology, the results are discussed in Section IV, and the conclusion is presented in Section V. Finally, Section VI presents the future works.

II. RELATED WORKS
Yadav and Pal [2] performed a study on predicting the results of first-year engineering students. They collected data from the enrollment forms filled in by the students during their admission to VBS Purvanchal University, Jaunpur. With this dataset, they built models using different variations of the Decision Tree algorithm for classifying the students' performances in the first-year final examination. They showed that C4.5 obtained the highest accuracy of 67.78%.
Liu and Zhang [6] gathered 210 records of the students. The dataset contained the marks of some major subjects and with this dataset, they trained C4.5 classifier for predicting whether a student would pass or fail.
Sweeney et al. [7] proposed a system for predicting the grades of the students for the next enrollment term. They applied two classes of methods: Simple Baselines and Matrix Factorization (MF) based methods. The lowest prediction error was achieved by the Factorization Machine (FM) Model of Matrix Factorization based methods.
Yadav et al. [8] performed a comparative study among the CART, C4.5, and ID3 algorithms for predicting the end semester marks. The dataset contained a variety of attributes like: marks achieved in the last semester, grades obtained in the class test, attendance marks, lab work performances, etc. They used WEKA explorer as the data mining tool.
Minaei-Bidgoli et al. [9] performed their study on LON-CAPA, which is an online education system. Firstly, for the classification purpose they used diverse base classifiers like: Parzen window, 1-Nearest Neighbor (1NN), K-Nearest Neighbor (KNN), Quadratic Bayesian Classifier, Multilayer Perceptron (MLP), and Decision Tree. For improving the accuracy, they also made use of a combination of the classifiers. Finally, for optimizing the accuracy of the combination of the classifiers they used Genetic Algorithm (GA). They found that Genetic Algorithm increased the accuracy by 10-12%.
Iqbal et al. [10] found that the CGPA of a student in a degree program is high if the student's university entry test score and HSSC (Higher Secondary School Certificate) score are high. They compared the performance of Restricted Boltzmann Machine (RBM), Matrix Factorization (MF), and Collaborative Filtering (CF), and showed that RBM exhibited the best performance.
A comparative study of four distinct models, Stepwise Polynomial Regression, Linear Decision Rule, Linear Multiple Regression, and a simple Artificial Neural Network, was proposed by Gorr et al. for predicting students' GPA [11].
Meier et al. [12] stated that the timely prediction of the final grade is also significant. So they proposed an algorithm that could not only predict the final grades but also perform timely prediction using previous performances of the students.
Jishan et al. [13] showed that preprocessing the data with the combination of Synthetic Minority Over-Sampling and Optimal Equal Width Binning significantly improves the accuracy of predicting students' final grades.
Socio-demographic data of over 450 students, collected at the time of enrollment at the Open Polytechnic of New Zealand, were analyzed by Kovacic [14] for predicting students' success. For classification purposes he applied the CHAID and CART algorithms and showed that CART outperformed CHAID.
Hijazi and Naqvi [15] identified several factors that influenced the students' performance in the intermediate examination using simple linear regression. They found that class attendance, family income, mother's education, and study hours per day are positively related to the student's performance, while the mother's age is inversely related to the result.
Mia et al. [16] proposed different machine learning techniques for predicting the registration status of students at private universities of Bangladesh. Among the different classifiers, the Support Vector Machine outperformed all others and achieved an accuracy of 85.76%.
Biswas et al. [17] used diverse machine learning classifiers to predict the enrollment and dropout status at the postgraduate level. For this work, they collected the dataset from a renowned public university of Bangladesh. They computed the performance evaluation metrics for each of the classifiers and found that locally weighted learning outperforms the other classifiers.

III. METHODOLOGY
This section is divided into three subsections: data description, algorithms description, and implementation procedures. The subsections are briefly described below.

A. Data Description
The dataset used in this study has been obtained from a reputed private university of Bangladesh. It contains the records of 400 students of diverse courses of different departments, from Summer 2018 to Fall 2019. This research has been performed using eight attributes, of which one is the response variable and the other seven are predictor variables. These variables are described below in detail.
• ATTDM: This attribute depicts the attendance marks of a student.
• RTK: It represents whether a student has retaken the subject or not.
• APAQ: During a single semester, a student has to take three quizzes in a particular course. This attribute portrays whether a student appeared in all the quizzes or not.
• AQM: The average of the obtained quiz marks is depicted by this attribute.
• MIDM: The obtained marks in the mid term examination is represented by this attribute.
• SUAS: This attribute confirms whether a student has submitted the assignment or not.
• PPRE: Represents whether a student has performed the presentation or not.
The above seven attributes are the predictor variables.
• FNLG: This is the only response variable. It depicts the final grade of a student after the final examination. The university follows the grading policy of University Grants Commission (UGC) of Bangladesh which is shown in Table I [18].
The final grades are categorized into four categories. If a student achieves A+, A, or A-, the grade is categorized as 'Excellent'. B+, B, and B- are considered 'Good'. The letter grades C+, C, and D are categorized as 'Poor', and the grade F is considered 'Fail'. The possible values, data types, and variable types of the different variables used in this research are shown in Table II.
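The categorization rule above can be written as a small lookup; this is a hypothetical helper mirroring the UGC letter grades, not part of the paper's system:

```python
# Mapping the UGC letter grades to the four quality classes used as FNLG
GRADE_CATEGORY = {
    "A+": "Excellent", "A": "Excellent", "A-": "Excellent",
    "B+": "Good", "B": "Good", "B-": "Good",
    "C+": "Poor", "C": "Poor", "D": "Poor",
    "F": "Fail",
}

def categorize(letter_grade):
    """Return the quality class for a UGC letter grade."""
    return GRADE_CATEGORY[letter_grade]
```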

B. Algorithms Description
The algorithms used in this research are described below in detail.
1) Support Vector Machine (SVM): Support Vector Machine (SVM) is a supervised learning algorithm that tries to separate two classes using an optimal hyperplane [19]. It works well when the size of the dataset is small [20]. SVM attempts to place the decision boundary so that the margin between the two classes is as wide as possible. To separate two classes, assume we are given a training dataset $D = \{(x_1, C_1), (x_2, C_2), \dots, (x_N, C_N)\}$, where $x_i$ denotes an input vector and $C_i$ the class label of that vector, which is either positive or negative. An unlabeled vector $X$ is then classified according to the sign of
$$f(X) = \sum_{i=1}^{N} a_i C_i (x_i \cdot X) + b$$
where the nonzero coefficients are $a_i$ $(i = 1, 2, \dots, N)$ and the bias is represented by $b$ [21].
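As an illustrative sketch (not the paper's actual experiment), an SVM can be fitted with Scikit-learn's `SVC`; the synthetic data below merely stands in for the seven student features:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data standing in for the seven predictor variables
X, y = make_classification(n_samples=100, n_features=7, random_state=0)

# A linear kernel searches for a separating hyperplane with a wide margin
clf = SVC(kernel="linear")
clf.fit(X, y)
```

For the four-grade problem, Scikit-learn extends this binary formulation to multiple classes automatically.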
2) Logistic Regression: Regression analysis establishes the relationship between different variables. If the relationship is linear, Linear Regression can be applied; in the case of a nonlinear relationship between the variables, Logistic Regression can be used instead. Logistic Regression is a generalized form of Linear Regression [22]. Consider the following Linear Regression equation:
$$y = \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \dots + \beta_n Z_n$$
Here, $y$ is the response variable and $Z_1, Z_2, Z_3, \dots, Z_n$ are the predictor variables. Applying the sigmoid function to this equation yields the logistic function:
$$p = \frac{1}{1 + e^{-y}}$$
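The two equations above can be sketched directly in Python; this is a toy helper with illustrative coefficient names:

```python
import math

def sigmoid(t):
    """Logistic (sigmoid) function mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def logistic_probability(intercept, coeffs, Z):
    """Apply the sigmoid to the linear score b0 + b1*Z1 + ... + bn*Zn."""
    y = intercept + sum(b * z for b, z in zip(coeffs, Z))
    return sigmoid(y)
```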
3) K-Nearest Neighbor (KNN): K-Nearest Neighbor is a simple, non-parametric supervised learning algorithm that can be used for both regression and classification. It stores all the available cases and classifies new cases based on feature similarity (e.g., a distance function). In KNN classification, the output is a class membership: a case is categorized by a majority vote of its neighbors and assigned to the most common class among its K nearest neighbors. The value of K (a positive integer) can be selected by various heuristic techniques; if K = 1, the case is simply assigned to the class of its nearest neighbor [23]. Different distance functions, such as the Minkowski, Manhattan, and Euclidean distances, are used in the KNN algorithm. In this work, the Minkowski distance has been used. For two points $U = (u_1, u_2, \dots, u_n)$ and $V = (v_1, v_2, \dots, v_n)$, it is given by
$$D(U, V) = \left( \sum_{i=1}^{n} |u_i - v_i|^q \right)^{1/q}$$
where $q$ represents the order of the Minkowski distance.
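The Minkowski distance can be written as a small helper (q = 2 reduces it to the Euclidean distance, q = 1 to the Manhattan distance):

```python
def minkowski_distance(U, V, q=2):
    """Minkowski distance of order q between two equal-length points."""
    return sum(abs(u - v) ** q for u, v in zip(U, V)) ** (1.0 / q)
```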
4) Decision Tree: Decision Tree classification uses a tree-like structure. The internal nodes of the tree represent conditions, the external nodes or leaves represent class labels, and the branches from the internal nodes represent the outcomes of the tests or conditions. The decision of splitting the data is controlled by the entropy
$$E = -\sum_{j} p_j \log_2 p_j$$
where $p_j$ is the probability of the $j$-th class.
Different variations of Decision Tree are available as for instance: ID3, C4.5, CART, etc.
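The entropy that drives the splitting decision can be computed as follows (a minimal sketch; zero-probability classes contribute nothing):

```python
import math

def entropy(probabilities):
    """Shannon entropy, in bits, of a class probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```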

5) AdaBoost: AdaBoost stands for Adaptive Boosting. This approach combines a set of weak classifiers into a strong one. Classification using the AdaBoost algorithm can be represented by
$$H(x) = \mathrm{sign}\left( \sum_{m=1}^{M} \theta_m f_m(x) \right)$$
where the $m$-th weak classifier is represented by $f_m$ and $\theta_m$ represents the corresponding weight.
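A minimal Scikit-learn sketch of AdaBoost on synthetic stand-in data (illustrative only; by default the weak learners are decision stumps):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=7, random_state=0)

# 50 weak learners, each weighted by its performance on reweighted data
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
```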

6) Multilayer Perceptron (MLP): Multilayer Perceptron is a form of feedforward neural network consisting of multiple layers of neurons. A neuron in one layer interacts with the neurons of its adjacent layers through weighted connections, while there are no connections between neurons of the same layer. Besides the input and output layers, an MLP has one or more hidden (intermediate) layers [24]. The error of the $k$-th output node for data point $n$ can be written as
$$e_k(n) = d_k(n) - c_k(n)$$
where $d$ and $c$ represent the actual and predicted values, respectively.
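A hedged sketch with Scikit-learn's `MLPClassifier`, using one hidden layer of 16 neurons; the layer size and iteration budget are illustrative choices, not the paper's settings:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=7, random_state=0)

# One hidden layer between the input and output layers
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, y)
```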
7) Extra Tree Classifier: A variant of Random Forest known as Extra Tree Classifier was first introduced by Geurts et al. [25]. Extra Tree Classifier differs from other tree based classifiers in such a way that it uses the entire learning sample for growing the trees and it chooses cut-points for splitting the nodes fully at random.
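In Scikit-learn the algorithm is exposed as `ExtraTreesClassifier`; with `bootstrap=False` (the default) each tree grows on the whole learning sample, with split thresholds drawn at random:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=200, n_features=7, random_state=0)

# Whole-sample trees with fully random cut-points, as in Geurts et al.
clf = ExtraTreesClassifier(n_estimators=100, bootstrap=False, random_state=0)
clf.fit(X, y)
```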

8) Weighted Voting Classifier: A Voting Classifier combines the outputs of different base classifiers, since it is hard to identify a single classification algorithm that gives the best accuracy on a given dataset. Both homogeneous and heterogeneous models can be aggregated with a Voting Classifier. In the weighted voting approach, each base classifier is assigned a weight or coefficient proportional to its individual accuracy [26]. If $h_1, h_2, h_3, \dots, h_n$ are the outputs of $n$ different classifiers and $s_1, s_2, s_3, \dots, s_n$ are their assigned weights, respectively, the final output $H$ of the Weighted Voting Classifier over the class set $C$ can be written as
$$H = \arg\max_{c \in C} \sum_{i=1}^{n} s_i \cdot I(h_i = c)$$
where $I(\cdot)$ is the indicator function.
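A sketch of the aggregation step using Scikit-learn's `VotingClassifier` with three of the base learners; the weights here are illustrative placeholders, in practice set proportional to each base classifier's individual accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=7, random_state=0)

base = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier(random_state=0)),
]
# Hard voting: each classifier's vote is scaled by its weight s_i
clf = VotingClassifier(estimators=base, voting="hard",
                       weights=[0.78, 0.74, 0.70])
clf.fit(X, y)
```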

C. Implementation Procedures
The implementation procedures are illustrated in this section. To carry out the study, Python and the Scikit-learn library have been used. The stepwise procedure for predicting students' final grades is represented graphically in Fig. 1 and described in detail below.

1) Input Data:
In this step, the data of 400 students, collected via the university's Enterprise Resource Planning (ERP) system, are input into the proposed system.
2) Data Preprocessing: The data preprocessing step consists of two parts: data normalization and encoding the categorical data into numeric data. In the collected dataset, the attendance marks range from 0 to 7, the average quiz marks range from 0 to 15, and the midterm examination marks range from 0 to 25. These three predictor variables are therefore in very different ranges, so they have been normalized; after normalization, their values range from 0 to 1. There are also some categorical data in the dataset. Algorithms like the Decision Tree can work effectively with categorical data, but most other algorithms perform better with numerical data. Hence, the categorical data have been encoded into numerical data using the Label Encoding approach of the Scikit-learn library.
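The two preprocessing parts can be sketched as follows; the sample values are hypothetical, with `MinMaxScaler` performing the 0-to-1 normalization and `LabelEncoder` the categorical encoding:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical rows of (ATTDM, AQM, MIDM): ranges 0-7, 0-15, and 0-25
marks = np.array([[7.0, 12.0, 20.0],
                  [3.5, 15.0, 10.0],
                  [0.0,  0.0,  0.0]])
scaled = MinMaxScaler().fit_transform(marks)  # each column mapped to [0, 1]

# Encoding a categorical predictor such as RTK ("Yes"/"No") as integers
rtk = LabelEncoder().fit_transform(["No", "Yes", "No"])
```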
3) Data Splitting: Splitting the dataset follows the data preprocessing step. This step splits the dataset into training data and test data. In this work, 74% of the data is used for training and the remaining 26% for testing.
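The 74/26 split can be reproduced with `train_test_split`; the synthetic data and `random_state` are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=7, random_state=0)

# 26% of the 400 records (104 students) are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.26, random_state=0)
```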

4) Training and Testing using Base Classifiers:
In this step, the seven base classifiers have been trained with the training data. After training, the final grades of the students have been predicted using the test data, and the accuracy of each base classifier has been measured separately.
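The train-and-score loop over the base classifiers might look like this (three of the seven classifiers shown, on synthetic stand-in data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.26, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
# Fit each base classifier and record its test-set accuracy
accuracies = {name: clf.fit(X_train, y_train).score(X_test, y_test)
              for name, clf in classifiers.items()}
```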

5) Aggregating the Outputs of the Base Classifiers: Eventually, this step aggregates the outputs of the seven base classifiers using the Weighted Voting Classifier to achieve better performance.

6) Final Decision: According to the outcomes of the evaluation metrics, the best classifier for predicting the final grades of the students is selected in this step.

IV. RESULTS
For testing purposes, the records of 104 students have been used. Among these records, 38% are actually classified as "Excellent", 23% as "Good", 36% as "Poor", and the remaining 3% as "Fail".
The confusion matrices of the result of this study using SVM, Logistic Regression, KNN, Decision Tree, AdaBoost, MLP, Extra Tree and Weighted Voting Classifier are presented in Table III.
The calculated Accuracy, Precision, Recall, F1-Score, and AUC (Area Under the Curve) are shown in Table IV. A classifier with an AUC of 1.0 is considered a perfect classifier, whereas an AUC of 0.5 indicates a worthless one. Here, the Weighted Voting Classifier achieved an AUC of 0.90, surpassing all the base classifiers, while the lowest AUC was obtained by Logistic Regression. Additionally, the study has been repeated with a reduced number of class labels: the classes 'Excellent' and 'Good' have been merged into 'Higher Order Grades', and the 'Poor' and 'Fail' classes into 'Lower Order Grades'. The performance of the proposed approach has then been measured again. Table V presents the performance metrics using two class labels; the Weighted Voting Approach attained the highest accuracy of 93.26%. Comparing Table IV and Table V, it can be observed that the accuracy increases significantly when the number of class labels is reduced.
In both cases, the Weighted Voting Classifier has improved the accuracy, so it can be confirmed that it clearly outperformed the other classifiers.
This study uses a dataset of 400 students from different courses and departments. Since the data have been gathered from a variety of departments and the proposed approach still attained an accuracy of 81.73%, the approach can be considered sufficiently reliable.

V. CONCLUSION
This study has used seven base classifiers to predict the students' final grades and then combined their outputs using the weighted voting approach. From the observations, it can be confirmed that aggregating the base classifiers with the weighted voting approach has increased the accuracy. From the achieved AUC values, it can also be stated that the Weighted Voting Classifier is close to a perfect classifier for the collected dataset.
A limitation of this study is that it does not compare the performance of the proposed approach with the approaches reported in other studies.

VI. FUTURE WORKS
This work has been performed using the dataset of only one private university of Bangladesh. In the future, the dataset can be enlarged by collecting data from different private and public universities of Bangladesh to achieve better performance and accuracy. Moreover, a comparative study between the approach proposed in this work and the approaches presented in other works can be performed. Different studies show that preprocessing the data using discretization methods and oversampling techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) can increase accuracy; these approaches can be applied to the gathered dataset to obtain better performance.