A Meta-analysis of Educational Data Mining for Predicting Students' Performance in Programming

Abstract: An essential skill amid the 4th industrial revolution is the ability to write good computer programs. Higher education institutions therefore offer computer programming as a module not only in computer-related programmes but in other programmes as well. However, the number of students who underperform in programming is significantly higher than in non-programming modules. It is therefore crucial to predict accurately the performance of students pursuing programming, since this helps to identify students who may underperform so that the necessary support interventions can be put in place timeously. The objective of this study is to identify the most effective Educational Data Mining approaches for identifying students who may underperform in computer programming. The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) approach was used in conducting the meta-analysis. The databases searched were ACM, Google Scholar, IEEE, Pro-Quest, Science Direct and Scopus. Of the 220 articles identified through database searching, 11 scientific research publications were included in the meta-analysis. The residual amount of heterogeneity was high (τ² = 0.03; I² = 99.46%; heterogeneity chi-square = 1210.91 with 10 degrees of freedom; P < 0.001). The estimated pooled performance of the algorithms was 24% (95% CI: 13%, 35%). Meta-regression analysis indicated that none of the moderators included influenced the heterogeneity of the studies. The regression of effect estimates against their standard errors indicated publication bias, with a P-value of 0.013. These meta-analysis findings indicated that the pooled estimate of algorithms is high.


I. INTRODUCTION
An essential skill, not only in IT programmes in higher education but in other disciplines as well, is the ability to write good computer programs. However, the failure rate in programming is significantly higher than in other subjects that students pursue [1]. Furthermore, we are currently in the 4th industrial revolution and it is imperative that graduates acquire this important skill to add value to the organizations that will employ them in the future. It is therefore important to be able to predict the performance of students wanting to pursue programming, so that the necessary interventions can be put in place for students who are likely to underperform. The prediction of students' performance in programming can be facilitated through the process of Educational Data Mining.
In the not too distant past, data analysis was performed with mathematical and statistical methods, using tools such as charts and regression methods to assist in decision making. However, because the amount of information in the world is increasing very rapidly, coupled with an increase in the number of databases, the production of useful information has become very challenging and primitive tools can no longer be used to analyse these huge data sets. The type of analysis performed on data to extract interesting, important and meaningful patterns of information, thereby allowing its application in many areas of our lives, is called Data Mining (DM) [2][3][4].
Data Mining (DM), also known as Knowledge Discovery from Data (KDD), converts enormous amounts of data into knowledge. In DM, data is explored from different perspectives to derive useful information [5]. Closely related to Data Mining is Educational Data Mining and, as illustrated by Ventura et al. in [6], Educational Data Mining shares many attributes with other disciplines such as education, computer science and statistics [5][7][8][9][10][11][12].
Educational Data Mining (EDM) attempts to obtain knowledge from educational data by building models that facilitate the examination of educational data to discover important student-related information [5]. Educational Data Mining is a relatively new discipline that employs various methods to extract meaning from the huge amounts of data found in educational environments in order to better understand students' behaviour and results. The primary goal of EDM is to decipher how students learn and to identify those factors that will enhance students' learning [13].
A desired outcome of EDM is to be able to predict the performance of students since this is closely related to the quality of education. The resulting prediction models created as an outcome of EDM can help educators identify problems faced by students that may be affecting their academic performance [1]. Numerous studies have been conducted in predicting the performance of students not necessarily in programming, including studies in [14][15][16].
This study is a meta-analysis of Educational Data Mining research with the aim of identifying the most effective Educational Data Mining approaches used to predict the performance of students pursuing computer programming. Aligned to this aim, the research question of this study is as follows: What are the most effective EDM methods used for prediction of student performance in computer programming? This paper consists of the following sections: Section II is a discussion of related works involving the prediction of students' performance in programming. Section III is a discussion of the methodology used. In Section IV the results and findings of the meta-analysis are presented. The limitations of the study are discussed in Section V and finally, the paper concludes in Section VI.

www.ijacsa.thesai.org

II. RELATED WORKS
Many studies have been conducted to predict students' performance in programming [1,17,18]. An analysis of the literature reveals that these studies fall into two broad categories: studies that predict student performance in programming from performance in a programming-related module at school or in a programming-related entrance test; and studies that predict student performance in programming from other features such as background factors, grades obtained in mathematics or physical science, or other factors not directly related to programming. In this section, the literature is presented from these two perspectives.
In research conducted by Sivasakthi in [19], five data mining algorithms were executed on a data set to predict students' performance in an introductory programming module. These algorithms were: Multilayer Perceptron (MLP), Naïve Bayes, SMO, J48 and REPTree. The study used student demographic data, the grade obtained in programming at college (i.e. before university) and the grade obtained in an entrance test. It was found that MLP performed best with an accuracy of 93%, while the Naïve Bayes algorithm had the lowest prediction accuracy of 84%. In the MLP method, the factors that led to the highest prediction of students' performance were the grade obtained in college and the grade in the entrance test. Because many students pursuing programming have not programmed previously, be it at school or elsewhere, this model will not be able to predict the performance of students with no prior programming exposure.
Pathan et al. in [5] developed a decision tree (DT) model to classify C programming students into three groups: good, average and poor. The attributes used in this study were related to student behaviour and past educational information as well as C programming questions. The DT model was able to classify 87% of students correctly.
In a study by Đambić et al. in [20] a machine learning model was developed to predict the likelihood that students pursuing an entry-level programming module would fail. The features used in the model are indicated in Table I.

TABLE I. FEATURES FOR MODEL

Feature   Description
X1        Number of points from the first colloquium
X2        Number of points from the first quiz
X3        Number of points from the first homework
X4        Whether this is the second time the student has enrolled in this course
X5        Whether the student has attended the first colloquium

This study used the logistic regression model. The misclassification rate of the model was around 19% and the precision was around 67%. The use of this model simply meant that many students who would have passed on their own were identified and would be sent for additional support interventions.
Costa et al. in [21] attempted to determine the efficiency of four EDM techniques, namely Decision Tree, Support Vector Machine (SVM), Neural Network and Naïve Bayes. These techniques were implemented on two independent sets of data pertaining to entry-level programming modules at a university in Brazil. One data set contained data from residential students and the other contained data from distance education students. The study revealed that the SVM technique performed far better than the other EDM techniques, predicting with an accuracy of 92% for distance education students and 83% for residential students.
Figueiredo et al. in [22] proposed a neural network predictive model for predicting student failure in programming using students' performance in various programming-related tasks during class. This model enabled teachers to identify those students who are more likely to fail early enough to implement new teaching interventions so as to enhance the students' programming skills. The neural network model had an accuracy of 94.12% and a precision of 95.45%.
Vihavainen et al. in [23] investigated how students' programming behaviour (e.g. eagerness to work on programming exercises) influences their grade in the module. In this study, only data derived online by taking screenshots of students' programming exercises were used. Furthermore, students' background information was not used as features in this study. The study predicted with 78% accuracy whether the student was a high-achiever, passed the module, or failed the module.
In the study by Bergin et al. in [24] six machine learning algorithms were considered in the prediction of student performance in programming. The study used several categories of predictors of performance in programming. The categories include background factors, factors related to comfort level at the commencement of the module (This category included programming related questions), motivation and the student use of learning techniques. Naïve Bayes outperformed the other machine learning algorithms by being able to predict with an accuracy of 78.3%.
Aguinaldo et al. in [25] developed a predictive model to determine students' success in an introductory programming module using six 21st century learning skills: Creative Skill, Reflective Skill, Problem-Solving Skills, Collaborative, Communication and Adaptability Skills. This predictive model used the PART classifier algorithm. It was found that communication was the strongest predictor of success in programming logic formulation. Unlike the study by Sivasakthi in [19], this predictive model was not based on performance in programming and can therefore be utilised to predict the performance of students who have no prior programming exposure.
In a study by Abdulsalam et al. in [2], three decision tree algorithms, C4.5 (J48 in WEKA), CART and BF Tree, were used to predict the performance of students in computer programming using the grades obtained in Mathematics and Physics as attributes. The study revealed that J48 performed better than the CART and BF Tree algorithms. J48 had a prediction accuracy of 70.37%, while CART and BF Tree had prediction accuracies of 60.44% and 60.30% respectively. In a similar study conducted at a Nigerian university using a prediction model based on Artificial Neural Networks (ANN), it was also found that students possessing above-average grades in Mathematics and Physics performed better in programming than students who did not possess these attributes [26].
In the study by Mohamad et al. in [27], rough set was applied to a data set in order to identify those factors that influenced students' performance in programming, based on data from earlier student results. The study revealed that having attempted a programming course before university and having obtained an average mark for mathematics, English and the Malay language at school were good indicators of performance in programming at university. In addition, in terms of the personality factor, the investigative and social type student and the average cognitive student were identified as important attributes that affect performance in computer programming.
Badr et al. in [28] developed a model to predict the performance of students wanting to pursue programming. This model used as attributes the marks that students obtained in mathematics and English. In this study, a classifier was built using an association rules algorithm. Unlike many other studies, this study resulted in a model that was able to predict a student's likelihood of success in programming before registering for the course. This meant that the performance of students pursuing programming increased, since lecturers could adjust their teaching strategies to accommodate those students who were predicted to be more likely to underperform in the programming course. The study conducted two experiments by executing the CBA rule-generation algorithm. The first used the marks obtained in English and mathematics modules, and this resulted in four rules with an accuracy of 62.75%. The second used marks obtained in English only, resulting in four rules with an accuracy of 67.33%. Table II summarizes the various studies in the literature that used data mining or machine learning algorithms in the prediction of students' performance in programming. The table is organized under the following headings: author, problem focus, scientific method, sample size, classification of the algorithms and accuracy.

III. METHODOLOGY

A. Literature Search Strategy
The study was carried out using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) approach [29][30][31]. In conducting the meta-analysis, several databases were searched, including ACM, Google Scholar, IEEE, Pro-Quest, Science Direct and Scopus. Only papers published in English between 2010 and 2020 were retrieved from the databases. The following combination of terms was used in searching the various databases: "Programming" [All Fields] AND "Machine learning" [All Fields] OR "Programming" [All Fields] AND "Data Mining" [All Fields] OR "Programming" [All Fields] AND "Intelligent Systems" [All Fields] OR "Programming" [All Fields] AND "Problem Solving" [All Fields] OR "Programming" [All Fields] AND "Higher Education" [All Fields]. The search terms were separated or combined using the Boolean operators "OR" and "AND". All papers identified by the search were imported into EndNote X9. A total of 220 articles were identified between the years 2010 and 2020, as indicated in Fig. 1. Furthermore, the reference lists of related articles were manually checked for citations overlooked during the searching of the databases.
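The search string pairs "Programming" with each secondary term using AND and joins the pairs with OR. As a minimal sketch of how such a string could be assembled reproducibly (the function name `build_query` and its parameters are illustrative assumptions, not part of the original search protocol):

```python
# Illustrative sketch: assemble the Boolean search string quoted in the text.
# build_query and its parameters are hypothetical names, not from the study.

SECONDARY_TERMS = [
    "Machine learning",
    "Data Mining",
    "Intelligent Systems",
    "Problem Solving",
    "Higher Education",
]


def build_query(primary, secondary_terms, field="All Fields"):
    """Pair the primary term with each secondary term (AND), join pairs with OR."""
    clauses = [
        f'"{primary}" [{field}] AND "{term}" [{field}]'
        for term in secondary_terms
    ]
    return " OR ".join(clauses)


query = build_query("Programming", SECONDARY_TERMS)
print(query)
```

Individual databases differ in field tags and operator precedence, so a generated string like this would likely need adapting for each database.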

B. Inclusion Criteria
The inclusion criteria were that the studies were carried out at higher education institutions and examined the performance of students in programming using machine learning or data mining algorithms.

C. Exclusion Criteria
Articles written in languages other than English or published before January 2010 were excluded. Systematic reviews, editorials, books, book chapters and theses were excluded. Articles on the performance of students in programming at schools were also excluded, as were studies on performance prediction in subjects other than programming.

D. Statistical Data Analysis
The relevant data from the primary studies were captured in an Excel sheet, which facilitated export to the statistical analysis software STATA version 15. Forest plots were used to estimate the pooled effect size and the effect of each study with its confidence interval (CI), providing a visual image of the data. In a meta-analysis, it is essential to assess heterogeneity between the pooled studies. Heterogeneity in a meta-analysis denotes the dissimilarity in the results of the various studies. The index of heterogeneity (I² statistic) was used to assess the heterogeneity amongst the included studies, and its significance was tested using Cochran's Q test [32][33][34]. The I² statistic denotes the percentage of variation amongst the studies that is attributable to heterogeneity rather than chance. I² values of 25%, 50% and 75% indicate low, medium and high heterogeneity, respectively. Subgroup meta-analyses were conducted to assess the mean pooled performance estimates based on the different types of algorithms.
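The I² statistic is commonly computed from Cochran's Q and its degrees of freedom as I² = max(0, (Q − df) / Q) × 100%. The following is a minimal sketch of that relationship; the function name `i_squared` is an illustrative assumption, and the inputs are the chi-square value and degrees of freedom reported in the abstract.

```python
def i_squared(q, df):
    """Return I² as a percentage, given Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    # Share of total variation beyond what chance alone (df) would produce.
    return max(0.0, (q - df) / q) * 100.0


# Q and df as reported in the abstract of this paper.
value = i_squared(1210.91, 10)
print(round(value, 2))  # 99.17
```

With these inputs the formula gives about 99.17%, well above the 75% threshold for high heterogeneity.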
Publication bias refers to bias found in published academic research. It occurs when the results of an experiment or study affect the decision to publish or distribute it. Publishing only studies that show a noteworthy finding thus affects the outcome of the research findings. In addition, publication bias can result in the formulation and testing of hypotheses that are based on incorrect perceptions from the scientific literature. Hence, in this study, the small-study effect and the funnel plot were evaluated to assess the risk of publication bias. Furthermore, publication bias was assessed by means of Egger's and Begg's tests [35,36].
As indicated in Fig. 1, this systematic review includes papers published between January 2010 and November 2020. The identified articles were imported into EndNote version X9 and the duplicates removed, leaving 196 articles. A further 25 articles were removed after reading the abstracts. Following the review of the remaining 171 articles, 139 were removed for various reasons and a further 21 were excluded under the specified inclusion and exclusion criteria. The smallest sample size was 26 participants, in a study conducted with a machine learning algorithm, while the study with the largest sample size used a data mining algorithm approach. A total of 1956 participants were included in this meta-analysis. Most of the studies were carried out with the data mining algorithm approach, 8 (73%), followed by the hybrid algorithm approach, 2 (18%), and the machine learning approach, 1 (9%). In terms of the subgroup in which the prediction was made, three of the included studies were used to make a prediction and three concerned student-related prediction. Fig. 1 illustrates the PRISMA approach used in conducting the database searches.
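The screening counts reported here can be checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts taken from the text; variable names are illustrative only.
identified = 220           # articles found through database searching
after_duplicates = 196     # remaining after duplicate removal
removed_on_abstract = 25   # removed after reading abstracts
removed_various = 139      # removed for various reasons
removed_criteria = 21      # excluded under inclusion/exclusion criteria

duplicates_removed = identified - after_duplicates
reviewed = after_duplicates - removed_on_abstract
included = reviewed - removed_various - removed_criteria

# Share of included studies per algorithm approach, as whole percentages.
approach_counts = {"data mining": 8, "hybrid": 2, "machine learning": 1}
shares = {name: round(100 * n / included) for name, n in approach_counts.items()}

print(duplicates_removed, reviewed, included)
print(shares)
```

The result is consistent with the 171 articles reviewed, the 11 studies included, and the 73%, 18% and 9% shares quoted for the three approaches.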

A. Performance of Various Algorithms
The meta-analysis comprised eleven published studies, all of which were considered in estimating the pooled performance of the algorithms used to make the prediction. Stratification was based on the different types of algorithms used in the extracted articles. The minimum performance of algorithm prediction was 10%, found in studies concerned with dropout and retention. Conversely, the maximum algorithm prediction performance was 36%, in a study performed with the associated student-related subgroup data. The I² test statistic revealed high heterogeneity (I² = 99.17%, P < 0.001). By means of the random-effects analysis, the pooled performance of the algorithms was 24% (95% CI: 13%, 35%). Subgroup analysis based on the types of algorithm techniques showed that the performance of the algorithms in studies using hybrid and data mining approaches was 3% (95% CI: 1%, 5%) and 20% (95% CI: 9%, 32%), respectively (Fig. 2). In the forest plot, the midpoint and the length of each segment show the performance and its 95% CI, while the diamond shape indicates the combined performance of all studies.
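To illustrate what a random-effects pooled estimate involves, the sketch below implements the DerSimonian-Laird estimator in plain Python. This is an assumption made for illustration: the text states only that a random-effects analysis was run in STATA, not which estimator was used, and the input values are hypothetical rather than the eleven included studies.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    Returns (pooled_effect, ci_low, ci_high, tau2) with a 95% normal CI.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


# Hypothetical proportions and variances, NOT the studies in this meta-analysis.
effects = [0.10, 0.24, 0.36]
variances = [0.0004, 0.0009, 0.0016]
pooled, ci_low, ci_high, tau2 = dersimonian_laird(effects, variances)
print(f"pooled={pooled:.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f}), tau2={tau2:.4f}")
```

When the studies disagree strongly, tau2 grows and the random-effects weights become more nearly equal, which widens the confidence interval relative to a fixed-effect analysis.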

B. Publication Bias
All the studies in the meta-analysis were visually evaluated for publication bias using the funnel plot. The literature suggests evaluating publication bias in a meta-analysis in order to draw a reasonable conclusion about the generalizability of cumulative findings that can be affected by bias. The aim was to identify the degree to which bias influences the study outcome, so as to determine the validity of the core findings. The funnel plot is a standard visual method for identifying publication bias. It is a scatterplot of the standard errors of the log odds ratios against the study effect sizes, computed as log odds ratios. In a funnel plot depicting a meta-analysis with no publication bias, studies are symmetrically distributed on either side of the vertical line marking the pooled effect size, provided no relevant findings are missing. The asymmetry of the funnel plot indicated the presence of publication bias, since a high percentage (82%) of the studies fell outside the triangular region and only a small proportion (18%) fell inside it (Fig. 3). In addition, the result of Egger's test revealed the presence of publication bias, with a P-value < 0.05 (Table III). The presence of publication bias was thus assessed subjectively using funnel plots and objectively using Egger's test. Each point in the funnel plots represents a separate study, and an asymmetrical distribution of studies on the plot is an indication of publication bias. The studies' effect sizes were plotted against their standard errors, and the assessment of the funnel plots revealed that in all cases the funnel plots were slightly asymmetrical (Fig. 3).
The visual examination of a funnel plot is generally subject to interpretation, for which reason the Egger asymmetry method has been suggested as a complementary statistical test for bias.
The purpose of the Egger test is to perform a simple linear regression and test whether the model intercept differs significantly from zero at P < 0.05. The funnel plots were therefore also assessed objectively by means of Egger's weighted regression statistics. Under the symmetry assumption, there is publication bias in the combined pooled estimates of algorithms (p = 0.013) (Table III).
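The regression behind the Egger test can be sketched as follows: each study's standardized effect (effect divided by its standard error) is regressed on its precision (one over the standard error), and the intercept is examined. This is the classic unweighted form, offered as an illustrative assumption; the exact STATA routine and options used in the study are not stated, and the data below are hypothetical.

```python
import math


def egger_regression(effects, std_errors):
    """Regress standardized effect (effect / SE) on precision (1 / SE).

    Returns (intercept, slope, intercept_se). An intercept far from zero
    suggests funnel-plot asymmetry. Needs at least three studies.
    """
    x = [1.0 / s for s in std_errors]
    y = [e / s for e, s in zip(effects, std_errors)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    intercept_se = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, slope, intercept_se


# Hypothetical data built so that smaller studies (larger SE) report larger effects.
std_errors = [0.02, 0.04, 0.08, 0.16]
effects = [0.25 + 1.5 * s for s in std_errors]
intercept, slope, intercept_se = egger_regression(effects, std_errors)
print(intercept, slope)
```

In practice the ratio of the intercept to its standard error would be compared against a t distribution with n − 2 degrees of freedom to obtain a P-value; that lookup is omitted here because the Python standard library does not provide a t distribution.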

C. Sensitivity Analysis
A sensitivity test was also conducted to determine the influence of each study. The sensitivity analysis of the performance of algorithms was carried out with a random-effects model (Table IV), excluding one study at a time to determine the effect of each study on the pooled estimated performance of algorithms. The outcome indicated that excluding any single study made no significant difference to the pooled performance of algorithms. Sensitivity analysis is crucial for evaluating the robustness of combined estimates to different assumptions and inclusion criteria. The combined estimates were obtained by excluding studies judged to be at high risk of bias and retaining those judged to be at low or moderate risk of bias [37,38]. Hence, the presented sensitivity analysis indicated that the meta-analysis is fairly robust to publication bias. Furthermore, the sensitivity analysis was used to assess the effects of probable violations of modelling assumptions, all of which produced similar results.
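The leave-one-out procedure can be sketched as follows. For brevity this sketch pools with simple inverse-variance (fixed-effect) weights, which is a simplifying assumption; the study itself reports using a random-effects model. The function names and data are hypothetical.

```python
def inverse_variance_pool(effects, variances):
    """Pooled estimate using inverse-variance weights (fixed-effect, for brevity)."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(effects, variances):
    """Recompute the pooled estimate with each study omitted in turn."""
    results = []
    for i in range(len(effects)):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        results.append(inverse_variance_pool(rest_effects, rest_variances))
    return results


# Hypothetical values, NOT the studies in this meta-analysis.
effects = [0.1, 0.2, 0.3]
variances = [0.01, 0.01, 0.01]
overall = inverse_variance_pool(effects, variances)
omitted = leave_one_out(effects, variances)
print(overall, omitted)
```

If every leave-one-out estimate stays close to the overall estimate, no single study is driving the pooled result.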

V. LIMITATIONS
One limitation of the meta-analysis observed in this study is the exclusion of articles that do not satisfy all the inclusion criteria. Articles that were excluded may contain useful information, and a few relevant studies may have been missed through the exclusion criteria. Another limitation of the current study is that only the perspective of students was considered. Extending the study to capture other institutions' perspectives apart from learning institutions could have yielded more insightful findings. Nevertheless, this meta-analysis has provided valuable information regarding the most effective Educational Data Mining approaches for predicting the performance of students pursuing computer programming. These limitations could be addressed in a future study. Further research is needed to explore the interdependencies among factors that can be utilized to predict the performance of students pursuing computer programming. In the future, we plan to explore ways to analyze missing data in related articles to cover the vital information that may have been lost because of the exclusion criteria of this study.

VI. CONCLUSION
Meta-analysis has previously been used to identify and analyze factors influencing student performance, but this is the first study to apply meta-analysis to identify the most effective Educational Data Mining approaches for predicting the performance of students pursuing computer programming. Effect sizes, variation and bias were determined for the included studies, because different classifications of algorithms were applied to predict the performance of students pursuing computer programming. The results showed that the pooled estimate of the most effective Educational Data Mining approaches used to predict the performance of students pursuing computer programming was high. An attempt was made to determine the possible sources of heterogeneity by means of subgroup analysis, meta-regression and sensitivity analysis; however, the sources of variability could not be established in all cases. The most likely reason for this considerable heterogeneity is the variation in the sample sizes used across the studies that applied the various algorithms.

Fig. 3. Funnel plot with pseudo 95% confidence limits.