Educational Data Mining Applications and Techniques

Educational data mining (EDM) applies data mining techniques to the large volumes of student data generated in educational environments. Its main purpose is to analyze and solve educational issues and, consequently, improve educational processes. As EDM applications have emerged in educational environments, several techniques have been identified to implement them. This paper reviews relevant EDM studies, including the datasets and techniques they used, and identifies the most effective techniques. The most prevalent applications are predicting student performance, detecting undesirable student behaviors, grouping students and student modeling. These applications aim to help decision makers in educational institutions understand student situations, improve students’ performance, identify learning priorities for different groups of students and develop the learning process. Prediction accuracy is selected as the evaluation criterion for the effectiveness of EDM techniques. The results show that Bayesian Network and Random Forest are the most effective techniques for predicting student performance, Social Network Analysis is the most effective technique for detecting undesirable student behaviors, and Clustering and Social Network Analysis are the most effective techniques for grouping students and student modeling, respectively. This study recommends conducting more comprehensive and extended studies that evaluate the effectiveness of EDM techniques against extended evaluation criteria.
Keywords—Educational data mining; student performance; prediction; classification; clustering


I. INTRODUCTION
The main aim of educational systems is to provide students with the knowledge and skills they need to move into their future careers within a specific period. How effectively educational systems meet this aim is a key determinant of both social and economic progress [1]. The technologies used in educational systems generate massive amounts of data that are difficult to analyze by eye [2]. Educational data mining (EDM) uses different data mining techniques [3] to analyze students' data in educational environments. The main purpose of EDM is to analyze and solve educational issues and, consequently, improve educational processes [4]. EDM includes extracting useful, interpretable, interesting and novel information from data within the educational field.
Lately, with the emergence of EDM applications in educational environments, several techniques have been identified to implement these applications. This study therefore explores the techniques used in EDM applications. Among these applications, the most prevalent are predicting student performance, detecting undesirable student behaviors, grouping students and student modeling.
EDM applications have different objectives, including enhancing the quality of learning and improving the understanding of the learning process [5]. Moreover, EDM applications target the different stakeholders in educational systems, including students, researchers, administrators and educators. Providing recommendations, personalization and feedback can develop students' learning. This paper reviews the relevant studies in the EDM landscape, including the datasets and techniques used in those studies, and identifies the most effective techniques for EDM applications, with an emphasis on applications concerning students. These applications are important because they help decision makers in educational institutions gain a deep understanding of student situations, improve students' performance, identify learning priorities for different groups of students and develop the learning process.
This paper is organized as follows. Section II reviews studies that analysed the landscape of Educational Data Mining (EDM). Section III presents a thorough analysis of EDM applications and their associated techniques. Section IV discusses the surveyed techniques for each application and identifies the most effective ones, while Section V concludes this research and offers some thoughts on future work.

II. RELATED WORK
Several investigations have been carried out into Educational Data Mining (EDM) applications and techniques in academic environments, demonstrating the importance of EDM for extracting accurate information about students' behavior and the effectiveness of the learning process [6]. Several survey papers on EDM have also been published. In a 2018 survey on EDM in higher education, Ray and Saeed [2] counted four areas of application, namely predicting students' performance, educating students using big data, assessment of students' learning, and teaching and research. Based on these applications, they described the EDM techniques that have been applied to improve and understand the learning process of students. In another survey of EDM applications, Bakhshinategh et al. [5] suggested 13 categories of EDM applications under student modeling, decision support systems and other applications. This survey has been useful because it described these applications with the help of research examples. Owing to the importance of educational data mining, much education-related research has involved data analysis and data mining. This paper reviews EDM techniques with an emphasis on their applications concerning students.
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 4, 2020

A. Predicting Student Performance
The main aim of this application is to predict students' academic failure in order to improve their learning and develop the educational process. It also helps stakeholders in education to improve the future performance of students [5] [7]. The most used techniques for predicting student performance include Bayesian classification, decision trees, neural networks, rule-based methods and feature selection.
In 2015, Ahmad, Ismail and Abdul Aziz [8] studied techniques for predicting students' academic performance in the first year of a bachelor's degree in computer science. They applied Naïve Bayes, Rule Based and Decision Tree techniques to the students' data to produce the best prediction model for academic performance. Their results showed that Rule Based was the best prediction model, with the highest prediction accuracy of 71.3%. The study [9] by Kaur, Singh and Josan focused on predicting slow learners among students using different classification techniques. They collected a dataset of 152 high school students, then analyzed and tested students' performance using the WEKA tool. The comparison between the predictive techniques showed that the Multilayer Perceptron technique had the best prediction accuracy of 75%.
The study [10] by Mueen, Zafar and Manzoor in 2016 applied different data mining techniques, including Naïve Bayes, Decision Tree and Neural Network, to students' data from two courses to predict academic performance. The results showed that the Naïve Bayes algorithm had the highest prediction accuracy of 86%, while Decision Tree and Neural Network had 82.7% and 79.2%, respectively.
In 2017, Abu Amra and Maghari [11] proposed a model for predicting students' performance from their attributes using the Naïve Bayes and K-Nearest Neighbor (KNN) classification algorithms. They collected data on 500 secondary school students with eight attributes. Their results showed that the Naïve Bayes classifier had the better prediction accuracy of 93.6%, while the KNN classifier had 62.9%. Another study [12] by Almarabeh applied five data mining classification techniques for predicting students' performance using the WEKA tool. He collected data from 225 university students. His results showed that Bayesian Network had the highest prediction accuracy of 92%, Naïve Bayes and J48 had the same prediction accuracy of 91.11%, Neural Network had a prediction accuracy of 90.2% and Iterative Dichotomiser 3 (ID3) had the lowest prediction accuracy of 88%. Mousa and Maghari [13] applied three data mining classification techniques, namely Naïve Bayes, Decision Tree and K-Nearest Neighbor (K-NN), to predict the performance of students using academic attributes. They collected data on 1100 male students from a preparatory school in the Gaza Strip. Their results showed that Decision Tree had the best prediction accuracy of 92.96%, Naïve Bayes had a prediction accuracy of 91.50% and K-NN had a prediction accuracy of 90.91%.
Kapur and Ahluwalia [14] compared six data mining algorithms for predicting the marks of students, namely Naïve Bayes, Random Forest, Decision Tree, IBk, K-star and Naïve Bayes Multiple Nominal. They collected a dataset of 480 records with 16 attributes and used the WEKA tool. Their results showed that Random Forest had the highest prediction accuracy of 76.667%. Khasanah and Harwati [15] applied Bayesian Network and Decision Tree techniques for predicting the performance of students in order to avoid student failures. They collected data from Industrial Engineering students at Islam Indonesia University. The results showed that Bayesian Network had the best prediction accuracy of 98.08%, while Decision Tree had 94.23%. Makhtar, Nawang and Shamsuddin [16] classified students' performance according to their performance in specific subjects using the Naïve Bayes algorithm. They collected data on 488 students from the Maktab Rendah Sains MARA Kuala Berang Information System. Their results showed that the Naïve Bayes method had an accuracy of 73.4%.
In 2018, Hussain, Dahan, Ba-Alwib and Ribata [17] applied four data mining techniques to predict students' performance and avoid student dropout using the WEKA tool. They collected a dataset of 300 students with 24 attributes from colleges in Assam, India. Their results showed that Random Forest had the highest prediction accuracy of 99%, PART had a prediction accuracy of 74.33%, J48 had a prediction accuracy of 73% and Bayes Network had a prediction accuracy of 65.33%.
In 2019, Salal, Abdullaev and Kumar [18] implemented data mining classification algorithms including Naïve Bayes, Random Forest, JRip, REPTree, OneR, Decision Tree (J48), SimpleLogistic and ZeroR for predicting students' academic performance. They collected data on 649 students with 33 attributes from two secondary schools, then analyzed it using the WEKA tool. The results showed that Decision Tree (J48), REPTree and OneR each had a prediction accuracy of more than 76%: Decision Tree (J48) had an accuracy of 76.2712%, while REPTree and OneR had the same accuracy of 76.7334%. The study [19] by Agarwal, Maheshwari, Roy, Pandey and Rautray analyzed the data of 306 students in higher education for predicting student performance using two classification algorithms, K-Nearest Neighbor and Random Forest. Their results showed that Random Forest had the highest prediction accuracy of 93.54%. Adekitan and Salau [20] analyzed the performance of students using six data mining algorithms, namely Naïve Bayes, Random Forest, Logistic Regression, Decision Tree, Neural Network and Tree Ensemble. They collected 1841 records from engineering students in their first three years. Their results showed that Logistic Regression had the maximum prediction accuracy of 89.15%. Adekitan and Noma-Osaghae [21] used data mining algorithms for predicting students' performance on the KNIME and Orange platforms. They analyzed the data of first-year students at Covenant University in Nigeria. Their results showed that Logistic Regression on the KNIME platform and Neural Network on the Orange platform had prediction accuracies of 50.23% and 51.9%, respectively.
Another study [22] by Rifat, Al Imran and Badrudduza used six data mining classification algorithms, namely Random Forest, Decision Tree, Tree Ensemble, Gradient Boosted Tree, K-Nearest Neighbors and Support Vector Machine, for predicting students' performance. They collected the data of 398 business students from the Marketing department of a renowned university in Bangladesh, then analyzed it using KNIME (the Konstanz Information Miner). Their results showed that Random Forest had the highest prediction accuracy of 94.1%.
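Two of the classifiers named above, ZeroR and OneR, are simple enough to sketch directly, which helps to interpret the reported accuracies: ZeroR always predicts the majority class, and OneR builds a rule from the single attribute that makes the fewest training errors. The following Python sketch is illustrative only. The attribute names and the seven toy records are hypothetical and are not taken from any surveyed dataset, and the code is not the WEKA implementation used in the cited studies.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR: ignore every attribute and return the majority class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """OneR: return (attribute, rule, training_errors) for the single attribute
    whose value -> majority-class rule makes the fewest training errors."""
    best = None
    for attr in rows[0]:
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attr]][label] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(rule[row[attr]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical toy records (not from any cited study).
rows = [
    {"attendance": "high", "homework": "done"},
    {"attendance": "high", "homework": "missing"},
    {"attendance": "low", "homework": "done"},
    {"attendance": "low", "homework": "missing"},
    {"attendance": "high", "homework": "done"},
    {"attendance": "low", "homework": "missing"},
    {"attendance": "high", "homework": "done"},
]
labels = ["pass", "pass", "fail", "fail", "pass", "fail", "pass"]

baseline = zero_r(labels)
attr, rule, errors = one_r(rows, labels)
zero_r_accuracy = accuracy([baseline] * len(labels), labels)
one_r_accuracy = accuracy([rule[row[attr]] for row in rows], labels)
```

The ZeroR accuracy is a useful floor: a technique whose accuracy is close to the majority-class rate of its dataset has learned little from the attributes, which matters when accuracies from datasets with different class balances are compared.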
In 2020, Alhakami et al. [23] used the J48 and Naïve Bayes algorithms to predict students' academic performance and to help in advising students, using the WEKA tool. They collected the data of 38671 male and female students from Umm Al-Qura University over 5 years, with several attributes including exam marks, school, sex, age, nationality, city and final grade. Their results showed that the J48 algorithm had the highest accuracy of 84.38%.

B. Detecting Undesirable Student Behaviors
Similar to student performance prediction, this application focuses on detecting undesirable student behaviors, including erroneous actions, low motivation, cheating and academic failure. The main data mining techniques used include classification, clustering, outlier detection, feature selection, decision trees and neural networks [5].
In 2012, Bayer et al. [24] focused on predicting school failures and drop-outs when student data is enriched with data derived from students' social behavior. The collected data described social dependencies gathered mainly from discussion board conversations and e-mail. They described the extraction of new features, from both the student data and the behavior data, through a constructed social graph. They then introduced a novel method for learning a student failure prediction classifier that uses cost-sensitive learning in order to lower the number of wrongly classified unsuccessful students. The results showed that Social Network Analysis (SNA) produced a significant increase in prediction accuracy, to 92.89%.
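The idea behind cost-sensitive learning can be illustrated with a minimal Python sketch. It is not the method of Bayer et al. [24]; it only shows the general principle that, when missing a failing student is considered more costly than raising a false alarm, the decision rule should pick the label with the lower expected cost rather than the more probable label. The function name, the probability input and the cost values are all assumptions made for illustration.

```python
def cost_sensitive_label(p_fail, cost_missed_failure=5.0, cost_false_alarm=1.0):
    """Choose the label with the lower expected misclassification cost.

    p_fail is a classifier's estimated probability that the student fails
    (hypothetical input); the two costs are illustrative assumptions.
    """
    # Predicting "pass" is wrong with probability p_fail.
    expected_cost_if_pass = p_fail * cost_missed_failure
    # Predicting "fail" is wrong with probability 1 - p_fail.
    expected_cost_if_fail = (1.0 - p_fail) * cost_false_alarm
    return "fail" if expected_cost_if_fail < expected_cost_if_pass else "pass"

# With a 5:1 cost ratio, a student with a 30% failure probability is flagged.
flagged = cost_sensitive_label(0.3)
# With equal costs, the same student is not flagged.
not_flagged = cost_sensitive_label(0.3, cost_missed_failure=1.0, cost_false_alarm=1.0)
```

Under these assumed costs the rule flags any student whose failure probability exceeds 1/6, so fewer unsuccessful students are missed, at the price of more false alarms.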
The study [25] by Guarín, Guzmán and González in 2015 applied two data mining methods, Naïve Bayes and Decision Tree, to predict low academic performance of students in their first four enrollments. They collected data from an Architectural Engineering program and a Computer and System Engineering program. Their results showed that Naïve Bayes had the best prediction accuracy of 75%.
The study [26] by Athani et al. in 2017 aimed to enhance the behavior of secondary school students using data mining techniques. A Naïve Bayesian classifier was implemented to predict the behavior of students. The accuracy of the classifier was calculated with the WEKA tool, where a confusion matrix was generated. The obtained classifier accuracy was 87%, which could be further enhanced through appropriate attribute selection.
In 2019, the study [27] by Pattanaphanchai, Leelertpanyakul and Theppalak proposed a model to predict students' dropout patterns using the WEKA tool. The dataset was collected over five years from the Faculty of Science, Prince of Songkla University. Their results showed that JRip had a prediction accuracy of 77.30%.
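Since prediction accuracy derived from a confusion matrix is the evaluation criterion used throughout this review, a short Python sketch of the calculation may be helpful. The class names and the ten predictions are hypothetical and do not reproduce the data of any cited study; tools such as WEKA report the same quantity automatically.

```python
def confusion_matrix(actual, predicted, classes):
    """Count outcomes as matrix[actual_class][predicted_class]."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy_from_matrix(matrix):
    """Accuracy = correctly classified samples (the diagonal) / all samples."""
    correct = sum(matrix[c][c] for c in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total

# Hypothetical behavior labels for ten students.
actual = ["good"] * 6 + ["poor"] * 4
predicted = ["good"] * 5 + ["poor"] * 4 + ["good"]

matrix = confusion_matrix(actual, predicted, ["good", "poor"])
acc = accuracy_from_matrix(matrix)
```

Here eight of the ten hypothetical students lie on the diagonal of the matrix, so the accuracy is 0.8. The off-diagonal cells show which kind of error was made, information that a single accuracy figure hides.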

C. Grouping Students
The aim of the grouping students application is to create groups or clusters of students according to different properties of their profile information [28]. This application is used by different stakeholders in education for several tasks that develop the educational process [29]. Grouping often differs from clustering similar students together, because the aim may be to group students who complement each other. Moreover, clustering seeks the highest dissimilarity between clusters, which is not always the case in student grouping [5]. The most common EDM techniques used in grouping students include clustering, neural networks and feature selection.
In 2013, Harley, Trevors and Azevedo [30] presented results obtained by clustering data collected from 106 students. Three extracted clusters were analyzed and validated through multivariate statistics (MANOVAs) to characterize three distinct student profiles, showing statistically significant differences in all twelve variables used to form the clusters (such as performance, note-taking use and number of sub-goals attempted). The results showed that variations existed among the clusters concerning the prompts perceived through the system. The overall prediction accuracy of the clusters was 78.8%. On the other hand, Nunes and Minussi [31] investigated a Neural Network technique for grouping students who are likely to fail. The results of this study indicated that the Neural Network technique reached a prediction accuracy of 76%.

D. Student Modeling
This application defines different aspects that characterize the student, including cognition, skills, emotions, domain knowledge, learning strategies, achievements, features, learning preferences, affects and evaluation. The aim is to characterize the student and adjust the teaching processes to meet the learning requirements of students [32]. The main techniques used in the student modeling application include Social Network Analysis (SNA), rule induction, decision trees, Linear Discriminant Analysis (LDA) and Bayes' theorem.
Dekker, Pechenizkiy and Vleeshouwers (2009) [33] aimed to predict student dropout using data mining classification algorithms. They collected a dataset of 648 students in the Electrical Engineering program. The results of this study showed that simple decision tree and intuitive decision tree classifiers gave prediction accuracies of 75% and 80%, respectively.
In another study [34], Macfadyen and Dawson (2010) investigated which student prediction model is more effective. The results of the study showed that Social Network Analysis (SNA) generated the best prediction. Logistic modeling validated the predictive power of SNA, which achieved a prediction accuracy of 81%.
The study [35] by Sivakumar, Venkataraman and Selvaraj (2016) used an improved decision tree for predictive modeling of student dropout. The dataset consisted of 240 records collected at a university in India. The results showed that the improved decision tree had a prediction accuracy of 97.50%.

IV. DISCUSSION
Based on the above review of studies that analyzed students' data to solve educational issues, seven techniques are identified for four EDM applications. The comparison between the surveyed techniques was based on prediction accuracy, the evaluation criterion common to all of the selected research papers. Table I presents the main surveyed techniques.
According to Fig. 1, Social Network Analysis (SNA) is the most common technique, being the most effective technique for both detecting undesirable student behaviors and student modeling. However, the effectiveness of SNA is not the same for both applications, which indicates that each application calls for different types of techniques. SNA focuses on detecting students' interaction patterns and has evolved with the emergence of social networking sites such as Facebook and Twitter [36] [37]. SNA is also used for assessing students' participation in the discussions of online courses [38]. In previous studies, SNA was used to monitor students' creative capacity [39] and to detect "at-risk" students, in addition to Bayer et al. [24], who used SNA for dropout prediction.
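One basic SNA measure of participation is degree centrality: the share of other participants with whom a student has interacted directly. The Python sketch below computes it from a list of discussion-board reply pairs. It is a minimal illustration under stated assumptions, since the student names and reply pairs are hypothetical and the surveyed studies may have used other SNA measures and tools.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree centrality of each node in an undirected interaction graph:
    the number of distinct neighbors divided by (number of nodes - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-replies
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}

# Hypothetical reply pairs from a course discussion board.
replies = [
    ("ana", "ben"),
    ("ana", "cho"),
    ("ana", "dev"),
    ("ben", "cho"),
    ("dev", "eli"),
]

centrality = degree_centrality(replies)
least_connected = min(centrality, key=centrality.get)
```

In this toy graph "eli" has the lowest centrality and would be the first candidate for an "at-risk" check. Students who never post do not appear in the edge list at all, so in practice the class roster would have to be added to the graph separately.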
Naïve Bayes is one of the classification algorithms among the EDM techniques. It is mainly used for predictive modeling and is based on Bayesian techniques that assume independent attributes [40]. Naïve Bayes, rule induction and decision tree algorithms can easily be implemented in the form of IF-THEN rules in object-oriented programming, which are simple to understand [41] [42]. In this way, ordinary users with no deep knowledge of data mining can easily understand the results obtained with these algorithms.
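The independence assumption can be made concrete with a small Python sketch of a categorical Naïve Bayes classifier: the score of each class is its prior probability multiplied by the per-attribute conditional probabilities, each estimated separately from counts. The sketch uses add-one (Laplace) smoothing, which is an assumption of this illustration, and the attributes and records are hypothetical rather than drawn from any surveyed dataset.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Collect the counts needed for the class priors and per-attribute likelihoods."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    values_seen = defaultdict(set)       # attribute -> distinct values observed
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict_naive_bayes(model, row):
    """Return the class with the highest log-posterior, treating attributes as independent."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_score = math.log(count / total)  # class prior
        for attr, value in row.items():
            # Add-one smoothing so unseen values do not zero out the product.
            numerator = value_counts[(attr, label)][value] + 1
            denominator = count + len(values_seen[attr])
            log_score += math.log(numerator / denominator)
        scores[label] = log_score
    return max(scores, key=scores.get)

# Hypothetical toy records (not from any cited study).
rows = [
    {"attendance": "high", "homework": "done"},
    {"attendance": "high", "homework": "missing"},
    {"attendance": "low", "homework": "done"},
    {"attendance": "low", "homework": "missing"},
    {"attendance": "high", "homework": "done"},
    {"attendance": "low", "homework": "missing"},
    {"attendance": "high", "homework": "done"},
]
labels = ["pass", "pass", "fail", "fail", "pass", "fail", "pass"]

model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, {"attendance": "high", "homework": "missing"})
```

Logarithms are summed instead of multiplying raw probabilities so that the score does not underflow when many attributes are used, as in the datasets with 24 or 33 attributes reviewed above.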
Decision tree is a popular and powerful prediction and classification technique. It is the most frequently used data mining technique in the related studies [43]. The algorithm consists of a number of nodes, starting from the root node, that are used to gather the related information that supports the decision-making process [44].
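The node structure and its relation to IF-THEN rules can be sketched in Python as follows. The tree is written by hand for illustration, so its attributes, values and labels are hypothetical; in the surveyed studies the tree is learned from data by algorithms such as J48 or ID3, which this sketch does not implement.

```python
# A hand-written (hypothetical) tree: internal nodes test one attribute,
# leaf nodes carry a class label.
tree = {
    "attribute": "attendance",
    "branches": {
        "low": {"label": "fail"},
        "high": {
            "attribute": "homework",
            "branches": {
                "done": {"label": "pass"},
                "missing": {"label": "at risk"},
            },
        },
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while "label" not in node:
        node = node["branches"][record[node["attribute"]]]
    return node["label"]

def to_rules(node, conditions=()):
    """Turn every root-to-leaf path into one readable IF-THEN rule."""
    if "label" in node:
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node['label']}"]
    rules = []
    for value, child in node["branches"].items():
        rules += to_rules(child, conditions + (f"{node['attribute']} = {value}",))
    return rules

rules = to_rules(tree)
outcome = classify(tree, {"attendance": "high", "homework": "missing"})
```

Each rule in `rules` corresponds to one path through the tree, for example "IF attendance = high AND homework = done THEN pass", which is the form that makes the results readable for users without a data mining background.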
A neural network is a set of connected input/output units in which every connection has an associated weight [45]. During the learning phase, the network learns by modifying the weights so that it predicts the correct class of the input samples.
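The weight-modification step can be illustrated with the simplest possible case, a single perceptron unit, sketched below in Python. This is far smaller than the multilayer networks used in the surveyed studies, and the two binary input features, the target rule and the learning rate are assumptions chosen only to keep the example small.

```python
def predict(weights, bias, x):
    """Output class 1 if the weighted sum of the inputs exceeds zero, else class 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, targets, epochs=20, learning_rate=0.5):
    """Perceptron learning rule: after each sample, shift the weights by
    learning_rate * (target - prediction) * input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical features: [submitted_assignment, attended_lab]; a student is
# labeled 1 (pass) only when both are 1.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, targets)
learned = [predict(weights, bias, x) for x in samples]
```

The weights change only when a sample is misclassified, so training stops changing them once every sample is predicted correctly. A single unit can only learn linearly separable classes, which is why practical networks connect many such units in layers.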
Clustering techniques achieved better prediction accuracy than neural network techniques in the student grouping application. Clustering is a very efficient grouping technique because it divides the data into groups, or clusters, with similar characteristics [46]. However, clustering reduces the size of the data [47], so some details are lost.
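As an illustration of how clustering divides data into groups with similar characteristics, the following Python sketch implements a basic k-means procedure. It is one clustering algorithm among many and is not claimed to be the one used in the surveyed studies. The two features (quiz score and study hours per week), the six records and the deterministic choice of the first k points as initial centroids are all assumptions of this sketch.

```python
def k_means(points, k, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its members, and repeat."""
    # Assumption: the first k points serve as initial centroids (deterministic).
    centroids = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical (quiz score, study hours per week) records.
students = [(90, 10), (35, 2), (85, 9), (40, 3), (95, 11), (30, 1)]

assignment, centroids = k_means(students, k=2)
```

The six records are summarised by two centroids, (90, 10) and (35, 2), which illustrates the loss of detail noted above: the individual differences within each cluster are no longer visible once only the cluster is reported.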
The results of this review could serve as a reference for decision makers in educational systems, providing decision support when validating the prediction technique used. Moreover, this paper analyses the effectiveness of the techniques used in EDM applications in terms of their prediction accuracy.
This study helps in improving the performance and experience of students by presenting the best prediction techniques. Teachers can also benefit from this paper by finding the best methods for developing educational processes. Using educational data mining, teachers can identify students' behavioral patterns to support their judgments and teaching methods, determine indicators of student engagement and satisfaction, and monitor learning progress. Moreover, researchers can use this review to concentrate further on the evaluation and development of educational data mining techniques.

V. CONCLUSION AND FUTURE WORK
With the increasing use of data mining applications in educational environments, this paper identifies the most effective techniques for each of these EDM applications. The importance of this review lies in its unified evaluation criterion for comparing the different techniques for each EDM application. Prediction accuracy is used as the indicator of the effectiveness of the surveyed techniques. This paper indicates that a technique that is effective in one application is not necessarily effective in another. Therefore, further surveys should be conducted for each of the EDM applications to identify the most effective techniques more accurately. Moreover, extended evaluation and comparison criteria should be used in the evaluation.