Machine Learning Model through Ensemble Bagged Trees in Predictive Analysis of University Teaching Performance

The objective of this study is to analyze and discuss the metrics of a Machine Learning model built with the Ensemble Bagged Trees algorithm and applied to data on satisfaction with teaching performance in the virtual environment. Initially, the classification analysis carried out in the Matlab R2021a software identified an accuracy of 81.3% for the Ensemble Bagged Trees algorithm. After validating the collected data and obtaining the predictive model for the 4 classes (satisfaction levels), the total values were a Precision of 82.21%, a Sensitivity of 73.40%, a Specificity of 91.02% and an Accuracy of 90.63%. In turn, the highest area under the curve (AUC) of the receiver operating characteristic (ROC) is 0.93, which indicates a discrimination capacity of the predictive model of 93% for that class. The validation of these results will give the directors of the higher institution a database to be used in the process of improving the quality of the educational service in relation to teaching performance.
Keywords: Machine learning; ensemble; bagged trees; predictive analysis; teaching performance


I. INTRODUCTION
The information and communication technology (ICT) sector currently leads the analysis of data from different media [1], [2], such as virtual platforms, survey administration software and other technological tools [3], [4], which capture information to be processed and analyzed in descriptive statistical research or in research on predictive models applicable to various areas of knowledge [5].
The advantages that the introduction of ICT has generated in the education sector stem from the capacity of technology to support research that previously could not be carried out [6], [7], as is the case of the identification of predictive models for the analysis or monitoring of university teaching performance, student performance and other relevant factors for the education sector [8]-[10].
Worldwide, the education sector has undergone changes and transformations due to the virtualization of the teaching-learning mode [11], [12], [13]. As a consequence, universities face new challenges in safeguarding the quality of education, which goes hand in hand with the advancement of technology [14]-[16].
Given this, the education sector has generated an increasing amount of increasingly relevant data, a product of the interactions among the different actors of the educational process (the teacher, the students and the institution) through technological tools such as survey software, which generate a database [17], [18]. The stored data are used to improve the efficiency of the educational process through predictive models; among the factors to optimize are academic performance, student dropout, teaching performance and graduate follow-up [19].
There are various technologies for obtaining predictive models that use data from virtual platforms and from survey administration software applied to students by universities [20]. Among these technologies is Machine Learning, a field of Artificial Intelligence [21]-[23]. As indicated in [24], Machine Learning is a set of algorithms capable of learning to perform certain tasks by generalizing from examples. Machine Learning has been successfully applied to a variety of areas of human endeavor and has recently been applied to the educational sector, where it is oriented towards the design of algorithms, methods and models for exploring data from teaching-learning environments [25], [26].
367 | Page www.ijacsa.thesai.org
Among the multiple Machine Learning algorithms is Ensemble Bagged Trees, an algorithm used in ensemble learning [27]. It can combine training sets and base classifiers to produce ensemble models, or use a single algorithm with multiple test data sets as the basis [28]. In this regard, [29] points out that the Bagged Trees algorithm forms different trees when the starting point of the training data changes, which results in a decrease in stability. This technique is also suitable for the search for optimal models for large data, since the classification becomes easier [30], [31].
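
The bagging idea described above (different trees grown from different resamplings of the training data, then combined) can be illustrated with a minimal Python sketch. This is not the Matlab implementation used in the study: the function names, the toy data and the use of one-split trees (stumps) instead of full decision trees are assumptions made only to keep the example short.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) examples with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Fit a one-split tree (stump) on (features, label) pairs.

    A full decision tree would recurse on each side of the split; a stump
    is used here only to keep the sketch short.
    """
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    best = None  # (errors, feature index, threshold, left label, right label)
    for j in range(len(rows[0][0])):
        for threshold in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        return lambda x: majority
    _, j, threshold, left_label, right_label = best
    return lambda x: left_label if x[j] <= threshold else right_label


def fit_bagged_trees(rows, n_trees=25, seed=0):
    """Train each tree on its own bootstrap replicate of the training data."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict(ensemble, x):
    """Combine the individual trees by majority vote."""
    votes = Counter(tree(x) for tree in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data (NOT the study's data): two indicator scores on a
# 1-4 scale and a satisfaction class (2 = not very satisfied, 3 = satisfied).
toy_rows = [
    ((1, 2), 2), ((2, 1), 2), ((1, 1), 2), ((2, 2), 2),
    ((3, 4), 3), ((4, 3), 3), ((4, 4), 3), ((3, 3), 3),
]
ensemble = fit_bagged_trees(toy_rows)
print(predict(ensemble, (1, 1)), predict(ensemble, (4, 4)))
```

Because every tree sees a slightly different sample, individual trees may disagree; the vote is what makes the ensemble more stable than any single tree.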
In this sense, the main objective of this article is to determine a predictive model using Machine Learning through the Ensemble Bagged Trees algorithm for the predictive analysis of university teaching performance, in order to use it as part of the procedure to improve the quality of the educational process. First, the methodology is detailed; then the algorithm is validated by means of the accuracy and the confusion matrix; finally, the total performance metrics of the selected algorithm, namely Accuracy (A), Precision (P), Sensitivity (S) and Specificity (R), are analyzed together with the receiver operating characteristic (ROC) curve.
The contribution of the research lies in applying a technique that is novel for the higher institution: machine learning applied to the collected data and information, which allows preventive and corrective decisions to be made based on reliable results obtained through a relatively simple methodology.

II. METHODOLOGY
A. Type and Level of Research
The research is applied, since it starts from the identification of a problem related to the improvement of university teaching performance and addresses it with already defined methods or tools, such as predictive models obtained through Machine Learning with the Ensemble Bagged Trees algorithm. Likewise, the research level is descriptive, since it focuses on analyzing and discussing the metrics of the predictive model obtained through the Ensemble Bagged Trees algorithm, applied to the perception data of university engineering students.
This research also seeks to design a multidimensional predictive model that can be used to create and store new data for the higher institution. This technological tool determines patterns and calculates association rules, providing support and reliability to the results obtained. Performance metrics such as Accuracy, Precision, Sensitivity and Specificity show improved performance over the manual method of the same procedure commonly performed in research [28].

B. Participants
The participants in this research are students from the sixth to the tenth cycle of the professional engineering schools, 581 students in total; this selection criterion is part of a regulation established and approved by the higher institution. Since it was possible to collect data from the entire population, the sample coincides with the population.

C. Data Collection Technique and Instrument
The data collection technique is the survey, and the instrument used to collect data on university teaching performance is the questionnaire, which was administered virtually due to the health emergency declared because of COVID-19. The virtual platform of the higher institution was used, which gave access to the data collection instrument through the code of each student and thereby guaranteed the security and reliability of the information. The questionnaire collected responses on a Likert scale with levels from 1 to 4 (from dissatisfied to very satisfied). In the analysis, these levels of satisfaction are the classes of the predictive model. Fig. 1 shows the indicators considered as predictive elements in the perception of university teaching performance.

D. Reliability of the Collected Data
As part of the methodology, the collected data are validated through Cronbach's Alpha coefficient using the SPSS software. Table I shows that the consistency coefficient is equal to 0.932. As indicated in [12], values greater than 0.9 indicate great consistency, that is, high homogeneity and equivalence of the responses across all indicators. Once this reliability is established, the results are presented in the following sections.
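
For reference, Cronbach's Alpha for k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The following minimal Python sketch computes it with the standard library; the study used SPSS, and the response matrix below is hypothetical toy data with 3 items rather than the 6 indicators of the questionnaire.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's Alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 in the denominator), the usual convention.
    """
    k = len(responses[0])
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical toy responses on a 1-4 Likert scale (NOT the study's data):
# 5 respondents, 3 items each.
toy_responses = [
    [4, 4, 3],
    [3, 3, 3],
    [2, 3, 2],
    [1, 2, 1],
    [4, 3, 4],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3))
```

Values close to 1 arise when respondents who score high on one item also tend to score high on the others, which is the sense of "consistency" used above.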

E. Data Processing Design
The data processing followed a non-experimental, cross-sectional design in which data were collected through a virtual questionnaire. Fig. 2 shows the methodology of the research process, which begins with the collection of data on the perception of engineering students from a public university in Peru. These data correspond to the 6 indicators shown in Fig. 1, whose assessment of teaching performance is of an ordinal qualitative type, which establishes 4 classes (very satisfied: 4, satisfied: 3, not very satisfied: 2 and dissatisfied: 1). The information collected was stored in a Microsoft SQL Server database, connected to the Matlab R2021a software through the Open Database Connectivity (ODBC) driver. In Matlab, the "Classification Learner" tool was used to identify the best Machine Learning algorithm through its metrics. This algorithm classifies students based on the results obtained for the indicators specified in Fig. 1.

III. RESULTS AND DISCUSSION
A. Determination of the Predictive Model
Using the Classification Learner app of the Statistics and Machine Learning Toolbox 12.1 in Matlab R2021a, the best predictive model is identified by its validation accuracy. Fig. 3 shows the results generated by the software.
As shown in Fig. 3, the Machine Learning algorithm with the best accuracy for classifying the level of satisfaction with university teaching performance is Ensemble Bagged Trees, with an accuracy of 81.3%.

B. Results of the Predictive Model Metrics
When the Machine Learning predictive model based on Ensemble Bagged Trees is used to determine satisfaction with university teaching performance, confusion matrices are obtained, which serve to validate or measure the performance of the predictive model.
Fig. 4 shows the confusion matrix with respect to the sensitivity metric. It displays the observations classified by the system and reports, for each class, the false negative rate (FNR), which is the proportion of positive examples wrongly classified as negative, and the true positive rate (TPR), which is the proportion of positive examples correctly classified as positive. It thus shows how close the satisfaction levels predicted by the model (Predicted class) are to their true values (True class).
As can be seen in Fig. 4, of the 4 classes on which the Ensemble Bagged Trees predictive model acts, class 3 (satisfied) shows the highest sensitivity, 89.9%. This means that in this class the model is best able to identify true positives (TP) while avoiding false negatives (FN); it misclassified only 10.1% of the observations of this class. The lowest sensitivity of the predictive model occurs in class 1 (satisfaction level: dissatisfied), with a value of 63.9%.
Fig. 5 shows the confusion matrix with respect to the precision metric; the values on the main diagonal indicate the precision of the predictive model for each class. The predictive model shows the highest precision for class 1 (satisfaction level: dissatisfied), 88.5%. This result indicates that the level of dispersion of the data for this class is very low.
Table II shows the metrics of the Ensemble Bagged Trees predictive model for each class: the total Precision is 82.21%, the total Sensitivity is 73.40%, the total Specificity is 91.02% and the total Accuracy is 90.63%.
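
The per-class metrics can be derived from a multiclass confusion matrix by treating each class in turn as "positive" and the remaining classes as "negative" (one-vs-rest). The Python sketch below assumes that the totals in Table II are unweighted averages of such per-class values; the source does not state how they were computed, and the confusion matrix used here is hypothetical. Under that assumption, the averaged one-vs-rest accuracy for K classes equals 1 - 2(1 - overall accuracy)/K, which for an overall accuracy of 81.3% and K = 4 gives about 90.65%, close to the 90.63% reported.

```python
def per_class_metrics(cm):
    """One-vs-rest metrics per class; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # other classes predicted as c
        tn = total - tp - fn - fp
        metrics.append({
            "precision": tp / (tp + fp),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / total,
        })
    return metrics


def macro_average(metrics, name):
    """Unweighted mean of one metric over all classes."""
    return sum(m[name] for m in metrics) / len(metrics)


def overall_accuracy(cm):
    """Share of all observations that lie on the main diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Hypothetical 4-class confusion matrix (NOT the study's data).
toy_cm = [
    [8, 2, 0, 0],
    [1, 7, 2, 0],
    [0, 1, 9, 0],
    [0, 0, 2, 8],
]
toy_metrics = per_class_metrics(toy_cm)
print(overall_accuracy(toy_cm))                 # multiclass accuracy
print(macro_average(toy_metrics, "accuracy"))   # averaged one-vs-rest accuracy
```

In this toy case the two accuracy figures differ even though they come from the same matrix, which is why the kind of accuracy being reported should always be stated.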
As part of the Ensemble Bagged Trees predictive model, Matlab provides a receiver operating characteristic (ROC) graph for each class under study. The ROC graph describes the sensitivity and specificity of the classifier. For class 1 (dissatisfied), Fig. 6 shows an area under the curve (AUC) of 0.93, that is, a discrimination capacity of 93%.
In addition, the point marked on the curve corresponds to a true positive rate of 0.64 and a false positive rate of 0.00. Since the AUC of 0.93 is close to 1, the model for class 1 is considered optimal.
Fig. 7 shows the ROC graph for class 2 (not very satisfied), with an AUC of 0.91 (91%). The point marked on the curve corresponds to a true positive rate of 0.70 and a false positive rate of 0.08. Since the AUC is close to 1, the model for class 2 is considered optimal.
Fig. 8 shows the ROC graph for class 3 (satisfied), with an AUC of 0.91 (91%). The point marked on the curve corresponds to a true positive rate of 0.90 and a false positive rate of 0.26. Since the AUC is close to 1, the model for class 3 is considered optimal.
Finally, Fig. 9 shows the ROC graph for class 4 (very satisfied), with an AUC of 0.92 (92%). The point marked on the curve corresponds to a true positive rate of 0.70 and a false positive rate of 0.02. Since the AUC is close to 1, the model for class 4 is considered optimal.
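
To make the relation between the ROC curve and the AUC concrete, the following Python sketch builds the (false positive rate, true positive rate) points for one class against the rest by sweeping the decision threshold over the classifier scores, and then integrates the curve with the trapezoidal rule. The scores and labels are hypothetical; Matlab computes these curves internally and its exact procedure is not described in the source.

```python
def roc_points(scores, is_positive):
    """(FPR, TPR) points obtained by lowering the threshold over the scores."""
    pairs = sorted(zip(scores, is_positive), key=lambda p: -p[0])
    n_pos = sum(1 for flag in is_positive if flag)
    n_neg = len(is_positive) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Observations tied at the same score are added together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for one class versus the rest (NOT the study's data):
# 1 marks an observation that truly belongs to the class.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
print(toy_auc)
```

The AUC summarizes the whole curve, whereas any single (FPR, TPR) pair describes the classifier at one threshold only, so the two should not be read as the same quantity.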

C. Discussion
In relation to the results obtained, the predictive model based on the Ensemble Bagged Trees algorithm presents acceptable precision, sensitivity, specificity and accuracy in each of its 4 classes. The predictive model therefore provides security and reliability and contributes to decision making to improve the quality of the course content and the pedagogical methodology. In this regard, [16] points out that preventive and corrective decision-making in higher education institutions involves building predictive models based on intelligent systems. As indicated in [6], researchers have in recent years worked on developing models that help to understand aspects of the academic life of students, teachers and institutions and that support correct decisions for the continuous improvement of educational quality. Likewise, [19] indicates that its results and validations show a precision of 82%, so the process describes an optimal performance of the algorithms, and their incorporation into the management of virtual educational knowledge would be satisfactory.
In relation to the metrics of the predictive model, the model obtained through Matlab R2021a presents an overall precision of 82.21% and an accuracy of 90.63%, and is therefore considered an optimal model. In this regard, the author of [20] states that his predictive model was good, since its overall precision was 75.42% and its area under the ROC curve was 0.805. Likewise, [27] points out that each of the techniques used shows good classification and prediction performance, with a highest precision of 86.9%.
On the other hand, the results of [26] showed a precision rate of 89.31% and a specificity rate of 91.25%; these measures are important for selecting classifiers, since the researcher intends to minimize false negatives.
Regarding the term optimal model, [4] points out that the so-called optimal models are combined with the dominant sets, which significantly improve the performance of prediction models and are highly influential in academic performance factors. Likewise, regarding the area under the curve, whose highest value in this research was 0.93 or 93%, [4] indicates that an AUC above 50%, such as the 91% or 99% obtained in that research, represents better classifier performance, which is a favorable result for research.
The results of this study, from the perspective of innovation, will make it possible to achieve great changes by delegating functions, promoting competencies and fostering the continuous updating of higher institutions, all from the perspective of visionary leadership. In [10] it is pointed out that the proposed model accurately predicts course completion and student performance in the university, which allows the organization to provide a better quality of service, since student satisfaction depends on it.

IV. CONCLUSION
The use of technological tools such as Machine Learning and its algorithms is supporting and strengthening decision-making in the educational sector from both an administrative and an academic point of view. According to the results obtained, it is concluded that the metrics of the Machine Learning model through Ensemble Bagged Trees, applied to the predictive analysis of university teaching performance, present on average optimal values across its 4 classes, with a Precision of 82.21%, a Sensitivity of 73.40%, a Specificity of 91.02% and an Accuracy of 90.63%. Based on the validation of these metrics, the implementation of the algorithm is viable and reliable for improving the performance of university teachers. Since the 4 classes of the predictive model show relatively high values, the results make it possible to group engineering students by the level of satisfaction they can reach based on the predictor indicators, so that the authorities of the higher institution can make timely decisions to increase the percentage of students satisfied with university teaching performance.
In summary, the present study achieved its purpose of determining the best-performing model for the predictive analysis of university teaching performance, so it can be used as part of the procedure to improve the quality of the educational process, because these results provide a relevant and reliable database that is obtained in less time than with manual processes.
ACKNOWLEDGMENT
Thanks to the researchers who have contributed their knowledge in the development of this paper.