Educational Data Mining to Identify the Patterns of Use made by the University Professors of the Moodle Platform

Due to the COVID-19 pandemic and the associated social distancing measures, learning management systems (LMS) have gained importance: while preserving quality standards, they can be used to implement remote education or to support face-to-face education. Consequently, it is important to know how teachers and students use them. In this work, clustering techniques are used to analyze how university professors use the resources and activities of the Moodle platform. The CRISP-DM methodology was applied to implement a data mining process based on the Simple K-Means algorithm; to identify associated groups of teachers, it was necessary to categorize the data obtained from the platform. The Apriori algorithm was applied to identify associations in the use of resources and activities. Performance scales were established for the use of Moodle functionalities, and the results show that the use made by teachers was very low. Rules were generated to identify the associations between activities and resources. As a result, the functionalities that need to be reinforced in teacher training processes were identified. Having identified the patterns of use of the Moodle platform, it is concluded that a Likert scale was needed to transform the frequency of use of activities and resources, and that the association rules establish teacher profiles and the tools that should be promoted in future training actions.

Keywords: Clustering; educational data mining; Moodle; usage patterns; K-Means algorithm; Apriori algorithm


I. INTRODUCTION
The pandemic caused by COVID-19 brought with it a mandatory social confinement that forced societies to change customs and models and to face the need to enhance the use of technological tools to give rise to teleworking and tele education.
The face-to-face educational system necessarily migrated towards virtual teaching-learning environments, which generated new ways of understanding the educational process. Integrating information and communication technologies (ICT) into the educational process implies changes in the forms of communication, in the contents and forms of evaluation, and in the roles of the teacher and the students. ICT can be used by teachers as technical-pedagogical support and by students as a tool for autonomous learning [1].
Although some educational institutions were gradually integrating ICT as a means of enhancing their educational processes, in the context of the pandemic many of them had to adopt virtual scenarios for their educational activities abruptly and without considering the context of their educational community. In particular, universities had to adapt to this transformation by assuming educational models that use technological tools of support and accompaniment to improve conventional teaching and learning processes. Among these tools are virtual educational platforms, or LMS (Learning Management Systems), which enable new teaching-learning modalities. According to [2], in the b-learning modality, which combines face-to-face with non-face-to-face teaching, the LMS favors learning. In the e-learning modality, the use of Internet-based technologies provides a wide range of solutions that promote the acquisition of knowledge and the development of skills [3].
Training processes in virtual environments imply changes in the roles of their actors. The student stops being a passive consumer and becomes a producer of information and knowledge; according to [4], students must be autonomous and independent in their information search skills, must create new content in an innovative way and transform it into knowledge, and must communicate effectively by decoding messages and transmitting information. The teacher, in turn, assumes the role of guide and incorporates technology adequately and effectively, that is, with a pedagogical approach [5].
In this information context, in order for students to achieve meaningful learning, the teacher must use the LMS tools to manage resources and activities in a way that enhances the autonomous work of the student and favors the development of competencies and skills such as search and organization of information, teamwork and communication with their peers and with the teacher. That is why this research focuses on improving university teacher performance and consequently student learning.
For the analysis and evaluation of the use of LMS, data mining is a tool that allows determining patterns of behavior in the data obtained from the platform and identifying the factors associated with the success of online learning [6]. In [7, 8], the authors found that most works focus on the usability of LMS, limiting themselves to the student's role and neglecting the teacher's work process.
In [9], a statistical analysis is made of how mathematics teachers in a face-to-face model supported by the Moodle platform develop cognitive and action competencies in elementary school students. In [10], educational data mining is applied to the Moodle LMS to find behavior patterns in students and thereby identify the resources and activities best suited to their needs; the study concludes that there is a correlation between the students' level of activity and their academic performance.
In [11], the authors investigated the use of Virtual Learning Environments (VLE) by professors in a higher education institution; they sequentially combined three methods: processing of VLE logs, surveys, and interviews. In [12], data is collected in some Spanish universities to study the uses that university professors make of the virtual campus and the methodology they propose to students. The study concludes that the universities analyzed make minimal use of the platform and focus on making materials available to the student, and that the resulting course model is characterized by the presence of basic, complementary, and organizational teaching materials, together with a proposal of individual and group activities.
Works such as [13], [14], and [15] have used the CRISP-DM methodology to implement data mining processes aimed at describing students' academic behavior.
In [16], the authors use data mining tools and techniques to improve student academic performance and to prevent dropout. Four classification methods were used (J48, PART, Random Forest, and Bayes Network classifiers), and the data mining tool was WEKA.
The authors of [17] propose two different guidelines: Learning Analytics, focused on descriptive processes, and Educational Data Mining, focused on predictive processes, with activities adjusted to each environment in order to obtain satisfactory results.
In light of the above, the objective of this work was to analyze the use that teachers at the National University of San Agustín de Arequipa (Peru) make of the resources and activities of the Moodle platform. The importance of the work is that, based on the results obtained, recommendations can be made to enhance the use of this platform and to favor the use of interactive and collaborative activities within the framework of a socio-constructivist pedagogical model.

A. LMS Moodle
Modular Object-Oriented Dynamic Learning Environment (Moodle) is a learning management system (LMS) developed to create and manage online training environments. Known by its acronym, Moodle is one of the most widely used learning management systems globally. It provides a powerful set of learner-centered tools and collaborative learning environments that empower both teaching and learning.
Moodle fosters active and participatory virtual environments by enabling teachers and students to interact on the platform through chats, forums, and videoconferences, among others. Consequently, it allows the teacher to extend the limits of the classroom to spaces and moments other than the face-to-face class, and it gives students autonomy to consult multimedia content and to interact and participate in learning communities [18].

B. Educational Data Mining
Data mining is a set of techniques and technologies that allow the automatic or semi-automatic exploration of large databases; the objective is to find repetitive patterns, trends, or rules that explain the behavior of the data in a specific context. Data mining involves the interaction of different techniques and procedures from computer science, statistics, mathematics, and information science. The extraction and analysis of data has become important in multiple areas because, through the application of various techniques, it makes it possible to transform data into highly useful information and knowledge [19].
In the educational field, educational data mining (EDM) focuses on the development of discovery methods that use data from LMS to understand and improve virtual teaching-learning environments. EDM builds analytical models that uncover interesting patterns and trends in the use of an LMS [20].

C. CRISP-DM Methodology
CRISP-DM (Cross Industry Standard Process for Data Mining) is a standard and open analytical model of the data mining process [21]. It includes a description of the phases of a data mining project, the tasks required in each phase, and an explanation of the relationships between the tasks. CRISP-DM provides an overview of the life cycle of a data mining process.
CRISP-DM divides the data mining process into six main phases (Fig. 1). The sequence of phases is dynamic; it is possible to move forward or backward through the phases. The result of each phase determines which phase, or which particular task of a phase, is to be done next. The arrows indicate the most important and frequent dependencies. The advantages of using CRISP-DM include replicability, independence of the application context, and tool neutrality [22].

A. Context
The research was conducted at the National University of San Agustín (UNSA) in Arequipa, Peru. Traditionally, the university works under a face-to-face model supported by the use of the Moodle platform. The University Directorate of Information Technologies (DUTIC-UNSA) is the unit in charge of the administration and management of the Moodle platform; in addition, it provides training and technical support services to teachers and students.
As a result of the COVID-19 pandemic, UNSA migrated towards a mixed model in which face-to-face sessions are carried out through a videoconferencing system, and it mandated the use of the Moodle platform to energize and enhance the training process.

B. Methodology
Cross-sectional descriptive-exploratory research has been carried out.
To develop the data mining process, the steps of the CRISP-DM methodology have been followed.

C. Data
The study considered a universe composed of 4809 virtual classrooms implemented in the 2020-A academic semester. These classrooms correspond to the three academic areas of the UNSA: engineering, social and biomedical.

IV. DEVELOPMENT OF THE PROPOSAL
A. Phase 1, CRISP-DM Methodology: Understanding the Business
Tasks related to understanding the project objectives and requirements were performed in order to turn them into technical objectives and a project plan.
Objectives: This research focuses on analyzing the use that teachers make of the activities and resources available on the Moodle platform. It is intended to:
• Investigate whether there are differences in use between the three academic areas: Natural and Formal Sciences and Engineering, Social Sciences, and Biomedical Sciences.
• Identify associated groups of teachers in the use of resources and activities using the unsupervised learning algorithm Simple K-Means.
• Identify associations in the use of resources and activities using the Apriori association algorithm.

Resources:
• Technological: Computer and the Google Colaboratory platform, using the Python language for data processing and model development.
• Technical: Data mining techniques.
• Human: The researchers.
• Data source: Moodle platform usage logs from the 2020-A academic semester.

B. Phase 2, CRISP-DM Methodology: Understanding the Data
A first contact with the data was made in order to become familiar with them, assess their quality, define the first hypotheses, and establish the most obvious relationships.

1) Data collection:
The data was obtained from the records of the Moodle platform. For the data to be meaningful, it was necessary to integrate different tables. The "teacher" table contains the sequential code of each teacher and their full name. The "course" table contains the sequential code of each virtual classroom, together with the short name and the full name of the corresponding subject. Both are related through the "course_teacher" table. The "resource_activities" table contains the number of times that the 13 activities and 7 resources available on the virtual platform have been used. This table relates to "course_teacher" through the course and teacher codes. Fig. 2 shows the entity-relationship diagram of the integration performed.
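The integration described above can be sketched as a join over the four tables. This is a minimal illustration only: the table names come from the text, but every column name, the long (one row per tool) layout of "resource_activities", and the toy rows are assumptions, and an in-memory SQLite database stands in for the actual Moodle export.

```python
import sqlite3


def build_usage_view():
    """Join the four tables named in the text; all column names are assumed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, full_name TEXT);
        CREATE TABLE course (course_id INTEGER PRIMARY KEY, short_name TEXT, full_name TEXT);
        CREATE TABLE course_teacher (course_id INTEGER, teacher_id INTEGER);
        CREATE TABLE resource_activities (
            course_id INTEGER, teacher_id INTEGER, tool TEXT, times_used INTEGER
        );
        -- Toy rows, invented for illustration only.
        INSERT INTO teacher VALUES (1, 'Teacher A');
        INSERT INTO course VALUES (10, 'MAT101', 'Mathematics I');
        INSERT INTO course_teacher VALUES (10, 1);
        INSERT INTO resource_activities VALUES (10, 1, 'file', 12), (10, 1, 'forum', 3);
        """
    )
    # course_teacher links courses to teachers; resource_activities is matched
    # on both the course code and the teacher code, as the text describes.
    rows = conn.execute(
        """
        SELECT c.short_name, t.full_name, ra.tool, ra.times_used
        FROM course_teacher AS ct
        JOIN course AS c ON c.course_id = ct.course_id
        JOIN teacher AS t ON t.teacher_id = ct.teacher_id
        JOIN resource_activities AS ra
          ON ra.course_id = ct.course_id AND ra.teacher_id = ct.teacher_id
        ORDER BY c.short_name, ra.tool
        """
    ).fetchall()
    conn.close()
    return rows


usage_rows = build_usage_view()
```

Each resulting row pairs a classroom and its teacher with one tool and its usage count, which is the shape needed for the per-classroom frequency analysis that follows.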
2) Verification of data quality:
Initially, the database considered the 4809 classrooms configured on the virtual platform. The data was cleaned because some classrooms had not been used or had not been assigned a teacher: 2 duplicate teacher records were deleted, as were 219 classrooms with a blank teacher field and 335 classrooms in which at most one resource or activity had been used. As a result, we worked with data from 4255 classrooms.
3) Data exploration:
The use of classrooms was analyzed for each academic area. Fig. 3 shows the percentage of classrooms implemented and the number of teachers in charge of them in each area. The differences found are explained by the number of professional schools that are integrated into each academic area.
At UNSA, the academic areas are divided into Schools and these into Professional Schools. Table I shows the number of virtual classrooms implemented in the different schools, as well as the number of professors who administered them.
Tables II and III show the frequency of the use of Moodle resources and activities in each academic area and the amount of total usage.
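The classroom filters from the data-quality step of this phase could be expressed as follows. This is a sketch under assumptions: the record layout (a `teacher` field and a `usage` dictionary) is hypothetical, and "at most one resource or activity used" is our reading of the cleaning rule. The last lines only check the arithmetic reported in the text.

```python
def clean_classrooms(classrooms):
    """Keep classrooms that have a teacher and use more than one tool."""
    kept = []
    for room in classrooms:
        if not room["teacher"]:
            continue  # blank teacher field
        tools_used = sum(1 for count in room["usage"].values() if count > 0)
        if tools_used <= 1:
            continue  # at most one resource or activity used
        kept.append(room)
    return kept


# Toy records, invented for illustration only.
sample = [
    {"teacher": "Teacher A", "usage": {"file": 5, "forum": 2, "task": 0}},
    {"teacher": "", "usage": {"file": 3, "forum": 1, "task": 1}},
    {"teacher": "Teacher B", "usage": {"file": 4, "forum": 0, "task": 0}},
]
cleaned = clean_classrooms(sample)

# Counts reported in the text: 4809 classrooms, minus 219 and 335 removed.
remaining = 4809 - 219 - 335
```

Here `remaining` evaluates to 4255, matching the number of classrooms the study reports working with.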

C. Phase 3, CRISP-DM Methodology: Data Preparation
In this phase, the necessary activities are carried out to build the data set that will serve for the modeling.
Each classroom was evaluated according to the frequency of use of each activity and resource. Finally, only the activities or resources used in at least 97% of the classrooms in each area were considered (Tables IV and V).
Consequently, in the academic area of Social Sciences ten tools were considered: Attendance, Chat, Questionnaire, Forum, Glossary, Task, File, Folder, Label and URL.
In the Engineering area, eleven tools were included: Attendance, Chat, Questionnaire, Forum, Glossary, Task, File, Folder, Label, Page and URL.
In the area of Biomedical Sciences, twelve tools were considered for the study: Attendance, Chat, Questionnaire, Forum, Glossary, Pdf Annotator, Task, File, Folder, Label, Page and URL.
To complement this information, Tables IV and V show the number of classrooms that have never used the indicated activities or resources, by area of knowledge. Once the frequencies of use of the Moodle functionalities had been calculated, it was found that the values were very dispersed, so it was necessary to reduce them. Considering that scale transformations are efficient instruments for reducing a data set, it was decided to transform the values from numeric to nominal.
On the cleaned data, the maximum and minimum values of the frequency of use of each activity and resource were identified. With these values, the range was calculated and divided into five parts of equal length, yielding the boundaries of a five-level Likert scale. The scale values and the weight assigned to each level were:
0 = Very low
1 = Low
2 = Medium
3 = High
4 = Very high
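The equal-width transformation described above can be sketched as follows. The function names are hypothetical, and the boundary convention (each interval closed on the left, with the maximum value assigned to the top level) is an assumption, since the text does not state how values on a boundary were handled.

```python
LEVELS = ["Very low", "Low", "Medium", "High", "Very high"]


def likert_level(value, minimum, maximum, n_levels=5):
    """Map a usage frequency to a level 0..n_levels-1 using equal-width bins."""
    if maximum == minimum:
        return 0  # no spread: everything falls in the lowest level
    width = (maximum - minimum) / n_levels
    level = int((value - minimum) / width)
    return min(level, n_levels - 1)  # the maximum itself belongs to the top bin


def to_likert(frequencies):
    """Transform the usage counts of one tool into Likert scores."""
    lo, hi = min(frequencies), max(frequencies)
    return [likert_level(v, lo, hi) for v in frequencies]


# Toy frequencies for one tool, invented for illustration only.
scores = to_likert([0, 19, 20, 59, 60, 100])
labels = [LEVELS[s] for s in scores]
```

With a range of 0 to 100, each level spans 20 units, so a count of 19 falls in level 0 and a count of 20 in level 1.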

D. Phase 4, CRISP-DM Methodology: Modeling
In this phase, the most appropriate modeling techniques are selected for the specific data mining project.
The Simple K-Means clustering algorithm was applied, and the elbow method was used to choose the number of clusters. The elbow method looks for the part of the graph where the line changes abruptly, forming an "elbow"; the number of clusters at that point is the one used to group the data. Using 30 iterations, the optimal number of clusters was determined to be k = 5 (Fig. 4).
The objective was to group similar observations and discover patterns in the use of Moodle resources and activities in the data transformed with the Likert scale.
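As an illustration of the elbow method, the sketch below runs a plain K-Means (Lloyd's algorithm) for several values of k and records the within-cluster sum of squared distances. It is not the implementation used in the study: the deterministic farthest-first initialization, the function names, and the two-dimensional toy data are assumptions made so that the example is reproducible with the standard library only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start from the first point,
    then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def elbow_curve(points, k_values):
    """Inertia for each candidate k; the 'elbow' is read off this curve."""
    return {k: kmeans(points, k)[2] for k in k_values}


# Toy Likert score pairs forming three visible groups (illustration only).
data = [(0, 0), (0, 1), (1, 0), (4, 4), (4, 3), (3, 4), (0, 4), (1, 4), (0, 3)]
curve = elbow_curve(data, range(1, 5))
```

On this toy data the inertia drops sharply up to k = 3 and only slightly afterwards, which is the kind of bend the elbow method looks for.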
A total of 4255 classrooms were used; each classroom has 30 fields, including area, faculty, school, course, teacher, activities, and resources. Each classroom records numerically how many times each resource and activity was used; the Likert scale was therefore applied to obtain a standard score from 0 to 4 for the level of use. The columns not used for training were then removed, leaving a data set of 20 fields covering activities and resources. These were grouped with the K-Means algorithm into 5 clusters: clusters 1, 2, and 4 show the lowest use, and cluster 5 shows medium use, followed by cluster 3 (Fig. 5 and Table VII).
To identify associations in the use of resources and activities, the Apriori association algorithm was applied in each academic area.
First, the data were converted into 0s (activities and resources with a score of 0) and 1s (activities and resources with a score of 1, 2, 3, or 4) to build the model. In the social area, associations were obtained among the tools file, URL, and task. In the biomedical area, associations were obtained among questionnaire, task, URL, and attendance. In the engineering area, however, no association was found. Because of these differences, it was decided to perform the process again with the total data set.
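The binarization and the search for associations can be sketched as follows. This is a compact level-wise Apriori written for illustration, not the library or code used in the study; confidence and lift follow their usual definitions (confidence = support of the whole rule divided by support of the antecedent, lift = confidence divided by support of the consequent), and the toy scores are invented. The minimum support of 0.8 is the value the paper reports for the total data set.

```python
from itertools import combinations


def binarize(rows):
    """Score 0 -> not used; scores 1..4 -> used. Returns one set of tools per classroom."""
    return [frozenset(tool for tool, score in row.items() if score > 0) for row in rows]


def support(itemset, transactions):
    """Fraction of classrooms in which every tool of the itemset is used."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    """Level-wise search: only supersets of frequent itemsets are considered."""
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    while level:
        kept = {c: s for c in level if (s := support(c, transactions)) >= min_support}
        frequent.update(kept)
        candidates = set()
        for a, b in combinations(kept, 2):
            union = a | b
            # Join step, then prune candidates with an infrequent subset.
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in kept for sub in combinations(union, len(a))
            ):
                candidates.add(union)
        level = list(candidates)
    return frequent


def rules(frequent):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                cons = itemset - ante
                conf = sup / frequent[ante]
                out.append((ante, cons, sup, conf, conf / frequent[cons]))
    return out


# Toy Likert scores for five classrooms (illustration only).
likert_rows = [
    {"file": 3, "URL": 1, "task": 2},
    {"file": 1, "URL": 2, "task": 0},
    {"file": 4, "URL": 1, "task": 1},
    {"file": 2, "URL": 1, "task": 1},
    {"file": 1, "URL": 0, "task": 1},
]
frequent = apriori(binarize(likert_rows), min_support=0.8)
found = {(tuple(sorted(a)), tuple(sorted(c))): (conf, lift) for a, c, _, conf, lift in rules(frequent)}
```

In this toy set the rule URL => file has confidence 1.0, because every classroom that uses URL also uses file, while the pair URL and task falls below the 0.8 support threshold and produces no rule.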
Considering the definition of support given by [23], the association rules were constructed with a minimum support of 0.8, which increased the number of associations. Each rule has the form antecedent = is used => consequent = is used. The probability of each association is given in the "Confidence" column, and the "Lift" column shows the increase in the probability of use when both tools are considered rather than only the antecedent tool (Table VI shows the results).
From Tables II and III, it can be affirmed that the resource "File" has been used the most, followed by the resource "URL"; it follows that the virtual classroom is being used primarily as a repository of content and links to web content. Likewise, the activity "Task" has been used the most, followed by the activity "Questionnaire", which means that the virtual classroom is being used to collect assignments and to carry out assessments through questionnaires.
Table VIII (at the end of the paper) shows the results obtained by applying the Simple K-Means algorithm to the processed data. The nominal values of the Likert scale were considered for each of the tools in the clusters; since the distribution of the frequency of use is asymmetric, the mode was taken as the representative measure. Table VIII also shows that the teachers' use of Moodle activities and resources was classified as very low. This situation differs slightly in the areas of Social Sciences and Biomedical Sciences, whereas in Engineering there was not enough support to find association rules.
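Summarizing each cluster by the most frequent Likert level of every tool, as described for Table VIII, could look like the sketch below. The function name, the record layout, and the toy data are hypothetical; only the idea of taking the most frequent value per tool within each cluster comes from the text.

```python
from statistics import mode


def cluster_profile(rows, labels, tools):
    """For each cluster, take the most frequent Likert score of each tool."""
    profile = {}
    for cluster in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == cluster]
        profile[cluster] = {tool: mode([row[tool] for row in members]) for tool in tools}
    return profile


# Toy Likert scores and cluster labels (illustration only).
rows = [
    {"file": 0, "forum": 0},
    {"file": 0, "forum": 1},
    {"file": 0, "forum": 0},
    {"file": 3, "forum": 1},
    {"file": 3, "forum": 1},
    {"file": 2, "forum": 1},
]
labels = [0, 0, 0, 1, 1, 1]
profile = cluster_profile(rows, labels, ["file", "forum"])
```

The most frequent value is a natural summary here because a few heavily used classrooms would pull a mean upwards in a skewed distribution.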
There are no reasons to assume that teachers do not know how to use the different functionalities of the virtual platform; however, these results suggest the need to reinforce teacher training in the use of the activities and resources provided by Moodle. The training process should focus on the use of the virtual platform as a support to the synchronous sessions conducted by the teacher, so that learning outcomes are enhanced through active teaching-learning methodologies that can be easily implemented with the functionalities provided by Moodle [24, 25, 26].
Table VII shows the association rules obtained as a result of applying the Apriori algorithm to the cleaned data corresponding to the three academic areas. The association rules relate to the use of Moodle activities and resources. Each rule is listed below with its interpretation.

1) URL = is used => file = is used
If the URL tool is used in the classroom, then the file tool will be used.

2) forum, URL = is used => file = is used
If both the forum and URL tools are used in the classroom, then the file tool will be used.

3) task = is used => file = is used
If the task tool is used in the classroom, then the file tool will be used.

4) task, forum = is used => file = is used
If both the task and forum tools are used in the classroom, then the file tool will be used.

5) task = is used => file, forum = is used
If the task tool is used in the classroom, then the file and forum tools will be used.

6) URL = is used => file, forum = is used
If the URL tool is used in the classroom, then the file and forum tools will be used.

7) file = is used => task = is used
If the file tool is used in the classroom, then the task tool will be used.

8) file, forum = is used => task = is used
If both the file and forum tools are used in the classroom, then the task tool will be used.

9) file = is used => URL = is used
If the file tool is used in the classroom, then the URL tool will be used.

10) file, forum = is used => URL = is used
If both the file and forum tools are used in the classroom, then the URL tool will be used.

11) file = is used => task, forum = is used
If the file tool is used in the classroom, then the task and forum tools will be used.

12) file = is used => forum, URL = is used
If the file tool is used in the classroom, then the forum and URL tools will be used.
The association rules generated have made it possible to identify associations between activities and resources, and they give evidence of the activities and resources that need to be reinforced in teacher training processes. For example, from rules 5, 6, and 11 it can be inferred that if the teacher uses the task activity and the file and forum resources, then it is very likely that the teacher will include discussion forums among the activities of the virtual classroom; therefore, the teacher will not only be imparting knowledge but also supporting the development of communication skills and critical thinking.

VI. CONCLUSION
With the development of this work, it was possible to identify the behavior patterns of university professors in the use of the activities and resources offered by the Moodle platform.
The use of a Likert scale to transform the frequency of use of activities and resources made it possible to reduce the spectrum of values and to find associated groups when applying the K-Means algorithm; using the elbow method with 30 iterations, it was determined that the optimal choice was to work with 5 clusters.
The activity of the teachers was characterized and it was found that the activities: chat, wiki, lesson, workshop, questionnaire, games and survey are not used despite their great potential as didactic material that can enhance the results of the teaching and learning processes.
The results obtained in this work will serve to implement teacher training processes focused on the proper use of Moodle activities and resources that allow the development of virtual or blended courses based on constructionist and social constructivist approaches.
Applying data mining techniques to the large amount of information generated by the Moodle platform can contribute to the creation of dynamic profiles in the development of a virtual course. In addition to improvements in the teacher's use of resources and activities, students' behavior patterns could be considered in order to adapt courses to their level of learning.