Enhanced Performance of the Automatic Learning Style Detection Model using a Combination of Modified K-Means Algorithm and Naive Bayesian

Learning Management Systems (LMS) are often well designed and operated by exceptional teaching teams, but they do not consider the needs and characteristics of each student’s learning style. LMS do not yet provide a feature to detect student diversity, but they do keep a record of student learning activities, known as log files. This study proposes a model that detects students’ learning styles from log file data in four processes. The first process is pre-processing, which yields 29 features used as input to the clustering process. The second process is clustering with a modified K-Means algorithm to obtain a label for each data set before classification is carried out. The third process detects the learning style of each data set using the Naive Bayesian classification algorithm, and the fourth analyzes the performance of the proposed model. Test results using the Davies-Bouldin Index (DBI) validity metric indicate that the modified K-Means algorithm achieved a DBI of 2.54, higher than the 2.39 of the original K-Means. Besides having high validity, the modified algorithm is also more stable than the original K-Means algorithm because the labels of each data set do not change between runs. The improved performance of the clustering algorithm also increases the precision, recall, and accuracy of the automatic learning style detection model proposed in this study. The average precision rises from 65.42% to 71.09%, recall increases from 72.09% to 80.23%, and accuracy increases from 67.06% to 71.60%.

Keywords: Learning management system; log file; K-means; Davies-Bouldin Index


I. INTRODUCTION
Rapidly developing information and communication technology currently offers excellent potential to overcome the problem of equitable access to quality learning in Higher Education through the Learning Management System (LMS). An LMS is a software application or web-based technology used to plan, implement, and assess a particular learning process. Although an LMS may be well designed and operated by an exceptional teaching team, the learning process through the LMS has a weakness: an inability to personalize the learning [1]. This is caused by the nature of the LMS, which provides the same content for all students in a given course. Each student has a different learning style and can learn better in different ways. The different geographical and socio-cultural backgrounds of students will certainly form different learning styles [2]. Learning styles can influence and motivate students to take lessons. One of the main things to consider in learning with e-learning systems is that individual learning styles vary within the LMS. For example, course content, subjects, student behavior, and online learning experiences can influence learning styles.

§ Corresponding author's Email ID: rw@ugm.ac.id
Currently, several learning style models are in use, such as Honey and Mumford, Kolb's, the Felder-Silverman Learning Style Model (FSLSM), and VAK [3]. There are also Gregorc's learning styles, Riding's cognitive styles, and the Myers-Briggs Type Indicator [4]. FSLSM is the most widely used learning style model in the education system and shows a high level of reliability, internal consistency, and validity [4]-[8]. This model describes student learning styles along four dimensions (Active/Reflective, Sensing/Intuitive, Visual/Verbal, Sequential/Global) based on the behavior patterns of students who use e-learning systems [6]. Students with a strong preference for a particular learning style may have learning difficulties if the teaching style does not match their learning style. To reach the goal of equal education, LMS need to be developed so that they present learning sources with a context and a learning process suited to each student's learning style, which in turn improves student performance. Therefore, a way is needed to classify the learning style of each student by detecting it.

[...] of completing questions without understanding the purpose of filling out the questionnaire [14]. Students often answer questionnaires unresponsively, so the results of detection using questionnaires tend to be inaccurate [15]. Therefore, research using a static approach focuses on the reliability and validity of the Index of Learning Style (ILS) instruments.
The second approach is automatic and is based on actual behavior patterns during online learning. The approach draws on personality factors, behavioral factors, and time. Automatic detection is far more accurate, dynamic, and comprehensive than static detection because interactions are recorded directly in log files without the participant noticing, and no dedicated time is required [3]. Two methods can be used to determine learning styles automatically: data-driven methods and literature-based methods [16]. Data-driven methods aim to build classification models that imitate the ILS instruments and use sample data to build the models. Classification techniques widely used to detect learning styles include Neural Networks [17]-[19], Decision Trees [1], [6], [7], [20], [21], and Bayesian Networks [22]-[26]. The literature-based method uses students' behavior and actions in the system to identify their learning preferences. Studies that use a literature-based approach include [2], [15], [17], [27], [28]. This method uses simple rules to calculate learning styles from the number of suitable instructions and does not involve system design. The approach still has problems in estimating the importance of the various instructions used to calculate learning style preferences. It also requires some knowledge of psychology and cognitive science to estimate that importance.
Based on the review in [3], the most widely used automatic approach is the Bayesian Network model. Bayesian networks naturally represent probabilistic information, are efficient, and support the encoding of uncertain expert knowledge. A Bayesian Network also makes it possible to model quantitative and qualitative information about student behavior [22]. According to [29], however, the Bayesian Network is in general too complicated for small data sets and is easily overfitted. This problem can be avoided by using the Naive Bayesian (NB) algorithm. The advantages of classification with the NB algorithm are that it is easy to build, because the structure is given a priori and no structure-learning procedure is needed, and that the classification process is efficient. Both advantages are obtained by assuming that all features are independent of one another. However, the requirement that every node be independent can make the NB structure produce low accuracy. One way to improve the accuracy of the NB structure is to determine the appropriate class label before classification. One method that can be used to obtain class labels is clustering.
Clustering is very suitable for grouping data whose class labels are difficult to obtain at the time of feature generation. Many clustering algorithms are used to obtain class labels. One of the most used is the K-Means algorithm, because it is easy to implement, relatively fast to run, easy to adapt, and very suitable for clustering with a large number of groups. However, the K-Means algorithm also has weaknesses: the clustering results can be less than optimal because the initial centroids are chosen randomly in the initialization process. When implemented with a large number of features, the K-Means algorithm also suffers from the problem known as the curse of dimensionality [30]. Therefore, the performance of the K-Means algorithm can be improved by enhancing the initial centroid selection process.
According to [25], most studies detecting FSLSM learning styles group them into eight combinations of learning styles. However, the FSLSM learning style model consists of 4 dimensions with two categories each, so 16 combinations of learning styles are possible. Therefore, in this study, the proposed modification of the K-Means algorithm is used to group the FSLSM learning style models into 16 groups before the learning styles are detected using classification methods.
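The count of 16 follows from choosing one of two categories in each of four dimensions (2^4 = 16). The short sketch below enumerates the combinations; the enumeration order is only an assumption made for illustration and is not necessarily the order used in Table II.

```python
from itertools import product

# The four FSLSM dimensions and their two opposing categories.
DIMENSIONS = {
    "Processing": ("Active", "Reflective"),
    "Perception": ("Sensing", "Intuitive"),
    "Input": ("Visual", "Verbal"),
    "Understanding": ("Sequential", "Global"),
}

# Cartesian product: pick exactly one category per dimension.
combinations = list(product(*DIMENSIONS.values()))

print(len(combinations))  # 16
print(combinations[0])    # ('Active', 'Sensing', 'Visual', 'Sequential')
```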
In this paper, the FSLSM learning style detection model is improved by combining the modified K-Means algorithm with the Naive Bayesian classification algorithm. The detection process of the proposed model consists of four processes. The first is pre-processing, which translates the log file data into several characteristics, such as skills, level of knowledge, preferences, and learning styles, that are considered to affect the learning process of students directly. This process produces 29 features used for grouping the dataset, which is derived from the participants of the Education for Professional Teachers program held by the Ministry of Research and Technology for teachers of the English subject and comprises 500 data sets. The second process is grouping with a modified K-Means algorithm to obtain a cluster label for each test data set. The third process detects the learning style of each data set using the Naive Bayesian classification algorithm, and the fourth analyzes the performance of the proposed automatic learning style detection model. This paper is organized as follows. Section II discusses related work. Section III elaborates the proposed model, followed by Section IV, which analyzes the performance of the proposed modified K-Means algorithm. Finally, Section V concludes this paper.

II. RELATED WORKS
Conventional learning generally uses a one-to-many tutoring approach, in which lecturers deliver material without considering the diversity of students' knowledge, so the content offered is not optimal. One solution is a one-to-one tutoring approach, but it is practically impossible to apply in conventional learning because of time constraints. The development of information technology has had an impact on education, namely the use of the Learning Management System (LMS). The LMS has the potential to support a one-to-one tutoring approach because it gives lecturers and students easy access without being bound by time.
Learning style is an essential factor in an individual student's learning in any learning environment. Each student has a different learning style and different ways to understand, process, and retain new information. Learning style is the way a student follows the learning process.

Honey and Mumford's learning style model introduces the concept of learning style based on a description of the attitudes and behaviors that determine the way of learning preferred by learners, using the Learning Style Questionnaire (LSQ) [31]. The LSQ is designed to investigate the relative strengths of the four learning style dimensions of Honey and Mumford [32], namely Activist, Reflector, Theorist, and Pragmatist. Research on Honey and Mumford's learning style model focuses on learning models. Research conducted by [31] produced a valid and reliable research questionnaire. Likewise, research conducted by [32] states that its statistically tested results are significant and follow the principles of learning styles proposed by Honey and Mumford.
Kolb's learning style model introduces the Learning Styles Inventory to identify individual learning styles [33]. The Learning Styles Inventory is understood as a four-stage cycle consisting of Concrete Experience (CE), Reflective Observation (RO), Abstract Conceptualization (AC), and Active Experimentation (AE). Research on Kolb's learning style focuses on behavior, using questionnaires [34], [35] and log files [36]. Research conducted by [34] aims to detect learning styles using Kolb's 4-dimensional and 9-dimensional models, while [35] automatically detects learners' learning styles in an LMS using the Naive Bayesian technique in place of Kolb's Learning Styles Inventory (KLSI). Research conducted by [36] classifies learners' learning styles with the Decision Tree algorithm using log file data.
The VAK learning style model categorizes learners' learning styles along three dimensions [37], namely Visual, Auditory, and Kinesthetic. VAK learning style research mostly addresses behavior, using literature-based methods, questionnaires, and Latent Semantic Indexing. A VAK architecture that detects learning styles based on student behavior using simple rule-based techniques was introduced by [37]. Research aimed at identifying VAK learning styles was carried out by [38] using the Decision Tree C4.5 algorithm on questionnaire data. Meanwhile, [39] predicted VAK learning styles using an artificial neural network (ANN) with Latent Semantic Indexing.
The Felder-Silverman Learning Style Model (FSLSM) uses the notion of dimensions where each dimension contains two opposing categories, and each student has a dominant preference in each category of dimension. The four dimensions of the FSLSM are Processing (Active/Reflective), Perception (Sensing/Intuitive), Input (Visual/Verbal), and Understanding (Sequential/Global). FSLSM allows Learning Style (LS) to be measured based on the Index of Learning Style (ILS). Therefore, by using ILS, we can link the LS to the appropriate learning objects. The FSLSM learning style research model is mostly about behavior using log file data using different classification algorithms. Research by [40] classifies FSLSM learning styles using Fuzzy Cognitive Maps (FCMs), while [6] uses Decision Tree.
Some researchers also focus on finding appropriate learning style models for LMS, including [33], which compares three learning style questionnaires: Honey and Mumford's, Kolb's, and FSLSM. The results are measured by how easily the questions are understood, the time needed to fill out the questionnaire, and how the results are presented. The measurements show that 67% of respondents understood the FSLSM ILS more easily and required less time than with Honey and Mumford's and Kolb's methods. In addition, [41] evaluated an adaptive e-learning system based on the VAK learning style with FSLSM that had been developed using the LMS model. Based on this overview, most researchers map students' learning styles to the FSLSM learning style model. Also, the results of the study in [33] state that the FSLSM questionnaire was easier to understand and needed less time to complete. Therefore, this study uses the FSLSM learning style model to automatically detect students' learning patterns in the LMS for the participants of the Education of Professional Teachers (PPG) SPADA Kemenristekdikti program for teachers of the English language subject.
Several learning style models have been introduced, such as the Honey and Mumford, Kolb's, FSLSM, and VAK models, but the main problem of learning through an LMS is how to identify the student's learning style under the chosen model. Learning style detection can be approached in two main ways, namely the static and the automatic approaches [42]. Learning style detection research using a static method mostly measures the reliability and validity of the Index of Learning Style (ILS) instruments [11]-[13]. Studies that detect learning styles using a static approach report low preference values in each dimension, i.e., the average of each dimension is below 50% [10], [43]. This points to some limitations of the static approach. The first is the lack of student motivation to fill out questionnaires and the lack of awareness of their own learning preferences.
The second problem is that filling out questionnaires is very tedious and takes up students' time because questionnaires usually contain quite a lot of items. The third problem is that students can be influenced by the way the questionnaire is formulated, which can affect the answers they provide [3]. Because of these weaknesses of the static approach, many researchers subsequently turned to automatic methods.
Research on learning style detection using an automatic approach is mostly data-driven, that is, based on log file data. These studies aim to determine the best classification algorithm among the Decision Tree (J48), Artificial Neural Network, and Support Vector Machine algorithms for detecting students' learning styles among the eight FSLSM learning styles [5], [17], [18], [21], [23], [26], [44]. Comparisons of classification algorithms for detecting FSLSM learning styles show the Naive Bayesian algorithm performing better than other data mining algorithms. However, the precision and accuracy of the Naive Bayesian algorithm are still below those of the Artificial Neural Network and J48 algorithms. This shows that the classification approach can be very accurate, depending on the available data. The accuracy of the classification model can be improved by determining the appropriate class labels using a clustering method. Therefore, this study improves the performance of the Naive Bayesian algorithm for automatic learning style detection by first grouping the log file data along the FSLSM dimensions with the modified K-Means algorithm before classification.

III. PROPOSED METHOD
An outline of the proposed automatic learning style detection model, which combines the modified K-Means algorithm with the Naive Bayesian algorithm, is shown in Fig. 1. As Fig. 1 shows, the research process consists of four main steps: observation and pre-processing, grouping of learning style models using the modified K-Means algorithm, classification using the Naive Bayesian algorithm, and testing of the proposed model.

A. Observation and Pre-processing
The purpose of this step is to obtain features that correlate with the FSLSM learning style types. This stage analyzes the log file data based on the four dimensions of the FSLSM model. The observation was carried out on 47 files of log file data to determine their features. Log file data is generated automatically when students use the LMS. The system records all activities, such as chats, forums, quizzes, exercises, assignments, examination submissions, and the frequency of accessing subject matter. These activities form the features. Each file containing features that correlate with the required features was then sorted according to the 4 dimensions of FSLSM. Of the 47 observed files, 22 files contain a total of 423 features that may correlate with the features of FSLSM.
The pre-processing was carried out on the 22 files of log file data through the following steps:
• Removing duplicate data, thereby reducing the number of rows and columns in the data set.
• Removing N.A. data. Each user ID should have recorded data on the activities of using the learning system, but this information is not always available, so there is a lot of incomplete data.
• Removing rows of data that cannot be related to rows in other tables because they do not share the same column.
• Randomly selecting 500 user IDs of PPG SPADA Kemenristekdikti participants who are teachers of the English subject.
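The cleaning steps above can be sketched roughly as follows. This is only an illustration: the record layout, the field names `user_id` and `forum_posts`, and the function `preprocess` are hypothetical, the sketch samples rows rather than user IDs, and the step that drops rows which cannot be joined to other tables is omitted.

```python
import random

def preprocess(rows, sample_size, seed=0):
    """Drop duplicate rows and rows with missing values, then sample randomly."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue                      # duplicate record
        seen.add(key)
        if any(v is None for v in row.values()):
            continue                      # incomplete (N.A.) record
        cleaned.append(row)
    rng = random.Random(seed)             # fixed seed for reproducibility
    return rng.sample(cleaned, min(sample_size, len(cleaned)))

rows = [
    {"user_id": 1, "forum_posts": 4},
    {"user_id": 1, "forum_posts": 4},     # duplicate
    {"user_id": 2, "forum_posts": None},  # missing value
    {"user_id": 3, "forum_posts": 7},
]
sample = preprocess(rows, sample_size=500)
print(sorted(r["user_id"] for r in sample))  # [1, 3]
```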
Pre-processing resulted in 29 features, consisting of 9 features of the Processing dimension, 9 features of the Perception dimension, 6 features of the Input dimension, and 5 features of the Understanding dimension, as shown in Table I.

The modified K-Means algorithm used to obtain the learning style model labels for detection is shown in Fig. 2. The K-Means algorithm is modified in how it determines the data sets to be selected as the initial centroids.
The process of clustering using the K-Means algorithm can be explained as follows:

1) Early initialization and centroid determination process:
This step determines the number of clusters ($K$) and the initial objective function value ($F_O$). This research uses $K = 16$, following the FSLSM learning style model grouping shown in Table II. The initial value of $F_O$ is set sufficiently high, for example, 1000, so that the iteration process is performed more than once and the clustering results can be optimal. The next step determines the initial centroids. This step is the core of the proposed modified K-Means algorithm: the initial centroids are determined using rules established by the authors, whereas the original K-Means algorithm selects $K$ random data sets as the initial centroids.
The initial centroids are 16 data sets, obtained by identifying which FSLSM learning style model criteria in Table II each data set meets and taking, for each learning style model, the data set that was found first. The learning style of each data set is identified using the following rules:
• The learning style in the Processing dimension ($D1_i$) is determined by equation (1), where $i$ is the data set index, $A$ is the Active learning style category, $R$ is the Reflective learning style category, and $P1_i$ is the preference value at $D1_i$ obtained from equation (2). In equation (2), $FH$, $EH$, and $CH$ are the maximum values of each preference in the Processing dimension, respectively for the Forum feature ($F$), the E-mail feature ($E$), and the Online chat feature ($C$).
• The learning style in the Perception dimension ($D2_i$) is determined by equation (3), where $S$ is the Sensing learning style category, $I$ is the Intuitive learning style category, and $P2_i$ is the preference value at $D2_i$ obtained from equation (4). In equation (4), $RH$, $AH$, and $ExH$ are the maximum values of each preference in the Perception dimension, respectively for the Exam revision feature ($R$), the Assessment feature ($A$), and the Exercise feature ($Ex$).
• The learning style in the Input dimension ($D3_i$) is determined by equation (5), where $Vi$ is the Visual learning style category, $Ve$ is the Verbal learning style category, and $P3_i$ is the preference value at $D3_i$ obtained from equation (6). In equation (6), $IH$ and $MH$ are the maximum values of each preference in the Input dimension, respectively for the Input text feature ($I$) and the Input multimedia feature ($M$).
• The learning style in the Understanding dimension ($D4_i$) is determined by equation (7), where $Se$ is the Sequential learning style category, $G$ is the Global learning style category, and $P4_i$ is the preference value at $D4_i$ obtained from equation (8).
The sequence of dimensions obtained for data set $i$ using equations (1), (3), (5), and (7), that is $[D1_i, D2_i, D3_i, D4_i]$, is then used to identify the learning style model that corresponds to Table II. The initial centroids are taken in the order of the FSLSM learning style model criteria in Table II: for each learning style model, the first data set identified as having that model becomes its initial centroid.
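The centroid selection rule can be illustrated with the minimal sketch below. It assumes the learning style of every data set has already been identified with equations (1), (3), (5), and (7), which are not reproduced here; the function name `initial_centroids`, the toy two-dimensional styles, and the choice to skip styles that never occur are assumptions for illustration only.

```python
def initial_centroids(data, styles, style_order):
    """For each learning style in style_order, take the first data row whose
    identified style matches it; that row becomes an initial centroid."""
    first_seen = {}
    for row, style in zip(data, styles):
        if style not in first_seen:
            first_seen[style] = row
    # Assumption: styles that never occur in the data yield no centroid.
    return [list(first_seen[s]) for s in style_order if s in first_seen]

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
styles = [("A", "S"), ("R", "S"), ("A", "S"), ("R", "I")]
order = [("A", "S"), ("A", "I"), ("R", "S"), ("R", "I")]
print(initial_centroids(data, styles, order))
# [[1.0, 2.0], [3.0, 4.0], [7.0, 8.0]]
```

Because the selection depends only on the data and the fixed order of styles, repeated runs start from the same centroids, which is consistent with the stability the paper reports for the modified algorithm.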
2) Calculating the distance of each data set to the initial centroids and grouping the data into the cluster with the closest centroid:
This step calculates the distance of data point $i$ ($x_i$) to every initial centroid $k$ ($c_k$) using the Euclidean distance formula, as shown in equation (9).

$$d_{ik} = \sqrt{\sum_{j=1}^{m} (x_{ij} - c_{kj})^2} \quad (9)$$

where $d_{ik}$ is the distance of data point $i$ to the centroid of cluster $k$, $i = 1, 2, \ldots, n$ with $n$ the number of data sets, $k = 1, 2, \ldots, 16$, and $m$ is the number of features.
Next, the data are grouped into the clusters with the shortest distance. A data point becomes a member of cluster $k$ if its distance to centroid $k$ is minimal compared with its distances to the other centroids. This can be calculated using equation (10),
where $c_i$ denotes the cluster with the minimum distance for data point $i$; the new cluster membership is then determined by the centroid with the minimal distance.
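The distance calculation and nearest-centroid assignment of this step can be sketched as follows. The function names are illustrative, and breaking ties in favor of the lowest cluster index is an assumption the text does not specify.

```python
import math

def euclidean(x, c):
    """Euclidean distance between a data point x and a centroid c."""
    return math.sqrt(sum((xj - cj) ** 2 for xj, cj in zip(x, c)))

def assign_clusters(data, centroids):
    """Return, for each data point, the index of its nearest centroid."""
    labels = []
    for x in data:
        distances = [euclidean(x, c) for c in centroids]
        labels.append(distances.index(min(distances)))  # ties: lowest index
    return labels

data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
centroids = [[0.0, 0.0], [10.0, 10.0]]
print(assign_clusters(data, centroids))  # [0, 0, 1]
```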

3) Calculating a new centroid:
This step calculates the new centroid as the average of the data sets that are members of the cluster, using equation (11).

$$c_k = \frac{1}{p} \sum_{x_i \in \text{cluster } k} x_i \quad (11)$$

where $p$ is the number of members in cluster $k$.

4) Calculating the distance of each data set to the new centroids and calculating the objective function value:
This step groups the data into the clusters with the shortest distance using the new centroids generated in step 3, and then calculates the value of $F_O$. The value of $F_O$ is obtained from the distance between each data point and its closest new centroid, and is compared with the cluster results of the previous iteration.

5) Determining the converging conditions of the iteration process:
This step determines whether the iteration has converged or further iteration is required. The K-Means algorithm in this study is considered convergent if it fulfills the following two conditions:
• The value of Delta is smaller than the desired threshold value ($T$). Delta is the deviation of $F_O$ between two consecutive iterations, which can be calculated using equation (12).
• There is no change in cluster membership.
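Steps 2 to 5 can be combined into one loop, sketched below under stated assumptions: the objective function is taken to be the sum of distances from each point to its assigned centroid, Delta is the absolute difference of that sum between iterations, and an empty cluster keeps its previous centroid. None of these details are confirmed by the text, and the names `kmeans`, `threshold`, and `max_iter` are illustrative.

```python
import math

def kmeans(data, centroids, threshold=1e-4, max_iter=100):
    """K-Means from given initial centroids. Stops when the change in the
    objective is below `threshold` and no cluster membership changes."""
    centroids = [list(c) for c in centroids]   # do not mutate the caller's list
    prev_objective = 1000.0                    # deliberately high initial F_O
    prev_labels = None
    labels = []
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(x, centroids[k])) for x in data]
        # Objective: sum of distances from each point to its centroid.
        objective = sum(math.dist(x, centroids[l]) for x, l in zip(data, labels))
        if abs(objective - prev_objective) < threshold and labels == prev_labels:
            break
        # Recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [x for x, l in zip(data, labels) if l == k]
            if members:                        # empty cluster keeps old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        prev_objective, prev_labels = objective, labels
    return labels, centroids

data = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
labels, centers = kmeans(data, [[0.0, 0.0], [10.0, 10.0]])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[0.0, 1.0], [10.0, 11.0]]
```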

C. Classification Process Using the Naive Bayesian Algorithm
Naive Bayesian (NB) is an algorithm that assumes there is no correlation between variables for a given output value. The NB method is based on Bayes' Theorem. If there are two separate events $X$ and $K$, then Bayes' Theorem is formulated using equation (13).

$$P(K \mid X) = \frac{P(X \mid K)\, P(K)}{P(X)} \quad (13)$$

with:
$X$: data with an unknown class
$K$: the hypothesis that the data belongs to a specific class
$P(K \mid X)$: probability of hypothesis $K$ given condition $X$
$P(K)$: probability of hypothesis $K$
$P(X \mid K)$: probability of $X$ given hypothesis $K$
$P(X)$: probability of $X$.
The NB classification process requires a number of clues to determine the appropriate class for the sample being analyzed. Based on Bayes' Theorem in equation (13), the NB theorem can be formulated using equation (14),
where $K$ represents the class, while the variables $F_1, \ldots, F_n$ represent the clue features needed to classify. Equation (14) gives the probability that a sample with certain characteristics belongs to class $K$ (the posterior), which can also be formulated using equation (15).

$$\text{Posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}} \quad (15)$$

where the prior is the probability of class $K$ occurring before the sample is observed, the likelihood is the probability of the sample characteristics appearing in class $K$, and the evidence is the probability of the sample characteristics appearing globally.

www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 3, 2020
The evidence value is the same for every class in one sample. The posterior value of a class is then compared with the posterior values of the other classes to determine the class into which the sample is classified. The NB formula is elaborated further by expanding $(K, F_1, \ldots, F_n)$ under a very strong (naive) independence assumption. Each feature $(F_1, F_2, \ldots, F_n)$ is assumed to be independent of the others, so that equation (16) applies, which for $i \neq j$ can be formulated using equation (17),
or it can be written with notation as in equation (18).
Based on equation (18) the NB theorem for the classification process can be formulated using equation (19).
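The classification rule can be illustrated with the small sketch below, which picks the class with the largest product of prior and per-feature likelihoods and omits the evidence because it is the same for every class. The sketch assumes discrete feature values and adds Laplace smoothing (`alpha`, `n_values`), neither of which is stated in the text; the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, feature index) -> value counts
    for features, k in zip(samples, labels):
        for i, value in enumerate(features):
            value_counts[(k, i)][value] += 1
    return class_counts, value_counts

def predict_nb(features, class_counts, value_counts, alpha=1.0, n_values=2):
    """Return the class with the highest log(prior) + sum of log(likelihoods)."""
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for k, count in class_counts.items():
        score = math.log(count / total)                       # log prior
        for i, value in enumerate(features):
            num = value_counts[(k, i)][value] + alpha         # smoothed count
            score += math.log(num / (count + alpha * n_values))
        if score > best_score:
            best_class, best_score = k, score
    return best_class

samples = [("high", "yes"), ("high", "yes"), ("low", "no"), ("low", "yes")]
labels = ["Active", "Active", "Reflective", "Reflective"]
model = train_nb(samples, labels)
print(predict_nb(("high", "yes"), *model))  # Active
print(predict_nb(("low", "no"), *model))    # Reflective
```

Working in log space avoids numerical underflow when many feature likelihoods are multiplied, which matters with 29 features.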

D. Model Testing
Testing the performance of the developed method consists of two processes: a validation test of the clustering algorithm and a test of the classification algorithm. Cluster validity is obtained by measuring the cluster results against specific criteria. Commonly used cluster validity methods include the Davies-Bouldin Index (DBI), the Silhouette Index (SI), and the Dunn Index (IN). The cluster validity measure used in this study is DBI, since it performs reasonably well, showing high accuracy and low time complexity [45].
David L. Davies and Donald W. Bouldin (1979) introduced the DBI metric for evaluating clusters. Cluster results are considered good if the DBI value is as small as possible (it is non-negative, i.e., ≥ 0). Validity measures how well the clustering has been done by calculating the quantity and derivative features of a data set based on cohesion and separation values. The cohesion metric, or Sum of Square Within-cluster (SSW), of cluster $i$ is formulated by equation (20) [45],
where $m_i$ is the number of data points in cluster $i$, $c_i$ is the centroid of cluster $i$, and $d(x_j, c_i)$ is the same distance formula that was used in the clustering process, such as Euclidean distance, city-block distance, and so on.
The separation metric between two clusters, for example clusters $i$ and $j$, is the Sum of Square Between-clusters (SSB), which measures the distance between centroids $c_i$ and $c_j$, as shown in equation (21).
The value of DBI is then obtained from equation (22),
where $K$ is the number of clusters and $R_{i,j}$ is the ratio of the sum of the within-cluster values (SSW) of the two corresponding clusters to their between-cluster value (SSB), which is formulated using equation (23).
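A minimal sketch of the DBI computation is given below, assuming the standard Davies-Bouldin definitions: SSW as the mean distance of cluster members to their centroid, SSB as the distance between two centroids, R as (SSW_i + SSW_j) / SSB_ij, and DBI as the mean over clusters of the largest R. It further assumes Euclidean distance, at least two clusters, and no empty cluster; the function name is illustrative.

```python
import math

def davies_bouldin(data, labels, centroids):
    """Davies-Bouldin Index; lower values indicate better clustering."""
    k = len(centroids)
    # Cohesion: mean distance of the members of cluster i to its centroid.
    ssw = []
    for i in range(k):
        members = [x for x, l in zip(data, labels) if l == i]
        ssw.append(sum(math.dist(x, centroids[i]) for x in members) / len(members))
    total = 0.0
    for i in range(k):
        # Worst-case ratio of cohesion to separation for cluster i.
        total += max((ssw[i] + ssw[j]) / math.dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k

data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels = [0, 0, 1, 1]
centroids = [[0.0, 1.0], [10.0, 1.0]]
print(round(davies_bouldin(data, labels, centroids), 4))  # 0.2
```

In the toy example each cluster has cohesion 1 and the centroids are 10 apart, so R = (1 + 1) / 10 = 0.2 for both clusters.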
The classification algorithm in this study is tested using a multi-class confusion matrix [46] of size $n \times n$ with $n = 16$, because the learning style detection involves 16 classes. With a multi-class confusion matrix, the total number of false negatives ($TFN$), false positives ($TFP$), and true negatives ($TTN$) for each class $i$ is calculated using the generalized equations (24), (25), and (26). The total number of true positives ($TTP(\text{all})$) in the system is obtained through equation (27).
The performance of the proposed system in obtaining relevant data is measured using Precision ($P$), also called the positive predictive value, while Recall ($R$) measures how much of the relevant data the proposed classification retrieves. $P$ and $R$ are calculated for each class $i$ using equations (28) and (29).
The values of $P$ and $R$ are combined into one metric called the F-measure ($F$), which is the weighted harmonic mean of $P$ and $R$. $F$ is calculated using equation (30).
The performance of the proposed model built by the classification algorithm can be measured by calculating the accuracy, using equation (31).

$$\text{Overall accuracy} = \frac{TTP(\text{all})}{\text{Total amount of test data}} \times 100\% \quad (31)$$
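The per-class counts and the derived measures can be sketched as follows. The sketch assumes rows of the confusion matrix are actual classes and columns are predicted classes, and it uses the balanced F-measure (equal weight on precision and recall); both are assumptions, as are the function name and the 3-class toy matrix.

```python
def per_class_metrics(cm):
    """cm[a][p] counts samples of actual class a predicted as class p."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # actual i, predicted other
        fp = sum(cm[a][i] for a in range(n)) - tp   # predicted i, actual other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        results.append({"TP": tp, "FN": fn, "FP": fp, "TN": tn,
                        "P": precision, "R": recall, "F": f})
    # Overall accuracy: all true positives over all test data, in percent.
    accuracy = 100 * sum(cm[i][i] for i in range(n)) / total
    return results, accuracy

cm = [[5, 1, 0],
      [2, 6, 0],
      [0, 1, 5]]
metrics, accuracy = per_class_metrics(cm)
print(metrics[0]["TP"], metrics[0]["FN"], metrics[0]["FP"], metrics[0]["TN"])  # 5 1 2 12
print(accuracy)  # 80.0
```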

IV. ANALYSIS AND DISCUSSION
The pre-processing results are based on the data of PPG SPADA Kemenristekdikti participants who are teachers of the English subject, comprising 500 data sets. The data set consists of 29 features: 9 features for the Processing dimension, 9 features for the Perception dimension, 6 features for the Input dimension, and 5 features for the Understanding dimension. Each feature covers several activities in each of the 6 learning modules. The performance of the proposed learning style detection model was tested using the Matlab R2013a application.
The validity of the clustering algorithm is tested by comparing the maximum value of DBI ($R$) in each cluster between the modified algorithm and the original K-Means. The value of $R$ in each group for one experiment is depicted in Fig. 3. Fig. 4 shows that the DBI value of the original K-Means algorithm is unstable, and the clustering result for each data set also differs between attempts. This is because the initial centroids change on every run, since they are determined randomly, which causes the validity of the algorithm to fluctuate. In contrast, the DBI value of the modified K-Means algorithm stays the same, and the clustering result for each data set does not change. This result shows that the modified K-Means algorithm compares well with the original K-Means, so that the clustering results of the modified K-Means algorithm increase the performance of the classification algorithm in detecting the learning styles of the English teachers participating in PPG SPADA Ristekdikti.

B. Classification Results using the Naive Bayesian Algorithm
Based on the test results for 500 data sets, in which the predicted classes are compared with the class labels obtained by clustering with the modified K-Means algorithm, 358 out of 500 data (71.60%) have a predicted class equal to the correct class. In contrast, the class labels differ from the prediction results for 142 data (28.40%). The precision and recall values are shown in Table V. Table V shows that the average P is 71.09%, which means that the detection information desired by the user agrees fairly closely with the answers given by the proposed model. The average value of R is 80.23%, which shows that the performance of the proposed model is quite good, above 70%. Table V also shows that 12 of the 16 learning style classes have an R value higher than 70%, which means that 75% of the learning style classes of the course participants were successfully detected using the combination of the modified K-Means algorithm and NB classification.
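The per-class precision and recall values and their averages can be computed as in the sketch below. It assumes that the averages are unweighted (macro) means over the classes, and it uses a small made-up label set with three classes rather than the 16 classes and 500 data of the study. The function names are illustrative.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} from true and predicted labels."""
    true_count = Counter(y_true)
    pred_count = Counter(y_pred)
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    result = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        recall = true_pos[c] / true_count[c] if true_count[c] else 0.0
        result[c] = (precision, recall)
    return result


def macro_average(per_class):
    """Unweighted mean of precision and of recall over all classes."""
    n = len(per_class)
    avg_p = sum(p for p, _ in per_class.values()) / n
    avg_r = sum(r for _, r in per_class.values()) / n
    return avg_p, avg_r


# Toy labels only; "true" stands for the cluster labels, "pred" for NB output.
toy_true = ["A", "A", "A", "B", "B", "C"]
toy_pred = ["A", "A", "B", "B", "B", "C"]
scores = per_class_precision_recall(toy_true, toy_pred)
avg_precision, avg_recall = macro_average(scores)
print(scores, avg_precision, avg_recall)
```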
The proposed method classifies each FSLSM learning style model quite well. This can be seen from the average values of precision and recall, which are nearly balanced, and from the F-Measure value of 75.38%, which is higher than 70%. The accuracy of the proposed model is also quite good at 71.6%. This shows that the predicted learning styles of the PPG SPADA participants, teachers of English subjects under the Ministry of Research, Technology, and Higher Education, are quite close to the learning style model.
The performance of the proposed automatic learning style detection model is compared with that of the same detection model when the clustering step uses the original K-Means algorithm, by measuring the average P, R, accuracy, and F-Measure over 10 tests. The test results are shown in Tables VI, VII, VIII, and IX. Based on the average values of precision, recall, accuracy, and F-Measure shown in Tables VI, VII, VIII, and IX, using the modified K-Means algorithm to form labels before classification increases all four values. This shows that the changes made to the K-Means algorithm improve the performance of the learning style detection model compared with using the original K-Means algorithm. In addition to the improved performance, the average values of precision, recall, accuracy, and F-Measure of the proposed method did not change across the tests, which shows that the proposed learning style detection technique has stable performance.
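The link between centroid initialization and stable results over repeated tests can be illustrated with the sketch below. It is not the modification proposed in this study: `fixed_init` is a hypothetical deterministic rule used only as a stand-in, and the data are synthetic. The point is that any deterministic choice of initial centroids yields identical labels on every run, whereas random initialization may not.

```python
import numpy as np


def kmeans(X, init_centroids, n_iter=100):
    """Plain Lloyd's K-Means starting from the given initial centroids."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def random_init(X, k, rng):
    """Original K-Means style: pick k distinct data points at random."""
    return X[rng.choice(len(X), size=k, replace=False)]


def fixed_init(X, k):
    """Hypothetical deterministic rule: k points evenly spaced in the
    ordering by distance from the data mean (illustration only)."""
    order = np.argsort(np.linalg.norm(X - X.mean(axis=0), axis=1))
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    return X[order[idx]]


rng = np.random.default_rng(0)
# Synthetic data: three Gaussian blobs of 30 points each.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
               for c in ([0, 0], [4, 0], [2, 3])])

random_runs = [kmeans(X, random_init(X, 3, rng))[0] for _ in range(10)]
fixed_runs = [kmeans(X, fixed_init(X, 3))[0] for _ in range(10)]

# Deterministic initialization gives the same labels in all 10 runs.
fixed_is_stable = all(np.array_equal(fixed_runs[0], r) for r in fixed_runs)
print(fixed_is_stable)
```

Because the cluster labels feed the classifier, identical labels across runs also lead to identical precision, recall, accuracy, and F-Measure values across runs.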

V. CONCLUSION
This research succeeded in building an automatic learning style detection model using a combination of a modified K-Means algorithm and Naive Bayesian. Based on the test results, the modified K-Means algorithm, which is used to form labels in the learning style detection model proposed in this study, improves the grouping of the data sets compared with the original K-Means algorithm. The validity test results of the modified K-Means algorithm are better than those of the original K-Means algorithm. In addition, the DBI value of the modified K-Means algorithm is the same every time it is run. This shows that the modified K-Means algorithm is more stable than the original K-Means algorithm, so the labels of each data set do not change.
The proposed learning style detection model, which applies the modified K-Means algorithm before classification, performs better than the same model when the labeling process uses the original K-Means algorithm. The average precision and recall values of the test data set are 71.09% and 80.23%, which means the proposed model for detecting learning styles works well. The accuracy of the proposed model is also quite good at 71.6%, which is higher than the average accuracy of the learning style detection model that uses the original K-Means algorithm for the clustering process, which is 64.8%. This shows that the predictions are fairly close to the original learning style model.
As part of future work, the accuracy, precision, and recall values of the proposed model could be increased by improving the performance of the Naive Bayesian classification method, using the Augmented Naive Bayesian Tree algorithm or an Artificial Neural Network-based classification algorithm.