SECI Model Design with a Combination of Data Mining and Data Science in Transfer of Knowledge of College Graduates’ Competencies



I. INTRODUCTION
Competence is defined as the knowledge, skills, and individual abilities that directly influence entrepreneurial activity, which is the goal to be achieved [1]. In education, the competence of graduate students can be measured using models that describe the transfer of information, such as the SECI model (Socialization, Externalization, Combination, and Internalization). The study in [2] designs measuring instruments and means for measuring student competence in the enterprise system laboratory and research solutions using the SECI model. Data science, predictive analytics, and big data have been implemented in many fields, and rigorous academic investigation of these methods will lead to new areas. Article [3] discusses the results of a recent large-scale survey of supply chain management professionals on this topic. Data analytics and big data are also discussed in [4], which reviews how data mining and learning analytics have been applied to education data. In the last decade, this field of research has grown enormously. Related terms used in the literature include academic, institutional, and teaching analytics; data-driven decision-making; big data; and data science in education.
Data mining supports prediction in many settings, such as predicting customer buying behavior in business [5] and predicting the final residual value in education [6]. Besides prediction, the most frequently used functionalities are clustering and classification. In the era of big data in particular, clustering is a prevalent theme [7]. Finding clusters with different densities is usually difficult when the method relies on a fixed radius, so an extended method was used to find clusters with different densities [8]. The K-means algorithm, which considers the distance from each point to the centroid of each cluster, is also commonly used, as in [9]. Decision tree classification is another commonly used method [10], and other classification algorithms are often compared with one another [11].
Outliers sometimes appear during mining, namely patterns in the data that do not match the expected behavior. One line of work uses a statistical approach to detect outliers [12]. The performance of data mining classification methods also needs to be evaluated, as in [13], which evaluates classification performance in Internet of Things applications. Related research presents a literature survey of Sustainable Smart Cities, Machine Learning, and Data Mining, in which the most cited relevant methods and feature sets were identified, read, and summarized [14]. In [15], a survey reviewed and discussed a detailed analysis of 142 research articles using various techniques. This survey later introduces a model of the various data mining functionalities.
*Corresponding Author. www.ijacsa.thesai.org
Data collection is necessary for knowledge transfer. A qualitative approach was used to analyze the collected empirical data. Data was collected from various sources, including semi-structured interviews, questionnaires, and internal document processing [16]. Tools were discussed in a study [17] that presents data mining in the data transfer process using the SECI model. However, that discussion was still unclear; a more detailed explanation of data mining in each quadrant is needed.
Research that involves large amounts of data but no special tools makes it difficult for researchers to examine the data. For quantitative data in particular, technology-based tools are very useful for collecting and processing data. Data mining can be used in information dissemination and data processing. In data management, data mining techniques can extract valuable and meaningful information from large amounts of data [18]. Implementing information technology, especially data mining, reduces the burden of human error. However, the process still needs to be carried out clearly and in detail so that every point in the SECI model process can ensure the correctness of the data transfer process.
Graduate competence is very important for higher education. Students must be prepared to stand out in the workforce after graduation. Transferring information from graduates allows a college to obtain data and information that can be used strategically in future curriculum development. Designing the transfer of graduate knowledge with the SECI model, together with an improved data mining design in each quadrant, can help overcome the previous model's shortcomings. The model in this research refines the previous model: each SECI quadrant incorporates data science and data mining in a more detailed process, with a specific data mining algorithm.

A. Transfer of Knowledge with the SECI Model and Information Technology
State-of-the-art mapping and research gaps in knowledge transfer using the SECI model were discussed in previous preliminary research [18]. Data from related papers were analyzed as bibliographic data using VOSviewer, as illustrated in Fig. 1. Several studies show the use of the SECI model, including designing measuring instruments and determining how to measure with the SECI model [2] and managing knowledge from the perspective of the SECI model [19]. Studies that use the SECI model to obtain new results include an empirical test of the analysis for creating knowledge models [16] and knowledge transfer using the SECI model to create new knowledge [20]. From a technical point of view, research [21] actualizes the theory underlying the SECI model and provides empirical consistency for it.
Information technology, as a tool, facilitates the search for results. Research [22]-[24] uses data mining as a tool to create knowledge management systems. The process of knowledge discovery from databases uses data mining techniques and the SECI knowledge dimension model. Thus, information technology enables various functions to be completed with the SECI model.

B. Slice of Knowledge Mapping Gap
Information technology can be used to design models and frameworks. A framework can be built by adding information technology to the knowledge management system [17]. Models can be developed by exploring the adaptation of information technology to the knowledge conversion process [25].
The SECI model helps greatly in transferring information when building a knowledge management system in an organization [16], [20], [21]. With the SECI model, knowledge transfer becomes more focused and measurable in each quadrant. Research gaps appear as weaknesses that arise when tools are not used, as in [2], where human abilities, understanding, and resources limit the knowledge transfer to the graduates, and perceptions are often inconsistent between respondents and interviewers. Another gap occurred in [25], where data collection was limited by privacy issues related to sources and by some informants' refusal to participate in the research.
Research that used technology as an aid is limited by small and inadequate data samples [26], and not all implementations of information technology fit every phase of the SECI model [24]. Data mining, as part of information technology, has been implemented in [17]. However, the discussion there is still unclear and not concrete; each SECI process must be explained in its quadrant. Given the deficiencies described in these articles, the previous model needs to be corrected and revised. Data mining has more detailed features for explaining each data transfer process. Functions that can be used include classification, clustering, and association. These functions can complement the model and fill the gaps in previous research. When designing data mining models within the SECI model, relevant and suitable algorithms can be used in each quadrant.

III. RESEARCH METHODOLOGY
The research method begins with data collection. Data is used to formulate the problem and is collected through a literature study, which gathers the required data from scientific journals related to the research theme. The related journals are organized into a state-of-the-art review to identify research gaps. The next stage consists of direct interviews with the Head of the Study Program, the Dean, and Vice Chancellor III of student affairs, as well as the distribution of questionnaires to respondents, namely graduates. Technical data collection uses written and computer media tools. The questionnaire asks several questions of the respondents, namely graduates who have worked. The questionnaire results are stored in written documents or computerized storage media. This initial stage is the first stage in the SECI model, socialization, which is the transfer of knowledge from tacit to tacit.
The next step is the Externalization stage, where data transformation is performed. At this stage, hypotheses and conclusions are sought from the data that has been obtained. The process continues with data cleansing, which is needed because the data received is not yet clean and must be reprocessed before further use. The Externalization quadrant focuses on documenting and storing data, converting knowledge from tacit to explicit.
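As an illustration of the cleansing step, the sketch below drops responses that contain blank answers and exact duplicates. It is a minimal example under stated assumptions, not the procedure used in this study: the function name `cleanse`, the field names `id`, `q1`, and `q2`, and the sample rows are all hypothetical.

```python
def cleanse(responses):
    """Drop responses with blank answers and exact duplicates (first copy is kept)."""
    seen = set()
    clean = []
    for row in responses:
        # Assumption: a blank answer is None or an empty / whitespace-only string.
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue
        # Two rows are duplicates when every field has the same value.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean


# Hypothetical questionnaire rows, for illustration only.
raw = [
    {"id": "r1", "q1": 5, "q2": 4},
    {"id": "r2", "q1": 3, "q2": ""},  # blank answer, removed
    {"id": "r1", "q1": 5, "q2": 4},   # exact duplicate, removed
    {"id": "r3", "q1": 2, "q2": 5},
]
clean = cleanse(raw)
print([row["id"] for row in clean])
```

Keeping the first occurrence of a duplicate is a design choice of this sketch; a real pipeline might instead keep the most recent submission.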
After the data is transformed, the process continues to the data analysis stage in the Combination quadrant of SECI, from explicit to explicit. Datasets that have been transformed and cleansed are grouped using the K-Means algorithm. Classification uses the Decision Tree algorithm on the same data to predict future possibilities. The third data mining functionality, association, uses the Apriori algorithm, which adapts the transaction data format to map graduate competency data to associated attributes. System design and development also take place in the combination phase. Patterns are learned from the training data and then applied to the test data to produce output. The discovery process describes data mining as a process of disseminating data: data mining is about finding information that was not previously known to exist. Data science plays a role in information extraction, helping to extract valuable information from the mined data. The data mining results are analyzed to identify relevant patterns, trends, and relationships and to explore insights that can be used for decision-making.
The internalization process is carried out by transferring knowledge from explicit to tacit. This is done after the system is completed, implemented, and tested. This process is carried out by returning explicit knowledge to tacit by sharing the analysis results with users. The results of this analysis can also be used strategically by management.
The following is a draft framework for the SECI model with data mining and data science for graduate competency data, which is given in Fig. 2.
Statistical methods can be used to test whether the results of the data mining implementation and the knowledge management model are appropriate and suitable, after which conclusions are drawn. Testing and evaluation are performed with tools to examine the relationship between the results of the model and user needs.

A. Stages of Socialization
The first stage is in the socialization quadrant, where data is collected through tacit-to-tacit knowledge transfer. Related parties who have mastered their fields were interviewed to formulate graduate competencies. The parties in question are the Head of the Study Program, the Dean, and Vice Chancellor III, who handle the affairs of students and graduates. The interviews were based on official government documents in the form of a Regulation of the Minister of Research, Technology, and Higher Education of the Republic of Indonesia. Government regulations define National Higher Education Standards for various levels of education programs. This research is limited to undergraduate graduates in two fields of study: computer science, and economics and business.
The competence of graduates is contained in four formulations. The formulations of attitudes and general skills are the same for all study programs, while the formulations of knowledge and specific skills are tailored to each study program. From the interviews with related parties, the graduates' competency formulations were obtained and mapped into questionnaire questions aimed at the intended respondents, namely the graduates.

B. Stages of Externalization
The next SECI quadrant is the externalization stage, which includes the process of cleansing and transforming data. The data is divided into two types of respondents, with questions tailored to their respective fields of knowledge. From a population of 6,536 graduates of a tertiary institution, a sufficient number of samples will be sought. Respondents consisted of graduates from two fields, computer science and economics. Respondents from the computer science field came from the information systems, informatics, informatics management, accounting computerization, and computer engineering study programs. Respondents from economics came from the management and accounting study programs.
The questionnaire was filled out by more than 400 respondents. Cleansing revealed that some records did not meet the requirements, for example blank and duplicate entries. After cleansing, clean data for 387 graduates was obtained. According to the Slovin formula, with a total population of N = 6,536 graduates and an error tolerance of e = 0.05, the minimum number of respondents n is obtained as follows:

$$n = \frac{N}{1 + N e^{2}} = \frac{6536}{1 + 6536 \times 0.05^{2}} = \frac{6536}{17.34} \approx 377$$

The 387 clean records therefore exceed the minimum sample size for the total population of graduates. The data is then tested for validity and reliability. The validity test examines the correlation of each question with the other questions. The results of the correlation validity test for each question, for computer science graduates and for economics and business graduates, are presented in Fig. 3 and 4.
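The sample-size check can be reproduced with a few lines of Python. This is a small sketch that uses only the values stated in the text (N = 6,536 and e = 0.05); the function name `slovin_sample_size` is hypothetical, and rounding up to the next whole respondent is an assumption.

```python
import math


def slovin_sample_size(population, margin_of_error):
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole respondent."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))


n_min = slovin_sample_size(6536, 0.05)
print(n_min)         # 377
print(387 >= n_min)  # True: the 387 clean records meet the minimum
```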
In Fig. 3 and 4, with an error tolerance of 0.05, the r table value is 0.1195 for the computer science graduate data and 0.1851 for the economics graduate data. The r count of every question exceeds the r table value, so all questions can be said to be valid. The reliability tests for the computer science graduate data and the economics and business graduate data are presented in Tables I and II. The reliability value was > 0.9 in both tests, which places the data reliability in the strong and perfect category. The data transformation process then adjusts the data format for the data mining process.
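The validity and reliability checks can be sketched as follows. The text does not name the exact statistics, so this example assumes two common choices: the Pearson correlation between each question and the total score for validity, and Cronbach's alpha for reliability. The answer matrix is hypothetical, and the function names are illustrative only.

```python
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def item_total_validity(rows, r_table):
    """For each question, return (r count, whether r count exceeds r table)."""
    totals = [sum(r) for r in rows]
    result = []
    for col in zip(*rows):  # one column per question
        r = pearson_r(col, totals)
        result.append((r, r > r_table))
    return result


def cronbach_alpha(rows):
    """Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_vars = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_vars / total_var)


# Hypothetical answers: five respondents, three questions on a 1-5 scale.
answers = [
    [5, 4, 5],
    [3, 3, 4],
    [4, 4, 4],
    [2, 1, 2],
    [5, 5, 4],
]
# 0.1195 is the r table value reported for the computer science data; the
# correct r table value for a sample this small would be different.
validity = item_total_validity(answers, r_table=0.1195)
alpha = cronbach_alpha(answers)
print(all(ok for _, ok in validity), round(alpha, 3))
```

Because the r table value depends on the sample size, the threshold is passed in as a parameter rather than fixed in the code.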

C. Stages of Combination
The system was designed and built with data mining at the combination stage of the SECI model and consists of three primary functions: classification, clustering, and association. The data is divided into three sections, one for each functionality. For association in particular, the data is restructured according to the answer choices selected by respondents, so that the number of records is no longer the number of respondents but the number of choices.

1) Classification:
The classification dataset consists of 14 questions covering three elements of attitude, five elements of general skills, four elements of knowledge, and two elements of specific skills. Because the questions on knowledge and specific skills are divided into two types, one for each field of study, the total number of questions is 20. The details of the questions are presented in Table III. The target variable in the graduate competency dataset is whether or not the graduate's work matches the field of study of the study program. The algorithm used is the decision tree, implemented in Python. The Python libraries used for classification include Pandas, Matplotlib, NumPy, Pydotplus, and Graphviz. The resulting binary decision tree has up to 23 levels. Fig. 5 shows the two highest levels of the classification tree. The value gini = 0.434 measures the quality of the split at this node. The formula for calculating the gini is given as follows:

$$\text{gini} = 1 - \left(\frac{x}{x + y}\right)^{2} - \left(\frac{y}{x + y}\right)^{2}$$

where x is the number of samples whose target variable matches the field of science, and y is the number of samples whose target variable does not match the field of science.
• Value = [123, 264] means that, out of a total of 387 samples, 123 fall in the "Not suitable" category and 264 fall in the "Suitable" category.
A similar discussion also applies to subsequent branches up to the last branch at the 23rd level.
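The gini value of the root node can be checked directly from the class counts reported for Fig. 5. The sketch below is a minimal illustration in plain Python; the function names are hypothetical, and the child-node counts used for the weighted impurity are invented for illustration because the text does not report them.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def weighted_gini(left_counts, right_counts):
    """Impurity of a binary split: child impurities weighted by child size."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (n_left / n) * gini_impurity(left_counts) + (
        n_right / n
    ) * gini_impurity(right_counts)


# Root node from the text: value = [123, 264], samples = 387.
root = gini_impurity([123, 264])
print(round(root, 3))  # 0.434

# Hypothetical child counts (not from the paper); they only need to add up
# to the root counts. A useful split lowers the weighted impurity.
split = weighted_gini([90, 60], [33, 204])
print(split < root)  # True
```

A decision tree that uses the gini criterion chooses, at each node, the question and threshold that give the lowest weighted impurity among the candidate splits.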
2) Clustering: The grouping of data will later be connected with segmentation. The data suitable for this purpose is the identity data of the respondents. Of the many questions, three attributes are taken: age, GPA, and length of study in years. The Elbow method determines the number of clusters, and the K-means algorithm determines the cluster members.
From the Python code, which uses almost the same libraries as classification, the Elbow method recommends K = 3 clusters.
The formula for the distance between two points in three dimensions is given as follows:

$$d = \sqrt{(x_2 - x_1)^{2} + (y_2 - y_1)^{2} + (z_2 - z_1)^{2}}$$

The clustering results are shown as a 3D plot in Fig. 6. These results indicate that the respondent's age most influences the cluster results, while the other attributes, GPA and length of study, are spread throughout the clusters.
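The clustering procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the code used in the study: the data points (age, GPA, years of study) are hypothetical, the function names are illustrative, and the centroids are initialized with a deterministic farthest-point rule instead of the random initialization that library implementations normally use.

```python
import math


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def init_centroids(points, k):
    """Assumed initialization: start at the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Return (labels, centroids, wcss) for a basic K-means run."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: distance(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Within-cluster sum of squares, the quantity plotted by the Elbow method.
    wcss = sum(distance(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, wcss


# Hypothetical respondents: (age, GPA, length of study in years).
points = [
    (23, 3.4, 4.0), (24, 3.1, 4.5), (25, 3.6, 4.0), (24, 2.9, 5.0),
    (28, 3.2, 4.0), (29, 3.5, 4.5), (30, 3.0, 5.0), (31, 3.3, 4.0),
    (36, 3.1, 4.5), (38, 3.4, 4.0), (40, 2.8, 5.0),
]

# Elbow method: compute WCSS for several K and look for the bend in the curve.
wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 6)}
labels, centroids, _ = kmeans(points, 3)
print(labels)
```

In this sketch the attributes are not rescaled, so age, which has the widest numeric range, dominates the distance. Whether the attributes were standardized in the study is not stated.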
3) Association: The Python libraries required for association are almost the same as for classification and clustering, with the addition of the Apriori library. Graduate competency data associations are derived from transaction associations. The dataset contains knowledge data from graduates answering eight multiple-choice questions in checklist format, which allows more than one option to be selected. The number of dataset records corresponds to the number of options selected, rather than the number of respondents. The association questionnaire questions are presented in Table IV.

6. Ability to work independently or as a team member in different situations, and ability to lead and manage teams on complex projects.
7. Knowledge of social, cultural and environmental diversity, as well as the ability to adapt to differences and respect this diversity.
8. The ability to develop life skills, such as the ability to solve problems, make decisions, and learn.

The association support formula is as follows:

$$\text{support}(A) = \frac{\text{number of records containing } A}{\text{total number of records}}$$

Association uses the Apriori algorithm with a specific support value. With a support of 0.4 on the 1774 selected dataset records, 17 rules are obtained: 15 rules with two items and two rules with three items. The three-item rules are the itemsets for questions 8, 4, and 2 and for questions 8, 6, and 2.
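The support calculation and the level-wise search of the Apriori algorithm can be sketched in plain Python. This is a simplified illustration, not the Apriori library used in the study: the transactions are hypothetical, each transaction is assumed to be the set of question numbers one respondent ticked, and the sketch stops at frequent itemsets without deriving rules.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, found level by level."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while current:
        # Keep only the candidates that reach the minimum support.
        level = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: every smaller subset of a candidate must be frequent.
        current = [
            c
            for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        ]
    return frequent


# Hypothetical checklist answers (question numbers ticked by each respondent).
transactions = [
    {2, 4, 8}, {2, 6, 8}, {2, 4, 6, 8}, {2, 8}, {1, 3}, {2, 4, 8}, {4, 7},
]
frequent = apriori(transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 3))
```

With this toy data, the itemset of questions 2, 4, and 8 is frequent at a minimum support of 0.4, while question 6 alone is not.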

D. Stages of Internalization
The final stage of the SECI model is internalization, where explicit knowledge returns to tacit knowledge. The data mining results, the data science analysis, and the analytical recommendations are returned to the relevant parties. The results from the three functionalities of classification, clustering, and association are given to the Head of the Study Program, the Dean, and Vice Chancellor III for further identification and study in planning future strategies. The strategy in question is to prepare students who are still in college so that they can face the world of work after graduating, equipped with sufficient knowledge and competence.

E. The SECI Model with Data Mining and Data Science
The results of the three data mining functionalities, classification, clustering, and association, are then discussed in relation to data science to obtain data analytics results that institutions can later use. Data science focuses on extracting, processing, analyzing, and interpreting data to understand phenomena better. The main goal of data science is to identify patterns, trends, and insights contained in data in order to make better decisions or develop more effective solutions to problems.
In the designed model, with the dataset obtained, the highest branch of the classification tree is question number 6 (X[5]), the Competency Standard for General Skills Elements: Scientific Communication, which most influences whether graduates' current jobs match the fields of study they pursued. From this result, analytical recommendations can be given to the related parties, namely the Head of the Study Program, the Dean, Vice Chancellor III, and graduates, so that they can focus on the elements that influence graduates' ability to find suitable work, given that scientific communication is essential. Various scientific or technical information delivery activities can be carried out, such as seminars or forums, which support students in becoming more involved in scientific journal publications, conference presentations, and scientific discussions.
In the cluster results, which use the attributes age, GPA, and length of time in college, the grouping is most influenced by age. Data science plays a role in segmenting the cluster results, where analytical recommendations can be given to respondents based on age. Respondents in age groups that are close together are likely to belong to adjacent generations and nearly the same era, so events involving graduates can be divided by age or generation.
Analytical recommendations for association depend on the resulting rules. The rules with three items, for questions 8, 4, and 2 and for questions 8, 6, and 2, give the association rule pattern: if graduates have life skills and can recognize information or have the ability to work, then graduates will be able to think complexly. The recommendation for the Head of the Study Program, the Dean, and Vice Chancellor III is to equip students with sufficient soft skills to face the world of work after graduating from college.

V. CONCLUSION
SECI modeling has been widely used by researchers in transferring knowledge. However, as technology becomes increasingly sophisticated, tools can be very helpful in transferring knowledge. The SECI model built here for transferring graduate knowledge, combined with data mining and data science, is expected to help the related parties. The Head of the Study Program, the Dean, and Vice Chancellor III can use the information and knowledge transferred from graduates, from the results of classification, clustering, and association, as material for future consideration in drafting academic documents, especially the curriculum and all matters related to student activities. Recommendations for increasing the competence of graduates from this study include the classification result: the highest branch is scientific communication, which most influences the job suitability of graduates. From these results, the college can focus more on preparing graduates by holding more student activities, such as seminars and workshops on improving students' scientific communication abilities.
The modified model shows several differences from the previous model. The modification with data mining is carried out to understand problems and provide effective and efficient solutions so that the transfer of graduate competence knowledge is more focused. This model is designed to be easy to use and acceptable to all users. The documentation describes the concept, function, and use of the model. The model is compatible with many popular technologies and platforms. Further model testing must still be carried out to test the effectiveness and success of the designed model. Caution is required in dealing with changes and increases in its use.

Fig. 2. SECI model framework with data mining and data science.

Fig. 4. Data validity of economics and business graduates.

Fig. 5. First and second level classification trees.
The highest branch of the tree, the root at level 0, is Elements of General Skills: Scientific Communication with a value of ≤ 4.5, gini = 0.434, samples = 387, and value = [123, 264]. This means:
• For questions with an answer value of 4.5 or lower, the scientific communication element follows the True arrow to the left, and the rest follow the False arrow to the right.

Fig. 6. 3D cluster results.
From Fig. 6, the following data is obtained:
• Cluster 0: 209 records with ages under 26 years and a combination of GPA and length of study.
• Cluster 1: 151 records with ages between 27 and 32 years and a combination of GPA and length of study.

TABLE I. RELIABILITY STATISTICS DATA OF COMPUTER SCIENCE GRADUATES

TABLE III. QUESTIONS IN GRADUATE COMPETENCY QUESTIONNAIRE

TABLE IV. QUESTIONNAIRE ON KNOWLEDGE OF GRADUATES