Assessing Data Sharing's Model Fitness Towards Open Data by using Pooled CFA

This study demonstrates the step-by-step procedure for performing Pooled Confirmatory Factor Analysis (Pooled-CFA) in the measurement part of Structural Equation Modelling (SEM). CFA is crucial for the SEM measurement model because an acceptable model fit must be obtained before the structural model is built. There are two CFA techniques: individual CFA and Pooled-CFA. Pooled-CFA is usually chosen when the number of constructs and items is high: if the model is complicated, pooling simplifies its appearance and makes it easier to understand. The perception of Malaysia Technical University Network (MTUN) academics on data sharing towards open data was analysed using Pooled-CFA. Three main constructs were analysed: data sharing with its 4 sub-constructs (technological factor, organizational factor, environmental factor, and individual factor), a mediator construct (open data licenses), and the open data construct. Furthermore, the factor loadings of the second-order construct on its corresponding sub-constructs were investigated. Primary data were collected from 442 respondents using a stratified random sampling technique. This paper explains the theoretical framework before presenting the results of Pooled-CFA on data sharing towards open data.

Keywords: Pooled CFA; data sharing; open data; measurement model; validity


I. INTRODUCTION
Open data initiatives have become widespread across countries. According to [1], Malaysia embarked on an open government data framework through the Malaysian Administrative Modernisation and Management Planning Unit (MAMPU) in 2015. The initiative was then extended for implementation at the ministry and agency levels. An open data framework within the higher education environment appears crucial, as [2] notes that higher institutions play a significant role and are among the most significant contributors in supporting citizens' educational needs. Meanwhile, [3] states that data producers may be reluctant to share data because sharing poses challenges at many levels, such as cultural, ethical, financial, and technical. Adding to these challenges, [4] highlights that the reluctance to share data is perhaps due to disinterest from the universities. Thus, this study employs a quantitative technique: a survey of Malaysia Technical University Network (MTUN) academics. A total of 442 responses were received, and Confirmatory Factor Analysis (CFA) was performed to confirm the factors that influence MTUN academics' data sharing. This research aims to identify the factors that influence MTUN academics' data sharing and to analyse the open data license, which acts as a mediator between data sharing and open data. This paper explains in detail the theoretical framework developed for this research, the components of structural equation modeling (SEM), the determination of sample size, the fitness indexes of the technological, organizational, environmental, and individual constructs that determine data sharing, and the procedure for pooled-CFA. The reliability and validity indexes are also measured to indicate whether they reach an acceptable level.

II. LITERATURE REVIEW
Open data might change the relationship between the government and the public in terms of transparency [5]. This change can be realised by giving access to government data in the form of open-format datasets. Furthermore, [6] emphasizes that the enactment of open data will address existing legal challenges, including the scope of data access and data ownership.
Since open data was announced, it has attracted attention worldwide. This is supported in [7], which highlights the potential of open data to improve an organization's services to the public. Besides that, open data encourages citizen participation in working towards a transparent government. As noted in [8], the approach to embarking on open data differs between countries because every country has a different governance structure and each organization has its own policies regarding open data.
The Ministry of Education Malaysia (MOE) has specified that Malaysia's Higher Education Institutions (HEIs) are categorized as Public Universities, Private Higher Educational Institutions, Polytechnics, and Community Colleges [9]. Within this research's scope, MTUN comprises 4 public universities (UMP, UTEM, UTHM, and UNIMAP) that focus on Technical and Vocational Education and Training (TVET) approaches.
In Malaysia, according to [9], 12 National Key Economic Areas (NKEA) have been identified under the government's Economic Transformation Programme (ETP). In [10], it is highlighted that the programme demanded an additional 1.3 million TVET workers by 2020. This demand strengthens the need for an MTUN open data framework, as the initiative will help potential workers make informed decisions based on the data shared.
As noted in [11], the demand for open data keeps growing in public universities. The demand arises from the capability of open data to remove barriers to reusing and redistributing data. It also helps the public to make informed decisions. According to the results in [2], deception in the economy can be reduced and universities' accountability can be expected when they embark on open data. To conclude, shared data are valuable for creating innovations that lead to better universities in the future.
This research endeavors to develop an open data framework for MTUN academics. Before that, the factors that influence data sharing, the role of the open data license as a mediator, and the indexes of the open data components are measured. This paper explains in detail the results on the factors that influence data sharing towards open data, obtained by running pooled-CFA.

III. THEORETICAL FRAMEWORK
This study integrates the technological, organizational, and environmental (TOE) framework with the Theory of Planned Behavior (TPB) to determine the factors that influence data sharing. According to [12], the TOE framework is an organization-level theory that explains organization structures from 3 perspectives: technological, organizational, and environmental. These contexts were adopted and integrated with the TPB, which examines individuals' perspectives, and were analysed as the factors that contribute to data sharing.
The TPB has been useful and is considered one of the most influential models for predicting social behaviors [13], [14]. The integration of the TOE framework and the TPB was used to develop the MTUN open data theoretical framework shown in Fig. 1. Fig. 1 shows that the technological, organizational, and environmental sub-constructs are derived from the TOE framework, whereas the individual sub-construct is derived from the TPB. From the framework, it can be seen that all 4 sub-constructs (technological, organizational, environmental, and individual) are factors that contribute to the data sharing construct. The open data licenses (ODL) construct acts as the mediator between data sharing and open data (MTUN_OD). This paper explains how the pooled-CFA was conducted and how the constructs and items were analyzed using IBM SPSS AMOS software (version 24.0).

IV. STRUCTURAL EQUATION MODELLING
Structural equation modeling (SEM) is a powerful multivariate technique that has been widely used in scientific investigations to assess and evaluate multivariate causal relationships. According to [15], [16], it is sometimes described as a statistical methodology that takes a confirmatory approach to the analysis of a structural theory.
There are 2 main components in SEM: the measurement model and the structural model. These 2 components examine variables in different ways. The measurement model relates the measured variables to the latent variables, whereas the structural model relates the latent variables to one another. SEM combines 2 statistical methods: confirmatory factor analysis (CFA) and path analysis. There are 2 CFA techniques: Individual CFA and Pooled-CFA. This paper focuses on the SEM measurement model, which is assessed through CFA.
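The two components described above are commonly written in the following general form. This is only a notational sketch in standard SEM (LISREL-style) symbols, none of which appear in this paper: x and y are vectors of measured items, ξ and η are exogenous and endogenous latent variables, Λ_x and Λ_y are factor-loading matrices, B and Γ are path-coefficient matrices, and δ, ε, ζ are error terms.

```latex
\begin{aligned}
x    &= \Lambda_x \xi + \delta \\
y    &= \Lambda_y \eta + \varepsilon \\
\eta &= B \eta + \Gamma \xi + \zeta
\end{aligned}
```

The first two equations form the measurement model, which is what CFA evaluates; the third is the structural model that is fitted only after the measurement model is accepted.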
The first step in running pooled-CFA was to perform the individual CFA for each construct. As mentioned in [17], [18], a second-order construct is validated separately using the CFA procedure before it is simplified into first-order constructs to reduce the model's complexity. As suggested in [19], it is important to perform the pooled-CFA for all constructs in order to assess the discriminant validity among the model's constructs. Thus, this study analyzed the responses to the survey of MTUN academics on data sharing towards open data.
There was 1 main second-order construct in this study: data sharing, with its sub-constructs technological factor, organizational factor, environmental factor, and individual factor. Following [20], second-order CFA was employed because the study involved assessing the factor loadings of a second-order variable on its corresponding sub-constructs. By running a second-order CFA, the relationships between data sharing and its sub-constructs were examined as well. The ODL and MTUN_OD constructs were identified as first-order constructs because they have no sub-constructs; they were analyzed directly without any need for further simplification.

A. Preface
The target population of this study covers MTUN academics from various educational backgrounds and with varying working experience. Based on [21], the total population of MTUN academics in 2018 was 3818. According to [22], as shown in Table I, the sample needed for a population of 4000 is 351.
The sample obtained was 442, which exceeded the required sample size shown in Table I. The 442 respondents were chosen randomly. A comprehensive questionnaire for the field study was constructed based on the outcome of an exploratory factor analysis (EFA), which was executed using IBM SPSS software. Data for the field study were collected by distributing the questionnaire to MTUN academics using a stratified random sampling technique. 565 | P a g e www.ijacsa.thesai.org IBM Statistical Package for the Social Sciences (SPSS) and IBM SPSS Analysis of Moment Structures (AMOS) version 24.0 were used to build and analyze the model in this study.
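The sample-size comparison above can be sketched in a few lines. This is a hedged illustration that assumes Table I follows the widely used finite-population formula of Krejcie and Morgan (chi-square 3.841, proportion 0.5, margin of error 0.05); the paper itself only cites [22], so the formula, the function name, and the parameter names are assumptions made here.

```python
def required_sample_size(population, chi_sq=3.841, p=0.5, margin=0.05):
    """Finite-population sample size (assumed Krejcie-Morgan style formula)."""
    numerator = chi_sq * population * p * (1 - p)
    denominator = margin ** 2 * (population - 1) + chi_sq * p * (1 - p)
    return round(numerator / denominator)


population_bracket = 4000  # table row the paper uses for a population of 3818
collected = 442            # responses reported in the paper

needed = required_sample_size(population_bracket)
sample_is_sufficient = collected >= needed
print(needed, sample_is_sufficient)
```

Under these assumptions the formula gives 351 for a population of 4000, the figure quoted from Table I, and the 442 collected responses exceed it.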

B. Confirmatory Factor Analysis
Confirmatory factor analysis (CFA) is a method of factor analysis most commonly used in social research. It is usually used to examine whether a construct is consistent with the researcher's understanding of that construct's factors. The objective of CFA is to examine whether the data fit a hypothesized measurement model, which is based on theory or previous analytic research. In CFA, the reliability, validity, and unidimensionality of the measurement model need to be tested, and the results must meet the stated requirements before the structural model is built. According to [18], [23], the theorized model must pass 3 types of validity: Construct Validity, Convergent Validity, and Discriminant Validity. The details of the validity and reliability indexes are shown in Table II. In addition, several fitness indexes need to be examined to evaluate model fitness. These fall into three categories of model fit: absolute fit, incremental fit, and parsimonious fit. The fitness indexes are shown in Table III. As mentioned in [24], the indexes most frequently reported in the literature are the Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI), and Chi-square/degrees of freedom (Chisq/df).

C. Discriminant Validity
Discriminant validity needs to be assessed to ensure that no construct redundancy occurs in the model. Construct redundancy might occur when any pair of constructs in the model is highly correlated, or when one or more constructs assess the same variable. In other words, discriminant validity tests whether measurements of concepts that are not supposed to be related are in fact unrelated. According to [24], if redundancy occurs, the redundant items in the model need to be deleted. The deletion should start from the item with the lowest factor loading and continue until the model fits.
Besides that, correlation coefficients are used to measure the strength of the relationship between 2 variables. As mentioned in [25], they also serve as evidence of discriminant validity. A correlation between variables indicates that when one variable changes in value, the other variable tends to change in a specific direction. The variables should not be highly correlated with each other; otherwise, a multicollinearity problem exists. In addition, [24] highlights that the correlation among the exogenous variables should not exceed 0.85 for the variables to achieve discriminant validity.
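The 0.85 correlation screen described above can be sketched as follows. The three correlations are the ones this paper reports for the pooled model (DS with ODL, ODL with MTUN_OD, DS with MTUN_OD); the function name and data layout are illustrative assumptions, not part of the AMOS workflow used in the study.

```python
# Correlations between constructs as reported for the pooled model in this paper.
reported_correlations = {
    ("DS", "ODL"): 0.74,
    ("ODL", "MTUN_OD"): 0.72,
    ("DS", "MTUN_OD"): 0.70,
}


def redundant_pairs(correlations, limit=0.85):
    """Return the construct pairs whose absolute correlation exceeds the limit."""
    return [pair for pair, r in correlations.items() if abs(r) > limit]


flagged = redundant_pairs(reported_correlations)
print(flagged)  # an empty list means no pair breaches the 0.85 limit
```

A non-empty result would point to the pairs whose items are candidates for deletion, starting from the lowest factor loading.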

D. Summary
There are 2 CFA techniques in SEM's measurement model: Individual CFA and Pooled-CFA. Individual CFA runs each unobserved construct in the research individually, whereas Pooled-CFA runs all constructs simultaneously [26]. Before Pooled-CFA is performed, the individual CFA for every construct needs to be carried out separately. The results must satisfy the indexes in Table II and Table III to be considered reliable and valid. The AVE results were recalculated to obtain the mean scores, which were then used in Pooled-CFA.

A. Individual CFA
The analysis started with Individual CFA, in which the latent constructs were run one after another to achieve the required model fitness. CFA can only be performed if the constructs have more than 3 items and there is no model identification problem. Fig. 2 shows that all 4 constructs (technological, organizational, environmental, and individual) met the initial requirement for running CFA. All of the constructs must achieve the required fitness indexes. Fig. 2 shows that the technological factor construct has 3 components: technical infrastructure (4 items), usability (3 items), and standard (10 items). Overall, the technological factor construct met the fitness indexes: the RMSEA was .067, the CFI was .959, and Chisq/df was 2.989.
To assess Convergent Validity (CV), the study needs to calculate the Average Variance Extracted (AVE). According to [19], [26], a construct achieves CV if its AVE exceeds the threshold value of 0.5. In addition, following [24], the Composite Reliability (CR) needs to be computed, and its value should exceed the threshold of 0.6 for reliability to be achieved. The AVE and CR for the primary constructs and their respective components were computed and are presented in Table IV. Table IV shows that each item's factor loading was high, above 0.6. The CR value for the technological factor was 0.948, and the AVE was 0.859. The CR value for technical infrastructure was 0.921, and the AVE was 0.745. The CR value for usability was 0.754, and the AVE was 0.506. The CR value for standard was 0.940, and the AVE was 0.610.
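The AVE and CR computations can be illustrated with the standard formulas based on standardized factor loadings: AVE is the mean of the squared loadings, and CR is the squared sum of loadings divided by itself plus the summed error variances. The sketch below is a minimal illustration; the loadings are hypothetical and are not taken from Table IV, and the function names are assumptions.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized factor loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical standardized loadings for a 3-item component (not from the paper).
example_loadings = [0.70, 0.80, 0.90]

ave = average_variance_extracted(example_loadings)
cr = composite_reliability(example_loadings)

meets_convergent_validity = ave > 0.5  # AVE threshold stated in the text
meets_reliability = cr > 0.6           # CR threshold stated in the text
print(round(ave, 3), round(cr, 3), meets_convergent_validity, meets_reliability)
```

Because both quantities depend only on the loadings, a component whose items all load above 0.6 can still sit close to the AVE threshold, as the usability component (AVE 0.506) does.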
From these results, we can conclude that the technological factor construct, together with its components and items, met the requirements for CR, which must be above 0.6, and for AVE, which must be above 0.5. Fig. 3 shows the CFA results for the organizational factor construct.
The organizational factor construct has 4 components: norms (10 items), data sharing policy (3 items), governance (3 items), and resources (5 items). Overall, the organizational factor construct met the fitness indexes: the RMSEA was .067, the CFI was .945, and Chisq/df was 2.988. Table V shows the AVE and CR for the organizational factor construct. Based on Table V, the CR value for the organizational factor was 0.961, and the AVE was 0.861. The CR value for norms was 0.927, and the AVE was 0.562. The CR value for data sharing policy was 0.857, and the AVE was 0.667. The CR value for resources was 0.886, and the AVE was 0.610. Finally, the CR value for governance was 0.872, and the AVE was 0.695.
From these results, it can be concluded that the organizational factor construct, together with its components and items, met the requirements for CR, which must be above 0.6, and for AVE, which must be above 0.5. Fig. 4 shows the CFA results for the environmental factor construct.
The environmental factor construct has 2 components: data sharing culture (3 items) and research practice (3 items). Overall, the environmental factor construct met the fitness indexes: the RMSEA was .066, the CFI was .990, and Chisq/df was 2.925. Table VI shows the AVE and CR for the environmental factor construct. Table VI shows that each item's factor loading was high, above 0.6. The CR value for the environmental factor was 0.933, and the AVE was 0.874. The CR value for data sharing culture was 0.812, and the AVE was 0.591. The CR value for research practice was 0.892, and the AVE was 0.734. It can be concluded from these results that the environmental factor construct, together with its components and items, met the requirements for CR, which must be above 0.6, and for AVE, which must be above 0.5. Fig. 5 shows the CFA results for the individual factor construct.
The individual factor construct has 3 components: attitude (3 items), perceived behavioral control (7 items), and normative belief (3 items). Overall, the individual factor construct met the fitness indexes: the RMSEA was .047, the CFI was .984, and Chisq/df was 1.955. Table VII shows the AVE and CR for the individual factor construct. Table VII shows that each item's factor loading was high, above 0.6. The CR value for the individual factor was 0.868, and the AVE was 0.696. The CR value for attitude was 0.956, and the AVE was 0.878. The CR value for perceived behavioral control was 0.875, and the AVE was 0.502. The CR value for normative belief was 0.842, and the AVE was 0.641. From these results, we can conclude that the individual factor construct, together with its components and items, met the requirements for CR, which must be above 0.6, and for AVE, which must be above 0.5. Fig. 6 shows the CFA results for the open data license (ODL) construct.
In Fig. 6, the ODL construct has 5 items. Overall, the ODL construct met the fitness indexes: the RMSEA was .053, the CFI was .995, and Chisq/df was 2.257. Table VIII shows the AVE and CR for the ODL construct. Table VIII shows that each item's factor loading was high, above 0.6. The CR value for ODL was 0.898, and the AVE was 0.639. Thus, we can conclude that the ODL construct and its items met the requirements for CR, which must be above 0.6, and for AVE, which must be above 0.5. Fig. 7 shows the CFA results for the open data (MTUN_OD) construct.
In Fig. 7, the MTUN_OD construct has 9 items. Overall, the MTUN_OD construct met the fitness indexes: the RMSEA was .089, the CFI was .952, and Chisq/df was 4.526. Table IX shows the AVE and CR for the open data construct. Table IX shows that each item's factor loading was high, above 0.6. The CR value for the open data construct was 0.909, and the AVE was 0.528. Thus, we can conclude that the MTUN_OD construct and its items met the requirements for CR, which must be above 0.6, and for AVE, which must be above 0.5.
Overall, the technological, organizational, environmental, individual, ODL, and MTUN_OD constructs met the fitness indexes and passed the AVE and CR assessments. The study then needed to simplify the overall measurement model from the first-order constructs and pool them together so that they undergo the CFA procedure at once. This procedure is called Pooled-CFA.
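The fit checks reported for the six individual CFA models can be summarized in a short sketch. It uses the acceptance levels quoted in the conclusion of this paper (RMSEA below 0.1, CFI above 0.9, Chisq/df below 5.0) and the values reported above; the function and variable names are illustrative assumptions.

```python
def meets_fit_indexes(rmsea, cfi, chisq_df):
    """Check one model against the three acceptance levels quoted in the paper."""
    return rmsea < 0.1 and cfi > 0.9 and chisq_df < 5.0


# (RMSEA, CFI, Chisq/df) as reported for each individual CFA model.
reported_fit = {
    "technological": (0.067, 0.959, 2.989),
    "organizational": (0.067, 0.945, 2.988),
    "environmental": (0.066, 0.990, 2.925),
    "individual": (0.047, 0.984, 1.955),
    "ODL": (0.053, 0.995, 2.257),
    "MTUN_OD": (0.089, 0.952, 4.526),
}

fit_results = {
    name: meets_fit_indexes(*values) for name, values in reported_fit.items()
}
for name, passed in fit_results.items():
    print(f"{name}: {'accepted' if passed else 'not accepted'}")
```

The MTUN_OD model is the closest to the limits on both RMSEA and Chisq/df, which is visible at a glance when the values are laid out this way.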

B. Pooled-CFA for all Measurement Model of Constructs
In this pooled-CFA model, as suggested in [24], the measurement model for the second-order construct was validated separately using the CFA procedure and then simplified into first-order constructs to reduce complexity. The reason for performing the pooled-CFA was to assess the discriminant validity among the constructs in the model [17], [19], [23], [26]. The fitness indexes should meet the thresholds shown in Table III. As highlighted in [17], [19], [23], [26], the factor loading of every item should not be less than 0.6, and the correlation coefficient between any two constructs should not exceed 0.85; otherwise, a multicollinearity problem occurs.
The Pooled-CFA merged the 3 constructs. As Fig. 8 shows, the model looks much more straightforward and is easier to understand. The pooled CFA was also performed to avoid violating regression assumptions. The correlation between data sharing (DS) and ODL was 0.74, the correlation between ODL and MTUN_OD was 0.72, and the correlation between DS and MTUN_OD was 0.70. Since none of these values exceeds 0.85, the multicollinearity problem does not arise. Besides, the pooled CFA model overall met the fitness indexes: the RMSEA was .050, the CFI was .951, and Chisq/df was 1.914. Table X shows the AVE and CR for Pooled-CFA.
Table XI shows the discriminant validity index summary for all constructs. Discriminant validity is achieved when the diagonal values (in bold) are higher than any other values in their row and column. Since SEM employs a parametric statistical approach to modeling, the study needs to assess the normality of the distribution of all items measuring their respective constructs.
According to [17], [19], [23], [24], [26], the skewness value should fall within the range of -1.5 to 1.5 for the data to be considered normally distributed. Table XII shows that the skewness values for all components in the model fell within this range. This means that the distribution does not depart from normality and that there were no outliers in the data. Thus, the data distribution meets the normality requirement for employing parametric statistical analysis in SEM.
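The skewness screen can be sketched as below. This is a minimal illustration using the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second); the exact estimator used by AMOS may differ slightly, and the response values are hypothetical rather than taken from Table XII.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (assumes the values are not all equal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def within_normality_range(values, limit=1.5):
    """True when skewness lies within the -1.5 to 1.5 range cited in the text."""
    return -limit <= skewness(values) <= limit


# Hypothetical Likert-scale responses for one item (not from the paper).
symmetric_item = [1, 2, 2, 3, 3, 3, 4, 4, 5]
outlier_item = [1] * 20 + [10]

print(within_normality_range(symmetric_item))
print(within_normality_range(outlier_item))
```

An item whose responses pile up at one end of the scale with a distant value at the other end, like the second list, falls outside the range and would need attention before parametric analysis.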

VII. CONCLUSION
Data sharing in this study was defined as the combination of technological, organizational, environmental, and individual components. The components were derived from the literature review. In this study, however, the exact components that form data sharing were investigated by distributing a survey to MTUN academics and were then confirmed through CFA. The investigation was then extended to the ODL and MTUN_OD constructs. In line with its purpose, this study examined the factors that influence data sharing and the impact of data sharing on the ODL and MTUN_OD constructs.
In conclusion, Pooled-CFA is recommended for a complicated model because it makes the model simpler to analyse and easier to understand. A model is considered complicated when it involves many second-order constructs and items. Three reliability and validity measures are important in this study: CR, Cronbach's alpha, and AVE. CR is important because it measures the internal consistency of the scale items. AVE, on the other hand, is used to confirm that a construct correlates with related variables but not with dissimilar, unrelated ones. Based on the CR and AVE values obtained for each construct, it can be concluded that 4 components influence data sharing: the technological, organizational, environmental, and individual constructs.
The technological factor has a CR value of 0.948, which is above the minimum accepted value of 0.6, and an AVE of 0.859, which is above the minimum accepted value of 0.5. The fitness indexes for this construct were achieved: the RMSEA was 0.067, which is below the accepted limit of 0.1; the CFI was 0.959, which is above the accepted minimum of 0.9; and Chisq/df was 2.989, which is below the accepted limit of 5.0. Meanwhile, the organizational factor has a CR value of 0.961 and an AVE of 0.861. Its fitness indexes were also achieved, with an RMSEA of 0.067 (below 0.1), a CFI of 0.945 (above 0.9), and a Chisq/df of 2.988 (below 5.0).
The environmental factor has a CR value of 0.933 and an AVE of 0.874. Its fitness indexes were achieved, with an RMSEA of 0.066 (below 0.1), a CFI of 0.990 (above 0.9), and a Chisq/df of 2.925 (below 5.0). The individual factor has a CR value of 0.868 and an AVE of 0.696. Its fitness indexes were achieved, with an RMSEA of 0.047 (below 0.1), a CFI of 0.984 (above 0.9), and a Chisq/df of 1.955 (below 5.0).
Besides that, the ODL construct has a CR value of 0.898 and an AVE of 0.639. Its fitness indexes were achieved, with an RMSEA of 0.053 (below 0.1), a CFI of 0.995 (above 0.9), and a Chisq/df of 2.257 (below 5.0).
Furthermore, the MTUN_OD construct has a CR value of 0.909 and an AVE of 0.528. Its fitness indexes were achieved, with an RMSEA of 0.089 (below 0.1), a CFI of 0.952 (above 0.9), and a Chisq/df of 4.526 (below 5.0).
Overall, the technological, organizational, environmental, individual, ODL, and MTUN_OD constructs each met the fitness indexes in their individual CFA and passed the AVE and CR assessments.
To confirm the overall fitness of the model, this study employed Pooled-CFA. The fitness indexes for the overall model were achieved, with an RMSEA of 0.050 (below 0.1), a CFI of 0.951 (above 0.9), and a Chisq/df of 1.914 (below 5.0).
Based on Table XI, the model achieved discriminant validity because the diagonal values (in bold) are higher than any other values in their row and column. The discriminant validity value was 0.782 for data sharing, 0.740 for ODL, and 0.755 for MTUN_OD. This indicates that each construct was measured distinctly and is not redundant with the others.
Finally, for the normality test, the distribution does not depart from normality and there were no outliers in the data. Thus, the data distribution meets the normality requirement for employing parametric statistical analysis in SEM, which will be discussed in the next paper.
In conclusion, the step-by-step procedure for pooled-CFA must start with an individual CFA for every construct, which makes a complicated model simpler and easier to understand. All results must satisfy the index tables (Table II and Table III) to indicate that they are reliable and valid.
These CFA results will be used for modeling in SEM. For future work, it is advisable to measure 1 more component, data quality, which should be placed under the technological construct.