Cross-Organizational Information Systems: A Case for Educational Data Mining

Establishing a new organization is becoming more difficult day by day due to the extremely competitive business environment. A new organization may not have enough experience to survive in a competitive market, which in turn may damage the reputation of the organization and the trust of its investors. The goal of this research is to design a framework for a cross-organizational information system for assessment and decision making using machine learning, with an emphasis on the educational sector. In the proposed framework, organizations share information (even raw data) with each other, and machine learning tools are used to analyze the shared data in support of decision making for a particular organization. A framework like this can help new organizations benefit from the experience of other, older organizations and institutions. Such a knowledge-based machine learning system helps to improve the organizational capability of newly established institutions. As an implementation of the framework, we build a fuzzy system that can work effectively as a cross-platform system for educational entities.

Keywords: Information system; machine learning; cross-organization; decision making; education; fuzzy matching; data mining


I. INTRODUCTION
A cross-organizational information system is a kind of information system that allows association, communication, and shared interactions. Such a system helps organizations make decisions that give them a competitive advantage in business [1], [2]. A cross-functional information system is designed to allow different organizations to share information and to benefit from the coordination it enables between their business process activities [3]. Designing a cross-organizational information system, sharing information, and benefiting from it have become major challenges for business professionals in today's environment. The purpose of developing a cross-organizational information system is to help a new organization use information from various other organizations to make the right decisions.
The various decisions taken while establishing a new organization should be correct, accurate, and beneficial for the organization; otherwise, badly assessed decisions may lead to a loss of investor trust. In this scenario, we can imagine a cross-platform system that assists such organizations and works as a supporting tool for a new organization. A new organization can benefit from the "experiences" of older organizations, where these "experiences" are actually derived by applying data mining techniques to the data shared by other organizations. Thus, in this paper, we address the key challenge of building a framework that helps an organization communicate with other similar organizations, share information, and benefit from the experiences of the most renowned organizations.
Of course, the ability of such a system is limited by the amount of data shared by the various organizations. Generally, organizations have public and protected data: public data can be accessed by other entities, whereas protected data can be accessed only by authorized persons. Protected data can be accessed by following organizational policies through inference rules [4], [5]. In addition, in an amicable setting (such as a country-wide public university system), sharing data may be more feasible, as the entities in such a system serve some common goals (building a competent workforce, for example). Therefore, the case study presented in this work for the proposed cross-organizational system focuses on educational data mining. The main contributions of this work can be summarized as follows:
• Initial independent data mining systems for each organization.
• Cross-matching of "trend curves" from similar organizations for comparative evaluation.
• Identification of points of weakness for a "lagging" organization from the trend lines of the best (or established) organization.
While building such a cross-organizational system, a number of issues need to be considered in the framework. We need to consider a distributed database system with its security issues, as the various organizations apply their own rules and data structures. In such a scenario, protecting the privacy of each organization's data is important. In addition, the data mining tool needs to normalize the data sources in order to build proper matching models.
As an implementation of the proposed framework, we build a system for educational data mining using fuzzy trend line matching. The goal is to build a system that can help a new educational institution assess the quality of its education compared to other, older institutions. Data in such a system are mainly students' records, which are highly private and cannot be publicly shared. Therefore, a cross-organizational system that ensures the privacy of shared data while providing a platform for assessing quality is highly sought after. This paper seeks to achieve exactly this goal.
The rest of this paper is organized as follows. Section 2 discusses the related work. In Section 3, we present our cross-organizational framework. The case study and the corresponding model are presented in Section 4. Section 5 discusses the experimental results. Finally, Section 6 concludes the paper.

II. RELATED WORK
In this section, we briefly present some existing works related to our research. We first review works related to cross-organizational systems. We then evaluate works that focus on educational data mining.

A. Multi-Source Systems
To the best of our knowledge, little work has been done on building a decision-making system that involves multiple organizations. Rather, many researchers have focused on "multi-source" information systems [16]-[20]. For example, Poli et al. [16] discussed the possibility of integrating the percepts from multiple non-communicating observers as a means of achieving better joint perception and better decision making. Their approach combines brain-computer interface (BCI) technology with human behavioral responses.
Multi-source information does not necessarily come from distinct sources. In some cases, information can be categorized into distinct classes, where each class represents a different aspect of the information. Such a case can be found in the evaluation of the consumer decision process. In [17], the authors used an agent-based conceptual and computational model of consumer decision making based on culture, personality, and human needs. They used a five-factor model to formulate the utility function, to process and update the agent state, and to build recognition and action estimation modules for the consumer decision process.
Another way of utilizing a "multi-source" system is to adopt multiple layers of decision making, where each layer is responsible for deciphering different aspects [18]. In [18], the multilayer data mining approach focused on e-business activities to provide a high level of business intelligence for enterprises. A conversion model is used to create distinct layers of data mining structures, which act as the platform for applying the multilayer data mining models. Brodsky et al. [19] used the concept of decision-guidance management systems (DGMS) and proposed an initial data model followed by an integrated DGMS query language, DG-SQL. Their approach supports seamless integration of the construction of learning sets, learning, probabilistic prediction and simulation, and stochastic or deterministic optimization. Another work [20] integrated decision support systems with data mining and multiple criteria decision making.

B. Educational Data Mining Systems
The case study considered in this paper involves performance evaluation systems based on data from multiple educational organizations. Such a system can be broadly categorized as an Educational Data Mining System (EDMS). Therefore, in the following, we discuss some existing approaches to EDMSs. More details on EDMSs can be found in [19].
Many researchers have worked on EDMSs with a number of goals in mind. Some works focus on measuring student performance [6], [12], [13], while others target selecting the best educational and learning methods [10], [11], [14], [15]. In addition, as discussed in the previous section, some EDMSs utilize multi-source systems to serve their goals [7], [8].
A predictive data mining model to identify the difference between high and slow learners was proposed in [6]. Records of 300 students were used to construct a Bayes classification model. Guruler et al. [12] explored the factors that affect the success of university students. They used decision tree classification as the data mining technique to predict dropout and retention, with the aim of motivating engagement in learning activities and consequently increasing students' satisfaction. In [13], four types of predictive mathematical models (multiple linear regression, multilayer perceptron network, radial basis function network, and support vector machine) are used to predict students' academic performance in engineering disciplines.
As for the learning perspective of EDMSs, the impact of Learning Analytics (LA) on Educational Data Mining (EDM) was studied in [9], [10]. Bienkowski et al. [11] used adaptive learning systems for measuring objectives, methods, and knowledge discovery processes. Levy and Wilensky [14] discussed behavior modeling and students' exploratory actions with computer-based multi-agent models when their goal is to construct an equation. They suggested that engaging students in constructing symbolic representations may provide a bridge between frequently disconnected conceptual and mathematical forms of knowledge. Moridis and Economides [15] demonstrated how various kinds of evidence could be combined to optimize inferences about affective states during an online self-assessment test. They used a formula-based method to predict students' mood. The method was tested using data from experiments conducted with 153 high school students from three different regions of a European country.
Works that utilize multiple sources of information include the research carried out by Siemens et al. [7]. Here, EDM and Learning Analytics and Knowledge (LAK) are treated as two distinct research communities. Formal communication and collaboration between these two communities is advocated in order to share research, methods, and tools for data mining and analysis in the service of developing both the LAK and EDM fields. Romero et al. [8] used various web-based courses as learning content management systems for knowledge discovery.

III. A CROSS-ORGANIZATIONAL SYSTEM: THE FRAMEWORK
In this section, we present our framework in a more abstract form. A concrete implementation of the framework is discussed in the following sections. Therefore, the discussion in this section is less technical and addresses only the key issues in the framework.
The main components of the proposed framework are illustrated in Fig. 1. The whole process starts with a data fetching module that ensures data privacy, adequacy, and relevancy. Once enough data is available, the data is normalized (for example, by scaling course grades). In this phase, extreme data points are also identified and filtered. In the third phase of the proposed framework, data mining tools are used to build models that are later compared to infer results. The idea behind building models is to construct ideal-case (or best-case) scenarios for the different queries that a new organization may have. In the final module of the framework, data from the new (or enquiring) organization is compared with the models built in the previous phase to assess the performance of the organization. It is important to note that the first module of the framework consists of several sub-modules, one for each individual database. The reason for the sub-modules is that individual databases may have their own structures and privacy policies; hence, it might be difficult to devise uniform rules to screen out relevant data for the later stages of the framework. A more detailed illustration of the data fetching sub-modules of individual databases and their relationship with the data integration module of the framework is depicted in Fig. 2.

Fig. 2. Data normalization and integration modules in the framework.
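The normalization and filtering phase described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of a 0-100 target scale, and the use of a z-score cut-off to identify extreme data points are all assumptions made for illustration, and the marks are hypothetical.

```python
from statistics import mean, pstdev


def scale_to_percent(marks, max_mark):
    """Rescale raw marks to a common 0-100 scale (assumed target scale)."""
    if max_mark <= 0:
        raise ValueError("max_mark must be positive")
    return [100.0 * m / max_mark for m in marks]


def drop_extremes(values, z_limit=3.0):
    """Remove points lying more than z_limit standard deviations from the mean.

    The z-score rule is an assumed stand-in for the framework's
    extreme-point filter, which the text does not specify.
    """
    if len(values) < 2:
        return list(values)
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= z_limit * sigma]


# Hypothetical data: one institution grades out of 50, another out of 100.
inst_a = scale_to_percent([10, 25, 40, 50], max_mark=50)
inst_b = scale_to_percent([35, 60, 80, 95], max_mark=100)
pooled = drop_extremes(inst_a + inst_b, z_limit=3.0)
```

Scaling each source before pooling keeps the later model-building phase independent of each institution's grading scheme.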

IV. A CASE STUDY
The framework proposed in this paper is evaluated through a case study in educational data mining. In the framework described above, the "organizations" now refer to educational institutions. In this work, we use data collected from three different departments of the authors' current institution. Although the case study uses data from a single institution, the process described below is generic and can be readily applied to multiple institutions.
We have collected data on students' results in several courses from three different departments. For the purpose of privacy, the names of the departments and the corresponding courses are not mentioned here. Rather, we refer to the three departments as Dept.A, Dept.B, and Dept.C. Each course under study is referred to as Dept.X Sub.Y, where X ∈ {A, B, C} and Y > 0 is an integer. For example, Dept.B Sub.3 refers to the third course in department B. In this work, data from a total of 28 courses containing 585 students' records are analyzed. Each student's record is a tuple of the form <Dept.; Semester; Course; Student ID; Semester marks; Final Marks; Total Marks; Grade>. Table 1 shows some samples of the data used in this case study.
The overall process of the data mining tasks carried out in this work is outlined in Fig. 3. Raw data refers to the students' records stored as is in the Students Affairs database. The filtering module takes this data and applies operations such as removal of students' names/IDs (to ensure privacy), aggregation of data from several semesters for the same course, and normalization of data values. The data mining module in Fig. 3 refers to the machine learning operations carried out on the filtered data from the previous module. In this work, this module is responsible for generating fuzzy trend lines, as described later. Finally, the inference module takes various trend lines and compares them in order to derive decisions. The methodology for comparing trend lines is discussed later.
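The record format and the filtering step described above can be sketched in Python as follows. This is an illustrative sketch only: the class name `StudentRecord`, the function `filter_records`, and the sample records are hypothetical, and the filter shown covers only the removal of student IDs and the pooling of semesters for the same course.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StudentRecord:
    """One tuple <Dept.; Semester; Course; Student ID; Semester marks;
    Final Marks; Total Marks; Grade>."""
    dept: str
    semester: str
    course: str
    student_id: str
    semester_marks: float
    final_marks: float
    total_marks: float
    grade: str


def filter_records(records):
    """Drop student IDs and pool records of the same course across semesters.

    Returns a dict mapping (dept, course) to a list of
    (semester_marks, total_marks) pairs.
    """
    pooled = defaultdict(list)
    for r in records:
        # Only marks are kept; the student ID never leaves this function.
        pooled[(r.dept, r.course)].append((r.semester_marks, r.total_marks))
    return dict(pooled)


# Hypothetical records for one course taught in two semesters.
raw = [
    StudentRecord("Dept.A", "S1", "Sub.1", "id-001", 30, 25, 55, "D"),
    StudentRecord("Dept.A", "S2", "Sub.1", "id-002", 42, 29, 71, "C"),
    StudentRecord("Dept.B", "S1", "Sub.3", "id-003", 50, 35, 85, "A"),
]
filtered = filter_records(raw)
```

Keying the pooled data by department and course mirrors the Dept.X Sub.Y naming used in the case study.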

A. Problem Formulation
In this section, we describe the metric we evaluate through the data mining module. Suppose the filtered data for Dept.X contains information for $n$ courses, where $count_i$ refers to the number of records in course $i$, $i = 1, \ldots, n$. Let $Thr_i$ be the minimum mark (out of 100) that students need to obtain to pass the $i$-th course. Suppose $SG_{i,j}$ refers to the value of the semester mark in the $j$-th record of the $i$-th course.

Now, let $Prob_{i,t}$ be the probability that a student in the $i$-th course obtains a total grade $> Thr_i$, given that the student's semester grade is less than or equal to $t$. For example, $Prob_{i,40}$ refers to the probability that a student in course $i$ eventually passes the course when his or her semester grade is less than or equal to 40 (out of 100). Estimating this probability $Prob_{i,t}$ offers several benefits. As an example, a high value of $Prob_{i,t}$ for a low $t$ may indicate that the final exam was relatively easy, especially if the probabilities for the same group of students in their other courses are lower than in this particular course.

The probability $Prob_{i,t}$ can be estimated for a course as follows. Suppose $count_{i,t}$ is the number of students in course $i$ whose semester grade is less than or equal to $t$. Then, $Prob_{i,t}$ is estimated as the ratio of $count_{i,t}$ to $count_i$. Now, let $Prob_{X,t}$ be the probability that a student can pass any course in department X given that the student's semester grade is less than or equal to $t$. In the following section, we describe how $Prob_{X,t}$ can be formulated as a fuzzy number and how the values of $Prob_{X,t}$ for two different departments can be compared.
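The estimation step above can be illustrated with a small Python sketch on hypothetical data. Two quantities are shown under stated assumptions: `share_at_or_below` computes the ratio of $count_{i,t}$ to $count_i$ exactly as worded in the text, while `pass_rate_at_or_below` computes the conditional relative frequency suggested by the definition of $Prob_{i,t}$ (among records with semester mark at most $t$, the fraction whose total mark exceeds $Thr_i$). The second reading, both function names, and the sample marks are assumptions made for illustration.

```python
def share_at_or_below(semester_marks, t):
    """count_{i,t} / count_i: share of records with semester mark <= t."""
    if not semester_marks:
        raise ValueError("course has no records")
    count_it = sum(1 for sg in semester_marks if sg <= t)
    return count_it / len(semester_marks)


def pass_rate_at_or_below(records, t, thr):
    """Among records with semester mark <= t, the fraction with total > thr.

    records is a list of (semester_mark, total_mark) pairs.
    Returns None when no record has a semester mark <= t.
    """
    totals = [total for sg, total in records if sg <= t]
    if not totals:
        return None
    return sum(1 for total in totals if total > thr) / len(totals)


# Hypothetical course: (semester mark, total mark) per record, pass mark 60.
course = [(25, 48), (30, 62), (38, 70), (45, 81), (52, 90)]
share_40 = share_at_or_below([sg for sg, _ in course], t=40)
pass_40 = pass_rate_at_or_below(course, t=40, thr=60)
```

Returning `None` for an empty conditioning group avoids reporting a probability where no student falls at or below the threshold.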

B. Fuzzy Modeling and Trend Lines
The easiest and most straightforward approach to estimating $Prob_{X,t}$ is to treat it as the average value of all $Prob_{i,t}$. However, an average value is well known to destroy the trend in a set of values and thus is not a good representative. A better approach, as adopted in this work, is to represent $Prob_{X,t}$ as a fuzzy number that can capture some variability in students' performance across a number of courses over several semesters.

In modeling $Prob_{X,t}$, we describe its value as a fuzzy probabilistic measure. Fuzzy probabilistic measures are fuzzy sets whose membership functions are similar to those of fuzzy numbers characterized by possibility distributions.

We now describe how $Prob_{X,t}$ is calculated as a fuzzy probabilistic measure in this work. Suppose $Prob_{i,t}$ is evaluated for every $t$ up to $S_i$, where $S_i$ is the maximum semester grade that a student in department X can score in course $i$. Fig. 5 illustrates the functions $Prob_{i,t}$ for a number of courses of a single department. The dotted line in Fig. 5, calculated as the average of all $Prob_{i,t}$, is called a trend line and has a fuzzy description as explained below.

To see how the fuzzy value at a given semester grade $k$ is obtained, consider Fig. 6. The solid dot in Fig. 6 refers to the average value calculated as:

$$\frac{1}{n} \sum_{i=1}^{n} Prob_{i,k}$$

This is the central value of the fuzzy number described above. In our formulation, the two width values are taken to be equal and estimated as:

$$STDEV(Prob_{1,k}, \ldots, Prob_{n,k})$$

Here, STDEV(.) is the function that calculates the standard deviation. Therefore, $Prob_{X,k}$ is a fuzzy probability measure of the probability that a student in department X can pass any course in that department with a semester grade less than or equal to $k$. As for the two range values, they can be taken as zero for simplicity of calculation.
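The construction of a department-level fuzzy trend line can be sketched in Python as follows. The sketch assumes that each fuzzy value is summarized by a (center, width) pair, with the center being the average of the per-course probabilities at a given semester grade and the width being their standard deviation. The use of the sample standard deviation (`statistics.stdev`), the function name `fuzzy_trend_line`, and the input values are assumptions made for illustration.

```python
from statistics import mean, stdev


def fuzzy_trend_line(course_probs):
    """Build a department-level trend line from per-course probabilities.

    course_probs maps a course name to a dict {t: Prob_{i,t}}.
    Returns a dict {t: (center, width)} for thresholds t present in
    every course, where center is the mean over courses and width is
    the sample standard deviation over courses (assumed estimator).
    """
    if not course_probs:
        return {}
    shared = set.intersection(*(set(p) for p in course_probs.values()))
    trend = {}
    for t in sorted(shared):
        values = [p[t] for p in course_probs.values()]
        width = stdev(values) if len(values) > 1 else 0.0
        trend[t] = (mean(values), width)
    return trend


# Hypothetical per-course probabilities at semester grades 30 and 40.
dept_x = {
    "Sub.1": {30: 0.2, 40: 0.5},
    "Sub.2": {30: 0.4, 40: 0.7},
}
trend_x = fuzzy_trend_line(dept_x)
```

Restricting the trend line to thresholds shared by all courses keeps every center and width based on the same number of courses.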

C. Fuzzy Matching
In this section, we describe how trend lines can be compared and how inferences can be made. Fig. 7 plots two trend lines, $Prob_{A,t}$ and $Prob_{B,t}$, for two departments in the authors' current institution. Note that semester grades range from 20 to 60, since in these two departments the total of 100 marks for a course is divided into 60% for semester grades and 40% for the final exam. As shown in Fig. 7, students whose semester grades are generally low have a higher probability of succeeding in Dept. B than in Dept. A. This does not necessarily mean that students in Dept. A perform poorly, as the trend lines in Fig. 7 are both fuzzy and the differences visible in Fig. 7 may not be significant. Therefore, in the following, we describe how the two trend lines can be compared to determine whether students' performances in the two departments are comparable.

The comparison proceeds strip by strip along the semester-grade axis, where $w_{strip}$ is the width of a strip (which is 1 here) and $d_{strip}$ is the "fuzzy" difference between the two trend lines over that strip. Here, $d_{strip}$ is used as the dissimilarity measure for fuzzy probabilities of success.

It is possible to identify two zones in the trend lines: a highly motivational zone and a low motivational zone, as shown in Fig. 7. The existence of a low motivational zone may point to a group of students who do not exert sufficient effort to pass a course. Identifying such a group is essential for an institution, as improving the performance of these students can improve the overall performance of the institution.
For a newly established institution, trend lines provide a means through which a program can set its various parameters. For example, the difficulty level of the final exam can be monitored through trend lines and set to an appropriate level. Programs can even generate trend lines for individual courses and identify the courses in which students generally perform worse. In this way, a number of queries essential to an education department can be addressed, such as identifying the courses with the worst results or the courses whose performances are correlated.
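The strip-wise comparison of two trend lines from the Fuzzy Matching subsection can be sketched in Python as follows. The exact form of $d_{strip}$ is an assumption in this sketch: each fuzzy value is taken as a (center, width) pair, $d_{strip}$ is taken as the gap between the intervals [center - width, center + width] of the two values (zero when they overlap), and the overall dissimilarity is taken as the sum of $w_{strip} \cdot d_{strip}$. The function names and the trend-line values are hypothetical.

```python
def strip_difference(a, b):
    """Assumed d_strip for two fuzzy values given as (center, width).

    Returns the gap between the intervals [c - w, c + w] of the two
    values, or 0.0 when the intervals overlap.
    """
    (ca, wa), (cb, wb) = a, b
    return max(0.0, abs(ca - cb) - (wa + wb))


def trend_dissimilarity(trend_a, trend_b, strip_width=1.0):
    """Sum of strip_width * d_strip over thresholds found in both lines."""
    shared = sorted(set(trend_a) & set(trend_b))
    return sum(
        strip_width * strip_difference(trend_a[t], trend_b[t]) for t in shared
    )


# Hypothetical trend lines for two departments: {t: (center, width)}.
trend_a = {20: (0.30, 0.05), 21: (0.35, 0.05)}
trend_b = {20: (0.50, 0.05), 21: (0.40, 0.05)}
total = trend_dissimilarity(trend_a, trend_b)
```

Under this assumed measure, strips where the two fuzzy values overlap contribute nothing, so only differences that exceed the combined widths count toward the dissimilarity.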

VI. CONCLUSION AND FUTURE WORK
Nowadays, successfully establishing a new institution may require a cross-organizational information system for assessment. Decision support technology is also sought in order to handle decision-making problems. In this area, machine learning and data mining can play an important role in extracting relevant information. In this paper, we design a framework for a cross-organizational information system for assessment and decision making using machine learning, with an emphasis on the educational sector. In the proposed framework, organizations share information (even raw data) with each other, and machine learning tools are used to analyze the shared data in support of decision making for a particular organization. A framework like this can help new organizations benefit from the experience of other, older organizations and institutions. Such a knowledge-based machine learning system helps to improve the organizational capability of newly established institutions. In future work, we will implement this framework for multiple universities.

Fig. 1. Overall structure of the cross-organizational information system for inference, assessment and decision making.

Fig. 5. Estimation of fuzzy trend line: an example for a number of courses.

Fig. 6. Estimation of fuzzy trend line at the department level.

TABLE I. SAMPLES OF DATA USED IN THE CASE STUDY (STUDENTS' IDS ARE KEPT HIDDEN FOR PRIVACY)