Enterprise Architecture Model that Enables to Search for Patterns of Statistical Information

Enterprise architecture is the stem from which the development of any departmental information system should grow and around which it should revolve. In this paper, a fragment of an enterprise architecture model is built using the ArchiMate language. This fragment enables searching for information in enterprises that do not work in productive industry. Such enterprises include official statistics. The proposed model embraces all three architectural levels of the corresponding information systems, namely OLTP, OLAP, and Data Mining. In particular, the latter level enables searching for patterns of statistical information.


I. INTRODUCTION
Architectural issues that arise during the development of complex information systems are as important as those arising during the construction of an original building. Correct architectural decisions considerably lower the risks of the whole project of system development and maintenance, because they make it possible to use the existing infrastructure efficiently and to plan its further evolution optimally. At first, the term "architecture" was used in the field of IT only in relation to hardware. Later on, the term was applied to information systems as a whole. Only with the lapse of time did it become clear that a systems approach must be applied not only to developing the information system but also to developing the whole enterprise. As a result, the term "enterprise architecture" emerged. It was first used in the report [1]. This terminology assumes a relatively wide treatment of the concept of "enterprise." In particular, the concept can be applied to the architecture of state organizations and offices.
Among the various definitions of enterprise architecture, we will primarily use the one proposed by the Global Enterprise Architecture Organization (GEAO): "The way in which an enterprise vision is expressed in the structure and dynamics of an enterprise. It provides, on various architecture abstraction levels, a coherent set of models, principles, guidelines, and policies, used for the translation, alignment, and evolution of the systems that exist within the scope and context of an Enterprise" [2, p. 7].
In general, when describing enterprise architecture, the following principles are frequently used:
- The level of architecture refinement is chosen in such a way that the amount of information about a certain component is minimized; anything irrelevant to the interconnection with other architecture components is omitted;
- The architecture definition must not contain descriptions of the components themselves.
Usually, four main layers [3,4] of enterprise architecture are distinguished. They are presented in Table 1. Such enterprises as official statistics, i.e., those not working in productive industry, obviously need special architectural features. In particular, statistics deals with processing and searching not for physical resources but for information, searching being arguably its main activity.
The aim of this paper is to build an enterprise architecture that enables searching for statistical information, including searching for patterns in concealed data distribution features.
The rest of the paper is organized as follows. Section II describes three possible levels of information systems in official statistics. Section III reviews past research in the area of architecture description languages for enterprise architecture. Section IV details the proposed architecture and its considerations for, in particular, searching for patterns of statistical information. Section V concludes the work.
www.ijarai.thesai.org

II. THREE LEVELS OF INFORMATION SYSTEMS IN STATISTICS
From the point of view of classifying applications for processing data in statistical information systems, three principally different types (levels) may be distinguished.

1) Online Transaction Processing (OLTP) systems [5] which ensure basic functionality such as entering the data and results of the appropriate statistical observations and surveys, their structured storage (usually by means of a DBMS) and accounting, control of primary and aggregated data, and dissemination of results using various predefined output tables.

2) Online Analytical Processing (OLAP) systems [6] which allow creating tables that are not predefined and carrying out other analytical research on statistical data, including searching for their distribution patterns. An example of such systems is the system developed and implemented under the direction of the author for multidimensional analysis of the 2001 All-Ukrainian population census data.
With this OLAP system, the user gained a natural and comprehensive data model arranged in three multidimensional cubes, namely for the respondents, for the households, and for the administrative and territorial units of Ukraine. The cubes' dimensions were data features whose intersections made it possible to obtain, filter, group, and represent information. For instance, the simplest cube, the one for the administrative and territorial units, had such dimensions as area, population size, urban or rural type, center of population type, and predominant nationality. The cube for the respondents had 60 dimensions, and the cube for the households had more than 20 dimensions.
A measure defines what information is provided by the cube. For instance, the number of respondents can be a measure, and "native language," "marital status," "center of population," etc. can be dimensions. Each cube cell contains the number of respondents with particular features. The user analyzing information from such a cube can "slice" it across different dimensions, obtain aggregated or, on the contrary, detailed findings, etc.
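The relationship between dimensions, the measure, and slicing can be illustrated with a minimal Python sketch. It is not the census system described above: the records, the field names, and the helper functions `build_cube`, `slice_cube`, and `roll_up` are hypothetical and serve only to show how a cell count arises from the intersection of dimension values.

```python
from collections import Counter

# Hypothetical respondent microdata; field names and values are illustrative only.
RESPONDENTS = [
    {"native_language": "Ukrainian", "marital_status": "married", "settlement": "urban"},
    {"native_language": "Ukrainian", "marital_status": "single", "settlement": "rural"},
    {"native_language": "Russian", "marital_status": "married", "settlement": "urban"},
    {"native_language": "Ukrainian", "marital_status": "married", "settlement": "urban"},
    {"native_language": "Russian", "marital_status": "single", "settlement": "urban"},
]

DIMENSIONS = ("native_language", "marital_status", "settlement")


def build_cube(records, dimensions):
    """Count respondents (the measure) per combination of dimension values."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)


def slice_cube(cube, dimensions, **fixed):
    """Keep only the cells whose coordinates match the fixed dimension values."""
    idx = {dimensions.index(name): value for name, value in fixed.items()}
    return Counter({cell: n for cell, n in cube.items()
                    if all(cell[i] == v for i, v in idx.items())})


def roll_up(cube, dimensions, keep):
    """Aggregate the measure over every dimension not listed in `keep`."""
    positions = [dimensions.index(name) for name in keep]
    result = Counter()
    for cell, n in cube.items():
        result[tuple(cell[i] for i in positions)] += n
    return result


cube = build_cube(RESPONDENTS, DIMENSIONS)
urban = slice_cube(cube, DIMENSIONS, settlement="urban")
by_language = roll_up(urban, DIMENSIONS, keep=["native_language"])
print(dict(by_language))
```

Here the slice fixes one dimension value (urban settlements), and the roll-up then yields the aggregated view by native language; dropping the roll-up step gives the detailed cells instead.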

3) Data Mining systems which perform the most cumbersome and routine analytical operation of searching for concealed patterns that might exist in the analyzed data.
Thus, in contrast to the architectural decisions of existing demographic data processing systems (in the US [7], France [8], Russia, etc.), which are based on the OLTP and OLAP levels only, we propose to add a new level of Data Mining. The discussed levels differ substantially; their comparative analysis is presented in Table 2.
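To make the idea of searching for patterns that are not specified beforehand more concrete, the following Python sketch scans all pairs of attribute values in a small set of records and reports those that co-occur more often than independence would suggest. The technique used here, pairwise support and lift, is chosen only for illustration and is not claimed to be the method of the proposed system; the records, thresholds, and the function `find_patterns` are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical microdata; attribute names and values are illustrative only.
RECORDS = [
    {"settlement": "urban", "education": "higher", "employed": "yes"},
    {"settlement": "urban", "education": "higher", "employed": "yes"},
    {"settlement": "urban", "education": "secondary", "employed": "yes"},
    {"settlement": "rural", "education": "secondary", "employed": "no"},
    {"settlement": "rural", "education": "secondary", "employed": "no"},
    {"settlement": "rural", "education": "higher", "employed": "yes"},
]


def find_patterns(records, min_support=0.3, min_lift=1.2):
    """Return (item_a, item_b, support, lift) for co-occurring attribute values."""
    n = len(records)
    single = Counter()
    pair = Counter()
    for record in records:
        items = sorted(record.items())          # (attribute, value) pairs
        single.update(items)
        pair.update(combinations(items, 2))     # every pair within one record
    patterns = []
    for (a, b), count in pair.items():
        support = count / n
        # Lift > 1 means the two values co-occur more often than if independent.
        lift = support / ((single[a] / n) * (single[b] / n))
        if support >= min_support and lift >= min_lift:
            patterns.append((a, b, round(support, 3), round(lift, 3)))
    return sorted(patterns, key=lambda p: -p[3])


for a, b, support, lift in find_patterns(RECORDS):
    print(a, b, support, lift)
```

No pattern is named in advance: the user supplies only thresholds, and the candidate pairs are enumerated from the data, which is the distinction from the OLTP and OLAP levels drawn above.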

III. ARCHITECTURE DESCRIPTION LANGUAGES
According to the monograph [9], languages used to describe enterprise architecture can be divided into two large groups:
- Universal languages such as the IDEF language family [10,11], BPMN (Business Process Modeling Notation) [12], ARIS (Architecture of Integrated Information Systems) [13], UML (Unified Modeling Language) [14], and others;
- Architecture description languages (ADL).
These two groups are complementary rather than interchangeable. While inferior to universal modeling languages with respect to the detailed description of processes, ADLs have a natural advantage in describing the architectural features of the modeled objects [15]. ISO/IEC/IEEE Standard 42010:2011 [16] provides a rather general definition of an architecture description language as "any form of expression for use in architecture descriptions." It also contains relatively mild criteria for labeling certain modeling languages as architecture description languages. So nowadays the single term, Architecture Description Language, is used for modeling languages of different classes:

1) (Software engineering) software architecture description languages [17], such as ACME [18] and Wright [19] developed at Carnegie Mellon University, Darwin [20], AADL [21], etc.;
2) (Enterprise activity modeling) enterprise architecture description languages, the best known among them being ArchiMate [9], DEMO [22], and ABACUS [23].
When used for describing enterprise architectures, virtually all the modeling languages from the first class have the following disadvantages [9, p. 38]:
- Interconnections between different levels (representations) of a model are ill-defined; models created with different representations cannot be easily integrated in future;
- Language semantics is not "transparent";
- There are restrictions on describing the architecture of either the business or the technological (infrastructure) level of a model.
Thus, to describe an enterprise architecture model that enables searching for statistical information and patterns in its distribution, we will use the specialized language ArchiMate, which was standardized in 2008 by the Open Group consortium.

IV. PROPOSED ARCHITECTURE
In [24], the concept of on-line analytical mining (OLAM) systems was proposed. Its main idea lies in creating specialized OLAP systems that enable optimal searching for certain predefined patterns [25].
Such systems have become widely used in criminology (to find relations between crimes and known delinquents who could potentially have committed them [26]), in medicine (to search for correlations between groups of people with a certain missing part of the Y chromosome and the clinical presentation of infertility [27]), and so on. However, in the general case discussed in this paper, when the structure and elements of the searched-for patterns are not known beforehand, creating OLAM systems is an unacceptable approach.
Therefore, we will build an enterprise architecture model assuming that we have all three separate but complementary levels, namely OLTP, OLAP, and Data Mining.
The model described in ArchiMate consists of three interrelated levels of view, namely the business level, the application level, and the technological (infrastructure) level.
To build the required model, at its business level we will formally describe the process of searching for information. At the other two levels, we will represent the main software systems and servers, respectively, used for implementing the described process (Fig. 1). To simplify the notation, we do not show ancillary services such as user authentication, backup and restore, contextual access to referenced data, local network support, etc. To fit the whole architecture model on one page, a detailed description of the business functions is presented separately in Fig. 2-4.
In Fig. 1, bold vertical dotted lines additionally distinguish the levels of the statistical data processing system architecture singled out in Section II. The built model illustrates the complementary capacities of OLTP, OLAP, and Data Mining systems when searching for statistical information.
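The two-way structure of the model (three architectural levels of view crossed with three data processing levels) can be sketched as a small Python data structure. This is only an illustration of the arrangement described in the text, not a transcription of Fig. 1: the element names, the `Model` class, and the simplified "serves" relation label are hypothetical.

```python
from dataclasses import dataclass, field

LAYERS = ("business", "application", "technology")   # levels of view
LEVELS = ("OLTP", "OLAP", "Data Mining")              # processing levels (Section II)


@dataclass(frozen=True)
class Element:
    name: str
    layer: str   # one of LAYERS
    level: str   # one of LEVELS


@dataclass
class Model:
    elements: list = field(default_factory=list)
    relations: list = field(default_factory=list)  # (source, kind, target)

    def add(self, name, layer, level):
        if layer not in LAYERS or level not in LEVELS:
            raise ValueError(f"unknown layer or level for {name!r}")
        element = Element(name, layer, level)
        self.elements.append(element)
        return element

    def relate(self, source, kind, target):
        self.relations.append((source, kind, target))

    def view(self, level):
        """Elements of one processing level, grouped by architectural layer."""
        return {layer: [e.name for e in self.elements
                        if e.level == level and e.layer == layer]
                for layer in LAYERS}


model = Model()
for level in LEVELS:
    # Placeholder names; the actual elements are those shown in Fig. 1.
    function = model.add(f"{level} search function", "business", level)
    system = model.add(f"{level} system", "application", level)
    server = model.add(f"{level} server", "technology", level)
    # Each lower layer supports the layer above it.
    model.relate(system, "serves", function)
    model.relate(server, "serves", system)

print(model.view("OLAP"))
```

Selecting a view by processing level corresponds to reading one of the vertical bands separated by the dotted lines in Fig. 1, while the layer keys correspond to the horizontal business, application, and infrastructure levels.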

V. CONCLUSION
In this paper, an enterprise architecture model that enables searching for statistical information is built. In contrast to existing models, the proposed one embraces all three possible architectural levels of the corresponding information systems, namely OLTP, OLAP, and Data Mining. The first level allows searching for the required information in output tables or by means of queries that are not predefined. The second level allows searching using ad hoc queries. The third level implies preparing microfiles and searching for patterns in concealed data distribution features.
The built model will be used for developing an information system for processing the data of the All-Ukrainian population census to be held this year.

Fig. 1. General architectural model that enables searching for patterns of statistical information.

Fig. 2. OLTP business functions for pattern search of statistical information.

Fig. 3. OLAP business function for pattern search of statistical information.

Fig. 4. Data Mining business function for pattern search of statistical information.

Examples of the OLTP systems described in Section II are the systems developed and implemented under the direction of the author for processing the 2001 All-Ukrainian population census data and the 2004 Moldova population census data. OLTP systems allow computing calculation indicators (such as living area per household member), creating predefined output tables and reports, and processing queries that are not predefined. In other words, the representative capacities of such systems are rather limited. Thus, if a required output table was not specified at the system design phase, obtaining the appropriate data becomes practically infeasible, because getting the value of each cell or row of a new table requires a separate query.
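The calculation indicator and the predefined output table mentioned for OLTP systems can be illustrated with a short Python sketch. The household records, field names, region names, and both functions are hypothetical; the sketch only shows that such a table is fixed at design time and answers one predetermined question.

```python
from collections import defaultdict

# Hypothetical household records; field names and values are illustrative only.
HOUSEHOLDS = [
    {"region": "Kyiv", "members": 3, "living_area_m2": 54.0},
    {"region": "Kyiv", "members": 2, "living_area_m2": 40.0},
    {"region": "Lviv", "members": 4, "living_area_m2": 60.0},
]


def living_area_per_member(household):
    """Calculation indicator: living area per one household member."""
    return household["living_area_m2"] / household["members"]


def predefined_table(households):
    """Predefined output table: living area per member, aggregated by region."""
    area = defaultdict(float)
    members = defaultdict(int)
    for h in households:
        area[h["region"]] += h["living_area_m2"]
        members[h["region"]] += h["members"]
    return {region: round(area[region] / members[region], 2) for region in area}


print(predefined_table(HOUSEHOLDS))
```

A table grouped by any other feature would need a new function, or a separate query per cell, which is the limitation that motivates the OLAP level.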