Framework for Disease Outbreak Notification Systems with an Optimized Federation Layer

Data that is needed to detect outbreaks of known and unknown diseases is often gathered from sources that are scattered across many geographical locations. These scattered data often exist in a wide variety of formats, structures, and models. The collection, pre-processing, and analysis of these data to detect potential disease outbreaks is very challenging, time-consuming and error-prone. To fight disease outbreaks, healthcare practitioners, epidemiologists and researchers need to access the scattered data in a secure and timely manner. They also require a uniform and logical framework or methodology to access the relevant data. In this paper, the authors propose a federated framework for Disease Outbreak Notification Systems (DONSFed). The framework, which uses an advanced design and an XML technique patented in the US in 2016 by our team, was tested and validated as part of this work. The proposed approach enables healthcare professionals to quickly and uniformly access data that is required to detect potential disease outbreaks. This research focuses on implementing a cloud-based prototype as a proof-of-concept to demonstrate the functionalities and to verify the concept of the proposed framework.
Keywords: Disease outbreak notification system; database federation; web services; service oriented architecture; health systems


I. INTRODUCTION
World population growth is causing disease outbreaks to occur more frequently, and advances in transportation technology are making them spread faster and farther. As a result, fighting modern disease outbreaks demands minimum response time from relevant healthcare professionals. One way to minimize the response time of healthcare professionals is to build an efficient disease outbreak notification system (DONS). Building an efficient DONS has many challenges and has attracted many researchers [1], [2], [3], [4], [5]. Some of the main challenges are:
• DONS data often reside in data-sources located across many geographical, jurisdictional and organizational boundaries. Besides technical obstacles, collecting data from such diverse data-sources poses other challenges.
• DONS data can be huge [6]. Processing such a volume of data on time can be challenging.
• DONS data often exist in a wide variety of formats, structures, data models, and data types. Pre-processing such a variety of data can be time-consuming.
• Collecting data from heterogeneous data-sources is a complex operation. Some of these data-sources are databases while others can be as simple as webpages. These heterogeneous data-sources often require multiple interfaces, languages, and protocols.
• Arrival of the required data on time from the data-sources may not be guaranteed.
• Integrating, processing, and presenting the collected data in a beneficial way to healthcare professionals is challenging [7], [8].
To tackle the above-mentioned difficulties, researchers have proposed the following two approaches. In the first approach, researchers proposed programs that enable each data-source to share and integrate data with other data-sources. This approach requires each pair of data-sources to have a separate integration program, which makes adding a new data-source very costly. It also cannot simultaneously and seamlessly integrate data from multiple data-sources [9]. In the second approach, researchers proposed federated databases. However, this approach has a number of limitations [7], [8], [10], [11], [2], [1]. First, adding a new data-source to the federation is costly, and modifying any of the services offered by the federated database is time-consuming. In addition, this approach is slow in identifying potential disease outbreaks and requires local-to-global schema translation to resolve the data model heterogeneity among various data-sources. Furthermore, this approach's data-sources are limited to relational databases, and the federation needs to know the local schema of each data-source. Some data-sources may not disclose their local schema for security reasons.
Motivated by the above-mentioned challenges and limitations, this article proposes a framework called Federated System for Disease Outbreak Notification Systems (DONSFed), which is based on federated databases and web services technology. DONSFed is a federation of many data-sources. It is robust and scalable, and it does not interfere with the local operation of any of its data-sources. It only asks a data-source for data specific to potential disease outbreaks. It offers its data-sources the required security and autonomy. Unlike traditional federated databases, its data-sources are not limited to relational databases. It can include other types of data-sources such as Triplestore, XML, and NoSQL databases. DONSFed is data-store transparent. When a user enters a query, DONSFed breaks it into sub-queries and submits each sub-query to the relevant data-source. It then collects the result of each sub-query, aggregates the results, and delivers them to the user.
The rest of the article is organized as follows. Section II presents a summary of data integration techniques, while Section III discusses the proposed framework in detail. Section IV highlights our conclusions and envisions our directions for future work.

II. DATA INTEGRATION TAXONOMY
Data integration techniques can be classified into five categories as shown in Figure 1. The first technique is link integration [6], [12]. In this technique, the search begins from the first resource and follows hyperlinks to get related information. However, the drawbacks of this technique are the instability of hyperlinks, ambiguities, and vulnerability to naming conflicts [6]. The second technique is query-based integration [13]. Even though it allows the user to query and retrieve data from different sources with a single query, the query is complex and the technique lacks transparency of data location and integration for users.
In the data warehouse integration technique [14], [15], the system queries and retrieves data from different sources into a unified and central repository. The advantages of this technique are improved performance and increased data consistency. On the other hand, its disadvantages include the difficulty of keeping the central repository up to date, supporting scalability, and maintaining privacy.
Federated database integration provides uniform and central access to query and retrieve data [13]. This technique is more scalable and flexible than the previous techniques [16] since there is no need for a centralized repository. Hence, data replication is not required, which leads to enhanced data privacy and scalability support. This technique is utilized by many bioinformatics systems such as Entrez [17], BioMart [18] and EuPathDB [16].
In summary, the federated database and web services techniques are prominent due to their advantages, including minimizing interference with existing operations, managing heterogeneity, preserving the local autonomy of constituent systems, and supporting scalability. Combining these techniques could be the key to securing the advantages of both. This research combines federated database and web services integration techniques to build a DONS framework that connects different data-sources internally and introduces unified access to the data offered by these data-sources.

III. DONSFED FRAMEWORK
In this section, a framework for DONS is presented that consists of a federation of databases supported by web services. Our proposed framework, DONSFed, includes federation services and component web services. The framework, which uses an advanced design and an XML node-labeling technique [11] patented in the US in 2016 by our team, was tested and validated as part of this work. The framework allows the use of a portal to query databases in real-time. Such a query is usually split into pieces and then sent to the target component systems through web services. The query is then processed to retrieve the required data, and the results are aggregated and returned to the requesting entity. The administrators of the federated services system are empowered to design and implement the required federation services. The component systems' administrators ensure their systems are connected and available. The component systems must maintain high availability because the federated system relies mainly on them for responding to user queries. An abstraction layer that hides the major differences among the participating systems is necessary to make access consistent across the entire framework.
Thus, the DONSFed design consists of the following core elements: the framework layers, the framework workflow, and the environment setup. We have reviewed various approaches that ensure web services integration and offer substantial abstraction among the specific component systems that constitute the federation. Based on a detailed study and analysis of these approaches, we identified and categorized the web services and the required operations for each identified service in our framework. Each web service includes its description and specifies the input parameters that are needed to invoke its operations. A dedicated web service is available with every component that supports the connection to the portal. Moreover, many advanced features to support changes to the web service operations have been implemented in order to reduce the maintenance required.
A. Framework Architecture
Fig. 2 presents the DONSFed framework architecture, which consists of five layers, namely: DONS Federation, Adaptation, Component Systems, Query Processing, and Interface. In the DONS federation layer, the federated services connect to the different database systems that participate in the federation. The DONS federation layer consists of several federated services, with each service responsible for processing predefined requests upon demand. A query triggers the corresponding federated service, which may initiate selection of the available web services in the component systems layer.
The adaptation layer maintains an updated directory of web services available from each component database. It supports non-canonical databases, which do not provide web services natively. This is accomplished by generating web services in a compatible format. In addition, the adaptation layer takes care of the communication between the federation layer and the component systems.
The component systems layer supports heterogeneous data sources. These data sources may have native support for web services. If not, non-canonical data sources will work with [...]
The requested data is retrieved from various data sources in XML format and sent to the results aggregator module, which aggregates them into a global result in a suitable format to be delivered to the requesting application or user. To process XML data in XML data sources and to efficiently integrate and process XML data that is generated by the component databases, we developed an XML data labeling scheme called Dynamic XDAS. Nearly all the existing node labeling schemes are not update-friendly. We chose to use Dynamic XDAS because it is fast, dynamic, and requires less storage space. It is fast because it computes parent-child, ancestor-descendant, and sibling relationships between XML data using logical operators. It is dynamic because, unlike nearly all the existing schemes, relabeling of XML data is not required during updates. For example, in the popular Dewey node labeling system, insertion of a new sibling node between its siblings labeled n and n+1 is impossible without relabeling. In the worst case, the whole XML data in the corresponding data source must be relabeled. In Dynamic XDAS that is not required. Any node can be inserted without relabeling any other node. For example, as shown in Fig. 3, the sub-tree labeled 1, 011.01 (colored red) was inserted between the nodes labeled 1, 011.01.01 and 1, 011.011 without relabeling any existing node.
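The relabeling cost of Dewey labels mentioned above can be illustrated with a minimal Python sketch. It models only the conventional Dewey prefix scheme, not Dynamic XDAS, whose label encoding is not specified in this text. The function names and the sample tree are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Label = Tuple[int, ...]


def parse(label: str) -> Label:
    """Turn a Dewey label such as '1.2.1' into a tuple of integers."""
    return tuple(int(part) for part in label.split("."))


def is_ancestor(a: Label, d: Label) -> bool:
    """a is an ancestor of d when a is a proper prefix of d."""
    return len(a) < len(d) and d[: len(a)] == a


def is_parent(p: Label, c: Label) -> bool:
    """p is the parent of c when c extends p by exactly one component."""
    return len(c) == len(p) + 1 and c[:-1] == p


def are_siblings(x: Label, y: Label) -> bool:
    """Two distinct nodes are siblings when they share the same parent prefix."""
    return x != y and len(x) == len(y) and x[:-1] == y[:-1]


def insert_sibling_after(labels: List[Label], left: Label) -> Tuple[Label, int]:
    """Insert a new sibling directly after `left`.

    Every following sibling, together with its whole subtree, has to be
    shifted by one. Returns the new label and the number of relabeled nodes.
    """
    depth = len(left) - 1
    prefix = left[:depth]
    relabeled = 0
    for i, lab in enumerate(labels):
        if len(lab) > depth and lab[:depth] == prefix and lab[depth] > left[depth]:
            labels[i] = lab[:depth] + (lab[depth] + 1,) + lab[depth + 1:]
            relabeled += 1
    new_label = prefix + (left[depth] + 1,)
    labels.append(new_label)
    labels.sort()
    return new_label, relabeled


# Hypothetical tree: root 1 with children 1.1, 1.2 (which has child 1.2.1) and 1.3.
tree = [parse(s) for s in ["1", "1.1", "1.2", "1.2.1", "1.3"]]
new_label, relabeled = insert_sibling_after(tree, parse("1.1"))
print(new_label, relabeled)
```

In this small tree, inserting one node between 1.1 and 1.2 forces three existing labels to change; a scheme that avoids relabeling, as the text describes for Dynamic XDAS, would leave them untouched.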
The federated services are described using the Web Services Description Language (WSDL). The user queries are maintained in a natural language format as questions, and users choose among those questions. Users identify the disease or the category of the reported cases that they need to search and also provide the parameter values related to the selected question. The query planner module transforms the question into sub-queries. Web services follow the Representational State Transfer (REST) design. The DONS federation service is passed the web URL of the required web service with the necessary parameters to properly route the [...] The invocation of a RESTful web service with the required parameters determines which component system should be included. The participating component systems return data in XML format, and the DONS federation service parses the XML data and uses Dynamic XDAS to combine the results, held as an array of strings, into a single XML document. The result is returned to the user in the format they require, such as a tabular format with respective columns for each request, to ensure a semantically meaningful result.
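The combination of per-source XML responses into one document can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustrative sketch only: the element names (`federatedResult`, `source`, `case`, `disease`, `date`) and the sample payloads are assumptions, and it does not use Dynamic XDAS labeling.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def merge_results(responses: Dict[str, str]) -> ET.Element:
    """Wrap each component system's XML records under one federated root."""
    root = ET.Element("federatedResult")
    for source, xml_text in responses.items():
        component_root = ET.fromstring(xml_text)
        wrapper = ET.SubElement(root, "source", {"name": source})
        # Move the records of this component system under its wrapper element.
        wrapper.extend(list(component_root))
    return root


def to_rows(root: ET.Element) -> List[Tuple[str, str, str]]:
    """Flatten the merged document into (source, disease, date) rows."""
    return [
        (src.get("name"), case.findtext("disease"), case.findtext("date"))
        for src in root.findall("source")
        for case in src.findall("case")
    ]


# Hypothetical XML responses from two component systems.
responses = {
    "oracle": "<cases><case><disease>measles</disease>"
              "<date>2015-03-01</date></case></cases>",
    "mysql": "<cases><case><disease>measles</disease><date>2015-03-04</date></case>"
             "<case><disease>cholera</disease><date>2015-03-09</date></case></cases>",
}
merged = merge_results(responses)
rows = to_rows(merged)
for row in rows:
    print(row)
```

Keeping the source name as an attribute on each wrapper element lets the tabular output show which component system contributed each record.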
The user interface layer provides an interface to the authentication service for logging in to the portal. The authentication service is used not only to verify the user but also to grant authorization to all required federation services. The resources across the network can be accessed based on the role identified during authentication, which includes roles such as applications, administrators, advanced-users or end-users. The portal is designed to allow users or applications to select from several categories that contain a set of questions. Users can use the predefined question templates to select their queries to the system and provide the needed parameters. The query service will process and decompose the user query into a set of sub-queries. The resulting sub-queries are then delivered to the component systems through the DONS federation layer using the appropriate web service.
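How a predefined question template and its parameters might be turned into one sub-query per component system can be sketched as follows. The template text, the operation name `registeredCases`, the parameter names, and the endpoint URLs are all hypothetical; the text does not specify the actual templates or service addresses.

```python
from dataclasses import dataclass
from typing import Dict, Tuple
from urllib.parse import urlencode


@dataclass(frozen=True)
class QuestionTemplate:
    text: str                  # natural-language question shown in the portal
    operation: str             # web service operation that answers it
    required: Tuple[str, ...]  # parameters the user must supply


def build_sub_queries(
    template: QuestionTemplate,
    params: Dict[str, str],
    endpoints: Dict[str, str],
) -> Dict[str, str]:
    """Return one RESTful request URL per component system."""
    missing = [name for name in template.required if name not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    query = urlencode({name: params[name] for name in template.required})
    return {
        source: f"{base}/{template.operation}?{query}"
        for source, base in endpoints.items()
    }


# Hypothetical template and endpoints, for illustration only.
TEMPLATE = QuestionTemplate(
    text="How many cases of {disease} were registered between {start} and {end}?",
    operation="registeredCases",
    required=("disease", "start", "end"),
)
ENDPOINTS = {
    "oracle": "https://oracle.dons.example/services",
    "mysql": "https://mysql.dons.example/services",
}

params = {"disease": "measles", "start": "2015-01-01", "end": "2015-12-31"}
question = TEMPLATE.text.format(**params)
sub_queries = build_sub_queries(TEMPLATE, params, ENDPOINTS)
print(question)
for source, url in sub_queries.items():
    print(source, url)
```

Validating the parameters before any URL is built means an incomplete question is rejected at the portal rather than at a component system.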

B. Framework Workflow
In this section, the workflow that is initiated by a user through the submission of a query into DONSFed is presented. The term workflow, by our definition here, is a set of steps that outline the interactions between a user and the system. The workflow ensures the processing and return of the required results of the user inquiries.
The proposed framework has been designed to return the results of a distributed query in real-time. As mentioned in the previous section, each component system participating in DONSFed has web services that can be used to execute one or more questions (numbered Q 1 to Q n ) generated using a question template. The portal interface consists of a set of federated services that are designed and deployed by DONSFed administrators. Each federated service is defined as a set of questions that can be selected as workload by the end user, an application, or an administrator. The selected federated service lists the instructions on how to map the selected queries (Q i ) to the various web services through which the data is retrieved. The framework is highly flexible and can adapt to the demands of new heterogeneous and distributed systems. These systems can join DONSFed by configuring the set of questions and deploying the required web services. Fig. 4 illustrates the workflow approach in practice:
1) the authorized user identifies and selects a specific federation service from an available pool of federation services by accessing the portal;
2) the federation service generates the consolidated question based on the parameters identified by the user;
3) the query module decomposes the consolidated query into sub-queries by mapping each sub-query to one of the questions in the consolidated user-developed question and returns a batch of queries to the federation service;
4) the federated service then invokes the web services of each component system linked to the sub-query;
5) each web service generates its results and delivers them to the aggregator for a consolidated output;
6) the aggregated results are routed to the local server;
7) the results are displayed to the user in a tabular format as an HTML page.
Fig. 5 compares the execution of a request that generates multiple sub-queries with a straightforward single request. As illustrated, the partitioning, routing and merging of a complex query that fetches data from the component systems in parallel are executed with considerable ease. The design of the DONSFed framework resolves two major issues that are routinely encountered in a database federation environment. The first issue is autonomy: providing autonomy to the participating systems, with adequate provisions for maintaining it, is a major challenge for architectures such as ours. The DONSFed services mitigate this issue by applying a sufficiently strong abstraction layer for the affected operations. Furthermore, the DONSFed design ensures that changes to the services layer are rare, which keeps maintenance low. In order to maintain the autonomy of the participating systems, the DONSFed service does not require control over the connected components.
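Steps 4) to 6) of the workflow, where the federated service invokes the component web services and the aggregator consolidates their output, can be sketched as below. The component services are in-memory stubs standing in for remote calls, and all names and records are hypothetical; the sketch assumes the sub-queries are independent and can run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Service = Callable[[dict], List[dict]]


def invoke_all(services: Dict[str, Service], params: dict) -> Dict[str, List[dict]]:
    """Invoke every component web service in parallel with the same parameters."""
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        futures = {name: pool.submit(svc, params) for name, svc in services.items()}
        return {name: future.result() for name, future in futures.items()}


def aggregate(partials: Dict[str, List[dict]]) -> List[dict]:
    """Merge the partial results, tagging each record with its data source."""
    merged = []
    for source, records in partials.items():
        for record in records:
            merged.append({"source": source, **record})
    return merged


def make_stub(records: List[dict]) -> Service:
    """Build an in-memory stand-in for a component system's web service."""
    def service(params: dict) -> List[dict]:
        return [r for r in records if r["disease"] == params["disease"]]
    return service


# Hypothetical component systems and records.
services = {
    "oracle": make_stub([{"disease": "measles", "cases": 2},
                         {"disease": "cholera", "cases": 1}]),
    "mysql": make_stub([{"disease": "measles", "cases": 5}]),
}
result = aggregate(invoke_all(services, {"disease": "measles"}))
print(result)
```

Because each component system is reached only through its service callable, a new data source can be added by registering one more entry, which mirrors the autonomy argument made above.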
The second issue is the support for heterogeneous data sources that participate from the component repositories. The web services approach allows an abstraction layer that, in turn, supports structural heterogeneity. Heterogeneity in the data tier is generally considered to be a difficult issue to resolve. However, in the DONSFed framework, it is not a major problem since the component systems yield mostly similar types of data for diseases, cases and outbreaks. DONSFed addresses the issues of data heterogeneity and data matching thoroughly, thereby reducing the need for the component systems to modify their data sources. Further, DONSFed encourages the use of similar naming conventions across the network. The federation layer, optimized using our patented XML technique [11] and web services, makes DONSFed a highly scalable and efficient framework. The scalability of the proposed framework is supported by its building blocks: a flexible and optimized federation layer with a patented design, RESTful web services, and enforcement of standards across the network based on best practices. A new component system joining DONSFed needs to design and deploy the required web services that adhere to the framework guidelines. The DONSFed administrators then take the necessary actions to add the component system to the federation layer.

C. Prototype Deployment Architecture
In this section, the prototype architecture is described in detail with respect to heterogeneous federated databases and web services that are used to validate the proof-of-concept implementation.
The prototype is a cloud-based and geographically spread implementation that spans multiple heterogeneous platforms across three tiers. The first tier is the presentation tier, which represents the user interface. Typically, this involves the use of a browser-based graphical user interface for smart client interaction. As shown in Fig. 6, the DONSFed browser-based interfaces for data entry, data aggregation, and data integration aid the main stakeholders, including primary health centers, experts and healthcare practitioners, in operational and decision-making roles. The external databases such as World Health Organization (WHO) databases and others may also [...]
The second tier is the application and logic tier, where federated services are built to address the functional specifications in terms of federated queries and services based on the stakeholder requirements. Finally, the data tier consists of various heterogeneous database servers. This tier can be accessed through the business services layer and on occasion by the user services layer. Here, information is stored and retrieved; this tier therefore keeps data neutral and independent from application servers or business logic while improving scalability and performance.
The different tiers communicate amongst themselves through standard interfaces and protocols. Incoming HTTP requests from users are first sent to the DNS server, where the load balancer routes the requests to the web servers with the least load. Web servers directly interact with the appropriate application server to process the requests and receive a proper response. In the implementation, the different component systems were deployed with each one hosted on a different virtual machine in a cloud setup using web services middleware in a service-oriented architecture design. Fig. 7 illustrates the high-level view that visualizes the hardware, the middleware and the software used in the prototype implementation as a proof-of-concept deployment. The deployed model consists of multiple tiers including the application and data tier components such as web servers, clients, data sources, and integration links.

D. Prototype Data Tier
In the data tier of the prototype implementation, three autonomous, heterogeneous and distributed databases are connected. These databases were selected based on diverse geographical locations, and their database repositories were migrated to our cloud platform. These databases, with different schemas and semantics, were evaluated as suitable for testing the proposed federation framework. The first database that formed part of the prototype deployment is the KSA DONS system, which is an Oracle cloud-based database [26]. The KSA DONS database server sits on a virtual machine in our university private cloud, called KLOUD (KFUPM Cloud), with Red Hat Linux 6.4 as its operating system. The Oracle server and client software were configured on all the servers and clients in the KSA DONS architecture. This configuration helped in establishing communication amongst all components of the KSA DONS system, including the database server. As shown in Fig. 8, the database schema consists of 19 tables along with stored procedures, triggers, and views.
The second database is a MySQL database from the CASE system in Sweden. This system was developed at the Swedish Institute for Communicable Disease Control (SMI). The system acquires data from the database that collects notifiable diseases in Sweden (SmiNet). The system is currently active and performs daily surveillance. It is open-source software, and its data carries no personal identification of patients. The available data includes selected variables from the CASE database [4]. The CASE database schema is illustrated in Fig. 9.
In order to further validate our approach, which spans a federated database, constituent and actively participating systems, and integration using web services, an additional data source was added. The third database again sources its data from the CASE database. The entire database was successfully migrated with all the associated objects, including the database schema, stored procedures, and triggers. The migration to the Microsoft SQL Server database platform was performed in order to add further heterogeneity to the proposed deployment model. The tools used for migration included the SQL Server Migration Assistant (SSMA) utility. The SSMA, which has built-in migration support, aided in the migration of database objects and data from our source MySQL database. The process involved configuring project-level options to convert objects, accurately map source data types to target data types, migrate the data, and ensure all configuration options are compatible with the proposed framework specifications. The migrated database schema consists of 12 base tables and 8 data views with correctly mapped primary keys and indexes. The stored procedures and triggers were also migrated. The DONS database schema on the SQL Server platform is presented in Fig. 10.

E. Prototype Presentation Tier
A cloud-based system with geographically spread component DONS was developed, consisting of heterogeneous application and data layers that communicate with the DONSFed federation layer. A portal interface allows users to connect to DONSFed. Typically, the user connects using a browser-based graphical user interface. The DONSFed interface layer is the presentation tier for data entry, aggregation and integration; as shown in Fig. 2, it helps the major stakeholders such as primary health centers, healthcare consultants and practitioners to interact with the system. There is provision to connect external databases such as the WHO database directly, through an interface in this layer, for data transmission and retrieval. In the application and logic tier, we maintain the federated services that are designed according to the functional requirements and specifications to support federated queries and services. As mentioned earlier, the data tier consists of heterogeneous database servers participating in DONSFed. If needed, the data tier can be accessed directly using web services. This tier keeps data independent and neutral from application servers or business logic.
As part of the prototype implementation, the authors deployed several web services using a data services server that connects to heterogeneous databases through a service-oriented architecture and offers uniform access to autonomous and heterogeneous data sources. Using data masking techniques, the heterogeneity between the data sources, including databases, spreadsheets, or files, is hidden. The web services supported include SOAP and RESTful services. A web service that originates from a DONS federation service and is connected to an Oracle database is shown in Fig. 11. The service supports several operations using a WSO2 data services server. This service generates a request in XML format through a request window. After proper parameters are supplied, it delivers the results in XML format as shown in Fig. 11.
The DONSFed portal offers quick and easy access to users by providing links to specific component database sites and to the federated services. From the portal page, a user can query to determine which disease constitutes an outbreak. The detection can be queried based on time and location and restricted to registered cases from all component databases. The second web service, which originates from a DONS federation service and is connected to an SQL Server database, is shown in Fig. 12. The third web service, which originates from a DONS federation service and is connected to a MySQL database, is shown in Fig. 13.
All the results are collected as datasets and formatted into a tabular representation. Fig. 14 shows a federation service that collects the registered cases on all component databases based on a specified date range, which is defined as a parameter to that service. The aggregator service module receives and parses the XML output and generates tabular results as shown in the figure. In this particular result, the output presents the number of cases found in each of the participating data sources with the cumulative total.
The HTML output further provides a drill-down feature where the user can click on the active hyperlinks to explore the data from each data source. Fig. 14 presents the results of a query as follows: the MySQL database produced 31 cases, the SQL Server database listed 28 cases, and the Oracle database came up with 37 cases. Several other federation services that the authors have tested were implemented in a similar fashion.
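The tabular result with a cumulative total can be reproduced with a short sketch using the per-source counts reported above (31, 28, and 37 cases, which sum to 96). The function name and the plain-text layout are illustrative assumptions; the prototype itself renders an HTML page.

```python
from typing import Dict, List


def case_table(counts: Dict[str, int]) -> List[str]:
    """Format per-source case counts plus a cumulative total as text rows."""
    total = sum(counts.values())
    rows = list(counts.items()) + [("Total", total)]
    width = max(len("Data source"), *(len(name) for name, _ in rows))
    lines = [f"{'Data source':<{width}}  Cases"]
    lines += [f"{name:<{width}}  {count:>5}" for name, count in rows]
    return lines


# Counts as reported for the query shown in Fig. 14.
counts = {"MySQL": 31, "SQL Server": 28, "Oracle": 37}
table = case_table(counts)
print("\n".join(table))
```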

IV. CONCLUSION AND FUTURE WORK
The proposed approach to the design of the framework has proved successful. The advanced design and patented XML technique ensured that the proposed framework for disease outbreak notification systems is unique. The use of web services for implementing database federation has ensured that the components of the federated system can be added and removed without any impact on the overall federation system.

Fig. 3. An Example of Insertion Process in Dynamic XDAS