Intelligent Data Aggregation Framework for Resource Constrained Remote Internet of Things Applications

The Internet of Things (IoT) is a technology that can connect everything to the Internet. IoT can be used in a wide range of applications, including remote applications such as underwater networks. Remote applications involve the deployment of several low-power, low-cost interconnected sensor nodes in a specific region. With a massive number of devices connected to the IoT and the considerable amount of data associated with them, concerns about data management remain. In addition, the amount of data generated in an extensive IoT-based remote sensing network is usually too large for the servers to process, and much of it is redundant. Hence there is a need for a framework that addresses both data aggregation and security-related issues at the various aggregation points. In this paper, we propose an intelligent data aggregation mechanism for IoT-based remote sensing networks. The method avoids redundant data transmission by adopting spatial aggregation techniques. The proposed method was tested through simulations, and the results demonstrate its efficiency.
Keywords: Wireless sensor networks; Internet of Things; intelligent boundary determination; sensor nodes; data aggregation


I. INTRODUCTION
The Internet of Things refers to connecting various things to the Internet. IoT can be used in a wide range of applications, including remote ones. Remote IoT applications include military applications such as surveillance and battlefield monitoring, underwater applications such as oil spill detection and analysis of the life cycle of aquatic animals, smart farming, and forest applications such as fire monitoring. Remote IoT applications use resource-constrained networks. Such a network consists of several low-cost, low-power sensor nodes that can sense, process, and communicate data. Each sensor node has a limited transmission range, limited energy, limited processing capabilities, and limited memory.
Because sensor nodes are resource-constrained devices, several of them are deployed within a small region. This leads to the generation of redundant data, and transmitting redundant data wastes energy and other resources. Redundant data transmission can be controlled by data aggregation techniques. There are different data aggregation techniques, such as in-network, tree-based, cluster-based, grid-based, and hybrid data aggregation. Data aggregation can have a single level or multiple levels, and it can be performed in homogeneous or heterogeneous networks. In a homogeneous network, all nodes have similar configurations and capabilities. In a heterogeneous network, nodes with different configurations and capabilities are organized in multiple levels of hierarchy [15] [17].
Data aggregation simply combines the data from multiple sources into a single packet and forwards it to the next node or the destination. Traditional approaches aggregate the data using functions such as count, average, and sum. The reduction in redundant data achieved by these conventional techniques is not significant. Hence there is a need to add some intelligence to the nodes so that they can decide whether or not to transmit the data [16]. In constrained remote IoT applications, where resources are limited, communication consumes more energy than processing. In this paper, we propose an intelligent data aggregation technique that aggregates the data and also gives nodes the intelligence to decide whether or not to send it. In addition, the present work adds features to detect the boundary of an event occurrence in event-driven networks.
The paper is organized as follows: related work is given in Section 2, Section 3 describes the proposed work, Section 4 describes the simulation and results, and the conclusion is provided in Section 5.

II. RELATED WORK
In a flat network, the base station transmits a query message requesting data from the sensor nodes within the network. The nodes that have data relevant to the query respond with the requested data. In this method, the base station performs excessive communication and computation. As a result, if the base station fails, the network loses its connection with the outside world.
Under hierarchical data aggregation, many approaches have been proposed for energy efficiency and scalability. There are four types of hierarchical data aggregation: cluster-based data aggregation [4][5], tree-based data aggregation [9], grid-based data aggregation [8], and chain-based data aggregation [6] [7].
These traditional data aggregation schemes simply combine the data from multiple sources and forward it to the next node. In the majority of applications, such plain data aggregation is not sufficient. We need to add some intelligence to the data aggregation and data transmission process within the limits of the constrained resources. J. Chen, S. Kher, A. Somani, et al. proposed a scheme in [9] that involves majority voting. This approach is a data outlier detection method based on spatial correlation: if the reading of the local sensor node differs from those of the majority of its neighboring nodes, it is classified as abnormal.
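The majority-voting idea described above can be sketched as follows. This is a minimal illustration, not the implementation from [9]: the function name `is_abnormal` and the `tolerance` parameter (how far apart two readings may be while still counting as "the same") are assumptions introduced here.

```python
def is_abnormal(local_reading, neighbor_readings, tolerance=0.5):
    """Majority-vote outlier check based on spatial correlation.

    `tolerance` is a hypothetical parameter: two readings are treated as
    agreeing when they differ by at most this amount.
    """
    if not neighbor_readings:
        # With no neighbors there is no majority to disagree with.
        return False
    disagreeing = sum(
        1 for r in neighbor_readings if abs(r - local_reading) > tolerance
    )
    # Abnormal only if a strict majority of neighbors disagree.
    return disagreeing > len(neighbor_readings) / 2


neighbors = [20.1, 20.3, 19.9, 29.8]
print(is_abnormal(30.0, neighbors))  # three of four neighbors disagree
print(is_abnormal(20.0, neighbors))  # only one neighbor disagrees
```

A strict majority is used here, so a tie between agreeing and disagreeing neighbors leaves the reading classified as normal; a real deployment might choose a different tie-breaking rule.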
Y. Sun, H. Luo, S. K. Das, et al. proposed a scheme in [10]. Here, a weight is assigned to each sensor reading according to its trustworthiness, which is ranked by comparison with historical data and neighbor data. The weighted mean is then calculated at the aggregator and is taken as the aggregated data.
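The weighted-mean step at the aggregator can be sketched as below. How the trust weights are derived from historical and neighbor data is not detailed here, so the sketch assumes the weights are already given; the function name `weighted_mean` is also an assumption.

```python
def weighted_mean(readings, weights):
    """Aggregate readings as a trust-weighted mean.

    `weights` are assumed to be non-negative trust scores supplied by
    the caller; their computation is outside the scope of this sketch.
    """
    if len(readings) != len(weights):
        raise ValueError("readings and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(r * w for r, w in zip(readings, weights)) / total_weight


# A reading with trust 3.0 pulls the aggregate three times as hard
# as a reading with trust 1.0.
print(weighted_mean([10.0, 20.0], [1.0, 3.0]))  # 17.5
```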
S. Din, A. Ahmad, et al. proposed a scheme in [11] in which the nodes closer to the sink node perform direct communication and remain unclustered, while the nodes that are one hop away from the sink node perform multi-hop communication and remain clustered.
Bo Yin et al. in [12] specified a tree-based scheme that involves the construction of an aggregation tree for complex queries. This ensures minimum communication cost.
Xiong Li et al. proposed a scheme in [13] with three participants: the edge server, the terminal device, and the public cloud center. The data from the terminal devices is encrypted and communicated to the edge server. The edge server then aggregates the data from the terminal devices and forwards the aggregated data to the public cloud center. The public cloud center can recover the aggregated plaintext data using its private key.
S. Kumar, V. K. Chaurasiya, et al. in [14] propose a scheme in which data mining techniques are used to generate more accurate, consistent, and useful information than that generated by any individual sensor node.

III. PROPOSED WORK
A. Assumptions and Architectural Setup
The following assumptions are made about the deployment of nodes in the WSN test bench.
1) Grid-based deployment of nodes: nodes are deployed on a crisscross grid so that the location of each individual node can be determined precisely. This assumption is both safe and valid, as there is a commercially available underwater robot capable of deploying nodes at specific latitude and longitude coordinates. Such a deployment also helps in making intelligent inferences based on which nodes are communicating sensed data.
2) Number of nodes per grid is fixed: this assumption helps us compute statistically significant results once data collection is done and the analysis is performed.
3) Node density is uniform: non-uniform node deployment does not allow implementation of intelligence.
4) Each node in a grid represents a single unit of area: the data collected is representative of a fixed area under surveillance.
5) Unit squares (grids) form a level-1 cell: this logical rule imposed on the grid helps in aggregating data across levels of nodal deployment.
6) Every level-1 cell has a level-1 cell aggregator node: this is an architectural requirement of WSNs. Aggregator nodes aggregate the sensed data before it is forwarded to the upper layers.
c) Level-1 Aggregator: these nodes act as second-level grid heads. They gather the data from multiple grids and, after processing, transfer it to the Level-2 Aggregator.
d) Level-2 Aggregator: these nodes are at the top level of the hierarchy and can provide minor conclusions about the sensed information.
2) These nodes are deployed at the required locations.
3) The area of interest is divided into equal-sized grids. Ground-level nodes and grid heads are deployed in each grid. The high-power node is the grid head of its grid.
4) Each high-power node broadcasts a beacon message to the ground-level sensor nodes to indicate its presence in the grid.
5) Each ground-level node responds to its grid head by sending a beacon response. The same procedure is followed between grid heads and anchor nodes, and between anchor nodes and surface buoyant nodes.
Fig. 1 shows the proposed network architecture. The network is divided into smaller grids. Grids are combined to form level-1 cells, and level-1 cells are combined to form level-2 cells. A sink node is placed at the border of the area of interest. The sink node forwards the data to the server through an Internet connection.

C. Data Aggregation
Tier-1 local aggregation (level 1): Spatial aggregation: this primary level of aggregation is based on the percentage of the area generating similar readings. Each grid has a designated grid head. The grid head communicates a reading only if the percentage of nodes reporting similar readings exceeds a threshold. Spatial aggregation is adopted in the grid as well as in the level-1 cell, by the grid head and the Level-1 Aggregator respectively.
Boundary value aggregation: the region under observation has a Level-2 Aggregator that acts as a data aggregator. A radial survey is made periodically at three radii: minimum, nominal, and maximum. The minimum radius is close to the Level-2 Aggregator, the maximum radius is the distance from the Level-2 Aggregator to the boundary of the region under observation, and the nominal radius is an intermediate distance.
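The spatial aggregation rule at a grid head can be illustrated with the short sketch below. It is not the paper's Algorithm 2 or 3. The function name `grid_head_report`, the default threshold of 60%, the `tolerance` that defines "similar", and the use of the median as the reference reading are all assumptions made for illustration.

```python
import statistics


def grid_head_report(readings, threshold_pct=60.0, tolerance=1.0):
    """Decide whether a grid head should forward a reading.

    Readings within `tolerance` of the median are treated as "similar"
    (an assumed definition). The grid head reports the mean of the
    similar readings only if their share exceeds `threshold_pct`;
    otherwise it stays silent and returns None.
    """
    if not readings:
        return None
    reference = statistics.median(readings)
    similar = [r for r in readings if abs(r - reference) <= tolerance]
    similar_pct = 100.0 * len(similar) / len(readings)
    if similar_pct > threshold_pct:
        return sum(similar) / len(similar)
    return None


# Four of five nodes agree (80% > 60%), so a single value is forwarded.
print(grid_head_report([25.0, 25.5, 24.5, 25.0, 40.0]))
# No agreement among the nodes, so nothing is forwarded.
print(grid_head_report([10.0, 20.0, 30.0, 40.0]))
```

Under these assumptions one packet replaces the per-node transmissions of the whole grid, which is the source of the reduction in redundant traffic; the same rule could be applied one level up by a Level-1 Aggregator over the values reported by its grid heads.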

Tier-2 Aggregation (level 2)
Algorithm 1 shows how a sensor senses and sends data. Algorithms 2 and 3 depict the level-1 aggregation techniques (spatial aggregation). Level-2 aggregation is given in Algorithm 4. Table I

IV. SIMULATION AND RESULTS
In this work, we used MATLAB as the simulation tool to evaluate the efficiency of the proposed approach. We considered an application region of 800 m (length) x 800 m (breadth). The area is divided at ground level into equal-sized grids of size 100 m each. Table II shows the parameters used in the simulation.
Fig. 2 depicts the topology created through simulation. Blue nodes represent sensor nodes deployed at ground level, red nodes represent grid heads, magenta nodes represent Level-1 Aggregators, black nodes represent Level-2 Aggregators, and the green node is the sink node.
Fig. 3 shows the time required to detect the boundary of the event spread through level-2 aggregation. In certain remote IoT applications, detecting the spreading rate of an event is very important so that immediate measures can be taken, for example the spread of fire in a forest or an oil leak in the ocean.
Fig. 4 compares the number of redundant transmissions in various techniques with our work. Redundant data transmission drops significantly compared to traditional approaches: our method requires fewer packet transmissions than deployment without aggregation, the traditional in-network architecture, and the cluster-based approach.
In this result as well, the proposed method performs better than the others, as it involves very little data transmission.
Fig. 6 compares the proposed method with other traditional methods in terms of energy consumption. The results show that the proposed method consumes less energy than the other techniques.
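As a worked illustration of the grid layout used in the simulation, the sketch below maps a node position to its ground-level grid. It assumes that "size 100 m" means a 100 m side length, which gives 8 x 8 = 64 grids over the 800 m x 800 m region, and it assumes row-major grid numbering starting at 0. The names `grid_index`, `AREA_SIDE_M`, and `GRID_SIDE_M` are introduced here and are not from the MATLAB simulation.

```python
AREA_SIDE_M = 800   # side of the square application region, in metres
GRID_SIDE_M = 100   # assumed side of one ground-level grid, in metres
GRIDS_PER_SIDE = AREA_SIDE_M // GRID_SIDE_M   # 8 grids along each axis
TOTAL_GRIDS = GRIDS_PER_SIDE ** 2             # 64 grids in total


def grid_index(x, y):
    """Return the row-major index of the grid containing point (x, y)."""
    if not (0 <= x < AREA_SIDE_M and 0 <= y < AREA_SIDE_M):
        raise ValueError("position lies outside the application region")
    col = int(x // GRID_SIDE_M)
    row = int(y // GRID_SIDE_M)
    return row * GRIDS_PER_SIDE + col


print(TOTAL_GRIDS)            # 64
print(grid_index(150, 250))   # column 1, row 2 -> 17
```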

V. CONCLUSIONS
Controlling redundant data transmission is a significant challenge in constrained remote IoT applications. In this work, data aggregation is addressed through spatial and boundary value aggregation, which is a novel scheme. The aggregation technique adopted reduces the number of redundant transmissions significantly. The efficiency of the proposed approach was compared with existing methods through simulation.