A STUDY ON DIFFERENT DATA AGGREGATION TECHNIQUES IN DISTRIBUTED SENSOR NETWORKS

(1)

Abstract

Mobile devices such as smart phones are gaining an ever-increasing popularity. Most smart phones are equipped with a rich set of embedded sensors such as camera, microphone, GPS, accelerometer, ambient light sensor, gyroscope, and so on. The data generated by these sensors provide opportunities to make sophisticated inferences about not only people (e.g., human activity, health, location, social event) but also their surrounding (e.g., pollution, noise, weather, oxygen level), and thus can help improve people’s health as well as life. To address these problems, this paper proposes an efficient protocol to obtain the Sum aggregate, which employs an additive homomorphic encryption and a novel key management technique to support large plaintext space. The thesis also extends the sum aggregation protocol to obtain the Min aggregate of time-series data. It shows that the proposed protocols are faster than existing solutions, and it has much lower communication overhead. In addition, the paper proposes a new concealed data aggregation scheme which is homomorphic public encryption system based. The proposed scheme has three contributions. First, it is designed for a multi-application environment. The base station extracts application-specific data from aggregated cipher texts. Next, it mitigates the impact of compromising attacks in single application environments. Finally, it degrades the

Index Terms- Cluster-Based Periodic Sensor Network (CPSN), CONCEALED Data aggregation (CDMA), Clustering topology, Big-Data Sensing, Similarity and Distance function, Group of nodes can communicate with a single node in secure manner.

I.INTRODUCTION

[image:1.612.336.536.473.568.2]

The proliferation and ever-increasing capabilities of mobile devices such as smart phones give rise to a variety of mobile sensing applications. This thesis studies how an un trusted aggregator in mobile sensing can periodically obtain desired statistics over the data contributed by multiple mobile users, without compromising the privacy of each user. Although there are some existing works in this area, they either require bidirectional communications between the aggregator and mobile users in every aggregation period, or have high-computation overhead and cannot support large plaintext spaces. Also, they do not consider the Min aggregate, which is quite useful in mobile sensing. In addition, Database as a Service model is proposed in which, a client stores her database on an un trusted service provider. Therefore, the client has to secure their database through Privacy Homomorphism (PH) schemes because PH schemes keep utilizable properties than standard ciphers. PH scheme provide aggregation queries and results.

Fig. 1. Data aggregation based on clustering Network Topology.

In Fig. 1, Data transmission between member nodes and their appropriate CH or between CHs and the sink is based on single-hop communication. Member nodes collect data in a periodic manner. Subsequently, each member node sends its data to the appropriate CH at each period. Sensor nodes sense environment at a fixed sampling rate where each ea one takes t measures at each Period.

A STUDY ON DIFFERENT DATA AGGREGATION

TECHNIQUES IN DISTRIBUTED SENSOR NETWORKS

1. Dr. D. FRANCIS XAVIER CHRISTOPHER M.sc, Mphil(CS),PGDPM&IR, Ph.D 2. C. AKALYA, MCA Director, School Of Computer Studies Mphil(CS) Research Scholar RATHNAVEL SUBRAMANIAM RATHNAVEL SUBRAMANIAM

(2)

II.RELATED WORKS

2.1 DRAWING DOMINATE DATASET FROM BIG SENSORY DATA IN (WSNs)

The Technical describes the increasing popularity of Wireless Sensor Networks (WSNs), the amount of sensory data manifests an explosive growth. In many applications, the scale of the sensory data has already exceeds several peta bytes (PB) annually. The amount of sensory data manifests an explosive growth due to the increasing popularity of Wireless Sensor Networks. The scale of the sensory data in many applications has already exceeds several peta bytes annually, which is beyond the computation and transmission capabilities of the conventional WSNs. On the other hand, the information carried by big sensory data has high redundancy because of strong correlation among sensory data dominant dataset, which is only a small data set and can represent the vast information carried by big sensory data with the information loss rate being less. This prove that drawing the minimum dominant dataset is polynomial time solvable and provide a centralized algorithm with O(n 3)time complexity. Furthermore, a distributed algorithm with constant complexity (O(1)) is also designed. It is shown that the result returned by the distributed algorithm can satisfy the requirement with a near optimal size. Finally, the extensive real experiment results and simulation results are carried out. The results indicate that all the proposed algorithms have high performance in terms of accuracy and energy efficiency.

2.2

AN ENERGY-EFFICIENT GRID BASED

CLUSTRING TOPOLOGY FOR A (WSNs)

The Wireless Sensor Network (WSN) is an emerging technology. Wireless Sensor Network composed of a set of tiny sensor nodes. The nodes are continuously sense and transmit the data. WSN have a wireless nature, due to this has a limited lifetime. So increase the lifetime of Wireless Sensor Network and Minimize energy cost in wireless sensor network are an important problem. To solve this problem clustering technique are always used, among the entire clustering technique grid based clustering is more efficient. Grid based clustering is more simple and feasible, and has

so much advantage with respect to other clustering method. In this we proposed a grid based clustering algorithm for minimizing energy consumption, where clustering is formed depending upon the interest of nodes each cluster has a coordinator or master head i.e. the

cluster head (CH). For CH selection any algorithm can be applied with our work. In Cluster formation process, Firstly a cluster head is selected then with the collaboration of BS clusters are formed and finally routing is carried out. The cluster head selection phase starts and all the deployed nodes send their energy levels to the Base Station. Then on the basis of energy level, geographical area and least id cluster head are selected. Network deployment is considered as manual so the base station is well informed about the geographical locations of the nodes. Base Station will select the cluster heads and multicast this information to them.

Fig 2 . Cluster based Model

In Fig 2. In clustering scheme network is grouped into different clusters Each cluster is composed of one cluster head (CH) and cluster member nodes. The respective CH gets the sensed data from cluster member nodes, aggregates the sensed information and then sends it to the Base Station.

2.3 CLUSTER-BASED COMB-NEEDLE MODEL FOR

ENERGY EFFICIENT DATA

AGGREGATION(WSNs)

(3)

We extend the basic Comb-Needle Discovery Support Model by including Cluster-based data aggregation mechanism, which helps minimize the communication Cost.

Cluster based approach groups the sensor nodes in the sensor network. Each node of a group will send information to its Cluster Head, which then aggregates and forwards the information to the base station (Sink). We compare the performance of proposed Cluster-based Comb-Needle Model with the basic Comb-Needle Model using the parameters, namely, energy consumption and communication cost.

The experimental results Avrora simulator on Tiny OS Platform reveal that Cluster based Comb-Needle Model for Data Aggregation helps a sensor network in conserving its energy, and thereby extending its life time.

2.3.1 Analysis Of The Cluster-Based Comb Needle Model

In this section, we analyze the “comb-needle” structure formed by the Cluster-based model. It is assumed that the total number of sensor nodes in the network, the number of clusters and their sizes are fixed by the designer of WSN application as integral values.

We assume a regular n x n grid, and N is the total number of nodes.

N=nXn……….Eq (1)

The communication cost for each query is estimated as follows:

Each event generation forms a needle of length l from comb and thus communication cost for each needle is

Cl =l-1 ... Eq--- (2)

We assume that an event can be generated by any Node the following parameters are defined:

fq = Query frequency fe = Event frequency

fd = fe/n2, Frequency of events per sensor node Note that the communication cost depends on whether unicast or broadcast is used. Here we consider unicast for simple analysis.

The cost of building a single needle of length l is l-1.

We assume that frequency of Query fq, the Frequency data event is fe are written as fe/fq.i.e., Number of Events generated per query in a needle is l-1(fe/fq) .

Since Cl =l-1, So We can write event generated cost per needle is.

Cost per needle = Cl(fe/fq)………Eq(2.1)

The query data dissemination cost for basic Comb-Needle model and Cluster-based approach is same so we consider Cqd [3], is

Cqd-n-1+(n-1)+1……..Eq(3)

The above calculates the cost to form one vertical query line and one horizontal query line. Note that, if s=l, the query data dissemination reaches its better level; it means cluster head collects the information from nodes. The values of s and l are determined by Cluster-based “comb-needle” model.

The query response cost depends on data aggregation scheme used to reply data. Here we are using Cluster-based approach; each node pushes its data towards cluster head.

The data is pushed vertically upward or down ward towards the cluster head depends on event generated location

The data is pushed horizontally and vertically upward if event is generated

Upward=l+n+d1---Eq (4)

Then data is pushed horizontally and vertically downward if event is generated

Downward =l+ (n-d) + (d-d1) ---Eq (4.1)

Combining (4) and (4.1) to get the Query reply cost for basic Comb-Needle model.

Cqr=2(n+l) ---Eq (4.2)

To find the needle in 1 hop the cost is reduced 50% in Cluster-based Comb-Needle model.(i.e., data traversed only in one direction in Cluster-based Comb-Needle Model) So we will get n+l

(4)

Our main aim is to minimize the communication cost. In summary, the complete cost per query consists of duplicated data cost, Cluster-based comb building cost and response reply cost.

2.4 DYNAMIC DATA AGGREGATION PROTOCOL BASED ON MULTIPLE OBJECTIVE TREE IN (WSNs)

A Data aggregation has been widely applied as efficient techniques in order to reduce the data redundancy and the communication load in Wireless Sensor Networks (WSNs). However, for dynamic scenarios, structured protocols may incur high overhead in the construction and the maintenance of the static structure. Without the explicit downstream and upstream relationship of nodes, it is also difficult to obtain high aggregation efficiency by using structure-free protocols. In order to address these aspects, we propose a semi-structured protocol based on the multi-objective tree.

The routing scheme can explore the optimal structure by using the Ant Colony Optimization (ACO). Moreover, by using the prediction model for the arriving packets based on the sliding window, The adaptive timing policy can reduce the transmission delay and enhance the aggregation probability. Therefore, the packet transmission converges from both spatial and temporal points of view for the data aggregation procedure. Finally, simulation results validate the feasibility and the high efficiency of the novel protocol when compared with other existing approaches.

Fig. 3 Adaptive Timer

Fig. 3. The used to explain this situation. The time interval Tw can be saved, and the whole aggregation process is speed up, and finally, the application delay can be further reduced.

2.5 APPROXIMATE HOLISTIC AGGREGATION IN (WSNs)

This techniques Holistic aggregation results are Important from Wireless Sensor Networks (WSN). Holistic aggregation requires all the sensory data to be sent to the sink, which costs a huge amount of energy. Fortunately, in most application, approximate results are acceptable. The approximated holistic aggregation based on uniform sampling. In this paper four holistic aggregation operations are investigate.

The mathematical methods to construct their estimators and determine the optional sample size are proposed, and the correctness of these methods is proved. Four corresponding distributed holistic algorithm are presented. The theoretical analysis and simulation results show that the algorithms have high performance.

The amount of sensory data generated by a (WSN) is always huge and redundant. The summary information returned by aggregation operations is thus more meaningful Aggregation operations can be classified as Distributable, algebraic and holisticons.

The information returned by holistic aggregations is the most valuable but all sensory data needs to be transmitted for the exact results, which costs high energy consumption. Considering approximate aggregation results are acceptable some approximate holistic aggregation techniques were proposed. which save lots of energy but have a fixed error bound and cannot satisfy an arbitrary precision requirement.

III. METHODOLOGY

3.1 AGGREGATION PROTOCOL FOR SUM

In this protocol, the setup phase only runs once. After the setup phase, the key dealer does not need to distribute secrets to the users and the aggregator again. In addition, the users and the aggregator do not have to synchronize their key generations with communications in every time period.

(5)

To proposed a construction for key generations that preserves the privacy of each user and the Sum aggregate efficiently. Before presenting this construction, it first discusses a straw-man construction which is very efficient for the users but not efficient for the aggregator. Then, it extends to straw-man scheme to derive this construction. Both constructions include three processes, which are secret setup, encryption key generation, and decryption key generation. They proceed in the Setup phase, Enc phase, and AggrDec phase of the aggregation protocol, respectively.

3.1.1 Protocol Overview

Setup. The key dealer assigns a set of secret values (secrets for short) to each user and the aggregator. Enc. In each time period, user i (i 1,n]) generates encryption key ki using the secrets that it is assigned. It encrypts its data xi by computing

ci = (ki + xi) mod M……….(1)

where M = 2log2 (n). Then, it sends the cipher text ci to the aggregator.

AggrDec. In each time period, the aggregator generates Decryption key k0 using the secrets that it is assigned, and decrypts the sum aggregate

n

S = xi by computing i=1

n

S = ( ci-k0) mod M---(2) i=1

The keys are generated using a PRF family and a length-matching hash function (see later). According to the

aggregator can get the correct sum so long as the following equation hold

N

k0 = ( ki) mod M---(3) i=1

Encryption key generation. In time period t  IN, user I generates its encryption key by computing.

Ki = (h(fs’(t)) - h(fs’(t))) mod M……(4) s’Si

s’Si

Decryption key generation.Intimeperiod t  IN,the aggregator generates the decryption key by computing k0 = ( h(fs’(t))) mod M---(5)



s’S

3.1.2 Aggregation Protocol For MIN

The Min aggregate is defined as the minimum value

of the users’ data. This section presents a protocol that

employs the Sum aggregate to get Min.

Fig.4. Derivative Data[Last Row Is Data Sum of All

Users]

In Fig.4. This scheme gets the Min aggregate of each time period using + 1 parallel Sum aggregates in the same time period. The sums used to obtain Min are based on a number of 1bit derivative data (denoted by d) derived from the users raw data x. Without loss of generality, it is assumed  is a power of two.Aggregate data.

The scheme works as follows: In each time period, each user generates  + 1 derivative data d[0], d[1], ... , d[, where each derivative data correspond to one possible data value in the plaintext space. For each j  [0, ], the user assigns 1 to d[j] if its raw data value is equal to j and assigns 0 otherwise. For each j  [0, ], the aggregator can obtain the Sum aggregate of d[j]

using the sum aggregation protocol presented. Then, Min is the smallest j that returns a positive sum.

(6)

3.1.3 Aggregation Protocol For MAX

The Max aggregate is defined as the maximum value of the users’ data. The protocol that employs the Sum aggregate to get Max is same as 3.3.4 except Max is the biggest j that returns a positive sum.

3.2 RELAXING THE ASSUMPTION OF TRUSTED KEY DEALER

Instead of relying on a trusted key dealer, our protocol can be easily adapted to work with an honest-but-curious key dealer that does not collude with the aggregator. An honest-but-curious key dealer correctly follows the protocol steps, but wants to get users’ data values from the transcript of messages in the protocol.

To provide privacy under this model, the protocol adds one more encryption and decryption to the data that each user submits to the aggregator. More specifically, each user encrypts its data using the secrets assigned by the key dealer to derive an intermediate result z, encrypts z with a key pre-shared with the aggregator, and then sends the obtained cipher-text to the aggregator. The aggregator first uses the pre-shared key to decrypt each user’s intermediate Result z, and then decrypt noisy some with the secrets received from the key dealer. The key dealer cannot obtain the intermediate result z of any user, as long as it does not collude with the aggregator. Hence, it cannot get any user’s data value.

The honest-but-curious model is realistic because it can be enforced with trusted hardware. In practice, certificate authorities such as VeriSign (which already provides key management services) may serve as the key dealer. Since these authorities usually undergo extensive audits, collusion with the aggregator can be mitigated. More aggregate statistics. In the basic aggregation scheme for Min presented above, the aggregator can actually get the number of times that each possible data value appears, and derive the accurate distribution of the users’ data in the plaintext space [0,

  

 IV.CONCLUSION

This project studied Sum aggregation protocol in WSN environment and it also proposed Min and Max aggregate of time-series data. This project also studied CDAMA scheme for a multi-application environment, which is the first scheme. Through CDAMA, the cipher texts from distinct applications can be aggregated, but not mixed.

For a single-application environment, CDAMA is still more secure than other CDMA schemes. When compromising attacks occur in WSNs, CDAMA mitigates the impact and reduces the damage to an acceptable condition. Besides the above applications, CDAMA is the first CDA scheme that supports secure counting. The base station would know the exact number of messages aggregated, making selective or repeated aggregation attacks infeasible. Finally, the performance evaluation shows that CDAMA is applicable on WSNs while the number of groups or applications is not large.

In addition, it applied CDAMA to realize aggregation query in Database-As-a-Service (DAS) model. In DAS model, a client stores her database on an un trusted service provider.

V. REFERENCES

[1] S. Cheng, Z. Cai, J. Li, and X. Fang, “Drawing dominant dataset frombig sensory data in wireless sensor networks,” 2015 IEEE Conferenceon Computer Communications (INFOCOM), pp. 531–539, 2015

[2] K. R. Bhakare, R. Krishna, and S. Bhakare, “An energy-efficient grid based clustering topology for a wireless sensor network,” International Journal of Computer Applications, vol. 39, no. 14, 2012

[3] M. Shanmukhi and O. Ramanaiah, “Cluster-based comb-needle model for energy-efficient data aggregation in wireless sensor networks,” Applications and Innovations in Mobile Computing (AIMoC), pp. 42–47,

(7)

[4] Y. Lu, I. Comsa, P. Kuonen, and B. Hirsbrunner, “Dynamic data aggregation protocol based on multiple objective tree in wireless sensornetworks,” Tenth International Conference on Intelligent Sensors, SensorNetworks and Information Processing (ISSNIP), IEEE, pp. 1–7, 2015.

[5] J. Li, S. Cheng, Y. Li, and Z. Cai, “Approximate holistic aggregation in wireless sensor networks,” Proceeding 35th IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 740–741, 2015