In-network Aggregation - Sensor Data Collection


III. Sensor Data Collection

6.2.2. In-network Aggregation

In-network aggregation is a category of data collection frameworks in which sensor data from different sources is processed at intermediate network nodes while being relayed to the sink. The techniques are tightly coupled with how data is sampled at the sensor nodes and how packets are routed through the network, and they have a significant impact on network efficiency [94]. For instance, how these methods achieve energy efficiency usually depends on the routing protocol or the network topology employed. In a tree-based topology (see Section 2.1), when raw sensor data is received by an intermediate relay node, the node can append its own sensor reading to the received packet and forward the combined packet towards the sink, instead of sending two separate packets. The total number of packets transmitted is therefore reduced. In a cluster-based network (Figure 2.1), however, better efficiency can be achieved by, for example, exploiting the spatial correlation among cluster members. Next, we survey aggregation methods according to the network topology.

Tree Based

For the tree-based solutions, a spanning tree rooted at the sink is established first. Sensor data collected at leaf nodes flows level by level towards the root, and data aggregation can be performed during forwarding. Simple aggregation operations include SUM, MAX, MIN, and AVERAGE. Aggregation significantly reduces data communication. However, data resolution is affected, as the original data usually cannot be recovered at the sink. Another approach, called in-network aggregation without size reduction [94], in which the relaying node appends its data to the relayed packet, can preserve data resolution while still reducing communication effort.

Chapter 6. Energy Efficient Data Collection: An Overview

Figure 6.1.: Tree based routing and how TAG saves energy by schedule management. (a) Tree-based routing. (b) Level-based routing and power management with TAG (source [95]).
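The difference between the two tree-based variants can be illustrated with a minimal sketch. The tree, the readings, and all function names below are hypothetical and chosen only for illustration; the packet count assumes one packet per tree edge per epoch, which is a simplification rather than a property stated in [94].

```python
# Hypothetical routing tree: each node maps to its children; node 0 is the sink.
CHILDREN = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}
# Hypothetical readings at the sensor nodes (the sink itself does not sense).
READINGS = {1: 21.0, 2: 19.5, 3: 22.5, 4: 20.0, 5: 18.0}


def aggregate_sum(node):
    """Return (partial SUM, packets sent) for the subtree rooted at node."""
    total = READINGS.get(node, 0.0)
    packets = 0
    for child in CHILDREN[node]:
        child_total, child_packets = aggregate_sum(child)
        total += child_total
        packets += child_packets + 1  # one fixed-size packet per tree edge
    return total, packets


def aggregate_append(node):
    """Return (raw readings, packets sent) when relays append their reading."""
    values = [READINGS[node]] if node in READINGS else []
    packets = 0
    for child in CHILDREN[node]:
        child_values, child_packets = aggregate_append(child)
        values += child_values
        packets += child_packets + 1  # one growing packet per tree edge
    return values, packets


def naive_packets(node, depth=0):
    """Packets sent when every reading is relayed hop by hop on its own."""
    own = depth if node in READINGS else 0
    return own + sum(naive_packets(c, depth + 1) for c in CHILDREN[node])


if __name__ == "__main__":
    print(aggregate_sum(0))      # SUM only: original readings are lost
    print(aggregate_append(0))   # all readings preserved, same packet count
    print(naive_packets(0))      # baseline without any aggregation
```

In this sketch both variants send the same number of packets, but only the appending variant lets the sink see every original reading, at the cost of larger packets closer to the root.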

Energy efficiency can be further improved by proper schedule management. Tiny AGgregation (TAG) [95] is one such approach. Figure 6.1(a) shows a routing tree. TAG achieves energy efficiency by synchronizing the radio schedules of child and parent nodes. As shown in Figure 6.1(b), a sensing epoch is divided into several slots (as many as the depth of the tree). Each child node and its parent switch on their radios in the same time slot, during which sensor data is transferred upwards in the tree. The child node may then switch off its radio and remain in standby until the next epoch. The rest of the network applies the same technique to forward the data to the sink. The goal of this scheduling mechanism is to minimize the time that nodes spend powered on and to maximize the time spent in standby mode. However, the solution needs a mechanism to synchronize the nodes' clocks, which is a significant overhead, especially for a large-scale network.
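The slot assignment can be sketched as follows. This is a simplified illustration under stated assumptions, not the TAG implementation: the tree (`PARENT`) is hypothetical, slots are numbered from 0, and it is assumed that the deepest level transmits first, so a node at depth d transmits in slot D - d, where D is the depth of the tree.

```python
# Hypothetical routing tree given as child -> parent; node 0 is the sink.
PARENT = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 5}


def node_depths(parent):
    """Depth of every node, with the sink (the node without a parent) at 0."""
    nodes = set(parent) | set(parent.values())
    depths = {}
    for node in nodes:
        d, cur = 0, node
        while cur in parent:
            cur = parent[cur]
            d += 1
        depths[node] = d
    return depths


def radio_schedule(parent):
    """Map each node to the set of slots in which its radio is switched on."""
    depths = node_depths(parent)
    tree_depth = max(depths.values())
    has_children = set(parent.values())
    schedule = {}
    for node, d in depths.items():
        slots = set()
        if d > 0:
            slots.add(tree_depth - d)      # assumed: transmit to the parent
        if node in has_children:
            slots.add(tree_depth - d - 1)  # assumed: listen to the children
        schedule[node] = slots
    return schedule, tree_depth


if __name__ == "__main__":
    schedule, tree_depth = radio_schedule(PARENT)
    for node in sorted(schedule):
        print(node, sorted(schedule[node]), "of", tree_depth, "slots")
```

Under these assumptions every node keeps its radio on for at most two slots per epoch, regardless of the depth of the tree, which is where the energy saving comes from.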

TiNA [96] is an enhancement of TAG that exploits temporal correlation to save further energy. It assumes that, because of temporal correlation, leaf nodes (or any source node in the tree) do not need to send every reading they sample to their parents. A user-specified parameter tct is introduced to facilitate a local test:

$$\frac{|X_{t+1} - X_t|}{X_{t+1}} > tct. \qquad (6.4)$$

Only a sensor reading that passes the test needs to be sent. Intuitively, the test checks whether the relative difference between the new reading $X_{t+1}$ and the old one $X_t$ exceeds tct × 100%. The parent node can use the cached reading for the corresponding child node when no new data is received. The test is straightforward but ad hoc. Different environmental attributes clearly evolve in different patterns, and specifying a problem-dependent tct is not trivial. A misspecified tct will lead to interesting new data being ignored.

TiNA also introduces a mechanism to deal with dead nodes. Each child node is required to send a heartbeat message to its parent on a regular basis. A node whose heartbeats stop arriving is treated as dead and is excluded from the data aggregation.
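A minimal sketch of the local test (6.4) together with the parent-side cache is given below. The function names and the sample readings are hypothetical. Two choices are assumptions of this sketch rather than statements from [96]: the new reading is compared against the last value that was actually sent (which is what the parent holds in its cache), and the absolute value of the denominator is used so that negative readings do not flip the inequality.

```python
def passes_tct_test(old, new, tct):
    """Test (6.4): the change relative to the new reading exceeds tct."""
    if new == 0:
        # Assumption: any change to a zero reading counts as significant.
        return old != 0
    return abs(new - old) / abs(new) > tct


def simulate(readings, tct):
    """Return (values the parent uses in each epoch, number of transmissions)."""
    last_sent = readings[0]  # the first reading is always sent
    parent_view = [last_sent]
    sent = 1
    for new in readings[1:]:
        if passes_tct_test(last_sent, new, tct):
            last_sent = new
            sent += 1
        parent_view.append(last_sent)  # parent falls back on its cached value
    return parent_view, sent


if __name__ == "__main__":
    readings = [20.0, 20.2, 20.5, 23.0, 23.1, 19.0]
    print(simulate(readings, tct=0.1))
```

With tct = 0.1, only three of the six hypothetical readings are transmitted; the parent substitutes the cached value in the remaining epochs, which also shows how small changes are silently dropped when tct is set too high.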

Instead of aggregating simple summary statistics such as SUM or MAX, several methods have been proposed to aggregate data based on mathematical models. For example, [97] presents a distributed regression method in which data is aggregated based on the result of a linear regression. Le Borgne et al. [98] investigate an aggregation method based on Principal Component Analysis (PCA). PCA is a classic statistical data analysis technique that is widely used to reduce the dimensionality of data [72]. The idea is to find a new set of basis vectors given by the eigenvectors of the data covariance matrix. The new basis vectors can then be sorted such that most of the information is captured by only a subset of them [84]. In other words, the data is projected onto a lower-dimensional subspace while as much of the variation as possible is retained. It is shown that the PCA projection can be computed in a distributed manner with an aggregation service, provided the new basis vectors (the eigenvectors of the data covariance matrix) are known [98].
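The reason the projection fits an aggregation service is that each projected coordinate is a sum of per-node terms, so it can be accumulated along the routing tree like a SUM. The sketch below illustrates this under stated assumptions: the tree, the readings, and the basis vectors are hypothetical, the basis vectors are assumed to be known in advance, each node is assumed to store only its own entries of them, and the readings are assumed to be already mean-centred. It is not the algorithm of [98], only an illustration of the underlying identity.

```python
import math

# Hypothetical routing tree (node 0 is the sink) and sensor readings.
CHILDREN = {0: [1, 2], 1: [3], 2: [], 3: []}
READINGS = {1: 2.0, 2: 4.0, 3: 6.0}

# Assumed known basis: node i stores entry i of each of the Q basis vectors.
Q = 2
BASIS = {
    1: (1 / math.sqrt(3), 1 / math.sqrt(2)),
    2: (1 / math.sqrt(3), 0.0),
    3: (1 / math.sqrt(3), -1 / math.sqrt(2)),
}


def partial_projection(node):
    """Partial sums (one per basis vector) for the subtree rooted at node."""
    if node in READINGS:
        partial = [w * READINGS[node] for w in BASIS[node]]
    else:
        partial = [0.0] * Q
    for child in CHILDREN[node]:
        child_partial = partial_projection(child)
        partial = [a + b for a, b in zip(partial, child_partial)]
    return partial


def centralized_projection():
    """Reference result computed with all readings available in one place."""
    return [sum(BASIS[i][k] * READINGS[i] for i in READINGS) for k in range(Q)]


if __name__ == "__main__":
    print(partial_projection(0))
    print(centralized_projection())
```

Each relay forwards only Q numbers regardless of the size of its subtree, so the packet size stays fixed while the sink still obtains the full low-dimensional projection.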

Note that all the solutions presented so far only work with a tree-structured topology, and they generally cannot be adapted to other topologies. Moreover, tree-based techniques suffer from single points of failure even when remedies such as heartbeat messages [96] are used: the failure of any node might lead to the collapse of the whole routing sub-tree beneath it.

Cluster Based

Another popular routing mechanism employs a cluster-based topology. Low-Energy Adaptive Clustering Hierarchy (LEACH), for example, is an in-network aggregation solution based on this topology. LEACH consists of two separate phases: a set-up phase and a steady-state phase. During the set-up phase, clusters are formed based on signal strength: a node nominates itself as cluster head (CH), and surrounding nodes join it based on the signal strength. Afterwards, in the steady-state phase (see Figure 6.2), all source nodes send their data to their cluster heads according to a schedule agreed on earlier in the set-up phase. The data collection uses a TDMA protocol to ensure that there are no collisions among cluster members, which saves energy and time. Cluster members may switch to a sleep mode until their next TDMA transmission slot to further reduce energy consumption. The data received at the cluster heads is sent to the sink via a single direct transmission (dashed arrows in Figure 6.2).

Figure 6.2.: LEACH data collection based on clustering.
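The two phases can be sketched as follows. This is a simplified illustration, not the LEACH protocol itself: the cluster heads are taken as given (the self-nomination step is omitted), the signal strength table `RSSI` is hypothetical, and the assignment of TDMA slots in sorted member order is an assumption of the sketch.

```python
# Hypothetical received signal strength (dBm) of each cluster head at each node.
RSSI = {
    "n1": {"CH_A": -50, "CH_B": -70},
    "n2": {"CH_A": -65, "CH_B": -60},
    "n3": {"CH_A": -80, "CH_B": -55},
    "n4": {"CH_A": -45, "CH_B": -75},
}


def form_clusters(rssi):
    """Set-up phase: every node joins the head it hears most strongly."""
    clusters = {}
    for node, heard in rssi.items():
        head = max(heard, key=heard.get)  # strongest signal wins
        clusters.setdefault(head, []).append(node)
    return clusters


def tdma_schedule(clusters):
    """Give each member its own slot so that transmissions never collide."""
    return {
        head: {member: slot for slot, member in enumerate(sorted(members))}
        for head, members in clusters.items()
    }


if __name__ == "__main__":
    clusters = form_clusters(RSSI)
    print(clusters)
    print(tdma_schedule(clusters))
```

Because every member owns exactly one slot within its cluster, a member only needs its radio on during that slot and can sleep for the rest of the frame.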

Adaptive Sampling Approach to Data Collection (ASAP) [68] is a similar cluster-based solution, which forms clusters based not only on spatial distance but also on the similarity of sensor readings. Each cluster therefore consists of spatially correlated sensor nodes. Energy can be saved by dividing the sampling workload among cluster members. Specifically, each cluster (formed based on hop distance and data correlation) is further divided into several sub-clusters. At each time instance, only one node from each sub-cluster needs to sample the environment. Probabilistic models, formed within each sub-cluster and capturing the spatial correlation in the sub-cluster, are sent together with the sampled data to the sink. The sink may recover the missing data by using the models. ASAP incurs significant overhead at the cluster heads, which are responsible for constructing sub-clusters and electing samplers. The average lifetime might not be extended as expected, as head nodes tend to deplete their energy faster.
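The division of the sampling workload can be illustrated with a small sketch. The sub-clusters and the round-robin election rule below are assumptions made purely for illustration; [68] may elect samplers differently, and the probabilistic model used by the sink to recover unsampled readings is not shown.

```python
# Hypothetical sub-clusters of one cluster.
SUBCLUSTERS = {"s1": ["a", "b", "c"], "s2": ["d", "e"]}


def samplers_at(t, subclusters):
    """Pick one sampler per sub-cluster for time instance t (assumed round-robin)."""
    return {name: members[t % len(members)] for name, members in subclusters.items()}


def sampling_load(subclusters, horizon):
    """Count how often each node samples over the given number of time instances."""
    load = {node: 0 for members in subclusters.values() for node in members}
    for t in range(horizon):
        for node in samplers_at(t, subclusters).values():
            load[node] += 1
    return load


if __name__ == "__main__":
    print(samplers_at(0, SUBCLUSTERS))
    print(sampling_load(SUBCLUSTERS, horizon=6))
```

Over six time instances, each node in the three-member sub-cluster samples twice instead of six times, which is the source of the energy saving; the larger a sub-cluster, the lower the per-node sampling load.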

In document Wireless sensor network control through statistical methods (Page 124-128)