
5.2 Connectivity Weighted Transfer

5.2.5 Operational Concerns

5.2.5.5 Data Aggregation

It is assumed that a network utilising data aggregation contains a number of aggregator nodes that forward data either to additional aggregator nodes or to the sink.

Modifying CWT to handle data aggregation requires an adjustment to the connection definition: a source A is connected to a sink for the entirety of a frame F if source A generates any data during frame F that is subsequently received by:

1. a sink, or,

2. an aggregator that uses the data to generate a packet that is received by a node covered by one of these two categories.
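The recursive definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this thesis: the `Packet` record, its fields and the function `connected_sources` are hypothetical names, and relaying by non-aggregating nodes is simplified away by recording only the node that finally receives each packet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """One packet observed during a frame (hypothetical record layout)."""
    pid: int                   # unique packet identifier
    origin_sources: frozenset  # sources whose data from this frame it carries directly
    derived_from: frozenset    # pids of packets an aggregator used to build this packet
    receiver: str              # node that finally received the packet


def connected_sources(packets, sinks):
    """Return the set of sources counted as connected for the frame."""
    by_pid = {p.pid: p for p in packets}
    connected = set()
    seen = set()
    # Category 1: packets received by a sink.
    stack = [p.pid for p in packets if p.receiver in sinks]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        packet = by_pid[pid]
        connected |= packet.origin_sources
        # Category 2: packets that an aggregator used to generate this one.
        stack.extend(d for d in packet.derived_from if d in by_pid)
    return connected


packets = [
    Packet(1, frozenset({"A"}), frozenset(), "G1"),   # A's data reaches aggregator G1
    Packet(2, frozenset({"B"}), frozenset(), "G1"),   # B's data reaches aggregator G1
    Packet(3, frozenset(), frozenset({1, 2}), "Z"),   # G1's aggregate reaches sink Z
    Packet(4, frozenset({"C"}), frozenset(), "G2"),   # C's data stops at aggregator G2
]
print(sorted(connected_sources(packets, sinks={"Z"})))  # ['A', 'B']
```

Walking backwards from the packets that reach a sink mirrors the two categories: source C is not connected because aggregator G2 never produces a packet that reaches a sink or another qualifying aggregator.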

It is also important to address the issues surrounding variable $b_i$, which refers to the average number of bytes transferred per source in frame $i$. Due to data aggregation, the number of bytes sent by each source may differ from the number of bytes received by the sink. Since it is desirable for the CWT metric to measure the quantity of information rather than the quantity of data, variable $b$ refers to the average number of bytes transferred per source before aggregation.
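As a small numerical illustration of measuring before aggregation, the sketch below averages the bytes generated by each source and ignores the size of the aggregate that reaches the sink. The function name, the byte counts and the choice of which sources are included are assumptions made for the example only.

```python
def mean_bytes_per_source(bytes_generated):
    """Average bytes per source in one frame, measured before aggregation.

    bytes_generated maps each source to the bytes it generated in the frame.
    """
    if not bytes_generated:
        return 0.0
    return sum(bytes_generated.values()) / len(bytes_generated)


generated = {"A": 40, "B": 60, "C": 80}   # hypothetical bytes produced by each source
aggregate_at_sink = 30                     # hypothetical size of the aggregated packet

b = mean_bytes_per_source(generated)
print(b)  # 60.0, regardless of the 30-byte aggregate received by the sink
```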

5.2.6 Example

This section uses an example to illustrate the limitations of common metrics as well as demonstrate the effectiveness of the CWT metric. The network shown in Figure 5.2a is used as an example, and contains one sink node (Z) and nine source nodes (A-I). For convenience, it is assumed that communication between nodes is perfect and bidirectional. Thus, an edge between two nodes in the diagram indicates that those nodes can communicate with each other.

[Figure: two network diagrams. Panel (a): sink Z and sources A-I. Panel (b): sink Z and sources A-J.]

Figure 5.2: Two example networks with and without source J

The proposed application mimics that of Tolle’s redwoods microclimate [114] project in which the sink node is placed at the bottom of a tree and the sources are placed at different heights. Every five seconds, sources generate a piece of data, and send it via the minimum hop path towards the sink. Since Tolle does not specify a data packet size and sources may produce data packets of different sizes, particularly in different deployments, the simulation randomly determines each packet’s data size between 2 and 100 bytes. The aim of the application is to calculate a temperature gradient of the tree for as long as possible.

Thus, the application benefits from having more sources (to produce a higher resolution temperature gradient) for longer.

The application is simulated using the Castalia 1.3 simulator, whose use is justified in Section 5.3. The nodes are based on the TMote Sky [101]. However, in order to reduce the simulation time, each node is given only 14.58 J of energy as opposed to the 29160 J that would be provided by a pair of AA batteries. The sink is given as much energy as the simulator would allow. Experimental observations are shown in Table 5.1.

Time (s)                  Event
32539 (9 hours)           Source C expires
33905 (9.4 hours)         Source D expires
103735 (28.8 hours)       Source A expires
125923 (35 hours)         Source B expires

Table 5.1: Simulation of experiment 1

Source C is the first node to expire. However, its loss is unlikely to render the network unusable since source D can be used in place of source C for routing. When source D expires, only sources A and B remain connected to the sink. However, it may be possible to estimate the temperature gradient at future times using only A and B. If the network remains usable with only sources A and B, then its useful network lifetime (35 hours) is almost four times greater than n-of-n lifetime suggests.
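The comparison in the preceding paragraph can be checked directly from Table 5.1. The snippet below is a minimal sketch: the variable names are illustrative, and the times are assumed to be in seconds, which is consistent with the hour values given in the table.

```python
# Expiry times taken from Table 5.1, in seconds.
expiry_s = {"C": 32539, "D": 33905, "A": 103735, "B": 125923}

n_of_n_lifetime = min(expiry_s.values())   # time at which the first source expires
last_expiry = max(expiry_s.values())       # time at which source B expires
ratio = last_expiry / n_of_n_lifetime

print(round(n_of_n_lifetime / 3600, 1))    # 9.0 hours
print(round(last_expiry / 3600, 1))        # 35.0 hours
print(round(ratio, 2))                     # 3.87
```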

The network can be modified to that shown in Figure 5.2b by inserting an additional source J into the void between sources E and F in order to obtain a higher resolution temperature gradient. The observations are shown in Table 5.2.

Time (s)                  Event
28511 (8 hours)           Source C expires
29450 (8.2 hours)         Source D expires
105538 (29.3 hours)       Source A expires
122628 (34 hours)         Source B expires

Table 5.2: Simulation of experiment 2


sources (the source-forwarding problem). Sources C and D expire 13% more quickly than in the initial scenario. However, during that time the temperature gradient is more precise due to the presence of source J. Over the entire experiment, the introduction of J causes the total data transfer to drop from 4.39 MB to 4.35 MB. Thus, the total data transfer metric may be ineffective at representing the usefulness of the network. Conversely, the CWT metric with $x = 2$ increases from $2.45 \times 10^{8}$ to $2.93 \times 10^{8}$ (19.6%).

Thus, the CWT metric correctly reflects the improved temperature gradient that can be achieved by the introduction of source J whereas the total data transfer metric incorrectly suggests that the addition of source J has a negative effect on the application, despite the increased temperature gradient resolution that can be achieved.
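The relative changes quoted for the two experiments follow from the reported figures. The helper `pct_change` below is an illustrative name, and the inputs are the values given in the text and in the two tables.

```python
def pct_change(before, after):
    """Relative change from before to after, as a percentage."""
    return 100.0 * (after - before) / before


total_transfer = pct_change(4.39, 4.35)   # total data transfer in MB
cwt = pct_change(2.45e8, 2.93e8)          # CWT with x = 2
c_expiry = pct_change(32539, 28511)       # expiry time of source C in seconds
d_expiry = pct_change(33905, 29450)       # expiry time of source D in seconds

print(round(total_transfer, 1))  # -0.9
print(round(cwt, 1))             # 19.6
print(round(c_expiry, 1))        # -12.4
print(round(d_expiry, 1))        # -13.1
```

The total data transfer changes by less than one percent while the CWT value changes by almost twenty percent, which is the contrast the example relies on.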

5.2.7 Conclusion

Existing metrics are unsuitable for comparing two WSNs to determine which is better at maintaining a high source diversity for a long period of time. Classical approaches such as total data transfer or sink connectivity do not compensate for the increased energy expenditure caused by sources routing data on behalf of other sources (the source-forwarding problem). A new metric, known as CWT, has been introduced, which can be used to reward networks that maintain high source connectivity. By utilising a user-defined weighting, an application’s performance can be measured according to its ability to keep numerous sources connected.

It has been shown how the metric may be simply adapted to be used in several varieties of WSN application, including those that use continual data streams, discrete packets and delay tolerant networking.

A user can design a network that maximises the CWT by constructing their network such that no sources have any intermediate nodes in common (for a positive weighting factor) or such that each source must route through as many other sources as possible (for a negative weighting factor) and vice-versa for minimising CWT. For example, a network in which no sources have intermediate nodes in common can be achieved in a star topology in which each source is a direct neighbour of a sink. Conversely, an example network in which sources must route through many other sources is a linear network.
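The difference between the two topologies can be made concrete by counting how many other sources route through each node. The sketch below does not compute CWT; it only illustrates the forwarding burden, using a hypothetical `parent` map that gives each source's next hop towards a sink named Z.

```python
def forwarding_load(parent, sink):
    """For each source, count how many other sources route data through it.

    parent maps every source to its next hop on the path towards the sink.
    """
    load = {node: 0 for node in parent}
    for source in parent:
        hop = parent[source]
        while hop != sink:
            load[hop] += 1
            hop = parent[hop]
    return load


star = {"A": "Z", "B": "Z", "C": "Z", "D": "Z"}   # every source neighbours the sink
line = {"A": "Z", "B": "A", "C": "B", "D": "C"}   # sources form a chain ending at Z

print(forwarding_load(star, "Z"))  # {'A': 0, 'B': 0, 'C': 0, 'D': 0}
print(forwarding_load(line, "Z"))  # {'A': 3, 'B': 2, 'C': 1, 'D': 0}
```

In the star, no source forwards on behalf of another, whereas in the linear network the source nearest the sink carries the traffic of every other source.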

5.3 Experimental Methodology

This section explores the different options available for experimentally analysing the effectiveness of routing heuristics or protocols at maintaining high source diversity for long periods of time. The different options include:

• algebraic models,

• a physical deployment of nodes, and,

• a simulator.

The following sections examine each of these options to determine which is most suitable.

5.3.1 Algebraic Models

In an algebraic model [77] [107] [37] [76] [75], a network is expressed as a series of mathematical equations. Typically, the equations are combined to form a linear programming problem where the objective function is to maximise some variable that reflects the network’s lifetime. By adjusting the variables to reflect the behaviour of a routing heuristic, it is possible to see how the lifetime variable changes.
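For illustration, a generic linear program of this kind might take the following form. It is a sketch of the general shape only and is not taken from any of the cited works; all symbols are assumptions introduced here. $T$ is the lifetime being maximised, $f_{ij}$ is the total amount of data sent from node $i$ to node $j$ during the lifetime, $g_i$ is the rate at which node $i$ generates data, $e_{ij}$ is a fixed energy cost per unit of data sent from $i$ to $j$, $e^{rx}$ is a fixed energy cost per unit of data received, and $E_i$ is the initial energy of node $i$.

```latex
\begin{align*}
\text{maximise}\quad   & T \\
\text{subject to}\quad & \sum_{j} f_{ij} - \sum_{j} f_{ji} = g_i \, T
                         && \text{for every non-sink node } i \\
                       & \sum_{j} e_{ij} \, f_{ij} + e^{rx} \sum_{j} f_{ji} \le E_i
                         && \text{for every non-sink node } i \\
                       & f_{ij} \ge 0
                         && \text{for every pair } (i, j)
\end{align*}
```

The first constraint states that everything a node generates or receives must be sent onwards, and the second bounds each node's energy expenditure. Both are linear in $T$ and $f_{ij}$ only because the energy costs are treated as fixed constants.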

Algebraic models typically make assumptions regarding the network or the knowledge available. For example, Li [75] assumes that the cost of forwarding data from one node to another is fixed. In practice, as discussed in Section 2.5.2, wireless communication is unreliable and it is therefore unreasonable to generalise its behaviour by the use of equations. For example, with random probability, messages may be lost or may have to be retransmitted at additional cost. Neighbouring nodes may also overhear the message and expend energy. Each of these actions may lead to randomness in the rate at which nodes exhaust their batteries. In a random network, it is very difficult to predict the effect of node expiration.

Another disadvantage of an algebraic model is that modelling a routing protocol is significantly harder. As discussed in Chapter 3, a routing protocol includes a discovery task in which available paths are collected. It is extremely difficult to algebraically express the messages that are exchanged during the discovery task, since it may constitute multiple phases. Furthermore, each message may cause different behaviour at different nodes depending on the content of the message and the state of the node (i.e. messages that it has previously received, its ID, any timers that it has set up, etc.). Furthermore, the topology of the network may be random, making it difficult to determine which nodes might receive each message. Thus, modelling a routing protocol becomes difficult if not impossible.

5.3.2 Physical Deployments

Physical deployments of nodes are not commonly used for examining the effectiveness of routing protocols, due to difficulties with keeping conditions constant. Since radio communications can be affected by the physical environment, node positions and battery voltages, it would be almost impossible to keep these parameters constant across a series of executions of an experiment.

5.3.3 Simulation

Simulation has the advantage of allowing experiments to be carried out in specific scenarios in reproducible circumstances. Even the random seeds that govern the unreliable nature of radio communications or the random readings from node sensors can be reproduced. This permits one routing protocol to be compared to another in identical circumstances, allowing a fair comparison.
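The role of the random seed can be illustrated with a toy model of packet loss. This sketch uses Python's standard `random` module rather than Castalia's own random number generation, and the function name and loss probability are assumptions chosen for the example.

```python
import random


def simulate_deliveries(seed, n_packets=10, loss_probability=0.2):
    """Return a list of booleans saying whether each packet was delivered."""
    rng = random.Random(seed)   # private generator, unaffected by global state
    return [rng.random() >= loss_probability for _ in range(n_packets)]


run_a = simulate_deliveries(seed=42)
run_b = simulate_deliveries(seed=42)

# The same seed reproduces exactly the same sequence of losses.
print(run_a == run_b)  # True
```

Two routing protocols evaluated against the same seeded sequence face identical channel behaviour, which is what makes the comparison fair.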

Another advantage of simulation is that the virtual environment can be fully controlled by the user. Simulations may also be easily carried out in a virtual environment that would be impractical to run in real life. For example, it would be infeasible to determine how a WSN responds to detecting the boundary of a real forest fire. However, in a virtual environment, the forest fire can be created many times over and controlled in any way the user desires. An unlimited number of nodes can be created and placed, either randomly or in a fixed pattern whereas in a real deployment, budgetary constraints would limit these factors. It is important to ensure that the simulations are executed on a single architecture in order to precisely reproduce each experiment. Running the same experiment on different architectures may produce unknown results, even if random seeds remain the same.

5.3.4 Chosen Methodology

Since physical deployments make reproducibility of experiments too difficult and an algebraic analysis of routing protocols may require unrealistic assumptions regarding the network, this thesis has carried out the experimental analysis of routing heuristics and protocols by the use of simulation.

Several choices of simulator were available. These included:

• TOSSIM [70] and the PowerTOSSIM extension [104] for power analysis,

• Atemu [52],

• ns-2 [34], and

• Castalia [90], an extension of the OMNeT++ platform [116].

TOSSIM [70] is a simulator for the TinyOS [49] operating system. The analysis of power usage is not possible with TOSSIM. However, the analysis can be accomplished with the use of the PowerTOSSIM extension [104]. Performing analysis of a program’s power usage using the PowerTOSSIM extension requires the program to be compiled for both the PC architecture and the architecture being simulated, i.e. a node. PowerTOSSIM then cross references the code produced for the WSN architecture with that generated for the PC and thereby calculates the number of processor cycles on the node required for each instruction block being executed on the PC. If the energy expenditure per processor cycle is known for the simulated architecture, the cost of processing can be calculated as the program is simulated, allowing a power analysis of the program to be carried out.

PowerTOSSIM also provides a number of plug-in replacements for TinyOS components that allow the program to be run on a PC rather than a node. For example, PowerTOSSIM provides replacement radio and sensor modules. These replacement modules permit power analysis by recording the number of times each module is accessed. As with computation, if the power consumption of each hardware component is known, the energy expended by the program as a result of using the simulated hardware can be calculated. The radio layer provided by TOSSIM seems to be limited to a probabilistic bit error model where the link between each pair of nodes must have a specified probability of error.
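The two accounting steps described above, processor cycles multiplied by the energy per cycle and module accesses multiplied by the energy per access, can be sketched as follows. This is not PowerTOSSIM's interface; the function name is hypothetical and every numeric value is a placeholder rather than a measured hardware figure.

```python
def estimate_energy_joules(cpu_cycles, joules_per_cycle,
                           access_counts, joules_per_access):
    """Combine processing cost and peripheral cost into one energy estimate."""
    cpu = cpu_cycles * joules_per_cycle
    peripherals = sum(count * joules_per_access[name]
                      for name, count in access_counts.items())
    return cpu + peripherals


# Placeholder figures for illustration only.
energy = estimate_energy_joules(
    cpu_cycles=1_000_000,
    joules_per_cycle=1e-9,
    access_counts={"radio": 50, "sensor": 10},
    joules_per_access={"radio": 2e-4, "sensor": 1e-5},
)
print(f"{energy:.4f} J")  # 0.0111 J
```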


Atemu [52] is an instruction level simulator, i.e. an emulator, which supports the AVR processor and several peripheral devices. As with TOSSIM, it allows the simulation of a WSN application without requiring the application to be rewritten or ported to another language. The two simulators differ in that TOSSIM emulates TinyOS applications in which node behaviour is bundled together with the TinyOS operating system. Conversely, code that is emulated by Atemu only includes an operating system if the programmer manually links it in. Atemu emulates each individual compiled instruction as it would be executed on a hardware platform. This is useful for the verification of code which is due to be installed on a WSN deployment, but as each instruction must be interpreted by the simulator, it can be up to 20 times slower than the code cross referencing approach of PowerTOSSIM [104].

ns-2 [34] is a discrete event simulator which is written in C++. It is widely used in the simulation of routing protocols. Unlike TOSSIM and Atemu, which only execute a piece of code, ns-2 also allows the user to define simulation parameters. The parameters allow the user to specify what happens to the network over a period of time. For example, the parameters might indicate the initial positions of nodes, the occurrence of an event, the movement of nodes or the sudden death of nodes. By separately defining the parameters, the programmer can examine the effect of a routing protocol under particular circumstances without having to hard-code this information in the application code being run. In ns-2, parameter files are written in TCL [85].

Castalia [90] is an extension to the OMNeT++ platform [116]. As with ns-2, the application code and simulation configurations are separated. Simulated code is written in C++ and Castalia provides a number of text-based configuration files, which can be modified to specify simulation parameters. The communications model of Castalia is flexible, and can be used to represent both ideal radios and realistic radios. The communications model was validated against real deployments in order to verify its correctness. Finally, Castalia has the advantage that it was specifically created with WSNs in mind.

Having examined these possible simulators, Castalia was chosen to carry out the simulations, for the reasons given below.

Firstly, the simulator was designed for the domain of WSNs. Consequently, it is likely that any assumptions made by the authors of Castalia would be appropriate for the simulations being executed. Furthermore, the simulator comes with a number of configuration files which are suitable for WSNs such as the TMote hardware and the CC2420 radio.

Secondly, Castalia provides a realistic radio model, which has been based on empirical data from WSN deployments. Consequently, the realistic radio model would be expected to be accurate and reflect real-life deployments. This latter point is particularly important, since routing protocols that perform well when communication is perfect often perform poorly when used with a more realistic mode of communication [109]. Consequently, it is desirable to examine routing protocols using radio models that are as realistic as possible. Finally, any results that are attained by using Castalia can be considered to be reliable since the simulator has been validated by its creators.

The experiments in this thesis were carried out using Castalia 1.3 [5] which extends OMNeT++ 3.3 [84] and was the most recent version available at the time.

5.4 Simulator Configuration

The following sections discuss the configuration of Castalia used in this thesis.

5.4.1 Node Connectivity

Node or network connectivity refers to the ability of nodes to communicate with one another. It reflects the communication network that exists, as opposed to the physical placement of nodes or the forced network topology of nodes, which might dictate how nodes are ordered to communicate with one another. A connectivity map can be used to represent the connectivities of nodes throughout the network.

In all experiments, nodes were configured into a single, large, unpartitioned network. To properly analyse the effect of an increasing network size, it was important that the deployed nodes acted as a single network rather than a collection of smaller independent networks. Nodes were placed in such a manner that each node was indirectly connected to every other node in the network, assuming an ideal model of communication existed between the nodes.

The Castalia 1.3 User Manual [4] states that the realistic wireless channel model used in Castalia can be expressed by Equation 5.6 where $PL(d)$ is the path (signal) loss at a distance $d$ metres and $\eta$ is the path (signal) loss exponent. $PL(d_0)$ is the known path loss at a known distance $d_0$ metres away from the transmission source and $X_\sigma$ is a zero-mean random variable with a standard deviation of $\sigma$.

$$PL(d) = PL(d_0) + 10\eta \log\left(\frac{d}{d_0}\right) + X_\sigma \qquad (5.6)$$

The average effect of the zero-mean random variable across all radio communications is zero and so shall be disregarded here. For the CC2420 radio, which is part of the TMote Sky [101], Castalia states a path loss of 55 dB at a range of 1 m and a path loss exponent of 2.4. Castalia further defines the receiver sensitivity of the CC2420 radio to be -95 dBm and the maximum transmission power to be 0 dBm. Therefore, for a node to receive another node’s transmission sent at maximum power, the path loss must be less than 95 dB. Using these values and Equation 5.6 to solve for $d$, the distance at which a signal cannot be received, provides Equation 5.7. Taking the logarithm to base 10, the solution to the equation is $d = 10^{5/3}$, which is approximately 46.42 m.

$$95 = 55 + 24 \log\left(\frac{d}{1}\right) \qquad (5.7)$$
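The rearrangement of the path loss model for $d$ can be checked numerically. The function below is a minimal sketch that assumes a base-10 logarithm and ignores the random term; its name and parameter names are illustrative and are not part of Castalia.

```python
def max_range_m(tx_power_dbm, sensitivity_dbm, pl_d0_db, d0_m, eta):
    """Distance at which the path loss equals the largest tolerable loss."""
    max_path_loss_db = tx_power_dbm - sensitivity_dbm
    exponent = (max_path_loss_db - pl_d0_db) / (10 * eta)
    return d0_m * 10 ** exponent


# CC2420 values quoted in the text: 0 dBm transmit power, -95 dBm sensitivity,
# a path loss of 55 at 1 m, and a path loss exponent of 2.4.
d = max_range_m(tx_power_dbm=0.0, sensitivity_dbm=-95.0,
                pl_d0_db=55.0, d0_m=1.0, eta=2.4)
print(round(d, 2))  # 46.42
```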

In each of the experiments in this thesis, each node was placed no greater than 46.42 metres from another node in order to theoretically ensure network connectivity. In practice, the nature of radio communication is largely random and prone to interference from other nodes as already discussed in Section 2.5.2. Therefore, this requirement may not ensure network connectivity in a realistic scenario.
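Whether a given placement forms a single unpartitioned network under the ideal model can be tested with a breadth-first search over all node pairs within range. This is a sketch under the assumption of a fixed 46.42 m range and two-dimensional coordinates; the function name and the example coordinates are hypothetical.

```python
import math
from collections import deque


def is_single_network(positions, max_range_m=46.42):
    """Return True if every node can reach every other node, possibly indirectly."""
    nodes = list(positions)
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for other in nodes:
            if other in seen:
                continue
            # Ideal model: two nodes are connected if they are within range.
            if math.dist(positions[current], positions[other]) <= max_range_m:
                seen.add(other)
                queue.append(other)
    return len(seen) == len(nodes)


chain = {"Z": (0.0, 0.0), "A": (40.0, 0.0), "B": (80.0, 0.0)}
split = {"Z": (0.0, 0.0), "A": (40.0, 0.0), "C": (200.0, 0.0)}

print(is_single_network(chain))  # True: B reaches Z through A
print(is_single_network(split))  # False: C is out of range of every other node
```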

5.4.2 MAC Protocol

The MAC protocol has a large effect on both the probability of successful radio communication and also the amount of energy consumed by the radio. In particular, a WSN MAC protocol may control the duty cycle, which is defined as the proportion of time that a node spends listening or receiving transmissions rather than in a low power state. A MAC protocol may also carry out carrier sensing, which attempts to reduce radio interference by waiting for an absence of local radio traffic before beginning a transmission. Other factors

In document Node reliance : an approach to extending the lifetime of wireless sensor networks (Page 129-140)