Application-Aware Data Collection in Wireless Sensor Networks


Xiaolin Fang*, Hong Gao*, Jianzhong Li*, and Yingshu Li+*

*School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

+Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA

{xlforu, honggao, lijzh}@hit.edu.cn, [email protected]

Abstract—Data sharing for data collection among multiple applications is an efficient way to reduce the communication cost of Wireless Sensor Networks (WSNs). This paper is the first to introduce the interval data sharing problem, which investigates how to transmit as little data as possible over the network while the transmitted data still satisfies the requirements of all applications. Unlike existing studies, in which each application requires a single data sample during each task, we study the problem in which each application requires a continuous interval of data sampling in each task. The proposed problem is a nonlinear nonconvex optimization problem. Because solving such a problem is too costly for resource-restricted sensor nodes, we provide a 2-factor approximation algorithm with time complexity $O(n^2)$ and memory complexity $O(n)$. A special instance of this problem is also analyzed. This special instance can be solved optimally by a dynamic programming algorithm with $O(n^2)$ time complexity and $O(n)$ memory complexity. We evaluate the proposed algorithms with TOSSIM, a widely used simulation tool for WSNs. Theoretical analysis and simulation results both demonstrate the effectiveness of the proposed algorithms.

I. INTRODUCTION

WSN deployment is a difficult and time-consuming task which requires much manpower or mechanical power. Once a network is deployed, it is expected to run for a long time without any human intervention. It is therefore inefficient to run only one application on a network. Sharing a network among multiple applications can significantly improve network utilization efficiency [1], [2], [3], [4], [5], [6], [7], [8]. Currently, it is popular for a set of applications to share one network for collecting data. Each node in the network samples at a particular frequency, and the sampled data is transmitted to the base station over multiple hops. All the applications prefer to receive all the sampled data. However, if all the sampled data is transmitted to the base station, the communication cost will be high and the network lifetime will be reduced. Fortunately, some applications may monitor the same physical attributes. In this case, a certain amount of data does not need to be transmitted to the base station repeatedly.

Under the above-mentioned scenario, carefully designed data sharing algorithms are desired. Tavakoli et al. [9] propose a data sampling algorithm for each node such that the sampled data can be shared by as many applications as possible. Meanwhile, the amount of sampled data at each node is reduced as far as possible, which lowers the overall communication cost. In [9], each application consists of a set of tasks, and in each task each node samples data once. As shown in Fig. 1, two applications run on the node. Task $T_1$ belongs to the first application and task $T_2$ to the second. $T_1$ and $T_2$ may overlap on the time axis, and both need to sample data once. A naive method is to sample data independently, e.g. $s_1$ is sampled by $T_1$ and $s_2$ is sampled by $T_2$ as shown in Fig. 1a, resulting in two samples $s_1$ and $s_2$. In [9], the authors design a greedy algorithm such that a single data sample can serve both applications, as shown in Fig. 1b.

Fig. 1: Data sampling for a time point. (a) Independent sampling. (b) Greedy sampling.

In many applications, data needs to be sampled for a continuous interval, as shown in Fig. 2, rather than at a particular time point. For example, railway monitoring systems that collect acoustic information [10], [11] need to sample data for a continuous interval. Volcanic and earthquake monitoring systems [12], [13], [14] have the same requirement in order to measure vibrations. Habitat monitoring systems for microclimate, plant physiology and animal behavior [15], [16], [17] need to record wind speed and take video of animal behaviors, which again requires sampling data for a continuous interval.

Fig. 2: Data sampling for a continuous interval. (a) Independent sampling. (b) Greedy sampling.

This paper studies the interval data sharing problem: how to reduce the overall length of the data sampling intervals that can be shared by multiple applications. We assume that multiple applications run on the same node and that each application consists of tasks. Each task requires data to be sampled for a continuous interval. In Fig. 2, $T_1$ belongs to the first application and $T_2$ to the second. Both tasks need to sample data continuously for an interval of length $s$. If the two tasks sample data independently, two intervals of length $s$ are sampled, as shown in Fig. 2a. However, one interval of length $s$ is enough if the starting points of data sampling of the two applications are arranged intelligently. The data sampling interval lengths of different applications may differ, and tasks of the same application may also have different data sampling interval lengths. The problem investigated in this paper is to minimize the overall data sampling interval length at each node while satisfying the needs of all applications.

We formulate the aforementioned problem as a nonlinear nonconvex optimization problem. Since sensor nodes are resource constrained, the cost of solving such a problem at each node is very high. Therefore, we propose a 2-factor greedy algorithm with time complexity $O(n^2)$ and memory complexity $O(n)$. We also consider a special instance where the data sampling interval lengths of all the tasks are the same. The special instance can be solved with a dynamic programming algorithm in polynomial time, with time complexity $O(n^2)$ and memory complexity $O(n)$. The contributions of this paper are as follows.

• This is the first work to study the interval data sharing problem, where each node samples data for a continuous interval instead of sampling a discrete data point. This problem is formulated as a nonlinear nonconvex programming problem.

• A greedy approximation algorithm is proposed to solve the problem so as to reduce the cost of solving the nonlinear nonconvex optimization problem at resource-restricted sensor nodes. The proposed algorithm is proved to be a 2-factor approximation algorithm. The time complexity of this algorithm is $O(n^2)$, and the memory complexity is $O(n)$.

• We also analyze a special instance of the interval data sharing problem. We give a dynamic programming algorithm which gives an optimal result in polynomial time. The time complexity is $O(n^2)$ and the memory complexity is $O(n)$.

• Extensive simulations were conducted to validate the correctness and effectiveness of our algorithms.

The rest of this paper is organized as follows. Section 2 reviews the related works. Section 3 formally defines the interval data sharing problem. Section 4 gives an algorithm to solve the problem and analyzes its approximation ratio. A special instance is investigated in Section 5, where a dynamic programming algorithm is presented to address it. Performance evaluations are shown in Section 6, and Section 7 concludes this paper.

II. RELATED WORKS

Multi-query optimization in database systems studies how to efficiently process queries with common sub-expressions [18], [19]. It aims at exploiting the common sub-expressions of SQL queries to reduce query cost, which is different from our problem. S. Krishnamurthy et al. [20] consider the problem of data sharing in data streaming systems for aggregate queries. They study min, max, sum and count-like aggregation queries. The stream is scanned at least once and is chopped into slices. Only the slices that overlap among multiple queries can be shared. The problems they study are different from ours. We aim to reduce the number of sensor samplings at each node, resulting in lower communication cost. Our problem differs in that we want to provide each application with enough sampled data while minimizing the total number of samplings. Query optimization in WSNs [2], [21] usually tries to find in-network schemes or distributed algorithms to reduce the communication cost of aggregation queries, whereas our work focuses on reducing the amount of data transmitted by each node.

The work most closely related to this paper is [9]. It studies the problem of data sharing among multiple applications, and assumes that each application only needs discrete data point samplings. In our problem, by contrast, the applications may require a continuous interval of data. The solution proposed in [9] cannot be applied to our problem. However, our solution can solve their problem.

III. PROBLEM DEFINITION

To make the problem clear, we first introduce the example shown in Fig. 3. We have two applications, and each application consists of many tasks. Application $A_1$ requires an interval of data of length $l_1$ during each task duration, and $A_2$ requires an interval of data of length $l_2$ during each task duration. The task durations of $A_1$ and $A_2$ differ, as shown in Fig. 3. Application $A_1$ consists of tasks $T_{11}, T_{12}, \cdots, T_{1i}$, and so on. Application $A_2$ includes tasks $T_{21}, T_{22}, \cdots, T_{2j}$, and so on. Take tasks $T_{11}$, $T_{12}$, $T_{13}$, $T_{21}$ and $T_{22}$ as examples. The optimal solution is shown in the bottom part of Fig. 3. Tasks $T_{11}$, $T_{12}$ and $T_{13}$ pick the intervals $I_{11}$, $I_{12}$ and $I_{13}$ respectively, all of length $l_1$. Tasks $T_{21}$ and $T_{22}$ pick the intervals $I_{21}$ and $I_{22}$ respectively, both of length $l_2$. The optimal solution has length $s_1 + s_2$ in this example, as shown in the bottom part of Fig. 3, where the tasks are sorted in ascending order of their end times.

Sensor data within the overlap of multiple tasks can be shared by these tasks. We aim at minimizing the overall length of the data intervals. Before describing our problem, we give some preliminary definitions which will be used later.

Fig. 3: Interval data sampling for multi-applications

Definition 1. Define $I \uplus I'$ as the union of two intervals or interval sets $I$ and $I'$. For example, $[1,5] \uplus [3,7] = [1,7]$, and $\{[1,3],[5,7]\} \uplus [3,5] = [1,7]$.

Definition 2. Define $I \Cap I'$ as the overlap of two intervals $I$ and $I'$. For example, $[1,5] \Cap [3,7] = [3,5]$.

Definition 3. Define $|I|$ as the length of the interval $I$, or the length of the union of the intervals in the set $I$. For example, $|[1,5]| = 4$, $|[1,5] \uplus [3,7]| = 6$, and $|[1,5] \Cap [3,7]| = 2$.

Definition 4. $I \sqsubseteq I'$ means that interval $I$ is a sub-interval of $I'$. For example, $[2,3] \sqsubseteq [1,5]$.
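The four definitions can be sketched as small helper functions. This is only an illustrative sketch: the function names and the tuple representation of an interval are assumptions introduced here, not notation from the paper.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # closed interval [start, end] (assumed representation)


def union(intervals: List[Interval]) -> List[Interval]:
    """Merge intervals into a sorted list of disjoint intervals (Definition 1)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Touching or overlapping the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Overlap of two intervals, or None when they share no point (Definition 2)."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def length(intervals: List[Interval]) -> float:
    """Total length of the union of the given intervals (Definition 3)."""
    return sum(end - start for start, end in union(intervals))


def is_sub_interval(inner: Interval, outer: Interval) -> bool:
    """True when inner lies entirely within outer (Definition 4)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]
```

The examples given in the definitions can be checked directly with these helpers, e.g. `length([(1, 5), (3, 7)])` evaluates the length of $[1,5] \uplus [3,7]$.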

Given a set of $n$ tasks $T = \{T_i\}$, $i = 1, 2, \cdots, n$, each task $T_i$ is a three-tuple $T_i = \langle b_i, e_i, l_i \rangle$, where $b_i$ denotes the beginning time, $e_i$ represents the end time, and $l_i$ means that $T_i$ needs an interval of data of length $l_i$. It is assumed that $l_i \le e_i - b_i$. The problem is to find a continuous sub-interval $I_i$ in interval $[b_i, e_i]$, i.e. $I_i \sqsubseteq [b_i, e_i]$, for every task, satisfying $|I_i| = l_i$, so that the length of the union of all the sub-intervals on the time axis is minimized, i.e. $|\biguplus_{i=1}^{n} I_i|$ is minimum. Note that each sub-interval $I_i$ is continuous.

The bottom part of Fig. 3 illustrates an example. Since sensor nodes have limited communication and computational capabilities, we want to find a set of sub-intervals $I_{11}$, $I_{21}$, $I_{12}$, $I_{22}$ and $I_{13}$ for tasks $T_{11}$, $T_{21}$, $T_{12}$, $T_{22}$ and $T_{13}$ respectively, such that $|I_{11} \uplus I_{21} \uplus I_{12} \uplus I_{22} \uplus I_{13}|$ is minimum. In the example shown in Fig. 3, the optimal solution has length $s_1 + s_2$: all the tasks can derive the data they need from the two data intervals $s_1$ and $s_2$.

We now formally define the interval data sharing problem.

Definition 5. Given a set of $n$ tasks $T$, where each task $T_i$ is a three-tuple $T_i = \langle b_i, e_i, l_i \rangle$ with a beginning time $b_i$, an end time $e_i$, and a data sampling interval length $l_i$, the problem is to find a continuous sub-interval $I_i$ for each task so as to

$$\min \; \Big| \biguplus_{i=1}^{n} I_i \Big| \tag{1}$$

$$\text{s.t.} \quad I_i \sqsubseteq [b_i, e_i], \quad i = 1, 2, \cdots, n \tag{2}$$

$$|I_i| = l_i, \quad i = 1, 2, \cdots, n \tag{3}$$

The objective function of this problem is nonlinear. So if $b_i$, $e_i$ and $l_i$ are real numbers, the problem is a nonlinear programming problem, which has no efficient universal solution [22]. It is easy to find that the objective function is nonconvex [23]. Several methods are available for solving nonconvex optimization problems. For example, one approach is to use special formulations of linear programming problems. Another method involves branch and bound techniques, where the program is divided into subclasses to be solved with convex or linear approximations that form a lower bound on the overall cost within the subdivision. However, all these methods have high computational complexity, which makes them impractical to implement on sensor nodes. Since digital signals are discrete, the data intervals can be regarded as integer sequences. Therefore, $b_i$, $e_i$ and $l_i$ can be regarded as integers. The integer variables make the problem a nonlinear integer programming problem [24], [25], which is hard to solve.
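To make the objective (1) and the constraints (2) and (3) concrete, the following is a minimal brute-force sketch for tiny instances, assuming integer times as suggested by the discretization argument above. It is not one of the paper's algorithms; it only enumerates every feasible choice of sub-intervals, and the names `brute_force` and `union_length` are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple

Task = Tuple[int, int, int]  # (b_i, e_i, l_i) with l_i <= e_i - b_i


def union_length(intervals: List[Tuple[int, int]]) -> int:
    """Length of the union of closed intervals with integer endpoints."""
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def brute_force(tasks: List[Task]) -> Tuple[int, List[Tuple[int, int]]]:
    """Try every integer start time for every task (exponential; tiny inputs only)."""
    # Constraint (2) and (3): I_i = [s, s + l_i] with b_i <= s <= e_i - l_i.
    choices = [range(b, e - l + 1) for b, e, l in tasks]
    best_len: Optional[int] = None
    best_pick: List[Tuple[int, int]] = []
    for starts in product(*choices):
        pick = [(s, s + l) for s, (_, _, l) in zip(starts, tasks)]
        cur = union_length(pick)  # objective (1)
        if best_len is None or cur < best_len:
            best_len, best_pick = cur, pick
    return best_len, best_pick
```

Such an exhaustive search is only useful as a reference for checking faster algorithms on small hand-made inputs; its running time grows exponentially with the number of tasks.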

IV. A 2-FACTOR APPROXIMATION ALGORITHM

A naive method is to initiate a continuous data sampling interval at the beginning time of each task independently. However, this method results in a large amount of data. In this section, we present a greedy algorithm which is a 2-factor approximation algorithm for the interval data sharing problem. Before presenting the approximation algorithm, we give a solution for the special case where all tasks overlap with each other.

A. Tasks Overlapped with Each Other

For ease of understanding, we first define the term satisfied as follows.

Definition 6. We say that an interval $I$ is satisfied for a task $T_i$ if $|I \Cap [b_i, e_i]| \ge l_i$. An interval set $S$ is satisfied for a task $T_i$ if there exists an interval $I$ in $S$ such that $|I \Cap [b_i, e_i]| \ge l_i$.

If all the tasks overlap with each other, then the interval data sharing problem can be solved in polynomial time. An algorithm is presented as follows.

Step 1: Sort the tasks in ascending order by their end times.

Step 2: Pick the sub-interval of length $l_1$ at the end of the first task $T_1$, i.e. pick the sub-interval $[e_1 - l_1, e_1]$.

Step 3: Pick a sub-interval for each task from the second to the last. Take $T_i$ as an example. If the union of the picked sub-intervals is satisfied for $T_i$, do nothing and continue to pick a sub-interval for the next task $T_{i+1}$. If it is not satisfied for $T_i$, extend forward from the tail of the picked sub-intervals. If it is still not satisfied for $T_i$, extend backward from the head of the picked sub-intervals.

The pseudocode for tasks that overlap with each other is given in Algorithm 1.

Algorithm 1: SOLVE-OVERLAP($T$)
Input: $T = \{T_1, T_2, \cdots, T_n\}$, $T_i = \langle b_i, e_i, l_i \rangle$ for $i = 1, 2, \cdots, n$; $[b_i, e_i] \Cap [b_j, e_j] \ne \emptyset$, $\forall i, j = 1, 2, \cdots, n$
Output: A minimum interval $I$ that is satisfied for all tasks.
    Sort tasks in ascending order by end time. Assume that the sorted task set is $T = \{T_{k_1}, T_{k_2}, \cdots, T_{k_n}\}$
    $s = e_{k_1} - l_{k_1}$;
    $e = e_{k_1}$;
    for $i$ from 2 to $n$ do
        if $[s, e]$ is satisfied for $T_{k_i}$ then
            continue;
        else
            let $e = \min\{s + l_{k_i}, e_{k_i}\}$;
            if $[s, e]$ is satisfied for $T_{k_i}$ then
                continue;
            else
                let $s = e - l_{k_i}$;
    return $I = [s, e]$;

Fig. 4: Example of tasks overlapped with each other

Take Fig. 4 as an example. Tasks $T_1$, $T_2$ and $T_3$ overlap with each other. $T_1$ needs a data interval of length $l_1 = 4$, $T_2$ needs an interval of length $l_2 = 3$, and $T_3$ needs an interval of length $l_3 = 9$. First, the tasks are sorted in ascending order by their end times. Second, the sub-interval of length 4 at the end of $T_1$ is picked, so the picked interval for $T_1$ is $I = [7,11]$. Third, $I$ is satisfied for task $T_2$, so nothing is done for $T_2$. Fourth, $I$ is not satisfied for task $T_3$, so $I$ is extended forward to the end time of $T_3$, giving $I = [7,14]$. Since $I$ is still not satisfied for $T_3$, it is then extended backward from the head of the picked interval to get $I = [5,14]$, which is satisfied for all three tasks. The time complexity is $O(n \log n)$ due to the sorting step. If the tasks are pre-sorted, the time complexity is $O(n)$.
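The steps of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation. Two things are assumptions made here: the beginning times in the usage example (Fig. 4 only fixes the lengths 4, 3, 9 and the resulting intervals), and the forward extension, which is written as $\min\{\max(s, b_i) + l_i, e_i\}$ instead of $\min\{s + l_i, e_i\}$ so that the tail never moves backward when a task starts after $s$.

```python
from typing import List, Tuple

Task = Tuple[int, int, int]  # (b_i, e_i, l_i)


def satisfied(s: int, e: int, task: Task) -> bool:
    """Definition 6: [s, e] overlaps the task window [b_i, e_i] by at least l_i."""
    b, end, l = task
    return min(e, end) - max(s, b) >= l


def solve_overlap(tasks: List[Task]) -> Tuple[int, int]:
    """Sketch of Algorithm 1; assumes every pair of task windows overlaps."""
    ordered = sorted(tasks, key=lambda t: t[1])  # ascending end time
    _, e1, l1 = ordered[0]
    s, e = e1 - l1, e1  # sub-interval at the end of the first task
    for task in ordered[1:]:
        if satisfied(s, e, task):
            continue
        b, end, l = task
        # Extend forward from the tail (assumed variant, see lead-in).
        e = min(max(s, b) + l, end)
        if not satisfied(s, e, task):
            # Still too short: extend backward from the head.
            s = e - l
    return s, e
```

With hypothetical windows `[(3, 11, 4), (6, 12, 3), (2, 14, 9)]`, chosen to match the lengths and intervals quoted for Fig. 4, the sketch passes through $[7,11]$ and $[7,14]$ before returning $[5,14]$.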

The optimal interval $I = [s, e]$ for tasks that overlap with each other can also be obtained by another method, from the following equations.

$$s = \min_{i=1}^{n} \{e_i - l_i\} \tag{4}$$

$$e = \max\Big\{ \max_{i=1}^{n} \{b_i + l_i\},\; \max_{i=1}^{n} \{s + l_i\},\; \min_{i=1}^{n} \{e_i\} \Big\} \tag{5}$$

The second method is described in Algorithm 2, which gives the same result as Algorithm 1. This algorithm consists of two phases. Let us take Fig. 4 as an example again. The first phase finds the beginning time $s$. In this example, $s$ is the minimum $e_i - l_i$, and it is easy to find that $s = 5$. In the second phase, we find that $e = 14$, which is the maximum $s + l_i$ in this example. Thus, the optimal interval is obtained as $[5,14]$. As we can see, the case where tasks overlap with each other can be solved with time complexity $O(2n) = O(n)$ by Algorithm 2. This algorithm does not require a sorting step. However, if the tasks are pre-sorted, Algorithm 1 is no worse than Algorithm 2. As shown in the later section, our approximation algorithm pre-sorts the tasks, so either algorithm can be used as a sub-process in the approximation algorithm that follows.

Algorithm 2: SOLVE-OVERLAP-B($T$)
Input: $T = \{T_1, T_2, \cdots, T_n\}$, $T_i = \langle b_i, e_i, l_i \rangle$ for $i = 1, 2, \cdots, n$; $[b_i, e_i] \Cap [b_j, e_j] \ne \emptyset$, $\forall i, j = 1, 2, \cdots, n$
Output: A minimum interval $I$ that is satisfied for all tasks.
    $s = \infty$;
    $e_{min} = \infty$;
    for $i$ from 1 to $n$ do
        if $s > e_i - l_i$ then
            $s = e_i - l_i$;
        if $e_{min} > e_i$ then
            $e_{min} = e_i$;
    $e = e_{min}$;
    for $i$ from 1 to $n$ do
        if $e < b_i + l_i$ then
            $e = b_i + l_i$;
        if $e < s + l_i$ then
            $e = s + l_i$;
    return $I = [s, e]$;
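The two phases of Algorithm 2 translate almost line by line into Python. The sketch below follows Equations (4) and (5); the function name and the task tuples in the usage note are assumptions for illustration.

```python
from typing import List, Tuple

Task = Tuple[int, int, int]  # (b_i, e_i, l_i)


def solve_overlap_b(tasks: List[Task]) -> Tuple[int, int]:
    """Sketch of Algorithm 2; assumes every pair of task windows overlaps."""
    # Phase 1: s is the smallest e_i - l_i (Equation (4)); also find min e_i.
    s = min(e - l for _, e, l in tasks)
    e_min = min(e for _, e, _ in tasks)
    # Phase 2: e is the largest of min e_i, b_i + l_i and s + l_i (Equation (5)).
    e = e_min
    for b, _, l in tasks:
        e = max(e, b + l, s + l)
    return s, e
```

For the hypothetical windows `[(3, 11, 4), (6, 12, 3), (2, 14, 9)]`, whose lengths match Fig. 4, phase 1 yields $s = 5$ and phase 2 yields $e = 14$, in line with the values quoted in the text. No sorting is involved, so the sketch makes two linear passes over the tasks.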

B. 2-factor Approximation Algorithm

We now present our greedy approximation algorithm. First, sort all tasks by end time in ascending order. Second, identify the subset of tasks that overlap with $T_1$; these tasks also overlap with each other. Find the minimum interval that can be shared by the tasks identified in this step, using Algorithm 1 described earlier. Third, remove the identified tasks, including $T_1$. Repeat the second and third steps for the remaining tasks until all tasks are removed. Algorithm 3 gives the detailed process.

Fig. 5 illustrates the process of the greedy approximation algorithm. The five tasks are sorted in ascending order by end time. In the first step, tasks $T_1$, $T_2$ and $T_5$ are identified as a subset of tasks that overlap with each other. Note that, if the tasks are sorted by end time, all the tasks which overlap with $T_1$ also overlap with each other. Algorithm 1 can now be used to compute the interval that is satisfied for these three tasks. After that, the three tasks $T_1$, $T_2$ and $T_5$ are removed. In the second step, $T_3$ and $T_4$ are identified as a subset of tasks that overlap with each other. Algorithm 1 is employed again to compute the interval that is satisfied for these two tasks. The union of the two intervals found is the final result returned by Algorithm 3 for this example.


Algorithm 3: GREEDY-APPROX($T$)
Input: $T = \{T_1, T_2, \cdots, T_n\}$, $T_i = \langle b_i, e_i, l_i \rangle$ for $i = 1, 2, \cdots, n$
Output: A set of intervals $I$ that is satisfied for all tasks.
    Sort tasks in ascending order by end time. Assume that the sorted task set is $T = \{T_{k_1}, T_{k_2}, \cdots, T_{k_n}\}$;
    $I = \emptyset$;
    while $T \ne \emptyset$ do
        $T' = \emptyset$;
        let the first task in $T$ be $T_{k_f}$;
        let the set of tasks in $T$ which overlap with $T_{k_f}$ be $T_o$;
        add $T_{k_f}$ and $T_o$ into $T'$;
        /* note that tasks in $T'$ overlap with each other. */
        $I' =$ SOLVE-OVERLAP($T'$);
        $I = I \uplus I'$;
        remove $T'$ from $T$;
        /* note that $T$ is still sorted after the removing step. */
    return $I$;
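A compact Python sketch of the greedy procedure follows. It is an illustration under stated assumptions: the sub-process uses the closed form of Equations (4) and (5), two windows are taken to overlap when the later one begins strictly before the earlier one ends, and the task tuples in the usage note are invented for the example.

```python
from typing import List, Tuple

Task = Tuple[int, int, int]  # (b_i, e_i, l_i)
Interval = Tuple[int, int]


def solve_group(group: List[Task]) -> Interval:
    """Shared interval for pairwise-overlapping tasks, via Equations (4) and (5)."""
    s = min(e - l for _, e, l in group)
    end = min(e for _, e, _ in group)
    for b, _, l in group:
        end = max(end, b + l, s + l)
    return s, end


def greedy_approx(tasks: List[Task]) -> List[Interval]:
    """Sketch of Algorithm 3: peel off the tasks overlapping the earliest-ending one."""
    remaining = sorted(tasks, key=lambda t: t[1])  # ascending end time
    result: List[Interval] = []
    while remaining:
        first, rest = remaining[0], remaining[1:]
        # Sorted by end time, so a task overlaps the first one
        # exactly when it begins before the first one ends.
        group = [first] + [t for t in rest if t[0] < first[1]]
        remaining = [t for t in rest if t[0] >= first[1]]  # order is preserved
        result.append(solve_group(group))
    return result
```

For the hypothetical input `[(0, 5, 2), (3, 8, 2), (10, 15, 3), (12, 20, 4)]` the first iteration groups the first two tasks and the second iteration groups the last two. The returned intervals may still overlap one another, so the total sampled length is the length of their union.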

Theorem 1. Algorithm 3 is a 2-factor approximation algorithm.

Proof: Assume that there are $m$ tasks left, i.e. $T = \{T_{i_1}, T_{i_2}, \cdots, T_{i_m}\}$, sorted in ascending order by end time. $T_{i_1}$ needs an interval $I_{i_1}$ of length at least $l_{i_1}$. Assume that a task $T_{i_j}$ overlaps with $T_{i_1}$ and needs an interval $I_{i_j}$ of length $l_{i_j}$. Algorithm 3 derives a result interval which contains $I_{i_1}$ and $I_{i_j}$. In the worst situation, Algorithm 3 may derive another interval $I$ whose length is no shorter than $l_{i_j}$ in the later steps, i.e. $|I| \ge l_{i_j}$. In this case $I$ is satisfied for $T_{i_j}$ and also for some later tasks. So in the worst situation, Algorithm 3 gives a result of length $|I_{i_1} \uplus I_{i_j} \uplus I|$ for tasks $T_{i_1}$ and $T_{i_j}$. However, there may exist an optimal solution in which $I_{i_1}$ is satisfied for $T_{i_1}$, and $I$ is satisfied for $T_{i_j}$ and some later tasks. Thus, an optimal solution may derive a result of length $|I_{i_1} \uplus I|$ for tasks $T_{i_1}$ and $T_{i_j}$. Therefore, we have

\begin{align}
\frac{|I_{i_1} \uplus I_{i_j} \uplus I|}{|I_{i_1} \uplus I|} &\le \frac{|I_{i_1} \uplus I_{i_j}| + |I|}{|I_{i_1}| + |I|} \tag{6} \\
&\le \frac{|I_{i_1}| + |I_{i_j}| + |I|}{|I_{i_1}| + |I|} \tag{7} \\
&\le \frac{|I_{i_1}| + |I_{i_j}| + |I_{i_j}|}{|I_{i_1}| + |I_{i_j}|} \tag{8} \\
&= \frac{l_{i_1} + 2 l_{i_j}}{l_{i_1} + l_{i_j}} \tag{9} \\
&\le 2 \tag{10}
\end{align}

Step (8) uses $|I| \ge l_{i_j} = |I_{i_j}|$: the ratio in (7) has the form $(A + x)/(B + x)$ with $A \ge B$ and $x = |I|$, which does not increase as $x$ grows.
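The final step of the chain, from (9) to (10), can be spelled out as a short worked equation. The only assumption is that the lengths are non-negative and $l_{i_j} > 0$.

```latex
\[
\frac{l_{i_1} + 2\,l_{i_j}}{l_{i_1} + l_{i_j}}
  = \frac{(l_{i_1} + l_{i_j}) + l_{i_j}}{l_{i_1} + l_{i_j}}
  = 1 + \frac{l_{i_j}}{l_{i_1} + l_{i_j}}
  \le 2,
\qquad \text{since } 0 < l_{i_j} \le l_{i_1} + l_{i_j}.
\]
```

The bound is approached when $l_{i_1}$ is very small compared with $l_{i_j}$, which is the regime in which the ratio of 2 becomes tight.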

A tight example is shown in Fig. 6. Algorithm 3 returns a result of length $2l$ as shown in Fig. 6a, while the optimal result is of length $\varepsilon + l$ as shown in Fig. 6b. Thus, $\lim_{\varepsilon \to 0} \frac{2l}{\varepsilon + l} = 2$. The time complexity of Algorithm 3 is $O(n^2)$ due to the step of identifying the tasks which overlap with the first remaining task in each iteration.

Fig. 6: A tight example of Algorithm 3. (a) Greedy result. (b) Optimal result.

Property 1. Let $T_m$ be the task with the minimum end time, i.e. $e_m = \min_{i=1}^{n} e_i$. Then picking the sub-interval $[e_m - l_m, e_m]$ for $T_m$ does not lead to a worse result.

Proof: If the sub-interval $I_m = [e_m - l_m, e_m]$ is picked, there are two possible cases: no task overlaps with $T_m$, or some other task overlaps with $T_m$.

In the first case, it is apparent that $[e_m - l_m, e_m]$ is a best solution.

In the second case, as shown in Fig. 7, three sub-cases exist. Let $I_{opt}$ be an optimal solution. Let $I'$ be the union of the picked sub-intervals excluding $I_m$ in an optimal solution, i.e. $I_{opt} = I_m \uplus I'$. In the first sub-case, $I'$ covers $I_m$, so $|I_{opt}| = |I'|$. In this sub-case, $I_m$ does not contribute to the optimal result; thus, any sub-interval of length $l_m$ in $[b_m, e_m] \Cap I'$ is a good choice, and $[e_m - l_m, e_m]$ is one of the choices. In the second sub-case, $I'$ overlaps with $I_m$, so $|I_{opt}| = |I_m \uplus I'|$. Here $[e_m - l_m, e_m]$ is the best choice, which reduces the length of the union of the picked intervals. In the third sub-case, $I'$ does not overlap with $I_m$, so $|I_{opt}| = |I_m| + |I'|$. Any sub-interval of length $l_m$ in $[b_m, e_m]$ is a choice, and $[e_m - l_m, e_m]$ is one of the choices. Thus the property is proved.

Fig. 7: Task with minimum end time. (a) $I'$ covering $I_m$. (b) $I'$ overlapping with $I_m$. (c) $I'$ not overlapping with $I_m$.

V. MULTIPLE TASKS WITH SAME DATA SAMPLING INTERVAL LENGTH

In this section, we study a special instance of the interval data sharing problem where the length of the data sampling interval is the same for all tasks. Unlike the general problem, this special instance can be solved optimally with a dynamic programming algorithm.

Given a set of tasks $T = \{T_1, T_2, \cdots, T_n\}$ and a positive integer $l$, each task $T_i$ is denoted as $T_i = \langle b_i, e_i, l \rangle$, where $b_i$ is the beginning time and $e_i$ is the end time. The problem is to pick a continuous sub-interval of length $l$ for each task $T_i$ in $[b_i, e_i]$, so that the length of the union of all the picked sub-intervals on the time axis is minimized.

Definition 7. In the same data sampling interval length situation, a task $T_i$ covers $T_j$ if $[b_j, e_j]$ is a sub-interval of $[b_i, e_i]$.

In the same data sampling interval length situation, tasks which cover some other task can be removed. This is because any interval that is satisfied for the covered shorter task must be satisfied for the longer task. In Fig. 8a, task $T_2$ covers $T_1$. If they have the same data sampling interval length, then any interval $I$ that is satisfied for $T_1$ is satisfied for task $T_2$. Therefore, we do not have to consider task $T_2$, and $T_2$ can be removed in our algorithm. As shown in Fig. 8b, we get the same result after removing $T_2$.

Fig. 8: Example of covering. (a) Before removing. (b) After removing.

Property 2. Let the data sampling interval length of all tasks be the same. If $T_i$ covers $T_j$, i.e. $[b_j, e_j] \sqsubseteq [b_i, e_i]$, for any $i, j = 1, 2, \cdots, n$, then any interval that is satisfied for $T_j$ is satisfied for $T_i$.
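Property 2 suggests a simple preprocessing pass that drops every task whose window covers another one. A minimal sketch is given below, assuming tasks are represented by their windows only (the common length $l$ plays no role here). The tie-breaking rule for equal end times is an assumption added for this sketch.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (b_i, e_i); all tasks share the same length l


def remove_covering(windows: List[Window]) -> List[Window]:
    """Drop every task whose window covers the window of another task."""
    kept: List[Window] = []
    b_max = float("-inf")
    # Ascending end time; among equal end times the later-starting window
    # comes first, so that the wider one is recognized as covering it.
    for b, e in sorted(windows, key=lambda w: (w[1], -w[0])):
        if b <= b_max:
            # Begins no later and ends no earlier than a kept task: it covers it.
            continue
        kept.append((b, e))
        b_max = b
    return kept
```

After this pass the kept windows are sorted by end time and their beginning times are strictly increasing, which is the property the dynamic program relies on.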

After removing the tasks which cover other tasks, the problem can be solved with a dynamic programming algorithm. Let $T' = \{T'_1, T'_2, \cdots, T'_m\}$ be the set of tasks none of which covers another task. Assume that $T'_1, T'_2, \cdots, T'_m$ are sorted in ascending order by end time. Let $I(i, j)$ be the interval that is satisfied for both $T'_i$ and $T'_j$, where $T'_i$ overlaps with $T'_j$, i.e. $T'_i \Cap T'_j \ne \emptyset$, $i \le j$. We have

$$I(i, j) = \begin{cases} [e'_i - l,\; e'_i] & \text{if } b'_j \le e'_i - l, \\ [e'_i - l,\; b'_j + l] & \text{if } e'_i - l < b'_j < e'_i. \end{cases} \tag{11}$$

It is easy to get $I(i, j)$ in Equation (11) from Fig. 9. There are only two cases when $T'_i$ overlaps with $T'_j$. In the first case, $T'_j$ covers interval $[e'_i - l, e'_i]$ as shown in Fig. 9a, and then $I(i, j) = [e'_i - l, e'_i]$. In the second case, $T'_j$ overlaps with interval $[e'_i - l, e'_i]$ as shown in Fig. 9b, and then $I(i, j) = [e'_i - l, b'_j + l]$.

Fig. 9: Illustration of computing $I(i, j)$. (a) Case 1. (b) Case 2.
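Equation (11) can be written as a small function. The sketch below is illustrative only; the function name is hypothetical, and the windows used in the usage note are assumptions chosen so that the results agree with the $I(i, j)$ values listed for the example with $l = 4$.

```python
from typing import Tuple

Window = Tuple[int, int]  # (b'_i, e'_i)


def shared_interval(wi: Window, wj: Window, l: int) -> Tuple[int, int]:
    """I(i, j) from Equation (11); assumes e'_i <= e'_j and overlapping windows."""
    _, e_i = wi
    b_j, _ = wj
    if b_j <= e_i - l:
        return e_i - l, e_i        # case 1: T'_j covers [e'_i - l, e'_i]
    if b_j < e_i:
        return e_i - l, b_j + l    # case 2: T'_j begins inside [e'_i - l, e'_i]
    raise ValueError("windows do not overlap")
```

With the assumed windows `(0, 7)`, `(4, 9)`, `(6, 16)`, `(11, 18)` and `l = 4`, `shared_interval((0, 7), (4, 9), 4)` falls into case 2, while `shared_interval((6, 16), (11, 18), 4)` falls into case 1.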

$I(i, j)$ is satisfied for all tasks $T'_i, T'_{i+1}, \cdots, T'_j$. This is because, if $T'_i$ overlaps with $T'_j$, then $T'_i$ also overlaps with the tasks from $T'_{i+1}$ to $T'_{j-1}$, since the tasks are sorted in ascending order by end time.

Let $f(i)$ be the result with the minimum length of the union for tasks $T'_i, T'_{i+1}, \cdots, T'_m$, where $[e'_i - l, e'_i]$ is picked. Let $g(i)$ be the index $x$ which results in the minimum length of the union for tasks $T'_i, T'_{i+1}, \cdots, T'_m$. Then $g(i)$ and $f(i)$ can be expressed as follows.

$$g(i) = \begin{cases} \arg\min\limits_{T'_i \text{ overlaps } T'_x,\; i \le x \le m} \{|I(i, x) \uplus f(x+1)|\} & 1 \le i < m \\ m & i = m \end{cases} \tag{12}$$

$$f(i) = \begin{cases} I(i, g(i)) \uplus f(g(i) + 1) & 1 \le i < m \\ [e'_m - l,\; e'_m] & i = m \\ \emptyset & i > m \end{cases} \tag{13}$$

An example is shown in Fig. 10, and the process of this example is presented in Table I. By Equation (11), we derive $I(i, j)$ in Table Ia, and $f(i)$ is obtained in Table Ib from Equations (12) and (13). As represented in Equation (13), we get $f(5) = \emptyset$ first. By the definition of $f(i)$, $f(4) = I(4,4) = [e'_4 - l, e'_4]$. Then $f(3)$ is the shorter of the unions $I(3,3) \uplus f(4)$ and $I(3,4) \uplus f(5)$, thus we get $f(3) = I(3,4)$. After that, $f(2)$ is the shorter of the unions $I(2,2) \uplus f(3)$ and $I(2,3) \uplus f(4)$, and we obtain $f(2) = I(2,2) \uplus f(3)$. Finally, $f(1)$ is the shortest of the unions $I(1,1) \uplus f(2)$, $I(1,2) \uplus f(3)$ and $I(1,3) \uplus f(4)$, and we get $f(1) = I(1,2) \uplus f(3)$. The dynamic programming algorithm is described in Algorithm 4.

Algorithm 4: SOLVE-COMMON-WEIGHT($T$)
Input: $T = \{T_1, T_2, \cdots, T_n\}$, $T_i = \langle b_i, e_i, l \rangle$ for $i = 1, 2, \cdots, n$;
Output: A set of intervals $I$ that is satisfied for all tasks.
1: Sort tasks in ascending order by end time. Assume that the sorted task set is $T = \{T_{k_1}, T_{k_2}, \cdots, T_{k_n}\}$;
2: $b_{max} = -\infty$;
   /* The following loop removes the tasks which cover some other tasks. */
3: for $i = 1$ to $n$ do
4:     if $b_{k_i} \le b_{max}$ then
5:         remove $T_{k_i}$ from $T$;
6:     else
7:         $b_{max} = b_{k_i}$;
8: Let the task set after the removing step be $T' = \{T'_1, T'_2, \cdots, T'_m\}$, assume that it is sorted in ascending order by end time;
9: compute $I(i, j)$ for $1 \le i \le j \le m$;
10: $f(m+1) = \emptyset$;
11: for $i = m$ to 1 do
12:     $l_{min} = \infty$;
13:     $g(i) = -1$;
14:     for $x = i$ to $m$ do
15:         if $b'_x \ge e'_i$ then
16:             break; /* do nothing for $T'_x$ that does not overlap with $T'_i$ */
17:         if $|I(i, x) \uplus f(x+1)| < l_{min}$ then
18:             $g(i) = x$;
19:             $l_{min} = |I(i, x) \uplus f(x+1)|$;
20:     $f(i) = I(i, g(i)) \uplus f(g(i) + 1)$;
21: $I = f(1)$;
22: return $I$;

In Algorithm 4, the tasks are sorted in ascending order by end time in line 1. Lines 2-7 remove the tasks which cover other tasks. $f(i)$ is computed in lines 10-20. Line 15 checks whether $T'_x$ overlaps with $T'_i$. If $T'_x$ does not overlap with $T'_i$, nothing is done and the loop is exited, because none of the later tasks will overlap with $T'_i$. If $T'_x$ overlaps with $T'_i$, the algorithm records the best index $g(i)$ and the minimum result. The final result is $f(1)$.

Fig. 10: An example for Algorithm 4 (four tasks $T'_1$ to $T'_4$ with $l = 4$)

In lines 10-20 of Algorithm 4, it seems that $O(n)$ memory is needed to record one $f(i)$, and $O(n^2)$ memory to record all $f(i)$. As memory is a restricted resource at each sensor node, a way to reduce the memory usage is needed. In fact, we only want to compute $f(1)$, so not every $f(i)$ has to be recorded. It suffices to record $g(i)$ and $|f(i)|$, and $f(1)$ can be recovered from $g(i)$ at the end of the algorithm, once every $g(i)$ is found. Therefore, lines 10-22 in Algorithm 4 can be replaced with Algorithm 5.

In Algorithm 5, lines 7-11 compute the overlap length of $I(i, x)$ and $f(x+1)$, so as to get the length of $I(i, x) \uplus f(x+1)$ in line 12. The memory complexity of Algorithm 5 is $O(n)$. In summary, the special instance where all tasks have the same data sampling interval length can be solved with time complexity $O(n^2)$ and memory complexity $O(n)$.

Algorithm 5: COMPUTE-$f(1)$
Input: $T' = \{T'_1, T'_2, \cdots, T'_m\}$, $T'_i = \langle b'_i, e'_i, l \rangle$ for $i = 1, 2, \cdots, m$;
Output: A set of intervals $I$ that is satisfied for all tasks.
1: $|f(m+1)| = 0$;
2: for $i = m$ to 1 do
3:     $l_{min} = \infty$;
4:     $g(i) = -1$;
5:     for $x = i$ to $m$ do
6:         if $b'_x < e'_i$ then
7:             assume $I(i, x) = [s_{ix}, e_{ix}]$;
8:             if $e'_{x+1} - l < e_{ix}$ then
9:                 $overlap = e_{ix} - (e'_{x+1} - l)$;
10:            else
11:                $overlap = 0$;
12:            if $|I(i, x)| + |f(x+1)| - overlap < l_{min}$ then
13:                $g(i) = x$;
14:                $l_{min} = |I(i, x)| + |f(x+1)| - overlap$;
15:    $|f(i)| = l_{min}$;
16: $I = \emptyset$;
17: $k = 1$;
18: while $k \le m$ do
19:     $I = I \uplus I(k, g(k))$;
20:     $k = g(k) + 1$;
21: return $I$;
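The memory-saving variant can be sketched in Python as below. This is a hedged illustration, not the authors' code: it assumes the windows have already been sorted by end time with covering tasks removed, uses 0-based indices, and treats $f(m+1)$ as empty so that no overlap is computed for the last task. The windows in the usage note are assumptions chosen to be consistent with the example with $l = 4$.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # (b'_i, e'_i), sorted by end time, none covering another


def solve_common_length(windows: List[Window], l: int) -> List[Tuple[int, int]]:
    """Dynamic program with O(n) memory: store only g(i) and |f(i)|."""
    m = len(windows)
    f_len = [0] * (m + 1)  # f_len[i] = |f(i)| (0-based); f_len[m] = 0
    g = [-1] * m

    def shared(i: int, x: int) -> Tuple[int, int]:
        """I(i, x) from Equation (11)."""
        e_i, b_x = windows[i][1], windows[x][0]
        return (e_i - l, e_i) if b_x <= e_i - l else (e_i - l, b_x + l)

    for i in range(m - 1, -1, -1):
        best: Optional[int] = None
        for x in range(i, m):
            if windows[x][0] >= windows[i][1]:
                break  # beginning times increase, so no later task overlaps T'_i
            s_ix, e_ix = shared(i, x)
            # f(x+1) begins at e'_{x+1} - l; only that part can overlap I(i, x).
            overlap = 0
            if x + 1 < m:
                overlap = max(0, e_ix - (windows[x + 1][1] - l))
            total = (e_ix - s_ix) + f_len[x + 1] - overlap
            if best is None or total < best:
                best, g[i] = total, x
        f_len[i] = best

    picked: List[Tuple[int, int]] = []
    k = 0
    while k < m:  # recover f(1) by following g
        picked.append(shared(k, g[k]))
        k = g[k] + 1
    return picked
```

For the assumed windows `[(0, 7), (4, 9), (6, 16), (11, 18)]` with `l = 4`, the sketch returns the intervals $[3,8]$ and $[12,16]$, i.e. $I(1,2) \uplus f(3)$ in the paper's 1-based notation.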

TABLE I: Computing $I(i,j)$ and $f(i)$ for the example in Fig. 10

(a) Computing $I(i,j)$

| $I(i,j)$ | result |
|---|---|
| $I(1,1)$ | $[3,7]$ |
| $I(1,2)$ | $[3,8]$ |
| $I(1,3)$ | $[3,10]$ |
| $I(2,2)$ | $[5,9]$ |
| $I(2,3)$ | $[5,10]$ |
| $I(3,3)$ | $[12,16]$ |
| $I(3,4)$ | $[12,16]$ |
| $I(4,4)$ | $[14,18]$ |

(b) Computing $f(i)$

| $f(i)$ | result |
|---|---|
| $f(4)$ | $I(4,4)$ |
| $f(3)$ | $I(3,4)$ |
| $f(2)$ | $I(2,2) \uplus f(3)$ |
| $f(1)$ | $I(1,2) \uplus f(3)$ |

VI. PERFORMANCE EVALUATION

In this section, we evaluate the effectiveness of the proposed algorithms through simulations. The simulations are implemented with TOSSIM [26], a widely used simulation tool for wireless sensor networks. Four cases are tested. In each case, four applications, each with a different task duration and a different data sampling interval length, are tested. In the first case, the task durations of the four applications are 11, 13, 17 and 19 unit time respectively. The task durations are 13, 17, 19 and 23 unit time respectively in the second case. The task durations of the third case are 17, 19, 23 and 29 unit time, and those of the fourth case are 19, 23, 29 and 31 unit time. We assume that sensor nodes can sample once and obtain one unit of data in each unit time. The sensor nodes run Algorithm 4 every maxTime unit time, where maxTime is a parameter chosen according to the computational ability of the sensor nodes. Higher computational ability allows a larger maxTime. Our algorithm is compared with the naive method introduced in Section 4, which initiates a continuous data sampling at the beginning of each task independently.
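The naive baseline is simple enough to sketch directly. The snippet below is an illustration under assumptions that are not stated in the paper: the tasks of one application are taken to be back-to-back windows starting at time 0, and only windows that fit entirely within the horizon (playing the role of maxTime) are generated. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Task = Tuple[int, int, int]  # (b_i, e_i, l_i)


def periodic_tasks(duration: int, l: int, horizon: int) -> List[Task]:
    """Back-to-back tasks of one application that fit within [0, horizon] (assumed layout)."""
    return [(k * duration, (k + 1) * duration, l) for k in range(horizon // duration)]


def naive_length(tasks: List[Task]) -> int:
    """Naive method: sample [b_i, b_i + l_i] for every task; return the union length."""
    total = 0
    cur_end: Optional[int] = None
    for start, end in sorted((b, b + l) for b, _, l in tasks):
        if cur_end is None or start > cur_end:
            total += end - start      # a new, disjoint sampling interval
            cur_end = end
        elif end > cur_end:
            total += end - cur_end    # only the part not sampled yet
            cur_end = end
    return total
```

Under these assumptions, tasks from several applications can be concatenated into one list and passed to `naive_length` to obtain the amount of data the naive method would sample on a node.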

Fig. 11: Data amount for short interval lengths (x-axis: different cases, case 1 to case 4; y-axis: the amount of sampled data; series: optimal, greedy, naive)

In the first set of simulations, we evaluate the performance of the proposed algorithms in terms of the amount of sampled data. The data sampling interval lengths for every case are 2, 3, 5 and 7 unit time. In this simulation, maxTime is set to 150 unit time. Fig. 11 shows that the naive method samples much more data than the optimal solution, and that its overhead cannot be bounded. Our greedy algorithm samples more data than the optimal solution, but never more than twice the optimal result. Compared with the naive method, our algorithm samples almost 200% less data when the data sampling interval length is short. One can also see that, when the task duration increases, the amount of data sampled by both the naive and the greedy algorithms decreases.

In the second group of simulations, we test the case where the data sampling interval lengths are longer. In such a case, the naive method may sample data in every unit time. In this group of simulations, the data sampling interval lengths for every case are 7, 11, 13 and 17 unit time, and maxTime is still 150. The amount of data sampled by the optimal and the greedy methods is much larger when the data sampling interval lengths are longer, but it is still less than that of the naive method, as shown in Fig. 12.

[Plot omitted. x-axis: different cases (case 1 to case 4); y-axis: the amount of sampled data; legend: optimal, greedy, naive.]

Fig. 12: Data amount for longer interval lengths

The next group of simulations evaluates how maxTime affects the amount of sampled data. The result is shown in Fig. 13. The amount of data sampled changes slightly for different maxTime settings. As maxTime increases, the amount of sampled data increases; however, the average amount of data per unit time does not vary much. This observation means that sensor nodes do not need to handle a long maxTime. A small maxTime is already enough to derive a good result.
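The quantity compared across maxTime settings is the data amount per unit time, that is, the number of sampled unit-time slots divided by maxTime. As a rough illustration of why this ratio stays stable, the sketch below computes it for the naive method only. It is a simplified model under stated assumptions (tasks start at time 0 and repeat back to back, and a slot shared by several applications counts once); the names `naive_rate` and `sweep_max_time` are hypothetical, and the optimal and greedy curves are not reproduced.

```python
def naive_rate(durations, lengths, max_time):
    """Data amount per unit time for the naive method over [0, max_time).

    Assumes each application samples during the first lengths[i] unit times
    of every task, with tasks of durations[i] unit time starting at time 0.
    """
    covered = [False] * max_time
    for duration, length in zip(durations, lengths):
        for task_start in range(0, max_time, duration):
            for t in range(task_start, min(task_start + length, max_time)):
                covered[t] = True
    return sum(covered) / max_time


def sweep_max_time(durations, lengths, max_times):
    """Map each maxTime value to the naive data amount per unit time."""
    return {m: naive_rate(durations, lengths, m) for m in max_times}


if __name__ == "__main__":
    # Task durations and interval lengths of case 1 (short intervals).
    rates = sweep_max_time([11, 13, 17, 19], [2, 3, 5, 7], range(100, 181, 10))
    for max_time, rate in rates.items():
        print(max_time, round(rate, 3))
```

Since the sampling pattern is periodic, lengthening the horizon mostly repeats the same pattern, so the per-unit-time amount changes little even though the total grows.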

[Plot omitted. x-axis: maxTime (100 to 180); y-axis: data amount per unit time; legend: optimal, greedy, naive.]

Fig. 13: Data amount for different maxTime settings

Next we evaluate the impact of the node density in a network on the amount of sampled data. Fig. 14 illustrates the amount of data sent by source nodes and received by the base station. In our simulations, every sensor node in the network samples data independently, and the data is transmitted over the network through a routing tree. As the node density increases, the amount of sent data increases, but the amount of data received by the base station may decrease when the node density is very large. This is because the data loss rate increases sharply due to unreliable wireless links and communication congestion in node-dense networks. In this simulation, when the number of nodes in the network is 160, the naive method loses almost half of the sent sampled data. The greedy algorithm samples much less data, so the traffic carried on the network is lighter and the data loss rate is much lower.

[Plot omitted. x-axis: number of nodes (10, 40, 90, 160); y-axis: data amount; legend: sent (naive), received (naive), sent (greedy), received (greedy).]

Fig. 14: The amount of sent and received data

Fig. 15 shows how the amount of received data is affected by network scale. As the network scale increases, the amount of sent data increases, but the amount of received data may decrease when the network scale is too large. As shown in Fig. 15, the naive method samples much more data than the greedy and the optimal algorithms, which results in severe data loss. The greedy algorithm transmits no more than twice the amount of data transmitted by the optimal solution, so the two algorithms perform similarly.

[Plot omitted. x-axis: area width (50 to 250); y-axis: the amount of received data; legend: naive, greedy, optimal.]

Fig. 15: The amount of received data

Fig. 16 shows how the data loss rate is affected by network scale. The naive method and the greedy algorithm have similar loss rates in small-scale networks. But when the network scale is very large, the data loss rate of the naive method is almost 70%. This is because the naive method samples a large amount of data, which results in numerous collisions in large-scale networks. The optimal solution and the greedy algorithm, which sample less data, show better results.
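The loss rate discussed here can be read as the fraction of sent data that never reaches the base station. A minimal sketch of that computation follows. The definition (sent minus received, divided by sent) is an assumed reading of the metric rather than a formula stated in the text, and the function name `data_loss_rate` is hypothetical.

```python
def data_loss_rate(sent, received):
    """Fraction of sent data units that were not received by the base station.

    Assumed definition: (sent - received) / sent.
    """
    if sent < 0 or received < 0:
        raise ValueError("amounts must be non-negative")
    if received > sent:
        raise ValueError("received data cannot exceed sent data")
    if sent == 0:
        return 0.0  # nothing was sent, so nothing could be lost
    return (sent - received) / sent


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the simulations.
    print(data_loss_rate(1000, 500))  # half of the sent data is lost
```

Under this reading, a method that sends less data can deliver more of it whenever the reduced traffic lowers congestion, which is the effect the comparison between the naive and greedy methods describes.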

[Plot omitted. x-axis: area width (50 to 250); y-axis: data loss rate; legend: greedy, naive, optimal.]

Fig. 16: Packet loss

VII. CONCLUSION

Data sharing for multiple applications is an efficient way to reduce the communication cost in WSNs. Many applications need a continuous interval of data sampling periodically. This paper is the first work to introduce the interval data sharing problem among multiple applications, which is a nonlinear nonconvex optimization problem. Since no efficient universal solution has been found for such problems, we provide a greedy approximation algorithm to lower the high computational complexity of the available solutions. We prove that the provided greedy algorithm is a 2-factor approximation algorithm. The time complexity of this approximation algorithm is $O(n^2)$ and the memory complexity is $O(n)$. In a special instance where all tasks have the same data sampling interval length, the problem can be solved in polynomial time, and a dynamic programming algorithm is provided for this special instance. The time complexity of the dynamic programming algorithm is $O(n^2)$ and the memory complexity is $O(n)$.

ACKNOWLEDGMENT

This work was supported in part by the Major Program of National Natural Science Foundation of China under grant No. 61190115, the National Basic Research Program of China (973 Program) under grant No. 2012CB316200, and the National Natural Science Foundation of China (NSFC) under grants No. 61033015, No. 60933001 and No. 61100030.

REFERENCES

[1] W.I. Grosky, A. Kansal, S. Nath, Jie Liu, and Feng Zhao. Senseweb: An infrastructure for shared sensing. Multimedia, IEEE, 14(4):8–13, oct.-dec. 2007.

[2] Niki Trigoni, Yong Yao, Alan Demers, and Johannes Gehrke. Multi-query optimization for sensor networks. In DCOSS, pages 307–321, 2005.

[3] Ming Li, Tingxin Yan, Deepak Ganesan, Eric Lyons, Prashant Shenoy, Arun Venkataramani, and Michael Zink. Multi-user data sharing in radar sensor networks. In Proceedings of the 5th international conference on Embedded networked sensor systems, SenSys ’07, pages 247–260, New York, NY, USA, 2007. ACM.

[4] You Xu, Abusayeed Saifullah, Yixin Chen, Chenyang Lu, and Sangeeta Bhattacharya. Near optimal multi-application allocation in shared sensor networks. In Proceedings of the eleventh ACM international symposium on Mobile ad hoc networking and computing, MobiHoc ’10, pages 181–190, New York, NY, USA, 2010. ACM.

[5] S. Ji and Z. Cai. Distributed data collection and its capacity in asynchronous wireless sensor networks. In Proceedings of The 31st Annual IEEE International Conference on Computer Communications, IEEE INFOCOM, 2012.

[6] Z. Cai, S. Ji, and J. Li. Data caching based query processing in multi-sink wireless networks. International Journal of Sensor Networks, 11(2):109–125, 2012.

[7] Z. Cai, S. Ji, J. He, and A. G. Bourgeois. Optimal distributed data collection for asynchronous cognitive radio networks. In Proceedings of The 32nd International Conference on Distributed Computing Systems 2012, ICDCS, 2012.

[8] S. Cheng, J. Li, and Z. Cai. O(ε)-approximation to physical world by sensor networks. In Proceedings of The 32nd IEEE International Conference on Computer Communications, IEEE INFOCOM 2013.

[9] Arsalan Tavakoli, Aman Kansal, and Suman Nath. On-line sensing task optimization for shared sensors. In Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN ’10, pages 47–57, New York, NY, USA, 2010. ACM.

[10] S. Ganesan and R.D. Finch. Monitoring of rail forces by using acoustic signature inspection. Journal of Sound and Vibration, 114(2):165–171, 1987.

[11] M. Cerullo, G. Fazio, M. Fabbri, F. Muzi, and G. Sacerdoti. Acoustic signal processing to diagnose transiting electric trains. Intelligent Transportation Systems, IEEE Transactions on, 6(2):238–243, june 2005.

[12] Liang Cheng and S.N. Pakzad. Agility of wireless sensor networks for earthquake monitoring of bridges. In Networked Sensing Systems (INSS), 2009 Sixth International Conference on, pages 1–4, june 2009.

[13] Makoto Suzuki, Shunsuke Saruwatari, Narito Kurata, and Hiroyuki Morikawa. A high-density earthquake monitoring system using wireless sensor networks. In SenSys, pages 373–374, 2007.

[14] Rui Tan, Guoliang Xing, Jinzhu Chen, Wen-Zhan Song, and Renjie Huang. Quality-driven volcanic earthquake detection using wireless sensor networks. In Real-Time Systems Symposium (RTSS), 2010 IEEE 31st, pages 271–280, 30 2010-dec. 3 2010.

[15] Alan Mainwaring, David Culler, Joseph Polastre, Robert Szewczyk, and John Anderson. Wireless sensor networks for habitat monitoring. In Proceedings of the 1st ACM international workshop on Wireless sensor networks and applications, WSNA ’02, pages 88–97, New York, NY, USA, 2002. ACM.

[16] Robert Szewczyk, Alan Mainwaring, Joseph Polastre, John Anderson, and David Culler. An analysis of a large scale habitat monitoring application. In Proceedings of the 2nd international conference on Embedded networked sensor systems, SenSys ’04, pages 214–226, New York, NY, USA, 2004. ACM.

[17] Robert Szewczyk, Eric Osterweil, Joseph Polastre, Michael Hamilton, Alan Mainwaring, and Deborah Estrin. Habitat monitoring with sensor networks. Commun. ACM, 47:34–40, June 2004.

[18] Timos K. Sellis. Multiple-query optimization. ACM Trans. Database Syst., 13:23–52, March 1988.

[19] Prasan Roy, S. Seshadri, S. Sudarshan, and Siddhesh Bhobe. Efficient and extensible algorithms for multi query optimization. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, SIGMOD ’00, pages 249–260, New York, NY, USA, 2000. ACM.

[20] Sailesh Krishnamurthy, Chung Wu, and Michael Franklin. On-the-fly sharing for streamed aggregation. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data, SIGMOD ’06, pages 623–634, New York, NY, USA, 2006. ACM.

[21] Shili Xiang, Hock Beng Lim, Kian-Lee Tan, and Yongluan Zhou. Two-tier multiple query optimization for sensor networks. In Proceedings of the 27th International Conference on Distributed Computing Systems, ICDCS ’07, pages 39–, Washington, DC, USA, 2007. IEEE Computer Society.

[22] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.

[23] D. Henrion and J.-B. Lasserre. Solving nonconvex optimization problems. Control Systems, IEEE, 24(3):72–83, jun 2004.

[24] Jon Lee and Sven Leyffer. Mixed Integer Nonlinear Programming. The IMA Volumes in Mathematics and its Applications. Springer, 2011.

[25] D. Li and X. Sun. Nonlinear integer programming. International series in operations research & management science. Springer, 2006.

[26] Philip Levis, Nelson Lee, Matt Welsh, and David Culler. Tossim: accurate and scalable simulation of entire tinyos applications. In Proceedings of the 1st international conference on Embedded networked sensor systems, SenSys ’03, pages 126–137, New York, NY, USA, 2003. ACM.