
5.4 Experiments

5.4.4 Experiments with Real Data

We used temperature readings from a collection of 64 sensors deployed in Switzerland as part of the SwissEx project [4]. The dataset contains 15000 records sampled every 30 minutes. Missing values were replaced by interpolation from the available sensor values. In this experiment we assumed that the data was clean, so that the generated ground-truth quality could be used.
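The interpolation scheme is not specified above. As a minimal sketch, assuming linear interpolation along a single sensor's time series (interpolating across sensors would be another plausible reading), the gaps could be filled as follows. The function name `fill_missing` and the use of `None` to mark a missing reading are illustrative assumptions, not part of the original experiment.

```python
from typing import List, Optional


def fill_missing(readings: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest known values.

    Leading and trailing gaps are filled with the nearest known value.
    This is an assumed scheme; the text does not state which one was used.
    """
    known = [i for i, v in enumerate(readings) if v is not None]
    if not known:
        raise ValueError("at least one reading must be present")

    filled = list(readings)
    # Leading gap: copy the first known value backwards.
    for i in range(known[0]):
        filled[i] = readings[known[0]]
    # Trailing gap: copy the last known value forwards.
    for i in range(known[-1] + 1, len(readings)):
        filled[i] = readings[known[-1]]
    # Interior gaps: interpolate linearly between the surrounding known values.
    for left, right in zip(known, known[1:]):
        step = (readings[right] - readings[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = readings[left] + step * (i - left)
    return filled
```

For example, `fill_missing([1.0, None, 3.0])` returns `[1.0, 2.0, 3.0]`.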

In this experiment we considered the influence of both errors and events. We set minRelSup = 0.0075 and minAllConf = 0.1. One simulated event source, generating a new event of duration 5 time units every 50 time units, was placed into the region. Figure 5.9 presents the results, where PW achieves an RMSE two times lower than AB. The RMSE of PW increases slightly as we increase the number of faulty sensors. This is because injecting noise into the data reduces the frequency of interesting patterns and leads to a drop in quality, which is the desired behavior for noisy tested values.
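For reference, the error metric can be written out explicitly. The sketch below assumes that RMSE is computed between the quality scores produced by an algorithm and the generated ground-truth quality scores for the same tested values; the function and parameter names are illustrative.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean squared error between predicted and ground-truth scores."""
    if len(predicted) != len(truth) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared differences, then take the square root.
    squared = [(p - t) ** 2 for p, t in zip(predicted, truth)]
    return math.sqrt(sum(squared) / len(squared))
```

A lower value means the computed quality scores are closer to the ground truth, so "two times lower RMSE" corresponds to halving this quantity.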

[Figure 5.8. x-axis: quality assessment algorithm (PW (0.01), AB (r=2), AB (r=3), AB (r=4)); y-axis: RMSE (0 to 0.7); series: clean, 5% faulty sensors, 10% faulty sensors, 15% faulty sensors.]

Figure 5.8: Root mean squared error of PW when simulated events and faults are introduced in generated streams, where minRelSup = 0.01. PW achieves two times lower error than the baseline.

5.5 Related Work

In Chapter 2 we provided an extensive review of the related work. Our approach differs from the existing approaches for quality assessment and outlier detection in sensor networks as follows. (I) Contrary to previous methods such as [30, 71, 75, 76, 96], which suffer from the choice of a fixed neighborhood, we do not restrict our method to a predefined fixed neighborhood. In our approach, the sensors in a neighborhood are not necessarily spatially proximate. The neighborhood is dynamically determined based on the frequent patterns identified in the sensor readings over time. (II) While model-based methods such as [15] and classification-based approaches such as [36, 71, 96, 133] require knowledge of the statistical model of the sensor readings, our pattern-wise method relies only on the sensor values and needs no background information. (III) We do not make any assumption on the structure of the sensor network. The only input from the sensor network to the pattern-wise approach is the sensor data streams.

In [138] an algorithm for event detection based on contour map matching was presented, which converts the event detection problem into a pattern matching problem. The motivation was to address the disadvantages of threshold-based methods for event detection, which assume that an event has occurred if sensor values exceed a certain (user-defined) threshold. The authors pointed out that such thresholds, although simple, are inappropriate for the following reasons: (I) it is difficult to specify proper thresholds given the differing environments to be monitored and application semantics, and (II) events in which the magnitude of the observed event decreases with distance from its source (diffusion events) cannot be easily captured by discrete threshold values. The proposed method constructs and incrementally updates a number of contour maps that are used as building blocks for constructing spatio-temporal patterns exhibited in contour maps. The paper also identified three types of common events with respect to the shape of their contours in the map: pyramid, fault and island.

[Figure 5.9. x-axis: setting (no event-0, event-0, event-1, event-2, event-3); y-axis: RMSE (0 to 0.8); series: PW (minRelSup=0.0075), AB (r=2), AB (r=3).]

Figure 5.9: Root mean squared error of the PW algorithm in the presence of events and faults on the real dataset. 'event-x' means that events are present and x is the number of faulty sensors. 'no event-0' means that no event exists in the dataset and no sensor is faulty.

In [77] different optimization techniques for speeding up the mining of frequent value sets in sensor networks were presented, using interval lists, which consist of the intervals during which a sensor assumes a given value. The authors coupled the idea of interval lists with approximate itemset mining and derived an online algorithm for mining frequent itemsets from a large SN.
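To illustrate the interval-list idea in general terms, the sketch below is not the algorithm of [77]; it assumes that each (sensor, value) item is stored as a sorted list of non-overlapping half-open time intervals and that the support of a set of items is the total time during which all of them hold simultaneously. The names `intersect` and `support` are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in time units


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists."""
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance the list whose current interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def support(intervals: List[Interval]) -> int:
    """Total number of time units covered by an interval list."""
    return sum(end - start for start, end in intervals)
```

With `a = [(0, 5), (10, 15)]` and `b = [(3, 12)]`, `intersect(a, b)` returns `[(3, 5), (10, 12)]`, so the two items co-occur for 4 time units. Working on intervals rather than individual timestamps keeps the cost proportional to the number of value changes instead of the number of readings.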

In [108] an approach for distributed mining of spatio-temporal event patterns in an SN was presented. The algorithm proceeds as follows: (I) every sensor in the network continuously collects user-defined events from neighboring sensors within a fixed distance and keeps a fixed-size history of these events, and (II) every sensor runs a mining algorithm for discovering patterns among these collected events. The main idea of the approach was to transmit to the sink only the compact patterns mined at the sensors.

5.6 Conclusion

In this chapter we presented the first pattern-wise method for quality assessment of sensor data that addresses the limitations of the state-of-the-art approaches by departing from the idea of a fixed neighborhood. Using frequent itemset mining, the method finds the values from multiple sensor data streams that frequently co-occur with the tested value. The logistic regression function was used to produce the quality score of the tested value given specific features of the frequent itemset. The quality score computation error of the pattern-wise approach was compared to that of the common average-based approach. Experimental results confirmed the superiority of the proposed method over the average-based approach.
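The scoring step can be sketched as follows. This is a minimal illustration of a logistic function applied to a weighted sum of itemset features; the feature vector, the weights and the bias are hypothetical placeholders, since the concrete features and the fitted coefficients are not restated in this section.

```python
import math
from typing import Sequence


def quality_score(features: Sequence[float],
                  weights: Sequence[float],
                  bias: float) -> float:
    """Map itemset features to a score in [0, 1] with the logistic function.

    `features`, `weights` and `bias` are hypothetical; in practice the
    coefficients would be learned from labeled data.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

A score near 1 would indicate a tested value that is well supported by frequently co-occurring values on other streams, and a score near 0 a value that is not.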

Chapter 6: Conclusion and Future Directions

6.1 Conclusion

In this thesis we considered some important data management problems in participatory sensing systems and proposed efficient solutions for those problems. In particular, we looked at the problem of efficient data acquisition in participatory sensing, where several factors must be considered concurrently while collecting data from participants and answering user queries. Incentivizing participants to truthfully provide their private cost information and measurements was another important problem that we considered in this thesis. Finally, we proposed a novel approach towards assessing the quality of sensor readings based on frequent pattern mining methods.

We proposed a holistic data acquisition framework for participatory sensing environments, in which we incorporated the most important parameters pertinent to this paradigm, such as uncontrolled mobility, privacy, trust, costs, and utility. Because the types of applications, and the queries they pose, can be diverse in such systems, the proposed framework was designed to be as generic as possible. We formulated the problem of optimal multi-query data acquisition with the objective of maximizing the total utility for the applications. Since finding the optimal solution is computationally intractable in many cases, efficient heuristic algorithms were proposed to myopically maximize the total utility for some of the most important query types and their combinations. In particular, we proposed utility-driven data acquisition algorithms for point and aggregate queries, which are examples of one-shot queries; for location and region monitoring queries, which are examples of continuous queries; and for combinations of these individual query types.

The proposed utility-driven framework for data acquisition in participatory sensing would not be useful if the participants misreported their cost information and their measurements. In order to incentivize the participants to truthfully report their data, we designed incentive-compatible and individually rational mechanisms for data acquisition in participatory sensing as part of this thesis. The proposed mechanisms were designed for data acquisition for point queries with the objective of maximizing the utility of the center (or applications). We considered two cases: one where the participants are privacy oblivious, i.e., they are willing to report their exact locations, and one where the participants are privacy conscious, i.e., they are not willing to reveal their exact locations. In the case of privacy-conscious participants, we proposed mechanisms that enable them to trade their privacy for more monetary incentives.

Lastly, we presented the first pattern-wise method for quality assessment of sensor data that addresses the limitations of the state-of-the-art approaches by departing from the idea of a fixed neighborhood. Using frequent itemset mining, the method finds the values from multiple sensor data streams that frequently co-occur with the tested value. The logistic regression function was used to generate a quality score for each sensor value, given carefully chosen features of its frequent itemsets on other sensor data streams. Experimental results confirmed the superiority of the proposed method over the commonly used average-based approach.