
2. Big Data management for Sensors

2.2. The Generations of Sensor Data management

This thesis makes no assumptions about the networking protocols used to connect the sensor nodes. It also makes no distinction between who owns or operates the individual sensor nodes. The main focus is on the data collected by the different sensors and made available for sharing and collaboration. Such data needs to be stored and managed in a system that enables users to collaborate.

Generation | Features | Challenges Addressed | Examples
First | Centralized or distributed systems, querying, aggregation and features | Storage and querying of sensor data, scalability, energy efficiency and real-time stream processing | Aurora, The Cougar system, TinyDB
Second | Limited collaboration by supporting the configuration of sensor networks, processing and the development of analysis workflows on sensor data | Sharing information and aggregation and analysis of data in sensor networks | CitySense, The Discovery Net system, CitiSense
Third | Collaboration on sensor data, trustworthiness management, processing of sensor data into virtual sensors | Big data challenges, collaboration and trustworthiness assessment across all sensor data | Xively and WikiSensing

Table 2.1: A summary of the generations of sensor data management systems

Sensor data management systems are categorised below into three generations according to how far they support such collaboration. Table 2.1 summarises the different generations of sensor data management with their distinct features and the specific challenges addressed.

2.2.1. The First Generation

Sensors naturally produce a vast amount of data as they continuously monitor environments [3]. This was the design rationale for the first generation of sensor data management systems, which focused on storing and querying the sensor data. Examples include Aurora, Cougar and TinyDB [17, 20, 21], which process incoming data streams for applications. Such systems provide query primitives and an algebra containing several primitive operations for expressing queries over the streams and for querying the sensor nodes in a distributed way. They made no clear provision for collaboration between users on the sensor data.

Aurora is a database management system for managing data in monitoring applications, developed by Brandeis University, Brown University and MIT. This system processes incoming data streams by passing them through a data-flow system. A number of operators can be executed while the input tuples are run through this data-flow system. For instance, the filter operator applies any number of predicates to each incoming stream, and the aggregate operator applies a function across a window of values in a stream. Once an input has worked its way through the paths of the flow it is generally drained from the system. Aurora can also maintain historical storage, based on a persistence specification, in order to support certain ad-hoc queries.
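The filter and aggregate operators described above can be illustrated with a minimal Python sketch. This is not Aurora's actual interface: the function names `filter_op` and `aggregate_op`, the dictionary-based tuple format, and the use of fixed-size non-overlapping windows are all assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Reading = Dict[str, object]  # hypothetical tuple format: one dict per stream tuple


def filter_op(stream: Iterable[Reading],
              predicates: List[Callable[[Reading], bool]]) -> Iterator[Reading]:
    """Pass through only the tuples that satisfy every predicate."""
    for item in stream:
        if all(p(item) for p in predicates):
            yield item


def aggregate_op(stream: Iterable[Reading], key: str, window: int,
                 fn: Callable[[List[float]], float]) -> Iterator[float]:
    """Apply fn to each consecutive, non-overlapping window of `window` values.

    Assumption: a fixed-size window that is emptied after each result.
    """
    buffer: List[float] = []
    for item in stream:
        buffer.append(item[key])
        if len(buffer) == window:
            yield fn(buffer)
            buffer = []  # tuples are drained once they have been processed


# Chain the two operators, as tuples would flow from one operator to the next.
readings = [{"sensor": "s1", "temp": v}
            for v in [20.0, 21.0, 99.0, 22.0, 23.0, 24.0, 25.0]]
valid = filter_op(readings, [lambda r: r["temp"] < 50.0])
averages = list(aggregate_op(valid, "temp", 3, lambda xs: sum(xs) / len(xs)))
```

Generators are used here so that each tuple is consumed once and then discarded, loosely mirroring the way an input is drained after it has passed through the flow.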

Developed by Cornell University, the Cougar System is a sensor data management system that supports querying in sensor networks. It follows a distributed query processing approach in which the query workload determines the data that should be extracted from the sensors. The Cougar System uses an object-oriented database for storage and models each sensor as a new Abstract Data Type (ADT). The stream processing functionalities are designed as ADT functions that return sensor data. It also supports long-running queries formulated in SQL by extending the query execution engine with a query construct known as ‘every’, which is specified with a time frame parameter.
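The idea of modelling a sensor as an ADT and re-evaluating a long-running query at a fixed interval can be sketched in Python as follows. This does not reproduce Cougar's SQL syntax or its engine: the class `SensorADT`, the method `get_temp`, the function `run_every`, and the simulated clock are hypothetical names and simplifications introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SensorADT:
    """Hypothetical abstract data type wrapping one sensor."""
    sensor_id: str
    read: Callable[[int], float]  # maps a simulated timestamp (seconds) to a reading

    def get_temp(self, now: int) -> float:
        """ADT function: return the sensor's reading at time `now`."""
        return self.read(now)


def run_every(sensors: List[SensorADT], period: int, duration: int,
              predicate: Callable[[float], bool]) -> List[Tuple[int, str, float]]:
    """Re-evaluate the query once per `period` seconds of simulated time.

    Stands in for a long-running query with an 'every'-style time frame.
    """
    results = []
    for now in range(0, duration, period):
        for sensor in sensors:
            value = sensor.get_temp(now)
            if predicate(value):
                results.append((now, sensor.sensor_id, value))
    return results


sensors = [
    SensorADT("s1", lambda t: 20.0 + t / 10),  # slowly warming sensor
    SensorADT("s2", lambda t: 30.0),           # constant sensor
]
rows = run_every(sensors, period=10, duration=30, predicate=lambda v: v > 20.5)
```

A simulated clock is used instead of real sleeping so that the sketch runs instantly; a real long-running query would be driven by wall-clock time.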

TinyDB is a sensor data management system that specialises in query processing, using acquisition techniques to reduce the power consumption of sensor devices. It first disseminates a query to the sensor network, where it is processed at the sensor nodes. Finally, the results are collected back up the routing tree that was formed as the query propagated. The intention of sensor data management in this generation was therefore to provide scalable and energy-efficient storage systems able to handle large amounts of real-time data.
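The dissemination and collection pattern described for TinyDB can be sketched as a recursive walk over a routing tree. The tree layout, the node names, the readings and the function `collect` are all hypothetical, and combining partial results at each node (rather than forwarding raw readings) is an assumption about how such a scheme might limit transmissions, not a description of TinyDB's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical routing tree: node id -> child node ids. "base" is the root.
TREE: Dict[str, List[str]] = {
    "base": ["a", "b"],
    "a": ["c", "d"],
    "b": [],
    "c": [],
    "d": [],
}

# Hypothetical local readings; the base station itself has no sensor.
READINGS: Dict[str, float] = {"a": 18.0, "b": 25.0, "c": 30.0, "d": 21.0}


def collect(node: str, threshold: float) -> Tuple[int, float]:
    """Evaluate the query at `node` and merge the results of its subtree.

    Returns (count, total) over readings above `threshold`, so each node
    passes a single partial result up to its parent.
    """
    count, total = 0, 0.0
    value = READINGS.get(node)
    if value is not None and value > threshold:
        count, total = 1, value
    for child in TREE[node]:  # the query propagates down the tree
        child_count, child_total = collect(child, threshold)
        count += child_count  # results flow back up the same tree
        total += child_total
    return count, total


count, total = collect("base", threshold=20.0)
average = total / count if count else None
```

Here the recursion depth follows the depth of the routing tree, and the final average is computed only at the base station from the merged partial results.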

2.2.2. The Second Generation

The second generation of data management systems provided certain primitives to support a limited amount of collaboration between users of sensor networks. These systems enabled different users to configure the collection of data, its processing, or both, in a collaborative way. For example, the CitySense [7] project implemented and deployed an urban-scale wireless networking framework based on an open infrastructure, allowing users to reprogram and monitor the same set of sensors via the internet and collect the data for shared analysis. The Discovery Net system [22] provides an example where different users could develop their own data collection workflows, specifying how sensor data is processed before being stored in a centralized data warehouse. It also enabled them to develop analysis workflows for integrating the sensor data with data collected from other sources. Users of the system could thus share the same data and also derive new views and analysis results, which were shared in turn.

The CitiSense [10] project is a distributed infrastructure through which the general public provides feedback on pollutants using mobile devices. In this way, the system allows users to enrich the information and to comment on the operation and trustworthiness of the sensors. Each of the sensor management systems in this generation supports a degree of collaboration while operating on a fixed set of sensors. However, this collaboration is limited to either configuring sensors or sharing the processing of data from a specific sensor network.

2.2.3. The Third Generation

The third generation of sensor data management is based on open systems in which users collaboratively submit data from any sensor and other users make use of this data. One example of this generation is Xively [9] (formerly known as Pachube and then Cosm). It enables users to share their sensor data and allows collaborating users to build applications based on such data. Compared with some of the systems in the second generation, however, it takes a passive approach to the control of sensors by collaborators (e.g. the ability to re-configure them). It simplifies online collaboration by allowing users to submit diverse data sets, ranging from individual energy readings to data collected on various attributes of environments. Moreover, it allows developers to embed real-time graphs and widgets in websites, analyse and process historical data, and send real-time alerts to control devices.


Another third generation example is the WikiSensing system (wikisensing.org) [8], which is used in this research. It provides online database services that allow sensor owners to register and connect their devices to feed data into the system for storage. It also allows developers to connect to the database, build their own applications based on that data and perform different forms of analysis. It differs from a system like Xively in that it supports adding and annotating information about the sensors and their data through a wiki approach. Moreover, it supports the assessment of the trustworthiness of sensor data.