3. Collaborative Data Management of WikiSensing

3.1. The Requirements (Challenges)

The data management challenges of designing a collaborative sensor system with trustworthiness assessment are due to the inherent characteristics of the data. This data includes information on sensors, collaboration and trust.

Firstly, it is a challenge to manage sensor data, as sensors can generate potentially large volumes of real-time, heterogeneous measurements. Secondly, as collaborative data contains different types of information (e.g. comments, annotations, ratings) provided by various collaborators, it is a challenge to organise this information, enable its sharing and provide a common vocabulary for it. Thirdly, the extensible nature of trustworthiness information (e.g. trust metrics and contextual data, discussed later in chapter 5) raises issues of data representation. Expressing such data is a challenge due to the absence of a standard trustworthiness data representation methodology (e.g. an ontology).

3.1.1. Managing Sensor Data

Managing sensor data is challenging due to the potentially large amounts of real-time, heterogeneous data provided by sensor devices deployed around the world. Providing an efficient and scalable storage infrastructure for large volumes of data is essential for sensor data management. The infrastructure must also be flexible enough to store heterogeneous types of records, as different sensor devices can produce measurements in different formats (e.g. single measurements such as temperature or humidity, or measurements with multiple dimensions such as distance, orientation and altitude). Furthermore, the sensor data management infrastructure must support querying of both real-time and historical data. In addition, the ability to aggregate sensor data in order to produce useful information must also be addressed. The data management challenges of sensor data are categorised as follows:

Infrastructure: Designing an infrastructure that is scalable and that efficiently stores and retrieves large volumes of heterogeneous sensor information. It must have the capacity to scale in order to handle a large number of connected sensors that periodically submit data, as well as to enable a large number of users to access the system concurrently.

Querying: The framework needs to support the manipulation of both real-time and historical information. Querying constructs are required to capture information that arrives at the system continuously. The real-time nature and continuous flow of sensor data create the requirement for near real-time processing of such data. The challenge arises when, after a query has been processed and its output produced, more up-to-date data arrives and makes the previous result out of date. For instance, assume that a query completes processing using a window of real-time data at time frame t1. This output will be invalid at time frame t2 (where t2 > t1) as new data arrives. Moreover, query constructs are also required to mine historical information when, for example, a user wishes to investigate sensor readings from a previous time frame.
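The staleness problem described for windowed queries can be illustrated with a minimal sketch. It is not the WikiSensing implementation: the names `Reading` and `SlidingWindow`, the 10-second window width and the sample values are all assumptions made for illustration. Each result is returned together with the time frame at which it was valid, so that a consumer can tell when a later reading has superseded it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    timestamp: float  # seconds on some common clock (assumed)
    value: float


class SlidingWindow:
    """Keeps only the readings from the last `width` seconds (hypothetical helper)."""

    def __init__(self, width: float) -> None:
        self.width = width
        self._readings: deque[Reading] = deque()

    def add(self, reading: Reading) -> None:
        # Assumes readings arrive in timestamp order.
        self._readings.append(reading)
        cutoff = reading.timestamp - self.width
        # Evict readings that have fallen out of the window.
        while self._readings[0].timestamp <= cutoff:
            self._readings.popleft()

    def mean(self) -> tuple[float, float]:
        """Return (as_of, mean); the mean is only valid as of time `as_of`."""
        if not self._readings:
            raise ValueError("window is empty")
        as_of = self._readings[-1].timestamp
        total = sum(r.value for r in self._readings)
        return as_of, total / len(self._readings)


window = SlidingWindow(width=10.0)
window.add(Reading(0.0, 20.0))
window.add(Reading(5.0, 22.0))
result_t1 = window.mean()        # (5.0, 21.0): valid at t1 = 5.0

window.add(Reading(12.0, 30.0))  # new data arrives; the reading at 0.0 is evicted
result_t2 = window.mean()        # (12.0, 26.0): result_t1 is now out of date
```

Tagging each result with its time frame does not remove the problem, but it makes explicit that an answer computed at t1 should not be treated as current at t2.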

Information: The framework needs to support the aggregation of data streams from multiple sensors, as well as their combination with reference data sources. The reference sources can be, for example, data providers such as the meteorological (www.metoffice.gov.uk) or transport (www.tlf.gov.uk) departments. Throughout this thesis, the term data stream refers to a collection of measurements transmitted by a sensor. Aggregation is required to combine these data streams with each other to obtain composite sensor measurements, as well as to combine data streams with reference data to obtain aggregated information. The challenge in aggregating different data streams arises from the disparity of sensor types, measurements, accuracy, quality of readings and time frames. For instance, consider the combination of two temperature data streams that have different units of measurement (e.g. Celsius and Fahrenheit) and are submitted at different frequencies, and hence have different time points.
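The temperature example can be made concrete with a short sketch. It assumes one possible alignment strategy, namely converting both streams to Celsius and averaging them over fixed 60-second buckets; the function names, bucket size and sample readings are hypothetical and are not taken from the framework.

```python
from collections import defaultdict
from statistics import mean


def fahrenheit_to_celsius(value: float) -> float:
    return (value - 32.0) * 5.0 / 9.0


def to_buckets(stream, bucket_seconds, convert=lambda v: v):
    """Group (timestamp, value) pairs into fixed time buckets and average each bucket."""
    grouped = defaultdict(list)
    for timestamp, value in stream:
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        grouped[bucket_start].append(convert(value))
    return {start: mean(values) for start, values in grouped.items()}


def combine(bucketed_a, bucketed_b):
    """Average two bucketed streams over the buckets they have in common."""
    shared = sorted(bucketed_a.keys() & bucketed_b.keys())
    return {start: (bucketed_a[start] + bucketed_b[start]) / 2.0 for start in shared}


# Hypothetical data: sensor A reports Celsius every 30 s,
# sensor B reports Fahrenheit roughly every 60 s.
celsius_stream = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 23.0)]
fahrenheit_stream = [(10, 68.0), (70, 71.6)]

a = to_buckets(celsius_stream, bucket_seconds=60)
b = to_buckets(fahrenheit_stream, bucket_seconds=60, convert=fahrenheit_to_celsius)
composite = combine(a, b)  # one composite Celsius value per shared 60 s bucket
```

Bucketing is only one way to reconcile differing time points; interpolation or nearest-neighbour matching would be alternatives, and differences in accuracy or reading quality are not addressed by this sketch at all.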

3.1.2. Managing Collaborative Data

The challenges of managing collaborative data relate to the organisation and sharing of the information provided by collaborating users. They concern both organising the collaborative data and providing a common vocabulary for it. The collaborative data management challenges are categorised as follows:

Organisation of information: This is the challenge of organising sensor data and the information provided by collaborating users. The collaborative information can contain data on the sensor environment (e.g. deployment information, comments on factors that impact the trustworthiness of sensors) or on the sensor meta-information (e.g. accuracy, range). It can also concern sensor measurements (e.g. ratings on the trustworthiness of a measurement, annotations justifying anomalous measurements) or any contextual details (e.g. factors that affect measurements, such as nearby factories or hospitals, or details on sensor calibration). It is a challenge to organise this information because different users have diverse goals and views and can provide different types of annotations, which cannot be accommodated in a fixed schema. Moreover, different types of information need to be associated with and referenced from one another to enable effective collaboration.

A need for a common vocabulary: Even when the collaborative sensor data is organised, it is still a challenge to provide a common vocabulary [74] in order to preserve the correct semantics of the information. Different terms can have the same meaning: for instance, with different users annotating sensors, there is a high chance that disparate terminologies sharing common semantics will coexist.
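As a minimal illustration of what a common vocabulary provides, the sketch below maps the different terms that collaborators might use onto one canonical term. The vocabulary entries and the `normalise` function are hypothetical; a real deployment would more likely draw on a shared ontology than on a hand-written dictionary.

```python
# Hypothetical controlled vocabulary: each canonical term lists the
# synonyms that collaborators might use for it.
VOCABULARY = {
    "temperature": {"temp", "air temperature", "thermal reading"},
    "humidity": {"relative humidity", "rh", "moisture"},
}

# Reverse index from every known spelling to its canonical term.
CANONICAL = {
    spelling: term
    for term, synonyms in VOCABULARY.items()
    for spelling in synonyms | {term}
}


def normalise(tag: str):
    """Return the canonical term for a user-supplied tag, or None if it is unknown."""
    return CANONICAL.get(tag.strip().lower())


tags_from_users = ["Temp", "air temperature", "RH", "pressure"]
canonical_tags = [normalise(tag) for tag in tags_from_users]
# The first two tags resolve to the same term; "pressure" is unknown here.
```

Returning `None` for unknown tags, rather than guessing, leaves room for a curator or a later vocabulary extension to decide how a new term should be classified.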

3.1.3. Managing Trustworthiness Data

It is a challenge to manage trustworthiness data, as it is extensible and requires a logical representation of relationships. This data is extensible because new trust metrics and contextual data can be added when needed. Moreover, the relationships between these items of information need to be represented in a way that reflects a natural classification (see the trustworthiness model described in chapter 5). Additionally, trust metrics can be assigned at multiple levels (discussed in chapter 8), which requires a hierarchical representation.

Representation of extensible and multilevel data: The flexibility offered by the framework to manage extensible as well as multiple levels of trust metrics poses the challenge of representation. A sensor or a sensor measurement can have several trust metrics or items of contextual data (relating to its trustworthiness) associated with it. Furthermore, in certain scenarios these trust metrics can be classified into a hierarchy of sub-levels. Moreover, these metrics can also depend on measurement window sizes or on parameters that influence their calculation. Hence there is a need to capture the trustworthiness information so that it correctly represents the circumstances of the sensor (or sensor measurement), the state of the calculations and the relationships between the metrics.
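One way to picture such a representation is a tree of metrics in which each node carries its own calculation parameters. The sketch below is illustrative only: the class `TrustMetric`, the metric names, the `window_size` parameter and the unweighted-mean roll-up are all assumptions, and none of them is the trustworthiness model described in chapter 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrustMetric:
    """A trust metric that may be broken down into a hierarchy of sub-metrics."""
    name: str
    value: Optional[float] = None  # leaf score, if measured directly
    parameters: Dict[str, float] = field(default_factory=dict)  # e.g. window size
    children: List["TrustMetric"] = field(default_factory=list)

    def score(self) -> float:
        """Leaf value, or the unweighted mean of the children (an assumed roll-up rule)."""
        if not self.children:
            if self.value is None:
                raise ValueError(f"metric {self.name!r} has no value")
            return self.value
        return sum(child.score() for child in self.children) / len(self.children)

    def find(self, name: str) -> Optional["TrustMetric"]:
        """Depth-first lookup of a metric by name anywhere in the hierarchy."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found is not None:
                return found
        return None


# Hypothetical metrics attached to one sensor.
root = TrustMetric("trustworthiness", children=[
    TrustMetric("accuracy", value=0.8, parameters={"window_size": 50}),
    TrustMetric("reputation", children=[
        TrustMetric("user_rating", value=0.6),
        TrustMetric("provider_rating", value=1.0),
    ]),
])
overall = root.score()
```

Because new metrics are simply new nodes, the structure stays extensible, and storing the parameters alongside each metric records the state of the calculation that produced its value.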

In document WikiSensing: A Collaborative Sensor Management System with Trust Assessment for Big Data (Page 51-54)