

6.1 DATA PROCESSING FRAMEWORK

The main objective of this section is to represent meaningful relations and extracted concepts from large amounts of real-world sensory data in a human- and/or machine-interpretable format. As shown in Figure 45, real-world phenomena are observed by collecting measurements from sensors. The raw data (mostly numerical) is sent to a user or a gateway, where it is further processed and given a meaningful semantic representation.

We provide a framework that infers knowledge from the data and constructs a topical ontology representation from the concepts that are extracted from the raw data. In this section, we introduce some background knowledge about sensor data and discuss semantic representation frameworks.

6.1.1 Real World Data

Real world data is commonly reported through observation and measurement data obtained from sensory devices. Sensor data is often communicated as raw time-series data that can consist of a time stamp stating the time of measurement, a device ID, the values sensed by the sensors on board the sensor nodes (e.g. temperature, light, sound, presence) and other relevant metadata.

[...] increasing. On the one hand, the price of hardware is decreasing; on the other hand, day-to-day devices and appliances are equipped with more capable hardware. Due to the large number of sensor nodes and the high sampling rates of sensor data, the amount of data is too large for many data processing algorithms to handle. Processing this deluge of data effectively requires dealing with real-time reporting, spatial distribution, the variety of sensors and the varying quality of the data. Therefore, dimension reduction techniques are usually used to reduce the number of features from a high-dimensional space to a low-dimensional representation [102].

The most commonly used techniques are: the Discrete Fast Fourier Transformation (DFFT), which transforms the time-based data into the frequency domain to remove unwanted frequencies before transforming it back to the time domain; the Principal Component Analysis (PCA), which extracts a new orthogonal basis to represent the original data by calculating the covariance or the Singular Value Decomposition (SVD); and the Piecewise Aggregate Approximation (PAA) with its symbolic representation, which uses averaged windows and is utilised in this work. We evaluate and discuss some of these techniques in the evaluation section.
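As an illustration of the windowed averaging behind PAA, the following is a minimal sketch only. The function name `paa`, the requirement that the series length be a multiple of the number of segments, and the sample readings are assumptions made for this example and are not taken from the framework described here.

```python
from statistics import fmean


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width window."""
    n = len(series)
    # Simplifying assumption: the series divides evenly into the windows.
    if n == 0 or n_segments <= 0 or n % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    width = n // n_segments
    return [fmean(series[i:i + width]) for i in range(0, n, width)]


# Hypothetical temperature readings, reduced from 8 samples to 4 segment means.
readings = [20.0, 22.0, 21.0, 23.0, 30.0, 32.0, 31.0, 29.0]
print(paa(readings, 4))  # [21.0, 22.0, 31.0, 30.0]
```

Each window is replaced by a single value, so the number of values to store or transmit shrinks by the window width.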

To abstract from numerical values and to create higher-level concepts from the large amount of data produced by sensor devices, we use the SensorSAX dimensionality reduction mechanism introduced in Section 4.3. SensorSAX discretises the data and generates symbolic words representing patterns from the sensor data.

Data discretisation serves as a building block for many pattern and event detection algorithms. It makes it possible to map reoccurring patterns to events even if there is variance, time shifting or different means in the data [73], [137], [83], [93]. Instead of a constant encoding rate, SensorSAX uses a variable rate that depends on the activity in the streaming data. Because SAX words are transmitted only when there is activity in the sensor data, this allows higher compression and fewer errors when reconstructing the original raw data. In this work, we focus on creating a topical ontology using the patterns that SensorSAX extracts.

For instance, a time series of sensor data is transformed into the discretised word "CDDCBAAAB"; similar patterns will resemble this symbolic representation. The string similarity between patterns helps to index and then compare different patterns. It reduces the amount of data that has to be processed and allows rules to be associated with the discretised words in order to compare and/or process them.
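How numerical values become letters can be sketched as follows. This is plain SAX-style discretisation with equiprobable Gaussian breakpoints, not the SensorSAX algorithm of Section 4.3 with its variable encoding rate. The function names, the alphabet size of 4 and the sample values are assumptions for illustration; the windowed averaging step is omitted to keep the example short.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def z_normalise(series):
    """Standardise to mean 0 and standard deviation 1."""
    mean, std = fmean(series), pstdev(series)
    if std == 0:
        return [0.0] * len(series)  # constant signal carries no shape information
    return [(v - mean) / std for v in series]


def breakpoints(alphabet_size):
    """Cut points splitting the standard normal into equiprobable regions."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def to_symbols(normalised, alphabet_size=4):
    """Map each standardised value to a letter, 'A' being the lowest region."""
    cuts = breakpoints(alphabet_size)
    return "".join(chr(ord("A") + bisect_right(cuts, z)) for z in normalised)


# Hypothetical values: two low readings followed by two high readings.
print(to_symbols(z_normalise([21.0, 22.0, 31.0, 30.0])))  # AADD
```

Two series with the same shape but different offsets or scales yield the same word, which is what makes the words comparable across sensors.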

To illustrate the symbolic data aggregation, consider the word "CDDCBAAAB". It is a pattern that SensorSAX constructs from the data of an accelerometer that has been attached to a door and measured over 5 seconds. This could lead to the semantic concept "doorClosed" or "doorOpened" that can be stored and represented in an ontology.

Figure 46: Framework Overview (elements shown: Raw Sensor Data, Data Pre-Processing, SAX discretisation, Ontology Construction, Concept Creation, Property Creation, Rule Base, Rule Based Labelling, Concept Naming, Property Naming, Ontology)

6.1.2 Semantic Representation of Real World Data

The key idea behind using semantic descriptions for sensor data is to enable representation, formalisation and enhanced interoperability of sensor data. Ontologies can be used to store semantic concepts that represent phenomena and attributes from the real world. These concepts are understandable for the human user and, due to the standardised data representation, also interpretable by machines.

The concepts can be linked together through relationships that express interactions and dependencies between the concepts. The W3C Semantic Sensor Network Incubator Group has introduced the Semantic Sensor Network Ontology (SSN) [19], which provides a model to annotate sensors, their metadata and the gathered data. The SSN Ontology uses semantic concepts to model the physical attributes of sensor networks such as "Sensor Device", "Temperature Sensor" and "Radio Link". Properties in the SSN, such as "occuredAt" and "observedBy", model the relationships between concepts and relate sensor data annotations to domain models.

Zhao and Meersmann [141] introduce the concept of topical ontologies, which represent a basic knowledge structure of a certain domain that can be used as a building block for further enhancement. Topical ontologies include the main concepts (topics) that appear in a certain domain but, unlike a taxonomy, also provide basic relations between the fundamental concepts.

We use the SSN Ontology as a starting point for our method and extend the ontology by extracting new insights from the raw sensor data to construct a topical ontology representing an extract of the observed domain. The following describes our approach to bridging the gap between raw data and the required semantic concepts.

6.1.3 Overview of the framework

Figure 46 shows an overview of the proposed framework to process the raw sensory data and construct a topical ontology. The framework consists of three main components: Data Pre-Processing, Ontology Construction and Rule Based Labelling. The raw sensor data serves as the input for the framework. A KMeans clustering mechanism is used to group the data into clusters that form the unlabelled concepts. A Markov model is used to create temporal relations between the newly created concepts.

The unnamed concepts (i.e. clustered patterns) and temporal relations are used to create the initial topical ontology. After the initial ontology construction, the concepts are labelled using a rule-based reasoning mechanism. The rule-based engine processes the context of the data and tries to name the unlabelled concepts and properties.

1. Data Pre-Processing: In a first step, the raw data is standardised to a mean of 0 and a standard deviation of 1 to ensure an even distribution of the data over the whole processing period and to allow comparison of differently distributed signals. Afterwards, the data is transformed into SensorSAX patterns. This allows the mapping of symbolised descriptions to semantic concepts in the ontology construction and also reduces the size of the data to be communicated. The dimensionality of the data is reduced by the aggregation algorithm in SensorSAX.

This step can be performed on the sensing devices. If the devices are not able to perform the task due to limited processing capabilities, the process can be moved to a node with higher processing capabilities (e.g. a gateway).

2. Ontology Construction: The structure creation process defines the outline of the ontology construction. A preliminary ontology structure is created by extracting concepts and properties using a clustering algorithm and a statistical model. We follow a conceptual clustering approach [39] to create semantic concepts without labelling them.

The clusters are formed based on the similarity of the attributes: the symbolic representation and metadata such as the sensor type and the time range of the measurement. Each cluster is formalised as an unnamed concept in the ontology structure. To model the properties in our current implementation, we use a Markov model to find temporal relations such as "occursAfter" between the concepts.

3. Rule-Based Labelling: In order to name the concepts and the properties, we utilise a rule-based mechanism. The rule system is based on the Semantic Web Rule Language. It accepts symbolised SAX patterns and adds a name tag to the unlabelled concepts.

We introduce a system that is able to extract rules based on the meta-information and external data sources to automatically define the labels.
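The temporal relations of step 2 can be pictured with a first-order Markov estimate over the sequence in which the unnamed concepts are observed. The following is a sketch under stated assumptions: the concept identifiers `c1` to `c3`, the function names and the probability threshold `min_probability` are hypothetical and chosen only for illustration; the section does not specify how transitions are turned into properties.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """First-order Markov estimate of P(next concept | current concept)."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for current, followers in counts.items()
    }


def occurs_after_relations(sequence, min_probability=0.5):
    """Return (later, 'occursAfter', earlier) triples for likely transitions."""
    probs = transition_probabilities(sequence)
    return sorted(
        (nxt, "occursAfter", current)
        for current, followers in probs.items()
        for nxt, p in followers.items()
        if p >= min_probability  # assumed cut-off, not taken from the text
    )


# Hypothetical order in which clustered concepts were observed.
observed = ["c1", "c2", "c1", "c2", "c3", "c1", "c2"]
for triple in occurs_after_relations(observed, min_probability=0.6):
    print(triple)
```

With the threshold of 0.6 used here, only transitions that always occurred in the sample sequence (c1 to c2, and c3 to c1) become "occursAfter" relations.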


6.2 DATA DRIVEN ONTOLOGY CONSTRUCTION

The following three methods are introduced to develop a solution that automatically constructs an ontology depicting a perceptual view of the sensed environment: clustering the symbolic patterns, creating properties via a Markov model and naming the unlabelled concepts via a rule-based method.

6.2.1 Clustering for Concept Construction

In order to reduce the amount of data that has to be processed, we use the SensorSAX algorithm to create compressed symbolic representations of the data. SAX introduces a distance function that allows generated words such as "ABBA" and "ABBC" to be compared and their similarity to be stated as a value between 0 and 1. Common distance measures and string similarity functions such as the Levenshtein or Hamming distance cannot be used on SAX words due to the non-uniform distribution of the letters in the main SAX algorithm.
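A lookup-table distance of the kind SAX uses can be sketched as follows. The sketch follows the original SAX formulation, in which adjacent letters have distance 0 and other pairs are separated by the gap between their Gaussian breakpoints; it returns an unnormalised distance rather than a similarity between 0 and 1. The function names are assumptions, and the scaling factor that the SAX distance applies in front of the sum is left as the hypothetical parameter `scale`.

```python
from math import sqrt
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Cut points splitting the standard normal into equiprobable regions."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def letter_dist(a, b, cuts):
    """Lookup-table entry: 0 for equal or adjacent letters, else the breakpoint gap."""
    r, c = ord(a) - ord("A"), ord(b) - ord("A")
    if abs(r - c) <= 1:
        return 0.0
    return cuts[max(r, c) - 1] - cuts[min(r, c)]


def sax_dist(p, q, alphabet_size, scale=1.0):
    """Distance between two SAX words of equal length."""
    if len(p) != len(q):
        raise ValueError("SAX words must have the same length")
    cuts = breakpoints(alphabet_size)
    return scale * sqrt(sum(letter_dist(a, b, cuts) ** 2 for a, b in zip(p, q)))


print(sax_dist("ABBA", "ABBB", 4))  # 0.0, adjacent letters are not distinguished
print(round(sax_dist("ABBA", "ABBC", 4), 4))  # 0.6745
```

The first result shows why a Hamming-style count of differing letters would be misleading here: "ABBA" and "ABBB" differ in one position, yet their SAX distance is zero.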

Comparing the words alone is not sufficient, as words can be similar but measured by different types of sensors that are not related to each other. The words also depend on the observation time. We therefore introduce a set of information that is needed to cluster the data into different groups based on their attributes. We define a triple set A = [P, t, T], where P is a SAX word, t is the observation time and T the observation type. In addition, we define a distance function (shown as Equation 13) to compare the similarity of two triple sets.

\[
\mathrm{saxDist}(P, Q) = \sqrt{\frac{n}{w}} \sqrt{\sum_{i=1}^{n} \mathrm{dist}(p_i, q_i)^2} \tag{12}
\]

\[
\mathrm{distance}(A_1, A_2) = \mathrm{saxDist}(P_1, P_2) \cdot \mathrm{timeDiff}(t_1, t_2) \cdot \mathrm{typeDiff}(T_1, T_2) \tag{13}
\]

Equation 12 depicts the original SAX-based distance function. saxDist(P, Q) returns the distance between two words P and Q according to the distance function in [83], where n is the length of the SAX word, w is the alphabet size used in the discretisation process and the function dist(p_i, q_i) refers to a precalculated lookup table for the particular alphabet size w. We extend this equation by adding factors that compare the time difference and the type difference between two triples, as shown in Equation 13. timeDiff returns a value in the interval ]0, 1] according to the temporal distance of two triples, and typeDiff returns either 0 or 1 depending on whether the types of the triples match. Comparing function values from Euclidean and non-Euclidean spaces can lead to wrong results, as the space dimensions are not equal. The alternatives are the use of non-linear dimensionality reduction techniques and the kernel trick to map them

into a common space, however the complexity for sensor networks to