Frieder Ganz
Thesis submitted to the University of Surrey for the degree of Doctor of Philosophy
Centre for Communication Systems Research
Faculty of Engineering and Physical Sciences
University of Surrey
ProQuest Number: 27558603
ProQuest 27558603. Published by ProQuest LLC (2019). Copyright of the Dissertation is held by the Author.
ABSTRACT
There is a growing trend towards integrating physical data into the Internet which is supported by sensor devices, smartphones, GPS and many other sources that capture and communicate real world data. Cyber-Physical Data describes the type of data that represents observations and measurements gathered by sensor devices. These sensor devices are capable of transforming physical information (e.g. light, temperature, coordinates) into digitised data.
With tremendous volumes of Cyber-Physical Data that are created, novel methods have to be developed that facilitate processing and provisioning of the data. Automated techniques are required to extract and infer meaningful abstractions for the end-user and/or higher-level knowledge.
Investigation of the related work leads to the conclusion that there has been significant work on communication and processing aspects of Cyber-Physical Data; however, there is a need for integrated solutions that contemplate the workflow from data acquisition to extraction and knowledge representation.
We propose a set of novel solutions for Cyber-Physical Data communication and information processing by providing a middleware component that contains management and communication processing capabilities to deliver actionable knowledge to the end-user and services.
We have developed a novel data abstraction method for Cyber-Physical Data. The abstraction method is based on a probabilistic graph model and machine-learning techniques to extract relevant information and infer knowledge from patterns that are represented by the abstracted data.
The proposed approach is able to create human-readable/machine-interpretable abstractions from numerical sensor data with a precision rate of 79% and recall of 94%. The automated ontology construction algorithm has a success rate of 84% of representing occurred events in the ontology.
Finally, an integrated software system is introduced that uses the middleware and the information processing techniques to provide a complete workflow from data acquisition to knowledge acquisition and representation.
PUBLICATIONS
This thesis and its contributions are based on several Journal, Conference and Workshop publications. The following manuscripts have been used for the composition of this thesis:
JOURNAL PUBLICATIONS
• Frieder Ganz, Payam Barnaghi and Francois Carrez, "Information abstraction for heterogeneous real world internet data," in IEEE Sensors Journal, vol. 13, no. 10, pp. 3793-3805, 2013.
• Frieder Ganz, Payam Barnaghi and Francois Carrez, "Automated Semantic Knowledge Acquisition from Sensor Data" in IEEE Systems Journal (forthcoming), 2014.
CONFERENCE AND WORKSHOP PUBLICATIONS
• Sefki Kolozali, Maria Bermudez-Edo, Daniel Puschmann, Frieder Ganz, Payam Barnaghi "A Knowledge-based Approach for Real-Time IoT Data Stream Annotation and Processing" in 2014 IEEE International Conference on Internet of Things, Taipei, Taiwan, 2014.
• Ralf Toenjes, Muhammad Intizar Ali, Payam Barnaghi, Sorin Ganea, Frieder Ganz, Manfred Hauswirth, Brigitte Kjaergaard, Daniel Kuemper, Alessandra Mileo, Septimiu Nechifor, Amit Sheth, Vlasios Tsiatsis and Lasse Vestergaard "Real Time IoT Stream Processing and Large-scale Data Analytics for Smart City Applications" in the European Conference on Networks and Communications, 2014.
• Frieder Ganz, Payam Barnaghi and Francois Carrez "Multi-resolution Data Communication in Wireless Sensor Networks" in the IEEE World Forum on Internet of Things (WF-IoT), Seoul, South Korea, 2014.
• Frieder Ganz "Information Abstraction and Knowledge Acquisition from Real-World Data" in the doctoral symposium at the Web Reasoning and Rule Systems (RR), Mannheim, Germany, 2013.
• Andreas Emrich, Frieder Ganz, Dirk Werth, Peter Loos "Statistics-Based Graphical Modeling Support for Ontologies" in The Seventh International Conference on Advances in Semantic Processing (SEMAPRO), Porto, Portugal, 2013.
Things"' in 2012 IEEE International Conference on Internet of
Things, Besancon, France, 2012.
• Payam Barnaghi, Frieder Ganz, Cory Henson, Amit Sheth "Computing Perception from Sensor Data" in Proceedings of the IEEE Sensors 2012 Conference, Taipei, Taiwan, 2012.
• Wei Wang, Payam Barnaghi, Gilbert Cassar, Frieder Ganz, Pirabakaran Navaratnam "Semantic Sensor Service Networks" in Proceedings of the IEEE Sensors 2012 Conference, Taipei, Taiwan, 2012.
• Frieder Ganz "Designing Smart Middleware for Wireless Sensor Networks" in the 12th Annual PostGraduate Symposium on the Convergence of Telecommunications, Networking and Broadcasting (PGNet11), Liverpool, United Kingdom, 2011.
• Frieder Ganz, Payam Barnaghi, Francois Carrez, and Klaus Moessner "A Mediated Gossiping Mechanism for Large-scale Sensor Networks" in the International Workshop on Machine-to-Machine Communications (IWM2M), IEEE GLOBECOM 2011, Houston, Texas, 2011.
• Frieder Ganz, Payam Barnaghi, Francois Carrez and Klaus Moessner "Context-Aware Management of Sensor Networks" in the Fifth International Conference on COMmunication System softWAre and middlewaRE (COMSWARE), Verona, Italy, 2011.
UNDER REVIEW
• Frieder Ganz, Payam Barnaghi, Francois Carrez "A Practical Evaluation of Information Processing and Abstraction Techniques for the Internet of Things" submitted to IEEE Internet of Things Journal, 2014.
ACKNOWLEDGMENTS
This work would not have been accomplished without the support and help of my colleagues, friends and family. Special thanks go to:
Dr. Payam Barnaghi for supporting me from the beginning of my studies at the University of Surrey, his excellent supervision during this time and his limitless patience.
Dr. Francois Carrez for his support and his witty comments throughout my work.
Rita Kottmeier, Sarah Klein, Daniel Puschmann and Andreas Emrich for proofreading this thesis and their accompaniment in the last years.
My friends and family for their invaluable support, motivation and feedback.
CONTENTS
i PROLOGUE
1 INTRODUCTION
1.1 Definitions
1.2 Research Challenges
1.3 Research Objectives
1.4 Assumptions
1.5 Contributions
1.6 Thesis Outline
2 BACKGROUND
2.1 Wireless Sensor Networks
2.2 Middleware Approaches for Sensor Networks
2.3 Data Communication in Sensor Networks
2.4 Data Processing for Sensor Networks
ii COMMUNICATION FOR CYBER-PHYSICAL DATA
3 A CONTEXT-AWARE MIDDLEWARE
3.1 Context-awareness in Sensor Networks
3.2 Middleware Architecture
3.3 Middleware Implementation
4 ENHANCED DATA COMMUNICATION FOR MIDDLEWARE
4.1 Mobility Support for Middleware
4.2 Mediated Gossiping for Sensor Networks
4.3 Multi Resolution Communication
iii DATA PROCESSING FOR CYBER-PHYSICAL DATA
5 ABSTRACTION FOR CYBER-PHYSICAL DATA
5.1 Abstraction Creation
5.2 Abductive Reasoning
5.3 Temporal Reasoning
6 AUTOMATED ONTOLOGY CONSTRUCTION
6.1 Data Processing Framework
6.2 Data Driven Ontology Construction
6.3 Implementation and Evaluation
7 AN INTEGRATED SYSTEM FOR KNOWLEDGE ACQUISITION
7.1 Designing an Integrated Tool
7.2 From Data Acquisition to Knowledge Acquisition
iv EPILOGUE
8 CONCLUSIONS AND FUTURE WORK
8.1 Summary of Research Achievements
8.2 Lessons Learned
8.3 Future Work
LIST OF FIGURES
Figure 1 Sensor Network Infrastructure
Figure 2 Oracle SunSpot and CrossBow TelosB
Figure 3 Monitoring Scenarios
Figure 4 Common Components of Middleware
Figure 5 Middleware Classification
Figure 6 Hybrid vs. Decentralised Middleware Approach
Figure 7 Common Information Abstraction process, defined by examining different approaches
Figure 8 Pre-Processing Techniques
Figure 9 Original Data and reconstructed Fourier transformation with less coefficients
Figure 10 Data transformed via DWT into different resolutions. Left smooth. Right coefficient values
Figure 11 Original Data and PAA transformation with different window lengths
Figure 12 Variable PAA and the Adapted Window Sizes
Figure 13 Original Data and reconstructed SAX transformations with different alphabet sizes
Figure 14 KMeans with k = 2 applied to the data; different colours state different cluster types
Figure 15 Markov chain created by the frequency of the values in [1, 1, 1, 2, 100, 2, 2, 3, 3, 3]
Figure 16 HMM with three states applied to the data; different colours state different cluster types
Figure 17 Comparison between HMM with three states and KMeans with k=3; it should be noted that HMM takes the temporal dimension into account while grouping. Top: HMM with 3 states in temporal sequence. Bottom: Cluster with three groups
Figure 18 A Linked Graph containing Classes and Instances linked together via Properties
Figure 19 The System Overview
Figure 20 Proposed Association and Negotiation Protocol
Figure 21 Extended Ontology
Figure 22 Sense2Web Application
Figure 23 Service rendering on Node vs Intermediate Rendering
Figure 24 IoT Scenario
Figure 25 Caching Mode
Figure 26 Tunnelling Mode
Figure 27 Context Based Grouping
Figure 28 Dissemination rate of different mechanism
Figure 29 Message amount of different mechanism
Figure 30 Gossip Simulation Software
Figure 31 Multi-Resolution Message Format showing the message structure and sample message in readable data and encoded
Figure 32 The Original PAA function
Figure 33 The modified PAA function in the SensorSAX algorithm
Figure 34 Dimensionality Reduction Process of SAX
Figure 35 Evaluation of SAX and SensorSAX
Figure 36 Granularity selection based on the quartiles of the data distribution for the variance
Figure 37 Impact on data size and correlation of reconstructed data by using different window length m
Figure 38 Change of the window length m (red) over the dataset, showing smaller m for less active areas and higher m for more active areas
Figure 39 Processing from Raw Data to Abstraction
Figure 40 An Example of an extended PCT with probabilities
Figure 41 Complete Abstraction Model
Figure 42 The Original and Abstracted Data
Figure 43 Latency and Correlation Results of Abstraction Creation
Figure 44 Evaluation Results of Abstraction Creation
Figure 45 From Raw Data to Semantics
Figure 46 Framework Overview
Figure 47 Clustering the Patterns into Concepts
Figure 48 Temporal Relation based on Pattern Frequency
Figure 49 Evaluation of the dimensionality reduction process
Figure 50 Determination of the numbers of clusters based on the cluster group variance
Figure 51 Clustering the Triples into Groups
Figure 52 Evaluating the Clustered Data from different Sensors with Real Calendar Information and showing the Error Rate over 100 Random Runs
Figure 53 Number of Relations based on the Factors: Cut off Threshold and Cluster Size
Figure 54 A Schematic View of the Constructed Topological Ontology
Figure 55 An Excerpt of the Automatically Created Topological Ontology
Figure 56 Architectural Overview of the Toolkit
Figure 57 Workflow from Data Acquisition to Knowledge Acquisition
Figure 58 The Workflow of the Data Abstraction and the Selected Algorithms
Figure 59 ECG data (a) and power consumption data (b) loading screen
Figure 60 (a): ECG data after applying variance filter. (b): Watts data after applying highpass filter.
Figure 61 (a): ECG data after applying PAA, revealing the outlier and suppressing the background noise. (b): Watts data after applying PAA, revealing the regular pattern of a workday
Figure 62 (a): ECG data after applying KMeans with k=3, grouping the data into groups of data with low values (0), high values (1) and outliers (2). (b): Watts data after applying HMM with 2 states, grouping the data into two groups of low power (0) and high power (1) consumption
Figure 63 (a): ECG data after applying PAA, revealing the outlier and suppressing the background noise. (b): Watts data after applying PAA, revealing the regular pattern of a workday
Figure 64 Data Graph Representation of the information
LIST OF TABLES
Table 1 Commonly used sensor nodes
Table 2 Data-Centric middleware approaches
Table 3 Features of data-centric middleware
Table 4 Agent-based middleware
Table 5 Features of the agent-based middleware
Table 6 Service-based middleware
Table 7 Features of service-based middleware
Table 8 Application-deployment specific middleware
Table 9 Features of application-deployment middleware
Table 10 Application-development specific middleware
Table 11 Features of application-development middleware
Table 12 Middleware for Large Sensor Networks
Table 13 Features of middleware for Large Sensor Networks
Table 14 Features in quality of service middleware
Table 15 Performance of probabilistic Gossiping
Table 16 Overview of the approaches and their selected methods
Table 17 Contributions of this Thesis beyond the State of the Art
Table 18 Context Information provided by different approaches
Table 19 Features of the middleware component
Table 20 Comparison between different modes and direct access
Table 21 Performance of deterministic gossiping in different topologies
Table 22 Features of the enhanced middleware component
Table 23 Initial values for the reasoning model
Table 24 Automatically extracted properties for rule creation
Table 25 Error Rate in detecting the correct groups from different sensor types
Table 26 Methods applied throughout the process
ACRONYMS
IoT Internet of Things
WSN Wireless Sensor Networks
REST Representational State Transfer
XML Extensible Markup Language
SAX Symbolic Aggregate Approximation
FFT Fast Fourier Transformation
DFT Discrete Fourier Transformation
PAA Piecewise Aggregate Approximation
PCA Principal Component Analysis
RSS Rich Site Summary
SQL Structured Query Language
QoS Quality of Service
QoI Quality of Information
W3C World Wide Web Consortium
SensorSAX SAX for Sensor Data
PCT Parsimonious Covering Theory
OS Operating Systems
ToA Time of Arrival
IEEE Institute of Electrical and Electronics Engineers
WiFi Wireless Fidelity (IEEE 802.11)
GPS Global Positioning System
3G Third Generation Mobile Standard
CSV Comma Separated Values
KAT Knowledge Acquisition Toolkit
CKAN Comprehensive Knowledge Archive Network
API Application Programming Interface
UML Unified Modeling Language
HMM Hidden Markov Model
RDF Resource Description Framework
Part I PROLOGUE
1 INTRODUCTION
Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.
Clifford Stoll and Gary Schubert [71]
We are surrounded by a deluge of data. With the emergence of devices that observe and measure the physical world and communicate the sensory data over the Internet, tremendous volumes of data are created constantly.
This growing trend towards integrating real world data into the Internet, with support by sensors, RFID tags, smart phones, GPS and other sensory sources that capture and communicate real world data, is referred to as Cyber-Physical Data. When communication and networking aspects, actuation and services are included, it is referred to as the Internet of Things (IoT).
The data, however, is heterogeneous by essence. It ranges from temperature readings from meteorological sensors to location data from GPS to price information from RFID tags. Raw data is mostly numerical and is not as easily interpretable for a human as a text document, video or other data available on the Internet.
It is usually a specific abstraction process that makes the connection between the raw data and something meaningful that can ultimately lead to actionable knowledge. A human can abstract from a small amount of inputs to something he or she can understand by incorporating context information and experience. However, the volume, the variety and the dynamicity of the data make it impossible to do this manually. As a result, many emerging research areas gain attention, such as the IoT, with its goal to connect and link devices to a ubiquitous computing platform, and the Big Data challenge of dealing with the tremendous volume of data.
The process taking place between data collection and knowledge acquisition involves many steps. In this work we explore the steps required to abstract from raw data to information and eventually provide new insights. This work has two main focuses, (1) Intelligent Communication and (2) Information Processing:
1. The analysis of technical requirements that are needed to gather and link the data from various sources and the provisioning of a unified software component that allows access to the data.
2. The provisioning of analytical methods to abstract from raw data to meaningful information that provides new insights based on the raw data.
1.1 DEFINITIONS
In the following chapters and sections, we use terms that are defined ambiguously in the literature. Some terminology is used differently in certain research domains depending on the respective objectives. To ensure a common understanding, we provide definitions of the terms as they are used and understood in this work; more detailed descriptions can be found in the respective chapters.
CYBER-PHYSICAL DATA is also referred to as Real-World Data and describes data that represents physical information gathered by sensor devices that have the capabilities to transfer physical information (e.g. light, temperature, coordinates) into machine-readable data.
THE INTERNET OF THINGS has the aim to represent and interact with objects in the Internet. Sometimes, the term has been used synonymously with pervasive and ubiquitous computing and Cyber-Physical Systems.
MIDDLEWARE is a software component that connects the devices used in sensor networks and hides their complexity for the ease of application programming. The component spans the heterogeneous devices, from sensor nodes and gateways to workstations.
1.2 RESEARCH CHALLENGES
The complex process of information abstraction can be divided into two categories: Communication and Dissemination Challenges, and Information Processing Challenges. Whereas communication tasks focus on how to retrieve and forward the data from sensor nodes to end-users or higher-level applications, information processing focuses on what to do with the data in order to make it understandable and/or machine-interpretable, and facilitates the process of abstracting from information to knowledge. The main research challenges for each category are listed below.
1. Communication and Dissemination Challenges:
a) There is a vast variety of data sources. The heterogeneity has many facets, ranging from different hardware interfaces and communication protocols on the sensor side to different notations and data formats that hinder the interoperability between data producers and data consumers. Most available middleware solutions focus only on certain hardware and solution platforms. This makes it difficult to have one integrated solution that is able to use and integrate the data in heterogeneous environments.
b) Sensor nodes are often resource-constrained devices with small memory and computing facilities. This limits the use to algorithms and techniques that have been developed for constrained environments.
c) Sensor networks consist of different types of nodes with different capabilities. Communication and dissemination mechanisms have to adapt to the infrastructure and distribute the load on the network.
d) Wireless sensor devices can be mobile. Similar to cellular networks, wireless sensor networks contain base stations that cover different areas. However, mobility handling requires methods that facilitate the communication and data dissemination in hand-over scenarios to sustain and provide reliable data flows.
e) The complexity of sensor networks due to their heterogeneity makes it difficult to program single nodes, as each node can vary in hardware and software. In most cases it is not possible to manually control a network and adapt it to its changes. Therefore a hardware abstraction layer is required that hides the complexity of the underlying infrastructure and facilitates the development of methods from a high-level service or application perspective.
f) Sensory data, especially from wireless sensor networks, is unreliable. The battery can be drained or natural phenomena can destroy nodes in the network. This leads to a constant flow of nodes that join or leave the network. This requires a reliable information structure that can handle the dynamicity of the network.
2. Information Processing Challenges:
a) The deluge of data leads to an information overload. Manual analysis methods are not suitable anymore and new (semi-)automatic techniques are required to find the desired information and/or reveal new insights from the data.
b) Real-world information is nowadays captured in real-time. This demands platforms that can handle the constant flow of data and facilitate the (near) real-time analysis.
c) Cyber-Physical Data is gathered from different sources with diverse meta and context information.
d) Cyber-Physical Data does not have a standardised model for meta-information. For example, data can be measured in degrees Celsius or in Fahrenheit; this heterogeneity of the data can lead to interoperability issues in the data processing.
e) Cyber-Physical Data is prone to natural changes and impacts from the environment. The heterogeneity of the data demands methods that can cope with the multimodal nature of the input sources. Algorithms are required that can work on different types of data.
f) The meaning of data is dependent on temporal attributes. An observation during winter time can lead to different interpretations than in summer time. The time axis as context information has to be considered and incorporated in new methods.
g) The meaning of data is dependent on spatial attributes. An observation in England can lead to different interpretations than in Germany. The spatial information as context information has to be considered and incorporated in new methods.
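The lack of a standardised model for meta-information (challenge 2d) can be illustrated with a minimal sketch. It assumes that every reading carries a unit tag alongside its value; the `Reading` type, its field names and the unit strings are hypothetical and are not taken from the middleware described in this work. The sketch only shows why a processing step needs the unit as context information before values from different sources can be compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A single temperature observation (hypothetical structure)."""
    sensor_id: str
    value: float
    unit: str  # assumed unit tag: "celsius" or "fahrenheit"


def to_celsius(reading: Reading) -> Reading:
    """Return an equivalent reading expressed in degrees Celsius."""
    unit = reading.unit.lower()
    if unit == "celsius":
        return reading
    if unit == "fahrenheit":
        celsius = (reading.value - 32.0) * 5.0 / 9.0
        return Reading(reading.sensor_id, celsius, "celsius")
    # Without a known unit the value cannot be interpreted safely.
    raise ValueError(f"unsupported unit: {reading.unit!r}")


# Two sensors reporting the same room temperature in different units.
readings = [
    Reading("office-1", 21.5, "celsius"),
    Reading("office-2", 70.7, "fahrenheit"),
]
normalised = [to_celsius(r) for r in readings]
```

After normalisation both readings can be compared directly; a reading with an unknown unit is rejected rather than silently treated as Celsius.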
1.3 RESEARCH OBJECTIVES
The main goal of this work is to make Cyber-Physical Data available and understandable for humans and/or machine-interpretable for services and applications. As indicated in Section 1.2, the process starts with gathering the data from various sources and its communication, and leads to the processing of data towards meaningful information. We break this complex task down into a finer-grained list of objectives that are summarised below:
• Objective 1: To develop a middleware component providing a unified layer that enables the access to heterogeneous data sources. The middleware component runs on sensor nodes with constrained capabilities, gateways with better processing capabilities and workstations with interfaces for the end-user and higher-level applications. The middleware has to hide the complexity of the underlying networks and data sources and adapt itself to changes in the environment.
• Objective 2: The middleware component has to incorporate the dynamic nature of Cyber-Physical Data and provide communication and mobility support for data sources that are prone to changes and failures in the environment. In particular, a mechanism to handle appearing and disappearing sensor nodes and the mobility of nodes between different base stations (e.g. gateways) is developed.
• Objective 3: Based upon the unified access layer provided by the middleware component, processing methods that are developed for Cyber-Physical Data are required. The processing methods provided in this work allow abstraction from raw data to higher-level abstractions that are human-readable and/or machine-interpretable.
• Objective 4: The developed processing algorithms used to abstract from raw Cyber-Physical Data to higher-level abstractions have to be automated in order to cope with the large amount of data gathered by the middleware component.
1.4 ASSUMPTIONS
Throughout this work, we made some assumptions that are listed below.
• Sensor nodes and Cyber-Physical devices use communication protocols on different layers of the OSI model. In our work, we focus on the application layer and assume that efficient routing and energy optimisations, as well as security aspects, are managed and implemented on the lower layers.
• In our work, we provide and develop novel automated algorithms to create meaningful abstractions of the data to make them understandable by humans and/or machines. This process describes the creation of semantic representations of the abstractions that can be read and understood by humans or used by higher-level applications that support semantic queries and models described by the semantic web standards.
• The definition of real-time can vary in processing large-scale distributed Cyber-Physical Data. The processing of input to output has technical limitations that often lead to delays, making it impossible to immediately present output results. Therefore, it might be more correct to refer to nearly real-time systems in this context. However, to stress the point that we focus on systems and approaches that deliver immediate results (within technical boundaries), we use the term real-time throughout this thesis.
1.5 CONTRIBUTIONS
This work assembles the contributions made to address the objectives stated in Section 1.3 and to overcome the challenges described in Section 1.2. A summary of the contributions is listed below.
1. We provide a middleware component that hides the complexity of underlying sensor networks and different hardware and protocol architectures. The middleware supports various sensor platforms such as CrossBow TelosB nodes and Oracle SunSpots using IEEE 802.15.4 communication standards. The middleware is able to detect the context information of the nodes, including battery level, distance from the node to a gateway and signal strength. The information is used to adapt the communication patterns to save energy. A protocol similar to the IEEE 802.11 standard is introduced that allows dynamic association of nodes. The design of the protocol follows a zero-configuration approach with the goal to manage a large amount of heterogeneous sensor nodes. The middleware offers functionalities for the automated semantic annotation of the sensor nodes' capabilities and values to provide a knowledge base for further processing methods.
2. Initially, the middleware component has been developed for single-gateway setups. In Section 4.1 we address the mobility of sensor nodes between distributed gateways and provide new mechanisms to overcome the problems that arise during handover between different gateways. We provide caching and information dissemination mechanisms to maintain a reliable data flow between data producer and data consumer. Whereas Section 4.1 describes the communication required to support the mobility of nodes, Section 4.2 introduces an overlay network formed by the gateways for intelligent query processing and forwarding using a gossiping method.
3. In order to reduce the communication overhead, we introduce a data aggregation and compression mechanism called SensorSAX in Section 4.3. SensorSAX aggregates the raw sensor data to discretised string representations. Most of the common approaches create and transmit the raw sensor data constantly, even in times where no interesting events occur. Therefore, novel methods are required to provide multi-resolution data transmission that allows the communication of high-resolution data (i.e. raw data) in times of high activity and aggregated/compressed data in time windows where no events occur. We propose an approach to create and transmit aggregated patterns of data by applying Symbolic Aggregate Approximation (SAX) [82] to the sensor data unless higher-resolution data is required. We store the raw data in a ring buffer on the local sensor node to provide high-resolution data if required, but only transmit aggregated patterns to a gateway node in times of low activity. We define a messaging format that includes aggregated patterns. These patterns represent the most interesting features and their context information and are used as the fundamental step for the following data processing techniques. SensorSAX achieves a data reduction of up to 80% in constrained environments compared to the SAX algorithm.
4. To process the data towards something meaningful, we created an abductive reasoning model based on the Parsimonious Covering Theory (PCT) [108], in which sensors report different data that serve as the input for our model in Chapter 5. An abductive model has been chosen rather than an inductive or deductive approach to address the challenge of information incompleteness. In sensor environments, not all observations might be available during the reasoning process. Based on the available data obtained from sensors, we abductively rule out the most unlikely phenomena that could have caused the sensor observations. The model is used to infer higher-level abstractions such as "warm", "dark" or "no-attendance" in a smart office environment from the symbolic representation (SensorSAX) of the sensor data. We extend the outcome of the non-temporal PCT into the temporal domain by introducing a Hidden Markov Model (HMM) that includes the temporal dimension of the data. By taking the changes of states over time into the abstraction process, it is possible to detect events that occur over time. The model creates abstractions from numerical sensor data with a precision of 79% and a recall of 94%.
5. The abductive model introduced above does not support automated processing by default. We therefore introduce a novel rule-based system that designates the relationships between discretised symbolised data and semantic concepts, which is described in Chapter 6. We provide an automated approach for a real-world, data-driven topical ontology construction that represents the perceptual view of the collected data and the relationships between different concepts. We have developed a K-Means clustering algorithm to group similar discretised patterns that later represent named concepts in our ontology.
To discover relations between related concepts, we use a Markov chain model to find the most frequent temporal occurrences between patterns and name them after their occurrence. The automated ontology construction algorithm has a success rate of 84% in representing occurred events in the ontology.
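To make the discretisation step behind contribution 3 more concrete, the following is a minimal sketch of plain SAX [82] only: z-normalisation, Piecewise Aggregate Approximation and mapping to symbols via equiprobable Gaussian breakpoints. It is not the SensorSAX algorithm of Section 4.3; the function name `sax`, its parameters and the default alphabet are assumptions made for illustration, and the series is assumed to contain at least as many samples as segments.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, segments, alphabet="abcd"):
    """Return the SAX word of a numeric series (illustrative sketch, plain SAX)."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant window is mapped to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each (near) equal-length segment
    n = len(z)
    paa = [mean(z[i * n // segments:(i + 1) * n // segments])
           for i in range(segments)]
    # breakpoints that cut the standard normal into equiprobable regions
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    # each segment mean becomes the symbol of the region it falls into
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)


word = sax([0, 1, 2, 3, 4, 5, 6, 7], segments=4)
```

In this sketch eight raw samples are reduced to a four-symbol word, which illustrates why transmitting words instead of raw readings lowers the communication overhead in times of low activity.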
1.6 THESIS OUTLINE
The thesis is organised in four parts divided into 8 chapters. The first part (Prologue) - this section - introduces the research challenges, objectives and the contributions made to the scientific community and provides an analysis and discussion of the current state of the art. In particular:
CHAPTER 1 (Introduction) introduces the thesis work and describes the research challenges, objectives and also the assumptions made in this work. Furthermore, a summarised list of the scientific contributions is presented.
CHAPTER 2 (Background) provides an introduction to wireless sensor networks that serve as the building block of Cyber-Physical Systems and are the main source of data in this work. In this chapter, we provide a state-of-the-art analysis of different middleware solutions. We also provide an overview of the communication mechanisms and describe common data processing techniques that are used in sensor networks and middleware solutions.
The second part (Communication for Cyber-Physical Data) introduces our middleware component that supports heterogeneous sensor nodes and provides enhancements for the requirements of Cyber-Physical Data. In particular:
CHAPTER 3 (A Context-Aware Middleware) gives an introduction to our middleware component and provides an architectural overview. The middleware component serves as a building block for the further sections. The middleware is developed for resource-constrained sensor nodes and for nodes with higher processing capabilities called gateways or base stations.
CHAPTER 4 (Enhanced Data Communication for Middleware) extends the middleware component explained in Chapter 3 with mobility support of nodes between different gateways. It provides seamless communication between several setups of sensor nodes and gateways via an overlay network, and a multi-resolution communication approach for sensors called SensorSAX that reduces the traffic.

The third part of this thesis builds upon the middleware component from part two and the underlying communication methods introduced there, in order to abstract the data to meaningful information. In particular:
CHAPTER 5 (Abstraction for Cyber-Physical Data) provides the theoretical models to abstract from data to meaningful information. This chapter relies on the SensorSAX algorithm introduced in Section 4.3. The chapter introduces an abductive reasoning model based on the Parsimonious Covering Theory and extends it to the temporal domain.
CHAPTER 6 (Automated Ontology Construction) focuses on the automation of the abstraction process from raw data to a semantic representation of the inferred information. The chapter provides an automated method for data-driven ontology construction using abstraction methods.
CHAPTER 7 (An Integrated System for Knowledge Acquisition) introduces a toolkit that bridges the gap between the middleware and the processing methods. The knowledge acquisition toolkit serves as a client to the middleware component and provides an interface to apply the methods discussed and introduced in Chapter 5 and Chapter 6.
In the last part we conclude the thesis.
CHAPTER 8 summarises the research achievements obtained in this thesis and draws conclusions from them. Finally, an outlook on future work is presented and we discuss how this research can be further extended.
BACKGROUND
The communication and processing of Cyber-Physical Data into meaningful and interpretable information for humans and/or machines is facilitated by a chain of activities. Starting off as measurements obtained by sensors capturing physical attributes from the real world, sensor devices convert them into data available for the cyber world. The obtained data has to be communicated from the single sensor device through networks of nodes to reach base stations and eventually servers/workstations that provide the means to process the data into meaningful information.
Sensor nodes, base stations and servers/workstations are the physical building blocks that are used to enable the flow of information from its raw state to actionable knowledge. However, the glue that binds all the components together and hides the complexity of programming, communication and provisioning of the data is provided by the middleware.
In this chapter we present the necessary components and methods: beginning with the basic step of gathering physical observations by sensor nodes, continuing with abstracting and hiding the complexity of such components and gluing them together with middleware solutions, and ending with processing the data with information processing techniques so that it becomes knowledge that helps to find new insights in the data.
We first provide an overview of the building blocks, namely Wireless Sensor Networks (WSN), nodes and base stations (e.g. gateways) in Section 2.1. In Section 2.2, we introduce the concept of middleware and solutions with various features that abstract from the hardware layer and hide the complexity of sensor networks. We introduce the communication and information dissemination aspects of Wireless Sensor Networks and middleware in Section 2.3. Afterwards, we present information processing techniques that can run on the sensor network and the middleware component to acquire knowledge from the captured data in Section 2.4.
2.1 WIRELESS SENSOR NETWORKS
Wireless Sensor Networks (WSN) are one of the key enablers for Cyber-Physical Data and the Internet of Things. WSNs facilitate the connection of the physical world with the virtual world and allow interaction between real-world and virtual-world entities. A WSN contains several wireless sensor nodes that consist of processing, storage, power and wireless networking units and sensing devices that provide observation and measurement information.
Sensor networks have gained significantly in importance as the production costs of sensors have decreased, which favours deployment over large areas. Ease of use and rapid application development are also contributing factors to the increasing number of WSN-based applications and services. In the beginning of the WSN era, low-level programming languages had to be used to implement applications on the nodes; it is now possible to use high-level programming languages such as Java to program them in a highly abstracted manner. However, the distributed nature of large-scale networks and the constrained capabilities of those low-level devices require support for the development, management and execution of high-level applications and services. To bridge the gap between these low-level devices and high-level services, middleware solutions can be used. Middleware is commonly an abstraction layer over the underlying technologies. In WSN, a middleware provides interfacing and interaction functionality and integrates the low-level device capabilities and data with higher layers, applications and services.
Middleware hides the complexity of the distributed network and enables applications and users to access the data offered by the devices without being involved in the details of the underlying technologies. The middleware design for WSN ranges from approaches which are not far away from operating systems and provide very fundamental abstraction functionalities up to solutions that are very close to applications and provide functionalities for special purposes.
Depending on the purpose, middleware is mostly deployed on different components of the network such as the sensor node, the gateway and the backend. Its application also depends on the underlying hardware and resources, and it thus needs to be adapted to diverse requirements, for example those of resource-constrained devices.
Several survey articles provide an overview of middleware and the WSN research areas. We extend the existing survey work (e.g. [94], [129], [56], [61], [95] and [42]) and present a focused overview of attributes related to middleware design and/or implementation and discuss a broad range of existing approaches with different goals. In this section the WSN middleware design and implementations from data-centric, agent-based, service-based, deployment and development, large-scale and quality of service views are described. Different features and ideas that contribute to addressing the gap between the capabilities and data offered by low-level devices and the requirements of high-level services and applications are discussed.
2.1.1 Architecture of Sensor Networks
The wireless sensor network architecture can be divided into a three-tier architecture as shown in Figure 1: a) sensor nodes form the capillary network, which is sometimes also referred to as a Sensor Island; b) gateways or base stations receive data from the sensors and provide data for c) the back-end or core network. Sensor nodes provide observation and measurement data and forward it to other nodes for further processing. If the processing is performed inside the network, the term in-network processing is used.

Figure 1: Sensor Network Infrastructure (labels: Sensor Island, Sink Node, Gateway, Backend, API/GUI/Applications)
In the capillary networks, data is received and/or aggregated by more powerful nodes in the network (i.e. sink nodes) which are attached to gateway devices. These act between the capillary network and the core network. The gateway can be a workstation or a mobile node with higher processing capabilities. The data can be sent to the sink node directly if the sending node has a direct connection to it. If a node is not directly connected to the sink node, data is sent through neighbour nodes until the initial data reaches the sink node. This type of communication is called a multi-hop connection and is mostly handled by the lower network layers. The advantage is that distant nodes can be connected to the gateway or high-level networks. However, the nodes in between have to send messages and therefore consume power, which affects their lifetime (if they are battery powered). Routing and message forwarding in sensor networks is an ongoing research area; the routing challenges and tasks are not in the scope of this thesis. Further information regarding routing protocols can be found in [78] and [4].
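The multi-hop forwarding described above can be illustrated with a small sketch. It uses a breadth-first search to find a minimum-hop path to the sink; this is purely an illustration and not one of the routing protocols surveyed in [78] and [4]. The function name `route_to_sink`, the link table and the node names are hypothetical.

```python
from collections import deque


def route_to_sink(links, source, sink):
    """Return a minimum-hop path from source to sink, or None if unreachable."""
    parent = {source: None}          # remembers from which node we reached each node
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in links.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical capillary network: n3 has no direct link to the sink.
links = {"n3": ["n2"], "n2": ["n3", "n1"], "n1": ["n2", "sink"], "sink": ["n1"]}
path = route_to_sink(links, "n3", "sink")
relays = path[1:-1]                  # nodes that spend energy forwarding for n3
```

The `relays` list makes the trade-off visible: every intermediate node on the path has to transmit on behalf of the source and therefore spends part of its own energy budget.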
The gateway provides communication between low-level devices and high-level networks. In large-scale WSN, and in case thousands of low-level nodes have to be integrated into the IP-based Internet, gateways help to provide IP connectivity for the underlying sensor devices. The gateway nodes are usually equipped with two network interfaces: one for the capillary communication and one for the external communication. The gateway or middleware provides abstraction and service provisioning functionalities which are leveraged by external applications and users to communicate with low-level devices. In the following section we briefly introduce common hardware devices used in sensor networks and then describe the gateway and middleware requirements for WSN.
2.1.1.1 Sensor Nodes
Sensor nodes, sometimes referred to as "motes", mainly consist of a micro-controller, transceiver, sensors and a power source. The micro-controller manages the node and processes the data captured by the sensors. It also manages sending data to and receiving data from other nodes. The controllers differ mainly in terms of processing power and memory size. They range from low-cost micro-controllers such as the Atmel Mega Series¹ up to highly capable Arm Scale² processors. The trade-off between power consumption and processing power plays a role in the selection of the right controller for a sensor node.
The memory of the node is used for running the application logic and saving intermediate results. Nodes without contact to a central base station are usually equipped with external flash storage for long-term data operations, whereas sensors in large-scale networks typically process the data and forward it to a central point for storage.
The transceiver usually operates in the license-free bands of 915 MHz and 2.4 GHz, where several standards and protocols such as IEEE 802.15.4 [54] and the ZigBee [142] standard establish the connections on the higher layer protocols. Depending on the available transmission power and antenna design, the communication range is up to 25 km. Besides IEEE 802.15.4-based transceivers, sensors can be extended by other (air) interfaces. There are extension boards that enable IEEE 802.11 WLAN or cellular network communication via GSM, 3G or LTE for different sensor nodes.
The sensing devices are usually attached to a sensor board. The sensing devices range from temperature, light and accelerometer sensors to more complex devices such as gas and radiation detectors. The software and programming languages running on the nodes range from low-level ones such as the C programming language and its special variant nesC, which was developed to integrate fully with the sensor node operating system TinyOS [77] and to fulfil the special needs of sensor networks, up to high-level programming languages such as Java, running natively on the nodes. The most well-known node that natively executes Java Mobile Edition code is the Oracle³ SunSpot [122] node.

¹ http://www.atmel.com/products/microcontrollers/
² http://arm.com/products/
³ Former Sun Microsystems, Inc.
Figure 2: Oracle SunSpot and CrossBow TelosB
TinyOS and the nesC programming language are used in Crossbow [23] nodes such as the TelosB and Mica2 depicted in Figure 2. Table 1 shows some of the commonly used sensor node platforms.
| Sensor Node | Microcontroller | Memory | Transceiver | Sensors | Programming Language |
|---|---|---|---|---|---|
| Crossbow Telos/Moteiv Tmote Sky | 8 MHz TI MSP430 | 10 KByte RAM | IEEE 802.15.4/ZigBee | Optional sensor suite including light, temperature and humidity | TinyOS/nesC/Contiki |
| Crossbow | 4 MHz ATmega 103L | 4 Kb RAM + 128 Kb Flash | 916 MHz radio transceiver | Several optional | TinyOS/nesC |
| Crossbow | ATmega 128L | 4 Kb RAM + 128 Kb Flash | 868/916 MHz, 433 or 315 MHz multi-channel | Several optional | TinyOS/nesC |
| Oracle SunSpot | 400 MHz ARM926EJ-S | 8 Mb Flash + 1 MByte RAM | 802.15.4 transceiver | Several | Java (Java ME) |
| WaspMote [131] | ATmega1281, 8 MHz | 8 Kb RAM + 128 Kb Flash + 2 Gb SD card | 802.15.4/ZigBee Pro/RF 868 MHz/Bluetooth/GSM | Several | Waspmote/C# |
Table 1: Commonly used sensor nodes
2.1.1.2 Gateways
Gateways can be nodes or workstations with high processing capabilities and interfaces that enable communication between sensor networks and external applications. Gateways are used to translate between different low-level and high-level protocols such as 6LoWPAN [96] and CoAP [67] on the node side and TCP/IP or SOAP on the external side.
In addition to these translation and connectivity tasks, gateways can also be used to enhance the overall network performance with their resources. Caching and other methods used to reduce response time and save energy are some of the common capabilities that can be included in the gateways. Sometimes in the literature, gateways are also referred to as cluster heads⁴. Some tasks can be delegated to these nodes to enhance the network communications and accessibility.

2.1.1.3 High-level Services / Backend
To use the sensor observations and measurement data in different scenarios, applications and users need to retrieve the information from the underlying sensor networks. This can be achieved by providing a graphical user interface for the end-user or interfaces for software components to communicate with the nodes. The interfaces can be implemented by leveraging existing standards such as SQL and Web services.
In the next section we introduce key components and challenges in designing middleware for wireless sensor networks.
2.1.2 The Role of Middleware
Middleware provides connectivity and interoperability between different layers and is usually used to hide the complexity of lower-level technologies in order to ease application development and integration for developers and end-users. In communication networks, it acts as a service provider built on top of the network layer to abstract from the low-level communication layers.
The Common Object Request Broker Architecture (CORBA) is one of the first [126] middleware solutions to connect different applications from finance, medicine, manufacturing and other areas where various software components are used. CORBA uses interfaces to exchange objects. The interfaces are not bound to specific software architectures but can be extended by request brokers. The brokers can be implemented in different languages. CORBA is used in workstations and traditional networks, which limits the use of CORBA-based middleware in WSN. WSN require middleware solutions that are adaptable to the requirements of ad-hoc, low-processing and power-limited devices. In this section we categorise the different challenges in designing and utilising middleware solutions for WSN.
2.1.2.1 Hardware Abstraction
Sensor networks can include a large number of nodes. The network size usually ranges from small networks with fewer than 10 nodes up to large-scale networks with several thousand nodes. In ad-hoc networks, nodes can join and leave due to natural obstacles or mobility reasons. In addition to the volatility of the underlying wireless network, the network consists of heterogeneous sensor nodes. Sensor nodes have different hardware and support different communication protocols. A solution for integrating heterogeneous hardware is presented in Chapter 3. Some nodes have stronger processing capabilities; some have more sensing devices which can be accessed by external applications. This complexity of sensor networks requires an abstraction which hides the network and access details and provides a generic access mechanism to the network. A hardware abstraction layer is needed to represent different hardware nodes and to provide mechanisms to ensure seamless access to the WSN resources. As shown in Figure 1, the abstraction layer hides the complexity of the underlying network, provides generic access to the data (1a) and allows a particular node to be referred to and its information retrieved (1b). In the existing research, hardware abstraction is interpreted in different ways. Some researchers see abstraction as the abstraction from the hardware platform to create a unified access layer. Others provide abstraction from complex programming languages. In sensor networks, both concepts are needed and are discussed in the following sections.

⁴ There could be some variation in defining the role of cluster heads in different scenarios. In this work, we refer to those that act as a gateway between different nodes, capillary networks and the core network.
The Internet relies on the TCP/IP protocol to ensure connection between different nodes. Most of the sensor devices are not able to fully support the TCP/IP stack, due to the lack of required protocol stacks, different communication interfaces or limited processing facilities. Other solutions such as 6LoWPAN [96] are required to establish a communication framework for constrained devices. 6LoWPAN is an approach to port the TCP/IP stack to resource-constrained devices. However, due to different hardware device requirements, not all devices can support this new standard. Middleware needs to support several low-level protocols by using wrappers and adapters that create unified connectivity.
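The idea of wrappers and adapters that create unified connectivity can be sketched as follows. This is a minimal illustration under stated assumptions and not the middleware design of Chapter 3: both payload formats (a two-byte binary frame and a text line) and all class names are hypothetical.

```python
from abc import ABC, abstractmethod


class SensorAdapter(ABC):
    """Unified access layer: every adapter returns the same reading structure."""

    @abstractmethod
    def read(self) -> dict:
        ...


class BinaryFrameAdapter(SensorAdapter):
    """Hypothetical node sending a 2-byte big-endian value in tenths of a degree."""

    def __init__(self, node_id, frame: bytes):
        self.node_id, self.frame = node_id, frame

    def read(self) -> dict:
        value = int.from_bytes(self.frame, "big", signed=True) / 10
        return {"node": self.node_id, "quantity": "temperature",
                "value": value, "unit": "C"}


class TextLineAdapter(SensorAdapter):
    """Hypothetical node sending a text line such as 'temperature=21.5'."""

    def __init__(self, node_id, line: str):
        self.node_id, self.line = node_id, line

    def read(self) -> dict:
        quantity, value = self.line.split("=")
        return {"node": self.node_id, "quantity": quantity,
                "value": float(value), "unit": "C"}


# The application iterates over adapters without knowing the wire format.
adapters = [BinaryFrameAdapter("a", (215).to_bytes(2, "big")),
            TextLineAdapter("b", "temperature=21.5")]
readings = [adapter.read() for adapter in adapters]
```

Adding support for a further protocol then means adding one more adapter class, while the applications that consume `readings` remain unchanged.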
2.1.2.2 Data Processing
The sensor observation and measurement data should be processed by the nodes and/or communicated to other nodes and applications that will use the data. The information gathered by sensors can be compressed and aggregated locally to save transmission energy. The data can also be aggregated in the network (in-network processing) in a distributed fashion. Central approaches, in which the processing is done by a central instance, are used when the nodes are not able to perform the task. The aggregation usually relies on mathematical operators such as SUM, MIN, MAX and AVG. The data processing can include advanced tasks such as event detection, clustering, outlier detection, knowledge detection and other intelligent algorithms. However, this is not the main goal of middleware and is of more interest if the applied technique is used to enhance the service provisioning or the data abstraction.
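As an illustration of in-network aggregation with the SUM, MIN, MAX and AVG operators, the sketch below lets each node forward a small partial aggregate instead of its raw readings; intermediate nodes merge the partials on the way to the sink. The class name `Partial` and its fields are assumptions made for this example and do not describe a specific middleware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Partial aggregate that a node forwards instead of its raw readings."""
    total: float
    count: int
    low: float
    high: float

    @classmethod
    def of(cls, readings):
        # Summarise a node's own (non-empty) list of readings.
        return cls(sum(readings), len(readings), min(readings), max(readings))

    def merge(self, other):
        # Combine two partial aggregates; the merge order does not matter.
        return Partial(self.total + other.total, self.count + other.count,
                       min(self.low, other.low), max(self.high, other.high))

    @property
    def avg(self):
        # AVG is derived from SUM and COUNT; averaging averages would be wrong.
        return self.total / self.count


leaf_a = Partial.of([20.0, 22.0])
leaf_b = Partial.of([30.0])
at_sink = leaf_a.merge(leaf_b)
```

Carrying the count alongside the sum is the design choice that keeps AVG correct when nodes contribute different numbers of readings.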
2.1.2.3 Monitoring and Management
To observe and control the status of the connected sensor devices and the connected applications, a middleware should introduce monitoring and management functionalities. This allows the application developer or the end-user to debug applications and identify possible errors either in the network or in the application used. To adapt to changes or failures in the WSN, it can be re-configured by the use of administration tools. To ensure that all applications and service consumers receive the right amount of information, even when the architecture is exhausted, Quality of Service (QoS) mechanisms should be introduced to guarantee the performance of the overall system. The middleware can, for example, define rules to prioritise different requests. In addition to the management functionalities, the middleware should offer self-organising facilities such as self-healing, self-configuration and self-adaptation to work with as little human interaction and maintenance as possible.
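The rule-based prioritisation of requests mentioned above can be sketched with a priority queue. The request classes ("alarm", "query", "bulk") and the class name `RequestQueue` are hypothetical and serve only to illustrate the principle; they are not taken from a particular middleware.

```python
import heapq
from itertools import count


class RequestQueue:
    """Serve requests by priority class, first-come first-served within a class."""

    def __init__(self, priorities):
        self._priorities = priorities   # rule table: lower number is served first
        self._heap = []
        self._order = count()           # tie-breaker that preserves arrival order

    def submit(self, kind, payload):
        entry = (self._priorities[kind], next(self._order), payload)
        heapq.heappush(self._heap, entry)

    def next_request(self):
        # Return the most urgent pending request, or None if the queue is empty.
        return heapq.heappop(self._heap)[2] if self._heap else None


queue = RequestQueue({"alarm": 0, "query": 1, "bulk": 2})
queue.submit("bulk", "b1")
queue.submit("query", "q1")
queue.submit("alarm", "a1")
queue.submit("query", "q2")
served = [queue.next_request() for _ in range(4)]
```

When the system is overloaded, requests of the lowest class simply wait longer, while alarms are still served first.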
2.1.2.4 Programming
Programming abstraction is one of the main pillars of middleware. Middleware should seamlessly allow developing and deploying algorithms which are then distributed and applied to the overall network. Sensor devices are typically programmed at device level by using low-level programming languages such as C or Assembly. Some middleware approaches introduce abstractions to those low-level languages.
2.1.2.5 Provisioning
To provide access to the data gathered by the sensors, middleware introduces common interfaces that are accessed through other nodes and/or software components. Some middleware approaches provide interfaces according to common standards. This can also be called abstraction, but this abstraction occurs at the data access and functionality level. Declarative languages such as SQL were the first interfaces exploited to abstract and access data from the underlying sensor network. SQL is a common language to query for data; however, in WSN middleware, Publish/Subscribe approaches are required to define access methods to the data and are described further in Section 2.3.3.

2.1.2.6 Security
Using sensor devices to connect physical objects to high-level services demands secure communication. Device and information sharing in sensor networks is in some cases a very sensitive task, as those devices can be directly integrated in personal life scenarios such as Smart Home Environments and the medical care domain. Middleware has to provide security and role management functionalities to ensure that only applications and users with the right access privileges are allowed to access the offered data and services.

Figure 3: Monitoring Scenarios (labels: Monitoring, Local, Global, Weather stations, Forest Fire Observation, Smart Conference, Smart Home)
2.1.3 Use-Case Scenarios and Requirements
WSN are used in several different domains. The cheap and small design of sensor nodes in recent years has made it easy to deploy and use them in different scenarios. Middleware is used to gather data from the sensors, manage it and provide it to other services and users. The middleware therefore has to be adapted to the different scenarios. For example, in scenarios with harsh environmental properties and a high failure rate of sensor nodes, middleware has to compensate for the loss of nodes to guarantee a functional network. In surveillance applications the focus is on security-related topics. Data about objects that is gathered by the sensors and submitted to base stations and other members of the network has to be encrypted in this type of scenario. Recently, WSN have also been used to observe the vital body and health signals of humans. Middleware in this domain is often used to provide real-time information about the data and has to ensure reliable access to the data.
In the following, we discuss some common scenarios where WSN are used and middleware is deployed to manage the sensors and provide interoperability with other services and users. The following terms are then discussed in the context of different middleware approaches.

2.1.3.1 Monitoring
One of the application domains for Wireless Sensor Networks is monitoring and observation. Monitoring can be divided into several aspects depending on the spatial properties, e.g. a wide area to be observed or a small location, and indoor or outdoor observations, as shown in Figure 3.
Forest Fire Tracking, and in general Environmental Observation, is one application field of WSN [112]. Sensor nodes observe light and heat levels in large spatial areas. There are several middleware approaches with a focus on object and event tracking. Nodes have to be managed: those nearer the fire have a higher communication load and therefore higher energy consumption, whereas nodes farther away could be used as relays. As the fire is a dynamic, event-based process from the WSN perspective, software agents are one approach to cope with this area.

2.1.3.2 Logistics
With the cheaper production costs of sensors and the emergence of RFID technology, warehouse stores have started to track their items with RFID tags. The tags are activated through an induction field or small batteries and the ID is sent to the reader. Modern warehouses have sensor networks deployed to keep track of the stock and movements in the warehouse. This information can be made directly accessible in online shops for the end-user. There are several middleware approaches that gather data and make it available to web-based or other service-oriented applications.
2.1.3.3 Internet of Things
The Internet of Things aims at the connectivity and interoperability of physical devices [135]. A commonly used example is a light switch; its current state is sensed by a sensor and changed by an actuator. By making this "thing" available to other things, new use-cases such as energy reduction in smart homes come up. This interoperability demands heterogeneous node support, so that a wide variety of sensor devices is supported and can be accessed. Middleware is used to act as a translator between different hardware platforms, protocols and standards.
2.1.3.4 Body-Centric Applications
Body sensor networks are used to observe and track human vital signals. Middleware gathers this data and makes it available to the end-user. The data has to be submitted in real-time and some signals might be more important than others. Middleware needs to support Quality of Service properties in this type of scenario. An extensive survey of Body Area Network applications has been conducted by Seyedi et al. [114].
2.1.3.5 Main Requirements
In this section we identify some key features for wireless sensor network middleware and give a brief overview of them.