Integration of Reliable Sensor Data Stream Management into Digital Libraries

Gert Brettlecker, Heiko Schuldt
University of Basel
[email protected]

Peter Fischer
ETH Zurich
[email protected]

Hans-Jörg Schek
University of Konstanz

Abstract

Data Stream Management (DSM) addresses the continuous processing of sensor data. DSM requires the combination of stream operators, which may run on different distributed devices, into stream processes. Due to recent advances in sensor technologies and wireless communication, the amount of information generated by DSM will increase significantly. In order to deal efficiently with this streaming information, Digital Library (DL) systems have to merge with DSM systems. Especially in healthcare, the continuous monitoring of patients at home (telemonitoring) will generate a significant amount of information to be stored in an e-health digital library (electronic patient record). In order to stream-enable DL systems, we present an integrated data stream management and Digital Library infrastructure in this work. A vital requirement for healthcare applications, however, is that this infrastructure provides a high degree of reliability. In this paper, we present novel approaches to reliable DSM within a DL infrastructure. In particular, we propose information filtering operators, a declarative query engine called MXQuery, and efficient operator checkpointing to maintain high result quality of DSM. Furthermore, we present a demonstrator implementation of the integrated DSM and DL infrastructure, called OSIRIS-SE. OSIRIS-SE supports flexible and efficient failure handling to ensure complete and consistent continuous data stream processing and execution of DL processes even in the case of multiple failures.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Systems

General Terms

Reliability, Measurement, Performance, Experimentation

Keywords

Checkpointing, Information Filtering, Data Stream Management, Sensor Data

1 Introduction

Recent trends in pervasive computing, together with new (wearable) sensor technologies, powerful mobile devices, and wearable computers strongly support novel types of applications. In the healthcare domain, for instance, applications make use of this new technology in order to improve the quality of treatment and care for patients and the elderly. Continuous data streams generated by (wearable/mobile) sensors have to be processed online in order to detect critical situations. For this purpose, usually different streams (generated by different types of sensors) have to be combined (e.g., jointly consider oxygen saturation, ECG signals and blood pressure). This is done by making use of specialized operators. An infrastructure for telemonitoring has to be able to reliably combine these operators in an application-specific way.

In general, the area of sensor data management and processing is a very challenging but also timely one for several reasons. First, the number of sensors for continuous data generation is steadily increasing, not only in the healthcare domain. Online traffic monitoring and network monitoring are other application domains which rely on the presence of (hardware and/or software) sensors. These sensors produce vast amounts of data which need to be processed, analyzed and managed online and in a reliable way. Thus, having an appropriate infrastructure for telemonitoring is crucial. Second, sensor data and data stream management also greatly affect Digital Libraries. Data generated by processing streams of sensor information (e.g., after aggregating data over a certain time window, or outliers with special semantics which have been detected in a stream) has to be added to a DL. In the healthcare example, this means storing aggregated stream data and critical health states detected in the sensor signals in an eHealth Digital Library [Schuldt 2005] (electronic health record). Therefore, the infrastructure for data stream management has to consider i.) operators which deal with continuous streams of data and ii.) discrete operators/services which allow for the interaction between data stream management and Digital Libraries. Conversely, Digital Libraries also affect data stream management, for example when data stream processing is adapted automatically because new laboratory results arrive in the patient’s electronic health record.

1.1 Traffic Management Systems

Monitoring and managing road traffic is an area that has seen a lot of interest in recent decades, since the cost of improving the utilization of existing streets is far lower than the cost of building new streets. Most of the existing systems only cover specific "hot-spot" areas with a lot of traffic, due to the high cost of deploying fixed sensors, the necessary communication systems and the complex control and indicator systems. Thanks to sensors mounted both on streets (toll monitoring) and in vehicles (GPS, road conditions), a lot of accurate sensor data is already available. Additionally, the cost of wired and wireless communication has been reduced significantly and its reliability has improved, which allows for access to the already available sensor data. The increased amount of traffic information needs to be processed in a rapid and reliable manner, so that the relevant information is relayed to all parties in traffic in an almost real-time fashion. The number of sensors and the number of parties receiving information are both in the millions, with some of the sensors delivering a reading every second. Therefore, a scalable, reliable stream processing system is required.

1.2 Telemonitoring in Healthcare

Telemonitoring applications serve several purposes. Firstly, they enable healthcare institutions to take care of their patients while they are out of hospital, which is especially useful for managing various chronic diseases as well as for measuring the effects of treatments under real-life conditions. Secondly, they serve as an instrument for performing research and for conducting medical studies. Thirdly, they allow for the triggering of emergency services in case of severe health conditions. Finally, they can offer additional comfort services as a by-product, like ambient assistive services, information services and communication services. As a consequence, the patient’s disease will be better managed with less hospitalisation (which usually has a physical and emotional impact) and a higher quality of life. In the EU, 23 million adults are suffering from diabetes [Petersen Peto Rayner 2005]. One of them is Fred. He is a 65-year-old retiree and lives alone in a house of his own. High blood pressure and diabetes often occur together and, if left untreated, can lead to serious consequences for the heart [Nichols et alii 2001]. Unfortunately, due to the long period of untreated diabetes and hypertension, Fred developed congestive heart failure (CHF) 1.5 years ago. CHF is defined as a disorder causing the heart to lose its ability to pump blood efficiently to the rest of the body; the term refers to a range of conditions that impair the function of the heart. Without an assistive telemonitoring infrastructure, Fred has to do manual random sampling of his blood pressure, blood glucose level, heart rate, and body weight.

As a vision for the future, Fred’s caregiver will decide to equip him with a wearable telemonitoring system consisting of a smart shirt [Park Mackenzie Jayaraman 2002, Vivometrics 2006], a ring sensor [Asada H. et alii 2003], a glucose measuring watch [Glucose Watch 2006], and a PDA for computation and wireless communication. This setup will allow for unobtrusive monitoring of ECG, heart rate, respiratory and sweating rates, blood pressure, blood glucose level, and blood oxygen saturation, as well as motion activities sensed with an inbuilt accelerometer. As Figure 1 illustrates, Fred’s PDA will communicate wirelessly with his home automation system in order to exchange data. Besides that, Fred’s telemonitoring infrastructure also aggregates additional context measurements. For this reason, Fred’s home is equipped with an indoor positioning system, an electronic scale measuring body weight and fat, an electronic medication dispenser, and a smart bed sheet recording Fred’s physical activity while sleeping. These additional measurements allow for the detection of context information, which is needed in order to interpret Fred’s vital signs appropriately; for example, ECG signals differ depending on whether Fred is running or sleeping. As a rough estimate, all data acquired about Fred will likely exceed one GByte per day. Therefore, an efficient and reliable Digital Library infrastructure is needed which provides the services to analyse the accumulated data, to extract, filter, and forward relevant information to the patient and the care provider in charge, and additionally to store and analyse information within an e-Health DL. Considering that Fred’s care provider monitors thousands of patients, reliable and scalable data stream processing and information analysis are vital requirements for such systems. Furthermore, the infrastructure will provide a flexible platform for different kinds of monitoring applications and allow for monitoring patients suffering from various chronic diseases by supporting individual profiles for each patient.

[Figure 1 diagram: patients’ sensors (ECG, blood pressure), mobile devices (preprocessing, local storage, user interface), base stations (processing, analysis) and caregivers (long-term storage, preprocessing, user interface), connected via Bluetooth, WLAN and the Internet. Operators shown: ECG Acquisition, ECG Filtering, BP Acquisition, BP Variability, Analysis, Critical Detection, Alarm processes.]

Figure 1 Example of Distributed Data Stream Processing for Telemonitoring

1.3 Structure of the Paper

This document is organized as follows: Section 2 introduces the Information Filter System developed at ETH. In general, information filters mediate between sources and sinks of information by specifying a set of rules/profiles that determine what to do with (sensor) data. Sophisticated information filters also support the selection of (sensor) data to be stored and preserved in a Digital Library. Section 3 introduces OSIRIS-SE (Open Service Infrastructure for Reliable and Integrated Process Support, Stream-enabled), an extension to the OSIRIS system, which supports the combination of arbitrary stream operators (processing continuously arriving stream data) and also allows stream operators to be combined with (web) services. A main focus of the OSIRIS-SE infrastructure is to achieve a high degree of reliability. This is particularly important in eHealth DL applications, but in other domains as well. In addition, another unique feature of OSIRIS-SE is that it provides data stream processing on mobile devices (e.g., on a PDA or a smart phone), close to the sensors where the data is generated. Section 4 discusses related work and Section 5 concludes.

2 Overview on the Context-Aware Information Filter System

Information Filters play an important role in processing streams of events, both for filtering as well as routing events based on their content. An information filter connects sources and sinks of information using profiles. Parties interested in receiving information (sinks) submit a profile of their interest to the information filter, while parties interested in disseminating information (sources) send messages to the information filter. Sources are often sensors or data derived from sensors. A context-aware information filter has two input streams: (a) a stream of messages (e.g., blood pressure readings) that need to be routed and (b) a stream of context updates such as the position or activity of a person.

2.1 Status

A test bed implementation of an information filter system has been under development at ETH since 2003. This test bed implementation supports keyword/value messages and predicate-based profiles, and can be extended to support XML documents and XPath profiles. The implementation is targeted towards large numbers of profiles, high message rates and high context update rates [Dittrich Fischer Kossmann 2005], and also supports batched handling of messages to increase message throughput [Fischer and Kossmann 2005]. In addition, the information filter system has support for high availability and reliability. On the basis of this information filter system, a QoS study has been performed, focusing on the impact of processing strategies [Fischer and Kossmann 2006].

Figure 2 Schema of a context-aware information filter

2.2 Information Filters in Data Streams

Information Filters allow loose coupling of data producers and consumers via content-based profiles, which is particularly useful in the medical and traffic scenarios in this document:

• Support for personalized information delivery by profiles, e.g. delivering news about medical progress to patients and doctors based on the illnesses of the patients.

• Forwarding/retaining of sensor data from a patient to other systems depending on the context: a heart rate above 130 is OK for a young person doing sports, but indicates problems otherwise.

• Routing of medical events to the nearest/best fitting medical supplier/treatment institution, based on the location of patients and ambulances and on the preliminary medical analysis.
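
The context-dependent forwarding described in the second item above can be sketched in a few lines of Python. This is only an illustration of the idea of a context-aware information filter with two inputs (messages and context updates), not the ETH test bed: the class names `Profile` and `ContextAwareFilter`, the message fields, and the simplified rule (a heart rate above 130 is forwarded unless the person is doing sports) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Profile:
    """A sink's interest: a predicate over (message, context). Hypothetical type."""
    sink: str
    predicate: Callable[[dict, dict], bool]


@dataclass
class ContextAwareFilter:
    """Hypothetical filter with two inputs: messages and context updates."""
    profiles: List[Profile] = field(default_factory=list)
    context: Dict[str, dict] = field(default_factory=dict)  # patient -> latest context

    def subscribe(self, profile: Profile) -> None:
        self.profiles.append(profile)

    def update_context(self, patient: str, **values) -> None:
        # Context stream: e.g. position or activity of a person.
        self.context.setdefault(patient, {}).update(values)

    def route(self, message: dict) -> List[str]:
        # Message stream: return the sinks whose profile matches.
        ctx = self.context.get(message["patient"], {})
        return [p.sink for p in self.profiles if p.predicate(message, ctx)]


def suspicious_heart_rate(msg: dict, ctx: dict) -> bool:
    # Simplified rule: above 130 is only acceptable while doing sports.
    return msg["heart_rate"] > 130 and ctx.get("activity") != "sports"


flt = ContextAwareFilter()
flt.subscribe(Profile("caregiver", suspicious_heart_rate))

flt.update_context("fred", activity="sports")
during_sports = flt.route({"patient": "fred", "heart_rate": 140})

flt.update_context("fred", activity="resting")
at_rest = flt.route({"patient": "fred", "heart_rate": 140})
```

The same reading is retained while the context says "sports" and forwarded to the caregiver once the context changes, which is the loose coupling via content-based profiles that the list describes.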

To support these functionalities, information filters should be integrated at the various stages of the architecture: mobile devices, home PCs/gateways, healthcare and service providers.

A demonstrator has been developed which shows how the information filter test bed can be used to examine various workloads and processing strategies. In this test bed, it is possible to measure event throughput, memory consumption and various QoS parameters like latency, errors or jitter.

2.3 MXQuery – Declarative Streaming Queries

Existing Data Stream Systems are typically built at a fairly low abstraction level, providing operators that need to be manually placed into a plan. The data model and the operations are loosely based on SQL, but the relational model does not work very well for streaming data, since extensions for ordered or time-based data don’t mix well with the rest of the language.

Our proposal is based on XQuery [Boag et alii 2006]. A significant amount of streaming data (RSS, monitoring data) is already in XML format or can be easily expressed in the XQuery data model [Fernandez et alii 2006]. The XQuery data model contains ordered sequences of items, which can be simple types or node trees. Contrary to popular belief, the XQuery data model can be implemented very efficiently, giving performance comparable to specialized relational implementations like Aurora [Abadi et alii 2003]. XQuery with the relevant extensions (e.g., window statements) [Carabus et alii 2007] provides an elegant way to do stream processing. Another extension of XQuery, called XQueryP [Carey et alii 2006], includes the ability to call and also export web services. Both extensions have been co-developed at ETH and submitted to the W3C for standardization.

MXQuery [Fischer et alii 2007] is an open-source, lightweight implementation of XQuery, XQueryP and the streaming extensions developed at ETH. It runs on any Java platform (including J2ME CLDC 1.0), using a small amount of memory: The minimal package is around 80KB; the typical size is around 350KB. It has been deployed onto Mobile Phones, PDAs, Network Gateways and regular PC platforms. It will also run on the upcoming SunSpot sensors.

In the streaming scenarios, it can provide the combination of streaming and web service data as well as many stream-related operations that do not require low-level data transformations. Since it runs on all stages of the proposed architecture, it provides a uniform, high-level programming environment for the complete architecture.

3 Data Stream Processing and Failure Handling with OSIRIS-SE

Data Stream Management (DSM) addresses the continuous processing of sensor data. This is done by combining dedicated data stream operators into stream processes. These operators might run on different distributed devices (e.g., sensor signal filtering on a PDA while sophisticated analysis and correlation operators are hosted by a more powerful server). Recent advances in wireless communication standards, powerful mobile devices, and wearable computers are driving the spread of ubiquitous and pervasive computing. At the same time, new sensor technologies are emerging and producing vast amounts of data. These trends are fostering distributed DSM, which is gaining importance in various application domains. Especially in healthcare, the continuous monitoring of patients at home (telemonitoring) and the integration of this information into an e-Health DL (electronic patient record) are becoming more and more important, mainly due to the progression of chronic ailments in an aging society. A vital requirement in telemonitoring is that the infrastructure for distributed DSM provides a high degree of reliability and availability, since it can potentially be life-saving.

Consider, as an example, our patient Fred being equipped with a wearable telemonitoring system as described in Section 1.2. Quality of life and disease treatment can greatly benefit from reliable and correct interpretation of Fred’s physiological signs. Dedicated operators (e.g., for the detection of pathological heartbeats) are distributed among a wearable device (e.g., smart phone or PDA), a PC at the patient’s home and servers at the caregiver side. Sensor information is reliably processed by a telemonitoring infrastructure, and medically relevant information has to be stored in an e-Health Digital Library. A telemonitoring infrastructure offers distributed DSM, supports the analysis of the accumulated data, allows relevant information to be extracted and forwarded to the healthcare provider in charge, and offers integration into Digital Libraries for long-term storage and analysis. Reliability is of utmost importance in this scenario. Therefore, no data stream elements may be omitted from processing, since the infrastructure is in charge of detecting critical situations or even anticipating them.

In this work, we focus on the reliability of an integrated data stream management and Digital Library infrastructure. In particular, we investigate algorithms for the reliable execution of data stream operators within a stream process based on coordinated and efficient checkpointing of operator states. In contrast to activities, which are the basic building blocks of traditional workflow processes, data stream operators are stateful. This means that operators accumulate an internal state during the processing of a data stream over time. Checkpointing is the task of taking regular backups of these time-dependent operator states during processing and storing them at a suitable backup node. In case a running operator instance fails, a new operator instance can be created and initialized with a recent operator state backup; this mechanism is called operator migration.

In contrast to other work in data stream management [Hwang et alii 2005], we apply reliability in a fine-grained way at the level of operators, rather than considering the whole stream processing engine running on the affected node. Hence, in case of failures or overload situations, each operator instance running on the affected node may be individually continued on the best available alternative node. Based on operator migration, we propose and evaluate a new reliability protocol, called Efficient and Coordinated Operator Checkpointing (ECOC), to reduce the drawbacks of passive standby, i.e., high runtime and recovery overhead. Similarly high demands on result quality may also arise in other stream processing applications, e.g., in the presented traffic monitoring scenario.

The ECOC approach is incorporated into OSIRIS-SE [Brettlecker Schuldt Schatz 2004, Brettlecker Schuldt Schek 2005, Brettlecker Schuldt Schek 2006a, Brettlecker Schuldt Schek 2006b], the proposed infrastructure for a data stream enabled Digital Library. In particular, OSIRIS-SE provides an infrastructure that is able to efficiently combine, process, and manage continuous streams of data coming from different sensors across a loosely coupled network of nodes.

3.1 OSIRIS-SE

OSIRIS (Open Service Infrastructure for Reliable and Integrated process Support) [Schuler et alii 2003, Weber et alii 2003, Schuler et alii 2004, Schuler et alii 2005] is a prototype Digital Library infrastructure developed at ETH Zurich and the basis of our work. OSIRIS controls the execution of traditional processes as needed for DL applications. We refer to traditional processes as partially ordered sets of well defined activities which correspond to the invocation of services in a request/reply style. Furthermore, OSIRIS allows for reliable and distributed process execution in a peer-to-peer style without centralized control in a shared-nothing network. According to this, a node of the OSIRIS network works off its part of a process based on locally replicated meta-data about the process definition and then directly migrates the process instance data to a node offering a suitable service for the next activity in the process. For this reason, the OSIRIS architecture consists of two parts (see Figure 3): i.) a software layer on each node (OSIRIS layer), and ii.) core services offering repositories for meta-data. Due to the invocation of discrete services, these processes are not able to perform data stream processing.

We have extended OSIRIS to OSIRIS-SE (Stream Edition) [Brettlecker Schuldt Schatz 2004, Brettlecker Schuldt Schek 2005, Brettlecker Schuldt Schek 2006a, Brettlecker Schuldt Schek 2006b], which supports DSM. Similar to process execution in OSIRIS, the execution of DSM processes, called stream processes, is based on locally replicated meta-data. Stream processing requires additional information in the core repositories, such as available operators and providers, and stream process definitions. In addition, reliable FIFO transfer of stream elements and duplicate elimination have been added for DSM.
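
To make the combination of FIFO transfer and duplicate elimination more concrete, the following Python sketch shows one common way a receiving node can enforce both with sequence numbers. It is an illustrative assumption, not the OSIRIS-SE implementation; the class name `StreamReceiver` and the cumulative acknowledgement it returns are hypothetical.

```python
class StreamReceiver:
    """Hypothetical receiver that delivers stream elements exactly once, in order."""

    def __init__(self):
        self.next_seq = 0      # sequence number expected next
        self.delivered = []    # payloads handed to the local operator

    def receive(self, seq, payload):
        """Accept one element; return the highest in-order sequence number seen."""
        if seq == self.next_seq:
            # In-order element: deliver it and advance.
            self.delivered.append(payload)
            self.next_seq += 1
        # seq < next_seq: duplicate of an element already delivered, dropped.
        # seq > next_seq: a gap; ignored here, the sender is expected to resend.
        return self.next_seq - 1


receiver = StreamReceiver()
arrivals = [(0, "a"), (1, "b"), (1, "b"), (3, "d"), (2, "c"), (3, "d")]
acks = [receiver.receive(seq, payload) for seq, payload in arrivals]
```

In this run the duplicate of element 1 is dropped and the early arrival of element 3 is ignored until element 2 has been delivered, so the local operator sees each element once and in sequence order.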

Figure 3 OSIRIS-SE Architecture

3.2 Data Stream Model and Failure Handling

The basis of the proposed reliable data stream management is the operator and failure model that we present in this section. All definitions are based on data streams, which are defined as a continuous transmission of sequentially ordered data elements. Each data element contains several data items as payload information and has a discrete time context, e.g., a sequence number.

[Figure 4 diagram: an operator with its operator state and a processing step (consume input elements, do processing, write output elements), side effects within the current processing step, and input and output streams labelled A, B, C and D over time t, in which the elements consumed and produced by the current processing step are marked.]

Figure 4 Operator Model

Operators (Figure 4) are the basic building blocks of data stream management. Running operator instances consume input elements from one or more input streams and/or produce output elements (marked in grey in Figure 4) for one or more output streams during the execution of an atomic processing step, while performing a state transition. Produced output elements are stored in output queues for downstream operators. A node in the DSM infrastructure hosting a running operator is also called a provider. With respect to the investigated application scenario, we consider operators as stateful and deterministic machines; an example is an operator calculating the average over the heart rate readings of the last hour. Essentially, every operator produces the same output stream and ends up in the same operator state when provided with the same input stream starting from the same operator state. Optionally, the processing step may produce a side effect, e.g., performing a backup of the current operator state.
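
The operator model can be illustrated with a minimal Python sketch of the heart rate example: a stateful, deterministic operator whose atomic processing step consumes one input element, updates the operator state and produces one output element. The class name `WindowAverageOperator`, the element layout `(sequence number, timestamp, value)` and the eviction rule at the window boundary are assumptions made for this example.

```python
from collections import deque


class WindowAverageOperator:
    """Hypothetical operator: average of the readings within the last window."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        # Operator state accumulated over time.
        self.state = {"window": deque(), "last_seq": -1}

    def step(self, element):
        """One atomic processing step: consume, transition state, produce."""
        seq, timestamp, value = element
        window = self.state["window"]
        window.append((timestamp, value))
        # Evict readings that are a full window (or more) older than this one.
        while window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        self.state["last_seq"] = seq
        average = sum(v for _, v in window) / len(window)
        return (seq, timestamp, average)


readings = [(0, 0, 60.0), (1, 1800, 80.0), (2, 3600, 100.0)]

first = WindowAverageOperator()
second = WindowAverageOperator()
out_first = [first.step(e) for e in readings]
out_second = [second.step(e) for e in readings]
```

Both instances start from the same state and see the same input stream, so they produce identical output streams and end in identical states, which is the determinism property the recovery approach relies on.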

Sensor operators are operators without input data streams in our model. Sensor operators acquire their input directly from sensor data sources. Our deterministic ECOC approach to reliability is not valid for these operators because, even if they are restarted from a given state, the acquired physical sensor data will be different and thus the output stream is not consistent. Reliability for sensor operators demands additional effort, e.g., establishing active standby sensor operators. Details on this are beyond the scope of this paper. Output operators are operators without output data streams. These operators store or transmit the result of stream processing to external systems.

A stream process is a well defined set of logically linked operators continuously processing the selected input data streams, thereby producing results and having side effects. Figure 5 illustrates a stream process which continuously monitors ECG and blood pressure of a patient. Each box in Figure 5 contains a full-fledged operator of Figure 4.

3.2.3 Failure Handling for Reliable Data Stream Management

In general, reliable and fault-tolerant DSM implies that stream processes have to be executed in a way that the process specification is met, even in case of failures, i.e., to correctly execute all operators in proper order without generating gaps, duplicates, or wrong elements in the result stream. The definition based on our deterministic operator model reads as follows: Reliable DSM produces a result stream and side effects, which are equal to the result stream and side effects produced by an ideal, faultless DSM system.

In this work, our approach supports the following failure scenarios for reliable DSM: single or multiple fail-stop failures of operator instances or their providers and single or multiple network failures. Multiple failures are a sequence of single failures within the recovery time. With respect to failure handling, we apply the following failure classification:

Temporary failures, e.g., a temporary network disconnection (loss of messages) or a temporary failure of a provider which is able to recover within the maximum allowed delay time dmax, are compensated by the output buffers of the upstream provider. For recovery, the upstream provider resends the data stream elements and receives an acknowledgement message. Failures exceeding dmax become permanent failures.
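
A possible reading of this classification is sketched below in Python: the upstream provider keeps unacknowledged elements in its output buffer, resends them while the delay stays within dmax, and treats the failure as permanent once dmax is exceeded. The class name `OutputBuffer`, the extra `ack_timeout` parameter (how long to wait before an acknowledgement counts as overdue) and the cumulative acknowledgements are assumptions of this sketch, not details taken from OSIRIS-SE.

```python
from collections import OrderedDict


class OutputBuffer:
    """Hypothetical output buffer of an upstream provider."""

    def __init__(self, ack_timeout, d_max):
        self.ack_timeout = ack_timeout  # assumed: when an ack counts as overdue
        self.d_max = d_max              # maximum allowed delay time
        self.unacked = OrderedDict()    # seq -> (payload, time first sent)

    def send(self, seq, payload, now):
        self.unacked[seq] = (payload, now)

    def acknowledge(self, seq):
        """Downstream confirmed all elements up to and including seq."""
        for s in [s for s in self.unacked if s <= seq]:
            del self.unacked[s]

    def pending(self):
        return [(s, p) for s, (p, _) in self.unacked.items()]

    def check(self, now):
        """Classify the current situation as 'ok', 'temporary' or 'permanent'."""
        if not self.unacked:
            return "ok"
        oldest_sent = next(iter(self.unacked.values()))[1]
        delay = now - oldest_sent
        if delay <= self.ack_timeout:
            return "ok"
        if delay <= self.d_max:
            return "temporary"   # resend pending() to the same provider
        return "permanent"       # delay exceeded d_max: migration is required


buf = OutputBuffer(ack_timeout=1.0, d_max=5.0)
buf.send(0, "a", now=0.0)
buf.send(1, "b", now=0.5)
status_early = buf.check(now=0.8)
status_late = buf.check(now=2.0)
resend = buf.pending()
buf.acknowledge(0)
after_ack = buf.pending()
status_expired = buf.check(now=6.0)
```

Because buffered elements are only discarded after an acknowledgement, a temporary failure causes no gaps in the stream; it only delays delivery.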

Permanent failures, e.g., a permanent network disconnection or failure of a provider, require the migration of the operator instance, together with its accumulated operator state, from the affected provider to another suitable provider. Operator migration implies the continuation of an operator instance from a recent checkpoint on a new provider in order to allow for seamless continuation of DSM and, if necessary, the stopping of the old running operator instance.
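
The following Python sketch illustrates operator migration from a checkpoint for a simple deterministic operator. It shows plain state backup, restore and replay only; it is not the ECOC protocol, whose coordination is described in the cited publications. The class `RunningMeanOperator`, the helper `run` and the decision to let the upstream side replay all elements (with already processed ones skipped via the last sequence number in the state) are assumptions for illustration.

```python
import copy


class RunningMeanOperator:
    """Hypothetical stateful, deterministic operator: running mean of all values."""

    def __init__(self, state=None):
        self.state = state if state is not None else {
            "count": 0, "total": 0.0, "last_seq": -1}

    def step(self, seq, value):
        if seq <= self.state["last_seq"]:
            return None  # element was already processed before the checkpoint
        self.state["count"] += 1
        self.state["total"] += value
        self.state["last_seq"] = seq
        return (seq, self.state["total"] / self.state["count"])

    def checkpoint(self):
        """Return a copy of the operator state for storage on a backup node."""
        return copy.deepcopy(self.state)


def run(operator, elements):
    outputs = (operator.step(seq, value) for seq, value in elements)
    return [out for out in outputs if out is not None]


elements = [(0, 60.0), (1, 80.0), (2, 100.0), (3, 120.0)]

# Failure-free reference run.
reference = run(RunningMeanOperator(), elements)

# Run with a failure: checkpoint after two elements, then the provider
# processes element 2 but fails before its output is delivered.
primary = RunningMeanOperator()
before_failure = run(primary, elements[:2])
backup = primary.checkpoint()
run(primary, elements[2:3])  # work lost with the failed provider

# Migration: a new instance on another provider starts from the backup,
# and the upstream provider replays its buffered elements.
migrated = RunningMeanOperator(state=copy.deepcopy(backup))
after_migration = run(migrated, elements)
```

Because the operator is deterministic, the output delivered before the failure followed by the output of the migrated instance equals the output of the failure-free run, without gaps or duplicates.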

A failure in distributed DSM usually affects more than one node of the infrastructure, because upstream and downstream operators may be on different nodes. Therefore, it is vital for proper failure handling in OSIRIS-SE that all affected nodes either detect the failure or are informed about it. Further details on OSIRIS-SE can be found in [Brettlecker Schuldt Schek 2005, Brettlecker Schuldt Schek 2006a, Brettlecker Schuldt Schek 2006b].

[Figure 5 diagram: operators ECG Acquisition, ECG Variability, Blood Pressure Acquisition, Blood Pressure Variability, Analysis, Critical Detection and Invocation of Alarm Processes, distributed over the patient’s PDA, the patient’s PC and the caregiver’s PC.]

Figure 5 Example Stream Process

Whereas for temporary failures no further failure handling by the OSIRIS-SE infrastructure is needed, the infrastructure has different possibilities to deal with permanent failures:

Firstly, we have transparent failure handling, which is done by OSIRIS-SE without any application-specific behaviour. Transparent failure handling is automatically applied by the infrastructure if a permanent failure occurs. Failure handling is done by operator migration based on recent and consistent operator state backups.

Permanent failures cause the unavailability of one or more operator instances of a running stream process. In order to replace the unavailable (failed) operators and seamlessly continue stream processing, the infrastructure creates new operator instances running on still available providers and initializes the newly created operators with recent operator state backups. The efficient and coordinated checkpointing (ECOC) algorithm applied by OSIRIS-SE guarantees the availability of a recent and consistent operator state backup even in the case of multiple failures occurring at the same time [Brettlecker Schuldt Schek 2005, Brettlecker Schuldt Schek 2006a, Brettlecker Schuldt Schek 2006b].

Furthermore, if transparent failure handling is not applicable, there are two more application-specific failure handling mechanisms available in OSIRIS-SE. Firstly, the application designer, who creates the stream process graphically, is able to define alternative execution branches. In case of failure, the infrastructure is able to move to an alternative execution branch, which uses a different set of operators, in order to continue stream processing. For example, for patient Fred it could be sufficient to derive only the heart rate from the ECG and do no further ECG signal analysis when the available computing resources are limited. Secondly, if alternative execution branches are not applicable either, the OSIRIS-SE infrastructure is able to generate application-specific alarm processes. These alarm processes are traditional discrete processes as used for DL applications and can generate information for the e-Health DL or inform physicians, emergency services, and patients. For example, if patient Fred is leaving the house and continuous telemonitoring is therefore unavailable, the OSIRIS-SE infrastructure will feed this information into the e-Health DL in order to record unmonitored periods for the physician’s analysis, and may additionally inform Fred that he is currently unmonitored.
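
The order in which these mechanisms are tried can be summarized in a short Python sketch. The function name `handle_permanent_failure`, its parameters and the choice of simply taking the first available provider or branch are assumptions made for illustration; OSIRIS-SE's actual selection of the best alternative node is not reproduced here.

```python
def handle_permanent_failure(backup_available, available_providers,
                             alternative_branches):
    """Hypothetical escalation: migration, then alternative branch, then alarm."""
    # 1. Transparent failure handling: operator migration from a state backup.
    if backup_available and available_providers:
        return ("migrate_operator", available_providers[0])
    # 2. Application-specific: switch to a branch defined by the designer.
    if alternative_branches:
        return ("alternative_branch", alternative_branches[0])
    # 3. Application-specific: start a discrete alarm process.
    return ("alarm_process", "record unmonitored period and notify patient")


case_migrate = handle_permanent_failure(True, ["patient_pc"], ["heart_rate_only"])
case_branch = handle_permanent_failure(True, [], ["heart_rate_only"])
case_alarm = handle_permanent_failure(False, [], [])
```

The three calls correspond to the three situations in the text: a spare provider is available, only a reduced branch (heart rate without further ECG analysis) can run, and monitoring cannot continue at all.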

3.3 Demonstration of OSIRIS-SE infrastructure

OSIRIS-SE is an extended version of its predecessor OSIRIS [Schuler et alii 2003, Schuler et alii 2004, Schuler et alii 2005], which offers an infrastructure for reliable P2P process execution. OSIRIS-SE, in addition, allows for the P2P execution of stream processes in a distributed environment. OSIRIS-SE is programmed in Java and runs on various platforms, including PDAs with MS Windows Mobile 2003. The demonstration scenario includes mobile devices (PDAs) and a microcontroller-based ECG sensor in order to demonstrate the usability of the presented OSIRIS-SE infrastructure. The ECG sensor continuously measures the patient’s ECG by using three electrodes. The sensor applies low-level signal amplification and A/D conversion. The integrated microcontroller also offers short-term storage. The PDA acquires the ECG signal from the sensor and is a full-fledged node of the OSIRIS-SE infrastructure with a wireless network connection. Figure 6 shows the PDAs and smartphone, the ECG sensor with patient electrodes, the webcam, and the process editor O’Grape [Weber et alii 2003].

Figure 7 illustrates the waveform of the acquired ECG signal, which contains noise artefacts, as it occurs in the stream process used in the demonstration scenario. The next operator, “Dsp Filter”, is hosted by the laptop and applies noise reduction to the ECG signal (see Figure 8). Finally, the “QRS Detector”, also hosted by the laptop, derives medically relevant information from the ECG signal. Figure 9 illustrates the result of the QRS Detector, where the centre of the QRS complex is marked and the current heart frequency is derived.
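
How a heart frequency can be derived from marked QRS centres is shown in the small Python sketch below. The paper does not state how the QRS Detector computes this value, so the function `heart_rate_bpm` and the use of the interval between consecutive QRS centres (60 divided by the interval in seconds) are assumptions based on the standard definition of beats per minute.

```python
def heart_rate_bpm(qrs_times):
    """Instantaneous heart rate from ascending QRS centre timestamps (seconds)."""
    # Interval between each pair of consecutive QRS centres.
    intervals = [later - earlier
                 for earlier, later in zip(qrs_times, qrs_times[1:])]
    # One beat per interval, scaled to beats per minute.
    return [60.0 / interval for interval in intervals]


rates = heart_rate_bpm([0.0, 1.0, 1.5])
```

With QRS centres at 0.0 s, 1.0 s and 1.5 s the intervals are 1.0 s and 0.5 s, corresponding to 60 and 120 beats per minute.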

Figure 6 The Data Stream Demonstrator Setup

Figure 7 ECG waveform acquired from the ECG Sensor device

Figure 8 ECG waveform produced by the DSP Filter operator

4 Related Work

Data Stream Management has received considerable attention recently, although only little work focuses on aspects of availability and reliability.

Aurora [Balakrishnan et alii 2004] allows for user-defined continuous query processing by placing and connecting operators in a query plan. Queries are based on a set of well-defined operators. Algorithms for highly available DSM in the context of Aurora are discussed in [Hwang et alii 2005]. In contrast to our work, this work addresses reliability at the level of the whole stream processing engine running on the affected node, whereas we focus on reliability at the level of operator execution. Further work [Balazinska et alii 2005], presented in the context of Borealis, an extension of Aurora, tolerates reduced result quality, which is not acceptable in our intended healthcare application scenario.

TelegraphCQ [Chandrasekaran et alii 2003] is a DSM project with special focus on adaptive query processing. Fjords allow for inter-module communication between an extensible set of operators, enabling static and streaming data sources. Eddies support adaptive query processing: sets of operators are connected to an Eddy, and the Eddy routes each tuple individually. Flux [Shah et alii 2003] provides load balancing and fault tolerance [Shah et alii 2004] by adaptively partitioning operator execution over multiple network nodes. This is realized by placing Flux between producer/consumer pairs. Therefore, in contrast to our approach, Flux describes an active-standby approach, where parts of stream processing are partitioned to be reliably executed in parallel. This active approach is not applicable to our intended healthcare monitoring scenario, where hardware resources at the patients’ homes are limited.

5 Conclusion and Outlook

In this paper, we have reported on a jointly developed prototype system that integrates data stream processing and process management into Digital Libraries. Starting with an evaluation of use cases for managing and processing continuous streams of data, two core parts of the stream-enabled Digital Library infrastructure have been developed. First, information filtering operators allow for the analysis of data streams and the extraction of relevant information. Second, the system contains a reliable infrastructure for combining data stream operators and (traditional) discrete web services. This document has described in detail the concepts of these demonstrators, their implementation, and how these demonstrators can be used.

In future work, this prototype DL system will be refined and more functionality will be added. From the point of view of the DL infrastructure, we will focus on the performance of complex stream processes with intra-process parallelism, i.e., on data stream processes containing join and split operators, and on stream processes with cyclic data stream processing flows. From a user’s point of view, we plan to add support for personalizing operators and stream processes. This means that, based on individual profiles, the same sensor measurements should be interpreted differently for different users (e.g., in eHealth applications, high blood pressure for an athlete engaging in strenuous training does not need to raise an alarm, whereas for a patient who has already suffered a heart attack, such a reading might indicate hypertension). This also affects information filtering in that queries, filters, and alarm triggers need to be personalized. Finally, support for using historic data from Digital Libraries for data stream processing will be added. Various telemonitoring applications require the comparison of current stream data to historic data. This is, for instance, important when changes to the diurnal biorhythm have to be detected (e.g., as required for monitoring patients with cognitive disabilities). The stream processing infrastructure has to make this historic stream data available, for instance by aggregating and persistently storing historic stream data. Aggregation then has to take into account the accuracy needed for processing historic data. The goal of this activity is to prototypically implement operators for the aggregated storage of stream data.
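To make the trade-off between aggregation and accuracy more concrete, the following sketch shows one possible form of an aggregating storage operator. It is not part of the described prototype: the function name `aggregate_readings`, the tuple layout `(timestamp_s, value)`, and the choice of count, minimum, maximum, and mean as stored aggregates are assumptions for illustration. The window length controls how much detail of the historic stream is retained.

```python
from typing import Dict, Iterable, List, Tuple


def aggregate_readings(readings: Iterable[Tuple[float, float]],
                       window_s: float) -> List[Dict[str, float]]:
    """Summarize (timestamp_s, value) readings into fixed-length time windows.

    A shorter `window_s` keeps more detail; a longer one needs less storage.
    """
    buckets: Dict[float, Dict[str, float]] = {}
    for t, v in readings:
        start = int(t // window_s) * window_s  # start time of the window
        b = buckets.get(start)
        if b is None:
            buckets[start] = {"start": start, "count": 1, "min": v, "max": v, "sum": v}
        else:
            b["count"] += 1
            b["min"] = min(b["min"], v)
            b["max"] = max(b["max"], v)
            b["sum"] += v
    # Emit one summary per window, ordered by time, with the mean derived.
    summaries = []
    for start in sorted(buckets):
        b = buckets[start]
        summaries.append({
            "start": b["start"],
            "count": b["count"],
            "min": b["min"],
            "max": b["max"],
            "mean": b["sum"] / b["count"],
        })
    return summaries


# Example: heart rate readings (seconds, bpm) summarized per minute.
history = aggregate_readings([(0, 60.0), (30, 70.0), (61, 80.0)], window_s=60)
```

Comparing current stream data against such summaries, rather than against raw samples, would be one way to detect slow changes such as shifts in the diurnal rhythm while keeping the stored volume small.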

References

Abadi D., et alii. 2003. Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal, 12(2):120–139.

Asada H. et alii. 2003. Mobile Monitoring with Wearable Photoplethysmographic Biosensors. IEEE EMB Magazine, 22(3):28–40.

Balakrishnan H. et alii. 2004. Retrospective on Aurora. VLDB Journal.

Balazinska, M., Balakrishnan, H., Madden, S., Stonebraker, M. 2005. Fault-Tolerance in the Borealis Distributed Stream Processing System. In: Proc. of SIGMOD 2005 Conf., Baltimore, MD (USA)

Boag S. et alii. 2006. XQuery 1.0: An XML Query Language.

Brettlecker, G., Schuldt, H., Schatz, R. 2004. Hyperdatabases for Peer–to–Peer Data Stream Processing. In: Proc. of ICWS 2004 Conf., San Diego, CA, (USA) 358–366

Brettlecker, G., Schuldt, H., Schek, H.J. 2005. Towards Reliable Data Stream Processing with OSIRIS-SE. In: Proc. of BTW 2005 Conf., Karlsruhe, (Germany) 405–414

Brettlecker, G., Schuldt, H., Schek, H.-J., 2006a. Eine Pervasive-Healthcare-Infrastruktur für die verlässliche Informationsverwaltung und -verarbeitung im Gesundheitswesen. In: Datenbank-Spektrum 6 (17), May 2006

Brettlecker G., Schuldt H., Schek H.-J. 2006b. Efficient and Coordinated Checkpointing for Reliable Distributed Data Stream Management. In: Proc. of the ADBIS 2006 Conf., Thessaloniki (Greece)

Hwang, J., Balazinska, M., Rasin, A., Cetintemel, U., Stonebraker, M., Zdonik, S. 2005. High Availability Algorithms for Distributed Stream Processing. In: Proc. of ICDE 2005 Conf., Tokyo (Japan)

Carabus I. et alii. 2007. Extending XQuery with Window Functions. Technical Report, ETH.

Carey M. et alii. 2006. XQueryP: An XML Application Development Language. XML Conference.

Chandrasekaran, S., et alii. 2003. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In: Proc. of CIDR Conf., Asilomar (USA)

Dittrich J.-P., Fischer P. M., Kossmann D. 2005 AGILE: Adaptive Indexing for Context-Aware Information Filters. In: Proc. of SIGMOD 2005, Baltimore, Maryland (USA).

Fernandez M., et alii. 2006. XQuery 1.0 and XPath 2.0 Data Model (XDM), 2006.

Fischer P. M., Kossmann D. 2005. Batched Processing for Information Filters. In: Proceedings of ICDE’05, pp 902–913, Tokyo (Japan) April 2005.

Fischer P. M., Kossmann D. 2006. Quality of Service in Stateful Information Filters. In: Proceedings of DMSN’06, Seoul (Korea)

Fischer P., et alii. 2007. MXQuery, http://mxquery.org

The glucose watch. 2006. http://www.glucowatch.com

Nichols G.A. et alii. 2001. Congestive heart failure in type 2 diabetes: prevalence, incidence, and risk factors. Diabetes Care, 24 (9), Sep. 2001.

Park S., Mackenzie K., and Jayaraman S. 2002. The wearable motherboard: a framework for personalized mobile information processing (PMIP). In Proc. of the 39th Conf. on Design Automation, pages 170–174, 2002.

Petersen S., Peto V., Rayner M. 2005. The European Cardiovascular Disease Statistics 2005, http://www.ehnheart.org/files/statistics%202005-092711A.pdf

Schuldt H. 2005. Service-oriented Advanced Digital Libraries in Healthcare and their Application to Virtual Electronic Health Records and Telemonitoring. In: Proceedings of the Healthcare Digital Libraries Workshop (HDL 2005), Vienna (Austria) September 2005.

Shah, M., Hellerstein, J., Chandrasekaran, S., Franklin, M. 2003. Flux: An adaptive partitioning operator for continuous query systems. In: Proc. of ICDE 2003 Conf., Bangalore, (India)

Shah, M., Hellerstein, J., Brewer, E. 2004. Highly Available, Fault-Tolerant, Parallel Dataflows. In: Proc. of SIGMOD 2004 Conf. 827–838.

Schuler, C., Schuldt, H., Türker, C., Weber, R., Schek, H.J. 2005. Peer-to-Peer Execution of (Transactional) Processes. International Journal of Cooperative Information Systems (IJCIS) 14 377–405

Schuler, C., Weber, R., Schuldt, H., Schek, H.J. 2003. Peer–to–Peer Process Execution with OSIRIS. In: Proc. of ICSOC Conf., Trento (Italy) 483–498

Schuler, C., Weber, R., Schuldt, H., Schek, H.J. 2004. Scalable Peer–to–Peer Process Management – The OSIRIS Approach. In: Proc. of ICWS 2004 Conf., San Diego, CA, (USA) 26–34

Vivometrics. 2006. The life shirt, http://www.vivometrics.com

Weber, R., Schuler, C., Neukomm, P., Schuldt, H., Schek, H.J. 2003. Web Service Composition with O’Grape and OSIRIS. In: Proc. of VLDB Conf., Berlin (Germany)