X. Zhou et al. (Eds.): APWeb 2006, LNCS 3841, pp. 1168–1172, 2006. © Springer-Verlag Berlin Heidelberg 2006
Information System
Yu Fan, Hongyan Li**, Zijing Hu, Jianlong Gao, Haibin Liu, Shiwei Tang, and Xinbiao Zhou
National Laboratory on Machine Perception, School of Electronics Engineering and Computer Science,
Peking University, Beijing, 100871, P.R. China
{efan, lihy, huzijing, jlgao, liuhaibin, tsw, Zhouxb}@cis.pku.edu.cn
Abstract. This demo paper describes the Data Stream Engine based Clinical information system (DSEC). DSEC combines data stream technology with traditional computing mechanisms to process medical data and improve the quality of service in hospitals. Its novel features include: 1) a complete data stream processing and querying architecture for medical applications; 2) the ability to detect data changes in a simple and effective way; 3) a load shedding mechanism that avoids system crashes when the data rate is high.
Keywords: Data stream, Change detection, Load shedding, Clinical Information System.
1 Introduction
Medical organizations are an important application area of data stream technology [1] [2]. Equipment and sensors throughout a hospital generate large amounts of streaming data, called medical data streams, especially in the ICU (Intensive Care Unit), which has many digital devices and seriously ill patients. Medical data streams mainly consist of patient pathology values captured by sensors, such as blood pressure, heart rate, and breathing volume. However, it is difficult to satisfy the application requirements with existing mechanisms, for the following reasons:
• Noisy: Noise caused by treatment or by equipment imprecision leads to misrepresentation. If not processed properly, incorrect data will lead to improper diagnoses that may result in medical accidents, since doctors rely on the data to decide on treatment.
• Requiring rapid response: Rapid changes in pathology data streams should be recognized as soon as they occur so that treatment can follow.
• Heavy system cost: Processing and presenting data streams consumes costly system resources such as memory, storage, and CPU.
* This work was supported by the Natural Science Foundation of China (NSFC) under grant number 60473072.
To offer a better data stream management mechanism and to improve data quality, we have integrated a Data Stream Management System (DSMS) as the data processing engine of our Clinical Information System (CIS). Building on recent data stream research results, we meet the medical application requirements and implement the following functions:
• Stream query and storage: Besides basic query operators such as select, join and union, some processing methods are implemented for particular medical applications.
• Change detection: Monitor the data patterns, eliminate the effect of noisy data, find abnormalities, and trigger data alerts when necessary.
• Load shedding: Guided by control theory, we have designed a load shedding module to avoid system crashes when the data rate exceeds the processing capability. In addition to CPU availability, we are more concerned with memory resources.
Besides an efficient data stream management engine, the system presented in this paper offers medical order, Electronic Medical Record (EMR), and patient management functions. The detailed structure and features are described in the authors' other papers [3] [4] [5] [6].
2 Related Work
Data stream applications such as network monitoring, online transactions, and sensor processing pose tremendous challenges to traditional database systems. Many projects, such as STREAM and Aurora, address these challenges for different application requirements. Some ideas from general-purpose DSMSs are helpful for the problems in the medical environment.
Monitoring the pattern changes and trends of a data stream online is a fundamental challenge. The data stream mining algorithm MAIDS [7] is comparatively complex and would cause unacceptable delay in actual applications. VDDM [8] proposed the concept of velocity density estimation, a technique used to understand, visualize, and determine trends in the evolution of fast data streams.
The surveys [1] [2] give a wide overview of work in the field, and load shedding is also proposed there. In Aurora, each output is associated with a QoS specification function, chosen among a latency graph, a value-based graph and a loss-tolerance graph. Other approaches to dealing with data stream rates that temporarily exceed system capacity focus on limiting the memory utilization of the system. Improving the efficiency of memory allocation among query operators or among inter-operator queues is an alternative to load shedding that can improve system performance when limited main memory is the primary resource bottleneck.
3 System Architecture
As illustrated in Fig. 1, DSEC is a prototype information system designed from the perspective of doctors. It is a J2EE project developed in IBM WSAD and deployed on WebSphere Application Server. Besides the data stream processing engine, we have developed patient management, a visual medical record interface, patient situation evaluation, etc., which satisfy the medical requirements. The entire system structure is described in the authors' previous work [4]. As the core module of our system, the data stream engine comprises data preprocessing, data queues, a query network, change detection, and query output.
Fig. 1. Data stream engine architecture (components shown: Preprocessing, Data queues, Query network, Operator Scheduler, QoS Monitor, Load Shedder, Change Detection, Query Output; actors: Patient, Doctor)
3.1 Data Preprocessing: Wrapper and Filter for Data Stream
The preprocessing module has two parts: a data wrapper and a data filter. The data filter discards unnecessary data, because some data gathered by the medical machines will never be used in the application. We use a project operator to remove useless data fields such as machine type and connection mode. The data filter also manages the system buffer in which the filtered data is stored. The data wrapper converts data from different data sources into structured, parseable units. We generate XML files that define the data structures, and the wrapper finds the proper XML file for parsing.
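The wrapper and filter steps can be sketched as follows. This is a minimal illustration, not DSEC's implementation: the XML element and attribute names, the comma-delimited record format, the field names, and the choice to wrap a record before filtering it are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML definition for one data source; element and attribute
# names are illustrative and are not taken from DSEC.
MONITOR_DEFINITION = """
<source name="bedside_monitor" delimiter=",">
  <field name="patient_id" type="str"/>
  <field name="machine_type" type="str"/>
  <field name="connection_mode" type="str"/>
  <field name="heart_rate" type="float"/>
  <field name="blood_pressure" type="float"/>
</source>
"""

CASTS = {"str": str, "float": float, "int": int}


def make_wrapper(definition_xml):
    """Build a parser that turns one raw record into a structured dict."""
    root = ET.fromstring(definition_xml.strip())
    delimiter = root.get("delimiter", ",")
    fields = [(f.get("name"), CASTS[f.get("type")]) for f in root.findall("field")]

    def wrap(raw_record):
        values = raw_record.strip().split(delimiter)
        if len(values) != len(fields):
            raise ValueError("record does not match the source definition")
        return {name: cast(value) for (name, cast), value in zip(fields, values)}

    return wrap


def project(record, keep):
    """Project operator: keep only the fields the application uses."""
    return {name: record[name] for name in keep}


wrap = make_wrapper(MONITOR_DEFINITION)
record = wrap("p001,MP70,serial,82.0,118.5")
filtered = project(record, keep=("patient_id", "heart_rate", "blood_pressure"))
print(filtered)
```

Keeping the field list in a per-source definition means a new device type needs only a new definition file, not a new parser.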
3.2 Detecting Changes: Find Out the Abnormal Data Patterns
In contrast to previously proposed tools, our method requires almost no prior assumptions, and we do not need the streaming time series to follow any specified model. The SWAB algorithm given by Keogh et al. [9], which uses a sliding window, is particularly popular with the medical community, since patient monitoring is inherently an online task. Borrowing some ideas from SWAB, we separate the data stream into a set of linear segments, in each of which the data points follow the same linear function. After segmentation, the slope differences between the current segment and its neighbors indicate data changes. If the difference exceeds a predefined threshold, our system announces a change.
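The segment-then-compare-slopes idea can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' algorithm and not SWAB itself: it uses a plain sliding-window segmentation with a least-squares line fit, it runs over a finished list rather than a live stream, and the names `max_error` and `slope_threshold` are hypothetical parameters.

```python
def fit_line(points):
    """Least-squares slope and intercept for points given as (t, value)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:  # a single point has no defined slope; treat it as flat
        return 0.0, mean_v
    slope = sum((t - mean_t) * (v - mean_v) for t, v in points) / var_t
    return slope, mean_v - slope * mean_t


def fit_error(points):
    """Largest absolute residual of the least-squares line."""
    slope, intercept = fit_line(points)
    return max(abs(v - (slope * t + intercept)) for t, v in points)


def segment(values, max_error):
    """Grow each segment until a straight line no longer fits within max_error."""
    points = list(enumerate(values))
    if not points:
        return []
    segments, start = [], 0
    for end in range(2, len(points) + 1):
        if end - start > 2 and fit_error(points[start:end]) > max_error:
            segments.append(points[start:end - 1])
            start = end - 1  # the point that broke the fit opens a new segment
    segments.append(points[start:])
    return segments


def detect_changes(values, max_error, slope_threshold):
    """Return (time, slope_before, slope_after) for each large slope difference."""
    segments = segment(values, max_error)
    slopes = [fit_line(s)[0] for s in segments]
    changes = []
    for i in range(1, len(segments)):
        if abs(slopes[i] - slopes[i - 1]) > slope_threshold:
            changes.append((segments[i][0][0], slopes[i - 1], slopes[i]))
    return changes


# Illustrative series: a slow rise followed by a steep rise.
readings = [0, 1, 2, 3, 4, 5, 15, 25, 35, 45]
changes = detect_changes(readings, max_error=0.5, slope_threshold=2.0)
print(changes)
```

Comparing slopes rather than raw values means an isolated noisy sample is less likely to raise an alert than a sustained change in trend, provided `max_error` is chosen to tolerate the expected noise.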
3.3 Load Shedding: Avoid System Crash by Limiting Resource Abuse
Various system resources such as CPU, memory, and network bandwidth may become the bottleneck in DSMS query processing. Focusing mainly on memory availability, our project sheds system load with the help of control theory. The controlled variable is the performance metric controlled by the real-time search engine system. The performance reference represents the desired system performance in terms of the controlled variable.
Fig. 2. Framework of load shedding model (blocks shown: Controller, Query Network, Monitor; input: Data stream; signals: E(k), M(k), R(k), C(k), I(k))
Fig. 2 illustrates the framework of our feedback-based DSMS load shedding model. The monitor measures the controlled variable M(k) and feeds the samples back to the controller. The controller compares the performance reference Mref with M(k) to get the current error E(k) and computes a change D(k), called the control input, to the data arrival rate I(k), which yields the expected data arrival rate R(k). The controller uses a simple P (proportional) control function to compute the expected data arrival rate so that total memory utilization decreases to the reference Mref. The filter manager dynamically changes the data arrival rate at each sampling instant k according to the control input D(k) by adjusting the filtering proportion of the filters. The goal of the filter manager is to enforce the new data arrival rate R(k+1) = R(k) - D(k).
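One sampling step of such a proportional controller can be sketched as follows. This is a hedged illustration, not the authors' controller. The text does not give the sign convention of E(k) or the form of the control function, so the sketch assumes E(k) = M(k) - Mref and D(k) = gain * E(k), which makes R(k+1) = R(k) - D(k) lower the rate when memory use exceeds the reference. The gain value, the clamping to `max_rate`, and the `keep_fraction` helper are hypothetical.

```python
def p_controller_step(rate, memory_used, memory_ref, gain, max_rate):
    """Compute R(k+1) from R(k) with a proportional control law.

    Assumed convention: E(k) = M(k) - Mref, D(k) = gain * E(k).
    """
    error = memory_used - memory_ref   # E(k)
    control_input = gain * error       # D(k)
    new_rate = rate - control_input    # R(k+1) = R(k) - D(k)
    # Clamp to a physically meaningful range (an added safeguard).
    return min(max(new_rate, 0.0), max_rate)


def keep_fraction(target_rate, arrival_rate):
    """Fraction of arriving tuples the filters should let through."""
    if arrival_rate <= 0:
        return 1.0
    return min(1.0, target_rate / arrival_rate)


# Illustrative numbers: memory is 100 units above the reference.
next_rate = p_controller_step(
    rate=200.0, memory_used=900.0, memory_ref=800.0, gain=0.5, max_rate=1000.0
)
fraction = keep_fraction(next_rate, arrival_rate=200.0)
print(next_rate, fraction)
```

With these inputs the error is 100, the control input is 50, and the target rate drops from 200 to 150, so the filters would pass three quarters of the arriving tuples. A larger gain reacts faster but risks oscillation, which is the usual trade-off when tuning a P controller.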
4 Conclusions and Future Works
With an efficient data stream engine, DSEC is a patient-centered clinical information prototype system. It makes the following contributions:
• A universal data wrapper to preprocess data streams and an integrated data stream query engine.
• A more accurate and faster mechanism for detecting data trend changes, based on linear segmentation and vector search.
• The ability to protect the system and balance the system load through a control-theory-based load shedding module.
In the future, our objectives may include the following:
• Improve the above mechanisms based on semantic data streams.
• Find new applications for our stream engine and extend its scope.
References
1. B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. In Proc. 2002 ACM Symp. on Principles of Database Systems, June 2002.
2. D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring Streams: a New Class of Data Management Applications. In Proc. 28th Intl. Conf. on Very Large Data Bases, Hong Kong, China, August 2002.
3. Yu Fan, Hongyan Li. ICUIS: A Rule-Based Intelligent ICU Information System. In IEEE Proc. IDEAS-DH2004, IEEE Computer Society, Sep. 2004.
4. Yongsheng Tan, Yibin Wang, Hongyan Li. A Management Strategy of Monitoring Data in ICU Based on Data Stream Technology. In IEEE Proc. IDEAS-DH2004, IEEE Computer Society, Sep. 2004.
5. Hu Zijing, Li Hongyan, Yu Fan, et al. Using Control Theory to Guide Load Shedding in Medical Data Stream Management System. Accepted as regular paper by Asian 2005, LNCS.
6. Yin Ting, Li Hongyan, Yu Fan, et al. A Hybrid Method for Detecting Data Stream Changes with Complex Semantics in Intensive Care Unit. Accepted as short paper by Asian 2005, LNCS.
7. Y. Dora Cai, J. Han, et al. MAIDS: Mining Alarming Incidents from Data Streams. In Proc. ACM SIGMOD, 2004.
8. C. C. Aggarwal. On Change Diagnosis in Evolving Data Streams. In Proc. ACM SIGMOD Conf., 2003.
9. E. J. Keogh, S. Chu, D. Hart, and M. J. Pazzani. An Online Algorithm for Segmenting Time Series. In Proc. of ICDM, 2001.