Active Disk System
Chapter 9: Related Work
9.6 Data Mining and OLTP
One of the performance advantages of Active Disks discussed in Chapter 5 was the use of integrated scheduling at the individual disk drives, where a “background” workload takes advantage of the characteristics of a particular “foreground” workload so that the two share resources more efficiently. The most obvious example of this is the combination of a decision support workload and a transaction processing workload. This allows decision makers to identify and evaluate patterns in the database while the system continues to process new transactions. The closer this connection is, the more up-to-date and relevant decisions can be. Chapter 5 proposed a system where these decision support queries can be performed against the “live” production system. This extends previous work on mixed database workloads and on disk scheduling.
9.6.1 OLTP and DSS
Previous studies of combined OLTP and decision support workloads on the same system indicate that the disk is the critical resource [Paulin97]. Paulin observes that both CPU and memory utilization are much higher for the Data Mining workload than for the OLTP workload, which is also clear from the design of the decision support system shown in Table 5-13 in Section 5.4.2 of Chapter 5. In his experiments, all system resources are shared among the OLTP and decision support workloads, with an impact of 36%, 70%, and 118% on OLTP response time when running decision support queries against a heavy, medium, and light transaction workload, respectively. The author concludes that the primary performance issue in a mixed workload is the handling of I/O demands on the data disks, and suggests that a priority scheme is required in the database system as a whole to balance the two types of workloads.
9.6.2 Memory Allocation
Brown, Carey and DeWitt [Brown92, Brown93] discuss the allocation of memory as the critical resource in a mixed workload environment. They introduce a system with multiple workload classes, each with varying response time goals that are specified to the memory allocator. They show that a modified memory manager is able to successfully meet these goals in the steady state using ‘hints’ in a modified LRU scheme. The modified allocator works by monitoring the response time of each class and adjusting the relative amount of memory allocated to a class that is operating below or above its goals. The scheduling scheme we propose here for disk resources also takes advantage of multiple workload classes with different structures and performance goals. In order to properly support a mixed workload, a database system must manage all system resources and coordinate performance among them.
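The feedback idea described above (monitor each class, then shift memory toward a class that is missing its goal) can be illustrated with a minimal sketch. This is not the allocator of [Brown92, Brown93]; the class `WorkloadClass`, the functions `pressure` and `rebalance`, and the parameters `step` and `min_pages` are hypothetical names chosen for illustration, and the policy shown is only one simple way such a controller might behave.

```python
from dataclasses import dataclass


@dataclass
class WorkloadClass:
    name: str
    goal_ms: float      # response time goal given to the allocator (assumed units)
    observed_ms: float  # most recently measured response time
    pages: int          # memory currently held by this class


def pressure(c):
    """Observed over target response time; above 1.0 means the goal is missed."""
    return c.observed_ms / c.goal_ms


def rebalance(classes, step=8, min_pages=16):
    """Shift `step` pages from the class with the most slack to the class
    that misses its goal by the most. Returns (donor, receiver, pages) or None."""
    needy = max(classes, key=pressure)
    if pressure(needy) <= 1.0:
        return None  # every class meets its goal, so nothing to adjust
    # Only classes that beat their goal and can spare memory may donate.
    donors = [c for c in classes
              if c is not needy and pressure(c) < 1.0
              and c.pages - step >= min_pages]
    if not donors:
        return None
    donor = min(donors, key=pressure)
    donor.pages -= step
    needy.pages += step
    return (donor.name, needy.name, step)
```

Called once per monitoring interval, `rebalance` moves a small, fixed amount of memory at a time, so the allocation drifts toward a steady state rather than oscillating on a single noisy measurement.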
9.6.3 Disk Scheduling
Existing work on disk scheduling algorithms [Denning67,..., Worthington94] shows that dramatic performance gains are possible by dynamically reordering requests in a disk queue. One of the results in this work indicates that many scheduling algorithms can be performed equally well at the host [Worthington94]. The scheme that we propose here takes advantage of additional flexibility in the workload (the fact that requests for the background workload can be handled at low priority and out of order) to expand the scope of reordering possible in the disk queue. Our scheme also requires detailed knowledge of the performance characteristics of the disk (including exact seek times and overhead costs such as settle time) as well as detailed logical-to-physical mapping information to determine which blocks can be picked up for free. As a result, the scheme would be difficult, if not impossible, to implement at the host without close feedback on the current state of the disk mechanism, which makes it a compelling use of additional “smarts” directly at the disk.
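To make the notion of a block that can be "picked up for free" concrete, the following is a minimal sketch under a deliberately simplified disk model. Everything in it is an assumption for illustration: the constants (`ROTATION_MS`, `SETTLE_MS`, `PER_CYL_MS`, `XFER_MS`), the linear seek curve, and the names `Block`, `seek_ms`, `read_start`, and `free_blocks` are hypothetical and do not come from the scheme in Chapter 5. A background block counts as free if the head can detour to read it and still reach the foreground target no later than it would have by seeking directly and waiting for the rotation.

```python
from dataclasses import dataclass

# Hypothetical drive parameters, chosen only to keep the arithmetic readable.
ROTATION_MS = 10.0   # one full revolution
SETTLE_MS = 1.0      # fixed overhead of any non-zero seek
PER_CYL_MS = 0.01    # assumed linear seek cost per cylinder
XFER_MS = 0.1        # time to read one block


@dataclass(frozen=True)
class Block:
    name: str
    cylinder: int
    angle: float     # rotational position as a fraction of a revolution


def seek_ms(a, b):
    """Seek time between two cylinders under the assumed linear model."""
    d = abs(a - b)
    return 0.0 if d == 0 else SETTLE_MS + PER_CYL_MS * d


def read_start(now_ms, from_cyl, block):
    """Time at which a read of `block` can begin: seek, then wait for rotation.
    The platter is taken to be at angle 0.0 at time 0."""
    arrive = now_ms + seek_ms(from_cyl, block.cylinder)
    angle_now = (arrive / ROTATION_MS) % 1.0
    wait = ((block.angle - angle_now) % 1.0) * ROTATION_MS
    return arrive + wait


def free_blocks(head_cyl, target, candidates, eps=1e-6):
    """Background blocks that can be read without delaying the foreground target."""
    deadline = read_start(0.0, head_cyl, target)
    free = []
    for c in candidates:
        done = read_start(0.0, head_cyl, c) + XFER_MS
        if read_start(done, c.cylinder, target) <= deadline + eps:
            free.append(c)
    return free
```

Even in this toy model the test depends on exact seek, settle, and rotational position values, which is the kind of information that is readily available inside the drive but hard to track accurately from the host.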
With the advent of Storage Area Networks (SANs), storage devices are being shared among multiple hosts performing different workloads [HP98a, IBM99, Seagate98, Veritas99]. As the amount and variety of sharing increases, the only central location to optimize scheduling across multiple workloads will be directly on the devices themselves.
9.7 Miscellaneous
There are several areas of research that have explored “activeness” in other contexts, placing general-purpose computation outside the domain of traditional microprocessors. There have also been significant advances in the commercial deployment of small-footprint execution environments that can be used in very resource-constrained environments.
9.7.1 Active Pages
The Active Pages work at the University of California at Davis proposes computation directly in memory elements, moving parallel computation to the data [Oskin98]. Their architecture is based on a memory system where RAM is integrated with some amount of reconfigurable logic. Results from a simulator promise performance up to 1000 times that of conventional systems, which often cannot keep their processors fed with data due to limitations in bandwidth and parallelism. This work takes advantage of the same silicon technology trends as Active Disks, but must operate at a much finer granularity than the parallelism of Active Disk operations.
The authors suggest that the partitioning between the computation performed in the processor and in the Active Pages can be done by a compiler that takes into account bandwidth, synchronization, and parallelism to determine the optimal location for any piece of code. For Active Pages, this scheduling would have to be done at the instruction or basic block level due to the tight coupling between the processor and the Active Pages. For Active Disks, this scheduling would be done at the module or component level, as discussed in the previous sections, since the coupling is much lower and the “distance” between Active Disks and the host is much larger.
9.7.2 Active Networks
The Active Networks project provides the inspiration for the name Active Disks¹ and proposes a mechanism for running application code at network routers and switches to accelerate innovation and enable novel applications in the movement of data and network management [Tennenhouse96]. This work suggests two possible approaches for managing network programs: a discrete approach that allows programs to be explicitly loaded into the network and affect the processing of future packets, and an integrated approach in which each packet consists of a program instead of simply “dumb” data. The tradeoff between the two is the amount of state that devices can be expected to maintain between requests and how many requests can be active at any given time. The implementation of the Active IP option [Wetherall96] describes a prototype language system and an API to access router state and affect processing. It does not address the resource management issues inherent in allowing these more complex programs.
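The difference between the two approaches can be sketched in a few lines. The classes `DiscreteRouter` and `IntegratedRouter` and the dictionary-based packet layout are hypothetical and chosen only for illustration; they are not the Active IP API of [Wetherall96], and the sketch ignores safety and resource limits entirely.

```python
class DiscreteRouter:
    """Discrete approach: programs are loaded ahead of time and kept as
    router state; later packets only name the program they want."""

    def __init__(self):
        self.programs = {}  # state the device must maintain between packets

    def load(self, name, fn):
        self.programs[name] = fn

    def forward(self, packet):
        fn = self.programs.get(packet["program"])
        # Unknown program names fall back to plain forwarding of the data.
        return fn(packet["data"]) if fn else packet["data"]


class IntegratedRouter:
    """Integrated approach: every packet carries its own program, so the
    router keeps no per-program state between packets."""

    def forward(self, capsule):
        return capsule["code"](capsule["data"])
```

In the discrete case the cost is the state held in `programs`; in the integrated case the cost moves into every packet, which must carry and have executed its own code.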
These types of functions are much more sensitive to execution time than Active Disk functions. Network packets within IP switches are processed at rates of gigabits per second, while Active Disks have the “advantage” of being limited on one side by the (low) performance of the mechanical portions of the disks. This also means that the resource management system for Active Disks need only take into account a small number of concurrently running functions at the disks, while Active Network switches might easily have thousands of concurrent processing streams.
9.7.3 Small Java
There has been considerable work on optimizing safe languages such as Java through the use of just-in-time compilation [Gosling96, Grafl96] or translation [Proebsting97]. Small-footprint Java implementations are becoming available for embedded devices due to the popularity of the language and the promise of portability among hardware platforms. Recent product announcements promise a Java virtual machine in 256K of ROM [HP98], or even on a smart card, with a virtual machine that fits in 4K of ROM and can run bytecode programs up to 8K in size for a significant subset of the language [Schlumberger97]. This demonstrates that it is possible to implement a workable subset of the Java virtual machine in a very limited resource environment. Other systems such as Inferno [Inferno97] are specifically targeted for embedded, low-resource environments and might also be appropriate choices for Active Disk execution.
1. The name was originally suggested by Jay Lepreau from the University of Utah in October 1996 during a question at the OSDI work-in-progress session where the original work on Network-Attached Secure Disks, later published as [Gibson97], was being presented.