This chapter provided background material about Distributed Query Processing (DQP) and Grid Computations, and discussed their integration. From the perspective of Grid computing, DQP offers an alternative manner to describe and run computations, characterised by ease of expression and system-performed optimisation. From the DQP perspective, the Grid offers a platform that addresses infrastructure concerns regarding security and coordinated discovery and allocation of remote and autonomous resources.
In addition, this chapter identified two issues as important challenges for efficient and effective DQP on the Grid: generic adaptivity mechanisms that cover adaptations to volatile wide-area environments, and resource scheduling. These constitute the research problems addressed in this thesis (Chapter 1).
The last sections dealt with Grid-enabled database systems, presenting in more detail two existing DQP systems for the Grid, Polar* and OGSA-DQP. The development of these systems is of considerable importance for the remainder of the thesis. As there are not yet any relevant validated simulators and models describing the Grid environment and the behaviour of Grid-enabled query processors, the research results are evaluated in the context of these two real systems. Thus, they form the basis for a testing platform. As stated explicitly in Chapter 1, the contribution of this thesis to the development of Polar* and OGSA-DQP corresponds to the compiler of the systems.
The integration of Grid and database technologies (Section 2.3) has also been discussed in [GPSF04]. Descriptions of Polar* have appeared in [SGW+03, SGW+02], and of OGSA-DQP in [AMP+03, AMG+04].
Scheduling Queries in Grids
This chapter presents a novel scheduling algorithm for queries running in distributed heterogeneous environments like the Grid. In wide-area query processing, the resource scheduling problem is defined as the problem of (i) selecting resources and (ii) matching subplans with resources. In the non-database literature, scheduling sometimes also involves defining the execution order of jobs (which may correspond to subplans in query processing) and exploiting pipelined parallelism, in addition to resource selection and job matching [DSB+03]. However, in databases, the issues of execution order and pipelined parallelism have been effectively and efficiently addressed by adopting well-established execution models, such as iterators [Gra93], and thus need not be the responsibility of query schedulers.
To date, scheduling algorithms for heterogeneous distributed databases compromise partitioned parallelism (e.g., [ESW78, Kos00]) and consequently are not suitable for a wide range of intensive queries¹, for which parallelism is particularly beneficial. The algorithm proposed addresses this limitation in a practical way and with low complexity. It allows queries over Grid databases to employ both partitioned and pipelined parallelism using a diverse set of resources that may or may not store data. This contribution is a step towards high-performance Grid querying, as it may improve the performance of query processing significantly.
Resource selection is an integral phase of query optimisation in distributed query processing. Regardless of the plan enumeration approach, e.g., dynamic programming, two-step optimisation, and so on, the query optimiser needs to decide which resource will evaluate each part of the query. In architectures similar to the ones of Polar* and OGSA-DQP, there is a dedicated component in the query optimiser for this purpose, namely the scheduler, as described in Chapter 2. The new algorithm is implemented by re-engineering this component.
¹ Intensive queries are the queries that either process large volumes of data, or apply expensive computations onto the data processed, and consequently place a heavy demand on any of the CPU, I/O bandwidth and network bandwidth types of resources (or their combinations).
The chapter is organised as follows. The discussion of related work appears in Section 3.1. The proposed solution is described in Section 3.2. This solution is evaluated in Section 3.3. Section 3.4 summarises the chapter.
3.1 Related Work
Existing scheduling algorithms and techniques, whether from the database, the Grid or the parallel computing research communities, are inadequate for parallel query processing on the Grid, mainly because the way they select machines and allocate tasks compromises partitioned parallelism in a heterogeneous environment. For example, generic DAG schedulers (e.g., [KA99, TWML02, SZ04]) and their Grid variants (e.g., [TTL03]) tend to allocate a graph vertex to a single machine, which corresponds to no partitioned parallelism (if the DAG represents a query plan, a vertex corresponds to a query operator). More comprehensive proposals (e.g., GrADS [DSB+03]) still rely on application-dependent “mappers” to map data and tasks to resources, and thus fall short of constituting complete scheduling algorithms in their own right. In addition, no mapper has been proposed that covers query processing on the Grid. Efficient proposals for mixed-parallelism scheduling (e.g., [RvG01]) and parallel database scheduling (e.g., [ESW78, DGG+86]) are restricted to homogeneous settings. Contrary to the above approaches, the proposal of this thesis effectively addresses the resource scheduling problem for Grid databases in its entirety, allowing for arbitrarily high degrees of partitioned parallelism across heterogeneous machines.
This section presents additional scheduling proposals found in the literature and discusses how they differ from the algorithm proposed. It is divided into two parts, one examining contributions coming from the database community and thus tailored to database queries, and the other referring to schedulers for more generic computations in Grids.
3.1.1 Resource Allocation in Database Queries
Distributed query processing has mostly been influenced by some pioneering systems. The most influential, R* [ML86], simplified the problem of resource scheduling by neglecting the benefits of partitioned parallelism. The data is retrieved from a single site only, and is joined at a single site, which is either the site of one of the inputs or the site that asked for the data. SDD-1 [BGW+81] focused on semijoins and also did not employ partitioned parallelism. Distributed Ingres [ESW78] took a step forward and provided for partitioned parallelism, but only for machines that store data. Its scheduling algorithm assumed that all the participating machines have the same computational capabilities (a property that, in the general case, cannot be expected to hold on the Grid). It also forced a choice between using all the available nodes and using just one. [RM95] discusses an approach for load balancing employing partitioned parallelism; although it refers to completely homogeneous environments, it does not force the system to employ all the available nodes when these are not needed, extending the work of [WFA92]. Other existing techniques for parallel and distributed databases either do not consider partitioned parallelism or completely skip the resource selection phase by assuming a fixed set of resources and then trying to schedule tasks over these resources (e.g., [GI97, MBGS03]). Moreover, they usually assume homogeneous and stable environments (e.g., [WFA92, WCwHK04]). For example, Gamma [DGG+86] assumes that all the available machines are similar in terms of capabilities, connection speed and ownership.
In early commercial distributed database systems, the reason for not developing schedulers supporting partitioned parallelism was that the communication cost was the predominant cost. The biggest effort was put into minimising this cost [TTC+90], and each operator was executed at a single site. In modern applications, and due to advances in network technologies, the computation cost is often the dominating cost (e.g., [SA02]). However, even advanced distributed query processing systems may employ only pipelined and independent parallelism (e.g., [ROH99, SAL+96]). A question may arise as to whether existing techniques that do not employ partitioned parallelism (like, for example, the query/data/hybrid shipping in client-server architectures [Kos00]) are efficient in most cases. For a large range of queries, the answer is negative. The parallelisation saturation point (i.e., the point after which further parallelisation does not yield any benefit) is lower when the start-up cost increases. The start-up cost depends strongly on the machine and the evaluation method, but in most cases it is incurred once for all the operators processed on a single machine. Therefore, the contribution of the start-up cost becomes less significant for expensive queries. When the ratio between workload and machines used increases, the saturation point increases as well. These properties are quite appealing for tasks in computational grids insofar as they are envisaged to be of significant size and complexity.
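The relationship between start-up cost, workload and the saturation point can be illustrated with a deliberately simple cost model. The sketch below is not taken from the thesis: it assumes that the start-up cost of each machine is paid one after the other and that the workload is split evenly across machines. The function names and the numbers are hypothetical.

```python
def response_time(workload, startup, machines):
    """Assumed cost model (not from the thesis): start-up costs are paid
    sequentially, one per machine, and the workload is split evenly."""
    return startup * machines + workload / machines


def saturation_point(workload, startup, max_machines=1024):
    """Smallest number of machines beyond which adding one more machine
    no longer reduces the response time under the assumed model."""
    n = 1
    while (n < max_machines and
           response_time(workload, startup, n + 1) < response_time(workload, startup, n)):
        n += 1
    return n


if __name__ == "__main__":
    print(saturation_point(workload=100, startup=1))  # 10
    print(saturation_point(workload=100, startup=4))  # 5: higher start-up, lower saturation point
    print(saturation_point(workload=400, startup=1))  # 20: more work, higher saturation point
```

Under this assumed model the response time is minimised near the square root of workload divided by start-up cost, so a fourfold increase in workload doubles the number of machines that can be used profitably.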
3.1.2 Generic Resource Scheduling on the Grid
In the area of DAG scheduling to support mixed parallelism, there have been many interesting proposals. The algorithm proposed may be regarded as an adaptation of the task allocation in [RNvGJ01, RvG01] for heterogeneous environments, tailored to the needs of query processing. With respect to heterogeneity, [BDS03] goes one step further by considering heterogeneous clusters of homogeneous machines, but it has significantly higher complexity and is not compatible with the iterator model of query execution [Gra93]. Ordinary DAG schedulers for Grid environments, such as [SCS+04, SZ04, TTL03], consider neither partitioned parallelism nor the type of interdependencies between operators in a query execution plan (e.g., the order of operators in a query plan may be fixed and some operators need to be computed earlier than others, whereas other operator sets may be able to execute in a pipeline). Not considering partitioned parallelism is the usual case even for homogeneous systems [KA99].
The proposal in Section 3.2 is, to a certain extent, compatible with the GrADS project [DSB+03], as it can play the role of the GrADS mapper for database query applications. The main differences lie in the fact that, in query processing, the initial pruning of the resource sets cannot be done in a completely application-independent way, as in GrADS, because database locality is very important for efficient query plan construction. Also, in GrADS, the scheduling algorithm runs many times, once for each candidate machine group, whereas, in the proposal of this thesis, the scheduler is executed only once, resulting in lower algorithm execution times. In the context of [DSB+03], [YD02] has presented a heuristic that allocates nodes in an incremental way similar to the scheduler proposed. One difference between the two techniques, apart from the fact that the graphs in [YD02] are simpler than typical query plans, is that in [YD02] the machines are first chosen and then it is decided how to use them, whereas in the approach followed in this thesis, a bottleneck is first identified and then an attempt is made to increase the parallelism. Other schedulers developed in GrADS (e.g., [PBD+01]) do not provide for different degrees of parallelism in different parts of the query plan, and may require the user to first explicitly prune an extended set of resources. In the current proposal, this is a responsibility of the scheduler.
Other powerful schedulers, such as Condor [TWML02, TTL03], suffer mainly from limitations in the dependencies that the supported graphs can express, and from the restriction that each node in the graph is allocated to a single site [KA99].
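The bottleneck-first strategy mentioned above in the comparison with [YD02] can be illustrated with a small greedy loop. This is only a hypothetical sketch of the general idea, not the algorithm of Section 3.2. It assumes that an operator's cost is its work divided by the total speed of the machines assigned to it, and it ignores data locality, start-up costs and pipelining. All names, the `min_gain` threshold and the numbers are assumptions made for illustration.

```python
def greedy_allocate(work, speeds, min_gain=0.05):
    """Hypothetical bottleneck-first allocation.

    work:   dict mapping operator name -> work units
    speeds: list of machine speeds (work units per second)
    Returns a dict mapping operator name -> list of machine indices.
    """
    if len(speeds) < len(work):
        raise ValueError("need at least one machine per operator")

    # Free machines, fastest first (ties keep their original order).
    free = sorted(range(len(speeds)), key=lambda m: speeds[m], reverse=True)

    # Start with one machine per operator, heaviest operator first.
    alloc = {}
    for op in sorted(work, key=work.get, reverse=True):
        alloc[op] = [free.pop(0)]

    def cost(op):
        return work[op] / sum(speeds[m] for m in alloc[op])

    while free:
        bottleneck = max(alloc, key=cost)       # identify the bottleneck first
        before = cost(bottleneck)
        total = sum(speeds[m] for m in alloc[bottleneck]) + speeds[free[0]]
        after = work[bottleneck] / total        # cost with one more machine
        if (before - after) / before < min_gain:
            break                               # gain too small: stop
        alloc[bottleneck].append(free.pop(0))   # increase its parallelism
    return alloc


if __name__ == "__main__":
    plan = {"scan": 10, "join": 100}
    machines = [4, 2, 2, 1, 1]
    print(greedy_allocate(plan, machines))
    # {'join': [0, 2, 3, 4], 'scan': [1]}
```

In this example the join stays the bottleneck throughout, so every spare machine is added to it. With a stricter threshold such as `min_gain=0.2`, the loop stops after the first extra machine, leaving different degrees of parallelism in different parts of the plan.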