4 An Analytical Model for Recovery Mechanisms
4.3 Developing the MaStA Cost Model
4.3.4 Assigning I/O Access Patterns
The assignment of I/O access patterns to I/O cost categories for a given recovery mechanism depends on the characteristics of the mechanism. For example, a mechanism that maintains the original clustering of data performs both clustered and unclustered database reads. Mechanisms that lose the original clustering of data, on the other hand, are assumed always to perform unclustered or disk reads when reading the database. It is conceivable that some of these mechanisms could exploit dynamic re-clustering of data for some applications in order to perform clustered reads. Catering for such cases in MaStA requires only a reassignment of I/O access pattern costs to the database read categories for those mechanisms. In this work, however, it is assumed that application workloads have characteristics such that no effective re-clustering of pages can take place to reduce read costs.
The I/O access patterns assigned to the I/O cost categories for the four mechanisms are given in Table 4.2. In DataSafe, each database read is either clustered or unclustered. Log writes consist of writing updated pages sequentially to the safe, and writing pages of the safe map in an ordered manner to preallocated locations on disk. Committed pages are written back to the database using propagation writes. Propagation I/O can be delayed and may therefore be ordered. The commit I/O cost category consists of writing the root block and is assigned an unclustered write. If the same device holds both the database and the safe, writing to the safe may also incur two disk seeks: one to position the device at the safe and one to move it back to the database. The second seek occurs at the beginning of the next database read but is most conveniently modelled as a commit cost. Since committed changes are retained in the cache until propagated to the database, no propagation reads are required to read the changes back from the safe.
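| I/O Categories    | DataSafe                | AISP               | LSD                | BISP                    |
|-------------------|-------------------------|--------------------|--------------------|-------------------------|
| Database Read     | clustered & unclustered | unclustered        | disk               | clustered & unclustered |
| Database Write    | -                       | -                  | -                  | ordered                 |
| Log Read          | -                       | ordered            | ordered            | -                       |
| Log Write         | sequential & ordered    | ordered            | sequential         | sequential & ordered    |
| Propagation Read  | -                       | -                  | -                  | -                       |
| Propagation Write | ordered                 | -                  | -                  | -                       |
| Commit            | unclustered & disk      | unclustered & disk | unclustered & disk | unclustered & disk      |

Table 4.2: I/O Access Pattern Assignments to I/O Cost Categories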
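As an illustration only, the assignments of Table 4.2 can be held in a small lookup structure, which also shows how a single cell might be reassigned to model a mechanism that benefits from dynamic re-clustering. The names `ASSIGNMENTS`, `patterns` and `reassign` are hypothetical and are not part of MaStA; the placement of the two ordered log reads under AISP and the LSD is inferred from the accompanying text.

```python
import copy

# Table 4.2 as a mapping: mechanism -> I/O cost category -> access patterns.
# A category that is absent has no access pattern assigned to it.
# Assumption: the two "ordered" log-read entries belong to AISP and the LSD.
ASSIGNMENTS = {
    "DataSafe": {
        "database_read": ("clustered", "unclustered"),
        "log_write": ("sequential", "ordered"),
        "propagation_write": ("ordered",),
        "commit": ("unclustered", "disk"),
    },
    "AISP": {
        "database_read": ("unclustered",),
        "log_read": ("ordered",),
        "log_write": ("ordered",),
        "commit": ("unclustered", "disk"),
    },
    "LSD": {
        "database_read": ("disk",),
        "log_read": ("ordered",),
        "log_write": ("sequential",),
        "commit": ("unclustered", "disk"),
    },
    "BISP": {
        "database_read": ("clustered", "unclustered"),
        "database_write": ("ordered",),
        "log_write": ("sequential", "ordered"),
        "commit": ("unclustered", "disk"),
    },
}


def patterns(assignments, mechanism, category):
    """Return the access patterns assigned to one cell (empty if none)."""
    return assignments[mechanism].get(category, ())


def reassign(assignments, mechanism, category, new_patterns):
    """Return a copy of the table with one cell replaced; the input is untouched."""
    updated = copy.deepcopy(assignments)
    updated[mechanism][category] = tuple(new_patterns)
    return updated


# Hypothetical variant: AISP under a workload that permits effective re-clustering.
reclustered = reassign(
    ASSIGNMENTS, "AISP", "database_read", ("clustered", "unclustered")
)
```

Because `reassign` returns a copy, the default assignments remain available for comparison with the variant.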
In AISP, updated pages are written to free blocks. In the variation of AISP examined here, it is assumed that free blocks within the database are used before any new blocks are allocated at the end of the database. This keeps the size of the database to a minimum, so the mechanism incurs unclustered reads instead of disk reads. An alternative strategy is to extend the database when creating shadow pages and to reuse free blocks within the database only when it reaches some predefined size or fills the device. This strategy would make the characteristics of AISP more like those of the LSD. Because the original clustering of pages is lost, database reads are always unclustered. Log reads are performed to access the page map; such reads incur ordered read costs. Log writes, which update the page map, can be performed in an ordered fashion once the device head has been moved to the required location. The cost of this seek is charged to the commit I/O cost category. The commit I/O cost category also includes writing the root block, which is assigned an unclustered write. The additional seek incurred by the next I/O operation is also charged to the commit category.
The main difference between the LSD and AISP is that the LSD performs less expensive sequential log writes instead of ordered writes. Performing sequential writes in the LSD requires that the database be dispersed over a larger area of the device, and hence database reads are assigned the more expensive disk read costs.
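The trade-off between the two mechanisms can be sketched numerically. The following is a minimal illustration, assuming invented per-operation costs that merely respect the orderings stated above (sequential writes cheaper than ordered writes, disk reads dearer than unclustered reads). The values in `COST_MS` and the function names are hypothetical and are not taken from MaStA; commit and log read costs are deliberately omitted.

```python
# Hypothetical per-operation costs in milliseconds; NOT values from the thesis.
# They only preserve the orderings described in the text.
COST_MS = {
    "unclustered_read": 12.0,
    "disk_read": 18.0,
    "ordered_write": 8.0,
    "sequential_write": 3.0,
}


def read_write_cost(reads, log_writes, read_pattern, write_pattern, cost=COST_MS):
    """Cost of database reads plus log writes only (other categories ignored)."""
    return reads * cost[read_pattern] + log_writes * cost[write_pattern]


def aisp_cost(reads, log_writes):
    # AISP: unclustered database reads, ordered log writes.
    return read_write_cost(reads, log_writes, "unclustered_read", "ordered_write")


def lsd_cost(reads, log_writes):
    # LSD: disk database reads, sequential log writes.
    return read_write_cost(reads, log_writes, "disk_read", "sequential_write")


# Read-heavy workload: AISP is cheaper under these assumed costs.
read_heavy = (aisp_cost(100, 50), lsd_cost(100, 50))   # (1600.0, 1950.0)
# Write-heavy workload: the LSD is cheaper under these assumed costs.
write_heavy = (aisp_cost(10, 100), lsd_cost(10, 100))  # (920.0, 480.0)
```

With these assumed figures, the LSD becomes preferable once the saving on log writes outweighs the penalty on database reads; the actual crossover depends entirely on the measured device costs.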
Notice that database reads in AISP and the LSD are assigned unclustered and disk read costs respectively. If the database has never been updated and only read-only applications are executed over it, these mechanisms may be assigned the same database read patterns as DataSafe. Such workloads are not interesting in the context of this work, since they incur the same read costs under each mechanism. This thesis focuses on workloads under which there is a potential advantage in choosing one mechanism over another. Hence in MaStA it is assumed that update queries have already been executed against the database and that the original clustering of pages of data has been lost in the LSD and in AISP.
In BISP the original clustering is maintained, so database reads are either clustered or unclustered. Database writes may be performed in block order and so incur ordered costs. There are three costs involved in log writes. The first is writing before-images to shadow blocks in the log. Shadow blocks may be allocated contiguously and written sequentially. The second is writing the page map that records the locations of the shadow copies. These mappings must be written before an original block is overwritten, and they consist of ordered writes. The third is incurred after the updated pages have been written to the database and consists of re-writing the page map to discard the locations of the corresponding shadow pages. The cost of seeking to and from the page map is charged to the commit cost category. The other commit I/O costs are the same as for AISP.
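The ordering constraints on BISP writes described above can be made concrete with a short sketch. It lists the write operations for one batch of updated pages in the order the text requires. The function names and the `map_pages` parameter (the number of page map pages touched, assumed equal for both map writes) are hypothetical, and commit costs are left out.

```python
from collections import Counter


def bisp_update_io(updated_pages, map_pages):
    """Return (category, pattern, count) tuples in the order BISP issues them.

    Assumption: both page map writes touch the same number of map pages.
    """
    return [
        # 1. Before-images go to contiguous shadow blocks in the log.
        ("log_write", "sequential", updated_pages),
        # 2. The page map records shadow locations before any original
        #    block is overwritten.
        ("log_write", "ordered", map_pages),
        # 3. Updated pages overwrite the originals in block order.
        ("database_write", "ordered", updated_pages),
        # 4. The page map is rewritten to discard the shadow locations.
        ("log_write", "ordered", map_pages),
    ]


def totals(ops):
    """Sum operation counts per (category, pattern) pair."""
    counts = Counter()
    for category, pattern, n in ops:
        counts[(category, pattern)] += n
    return dict(counts)


ops = bisp_update_io(updated_pages=10, map_pages=2)
summary = totals(ops)
```

Keeping the operations as an ordered list, rather than only as totals, preserves the constraint that the first page map write precedes the database writes.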