3.2 Batching Modifications with Sequential Commit

3.2.1 Low-Latency Synchronous Logging

Logging an update request to disk synchronously and then asynchronously performing whatever operations the update triggers is a well-known technique. BOSC takes the same approach to deliver high sustained disk update throughput and the same durability guarantee as synchronous disk updates.

[Figure 3.1 diagram omitted. Labels shown: Block Interface (Modify, Read), BOSC, Disk Queue, Sequential I/O Thread, Data Logging, Data Commit, Low-Latency Disk Logging, Update-Aware Disk Access Interface, Logging Disk Blocks, Data Disk Blocks, Index Data Structure.]

Figure 3.1: BOSC associates with each disk block an in-memory update request queue. When BOSC receives a disk update request, it logs the request to disk, queues the request in the update request queue associated with its target disk block, and then performs the update operation only when the target disk block is brought into memory. BOSC fetches target disk blocks using sequential disk I/O.

In BOSC, each log record for a disk update request contains the following information:

• A copy of the data structure pointed to by ptr_modification,

• A global sequence number for the current disk update request,

• A back pointer to the disk location of the log record that immediately precedes this record in time,

• A global frontier, which corresponds to the global sequence number for the youngest disk update request before which all disk update requests have been committed to disk, and

• A local frontier, which corresponds to the global sequence number for the youngest disk update request before which all disk update requests to the target (local) disk block of the current disk update request have been committed to disk.
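The five fields listed above can be pictured as a small record type. The following is a minimal sketch, not BOSC's actual on-disk layout: the class name `LogRecord`, the field names, and the choice of Python types are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical in-memory view of one BOSC log record."""
    payload: bytes                 # copy of the data pointed to by the modification pointer
    seq_no: int                    # global sequence number of this update request
    prev_location: Optional[int]   # disk location of the preceding log record (None for the first)
    global_frontier: int           # system-wide frontier at the time of logging
    local_frontier: int            # frontier of this request's target disk block


# Example: the second record in a log, chained back to a record at location 4096.
record = LogRecord(payload=b"new-value", seq_no=2, prev_location=4096,
                   global_frontier=1, local_frontier=0)
```

The record is declared immutable because a log record, once prepared, is only written and later read back during recovery.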

Upon receiving a disk update request, BOSC increments the current global sequence number and assigns the result to the request. It then prepares the request's log record by extracting the data structure pointed to by ptr_modification and copying the global frontier, the local frontier associated with the specified target disk block, and the disk location of the last log record.

To illustrate how BOSC maintains the system-wide global frontier and the per-block local frontiers, consider the following update request sequence: 1(10), 2(15), 3(10), 4(2000), 5(30), 6(10), 7(15), where the numbers outside the parentheses are global sequence numbers of disk update requests and those inside are their target disk block numbers. Suppose a system failure occurs immediately after Request 7 is logged to disk, and at that point only the effects of Requests 1, 2, 3, 5, and 6 have been committed to disk. At that instant, the global frontier is 4, and the local frontiers for Blocks 10, 15, 30, and 2000 are 6, 2, 5, and 0, respectively. BOSC uses the global frontier to determine which log records can be recycled at run time and to reduce the number of log records that need to be examined at recovery time. The per-block local frontiers can further cut down the recovery processing load, as explained in Section 3.2.3.
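The frontier values in this example can be reproduced with a short sketch. The conventions below are inferred from the worked numbers and are assumptions, not a statement of BOSC's implementation: the global frontier is taken to be the smallest sequence number that is not yet committed, and a block's local frontier is taken to be the last committed request to that block before the first uncommitted one (0 if there is none). The function names are hypothetical.

```python
def global_frontier(requests, committed):
    """Smallest sequence number whose request has not been committed (assumed convention)."""
    for seq, _block in sorted(requests):
        if seq not in committed:
            return seq
    # Every request is committed: the frontier lies just past the last one.
    return max(seq for seq, _block in requests) + 1


def local_frontiers(requests, committed):
    """Per block: last committed request before the first uncommitted one, or 0 (assumed convention)."""
    frontiers = {}
    stalled = set()  # blocks that already hit an uncommitted request
    for seq, block in sorted(requests):
        frontiers.setdefault(block, 0)
        if block in stalled:
            continue
        if seq in committed:
            frontiers[block] = seq
        else:
            stalled.add(block)
    return frontiers


# The sequence from the text: (global sequence number, target block).
requests = [(1, 10), (2, 15), (3, 10), (4, 2000), (5, 30), (6, 10), (7, 15)]
committed = {1, 2, 3, 5, 6}

print(global_frontier(requests, committed))   # 4
print(local_frontiers(requests, committed))   # {10: 6, 15: 2, 2000: 0, 30: 5}
```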

Because the end-to-end throughput of BOSC is bounded by its synchronous disk logging performance, BOSC extends a low-latency disk logging technique [12] to be both low-latency and space-efficient, to support aggressive disk request batching, and to work on a commodity disk array [160]. The key idea in this low-latency logging technique is to write a disk block to wherever the disk head happens to be. More concretely, BOSC maintains a separate disk request queue for each disk in the log disk array. At any point in time, one of the log disks serves as the active disk. Initially, BOSC randomly chooses one of the log disks as the active disk. Once a log disk becomes the active disk, it remains the active disk until the waiting time of the oldest pending request in its queue exceeds a threshold, $T_{wait}$. Whenever a new disk write request arrives, BOSC inserts the request into the active disk's queue as long as the waiting time of its oldest pending request is smaller than $T_{wait}$ and there is enough free space in the current track to accommodate the new request; otherwise, BOSC dispatches the request batch currently in the active disk's queue, chooses another log disk as the active disk, and inserts the new request into its queue.
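The admission rule for the active disk's queue can be sketched as follows. This is a simplified illustration under stated assumptions: the names `LogDisk`, `submit`, and `choose_next` are hypothetical, request sizes and track space are measured in bytes, and the physical write of a dispatched batch is not modeled. How the next active disk is picked is passed in as a function, since that policy is described separately.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LogDisk:
    track_free: int  # free bytes left in the current track (assumed unit)
    queue: List[Tuple[float, int]] = field(default_factory=list)  # (arrival time, size)

    def oldest_wait(self, now: float) -> float:
        """Waiting time of the oldest pending request, 0 if the queue is empty."""
        return now - self.queue[0][0] if self.queue else 0.0

    def queued_bytes(self) -> int:
        return sum(size for _arrival, size in self.queue)


def submit(disks: List[LogDisk], active: int, size: int, now: float,
           t_wait: float, choose_next: Callable[[List[LogDisk], int], int]):
    """Queue one write request; return (index of active disk, batch dispatched or [])."""
    disk = disks[active]
    fits = disk.queued_bytes() + size <= disk.track_free
    if disk.oldest_wait(now) < t_wait and fits:
        disk.queue.append((now, size))      # keep batching on the active disk
        return active, []
    batch, disk.queue = disk.queue, []      # dispatch the current batch
    new_active = choose_next(disks, active)
    disks[new_active].queue.append((now, size))
    return new_active, batch


def round_robin(disks, active):
    """Placeholder selection policy, used only to make the sketch runnable."""
    return (active + 1) % len(disks)


disks = [LogDisk(track_free=100), LogDisk(track_free=100)]
active, batch = submit(disks, 0, 40, now=0.0, t_wait=5.0, choose_next=round_robin)
active, batch = submit(disks, active, 40, now=1.0, t_wait=5.0, choose_next=round_robin)
# The third request does not fit in the track, so the batch of two is dispatched.
active, batch = submit(disks, active, 40, now=2.0, t_wait=5.0, choose_next=round_robin)
```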

To select a new active disk for an incoming write request, BOSC computes the earliest time at which the write request could be written to each log disk, and selects the disk that can write the request the earliest. When computing a write request's write time on a log disk, BOSC takes into account the current position of the log disk's head and the possibility of batching the new request with others already in the disk's queue. For log disks that are currently idle, BOSC only needs to consider the delay due to batching. A key design decision in BOSC is to dispatch a new write request to the log disk that allows as many disk write requests as possible to be batched into one physical disk write operation, rather than to the one with the earliest write time for that request. However, BOSC uses $T_{wait}$ to limit the size of a batch and to ensure that the latency experienced by each incoming disk write request is always bounded.

Because of batching, multiple log records may be merged into a single physical disk request when they are written to disk. In addition, the actual disk location of each log record is known only at the last moment, i.e., right when the record is written to disk, and BOSC keeps track of this information accurately so that it can chain log records together through their back pointers.
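One way to picture this late binding of disk locations is the sketch below. It assumes, for illustration only, that the records of a batch are laid out contiguously starting at the location where the physical write lands, and that sizes and locations are counted in sectors. The function name `place_batch` is hypothetical.

```python
def place_batch(sizes, start_location, last_location):
    """Assign a disk location and a back pointer to each record in a batch.

    sizes          -- record sizes in sectors, in logging order
    start_location -- where the physical write begins (known only at write time)
    last_location  -- location of the most recently written log record, or None
    Returns (list of (location, back_pointer), location of the batch's last record).
    """
    placed = []
    location = start_location
    previous = last_location
    for size in sizes:
        placed.append((location, previous))  # back pointer = previous record's location
        previous = location
        location += size                     # next record follows contiguously (assumption)
    return placed, previous


# A batch of three records lands at sector 100; the previous record was at sector 40.
placed, last = place_batch([2, 3, 1], start_location=100, last_location=40)
print(placed)  # [(100, 40), (102, 100), (105, 102)]
print(last)    # 105
```

Carrying `last` into the next call keeps the chain unbroken across batches, even when consecutive batches land on different log disks.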

To track the log disks' head positions, BOSC statically extracts the physical disk geometry information from every log disk and continuously tracks each log disk's head position at run time. More concretely, after a physical disk write completes, BOSC records the LBA (Logical Block Address) of its last sector, $LBA_0$, and its completion timestamp, $T_0$. Assuming the disk head stays in the same track, when the next write arrives at $T_1$, BOSC estimates the disk head's current position, CurrentLBA, using the following formula:

\[
\mathit{CurrentLBA} = \mathit{SPT} \cdot \frac{(T_1 - T_0) \bmod \mathit{RoTime}}{\mathit{RoTime}} + \mathit{LBA}_0 \tag{3.1}
\]

where SPT is the number of sectors in the current track and RoTime is the disk's full rotation time. The final predicted position, DestinationLBA, is CurrentLBA + Lookahead, where Lookahead is an empirical value chosen to account for delays such as the controller delay and to avoid a full rotation delay caused by tracking errors. The detailed design and analysis of low-latency synchronous logging can be found in Chapter 4.
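As a worked illustration of Equation 3.1, the sketch below evaluates the prediction for made-up numbers. The function names, the millisecond time unit, and the truncation of the result to a whole sector are assumptions. The sketch follows the formula literally, so mapping the result back into the track's LBA range is not shown.

```python
def current_lba(lba0, t0, t1, spt, rot_time):
    """Estimate the head position per Equation 3.1 (truncated to a whole sector)."""
    fraction_of_rotation = ((t1 - t0) % rot_time) / rot_time
    return lba0 + int(spt * fraction_of_rotation)


def destination_lba(lba0, t0, t1, spt, rot_time, lookahead):
    """Predicted write position: estimated head position plus an empirical lookahead."""
    return current_lba(lba0, t0, t1, spt, rot_time) + lookahead


# Hypothetical disk: 500 sectors per track, 8 ms per rotation.
# The last write ended at LBA 1000 at t = 0 ms; the next write arrives at t = 10 ms.
# 10 mod 8 = 2 ms into the rotation, i.e. a quarter turn, or 125 sectors.
print(current_lba(1000, 0.0, 10.0, spt=500, rot_time=8.0))                    # 1125
print(destination_lba(1000, 0.0, 10.0, spt=500, rot_time=8.0, lookahead=20))  # 1145
```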