5.4.2 Logging disk, Data disk combinations

We measured both insert throughput and read throughput to get an overall picture of BOSC performance. As Figure 5.3 shows, performance is highest with 1 logging disk and 4 data disks.

[Figure 5.3 plot: insert throughput (records/second) and read throughput (records/second) against the ratio of logging disks to data disks. Plot title: Logging disk, data disk variations in BOSC.]

Figure 5.3: Insertion and read throughput for various logging disk/data disk variations

Insert throughput increases with the number of data disks: with more data disks we can do more data splicing, and when the I/O thread commits records, disk I/O proceeds on multiple data disks in parallel. Adding logging disks likewise brings more parallelism and raises logging throughput, but logging is already very fast, with a latency under 1 ms. The main bottleneck in BOSC is the I/O that commits records to disk. Hence, with a fixed number of disks, it is preferable to dedicate more of them to data than to logging.

One comparison runs against this trend. Between 2 logging disks with 3 data disks and 3 logging disks with 2 data disks, the latter shows better insert throughput despite having fewer data disks. This is an anomaly, and it arises because the low read throughput in that configuration gives the I/O thread more time to commit records to disk. By the time the read thread finishes and the insertion thread starts, the per-block queue is almost empty, so the insertion thread does not block for want of memory and insert throughput appears high.
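The effect of adding data disks on a commit cycle can be illustrated with a small sketch. It assumes that leaf blocks are placed on data disks by block number modulo the disk count. This placement rule and the function names are illustrative assumptions, not BOSC's documented design. The point is only that the longest per-disk queue, which bounds the commit time when disks work in parallel, shrinks as data disks are added.

```python
from collections import defaultdict


def split_by_disk(block_ids, num_data_disks):
    """Group pending leaf blocks by the data disk that would hold them.

    Placement by `block_id % num_data_disks` is an assumption made for
    illustration only.
    """
    per_disk = defaultdict(list)
    for block_id in block_ids:
        per_disk[block_id % num_data_disks].append(block_id)
    return dict(per_disk)


def longest_queue(block_ids, num_data_disks):
    """Length of the longest per-disk queue in one commit cycle.

    If disks commit in parallel, this is a rough proxy for cycle time.
    """
    per_disk = split_by_disk(block_ids, num_data_disks)
    return max((len(blocks) for blocks in per_disk.values()), default=0)


if __name__ == "__main__":
    pending = list(range(12))  # twelve pending leaf blocks (made-up input)
    for disks in (1, 2, 3, 4):
        print(disks, "data disks -> longest queue", longest_queue(pending, disks))
```

Under this simplified model the same batch of pending blocks yields a shorter critical path on 4 data disks than on 2, which matches the direction of the trend described above but says nothing about its measured size.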

Read throughput also drops as the number of data disks decreases. In most cases the drop is slight and is caused by a lower cache hit percentage, which we measure as the number of read requests that find their record in an in-memory per-block queue, out of the total number of read requests issued. The read performance and the drop in cache hits are not straightforward to explain, because we cannot predict when the I/O thread commits the records in a per-block queue: the TPCC trace may request a record that the I/O thread has already committed to disk, and this can happen whether the I/O thread is fast or slow. Overall, however, BOSC read performance does not deteriorate much compared with a vanilla B+ tree.

For the last 2 cases, read throughput is very low because there are very few data disks. With very few data disks, the insertion thread operates very slowly, and the entire B+ tree operation slows down with it. The observed cache hit percentage is 0.11% when data disks are too few, against around 0.4% in the other cases. In another set of experiments, we will see how these cache hit percentages change with memory size. In the normal case, read throughput is around 50 records/second, but with more logging disks and fewer data disks it falls to as low as 28-30 records/second. This is because the cache hit ratio is very poor and the read thread contends with the I/O thread, which is processing records very slowly. The I/O thread and the read thread frequently contend for the global B+ tree lock, so lock contention rises sharply when read processing is slow.
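The cache hit percentage described in this section (reads served from an in-memory per-block queue, divided by all reads issued) can be sketched as a pair of counters. The class and method names below are hypothetical and are not taken from the BOSC implementation.

```python
from dataclasses import dataclass


@dataclass
class ReadStats:
    """Read counters; names are hypothetical, for illustration only."""
    queue_hits: int = 0   # reads that found the record in a per-block queue
    total_reads: int = 0  # all read requests issued

    def record_read(self, found_in_queue: bool) -> None:
        """Count one read request and whether the in-memory queue served it."""
        self.total_reads += 1
        if found_in_queue:
            self.queue_hits += 1

    def hit_percentage(self) -> float:
        """Hits as a percentage of all reads; 0.0 when no reads were issued."""
        if self.total_reads == 0:
            return 0.0
        return 100.0 * self.queue_hits / self.total_reads


if __name__ == "__main__":
    stats = ReadStats()
    for i in range(1000):
        stats.record_read(found_in_queue=(i < 4))  # made-up input: 4 hits in 1000
    print(f"{stats.hit_percentage():.2f}%")
```

With 4 hits in 1000 reads the sketch reports a hit percentage of 0.4%, the same order as the figures quoted above. The input here is made up and is not measurement data.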

5.4.3 Overall Performance Improvement on B+ Tree

[Figure 5.4 plot: throughput (records per second) against buffer memory (MB) for two series, B+ tree using BOSC and original B+ tree.]

Figure 5.4: BOSC vs. vanilla for the random insert workload. Comparison between the record insert throughput of a BOSC-based B+ tree implementation and a vanilla B+ tree implementation based on the conventional disk read/write interface under the random insert workload, when the total amount of buffer memory is varied from 256 MB to 1.5 GB. The leaf block size is 64 KB, the record size is 64 B and the initial index size is 16 GB.

Figures 5.4 and 5.5 show the throughputs of a vanilla B+ tree implementation on a conventional disk read/write interface (with 5 data disks) and a BOSC-based B+ tree implementation under the random insert and random update workloads, respectively. The throughput of the vanilla B+ tree implementation increases only slightly with buffer memory, because the poor locality of the random insert workload leaves little room for leaf node caching to be effective. In contrast, the throughput of the BOSC-based B+ tree implementation keeps improving as buffer memory grows, because more pending insertion requests can be accumulated in each sequential commit cycle. This improvement saturates at 1024 MB, because beyond that point the buffer memory exceeds the product of the new record insertion rate and the sequential commit cycle length. With 1024 MB of buffer memory, the sustained throughput of the BOSC-based B+ tree implementation under the random insert workload reaches around 6410 records/second, about 20 times that of the vanilla B+ tree implementation using the conventional disk read/write interface (311 records/second). When buffer memory is not the performance bottleneck, the throughput of the BOSC-based B+ tree implementation is bound mainly by the physical disk I/O efficiency of the sequential commit process.
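The two quantities in this paragraph, the speedup over the vanilla B+ tree and the buffer size at which throughput stops improving, reduce to simple arithmetic. The sketch below uses the reported throughputs. The commit cycle length is not given in the text, so `cycle_seconds` is a free parameter, and treating each pending request as occupying exactly one record's worth of buffer is a simplifying assumption.

```python
def speedup(bosc_rate: float, vanilla_rate: float) -> float:
    """Ratio of BOSC throughput to vanilla throughput."""
    return bosc_rate / vanilla_rate


def saturation_buffer_bytes(insert_rate: float,
                            cycle_seconds: float,
                            bytes_per_pending_record: int) -> float:
    """Buffer needed to hold every request that arrives in one commit cycle.

    Once the buffer exceeds this product, extra memory cannot accumulate
    more pending requests per cycle. Assumes each pending request occupies
    `bytes_per_pending_record` bytes, which ignores any queue overhead.
    """
    return insert_rate * cycle_seconds * bytes_per_pending_record


if __name__ == "__main__":
    # Reported figures: 6410 vs. 311 records/second.
    print(f"speedup: {speedup(6410, 311):.1f}x")
    # cycle_seconds below is a placeholder value, not a measured one.
    needed = saturation_buffer_bytes(6410, cycle_seconds=60,
                                     bytes_per_pending_record=64)
    print(f"buffer for a 60 s cycle: {needed / 2**20:.1f} MiB")
```

The first function reproduces the roughly 20-fold improvement quoted above. The second only illustrates the form of the saturation condition, and its output depends entirely on the assumed cycle length.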

In document Efficient Implementation Techniques for Block-Level Cloud Storage Systems (Page 131-133)