Architecture of Relational DBMS - Flash-aware Database Management Systems

2. Background

2.1.2. Architecture of Relational DBMS

Figure 2.1 presents the module-based architecture of the classical relational DBMS. Despite the variety of today’s database systems, their design and implementation differences, the basic functional components are present in all of them. Each of those is complex and consists of multiple sub-modules. Although the detailed description of the DBMS architecture and the interplay of its modules are well described and can be easily found in classical database textbooks ([80], [22], [38]), we decided to provide here a brief recap for the purpose of completeness.

Storage

Data / Index Files Catalog / Metadata R e c o v e ry DBMS

Query evaluation engine Files and access methods

Buffer manager Storage manager C o n c u rr e n c y c o n tr o l SQL Database

Figure 2.1.: Module-based architecture of the relational DBMS (based on [80]).

Processing of incoming SQL queries begins with the Query Processor. The key compe- tence of this module is an establishment of the query plans and control of their execution.

In order to find an optimal execution strategy the DBMS utilizes various metadata, such as catalog information, access statistics, histograms, runtime data, etc. The query plan decisions are made not only regarding the kind of the operators, but also about their execution order. The emerging computational and storage hardware with high level of supported parallelism force the DBMS designers to revisit traditional operators and data structures in order to increase intra-query and intra-operator parallelism. Thus, the efficient query optimizer must be able to recognize possible ways of parallelizing the query execution based on the available hardware resources. Flash storage is the perfect example of hardware development which offers the possibility to significantly speedup the execution of major query operators. Read/write asymmetry and I/O parallelism are the main properties of Flash SSDs allowing these improvements. Further, the rich on-device computational resources of modern SSDs allow for execution of query operators on the storage device itself, increasing thereby the computational parallelism, and saving data transmission costs, as well as reducing CPU cache and buffer pool pollution of the host system. Those optimizations require, however, a re-design of large parts of the logic of DBMS’s modules, as well as an additional support from the storage device.

Once the query plan is defined and the query processor starts with the execution of operators (e.g., selection, projection, joins, modification, aggregation, etc.) the second layer of the DBMS comes into the play - Files and Access Methods. This layer is responsible for the organization of data in logical database structures such as files and indexes, as well as internal organization of data records within those structures. It provides, therefore, interfaces for the query operators to locate, access and modify the data. The standard data organization layouts supported by the majority of DBMSs today are heap, sorted and hashed files, as well as B-Tree indexes. Bitmap, UB-Tree or R-Tree indexes are further alternatives supported by some database systems.

The key challenge of introducing a novel data structure or access method is the efficient support of concurrency and recovery. In recent years the database community actively looked into the data access layer in order to better address the properties of Flash SSDs. Prioritization of reads for writes, parallel access and the out-of-place update strategies are common ideas underlying the proposed novel or modified data structures. In this work we touch this layer multiple times while implementing the NoFTL architecture. Especially the approach of In-Place Appends (IPA) (see Chapter 7) required modifications in page layout for table and index data.

The next essential module of the DBMS is the Buffer Manager. Once the data requested by the operators is being located using appropriate access routines, the Buffer Manager is asked to provide access to the corresponding database page. If the look-up in the internal hash table was successful, the reference to the buffer frame containing this page is returned. Otherwise (i.e., cache miss), the Buffer Manager issues an I/O request to

the underlying Storage Manager to read the page from the storage and place it into the reserved free buffer frame. In the meanwhile, the replacement policy is responsible for the decisions regarding when and what data should be removed from the buffer pool. Thereby, unmodified data is simply evicted from the cache, while modified pages are written back (flushed) to the storage. The flushing API of the Buffer Manager is also used by other database modules, such as Recovery and Transaction Managers, for guaranteeing atomicity and durability properties of the DBMS.

The influence of the buffer’s replacement strategy on the database performance corre- lates with system’s dependency on the I/O stack. In systems where the working set cannot be cached completely in the buffer, the improper replacement strategy can easily become a significant bottleneck. Especially, if the workload mixes large table scans with small OLTP- like accesses, an intelligent Buffer Manager is vitally important. The efficient replacement strategy must take into account not only the simple statistics about frequency (e.g., LFU) or recency (e.g., LRU) of page accesses, but also the information about the workload, the query plan and the current operator causing those accesses, as well as the type of database pages (e.g., index or table pages). Also, the type and characteristics of storage have an influence on the Buffer Manager. The clear latency asymmetry of read and write requests of the Flash storage (e.g., 10x and more) encourage to revisit the replacement strategy. The recent approaches addressing those issues are covered in Chapter 9. In this work we have looked at the Buffer Manager considering yet another important property of Flash SSDs - parallelism. We do not change the way the replacement strategy decides about what and when to write out back to storage, but rather how the selected pages are being flushed. We show that (i) by utilizing information about the internal architecture of the Flash storage, and (ii) having a control of the physical data placement on Flash - both of which have first become available in the NoFTL architecture - the DBMS can speed-up the flushing process by about 50%. This is achieved by better utilization of a device’s parallelism and reducing the contention for physical resources between DBMS flusher threads (page cleaners). We did further modifications to the Buffer Manager module in connection with the IPA approach. The detailed description of those approaches and the required modifications are presented in Chapters 5.4 and 7, respectively.

The nearest to the physical storage is the Storage Manager module of the DBMS. It is responsible for the space management on the storage, which includes among others storage access routines, allocation and deallocation of space for database structures, and mapping of those structures to physical storage addresses (or files and their offsets in case of using a file system). This module was getting the lowest attention from the DBMS research community and is, therefore, the least modified one over the last couple of decades. The reason for this is an assumption of operating on storage devices which support the block device interface. Having only a hard disk drive as a reasonable storage

for more than three decades made this assumption an indisputable rule. Flash as a viable storage option was made possible mainly due to the support of a block device interface, i.e., SSDs were made backwards compatible with the traditional HDDs. In contrast to this, the NoFTL architecture removes the backwards compatibility of Flash storage, gives the DBMS full and direct control over the Flash by means of the novel native Flash interface. As a consequence, the Storage Manager of the DBMS has been rewritten to support FTL-less Flash SSD. More information about the changes in this module are provided in Chapters 4 and 5.

The next important module in the simplified DBMS architecture is the Transaction Manager. Its main task is to provide concurrency control. The efficient support for parallel execution of transactions is the most critical and important functionality of modern DBMSs. Not surprising is, therefore, the attention to this module from industry and academia over the last decade, motivated by the continuously growing gap between hardware parallelism and concurrency level of the DBMS. In fact, the majority of today’s database systems utilize traditional concurrency models, which often are going back to the pioneer systems in the late 80s. The typical mechanisms to support different levels of transaction’s isolation are locking, multi-versioning and tracking of transactional access history (e.g., in an optimistic concurrency control model). Although, compared to other modules, this module is only weakly coupled to the characteristics of physical storage, careful consideration of Flash specifics allows us to optimize traditional approaches. For instance, the implementation of multi-version concurrency control can be significantly improved by considering the out-of-place update principle of Flash SSD. More details about recent research in this area are provided in Chapter 9.

The last module in our simplified architecture of the DBMS is the Recovery Manager, which is, however, definitely not less important than others. Although, its core compe- tence is to guarantee the durability of data after failures or system crashes ("D" in ACID properties of RDBMS transactions), it typically assists also the Transaction Manager to realize atomicity and isolation properties of a transaction’s execution ("A" and "I" in ACID). The common principle for all systems to provide the recovery functionality is based on maintaining a certain level of data redundancy. Database systems realize this via logging. Logs are small records describing a single modification of data item(s), which can be used to recover the corresponding data in case of losses or data corruption. The majority of today’s relational database systems are implementing a kind of ARIES-like recovery. Its main principle is based on persisting the log records before writing the corresponding modified data items to the stable storage, the so-called Write-Ahead Logging (WAL). Flash storage with its "native" support of data redundancy caused by the out-of-place update strategy allows for significant simplification of these traditional recovery mechanisms. In other words, on writing a modified database page out of the buffer pool to the Flash

SSD, there is always a previous unmodified version of the page present on the storage. This "backup" page can be used for recovery purposes. However, modern FTL-based SSDs hide completely the out-of-place update behavior from the host system, making thereby this "free" recovery support of Flash being unaccessible for the DBMS. In contrast, under the NoFTL architecture the DBMS performs direct physical data placement, and thus has full control over the out-of-place update strategy, which allows us to use it for recovery optimizations. We discuss similar approaches proposed in recent years, as well as our ideas in Chapter 9. Our modifications to the Recovery Manager required for the basic realization of NoFTL are provided in Chapter 5.5.

Apart from the mentioned six modules, database systems must implement also other important functionalities, which might be realized as separate modules or be a part of the above. Those include authorization, metadata management, memory management, administration, etc. Those modules are typically completely independent of the underlying storage infrastructure and are, therefore, out of the scope of this work. In our NoFTL prototype implemented on top of an open-source storage engine they are left unchanged.

In document Flash-aware Database Management Systems (Page 40-44)