9. Related Work
9.2. Native Flash Storage
This group of related work examines the disadvantages and limitations of modern Flash SSDs in more depth and seeks their root causes.
LightNVM. Bjørling et al. discuss in [9] the unsuitability of the block device interface for Flash SSDs. Four years later, in [10], they described their solution: the open-channel SSD subsystem called LightNVM. To overcome the aforementioned disadvantages of today’s SSDs, they propose shifting Flash management to the host. The physical page address (PPA) I/O interface of LightNVM resembles our native Flash interface (NFI), as both allow the application to execute read, write and erase commands on physical Flash addresses. However, NFI, which was first introduced in [30], is not limited to those commands. Our current prototype of the interface offers the DBMS further commands, such as copyback, write_delta and get_addr_table, which significantly reduce the GC overhead and the amount of data transferred to and from storage. Furthermore, NFI contains a set of commands that enable near-data processing by the DBMS under the NoFTL architecture (see Chapter 4).
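To illustrate why a command such as copyback reduces the amount of transferred data, the following minimal sketch models a native Flash device in memory. It is not the NFI prototype: the class `NativeFlashSketch`, the helper `relocate_valid_pages`, the `(block, page)` addressing and the traffic counter are hypothetical, and the copyback semantics (an on-device page copy that does not pass through the host) are assumed from the usual NAND copy-back operation.

```python
class NativeFlashSketch:
    """Toy in-memory model of a Flash device addressed by (block, page)."""

    def __init__(self, blocks=4, pages_per_block=4):
        # None marks an erased page, i.e. a page that may be written.
        self.pages = [[None] * pages_per_block for _ in range(blocks)]
        self.bytes_transferred = 0  # host <-> device traffic (assumed metric)

    def write(self, block, page, data):
        if self.pages[block][page] is not None:
            raise ValueError("page must be erased before it is written again")
        self.pages[block][page] = bytes(data)
        self.bytes_transferred += len(data)

    def read(self, block, page):
        data = self.pages[block][page]
        if data is None:
            raise ValueError("page is erased")
        self.bytes_transferred += len(data)
        return data

    def erase(self, block):
        # Erase works on whole blocks only.
        self.pages[block] = [None] * len(self.pages[block])

    def copyback(self, src, dst):
        # Assumed semantics: the page is copied inside the device,
        # so no data crosses the host interface.
        (src_block, src_page), (dst_block, dst_page) = src, dst
        data = self.pages[src_block][src_page]
        if data is None:
            raise ValueError("source page is erased")
        if self.pages[dst_block][dst_page] is not None:
            raise ValueError("destination page is not erased")
        self.pages[dst_block][dst_page] = data


def relocate_valid_pages(dev, victim, target, valid_pages):
    """Garbage-collection step: move valid pages on-device, then erase."""
    for new_page, old_page in enumerate(valid_pages):
        dev.copyback((victim, old_page), (target, new_page))
    dev.erase(victim)


dev = NativeFlashSketch()
dev.write(0, 0, b"tuple-A")
dev.write(0, 1, b"stale")
dev.write(0, 2, b"tuple-B")
traffic_before = dev.bytes_transferred
relocate_valid_pages(dev, victim=0, target=1, valid_pages=[0, 2])
```

In this toy model the garbage-collection step leaves `bytes_transferred` unchanged, whereas a host that only had read and write would have to move both valid pages across the interface twice.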
Although [10] describes the LightNVM subsystem and the PPA I/O interface, it does not show how the DBMS can utilize them to increase its performance. In this sense, the NoFTL approach can be seen as what the authors of [10] described as "future work" with regard to "tuning SQL databases for performance on open channel SSDs" and "characterizing the potential of application specific FTLs with open-channel SSDs". NoFTL performs this "tuning" of the database system. It shows how Flash management can be effectively integrated into the DBMS subsystems, and how a DBMS can efficiently manage physical Flash space by means of novel storage structures. Regions, for instance, allow us to control the available I/O parallelism of Flash storage and to apply different Flash management algorithms simultaneously, depending on the characteristics of database objects, while groups can further reduce the overhead of garbage collection (see Chapter 6.1).
BlueDBM. Jun et al. described in [43] the architecture of BlueDBM, a distributed Flash storage system. It is a cluster of nodes, each containing a Flash storage device that the host accesses as raw Flash memory. For this purpose, the Flash controller provides an interface that is similar to NFI (see Chapter 4) and to PPA I/O in [10], as it defines read, write and erase commands on physical addresses. However, in contrast to the NoFTL architecture, the responsibility for Flash management is shifted to the file system (the log-structured, flash-aware file system RFS [53]). Although removing the on-device FTL is an important step in optimizing Flash storage, we argue that it is more beneficial to integrate Flash management into the DBMS and thus let the DBMS control the physical placement on Flash. Only the DBMS has comprehensive knowledge about the data, which allows it to optimize both the Flash management and the traditional DBMS algorithms (see Chapters 5, 6).
Another important concept of BlueDBM is that every storage device is equipped with a hardware accelerator, which can perform in-storage processing. However, queries for the on-device FPGA must be defined on physical addresses. Before sending such a query to storage, the application (DBMS) must therefore consult the file system about the physical address of each logical address touched by the query. These calls to the file system produce extra overhead, which is not present in the NoFTL architecture.
CORFU. Balakrishnan et al. presented in [6] the design of a shared log called CORFU. The log is implemented on top of a cluster of Flash SSDs. Distributing the log over multiple SSDs allows the I/O bandwidth of the system to scale efficiently. Every client in the system maintains a replica of a small map (called projection), which maps log positions to addresses on the Flash SSDs in the cluster. To read from a certain log position, a client consults this map and then issues I/O requests to the relevant SSDs.
To append to the log, the client first requests the position from a separate central node (called sequencer). The sequencer returns the synchronized last log position (log tail), which is then used by the client to write the log entry.
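The read and append paths described above can be sketched as follows. This is a simplified single-process illustration and not the CORFU implementation: the class names, the round-robin projection (position modulo number of units) and the absence of replication and failure handling are assumptions made for the example.

```python
class Sequencer:
    """Central node that hands out the next free log position (log tail)."""

    def __init__(self):
        self.tail = 0

    def next_position(self):
        position = self.tail
        self.tail += 1
        return position


class FlashUnit:
    """Stand-in for one SSD in the cluster; each address is written once."""

    def __init__(self):
        self.cells = {}

    def write(self, address, entry):
        if address in self.cells:
            raise ValueError("address already written")
        self.cells[address] = entry

    def read(self, address):
        return self.cells[address]


class LogClientSketch:
    """Client holding its own copy of the projection (here: a fixed rule)."""

    def __init__(self, sequencer, units):
        self.sequencer = sequencer
        self.units = units

    def project(self, position):
        # Assumed projection: stripe log positions round-robin over units.
        return position % len(self.units), position // len(self.units)

    def append(self, entry):
        position = self.sequencer.next_position()  # ask the sequencer first
        unit, address = self.project(position)     # then map via projection
        self.units[unit].write(address, entry)     # then write to the SSD
        return position

    def read(self, position):
        unit, address = self.project(position)
        return self.units[unit].read(address)


sequencer = Sequencer()
units = [FlashUnit() for _ in range(3)]
client_a = LogClientSketch(sequencer, units)
client_b = LogClientSketch(sequencer, units)

positions = [client_a.append(b"x"), client_b.append(b"y"),
             client_a.append(b"z"), client_b.append(b"w")]
```

Because both clients obtain positions from the same sequencer, their appends land on distinct positions, and either client can read any entry by consulting only its local projection.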
Although the abbreviation CORFU stands for Cluster of Raw Flash Units, CORFU SSDs are in general not FTL-less, which is what we understand by raw Flash. CORFU SSDs differ from conventional SSDs in two respects. First, they provide write-once semantics to the application, exposed through an API consisting of append, read, fill and trim commands. Second, these SSDs offer an infinite address space to the host. The latter requirement results in a special FTL scheme, which is based on a page-level mapping organized as a hash table (needed to map an infinite logical address space to a limited physical one). This, however, might become an issue, as maintaining a pure page-level mapping is typically impractical for conventional SSDs due to limited on-device DRAM resources, while a partially cached page-level mapping (e.g., DFTL [28]) results in significant I/O overhead (see Table 5.3 in Chapter 5.6). Similar to conventional SSDs, CORFU SSDs also include garbage collection, bad-block management, and most probably wear-leveling in their FTL scheme. CORFU therefore does not change the black-box design of SSDs and thus does not address their drawbacks.
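A minimal sketch of such a hash-table mapping is given below, assuming a Python dictionary as the hash table and a simple free list of physical pages. The class `WriteOnceUnitSketch` and its method names are hypothetical and do not reproduce the CORFU device API; garbage collection, bad-block management and wear-leveling are omitted.

```python
class WriteOnceUnitSketch:
    """Maps a sparse, unbounded logical address space to few physical pages."""

    def __init__(self, physical_pages):
        self.flash = [None] * physical_pages
        self.free = list(range(physical_pages))  # free physical pages
        self.mapping = {}    # hash table: logical address -> physical page
        self.trimmed = set() # kept as a plain set for simplicity

    def write(self, logical, data):
        if logical in self.mapping or logical in self.trimmed:
            raise ValueError("logical address was already written")
        if not self.free:
            raise RuntimeError("no free physical page; trim is required")
        physical = self.free.pop()
        self.flash[physical] = data
        self.mapping[logical] = physical

    def read(self, logical):
        return self.flash[self.mapping[logical]]

    def trim(self, logical):
        # Release the physical page; the logical address stays used up.
        physical = self.mapping.pop(logical)
        self.flash[physical] = None
        self.free.append(physical)
        self.trimmed.add(logical)


unit = WriteOnceUnitSketch(physical_pages=2)
unit.write(10**12, b"entry-1")   # arbitrarily large logical address
unit.write(7, b"entry-2")
unit.trim(7)
unit.write(10**15, b"entry-3")   # reuses the page released by trim
```

The table holds one entry per live logical address, so its size grows with the number of written pages; this is the memory cost that the paragraph above refers to.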
File Systems for Flash. Several file systems have been proposed for Flash-based storage.
Some of them, such as [67], [49], operate on conventional FTL-based Flash SSDs. By addressing the main properties of Flash (e.g., asymmetric latencies of I/O operations, out-of-place update strategy), they can reduce the overhead caused by traditional file systems (see Chapter 5.6). However, these file systems do not solve the problems and limitations that result from the black-box design of modern SSDs.
Another group of proposed file systems targets raw (native) Flash storage. Examples include [107], [57], [44], [76], [110], [108]. These file systems typically apply a log-structured design and assume responsibility for all or most Flash management tasks (e.g., garbage collection, wear-leveling). Exposing native Flash to the host and integrating Flash management into the file system allows them to solve multiple problems of conventional SSDs. In particular, (i) redundant functionality between the file system and the FTL is removed (e.g., garbage collection, mapping); (ii) journaling support does not require additional I/O overhead; and (iii) the richer resources of host systems allow them to use fine-grained address translation tables.
However, all these approaches share one important limitation: for database systems, the Flash storage remains a black box. As we have shown in this work, much more significant improvements and optimizations can be achieved by giving the DBMS (and not the file system) full control over the Flash storage. Only the DBMS has comprehensive knowledge about the data and the workload, which is necessary to manage Flash storage effectively. Based on this knowledge, the DBMS can apply multiple Flash management schemes simultaneously, depending on the properties of the data (see Chapter 6). This solves the so-called "one size fits all" problem, which is typical not only of all modern SSDs but also of all Flash-aware file systems. Moreover, the DBMS can better control the available Flash parallelism (see Chapters 5 and 6.1) and further eliminate redundancies along the I/O path.