System Model for Persistence

The fundamental challenge for PTM is to ensure that program data is in a recoverable state at all times. That is, if the system encounters a failure, then after the failure is addressed and the system is restarted, the program’s data must still be valid. At a high level, a transactional model ensures this property by executing atomic transactions that appear to happen all at once or not at all. However, the implementation of persistent transactions depends on the following factors:

5.2.1 Hardware Support for Persistence

Marathe et al. [77] describe three hardware persistence domains. The simplest (persistence domain 0, or PDOM-0) contains only the NVM RAM modules themselves. PDOM-1 adds the memory controller. PDOM-2 adds the entire CPU state, including caches and registers. As the persistence domain expands, it becomes easier to ensure a recoverable state. For example, if a power failure occurs in a PDOM-2 system, then when the machine is powered back on, it can resume immediately, with no loss of data. In PDOM-1, memory buffers are flushed to the DIMMs on power failure. As a result, programmers must ensure that data reaches the buffers in the correct order, through the use of clwb instructions, which cause a cache line to be written back, and sfence instructions, which order a clwb with respect to subsequent stores. Finally, in PDOM-0, only the DIMMs are persistent, so additional instructions (e.g., pcommit) must run after all clwbs to move data from the memory controller to the DIMMs. Current and upcoming Intel systems provide PDOM-1, and instructions related to PDOM-0 have been deprecated [58].

In Intel’s PDOM-1 systems, a failure that occurs in the middle of a transaction requires care to recover correctly: when the system recovers, the program counters at the time of the failure are unknown. As a result, persistent transactions must either (a) use undo logs to record all overwritten values, so that they can roll back a transaction that is interrupted, or (b) use redo logs to record all to-be-updated values, so that they can roll forward a transaction after it is guaranteed to complete. The contents of either log must be stored in persistent memory, and log updates must be ordered carefully with respect to accesses to program data. Consequently, any transaction, even one that is not concurrent, requires instrumentation on every load and store of persistent memory.
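The ordering requirement for undo logging, option (a), can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this chapter. The functions pwb and pfence are hypothetical stand-ins for clwb and sfence; here they only count calls, so that the sketch runs on any machine. The undo_log layout, LOG_CAPACITY, and the tx_* names are likewise invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: on PDOM-1 hardware, pwb() would issue clwb on the
 * cache line holding addr and pfence() would issue sfence. In this sketch
 * they only count calls. */
static unsigned long pwb_count, pfence_count;
static void pwb(const void *addr) { (void)addr; pwb_count++; }
static void pfence(void) { pfence_count++; }

#define LOG_CAPACITY 16

/* Undo log; assumed to reside in persistent memory next to program data. */
struct undo_entry { uint64_t *addr; uint64_t old_value; };
struct undo_log {
    size_t count;                            /* number of valid entries */
    struct undo_entry entries[LOG_CAPACITY];
};

/* Instrumented store: persist the old value first, then update in place. */
static void tx_store(struct undo_log *ulog, uint64_t *addr, uint64_t value)
{
    assert(ulog->count < LOG_CAPACITY);
    struct undo_entry *e = &ulog->entries[ulog->count];
    e->addr = addr;
    e->old_value = *addr;
    pwb(e);
    pfence();            /* entry is written back before it is marked valid */
    ulog->count++;
    pwb(&ulog->count);
    pfence();            /* log is written back before data is overwritten */
    *addr = value;
    pwb(addr);
}

/* Commit: order all data write-backs before the log is discarded. */
static void tx_commit(struct undo_log *ulog)
{
    pfence();
    ulog->count = 0;
    pwb(&ulog->count);
    pfence();
}

/* Recovery: a non-empty log means a transaction was interrupted, so the
 * old values are restored in reverse order. */
static void tx_recover(struct undo_log *ulog)
{
    while (ulog->count > 0) {
        struct undo_entry *e = &ulog->entries[ulog->count - 1];
        *e->addr = e->old_value;
        pwb(e->addr);
        pfence();        /* restored value is written back before the pop */
        ulog->count--;
        pwb(&ulog->count);
        pfence();
    }
}
```

Calling tx_recover on a log that was emptied by tx_commit does nothing, while calling it after stores that never committed restores every overwritten word. A redo log would invert the order: new values go to the log first, and program data is written only after the log is marked complete.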

Benchmark           Overhead
TPCC-B+Tree         5.15%
TATP                2.67%
TATP (1Kops/tx)     5.1%

Table 5.1: Overhead of self-referential pointers

5.2.2 Position-Independence

Typically, a persistent region is created by mapping a named, contiguous range of physical addresses in NVM into a program’s virtual address space via mmap [17]. When a program restarts and reloads such a region, its virtual-to-physical mappings may change. The system may therefore require that a data structure stored in NVM use position-independent pointers. These can consist either of two machine words (to represent a file ID and an offset within the file) [59] or of a single machine word that represents an offset relative to the location of the pointer itself (e.g., for a pointer at 0xAA00 to refer to a word at 0xAAF0, it would store the value 0xF0). Position-independence simplifies recovery: when a program restarts, it can load the persistent region and use it immediately. Without position-independence, it is necessary to walk the entire persistent region and rewrite pointers. Note that rapid recovery requires both position-independent pointers and a persistent allocator.
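The single-word, self-relative form can be sketched in C as follows. The type rel_ptr, the helpers rel_set and rel_get, and the node layout are hypothetical names chosen for illustration. Reserving offset 0 to mean null is an assumption of this sketch, and it implies that a pointer cannot refer to itself.

```c
#include <assert.h>
#include <stdint.h>

/* A one-word pointer that stores the distance from its own address to its
 * target. Offset 0 is reserved for null (an assumption of this sketch). */
typedef struct { intptr_t offset; } rel_ptr;

/* Example node that links to its successor through a self-relative pointer. */
struct node { rel_ptr next; int value; };

static void rel_set(rel_ptr *p, const void *target)
{
    assert(target != (const void *)p);   /* offset 0 is reserved for null */
    p->offset = target ? (intptr_t)((uintptr_t)target - (uintptr_t)p) : 0;
}

static void *rel_get(const rel_ptr *p)
{
    /* Unsigned arithmetic handles negative offsets (targets below p). */
    return p->offset ? (void *)((uintptr_t)p + (uintptr_t)p->offset)
                     : (void *)0;
}
```

With this encoding, a pointer located at 0xAA00 that refers to 0xAAF0 holds 0xF0, matching the example above. Copying a whole array of nodes to a different address leaves every stored offset valid, which is the property that lets a remapped region be used without a pointer-rewriting pass.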

Table 5.1 shows the increase in latency that position-independent pointers introduce in a non-persistent program. The experiment was conducted by using our transactional instrumentation (discussed in Section 5.3) to dynamically treat each pointer in the benchmark as a self-referential pointer. As such, this experiment underestimates the true cost of position-independence, as it does not consider the additional clwb and sfence instructions that a persistent allocator would introduce. Given that persistent regions are rarely loaded, this cost seems excessive, and thus we focus on non-position-independent pointers.

5.2.3 Hardware Memory Diversity

In systems with NVM RAM, it is also possible to have traditional DRAM. One example is Intel Apache Pass. In these systems, a fundamental question is whether a persistent transaction is allowed to read and write to the DRAM, or only the persistent RAM. It may be difficult to statically enforce a requirement that transactions operate exclusively on one type of memory or the other. The distinction is important, because a persistent memory region is typically allocated with mmap, and deallocated with munmap. An allocator that runs within the region may create and reclaim ranges of memory within the region, but it cannot return portions of the region to the operating system. In contrast, allocators for DRAM can return individual pages of virtual memory to the operating system when they are no longer in use.

STM literature establishes that when transactions are able to execute speculatively, then a transaction that frees memory cannot simply defer the call to free until after the transaction commits: freeing might return a page to the operating system, and a concurrent transaction (which is destined to abort) may be in the process of accessing a location on that page. If the freeing thread does not wait for all concurrent transactions to reach a safe point, or epoch, then those threads may incur a segmentation fault. This behavior is a subset of a larger pattern called privatization. A PTM that allows transactions to access DRAM and NVM must incur small privatization overheads at the boundaries of every transaction (and optionally also whenever an in-flight transaction checks the consistency of its read set). It must also incur large overheads when committing a transaction that privatizes memory. Privatization patterns are sufficiently complex that the default is for every transaction to incur this overhead whenever it commits.
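The wait that a freeing thread must perform can be sketched in C as a per-thread epoch check. This is one common way to express quiescence, shown here under stated assumptions rather than as the mechanism used by any particular PTM. MAX_THREADS, the epoch array, and all function names are hypothetical, and the convention that an odd epoch means "inside a transaction" is chosen only for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 4

/* Per-thread epoch: odd while the thread is inside a transaction, even
 * otherwise. Static storage starts at zero, i.e. "not in a transaction". */
static atomic_ulong epoch[MAX_THREADS];

static void tx_begin(int tid)
{
    assert(tid >= 0 && tid < MAX_THREADS);
    atomic_fetch_add(&epoch[tid], 1);        /* even -> odd */
}

static void tx_end(int tid)
{
    assert(tid >= 0 && tid < MAX_THREADS);
    atomic_fetch_add(&epoch[tid], 1);        /* odd -> even */
}

/* Taken by the freeing thread after its transaction commits. */
static void snapshot_epochs(unsigned long snap[MAX_THREADS])
{
    for (int i = 0; i < MAX_THREADS; i++)
        snap[i] = atomic_load(&epoch[i]);
}

/* True once every thread that was inside a transaction at snapshot time
 * has left that transaction. Only then may memory go back to the OS. */
static bool quiesced(const unsigned long snap[MAX_THREADS])
{
    for (int i = 0; i < MAX_THREADS; i++)
        if ((snap[i] & 1UL) && atomic_load(&epoch[i]) == snap[i])
            return false;
    return true;
}
```

A freeing thread would take a snapshot and then poll quiesced before releasing the page. Transactions that begin after the snapshot do not delay the free, since they cannot hold a reference to the privatized memory.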

5.2.4 Static Separation

When transactions are used for concurrency, there is no need to instrument every access to DRAM; only accesses that could be concurrent with a transactional access to the same location need to be instrumented. In legacy systems, this may mean that on a single cache line, one byte may be private to a thread, while an adjacent byte is shared among many threads, and accessed via transactions. In contrast, our focus on PDOM-1 means that every store to the NVM must be instrumented, so that clwb and sfence instructions can be performed correctly. It is natural, then, to require that every store be part of a transaction.

Going a step further, we can require that every load from a persistent region is also part of a transaction (note that micro-transactions make the overhead of such a design minimal [33]). The resulting “static separation” model [1] is able to track memory at a coarser granularity than permitted by transactional programming models for DRAM [122].

5.2.5 Multiple Persistent Regions

Applications should be able to work with multiple persistent regions at the same time. However, past work has established that some constraints may be enforced, such as forbidding pointers from NVM-backed regions to DRAM, or between NVM regions [17]. For the purposes of this chapter, the distinction is not significant: as long as every attempt to mmap a named persistent region is done in a manner that persistently tracks (a) the name of the region (e.g., file name), (b) the virtual address assigned to the first byte of the mapped region, and (c) the size of the mapped region, then management of cross-region pointers can be handled by the code that runs upon recovery after a failure.
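The three tracked items suffice for recovery code to translate absolute pointers when a region is mapped at a new address. A minimal sketch in C follows, assuming a hypothetical region_record layout and a hypothetical rebase helper. Neither is taken from an existing library, and the 64-byte name field is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record that is persisted for each mapped region. */
struct region_record {
    char      name[64];  /* (a) name of the region, e.g. a file name */
    uintptr_t base;      /* (b) virtual address of the first byte when the
                          *     region was last mapped */
    size_t    size;      /* (c) size of the mapped region in bytes */
};

/* Translate a pointer value stored before the failure into the address it
 * denotes now that the region is mapped at new_base. Values that fall
 * outside the recorded range are returned unchanged. */
static uintptr_t rebase(const struct region_record *r, uintptr_t new_base,
                        uintptr_t stored)
{
    assert(r->size > 0);
    if (stored < r->base || stored - r->base >= r->size)
        return stored;
    return new_base + (stored - r->base);
}
```

With one record per region, recovery code can test a stored pointer against each record in turn, which covers pointers that cross from one region into another as well as pointers within a region.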

5.2.6 Models Considered in this Chapter

From the above, we focus on two models. In both models, the underlying hardware is assumed to provide PDOM-1, and the programmer is expected to provide recovery code, so that persistent regions do not require position-independent pointers. Note that during recovery, it will be necessary both to (a) apply a redo/undo log and (b) remap pointers within the persistent region. Upon this base, the general persistence model (GP) assumes that main memory consists of both NVM and DRAM, that any single transaction may access both kinds of memory, and that programs may access memory (reads and writes of DRAM, reads of NVM) from outside of transactions. The ideal persistence model (IP) assumes that a transaction may only access one type of memory (NVM or DRAM), and that every access to NVM is performed from within a transaction.