Lazy Snapshot Algorithm (LSA)

5.1 Time-Based STM

5.1.1 Lazy Snapshot Algorithm (LSA)

In the following, I will describe LSA on a higher level of abstraction. This makes it easier to understand and avoids having to dive into the low-level implementation details. Most notably, for the pseudo code below, let us assume that all STM operations (e. g., individual loads and stores executed by a transaction) execute atomically and in a sequentially consistent manner. I will present a complete concurrent implementation for C/C++ on the basis of the C++11 memory model in Section 5.2, which does not rely on these simplifying assump- tions.

LSA handles transactional accesses to shared data items, which can either be higher-level constructs or just plain memory locations (see Figure 3.2 and the discussion in Section 3.1.2). For now, let us assume a set of abstract shared objects O. Each object o ∈ O traverses a series of versions o1, o2, . . . , oi. Every

time an object is written by a committed transaction, a new version of this object is created and the previous one becomes obsolete. The STM may—but does not need to—keep multiple versions of an object at a given time; only the latest version is necessary.

The discrete logical global time base of LSA is designated by clock. It can be implemented using a simple shared integer counter, which update transactions can increment atomically to acquire a unique commit timestamp.

Let us also assume that objects are only accessed and modified within transactions. Hence, we can describe a history of an object with respect to the global time base clock. boic denotes the time when version i of object o has been

written, and by doie the last time before the next version is written. Let us call

the interval between these two bounds the validity range of the object version and we denote it simply by [oi]. If oiis the latest version of object o, then doie

is undefined (because we do not know until when oi will be valid), otherwise

doie = boi+1c − 1. For convenience, o?denotes the most recent version of object

The sequence H(o) = (bo1c, . . . , boic, . . .) denotes all the times at which

updates to object o are committed by some update transactions. bo1c is the time

when the object was created. Sequence Hi is strictly monotonically increasing,

i.e., ∀oi6= o?: boic < boi+1c.

Algorithm 1 shows the multi-version, write-through variant of LSA as pseudo code. Functions prefixed withstm− are the functions that comprise the external interface of the STM: Starting a transaction (stm−start), loading from and storing to transactional data items (stm−load andstm−store), and committing

Algorithm 1 Lazy Snapshot Algorithm (write-through, multi-version variant)

1: Global state:

2: clock ← 0 . Global time base (shared integer)

3: State of thread p:

4: st : snapshot time (range)

5: r-set : set of read object version

6: w-set : set of written objects

7: stm-start()p:

8: st ← [clock , clock ] . Initial snapshot time bounds

9: r-set ← w-set ← ∅

10: stm-load(o)p:

11: if o ∈ w-set then

12: return o? . Read previous write

13: if bo?c > dste then . Is latest version too recent?

14: extend(clock ) . Try to extend (optional)

15: if bo?c ≤ dste then . Can use latest version?

16: st ← [max(bst c, bo?c), dste)] . Yes, use latest version

17: r-set ← r-set ∪ {o?}

18: return o?

19: else

20: if w-set = ∅ ∧ (∃oi: boic ≤ dste ∧ doie ≥ bstc) then . Can use older version?

21: st ← [max(bst c, boic), min(dste, doie)]

22: r-set ← r-set ∪ {oi}

23: return oi

24: else

25: abort() . Cannot find valid version

26: stm-store(o,val)p:

27: if o /∈ w-set then . Need to acquire on first write to o?

28: if bo?c > dste then . Is latest version too recent?

29: extend(clock ) . Try to extend

30: if bo?c ≤ dste then . Can use latest version?

31: st ← [max(bst c, bo?c), dste]

32: o?← new() . Create/acquire a new latest version

33: bo?c ← ∞

34: w-set ← w-set ∪ {o}

35: else

36: abort() . Cannot find valid version

37: o?← val . Write through to acquired latest version

38: extend(t)p:

39: dste ← t . Try to extend snapshot up to t

40: for all oi∈ r-set do

41: if o /∈ w-set then . Recompute validity unless object already acquired

42: dste ← min(dste, doie)

43: stm-commit()p:

44: if w-set 6= ∅ then . Nothing to do if read-only transaction

45: ct ← (clock ← clock + 1) . Unique commit time (atomic increment)

46: if dst e < ct − 1 then . Must validate if others committed in the meantime

47: extend(ct − 1)

48: if dst e < ct − 1 then . Are snapshot and commit time adjacent?

49: abort() . No, commit would not be atomic

50: for all o ∈ w-set do . Make new versions accessible to other transactions

51: bo?c ← ct . Validity starts at commit time

52: abort()p:

53: for all o ∈ w-set do

a transaction (stm−commit). A transaction completes successfully if it executes the algorithm until commit without executing the abortfunction (after whose execution the transaction will be restarted from its beginning).

For each thread executing transactions, the STM maintains a read set r-set , a write set w-set , and a snapshot time range st . The read set tracks all the object versions read by a transaction, whereas the write set tracks the objects written by the transaction (we always write the most recent version).

When starting a transaction, the STM clears both read and write set. It also sets the snapshot time range so that it is valid starting at the current time (i. e., value of the global time base clock), which I will explain in detail later.

Write-only transactions. Let us start by considering just transactions that only write objects. If stm−store gets called for an object that the transaction has not written to yet, it first checks whether the most recent version of the object is accessible and tries to extend the snapshot time if necessary (line 29). We can ignore this for now because the read set r-set will always be empty for write-only transactions, but I will revisit the snapshot time maintenance aspects later. In our case, dst e will always be set to clock (line 29 and line 39), which is the current time. We will thus be able to access the most recent version o?

(line 30) unless, as we will see next, another transaction is already writing to this object.4 _{To write to the object, we (1) first update our snapshot time (explained}

later), (2) create a new most recent object version o?, (3) set its validity range

to start at ∞, and (4) add the object to our write set (lines 31–34). The third step is precisely what makes the most recent object version inaccessible to other transactions because the current time will always be smaller than ∞. Thus, if another transaction has installed such a o?, we will just abort (line 36). Finally,

if we did not abort, we modify the value of o? (i. e., write through to memory).

Thus, writing an object is visible to other threads and prevents them from writing. This exclusive access can either be implemented like locking so that other writers simply have to wait until the existing writer transaction has finished. Alternatively, this also allows nonblocking implementations, where the exclusive access is handled more like a write marker that requires another writer to first abort the existing writer transaction. The first Java-based LSA implementations [92, 90] where indeed nonblocking but we will assume a lock-based approach in what follows.

By acquiring and writing to the most recent object version, we thus essentially lock the object, write to it, and the updates will become visible as new most recent object versions o?. If we have to abort we simply remove o?, which

will both undo our updates and make the previous most recent version accessible again to other transactions (lines 53–54 instm−abort).

Whenstm−commit is called, the update transaction is ready to commit: It (1) acquires a unique commit time ct from the global time base clock on line 45, (2) validates (lines 46–49), which will always succeed for write-only transactions for similar reasons as instm−store, and (3) makes the new most recent object versions created by this transaction accessible by letting their validity ranges start at the commit time (lines 50–51).

4_{Remember that we assume that STM functions execute atomically, so there cannot be}

a committed version whose validity starts at a most recent time than the time we obtained from clock previously.

Overall, updates thus resemble a two-phase locking scheme: we acquire locks before writing to an object, and never release the locks until commit. We also obtain the commit time when we have acquired all the locks. Therefore, all updates of a transaction are atomic and the commit times will be consistent with the order of committed object versions by different transactions: two- phase locking serializes conflicting transactions, which also enforces serialized acquisition of commit times.

Atomic snapshots. Now that we know how atomic updates and the validity ranges of object versions work, we can also understand how we can maintain atomic snapshots. Because those validity ranges are all constructed using the global time base, and because uncommitted updates are inaccessible due to the object locking being used, we can just combine the ranges to reason about the order of a transaction’s operations with respect to other transactions’ operations.

The transaction uses the snapshot time range st to keep track of the validity ranges of the objects that it read: Each accessed object version’s validity range incrementally constrains its lower bound bst c and its upper bound dst e further (i. e., st becomes the intersection of st and the validity range). st is a range because the same snapshot might be valid at different adjacent positions in the serialization order of transactions; if the range becomes empty, the snapshot is not atomic anymore, and the transaction has to abort.

Considering read-only transactions first, lets us now look at how snapshots are constructed. In stm−start, we set the snapshot time range to the current value of the global time base clock, which thus represents the current time.5

This ensures that we do not read from out-dated object versions and is required to ensure linearizability for transactions, for example. Also, we never speculate about what might be committed in the future, and thus set dst e to a time that is never higher than the current time clock.

Instm−load, we first try to access the most recent object version o?and try

to extend st if necessary, which I will explain later. Then, if we can access o?, we

intersect st with o?’s validity range, add o?to the read set, and return its value

(lines 16–18). Otherwise, we look for another object version oi whose validity

range intersects with st and use it instead (lines 21–23), only using the existing upper bound doie to further constrain st . If no suitable object version exists,

we have to abort. Note that if we read an object several times, this will also ensure that we always read the same object version (and thus keep the snapshot atomic) because of how we constrain dst e.

Up to now, we have only considered constraining dst e (e. g., by validity ranges or by the current time in stm−start). However, we can also check whether our snapshot would still be valid at a more recent snapshot time and not capped by more recently committed object versions. We call this a snapshot extension and it is performed by the functionextend, which iterates through the read set and recomputes a new upper bound for st based on the object versions’ validity ranges (line 42) and a caller-supplied upper bound t (line 39). Note that t is always less than or equal to the current time, so we never speculate about what might nor might not happen in the future (lines 14, 29, and 47).

5_{We could also set it to [clock , ∞] but would then need to use a different representation of}

The validation of the readset during snapshot extensions is thus similar to what we would do when using incremental validation; however, we only need to extend on demand, which also motivated the name Lazy Snapshot Algorithm. We only need to extend if another conflicting update transaction committed concurrently, but not in the assumed common case; even if we have to extend, we will be on somewhat of a slow path anyway because we are reading a fresh update by another thread and thus will suffer from cache misses. Overall, this leads to the small and typically constant runtime overhead forstm−load that leads to the performance advantage of LSA shown in Figure 5.1.

Update transactions. While we can commit read-only transactions right away (line 44) because their snapshot is always kept atomic and never earlier than the transaction started, we have to ensure for update transactions that the snapshot time range and the commit time overlap.

Therefore, an update transaction must have only the most recent versions of each accessed object in its snapshot. This is ensured by the checks on line 30 instm−storeand line 20 instm−load. In turn, we rely on this on lines 12 and 41 (i. e., if we have written an object, there is no other object version ordered between the version in our snapshot and o?).

Update transactions commit by first acquiring a new commit time ct from the global time base (line 45). Next, they ensure that dst e is equal or larger than ct − 1, attempting a snapshot extension if necessary, and aborting if it cannot be ensured (lines 46–49). Thus, st and ct are adjacent on the global time base; because ct is a unique timestamp acquired by this transaction, st and ct essentially overlap and there cannot be another transaction whose commit would prevent our snapshot from being atomic with our updates and the commit time acquisition. This works again similar to two-phase locking, only that our reads are invisible—but using time-based validation, we ensure that we could have held read locks until after acquiring ct during commit.

The only remaining step is to release all write locks, which we can do by setting the lower bound of the validity ranges of the written object versions to ct (lines 50–51).

Essential ordering constraints. The correctness of LSA relies on a few ordering and atomicity constraints ensured by the algorithm. First, on transactional loads, we obtain an atomic snapshot of both the data and the commit time associated with this data. While this is simply assumed to be the case in Algorithm 1, concrete implementations have to enforce this (see Algorithm 3). As a result, we can reason about snapshot consistency based on just the timestamps.

Second, a transaction makes its updates visible to other transactions (e. g., by installing tentative versions as in Algorithm 1) before obtaining a new commit time. Thus, the commit time is greater than any value that the global time base had when any of these updates were not yet visible to other transactions. This ensures that any other transaction making a snapshot at or after the updating transaction’s commit time will also see the updates (or, alternatively, that the updated objects are still inaccessible or locked because the updates have not finished committing yet). Updates are made accessible only after it has been ensured that the transaction is actually allowed to commit.

As a result, updates are virtually committed atomically with obtaining the commit time (i. e., right between the commit time and the previous timestamp). Also, this ensures that reading transactions will observe the effects of other update transaction completely or not at all.

To ensure that a transaction’s updates and its snapshot are consistent, updating transactions check that their snapshot is still valid at the time of the commit. While there exist different variations of this (see below), the underly- ing key principle is that if such a check succeeds, we know that the snapshot is atomic with respect to obtaining the commit time; because the updates are atomic with respect to the commit time too, both the snapshot and the updates are atomic with respect to each other.

Thus, the global time base and how transactions link their updates and snapshots to timestamps from this time base effectively orders transactions. Note that this is not necessarily a total order; for example, transactions that do not have data conflicts with each other (e. g., two read-only transactions accessing the same object) need not be ordered.

Algorithm 3 enforces a total order for all update transactions by letting them all acquire a commit time on their own, but this is not strictly necessary. It is required that all updates of a transaction have the same commit time (to make snapshots work), but updates having an equal commit time does not necessarily imply that those updates were by the same transaction.

Essentially, concurrent update transactions with no data conflicts can share a commit time. This is possible as long as the commit time they obtain is still greater than any value of the global time base when their updates were not yet fully visible to other transactions (see above). However, if transactions do share commit times, they cannot skip validating their snapshot if the snapshot time is right before their commit time (lines 46–49 in Algorithm 3). Instead, they always have to validate that their snapshot is right before their commit time and no other transaction has tentative conflicting updates pending; those updates could be by other transactions that use the same commit time. The LSA variants using a real-time clock as time base rely on this approach (see Section 5.3), as does related work (see Section 5.4).

Write buffering. Algorithm 1 shows a write-through variant of LSA, but an STM can also buffer writes until commit as outlined in Algorithm 2 (the functions ending in−wbthen comprise the external STM interface). This write- back variant really just defers making writes visible until right before commit; the allowed externally visible behavior of a transaction and the essential ordering guarantees do not change. Note that we could also defer writing back real values to the to-be-committed object version until after validation in commit as long as this happens before we release the write locks (line 50 in Algorithm 1, not shown in Algorithm 2 for brevity). Furthermore, we could also acquire locks early as in the write-through variant but never write data updates until we are sure to commit, leading to a variant called encounter-time locking [42].

Discussion. In what follows, I will conclude with a discussion of some performance-related properties of LSA and time-based STMs in general.

First, one might expect that LSA needs to perform extensions frequently when there are concurrent updates. In fact, however, LSA is quite independent

Algorithm 2 Lazy Snapshot Algorithm (write-back variant, extends Algo- rithm 1)

1: State of thread p: . extends state of Algorithm 1

2: w-buffer : buffered writes per object

3: stm-start-wb()p:

4: stm-start()

5: w-buffer ← ∅

6: stm-load-wb(o)p:

7: if ∃w-buffer_othen

8: return w-buffer_o . Read buffered write

9: return stm-load(o)

10: stm-store-wb(o,val)p:

11: w-buffer_o← val . Buffer store to o

12: stm-commit-wb()p:

13: for all o : ∃w-buffer_odo . Acquire objects with buffered writes before trying to commit

14: stm-store(o, w-buffer_o)

15: stm-commit()

of the speed in which concurrent transactions commit and thus increase the value of the global time base. If there are no concurrent updates to the objects that a transaction accesses, the most recent object versions do not change and no extension is required for obtaining a consistent snapshot. This is the case, in particular, if the value of clock has not changed since the start of the transaction. If clock has been increased concurrently and the transaction is an update transaction, then one extension is required during commit. Extensions are thus only required due to concurrent conflicting commits, and thus after cache misses. Also, at most one extension is performed per accessed object, but this worst case is extremely rare in practice because it requires very specific update patterns.

Accesses to the global time base might become a bottleneck when update transactions execute frequently due to cache misses and contention on clock. In practice, however, the number of accesses to clock remains small. All trans-

In document Software Transactional Memory Building Blocks (Page 91-98)