
5.10 Cherry Garcia Protocol

5.10.1 Start transaction

The transaction is started by creating a unique transaction identifier using a UUID generator and setting the transaction start time (T_start) to the current time using the TrueTime API.
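As a minimal sketch of this step, the following Python fragment creates the identifier and records the start time. The `Transaction` class, its field names, and the injectable `now` parameter are hypothetical; `time.time()` is only a local-clock stand-in for the TrueTime call, which is not modelled here.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical client-side transaction object (names are illustrative)."""
    identifier: str = ""
    start_time: float = 0.0
    state: str = "NEW"
    cache: dict = field(default_factory=dict)  # (datastore, key) -> record

    def start(self, now=time.time):
        # `now` stands in for the TrueTime call; time.time() is a local clock only.
        self.identifier = str(uuid.uuid4())
        self.start_time = now()
        self.state = "STARTED"
        return self


# Usage: inject a fixed clock so the start time is predictable.
tx = Transaction().start(now=lambda: 100.0)
```

Passing the clock as a parameter keeps the sketch independent of any particular time service.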

Algorithm 1 Start transaction

 1: function start
 2:     T.identifier ← UUID()
 3:     T_start ← now()
 4:     T_state ← STARTED
 5: end function

5.10.2 Transactional read

If the supplied key is already in the transaction’s cache, that version is used, to enforce that the transaction sees any of its previous writes to the record. Otherwise, the record is read from the data store using the supplied key. The data store populates the record header and contents that can be used by the transaction code. If the record is in the PREPARED state, then we try to determine the status of the writer of the latest version, and bring the record up to date. The details are explained below under Transaction recovery.

If we cannot determine the status of the writer of a PREPARED current version (because the lease time has not expired and there is no TSR yet), then the read attempt fails, and the reading transaction will itself abort. This is a pessimistic approach.

An alternate approach would be to return the last committed version of the record. However, the current transaction would then have to be aborted if the previously incomplete transaction goes on to commit successfully; if the incomplete transaction is aborted, the current transaction would succeed. This is an optimistic approach. The degree of success of each approach depends on various factors, including the number of concurrent transactions and the number of records involved in each transaction.

Once we have a committed state for the record, we find the latest version that is valid for the current reader transaction’s start time, T_start. This may be the current version or the previous version; if we cannot find a version valid for the reader’s snapshot, the read fails.

When a valid version of the record is read from the data store, it is put into the transaction record cache and also returned to the caller.
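The cache lookup and snapshot version selection described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation: the `Record` class, `ReadFailed`, and `transactional_read` are hypothetical names, the data store is a plain dictionary, a version is assumed to be visible when its valid-start time is not later than the reader's start time, and the handling of PREPARED records is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    value: str
    valid_start: float               # commit time of the writer of this version
    state: str = "COMMITTED"
    prev: Optional["Record"] = None  # previous committed version, if kept


class ReadFailed(Exception):
    pass


def transactional_read(cache, store, key, t_start):
    """Return the version of `key` visible to a reader that started at t_start."""
    if key in cache:                  # the transaction sees its own earlier writes
        return cache[key]
    record = store[key]
    if record.state != "COMMITTED":
        # Resolving a PREPARED record is out of scope for this sketch.
        raise ReadFailed("writer status unknown")
    # Fall back at most one version: the current one or the previous one.
    if t_start < record.valid_start:
        record = record.prev
        if record is None or t_start < record.valid_start:
            raise ReadFailed("no version valid for this snapshot")
    cache[key] = record               # later reads are served from the cache
    return record


old = Record("v1", valid_start=10.0)
new = Record("v2", valid_start=20.0, prev=old)
store = {"k": new}
```

A reader that started at time 25 would get `v2`, a reader that started at time 15 would get `v1`, and a reader that started at time 5 would fail.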

Algorithm 2 Read a record

 1: function read(datastore, key)
 2:     if ∃ T.cache(datastore, key) then
 3:         return T.cache(datastore, key)
 4:     end if
 5:     record ← datastore.read(key)
 6:     if record.state ≠ COMMITTED then
 7:         checktime ← now()
 8:         tx_record ← coord_datastore.read(record.TxID)
 9:         if ∃ tx_record then
10:             if tx_record.state = COMMITTED then
11:                 datastore.commit(key, record)
12:             else
13:                 datastore.abort(key, record)
14:             end if
15:         else
16:             if checktime < record.T_lease_time then
17:                 throw Exception(“Read fails”)
18:             end if
19:             datastore.abort(key, record)
20:             go to 5
21:         end if
22:     end if
23:     if T_start < record.T_valid_start then
24:         record ← datastore.prev(key)
25:         if T_start < record.T_valid_start then
26:             throw Exception(“Read fails”)
27:         end if
28:     end if
29:     T.cache.put(datastore, key, record)
30:     return record
31: end function

5.10.3 Transactional write

The transactional write operation is simple. The record value associated with the key is written to the Transaction object cache. In Algorithm 3, line 2 ensures that the earlier version of the record is marked as the previous version if it already exists in the cache and no write operation on the same record has been performed during the course of the current transaction. The data record is only written to the actual data store at the time of executing the transaction commit.

Algorithm 3 Write a record

 1: function write(datastore, key, record)
 2:     if record.TxID ≠ T.TxID then
 3:         record.dirty ← true
 4:         record.prev ← T.cache.get(datastore, key)
 5:         record.TxID ← T.TxID
 6:         T.cache.put(datastore, key, record)
 7:     end if
 8: end function
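The buffering behaviour of the write operation can be illustrated with a short Python sketch. The `Record` class and `transactional_write` function are hypothetical names, and the cache is assumed to be a dictionary keyed by (datastore, key); nothing here touches a real data store.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Record:
    value: Any
    tx_id: Optional[str] = None
    dirty: bool = False
    prev: Optional["Record"] = None


def transactional_write(cache, tx_id, datastore, key, record):
    """Buffer a write in the transaction cache; the data store is not contacted."""
    if record.tx_id != tx_id:
        # First write of this record by this transaction: keep the cached
        # version (if any) as its predecessor and mark the record dirty.
        record.dirty = True
        record.prev = cache.get((datastore, key))
        record.tx_id = tx_id
        cache[(datastore, key)] = record
    return record


# Usage: the cache already holds a version read earlier in the transaction.
cache = {("s3", "k"): Record("old", tx_id="tx-0")}
rec = transactional_write(cache, "tx-1", "s3", "k", Record("new"))
```

Because the check compares transaction identifiers, writing the same record object again within the transaction leaves its `prev` link pointing at the version that was originally read.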

5.10.4 Transaction commit

The transaction commit is performed in two phases.

The Prepare phase: The record cache is inspected and all dirty objects are inserted into the write-set. Each record in the write-set is marked with the transaction status record URI and the transaction commit time, its transaction state is set to PREPARED, and it is then conditionally written to the respective data store. The writes follow a fixed total order, obtained by processing the records in the order of the hash values of their identifying keys. The reason for this is discussed in more detail in Section 5.11.1 later in this chapter. The operation is performed using the Datastore.prepare() method, which utilises a conditional write to the data store using the record version tag (ETag or equivalent mechanism). The prepare phase is considered successful if all dirty records are successfully prepared. Should one wish to provide Serializable isolation, one also needs to prevent read-write conflicts by additionally checking that each unmodified but accessed item is unchanged between its initial access and the end of the transaction. This is further discussed in Section 5.15.

The Commit phase: The TSR is written to the coordinating data store to indicate that all the records have been successfully prepared. The records are then committed by calling the data store commit() method for all records in parallel. The record commit method marks the record with the COMMITTED state. The operation is performed using the Datastore.commit() method, which also utilises a conditional write to the data store using the record version tag (ETag or equivalent mechanism). This ensures that the one-phase commit optimisation (described in Section 5.12) does not violate transactional behaviour. Once the records are committed, the transaction status record is deleted asynchronously from the coordinating data store.

Algorithm 4 Commit transaction

 1: function commit
 2:     T.commit_time ← now()                         ▷ phase 1: prepare
 3:     T.lease_time ← now() + commit_timeout
 4:     for (datastore, key, record) ∈ ordered(cache) do
 5:         if record.isDirty() then
 6:             record.T_valid_start ← T.commit_time
 7:             record.T_lease_time ← T.lease_time
 8:             status ← datastore.prepare(key, record)
 9:             if status = ERROR then
10:                 recov_rec ← datastore.recover(key)
11:                 if recov_rec then                 ▷ needs recovery
12:                     if ∃ coord_datastore.recover(recov_rec.TxID) then
13:                         datastore.commit(key, recov_rec)
14:                     else
15:                         datastore.abort(key, recov_rec)
16:                     end if
17:                 end if
18:                 prev_rec ← datastore.read(key)
19:                 if ∃ prev_rec then
20:                     datastore.write(key, record)
21:                 end if
22:                 status ← datastore.prepare(key, record)
23:                 if status = ERROR then
24:                     abort()
25:                     return ERROR
26:                 end if
27:             end if
28:         end if
29:     end for
30:     T.state ← COMMITTED                           ▷ phase 2: commit
31:     coord_datastore.write(T.TxID, T.record)
32:     for all (datastore, key, record) ∈ cache.keys() do
33:         datastore.commit(key, record)
34:     end for
35:     return SUCCESS
36: end function
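The two phases can be sketched against a toy in-memory store. Everything below is an assumption made for illustration: `MemoryStore`, `put_if_match`, `Stored`, `key_order` and `commit` are hypothetical names, an integer counter plays the role of the version tag, SHA-256 is just one possible key hash, the coordinating data store is a dictionary, and the commit calls run sequentially rather than in parallel. Rolling back already prepared records after a conflict is not shown.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Stored:
    value: str
    state: str        # "COMMITTED" or "PREPARED"
    etag: int = 0     # version tag, bumped on every successful write
    tx_id: str = ""


class MemoryStore:
    """Toy data store offering only a conditional (compare-and-set) write."""

    def __init__(self):
        self.items = {}

    def put_if_match(self, key, new, expected_etag):
        current = self.items.get(key)
        current_etag = current.etag if current else 0
        if current_etag != expected_etag:
            return False              # the record changed since it was read
        new.etag = current_etag + 1
        self.items[key] = new
        return True


def key_order(key):
    # Fixed total order over keys: sort by a hash of the identifying key.
    return hashlib.sha256(key.encode()).hexdigest()


def commit(tx_id, store, coord, write_set):
    """write_set maps key -> (new value, etag seen when the record was read)."""
    prepared = []
    for key in sorted(write_set, key=key_order):        # phase 1: prepare
        value, seen_etag = write_set[key]
        rec = Stored(value, "PREPARED", tx_id=tx_id)
        if not store.put_if_match(key, rec, seen_etag):
            return False, prepared    # caller must abort the prepared records
        prepared.append(key)
    coord[tx_id] = "COMMITTED"                           # TSR written: commit point
    for key in prepared:                                 # phase 2: commit
        rec = store.items[key]
        done = Stored(rec.value, "COMMITTED", tx_id=tx_id)
        store.put_if_match(key, done, rec.etag)
    return True, prepared


store = MemoryStore()
store.items["a"] = Stored("a0", "COMMITTED", etag=1)
store.items["b"] = Stored("b0", "COMMITTED", etag=1)
coord = {}
ok, keys = commit("tx1", store, coord, {"a": ("a1", 1), "b": ("b1", 1)})
stale, _ = commit("tx2", store, coord, {"a": ("a2", 1)})  # stale tag: rejected
```

In this sketch the second transaction presents a version tag that the first transaction has already superseded, so its conditional write is refused and no TSR is written for it.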

5.10.5 Transaction abort

If the transaction commit operation has not been initiated the abort operation is trivial. The record cache is cleared and the transaction is marked as aborted.

Algorithm 5 Abort transaction

 1: function abort
 2:     T.state ← ABORTED
 3:     coord_datastore.write(T.TxID, T.record)
 4:     for all (datastore, key, record) ∈ cache.keys() do
 5:         if record.state = PREPARED then
 6:             datastore.abort(key, record)
 7:         end if
 8:     end for
 9:     return SUCCESS
10: end function

If some of the updated records have already had the Datastore.prepare() method executed, the transaction can still be aborted, provided the TSR has not been written to the transaction coordinating data store. In this case, the rollback is performed by issuing an abort on all prepared records. In a data store that does not support multi-version records, the rollback operation is performed by overwriting the record with the application data and metadata found in the metadata field Prev. In this situation, we roll back to the previous state of the current version, but we lose the former contents of the Prev field itself; these will never be needed, since the version we are restoring was itself COMMITTED.

Once the transaction status record has been written to the coordinating data store the transaction cannot be aborted.
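A rough sketch of the abort path for a store without multi-version records is given below. The names `Stored`, `CannotAbort` and `abort` are hypothetical, both stores are plain dictionaries, and the `prev` attribute stands in for the Prev metadata field. Deleting a prepared record that has no previous version is an assumption of this sketch, and the check-then-write on the coordinating store is not atomic here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stored:
    value: str
    state: str                       # "COMMITTED" or "PREPARED"
    tx_id: str = ""
    prev: Optional["Stored"] = None  # stands in for the Prev metadata field


class CannotAbort(Exception):
    pass


def abort(tx_id, store, coord, keys):
    """Roll back the records this transaction prepared, unless it has committed."""
    if coord.get(tx_id) == "COMMITTED":
        raise CannotAbort(tx_id)     # the written TSR is the point of no return
    coord[tx_id] = "ABORTED"
    for key in keys:
        rec = store.get(key)
        if rec is None or rec.state != "PREPARED" or rec.tx_id != tx_id:
            continue                 # never prepared, or prepared by another writer
        if rec.prev is None:
            del store[key]           # assumption: the record did not exist before
        else:
            # Restore the previous committed version; its own Prev is not kept.
            store[key] = Stored(rec.prev.value, "COMMITTED", rec.prev.tx_id)


store = {
    "a": Stored("a1", "PREPARED", "tx1", prev=Stored("a0", "COMMITTED", "tx0")),
    "b": Stored("b0", "COMMITTED", "tx0"),
}
coord = {}
abort("tx1", store, coord, ["a", "b"])
```

After the call, record `a` is back at its previous committed value and record `b`, which was never prepared, is untouched.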


5.10.6 Transaction recovery

If an application fails during the commit process, individual data items may need to be recovered; this will happen lazily as relevant records are accessed by other transactions. When a data item is read, its transaction state is inspected. If it is in the COMMITTED state, recovery is not necessary. However, if it is in the PREPARED state, it is either rolled forward or rolled back, depending on the transaction status. The writer transaction’s URI is used to inspect the state of the Transaction Status Record. We roll the data record forward (that is, mark the record header with the COMMITTED state) if the writer did commit, as determined by the TSR. If the writer aborted, which may be known either from the TSR or because the lease time has expired with no TSR, then we roll back the record, as described above for Transaction abort.

Algorithm 6 Recover transaction status

 1: function recover(TxID)
 2:     tx_record ← coord_datastore.read(TxID)
 3:     if ∃ tx_record ∧ tx_record.state = COMMITTED then
 4:         return true
 5:     end if
 6:     return false
 7: end function
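The lazy roll-forward and roll-back decision described in this section can be sketched as follows. The names `Stored`, `ReadFailed` and `resolve` are hypothetical, both stores are dictionaries, the `prev` attribute stands in for the previous version of the record, and removing a record that has no previous version is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stored:
    value: str
    state: str                       # "COMMITTED" or "PREPARED"
    tx_id: str = ""
    lease_time: float = 0.0          # until when the writer may still commit
    prev: Optional["Stored"] = None


class ReadFailed(Exception):
    pass


def resolve(store, coord, key, now):
    """Bring a PREPARED record to a committed state, or fail if its fate is unknown."""
    rec = store[key]
    if rec.state == "COMMITTED":
        return rec                   # no recovery needed
    tsr = coord.get(rec.tx_id)
    if tsr == "COMMITTED":
        rec.state = "COMMITTED"      # roll forward: the writer did commit
        return rec
    if tsr is None and now < rec.lease_time:
        raise ReadFailed("writer may still be committing")
    # The writer aborted, or its lease expired with no TSR: roll back.
    if rec.prev is None:
        del store[key]               # assumption: nothing to restore
        return None
    store[key] = rec.prev
    return rec.prev


prev = Stored("v0", "COMMITTED", "tx0")
store = {"k": Stored("v1", "PREPARED", "tx1", lease_time=50.0, prev=prev)}
coord = {}
rolled_back = resolve(store, coord, "k", now=60.0)   # lease expired, no TSR
```

Calling `resolve` with `now=10.0` on the same prepared record would instead raise `ReadFailed`, since the lease has not yet expired and no TSR exists.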