5.10 Cherry Garcia Protocol
5.10.1 Start transaction
The transaction is started by creating a unique transaction identifier using a UUID generator and setting the transaction start time (Tstart) to the current time using
the TrueTime API.
Algorithm 1 Start transaction
1: function start
2:     T.identifier ← UUID()
3:     Tstart ← now()
4:     Tstate ← STARTED
5: end function
5.10.2 Transactional read
If the supplied key is already in the transaction’s cache, that version is used, to enforce that the transaction sees any of its previous writes to the record. Otherwise, the record is read from the data store using the supplied key. The data store populates the record header and contents that can be used by the transaction code. If the record is in PREPARED state, then we try to determine the status of the writer of the latest version, and bring the record up-to-date. The details are explained below as Transaction recovery.
If we cannot determine the status of the writer of a PREPARED current version (because the lease time has not expired and there is no TSR yet), then the read attempt fails, and the reading transaction will itself abort. This is a pessimistic approach.
An alternate approach would be to return the last committed version of the record. However, the current transaction would then have to be aborted if the previously incomplete transaction is committed successfully. If the incomplete transaction is aborted, then the current transaction would succeed. This is an optimistic approach. The degree of success of each approach depends on various factors including the number of concurrent transactions and the number of records involved in each transaction.
Once we have a committed state for the record, we find the latest version which is valid for the current reader transaction’s start time, Tstart. This may be the current version, or the previous version; if we cannot find a version valid for the reader’s snapshot, the read fails.
When a valid version of the record is read from the data store, it is put into the transaction record cache and also returned to the caller.
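The version-selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the protocol's implementation: the `Version` class, the `ReadFailed` exception, and the `select_version` function are hypothetical names, timestamps are plain integers, and a version is treated as valid when its valid-start time is not later than the reader's Tstart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """Hypothetical committed record version (illustrative names only)."""
    value: str
    valid_start: int                     # commit time of the writing transaction
    prev: Optional["Version"] = None     # previous committed version, if any


class ReadFailed(Exception):
    """Raised when no version is valid for the reader's snapshot."""


def select_version(current: Version, t_start: int) -> Version:
    """Return the latest version whose valid_start is not after t_start.

    Only the current and the previous version are considered, matching
    the description in the text.
    """
    for candidate in (current, current.prev):
        if candidate is not None and candidate.valid_start <= t_start:
            return candidate
    raise ReadFailed("no version valid for snapshot at %d" % t_start)


old = Version("v1", valid_start=10)
new = Version("v2", valid_start=20, prev=old)
```

With these two versions, a reader that started at time 25 would see `new`, a reader that started at time 15 would see `old`, and a reader that started at time 5 would fail.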
Algorithm 2 Read a record
1: function read(datastore, key)
2:     if ∃ T.cache(datastore, key) then
3:         return T.cache(datastore, key)
4:     end if
5:     record ← datastore.read(key)
6:     if record.state ≠ COMMITTED then
7:         checktime ← now()
8:         tx_record ← coord_datastore.read(record.TxID)
9:         if ∃ tx_record then
10:             if tx_record.state = COMMITTED then
11:                 datastore.commit(key, record)
12:             else
13:                 datastore.abort(key, record)
14:             end if
15:         else
16:             if checktime < record.Tlease_time then
17:                 throw Exception(“Read fails”)
18:             end if
19:             datastore.abort(key, record)
20:             go to 5
21:         end if
22:     end if
23:     if Tstart < record.Tvalid_start then
24:         record ← datastore.prev(key)
25:         if Tstart < record.Tvalid_start then
26:             throw Exception(“Read fails”)
27:         end if
28:     end if
29:     T.cache.put(datastore, key, record)
30:     return record
31: end function
5.10.3 Transactional write
The transactional write operation is simple. The record value associated with the key is written to the Transaction object cache. In Algorithm 3, line 2 ensures that the earlier version of the record is marked as the previous version if it already exists in the cache and no write to the same record has yet been performed during the course of the current transaction. The data record is only written to the actual data store at the time of executing the transaction commit.
Algorithm 3 Write a record
1: function write(datastore, key, record)
2:     if record.TxID ≠ T.TxID then
3:         record.dirty ← true
4:         record.prev ← T.cache.get(datastore, key)
5:         record.TxID ← T.TxID
6:         T.cache.put(datastore, key, record)
7:     end if
8: end function
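The buffering behaviour of Algorithm 3 can be illustrated with a short Python sketch. The `Record` and `Transaction` classes and the dictionary-based cache are hypothetical stand-ins chosen for illustration; only the control flow follows the algorithm.

```python
import uuid


class Record:
    """Hypothetical record carrying the header fields used by the write path."""

    def __init__(self, value, tx_id=None):
        self.value = value
        self.tx_id = tx_id    # identifier of the transaction that last wrote it
        self.dirty = False
        self.prev = None      # version that was cached before the first write


class Transaction:
    def __init__(self):
        self.tx_id = str(uuid.uuid4())
        self.cache = {}       # (datastore name, key) -> Record

    def write(self, datastore, key, record):
        # Only the first write of a record in this transaction links the
        # previously cached version; later writes leave that link untouched.
        if record.tx_id != self.tx_id:
            record.dirty = True
            record.prev = self.cache.get((datastore, key))
            record.tx_id = self.tx_id
            self.cache[(datastore, key)] = record


tx = Transaction()
committed = Record("old", tx_id="earlier-tx")
tx.cache[("store", "k")] = committed     # as if placed there by a prior read
updated = Record("new")
tx.write("store", "k", updated)          # first write: links committed as prev
updated.value = "newer"
tx.write("store", "k", updated)          # second write: prev is not replaced
```

Nothing reaches a data store in this sketch; as in the text, the cached dirty records would only be written out during commit.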
5.10.4 Transaction commit
The transaction commit is performed in two phases.
The Prepare phase: The record cache is inspected and all dirty objects are inserted into the write-set. Each record in the write-set is marked with the transaction status record (TSR) URI and the transaction commit time, its transaction state is set to PREPARED, and it is then conditionally written to its data store. The writes are performed in a fixed total order, namely the order of the hash values of the identifying keys of the records. The reason for this is discussed in more detail in Section 5.11.1 later in this chapter. The operation is performed using the Datastore.prepare() method, which utilises a conditional write to the data store using the record version tag (ETag or equivalent mechanism). The prepare phase is considered to be successful if all dirty records are successfully prepared. To provide Serializable isolation, one must also prevent read-write conflicts by additionally checking that each item that was accessed but not modified is unchanged between its initial access and the end of the transaction. This is further discussed in Section 5.15.
The Commit phase: The TSR is written to the coordinating data store to indicate that all the records have been successfully prepared. The records are then committed by calling the data store commit() method for all records in parallel. The record commit method marks the record with the COMMITTED state. The operation is performed using the Datastore.commit() method, which also utilises a conditional write to the data store using the record version tag (ETag or equivalent mechanism). This ensures that the one-phase commit optimisation (described in Section 5.12) does not violate transactional behaviour. Once the records are committed, the transaction status record is deleted asynchronously from the coordinating data store.
Algorithm 4 Commit transaction
1: function commit
2:     T.commit_time ← now()    ▷ phase 1: prepare
3:     T.lease_time ← now() + commit_timeout
4:     for (datastore, key, record) ∈ ordered(cache) do
5:         if record.isDirty() then
6:             record.Tvalid_start ← T.commit_time
7:             record.Tlease_time ← T.lease_time
8:             status ← datastore.prepare(key, record)
9:             if status = ERROR then
10:                 recov_rec ← datastore.recover(key)
11:                 if recov_rec then    ▷ needs recovery
12:                     if ∃ coord_datastore.recover(recov_rec.TxID) then
13:                         datastore.commit(key, recov_rec)
14:                     else
15:                         datastore.abort(key, recov_rec)
16:                     end if
17:                 end if
18:                 prev_rec ← datastore.read(key)
19:                 if ∃ prev_rec then
20:                     datastore.write(key, record)
21:                 end if
22:                 status ← datastore.prepare(key, record)
23:                 if status = ERROR then
24:                     abort()
25:                     return ERROR
26:                 end if
27:             end if
28:         end if
29:     end for
30:     T.state ← COMMITTED    ▷ phase 2: commit
31:     coord_datastore.write(T.TxID, T.record)
32:     for all (datastore, key, record) ∈ cache.keys() do
33:         datastore.commit(key, record)
34:     end for
35:     return SUCCESS
36: end function
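The two phases can be sketched in Python against an in-memory stand-in for a data store. Everything named here is an assumption made for illustration: `InMemoryStore`, its `put_if_match` method (modelling an ETag-style conditional write with an integer version tag), the use of SHA-256 as the ordering hash, and the use of a single data store instead of several. Recovery of conflicting records, rollback after a failed prepare, and asynchronous deletion of the TSR are omitted.

```python
import hashlib


class InMemoryStore:
    """Hypothetical store with an ETag-style conditional write."""

    def __init__(self):
        self.data = {}  # key -> (etag, value); a missing key has etag 0

    def get(self, key):
        return self.data.get(key, (0, None))

    def put_if_match(self, key, value, expected_etag):
        """Write only if the stored version tag still equals expected_etag."""
        etag, _ = self.get(key)
        if etag != expected_etag:
            return False
        self.data[key] = (etag + 1, value)
        return True


def key_order(key):
    # A stable hash, so that every client prepares records in the same order.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def commit(tx_id, write_set, store, coord_store):
    """write_set maps key -> (etag seen when the record was read, new value)."""
    # Phase 1: prepare each dirty record with a conditional write.
    for key in sorted(write_set, key=key_order):
        etag, value = write_set[key]
        prepared = {"state": "PREPARED", "tx_id": tx_id, "value": value}
        if not store.put_if_match(key, prepared, etag):
            return False  # rollback of already prepared records is omitted
    # Phase 2: write the TSR, then mark every record COMMITTED.
    coord_store.put_if_match(tx_id, {"state": "COMMITTED"}, 0)
    for key in write_set:
        etag, record = store.get(key)
        store.put_if_match(key, dict(record, state="COMMITTED"), etag)
    return True


store, coord = InMemoryStore(), InMemoryStore()
first = commit("tx1", {"a": (0, "x"), "b": (0, "y")}, store, coord)
second = commit("tx2", {"a": (0, "z")}, store, coord)  # stale version tag
```

In this run the first transaction commits both records, while the second fails in the prepare phase because the version tag it read for `a` is no longer current.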
5.10.5 Transaction abort
If the transaction commit operation has not been initiated, the abort operation is trivial. The record cache is cleared and the transaction is marked as aborted.
Algorithm 5 Abort transaction
1: function abort
2:     T.state ← ABORTED
3:     coord_datastore.write(T.TxID, T.record)
4:     for all (datastore, key, record) ∈ cache.keys() do
5:         if record.state = PREPARED then
6:             datastore.abort(key, record)
7:         end if
8:     end for
9:     return SUCCESS
10: end function
If some of the updated records have had the Datastore.prepare() method executed, then the transaction can still be aborted, provided the TSR has not been written to the transaction coordinating data store. In this case, the rollback is performed by issuing an abort on all prepared records. In a data store that does not support multi-version records, the rollback operation is performed by overwriting the record with the application data and metadata that are found in the metadata field Prev. In this situation, we roll back to the previous state of the current version, but we lose the former contents of the Prev field itself; these will never be needed, since the version we are restoring was itself COMMITTED.
Once the transaction status record has been written to the coordinating data store, the transaction cannot be aborted.
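The rollback of a prepared record in a single-version data store can be sketched in Python as follows. The dictionary layout of a record (`state`, `tx_id`, `value`, `prev`) and the `abort_record` function are hypothetical. Deleting a record that has no previous version is an assumption of this sketch and is not stated in the text.

```python
def abort_record(store, key, tx_id):
    """Roll back a record that tx_id left in the PREPARED state."""
    record = store.get(key)
    if record is None or record["state"] != "PREPARED" or record["tx_id"] != tx_id:
        return False                  # nothing for this transaction to undo
    prev = record["prev"]
    if prev is None:
        del store[key]                # assumption: the record was newly created
    else:
        # Restore the data and metadata saved in prev. The restored version
        # does not carry its own earlier prev, which is no longer needed.
        store[key] = dict(prev, prev=None)
    return True


store = {
    "k": {
        "state": "PREPARED",
        "tx_id": "t2",
        "value": "new",
        "prev": {"state": "COMMITTED", "tx_id": "t1", "value": "old", "prev": None},
    }
}
rolled_back = abort_record(store, "k", "t2")
```

After the call, the record under `k` is again the COMMITTED version written by `t1`.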
5.10.6 Transaction recovery
If an application fails during the commit process, individual data items may need to be recovered; this will happen lazily as relevant records are accessed by other transactions. When a data item is read, its transaction state is inspected. If it is in the COMMITTED state, recovery is not necessary. However, if it is in the PREPARED state, it is either rolled forward or rolled back, depending on the transaction status. The writer transaction’s URI is used to inspect the state of the Transaction Status Record. We roll the data record forward (that is, mark the record header with the COMMITTED state) if the writer did commit, as determined by the TSR. If the writer aborted, which is known either from the TSR or because the lease time has expired with no TSR, then we roll back the record, as described above for Transaction abort.
Algorithm 6 Recover transaction status
1: function recover(TxID)
2:     tx_record ← coord_datastore.read(TxID)
3:     if ∃ tx_record ∧ tx_record.state = COMMITTED then
4:         return true
5:     end if
6:     return false
7: end function
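The recovery decision described in this section can be summarised as a small Python function. The `Action` enumeration and the `recovery_action` function are hypothetical names; the TSR state is passed as a string, or as `None` when no TSR exists, and times are plain numbers. Treating a lease as expired once the current time reaches the lease time is an assumption of this sketch.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"                  # record already COMMITTED
    ROLL_FORWARD = "roll_forward"  # writer committed: mark the record COMMITTED
    ROLL_BACK = "roll_back"        # writer aborted: restore the previous version
    UNKNOWN = "unknown"            # writer may still be running; the reader fails


def recovery_action(record_state, tsr_state, now, lease_time):
    """Decide what a reader does with a record it has just read."""
    if record_state == "COMMITTED":
        return Action.NONE
    if tsr_state == "COMMITTED":
        return Action.ROLL_FORWARD
    if tsr_state == "ABORTED":
        return Action.ROLL_BACK
    # No TSR: the writer is presumed aborted only once its lease has expired.
    if now >= lease_time:
        return Action.ROLL_BACK
    return Action.UNKNOWN
```

For example, a PREPARED record with no TSR and an unexpired lease yields `Action.UNKNOWN`, which corresponds to the failed read of the pessimistic approach in Section 5.10.2.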