5.6 Exception handling support
5.6.2 Enhanced exception handling capabilities
As presented above, the C++ specification defines exception handling support only to the extent required by its new transactional constructs. The C++ specification regards the transaction abstraction mainly as a data synchronization mechanism. We argue that, by treating the transaction abstraction as a tool in the design of a programming language, it is possible to improve aspects of the language other than data synchronization. We illustrate this with exception handling in the Java language and propose to use a transaction block not only for data synchronization, but also for exception handling.
In this section, we only give an overview of the syntax and semantics of this augmented transaction block (later named atomic box) in order to complete the language extension specification in this chapter. This novel language extension is presented mainly in Chapter 7, where its syntax and semantics are described in detail, its usefulness is illustrated with several examples, its implementation is detailed and its performance is analyzed.
Our objective in using a transaction block for exception handling is to preserve data consistency. In general, when an exception is raised in an application, a subtask of the application is left unfinished and, hence, some data in the application is left in an inconsistent state. For a sequential application, such inconsistency only causes an abrupt and improper termination. For multi-threaded applications, however, it may cause incorrect executions: an exception can be raised in a thread handling shared data, leaving the shared data in an inconsistent state either permanently or temporarily until the inconsistency is corrected (if such a correction exists at all).
The inconsistency created by a raised exception can easily be avoided by executing the try block transactionally: a transaction can roll back, so the program can return to the consistent state that held before the try block, prior to executing the catch block. Conveniently, the abort-and-throw semantics automatically perform the rollback before the exception is propagated out of the transaction. Making abort-and-throw the default semantics for transactional try blocks prevents the application from executing on an inconsistent state, for both handled and unhandled exceptions.
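A minimal, single-threaded Java sketch of the abort-and-throw behavior described above follows. It assumes a hypothetical undo log (UndoLog) over one-element int arrays and a hypothetical helper AbortAndThrow.transactionalTry; it only shows that writes are undone before the exception leaves the block, not isolation between threads or the actual TM implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical undo log over one-element int arrays ("cells").
// Single-threaded illustration only: there is no isolation between threads.
final class UndoLog {
    private final Deque<Runnable> undo = new ArrayDeque<>();

    // Write a new value into the cell, remembering how to restore the old one.
    void write(int[] cell, int value) {
        final int old = cell[0];
        undo.push(() -> cell[0] = old);
        cell[0] = value;
    }

    // Undo all writes, most recent first.
    void rollback() {
        while (!undo.isEmpty()) {
            undo.pop().run();
        }
    }
}

final class AbortAndThrow {
    // Body of a transactional try block; it performs its writes through the log.
    interface Body {
        void run(UndoLog log) throws Exception;
    }

    // Abort-and-throw: if the body throws, roll back its writes first,
    // then let the same exception propagate to the caller.
    static void transactionalTry(Body body) throws Exception {
        UndoLog log = new UndoLog();
        try {
            body.run(log);
        } catch (Exception e) {
            log.rollback();
            throw e;
        }
        // Normal completion: the writes are simply kept ("commit").
    }
}
```

For example, a transfer that debits one cell and then fails before crediting the other leaves both cells at their original values, while the caller still receives the exception.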
To provide this functionality, we augment the transaction block of our language extension with an optional recover block that serves as a special kind of catch block for the transaction block. When the transaction block is used without a recover block, it keeps the semantics defined earlier. When it is used with the optional recover block, however, the transaction block becomes a transactional try block whose only catch block is the recover block, which catches any exception (whether raised according to commit-and-throw or abort-and-throw semantics) not handled within the transaction block. Note that the contents of the recover block are executed only after the transaction represented by the transaction block has been rolled back.
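The following sketch models the transactional try form in plain Java. CancelException, TransactionalTry.run and its undo log are hypothetical names chosen for illustration, not the implementation from Chapter 7; the point is the ordering: the body's writes are undone first, and only then does the recover action run, outside any transaction, so its own effects persist.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical stand-in for the CancelException given to a recover block;
// here it simply wraps the exception that escaped the transaction block.
class CancelException extends RuntimeException {
    CancelException(Throwable cause) { super(cause); }
}

final class TransactionalTry {
    // Minimal undo log over int[1] "cells" (illustrative, not thread-safe).
    static final class Tx {
        private final Deque<Runnable> undo = new ArrayDeque<>();

        void write(int[] cell, int value) {
            final int old = cell[0];
            undo.push(() -> cell[0] = old);
            cell[0] = value;
        }

        void rollback() {
            while (!undo.isEmpty()) {
                undo.pop().run();
            }
        }
    }

    interface Body { void run(Tx tx) throws Exception; }

    // Models "transaction { S } recover(CancelException e) { S' }":
    // an exception escaping S first rolls S back; S' then runs outside
    // the transaction, so whatever S' writes is permanent.
    static void run(Body body, Consumer<CancelException> recover) {
        Tx tx = new Tx();
        try {
            body.run(tx);
        } catch (Exception e) {
            tx.rollback();
            recover.accept(new CancelException(e));
        }
    }
}
```

In this sketch the recover action always observes the pre-transaction state, which is what allows it to make a consistent decision about how to proceed.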
Our language extension goes even further and allows transaction blocks to be named through parameters to the transaction keyword. Such naming describes the set of transaction blocks that should act together in a coordinated manner when an exception is not handled in any of the transaction blocks constituting the set. We call such a set of transaction blocks an atomic box. Raising an unhandled exception in a transaction block of an atomic box results in the rollback of all the transaction blocks of the atomic box, ensuring that the application state is always consistent.
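As a simplified illustration of the all-or-nothing rollback of an atomic box, the sketch below runs the member transaction blocks one after another on a single thread and keeps one undo log for the whole box. AtomicBoxSketch, Member and runAll are hypothetical; real atomic boxes coordinate transaction blocks running in concurrent threads and select recover blocks through <handlingContext>, neither of which this sketch models.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical, single-threaded sketch: members sharing one box name are
// rolled back together if any of them raises an unhandled exception.
final class AtomicBoxSketch {
    // One member transaction block of the box.
    interface Member { void run(AtomicBoxSketch box) throws Exception; }

    final String name;                       // plays the role of the "name" parameter
    private final Deque<Runnable> undo = new ArrayDeque<>();

    AtomicBoxSketch(String name) { this.name = name; }

    // Every member writes through the box, so the box can undo all of them.
    void write(int[] cell, int value) {
        final int old = cell[0];
        undo.push(() -> cell[0] = old);
        cell[0] = value;
    }

    // Returns true if all members completed (their writes are kept), or
    // false if an unhandled exception rolled back every member's writes.
    boolean runAll(List<Member> members) {
        try {
            for (Member m : members) {
                m.run(this);
            }
            undo.clear();
            return true;
        } catch (Exception e) {
            while (!undo.isEmpty()) {
                undo.pop().run();
            }
            return false;
        }
    }
}
```

The design choice to share a single log across members is what makes the rollback coordinated: a failure in the second member also undoes the first member's writes.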
The complete syntax of the transaction block incorporating the above functionalities is:
__transaction [ (<type>)|("name", <handlingContext>) ]
{ S }
[ recover(CancelException <exceptionName>) { S’ } ]
where transaction and recover are keywords; S and S’ are sequences of statements (which may include, among other statements, the transactional control flow keywords introduced by our language extension¹); <type> is the parameter specifying the transactional guarantee (atomic, irrevocable or elastic) explained in Section 5.2; name is a string that associates the transaction with the atomic box it belongs to; <handlingContext> is a keyword describing which recover blocks execute to perform recovery in the associated atomic box; and <exceptionName> is the parameter of the recover keyword. Optional parameters and structures are enclosed in square brackets: transaction may have no parameter, and the recover block is optional. Note that the transaction keyword takes either the <type> parameter or the name,<handlingContext> pair. The <type> parameter can only be used when there is no recover block (our implementation does not allow elastic or irrevocable transactional guarantees for a transaction block used with a recover block), whereas the name,<handlingContext> pair can only be used together with a recover block (without a recover block, this parameter pair is meaningless). This syntax results in three forms, which we describe as follows:
transaction form: This is the syntax in which the transaction block is used neither with the optional parameter pair name,<handlingContext> nor with the recover block (the <type> parameter is, however, allowed). This form corresponds to the language extension of the transaction block without built-in exception handling support.
transactional try form: This is the syntax in which the recover block is used with the transaction block, without the optional parameter pair name,<handlingContext> (the <type> parameter is not allowed in this form). This form corresponds to the augmented transaction block, where the attached recover block serves as a catch block. In this form, irrevocable statements in the transaction block are forbidden, since they would contradict the ability to roll back before performing the recovery actions of the recover block.
¹ In the current implementation, one restriction applies to the sequence of statements S: when a transaction block is used with an accompanying recover block, S cannot include irrevocable statements.
atomic box form: This is the syntax in which the transaction block is used with both the optional parameter pair name,<handlingContext> and the recover block. This form is similar to the transactional try form, but the optional parameters enable the full potential of the language extension. This form defines atomic boxes, whose members can handle exceptions in a coordinated manner using the code in their respective recover blocks. As with the transactional try form, irrevocable statements in the transaction block are forbidden in this form.
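The parameter rules above can be summarized as a small decision function. The Java sketch below is a hypothetical encoding, not part of the specification or its implementation: it takes flags for the presence of <type>, of the name,<handlingContext> pair, of a recover block and of irrevocable statements in S, and returns the resulting form or rejects the combination.

```java
// Hypothetical encoding of the parameter rules for the three forms.
enum Form { TRANSACTION, TRANSACTIONAL_TRY, ATOMIC_BOX }

final class FormRules {
    static Form classify(boolean hasType, boolean hasNameAndContext,
                         boolean hasRecover, boolean hasIrrevocableInS) {
        // The keyword takes either <type> or the pair, never both.
        if (hasType && hasNameAndContext) {
            throw new IllegalArgumentException(
                "<type> and name,<handlingContext> cannot be combined");
        }
        if (!hasRecover) {
            if (hasNameAndContext) {
                throw new IllegalArgumentException(
                    "name,<handlingContext> is meaningless without a recover block");
            }
            return Form.TRANSACTION;  // <type> and irrevocable statements allowed
        }
        if (hasType) {
            throw new IllegalArgumentException(
                "<type> is not allowed with a recover block");
        }
        if (hasIrrevocableInS) {
            throw new IllegalArgumentException(
                "S cannot contain irrevocable statements when a recover block is present");
        }
        return hasNameAndContext ? Form.ATOMIC_BOX : Form.TRANSACTIONAL_TRY;
    }
}
```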
The transactional try and atomic box forms of the transaction block provide cleaner exception handling semantics than the transaction form. These two novel forms cleanly separate exceptions requiring commit-and-throw semantics from those requiring abort-and-throw semantics:
Exceptions requiring commit-and-throw semantics are handled in catch blocks appearing inside the transaction block. Such exception handling code therefore executes in a transactional context, which makes sense since the exceptional state of the system is valid only in that context. In other words, if an exception requiring commit-and-throw semantics is raised in a transaction, it must be handled in the transaction block. This way, if the transaction commits, the corresponding exception handling actions are committed too; otherwise, neither the transaction nor its exception handling actions appear to have executed.
Exceptions requiring abort-and-throw semantics are handled in the recover block. The recover block is executed only after the transaction has aborted, so its content executes outside the transactional context. This also makes sense for abort-and-throw semantics, since the recover block should modify the system state permanently, either for a later re-execution of the transaction block (on a corrected state) or for continuing with the code following the transaction block.
An additional advantage of the transactional try and atomic box forms is that they offer a safer solution for unhandled exceptions: an exception not handled in a transaction block is raised according to abort-and-throw semantics and is treated in the recover block, even if the code raising it inside the transaction required commit-and-throw semantics. This choice is deliberate: it preserves data consistency by rolling back the transaction whenever an exception is left unhandled.
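The sketch below contrasts the two kinds of handling using one hypothetical helper, RecoverAndRetry.run (not an API from this thesis). An exception caught inside the body is handled transactionally, so the handler's writes commit or roll back together with the rest of the attempt; an exception escaping the body causes a rollback, a recover action outside the transaction, and a re-execution on the corrected state. The retry loop and the attempt limit are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper contrasting in-transaction handling with recovery.
// Single-threaded illustration; the names are not from the thesis.
final class RecoverAndRetry {
    static final class Tx {
        private final Deque<Runnable> undo = new ArrayDeque<>();

        void write(int[] cell, int value) {
            final int old = cell[0];
            undo.push(() -> cell[0] = old);
            cell[0] = value;
        }

        void rollback() {
            while (!undo.isEmpty()) {
                undo.pop().run();
            }
        }
    }

    interface Body { void run(Tx tx) throws Exception; }
    interface Recover { void run(Exception cause); }

    // Exceptions caught inside the body are ordinary in-transaction handling.
    // An exception escaping the body aborts the attempt: all of its writes
    // (including handler writes) are undone, the recover action runs outside
    // the transaction, and the body is re-executed. Returns the attempt count.
    static int run(Body body, Recover recover, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Tx tx = new Tx();
            try {
                body.run(tx);
                return attempt;      // commit: keep the writes
            } catch (Exception e) {
                tx.rollback();       // abort first ...
                recover.run(e);      // ... then recover on the restored state
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

In a run where the first attempt exceeds a capacity limit and the recover action enlarges that limit, the handler counter incremented inside the first attempt is rolled back with it, and only the second, committed attempt's writes remain.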
5.7 Summary
In this chapter, we have presented our complete Java language extension specification for transactional behavior. This specification allows the programmer to use the transaction abstraction for exception handling and transactional control flow as well as for synchronization. Wherever possible, the specification is contrasted with the existing C++ specification [12] (also discussed in Appendix C).
The syntax of the extension is kept as simple as possible while still providing flexibility. It supports multiple transactional semantics for a transaction block in a simple manner, which makes it easy to extend the specification with further transactional semantics. The syntax also requires minimal effort from the programmer to use irrevocable statements in transaction blocks. Alternative execution paths are proposed as part of the specification; they introduce simple-to-use control flow constructs that let the programmer provide alternative solutions for a given functionality, and they also provide a natural environment for speculative execution. The specification allows the programmer to control whether a transaction should be committed or aborted when an exception is raised. Moreover, it proposes a safe, coordinated exception handling mechanism spanning multiple threads. This mechanism is safe because the rollback ability of the transaction abstraction can bring the application's shared state back to a consistent state upon an exception. The mechanism also allows multiple threads to coordinate in handling an exception concerning shared or global state.
With all these features provided together in a simple syntax, we hope our language extension can be useful, if not as is, then at least as inspiration for similar specifications.
The implementation study we performed for the specification explained in this chapter is presented in Chapters 6 and 7. Chapter 6 presents the implementation details of the major part of the specification, excluding the enhanced exception handling features (i.e., atomic boxes). The implementation of the atomic box syntax and semantics is presented in Chapter 7. A realization of the full specification, merging the implementations from Chapters 6 and 7, is left as future work.
Automatic Transactification of the TM Language Extension
The language specification presented in Chapter 5 is described at the programming language level; hence, it is simple and intuitive for programmers to use it to introduce transactional behavior into their programs. However, the actual transactional behavior is implemented by a TM library whose interface is a collection of function calls (e.g., start, commit, read, write). Hence, a necessary step in integrating TMs into a programming language is transactification, i.e., transforming code written according to the language extension specification into code composed of TM library calls.
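To make transactification concrete, the sketch below shows a toy STM exposing start, read, write and commit calls, together with the code that a block such as transaction { counter = counter + 1; } might be turned into. ToyStm, TVar and Transactified are hypothetical names, not the TM library used in this thesis. The toy STM buffers writes and validates read versions under a single global lock at commit time; it provides no opacity or contention management, so it only illustrates the shape of transactified code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical transactional variable: a value plus a version number.
final class TVar {
    int value;
    long version;
    TVar(int value) { this.value = value; }
}

// Hypothetical toy STM. Writes are buffered per transaction; commit
// validates read versions and publishes writes under one global lock.
final class ToyStm {
    private static final Object LOCK = new Object();

    static final class Tx {
        final Map<TVar, Long> readVersions = new HashMap<>();
        final Map<TVar, Integer> writes = new HashMap<>();
    }

    static Tx start() { return new Tx(); }

    static int read(Tx tx, TVar v) {
        Integer buffered = tx.writes.get(v);
        if (buffered != null) return buffered;           // read own write
        synchronized (LOCK) {
            tx.readVersions.putIfAbsent(v, v.version);   // first-seen version
            return v.value;
        }
    }

    static void write(Tx tx, TVar v, int value) { tx.writes.put(v, value); }

    // Returns false (the caller must re-execute) if a variable read by tx
    // was changed by another committed transaction in the meantime.
    static boolean commit(Tx tx) {
        synchronized (LOCK) {
            for (Map.Entry<TVar, Long> e : tx.readVersions.entrySet()) {
                if (e.getKey().version != e.getValue()) return false;
            }
            for (Map.Entry<TVar, Integer> e : tx.writes.entrySet()) {
                e.getKey().value = e.getValue();
                e.getKey().version++;
            }
            return true;
        }
    }
}

// What "transaction { counter = counter + 1; }" could be transactified into.
final class Transactified {
    static void increment(TVar counter) {
        while (true) {
            ToyStm.Tx tx = ToyStm.start();
            int c = ToyStm.read(tx, counter);
            ToyStm.write(tx, counter, c + 1);
            if (ToyStm.commit(tx)) return;   // committed
            // validation failed: re-execute the transaction body
        }
    }
}
```

The retry loop around the library calls is the part a transactifier has to generate; the programmer only writes the transaction block.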
For the initial TM prototypes, transactification was generally performed manually by programmers. Integrating TMs into a programming language, however, requires that transactification be done automatically. In this chapter, we present our automatic transactification solution and detail how it is applied to the Java language extension specification presented in Chapter 5. In describing our solution we mainly focus on STMs, although most concepts also apply to HTMs (for HTMs, library calls correspond to special instructions, or to short functions using these special instructions).
The chapter is organized as follows. We first describe existing automatic transactification methods. We then present our proposed solution for automatic transactification in the Java language, followed by its implementation details.
6.1 Automatic transactification methods
Native compiler transactification: This is the most natural transactification approach, in which the compiler interprets the transactional constructs and generates the corresponding code directly. This approach has been adopted mostly for the C/C++ languages. Examples include the Intel-STM compiler [140, 3], the Sun C++ Compiler with TM extensions [5], the Dresden TM Compiler [35, 1] and gcc-tm [160, 2]. All these compilers follow the draft specification for C++ [12] (which is similar to the language specification proposed in Chapter 5 except for the transactional control flow and exception handling features).
Source-to-source transformation: The input code is preprocessed, and the newly introduced transactional constructs are replaced with code that implements their semantics using only existing programming language statements. The first known effort to perform such transactification is AtomJava [101]. However, AtomJava produces lock-based code that provides atomicity semantics rather than using optimistic concurrency as TMs do. It introduces an extra instance field in every class and relies on this field to maintain an object-based locking scheme. This scheme is shown to reduce lock access overhead, but it imposes a memory overhead that affects not only transactional objects but all application objects. Source-to-source transformation simplifies the development of TM-based programs, but it makes debugging the original code harder for the programmer. Ziarek et al. [197] also use source-to-source transformation in their implementation, although their objective is to replace Java monitors with transactions.
Bytecode instrumentation: In this approach, the input code does not contain any new programming language constructs, but rather special annotations expressing transactional behavior. These approaches transform Java bytecode at load time (using BCEL/ASM-like libraries or Aspect-Oriented Programming (AOP)). Examples of this approach are LSA-STM [153], Deuce [111] and Multiverse [4]. This approach is elegant in the sense that no compiler modification is needed, yet transactional behavior is available to the programmer. The downside is the limited flexibility in the transactional language features that can be offered, since additional behavior is expressed through annotations only.
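Annotation-driven approaches let the programmer mark methods as transactional without any new syntax. The sketch below declares a hypothetical @Atomic annotation (Deuce and similar tools define their own) and, instead of rewriting bytecode at load time, intercepts calls through a java.lang.reflect.Proxy and runs annotated methods under one global lock. This is a swapped-in technique: it is neither bytecode instrumentation nor an STM, and it only shows how annotations select the added behavior.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical marker annotation; real tools such as Deuce define their own.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Atomic {}

interface Account {
    void deposit(int amount);
    int balance();
}

final class SimpleAccount implements Account {
    private int balance;

    @Atomic
    public void deposit(int amount) { balance += amount; }

    public int balance() { return balance; }
}

// Stand-in for load-time instrumentation: a dynamic proxy that runs methods
// annotated with @Atomic under one global lock (no bytecode rewriting).
final class AtomicInterceptor {
    private static final Object GLOBAL = new Object();

    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] {iface},
            (proxy, method, args) -> {
                // Look up the implementing method to read its annotations.
                Method impl = target.getClass()
                    .getMethod(method.getName(), method.getParameterTypes());
                try {
                    if (impl.isAnnotationPresent(Atomic.class)) {
                        synchronized (GLOBAL) {
                            return impl.invoke(target, args);
                        }
                    }
                    return impl.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause();  // rethrow the method's own exception
                }
            });
    }
}
```

The programmer's code stays plain Java with annotations, which mirrors the elegance noted above, and also its limitation: the behavior that can be attached is restricted to what an annotation on a method can express.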
Just-in-time compilation: This approach delays the transactification of the code inside a transaction until runtime. The transaction start and commit are expressed with dedicated function calls. In C/C++, these calls can be hidden behind macros to give the impression that the transactional code is enclosed in a block marked with a language-level keyword. The function calls allow a just-in-time compiler to switch modes so that, inside a transaction, it instruments the code on the fly to generate transactified code, without modifying or adding any code outside the transaction. This approach has been used by JudoSTM [143]. Its advantage is the possibility to alter behavior at runtime and thus optimize transactional execution. One limitation, however, is that it is language dependent (it relies on macros to provide the transactional block in C/C++) and is not flexible in the transactional language features it can provide. Moreover, an important downside of just-in-time compilation is the runtime overhead of instrumentation, especially when there are more concurrent threads than cores.