4.9 Extensions
4.9.2 Control Flow Specification Language
So far, we have been specifying the (static and dynamic) semantics of programming languages by graph transformation rules in a manual fashion. When considering more exotic programming language constructs (and combinations thereof), the flow of control through a program can become rather complex and highly non-trivial. Listing 4.4 shows a small example code pattern with non-trivial control flow.
while (true) { try { ... break; ... } finally { ... } }
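The following minimal Java sketch makes the pattern of Listing 4.4 concrete. The class and method names are hypothetical, and the elided parts of the listing are replaced by trace statements; the sketch only illustrates that the finally block is executed after the break and before control leaves the loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the pattern in Listing 4.4.
class BreakInTryFinally {

    // Records the order in which control passes through the pattern.
    static List<String> trace() {
        List<String> events = new ArrayList<>();
        while (true) {
            try {
                events.add("try");
                break;                  // abrupt termination of the loop body
            } finally {
                events.add("finally");  // still runs before control leaves the loop
            }
        }
        events.add("after loop");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

The break does not transfer control directly to the statement following the loop; the finally block intervenes, which is the kind of indirection a control flow specification has to capture.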
In another master project [175, 174] carried out under our supervision, we have developed the Control Flow Specification Language (cfsl, for short). The cfsl is a specification language in which one can specify the control flow semantics of programming languages that feature many of the usual programming constructs. It supports the specification of control flow semantics for advanced constructs such as for and while statements containing break or continue statements, and even more complex programming constructs such as try-catch-finally as available in, e.g., Java.
Fig. 4.36 gives an overview of the processes involved in this project. In the role of a meta language designer, we have defined the (concrete and abstract) syntax of the cfsl and its semantics, again, through a set of graph transformation rules, denoted cfsl-sem in Fig. 4.36. The basic idea behind the cfsl is that the control flow semantics of the Language Under Development (Lud) are specified as Control Flow Specification Graphs (cfsgs, for short). Every cfsg is an instance of the cfsl meta model and conforms to the abstract syntax meta model of the Lud. The set of cfsgs then has to be specified by the language designer who defined the Lud and its semantics.
From the set of all cfsgs (i.e., one for each language construct) and the transformation rules in cfsl-sem, Groove generates a set of transformation rules, denoted lud-flow, which specifies the control flow semantics of the Lud. Stated differently, given a single cfsg of some language construct, say C, and the set of transformation rules cfsl-sem, Groove generates a (possibly singleton) set of graph production rules which specifies the construction of control flow graphs for any C-instance in the abstract syntax graph of the Lud program. Applying the transformation rules in lud-flow to the Abstract Syntax Graph of some Lud program P then results in the corresponding pg of P. Whether the pgs obtained this way are subject to further analysis, such as simulation, is outside the scope of this master project, as indicated by the dashed arrow in the figure.
CFSL Meta-model
In this project we distinguish three types of control flow, namely sequential, conditional, and disruptive flow. Each type of control flow is facilitated through one or several constructs in the cfsl. All available elements in the cfsl and their associations are shown in the meta-model of the cfsl (Fig. 4.37), i.e., the model to which every cfsg has to conform. We will briefly discuss the most important concepts and relate them to the different types of control flow just mentioned.
Chapter 4. Semantics Through Graph Transformations
[Diagram. Labels: GROOVE Tool Set; CFSL-SEM; LUD-FLOW; Control Flow Specification Graph; CFSL Meta Model; LUD Abstract Syntax Meta Model; LUD Abstract Syntax Graph; LUD Program Graph. Processes: flow graph rules construction; flow graph construction. Roles: meta language designer (defines); language designer (specifies). Relations: instance of; conforms to.]
Figure 4.36: From cfsl specifications to control flow graphs.
Every cfsg specifies the control flow semantics for a specific language construct; language constructs are represented by AbstractSyntaxElements (in the sequel referred to as ASE). The ASE can be seen as a generic node representing all elements of the ASG (which all happen to be nodes) to which control can be transferred during execution; it is similar to the FlowElement in Taal. As a rule of thumb, we can say that for every type of ASE we specify a single cfsg. The outgoing entry- and exit-edges identify the points at which the actual execution of that ASE starts and ends, respectively.
[Class diagram. Classes: AbstractSyntaxElement, Branch, Exit, Abort, PrimitiveValue. Association ends: reason (1..*), abort (0..1), exit (0..1), branchOn (0..1), resumeAbort (*), abortFrom (*), condition, branch (*), branchDefault (0..1), entry (0..1), exit (0..1), flow (0..1), KeyElement (0..1).]
Figure 4.37: The meta-model of the cfsl.
Sequential flow represents the type of control flow that reflects the appearance of statements in the program. Statements that are executed one after the other are connected by edges labelled flow. Conditional flow refers to the type of control flow that is based on the evaluation of some expression, as in the case of the while statement. For each potential outcome of the evaluation of the expression, we introduce a Branch-node. Those Branch-nodes are connected (1) to the KeyElement through a branch-edge, (2) to the original expression by means of a condition-edge, and (3) to the corresponding value with a branchOn-edge. The branchDefault-edge represents the branch that is taken when no other branch applies. The last type of control flow is disruptive flow. A statement is said to introduce disruptive flow of control if it is the cause of an abrupt termination (in contrast to what is often called successful termination). Disruptive flow is modelled through an Abort node to which control flows along an abort-edge. Every Abort node has a reason-edge pointing to the ASE that caused the abrupt termination. In some cases of disruptive flow it cannot immediately be determined at which statement to continue the program. For these situations we introduce the abortFrom- and resumeAbort-edges. The details of how this is handled are beyond the scope of this thesis; the interested reader is referred to [174].
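As a reading aid, the edge labels introduced above can be grouped by the type of control flow they serve. The Java sketch below is only an illustration: the enum names and the lookup method are hypothetical, and placing the entry- and exit-edges in a separate STRUCTURAL group is an assumption of this sketch, not part of the cfsl.

```java
// Hypothetical grouping of the cfsl edge labels named in the text.
enum FlowKind { STRUCTURAL, SEQUENTIAL, CONDITIONAL, DISRUPTIVE }

enum CfslEdge {
    // entry/exit mark where execution of an ASE starts and ends;
    // treating them as STRUCTURAL is an assumption of this sketch.
    ENTRY("entry", FlowKind.STRUCTURAL),
    EXIT("exit", FlowKind.STRUCTURAL),
    FLOW("flow", FlowKind.SEQUENTIAL),
    BRANCH("branch", FlowKind.CONDITIONAL),
    CONDITION("condition", FlowKind.CONDITIONAL),
    BRANCH_ON("branchOn", FlowKind.CONDITIONAL),
    BRANCH_DEFAULT("branchDefault", FlowKind.CONDITIONAL),
    ABORT("abort", FlowKind.DISRUPTIVE),
    REASON("reason", FlowKind.DISRUPTIVE),
    ABORT_FROM("abortFrom", FlowKind.DISRUPTIVE),
    RESUME_ABORT("resumeAbort", FlowKind.DISRUPTIVE);

    final String label;
    final FlowKind kind;

    CfslEdge(String label, FlowKind kind) {
        this.label = label;
        this.kind = kind;
    }

    // Looks up the flow kind for an edge label as it appears in a cfsg.
    static FlowKind kindOf(String label) {
        for (CfslEdge e : values()) {
            if (e.label.equals(label)) {
                return e.kind;
            }
        }
        throw new IllegalArgumentException("unknown cfsl edge label: " + label);
    }
}

class CfslEdgeDemo {
    public static void main(String[] args) {
        System.out.println(CfslEdge.kindOf("branchOn"));
    }
}
```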
Example CFSG
We will now give an impression of how the control flow semantics of the while statement in Java would be specified in cfsl. For this, we start at the very beginning, namely the BNF rule for the while statement. We require that the grammar is enriched with role names for the different elements that form the context of a while statement; those role names will reappear in the cfsg. The WhileStatement BNF rule could then look as follows:
WhileStatement ::= <WHILE> <LPAR> condition:Expression <RPAR>
                   body:Statement
The elements <WHILE>, <LPAR>, and <RPAR> represent the while-keyword and the parentheses enclosing the condition, and are only part of the concrete syntax; they will not be part of the ASG. This BNF rule gives rise to a local ASG as shown in Fig. 4.38.
Figure 4.38: Local ASG of the while statement.
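For readers who prefer code to diagrams, the local ASG of the while statement can be approximated by a node type whose outgoing edges carry the role names from the BNF rule. The following Java sketch is an illustration under that assumption; BoolLiteral, EmptyStatement and LocalAsgDemo are hypothetical helpers that only serve to build an example instance.

```java
// Marker types standing in for the ASG node types named in the BNF rule.
interface Expression {}
interface Statement {}

// Local ASG of a while statement: one node with two role-named outgoing edges.
final class WhileStatement implements Statement {
    final Expression condition; // role name "condition" from the BNF rule
    final Statement body;       // role name "body" from the BNF rule

    WhileStatement(Expression condition, Statement body) {
        this.condition = condition;
        this.body = body;
    }
}

// Hypothetical leaf nodes, only needed to build an example instance.
final class BoolLiteral implements Expression {
    final boolean value;

    BoolLiteral(boolean value) {
        this.value = value;
    }
}

final class EmptyStatement implements Statement {}

class LocalAsgDemo {
    // Builds the ASG fragment for a loop with a literal condition and an empty body.
    static WhileStatement example() {
        return new WhileStatement(new BoolLiteral(true), new EmptyStatement());
    }

    public static void main(String[] args) {
        WhileStatement w = example();
        System.out.println(w.condition.getClass().getSimpleName()
                + " / " + w.body.getClass().getSimpleName());
    }
}
```

Note that the keyword and the parentheses have no counterpart in the sketch, in line with the remark that they belong to the concrete syntax only.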
The cfsg of the WhileStatement, which is shown in Fig. 4.39, includes elements of all three types of control flow. First of all, the entry and the exit of the execution of the WhileStatement are specified. The execution of a WhileStatement starts with evaluating its condition. Thus, the node pointed to by the condition-edge will be the WhileStatement’s entry. Its exit is represented by an unlabelled node. If the WhileStatement terminates successfully, control will flow to this node. The sequential flow is specified by the flow-edges from Expression to WhileStatement and from Statement to Expression. That is, after evaluating the condition, control will always flow to the WhileStatement, and after executing the body, control will always flow to the condition, which will then be reevaluated.
The conditional flow is specified by means of two Branch nodes, since the Expression has exactly two possible outcomes, namely true and false. On true, the body is executed another time; on false, the WhileStatement has terminated successfully and control will flow to its exit.
Figure 4.39: cfsg for the Java while statement.
There are cases in which the execution of the body of a WhileStatement is disrupted. We will discuss two of those cases, namely the execution of a break and of a continue statement. This is specified by two Abort nodes having incoming abortFrom-edges originating from the body. An Abort node must have an outgoing reason-edge by which it keeps track of the AbstractSyntaxElement that caused the disruptive flow. For the WhileStatement, both the break and the continue statement refer to the body as the reason for abruptly terminating the WhileStatement. Both cases, however, differ in the way control is specified to continue. A break statement causes control to flow to the exit of the WhileStatement, whereas a continue statement directs control to the condition, which will then be reevaluated to determine whether the body of the WhileStatement has to be executed (at least) once more.
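The control flow described for the while statement can be summarised as a small transition table. The Java sketch below is a flattened approximation and not cfsl notation: the names Point, Outcome, next and run are hypothetical, and the intermediate step through the WhileStatement node (where the branching takes place) is collapsed into the condition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical flattening of the while-statement cfsg into a transition table.
class WhileFlowSketch {
    enum Point { CONDITION, BODY, EXIT }
    enum Outcome { TRUE, FALSE, NORMAL, BREAK, CONTINUE }

    // Next control point, following the prose description of the cfsg.
    static Point next(Point at, Outcome outcome) {
        switch (at) {
            case CONDITION:
                if (outcome == Outcome.TRUE) return Point.BODY;          // Branch on true
                if (outcome == Outcome.FALSE) return Point.EXIT;         // Branch on false
                break;
            case BODY:
                if (outcome == Outcome.NORMAL) return Point.CONDITION;   // flow-edge
                if (outcome == Outcome.CONTINUE) return Point.CONDITION; // Abort, resume at condition
                if (outcome == Outcome.BREAK) return Point.EXIT;         // Abort, resume at exit
                break;
            default:
                break;
        }
        throw new IllegalArgumentException(outcome + " is not possible at " + at);
    }

    // Replays a sequence of outcomes, starting at the entry (the condition).
    static List<Point> run(List<Outcome> outcomes) {
        List<Point> visited = new ArrayList<>();
        Point at = Point.CONDITION;
        visited.add(at);
        for (Outcome o : outcomes) {
            at = next(at, o);
            visited.add(at);
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(
                Outcome.TRUE, Outcome.CONTINUE, Outcome.TRUE, Outcome.BREAK)));
    }
}
```

In this table the continue and the normal completion of the body lead to the same control point; in the cfsg they remain distinguishable because only the former passes through an Abort node with a reason-edge.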
4.9.3 Remarks
The extensions discussed in this section point out that, on the one hand, the approach applied for specifying the semantics for Taal is general and can in principle be applied to any programming language. On the other hand, we have shown that the approach can also be applied on a higher level of abstraction. That is, the cfsl can be regarded as a meta-language for which we have specified the semantics in terms of graph transformation rules by hand. Whereas, usually, a graph production system generates graphs, the full transformation of a cfsg results in a set of final graphs which should be interpreted as graph transformation rules. Here we encounter another advantage of the Groove approach, in which graph transformation rules are specified as graphs themselves.
4.10 Conclusion
4.10.1 Summary
In this chapter we have shown how to use graph transformations for specifying the control flow and operational semantics of programming languages. The graph transformation formalism offers a number of advantages. First, the visual presentation of the graph transformation rules provides an intuitive understanding of the semantics. Second, formal verification techniques become available. Furthermore, the graph transformation rules offer the possibility to include in one mathematical structure, the graph, information on both the run-time system and the program that is being executed. Traditional approaches to operational semantics (e.g. [198, 24, 1, 150, 25, 49]) often need to resort to including run-time concepts in the syntax definition, e.g. the concept of location to indicate a value that may change over time. This seems to be an artificial manner of integrating parts of the language definition, i.e. of the abstract syntax and the semantic domain, that can be avoided using graph transformation rules. Finally, in graph transformation rules, context information can be included more naturally and uniformly than, for example, when using SOS-rules [151, 198].
The example language Taal that we have developed comprises some of the fundamental aspects of object-oriented programming languages, like inheritance, including dynamic method look-up, and object creation. The structure of our solution, and the ease with which it can be applied to other languages as shown in Section 4.9, make us confident that the approach can be extended to real-life software languages in the object-oriented paradigm:
• All the transformation steps (parsing, static analysis, flow generation and simulation) are structured according to the concepts in the abstract syntax. This lends a modularity to the definitions that is independent of the language being defined.
• The structure of the Flow and Execution Graphs is generic, in the sense that the elements therein are not specific to Taal; rather, they capture the essential aspects of imperative, object-oriented languages.