In the previous section, the notion of term graph helped us make sense of the idea of the “same” sub-computation or value existing in different computations. To “change” a node is to retain it in the new state, but label it with a different constructor, or change the outgoing edges in which it participates. We now show that reuse of nodes also arises naturally within a state if computations and values are represented as term graphs.
Consider functional computations that operate on structured values such as lists and trees. These programs take values apart, via pattern-matching and projection, and assemble new ones, via construction. When these assembly and disassembly actions take place, the sub-graphs representing values are automatically reused. For example, when projecting a component out of a structured value, we return it “by reference”, and similarly when we construct such a value, we include its components “by reference”. Sharing also takes place when a computation simply returns a value constructed elsewhere: when evaluating a variable, we return “by reference” the value to which it is bound, and for a conditional expression, we return “by reference” the value computed by the selected branch. This kind of natural sharing is a consequence of dataflow, the paths that values take through the computation.
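The four cases above (projection, construction, variable reference, conditional) can be illustrated with a minimal sketch in Python, using object identity as a stand-in for node identity in a term graph. The names Node, project, construct and conditional are hypothetical and chosen only for this illustration; they are not part of LambdaCalc.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: nodes are compared by identity, like graph nodes
class Node:
    constructor: str
    children: tuple = ()


def project(node, i):
    """Projection returns the i-th child itself ("by reference"), not a copy."""
    return node.children[i]


def construct(constructor, *children):
    """Construction creates one new node whose edges point at existing children."""
    return Node(constructor, children)


def conditional(test, then_value, else_value):
    """A conditional returns the value of the selected branch unchanged."""
    return then_value if test else else_value


key = Node("7")
name = Node('"claire"')
pair = construct("Pair", key, name)

extracted = project(pair, 1)             # disassembly: same node as `name`
wrapped = construct("Some", extracted)   # assembly: new node, shared child
chosen = conditional(True, wrapped, Node("None"))

assert extracted is name
assert wrapped.children[0] is name
assert chosen is wrapped
```

Only construct allocates a new node; every other step passes along a node that already exists, which is the sense in which sharing follows from dataflow.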
Implementations of mature functional languages often use term graphs, because they tend to be based on imperative languages with pointers in which graphs are easy to implement. The kind of reuse described above is then the default behaviour, in that when mentioning a node, one must explicitly copy its sub-graph to avoid sharing its representation. With lazy languages, sharing has an additional role to play in avoiding the duplication of delayed computations [Wad71]. In these cases, sharing is technically an optimisation because it is unobservable at the level of the computation: one cannot write a program whose abstract behaviour differs depending on whether or not values are shared.
In LambdaCalc, the situation is slightly different, because the paths taken by values through the computation are made explicit to the user. (We will see this more comprehensively in §2.5 below.) These paths are determined by the semantics of the language, and establish a relation of synonymy between values. This synonymy in turn serves as the basis of a visualisation technique with a dual role: enabling more compact views, and helping the user understand a computation by revealing how it decomposes and composes values. This is not an “optimisation” particular to the implementation but rather an aspect of the semantics exposed by the tool.
Let us consider an example. In Figure 2.8(a), the user initially sees the value Some("claire"). As before, the rectangular tab to the left of the value pane indicates that the value has an explanation, but in this example the explanation is completely hidden. (It would be more consistent to display a single ellipsis for the empty partial explanation, but for compactness we omit it.) This illustrates a presentation option which might for example be enabled by default in a distributed setting whenever a value is computed remotely.
Now suppose the user wants to understand the provenance of that value, for example why the string "claire" appears here. They would start by revealing the partial explanation shown in (b). This partial explanation indicates that the result was computed by applying a function called lookup to two arguments: an integer 7, and a binary tree containing some data. Suppose that here the user knows that the nodes of the tree store (k, s) pairs and are sorted by integer keys k, and that lookup returns either Some(s), where s is the string associated with k in the tree, or None if k is not found. What the partial explanation tells them is that the key 7 was found in the tree, but also that the value "claire" is not just equal to, but is identical to (synonymous with) the occurrence of the same string in the input tree. This is indicated by the dotted line pointing back from the output to the occurrence of that string in the input. Since this notion of synonymy coincides with the sharing that arises naturally as a consequence of dataflow, we call these sharing links.
However, although these links do represent a kind of sharing, an implementation of our approach may internally choose to store data quite differently from the arrangement implied by the sharing links. In a distributed setup, values might be “shared” in this abstract sense but duplicated in the implementation; and conversely, values might be “distinct” in this abstract sense but equal and therefore able to share a representation. Such implementation choices are not part of the abstract operational model, and therefore would be invisible to a user of the system. (It may occasionally be useful to see details of the underlying language implementation too, but that is not our goal here; throughout this thesis we will only be concerned with execution with respect to a reference semantics.)
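To make the distinction between "equal" and "identical" concrete, the lookup example of Figure 2.8 can be mimicked in Python. This is only a sketch under stated assumptions: the classes Str, Pair, Branch and Some are hypothetical stand-ins for the values in the figure, Python's None plays the role of both the Empty tree and the None result, and Python object identity merely models the abstract synonymy relation. As noted above, an actual implementation is free to store the data differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Str:
    text: str


@dataclass
class Pair:
    fst: int
    snd: Str


@dataclass
class Branch:
    left: "Optional[Branch]"   # None stands for the Empty tree
    kv: Pair
    right: "Optional[Branch]"


@dataclass
class Some:
    value: Str


def lookup(k, t):
    """Search a tree sorted by integer key; return Some(s), or None if absent."""
    if t is None:
        return None
    if k < t.kv.fst:
        return lookup(k, t.left)
    if k > t.kv.fst:
        return lookup(k, t.right)
    # Keys are equal: wrap the very node stored in the tree, without copying it.
    return Some(t.kv.snd)


claire = Str("claire")
tree = Branch(
    Branch(None, Pair(3, Str("simon")), None),
    Pair(4, Str("john")),
    Branch(Branch(None, Pair(6, Str("sarah")), None), Pair(7, claire), None),
)

result = lookup(7, tree)
assert result.value is claire           # identical: this is the sharing link
assert Str("claire") == claire          # a fresh node is equal...
assert Str("claire") is not claire      # ...but not identical
```

The first assertion corresponds to the dotted line in (b): the string in the output is the same node as the one in the input tree, not merely one with the same contents.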
The sharing links become more informative as the user exposes more of the computation. In (c), they expand the ellipsis to reveal the body of lookup. We see that the tree passed to lookup is bound to the parameter t and then immediately pattern-matched as a Branch node. (For this example, we have suppressed the presentation of dead branches.) Pattern-matching binds the variables t1, kv and t2 to the components of t, establishing further sharing links. But note how the visualisation shows the components of t pointing to the values of the three variables bound to them, rather than the other way around. The rule is that the first occurrence of a value in the visualisation, with respect to a postorder traversal of the view structure, is the one that is actually rendered, with any other occurrences then being rendered as sharing links. This (admittedly simplistic) convention means that the user can observe sub-values propagate into the computation via pattern-matching and projection, and propagate out of the computation via construction.
[Figure 2.8: Exploring a computation to reveal value assembly and disassembly. Panels: (a) Explanation completely hidden; (b) Partly expanded; (c) Browsing into body of lookup; (d) Partial explanation of GT; (e) Browsing into recursive call.]
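The first-occurrence rule described above can be sketched as follows. This is an illustrative reading of the convention, not the tool's actual rendering code: the View and Val classes, the labels, and the function names are all hypothetical, and the view structure is reduced to a bare tree whose nodes may each display one value.

```python
from dataclasses import dataclass, field


class Val:
    """A value node; identity (not contents) decides synonymy."""
    def __init__(self, text):
        self.text = text


@dataclass
class View:
    label: str
    value: object = None                  # the value shown at this position, if any
    children: list = field(default_factory=list)


def postorder(view):
    """Yield value-bearing views, visiting children before their parent."""
    for child in view.children:
        yield from postorder(child)
    if view.value is not None:
        yield view


def assign_rendering(root):
    """Render the first occurrence of each value; later ones become links."""
    first_seen = {}   # id(value) -> label of the occurrence that was rendered
    decisions = []
    for occ in postorder(root):
        key = id(occ.value)
        if key in first_seen:
            decisions.append((occ.label, "link to " + first_seen[key]))
        else:
            first_seen[key] = occ.label
            decisions.append((occ.label, "rendered"))
    return decisions


john, claire = Val('"john"'), Val('"claire"')
view = View("lookup", children=[
    View("t", children=[View("kv.snd", john), View("t2.kv.snd", claire)]),
    View("Some(_)", claire),
])

for label, decision in assign_rendering(view):
    print(label, "->", decision)
# kv.snd -> rendered
# t2.kv.snd -> rendered
# Some(_) -> link to t2.kv.snd
```

Because the argument views are visited before the result, the occurrence of "claire" in the input tree is the one rendered, and the result shows only a link back to it, as in panel (b).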
Moreover, by controlling how much they see of the execution, the user can control how much decomposition of the input tree they see. After the initial pattern-match for t, we see that some kind of comparison was made, yielding the value GT (“greater than”), whose explanation is also hidden. On the basis of that result, lookup was called recursively on the subtree bound to t2. In (d), the user reveals the explanation behind GT: the key k being searched for was compared with the first component of kv, the key-value pair currently being considered. An additional sharing link indicates the consumption of the first component of kv by the projection fst kv. In (e), the user expands the recursive call, and sees something analogous to what happened in (c): namely t being bound to a tree and then pattern-matched as a branch, causing more binding, and as a consequence more sharing. The user has moved smoothly from an extensional view of the computation where the function monolithically mapped input to output, to a more intensional view where the input has been broken into sub-values distributed through the computation. It is quite visible now to the user how lookup recursively consumes its tree argument.
To recap, the operational semantics of a functional language can be interpreted in a way that exposes the fine-grained paths that values take through the computation. This information can be exploited to make visualisations both more compact and more informative. Indeed, to neglect this aspect of the computation, as debuggers generally do, is to hide from the user an important aspect of what their program actually does, namely assemble and disassemble values. In Chapter 5 we will show that, with a modest extension to the interpreter, this fine-grained structure is inherent in the semantics, not an internal detail of an implementation. This may explain why it arises naturally in term-graph implementations, albeit as an optimisation.
Benefits aside, the visualisation scheme shown here based on sharing links is rather naïve, and quickly degrades in usefulness in the presence of non-linear sharing and as the amount of computation being visualised increases. A better approach, which would require additional implementation effort, would be to visualise the transitive reduction of the dataflow graph directly: in other words, the paths taken by values through the computation, rather than the relation of synonymy which those paths entail. We would expect this to scale better because the edges of this graph are more “local” than sharing links. It would also make more explicit the connection to the slicing features we discuss in §2.5, which work by back-propagating demand along exactly these edges. Unfortunately this is beyond the scope of the present work.
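For readers unfamiliar with the term, the transitive reduction of an acyclic graph keeps only those edges that are not implied by a longer path. The following is a generic textbook-style sketch, not a design for the visualisation proposed above; the graph encoding (a dictionary from node to set of successors) and the node names are assumptions made for the example, and the input is assumed to be acyclic.

```python
def reachable(graph, start):
    """Return all nodes reachable from `start` in one or more steps."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def transitive_reduction(graph):
    """Drop every edge u -> v for which v is also reachable via another successor of u.

    Assumes `graph` is a DAG given as {node: set_of_successors}.
    """
    reduced = {}
    for u, succs in graph.items():
        keep = set(succs)
        for w in succs:
            # Anything reachable from w is already implied by the edge u -> w.
            keep -= reachable(graph, w)
        reduced[u] = keep
    return reduced


# Hypothetical flow of one value: tree node -> projection -> constructed result.
# The "synonymy" view also links the tree node straight to the result.
flow = {
    "tree": {"snd kv", "Some"},
    "snd kv": {"Some"},
    "Some": set(),
}
assert transitive_reduction(flow) == {
    "tree": {"snd kv"},
    "snd kv": {"Some"},
    "Some": set(),
}
```

In the reduced graph each edge connects a value only to the next place it flows to, which is the sense in which such edges are more local than sharing links.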