4.4 Pattern Matching
4.4.2 Typing pattern matching expressions
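The rules for typing pattern matching expressions and pattern match rules appear in figure 4.12. For the expression itself, the only point worth noting is that the scrutinee must have a decorated type. All the interesting parts are in the rule for patterns.

$$
\frac{\Gamma \vdash E : \mathit{Decorated}\; n\langle T_n \rangle \qquad \Gamma \vdash P\ \texttt{->}\ E_p : n\langle T_n \rangle \to T}
     {\Gamma \vdash \texttt{case}\ E\ \texttt{of}\ P\ \texttt{->}\ E_p : T}\ \text{(E-case)}
$$

Patterns, with judgment form $\Gamma \vdash P\ \texttt{->}\ E : T \to T$:

$$
\frac{x_p : (n\langle T_l \rangle \mathrel{::=} T_r) \in P \qquad \theta \in \mathrm{mgu}(T_l = T_n) \qquad \theta(\Gamma,\ x_v : \mathit{dec}(T_r)) \vdash E : \theta(T)}
     {\Gamma \vdash x_p(x_v)\ \texttt{->}\ E : n\langle T_n \rangle \to T}\ \text{(P-prod)}
$$

Figure 4.12: Typing rules for pattern matching (E-case and P-prod) of Ag.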
The typing judgment for patterns is supplied with both the scrutinee type and the result type of the expression. There are a few subtleties in the rule for patterns; let us start with the less obvious one. Note the application of dec to T_r in P-prod. This function appears here because the children extracted from the scrutinee are decorated, just as the children of a production appear decorated to the equations in its body. But also notice that dec is applied before θ. We will discuss θ in depth momentarily, but for now we can think of it as information about how type variables are instantiated for the scrutinee. That is, instead of writing an equation about a general Pair<a b>, we are writing an expression about a concrete instantiation; for example, the scrutinee's type may be Pair<Expr Boolean>. Thus, dec must look at the type before the instantiation is applied. Otherwise we would be under the erroneous impression that the first element of the pair was a Decorated Expr, when within the pair production it is only an undecorated value of type a.
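To make the ordering concrete, here is a minimal Python sketch of why dec must be applied before θ. It is not Ag's implementation. It assumes a toy type encoding (TVar, TCon), a hypothetical set NONTERMINALS of declared nonterminal names, and a hypothetical production pair :: Pair<a b> ::= fst::a snd::b. Here dec wraps only types it can see are nonterminals, so a bare type variable stays undecorated.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TVar:
    """A type variable such as `a`."""
    name: str


@dataclass(frozen=True)
class TCon:
    """A type constructor applied to arguments, e.g. Pair<Expr Boolean>."""
    name: str
    args: tuple = ()


# Hypothetical: the nonterminals declared in the program being checked.
NONTERMINALS = {"Expr", "Pair"}


def dec(t):
    """Children of (visibly) nonterminal type are seen decorated; others are unchanged."""
    if isinstance(t, TCon) and t.name in NONTERMINALS:
        return TCon("Decorated", (t,))
    return t


def subst(theta: Dict[str, object], t):
    """Apply a substitution from type-variable names to types."""
    if isinstance(t, TVar):
        return theta.get(t.name, t)
    return TCon(t.name, tuple(subst(theta, a) for a in t.args))


# Hypothetical production: pair :: Pair<a b> ::= fst::a snd::b
fst_type = TVar("a")
# A scrutinee of type Pair<Expr Boolean> gives theta = [a -> Expr, b -> Boolean].
theta = {"a": TCon("Expr"), "b": TCon("Boolean")}

correct = subst(theta, dec(fst_type))  # dec first, as in P-prod
wrong = dec(subst(theta, fst_type))    # theta first: spurious decoration

print(correct)  # TCon(name='Expr', args=())
print(wrong)    # TCon(name='Decorated', args=(TCon(name='Expr', args=()),))
```

The child of type a is bound as a plain Expr in that branch, not as a Decorated Expr. Reversing the order would wrongly decorate it.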
The explicit use of θ in the type rule P-prod is the cost we must pay for supporting GADTs. The approach we show here for handling GADTs in patterns is adapted from an especially simple-to-implement approach to handling them [73]. In that paper, much attention is paid to a notion of wobbly and rigid types. Thanks to the concessions in type annotations we must make due to the attribute access problem discussed in section 4.3.2, all bindings in Ag can be considered rigid in their sense,⁴ leading to our slightly simpler type rules.
The essential idea is to compute a most general unifier (θ) between the scrutinee's type and the result type of the production.⁵ The explicit application of θ to the environment and type makes these assumptions visible while checking that match rule's expression. It also means these assumptions are "undone" when we move on to the next match rule. In effect, the rule states that whatever type information we learn from successfully matching a particular GADT-like production stays confined to that branch of the pattern matching expression. For example, suppose we pattern match on an expression of type Foo<a> and match a production that constructs a Foo<Integer>. We then type check the corresponding expression in a "world" where a no longer exists, because it has been rewritten to Integer.
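The branch-local refinement can be sketched as ordinary first-order unification. The following Python is a minimal illustration under stated assumptions, not the algorithm of [73] or Ag's checker:

- Types use a toy TVar/TCon encoding.
- The hypothetical production intLit :: Foo<Integer> plays the GADT-like role.
- The production's own type variables are assumed to be already renamed apart from the scrutinee's.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str
    args: tuple = ()


Subst = Dict[str, object]


def resolve(theta: Subst, t):
    """Follow variable bindings until reaching an unbound variable or a constructor."""
    while isinstance(t, TVar) and t.name in theta:
        t = theta[t.name]
    return t


def apply(theta: Subst, t):
    """Fully apply theta to a type."""
    t = resolve(theta, t)
    if isinstance(t, TVar):
        return t
    return TCon(t.name, tuple(apply(theta, a) for a in t.args))


def occurs(theta: Subst, name: str, t) -> bool:
    """Occurs check: does variable `name` appear in t under theta?"""
    t = resolve(theta, t)
    if isinstance(t, TVar):
        return t.name == name
    return any(occurs(theta, name, a) for a in t.args)


def mgu(t1, t2, theta: Optional[Subst] = None) -> Optional[Subst]:
    """Most general unifier of t1 and t2 extending theta, or None if none exists."""
    theta = dict(theta or {})
    t1, t2 = resolve(theta, t1), resolve(theta, t2)
    if isinstance(t1, TVar):
        if t1 == t2:
            return theta
        if occurs(theta, t1.name, t2):
            return None
        theta[t1.name] = t2
        return theta
    if isinstance(t2, TVar):
        return mgu(t2, t1, theta)
    if t1.name != t2.name or len(t1.args) != len(t2.args):
        return None
    for a, b in zip(t1.args, t2.args):
        theta = mgu(a, b, theta)
        if theta is None:
            return None
    return theta


# Scrutinee : Foo<a>; hypothetical production intLit :: Foo<Integer> ::= ...
scrutinee = TCon("Foo", (TVar("a"),))
theta = mgu(TCon("Foo", (TCon("Integer"),)), scrutinee)

gamma = {"y": TVar("a")}  # outer environment
result_type = TVar("a")   # T, the case expression's result type

# Inside this match rule only: theta(Gamma) and theta(T).
branch_gamma = {x: apply(theta, t) for x, t in gamma.items()}
branch_result = apply(theta, result_type)
print(branch_gamma, branch_result)
```

θ is applied to copies of Γ and T, not to the outer environment. The refinement therefore disappears when checking moves on to the next match rule. A production whose result type cannot unify with the scrutinee's type (say Foo<Boolean> against Foo<Integer>) yields no unifier at all.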
One downside of introducing pattern matching (with GADTs supported in this manner) is that we may be unable to infer the result type of a pattern matching expression bottom-up. The "attribute access problem" we mentioned earlier was resolved on the assumption that we would always be able to infer types for subexpressions. This represents a hole in that reasoning, but a small one. It requires an attribute access to occur directly on a pattern matching subexpression that matches exclusively on GADT-like productions, with no intervening explicit result type. For example, in an expression like (case x of eq() -> y).attr, our algorithm is unable to discover the result type of the match rule's expression. Normally, we would find the result type of the case expression by seeing how it is used, and later check this against the type of the expression y inside the match rule. Here, however, it is used only by accessing an attribute. Because of the attribute access problem, that use does not help us discover its type. No users have yet run afoul of this small hole in our type inference engine in practice, so we have chosen to simply live with the potentially unexpected error message.

⁴ Well, in practice almost all are rigid. Where we do have a difference, we choose instead to simply raise a (possibly unexpected) type error, rather than deal with the complication of "wobbly" types.
⁵ The need to concern ourselves with "fresh" most general unifiers in the sense of the cited paper
4.4.3 Semantics
In giving the semantics for pattern matching, we run into a serious technical issue. We noted in passing earlier that GADTs were quite easy to support on the attribute grammar side. However, they complicated our typing rules for pattern matching (the need for θ in the rule for patterns). There is in fact a dual to this issue: something patterns make easy that attribute grammars make hard. Consider a simple pattern, such as one within a sum function matching on a List<Integer>. Now consider what this would be equivalent to: a sum attribute of type Integer that occurs on List<Integer>... except that a concrete type like Integer is illegal in the type parameters of the nonterminal in an occurs-on declaration! Only type variables may appear there, not concrete types.
We chose to permit only variables there (and thus forbid these "partial occurrences") for two reasons. First, it is simple, and it prevents a proliferation of hacks (like θ) from polluting all the type rules of the language. Second, this syntactic restriction lets us reuse previous work on the semantics of attribute grammars. Translation down to a functional language for productions of parameterized nonterminals is identical to that for simple nonterminals, except that parameterized types are used instead of simple ones. However, these "partial occurrences" throw a wrench into that machinery. What does it mean for an attribute like sum to occur only on List<Integer>? Does it mean productions have different semantic functions, depending on the type parameters of the nonterminal?
We will sketch a solution to these issues in section 4.4.4 using a more powerful type system. But first, we give a semantics based on the restriction that pattern matching expressions are only used on decorated types whose parameters are held abstract. This is identical to how equations in productions work: we cannot know what the types A and B are within a pair<A B> production. This is a somewhat unreasonable restriction, as it precludes our simple example of pattern matching in a sum function. However, it is not a useless exercise, because the shape of the translation is actually identical in the general case. All we are really missing is an expressive enough type system in the target language.
In figure 4.13 we give a semantics to pattern matching by translation to attributes, under the assumption that there are only variables in the type parameters of the nonterminal type of the scrutinee. The places where this assumption shows up are highlighted in the figure. This translation analyzes a case expression and yields a set of declarations (D). These declare a synthesized attribute, and its equations, equivalent to the pattern matching expression. The type of the generated attribute is a function from the free variables that appear in the match rules' expressions to the result type of the pattern matching expression. After this attribute and these declarations are generated, the pattern matching expression can be replaced (using the notation from the figure) with E.ν(x_free). That is, we simply access the generated attribute on E and apply it to the free variables.
⟦Γ ⊢ case E of P -> E_p : T⟧ =
    synthesized attribute ν<v> :: (T ::= T_free);
    attribute ν<v> occurs on n<v>;
    ⟦Γ ⊢ P -> E_p : n<v> → T⟧_{ν, x_free :: T_free}
  where: ν is a fresh name
         x_free = fv(P -> E_p)
         Γ ⊢ E : Decorated n<v>
         Γ ⊢ x_free : T_free

⟦Γ ⊢ x_p(x_v) -> E_p : n<v> → T⟧_{ν, x_free :: T_free} =
    aspect x_p
    top::n<T_n> ::= x_v::T_v
    {
      top.ν = \x_free::θ(T_free) -> E_p;
    }
  where: x_p : (n<T_n> ::= T_v) ∈ P
         θ = [v ↦ T_n]

⟦Γ ⊢ _ -> E_p : n<v> → T⟧_{ν, x_free :: T_free} =
    default
    top::n<v> ::=
    {
      top.ν = \x_free::T_free -> E_p;
    }

Figure 4.13: A translation from fully-typed pattern matching expressions (hence the whole typing judgment appearing within brackets) to attribute declarations. Highlighted are our restrictions to variables rather than types.
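As a rough operational picture of figure 4.13, the Python sketch below models the generated attribute ν as a table of per-production equations (the aspects) plus a default equation. Each equation returns a function of the free variables. The tree encoding (Node), the helper make_attribute, and the productions cons and nil are hypothetical. The sketch ignores typing entirely and only shows that a case expression on E can be replaced by accessing ν on E and applying the result to x_free.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class Node:
    """A (decorated) tree node: the production that built it and its children."""
    prod: str
    children: Tuple[Any, ...] = ()


Equation = Callable[..., Callable[..., Any]]


def make_attribute(aspects: Dict[str, Equation],
                   default: Callable[[], Callable[..., Any]]):
    """Return an accessor for a synthesized attribute nu.

    aspects maps a production name to its equation, which sees that
    production's children and yields a function of the free variables.
    The default equation sees no children, like `default top::n<v> ::=`.
    """
    def access(node: Node):
        eq = aspects.get(node.prod)
        return eq(*node.children) if eq is not None else default()
    return access


# case xs of cons(h, t) -> h + z | _ -> z   (free variables: z)
nu = make_attribute(
    {"cons": lambda h, t: (lambda z: h + z)},
    default=lambda: (lambda z: z),
)

xs = Node("cons", (3, Node("nil")))
print(nu(xs)(10))           # 13: the cons equation fired
print(nu(Node("nil"))(10))  # 10: fell back to the default equation
```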
The syntactic restriction to variables shows up in only two places in Ag. First, it appears in the type parameters of the nonterminal in the occurs-on declaration. Second, it appears in the type parameters of the default production. As a consequence, we require the scrutinee E's type to also have only variables for its type parameters.
In the rule for the expression, note that when we write fv(P -> E_p), we do not consider the variables bound by P to be free. That is, a match rule like p(x, y) -> x + y + z would have only z as a free variable.
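The free-variable computation for a match rule amounts to the free variables of the rule's expression minus the names its pattern binds. Here is a minimal Python sketch over a hypothetical three-constructor expression language (Var, Lit, Add), not Ag's own AST:

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Lit, Add]


def fv(e: Expr) -> FrozenSet[str]:
    """Free variables of an expression (this toy language has no binders)."""
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, Lit):
        return frozenset()
    if isinstance(e, Add):
        return fv(e.left) | fv(e.right)
    raise TypeError(f"unknown expression: {e!r}")


def fv_rule(pattern_vars: Iterable[str], body: Expr) -> FrozenSet[str]:
    """fv(P -> E_p): variables bound by the pattern are not free."""
    return fv(body) - frozenset(pattern_vars)


# p(x, y) -> x + y + z
body = Add(Add(Var("x"), Var("y")), Var("z"))
print(sorted(fv_rule(["x", "y"], body)))  # ['z']
```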
In the rule for patterns, we see the effect of GADTs in a much simpler form. When a match rule is translated to an aspect, we simply write different (concrete) types, θ(T_free), instead of type variables for the free variables of the expression. Suppose we match on an Nt<a> with a free variable also of type a, against a production of type Nt<Integer>. The generated equation then simply accepts a free variable of type Integer.
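The effect of θ = [v ↦ T_n] can be sketched in isolation: it only changes the parameter types of the generated lambda. This is a hypothetical Python illustration. The productions intCase :: Nt<Integer> and anyCase :: Nt<b>, and the helper lambda_param_types, are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class TVar:
    name: str


@dataclass(frozen=True)
class TCon:
    name: str
    args: tuple = ()


def subst(theta: Dict[str, object], t):
    """Apply a substitution from type-variable names to types."""
    if isinstance(t, TVar):
        return theta.get(t.name, t)
    return TCon(t.name, tuple(subst(theta, a) for a in t.args))


def lambda_param_types(nt_vars: Sequence[str], prod_lhs_args: Sequence[object],
                       free_types: Sequence[object]) -> List[object]:
    """theta(T_free) for one aspect, where theta = [v -> T_n]."""
    theta = dict(zip(nt_vars, prod_lhs_args))
    return [subst(theta, t) for t in free_types]


# Scrutinee e : Decorated Nt<a>, with one free variable w : a
free_types = [TVar("a")]

# Hypothetical GADT-like production intCase :: Nt<Integer> ::= ...
print(lambda_param_types(["a"], [TCon("Integer")], free_types))
# [TCon(name='Integer', args=())]

# Hypothetical ordinary production anyCase :: Nt<b> ::= ...
print(lambda_param_types(["a"], [TVar("b")], free_types))
# [TVar(name='b')]
```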
And so, with a restriction to type variables, we are able to translate away pattern matching expressions to attributes. This conveniently gives us exactly the same semantics for "missing match rules" as for missing equations. As a result, pattern matching expressions will "look through" forwarding productions, and successfully match on the trees they eventually forward to.
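The "look through forwarding" behaviour follows from how missing equations are resolved. The Python sketch below makes two assumptions for illustration. A forwarding production with no equation for ν defers to its forward tree. The default equation is consulted only once a non-forwarding production is reached. The productions neg, sub, and lit are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Node:
    prod: str
    children: Tuple[Any, ...] = ()
    forward: Optional["Node"] = None  # tree this production forwards to, if any


def access(aspects: Dict[str, Callable[..., Any]],
           default: Callable[[], Any], node: Node):
    """Evaluate the generated attribute nu on node, following forwards (assumed order)."""
    while True:
        eq = aspects.get(node.prod)
        if eq is not None:
            return eq(*node.children)
        if node.forward is not None:
            node = node.forward  # missing equation: ask the forward tree
            continue
        return default()


# case e of sub(l, r) -> "subtraction" | _ -> "other"   (no free variables)
aspects = {"sub": lambda l, r: (lambda: "subtraction")}


def default():
    return lambda: "other"


five = Node("lit", (5,))
neg = Node("neg", (five,), forward=Node("sub", (Node("lit", (0,)), five)))

print(access(aspects, default, neg)())   # subtraction
print(access(aspects, default, five)())  # other
```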