Study of the Subtyping Machine
of Nominal Subtyping with Variance (full version)
ORI ROTH,TechnionβIIT, Israel
This is a study of the computing power of the subtyping machine behind Kennedy and Pierceβs nominal sub- typing with variance. We depict the lattice of fragments of Kennedy and Pierceβs type system and characterize their computing power in terms of regular, context-free, deterministic, and non-deterministic tree languages.
Based on the theory, we present Treetopβa generator of C#implementations of subtyping machines. The software artifact constitutes the first feasible (yet POC) fluent API generator to support context-free API protocols in a decidable type system fragment.
CCS Concepts: β’Software and its engineering β General programming languages; Inheritance;API languages; Software development techniques.
Additional Key Words and Phrases: subtyping, variance, metaprogramming, fluent API, DSL
1 INTRODUCTION
Object-oriented programming is all about inheritance or subtyping, and how software can and cannot be properly engineered using it. In the eyes of the theoretical computer scientist or the type theorist, the question is what (compile-time) computations can and cannot be done with subtyping.
The question becomes challenging when generics, i.e., parametric polymorphism, interact with subtyping, raising the question of when a certain generic taking type argument π‘1is a subtype of the same generic taking type argument π‘2. In the invariant regime of subtyping with generics, the answer is negative (unless π‘1= π‘2); in the covariant regime, the answer is positive precisely when π‘1 is a subtype of π‘2; in contrast, in the contravariant regime, when π‘2is a subtype of π‘1.
Consider, for example, the C#[Hejlsberg et al. 2003] code inList. 1(unless said otherwise, all code examples in this paper are written in C#):
1 interfacea<out x> {} interface b<out x> {} interfaceE {}
2 classv0<x> : a<v0<a<x>>>, a<a<x>>, a<x>, b<v0<b<x>>>, b<b<x>>, b<x> {}
3 a<b<b<a<b<b<a<E>>>>>>> w1 = new v0<E>(); // Compiles: πππππππ is a palindrome
4 a<b<b<a<b<a<a<E>>>>>>> w2 = new v0<E>(); // Does not compile: πππππππ is not a palindrome Listing 1. C#subtyping machine recognizing palindromes over {π, π}
The program in List. 1defines four classes (lines 1β2): generic (polymorphic) classesa and b employing a covariant type parameterx, as indicated by the keywordout; non-generic classE; and generic classv0 that inherits from the six supertypes written after the colon. Below the classes are two variable declarations (lines 3β4). The type of variablew1 is a<b<b<a<b<b<a<E>>>>>>>, encoding the word π€1 = πππππππ, and the type of variablew2 is a<b<b<a<b<a<a<E>>>>>>>, encoding the word π€2 = πππππππ.List. 1demonstrate the encoding of β , the non-deterministic context-free formal language of palindromes over alphabet{π, π}, in the type system of C#. Each variable is assigned an instantiation of typev0<E>, that type checks if and only if the variable type encodes a palindrome. Thus, the assignment to variablew1 type checks since π€1β β, but the assignment tow2 fails since π€2β β. The mechanics behind this βmagicβ, and why it matters, is revealed by our study of the computing power of subtyping in the context of the work ofKennedy and Pierce[2007] on thenominal subtyping with variance type system.
Authorβs address: Ori Roth, TechnionβIIT, Haifa, Israel, [email protected].
arXiv:2109.03950v1 [cs.PL] 8 Sep 2021
1.1 Kennedy and Pierceβs Type System and Its Decidable Fragments
Fig. 1places our results in context. The figure shows a three-dimensional lattice, structured similarly to the π-cube [Barendregt 1991]. The lattice is spanned by three type system features, which are explained briefly here and in greater detail inSection 3. The head of the latticeSβ€is the abstract type system developed byKennedy and Pierce[2007], which employs all three features. The other lattice points are type system fragments ofSβ€that use different feature combinations. Nodes are adorned with references to previous works (shown as citations) and to our contributions (shown as forward references).
β¨X, Mβ© Sβ€
β¨Xβ© β¨C, Xβ©
β¨Mβ© β¨C, Mβ©
Sβ₯ πΆπππ‘ π π π£ππ ππππ‘ β¨Cβ©
πΈπ₯ππππ ππ£π
Multiple Instantiation
Multiple Instantiation Multiple
Instantiation
πΆπππ‘ π π π£ππ ππππ‘
πΈπ₯ππππ ππ£π
πΆπππ‘ π π π£ππ ππππ‘
Multiple Instantiation
πΈπ₯ππππ ππ£π
πΈπ₯ππππ ππ£π
πΆπππ‘ π π π£ππ ππππ‘
Kennedy and Pierce[2007]
Grigore [2017]
Prop. 5.4 Prop. 5.4
Cor. 5.3 Cor. 5.3
Prop. 5.8
Cor. 5.7
Background color indicates the computing power of the type system:
Deterministic regular forests Regular forests
Deterministic context-free forests Context-free forests with grammars
in Greibach normal form Recursively-enumerable
Fig. 1. Lattice describing the eight subtyping type system fragments employing contravariance (C), expansive inheritance (X), and multiple instantiation inheritance (M) (defined inSection 3)
The lattice is spanned by three orthogonal Boolean coordinates or features denoted by C, X, and M, representing the three core features ofSβ€:Contravariance, eXpansively-recursive inheritance, and Multiple instantiation inheritance. The most familiar feature is the kind of variance; the horizontal direction in the figure denotes the distinction between covariant type systems (positioned on the plane on the left-hand side of the lattice) and contravariant type systems (right-hand).
Precise definitions of the three features follow inSection 3; here we provide a quick review of the concepts and the intuition behind them:
(C)Contravariance: Type arguments assigned to contravariant type parameters can be substi- tuted with less derived types. For example, using the βis aβ relationship we can say that if x ina<x> is contravariant and b is a c then a<c> is an a<b>.
1 interface a<in x> {} // type parameter x of class a is contravariant
2 interface b : c {} // type b is a subtype of c
3 a<b> x=(a<c>) null; // type a<c> is a subtype of a<b>
(X) Expansively-recursive inheritance: The expansion of types through inheritance (as defined byViroli[2000]) is unbounded. Intuitively, this expansion involves thecuriously recurring template pattern [Coplien 1996], where a class appears in one of its supertypesβ generic arguments.
1 interface a<x, y> : b<a<b<x>, y>> {} // class a's inheritance is expansivelyβrecursive
(M)Multiple instantiation inheritance: Inheritance is non-deterministic [Kennedy and Pierce 2007], i.e., type arguments of supertypes are not unique.
1 interface a : b<c>, b<d> {} // supertype b is instantiated twice, with arguments c and d
A point in the lattice is denoted by the set of its features. Type systemSβ€is denoted by the triple β¨C, X, Mβ©, and the notation β¨C, Xβ© designates a type system that allows contravariance and expansive inheritance, but not multiple instantiation inheritance. The code examples above also demonstrate two key features ingrained withinSβ€and its fragments: βpolyadicβ parametric polymorphism and multiple inheritance, allowing a class to have multiple type parameters and supertypes, respectively.
As shown inFig. 1, the new results here, together with the proof thatSβ€is undecidable due to Kennedy and Pierce and the subsequent proof byGrigore[2017] implying thatβ¨C, Xβ© is undecidable, completely characterize the computing power of all eight points of the lattice. This characterization is not in terms of the Chomsky hierarchy of classes of formal languages, but rather in terms of a similar hierarchy of classes of forests, also called formal tree languages (see, e.g., [Comon et al.
2007]), as depicted byFig. 2. The classes appearing in the figure are defined inSection 2.
DRF
NRF DCF
NCFGNF NCF REF
DRF Deterministic Regular Forests NRF Non-deterministic Regular Forests DCF Deterministic Context-free Forests NCFGNF Non-deterministic Context-free Forests with
grammars inGreibach Normal Form NCF Non-deterministic Context-free Forests REF Recursively-Enumerable Forests
Fig. 2. A Chomsky-like hierarchy of formal forest classes (defined inSection 2)
As shown inFig. 2, the hierarchy of tree languages is a non-linear lattice, spanned by two orthogonal Boolean features abbreviated by letters in the acronyms in the figure as follows:
β’ Determinism (D < N) distinguishing between Deterministic forests and their super-set, Non- deterministic forests;
β’ Method of specification (R < C) distinguishing between Regular forests and their super-set, Context-free forests.
InFig. 2, we see that forest class DRF is the smallest class in the diagram. Moving up the lattice, the two next classes, NRF and DCF, cannot be compared. Both are contained in NCF, which is contained in REF, the class of all recursively enumerable forests. We also make use of class NCFGNF β NCF, which includes the non-deterministic context-free forests that can be specified by a tree grammar in Greibach normal formal (GNF). Observe that class NCFGNFalso contains DCF and NRF.
The theoretical contribution of our work is a characterization of the computing power of the six decidable points in the lattice ofFig. 1in terms of the hierarchy ofFig. 2. These results are obtained by examining thesubtyping machine behind subtyping queriesβa kind of automata with finite control (dictated by the nominal inheritance table) whose auxiliary storage is not a tape, stack, or tree, but rather the contents of a specific subtyping query that changes along the run of this automaton. These machines stand behind Kennedy and Pierceβs study and Grigoreβs emulation.
1.2 Fluent APIs and Treetop
Fluent API (application programming interface) [Fowler 2005] is a trendy practice for the creation of software interfaces [Nakamaru et al. 2020]. In a statically-typed, object-oriented setting, a fluent API method returns a type that receives other API methods. In this way, fluent API methods are invoked in a chain of consecutive calls:
π1().π2().Β· Β· Β· .ππ() (1)
The main advantage of fluent API is its ability to enforce a protocol at compile time. If an invocation of method ππ+1after a call to ππbreaks the protocol, then the type returned from method ππ must not declare ππ+1, which would cause the expression to fail type checking.
A practical application of fluent API is embedding domain specific languages (DSLs) in a host programming language [Gil and Levy 2016;Gil and Roth 2019;Nakamaru and Chiba 2020;Nakamaru et al. 2017;Xu 2010;Yamazaki et al. 2019]. The syntax and semantics of a DSL are optimally designed for its domain; for instance, SQL [Chamberlin and Boyce 1974] is a DSL suited for composing database queries. If the protocol of the fluent API is a grammar πΊ describing the DSL, then the chain of method calls ofEq. (1)type checks if and only if π€ = π1π2Β· Β· Β· ππ β πΏ(πΊ), i.e., the expression encodes a legal DSL program. For example, with an SQL fluent API implemented in Scala [Odersky et al. 2004] we can embed database queries as Scala expressions:
select("id").from(grades).where(_.grade>90).
The expression
select("name").where(_.length==4), however, does not type check, as it represents an illegal SQL query,
SELECT name WHERE length=4.
The academic study of fluent APIs seeks new ways to encode increasingly complex DSL classes.
Results in the field are given in the form ofgenerators: software artifacts that convert a formal gram- mar into a fluent API encoding its language. Fling [Gil and Roth 2019] and TypeLevelLR [Yamazaki et al. 2019] are two prominent fluent API generators that support all deterministic context-free languages (DCFLs).Grigore[2017] implemented a generator that accepts context-free grammars describing general context-free DSLsβa proper super-set of DCFLs. Specifically, his implementation coded a CYK parser in an intermediate simple, yet high level language, and then translated the code in this language into a Turing machine, being emulated in a subtyping query. Unfortunately, the resulting APIs are unusable in practice, as they lead to extremely long compilation times and crash the compiler.
ExaminingFig. 1we see that thecontravariant regime for combining genericity with subtyping is essential for the undecidability result of Kennedy and Pierce and its use by Grigore to encode Turing machines with Java [Arnold et al. 2006] generics. Contravariance is also essential to Grigoreβs im- plementation of a fluent API generator forall context-free languages. Grigoreβs result is noteworthy
since the best previous fluent API generators
[Gil and Levy 2016;Gil and Roth 2019;Yamazaki et al. 2019], relying on invariance, were only able to encode the deterministic subset of context-free languages (languages with an LR grammar). As shown inFig. 1, our results prove that the covariant regime, exemplified by the simple phrase βa bag of apples is bag of fruitβ, is weaker than its contravariant counterpart.
Establishing that covariance is weaker than contravariance and noticing that it is stronger than invariance, it is natural to ask whether the entire class of context-free languages can be encoded in a type system with nominal subtyping combined with only covariant genericity. Conversely, the question is motivated by the argument that there is no theoretical difficulty in recognizing context-free languages in a Turing-complete type system, but it might be a challenge to recognize these in a decidable type system.
Unfortunately, the code produced by Grigoreβs fluent API generator collapses under its own weight (measured in gigabytes) and crashes rugged modern compilers. Another natural question to ask is: Is the (computationally) weaker artillery of covariance also lighter in weight (in terms of code size)?
Treetop is a product of our theoretical study that answers both questions affirmatively. It is a proof of concept (POC) generator of βcovariant generic subtypingβ machines that are implemented as C#code. Given πΊ , a CFG defining a formal language β (such as the language of palindromes in the example ofList. 1), Treetop generates a library in C# that defines a fluent API of β . The library uses expansive inheritance (the X feature in the lattice ofFig. 1) and multiple instantiation inheritance (M in this lattice), but not contravariance (C). Thus, Treetop not only supports general context-free grammars, but it also does this in a decidable type system fragment. We demonstrate in an experiment non-trivial subtyping machines generated by Treetop that compile relatively fast, yet we also identify at least one example which induces impractically long compilation times using the CSC C#compiler.
Outline.Sections 2and3elaborate on Kennedy and Pierceβs work on subtyping, to refresh readersβ understanding of concepts such as formal forests and tree grammars, and set up our notations. Subtyping machines are then defined inSection 4as computational devices that model nominal subtyping.Section 5derives the results ofFig. 1. Treetop, our API generator for context-free protocols, is the focus ofSection 6.Section 7discusses the results of our study.
Note on bold notation. The bold version of a symbol denotes a (possibly empty) sequence of instances of the non-bold version of this symbol, e.g.,π stands for π₯1, π₯2, . . . , π₯πfor some π β₯ 0.
This notation is abused when the intent is clear from the context, e.g., π [π β π]
abbreviates
π1[π₯1β π‘1, . . . , π₯π β π‘π], . . . , ππ[π₯1β π‘1, . . . , π₯π β π‘π],
whereπ , π, and π are the three sequences π = π1. . . , ππ,π = π‘1. . . , π‘π, andπ = π₯1, . . . , π₯π. 2 PRELIMINARIES: FORMAL FORESTS AND THEIR GRAMMARS
We assume that the reader is familiar with the classical theory of automata and formal languages (e.g., [Hopcroft et al. 2007]). Here we provide a quick review of the standard generalization of the theory toforests, also called (formal) tree languages. (For a more detailed review, see standard references such as [Comon et al. 2007;GΓ©cseg and Steinby 1997;Osterholzer 2018].)
We use parenthetical notation for terms (trees), e.g.,
π‘= π1(π2(π3()), π4()) (2)
Rank πβ N of symbol π, denoted rank(π), is the number of its children. If π = 0, then π is called a leaf, and if π= 1, then π is monadic. In both cases, πβs parentheses may be omitted, e.g., π‘ inEq. (2) becomes π1(π2π3, π4). The height of a tree π‘ is denoted height(π‘ ), e.g., for π‘ inEq. (2),height(π‘ ) = 3.
The set of trees over ranked alphabetΞ£ (i.e., an alphabet of symbols with ranks) is denoted Ξ£β³, generalizing the Kleene closure πΎβfor forests.
Consider the following definitions, adapted fromComon et al.[2007, Β§2.1.1]:
(a)Nat ::= z (b)Nat ::= s(Nat)
(c)List ::= nil (d)List ::= cons(Nat, List) (3) These are tree grammar productions deriving Peano numbers (productions (a,b)) and cons-lists of those integers (productions (c,d)). An integer πβ N is encoded by the monadic term sπz. Instead of strings, tree grammar productions derive terms: The left-hand side of a production is a variable, and the right-hand side is a term with which the variable can be substituted. As in the string case,
List cons(Nat, List)
(π)
cons(Nat, List)
(π)
cons(Nat, List)
(π)
nil
(π)
s(Nat)
(π)
s(Nat)
(π)
z
(π)
s(Nat)
(π)
z
(π)
z
(π)
Fig. 3. The derivation of the term inEq. (4), encoding the list β¨2, 1, 0β©, from variableList according to the productions ofEq. (3)
terminal (ground) terms are derived from variables by repeatedly substituting the latter according to the given productions. For example, we encode the listβ¨2, 1, 0β© using the term
cons(ssz, cons(sz, cons(z, nil))) (4)
This term can be derived from variableList fromEq. (3), as illustrated inFig. 3.Fig. 3uses arrows to depict variable substitution; an arrow label references the production fromEq. (3)used to apply the derivation.
For readers familiar with functional programming languages as ML [Milner et al. 1997] and Haskell [Jones 2003], it is intuitive to see the grammar inEq. (3)as a set of datatype definitions, as shown inList. 2. The grammar variablesNat and List are datatypes whose value constructors are the grammar terminalsz, s, nil, and cons. For each grammar production π£ ::= π, we define π as a value constructor of π£ ; if π has children, they are used as constructor parameters. Type checking in the resulting ML program coincides with derivation of Peano numbers. In particular, an ML expression belongs to typeList if and only if it can be derived from variable List, e.g., the expression in line 3 ofList. 2encodes the tree inEq. (4)and compiles as expected.
1 datatypeNat = z | s of Nat;
2 datatypeList = nil | cons of Nat * List;
3 cons(s(s(z)), cons(s(z), cons(z, nil))) : List;
Listing 2. The grammar ofEq. (3)encoded as ML datatypes
Formally, a forest grammar πΊ= β¨Ξ£,π, π£0, π β© employs a finite ranked alphabetΞ£, a finite ranked set of variables π , a designated initial leaf π£0β π , and a finite set of productions π . To generate a terminal tree, grammar πΊ repeatedly applies π productions to the initial leaf π£0, as explained below.
Intermediate trees, members of(Ξ£ βͺπ )β³, are called tree forms, similar to sentential forms of string grammars.
Let π be a finite set ofparameter leaf symbols, disjoint to sets Ξ£ and π . Terms over Ξ£ βͺ π , denoted π‘ , are ground, while terms overΞ£ βͺ π βͺ π, denoted π, are unground. Unground terms are used as tree patterns, where the π parameters can be substituted with terms (usually ground). The application of substitution π = {π β π } = {π₯1β π‘1, . . . , π₯πβ π‘π} to unground term π , denoted π [π ], yields a term where every occurrence of parameter π₯π is substituted with term π‘π. (A substitutionβs curly brackets may be omitted if written inside square brackets.) A term π matches term πβ²if and only if there exists a substitution π such that πβ²[π ] = π, denoted π βπ πβ². Letparams(π) β π denote the set of parameters appearing in term π .
A production π= π ::= πβ²β π describes a tree transformation. If tree π‘ matches π with substitu- tion π , π‘βπ π , the application of π to π‘ , denoted π‘[π], yields πβ²substituted by π ,
π‘ βπ π β π‘ [π ::= πβ²] = πβ²[π ].
We require that every parameter π₯β π on the right-hand side of production π β π be found on the left-hand side, so that all the applications of π to ground trees are ground.
Note the distinction between variables and parameters. Variables are found in tree forms and constitute the nodes remaining to be derived. Parameters, on the other hand, appear only in productions and are used as templates in tree patterns.
We say that tree form π‘ derives tree form π‘β², denoted π‘β π‘β²(or π‘ βπΊ π‘β²), if π‘ can be transformed into π‘β²by a single application of an π production:
π‘β π‘β² β βπ β π , π‘ [π] = π‘β².
The tree language (forest) of grammar πΊ , πΏ(πΊ), is the set of terminal trees that can be derived from the initial variable π£0in any number of steps:
πΏ(πΊ) = {π‘ βΞ£β³| π£0βπΊβ π‘}.
Each class (family) of tree grammars is defined by a set of restrictions imposed on the shape of productions. In regular tree grammars that define regular forests (NRF), productions are of the form π£ ::= π (π), where π βΞ£ and π£, π β π ; a more relaxed definition simply restricts variables to be leaves. Observe that the productions ofEq. (3)belong to a regular grammar.
Productions of context-free tree grammars (CFTG), yielding context-free forests (NCF), are of the form π£(π) ::= π (the parameters in π are unique). For example, the following productions belong to a CFTG:
(a) π£0::= π£1(leaf) (b) π£1(π₯ ) ::= π£1(node(π₯, π₯)) (c) π£1(π₯ ) ::= π₯ (5) These productions derive complete binary trees over{leaf, node}. The binary trees are derived recursively. The child of variable π£1, captured by parameter π₯ , describes a tree of height π β₯ 1;
production (b) is then used to replace π₯ withnode(π₯, π₯), the complete binary tree of height π + 1.
The recursion terminates by production (c), which simply returns the constructed tree.
To encode context-free tree grammars in ML, we use polymorphic datatypes, i.e., datatypes that employ type parameters. Consider, for example, the following grammar:
π£0(π₯ ) ::= π (π₯ ) π£1::= πΈ
π£2(π₯ ) ::= π(π₯ ) π£2(π₯ ) ::= π(π£2(π£0(π₯ ))) (6) This grammar defines the context-free language (forest) ππππππΈ (πβ₯ 0). The ML program encoding the grammar is shown inList. 3. Variables π£0 and π£2, which use parameters, are encoded by polymorphic datatypes that define the parameterized type constructorsv0 and v2. Following the datatype definitions are three expressions: While the expression π3ππ3πΈ (line 4) compiles, the expressions π1ππ2πΈ and π2ππ1πΈ (lines 5β6) do not, as expected.
1 datatype'x v0 = b of 'x;
2 datatypev1 = E;
3 datatype'x v2 = m of 'x | a of 'x v0 v2;
4 a(a(a(m(b(b(b(E))))))) : v1 v2;
5 a(m(b(b(E)))) : v1 v2;(β Does not compile β)
6 a(a(m(b(E)))) : v1 v2;(β Does not compile β)
Listing 3. The grammar ofEq. (6)encoded as polymorphic ML datatypes
Not all context-free grammars, however, can be encoded in ML using this method. A production deriving a grammar variable to one of its parameters, e.g., production (c) inEq. (5), results in a nonsensical ML declarationdatatype 'x v='x. In addition, a grammar terminal that appears in more than one production introduces overloaded value constructors, that not supported by ML.
In CFTG in Greibach normal form (GNF), the roots of the right-hand side of productions are terminal, π£(π) ::= π (π ) [Guessarian 1983]. Observe that none of the productions inEq. (5)are in GNF: The roots of the right-hand sides are variables in productions (a,b) and a parameter in production (c), both forbidden in GNF. In contrast, the following productions do belong to a CFTG in GNF:
(a) π£0π₯ ::= ππ£0ππ₯ (b) π£0π₯ ::= πππ₯ (c) π£0π₯ ::= ππ₯
(d) π£0π₯ ::= ππ£0ππ₯ (e) π£0π₯ ::= πππ₯ (f ) π£0π₯ ::= ππ₯ (7) Variable π£0derives palindromes over{π, π}. For instance, the palindrome πππππππ is derived from the tree form π£0πΈ (πΈ is a dummy leaf ) as follows:
π£0πΈ
(a)
βββ ππ£0ππΈ
(d)
βββ πππ£0πππΈ
(d)
βββ ππππ£0ππππΈ
(c)
βββ ππππππππΈ
(The production fromEq. (7)used in each derivation is written above the arrow.) The productions inEq. (7)are context-free, as π£0employs a parameter, and they are in GNF, as the root of the right-hand side of each production is a terminal (either π or π ).
Notice that all regular grammars are context-free and in GNF.
In deterministic context-free tree grammars in GNF, including deterministic regular tree gram- mars, if both π1= π£ ::= π (π ) and π2= π£ ::= π (πβ²) are productions in π , then π = πβ². Deterministic regular grammars define deterministic regular forests (DRF), a proper subset of regular forests, also called path-closed forests [Comon et al. 2007]. Likewise, deterministic context-free tree grammars define deterministic context-free forests (DCF). In contrast to context-free string grammars, not every context-free tree grammar has a GNFβbut every deterministic tree grammar does [Guessarian 1983]. The class of context-free forests that can be described by a grammar in GNF is denoted as NCFGNF.
In this paper, we use an extended version of context-free tree grammars where the initial variable π£0is replaced by an initial tree.
Definition 2.1 (ECFTG). Extended context-free tree grammars (ECFTGs) generalize standard CFTGs by allowing initialization by any tree π‘0. Otherwise, the semantics are the same.
A standard CFTG is extended by definition, with π‘0= π£0. We show that extended grammars are as expressive as standard grammars:
Lemma 2.2. For every ECFTG (in GNF), there exists a CFTG (in GNF) that defines the same forest.
Proof. Let πΊ= β¨Ξ£,π, π‘0, π β© be an ECFTG. If the initial tree π‘0is a variable, π‘0= π£ β π and we are done. Otherwise, let CFTG πΊβ²= β¨Ξ£,πβ², π£0, π β²β©, introducing the initial variable π£0β π , πβ²= π βͺ {π£0}, and employing the original productions, π β π β². If the initial tree is terminal, π‘0= π0βΞ£, simply add the rule π£0::= π0to π β². Else π‘0is not a leaf: Let π‘0= π0(π0), where the root π0βΞ£ βͺ π is of rank one or above. For every production π0(π) ::= π β π if π0is a variable, or production π0(π) ::= π, π = π0(π) if π0is terminal, add production π£0::= π [π β π0] to π β².
Derivations of the initial variable π£0expand it into the initial tree π‘0and apply an original π derivation in a single step. After π£0is derived, the resulting tree form is identical to the one derived from π‘0,
π£0[π£0::= π [π β π0]] = π‘0[π0(π) ::= π], thus they derive the same forest.
π ::=Ξ π
Ξ ::= πΏ1. . . πΏπ πβ₯ 0 πΏ ::= πΎ (βπ) : π params(π) β π
π β© π = β π ::= βπ₯1, . . . ,βπ₯π πβ₯ 0
β ::= β¦ | + | β π ::= π1, . . . , ππ πβ₯ 0 π ::= πΎ (π ) | π₯ π ::= π‘β₯<: π‘β€ π‘ ::= πΎ (π)
π ::= π‘1, . . . , π‘π πβ₯ 0
π program πΏ class declaration
Ξ class table; a set of class declarations πΎ class name drawn from finite setΞ π unground type
π sequence of unground types
π₯ type parameter drawn from finite set π disjoint fromΞ
βπ₯ type parameter with variance
βπ sequence of type parameters with variances
β¦ invariance + covariance
β contravariance π subtyping query π‘ ground type
π sequence of ground types
(a)Syntax (b)Nomenclature
π‘βπ πΎ(π) πΎ(π) : π π‘ <:: π[π ]
(Var)
βπ, π‘π<:βπΎ π
π‘β² π
πΎ(π) <: πΎ (πβ²) π‘ <: π‘β² π‘ <:+π‘β² π‘ <:β¦π‘
π‘ :> π‘β² π‘ <:βπ‘β²
(Super)π‘ <:: π‘β² π‘β²<: π‘β²β²
π‘ <: π‘β²β²
(c)Inheritance (d)Subtyping
Acyclic Inheritance
πΎ(π ) <::+πΎβ²(πβ²) β πΎ β πΎβ²
Valid Variance
Β¬β =


ο£²


ο£³
β β = +
β¦ β = β¦ + β = β
βπβ {β¦, +}
βπ β’ π₯π
βπ,
(βπΎπ β {β¦, +} β βπ β’ ππ
βπΎ
π β {β¦, β} β Β¬βπ β’ ππ
βπ β’ πΎ (π)
βπ β’ π πΎ(βπ) : π β (e)Well-Formedness
Fig. 4. Kennedy and Pierceβs type system (without mixin inheritance)
This construction preserves GNF. Newly introduced rules derive tree π[π β π0], whose root is terminal given that either(i) the root of π‘0is terminal, or otherwise(ii) π is the right-hand side of a rule in π , starting with a terminal given that the original grammar is in GNF. β‘ 3 PRELIMINARIES: NOMINAL SUBTYPING WITH VARIANCE
This work focuses onKennedy and Pierceβs [2007] abstract type system ofnominal subtyping with variance. This type system supports nominal subtyping with declaration-site variance as found in, e.g., C#and Scala.Fig. 4describes the abstract syntax (sub-figures (a,b)) and semantics (sub-figures (c,d,e)) of the type system, both adopted fromKennedy and Pierce[2007]. Note that we reuse notations introduced inSection 2in the context of formal forests, foreshadowing the connection between the two formalisms.
The full syntax of the type system is given inFig. 4 (a)and a nomenclature is available in Fig. 4 (b). A program consists of a class tableΞ and a subtyping query π. A class table over a finite set of class namesΞ is a finite set of inheritance rules πΉ. An inheritance rule πΏ = πΎ (βπ) : π declares a parameterized class (polytype) named πΎ and its inheritance from (super-) typesπ ; to highlight that a sequence of supertypesπ is empty, we write β after the colon. We require supertypes to reference only class parameters,params(π ) β π. Although Kennedy and Pierce included mixin inheritance in their type system, allowing a class to inherit one of its parameters, we chose to exclude it; accordingly,π β© π = β (nevertheless, the implications of mixin inheritance are discussed inSection 7). Variance is denoted by a diamondβ, which can be covariant, β = +,
contravariant,β = β, or invariant, β = β¦. The variance of the ithparameter of class πΎ is denotedβπΎπ, and the vector describing all πΎ parameter variances is denotedβπΈ. A subtyping query π‘β₯<: π‘β€(also written π‘β₯<:Ξπ‘β€), for subtype π‘β₯and supertype π‘β€, both ground, type checks if and only if π‘β₯is a subtype of π‘β€(considering class tableΞ).
Example. For example, we can write the C#program ofList. 1using abstract syntax as follows (recall that we use an abbreviated notation of monadic terms):
Ξ



ο£²



ο£³
π(+π₯ ) : β π(+π₯ ) : β πΈ :β
π£0(β¦π₯ ) : ππ£0ππ₯ , πππ₯ , ππ₯ , ππ£0ππ₯ , πππ₯ , ππ₯ π
n
π£0πΈ <: ππππππππΈ
(8)
Class tableΞ has four classes, π, π, πΈ, and π£0. In C#, type parameters are invariant by default, covariance is denoted by the keywordout, and contravariance by the keywordin. Therefore, the type parameters of π and π are covariant (βπ1= βπ1= +), and the type parameter of π£0is invariant (βπ£10 = β¦). Query π is the subtyping query invoked by the variable assignment in line 3 ofList. 1.
According to the type inheritance relation π‘ <:: π‘β², defined inFig. 4 (c), type π‘ inherits type π‘β²if π‘β²matches one of the supertypes of π‘ βs root. Unlike Kennedy and Pierce, we distinguish between the type inheritance relation and class inheritance, πΎ : πΎβ², found in class declarations. We also use the transitive closures of these:
πΎ(π) :βπΎ(π)
πΎ(π) : πΎβ²(π ) πΎβ²(πβ²) :βπ
πΎ(π) :βπ[πβ²β π ] (9)
The same rules apply for transitive type inheritance. The asterisk is replaced with a plus+ in the non-reflexive variants :+and <::+.
The rules of subtyping are detailed inFig. 4 (d). The Super rule establishes that a class is a subtype of its supertypes. The Var rule handles the type parameters, depending on their variance:
Statement πΎ(π‘ ) <: πΎ (π‘β²) implies π‘ <: π‘β² if πΎ βs type parameter is covariant, π‘ = π‘β² if invariant, and π‘β²<: π‘ if contravariant. If πΎ accepts multiple type parameters, then each may have its own variance. We also define transitive subtyping:
π‘ <:βπ‘
π‘ <: π‘β² π‘β²<:βπ‘β²β²
π‘ <:βπ‘β²β²
(10) For example, we can use the Super and Var rules to prove the subtyping query ofEq. (8):
π£0πΈ <: ππππππππΈ
β ππ£0ππΈ <: ππππππππΈ (Super, π£0(β¦π₯ ) : ππ£0ππ₯)
β π£0ππΈ <: πππππππΈ (Var, βπ1= +)
β ππ£0πππΈ <: πππππππΈ (Super, π£0(β¦π₯ ) : ππ£0ππ₯)
β π£0πππΈ <: ππππππΈ (Var, βπ1= +)
β π£0ππππΈ <: ππππππΈ (Super, π£0(β¦π₯ ) : ππ£0ππ₯)
β π£0ππππΈ <: πππππΈ (Var, βπ1= +)
β πππππΈ <: πππππΈβ (Super, π£0(β¦π₯ ) : ππ₯ )
Note that this subtyping query is recursive because the type parameters of classes π and π are covariant. If the parameters were contravariant, then the query would reverse (:>) after applying the Var rule.
A class table is well-formed if it complies with the rules inFig. 4 (e). First, inheritance must be acyclic, i.e., a class may not (transitively) inherit itself. Thus, the class inheritance closure :βis finite.
Kennedy and Pierce also explain that transitive subtyping and semantic soundness rely on certain restrictions to class inheritance, πΎ(βπ) : π.
To demonstrate the problem, consider the followinginvalid class declarations:
π(βπ₯ ) : β π(+π¦) : π(π¦)
Type parameter π¦ of class π is covariant, but it also substitutes for πβs contravariant parameter π₯ . This means that π¦ can be used both covariantly and contravariantly. For instance, if π <:: π , then
π(π) <: π (π) but also
π(π) <: π(π)
One way to solve this issue is by changing π βs supertype to π(π(π¦)):
π(+π¦) : π(π(π¦)) Once done, π¦ behaves covariantly, as expected:
ππ <: πππ
β πππ <: πππ (Super, π (+π¦) : πππ¦)
β ππ :> ππ (Var, βπ1 = β)
β π <: π β (Var, βπ1 = β)
Intuitively, one may think of a covariant type parameter as defining a βpositive positionβ, and a contravariant one a βnegative positionβ. A covariant type parameter π₯ β π can be placed in supertype π only in positive positions, and if it is contravariant, only in negative positions. A sub-tree πβ²of π in a negative position reverses its polarity: Positive positions become negative, and negative positions become positive. The initial position of π is positive. In this context, an invariant position is both positive and negative, and so are invariant type parameters. Thus, invariant type parameters can appear in any position, while a type in an invariant position contains only invariant type parameters, because its positions are both positive and negative.
For example, consider the following class table:
π(+π₯ ) : β π(+π₯1) : ππππ₯1
π(βπ₯ ) : β π(βπ₯2) : πππππ₯2
π(β¦π₯ ) : β π(βπ₯3,β¦π₯4,+π₯5) : ππ (π₯4, ππ₯4, π₯4)
(11)
This class table, ranging over classes π, π, π, π, π, and π , is well-formed. The table has no inheritance cycles, e.g., although type π appears in the type it inherits, we consider only the roots of supertypes in checks of circularity. Restrictions on the use of type parameters are also respected. Covariant type parameter π₯1in type ππππ₯1is in a positive position, as both π and π are covariant. Contravariant type parameter π₯2in type πππππ₯2is in a negative position: Type π reverses the polarity an odd number of times and type π does not change it, so the final position is negative. Invariant type parameter π₯4 can be placed in all the positions of π in type π π(π₯4, ππ₯4, π₯4), and it is the only parameter suitable for the invariant (second) position. The C#programming language enforces the rules ofFig. 4 (e); in List. 4, we encode the class table ofEq. (11)as C#interface declarations that compile as expected.
Let the type system featured inFig. 4be denoted asSβ€.Kennedy and Pierce[2007] proved thatSβ€is undecidable. Nevertheless, they introduced three restrictions onSβ€, each defining a decidable type system fragment ofSβ€. Each restriction removes a certain feature of the type system:
1 interfacea<out x> {} interface d<out x1> : a<d<a<x1>>> {}
2 interfaceb<in x> {} interface e<in x2> : b<b<a<b<x2>>>> {}
3 interfacec<x> {} interface f<in x3, x4,out x5> : a<f<x4,a<x4>,x4>> {}
Listing 4. The class table ofEq. (11)encoded in C#
contravariance (C), expansive inheritance (X), or multiple instantiation inheritance (M). We denote a type system by the set of features used by its programs, e.g.,β¨C, X, Mβ© is an alias of Sβ€, and its subset containing only programs without contravariance is denoted byβ¨X, Mβ©. Each element of the power set of{C, X, M} defines a type system. We organize the eight type systems in a lattice, depicted inFig. 1, ordered with respect to inclusion (of features and programs). The bottom of the lattice isSβ₯, bare of any special features. All these type systems employ polyadic parametric polymorphism, i.e., where a class can have multiple type parameters.
The first restriction of Kennedy and Pierce forbids contravariant type parameters,βπ₯ , yielding type systemβ¨X, Mβ©. The second restriction disallows multiple instantiation inheritance, making inheritance deterministic: If πΎ(π) :+πΎβ²(π ) and πΎ (π) :+πΎβ²(πβ²), then necessarily π = πβ². For instance, the class table ofEq. (8)does not conform to this restriction, as class π£0extends both π(π₯ ) and π(ππ₯ ). Kennedy and Pierce suspected that single instantiation inheritance, featured in β¨C, Xβ©, is not enough to achieve decidability.Grigore[2017] verified the suspicion by simulating Turing machines with subtyping queries in Java, which employs deterministic inheritance. To make single inheritance decidable, it had to be further restricted; the resulting type system will not be discussed in this paper.
The third restriction, due toViroli[2000], limits the growth of types during subtyping. Consider, for example, the class table depicted inEq. (11). Given type ππ‘ , for some type argument π‘ , on either side of a subtyping query, it can change to ππππ‘ by applying the Super rule and then to πππ‘ by Var, yielding the type ππ‘β²for π‘β²= ππ‘ . Type ππ‘ can, therefore, expand indefinitely during subtyping.
Kennedy and Pierce named this kind of inheritancerecursively expansive and showed its connection to infinite subtyping proofs.
Originally, Viroli illustrated this growth of types by a graph, depicting the dependencies between the different type variables employed by the class table. In this paper, however, we use a more approachable definition, following an equivalence result of Kennedy and Pierce. Consider the following definition:
Definition 3.1 (Inheritance and decomposition closure). Set π of types is closed under inheri- tance and decomposition if for every type πΎ(π) in π , its sub-trees are in π , π β π , and so are its supertypes, πΎ(π) <:: π‘ β π‘ β π . The minimal super-set of set π closed under inheritance and decomposition is denotedcl(π). (The curly braces of set π may be omitted inside cl(Β·), e.g., cl(π‘1, π‘2).) With the class table ofEq. (11), for instance,cl(ππ‘ ) β {ππππ‘| π β₯ 0}, as explained above.Kennedy and Pierce[2007] noted that the set of types involved in a subtyping query π‘β₯<: π‘β€is contained in their inheritance and decomposition closure,cl(π‘β₯, π‘β€), following the Super (inheritance) and Var (decomposition) rules. If a closurecl(π‘β₯, π‘β€) is finite, then the subtyping query π‘β₯<: π‘β€is decidable, as we can trace infinite subtyping cycles. Therefore, if the closurecl(π‘, π‘β²) of any two types π‘ and π‘β²is finite, subtyping is decidable, and the class table is called finitary. Kennedy and Pierce proved that subtyping with non-expansive inheritance is decidable, by showing that a class table is non-expansive if and only if it is finitary. Thus, we can useDefinition 3.1to define non-expansive inheritance directly:
Definition 3.2 (Non-expansive class table). Class table Ξ is non-expansive if and only if the inheritance and decomposition closure of any set of types π is finite,| cl(π )| < β.
We denote the type system fragment where all the class tables are non-expansive asβ¨C, Mβ©.
If a class tableΞ conforms to type system S, we write S β’ Ξ. For example, class table Ξ ofEq. (11) conforms toSβ€, writtenSβ€β’Ξ, as it is well-formed, but not to β¨C, Mβ©, written β¨C, Mβ© β¬ Ξ, as its inheritance is expansively recursive.
4 SUBTYPING MACHINES AS FOREST RECOGNIZERS
A subtyping machine is a set of type system definitions that utilize subtyping and its rules to perform a computation.Grigore[2017], who coined the term, used subtyping machines to simulate Turing machines in Java, thus proving its type system is undecidable.
This section formalizes subtyping machines as general computation devices, following a standard approach in fluent API research [Gil and Roth 2020,2019;Yamazaki et al. 2019]. Subtyping machines recognize formal forests, i.e., sets of trees or terms. InSection 5, we use this formalism to classify the computational power of decidable subtyping in its various forms.
This sectionβs running example is the program shown inList. 1. We focus on the type declaration in the first two lines of the code:
1 interfacea<out x> {} interface b<out x> {} interfaceE {}
2 classv0<x> : a<v0<a<x>>>, a<a<x>>, a<x>, b<v0<b<x>>>, b<b<x>>, b<x> {}
Let us name this programPali. InSection 1, we introducedPali as an example of a subtyping machine that recognizes palindromes, but did not explain what βrecognizeβ means. Our goal is, therefore, to formalize an interpretation ofPali as a formal forest of palindromes.
In our type system, presented inFig. 4, a program π =Ξ π comprises a set of declarations Ξ constituting a class table and a query (or assertion) π= π‘β₯<: π‘β€. The query succeeds (compiles) if and only if type π‘β₯is a subtype of type π‘β€considering class tableΞ and the subtyping semantics of the type system, π‘β₯ <:Ξπ‘β€. Therefore, a given class table defines the set of queries that compile against it; we call this set theextent of the class table.
The extent ofPaliβs class tableΞ (written using abstract syntax inEq. (8)) contains, for instance, the queries
π£0πΈ <: ππΈ , π£0πΈ <: ππππΈ , and π£0πΈ <: ππππππππΈ ,
because they type check againstΞ. In general, the extent of Ξ contains all the queries of the form π£0πΈ <: π πΈ
where π is a palindrome over{π, π} (we prove this inSection 5.2).Ξβs extent, however, also contains queries that do not include palindromes at all, such as
πΈ <: πΈ , π£0ππΈ <: πππΈ , and ππ£0πΈ <: ππ πΈ .
To correlatePaliβs extent with a forest of palindromes, we first need to restrict the contents of its queries.
Recall that formal grammars distinguish between terminal and non-terminal (variable) symbols, drawn from setsΞ£ andπ , respectively. This dichotomy allows grammars to use variables as auxiliary symbols, which cannot appear as part of derived words. To make our definition of subtyping machines effective, we introduce an analogous separation inΞ, the set of class names in Ξ. Let sub-alphabetΞ£β₯βΞ and super-alphabet Ξ£β€βΞ be the domains of subtypes π‘β₯and supertypes π‘β€ in subtyping queries, i.e., types π‘β₯and π‘β€consist of type names drawn from designatedΞ subsets Ξ£β₯
andΞ£β€. We amend Kennedy and Pierceβs type system, defined inFig. 4, to include the sub- and super-alphabets:
π ::= π‘β₯<: π‘β€ π‘β₯βΞ£β³β₯, π‘β€βΞ£β³β€ (Ξ£β₯β³,Ξ£β³β€βΞ) (12) From this point on, we discuss programs that include the amendment presented inEq. (12). We define the extent of class tableΞ as the subset of Ξ£β₯β³ΓΞ£β³β€:
Definition 4.1 (The extent of a class table). Let Ξ be a class table, as described inFig. 4. The extent ofΞ, denoted πΏ(Ξ), contains all the pairs of types β¨π‘β₯, π‘β€β© such that the subtyping query π‘β₯<: π‘β€ type checks againstΞ:
πΏ(Ξ) = {β¨π‘β₯, π‘β€β© | π‘β₯βΞ£β₯β³, π‘β€βΞ£β€β³, π‘β₯<:Ξπ‘β€} Ξ£β₯,Ξ£β€βΞ
Definition 4.1gives us more control over the expressiveness of programs. The extent of a class table, however, does not yet describe a formal forest. Although types are homomorphic to trees, an extent is a set of type pairsβ¨π‘β₯, π‘β€β©, rather than a set of lone types. To solve this issue, we fix the subtype π‘β₯to some constant type that is no longer a variable part of the subtyping query. Then, a class tableΞ defines a (tree) language of π‘β₯supertypes; this set is the projection of type π‘β₯onto extent πΏ(Ξ):
Definition 4.2 (The language of a class table). Let Ξ be a class table and π‘β₯be a type, as described inFig. 4. The language ofΞ over π‘β₯, denoted πΏπ‘
β₯(Ξ), contains all the supertypes of π‘β₯againstΞ, πΏπ‘
β₯(Ξ) = {π‘β€| β¨π‘β₯, π‘β€β© β πΏ(Ξ)} = {π‘β€| π‘β€βΞ£β€β³, π‘β₯<:Ξπ‘β€} Ξ£β€βΞ
We can now useDefinition 4.2to describePaliβs language over subtype π‘β₯= π£0πΈ as a forest of palindromes:
πΏ(Pali)π£0πΈ = {ππΈ | π is a palindrome} (Ξ£β€= {π, π, πΈ}) (13) In other words,Pali is a subtyping machine that recognizes palindromes as supertypes of type π‘β₯= π£0πΈ . We proveEq. (13)inSection 5.2.
ConsideringDefinition 4.2, we see that class tables describe formal languages (forests), as grammars do, and their compilers serve as language (forest) recognizers, similarly to grammar parsers. Correspondingly, a type system is a family of programs that recognizes a class of languages:
Definition 4.3 (The class of languages of a type system). The computational class (expressiveness) of subtyping-based type systemS, denoted L (S), is the set of languages described by its programs:
L (S) = {πΏπ‘β₯(Ξ) | π‘β₯βΞ£β₯β³, S β’Ξ} Ξ£β₯βΞ
(Recall thatS β’Ξ means class table Ξ is well-formed and conforms to the restrictions of type systemS.)
5 EXPRESSIVENESS OF DECIDABLE SUBTYPING
Recall that the type system of Kennedy and PierceSβ€ features contravariance (C), expansive inheritance (X), and multiple instantiation inheritance (M). Subset π of{πΆ, π , π } describes the type system fragment ofSβ€endorsing only the features included in π . Subtyping inSβ€andβ¨C, Xβ© is known to be undecidable, but it is decidable in the other six fragments [Grigore 2017;Kennedy and Pierce 2007]. InSection 4, we described class tables as subtyping machines that recognize formal forests (Definition 4.2) and defined the expressiveness of a type system as the class of forests that its tables recognize (Definition 4.3). Now we use these formalizations to describe the computational power of Kennedy and Pierceβs decidable type system fragments.
Given type systemS, we show S is at least as complex as class of languages L by proving that any language β inL is described by some class tableΞ and subtype π‘β₯ofS:
ββ β L, βΞ, π‘β₯, S β’Ξ, β = πΏπ‘β₯(Ξ) β L β L(S)
To show that type systemS is not more complex than class of languages L, we prove that any class tableΞ and subtype π‘β₯describe anL language:
βΞ, π‘β₯, S β’Ξ, ββ β L, β = πΏπ‘β₯(Ξ) β L(S) β L
If the expressiveness ofS is bounded by L from both directions, then it is exactly L:
L β L (S) β L β L (S) = L
In all the following proofs, formal languages are specified by their corresponding grammars.
5.1 Non-Expansive Subtyping Is Regular
Greenman et al.[2014] raised a concern about non-expansive inheritance restricting some con- ventional class table use-cases. We go one step further by showing that the expressiveness of non-expansive class tables is regular, i.e., of the lowest tier.
Theorem 5.1. Any regular forest can be encoded in a class table using only multiple instantiation inheritance:
NRF β L (β¨Mβ©)
Intuition. We want to simulate the derivation of a regular tree grammar with subtyping. In the first step, we encode every terminal node π of rank π by class π that has π type parameters. Thus, terminal trees (derived from the grammar) and types (in subtyping queries) coincide, as both use the same term notation and base alphabet, e.g., π1(π2, π3π4) is both a tree and a type. Next, every regular derivation rule π£ ::= π (π) is directly encoded by a corresponding inheritance rule π£ : π (π).
With these definitions, the subtyping query π£ <: π‘ simulates the grammar derivation π£ ββπ‘ : Let π‘ = π (π). Grammar variable π£ derives π‘ if π£ derives π, π£ ::= π (π) and then, recursively, variables π derive πβs children, π ββ π. Correspondingly, the subtyping query π£ <: π‘ type checks if class π£ inherits type π , π£ : π(π) and then, recursively, types π are subtypes of π, π <: π. The recursion of subtyping originates from the Var rule, under the condition that the type parameters of π are covariant.
Example. Consider the regular tree grammar inEq. (3)inSection 2, describing lists of Peano numbers. This grammar can be encoded in a C#program with neither contravariance nor expansive recursive inheritance, as demonstrated inList. 5.
1 interfacez {} // Terminal leaf z
2 interfaces<out x> {} // Terminal node s(π₯)
3 interfacenil {} // Terminal leaf nil
4 interfacecons<out x1, out x2> {} // Terminal node cons(π₯1, π₯2)
5 interfaceNat: // Variable Nat
6 z, // Production Nat ::= z
7 s<Nat> {} // Production Nat ::= s(Nat)
8 interfaceList: // Variable List
9 nil, // Production List ::= nil
10 cons<Nat, List> {}// Production List ::= cons(Nat, List)
11 // Subtyping query List <: cons(ssz, cons(sz, cons(z, nil)))
12 cons<s<s<z>>, cons<s<z>, cons<z, nil>>> t=(List)null;
Listing 5. C#program encoding the grammar ofEq. (3)
The grammar terminalss, z, cons, and nil are encoded by interfaces with covariant type parameters, and the grammar variablesNat and List are encoded by interfaces whose supertypes are the possible productions of each variable. The last line of the listing contains a variable assignment that ensures
type π£0 = List is a subtype of the C#type encoding the listβ¨2, 1, 0β© (described inEq. (4)). This assignment compiles, as expected.
Proof ofTheorem 5.1. A given regular forest is described by a regular tree grammar πΊ =
β¨Ξ£,π, π£0, π β©. We encode πΊ as class tableΞ, β¨Mβ© β’ Ξ, such that the language of Ξ (over π‘β₯= π£0) is exactly the language described by πΊ , πΏ(πΊ).
Construction. The class table Ξ employs type names drawn from Ξ = Ξ£ βͺ π . The number of type parameters of each type π βΞ£ is identical to its rank in πΊ, and all type parameters are covariant (nodes π£ β π are leaves). For each regular production rule π£ ::= π (π) β π , set type π (π) as a super-type of type π£ , π£ : π(π). We set the subtype to π‘β₯= π£0, and the super-alphabet toΞ£β€=Ξ£.
Correctness of the construction. Subtyping against Ξ directly emulates derivations of grammar πΊ, where the left-hand side is the current variable (tree form), starting with the initial variable π£0, and the right-hand side is the derived tree. We show that a tree π‘ is produced by variable π£ if and only if type π‘ is a supertype of type π£ ,
π£βπΊβ π‘ β π£ <: π‘ (14)
by induction on the height of π‘ , π=height(π‘ ).
If π= 0, then tree π‘ is a leaf, π‘ = π, π β Ξ£. Variable π£ derives π only if production π£ ::= π is in π .
In that case, type π is a supertype of π£ , π£ : π βΞ, so π£ is a subtype of π‘ by inheritance:
π£βπΊβ π β π£ ::= π β π β π£ : π βΞ β π£ <: π
SupposeEq. (14)holds for πβ₯ 0, and let π‘ be a tree of height π + 1, π‘ = π (π), π βΞ£, π β Ξ£β³, where the heights of sub-treesπ are π at most. If variable π£ derives π‘ , then there is a production π£ ::= π (π) in π , for which every child variable π£π β π derives the corresponding child tree π‘π,π ββπΊ π. This production ensures type π£ inherits type π(π). As the heights of π are no higher than π, we can use the inductive assumption to deduceπ <: π. Because the type parameters of π are covariant, we can show that π£ is a subtype of π‘ by applying inheritance, decomposition, and the inductive assumption:
π£ <: π(π) β π£ : π (π) β§ π (π) <: π (π) π(π) <: π (π) β π <: π β§ βπ= + Overall, we have now provedEq. (14):
π£βπΊβ π‘ β π£ ::= π (π) β§ π βπΊβ π β π£ : π (π) β§ π <: π β π£ <: π‘ By applyingEq. (14)to type π‘β₯= π£0, we get
π£0βπΊβ π‘ β π‘β₯<: π‘ , i.e., πΏ(πΊ) = πΏπ‘β₯(Ξ) by definition.
Class tableΞ conforms to the restrictions of the type system, β¨Mβ© β’ Ξ. First, the class table is well-formed. Only types π£β π inherit types of the form π (π) for π βΞ£, so there are no inheritance cycles and no type parameters in the wrong position. Second,Ξ does not employ contravariance and is non-expansive: The structure of inheritance rules guarantees that any type incl(π‘ ) is higher
than π‘ by one level at the most. β‘
Theorem 5.2. Non-expansive class tables describe regular forests:
L (β¨C, Mβ©) β NRF