Semantic Expressive Capacity with Bounded Memory

(1)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 65–79 65

Semantic expressive capacity with bounded memory

Antoine Venant and Alexander Koller Department of Language Science and Technology

Saarland University

{venant|koller}@coli.uni-saarland.de

Abstract

We investigate the capacity of mechanisms for compositional semantic parsing to describe re-lations between sentences and semantic repre-sentations. We prove that in order to represent certain relations, mechanisms which are syn-tactically projective must be able to remem-ber an unbounded numremem-ber of locations in the semantic representations, where nonprojective mechanisms need not. This is the first result of this kind, and has consequences both for grammar-based and for neural systems. 1 Introduction

Semantic parsers which translate a sentence into a semantic representation compositionally must recursively compute a partial semantic represen-tation for each node of a syntax tree. These partial semantic representations usually contain placeholders at which arguments and modifiers are attached in later composition steps. Ap-proaches to semantic parsing differ in whether they assume that the number of placeholders is bounded or not. Lambda calculus (Montague,

1974;Blackburn and Bos,2005) assumes that the number of placeholders (lambda-bound variables) can grow unboundedly with the length and com-plexity of the sentence. By contrast, many meth-ods which are based on unification (Copestake et al.,2001) or graph merging (Courcelle and En-gelfriet,2012;Chiang et al.,2013) assume a fixed set of placeholders, i.e. the number of placeholders is bounded.

Methods based on bounded placeholders are popular both in the design of hand-written gram-mars (Bender et al.,2002) and in semantic parsing for graphs (Peng et al., 2015; Groschwitz et al.,

2018). However, it is not clear that all relations be-tween language and semantic representations can be expressed with a bounded number of place-holders. The situation is particularly challenging

when one insists that the compositional analysis is projective in the sense that each composition step must combine adjacent substrings of the in-put sentence. In this case, it may be impossible to combine a semantic predicate with a distant argu-ment immediately, forcing the composition mech-anism to use up a placeholder to remember the ar-gument position. If many predicates have distant arguments, this may exceed the bounded “memory capacity” of the compositional mechanism.

In this paper, we show that there are relations between sentences and semantic representations which can be described by compositional mech-anisms which are bounded and non-projective, but not by ones which are bounded and projective. To our knowledge, this is the first result on expressive capacity with respect to semantics – in contrast to the extensive literature on the expressive capacity of mechanisms which describe just the string lan-guages.

More precisely, we prove that tree-adjoining grammars can describe string-graph relations us-ing the HR graph algebra (Courcelle and En-gelfriet, 2012) with two sources (bounded, non-projective) which cannot be described using linear monadic context-free tree grammars and the HR algebra withksources, for any fixed k(bounded, projective). This result is especially surprising be-cause TAG and linear monadic CFTGs describe the same string languages; thus the difference lies only in the projectivity of the syntactic analysis.

We further prove that given certain assump-tions on the alignment between tokens in the sen-tence and edges in the graph, no generative de-vice for projective syntax trees can simulate TAG with two sources. This has practical consequences for the design of transition-based semantic parsers (whether grammar-based or neural).

Plan of the paper.We will first explain the

(2)

(3)

originally had. However, this derivation combines verbs with nouns which are not adjacent in the string, which is not allowed in many grammar for-malisms. If we limit ourselves to combining only adjacent substrings (projectively, see Fig.1b), we must remember the placeholders for all the verbs at the same time if we want to obtain the correct predicate-argument structure. Thus, the number of placeholders grows with the length of the sen-tence; this is only possible with a lambda-style compositional mechanism.

There is scattered evidence in the literature for this tension between bounded memory and projectivity. Chiang et al. (2013) report (of a compositional mechanism based on the HR al-gebra, unification-style) that a bounded num-ber of placeholders suffices to derive the graphs in the AMR version of the Geoquery corpus, but Groschwitz et al. (2018) find that this re-quires non-projective derivations in 37% of the AMRBank training data (Banarescu et al.,2013). Approaches to semantic construction with tree-adjoining grammar either perform semantic com-position along the TAG derivation tree using unifi-cation (non-projective, unifiunifi-cation-style) (Gardent and Kallmeyer, 2003) or along the TAG derived tree using linear logic (projective, lambda-style) (Frank and van Genabith, 2001). Bender (2008) discusses the challenges involved in modeling the predicate-argument structure of a language with very free word order (Wambaya) with projective syntax. While the Wambaya noun phrase does not seem to require the projective grammar to collect unbounded numbers of unfilled arguments as in Fig.1b, Bender notes that her projective analysis still requires a more flexible handling of seman-tic arguments than the HPSG Semanseman-tic Algebra (unification-style) supports.

In this paper, we define a notion of semantic expressive capacity and prove the first formal re-sults about the relationship between projectivity and bounded memory.

3 Formal background

LetN0 = {0,1, . . .}be the nonnegative integers.

Asignature is a finite setΣ of function symbols

f, each of which has been assigned a nonnega-tive integer called itsrank. We write Σn for the

symbols of rank n. Given a signatureΣ, we say that all constantsa∈Σ0aretreesoverΣ; further,

if f ∈ Σn andt1, . . . , tn are trees over Σ, then

f(t1, . . . , tn) is also a tree. We writeTΣ for the

set of all trees overΣ. We define theheightht(t)

of a treet = f(t1, . . . , tn) to be1 + maxht(ti),

andht(c) = 1forc∈Σ0.

LetX /∈ Σ, and letΣX = Σ∪ {X} (withX

as a constant of rank 0). Then we call a treeC ∈ TΣX acontextif it contains exactly one occurrence ofX, and writeC_Σ for the set of all contexts. A context can be seen as a tree with exactly one hole. Ift ∈ TΣ, we writeC[t]for the tree inTΣ that is

obtained by replacingXwitht.

Given a string w ∈ W∗, we write|w|_a for the number of times thata∈W occurs inw.

3.1 Grammars for strings and trees

We take a very general view on how semantic rep-resentations for strings are constructed composi-tionally. To this end, we define a notion of “gram-mar” which encompasses more devices for de-scribing languages than just traditional grammars, such as transition-based parsers.

We say that a tree grammar G over the sig-nature Σ is any finite device that defines a lan-guage L(G) ⊆ TΣ. For instance, regular tree

grammars (Comon et al.,2007) are tree grammars, and context-free grammars can also be seen as tree grammars defining the language of parse trees.

We say that astring grammarG= (G,yd)over the signatureΣand the alphabetW is a pair con-sisting of a tree grammar G over Σ and a yield function yd : T_Σ → W∗ which maps trees to strings over W (Weir, 1988). A string grammar defines a languageL(G) ={yd(t)|t∈L(G)} ⊆

W∗. We call the treest∈L(G)derivations. A particularly common yield function is the function yd_pr, defined as yd_pr(f(t1, . . . , tn)) =

yd_pr(t1)·. . .·ydpr(tn)ifn >0andydpr(c) =c

if c has rank 0. This yield function simply con-catenates the words at the leaves of t. Applied to the phrase-structure tree t in Fig. 2c, yd_pr(t)

is the Swiss German sentence in (1). Context-free grammars can be characterized as string grammars that combine a regular tree grammar with yd_pr. By contrast, we can model tree-adjoining gram-mars (TAG,Joshi and Schabes,1997) by choosing a tree grammarGthat describes derivation trees as in Fig.2b. Theyd function could then substitute and adjoin the elementary trees as specified by the derivation tree (see Fig.2a) and then read off the words from the resulting derived tree in Fig.2c.

(4)

(5)

stant s-graphs and the rename and forget opera-tions. The class of graphs which can be expressed as values of terms over the algebra increases with

k. We writeH_k for the HR algebra withksource names (and some set of edge labels).

LetGbe an s-graph, and let G0 be a subgraph of G, i.e. a subset of its edges. We call a node aboundary nodeofG0 if it is incident both to an edge in G0 and to an edge that is not inG0. For instance, the s-graph in Fig. 2e is a subgraph of the one in Fig.2d; the boundary nodes are drawn shaded in (d). The following lemma holds:

Lemma 2. LetG=JC[t]Kbe an s-graph, and let

G0 be a subgraph ofG such that the s-graph_Jt_K

contains the same edges asG0. Then every bound-ary node inG0 is a source in_JtK.

3.4 Grammars with semantic interpretations

Finally, we extend string grammars to composi-tionally relate strings with semantic representa-tions. LetG = (G,yd)be a string grammar. The tree grammarGgenerates a languageL(G)⊆ TΣ

of trees. We will map each tree t ∈ L(G) into a term h(t) over some algebra A over a signa-ture ∆using a linear tree homomorphism (LTH)

h:T_Σ→ T_∆(Comon et al.,2007), i.e. by compo-sitional bottom-up evaluation. This defines a rela-tion between strings and values ofA:

REL(G, h,A) ={(yd(t),_Jh(t)_KA)|t∈L(G)}

For instance,Acould be some HR algebraHk;

thenREL(G, h,H_k)will be a binary relation be-tween strings and s-graphs. In this case, we abbre-viateJh(t)Kasgraph(t).

If we look at an entire class Gof string

gram-mars and a fixed algebra, this defines a class of such relations:

L(G,A) ={REL(G, h,A)| G ∈G, hLTH}.

In the example in Fig.2, we can define a linear homomorphismh to map the derivation tree t in (b) to a term h(t) which evaluates to the s-graph shown in (d). At the top of this term, the s-graphs at the “chind” and “h¨alfed” (f,g) nodes are com-bined into (d) byh(l¨ond):

h(l¨ond) =hrti let

||fo(hrti

ARG1

−−→ hoi ||renrt→o(G(f)))

||fo(hrti

ARG2

−−→ hoi ||renrt→o(G(g)))

This non-projective derivation produces the s-graph in (d) using only two sources,rtando. By

a b b

c d d

a a

c

b

Figure 3: TheCSDgraph for ((2), (1, 0), (1), (0, 0)); blocks indicated by gray boxes.

contrast, a homomorphic interpretation of the pro-jective tree (c) has to use at least four sources, as the intermediate result in (e) illustrates.

4 Projective cross-serial dependencies

We will now investigate the ability of projec-tive grammar formalisms (G,Hk) to express

L(TAG,H2). We will define a relation CSD ∈

L(TAG,H₂) and prove that CSD cannot be gen-erated by projective grammar formalisms with boundedk. We show this first for arbitrary projec-tiveG, under certain assumptions on the alignment

of words and graph edges. In Section5, we drop these assumptions, but focus onG=LM-CFTG.

4.1 The relationCSD

To construct CSD, consider the string language CSDs={AnBmCnDm |m, n≥1}, where

A={ahkakik|k≥0},

and analogously for B, C, D. An example string inCSDs is ahhaaii bhbi b chci d d. Note thatk

can be chosen independently for each segment. Every string w ∈ CSDs can be uniquely

described by m, n, and a sequence K(w) = (K(a), K(b), K(c), K(d)) of numbers specifying the k’s used in each segment, where K(a), K(c)

each containnnumbers andK(b)_{, K}(d)_contain_m

numbers. In the example, we haven= 1,m= 2, andK(w) = ((2),(1,0),(1),(0,0)).

We associate a graphGw with each stringw∈

CSDsby the construction illustrated in Fig.3. For

each1 ≤ i ≤n, we define thei-tha-blockto be the graph consisting of nodesu−→c vwith a further outgoinga-edge from u. In addition,u connects to a linear chain ofK_i(a)edges with labela, andv

to a linear chain ofK_i(c)c-edges. Gw consists of

a linear chain of the n a-blocks, followed by the

(6)

*1

a

b *1

b! *0 *0

*0

c d

d

→ (a)

b <rt>

d G0

<rt>

d b

d b a <rt>

c d

b

d b

→ (b)

→ (c)

→ (d) (a)

(b)

(c) (d)

Figure 4: An derivation of ((0), (0,0), (0), (0,0)).

Note thatCSDis a more intricate version of the cross-serial dependency language. CSD can be generated by a TAG grammar along the lines of the one from Section3.4, using a HR algebra with two sources; thusCSD∈ L(TAG,H₂).

4.2 CSDwith bounded blocks

The characteristic feature of CSD is that edges which are close together in the graph (e.g. the a

andcedge in an a-block) correspond to symbols that can be distant in the string (e.g. a andc to-kens). Projective grammars cannot combine pred-icates (a) and arguments (c) directly because of their distance in the string; intuitively, they must keep track of either the c’s or thea’s for a long time, which cannot be done with a boundedk.

Before we go into exploiting this intuition, we first note that its correctness depends on the details of the construction ofCSD, in particular the abil-ity to select arbitrary and independentK(x)for the differentx ∈ {a, b, c, d}. Consider the derivation

t on the left of Fig. 4 with its projective yield

abbcdd; this is the case((0),(0,0),(0),(0,0))of

CSD, corresponding to the CSDgraphG1 shown

in Fig. 4 (a). We can map t to this graph by applying the following linear tree homomorphism

hintoH2:

h(∗1) =fs(x1||renrt→s(x2)) h(∗0) =x1

h(b) =d←−bhrti −→ hsi h(b!) =d←−bhrti h(a) =c←−ahrti −→ hsi

A derivation of the form∗0(t1, t2)evaluates to the

same graph ast1; the graph value oft2is ignored.

Thus if we assume that the subtree of t for cdd

evaluates to some arbitrary graphG0, the complete

derivationt evaluates to G1. Some intermediate

results are shown on the right of Fig.4.

If we letCSD0 be the subset ofCSDwhere all K(x)are zero, we can generalize this construction into an LM-CFTG which generatesCSD0. Thus,

CSD0 can be generated by a projective grammar

that is interpreted into H2. But note that the

derivation in Fig.4is unnatural in that the symbols in the string are not generated by the same deriva-tion steps that generate the graph nodes that intu-itively correspond to them; for instance, the graphs generated for the d tokens are completely irrele-vant. Below, we prevent unnatural constructions like this in two ways. We will first assume that string symbols and graph nodes must bealigned

(Thm.1). Then we will assume that theK(x) can be arbitary, which allows us to drop the alignment assumption (Thm.2).

4.3 k-distant trees

Let R ⊇ CSD0 be some relation containing at

least the string-graph pairs ofCSD0, e.g.CSD

it-self. Assume that R is generated by a projec-tive grammar (G, h) with G = (G,yd_pr) and a fixed number k of sources, i.e. we have R = REL(G, h,Hk). We will prove a contradiction.

Given a pair (w, Gw) ∈ R, we say that two

edgese, f inGwareequivalent,e≡f, if they

be-long to the same block. We call a derivation tree

t∈T=L(G)k-distantifthas a subtreet0such that we can find k edges e1, . . . , ek ∈ graph(t0)

with ei 6≡ ej for alli 6= j and k further edges e0₁, . . . , e0_k ∈ Gw\graph(t0)such thatei ≡ e0i for

alli. For such trees, we have the following lemma.

Lemma 3. Ak-distant tree has a subtreet0 such

thatgraph(t0)has at leastksources.

Proof. Let BKi be thei-th block in Gw; we let 1 ≤ i ≤ m+nand do not distinguish between

a- andb-blocks. Lett0 be the subtree oftclaimed by the definition of distant trees. For each i, let

E_i0 = BKi ∩ graph(t0) be the edges in the i-th

block generated byt0, and letEi =BKi\E_i0.

By definition,EiandEi0are both non-empty for

at leastkblocks. Each of these blocks is weakly connected, and thus contains at least one nodeui

which is incident both to an edge inEi and inEi0.

This node is a boundary node of graph(t0). Be-cause u1, . . . , uk are all distinct, it follows from

Lemma2thatgraph(t0)has at leastksources.

We also note the following lemma about deriva-tions of projective string grammars, which fol-lows from the inability of projective grammars to combine distant tokens. We write Sep =

{a/c, c/a, b/d, d/b}.

Lemma 4. LetG= (G,yd)be a projective string

(7)

that anyt ∈ L(G) with yd(t) ∈ a∗bscsd∗ has a subtreet0 such thatyd(t0)containsr occurrences ofxand no occurrences ofy, for somex/y∈Sep.

4.4 Projectivity and alignments

A consequence of Lemma 3 is that if certain string-graph pairs inCSD0 can only be expressed

with k+1-distant trees, then R (which contains these pairs as well) is not in L(G,H_k), because

Hkonly admitsksources.

However, as we saw in Section 4.2, pairs in CSD0 can have unexpected projective derivations

which make do with a low number of sources. So let’s assume for now that the string grammar

G and the tree homomorphism h produce tokens and edge labels that fit together. Let us callG, h

aligned if for all constantsc ∈ Σ0, graph(c)is a

graph containing a single edge with label yd(c). The derivation in Fig. 4 cannot be generated by an aligned grammar because the graph for the to-ken bcontains a d-edge. We writeL↔(G,A) = {REL(G, h,A)| G ∈GandG, haligned}for the

class of string-semantics relations which can be generated with aligned grammars.

Under this assumption, it is easy to see that any relation including CSD0 (hence,CSD) cannot be

expressed with a projective grammar.

Theorem 1. Let G be any class of projective

string grammars and R ⊇ CSD0. For any k, R6∈ L↔(G,Hk).

Proof. Assume that there is aG = (G,yd_pr) ∈

Gand an LTHhsuch thatR = REL(G, h,Hk).

Givenk, chooses ∈ _N₀ such that every treet ∈

T = L(G) withyd(t) = asbscsds has a subtree

t0 such thatyd(t0)containsk+ 1occurrences of

xand no occurrences of y, for some x/y ∈ Sep. Such an sexists according to Lemma4. We can choosetsuch that(yd(t),graph(t))∈CSD0.

BecauseG, hare aligned,graph(t0)contains no

y-edge and at leastk+ 1x-edges. Each of these

x-edges is non-equivalent to all the others, and equivalent to ay-edge ingraph(t)\graph(t0), so

t is k+1-distant. It follows from Lemma 3 that graph(t0)hask+ 1sources, in contradiction to the assumption thatG, huses onlyksources.

5 Expressive capacity of LM-CFTG

Thm.1is a powerful result which shows thatCSD cannot be generated by any device for generating projective derivations using bounded placeholder

memory – if we can assume that tokens and edges are aligned. We will now drop this assumption and prove thatCSDcannot be generated using a fixed set of placeholders using LM-CFTG, regardless of alignment. The basic proof idea is to enforce a weak form of alignment through the interaction of the pumping lemma with very longx-chains. The result is remarkable in that LM-CFTG and TAG are weakly equivalent; they only differ in whether they must derive the strings projectively or not.

Theorem 2. CSD 6∈ L(LM-CFTG,Hk), for any

k.

5.1 Asynchronous derivations

Assume that CSD = REL(G, h,H_k), for some

k, withG = (G,yd) an LM-CFTG. Proving that this is a contradiction hinges on a somewhat tech-nical concept ofasynchronousderivations, which have to do with how the nodes generating edge la-bels such asaare distributed over a derivation tree. We prove that all asynchronous derivations of cer-tain elements ofCSDare distant (Lemma5), and that all LM-CFTG derivations of CSD are asyn-chronous (Lemma6), which proves Thm.2.

In what follows, Let T = L(G). let us write for any tree or context t and symbol x, nt_x as a shorthand for |yd(t)|x, etx for the number of x

-edges ingraph(t)andmt_xfor the maximum length of a string inx∗which is also substring ofyd(t).

Definition 1(x, y, l-asynchronous derivation). Let

x/y ∈ Sep, l > 0, t ∈ T, We callt an x, y, l

-asynchronous derivationiff there is a decomposi-tiont=C[t0]such that

et_y0 ≥nt_y−nt_xl−mt_y

et_x0 ≤nt_xl+mt_x(l+ 1).

We call the pair (C, t0) an x, y, l-asynchronous splitoft.

Lemma 5. For any k, l > 0, there is a pair

ok,l = (wk,l, Gk,l) ∈ CSDsuch that everyx, y, l -asynchronoustwithok,l = (yd(t),graph(t))isk -distant.

Proof. Forx∈ {a, b, c, d}andm∈_N0, letx(m)

denote the wordhm_xm_im_{. Let}_r₌_s_{= 3}_l₊_k_{+ 1}

andok,l = (wk,l, Gk,l) be the unique element of

CSDsuch that

wk,l = (aa(s))r(bb (s)

)r(cc(s))r(dd(s))r.

Let t be an a, c, l-asynchronous derivation of

(8)

By definition, we can split t = C[t0] such that graph(t0)has at mostqa=lr+(l+1)s= (2l+1)s a-edges and at leastqc=rs−rl−s= (2l+k+1)s c-edges. Notice first that graph(t0) contains at most2l+ 1 different complete a-blocks of Gk,l,

because eacha-block containss a-edges. Having

2l+ 2 of them would require(2l+ 2)s a-edges, which is more than theqaa-edges thatgraph(t0)

can contain.

Next, consider2l+kdistincta-blocks ofGk,l.

These blocks contain a total of(2l+k)s <(2l+

k + 1)s = qc c-edges. Hence, the c-edges of

graph(t0)cannot be contained within only2l+k

distinct blocks.

So we can find at least 2l + k + 1 c-edges in graph(t0) which are pairwise non-equivalent. There are at leastkedges among these which are equivalent to an edge inGk,l\graph(t0), because

graph(t0)contains at mostlcompletea-blocks of

Gk,l. Thus,tisk-distant.

5.2 LM-CFTG derivations are asynchronous

So far, we have not used the assumption that T

is an LM-CFTL. We will now exploit the pump-ing lemma to show that all derivation trees of an LM-CFTG forCSDmust be asynchronous.

Lemma 6. IfTis an LM-CFTL, then there exists

l0 ∈ N0 such that for every t ∈ T, there exists x/y∈Sepsuch thattisx, y, l0-asynchronous.

We prove this lemma by appealing to a class of derivation trees in which predicate and argument tokens are generated in separate parts.

Definition 2 (x, y, l-separated derivation). Let

l-separation oftwith a smallerC0.

Intuitively, we can use the pumping lemma to systematically remove some contexts from at ∈

T. From the shape of CSD, we can conclude certain alignments between the strings and graphs generated by these contexts and establish bounds on the number ofx- andy-edges generated by the lower part of a separated derivation. The full proof is in the appendix; we sketch the main ideas here.

Letpdenote the pumping height ofT. There is a maximal number of string tokens and edges that a context of height at most p can generate under

a given yield and homomorphism. We call this numberl0in the rest of the proof.

Lemma 7. Fort ∈ T, letr_yt be the length of the

maximal substring of yd(t) consisting in only y -tokens and containing the rightmost occurrence of

y inyd(t). If t isx, y, l0-separated, there exists

a minimalx, y, l0-separationDx[D0[ty]]oftsuch that, lettingt0 =D0[ty],et_y0 ≥nty−ntxl0−rty.

Moreover, for any x, y, l0-separation t = Ex[E0[t1y]], lettingt1=E0[t1y],etx1 ≤n

t1

x +ntxl0.

Proof (sketch). Both statements must be achieved in separated inductions on the height of t, although they mostly follow similar steps. We therefore focus here only on the crucial parts of the (slightly trickier) bound on et0

y . Let Dx[D0[ty]] be a minimal x, y, l0-separation of t

andt0=D0[ty].

Base CaseIfht(t) ≤p, we havent_y ≤ l0. We

also havent_x>0, sont_y−nt_xl0−rty ≤0≤e t0 ¯ y.

Induction step If h(t) > p, we apply

Lemma 1 to t to yield a decomposition t =

C1[C2[C3[C4[t5]]]], where t0 = C1[C3[t5]] ∈ T,

ht(t0) < ht(t)andht(C2[[C3]C4]) ≤p. We first

observe thatt0 isx, y, l0-separated. By induction,

there exists a minimal separationt0 =Cx[C0[t0y]]

witht0₀ =C0[t0y]validating the bound one t0₀ y .

Be-cause of pumping considerations, we need to dis-tinguish only three configurations ofC2 andC4.

We present only the most difficult case here. In this caseC2 andC4 generate only one kind

of bar symbol,y, and brackets. One needs to ex-amine all possible waysC2,C4 andt0 may

over-lap. We detail the reasoning in the case wheret0

does not overlap withC2 orC4. Then, since all y-tokens are generated by t0, projectivity of the

yield and the definition of CSD impose that the generatedy-tokens contribute to the rightmosty -chaini.e. r_yt =r_yt0 +nC2[C4]

y . Hencee t0

y ≥e

t0 0

y ≥

nt_y−nC2[C4]

y +ntxl0−r_yt0 +nC_y2[C4].

Lemma 8. For anyt∈T, iftisx, y, l0-separated

thentisx, y, l0-asynchronous.

Proof. By Lemma7there is a minimalx, y, l0

-separation t = Dx[D0[ty]] such that, for t0 = D0[ty], the bound onety0 and the bound one

t0

x both

obtain. Observe thatrt

y ≤ mty by definition, and

sincet0 generates at mostl0 x-tokens, by

projec-tivity it generates at most(l0+1)mtxx-tokens (one

sequence ofmt

xbetween each occurrence ofxand

(9)

Lemma 9. For anyt ∈ T,tisx, y, l0-separated

for somex/y∈Sep.

Proof(sketch). The proof proceeds by induction on the height oft.

If ht(t) ≤ p, then |yd(t)|z ≤ l0 for any z ∈

{a, b, c, d}, hence t is trivially x, y, l0-separated

for somex/y∈Sep.

If h(t) > p, Lemma 1 yields a decomposition

t =C1[C2[C3[C4[t5]]]], wheret0 =C1[C3[t5]]∈ T, ht(t0) < ht(t) andht(C2[C3[C4]]) ≤ p. By

induction t0 is x, y, l0-separated for some x/y ∈

Sep. Let us assumex/y = a/c, other cases are analoguous. The challenge is to conclude to the

x, y, l separation of t, after reinsertion of C2 and

C4int0.

IfC2andC4generate noaorc-token, the

dis-tribution of a- andc-tokens in the tree is not af-fected, hence t is a, c, l0-separated. Otherwise,

due to pumping considerations, we need to distin-guish three possible configurations regarding the shape of the yields ofC2andC4. We present one

here, see the appendix for the others; they are in the same spirit.

We consider the case where left(C2) contains

some a-token and no b, c, d-tokens, and yd(C4)

contains somec-token. Assumeleft(C4)contains

somec. It follows that allb-tokens are generated by C3. Sot has less thanl0 b-tokens, by

defini-tion ofCSDit has then also less thanl0 d-tokens,

so(C1, C2[C3[C4]], t5)is ad, b, l0-separation.

As-sume now thatright(C4) contains somec. It

fol-lows that t5 generate no d-token and C1

gener-ate no b-token. Hence (C1, C2[C3[C4]], t5) is a

b, d, l0-separation.

This concludes the proof of Lemma6and Thm.2.

6 Conclusion

We have established a notion of expressive capac-ity in compositional semantic parsing. We have proved that non-projective grammars can express sentence-meaning relations with bounded memory that projective ones cannot. This answers an old question in the design of compositional systems: assuming projective syntax, lambda-style com-positional mechanisms can be more expressive than unification-style ones, which have bounded “memory” for unfilled arguments.

From a theoretical perspective, the stronger re-sult of this paper is perhaps Thm.2, which shows without further assumptions that weakly equiva-lent grammar formalisms can differ in their

se-mantic expressive capacity. However, Thm.1may have a clearer practical impact on the development of compositional semantic parsers. Consider, for instance, the case of CCG, a lexicalized grammar formalism that has been widely used in seman-tic parsing (Bos, 2008; Artzi et al., 2015; Lewis et al., 2016). While a potentially infinite set of syntactic categories can be used in the parses of a single CCG grammar, CCG derivations are still projective in our sense. Thus, if one assumes that derivations should be aligned (which is natural for a lexicalized grammar), Thm.1implies that CCG with lambda-style semantic composition is more semantically expressive than with unification-style composition. Indeed, lambda-style compositional mechanisms are the dominant approach in CCG (Steedman, 2001; Baldridge and Kruijff, 2002;

Artzi et al.,2015).

Furthermore, under the alignment assumptions of Section 4, no unification-style compositional mechanism can describe string-meaning relations like CSD. This includes neural models. For in-stance, most transition-based parsers (Nivre,2008;

Andor et al.,2016;Dyer et al.,2016) are projec-tive, in that the parsing operations can only con-catenate two substrings on the top of the stack if they are adjacent in the string. Such transition sys-tems can therefore not be extended to transition-based semantic parsers (Damonte et al., 2017) without (a) losing expressive capacity, (b) giving up compositionality, (c) adding mechanisms for non-projectivity (G´omez-Rodr´ıguez et al., 2018), or (d) using a lambda-style semantic algebra. Thus our result clarifies how to build an effective and accurate semantic parser.

We have focused on whether a grammar formal-ism is projective or not, while holding the seman-tic algebra fixed. In the future, it would be inter-esting to explore how a unification-style composi-tional mechanism can be converted to a lambda-style mechanism with unbounded placeholders. This would allow us to specify and train semantic parsers using such abstractions, while benefiting from the efficiency of projective parsers.

Acknowledgments

(10)

References

Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. 2016. Globally nor-malized transition-based neural networks. In Pro-ceedings of ACL.

Yoav Artzi, Kenton Lee, and Luke Zettlemoyer. 2015. Broad-coverage CCG Semantic Parsing with AMR. InProceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.

Jason Baldridge and Geert-Jan M. Kruijff. 2002. Cou-pling CCG and Hybrid Logic Dependency Seman-tics. InProceedings of the 40th ACL.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for Sembanking. InProceedings of the 7th Linguis-tic Annotation Workshop and Interoperability with Discourse.

Emily M. Bender. 2008. Radical non-configurationality without shuffle operators: An analysis of Wambaya. In Proceedings of the 15th International Conference on HPSG.

Emily M. Bender, Dan Flickinger, and Stephan Oepen. 2002. The Grammar Matrix: An open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In Proceedings of the COLING Work-shop on Grammar Engineering and Evaluation.

Patrick Blackburn and Johan Bos. 2005. Represen-tation and Inference for Natural Language. CSLI Publications.

Johan Bos. 2008. Wide-coverage semantic analy-sis with Boxer. In Semantics in Text Processing. STEP 2008 Conference Proceedings. College Pub-lications.

David Chiang, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, Bevan Jones, and Kevin Knight. 2013. Parsing graphs with hyperedge replacement grammars. InProceedings of the 51st ACL.

Hubert Comon, Max Dauchet, R´emi Gilleron, Flo-rent Jacquemard, Denis Lugiez, Sophie Tison, Marc Tommasi, and Christof L¨oding. 2007. Tree Au-tomata techniques and applications. Published on-line athttp://tata.gforge.inria.fr/.

Ann Copestake, Alex Lascarides, and Dan Flickinger. 2001. An algebra for semantic construction in constraint-based grammars. InProceedings of the 39th ACL.

Bruno Courcelle and Joost Engelfriet. 2012. Graph Structure and Monadic Second-Order Logic, a Lan-guage Theoretic Approach. Cambridge University Press.

Mary Dalrymple, John Lamping, Fernando C. N. Pereira, and Vijay Saraswat. 1995. Linear logic for meaning assembly. InProceedings of the Workshop on Computational Logic for Natural Language Pro-cessing.

Marco Damonte, Shay B. Cohen, and Giorgio Satta. 2017. An incremental parser for Abstract Meaning Representation. InProceedings of the 15th EACL.

Frank Drewes, Hans-J¨org Kreowski, and Annegret Ha-bel. 1997. Hyperedge replacement graph gram-mars. In G. Rozenberg, editor,Handbook of Graph Grammars and Computing by Graph Transforma-tion, pages 95–162. World Scientific.

Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. Recurrent neural network grammars. InProceedings of NAACL.

Anette Frank and Josef van Genabith. 2001. GlueTag: Linear logic based semantics for LTAG. In Proceed-ings of the LFG Conference.

Claire Gardent and Laura Kallmeyer. 2003. Semantic construction in feature-based TAG. InProceedings of EACL.

Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, and Alexander Koller. 2018. AMR dependency parsing with a typed semantic al-gebra. InProceedings of ACL.

Carlos G´omez-Rodr´ıguez, Tianze Shi, and Lillian Lee. 2018. Global transition-based non-projective depen-dency parsing. InProceedings of ACL.

Aravind K. Joshi and Yves Schabes. 1997. Tree-Adjoining Grammars. In G. Rozenberg and A. Salo-maa, editors,Handbook of Formal Languages, vol-ume 3. Springer-Verlag.

Stephan Kepser and James Rogers. 2011. The equiva-lence of tree adjoining grammars and monadic linear context-free tree grammars. Journal of Logic, Lan-guage and Information, 20(3):361–384.

Alexander Koller. 2015. Semantic construction with graph grammars. InProceedings of the 11th Inter-national Conference on Computational Semantics, pages 228–238.

Mike Lewis, Kenton Lee, and Luke Zettlemoyer. 2016. LSTM CCG Parsing. InProceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

T. Maibaum. 1978. Pumping lemmas for term lan-guages. Journal of Computer and System Sciences, pages 319–330.

(11)

Joakim Nivre. 2008. Algorithms for deterministic in-cremental dependency parsing. Computational Lin-guistics, 34(4):513–553.

Xiaochang Peng, Linfeng Song, and Daniel Gildea. 2015. A synchronous hyperedge replacement gram-mar based approach for AMR parsing. In Proceed-ings of the 19th Conference on Computational Lan-guage Learning.

William Rounds. 1969. Context-free grammars on trees. InProceedings of the First Annual ACM Sym-posium on Theory of Computing (STOC).

Stuart M. Shieber. 1985.Evidence against the context-freeness of natural language. Linguistics and Phi-losophy, 8(3):333–343.

Mark Steedman. 2001. The Syntactic Process. MIT Press, Cambridge, MA.

David Weir. 1988. Characterizing mildly context-sensitive grammar formalisms. Ph.D. thesis, Uni-versity of Pennsylvania.

A Details of the proof of Theorem 1

Lemma 4. LetG= (G,yd)be a projective string

grammar. For anyr ∈_N₀there existss∈_N₀such that anyt ∈ L(G) with yd(t) ∈ a∗bscsd∗ has a subtreet0 such thatyd(t0)containsr occurrences ofxand no occurrences ofy, for somex/y∈Sep.

Proof. Depending onyd, one can always choose

s > r such that anyt with |yd(t)| > 2s has at

least one strict subtreet0with|yd(t0)| ≥2r. The lemma follows by induction over the height oft. It is trivially true for height 1. For the induc-tion step, consider thatw0 = yd(t0)must have at leastroccurrences of some letter because of pro-jectivity and the shape of yd(t); assume it is a, the other cases are analogous. Ifw0 has no occur-rences of c, we are done. Otherwise, by projec-tivity,w0 contains all theb’s, i.e.w0 ∈ a∗bsc+d∗. In this case, eitherw0 containss > roccurrences of b and no occurrences of d, in which case we are again done. Or it contains an occurrence ofd; thenw0 ∈a∗bscsd∗is in the shape required by the lemma, and we can apply the induction hypothesis to identify a subtreet00oft0 withroccurrences of somexand none of the correspondingy; andt00is also a subtree oft.

B Details of the proof of Theorem 2

In all of the following, we assume that for some

k ∈ _N₀ we haveCSD = REL(G, h,H_k), where

G = (G,yd) is an LM-CFTG (hence projective,

i.e. yd = yd_pr). We letT = L(G) andp be the pumping height ofT.

B.1 Terminology

Let us extend the domain ofydto contexts: for a contextC, we letyd(C) =left(C)·right(C).

We say that a stringsisbalancedif, for anyz∈

{a, b, c, d}and any positioniinssuch thatsi =z

there are two encompassing positionsk ≤ i ≤ l

such thats_[_k,l_]∈ {hn_z_in_|_n_∈

N0}. We say that a

tree or a context is balanced if its yield is balanced. By construction, all trees ofTare balanced.

For t ∈ T, a pumping decomposition of t

is a 5-tuple (C1, C2, C3, C4, t5), consisting in

4 contexts C1-C4 and one tree t5 such that t = C1[C2[C3[C4[t5]]]], ht(C2[C3[C4]]) ≤ p,

ht(C2) + ht(C4) > 0 and for any i ∈ N0, C1[vi[t5]] ∈ T, where we let v0 = C3 and vi+1=C2[vi[C4[X]]].

B.2 Pumping considerations

Lemma 10. Let t ∈ T with ht(t) > p,

and consider a pumping decomposition t =

C1[C2[C3[C4[t5]]]]. Lets = left(C2)·left(C4)·

right(C4)·right(C2) =yd(C2[C4]). The two

fol-lowing propositions obtain:

• For any(x, y)∈ {(a, c),(b, d)},|s|x =|s|y.

• Lett0 = C1[C3[t5]]. For u ∈ T∆ andz ∈

{a, b, c, d, a, b, c, d}leteu_z denote the number

ofz-edges ingraph(u). It holds for anyz ∈ {a, b, c, d, a, b, c, d}thatet_z=e_zt0+|s|z. Proof. Let (x, y) ∈ {(a, c),(b, d)}. t ∈ T so yd(t)∈CSDswhich entails

|yd(t)|x=|yd(t)|y. (1)

since t0 ∈ T by construction, we have

hyd(t0),graph(t0)i ∈CSD. From there

|yd(t0)|x=|yd(t0)|y. (2)

But|yd(t)|_x,y =|yd(t0)|_x,y+|s|_x,y. Plugging this into (1) yields

|yd(t0)|_x+|s|_x=|yd(t0)|_y+|s|_y.

Simplifying using (2) we find |s|x = |s|y which

establishes the first point. For the second point, we have fromhyd(t),graph(t)i ∈CSDand by defini-tion ofCSDet_z =|yd(t)|z =|yd(t0)|z+|s|z.

Sim-ilarily since hyd(t0),graph(t0)i ∈ CSD we have

et_z0 =|yd(t0)|z. Henceetz =et 0 z +|s|z.

We will now present a pair of lemmas stat-ing, in formal terms, that decompositions t =

C1[C2[C3[C4[t5]]]] provided by the pumping

(12)

• First, in the case where the ’pumpable’ con-texts C2 and C4 generate only ‘bar’ tokens

and brackets in {a, b, c, d,h,i}∗, we show that yd(C2) ∈ {h,i}∗, so that only C4 is

actually pumping ‘bar’ tokens of some kind. Moreover,t5 generates only ’bar’ tokens and

brackets as well.

• Second, we explore the alternative, where the ’pumpable’ contexts generate some of the ‘core’ tokens in {a, b, c, d}, say – for the sake of this informal presentation – somea -tokens. By lemma10, they must generate as manyc-tokens, for which we can again dis-tinguish three possible configurations: 1.a’s andc’s are respectively generated on differ-ent sides of a single context (C2 and/orC4),

but then neitherC2 norC4 generate anybor d-tokens. 2.C2 generate bothaandd-tokens

(on the left and right sides respectively) and nobandc-tokens, whileC4 ensures

genera-tion of correspondingbandc-tokens (on the left and right sides respectively). 3. Or else, one ofC2, C4 generates thea-tokens and no c,bordwhile the other generates the corre-spondingc-tokens and noa,bord.

Below follows the formal presentation of these lemmas:

Lemma 11. Let t ∈ T with ht(t) > p,

and consider a pumping decomposition t =

C1[C2[C3[C4[t5]]]] such that for all z ∈

{a, b, c, d}, |yd(C2[C4])|z = 0. There is x ∈

{a, b, c, d}such that all of the following holds:

1. yd(C2)∈ {h,i}∗andyd(C4)∈ {h, x,i}∗.

2. Either left(C4) ∈ {x}∗ and |left(C3)|z = 0for any z ∈ {a, b, c, d}, or symmetrically,

right(C4) ∈ {x}∗ and|right(C3)|z = 0for

anyz∈ {a, b, c, d}.

3. |yd(t5)|z = 0for anyz∈ {a, b, c, d}

Proof. First point: Lets = left(C1) andn0 =

|s|h. Let y ∈ {a, b, c, d} and assume y ∈

left(C2). Pumping C2-C4 n0 + 1-times yields

a tree tn0+1 ∈ T such that s ·left(C2)

n0+1 _is

a prefix of yd(tn0+1). We thus see that tn0+1

is not balanced, which is in contradiction with

tn0+1 ∈ T. A symmetric argument establishes

thaty /∈right(C2).

Assume now that there are two distinctx, y ∈

{a, b, c, d}such thatx∈yd(C4)andy∈yd(C4).

Notice that, sinceC4does not contain non-bar

to-kens, if x and y occur on the same side of C4

(for instanceleft(C4) = hxihyi) thent /∈ T

be-cause no string inCSDSadmitsleft(C4)as a

sub-string, whereasyd(t) does. So xandy must oc-cur on distinct sides. It follows that C4 does not

generate tokens in {h,i} either: if for instance left(C4) =u· h ·x·vfor some stringsuandvin

{xh,i}∗,u·h·x·v·u·h·x·vwould be a substring of

C1[C2[C2[C3[C4[C4[t5]]]]]] ∈Twhich again is a

contradiction. Let nown1 = |yd(t5)|i. Pumping C2-C4n1+ 1times yields a treetn1+1 ∈ Twith

a substring of the formxn1+1_yd₍_t

5)yn1+1 (up to x/ysymmetry) which cannot be balanced, yield-ing a final contradiction.

Second point: yd(C4) ∈ {h/ ,i}∗, because

oth-erwise pumping C2 andC4 more times than the

maximum number of occurrences of a bar to-ken inyd(t) would yield an unbalanced tree. So there is a x such that x ∈ left(C4) or x ∈

right(C4). Assume for contradiction that any

dif-ferent token occurs on the same side ofC4 then C1[C2[C2[C3[C4[C4[t5]]]]]] ∈ T contains a

sub-string that cannot be found in any sub-string of CSD yielding a contradiction. So left(C4) ∈ {x}∗ or

right(C4) ∈ {x}∗. Assume left(C4) ∈ {x}∗, the

other case is symmetric. Assume for contradiction that|left(C3)|z >0for somez∈ {a, b, c, d}. Let n2 = |yd(C3)|h. PumpingC2-C4 n2 + 1 times

yields a treetn2+1 _∈_T_{such that (by projectivity)}

yd(tn2+1₎_{has a substring of the form}_z_·_u_·_xn2+1

where |u|_h ≤ n2. Hence tn2+1 ∈ Tis not

bal-anced, yielding a contradiction.

Third point: Assume for contradiction that

|yd(t5)|z >0. Assume thatleft(C4) ∈ {x}∗, the

caseright(C4) ∈ {x}∗ is symmetric, and point 2

ensures that these two cases are exhaustive. Let

n3 = |t5|i and consider the treetn3+1 ∈ T ob-tained by pumpingC2-C4 n3 + 1times. By

pro-jectivity, yd(tn3+1₎ _{has a substring of the form}

xk·u·z·vwithk≥n3+ 1and|u|i≤n3. Hence tn3+1 _{is not balanced and}_tn3+1 _∈_/ _T_{, yielding a}

contradiction.

Lemma 12. let t ∈ T with ht(t) > p,

and consider a pumping

decomposi-tion t = C1[C2[C3[C4[t5]]]]. Let

(x, y, X, Y) ∈ {(a, c, A, C),(b, d, B, D)}

such that |yd(C2[C4])|x 6= 0. One of the following obtains:

1. For some(i, j) ∈ {(2,4),(4,2)},left(Ci) ∈ X+_, _right₍_C

i) ∈ Y+, left(Cj) ∈ X∗ and

right(Cj)∈Y∗.

(13)

B+andright(Cj)∈C+.

3. Either left(C2) ∈ X+, right(C2) = and

left(C4)·right(C4) ∈Y+, or symmetrically

left(C2) =,right(C2) ∈Y+andleft(C4)·

right(C4)∈X+.

Proof. All these observations follow easily from the first point of Lemma 10 (governing the rela-tive number of occurrences of a, c-tokens on one hand andb, d-tokens on the other hand), projectiv-ity, and the following observation: only one side ofC2orC4cannot generate two different kinds of

tokens in{a, b, c, d}or be unbalanced. Otherwise pumping would (from projectivity) ensure that the resulting tree has a substring of a shape impossible forCSD(for example, if bothaandb-tokens occur on the same side ofC2, pumping once produces a

substringa·u·b·v·a·u·b·v).

B.3 Separation

Lemma 13. Lett=C1[C2[C3[C4[t5]]]] ∈Tand

t0 =C1[C3[t5]]∈ T. Iftisx, y, l-separated then

so ist0.

Proof. Consider an x, y, l-separation oft: t =

Dx[D0[ty]]. Let Cx, C0 and t0y be respectively

obtained by removing all nodes from C2 or C4

from Dx, D0 and ty. One easily checks that t0 =Cx[C0[t0y]].

Moreover, |yd(Cx)|y ≤ |yd(Dx)|y = 0,

|yd(C0)|x ≤ |yd(D0)|x ≤ l and |yd(t0y)|x ≤

|yd(ty)|x= 0. Hencet0isx, y, l-separated.

B.4 Minimality argument

Lemma 14. Let t = C1[C2[C3[C4[t5]]]] ∈ T

and t0 = C1[C3[t5]] ∈ T such that t is x, y, l

-separated. By Lemma 13, t0 is separated. Let

Dx[D0[ty]] be a minimal separation of t and Cx[C0[t0y]]be a minimal separation oft0. D0[ty] contains all nodes ofC0[t0y].

Proof. Assume for contradiction that a node π

of C0 is not in D0. It must then be in Dx or ty. Assume that it is in Dx, the case where it

is in ty is analoguous. Since π is not in D0,

there is a non-trivial subcontextD_x0 ofDx rooted

at π, i.e. Dx = Dx00[Dx0] with ht(D0x) > 0.

Let C_x00, C_x0 be obtained by removing all nodes from C2 or C4 from D00x and D0x respectively.

By definition ofDx, |yd(Dx00[D0x])|y = 0, hence

|yd(C_x00[C_x0])|y = 0. Further observe that we have Cx[C0] = Cx00[Cx0[C00]] for some subcontext C00

ofC0. Since π is not in C2 or C4, ht(Cx0) > 0

thusht(C₀0) <ht(C0). But lettingEx =Cx00[Cx0], Ex[C00[t0y]]is then anx, y, l-separation oftwhich

contredicts the assumed minimality ofCx[C0[t0y]].

B.5 Inductive bounds

For any tree or contexttand symbolx, let us write

nt_xas a shorthand for|yd(t)|x,etx for the number

ofx-edges generated bytandrt_xthe length of the rightmost maximal substring ofyd(t)consisting in onlyx-tokens (more formally,r_xt =|s|x, wheres

is the unique substring such thatyd(t) =u·s·v

wheres ∈ x∗, ifu is non empty its last token is notx, and|v|x= 0).

There is a maximal number of string tokens and edges that a context of height at mostpcan gen-erate under the considered yield and homomor-phism. We calll0this number and focus from now

onl0-separated andl0-asynchronous derivations.

Below are the proofs of the two statements of Lemma 7 of the main paper (respectively, 7-1 and 7-2).

Lemma 7-1. If t ∈ T is x, y, l0-separated and

t=Dx[D0[ty]]is anx, y, l0-separation oft, then

fort0 =D0[ty]we have

et0

x ≤n

t0

x +n

t

xl0. (xbound)

Proof. We prove the result by induction over the pair(ht(t0),ht(t))(with lexicographic ordering).

Base CaseAssumeht(t0)≤p. Thenet0 ≤l0.

Sinceyd(t)∈CSDs,ntx>0, thusntx0+ntxl0 ≥l0

which ensures the bound.

Induction step If h(t0) > p then h(t) ≥

h(t0) > p. We apply Lemma 1 to t to yield

a decomposition t = C1[C2[C3[C4[t5]]]], where

t0 = C1[C3[t5]] ∈ T, ht(t0) < ht(t) and

ht(C2[C3[C4]]]) ≤p. Notice thatt0 cannot

over-lap withC2[C3[C4]without overlapping with C1

ort5as well, for otherwiseh(t0)≤p.

As in the proof of Lemma13, lettingCx, C0, t0y

be obtained by removing all nodes from C2 and C4fromDx,D0andtyrespectively, we obtain an

x, y, l-separation t0 = Cx[C0[t0y]]. We let t00 =

C0[t0y]and distinguish between possible

configu-rations forC2 andC4:

Case 0 If neither C2 or C4 generate any x

-token, we find by induction

et 0 0

x ≤n

t00

x +n

t0 xl0.

Moreover, we haveet 0 0

x =e

t0

x,n

t0 0

x =n

t0

x andnt

0

x ≤

(14)

(15)

nort5 generate anyy-token, projectivity imposes r_yt = r_yt0 + nC2[C4]

y and we can conclude as

in the previous case. The only remaining sub-case is when |yd(C3)|x = 0, in which case t

is 0-separated, and considering the (minimal) 0 -separation (C1, X, C2[C3[C4]]) we can use the

same argument as in the case where t0

encom-passes all nodes ofC2andC4.

Case 2Consider now the remaining case where

Lemma12 applies. If neitherC2 orC4 generate

some x or y-token, they don’t generate x or y -tokens either, and the same reasoning as Case 1 subcase i) applies. OtherwiseC2[C4]generate at

least somex-token. We then havent_x ≥ nt_x0 + 1. Sincet0contains all nodes fromt00we further have

et0

y ≥e

t0 0

y. Finallynty ≤nt 0

y +l0. We conclude

us-ing inequation3.

B.6 Conclusion

Lemma 8. For anyt∈T, iftisx, y, l0-separated

thentisx, y, l0-asynchronous.

Proof. By Lemma 7-2, there is a minimal

x, y, l0-separation t = Dx[D0[ty]] such that the

y bound obtains fort0 = D0[ty]. By lemma 7-1

the x bound obtains for t0 as well. Observe

fi-nally, thatrt_y ≤mt_yand sincet0generates at most l0x-tokens, by projectivity and definition ofCSD,

it generates at most(l0+ 1)mtx x-tokens (one

se-quence of mt_x between each occurrence ofx and the next, plus possibly one in front of the first and one after the last). Hence,

et0

y ≥n

t

y−ntxl0−mty et0

x ≤ntxl+mtx(l0+ 1).

andtisx, y, l0-asynchronous.

Lemma 9. For anyt ∈ T,tisx, y, l0-separated

for somex/y∈Sep.

Proof. The proof proceeds by induction on the height oft.

If ht(t) ≤ p. Then|yd(t)|z ≤ l0 for anyz ∈

{a, b, c, d}, hence t is trivially x, y, l0-separated

for somex/y∈Sep.

If h(t) > p, Lemma 1 yields a decomposition

t =C1[C2[C3[C4[t5]]]], wheret0 =C1[C3[t5]]∈ T,ht(t0) <ht(t)andht(C2[C4])≤p. By

induc-tion t0 is x, y, l0-separated for some x/y ∈ Sep.

For sake of succintness, let us present the induc-tive step forx/y = a/c, the reasoning for other cases is analoguous. Let us examine the different possible configurations ofC2andC4.

Case 1If Lemma11appliesi.e.C2andC4

gen-erate only one kind of bar token,z, and brackets, one checks easily that inserting C2 and C4 does

not change the distribution of a and c-tokens in the tree, hencetisa, c, l0-separated.

Case 2If Lemma12applies, note first that ifC2

andC4 generate noaorc-token, we can conclude

as in Case 1 as the distribution ofaandc-tokens in the tree is not changed either. Otherwise, we as-sume thatC2orC4generate someaorc-token and

distinguish between subcases 1-3 of Lemma12:

Subcase 1 in this case for some i ∈ {2,4}

left(Ci) contains an a-token and no b, c or d

-token whileright(Ci) contains some c-token and

noa, bord-token. Assumei= 2, the case where

i = 4 is similar. By projectivity and definition ofCSDsfollows that allb-tokens are generated in C3[C4[t5]] and allc-tokens inC1. tis therefore

b, d,0-separated, henceb, d, l0-separated.

Subcase 2in this case,left(C2)contains some

a-token and no b, c, d-token, right(C3) contains

somed-token and no a, c, d-token, left(C4)

con-tains someb-token and noa, c, d-token,right(C4)

contains somec-token and noa, b, d-token. It fol-lows thatt5generate no occurrence ofaandC1no

occurrence of c. Since|yd(C2[C3[C4]])|a ≤ l0,

(C1, C2[C3[C4]], t5)is ana, c, l0-separation.

Subcase 3 Assume left(C2) contains some a

-token and no b, c, d-token and that left(C4)

con-tains some c-token. It follows that all b-tokens are generated by C3. So yd(t) contains less than l0 b-tokens, by definition ofCSD it also contains

less thanl0 d-tokens, so(C1, C2[C3[C4]], t5)is a

d, b, l0-separation.

Assume now left(C2) contains some a-token

and no b, c, d-token and that right(C4) contains

some c-token. It follows that t5 generate no d-token and C1 generate no b-token. Hence

(C1, C2[C3[C4]], t5)isb, d, l0-separation.

The remaining cases are symmetric exchanging