Learning Tier based Strictly 2 Local Languages

(1)

Learning Tier-based Strictly 2-Local Languages

Adam Jardine and Jeffrey Heinz University of Delaware {ajardine,heinz}@udel.edu

Abstract

The Tier-based Strictly 2-Local (TSL2) lan-guages are a class of formal lanlan-guages which have been shown to model long-distance phonotactic generalizations in natural lan-guage (Heinz et al., 2011). This paper in-troduces the Tier-based Strictly 2-Local In-ference Algorithm (2TSLIA), the first non-enumerative learner for the TSL2 languages. We prove the 2TSLIA is guaranteed to con-verge in polynomial time on a data sample whose size is bounded by a constant.

1 Introduction

This work presents the Tier-based Strictly 2-Local

Inference Algorithm (2TSLIA), an efficient learn-ing algorithm for a class ofTier-based Strictly Lo-cal(TSL) formal languages (Heinz et al., 2011). A TSL class is determined by two parameters: thetier, or subset of the alphabet, and the permissible tier k-factors, which are the legal sequences of length

k allowed in the string, once all non-tier symbols have been removed. The Tier-based Strictly2-Local

(TSL2) languages are those in whichk= 2.

As will be discussed below, the TSL languages are of interest to phonology because they can model a wide variety of long-distance phonotactic patterns found in natural language (Heinz et al., 2011; Mc-Mullin and Hansson, forthcoming). One example is derived from Latin liquid dissimilation, in which two ls cannot appear in a word unless there is an

r intervening, regardless of distance. For exam-ple, floralis ‘floral’ is well-formed but not *mili-talis(cf. militaris ‘military’). As explained in

sec-tions 2 and 4, this can be modeled with permissible

2-factors over a tier consisting of the liquids_{l, r_}.

For long-distance phonotactics,kcan be fixed to 2, but it is does not appear that the tier can be fixed

since languages employ a variety of different tiers. This presents an interesting learning problem: Given a fixed k, how can an algorithm induce both a tier

and a set of permissible tierk-factors from positive data?

There is some related work which addresses this question. Goldsmith and Riggle (2012), building on work by Goldsmith and Xanthos (2009), present a method based on mutual information for learn-ing tiers and subsequently learnlearn-ing harmony pat-terns. This paper differs in that its methods are rooted firmly in grammatical inference and formal language theory (de la Higuera, 2010). For instance, in contrast to the results presented there, we prove the kinds of patterns 2TSLIA succeeds on and the kind of data sufficient for it to do so.

Nonetheless, there is relevant work in computa-tional learning theory: Gold (1967) proved that any finite class of languages is identifiable in the limit via an enumeration method. Given a fixed alphabet and a fixedk, the number of possible tiers and

per-missible tierk-factors is finite, and thus learnable in

this way. However, such learners are grossly inficient. No provably-correct, non-enumerative, ef-ficient learner for both the tier and permissible tier

k-factor parameters has previously been proposed.

This work fills this gap with an algorithm which learns these parameters whenk = 2 from positive

data in time polynomial in the size of the data. Finally, Jardine (2016) presents a simplified

ver-87

(2)

sion of 2TSLIA and reports the results of some sim-ulations. Unlike that paper, the present work pro-vides the full mathematical details and proofs. The simulations are discussed in the discussion section.

This paper is structured as follows. §2 moti-vates the work with examples from natural language phonology. §3 outlines the basic concepts and def-initions to be used throughout the paper. §4 defines the TSL languages and discusses their properties.§5 details the 2TSLIA and proves it learns the TSL2

class in polynomial time and data. §6 discusses fu-ture work, and§7 concludes.

2 Linguistic motivation

The primary motivation for studying the TSL lan-guages and their learnability comes from their rel-evance to phonotactic (word well-formedness) pat-terns in natural language phonology. Many phono-tactic patterns belong to either the Strictly Local (SL) languages (McNaughton and Papert, 1971) or the Strictly Piecewise (SP) languages (Rogers et al., 2010).1 _{An example of phonotactic knowledge} which is SL is Chomsky and Halle’s (Chomsky and Halle, 1965) observation that English speakers will classify blick as a possible word of English while rejecting bnick as impossible. This is SL because it can be described as *bn being unacceptable as a word-initial sequence. SP languages can describe long-distance dependencies based on precedence re-lationships between sounds (such as consonant har-mony in Sarcee, in whichsmay not follow aS any-where in a word, but may precede one (Cook, 1984)) (Heinz, 2010a). Also, the SL and SP languages are efficiently learnable (Garc´ıa et al., 1990; Heinz, 2010a; Heinz, 2010b; Heinz and Rogers, 2013).

However, there are some long-distance patterns which cannot be described purely with precedence relationships. One example is a pattern from Latin, in which in certain cases anlcannot follow another

lunless anr intervenes, no matter the distance be-tween them (Jensen, 1974; Odden, 1994; Heinz, 2010a). This can be seen in the-alisadjectival suf-fix (Example 1), which appears as-arisif the word it attaches to already contains an l((d) through (f) 1_{The relationship between these formal language classes and} human cognition (linguistic and otherwise) is discussed in more detail in Rogers and Pullum (2011), Rogers and Hauser (2010), Rogers et al. (2013), and Lai (2013; 2015).

in Example 1), except in cases where there is an in-terveningr, in which it appears again as -alis ((g) through (i) in Example 1). In the examples, the sounds in question are underlined for emphasis (data from Heinz (2010a)), and for (d) through (f), illicit forms are given, marked with a star (*).

Example 1.

a. navalis ‘naval’ b. episcopalis ‘episcopal’ c. infinitalis ‘negative’ d. solaris ‘solar’ (*solalis) e. lunaris ‘lunar’ (*lunalis)

f. militaris ‘military’ (*militalis) g. floralis ‘floral’

h. sepulkralis ‘funereal’ i. litoralis ‘of the shore’

This non-local alternating pattern is not SP be-cause SP languages cannot express blocking effects (Heinz, 2010a). However, it can be described with a TSL grammar in which the tier is {r, l} and the permissible tier 2-factors do not include *lland *rr.

This yields exactly the set of strings in which anlis always immediately (disregarding all sounds besides {r, l_}) followed by anr, and vice versa.

Formal learning algorithms for SL and SP lan-guages can provide a model for human learning of SL and SP sound patterns (Heinz, 2010a). TSL lan-guages are also similarly learnable, given the stip-ulation that both the tier andk are fixed. For

nat-ural language, the value for k never seems to go above 2 (Heinz et al., 2011). However, tiers vary

in human language—TSL patterns occur both with different kinds of vowels (Nevins, 2010) and con-sonants (Suzuki, 1998; Bennett, 2013). For exam-ple, in Turkish the tier is the entire vowel inven-tory (Clements and Sezer, 1982), while in Finnish it is vowels except /i,e/ (Ringen, 1975). In Samala consonant harmony, the tier is sibilants (Rose and Walker, 2004), whereas in Koorete, the tier is sibi-lants and {b,r,g,d}(McMullin and Hansson, forth-coming). Thus, it is of interest to understand how

boththe tier and permissible tier2-factors for TSL2

grammars might be learned efficiently.

3 Preliminaries

(3)

S2. LetP(S)denote the powerset ofSandPf in(S)

be the set of all finite subsets ofS.

LetΣdenote a set of symbols, referred to as the alphabet, and let a string over Σ be a finite

se-quence of symbols from that alphabet. The length of a stringwwill be denoted_|w_|. Letλdenote the

empty string; |λ| = 0. Let Σ∗ (Kleene star)

repre-sent all strings over this alphabet, andΣkrepresent

all strings of lengthk. Concatenation of two strings wandv(or symbols σ1 andσ2) will be writtenwv

(σ1σ2). Special beginning (o) and end (n) symbols

(o,n _6∈ Σ) will often be used to mark the

begin-ning and end of words; the alphabet Σaugmented

with these,Σ_{∪ {}o,n_}, will be denotedΣ_on.

A string u _∈ Σ∗ is said to be a factor or sub-string of another string w if there are two other

stringsv, v0 _∈ Σ∗ such thatw = vuv0. We callu

a k-factor of w if it is a factor of w and|u_| = k. Letfack : Σ∗ → P(Σ≤k)be a function mapping

strings to their k-factors, where fack(w) equals

{u_|uis ak-factor ofw_}if|w_| > kand equals {w_}

otherwise. For example,fac2(aab) ={aa, ab}and

fac8(aab) = {aab}. We extend thek-factor

func-tion to languages;fack(L) =S_w_∈_Lfack(w).

3.1 Grammars, languages, and learning

A language (or stringset) L is a subset of Σ∗. If L is finite, let|L_|denote the cardinality of L, and let||L||denote thesizeofL, which is defined to be

P

w∈L|w|. LetL1·L2denote the concatenation of

the languagesL1 andL2, i.e., the pairwise

concate-nation of each word inL1 to each word inL2. For

notational simplicity, the concatenation of a single-ton language{w_}to another languageL2 (orL1to

{w}) will be writtenwL2(L1w).

A grammar is a finite representation of a pos-sibly infinite language. A class L of languages is represented by a classRof representations if every

r_∈Ris of finite size and there is a naming function

L:R→Lwhich is both total and surjective. The learning paradigm used in this paper is identi-fication in the limit learning paradigm (Gold, 1967), with polynomial bounds on time and data (de la Higuera, 1997). This paradigm has two complemen-tary aspects. First, it requires that the information which distinguishes a learning target from other po-tential targets be present in the input for algorithms to successfully learn. Second, it requires successful

algorithms to return a hypothesis in time polynomial of the size of the sample, and that the size of the sam-ple itself must be polynomial in the size of grammar. The definition of the learning paradigm (Defini-tion 3) depends on some preliminary no(Defini-tions.

Definition 1. LetLbe a class of languages repre-sented by some classRof representations.

1. An inputsample I for a languageL ∈ Lis a finite set of data consistent withL, that is to say I _⊆L.

2. A (_L,_R)-learning algorithm A is a program

that takes as input a sample for a language L_∈Land outputs a representation fromR.

The notion of characteristic sample is integral.

Definition 2(Characteristic sample). For a(L,R )-learning algorithmA, a sampleCSis a

characteris-tic sampleof a languageL∈ Lif for all samplesI forLsuch thatCS ⊆I,Areturns a representation

rsuch that_L(r) =L.

Now the learning paradigm can be defined.

Definition 3(Identification in polynomial time and data). A class L of languages is identifiable in polynomial time and data if there exists a (L,R )-learning algorithmAand two polynomialsp()and q()such that:

1. For any sample I of size mfor L ∈ L, A

re-turns a hypothesisr_∈Rin_O(p(m))time. 2. For each representationr _∈ _Rof sizen, there

exists a characteristic sample ofrforAof size

at mostO(q(n)).

4 Tier-based Strictly Local Languages This section introduces the Tier-based Strictly Local (TSL) languages (Heinz et al., 2011). The TSL lan-guages generalize the SL lanlan-guages (McNaughton and Papert, 1971; Garc´ıa et al., 1990; Caron, 2000), and as such these will briefly be discussed first.

4.1 The Strictly Local Languages

The SL languages can be defined as follows (Heinz et al., 2011):

Definition 4 (SL languages). A language L is Strictly k-Local (SLk) iff there exists a

fi-nite set S _⊆ fack(oΣ∗n) such that L =

(4)

Such a setS is sometimes referred to as the per-missible k-factors or permissible substrings of L.

For example, let L = _{λ, ab, abab, ababab, ..._}.

This L can be described with a set of permissible 2-factorsS = {on,oa, ab, ba, bn} because every 2-factor of every word inLis inS; thus,LisStrictly 2-Local(abbreviated SL2).

As a setS of permissiblek-factors is finite it can

also be viewed as a SL grammar where L(S) =

{w∈Σ∗|fack(own)⊆S}. An element s of an

SL grammarSfor a languageLisuseful(resp. use-less) iffs_∈fack(L)(s6∈fack(L)). Acanonical

SL grammar contains no useless elements.

In the example above,aa_6∈Sandaa_6∈fac2(w)

for any w _∈ L. Such a string is referred to as a

forbiddenk-factoror arestrictiononL. The set of

forbiddenk-factorsRisfack(oΣ∗n)−S.

Think-ing about the grammar in terms ofSor in terms ofR

is equivalent, but in some cases it is simpler to refer to one rather than the other, so we shall use both.

Any SLk class of languages is learnable with

polynomial bounds on time and data ifk is known

in advance (Garc´ıa et al., 1990; Heinz, 2010b). The class of SLklanguages (for each k) belongs

to a collection of language classes calledstring ex-tension language classes (Heinz, 2010b). The dis-cussion above presents SLklanguages from this

per-spective. These language classes have many desir-able learning properties, due to their underlying lat-tice structure (Heinz et al., 2012).

4.2 The TSL languages

The TSL languages can be thought of as a further pa-rameterization on thek-factor function where a

cer-tain subset of the alphabet takes part in the grammar and all other symbols in the alphabet are ignored. This special subset is referred to as atier T ⊆ Σ.

Symbols not on the tier are removed from consider-ation of the grammar by an erasing function ET :

Σ∗ → T∗ defined as ET(σ0σ1...σn) = u0u1...un

whereui =σiifσi ∈T; elseui=λ.

For example, if Σ = _{a, b, c_} and T = _{a, c_}

then ET(bbabbcbba) = aca. We can then define

a tier version facT-k of fack as facT-k(w) = fack(oET(w)n).

Here,oandnare built into the function as they are always treated as part of the tier. Continu-ing the example from above,facT-2(bbabbcbba) =

{oa, ac, ca, a_n_}. facT-k can be extended to lan-guages as withfackabove.

The TSL languages can now be defined parallel to the SL languages (the following definition is equiv-alent to the one in Heinz et al. (2011)):

Definition 5(TSL languages). A languageLis

Tier-based Strictlyk-Localiff there exists a subsetT _⊆Σ of the alphabet and a finite set S ⊆ fack(oT∗n)

such thatL=_{w_∈Σ∗_|facT-k(w)⊆S}

Parallel to SL grammars above, hT, S_i can be

thought of as a TSL grammar of L. Likewise, the forbidden tier substrings(or tier restrictions) R is

simply the setfacT-k(Σ∗)−S. Finally, hT, Si is canonical ifScontains no useless elements (i.e.,s∈

S ⇔ s ∈ facT-k(L(hT, Si))) and R is nonempty (this second restriction is explained below).

For example, let Σ = {a, b, c} and T =

{a, c} as above and let S = {on,oa,oc, ac, an, ca, cc, cn_}. Plugging these into Definition

5, we obtain a language L which only contains

those strings without the forbidden 2-factor aa on

tierT. These are words which may contain bs

in-terspersed with as and cs provided that no a

pre-cedes another without an interveningc. For

exam-ple, bbabbcbba _∈ L but bbabbbbabb _6∈ L, because ET(bbabbbbabb) =aaandaa ∈ fac2(oaan) but aa6∈S.

Like the class of SLklanguages, the class of TSL

languages (for fixedT and k) is a string extension

language class. The relationship of the TSL lan-guages to other sub-Regular language classes (Mc-Naughton and Papert, 1971; Rogers et al., 2013) is studied in Heinz et al. (2011).

Given a fixedkandT,Sis easily learnable in the

same way as the SL languages (Heinz et al., 2011). However, as discussed above, in the case of natural language phonology, it is not clear that information aboutT is available a priori. Learning both T and

Ssimultaneously is thus an interesting problem.

This problem admits a technical, but unsatisfying, solution. The number of subsetsT such thatT _⊆Σ

is finite, so the number of TSLk languages given a

fixed kis finite. It is already known that any finite

(5)

is of interest to pursue a smarter, computationally efficient method for learning them.

What are the consequences of varyingT? When T = Σ, the result is an SLk language, because the

erasing function operates vacuously (Heinz et al., 2011). Conversely, whenT =_∅, by Definition 5,S

is either∅or{on}. The former obtains the empty language while the latter obtainsΣ∗. By Definition

5,Σ∗ can be described with any T _⊆ Σas long as S =fac2(oT∗n). In such a grammar, none of the

members ofT serve any purpose; thus we stipulate

that for a canonical grammarRis nonempty.

Importantly, a member of T may fail to belong

to a string inR and still serve a purpose. For

ex-ample, let Σ = _{a, b, c_}, T = _{a, c_}, and S =

fac2(oT∗n)− {aa}. Becausecappears in no

for-bidden tier substrings inR, it is freely distributed in L = L(_hT, S_i). However, it makes a difference in the language, because aca ∈ L butaba 6∈ L. If c(and the relevant tier substrings in S) were

miss-ing from the tier, neitherabanoracawould be inL. This can be thought of a ‘blocking’ function ofc,

be-cause it allows sequencesa...aeven thoughaa_∈R.

We may now return to the Latin example from §2 in a little more detail. The Latin pattern, in which rs and ls must alternate, regardless of other intervening sounds. This can be captured by a TSL grammar in which T = _{l, r_} and S =

{ol,_or, lr, rl, rr, r_n, l_n_}. This captures the

gen-eralization that, ignoring all sounds besideslandr, l and l are never allowed to be adjacent. The

re-mainder of the paper discusses how such a grammar, includingT, may be induced from positive data.

5 Algorithm

This section introduces the Tier-based Strictly 2 -Local Inference Algorithm (2TSLIA), which in-duces both a tier and a set of permissible tier 2

-factors from positive data. First, in §5.1, the con-cept of a path, which is crucial to the learner, is defined. §5.3 introduces and describes the algo-rithm, and§5.4 defines a distinguishing example and proves that it is a characteristic sample for which the algorithm is guaranteed to converge. Time and data complexity for the algorithm are discussed there.

5.1 Paths

First, we define the concept of 2-paths. How this

concept might be generalized tokis discussed in_§6.

Paths denote precedence relations between symbols in a string, but they are further augmented with sets of intervening symbols. Formally, a2-path is a 3

-tuplehx, Z, y_iwherex andy are a symbol inΣ_on

andZ is a subset ofΣ. The2-paths of a stringw= σ0σ1. . . σn, denotedpaths2(w), are

paths2(w) =hσi, Z, σji

i < jand

Z ={σz|i < z < j}

Essentially, the 2-paths are pairs of symbols in a string ordered by precedence and interpolated with the sets of symbols intervening between them. For example,ha,_{b, c_}, d_i is inpaths2(abbcd)because a precedes d and {b, c} is the set of symbols that come between them. Intuitively, it gives the set of symbols one must ‘travel over’ in order to get from one symbol to another in a string. As shall be seen shortly, this information is essential for how the al-gorithm works. Let paths2() be extended to

lan-guages such that paths2(L) = S_w_∈_Lpaths2(w).

As we are here only concerned with 2-paths, we

henceforth simply refer to them as ‘paths’.

Remark 1. The paths of a string w can be calcu-lated in time at most quadratic in the size of w = σ1· · ·σn.

To see why, consider an_×ntable and consideri, j

such that1_≤i < j _≤n. The idea is each nonempty cell in the table contains the set of intervening sym-bols betweenσiandσj. Letp(x, y)denote the entry

in thex-th row and they-th column. The following hold: p(i, i+ 1) = ∅; p(i, i+ 2) = {σi+1}; and p(i, j) = S_i_≤_x_≤_j₋₂p(x, x+ 2)for anyj _≥i+ 3.

Since each of these operations is linear time or less, the size of the table, which equalsn2, provides an

upper bound on the time complexity ofpaths2(w).

(This bound is not tight since half the table is empty.)

5.2 Terminology

Before introducing the algorithm we define some terms which are useful for understanding its oper-ation. For a TSL2 grammar G = hT, Si, T ⊆ Σ

will be referred to as thetier,Stheallowed tier sub-strings, and R = facT-2(Σ∗) −S as the

(6)

non-tier elements ofG. For any elements σi, σj ∈ T,

theirtier adjacencywith respect to a setLof strings

refers to whether or not they may appear adjacent on the tier; formally,σi, σj are tier adjacent inLiff

σiσj ∈facT-2(w)for somew∈L.

Anexclusive blocker is a σ ∈ T which does not

appear inR. That is,∀w ∈ R, σ 6∈ fac1(w). An

exclusive blocker is thus not restricted in its distri-bution but may intervene between other elements on the tier. Thefree elementsof a grammarGwill refer

to the union of the set of the nontier elements ofG

and exclusive blockers givenG.

Given an order σ0, σ1, ..., σn on Σ, let Σi =

{σh|h < i}refer to the elements of Σless thanσi

andJi= Σ−Σibe the elementsσiand greater. Note

thatΣ0 =∅andJ0 = Σ. LetHi = H∩Σi be the

non-tier elements less thanσiandTi= (T∩Σi)∪Ji

be the expanded tier given σi, or the tier plus

ele-ments fromJi. Note thatHiandTiare complements

of each other with respect toΣ. In the context of the

algorithm,Ti will represent the algorithm’s current

hypothesis for the tier. When referring not to the or-der onΣbut to the index of positions in a string or

path,τ1, τ2, ...,∈Σshall be used.

For a pathhτ1, X, τ2i, we refer toτ1andτ2as the symbolson the path, and X as the intervening set.

For a set of pathsP, the term tier pathswill refer

to the paths whose symbols are onT_on. Formally,

the tier paths arePT = {hτ1, X, τ2i ∈ P|τ1, τ2 ∈ T_on, X _⊆ Σ_}. Note that PT is restricted only by

the symbols on the path; the intervening setXmay

beanysubset ofΣ.

Example 2. LetΣ =_{a, b, c, d_}with the ordera < b < c < dand letT =_{b, d_}. Referring tobasσ2 in the order,Σ2 =H2={a}, andT2 ={b, c, d}. If w=bacd, andP =paths2(w), then

P =n_ho,_{}, b_i,_ho,_{b_}, a_i,

ho,_{a, b_}, c_i,_h_o,_{a, b, c_}, d_i,_h_o,_{a, b, c, d_},_n_i,

hb,{}, ai,hb,{a}, ci,hb,{a, c}, di,hb,{a, c, d},ni,

ha,_{}, c_i,_ha,_{c_}, d_i,_ha,_{c, d_},n_i,_hc,_{d_},n_i,

hc,_{}, d_i,_hd,_{},_n_io,

PT2 =

n

ho,{}, bi,ho,{a, b}, ci,ho,{a, b, c}, di,

ho,_{a, b, c, d_}n_i,_hb,_{a_}, c_i,_hb,_{a, c_}, d_i,

hb,{a, c, d},ni,hc,{d},ni,hc,{}, di,hd,{},nio, andPT =

n

ho,_{}, b_i,_ho,_{a, b, c_}, d_i,

ho,_{a, b, c, d_}_n_i,_hb,_{a, c_}, d_i,_hb,_{a, c, d_},_n_i,

hd,{},nio

5.3 Algorithm

The 2TSLIA2 (Algorithm 1 on the following page)

takes an ordered alphabetΣand a set of input strings I and returns a TSL grammar _hT, S_i. It has two

main functions:get tier, which calculatesT, and main, which callsget tierand uses the resulting tier to determineS.

First,get tiertakes as arguments an indexi, an

expanded tierTi, and a set of pathsP. The expanded

tier is the algorithm’s hypothesis for the tier at stage

i. It is first called with Ti = Σ (the most

conser-vative hypothesis for the tier) andi = 0. The goal

ofget tieris to whittle downTi, which is the set

of elements known to be in T plus the set of

ele-ments whose membership inT has not yet been

de-termined, down to only elements known to be inT.

The get tier function recursively iterates through Ti, starting with σ0, to determine which

members of Ti should be in the final hypothesis

T. Two other pieces of data are important for this:

PTi, or the tier paths of Ti whose symbols are in

Ti∪ {o,n}, andHi, or the set of non-tier elements

less thanσi. The set Hi is the algorithm’s largest

safe hypothesis for non-tier elements, and it will rea-son about restrictions onσiusing paths fromPTi.

Elements of Σ are checked for membership on

the tier in order; thus, when checkingσi,get tier

has already checked every otherσh forh < i. For

eachσi, membership inT is decided on two

condi-tions labeled (a) and (b) in theif-then statement of get tier. Condition (a) tests whetherσi is a free

element, and condition (b) further tests whether σi

is an exclusive blocker. Ifσi is found to be a free

element and not an exclusive blocker, then it is a non-tier element (see§5.2), and should be removed fromT. In detail:

Condition (a). To test whether σi is a free

ele-ment, this condition checks to see if there are any restrictions on σi given PTi and Hi. It searches

throughPTi forho, X, σii,hσi, X0,ni, and, for all

σ0 _∈ Ti, hσi, Y, σ0i and hσ0, Y0, σii, where the

in-tervening setsX, X0, Y, Y0 are all subsets ofHi. If

all of these appear inPTi, it means thatσi is a free

(7)

Data: An alphabetΣ ={σ0, σ1, ..., σn}; finite

setIof input strings inΣ∗

Result: AT SL2grammarhT, Si

functionget tier(Ti, P, i):

InitializePTi ={hτ1, X, τ2i ∈P|τ1, τ2∈ Ti∪ {o,n}};

InitializeHi = Σ−Ti;

fori_≤ndo

Letσ =σi;

if a.)∃X, X0_⊆Hisuch that

ho, X, σ_i,_hσ, X0,n_{i ∈}PTiand

∀σ0∈Ti,∃Y, Y0⊆Hisuch that

hσ, Y, σ0_i,_hσ0, Y0, σ_{i ∈}PTi and

b.) for eachhτ1, Z, τ2i ∈PTi with

τ1, τ2 ∈((Ti∪ {o,n})− {σ}), σ∈

Z, Z_{− {}σ_{} ⊆}Hi,∃Z0 ⊆Hisuch that

hτ1, Z0, τ2i ∈PTithen

Returnget tier(Ti−

{σ_}, PTi, i+ 1);

else

i = i+1

end end

ReturnhTi, PTii;

functionmain(I,Σ):

InitializeP =paths2(I);

InitializeS=∅;

SetT, PT =get tier(Σ, P,0);

forp=_hτ1, X, τ2i ∈PT do

ifX⊆Σ−T then

Addτ1τ2 toS;

end end

ReturnhT, Si;

Algorithm 1: The TSL2 Learning Algorithm

(2TSLIA)

Example 3. Let Σ = _{a, b, c, d_} from Example 2, with that order, and let I = {aaa, bab, cac, dad}. In the initial condition of the algorithm, i = 0, so σi =a,Ti = Σ,Hi =∅, andPTi =paths2(I).

Because Hi = ∅, a satisifes condition (a)

if ho,_∅, a_{i ∈} PTi, ha,∅,ni ∈ PTi, and for

each σ ∈ Ti = {a, b, c, d}, hσ,∅, ai ∈ PTi and

ha,_∅, σ_{i ∈} PTi. This is true given this

partic-ular I, because ho,_∅, a_i,_ha,_∅,_n_i,_ha,_∅, a_i,_∈

paths2(aaa), hb,∅, ai,ha,∅, bi ∈ paths2(bab),

hc,∅, ai,ha,∅, ci ∈ paths2(cac), and

hd,_∅, a_i,_ha,_∅, d_{i ∈}paths2(dad).

Condition (b). This condition checks whetherσi

is an exclusive blocker, and thus should not be re-moved from Ti. It does this by looking for a pair

τ1, τ2 of members ofTidistinct fromσiwhose

tier-adjacency is dependent on σi. It searches through

paths of the formhτ1, Z, τ2i, where Z includes σi

and some subset ofHi. For each suchhτ1, Z, τ2i, it

searches for ahτ1, Z0, τ2iwhereZ0 isonlya subset

ofHi. If such ahτ1, Z0, τ2iexists, thenτ1andτ2are

tier-adjacent in the data regardless ofσi. However,

if no such path exists, then it may be thatτ1 andτ2

can only appear withσi in between them, and thus

get tierinfers that σi is an exclusive blocker. If

any such pair is found, then condition (b) fails, and

σimust remain inTi.

Example 4. Continuing with Example 3, a would not satisfy condition (b) given I. Given that i = 0, in order for a to satisfy condition (b), for all τ1, τ2 ∈ {o, b, c, d,n}, if hτ1,{a}, τ2i is in PTi,

thenhτ1,∅, τ2i must also be inPTi. For thisI, this

is false for τ1 = τ2 = b, because hb,{a}, bi ∈ paths2(bab), but hb,∅, bi 6∈ paths2(I). Thus

get tierwould infer thatais an exclusive blocker. However, the reader can confirm that a would satisfy condition (b) for an input I0 = I _∪

{λ, bb, cc, dd_}.

If both conditions (a) and (b) are met, then the algorithm determines that σi should not appear on

the final hypothesisT for the tier, and soget tier is recursively called with Ti − {σi} and i+ 1 as

arguments. Because the hypothesis for the tier has changed,PTiis also passed, so on the next call when

get tier calculates PTi+1, it only has to search

throughPTiinstead of the full set of pathsP.

If neither condition is met, no change is made toTi,iis increased, and the next σi is checked by

continuing through thefor loop. This process is re-peated for each member of the alphabet, after which

Ti is returned as the final answer for the tier. Its tier

pathsPTi are also returned.

The function main calls get tier and takes its result asT andPT. Using this, it finds eachτ1τ2 ∈ S. It does this by searching throughPT and finding

(8)

sub-set of the non-tier elementsΣ₋T. Suchτ1τ2pairs

are thus tier-adjacent inI. The resulting grammar

hT, S_iis then returned.

5.4 Identification in polynomial time and data

Here we establish the main result of the paper that 2TSLIA identifies the TSL2 class in polynomial

time and data. As is typical, the proof relies on a establishing a characteristic sample for the 2TSLIA.

Lemma 1 establishes 2TSLIA runs efficiently.

Lemma 1. Given an input sample I of size n, 2TSLIA outputs a grammar in time polynomial in the size ofn.

Proof. The paths are calculated for each word once at the beginning ofmain. This takes time quadratic in the size of the sample (Remark 1), soO(n2). Call

this set of pathsP (noteP is also bounded in size

byn2_).

Additionally, the loop in get tier is called exactly |Σ_| times. Checking condition (a) in

get tier requires a single pass through the paths. On other hand, condition (b) requires one search throughP for every path elementp_∈Pin the worst

case, which isO(n4). Thus the time complexity for

get tierisO(|Σ|(n2+n4)) =O(n4)since|Σ|is a constant.

Lastly, finding the permissible substrings on the tier also requires a single pass throughP, which also

takesO(n2₎_{. Altogether then an upper bound on the}

time complexity of 2TSLIA is given byO(n2+n4+ n2) =_O(n4), which is polynomial.

Next we define adistinguishing samplefor a tar-get language for the 2TSLIA. We first show it is polynomial in the size of the target grammar. We then show that it is a characteristic sample.

Definition 6(Distinguishing sample). Given an al-phabetΣ, with some orderσ1, σ2, ..., σn on Σ, the

target languageL_∈TSL2, and the canonical gram-marG = hT, Si forL, a distinguishing sampleD forGis the set meeting the following conditions. Re-call thatH = Σ₋T are the non-tier elements and Hi refers to the non-tier elements less thanσi.

1. The non-tier element condition. For all non-tier elementsσi ∈H,

i. ∃w1, w2 ∈ D s.t. ho, X, σi ∈

paths2(w1) and hσ, X0,ni ∈

paths2(w2),X, X0 ⊆Hi

∀σ0 _∈ Ti, ∃v1, v2 ∈ D s.t.

hσ, Y, σ0i ∈ paths2(v1) and

hσ0, Y0, σ_{i ∈}paths2(v2),Y, Y0 ⊆Hi.

ii. ∀τ1τ2∈(facTi-2((Σ−{σi})∗)−R),w∈

Ds.t. hτ1, X, τ2i ∈paths2(w),X ⊆Hi.

2. The exclusive blocker condition. For each σi ∈ T which is an exclusive blocker,w ∈ D

s.t. hτ1, X, τ2i ∈paths2(w), whereτ1τ2 ∈ R, τ16=σi,τ2 6=σi,σi ∈X, andX−σi ⊆Hi

3. The allowed tier substring condition. ∀τ1τ2 ∈ S,somews.t. _hτ1, X, τ2i ∈paths2(w)where

X⊆H

Essentially, item (1) ensures that all symbols σi

not on the targetT will meet conditions (a) and (b)

in the for loop and be removed from the algorithm’s hypothesis forT. Item (2) ensures that, in the

situa-tion an exclusive blockerσi ∈T meets condition (a)

for removal from the tier hypothesis, it will not meet condition (b). Item (3) ensures that the sample con-tains everyτ1τ2inS. These points will be discussed

more detail in the proof that the distinguishing ex-ample is a characteristic sex-ample.

Lemma 2. Given a grammar G = _hT, S_i whose size is|T_|+_||S_||, the size of a distinguishing sample DforGis polynomial in the size ofG.

Proof. Recall that T _⊆ Σ and S _⊆ Σ2_on, that H = Σ₋T andR = Σ2 ₋_S_{. The non-tier}

el-ement condition requires that for eachσ ∈ H and σ0 _∈ T, the sample minimally contains the words σ, σ0σ, andσ0σ, whose total length is|H_|+2_|H_||T_|. The exclusive blocker condition requires for each exclusive blocker σ _∈ T and each τ1τ2 ∈ R that

minimallyτ1στ2is contained in the sample. Letting B denote the set of exclusive blockers, we have the

total length of words in the characteristic sample is ||B_{|| × ||}R_||. Finally, the allowed tier substring con-dition is requires for eachτ1τ2 ∈ S that minimally τ1τ2is contained in the sample. Hence, the length of

these words equals|S_|.

Altogether this means there is a characteristic sampleDsuch that_||D_||=_|H_|+ 2_|H_||T_|+_||B_{|| ×}

(9)

Σ2_, _||_D_{|| ≤ |}_Σ_| _{+ 2}_|_Σ_||_Σ_|₊ _|_Σ_||_Σ_|2 ₊_|_Σ_|2 ₌

|Σ| + 3|Σ|2 ₊ _|_Σ_|3_{. Thus,} _||_D_|| _{is bounded by}

O(_|Σ_|3)which, since_|Σ_|is a constant, is effectively

O(1).

Next we prove Lemma 3, which shows that the tier conjectured by the 2TSLIA at step iis correct

for all symbols up toσi. Thus, once all symbols are

treated by the 2TSLIA, its conjecture for the tier is correct. LetT_i0correspond to the algorithm’s current

tier hypothesis whenσi is being checked in the for

loop of the get tier function, let H_i0 = Σ−T_i0

be the algorithm’s hypothesis for the set of non-tier elements less thanσi, and letPT0

i be the set of paths

under consideration (i.e., the set of paths from the initialization ofPTi before thefor loop). As above,

τ0τ1...τmindex positions in a string or path.

Lemma 3. LetΣ ={σ0, . . . , σn}and consider any

G= _hT, S_i. Given any finite input sampleI which contains a distinguishing sampleDforG, it is the case that for alli(0≤i≤n),T_i0 =Ti.

Proof. The proof is by recursion oni. The base case

is wheni = 0. By definition, T0 = Σ. The

algo-rithm starts withT₀0 = Σ, soT₀0 =T0.

Next we assume the recursive hypothesis (RH): for somei_∈_NthatT0

i =Ti. We prove thatTi0+1= Ti+1. Specifically, we show that if RH is true for i, then (Case 1) σi ∈ Hi+1 impliesσi ∈ Hi0+1 and

(Case 2)σi 6∈Hi+1impliesσi 6∈Hi0+1.

Case 1. This is the case in whichσi is a non-tier

element. The non-tier element condition in Defini-tion 6 forDensures that the data inIwill meet both

conditions (a) and (b) in theforloop inget tier()

for removingσifrom the tier hypothesis.

Condition (a) requires thatho, X, σii ∈ PT_i0 and

hσi, X0,ni ∈ PT0

i for some X, X

0 _⊆ _H0

i, and

∀σ0 _∈ Ti,hσi, Y, σ0i ∈ PT0

i andhσ

0_{, Y}0_{, σ}

ii ∈ PT0 i

for some Y, Y0 ⊆ H_i0. Part (i) of the non-tier

ele-ment condition in Defintion (6) ensures that forσi,

there are wordsw1, w2 ∈ I such thatho, X, σii ∈

paths2(w1),hσi, X0,ni ∈paths2(w2), and, for all σ0 _∈ Ti, v1, v2 ∈ I s.t. hσi, Y, σ0i ∈ paths2(v1)

andhσ0, Y0, σii ∈ paths2(v2), where the

interven-ing setsX, X0, Y, Y0are all subsets ofHi. Because

by RHH_i0 =Hi, condition (a) for removingσifrom

the tier hypothesis is satisfied.

For σi to satisfy condition (b), for any path

hτ1, X, τ2i ∈PT_i0 such thatσi ∈XandX− {σi} ⊆

H_i0, there must be another path _hτ1, X0, τ2i ∈ PT0 i

whereX0 _⊆H_i0. Ifτ1τ2∈R, then such ahτ1, X, τ2i

is guaranteed not to exist in P_T0

i, because τ1 and

τ2 will, by the definition of R, not appear in the

data with only non-tier elements between them. For

τ1τ2 6∈ R, part (ii) of the nontier element

condi-tion in Definicondi-tion 6 ensures that some hτ1, Y, τ2i

wher e Y _⊆ Hi exists in PT0

i, as it requires that

for each suchτ1, τ2 there is somew ∈ I such that

hτ1, X, τ2i ∈ paths2(w) where X ⊆ Hi. By

hy-pothesisH_i0 = Hi, and so there is always

guaran-teed to be somehτ1, X0, τ2i ∈PT_i0 whereX0 ⊆Hi0.

Thus, condition (b) will always be satisfied forσi.

Thus, assuming the RH,σi ∈Hi+1is guaranteed

to satisfy conditions (a) and (b) and be removed from the algorithm’s hypothesis for the tier, soσi ∈Hi0+1.

Case 2. This is the case in whichσi ∈T. There

are two mutually exclusive possibilities. The first possibility is that σi is not a free element. Here,

σi is guaranteed to not be taken off of the tier

hy-pothesis, because condition (a) for removing a sym-bol from the tier requires that there exists some path hσj, X, σii and hσi, X0, σji where X, X0 ⊆ H_i0.

From the definition of a TSL grammar, there will ex-ist nohσj, Y, σiiandhσi, Y0, σji, whereY, Y0 ⊆H,

in the paths ofI. BecauseHi ⊆H and by

hypoth-esisH_i0 = Hi, Hi0 ⊆ H, so the algorithm will

cor-rectly register that σjσi ∈ R or σiσj ∈ R and σi

will remain on the tier hypothesis. Thusσi6∈H_i0₊₁.

The other possibility is that σi is an exclusive

blocker. If so, σi may satisfy condition (a) for tier

removal (as discussed above for the case in which

σi ∈ H). However, the exclusive blocker

condi-tion in Definicondi-tion 6 for D guarantees that σi will

not meet condition (b). As discussed above, con-dition (b) will fail if there is a hτ1, X, τ2i ∈ PT0 i

such thatX includesσi and zero or more elements

ofH_i0and no other pathhτ1, X0, τ2i ∈PT_i0whereX0

only includes elements ofH_i0. The exclusive blocker

condition requires that there will be someτ1τ2 ∈R

such that there is a wordw ∈ D s.t. hτ1, Y, τ2i ∈ paths2(w) where σi ∈ Y and Y − {σi} = Hi.

From the definition of a TSL grammar, now such thathτ1, Y0, τ2i ∈ paths2(w) whereYi ⊆ Hi will

appear in I, because Hi ⊆ H. Because by

(10)

hτ1, Y, τ2iand also find that there is nohτ1, Y0, τ2i.

Thus whenσi is an exclusive blocker, it will not be

removed from the tier hypothesis, andσi6∈Hi0.

We now know that both Cases (1) and (2) are true; thus, assuming the RH,Hi+1 = Hi0+1. Thus,

by induction, (_∀i)[Hi = Hi0]. Because Hi and

H0

i are the complements ofTi andTi0, respectively,

(∀i)[Ti =Ti0].

Lemma 4 (Distinguishing sample = characteristic

sample). A distinguishing sample for a TSL2 lan-guage as defined in Definition 6 is a characteristic samplefor that language for the 2TSLIA.

Proof. As above, letG0 = (T0, S0)be the output of the algorithm. From Lemma 3, we know that for any language L(G), G = _hT, S_i, and a characteristic

sampleDforG, given any sampleI ofLsuch that

D ⊆ I, (∀i)[Ti = Ti0]. ThatT0 = T immediately

follows from this fact.

That S0 ₌ _S _{follows directly from} _T0 ₌ _T _and

the allowed tier substring condition of Definition 6 for D. The allowed tier substring condition states

that for allτ1τ2 ∈S, the distinguishing sample will

contain somews.t. hτ1, X, τ2i ∈ paths2(w)where X _⊆ H. Because T = T0, the for loop of main will correctly find all suchτ1τ2. Thus,S =S0, and G=G0.

Theorem 1(Identification in the limit in polynomial time and data). The 2TSLIA identifies the TSLk

lan-guages in polynomial time and data.

Proof. Immediate from Lemmas 1, 2, and 4. We note further that in the worst case the time complexity is polynomial (degree four Lemma 1) and the data complexity is constant (Lemma 2).

6 Discussion

This algorithm opens up multiple paths for future work. The most immediate theoretical question is whether the algorithm here can be (efficiently) gen-eralized to any TSL class as long as the learner knows ka priori. We believe that it can. The

no-tion of2-paths can be straightforwardly extended to k such that ak-path is a2k₋1 tuple of the form hσ0, X0, σ1, X1, ..., Xk−1, σki, where each set Xi

represents the symbols between σi andσi+1. The

algorithm presented here can then be modified to

check a set of suchk-paths. We believe such an al-gorithm could be shown to be provably correct using a proof of similar structure to the one here, although time and data complexity will likely increase.

However, in terms of applying these algorithms to natural language phonology, it is likely that ak

value of2is sufficient. Heinz et al. (2011) argue that

TSL2 can describe both long-distance dissimilation

and assimilation patterns. One potential exception to this claim comes from Sundanese, where whether liquids must agree or disagree partly depends on the syllable structure (Bennett, 2015).

Additional issues arise with natural language data. One is that natural language corpora often include some exceptions to phonotactic generalizations. Al-gorithms which take as input such noisy data and output grammars that are guaranteed to be represen-tations of languages ‘close’ to the target language have been obtained and studied in the PAC learning paradigm (Angluin and Laird, 1988). It would be in-teresting to apply such techniques and other similar ones to adapt 2TSLIA into an algorithm that remains effective despite noisy input data.

Another area of future research is how to gen-eralize over multiple tiers. Jardine (2016), in run-ning versions of the 2TSLIA on natural language corpora, shows that it fails because local dependen-cies (which again can be modeled whereT = Σ)

prevent crucial information in the CS from appear-ing in the data. Furthermore, natural languages can have separate long-distance phonotactics which hold over distinct tiers. For example, KiYaka has both a vowel harmony pattern (Hyman, 1998) and a liquid-nasal harmony pattern over the tier {l,m,n,N} (Hy-man, 1995). Thus, words in KiYaka exhibit a pattern corresponding to theintersectionof two TSL2

gram-mars, one with a vowel tier and one with a nasal-liquid tier. The problem of learning a finite inter-section of TSL2 languages is thus another relevant

learning problem.

(11)

can be efficiently learned, building on ideas on the learning of SL functions. An open question, then, is how these ideas can be used to develop a functional version of the TSL languages to model long-distance processes. The central result in this paper may then help to understand how the tiers over which these processes apply can be learned.

7 Conclusion

This paper has presented an algorithm which can learn a grammar for any TSL2 language in time

polynomial in the size of the input sample, whose size is bounded by a constant. As the TSL2

lan-guages can model long-distance phonotactics in nat-ural language, this represents a step towards under-standing how humans internalize such patterns.

Acknowledgments

We thank R´emi Eyraud, Kevin McMullin, and two anonymous reviewers for useful comments. This re-search was supported by NSF#1035577.

References

Dana Angluin and Philip Laird. 1988. Learning from noisy examples. Machine Learning, 2:343–370. William G. Bennett. 2013. Dissimilation, Consonant

Harmony, and Surface Correspondence. Ph.D. thesis, Rutgers, the State University of New Jersey.

William G. Bennett. 2015. Assimilation, dissimilation, and surface correspondence in sundanese. Natural Language and Linguistic Theory, 33(2):371–415. Pascal Caron. 2000. Families of locally testable

lan-guages. Theoretical Computer Science, 242:361–376. Jane Chandlee, R´emi Eyraud, and Jeffrey Heinz. 2014. Learning Strictly Local Subsequential Functions. Transactions of the Association for Computational Linguistics, 2:491–503.

Jane Chandlee, R´emi Eyraud, and Jeffrey Heinz. 2015. Output Strictly Local functions. InProceedings of the 14th Meeting on the Mathematics of Language (MoL 14), Chicago, IL, July.

Jane Chandlee. 2014. Strictly Local Phonological Pro-cesses. Ph.D. thesis, University of Delaware.

Noam Chomsky and Morris Halle. 1965. Some contro-versial questions in phonological theory. Journal of Linguistics, 1(2):pp. 97–138.

George N. Clements and Engin Sezer. 1982. Vowel and consonant disharmony in Turkish. In Harry van der Hulst and Norval Smith, editors, The Structure of

Phonological Representations (Part II). Foris, Dor-drecht.

Eung-Do Cook. 1984. A Sarcee Grammar. Vancouver: University of British Columbia Press.

Colin de la Higuera. 1997. Characteristic sets for poly-nomial grammatical inference. Machine Learning, 27(2):125–138.

Colin de la Higuera. 2010. Grammatical Inference: Learning Automata Grammars. Cambridge University Press.

Pedro Garc´ıa, Enrique Vidal, and Jos´e Oncina. 1990. Learning locally testable languages in the strict sense. InProceedings of the Workshop on Algorithmic Learn-ing Theory, pages 325–338.

Mark E. Gold. 1967. Language identification in the limit. Information and Control, 10:447–474.

John Goldsmith and Jason Riggle. 2012. Information theoretic approaches to phonological structure: The case of Finnish vowel harmony. Natural Language & Linguistic Theory, 30(3):859–896.

John Goldsmith and Aris Xanthos. 2009. Learning phonological categories.Language, 85(1):4–38. Jeffrey Heinz and James Rogers. 2013. Learning

sub-regular classes of languages with factored determinis-tic automata. In Andras Kornai and Marco Kuhlmann, editors,Proceedings of the 13th Meeting on the Math-ematics of Language (MoL 13), pages 64–71, Sofia, Bulgaria, August. Association for Computational Lin-guistics.

Jeffrey Heinz, Chetan Rawal, and Herbert G. Tanner. 2011. Tier-based strictly local constraints for phonol-ogy. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pages 58–64, Portland, Oregon, USA, June. Association for Computational Linguistics.

Jeffrey Heinz, Anna Kasprzik, and Timo K¨otzing. 2012. Learning with lattice-structured hypothesis spaces. Theoretical Computer Science, 457:111–127, October. Jeffrey Heinz. 2010a. Learning long-distance

phonotac-tics.Linguistic Inquiry, 41:623–661.

Jeffrey Heinz. 2010b. String extension learning. In Pro-ceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 897–906. Asso-ciation for Computational Linguistics, July.

Larry Hyman. 1995. Nasal consonant harmony at a dis-tance: The case of Yaka. Studies in African Linguis-tics, 24:5–30.

Larry Hyman. 1998. Positional prominence and the ‘prosodic trough’ in Yaka.Phonology, 15:14–75. Adam Jardine, Jane Chandlee, R´emi Eyraud, and Jeffrey

(12)

on Grammatical Inference (ICGI 2014), JMLR Work-shop Proceedings, pages 94–108.

Adam Jardine. 2016. Learning tiers for long-distance phonotactics. In Laurel Perkins, Rachel Dudley, Ju-liana Gerard, and Kasia Hitczenko, editors, Proceed-ings of the 6th Conference on Generative Approaches to Language Acquisition North America (GALANA 2015).

John Jensen. 1974. Variables in phonology. Language, 50:675–686.

Regine Lai. 2013. Domain Specificity in Learning Phonology. Ph.D. thesis, University of Delaware. Regine Lai. 2015. Learnable versus unlearnable

har-mony patterns.Linguistic Inquiry, 46(3):425–451. Kevin McMullin and Gunnar ´Olafur Hansson.

forth-coming. Long-distance phonotactics as Tier-Based Strictly 2-Local languages. InProceedings of the An-nual Meeting on Phonology 2015.

Robert McNaughton and Seymour Papert. 1971. Counter-Free Automata. MIT Press.

Andrew Nevins. 2010. Locality in Vowel Harmony. Number 55 in Linguistic Inquiry Monographs. MIT Press.

David Odden. 1994. Adjacency parameters in phonol-ogy. Language, 70(2):289–330.

Catherine Ringen. 1975. Vowel Harmony: Theoretical Implications. Ph.D. thesis, Indiana University. James Rogers and Marc D. Hauser. 2010. The use of

formal language theory in studies of artificial language learning: A proposal for distinguishing the differences between human and nonhuman animal learners. In Harry van der Hulst, editor, Recursion and Human Language (Studies in Generative Grammar [SGG]) 104. de Gruyter.

James Rogers and Geoffrey Pullum. 2011. Aural pattern recognition experiments and the subregular hierarchy. Journal of Logic, Language and Information, 20:329– 342.

James Rogers, Jeffrey Heinz, Gil Bailey, Matt Edlefsen, Molly Visscher, David Wellcome, and Sean Wibel. 2010. On languages piecewise testable in the strict sense. In Christian Ebert, Gerhard J¨ager, and Jens Michaelis, editors,The Mathematics of Language, vol-ume 6149 of Lecture Notes in Artifical Intelligence, pages 255–265. Springer.

James Rogers, Jeffrey Heinz, Margaret Fero, Jeremy Hurst, Dakotah Lambert, and Sean Wibel. 2013. Cog-nitive and sub-regular complexity. InFormal Gram-mar, volume 8036 ofLecture Notes in Computer Sci-ence, pages 90–108. Springer.

Sharon Rose and Rachel Walker. 2004. A typology of consonant agreement as correspondence. Language, 80:475–531.