• No results found

Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints

N/A
N/A
Protected

Academic year: 2021

Share "Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints"

Copied!
85
0
0

Loading.... (view fulltext now)

Full text

(1)

Gradient exceptionality in Maximum Entropy

Grammar with lexically specific constraints

Claire Moore-Cantwell Joe Pater

University of Massachusetts Amherst

Exceptionality Workshop OCP, Barcelona Jan. 27, 2015

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 1 / 30

(2)

The problem

The most general form of the problem: the inadequacy of a two-way distinction between rule-governed vs. exceptional (see e.g. Hayes 2008: ch. 9 on gradient productivity).

A specific example: degrees of regularity / exceptionality in stress placement

Peperkamp et al. (2010) provide psycholinguistic evidence for the following continuum of regularity, which matches relative degrees of regularity in the lexicons:

(3)

The problem

The most general form of the problem: the inadequacy of a two-way distinction between rule-governed vs. exceptional (see e.g. Hayes 2008: ch. 9 on gradient productivity). A specific example: degrees of regularity / exceptionality in stress placement

Peperkamp et al. (2010) provide psycholinguistic evidence for the following continuum of regularity, which matches relative degrees of regularity in the lexicons:

French, Finnish, Hungarian > Polish > Spanish

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 2 / 30

(4)

The problem

The most general form of the problem: the inadequacy of a two-way distinction between rule-governed vs. exceptional (see e.g. Hayes 2008: ch. 9 on gradient productivity). A specific example: degrees of regularity / exceptionality in stress placement

Peperkamp et al. (2010) provide psycholinguistic evidence for the following continuum of regularity, which matches relative degrees of regularity in the lexicons:

(5)

The problem

Fidelholtz (1979: 58):

It appears to be a problem for linguistic theory that there is nothing in the formal description of Polish stress which would indicate that Polish is a ‘penultimate-stress’ language, as compared with the similar rules in English, which is essentially a free-stress language.

When the penultimate syllable is light, stress falls on the penultimate syllable of some English and Polish words (e.g. English ban´ana), and on the antepenult on others (e.g. C´anada).

In English, both patterns are well-attested (Pater 1994), and each word’s pronunciation is stable.

In Polish, there are very few antepenultimately stressed words, and there is frequent regularization to penultimate stress.

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 3 / 30

(6)

The problem

Fidelholtz (1979: 58):

It appears to be a problem for linguistic theory that there is nothing in the formal description of Polish stress which would indicate that Polish is a ‘penultimate-stress’ language, as compared with the similar rules in English, which is essentially a free-stress language.

When the penultimate syllable is light, stress falls on the penultimate syllable of some English and Polish words (e.g. English ban´ana), and on the antepenult on others (e.g. C´anada).

In English, both patterns are well-attested (Pater 1994), and each word’s pronunciation is stable.

In Polish, there are very few antepenultimately stressed words, and there is frequent regularization to penultimate stress.

(7)

The problem

Fidelholtz (1979: 58):

It appears to be a problem for linguistic theory that there is nothing in the formal description of Polish stress which would indicate that Polish is a ‘penultimate-stress’ language, as compared with the similar rules in English, which is essentially a free-stress language.

When the penultimate syllable is light, stress falls on the penultimate syllable of some English and Polish words (e.g. English ban´ana), and on the antepenult on others (e.g. C´anada).

In English, both patterns are well-attested (Pater 1994), and each word’s pronunciation is stable.

In Polish, there are very few antepenultimately stressed words, and there is frequent regularization to penultimate stress.

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 3 / 30

(8)

The problem

Fidelholtz (1979: 58):

It appears to be a problem for linguistic theory that there is nothing in the formal description of Polish stress which would indicate that Polish is a ‘penultimate-stress’ language, as compared with the similar rules in English, which is essentially a free-stress language.

When the penultimate syllable is light, stress falls on the penultimate syllable of some English and Polish words (e.g. English ban´ana), and on the antepenult on others (e.g. C´anada).

In English, both patterns are well-attested (Pater 1994), and each word’s pronunciation is stable.

In Polish, there are very few antepenultimately stressed words, and there is frequent regularization to penultimate stress.

(9)

The problem

Peperkamp et al.’s psycholinguistic evidence uses a memory task Sample item: n´umi n´umi num´ı num´ı n´umi OK (ISI 80 msec.)

Correct response: 1 1 2 2 1

Also tested using a segmental contrast: m´uku vs. m´unu

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 4 / 30

(10)

The problem

Peperkamp et al.’s psycholinguistic evidence uses a memory task Sample item: n´umi n´umi num´ı num´ı n´umi OK (ISI 80 msec.) Correct response: 1 1 2 2 1

(11)

The problem

Peperkamp et al.’s psycholinguistic evidence uses a memory task Sample item: n´umi n´umi num´ı num´ı n´umi OK (ISI 80 msec.) Correct response: 1 1 2 2 1

Also tested using a segmental contrast: m´uku vs. m´unu

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 4 / 30

(12)
(13)

The problem

The formal descriptions of both languages require lexical marking at least one of the patterns.

In English it’s hard to say which one - Pater 1994. Rates of attestation of the two patterns and productivity is affected by the quality of the final vowel - Moore-Cantwell 2014

The number of exceptions, few or many, does not affect the grammatical status of a a pattern at all in a standard generative grammar

This is true of the SPE formalism of Fidelholtz’s time, of metrical rule approaches, and of OT accounts with lexically specific constraints (posited independently by Kraska-Szlenk 1995 diss. for Polish and Pater 1995 ROA ms. [2000 Phonology] for English)

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 6 / 30

(14)

The problem

The formal descriptions of both languages require lexical marking at least one of the patterns.

In English it’s hard to say which one - Pater 1994. Rates of attestation of the two patterns and productivity is affected by the quality of the final vowel - Moore-Cantwell 2014

The number of exceptions, few or many, does not affect the grammatical status of a a pattern at all in a standard generative grammar

This is true of the SPE formalism of Fidelholtz’s time, of metrical rule approaches, and of OT accounts with lexically specific constraints (posited independently by Kraska-Szlenk 1995 diss. for Polish and Pater 1995 ROA ms. [2000 Phonology] for English)

(15)

The problem

The formal descriptions of both languages require lexical marking at least one of the patterns.

In English it’s hard to say which one - Pater 1994. Rates of attestation of the two patterns and productivity is affected by the quality of the final vowel - Moore-Cantwell 2014

The number of exceptions, few or many, does not affect the grammatical status of a a pattern at all in a standard generative grammar

This is true of the SPE formalism of Fidelholtz’s time, of metrical rule approaches, and of OT accounts with lexically specific constraints (posited independently by Kraska-Szlenk 1995 diss. for Polish and Pater 1995 ROA ms. [2000 Phonology] for English)

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 6 / 30

(16)

The problem

There have been a number of proposals addressing the general problem of gradient productivity in the recent generative literature – ours builds on those using:

Stochastic grammars: Zuraw (2000) using Boersma’s

Stochastic OT, Ernestus and Baayen (2003) using Stochastic OT and MaxEnt, Hayes and Wilson (2008) and Hayes, Zuraw,

Sipt´ar and Londe (2009) using MaxEnt

(17)

The problem

There have been a number of proposals addressing the general problem of gradient productivity in the recent generative literature – ours builds on those using:

Stochastic grammars: Zuraw (2000) using Boersma’s

Stochastic OT, Ernestus and Baayen (2003) using Stochastic OT and MaxEnt, Hayes and Wilson (2008) and Hayes, Zuraw,

Sipt´ar and Londe (2009) using MaxEnt

Indexed constraints: Pater (2005) and Becker (2009)

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 7 / 30

(18)

The problem

Challenges:

Stochastic grammar approaches sometimes conflate

within-word variation and generalizations across the lexicon (a challenge first noted in Zuraw 2000)

Indexed constraint approaches require a special calculation to get an influence of degree of regularity

We aim to avoid these problems by synthesizing the two approaches

(19)

The problem

Challenges:

Stochastic grammar approaches sometimes conflate

within-word variation and generalizations across the lexicon (a challenge first noted in Zuraw 2000)

Indexed constraint approaches require a special calculation to get an influence of degree of regularity

We aim to avoid these problems by synthesizing the two approaches

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 8 / 30

(20)

The problem

Challenges:

Stochastic grammar approaches sometimes conflate

within-word variation and generalizations across the lexicon (a challenge first noted in Zuraw 2000)

Indexed constraint approaches require a special calculation to get an influence of degree of regularity

We aim to avoid these problems by synthesizing the two approaches

(21)

Our proposal illustrated with stress

We will show that when lexically specific constraints are

incorporated into a MaxEnt framework (Goldwater and Johnson 2003), the outcome of learning does yield a grammatical difference between a language like Polish and one like English

H = weighted sum of violations (HG Harmony) p is proportional to exp(H)

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 9 / 30

(22)

Our proposal illustrated with stress

We will show that when lexically specific constraints are

incorporated into a MaxEnt framework (Goldwater and Johnson 2003), the outcome of learning does yield a grammatical difference between a language like Polish and one like English

H = weighted sum of violations (HG Harmony) p is proportional to exp(H)

(23)

Our proposal illustrated with stress

We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items

Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern

When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh

When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully

When a pattern is relatively rare, the general constraint will wind up with a relatively low weight

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 10 / 30

(24)

Our proposal illustrated with stress

We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items

Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern

When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh

When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully

When a pattern is relatively rare, the general constraint will wind up with a relatively low weight

(25)

Our proposal illustrated with stress

We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items

Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern

When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh

When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully

When a pattern is relatively rare, the general constraint will wind up with a relatively low weight

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 10 / 30

(26)

Our proposal illustrated with stress

We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items

Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern

When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh

When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully

When a pattern is relatively rare, the general constraint will wind up with a relatively low weight

(27)

Our proposal illustrated with stress

We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items

Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern

When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh

When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully

When a pattern is relatively rare, the general constraint will wind up with a relatively low weight

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 10 / 30

(28)

Our proposal illustrated with stress

An illustration: a set of simple stress ‘languages’ with varying degrees of regularity

Stress can fall in one of two positions, and there is a general constraint preferring each (e.g. Align-L and Align-R)

There are 100 words; languages have stress in one position in 100 - 50 words (fully regular to fully lexical), making 51 languages

(29)

Our proposal illustrated with stress

An illustration: a set of simple stress ‘languages’ with varying degrees of regularity

Stress can fall in one of two positions, and there is a general constraint preferring each (e.g. Align-L and Align-R)

There are 100 words; languages have stress in one position in 100 - 50 words (fully regular to fully lexical), making 51 languages

100 lexically specific versions of Align-L, and 100 of Align-R

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 11 / 30

(30)

Our proposal illustrated with stress

An illustration: a set of simple stress ‘languages’ with varying degrees of regularity

Stress can fall in one of two positions, and there is a general constraint preferring each (e.g. Align-L and Align-R)

There are 100 words; languages have stress in one position in 100 - 50 words (fully regular to fully lexical), making 51 languages

(31)

Our proposal illustrated with stress

An illustration: a set of simple stress ‘languages’ with varying degrees of regularity

Stress can fall in one of two positions, and there is a general constraint preferring each (e.g. Align-L and Align-R)

There are 100 words; languages have stress in one position in 100 - 50 words (fully regular to fully lexical), making 51 languages

100 lexically specific versions of Align-L, and 100 of Align-R

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 11 / 30

(32)

Our proposal illustrated with stress

Constraint weights began at zero, and were updated over 1000 epochs of Gradient Descent, with a learning rate of 0.1

Gradient Descent is similar to the Stochastic OT / HG Gradual Learning Algorithm except that it is run in batch -each epoch is one presentation of all the data (see Pater and Staubs 2013: NECPhon)

Used for convenience – closely approximates results of regular GLA

(33)

Our proposal illustrated with stress

Constraint weights began at zero, and were updated over 1000 epochs of Gradient Descent, with a learning rate of 0.1

Gradient Descent is similar to the Stochastic OT / HG Gradual Learning Algorithm except that it is run in batch -each epoch is one presentation of all the data (see Pater and Staubs 2013: NECPhon)

Used for convenience – closely approximates results of regular GLA

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 12 / 30

(34)

Our proposal illustrated with stress

Constraint weights began at zero, and were updated over 1000 epochs of Gradient Descent, with a learning rate of 0.1

Gradient Descent is similar to the Stochastic OT / HG Gradual Learning Algorithm except that it is run in batch -each epoch is one presentation of all the data (see Pater and Staubs 2013: NECPhon)

Used for convenience – closely approximates results of regular GLA

(35)

Our proposal illustrated with stress

Given lexical irregularity, how do we find out what the grammar does?

One increasingly popular answer: Wug test / nonce word judgment.

A wug test in our model means that the lexically specific constraints have no influence

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 13 / 30

(36)

Our proposal illustrated with stress

Given lexical irregularity, how do we find out what the grammar does?

One increasingly popular answer: Wug test / nonce word judgment.

A wug test in our model means that the lexically specific constraints have no influence

(37)

Our proposal illustrated with stress

Given lexical irregularity, how do we find out what the grammar does?

One increasingly popular answer: Wug test / nonce word judgment.

A wug test in our model means that the lexically specific constraints have no influence

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 13 / 30

(38)
(39)

Comparison with related approaches

MaxEnt without lexically specific constraints

Hayes and Wilson (2008): only applicable to phonotactics

Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and

Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words

The learned grammars in our simulation always gave near 1 probability correct to individual words

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 15 / 30

(40)

Comparison with related approaches

MaxEnt without lexically specific constraints

Hayes and Wilson (2008): only applicable to phonotactics

Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and

Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words

The learned grammars in our simulation always gave near 1 probability correct to individual words

(41)

Comparison with related approaches

MaxEnt without lexically specific constraints

Hayes and Wilson (2008): only applicable to phonotactics

Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and

Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape

This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words

The learned grammars in our simulation always gave near 1 probability correct to individual words

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 15 / 30

(42)

Comparison with related approaches

MaxEnt without lexically specific constraints

Hayes and Wilson (2008): only applicable to phonotactics

Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and

Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words

The learned grammars in our simulation always gave near 1 probability correct to individual words

(43)

Comparison with related approaches

MaxEnt without lexically specific constraints

Hayes and Wilson (2008): only applicable to phonotactics

Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and

Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words

The learned grammars in our simulation always gave near 1 probability correct to individual words

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 15 / 30

(44)

Comparison with related approaches

Lexically specific constraints without MaxEnt

Pater (2005): nonce probabilities generated by considering all possible indexations

Becker (2009): nonce probabilities derived from the number of words attached to each indexed constraint

(45)

Comparison with related approaches

Lexically specific constraints without MaxEnt

Pater (2005): nonce probabilities generated by considering all possible indexations

Becker (2009): nonce probabilities derived from the number of words attached to each indexed constraint

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 16 / 30

(46)

Comparison with related approaches

Lexically specific constraints without MaxEnt

Pater (2005): nonce probabilities generated by considering all possible indexations

Becker (2009): nonce probabilities derived from the number of words attached to each indexed constraint

(47)

Comparison with related approaches

Compared with prior work, this proposal is underdeveloped in terms of being applied to real cases of lexical irregularity and accompanied judgments

As a step forward, and as an illustration of how this model can be applied to alternations, we next turn to modeling of Ernestus and Baayen’s (2003) Dutch data

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 17 / 30

(48)

Simulating Dutch voicing alternations

The pattern (Ernestus and Baayen, 2003):

Word-internal obstruents contrast in voicing

[vErVEid@n] ’widen-inf’

[vErVEit@n] ’reproach-inf’

Word-final devoicing

[vErVEit] ’widen’

(49)

Simulating Dutch voicing alternations

The pattern (Ernestus and Baayen, 2003): Word-internal obstruents contrast in voicing

[vErVEid@n] ’widen-inf’

[vErVEit@n] ’reproach-inf’

Word-final devoicing

[vErVEit] ’widen’

[vErVEit] ’reproach’

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 18 / 30

(50)

Simulating Dutch voicing alternations

The pattern (Ernestus and Baayen, 2003): Word-internal obstruents contrast in voicing

[vErVEid@n] ’widen-inf’

[vErVEit@n] ’reproach-inf’

Word-final devoicing

[vErVEit] ’widen’

(51)

Simulating Dutch voicing alternations

The pattern (Ernestus and Baayen, 2003):

Given a neutralized novel form, Dutch speakers can ’guess’ a voicing for the word-internal obstruent based on lexical trends

% voiced in lexicon % voiced in production

p/b 9% 4%

t/d 25% 9%

s/z 33% 23%

f/v 70% 49%

x/G 97% 80%

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 19 / 30

(52)

Simulating Dutch voicing alternations

The pattern (Ernestus and Baayen, 2003):

Given a neutralized novel form, Dutch speakers can ’guess’ a voicing for the word-internal obstruent based on lexical trends

% voiced in lexicon % voiced in production

p/b 9% 4%

t/d 25% 9%

s/z 33% 23%

f/v 70% 49%

(53)

Simulating Dutch voicing alternations

The pattern (Ernestus and Baayen, 2003):

Given a neutralized novel form, Dutch speakers can ’guess’ a voicing for the word-internal obstruent based on lexical trends

% voiced in lexicon % voiced in production

p/b 9% 4%

t/d 25% 9%

s/z 33% 23%

f/v 70% 49%

x/G 97% 80%

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 19 / 30

(54)

Simulating Dutch voicing alternations

We used six general constraints (a subset of those used by Ernestus and Baayen pg.20):

*VTV Intervocalic obstruents should be voiced

*VDV Intervocalic obstruents should be voiceless

*P[+voice] Labial stops should be voiceless

*T[+voice] Coronal stops should be voiceless

*S[+voice] Sibilants should be voiceless

*F[-voice] Labial fricatives should be voiced

*X[-voice] Velar fricatives should be voiced

(55)

Simulating Dutch voicing alternations

We used six general constraints (a subset of those used by Ernestus and Baayen pg.20):

*VTV Intervocalic obstruents should be voiced

*VDV Intervocalic obstruents should be voiceless

*P[+voice] Labial stops should be voiceless

*T[+voice] Coronal stops should be voiceless

*S[+voice] Sibilants should be voiceless

*F[-voice] Labial fricatives should be voiced

*X[-voice] Velar fricatives should be voiced

n.b. these constraints are shorthand

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 20 / 30

(56)

Simulating Dutch voicing alternations

Each lexical item gets a specific version of one of the most general constraints:

/vErVEid/ *VTV-widen ‘widen’ has intervocalic voicing

(57)

Simulating Dutch voicing alternations

Data used:

Trend % Voicing Total forms

p/b voiceless 9% 230

t/d voiceless 25% 719

s/z voiceless 33% 451

f/v voiced 70% 166

x/G voiced 97% 131

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 22 / 30

(58)

Simulating Dutch voicing alternations

Simulation details:

Gradient Descent, with 10,000 learning epochs (one epoch: one update over the entire input set) Learning rate: 0.01

Regularization: constraint weights based on a Gaussian prior

(59)

Simulating Dutch voicing alternations

Results summary:

Real words are correctly voiced/voiceless % voiced on novel words:

lexicon production (EB) Simulated

p/b 9% 4% 1%

t/d 25% 9% 9%

s/z 33% 23% 18%

f/v 70% 49% 84%

x/G 97% 80% 99%

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 24 / 30

(60)

Simulating Dutch voicing alternations

Results summary:

Real words are correctly voiced/voiceless % voiced on novel words:

lexicon production (EB) Simulated

p/b 9% 4% 1%

t/d 25% 9% 9%

s/z 33% 23% 18%

f/v 70% 49% 84%

(61)

Simulating Dutch voicing alternations

Weights for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)

- High weight on *X[-voice] - Exceptions: high weights

⇒ conflict with *X[-voice] - Trend-followers: low weights

⇒ concord with *X[-voice]

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 25 / 30

(62)

Simulating Dutch voicing alternations

Weights for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)

- High weight on *X[-voice]

- Exceptions: high weights ⇒ conflict with *X[-voice] - Trend-followers: low weights

(63)

Simulating Dutch voicing alternations

Weights for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)

- High weight on *X[-voice] - Exceptions: high weights

⇒ conflict with *X[-voice] - Trend-followers: low weights

⇒ concord with *X[-voice]

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 25 / 30

(64)

Simulating Dutch voicing alternations

Weights for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)

- High weight on *X[-voice] - Exceptions: high weights

⇒ conflict with *X[-voice]

- Trend-followers: low weights ⇒ concord with *X[-voice]

(65)

Simulating Dutch voicing alternations

Weights for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)

- High weight on *X[-voice] - Exceptions: high weights

⇒ conflict with *X[-voice] - Trend-followers: low weights

⇒ concord with *X[-voice]

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 25 / 30

(66)

Simulating Dutch voicing alternations

Weights for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)

- High weight on *X[-voice] - Exceptions: high weights

⇒ conflict with *X[-voice] - Trend-followers: low weights

(67)

Simulating Dutch voicing alternations

Probabilities for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug

- Novel words: voiced

- Trend-followers: correctly voiced - Exceptions: errors towards voicing - Analogous to Polish stress: exceptions exist, but are prone to regularization

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 26 / 30

(68)

Simulating Dutch voicing alternations

Probabilities for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug

- Novel words: voiced

- Trend-followers: correctly voiced - Exceptions: errors towards voicing - Analogous to Polish stress: exceptions exist, but are prone to regularization

(69)

Simulating Dutch voicing alternations

Probabilities for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug

- Novel words: voiced

- Trend-followers: correctly voiced

- Exceptions: errors towards voicing - Analogous to Polish stress: exceptions exist, but are prone to regularization

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 26 / 30

(70)

Simulating Dutch voicing alternations

Probabilities for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug

- Novel words: voiced

- Trend-followers: correctly voiced - Exceptions: errors towards voicing

- Analogous to Polish stress: exceptions exist, but are prone to regularization

(71)

Simulating Dutch voicing alternations

Probabilities for forms with x/G: 97% voiced (131 total)

0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug

- Novel words: voiced

- Trend-followers: correctly voiced - Exceptions: errors towards voicing - Analogous to Polish stress: exceptions exist, but are prone to regularization

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 26 / 30

(72)

Simulating Dutch voicing alternations

Weights for forms with s/z: 33% voiced (451 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)

- Low-ish weight on *S[+voice] - Exceptions: high weights - Trend-followers: lower weights

(73)

Simulating Dutch voicing alternations

Weights for forms with s/z: 33% voiced (451 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)

- Low-ish weight on *S[+voice]

- Exceptions: high weights - Trend-followers: lower weights

⇒ but higher than *S[+voice]

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 27 / 30

(74)

Simulating Dutch voicing alternations

Weights for forms with s/z: 33% voiced (451 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)

- Low-ish weight on *S[+voice] - Exceptions: high weights

- Trend-followers: lower weights ⇒ but higher than *S[+voice]

(75)

Simulating Dutch voicing alternations

Weights for forms with s/z: 33% voiced (451 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)

- Low-ish weight on *S[+voice] - Exceptions: high weights - Trend-followers: lower weights

⇒ but higher than *S[+voice]

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 27 / 30

(76)

Simulating Dutch voicing alternations

Weights for forms with s/z: 33% voiced (451 total)

0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)

- Low-ish weight on *S[+voice] - Exceptions: high weights - Trend-followers: lower weights

(77)

Simulating Dutch voicing alternations

Probabilities for forms with s/z: 33% voiced (451 total)

0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (23%) Lexicon (33%) voiced in lexicon voiceless in lexicon wug

- Novel words: 20% voiced - Existing words: correct

- Analogous to English stress: each word’s pronounciation is stable

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 28 / 30

(78)

Simulating Dutch voicing alternations

Probabilities for forms with s/z: 33% voiced (451 total)

0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (23%) Lexicon (33%) voiced in lexicon voiceless in lexicon wug

- Novel words: 20% voiced

- Existing words: correct

- Analogous to English stress: each word’s pronounciation is stable

(79)

Simulating Dutch voicing alternations

Probabilities for forms with s/z: 33% voiced (451 total)

0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (23%) Lexicon (33%) voiced in lexicon voiceless in lexicon wug

- Novel words: 20% voiced - Existing words: correct

- Analogous to English stress: each word’s pronounciation is stable

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 28 / 30

(80)

Simulating Dutch voicing alternations

Probabilities for forms with s/z: 33% voiced (451 total)

0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (23%) Lexicon (33%) voiced in lexicon voiceless in lexicon wug

- Novel words: 20% voiced - Existing words: correct

- Analogous to English stress: each word’s pronounciation is stable

(81)

Summary

Using lexically specific constraints together with more general constraints like *P[+voice], we can model:

1 Probabilistic lexical trends which are reflected in speakers’

treatment of novel words

2 Categorical behavior of extant words in a language’s lexicon

3 Difference between generalizations with few vs. many

exceptions

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 29 / 30

(82)

Summary

Using lexically specific constraints together with more general constraints like *P[+voice], we can model:

1 Probabilistic lexical trends which are reflected in speakers’

treatment of novel words

2 Categorical behavior of extant words in a language’s lexicon

3 Difference between generalizations with few vs. many

(83)

Summary

Using lexically specific constraints together with more general constraints like *P[+voice], we can model:

1 Probabilistic lexical trends which are reflected in speakers’

treatment of novel words

2 Categorical behavior of extant words in a language’s lexicon

3 Difference between generalizations with few vs. many

exceptions

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 29 / 30

(84)

Summary

Using lexically specific constraints together with more general constraints like *P[+voice], we can model:

1 Probabilistic lexical trends which are reflected in speakers’

treatment of novel words

2 Categorical behavior of extant words in a language’s lexicon

3 Difference between generalizations with few vs. many

(85)

Thank you!

Special thanks are due to Robert Staubs, Sharon Peperkamp, and Emmanuel Dupoux for useful discussion and comments, and to Robert for generously lending his gradient descent script. This material is based upon work supported by the National Science Foundation under Grant No. 1451512 to the first author, Grant BCS-424077 to the University of Massachusetts Amherst, and by the city of Paris through a Research in Paris fellowship to the first author.

Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 30 / 30

References

Related documents

the separation of sulphur was different in the two series of solutions, hence a comparison of the toxicities of such solutions with their thio- sulphuric acid contents might be

Other obstacles to the success of these systems have had to do with personal preference. During fieldwork in the villages of Shishmaref and White Mountain, we found

Azarmanesh and Bil?n EURASIP Journal on Wireless Communications and Networking 2013, 2013 289 http //jwcn eurasipjournals com/content/2013/1/289 RESEARCH Open Access I Q diagram

It may be helpful to give weight to the training programs to be given other branch teachers, about using information search strategies for accessing infor- mation, data, and

This mixer is realized with N anti-parallel diode pairs (APDPs) and designed in self-biased structure to obtain minimum noise figure and conversion loss.. In this paper, a

Collage Captcha: Collage Captcha [8] is another interesting image based Captcha which challenges the image recognition capacity of the human.. Collage is an art of

Agreement to donate samples for stor- age/reuse for future research also had weak statistically significant negative correlation with the participant not being comfortable

Nebraska Growers’ and Crop Consultants’ Knowledge and Implementation of Integrated Pest Management of Western Bean Cutworm.. Westen R. Archibald, 1 , 2 Jeffery D. Bradshaw, 3