Gradient exceptionality in Maximum Entropy
Grammar with lexically specific constraints
Claire Moore-Cantwell Joe Pater
University of Massachusetts Amherst
Exceptionality Workshop OCP, Barcelona Jan. 27, 2015
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 1 / 30
The problem
The most general form of the problem: the inadequacy of a two-way distinction between rule-governed vs. exceptional (see e.g. Hayes 2008: ch. 9 on gradient productivity).
A specific example: degrees of regularity / exceptionality in stress placement
Peperkamp et al. (2010) provide psycholinguistic evidence for the following continuum of regularity, which matches relative degrees of regularity in the lexicons:
The problem
The most general form of the problem: the inadequacy of a two-way distinction between rule-governed vs. exceptional (see e.g. Hayes 2008: ch. 9 on gradient productivity). A specific example: degrees of regularity / exceptionality in stress placement
Peperkamp et al. (2010) provide psycholinguistic evidence for the following continuum of regularity, which matches relative degrees of regularity in the lexicons:
French, Finnish, Hungarian > Polish > Spanish
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 2 / 30
The problem
The most general form of the problem: the inadequacy of a two-way distinction between rule-governed vs. exceptional (see e.g. Hayes 2008: ch. 9 on gradient productivity). A specific example: degrees of regularity / exceptionality in stress placement
Peperkamp et al. (2010) provide psycholinguistic evidence for the following continuum of regularity, which matches relative degrees of regularity in the lexicons:
The problem
Fidelholtz (1979: 58):
It appears to be a problem for linguistic theory that there is nothing in the formal description of Polish stress which would indicate that Polish is a ‘penultimate-stress’ language, as compared with the similar rules in English, which is essentially a free-stress language.
When the penultimate syllable is light, stress falls on the penultimate syllable of some English and Polish words (e.g. English ban´ana), and on the antepenult on others (e.g. C´anada).
In English, both patterns are well-attested (Pater 1994), and each word’s pronunciation is stable.
In Polish, there are very few antepenultimately stressed words, and there is frequent regularization to penultimate stress.
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 3 / 30
The problem
Fidelholtz (1979: 58):
It appears to be a problem for linguistic theory that there is nothing in the formal description of Polish stress which would indicate that Polish is a ‘penultimate-stress’ language, as compared with the similar rules in English, which is essentially a free-stress language.
When the penultimate syllable is light, stress falls on the penultimate syllable of some English and Polish words (e.g. English ban´ana), and on the antepenult on others (e.g. C´anada).
In English, both patterns are well-attested (Pater 1994), and each word’s pronunciation is stable.
In Polish, there are very few antepenultimately stressed words, and there is frequent regularization to penultimate stress.
The problem
Fidelholtz (1979: 58):
It appears to be a problem for linguistic theory that there is nothing in the formal description of Polish stress which would indicate that Polish is a ‘penultimate-stress’ language, as compared with the similar rules in English, which is essentially a free-stress language.
When the penultimate syllable is light, stress falls on the penultimate syllable of some English and Polish words (e.g. English ban´ana), and on the antepenult on others (e.g. C´anada).
In English, both patterns are well-attested (Pater 1994), and each word’s pronunciation is stable.
In Polish, there are very few antepenultimately stressed words, and there is frequent regularization to penultimate stress.
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 3 / 30
The problem
Fidelholtz (1979: 58):
It appears to be a problem for linguistic theory that there is nothing in the formal description of Polish stress which would indicate that Polish is a ‘penultimate-stress’ language, as compared with the similar rules in English, which is essentially a free-stress language.
When the penultimate syllable is light, stress falls on the penultimate syllable of some English and Polish words (e.g. English ban´ana), and on the antepenult on others (e.g. C´anada).
In English, both patterns are well-attested (Pater 1994), and each word’s pronunciation is stable.
In Polish, there are very few antepenultimately stressed words, and there is frequent regularization to penultimate stress.
The problem
Peperkamp et al.’s psycholinguistic evidence uses a memory task Sample item: n´umi n´umi num´ı num´ı n´umi OK (ISI 80 msec.)
Correct response: 1 1 2 2 1
Also tested using a segmental contrast: m´uku vs. m´unu
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 4 / 30
The problem
Peperkamp et al.’s psycholinguistic evidence uses a memory task Sample item: n´umi n´umi num´ı num´ı n´umi OK (ISI 80 msec.) Correct response: 1 1 2 2 1
The problem
Peperkamp et al.’s psycholinguistic evidence uses a memory task Sample item: n´umi n´umi num´ı num´ı n´umi OK (ISI 80 msec.) Correct response: 1 1 2 2 1
Also tested using a segmental contrast: m´uku vs. m´unu
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 4 / 30
The problem
The formal descriptions of both languages require lexical marking at least one of the patterns.
In English it’s hard to say which one - Pater 1994. Rates of attestation of the two patterns and productivity is affected by the quality of the final vowel - Moore-Cantwell 2014
The number of exceptions, few or many, does not affect the grammatical status of a a pattern at all in a standard generative grammar
This is true of the SPE formalism of Fidelholtz’s time, of metrical rule approaches, and of OT accounts with lexically specific constraints (posited independently by Kraska-Szlenk 1995 diss. for Polish and Pater 1995 ROA ms. [2000 Phonology] for English)
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 6 / 30
The problem
The formal descriptions of both languages require lexical marking at least one of the patterns.
In English it’s hard to say which one - Pater 1994. Rates of attestation of the two patterns and productivity is affected by the quality of the final vowel - Moore-Cantwell 2014
The number of exceptions, few or many, does not affect the grammatical status of a a pattern at all in a standard generative grammar
This is true of the SPE formalism of Fidelholtz’s time, of metrical rule approaches, and of OT accounts with lexically specific constraints (posited independently by Kraska-Szlenk 1995 diss. for Polish and Pater 1995 ROA ms. [2000 Phonology] for English)
The problem
The formal descriptions of both languages require lexical marking at least one of the patterns.
In English it’s hard to say which one - Pater 1994. Rates of attestation of the two patterns and productivity is affected by the quality of the final vowel - Moore-Cantwell 2014
The number of exceptions, few or many, does not affect the grammatical status of a a pattern at all in a standard generative grammar
This is true of the SPE formalism of Fidelholtz’s time, of metrical rule approaches, and of OT accounts with lexically specific constraints (posited independently by Kraska-Szlenk 1995 diss. for Polish and Pater 1995 ROA ms. [2000 Phonology] for English)
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 6 / 30
The problem
There have been a number of proposals addressing the general problem of gradient productivity in the recent generative literature – ours builds on those using:
Stochastic grammars: Zuraw (2000) using Boersma’s
Stochastic OT, Ernestus and Baayen (2003) using Stochastic OT and MaxEnt, Hayes and Wilson (2008) and Hayes, Zuraw,
Sipt´ar and Londe (2009) using MaxEnt
The problem
There have been a number of proposals addressing the general problem of gradient productivity in the recent generative literature – ours builds on those using:
Stochastic grammars: Zuraw (2000) using Boersma’s
Stochastic OT, Ernestus and Baayen (2003) using Stochastic OT and MaxEnt, Hayes and Wilson (2008) and Hayes, Zuraw,
Sipt´ar and Londe (2009) using MaxEnt
Indexed constraints: Pater (2005) and Becker (2009)
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 7 / 30
The problem
Challenges:
Stochastic grammar approaches sometimes conflate
within-word variation and generalizations across the lexicon (a challenge first noted in Zuraw 2000)
Indexed constraint approaches require a special calculation to get an influence of degree of regularity
We aim to avoid these problems by synthesizing the two approaches
The problem
Challenges:
Stochastic grammar approaches sometimes conflate
within-word variation and generalizations across the lexicon (a challenge first noted in Zuraw 2000)
Indexed constraint approaches require a special calculation to get an influence of degree of regularity
We aim to avoid these problems by synthesizing the two approaches
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 8 / 30
The problem
Challenges:
Stochastic grammar approaches sometimes conflate
within-word variation and generalizations across the lexicon (a challenge first noted in Zuraw 2000)
Indexed constraint approaches require a special calculation to get an influence of degree of regularity
We aim to avoid these problems by synthesizing the two approaches
Our proposal illustrated with stress
We will show that when lexically specific constraints are
incorporated into a MaxEnt framework (Goldwater and Johnson 2003), the outcome of learning does yield a grammatical difference between a language like Polish and one like English
H = weighted sum of violations (HG Harmony) p is proportional to exp(H)
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 9 / 30
Our proposal illustrated with stress
We will show that when lexically specific constraints are
incorporated into a MaxEnt framework (Goldwater and Johnson 2003), the outcome of learning does yield a grammatical difference between a language like Polish and one like English
H = weighted sum of violations (HG Harmony) p is proportional to exp(H)
Our proposal illustrated with stress
We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items
Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern
When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh
When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully
When a pattern is relatively rare, the general constraint will wind up with a relatively low weight
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 10 / 30
Our proposal illustrated with stress
We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items
Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern
When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh
When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully
When a pattern is relatively rare, the general constraint will wind up with a relatively low weight
Our proposal illustrated with stress
We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items
Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern
When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh
When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully
When a pattern is relatively rare, the general constraint will wind up with a relatively low weight
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 10 / 30
Our proposal illustrated with stress
We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items
Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern
When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh
When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully
When a pattern is relatively rare, the general constraint will wind up with a relatively low weight
Our proposal illustrated with stress
We assume that the grammar is composed of general constraints, and lexically specific versions of (some of) them, indexed to particular items
Given basic assumptions about how MaxEnt grammars are learned, weights of the general constraints and of the lexically specific ones will vary as a function of the lexical probability of a pattern
When a pattern is relatively general across the lexicon, the general constraint will wind up with a relatively high weigh
When there are relatively few exceptions, the lexically specific constraints encoding them need to have relatively high weight for the words to be pronounced faithfully
When a pattern is relatively rare, the general constraint will wind up with a relatively low weight
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 10 / 30
Our proposal illustrated with stress
An illustration: a set of simple stress ‘languages’ with varying degrees of regularity
Stress can fall in one of two positions, and there is a general constraint preferring each (e.g. Align-L and Align-R)
There are 100 words; languages have stress in one position in 100 - 50 words (fully regular to fully lexical), making 51 languages
Our proposal illustrated with stress
An illustration: a set of simple stress ‘languages’ with varying degrees of regularity
Stress can fall in one of two positions, and there is a general constraint preferring each (e.g. Align-L and Align-R)
There are 100 words; languages have stress in one position in 100 - 50 words (fully regular to fully lexical), making 51 languages
100 lexically specific versions of Align-L, and 100 of Align-R
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 11 / 30
Our proposal illustrated with stress
An illustration: a set of simple stress ‘languages’ with varying degrees of regularity
Stress can fall in one of two positions, and there is a general constraint preferring each (e.g. Align-L and Align-R)
There are 100 words; languages have stress in one position in 100 - 50 words (fully regular to fully lexical), making 51 languages
Our proposal illustrated with stress
An illustration: a set of simple stress ‘languages’ with varying degrees of regularity
Stress can fall in one of two positions, and there is a general constraint preferring each (e.g. Align-L and Align-R)
There are 100 words; languages have stress in one position in 100 - 50 words (fully regular to fully lexical), making 51 languages
100 lexically specific versions of Align-L, and 100 of Align-R
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 11 / 30
Our proposal illustrated with stress
Constraint weights began at zero, and were updated over 1000 epochs of Gradient Descent, with a learning rate of 0.1
Gradient Descent is similar to the Stochastic OT / HG Gradual Learning Algorithm except that it is run in batch -each epoch is one presentation of all the data (see Pater and Staubs 2013: NECPhon)
Used for convenience – closely approximates results of regular GLA
Our proposal illustrated with stress
Constraint weights began at zero, and were updated over 1000 epochs of Gradient Descent, with a learning rate of 0.1
Gradient Descent is similar to the Stochastic OT / HG Gradual Learning Algorithm except that it is run in batch -each epoch is one presentation of all the data (see Pater and Staubs 2013: NECPhon)
Used for convenience – closely approximates results of regular GLA
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 12 / 30
Our proposal illustrated with stress
Constraint weights began at zero, and were updated over 1000 epochs of Gradient Descent, with a learning rate of 0.1
Gradient Descent is similar to the Stochastic OT / HG Gradual Learning Algorithm except that it is run in batch -each epoch is one presentation of all the data (see Pater and Staubs 2013: NECPhon)
Used for convenience – closely approximates results of regular GLA
Our proposal illustrated with stress
Given lexical irregularity, how do we find out what the grammar does?
One increasingly popular answer: Wug test / nonce word judgment.
A wug test in our model means that the lexically specific constraints have no influence
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 13 / 30
Our proposal illustrated with stress
Given lexical irregularity, how do we find out what the grammar does?
One increasingly popular answer: Wug test / nonce word judgment.
A wug test in our model means that the lexically specific constraints have no influence
Our proposal illustrated with stress
Given lexical irregularity, how do we find out what the grammar does?
One increasingly popular answer: Wug test / nonce word judgment.
A wug test in our model means that the lexically specific constraints have no influence
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 13 / 30
Comparison with related approaches
MaxEnt without lexically specific constraints
Hayes and Wilson (2008): only applicable to phonotactics
Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and
Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words
The learned grammars in our simulation always gave near 1 probability correct to individual words
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 15 / 30
Comparison with related approaches
MaxEnt without lexically specific constraints
Hayes and Wilson (2008): only applicable to phonotactics
Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and
Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words
The learned grammars in our simulation always gave near 1 probability correct to individual words
Comparison with related approaches
MaxEnt without lexically specific constraints
Hayes and Wilson (2008): only applicable to phonotactics
Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and
Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape
This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words
The learned grammars in our simulation always gave near 1 probability correct to individual words
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 15 / 30
Comparison with related approaches
MaxEnt without lexically specific constraints
Hayes and Wilson (2008): only applicable to phonotactics
Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and
Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words
The learned grammars in our simulation always gave near 1 probability correct to individual words
Comparison with related approaches
MaxEnt without lexically specific constraints
Hayes and Wilson (2008): only applicable to phonotactics
Ernestus and Baayen (2003) and Hayes, Zuraw, Sipt´ar and
Londe (2009): grammars generate the same degree of variation for all words of a given morpho-phonological shape This works for modeling the nonce word data, but the grammar needs something else to generate the correct pronunciation of individual words
The learned grammars in our simulation always gave near 1 probability correct to individual words
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 15 / 30
Comparison with related approaches
Lexically specific constraints without MaxEnt
Pater (2005): nonce probabilities generated by considering all possible indexations
Becker (2009): nonce probabilities derived from the number of words attached to each indexed constraint
Comparison with related approaches
Lexically specific constraints without MaxEnt
Pater (2005): nonce probabilities generated by considering all possible indexations
Becker (2009): nonce probabilities derived from the number of words attached to each indexed constraint
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 16 / 30
Comparison with related approaches
Lexically specific constraints without MaxEnt
Pater (2005): nonce probabilities generated by considering all possible indexations
Becker (2009): nonce probabilities derived from the number of words attached to each indexed constraint
Comparison with related approaches
Compared with prior work, this proposal is underdeveloped in terms of being applied to real cases of lexical irregularity and accompanied judgments
As a step forward, and as an illustration of how this model can be applied to alternations, we next turn to modeling of Ernestus and Baayen’s (2003) Dutch data
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 17 / 30
Simulating Dutch voicing alternations
The pattern (Ernestus and Baayen, 2003):
Word-internal obstruents contrast in voicing
[vErVEid@n] ’widen-inf’
[vErVEit@n] ’reproach-inf’
Word-final devoicing
[vErVEit] ’widen’
Simulating Dutch voicing alternations
The pattern (Ernestus and Baayen, 2003): Word-internal obstruents contrast in voicing
[vErVEid@n] ’widen-inf’
[vErVEit@n] ’reproach-inf’
Word-final devoicing
[vErVEit] ’widen’
[vErVEit] ’reproach’
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 18 / 30
Simulating Dutch voicing alternations
The pattern (Ernestus and Baayen, 2003): Word-internal obstruents contrast in voicing
[vErVEid@n] ’widen-inf’
[vErVEit@n] ’reproach-inf’
Word-final devoicing
[vErVEit] ’widen’
Simulating Dutch voicing alternations
The pattern (Ernestus and Baayen, 2003):
Given a neutralized novel form, Dutch speakers can ’guess’ a voicing for the word-internal obstruent based on lexical trends
% voiced in lexicon % voiced in production
p/b 9% 4%
t/d 25% 9%
s/z 33% 23%
f/v 70% 49%
x/G 97% 80%
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 19 / 30
Simulating Dutch voicing alternations
The pattern (Ernestus and Baayen, 2003):
Given a neutralized novel form, Dutch speakers can ’guess’ a voicing for the word-internal obstruent based on lexical trends
% voiced in lexicon % voiced in production
p/b 9% 4%
t/d 25% 9%
s/z 33% 23%
f/v 70% 49%
Simulating Dutch voicing alternations
The pattern (Ernestus and Baayen, 2003):
Given a neutralized novel form, Dutch speakers can ’guess’ a voicing for the word-internal obstruent based on lexical trends
% voiced in lexicon % voiced in production
p/b 9% 4%
t/d 25% 9%
s/z 33% 23%
f/v 70% 49%
x/G 97% 80%
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 19 / 30
Simulating Dutch voicing alternations
We used six general constraints (a subset of those used by Ernestus and Baayen pg.20):
*VTV Intervocalic obstruents should be voiced
*VDV Intervocalic obstruents should be voiceless
*P[+voice] Labial stops should be voiceless
*T[+voice] Coronal stops should be voiceless
*S[+voice] Sibilants should be voiceless
*F[-voice] Labial fricatives should be voiced
*X[-voice] Velar fricatives should be voiced
Simulating Dutch voicing alternations
We used six general constraints (a subset of those used by Ernestus and Baayen pg.20):
*VTV Intervocalic obstruents should be voiced
*VDV Intervocalic obstruents should be voiceless
*P[+voice] Labial stops should be voiceless
*T[+voice] Coronal stops should be voiceless
*S[+voice] Sibilants should be voiceless
*F[-voice] Labial fricatives should be voiced
*X[-voice] Velar fricatives should be voiced
n.b. these constraints are shorthand
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 20 / 30
Simulating Dutch voicing alternations
Each lexical item gets a specific version of one of the most general constraints:
/vErVEid/ *VTV-widen ‘widen’ has intervocalic voicing
Simulating Dutch voicing alternations
Data used:
Trend % Voicing Total forms
p/b voiceless 9% 230
t/d voiceless 25% 719
s/z voiceless 33% 451
f/v voiced 70% 166
x/G voiced 97% 131
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 22 / 30
Simulating Dutch voicing alternations
Simulation details:
Gradient Descent, with 10,000 learning epochs (one epoch: one update over the entire input set) Learning rate: 0.01
Regularization: constraint weights based on a Gaussian prior
Simulating Dutch voicing alternations
Results summary:
Real words are correctly voiced/voiceless % voiced on novel words:
lexicon production (EB) Simulated
p/b 9% 4% 1%
t/d 25% 9% 9%
s/z 33% 23% 18%
f/v 70% 49% 84%
x/G 97% 80% 99%
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 24 / 30
Simulating Dutch voicing alternations
Results summary:
Real words are correctly voiced/voiceless % voiced on novel words:
lexicon production (EB) Simulated
p/b 9% 4% 1%
t/d 25% 9% 9%
s/z 33% 23% 18%
f/v 70% 49% 84%
Simulating Dutch voicing alternations
Weights for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)
- High weight on *X[-voice] - Exceptions: high weights
⇒ conflict with *X[-voice] - Trend-followers: low weights
⇒ concord with *X[-voice]
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 25 / 30
Simulating Dutch voicing alternations
Weights for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)
- High weight on *X[-voice]
- Exceptions: high weights ⇒ conflict with *X[-voice] - Trend-followers: low weights
Simulating Dutch voicing alternations
Weights for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)
- High weight on *X[-voice] - Exceptions: high weights
⇒ conflict with *X[-voice] - Trend-followers: low weights
⇒ concord with *X[-voice]
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 25 / 30
Simulating Dutch voicing alternations
Weights for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)
- High weight on *X[-voice] - Exceptions: high weights
⇒ conflict with *X[-voice]
- Trend-followers: low weights ⇒ concord with *X[-voice]
Simulating Dutch voicing alternations
Weights for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)
- High weight on *X[-voice] - Exceptions: high weights
⇒ conflict with *X[-voice] - Trend-followers: low weights
⇒ concord with *X[-voice]
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 25 / 30
Simulating Dutch voicing alternations
Weights for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *X[-voice] *VTVi *VDVj (exceptions)
- High weight on *X[-voice] - Exceptions: high weights
⇒ conflict with *X[-voice] - Trend-followers: low weights
Simulating Dutch voicing alternations
Probabilities for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug
- Novel words: voiced
- Trend-followers: correctly voiced - Exceptions: errors towards voicing - Analogous to Polish stress: exceptions exist, but are prone to regularization
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 26 / 30
Simulating Dutch voicing alternations
Probabilities for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug
- Novel words: voiced
- Trend-followers: correctly voiced - Exceptions: errors towards voicing - Analogous to Polish stress: exceptions exist, but are prone to regularization
Simulating Dutch voicing alternations
Probabilities for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug
- Novel words: voiced
- Trend-followers: correctly voiced
- Exceptions: errors towards voicing - Analogous to Polish stress: exceptions exist, but are prone to regularization
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 26 / 30
Simulating Dutch voicing alternations
Probabilities for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug
- Novel words: voiced
- Trend-followers: correctly voiced - Exceptions: errors towards voicing
- Analogous to Polish stress: exceptions exist, but are prone to regularization
Simulating Dutch voicing alternations
Probabilities for forms with x/G: 97% voiced (131 total)
0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (80%) Lexicon (97%) voiced in lexicon voiceless in lexicon wug
- Novel words: voiced
- Trend-followers: correctly voiced - Exceptions: errors towards voicing - Analogous to Polish stress: exceptions exist, but are prone to regularization
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 26 / 30
Simulating Dutch voicing alternations
Weights for forms with s/z: 33% voiced (451 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)
- Low-ish weight on *S[+voice] - Exceptions: high weights - Trend-followers: lower weights
Simulating Dutch voicing alternations
Weights for forms with s/z: 33% voiced (451 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)
- Low-ish weight on *S[+voice]
- Exceptions: high weights - Trend-followers: lower weights
⇒ but higher than *S[+voice]
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 27 / 30
Simulating Dutch voicing alternations
Weights for forms with s/z: 33% voiced (451 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)
- Low-ish weight on *S[+voice] - Exceptions: high weights
- Trend-followers: lower weights ⇒ but higher than *S[+voice]
Simulating Dutch voicing alternations
Weights for forms with s/z: 33% voiced (451 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)
- Low-ish weight on *S[+voice] - Exceptions: high weights - Trend-followers: lower weights
⇒ but higher than *S[+voice]
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 27 / 30
Simulating Dutch voicing alternations
Weights for forms with s/z: 33% voiced (451 total)
0 2000 4000 6000 8000 10000 -5 0 5 Epochs Weight *VTV *S[+voice] *VDVi *VTVj (exceptions)
- Low-ish weight on *S[+voice] - Exceptions: high weights - Trend-followers: lower weights
Simulating Dutch voicing alternations
Probabilities for forms with s/z: 33% voiced (451 total)
0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (23%) Lexicon (33%) voiced in lexicon voiceless in lexicon wug
- Novel words: 20% voiced - Existing words: correct
- Analogous to English stress: each word’s pronounciation is stable
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 28 / 30
Simulating Dutch voicing alternations
Probabilities for forms with s/z: 33% voiced (451 total)
0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (23%) Lexicon (33%) voiced in lexicon voiceless in lexicon wug
- Novel words: 20% voiced
- Existing words: correct
- Analogous to English stress: each word’s pronounciation is stable
Simulating Dutch voicing alternations
Probabilities for forms with s/z: 33% voiced (451 total)
0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (23%) Lexicon (33%) voiced in lexicon voiceless in lexicon wug
- Novel words: 20% voiced - Existing words: correct
- Analogous to English stress: each word’s pronounciation is stable
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 28 / 30
Simulating Dutch voicing alternations
Probabilities for forms with s/z: 33% voiced (451 total)
0 2000 4000 6000 8000 10000 0.0 0.2 0.4 0.6 0.8 1.0 Epochs P(vo ice d) EB wug-test (23%) Lexicon (33%) voiced in lexicon voiceless in lexicon wug
- Novel words: 20% voiced - Existing words: correct
- Analogous to English stress: each word’s pronounciation is stable
Summary
Using lexically specific constraints together with more general constraints like *P[+voice], we can model:
1 Probabilistic lexical trends which are reflected in speakers’
treatment of novel words
2 Categorical behavior of extant words in a language’s lexicon
3 Difference between generalizations with few vs. many
exceptions
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 29 / 30
Summary
Using lexically specific constraints together with more general constraints like *P[+voice], we can model:
1 Probabilistic lexical trends which are reflected in speakers’
treatment of novel words
2 Categorical behavior of extant words in a language’s lexicon
3 Difference between generalizations with few vs. many
Summary
Using lexically specific constraints together with more general constraints like *P[+voice], we can model:
1 Probabilistic lexical trends which are reflected in speakers’
treatment of novel words
2 Categorical behavior of extant words in a language’s lexicon
3 Difference between generalizations with few vs. many
exceptions
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 29 / 30
Summary
Using lexically specific constraints together with more general constraints like *P[+voice], we can model:
1 Probabilistic lexical trends which are reflected in speakers’
treatment of novel words
2 Categorical behavior of extant words in a language’s lexicon
3 Difference between generalizations with few vs. many
Thank you!
Special thanks are due to Robert Staubs, Sharon Peperkamp, and Emmanuel Dupoux for useful discussion and comments, and to Robert for generously lending his gradient descent script. This material is based upon work supported by the National Science Foundation under Grant No. 1451512 to the first author, Grant BCS-424077 to the University of Massachusetts Amherst, and by the city of Paris through a Research in Paris fellowship to the first author.
Claire Moore-Cantwell, Joe Pater UMass Amherst OCP Barcelona Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints 30 / 30