Easy probabilities Different frameworks Problem #1 Problem #2 Solution: Faithfulness to MG operations
The smarter parametrization
Solution: make a rule's probability a function of (only) "what it does":
  - whether it is merge or move
  - which feature is being checked (a movement feature or a selection feature)
    MCFG rule                                                    φ_merge  φ_d  φ_v  φ_t  φ_move  φ_wh
r1  st :: ⟨c⟩_0 → s :: ⟨=t c⟩_1  t :: ⟨t⟩_0                         1      0    0    1     0      0
r2  ts :: ⟨c⟩_0 → ⟨s, t⟩ :: ⟨+wh c, -wh⟩_0                          0      0    0    0     1      1
r3  st :: ⟨v⟩_0 → s :: ⟨=d v⟩_1  t :: ⟨d⟩_1                         1      1    0    0     0      0
r4  st :: ⟨v⟩_0 → s :: ⟨=v v⟩_1  t :: ⟨v⟩_0                         1      0    1    0     0      0
r5  ⟨s, t⟩ :: ⟨v, -wh⟩_0 → s :: ⟨=d v⟩_1  t :: ⟨d -wh⟩_1            1      1    0    0     0      0
r6  ⟨st, u⟩ :: ⟨v, -wh⟩_0 → s :: ⟨=v v⟩_1  ⟨t, u⟩ :: ⟨v, -wh⟩_0     1      0    1    0     0      0

Each rule r is assigned a score as a function of the vector φ(r):

  s(r) = exp(λ · φ(r))
       = exp(λ_merge φ_merge(r) + λ_d φ_d(r) + λ_v φ_v(r) + ...)

  s(r1) = exp(λ_merge + λ_t)
  s(r2) = exp(λ_move + λ_wh)
  s(r3) = exp(λ_merge + λ_d)
  s(r5) = exp(λ_merge + λ_d)
(Hunter and Dyer 2013)
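The score definition can be checked numerically with a minimal Python sketch. The feature vectors are the six rows of the table, labelled r1 to r6 in row order. The weights in `LAMBDA` are hypothetical values chosen only for illustration, and the names `FEATURES`, `PHI`, `LAMBDA` and `score` are not from the slides.

```python
import math

# Feature order follows the table columns.
FEATURES = ("merge", "d", "v", "t", "move", "wh")

# Feature vectors phi(r), one per table row.
PHI = {
    "r1": (1, 0, 0, 1, 0, 0),
    "r2": (0, 0, 0, 0, 1, 1),
    "r3": (1, 1, 0, 0, 0, 0),
    "r4": (1, 0, 1, 0, 0, 0),
    "r5": (1, 1, 0, 0, 0, 0),
    "r6": (1, 0, 1, 0, 0, 0),
}

# Hypothetical weights (assumption): real values would come from training.
LAMBDA = {"merge": 0.5, "d": 0.2, "v": -0.3, "t": 0.1, "move": -1.0, "wh": 0.4}


def score(rule: str) -> float:
    """Return s(r) = exp(lambda . phi(r)) for the named rule."""
    dot = sum(LAMBDA[name] * value for name, value in zip(FEATURES, PHI[rule]))
    return math.exp(dot)
```

Calling `score("r3")` and `score("r5")` gives the same number, because both rows have the feature vector (1, 1, 0, 0, 0, 0). This matches the two identical expressions exp(λ_merge + λ_d) on the slide.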
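Generalizations missed by the naive parametrization

[Figure: derivation tree in which often :: =v v merges with a phrase of type v built from praise and marie]
st :: ⟨v⟩_0 → s :: ⟨=v v⟩_1  t :: ⟨v⟩_0

[Figure: derivation tree in which often :: =v v merges with a phrase of type v, -wh built from praise and who :: -wh]
⟨st, u⟩ :: ⟨v, -wh⟩_0 → s :: ⟨=v v⟩_1  ⟨t, u⟩ :: ⟨v, -wh⟩_0

Both steps are the same MG operation (merge, checking =v), yet they correspond to two different MCFG rules. A parametrization with one independent parameter per rule cannot express that they are related.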
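The missed generalization can be made concrete by grouping rules by their feature vector, using the φ values from the feature table. The sketch below is illustrative only: the rule labels (`merge_v`, `merge_v_wh`, and so on) and the helper `tied_groups` are hypothetical names, not part of the original presentation.

```python
from collections import defaultdict

# Feature vectors in the order (merge, d, v, t, move, wh).
# The "_wh" suffix marks the variant whose category carries a pending -wh mover.
PHI = {
    "merge_t":    (1, 0, 0, 1, 0, 0),
    "move_wh":    (0, 0, 0, 0, 1, 1),
    "merge_d":    (1, 1, 0, 0, 0, 0),
    "merge_v":    (1, 0, 1, 0, 0, 0),
    "merge_d_wh": (1, 1, 0, 0, 0, 0),
    "merge_v_wh": (1, 0, 1, 0, 0, 0),
}


def tied_groups(phi_by_rule):
    """Return the lists of rules that share one feature vector."""
    groups = defaultdict(list)
    for rule, phi in phi_by_rule.items():
        groups[phi].append(rule)
    return [rules for rules in groups.values() if len(rules) > 1]
```

Here `tied_groups(PHI)` pairs `merge_v` with `merge_v_wh` (and `merge_d` with `merge_d_wh`). A feature-based score treats each pair identically, while a per-rule parametrization would have to learn each member separately.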
Comparison
The old way:

  λ_1  ts :: ⟨c⟩_0 → ⟨s, t⟩ :: ⟨+wh c, -wh⟩_0
  λ_2  st :: ⟨c⟩_0 → s :: ⟨=t c⟩_1  t :: ⟨t⟩_0
  λ_3  st :: ⟨v⟩_0 → s :: ⟨=d v⟩_1  t :: ⟨d⟩_1
  λ_4  st :: ⟨v⟩_0 → s :: ⟨=v v⟩_1  t :: ⟨v⟩_0
  λ_5  ⟨s, t⟩ :: ⟨v, -wh⟩_0 → s :: ⟨=d v⟩_1  t :: ⟨d -wh⟩_1
  λ_6  ⟨st, u⟩ :: ⟨v, -wh⟩_0 → s :: ⟨=v v⟩_1  ⟨t, u⟩ :: ⟨v, -wh⟩_0

Training question: What values of λ_1, λ_2, etc. make the training corpus most likely?

The new way:

  exp(λ_move + λ_wh)   ts :: ⟨c⟩_0 → ⟨s, t⟩ :: ⟨+wh c, -wh⟩_0
  exp(λ_merge + λ_t)   st :: ⟨c⟩_0 → s :: ⟨=t c⟩_1  t :: ⟨t⟩_0
  exp(λ_merge + λ_d)   st :: ⟨v⟩_0 → s :: ⟨=d v⟩_1  t :: ⟨d⟩_1
  exp(λ_merge + λ_v)   st :: ⟨v⟩_0 → s :: ⟨=v v⟩_1  t :: ⟨v⟩_0
  exp(λ_merge + λ_d)   ⟨s, t⟩ :: ⟨v, -wh⟩_0 → s :: ⟨=d v⟩_1  t :: ⟨d -wh⟩_1
  exp(λ_merge + λ_v)   ⟨st, u⟩ :: ⟨v, -wh⟩_0 → s :: ⟨=v v⟩_1  ⟨t, u⟩ :: ⟨v, -wh⟩_0

Training question: What values of λ_merge, λ_move, λ_d, etc. make the training corpus most likely?
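The two training questions can be compared with a rough Python sketch. It rests on two assumptions that this slide does not state: scores become probabilities by normalizing over rules that share a left-hand side, and in the old way each per-rule weight is exponentiated before normalizing. The names `RULES`, `old_way`, `new_way` and `log_likelihood` are hypothetical.

```python
import math

# (left-hand-side category, features checked) for the six rules, in slide order.
RULES = [
    ("c",      ("move", "wh")),
    ("c",      ("merge", "t")),
    ("v",      ("merge", "d")),
    ("v",      ("merge", "v")),
    ("v, -wh", ("merge", "d")),
    ("v, -wh", ("merge", "v")),
]


def normalize(scores):
    """Assumption: divide each score by the total for its left-hand side."""
    totals = {}
    for (lhs, _), s in zip(RULES, scores):
        totals[lhs] = totals.get(lhs, 0.0) + s
    return [s / totals[lhs] for (lhs, _), s in zip(RULES, scores)]


def old_way(lam):
    """One independent weight per rule (a list of six numbers)."""
    return normalize([math.exp(w) for w in lam])


def new_way(lam):
    """One weight per operation or feature (a dict), shared across rules."""
    return normalize([math.exp(sum(lam[f] for f in feats)) for _, feats in RULES])


def log_likelihood(probs, counts):
    """Log-likelihood of a corpus summarized as one usage count per rule."""
    return sum(c * math.log(p) for p, c in zip(probs, counts) if c > 0)
```

Under these assumptions, training means searching for the `lam` that maximizes `log_likelihood(old_way(lam), counts)` or `log_likelihood(new_way(lam), counts)`. In the new way, the =d and =v merge rules receive the same probabilities whether or not a -wh mover is pending, because they share λ_merge, λ_d and λ_v.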