what feature is being checked (either movement or selection)

| Rule | MCFG rule | φ_merge | φ_d | φ_v | φ_t | φ_move | φ_wh |
|---|---|---|---|---|---|---|---|
| r1 | st :: ⟨c⟩₀ → s :: ⟨=t c⟩₁  t :: ⟨t⟩₀ | 1 | 0 | 0 | 1 | 0 | 0 |
| r2 | ts :: ⟨c⟩₀ → ⟨s, t⟩ :: ⟨+wh c, -wh⟩₀ | 0 | 0 | 0 | 0 | 1 | 1 |
| r3 | st :: ⟨v⟩₀ → s :: ⟨=d v⟩₁  t :: ⟨d⟩₁ | 1 | 1 | 0 | 0 | 0 | 0 |
| r4 | st :: ⟨v⟩₀ → s :: ⟨=v v⟩₁  t :: ⟨v⟩₀ | 1 | 0 | 1 | 0 | 0 | 0 |
| r5 | ⟨s, t⟩ :: ⟨v, -wh⟩₀ → s :: ⟨=d v⟩₁  t :: ⟨d -wh⟩₁ | 1 | 1 | 0 | 0 | 0 | 0 |
| r6 | ⟨st, u⟩ :: ⟨v, -wh⟩₀ → s :: ⟨=v v⟩₁  ⟨t, u⟩ :: ⟨v, -wh⟩₀ | 1 | 0 | 1 | 0 | 0 | 0 |

Each rule r is assigned a score as a function of the vector φ(r):

s(r) = exp(λ · φ(r))
     = exp(λ_merge φ_merge(r) + λ_d φ_d(r) + λ_v φ_v(r) + …)

s(r1) = exp(λ_merge + λ_t)
s(r2) = exp(λ_move + λ_wh)
s(r3) = exp(λ_merge + λ_d)
s(r5) = exp(λ_merge + λ_d)

(Hunter and Dyer 2013)
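The score definition can be tried out with a few lines of Python. This is only a sketch: the feature vectors are copied from the table, the labels r1 to r6 number the table rows in order (an assumption that agrees with the scores given for r1, r2, r3 and r5), and the weight values in `LAMBDA` are made up purely for illustration.

```python
import math

# Feature order follows the table header: merge, d, v, t, move, wh.
FEATURES = ["merge", "d", "v", "t", "move", "wh"]

# Feature vectors phi(r), one per table row (row labels r1..r6 are assumed).
PHI = {
    "r1": [1, 0, 0, 1, 0, 0],
    "r2": [0, 0, 0, 0, 1, 1],
    "r3": [1, 1, 0, 0, 0, 0],
    "r4": [1, 0, 1, 0, 0, 0],
    "r5": [1, 1, 0, 0, 0, 0],
    "r6": [1, 0, 1, 0, 0, 0],
}

# Hypothetical weights; in practice they would be learned from a corpus.
LAMBDA = {"merge": 0.5, "d": -0.2, "v": 0.1, "t": 0.3, "move": -1.0, "wh": 0.4}


def score(rule):
    """Return s(r) = exp(lambda . phi(r)) for the named rule."""
    dot = sum(LAMBDA[f] * x for f, x in zip(FEATURES, PHI[rule]))
    return math.exp(dot)


for rule in PHI:
    print(rule, round(score(rule), 4))
```

Because r3 and r5 have the same feature vector, they receive the same score whatever values the weights take.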

Easy probabilities Different frameworks Problem #1 Problem #2 Solution: Faithfulness to MG operations

The smarter parametrization

Solution: Have a rule’s probability be a function of (only) “what it does”

merge or move

what feature is being checked (either movement or selection)


Generalizations missed by the naive parametrization

[Tree diagram: often :: =v v merges with a phrase of type v built from praise and marie, yielding a phrase of type v.]

st :: ⟨v⟩₀ → s :: ⟨=v v⟩₁  t :: ⟨v⟩₀

[Tree diagram: often :: =v v merges with a phrase of type v, -wh built from praise and who :: -wh, yielding a phrase of type v, -wh.]

⟨st, u⟩ :: ⟨v, -wh⟩₀ → s :: ⟨=v v⟩₁  ⟨t, u⟩ :: ⟨v, -wh⟩₀
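Both rules here merge a =v selector with a v phrase; they differ only in whether a -wh mover is carried along. One way to see which MCFG rules "do the same thing" is to group them by feature vector. The sketch below assumes the feature vectors from the rule table, with rows numbered r1 to r6 in order (the numbering is an assumption, not something the slides state for every row).

```python
from collections import defaultdict

# Feature vectors in the order (merge, d, v, t, move, wh); row labels are assumed.
PHI = {
    "r1": (1, 0, 0, 1, 0, 0),
    "r2": (0, 0, 0, 0, 1, 1),
    "r3": (1, 1, 0, 0, 0, 0),
    "r4": (1, 0, 1, 0, 0, 0),
    "r5": (1, 1, 0, 0, 0, 0),
    "r6": (1, 0, 1, 0, 0, 0),
}

# Collect the rules that share a feature vector.
groups = defaultdict(list)
for rule, phi in PHI.items():
    groups[phi].append(rule)

# Rules in the same group get identical scores under exp(lambda . phi).
tied = [rules for rules in groups.values() if len(rules) > 1]
print(tied)
```

A parametrization with one independent weight per MCFG rule has no way to express that the rules in each group perform the same MG operation.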


Comparison

The old way:

| Weight | MCFG rule |
|---|---|
| λ_1 | ts :: ⟨c⟩₀ → ⟨s, t⟩ :: ⟨+wh c, -wh⟩₀ |
| λ_2 | st :: ⟨c⟩₀ → s :: ⟨=t c⟩₁  t :: ⟨t⟩₀ |
| λ_3 | st :: ⟨v⟩₀ → s :: ⟨=d v⟩₁  t :: ⟨d⟩₁ |
| λ_4 | st :: ⟨v⟩₀ → s :: ⟨=v v⟩₁  t :: ⟨v⟩₀ |
| λ_5 | ⟨s, t⟩ :: ⟨v, -wh⟩₀ → s :: ⟨=d v⟩₁  t :: ⟨d -wh⟩₁ |
| λ_6 | ⟨st, u⟩ :: ⟨v, -wh⟩₀ → s :: ⟨=v v⟩₁  ⟨t, u⟩ :: ⟨v, -wh⟩₀ |

Training question: What values of λ_1, λ_2, etc. make the training corpus most likely?

The new way:

| Score | MCFG rule |
|---|---|
| exp(λ_move + λ_wh) | ts :: ⟨c⟩₀ → ⟨s, t⟩ :: ⟨+wh c, -wh⟩₀ |
| exp(λ_merge + λ_t) | st :: ⟨c⟩₀ → s :: ⟨=t c⟩₁  t :: ⟨t⟩₀ |
| exp(λ_merge + λ_d) | st :: ⟨v⟩₀ → s :: ⟨=d v⟩₁  t :: ⟨d⟩₁ |
| exp(λ_merge + λ_v) | st :: ⟨v⟩₀ → s :: ⟨=v v⟩₁  t :: ⟨v⟩₀ |
| exp(λ_merge + λ_d) | ⟨s, t⟩ :: ⟨v, -wh⟩₀ → s :: ⟨=d v⟩₁  t :: ⟨d -wh⟩₁ |
| exp(λ_merge + λ_v) | ⟨st, u⟩ :: ⟨v, -wh⟩₀ → s :: ⟨=v v⟩₁  ⟨t, u⟩ :: ⟨v, -wh⟩₀ |

Training question: What values of λ_merge, λ_move, λ_d, etc. make the training corpus most likely?
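The following sketch illustrates what the new way buys. The slides here do not say how scores are turned into probabilities; the code assumes the usual log-linear choice of normalizing each score over the rules that share a left-hand side. The labels r1 to r6, the ASCII category names, and the weight values are all hypothetical.

```python
import math
from collections import defaultdict

# For each rule: (left-hand-side category, features that fire).
# Labels and ASCII category spellings are this sketch's own notation.
RULES = {
    "r1": ("<c>0", ("merge", "t")),
    "r2": ("<c>0", ("move", "wh")),
    "r3": ("<v>0", ("merge", "d")),
    "r4": ("<v>0", ("merge", "v")),
    "r5": ("<v,-wh>0", ("merge", "d")),
    "r6": ("<v,-wh>0", ("merge", "v")),
}


def rule_probabilities(lam):
    """Normalize exp(sum of active weights) within each left-hand side (assumed scheme)."""
    scores = {r: math.exp(sum(lam[f] for f in feats))
              for r, (_, feats) in RULES.items()}
    totals = defaultdict(float)
    for r, (lhs, _) in RULES.items():
        totals[lhs] += scores[r]
    return {r: scores[r] / totals[RULES[r][0]] for r in RULES}


# Made-up weights, for illustration only.
lam = {"merge": 0.5, "move": -1.0, "d": -0.2, "v": 0.1, "t": 0.3, "wh": 0.4}
probs = rule_probabilities(lam)
for r, p in probs.items():
    print(r, round(p, 3))
```

Under this assumed normalization, the rule that merges a =v selector gets the same probability whether or not a -wh mover is present, because both versions compete against the same alternative with the same weights. With six independent weights λ_1 to λ_6, nothing ties those two probabilities together.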


Solution #1 with the smarter parametrization

Grammar

In document Sharpening the empirical claims of generative syntax through formalization (Page 60-69)