Structure and Predictions

(1)

Structure and Predictions

Machine Learning: Jordan Boyd-Graber

University of Maryland

PERCEPTRON: SLIDES ADAPTED FROM LIANG HUANG

Machine Learning: Jordan Boyd-Graber | ^UMD Structure and Predictions | ^{1 / 1}

(2)

How do we set the feature weights?

Goal is to minimize errors

Want to reward features that lead to right answers

Penalize features that lead to wrong answers

Problem: predictions are correlated

(3)

Perceptron Algorithm

Rather than just counting up how often we see events?

We’ll use this for intuition in 2D case

(4)

Perceptron Algorithm

1: ^w ~ 1 ← ~ ⁰

2: for t ← ¹ . . . T do

3: Receive x _t

4: ˆ y _t ← ^sgn ( w _~ _t · ~ ^x t )

5: Receive y _t

6: if ˆ y _t 6 = y _t then

7: ^w ~ t + 1 ← ~ ^w t + y _t ~ ^x t

8: else

9: ^w ~ t + 1 ← ^w t

return w _T ₊ ₁

(5)

Binary to Structure

(6)

Binary to Structure

(7)

Binary to Structure

(8)

Generic Perceptron

perceptron is the simplest machine learning algorithm

online-learning: one example at a time

learning by doing

find the best output under the current weights

update weights at mistakes

(9)

2D Example

Initially, weight vector is zero:

~

w ₁ = _〈 0 , 0 〉 ⁽¹⁾

(10)

Observation 1

x ₁ = _〈− 2 , 2 〉 ⁽²⁾ ˆ

y ₁ = 0 (3)

y ₁ = + 1 (4)

(11)

Update 1

~

w _t ₊ ₁ ← ~ ^w t + y _t ^x ~ t (5)

~

w ₂ ← ⁽⁶⁾

(12)

Update 1

~

w _t ₊ ₁ ← ~ ^w t + y _t ^x ~ t (5)

~

w ₂ ← 〈 ⁰ , 0 〉 + _〈− 2 , 2 〉 ⁽⁶⁾ (7)

(13)

Update 1

~

w _t ₊ ₁ ← ~ ^w t + y _t ^x ~ t (5)

~

w ₂ ← 〈 ⁰ , 0 〉 + _〈− 2 , 2 〉 ⁽⁶⁾

~

w ₂ = _〈− 2 , 2 〉 ⁽⁷⁾

(14)

Observation 2

(15)

Observation 2

x ₂ = _〈− 2 , − ³ 〉 ⁽⁸⁾ ˆ

y ₂ = + 4 + ₋ 6 = ₋ 2 (9)

y ₂ = ₋ 1 (10)

(16)

Update 2

~

w _t ₊ ₁ ← ~ ^w t (11)

~

w ₂ ← ⁽¹²⁾

(17)

Update 2

~

w _t ₊ ₁ ← ~ ^w t (11)

~

w ₂ ← 〈− ² , 2 〉 ⁽¹²⁾

(13)

(18)

Update 2

~

w _t ₊ ₁ ← ~ ^w t (11)

~

w ₂ ← 〈− ² , 2 〉 ⁽¹²⁾

~

w ₂ = _〈− 2 , 2 〉 ⁽¹³⁾

(19)

Observation 3

(20)

Observation 3

x ₃ = _〈 2 , − ¹ 〉 ⁽¹⁴⁾ ˆ

y ₃ = ₋ 4 + ₋ 2 = ₋ 6 (15)

y ₃ = + 1 (16)

(21)

Update 3

~

w _t ₊ ₁ ← ~ ^w t + y _t ^x ~ t (17)

~

w ₃ ← ⁽¹⁸⁾

(22)

Update 3

~

w _t ₊ ₁ ← ~ ^w t + y _t ~ ^x t (17)

~

w ₃ ← 〈− ² , 2 〉 + _〈 2 , − ¹ 〉 ⁽¹⁸⁾ (19)

(23)

Update 3

~

w _t ₊ ₁ ← ~ ^w t + y _t ~ ^x t (17)

~

w ₃ ← 〈− ² , 2 〉 + _〈 2 , − ¹ 〉 ⁽¹⁸⁾

~

w ₃ = _〈 0 , 1 〉 ⁽¹⁹⁾

(24)

Observation 4

(25)

Observation 4

x ₄ = _〈 1 , − ⁴ 〉 ⁽²⁰⁾ ˆ

y ₄ = ₋ 4 (21) y ₄ = ₋ 1 (22)

(26)

Update 4

~

w ₄ ← ⁽²³⁾

(27)

Update 4

~

w ₄ ← ~ ^w 3 (23)

(24)

(28)

Update 4

~

w ₄ ← ~ ^w 3 (23)

~

w ₄ = _〈 0 , 1 〉 ⁽²⁴⁾

(29)

Observation 5

x ₅ = _〈 2 , 2 〉 ⁽²⁵⁾ ˆ

y ₅ = 2 (26) y ₅ = + 1 (27)

(30)

Update 5

~

w ₅ ← ⁽²⁸⁾

(31)

Update 5

~

w ₅ ← ~ ^w 4 (28)

(29)

(32)

Update 5

~

w ₅ ← ~ ^w 4 (28)

~

w ₅ = _〈 0 , 1 〉 ⁽²⁹⁾

(33)

Observation 6

x ₆ = _〈 2 , 2 〉 ⁽³⁰⁾ ˆ

y ₆ = 2 (31) y ₆ = + 1 (32)

(34)

Update 6

~

w ₆ ← ⁽³³⁾

(35)

Update 6

~

w ₆ ← ~ ^w 5 (33)

(34)

(36)

Update 6

~

w ₆ ← ~ ^w 5 (33)

~

w ₆ = _〈 0 , 1 〉 ⁽³⁴⁾

(37)

Structured Perceptron

(38)

Perceptron Algorithm

(39)

Structure and Predictions

Structure and Predictions

Machine Learning: Jordan Boyd-Graber

University of Maryland

How do we set the feature weights?

Goal is to minimize errors

Want to reward features that lead to right answers

Penalize features that lead to wrong answers

Problem: predictions are correlated

Perceptron Algorithm

Rather than just counting up how often we see events?

We’ll use this for intuition in 2D case

Perceptron Algorithm

1: w ~ 1 ← ~ 0

2: for t ← 1 . . . T do

3: Receive x t

4: ˆ y t ← sgn ( w ~ t · ~ x t )

5: Receive y t

6: if ˆ y t 6 = y t then

7: w ~ t + 1 ← ~ w t + y t ~ x t

8: else

9: w ~ t + 1 ← w t

return w T + 1

Binary to Structure

Binary to Structure

Binary to Structure

Generic Perceptron

perceptron is the simplest machine learning algorithm

online-learning: one example at a time

learning by doing

find the best output under the current weights

update weights at mistakes

2D Example

Initially, weight vector is zero:

~

w 1 = 〈 0 , 0 〉 (1)

Observation 1

x 1 = 〈− 2 , 2 〉 (2) ˆ

y 1 = 0 (3)

y 1 = + 1 (4)

Update 1

~

w t + 1 ← ~ w t + y t x ~ t (5)

~

w 2 ← (6)

Update 1

~

w t + 1 ← ~ w t + y t x ~ t (5)

~

w 2 ← 〈 0 , 0 〉 + 〈− 2 , 2 〉 (6) (7)

Update 1

~

w t + 1 ← ~ w t + y t x ~ t (5)

~

w 2 ← 〈 0 , 0 〉 + 〈− 2 , 2 〉 (6)

~

w 2 = 〈− 2 , 2 〉 (7)

Observation 2

Observation 2

x 2 = 〈− 2 , − 3 〉 (8) ˆ

y 2 = + 4 + − 6 = − 2 (9)

y 2 = − 1 (10)

Update 2

~

w t + 1 ← ~ w t (11)

~

w 2 ← (12)

Update 2

~

w t + 1 ← ~ w t (11)

~

w 2 ← 〈− 2 , 2 〉 (12)

(13)

Update 2

~

w t + 1 ← ~ w t (11)

~

w 2 ← 〈− 2 , 2 〉 (12)

~

w 2 = 〈− 2 , 2 〉 (13)

1: ^w ~ 1 ← ~ ⁰

2: for t ← ¹ . . . T do

3: Receive x _t

4: ˆ y _t ← ^sgn ( w _~ _t · ~ ^x t )

5: Receive y _t

6: if ˆ y _t 6 = y _t then

7: ^w ~ t + 1 ← ~ ^w t + y _t ~ ^x t

9: ^w ~ t + 1 ← ^w t

return w _T ₊ ₁

w ₁ = _〈 0 , 0 〉 ⁽¹⁾

x ₁ = _〈− 2 , 2 〉 ⁽²⁾ ˆ

y ₁ = 0 (3)

y ₁ = + 1 (4)

w _t ₊ ₁ ← ~ ^w t + y _t ^x ~ t (5)

w ₂ ← ⁽⁶⁾

w _t ₊ ₁ ← ~ ^w t + y _t ^x ~ t (5)

w ₂ ← 〈 ⁰ , 0 〉 + _〈− 2 , 2 〉 ⁽⁶⁾ (7)

w _t ₊ ₁ ← ~ ^w t + y _t ^x ~ t (5)

w ₂ ← 〈 ⁰ , 0 〉 + _〈− 2 , 2 〉 ⁽⁶⁾

w ₂ = _〈− 2 , 2 〉 ⁽⁷⁾

x ₂ = _〈− 2 , − ³ 〉 ⁽⁸⁾ ˆ

y ₂ = + 4 + ₋ 6 = ₋ 2 (9)

y ₂ = ₋ 1 (10)

w _t ₊ ₁ ← ~ ^w t (11)

w ₂ ← ⁽¹²⁾

w _t ₊ ₁ ← ~ ^w t (11)

w ₂ ← 〈− ² , 2 〉 ⁽¹²⁾

w _t ₊ ₁ ← ~ ^w t (11)

w ₂ ← 〈− ² , 2 〉 ⁽¹²⁾

w ₂ = _〈− 2 , 2 〉 ⁽¹³⁾

x ₃ = _〈 2 , − ¹ 〉 ⁽¹⁴⁾ ˆ

y ₃ = ₋ 4 + ₋ 2 = ₋ 6 (15)

y ₃ = + 1 (16)

w _t ₊ ₁ ← ~ ^w t + y _t ^x ~ t (17)

w ₃ ← ⁽¹⁸⁾

w _t ₊ ₁ ← ~ ^w t + y _t ~ ^x t (17)

w ₃ ← 〈− ² , 2 〉 + _〈 2 , − ¹ 〉 ⁽¹⁸⁾ (19)

w _t ₊ ₁ ← ~ ^w t + y _t ~ ^x t (17)

w ₃ ← 〈− ² , 2 〉 + _〈 2 , − ¹ 〉 ⁽¹⁸⁾

w ₃ = _〈 0 , 1 〉 ⁽¹⁹⁾

x ₄ = _〈 1 , − ⁴ 〉 ⁽²⁰⁾ ˆ

y ₄ = ₋ 4 (21) y ₄ = ₋ 1 (22)

w ₄ ← ⁽²³⁾

w ₄ ← ~ ^w 3 (23)

w ₄ ← ~ ^w 3 (23)

w ₄ = _〈 0 , 1 〉 ⁽²⁴⁾

x ₅ = _〈 2 , 2 〉 ⁽²⁵⁾ ˆ

y ₅ = 2 (26) y ₅ = + 1 (27)

w ₅ ← ⁽²⁸⁾

w ₅ ← ~ ^w 4 (28)

w ₅ ← ~ ^w 4 (28)

w ₅ = _〈 0 , 1 〉 ⁽²⁹⁾

x ₆ = _〈 2 , 2 〉 ⁽³⁰⁾ ˆ

y ₆ = 2 (31) y ₆ = + 1 (32)

w ₆ ← ⁽³³⁾

w ₆ ← ~ ^w 5 (33)

w ₆ ← ~ ^w 5 (33)

w ₆ = _〈 0 , 1 〉 ⁽³⁴⁾