• No results found

Lecture 10:

N/A
N/A
Protected

Academic year: 2021

Share "Lecture 10:"

Copied!
82
0
0

Loading.... (view fulltext now)

Full text

(1)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 1 8 Feb 2016

Lecture 10:

Recurrent Neural Networks

(2)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 2 8 Feb 2016

Administrative

- Midterm this Wednesday! woohoo!

- A3 will be out ~Wednesday

(3)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 3 8 Feb 2016

(4)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 4 8 Feb 2016

http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html

(5)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 5 8 Feb 2016

http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html

(6)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 6 8 Feb 2016

Recurrent Networks offer a lot of flexibility:

Vanilla Neural Networks

(7)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 7 8 Feb 2016

Recurrent Networks offer a lot of flexibility:

e.g. Image Captioning

image -> sequence of words

(8)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 8 8 Feb 2016

Recurrent Networks offer a lot of flexibility:

e.g. Sentiment Classification sequence of words -> sentiment

(9)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 9 8 Feb 2016

Recurrent Networks offer a lot of flexibility:

e.g. Machine Translation seq of words -> seq of words

(10)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 10 8 Feb 2016

Recurrent Networks offer a lot of flexibility:

e.g. Video classification on frame level

(11)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 11 8 Feb 2016

Multiple Object Recognition with Visual Attention, Ba et al.

Sequential Processing of fixed

inputs

(12)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 12 8 Feb 2016

DRAW: A Recurrent Neural Network For Image Generation, Gregor et al.

Sequential Processing of fixed

outputs

(13)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 13 8 Feb 2016

Recurrent Neural Network

x RNN

(14)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 14 8 Feb 2016

Recurrent Neural Network

x RNN

y

usually want to

predict a vector at

some time steps

(15)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 15 8 Feb 2016

Recurrent Neural Network

x RNN

y We can process a sequence of vectors x by

applying a recurrence formula at every time step:

new state old state input vector at some time step some function

with parameters W

(16)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 16 8 Feb 2016

Recurrent Neural Network

x RNN

y We can process a sequence of vectors x by

applying a recurrence formula at every time step:

Notice: the same function and the same set

of parameters are used at every time step.

(17)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 17 8 Feb 2016

(Vanilla) Recurrent Neural Network

x RNN

y

The state consists of a single “hidden” vector h:

(18)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 18 8 Feb 2016

Character-level language model example

Vocabulary:

[h,e,l,o]

Example training sequence:

“hello”

x RNN

y

(19)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 19 8 Feb 2016

Character-level language model example

Vocabulary:

[h,e,l,o]

Example training sequence:

“hello”

(20)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 20 8 Feb 2016

Character-level language model example

Vocabulary:

[h,e,l,o]

Example training sequence:

“hello”

(21)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 21 8 Feb 2016

Character-level language model example

Vocabulary:

[h,e,l,o]

Example training sequence:

“hello”

(22)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 22 8 Feb 2016

min-char-rnn.py gist: 112 lines of Python

(https://gist.github.

com/karpathy/d4dee566867f8291f086)

(23)

min-char-rnn.py gist

Data I/O

(24)

min-char-rnn.py gist

Initializations

recall:

(25)

min-char-rnn.py gist

Main loop

(26)

min-char-rnn.py gist

Main loop

(27)

min-char-rnn.py gist

Main loop

(28)

min-char-rnn.py gist

Main loop

(29)

min-char-rnn.py gist

Main loop

(30)

min-char-rnn.py gist

Loss function

- forward pass (compute loss)

- backward pass (compute param gradient)

(31)

min-char-rnn.py gist

Softmax classifier

(32)

min-char-rnn.py gist

recall:

(33)

min-char-rnn.py gist

(34)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 34 8 Feb 2016

x RNN

y

(35)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 35 8 Feb 2016

(36)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 36 8 Feb 2016

train more

train more

train more at first:

(37)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 37 8 Feb 2016

(38)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 38 8 Feb 2016

open source textbook on algebraic geometry

Latex source

(39)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 39 8 Feb 2016

(40)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 40 8 Feb 2016

(41)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 41 8 Feb 2016

(42)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 42 8 Feb 2016

Generated

C code

(43)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 43 8 Feb 2016

(44)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 44 8 Feb 2016

(45)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 45 8 Feb 2016

Searching for interpretable cells

[Visualizing and Understanding Recurrent Networks, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei]

(46)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 46 8 Feb 2016

quote detection cell

Searching for interpretable cells

(47)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 47 8 Feb 2016

line length tracking cell

Searching for interpretable cells

(48)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 48 8 Feb 2016

if statement cell

Searching for interpretable cells

(49)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 49 8 Feb 2016

quote/comment cell

Searching for interpretable cells

(50)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 50 8 Feb 2016

code depth cell

Searching for interpretable cells

(51)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 51 8 Feb 2016

Explain Images with Multimodal Recurrent Neural Networks, Mao et al.

Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei Show and Tell: A Neural Image Caption Generator, Vinyals et al.

Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.

Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick

Image Captioning

(52)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 52 8 Feb 2016

Convolutional Neural Network

Recurrent Neural Network

(53)

test image

(54)

test image

(55)

test image

X

(56)

test image

x0

<STA RT>

<START>

(57)

h0

x0

<STA RT>

y0

<START>

test image

before:

h = tanh(Wxh * x + Whh * h)

now:

h = tanh(Wxh * x + Whh * h + Wih * v)

v

Wih

(58)

h0

x0

<STA RT>

y0

<START>

test image

straw

sample!

(59)

h0

x0

<STA RT>

y0

<START>

test image

straw

h1 y1

(60)

h0

x0

<STA RT>

y0

<START>

test image

straw

h1 y1

hat

sample!

(61)

h0

x0

<STA RT>

y0

<START>

test image

straw

h1 y1

hat

h2 y2

(62)

h0

x0

<STA RT>

y0

<START>

test image

straw

h1 y1

hat

h2 y2

sample

<END> token

=> finish.

(63)

Image Sentence Datasets

Microsoft COCO

[Tsung-Yi Lin et al. 2014]

mscoco.org

currently:

~120K images

~5 sentences each

(64)
(65)
(66)

Show Attend and Tell, Xu et al., 2015 66

Preview of fancier architectures

RNN attends spatially to different parts of images while generating each word of the sentence:

(67)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 67 8 Feb 2016

time depth

RNN:

(68)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 68 8 Feb 2016

time depth

RNN:

LSTM:

(69)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 69 8 Feb 2016

LSTM

(70)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 70 8 Feb 2016

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

x

h

vector from before (h)

W

i f o g

vector from below (x)

sigmoid sigmoid

tanh sigmoid

4n x 2n 4n 4*n

(71)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 71 8 Feb 2016

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

cell state c

f

x

(72)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 72 8 Feb 2016

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

cell state c

f

x

i g

x +

(73)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 73 8 Feb 2016

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

cell state c

f

x +

tanh

o x

h

c

i g

x

(74)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 74 8 Feb 2016

Long Short Term Memory (LSTM)

[Hochreiter et al., 1997]

cell state c

f

x +

tanh

o x

h

c

i g

x

higher layer, or prediction

(75)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 75 8 Feb 2016

LSTM

cell state c

f

x

i g

x +

tanh

o

x f

x

i g

x +

tanh

o

x

one timestep one timestep

h h

h x x

(76)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 76 8 Feb 2016

f f f

RNN

state

f

+

f

+

f

LSTM

+ (ignoring forget gates)

(77)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 77 8 Feb 2016

Recall:

“PlainNets” vs. ResNets

ResNet is to PlainNet what LSTM is to RNN, kind of.

(78)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 78 8 Feb 2016

Understanding gradient flow dynamics

Cute backprop signal video: http://imgur.com/gallery/vaNahKE

(79)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 79 8 Feb 2016

Understanding gradient flow dynamics

if the largest eigenvalue is > 1, gradient will explode if the largest eigenvalue is < 1, gradient will vanish

[On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]

(80)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 80 8 Feb 2016

Understanding gradient flow dynamics

if the largest eigenvalue is > 1, gradient will explode if the largest eigenvalue is < 1, gradient will vanish

[On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]

can control exploding with gradient clipping can control vanishing with LSTM

(81)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 81 8 Feb 2016

LSTM variants and friends

[LSTM: A Search Space Odyssey, Greff et al., 2015]

[An Empirical Exploration of

Recurrent Network Architectures, Jozefowicz et al., 2015]

GRU [Learning phrase

representations using rnn encoder- decoder for statistical machine translation, Cho et al. 2014]

(82)

Lecture 10 - 8 Feb 2016

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Fei-Fei Li & Andrej Karpathy & Justin Johnson

Lecture 10 - 82 8 Feb 2016

Summary

- RNNs allow a lot of flexibility in architecture design - Vanilla RNNs are simple but don’t work very well

- Common to use LSTM or GRU: their additive interactions improve gradient flow

- Backward flow of gradients in RNN can explode or vanish.

Exploding is controlled with gradient clipping. Vanishing is controlled with additive interactions (LSTM)

- Better/simpler architectures are a hot topic of current research

- Better understanding (both theoretical and empirical) is needed.

References

Related documents

Denton et al, “Deep generative image models using a Laplacian pyramid of adversarial networks”, NIPS 2015. Generate

Fei-Fei Li &amp; Justin Johnson &amp; Serena Yeung Lecture 4 - 95 April 13, 2017 Biological Neurons:. ● Many

Fei-Fei Li &amp; Justin Johnson &amp; Serena Yeung Lecture 7 - April 25, 2017?. Fei-Fei Li &amp; Justin Johnson &amp; Serena Yeung Lecture 7 - 7 April

● Define layers in Python with numpy (CPU only).. Fei-Fei Li &amp; Justin Johnson &amp; Serena Yeung Lecture 8 - 150 April 27, 2017.

Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014.. Fei-Fei Li &amp; Justin Johnson &amp; Serena Yeung Lecture 13 - May 18, 2017. Training GANs:

Fei-Fei Li &amp; Justin Johnson &amp; Serena Yeung Lecture 8 - 7 April 26, 2018!. Spot

ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners.. shallow 8 layers

The optimal Q-value function Q* is the maximum expected cumulative reward achievable from a given (state, action) pair:...