Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 1 8 Feb 2016
Lecture 10:
Recurrent Neural Networks
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 2 8 Feb 2016
Administrative
- Midterm this Wednesday! woohoo!
- A3 will be out ~Wednesday
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 3 8 Feb 2016
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 4 8 Feb 2016
http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 5 8 Feb 2016
http://mtyka.github.io/deepdream/2016/02/05/bilateral-class-vis.html
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 6 8 Feb 2016
Recurrent Networks offer a lot of flexibility:
Vanilla Neural Networks
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 7 8 Feb 2016
Recurrent Networks offer a lot of flexibility:
e.g. Image Captioning
image -> sequence of words
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 8 8 Feb 2016
Recurrent Networks offer a lot of flexibility:
e.g. Sentiment Classification sequence of words -> sentiment
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 9 8 Feb 2016
Recurrent Networks offer a lot of flexibility:
e.g. Machine Translation seq of words -> seq of words
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 10 8 Feb 2016
Recurrent Networks offer a lot of flexibility:
e.g. Video classification on frame level
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 11 8 Feb 2016
Multiple Object Recognition with Visual Attention, Ba et al.
Sequential Processing of fixed
inputs
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 12 8 Feb 2016
DRAW: A Recurrent Neural Network For Image Generation, Gregor et al.
Sequential Processing of fixed
outputs
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 13 8 Feb 2016
Recurrent Neural Network
x RNN
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 14 8 Feb 2016
Recurrent Neural Network
x RNN
y
usually want to
predict a vector at
some time steps
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 15 8 Feb 2016
Recurrent Neural Network
x RNN
y We can process a sequence of vectors x by
applying a recurrence formula at every time step:
new state old state input vector at some time step some function
with parameters W
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 16 8 Feb 2016
Recurrent Neural Network
x RNN
y We can process a sequence of vectors x by
applying a recurrence formula at every time step:
Notice: the same function and the same set
of parameters are used at every time step.
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 17 8 Feb 2016
(Vanilla) Recurrent Neural Network
x RNN
y
The state consists of a single “hidden” vector h:
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 18 8 Feb 2016
Character-level language model example
Vocabulary:
[h,e,l,o]
Example training sequence:
“hello”
x RNN
y
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 19 8 Feb 2016
Character-level language model example
Vocabulary:
[h,e,l,o]
Example training sequence:
“hello”
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 20 8 Feb 2016
Character-level language model example
Vocabulary:
[h,e,l,o]
Example training sequence:
“hello”
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 21 8 Feb 2016
Character-level language model example
Vocabulary:
[h,e,l,o]
Example training sequence:
“hello”
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 22 8 Feb 2016
min-char-rnn.py gist: 112 lines of Python
(https://gist.github.
com/karpathy/d4dee566867f8291f086)
min-char-rnn.py gist
Data I/O
min-char-rnn.py gist
Initializations
recall:
min-char-rnn.py gist
Main loop
min-char-rnn.py gist
Main loop
min-char-rnn.py gist
Main loop
min-char-rnn.py gist
Main loop
min-char-rnn.py gist
Main loop
min-char-rnn.py gist
Loss function
- forward pass (compute loss)
- backward pass (compute param gradient)
min-char-rnn.py gist
Softmax classifier
min-char-rnn.py gist
recall:
min-char-rnn.py gist
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 34 8 Feb 2016
x RNN
y
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 35 8 Feb 2016
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 36 8 Feb 2016
train more
train more
train more at first:
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 37 8 Feb 2016
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 38 8 Feb 2016
open source textbook on algebraic geometry
Latex source
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 39 8 Feb 2016
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 40 8 Feb 2016
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 41 8 Feb 2016
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 42 8 Feb 2016
Generated
C code
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 43 8 Feb 2016
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 44 8 Feb 2016
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 45 8 Feb 2016
Searching for interpretable cells
[Visualizing and Understanding Recurrent Networks, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei]
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 46 8 Feb 2016
quote detection cell
Searching for interpretable cells
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 47 8 Feb 2016
line length tracking cell
Searching for interpretable cells
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 48 8 Feb 2016
if statement cell
Searching for interpretable cells
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 49 8 Feb 2016
quote/comment cell
Searching for interpretable cells
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 50 8 Feb 2016
code depth cell
Searching for interpretable cells
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 51 8 Feb 2016
Explain Images with Multimodal Recurrent Neural Networks, Mao et al.
Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei Show and Tell: A Neural Image Caption Generator, Vinyals et al.
Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick
Image Captioning
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 52 8 Feb 2016
Convolutional Neural Network
Recurrent Neural Network
test image
test image
test image
X
test image
x0
<STA RT>
<START>
h0
x0
<STA RT>
y0
<START>
test image
before:
h = tanh(Wxh * x + Whh * h)
now:
h = tanh(Wxh * x + Whh * h + Wih * v)
v
Wih
h0
x0
<STA RT>
y0
<START>
test image
straw
sample!
h0
x0
<STA RT>
y0
<START>
test image
straw
h1 y1
h0
x0
<STA RT>
y0
<START>
test image
straw
h1 y1
hat
sample!
h0
x0
<STA RT>
y0
<START>
test image
straw
h1 y1
hat
h2 y2
h0
x0
<STA RT>
y0
<START>
test image
straw
h1 y1
hat
h2 y2
sample
<END> token
=> finish.
Image Sentence Datasets
Microsoft COCO
[Tsung-Yi Lin et al. 2014]
mscoco.org
currently:
~120K images
~5 sentences each
Show Attend and Tell, Xu et al., 2015 66
Preview of fancier architectures
RNN attends spatially to different parts of images while generating each word of the sentence:
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 67 8 Feb 2016
time depth
RNN:
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 68 8 Feb 2016
time depth
RNN:
LSTM:
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 69 8 Feb 2016
LSTM
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 70 8 Feb 2016
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
x
h
vector from before (h)
W
i f o g
vector from below (x)
sigmoid sigmoid
tanh sigmoid
4n x 2n 4n 4*n
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 71 8 Feb 2016
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
cell state c
f
x
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 72 8 Feb 2016
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
cell state c
f
x
i g
x +
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 73 8 Feb 2016
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
cell state c
f
x +
tanh
o x
h
c
i g
x
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 74 8 Feb 2016
Long Short Term Memory (LSTM)
[Hochreiter et al., 1997]
cell state c
f
x +
tanh
o x
h
c
i g
x
higher layer, or prediction
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 75 8 Feb 2016
LSTM
cell state c
f
x
i g
x +
tanh
o
x f
x
i g
x +
tanh
o
x
one timestep one timestep
h h
h x x
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 76 8 Feb 2016
f f f
RNN
state
f
+
f
+
f
LSTM
+ (ignoring forget gates)Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 77 8 Feb 2016
Recall:
“PlainNets” vs. ResNets
ResNet is to PlainNet what LSTM is to RNN, kind of.
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 78 8 Feb 2016
Understanding gradient flow dynamics
Cute backprop signal video: http://imgur.com/gallery/vaNahKE
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 79 8 Feb 2016
Understanding gradient flow dynamics
if the largest eigenvalue is > 1, gradient will explode if the largest eigenvalue is < 1, gradient will vanish
[On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 80 8 Feb 2016
Understanding gradient flow dynamics
if the largest eigenvalue is > 1, gradient will explode if the largest eigenvalue is < 1, gradient will vanish
[On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]
can control exploding with gradient clipping can control vanishing with LSTM
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Lecture 10 - 81 8 Feb 2016
LSTM variants and friends
[LSTM: A Search Space Odyssey, Greff et al., 2015]
[An Empirical Exploration of
Recurrent Network Architectures, Jozefowicz et al., 2015]
GRU [Learning phrase
representations using rnn encoder- decoder for statistical machine translation, Cho et al. 2014]
Lecture 10 - 8 Feb 2016
Fei-Fei Li & Andrej Karpathy & Justin Johnson
Fei-Fei Li & Andrej Karpathy & Justin Johnson