(1)

Administrative

- A1 is due today (midnight). You can use up to 3 late days

- A2 will be up this Friday; it is due the Wednesday after next (Feb 4)

- Project Proposal is due next Friday at midnight (~one paragraph (200-400 words), send as email)

(2)

Fei-Fei Li & Andrej Karpathy, Lecture 5, 21 Jan 2015

Lecture 5: Backprop and intro to Neural Nets

(3)

Linear Classification

SVM:

Softmax:

(4)

Optimization Landscape

(5)

Gradient Descent

Numerical gradient: slow :(, approximate :(, easy to write :)

Analytic gradient: fast :), exact :), error-prone :(

In practice: derive the analytic gradient, then check your implementation with the numerical gradient.
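
A gradient check compares an analytic gradient against a numerical one. The following is a minimal sketch, assuming NumPy; the helper names `numerical_gradient` and `relative_error` and the toy function f(x) = sum(x^2) are hypothetical and chosen only for illustration, not taken from the lecture.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of df/dx: slow and approximate, but easy to write."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h
        f_plus = f(x)
        x.flat[i] = old - h
        f_minus = f(x)
        x.flat[i] = old  # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

def relative_error(a, b):
    """Maximum elementwise relative error between two gradients."""
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

# Toy function (an assumption for illustration): f(x) = sum(x^2), so df/dx = 2x.
f = lambda x: np.sum(x ** 2)
x = np.array([1.0, -2.0, 3.0])
analytic = 2 * x                    # fast and exact, but error-prone to derive by hand
numeric = numerical_gradient(f, x)  # used only as a check
err = relative_error(analytic, numeric)
```

A small `err` suggests the analytic gradient is implemented correctly; a large one usually points to a bug in the derivation or the code.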

(6)

This class: Becoming a backprop ninja

(7)
(8)

Example: x = 4, y = -3. => f(x,y) = -12

partial derivatives gradient
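
The slide's formula did not survive extraction. The values x = 4, y = -3, f = -12 are consistent with f(x, y) = xy, which is assumed in this small sketch of the partial derivatives and the gradient.

```python
# Assumption: f(x, y) = x * y, which matches f(4, -3) = -12.
x, y = 4.0, -3.0
f = x * y            # -12.0

# Partial derivatives of a product: each input's derivative is the other input.
dfdx = y             # df/dx = y = -3.0
dfdy = x             # df/dy = x =  4.0

# The gradient is the vector of partial derivatives.
grad = [dfdx, dfdy]  # [-3.0, 4.0]
```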

(10)

Compound expressions:

(11)

Compound expressions:

Chain rule:
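
The compound expression on the slide was lost in extraction. As a hedged illustration of the chain rule, the sketch below assumes f(x, y, z) = (x + y) * z with hypothetical input values; the intermediate variable `q` is introduced only for this example.

```python
# Assumed expression: f(x, y, z) = (x + y) * z, with hypothetical inputs.
x, y, z = -2.0, 5.0, -4.0

# Forward pass, split into two simple stages.
q = x + y            # q = 3.0
f = q * z            # f = -12.0

# Backward pass, in reverse order.
dfdz = q             # d(q*z)/dz = q = 3.0
dfdq = z             # d(q*z)/dq = z = -4.0

# Chain rule: df/dx = (dq/dx) * (df/dq), and dq/dx = dq/dy = 1 for an add.
dfdx = 1.0 * dfdq    # -4.0
dfdy = 1.0 * dfdq    # -4.0
```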

(14)

Another example:

(15)

Another example:

-1/(1.37^2) = -0.53

(16)

Another example:

[local gradient] x [its gradient]

[1] x [-0.53] = -0.53

(17)

Another example:

[local gradient] x [its gradient]

[e^(-1)] x [-0.53] = -0.20

(18)

Another example:

[local gradient] x [its gradient]

[-1] x [-0.2] = 0.2

(19)

Another example:

[local gradient] x [its gradient]

[1] x [0.2] = 0.2

[1] x [0.2] = 0.2 (both inputs!)

(20)

Another example:

[local gradient] x [its gradient]

x0: [2] x [0.2] ~= 0.4

w0: [-1] x [0.2] = -0.2
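
The numbers in this example are consistent with a two-input sigmoid neuron, f = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))). The sketch below replays the forward and backward pass gate by gate. Only w0 = 2 and x0 = -1 can be read off the slides; the values of w1, x1, and w2 are assumptions chosen so that the intermediate values (1.37, 0.73, -0.53, -0.20) match.

```python
import math

w = [2.0, -3.0, -3.0]   # w0 is from the slide; w1 and w2 are assumed
x = [-1.0, -2.0]        # x0 is from the slide; x1 is assumed

# Forward pass, one gate at a time.
dot = w[0] * x[0] + w[1] * x[1] + w[2]   # 1.0
neg = -dot                               # -1.0
e = math.exp(neg)                        # ~0.37
denom = 1.0 + e                          # ~1.37
f = 1.0 / denom                          # ~0.73

# Backward pass: [local gradient] x [its gradient], starting from df/df = 1.
ddenom = (-1.0 / denom ** 2) * 1.0       # 1/x gate:  ~-0.53
de = 1.0 * ddenom                        # +1 gate:   ~-0.53
dneg = math.exp(neg) * de                # exp gate:  ~-0.20
ddot = -1.0 * dneg                       # *-1 gate:  ~0.20
dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot]  # ~[-0.20, -0.39, 0.20]
dx = [w[0] * ddot, w[1] * ddot]              # ~[0.39, -0.59]
```

The slide's 0.4 for x0 comes from multiplying by the rounded value 0.2; with unrounded intermediates the result is closer to 0.39.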

(21)

Every gate during backprop computes, for all its inputs:

[LOCAL GRADIENT] x [GATE GRADIENT]

- LOCAL GRADIENT: can be computed right away, even during the forward pass

- GATE GRADIENT: the gate receives this during backpropagation

(figure: a gate hanging out)
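
One way to picture a gate in code is an object with a forward and a backward method. This is a minimal sketch, not the lecture's implementation; the class name `MultiplyGate` and its interface are hypothetical.

```python
class MultiplyGate:
    """A single gate: caches what it needs in forward, chains gradients in backward."""

    def forward(self, x, y):
        # The local gradients of x*y are y and x, so caching the inputs
        # is enough to have them ready before backprop even starts.
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        # dz is the gate gradient, received during backpropagation.
        dx = self.y * dz   # [local gradient] x [gate gradient]
        dy = self.x * dz
        return dx, dy

gate = MultiplyGate()
z = gate.forward(4.0, -3.0)   # -12.0
dx, dy = gate.backward(1.0)   # (-3.0, 4.0)
```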

(22)

sigmoid function

(23)

sigmoid function

(0.73) * (1 - 0.73) = 0.2
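
The product (0.73) * (1 - 0.73) follows from the standard derivative of the sigmoid. The derivation is reconstructed here as a reminder, since the slide's equation did not survive extraction:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\frac{d\sigma(x)}{dx}
  = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
  = \left(\frac{1 + e^{-x} - 1}{1 + e^{-x}}\right)\left(\frac{1}{1 + e^{-x}}\right)
  = \bigl(1 - \sigma(x)\bigr)\,\sigma(x)
```

With sigma(x) = 0.73 this gives (1 - 0.73) * 0.73, roughly 0.2, so the whole chain of gates inside the sigmoid can be backpropagated in one step.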

(26)

We are ready:

(28)

forward pass was:

(36)

Patterns in backward flow

add gate: gradient distributor

max gate: gradient router

mul gate: gradient… “switcher”?
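
These three patterns can be seen directly in the backward functions of the gates. A small sketch with hypothetical function names and input values:

```python
def add_backward(dz):
    """Add gate: distributes the incoming gradient unchanged to both inputs."""
    return dz, dz

def max_backward(x, y, dz):
    """Max gate: routes the whole gradient to the larger input, zero to the other."""
    return (dz, 0.0) if x >= y else (0.0, dz)

def mul_backward(x, y, dz):
    """Mul gate: each input receives the gradient scaled by the *other* input."""
    return y * dz, x * dz

dz = 2.0
add_grads = add_backward(dz)              # (2.0, 2.0)
max_grads = max_backward(3.0, -1.0, dz)   # (2.0, 0.0)
mul_grads = mul_backward(3.0, -1.0, dz)   # (-2.0, 6.0)
```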

(37)

Gradients for vectorized code

X is [10 x 3], dD is [5 x 3]

dW must be [5 x 10]

dX must be [10 x 3]
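
The shapes above pin down the only matrix products that can work. The sketch below assumes the forward pass was D = W.dot(X) with W of shape [5 x 10]; that shape is inferred from the dW shape on the slide, and the random values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 10))   # assumed shape, inferred from dW
X = rng.standard_normal((10, 3))

# Forward pass.
D = W.dot(X)                       # [5 x 10] . [10 x 3] = [5 x 3]

# Suppose the gradient on D arrives from further up the graph.
dD = rng.standard_normal(D.shape)  # same shape as D: [5 x 3]

# Backward pass: dimension analysis tells us which transposes are needed.
dW = dD.dot(X.T)                   # [5 x 3] . [3 x 10] = [5 x 10]
dX = W.T.dot(dD)                   # [10 x 5] . [5 x 3] = [10 x 3]
```

The gradient on a variable always has the same shape as the variable itself, which is a quick sanity check for vectorized backprop code.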

(40)

In summary

- in practice it is rarely necessary to derive long gradients of variables with pen and paper

- structure your code in stages (layers), where you can derive the local gradients, then chain the gradients during backprop.

- caveat: sometimes gradients simplify (e.g. for sigmoid, also softmax). Group these.
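
As an illustration of a gradient that simplifies when stages are grouped, the softmax followed by the negative log-likelihood loss has the well-known combined gradient "probabilities minus one-hot label". A minimal sketch, assuming NumPy; the function name and the example scores are hypothetical.

```python
import numpy as np

def softmax_loss_and_grad(scores, y):
    """Softmax + negative log-likelihood for one example, treated as a single stage."""
    shifted = scores - np.max(scores)        # shift for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    loss = -np.log(probs[y])
    # Grouped gradient: d(loss)/d(scores) = probs - one_hot(y).
    dscores = probs.copy()
    dscores[y] -= 1.0
    return loss, dscores

scores = np.array([1.0, -2.0, 0.5])          # hypothetical class scores
loss, dscores = softmax_loss_and_grad(scores, y=0)
```

Backpropagating through exp, sum, divide, and log separately would give the same result, with more code and more room for numerical trouble.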

(41)

NEURAL NETWORKS

(42)

(43)
(44)

(45)

sigmoid activation function

(46)

(47)

A Single Neuron can be used as a binary linear classifier

Regularization has the interpretation of “gradual forgetting”
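
Both statements can be sketched in a few lines. The example assumes a sigmoid neuron whose output is read as the probability of class 1, and L2 regularization of the form (lam/2)*||w||^2; the weights, inputs, `lam`, and `lr` are hypothetical values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Class 1 if sigmoid(w.x + b) > 0.5, which is the same as w.x + b > 0."""
    return int(sigmoid(np.dot(w, x) + b) > 0.5)

w = np.array([1.0, -1.0])   # hypothetical weights
b = 0.0
label_a = predict(w, b, np.array([2.0, 0.5]))   # w.x + b =  1.5 -> class 1
label_b = predict(w, b, np.array([0.0, 3.0]))   # w.x + b = -3.0 -> class 0

# "Gradual forgetting": the L2 penalty (lam/2)*||w||^2 adds lam*w to the gradient,
# so with no data gradient every update shrinks the weights toward zero.
lam, lr = 0.1, 0.5          # hypothetical regularization strength and step size
w_decayed = w.copy()
for _ in range(3):
    w_decayed -= lr * lam * w_decayed   # w <- (1 - lr*lam) * w
```

The decision boundary w.x + b = 0 is a hyperplane, which is why a single neuron is a linear classifier.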

(48)

Be very careful with your Brain analogies:

Biological Neurons:

- Many different types

- Dendrites can perform complex non-linear computations

- Synapses are not a single weight but a complex non-linear dynamical system

- Rate code may not be adequate

[Dendritic Computation. London and Hausser]

(49)

Activation Functions

(50)

Activation Functions

Sigmoid

- Squashes numbers to range [0,1]

- Historically popular since it has a nice interpretation as a saturating “firing rate” of a neuron

2 BIG problems:

1. Saturated neurons “kill” the gradients

2. Sigmoid outputs are not zero-centered
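
Both problems are easy to see numerically. A small sketch, assuming NumPy; the sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Local gradient of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.array([-10.0, 0.0, 10.0])

# Problem 1: in the saturated regions the local gradient is nearly zero,
# so almost nothing flows back through the neuron.
local = sigmoid_grad(xs)    # roughly [4.5e-05, 0.25, 4.5e-05]

# Problem 2: outputs are always positive, so they are not centered around zero.
outputs = sigmoid(xs)
```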

(53)

Consider what happens when the input to a neuron is always positive...

What can we say about the gradients on w?

Always all positive or all negative :(

(this is also why you want zero-mean data!)

(figure: allowed gradient update directions, the resulting zig zag path, and the hypothetical optimal w vector)
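
The sign restriction follows from the chain rule: for f = w.x + b the local gradient on w is x, so dL/dw is the scalar upstream gradient dL/df times x. A minimal sketch with hypothetical all-positive inputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# All-positive inputs, e.g. sigmoid outputs coming from a previous layer.
x = rng.uniform(0.1, 1.0, size=5)

# dL/dw = (dL/df) * x, where dL/df is a single scalar for this neuron.
dw_up = 0.7 * x      # positive upstream gradient -> every component positive
dw_down = -1.3 * x   # negative upstream gradient -> every component negative
```

Because every component of the update shares one sign, reaching a target that needs some weights to go up and others to go down takes a zig zag path.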

(55)

Activation Functions

tanh(x)

- Squashes numbers to range [-1,1]

- zero centered (nice)

- still kills gradients when saturated :(

(56)

Activation Functions

ReLU

- Computes f(x) = max(0,x)

- Does not saturate

- Very computationally efficient

- Converges much faster than sigmoid/tanh in practice! (e.g. 6x)

- Just one annoying problem…

hint: what is the gradient when x < 0?
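
The answer to the hint shows up immediately in a forward/backward sketch. The function names and sample inputs below are hypothetical.

```python
import numpy as np

def relu_forward(x):
    return np.maximum(0, x)

def relu_backward(x, dout):
    """Passes the gradient through where x > 0 and blocks it everywhere else."""
    return dout * (x > 0)

x = np.array([-2.0, -0.5, 0.5, 3.0])
out = relu_forward(x)                    # [0.0, 0.0, 0.5, 3.0]
dx = relu_backward(x, np.ones_like(x))   # [0.0, 0.0, 1.0, 1.0]
```

For x < 0 the gradient is exactly zero, so a neuron whose input is negative for every training example receives no updates at all.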

(57)

(figure: DATA CLOUD with an active ReLU and a dead ReLU; the dead ReLU will never activate)

(58)

Activation Functions

Leaky ReLU

- Does not saturate

- computationally efficient

- Converges much faster than sigmoid/tanh in practice! (e.g. 6x)

- will not “die”.
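
The slide's formula was lost in extraction. A common form is f(x) = max(alpha * x, x) with a small slope alpha for negative inputs; the value alpha = 0.01 used below is an assumption, as are the function names.

```python
import numpy as np

def leaky_relu_forward(x, alpha=0.01):
    """x for positive inputs, alpha * x (a small negative slope) otherwise."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_backward(x, dout, alpha=0.01):
    """Gradient is 1 where x > 0 and alpha elsewhere, so it is never exactly zero."""
    return dout * np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 3.0])
out = leaky_relu_forward(x)                    # [-0.02, -0.005, 0.5, 3.0]
dx = leaky_relu_backward(x, np.ones_like(x))   # [0.01, 0.01, 1.0, 1.0]
```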

(59)

Maxout “Neuron”

- Does not have the basic form of dot product -> nonlinearity

- Generalizes ReLU and Leaky ReLU

- Linear Regime! Does not saturate! Does not die!

Problem: doubles the number of parameters :(
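
A maxout unit with two linear pieces computes max(w1.x + b1, w2.x + b2), which is where the doubled parameter count comes from. The sketch below assumes that two-piece form; the weights and input are hypothetical values.

```python
import numpy as np

def maxout(x, w1, b1, w2, b2):
    """Two-piece maxout: the larger of two affine functions of x."""
    return np.maximum(np.dot(w1, x) + b1, np.dot(w2, x) + b2)

x = np.array([1.0, -2.0])
w = np.array([0.5, 1.0])
b = 0.25                      # w.x + b = -1.25

# ReLU is the special case where the second piece is identically zero.
relu_like = maxout(x, w, b, np.zeros(2), 0.0)      # max(-1.25, 0) = 0.0

# Leaky ReLU is the special case where the second piece is a scaled copy of the first.
leaky_like = maxout(x, w, b, 0.01 * w, 0.01 * b)   # max(-1.25, -0.0125) = -0.0125
```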

(60)

TLDR: In practice:

- Use ReLU. Be careful with your learning rates

- Try out Leaky ReLU / Maxout

- Try out tanh but don’t expect much

- Never use sigmoid

(61)

NEURAL NETWORKS

(62)

Neural Networks: Architectures

“Fully-connected” layers

“2-layer Neural Net”, or “1-hidden-layer Neural Net”

“3-layer Neural Net”, or “2-hidden-layer Neural Net”

(64)

Neural Networks: Architectures

Number of Neurons: ? Number of Weights: ? Number of Parameters: ?

2-layer net (3 inputs, 4 hidden neurons, 2 outputs):

- Number of Neurons: 4 + 2 = 6

- Number of Weights: [4x3 + 2x4] = 20

- Number of Parameters: 20 + 6 = 26 (biases!)

3-layer net (3 inputs, two hidden layers of 4 neurons each, 1 output):

- Number of Neurons: 4 + 4 + 1 = 9

- Number of Weights: [4x3 + 4x4 + 1x4] = 32

- Number of Parameters: 32 + 9 = 41
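
The same counting can be written as a short function. The layer sizes [3, 4, 2] and [3, 4, 4, 1] are inferred from the products in the weight counts above; the function name `count_network` is hypothetical.

```python
def count_network(sizes):
    """Count neurons, weights, and parameters of a fully-connected net.

    sizes = [inputs, hidden..., outputs]; input units are not counted as neurons.
    """
    neurons = sum(sizes[1:])
    weights = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    parameters = weights + neurons   # one bias per neuron
    return neurons, weights, parameters

two_layer = count_network([3, 4, 2])        # (6, 20, 26)
three_layer = count_network([3, 4, 4, 1])   # (9, 32, 41)
```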

(67)

Neural Networks: Architectures

Modern CNNs: ~10 million neurons

Human visual cortex: ~5 billion neurons

(68)

Example Feed-forward computation of a Neural Network

We can efficiently evaluate an entire layer of neurons.
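
The code on the slide was lost in extraction. The following is a hedged sketch of a feed-forward pass for a 3-layer net, where each layer is one matrix multiply followed by an elementwise nonlinearity. The layer sizes (3, 4, 4, 1), the sigmoid activation, and the random weights are assumptions for illustration.

```python
import numpy as np

# Activation function (sigmoid is assumed here).
f = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))            # input column vector (3x1)

# Hypothetical parameters for layer sizes 3 -> 4 -> 4 -> 1.
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 1))
W3, b3 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))

# Each line evaluates a whole layer of neurons at once.
h1 = f(np.dot(W1, x) + b1)     # first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2)    # second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3      # output neuron (1x1), no activation applied
```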

(70)

What kinds of functions can a Neural Network represent?

[http://neuralnetworksanddeeplearning.com/chap4.html]

(72)

Setting the number of layers and their sizes

more neurons = more capacity

(73)

(you can play with this demo over at ConvNetJS: http://cs.stanford.

Do not use size of neural network as a regularizer. Use stronger regularization instead:

(74)

Summary

- we arrange neurons into fully-connected layers

- the abstraction of a layer has a nice property in that it allows us to use efficient vectorized code (matrix multiplies)

- neural networks are universal function approximators but this doesn’t mean much.

- neural networks are not neural

- neural networks: bigger = better (but might have to regularize more strongly)

(75)

Next Lecture:

More than you ever wanted to know about Neural Networks and how to train them
