CAP5415 Computer Vision

(1)

CAP5415

Computer Vision

Yogesh S Rawat

HEC-241

(2)

Administrative details

11/25/2020 CAP5415 - Lecture 19 2

(3)

Administrative

• Mid-term

• 12/01/2020 Tuesday

• 4:30-5:45pm

• Same format as first mid-term

• Syllabus – All lectures until today (except first mid-term syllabus)

• No more homework

• SPI – Student Perception of Instruction

• Online questionnaire includes ONLY eleven (11) questions

• 9 multiple choice and 2 open comment questions

(4)

Administrative

• Project proposal missing

• Individual contribution in projects

• State how you contributed

• If there is any collaboration on something

• It should be stated in all such reports

• If there is any discrepancy, we will contact you!


(5)

Questions?

(6)

Action Recognition Lecture 19


(7)

Outline

• Action Recognition

• 3D CNN

• C3D

• I3D

• R(2+1)D

• Recurrent neural network

(8)

Video

• Sequences of frames

• 30 frames per second
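As a rough illustration of the arithmetic, a video at 30 frames per second yields 300 frames in 10 seconds. The sketch below counts frames and picks evenly spaced frame indices for a fixed-length clip. The 10-second duration, the 16-frame clip length, and the helper names are assumptions chosen for illustration, not values from the lecture.

```python
def num_frames(duration_s, fps=30):
    """Number of frames in a video of the given duration."""
    return int(duration_s * fps)


def sample_indices(total_frames, clip_len):
    """Pick clip_len frame indices spread evenly over the video."""
    step = total_frames / clip_len
    return [int(i * step) for i in range(clip_len)]


total = num_frames(10)               # 10 s at 30 fps (assumed duration)
indices = sample_indices(total, 16)  # 16-frame clip (assumed length)
```

Sampling a fixed number of frames is one simple way to turn videos of different lengths into inputs of the same size.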


(9)

Sequences of Images

(10)

Action recognition

• Given a video

• Recognize which action is present


(11)

Action recognition

• Variations

• Multiple instances

(12)

Action recognition

• Variations

• Multiple actions


(13)

Action recognition

• Variations

• Trimmed/untrimmed

• Temporal action localization

(14)

Action detection

• Spatio-temporal localization


(15)

Action recognition – UCF101

Cycling, Diving, Golf Swinging, Riding, Basketball Shooting, Swinging, Tennis Swinging, Volleyball Spiking

(16)

Action detection - VIRAT


(17)

Video segmentation

(18)

CNN based solutions

• C3D

• I3D

• R(2+1)D


(19)

3D convolution
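A 3D convolution slides a kernel over time as well as over height and width, so each output value summarizes a small spatio-temporal block of the clip. The sketch below is a minimal pure-Python version for a single-channel clip with stride 1 and no padding. The function name, the clip size, and the averaging kernel are assumptions for illustration only.

```python
def conv3d(video, kernel):
    """Valid 3D convolution (cross-correlation, stride 1) of a single-channel clip.

    video:  nested list indexed [t][y][x]
    kernel: nested list indexed [dt][dy][dx]
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kt x kh x kw block anchored at (t, y, x).
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# A 4-frame clip of 5x5 frames filled with ones, and a 3x3x3 averaging kernel.
clip = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
avg_kernel = [[[1.0 / 27] * 3 for _ in range(3)] for _ in range(3)]
features = conv3d(clip, avg_kernel)
out_shape = (len(features), len(features[0]), len(features[0][0]))
```

Without padding, each dimension shrinks by the kernel size minus one, so the temporal axis is reduced here just like the spatial axes.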

(20)

C3D

• 8 3D conv layers

• 3x3x3 kernels

• 5 max pooling layers

• 2 fully connected layers


Tran, Du, et al. "Learning spatiotemporal features with 3D convolutional networks." Proceedings of the IEEE International Conference on Computer Vision. 2015.
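To get a feel for the cost of 3x3x3 kernels, the sketch below counts the parameters of one convolution layer with a 3x3 kernel and with a 3x3x3 kernel. The channel counts (3 input channels, 64 filters) and the helper name are assumptions for illustration, not a statement of the exact C3D configuration.

```python
def conv_params(kernel_shape, c_in, c_out):
    """Weights plus biases for one convolution layer."""
    weights = c_in * c_out
    for k in kernel_shape:
        weights *= k
    return weights + c_out  # one bias per output filter


params_2d = conv_params((3, 3), c_in=3, c_out=64)     # 3x3 kernel
params_3d = conv_params((3, 3, 3), c_in=3, c_out=64)  # 3x3x3 kernel
```

Adding the temporal dimension multiplies the weight count by the temporal kernel size, which is one reason 3D networks tend to need more data and compute than their 2D counterparts.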

(21)

I3D

(22)

I3D

• RGB + optical flow


Carreira, Joao, and Andrew Zisserman. "Quo vadis, action recognition? A new model and the Kinetics dataset." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
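One common way to combine an RGB stream and an optical-flow stream is late fusion: each stream produces class scores and the two are averaged at test time. The sketch below assumes each stream outputs a list of logits. The logit values and function names are made up for illustration.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(rgb_logits, flow_logits):
    """Average the class probabilities of the RGB and flow streams."""
    rgb_p = softmax(rgb_logits)
    flow_p = softmax(flow_logits)
    return [(a + b) / 2 for a, b in zip(rgb_p, flow_p)]


# Hypothetical logits for three action classes.
rgb_logits = [2.0, 1.0, 0.1]
flow_logits = [0.5, 2.5, 0.2]
fused = fuse_two_stream(rgb_logits, flow_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this made-up example the RGB stream prefers class 0, but the flow stream is more confident about class 1, so the fused prediction follows the flow stream.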

(23)

R(2+1)D
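R(2+1)D replaces each t x d x d 3D convolution with a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution. The sketch below compares weight counts for the two designs, choosing the number of intermediate channels so that the totals roughly match, following the rule given in the R(2+1)D paper (Tran et al., 2018). The channel sizes and function names are illustrative assumptions, and biases are ignored.

```python
def full_3d_weights(t, d, n_in, n_out):
    """Weights in a single t x d x d 3D convolution."""
    return t * d * d * n_in * n_out


def mid_channels(t, d, n_in, n_out):
    """Intermediate channels that keep the factorized block near the 3D weight budget."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def factorized_weights(t, d, n_in, n_out, n_mid):
    """Weights in a 1 x d x d spatial conv followed by a t x 1 x 1 temporal conv."""
    spatial = d * d * n_in * n_mid
    temporal = t * n_mid * n_out
    return spatial + temporal


t, d, n_in, n_out = 3, 3, 64, 64  # assumed layer sizes
n_mid = mid_channels(t, d, n_in, n_out)
w_3d = full_3d_weights(t, d, n_in, n_out)
w_2plus1d = factorized_weights(t, d, n_in, n_out, n_mid)
```

With a matched weight budget, the factorized block gains an extra nonlinearity between the spatial and temporal convolutions.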

(24)

Practical aspects

- Pretrain on large datasets

- ImageNet, Kinetics, Sports-1M, YouTube-8M

- Fine-tune on the target dataset

- Loading image weights for video

- ImageNet
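One way to load image weights into a video model, used by I3D, is to "inflate" each 2D kernel into 3D: copy it along the time axis and divide by the number of copies, so that a clip made of one repeated frame produces the same response the image did. A minimal sketch, with a made-up 2x2 kernel and hypothetical helper names:

```python
def inflate_kernel(kernel_2d, time_depth):
    """Repeat a 2D kernel time_depth times along time and rescale by 1/time_depth."""
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def response_2d(patch, kernel_2d):
    """Dot product of an image patch with a 2D kernel."""
    return sum(p * w for prow, krow in zip(patch, kernel_2d) for p, w in zip(prow, krow))


def response_3d(patches, kernel_3d):
    """Dot product of a stack of patches (one per frame) with a 3D kernel."""
    return sum(response_2d(p, k) for p, k in zip(patches, kernel_3d))


kernel_2d = [[1.0, -2.0], [0.5, 4.0]]          # made-up image kernel
kernel_3d = inflate_kernel(kernel_2d, time_depth=4)

frame_patch = [[3.0, 1.0], [2.0, 5.0]]
static_clip = [frame_patch] * 4                # the same frame repeated over time
image_out = response_2d(frame_patch, kernel_2d)
video_out = response_3d(static_clip, kernel_3d)
```

Because the inflated kernel reproduces the image response on a static clip, the pretrained 2D filters remain meaningful as a starting point for fine-tuning on video.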


(25)

Video classification variants

(26)

Recurrent neural network

• Processing sequential data

• Text - “It is fun to learn deep learning.”

• Audio

• Video


(27)

Recurrent neural network

• Comparison with convolutional neural networks

(28)

Recurrent neural network

http://acsweb.ucsd.edu/~stripath/research/VOP.html


(29)

Recurrent neural network

(30)

Recurrent neural network

Source: Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.


(31)

Recurrent neural network

• Process sequences

• Feedback from previous time step

An unrolled recurrent neural network.
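The feedback from the previous time step can be written as h_t = tanh(w_x x_t + w_h h_{t-1} + b). The sketch below unrolls this update over a short sequence, using a scalar input and a scalar hidden state to keep it readable. The weights and the input sequence are arbitrary values chosen for illustration.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One vanilla RNN update: the new state depends on the input and the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Unroll the recurrence over a sequence and return every hidden state."""
    states = []
    h = h0
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# Only the first input is non-zero; later states still carry a trace of it.
states = run_rnn([1.0, 0.0, 0.0])
```

The same weights are reused at every time step, which is what the unrolled diagram depicts: one cell copied along the sequence, each copy passing its state to the next.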

(32)

Recurrent neural network

http://karpathy.github.io

Vanilla Neural Networks

(33)

Recurrent neural network

(34)

Recurrent neural network

http://karpathy.github.io

Sentiment Classification

sequence of words -> sentiment

(35)

Recurrent neural network

(36)

Recurrent neural network

http://karpathy.github.io

Video classification on frame level

(37)

Activation functions - recap

(38)

Vanilla RNN

http://colah.github.io/posts/2015-08-Understanding-LSTMs/


(39)

Long short-term memory (LSTM)

• Three main components

• Forget gate layer

(40)

Long short-term memory (LSTM)

• Three main components

• Forget gate layer

• Input gate layer

http://colah.github.io/posts/2015-08-Understanding-LSTMs/


(41)

Long short-term memory (LSTM)

• Three main components

• Forget gate layer

• Input gate layer

• Output gate layer

(42)

Long short-term memory (LSTM)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Cell state

(43)

Long short-term memory (LSTM)

• Forget gate
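In the notation of the Understanding LSTMs post linked on the neighboring slides, the forget gate looks at the previous hidden state $h_{t-1}$ and the current input $x_t$ and outputs a value between 0 and 1 for each entry of the cell state. This is the standard formulation, written out here as a reference rather than copied from the slide:

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```

Here $\sigma$ is the logistic sigmoid, $[h_{t-1}, x_t]$ is the concatenation of the two vectors, and $W_f$, $b_f$ are learned parameters. A value near 1 keeps the corresponding entry of the previous cell state $C_{t-1}$; a value near 0 discards it.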

(44)

Long short-term memory (LSTM)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Input gate
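In the notation of the post linked above, the input gate decides which entries of the cell state to write to, and a tanh layer proposes the candidate values to write. This is the standard formulation, given as a reference sketch:

```latex
i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)
\qquad
\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
```

Here $h_{t-1}$ is the previous hidden state, $x_t$ is the current input, $[h_{t-1}, x_t]$ is their concatenation, $\sigma$ is the logistic sigmoid, and $W_i$, $b_i$, $W_C$, $b_C$ are learned parameters.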

(45)

Long short-term memory (LSTM)

• Update cell state
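In the standard formulation, the new cell state mixes the old state, scaled by the forget gate $f_t$, with the candidate values $\tilde{C}_t$, scaled by the input gate $i_t$:

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```

Here $\odot$ denotes element-wise multiplication and $C_{t-1}$ is the previous cell state. Since $f_t$ and $i_t$ lie between 0 and 1, each entry of the cell state can be kept, overwritten, or blended.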

(46)

Long short-term memory (LSTM)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Output
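In the notation of the post linked above, the output gate selects which parts of the updated cell state $C_t$ are exposed as the new hidden state. This is the standard formulation, given as a reference sketch:

```latex
o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)
\qquad
h_t = o_t \odot \tanh(C_t)
```

Here $h_{t-1}$ is the previous hidden state, $x_t$ is the current input, $\sigma$ is the logistic sigmoid, $\odot$ is element-wise multiplication, and $W_o$, $b_o$ are learned parameters.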

(47)

LSTM
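Putting the gates together, one LSTM time step can be sketched as follows. To keep it readable, this version uses a scalar input, hidden state, and cell state instead of vectors and weight matrices. The parameter layout, the parameter values, and the function names are assumptions for illustration, not a library API.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update with scalar input, hidden state, and cell state.

    p maps each of 'f', 'i', 'c', 'o' to a tuple (w_h, w_x, b).
    """
    def pre(name):
        w_h, w_x, b = p[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("f"))              # forget gate
    i_t = sigmoid(pre("i"))              # input gate
    c_tilde = math.tanh(pre("c"))        # candidate cell value
    c_t = f_t * c_prev + i_t * c_tilde   # updated cell state
    o_t = sigmoid(pre("o"))              # output gate
    h_t = o_t * math.tanh(c_t)           # new hidden state
    return h_t, c_t


# Arbitrary parameters: a large forget bias and a very negative input bias
# make the cell hold on to its previous state almost unchanged.
params = {
    "f": (0.0, 0.0, 10.0),
    "i": (0.0, 0.0, -10.0),
    "c": (0.5, 0.5, 0.0),
    "o": (0.0, 0.0, 10.0),
}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.7, p=params)
```

With these settings the cell state stays close to its previous value of 0.7, which illustrates how the gates let an LSTM carry information across time steps.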

(48)

Keras code

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(128))  # LSTM layer with 128 hidden units


(49)

Video activity recognition

(50)

Questions?
