CAP5415 Computer Vision

Academic year: 2022

(1)

CAP5415

Computer Vision

Yogesh S Rawat [email protected]

HEC-241

(2)

Administrative details

11/25/2020 CAP5415 - Lecture 19 2

(3)

Administrative

• Mid-term

• 12/01/2020 Tuesday

• 4:30-5:45pm

• Same format as first mid-term

• Syllabus – All lectures until today (except first mid-term syllabus)

• No more homework

• SPI – Student Perception of Instruction

• Online questionnaire includes ONLY eleven (11) questions

• 9 multiple choice and 2 open comment questions

(4)

Administrative

• Project proposal missing

• Individual contribution in projects

• State how you contributed

• If there is any collaboration on something

• It should be stated in all such reports

• If there is any discrepancy, we will contact you!

(5)

Questions?

(6)

Action Recognition Lecture 19

(7)

Outline

• Action Recognition

• 3D CNN

• C3D

• I3D

• R(2+1)D

• Recurrent neural network

(8)

Video

• Sequences of frames

• 30 frames per second
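To make the frame-rate arithmetic concrete, here is a minimal sketch that counts the frames in a video and splits the frame indices into fixed-length clips. The function names and the 16-frame clip length are assumptions chosen for illustration; they do not come from the slides.

```python
def num_frames(duration_s, fps=30):
    """Number of frames in a video of the given duration (in seconds)."""
    return int(duration_s * fps)


def split_into_clips(n_frames, clip_len=16):
    """Split frame indices 0..n_frames-1 into non-overlapping clips.

    Trailing frames that do not fill a whole clip are dropped.
    """
    return [
        list(range(start, start + clip_len))
        for start in range(0, n_frames - clip_len + 1, clip_len)
    ]


frames = num_frames(2.0)          # a 2-second video at 30 fps
clips = split_into_clips(frames)  # hypothetical 16-frame clips
```

Each clip is a short sequence of consecutive frames, which is the kind of input the video models later in the lecture operate on.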

(9)

Sequences of Images

(10)

Action recognition

• Given a video

• Recognize which action is present

(11)

Action recognition

• Variations

• Multiple instances

(12)

Action recognition

• Variations

• Multiple actions

(13)

Action recognition

• Variations

• Trimmed/untrimmed

• Temporal action localization

(14)

Action detection

• Spatio-temporal localization

(15)

Action recognition – UCF101

Example classes (figure labels): Cycling, Diving, Golf Swinging, Riding, Basketball Shooting, Swinging, Tennis Swinging, Volleyball Spiking

(16)

Action detection - VIRAT

(17)

Video segmentation

(18)

CNN based solutions

• C3D

• I3D

• R(2+1)D

(19)

3D convolution
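A 3D convolution slides a kernel over time as well as over height and width, so each output value summarizes a small spatio-temporal volume. The following is a naive single-channel sketch in plain Python, with no padding and stride 1, written only to illustrate the indexing. The function name and the toy data are hypothetical, and real frameworks use far more efficient implementations.

```python
def conv3d_valid(video, kernel):
    """Naive single-channel 3D convolution (cross-correlation, no padding).

    video  : nested list indexed [t][y][x]
    kernel : nested list indexed [dt][dy][dx]
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kt x kh x kw neighborhood in time and space.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy example: a 4-frame, 5x5 video of ones and a 3x3x3 averaging kernel.
video = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
kernel = [[[1.0 / 27] * 3 for _ in range(3)] for _ in range(3)]
out = conv3d_valid(video, kernel)
```

With a 3x3x3 kernel and no padding, the 4x5x5 input shrinks to 2x3x3: every dimension, including time, loses two positions.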

(20)

C3D

• 8 3D conv layers

• 3x3x3 kernels

• 5 max pooling layers

• 2 fully connected layers

Tran, Du, et al. "Learning spatiotemporal features with 3D convolutional networks." Proceedings of the IEEE International Conference on Computer Vision. 2015.
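The layer list above can be turned into some quick arithmetic. The sketch below counts the parameters of a single 3x3x3 convolution and tracks how pooling shrinks a clip. The 16x112x112 input size, the 3-to-64 channel mapping, the spatial-only first pool, and the helper names are assumptions made for illustration; the slide states none of them. Only the first four pooling layers are traced, because the size after the fifth depends on padding details.

```python
def conv3d_params(c_in, c_out, k=(3, 3, 3)):
    """Weights plus biases of one 3D convolution layer."""
    kt, kh, kw = k
    return kt * kh * kw * c_in * c_out + c_out


def pool3d_shape(shape, pool):
    """Output (T, H, W) of pooling with stride equal to the pool size (floor mode)."""
    return tuple(s // p for s, p in zip(shape, pool))


# Assumption: the first layer maps 3 RGB channels to 64 feature maps.
first_layer = conv3d_params(3, 64)

# Assumption: 3x3x3 convolutions with padding 1 keep (T, H, W) unchanged,
# so only the pooling layers change the shape.
shape = (16, 112, 112)
shapes = [shape]
for pool in [(1, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)]:
    shape = pool3d_shape(shape, pool)
    shapes.append(shape)
```

Under these assumptions the temporal dimension is preserved by the first pool and halved by each later one, so temporal information is merged gradually rather than collapsed in the first layer.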

(21)

I3D

(22)

I3D

• RGB + optical flow

Carreira, Joao, and Andrew Zisserman. "Quo vadis, action recognition? A new model and the Kinetics dataset." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

(23)

R(2+1)D
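R(2+1)D replaces a full t x d x d 3D convolution with a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution. The sketch below compares parameter counts for the two forms, with biases ignored. The rule for choosing the number of intermediate channels is recalled from the R(2+1)D paper and does not appear on the slide, so treat it and the function names as assumptions.

```python
def full_3d_params(n_in, n_out, t=3, d=3):
    """Weights of one full t x d x d 3D convolution."""
    return t * d * d * n_in * n_out


def mid_channels(n_in, n_out, t=3, d=3):
    """Intermediate channel count chosen so the factored block roughly
    matches the parameter count of the full 3D convolution (assumed rule)."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def factored_params(n_in, n_out, t=3, d=3):
    """Weights of a 1 x d x d spatial conv followed by a t x 1 x 1 temporal conv."""
    m = mid_channels(n_in, n_out, t, d)
    spatial = d * d * n_in * m
    temporal = t * m * n_out
    return spatial + temporal


full = full_3d_params(64, 64)
mid = mid_channels(64, 64)
factored = factored_params(64, 64)
```

For 64 input and 64 output channels the two counts coincide, so any difference in behavior comes from the factorization and the extra nonlinearity between the two convolutions, not from model size.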

(24)

Practical aspects

- Pretrain on large datasets

- ImageNet, Kinetics, Sports-1M, YouTube-8M

- Fine-tune on the target dataset

- Loading image weights for video

- ImageNet
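One way to load image weights into a video model, described in the I3D paper, is to "inflate" each 2D kernel: repeat it along the time axis and divide by the number of repeats. The sketch below illustrates this on a single 3x3 filter. The helper names and the toy numbers are hypothetical, and this is an illustration of the idea, not the authors' code.

```python
def inflate_2d_kernel(kernel_2d, t):
    """Repeat a 2D kernel t times along time and rescale by 1/t."""
    return [[[w / t for w in row] for row in kernel_2d] for _ in range(t)]


def response_2d(patch, kernel_2d):
    """Response of a 2D filter on one image patch of the same size."""
    return sum(p * w for prow, krow in zip(patch, kernel_2d)
               for p, w in zip(prow, krow))


def response_3d(clip, kernel_3d):
    """Response of a 3D filter on a clip of patches with the same shape."""
    return sum(response_2d(frame, k2d) for frame, k2d in zip(clip, kernel_3d))


kernel_2d = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0], [4.0, 1.0, 1.0], [5.0, 2.0, 2.0]]

kernel_3d = inflate_2d_kernel(kernel_2d, 3)
static_clip = [patch] * 3  # a "boring" video: the same frame repeated

r2d = response_2d(patch, kernel_2d)
r3d = response_3d(static_clip, kernel_3d)
```

On a video made of one repeated frame, the inflated filter gives the same response as the original 2D filter on that frame, so the pretrained image features are preserved at initialization.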

(25)

Video classification variants

(26)

Recurrent neural network

• Processing sequential data

• Text - “It is fun to learn deep learning.”

• Audio

• Video

(27)

Recurrent neural network

• Comparison with convolutional neural networks

(28)

Recurrent neural network

http://acsweb.ucsd.edu/~stripath/research/VOP.html

(29)

Recurrent neural network

(30)

Recurrent neural network

Source: Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.

(31)

Recurrent neural network

• Process sequences

• Feedback from previous time step

An unrolled recurrent neural network.
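The feedback from the previous time step can be written as h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). Below is a minimal plain-Python sketch of this vanilla recurrence, unrolled over a short sequence. The weight values, the sizes, and the function names are made up for illustration.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One step of a vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h = []
    for i in range(len(b)):
        s = b[i]
        s += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        s += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(s))
    return h


def run_rnn(xs, h0, W_xh, W_hh, b):
    """Unroll the RNN over a sequence, reusing the same weights at every step."""
    h = h0
    hs = []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        hs.append(h)
    return hs


# Hypothetical weights: 2 input features, 2 hidden units.
W_xh = [[0.5, -0.2], [0.1, 0.3]]
W_hh = [[0.0, 0.4], [-0.3, 0.0]]
b = [0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

hs = run_rnn(xs, [0.0, 0.0], W_xh, W_hh, b)
```

The list `hs` holds one hidden state per time step. Using only the last state corresponds to sequence-level classification, while using every state corresponds to per-step outputs.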

(32)

Recurrent neural network

http://karpathy.github.io

Vanilla Neural Networks

(33)

Recurrent neural network

(34)

Recurrent neural network

http://karpathy.github.io

Sentiment Classification

sequence of words -> sentiment

(35)

Recurrent neural network

(36)

Recurrent neural network

http://karpathy.github.io

Video classification on frame level

(37)

Activation functions - recap

(38)

Vanilla RNN

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

(39)

Long short-term memory (LSTM)

• Three main components

• Forget gate layer

(40)

Long short-term memory (LSTM)

• Three main components

• Forget gate layer

• Input gate layer

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

(41)

Long short-term memory (LSTM)

• Three main components

• Forget gate layer

• Input gate layer

• Output gate layer

(42)

Long short-term memory (LSTM)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Cell state

(43)

Long short-term memory (LSTM)

• Forget gate

(44)

Long short-term memory (LSTM)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Input gate

(45)

Long short-term memory (LSTM)

• Update cell state

(46)

Long short-term memory (LSTM)

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

• Output

(47)

LSTM
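The forget gate, input gate, cell-state update, and output steps can be combined into a single LSTM step. The sketch below follows the standard formulation in the colah.github.io post cited on these slides, where each gate acts on the concatenation [h_{t-1}, x_t]. The parameter dictionary layout, the function names, and the toy values are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step; p holds W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o."""
    z = h_prev + x  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]    # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]    # input gate
    g = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # candidate values
    o = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]    # output gate
    # Update cell state: keep part of the old state, add part of the candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # Output: a gated, squashed view of the cell state.
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Toy check: 2 hidden units, 1 input feature, all weights and biases zero,
# so every gate outputs 0.5 and the candidate values are 0.
zeros_W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
zeros_b = [0.0, 0.0]
params = {k: zeros_W for k in ("W_f", "W_i", "W_c", "W_o")}
params.update({k: zeros_b for k in ("b_f", "b_i", "b_c", "b_o")})

h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params)
```

In this toy setting the forget gate halves the old cell state and nothing new is written, which makes the role of each gate easy to check by hand.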

(48)

Keras code

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(128))  # LSTM layer with 128 hidden units
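To get a sense of the size of such a layer: an LSTM with bias terms has four weight sets (forget gate, input gate, candidate, output gate), each acting on the previous hidden state and the current input. The sketch below counts its trainable parameters. The 512-dimensional input is a hypothetical per-frame feature size, not a value from the slide.

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters of a standard LSTM layer with bias terms.

    Each of the 4 weight sets has a (units x input_dim) input matrix,
    a (units x units) recurrent matrix, and a bias vector of length units.
    """
    return 4 * (units * (units + input_dim) + units)


# Hypothetical setting: 128 hidden units fed with 512-dimensional features.
n_params = lstm_param_count(128, 512)
```

The count grows quadratically with the number of units, which is one reason per-frame features are often reduced in dimension before being passed to a recurrent layer.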

(49)

Video activity recognition

(50)

Questions?

Sources for this lecture include materials from works by

References
