Administrative details
11/25/2020 CAP5415 - Lecture 19 2
Administrative
• Mid-term
• 12/01/2020 Tuesday
• 4:30-5:45pm
• Same format as first mid-term
• Syllabus – All lectures through today (excluding material covered by the first mid-term)
• No more homework
• SPI – Student Perception of Instruction
• Online questionnaire includes ONLY eleven (11) questions
• 9 multiple choice and 2 open comment questions
Administrative
• Project proposal missing
• Individual contribution in projects
• State how you contributed
• If you collaborated with anyone on any part of the project
• State it in every report that covers that part
• If there is any discrepancy, we will contact you!
Questions?
Action Recognition Lecture 19
Outline
• Action Recognition
• 3D CNN
• C3D
• I3D
• R(2+1)D
• Recurrent neural network
Video
• Sequences of frames
• 30 frames per second
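A video can be held in memory as a stack of image arrays. The sketch below is only an illustration: the clip length, frame size, and 16-frame clip are hypothetical values, and only the 30 frames per second figure comes from the slide.

```python
import numpy as np

FPS = 30                                # frame rate stated on the slide
duration_s = 2                          # hypothetical clip length in seconds
height, width, channels = 112, 112, 3   # hypothetical RGB frame size

# A video is a sequence of frames: (time, height, width, channels).
num_frames = FPS * duration_s
video = np.zeros((num_frames, height, width, channels), dtype=np.uint8)

# Video models usually consume short fixed-length clips rather than whole videos.
clip = video[:16]

# Timestamp (in seconds) of every frame.
timestamps = np.arange(num_frames) / FPS
```

Indexing along the first axis selects frames in time, while the remaining axes are the usual image dimensions.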
Sequences of Images
Action recognition
• Given a video
• Recognize which action is present
Action recognition
• Variations
• Multiple instances
Action recognition
• Variations
• Multiple actions
Action recognition
• Variations
• Trimmed/untrimmed
• Temporal action localization
Action detection
• Spatio-temporal localization
Action recognition – UCF101
Example classes: Cycling, Diving, Golf Swinging, Riding, Basketball Shooting, Swinging, Tennis Swinging, Volleyball Spiking
Action detection - VIRAT
Video segmentation
CNN based solutions
• C3D
• I3D
• R(2+1)D
3D convolution
C3D
• 8 3D conv layers
• 3x3x3 kernels
• 5 max pooling layers
• 2 fully connected layers
Tran, Du, et al. "Learning spatiotemporal features with 3D convolutional networks." Proceedings of the IEEE International Conference on Computer Vision. 2015.
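The 3x3x3 kernels listed for C3D slide over time as well as over height and width. The following is a minimal, deliberately slow NumPy sketch of one single-channel 3D convolution with no padding and stride 1. The function name and the toy input sizes are illustrative assumptions, not the C3D implementation.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive single-channel 3D convolution (cross-correlation), no padding, stride 1."""
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The kernel covers kt consecutive frames and a kh x kw patch.
                patch = video[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(patch * kernel)
    return out

clip = np.ones((8, 16, 16))      # hypothetical clip: 8 frames of 16x16 pixels
kernel = np.ones((3, 3, 3))      # 3x3x3 kernel, as in C3D
features = conv3d_valid(clip, kernel)
```

Unlike a 2D convolution applied frame by frame, the output keeps a time axis, so later layers can still model motion.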
I3D
• RGB + optical flow
Carreira, Joao, and Andrew Zisserman. "Quo vadis, action recognition? A new model and the Kinetics dataset." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
R(2+1)D
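The slide names R(2+1)D without detail. As described in the R(2+1)D paper (Tran et al., CVPR 2018), a t x d x d 3D convolution is factorized into a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution, with the number of intermediate channels chosen so that the parameter count roughly matches the full 3D convolution. The sketch below only counts weights (biases ignored); the function names are hypothetical.

```python
def conv3d_params(c_in, c_out, t=3, d=3):
    """Weights in a full t x d x d 3D convolution."""
    return t * d * d * c_in * c_out

def mid_channels(c_in, c_out, t=3, d=3):
    """Intermediate channel count that keeps the factorized block near the same size."""
    return (t * d * d * c_in * c_out) // (d * d * c_in + t * c_out)

def r2plus1d_params(c_in, c_out, t=3, d=3):
    """Weights in a 1 x d x d spatial conv followed by a t x 1 x 1 temporal conv."""
    m = mid_channels(c_in, c_out, t, d)
    spatial = d * d * c_in * m
    temporal = t * m * c_out
    return spatial + temporal

full = conv3d_params(64, 64)
factored = r2plus1d_params(64, 64)
```

For 64 input and 64 output channels the two counts coincide, so the factorized block adds an extra nonlinearity between the spatial and temporal steps without adding parameters.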
Practical aspects
- Pretrain on large datasets
- ImageNet, Kinetics, Sports-1M, YouTube-8M
- Fine-tune on the target dataset
- Load image weights for video
- ImageNet
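One common way to load image weights into a video model is the kernel "inflation" used by I3D (Carreira and Zisserman, cited above): each 2D kernel is repeated along a new time axis and divided by the temporal depth. The sketch below assumes a (height, width, in-channels, out-channels) weight layout and uses random weights as a stand-in for ImageNet-pretrained ones.

```python
import numpy as np

def inflate_2d_kernel(w2d, time_depth):
    """Repeat a 2D kernel (kh, kw, c_in, c_out) along a new time axis, scaled by 1/time_depth."""
    w3d = np.repeat(w2d[np.newaxis, ...], time_depth, axis=0)
    return w3d / time_depth

rng = np.random.default_rng(0)
w2d = rng.normal(size=(3, 3, 3, 8))   # stand-in for a pretrained 3x3 image kernel
w3d = inflate_2d_kernel(w2d, time_depth=3)
```

Because the temporal slices sum back to the original kernel, the inflated filter gives the same response on a video of identical repeated frames as the 2D filter gives on a single frame.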
Video classification variants
Recurrent neural network
• Processing sequential data
• Text - “It is fun to learn deep learning.”
• Audio
• Video
Recurrent neural network
• Comparison with convolutional neural networks
Recurrent neural network
http://acsweb.ucsd.edu/~stripath/research/VOP.html
Recurrent neural network
Source: Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
• Process sequences
• Feedback from previous time step
An unrolled recurrent neural network.
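Unrolling means applying the same cell, with the same weights, once per time step, with the previous hidden state fed back in. The following is a minimal NumPy sketch of a vanilla (tanh) RNN; the weight names, sizes, and random inputs are illustrative assumptions.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])          # initial hidden state
    hidden_states = []
    for x_t in xs:                       # one iteration per time step
        # Feedback: the new state depends on the input and the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return np.stack(hidden_states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 5, 4, 3       # hypothetical sizes
xs = rng.normal(size=(T, input_dim))
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)

hidden_states = rnn_forward(xs, W_xh, W_hh, b_h)
```

The loop body is the single recurrent cell; the unrolled diagram is just this loop drawn out across time.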
Recurrent neural network
http://karpathy.github.io
Vanilla Neural Networks
Recurrent neural network
http://karpathy.github.io
Sentiment Classification
sequence of words -> sentiment
Recurrent neural network
http://karpathy.github.io
Video classification at the frame level
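The difference between sequence-level prediction (as in sentiment classification) and frame-level prediction lies in which hidden states are read out. The sketch below uses random vectors as a stand-in for RNN hidden states and a hypothetical shared output matrix `W_hy`; it illustrates the two readouts and is not a trained model.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, hidden_dim, num_classes = 6, 8, 4                 # hypothetical sizes
hidden_states = rng.normal(size=(T, hidden_dim))     # stand-in for RNN outputs
W_hy = rng.normal(size=(hidden_dim, num_classes))    # hypothetical output weights

# Many-to-one: a single label for the whole sequence, read from the last state.
sequence_probs = softmax(hidden_states[-1] @ W_hy)

# Many-to-many: one label per frame, read from every state.
frame_probs = softmax(hidden_states @ W_hy)
frame_labels = frame_probs.argmax(axis=-1)
```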
Activation functions - recap
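The slide does not list which activations were recapped. The two that the RNN and LSTM material relies on are presumably the sigmoid and tanh functions, written out here for reference:

```latex
\[
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma(x) \in (0, 1)
\]
\[
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \tanh(x) \in (-1, 1)
\]
```

The sigmoid's (0, 1) range makes it suitable as a gate, while tanh keeps state values centered around zero.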
Vanilla RNN
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long short-term memory (LSTM)
• Three main components
• Forget gate layer
• Input gate layer
• Output gate layer
Long short-term memory (LSTM)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Cell state
Long short-term memory (LSTM)
• Forget gate
Long short-term memory (LSTM)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Input gate
Long short-term memory (LSTM)
• Update cell state
Long short-term memory (LSTM)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Output
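The four steps (forget gate, input gate, cell state update, output) can be sketched as a single time step in NumPy, following the formulation in the colah.github.io post linked on these slides. The dictionary-based weight layout, sizes, and random initial values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[k] has shape (hidden, hidden + input) for k in 'f', 'i', 'c', 'o'."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate: what to drop from the cell
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate: what to write to the cell
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell values
    c_t = f_t * c_prev + i_t * c_tilde         # update cell state
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate: what to expose
    h_t = o_t * np.tanh(c_t)                   # new hidden state (the output)
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3                   # hypothetical sizes
W = {k: rng.normal(size=(hidden_dim, hidden_dim + input_dim)) * 0.1 for k in "fico"}
b = {k: np.zeros(hidden_dim) for k in "fico"}

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):    # run over a 5-step toy sequence
    h, c = lstm_step(x_t, h, c, W, b)
```

The cell state `c` is changed only by elementwise scaling and addition, which is what lets information persist over many time steps.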
LSTM
Keras code
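from keras.models import Sequential
from keras.layers import LSTM
model = Sequential()  # assumed container; model.add implies a Sequential model
model.add(LSTM(128))  # LSTM layer with 128 hidden units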
Video activity recognition
Questions?