• Laboratoire Informatique Fondamentale de Lille
UNIVERSITE DES SCIENCES ET TECHNOLOGIES DE LILLE LIFL – UMR 8022 – Bât. M3 – 59655 Villeneuve d’Ascq cedex Tél. : (33) 3 28 77 85 41 – Fax : (33) 3 28 77 85 39 – e-mail : … @lifl.fr
Human behavior analysis from videos using optical flow
Yassine Benabbas
Thesis advisor: Chabane Djeraba
Multitel Workshop 2011
Outline
• Introduction
• State of the Art
• Global approach
– Human action recognition
– Crowd event detection
– Motion pattern extraction
• Conclusion
Introduction
• Automatic behavior analysis is a very active field in research and industry
• It consists of extracting information from videos using computer vision algorithms
• The extracted information is used to:
– Assist surveillance operators
– Provide statistics for marketing agents
– Perform video retrieval
– Allow more natural and immersive human-machine interactions
– etc.
State of the art
• Many approaches have been proposed for behavior analysis
– Human activity recognition [Le et al., CVPR 2011]
– Crowd event detection [Adam et al., TPAMI 2008]
– Motion pattern extraction [Rodriguez et al., ICCV 2009]
• However, they focus on a single aspect of behavior analysis or are very complex
– Example: dynamic textures [Ma and Cisar, CVPR 2009]
• Privacy issues are not addressed
• Intelligent cameras that contain embedded software require fast and reusable algorithms
Our approach
• We propose a generic approach for behavior analysis
• It is based on three levels of features
– Easier understanding
• Each level can be designed separately
– More control
• Each level can be reused for other purposes
– Save more processing power
• The lower level relies on motion information
– Preserves privacy ‘out of the box’
General Approach
High level information
Mid-level descriptors
Low level features
Video stream
Applications
• Human action recognition
• Crowd event detection
• Motion pattern extraction
LOW LEVEL FEATURES
Interest point detection
• Identify ‘good’ points that can be tracked easily and efficiently.
• We used the "Good Features to Track" algorithm
– Fast and efficient OpenCV implementation
– J. Shi and C. Tomasi, "Good features to track," Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '94), pp. 593-600, 21-23 June 1994. doi: 10.1109/CVPR.1994.323794
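As a rough illustration of what makes a point "good", the sketch below computes the Shi-Tomasi score (the smaller eigenvalue of the local gradient structure tensor) in plain Python. The function name, the window size and the list-of-lists image layout are assumptions made for this example; in practice OpenCV exposes the detector as `cv2.goodFeaturesToTrack`.

```python
import math

def min_eigenvalue_score(img, x, y, half_win=1):
    """Shi-Tomasi score at (x, y): smaller eigenvalue of the 2x2 structure
    tensor accumulated over a (2*half_win+1)^2 window.
    img is a list of rows of grayscale values (illustrative layout)."""
    sxx = sxy = syy = 0.0
    for v in range(y - half_win, y + half_win + 1):
        for u in range(x - half_win, x + half_win + 1):
            # Central differences approximate the image gradient.
            gx = (img[v][u + 1] - img[v][u - 1]) / 2.0
            gy = (img[v + 1][u] - img[v - 1][u]) / 2.0
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    # Closed-form smaller eigenvalue of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace - root
```

A corner yields a high score because the gradient varies in two directions; a straight edge or a flat area yields a score close to zero, so such points are not selected.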
Optical flow computation
• Estimate the motion of interest points
• We use Bouguet's pyramidal implementation of the Lucas-Kanade tracker
[Figure: frame t with its interest points + frame t+1 = optical flow vectors]
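To make the motion estimation step concrete, here is a minimal single-scale Lucas-Kanade step in plain Python. It is a simplification, not Bouguet's pyramidal, iterative scheme: it solves the 2x2 least-squares system once, at one resolution. The function name, window size and image layout are assumptions for this sketch.

```python
def lucas_kanade_step(prev, nxt, x, y, half_win=2):
    """Estimate the displacement (dx, dy) of the point (x, y) between two
    grayscale frames (lists of rows) with one Lucas-Kanade step.
    Returns None when the window has too little texture to be tracked."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for v in range(y - half_win, y + half_win + 1):
        for u in range(x - half_win, x + half_win + 1):
            ix = (prev[v][u + 1] - prev[v][u - 1]) / 2.0  # spatial gradient in x
            iy = (prev[v + 1][u] - prev[v - 1][u]) / 2.0  # spatial gradient in y
            it = nxt[v][u] - prev[v][u]                   # temporal difference
            a11 += ix * ix
            a12 += ix * iy
            a22 += iy * iy
            b1 -= ix * it
            b2 -= iy * it
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-9:
        return None
    # Solve [[a11, a12], [a12, a22]] * (dx, dy) = (b1, b2) by Cramer's rule.
    dx = (a22 * b1 - a12 * b2) / det
    dy = (a11 * b2 - a12 * b1) / det
    return dx, dy
```

This single step is only accurate for small displacements; the pyramidal scheme handles larger motions by running the same estimate from coarse to fine resolutions.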
MID-LEVEL FEATURES: DIRECTION MODEL AND MAGNITUDE MODEL
Vector allocation to blocks
• Each vector is allocated to a block according to its origin
• Vectors with a very small or a very large magnitude are discarded
[Figure: optical flow vectors allocated to a grid of 8x4 blocks]
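The allocation and filtering steps can be sketched as follows. The tuple layout `(x, y, dx, dy)`, the function name and the threshold parameters are illustrative assumptions; the slides do not give the actual threshold values.

```python
import math

def allocate_to_blocks(vectors, block_size, min_mag, max_mag):
    """Group optical flow vectors by the block that contains their origin.

    vectors: iterable of (x, y, dx, dy) with (x, y) the vector origin.
    Returns {(col, row): [(angle, magnitude), ...]} with angles in [0, 2*pi).
    """
    blocks = {}
    for x, y, dx, dy in vectors:
        mag = math.hypot(dx, dy)
        if mag < min_mag or mag > max_mag:
            continue  # discard very small and very large vectors
        key = (int(x // block_size), int(y // block_size))
        angle = math.atan2(dy, dx) % (2 * math.pi)
        blocks.setdefault(key, []).append((angle, mag))
    return blocks
```

Each block then holds the orientation and magnitude samples from which the mid-level models are estimated.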
Direction model
• The orientations of the optical flow vectors are clustered in each block
• This circular data is clustered using von Mises distributions
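For readers unfamiliar with circular statistics, the sketch below fits a single von Mises distribution to a set of angles and evaluates its density. It is a simplification: the direction model uses a mixture per block, whereas this example fits one component. The function names are illustrative, and the concentration estimate uses a standard closed-form approximation rather than an exact maximum-likelihood solve.

```python
import math

def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total = term = 1.0
    k = 1
    while term > 1e-16 * total:
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
        k += 1
    return total

def von_mises_pdf(theta, mu, kappa):
    """Density of a von Mises distribution with mean mu and concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * bessel_i0(kappa))

def fit_von_mises(angles):
    """Estimate (mu, kappa) from a non-empty list of angles in radians."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    mu = math.atan2(s, c) % (2 * math.pi)
    # Mean resultant length, capped so that kappa stays finite (guard chosen
    # for this sketch, not taken from the slides).
    r = min(math.hypot(c, s), 0.99)
    kappa = r * (2 - r * r) / (1 - r * r)
    return mu, kappa
```

Unlike a Gaussian on raw angles, this treats 0 and 2π as the same direction, which is why a circular distribution is needed for orientations.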
Direction model (2)
• The direction model is updated at each new frame for the whole duration of the video clip
[Figures: optical flow and direction model at t=0, t=40, t=115 and t=160; block size: 20x20]
Magnitude model
• The magnitude model is estimated following the same steps as the direction model
• We estimate a Gaussian mixture for each block
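As an illustration of the per-block magnitude model, the following sketch fits a one-dimensional Gaussian mixture with batch EM. Batch EM, the initialization and the default parameters are assumptions of this example; the slides only state that a Gaussian mixture is estimated for each block and updated like the direction model.

```python
import math

def fit_gmm_1d(values, k=2, iters=50, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture to `values` with EM.
    Returns a list of (weight, mean, variance) tuples."""
    lo, hi = min(values), max(values)
    # Spread the initial means evenly over the data range.
    means = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    variances = [max(((hi - lo) / k) ** 2, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in values:
            p = [w * math.exp(-(v - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
                 for w, m, s in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var, min_var)  # floor avoids collapse
    return list(zip(weights, means, variances))
```

For example, a block crossed by both slow and fast movers would end up with one component centered on each typical speed.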
APPLICATIONS
Human Action Recognition
• Different terminologies (action, activity, event)
• In this presentation, action recognition means identifying simple everyday actions (e.g., walking, running)
• Our input is a video (query video) captured from a monocular camera
[Figure: example actions "answer the phone" and "boxing"]
Model associated to a video sequence
Model of a video = (direction model, magnitude model)
Distance metric
[Figure: the query model is compared with the template models (running, jogging, handwaving, handclapping, boxing, walking) to obtain the detected event]
Distance between two direction models
Distance between two magnitude models
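The matching step can be sketched as a nearest-template search. The distance used below is a placeholder invented for this example (the actual formulas are not reproduced here): each model is reduced to one `(orientation, magnitude)` pair per block, and the weights `w_dir` and `w_mag` are hypothetical parameters.

```python
import math

def angular_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def model_distance(m1, m2, w_dir=1.0, w_mag=1.0):
    """Placeholder distance between two models {block: (angle, magnitude)}.
    NOT the metric of the original work; it only illustrates the matching."""
    keys = set(m1) | set(m2)
    if not keys:
        return 0.0
    total = 0.0
    for key in keys:
        if key in m1 and key in m2:
            (a1, s1), (a2, s2) = m1[key], m2[key]
            total += w_dir * angular_diff(a1, a2) / math.pi + w_mag * abs(s1 - s2)
        else:
            total += w_dir + w_mag  # fixed penalty for a block missing in one model
    return total / len(keys)

def classify(query, templates):
    """Return the label of the template model closest to the query model."""
    return min(templates, key=lambda label: model_distance(query, templates[label]))
```

With a real direction and magnitude distance plugged into `model_distance`, `classify` returns the action label of the closest template.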
Result comparison
ADL dataset
KTH dataset
[BALD11] Yassine Benabbas, Samir Amir, Adel Lablack, and Chabane Djeraba. Human action recognition using direction and magnitude models of motion. In International Conference on Computer Vision and Applications (VISAPP), 2011
Crowd Event Detection
• Objective:
– Detect interesting events or situations that occur in a crowd scene
• The targeted events are:
– Running
– Splitting
– Local dispersion
– Evacuation
– Merging
• These events are defined in the PETS’2009 workshop.
Compute the instantaneous direction model
• Compute the direction model for the current frame
• Keep only the main orientation for each block of the direction model
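Keeping the main orientation of each block amounts to selecting the heaviest mixture component. The sketch below assumes, for illustration only, that each block stores its components as `(weight, mean_angle)` pairs.

```python
def main_orientations(direction_model):
    """direction_model: {block: [(weight, mean_angle), ...]}.
    Returns {block: mean_angle of the component with the largest weight};
    blocks without any component are skipped."""
    return {
        block: max(components, key=lambda comp: comp[0])[1]
        for block, components in direction_model.items()
        if components
    }
```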
Group Clustering and Tracking
• Cluster neighboring blocks that have a similar direction into a group.
• Define an orientation and a centroid for each group.
• Each group is tracked over the following frames.
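The grouping step can be read as a connected-component search over the block grid. The sketch below makes several assumptions that the slides do not specify: 4-connectivity, an angular threshold `max_diff`, and a circular mean as the group orientation. Tracking over frames is not shown.

```python
import math
from collections import deque

def angular_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def cluster_groups(orientations, max_diff=math.pi / 6):
    """orientations: {(col, row): main angle of the block}.
    Returns a list of groups, each with its blocks, centroid and orientation."""
    seen, groups = set(), []
    for start in sorted(orientations):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first growth of the group
            c, r = queue.popleft()
            members.append((c, r))
            for nb in ((c + 1, r), (c - 1, r), (c, r + 1), (c, r - 1)):
                if (nb in orientations and nb not in seen and
                        angular_diff(orientations[(c, r)], orientations[nb]) <= max_diff):
                    seen.add(nb)
                    queue.append(nb)
        n = len(members)
        s = sum(math.sin(orientations[m]) for m in members)
        co = sum(math.cos(orientations[m]) for m in members)
        groups.append({
            "blocks": members,
            "centroid": (sum(c for c, _ in members) / n, sum(r for _, r in members) / n),
            "orientation": math.atan2(s, co) % (2 * math.pi),  # circular mean
        })
    return groups
```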
Event detection
• We use two classifiers:
– One for running and walking events using the mean motion speed as a feature
– One for local dispersion, split, merge and evacuation events using as features:
• Number of groups
• Mean orientation
• The circular variance
• Mean motion speed
• The mean distance between groups
– Using two classifiers allows to detect
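The features listed above can be computed from the tracked groups as sketched below. The group representation (dicts with `centroid`, `orientation` and `speed` keys) and the function name are assumptions made for this example; the classifiers themselves are not shown.

```python
import math
from itertools import combinations

def crowd_features(groups):
    """groups: list of dicts with 'centroid' (x, y), 'orientation' (radians)
    and 'speed'. Returns the frame-level features used for event detection."""
    n = len(groups)
    if n == 0:
        return {"num_groups": 0, "mean_orientation": 0.0, "circular_variance": 0.0,
                "mean_speed": 0.0, "mean_distance": 0.0}
    c = sum(math.cos(g["orientation"]) for g in groups) / n
    s = sum(math.sin(g["orientation"]) for g in groups) / n
    pairs = list(combinations(groups, 2))
    distances = [math.hypot(a["centroid"][0] - b["centroid"][0],
                            a["centroid"][1] - b["centroid"][1]) for a, b in pairs]
    return {
        "num_groups": n,
        "mean_orientation": math.atan2(s, c) % (2 * math.pi),
        # 1 - mean resultant length: 0 when all groups share one direction,
        # close to 1 when the directions cancel out.
        "circular_variance": 1.0 - math.hypot(c, s),
        "mean_speed": sum(g["speed"] for g in groups) / n,
        "mean_distance": sum(distances) / len(distances) if distances else 0.0,
    }
```

Intuitively, groups moving in opposite directions give a circular variance near 1, and a growing mean distance between groups indicates that the crowd is spreading.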
Comparison
[BID11] Yassine Benabbas, Nacim Ihaddadene, and Chabane Djeraba. Motion pattern extraction and event detection for automatic visual surveillance. EURASIP Journal on Image and Video Processing, 2011:15, 2011
Motion Pattern Extraction
• It consists of extracting usual (or repetitive) patterns (or trends) of motion
• It can be seen as a summary of the motion behavior in a video
Motion Pattern Extraction
• Motion patterns learned from a given scene can be used to model the usual behavior of subjects and have many applications:
– They provide relevant information about subjects’ behavior.
– They can improve tracking results.
– They can help to detect events.
• Learning motion patterns in unstructured crowd scenes is a difficult task:
– In some locations of the scene, the motion has several orientations (e.g., a zebra crossing)
Clustering similar regions
• Assign at most k major orientations to each cell.
– They are obtained from the cell’s mixture model.
• A direction model is obtained
[Figure: representation of the learned direction model]
Clustering similar regions
• Cluster similar blocks depending on their major orientations
– Two blocks are similar if they are neighbors (the window is one block)
– and the cosine similarity between two of their major orientations is less than a predefined threshold.
• A block can belong to a maximum of k clusters
[Figure: direction model and the extracted patterns 1, 2 and 3]
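One possible reading of this clustering is sketched below with a union-find over (block, orientation) pairs, so that a block with k major orientations can join up to k patterns. Two points are assumptions of this sketch: the neighborhood is taken as the 8 surrounding blocks, and the similarity test is written as the cosine of the angle between the two orientations being at least `min_cos`. The data layout and names are illustrative.

```python
import math

def extract_patterns(major, min_cos=0.9):
    """major: {(col, row): [angle, ...]} with at most k angles per block.
    Returns a list of patterns, each a list of ((col, row), angle) pairs."""
    nodes = [(block, a) for block, angles in sorted(major.items()) for a in angles]
    parent = list(range(len(nodes)))

    def find(i):
        # Root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, ((c1, r1), a1) in enumerate(nodes):
        for j in range(i + 1, len(nodes)):
            (c2, r2), a2 = nodes[j]
            neighbors = ((c1, r1) != (c2, r2)
                         and abs(c1 - c2) <= 1 and abs(r1 - r2) <= 1)
            if neighbors and math.cos(a1 - a2) >= min_cos:
                parent[find(i)] = find(j)  # merge the two patterns

    patterns = {}
    for i, node in enumerate(nodes):
        patterns.setdefault(find(i), []).append(node)
    return list(patterns.values())
```

At a location such as a zebra crossing, one block can carry two opposite major orientations and therefore appear in two different patterns.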
Experiments
• Car traffic video from the AVSS dataset
• The orientations of optical flow vectors are represented
Detected patterns
Putting it all together
Escalator
Comparison
[BID11] Yassine Benabbas, Nacim Ihaddadene, and Chabane Djeraba. Motion pattern extraction and event detection for automatic visual surveillance. EURASIP Journal on Image and Video Processing, 2011:15, 2011
Conclusion and future work
• Conclusions
– General approach for video analysis
– Based on motion, which preserves privacy
– Very promising results
– Can easily be improved and extended to other applications
• Future work
– Open source behavior analysis toolbox
– Apply the approaches in real environments
– Scale-independent features
– In event detection: apply weights to direction and magnitude models
– Refine group analysis (detect walking and running persons inside a group)
QUESTIONS?
Thank you for your attention