Global and Efficient Self-Similarity for
Object Classification and Detection
CVPR 2010
Thomas Deselaers and Vittorio Ferrari
CALVIN group
Computer Vision Laboratory
ETH Zurich
Switzerland
Conventional Image Descriptors
Measure direct image properties
gradients
Self-Similarity vs Conventional Descriptors
[Shechtman, Irani CVPR 07]
Assumption of conventional image descriptors
• There is a direct visual property shared by images of objects of the same class (e.g. colors, gradients, …).
• This property can be used to compare images.
Self-similarity:
• Indirect property: geometric layout of repeating patches within an image
• More general property
Local Self-Similarity Descriptors
Using Local Self-Similarity Descriptors
Applications: object recognition, image retrieval, action recognition
• Ensemble matching [Shechtman CVPR 07]
• Nearest neighbor matching [Boiman CVPR 08]
• Bag of local self-similarities
[Gehler ICCV09, Vedaldi ICCV09, Hörster ACMM08, Lampert CVPR09, Chatfield ICCV09 WS]
1. Compute LSS descriptors for an image
2. Assign the LSS descriptors to a codebook
Self-Similarity goes Global
Compute self-similarity between all pairs of pixels
Global Self-Similarity Tensor
4D self-similarity tensor
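A minimal sketch of what "self-similarity between all pairs of pixels" means in code. The slides do not give the similarity measure here, so this sketch assumes the sum of squared differences (SSD) between the patches centred on the two pixels, where 0 means identical patches. The names `patch`, `ssd` and `gss_tensor` are illustrative only.

```python
def patch(img, y, x, r):
    """Return the (2r+1)x(2r+1) patch centred at (y, x), clamping at the borders."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def ssd(a, b):
    """Sum of squared differences between two flattened patches."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def gss_tensor(img, r=1):
    """4D tensor S with S[y][x][y2][x2] = SSD between the patches at (y, x) and (y2, x2).

    Assumption: SSD is used as the (dis)similarity; lower means more similar.
    """
    h, w = len(img), len(img[0])
    patches = [[patch(img, y, x, r) for x in range(w)] for y in range(h)]
    return [[[[ssd(patches[y][x], patches[y2][x2]) for x2 in range(w)]
              for y2 in range(h)]
             for x in range(w)]
            for y in range(h)]


img = [[0, 0, 9],
       [0, 0, 9],
       [9, 9, 9]]
S = gss_tensor(img, r=1)  # 3 x 3 x 3 x 3 entries for a 3 x 3 image
```

Even for this toy image the tensor has one entry per pixel pair, which is why the direct computation does not scale to real image sizes.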
Problems with the GSS Tensor
Example: 300 × 500 image, 11 × 11 patches
• Computation time: ∼20h
• Memory requirement: ∼80GB
Aim: Reduce both
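The memory figure can be sanity-checked with a few lines of arithmetic. This sketch assumes the tensor stores one value per ordered pixel pair and 4 bytes per value (float32); both are assumptions, not stated on the slide.

```python
def gss_tensor_bytes(height, width, bytes_per_entry=4):
    """Bytes needed for a dense tensor with one entry per ordered pixel pair.

    Assumption: 4 bytes per entry (float32).
    """
    pixels = height * width
    return pixels * pixels * bytes_per_entry


total = gss_tensor_bytes(300, 500)  # 150000 pixels -> 2.25e10 pairs
gib = total / 2 ** 30               # convert bytes to GiB
print(f"{total} bytes = {gib:.1f} GiB")
```

Under these assumptions the result is 90 000 000 000 bytes, roughly 84 GiB, which is in line with the ∼80GB quoted above.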
Outline
• Efficient global self-similarity tensor
• Global self-similarity descriptors
– Bag of correlation surfaces
– Self-similarity hypercubes
• Detection with self-similarity hypercubes
– Efficient sliding window
– Efficient subwindow search
• Experiments
– Global self-similarity better than local self-similarity
– Complementary to conventional descriptors
Efficient Global Self-Similarity Tensor
Find an efficient approximation to the GSS tensor
Quantize patches according to codebook
If two patches are assigned to the same prototype, they are similar
Efficient Global Self-Similarity
Two patches are only similar if they are assigned to the same prototype
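The quantization idea can be sketched as follows: every pixel's patch is assigned to its nearest prototype, and two pixels count as similar exactly when their assignments agree. The codebook is passed in ready-made here; how it is built (for example by clustering the image's own patches) is left out, and all function names are illustrative assumptions.

```python
def patch(img, y, x, r):
    """Return the (2r+1)x(2r+1) patch centred at (y, x), clamping at the borders."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def nearest_prototype(p, codebook):
    """Index of the prototype with the smallest SSD to patch p."""
    dists = [sum((u - v) ** 2 for u, v in zip(p, proto)) for proto in codebook]
    return dists.index(min(dists))


def assignment_map(img, codebook, r):
    """Prototype assignment map M: one prototype index per pixel."""
    h, w = len(img), len(img[0])
    return [[nearest_prototype(patch(img, y, x, r), codebook) for x in range(w)]
            for y in range(h)]


def similar(M, p, q):
    """Two pixels are similar only if they share the same prototype."""
    return M[p[0]][p[1]] == M[q[0]][q[1]]


img = [[0, 1, 9],
       [1, 0, 8],
       [9, 8, 9]]
codebook = [[0], [9]]                    # toy 1x1 prototypes, so r = 0
M = assignment_map(img, codebook, r=0)   # [[0, 0, 1], [0, 0, 1], [1, 1, 1]]
```

The map `M` needs one integer per pixel instead of one value per pixel pair, and any tensor entry can be recovered on demand with `similar`.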
Patch Prototype Codebooks
Remember: Self-similarity encodes image content indirectly
Image-specific codebooks can be smaller than conventional ones
see paper for more generic codebooks and
Global Self-Similarity Descriptors
So far:
• Compact GSS computed efficiently
Now:
• Descriptors that can be used in machine learning classifiers
• Fixed dimensionality
• Self-similarity hypercubes: now
• Bag of correlation surfaces: only in the paper
Self-Similarity Hypercubes
SSHs for Detection
• Computing SSH naïvely requires operations
• Sliding-window detection has to evaluate many windows
Efficient Computation of SSHs
Compute integral self-similarity tensor:
operations to compute SSH for an image window
∼5000x speedup
160000
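A simplified sketch of why integral structures make per-window descriptors cheap. It is not the integral self-similarity tensor from the slides: as a stand-in, it builds one integral image per prototype over the assignment map, so the prototype histogram of any grid cell costs four lookups per prototype. The hypercube entry for a pair of cells is then taken to be the number of same-prototype pixel pairs between them; that definition, the grid size `d` and all names are assumptions made for illustration.

```python
def integral_counts(M, k):
    """I[q][y][x] = number of pixels with prototype q in rows < y and cols < x of M."""
    h, w = len(M), len(M[0])
    I = [[[0] * (w + 1) for _ in range(h + 1)] for _ in range(k)]
    for q in range(k):
        for y in range(h):
            for x in range(w):
                I[q][y + 1][x + 1] = (I[q][y][x + 1] + I[q][y + 1][x]
                                      - I[q][y][x] + (M[y][x] == q))
    return I


def count(I, q, y0, x0, y1, x1):
    """Pixels with prototype q in rows y0..y1-1, cols x0..x1-1 (four lookups)."""
    return I[q][y1][x1] - I[q][y0][x1] - I[q][y1][x0] + I[q][y0][x0]


def ssh(I, k, window, d):
    """d x d x d x d hypercube for window = (y0, x0, y1, x1).

    Assumption: entry [a][b][c][e] counts same-prototype pixel pairs
    between grid cell (a, b) and grid cell (c, e).
    """
    y0, x0, y1, x1 = window
    ys = [y0 + (y1 - y0) * i // d for i in range(d + 1)]
    xs = [x0 + (x1 - x0) * i // d for i in range(d + 1)]
    hist = [[[count(I, q, ys[a], xs[b], ys[a + 1], xs[b + 1]) for q in range(k)]
             for b in range(d)] for a in range(d)]
    return [[[[sum(u * v for u, v in zip(hist[a][b], hist[c][e]))
               for e in range(d)] for c in range(d)]
             for b in range(d)] for a in range(d)]


M = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 1, 1],
     [1, 1, 1, 1]]
I = integral_counts(M, k=2)
H = ssh(I, k=2, window=(0, 0, 4, 4), d=2)
```

The integral images are computed once per image; after that, the cost of `ssh` depends only on `d` and the number of prototypes, not on the window size, which is the property a sliding-window detector needs.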
Efficient Subwindow Search for SSH
• Derive an upper bound on the score of a set of windows
• Section 5.2 in our paper
Experiments: Object classification
PASCAL 07 objects
– 9608 cropped images of objects from PASCAL 07
– 20 classes
Task: Classify each test image into one of 20 classes
Model: Linear SVM
Classification on the PASCAL 07 objects set
+ GSS outperforms LSS
+ Self-Similarity is truly complementary to conventional descriptors
[Plot: classification accuracy [%]]
Experiments: Object detection
ETHZ Shape Classes
– 255 images
– 5 classes (apple logos, bottles, giraffes, mugs, swans)
Task: Detect objects in images
Detector: Linear SVM, sliding windows
Detection Results
+ SSH outperforms BoLSS
+ GSS can be used for detection with good results
DR at FPPI 0.4 (0.5 PASCAL overlap):
             BoLSS   SSH
apple logos  10.0    80.0
bottles      10.7    96.4
giraffes     23.4    85.1
mugs          6.5    67.7
swans        17.6    70.6
Average      13.6    80.0
Comparison results (avg): [Ferrari CVPR07]: 71.9, [Maji CVPR09]: 93.2, … many more
[Plot: DR at FPPI 0.4 per class (apple logos, bottles, giraffes, mugs, swans), SSH vs. BoLSS]
Runtimes for Computing Descriptors
• 200x200 image
• GSS tensor
– directly: 5512s (∼1.5 hours)
– using our method: 81s (∼1.5 minutes)
• Computing descriptors: few seconds
• Our method: 70x speedup
• For Reference:
– GIST: 0.4s
– BoLSS: 0.7s
Runtimes for Detection
Given the prototype assignment map (80s) (once only)
SSH sliding window: 30s/img (once per class)
For Comparison
– Computing direct GSS tensor for 25000 windows: 4 years/img
Speedup: ∼1 million
⇒ Using our methods, GSS can be used for object detection
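The quoted speedups can be reproduced from the runtimes on the two runtime slides. This is only a sanity check of the arithmetic; it assumes a 365-day year and that the per-image cost of the efficient pipeline is the 80s assignment map plus one 30s sliding-window pass.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600          # assumption: 365-day year

# GSS tensor for a 200x200 image: direct vs. efficient computation.
tensor_speedup = 5512 / 81                  # about 68, quoted as 70x

# Detection: direct tensor for 25000 windows vs. efficient pipeline.
direct_seconds = 4 * SECONDS_PER_YEAR       # "4 years/img"
efficient_seconds = 80 + 30                 # assignment map + one SSH pass
detection_speedup = direct_seconds / efficient_seconds

print(f"tensor: {tensor_speedup:.0f}x, detection: {detection_speedup:.2e}x")
```

Both ratios agree with the rounded figures on the slides: roughly 70x for the tensor and on the order of one million for detection.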
Conclusion
• Self-similarity should be considered globally
– Global self-similarity performs better than local self-similarity
• Truly complementary to conventional descriptors
• Global self-similarity is feasible
– Efficient computation of self-similarity
– Two descriptors based on self-similarity