Global and Efficient Self-Similarity for Object Classification and Detection

(1)

CVPR 2010

Thomas Deselaers and Vittorio Ferrari

CALVIN group

Computer Vision Laboratory

ETH Zurich

Switzerland

(2)

Conventional Image Descriptors

Measure direct image properties

gradients

(3)

Self-Similarity vs Conventional Descriptors

[Shechtman, Irani CVPR 07]

Assumption of conventional image descriptors

• There is a direct visual property shared by images of objects of the same class (e.g. colors, gradients, …).

• This property can be used to compare images.

Self-similarity:

• Indirect property: geometric layout of repeating patches within an image

• More general property

(4)

Local Self-Similarity Descriptors

(5)

Using Local Self-Similarity Descriptors

Applications: object recognition, image retrieval, action recognition

• Ensemble matching [Shechtman CVPR 07]

• Nearest neighbor matching [Boiman CVPR 08]

• Bag of local self-similarities

[Gehler ICCV09, Vedaldi ICCV09, Hörster ACMM08, Lampert CVPR09, Chatfield ICCV09 WS]

1. Compute LSS descriptors for an image

2. Assign the LSS descriptors to a codebook
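The two steps above can be sketched as a nearest-prototype assignment. This is a minimal illustration, not the pipeline of any of the cited papers: the toy 2-D vectors stand in for real LSS descriptors, the helper names are hypothetical, and the final histogram step is an assumption (the usual bag-of-words convention), since the slide lists only steps 1 and 2.

```python
from collections import Counter

def nearest_prototype(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))

def bag_of_words(descriptors, codebook):
    """Normalized histogram of codebook assignments (assumed final step)."""
    counts = Counter(nearest_prototype(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[k] / total for k in range(len(codebook))]

# Toy 2-D vectors standing in for real LSS descriptors (hypothetical data).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.1, 0.9)]
histogram = bag_of_words(descriptors, codebook)
print(histogram)  # [0.5, 0.25, 0.25]
```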

(6)

Self-Similarity goes Global

(7)

Self-Similarity goes Global

(8)

Compute self-similarity between all pairs of pixels

Global Self-Similarity Tensor

4D self-similarity tensor
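A direct computation of such a 4D tensor can be sketched for a tiny image. The similarity measure used here, exp(-SSD) between small patches with border clamping, is an assumption made for illustration; the slides do not state the exact measure, and the function names are hypothetical.

```python
import math

def patch(image, y, x, r):
    """Flattened (2r+1)x(2r+1) patch centred on (y, x), clamped at the borders."""
    h, w = len(image), len(image[0])
    return [image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)]

def gss_tensor(image, r=1):
    """S[y1][x1][y2][x2] = exp(-SSD) between the patches at the two pixels."""
    h, w = len(image), len(image[0])
    patches = [[patch(image, y, x, r) for x in range(w)] for y in range(h)]

    def sim(p, q):
        # Assumed similarity: 1.0 for identical patches, decaying with SSD.
        return math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)))

    return [[[[sim(patches[y1][x1], patches[y2][x2])
               for x2 in range(w)] for y2 in range(h)]
             for x1 in range(w)] for y1 in range(h)]

image = [[0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0],
         [1.0, 1.0, 1.0]]
S = gss_tensor(image)
num_entries = sum(len(row) for plane in S for surface in plane for row in surface)
print(num_entries)  # 81 entries for a 3x3 image: (3*3)**2
```

The entry count grows with the square of the number of pixels, which is what makes the direct tensor impractical for full-size images.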

(9)

Problems with the GSS Tensor

• Computation time: ∼20h

• Memory requirement: ∼80GB

(for a 300 × 500 image with 11 × 11 patches)

Aim: Reduce both
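The memory figure can be sanity-checked with a few lines of arithmetic. This sketch assumes that the 300 and 500 on the slide are the image dimensions and that each tensor entry is stored as a 4-byte float; neither is stated explicitly in the slides.

```python
height, width = 300, 500   # assumed image size (numbers taken from the slide)
bytes_per_entry = 4        # assumption: single-precision float per entry

num_pixels = height * width        # 150,000 pixels
num_entries = num_pixels ** 2      # one entry per ordered pixel pair
memory_gib = num_entries * bytes_per_entry / 2 ** 30

print(f"{num_entries:.3e} entries, {memory_gib:.1f} GiB")
# 2.250e+10 entries, 83.8 GiB
```

Under these assumptions the result is consistent with the ∼80GB quoted on the slide.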

(10)

Outline

• Efficient global self-similarity tensor

• Global self-similarity descriptors

– Bag of correlation surfaces

– Self-similarity hypercubes

• Detection with self-similarity hypercubes

– Efficient sliding window

– Efficient subwindow search

• Experiments

– Global self-similarity better than local self-similarity

– Complementary to conventional descriptors

(11)

Efficient Global Self-Similarity Tensor

Find an efficient approximation to the GSS tensor

Quantize patches according to codebook

If two patches are assigned to the same prototype, they are similar

(12)

Efficient Global Self-Similarity

Two patches are only similar if they are assigned to the same prototype
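Under this rule the 4D tensor never has to be stored: a 2D map of prototype indices is enough, and any tensor entry or correlation surface can be read off on demand. A minimal sketch, assuming the assignment map has already been computed (the map below is toy data and the function names are hypothetical):

```python
def self_similarity(assignment, p, q):
    """1 if pixels p and q are assigned to the same prototype, else 0."""
    return int(assignment[p[0]][p[1]] == assignment[q[0]][q[1]])

def correlation_surface(assignment, p):
    """Binary map of all pixels similar to p (one 2D slice of the 4D tensor)."""
    label = assignment[p[0]][p[1]]
    return [[int(v == label) for v in row] for row in assignment]

# Toy prototype assignment map: one codebook index per pixel (hypothetical data).
assignment = [[0, 0, 1],
              [2, 0, 1],
              [2, 2, 1]]

surface = correlation_surface(assignment, (0, 0))
print(surface)  # [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
```

Storage drops from one value per pixel pair to one index per pixel.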

(13)

Patch Prototype Codebooks

Remember: Self-similarity encodes image content indirectly

Image-specific codebooks can be smaller than conventional ones

see paper for more generic codebooks and

(14)

Global Self-Similarity Descriptors

So far:

• Compact GSS computed efficiently

Now:

• Descriptors that can be used in machine learning classifiers

• Fixed dimensionality

• Self-similarity hypercubes: now

• Bag of correlation surfaces: only in the paper

(15)

Self-Similarity Hypercubes

(16)

SSHs for Detection

• Computing SSH naïvely requires operations

• A sliding-window detector has to evaluate many windows

(17)

Efficient Computation of SSHs

Compute integral self-similarity tensor:

operations to compute SSH for an image window



∼5000x speedup
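The idea of constant-time window statistics can be illustrated with one integral image per prototype. This is only a sketch of the integral-image principle under the prototype-based similarity; the integral self-similarity tensor in the paper may be constructed differently, and all names below are hypothetical.

```python
def integral_images(assignment, num_prototypes):
    """One integral image per prototype k: I[y][x] counts the pixels with
    label k in rows < y and columns < x."""
    h, w = len(assignment), len(assignment[0])
    integrals = [[[0] * (w + 1) for _ in range(h + 1)] for _ in range(num_prototypes)]
    for k in range(num_prototypes):
        I = integrals[k]
        for y in range(h):
            for x in range(w):
                I[y + 1][x + 1] = (I[y][x + 1] + I[y + 1][x] - I[y][x]
                                   + (assignment[y][x] == k))
    return integrals

def count_in_box(I, y0, x0, y1, x1):
    """Pixels in rows [y0, y1) and columns [x0, x1), using four lookups."""
    return I[y1][x1] - I[y0][x1] - I[y1][x0] + I[y0][x0]

def cell_similarity(integrals, cell_a, cell_b):
    """Number of same-prototype pixel pairs between two cells (y0, x0, y1, x1)."""
    return sum(count_in_box(I, *cell_a) * count_in_box(I, *cell_b) for I in integrals)

# Toy prototype assignment map (hypothetical data).
assignment = [[0, 0, 1, 1],
              [0, 0, 1, 1],
              [2, 2, 1, 1],
              [2, 2, 0, 0]]
integrals = integral_images(assignment, num_prototypes=3)

top_left, bottom_right = (0, 0, 2, 2), (2, 2, 4, 4)
print(cell_similarity(integrals, top_left, bottom_right))  # 8
```

After the one-off pass over the image, the cost of each cell-pair entry depends only on the number of prototypes, not on the window size.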

 160000

(18)

Efficient Subwindow Search for SSH

• Derive an upper bound on the score of a set of windows

• Section 5.2 in our paper

(19)

Experiments: Object classification

PASCAL 07 objects

– 9608 cropped images of objects from PASCAL 07

– 20 classes

Task: Classify each test image into one of 20 classes

Model: Linear SVM

(20)

Classification on the PASCAL 07 objects set

+ GSS outperform LSS

+ Self-Similarity is truly complementary to conventional descriptors

(Plot: classification accuracy [%])

(21)

Experiments: Object detection

ETHZ Shape Classes

– 255 images

– 5 classes (apple logos, bottles, giraffes, mugs, swans)

Task: Detect objects in images

Detector: Linear SVM, sliding windows

(22)

Detection Results

+ SSH outperforms BoLSS

+ It is possible to use GSS for detection with good results

DR at FPPI 0.4

              BoLSS   SSH
apple logos   10.0    80.0
bottles       10.7    96.4
giraffes      23.4    85.1
mugs           6.5    67.7
swans         17.6    70.6
Average       13.6    80.0

(Plot: DR at FPPI 0.4 per class, SSH vs. BoLSS)

Comparison results (avg): [Ferrari CVPR07]: 71.9, [Maji CVPR09]: 93.2, … many more

DR at 0.5 PASCAL overlap
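The 0.5 PASCAL overlap criterion counts a detection as correct when the intersection-over-union with the ground-truth box exceeds 0.5. A minimal sketch with hypothetical boxes in (x0, y0, x1, y1) form; the function names are illustrative and not taken from the slides:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def is_correct(detection, ground_truth, threshold=0.5):
    """PASCAL criterion: overlap must exceed the threshold."""
    return iou(detection, ground_truth) > threshold

ground_truth = (0, 0, 10, 10)            # hypothetical annotation
good = is_correct((2, 0, 12, 10), ground_truth)   # IoU = 80 / 120
bad = is_correct((6, 0, 16, 10), ground_truth)    # IoU = 40 / 160
print(good, bad)  # True False
```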

(23)

Runtimes for Computing Descriptors

• 200x200 image

• GSS tensor

– directly: 5512s (∼1.5 hours)

– using our method: 81s (∼1.5 minutes)

• Computing descriptors: few seconds

• Our method: 70x speedup

• For reference:

– GIST: 0.4s

– BoLSS: 0.7s
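The quoted speedup follows directly from the two GSS tensor timings. A quick check using only the numbers on the slide:

```python
direct_seconds = 5512     # direct GSS tensor, 200x200 image (from the slide)
efficient_seconds = 81    # prototype-based method (from the slide)

speedup = direct_seconds / efficient_seconds
print(f"direct: {direct_seconds / 3600:.2f} h, "
      f"efficient: {efficient_seconds / 60:.2f} min, "
      f"speedup: {speedup:.0f}x")
# direct: 1.53 h, efficient: 1.35 min, speedup: 68x
```

The ratio of about 68 matches the rounded 70x figure.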

(24)

Runtimes for Detection

Given the prototype assignment map (80s) (once only)

SSH sliding window: 30s/img (once per class)

For Comparison

– Computing direct GSS tensor for 25000 windows: 4 years/img

Speedup: ∼1 million
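These figures can be roughly reproduced from the timings given earlier. The sketch assumes that each of the 25000 windows costs as much as the direct 200x200 computation (5512s); the slides do not state the per-window cost, so this is an assumption rather than the authors' calculation.

```python
windows = 25000
direct_seconds_per_window = 5512   # assumption: reuse the 200x200 direct timing
assignment_seconds = 80            # prototype assignment map, once per image
sliding_window_seconds = 30        # SSH sliding window, once per class

direct_total = windows * direct_seconds_per_window
efficient_total = assignment_seconds + sliding_window_seconds

direct_years = direct_total / (365 * 24 * 3600)
speedup = direct_total / efficient_total
print(f"{direct_years:.1f} years, speedup {speedup:.2e}")
# 4.4 years, speedup 1.25e+06
```

Both values agree with the slide's rounded "4 years" and "∼1 million".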

⇒ Using our methods, GSS can be used for object detection

For Reference:

(25)