Content based Image Retrieval

(1)

SVCL

1

Content based Image Retrieval

(at SVCL)

Nikhil Rasiwasia, Nuno Vasconcelos Statistical Visual Computing Laboratory

University of California, San Diego

ECE271A – Fall 2007

(2)

SVCL

2

Why image retrieval?

• Help in finding you the images you want.

Source: http://www.bspcn.com/2007/11/02/25-photographs-taken-at-the-exact-right-time/

(3)

SVCL

3

But there is Google right?

• Metadata based retrieval systems

– text, click-rates, etc.

– Google Images

– Clearly not sufficient

• what if computers understood images?

– Content based image retrieval (early 90’s) – search based on

the image content

Top 12 retrieval results for the query ‘Mountain’

Metadata based retrieval systems

(4)

SVCL

4

Early understanding of images.

• Query by Visual Example(QBVE)

– user provides query image

– system extracts image features (texture, color, shape) – returns nearest neighbors using suitable similarity

measure

Texture similarity

Color similarity

Shape similarity

(5)

SVCL

5

• Bag of features representation

– No spatial information () – Yet performs good ()

• Each Feature represented by DCT coefficients

– Other people use SIFT, Gabor filters etc

This is a graduate class, so! Details

dct ( ) ⁺

(6)

SVCL

6

Image representation

Bag of DCT vectors

+ ₊ ⁺

+ + +

+ +

+ + + +

++ +

+ + +

+ +

+ + +

+ + + + +

+ +

+ + +

+ +

+ + ++ +

+

+ +

+ + +

+ + + +

+

+ +

+ + + +

+ + +

+ +

+ + + +

+ + +

+ + + +

+ ++

++ +

+ + + +

+ + +

+ +

++ +

+ + + +

+ + +

+ ++ +

+ + +

+ + ++ +

+ + + +

+ + + + + + ++

+ +

+ + +

++++ + + +

+ + + + + ++

+ + + +

+ + +

+ ++ +

+ + + +

+ + +

+ + ++ ++ +

+ +

++

++ ++

+ +

+ + +

+ +

+ + ++

+

+ + +

+ +

+ + +

+ +

+ + +

+ + + ++

+ +

+ + ++

+ + + + +

+ + +++ +

+ +

+ + +

+ +

+ + +

+ + + +

++ ++ +

+ + + +

+ + + + ++ +

+ + + +

+ + +

+ ++

+ + + +

+ +

++ + + +

+ + +

++++ + +

+ + +

++++ + + + +

+ + +

+ +

++ +

+ + + +

+ + +

+ ++ +

+ + + +

+ + +

+ ++

+ + + +

+ + +

+ + + + +

+ + +

+ +

+ + +

+ +

+ + + +

+ + +

+ +

+ + +

+ +

+ + +

+ +

+ + + +

+

+ +

+ + + + + + +

+ + +

+ +

+

+ + + +

+ +

+ + + + + + +

+ + +

+

GMM

(7)

SVCL

7

Query by visual example

Query Image

Candidate Images

Probability under various models

Ranking p1 >

p2

.

> p

_n

(8)

SVCL

8

Query by visual example (QBVE)

QUERY TOP MATCHES

(9)

SVCL

9

What can go wrong?

QUERY TOP MATCHES

(10)

SVCL

10

• visual similarity does not always correlate with “semantic” similarity

This can go wrong!

Both have visually dissimilar sky

Disagreement of the semantic notions of train with the visual notions of arch.

(11)

SVCL

11

Intelligent Researchers (like u)…

• Semantic Retrieval (SR)

– User provided a query text (keywords)

– find images that contains the associated semantic concept.

– around the year 2000,

– model semantic classes, learn to annotate images

– Provides higher level of abstraction, and supports natural language queries

abc

query: “people, beach”

(12)

SVCL

12

Semantic Class Modeling

+ ₊ ⁺ + + + +

+

+ + + + + + +

++ + + +

+ + +

++ + + + + ++

+ +

+ + +

++ + + ++

+ +

+ + + ++ + + +

+

+ + + + + + + +++

+ ++

+ +

+++ + +

+ +

+ + + +

+ ++

++ +

+ + + +

+ + +

+ +

++ +

+ + + +

+ + +

+ ++ +

+ + +

+ + ++ +

+ + + +

+ + + + ++ +++ +

+ + +

++++ + + +

+ + + + + ++

+ + + +

+ + +

+ ++ +

+ + + +

+ + +

+ + ++ ++ +

+ +

++

++ ++

+ +

+ + +

+ +

+ + ++

+

+++ + ++

+ +

+ ++

+ +

+ + + ++

+ +

+ + ++

++ + + ++ + ++ +++ +

+ ++

++ + +++

+ + +

++ ++ +

++ + +

+ + + + ++ +

++ + +

+ + +

+ ++

+ + ++ + + ++

+ + +

+ + + ++ +++ +

+ + +

+ +++ ++ + +

+ + +

+ +

++ +

+ + + +

+ + +

+ ++ +

++ + +

+ + + + ++

+ + + +

+ +

+ +++

+ ++

+ +

+ + +

+ + + + +

+ +

+++ + + + +

+ + +++ + + + + +

+ + +

+ + + ++

+ +

+ + + + + + + + + +++ + ++

+ +

+ + + ++

+ +

+++ + ++ +

+ + +

+

GMM

w

_i

= mountain mountain

 x mountain 

P

_X_|_W

|

Semantic Class Model Efficient

Hierarchical Estimation

•“Formulating Semantics Image Annotation as a Supervised Learning Problem”

[G. Carneiro, IEEE Trans. PAMI, 2007]

(13)

SVCL

13

Semantic Retrieval

Query Image

Candidate Words

Probability under various models

Ranking p1 >

p2

.

> p

_n

Mountian

Sky

Sexy

Girl

… so on

house

(14)

SVCL

14

(15)

– multiple meaning of the same word

• Anchor - TV anchor or for Ship?

• Bank - Financial Institution or River bank?

• Multiple semantic interpretations of an image

• Boating or Fishing or People?

• Limited by Vocabulary size

– What if the system was not trained for ‘Fishing’

– In other words, it is outside the

space of trained semantic concepts

Lake? Fishing? Boating?

People?

Fishing! what if not in the vocabulary?

abc

(21)

SVCL

21

In Summary

• SR Higher level of abstraction

– Better generalization inside the space of trained semantic concepts

– But problem of

• Lexical ambiguity

• Multiple semantic interpretations

• Vocabulary size

• QBVE is unrestricted by language.

– Better Generalization outside the space of trained semantic concepts

• a query image of ‘Fishing’ would retrieve visually similar images.

– But weakly correlated with human notion of similarity

VS

abc

Both have visually dissimilar sky Fishing! what if not in the vocabulary?

Lake? Fishing? Boating? People?

The two systems in many respects are complementary!

(22)

SVCL

22

Query by Semantic Example (QBSE)

• Suggests an alternate query by example paradigm.

– The user provides an image.

– The image is mapped to vector of weights of all the semantic concepts in the vocabulary, using a semantic labeling system.

– Can be thought as an projection to an abstract space, called as the semantic space

– To retrieve an image, this weight vector is matched to database, using a suitable similarity function

Lake Water

People

Sky

…

_Bo^at

.2 .3 .2 .1 … …

Semantic Space Mapping to an abstract space of semantic concepts

Semantic multinomial

vector of weights or

(23)

SVCL

23

Query by Semantic Example (QBSE)

• As an extension of SR

– Query specification not as set of few words.

– But a vector of weights of all the semantic concept in the vocabulary.

– Can eliminat

• Problem of lexical ambiguity- Bank+’more’

• Multiple semantic interpretation– Boating,

People

• Outside the ‘semantic space’ – Fishing.

• As an enrichment of QBVE

– The query is still by an example paradigm.

– But feature space is Semantic.

• A mapping of the image to an abstract space.

– Similarity measure at a higher level of abstraction.

.1 .2 .1 .3 … … .2

Lake Water

People

Boating …

Boat

0 .5 0 .5 … … 0

(SR) query: water, boating

 =

(QBVE) query: image

Lake Water

People

Boating …

Boat

Semantic Space

Boating Water

Lake

(24)

SVCL

24

QBSE System

Concept 1 Query Image

Any Semantic Labeling System

Concept 2

Concept 3

Concept L

. . .

Database

Weight Vector 1 Weight Vector 2 Weight Vector 3 Weight Vector 4 Weight Vector 5

Weight Vector N

. . .

1



2



3



L Posterior probability Weight Vector

(26)

SVCL

26

Semantic Class Modeling

+ ₊ ⁺ + + + +

+

+ + + + + + +

++ + + +

+ + +

++ + + + + ++

+ +

+ + +

++ + + ++

+ +

+ + + ++ + + +

+

+ + + + + + + +++

+ ++

+ +

+++ + +

+ +

+ + + +

+ ++

++ +

+ + + +

+ + +

+ +

++ +

+ + + +

+ + +

+ ++ +

+ + +

+ + ++ +

+ + + +

+ + + + ++ +++ +

+ + +

++++ + + +

+ + + + + ++

+ + + +

+ + +

+ ++ +

+ + + +

+ + +

+ + ++ ++ +

+ +

++

++ ++

+ +

+ + +

+ +

+ + ++

+

+++ + ++

+ +

+ ++

+ +

+ + + ++

+ +

+ + ++

++ + + ++ + ++ +++ +

+ ++

++ + +++

+ + +

++ ++ +

++ + +

+ + + + ++ +

++ + +

+ + +

+ ++

+ + ++ + + ++

+ + +

+ + + ++ +++ +

+ + +

+ +++ ++ + +

+ + +

+ +

++ +

+ + + +

+ + +

+ ++ +

++ + +

+ + + + ++

+ + + +

+ +

+ +++

+ ++

+ +

+ + +

+ + + + +

+ +

+++ + + + +

+ + +++ + + + + +

+ + +

+ + + ++

+ +

+ + + + + + + + + +++ + ++

+ +

+ + + ++

+ +

+++ + ++ +

+ + +

+

Gaussian Mixture

Model

w

_i

= mountain mountain

 x mountain 

P

_X_|_W

|

Semantic Class Model Efficient

Hierarchical Estimation

•“Formulating Semantics Image Annotation as a Supervised Learning Problem”

[G. Carneiro, CVPR 2005]

(27)

SVCL

27

QBSE System

Concept 2

Concept 3

Concept L

. . .

Database

Weight Vector N

. . .

Measure

. . .

P_X_|_W |

x clouds

P_X_|_W |

 





 









 L



3 2 1



 L



i i 1

 1

(29)

SVCL

29

Semantic Multinomial

(30)

SVCL

30

QBSE System

Concept 2

Concept 3

Concept L

. . .

Database

Weight Vector N

. . .

Measure

. . .

• A natural similarity function is the Kullback-Leibler divergence

Query ^{Query SMN}

)

||

( max

arg )

(

_i

KL

_i

f    

j i L j

j

i



 log  max

arg



1



<>

Database

(32)

SVCL

32

Semantic Feature Space

• The space is the simplex of posterior concept probabilities

• Each image/SMN is thus represented as a point in this simplex

Traditional Text Based Query Query by Semantic

Example

VS abc

vegetation

.2

x x o

xx x

xx x x

x x x

xx x

xx x x

x xx

xx xx x

xx x

xx x xx

x x

xx xx xx x

xx xx x

xx

Semantic sim plex

o - query x – database images

closest match es mountain

sky

rocks

mountain sky

vegetation

… .4 … .2 … … …

x o

xx

x x

x

x x

x

x x x

x

x x

x x xx x

x x x x

x x

o - query

x xx x x

x x x

x x x x

x

x x x mountain

sky

rocks vegetation

…

“mountains + sky”

mountain sky

vegetation

x – database images

0 .5 0 0 .5 0 …

Closest matches

(33)

SVCL

34

Evaluation – Precision:Recall:Scope

Irrelevant Images |A|

Relevant Images |B|

Retrieved Images |A_r|+|B_r|

|A| |A

_r

| |B

_r

| |B|

|B

_r

|

Recall = |B|

relevant images to all the images retrieved.

The proportion of relevant images that are retrieved, out of all relevant images available.

The number of images that are retrieved.

(34)

SVCL

35

Relevant ^#retrieved Precision Recall

Yes 1 1/1 1/3

No 2 1/2 1/3

Yes 3 2/3 2/3

No 4 2/4 2/3

Yes 5 3/5 3/3

Query

Ranking

(35)

SVCL

36

Experimental Setup

• Evaluation Procedure ^[Feng,04]

– Precision-Recall(scope) Curves : Calculate precision at various recalls(scopes).

– Mean Average Precision: Average precision over all queries, where recall changes (i.e. where relevant items occur)

• Training the Semantic Space

– Images – Corel Stock Photo CD’s – Corel50

• 5,000 images from 50 CD’s, 4,500 used for training the space

– Semantic Concepts

• Total of 371 concepts

• Each Image has caption of 1-5 concepts

• Semantic concept model learned for each concept.

(36)

SVCL

37

Experimental Setup

• Retrieval inside the Semantic Space.

– Images – Corel Stock Photo CD’s – same as Corel50

• 4,500 used as retrieval database

• 500 used to query the database

• Retrieval outside the Semantic Space

– Corel15 - Another 15 Corel Photo CD’s, (not used previously)

• 1200: retrieval database, 300: query database

– Flickr18 - 1800 images Downloaded from www.flickr.com

• 1440: retrieval database, 360: query database

• harder than Corel images as shot by non-professional flickr users

(37)

SVCL

38

Inside the Semantic Space

• Precision of QBSE is significantly higher at most levels of recall

VS

QBSE QBVE

“whitish + darkish”

“train + railroad”

Inside the

Semantic

Space

(41)

SVCL

42

Outside the Semantic Space

(42)

SVCL

43

People 0.09

Buildings 0.07

Street 0.07

Statue 0.05

Tables 0.04

Water 0.04

Restaurant 0.04

Buildings 0.06 People 0.06 Street 0.06 Statue 0.04

Tree 0.04

Boats 0.04 Water 0.03 People 0.08 Statue 0.07 Buildings 0.06 Tables 0.05 Street 0.05 Restaurant 0.04 House 0.03

People 0.1 Statue 0.08 Buildings 0.07 Tables 0.06 Street 0.06

Door 0.05

Restaurant 0.04 People 0.12

Restaurant 0.07

Sky 0.06

Tables 0.06 Street 0.05 Buildings 0.05 Statue 0.05

QBVE

QBSE

Commercial Construction

VS

(43)

SVCL

44

• nearest neighbors in this space is

significantly more robust

QBSE vs QBVE

x xo_x

x x

xx x x

x x x

xx x

xx x x

x xx

xx xx x

xx x

xx x xx

x x

xx xx xx x

xx xx x

xx

o - query x – database

images closest

matches locomotive

sky

bridge train

2 4 2

2    

• both in terms of

– metrics

– subjective matching quality

•“Query by semantic example”

[N. Rasiawasia, IEEE Trans. Multimedia 2007]

(44)

SVCL

45

Structure of the Semantic Space

• is the gain really due to the semantic structure of the SMN space?

• this can be tested by comparing to a space where the probabilities are relative to random image

groupings

w

Semantic Class Model Efficient

Hierarchical

Estimation

(45)

SVCL

46

Content based Image Retrieval

SVCL

Content based Image Retrieval

(at SVCL)

Nikhil Rasiwasia, Nuno Vasconcelos Statistical Visual Computing Laboratory

University of California, San Diego

ECE271A – Fall 2007

SVCL

Why image retrieval?

• Help in finding you the images you want.

SVCL

But there is Google right?

• Metadata based retrieval systems

– text, click-rates, etc.

– Google Images

– Clearly not sufficient

• what if computers understood images?

– Content based image retrieval (early 90’s) – search based on

the image content

Metadata based retrieval systems

SVCL

Early understanding of images.

• Query by Visual Example(QBVE)

– user provides query image

– system extracts image features (texture, color, shape) – returns nearest neighbors using suitable similarity

measure

SVCL

• Bag of features representation

– No spatial information () – Yet performs good ()

• Each Feature represented by DCT coefficients

– Other people use SIFT, Gabor filters etc

This is a graduate class, so! Details

dct ( ) +

SVCL

Image representation

GMM

SVCL

Query by visual example

Query Image

Candidate Images

Ranking p1 >

p2

.

.

> p

SVCL

Query by visual example (QBVE)

SVCL

What can go wrong?

SVCL

This can go wrong!

SVCL

Intelligent Researchers (like u)…

• Semantic Retrieval (SR)

– User provided a query text (keywords)

– find images that contains the associated semantic concept.

– around the year 2000,

– model semantic classes, learn to annotate images

– Provides higher level of abstraction, and supports natural language queries

SVCL

Semantic Class Modeling

GMM

w

= mountain mountain

 x mountain 

P

|

Semantic Class Model Efficient

Hierarchical Estimation

•“Formulating Semantics Image Annotation as a Supervised Learning Problem”

SVCL

Semantic Retrieval

Query Image

Candidate Words

Ranking p1 >

p2

.

.

> p

Mountian

dct ( ) ⁺