Exploiting the Web of Data for cross-domain
information retrieval and recommendation
VII Jornadas MAVIR
Avances en Tecnologías de la Lengua y Acceso a la Información Multimedia
Escuela Politécnica Superior, Universidad Carlos III de Madrid
Ignacio Fernández-Tobías
under the supervision of
Iván Cantador
Grupo de Recuperación de Información
Universidad Autónoma de Madrid
• Introduction: Cross-domain item recommendation
• Case study: Linking music with places of interest
• A semantic-based framework for linking domains
• Cross-domain semantic networks from Wikipedia
• Cross-domain semantic networks from Open Information Extraction
• A social tag-based emotion-oriented approach for linking domains
Introduction: Cross-domain item recommendation
• Recommender systems help users make choices by proactively finding relevant items or services, taking into account or predicting the users’ tastes, priorities and goals
• The vast majority of currently available recommender systems predict the relevance of items for a user in a single, limited domain
Introduction: Cross-domain item recommendation
• In some applications, it could be useful to offer the user joint personalized recommendations of items belonging to multiple domains
‐ On an e-commerce site, we may suggest movies or videogames based on a particular book bought by a customer
‐ In a travel application, we may suggest cultural events that may interest a person who has booked a hotel in a particular place
‐ In an e-learning system, we may suggest educational websites with topics related to a video documentary a student has seen
• Potential benefits
‐ Offering diversity and serendipity
‐ Addressing the cold-start problem in a target domain
‐ Mitigating the sparsity problem
Fernández-Tobías, I., Cantador, I., Kaminskas, M., Ricci, F. 2012. Cross-domain Recommender Systems: A Survey of the State of the Art. 2nd Spanish Conference on Information Retrieval.
Introduction: Cross-domain item recommendation
• Some real applications (e.g. Amazon) already recommend items from different domains, but
‐ their recommendations rely on statistical analysis of popular items, without any personalization strategy, or
‐ most of them only exploit information about the user’s preferences in the target domain
Introduction: Cross-domain item recommendation
• Context
‐ User and item profiles are distributed across multiple systems, so there are no, or only a few, user profiles with preferences on items in different domains
• Goal
• Case study: Suggesting music / musicians highly related to a particular point of interest (POI)
• Relations between music and places
‐ Based on common emotions caused by listening to music and visiting POIs, captured through social tags
Case study: Linking music with places of interest
Kaminskas, M., Ricci, F. 2011. Location-Adapted Music Recommendation Using Tags.
‐ Based on explicit semantic associations between musicians and POIs, drawn from information available in the (Semantic) Web
[Figure: semantic associations between Vienna State Opera and the musicians Gustav Mahler, Wolfgang Amadeus Mozart and Arnold Schoenberg, via Classical music, Austrian musicians, Opera composers, 19th century and Romanticism]
• Semantic relations between musicians and POIs
• Location relations
‐ Arnold Schoenberg was born in Vienna, which is the city where Vienna State Opera is located
• Time relations
‐ Gustav Mahler was born in 1860, a year in the decade when Vienna State Opera was built
• Architecture-History/Art-Music “category” relations
‐ Wolfgang A. Mozart was a classical music composer, and classical compositions are played in opera houses, the building type of Vienna State Opera
• Arbitrary relations
‐ Gustav Mahler was the director of Vienna State Opera
‐ Ana Belén (a famous Spanish singer) recorded a song about La Puerta de Alcalá (a well-known POI in Madrid)
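The relation types above can be read as typed edges in a graph, so explaining why a musician fits a POI amounts to finding short paths between the two nodes. The sketch below illustrates that idea under stated assumptions: the triples and relation labels (e.g. `played_in`, `birth_decade`, `director_of`) are hypothetical encodings of the slide's examples, and a bounded breadth-first search stands in for whatever path extraction the framework actually uses.

```python
from collections import deque

# Hypothetical (subject, relation, object) triples encoding the examples above;
# relation labels are illustrative, not the framework's own vocabulary.
TRIPLES = [
    ("Arnold Schoenberg", "birth_place", "Vienna"),
    ("Vienna State Opera", "located_in", "Vienna"),
    ("Gustav Mahler", "birth_decade", "1860s"),
    ("Vienna State Opera", "building_decade", "1860s"),
    ("Wolfgang Amadeus Mozart", "genre", "Classical music"),
    ("Classical music", "played_in", "Opera houses"),
    ("Vienna State Opera", "has_type", "Opera houses"),
    ("Gustav Mahler", "director_of", "Vienna State Opera"),
]


def build_graph(triples):
    """Undirected adjacency list; every edge keeps its relation label."""
    graph = {}
    for s, r, o in triples:
        graph.setdefault(s, []).append((r, o))
        graph.setdefault(o, []).append((r, s))
    return graph


def find_paths(graph, source, target, max_hops=3):
    """Enumerate simple paths (node, relation, node, ...) with at most max_hops edges."""
    paths = []
    queue = deque([(source, [source])])
    while queue:
        node, path = queue.popleft()
        if node == target:
            paths.append(path)
            continue
        if (len(path) - 1) // 2 >= max_hops:  # path alternates node / relation
            continue
        for rel, nxt in graph.get(node, []):
            if nxt not in path[::2]:  # path[::2] holds the nodes visited so far
                queue.append((nxt, path + [rel, nxt]))
    return paths


graph = build_graph(TRIPLES)
for p in find_paths(graph, "Vienna State Opera", "Gustav Mahler"):
    print(" -> ".join(p))
```

Because the search is breadth-first, shorter explanations (such as the direct `director_of` link) are returned before longer ones.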
Cross-domain semantic networks from Wikipedia
[Figure: POI information taken from Wikipedia: City, Building type and (Architecture) categories]
[Figure: excerpt of Wikipedia's category graph connecting architecture categories (Visitor attractions, Arts venues, Music venues, Opera houses, Architectural styles, 19th century architecture) with history and music categories (Historical eras, Modern history, 18th century, 19th century, Romanticism, Music genres, Opera, Classical music, Music people, Musicians, Composers, Opera composers, Romantic composers, Classical composers, 19th century composers, 19th century musicians, 19th century in music)]
Cross-domain semantic networks from Wikipedia
• Linking Wikipedia’s architecture and music categories
Kaminskas, M., Fernández-Tobías, I., Ricci, F., Cantador, I. 2013. Ontology-based Identification of Music for Places. 13th Intl. Conference on Information and Communication Technologies in Tourism.
Cross-domain semantic networks from Wikipedia
• Cross-domain taxonomies from Wikipedia
‐ Architecture
‐ History / Art
‐ Music
[Figure: excerpts of the three taxonomies. Architecture: Visitor attractions, Arts venues, Music venues, Opera houses, Architectural styles, Centuries in architecture, 19th century architecture. History / Art: Historical eras, Modern history, Romanticism, Centuries, 18th century, 19th century. Music: Music genres, Classical music, Opera, Music people, Musicians, Composers, Classical composers, Romantic composers, Opera composers, 19th century musicians, 19th century composers]
[Figure: schema of the cross-domain semantic network. Node types: POI, City, Date, Year, Decade, Century, Architectural style, Building type, Architectural era, Historical era, Musical era, Music genre, Musician type, Musician. Relations: located_in, has_style, has_type, subcategory_of, genre_of, type_of, birth_place_of, death_place_of, residence_place_of, birth_date_of, death_date_of, activity_date_of, building_start_date_of, building_end_date_of, opening_date_of]
[Figure: example network fragment linking Vienna State Opera and Gustav Mahler through nodes such as Vienna, Austria; 1869; 1860s; 19th century; Opera; Opera houses; Opera houses in Vienna; Opera houses in Austria; 1869 architecture; 19th century architecture; Romanticism; Romantic music; 19th century in music; Classical music; 19th century composers; Romantic composers; Classical composers. Nodes are grouped into Architectural styles, Building types, Music genres, Musician types, Architectural eras, Historical eras, Musical eras, Date and City; labelled edges include birth_decade_of, activity_century_of and death_place_of]
Cross-domain semantic networks from Wikipedia
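• Weight Spreading Activation
$$\mathrm{score}(i) \leftarrow S(i) = (1-d)\cdot \mathrm{rel}(i) + d\cdot \sum_{j \to i} w_{ji}\, S(j)$$
• PageRank
$$\mathrm{score}(i) \leftarrow PR(i) = (1-d)\cdot \frac{1}{N} + d\cdot \sum_{j \to i} \frac{PR(j)}{L(j)}$$
• HITS
$$\mathrm{score}(i) \leftarrow A(i), \qquad A(i) = \sum_{j \to i} H(j), \qquad H(i) = \sum_{i \to j} A(j)$$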
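The three scoring rules above can be compared on a toy network. The sketch below is a minimal illustration, assuming a hypothetical graph stored as `{node: {successor: weight}}`, a damping factor d = 0.85, a fixed number of iterations instead of a convergence test, and rel(i) equal to 1 for the query POI and 0 elsewhere; these are assumptions, not the settings used in the cited work.

```python
import math

# Hypothetical toy network: node -> {successor: edge weight w_ji}.
GRAPH = {
    "Vienna State Opera": {"Opera houses": 0.5, "19th century": 0.5},
    "Opera houses": {"Opera": 1.0},
    "19th century": {"Gustav Mahler": 0.5, "Romantic composers": 0.5},
    "Opera": {"Wolfgang Amadeus Mozart": 0.5, "Gustav Mahler": 0.5},
    "Romantic composers": {"Gustav Mahler": 1.0},
}


def all_nodes(graph):
    """Every node appearing as a source or a target."""
    return set(graph) | {i for out in graph.values() for i in out}


def spreading_activation(graph, rel, d=0.85, iters=50):
    """S(i) = (1 - d) * rel(i) + d * sum_{j -> i} w_ji * S(j)."""
    nodes = all_nodes(graph)
    s = {v: rel.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) * rel.get(v, 0.0) for v in nodes}
        for j, out in graph.items():
            for i, w in out.items():
                new[i] += d * w * s[j]
        s = new
    return s


def pagerank(graph, d=0.85, iters=50):
    """PR(i) = (1 - d) / N + d * sum_{j -> i} PR(j) / L(j)."""
    nodes = all_nodes(graph)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for j, out in graph.items():
            for i in out:  # L(j) = number of outgoing links of j
                new[i] += d * pr[j] / len(out)
        pr = new
    return pr


def hits(graph, iters=50):
    """A(i) = sum_{j -> i} H(j); H(i) = sum_{i -> j} A(j); L2-normalised each round."""
    nodes = all_nodes(graph)
    hub = {v: 1.0 for v in nodes}
    auth = dict(hub)
    for _ in range(iters):
        auth = {v: 0.0 for v in nodes}
        for j, out in graph.items():
            for i in out:
                auth[i] += hub[j]
        hub = {v: sum(auth[i] for i in graph.get(v, {})) for v in nodes}
        for vec in (auth, hub):
            norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
            for v in vec:
                vec[v] /= norm
    return auth


scores = spreading_activation(GRAPH, {"Vienna State Opera": 1.0})
for name in ("Gustav Mahler", "Wolfgang Amadeus Mozart"):
    print(name, round(scores[name], 4))
```

Only spreading activation uses the edge weights and the POI-specific rel(i); PageRank and HITS as written here score nodes from graph structure alone, which is one way to read the differences between them.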
Cross-domain semantic networks from Wikipedia
Average precision values (P@1 to P@5) of the top 5 musicians ranked for each POI
Method      P@1     P@2     P@3     P@4     P@5
Random      0.355*  0.391*  0.363*  0.435*  0.413*
HITS        0.688   0.706   0.711*  0.700*  0.694
PageRank    0.753   0.728   0.707*  0.660*  0.646*
Spreading   0.810   0.804   0.828   0.847   0.837
Values marked with * differ significantly from those of the Spreading algorithm (Wilcoxon signed-rank test, p < 0.05)
Fernández-Tobías, I., Kaminskas, M., Cantador, I., Ricci, F. 2013. A semantic framework for supporting cross-domain recommendation: Suggesting music for places of interest. Submitted.
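The P@k values in the table are averages of per-POI precision at cutoff k. The sketch below shows that computation on hypothetical rankings and relevance judgments; the musician IDs and judgments are invented for illustration and are not the study's data.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the first k ranked items that were judged relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def mean_precision_at_k(rankings, judgments, k):
    """Average P@k over all POIs; both dicts are keyed by POI."""
    values = [precision_at_k(rankings[poi], judgments.get(poi, set()), k) for poi in rankings]
    return sum(values) / len(values)


# Hypothetical rankings and relevance judgments, for illustration only.
rankings = {
    "POI A": ["m1", "m2", "m3", "m4", "m5"],
    "POI B": ["m6", "m7", "m8", "m9", "m10"],
}
judgments = {"POI A": {"m1", "m3"}, "POI B": {"m7", "m8", "m9"}}

for k in range(1, 6):
    print(f"P@{k} = {mean_precision_at_k(rankings, judgments, k):.3f}")
```

Per-POI P@k values of two methods can then be paired and compared with a Wilcoxon signed-rank test (for instance `scipy.stats.wilcoxon`), the test reported in the table.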
Cross-domain semantic networks from Wikipedia
Average number of semantic paths per POI
              Interesting   Non-interesting
Related       78.3%         21.7%
Non-related   8.2%          91.8%
Percentages of interesting and obvious musicians recommended by the Spreading algorithm
              Non-obvious   Obvious
              58.9%         41.1%
              84.2%         15.8%
Cross-domain semantic networks from Open Information Extraction
• TextRunner (openie.cs.washington.edu) and ReVerb (reverb.cs.washington.edu):
‐ Automatic identification and extraction of binary relationships from English sentences
Etzioni, O., Fader, A., Christensen, J., Soderland, S., Mausam. 2011.
Linked to Freebase
Fernández-Tobías, I., Cantador, I. 2013. Open Cross-domain Semantic Networks: Application to Item-to-item Recommendation. To be submitted.
Cross-domain semantic networks from Open Information Extraction
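• Filtering relations based on a TF-IDF heuristic
$$w(e_1, r, e_2) = \lambda\,\frac{c(e_1, e_2)}{\sum_{e_i, e_j} c(e_i, e_j)} + (1-\lambda)\,\mathrm{tfidf}(r)$$
$$\mathrm{tfidf}(r) = \frac{|\{(e_i, r, e_j) \in G\}|}{\max_{s} |\{(e_i, s, e_j) \in G\}|} \cdot \log \frac{N}{|\{(e_i, r, e_j) \in \mathcal{C}\}|}$$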
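The filtering heuristic combines how often an entity pair co-occurs with how distinctive its relation phrase is. The sketch below computes it on a handful of toy triples, under explicit assumptions: c(e1, e2) is read as the number of triples in G linking the pair, normalised by the total over all pairs; G is a domain-specific subset of the extraction corpus C; N is the number of triples in C; and the filtering threshold is a hypothetical parameter.

```python
import math
from collections import Counter


def relation_tfidf(graph, corpus):
    """tfidf(r) = f(r, G) / max_s f(s, G) * log(N / f(r, C)).

    Assumptions: N = number of triples in the corpus C, and every triple
    of G also occurs in C (so f(r, C) > 0).
    """
    tf = Counter(r for _, r, _ in graph)
    cf = Counter(r for _, r, _ in corpus)
    max_tf = max(tf.values())
    n = len(corpus)
    return {r: tf[r] / max_tf * math.log(n / cf[r]) for r in tf}


def triple_weights(graph, corpus, lam=0.5):
    """w(e1, r, e2) = lam * c(e1, e2) / sum c(ei, ej) + (1 - lam) * tfidf(r)."""
    pair_counts = Counter((e1, e2) for e1, _, e2 in graph)
    total = sum(pair_counts.values())
    tfidf = relation_tfidf(graph, corpus)
    return {
        (e1, r, e2): lam * pair_counts[(e1, e2)] / total + (1 - lam) * tfidf[r]
        for e1, r, e2 in graph
    }


def filter_triples(weights, threshold):
    """Keep the triples whose weight reaches a (hypothetical) threshold."""
    return {t for t, w in weights.items() if w >= threshold}


# Toy Open IE-style extractions (hypothetical corpus C).
corpus = [
    ("Mozart", "was born in", "Salzburg"),
    ("Mahler", "was born in", "Kaliště"),
    ("Mahler", "directed", "Vienna State Opera"),
    ("Mozart", "is", "composer"),
    ("Mahler", "is", "composer"),
    ("Vienna State Opera", "is", "opera house"),
]
graph = corpus[:3]  # hypothetical domain-specific subgraph G of C
weights = triple_weights(graph, corpus)
print(filter_triples(weights, threshold=0.7))
```

λ trades off pair co-occurrence against relation distinctiveness; λ = 0.5 is only a placeholder value.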
• Ranking entities according to node categories and graph structure
$$w(e) = \alpha_1\, w_T(e) + \alpha_2\, w_P(e) + \alpha_3\, w_D(e)$$
$$w_T(e) = |T(e) \cap D| \cdot \frac{|T(e) \cap D|}{|T(e)|}$$
$$w_P(e) = |\{s \to e\}|$$
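The entity score is a weighted sum of a category term, a structural term and a third term. The sketch below computes it for toy entities, assuming that T(e) is the set of types of entity e, D the set of types of the target domain, and w_P(e) the number of incoming edges s → e. The source does not define w_D(e), so it is passed in as a precomputed, hypothetical dictionary; the α values are placeholders.

```python
def type_score(entity_types, domain_types):
    """w_T(e) = |T(e) ∩ D| * |T(e) ∩ D| / |T(e)|."""
    if not entity_types:
        return 0.0
    overlap = len(entity_types & domain_types)
    return overlap * overlap / len(entity_types)


def in_degree(edges, entity):
    """Assumed reading of w_P(e): number of edges s -> e pointing at the entity."""
    return sum(1 for _, target in edges if target == entity)


def entity_score(entity, types, domain_types, edges, w_d, alphas=(0.5, 0.25, 0.25)):
    """w(e) = a1 * w_T(e) + a2 * w_P(e) + a3 * w_D(e), with w_D supplied precomputed."""
    a1, a2, a3 = alphas
    return (a1 * type_score(types.get(entity, set()), domain_types)
            + a2 * in_degree(edges, entity)
            + a3 * w_d.get(entity, 0.0))


# Hypothetical toy inputs.
DOMAIN = {"musician", "composer", "singer"}
TYPES = {
    "Mahler": {"composer", "conductor"},
    "Mozart": {"composer", "musician"},
    "Vienna": {"city"},
}
EDGES = [("Vienna State Opera", "Mahler"), ("Vienna", "Mahler"), ("Salzburg", "Mozart")]

for e in TYPES:
    print(e, entity_score(e, TYPES, DOMAIN, EDGES, w_d={}))
```

The squared overlap in w_T rewards entities with many domain types while the division by |T(e)| penalises entities whose types are mostly off-domain. The in-degree here is unnormalised, so in practice the three terms would probably need rescaling to comparable ranges.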
A social tag-based emotion-oriented approach for linking domains
• Mining social tagging systems to create linked emotion-oriented
• Generic emotion lexicon
• Automatically created by mining online thesauri (e.g. thesaurus.com)
• 16 main emotions: alert, excited, elated, happy, content, serene, relaxed, calm, fatigued, bored, depressed, sad, upset, stressed, nervous, tense
• Emotion = synonym & antonym vector
‐ Synonyms: positive weights
‐ Antonyms: negative weights
A social tag-based emotion-oriented approach for linking domains
Fernández-Tobías, I., Plaza, L., Cantador, I. 2013. Cross-domain Emotion Folksonomies. To be submitted.
happy: +66, cheerful: +21, merry: +19, felicitous: +17, …; unhappy: -11, sad: -10, depressed: -6, serious: -4, …
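With each emotion stored as a signed word vector, an item's social tags can be scored against it. The sketch below uses only the partial "happy" vector listed above and cosine similarity as the matching function; cosine is an assumption chosen for illustration, not necessarily the measure used in the cited work, and the example tag counts are invented.

```python
import math

# Partial "happy" vector: only the weights listed above (the full entry is longer).
LEXICON = {
    "happy": {"happy": 66, "cheerful": 21, "merry": 19, "felicitous": 17,
              "unhappy": -11, "sad": -10, "depressed": -6, "serious": -4},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def emotion_profile(tag_counts, lexicon):
    """Score an item's tag counts against every emotion vector in the lexicon."""
    return {emotion: cosine(tag_counts, vec) for emotion, vec in lexicon.items()}


print(emotion_profile({"cheerful": 3, "merry": 1}, LEXICON))
print(emotion_profile({"sad": 2, "serious": 1}, LEXICON))
```

Antonyms with negative weights let tags such as "sad" push an item away from "happy" instead of merely contributing nothing.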
A social tag-based emotion-oriented approach for linking domains
• Generic emotion lexicon
• In accordance with Russell’s emotion model (1980)
‐ Emotion representation in 2 dimensions: pleasure & arousal
[Figure: the 16 emotions placed in Russell's circumplex, with a pleasure vs. misery axis, an arousal vs. sleepiness axis, and the quadrants excitement, contentment, depression and distress]
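In a two-dimensional model, an item's emotion profile can be summarised as a single (pleasure, arousal) point. The sketch below does this as a weighted mean of emotion coordinates and names the resulting quadrant. The coordinates for four emotions are hypothetical values chosen for illustration, not read from the figure, and the weighted-mean projection is an assumption rather than the method of the cited work.

```python
# Hypothetical (pleasure, arousal) coordinates for four of the 16 emotions,
# chosen only to illustrate the circumplex; not taken from the figure.
COORDS = {
    "excited": (0.6, 0.8),
    "content": (0.8, -0.4),
    "sad": (-0.8, -0.4),
    "tense": (-0.4, 0.8),
}


def project(profile, coords):
    """Weighted mean of emotion coordinates, using positive scores as weights."""
    items = [(s, coords[e]) for e, s in profile.items() if s > 0 and e in coords]
    total = sum(s for s, _ in items)
    if total == 0:
        return (0.0, 0.0)
    pleasure = sum(s * p for s, (p, _) in items) / total
    arousal = sum(s * a for s, (_, a) in items) / total
    return (pleasure, arousal)


def quadrant(pleasure, arousal):
    """Name the circumplex quadrant that a point falls in."""
    if pleasure >= 0:
        return "excitement" if arousal >= 0 else "contentment"
    return "distress" if arousal >= 0 else "depression"


point = project({"excited": 1.0, "content": 1.0}, COORDS)
print(point, quadrant(*point))
```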