V. Loreto
http://www.tagora-project.eu/
TAGora
TAGora consortium
Prof. L. Steels, Prof. N. Shadbolt, Dr. Harith Alani, Prof. S. Staab, Prof. G. Stumme, Prof. V. Loreto (Coordination)
TAGora Synergies
PHYS-SAPIENZA
SONY-CSL
• complex systems expertise
• modeling and simulation
• data from existing folksonomies
• statistical tools for data analysis
• stochastic models of tagging behavior
• dissemination
• project coordination
UNI-SOTON
• image tag-based navigation system
• music tag-based navigation system
• feature-enhanced navigation maps
• expertise in semiotic dynamics
• collective intelligence for sustainability
• dissemination
UNIK UNI KO-LD
• tagging system for bibliographic data
• data mining for folksonomies
• social network analysis
• dissemination
• recommendation systems
• cross-folksonomy analysis
• trend detection
• network analysis
• dissemination
• peer-to-peer testbed for folksonomies
• representation of folksonomy data
• ontology learning
a short history of the web
1989-1991
1991-2000
1998
2000
2000-2004
2005
the Semantic Web vision by T. Berners-Lee
WWW is created at CERN
users become content providers,
rise of online communities
“bottom-up” information architecture
mass adoption, users are consumers,
taxonomic approach
Google is born
artwork by R. Munroe
~ 100 million users
http://del.icio.us
[Diagram: users annotate resources with sets of { tags }]
.net 3d advertising ajax animation api apple architecture art article articles audio
bittorrent blog blogging blogs book books browser business calendar cms code
collaboration color comics community computer computers cooking cool css culture
daily database del.icio.us design development diy download downloads dvd economics
education electronics email english entertainment environment fashion film finance
firefox flash flickr fonts food forum framework free freeware fun funny gadgets
gallery game games geek graphics gtd guide hack hacks hardware health history home
hosting howto html humor icons illustration images imported information inspiration
interesting internet ipod japan java javascript jobs language learning library life
lifehacks links linux list literature mac magazine management map maps marketing
math media microsoft mobile money movie movies mp3 music mysql network networking
news online opensource osx p2p perl personal philosophy phone photo photography
photos photoshop php plugin podcast politics portfolio privacy productivity
programming psychology python radio rails recipes reference religion research
resource resources reviews rss ruby rubyonrails safari_export science search
security seo server service shop shopping social software spyware statistics
sysadmin tech technology tips tool tools toread travel tutorial tutorials tv
typography ubuntu unix usability useful utilities video videos visualization web
web2.0 webdesign webdev wiki windows wordpress work writing xml
del.icio.us
Navigate the information sea
the complexity
community level
http://www.flickr.com/photos/gustavog/9708628/
user level
http://dml.riken.go.jp/~ciro/blog/2005/Feb/14
FOLKSONOMY
Main results of the first year
• Extensive data collection from selected collaborative tagging systems (del.icio.us, Flickr and Last.Fm)
• Acquisition of existing datasets from several social websites (IMDB, Netflix, Wikipedia)
• Realization of web-based applications:
  • BibSonomy (www.bibsonomy.org)
  • Ikoru (www.ikoru.net)
http://bibsonomy.org (by KDE @ Kassel)
[Diagram: a post links a user, a resource and a set of { tags }]
Tag co-occurrence: raw data
design development css webdev dhtml xml programming xmlhttprequest web javascript ajax
xslt xmlhttprequest html css java rss programming ajax web javascript xml
tech politics art daily css rss music news web design blog
[Figure axes: time, rank]
“serialize” posts to produce a stream of tags
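The serialization step can be sketched in Python as follows. This is only an illustration under stated assumptions: posts are represented as hypothetical `(timestamp, tags)` pairs, and tags inside a post keep the order in which they were stored. The slides do not specify either choice.

```python
def serialize_posts(posts):
    """Flatten time-ordered posts into a single stream of tags.

    `posts` is an iterable of (timestamp, tags) pairs; this layout is an
    assumption made for illustration, not the project's actual data format.
    """
    stream = []
    # Sort by timestamp only; sorted() is stable, so posts sharing a
    # timestamp keep their input order.
    for _timestamp, tags in sorted(posts, key=lambda post: post[0]):
        # Assumption: tags within one post are appended in stored order.
        stream.extend(tags)
    return stream


example_posts = [
    (2, ["xml", "web"]),
    (1, ["ajax", "javascript"]),
]
print(serialize_posts(example_posts))
```

The resulting list is the object on which rank, correlation and vocabulary statistics can then be measured.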
frequency-rank plot
[Figure: P(R) versus rank R on log-log axes (R from 10^0 to 10^4, P(R) from 10^-7 to 10^-1); legend: "blog", "ajax", "xml", "H5N1"; datasets: del.icio.us, Connotea; labeled top-ranked tags: web, news, music, rss, design, javascript]
high rank: $P(R) \sim R^{-\alpha}$, $\alpha > 1$ ($\alpha \approx 5/4$)
low rank: leveling off
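A frequency-rank table of the kind plotted here could be computed from a tag stream roughly as below. This is a hedged sketch: the function name `rank_frequency` and the choice to normalize by the stream length are assumptions, not taken from the slides.

```python
from collections import Counter


def rank_frequency(stream):
    """Return (rank, tag, P) triples, with rank 1 for the most frequent tag.

    P is the tag's share of the stream (assumed normalization).
    """
    counts = Counter(stream)
    total = len(stream)
    return [
        (rank, tag, count / total)
        for rank, (tag, count) in enumerate(counts.most_common(), start=1)
    ]


stream = ["web", "blog", "web", "news", "web", "blog"]
for rank, tag, p in rank_frequency(stream):
    print(rank, tag, round(p, 3))
```

Plotting P against rank on log-log axes would then show whether the high-rank tail follows a power law.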
the Yule-Simon process
• we start with n_0 words
• at time t: with probability p, a new word is appended
• with probability 1-p, a word is copied at random from the past
[Diagram: stream of words ..., t-9, t-8, ..., t-1; at time t a new word is appended with probability p, or the word at position t-x is copied with probability 1-p]
$P(R) \sim R^{-(1-p)}$
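The three rules of the Yule-Simon process translate directly into a short simulation. The sketch below is illustrative only; the function name, the integer labels used for words and the `seed` parameter are assumptions introduced here.

```python
import random


def yule_simon_stream(n_steps, p, n0=1, seed=None):
    """Simulate the Yule-Simon process and return the word stream.

    Words are labeled by integers (an illustrative choice).
    """
    if n0 < 1:
        raise ValueError("n0 must be at least 1")
    rng = random.Random(seed)
    stream = list(range(n0))  # n0 distinct initial words
    next_word = n0
    for _ in range(n_steps):
        if rng.random() < p:
            # With probability p, append a brand-new word.
            stream.append(next_word)
            next_word += 1
        else:
            # With probability 1-p, copy a word chosen uniformly from the past.
            stream.append(rng.choice(stream))
    return stream


print(yule_simon_stream(n_steps=20, p=0.1, n0=2, seed=0))
```

Because copying is uniform over past positions, a word is copied with probability proportional to how often it already occurs, which is the rich-get-richer mechanism behind the power law.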
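Tag Correlations
$$C(\Delta t, t_w) = \frac{1}{T - \Delta t} \sum_{t = t_w}^{t_w + T - \Delta t} \delta\big(\mathrm{tag}(t + \Delta t), \mathrm{tag}(t)\big)$$
[Diagram: tag stream t, t+1, ..., t+8, ... with lag Δt]
[Figure: C(Δt, t_w) versus Δt (Δt from 100 to 1000, C between 0.006 and 0.012) for three windows [t_w, t_w + T] labeled 1, 2, 3, with plateaus c(t_w) and fit a(t_w) / [Δt + δ(t_w)] + c(t_w)]
$$C(\Delta t) \sim \frac{1}{\Delta t + \delta}$$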
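The correlation function C(Δt, t_w) can be evaluated on a tag stream with a few lines of Python. This is a sketch under one stated assumption: the sum runs over a half-open window of T - Δt positions, so that the number of terms matches the normalization factor; the slide's upper limit may be meant inclusively.

```python
def tag_correlation(stream, delta_t, t_w, T):
    """Fraction of positions t in the window where tag(t + delta_t) == tag(t)."""
    if not 0 < delta_t < T:
        raise ValueError("delta_t must satisfy 0 < delta_t < T")
    if t_w < 0 or t_w + T > len(stream):
        raise ValueError("window [t_w, t_w + T) must lie inside the stream")
    # Assumption: T - delta_t terms, one per start position in the window.
    matches = sum(
        1
        for t in range(t_w, t_w + T - delta_t)
        if stream[t + delta_t] == stream[t]
    )
    return matches / (T - delta_t)


stream = ["a", "b", "a", "b", "a", "b"]
print(tag_correlation(stream, delta_t=2, t_w=0, T=6))
print(tag_correlation(stream, delta_t=1, t_w=0, T=6))
```

In the alternating example every tag reappears at lag 2 and never at lag 1, so the two calls give the extreme values of the function.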
$$c(t_w) = \sum_{R=1}^{R_{\max}(t_w)} P_{t_w}^2(R)$$
a Yule-Simon model with memory
• we start with n_0 words
• at time t: with probability p, a new word is appended
• with probability 1-p, a word is copied from position t-x
• x is distributed according to a fat-tailed memory kernel Q(x)
[Diagram: stream of words ..., t-9, t-8, ..., t-1; new word with probability p, copy from position t-x with probability 1-p; kernel plotted against ln x]
$$Q_t(x) \sim \frac{1}{x + \tau}$$
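The memory variant differs from the plain Yule-Simon process only in how the copied position is chosen. The following sketch samples the lag x from weights proportional to 1/(x + τ) over x = 1, ..., t. It is a simple quadratic-time illustration, not the authors' implementation, and the function and parameter names are assumptions.

```python
import random


def memory_yule_simon_stream(n_steps, p, tau, n0=1, seed=None):
    """Simulate a Yule-Simon process whose copy step uses a memory kernel."""
    if n0 < 1:
        raise ValueError("n0 must be at least 1")
    rng = random.Random(seed)
    stream = list(range(n0))  # n0 distinct initial words, labeled by integers
    next_word = n0
    for _ in range(n_steps):
        t = len(stream)
        if rng.random() < p:
            # With probability p, append a new word.
            stream.append(next_word)
            next_word += 1
        else:
            # With probability 1-p, copy the word at position t - x,
            # with x drawn from weights proportional to 1 / (x + tau).
            lags = range(1, t + 1)
            weights = [1.0 / (x + tau) for x in lags]
            x = rng.choices(lags, weights=weights, k=1)[0]
            stream.append(stream[t - x])
    return stream


print(memory_yule_simon_stream(n_steps=20, p=0.06, tau=20, n0=3, seed=0))
```

Recomputing the weights at every step keeps the code close to the definition; for long streams an inverse-CDF sampler would be the natural optimization.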
frequency-rank plot: exp. vs model
[Figure: P(R) versus rank R on log-log axes, experimental data and theory curves for "blog", "ajax" and "xml"; inset with experimental curves ("blog", "ajax", "xml", "H5N1"; del.icio.us, Connotea); annotations: p = 0.06, p = 0.03; τ = 20, τ = 100; “blog”, “ajax”]
$$Q_t(x) \sim \frac{1}{x + \tau}$$
C. Cattuto, VL, L. Pietronero PNAS 104, 1461 (2007)
global vocabulary growth
[Figure: N(τ) versus τ on log-log axes (τ from 10^4 to 10^8, N(τ) from 10^2 to 10^7); inset: N versus year, 2004 to 2006, with N up to 2×10^6]
~ 650,000 users
~ 2 · 10^7 resources
~ 5 · 10^7 posts
~ 2.5 · 10^6 tags
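A vocabulary growth curve can be obtained from a tag stream by counting distinct tags as the stream is read. The sketch below assumes that the horizontal axis counts tag assignments; that reading of the axis is an assumption, as the slide does not define it.

```python
def vocabulary_growth(stream):
    """Return N(k): number of distinct tags among the first k stream entries."""
    seen = set()
    growth = []
    for tag in stream:
        seen.add(tag)
        growth.append(len(seen))
    return growth


print(vocabulary_growth(["web", "blog", "web", "news"]))
```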
Tag cloud T_1 of resource R_1:
advertising agency art arte artist artists blog branding cool creative creativity css
design designer designers digital drawing fashion flash gallery graphic graphics ideas
identity illustration illustrator ilustração inspiration interactive magazine marketing
motion photo photographer photography portfolio portfolios print propaganda reference
showcase studio sweden typography vector wallpaper web webdesign webdev website

Tag cloud T_2 of resource R_2:
art awards blog blogs code community cool css cssgallery design development directory
diseño examples free galeria galleries gallery graphics html ideas inspiration interface
layout links news portal portfolio programming reference resource resources safari_export
showcase standards style template templates tools tutorial typography usability web
web2.0 web_design webdesign webdev website webstandards xhtml
Tagcloud Overlap Metrics
$$w_{R_1,R_2} = \frac{\sum_{t \in T_1 \cap T_2} \frac{\min(f_t^1, f_t^2)}{f_t}}{\sum_{t \in T_1 \cap T_2} \frac{\max(f_t^1, f_t^2)}{f_t} + \sum_{t \in T_1 - T_2} \frac{f_t^1}{f_t} + \sum_{t \in T_2 - T_1} \frac{f_t^2}{f_t}}$$
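The overlap weight between two tag clouds can be sketched as a weighted Jaccard-style ratio. The code below assumes one particular reading of the slide's formula: each tag's contribution is divided by its global frequency f_t, and f_t^1, f_t^2 are the tag's frequencies in the two clouds. The dictionary layout and all names are illustrative assumptions.

```python
def tagcloud_overlap(cloud1, cloud2, global_freq):
    """Overlap weight of two tag clouds given as {tag: frequency} dicts.

    Assumed reading: every term is divided by the tag's global frequency.
    """
    shared = cloud1.keys() & cloud2.keys()
    only1 = cloud1.keys() - cloud2.keys()
    only2 = cloud2.keys() - cloud1.keys()

    numerator = sum(min(cloud1[t], cloud2[t]) / global_freq[t] for t in shared)
    denominator = (
        sum(max(cloud1[t], cloud2[t]) / global_freq[t] for t in shared)
        + sum(cloud1[t] / global_freq[t] for t in only1)
        + sum(cloud2[t] / global_freq[t] for t in only2)
    )
    # Two empty clouds have no overlap by convention (illustrative choice).
    return numerator / denominator if denominator else 0.0


cloud_r1 = {"css": 2, "design": 4}
cloud_r2 = {"css": 1, "art": 3}
global_freq = {"css": 10, "design": 20, "art": 30}
print(tagcloud_overlap(cloud_r1, cloud_r2, global_freq))
```

Under this reading, identical clouds give a weight of 1 and clouds with no shared tag give 0, while globally common tags contribute less than rare ones.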
Resource Networks: Community Structure
politics
Tag clouds of the six resource communities (the figure numbers them 1 to 6):
• 37signals art blog books css design development font fonts free graphics howto illustration inspiration photo photography photoshop portfolio productivity programming reference software system:unfiled themes tutorial tutorials typography web webdesign wordpress
• activism art blog burn bush creativity culture dvd economics flash freeware fun funny government history humor maps media money politics reference research software speechwriter statistics system:unfiled tools usa war windows
• art business color css design development flash free fun game games google graphics html inspiration patterns photography photos pricing reference resources search software stock system:unfiled tools web web2.0 webdesign webdev
• ajax art awards blog blogger blogs color cool css design flash gallery graphics html images inspiration internet javascript lightbox politics portal portfolio reference system:unfiled templates tools web web2.0 webdesign webdev
• ajax art blog books color css design desktop desktops development extension extensions firefox flash graphics icons illustration inspiration programming reference software system:unfiled technology tools typography wallpaper wallpapers web webdesign webdev
• activism blog blogs bush colbert comedy conservative culture election fraud freedom funny government grillo humor internet law libertarian maps media news political politics progressive science security system:unfiled usa video voting
[Figure: resource network plotted on axes V3, V4, V5, with six clusters numbered 1 to 6]
Cluster labels: “humor” in politics, news in politics, web design, visual design
spam detection
raw data
no spam
shuffled
Expected results of the Project
Devise methods and algorithms for analysing raw data
collected in online social communities.
Develop suitable modeling and theoretical
constructions to understand, predict and control
emergent properties.
Develop and make publicly available innovative
applications embodying novel navigation and control
concepts.
Foster the growth of new web communities revolving
around the applications developed by the Consortium.
Create the first extensive and comprehensive body of
data on web-based tagging and make it available to the
broader IT and scientific community.
C. Cattuto, V.D.P. Servedio and VL, “A Yule-Simon process with memory”, Europhys. Lett. 76, 208 (2006)
C. Cattuto, VL and L. Pietronero, “Semiotic dynamics and collaborative tagging”, PNAS 104, 1461 (2007)
C. Cattuto, A. Baldassarri, V.D.P. Servedio and VL, “Vocabulary growth in social tagging systems”, http://arxiv.org/abs/0704.3316v1
C. Cattuto et al., “Network Properties of Folksonomies”, AI Communications (2007), in press
http://www.tagora-project.eu/