1st Int. Alexandria Workshop (15th. Sep. 2014)
Multiple Media
Multiple
Media
Analysis
and
Visualization
with
Large
‐
Scale
Temporal
Web
Archives
Masashi Toyoda
Masashi
Toyoda
Center
for
Socio
‐
Global
Informatics,
I tit t
f I d t i l S i
Institute
of
Industrial
Science,
Collaborative researchers
Collaborative
researchers
•
Masahiko
Itoh (The
University
of
Tokyo)
•
Naoki Yoshinaga (The University of Tokyo)
Naoki
Yoshinaga (The
University
of
Tokyo)
•
Cai
‐
Zhi
Zhu (National
Institute
of
Informatics)
•
Shin’ichi Satoh
(National
Institute
of
Informatics))
•
Masaru
Kitsuregawa (National
Institute
of
I f
ti
/ Th U i
it
f T k )
Informatics
/
The
University
of
Tokyo)
Web
as
a
Sensor
for
Society
Interaction
between
Media
Mass
and
web
media
affect
each
other
di
ill h
bi i fl
•
Mass
media
still
have
big
influence
– Major topics in web media are coming from TV
•
Web
media
involve
diverse
services
and
become
important
information
sources
for
mass
media
– Blogs, microblogs, SNS, photo/video/link sharing..
4
Multiple
Media
Fusion
Analyzing
multiple
media
is
strongly
required
for
understanding
social
activities
5
Web Archives in UTokyo
Web
Archives
in
UTokyo
•
Japanese
Web
Archive
–
Focused
crawl
of
Japanese
pages
in
every
domains
(1999
‐
)
–
Contents
of
30
billion
URLs
O
f th l
t
–
One
of
the
largest
archive
in
Asian
region
•
Blogs
number of URLs (billion)
30
Blogs
–
From
2006
–
2.5 million feeds
20 252.5
million
feeds
–
About
1
billion
articles
•
–
From
2011
–
1.5
million
users
0 5 98 00 02 04 06 08 10 12 14–
About
20
billion
tweets
6
Broadcast
News
Video
Archive
in
NII
TV RECS i N i
l I
i
f I f
i
•
TV
‐
RECS
in
National
Institute
of
Informatics
–
News
program
mainly from NHK
mainly
from
NHK
• March 2001 ‐
–
24
‐
hour,
MPEG 1/2 Capture CC Client PC
,
7
channels
in
Tokyo
area
• August 2009 CC Decoder MPEG 1/2 Capture CC Video Archive Server DBMS Server Client PC Client PC • August 2009 ‐
–
300,000
hours,
1
petabyte
CC Decoder ・ ・ ・ 24 hours CC Client PCp
y
–
closed
‐
caption
text
and
electronic
program guide
MPEG 1/2 Capture CC Decoder ・ 24 hours Archives (upto a year) News Archives CC Text EPG Vid A hi (1PB) Metadataprogram
guide
(EPG).
7Big Data Visualization Platform
(or Web Observatory Control Room?)
Visualizing large scale data on large wall displays
Platform
for
Multiple
Media
Analysis
Multiple Media Analysis Image Linkage
Graph Analysis Text Analysis
A A A A A B C D E F A B C D E F A B C D E F A B C D E F Video Archive b d h b l F G H F G H F G H F G H
Web media archive by continuous crawler
Information
Diffusion
on
3.11
in
•
was
one
of
the
important
information
sources for evacuation in Tokyo
sources
for
evacuation
in
Tokyo
•
Situation
in
Tokyo:
All t i
d
t
–
All
trains
and
metros
were
stopped
–
Highways were closed
Highways
were
closed
–
Millions
of
people
walked
to
home
or
shelters
–
Cell
phones:
voice
didn’t
work,
data
worked
but
narrow
Challenge:
Observing
important
information
diffusion
and supplying right information in right timing
Visualizing
Information
Diffusion
in
•
Graph
visualization
of
mentions
and
retweets
b
between
tweets
•
Tweets
are
automatically
clustered by their
clustered
by
their
text
similarities
•
Temporal
changes
in
graphs
g p
are
animated
Analysis and Visualization
Analysis and Visualization
of Temporal Changes
in Bloggers'
in Bloggers
Activities and Interests
Masahiko ITOH, Naoki Yoshinaga,
Masashi TOYODA and Masaru KITSUREGAWA Masashi TOYODA, and Masaru KITSUREGAWA
Institute of Industrial Science (IIS), University of Tokyo, Japan y y p
Visual Summarization of Events by
Term Dependency Analysis
Term Dependency Analysis
Blog archive
2006 2 2006.2
Extracted events Summary of events
… die of swine flu … catch swine flu … 2006.2 catch swine flu
Blog post verb
noun in California 1
swine flu…
… die of swine flu in California
die of
swine flu in California
sentence die of 2 … catch swine flu … catch i fl die of Dependency structures … catch swine flu in California … swine flu 2006.3 swine flu swine flu catch catch 3 2006.4 ・ ・ ・ 2006 3 swine flu in California in California 1 ・ ・ ・ 2006.3 2006.4 ・ ・ ・ 13
Term Dependency Database
Dependency Database
Noun-verb DB
swine flu catch 3
・・・ frequency die 2 California catch 1 obj of in Verb-noun DB 2006.3 2006.2 California catch 1 die 1 2006.4 ・ ・ ・ in 1 die of swine flu 2 in California
1 catch
California swine flu 1
in of
obj swine flu 3 in California 1
2006.2 ・ ・ ・ ・・・ 1 California
in obj swine flu 1
14
2006.3 2006.4
3D Visualization
Visualizing dependency relations as a g p y
unified tree representation
TimeSlice [Itoh 2010]: slidable 2D plane on the
timeline to visualize temporal changes in structure
structure
Sliding operation
changes in the structure and frequencies
Interactively navigating to the detailed
dependency
P idi lti l 2D l t
Providing multiple 2D planes to
compare
Different timings, different topics Tiled view
Tiled view
Visualizing changes in the frequencies
of each dependency relation of each dependency relation
TimeFlux: line of spheres to visualize changes
in the amount of information
Image
Flows
Visualization
for
Inter
‐
Media
Comparison
Masahiko Itoh, Masashi Toyoda (The University of Tokyo), Masahiko Itoh, Masashi Toyoda (The University of Tokyo),
Cai‐Zhi Zhu, Shin’ichi Satoh (National Institute of Informatics),
Masaru Kitsuregawa (National Institute of Informatics, The University of Tokyo)
PACIFIC
VISUALIZATION 2014
17
Tracking
g
Diffusion
of
Topics
p
through
g
Blogs
g
and
TV
News
•
What
kinds
of
contents
become
popular
in
each
di
?
medium?
•
For
a
given
topic,
which
medium
first
provided
the
contents?
•
How
the
exposure
of
contents
in
one
medium
affect
the
other?
Topic
p
Tracking
g
based
on
Image
g
Matching
g
•
Examine
influence
and
diffusion
of
image
contents
Clustering duplicate and near mirror images in blog posts
– Clustering duplicate and near‐mirror images in blog posts
– Topics are created by grouping image clusters based on their
similarity of surrounding texts
– For each topic, retrieve TV shots relevant to each image clusters in
the topic
Visualizing image flows from blogs and TV
– Visualizing image flows from blogs and TV
Topical keywords Extracting image Topics Image Matching Topics
Image Topics News video archive
g
Blog archive
19
Visual Exploration of Topics
Visual
Exploration
of
Topics
1. Visualizingg the time series of imageg flows
2. Comparing multiple topics
3. Comparing imagesp g g in different media 4. Exploring image clusters
5. Accessing detailed information
5 ccess g deta ed o at o
A
topic
is
represented
by
• Life time of the topica
histogram
of
images
• Changes in visual trends• Bursts
• Variety of imagesVariety of images
20
Visual Exploration of Topics
Visual
Exploration
of
Topics
1. Visualizingg the time series of imageg flows 2. Comparing multiple topics
3. Comparing imagesp g g in different media
4. Exploring image clusters
5. Accessing detailed information
5 ccess g deta ed o at o TV Bl l TV Blog 21 front and back top and bottom Blog
Visual Exploration Environment
Visual
Exploration
Environment
1. Visualizingg the time series of imageg flows For each image cluster
2. Comparing multiple topics
3. Comparing images in different media
pair, calculate and filter by:
• cosine similarity
l ti d
p g g
4. Exploring image clusters
5. Accessing detailed information
• cross‐correlation and lead/lag date • number of images 5 ccess g deta ed o at o TV Blog 22 Parallel Coordinates View Image Flow Visualization Blog
Case
Study
1.
Fukushima
Nuclear
Disaster
TV
Blog
Case
Study
1.
Spotting
the
origin
of
images
TV
Blog leads TVBlog
Explanation of the structure and situation of the 25 Fukushima nuclear power plant by a researcher (Dr. Josef Oehmen) at MITCase Study 2 Political Issue
• Some political activity in some public event
Case
Study
2.
Political
Issue
was first broadcasted in TV• In first two days, TV programs were
h l h h
TV somehow reluctant to show the image
• That activity made bloggers angry and they
l i l i d th i
explosively copied the image
• After seeing that response, TV news started
to treat the issue as a big problem
Blog
to treat the issue as a big problem
Platform
for
Multiple
Media
Analysis
Multiple Media Analysis Image Linkage
Graph Analysis Text Analysis
A A A A A B C D E F A B C D E F A B C D E F A B C D E F Video Archive b d h b l F G H F G H F G H F G H
Web media archive by continuous crawler