(1)

Big Data in Pictures:

Data Visualization

Huamin Qu

(2)

What is data visualization?

• “Data visualization is the creation and study of the visual representation of data” (Wikipedia)

(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)

Data Visualization

• Vis is hot & cool

• Vis is young

• What is vis research?

(11)
(12)
(13)
(14)
(15)
(16)
(17)
(18)

Obama’s big data plan

• A $2 million award for a research training group in big data that will support training for undergraduate and graduate students and postdoctoral fellows using novel statistical, graphical, and visualization techniques to study complex data.

(19)
(20)
(21)
(22)
(23)


(24)

Subfields

• Scientific Visualization (SciVis) – Spatial data

• Information Visualization (InfoVis) – Abstract data

(25)

VAST (Visual Analytics Science and Technology)

InfoVis (Information Visualization)

SciVis (Scientific Visualization)

(26)
(27)
(28)
(29)
(30)

VIS - Infographics

• Infographics are static

(31)
(32)
(33)
(34)

VIS - HCI

• Visualization deals with data

• HCI deals with everything involving human & computer interaction

(35)

VIS - Data Mining

• Data mining focuses more on automatic algorithms

• Visualization keeps the human in the loop and focuses more on interactive analysis

(36)

VA and Data Mining

• Data Analysis

– let computers do what computers are good at: data mining

– let humans do what they're good at

(37)

Data mining

Visual Analytics

Human-Computer Interaction

Computer Graphics

(38)
(39)


(40)

What is vis research?

• Techniques/algorithms

• Applications

• Systems

• Evaluations

• Theory/models

(41)

Visualization Process

• Visualization Pipeline

(42)

Technique Papers

(43)

Application/Design Study Papers

Applying visualization and visual analytics techniques in an application area.

(44)

System Papers

A blend of algorithms, technical requirements, user requirements, and design that solves a major problem.

(45)

Evaluation Papers

(46)

Theory/Model papers

New interpretations of the foundational theory of visualization and visual analytics.

“How Information Visualization Novices Construct Visualizations”, Grammel et al., IEEE TVCG 2010

(47)

A Typical Visual Analytics Problem

IEEE VAST Challenge 2009

An employee is leaking important information to the outside world; hypotheses about his identity and network need to be made or confirmed. There are three datasets:

– Badge and computer network traffic

– Social network (with a very small geospatial component)

– Video

(48)
(49)
(50)

Role of Visualization in

Big Data Analytics

(51)

Challenges for

Visualization Research

(52)

Better Scalability

• Better visualization

• More people

(53)

Better Visualization

• Better overview & summarization

• Better data reduction

• Better visual encoding

• Better user interaction

• …

(54)

Top Ten Interaction Challenges

in Extreme-Scale Visual Analytics

P.C. Wong, H.W. Shen and C. Chen, 2012

1. In Situ Interactive Analysis

2. User-Driven Data Reduction

3. Scalability and Multi-level Hierarchy

4. Representation of Evidence and Uncertainty

(55)

Top Ten Interaction Challenges

in Extreme-Scale Visual Analytics

6. Data Summarization and Triage for Interactive Query

7. Analytics of Temporally Evolving Features

8. The Human Bottleneck

9. Design and Engineering Development

(56)

How to Get More People?

• Lower the bar for visualizations

• Bring visualizations to the masses

(57)

Lower the Bar for Visualizations

• Google Visualization API

• IBM’s RAVE and Many Eyes

• D3.js

(58)
(59)
(60)
(61)
(62)
(63)
(64)


(65)
(66)

Medical Data Visualization

• TVCG’07

• TVCG’08 (Vis’08)

• TVCG’09 1 (Vis’09)

• TVCG’09 2 (Vis’09)

• TVCG 2011

(67)

High Dimensional and Relational Data

• TVCG’08 (InfoVis’08)

• CGF’08 (EuroVis’08)

• CGF’09 (EuroVis’09)

• TVCG’09 (InfoVis’09)

• TVCG’11 InfoVis’11

(68)

Text Data

• Dynamic Tag Clouds: IEEE CG&A’11

• OpinionSeer: TVCG’10 (InfoVis’10)

• FacetAtlas: TVCG’10 (InfoVis’10)

• TextFlow: TVCG’11 (InfoVis’11)

• TextWheel: ACM TIST’12

(69)

Urban Informatics

• Air pollution analysis: TVCG’07 (Vis’07)

• Route Zooming: TVCG’09 (Vis’09)

• Trajectory Analysis: VAST’11

(70)

Three Projects

• Twitter Data Visualization

• Trajectory Data Visualization

• Coursera Data Visualization

(71)
(72)

Whisper

(73)

How is information spread?

(74)

Information diffusion: when, where, and how are ideas spread?

The temporal trend, social-spatial extent and community response to a topic of interest.

(75)

Challenges

• How can we represent the dynamic information diffusion processes for diffusion pattern detection and comparison?

– How can we deal with a huge amount of microblog data in real time?

– How can we summarize the diffusion processes by considering temporal, spatial, topic, and community information?

(76)

Visualization design

• Color hues: sentiments

• Color opacity: activeness of tweets / users

• Size: importance (number of followers)
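
The three encodings above can be read as a mapping from tweet attributes to visual attributes. The following is a minimal Python sketch of such a mapping, not Whisper's implementation. The `Tweet` fields, the value ranges, the red-to-green hue scale, and the square-root size scale are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    sentiment: float   # assumed range: -1 (negative) to 1 (positive)
    activeness: float  # assumed range: 0 (inactive) to 1 (very active)
    followers: int     # follower count of the author


def encode(tweet: Tweet) -> dict:
    """Map tweet attributes to hue, opacity, and glyph radius."""
    # Hue: 0 degrees (red) for negative, 120 degrees (green) for positive.
    s = max(-1.0, min(1.0, tweet.sentiment))
    hue = 60.0 * (s + 1.0)
    # Opacity: more active tweets are more opaque; keep a visible floor.
    a = max(0.0, min(1.0, tweet.activeness))
    opacity = 0.2 + 0.8 * a
    # Size: radius grows with the square root of the follower count,
    # so glyph area grows roughly in proportion to importance.
    radius = 2.0 + math.sqrt(max(0, tweet.followers)) / 10.0
    return {"hue": hue, "opacity": opacity, "radius": radius}
```

Keeping the mapping in one function makes it easy to swap a scale (for example, a different hue range) without touching the rendering code.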

(77)

Visualization design

• Diffusion series

– When the mouse hovers over an active tweet, its diffusion series is shown to illustrate “who retweeted whom, and when”.

(78)
(79)

Layout

(80)

Layout

Topic Disc, Diffusion Pathway, User Group Assembling

(81)

Layout

Topic Disc, Diffusion Pathway, User Group Assembling

• Diffusion Field

– Based on electric fields

– Topics are positively charged

– User groups are negatively charged
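
The slides do not give the field formulation, so the following is only a sketch of the general idea. It assumes an inverse-square (Coulomb-like) superposition in 2D, with a positive charge for the topic and negative charges for user groups, and it traces a pathway by following the field direction in small steps. All function names and parameters are hypothetical.

```python
import math


def field_at(point, charges, eps=1e-9):
    """Superpose Coulomb-like fields; charges is a list of (x, y, q)."""
    px, py = point
    ex = ey = 0.0
    for cx, cy, q in charges:
        dx, dy = px - cx, py - cy
        r2 = dx * dx + dy * dy + eps  # eps avoids division by zero
        r = math.sqrt(r2)
        ex += q * dx / (r2 * r)
        ey += q * dy / (r2 * r)
    return ex, ey


def trace_pathway(start, charges, step=0.05, max_steps=2000, stop_radius=0.1):
    """Follow the field direction from start until near a negative charge."""
    x, y = start
    path = [(x, y)]
    sinks = [(cx, cy) for cx, cy, q in charges if q < 0]
    for _ in range(max_steps):
        ex, ey = field_at((x, y), charges)
        norm = math.hypot(ex, ey)
        if norm == 0.0:
            break
        # Take a fixed-length step along the normalized field direction.
        x += step * ex / norm
        y += step * ey / norm
        path.append((x, y))
        if any(math.hypot(x - sx, y - sy) < stop_radius for sx, sy in sinks):
            break
    return path
```

Because field lines run from positive to negative charges, a pathway started near the topic ends near a user group, which is the behaviour the layout relies on.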

(82)

Layout

Topic Disc, Diffusion Pathway, User Group Assembling

(83)

Layout

Topic Disc, Diffusion Pathway, User Group Assembling

• Users are grouped either by geo-location or by shared interests

– Use Voronoi icons to facilitate comparison

– Use sunflower packing for efficient layout in case of online monitoring
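
The slides do not spell out the packing formula. As a stand-in, the sketch below uses Vogel's sunflower model, in which point k sits at an angle of k times the golden angle and at a radius proportional to the square root of k. The function name and the `spacing` parameter are assumptions made for illustration.

```python
import math

# Golden angle in radians, about 137.5 degrees.
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def sunflower_positions(n, spacing=1.0):
    """Return n (x, y) points packed in a sunflower (Vogel) spiral."""
    points = []
    for k in range(n):
        # The square-root radius keeps point density roughly uniform.
        r = spacing * math.sqrt(k + 0.5)
        theta = k * GOLDEN_ANGLE
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points
```

A useful property for online monitoring is that the position of point k does not depend on n, so newly arriving users can be appended without moving the ones already placed.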

(84)

Layout

Topic Disc, Diffusion Pathway, User Group Assembling

• Visual clutter reduction

– Reorder the user groups to match the active tweets at the center, reducing crossings

(85)
(86)

Stories (1): tracing diffusion events

(87)
(88)

Interview with Domain Experts

• Three experts:

– political scientist, design expert, communication PhD

• Interview methodology:

– warm-up; semi-structured interview with 7 questions; free discussion

• Results

(89)

A Complete VA System

• Design goals

• Underlying techniques (data mining, visualization, interaction, etc.)

• Case studies

• User study (questionnaire-based; task-based)

• Expert interview

(90)

Visual Analytics of Trajectory Data

• Fix data errors

• Reconstruct continuous trajectories

• Route diversity analysis

Data Uncertainty

Data Error

(91)

Navigation System

• Microsoft T-Drive [Yuan et al., 2010]


(92)

Encoding Scheme

• Time

– Outer circle

• Number of trajectories

– The height of the bars

• Duration

– Length of the arc

• Speed

– Color

(93)

User Interactions

• Brushing by locations

• Brushing by time span
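
Both brushing interactions amount to filtering trajectories by a spatial region, a time interval, or both. The following is a minimal Python sketch under stated assumptions: a rectangular brush, trajectories stored as lists of timestamped samples, and a trajectory counted as selected when at least one sample falls inside every active brush. The `Sample` type and the `brush` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float  # timestamp, e.g. seconds since midnight (assumed unit)
    x: float  # longitude or projected x
    y: float  # latitude or projected y


def brush(trajectories, region=None, time_span=None):
    """Return ids of trajectories with a sample inside all active brushes.

    trajectories: dict mapping id -> list of Sample
    region: (xmin, ymin, xmax, ymax), or None for no spatial brush
    time_span: (t0, t1), or None for no temporal brush
    """
    selected = []
    for tid, samples in trajectories.items():
        for s in samples:
            in_region = region is None or (
                region[0] <= s.x <= region[2]
                and region[1] <= s.y <= region[3]
            )
            in_time = time_span is None or time_span[0] <= s.t <= time_span[1]
            if in_region and in_time:
                selected.append(tid)
                break  # one matching sample is enough
    return selected
```

For large datasets a linear scan like this would be too slow for interactive use; a spatial or temporal index would be the usual replacement.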

(94)

Traffic Monitoring

(95)

Route Suggestion

(96)

Route Suggestion

(97)

Embed Temporal Displays

onto the Roads

(98)
(99)

Analysis of Clickstreams

in Coursera

(100)

Analysis of clickstream patterns: stacked graphs

Fig. 1: stacked graphs for Week 1-1, Week 1-2, Week 1-3, and Week 1-4

(101)

Analysis of clickstream patterns: stacked graphs

Panels: Week 1-1, Week 1-2, Week 1-3
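
A stacked graph of this kind can be built by counting click events per type in fixed-width bins and stacking the counts. The sketch below is an illustration only. It assumes events are given as (type, position in the video in seconds), and the event type names "play", "pause", and "seek" and the bin size are assumptions rather than details taken from the slides.

```python
def stacked_layers(events, video_length, bin_size=10.0,
                   types=("play", "pause", "seek")):
    """Bin click events and stack the per-type counts.

    events: iterable of (event_type, video_time_in_seconds)
    Returns a list of (event_type, baseline, top), one per type, where
    baseline and top are per-bin lower and upper edges of that layer.
    """
    n_bins = int(video_length // bin_size) + 1
    counts = {t: [0] * n_bins for t in types}
    for etype, pos in events:
        # Ignore unknown event types and positions outside the video.
        if etype in counts and 0 <= pos <= video_length:
            counts[etype][int(pos // bin_size)] += 1
    layers = []
    baseline = [0] * n_bins
    for t in types:
        top = [b + c for b, c in zip(baseline, counts[t])]
        layers.append((t, baseline, top))
        baseline = top  # the next layer sits on top of this one
    return layers
```

Each returned layer can be drawn as a filled band between its baseline and top, which is how most plotting libraries render stacked areas.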

(102)

Analysis of clickstream patterns: animation

(103)

Analysis of clickstream patterns: animation

Fig. 4

(104)

Analysis of user behavior

(105)

User classification

• Current classification

– Users who watched only one video: “random” users, 28.1%

– Users who watched all videos: “persistent” users, 9.8%

– Users who watched more than one video, and only videos from the first seven: “impersistent” users, 44.5%

– Other types of users: “others”, 17.6%

• Comparisons between different types of users

– The “seek” action is the most frequent and meaningful, so the comparisons use seek arcs
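
The classification rules above translate directly into code. The following is a minimal sketch that assumes each user is described by the set of 1-based video indices they watched, in course order, and that the course has 17 videos (the count given on a later slide). The function name and parameters are hypothetical.

```python
def classify_user(watched_videos, total_videos=17, early_cutoff=7):
    """Classify a user by the set of course videos they watched.

    watched_videos: iterable of 1-based video indices in course order
    (an assumed representation; the slides do not specify one).
    """
    watched = set(watched_videos)
    if len(watched) == 1:
        return "random"
    if len(watched) == total_videos:
        return "persistent"
    # More than one video, all of them among the first early_cutoff videos.
    if len(watched) > 1 and max(watched) <= early_cutoff:
        return "impersistent"
    return "others"
```

Users with no watched videos fall into "others" here; whether such users appear in the dataset at all is not stated in the slides.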

(106)

Seek arc visual encoding scheme

Fig. 10

(107)

Comparison of seek behavior on the first course video for different groups of users (500 users per group)

Fig. 11

(108)

Comparison among videos from different weeks for the users who watched all 17 videos (persistent users)

Fig. 12

(109)

Specific Parts in Videos Corresponding to Histogram Peaks

Some patterns from peaks:

1) “Skip the opening”

2) “New term & Sensitive topic”

3) “Findings vs. Conclusions”

(115)

Conclusions

• A primer on data visualization

• Examples to showcase HKUST vis projects

• Data visualization is hot, cool, and young

(116)
(117)
