
Interactive Visual Data Analysis in the Times of Big Data


Academic year: 2021


Interactive Visual Data Analysis in the Times of Big Data

Cagatay Turkay

* giCentre, City University London

Who?

• Lecturer (Asst. Prof.) in Applied Data Science
• Started December 2013

• @ the giCentre (gicentre.net)

• PhD @ VisGroup at Univ. of Bergen, Norway
• with Helwig Hauser

• MSc @ CGLab at Sabanci University, Istanbul, Turkey
• with Selim Balcisoy

giCentre

• 6 academics
• 3 postdocs
• 5 PhDs

Research to date

• Integrating Computational Tools in Interactive Visual Analysis Methods

• Perceptually Optimized Visualization

• Information Theoretical Approach to Crowd


Data supported science

• Data analysis in almost all scientific fields

– Biology, medicine, astronomy, psychology,…

• Data driven science

• Research in several fields

– Visualization
– Data Mining
– Machine Learning
– Statistics

The “times” of Big Data

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

More V’s added

• Volume
• Velocity
• Variety
• Veracity
• Visualisation
• Value

[Figure: “big data” on Google Trends, as of today]

Getting the best of “Big data”

• Limited value in size!

• Real value in complexity

– Heterogeneous
– Several modalities
– Many variables
– Spatially & temporally varying

Visual data analysis can help!

Visualization?

“Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.” [Tamara Munzner, 2014]

“The use of computer-generated, interactive, visual representations of data to amplify cognition” [Card, Mackinlay, & Shneiderman 1999]

Interactive Visual Analysis (a.k.a. Visual Analytics)

• “… the science of analytical reasoning facilitated by interactive visual interfaces.” [Thomas and Cook, 2005]

• “… combining visualization, human factors and data analysis.” [Keim et al., 2006]

• Combine the best of two worlds: human capabilities and computing power

[Seo & Shneiderman, 2002] [Tableau Software]

How can visual analysis help?

• Ease of interpretation
• Ease of communication
• Flexibly varying & relating multiple aspects
• Comparing multiple computational outputs & communicating uncertainties

1- Flexibly vary & relate multiple aspects

Basics: Multiple linked views

• Several visualizations
• Linking + brushing
• Combining selections
• Commercial success [Tableau Software]

[Figure label: ComVis]
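
Linking, brushing, and combining selections can be sketched with plain set operations. This is a minimal illustration under stated assumptions, not the implementation of any tool named above: the records, the field names, and the helpers `brush_range` and `brush_values` are all hypothetical.

```python
# Hypothetical records shared by all views; field names are illustrative only.
records = [
    {"id": 0, "hour": 2, "type": "burglary"},
    {"id": 1, "hour": 14, "type": "burglary"},
    {"id": 2, "hour": 3, "type": "theft"},
    {"id": 3, "hour": 23, "type": "theft"},
    {"id": 4, "hour": 4, "type": "burglary"},
]


def brush_range(rows, field, low, high):
    """Ids of records whose numeric `field` lies in [low, high] (a range brush)."""
    return {r["id"] for r in rows if low <= r[field] <= high}


def brush_values(rows, field, values):
    """Ids of records whose `field` is one of `values` (a categorical brush)."""
    return {r["id"] for r in rows if r[field] in values}


# One brush per view: a time histogram and an offence-type bar chart.
night = brush_range(records, "hour", 0, 5)
burglary = brush_values(records, "type", {"burglary"})

# Combining selections is then ordinary set algebra.
night_and_burglary = night & burglary   # AND
night_or_burglary = night | burglary    # OR
night_not_burglary = night - burglary   # AND NOT
```

Every view would then highlight the records whose ids are in the combined set, which is what keeps the views linked.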


Example: Analyzing Crime Statistics through Selection Combinations and Data Derivation

Analyzing Crime Statistics

• Crime statistics over Birmingham
• Crime records
– Crime types
– Offender location
– Crime location
• Want to identify…
– Hot spots
– Groups
– Trends / patterns

[Figure: linked views of crime locations, offender locations, difference in location (derived), offence types, day of week, and other details]
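
A derived attribute such as the difference between crime and offender location could be computed per record and then brushed like any other variable. The sketch below assumes that locations are latitude/longitude pairs and uses the haversine great-circle distance; the source does not say how the difference was derived, and the record layout and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def with_location_difference(records):
    """Return copies of the records with a derived `difference_km` attribute."""
    return [
        dict(r, difference_km=haversine_km(*r["crime"], *r["offender"]))
        for r in records
    ]


# Hypothetical records: (lat, lon) of the crime and of the offender's address.
sample = [
    {"crime": (52.48, -1.90), "offender": (52.48, -1.90)},
    {"crime": (52.48, -1.90), "offender": (52.52, -1.85)},
]
derived = with_location_difference(sample)
```

The derived column can then feed a histogram view, so that brushing short or long distances highlights the matching records in the map views.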


A key aspect – Iterative process

• Learn through the process
• Evaluate & re-perform

[Figures: Turkay, 2012; Hauser, 2013; sense-making loop by Pirolli and Card, 2005]

2- Compare multiple computational outputs & communicate uncertainties

Issues in computational analysis

• Computational tools have helped

– Several tasks: find structures, summarize, trends…
– Rich set of methods: PCA, LDA, clustering…

• Have issues, unfortunately

– Not flexible

– Limited interaction

– Reliability (noisy data, assumptions)

• Visual analysis to:

– Compare multiple computational outputs & communicate uncertainties

Example: Comparing and Characterizing Clustering Results in Cancer Subtype Analysis

[Turkay, Cagatay, et al., “Characterizing Cancer Subtypes Using Dual Analysis in Caleydo StratomeX”]

Cluster analysis

• A cluster is a group of “similar” entities

• Several algorithms – important to compare
• Challenging to evaluate/interpret

Items

Dimensions (variables)

Thanks to Alexander Lex for the slides

Multiple Clusterings (& Datasets)

[Figure: Clustering 1 (e.g., k-means) with clusters A1, A2, A3; Clustering 2 (e.g., hierarchical) with clusters B1, B2; meta data (e.g., clinical data) with groups Dep. C1 and Dep. C2]

[Figure labels: Patients (samples), Genes, Candidate Subtype / Heat Map, Header / Summary of whole Stratification]

Cancer types are not homogeneous, i.e., they have subtypes

Subtypes are identified by clustering datasets, e.g.,

• based on an expression pattern
• a mutation status
• a copy number alteration
• a combination of these

http://caleydo.org/


Multiple Clustering Results

Many shared Patients

Sample Overlaps
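
The sample overlaps between two clustering results can be summarised as a table of shared members per cluster pair. The following is a small sketch of that idea, not the Caleydo StratomeX implementation; the patient ids, cluster labels, and function names are hypothetical.

```python
from collections import Counter


def overlap_table(labels_a, labels_b):
    """Count shared samples for every (cluster in A, cluster in B) pair.

    Each argument maps a sample id to its cluster label in one clustering.
    Only samples present in both clusterings are counted.
    """
    shared = set(labels_a) & set(labels_b)
    return Counter((labels_a[s], labels_b[s]) for s in shared)


def jaccard(labels_a, cluster_a, labels_b, cluster_b):
    """Jaccard similarity between the member sets of two clusters."""
    members_a = {s for s, c in labels_a.items() if c == cluster_a}
    members_b = {s for s, c in labels_b.items() if c == cluster_b}
    union = members_a | members_b
    return len(members_a & members_b) / len(union) if union else 0.0


# Hypothetical results of two algorithms on the same five patients.
clustering_1 = {"p1": "A1", "p2": "A1", "p3": "A2", "p4": "A2", "p5": "A3"}
clustering_2 = {"p1": "B1", "p2": "B1", "p3": "B1", "p4": "B2", "p5": "B2"}

table = overlap_table(clustering_1, clustering_2)
similarity = jaccard(clustering_1, "A1", clustering_2, "B1")
```

In a visual encoding, each count in `table` could set the width of a band connecting the two clusters, so that wide bands indicate many shared patients.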


Integrated detail visualisations

Only the member samples visualized

Communicating (Un)Certainty

Only the member samples used in computations

Genes significantly different for a cluster
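
One simple way to flag genes that differ for a cluster is to compare the member samples against the remaining samples with a two-sample statistic. The sketch below uses Welch's t statistic as a stand-in; the source does not state which measure is used, and the gene names, expression values, and function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(members, others):
    """Welch's t statistic for two samples with possibly unequal variances.

    Assumes at least two values per group and a non-zero standard error.
    """
    standard_error = sqrt(variance(members) / len(members) + variance(others) / len(others))
    return (mean(members) - mean(others)) / standard_error


# Hypothetical expression values: (cluster members, all other samples) per gene.
expression = {
    "gene_x": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "gene_y": ([2.0, 3.0, 4.0], [2.5, 3.0, 3.5]),
}

scores = {gene: welch_t(inside, outside) for gene, (inside, outside) in expression.items()}

# Rank genes by how strongly the cluster differs from the rest.
ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
```

Showing the statistic next to each gene, rather than only a thresholded list, is one way to communicate how certain each difference is.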


Some observations

• Compare many results to get support and increase trust
• Identify how and why results change
• Communicate uncertainties
• Applied to many other problems

3- Seamless integration of computation

Seamless integration

• Best of human + computer – the ultimate goal of VA!

• Computational algorithms implicit
• Embedded within interaction

• Basic interaction methods, e.g., select, pan, zoom, etc.

Example: Dynamically generated visual (statistical) summaries for geographical data analysis

[Turkay, Cagatay, et al., “Attribute Signatures: Dynamic Visual Summaries for Analyzing Multivariate Geographical Data”, IEEE TVCG (InfoVis 2014)]

Visualization to understand how things change with geography

ONE VARIABLE

MULTIPLE VARIABLES?

Visualizations to look at many variables… and computational support to generate comparative summaries for the

Example: Analysis of UK OA data

UK Census of Population in 2001 and 2011 for the 181,000 Output Areas (OA) for 41 indicators

41 × 181,000

Integrated summary computation

Variable comparison baseline (e.g., country or city avg.)

Below baseline / Above baseline

Each value is computed
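
A summary value relative to a baseline could be computed as the mean of the selected areas minus the baseline, scaled by the overall spread, so that negative values read as below baseline and positive values as above. This is only one plausible formulation under stated assumptions: the source is cut off before it describes the actual computation, and the data, the scaling, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def signature_value(values, selected, baseline=None):
    """Signed deviation of the selected areas' mean from a baseline.

    `values` holds one indicator value per area, `selected` holds the indices
    of the areas inside the current selection. The baseline defaults to the
    overall mean (for example a country or city average can be passed instead).
    """
    base = mean(values) if baseline is None else baseline
    spread = pstdev(values)
    if spread == 0:
        return 0.0  # a constant indicator never deviates
    local = mean(values[i] for i in selected)
    return (local - base) / spread


def signature(values, selections, baseline=None):
    """One value per selection, e.g. for a selection moved along a path."""
    return [signature_value(values, s, baseline) for s in selections]


# Hypothetical indicator over four areas and a selection sliding across them.
indicator = [10.0, 20.0, 30.0, 40.0]
path = [[0, 1], [1, 2], [2, 3]]
profile = signature(indicator, path)
```

Repeating this for each of the 41 indicators would give one such profile per variable, which can be recomputed as the selection moves.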


Along the Southwest England coast


More examples of integrations

• Multi-dimensional projection [Janicke et al., 2008]
• Regression analysis [Muhlbacher & Piringer, 2013]
• Finding structures [Oliver et al., 2010]

Lessons learned

• It all starts with problems and related data
– Need to earn users’ trust!
• Visual analysis as a means to increase data’s value
• Enhanced analysis through informed use of computation
• Interactive visual methods improve reliability and interpretability
• Tight integration enables quick hypothesis prototyping
• Important to communicate the certainty of the findings

Looking into the future

• Understanding the real value of data
• Scalable analysis processes
• Improved methods for visual guidance
• Enable interactive response in seamless integrations
• Adapt to changing information characteristics & problems
– Dynamic data, e.g., data streams
– Prediction

Thank you!

• Helwig Hauser
• Peter Filzmoser

• giCentre folks: Aidan Slingsby, Jason Dykes, Jo Wood

• Arvid Lundervold, Astri Lundervold, Nathalie Reuter, the Caleydo Team
