• No results found

Simplifying the design of workflows for large-scale data exploration and visualization

N/A
N/A
Protected

Academic year: 2021

Share "Simplifying the design of workflows for large-scale data exploration and visualization"

Copied!
34
0
0

Loading.... (view fulltext now)

Full text

(1)

Simplifying the Design of Workflows

for Large- Scale

Data Exploration and Visualization

Juliana Freire Claudio Silva

http://www .cs. utah. edu/ "-J juliana http://www .cs. utah. edu/ "-J csilva

(2)

Workflows and Com putational Processes

• Workflows are em erging as a paradigm for

representing and managing com plex com putations

- Simulations, data analysis, visualization, data integration

• They capture computation and analysis processes, enabling

- Automation, reproducibility, result sharing

• Workflows are rapidly replacing prim itive shell scr i pt s

- Apple's Mac OS X Autom ator, Microsoft Windows Workflow Foundation, and Yahoo! Pipes

• Business Workflows => Scientific Workflows

- Important differences!

(3)

Workflows: Scientific vs. Business

• Express sequence of data transformations • Dataflow: Stateless, fu nct ion al • Data intensive,

com puting intensive • Cater to a broad set of

users

• Ensure rules and

prescribed processes are followed

• Control flow (e.g., BPEL): State and side effects

(4)

c

Exploration and Workflows

Workflows have been traditionally used to automate repet it ive tasks

• I n exploratory tasks, change is the norm!

Data analysis and exploration are iterative processes

Data Data ~ Workflow Specification .... ~ Data Product Manipulation

Microsoft eScience Workshop, 2008

T ~ Perception & Cognition Exploration .... ~ Knowledge User

Figure modified from J. van Wijk, IEEE Vis 2005

(5)

Data Exploration and Workflows

workflow

raw data: CT scan

Files (workflow specifications) anon4877 _voxel_scale_1_zspace_20060331.srn anon4877 _textureshading_20060331.srn anon4877 _textureshading_planeO_20060331.srn anon4877 _goodxferfunction_20060331.srn Notes r lI\..~t~til L V~SUIA tLZlAtLDIA ,

WI.: Aototeot textuye til V1 Aototeot pLtilll\..e to

VI . ,

(6)

Explorat ion and Creat ivity Su pport

• Exploratory processes require reflective reasoning

"Reflective reasoning requires the ability to store temporary results, to make inferences from stored knowledge, and to follow chains of reasoning

backward and forward, sometimes backtracking when a prom ising line of thought proves to be unfruitful .... the process is slow and laborious"

Donald A. Norm an

• Need external aids-tools to facilitate this process

- Creativity support tools [Shneiderman, CACM 2002]

• Need aid from people-collaboration

(7)

Data Exploration and Workflows: Issues

• Hard to assem ble and iteratively refine workflows

• Combine many tools and libraries: Need in-depth knowledge to weave them together

• No support for reflective reasoning

- E.g., history of the exploration trail maintained manually through file-nam ing conventions and detailed notes

- Hard to understand the exploratory process and relationships am ong workflows

• Lack of support for collaboration

Existing systems fail to provide the necessary

infrastructure for exploratory tasks. As a result, the

generation and maintenance of workflows is a major

(8)

VisTrails: Managing Scientific Exploration

• Goal: reduce time to insight

• Build infrastructure to streamline exploratory tasks

such as data analysis and visualization

• Support for collaboration

Usability-provide tools and intuitive interfaces

• The VisTrails System: an open-source provenance-enabled scientific workflow system

- > 6,000 downloads since 2007

- Used in many applications: environmental modeling (OHSU), physics sim ulation (Cornell, LANL), medical studies (University of Utah), ...

(9)

Outline

• Using provenance to support reflective reasoning

• Explori ng and re- u si ng proven ance

- Querying workflows by exam pie - Creating workflows by analogy - Auto-com pletion for workflows

• Emerging applications

(10)

c

(11)

Change-Based Provenance

Captures provenance of workflow

evolution

Records user actions

Provenance

=

changes to

com putational tasks

- Add a module, add a connection, change a parameter value

addModule deleteConnection addConnection addConnection setParameter

Q

= ~

~n;V

-

-E~~

(12)

Change-Based Provenance

• Records user actions

• Provenance

=

changes to com putational tasks

- Add a module, add a connection, change a parameter value

• Extensible change algebra

• A vistrail node vt corresponds to the workflow that is constructed by the sequence of actions from the root to vt

vt

=

xn

0

xn_

1 ° ... 0

x

1 0

0

[Freire et ai, I PAW 2006]

Microsoft eScience Workshop, 2008

vistrail

(13)

Exploring the Change Space

Script i ng workflows: Param eter explorat ions are

sim pie to specify and apply

Exploration of param eter space for a workflow

v

t

(setParameter(idn, valuen) 0 • • • 0 (setParameter(idt , value t ) 0 v

t )

l .... lVi.TraiI>Oulidor lorminalor,lIml g~~

"lkCan..era Set'/\eWUP(O,O, ,I) set9osIIon(7~S. ~S3, 369) 5«Foc-<13S. 135, ISO) vtkContourfilt ... 5et-.iue(O.67) vtkStructu,edPointsRe_

(14)

Exploring the Change Space

• Script i ng workflows: Param eter explorat ions are sim pie to specify and apply

• Exploration of param eter space for a workflow

v

t

(setParameter(idn , valuen) 0 • • • 0 (setParameter(idt , valuet ) 0 v

t )

• Exploration of multiple workflow specifications

(addModule(id;, ..

J

0 (deleteModule(id;) 0 v

1 )

(addModule(id;, ..

J

0 (deleteModule(id;) 0 vn )

• Results can be conveniently compared in the VisTrails spreadsheet

• Can create an im at ions too!

• Caching to avoid redundant com putations [Bavoil et aI., I EEE Vis 2005]

(15)

Com puting Workflow Differences

• No need to com pute subgraph

isom orph ism!

• A vistrail is a rooted tree: all

nodes have a com m on ancestor-diffs are

well-defined and simple to

compute

vt1

=

Xio Xi_tO ... oxto

0

vt

2

=

Xj 0 Xj -1 0 • • • 0 X 1 0

0

vt 1-vt2 = {Xi' X i- t ' ... , X t , 0)

( Xj , xj - 1, •• .,X 1 , 0)

• Different sem ant ics:

(16)

Collaborative Exploration

• Collaboration is key to data exploration

- Translational, integrative approaches to science

• Store provenance information in a database

• Synchronize concurrent updates through locking

- Real-tim e collaboration [Ellkvist et aI., I PAW 2008]

• Asynchronous access: similar to version control

system s

- Check out, work offline, synchronize - Users exchange patches

• Synchronization is sim pie-provenance is

monotonic

• No need for a central repository-support for

distributed collaboration

- For details see Callahan et ai, SCI I nstitute Technical Report, No. UUSCI-2006-016 2006

(17)

Change- Based Provenance: Su m mary

• General: Works with any system that has undo/redo!

• Concise representation

• Uniform Iy captures data and workflow provenance

- Data provenance: where does a specific data product come from?

- Workflow evolution: how has workflow structure changed over time?

• Results can be reproduced

Detailed information about the exploration process

• Provenance beyond reproducibility:

- Support for reflective reasoning

- Scalable exploration of the par am eter space-results can be com pared side-by-side in the spreadsheet

(18)

• Storing detailed information is important, but not

enough!

• Need appropriate user

interface and operations to leverage information

- Understand and re- use the history

• Simplify the creation of new workflows

(19)

Looking for Exam pies

• Need to query workflow collection:

- Find workflows that process a particular type of file - Find workflows that output a particular data product

- Find workflows that contain a given module or sequence of modules

• Workflow are graphs: hard to specify queries using text

(20)

Querying Workflows by Exam pie

• WYSIWYQ --

What You See Is

What You Query

• I nterface to create workflow is sam e as to query

Pipeline Interface 000000 vtkStructuredPolntsReader o 00000000 vtkOpenGLVolumeTextureMapper3D

Microsoft eScience Workshop, 2008

o

[Scheidegger et al., TVCG 2007

Query Result

.,-;..

---Query Interface

(21)

Refining Workflows

Com plex workflows are hard to create

- Dam ain knowledge

- Fam iliarity with different tools Steep learning curve

r

... T ... ...

Bit fl:Iit »I'll' &M1 ~~ ~

JW ~ ~ PI' 0 ~Imt\l"'ton .... lQof:'Y p ... ~ .. r:~a!lon

:~DJ-·~- i~:

~~ e~ COl 00) PO~ l a

--

0.4 005. 0.' o,~.. ,--""': .. =n-" ,...,.0..' 0 ... "10 0 +1 I. "to 0 +1 • :OMetrO.flSlOVom;: , ~

:0".

~'

.

j !:: 1004 ." .. • O!

(22)

Refining Workflows by Analogy

• Leverage the wisdom of the crowds in shared

provenance

- Sam e workflow refinem ents are com man, e. g., change the rendering technique, publish im age on the Web

• Apply refinem ents by analogy, autom atically

[Scheidegger et ai, IEEE TVCG 2007]

AnalogyTemplate

MicrosoTt e~clence vvorKsnop, ~UUts

~

1 - - - 1 1 1 1 1 l _ _ _ _ _ ... ~ ... '·r. r , :,.-.. .. ' --

.

. ---. ...

Automatically constructed visualizations

(23)

o

Refining Workflows by Analogy

poe Report

ProteIn Title

Authors

Atom Count

is to

NEURAL CELL ADHESION MOLECULE, MODULE 2, NMR, 20 STRUCTURES P.H.JENSEN, V.SOROKA, N.K.THOMSEN, V.BEREZIN, E.BOCK, F.M.POULSEN C:9560 H: 15440 N:2580 0: 2680 5:60

Links POB Entry

as

is to

7

(24)

o

The Analogy Algorithm

1. Com pute difference: 8(A,8)

- Just like a patch! - But ...

D = 8(A,B) 0 C may not be a valid

workflow

2. Find correspondences between A

and C: map(A,C)

Diffuse similarity scores across the

product graph Axe using Eigenvalue

is to

as

lIJ

isto

D

111 = DIFF(A, 8) 1 ~.

A

decom pos ition s PDBReport

~~---~~~temT~me-=STRU~CTU~REO~FTH~E ~

3. Com pute mapped difference Authors ~~~!f!~f~~f~~~:~~TER

1\ (A 8) (A C) 1\( A 8) Atom Count C:5934

UAC '

=

m ap , U , ~::]~~

4. D

=

8 AC(A,8) 0

C

Microsoft eScience Workshop, 2008

C: 20 N: 12 NA: 4 0:44 P: 6 Links .E.Qa..E.o1[y. ~ ________ ~ ______ ~ D

(25)

The Analogy Operation

• Allows workflows to be refined without requiring users to direct Iy m od ify the speeifieat io n

• Basis for scalable updates

• Analogies are not foolproof

- If it works, great. If it doesn't, it may help - User can edit and fix the new version

• I m prove by

- Using domain knowledge

(26)

The Need

for

Guidance

Workflow Design

vtkCeliTypes vtkCenteredSliderWidget •

In

vtkChacoReader 00000000 000000000 OOOOOOOO[ 0 DO

vtkCh, 000000 lateErode31

1 vtkWarpVector vtkExtractEdgE vtkFieldDat; vtklmplicitSum n

vtkCh. vtkPlecewlseFunction DO

~~rD 0 J - ouuuuuu __ :.JUUUUL 0000000000 JOO 00000000

vtk vtkRungeKutta2 IUnaY;jl)OO :one vtkLODActor vtkAxes vtkTransformFilter metl'1 vtkSphereSource

vtk 0 0 DO . DO

vtk L-JL-JL-J nn~ooo ~UUUUUOOOOOOOO

D~D[tmlc..tL"J'- tklmageActor

I

0 DataSetReader 10 [ vtkOpenG LVolu meTextureMapper3D

o vtkCylin vtkProperty2D - - 0

vtkTransform 01000000 0 0000000000 ...JL-JL-JL-JL-JL-JL-JL-JL-JL-JL-JL-J&:::J

DO I ... JrowSource vtkPolyDataMapper vtklmageReslice

VTKC~~tTnCnnr.rAtA r'r 0 DO

~kD'Dc10nOOOOI 0 IctorStyle 00000000 0_______ Juuu ...

vt vtkElevation vtkCylindricalTransform -.JUUUL 0 vtkDICOMlmageRe vtkLlneSourc~r:J !IIDataToPointDat~n

~ ~LJLJLJLJLJLJLJLJ 0 vtkActor DO JOOOOOOOOOO

vtkConeSource vtkBVUReader ""] 0 0 0 0 0 0 0 0 vtkActor2D 0 vtkGlyph3D 000

vtkConnectivityFilter vtkP rt I vtklmageFFT vtkTextActor

~~~~IODOOOOOOOOOOO ~ rope y 0 JGL-JL-JL-J oog~ooooo 0 0

vtkCo vtklmageResample b Ax LJ

A t 2D vtkHyperOctree vtkWarpScalar

vtkCo DO e es c or - vtkParametricSuperEllipsoid 0 L

vt~ 00000000 [

00-vtkCoordinate 0000______ S bd' .. F'lt VII\\,onnec;;t uuuuuuuu

)erOctrE vtkLoop u IVlslon I er

0000000 0 DO vtkCamer;l vtklmageDataGeometryFilter vtkStructuredGridReader 00000000 DO DO :J 0 0 0 0 0 0 0 0 vtkM-000000000 . edGraalem~naaer- 000000000 vtkCursor2[ WWWWWWWW vtkDelaunay2D c . . [ vtkXVZMolReader vtkCursor3[ vtklmageClip~~ vtkWelghtedTranSformFllteor 0 0000000 DO vtkCurvatun vtkCutter 00000000000 tkFrustumCoverage\,ul u u u u u u u u u u u u t..J 0 0 0 0 0 0 0 0 0 U U U

vtkCylinder . .&1,"1 n'T'''''U~eader vtkMergeFllter vtkRungeKutta4

V~CYI~n~e.rsource 00000000 DO. nnnnnoooo 0

~ , vtkPla 00000000 rammableF 00000000 0000000

00000 vtkButterflySubdivisionFilter 100000 vtkSurfaceReconstructionFilter vtkRenderer

vtkScalarE DO DO 0

u VlK:")ueamTracer

DO

(27)

VisCom plete: A Workflow

Recom m endation System

• Mine provenance collection: Identify graph

fragments that co-occur in a collection of workflows

• Predict sets of likely workflow additions to a given

part ial workflow

• Similar to a Web browser suggesting URL

com pletions ~kOSPFiIt'rGroup \<tkOataOb,ectlbDataSlt .. IAkOttaS.I'SutfaceFdtlt ~kOatiiSeIToDataObl·ct ... -.tkOataSelltlanoleFlhv oA.kOecunal.pO~ln.F,'t.r \rtkElevatlonFllter IAkFleldDiltaTDAttribute. IAkGenerlcContoutFll!.er 'A.kGenerICGeomttryRlter-\AkGenerlcG/yptlJDAlter IotkGlnelicOUtllMRlttr \otkGener\cProbeFlller \OtkGtometf)flll:er IAkGradlentFiller IAkG~phLayOUtl1lt.r tA.kH~chlc.lOatilLewt., tAkH\et.rchicalDataS.tG.

[Keep et al., IEEE Vis 2008]

Builder

.BI. ~dlt lilI-am ~ ElIlr.n Help

1I1kDataSeITo08IaOb,eCI ... \o1kDalaSft"',an~F"" VlI:Declm.teP~nef1lter ..n.kElevauonAlttr vtkFI.ldO.t.ToAttnbut .... vtkGtnlncContourFllttf vllffia"aNCGoomltl)'flhaf vtkGenlncGlyphlOFliter vtkG"n.rkOut~n"FNt.r vtkGenencProbeFllter IItkGtOrMll)fiIttr \/Iir:GJadl.ntAIt,t \'tkG~hl.¥lIlRIt.,. vt"HlerarchlcaIOalaLew:l •• vtkHler.rchlc .. fO .. tiiSetG ... vtkIottP8fOdr . . Contourf •••

(28)

VisCom plete: A Workflow

Recom m endation System

Identify graph fragments that co-occur in a

collect ion of workflows

Predict sets of likely workflow additions to a given

part ial workflow

Similar to a Web browser suggesting URL

com pletions

OOOOOD

Microsoft eScience Workshop, 2008

o 0000

vtkOataSetRaadar

D

(29)

VisCom plete: Dem

0

http://www.cs.utah.edu/''-Jjuliana/videos/viscomplete_h_264.mov

[Koop et al., IEEE Vis2008]

(30)

Resu Its Su m mary

Eliminates over 50% of actions

Selected completions are almost always in the first

four suggestions

A database of sim pie pipelines can aid users

constructing more complex pipelines

• See [Koop et al., TVCG 2008] for details on how the

path database is constructed and on the completion algorithm

(31)

Conclusions and Future Work

• Appropriate support for exploratory tasks is

essential for a wider adoption and more effective use of scientific workflow systems

• Provenance can be used to support reflective

.

reasoning

• Intuitive interfaces for simplifying the construction and refinem ent of workflows

• Sharing workflows provenance at a large scale creates new opportunities

- Workflow/ provenance repositories; provenance- enabled publications

- Scientists can learn by exam pie; expedite their scientific

(32)

Acknowledgments: Funding

• This work is partially supported by the National

Science Foundation, the Department of Energy, an IBM Facu Ity Award, and a Un iversity of Utah Seed Grant.

Microsoft eScience Workshop, 2008

DOE SciDAC VISUALIZATION AND ANALYTICS CENTER FOR ENABLING TECHNOLOGIES

1 • •

1

.

1

-

.

. 1. - . .' I • •

-- -- -- ----- ------ - -

---- --

---

(33)

Acknowledgments: People

VisTrails Group - Claudio Silva - Erik Anderson - Jason Callahan - Steven P. Callahan - Lorena Carlo - David Koop - Lauro Lins - Emanuele Santos - Carlos E. Scheidegger - Huy T. Vo - Geoff Draper

(34)

For more info about VisTrails

Visit: http://www.vistrails.org

Figure

Figure modified from J.  van Wijk, IEEE Vis  2005

References

Related documents

The mission of the company is to offer an innovative product and service with quality, therefore it dedicates much of its efforts to the research and development of new

It is recommended that local authorities should go further than the minimum publication requirements set out in Part 2 and publish information on a monthly instead of annual

Modern (trendy) types of organization structure are: T-form structure Virtual structure Web structure Inverted structure Fishnet structure Team structure Front /

Distance Education Support: Law librarians can provide an additional point of connection to distance education students by providing them with both access to print and digital

Using REST with SMART Browser VistA System RDF Data fetched From

 Helps you quickly activate and reactivate tickets in regions with a poor data reception. Saving To Your

Typically ascribed to the writings of Immanuel Kant and Georg Wilhelm Friedrich Hegel, personhood-based theories of copyright serve as the foundation for the moral rights prominent

Following on from the above references to missing data, and respondent beliefs, embarrassment and objection to the question, the review will now consider more generally the topics