
(1)

Computational Infrastructure for Policy Informatics

Policy Informatics in an Interdependent World Workshop
Washington DC, September 13, 2007

Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401

(2)

e-moreorlessanything

• ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ (from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology)
• e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
• Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
• This generalizes to e-moreorlessanything including presumably e-Policyinformatics
• A deluge of data of unprecedented and inevitable size must be managed and understood.
• People (see Web 2.0), computers, data and instruments must be linked.
• On demand assignment of experts, computers, networks and storage resources must be supported

(3)

Role of Cyberinfrastructure

• Cyberinfrastructure is infrastructure that supports distributed science (e-Science): data, people, computers
• Exploits Internet technology (Web 2.0) adding (via Grid technology) management, security, supercomputers etc.
• It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
• Parallel needed to get high performance on individual large simulations, data analysis etc.; must decompose problem
• Distributed aspect integrates already distinct components; especially natural for data
• Cyberinfrastructure is in general a distributed collection of parallel systems
• Cyberinfrastructure is made of services (originally Web services) that are “just” programs or data sources packaged for distributed access

(4)

Structure of Cyberinfrastructure

• Distributed software systems are being “revolutionized” by developments from e-commerce, e-Science and the consumer Internet. There is rapid progress in technology families termed “Web services”, “Grids” and “Web 2.0”
• The emerging distributed system picture is of distributed services with advertised interfaces but opaque implementations communicating by streams of messages over a variety of protocols
  – Complete systems are built by combining either services or predefined/pre-existing collections of services together to achieve new capabilities
• As well as Internet/Communication revolutions (distributed systems), multicore chips will likely be hugely important (parallel systems)
• Industry not academia is leading innovation in these technologies

(5)

Policy Informatics Infrastructure

• The Party Line approach is clear: one creates a Cyberinfrastructure consisting of distributed services accessed by portals/gadgets/gateways/RSS feeds
• Services include:
  – “original data”
  – Transformations or filters implementing DIKW (Data Information Knowledge Wisdom) pipeline
  – Final “Decision Support” step converting wisdom into action
  – Generic services such as security, profiles etc.
• Some filters could correspond to large simulations
• Infrastructure will be set up as a System of Systems (Grids of Grids)
  – Services and/or Grids just accept some form of DIKW and produce another form of DIKW
  – “Original data” has no explicit input; just output
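The DIKW pipeline of filters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Message type, the three filter functions, the sample readings and the threshold of 10 are all hypothetical and do not come from the talk. The point is that every filter accepts one form of DIKW and produces another, while the "original data" source takes no input.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    """A message carrying one form of DIKW (hypothetical structure)."""
    level: str       # "data", "information", "knowledge" or "wisdom"
    payload: object


Filter = Callable[[Message], Message]


def original_data() -> Message:
    # "Original data" has no explicit input, just output (made-up readings).
    return Message("data", [3, 9, 4, 12, 7])


def summarize(msg: Message) -> Message:
    # Data -> Information: reduce raw values to summary statistics.
    values = msg.payload
    return Message("information", {"mean": sum(values) / len(values), "max": max(values)})


def assess(msg: Message) -> Message:
    # Information -> Knowledge: compare against an assumed threshold of 10.
    return Message("knowledge", {"above_threshold": msg.payload["max"] > 10})


def advise(msg: Message) -> Message:
    # Knowledge -> Wisdom: the input to a final "Decision Support" step.
    action = "investigate" if msg.payload["above_threshold"] else "no action"
    return Message("wisdom", action)


def run_pipeline(source: Callable[[], Message], filters: List[Filter]) -> Message:
    """Feed the source output through each filter in order."""
    msg = source()
    for f in filters:
        msg = f(msg)
    return msg


result = run_pipeline(original_data, [summarize, assess, advise])
```

In a real deployment each filter would be a remote service and the messages would travel over a network; here they are local function calls so the structure is easy to see.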

(6)

[Figure: Grid of Grids architecture diagram. Legend: Database, MetaData (MD), Filter Service, Sensor Service, Other Service, Portal, Another Service, Another Grid. Pipeline stages: Raw Data → Data → Information → Knowledge → Wisdom → Decisions.]

(7)

Information Management/Processing

• Diagram describes e-Science, Military Command and Control and perhaps Policy Informatics
• Data → Information → Knowledge → Wisdom transformation
• (SOAP or just RSS) messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that complete system provides
  – Semantic Web technologies like RDF and OWL might help us to have rich expressivity but they might be too complicated
• We are meant to build application specific information management/transformation systems for each domain
  – Each domain has specific services/standards (for APIs and Information) and will use generic services (like R for datamining) and standards (RDF, WSDL)
  – What is PIML (Policy Informatics Markup Language)?
  – Standards made before consensus or not observant of technology progress are dubious (cf. HLA in simulation or many grid standards)

(8)

Too much Computing?

• Historically one has tried to increase computing capabilities by
  – Optimizing performance of codes
  – Exploiting all possible CPUs such as Graphics co-processors and “idle cycles”
  – Making central computers available such as NSF/DoE/DoD supercomputer networks
• Next Crisis in technology area will be the opposite problem: commodity chips will be 32-128 way parallel in 5 years time and we currently have no idea how to use them, especially on clients
  – Only 2 releases of standard software (e.g. Office) in this time span
• Gaming and Generalized decision support (data mining) are two obvious ways of using these cycles
  – Intel RMS analysis
  – Note even cell phones will be multicore
• “Too much data” matched to “Too much computing” but implications involved rather different

(9)

Intel’s Projection

(10)

Pradeep K. Dubey

RMS: Recognition Mining Synthesis

Recognition (What is …?): Model
Mining (Is it …?): Find a model instance
Synthesis (What if …?): Create a model instance

Today: Model-less; Real-time streaming and transactions on static, structured datasets; Very limited realism
Tomorrow: Model-based multimodal recognition; Real-time analytics on dynamic, unstructured, multimodal datasets; Photo-realism and physics-based animation

(11)

Pradeep K. Dubey

Recognition: What is a tumor?
Mining: Is there a tumor here?
Synthesis: What if the tumor progresses?

It is all about dealing efficiently with complex multimodal datasets

Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html

(12)
(13)

What should we do?

• There will be high quality parallel data mining algorithms
  – Speech Recognition, Text and multimedia search and browsers
  – New generation of desktop aides
  – What are synergies to “Personal aides in an information rich world” (future of PC?) and Policy Informatics?
• What filters (data mining) does policy informatics need?
• As computing is free, focus on identifying information/knowledge/wisdom needed (there is probably too much data but not so much wisdom in DIKW pipeline)
  – We should use supercomputer/computer services but Information services are more important and less “controversial”
• Identify standards for data and data-mining APIs
• Set up distributed Policy Informatics Services
• Use Web 2.0 (as it makes things easier) not current Grids (which make things harder)
  – Build a “Programmable Policy Informatics Web”
  – Emphasize Simplicity
  – Is “Secrecy” important and in fact viable?
• Should we care just about “original data” or also about the whole DIKW pipeline?

(14)

Web 2.0 Mashups and APIs

• http://www.programmableweb.com/apis has (Sept 12 2007) 2312 Mashups and 511 Web 2.0 APIs, with GoogleMaps the most often used in Mashups
• Mashups are called workflow in Grid arena

(15)

The List of Web 2.0 APIs

• Each site has API and its features
• Divided into broad categories
• Only a few used a lot (49 APIs used in 10 or more mashups)
• RSS feed of new APIs
• Amazon S3 growing in popularity

(16)
(17)

Grid Service Philosophy I

• Services receive data in SOAP messages, manipulate it and produce transformed data as further messages
• Knowledge is created from information by services
  – Information is created from data by services
• Semantic Grid comes from building metadata rich systems of services
• Meta-data is carried in SOAP messages
• The Grid enhances Web services with semantically rich system and application specific management
• One must exploit and work around the different approaches to meta-data (state) and their manipulation in Web Services

(18)

Grid Service Philosophy II

• There is a horde of support services supplying security, collaboration, database access, user interfaces
• The support services are either associated with system or application where the former are WS-* and GS-* which implicitly or explicitly define many support services
• There are generalized filter services which are applications that accept messages and produce new messages with some data derived from that in input
  – Simulations (including PDEs and reactive systems), Data-mining, Transformations, Agents, Reasoning, Decision making Tools are all termed filters here
• Agent Systems are a special case of Grids
• Peer-to-peer systems can be built as a Grid with particular discovery and messaging strategies

(19)

Grid Service Philosophy III

• Filters can be a workflow which means they are “just” collections of other simpler services
• Grids are distributed systems that accept distributed messages and produce distributed result messages
• A service or a workflow is a special case of a Grid
• A collection of services on a multi-core chip is a Grid
• Sensors or Instruments are “managed” by services; they may accept non SOAP control messages and produce data as messages (that are not usually SOAP)
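The idea that a workflow is "just" a collection of simpler services, and is itself usable wherever a service is, can be illustrated with function composition. This is a minimal sketch, not the talk's implementation: the helper name workflow is hypothetical, and plain Python string methods stand in for remote services.

```python
from functools import reduce
from typing import Any, Callable

Service = Callable[[Any], Any]


def workflow(*services: Service) -> Service:
    """Combine services into one composite service.

    The result has the same shape as its parts (message in, message out),
    so a workflow can be nested inside another workflow.
    """
    def composite(message: Any) -> Any:
        # Pass the message through each service in order.
        return reduce(lambda msg, service: service(msg), services, message)
    return composite


# Two simple "services" combined into a cleaning workflow ...
clean = workflow(str.strip, str.lower)
# ... which is then used as a single service inside a larger workflow.
tokenize = workflow(clean, str.split)

tokens = tokenize("  Policy Informatics ")
```

Because the composite has the same interface as each part, nesting can continue to any depth, which mirrors the statement that a service or a workflow is a special case of a Grid.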

(20)

Virtual Observatory Astronomy Grid

Integrate Experiments

[Figure: image panels labeled Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map and Galaxy Density Map.]

(21)

Service or Web service Approach

• One uses GML, CML etc. to define the data in a system and one uses services to capture “methods” or “programs”
• In eScience, important services fall in three classes
  – Simulations
  – Data access, storage, federation, discovery
  – Filters for data mining and manipulation
• Services use something like WSDL (Web Services Description Language) to define interoperable interfaces (see OPAL talk!)
• WSDL establishes a “contract” independent of implementation between two services or a service and a client
• Services should be loosely coupled which normally means they are coarse grain
• Services will be composed (linked together) by mashups (typically scripts) or workflow (often XML – BPEL)
• Software Engineering and Interoperability/Standards are closely related
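The notion of a "contract" independent of implementation can be illustrated by analogy with a Python Protocol. This is only an analogy and not WSDL itself: ForecastService, ConstantModel, TrendModel and the region code are hypothetical names invented for the sketch. The client depends on the declared interface alone, so either implementation can be swapped in.

```python
from typing import List, Protocol


class ForecastService(Protocol):
    """The contract: the operation name, its inputs and its output type."""

    def forecast(self, region: str, days: int) -> List[float]: ...


class ConstantModel:
    """One implementation; the client never sees how it works."""

    def forecast(self, region: str, days: int) -> List[float]:
        return [1.0] * days


class TrendModel:
    """A different implementation behind the same contract."""

    def forecast(self, region: str, days: int) -> List[float]:
        return [1.0 + 0.5 * day for day in range(days)]


def total_forecast(service: ForecastService) -> float:
    # The client is written against the contract only.
    return sum(service.forecast("IN", 3))


constant_total = total_forecast(ConstantModel())
trend_total = total_forecast(TrendModel())
```

A WSDL document plays a similar role across languages and machines: it describes messages and operations in XML, so the service behind it could be written in Fortran, Java or Python without the client changing.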

(22)

Philosophy of Web Service Grids

• Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
• This leads to the distributed object (DO) model represented by Java and CORBA
  – RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
• Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
  – Distributed Objects Replaced by Services
• Note CORBA was considered too complicated in both organization and proposed infrastructure
  – and Java was considered as “tightly coupled to Sun”
  – So there were other reasons to discard
• Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages
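The difference between one-way messages and request-response can be sketched with in-process queues. This is a minimal illustration under assumptions: the uppercase_service function, the message dictionaries and the None shutdown sentinel are hypothetical, and a thread stands in for a remote service. The sender posts messages and moves on; it never blocks waiting for a reply to each one.

```python
import queue
import threading
from typing import List


def uppercase_service(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Consume one-way messages and emit new messages until a None sentinel."""
    while True:
        msg = inbox.get()
        if msg is None:
            outbox.put(None)          # propagate shutdown downstream
            break
        outbox.put({"body": msg["body"].upper()})


def run_demo() -> List[str]:
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    worker = threading.Thread(target=uppercase_service, args=(inbox, outbox))
    worker.start()

    # Fire-and-forget: the sender does not wait for any response here.
    for body in ["raw data", "more data"]:
        inbox.put({"body": body})
    inbox.put(None)
    worker.join()

    # Results arrive as further messages on a separate channel.
    results = []
    while True:
        msg = outbox.get()
        if msg is None:
            break
        results.append(msg["body"])
    return results


results = run_demo()
```

With request-response the caller would be tied to the lifetime and timing of each call; with one-way messages the two sides share only the message format and the channel, which is the looser coupling the slide argues for.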

(23)

Web services

• Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
• Web Services interact by exchanging messages in SOAP format
• The contracts for the message exchanges that implement those interactions are described via WSDL
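For readers who have not seen one, a SOAP message is an XML envelope with an optional header and a body. The sketch below builds and reads a SOAP 1.1 envelope with the Python standard library. The operation name GetIndicator, its parameters and the urn:example:policy namespace are hypothetical, chosen only for illustration; a real service would take them from its WSDL.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:policy"                          # hypothetical application namespace


def build_envelope(operation: str, params: Dict[str, object]) -> str:
    """Wrap an operation and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")    # metadata would go here
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text: str) -> Tuple[str, Dict[str, str]]:
    """Extract the operation name and parameters from the envelope body."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_NS}}}Body")
    op = body[0]                                       # first child of Body
    local = lambda tag: tag.split("}")[1]              # drop the "{namespace}" prefix
    return local(op.tag), {local(child.tag): child.text for child in op}


xml_text = build_envelope("GetIndicator", {"region": "IN", "year": 2007})
operation, params = read_request(xml_text)
```

Note that all values come back as strings; typing them is the job of the schema referenced by the service contract.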

(24)

A typical Web Service

• In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
• The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python

[Figure: Web Services for Payment (Credit Card), Warehouse (Shipping control), Security and Catalog, exposed through WSDL interfaces to a Portal Service.]

(25)

The Grid and Web Service Institutional Hierarchy

Must set standards to get interoperability

1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)
2: System Services and Features (WS-* from OASIS/W3C/Industry): Handlers like WS-RM, Security, UDDI Registry
3: Generally Useful Services and Features (OGSA and other GGF, W3C) such as “Collaborate”, “Access a Database” or “Submit a Job”
4: Application or Community of Interest (CoI) Specific Services such as “Map Services”, “Run BLAST” or “Simulate a Missile”

Labels alongside the levels in the figure: OGSA GS-* and some WS-* (GGF/W3C/…); XGSP (Collab); WS-* from OASIS/W3C/Industry; Apache Axis, .NET etc.; XBM, XTCE, VOTABLE, CML, CellML

(26)

The Ten areas covered by the 60 core WS-* Specifications

WS-* Specification Area            Examples
1: Core Service Model              XML, WSDL, SOAP
2: Service Internet                WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MTOM
3: Notification                    WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions       BPEL, WS-Choreography, WS-Coordination
5: Security                        WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery               UDDI, WS-Discovery
7: System Metadata and State       WSRF, WS-MetadataExchange, WS-Context
8: Management                      WSDM, WS-Management, WS-Transfer
9: Policy and Agreements           WS-Policy, WS-Agreement
10: Portals and User Interfaces    WSRP (Remote Portlets)

(27)

Activities in Global Grid Forum Working Groups

GGF Area             GS-* and OGSA Standards Activities
1: Architecture      High Level Resource/Service Naming (level 2 of slide 6), Integrated Grid Architecture
2: Applications      Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval
3: Compute           Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling
4: Data              Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management
5: Infrastructure    Network measurements, Role of IPv6 and high performance networking, Data transport
6: Management        Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model
7: Security          Authorization, P2P and Firewall Issues, Trusted Computing

(28)

Two-level Programming I

• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
  – C++ Java or Fortran Monte Carlo module
  – Data streaming from a sensor or Satellite
  – Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services assuming we have solved problem of programming the service

[Figure: a Service and its Data.]

(29)

Two-level Programming II

• The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams
• Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
• Such interpretative environments are the single processor analog of Grid Programming
• Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately

[Figure: Service 1, Service 2, Service 3 and Service 4 linked together.]
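The single processor analog mentioned here, a script gluing core programs together through pipes, looks roughly like the following in Python. It is a sketch under assumptions: the three stage functions and the sample lines are hypothetical, and generators play the role of UNIX pipes so that records stream from one stage to the next, much as in cat | grep | wc -l.

```python
from typing import Iterable, Iterator


def read_records(lines: Iterable[str]) -> Iterator[str]:
    """First stage: emit cleaned records one at a time (like cat)."""
    for line in lines:
        yield line.strip()


def select(records: Iterable[str], keyword: str) -> Iterator[str]:
    """Second stage: pass through only matching records (like grep)."""
    for record in records:
        if keyword in record:
            yield record


def count_records(records: Iterable[str]) -> int:
    """Final stage: consume the stream and count it (like wc -l)."""
    return sum(1 for _ in records)


# The "script" level: composing core programs into an application.
lines = ["tornado warning\n", "clear sky\n", "tornado watch\n"]
matches = count_records(select(read_records(lines), "tornado"))
```

Grid programming keeps the second, compositional level but replaces each local stage with a distributed service and each pipe with messages.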

(30)

Grid Workflow Data Assimilation in Earth Science

• Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts

[Figure: Typical graphical interface to service composition]
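Event-triggered workflow of this kind can be sketched as a detector feeding a workflow launcher. Everything specific below is an assumption made for illustration: the 50.0 threshold, the station labels, the reading values and the forecast_workflow placeholder are hypothetical and are not taken from the project shown on the slide.

```python
from typing import Iterable, Iterator, List, Tuple

THRESHOLD = 50.0  # hypothetical level above which a radar reading counts as abnormal

Reading = Tuple[str, float]  # (station label, measured value)


def detect_events(readings: Iterable[Reading], threshold: float = THRESHOLD) -> Iterator[str]:
    """Yield the station label of every reading at or above the threshold."""
    for station, value in readings:
        if value >= threshold:
            yield station


def forecast_workflow(station: str) -> str:
    # Placeholder for launching the simulation services; here it only
    # records that a high resolution run was requested.
    return f"high-resolution forecast requested for {station}"


def monitor(readings: Iterable[Reading]) -> List[str]:
    """Trigger one workflow run per abnormal event in the stream."""
    return [forecast_workflow(station) for station in detect_events(readings)]


sample = [("station-A", 35.0), ("station-B", 55.5), ("station-C", 20.0)]
launched = monitor(sample)
```

The expensive simulation services run only when the data calls for them, which is the on demand use of resources described earlier in the talk.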
