Computational Infrastructure for Policy Informatics
Policy Informatics in an Interdependent World Workshop
Washington DC, September 13, 2007
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
e-moreorlessanything
■ ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ From its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
■ e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
■ Similarly, e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
■ This generalizes to e-moreorlessanything, presumably including e-Policyinformatics
■ A deluge of data of unprecedented and inevitable size must be managed and understood.
■ People (see Web 2.0), computers, data and instruments must be linked.
■ On-demand assignment of experts, computers, networks and storage resources must be supported
Role of Cyberinfrastructure
■ Cyberinfrastructure is infrastructure that supports distributed science (e-Science): data, people, computers
■ It exploits Internet technology (Web 2.0), adding (via Grid technology) management, security, supercomputers etc.
■ It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with higher latency (milliseconds) between nodes
■ The parallel aspect is needed to get high performance on individual large simulations, data analysis etc.; the problem must be decomposed
■ The distributed aspect integrates already distinct components and is especially natural for data
■ Cyberinfrastructure is in general a distributed collection of parallel systems
■ Cyberinfrastructure is made of services (originally Web services) that are “just” programs or data sources packaged for distributed access
Structure of Cyberinfrastructure
■ Distributed software systems are being “revolutionized” by developments from e-commerce, e-Science and the consumer Internet. There is rapid progress in technology families termed “Web services”, “Grids” and “Web 2.0”
■ The emerging distributed system picture is of distributed services with advertised interfaces but opaque implementations communicating by streams of messages over a variety of protocols
• Complete systems are built by combining either services or predefined/pre-existing collections of services together to achieve new capabilities
■ As well as Internet/Communication revolutions (distributed systems), multicore chips will likely be hugely important (parallel systems)
■ Industry, not academia, is leading innovation in these technologies
Policy Informatics Infrastructure
■ The Party Line approach is clear: one creates a Cyberinfrastructure consisting of distributed services accessed by portals/gadgets/gateways/RSS feeds
■ Services include:
• “Original data”
• Transformations or filters implementing the DIKW (Data Information Knowledge Wisdom) pipeline
• Final “Decision Support” step converting wisdom into action
• Generic services such as security, profiles etc.
■ Some filters could correspond to large simulations
■ Infrastructure will be set up as a System of Systems (Grids of Grids)
• Services and/or Grids just accept some form of DIKW and produce another form of DIKW
• “Original data” has no explicit input; just output
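The service structure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Message` type, the filter functions, the readings and the threshold are all hypothetical and are not taken from any real Policy Informatics system. It shows an "original data" service with no input, a chain of filters that each accept one form of DIKW and produce another, and a final decision support step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Message:
    """One DIKW item passed between services (hypothetical format)."""
    level: str       # "data", "information", "knowledge", "wisdom" or "decision"
    payload: object


# A filter service accepts some form of DIKW and produces another form.
Filter = Callable[[Message], Message]


def original_data() -> List[Message]:
    """'Original data' service: no explicit input, just output (made-up readings)."""
    return [Message("data", reading) for reading in (3.0, 5.0, 10.0)]


def to_information(msg: Message) -> Message:
    # Data -> Information: attach a unit to the raw reading.
    return Message("information", {"value": msg.payload, "unit": "cm"})


def to_knowledge(msg: Message) -> Message:
    # Information -> Knowledge: compare against an assumed threshold of 4.0.
    return Message("knowledge", {"high": msg.payload["value"] > 4.0})


def to_wisdom(msg: Message) -> Message:
    # Knowledge -> Wisdom: turn the finding into a recommendation.
    return Message("wisdom", "alert" if msg.payload["high"] else "ignore")


def decision_support(msg: Message) -> Message:
    # Final step: convert wisdom into an action.
    return Message("decision", f"action:{msg.payload}")


def run_pipeline(source: Iterable[Message], filters: List[Filter]) -> List[Message]:
    """Push every source message through the filters in order."""
    results = []
    for msg in source:
        for apply_filter in filters:
            msg = apply_filter(msg)
        results.append(msg)
    return results


decisions = run_pipeline(
    original_data(),
    [to_information, to_knowledge, to_wisdom, decision_support],
)
```

In a real deployment each function would be a remote service and the messages would travel over a protocol such as SOAP or RSS; the in-process calls here only stand in for that.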
[Figure: a Grid of Grids. Sensor Services (SS), Filter Services (FS), Other Services (OS) and MetaData (MD), together with a Database, a Portal, other Grids and other Services, carry the flow Raw Data → Data → Information → Knowledge → Wisdom → Decisions]
Information Management/Processing
■ Diagram describes e-Science, Military Command and Control and perhaps Policy Informatics
■ Data Information Knowledge Wisdom transformation
■ (SOAP or just RSS) messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that complete system provides
• Semantic Web technologies like RDF and OWL might help us to have rich expressivity but they might be too complicated
■ We are meant to build application specific information management/transformation systems for each domain
• Each domain has specific services/standards (for APIs and Information) and will use generic services (like R for datamining) and standards (RDF, WSDL)
• What is PIML (Policy Informatics Markup Language)?
• Standards made before consensus or not observant of technology progress are dubious (cf. HLA in simulation or many grid standards)
Too much Computing?
■ Historically one has tried to increase computing capabilities by
• Optimizing performance of codes
• Exploiting all possible CPUs such as Graphics co-processors and “idle cycles”
• Making central computers available such as NSF/DoE/DoD supercomputer networks
■ The next crisis in the technology area will be the opposite problem: commodity chips will be 32-128 way parallel in 5 years’ time and we currently have no idea how to use them, especially on clients
• Only 2 releases of standard software (e.g. Office) in this time span
■ Gaming and generalized decision support (data mining) are two obvious ways of using these cycles
• Intel RMS analysis
• Note even cell phones will be multicore
■ “Too much data” is matched by “Too much computing” but the implications are rather different
Intel’s Projection
Pradeep K. Dubey
RMS: Recognition Mining Synthesis
Recognition (What is …?): Model
Mining (Is it …?): Find a model instance
Synthesis (What if …?): Create a model instance
Tomorrow: Model-based multimodal recognition; Real-time analytics on dynamic, unstructured, multimodal datasets; Photo-realism and physics-based animation
Today: Model-less; Real-time streaming and transactions on static, structured datasets; Very limited realism
Pradeep K. Dubey
Recognition: What is a tumor?
Mining: Is there a tumor here?
Synthesis: What if the tumor progresses?
It is all about dealing efficiently with complex multimodal datasets
Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html
What should we do?
■ There will be high quality parallel data mining algorithms
• Speech Recognition, Text and multimedia search and browsers
• New generation of desktop aides
• What are the synergies between “Personal aides in an information rich world” (future of PC?) and Policy Informatics?
■ What filters (data mining) does policy informatics need?
■ As computing is free, focus on identifying the information/knowledge/wisdom needed (there is probably too much data but not so much wisdom in the DIKW pipeline)
• We should use supercomputer/computer services but Information services are more important and less “controversial”
■ Identify standards for data and data-mining APIs
■ Set up distributed Policy Informatics Services
■ Use Web 2.0 (as it makes things easier) not current Grids (which make things harder)
• Build a “Programmable Policy Informatics Web”
• Emphasize Simplicity
• Is “Secrecy” important and in fact viable?
■ Should we care just about “original data” or also about the whole DIKW pipeline?
Web 2.0 Mashups and APIs
■ http://www.programmableweb.com/apis has (Sept 12 2007) 2312 Mashups and 511 Web 2.0 APIs, with GoogleMaps the most often used in Mashups
■ Mashups are called workflow in the Grid arena
The List of Web 2.0 APIs
■ Each site has an API and its features
■ Divided into broad categories
■ Only a few are used a lot (49 APIs used in 10 or more mashups)
■ RSS feed of new APIs
■ Amazon S3 growing in popularity
Grid Service Philosophy I
■ Services receive data in SOAP messages, manipulate it and produce transformed data as further messages
■ Knowledge is created from information by services
• Information is created from data by services
■ Semantic Grid comes from building metadata rich systems of services
■ Meta-data is carried in SOAP messages
■ The Grid enhances Web services with semantically rich system and application specific management
■ One must exploit and work around the different approaches to meta-data (state) and their manipulation in Web Services
Grid Service Philosophy II
■ There is a horde of support services supplying security, collaboration, database access, user interfaces
■ The support services are associated with either the system or the application; the former are WS-* and GS-*, which implicitly or explicitly define many support services
■ There are generalized filter services, which are applications that accept messages and produce new messages with some data derived from that in the input
• Simulations (including PDEs and reactive systems)
• Data-mining
• Transformations
• Agents
• Reasoning
• Decision making tools
• These are all termed filters here
■ Agent Systems are a special case of Grids
■ Peer-to-peer systems can be built as a Grid with particular discovery and messaging strategies
Grid Service Philosophy III
■ Filters can be a workflow, which means they are “just collections of other simpler services”
■ Grids are distributed systems that accept distributed messages and produce distributed result messages
■ A service or a workflow is a special case of a Grid
■ A collection of services on a multi-core chip is a Grid
■ Sensors or Instruments are “managed” by services; they may accept non-SOAP control messages and produce data as messages (that are not usually SOAP)
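The statement that a service or a workflow is a special case of a Grid maps naturally onto a composite structure. The sketch below is one possible reading, not a definitive design: the class names `Grid`, `Service` and `Workflow`, the `handle` method and the string messages are all hypothetical choices made for illustration.

```python
from typing import Callable, List


class Grid:
    """Anything that accepts a message and produces a result message."""

    def handle(self, message: str) -> str:
        raise NotImplementedError


class Service(Grid):
    """A single filter service wrapping one transformation."""

    def __init__(self, transform: Callable[[str], str]):
        self.transform = transform

    def handle(self, message: str) -> str:
        return self.transform(message)


class Workflow(Grid):
    """'Just a collection of other simpler services', applied in order."""

    def __init__(self, parts: List[Grid]):
        self.parts = parts

    def handle(self, message: str) -> str:
        for part in self.parts:
            message = part.handle(message)
        return message


strip = Service(str.strip)
upper = Service(str.upper)
tag = Service(lambda m: f"<info>{m}</info>")

clean = Workflow([strip, upper])
# A workflow can be nested inside another workflow, giving a Grid of Grids.
grid = Workflow([clean, tag])
result = grid.handle("  radar reading  ")
```

Because `Workflow` takes any `Grid` as a part, a caller cannot tell whether it is talking to one service or to a whole system of them, which is the point the slide makes.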
Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure: sky images labelled Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map and Galaxy Density Map]
Service or Web service Approach
■ One uses GML, CML etc. to define the data in a system and one uses services to capture “methods” or “programs”
■ In eScience, important services fall in three classes
• Simulations
• Data access, storage, federation, discovery
• Filters for data mining and manipulation
■ Services use something like WSDL (Web Services Description Language) to define interoperable interfaces (see OPAL talk!)
■ WSDL establishes a “contract” independent of implementation between two services or a service and a client
■ Services should be loosely coupled, which normally means they are coarse grain
■ Services will be composed (linked together) by mashups (typically scripts) or workflow (often XML, e.g. BPEL)
■ Software Engineering and Interoperability/Standards are closely related
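The idea of a “contract” independent of implementation can be illustrated by analogy in Python. WSDL itself is an XML language, so this is not WSDL; it is a minimal sketch using `typing.Protocol`, and the `CatalogService` interface, both implementations and the prices are hypothetical.

```python
from typing import Dict, List, Protocol


class CatalogService(Protocol):
    """The contract: what a client may call, not how it is implemented."""

    def price(self, item: str) -> float: ...


class InMemoryCatalog:
    """One implementation, backed by a dictionary."""

    def __init__(self, prices: Dict[str, float]):
        self._prices = prices

    def price(self, item: str) -> float:
        return self._prices[item]


class FlatRateCatalog:
    """Another implementation: every item costs the same."""

    def price(self, item: str) -> float:
        return 1.0


def quote(catalog: CatalogService, items: List[str]) -> float:
    """Client code written only against the contract."""
    return sum(catalog.price(item) for item in items)


detailed = quote(InMemoryCatalog({"a": 2.0, "b": 3.5}), ["a", "b"])
flat = quote(FlatRateCatalog(), ["a", "b"])
```

The client function `quote` never names a concrete class, so either implementation can be swapped in without changing it; that is the loose coupling the slide asks for.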
Philosophy of Web Service Grids
■ Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
■ This leads to the distributed object (DO) model represented by Java and CORBA
• RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
■ Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
• Distributed Objects Replaced by Services
■ Note CORBA was considered too complicated in both organization and proposed infrastructure
• and Java was considered as “tightly coupled to Sun”
• So there were other reasons to discard
■ Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages
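The difference between “one-way” messages and request-response can be sketched with two queues standing in for the network. This is an illustration under stated assumptions: the message fields (`id`, `value`, `reply_to`) and the doubling service are hypothetical, and a real system would use a messaging protocol rather than in-process queues.

```python
import queue
import threading

inbox = queue.Queue()    # messages travelling to the service
outbox = queue.Queue()   # messages the service emits in turn


def doubling_service() -> None:
    """Consume messages until a None sentinel; emit results as new messages."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        outbox.put({"reply_to": msg["id"], "value": msg["value"] * 2})


worker = threading.Thread(target=doubling_service)
worker.start()

# The sender posts each message and moves on; it never blocks on a reply.
for i, value in enumerate([1, 2, 3]):
    inbox.put({"id": i, "value": value})
inbox.put(None)  # sentinel: no more messages
worker.join()

# Results arrive as independent messages that any party could consume later.
results = []
while not outbox.empty():
    results.append(outbox.get())
```

A request-response design would instead have the sender call the service and wait for each return value, which ties the two parties together in time.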
Web services
■ Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on SOA (service oriented architecture) principles.
■ Web Services interact by exchanging messages in SOAP format
■ The contracts for the message exchanges that implement those interactions are described via WSDL
A typical Web Service
■ In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI messages, CGI Web invocations, or totally compiled away (inlining)
■ The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python
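As a rough illustration of the XML (SOAP) messages mentioned above, the following Python sketch builds and parses a SOAP 1.1 envelope with the standard library. The envelope namespace is the SOAP 1.1 one; the application namespace `http://example.org/catalog` and the `GetPrice` and `Item` elements are hypothetical and do not come from any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/catalog"                  # hypothetical application namespace


def make_request(item: str) -> bytes:
    """Wrap a hypothetical GetPrice request in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")
    ET.SubElement(request, f"{{{APP_NS}}}Item").text = item
    return ET.tostring(envelope)


def read_item(message: bytes) -> str:
    """What a receiving service might do: parse the envelope and extract the item."""
    root = ET.fromstring(message)
    item = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetPrice/{{{APP_NS}}}Item")
    return item.text


wire = make_request("telescope")
item_name = read_item(wire)
```

In practice a toolkit such as those named on the hierarchy slide would generate and consume these envelopes from the WSDL description, so hand-built XML like this is mainly useful for seeing what travels on the wire.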
[Figure: a Portal Service and Web Services for Security, Catalog, Payment (Credit Card) and Warehouse (Shipping control), linked through WSDL interfaces]
The Grid and Web Service Institutional Hierarchy
Must set standards to get interoperability
4: Application or Community of Interest (CoI) Specific Services such as “Map Services”, “Run BLAST” or “Simulate a Missile”. Examples: XBM, XTCE, VOTABLE, CML, CellML
3: Generally Useful Services and Features (OGSA and other GGF, W3C) such as “Collaborate”, “Access a Database” or “Submit a Job”. Standards: OGSA GS-* and some WS-* (GGF/W3C/…), XGSP (Collab)
2: System Services and Features (WS-* from OASIS/W3C/Industry): handlers like WS-RM, Security, UDDI Registry
1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)
The Ten areas covered by the 60 core WS-* Specifications
WS-* Specification Area | Examples
1: Core Service Model | XML, WSDL, SOAP
2: Service Internet | WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MTOM
3: Notification | WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions | BPEL, WS-Choreography, WS-Coordination
5: Security | WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery | UDDI, WS-Discovery
7: System Metadata and State | WSRF, WS-MetadataExchange, WS-Context
8: Management | WSDM, WS-Management, WS-Transfer
9: Policy and Agreements | WS-Policy, WS-Agreement
10: Portals and User Interfaces | WSRP (Remote Portlets)
Activities in Global Grid Forum Working Groups
GGF Area | GS-* and OGSA Standards Activities
1: Architecture | High Level Resource/Service Naming (level 2 of slide 6), Integrated Grid Architecture
2: Applications | Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval
3: Compute | Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling
4: Data | Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management
5: Infrastructure | Network measurements, Role of IPv6 and high performance networking, Data transport
6: Management | Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model
7: Security | Authorization, P2P and Firewall Issues, Trusted Computing
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
– C++, Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services, assuming we have solved the problem of programming the service
[Figure: a Service and its Data]
Two-level Programming II
■ The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams
■ Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
■ Such interpretative environments are the single processor analog of Grid Programming
■ Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately
[Figure: Service 1, Service 2, Service 3 and Service 4 linked together]
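The single processor analog referred to on this slide can be made concrete with a small Python script that composes stages the way a UNIX pipeline does. This is a minimal sketch, assuming in-process generators stand in for the core programs; the stage names and the log lines are hypothetical.

```python
from typing import Iterable, Iterator


def source(lines: Iterable[str]) -> Iterator[str]:
    """Stage 1: emit records (stands in for a data-producing program)."""
    yield from lines


def grep(pattern: str, stream: Iterable[str]) -> Iterator[str]:
    """Stage 2: keep only records containing the pattern."""
    return (line for line in stream if pattern in line)


def count(stream: Iterable[str]) -> int:
    """Stage 3: reduce the stream to a single number."""
    return sum(1 for _ in stream)


log = ["radar ok", "radar anomaly", "sim ok", "radar anomaly"]

# Similar in spirit to the shell pipeline: source | grep anomaly | wc -l
anomalies = count(grep("anomaly", source(log)))
```

The composition line is the second programming level: it knows nothing about how each stage works internally. Grid workflow plays the same role when the stages are remote services exchanging messages instead of local functions sharing a stream.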
Grid Workflow Data Assimilation in Earth Science
■ Grid services triggered by abnormal events and controlled by workflow process real-time data from radar and high-resolution simulations for tornado forecasts
Typical graphical interface to service composition