
Data Grids for HPC: Geographical Information System Grids


Academic year: 2020


(1)

Data Grids for HPC: Geographical Information System Grids

Marlon Pierce, Geoffrey Fox
Indiana University

(2)

Overview from

(3)

Parallel Computing

• Parallel processing is built on breaking problems up into parts and simulating each part on a separate computer node.
• There are several ways of expressing this breakup into parts with software:
  - Message passing, as in MPI
  - The OpenMP model for annotating traditional languages
  - Explicitly parallel languages like High Performance Fortran
• And several computer architectures are designed to support this breakup:
  - Distributed memory with or without custom interconnect
  - Shared memory with or without good cache
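The idea of breaking a problem into parts can be illustrated with a minimal block decomposition, sketched here in plain Python rather than MPI. The function names (`block_range`, `partial_sum`) and the toy data are assumptions made for illustration; a real code would run each part on its own node and combine the partial results with a message-passing reduction.

```python
def block_range(n_points, n_nodes, rank):
    """Return the half-open index range [start, stop) owned by one node.

    The first (n_points % n_nodes) nodes receive one extra point so the
    load differs by at most one point between nodes.
    """
    base, extra = divmod(n_points, n_nodes)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_sum(values, start, stop):
    """Work done by a single node: it only touches its own block."""
    return sum(values[start:stop])


# Toy problem: sum ten values using three (simulated) nodes.
values = list(range(10))
n_nodes = 3
ranges = [block_range(len(values), n_nodes, rank) for rank in range(n_nodes)]
partials = [partial_sum(values, start, stop) for start, stop in ranges]

# Combining the partial results stands in for a reduction step.
total = sum(partials)
print(ranges, partials, total)
```

The same decomposition applies whether the parts run as MPI ranks, OpenMP threads, or independent services; only the mechanism that combines the partial results changes.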

(4)

What are Web Services?

• Web Services are distributed computer programs that can be written in any language (Fortran, Java, Perl, Python, ...).
• The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python.
• Here is a typical e-commerce use:

[Figure: e-commerce example built from Security, Catalog, Payment, and Credit Card services]

(5)

What Is the Connection?

• Both MPI and Web Services rely upon messaging to interact.
• But the difference is in the speed of message transmission:
  - MPI is useful for microsecond communication speeds.
    - Clusters, traditional parallel computing
  - Web Services communicate at Internet speeds.
    - Millisecond communication times at best.
• This implies that we have (at least) a two-level programming model.
  - Level 1: MPI within science applications on clusters and HPC.

(6)

Two-level Programming I

• The Web Service (Grid) paradigm implicitly assumes a two-level programming model.
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies:
  - C++, Java or Fortran Monte Carlo module, perhaps running with MPI on a parallel machine
  - Data streaming from a sensor or satellite
  - Specialized (JDBC) database access
• Such services accept and produce data from other services, files and databases.
• The Grid is used to coordinate such services assuming we have

(7)

Two-level Programming II

• Grid programming concerns the composition of distributed services, with runtime interfaces to the Grid taking the place of UNIX pipes/data streams.
• This is familiar from the use of UNIX shell, Perl or Python scripts to produce real applications from core programs.
• Such interpretative environments are the single-processor analog of Grid programming.
• Some projects, like GrADS from Rice University, are looking at integration between the service and composition levels, but the dominant effort looks at each level separately.

[Figure: chain of composed services: Service 1, Service 2, ...]
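The analogy between shell pipes and service composition can be sketched in a few lines. In this minimal, single-process illustration each "service" is an ordinary Python function standing in for a remote Web Service; the names `compose`, `data_source`, `drop_negative`, and `rescale` are hypothetical and do not come from any Grid toolkit.

```python
from functools import reduce


def compose(*services):
    """Chain services so each one's output feeds the next, like a shell pipe."""
    def pipeline(data):
        return reduce(lambda current, service: service(current), services, data)
    return pipeline


# Hypothetical stand-ins for remote services.
def data_source(_request):
    """Pretend to fetch raw observations from a remote data service."""
    return [3.0, -1.0, 4.0, -1.5, 5.0]


def drop_negative(values):
    """A filter service: discard physically meaningless readings."""
    return [v for v in values if v >= 0]


def rescale(values):
    """An analysis service: normalize to the largest value."""
    top = max(values)
    return [v / top for v in values]


# The "workflow" level only wires services together; it does no science itself.
workflow = compose(data_source, drop_negative, rescale)
result = workflow(None)
print(result)
```

In a Grid setting the composition layer would invoke each step over SOAP rather than as a local call, but the separation between the services and the script that links them is the same.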

(8)

3 Layer Programming Model

[Figure: three programming layers on top of the basic Web Service infrastructure]
  - Level 1: Application programming (MPI, Fortran, C++, etc.) inside each Web Service
  - Level 2: Application semantics (metadata, ontology) “programming”
  - Level 3: Workflow programming linking Web Service 1, WS 2, WS 3, WS 4

(9)

Data and Science Applications

• Two- (or three-) level programming applies to all applications.
• Typically we need to bind together HPC and non-HPC parts.
  - How do you provide data to your application?
  - How do you share data between applications?
  - How do you communicate results to analysis and visualization programs?
• This is particularly important as the size and quality of observational data are growing rapidly.
• Q: How do you easily bind together science apps and remote data sources?
• A: Web Services (and Grids) provide the unifying

(10)

Grid Libraries

• Programming the Grid has many similarities with conventional languages.
  - In HPSearch you use similar scripting languages.
• Grids are particularly good at supporting user interfaces, as the browser is a particular service.
  - Portal technology is an important “gift” of Grids for HPC.
• Most promising (and not exploited often) is building Grid “Libraries”, which are collections of services that can be re-used in several applications.

(11)
(12)

Data Deluged Science

• In the past, we worried about data in the form of parallel I/O or MPI-IO, but we didn’t consider it as an enabler of new algorithms and new ways of computing.
• Data assimilation was not central to HPCC.
• ASC set up because didn’t want test data!
• Now particle physics will get 100 petabytes from CERN.
  - Nuclear physics (Jefferson Lab) in same situation.
  - Use around 30,000 CPUs simultaneously 24x7.
• Weather forecasting, climate, solid earth (EarthScope, Earth System Grid, GEON).
  - We discussed our project SERVOGrid in the October 2004 lecture.
• Bioinformatics curated databases (Biocomplexity only 1000’s of data points at present).

(13)

Data Deluge @ Home

• In 2003, all of Marion County, IN (including Indianapolis) was surveyed using Light Detection and Ranging (LiDAR) sensing.
• GRW, Inc flew a Cessna 337 airplane over the entire county to produce digitized maps.
  - 1 point per square meter.
  - 495 square miles total.
• Can be used to create high resolution contour maps….
• But what do you do with all of the data?

LiDAR data represents 3 orders of magnitude increase in data resolution over what is used today in conventional flood prediction (B. Engles, Purdue).
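The two figures on this slide (1 point per square meter, 495 square miles) are enough for a back-of-envelope estimate of the survey size. The sketch below does that arithmetic; the 24 bytes per point (three 8-byte coordinates) is an assumption made here for illustration and is not a figure from the survey.

```python
# Exact conversion: 1 mile = 1609.344 m, so 1 square mile = 1609.344**2 m^2.
SQ_METERS_PER_SQ_MILE = 1609.344 ** 2

area_sq_miles = 495          # from the slide
points_per_sq_meter = 1      # from the slide

total_points = area_sq_miles * SQ_METERS_PER_SQ_MILE * points_per_sq_meter

# Assumption for illustration only: x, y, z stored as three 8-byte floats.
BYTES_PER_POINT = 24
raw_gigabytes = total_points * BYTES_PER_POINT / 1e9

print(f"about {total_points / 1e9:.2f} billion points")
print(f"about {raw_gigabytes:.0f} GB uncompressed at {BYTES_PER_POINT} bytes/point")
```

So a single county yields on the order of a billion elevation points, which is why the question of what to do with the data is not rhetorical.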

(14)

Example Data Grid: The Earth System Grid

• U.S. DOE SciDAC funded R&D effort
• Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data
• A “Collaboratory Pilot Project”
• Build upon ESG-I, Globus Toolkit, DataGrid technologies, and deploy
• Potential broad application to other areas

(15)

ESG Data Sets

• Community Climate Systems Model data
  - This is data that is compatible with the National Center for Atmospheric Research (NCAR) global climate model, CCSM.
    - Couples atmospheric, land surface, ocean, and sea ice models.
  - This is the US government’s workhorse code for climate modeling and prediction.
  - http://www.ccsm.ucar.edu/
• Parallel Climate Model data
  - Data compatible with extensions to CCSM.
  - Uses same atmospheric model but different ocean and sea ice

(16)

ESG Challenges

• By the end of 2003, DOE-sponsored climate change research had produced 100 TB of scientific data.
  - Stored across several DOE sites and NCAR.
• This is a consequence of HPC and will only escalate as models can simulate global weather patterns at increasingly fine resolution.
• Basic problems in data management
  - What is in the data files (metadata)?
  - How were data created and by whom (provenance)?
  - How can data be stored and moved between sites efficiently?
  - How can data be delivered to the scientific community?

(17)
(18)


(19)

Example Data Grid: GEON

• Project Goal: Prototype interpretive environments of the future in Earth Sciences.
• Use advanced information technologies to facilitate collaborative, inter-disciplinary science efforts.
• Scientists will be able to discover data, tools, and models via portals, using advanced, semantics-based search engines and query tools, in a uniform authentication environment that provides controlled access to a wide range of resources.
  - A prototype “Semantic Grid”
• A services-based environment facilitates creation of scientific workflows that are executed in the distributed environment.
• Advanced GIS mapping, 3D, and 4D visualization tools allow scientists to interact with the data.

(20)

GEON Grid Application: SYNSEIS

SYNSEIS is a grid application that provides an opportunity for seismologists and other earth science partners to compute and study 3D seismic records to understand complex subsurface structures.

SYNSEIS is built using a service-based architecture. While it provides users an easy-to-use GUI to access data, models and compute

(21)

SYNSEIS Architecture

[Figure: SYNSEIS architecture. Components: SYNSEIS (Flash GUI), GEON Portal, SynSeis Engine, IRIS DMC, Cornell Map Server, Crustal Models, and the compute resources TeraGrid NCSA, TeraGrid SDSC, and LLNL MCR. Connection labels: Corba, Web service, SOAP, GAS, GRA, GridFT, GS]

(22)
(23)

GEON SYNSEIS Conclusions

• Using Grid technology, the GEON team was able to bring an extremely complex and cumbersome seismic data analysis procedure to a level that can be used by anyone efficiently and effectively; hence SYNSEIS is a first step towards faster discovery.
• Democratization of community resources allows not only GEON researchers but also external community members to access state-of-the-art software and tools.
• Although the tool is developed for GEON applications, it holds tremendous potential for projects like EarthScope. SYNSEIS can be used by EarthScope researchers to conduct timely analysis of collected data.
• SYNSEIS also has a high potential to be used in educational environments, allowing students to experiment with data and make their own earthquakes.
• SYNSEIS has allowed us to practice building distributed data and

(24)

SERVOGrid Example: GeoFEST

• SERVOGrid was discussed in more detail in the October lecture of this series.
  - But worth another mention in this context.
• GeoFEST is the Geophysical Finite Element Simulation Tool.
  - GeoFEST solves solid mechanics forward models with these characteristics:
    - 2-D or 3-D irregular domains
    - 1-D, 2-D or 3-D displacement fields
    - Static elastic or time-evolving viscoelastic problems
    - Driven by faults, boundary conditions or distributed loads
  - GeoFEST runs in a variety of computing environments:
    - UNIX workstations (including Linux, Mac OS X, etc.)
    - Web portal environment
    - Parallel cluster/supercomputer environment
• GeoFEST output can be compared directly with current and

(25)

GeoFEST and Data Grids

• GeoFEST works directly with Earth fault data.
• Luckily for us, there is a Web Service data source for earth faults in California.
  - QuakeTables: accessible for human use through
    - http://infogroup.usc.edu:8080/public.html
    - http://complexity.ucs.indiana.edu:8282/jetspeed/index.jsp
  - USC, UC-Irvine, and IU designed and built this as part of the SERVO project.
• But GeoFEST needs programmatic access to the fault data.
  - Users design layer and fault geometry problems and create finite element meshes through a Web portal interface.
    - Like GEON, we use portlets.
    - Portlets are a standard way to make Java-based (and other) portals out of reusable components.

(26)

[Figure: portal architecture. A Browser Interface talks to the User Interface Server, which uses SOAP to call WSDL-described services: DB Service 1 (JDBC), Job Sub/Mon and File Services (Operating), and a Viz Service]

(27)
(28)
(29)

[Figure labels: Site-specific Irregular Scalar Measurements; Constellations for Plate Boundary-Scale Vector Measurements; Topography 1 km; Stress Change; PBO; Ice Sheets; Volcanoes; Long Valley, CA; Northridge, CA]

(30)

[Figure: distributed filters massage data for simulation. Labels: HPC Simulation; Data Filter (three instances); Other Gri]

(31)

Data Assimilation

• Data assimilation implies one is solving some optimization problem which might have a Kalman Filter like structure.
• Due to the data deluge, one will become more and more dominated by the data (Nobs much larger than the number of simulation points).
• The natural approach is to form, for each local (position, time) patch, the “important” data combinations so that optimization doesn’t waste time on large-error or insensitive data.
• Data reduction is done in a natural distributed fashion, NOT on the HPC machine, as distributed computing is most cost effective if calculations are essentially independent.

(32)

Distributed Filtering

[Figure: geographically distributed sensor patches feed distributed data filters. The HPC machine sends the needed filter to each patch and receives the filtered data. Each filter reduces Nobs(local patch i) to Nfiltered(local patch i), shown for patches 1 and 2]

$N_{\mathrm{obs}}^{\text{local patch}} \gg N_{\mathrm{filtered}}^{\text{local patch}} \approx \text{Number\_of\_Unknowns}^{\text{local patch}}$

In the simplest approach, filtered data are obtained by linear transformations on the original data, based on the Singular Value Decomposition of the least squares matrix.
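One way the SVD-based filter could look is sketched below. This is an illustrative derivation under stated assumptions, not the authors' formulation: the linear observation model, the symbols $A$, $d$, $x$, $\varepsilon$, and the truncation rank $k$ are all introduced here for the example.

```latex
% Assumed linear model on one local patch: many observations d, few unknowns x.
\begin{aligned}
d &= A x + \varepsilon,
  && d \in \mathbb{R}^{N_{\mathrm{obs}}},\;
     x \in \mathbb{R}^{N_{\mathrm{unk}}},\;
     N_{\mathrm{obs}} \gg N_{\mathrm{unk}} \\
% Thin SVD of the least squares matrix, rank r <= N_unk.
A &= U \Sigma V^{T},
  && U \in \mathbb{R}^{N_{\mathrm{obs}} \times r},\;
     \Sigma = \operatorname{diag}(\sigma_1 \ge \dots \ge \sigma_r > 0),\;
     V \in \mathbb{R}^{N_{\mathrm{unk}} \times r} \\
% The filter: a linear map applied where the data live; keep the k largest singular values.
d_f &= U_k^{T} d \in \mathbb{R}^{k},
  && k \le r \le N_{\mathrm{unk}} \\
% The HPC side solves with the filtered data only.
\hat{x} &= V_k \Sigma_k^{-1} d_f
\end{aligned}
```

Under these assumptions the patch ships only $k$ numbers instead of $N_{\mathrm{obs}}$, and $\hat{x}$ is the usual truncated-SVD least squares estimate, so discarding the directions with small singular values is what removes the large-error or insensitive data combinations.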

(33)

Standards For

(34)

The Story So Far…

• HPC applications generate huge amounts of data.
  - Constant problem for all HPC centers, including DOD MSRCs.
  - Managing scientific information about these applications is just as important as storage technology.
• HPC applications use observational data as input.
  - Projects like the ESG, GEON, and SERVO illustrate how HPC applications need to be coupled to data sources.
  - Quantity of observational data is growing rapidly, opening fields for non-traditional HPC (LiDAR and flood modeling).
• Huge amounts of new data potentially drive new HPC applications (LiDAR->Flood modeling).
• Earth sciences are a focus of our examples, but really, many applications have data sources that are geographically described.
  - Weather prediction is an obvious example.
• Thus we see the importance of coupling GIS data grid services to

(35)

What is GIS?

• Geographic Information Systems
  - ESRI: commercial company with many popular GIS products.
  - Open Geospatial Consortium (formerly OpenGIS Consortium).
  - We will focus on OGC since they define open and interoperable standards.
• What are the characteristics of a GIS system?
  - Need data models to represent information.
  - Need services for remotely accessing data.

(36)

GML: A Data Model For GIS

• GML 3.x is an interconnected suite of over 20 XML schemas.
• GML is an abstract model for geography.
• With GML, you can encode
  - Features: abstract representations of map entities.
  - Geometry: encode abstractly how to represent a feature pictorially.
  - Coordinate reference systems
  - Topology

(37)

Example Use of GML

• The SCIGN (Southern California Integrated GPS Network) maintains online catalogs of GPS stations.
• Collective data for each site is made available through online catalogs.
  - Using various text formats.
• This is not suitable for processing, but GML is.
• GML can be used to describe GPS using the Feature.xsd schema, with values encoded as GPS observations.

(38)

Open GIS Services

• GML abstract data models can encode data, but you need services to interact with the remote data.
• Some example OGC services include
  - Web Feature Service: for retrieving GML-encoded features, like faults, roads, county boundaries, GPS station locations, ….
  - Web Map Service: for creating maps out of Web Features
  - Sensor Grid Services: for working with streaming, time-stamped data.
• Problems with OGC services
  - Not (yet) Web Service compliant
    - “Pre” web service, no SOAP or WSDL
    - Use instead HTTP GET/POST conventions.
  - Often define general Web Service services as specialized standards

(39)

Anatomy of WFS (G. Aydin)

• WFS provides three major services as described in the OGC specification:
  - GetCapabilities: The client (a WMS server or a user) starts by requesting a document from the WFS which describes its abilities. When a GetCapabilities request arrives, the server dynamically creates a capabilities document and returns it.
    - This is OGC’s formalization of metadata, so important to GEON, ESG, etc.
  - DescribeFeatureType: After the client receives the capabilities document, it can request a more detailed description of any of the features listed in the WFS capabilities document.
    - The WFS returns an XML schema that describes the requested feature.
    - Metadata about a specific entry.
  - GetFeature: The client can ask the WFS to return a particular portion of any feature data.
    - GetFeature requests contain some property names of the feature and a Filter element to describe the query.
    - The WFS extracts the query and bounding box from the filter and queries the feature databases.
    - The results obtained from the DB query are converted to the feature’s GML
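The three WFS operations can be illustrated by building the corresponding HTTP GET requests. This is a minimal sketch, not the implementation described on the slide: the endpoint URL, the feature type name `fault`, the bounding box values, and the choice of version 1.0.0 are assumptions; only the parameter names (`SERVICE`, `VERSION`, `REQUEST`, `TYPENAME`, `BBOX`) follow the OGC key-value-pair convention. A `BBOX` parameter is used here in place of a full Filter element to keep the example short.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; substitute the address of a real WFS.
WFS_ENDPOINT = "http://example.org/wfs"


def wfs_url(request, **params):
    """Build a key-value-pair GET URL for one WFS operation."""
    query = {"SERVICE": "WFS", "VERSION": "1.0.0", "REQUEST": request}
    # OGC parameter names are conventionally written in upper case.
    query.update({name.upper(): value for name, value in params.items()})
    return WFS_ENDPOINT + "?" + urlencode(query)


# Step 1: ask the server what it offers.
capabilities_url = wfs_url("GetCapabilities")

# Step 2: ask for the XML schema of one feature type listed in the capabilities.
describe_url = wfs_url("DescribeFeatureType", typename="fault")

# Step 3: fetch features of that type inside a bounding box
# (min lon, min lat, max lon, max lat).
feature_url = wfs_url("GetFeature", typename="fault",
                      bbox="-119.0,34.0,-118.0,34.5")

for url in (capabilities_url, describe_url, feature_url):
    print(url)
```

Fetching any of these URLs (for example with `urllib.request.urlopen`) would return XML: a capabilities document, an XML schema, and a GML feature collection respectively.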

(40)

Example WFS Capability Entries

Name: A name the service provider assigns to the web feature service instance.
Title: Human-readable title to briefly identify this server in menus.
Abstract: Descriptive narrative for more information about the server.
Keyword: Contains short words to aid catalog searching.
OnlineResource: Defines the top-level HTTP URL of this service. Typically the URL of a "home page" for the service.
Fees: Contains a text block indicating any fees imposed by the service provider for usage of the service or for data retrieved from the WFS. The keyword NONE is reserved to mean no fees.

(41)

Sample Feature: CA Fault Lines

    <gml:featureMember>
      <fault>
        <name>Northridge2</name>
        <segment>Northridge2</segment>
        <author>Wald D. J.</author>
        <gml:lineStringProperty>
          <gml:LineString srsName="null">
            <gml:coordinates>-118.72,34.243 -118.591,34.17</gml:coordinates>
          </gml:LineString>
        </gml:lineStringProperty>
      </fault>
    </gml:featureMember>

• After receiving a GetFeature request, the WFS decodes the request, creates a DB query from it and queries the database.
• The WFS then retrieves the features from the database and converts them into GML documents.
• Each feature instance is wrapped as a gml:featureMember element.
• The WFS returns a wfs:FeatureCollection document which includes all featureMembers returned in
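A client receiving such a feature collection could extract the fault geometry with the Python standard library. The following is a minimal sketch: the enclosing `wfs:FeatureCollection` element and its namespace declarations are added here as assumptions so the fragment parses as a complete document, and `parse_faults` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# The slide's sample feature, wrapped in an assumed FeatureCollection root.
SAMPLE = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
                       xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <fault>
      <name>Northridge2</name>
      <segment>Northridge2</segment>
      <author>Wald D. J.</author>
      <gml:lineStringProperty>
        <gml:LineString srsName="null">
          <gml:coordinates>-118.72,34.243 -118.591,34.17</gml:coordinates>
        </gml:LineString>
      </gml:lineStringProperty>
    </fault>
  </gml:featureMember>
</wfs:FeatureCollection>"""


def parse_faults(document):
    """Return one dict per fault with its name, author and (lon, lat) points."""
    namespaces = {"gml": GML_NS}
    root = ET.fromstring(document)
    faults = []
    for member in root.findall("gml:featureMember", namespaces):
        fault = member.find("fault")
        coords_text = fault.find(".//gml:coordinates", namespaces).text
        # gml:coordinates separates tuples by whitespace and values by commas.
        points = [tuple(float(value) for value in pair.split(","))
                  for pair in coords_text.split()]
        faults.append({
            "name": fault.findtext("name"),
            "author": fault.findtext("author"),
            "points": points,
        })
    return faults


faults = parse_faults(SAMPLE)
print(faults)
```

The resulting list of coordinate pairs is the kind of programmatic input a mesh generator or simulation code needs, as opposed to the human-readable catalog pages.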

(42)

• A WFS can serve multiple feature types.
• WFS returns the results of GetFeature requests as GML documents (Feature Collections).
• Clients may include other

(43)

Schematic Interactions Between GIS Services

[Figure: a WMS and three WFS instances connected through a central IS block. The WFS instances serve California fault data (@complexity), California boundary data, and California river data (@gf1)]

(44)

Defining IS

• The central IS block in the preceding diagram represents nebulous “information services.”
• Information services are needed to bind together various GIS and other services.
  - What are their URLs? How do you interact with them (WSDL)? What do they do (capabilities)?
• The OGC defines information services, but they are specialized to GIS.
  - Web Catalogue Service: state appears uncertain.
  - Web Registry Service: a common mechanism to classify, register, describe, search, maintain and access information about OGC Web resources.
• But if they adopt Web Service standards, they get Web Service

(45)

Universal Description, Discovery and Integration

• UDDI is the standard for building service registries and for describing their contents.
  - UDDI is part of the WS-I core: http://www.ws-i.org/
• But no one seems to like it…
• Centralized solution
  - Single point of failure
• Poor discovery model
  - No uniform way of querying about services, service interfaces and classifications.
  - Limited query capabilities: search for services restricted to WS name and its classification
• Stale data in registries
  - Out-of-date service documents in UDDI registries.
  - Need a leasing system
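The leasing idea can be sketched as a registry whose entries expire unless the service renews them. This is a minimal in-memory illustration, not the UDDI API or the authors' implementation; the class name `LeasedRegistry`, its methods, and the example URL are hypothetical. The clock is injectable so the behavior can be demonstrated without waiting.

```python
import time


class LeasedRegistry:
    """Service registry in which every entry carries an expiry time."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock
        self._entries = {}  # service name -> (url, expiry time)

    def register(self, name, url):
        """Publish a service; the lease starts now."""
        self._entries[name] = (url, self._clock() + self._lease)

    def renew(self, name):
        """Extend the lease of a registered service (KeyError if unknown)."""
        url, _ = self._entries[name]
        self.register(name, url)

    def lookup(self, name):
        """Return the URL, or None if the entry is missing or its lease ran out."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        url, expires = entry
        if self._clock() >= expires:
            del self._entries[name]  # purge the stale entry
            return None
        return url


# Demonstration with a fake clock so that time can be advanced by hand.
now = [0.0]
registry = LeasedRegistry(30.0, clock=lambda: now[0])
registry.register("fault-wfs", "http://example.org/wfs")

fresh = registry.lookup("fault-wfs")      # within the first lease
now[0] = 20.0
registry.renew("fault-wfs")               # lease now runs until t = 50
now[0] = 45.0
renewed = registry.lookup("fault-wfs")    # still valid thanks to the renewal
now[0] = 51.0
expired = registry.lookup("fault-wfs")    # lease ran out, entry is dropped
print(fresh, renewed, expired)
```

A service that crashes simply stops renewing, so its entry disappears without anyone having to clean the registry by hand.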

(46)

UDDI Has Other Problems

• Many Web Services need to maintain the concept of state between themselves during complicated interactions.
• For example, for better performance, I may wish to cache maps in a Web Map Server instead of reconstructing them via calls to a Web Feature Service every time.
• This is basically a glorified HTTP Cookie problem.
• We need a way to store this kind of volatile session state data in a lightweight store.
  - UDDI==heavyweight.
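The map-caching example can be sketched with a small session-state store. This is an illustration of the idea only; `ContextStore`, `fetch_features`, and `render_map` are hypothetical names, and the strings returned stand in for real GML and map images.

```python
class ContextStore:
    """Lightweight, volatile key-value store scoped by a session/context id."""

    def __init__(self):
        self._data = {}

    def put(self, context_id, key, value):
        self._data.setdefault(context_id, {})[key] = value

    def get(self, context_id, key, default=None):
        return self._data.get(context_id, {}).get(key, default)

    def end(self, context_id):
        """Discard everything belonging to a finished session."""
        self._data.pop(context_id, None)


wfs_calls = []  # records every (expensive) call to the feature service


def fetch_features(layer):
    """Stand-in for a remote Web Feature Service request."""
    wfs_calls.append(layer)
    return f"<features layer='{layer}'/>"


def render_map(store, context_id, layer):
    """Stand-in for a Web Map Server that reuses maps within a session."""
    cached = store.get(context_id, layer)
    if cached is None:
        cached = "map(" + fetch_features(layer) + ")"
        store.put(context_id, layer, cached)
    return cached


store = ContextStore()
first = render_map(store, "session-1", "faults")   # calls the WFS
second = render_map(store, "session-1", "faults")  # served from session state
store.end("session-1")
third = render_map(store, "session-1", "faults")   # state gone, WFS called again
print(len(wfs_calls))
```

The store holds nothing that cannot be rebuilt, which is what makes it reasonable to keep it lightweight and separate from the registry.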

(47)

GIS Service Registries

• Functional capabilities of a GIS service are defined in a “capabilities.xml” file.
• An information service can gather metadata about functional requirements of a GIS service
  - By processing the capabilities file in an automated fashion when a service is registered
  - By having the service provider declare these capabilities when publishing a service
  - The Information System API introduces a library for XML Schema processing of different capability files
• UDDI with the geospatial focus of GIS Services
  - Data layers (features) of a GIS service may have varying geospatial coverage
  - UDDI registries do not natively support spatial queries.
  - We use existing geographic taxonomies such as QuadCode taxonomy to
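How a geographic taxonomy can stand in for spatial queries may be illustrated with a generic quadtree code. The sketch below is not the QuadCode taxonomy itself, whose exact definition is not given here; the digit convention (0 = north-west, 1 = north-east, 2 = south-west, 3 = south-east) and the function name `quad_code` are assumptions made for illustration.

```python
def quad_code(lon, lat, depth):
    """Encode a point as a string of quadrant digits, one digit per level.

    Each level splits the current box in four:
    0 = north-west, 1 = north-east, 2 = south-west, 3 = south-east.
    """
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(depth):
        mid_lon = (west + east) / 2
        mid_lat = (south + north) / 2
        digit = 0
        if lon >= mid_lon:
            digit += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat < mid_lat:
            digit += 2
            north = mid_lat
        else:
            south = mid_lat
        digits.append(str(digit))
    return "".join(digits)


# The two end points of the Northridge2 fault segment fall in the same cell.
code_a = quad_code(-118.72, 34.243, 4)
code_b = quad_code(-118.591, 34.17, 4)
# A point on the other side of the globe gets a different leading digit.
code_c = quad_code(139.69, 35.69, 4)
print(code_a, code_b, code_c)
```

Because nearby points share a code prefix, a registry that can only match category strings can still answer "which services cover this region" by comparing prefixes, which is the kind of workaround a taxonomy provides when native spatial queries are missing.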

(48)

WS-Context: Session State Service

• Repository of Context Information
• Allows for
  - Sharing context info
    - Info related to a particular transaction in multiple Web Service interactions
  - Sharing data
    - Data in multiple Web service interactions
• Simply put, it’s a Distributed Variation of Shared Memory.
• See

(49)

[Figure: WMS clients reach the Information Service over HTTP(S); the Information Service uses SOAP to call WSDL-described UDDI Registry Services and WS-Context Services, each backed by a database through JDBC]

An Information Service with both Registry and

(50)

GIS FTHPIS Implementation Status (M. S. Aktas)

• UDDI v.3E implementation
• metadata extension [completed]
• Processing geographic taxonomies to enable UDDI to support spatial queries [completed]
• WSDL interface to UDDI v.3 [completed]
• WSDL interface to WS-Context 1.0
• Monitoring scheme
  - Leasing [completed]
  - Heart-beat
• WS-Discovery implementation
• metadata extension [completed]
• WSDL interface to Information Service
• Message dissemination via SOAP Handler Environment

(51)

Concluding Remarks

• High Performance Computing will be increasingly data driven.
• High volumes of observational data will push many applications into the realms of HPC.
• There must be an overarching architecture to integrate data sources, HPC applications, visualization applications, users.
  - Web Service architectures provide this.
  - Use to build Grid libraries
• Large amounts of data related to the earth’s surface.
• GIS data and service standards need to be integrated
