Rapid Prototyping and Deployment of Distributed Web / Grid Services in a Service Oriented Architecture using Scripting


Academic year: 2020



Thesis Proposal

Harshawardhan Gadgil

Outline

- Motivation
- Literature Survey
- Research Issues
- HPSearch Architecture
- Contributions and Milestones
- Applications

Motivation

- Critical infrastructure systems connect disparate data sources, high-performance computing applications and visualization services for real-time data processing.
- Real-time data processing
  - Results required in real-time. Data available in streams. Requires pre-processing (e.g. filtering data to remove unwanted parts).
- Scalability
  - Potentially large number of data sources (static, dynamic) or data processing elements (services)
- Unpredictable behavior
  - Fault-tolerance a key factor. E.g. Incorporate new data

Motivation (contd.)

- System Management
  - Increasing complexity of application implies more metadata.
  - Proper management required to ensure smooth functioning of the system.
  - Require easy access to manage system

Motivation

Streaming Data Processing

- Critical infrastructure systems (scientific applications)
  - Real-time streaming sources exist (e.g. sensors, satellite stations) OR static data sources (databases containing previously warehoused observations)
  - Data filtering / transformation essential in most cases for converting data to the proper format for the processing application
  - Real-time processing required. Crucial for critical infrastructure applications
- Audio/video applications
  - Real-time sources (e.g. collaborative sessions) OR static data source (stored A/V files)
  - Pre-processing required to modify A/V characteristics
    - Format (encoding) / bit rate (quality) etc.
  - Real-time processing crucial for


Literature Survey

- Services (Web / Grid)
- Scripting Languages
  - Benefits
  - Possible problems
- Handling data flow in applications
  - File-based vs. streaming
- Workflow Systems
  - Enable gluing high-performance components
  - GUI-based building and programming flavor
- Component based architectures
- Messaging systems (for high-throughput data transfer)

Service

- "Service is a logical manifestation of a logical / physical resource (DB, programs, devices, humans etc.) and/or some application logic exposed to network" - Web Service Grids: An Evolutionary Approach (2004)
- Web Services
  - Simple mechanism for distributed computing
  - Language independent, firewall friendly
- Grid Services
  - Are essentially Web Services

Scripting Languages

- Benefits
  - Enable rapid prototyping (less code and less development time)
  - Less effort to
    - Perform complex tasks
    - Interface with the OS (hosting environment)
    - Write glue code to tie programs together
  - Usually portable
  - Primarily for plugging existing components together
- However, some disadvantages too
  - Weak typing
  - Less structure, difficult to maintain
- Some examples
  - Rhino – JavaScript for Java
  - Perl, VBScript, (P/J)ython
- Scripting vs GUI builders

Scripting Environment

Hosting Services

- OGSI:Lite & WSRF:Lite
  - Based on Perl
  - Rapidly deploy grid services
- Matlab / Jython from GEODISE
  - GEODISE – Suite of CAD integrated with distributed grid-enabled computing, data, analysis and knowledge resources
  - Uses Matlab to provide programmatic access to GEODISE functions along with an existing suite of Matlab tools
  - Jython used to provide a hosting environment using

Data flow in applications

- Real-time processing required.
- Typically, data transfer involves temporarily storing the data. This data may be transferred using files (e.g. GridFTP).
- Every component of the chain reads data from an input file and writes processed data to an output file.
- Time and space are critical in real-time applications, hence file-based transfer is undesirable for them.
- Tools to automate data transfer and invoke
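
The difference between a file-based chain and a streaming chain can be sketched in plain JavaScript. This is only an illustration, not HPSearch code: the stage names (`source`, `dropInvalid`, `scale`) and the sample records are hypothetical. Each stage is a generator that handles one record at a time, so a downstream stage can start before the whole input exists and no intermediate file is written.

```javascript
// Hypothetical streaming chain: each stage pulls one record at a time
// from the previous stage instead of reading a complete input file.

// Source stage: yields records one by one (stands in for a live feed).
function* source(records) {
  for (const r of records) {
    yield r;
  }
}

// Filter stage: drops records that are not finite numbers.
function* dropInvalid(stream) {
  for (const r of stream) {
    if (typeof r === "number" && Number.isFinite(r)) {
      yield r;
    }
  }
}

// Processing stage: scales each record by a constant factor.
function* scale(stream, factor) {
  for (const r of stream) {
    yield r * factor;
  }
}

// Wire the chain; nothing runs until the consumer asks for a record.
function runChain(records) {
  const out = [];
  for (const r of scale(dropInvalid(source(records)), 2)) {
    out.push(r);
  }
  return out;
}

console.log(runChain([1, NaN, 2, "bad", 3]));
```

In a file-based chain each stage would have to finish and write its full output before the next stage could begin, which is the latency and storage cost the slide objects to.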

Workflow Architectures

- Triana – Graphical PSE to compose scientific applications
  - Composed of one or more Triana engines.
  - Distributed version
  - Data transfer takes place using JXTA pipes.
- Taverna
  - Can interact with arbitrary services.
  - Plugins to mediate / operate the service in each case
  - Uses XScufl (derived from WSFL) workflow language.
- Kepler
  - Java packages for designing and execution.
  - Has a graphical interface for composing complex workflows
  - Can wrap existing code written in different languages, e.g. Perl

Component Architectures

- XCAT @ IU-Extreme
  - Connects components (Provides and Uses ports)
  - Jython based scripting to do application management tasks (create application, set properties, invoke application)
  - Data transfer by GridFTP between components, Globus Reliable File Transfer (fault tolerance).
- Many other systems
  - Focus mainly on invocation of services as in a

Messaging systems

- JXTA – P2P middleware, JMS for communication
- Pastry
  - Fault tolerant P2P middleware
  - Based on distributed hash tables
  - No real-time routing possible
- NaradaBrokering @ IU – http://www.naradabrokering.org
  - Event-brokering system designed to run on a large network of co-operating brokers.
  - Implements high-performance protocols (message transit time < 1 ms per broker)
  - Order-preserving optimized message transport
  - Interface with reliable storage for persistent events
  - Fault tolerant data transport
  - Support for different underlying transport implementations such as

System Management

- Increasing complexity of systems implies increasing amount of metadata to be managed
- Provide access to system and management of system metadata - WS-Management
- E.g. performance metrics, logs, service metadata
- Require ability to query system data and take


Research Issues

- Support for streaming data processing.
  - Data transfer and processing in real-time
  - Data transfer to be carried on between the end-points (sender and recipient) without the flow engine mediating - Grid Services Flow Language
- Design a run-time system that allows merging data sources, data filtering and processing applications and visualization tools in a service-oriented architecture
  - Assume all components available as Web (Grid) services.
  - Scalability an issue – Addition of data sources or processing applications (services) should not degrade the system performance
  - Fault-tolerance – Services and data sources may be lost.

Research Issues

- System Management Interface - Allow access to system and manipulate the characteristics of system by querying system metadata
- Create virtual topology for application deployment
- Query performance metrics to design policies to change routing substrate characteristics (e.g. add new brokers or links between existing brokers to aid efficient routing)
- Discover services / brokers / topics of interest.
- To dynamically rewire components with data streams.
- Replay events


HPSearch

- Binds URI to a scripting language
  - We use Mozilla Rhino (a JavaScript implementation, refer: http://www.mozilla.org/rhino), but the principles may be applied to any other scripting language
- Every resource may be identified by a URI, and HPSearch allows us to manipulate the resource using the URI.
  - For example, read from a web address and write it to a local file:

      x = "http://trex.ucs.indiana.edu/data.txt";
      y = "file:///u/hgadgil/data.txt";
      r = new Resource("Copier");
      r.port[0].subscribeFrom(x);   /* read from */
      r.port[0].publishTo(y);       /* write to */
      f = new Flow();
      f.addStartActivities(r);
      f.start("1");

HPSearch (contd.)

- Currently provide bindings for the following
  - file://
  - socket://ip:port
  - http://, ftp://
  - topic://
  - jdbc:
- Host-objects to do specific tasks
  - WSDL – Invoke web services using SOAP
  - PerfMetrics – Bind NaradaBrokering performance metrics. Store published metrics and allow querying
  - Resource – Every data source / filter / sink is a resource.
  - Flow – Creates a data flow between resources.
- For more information, visit
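
One way to picture "binding a URI" is a lookup from the URI scheme to the code that handles it. The sketch below is a guess at the general idea, not HPSearch's actual implementation: the handler names in the table and the functions `schemeOf` and `resolve` are hypothetical.

```javascript
// Hypothetical registry mapping URI schemes to handler names.
// The handler names are placeholders, not HPSearch classes.
const bindings = new Map([
  ["file", "FileHandler"],
  ["socket", "SocketHandler"],
  ["http", "HttpHandler"],
  ["ftp", "FtpHandler"],
  ["topic", "TopicHandler"],
  ["jdbc", "JdbcHandler"],
]);

// Extract the scheme: everything before the first ':'.
function schemeOf(uri) {
  const i = uri.indexOf(":");
  if (i <= 0) {
    throw new Error("Not a URI: " + uri);
  }
  return uri.slice(0, i).toLowerCase();
}

// Look up the handler bound to a URI's scheme.
function resolve(uri) {
  const scheme = schemeOf(uri);
  const handler = bindings.get(scheme);
  if (handler === undefined) {
    throw new Error("No binding for scheme: " + scheme);
  }
  return handler;
}

console.log(resolve("file:///u/hgadgil/data.txt"));
console.log(resolve("http://trex.ucs.indiana.edu/data.txt"));
```

Keeping the table separate from the lookup means a new scheme can be supported by adding one entry, which fits the pluggable design the proposal aims for.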

Architecture

- Consists of
  - SHELL
    - Front end to scripting.
  - TASK_SCHEDULER (FLOW_ENGINE)
    - Distributes tasks among co-operating engines for load-balancing purposes.
  - WSPROXY
    - An AXIS web service wraps an actual service. The behavior of the service can be controlled by making simple WS calls to this proxy.
      - Can be controlled by any workflow engine
      - WSProxy handles streaming data communication on behalf of the service.
    - The service only sees input and output streams. These could be files, a remote data stream, a file transferred via HTTP / FTP, or results from a database query

Architecture

- WSProxy - Interfaces
  - Runnable
    - More control over execution (start, suspend, resume, stop…)
    - Basic idea (read block of data, process it, write it out)
    - Ideal for designing quick filtering applications that process data in streams.
  - Wrapped
    - Wrap an existing service (executables [*.exe], Matlab scripts, shell / Perl scripts etc.)
    - Less control, can only start, stop
    - Ideal for wrapping existing programs / services to expose as a
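
The Runnable idea (read a block, process it, write it out, under start / suspend / resume / stop control) can be sketched as a small state machine. Everything here is hypothetical: `BlockFilter`, its method names and the in-memory arrays standing in for streams are illustrative and are not the WSProxy interface itself.

```javascript
// Hypothetical sketch of a "Runnable"-style filter: a small state machine
// (start, suspend, resume, stop) around a read-process-write step.
class BlockFilter {
  constructor(input, process) {
    this.input = input.slice(); // pending blocks (stands in for an input stream)
    this.process = process;     // function applied to each block
    this.output = [];           // processed blocks (stands in for an output stream)
    this.state = "created";
  }

  start() {
    if (this.state !== "created") throw new Error("already started");
    this.state = "running";
  }

  suspend() {
    if (this.state === "running") this.state = "suspended";
  }

  resume() {
    if (this.state === "suspended") this.state = "running";
  }

  stop() {
    this.state = "stopped";
  }

  // One cycle: read a block, process it, write it out.
  // Returns true if a block was handled, false otherwise.
  step() {
    if (this.state !== "running" || this.input.length === 0) return false;
    const block = this.input.shift();
    this.output.push(this.process(block));
    return true;
  }
}

const filter = new BlockFilter(["a", "b", "c"], (s) => s.toUpperCase());
filter.start();
filter.step();    // handles "a"
filter.suspend();
filter.step();    // ignored while suspended
filter.resume();
filter.step();    // handles "b"
filter.stop();
console.log(filter.output, filter.state);
```

A Wrapped service, by contrast, would expose only the equivalent of `start` and `stop`, since the proxy cannot pause an external executable between blocks.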

So what is the overhead?

Partial results as of now

- Taken on a 1.6 GHz Pentium 4 machine with 256 MB RAM running Java 1.4.1_02, NB version 0.98 rc2, Rhino 1.5R3
- Shell init: 2085 ms (average)
- RDAHMM script (26 lines, small script) takes about 15 ms (average per line) to execute
- Task distribution (2 engines, 4 tasks): 3897.645 ms
- WSProxy (init – depends on number of streams to initialize) 700
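
A back-of-the-envelope reading of these figures: 26 lines at about 15 ms each is roughly 390 ms of script execution, compared with 2085 ms of shell initialization. The sketch below just does that arithmetic. Adding the two numbers into a single cold-start total is an assumption about how the costs combine, not a measurement from the slides.

```javascript
// Figures taken from the measurements listed above.
const shellInitMs = 2085; // average shell initialization time
const scriptLines = 26;   // RDAHMM script length
const perLineMs = 15;     // approximate average per-line execution time

// Estimated script execution time.
const scriptMs = scriptLines * perLineMs;

// Assumption: init and execution costs simply add up for a cold start.
const coldStartMs = shellInitMs + scriptMs;

console.log(scriptMs);    // 390
console.log(coldStartMs); // 2475
```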


Contribution of this Thesis

- Stream and Service Management - Program data-flows
  - Incorporate static and dynamic data sources
  - WSProxy ensures that data flows directly between components (services) without the HPSearch engine mediating. Useful for streaming large amounts of data without burdening the controller.
  - Scalable?
    - We use NB as our messaging substrate, which can handle a large number of clients
    - All components (data sources, data processing and visualization applications) are clients. HPSearch manages streams and connects and steers components.
  - Fault-tolerant?
    - Data source, data filter (processing application) failure possible.
    - HPSearch can use the discovery service to invoke new services (in lieu
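
The failover idea can be sketched as follows: when a service is lost, look up another live service of the same kind and switch to it. This is a minimal illustration under stated assumptions. The registry layout, the service names and the functions `discover` and `ensureService` are hypothetical and do not describe the actual discovery service.

```javascript
// Hypothetical failover sketch: when a service is lost, ask a discovery
// registry for another live service of the same kind.
// The registry layout and service names are illustrative only.
const registry = [
  { name: "filterA", kind: "filter", alive: false },
  { name: "filterB", kind: "filter", alive: true },
  { name: "plotA", kind: "plot", alive: true },
];

// Return the first live service of the requested kind, or null.
function discover(kind, services) {
  const hit = services.find((s) => s.kind === kind && s.alive);
  return hit === undefined ? null : hit;
}

// Keep using the current service if it is alive; otherwise fail over.
function ensureService(current, services) {
  if (current.alive) return current;
  const replacement = discover(current.kind, services);
  if (replacement === null) {
    throw new Error("No live replacement for " + current.name);
  }
  return replacement;
}

console.log(ensureService(registry[0], registry).name);
```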

Contribution of this Thesis (contd.)

- System Management - Scripting admin tasks
  - Creating network (virtual broker network) topology
  - Querying performance metrics
  - Topic / broker discovery
- Rapid deployment of applications
  - Deploy network topology
  - Set application properties
  - Deploy application
- In short:
  - Provide alternative programmatic (scripting) access to

Milestones

- Implement WS front-end to shell
  - Remotely submit a script for execution, possibly through a portal
- WSProxy / Handler: Fault tolerance to handle situations when
  - The machine hosting the WSProxy dies
  - The broker which is used by the proxy dies
  - The HPSearch engine dies
- Design Application Interface
  - Allow users to create applications using this interface
  - Set application properties, allow modification of application properties at runtime using scripting
- NB Admin objects
  - NaradaBroker, PerfMetrics, NBDiscovery,

Milestones (contd.)

- Design stream negotiation module to allow WSProxy to negotiate stream characteristics
  - Select best possible transport and other QoS elements for data transfer between two services (for a particular stream)
- Applications - To demonstrate the use
  - Audio / Video mixer application
  - Multiple data sources and data filtering
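
A simple form of stream negotiation is to pick the first transport that both ends support. The sketch below shows that idea only. The function `negotiateTransport`, the transport names and the rule that the sender's list is in preference order are all assumptions, since the proposal lists the negotiation module as future work and does not specify its protocol.

```javascript
// Hypothetical negotiation: each side lists the transports it supports;
// the sender's list is in preference order. Pick the first transport
// that both sides support.
function negotiateTransport(senderPrefs, receiverSupports) {
  const supported = new Set(receiverSupports);
  for (const t of senderPrefs) {
    if (supported.has(t)) return t;
  }
  return null; // no common transport
}

console.log(negotiateTransport(["udp", "tcp", "http"], ["http", "tcp"]));
```

Other QoS elements could be negotiated the same way, one attribute at a time, with `null` signalling that the two services cannot be connected directly.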


Applications

Streaming Data Filtering

[Figure: data-flow diagram. Labels, in the order they appear: GPS Data; Data Filter (filters the input data to get only the estimate and error values); RDAHMM (analyze the data); Matlab Plotting Script; Graph; HPSearch Kernel; -TSE Kernel]

Applications

Creating Virtual Broker Network for deploying applications

    /* Broker on school.cs.indiana.edu, created with an empty configuration argument */
    b = new NaradaBroker("school.cs.indiana.edu");
    b.create("");   /* OR b.create("file:///u/hgadgil/alternateConfig.conf"); */
    b.connectTo("156.56.104.170", "5045", "t", "");
    b.requestNodeAddress("156-56-104-170.bl-dhcp.indiana.edu:5045", "0");

    /* Second broker on trex.ucs.indiana.edu, connected to the same node */
    c = new NaradaBroker("trex.ucs.indiana.edu");
    c.create("");
    c.connectTo("156.56.104.170", "5045", "t", "");
    c.requestNodeAddress("tcp://156-56-104-170.bl-dhcp.indiana.edu:5045", "0");

[Figure: broker network with nodes labeled 156.56.104.170, school.cs.indiana.edu and trex.ucs.indiana.edu (also labeled trex.cs.indiana.edu)]

Applications

Invoking Arbitrary Web Services

    /* loanAmt is assumed to be set earlier in the script */
    approved = false;
    userID = "111-22-3333";
    if (loanAmt < 10000)
        approved = true;
    else {
        wsRA = new WSDL("http://www.riskAssessor.com/services/RiskAssessor");
        risk = wsRA.invoke("assessRisk", userID, loanAmt);
        if (risk > 50)
            approved = false;
        else
            approved = true;
    }
    print("Loan Approved: " + approved);
    risk =


Summary

- This thesis addresses
  - Managing data streams (dynamic and static)
  - Enabling connecting data sources and data processing components (available as Web Services) for processing data in real-time for critical infrastructure applications
  - Develop a general purpose scripting architecture (like Perl) for a multitude of tasks
- Goal is to create an architecture that is
  - Pluggable / Extensible
  - Manageable - Programmable
  - Similar to the UNIX Pipe-Filter Architecture, but
