A Component Framework for Building
Web Science Gateways and Portals
Mehmet Nacar
Indiana University
Computer Science Department
Outline
• Motivation
• Background
• Science gateways
• Research objectives
• Case studies
• Architecture
• Workflows and DAGs
• Contributions
Motivation
• Problem statement
  - We study a novel approach to building a better component model for Grid portals, one that enables Grid operations all in one application
• Traditional Grid portlet approaches lack reusability
  - Ex: GridSphere’s Grid portlets
  - We make Grid portlets reusable across different science domains
• Goal: Simplify building Grid portlets
  - Fine-grained and modular Grid operations
• Goal: Collect separate Grid operations in one portlet (e.g. multi-staging)
Definitions: Portal, portlet, portlet container
• Portal is a Web application providing
  - Personalization, single sign-on and content aggregation
• Web, Grid and educational portal examples
  - Web portals (Yahoo, iGoogle, Netvibes)
  - Grid portals (VLab, CIMA, LEAD, and QuakeSim)
  - Education portals (Sakai, OneStart and Oncourse)
• Portlets are building blocks of portals
  - Generic portlets (calendar, chat, and news)
  - Grid portlets (job submission and file management)
• Portlet container
  - Provides lifecycle management of portlets
  - Defines window states and portlet modes
  - Java Specification Request (JSR) 168/286 specification defines
    * Portlet container
    * Portlet API
Science Portals and Gateways
• Grid portals are built on a common software/middleware stack
  - TeraGrid Science Gateway
  - Open Science Grid (OSG)
• Many science gateways are based on portlets
  - Account management
    * GAMA: Grid Account Management Architecture
    * PURSe: Portal-based User Registration Service
  - Credential management
    * MyProxy: remote authentication infrastructure
  - Grid service access
    * GRAM: resource access
    * GridFTP: file management
    * Condor: resource management
  - Access to Information Services (GPIR, MDS)
Research Issues
• Need fine-grained component model for Grid portlets
  - Issue with abstracting Grid operations as portlet applications
  - Portlet components are coarse-grained
  - Simplifying Grid portlet building for application developers
• Service-oriented metadata management
  - Issue with maintaining repositories for jobs
    * Providing persistency
  - Issue with archiving job information
  - Utilizing metadata for resubmitting or searching capabilities
• Pipelining and composing Grid operations
  - Grid portlets for science applications must be composed of many operations
  - Issue with applying Directed Acyclic Graph (DAG) and workflows on Grid portlets
  - Abstracting Grid tags to support external DAG and workflow frameworks
Reusability of portlets
• Web application frameworks apply the Model View Controller (MVC) architecture
  - Java Server Pages (JSP), Velocity and Struts
  - Portlets are entire web applications
• Portlets are composed of two main parts: View and Model
  - View is the user interface (markup language)
    * There is no effort to mark up Grid operations within Web frameworks
  - Model is language code (Java beans)
    * There are many efforts to provide a modular model
    * Java CoG Kit, Simple API for Grid Applications (SAGA)
• We need to separate markup and language code in order to allow reusable components
  - JSP reduces reusability of portlets
    * It mixes markup and language code
Simple portlet application (JSP)
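<%-- Imports and the portlet tag library that defines renderRequest/renderResponse (JSR 168) --%>
<%@ page import="javax.portlet.PortletSession, javax.portlet.PortletURL" %>
<%@ taglib uri="http://java.sun.com/portlet" prefix="portlet" %>
<portlet:defineObjects/>
<%
  // Java code mixed directly into the page: build the URL the form posts to
  PortletSession portletSession = renderRequest.getPortletSession();
  PortletURL url = renderResponse.createActionURL();
  String theActionString = url.toString();
%>
Fill out the form and then click submit to get your certificate.
<form method="post" action="<%=theActionString%>">
  <input type="text" name="hostname" value="">
  <input type="text" name="username" value="">
  <input type="password" name="password" value="">
  <input type="submit" name="getProxy" value="Get Proxy Cert">
</form>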
Solution: Grid Tag Libraries and Beans (GTLAB)
• GTLAB provides Grid components for building portlets using reusable tags.
  - The goal of GTLAB is to simplify Grid portlet development
  - Enable rapid development
• Grid tags expose Grid operations as XML-based tags within the Java Server Faces (JSF) framework.
• Grid beans provide methods to invoke Grid services
  - Simplify Grid service programming
  - Provide an abstraction layer on top of low-level APIs
• Mapping Grid tags to Grid beans
Case Study 2: Grid Portlets for QuakeSim
• QuakeSim portlets are in production
  - Initial effort was to build the portal Web services
  - QuakeSim portlets invoke the services and then execute simple workflows built as shell scripts
• Problem: QuakeSim portlets should also work on TeraGrid resources.
  - TeraGrid restricts the services that run on TeraGrid resources.
• Our approach:
  - Prototyped QuakeSim portlets using Grid Tag Libraries and Beans (GTLAB).
• Advantages:
  - QuakeSim portlets utilize IU, SDSC, NCSA and other TeraGrid resources
  - Rapid development by using reusable components
QuakeSim portlets with TeraGrid allocation
• Disloc is used to calculate surface displacements from earthquake faults.
• 1994 Northridge earthquake
• 2003 San Simeon earthquake
Architecture Overview
• Grid portals are clients to backend codes through Web/Grid services.
• Grid tags are part of the user interface tier and are embedded into the portlet container.
• Grid tags use local services in Apache Tomcat to manage sessions and handlers.
• Grid beans implement a layer on top of Grid client APIs such as the Java CoG Kit.
• This implies that the “portal” is quite sophisticated, as it must support integration.
• Note that the CoG Kit acts here as a broker to hide the complexities of evolving services (e.g. different versions of Globus).
• Grid tags are embedded into JSF view pages and decorated with standard JSF form, input, output and button components.
Component development with JSF
• Why is JSF suitable for Grid portals?
  - OGCE community used templates to develop portlets
    * Velocity and JSP (Web development templates)
  - JSF decouples tags from beans
  - JSF also provides an extensible framework (tag libraries)
• How to extend JSF?
  - There are two types of JSF standard components
    * Visual: CommandButton, InputText, etc.
    * Non-visual: form, param, etc.
  - Two components are required for development of the tags:
    * UIComponent: Describes encoding/decoding HTML.
    * ComponentTag: Describes attributes
  - Tag components should describe attribute bindings.
    * Value binding -> #{bean.property}
    * Method binding -> #{bean.method}
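To make the value-binding syntax concrete, the following is a minimal sketch of how an expression such as #{resource.provider} could be resolved against a named bean by reflection. It is not the JSF expression-language implementation and not GTLAB code. The class ValueBindingSketch, the method resolve, and the ResourceBean stand-in are hypothetical names introduced only for illustration.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of value binding; not the JSF EL implementation.
class ValueBindingSketch {

    // Stand-in for a managed bean registered under the name "resource".
    public static class ResourceBean {
        public String getProvider() {
            return "GT4";
        }
    }

    // Resolves "#{bean.property}" by calling bean.getProperty();
    // any other string is returned unchanged as a literal attribute value.
    static Object resolve(String expression, Map<String, Object> beans) {
        if (!expression.startsWith("#{") || !expression.endsWith("}")) {
            return expression;
        }
        String body = expression.substring(2, expression.length() - 1).trim();
        String[] parts = body.split("\\.");
        if (parts.length != 2 || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected #{bean.property}: " + expression);
        }
        Object bean = beans.get(parts[0]);
        if (bean == null) {
            throw new IllegalArgumentException("No bean named " + parts[0]);
        }
        String getter = "get" + Character.toUpperCase(parts[1].charAt(0)) + parts[1].substring(1);
        try {
            Method method = bean.getClass().getMethod(getter);
            return method.invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read " + body, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> beans = new HashMap<>();
        beans.put("resource", new ResourceBean());
        System.out.println(resolve("#{resource.provider}", beans)); // bound value
        System.out.println(resolve("/bin/ls", beans));              // literal value
    }
}
```

The point of the sketch is the decoupling: the tag attribute carries only a bean name and a property name, so the same tag can be reused with any bean that follows the getter naming convention.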
Grid tags
• GTLAB component framework utilizes the JSF model
• Grid tags (components) are associated with Grid beans
• Attributes are associated with Grid bean methods and properties
Grid Tags (introductory example)
• Grid tags are associated with Grid services via Grid beans
• Grid beans wrap the Java CoG Kit (version 4)
• We show an example JSF page section below.
<html><body>
<f:view>
  <!-- Other user interface tags go here -->
  <f:form>
    <o:submit id="test" action="next_page">
      <!-- Retrieve a proxy credential, then submit a GRAM job with it -->
      <o:myproxy id="pr" hostname="gf1.ucs.indiana.edu" port="7512" lifetime="2" username="mnacar" password="***" />
      <o:jobsubmit id="task" hostname="cobalt.ncsa.teragrid.org" provider="GT4" executable="/bin/ls" stdout="tmp/result" stderr="tmp/error" />
    </o:submit>
  </f:form>
</f:view>
</body></html>
Grid Tags | Associated Grid Beans | Features
<submit/> | ComponentBuilderBean | Creating components, job handlers, submitting jobs
<handler/> | MonitorBean | Handling monitoring page actions
<multitask/> | MultitaskBean | Constructing simple workflow
<dependency/> | MultitaskBean | Defining dependencies among sub jobs
<myproxy/> | MyproxyBean | Retrieving MyProxy credential
<fileoperation/> | FileOperationBean | Providing GridFTP operations
<jobsubmission/> | JobSubmitBean | Providing GRAM job submissions
<filetransfer/> | FileTransferBean | Providing GridFTP file transfer
Managing Metadata Repositories
• Job metadata are stored in metadata repositories
  - Metadata includes: submission parameters and files, execution host and information, output parameters, files and their locations
• Job data is stored on the specified servers. As a result, data files are transferred by using GridFTP
• Metadata of the data files, such as URL or GridFTP locations, are saved in the metadata repository
• Portal users see job metadata that links to the exact location of the data
• Metadata repository is built by using a WS-Context repository
  - Users see job metadata under time-stamped hierarchical repository labels
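As a rough illustration of a time-stamped hierarchical label, the sketch below builds a key of the form user/task/timestamp and pairs it with a small record of job fields. Everything here is an assumption made for illustration: the label layout, the timestamp format, the field names, and the class JobMetadataSketch are hypothetical. The sketch does not use the WS-Context API, and the submission time is an arbitrary example value.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; does not use the WS-Context API.
class JobMetadataSketch {

    // Assumed timestamp layout for the label; the real repository format may differ.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss").withZone(ZoneOffset.UTC);

    // Builds a hierarchical label: user / task name / submission time (UTC).
    static String label(String user, String taskName, Instant submitted) {
        return user + "/" + taskName + "/" + STAMP.format(submitted);
    }

    // Collects a few of the metadata kinds named on the slide into an ordered record.
    static Map<String, String> record(String host, String executable, String outputLocation) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("executionHost", host);
        fields.put("executable", executable);
        fields.put("outputLocation", outputLocation);
        return fields;
    }

    public static void main(String[] args) {
        // Arbitrary example submission time.
        String key = label("manacar", "test", Instant.parse("2008-01-15T10:30:00Z"));
        Map<String, String> value = record(
                "cobalt.ncsa.teragrid.org",
                "/bin/execute",
                "gridftp://gf1.ucs.indiana.edu:2811/home/manacar/result");
        System.out.println(key + " -> " + value);
    }
}
```

Storing only the output location in the record, rather than the data itself, mirrors the slide: the files stay on the Grid servers and the repository holds links to them.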
Encoding DAGs to portlets
• This example demonstrates a composite Grid job:
<o:submit id="test" action="next_page">
  <o:multitask id="mytask" taskname="test" persistent="true">
    <o:myproxy id="pr" hostname="gf1.ucs.indiana.edu" port="7512" lifetime="2" username="manacar" password="***" />
    <!-- taskA: create the working directory on the remote host -->
    <o:fileoperation id="taskA" command="mkdir" hostname="cobalt.ncsa.teragrid.org" path="/home/manacar/tmp/" />
    <!-- taskB: stage the input file in -->
    <o:filetransfer id="taskB"
        from="gridftp://gf1.ucs.indiana.edu:2811/home/manacar/input_file"
        to="gridftp://cobalt.ncsa.teragrid.org:2811/home/manacar/tmp/input_file" />
    <!-- taskC: run the job -->
    <o:jobsubmit id="taskC" hostname="cobalt.ncsa.teragrid.org" provider="GT4"
        executable="/bin/execute"
        stdin="tmp/input_file" stdout="tmp/result" stderr="tmp/error" />
    <!-- taskD: stage the result out -->
    <o:filetransfer id="taskD"
        from="gridftp://cobalt.ncsa.teragrid.org:2811/home/manacar/tmp/result"
        to="gridftp://gf1.ucs.indiana.edu:2811/home/manacar/result" />
    <!-- Dependencies form the chain taskA -> taskB -> taskC -> taskD -->
    <o:dependency id="dep1" task="taskB" dependsOn="taskA" />
    <o:dependency id="dep2" task="taskC" dependsOn="taskB" />
    <o:dependency id="dep3" task="taskD" dependsOn="taskC" />
  </o:multitask>
</o:submit>
DAG Example JSF Page
Note the specific attribute values
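The dependency tags in this example define a DAG over the task ids. As a minimal sketch of how such (task, dependsOn) pairs can be turned into an execution order, the code below applies Kahn's topological sort. This is a generic algorithm chosen for illustration, not GTLAB's actual scheduling code. The class DagOrderSketch and its method executionOrder are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of ordering tasks by dependency; not part of GTLAB.
class DagOrderSketch {

    // Returns task ids in an order that respects every {task, dependsOn} pair.
    // Throws IllegalArgumentException on unknown task ids or cyclic dependencies.
    static List<String> executionOrder(List<String> tasks, List<String[]> dependencies) {
        Map<String, Integer> pending = new LinkedHashMap<>();        // unmet prerequisites per task
        Map<String, List<String>> dependents = new LinkedHashMap<>(); // tasks waiting on each task
        for (String task : tasks) {
            pending.put(task, 0);
            dependents.put(task, new ArrayList<>());
        }
        for (String[] pair : dependencies) {
            String task = pair[0];
            String dependsOn = pair[1];
            if (!pending.containsKey(task) || !pending.containsKey(dependsOn)) {
                throw new IllegalArgumentException("Unknown task in dependency " + task + " -> " + dependsOn);
            }
            pending.merge(task, 1, Integer::sum);
            dependents.get(dependsOn).add(task);
        }
        // Start with tasks that have no prerequisites.
        Deque<String> ready = new ArrayDeque<>();
        for (String task : tasks) {
            if (pending.get(task) == 0) {
                ready.add(task);
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.remove();
            order.add(task);
            // Completing a task may release the tasks that depend on it.
            for (String next : dependents.get(task)) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != tasks.size()) {
            throw new IllegalArgumentException("Dependencies contain a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        // Task ids and dependencies taken from the JSF page; listed out of order on purpose.
        List<String> tasks = List.of("taskD", "taskC", "taskB", "taskA");
        List<String[]> dependencies = List.of(
                new String[] {"taskB", "taskA"},
                new String[] {"taskC", "taskB"},
                new String[] {"taskD", "taskC"});
        System.out.println(executionOrder(tasks, dependencies));
    }
}
```

Because the example's dependencies form a single chain, only one order is valid. With branching DAGs, several tasks can be ready at once and could be dispatched concurrently.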
DAG extensions: Condor DAGMan
• Extensibility of GTLAB
  - In addition to OGSA services, the Grid community widely uses Condor.
  - Condor DAGMan is a tool for running complex application workflows on Condor
  - GTLAB executes and monitors DAGs through Condor DAGMan
• Condor DAGMan
  - Grid tags integrate DAGMan with the following tags:
    * <o:condorDagman/> and <o:condorSubmit/>
  - Composing DAGMan workflows is out of scope.
  - GTLAB tags for DAGMan use the Birdbath service to create the client
Extension to Taverna workflows
• DAGs make it easy to pipeline Grid invocations
• But DAGs cannot support full-fledged workflow capabilities like conditional branches and loops.
  - We studied Taverna as a use case.
  - We are investigating Kepler and BPEL as future extensions
• Workflow handling falls into different categories:
  - Composing
  - Enacting
  - Monitoring
• GTLAB supports Taverna enactment and monitoring.
• GTLAB imports well-studied built-in workflows collected by the community
  - Bioinformatics workflows and their metadata
• Workflow composition is out of scope of this dissertation
  - There is ongoing research in this area
Advantages of GTLAB
• GTLAB simplifies the development of science portals
  - Rapid development
  - Easy deployment
• Grid tags provide a rich selection of attributes to initialize Grid beans.
• Composite tasks can contain an unlimited number of subtasks
• GTLAB gives developers the flexibility to use their own Grid beans library or add more Grid beans to the existing ones.
  - Following the method name convention of GTLAB
• Grid beans can also be imported into any presentation logic
  - GCEShell is a command-line tool
Testing scenario
• We want to show baseline performance metrics
• The measurements will show GTLAB has acceptable overhead.
• The testing case:
  - Tform: Request/response time between client and Grid services via portal server
Testing setup
• The tests include multitask with Grid operations. A simple multitask is shown below.
<o:multitask id="multi" persistent="true" taskname="#{resource.taskname}">
  <o:myproxy id="mypr" hostname="gf1" lifetime="2" password="manacar" port="7512" username="manacar"/>
  <o:jobsubmit id="js" executable="/bin/ls" hostname="gf1.ucs.indiana.edu"
      provider="#{resource.provider}" stdout="/home/manacar/tmp/out-test2"/>
  <o:jobsubmit id="js2" executable="/bin/ps" hostname="gf1.ucs.indiana.edu"
      provider="#{resource.provider}" stdout="/home/manacar/tmp/out-test3"/>
  <o:dependency id="dep" dependsOn="js" task="js2"/>
</o:multitask>
• We have simulated 10 users from different locations and browser sessions.
  - Each user makes 100 requests
• Portal server runs on Grid Farm clusters at CGL
Benchmark: Response Time
[Figure: response time chart. Labels: GTLAB Overhead, GTLAB Processing, JSF Processing, Handler storing, Submitting]
Evaluation of tests
• Benchmark results summarize performance metrics for the GTLAB framework
• Average overhead of GTLAB is a few milliseconds
• GTLAB does not add significant delay to request processing.
• The results also indicate that the geographical location of the end users causes minor delays in response time
  - Delays vary by about 10-50 msec depending on distance
Contributions: System Research
• Fine-grained components for Grid clients
  - Providing a simplified programming model for Grid services.
  - Simplifying invocation of Grid applications through reusable libraries.
  - Providing two types of components
    * Grid tags (User Interface)
    * Grid beans (API)
• Composing multiple Grid operations
  - Abstracting DAG capabilities to manage multiple Grid operations
  - Integrating third-party workflow engines
• Metadata services enabling access to archives
Contributions: System Software
• Design, development and application of GTLAB architecture
  - Extends the current field
    * Limited by coarse-grained portlet components
  - Our approach provides a natural way to implement Grid portlets and other Grid clients
    * Case studies are VLab and QuakeSim portals
• VLab portal
• QuakeSim portlets
• CIMA portal
• OGCE portlets
• GCEShell
Related Work
• GridSphere’s Grid Portlets 1.3
  - Grid Portlets 1.3 provide API and User Interface (UI) tags to build Grid portlets
  - An effort called Vine (Portlet Vine) refactors Grid portlets and decouples the portlets from GridSphere
• Karajan
  - XML-based workflow language and engine for Grid computing
  - Built on Java CoG Kit
  - Requires additional effort to integrate it into Grid portals
• Simple API for Grid Applications (SAGA)
  - Low-level programming interface (API)
  - It provides tools to build Model in Web programming (Beans)
  - It does not provide any User Interface widgets to embed it into Web
Future work
• Enhancing Grid beans to embed into Web 2.0 presentation logic
  - Ajax-based Grid portlets
  - JavaScript clients to Grid beans
  - Web service interfaces to Grid beans
• Improving workflow capabilities
  - Handling conditional branches
  - Handling loops
Software
• GTLAB v1.0 release available at
  - http://grids.ucs.indiana.edu/users/manacar/GTLAB-website
• GTLAB link from main OGCE web site
  - http://www.collab-ogce.org
• VLab portal
  - http://pedro.msi.umn.edu:6080/gridsphere
• CIMA portal
  - http://cimaportal.indiana.edu:8080/gridsphere