A Component Framework for Building
Web Science Gateways and Portals
Mehmet Nacar Indiana University
Computer Science Department [email protected]
Outline
•
Motivation•
Background•
Science gateways•
Case studies•
Research Objectives•
Grid Portlet Components•
Workflows and DAGsIntroduction
•
Problem statementn “A novel approach to build a better component model for Grid
portals that can enable Grid operations all-in-one application”
•
Better component model of Grid portals that can enable portlet applications with tags.•
Integrating workflow execution and monitoring inscience gateways by using session management and persistency
•
Case studies to research and generalize science gateway needs for different disciplines•
We introduce Grid tag libraries framework to support reusable Grid operations portable among Grid portalsMotivation
•
A new approach to science gateway buildingn Fine grained and modular Grid operations
•
There are other approaches to build science gatewaysn Open Grid Computing Environments (OGCE) propose separate
portlets for each Grid capabilities
n GridSphere proposes additional services and portlets for Grid
•
We need to find a way to collect separate Grid operations in one portlet (e.g. multi-staging)n VLab and QuakeSim portlets
•
In addition to Grid operations, the framework should provide user session managementn Session: user sessions on Web browsers
Definitions: Portal, portlet, portlet container
•
Portal is a Web application providesn Personalization, single sign on and content aggregation n Web, Grid and educational portals
w Web -> Yahoo, Google
w Grid -> CIMA, VLAB and QuakeSim w Education -> OneStart and Oncourse
• Portlets are building blocks of portals
n Generic portlets are calendar, chat, file transfer, weather and news.
n Grid portlets are MyProxy (credential manager), job submission and GridFtp
• Portlet container
n Provides lifecycle management of portlets n Defines window states and portlet modes
n Equips packaging and deployment descriptors
n JSR 168/286 specification defines
w Portlet container
w Portlet API
The Grid and TeraGrid
•
Grid Computing Environments (GCE)n Virtual organizations (VO) for Grid
•
The Coordinated TeraGrid Software and Services (CTSS) represents a common software/middleware stack onTeraGrid machines.
•
CTSS give a common programming environment for remotely interacting with TeraGrid resources. Grid services includen GRAM: resource access
n GridFTP: data management
n MyProxy: remote authentication infrastructure
w GSI security (public/private keys)
n Information Services (GPIR, MDS, SRB)
Science Portals and Gateways
•
Science Gateways and Web portals are built on the CTSS stack.n TeraGrid Science Gateway n Open Science Grid (OSG)
•
Many Java-based gateways are based on the portlet component model.•
Portlets are reusable portal parts that can be shared between development groups.n OGCE: collection of portlets and tools for building portals
•
What makes a portal Grid-enabled?n Accessing to Grid services
w Providing file management, job submission and metadata portlets
n Managing Grid credentials to access Grid services
Science portals: Technologies
• What tools and services are available to build Grid portals?
n Java COG: Globus Grid client programming API
n Birdbath: Condor Grid web service
w Birdbath clients provide Java API for Condor
n MyProxy: credential management service
n GAMA: Grid Account Management Architecture
n PURSe: Portal-based User Registration System
• We examined three Grid portals
n VLAB: traditional job submission portal for material science
n QuakeSim: Earthquake science portal (computation and data)
Case Study: VLAB Portal
VLAB: The Virtual Laboratory for Earth and
Planetary Materials
• Primarily a traditional job submission, monitoring, and management portal.
n In case of OGCE all capabilities are portlets
• Collaborative Grid services and portals support computational material science.
• Component based Grid portlet development makes application development easier.
• VLAB Challenges:
n Grid Portlets must be easy to develop using component libraries. n HTML <form> actions in Grid portals typically have several steps:
w Stage data files in and out of the desired remote host.
w Run one or more executables.
Case Study: Grid Portlets for QuakeSim
•
QuakeSim portlets in productionn Initial effort was to build the portal Web services
n Portlets invoke the services and then execute simple workflows built
in Ant scripts
•
Problem: Portlets should also work with other TeraGridresources.
n TeraGrid restricts the services run on their resources.
•
Solution:n We have described Grid portlets to QuakeSim portal.
n Described the process for creating Grid portlets using Grid Tag
Libraries and Beans (GTLAB).
n We gained rapid development by using reusable components n QuakeSim portlets enabled to utilize IU, SDSC, NCSA and other
TeraGrid resources
QuakeSim portlets with TeraGrid allocation
• Disloc is used to calculate surface displacements from earthquake faults. • 1994 Northridge earthquake
• 2003 San Simeon Earthquake
Research Issues
•
Fine grained portlet componet modeln Grid portlets typically wrap each single Grid capability in a separate
portlet
n Issue is that Grid portlets need to combine these operations
•
Session managementn Keep repositories for submitted jobs n Provides persistency
n Long term job information archiving
n Metadata access for resubmitting or searching capabilities
•
Worfklow and DAG componentsn Issue with managing Directed Acyclic Graph (DAG) and workflows n Keeps handlers for DAGs
Solution: Grid Tag Libraries and Beans (GTLAB)
•
GTLAB provides common components for building portlets using reusable tags.•
The goal of GTLAB is to simplify Grid portlet developmentn Enable rapid development
•
GTLAB capabilities include Grid operations with XML based tags within Java Server Faces (JSF) framework.•
Grid tags attributes map with Grid beans methodsn End users pass values to Grid beans by using tag attributes.
•
Workflow tags provide DAG integrations•
Session manager deals with job handlersArchitectur
Overview
• Grid portals are client to backend codes through Web/Grid services.
• Grid tags are part of user interface tier and embedded into portlet container.
• Grid tags uses local services in Apache Tomcat to manage sessions and handlers.
• Grid tags implement a layer on top of Grid client APIs such as Java CoG
Grid Tag Libraries (introductory example)
• Grid tags simplify association of composite Grid actions
n increase reuse of code
• There are associated custom JSF tag extensions we’ve developed:
n <o:submit/>, <o:myproxy/>, <o:multitask/>,
<o:jobsubmit/>, and <o:dependency/>
• Two Grid operations with no dependency
<html> <body>
<f:form>
<o:submit id=”test” action=”next_page” />
<o:myproxy id=”pr” hostname=”gf1.ucs.indiana.edu” port=”7512” lifetime=”2” username=“mnacar” password=”***” />
• Component Builder Bean (CBB) handles user requests using JSF form pages
• CBB parses custom
components embedded into JSF view page.
n Constructs the workflow by using Multitask bean with dependencies
n Maintains task handlers and task objects with persistency
n Submit multitasks to Grid services
Grid Tags Associated Grid Beans Features
<submit/> ComponentBuilderBean Creating components, job
handlers, submitting jobs
<handler/> MonitorBean Handling monitoring page actions
<multitask/> MultitaskBean Constructing simple workflow
<dependency/> MultitaskBean Defining dependencies among sub
jobs
<myproxy/> MyproxyBean Retrieving myproxy credential
<fileoperation/> FileOprationBean Providing Gridftp operations
<jobsubmission/> JobSubmitBean Providing GRAM job submissions
Session management
•
GTLAB session management can handle multiple job submissions•
Batch jobs should not block portlet pagesn Need to manage job sessions
•
GTLAB handlers follows the progress, even the session expiresn Job handlers are stored, when user re-login handlers update job
metadata
•
Metadata information is stored on WS-Context serverManaging Metadata Repositories
•
Job metadata are stored in metadata repositoriesn Metadata includes: submission parameters and files,
execution host and information, output parameters, files and their location
•
Job data is stored on the specified servers. As a result data files are transferred by using GridFtp•
Metadata of the data files such as URL location orGridFtp locations are saved in the metadata repository
•
Portal users see the job metadata that has links to the exact location of the data•
Metadata repository is built by using WS-Context repositoryn Users see job metadata with the time stamped hierarchical
repository labels
Workflows and DAGs
•
GTLAB can be used to associate multiple Grid tasks
with a single <form> action click.
n We call this a “multitask”
•
This is a form of workflow (DAG)
n We build on top of CoG workflow capabilities.
n Then abstracted this to use other workflow engines.
w Condor DAGMan, Taverna
•
Each multitask should be associated with a submit
button.
n This allows many multitasks in a portlet page.
n It’s useful in some cases to bind relatively different
Encoding DAGs to portlets
•
Multitask provides a simple DAG•
This example demonstrates a composite Grid job using multi-staged multitask•
GTLAB handles lifecycle of DAG within JSF<o:submit id=”test” action=”next_page” />
<o:multitask id=”mytask” taskname=”test” persistent=”true” >
<o:myproxy id=”pr” hostname=”gf1.ucs.indiana.edu” port=”7512” lifetime=”2” username=“manacar” password=”***” />
<o:fileoperation id=”taskA” command=”mkdir” hostname=”cobalt.ncsa.teragrid.org” path=”/home/manacar/tmp/” />
<o:filetransfer id=”taskB”
from=”gridftp://gf1.ucs.indiana.edu:2811/home/manacar/input_file”
to=”gridftp://cobalt.ncsa.teragrid.org:2811/home/manacar/tmp/input_file” /> <o:jobsubmit id=”taskC” hostname=”cobalt.ncsa.teragrid.org” provider=”GT4”
executable=”/bin/execute”
stdin=”tmp/input_file” stdout=”tmp/result” stderr=”tmp/error” /> <o:filetransfer id=”taskD”
from=”gridftp://cobalt.ncsa.teragrid.org:2811/home/manacar/tmp/result” to=” gridftp://gf1.ucs.indiana.edu:2811/home/manacar/result” />
<o:dependency id=”dep1” task=”taskB” dependsOn=”taskA” /> <o:dependency id=”dep2” task=”taskC” dependsOn=”taskB” /> <o:dependency id=”dep3” task=”taskD” dependsOn=”taskC” /> </o:multitask>
</o:submit>
DAG Example JSF Page
Note the specific values would
DAG extensions: Condor DAGMan,
Birdbath
•
We extend GTLAB to support Condor capabilities•
Condor DAGMan is a tool for complex application workflows on Condor•
Birdbath is Web services provider of Condor capabilities•
Grid tags integrate DAGMan with the following tags:n <o:condorDagman/> and <o:condorSubmit/>
•
Composing DAGMan workflow is out of scope.•
GTLAB executes and monitors DAGs by CondorTaverna workflows
• GTLAB abstracts workflow support.
n We studied Taverna as use case.
n We investigate Kepler and BPEL as future extensions
• Workflows can be handled in different categories:
n Composing
n Enacting n Monitoring
• GTLAB supports Taverna enactment and monitoring.
• GTLAB imports well studied built-in workflows collected by the community
n Bioinformatics workflows and their metadata is available
• Workflow composition is out of scope of this dissertation
n There are ongoing researches in this area
Testing setup
•
JSF applications use web forms. Applying request/response paradigm.•
There are two testing case:n Tform : Turnaround time for requests are initiated from
browser client by submitting web forms.
n Tportal : Turnaround time for requests first comes to the tomcat
GTLA
Overhead
• Test results have shown that GTLAB framework has negligible overhead.
n Overhead = Tsubmit-Trequest
• Average overhead of GTLAB is less than 100 msec
• GTLAB does not add up significant delay on processing the requests.
27
Performance results I
Performance results II
29
•
Round trip time of request/responses (Tform)•
Delay interval is 10 msecMachine P4 3.4 Ghz, 1Gbmemory Dell Dimension DM051
OS Windows XP Service Pack 2
Tomcat 5.0.28
Tomcat ThreadsmaxThreads="500" minSpareThreads="75"maxSpareThreads="200"
Performance evaluation
•
This test includes a simple multitask with Grid operations below.<o:multitask id="multi" persistent="true" taskname="#{resource.taskname}"> <o:myproxy id="mypr" hostname="gf1" lifetime="2" password="manacar"
port="7512" username="manacar"/>
<o:jobsubmit id="js" arguments="-l" executable="/bin/ls" hostname="gf1.ucs.indiana.edu"
provider="#{resource.provider}" stdout="/home/manacar/tmp/out-test2"/> <o:jobsubmit id="js2" executable="/bin/ps" hostname="gf1.ucs.indiana.edu" provider="#{resource.provider}" stdout="/home/manacar/tmp/out-test3"/> <o:dependency id="dep" dependsOn="js" task="js2"/>
</o:multitask>
•
We address the issue of overloading of the server.Contributions: System Research
•
Simplification of software stack of portals for Grid applications through reusable libraries.n Tag libraries for portlets are “fine-grained” components.
•
Aggregating DAG and workflow capabilities within GTLAB architecture•
Managing multiple sessions and monitoring the jobs across the sessions•
Persistently storing job metadata•
Accessing to the archived data with metadata services.•
Defining an access control policy of portlet contentsContributions: System Software
•
Design, development and application of GTLAB
architecture
n Extends the current field
w Limited by coarse-grained portlet components
n Our approach provides a natural way to implement Grid
portlets
w Case studies are VLab and QuakeSim portals
•
VLab portal
•
QuakeSim portlets
•
CIMA portal
•
OGCE portlets
Related Work
•
Grid Portlets 1.3 of GridSpheren Now they are trying to decouple with GridSphere. It’s called Vine
(Portlet Vine) as separate project
n Grid Portlets 1.3 provide API and UI tags to build Grid portlets
•
OGCE portletsn Packages Velocity, JSP and JSF portlets
n Provides portlet package for several Grid applications such as Globus,
Condor, SRB and GPIR
•
myGrid Portal Interface (MPI) portletsn Implements interfaces for enacting and monitoring Taverna workflows n Provides programming interfaces of Taverna enactment and monitoring
Software
•
GTLAB v1.0 release available atn http://grids.ucs.indiana.edu/users/manacar/GTLAB-website
•
See link from main OGCE web siten http://www.collab-ogce.org
•
Vlab portaln http://pedro.msi.umn.edu:6080/gridsphere
•
CIMA portaln http://156.56.94.164:8080/gridsphere
•
QuakeSim portalThanks !
Selected Publications
• Nacar, Mehmet.A., M. Pierce, and G. Fox. GTLAB:Grid Tag Libraries Supporting Workflows within Science Gateways. in SKG 2007. 2007. Xian, China: IEEE Proceedings.
• Mehmet A. Nacar, M.S.A., Marlon Pierce, Zhenyu Lu and Gordon Erlebacher, Dan Kigelman, Evan F. Bollig, Cesar De Silva, Benny Sowell, and David A. Yuen, VLab: Collaborative Grid
Services and Portals to Support Computational Material Science Concurrency and Computation: Practice and Experience, 2007. 19(12): p. 1717-1728.
• Mehmet Nacar, MarlonPierce, Gordon Erlebacher, Geoffrey Fox, Designing Grid Tag Libraries and Grid Beans, in Second International Workshop on Grid Computing Environments GCE06 at SC06. 2006: Tampa, FL.
• Jay Alameda, Marcus Christie Geoffrey Fox Joe Futrelle Dennis Gannon Mihael Hategan Gopi Kandaswamy Gregor von Laszewski Mehmet A. NacarMarlon Pierce Eric Roberts Charles Severance Mary Thomas.The Open Grid Computing Environments collaboration: portlets and services for science gateways. Concurrency and Computation: Practice and Experience, 2007.
19(6): p. 22.
• Bollig, Evan F. Jensen, Paul A. Lyness, Martin D. Nacar, Mehmet A. da Silveira, Pedro R. C. Kigelman, Dan Erlebacher, Gordon Pierce, Marlon Yuen, David A. da Silva, Cesar R. S. VLAB: Web Services, Portlets, and Workflows for Enabling Cyber-infrastructure in Computational Mineral Physics. Physics of The Earth and Planetary Interiors. 2007. 163(1-4): p. 333-346.
• Hao Yin, Donald F.Mcmullen, Mehmet A. Nacar, Marlon Pierce, Kianosh Huffman, Geoffrey Fox and Yu Ma, Providing Portlet-Based Client Access to CIMA-Enabled Crystallographic
Instruments, Sensors, and Data, in 7th IEEE/ACM International Conference on Grid Computing (GRID 2006). 2006: Barcelona, Spain.
Backup Slides
CIMA Crystallography portal
CIMA (Common Instrument Middleware
Architecture)
• Primarily a dataportal to online instruments
• Crystallographers collect data in participating
laboratories and collaborate on samples.
• Portlets have to access data with group privileges.
Continue
•
Portal users have roles and groupsn RBAC provides this for portlets in portal framework
•
We need to have access control over portlet data•
We need to map identities between portal and data manager service•
We have used an additional database to map portal users and service users•
CIMA sample portlets are customized to work with user groups•
This customization is done by adding new database tables to the existing portal frameworkTaverna use case
•
A user interacts with
a workflow portlet to
utilize Taverna
enactor.
•
User provides
parameters by
Big Red Portal
•
Big Red supercomputer is part of TeraGrid at Indiana Universityn 2048 cluster nodes, 4 terabyte memory
•
We have developed several portlets for Big Red portal submissionn MEME job submission
w Interactive and batch
w Includes both GridFTP and GRAM clients
w Job tracking.
n MOAB queue monitoring for both entire machine and the
specific user.
Results of T
form03/02/2020 Mehmet Nacar 45
Machine P4 3.4 Ghz, 1Gb memory Dell Dimension DM051
OS Windows XP Service Pack 2
Tomcat 5.0.28
Tomcat Threads (default) maxThreads="150" minSpareThreads="25"maxSpareThreads="75"
JAVA_OPTS (default)
Machine
Results of T
formMachine P4 3.4 Ghz, 1Gb memory Dell Dimension DM051
OS Windows XP Service Pack 2
Tomcat 5.0.28
Tomcat Threads maxThreads="500" minSpareThreads="75" maxSpareThreads="200"
JAVA_OPTS JAVA_OPTS=-Xms16m -Xmx256m
Machine
Conclusions
•
We have found out the problem of portlet specification has two major drawbacksn The level of components should be fine-grained
n It does not enforce access control over portlet contents
•
We showed our solutions to components which is Grid tag libraries•
We also showed a stovepipe solution to portlet access control•
As future work, PERMIS or GridShib authorization frameworks can be adapted.•
We showed that how we integrate different DAG and workflows to Grid portal in a generic way.•
We also showed that accessing user metadata and reusing them.Research issues: Grid components
•
Grid portlet applications require dynamic interaction and fine-grained components.n Portlets themselves need to be built out of components n Grid services mostly use request/response paradigm n Grid portlets use web forms heavily
w Compared to wikis, blogs, RSS-driven news portals, which have a different problem of content management.
n Grid widgets can provide components for:
w Proxy credential
w GridFtp operations
w Job submissions, multi-staged jobs
Portal page aggregation
Portlet modes
• H--> Help • E--> Edit
Window states