7.1. Introduction
As discussed in the previous section and in the introduction, Web Services provide atomic functionality for distributed systems, but integrating a collection of such services into a Grid requires addressing problems of interaction, information, and orchestration. Recently, the Web Service community has developed several languages for invoking multiple related Web Services, an activity collectively termed workflow. Languages such as BPEL [68] and tools such as Triana [69] and Kepler [70] have demonstrated success in connecting disparate services. Data transfer between services is typically handled by transferring files using tools such as GridAnt [71].
A growing number of applications involve real-time streams of information that need to be transported in a dynamic, high-performance, reliable and secure fashion. Critical infrastructure applications also mandate real-time processing of data and visualization of results. [Section 5.3] describes an example of an application that uses real-time inputs from distributed sensors. Assuming that the sources, sinks and data processing elements for these data streams are Web or Grid Services, we need to transparently manage the shipping of data in a streaming fashion through a chain of these services. This would enable existing data processing applications to be used as Web Services, possibly with minor modifications.
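The idea of shipping data through a chain of services one record at a time can be illustrated with a minimal sketch. It assumes nothing about HPSearch itself: the stage names (`sensor_source`, `drop_out_of_range`, `to_csv_line`) and the `chain` helper are hypothetical, and Python generators stand in for remote services.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical stage type: consumes a stream of records and yields a new stream.
Stage = Callable[[Iterable[str]], Iterator[str]]


def sensor_source(readings: Iterable[float]) -> Iterator[str]:
    """Stand-in for a distributed sensor that emits one reading at a time."""
    for value in readings:
        yield f"{value:.2f}"


def drop_out_of_range(records: Iterable[str]) -> Iterator[str]:
    """Stand-in for a filtering service: keep readings within [0, 100]."""
    for record in records:
        if 0.0 <= float(record) <= 100.0:
            yield record


def to_csv_line(records: Iterable[str]) -> Iterator[str]:
    """Stand-in for a reformatting service: prefix each record with an index."""
    for index, record in enumerate(records):
        yield f"{index},{record}"


def chain(source: Iterable[str], *stages: Stage) -> Iterator[str]:
    """Connect stages so each record flows through all of them lazily."""
    stream: Iterator[str] = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


result = list(
    chain(sensor_source([12.5, -3.0, 47.25, 250.0]), drop_out_of_range, to_csv_line)
)
```

Because every stage is lazy, no stage waits for the whole input before emitting output, which is the property that distinguishes streaming from file-based transfer.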
HPSearch [72] primarily addresses this requirement of data stream management. It has been used to process real-time data for critical infrastructure applications [73], and it also enables connecting disparate services for real-time data analysis. In the sections that follow, we show how HPSearch uses a scalable, fault-tolerant middleware, NaradaBrokering, to construct Web Services that can process data in a streaming fashion.
As mentioned earlier, HPSearch uses NaradaBrokering to route data in streams. However, to set up the distributed application, one must first set up and deploy a broker network topology. Typical characteristics of this topology are alternate connection points, multiple interconnect routes, and failure detection. To help manage these characteristics, HPSearch provides tools for deploying and managing the broker network at runtime. Furthermore, accessing certain services in the routing substrate, such as replay of events, may require special configuration.
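To make the value of multiple interconnect routes concrete, the following sketch searches a small broker topology for a route that avoids brokers known to have failed. It is an illustration only: the broker names, the `links` adjacency map, and the `find_route` function are assumptions, and a breadth-first search is used here in place of whatever routing NaradaBrokering actually performs.

```python
from collections import deque


def find_route(links, source, target, failed=frozenset()):
    """Return the shortest broker path from source to target, skipping failed brokers.

    `links` maps each broker name to the set of brokers it is connected to.
    Returns None when no route exists.
    """
    if source in failed or target in failed:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        # Sorting makes the choice among equally short routes deterministic.
        for neighbour in sorted(links.get(node, ())):
            if neighbour not in visited and neighbour not in failed:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical four-broker topology with two independent routes from A to D.
links = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
}

primary = find_route(links, "A", "D")
backup = find_route(links, "A", "D", failed={"B"})
isolated = find_route(links, "A", "D", failed={"B", "C"})
```

With one broker down, an alternate route still exists; only when both intermediate brokers fail does the destination become unreachable.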
Thus, HPSearch provides dual functionality: 1) it provides a high-level language suitable for application developers to program workflows in a Grid that utilizes the messaging middleware, and 2) it provides tools to manage the messaging middleware.
7.2. Architecture
HPSearch uses a scripting-based architecture to provide management console functionality. The basic HPSearch architecture is shown in Figure 15.
Figure 15 HPSearch architecture consists of a shell for creating workflow scripts, a workflow engine (WFE) that processes the scripts, and Web Service proxy wrappers that provide NaradaBrokering ports as well as more traditional HTTP ports. See text for a full description of the system components.
The HPSearch kernel consists of a Mozilla Rhino [74] JavaScript-based console application along with FlowHandler and Workflow Engine (WFE) components and other system objects. These system objects are bound to the scripting language to provide specific functionality. They include objects to manage the brokering network, set up distributed workflows, and perform other tasks such as reading from and writing to a context service [67].
The workflow engine component of the HPSearch kernel is responsible for managing the flow. Multiple workflow engines may exist to control multiple flows and for load balancing purposes. These engines communicate through the brokering network using a set of predefined messages. This communication is represented by black double-headed arrows. The HPSearch suite also contains a special wrapper proxy service called WSProxy for enabling streaming data processing. The WSProxy exports a Web Service interface and handles all data communication (shown by green double-headed arrows) on behalf of the wrapped service. Such a service can be instantiated and operated (shown by the thick dashed line) by sending simple SOAP requests.
WSProxy [Figure 16] encapsulates a service using two interfaces, Runnable and Wrapped. Runnable is suited to quickly creating data filtering applications and provides more control over the life-cycle operations of the service. Wrapped provides less control over the life cycle of the service but allows existing code to be wrapped as a pluggable component and exposed as a Web Service. A wrapped service works best if the code being wrapped reads data from STDIN and writes data to STDOUT.
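The reason STDIN/STDOUT programs wrap well can be shown with a short sketch: a wrapper only needs to feed records to the child process and read results back, without changing the program itself. The `WrappedProcess` class and the child program below are hypothetical and are not the WSProxy API; the sketch further assumes that the wrapped program emits exactly one output line per input line and flushes after each one.

```python
import subprocess
import sys

# Hypothetical existing program: upper-cases every line it reads from STDIN.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


class WrappedProcess:
    """Feeds records to a child's STDIN and reads one result line from its STDOUT."""

    def __init__(self, argv):
        self._proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes on the wrapper side
        )

    def process(self, record):
        """Send one record and wait for the matching output line."""
        self._proc.stdin.write(record + "\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().rstrip("\n")

    def close(self):
        """Signal end of input, wait for the child to exit, return its exit code."""
        self._proc.stdin.close()
        exit_code = self._proc.wait()
        self._proc.stdout.close()
        return exit_code


wrapped = WrappedProcess([sys.executable, "-c", CHILD_CODE])
outputs = [wrapped.process(record) for record in ["hello", "stream"]]
exit_code = wrapped.close()
```

A program that buffers its output, or that emits a variable number of lines per record, would need a reader thread rather than this lock-step exchange.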
Figure 16 WSProxy wraps Web services and provides additional communication ports. See text for a full description.
We can compose a distributed data-flow by joining multiple WSProxy wrapped services and setting the correct input and output streams. The brokering network is used to handle the data communication between the services.
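Composition by joining input and output streams can be sketched with an in-memory publish/subscribe broker, where each service subscribes to one topic and publishes to another. Everything here is an assumption made for illustration: `InMemoryBroker`, `attach_service`, and the topic names are hypothetical and do not reflect the NaradaBrokering or WSProxy interfaces.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a brokering network: synchronous topic-based delivery."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Copy the list so callbacks may subscribe during delivery.
        for callback in list(self._subscribers[topic]):
            callback(message)


def attach_service(broker, in_topic, out_topic, transform):
    """Wire a service between two topics; a None result drops the message."""

    def on_message(message):
        result = transform(message)
        if result is not None:
            broker.publish(out_topic, result)

    broker.subscribe(in_topic, on_message)


broker = InMemoryBroker()

# Service 1: discard blank records. Service 2: count the words in each record.
attach_service(broker, "raw", "filtered", lambda m: m if m.strip() else None)
attach_service(broker, "filtered", "analysed", lambda m: len(m.split()))

collected = []
broker.subscribe("analysed", collected.append)

for message in ["a b c", "   ", "d e"]:
    broker.publish("raw", message)
```

The services never reference each other directly; changing the flow means changing only the topic names each one is attached to.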
7.3. Geophysical Application Example: Pattern Informatics
Pattern Informatics (PI) [32] attempts to discover patterns in past data in order to predict the probability of future events. The analysis involves data mining performed on results obtained from a Web Feature Service (WFS). The Web Map Service (WMS) is responsible for collecting the parameters needed to invoke the PI code. These parameters are then sent to an HPSearch engine, which invokes the various services to start the flow. The process is illustrated in Figure 17. The Code Runner Service is a sample wrapper service that invokes the Pattern Informatics application.
As shown in the figure, the Web Map Service submits a flow for execution by invoking the HPSearch Web Service.
Figure 17 A general GIS Grid orchestration scenario involves the coordination of GIS services, data filters, and code execution services. These are coordinated by HPSearch.
Figure 17’s steps are summarized below. This is the basic scenario that we use for integrating Pattern Informatics, RDAHMM, and other applications.
0. WFS and WMS publish their WSDL URLs to the UDDI Registry.
1. User starts the WMS Client on a web browser; the WMS Client displays the available features. User submits a request to the WMS Server by selecting desired features and an area on the map.
2. WMS Server dynamically discovers available WFSs that provide requested features through UDDI Registry and obtains their physical locations (WSDL address).
3. WMS Server forwards user's request to the WFS.
4. WFS decodes the request, queries the database for the features and receives the response.
5. WFS creates a GML FeatureCollection document from the database response and publishes this document to a specific NaradaBrokering topic.
6. WMS receives the streaming feature data through the agreed-upon NaradaBrokering topic. WMS Server creates a map overlay from the received GML document and sends it to the WMS Client, which in turn displays it to the user.
7. The WMS submits a flow for execution by invoking the HPSearch Web Service. This request also includes all parameters required for execution of the script. The HPSearch system works in tandem with a context service for communicating with the WMS.
8. Initially, the context corresponding to the script execution is marked as "Executing".
9. Once the flow is submitted, the HPSearch engine invokes and initializes (a) the various services, namely the Data Filter service, which filters incoming data and reformats it into the input format required by the data analysis code, and the Code Runner service, which actually runs the analysis program on the mined data. After these services are ready, the HPSearch engine proceeds to execute (b) the WFS Web Service with the appropriate GML (Geography Markup Language) query as input.
10. The WFS then outputs the result of the query onto a predefined topic. This stream of data is filtered as it passes through the Data Filter service and the result is accumulated by the Code Runner service.
11. The Code Runner service then executes the analysis code on the data and the resulting output can either be streamed onto a topic, or stored on a publicly accessible Web server. The URL of the output is then written to the context service by HPSearch.
12. The WMS constantly polls the context service to see if the execution has finished.
13. The execution completes and the context is updated.
14. The WMS downloads the result file from the web server and displays the output.
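Steps 7 through 14 amount to an asynchronous submit-and-poll exchange through the context service, which the following sketch mimics in a single process. All names are hypothetical: `ContextService`, `submit_flow`, `poll_until_done`, the "Done" status value, and the example URL are assumptions (the text names only the "Executing" state), and a dictionary stands in for the public Web server.

```python
import threading
import time


class ContextService:
    """Toy stand-in for the context service: a thread-safe key/value store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._contexts = {}

    def write(self, flow_id, **fields):
        with self._lock:
            self._contexts.setdefault(flow_id, {}).update(fields)

    def read(self, flow_id):
        with self._lock:
            return dict(self._contexts.get(flow_id, {}))


published_results = {}  # stand-in for the Web server: URL -> result


def run_flow(context, flow_id, records):
    """Engine side (steps 9-11, 13): filter, analyse, publish, update context."""
    filtered = [r for r in records if r >= 0]   # Data Filter stand-in
    output = sum(filtered) / len(filtered)      # Code Runner stand-in (mean)
    url = f"http://example.org/results/{flow_id}.txt"  # hypothetical URL
    published_results[url] = output
    context.write(flow_id, status="Done", output_url=url)


def submit_flow(context, flow_id, records):
    """Steps 7-8: mark the context as Executing, then run the flow in the background."""
    context.write(flow_id, status="Executing")
    worker = threading.Thread(
        target=run_flow, args=(context, flow_id, records), daemon=True
    )
    worker.start()


def poll_until_done(context, flow_id, interval=0.01, timeout=5.0):
    """Step 12: poll the context until the status leaves Executing."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = context.read(flow_id)
        if state.get("status") != "Executing":
            return state
        time.sleep(interval)
    raise TimeoutError(f"flow {flow_id} did not finish within {timeout} s")


context = ContextService()
submit_flow(context, "flow-1", [1.0, -5.0, 3.0])
final = poll_until_done(context, "flow-1")
result = published_results[final["output_url"]]  # step 14: fetch the output
```

Polling keeps the client simple at the cost of repeated requests; the interval and timeout used here are arbitrary values chosen for the sketch.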
7.4. Future Work
We are currently working on making management standards-compliant and global. The Web Services community has recently introduced two competing management specifications, namely WS-Management and WS-Distributed Management. We are investigating how to make management compatible with these standards. Furthermore, management is restricted by the presence of firewalls and Network Address Translation (NAT) routers. We are also investigating how management can span these restrictions and be made more administratively scalable.