Looking into the Future of Workflows: The Challenges Ahead

Ewa Deelman

Contributors: Bruce Berriman, Thomas Fahringer, Dennis Gannon, Carole Goble, Andrew Jones, Miron Livny, Philip Maechling, Steven McGough, Deana Pennington, Matthew Shields, and Ian Taylor

In this chapter, we take a step back from the individual applications and software systems and attempt to categorize the types of issues that we are facing today and the challenges we see ahead. This is by no means a complete picture of the challenges but rather a set of observations about the various aspects of workflow management. In a broad sense, we are organizing our thoughts in terms of the different workflow systems discussed in this book, from the user interface down to the execution environment.

1 User Experience

It is often difficult to provide users with a satisfying experience in building and managing applications, mainly because user expectations with respect to transparency and control vary greatly. Some users may want to describe their problems in a high-level application-specific manner, some may want to view intermediate data, while others may make very detailed plans, including specifying particular resources to use and possibly interacting with the live analysis by suspending and restarting particular portions of the analysis. Thus, workflow requirements are varied, often being subject to user- or domain-specific issues that cannot be satisfied by one system. However, a workflow system needs to be smart enough to handle the low-level technical details behind the scenes, hiding that complexity from scientists while at the same time exposing interfaces to workflow management aspects.

Most workflow systems today support a “one-shot” mode of user interaction: once a workflow execution has started, it must run to completion, reach an error state, or be aborted. However, users often have not decided on the exact steps of the analysis to be conducted. They may want to use workflows in an exploratory manner, trying out different ideas and avenues of investigation. To enable this exploratory and interactive mode, the user must be very much part of the workflow. The systems must provide meaningful information to the user, at an appropriate level of abstraction, and provide adequate user interface responsiveness and system performance to enable the user to interact with the system on a realistic time scale.

Scientific users are often comfortable with their existing methodologies and techniques for conducting their analysis and may resist spending large amounts of time learning new tools and technologies. It would be useful to create an environment where new users can view how other applications have benefited from the technologies. Another benefit of such an approach would be for novice users to be able to view and use the knowledge, captured in workflows, of domain experts who have solved the same or similar problems. This type of expert “knowledge capture” is extremely valuable to commercial research institutions where staff may move on and a new employee is expected to take over. Collecting workflows and their components into libraries that can be easily explored, shared between scientists and organizations, and reused will become increasingly important. In some cases new workflows can be generated by finding a workflow that is “close” to the desired analysis and then modifying it to suit the particular needs by substituting different components or data sources. Additionally, demonstrating the usefulness of the workflow technologies in a variety of applications and scenarios would enable other scientists to leverage existing experiences.

Result validation and verification are always uppermost in a scientist’s mind; often the journey to the result is as important as, if not more important than, the result itself. Reproducibility is vital for the scientific process. To be able to validate a given set of results, we must be able to take the original workflow and start data and rerun the execution to give the same results. Thus, it is important to provide detailed provenance about every step of the workflow process, even to the level of the execution environment. Each of the components or steps in the workflow must also be validated to ensure that each individual result for each component is accurate. Finally, because all aspects of the system are software-based, truly verifying and reproducing experiments requires version information to ensure that, when an experiment is rerun, everything is as it was. Are the start data the exact version that the original experiment used? Do we have the same versions of all of the components and the execution environment, or have they been modified? Even if modifications do not affect the results, we must have information about the system versions and be able to prove that this is the case. Some aspects of the extremely complicated systems that make up modern workflow environments are very difficult to version accurately. For example, if we rely on external services such as Web services as components, what information do we have access to about the version of the service we are using from one instance to the next? Standard Web Services Description Language (WSDL) has no capability for representing version information, and even if it did, if the service is controlled by a third party, to what level do we trust any information we may get about the version of the service?
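The versioning questions above can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the `ProvenanceRecord` class, its field names, and the `same_setup` helper are hypothetical and do not come from any workflow system discussed in this book. It records a digest of each start data item together with component versions and a little environment information, so that two runs can be compared.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass, field


def sha256_of(data: bytes) -> str:
    """Return a hex digest that pins the exact bytes of an input."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceRecord:
    # All field names are hypothetical, chosen for this sketch only.
    workflow_id: str
    input_digests: dict        # input name -> SHA-256 of its content
    component_versions: dict   # component name -> version string
    environment: dict = field(default_factory=lambda: {
        "python": platform.python_version(),
        "system": platform.system(),
    })

    def to_json(self) -> str:
        """Serialize the record so it can be stored in a catalog."""
        return json.dumps(asdict(self), sort_keys=True)


def same_setup(a: ProvenanceRecord, b: ProvenanceRecord) -> bool:
    """True if two runs used identical inputs, components, and environment."""
    return (a.input_digests == b.input_digests
            and a.component_versions == b.component_versions
            and a.environment == b.environment)
```

A record like this cannot settle the trust question raised for third-party services: a version string reported by a remote service is only as reliable as the party reporting it.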

There must also be an infrastructure that can catalog the provenance information in a scalable way and provide means of efficiently searching the large volumes of information. Provenance also needs to be structured in a way that would enable a scientist to easily evaluate the validity of the results. For example, it may not be necessary to provide detailed execution records when a scientist wants to find out about the types of analysis used in the workflow, but it would be if the same scientist wanted to reproduce an experiment from a workflow. While some scientific workflow systems already provide detailed provenance information, the problem of providing a standard representation is not solved. Solving this is necessary if provenance generated by one workflow system is to be replayed on another. The other vital area in provenance is the versioning of all aspects of the workflow system, which is understood but not necessarily implemented everywhere, and certainly not in Web services.

Today, various aspects of the user experience are being partially implemented in a variety of workflow systems. However, there is no single system that provides all the necessary ingredients for comprehensive, flexible, and scientifically rigorous experimentation.

2 Workflow Languages and Representations

An aspect not addressed in the user experience is the language used to encode scientific workflows. In some cases, it is graphical, and in others it is script-based. In all cases, the language needs to provide the users with easy ways of specifying the required steps in analysis tasks and a means of connecting them either with a flow of control or data. As mentioned above, given the differences between the types of users, developing a standard workflow language is very challenging. The issue remains whether the cognitive overhead involved in creating workflows may distract scientists from creative exploration.

Although a plethora of tools, GUIs, and paradigms are currently used, in practice many suffer from the drawback that they are too low-level and do not shield the programmer from underlying systems. In other cases, expressiveness is too limited to describe all the needed control and data flow. For example, very few graphical systems support exception handling or other forms of dynamic, adaptive behavior. As with the world of programming languages, there can be no standard form of expression: Different users will always need different ways of describing computations. It is possible that a common intermediate form may exist. Based on such a common intermediary, the wide variety of workflow-related tools could have a chance of becoming interoperable and some of the existing duplication of effort could be eliminated.

In terms of “visual editing” of workflows, much work still needs to be done. Current workflows range from those that have a few tasks executed by a few services to those that are composed of thousands of tasks distributed over thousands of processors. Many of the editors existing today can be awkward to apply in a distributed setting. Thus, developing compact and meaningful visualizations is an important challenge.

Workflow representations must not only provide a way to describe a workflow but also support the transitions between the different levels of abstraction, from high-level user descriptions down to low-level execution details. One example of the information that needs to be captured by a workflow representation is the performance requirements necessary to map a workflow to an executable form.

One also has to be careful not to take workflow languages to the extreme and turn them into full-featured programming or scripting languages since they already exist in abundance and are inadequate for scientists to use on a daily basis. Workflow languages need to capture the salient features of a scientific analysis without providing so much flexibility as to make the workflow composition process too complex.

One possible solution to this problem would be to develop a series of languages that can be mapped from one to the other, where we have different languages that are appropriate in different contexts—different levels of abstraction. Users could then enter the system at their appropriate level.

Using a common intermediate representation would be one approach. This could be augmented with a common runtime and standard workflow enactment engine. In a manner analogous to the Microsoft Common Language Runtime (CLR) and Common Language Infrastructure (CLI), one could integrate small scripts as executing components within a larger workflow. Within the Web services community, especially for business interactions, BPEL is already becoming the de facto standard for service orchestration and workflow. It is one possible candidate for a common intermediate representation for e-Science workflows as well, although there are issues with this approach.

3 Workflow Compilers

Workflow compilers can be used as a mapping tool between workflow languages at different levels of abstraction. They allow scientists to express their analysis at any level of abstraction and then compile it to the target execution system, which can range from a single host or service to a distributed, heterogeneous set of resources and services.

Compiling a workflow down to an executable form requires knowledge about the requirements and performance characteristics of the workflow tasks and knowledge of the availability and the characteristics of the resources. Currently, this knowledge is rather limited and often encoded in an ad hoc manner. A challenge for the future would be to capture the application-level and the execution-level knowledge using semantic representations and employ reasoners to find suitable mappings.
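As a minimal sketch of the kind of knowledge involved, the Python fragment below maps each task to a resource using task requirements and resource characteristics. Everything here is an assumption made for illustration: the `Task` and `Resource` fields, the `relative_speed` measure, and the greedy selection rule are hypothetical, and a real compiler would draw this information from richer (possibly semantic) descriptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Task:
    name: str
    min_memory_gb: int      # requirement of the task
    est_cpu_hours: float    # performance estimate on a reference machine


@dataclass(frozen=True)
class Resource:
    name: str
    memory_gb: int
    relative_speed: float   # 1.0 = reference machine (hypothetical measure)
    available: bool = True


def map_task(task: Task, resources: List[Resource]) -> Optional[Resource]:
    """Pick the available resource that satisfies the memory requirement
    and gives the shortest estimated runtime; None if nothing fits."""
    candidates = [r for r in resources
                  if r.available and r.memory_gb >= task.min_memory_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda r: task.est_cpu_hours / r.relative_speed)


def compile_workflow(tasks: List[Task],
                     resources: List[Resource]) -> Dict[str, Optional[Resource]]:
    """Map every task independently (no data-movement cost is modeled)."""
    return {t.name: map_task(t, resources) for t in tasks}
```

The sketch ignores data placement and dependencies between tasks, both of which a real mapping would have to consider.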

The compilation process involves many decisions, for example, why particular resources were selected over others. It may be beneficial to encode some of the decision process as the workflow is being mapped. In fact, the dynamic nature of resource availability may make this late binding necessary. This would enable more efficient compilation and possibly a richer interaction between the workflow compiler and the workflow executors.

Considering that e-Science workflows are often mapped to a set of heterogeneous, distributed resources, failures in execution are commonplace. This failure-prone environment poses a significant challenge to the workflow compilers. Ultimately, the compilers should anticipate failures and plan accordingly, possibly producing “plan B,” or backtracking. They should also work closely with the workflow engines to react to problems as they occur.
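One simple reading of “plan B” is a ranked list of alternative sites that the engine tries in turn. The sketch below assumes this reading; the function name `run_with_fallback`, the use of `RuntimeError` to signal a site failure, and the site names in the usage example are all hypothetical.

```python
from typing import Callable, List, Tuple


def run_with_fallback(task: Callable[[str], str],
                      ranked_sites: List[str]) -> Tuple[str, List[str]]:
    """Try each site in the compiler's preferred order.

    Returns the task result together with a list describing the sites
    that failed, so the failures can be reported at the user level.
    """
    failed: List[str] = []
    for site in ranked_sites:
        try:
            return task(site), failed
        except RuntimeError as err:      # assumed failure signal
            failed.append(f"{site}: {err}")
    raise RuntimeError("all sites failed: " + "; ".join(failed))
```

For example, if a task raises on the first site and succeeds on the second, the call returns the second site's result along with a one-entry failure list. Backtracking over earlier mapping decisions is not covered by this sketch.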

Compilers also need to support the mapping of information about the execution of the workflow components back to the high-level descriptions, for example, in order to provide user-level monitoring and failure information.

As we mentioned before, the management of metadata and provenance at every step of the workflow is crucial. Compilers can be very beneficial in this aspect as they can augment the executable workflows with metadata and provenance management tasks, for example, adding tasks for collecting execution statistics and tasks for storing them in relevant databases. However, the compiler cannot manage the metadata and provenance alone. It needs appropriate workflow representations to support annotations of the workflow products with relevant metadata.

4 Workflow Enactors or Executors

The main job of a workflow executor, or workflow engine, is to faithfully and robustly execute the workflows. However, current enactment engines are not as fault tolerant as we would like, and many application and system faults still occur. Many lightweight systems embed the enactor directly into the workflow composer: If the user turns off the laptop, the workflow will stop. Others, such as those based around the BPEL specification, are designed to allow the entire workflow state to be made persistent in a database. Consequently, a workflow enactment can survive a reboot of the engine.
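The idea of persistent workflow state can be sketched with Python's standard `sqlite3` module. This is a toy illustration, not how BPEL engines are implemented: the `PersistentEngine` class, its single `done` table, and the assumption that tasks are plain callables run in sequence are all hypothetical.

```python
import sqlite3


class PersistentEngine:
    """Toy enactor that records each completed task in a SQLite database."""

    def __init__(self, db_path: str):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS done "
            "(task TEXT PRIMARY KEY, result TEXT)")
        self.db.commit()

    def completed(self) -> dict:
        """Return task name -> stored result for all finished tasks."""
        return dict(self.db.execute("SELECT task, result FROM done"))

    def run(self, tasks: dict) -> list:
        """Run tasks in order, skipping any that finished before a restart.

        Returns the names of the tasks actually executed in this call.
        """
        executed = []
        done = self.completed()
        for name, func in tasks.items():
            if name in done:
                continue
            result = func()
            self.db.execute("INSERT INTO done VALUES (?, ?)",
                            (name, str(result)))
            self.db.commit()   # state is durable after every task
            executed.append(name)
        return executed
```

If the process stops after the first task, a new `PersistentEngine` opened on the same file resumes with the second task instead of starting over.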

Today there are many workflow engines, as there are many workflow compilers and user environments. It would potentially be beneficial to have a common engine, or at least a limited set of engines, for execution in distributed environments such as the Grid. Again, workflow language standardization, at various levels of abstraction, would be of great benefit in developing common engines.

An important challenge for workflow engines is to detect when a failure in the environment is a mask, which needs to be passed to the compiler and perhaps in turn to the user. Clearly, the workflow executor needs to provide enough information at an appropriate level of abstraction to enable this type of failure handling.

Another issue not addressed fully by workflow executors today is the management of dynamic workflows, where new portions of a workflow can be added at any time while some other portions are cancelled. A related problem stems from the amounts of data involved in the e-Science workflows. Modest-sized workflows can create gigabytes of data on the execution sites. These data, once they are successfully transferred to permanent storage, should be removed, unless of course they are needed by subsequent analysis.
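The data cleanup problem can be illustrated by computing, for each file, the last task that reads it. The sketch below assumes a fixed execution order and a known set of files to preserve; the function `cleanup_schedule` and its parameters are hypothetical, and it does not model the transfer to permanent storage mentioned above.

```python
from typing import Dict, List, Set


def cleanup_schedule(order: List[str],
                     inputs: Dict[str, List[str]],
                     keep: Set[str]) -> Dict[str, List[str]]:
    """Map each task to the files that can be deleted once it finishes.

    order  : tasks in execution order
    inputs : task -> files it reads
    keep   : files that must survive (for example, final results)
    """
    last_reader: Dict[str, str] = {}
    for task in order:
        for f in inputs.get(task, []):
            last_reader[f] = task        # later tasks overwrite earlier ones
    schedule: Dict[str, List[str]] = {task: [] for task in order}
    for f, task in last_reader.items():
        if f not in keep:
            schedule[task].append(f)
    return schedule
```

In a dynamic workflow, where portions are added or cancelled during execution, such a schedule would have to be recomputed whenever the set of tasks changes.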

5 Debugging

As we mentioned before, errors often occur and need to be dealt with either by the workflow engine, the workflow compiler, or the user. Today, a user often has to examine logs provided by the workflow management system, which are mostly too low-level to be comprehensible by an average user. Much of the complexity stems from the cryptic error messages generated by the underlying distributed execution environment. However, some progress at the workflow level could be made as well. For example, it would be beneficial to provide the capability of replaying arbitrary portions of the workflow while modifying the data sources, the execution systems, and workflow components. This may provide the users with some insight into the nature of the failures.

In the distributed case, the most common approach to this is to provide a global event notification system. Such a system can also be tied to the provenance tracking, and the event history can be of great value in the replay process. However, a larger challenge is managing all the intermediate data products of the workflow. These are needed if a workflow is to be interrupted and restarted without redoing all previous work. Again, in the distributed case, this requires a distributed virtual data management system. Every intermediate data product needs to have a unique identifier that can be used to access that object if it is needed again.
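One way to obtain a unique identifier for each intermediate data product is to derive it from the description of how the product was made. The following sketch assumes that approach; `product_id` and `ProductStore` are hypothetical names, and the in-memory dictionary merely stands in for a distributed data catalog.

```python
import hashlib
import json


def product_id(task: str, params: dict, input_ids: list) -> str:
    """Derive a deterministic identifier from how a data product was made."""
    description = json.dumps(
        {"task": task, "params": params, "inputs": list(input_ids)},
        sort_keys=True)
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


class ProductStore:
    """In-memory stand-in for a distributed data catalog."""

    def __init__(self):
        self._items = {}

    def get_or_compute(self, task, params, input_ids, compute):
        """Return (identifier, product), running compute() only if the
        product is not already stored."""
        pid = product_id(task, params, input_ids)
        if pid not in self._items:      # only redo work that is missing
            self._items[pid] = compute()
        return pid, self._items[pid]
```

With identifiers of this kind, a restarted workflow can look up each product before running the task that creates it and skip work whose result already exists.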

6 Execution Environments

Much work needs to be done in terms of the distributed execution environment. Reliability is of paramount importance, as is providing detailed yet meaningful information about failures when they occur. As more scientists depend on large-scale distributed systems to do their work, these systems need to provide production-level availability and reliability.

Much work also needs to be done in characterizing the execution system so that workflow services can make meaningful decisions. This includes not only characterizing computational resources but also storage. Currently, we don’t distinguish between different types of storage such as fast I/O storage, long-term storage, quotas associated with specific resources, etc.

Usage policies of the resources often are not exposed in a way that can be easily examined by workflow management software. As a result, computations may be sent to resources with little chance of successful execution. Consequently, there is a need for dynamic resource-level authorization and policy negotiation. One approach is to associate the identity of the owner of a workflow enactment with the instance of that enactment. The workflow engine can negotiate with resources at runtime to decide on the best resources that the user is authorized to use.
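A minimal sketch of this negotiation, assuming that each resource exposes its policy in a machine-readable form, might look as follows. The `SitePolicy` fields and the rule of choosing the shortest queue are hypothetical and serve only to show how the owner's identity can drive resource selection.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SitePolicy:
    # Hypothetical, simplified view of a resource's usage policy.
    name: str
    allowed_users: Set[str]
    max_cpu_hours: float
    queue_wait_hours: float


def negotiate(owner: str, cpu_hours: float,
              sites: List[SitePolicy]) -> Optional[SitePolicy]:
    """Return the site the owner may use that has the shortest queue,
    or None if no site authorizes the request."""
    permitted = [s for s in sites
                 if owner in s.allowed_users and cpu_hours <= s.max_cpu_hours]
    return min(permitted, key=lambda s: s.queue_wait_hours, default=None)
```

Checking the policy before submission avoids sending computations to resources where they have little chance of running.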

Monitoring tools are critical in a distributed environment. They must be scalable and include meaningful, up-to-date information. Many efforts have gone into coming up with common schemas for representing sets of resources. However, because many execution environments today are managed by different organizations and projects, there needs to be a way to monitor across the organizational boundaries. Possibly, semantic technologies may help match seemingly disparate information.

Finally, it is important for the workflow software to be easy to deploy and manage, ultimately supporting on-the-fly deployment so as to make full use of the dynamic execution environment.

7 The Big Question

Is the workflow metaphor too restrictive for exploratory e-Science? We think not. As we have seen, there are a number of approaches within the workflow arena, but clearly the authors of this book believe that workflow is the correct approach for e-Science applications. Are they right? Well, only time and experience will give us the answer to that question.