5.10. Evaluation & Wrap-up
In this section we describe how our approach to reusing processes can be evaluated against best practices in business process redesign. In addition, we summarize the content of this chapter.
5.10.1. Evaluation in the Context of Business Process Redesign
With our approach to the reuse of processes we want to improve efficiency and effectiveness in bioinformatics scenarios. Ideally, a redesign or modification of a process decreases the time required to handle incidents, decreases the cost of executing the process, improves the quality of the service that is delivered, and improves the ability of the process to react flexibly to variation [17]. However, such an evaluation typically exposes trade-offs: in general, improving one dimension may weaken another.
In [104], a set of best practice heuristic rules on process (re)design is evaluated according to the metrics cost, time, flexibility and quality. In the following, we give details on the rules from [104] that are related to our approach:
• Order types: ’determine whether tasks are related to the same type of order and, if necessary, distinguish new business processes’ - If parts of a business process are not specific to the business process they belong to, this may result in less effective management of this sub-process and in lower efficiency. Applying this best practice may yield faster processing times and lower cost.
In our context, the abstraction from and specialization to tasks of the executable level according to the task hierarchy addresses this issue. If executable tasks no longer match the process, they are abstracted to configurable, structural or conceptual tasks and then specialized into new executable tasks that address the needs of the process.
• Task elimination: ’eliminate unnecessary tasks from a business process’ - A task is considered as unnecessary when it adds no value from a customer’s point of view, e.g., control tasks that are incorporated in the process to fix problems created (or not elevated) in earlier steps. The aim of this best practice is to increase the speed of processing and to reduce the cost, while an important drawback may be that the quality of the service decreases.
In our context, we intentionally go the other way around: we add tasks for checking requirements, which increases the time and cost of the executable process. However, the check tasks facilitate reuse and can help to detect problems early, thus reducing the time and cost of the reuse.
• Triage: ’consider the division of a general task into two or more alternative tasks’ or ’consider the integration of two or more alternative tasks into one general task’ and Task composition: ’combine small tasks into composite tasks and divide large tasks into workable smaller tasks’ - These best practices improve the quality of the business process due to a better utilization of resources, with obvious cost and time advantages. However, too much specialization can make processes less flexible and less efficient.
According to the task hierarchy, these best practices can be addressed by the abstraction to and specialization from tasks at the structural level. However, the concrete specification of tasks is up to the user, which makes it hard to make a general statement in terms of cost, time and flexibility.
• Knock-out: ’order knock-outs in an increasing order of effort and in a decreasing order of termination probability’ and Control addition: ’check the completeness and correctness of incoming materials and check the output before it is sent to customers’ - Additional checks increase time and cost, but also increase the quality delivered. However, checking the conditions that must be satisfied to deliver a positive end result in the correct order reduces cost and time without loss in quality.
In our context, the meta-process is defined such that only patterns with a matching business objective are considered for an application. By modelling tasks for checking the data mining goal as well as the requirements and prerequisites at the beginning of a process pattern, it can be ensured that these checks are performed at an early stage of the process, in line with the best practices, thus reducing cost and time.
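The effect of ordering knock-out checks can be illustrated with a small calculation. The following is only a sketch under stated assumptions: the check names, effort values and termination probabilities are hypothetical, the checks are assumed to be independent, and the ordering criterion (termination probability per unit of effort, highest first) is the ratio rule commonly associated with the knock-out heuristic, not something prescribed by our meta-process.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Check:
    """A hypothetical check task at the beginning of a process pattern."""
    name: str
    effort: float        # assumed cost of running the check (arbitrary units, > 0)
    p_terminate: float   # assumed probability that the check stops the process


def expected_effort(checks):
    """Expected total effort if checks run in order and the first failure stops the process."""
    total, p_reached = 0.0, 1.0
    for check in checks:
        total += p_reached * check.effort      # the check only runs if it is reached
        p_reached *= 1.0 - check.p_terminate   # probability of passing on to the next check
    return total


def knock_out_order(checks):
    """Order checks by termination probability per unit of effort, highest first."""
    return sorted(checks, key=lambda c: c.p_terminate / c.effort, reverse=True)


# Hypothetical values, chosen only for illustration.
checks = [
    Check("check_prerequisites", effort=5.0, p_terminate=0.10),
    Check("check_data_mining_goal", effort=1.0, p_terminate=0.30),
    Check("check_data_requirements", effort=2.0, p_terminate=0.20),
]

ordered = knock_out_order(checks)
# Brute-force comparison against every possible ordering of the three checks.
best = min(expected_effort(p) for p in permutations(checks))

print([c.name for c in ordered])
print(expected_effort(ordered), best)
```

With these made-up numbers, the cheap and frequently failing check of the data mining goal is placed first, and the brute-force comparison confirms that no other ordering of the three checks has a lower expected effort.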
5.10.2. Summary
In this chapter we have presented an approach for describing data mining based analysis processes in the context of bioinformatics in a way that facilitates reuse. Our approach is based on CRISP and includes the definition of data mining process patterns, a hierarchy of tasks to guide the specialization of abstract process patterns to concrete processes, and a meta-process for applying process patterns to new problems. These data mining process patterns allow the reusable parts of a data mining process to be represented at different levels of abstraction, from the CRISP model as the most abstract representation to executable workflows as the most concrete representation, thus providing a simple formal description for the reuse and integration of data mining.
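The abstraction and specialization of tasks along the task hierarchy can be sketched in a few lines. This is a minimal illustration, not the formal definition of the hierarchy: the ordering of the four levels is taken from the wording used above (conceptual, structural, configurable, executable), and the class, function, task and binding names are hypothetical.

```python
from dataclasses import dataclass, replace
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    """Levels of the task hierarchy, from most abstract to most concrete (assumed ordering)."""
    CONCEPTUAL = 0
    STRUCTURAL = 1
    CONFIGURABLE = 2
    EXECUTABLE = 3


@dataclass(frozen=True)
class Task:
    name: str
    level: Level
    binding: Optional[str] = None  # hypothetical reference to a script or service


def abstract(task: Task, target: Level) -> Task:
    """Move a task to a more abstract level; the concrete binding is dropped."""
    if target >= task.level:
        raise ValueError("target level must be more abstract than the current level")
    return replace(task, level=target, binding=None)


def specialize(task: Task, target: Level, binding: Optional[str] = None) -> Task:
    """Move a task to a more concrete level; executable tasks need a binding."""
    if target <= task.level:
        raise ValueError("target level must be more concrete than the current level")
    if target is Level.EXECUTABLE and binding is None:
        raise ValueError("an executable task requires a binding")
    return replace(task, level=target,
                   binding=binding if target is Level.EXECUTABLE else None)


# An executable task that no longer matches the process is abstracted and
# then specialized into a new executable task (all names are made up).
old_task = Task("normalize_expression_data", Level.EXECUTABLE, binding="normalize.R")
pattern_task = abstract(old_task, Level.STRUCTURAL)
new_task = specialize(pattern_task, Level.EXECUTABLE, binding="normalize_service")

print(pattern_task)
print(new_task)
```

Dropping the binding on abstraction reflects the idea that only the reusable part of a task is kept in the pattern, while the concrete implementation is chosen again when the pattern is applied.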
In addition, we have introduced an approach to data mining process patterns that are aware of data semantics. The approach is based on encoding the data requirements and pre-conditions of analysis processes as queries to semantically annotated data sources inside a data mining process pattern. Using a medical ontology, semantic information can be integrated into the data mining process patterns for formally checking the data requirements of analysis scenarios. Our approach thus combines the concept of semantic mediation of data sources with the concept of data mining process patterns. By formally representing data semantics, we support the reuse of data mining based analysis processes in the area of medical informatics and bioinformatics and thus speed up the development of new solutions.
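The idea of checking data requirements against a semantically annotated data source can be sketched as follows. This is a deliberately simplified illustration: a plain set comparison stands in for the actual queries against the annotated data sources, and the source name, column names and concept identifiers (prefixed with "onto:") are hypothetical and not taken from any real medical ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRequirement:
    """A data requirement of a process pattern, expressed as required ontology concepts."""
    description: str
    required_concepts: frozenset


@dataclass
class AnnotatedSource:
    """A data source whose columns are annotated with ontology concepts."""
    name: str
    annotations: dict  # column name -> concept identifier


def unmet_requirements(source, requirements):
    """Return, per unmet requirement, the concepts the source does not provide."""
    provided = set(source.annotations.values())
    report = {}
    for requirement in requirements:
        missing = requirement.required_concepts - provided
        if missing:
            report[requirement.description] = sorted(missing)
    return report


# Hypothetical source and requirements, for illustration only.
source = AnnotatedSource(
    name="trial_db",
    annotations={"pat_id": "onto:Patient", "expr": "onto:GeneExpressionValue"},
)
requirements = [
    DataRequirement("expression data per patient",
                    frozenset({"onto:Patient", "onto:GeneExpressionValue"})),
    DataRequirement("survival outcome available",
                    frozenset({"onto:SurvivalTime"})),
]

report = unmet_requirements(source, requirements)
print(report)  # an empty report would mean that all requirements are met
```

A check task at the beginning of a process pattern could use such a report to stop early when the data of a new scenario does not satisfy the requirements of the pattern.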
We evaluated our approach in three case studies. In the first case study we presented how to create and apply data mining process patterns in the context of a clinical trial scenario. It was shown that it is possible to create a process pattern by abstracting executable tasks of a script and to apply this pattern by specializing it into a workflow including a manual task. The second case study dealt with the transformation of a data mining process pattern of a multi-center-multi-platform scenario into a process pattern for describing the abstract process of meta analysis in bioinformatics. This showed that it is possible to describe abstract processes consisting of conceptual tasks. In the third case study we described how data mining process patterns can be integrated into business processes in a fraud detection scenario in the health care domain. We demonstrated how a data mining process pattern can be created based on information from a data mining paper and how this pattern can be integrated into business processes.
Thus, it was shown that our approach meets the requirements of the users in terms of supporting the reuse of data mining based analysis processes and that it can be applied in practice. Furthermore, we presented how our approach relates to best practices for process optimization in the context of business processes.