2. Scientific Data Analysis, Data Mining and Data Analysis Environments
3.4. Case Study DataMiningGrid
3.4.2. User Interface for the Integration
This case study shows that the information specified in the ADS can be collected in an easy and user-friendly way, which allows users without knowledge of grid technology to grid-enable data mining components. It is based on [139]. In detail, the study describes how the DataMiningGrid Application Enabler web interface can be used to write instances of the Application Description Schema (ADS). The Application Enabler [139] is a tool that supports creating and registering the description and uploading the component's executable in order to integrate the component into the grid environment.
Using the ADS, users can create detailed descriptions of their data mining components. This ensures, on the one hand, that the component will run successfully and, on the other hand, that users have a better chance of finding the component in the grid. Providing the description will always be the task of the component developer or end user, who is typically not well acquainted with the DataMiningGrid system. Thus, the manual creation of such a complex description is presumably error-prone. To avoid erroneous or incomplete descriptions while still relying on the component developer or end user to create the component description, we decided to provide a web application.
This web application hides the XML syntax from the user and serves as a tool that supports creating the description and uploading the component's executable. After the upload to the grid, the component description is registered in the grid registry. It is thereby published in the environment and can be found and used by other users.
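The idea of hiding the XML syntax behind forms can be illustrated with a minimal sketch. It assumes Python rather than the JSP implementation of the Application Enabler: form values are checked and then serialized, so the user never edits markup directly. The function `build_description` and the element names `applicationDescription` and `generalInformation` are hypothetical and are not taken from the actual ADS.

```python
import xml.etree.ElementTree as ET


def build_description(form: dict) -> str:
    """Serialize validated form fields into an XML description string."""
    # Element names are illustrative only, not the real ADS vocabulary.
    root = ET.Element("applicationDescription")
    general = ET.SubElement(root, "generalInformation")
    for field_name in ("name", "version", "description"):
        value = form.get(field_name, "").strip()
        if not value:
            # Reject incomplete input before any XML is produced.
            raise ValueError(f"missing required field: {field_name}")
        ET.SubElement(general, field_name).text = value
    return ET.tostring(root, encoding="unicode")
```

Because the serializer escapes special characters itself, a value such as `a<b` typed into a form field cannot produce malformed XML, which is one source of error that manual editing would leave open.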
The Application Enabler consists of several form-based web pages that lead the user through the whole process of creating and uploading a data mining component. For this purpose, the information to be specified is divided into several functional parts:
1. General Information
2. Execution Information
3. Input Data
4. Output Data
5. Requirements
6. Upload
Each of these steps is presented to the user as a single JSP web page on which the specifications can be made.
General Information
Figure 3.9.: DM Application Enabler - General Information (from [139]).
The first page displayed to the user when creating a new component description is the “General Information” page (Figure 3.9). On this page the user can describe basic properties of the component, such as its name, its version and a short description. This information is mainly used to present the component to other users in the grid.
Execution Information
The second page (Figure 3.10) is essential for the execution of the data mining component. Here the user can describe the executable and its options. The user can choose from four different execution types (Java, Python, BashShell, C), specify interpreter commands and list all the options the data mining component is capable of handling. Particularly helpful for new users is that each option is described by a short tool-tip, so users can learn what the different options mean.
Figure 3.10.: DM Application Enabler - Execution Information (from [139]).
Figure 3.11.: DM Application Enabler - Input Data (from [139]).
Input Data and Output Data
These two pages (Figures 3.11 and 3.12) are very similar, as they both describe the data the component works with. Input and output data share similar properties, such as a label, data type, flag and tool-tip. The only differences are the stage-in/stage-out flags and a flag called “providedWithAlgorithm”. Stage-in for input data means that the data has to be shipped to the execution machine before the execution can start. Stage-out means shipping the specified data to a storage server, from which the user of the workflow, who does not know about the execution machine, can later obtain the results. The flag “providedWithAlgorithm” allows the user to upload and assign input data, which is then pre-set when the data mining component is later used in a workflow.
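The role of the staging flags can be sketched as follows. This is an illustration under assumptions, not the DataMiningGrid data model: the class `DataItem`, its field names and the function `transfer_plan` are hypothetical, and only the properties named above (label, data type, flag, tool-tip, staging, “providedWithAlgorithm”) come from the text.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    label: str
    data_type: str
    flag: str                              # command-line flag of the executable
    tooltip: str = ""
    staged: bool = False                   # stage-in (inputs) or stage-out (outputs)
    provided_with_algorithm: bool = False  # meaningful for inputs only


def transfer_plan(inputs, outputs):
    """Return the labels to ship before execution and the labels to ship afterwards."""
    # Stage-in: data that must reach the execution machine before the run.
    stage_in = [item.label for item in inputs if item.staged]
    # Stage-out: results copied to a storage server once the run has finished.
    stage_out = [item.label for item in outputs if item.staged]
    return stage_in, stage_out
```

For example, an input file marked for stage-in and a plain numeric parameter yield a plan in which only the file is transferred to the execution machine.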
Figure 3.12.: DM Application Enabler - Output Data (from [139]).
Requirements
Figure 3.13.: DM Application Enabler - Requirements (from [139]).
After all internal conditions for the executable have been specified, the user can indicate external conditions, such as required environment variables or requirements on the execution machine (Figure 3.13). This information is especially important for the resource broker, which uses it to find the most suitable machine for execution and ensures a correct environment for the component to run in.
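How a broker might use such requirements can be sketched as a filter followed by a ranking. The text does not describe the actual matching procedure of the DataMiningGrid resource broker, so everything here is an assumption: the requirement fields (`min_memory_mb`, `operating_system`), the `Machine` record, and the choice to prefer the machine with the most memory are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requirements:
    min_memory_mb: int = 0
    operating_system: Optional[str] = None  # None means "any"


@dataclass
class Machine:
    name: str
    memory_mb: int
    operating_system: str


def satisfies(machine: Machine, req: Requirements) -> bool:
    """Check whether a machine meets every stated requirement."""
    if machine.memory_mb < req.min_memory_mb:
        return False
    return req.operating_system in (None, machine.operating_system)


def select_machine(machines, req: Requirements) -> Optional[Machine]:
    """Pick a matching machine; the ranking by memory is an arbitrary assumption."""
    candidates = [m for m in machines if satisfies(m, req)]
    return max(candidates, key=lambda m: m.memory_mb, default=None)
```

Returning `None` when no machine qualifies keeps the "no suitable resource" case explicit instead of falling back to an unsuitable machine.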
Upload
Figure 3.14.: DM Application Enabler - Upload (from [139]).
Finally, after the necessary information has been specified, the files belonging to the data mining component can be uploaded (Figure 3.14). In general, the only file that has to be uploaded is the executable file. Other files are optional unless the user specified them before. If the “providedWithAlgorithm” flag was set for some input data, that data has to be uploaded in order to create a valid description. Additionally, the user can upload required libraries that the executable file depends on. These files are copied to the execution machine before execution and are accessible to the executable file there.
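The upload rule described here (the executable is always required, input data marked “providedWithAlgorithm” becomes required, libraries stay optional) can be sketched as a simple check. The function name `missing_uploads` and the file names in the usage example are hypothetical.

```python
def missing_uploads(executable, provided_inputs, uploaded):
    """Return the names of required files that have not been uploaded yet."""
    # The executable is always mandatory; inputs flagged as
    # "providedWithAlgorithm" become mandatory as well.
    required = [executable] + list(provided_inputs)
    return [name for name in required if name not in uploaded]
```

Called as `missing_uploads("miner.jar", ["defaults.arff"], {"miner.jar", "helper-lib.jar"})`, the check reports that `defaults.arff` is still missing, while the optional library does not affect the result.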
Confirmation
When the user has filled in all obligatory form fields, uploaded the required files and clicked “Generate Description”, an application description conforming to the Application Description Schema is created (Figure 3.15). For each new component, a folder is created on the server. This folder contains all of the uploaded files as well as the ADS instance itself. From this folder the description can be published in the grid registry.
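The server-side result of this step, one folder per component holding the uploaded files and the generated description, can be sketched as follows. The directory layout, the file name `description.xml` and the function `store_component` are assumptions made for illustration; the text does not specify how the Application Enabler names or organizes these folders.

```python
import shutil
from pathlib import Path


def store_component(base_dir: Path, name: str, description_xml: str, files: list) -> Path:
    """Create the component folder and place the uploads and description in it."""
    folder = base_dir / name
    # Fail instead of silently overwriting an existing component folder.
    folder.mkdir(parents=True, exist_ok=False)
    for source in files:
        shutil.copy2(source, folder / Path(source).name)
    # The description file name is a hypothetical choice.
    (folder / "description.xml").write_text(description_xml, encoding="utf-8")
    return folder
```

Keeping the description next to the files it refers to means that a later publishing step only needs the folder path to find everything that belongs to the component.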