9.4 Reducing the challenges using our approach

For the proof of concept, we implemented a prototype for structural analysis. The prototype consists of the following components:

• WSDL exporter,

• WSDL loader,

• Miner configuration, and

• Frequent itemset mining for structural types.

The WSDL exporter connects to the enterprise service repository of an SAP system and reads the WSDL files of the public Web service interfaces from the repository.

The WSDL loader is capable of reading WSDL files and creating a first internal representation. For the tests, the WSDL loader was configured to read the message types from the WSDLs.
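The loader's code is not reproduced here. As a minimal sketch of the first step it performs, the following Python fragment lists the message names declared in a WSDL 1.1 document using only the standard library. The function name `message_names` and the sample document are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1; the sample document below is invented for illustration.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Demo">
  <message name="OrderRequest"><part name="body" element="OrderRequestType"/></message>
  <message name="OrderResponse"><part name="body" element="OrderResponseType"/></message>
</definitions>"""


def message_names(wsdl_text):
    """Return the names of all wsdl:message elements in document order."""
    root = ET.fromstring(wsdl_text)
    return [msg.get("name") for msg in root.iter(f"{{{WSDL_NS}}}message")]


names = message_names(SAMPLE_WSDL)
print(names)
```

A real loader would additionally resolve each message part to its XML schema type. That step is omitted here.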

The miner configuration component presents the user with a graphical user interface to pick from options that influence the conversion from the internal representation to the mining database.

The closed frequent itemset mining component is based on the concepts described by Uno et al. (2004a;b). The mining component takes multiple XML schema definitions as input and produces a set of redundancy groups as its result.
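To illustrate what a closed frequent itemset is, the following sketch enumerates all itemsets by brute force and keeps those that are frequent and have no proper superset with the same support. It is deliberately naive (exponential in the number of items) and is not the algorithm of Uno et al. The transactions, item names, and the reading of each closed itemset as one candidate redundancy group are assumptions made for this example.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Naive enumeration; returns {itemset: support} for closed frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                frequent[itemset] = support
    # An itemset is closed if no proper superset has the same support.
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other and support == other_support
                   for other, other_support in frequent.items())
    }


# Hypothetical message types, each reduced to the set of its element names.
MESSAGES = [
    {"BuyerParty", "SellerParty", "Amount"},
    {"BuyerParty", "SellerParty", "Date"},
    {"BuyerParty", "Amount"},
]

groups = closed_frequent_itemsets(MESSAGES, min_support=2)
for itemset, support in sorted(groups.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

In this toy input, `SellerParty` alone is frequent but not closed, because it always occurs together with `BuyerParty`. Only the larger set is reported.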

9.4.1 Evaluation configurations

The miner configuration component allows for configurations in three categories (see the GUI in Figure 9.1):

1. WSDL files to analyze,

2. granularity of analysis, and

3. populating the mining database.

All WSDL files to be analyzed must reside in a specific directory on the computer. The choice in the first category determines whether

• all files will be analyzed at once or

• subsequent mining runs will use more and more files starting from one file in the first run and ending with all available files in the final run.
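The two choices can be sketched as follows. The function name `file_batches`, the file names, and the step of one additional file per run are assumptions; the text only states that subsequent runs use more and more files.

```python
def file_batches(files, incremental):
    """Return the list of file sets to mine, one entry per mining run."""
    files = sorted(files)
    if not incremental:
        # Single run over all files at once.
        return [files]
    # Assumption: each run adds exactly one file to the previous run's set.
    return [files[:k] for k in range(1, len(files) + 1)]


batches = file_batches(["b.wsdl", "a.wsdl", "c.wsdl"], incremental=True)
for run, batch in enumerate(batches, start=1):
    print(run, batch)
```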

In the granularity of analysis category, one can determine

• an initial minimum support,

• a final minimum support, and

• an increment on the minimum support between subsequent runs.
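These three parameters define a sequence of minimum supports, one per run. A minimal sketch, assuming that the final value is included and that the direction (increasing or decreasing) follows from the order of the initial and final value; the function name `support_schedule` and the step of 15 in the example are hypothetical.

```python
def support_schedule(initial, final, step):
    """Return the minimum supports of subsequent runs from initial to final."""
    if step <= 0:
        raise ValueError("step must be positive")
    # Assumption: the sign of the increment follows from initial vs. final.
    direction = 1 if final >= initial else -1
    values = []
    current = initial
    while (current - final) * direction <= 0:
        values.append(current)
        current += direction * step
    return values


schedule = support_schedule(138, 93, 15)
print(schedule)
```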

In addition, the minimum support may be defined either as an absolute integer or as a percentage, ranging from 0% for the lowest minimum support of 2 to 100% for a minimum support equal to the number of transactions in the mining database.
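The exact mapping from percentage to absolute support is not spelled out. One plausible reading, sketched below, takes the rounded share of the database size and clamps it to the lowest allowed value of 2. The rounding and the clamping are assumptions; they reproduce the values reported later in this section (138 of 689 transactions at 20%, 12 of 5220 transactions at 0.23%).

```python
def absolute_min_support(percent, n_transactions, lowest=2):
    """Convert a percentage minimum support to an absolute transaction count."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    # Assumption: round to the nearest integer, never go below `lowest`.
    return max(lowest, round(percent / 100 * n_transactions))


print(absolute_min_support(20, 689))     # 138
print(absolute_min_support(0.23, 5220))  # 12
print(absolute_min_support(0, 689))      # 2
print(absolute_min_support(100, 689))    # 689
```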

In the category populating the mining database, the user may choose how the first internal model is transformed to the mining database. The two orthogonal options, already explained in Section 6.2.2 on page 68, are

1. the analysis range and

2. the domain of analysis.
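How the two options shape the mining database can be sketched on a toy schema. We assume here that the analysis range selects which types become transactions (top-level types only or all types) and that the domain selects which items a transaction holds (direct subelements only or the whole substructure). The schema, the use of plain element names as items, and all function names are hypothetical; the sketch also assumes that type definitions are not recursive.

```python
# Toy schema: type name -> {element name: type of that element, or None for a leaf}.
SCHEMA = {
    "OrderMessage": {"Buyer": "Party", "Total": "Amount"},
    "Party": {"Name": None, "Address": "Address"},
    "Address": {"Street": None, "City": None},
    "Amount": {"Value": None, "Currency": None},
}
TOP_LEVEL = {"OrderMessage"}


def subelements(type_name, schema, deep):
    """Collect element names below a type; deep=True descends the whole substructure."""
    found = set()
    for element, child_type in schema[type_name].items():
        found.add(element)
        if deep and child_type is not None:
            found |= subelements(child_type, schema, deep)
    return found


def build_mining_db(schema, top_level, all_types, deep):
    """Return {type name: itemset}; each entry is one transaction."""
    selected = sorted(schema) if all_types else sorted(top_level)
    return {name: subelements(name, schema, deep) for name in selected}


# Range "top-level", domain "all subelements": few, large transactions.
coarse = build_mining_db(SCHEMA, TOP_LEVEL, all_types=False, deep=True)
# Range "all types", domain "direct subelements": many, small transactions.
fine = build_mining_db(SCHEMA, TOP_LEVEL, all_types=True, deep=False)
print(len(coarse), len(fine))
```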


Figure 9.1: Miner configuration.

9.4.2 Evaluation runs

The purpose of the evaluation is to determine whether our concepts can be applied in an industrial context. In the context of SAP, it is important to determine ontological and technical overlap of exposed interface objects. Therefore, we performed test runs using the following configurations.

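| No | Range     | Domain             | Minimum support | Mining DB size |
|----|-----------|--------------------|-----------------|----------------|
| 1  | Top-level | All subelements    | 10%             | increasing     |
| 2  | Top-level | All subelements    | decreasing      | fixed          |
| 3  | All types | Direct subelements | decreasing      | fixed          |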

The first two test settings aim at exploring ontological overlap because all given (top-level) types are evaluated based on their whole substructure.

To assess the performance of the mining component, the first test setting comprised a set of runs with an increasing mining database size and a fixed minimum support ratio of 10% of the current run's database size. The result is depicted in Figure 9.2. The same runs were used to plot the mining time against the number of redundancy groups generated by the miner in Figure 9.3. An interpretation of the results follows in the subsequent section.

The second test setting was chosen to assess the performance with respect to the desired granularity of the results, i.e., the minimum support. Therefore, the minimum support was decreased across subsequent evaluation runs on a fixed mining database. The mining database was generated from the 689 message types that appear in the Web service definitions exposed by SAP's enterprise SOA. The 689 message types are provided via 389 WSDL files with a total size of 55.4 MB, which makes about 1.77 messages

the results. A lower minimum support yields more results. Therefore, the test runs in the second setting, depicted in Figure 9.4, take longer as the minimum support decreases. That is unfortunate in an industrial setting where large messages are handled, which means that the mining database contains large transactions. The evaluations in the related work, such as Uno et al. (2004b), mainly focus on the range from 90% to 20%. With large database sizes, a minimum support of 20% (i.e., 138 of 689 transactions) is still very coarse because 138 message types would have to contain the same overlap. We expect more interesting results at lower minimum supports. Nevertheless, the mining result for a minimum support of 93 (i.e., 13% of 689 transactions) can be obtained in less than an hour.

In contrast to large transactions, many transactions of smaller size, as in the third evaluation setting, can be mined much faster. In that setting, the mining time first exceeds 1 minute at a minimum support of 12 (i.e., 0.23% of 5220 transactions).

As the discovery of ontological and technical overlap supports decisions on a rather strategic level, the time needed to obtain the mining result is not critical. As the interfaces in an enterprise are not expected to change very quickly, mining in a realistic setting would also be rather infrequent, which makes it feasible to wait for the mining result. The only remaining important issue is that the mining scales well to large mining database sizes. Therefore, the linear time and space complexity of the algorithm, as presented in Figures 9.2 and 9.3 and proven by Uno et al. (2004a;b), is essential and makes closed frequent itemset mining well suited to detecting redundant interface objects in realistic settings.
