Benchmarking methodology for Semantic Web

4.4. Organizing the benchmarking activities

The RDF(S) and the OWL Interoperability Benchmarkings, presented in chapter 3, were organized and carried out following the benchmarking methodology for Semantic Web technologies described above, which provides the general guidelines that have to be adapted to each case.

This section includes the instantiation of this methodology in both benchmarking activities from the beginning of the benchmarking to the Experiment definition task of the Experiment phase. The content of the rest of the section is valid for the two benchmarking activities since these two activities have similar goals and scope and the tasks followed in the Plan phase are also similar. When needed, the results of the tasks in each of the benchmarking activities are clearly differentiated.

The following chapters present the instantiation of the methodology in the other tasks of the RDF(S) Interoperability Benchmarking (chapter 5) and of the OWL Interoperability Benchmarking (chapter 6). The chapters contain a complete definition of the experiments performed (including the benchmark suites and the evaluation software used), information on how the experiments were executed, and a detailed analysis of the results.

4.4.1. Plan phase

The author took the role of the benchmarking initiator and was in charge of organizing and defining the benchmarking, carrying out the first tasks of its process.

Goals identification

According to the benchmarking methodology for Semantic Web technologies, the first task to perform is to identify the benchmarking goals, benefits and costs.

Our goal was to evaluate and improve the interoperability of Semantic Web technologies using, in one case, RDF(S) and, in the other case, OWL as the interchange language.

Achieving interoperability between Semantic Web technologies is not straightforward when these tools do not share a common knowledge model and their users do not know the effects of interchanging an ontology from one tool to another.

In the RDF(S) Interoperability Benchmarking, the scope of the benchmarking was limited to one type of technology, namely, ontology development tools. However, though the scope was limited, the benchmarking was intended to be general enough to allow other types of tools to participate. In the OWL Interoperability Benchmarking, on the other hand, the scope was broadened and considered any type of Semantic Web technology.

The benefits pursued through our goal, which are listed below, are related to the expected outcomes of the benchmarking and involve different communities dealing with Semantic Web tools, namely, the research community, the industrial community, and tool developers. These benefits are:

To create consensual processes and mechanisms for evaluating the interoperability of these tools.

To produce user and developer recommendations on the interoperability of these tools.

To acquire a deep understanding of the practices used to develop these tools and of how the practices used affect their interoperability.

To extract from these practices those that can be considered best practices when developing the tools.

Most of the costs of the benchmarking go to the human resources needed to organize the benchmarking, to define the experimentation process, to perform the experiments on the tools, and to analyse the results. Other minor expenditure goes to travelling and computers, but this is negligible when compared to the aforementioned.

An estimation of these costs depends on different factors such as the effort put in organizing the benchmarking, the number of tools that participate, the availability of previous experiments that can be reused, and the possibility of automating both the experiment execution and the analysis of the experiment results.

In our case, the costs of organizing the benchmarking were unavoidable, and so were the costs of defining the experiments because no previous experiments that could be reused existed. Furthermore, when this task was carried out the number of participating tools was unknown.

Subject and metrics identification

Once the goals, benefits and costs of the benchmarking have been identified, its scope has to be defined by selecting which Semantic Web tools from the organization will participate, which of their functionalities will be measured, and which evaluation criteria will be used to assess these functionalities.

WebODE [Arpírez et al., 2003] is the ontology engineering platform developed by the Ontology Engineering Group of the UPM and the tool chosen to participate in the two benchmarking activities.

As the goal presented in the previous section was too general, the scope was refined to cover a concrete interoperability scenario. Section 1.3.2 presents the different ways in which Semantic Web technologies can interoperate. The most commonly used one and, therefore, the one considered here, is the indirect interchange of ontologies by storing them in a shared resource. We selected this mode because a direct interchange of ontologies would require developing interchange mechanisms for each pair of tools, which would be very costly.

In our case, the shared resource is a local filesystem where ontologies are stored in text files serialized using the RDF/XML syntax because this is the syntax most widely used in Semantic Web technologies.

Also, it was taken into consideration that Semantic Web tools have different knowledge representation models, and it may occur that two tools use the same model or that a tool uses the RDF(S) or the OWL model.

In this scenario, interoperability depends on two different tool functionalities: one that reads an ontology stored in the tool and writes it into an RDF(S) or OWL file (RDF(S)/OWL exporter from now on), and another that reads an RDF(S) or OWL file with an ontology and stores this ontology into the tool (RDF(S)/OWL importer from now on).
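The two functionalities can be pictured with a minimal sketch. It is only an illustration under strong assumptions: the tool's knowledge model is reduced to a set of (subclass, superclass) name pairs, the namespace http://example.org/onto# is hypothetical, only rdfs:Class and rdfs:subClassOf are handled, and none of the function names below belong to any of the benchmarked tools.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/onto#"  # hypothetical namespace, for illustration only


def _local(uri):
    """Strip the example namespace from a URI, if present."""
    return uri[len(EX):] if uri.startswith(EX) else uri


def export_rdfs(subclass_pairs, path):
    """Exporter: write the tool's class hierarchy into an RDF/XML file."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    root = ET.Element(f"{{{RDF}}}RDF")
    for sub, sup in sorted(subclass_pairs):
        cls = ET.SubElement(root, f"{{{RDFS}}}Class", {f"{{{RDF}}}about": EX + sub})
        ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": EX + sup})
    ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)


def import_rdfs(path):
    """Importer: read an RDF/XML file and rebuild the class hierarchy."""
    root = ET.parse(str(path)).getroot()
    pairs = set()
    for cls in root.findall(f"{{{RDFS}}}Class"):
        sub = _local(cls.get(f"{{{RDF}}}about"))
        for link in cls.findall(f"{{{RDFS}}}subClassOf"):
            pairs.add((sub, _local(link.get(f"{{{RDF}}}resource"))))
    return pairs


def interchange(origin_model):
    """Indirect interchange: the origin tool exports to a shared file and
    the destination tool imports from it."""
    with tempfile.TemporaryDirectory() as shared_dir:
        shared_file = Path(shared_dir) / "ontology.rdf"
        export_rdfs(origin_model, shared_file)
        return import_rdfs(shared_file)
```

With this toy model, interchanging {("Student", "Person"), ("Professor", "Person")} returns the same set of pairs. Real tools differ precisely in that their knowledge models do not map one-to-one onto the interchange language.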

The evaluation metrics must describe thoroughly the interoperability between an origin tool and a destination one. Therefore, to obtain detailed information on tool interoperability using an interchange language, we need to know:

The components of the knowledge model of an origin tool that can be interchanged with a destination tool¹.

The secondary effects of interchanging ontologies that include these components, such as insertion or loss of information.

The subset of the knowledge models of the tools that these tools can use to correctly interoperate.

The problems that arise when ontologies are interchanged between two tools and the causes of these problems.

1 In the rest of the document, for the sake of clarity, it will appear that “a tool exports/imports/interchanges some components”. This should be understood as “a tool exports/imports/interchanges ontologies that include the realisation of some components”.

Some specific evaluation criteria should be established for each experiment to assess the interoperability of the tools. The experiments to be performed should yield data informing how the tools comply with these criteria.
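As an illustration of the kind of data such criteria call for, the sketch below compares the ontology held by the origin tool with the one obtained in the destination tool and reports what was preserved, lost, or inserted. It assumes that ontologies are represented as sets of (subject, predicate, object) string triples without blank nodes; the names `InterchangeOutcome` and `compare` are hypothetical, and this is not the evaluation software used in the benchmarking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterchangeOutcome:
    preserved: frozenset  # triples present before and after the interchange
    lost: frozenset       # triples present only in the original ontology
    inserted: frozenset   # triples present only in the resulting ontology

    @property
    def same(self):
        """True when the interchange neither lost nor inserted information."""
        return not self.lost and not self.inserted


def compare(original, final):
    """Compare two ontologies given as iterables of triples."""
    original, final = frozenset(original), frozenset(final)
    return InterchangeOutcome(
        preserved=original & final,
        lost=original - final,
        inserted=final - original,
    )
```

A plain set difference is enough here only because of the simplifying assumptions; comparing real RDF graphs would also have to deal with blank nodes and with equivalent but syntactically different statements.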

Participant identification

The delimited benchmarking scope helps to identify the organization members that are related to the benchmarking and to form the benchmarking team responsible for continuing the benchmarking activities in the organization.

Because WebODE is being developed by the Ontology Engineering Group at the UPM, it was quite straightforward to identify and contact the members of the organization involved in WebODE’s RDF(S) and OWL importers and exporters. In both benchmarking activities, the team was formed by the author and by Jesús Prieto-González, an undergraduate student who provided support by developing experimentation-related software.

Proposal writing

The next task to perform was to compile all the benchmarking-related information into a benchmarking proposal, which should be used as a reference throughout the benchmarking.

To reach a broader audience, the benchmarking proposals did not take the form of paper documents but of publicly available web pages²³, which include all the relevant information about the benchmarking activities. Currently, this information contains the following:

Motivation and goals.

Benefits and costs.

Tools and people involved.

Description of the experiment.

Benchmark suites.

Planning.

Related events.

Results and recommendations.

Management involvement

These benchmarking proposals were presented to the Director of the Ontology Engineering Group and, after her analysis, she agreed on continuing the benchmarking activities and allocating future resources both for performing the experiment and for improving the tool.

2 http://knowledgeweb.semanticweb.org/benchmarking_interoperability/

3 http://knowledgeweb.semanticweb.org/benchmarking_interoperability/owl/

Benchmarking partners

Participation in the benchmarking was open to any organization irrespective of being a Knowledge Web partner or not. To find other best-in-class organizations willing to participate, the following actions were taken in the two benchmarking activities:

To research different ontology development tools, both freely available and commercial ones, which could export and import to and from RDF(S) or OWL and then, to contact the organizations that develop them.

To announce the interoperability benchmarking and to call for participation through the main mailing lists of the Semantic Web area and through lists specific to ontology development tools.

Table 4.2 presents the ontology development tools capable of importing and exporting RDF(S) that had been found when this task was performed in the RDF(S) Interoperability Benchmarking (by March 2005). Table 4.3 presents the ontology development tools capable of importing and exporting OWL that had been found when this task was performed in the OWL Interoperability Benchmarking (by April 2007). Their developers were directly contacted.

Any Semantic Web tool capable of importing and exporting RDF(S) or OWL could participate in the RDF(S) Interoperability Benchmarking or in the OWL Interoperability Benchmarking, respectively. In the case of the RDF(S) Interoperability Benchmarking, not only ontology development tools, but also RDF repositories participated.

Table 4.4 shows the six tools that took part in the RDF(S) Interoperability Benchmarking, three of which are ontology development tools: KAON, Protégé (using its RDF backend), and WebODE; the other three are RDF repositories: Corese⁴, Jena⁵ and Sesame⁶.

Table 4.5 shows the nine tools that took part in the OWL Interoperability Benchmarking: one ontology-based annotation tool, namely, GATE⁷; three ontology repositories: Jena⁸, KAON2⁹, and SWI-Prolog¹⁰; and five ontology development tools: the NeOn toolkit¹¹, Protégé-Frames¹², Protégé-OWL¹³, Semtalk¹⁴, and WebODE¹⁵.

Tool | Institution | URL
Construct | Network Inference | http://www.networkinference.com/products/constructit.html
DOE | I. National de l’Audiovisuel | http://homepages.cwi.nl/~troncy/DOE/
InferEd | Intellidimension | http://www.intellidimension.com/pages/site/products/infered/
IsaViz | W3C | http://www.w3.org/2001/11/IsaViz/
KAON | Universitat Karlsruhe | http://kaon.semanticweb.org/
Linkfactory Workbench | Language & Computing | http://www.landcglobal.com/pages/linkfactory.php
OilEd | University of Manchester | http://oiled.man.ac.uk/
OntoEdit Free | Ontoprise | http://www.ontostudio.de/
Open Ontology Forge | National Inst. of Informatics | http://research.nii.ac.jp/~collier/resources/OOF/
Protégé 2000 | Stanford University | http://protege.stanford.edu/
SemTalk | Semtation | http://www.semtalk.com/
SNOBASE | IBM | http://www.alphaworks.ibm.com/tech/snobase
Unicorn Workbench | Unicorn Solutions | http://www.unicorn.com/
Visual Ontology Modeler | Sandpiper Software | http://www.sandsoft.com/products.html
WebODE | U. Politécnica de Madrid | http://webode.dia.fi.upm.es/

Table 4.2: Ontology development tools able to import/export RDF(S) by March 2005.

Tool | Institution | URL
Altova Semanticworks | Altova | http://www.altova.com/products/semanticworks/
DOE | Inst. National de l’Audiovisuel | http://homepages.cwi.nl/~troncy/DOE/
DOME | DERI | http://dome.sourceforge.net/
GrOWL | University of Vermont | http://ecoinformatics.uvm.edu/technologies/index.html
Hozo | Osaka University | http://www.ei.sanken.osaka-u.ac.jp/hozo/eng/indexen.php
IBM IODT | IBM | http://www.alphaworks.ibm.com/tech/semanticstk
KAON2 | Universitat Karlsruhe | http://kaon2.semanticweb.org/
Linkfactory Workbench | Language & Computing | http://www.landcglobal.com/pages/linkfactory.php
m3t4 Studio | Metatomix | http://www.m3t4.com/
Medius Visual O.M. | Sandpiper Software | http://www.sandsoft.com/products.html
Model Futures OWL Editor | Model Futures | http://www.modelfutures.com/OwlEditor.html
The NeOn Toolkit | The NeOn project | http://www.neon-toolkit.org/
OntoTrack | University of Ulm | http://www.informatik.uni-ulm.de/ki/ontotrack/
Powl | University of Leipzig | http://aksw.informatik.uni-leipzig.de/Projects/Powl
Protégé-Frames | Stanford University | http://protege.stanford.edu/
Protégé-OWL | University of Manchester | http://protege.stanford.edu/
SemTalk | Semtation | http://www.semtalk.com/
SWOOP | University of Maryland | http://www.mindswap.org/2004/SWOOP/
Topbraid Composer | TopQuadrant | http://www.topbraidcomposer.com/
VisioOWL | John Flynn | http://mysite.verizon.net/jflynn12/VisioOWL/VisioOWL.htm
WebODE | U. Politécnica de Madrid | http://webode.dia.fi.upm.es/WebODEWeb/index.html

Table 4.3: Ontology development tools able to import/export OWL by April 2007.

Tool | Version | Developer | Experimenter
Corese | 2.1.2 | INRIA | INRIA
Jena | 2.3 | HP | U. P. Madrid
KAON | 1.2.9 | U. Karlsruhe | U. Karlsruhe
Protégé | 3.2 beta build 230 | Stanford U. | U. P. Madrid
Sesame | 2.0 alpha 3 | Aduna | U. P. Madrid
WebODE | 2.0 build 109 | U. Politécnica de Madrid | U. P. Madrid

Table 4.4: Tools participating in the RDF(S) Interoperability Benchmarking.

Tool | Version | Developer | Experimenter
GATE | 4.0 | Sheffield U. | Sheffield U.
Jena | 2.3 | HP | U. P. Madrid
KAON2 | 2006-09-22 | Karlsruhe U. | Karlsruhe U.
NeOn Toolkit | 1.0 build 823 | The NeOn project | U. P. Madrid
Protégé | 3.3 build 395 | Stanford U. | CERTH
Protégé-OWL | 3.3 build 395 | Manchester U. | CERTH
SemTalk | 2.3 | Semtation | Semtation
SWI-Prolog | 5.6.35 | U. of Amsterdam | U. of Amsterdam
WebODE | 2.0 build 140 | U. P. Madrid | U. P. Madrid

Table 4.5: Tools participating in the OWL Interoperability Benchmarking.

As tables 4.4 and 4.5 show, the experiment was not always performed by tool developers. Furthermore, in the RDF(S) Interoperability Benchmarking some tools executed the experiments more times than others because the tools entered the benchmarking at different times (see section 5.2).

In the two benchmarking activities, the participating tools presented a variety of knowledge models. The knowledge models of the tools are the following:

Corese’s knowledge model enables processing RDF(S) and OWL Lite within the Conceptual Graphs formalism [Corby and Faron-Zucker, 2002].

GATE’s knowledge model consists of a class hierarchy with a growing level of expressivity. The expressivity of this model is aimed at being broadly equivalent to OWL Lite [Bontcheva et al., 2004].

Jena’s knowledge model supports RDF and ontology formalisms built on top of RDF. Specifically this means RDF(S), the varieties of OWL, and the now-obsolete DAML+OIL [McBride, 2001].

KAON’s knowledge model is an extension of RDF(S) that contains the essential modelling primitives of frame-based systems [Motik et al., 2002].

KAON2’s knowledge model is capable of manipulating the SHIQ(D) subset of OWL-DL and F-Logic [Motik and Sattler, 2006].

The NeOn Toolkit fully supports F-Logic modelling. Native support of OWL is currently under development [Erdmann and Wenke, 2007].

Protégé’s knowledge model is based on a flexible metamodel, which is comparable to object-oriented and frame-based systems [Noy et al., 2000].

Protégé-OWL’s knowledge model supports RDF(S), OWL Lite, OWL DL and significant parts of OWL Full [Knublauch et al., 2004].

SemTalk’s knowledge model supports modelling RDF(S) and OWL using Visio [Fillies and Weichhardt, 2005].

Sesame’s knowledge model allows managing RDF(S) [Broekstra et al., 2002].

SWI-Prolog’s knowledge model supports RDF(S) and OWL on top of Prolog [Wielemaker et al., 2008].

WebODE’s knowledge model is based on frames and is extracted from the intermediate representations of METHONTOLOGY [Arpírez et al., 2003].

The experiments carried out over the NeOn Toolkit were done in the scope of the NeOn European project¹⁶ and the analysis of the NeOn Toolkit interoperability is presented in [García-Castro, 2007b]. These interoperability results are not included in this thesis as they are restricted to the NeOn partners.

Planning and resource allocation

The main deadline of the benchmarking was imposed by that of the benchmarking in Knowledge Web. Therefore, a plan had to be designed that included the Plan and Experiment phases, though this only contained the first task of the Improvement phase (Benchmarking report writing).

This plan was developed and agreed on by all the organizations participating in the benchmarking activities; besides, every organization had to assign a number of people to perform the process.

4.4.2. Experiment phase

Experiment definition

The experiments performed in the benchmarking activities had to provide data informing how the Semantic Web tools comply with the evaluation criteria established in the previous phase.

On the other hand, interoperability using an interchange language depends on the capabilities of the tools to import ontologies from the language (to read one file with an ontology and to store this ontology in the tool knowledge model) and to export ontologies to the language (to write into a file an ontology stored in the tool knowledge model). Therefore, the experiments provided data not only of the interoperability but also of the tool importers and exporters.

16 http://www.neon-project.org/

As mentioned before, participation in the two benchmarking activities was open to any Semantic Web tool. However, the experiments required that the tools be able to import and export RDF(S) ontologies in one case, and OWL ontologies in the other.

For the experiments, any group of ontologies can be used as input, but having real, large or complex ontologies is of little use if we do not know whether the tools can interchange simple ontologies correctly. Moreover, because one of the goals of the benchmarking is to improve the tools, the ontologies must be simple so that problems can be identified and their causes isolated.

Therefore, to obtain the required experiment data, the author defined four benchmark suites to be used, which were common to all the tools. Three benchmark suites were used in the RDF(S) Interoperability Benchmarking (the RDF(S) Import, Export, and Interoperability Benchmark Suites) and one in the OWL Interoperability Benchmarking (the OWL Lite Import Benchmark Suite).
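The reasoning behind using simple ontologies can be illustrated with a small sketch. It is purely hypothetical: the benchmark identifiers, descriptions and contents below are invented for illustration and are not taken from the actual benchmark suites, and a tool is modelled as a function that receives a set of triples and returns the triples it ends up with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    identifier: str
    description: str
    triples: frozenset  # the simple ontology used as input


# Hypothetical benchmarks: each one exercises a single, small combination
# of components so that a failure points to one possible cause.
SUITE = [
    Benchmark("B01", "a single class", frozenset({
        ("ex:A", "rdf:type", "rdfs:Class"),
    })),
    Benchmark("B02", "one class that is a subclass of another", frozenset({
        ("ex:A", "rdf:type", "rdfs:Class"),
        ("ex:B", "rdf:type", "rdfs:Class"),
        ("ex:B", "rdfs:subClassOf", "ex:A"),
    })),
    Benchmark("B03", "a single property", frozenset({
        ("ex:p", "rdf:type", "rdf:Property"),
    })),
]


def run_suite(suite, tool):
    """Map each benchmark identifier to whether `tool` returned the
    input ontology unchanged."""
    return {b.identifier: frozenset(tool(b.triples)) == b.triples for b in suite}


def drops_subclass_links(triples):
    """Hypothetical faulty tool that loses rdfs:subClassOf statements."""
    return {t for t in triples if t[1] != "rdfs:subClassOf"}
```

Running `run_suite(SUITE, drops_subclass_links)` marks only B02 as failed, which points directly at the handling of subclass relations. With a single large ontology the same defect would be much harder to locate.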

The quality of the benchmark suites used is essential for the results of the benchmarking. Therefore, once the benchmark suites were defined, the first step in the two benchmarking activities was to validate these benchmark suites and to agree on their definition. To this end, they were published on the benchmarking web pages so that they could be reviewed by the participants.

The benchmark suites were also validated and refined in reviews performed by Knowledge Web partners in several meetings. In the case of the RDF(S) Interoperability Benchmarking, a workshop was organized by the author in Madrid on October 10th-11th, 2005¹⁷, where the participants presented some of their first experiences in using the benchmark suites and evaluated their tools in a hands-on session.

The experimentation planning of the two benchmarking activities was defined so that their deadlines would coincide with the Knowledge Web deadlines, the dates by which the benchmarking results had to be delivered. Therefore, the planning included the Plan and Experiment phases, though it contained only the first task of the Improvement phase (Benchmarking report writing).

This planning was developed and agreed on by all the organizations participating in the benchmarking activities; besides, every organization had to assign a number of people to participate in the experiments.

17 http://knowledgeweb.semanticweb.org/benchmarking_interoperability/working_days/

In document Benchmarking Semantic Web technology (Page 94-105)