Future Work - Trusted Collaboration in Distributed Software Development

This section outlines areas where further research can build on the issues discussed in our self-evaluation and related work. We identify several promising areas for future work: architecture, ontology, logic, and federation extensions. We also briefly discuss how performance might be improved.

6.3.1 Architectural Improvements

6.3.1.1 GRIA

The use of GRIA in our federation scenarios has shown how easy it is to develop a SPARQL service for querying our version control metadata. While our wiki interface is useful for viewing documents and their histories online, third-party access is an obvious advantage. Expanding our use of GRIA will enable us to create more complex trust relationships between developers and service providers who store the trusted metadata. One example would be to include constraints on accessing our SPARQL service using Service Level Agreements (SLAs).

[2] http://www.joseki.org/.
[3] http://darq.sourceforge.net/.

6.3.1.2 Maven 2

Another extension might include the integration of the Maven 2 project management system [Massol et al. (2006)]. Projects hosted on our online collaborative tool could use Maven 2 [4] for full building, testing and deployment services. This would potentially make the online collaborative tool a trusted compilation platform.

6.3.2 RDF Digital Signature Improvements

There are two key areas where we can improve on our RDF digital signature mechanism: the canonicalisation algorithm and the use of trust metrics. Improvements in both of these areas would increase the reliability of our online collaborative tool and create a solution that could be reused elsewhere.

As we noted in Appendix C, nauty is perfectly capable of canonicalising complex unlabelled graphs like the Petersen graph. One option would be to extend nauty so that it can understand labelled edges and Named Graphs. This would mean our RDF digital signature mechanism would be able to sign over arbitrary RDF graphs, removing the need for our current conservative canonicalisation approach.

It appears that recent work by Tan et al. (2006b) is particularly relevant to our research. It would be advantageous to collaborate in future work to establish how PASOA-based provenance and our Named Graph approach can be integrated. The interest in trust metrics in Tan et al. (2006a) is also timely, since we believe that by expanding our RDF digital signature mechanism to support trust metrics, we can leverage existing approaches that have been developed in parallel to our own work. It is highly likely that integrating these approaches with research by Dimitrakos et al. (2001); Bizer (2004b) and Golbeck and Hendler (2004a,c) will open up new avenues of collaboration.
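To make the canonicalisation problem concrete, the following Python sketch shows why graphs without blank nodes are straightforward to sign while arbitrary graphs are not. It is an illustration only and is not the mechanism used in this work: the function names are hypothetical, terms are assumed to be already serialised in N-Triples form, and SHA-256 stands in for whatever digest a real signature scheme would use.

```python
import hashlib
from typing import Set, Tuple

# A triple is (subject, predicate, object). Terms are assumed to be URIs or
# literals already written as N-Triples terms. This is an illustrative
# simplification, not the canonicalisation used by the thesis.
Triple = Tuple[str, str, str]


def canonical_form(triples: Set[Triple]) -> str:
    """Serialise a ground graph deterministically: one line per triple, sorted."""
    for triple in triples:
        if any(term.startswith("_:") for term in triple):
            # Blank node labels are arbitrary, so sorting alone cannot give
            # isomorphic graphs the same serialisation.
            raise ValueError("blank nodes require true graph canonicalisation")
    lines = sorted(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines) + "\n"


def graph_digest(triples: Set[Triple]) -> str:
    """Hash the canonical form; a signature would be computed over this digest."""
    return hashlib.sha256(canonical_form(triples).encode("utf-8")).hexdigest()
```

Rejecting blank nodes outright is the simplest conservative choice. A canonical labelling tool extended to labelled edges, as suggested above, would be needed to lift that restriction.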

6.3.3 Ontology Extensions

Just like the documents we put under version control in our online collaborative tool, ontologies develop over time as requirements change [Noy and Klein (2004)]. While it is not envisaged that the ontology used as the basis for version control be managed by the same system, there may come a point where extensions to the ontology are vital for future development. Other developers, for example, may want to improve the ontology, which will require change management.

6.3.3.1 Advanced Software Project Management

Software project management systems, for example Maven 2 [Massol et al. (2006)], have become a fast and efficient way to manage and automate the build process of both simple and complex projects. If we were to consider Maven 2 as part of the core architecture, it would be necessary to model the Maven 2 build life-cycle so as to capture the progress of a build.
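As a rough indication of what such a model might capture, the sketch below lists the principal phases of Maven's default life-cycle in order and records build progress as triples. Only the phase names come from Maven; the `ex:` vocabulary, the build identifier and the function names are hypothetical, and the intermediate phases (such as resource and source generation) are omitted for brevity.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Principal phases of Maven's default life-cycle, in execution order."""
    VALIDATE = 1
    COMPILE = 2
    TEST = 3
    PACKAGE = 4
    INTEGRATION_TEST = 5
    VERIFY = 6
    INSTALL = 7
    DEPLOY = 8


def phases_run(target):
    """Invoking a phase runs every earlier phase first."""
    return [phase for phase in Phase if phase <= target]


def progress_triples(build, target, failed_at=None):
    """Describe a build as triples using a hypothetical 'ex:' vocabulary."""
    triples = []
    for phase in phases_run(target):
        term = f"ex:{phase.name.lower()}"
        if failed_at is not None and phase == failed_at:
            # Record the failure and stop: later phases never ran.
            triples.append((build, "ex:failedPhase", term))
            break
        triples.append((build, "ex:completedPhase", term))
    return triples
```

For example, `progress_triples("ex:build42", Phase.PACKAGE, failed_at=Phase.TEST)` records `validate` and `compile` as completed and `test` as failed, which is the kind of statement an ontology of the build life-cycle would need to express.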

6.3.3.2 Intellectual Property Rights Management

While our RDF digital signature mechanism supports non-repudiation when using a PKI and can help enforce Intellectual Property Rights (IPR), we have not written an ontology to represent these rights. González (2005) suggests an interesting approach to developing an OWL ontology for Digital Rights Management. This approach could be used as a first step towards creating a generic ontology for IPR attribution.

6.3.4 Logic Extensions

6.3.4.1 Non-monotonic Reasoning

One key advantage of DL over relational database systems that we have noted during this research is the ability to leverage explicit knowledge and generate implicit knowledge using inference rules. Inferences are not limited to just RDF, RDFS and OWL entailments; our work has demonstrated that useful information can be generated with custom inference rules. These inferences, however, operate under monotonic, open world semantics in line with Semantic Web “best practices”. As SWRL becomes the mainstream language for rule composition, some researchers are beginning to advocate non-monotonic extensions to OWL [Katz and Parsia (2005); Hitzler et al. (2005)]. Others suggest combining the use of open world reasoning with closed world reasoning at a local level [Grimm and Motik (2005); Kolovski et al. (2005); Ng (2005)].
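The difference between the two styles of reasoning can be illustrated with a small Python sketch. It is not taken from our rule base: the predicate names and facts are hypothetical. The first function is a monotonic rule, whose conclusions only grow as facts are added. The second applies negation as failure over the locally known facts, so a conclusion can be withdrawn when a new fact arrives.

```python
def trusted(facts):
    """Monotonic rule: a revision is trusted if its committer also signed it."""
    return {
        rev
        for (who, pred, rev) in facts
        if pred == "committed" and (rev, "signedBy", who) in facts
    }


def unsigned_closed_world(facts):
    """Negation as failure: a revision with no known signature is unsigned.

    Under open world semantics this conclusion would not be drawn, because a
    signature might exist somewhere that we have not yet seen.
    """
    revisions = {rev for (_, pred, rev) in facts if pred == "committed"}
    signed = {rev for (rev, pred, _) in facts if pred == "signedBy"}
    return revisions - signed


facts = {
    ("alice", "committed", "r1"),
    ("r1", "signedBy", "alice"),
    ("bob", "committed", "r2"),
}
more_facts = facts | {("r2", "signedBy", "bob")}
```

With `facts`, `r1` is trusted and `r2` is reported as unsigned. With `more_facts`, the set of trusted revisions grows to include `r2`, while the closed world conclusion that `r2` is unsigned disappears, which is exactly the non-monotonic behaviour.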

Web service description languages such as the Web Service Modelling Ontology (WSMO) [5] define a set of non-monotonic extensions to an otherwise monotonic framework. Similar non-monotonic extensions could also be applied to our Named Graph work.

6.3.5 Federation Extensions

6.3.5.1 Process-based Workflow

We have noted some of the management and maintenance issues regarding Jena 2-based inference rules. Although the declarative approach used is flexible, it makes data flow difficult to track, and can be computationally expensive depending on the expressivity of the rules and procedural builtins used.

Another approach that could be used to complement declarative rule languages is process-based workflow. Process-based workflow, while sequential in its execution, makes data flow easy to track and is supported by industry standards such as BPEL 2.0, which the rule domain currently lacks. Microsoft has gone some way toward this integration with Windows Workflow Foundation (WF), which is capable of firing rules sequentially [Young (2005)]. Workflows could be used for data flow and orchestration, firing rules as they are required. It is also conceivable that further work here could produce useful results that work across different platforms.
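The idea of a workflow that fires rules in a fixed order can be sketched in a few lines of Python. This is a toy model rather than BPEL or WF: the step names, rules and facts are hypothetical, and each rule is simply a function from the current facts to newly derived facts. The point is that the workflow records what each step contributed, which is the data flow that a purely declarative rule set makes hard to see.

```python
def run_workflow(steps, facts):
    """Run (name, rule) steps in order and record what each step derived."""
    trace = []
    for name, rule in steps:
        derived = rule(facts) - facts          # keep only genuinely new facts
        trace.append((name, sorted(derived)))  # the data flow for this step
        facts = facts | derived                # later steps see earlier results
    return facts, trace


def mark_signed(facts):
    """Hypothetical rule: anything with a signature has status 'signed'."""
    return {(s, "status", "signed") for (s, p, _) in facts if p == "signedBy"}


def mark_trusted(facts):
    """Hypothetical rule: anything with status 'signed' is also 'trusted'."""
    return {
        (s, "status", "trusted")
        for (s, p, o) in facts
        if p == "status" and o == "signed"
    }


initial = {("r1", "signedBy", "alice"), ("r2", "committedBy", "bob")}
final, trace = run_workflow(
    [("mark signed", mark_signed), ("mark trusted", mark_trusted)], initial
)
```

Because execution is sequential, ordering matters: running `mark_trusted` before `mark_signed` derives nothing in the first step. A forward-chaining rule engine would reach the same final facts regardless of order, but without the per-step trace.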

6.3.5.2 SPARQL Query Protocol

We have taken advantage of only a small subset of SPARQL’s language. SPARQL also defines a query protocol which might be useful to employ, especially through a SOAP interface like that of Joseki [6]. Since SPARQL is now in its last call, all features should now be stable; this will encourage adoption.

[5] http://www.wsmo.org/.

It would be reasonable to take the SPARQL query service we developed for our federation scenarios and develop it into a complete SPARQL protocol service. A client should also be developed that leverages DARQ.
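As a pointer to what a protocol client involves, the sketch below builds a request URL for the HTTP GET binding of the SPARQL protocol, in which the query text travels in a `query` parameter and default graphs in `default-graph-uri` parameters. It uses the HTTP binding rather than SOAP purely for brevity. The endpoint address, graph name and `ex:` vocabulary are hypothetical and do not describe our deployed service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical query over version control metadata; the vocabulary is
# invented for this example.
QUERY = """PREFIX ex: <http://example.org/vocab#>
SELECT ?doc ?author
WHERE { ?doc ex:modifiedBy ?author . }"""


def sparql_get_url(endpoint, query, default_graphs=()):
    """Build a SPARQL protocol GET URL for the given endpoint and query."""
    params = [("query", query)]
    params += [("default-graph-uri", graph) for graph in default_graphs]
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(
    "http://example.org/sparql",            # hypothetical endpoint
    QUERY,
    ["http://example.org/graphs/trunk"],    # hypothetical Named Graph
)
```

The resulting URL can be fetched with any HTTP client. A complete protocol service would also need to handle result formats and fault responses, which are not shown here.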

6.3.5.3 Natural Language Processing

Unfortunately, the interface between application and Semantic Web query mechanisms is such that it is difficult to dynamically create queries at runtime. This needs to be improved, especially when queries are used in conjunction with semantic inferences. Natural Language Processing (NLP) is one approach that maps basic English onto ontology concepts and roles.
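The mapping from basic English onto concepts and roles can be illustrated with a deliberately naive Python sketch. It is keyword matching rather than real NLP, and the lexicon, the `ex:` vocabulary and the function name are all hypothetical. It is intended only to show how a question might be turned into a query at runtime.

```python
# Hypothetical lexicon mapping English words onto ontology concepts and roles.
CONCEPTS = {"documents": "ex:Document", "revisions": "ex:Revision"}
ROLES = {"modified by": "ex:modifiedBy", "signed by": "ex:signedBy"}
PREFIX = "PREFIX ex: <http://example.org/vocab#>\n"


def to_sparql(question):
    """Translate a question such as 'Which documents were modified by alice?'."""
    text = question.lower().strip(" ?")
    concept = next((uri for word, uri in CONCEPTS.items() if word in text), None)
    role = next(((phrase, uri) for phrase, uri in ROLES.items() if phrase in text), None)
    if concept is None or role is None:
        raise ValueError("question does not match the known concepts and roles")
    phrase, role_uri = role
    # Everything after the role phrase is taken as the value being asked about.
    value = text.split(phrase, 1)[1].strip()
    return PREFIX + f'SELECT ?x WHERE {{ ?x a {concept} ; {role_uri} "{value}" . }}'
```

The sketch lower-cases its input and performs no escaping, so it is unsuitable for real use. A practical system would need proper parsing and would have to take account of the inferences available over the ontology.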

6.3.6 Performance Enhancements

At present there are two issues that need to be addressed before Semantic Web toolkits will improve in their performance: triplestore database schemata and indexing. Unfortunately, efficient indexing is linked to the schema used to represent the RDF. It may be that rather than using SQL to perform the indexing, it will be necessary to let a higher level library perform this task.

To our knowledge, none of the major Semantic Web toolkits share the same database schema for persistent storage. Each takes its own approach, which can mean different performance depending on the toolkit used. NG4J, for example, takes a naïve approach to persistent storage, keeping all components of a quad (graph name, subject, predicate, object) in the same table. It might be more productive to investigate the Oracle approach where different components of an RDF triple are kept in different tables.
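The two storage styles can be contrasted with a small SQLite sketch in Python. The table and column names are hypothetical and reproduce neither NG4J's nor Oracle's actual schema. The `quads` table keeps every component as text in one row, while the `nodes` and `links` tables store each distinct term once and refer to it by integer identifier, which keeps the indexed columns compact.

```python
import sqlite3


def make_store():
    """Create an in-memory database holding both illustrative layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Single-table layout: all four components stored as text.
        CREATE TABLE quads (g TEXT, s TEXT, p TEXT, o TEXT);
        -- Normalised layout: each distinct term is stored once.
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
        CREATE TABLE links (graph_id INTEGER, subject_id INTEGER,
                            predicate_id INTEGER, object_id INTEGER);
        CREATE INDEX links_pos ON links (predicate_id, object_id, subject_id);
    """)
    return conn


def node_id(conn, value):
    """Return the identifier for a term, inserting it on first use."""
    conn.execute("INSERT OR IGNORE INTO nodes (value) VALUES (?)", (value,))
    return conn.execute("SELECT id FROM nodes WHERE value = ?", (value,)).fetchone()[0]


def add_quad(conn, g, s, p, o):
    """Store the same quad in both layouts so they can be compared."""
    conn.execute("INSERT INTO quads VALUES (?, ?, ?, ?)", (g, s, p, o))
    ids = tuple(node_id(conn, term) for term in (g, s, p, o))
    conn.execute("INSERT INTO links VALUES (?, ?, ?, ?)", ids)


def subjects_with(conn, predicate, obj):
    """Look up subjects through the normalised layout."""
    rows = conn.execute("""
        SELECT subj.value FROM links
        JOIN nodes AS subj ON subj.id = links.subject_id
        JOIN nodes AS pred ON pred.id = links.predicate_id
        JOIN nodes AS obj  ON obj.id  = links.object_id
        WHERE pred.value = ? AND obj.value = ?
        ORDER BY subj.value
    """, (predicate, obj))
    return [row[0] for row in rows]
```

The normalised layout trades extra joins for smaller rows and indexes. Whether that trade is favourable depends on the data and the query mix, which is precisely what a performance investigation would need to measure.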
