3.5 Querying Semantic Version Control
3.5.2 IST Project Questions
1. A digital signature fails in the repository; search for other information that can help determine trust:
As in Section 3.5.1, the repository administrator must take steps to discover the cause of the digital signature failure. Since an IST project has a smaller collaboration community, this analysis should be slightly easier, and the administrator should have more information at their disposal.
The administrator should first check whether the author’s public key (FOAF) has been signed by one of the known CAs in the project consortium.
The author’s FOAF description and partner CAs are just two examples of federable information published on the project webpage. Additional checks of the published project DOAP description will reveal whether the author is supposed to be making commits to the source code based on their responsibilities in the associated workpackage.
The repository should then produce a report based on the above information. As well as providing information on the number of changes since the last verified commit, the repository can offer the administrator the option to override the broken signature to fix the problem. Future verifications ignore the original signature in favour of the new one.
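The checks described in this scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Author` and `signature_failure_report`, the CA and workpackage labels, and the rule that the override is offered only when both checks pass are hypothetical and are not taken from the implementation described later.

```python
from dataclasses import dataclass, field


@dataclass
class Author:
    """Hypothetical summary of an author's published FOAF and DOAP data."""
    name: str
    key_signed_by: str                      # CA that signed the author's public key
    workpackages: set = field(default_factory=set)


def signature_failure_report(author, trusted_cas, commit_workpackage,
                             changes_since_verified):
    """Collect the trust evidence for a failed signature into one report."""
    key_trusted = author.key_signed_by in trusted_cas
    authorised = commit_workpackage in author.workpackages
    return {
        "author": author.name,
        "key_signed_by_known_ca": key_trusted,
        "authorised_for_workpackage": authorised,
        "changes_since_last_verified_commit": changes_since_verified,
        # Assumption: the override is only offered when both checks pass.
        "override_offered": key_trusted and authorised,
    }


report = signature_failure_report(
    Author("alice", "CA-PartnerA", {"WP3"}),
    trusted_cas={"CA-PartnerA", "CA-PartnerB"},
    commit_workpackage="WP3",
    changes_since_verified=4,
)
```

In a real deployment the inputs would be read from the federated FOAF, DOAP and CA descriptions rather than passed in as literals.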
2. Source code integration federation
During an integration phase, project partners need access to source code from other partners to produce an integrated prototype. A more convenient method is for metadata from each contributing partner to be federated to the integrating partner. To keep track of IPR attribution, the integrating partner will tag each set of metadata with a digital signature to assert where it came from. Then, in a similar fashion to Section 3.5.1, the integrating partner will access source code directly from the remote contributing partners’ repositories.
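The tagging step can be illustrated with a short Python sketch. It is an assumption-laden stand-in: a keyed HMAC from the standard library replaces the PKI-based digital signature discussed in the text, and the function names, partner labels, key and metadata fields are all hypothetical.

```python
import hashlib
import hmac
import json


def _payload(origin, metadata):
    # Bind the origin to the metadata so that neither can be changed alone.
    return json.dumps({"origin": origin, "metadata": metadata},
                      sort_keys=True).encode("utf-8")


def tag_metadata(origin, metadata, key):
    """Wrap a partner's metadata with its origin and an integrity tag.

    The HMAC is a stand-in for a real digital signature (assumption).
    """
    tag = hmac.new(key, _payload(origin, metadata), hashlib.sha256).hexdigest()
    return {"origin": origin, "metadata": metadata, "tag": tag}


def verify_tagged(record, key):
    """Return True if the record's origin and metadata match its tag."""
    expected = hmac.new(key, _payload(record["origin"], record["metadata"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])


integrator_key = b"integrator-demo-key"     # hypothetical key material
federated = [
    tag_metadata("partner-a", {"repository": "https://a.example/repo",
                               "revision": 12}, integrator_key),
    tag_metadata("partner-b", {"repository": "https://b.example/repo",
                               "revision": 7}, integrator_key),
]
all_valid = all(verify_tagged(record, integrator_key) for record in federated)
```

Because the origin is part of the tagged payload, a record whose origin is later altered no longer verifies, which is the property the attribution tag is meant to provide.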
Unlike the FLOSS scenarios, which operate in a more open environment, both IST scenarios require greater trust in federated information before it can be considered reliable. DOAP, FOAF, and CA information must be published by a trusted party (the project coordinator); otherwise, answers in the event of a signature failure could be misleading. If wrongdoing were discovered in an IST project and proved with these questions, contract obligations could be employed to resolve issues. In the case of FLOSS projects, all a repository administrator can do is ban the CA of the committer and manually check the affected source code.
The above scenarios and queries will form the basis for our analysis of the benefits of Semantic Web technology in Chapter 5. We will expand on these questions, demonstrating how they can be implemented in practice and providing experimental performance results.
3.6 Summary
Distributed collaborative software development, by its very nature, relies on interaction with third-party remote servers. Analysis of two approaches to version control reveals that the current RDBMS-based version control systems do not provide the support necessary for successful inter-domain development as required by our two case studies: FLOSS and EC IST Framework projects. In each case study there is a need for collaborators not to trust the integrity of the remote host implicitly, but rather to rely on the integrity of the repository’s metadata, secured using digital signatures and a PKI.
Semantic Web technologies offer another approach that should be considered. Unlike an RDBMS, RDF graphs appear ideal for data federation, which is desirable in distributed collaboration. The selection of Semantic Web technology in version control can be seen as a significant test of whether it is a valuable and practical technology. We have noted issues that still need to be resolved before uptake improves; further analysis can be found in Chapter 5. Issues that should be addressed in our design include canonical RDF and provenance.
Nevertheless, the capabilities of SVN, GNU Arch, etc. represent a baseline version control capability which has proven itself over the years. Both Subversion and GNU Arch have introduced new architectural refinements, for example, file hierarchy restructuring, scalable and distributed repositories, and atomic commits to version control. It might therefore be productive to introduce some of the lessons of RDBMS version control into a Semantic Web DL approach.
In the next chapter we describe a design for an ontology for version control that uses Named Graphs as a mechanism for provenance. The purpose of this ontology is to act as the schema for a knowledge base that provides the same functionality as a Subversion database. Our ontology leverages existing Description Logics and integrates with a new method of RDF digital signatures that promotes explicit trusted collaboration based on an established PKI. We go on to describe how our ontology, Named Graphs and digital signatures form the basis of an online collaborative tool that can support the requirements of our case studies.
Design and Implementation
This chapter provides an overview of our version control ontology, the use of Named Graphs for provenance, and security considerations in the form of digital signatures. We provide our rationale for using DL as the underlying logic; we also demonstrate ontology extension and argue for re-using other ontologies to promote interoperability on the Semantic Web. Our design shows how Semantic Web technology can be used as the foundations of a next-generation version control system that supports distributed collaborative software development.
There are two parts to our design: Document Provenance, which provides descriptive provenance based on DL, similar to the related work in Section 2.3.1; and IPR attribution, which is based upon Named Graphs. Attribution is enforced by RDF digital signatures that help maintain integrity, forming the foundations of developer trust and accountability in an otherwise open environment.
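To make the role of Named Graphs concrete, the following Python sketch treats each named graph as the unit of attribution and computes an order-independent digest over its triples, which is the kind of value a signature would then cover. It is a simplified illustration, not the signing method of this design: the URIs are hypothetical, plain tuples stand in for an RDF store, and sorting the triples is only a valid canonical form under the assumption that the graph contains no blank nodes.

```python
import hashlib
import json
from collections import defaultdict

# Each quad is (graph name, subject, predicate, object); all URIs are made up.
quads = [
    ("urn:example:commit-1", "urn:example:doc/readme",
     "urn:example:author", "urn:example:alice"),
    ("urn:example:commit-1", "urn:example:doc/readme",
     "urn:example:version", "1"),
    ("urn:example:commit-2", "urn:example:doc/readme",
     "urn:example:author", "urn:example:bob"),
    ("urn:example:commit-2", "urn:example:doc/readme",
     "urn:example:version", "2"),
]


def group_by_graph(quads):
    """Collect the triples belonging to each named graph."""
    graphs = defaultdict(set)
    for graph, s, p, o in quads:
        graphs[graph].add((s, p, o))
    return dict(graphs)


def graph_digest(triples):
    """Digest of a set of triples that does not depend on statement order.

    Assumes no blank nodes; real RDF canonicalisation is more involved.
    """
    canonical = json.dumps(sorted(triples)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


graphs = group_by_graph(quads)
digests = {name: graph_digest(triples) for name, triples in graphs.items()}
```

Each commit's statements can then be attributed and verified independently of the others, since a change to one named graph leaves the digests of the remaining graphs untouched.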
4.1 Ontology Design Overview
Software version control repositories like SVN manage the changes made to documents over time. SVN uses a bespoke metadata format to record the author, description and version of a document, and this metadata cannot readily be shared externally. Unlike CVS, which uses a form of delta versioning [Hudson (2002)] on documents only, SVN is also capable of versioning directory structures and metadata. A well-known consequent restriction of CVS is its assumption of long-lived file names and, particularly, directory structures.
Another immediate problem with older tools such as CVS is that they keep the history metadata and delta versioning information together in the same logical structure. The Delta-V Working Group addressed this problem by separating the history and version metadata. Subversion [Collins-Sussman et al. (2004)] also addresses this problem, introducing a relational database to store metadata. To develop this further and leverage the rich tools of the Semantic Web, we introduce Document Provenance, a Description Logic (DL) [Baader et al. (2003)] framework based on open standards which can be used for semantic version control and validation.