Handling Graph-Based Data - Centruflow Server

4.3 Centruflow Server

4.3.3 Handling Graph-Based Data

Graph-based data is the lifeblood of a Centruflow visualisation, and providing this functionality was therefore the main goal of Centruflow Server. Unsurprisingly, most data inside companies is stored in relational databases, where relationships are declared through primary and foreign keys. Even so, it would be short-sighted to treat relational databases as the sole repository for graph-based data; other examples include LDAP directories, RDF files, SPARQL endpoints, and other proprietary systems.

4.3.3.1 High-Level Architecture

The graph-based aspect of the Centruflow Server application is a combination of Java servlets and Java Server Pages (JSP), backed by a substantial semantic web layer built on the Jena semantic web framework [78]. It exposes each distinct data source as a distinct SPARQL endpoint, allowing any data source to be queried using the SPARQL query language. Whenever a SPARQL query is received by an endpoint, Centruflow Server converts it into the appropriate SQL queries⁵, which can then be submitted directly to the necessary databases. When information is returned from the database, it is converted into an RDF graph before being returned to the Centruflow client.
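To make the last step concrete, the following is a minimal sketch of how a single relational row could be expressed as RDF statements, with literal columns becoming literal objects and foreign keys becoming links to other resources. It is an illustration only and not the Centruflow Server or D2RQ implementation: the class name, the `http://example.org/` base URI, and the URI layout are all assumptions, and the escaping is deliberately minimal.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration: turn one relational row into N-Triples-style statements. */
final class RowToTriples {

    // Hypothetical base URI, chosen for the example only.
    static final String BASE = "http://example.org/";

    /**
     * @param table       table name, used to mint the subject URI
     * @param primaryKey  primary key value of the row
     * @param literals    column name -> literal value
     * @param foreignKeys column name -> "referencedTable/key" of the referenced row
     */
    static List<String> toTriples(String table, String primaryKey,
                                  Map<String, String> literals,
                                  Map<String, String> foreignKeys) {
        String subject = "<" + BASE + table + "/" + primaryKey + ">";
        List<String> triples = new ArrayList<>();
        // Ordinary columns become literal-valued properties of the row.
        for (Map.Entry<String, String> e : literals.entrySet()) {
            String predicate = "<" + BASE + table + "#" + e.getKey() + ">";
            triples.add(subject + " " + predicate + " \"" + escape(e.getValue()) + "\" .");
        }
        // Foreign keys become edges to the resource for the referenced row.
        for (Map.Entry<String, String> e : foreignKeys.entrySet()) {
            String predicate = "<" + BASE + table + "#" + e.getKey() + ">";
            String object = "<" + BASE + e.getValue() + ">";
            triples.add(subject + " " + predicate + " " + object + " .");
        }
        return triples;
    }

    // Minimal escaping of backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("name", "Alice");
        Map<String, String> foreignKeys = new LinkedHashMap<>();
        foreignKeys.put("department", "department/7");
        toTriples("employee", "42", literals, foreignKeys).forEach(System.out::println);
    }
}
```

The foreign key is what turns a table row into a graph edge: the `department` column of the hypothetical `employee` row points at another resource rather than at a plain value.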

In this way, all data sources are treated as RDF graphs with a SPARQL endpoint. This means the Centruflow Client can talk to the Centruflow Server using the SPARQL query language, rather than the proprietary Thinkmap protocol (with the database duplication it requires). This bodes well for future connectivity to systems that begin to expose data as RDF. The architecture for this aspect of the server is shown in figure 4.3. The diagram on the left shows the logical architecture, and the diagram on the right the physical architecture. In particular, the diagram on the right shows that each data source adapter has its own SPARQL endpoint, which forwards SPARQL queries directly to that adapter. Each Centruflow client query is therefore targeted at exactly one SPARQL endpoint.
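As a hedged illustration of what "targeted at one SPARQL endpoint" could look like from the client side, the sketch below builds the request URI for a query against one named data source. The `query` parameter is the one defined by the SPARQL protocol; the server address, the data source name `hr`, and the `/<dataSource>/sparql` path layout are assumptions made for the example and are not taken from Centruflow Server.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Toy illustration: address a SPARQL query to exactly one endpoint. */
final class EndpointRequest {

    /**
     * Builds a GET request URI for one data source's endpoint.
     * The path layout (serverBase/dataSource/sparql) is hypothetical.
     */
    static URI queryUri(String serverBase, String dataSource, String sparql) {
        // Form-style encoding: spaces become '+', reserved characters become %XX.
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(serverBase + "/" + dataSource + "/sparql?query=" + encoded);
    }

    public static void main(String[] args) {
        URI uri = queryUri("http://localhost:8080", "hr", "SELECT * WHERE { ?s ?p ?o }");
        System.out.println(uri);
    }
}
```

Because the data source is part of the endpoint address, the query text itself does not need to say which database it is meant for.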

4.3.3.2 Implementation-Level Architecture

As noted earlier, we wanted to build the Centruflow Server using open standards and open source components, and as shown in figure 4.4, we largely succeeded. The Centruflow Server product encompasses the entire top rectangle, but only a small portion is custom code, with the vast majority of code belonging to open source projects. Only two rectangles inside the bold Centruflow Server rectangle contain any of our own code: the Centruflow Server box and the D2R Server box. Most of our effort was spent configuring these software components to work together, and providing the necessary `glue' code between them.

Figure 4.4: The Centruflow Server implementation using open standards and open source components.

⁵ Whilst in theory the Centruflow Server can query any data source, we have only implemented support for databases at this stage. This was because businesses expressed little interest in connecting other data sources. Instead, as mentioned in section 4.3.1, other requirements were deemed more important.

To put this into context, Centruflow Server requires 35 Java `JAR' files, totalling 13.9MB. The custom Centruflow Server code is itself a separate JAR file that weighs in at 0.44MB. This is why, in the acknowledgements at the start of this thesis, we say we owe a great deal of gratitude to the open source developers of the world.

What follows is a quick overview of what the other components shown in figure 4.4 are used for.

Jetty

Jetty [79] is an open-source, standards-based, full-featured web server implemented entirely in Java. Jetty provides the HTTP protocol handling needed for incoming and outgoing communications. In addition, Jetty acts as a fully capable Java servlet container, meaning that it can host servlets and compile Java Server Pages (JSP). We use Jetty to deploy Joseki.

Joseki

Joseki is deployed as a web application within the Jetty server and provides the ability to handle incoming SPARQL queries by accessing pluggable data sources. In our case, we have adapted Joseki to instead pass all SPARQL queries on to the D2R Server.

D2R Server

D2R Server is an application developed by Dr Christian Bizer that uses D2RQ to translate SPARQL queries into SQL. It also provides a SPARQL endpoint through which SPARQL queries are received, which it accomplishes by pulling together Joseki and Jetty. We have adapted D2R Server considerably so that it can handle multiple data sources at once, whereas its original architecture supported only one data source.
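The idea of serving several data sources from one server can be sketched as a registry that maps each endpoint name to the adapter responsible for it. This is only a toy model of the routing step under our own assumptions: the class and method names are hypothetical, adapters are reduced to plain functions from a query string to a result string, and none of it reflects the actual D2R Server code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy illustration: one server, many endpoints, one adapter per endpoint. */
final class EndpointRegistry {

    // Endpoint name -> adapter that answers queries for that data source.
    private final Map<String, Function<String, String>> adapters = new LinkedHashMap<>();

    /** Registers an adapter; each endpoint name may be used only once. */
    void register(String endpointName, Function<String, String> adapter) {
        if (adapters.putIfAbsent(endpointName, adapter) != null) {
            throw new IllegalArgumentException("Endpoint already registered: " + endpointName);
        }
    }

    /** Forwards the query to the single adapter behind the named endpoint. */
    String dispatch(String endpointName, String sparql) {
        Function<String, String> adapter = adapters.get(endpointName);
        if (adapter == null) {
            throw new IllegalArgumentException("Unknown endpoint: " + endpointName);
        }
        return adapter.apply(sparql);
    }

    public static void main(String[] args) {
        EndpointRegistry registry = new EndpointRegistry();
        // Stand-in adapters; a real one would translate the query and call a database.
        registry.register("hr", q -> "hr answered: " + q);
        registry.register("sales", q -> "sales answered: " + q);
        System.out.println(registry.dispatch("sales", "ASK { ?s ?p ?o }"));
    }
}
```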

Importantly, we have embedded the D2R Server within our Centruflow Server, despite it being released under the GPL. We were able to obtain a commercial contract from Dr Christian Bizer under which Centruflow Server is not treated as a GPL'ed application. This also applies to D2RQ, discussed below.

D2RQ

D2RQ is a library, also developed by Dr Christian Bizer, that translates SPARQL into SQL. It is no longer developed, but is still immensely useful in the context of Centruflow Server. As quoted from [80]:

As Semantic Web technologies are getting mature, there is a growing need for RDF applications to access the content of non-RDF, legacy databases without having to replicate the whole database into RDF. D2RQ is a declarative language to describe mappings between relational database schemata and OWL/RDFS ontologies. The mappings allow RDF applications to access the content of huge, non-RDF databases using Semantic Web query languages like SPARQL.
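To give a feel for what a declarative mapping makes possible, the sketch below translates a single triple pattern of the form `?s <property> ?o` into SQL by looking the property up in a mapping table. It is a deliberately simplified stand-in and not D2RQ's mapping language or API: the `ColumnMapping` type, the property URI, and the table and column names are all hypothetical.

```java
import java.util.Map;

/** Toy illustration: mapping-driven translation of one triple pattern to SQL. */
final class ToyMapping {

    /** Hypothetical mapping entry: one RDF property backed by one table column. */
    static final class ColumnMapping {
        final String table;
        final String keyColumn;
        final String valueColumn;

        ColumnMapping(String table, String keyColumn, String valueColumn) {
            this.table = table;
            this.keyColumn = keyColumn;
            this.valueColumn = valueColumn;
        }
    }

    /**
     * Translates the pattern "?s <property> ?o" into a SELECT that returns
     * the row key (for ?s) and the mapped column (for ?o).
     */
    static String toSql(Map<String, ColumnMapping> mappings, String property) {
        ColumnMapping m = mappings.get(property);
        if (m == null) {
            throw new IllegalArgumentException("No mapping for " + property);
        }
        return "SELECT " + m.keyColumn + ", " + m.valueColumn + " FROM " + m.table;
    }

    public static void main(String[] args) {
        Map<String, ColumnMapping> mappings = Map.of(
                "http://example.org/employee#name",
                new ColumnMapping("employee", "id", "name"));
        System.out.println(toSql(mappings, "http://example.org/employee#name"));
    }
}
```

The point of the sketch is that the database is never copied into RDF: the mapping alone tells the translator which table and column answer a given property.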

In document Improving Centruflow using semantic web technologies : a thesis presented in partial fulfillment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand (Page 92-96)