Integration with DL-based inferencing engines builds on two common ideas: (i) separating explicit from implicit knowledge and (ii) choosing between dynamic computation of implicit knowledge at query answering time and materialization (precomputation) of implicit knowledge. We consider two types of system architectures, depicted in Figure 6.5. In both, explicit knowledge is maintained by a data store (which might be an RDF triple store) providing CC as discussed in Section 6.2. When materialization is used, however, we propose to partition the data store internally. This is done for performance reasons and to avoid the need for more complex distributed transaction management. Altogether, both architectures share the following properties:
1. Under concurrent updates, it is important that inferencing engines see only the results of committed update transactions. In general, this can be ensured by SI-based CC, regardless of whether it is applied at the level of OWL syntactic instances or at the level of RDF triples.
2. Because reads to the data store are never delayed by update transactions running in parallel, scalability is bounded only by the technical access limits of the data store. More interestingly, it is even possible to use multiple inferencing engines in parallel, which allows for better scalability by distributing concurrent query answering over multiple inferencing engines.
3. The architectures are independent of the type of reasoning algorithm used; notably, they accommodate rule-based forward/backward chaining (e.g., OWLIM [KOM05]), Datalog engines (e.g., KAON2 [Mot06]), and tableau-based reasoners (e.g., Pellet [SPG+07], HermiT [MSH09], RacerPro [HM01b]).
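To make properties 1 and 2 more concrete, the following Python sketch shows a hypothetical `SnapshotStore` in which every transaction reads from the last committed, immutable version of the triple set. It is a minimal illustration under our own assumptions, not the CC implementation of Section 6.2: it omits conflict detection entirely, and all class and method names are invented for this example. Writers buffer their changes privately and publish them with a single reference assignment on commit, so readers such as inferencing engines never wait and never see uncommitted triples.

```python
import threading

Triple = tuple[str, str, str]


class SnapshotStore:
    """Hypothetical store handing each reader a committed, immutable version."""

    def __init__(self) -> None:
        self._current: frozenset[Triple] = frozenset()
        self._commit_lock = threading.Lock()  # taken by writers only

    def snapshot(self) -> frozenset[Triple]:
        # Readers never take the lock: they grab the latest committed
        # version, which is never mutated afterwards.
        return self._current

    def begin(self) -> "Transaction":
        return Transaction(self)

    def _publish(self, added: set[Triple], deleted: set[Triple]) -> None:
        with self._commit_lock:
            # A new frozenset is built and swapped in with one assignment.
            self._current = (self._current - deleted) | added


class Transaction:
    """Buffers writes privately; they become visible only on commit."""

    def __init__(self, store: SnapshotStore) -> None:
        self._store = store
        self.snapshot = store.snapshot()  # fixed read view (SI)
        self.added: set[Triple] = set()
        self.deleted: set[Triple] = set()

    def add(self, t: Triple) -> None:
        self.added.add(t)
        self.deleted.discard(t)

    def remove(self, t: Triple) -> None:
        self.deleted.add(t)
        self.added.discard(t)

    def read(self) -> frozenset[Triple]:
        # A transaction sees its snapshot plus its own pending writes.
        return (self.snapshot - self.deleted) | self.added

    def commit(self) -> None:
        self._store._publish(self.added, self.deleted)
```

Because `snapshot()` takes no lock, several inferencing engines can call it concurrently; only commits are serialized.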
[Figure 6.5 diagram: (a) online inferencing: a data store holding explicit axioms and assertions, several inferencing engines, and client applications issuing concurrent updates and concurrent queries; legend: (r) reads as part of query answering, (u) update notifications (optional). (b) materialization of implicit facts: a data store split into an explicit facts partition and an implicit facts partition, several inferencing engines, and client applications issuing concurrent updates and concurrent queries; legend: (r) updates to implicit facts partition, (u) update notifications from explicit facts partition.]
Figure 6.5: System Architecture Types for Integration of an OWL Data Store with Inferencing Engines.
6.4.1 Online Computation of Implicit Knowledge
The first type, depicted in Figure 6.5(a), computes inferences online as part of query answering requests. It is intended for cases in which the update frequency can become very high and the computational complexity of reasoning is rather low, e.g., when a tractable OWL 2 profile (EL, QL, or RL) is used.
Every update is made directly to the data store, and query answering (reads) is generally handled via the inferencing engines, assuming that an appropriate query interface is provided either directly by them or layered on top. Consequently, inferencing engines need to read from the data store in order to provide complete query answering over explicit and dynamically computed implicit knowledge. However, inferencing engines are purely read-only with regard to the data store (i.e., they never need to update it). Since SI is used, they can read without additional delays, and multiple instances can be used in parallel to distribute the load of concurrent query answering requests.
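A minimal sketch of online inferencing, assuming a toy rule set (only transitivity of `rdfs:subClassOf` and inheritance of `rdf:type` along it) in place of a real OWL 2 profile reasoner: the hypothetical function `instances_of` answers a class-membership query by backward chaining over a committed snapshot. The implicit `rdf:type` facts are computed at query time and are never written back to the data store.

```python
Triple = tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def subclasses_of(snapshot: frozenset[Triple], cls: str) -> set[str]:
    """All classes that are (reflexively, transitively) subclasses of cls."""
    result, frontier = {cls}, [cls]
    while frontier:
        target = frontier.pop()
        for (s, p, o) in snapshot:
            # The `s not in result` check also makes subclass cycles safe.
            if p == SUBCLASS and o == target and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def instances_of(snapshot: frozenset[Triple], cls: str) -> set[str]:
    """Answer 'x rdf:type cls' by backward chaining at query time.

    The snapshot is only read; nothing is written back to the data store.
    """
    classes = subclasses_of(snapshot, cls)
    return {s for (s, p, o) in snapshot if p == TYPE and o in classes}
```

Since each call works only on the snapshot it is given, any number of engine instances can evaluate such queries in parallel on the same committed version.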
This architecture blueprint includes an optional update notification mechanism from the data store to the inferencing engines. It is motivated by the incremental reasoning capabilities provided by several inferencing engines. Whenever an update commits, its change set (added and deleted triples) is propagated to the inferencing engines, which allows them to keep internal caches up to date with changes to the data store. Finally, we note that with such a notification mechanism it would also make sense, for reasons of reasoning performance, to restrict the data store to maintaining only the (possibly large) ABox and to assume that the TBox and RBox are maintained internally by the inferencing engines.
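The optional notification path can be sketched as a simple observer pattern. This sketch assumes in-process callbacks rather than any particular messaging system; the hypothetical `NotifyingStore` first makes a commit visible and only then hands its `ChangeSet` (added and deleted triples) to every subscribed engine, which applies it to its own ABox cache.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class ChangeSet:
    added: frozenset[Triple]
    deleted: frozenset[Triple]


class NotifyingStore:
    """Hypothetical data store that notifies subscribers after each commit."""

    def __init__(self) -> None:
        self.facts: frozenset[Triple] = frozenset()
        self._subscribers: list[Callable[[ChangeSet], None]] = []

    def subscribe(self, callback: Callable[[ChangeSet], None]) -> None:
        self._subscribers.append(callback)

    def commit(self, change: ChangeSet) -> None:
        # Make the change visible first, so engines are only ever
        # told about committed updates.
        self.facts = (self.facts - change.deleted) | change.added
        for notify in self._subscribers:
            notify(change)


class CachingEngine:
    """Hypothetical inferencing engine keeping a local ABox cache in sync."""

    def __init__(self) -> None:
        self.abox_cache: set[Triple] = set()

    def on_commit(self, change: ChangeSet) -> None:
        # Apply only the delta instead of reloading the whole ABox.
        self.abox_cache -= change.deleted
        self.abox_cache |= change.added
```

A real engine would additionally update its derived facts incrementally from the same change set; the sketch only maintains the cached explicit ABox.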
6.4.2 Materialization of Implicit Knowledge
The second type of architecture blueprint, depicted in Figure 6.5(b), addresses cases where the update frequency is moderate and query requests (vastly) outnumber updates. In such cases it is often beneficial to precompute and materialize implicit knowledge and keep it in sync with updates to the explicit knowledge. However, materialization is generally possible only if the underlying DL has the finite model property⁸; that is, if there cannot be cases in which an infinite number of axioms and/or assertions can be inferred.
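As a minimal sketch of materialization, again assuming the toy `rdfs:subClassOf`/`rdf:type` rules as a stand-in for a real reasoner, the content of the implicit partition can be computed as the fixpoint of the rules minus the explicit facts. The toy rules only recombine terms that already occur in the explicit facts, so finitely many triples can be derived and the loop in the hypothetical `materialize` function terminates; this mirrors, in a very simplified way, why materialization needs the finiteness condition stated above.

```python
Triple = tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def materialize(explicit: frozenset[Triple]) -> frozenset[Triple]:
    """Return only the implicit facts (fixpoint closure minus explicit)."""
    facts = set(explicit)
    changed = True
    while changed:
        changed = False
        # (a p b), (b subClassOf d) => (a p d) for p in {type, subClassOf}:
        # covers subClassOf transitivity and type inheritance.
        derived = {
            (a, p, d)
            for (a, p, b) in facts if p in (TYPE, SUBCLASS)
            for (c, q, d) in facts if q == SUBCLASS and c == b
        }
        if not derived <= facts:
            facts |= derived
            changed = True
    return frozenset(facts - explicit)
```

Recomputing the implicit partition from scratch like this is the simplest option; incremental maintenance from change sets avoids the full recomputation on every update.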
One might be tempted to separate explicit and implicit knowledge directly by using two independent data store instances. However, this requires more complex CC to guarantee correct data access at a global level, such that consistency spans both data stores. The reason is that one then has to apply distributed transaction management, which requires dedicated coordination mechanisms between the data stores to ensure atomicity (e.g., Paxos, Two-Phase Commit, or Commit Ordering). Our proposal still supports the separation of implicit and explicit facts as well as the ability to distribute the inferencing load over multiple engines. This is achieved, first, by using one data store that is internally partitioned into one partition for explicit and another for implicit knowledge. Second, transactions are extended such that updates to implicit knowledge caused by updates to explicit knowledge are committed all at once (or not at all); combining them into one transaction essentially makes updates spanning both partitions atomic. The partitioning also has the advantage that all implicit knowledge can easily be discarded (by clearing its partition).
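The single-store design can be sketched as follows, assuming (as a simplification) a single committing thread and a whole-store state that is one immutable mapping from partition name to triple set. Because the hypothetical `PartitionedStore.commit` builds a new mapping and publishes it with one reference assignment, changes to the explicit and implicit partitions become visible together or not at all, without any distributed commit protocol; `clear_implicit` shows how all implicit knowledge can be discarded at once.

```python
from types import MappingProxyType

Triple = tuple[str, str, str]
PARTITIONS = ("explicit", "implicit")


class PartitionedStore:
    """Hypothetical single data store with two internal partitions."""

    def __init__(self) -> None:
        # The whole state is one read-only mapping of immutable sets.
        self._state = MappingProxyType({p: frozenset() for p in PARTITIONS})

    def snapshot(self) -> MappingProxyType:
        # Readers get a consistent view of both partitions at once.
        return self._state

    def commit(self, changes: dict[str, tuple[set, set]]) -> None:
        """Apply {partition: (added, deleted)} to all partitions atomically."""
        new = dict(self._state)
        for part, (added, deleted) in changes.items():
            # Unknown partitions raise KeyError before anything is published.
            new[part] = (new[part] - set(deleted)) | set(added)
        # One reference swap publishes every partition's changes together.
        self._state = MappingProxyType(new)

    def clear_implicit(self) -> None:
        # Discard all implicit knowledge by emptying its partition.
        self.commit({"implicit": (set(), self._state["implicit"])})
```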
We illustrate why updates to the two partitions must be combined into one transaction using the following example.
Example 6.6
Imagine a transaction Te making updates in the partition for explicit knowledge, i.e., having a nonempty change set δ(Te). Furthermore, we assume that Te does not conflict with any other transaction. The successful commit of Te implies updates in the partition for implicit knowledge, computed by an inferencing engine after Te commits. Let us assume that these updates are applied to the partition for implicit knowledge by a transaction Ti with the nonempty change set δ(Ti). Consequently, Ti does not start before Te commits. In practice there would be a time interval between the commit of Te and the commit of Ti in which another, read-only transaction Tr might be executed. In this interval Tr sees the update δ(Te) but not yet δ(Ti); that is, it does not see the entailments of the update to the explicit knowledge. Obviously, the state observed by Tr is then incomplete w.r.t. the underlying DL.
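Example 6.6 can be replayed with a toy store in which each `commit` call stands for one separate transaction; the class name and the triples are hypothetical. The read-only transaction Tr, executed between Te and Ti, observes that `ex:bob` has type `ex:Student` but not yet type `ex:Person`, although the latter is entailed by the explicit knowledge it sees.

```python
Triple = tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


class TwoPartitionStore:
    """Toy store; every commit() call models one separate transaction."""

    def __init__(self, explicit: set[Triple]) -> None:
        self.explicit: frozenset[Triple] = frozenset(explicit)
        self.implicit: frozenset[Triple] = frozenset()

    def commit(self, explicit_added=frozenset(), implicit_added=frozenset()):
        self.explicit = self.explicit | explicit_added
        self.implicit = self.implicit | implicit_added

    def read_all(self) -> frozenset[Triple]:
        return self.explicit | self.implicit


store = TwoPartitionStore({("ex:Student", SUBCLASS, "ex:Person")})

# Te: commits an explicit assertion on its own.
store.commit(explicit_added={("ex:bob", TYPE, "ex:Student")})

# Tr: a read-only transaction executed between the commits of Te and Ti.
tr_view = store.read_all()

# Ti: the inferencing engine later commits the entailment of δ(Te).
store.commit(implicit_added={("ex:bob", TYPE, "ex:Person")})
```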
The missing knowledge anomaly described in Example 6.6 can be avoided if Te and Ti are combined into one transaction, essentially making the application of explicit and implicit updates atomic. Since both updates become visible at once in both partitions, other transactions cannot see partial updates. This implies, of course, that the change sets δ(Te) and δ(Ti) need to be joined into one change set for conflict analysis against other active transactions.
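Conflict analysis on the joined change sets can be sketched with a first-committer-wins check at the triple level, which is one common way of realizing SI; this granularity is an assumption here, since CC might equally operate on OWL syntactic instances. In the hypothetical `SIValidator`, a combined transaction carries both δ(Te) and δ(Ti), and its write set is their union. Two transactions whose explicit updates are disjoint can therefore still conflict when they write the same implicit triple.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class CombinedTxn:
    """Hypothetical transaction spanning both partitions (Te and Ti merged)."""
    start_ts: int
    delta_explicit: set[Triple] = field(default_factory=set)  # δ(Te)
    delta_implicit: set[Triple] = field(default_factory=set)  # δ(Ti)

    def write_set(self) -> frozenset[Triple]:
        # Conflict analysis uses the joined change set δ(Te) ∪ δ(Ti).
        return frozenset(self.delta_explicit | self.delta_implicit)


class SIValidator:
    """First-committer-wins validation at commit time (assumed SI variant)."""

    def __init__(self) -> None:
        self.clock = 0
        self.committed: list[tuple[int, frozenset[Triple]]] = []

    def begin(self) -> CombinedTxn:
        return CombinedTxn(start_ts=self.clock)

    def try_commit(self, txn: CombinedTxn) -> bool:
        ws = txn.write_set()
        for commit_ts, other_ws in self.committed:
            # Conflict: a transaction that committed after txn started
            # wrote an overlapping triple.
            if commit_ts > txn.start_ts and ws & other_ws:
                return False  # txn must abort
        self.clock += 1
        self.committed.append((self.clock, ws))
        return True
```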