Games for query inseparability of description logic knowledge bases

(1)

BIROn - Birkbeck Institutional Research Online

Botoeva, E. and Kontchakov, Roman and Ryzhikov, V. and Wolter, F. and

Zakharyaschev, Michael (2016) Games for query inseparability of description

logic knowledge bases. Artificial Intelligence 234 , pp. 78-119. ISSN

0004-3702.

Downloaded from:

Usage Guidelines:

Please refer to usage guidelines at

or alternatively

(2)

Games for Query Inseparability of Description Logic Knowledge Bases

Elena Botoevaa_{, Roman Kontchakov}b_{, Vladislav Ryzhikov}a_{, Frank Wolter}c_{, Michael Zakharyaschev}b

a_{KRDB Research Centre, Free University of Bozen-Bolzano, Italy}

b_{Department of Computer Science and Information Systems, Birkbeck, University of London, UK} c_{Department of Computer Science, University of Liverpool, UK}

Abstract

We consider conjunctive query inseparability of description logic knowledge bases with respect to a given signature— a fundamental problem in knowledge base versioning, module extraction, forgetting and knowledge exchange. We give a uniform game-theoretic characterisation of knowledge base conjunctive query inseparability and develop worst-case optimal decision algorithms for fragments ofHorn-ALCH I, including the description logics underpinning

OWL 2 QL andOWL 2 EL. We also determine the data and combined complexity of deciding query inseparabil-ity. While query inseparability for all of these logics is P-complete for data complexity, the combined complexity ranges from P- to E_xpT_ime- to 2E_xpT_ime-completeness. We use these results to resolve two major open problems for

OWL 2 QLby showing that TBox query inseparability and the membership problem for universal conjunctive query solutions in knowledge exchange are both ExpTime-complete for combined complexity. Finally, we introduce a more

flexible notion of inseparability which compares answers to conjunctive queries in a given signature over a given set of individuals. In this case, checking query inseparability becomes NP-complete for data complexity, but the ExpTime

-and 2ExpTime-completeness combined complexity results are preserved.

Keywords: Description logic, knowledge base, conjunctive query, query inseparability, games on graphs, computational complexity.

1. Introduction

A description logic (DL) knowledge base (KB) consists of a terminological box (TBox) and an assertion box (ABox). The TBox represents conceptual knowledge by providing a vocabulary for a domain of interest together with axioms that describe semantic relationships between the vocabulary items. To illustrate, consider the following toy TBoxT_a, which defines a vocabulary for the automotive industry:

MinivanvAutomobile,

HybridvAutomobile,

Automobilev ∃poweredBy.Engine,

Hybridv ∃poweredBy.ElectricEngineu ∃poweredBy.InternalCombustionEngine,

ElectricEnginevEngine,

InternalCombustionEnginevEngine.

The first two axioms say that minivans and hybrids are automobiles, the third one claims that every automobile is powered by an engine, and the fourth axiom states that every hybrid is powered by an electric engine and also by an internal combustion engine. Thus, the TBox introduces, among others, the concept names (sets)Minivan,Automobile andEngine, states that the conceptMinivanis subsumed by the conceptAutomobileand uses the role name (binary

(3)

relation)poweredByto say that automobiles are powered by engines. TBoxes, often called ontologies, are represented in many applications using the syntax of the Web Ontology Language OWL 2 (www.w3.org/TR/owl2-overview).

The ABox of a knowledge base is a set of facts storing data about the concept and role names introduced in the TBox. As an example ABox in the automotive domain, we will use the following set of assertions:

A_a={Hybrid(toyota highlander),Minivan(toyota highlander),

Minivan(nissan note),poweredBy(nissan note,hr15de),InternalCombustionEngine(hr15de)}.

Typical applications of KBs in modern information systems use the semantics of the concepts and roles in the TBox to enable the user to query the data in the ABox. This is particularly useful if the data is incomplete or comes from heterogeneous data sources, which is the case, for example, in linked data applications [1] and large-scale data integration projects [2, 3], or if the data comprises the web content gathered by search engines using semantic markup [4].

As the data may be incomplete, the open world assumption is adopted when querying a KB K: a tuple a of individuals fromKis a (certain) answer to a queryqoverKifq(a) is true in every model ofK. Since general first-order queries are undecidable under the open-world semantics, the basic and most important querying instrument is conjunctive queries (CQs), which are ubiquitous in relational database systems and form the core of the Semantic Web query language SPARQL (www.w3.org/TR/sparql11-query). In our context, a CQ q(x) is a first-order formula

∃yϕ(x,y) such thatϕ(x,y) is a conjunction of atoms of the formA(z1) orP(z1,z2), for a concept nameA, a role name

P, and variablesz1,z2fromx,y.1For example, to find minivans powered by electric engines, one can use the CQ

q(x)=∃y Minivan(x)∧poweredBy(x,y)∧ElectricEngine(y),

withtoyota highlanderbeing the only certain answer toq(x) over (T_a,A_a).

The problem of answering CQs over KBs has been the focus of significant research in the DL community: deep complexity results have been obtained for a broad range of DLs (see below), new DLs have been introduced with tractable (in data complexity) query answering [5, 6], a variety of query answering techniques have been invented [6, 7] and implemented in a number of powerful software systems (see, e.g., [8] and references therein).

Apart from developing query answering techniques, a major research problem is KB engineering and mainte-nance. In fact, with typically large data and often complex and tangled ontologies, tool support for transforming and comparing KBs is becoming indispensable for applications. To begin with, KBs are never static entities. Like most software artefacts, they are updated to incorporate new information, and distinct versions are introduced for different applications. Thus, developing support for KB versioning has become an important research problem [9, 10]. As dealing with a large and semantically tangled KB can be costly, one may want to extract from it a smaller module that is indistinguishable from the whole KB as far as the given application is concerned [11]. Another technique for extracting relevant information is forgetting, where the task is to replace a given KB with a new one, which uses only those concept and role names that are needed by the application but still provides the same information about those names as the original KB [12, 13]. Finally, the vocabulary of a given KB may not be convenient for a new application. In this case, similarly to data exchange in databases [14]—where data structured under a source schema is converted to data under a target schema—one may want to transform a KB in a source signature to a KB given in a more useful target signature and representing the original KB in an accurate way. This task is known as knowledge exchange [15, 16].

In this article, we investigate a relationship between KBs that is fundamental for all such tasks if querying the data via CQs is the main application. LetΣbe a relational signature consisting of a finite set of concept and role names. We say that KBsK1andK2areΣ-query inseparableand writeK1≡ΣK2if any CQ formulated inΣhas the same answers

overK1andK2. Note that even forΣcontaining all concept and role names in the KBs,Σ-query inseparability does

not necessarily imply logical equivalence: for example, (∅,{A(a)}) is{A,B}-query inseparable from ({BvA},{A(a)}) but the two KBs are clearly not logically equivalent. Thus, if KBs are used for purposes other than querying data via CQs, then different notions of inseparability are required. We now discuss the applications ofΣ-query inseparability for the tasks mentioned above in more detail.

(4)

Versioning. Version control systems for KBs provide a range of operations including, for example, computing the relevant differences between KBs, merging KBs and recovering KBs. All these operations rely on checking whether two versions,K₁ andK₂, of a KB are indistinguishable from the application point of view. If that application is querying the data via CQs in a given relational signatureΣ, thenK₁andK₂should be regarded as indistinguishable just in case they give the same answers to CQs formulated inΣ. Thus, the basic task for a query-centric approach to KB versioning is to check whetherK1≡_ΣK2.

Modularisation. Modularisation and module extraction are major research topics in ontology engineering and main-tenance. In module extraction, the problem is to find a (small) subset of the axioms of a given large KB that is indistin-guishable from the KB with respect to the intended application. If that application is querying a KBKusing CQs in a relational signatureΣ, then the problem is to find a smallΣ-query module ofK, that is, a KBK0_{⊆ K}_with_K0_≡

ΣK. Note that one can extract a minimalΣ-query module from a KB using a polynomial-time algorithm with theΣ-query inseparability check as an oracle (see, e.g., [17]). To illustrate the notion ofΣ-query module, consider the automotive knowledge baseKa =(Ta,Aa) defined above and the relational signatureΣm={Automobile,Engine,poweredBy}. ThenKm=(Tm,Am) is aΣm-query module ofKa, where

Tm={MinivanvAutomobile,Automobilev ∃poweredBy.Engine,InternalCombustionEnginevEngine},

Am={Minivan(toyota highlander),

Minivan(nissan note),poweredBy(nissan note,hr15de),InternalCombustionEngine(hr15de)}.

Knowledge Exchange. In knowledge exchange, we want to transform a KBK₁in a relational signatureΣ1to a KB K₂in a new signatureΣ2connected toΣ1via a declarative mapping specification given by a TBoxT₁₂. Such mapping specifications between KBs are also known as ontology alignments or ontology matchings and have been studied extensively [18]. If, as above, we are interested in querying data via CQs, then the target KBK2should be a sound

and complete representation ofK1with respect to answers to CQs, and so should satisfy the conditionK1∪T12 ≡Σ2 K2,

in which case it is called a universal CQ-solution. To illustrate, consider again the knowledge baseKa = (Ta,Aa) and letTaeconnect the relational signatureΣaofKatoΣe ={Car, HybridCar, ElectricMotor, Motor, hasMotor} by means of the following axioms:

AutomobilevCar, HybridvHybridCar, poweredByvhasMotor,

EnginevMotor, ElectricEnginevElectricMotor.

ThenKe=(Te,Ae) is a universal CQ-solution, where

T_e={ElectricMotorvMotor,Carv ∃hasMotor.Motor,HybridCarvCaru ∃hasMotor.ElectricMotor},

Ae={HybridCar(toyota highlander),Car(nissan note),hasMotor(nissan note,hr15de),Motor(hr15de)}.

Forgetting. A KBK0 is said to result from forgetting a relational signatureΣin a KB K ifK0 ≡sig(K)\Σ K and sig(K0) ⊆sig(K)\Σ, wheresig(K) is the relational signature ofK. Thus, the result of forgettingΣdoes not use

Σand gives the same answers to CQs without symbols inΣasK. The result of forgetting is also called a uniform interpolant for K with respect to sig(K)\Σ. Forgetting is of interest in a number of scenarios. Typically, when reusing an existing KB in a new application, only a small number of its symbols is relevant, and so instead of reusing the whole KB, one can take a potentially smaller KB resulting from forgetting the extraneous symbols. Forgetting can also be used for predicate hiding: if a KB is to be published, but some part of it has to be concealed from the public, then this part can be removed by forgetting its symbols [19]. Finally, forgetting can be used for KB summary: the result of forgetting often provides a smaller and more focused KB that summarises what the original KB says about the retained symbols, potentially facilitating comprehension. To illustrate, the KBK_f =(T_f,A_f) results from forgettingΣf ={Minivan,Hybrid,ElectricEngine,InternalCombustionEngine}inKa, where

Tf ={Automobilev ∃poweredBy.Engine},

A_f ={Automobile(toyota highlander),

(5)

Horn-ALCH I

Horn-ALCI

Horn-ALCH

Horn-ALC ELHdr_⊥

EL

DL-LiteH_horn

DL-Litehorn

DL-LiteH_core

DL-Litecore

P Thms. 21, 33

ExpTime

Thm. 21

ExpTime

Thms. 31, 34

P Thms. 32, 33 2ExpTime

Thms. 31, 35

forward strategy

general strategy

[image:5.595.96.502.108.254.2]

backward+forward strategy

Figure 1: Summary of the combined complexity results.

In this article, we develop worst-case optimal algorithms decidingΣ-query inseparability of KBs given in various fragments of the description logicHorn-ALCH I[20], which includeDL-LiteH

core[6, 21] andELH dr

⊥ [22] underlying

the OWL 2 profilesOWL 2 QL andOWL 2 EL (www.w3.org/TR/owl2-profiles). The algorithms are based on

two characterisations ofΣ-query inseparability, one of which is model-theoretic and the other game-theoretic. The former characterisesΣ-query inseparability in terms of partialΣ-homomorphisms between materialisations, that is, interpretationsMof KBsK such that the certain answers to any CQqoverK coincide with the answers to CQ q overM. AnyHorn-ALCH IKB has a materialisation. While materialisations can be infinite, we show that one can always compute a finitegenerating structurefrom which a materialisation is obtained by unravelling. We then develop a game-theoretic machinery for checking the existence of partialΣ-homomorphisms between materialisations by playing two-player games on the corresponding finite generating structures. Thus, our algorithms consist of two components: computing finite generating structures for the given KBs and deciding the existence of winning strategies for the games on these structures.

We use the constructed algorithms to obtain optimal upper bounds for the data and combined complexity of decidingΣ-query inseparability for KBs given in all of the DLs mentioned above.Σ-query inseparability turns out to be P-complete for data complexity, which matches the complexity of CQ evaluation for all of our DLs lying outside theDL-Litefamily. For combined complexity, the obtained tight complexity results are summarised in Fig. 1. Most interesting are E_xpT_ime-completeness of DL-LiteH

core and 2ExpTime-completeness of Horn-ALCI, which contrast with NP- and E_xpT_ime-completeness of CQ evaluation for these logics. We note in passing that the 2E_xpT_ime-hardness proof goes through for the fragmentELIofHorn-ALCI. ForDL-Litewithout role inclusions,ELandELHdr_⊥,Σ -query inseparability is P-complete, while CQ evaluation is NP-complete. In general, it is the combined presence of inverse roles and qualified existential restrictions (or role inclusions) that makesΣ-query inseparability hard. The matching lower bounds are established by a (rather involved) encoding of suitable alternating Turing machines.

We apply our complexity results for Σ-query inseparability to resolve two important open problems. First, we show that, in knowledge exchange, the membership problem for universal CQ-solutions for DL-LiteH_core KBs is ExpTime-complete for combined complexity, which settles an open question of [23], where only PSpace-hardness

was established. Second, we show that decidingΣ-query inseparability ofDL-LiteHcoreTBoxes (for arbitrary ABoxes) is ExpTime-complete, which closes the PSpace–ExpTimegap that was left open by Konev et al. [24].

In the definition ofΣ-query inseparability above, we took account ofalltuples of individuals in the KBs that could be certain answers to CQs. In some applications, however, we may be interested only in a specific set of individuals over which the certain answers should be compared. LetΓ be an individual signature consisting of a finite set of individual names. For KBsK₁,K₂and a relational signatureΣ, we say thatK₁andK₂are (Σ,Γ)-query inseparable if any CQ formulated inΣhas the same certain answers among the individuals inΓover bothK₁andK₂, in which case we writeK₁≡_Σ_,_ΓK₂. Clearly, ifΓcontains all individuals inK₁∪ K₂, then (Σ,Γ)-query inseparability impliesΣ-query inseparability. (Σ,Γ)-query inseparability can be used to refineΣ-query inseparability as a foundation for versioning, modularisation, forgetting and knowledge exchange.

For instance, a KBK0_{is a (}_Σ_,_Γ_{)-query module of a KB}_K_if_K0_{⊆ K}_and_K0_≡

(6)

automo-tive ontologyK_a =(T_a,A_a) and the relational signatureΣm={Automobile,Engine,poweredBy}. Unlike our exam-ple illustratingΣ-query modules, we now restrict the individual signature toΓm={toyota highlander,nissan note} thereby leaving outhr15defrom the set of individuals considered. Then the KBK0

m=(Tm0,A0m) is a (Σm,Γm)-query module ofK_a, where

T_m0 ={MinivanvAutomobile,Automobilev ∃poweredBy.Engine},

A0_m={Minivan(toyota highlander),Minivan(nissan note)}.

Thus, the restriction of the individual signature removes the two assertions withhr15defromA_mas well as an axiom fromT_m.

Similarly, a KBK0_{results from forgetting (}_Σ_,_Γ_{) in a KB}_K _if_K0 _≡

sig(K)\Σ,ind(K)\ΓK,sig(K0)⊆sig(K)\Σand ind(K0₎_⊆_ind₍_K₎_\_Γ_{, where}_ind₍_K_{) is the set of individuals in the ABox of}_K_{. In this case, for}_Γ

f ={hr15de}, the KBK0

f =(T 0 f,A

0

f) results from forgetting (Σf,Γf) inKa, where

T_f0={Automobilev ∃poweredBy.Engine},

A0_f ={Automobile(toyota highlander),Automobile(nissan note)}.

In knowledge exchange, the refined notion of query inseparability can be used to represent a more flexible knowl-edge exchange model, which allows additional individuals in the target KB. These ‘anonymous’ individuals are similar to nulls in the standard approaches to incomplete databases [25]. Thus, we say that a KBK2with a relational signature Σ2is a universal CQ-solution with nulls for a KBK1and a mapping specificationT12ifK1∪ T12 ≡Σ2,ind(K1)K2(here,

the individuals inind(K2)\ind(K1) play the role of nulls). To illustrate, we consider again the knowledge exchange

example given above with the sameΣeandTae. Observe first thatKeis also a universal CQ-solution with nulls. On the other hand, there are universal CQ-solutions with nulls that are not universal CQ-solutions. To illustrate, letm1be

a fresh individual name. ThenK_e0=(∅,A0_e) is a universal CQ-solution with nulls forKaandTae, where

A0_e={HybridCar(toyota highlander),Car(toyota highlander),

hasMotor(toyota highlander,m1),ElectricMotor(m1),Motor(m1),

Car(nissan note),hasMotor(nissan note,hr15de),Motor(hr15de)}.

Intuitively,A0

eis a materialisation of all consequences ofKa∪Taein the relational signatureΣeand, among individuals ofK_a, it clearly gives rise to the same answers to all CQs formulated inΣe(the additional individual,m1, is not counted

when comparing the CQ answers). The interested reader is referred to [23] for more explanations on the advantages of this notion.

We extend our algorithms decidingΣ-inseparability to algorithms deciding (Σ,Γ)-inseparability and investigate the data and combined complexity of the problem for KBs given in the same fragments ofHorn-ALCH Ias before. In contrast toΣ-query inseparability, which is P-complete for data complexity for all of those fragments, deciding (Σ,Γ)-query inseparability turns out to be NP-complete for data complexity. (In fact, it is NP-hard already for KBs without TBoxes since (Σ,Γ)-query inseparability is then equivalent to the problem of deciding the existence of a homomorphism from one relational structure to another, which is known to be NP-hard.) For combined complexity, (Σ,Γ)-query inseparability is exactly as hard asΣ-query inseparability whenever it is already NP-hard.

(7)

2. Horn-ALCH I and its Fragments

In this article, we investigateΣ- and (Σ,Γ)-query inseparability of KBs given in DLs that are Horn fragments2_of

ALCH I. To define these DLs, we fix sequences ofindividual names ai,concept names Ai, androle names Pi, for i< ω. Aroleis either a role namePior aninverse role P−i; we assume that (P

− i)

−₌_P

i.ALCI-conceptsare defined by the grammar

C ::= Ai | > | ⊥ | ¬C | C1uC2 | C1tC2 | ∃R.C | ∀R.C, (ALCI)

whereRis a role. ALC-conceptsare thoseALCI-concepts that do not contain inverse roles. ALCI-TBoxesand ALC-TBoxesare finite sets ofconcept inclusionsof the form

C1vC2,

where theCi areALCI- or, respectively,ALC-concepts. ALCH I-TBoxesare finite sets of concept inclusions in

ALCIandrole inclusionsof the form

R1vR2,

where theRiare roles.ALCH-TBoxesareALCH I-TBoxes that do not contain occurrences of inverse roles. The DLs in theELandDL-Lite families are sub-Boolean fragments ofALCH I. EL-conceptsare defined by the grammar

C ::= Ai | > | C1uC2 | ∃Pi.C. (EL)

In other words, they areALC-concepts without⊥,t,¬and∀Pi.C. Note thatELdoes not have inverse roles. EL -TBoxesare finite sets of concepts inclusions inEL.ELHdr_⊥ is an extension ofELwith⊥, role inclusions and domain and range restrictions. Thus,ELHdr_⊥-conceptsare defined similarly toEL-concepts but can also use⊥, andELHdr_⊥ -TBoxes consist of a finite number ofELHdr_⊥-concept inclusions, role inclusions (without inverse roles), andrange restrictionsof the form

> v ∀Pi.C

(domain restrictions are expressible by means of concept inclusions∃Pi.> vC). Clearly,ELandELHdr⊥ are

sub-languages ofALCandALCH, respectively.

Basic conceptsinDL-Liteare defined by the following grammar:

B ::= Ai | > | ⊥ | ∃R.>, (DL-Lite)

whereR is a (possibly inverse) role. Existential quantifiers∃R.>are calledunqualified, and we usually write∃R instead of∃R.>.DL-Litecore-TBoxesare finite sets of concept inclusions of the form

B1vB2 and B1uB2v ⊥,

where theBiare basic concepts.DL-Litehorn-TBoxesconsist of a finite number of concept inclusions of the form

B1u · · · uBkvB.

DL-LiteH_core- andDL-LiteH_horn-TBoxes contain, in addition, a finite number of role inclusions and role disjointness axioms of the formR1uR2v ⊥. Note that, unlikeELandELHdr⊥, theDL-Litelogics do have inverse roles.

To introduce the Horn fragments of the DLs with the Booleans operators, we require the following (standard) recursive definition [5, 26]. We say that a conceptCoccurs positively inCitself and, ifCoccurs positively (negatively) inC0_{, then}

2_{Strictly speaking,}_DL-LiteH

coreandDL-Lite H

hornare notfragmentsofALCH Ibecause it does not have role disjointness constraints. However,

(8)

– Coccurs positively (respectively, negatively) inC0tD,C0uD,∃R.C0,∀R.C0,DvC0, and

– Coccurs negatively (respectively, positively) in¬C0andC0vD.

Now, we call a TBoxT Hornif no concept of the formCtDoccurs positively inT, and no concept of the form¬C or∀R.Coccurs negatively inT. Clearly, theEL- andDL-Lite-TBoxes are Horn by definition. For any other DLL (e.g.,ALCH I), only HornL-TBoxes are allowed in the DLHorn-L.

AnABox,A, is a finite set ofassertionsof the formAk(ai) orPk(ai,aj). AnL-TBoxT and an ABoxAtogether form anLknowledge base(KB)K =(T,A).

Arelational signatureis any non-empty finite set of concept and role names. Anindividual signatureis a (possibly empty) finite set of individual names. We usually denote a relational signature byΣ, an individual signature byΓ, and sometimes call the pair (Σ,Γ) simply asignature. The relational signature of a KBK =(T,A), which consists of the concept and role names occurring inK, is denoted by sig(K). The individual signature ofK, comprising the individual names inA, is denoted byind(K). In this article, we are not interested in KBs with empty ABoxes, and so bothsig(K) andind(K) are non-empty by definition. By aΣ-concept, Σ-role,Σ-ABox, etc. we understand any concept, role, ABox, etc. all of whose concept and role names are taken fromΣ.

Let (Σ,Γ) be a signature. In our interpretations, we adopt thestandard name assumptionin the sense that every individual namea ∈Γis interpreted by itself. A (Σ,Γ)-interpretationis a pairI=(∆I_,_·I_{), where}_∆I_⊇_Γ_{is a}

non-empty set, thedomainofI, and·I_{is an}_{interpretation function}_{that assigns a subset}_AI_⊆_∆I_{to every concept name}

Aand a binary relationPI_⊆_∆I_×_∆I_{to every role name}_P_{in such a way that}_AI₌_∅_and_PI₌_∅_{, for any}_A

<Σand

P<Σ. (Note that only the individual names fromΓare interpreted inIand although the list of individual names is

countably infinite,∆I_{may be finite. Note also that the concept and role names outside}_Σ_{are always interpreted as}_∅_.)

When we use the terms ‘interpretation’, ‘Σ-interpretation’ or ‘Γ-interpretation’ without specifying a full signature, we mean a (Σ,Γ)-interpretation for some suitable (Σ,Γ); the same applies to other notions with the prefix (Σ,Γ) to be introduced below.

Roles and complex concepts are interpreted inIas follows:

(P−_i)I=_(v,_u)_|_(u,_v)_∈_PI

i , >

I_{= ∆}I_,

⊥I=∅, (¬C)I= ∆I\CI,

(C1uC2)I=CI1 ∩C

I

2 (C1tC2)I=C1I∪C

I

2,

(∃R.C)I=_u_|_(u,_v)_∈_RI_and_v_∈_CI , ₍_∀_R._C)I=_u_|_v_∈_CI_{for all (u},_v)_∈_RI .

For an inclusion or assertionα(whose individual names belong toΓ), we define thetruth-relationI |=αby taking:

I |=C1vC2 iff CI1 ⊆C

I

2, I |=R1vR2 iff RI1 ⊆R

I

2,

I |=R1uR2v ⊥ iff RI₁ ∩RI₂ =∅,

I |=Ak(ai) iff ai∈AI_k, I |=Pk(ai,aj) iff (ai,aj)∈PI_k.

Given a KBK =(T,A), aΓ-interpretationIis called amodelofK ifind(K) ⊆ΓandI |=α, for allα∈ T ∪ A. In this case we writeI |=K. We writeK |=α, for an inclusion or assertionαthat only uses individual names from ind(K), ifI |=αfor all modelsIofK. The notationK |=C(a), whereCis any concept anda∈ind(K), should be understood in the same way. Finally,Kisconsistentif it has a model.

Aconjunctive query(CQ)q(x) is a formula∃yϕ(x,y), whereϕis a conjunction of atoms of the formAk(z1) or

Pk(z1,z2) withz1,z2fromx,y. LetKbe a KB andq(x) a CQ. We call a tupleaof elements fromind(K) (of the same

length asx) acertain answertoq(x) overKifI |=q(a) for all modelsIofK(understood as first-order structures). In this case we writeK |=q(a). Forqwithout free variables, the answer toqis ‘yes’ ifK |=qand ‘no’ otherwise. We slightly abuse notation and writea⊆Γto say that all elements of the tupleaare inΓ.

We remind the reader that, for combined complexity, the problem ‘K |=q(a)?’ is NP-complete for theDL-Lite

logics [6],ELandELHdr_⊥ [27], and E_xpT_ime-complete for the remaining Horn DLs introduced above [28]. For data complexity (with fixedT andq), this problem is in AC0_{for the}_DL-Lite_{logics [6] and P-complete for the remaining}

(9)

3. Σ-Query Entailment, Materialisation and (Σ,Γ)-Homomorphism

We now define the central concepts of the article,Σ-query entailment andΣ-query inseparability, provide them with a semantic characterisation based on the notion of materialisation, and develop a theory of finitely generated materialisations.

Definition 1. LetK1andK2be KBs andΣa relational signature. We say thatK1Σ-query entailsK2if

K2|=q(a) implies a⊆ind(K1) and K1|=q(a), for allΣ-CQsq(x) and all tuplesa⊆ind(K2).

Knowledge basesK₁andK₂areΣ-query inseparableif theyΣ-query entail each other; in this case we writeK₁≡_ΣK₂.

Remark 2. For KBs given in Horn DLs,Σ-query entailment for CQs impliesΣ-query entailment for UCQs, that is, unions (or disjunctions) of conjunctive queries. This follows from the fact that, for any KBKin a Horn DL and any

UCQq(x), a tupleais a certain answer toq(x) overKiffit is a certain answer to some CQ inq(x) overK. Thus, our

results forΣ-query entailment and inseparability apply to UCQs as well.

We first quickly considerΣ-query entailment for the degenerate case when one of the involved KBs is inconsistent so that in the remainder of the article we can focus on consistent KBs only. Clearly, an inconsistentK1Σ-query entails

a KBK2just in casea ∈ ind(K1) for alla ∈ ind(K2) with eitherK2 |= A(a) orK2 |=(∃R)(a), for someA ∈ Σor Σ-roleR. Now, suppose thatK1is consistent andK2is inconsistent. ThenK1Σ-query entailsK2iffK1 |=A(a) and

K1 |=P(a,b), for all concept and role namesA,P ∈Σand alla,b ∈ind(K2). Thus, decidingΣ-query entailment in

this case reduces to checking certain answers for all atomicΣ-CQs. A simple example showing that a consistent KB K₁canΣ-query entail an inconsistent KBK₂ is given byK₁ =(∅,{A(a)}) andK₂ =({Av ⊥},{A(a)}) withΣ ={A}. From now on we assume that all our KBs areconsistent.

Definition 3. LetKbe a KB. A (sig(K),ind(K))-interpretationIis called amaterialisationofKif

K |=q(a) iff I |=q(a), for all CQsq(x) and all tuplesa⊆ind(K).

We say thatKismaterialisableif it has a materialisation. (Note that we do not require a materialisation ofKto be a modelofK.)

Materialisations can be used to characterise Σ-query entailment by means of homomorphisms. Let (Σ,Γ) be a signature. For an interpretationI, theatomicΣ-typestI_Σ(u) andrI_Σ(u,v) ofu,v∈∆Iare defined by taking:

tI_Σ(u)=Σ_{-concept name}_A_|_u_∈_AI _and _rI

Σ(u,v)=Σ-roleR|(u,v)∈RI .

(It is to be emphasised that aΣ-role can be an inverse role even when we consider a language without role inverses.) We say that an elementu ∈∆I_is_Σ_{-participating in}_I_if_tI

Σ(u) ,∅or rI_Σ(u,v),∅, for somev∈∆I. The set of all

individual names that areΣ-participating inIis denoted bypartI_Σ. LetIibeΓi-interpretations, fori=1,2, such that

Γ∩partI1

Σ ⊆Γ2. A (Σ,Γ)-homomorphism hfromI1toI2is a functionh: ∆I1→∆I2such that

– h(a)=a, for everya∈Γ∩partI1

Σ ,

– tI1

Σ (u)⊆tIΣ2(h(u)) and rIΣ1(u,v)⊆rIΣ2(h(u),h(v)), for allu,v∈∆I1.

Example 4. ForΓ1 ={a,b,c}, letI1be aΓ1-interpretation with∆I1 ={a,b,c},AI1 ={a},BI1 ={b}andCI1 ={c}.

IfΣ ={A}thenpartI1

Σ ={a}as neitherbnorcisΣ-participating inI1. ForΓ2={a,b,d}, letI2be aΓ2-interpretation

with∆I2 =_{_a,_b,_d_}_,_AI2 = _{_a_}_,_BI2 = _{_d_}_and_CI2 =_{_b_}_{. In this case, any map}_h_: ∆I1 _→ ∆I2 _with_h(a) = _a_{is a}

({A},{a,b})-homomorphism fromI1toI2. However, there is no ({A,B},{a,b})-homomorphism fromI1toI2because

partI1

{A,B}={a,b}butt I1

{A,B}(b)*t I2

{A,B}(b).

We remind the reader of the following well-known link between certain answers to CQs and homomorphisms. Consider a CQ q(x) = ∃yϕ(x,y), aΓ0_{-interpretation}_I_{, and a tuple} _a _⊆_Γ0_{of the same length as} _{x. Let}_Σ_{be the}

(10)

a A

u

P R S T S T

Q Q Q

Q a

A

w

S,Q

T,Q

S,Q

T,Q R

R

a A P R−

S−

T− _S−

Q−

a

A

S,Q

T,Q S,Q R

R

I₂

I1

G2

[image:10.595.113.495.111.305.2]

G₁

Figure 2: MaterialisationsI2andI1from Example 5 (dotted lines indicate a partial homomorphism) and their generating structures,G2andG1.

whose domain consists of the individuals inaand variables in y, andIϕ(a,y) |=S(z) iffS(z) is a conjunct ofϕ(a,y).

In this case, we haveI |=q(a) iffthere is a (Σ,Γ)-homomorphism fromIϕ(a,y)toI.

SupposeI_iis a materialisation ofK_i, fori=1,2. Since a composition of homomorphisms is again a homomor-phism, if there is a (Σ,ind(K₂))-homomorphism fromI₂ toI₁, thenK₁Σ-query entailsK₂. The converse, however, does not necessarily hold, as shown by the following example.

Example 5. Consider the KBsK_i=(T_i,{A(a)}), fori=1,2, where

T1={Av ∃S, ∃S−v ∃T, ∃T−v ∃S, S vQ, T vQ, ∃Q−v ∃R},

T₂={Av ∃P, ∃P−v ∃R−, ∃Rv ∃S−u ∃Q−, ∃Qv ∃Q−, ∃S v ∃T−, ∃T v ∃S−}.

It is not hard to see (and it will be formally established below) that the interpretationsI₁andI₂shown in Fig. 2 are materialisations ofK₁andK₂, respectively. Now, forΣ ={Q,R,S,T}, there is no (Σ,{a})-homomorphism fromI₂to I₁. Indeed, if we maputo, say,wthen only the shaded part ofI₂ can be mapped (Σ,{a})-homomorphically toI₁. On the other hand,I₂|=q(a) impliesI₁|=q(a), for anyΣ-CQq(x), because anyfinitesubinterpretation ofI₂can be (Σ,{a})-homomorphically mapped toI₁. This example motivates the following definitions.

Asubinterpretationof a (Σ,Γ)-interpretationI=(∆I_,_·I_{) is a (}_Σ_,_Γ_{)-interpretation}_I0₌₍_∆I0

,·I0

) with∆I0 ⊆∆I_,

AI0

=AI_∩_∆I0

andPI0

=PI_∩₍_∆I0 ×∆I0

), for all concept and role namesAandP. Now, given a signature (Σ,Γ), we say that an interpretationI2isfinitely(Σ,Γ)-homomorphically embeddableinto an interpretationI1if, for every

finitesubinterpretationI0₂ofI2, there exists a (Σ,Γ)-homomorphism fromI0₂toI1.

In the proof of the following criterion ofΣ-query entailment, we regard any finite subinterpretation ofI2as a CQ

whose variables are the elements of∆I2_{, with}_ind₍_K

2) being the answer variables.

Theorem 6. SupposeKiis a KB with a materialisationIi, for i=1,2. ThenK1Σ-query entailsK2iffI2is finitely

(Σ,ind(K₂))-homomorphically embeddable intoI₁.

Proof. (⇒) SupposeK₁Σ-query entailsK₂. Leta=(a1, . . . ,an) be an enumeration of the individual names inind(K2)

that areΣ-participating inI₂. Take any finite subinterpretationI0

2 ofI2and let u1, . . . ,un+m be an enumeration of those elements of∆I0

2that areΣ-participating inI₂and such thatu_i=a_i, fori≤n. Consider aΣ-CQ

q(x1, . . . ,xn)=∃xn+1. . .∃xn+mϕ(x1, . . . ,xn+m), where ϕ(x1, . . . ,xn+m)=

^

i≤n+m A∈tI2

Σ (ui)

A(xi) ∧

^

i,j≤n+m R∈rI2

Σ (ui,uj)

(11)

SinceI₂|=ϕ(u1, . . . ,un+m), we haveI2 |=q(a) and, sinceI2is a materialisation,K2 |=q(a). AsK1Σ-query entails

K₂, we havea⊆ind(K₁) andK₁|=q(a). SinceI₁is a materialisation,I₁ |=q(a), and soI₁ |=ϕ(a,vn+1, . . . ,vn+m), for somevn+1, . . . ,vn+m ∈ ∆I1. Define a maph: ∆I

0

2 → ∆I1 _{by taking}_h(a

i) =ai, fori≤ n, andh(un+i) =vn+i, for i≤m(the rest of the domain ofI0

2can be mapped arbitrarily as they are notΣ-participating in it). It can be readily

seen thathis a (Σ,Γ)-homomorphism fromI0

2toI1.

(⇐) SupposeI₂is finitely (Σ,ind(K₂))-homomorphically embeddable intoI₁. Consider aΣ-CQq(x) =∃yϕ(x,y) and letK₂ |= q(a), for some a ⊆ind(K₂). SinceI₂is a materialisation ofK₂, there is a tupleu = (u1, . . . ,um) of elements in∆I2_{such that}_I

2 |=ϕ(a,u). LetI02be the subinterpretation ofI2with∆I

0

2 =ind(K₂)∪ {u₁, . . . ,u_m}and

lethbe a (Σ,ind(K₂))-homomorphism fromI0

2toI1. Observe that each individual inaisΣ-participating inI02, and

soh(ai)=ai, for eachaiina. We also haveI1|=ϕ(a,h(u1), . . . ,h(um)), whencea⊆ind(K1) andK1 |=q(a). q

One problem with applying Theorem 6 is that materialisations are in general infinite for any of the DLs considered in this article. We address this problem by introducing finite representations of materialisations and showing that

Horn-ALCH Iand all of its fragments defined above do have such finite representations.

Definition 7. LetKbe a KB and letG=(∆G,·G, ) be a finite structure such that

– ∆G₌_ind₍_K₎_∪_Ω_{, for some set}_Ω_{disjoint from}_ind₍_K_),

– (∆G,·G_{) is an interpretation with}_PG

i ⊆ind(K)×ind(K), for all role namesPi,

– (∆G_, _{) is a directed graph (possibly containing loops) with nodes}_∆G_{and arrows} _⊆_∆G_×_Ω_{, in which}

– everyw w0 is labelled with a set (w,w0)G , ∅ of roles such that (w1,w0)G = (w2,w0)G whenever

wi w0, fori=1,2,

– everyw∈Ωis reachable by a path fromind(K),

where by apath,σ, we mean any sequencew0· · ·wnwithw0∈ind(K) andwi wi+1fori<n.

Intuitively,w w0means thatwgeneratesw0 to witness an existential restriction∃R.C, and the label ofw w0 consists of the super-roles ofR. Hence, the labels on all incoming -arrows ofw0are required to coincide.

TheunravellingMofGis a (sig(K),ind(K))-interpretation (∆M_,_·M_{) such that}

∆M_{is the set of paths in}_G_,

AM=σ_|_tail₍σ₎_∈_AG _, _{for each concept name}_A_,

PM=PG∪₍σ, σ_w)_|_tail₍σ₎ _w, _P_∈₍_tail₍σ₎,_w)G

∪₍σ_w, σ₎_|_tail₍σ₎ _w, _P−_∈₍_tail₍_σ₎_,_w)G _, _{for each role name}_P_,

wheretail(σ) is the last element of a pathσ. We callGagenerating structure forKif its unravelling is a materialisa-tion ofK. We say that a DLLhasfinitely generated materialisationsif everyL-KB has a generating structure.

For instance, the materialisationsI2andI1from Example 5 are isomorphic to the unravellings of the structures

G2andG1in Fig. 2, respectively, and soGiis a generating structure for the KBKifrom that example, fori=1,2.

To construct generating structures for KBs, we first transform their TBoxes into normal form [20]. LetLbe any of our DLs. AnL-TBox is said to be innormal formif its inclusions are of the following form:

A1vA2, > vA,

A1v ∀R.A2, > v ∀R.A,

∃R.CvA, Av ∃R.C,

A1uA2vA, R1 vR2,

whereA,A1,A2 are concept names,Cis a concept name or>, andR,R1,R2 are roles. To describe the relationship

between a TBox and its transformation into normal form, we introduce the notion of model inseparability. Let (Σ,Γ) be a signature. We say thatΓ-interpretationsI1andI2coincide onΣif∆I1= ∆I2andSI1 =SI2, for allS ∈Σ; in this

case we writeI1 =_ΣI2. KBsK1andK2withind(K1)=ind(K2) are calledΣ-model inseparableif, for every model

(12)

Theorem 8. LetLbe any of our DLs. Given a consistentL-KBK =(T,A), one can construct in polynomial time anL-KBK =(T0_,_A₎_{in normal form such that}_K_and_K0_are_sig₍_T_{)-model inseparable.}

(Note that the ‘negative’ axioms of the formAv ⊥,A1uA2v ⊥andR1uR2 v ⊥can be removed from a TBox

if the knowledge base is known to be consistent.)

We show now how to define the generating structures. Suppose we are given a (consistent) KBK =(T,A) with aHorn-ALCH ITBoxT in normal form. For a roleR, theequivalence class[R]of R with respect toT is defined by taking

[R]=_S _{| T |}=_R_v_S _and_{T |}=_S _v_R .

Denote bycon(T) the set of

– concepts of the form>,Aand∃R.Athat occur inT, as well as

– concepts of the form∃R−.Csuch thatT containsCv ∀R.A.

TheT-type of u ∈ ∆I _in_I_{is the set}_τI

T(u) =

C ∈ con(T) | u ∈ CI . We say thatτ ⊆ con(T) is aT-typeif there exists a modelIofT such thatτ=τI

T(u), for someu ∈ ∆

I_{. Denote by}_type₍_T_{) the set of all}_T_{-types. It is}

well-known [29] thattype(T) can be computed in exponential time in|T |. We can orderT-types by the set-theoretic inclusion⊆. Sometimes we useτin concepts (say,∃R.τ), in which case it should be understood as an abbreviation ford_C_∈_τC.

Now, we define thegenerating relation on the set comprisingind(K) andΩT, which is the set ofallpairs of

the form ([R],τ), for a roleRinT andτ∈type(T). Fora∈ind(K) and ([R1],τ1),([R2],τ2)∈ΩT, we set

a ([R2],τ2) iff τ2is a⊆-maximalT-type such thatK |=(∃R2.τ2)(a) and

K 6|=R2(a,b), for anyb∈ind(K) withτ2⊆C∈con(T)| K |=C(b) ;

([R1],τ1) ([R2],τ2) iff τ2is a⊆-maximalT-type such thatT |=τ1 v ∃R2.τ2.

Thegenerating structureG =(∆G_,_·G_, _{) is defined as follows. Let}_Ω _⊆_Ω

T be the set of allwsuch that there are

a ∈ ind(K) andw1, . . . ,wn ∈ ΩT witha w1 · · · wn = w; in other words,Ωis the subset ofΩT that is

reachable fromind(K) via -arrows. Thus,∆G ₌_ind₍_K₎_∪_Ω_{. (The restriction of} _to_∆G _{will also be denoted by}

.) Second, the interpretation function·Gand the labelling of the graph (∆G, ) are defined by setting

AG=_a_∈_ind₍_K₎_{| K |}=_A(a) _∪ _([R],τ₎_∈Ω_|_A_∈τ ,

PG=_(a,_b)_|_R(a,_b)_{∈ A}_and_{T |}=_R_v_P ,

(w,w0)G=_S _{| T |}=_R_v_S , _{for every}_w _w0_with_w₌_([R]_,_τ₎

(here we assume thatP−(b,a)∈ AifP(a,b)∈ A). In order to show that the constructedG=(∆G,·G, ) is indeed a generating structure forK, we need to establish that its unravelling is a materialisation.

Theorem 9. LetK =(T,A)be a(consistent)KB with aHorn-ALCH ITBox in normal form. LetGbe the structure defined above. Then the unravellingMofGis a materialisation ofK, andGis a generating structure forK.

Proof. We require two lemmas. The proof of the first one is routine and can be found in Appendix A:

Lemma 10. Mis a model ofK. Moreover,

– τM_T(a)=_C_∈_con₍_T₎_{| K |}=_C(a) _{, for all a}_∈_ind₍_K_);

– τM_T(σ)=τ, for allσ∈∆M_with_tail₍_σ₎₌_([R]_,_τ_).

The second lemma says thatMis a universal model ofKin the following sense:

(13)

Proof. LetΣ =sig(K) andΓ =ind(K). By induction on the length ofσ ∈∆M_{, we define a function}_h_: _∆M _→ _∆I

which satisfies the following properties implying thathis a (Σ,Γ)-homomorphism:

h(a)=a, fora∈Γ, (1)

τM T(σ)⊆τ

I

T(h(σ)), forσ∈∆

M_, ₍₂₎

rM_Σ (σ, σ0)⊆rI_Σ(h(σ),h(σ0)), forσ, σ0∈∆M. (3)

(Note that (2) refers to the fullT-types comprising concepts of the form>,Aand∃R.Brather than the atomicΣ-types tcontaining only concept names.)

First, for eacha ∈Γ, we seth(a) =ain accordance with (1). Conditions (2) and (3) forσ, σ0 _∈_Γ_{follow from}

Lemma 10, the fact thatIis a model ofK, and the construction ofM.

Suppose now thath(σ) has already been defined forσ·([S],τ)∈∆M_{. By the construction of}_M_{, it follows that}

K |=(∃S.τ)(a) ifσ =a, orT |=τ0 _{v ∃}_S_._τ_if_tail₍_σ₎ ₌_([S0_]_,_τ0_{). Since}_{I |}₌_K_{, by Lemma 10 and the induction}

hypothesis—τM_T(σ)⊆τI_T(h(σ))—it follows that there existsz∈∆Isuch thatS ∈rI_Σ(h(σ),z) andτ⊆τI_T(z). We set h(σ·([S],τ)) =zand show that (2) and (3) hold. By Lemma 10, we haveτM_T(σ·([S],τ)) =τ, whence (2). Next, observe thatR∈rM_Σ (σ, σ·([S],τ)) follows fromT |=S vR, and sinceIis a model ofT, we obtainR∈rI

Σ(h(σ),z),

thus proving (3). _q

We are now in a position to complete the proof of Theorem 9. We show thatK |=q(a) if and only ifM |=q(a), for any CQq(x) =∃yϕ(x,y) and a ⊆ind(K). IfK |= q(a) then, by Lemma 10,M |= q(a). Conversely, suppose

M |= q(a). Then there exist a tupleσ = (σ1, . . . , σm) of elements in∆M such thatM |= ϕ(a,σ). Let Ibe any

model of K. By Lemma 11, there exists a (sig(K),ind(K))-homomorphismh fromM toI. But then we have

I |=ϕ(a,h(σ1), . . . ,h(σm)), and soI |=q(a). q

Note that the generating structuresG=(∆G,·G, ) of KBsKwithHorn-ALCH I,Horn-ALCI,Horn-ALCH andHorn-ALCTBoxes can containexponentially many(in|T |) elements inΩ(remember that∆G =ind(K)∪Ω); cf. Section 5. Note also that if the TBox inK is inHorn-ALCH (or one of its fragmentsHorn-ALC,ELHdr_⊥ or EL) then it contains no inverse roles, and so the labels (w,w0)G on arrowsw w0of the generating structure do not contain inverse roles either. We call such generating structuresforward.

The generating structures of KBs with ELHdr_⊥ andEL TBoxesT containpolynomially many elements inΩ. Indeed, for every element ([R],τ)∈Ω, we can find a single concept∃R.AinT such that

τ = _C_∈_con₍_T₎_{| T |}=_A_u l

T |=RvS >v∀S.BinT

BvC .

(This is not the case forHorn-ALCbecause of axioms of the formA1 v ∀R.A2 withA1 ,>.) We remark that the

generating structures forELdefined above were initially represented as pairs of functions by Brandt [30] and later called the canonical models; see, e.g., [31]. We prefer the term ‘generating structure’ to avoid confusion with the possibly infinite canonical model (materialisation).

Finally, the generating structures for KBs withDL-LiteTBoxesT also contain polynomially many elements inΩ because every ([R],τ)∈Ωis determined by the roleR:

τ = _C_∈_con₍_T₎_{| T |}=_∃_R−_v_C _.

Observe that ifT does not contain role inclusions (which is the case forDL-Litecore andDL-LitehornTBoxes) then, for anywandR, there is at most onew0_{such that}_w _w0_and_R_∈_(w_,_w0₎G_{. Generating structures with this property}

will be calledfunctional. We summarise these observations in the following theorem:

Theorem 12. Horn-ALCH Iand all of its fragments defined above have finitely generated materialisations. Fur-thermore, there is a polynomial p such that

(i) a generating structureGfor anyHorn-ALCH IKB(T,A)can be constructed in time|A| ·2p(|T |)_;

(14)

(iii) a forward generating structureGfor anyELHdr_⊥ KB(T,A)can be constructed in time|A| ·p(|T |);

(iv) a generating structureGfor anyDL-LiteH

hornKB(T,A)can be constructed in time|A| ·p(|T |);

(v) a functional generating structureGfor anyDL-LitehornKB(T,A)can be constructed in time|A| ·p(|T |).

As a final remark, we note that the generating structuresG=(∆G_,_·G_, _{) defined above can often be simplified.}

For example, in the case of DL-Lite KBs, we can impose the following additional restrictions on the generating relation :

(lite1) ifu ([R],τ) then [R] is≤T-minimal, where [S]≤T [T] iffT |=S vT;

(lite2) if ([R1],τ1) ([R2],τ2) then [R−₂],[R1].

It is easily seen that these simplifications do not affect the proof of Theorem 9 (the branches of the unravelling that are pruned as a result of these restrictions can be homomorphically mapped to other branches; for a more detailed argument, see the proof of Theorem 5 in the full version of [24]). The generating structureG₁in Fig. 2 as well as the generating structures in all our examples from Section 4 are constructed with these extra restrictions in mind.

So far we have only consideredΣ-query entailment becauseΣ-query inseparability can be reduced to twoΣ-query entailment checks. The following result shows that, conversely, one can reduceΣ-query entailment in LogSpaceto Σ-query inseparability, for all DLs considered in this article exceptDL-LitecoreandDL-Litehorn.3

Theorem 13. LetLbe any of our DLs that containsELor has role inclusions. ThenΣ-query entailment of consistent L-KBs isLogSpace-reducible toΣ-query inseparability ofL-KBs.

The proof of Theorem 13 is given in Appendix A and is based on the notions and results introduced in this section: the materialisations of KBs constructed to prove Theorem 12, the normal form of Theorem 8, and the semantic characterisation ofΣ-query entailment given in Theorem 6. The underlying idea is to construct modificationsK0

1and

K0

2of the given KBsK1andK2such thatK1Σ-query entailsK2iffK1andK10∪ K

0

2areΣ-query inseparable. Note that

modifications ofK1andK2are, in general, necessary: letK1 =(∅,{A(a)}),K2 =({AvB},{C(a)}) andΣ = {A,B};

thenK1Σ-query entailsK2butK1does notΣ-query entailK1∪ K2sinceK1∪ K2|=B(a).

4. FiniteΣ-homomorphic Embeddability by Games

In this section, we show that, for a DLLhaving finitely generated materialisations, the problem of checking finite

Σ-homomorphic embeddability between materialisations of KBs can be reduced to the problem of finding a winning strategy in a game played on the generating structures for these KBs.

We begin by giving a brief abstract description of the games we need. Every gameGis played by two players, player 1 and player 2, and defined by a setS of states, a setCof challenges, and two functionsχ: S → 2C _and ρ:S×C → 2S_{, where}_χ₍_s_{) is the set of challenges player 2 can choose from in any state}_s_and_ρ₍_s_,_c_{) is the set of} responses available to player 1 in order to reply to any challengecmade by player 2 in the states. The game starts in an initial states0∈Sand is played in rounds. In each roundi,i>0, the current state issi−1∈S. Ifχ(si−1)=∅, then

player 2 loses. Otherwise, player 2 challenges player 1 by choosingci∈χ(si−1). Ifρ(si−1,ci)=∅, then player 1 loses. Otherwise, player 1 responds withsi∈ρ(si−1,ci), which becomes the current state for the next roundi+1. A play of lengthnstarting froms0∈Sis any sequences0, . . . ,snof states obtained as described above. For any ordinalλ≤ω, we say that player 1 has aλ-winning strategyin the gameGstarting froms0if, for any plays0, . . . ,sn withn < λ that is played according to this strategy, player 1 has a response to any challenge of player 2 in the final statesn. The following proposition can be proved by a straightforward translation of the games introduced above into reachability games and using the known results [32, 33]:

3_{Note that, by Theorems 33 and 32,}_Σ_{-query entailment and inseparability are P-complete for}_DL-Lite

coreandDL-Litehornin both combined

and data complexity.DL-LitecoreandDL-Litehornare omitted from Theorem 13 since we have not found a direct LogSpace-reduction ofΣ-query

(15)

Proposition 14. Given a finite game G = (S,C, χ, ρ)defined above and a states0 ∈ S, it can be checked in time

polynomial in the size ofSandCwhether player 1 has anω-winning strategy froms0.

We now reformulate the definition of finiteΣ-homomorphic embedding in game-theoretic terms. LetM1andM2

be the materialisations for (consistent) KBsK1 andK2, respectively. The states of the gameG_Σ(M2,M1) are of the

form (π7→σ), whereπ∈∆M2_andσ_∈∆M1_{. Intuitively, (}π_7→σ_{) means that ‘}π_{is to be}Σ_{-homomorphically mapped}

toσ’. The game is played by player 1 and player 2 starting from some initial state (π0 7→σ0). The aim of player 1 is

to demonstrate that there exists aΣ-homomorphism from (a finite subinterpretation of)M2intoM1withπ0mapped

toσ0, while player 2 wants to show that there is no such homomorphism. In each roundi>0 of the game, player 2

challenges player 1 with someπi ∈ ∆M2 such that r_ΣM2(πi−1, πi) , ∅. Player 1, in turn, has to respond with some σi∈∆M1such that the already constructed partialΣ-homomorphism can be extended withπi7→σi:

– σi=πi, ifπi∈partM_Σ2,

– tM2

Σ (πi)⊆t M1

Σ (σi) and r M2

Σ (πi−1, πi)⊆r M1

Σ (σi−1, σi);

remember thatpartM2

Σ ⊆ind(K2). It is easy to see that if,

– for anyπ0∈∆M2, there existsσ0∈∆M1such that player 1 has anω-winning strategy in the gameGΣ(M2,M1)

starting from (π0→σ0),

then there exists aΣ-homomorphism fromM₂ intoM₁, and the other way round. ThatM₂ isfinitelyΣ -homomor-phically embeddable intoM1is equivalent to the following condition:

– for anyπ0 ∈ ∆M2 and anyn < ω, there existsσ0 ∈ ∆M1 such that player 1 has ann-winning strategy in the

gameG_Σ(M2,M1) starting from (π0→σ0).

This criterion, however, does not immediately yield any algorithm to decide finiteΣ-homomorphic embeddability because bothM₂ andM₁ can be infinite. Our aim now is to show that the existence ofn-winning strategies for player 1 in this simple infinite gameG_Σ(M₂,M₁) is equivalent to the existence of winning strategies in a more involved game played on the finite generating structures forM₂andM₁. First, in Section 4.1, we replaceM₂with its finite generating structureG₂, in which player 2 can only make challenges indicated by the generating relation Σ₂. ReplacingM₁withG₁is not so easy because player 1 can respond not only in the ‘forward’ direction (according to

Σ

1), but also in the ‘backward’ direction (because the label of Σ2 can be included in the inverse of the label of Σ1).

In the latter case, we have to ensure that all the responses of player 1 stay on the same branch ofM1, which obviously

complicates the game. In Section 4.2, we consider the forward strategies that are suitable for DLs without inverse roles. The general strategies are formulated in Section 4.5. To make the exposition more transparent, we decompose these strategies into backward strategies defined in Section 4.3, and start-bounded ones analysed in Section 4.4.

We require the following notation throughout this section. Suppose a DLLhas finitely generated materialisations. LetK be anL-KB andGits generating structure. For a relational signatureΣ, theΣ-types tG_Σ(w) andrG_Σ(w,w0) of w,w0∈∆Gare defined by:

tG_Σ(w)=Σ_{-concept name}

A|w∈AG , rG_Σ(w,w0)=

            

{Σ-roleR|(w,w0₎_∈_RG_}_, _if_w_,_w0_∈_ind₍_K₎_,

{Σ-roleR|R∈(w,w0)G}, ifw w0,

∅, otherwise,

where (P−₎G _{is the inverse of} _PG_{. We also define ¯}_rG

Σ(w,w0) to contain the inverses of the roles in rGΣ(w,w0). Note that ¯rG(a,b) = rG(b,a), fora,b ∈ ind(K) but, in general, ¯rG_Σ(w,w0) is not the same as rG_Σ(w0,w) as shown by the T−,S−-cycle in Fig. 2. We also write

w Σw0 if w w0 and rG_Σ(w,w0),∅,

that is, ifwgeneratesw0_{with a non-empty}_Σ_{-label of}_w _w0_in_G₍ _{-arrows with empty}_Σ_{-labels are irrelevant for}

Σ-homomorphisms).

(16)

4.1. Infinite Game G_Σ(G₂,M₁)

Thestatesof this game are of the forms_i=(ui7→σi), fori≥0,ui∈∆G2andσi∈∆M1, such that

(s1) t

G2

Σ (ui)⊆t M1

Σ (σi).

The game starts in a states0=(u07→σ0) with

(s0) σ0=u0in caseu0 ∈part

M2

Σ .

In each roundi >0, player 2 challenges player 1 with someui ∈∆G2such thatui−1 Σ2 ui. Player 1 has to respond with aσi∈∆M1satisfying(s1)and

(s2) r

G2

Σ (ui−1,ui)⊆r M1

Σ (σi−1, σi).

This gives the next statesi =(ui 7→ σi). Note that of all theui onlyu0 may be an ABox individual fromind(K2);

however, there is no such a restriction on theσi. As the gameGΣ(G2,M1) is not played on the individuals ofK2, we

need to make sure that the ABox part ofM₂is (Σ,ind(K₂))-homomorphically embeddable into the ABox part ofM₁. Thus, we require an additional condition:

(abox) partM2

Σ ⊆ind(K1) andt

M2

Σ (a)⊆tMΣ1(a) andrMΣ2(a,b)⊆rMΣ1(a,b), for anya,b∈partMΣ 2.

The following theorem gives a game-theoretic flavour to the criterion of Theorem 6.

Theorem 15. (i) M₂ is finitelyΣ-homomorphically embeddable intoM₁ if and only if(abox)and the following condition hold:

(win) for any u0 ∈∆G2and n< ω, there existsσ0 ∈∆M1 such that player 1 has an n-winning strategy in the game

G_Σ(G2,M1)starting from(u07→σ0).

(ii)There exists aΣ-homomorphism fromM₂toM₁if and only if(abox)and the following condition hold:

(ω-win) for any u0 ∈∆G2, there isσ0 ∈∆M1such that player 1 has anω-winning strategy in the game GΣ(G2,M1)

starting from(u07→σ0).

Proof. We only prove (i) and leave (ii) to the reader.

(⇒) SupposeM₂ is finitelyΣ-homomorphically embeddable into M₁. Then(abox)holds by the definition ofΣ -homomorphism. To show that(win)holds, supposeu0 ∈ ∆G2 andn < ωare given. Take a finite subinterpretation

M₀₂ of M₂ that contains σu0, for some (say, the shortest) word σ, and all those elements ofM2 whose distance

fromσu0does not exceedn(M02 also contains all individual names ofM2). Leth:M02 → M1be a (Σ,ind(K2

))-homomorphism. Takeσ0=h(σu0). Clearly,u0andσ0satisfy(s0)and(s1). We show that player 1 has ann-winning

strategy in the gameG_Σ(G2,M1) starting from (u0 7→ σ0). Suppose player 2 picks u0 Σ₂ u1. Then σu0u1 is

an element ofM02, and player 1 responds with σ1 = h(σu0u1). Conditions(s1)and(s2)hold because h is aΣ

-homomorphism. In the same way player 1 useshto respond to all challenges of player 2 in any roundk<nof the gameG_Σ(G2,M1).

(⇐) LetM02be a finite subinterpretation ofM2. We enumerate elements of the domain ofM02in such a way thatσ

appears in the list beforeσ0_whenever_σ0₌_σ_{u, for some}_{u. We define, by induction, a (}_Σ_,_ind₍_K

2))-homomorphism

h:M02 → M1as follows. Letnbe the number of elements in the domain ofM02. Pick the first (in the order described

above) elementσthat has not been mapped toM1yet. There are two possible options.

– Suppose first that there is noσ0∈∆M02 such thatσ=σ0uandtail(σ0) Σ₂ u, for someu. Then, by(win), there

isσ0∈∆M1_{such that player 1 has an}_{n-winning strategy in the game}_G

Σ(G2,M1) starting from (tail(σ)7→σ0).

We seth(σ)=σ0. Note that ifσ=a, for somea∈partM2

Σ , then, by(s0),h(a)=a.

– Otherwise, we consider thelongestsequenceu1, . . . ,uk,k ≥ 1, such thattail(σ0) Σ2 u1 Σ2 · · · Σ2 uk and σm=σ0u1· · ·um∈∆M02, for allm<k, withσ=σk. By the definition of the order,σ0, . . . , σk−1have already

been mapped byh. By construction and(win), player 1 has ann-winning strategy from (tail(σ0) 7→ h(σ0)).

Therefore, player 1 has a responseσ0_{to the challenge}_tail₍_σ

(17)

a

u u0

R

Q

Q R

R

GΣ₂

a

R

Q

MΣ₁

0

10_,₂0_{, . . . ,}_n0

1

2

3

4

n a

u0

u1

u2

v

R−

S−

T− _S−

Q−

GΣ₂

a

σ2

σ3

σ4

σ5

S,Q

T,Q

S,Q

T,Q

S,Q R

R

MΣ₁

0 1

2 20

3

30

4

40

(a) (b)

Figure 3: (a) Example 16:n-winning strategy inG_Σ(G2,M1) from (a7→a). (b) Example 17: 4-winning strategy inG_Σ(G2,M1) from (u07→σ4).

It is readily seen that, by(abox),(s1)and(s2), the constructedhis a (Σ,ind(K2))-homomorphism fromM02toM1. q

Example 16. ConsiderGΣ₂ andMΣ₁ shown in Fig. 3a, whereΣ = {Q,R}. Ann-winning strategy for player 1 in G_Σ(G2,M1) starting from (a7→a) is shown by dotted lines with the rounds of the game indicated by the numbers on

the dotted lines. In the state (a7→ a), player 2 has two possible challenges: a Σ₂ uanda Σ₂ u0. In response to the former, player 1 mapsutoaand the successive challenges to the elements of the chain that begins withRQ(indicated by indices 1,2, . . .). In response to the latter challenge, player 1 mapsu0and all the successive challenges to the same elementa(indices 10,20, . . .). Note that in all but the starting state, player 2 has only one possible challenge.

Example 17. Consider nowGΣ₂andMΣ₁in Fig. 3b, whereΣ ={Q,R,S,T}(see also Example 5). A 4-winning strategy for player 1 inG_Σ(G₂,M₁) starting from (u0 7→ σ4) is shown in Fig. 3b by dotted lines (again, rounds of the game

are indicated by the numbers). In contrast to Example 16, where player 1 either stays in the ABox or always moves away from it, the winning strategy for player 1 now is to move in the opposite direction, towards the ABox. (Note that in round 2, player 2 has two possible challenges,u1 Σ₂ u2andu1 Σ₂ v.) In fact, for anyn>0, player 1 has an

n-winning strategy starting from any (u07→σm) provided thatmis even andm≥n.

The criterion of Theorem 15 does not seem to be a big improvement on Theorem 6 as we still have to deal with an infinite materialisation. Note that, for some DLs such asEL,Horn-ALCandDL-Litehorn, it is enough to play the same game as defined above but on thefinitegenerating structuresG2andG1. We denote this na¨ıve reformulation of

G_Σ(G2,M1)—in whichσiandM1are replaced withwiandG1, respectively—byGn_Σ(G2,G1) and invite the reader to

prove that, in the case of, sayDL-Litehorn, Theorem 15 will continue to hold if we replace(win)with the following condition, which can be checked in polynomial time inO(|G2| × |G1|): for anyu0 ∈∆G2, there existsw0 ∈∆G1 such

that player 1 has anω-winning strategy in the gameGn_Σ(G₂,G₁) starting from (u0 7→w0). (We shall obtain this result

later as a consequence of a more general theorem.) Unfortunately, the existence of anω-winning strategy in this na¨ıve game does not implyΣ-homomorphic embeddability ofM₂intoM₁for DLs such asDL-LiteH_coreorHorn-ALCI.

In the remainder of this section, we show that condition(win)in the infinite gameG_Σ(G₂,M₁) can be checked by analysing a much more complex game on the finite generating structuresG₂ andG₁. We consider four types of strategies inG_Σ(G₂,M₁): forward, backward, start-bounded and general. For each strategy type,θ, we define a game Gθ_Σ(G2,G1) such that the following conditions are equivalent:

(win-θ) for anyu0 ∈∆G2 andn< ω, there isσ0∈∆M1such that player 1 has ann-winningθ-strategy in the infinite

[image:17.595.91.521.114.318.2]

(18)

a

u

v

v0

u0

R

Q

Q R

R

GΣ₂

a

w

w0 R

R

Q

GΣ₁

0

10_,₂0_{, . . .}

1

2

3,4, . . .

a7→a

u0_7→

a u7→a

v7→aw

v0_7→

aww0

v0_7→

aww0

w0

a u0 _a _u

u v

v v0

v0

u0

[image:18.595.124.487.109.293.2]

(a) (b)

Figure 4: The forward gameGf

Σ(G2,G1) from (a7→a) in Example 18: (a) anω-winning strategy for player 1; (b) the infinite graphTfor extracting

ω-winning strategies.

(ω-winθ) for anyu0∈∆G2, player 1 has anω-winning strategy in the finite gameGθ_Σ(G2,G1) starting from some state

depending onu0andθ.

We begin by considering ‘forward’ winning strategies (such as in Example 16) that are sufficient for the DLs without inverse roles.

4.2. Forward Strategy and Game G_Σf(G₂,G₁)

We say that aλ-strategy (λ≤ω) for player 1 in the gameG_Σ(G2,M1) isforwardif, for any play of lengthi−1< λ,

which conforms with this strategy, and any challengeui−1 Σ₂ uiby player 2, the responseσiof player 1 is such that eitherσi−1, σi ∈ ind(K1) orσi =σi−1w, for somew ∈ ∆G1. For instance, if the generating structuresGi,i =1,2, are forward theneverystrategy inG_Σ(G2,M1) is forward, and so(win)coincides with(win-f). By Theorem 12 (ii)

and (iii), this is the case forHorn-ALCH,Horn-ALC,ELHdr_⊥ andEL.

The existence of a forwardλ-winning strategy for player 1 inG_Σ(G2,M1) is equivalent to the existence of aλ

-winning strategy in thegame G_Σf(G₂,G₁) whose states, initial states, challenges of player 2 and responses of player 1 are defined in the table below:

forward gameG_Σf(G₂,G₁)

states,i≥0 (ui7→wi) withui∈∆G2,wi∈∆G1

andtG2

Σ (ui)⊆t G1

Σ (wi)

initial state (u07→w0) such thatw0=u0in caseu0∈partM_Σ2

challenges,i>0 ui−1 Σ₂ ui

responses,i>0 wisuch that eitherwi−1 1wiorwi−1,wi∈ind(K1)

andrG2

Σ (ui−1,ui)⊆r G1

Σ (wi−1,wi) Note again that of alluionlyu0may belong toind(K2).

Example 18. ConsiderGΣ₂andG₁Σshown in Fig. 4a, whereGΣ₁is a generating structure that can be unravelled intoMΣ₁ from Example 16. It is not hard to see that, for anyu0∈∆G2, there isw0∈∆G1 such that player 1 has anω-winning

strategy inG_Σf(G₂,G₁) starting from (u07→w0). Such a strategy starting from (a7→a) is depicted by dotted lines.