Log-Linear Description Logics

(1)

Log-Linear Description Logics

Mathias Niepert

and

Jan Noessner

and

Heiner Stuckenschmidt

KR & KM Research Group

Universit¨at Mannheim

Mannheim, Germany

Abstract

Log-linear description logics are a family of prob-abilistic logics integrating various concepts and methods from the areas of knowledge representa-tion and reasoning and statistical relarepresenta-tional AI. We deﬁne the syntax and semantics of log-linear de-scription logics, describe a convenient represen-tation as sets of ﬁrst-order formulas, and discuss computational and algorithmic aspects of proba-bilistic queries in the language. The paper con-cludes with an experimental evaluation of an im-plementation of a log-linear DL reasoner.

1 Introduction

Numerous real-world problems require the ability to handle both deterministic and uncertain knowledge. Due to differ-ences in epistemological commitments made by their particu-lar AI communities, researchers have mostly been concerned with either one of these two types of knowledge. While the representation of purely logical knowledge has been the focus of knowledge representation and reasoning, reasoning about knowledge in the presence of uncertainty has been the ma-jor research theme of the machine learning and uncertainty in AI communities. Nevertheless, there have been some at-tempts to combine the diverse concepts and methods by sev-eral researchers. The resulting approaches include proba-bilistic description logics [Jaeger, 1994; Kolleret al., 1997; Lukasiewicz and Straccia, 2008] and statistical relational languages [Getoor and Taskar, 2007] to name but a few. While the former have mainly been studied on a theoretical level, statistical relational languages have been proven use-ful for a number of practical applications such as data inte-gration [Niepert et al., 2010] and NLP [Riedel and Meza-Ruiz, 2008]. A line of work that has regained importance with the use of semantic web languages is the combination of probabilistic models with description logics [Lukasiewicz and Straccia, 2008]. Probabilistic description logic pro-grams [Lukasiewicz, 2007], for instance, combine descrip-tion logic programs under the answer set and well-founded semantics with independent choice logic [Poole, 2008].

In this paper, we represent and reason about uncertain knowledge by combining log-linear models [Koller and Friedman, 2009] and description logics [Baaderet al., 2003].

Log-linear models allow us to incorporate both probabilis-tic and determinisprobabilis-tic dependencies between description logic axioms. The ability to integrate heterogeneous features make log-linear models a commonly used parameterization in ar-eas such as NLP and bioinformatics and a number of sophis-ticated algorithms for inference and parameter learning have been developed. In addition, log-linear models form the ba-sis of some statistical relational languages such as Markov logic [Richardson and Domingos, 2006].

The logical component of the presented theory is based on description logics for which consequence-driven reasoning is possible [Kr¨otzsch, 2010; Baaderet al., 2005]. We focus par-ticularly on the description logic EL++ which captures the expressivity of numerous ontologies in the medical and bi-ological sciences and other domains. EL++ is also the de-scription logic on which the web ontology language proﬁle OWL 2 EL is based [Baaderet al., 2005]. Reasoning services such as consistency and instance checking can be performed in polynomial time. It is possible to express disjointness of complex concept descriptions as well as range and domain restrictions [Baaderet al., 2008]. In addition, role inclusion axioms (RIs) allow the expression of role hierarchiesr s

and transitive rolesr◦rr.

In real-world applications, uncertainty occurs often in form ofdegrees of confidenceortrust. The semantic web commu-nity, for instance, has developed numerous data mining al-gorithms to generate confidence values for description logic axioms with ontology learning and matching being two prime applications. Most of these confidence values have no clearly defined semantics. Confidence values based on lexical sim-ilarity measures, for instance, are in widespread use while more sophisticated algorithms that generate actual probabil-ities make often na¨ıve assumptions about the dependencies of the underlying probability distribution. Hence, formalisms and inference procedures are needed that incorporate degrees of confidence in order to represent uncertain axioms and to compute answers to probabilistic queries while utilizing the logical concepts ofcoherencyandconsistency.

To respond to this need, we introduce log-linear descrip-tion logics as a novel formalism for combining deterministic and uncertain knowledge. We describe a convenient represen-tation of log-linear description logics that allows us to adapt existing concepts and algorithms from statistical relational AI to answer standard probabilistic queries. In particular, we for-Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

(2)

Name Syntax Semantics top ΔI bottom ⊥ ∅ nominal {a} {aI} conjunction CD CI∩DI existential _∃_r.C {x∈ΔI|∃y∈ΔI : restriction (x, y)∈rI∧y∈CI} GCI CD CI ⊆DI RI r₁◦...◦r_kr r₁I◦...◦r_kI⊆rI

Table 1: The DLEL++without concrete domains. mulate maximum a-posteriori queries and present an efﬁcient algorithm that computes most probablecoherent models, a reasoning service not supported by previous probabilistic de-scription logics. We conclude the paper with an experimental evaluation of ELOG, our implementation of a log-linear de-scription logic reasoner.

2 Description Logics

Description logics (DLs) are a family of formal knowledge representation languages. They provide the logical formalism for ontologies and the Semantic Web. We focus on the partic-ular DLEL++without concrete domains, henceforth denoted asEL++ [Baaderet al., 2005]. We conjecture that the pre-sented ideas are also applicable to other description logics for which materialization calculi exist [Kr¨otzsch, 2010].

Concept and role descriptions inEL++are deﬁned recur-sively beginning with a setNCof concept names, a setNRof

role names, and a set NI of individual names, and are built with the constructors depicted in the column “Syntax” of Ta-ble 1. We writeaandbto denote individual names;randsto denote role names; andC andDto denote concept descrip-tions. The semantics are deﬁned in terms of an interpretation I = (ΔI_,_·I₎_where_ΔI_{is the non-empty}_domain_{of the} in-terpretation and·Iis the interpretation function which assigns to everyA∈NCa setAI ⊆ΔI, to everyr∈NRa relation

rI⊆ΔI×ΔI, and to everya∈NIan elementaI ∈ΔI.

A constraint box (CBox) is a ﬁnite set of general concept inclusion (GCIs) and role inclusion (RIs) axioms. Given a CBoxC, we useBC_C to denote the set ofbasic concept de-scriptions, that is, the smallest set of concept descriptions consisting of the top concept , all concept names used in C, and all nominals{a}appearing inC. A CBoxCis in nor-mal form if all GCIs have one of the following forms, where

C1, C2∈BCC andD∈BCC∪ {⊥}:

C1D; C1 ∃r.C2;

C1C2D; ∃r.C1D;

and if all role inclusions are of the formrsorr1◦r2s.

By applying a ﬁnite set of rules and introducing new concept and role names, any CBoxCcan be turned into a normalized CBox of size polynomial inC. For anyEL++ CBoxC we writenorm(C)to denote the set of normalized axioms that result from the application of the normalization rules toC.

An interpretationI satisﬁes an axiom c if the condition in the column “Semantics” in Table 1 holds for that axiom.

An interpretationI is amodelof a CBoxCif it satisﬁes ev-ery axiom in C. A conceptC is subsumed by a conceptD

with respect to a CBoxC, writtenC C D, ifCI ⊆DIin every model ofC. A normalized CBox isclassiﬁedwhen sub-sumption relationships betweenallconcept names are made explicit. A CBoxCiscoherentif for all concept namesCin Cwe have thatC C⊥. For every axiomcand every set of axiomsC, we writeC |=cif every model ofCis also a model of{c}and we writeC |=CifC |=cfor everyc ∈ C. For a ﬁnite setNU⊆NC∪NRof concept and role names theset of all normalized axioms constructible fromNUis the union of

(a) all normalized GCIs constructible from concept and role names in NU and the top and bottom concepts; and (b) all

normalized RIs constructible from role names inNU.

3 Log-Linear Models

Log-linear models are parameterizations of undirected graph-ical models (Markov networks) which are central to the areas of reasoning under uncertainty and statistical relational learn-ing [Koller and Friedman, 2009; Getoor and Taskar, 2007]. A Markov networkMis an undirected graph whose nodes represent a set of random variables{X1, ..., Xn}and whose

edges model direct probabilistic interactions between adja-cent nodes. A distribution P is a log-linear model over a Markov networkMif it is associated with:

• a set of features{f1(D1), ..., fk(Dk)}, where eachDi

is a clique inMand eachfiis a function fromDitoR, • a set of real-valued weightsw1, ..., wk, such that

P(X1, ..., Xn) = 1 Zexp _k i=1 wifi(Di) ,

whereZis a normalization constant.

Arguably one of the more successful statistical relational languages is Markov logic [Richardson and Domingos, 2006] which can be seen as a ﬁrst-order template language for log-linear models with binary variables. A Markov logic network is a ﬁnite set of pairs(Fi, wi),1 ≤ i ≤ k, where eachFi

is a ﬁrst-order formula and eachwia real-valued weight

as-sociated withFi. With a ﬁnite set of constantsCit deﬁnes

a log-linear model over possible worlds. A possible world

xis a truth assignments to ground atoms (predicates without variables) with respect toC. Each variableXj,1 ≤j ≤n,

corresponds to a ground atom and featureficorresponds to

the number of true groundings ofFiin possible worldx.

4 Log-Linear Description Logics

Log-linear description logics integrate description logics with probabilistic log-linear models. While the syntax of log-linear description logics is that of the underlying description logic, it is possible (but not necessary) to assign real-valued weights to axioms. The semantics is deﬁned by a log-linear probability distribution overcoherentontologies. In the re-mainder of the paper, we focus on the log-linear description logicEL++ without nominals and concrete domains1_which we denote asEL++-LL.

1

We conjecture that the theory is extendable to capture nominals and concrete domains. However, we leave this to future work.

(3)

4.1 Syntax

The syntax of log-linear description logics is equivalent to the syntax of the underlying description logic except that it is possible to assign weights to GCIs and RIs. More formally, a EL++-LL CBox C = (CD,CU) is a pair consisting of a deterministicEL++CBoxCDand anuncertainCBoxCU = {(c, wc)}which is a set of pairs(c, wc)with eachcbeing a EL++_{axiom and}_w_{a real-valued weight assigned to}_c_{. Given}

aEL++-LLCBoxC we use BC_C to denote the set of basic concept descriptions occurring inCDorCU.

While the deterministic CBox contains axioms that are known to be true, the uncertainCBox contains axioms for which we only have adegree of conﬁdence. Intuitively, the greater the weight of an axiom the more likely it is true. Ev-ery axiom can either be part of the deterministic or the uncer-tain CBox but not both. The deterministic CBox is assumed to be coherent.

4.2 Semantics

The semantics of log-linear DLs is based on joint probability distributions overcoherentEL++ CBoxes. The weights of the axioms determine the log-linear probability distribution. For aEL++-LLCBox(CD,CU)and a CBoxCover the same set of basic concept descriptions and role names, we have that

P(C) = ⎧ ⎨ ⎩ 1 Zexp {(c,wc)∈CU:C|=c}wc _if_C_is_coherent andC |=CD; 0 otherwise

whereZis the normalization constant of the log-linear proba-bility distributionP. The semantics of the log-linear descrip-tion logicEL++-LLleads to exactly the probability distribu-tions one would expect under the open world semantics of description logics as demonstrated by the following example.

Example 1. Let BC_C = {C, D}, let CD = ∅, and let CU ₌ _{

C D,0.5,CD ⊥,0.5}.Then2_,_P₍_{_C

D, C D ⊥}) = 0, P({C D}) = Z−1exp(0.5),

P({C D, D C}) = Z−1exp(0.5),_P({_C _D ⊥ }) = Z−1exp(0.5), _P({_D _C}) = _Z−1exp(0), and

P(∅) =Z−1exp(0)withZ= 3 exp(0.5) + 2 exp(0).

4.3 Herbrand Representation

In order to formalize probabilistic queries for log-linear de-scription logics we represent EL++-LL CBoxes as sets of ﬁrst-order sentences modeling the uncertain and deterministic axioms. We begin by deﬁning some basic concepts.

Deﬁnition 2. Let_Pbe a set of predicate symbols and_Cbe a set of constant symbols. An atom is an expression of the form

p(t1, ..., tn)wherep∈Pandt1, ..., tnare variables or

con-stant symbols. A literal is an atom or its negation. LetSbe a set of ﬁrst-order formulas built inductively from the literals, with universally quantiﬁed variables only. A grounding of a formula is obtained by replacing each variable with a con-stant symbol in_C. The Herbrand base ofSwith respect to_C is the set of ground atoms whose predicate symbols occur in

2

We omit trivial axioms that are present in every classiﬁed CBox such asC andCC.

F1 ∀c:sub(c, c)

F2 ∀c:sub(c,)

F3 ∀c, c, d:sub(c, c)∧sub(c, d)⇒sub(c, d)

F4 ∀c, c1, c2, d:sub_int(c, c₍_c1)∧sub(c, c2)∧ 1, c2, d)⇒sub(c, d)

F5 ∀c, c, r, d:sub(c, c)∧rsup(c, r, d)⇒rsup(c, r, d)

F6 ∀c, r, d, d

_{, e}_:_rsup₍_{c, r, d}₎_∧_sub₍_{d, d}₎_∧

rsub(d, r, e)⇒sub(c, e) F7 ∀c, r, d, s:rsup(c, r, d)∧psub(r, s)⇒rsup(c, s, d)

F8 ∀c, r1, r2, r3, d, e:_pcomrsup(₍c, r_r 1, d)∧rsup(d, r2, e)∧ 1, r2, r3)⇒rsup(c, r3, e)

F9 ∀c:¬sub(c,⊥)

Table 2: The set of first-order formulasF. Groundings of the formulas have to be compatible with the types of the predi-cates specified in Definition 3.⊥andare constant symbols representing the bottom and top concept.

S. Each subset of the Herbrand base is aHerbrand interpre-tationspecifying which ground atoms are true. A Herbrand interpretation_His aHerbrand modelofS, written as|=_HS, if and only if it satisﬁes all groundings of formulas inS.

The set of formulasFlisted in Table 2 is partially derived from theEL++ completion rules [Baaderet al., 2005]. We are now in the position to deﬁne a bijective functionϕthat, given a ﬁnite set of concept and role namesNU, maps each

normalizedEL++-LLCBox overNUto a subset of the Her-brand base ofFwith respect toNU.

Deﬁnition 3 (CBox Mapping). Let NC and NR be sets of concept and role names and letNU⊆NC∪NRbe a ﬁnite set.

Let T be the set of normalizedEL++ axioms constructible fromNU. Moreover, letHbe the Herbrand base ofF with

respect toNU. The functionϕ:℘(T)→℘(H)maps

normal-ized CBoxes to subsets ofHas follows:(_ϕ(C) =_c_∈C_ϕ(_c))

C1D → sub(C1, D) C1C2D → int(C1, C2, D) C1 ∃r.C2 → rsup(C1, r, C2) ∃r.C1D → rsub(C1, r, D) rs → psub(r, s) r1◦r2r3 → pcom(r1, r2, r3).

All predicates are typed meaning thatr, s, ri,(1 ≤ i ≤ 3),

are role names, C1, C2 basic concept descriptions, and D

basic concept descriptions or the bottom concept.

We now prove that, relative to a finite set of concept and role names, the functionϕinduces a one-to-one correspon-dence between Herbrand models of the first-order theory F and coherent and classifiedEL++CBoxes.

Lemma 4. LetNCandNRbe sets of concept and role names and letNU ⊆ NC ∪NR be a ﬁnite set. LetT be the set of

normalizedEL++ axioms constructible fromNUand letH

be the Herbrand base ofFwith respect toNU. Then,

(a) for anyC ⊆ T we have that ifCis classiﬁed and coher-ent then_ϕ(C)is a Herbrand model ofF; and

(b) for any_H ⊆ Hwe have that if_H is a Herbrand model ofFthen_ϕ−1(_H)is a classiﬁed and coherent CBox.

(4)

From Lemma 4 we know that, relative to a ﬁnite setNUof

concept and role names, each normalized CBox overNUthat

is classiﬁed and coherent, corresponds to exactly one Her-brand model ofF. We extend the normalization of EL++ CBoxes toEL++-LLCBoxes as follows.

Deﬁnition 5. LetC= (CD,CU)be aEL++-LLCBox. Then, normLL(C) =norm(CD)∪₍_c,w_c₎_∈CUnorm({c}).

Lemma 4 provides the justiﬁcation for constructing the logical representation of aEL++-LLCBox as follows: G is a set ofweightedground formulas carrying the uncertain in-formation and it is derived from the axioms in the uncertain CBoxCUas follows. For every pair(c, wc)∈ CUwe add the

conjunction of ground atoms

g∈ϕ(norm({c}))

g

toGwith weightwc. The setK is constructed analogously

from the deterministic CBoxCDexcept that we do not asso-ciate weights with the ground formulas.3

Example 6. LetCD={CD}andCU={BCD E,0.5}. Then,normLL(C) =norm({CD})∪norm({B

CD E}) = {C D, C D A, B A E}

with_A a new concept name. We have that_ϕ(norm({_C

D})) = {sub(C, D)}and_ϕ(norm({_B_C_D _E})) = {int(C, D, A), int(B, A, E)}. Hence, we addsub(C, D)to Kand_int(_{C, D, A})∧_int(_{B, A, E})with weight0_.5toG.

4.4 Maximum A-Posteriori Inference

Under the given syntax and semantics the ﬁrst central infer-ence task is the maximum a-posteriori (MAP) query:“Given aEL++-LLCBox, what is a most probable coherentEL++ CBox over the same concept and role names?”In the context of probabilistic description logics, the MAP query is crucial as it infers amost probable classical ontology from a prob-abilistic one. The MAP query also captures two important problems that frequently occur in the context of the Semantic Web: Ontology learning and ontology matching.

The following theorem combines the previous results and formulates the EL++-LL MAP query as a maximization problem subject to a set of logical constraints.

Theorem 7. LetC = (CD,CU)be aEL++-LLCBox, letNU

be the set of concept and role names used innormLL(C), and

letHbe the Herbrand base ofFwith respect toNU. More-over, letKbe the set of ground formulas constructed fromCD and letGbe the set of weighted ground formulas constructed fromCU. Then, with

ˆ H:= arg max {H⊆H:|=H(K∪F)} {(G,wG)∈G:|=HG} wG (1) we have thatϕ−1( ˆH)is amost probable coherentCBox over NUthat entailsCD.

By showing that every partial weighted MAX-SAT prob-lem can be reduced to an instance of theEL++ MAP query

3

Note that the number of formulas inG andKis equal to the number of axioms inCUandCD, respectively.

problem of Theorem 7 in polynomial time and by invoking the complexity results for the former problem [Creignouet al., 2001] we can prove the following theorem.

Theorem 8. The maximum a-posteriori problem forEL++ -LLCBoxes is NP-hard and APX-complete.

Conversely, we can solve a MAP query by reducing it to a (partial) weighted MAX-SAT instance with a set of clauses polynomial in the number of axiomsandbasic con-cept descriptions. There exist several sophisticated algo-rithms for weighted MAX-SAT. We developed ELOG4_{, an} efﬁcientEL++-LLreasoner based on integer linear program-ming (ILP). LetFNU_{be the set of all groundings of formulas} inFwith respect to the set of constant symbolsNU. For each

ground atomg_i occurring in a formula inG,K, orFNU _we associate a unique variablexi∈ {0,1}. LetCjGbe the set of

indices of ground atoms in sentenceGj ∈ G, letCjK be the

set of indices of ground atoms in sentenceKj ∈ K, and let

C_jF( ¯C_jF)be the set of indices of unnegated (negated) ground atoms of sentenceFj ∈ FNU in clausal form. With each

for-mulaGj ∈ Gwith weightwj(without loss of generality we assume non-negative weights) we associate a unique variable

zj ∈ {0,1}. Then, the ILP is stated as follows max |G| j=1 wjzj subject to i∈CG j xi≥ |CjG|zj, ∀j (2) and i∈CK j xi≥ |CjK|, ∀j (3) and i∈CF j xi+ i∈C¯F j (1−xi)≥1, ∀j (4).

Every state of the variablesxi of a solution for the ILP cor-responds (via the functionϕ) to a most probable classiﬁed and coherent CBox overNUthat entailsCD. Adding all

con-straints of type (4) at once, however, would lead to a large and potentially intractable optimization problem. Therefore, we employ a variant of the cutting plane inference algorithm first proposed in the context of Markov logic [Riedel, 2008]. We first solve the optimization problemwithoutconstraints of type (4). Given a solution to the partial problem we deter-mine in polynomial time those constraints of type (4) that are violated by this solution, add those to the formulation, and solve the updated problem. This is repeated until no violated constraints remain. Please note that the MAP query derives a most probable CBoxandclassifies it at the same time.

4.5 Conditional Probability Inference

The second type of inference task is the conditional probabil-ity query:“Given aEL++-LLCBox, what is the probability of a conjunction of axioms?”More formally, given aEL++ -LLCBoxCand a setCQof normalizedEL++-LLaxioms over the same concept and role names, theconditional probability queryis given by P(CQ | C) = _{C_:_CQ_⊆C_}P(C)where

eachCis a classiﬁed and normalized CBox over the same set of basic concept descriptions and role names.

4

(5)

Algorithm 1 MC-ILPforEL++-LL

Input: weightswjand variableszjof the MAP ILP

1: _c_j←_w_j for allj

2: _x(0)←solution of ILP with allwjset to0

3: fork←1tondo

4: set allzjand allwjto 0

5: for all_z_jsatisﬁed byx(k−1)do

6: with probability1−e−cj _ﬁx_z

jto1in ILP(k)

7: end for

8: for all_z_jnot ﬁxed to1do

9: with probability0.5setwjto1in ILP(k)

10: end for

11: x(k)←solution of ILP(k)

12: end for

Determining the exact conditional probability is infeasible in the majority of use-cases. Hence, for the reasoner ELOG4, we developed a Markov chain Monte Carlo (MCMC) variant similar to MC-SAT, a slice sampling MCMC algorithm [Poon and Domingos, 2006]. Simpler sampling strategies such as Gibbs sampling are inadequate due to the presence of deter-ministic dependencies. Poon and Domingos showed that the Markov chain generated by MC-SAT satisﬁesergodicityand detailed balance. In practice, however, it is often too time-consuming to obtain uniform samples as required by MC-SAT and, thus, we loosen the uniformity requirement in favor of higher efﬁciency.

Algorithm 1 lists the pseudo-code of MC-ILP which is based on the previous MAP ILP except that we also add, for each query axiom not inCU, a constraint of type (2) associated with a new variablezj and weight0. In each iteration, after fixing certain variables to1 (line 6), we build an ILP where the coefficients of the objective function are set to 1 with probability0.5and remain0otherwise (line 9). The solution is thenas close as possibleto the uniform sample. MC-ILP samples coherent and classified CBoxes from the joint distri-bution and determines the probability of a conjunction of ax-ioms as the fraction of samples in which they occur together.

5 Experiments

To complement the presented theory we also assessed the practicality of the reasoning algorithms. After all, the devel-opment of the theory was motivated primarily by the need for algorithms that, given a set of axioms with conﬁdence values, compute a most probablecoherentontology. Two examples of this problem are ontology learning and ontology matching. Due to space considerations, we only discuss and evaluate the results on an instance of the former problem.

The ontology learning community has developed and ap-plied numerous machine learning and data mining algorithms to generate confidence values for DL axioms. Most of these confidence values have no clearly defined semantics. Con-fidence values based on lexical similarity measures, for in-stance, are in widespread use while more sophisticated al-gorithms that generate actual probabilities make often na¨ıve assumptions about the dependencies of the underlying prob-ability distribution. Hence, formalisms are needed that

in-Axiom type Algorithm Precision Recall F1score

Subsumption Greedy 0.620 0.541 0.578

EL++_-LL_MAP _0.784 _0.514 _0.620

Disjointness Greedy 0.942 0.980 0.961

EL++_-LL_MAP _0.935 _0.990 _0.961

Figure 1: Results for weighted axioms derived from the AMT questionnaireswithoutadditional known axioms.

Axiom type Algorithm Precision Recall F1score

Subsumption Greedy 0.481 0.669 0.559

EL++_-LL_MAP _0.840 _0.568 _0.677

Disjointness Greedy 0.948 0.960 0.954

EL++_-LL_MAP _0.937 _0.992 _0.964

Figure 2: Results for weighted axioms derived from the AMT questionnaireswithadditional known axioms.

corporate these various types of conﬁdence values in order to compute most probable ontologies while utilizing the logical concepts of coherency and consistency.

We decided to generate conﬁdence values using a “crowd-sourcing” service. Probably the best known crowdsourc-ing platform is theAmazon Mechanical Turk(AMT)5_{. With} AMT, Amazon offers numerous options for designing cus-tomized questionnaires. Due to its relatively high publicity (about 100,000 tasks were available at the time of this writ-ing), it attracts a lot of users and consequently seems most suitable for our scenario.

For the evaluation of the different methods, we used the EKAW ontology as the gold standard. It consists of 75 classes, 33 object properties, 148 subsumption, and 2,299 dis-jointness axioms when materialized, and models the domain of scientiﬁc conferences. We generated HITs (human intelli-gence tasks) by creating questionnaires each with 10 yes/no questions. Half of these were used to generate conﬁdence values for subsumption (disjointness) axioms. For the pair of class labelsConference PaperandPoster Session, for in-stance, the two types of yes/no questions were:

(a) Is everyConference Paperalso aPoster Session? (b) Can there be anything that is both a Conference Paper

and aPoster Session?

For each pair of classesandfor each type of question we ob-tained 9 responses fromdifferentAMT workers. The confi-dence value for a subsumption (disjointness) axiom was com-puted by dividing the number of “yes” (“no”) answers by 9. We applied a threshold of 0.5, that is, only when the major-ity of the 9 workers answered with “yes” (“no”) did we as-sign a confidence value to the axiom. Moreover, we halved the weights of the disjointness axioms for reasons of sym-metry. This resulted in 2,507 axioms (84 subsumption and 2,423 disjointness) with confidence values. Based on these axioms we constructed two different sets. One consisting of the weighted axioms only and the other with 79additional axioms implied by the gold standard ontology which are of

5

(6)

the form r s (7axioms), C ∃r.D (40axioms), and

∃r.C D(32axioms).

We compared theEL++-LLreasoner ELOG with a greedy approach that is often employed in ontology learning sce-narios. The greedy algorithm sorts the axioms in descend-ing order accorddescend-ing to their conﬁdence values and adds one axiom at a time to an initially empty ontology. How-ever, it adds an axiom only if it does not render the sulting ontology incoherent. To compute precision and re-call scores we materialized all subsumption and disjoint-ness axioms of the resulting ontology. We used the rea-soner Pellet for the materialization of axioms and the coher-ence checks. All experiments were conducted on a desk-top PC with Intel Core2 Processor P8600 with 2.4GHz and 2GB RAM. The reasoner ELOG, all ontologies, supplemen-tal information, and the experimensupplemen-tal results are available at

http://code.google.com/p/elog-reasoner/.

ELOG’s MAP inference algorithm needs on average3.7

seconds and the greedy algorithm33.6seconds for generat-ing the two coherent ontologies. Note that the3.7 seconds include the time to classify the ontology. ELOG’s cutting plane inference method needed 6 and7 iterations, respec-tively. These results indicate that the MAP inference algo-rithm under the EL++-LL semantics is more efﬁcient than the greedy approach for small to medium sized ontologies. Figures 1 and 2 depict the recall, precision, andF1scores for

both algorithms. TheF1 score for the ontologies computed usingEL++-LLMAP inference is, respectively,5% and12% higher than for the one created using the greedy algorithm. Furthermore, while the addition of known axioms leads to an

F1score increase of about5% under theEL++-LLsemantics

it decreases for the greedy algorithm.

The MC-ILP algorithm needs between0.4and0.6seconds to compute one MCMC sample for eitherEL++-LLCBox. We ﬁnd this to be an encouraging result providing further evidence that MCMC inference can be applied efﬁciently to compute conditional probabilities for conjunctions of axioms over real-worldEL++-LLontologies.

6 Conclusion and Future Work

With this paper we introduced log-linear description logics as a family of probabilistic logics integrating several concepts from the areas of knowledge representation and reasoning and statistical relational AI. The combination of description log-ics and log-linear models allows one to compute probabilistic queries efﬁciently with algorithms originally developed for typical linear optimization problems. We focused on the de-scription logicEL++without nominals and concrete domains but we believe that the theory is extendable to other DLs for which materialization calculi exist.

Future work will be concerned with the inclusion of nom-inals and concrete domains, negated axioms in the determin-istic CBox, and the design of algorithms for creating more compact representations of classiﬁed CBoxes.

Acknowledgments

Many thanks to Johanna V¨olker, Anne Schlicht, and the anonymous IJCAI reviewers for their valuable feedback.

References

[Baaderet al., 2003] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook. Cambridge University Press, 2003.

[Baaderet al., 2005] Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing theELenvelope. InProceedings of IJCAI, 2005.

[Baaderet al., 2008] Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing theELenvelope further. In Pro-ceedings of OWLED, 2008.

[Creignouet al., 2001] Nadia Creignou, Sanjeev Khanna, and Madhu Sudan. Complexity classiﬁcations of boolean constraint satisfaction problems. SIAM, 2001.

[Getoor and Taskar, 2007] Lise Getoor and Ben Taskar. In-troduction to Statistical Relational Learning. MIT Press, 2007.

[Jaeger, 1994] M. Jaeger. Probabilistic reasoning in termino-logical logics. InProceedings of KR, 1994.

[Koller and Friedman, 2009] Daphne Koller and Nir Fried-man. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[Kolleret al., 1997] Daphne Koller, Alon Levy, and Avi Pf-effer. P-classic: A tractable probabilistic description logic. InProceedings of AAAI, 1997.

[Krötzsch, 2010] Markus Krötzsch. Efficient inferencing for OWL EL. InProceedings of JELIA, 2010.

[Lukasiewicz and Straccia, 2008] Thomas Lukasiewicz and Umberto Straccia. Managing uncertainty and vagueness in description logics for the semantic web.J. of Web Sem., 6, 2008.

[Lukasiewicz, 2007] Thomas Lukasiewicz. Probabilistic de-scription logic programs.Int. J. Appr. Reas., 45, 2007. [Niepertet al., 2010] M. Niepert, C. Meilicke, and H.

Stuck-enschmidt. A Probabilistic-Logical Framework for Ontol-ogy Matching. InProceedings of AAAI, 2010.

[Poole, 2008] David Poole. The independent choice logic and beyond. InProbabilistic inductive logic programming. Springer-Verlag, 2008.

[Poon and Domingos, 2006] Hoifung Poon and Pedro Domingos. Sound and efﬁcient inference with probabilis-tic and determinisprobabilis-tic dependencies. In Proceedings of AAAI, 2006.

[Richardson and Domingos, 2006] Matthew Richardson and Pedro Domingos. Markov logic networks.Machine Learn-ing, 62(1-2), 2006.

[Riedel and Meza-Ruiz, 2008] Sebastian Riedel and Ivan Meza-Ruiz. Collective semantic role labelling with markov logic. InProceedings of CoNLL, 2008.

[Riedel, 2008] Sebastian Riedel. Improving the accuracy and efﬁciency of map inference for markov logic. In Proceed-ings of UAI, 2008.