Miika Hannula¹ [0000−0002−9637−6664] and Jonni Virtema²,³ [0000−0002−1582−3718]
¹ University of Helsinki, Finland [email protected]
² Hokkaido University, Japan, [email protected]
³ Hasselt University, Belgium
Abstract. We study an adaptation of inclusion logic to probabilistic team semantics, which is a novel framework for studying logical and probabilistic dependencies simultaneously. In terms of its computational properties, we show that the data complexity of probabilistic inclusion logic is in polynomial time. We also consider probabilistic inclusion logic extended with dependence atoms, and show that this logic is strictly less expressive than probabilistic independence logic but captures a natural additive fragment of existential second-order logic, which in turn collapses to non-deterministic polynomial time over sentences. We also investigate the axiomatic properties of marginal identity atoms, and compare our findings to the axiomatization of inclusion dependencies well known in database literature.
1 Introduction
Team semantics is the semantical framework of modern logics of dependence and independence. Introduced by Hodges [12] and adapted to dependence logic by Väänänen [16], team semantics defines truth in reference to collections of assignments, called teams. Thus team semantics is particularly suitable for a formal analysis of properties, such as the functional dependence between two variables, which only arise in the presence of multiple assignments. In the past decade numerous research articles have, via re-adaptations of team semantics, shed more light on the interplay between logic and dependence. A common feature, and limitation, in all these endeavors has been their preoccupation with notions of dependence that are qualitative in nature. That is, notions of dependence and independence that make use of quantities, such as conditional independence in statistics, have usually fallen outside the scope of these studies.
In contrast to earlier literature, there has recently been a gradual shift toward quantitative dependence in team semantics studies. Two parallel approaches have been identified. In multiteam semantics formulae are evaluated against multisets of variable assignments, called multiteams [4]. This approach, which is analogous to the bag semantics in databases, centers attention on application domains in which the actual multiplicities of values, and not just their ratios, are meaningful. Another approach comes from probabilistic team semantics, in which the basic semantic units are probability distributions, called probabilistic teams. To be sure, the idea of adding a probability measure on a team is not new, as the first ideas of probabilistic teams trace back to the works of Galliani [6] and Hyttinen et al. [13]. But a systematic study of the topic is quite recent. In [5]
probabilistic team semantics was studied in relation to the dependence concept that is most central in statistics: conditional independence. Mirroring [7,9,15], the expressiveness of probabilistic independence logic (FO(⊥⊥_c)), obtained by extending first-order logic with conditional independence, was in [5,11] characterised in terms of arithmetic variants of existential second-order logic. In [11] the data complexity of FO(⊥⊥_c) was also identified in the context of Blum-Shub-Smale machines [1] and the existential theory of the reals. In [10] the focus was shifted to the expressivity hierarchies between probabilistic logics defined in terms of different quantitative dependencies.
Of all the dependence concepts thus far investigated in team semantics, that of inclusion has arguably turned out to be the most intriguing and fruitful. One reason is that inclusion logic, which arises from this concept, can only define properties of teams that are decidable in polynomial time [8]. In contrast, other natural team-based logics, such as dependence and independence logic, capture non-deterministic polynomial time [7,15,16], and many variants, such as team logic, have an even higher complexity [14]. Thus it should come as no surprise if quantitative variants of many team-based logics turn out to be intractable; in principle, adding arithmetical operations and/or counting cannot be a mitigating factor when it comes to complexity. Indeed, it has been recently shown that the data complexity of probabilistic independence logic over sentences is possibly even higher than NP; it can be characterised in terms of a fragment of the existential theory of the reals which is NP-hard but not necessarily in NP [11]. The best known upper bound is PSPACE, as this is the best known upper bound for the full existential theory of the reals [2].
In this paper we ask the following general question: what are the definability and complexity properties of inclusion logic, if defined in quantitative terms? In team semantics the inclusion atom x ⊆ y, for two variables x and y, expresses that each value a of x also appears as a value of y. A quantitative variant of this atom is obtained by considering a so-called marginal identity atom x ≈ y, which states that the probability (or multiplicity) of x being a is the same as the probability (or multiplicity) of y being a, for all possible values a [5]. Of the aforementioned two parallel approaches, our focus is on probabilistic team semantics.
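The difference between the two atoms can be made concrete on a small finite example. The following Python sketch is illustrative only: it assumes that a team is represented as a list of assignments (dictionaries) and a probabilistic team as a list of (assignment, weight) pairs with weights summing to one. The function names and this representation are hypothetical choices made for the illustration, not notation from this paper.

```python
from fractions import Fraction

def satisfies_inclusion(team, x, y):
    """x ⊆ y: every value taken by x in the team is also taken by y."""
    values_of_y = {s[y] for s in team}
    return all(s[x] in values_of_y for s in team)

def marginal(prob_team, var):
    """Marginal distribution of a single variable (support only)."""
    dist = {}
    for assignment, weight in prob_team:
        if weight > 0:
            value = assignment[var]
            dist[value] = dist.get(value, Fraction(0)) + weight
    return dist

def satisfies_marginal_identity(prob_team, x, y):
    """x ≈ y: x and y have the same marginal distribution."""
    return marginal(prob_team, x) == marginal(prob_team, y)

half = Fraction(1, 2)

# A team in which x only takes the value 0, while y takes 0 and 1.
team = [{"x": 0, "y": 0}, {"x": 0, "y": 1}]
# The same assignments, each with probability 1/2.
uniform = [(s, half) for s in team]
# A probabilistic team in which x and y swap values.
swapped = [({"x": 0, "y": 1}, half), ({"x": 1, "y": 0}, half)]

print(satisfies_inclusion(team, "x", "y"))             # True
print(satisfies_marginal_identity(uniform, "x", "y"))  # False
print(satisfies_marginal_identity(swapped, "x", "y"))  # True
```

In the first example x ⊆ y holds, yet x ≈ y fails on the uniform distribution over the same assignments, since x is 0 with probability 1 whereas y is 0 with probability 1/2.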
We make the following contributions. First, we show that the data complexity of probabilistic inclusion logic (FO(≈)) over sentences is in P. Thus no complexity increase, at the sentence level, is here effected by the introduction of quantities. In contrast, as stated above, whether independence logic is defined in terms of probabilistic teams or plain teams bears a (possible) impact upon complexity. Second, we show that probabilistic inclusion logic extended with dependence atoms (FO(≈, =(·))) captures an additive variant of existential second-order logic. Using this we also show that FO(≈, =(·)) over sentences corresponds to NP. Third, we show that FO(⊥⊥_c) over open formulae is strictly more expressive than FO(≈, =(·)).⁴ From [10] we already know that FO(≈, =(·)) is strictly more expressive than FO(≈); the reason is that marginal identity atoms, but not dependence atoms, are closed under so-called scaled unions of probabilistic teams. Thus we obtain the following strict expressivity hierarchy:
⁴ Results in the vein of the second and third item have been independently developed, but not yet published, in the context of multiteam semantics. While our results are here similar, the proof techniques are different.
Complexity of probabilistic inclusion logic and additive real arithmetics 3
FO(≈) < FO(≈, =(·)) < FO(⊥⊥_c). Fourth, we consider the axiomatic properties of the marginal identity atom. That inclusion atoms enjoy simple sound and complete axioms is well known from database theory [3]; we will investigate whether the same axioms yield a complete characterisation of marginal identity atoms.
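The closure property behind the strict inclusion FO(≈) < FO(≈, =(·)) can be checked on a small instance. The sketch below is a hedged illustration rather than the construction used in [10]: it assumes that the scaled union of probabilistic teams X and Y with parameter k is the convex combination kX + (1 − k)Y, a definition that is not spelled out in this section. It further assumes that teams over two variables (x, y) are stored as dictionaries from value pairs to weights. All function names are hypothetical.

```python
from fractions import Fraction

def scaled_union(X, Y, k):
    """Assumed definition: the convex combination k*X + (1 - k)*Y."""
    mixed = {}
    for s, w in X.items():
        mixed[s] = mixed.get(s, Fraction(0)) + k * w
    for s, w in Y.items():
        mixed[s] = mixed.get(s, Fraction(0)) + (1 - k) * w
    return mixed

def marginal(team, i):
    """Marginal distribution of the i-th variable (support only)."""
    dist = {}
    for s, w in team.items():
        if w > 0:
            dist[s[i]] = dist.get(s[i], Fraction(0)) + w
    return dist

def marginal_identity(team):
    """x ≈ y: both variables have the same marginal distribution."""
    return marginal(team, 0) == marginal(team, 1)

def dependence(team):
    """=(x, y): within the support, the value of x determines y."""
    seen = {}
    for (a, b), w in team.items():
        if w > 0 and seen.setdefault(a, b) != b:
            return False
    return True

half = Fraction(1, 2)
X = {(0, 0): Fraction(1)}          # x = y = 0 with probability 1
Y = {(0, 1): half, (1, 0): half}   # x and y swap values
U = scaled_union(X, Y, half)

print(marginal_identity(X), marginal_identity(Y), marginal_identity(U))
print(dependence(X), dependence(Y), dependence(U))
```

Here both atoms hold in X and in Y, but only the marginal identity atom survives in the scaled union: marginals are linear in the weights, whereas the union contains the assignments (0, 0) and (0, 1), so x no longer determines y.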