Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Dependence logic was introduced by Väänänen in 2007 [36]. It extends first-order logic with dependence atomic formulas (dependence atoms) \({{\mathrm{=}}}({\varvec{x}}, y)\) with the intuitive meaning that the value of the variable y is functionally determined by the values of the variables \({\varvec{x}}\). The notion of dependence has real meaning only in plurals. Thus, in contrast to the usual Tarskian semantics, in dependence logic the satisfaction of formulas is defined not via single assignments but via sets of assignments. Such sets are called teams and the semantics is called team semantics. In this article we take a further step of replacing structures and teams by their multiset analogues. Multiteams have been considered in some earlier works [20, 21, 38] but so far no systematic study of the subject in the team semantics context has appeared. In the temporal logic setting (in the context of computation tree logic) multiteam semantics have been introduced and studied recently [28]. In this article we define the so-called lax and strict multiteam semantics and study properties of various logics under these semantics. Moreover we show how the shift from sets to multisets naturally gives rise to probabilistic and approximate versions of dependence logic.

The idea of team semantics goes back to Hodges [19] whose aim was to define compositional semantics for independence-friendly logic [18]. The introduction of dependence logic and its many variants has evinced that team semantics is a very interesting and versatile semantical framework. In fact, team semantics has natural propositional, modal, and temporal variants. The study of modal dependence logic was initiated by Väänänen [37] in 2008. Shortly after, extended modal dependence logic was introduced by Ebbing et al. [7] and modal independence logic by Kontinen et al. [27]. In purely propositional context the study was initiated by Yang and Väänänen [42] and further studied, e.g., by Hannula et al. [17]. One of the most important developments in the area of team semantics was the introduction of independence logic [13] in which dependence atoms of dependence logic are replaced by independence atoms \({\varvec{y}} \perp _{{\varvec{x}}} {\varvec{z}}\). The intuitive meaning of the independence atom \({\varvec{y}} \perp _{{\varvec{x}}} {\varvec{z}}\) is that, when the value of \( {\varvec{x}}\) is fixed, knowing the value of \({\varvec{z}}\) does not tell us anything new about the value of \({\varvec{y}}\). Soon after the introduction of independence logic, Galliani [9] showed that independence atoms can be further analysed, and alternatively expressed, in terms of inclusion and exclusion atoms. The inclusion atom \({\varvec{x}} \subseteq {\varvec{y}}\) expresses that each value taken by \({\varvec{x}}\) in a team X appears also as a value of \({\varvec{y}}\) in X. The meaning of the exclusion atom \( {\varvec{x}} | {\varvec{y}}\) is that \({\varvec{x}}\) and \({\varvec{y}}\) have no common values in X.

Independence, inclusion, and exclusion atoms have very interesting properties in the team semantics setting. For example, inclusion atoms give rise to a variant of dependence logic that corresponds to the complexity class PTIME over finite ordered structures [10]. In fact, the complexity theoretic aspects of these atoms in propositional, modal, and first-order setting have been studied extensively during the past few years (see the survey of Durand et al. [6] and the references therein).

A team X over variables \(x_1,\ldots , x_{n}\) can be viewed as a database table with \(x_1,\ldots , x_{n}\) as its attributes. Under this interpretation, dependence, inclusion, exclusion, and independence atoms correspond exactly to functional, inclusion, exclusion, and embedded multivalued dependencies, respectively. These dependencies have been studied extensively in database theory. The close connection between team semantics and database theory has already led to fruitful interactions between these areas [15, 16, 26]. It is worth noting that multiset semantics (also known as bag semantics) is widely used in databases [1, 24, 29]. On the other hand, independence atoms, embedded multivalued dependencies, and the notion of conditional independence \({\varvec{Y}} \perp {\varvec{Z}}|{\varvec{X}}\) in statistics have very interesting connections, see, e.g., [14, 40]. In this article we establish that, in the multiteam semantics setting, independence atoms can be naturally interpreted exactly as statistical conditional independence. Probabilistic versions of dependence logic have been previously studied by Galliani and Mann [8, 11].

In practice dependencies such as functional dependence do not hold absolutely but with a small margin of error. In order to logically model such scenarios, Väänänen introduced approximate dependence atoms [38]. The corresponding approximate functional dependencies have been studied in the context of data mining [22]. In this article we define a general approximation operator which, in particular, can be used to express approximate dependence atoms. In the last sections of the article, we study the computational aspects of logics extended by the approximation operator.

Previous work on multisets in team semantics. The idea of generalising team semantics by the use of multisets has been discussed in several articles. Hyttinen et al. [20] study multiteams, and their generalisations called quantum teams, which they use to give semantics to a propositional logic called quantum team logic that can be used for the logical analysis of phenomena in quantum physics. Moreover Hyttinen et al. [21] define a notion of a measure team and measure team logic. The latter is a logic for making inferences about probabilities of first-order formulas in measure teams. Furthermore Krebs et al. introduced team semantics with multisets for the temporal logic CTL [28]. Finally the fact that under multiteam semantics approximate dependence atoms have the locality property (compare to Proposition 11) is discussed by Väänänen [38].

Organisation. This article is organised as follows. Section 2 briefly discusses the basic concepts and definitions. The generalisation of team semantics to multisets is presented in Sect. 3. Section 4 defines the approximation operators, and in Sect. 5 the complexity theoretic aspects of logics with the approximation operators are studied.

2 Preliminaries

We assume familiarity with standard notions in computational complexity theory and logic. We will make use of the complexity classes \(\mathrm {NP} \) and \(\mathrm {PTIME} \). For an introduction into this topic we refer to the good textbook of Papadimitriou [34].

2.1 Team Semantics

Vocabularies \(\tau \) are finite sets of relation symbols with prescribed arities. For each \(R\in \tau \), let \(\mathrm {ar}(R)\in Z_+\) denote the arity of R. A \(\tau \)-structure is a tuple \(\mathfrak {A}= \big (A, (R^\mathfrak {A}_i)_{R_i\in \tau }\big )\), where A is a set and each \(R^\mathfrak {A}_i\) is an \(ar(R_i)\)-ary relation on A (i.e., \(R^\mathfrak {A}_i \subseteq A^{ar(R_i)}\)). We use \(\mathfrak {A}\), \(\mathfrak {B}\), etc. to denote \(\tau \)-structures and A, B, etc. to denote the corresponding domains. In this article we restrict attention to finite structures.

Let D be a finite set of first-order variables and A be a nonempty set. A function \(s:D \rightarrow A\) is called an assignment. The set D is the domain of s, and the set A the codomain of s. For a variable x and \(a\in A\), the assignment \(s(a/x):D\cup \{x\} \rightarrow A\) is obtained from s as follows:

$$ s(a/x)(y) := {\left\{ \begin{array}{ll} a &{} \text {if y=x},\\ s(y) &{}\text {otherwise}. \end{array}\right. } $$

A \(team \) is a finite set of assignments with a common domain and codomain. Let X be a team, A a finite set, and \(F:X\rightarrow \mathcal {P}(A)\setminus \{\emptyset \}\) a function. We denote by X[A / x] the modified team \(\{s(a/x) \mid s\in X, a\in A\}\), and by X[F / x] the team \(\{s(a/x)\mid s\in X, a\in F(s)\}\). Let \(\mathfrak {A}\) be a \(\tau \)-structure and X a team with codomain A, then we say that X is a team of \(\mathfrak {A}\).

Let \(\tau \) be a set of relation symbols. The syntax of first-order logic \(\mathsf {FO} (\tau )\) is given by the following grammar, where \(R\in \tau \), \({\varvec{x}}\) is a tuple of variables, and x and y are variables. Note that in the definition the scope of negation is restricted to atomic formulae.

$$ \varphi :\,\!:= x=y \mid x\ne y \mid R({\varvec{x}}) \mid \lnot R({\varvec{x}}) \mid (\varphi \wedge \varphi ) \mid (\varphi \vee \varphi ) \mid \exists x \varphi \mid \forall x \varphi . $$

Let \({\varvec{x}},{\varvec{y}}\) be tuples of variables and \(\varphi \) a formula. We write \(\mathrm {Var}(\varphi )\) for the set of variables that occur in \(\varphi \), and \(\mathrm {Var}({\varvec{x}})\) for the set of variables listed in \({\varvec{x}}\). We also write \({\varvec{x}}{\varvec{y}}\) for the concatenation of \({\varvec{x}} \) and \({\varvec{y}}\), \({\varvec{x}} \cap {\varvec{y}}\) for any tuple listing the variables that occur both in \({\varvec{x}}\) and \({\varvec{y}}\), and \({\varvec{x}} \setminus {\varvec{y}}\) for any tuple listing the variables that occur in \({\varvec{x}} \) but not in \({\varvec{y}}\). For an assignment s, we write \(s({\varvec{x}})\) to denote the sequence \(\big (s(x_1),\ldots ,s(x_n)\big )\).

Next we define the lax and strict team semantics of first-order logic. It is worth noting that the disjunction has a non-classical interpretation. The classical (or intuitionistic) disjunction is usually denoted by in the team semantics framework. However, as exemplified by Proposition 1, the non-classical disjunction of team semantics naturally corresponds to the classical disjunction of ordinary first-order logic.

Definition 1

(Lax Team Semantics). Let \(\mathfrak {A}\) be a \(\tau \)-structure and X a team of \(\mathfrak {A}\). The satisfaction relation \(\models _X\) for first-order logic is defined as follows:

$$\begin{aligned} \begin{array}{lll} \mathfrak {A}\models _{X} x=y &{}\Leftrightarrow &{} \forall s\in X: s(x)=s(y)\\ \mathfrak {A}\models _{X} x\ne y &{}\Leftrightarrow &{} \forall s\in X: s(x) \not = s(y)\\ \mathfrak {A}\models _{X} R({\varvec{x}}) &{}\Leftrightarrow &{} \forall s\in X: s({\varvec{x}}) \in R^{\mathfrak {A}}\\ \mathfrak {A}\models _{X} \lnot R({\varvec{x}}) &{}\Leftrightarrow &{} \forall s\in X: s({\varvec{x}}) \not \in R^{\mathfrak {A}}\\ \mathfrak {A}\models _{X} (\psi \wedge \theta ) &{}\Leftrightarrow &{} \mathfrak {A}\models _{X} \psi \text { and } \mathfrak {A}\models _{X} \theta \\ \mathfrak {A}\models _{X} (\psi \vee \theta ) &{}\Leftrightarrow &{} \mathfrak {A}\models _Y \psi \text { and } \mathfrak {A}\models _Z \theta \text { for some Y,Z}\subseteq \text { X }s.t.\ Y\cup Z = X\\ \mathfrak {A}\models _{X} \forall x\psi &{}\Leftrightarrow &{} \mathfrak {A}\models _{X[A/x]} \psi \\ \mathfrak {A}\models _{X} \exists x\psi &{}\Leftrightarrow &{} \mathfrak {A}\models _{X[F/x]} \psi \text { holds for some } F:X \rightarrow \mathcal {P}(A)\setminus \{\emptyset \}. \end{array} \end{aligned}$$

The so-called strict team semantics is obtained from the previous definition by adding the following two requirements.

  1. (i)

    Disjunction: \(Y\cap Z= \emptyset \).

  2. (ii)

    Existential quantification: F(s) is singleton for all \(s\in X\).

Proposition 1

[36]. Let \(\mathfrak {A}\) be a \(\tau \)-structure, X a team of \(\mathfrak {A}\), and \(\varphi \) a formula of \(\mathsf {FO} (\tau )\). Then

$$ \mathfrak {A}\models _X \varphi \Leftrightarrow \forall s\in X: \mathfrak {A}\models _s \varphi , $$

where \(\models _s\) denotes the ordinary satisfaction relation of first-order logic defined via models and assignments as usual, and \(\models _X\) denotes the satisfaction relation of either lax or strict team semantics.

For a model \(\mathfrak {A}\) and a sentence \(\varphi \) (i.e., a formula with no free variables), the satisfaction relation \(\models \) is defined as:

$$\mathfrak {A}\models \varphi \text { if }\mathfrak {A}\models _{\{\emptyset \}} \varphi ,$$

where \(\{\emptyset \}\) denotes the singleton team of empty assignment.

Team semantics enables extending first-order logic with various dependency notions. The following dependency atoms were introduced in team semantics setting in [9, 13, 36].

Definition 2

(Dependency atoms). Let \(\mathfrak {A}\) be a model and X a team of \(\mathfrak {A}\). If \({\varvec{x}}, {\varvec{y}}\) are variable sequences, then \({{\mathrm{=}}}({\varvec{x}},{\varvec{y}})\) is a dependence atom with the satisfaction relation:

$$\mathfrak {A}\models _X {{\mathrm{=}}}({\varvec{x}},{\varvec{y}}) \text { if for all } s,s'\in X \text { s.t. }s({\varvec{x}}) = s'({\varvec{x}}), \text { it holds that } s({\varvec{y}})=s'({\varvec{y}}).$$

If \({\varvec{x}},{\varvec{y}}\) are variable sequences of the same length, then \({\varvec{x}} \subseteq {\varvec{y}}\) is an inclusion atom with the satisfaction relation:

$$\mathfrak {A}\models _X {\varvec{x}} \subseteq {\varvec{y}} \text { if for all }s\in X \text { there exists } s'\in X\text { such that } s({\varvec{x}})=s'({\varvec{y}}).$$

If \({\varvec{x}},{\varvec{y}},{\varvec{z}}\) are variable sequences, then \({\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\) is a conditional independence atom with the satisfaction relation:

$$\begin{aligned}&\mathfrak {A}\models _{X} {\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\text { if for all } s,s'\in X\text { such that }s({\varvec{x}})=s'({\varvec{x}})\text { there exists } s''\in X \\&\text { such that }s''({\varvec{x}})=s({\varvec{x}})\text {, }s''({\varvec{y}})=s({\varvec{y}}), \text { and }s''({\varvec{z}})=s'( {\varvec{z}}). \end{aligned}$$

Note that in the previous definition it is allowed that some or all of the vectors of variables have length 0. For example, \(\mathfrak {A}\models _X{{\mathrm{=}}}({\varvec{x}})\) holds iff \(\forall s\in X: s({\varvec{x}})={\varvec{c}}\) holds for some fixed tuple \({\varvec{c}}\), and \(\mathfrak {A}\models _X{\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\) holds always if either of the vectors \({\varvec{y}}\) or \({\varvec{z}}\) is of length 0.

We write \(\mathsf {FO} \) for first-order logic, and given a set of atoms \(\mathcal {C}\), we write \(\mathsf {FO} (\mathcal {C})\) (omitting the set parentheses of \(\mathcal {C}\)) for the logic obtained by adding the atoms of \(\mathcal {C}\) to \(\mathsf {FO} \). For instance, \(\mathsf {FO} ({{\mathrm{=}}}(\cdot ))\) denotes then dependence logic.

Often in literature dependence atoms are defined such that \({\varvec{y}}\) is a single variable, i.e., the widely used form is \({{\mathrm{=}}}({\varvec{x}},y)\). The definition above yields the strongest form of functional dependence. Moreover the atom \({{\mathrm{=}}}({\varvec{x}},{\varvec{y}})\) can be equivalently rewritten as a conjunction of dependence atoms of type \({{\mathrm{=}}}({\varvec{x}},y)\).

3 Multiteam Semantics

In this section we generalise team semantics with the concept of multisets. Multisets and multiteam semantics can be used, e.g., in applications to database theory to model reasoning with databases with duplicates. In practice, for multitude of reasons, the existence of duplicates in databases is very common. Again as previously noted, we restrict attention to finite sets and finite multisets. In the following definition, occurrences of “zero multiplicities” are allowed for notational convenience.

Definition 3

(Multiset). A multiset is a pair (Am) where A is a set and \(m:A \rightarrow \mathbb {N}\) is a (multiplicity) function. The function m determines the multiplicities of the elements in the multiset (Am). A multiset (Xm) is a multiteam if the underlining set X is a team. The domain (or the codomain) of the multiteam (Xm) is the domain (codomain) of the team X.

For each multiset (Am), we define the canonical set representative \([{(A,m)}]_\mathrm {cset}\) of (Am) as follows:

$$ [{(A,m)}]_\mathrm {cset} := \{ (a,i) \mid a\in A, 0<i\le m(a)\}. $$

We say that (Am) is finite whenever \([{(A,m)}]_\mathrm {cset}\) is finite. We say that a multiset (Am) is a submultiset of a multiset (Bn), \((A,m) \subseteq (B,n)\), if and only if \([{(A,m)}]_\mathrm {cset} \subseteq [{(B,n)}]_\mathrm {cset}\). Furthermore, we define that \((A,m) = (B,n)\) if and only if both \((A,m) \subseteq (B,n)\) and \((B,n) \subseteq (A,m)\) hold.

The disjoint union \((A,m) \uplus (B,n)\) of (Am) and (Bn) is the multiset (Ck), where \(C:= A\cup B\) and \(k:C\rightarrow \mathbb {N}\) is the function defined as follows:

$$ k(s):= {\left\{ \begin{array}{ll} m(s)+n(s) &{} \text { if }s\in A\text { and }s\in B,\\ m(s) &{} \text { if }s\in A\text { and }s\not \in B,\\ n(s) &{} \text { if }s\not \in A\text { and }s\in B. \end{array}\right. } $$

We write \(|(A,m)|\) to denote the size of the multiset (Am), i.e., \(|(A,m)|:= \sum _{a\in A} m(a)\). The set of non-empty submultisets of a multiset (Am) is the set

$$\mathcal {P}^+\big ((A,m)\big ):=\{ (C,l) \mid (C,l) \subseteq (A,m) \text { s.t. } l(c)\ge 1 \text { for each }c\in C\}\setminus \{(\emptyset ,\emptyset )\}.$$

Let (Xm) be a multiteam, (An) a finite multiset, and \(F:[{(X,m)}]_\mathrm {cset}\rightarrow \mathcal {P}^+\big ( (A,n) \big )\) a function. We denote by (Xm)[(An) / x] the modified multiteam defined as

$$ \biguplus _{s\in X}\biguplus _{a\in A} \{\big (s(a/x), m(s) \cdot n(a)\big )\}. $$

By X[F / x] we denote the multiteam defined as

$$ \biguplus _{s\in X}\biguplus _{1\le i \le m(s)} \{\big (s(b/x), l(b)\big ) \mid (B,l) = F\big ((s,i)\big ), b\in B\}. $$

A \(\tau \)-multistructure is a tuple \(\mathfrak {A}= \big ((A,m), (R^\mathfrak {A}_i)_{R_i\in \tau }\big )\) where (Am) is a multiset and, for each \(R_i\in \tau \), \(R^\mathfrak {A}_i\) is an \(\mathrm {ar}(R_i)\)-ary relation over the set \(\{a\in A\mid m(a) \ge 1 \}\). A multiteam (Xm) over \(\mathfrak {A}\) is a multiteam with codomain A.

Next we define multiteam semantics for first-order logic.

Definition 4

(Multiteam Semantics). Let \(\mathfrak {A}\) be a \(\tau \)-multistructure, (An) the domain of \(\mathfrak {A}\), and (Xm) a multiteam over \(\mathfrak {A}\). The satisfaction relation \(\models _{(X,m)}\) is defined as follows:

$$\begin{aligned} \begin{array}{lll} \mathfrak {A}\models _{(X,m)} x=y &{}\Leftrightarrow &{} \forall s\in X: \text { if } m(s)\ge 1 \text { then } s(x)=s(y)\\ \mathfrak {A}\models _{(X,m)} x\ne y &{}\Leftrightarrow &{} \forall s\in X: \text { if } m(s)\ge 1 \text { then } s(x)\not =s(y)\\ \mathfrak {A}\models _{(X,m)} R({\varvec{x}}) &{}\Leftrightarrow &{} \forall s\in X: \text { if } m(s)\ge 1 \text { then } s({\varvec{x}}) \in R^{\mathfrak {A}}\\ \mathfrak {A}\models _{(X,m)} \lnot R({\varvec{x}}) &{}\Leftrightarrow &{} \forall s\in X: \text { if } m(s)\ge 1 \text { then } s({\varvec{x}}) \not \in R^{\mathfrak {A}}\\ \mathfrak {A}\models _{(X,m)} (\psi \wedge \theta ) &{}\Leftrightarrow &{} \mathfrak {A}\models _{(X,m)} \psi \text { and } \mathfrak {A}\models _{(X,m)} \theta \\ \mathfrak {A}\models _{(X,m)} (\psi \vee \theta ) &{}\Leftrightarrow &{} \mathfrak {A}\models _{(Y,k)} \psi \text { and } \mathfrak {A}\models _{(Z,l)} \theta \text { for some multisets}\\ &{}&{}\,\mathrm{(}\text {Y,k}\mathrm{)},\mathrm{(}\text {Z,l}\mathrm{)}\subseteq \mathrm{(}\text {X,m}\mathrm{)} \text {s.t.} \mathrm{(}\text {X,m}\mathrm{)}\subseteq \mathrm{(}\text {Y,k}\mathrm{)}\uplus \mathrm{(}\text {Z,l}\mathrm{).}\\ \mathfrak {A}\models _{(X,m)} \forall x\psi &{}\Leftrightarrow &{} \mathfrak {A}\models _{(X,m)[(A,n)/x]} \psi \\ \mathfrak {A}\models _{(X,m)} \exists x\psi &{}\Leftrightarrow &{} \mathfrak {A}\models _{(X,m)[F/x]} \psi \text { holds for some function}\\ &{}&{}F:[{(X,m)}]_\mathrm {cset} \rightarrow \mathcal {P}^+\big ((A,n)\big ). \end{array} \end{aligned}$$

The so-called strict multiteam semantics is obtained from the previous definition by adding the following two requirements.

  1. (i)

    Disjunction: \((Y,n)\uplus (Z,k) = (X,m)\).

  2. (ii)

    Existential quantification: for all \(s\in X\) and \(0<i\le m(s)\), \(F\big ( (s,i) \big )=(B,n)\) for some singleton \(B=\{b\}\) and \(n(b)=1\).

This alternative semantics is discussed in Sect. 3.3. Otherwise in the paper we restrict attention to the multiteam semantics given in Definition 4, sometimes referred to as lax multiteam semantics. The following proposition shows that multiteam semantics and team semantics for first-order logic coincide when the multisets in multistructures are essentially sets. The proof of the proposition is self evident.

Proposition 2

Let \(\mathfrak {A}\) be a multistructure with domain (An), and (Xm) a multiteam over \(\mathfrak {A}\) such that \(n(a)=m(s)=1\) for all \(a\in A\) and \(s\in X\). Define \(\mathfrak B:=(A, (R^\mathfrak {A})_{R\in \tau })\). Then for every \(\varphi \in \mathsf {FO} \) it holds that

$$ \mathfrak {A}\models _{(X,m)} \varphi \text { if and only if } \mathfrak B\models _{X}\varphi . $$

Next we generalise inclusion and conditional independence atoms to multiteams by introducing their probabilistic versions. For a multiteam (Xm) of codomain A, a tuple of variables \({\varvec{x}}\) from \(\mathrm {Dom}(X)\), and \({\varvec{a}} \in A^{|{\varvec{x}}|}\), we denote by \((X,m)_{{\varvec{x}}= {\varvec{a}}}\) the multiteam (Xn) where n agrees with m on all assignments \(s\in X\) with \(s({\varvec{x}})={\varvec{a}}\), and otherwise n maps s to 0.

Definition 5

Let \(\mathfrak {A}\) be a multistructure with domain (An), and (Xm) a multiteam over \(\mathfrak {A}\). If \({\varvec{x}},{\varvec{y}}\) are variable sequences of the same length, then \({\varvec{x}} \le {\varvec{y}}\) is a probabilistic inclusion atom with the following semantics:

$$\mathfrak {A}\models _{(X,m)} {\varvec{x}} \le {\varvec{y}} \text { if } |(X,m)_{{\varvec{x}}={\varvec{s}}({\varvec{x}})}| \le |(X,m)_{{\varvec{y}}=s({\varvec{x}})}| \text { for all } s:\mathrm {Var}({\varvec{x}}) \rightarrow A.$$

If \({\varvec{x}},{\varvec{y}},{\varvec{z}}\) are variable sequences, then \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) is a probabilistic conditional independence atom with the satisfaction relation defined as

$$\begin{aligned} \mathfrak {A}\models _{(X,m)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}} \end{aligned}$$
(1)

if for all \(s:\mathrm {Var}({\varvec{x}}{\varvec{y}}{\varvec{z}}) \rightarrow A\) it holds that

$$\begin{aligned} |(X,m)_{{\varvec{x}}{\varvec{y}}= s({\varvec{x}}{\varvec{y}})}| \cdot |(X,m)_{{\varvec{x}}{\varvec{z}}=s({\varvec{x}}{\varvec{z}})}|=|(X,m)_{{\varvec{x}}{\varvec{y}}{\varvec{z}}=s({\varvec{x}}{\varvec{y}}{\varvec{z}})}|\cdot |(X,m)_{{\varvec{x}}=s({\varvec{x}})}|. \end{aligned}$$

We call atoms of the form \({{\varvec{x}} \perp \!\!\!\perp _{\emptyset } {\varvec{y}}}\) probabilistic marginal independence atoms, written as the shorthand \({{\varvec{x}} \perp \!\!\!\perp {\varvec{y}}}\). Note that we obtain the following satisfaction relation for \({{\varvec{x}} \perp \!\!\!\perp {\varvec{y}}}\):

$$\begin{aligned} \mathfrak {A}\models _{(X,m)} {{\varvec{x}} \perp \!\!\!\perp {\varvec{y}}}\text { if for all }&s:\mathrm {Var}({\varvec{x}}{\varvec{y}}) \rightarrow A, \\ \frac{|(X,m)_{{\varvec{x}} =s({\varvec{x}})}| \cdot |(X,m)_{{\varvec{y}}=s({\varvec{y}})}|}{|(X,m)|}&=|(X,m)_{{\varvec{x}}{\varvec{y}}=s({\varvec{x}}{\varvec{y}})}|.\nonumber \end{aligned}$$
(2)

The study of database dependencies is very interesting also in the practical point of view as many interesting properties of datasets can be revealed. Further the investigation of conditional independence can yield methods to be used to decompose datasets for speeding up different processing tasks on the data.

Multiteams (Xm) induce a natural probability distribution p over the assignments of X. Namely, we define \(p:X \rightarrow [0,1]\) such that

$$p(s) = \frac{m(s)}{\sum _{s\in X} m(s)}.$$

The probability that a tuple of (random) variables \({\varvec{x}}\) takes value \({\varvec{a}}\), written \(\Pr ({\varvec{x}} = {\varvec{a}})\), is then

$$\sum _{\begin{array}{c} s\in X,\\ s({\varvec{x}}) = {\varvec{a}} \end{array}} p(s).$$

It is now easy to see that \(\mathfrak {A}\models _{(X,m)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) iff for all \({\varvec{a}}{\varvec{b}}{\varvec{c}}\),

$$\Pr ({\varvec{y}}={\varvec{b}}, {\varvec{z}} = {\varvec{c}} | {\varvec{x}}= {\varvec{a}}) = \Pr ({\varvec{y}}={\varvec{b}} | {\varvec{x}}= {\varvec{a}})\Pr ({\varvec{z}} = {\varvec{c}} | {\varvec{x}}= {\varvec{a}}),$$

that is, the probability of \({\varvec{y}}={\varvec{b}}\) is independent of the probability of \({\varvec{z}} = {\varvec{c}}\), given \({\varvec{x}}={\varvec{a}}\). Analogously, a probabilistic inclusion atom \({\varvec{x}} \le {\varvec{y}}\) indicates that \(\Pr ({\varvec{x}} = {\varvec{a}}) \le \Pr ({\varvec{y}} ={\varvec{a}})\) for all values \({\varvec{a}}\), and a probabilistic independence atom of the form \({{\varvec{x}} \perp \!\!\!\perp {\varvec{x}}}\) that \(\Pr ({\varvec{x}} = {\varvec{a}})=1\) for some value \({\varvec{a}}\). Note that such atoms have been studied in the literature under the name of constancy atoms [9].

One can also study the usual dependency notions of database theory in the multiteam semantics setting.

Definition 6

Let \(\mathfrak {A}\) be a multistructure, (Xm) a multiteam over \(\mathfrak {A}\), and \(\varphi \) of the form \({{\mathrm{=}}}({\varvec{x}}, {\varvec{y}})\), \({\varvec{x}} \subseteq {\varvec{y}}\), or \({\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\). Then the satisfaction relation \(\models _{(X,m)}\) is defined as follows:

$$\mathfrak {A}\models _{(X,m)} \varphi \,\text { iff }\, \mathfrak {A}\models _{X^+} \varphi ,$$

where \(X^+\) is the team \(\{s\in X \mid m(s) \ge 1\}\).

First we notice that the known translation of dependence atoms to independence atoms (see Grädel and Väänänen [13]) works also in the probabilistic case.

Proposition 3

Let \(\mathfrak {A}\) be a multistructure, (Xm) a multiteam over \(\mathfrak {A}\), and \({\varvec{x}},{\varvec{y}}\) tuples of variables. Then \(\mathfrak {A}\models _{(X,m)}{{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{y}}}\)  iff  \(\mathfrak {A}\models _{(X,m)} {{\mathrm{=}}}({\varvec{x}},{\varvec{y}})\).

Proof

From the truth definition we obtain that

$$\begin{aligned} \mathfrak {A}\models _{(X,m)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{y}}} \,\Leftrightarrow \,\,&\text {for all } s:\mathrm {Var}({\varvec{x}}{\varvec{y}}) \rightarrow A\text { with }(X,m)_{{\varvec{x}}{\varvec{y}} =s({\varvec{x}}{\varvec{y}})} \ne \emptyset , \\&|(X,m)_{{\varvec{x}}{\varvec{y}} =s({\varvec{x}}{\varvec{y}})}| = |(X,m)_{{\varvec{x}} =s({\varvec{x}})}|. \nonumber \end{aligned}$$
(3)

The result then follows since \( \mathfrak {A}\models _{(X,m)} {{\mathrm{=}}}({\varvec{x}},{\varvec{y}})\) iff the right-hand side of Eq. (3) holds.    \(\square \)

Note that the restriction of Proposition 3 to marginal independence states that

$$ \mathfrak {A}\models _{(X,m)} {{\varvec{x}} \perp \!\!\!\perp {\varvec{x}}} \quad \Leftrightarrow \quad \mathfrak {A}\models _{(X,m)} {{\mathrm{=}}}({\varvec{x}}). $$

It is left open whether one can define inclusion or conditional independence atoms in \(\mathsf {FO} (\perp \!\!\!\perp _\mathrm{c},\le )\). However, over constant multiplicity functions conditional independence atoms \(\varphi \) coincide with their probabilistic counterparts whenever \(\mathrm {Var}(\varphi ) =\mathrm {Dom}(X)\). In the following, we denote by \({}^{{\varvec{x}}}A\) the team of all assignments \(\mathrm {Var}({\varvec{x}}) \rightarrow A\).

Lemma 1

Let \(\mathfrak {A}\) be a multistructure and (Xm) a multiteam over \(\mathfrak {A}\). Then

  1. (i)

    \(\mathfrak {A}\models _{(X,m)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}} \quad \Leftrightarrow \quad \mathfrak {A}\models _{(X,m)} \big ({{\varvec{y}}\setminus {\varvec{x}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}\setminus {\varvec{x}}}\big )\),

  2. (ii)

    \(\mathfrak {A}\models _{(X,m)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}} \quad \Leftrightarrow \quad \mathfrak {A}\models _{(X,m)} \big ({{\varvec{y}}\setminus {\varvec{z}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}\setminus {\varvec{y}}}\big ) \wedge \big ({{\varvec{y}}\cap {\varvec{z}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{y}}\cap {\varvec{z}}}\big )\).

Proof

Case (i). The truth definition in Eq. (1) is symmetric, and hence it suffices to show that \(\mathfrak {A}\models _{(X,m)} {{\varvec{y}} x \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}} \Leftrightarrow \mathfrak {A}\models _{(X,m)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) whenever x is listed in \({\varvec{x}}\). This follows since \({}^{{\varvec{x}}{\varvec{y}} x {\varvec{z}}}A= {}^{{\varvec{x}}{\varvec{y}} {\varvec{z}}}A\), and the Eq. (1) remains the same after removing x.

Case (ii). Let us first show that \(\mathfrak {A}\models _{(X,m)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) implies \(\mathfrak {A}\models _{(X,m)} \big ({{\varvec{y}} \cap {\varvec{z}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{y}} \cap {\varvec{z}}}\big )\). For this, it remains to show that \(\mathfrak {A}\models _{(X,m)} {{\varvec{y}} u \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) implies \(\mathfrak {A}\models _{(X,m)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\), for u not listed in \({\varvec{x}} {\varvec{y}}{\varvec{z}}\). This follows since for all \(s\in {}^{{\varvec{x}}{\varvec{y}} {\varvec{z}}}A\),

$$\begin{aligned} |(X,m)_{{\varvec{x}}{\varvec{z}}=s({\varvec{x}}{\varvec{z}})}|&\cdot |(X,m)_{{\varvec{x}}{\varvec{y}}= s({\varvec{x}}{\varvec{y}})}|\\&= |(X,m)_{{\varvec{x}}{\varvec{z}}=s({\varvec{x}}{\varvec{z}})}| \cdot \varSigma _{a\in A} |(X,m)_{{\varvec{x}}{\varvec{y}}u= s({\varvec{x}}{\varvec{y}})a}|\\&=\varSigma _{a\in A} (|(X,m)_{{\varvec{x}}{\varvec{z}}=s({\varvec{x}}{\varvec{z}})}|\cdot |(X,m)_{{\varvec{x}}{\varvec{y}}u= s({\varvec{x}}{\varvec{y}})a}|)\\&= \varSigma _{a\in A} ( |(X,m)_{{\varvec{x}} = s({\varvec{x}})}|\cdot |(X,m)_{{\varvec{x}}{\varvec{y}}{\varvec{z}} u=s({\varvec{x}}{\varvec{y}}{\varvec{z}})a}| )\\&= |(X,m)_{{\varvec{x}} = s({\varvec{x}})}|\cdot \varSigma _{a\in A}|(X,m)_{{\varvec{x}}{\varvec{y}}{\varvec{z}} u=s({\varvec{x}}{\varvec{y}}{\varvec{z}})a}| \\&= |(X,m)_{{\varvec{x}} = s({\varvec{x}})}| \cdot |(X,m)_{{\varvec{x}}{\varvec{y}}{\varvec{z}}=s({\varvec{x}}{\varvec{y}}{\varvec{z}})}|, \end{aligned}$$

where in the third equation we apply the assumption that \(\mathfrak {A}\models _{(X,m)} {{\varvec{y}} a \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\).

For the claim it now suffices to show that \(\mathfrak {A}\models _{(X,m)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}} \Leftrightarrow \mathfrak {A}\models _{(X,m)} \big ({{\varvec{y}}\setminus {\varvec{z}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}\setminus {\varvec{y}}}\big )\) whenever \(\mathfrak {A}\models _{(X,m)} \big ({{\varvec{y}} \cap {\varvec{z}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{y}} \cap {\varvec{z}}}\big )\). This follows directly from the truth definition since by Eq. (3) for all \(s\in {}^{{\varvec{x}}{\varvec{v}}}A\) with \((X,m)_{{\varvec{x}}{\varvec{v}} =s({\varvec{x}}{\varvec{v}})} \ne \emptyset \):

$$|(X,m)_{{\varvec{x}}{\varvec{v}} =s({\varvec{x}}{\varvec{v}})}| = |(X,m)_{{\varvec{x}} =s({\varvec{x}})}|,$$

for \({\varvec{v}}:= {\varvec{x}} \cap {\varvec{y}}\).    \(\square \)

If \({\varvec{x}},{\varvec{y}},{\varvec{z}}\) are pairwise disjoint, then \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) corresponds to the generalised embedded multivalued dependency \({\varvec{x}}\multimap \rightarrow {\varvec{y}}\mid {\varvec{z}}\) that is defined over extended relational data models (i.e., relational data models equipped with a multiplicity function) using semantics that coincide with that of Definition 5 [39, 40]. It was shown by Wong [39] that the generalised multivalued dependency \({\varvec{x}}\multimap \rightarrow {\varvec{y}}\) holds in an extended relational data model if and only if the underlying relational model satisfies the multivalued dependency \({\varvec{x}}\twoheadrightarrow {\varvec{y}}\). This is stated in the following theorem reformulated into the framework of this article.

Theorem 1

[39]. Let \(\mathfrak {A}\) be a multistructure, X a team over \(\mathfrak {A}\), and \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) a probabilistic conditional independence atom such that \(\mathrm {Var}({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}) =\mathrm {Dom}(X)\) and \({\varvec{x}} ,{\varvec{y}},{\varvec{z}}\) are pairwise disjoint. Let 1 denote the constant function that maps all assignments of X to 1. Then \(\mathfrak {A}\models _{(X,1)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\)  iff  \(\mathfrak {A}\models _{(X,1)} {\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\).

Using Lemma 1, the restriction that \({\varvec{x}},{\varvec{y}},{\varvec{z}}\) are disjoint can be now removed.

Proposition 4

Let \(\mathfrak {A}\) be a multistructure, X a team over \(\mathfrak {A}\), and \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) a probabilistic conditional independence atom such that \(\mathrm {Var}({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}) =\mathrm {Dom}(X)\). Then \(\mathfrak {A}\models _{(X,1)} {{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\)  iff  \(\mathfrak {A}\models _{(X,1)} {\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\).

Proof

First note that by Proposition 3 and Lemma 1, \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) is equivalent in multiteam semantics to \(\big ({{\varvec{y}}\setminus {\varvec{x}}{\varvec{z}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}} \setminus {\varvec{x}} {\varvec{y}}}\big ) \wedge {{\mathrm{=}}}({\varvec{x}},{\varvec{y}}\cap {\varvec{z}})\). Moreover, it is known that in team semantics \({\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\) is equivalent to \({\varvec{y}}\setminus {\varvec{x}}{\varvec{z}}~\bot _{{\varvec{x}}}~{\varvec{z}} \setminus {\varvec{x}} {\varvec{y}} \wedge {{\mathrm{=}}}({\varvec{x}},{\varvec{y}}\cap {\varvec{z}})\) [13]. Hence, the claim follows by Theorem 1.   \(\square \)

Note that \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) implies \({\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\) also over arbitrary multiplicity functions since non-emptiness of \((X,m)_{{\varvec{x}}{\varvec{y}}=s({\varvec{x}}{\varvec{y}})}\) and \((X,m)_{{\varvec{x}}{\varvec{z}}=s({\varvec{x}}{\varvec{z}})}\) implies non-emptiness of \((X,m)_{{\varvec{x}}{\varvec{y}}{\varvec{z}}=s({\varvec{x}}{\varvec{y}}{\varvec{z}})}\) by the truth definition in Eq. (1). The converse however does not hold; the multiteam (Ym) depicted in Fig. 1 satisfies \(x~\bot ~y\) but violates \({x \perp \!\!\!\perp y}\).

A Diversion: Implication Problems. Results similar in spirit to Proposition 4 have been studied in connection to implication problems which is a central notion in causal reasoning and database dependency theory. The finite implication problem of independence atoms \({\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\) is defined as follows. Given a finite collection \(\varSigma \cup \{\varphi \}\) of independence atoms, determine whether for all finite \(\mathfrak {A},X\):

$$\mathfrak {A}\models _X \varSigma \Rightarrow \mathfrak {A}\models _X\varphi .$$

If the above holds, we write \(\varSigma \models \varphi \). The implication problem of other types of dependencies is defined analogously. Furthermore, the problem for the atoms \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) can be defined similarly by replacing teams by multiteams. The implication problems of embedded multivalued dependencies (i.e., the atoms \({\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\)) and \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) have been extensively studied, e.g., for both atoms the problem is not finitely axiomatisable and for the former the problem is known to be undecidable. On the other hand, there are interesting restricted cases where the implication problems are finitely axiomatisable and equivalent, i.e., for all inputs \(\varSigma \cup \{\varphi \}\), \(\varSigma \models \varphi \) iff \(\varSigma ^p\models \varphi ^p\), where \(\varSigma ^p\) and \(\phi ^p\) are defined by replacing \({\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\) by \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\). This holds, for example, for marginal independence atoms and for the so-called saturated atoms \(\varphi \) of the form \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) (or equivalently for \(\varphi ={\varvec{y}}~\bot _{{\varvec{x}}}~{\varvec{z}}\)) which, as in Theorem 1, satisfy \(\mathrm {Var}({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}) =\mathrm {Dom}(X)\) (see the survey by Wong et al. [40]). Relationships between fragments of conditional independence statements and embedded multivalued dependencies have recently been studied, e.g., in [3033]. It is also worth noting that the passage from set to multisets has interesting consequences also for the study of implication problems of database dependencies. For example, while key constraints can be expressed by functional dependencies under team semantics, this is no longer true under multiteam semantics [23].

Conditional independence is an important notion for expressing structural aspects of probability distributions. Context specific independence is a variant of \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) expressing independence in a context where the values of some variables of \({\varvec{x}}\) are restricted to range over a subset of all possible values [2, 35]. The next simple example shows how disjunction can be used to express context specific independence statements in \(\mathsf {FO} ( \perp \!\!\!\perp _\mathrm{c})\). The example shows that combining \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) with the logical connectives and quantifiers available in \(\mathsf {FO} ( \perp \!\!\!\perp _\mathrm{c})\) provide us with powerful means to define interesting generalisations of conditional independence. The definability of context specific independence using disjunction has been pointed out in [3].

Example 1

Let \(A=\{0,1\}\) and X be a multiteam of A with domain \(\mathrm {Dom}(X)=\{x_0,x_1, \ldots ,x_n\}\). Now in X the variable \(x_0\) is said to be contextually independent of \(x_2\) given \(x_1=0\), denoted by

$$\begin{aligned} x_0 \perp x_2 \mid x_1=0, \end{aligned}$$
(4)

if for all \(s:\{x_0,x_1,x_2\}\rightarrow A\) such that \(s(x_1)=0\) it holds that

$$\begin{aligned} |(X,m)_{x_0x_1= s(x_0x_1)}| \cdot&|(X,m)_{x_1x_2=s(x_1x_2)}|\\&=|(X,m)_{x_0x_1x_2=s(x_0x_1x_2)}|\cdot |(X,m)_{x_1=s(x_1)}|. \end{aligned}$$

It is now straightforward to check that Eq. (4) can be equivalently expressed by the \(\mathsf {FO} ( \perp \!\!\!\perp _\mathrm{c})\)-formula \((x_1\ne c) \vee \big (x_1=c \wedge (x_0 \perp \!\!\!\perp _{x_1} x_2)\big ),\) where c is a constant symbol interpreted as 0.

3.1 Probabilistic Notions in Multiteam Semantics

In this section we investigate some properties of the probabilistic logics we have defined so far.

The set of free variables of a formula \(\varphi \in \mathsf {FO} (\mathcal {C})\), denoted by \(\mathrm {Fr}(\varphi )\), is defined in the obvious manner as in first-order logic. In particular, we define

$$\begin{aligned} \mathrm {Fr}({\varvec{x}} \subseteq {\varvec{y}})&:= \mathrm {Fr}({\varvec{x}} \le {\varvec{y}}) := \mathrm {Fr}\big ({{\mathrm{=}}}({\varvec{x}},\mathbf {y})\big ) := \{{\varvec{x}}, {\varvec{y}}\}\\ \mathrm {Fr}\big ( {\varvec{y}}\perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}} \big )&:= \mathrm {Fr}\big ( {\varvec{y}}\bot _{{\varvec{x}}} {\varvec{z}} \big ) := \{{\varvec{x}}, {\varvec{y}}, {\varvec{z}} \}. \end{aligned}$$

For \(V \subseteq \mathrm {Dom}(X)\), we define \((X,m) \upharpoonright V:=(X\upharpoonright V, \,n)\) where

$$n(s):= \sum _{\begin{array}{c} s'\in X,\\ s'\upharpoonright V = s \end{array}} m(s').$$

The following locality principle holds by easy structural induction.

Proposition 5

(Locality). Let \(\mathfrak {A}\) be a multistructure, (Xm) a multiteam, and V a set of variables such that \(\mathrm {Fr}(\varphi ) \subseteq V\subseteq \mathrm {Dom}(X)\). Then for all \(\varphi \in \mathsf {FO} (\le ,\perp \!\!\!\perp _\mathrm{c},{{\mathrm{=}}}(\cdot ),\subseteq ,\bot _\mathrm{c})\) it holds that \(\mathfrak {A}\models _{(X,m)}\varphi \)  iff  \(\mathfrak {A}\models _{(X,m)\upharpoonright V}\varphi \).

The notion of flatness is generalised to the multiteam setting as follows.

Definition 7

(Weak Flatness). We say that a formula \(\varphi \) is weakly flat if for all multistructures \(\mathfrak {A}\) and for all multiteams (Xm) it holds that

$$\mathfrak {A}\models _{(X,m)} \varphi \quad \Leftrightarrow \quad \mathfrak {A}\models _{(X,n)} \varphi ,$$

where n agrees with m on all s with \(m(s) =0\), and otherwise maps all s to 1. The multiteam (Xn) is then called the weak flattening of (Xm). A logic is called weakly flat if every formula of this logic is weakly flat.

Dependence, conditional independence, and inclusion atoms are insensitive to multiplicities, and using structural induction one can prove the following proposition.

Proposition 6

\(\mathsf {FO} ({{\mathrm{=}}}(\cdot ),\subseteq ,\bot _\mathrm{c})\) is weakly flat.

On the other hand, probabilistic dependencies do not satisfy weak flatness as shown in the next example.

Example 2

For instance (Ym), illustrated in Fig. 1, does not satisfy \({{\varvec{x}} \perp \!\!\!\perp {\varvec{y}}}\) but its weak flattening (Yn) does.

Analogously, the probabilistic inclusion atom is not weakly flat, and therefore neither of these atoms can be expressed in \(\mathsf {FO} ({{\mathrm{=}}}(\cdot ),\subseteq ,\bot _\mathrm{c})\).

Fig. 1.
figure 1

Assignments for multiteams in Example 2.

A formula \(\varphi \) is called union closed (in the multiteam setting) if for all multistructures \(\mathfrak {A}\) and all multiteams (Xm), (Yn): if \(\mathfrak {A}\models _{(X,m)} \varphi \) and \(\mathfrak {A}\models _{(Y,n)} \varphi \), then \(\mathfrak {A}\models _{(Z,h)}\varphi \), where \((Z,h)=(X,m)\uplus (Y,n)\). A logic is called union closed if all its formulae are union closed. It is easy to show by induction on the structure of formulae that probabilistic inclusion logic satisfies union closure.

Proposition 7

\(\mathsf {FO} (\le ,\subseteq )\) is union closed.

3.2 Probabilistic Notions in Team Semantics

In this section we examine probabilistic independence and inclusion logic in the (set) team semantics setting. Note that all the models considered in this section are usual first-order structures.

Satisfaction of probabilistic atoms in team semantics setting is defined by adding a constant multiplicity function.

Definition 8

Let \(\mathfrak {A}\) be a model, X a team over \(\mathfrak {A}\), and \(\varphi \) be a probabilistic atom of the form \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) or \({\varvec{x}} \le {\varvec{y}}\). Then the satisfaction relation \(\models _X\) is defined as follows:

$$\mathfrak {A}\models _X \varphi \text { iff }\mathfrak {A}\models _{(X,1)} \varphi ,$$

where 1 is the constant function that maps all assignments of X to 1.

The next theorem shows that, since probabilistic inclusion and independence atoms are expressible (in the team semantics setting) in \(\mathsf {FO} (\bot _\mathrm{c})\) relative to teams of fixed domain, their addition does not increase the expressive power of \(\mathsf {FO} (\bot _\mathrm{c})\).

Theorem 2

Let \(\varphi \in \mathsf {FO} (\le ,\perp \!\!\!\perp _\mathrm{c},{{\mathrm{=}}}(\cdot ),\subseteq ,\bot _\mathrm{c})\) be a sentence. Then there exists a sentence \(\varphi '\in \mathsf {FO} (\bot _\mathrm{c})\) such that for all models \(\mathfrak {A}\) it holds that \(\mathfrak {A}\models \varphi \) iff \(\mathfrak {A}\models \varphi '\).

Proof

First note that inclusion and dependence atoms can be expressed in \(\mathsf {FO} (\bot _\mathrm{c})\) [9, 13]. Also it is easy to see that one can construct existential second-order logic sentences that capture probabilistic inclusion and conditional independence atoms over teams of fixed domain. Namely, for all \(\varphi \) of the form \({{\varvec{y}} \perp \!\!\!\perp _{{\varvec{x}}} {\varvec{z}}}\) or \({\varvec{x}} \le {\varvec{y}}\) and all \( V \supseteq \mathrm {Fr}(\varphi )\), there exists an \(\mathsf {ESO} \) sentence \(\varphi ^*(R)\), where R is a k-ary relation symbol for \(k= |\mathrm {Var}(\varphi )|\), such that for all \(\mathfrak {A}\) and X with \(\mathrm {Dom}(X) =V\),

$$\mathfrak {A}\models _X \varphi \Leftrightarrow (\mathfrak {A},\text {Rel}(X))\models \varphi ^*(R),$$

where \(\text {Rel}(X)= \{(s(x_1), \ldots ,s(x_k))\mid s\in X\}\). All \(\mathsf {ESO} \)-definable properties of teams translate into \(\mathsf {FO} (\bot _\mathrm{c})\) [9], and hence the formula \(\varphi '\) can be constructed from \(\varphi \) by replacing each probabilistic atom with a correct \(\mathsf {FO} (\bot _\mathrm{c})\)-translation.    \(\square \)

Note that probabilistic inclusion atoms are not closed under (set) unions in team semantics, and hence they cannot be expressed in \(\mathsf {FO} (\subseteq )\) as shown in the following example.

Example 3

Let \(\mathfrak {A}\) be a first-order structure with domain \(\{0,1,2\}\), and \(s:=\{(x,0), (y,1), (z,0) \}\), \(s':=\{(x,1), (y,0), (z,1) \}\), and \(s'':=\{(x,0), (y,1), (z,2) \}\) be assignments. Define \(X:=\{s,s'\}\) and \(Y:=\{s',s''\}\). Now \(\mathfrak {A}\models _X x\le y\), \(\mathfrak {A}\models _Y x\le y\), but \(\mathfrak {A}\not \models _{X\cup Y} x\le y\).

3.3 Strict Multiteam Semantics

We briefly consider properties of related logics under strict multiteam semantics.

Proposition 8

Over strict multiteam semantics \(\mathsf {FO} ({{\mathrm{=}}}(\cdot ))\) is weakly flat.

The logics \(\mathsf {FO} (\bot _\mathrm{c})\) and \(\mathsf {FO} (\subseteq )\) are not weakly flat under strict multiteam semantics as shown in the next example.

Example 4

For instance (Xm), illustrated in Fig. 2, satisfies \( (x\subseteq z)\vee (y\subseteq z)\) in strict semantics but its weak flattening (Xn) does not.

Fig. 2.
figure 2

Assignments for teams in Example 4.

Similarly, one can show that \(\mathsf {FO} (\le ,\subseteq )\) is not union closed under strict multiteam semantics. Moreover one can show that Propositions 2 and 5 hold also under strict multiteam semantics.

4 Approximate Operators

Now we will turn to define an existential and a universal approximate operator which allows one to state truth of formulas not with respect to the full team but with respect to a ratio of the team. The main motivator for this approach is the important application in database theory to be able to model the truth of properties in databases that may contain some faulty data. Moreover, in practice, duplicates occur frequently in databases for a multitude of reasons. Thus the study of database dependencies, such as inclusion dependencies and foreign key constraints, in combination with approximate operators is an important topic as it explains inherent properties of a given dataset. In this section we consider multiteam semantics.

Definition 9

Let \(\mathfrak {A}\) be a multistructure, and (Xm) a multiteam over \(\mathfrak {A}\), and \(p\in [0,1]\) a rational number.

$$\begin{aligned} \mathfrak {A}\models _{(X,m)}\langle {p}\rangle \varphi&\Leftrightarrow \exists (Y,n)\subseteq (X,m), |(Y,n)|\ge p\cdot |(X,m)|: \mathfrak {A}\models _{(Y,n)}\varphi ,\\ \mathfrak {A}\models _{(X,m)}[{p}]\varphi&\Leftrightarrow \forall (Y,n)\subseteq (X,m), |(Y,n)|\ge p\cdot |(X,m)|: \mathfrak {A}\models _{(Y,n)}\varphi \end{aligned}$$

The previous definition generalises the notion of approximate dependence atoms \({{\mathrm{=}}}_p(\cdot )\), introduced by Väänänen [38], in the following sense: \({{\mathrm{=}}}_{1-p}({\varvec{x}},y)\) is equivalent to the formula \(\langle {p}\rangle {{\mathrm{=}}}({\varvec{x}},y)\).

In the following we observe that distributivity does not hold in general with respect to \(\langle {p}\rangle \).

Fig. 3.
figure 3

Assignments for multiteams in Examples 5 and 6.

Proposition 9

It is not true that \(\langle {p}\rangle (\varphi \vee \psi )\equiv \langle {p}\rangle \varphi \vee \langle {p}\rangle \psi \).

Proof

Let \(\mathfrak {A}\) be the multistructure over the empty vocabulary with domain \((\{0,1,2\},1)\), where 1 is the constant 1 multiplicity function. Then \(\mathfrak {A}\models _{(X,1)}\langle {\frac{2}{3}}\rangle (x=y\vee x=z)\) but \(\mathfrak {A}\not \models _{(X,1)}\langle {\frac{2}{3}}\rangle (x=y)\vee \langle {\frac{2}{3}}\rangle (x=z)\), where (X, 1) is the multiteam depicted in the Fig. 3.   \(\square \)

The next simple observation states the distributivity of \([{p}]\) with respect to conjunction \(\wedge \), as well as the merger of two \(\langle {p}\rangle \)-operators and two \([{q}]\)-operators, respectively.

Observation 1

The following equivalences hold:

  1. 1.

    \([{p}](\varphi \wedge \psi )\equiv [{p}]\varphi \wedge [{p}]\psi \),

  2. 2.

    \(\langle {p}\rangle (\langle {q}\rangle \varphi )=\langle {p\cdot q}\rangle \varphi \),

  3. 3.

    \([{p}]([{q}]\varphi )=[{p\cdot q}]\varphi \).

The next two examples show that both downward closure and union closure are violated by the approximate operator.

Example 5

Let \(\mathfrak {A}\) be the multistructure over the empty vocabulary with domain \((\{0,1,2\},1)\), where 1 is the constant 1 multiplicity function. Then \(\mathfrak {A}\models _{(X,1)}\langle {\frac{1}{3}}\rangle (x=y)\) but \(\mathfrak {A}\not \models _{(Y,1)}\langle {\frac{1}{3}}\rangle (x=y)\), where \((Y,1)\subseteq (X,1)\) are the multiteams depicted in the Fig. 3.

Example 6

Let \(\mathfrak {A}\) be the multistructure over the empty vocabulary with domain \((\{0,1\},1)\), where 1 is the constant 1 multiplicity function. The multiteams (Zm), (Zn), (Zk), \((Z,\ell )\) are depicted in the Fig. 3. Now \(\mathfrak {A}\models _{(Z,k)}[{\frac{2}{3}}] (x\le y)\) and \(\mathfrak {A}\models _{(Z,\ell )}[{\frac{2}{3}}] (x\le y)\). However \(\mathfrak {A}\not \models _{(Z,n)} x \le y\) and thus \(\mathfrak {A}\not \models _{(Z,m)}[{\frac{2}{3}}](x\le y)\) even though \((Z,k)\uplus (Z,l) = (Z,m)\).

Proposition 10

Let \(\mathcal L\) be a logic and \(\varphi \in \mathcal L\) a formula. Then \(\langle {p}\rangle \) preserves union closure (whereas \([{p}]\) does not), i.e., \(\langle {p}\rangle \varphi \) is union closed whenever \(\varphi \) is.

Proof

Let \(\mathfrak {A}\) be a multistructure and XY be multiteams of \(\mathfrak {A}\). Assume that \(\mathfrak {A}\models _X \langle {p}\rangle \varphi \) and \(\mathfrak {A}\models _Y \langle {p}\rangle \varphi \). Then there are multiteams \(X'\subseteq X\) and \(Y'\subseteq Y\) such that \(|X'|\ge p|X|\), \(|Y'|\ge p|Y|\), and both \(\mathfrak {A}\models _{X'} \varphi \) and \(\mathfrak {A}\models _{Y'}\varphi \). Hence \(|X'\uplus Y'|=|X'|+|Y'|\ge p|X|+p|Y|= p(|X|+|Y|) = p|X\uplus Y|\) and thus \(\mathfrak {A}\models _{X\uplus Y} \langle {p}\rangle \varphi \).    \(\square \)

Yet locality holds for this logic as witnessed by the following proposition. The proof is by induction.

Proposition 11

(Locality). Let \(\mathfrak {A}\) be a multistructure, (Xm) a multiteam, and V be a set of variables such that \(\mathrm {Fr}(\varphi ) \subseteq V\subseteq \mathrm {Dom}(X)\). Then for all \(\varphi \in \mathsf {FO} (\langle {p}\rangle ,[{p}],\le ,\perp \!\!\!\perp _\mathrm{c},{{\mathrm{=}}}(\cdot ),\subseteq ,\bot _\mathrm{c})\), it holds that \(\mathfrak {A}\models _{(X,m)}\varphi \)  iff  \(\mathfrak {A}\models _{(X,m)\upharpoonright V}\varphi \).

5 On the Complexity of Approximate Dependence Logic

In the following we study computational complexity of model checking in dependence logic enriched with the operator \(\langle {p}\rangle \). The results hold under both team and multiteam semantics. To simplify notation, we work with team semantics in this section. Analogously to [5], our results can be seen as a first step towards a systematic classification of the syntactic fragments of approximate dependence logic for which data complexity of model-checking is tractable/intractable.

We first define the model checking problem in the context of team semantics. We consider only Boolean queries, that is we define the model checking problem for a logic \(\mathcal {L}\) as follows: given a model \(\mathfrak {A}\), a team X of \(\mathfrak {A}\), and a formula \(\varphi \) of \(\mathcal {L}\), decide whether \(\mathfrak {A}\models _X\varphi \) holds. There are three parameters to this problem: the model \(\mathfrak {A}\), the team X, and the formula \(\varphi \). Depending on which of these parameters are fixed, a different variant of the model checking problem arises. Here we consider two of these variants: the variant with a fixed formula (this is called data complexity), and a variant in which nothing is fixed (this is called combined complexity).

The following two theorems reveal that already very simple formulas of approximate dependence logic witness the \(\mathrm {NP} \)-completeness of the data complexity of the logic.

Theorem 3

Model checking for \(\langle {p}\rangle ({{\mathrm{=}}}(x,y)\wedge {{\mathrm{=}}}(u,v))\) is \(\mathrm {NP} \)-complete.

Proof

We sketch the proof here, a full proof can be found in the arXiv version of this article. For the lower bound we give a polynomial many-one reduction from \(\mathsf {3SAT} \) inspired by a similar proof of Jarmo Kontinen [25]. Start with a formula \(\varphi =\bigwedge _{i=1}^m\bigvee _{j=1}^3 \ell _{i,j}\) where \(\ell _{i,j}\) is the jth literal in the ith clause, i.e., either a variable x (said of parity 0) or its negation \(\lnot x\) (of parity 1). In the following we will construct a tuple \((X,\psi )\) from \(\varphi \) such that \(\varphi \in \mathsf {3SAT} \) if and only if \(\mathfrak {A}\models _X\psi \). First we define the team X to be the set

$$ X = \{(i,j,x,p)\mid \text {in { i}th clause the { j}th literal is the variable { x} with parity { p}}\}. $$

Technically the team can be seen as an encoding of the given formula. For instance the formula \(\varphi =(x_1\vee \lnot x_2\vee x_3)\wedge (\lnot x_1\vee \lnot x_2\vee \lnot x_3)\) would yield the team \( X = \{(1,1,x_1,0),(1,2,x_2,1),(1,3,x_3,0),(2,1,x_1,1),(2,2,x_2,1),(2,3,x_3,1)\}. \)

The formula \(\psi \) is defined as

$$ \langle {\frac{1}{3}}\rangle \bigl ({{\mathrm{=}}}(\text {clause},\text {literal})\wedge {{\mathrm{=}}}(\text {variable},\text {parity})\bigr ). $$

Then intuitively speaking \(\psi \) states that one has to decide for each clause a satisfying literal and do this consistently, i.e., the corresponding assignment has to be consistent. At first one selects exactly one third of the elements in X such that for each clause a literal is chosen (i.e., clause will determine the value of literal). Then the parity of each variable is consistently chosen (i.e., variable will determine the value of parity). It is straightforward to show that \(\mathfrak {A}\models _X\psi \) iff \(\varphi \) is satisfiable.

For the \(\mathrm {NP} \) upper bound, first observe that we can simply guess a subset \(X'\) of X such that \(|X'|\ge \frac{1}{3}|X|\). Then we just have to check whether \(\mathfrak {A}\models _{X'}{{\mathrm{=}}}(\text {clause},\text {literal})\wedge {{\mathrm{=}}}(\text {variable},\text {parity})\) holds. This can be clearly done in polynomial time.    \(\square \)

The next theorem shows that \(\mathrm {NP} \)-hard properties can be defined using very simple formulas even if the operator \(\langle {p}\rangle \) is restricted to appear only in front of dependence atoms. It is worth noting that the data complexity of formulas addressed in Theorem 4 without the operator \(\langle {p}\rangle \) is in NL by the results of [25].

Theorem 4

Model checking for \({{\mathrm{=}}}(x,y)\vee (\langle {p}\rangle {{\mathrm{=}}}(x,y)\wedge {{\mathrm{=}}}(u,v))\) is \(\mathrm {NP} \)-complete.

Proof

The upper bound is due to the same argument as in the proof of Theorem 3: use nondeterminism to tame the \(\langle {p}\rangle \) operator. The rest is just standard technique as for \(\mathsf {D} \), see the book of Väänänen [36].

Now we turn to the lower bound, again we will just sketch the proof. Here we will reduce from \(\mathsf {3SAT} \) through \(\mathsf {Max\text{- }2SAT} \), a well-known \(\mathrm {NP} \)-hard optimisation problem whose decision variant is \(\mathrm {NP} \)-complete. The problem asks given a 2CNF-formula \(\varphi \) and a number \(k\in \mathbb {N}\), if at least k of the clauses of \(\varphi \) can be simultaneously satisfied [12]. Garey et al. describe a reduction f from \(\mathsf {3SAT} \) to the decision variant of \(\mathsf {Max\text{- }2SAT} \) such that \(\varphi \in \mathsf {3SAT} \) iff at least \(\frac{7}{10}\) of the clauses of \(f(\varphi )\) can be satisfied.

We will exploit this known reduction in the following way. The team X is constructed in the same way as in the proof of Theorem 3. The formula then is

$$ {{\mathrm{=}}}(\text {clause},\text {literal})\vee ({{\mathrm{=}}}(\text {clause},\text {literal})\wedge \langle {\frac{7}{10}}\rangle {{\mathrm{=}}}(\text {variable},\text {parity})). $$

   \(\square \)

Currently the \(\langle {p}\rangle \) operator is defined with respect to some value of \(p\in [0,1]\). We saw that it depicts the behaviour of a ratio. Yet we want to shortly discuss a different approach for this setting. Instead we define \(\langle {p}\rangle \) for values of \(p\in \mathbb {N}\) hence p is now a natural number with the following meaning. A team X satisfies a formula \(\langle {p}\rangle \varphi \) if there exists a team \(Y\subseteq X\) of size \(\ge p\) such that \(Y\models \varphi \)—similarly for \([{p}]\) the meaning would be that every team \(Y\subseteq X\) of size \(\ge p\) satisfies \(\varphi \).

Sticking to this approach would allow one to state a similar result as for Theorems 3 and 4 but now for combined complexity as follows. Here one would just explicitly state the number of rows to be removed from the team, i.e., setting p to m in the constructed formula in the proof of Theorem 3. Regarding Theorem 4 in this setting the formula \(f(\varphi )\) increases the number of clauses by factor 10 and therefore requires to set p to \(\frac{7}{10}\cdot 10\cdot m=7\cdot m\) where m is the number of clauses of the given 3CNF formula \(\varphi \).

6 Conclusion

To the best of the authors knowledge this article is the first serious approach in defining team semantics with respect to multisets for first-order dependence logic. We also initiate the study of probabilistic analogues of independence and inclusion logic. Additionally the paper provides a first step into the study of a general approximation operator in the team semantics framework. We show several foundational properties of these newly defined formalisms and present some first computational complexity results for approximate dependence logic (\(\mathsf {ADL} \)). For \(\mathsf {ADL} \) we show that the introduction of approximate operators enables us to encode \(\mathrm {NP} \)-hard properties into the model checking problem (data complexity) of this logic even with only two dependence atoms, a single approximate operator, and a single conjunction. This shows how strong and elegant this kind of approximate notion really is. It is an interesting open question to study the computational properties of the analogously defined approximate inclusion logic.

Heretofore a broad field around intuitionistic logic [4] has developed. Intuitionistic logic can be seen as classical propositional logic without the law of excluded middle. One of the main concepts here is the intuitionistic implication \(\rightarrow \). In the setting of team semantics it is defined as follows. Let \(\mathfrak {A}\) be a structure and X be a team. Then \(\mathfrak {A}\models _X \varphi \rightarrow \psi \) is true if and only if for all subsets \(X'\subseteq X\) it holds that \(\mathfrak {A}\models _{X'}\varphi \) implies \(\mathfrak {A}\models _{X'}\psi \). The intuitionistic implication has been studied in the context of dependence logic, see e.g., the work of Yang [41]. An approximate variant of this operator in our setting will yield a nice resemblance to the \([{p}]\) operator. The slight and quite natural adjustment of intuitionistic implication to our setting is then: \(\mathfrak {A}\models _X \varphi \rightarrow _p\psi \) if and only if for all subsets \(X'\subseteq X\) with \(|X'|\ge p\cdot |X|\) (and \(p\in [0,1]\cap \mathbb Q\)) it holds that \(\mathfrak {A}\models _{X'}\varphi \) implies that \(\mathfrak {A}\models _{X'}\psi \). The operator \([{p}]\) can now be expressed with the help of the intuitionistic approximate implication. One can easily verify that \([{p}]\varphi \) is equivalent to \(\top \rightarrow _p\varphi \).

In this article we have considered approximation in the context of multiteam semantics when restricted to the finite. However our definitions can be generalised in a straightforward manner to deal with arbitrary cardinalities.