1 Introduction

In this paper we describe a language for analyzing the dependence of stochastic outputs of a system on deterministic inputs. This language applies to systems of all imaginable kinds: quantum physical, macroscopic physical, biological, psychological, and even purely mathematical, created on paper. The notion of “dependence,” as well as the related notions of “influence,” “causality,” and “context,” may have different meanings in different areas. Even if they do not, we do not know how to define them. We circumvent the necessity of designing these definitions by simply accepting that some inputs are connected to some outputs by arrows called direct influences. We ignore the question of how these direct influences are determined, except for a certain necessary condition they must satisfy (marginal selectivity). A system is also characterized by certain constraints imposed on the joint distribution of its outputs across different inputs. A prominent example in which both direct influences and constraints are justified by a well-developed theory is the EPR paradigm in quantum physics, where it is assumed that measurement settings for a given particle directly affect measurement outcomes in that particle only, and the joint distributions of the measurement outcomes on different particles satisfy certain inequalities or parametric equalities. If these constraints can be accounted for entirely in terms of the posited direct influences, the system can be viewed as “contextless.” If this is not the case, we characterize probabilistic contexts by studying the deviations from the contextless behavior exhibited by the system.

Whether one deals with quantum contextuality or thinks of contextuality beyond even quantum bounds, our approach remains squarely within the domain of classical probability theory, which we refer to as Kolmogorovian. A caveat for using this attribution is that we do not mean the “naive” Kolmogorovian theory in which all random variables are thought of as defined on a single sample space (equivalently, as functions of a single random variable). Such a notion is no more tenable than the “set of all sets” of naive set theory. The qualified Kolmogorovian approach we adopt is based on the principle of contextuality-by-default:

any two random variables recorded under mutually exclusive conditions are stochastically unrelated, defined on different sample spaces.

This is a radical version of views previously expressed in the literature, e.g., in Khrennikov (2008a, b), where it is traced back to Andrei Kolmogorov himself and even to George Boole. Our emphasis, however, is on the fact that any set of stochastically unrelated variables (but never “all of them”) can be coupled, that is, have a joint distribution imposed on them, in many different ways (Thorisson 2000). In particular, the identity coupling is sometimes (but not always) possible, in which two random variables defined under mutually exclusive conditions, and “automatically” (by default) labeled as different and stochastically unrelated, merge into one and the same random variable.

The basics of this approach are presented in Sect. 2. In Sects. 3 and 4 we use it to investigate contextual influences with respect to a given pattern of direct influences. The theory and notation there closely follow Dzhafarov and Kujala (2013a). The departure point is that since different treatments (combinations of input values) are mutually exclusive, the joint distributions of the outputs corresponding to them, according to the principle of contextuality-by-default, are stochastically unrelated. We then consider all possible ways of coupling them across different treatments. From each such coupling we extract stochastic relations that are “hidden,” principally unobservable, because they correspond to outputs obtained under different treatments. We focus on a special kind of these hidden relations, those between random variables that share the same pattern of direct influences. We call these hidden relations connections. Given a certain constraint imposed on the system by a theory or empirical observations, we pose the question of what connections imply (or force) this constraint and what connections are implied by (or compatible with) it. Taken over all possible couplings, these relations between connections and constraints characterize the type of contextuality exhibited by the system. This view of contextuality is different from the existing approaches (Khrennikov 2009; Laudisa 1997).

2 Probability Theory: Multiple Sample Spaces

Given two probability spaces, \(\left( S,\varSigma ,p\right) \) and \(\left( S_{A},\varSigma _{A},p_{A}\right) \), with standard meaning of the terms, a random variable is defined as a \(\left( \varSigma ,\varSigma _{A}\right) \)-measurable function \(A:S\rightarrow S_{A}\) subject to

$$\begin{aligned} p_{A}\left( X\right) =p\left( A^{-1}\left( X\right) \right) , \end{aligned}$$
(1)

for any \(X\in \varSigma _{A}\). The probability space \(\left( S,\varSigma ,p\right) \) is usually called a sample space, and we will refer to \(\left( S_{A},\varSigma _{A},p_{A}\right) \) as the distribution of \(A\). The sample space itself is the distribution of the random variable \(R\) (let us call it a basic variable) which is the \(\left( \varSigma ,\varSigma \right) \)-measurable identity function, \(x\mapsto x\), \(x\in S\). Any random variable \(A\) defined on this sample space can also be presented as a function \(A=f\left( R\right) \), and (1) can be written as

$$\begin{aligned} p_{A}\left( X\right) =\Pr \left[ A\in X\right] =\Pr \left[ R\in f^{-1}\left( X\right) \right] , \end{aligned}$$
(2)

for any \(X\in \varSigma _{A}\).
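For a finite sample space, relation (2) can be sketched in a few lines. The die and the parity function below are hypothetical choices made only for illustration; `pushforward` is not a name used in the paper.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(p, f):
    """Distribution of A = f(R): p_A({a}) = p(f^{-1}({a})), as in (2)."""
    p_A = defaultdict(Fraction)  # Fraction() is 0
    for x, px in p.items():
        p_A[f(x)] += px
    return dict(p_A)


# Hypothetical basic variable R: a fair six-sided die.
p_R = {x: Fraction(1, 6) for x in range(1, 7)}


def parity(x):
    """A = f(R): 1 for odd faces, 0 for even faces."""
    return x % 2


p_A = pushforward(p_R, parity)
print(p_A)
```

The function `f` never needs to be inverted explicitly: summing the mass of every `x` into the cell `f(x)` computes the probability of each preimage.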

Let \(\left( A^{k}=f_{k}\left( R\right) :k\in K\right) \) be a sequence (Footnote 1) of random variables, all functions of one and the same basic variable \(R\), with \(A^{k}\) distributed as \(\left( S^{k},\varSigma ^{k},p_{k}\right) \). Then \(A=\left( A^{k}:k\in K\right) =f\left( R\right) \) too is a random variable that is a function of \(R\), with the distribution

$$\begin{aligned} \left( S_{A}=\prod _{k\in K}S^{k},\varSigma _{A}=\bigotimes _{k\in K}\varSigma ^{k},p_{A}\right) . \end{aligned}$$
(3)

Here, \(\bigotimes _{k\in K}\varSigma ^{k}\) is the minimal sigma-algebra containing sets of the form \(X^{k}\times \prod _{i\in K-\left\{ k\right\} }S^{i}\) for all \(X^{k}\in \varSigma ^{k}\), and \(p_{A}\) is defined by (2), with

$$\begin{aligned} f^{-1}\left( X\right) =\left\{ x\in S:\left( f_{k}\left( x\right) :k\in K\right) \in X\right\} . \end{aligned}$$
(4)

The distribution of \(A\) can also be given by (3) with no reference to its sample space, or basic variable. It can be viewed as a joint distribution of the components of a sequence \(A=\left( A^{k}:k\in K\right) \), such that, for any nonempty \(K'\subset K\), the subsequence \(A'=\left( A^{k}:k\in K'\right) \) is a random variable distributed as

$$\begin{aligned} \left( S_{A'}=\prod _{k\in K'}S^{k},\varSigma _{A'}=\bigotimes _{k\in K'}\varSigma ^{k},p_{A'}\right) , \end{aligned}$$
(5)

with

$$\begin{aligned} p_{A'}\left( X\right) =p_{A}\left( X\times \prod _{k\in K-K'}S^{k}\right) , \end{aligned}$$
(6)

for any \(X\in \varSigma _{A'}\). The distribution \(\left( S^{k},\varSigma ^{k},p_{k}\right) \) of a single \(A^{k}\) is determined by that of the one-element subsequence \(\left( A^{k}\right) \) in the obvious way. All the random variables \(A^{k}\) obtained in this way from \(A\) can be viewed as functions on one and the same basic variable, e.g., \(R=A\) itself.
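For finite distributions, the marginalization in (6) amounts to summing out the components outside \(K'\). The following is a minimal sketch; the three-bit joint distribution and the helper name `marginal` are assumptions introduced for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def marginal(p_joint, keep):
    """Finite version of (6): sum out every component not listed in `keep`.

    `p_joint` maps outcome tuples (a^k : k in K) to probabilities;
    `keep` lists the positions K' that are retained.
    """
    p_sub = defaultdict(Fraction)
    for outcome, prob in p_joint.items():
        p_sub[tuple(outcome[k] for k in keep)] += prob
    return dict(p_sub)


# Hypothetical joint distribution of A = (A^0, A^1, A^2):
# two independent fair bits and their exclusive-or.
p_A = {(a, b, a ^ b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}

p_02 = marginal(p_A, keep=(0, 2))   # distribution of the subsequence (A^0, A^2)
p_1 = marginal(p_A, keep=(1,))      # distribution of the one-element subsequence (A^1)
print(p_02)
print(p_1)
```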

We see that the relation “are jointly distributed” is synonymous with the relation “are functions of one and the same basic variable.” But clearly there cannot be a single basic variable of which all imaginable random variables are functions. This is obvious from the cardinality considerations alone, as random variables may have arbitrarily large sets of possible values. But this is true even if one confines consideration to all imaginable random variables with any given distribution, provided it is not concentrated at a point. Let, e.g., \(\mathcal {B}\) be a class (not necessarily a set) of all functions of \(R\) that are Bernoulli (0/1) variables with equiprobable values. That is, each \(B\in \mathcal {B}\) is a function \(f\left( R\right) \) with \(f:S\rightarrow \left\{ 0,1\right\} \), such that \(\Pr \left[ f\left( R\right) =1\right] =\frac{1}{2}\). Consider a Bernoulli variable \(B^{*}\) with equiprobable values such that for any \(B\in \mathcal {B}\),

$$\begin{aligned} \Pr \left[ B^{*}=1,B=1\right] =\frac{1}{4}. \end{aligned}$$
(7)

Then \(B^{*}\) cannot be a function of \(R\) because it is independent of (hence is not the same as) any of the elements of \(\mathcal {B}\). If needed, however, one can redefine the basic variable, e.g., as \(R^{*}=\left( R,B^{*}\right) \), with independent \(R\) and \(B^{*}\), so that all elements of \(\mathcal {B}\cup \left\{ B^{*}\right\} \) become functions of \(R^{*}\).

This simple demonstration shows that the Kolmogorovian approach to probability is not represented by a single sample space with measurable functions on it. Rather the true picture is an “open-ended” class (definitely not a set) of basic variables that are stochastically unrelated to each other, each with its own class of random variables defined as its functions: schematically,

(8)

If necessary, using some coupling scheme as discussed below, any sequence of stochastically unrelated basic variables \(\left( R^{k}:k\in K\right) \) can be redefined into a random variable \(H=\left( H^{k}:k\in K\right) \) such that \(H^{k}\) and \(R^{k}\) are identically distributed for all \(k\). This amounts to considering all individual \(R^{k}\), as well as their functions, as functions of \(H\). But this procedure is not unique, and it cannot be performed for “all random variables.”

The contextuality-by-default principle requires that any two random variables conditioned upon mutually exclusive values of some third variable are stochastically unrelated. Indeed, there is never a unique way for coupling their realizations. A simple example: I flip a coin and depending on the outcome weigh one of two lumps of clay, lump 1 (if “heads”) or lump 2 (if “tails”). The random variables \(A=\) “weight reading for lump 1” and \(B=\) “weight reading for lump 2” do not a priori possess a joint distribution because there is no privileged way of deciding whether a given value of \(A\) co-occurs with a given value of \(B\). If necessary, however, such a co-occurrence (or coupling) scheme can always be constructed. For instance, one can list the values of \(A\) and \(B\) chronologically and then couple the \(n\)th realization of \(A\) with the \(n\)th realization of \(B\) (\(n=1,2,\ldots \)). Or one could rank-order the values of \(A\) and \(B\) and couple the realizations of the same quantile rank (this would create positive correlation between the variables) or of the complementary ranks (negative correlation). One cannot say that one way of pairing is better justified than another; each one represents “a point of view” and creates its own joint distribution of \(A\) and \(B\).
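The three pairing schemes just described can be tried on simulated data. The sketch below is only illustrative: the Gaussian weight readings, their parameters, and the sample size are assumptions, and `pearson` is a helper written here rather than a library call.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)
# Hypothetical weight readings (grams), recorded on different, mutually exclusive trials.
a = [rng.gauss(100.0, 2.0) for _ in range(1000)]   # lump 1
b = [rng.gauss(250.0, 5.0) for _ in range(1000)]   # lump 2

chronological = list(zip(a, b))                                  # n-th A with n-th B
same_rank = list(zip(sorted(a), sorted(b)))                      # equal quantile ranks
opposite_rank = list(zip(sorted(a), sorted(b, reverse=True)))    # complementary ranks

r_chrono = pearson(*zip(*chronological))
r_same = pearson(*zip(*same_rank))
r_opposite = pearson(*zip(*opposite_rank))
print(r_chrono, r_same, r_opposite)
```

All three schemes use exactly the same values of \(A\) and of \(B\), so the two marginal distributions are untouched; only the imposed joint distribution changes.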

3 All Possible Couplings Approach

Consider a sequence of random variables \(A=\left( A_{\phi }:\phi \in \varPhi \right) \). The elements of \(\varPhi \) are called (allowable) treatments. Two distinct treatments \(\phi ,\phi '\) are mutually exclusive, so \(A_{\phi }\) and \(A_{\phi '}\) are stochastically unrelated. This means that \(A\) is not a random variable.

Let there be a sequence of nonempty sets \(\alpha =\left( \alpha ^{k}:k\in K\right) \) such that \(\varPhi \subset \prod _{k\in K}\alpha ^{k}\). This means that every treatment is a sequence \(\phi =(x^{k}:k\in K)\), with \(x^{k}\in \alpha ^{k}\). The sets \(\alpha ^{k}\) are called inputs, and their elements \(x^{k}\) input values. Note that generally \(\varPhi \not =\prod _{k\in K}\alpha ^{k}\), that is, not all possible combinations of input values form treatments (hence the adjective “allowable”).

For every treatment \(\phi \), let the random variable \(A_{\phi }\) be a sequence of jointly distributed random variables \(A_{\phi }=\left( A_{\phi }^{\ell }:\ell \in L\right) \). For each \(\ell \), the sequence \(A^{\ell }=\left( A_{\phi }^{\ell }:\phi \in \varPhi \right) \) is called an output. Its element \(A_{\phi }^{\ell }\) can then be referred to as output \(A^{\ell }\) at treatment \(\phi \) (or simply output \(A_{\phi }^{\ell }\), when this does not create confusion). Note that \(A^{\ell }\) is not a random variable, because its components are stochastically unrelated.

We postulate that, for every input \(\alpha ^{k}\) and every output \(A^{\ell }\), either \(\alpha ^{k}\) directly influences \(A^{\ell }\), and we write \(A^{\ell }\leftarrow \alpha ^{k}\), or this is not true, \(A^{\ell }\not \leftarrow \alpha ^{k}\). This relation is treated as primitive. Its intuitive meaning can be different in different applications. The only constraint imposed on this relation, (complete) marginal selectivity, is as follows (Dzhafarov 2003). Let index subsets \(I\subset K\) and \(J\subset L\) be such that if \(A^{\ell }\leftarrow \alpha ^{k}\) for some \(\ell \in J\) then \(k\in I\). That is, no input belonging to \(\left( \alpha ^{k}:k\in K-I\right) \) directly influences any output belonging to \(\left( A^{\ell }:\ell \in J\right) \). Let \(\phi =(x^{k}:k\in K)\) and \(\phi '=(y^{k}:k\in K)\) be any allowable treatments such that

$$\begin{aligned} \phi |I=(x^{k}:k\in I)=(y^{k}:k\in I)=\phi '|I. \end{aligned}$$
(9)

The vertical bar here indicates the restriction of a function (sequence) to a subset of its arguments (indices). Marginal selectivity means that under these assumptions

$$\begin{aligned} \left( A_{\phi }^{\ell }:\ell \in J\right) \sim \left( A_{\phi '}^{\ell }:\ell \in J\right) , \end{aligned}$$
(10)

where \(\sim \) means “has the same distribution as.” In other words, the joint distribution of a subset of outputs does not depend on inputs that do not directly influence any of these outputs. This does not mean, however, that these inputs, \(\left( \alpha ^{k}:k\in K-I\right) \), can be ignored altogether when dealing with \(\left( A^{\ell }:\ell \in J\right) \): generally, this will not allow one to account for its stochastic relation to other outputs, \(\left( A^{\ell }:\ell \in L-J\right) \).
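As a small illustration of condition (10), consider a hypothetical system with two binary inputs and two outputs in which the first output is directly influenced by the first input only and the second output by the second input only. The data layout and the function names below are assumptions made for this sketch.

```python
from fractions import Fraction


def marginal(pmf, pos):
    """Marginal distribution of the component at position `pos` of a joint pmf."""
    out = {}
    for outcome, prob in pmf.items():
        out[outcome[pos]] = out.get(outcome[pos], Fraction(0)) + prob
    return out


def marginally_selective(system):
    """Check (10) for treatments (i, j), i, j in {1, 2}, assuming that
    output 1 is directly influenced by input 1 only and output 2 by input 2 only."""
    for i in (1, 2):   # output 1 must not depend on the value j of input 2
        if marginal(system[(i, 1)], 0) != marginal(system[(i, 2)], 0):
            return False
    for j in (1, 2):   # output 2 must not depend on the value i of input 1
        if marginal(system[(1, j)], 1) != marginal(system[(2, j)], 1):
            return False
    return True


def joint(p_pp):
    """Joint pmf of two +1/-1 outputs with uniform marginals and Pr[+1, +1] = p_pp."""
    half = Fraction(1, 2)
    return {(+1, +1): p_pp, (+1, -1): half - p_pp,
            (-1, +1): half - p_pp, (-1, -1): p_pp}


system_ok = {(1, 1): joint(Fraction(3, 8)), (1, 2): joint(Fraction(1, 8)),
             (2, 1): joint(Fraction(1, 4)), (2, 2): joint(Fraction(1, 2))}

# Same system, except that at treatment (1, 2) output 1 takes +1 with probability 3/4.
system_bad = dict(system_ok)
system_bad[(1, 2)] = {(+1, +1): Fraction(1, 2), (+1, -1): Fraction(1, 4),
                      (-1, +1): Fraction(0), (-1, -1): Fraction(1, 4)}

print(marginally_selective(system_ok), marginally_selective(system_bad))
```

Note that `system_ok` passes the check even though the joint distribution of the two outputs differs from treatment to treatment; marginal selectivity constrains only the marginals.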

By appropriately redefining the inputs the relation of “being directly influenced by” can always be made bijective: each output is directly influenced by one and only one input. The procedure is easier to illustrate on an example. Let the diagram of direct influences be

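$$\begin{aligned} A^{1}\leftarrow \alpha ^{1},\alpha ^{2};\quad A^{2}\leftarrow \alpha ^{1},\alpha ^{2},\alpha ^{3};\quad A^{3}\leftarrow \alpha ^{2};\quad A^{4}\ \text {(no direct influences)};\quad A^{5}\leftarrow \alpha ^{3}. \end{aligned}$$
(11)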

Assume, for simplicity, that all combinations of input values are allowable, \(\varPhi =\alpha ^{1}\times \alpha ^{2}\times \alpha ^{3}\). Then the redefined inputs are as shown:

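$$\begin{aligned} A^{1}\leftarrow \beta ^{1}=\alpha ^{1}\times \alpha ^{2},\quad A^{2}\leftarrow \beta ^{2}=\alpha ^{1}\times \alpha ^{2}\times \alpha ^{3},\quad A^{3}\leftarrow \beta ^{3}=\alpha ^{2},\quad A^{4}\leftarrow \beta ^{4}=\left\{ .\right\} ,\quad A^{5}\leftarrow \beta ^{5}=\alpha ^{3}. \end{aligned}$$
(12)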

The set \(\left\{ .\right\} \) represents a dummy (single-valued) input; it should be paired with any output that is not directly influenced by any inputs. The rest of the redefinition should be clear. The set of allowable treatments is redefined into a new set \(\varPsi \), which is not the Cartesian product of the new inputs but rather a proper subsequence thereof: e.g., if \(\beta ^{2}\) attains the value \(\left( x^{1},x^{2},x^{3}\right) \), then the only treatment allowable is

$$\begin{aligned} \left( \left( x^{1},x^{2}\right) ,\left( x^{1},x^{2},x^{3}\right) ,x^{2},.,x^{3}\right) . \end{aligned}$$
(13)
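The redefinition can be sketched as a mapping from original treatments to new ones. The influence pattern encoded in `influences` is an assumption chosen to be consistent with the form of treatment (13); the input value sets in `alpha` are hypothetical.

```python
from itertools import product

# Assumed pattern of direct influences: output index -> indices of the inputs
# that directly influence it (consistent with the shape of treatment (13)).
influences = {1: (1, 2), 2: (1, 2, 3), 3: (2,), 4: (), 5: (3,)}

DUMMY = "."  # the single value of the dummy input {.}


def redefine(treatment, influences):
    """Map an original treatment (x^1, ..., x^n) to the redefined treatment,
    which has exactly one new input value per output."""
    new = []
    for output in sorted(influences):
        ks = influences[output]
        if not ks:
            new.append(DUMMY)                      # output with no direct influences
        elif len(ks) == 1:
            new.append(treatment[ks[0] - 1])       # single input, kept as is
        else:
            new.append(tuple(treatment[k - 1] for k in ks))  # inputs merged into one
    return tuple(new)


# Hypothetical value sets for the three original inputs; all combinations allowable.
alpha = [("a", "b"), (0, 1), ("u", "v", "w")]
Phi = list(product(*alpha))
Psi = [redefine(phi, influences) for phi in Phi]

print(Psi[0])
print(len(Phi), len(Psi))
```

With these value sets the Cartesian product of the five new inputs would contain 4 × 12 × 2 × 1 × 3 = 288 combinations, whereas the redefined set of allowable treatments has only as many elements as the original one.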

We assume from now on that the direct influences are defined in a bijective form: \(\alpha =\left( \alpha ^{k}:k\in K\right) \), \(\varPhi \subset \prod _{k\in K}\alpha ^{k}\), \(A_{\phi }=\left( A_{\phi }^{k}:k\in K\right) \), \(A^{k}\leftarrow \alpha ^{k}\) for every \(k\in K\), and there are no other direct influences.

Let us return to the sequence of random variables (Footnote 2)

$$\begin{aligned} A=\left( A_{\phi }:\phi \in \varPhi \right) =\left( A_{\phi }^{k}:k\in K,\phi \in \varPhi \right) , \end{aligned}$$
(14)

with stochastically unrelated components. Consider a complete coupling for \(A\),

$$\begin{aligned} H=\left( H_{\phi }^{k}:k\in K,\phi \in \varPhi \right) , \end{aligned}$$
(15)

a random variable (that is, its components are jointly distributed) such that

$$\begin{aligned} H_{\phi }=\left( H_{\phi }^{k}:k\in K\right) \sim \left( A_{\phi }^{k}:k\in K\right) =A_{\phi }. \end{aligned}$$
(16)

Such a random variable \(H\) always exists. It suffices, e.g., to consider every element of \(H_{\phi }\) to be stochastically independent of every element in \(H_{\phi '}\), for all \(\phi \not =\phi '\). But generally, the complete couplings \(H\) for a given \(A\) can be chosen arbitrarily, except for the defining requirement (16).
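The independent complete coupling mentioned above is easy to construct explicitly for finite distributions. The sketch below uses a hypothetical system with three treatments; `independent_coupling` and `block_marginal` are illustrative names, not the paper's notation.

```python
from fractions import Fraction
from itertools import product


def independent_coupling(system):
    """Joint pmf of H = (H_phi : phi in Phi) whose blocks H_phi are mutually
    independent and satisfy H_phi ~ A_phi, as required by (16)."""
    treatments = sorted(system)
    H = {}
    for combo in product(*(system[t].items() for t in treatments)):
        outcome = tuple(o for o, _ in combo)   # one block outcome per treatment
        prob = Fraction(1)
        for _, p in combo:
            prob *= p                          # product of the block probabilities
        H[outcome] = prob
    return treatments, H


def block_marginal(treatments, H, phi):
    """Distribution of the block H_phi within the coupling H."""
    idx = treatments.index(phi)
    out = {}
    for outcome, prob in H.items():
        out[outcome[idx]] = out.get(outcome[idx], Fraction(0)) + prob
    return out


half, quarter = Fraction(1, 2), Fraction(1, 4)
# Hypothetical outputs A_phi = (A_phi^1, A_phi^2) at three treatments.
system = {
    (1, 1): {(+1, +1): half, (-1, -1): half},
    (1, 2): {(+1, +1): quarter, (+1, -1): quarter,
             (-1, +1): quarter, (-1, -1): quarter},
    (2, 1): {(+1, -1): half, (-1, +1): half},
}

treatments, H = independent_coupling(system)
print(len(H), sum(H.values()))
print(all(block_marginal(treatments, H, t) == system[t] for t in treatments))
```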

Our approach consists in thinking of \(H\), in addition to (16), in terms of “connections” it contains, by which we understand couplings for sequences of random variables that are indexed by different treatments sharing the same pattern of direct influences. Consider, e.g., the components \(A_{\phi }^{k}\) for all \(\phi \) whose \(k\)th element equals a given value \(\phi \left( k\right) =x\). This subsequence can be written as

$$\begin{aligned} A_{x}^{k}=\left( A_{\phi }^{k}:\phi \in \varPhi ,\phi \left( k\right) =x\right) . \end{aligned}$$
(17)

Since \(A^{k}\leftarrow \alpha ^{k}\) only, all random variables \(A_{\phi }^{k}\) in (17) are directly influenced by the same input value, \(x\). Let

$$\begin{aligned} C_{x}^{k}=\left( C_{\phi }^{k}:\phi \in \varPhi ,\phi \left( k\right) =x\right) \end{aligned}$$
(18)

be a coupling for \(A_{x}^{k}\). This means that if \(\phi \left( k\right) =x\),

$$\begin{aligned} C_{\phi }^{k}\sim A_{\phi }^{k}, \end{aligned}$$
(19)

and it follows from the marginal selectivity property that the distribution of \(C_{\phi }^{k}\) across all \(\phi \) with \(\phi \left( k\right) =x\) remains unchanged (and equal to the distribution of \(A_{\phi }^{k}\)). There can be many joint distributions of (18) with this property. One possibility is that \(C_{x}^{k}\) is an identity coupling, meaning that for any two \(C_{\phi }^{k},C_{\phi '}^{k}\) in (18),

$$\begin{aligned} \Pr \left( C_{\phi }^{k}=C_{\phi '}^{k}\right) =1. \end{aligned}$$
(20)

If this is assumed for all \(k\in K\) and \(x\in \alpha ^{k}\), then the complete coupling \(H\) in (15) can be written as the reduced coupling

$$\begin{aligned} R=\left( R_{x}^{k}:k\in K,x\in \alpha ^{k}\right) , \end{aligned}$$
(21)

such that

$$\begin{aligned} R_{\phi }=\left( R_{x}^{k}:k\in K,\phi \left( k\right) =x\right) \sim A_{\phi }. \end{aligned}$$
(22)

The existence of such a reduced coupling for a given \(A\) is the central theme of the theory of selective influences (Dzhafarov 2003; Dzhafarov and Kujala 2010, 2012a, b, 2013b, in press; Kujala and Dzhafarov 2008; Schweickert et al. 2012, Chap. 10), which includes the Bell-type theorems as special cases. Using the language of the present paper, if \(R\) exists, one can say that each \(A^{k}\) is influenced only by the input \(\alpha ^{k}\) that directly influences it. In other words, there are no influences that are not direct (“no context”). Other examples from behavioral sciences involve recent work on combination of concepts (Aerts et al., in press; Bruza et al. 2013; for a critical overview see Wang et al., in press; and Dzhafarov and Kujala, in press). In quantum physics the existence of the reduced coupling represents classical, pre-quantum determinism; it is the foundation of all Bell-type theorems (Basoalto and Percival 2003; Dzhafarov and Kujala 2012a).
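To make (21) and (22) concrete, the following sketch starts from a hypothetical reduced coupling for two binary inputs and derives the output distribution at each treatment. The particular joint distribution of `R` is an assumption chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical reduced coupling R = (R^1_1, R^1_2, R^2_1, R^2_2) with +1/-1 values:
# R^1_1 and R^1_2 are independent fair coins, R^2_1 = R^1_1, and R^2_2 = -R^1_2.
R = {(a, b, a, -b): Fraction(1, 4) for a in (+1, -1) for b in (+1, -1)}


def output_distribution(R, i, j):
    """Relation (22): the output at treatment (i, j) is distributed as (R^1_i, R^2_j)."""
    pmf = {}
    for r, prob in R.items():
        key = (r[i - 1], r[2 + j - 1])
        pmf[key] = pmf.get(key, Fraction(0)) + prob
    return pmf


system = {(i, j): output_distribution(R, i, j) for i in (1, 2) for j in (1, 2)}
for phi in sorted(system):
    print(phi, system[phi])
```

Because the same component of `R` serves as the first output at treatments \(\left( i,1\right) \) and \(\left( i,2\right) \), the identity requirement (20) holds by construction, and the resulting system is automatically marginally selective.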

We know, however, that Bell-type inequalities are violated in quantum physics. This leads us to explore alternatives to the assumption (20) and to the ensuing existence of a reduced coupling. This can be done by allowing \(C_{x}^{k}\) in (18) to be different from an identity coupling. The random variable \(C_{x}^{k}\) is called a connection. If its distribution is posited, we constrain the complete coupling (15) not just by (16), but also by its consistency with this connection:

$$\begin{aligned} H_{x}^{k}=\left( H_{\phi }^{k}:\phi \in \varPhi ,\phi \left( k\right) =x\right) \sim C_{x}^{k}. \end{aligned}$$
(23)

With this additional constraint, the coupling \(H\) need not exist.

Generalizing, let \(I\) be a subset of \(K\) other than the empty set and \(K\) itself. Then the \(\left( I,\tau \right) \)-connection is defined as a random variable

$$\begin{aligned} C_{\tau }^{I}=\left( C_{\phi }^{I}:\phi \in \varPhi ,\phi |I=\tau \right) \end{aligned}$$
(24)

such that for \(\phi |I=\tau \),

$$\begin{aligned} C_{\phi }^{I}\sim A_{\phi }^{I}=\left( A_{\phi }^{k}:k\in I\right) . \end{aligned}$$
(25)

Recall that \(\phi |I=\tau \) is the restriction of the treatment to a subset of its indices (Footnote 3). Note that the components of a given \(C_{\tau }^{I}\) are jointly distributed, but if \(\left( I,\tau \right) \not =\left( I',\tau '\right) \), \(C_{\tau }^{I}\) and \(C_{\tau '}^{I'}\) are stochastically unrelated.

Given a sequence of outputs \(A\) in (14), denote the sequence of the connections \(C_{\tau }^{I}\) for all \(I\) and \(\tau \) by \(C_{A}\) (not a random variable). Assume that the distributions of all these connections are known. Then one can ask whether a complete coupling \(H\) for \(A\) is consistent with all connections in \(C_{A}\), that is, whether in addition to (16) \(H\) also satisfies, for any \(I\in 2^{K}-\left\{ \emptyset ,K\right\} \) and any \(\tau \in \prod _{k\in I}\alpha ^{k}\),

$$\begin{aligned} H_{\tau }^{I}=\left( H_{\phi }^{I}:\phi \in \varPhi ,\phi |I=\tau \right) \sim C_{\tau }^{I}, \end{aligned}$$
(26)

where

$$\begin{aligned} H_{\phi }^{I}=\left( H_{\phi }^{k}:k\in I\right) . \end{aligned}$$
(27)

If this is true, then \(H\) is called an Extended Joint Distribution Sequence (EJDS) for \(\left( A,C_{A}\right) \). This notion is a generalization of the Joint Distribution Sequence (or “Joint Distribution Criterion set”) that coincides with the reduced coupling (21) in the theory of selective influences (Dzhafarov and Kujala 2010, 2012a, 2013b). It is obtained from EJDS by requiring that all connections be identity ones, that is, for any \(\phi ,\phi '\) in (24),

$$\begin{aligned} \Pr \left( C_{\phi }^{I}=C_{\phi '}^{I}\right) =1. \end{aligned}$$
(28)

4 Characterizing Contextuality

The notion of an EJDS can be used to characterize contextuality in relation to constraints imposed on the outputs of a system. Suppose that it is known that the outputs \(A\) taken across all allowable treatments in (14) satisfy a certain property \(\mathcal {P}\left( A\right) \). This property may be described by certain equations and inequalities relating parameters of the outputs to each other, such as Bell-type inequalities or Cirel’son-Landau quantum inequalities (see below). One should then investigate the set of possible \(C_{A}\) in relation to this property \(\mathcal {P}\left( A\right) \).

To understand this better, let us consider a simple example of \(A\). Let \(K\) be \(\left\{ 1,2\right\} \), the sequence of inputs \(\left( \alpha ^{k}:k\in K\right) \) be \(\left( \alpha ^{1}=\left\{ 1,2\right\} ,\alpha ^{2}=\left\{ 1,2\right\} \right) \), the sequence of allowable treatments be \(\varPhi =\alpha ^{1}\times \alpha ^{2}\), and the sequence of outputs be \(A=\left( \left( A_{ij}^{1},A_{ij}^{2}\right) :i,j\in \left\{ 1,2\right\} \right) \) (where each subscript \(ij\) represents the treatment \(\left( i,j\right) \)). The diagram of direct influences is assumed to be

$$\begin{aligned} A^{1}\leftarrow \alpha ^{1},\quad A^{2}\leftarrow \alpha ^{2}. \end{aligned}$$
(29)

The only choices of \(I\subset K\) here other than \(\emptyset \) and \(K\) are the singletons \(\left\{ 1\right\} \) and \(\left\{ 2\right\} \), so the only four connections are, for \(i\in \left\{ 1,2\right\} \),

$$\begin{aligned} C_{i}^{1}=\left( C_{i1}^{1},C_{i2}^{1}\right) ,C_{i}^{2}=\left( C_{1i}^{2},C_{2i}^{2}\right) , \end{aligned}$$
(30)

where \(C_{ij}^{k}\sim A_{ij}^{k}\) for all \(i,j,k\in \left\{ 1,2\right\} \). Recall that the logic of forming \(C_{i}^{1}=\left( C_{i1}^{1},C_{i2}^{1}\right) \) is that \(A_{i1}^{1}\) and \(A_{i2}^{1}\), while they are recorded at different treatments, \(\left( i,1\right) \) and \(\left( i,2\right) \), share the same pattern of direct influences, namely, both are directly influenced by the value \(i\) of \(\alpha ^{1}\) (in our general notation, \(\phi |\left\{ 1\right\} =\left( i\right) \)). So if their joint distribution is described by anything other than \(\Pr \left( C_{i1}^{1}=C_{i2}^{1}\right) =1\), we can speak of indirect, contextual influences. The situation with \(C_{i}^{2}\) is analogous. The complete coupling for \(A\) here is the 8-vector

$$\begin{aligned} H=\left( H_{ij}^{1},H_{ij}^{2}:i,j\in \left\{ 1,2\right\} \right) . \end{aligned}$$
(31)

Assume that each \(A_{ij}^{k}\) (hence also \(H_{ij}^{k}\) in the complete coupling, \(i,j,k\in \left\{ 1,2\right\} \)) is a binary random variable with equiprobable outcomes +1 and -1. Then \(A\) is represented by four probabilities \(p=\left( p_{11},p_{12},p_{21},p_{22}\right) \), where

$$\begin{aligned} p_{ij}=\Pr \left[ A_{ij}^{1}=+1,A_{ij}^{2}=+1\right] =\Pr \left[ H_{ij}^{1}=+1,H_{ij}^{2}=+1\right] . \end{aligned}$$
(32)

One prominent situation encompassed by this example is the Bohmian version of the EPR paradigm involving two spin-½ particles with two settings (spatial directions) per particle. As examples of a constraint \(\mathcal {P}\left( A\right) \) consider the Bell/CH/Fine inequalities (Bell 1964; Clauser and Horne 1974; Fine 1982)

$$\begin{aligned} 0\le p_{ij}+p_{ij'}+p_{i'j'}-p_{i'j}\le 1 \end{aligned}$$
(33)

and Cirel’son’s (1980) inequalities

$$\begin{aligned} \frac{1-\sqrt{2}}{2}\le p_{ij}+p_{ij'}+p_{i'j'}-p_{i'j}\le \frac{1+\sqrt{2}}{2}, \end{aligned}$$
(34)

where \(i,j\in \left\{ 1,2\right\} \), \(i'=3-i\), \(j'=3-j\) (so each expression contains four double-inequalities). The Bell/CH/Fine inequalities are known to be necessary and sufficient for the existence of a classical explanation for the EPR paradigm in question (Fine 1982), whereas the Cirel’son inequalities are necessary for the existence of a quantum mechanical explanation (Landau 1987).
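The four double inequalities in (33) and in (34) can be checked numerically for a given vector \(p\). This is a minimal sketch: the function names, the tolerance, and the three example vectors are assumptions made for illustration, the second vector being chosen so that one expression sits exactly at the upper bound of (34).

```python
from math import sqrt


def chsh_sums(p):
    """The four expressions p_ij + p_ij' + p_i'j' - p_i'j appearing in (33) and (34).

    `p` maps (i, j), with i, j in {1, 2}, to Pr[A^1_ij = +1, A^2_ij = +1].
    """
    sums = {}
    for i in (1, 2):
        for j in (1, 2):
            i2, j2 = 3 - i, 3 - j
            sums[(i, j)] = p[(i, j)] + p[(i, j2)] + p[(i2, j2)] - p[(i2, j)]
    return sums


def within(p, lo, hi, tol=1e-12):
    """True if all four expressions lie in [lo, hi], up to a rounding tolerance."""
    return all(lo - tol <= s <= hi + tol for s in chsh_sums(p).values())


def satisfies_bell(p):
    return within(p, 0.0, 1.0)                                 # inequalities (33)


def satisfies_cirelson(p):
    return within(p, (1 - sqrt(2)) / 2, (1 + sqrt(2)) / 2)     # inequalities (34)


c = 1 / sqrt(2)
p_flat = {(1, 1): 0.25, (1, 2): 0.25, (2, 1): 0.25, (2, 2): 0.25}
p_edge = {(1, 1): (1 + c) / 4, (1, 2): (1 + c) / 4,
          (2, 1): (1 - c) / 4, (2, 2): (1 + c) / 4}
p_extreme = {(1, 1): 0.5, (1, 2): 0.5, (2, 1): 0.0, (2, 2): 0.5}

for name, p in [("flat", p_flat), ("edge", p_edge), ("extreme", p_extreme)]:
    print(name, satisfies_bell(p), satisfies_cirelson(p))
```

The three vectors illustrate the nesting of the two constraints: the first satisfies both, the second violates (33) while satisfying (34), and the third violates both.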

One question to pose about the connections is: what is the set of all \(C_{A}\) such that whenever \(\mathcal {P}\left( A\right) \) is satisfied, an EJDS for \(\left( A,C_{A}\right) \) exists? We call any connection belonging to this \(C_{A}\) a fitting connection for \(\mathcal {P}\left( A\right) \). A question can also be posed about the opposite implication: what is the set of all \(C_{A}\) such that whenever an EJDS for \(\left( A,C_{A}\right) \) exists, \(\mathcal {P}\left( A\right) \) is satisfied? We call any connection in this \(C_{A}\) a forcing connection for \(\mathcal {P}\left( A\right) \). In our example, \(C_{A}\) is the sequence of four connections \(C_{i}^{k}\) in (30), and they are uniquely characterized by the 4-vector \(\varepsilon =\left( \varepsilon _{1}^{1},\varepsilon _{2}^{1},\varepsilon _{1}^{2},\varepsilon _{2}^{2}\right) \), where

$$\begin{aligned} \varepsilon _{i}^{1}=\Pr \left[ C_{i1}^{1}=+1,C_{i2}^{1}=+1\right] ,\varepsilon _{i}^{2}=\Pr \left[ C_{1i}^{2}=+1,C_{2i}^{2}=+1\right] . \end{aligned}$$
(35)

Hence the complete coupling \(H\), in order to be an EJDS for \(\left( A,C_{A}\right) \), should satisfy not only (32), but also

$$\begin{aligned} \Pr \left[ H_{i1}^{1}=+1,H_{i2}^{1}=+1\right] =\varepsilon _{i}^{1},\Pr \left[ H_{1i}^{2}=+1,H_{2i}^{2}=+1\right] =\varepsilon _{i}^{2}, \end{aligned}$$
(36)

for \(i\in \left\{ 1,2\right\} \).

To describe the fitting and forcing connections for our example, it is convenient to introduce the following abbreviations:

(37)

It turns out (Dzhafarov and Kujala 2013a) that the sets of fitting connections for the Bell/CH/Fine and Cirel’son inequalities are described by, respectively,

(38)

and

(39)

This means that if \(p\) satisfies (33), then any \(\varepsilon \) satisfying (38) is compatible with it, that is, this \(p\) and this \(\varepsilon \) can be embedded in the same EJDS \(H\). If \(p\) satisfies (34), the set of \(\varepsilon \) compatible with it is narrower: they should additionally satisfy \(s_{0}\le \frac{3-\sqrt{2}}{2}\). Both sets include, of course, the vector \(\varepsilon =\left( 0,0,0,0\right) \), which represents no-contextuality and corresponds to the reduced coupling \(R\) in (21).

The sets of forcing connections for the Bell/CH/Fine and Cirel’son inequalities are described by, respectively,

$$\begin{aligned} s_{0}=1, \end{aligned}$$
(40)

and

$$\begin{aligned} s_{0}\ge \frac{3-\sqrt{2}}{2}. \end{aligned}$$
(41)

The set of \(\varepsilon \) such that \(s_{0}=1\) consists of \(\varepsilon =\left( 0,0,0,0\right) \), \(\varepsilon =\left( \frac{1}{2},\frac{1}{2},\frac{1}{2},\frac{1}{2}\right) \), and vectors with two zeros and two ½’s. All of them represent no-contextuality, with +1 and \({-}1\) interpreted differently in different connections. Only if \(\varepsilon \) is one of these vectors must \(p\) satisfy the Bell/CH/Fine inequalities in order to be compatible with it. In other words, such an \(\varepsilon \) and no other “forces” \(p\) to satisfy these inequalities. The class of \(\varepsilon \) that force \(p\) to satisfy the Cirel’son inequalities should include these \(\varepsilon \) because every \(p\) satisfying (33) also satisfies (34). But there are other \(\varepsilon \), all those with \(s_{0}\ge \frac{3-\sqrt{2}}{2}\), that too are compatible with \(p\) only if it satisfies the Cirel’son inequalities.

The above serves only as a demonstration of how one could characterize the constraints imposed on outputs (by a theory or empirical generalizations) through the connections compatible with them, in the sense of being embeddable in the same coupling. It should be noted, however, that connections generally do not characterize couplings uniquely. This opens ways for constructing qualified Kolmogorovian models more general than the one presented in this paper.

5 Conclusion

We have shown that the classical, if qualified, Kolmogorovian probability theory is not synonymous with the classical explanation of the input-output relations (especially, in the entanglement paradigm of quantum physics). The latter, since Bell’s (1964) pioneering work, has been understood as the existence of a single sample space for all outputs when each output is indexed (identified) only by the inputs that directly influence it. In the qualified Kolmogorovian approach, however, this is only one of a potential infinity of possibilities. Different treatments (combinations of values of all inputs) correspond to stochastically unrelated random variables, and these can be coupled in many different ways. Only one of these ways, with identity connections, corresponds to John Bell’s single sample space.