1 Introduction

Let N be a positive integer, \([N] = \left\{ {0,1, \ldots , N-1} \right\} \), and \(\Omega = [N]^{{\mathbb {Z}}}\). Let \(T: \Omega \rightarrow \Omega \) be the left-shift given by \((T x)_{i} = x_{i+1}\) for all \(i \in {\mathbb {Z}}\). Given a probability measure \({\mathbf {p}}\) on \([N]\), we call \(B({\mathbf {p}}) = (\Omega , {\mathbf {p}}^{\mathbb {Z}}, T)\) a Bernoulli shift on N symbols. We say that a Bernoulli shift \(B({\mathbf {q}})\) is a factor of \(B({\mathbf {p}})\) if there exists a measurable map \(\phi : \Omega \rightarrow \Omega \) such that the push-forward of \({\mathbf {p}}^{{\mathbb {Z}}}\) under \(\phi \) is \({\mathbf {q}}^{{\mathbb {Z}}}\) and \(\phi \circ T = T \circ \phi \) on a subset of \(\Omega \) with \({\mathbf {p}}^{{\mathbb {Z}}}\)-full measure; we also call the map \(\phi \) a factor from \(B({\mathbf {p}})\) to \(B({\mathbf {q}})\). We say that the Bernoulli shifts \(B({\mathbf {p}})\) and \(B({\mathbf {q}})\) are isomorphic if there exists a factor map \(\phi \) from \(B({\mathbf {p}})\) to \(B({\mathbf {q}})\) such that its inverse \(\phi ^{-1}\) serves as a factor map from \(B({\mathbf {q}})\) to \(B({\mathbf {p}})\); in this case, we call \(\phi \) an isomorphism of \(B({\mathbf {p}})\) and \(B({\mathbf {q}})\). A factor map \(\phi \) is monotone if for all \(x \in \Omega \), we have \(\phi (x)_i \le x_i\) for all \(i \in {\mathbb {Z}}\).

Theorem 1

If \(p \in (\tfrac{1}{2},1),\) then there exists a monotone isomorphism of \(B(1-p,p)\) and \(B(p,1-p)\).

Let us remark that the map defined by \(\phi (x)_i = {\mathbf {1}}[x_i=0]\) for all \(i \in {\mathbb {Z}}\), which just swaps zeros and ones, is clearly an isomorphism of \(B(1-p,p)\) and \(B(p,1-p)\). However, it is not monotone.

It is easy to determine when two Bernoulli shifts are isomorphic via an invariant introduced by Kolmogorov [9], which is non-increasing under factors and preserved under isomorphisms. The entropy of a probability measure \({\mathbf {p}}=(p_0, \ldots , p_{N-1})\) on \([N]\) is given by \(H({\mathbf {p}}): = -\sum _{i=0}^{N-1}{p}_i \log {p}_i\). Sinai [21, 22] proved that if \(H({\mathbf {p}}) \ge H({\mathbf {q}})\), then \(B({\mathbf {q}})\) is a factor of \(B({\mathbf {p}})\), and Ornstein [16, 17] proved that the entropies of two Bernoulli shifts are equal if and only if the two Bernoulli shifts are isomorphic.
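In particular, the entropy of a measure on two symbols is symmetric in its entries, so \(B(1-p,p)\) and \(B(p,1-p)\) always have equal entropy and are isomorphic by Ornstein's theorem; Theorem 1 asks whether the isomorphism can be chosen monotone. The following numerical sketch checks the entropy equality (the helper `entropy` is our own illustration, not from the text):

```python
import math

def entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i log p_i, with 0 log 0 := 0."""
    return -sum(q * math.log(q) for q in probs if q > 0)

p = 0.7
# H(1-p, p) = H(p, 1-p): the sum defining the entropy is invariant under
# permuting the entries, so the two shifts in Theorem 1 have equal entropy.
assert abs(entropy([1 - p, p]) - entropy([p, 1 - p])) < 1e-12
```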

Although it is easy to compute the entropy of a Bernoulli shift and to determine whether two Bernoulli shifts are isomorphic, the actual factor map which realizes the isomorphism is in general a much more complicated object. In some special cases, the factor map has a simple description [2, 15]. The first non-trivial example of an isomorphism is due to Meshalkin [15], and it also gives a monotone isomorphism. I thank Zemer Kosloff for his help with the following example.

Example 1

(A classical example due to Meshalkin [15]) We will adjust the treatment given in [12] to ensure monotonicity. Let \({\mathbf {p}}=(\tfrac{1}{8},\tfrac{1}{8},\tfrac{1}{8},\tfrac{1}{8}, \tfrac{1}{2})\) and \({\mathbf {q}} = (\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{4},0)\), so that \(N=5\), and \({\mathbf {p}}\) and \({\mathbf {q}}\) are probability measures on \([N] = \left\{ {0,1,2,3,4} \right\} \). Let \(x \in \Omega =[5]^{{\mathbb {Z}}}\). We define a factor map \(\phi : \Omega \rightarrow \Omega \) such that if \(x_i=4\), then \(\phi (x)_i \in \left\{ {2,3} \right\} \), if \(x_i \in \left\{ {2,3} \right\} \), then \(\phi (x)_i = 1\), and if \(x_i \in \left\{ {0,1} \right\} \), then \(\phi (x)_i = 0\).

It remains to specify what happens when \(x_i=4\). Think of every \(x_i=4\) as a right parenthesis, and think of every \(x_i \not =4\) as a left parenthesis. Ergodicity implies that every parenthesis will be matched legally almost surely. If \(x_i=4\), then let j be the position of the corresponding left parenthesis. If \(x_j\) is odd, then we set \(\phi (x)_i =3\), if \(x_j\) is even, then we set \(\phi (x)_i =2\).

By definition, the map \(\phi \) satisfies \(\phi \circ T = T \circ \phi \) and is monotone. Meshalkin proved that \(\phi \) is an isomorphism of \(B({\mathbf {p}})\) and \(B({\mathbf {q}})\).\(\Diamond \)
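On a finite window, the parenthesis-matching step of the example can be carried out with a stack. The following sketch is our own illustration (not from the text); positions whose right parenthesis is unmatched inside the window are returned as `None`, since on the full shift space ergodicity matches them almost surely:

```python
def meshalkin_phi(x):
    """Apply the map of Example 1 to a finite window x over {0,...,4}.

    Every 4 is a right parenthesis; every other symbol is a left
    parenthesis.  A 4 matched to an odd symbol maps to 3, to an even
    symbol maps to 2; symbols in {2,3} map to 1 and {0,1} map to 0.
    """
    out = [None] * len(x)
    stack = []  # indices of currently unmatched left parentheses
    for i, s in enumerate(x):
        if s != 4:
            out[i] = 0 if s in (0, 1) else 1
            stack.append(i)
        elif stack:
            j = stack.pop()  # position of the matching left parenthesis
            out[i] = 3 if x[j] % 2 == 1 else 2
    return out

y = meshalkin_phi([0, 2, 4, 4, 3, 1, 4])
assert y == [0, 1, 2, 2, 1, 0, 3]
# monotone: phi(x)_i <= x_i at every position
assert all(b <= a for a, b in zip([0, 2, 4, 4, 3, 1, 4], y))
```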

It is easy to see that a necessary condition for the existence of a monotone factor from \(B({\mathbf {p}})\) to \(B({\mathbf {q}})\) is that there exists a monotone coupling of \({\mathbf {p}}\) and \({\mathbf {q}}\); that is, a probability measure \(\mathbf {\rho }\) on \([N] \times [N]\) such that \(\mathbf {\rho }(\cdot , [N]) = {\mathbf {p}}\), \(\mathbf {\rho }([N], \cdot ) = {\mathbf {q}}\), and \(\mathbf {\rho } \left\{ {(n,m): n\ge m} \right\} =1\). By Strassen’s theorem [23], the existence of a monotone coupling is equivalent to the condition that \(\sum _{i=0}^k{p}_i \le \sum _{i=0}^k {q}_i\) for all \(0 \le k < N\), in which case we say that \({\mathbf {p}}\) stochastically dominates \({\mathbf {q}}\).
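On a finite totally ordered alphabet, Strassen's condition is just a finite list of partial-sum inequalities, so it is easy to check; a small sketch (the function name is ours):

```python
from itertools import accumulate

def stochastically_dominates(p, q):
    """Strassen's condition on [N]: p stochastically dominates q iff
    every partial sum of p is at most the corresponding partial sum of q."""
    return all(a <= b + 1e-12 for a, b in zip(accumulate(p), accumulate(q)))

# With p = 0.7, the measure (0.3, 0.7) dominates (0.7, 0.3): more mass
# sits on the larger symbol 1, and the partial sums satisfy 0.3 <= 0.7.
assert stochastically_dominates([0.3, 0.7], [0.7, 0.3])
assert not stochastically_dominates([0.7, 0.3], [0.3, 0.7])
```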

Theorem 2

(Quas and Soo [18]) Let \({\mathbf {p}}\) and \({\mathbf {q}}\) be probability measures on \([N]\). If \({\mathbf {p}}\) stochastically dominates \({\mathbf {q}}\) and \(H({\mathbf {p}})\) is strictly greater than \(H({\mathbf {q}}),\) then there exists a monotone factor from \(B({\mathbf {p}})\) to \(B({\mathbf {q}}).\)

Ball [1] proved Theorem 2 in the case that the measure \({\mathbf {q}}\) is supported on two symbols. In both of those papers, a strict entropy inequality is required. In this paper, we treat the case of equal entropy, in the special case where there are only two symbols, zero and one, in each of the Bernoulli shifts. The methods used to prove Theorem 1 can also be adapted to produce monotone factors in other specific cases, but we do not know the answer to the following question.

Question 1

Let \({\mathbf {p}}\) and \({\mathbf {q}}\) be probability measures on \([N]\) such that \({\mathbf {p}}\) stochastically dominates \({\mathbf {q}}\) and \(H({\mathbf {p}}) = H({\mathbf {q}})\). Does there exist a monotone factor from \(B({\mathbf {p}})\) to \(B({\mathbf {q}})?\)

Russell Lyons [1] first posed the question of whether a monotone factor exists between two Bernoulli shifts. The requirement of monotonicity makes defining maps more difficult. In a related problem, Gurel-Gurevich and Peled [6, Theorem 1.3] proved that for \(p \in (\tfrac{1}{2}, 1)\) there exists a monotone map \(\phi : \left\{ {0,1} \right\} ^{{\mathbb {Z}}} \rightarrow \left\{ {0,1} \right\} ^{{\mathbb {Z}}}\) such that the product measure \((p, 1-p)^{{\mathbb {Z}}}\) is the push-forward of \((1-p, p)^{{\mathbb {Z}}}\) under \(\phi \); however, their map is not equivariant; that is, it does not satisfy \(\phi \circ T = T \circ \phi \).

See [9, 17, 24] for more information on entropy and the isomorphism problem in ergodic theory. See [13, 18] for background on factors in probability theory.

The proof of Theorem 1 will involve some of the methods of [18], which in turn combines ideas from various treatments of the Ornstein and Sinai factor theorems given by Keane and Smorodinsky [10, 11], Burton and Rothstein [3, 4], del Junco [7, 8], and Ball [1]. We briefly summarize some of the main features and differences in their proofs. Keane and Smorodinsky, as well as Ball, employed a marker-filler method and a version of Hall’s marriage theorem (see Remark 9). Del Junco also employed a marker-filler method, but he replaced the marriage lemma with his star-coupling (see Sect. 4). These constructions are explicit and they exhibit factor maps that are finitary, an almost-sure continuity property (see [20] for details). In a somewhat more abstract approach, Burton and Rothstein proved that in a suitably defined metric space, the set of all factors is a residual set, in the sense of the Baire category theorem. This was the approach taken in [18], and it will also be the approach we take here.

2 Dedication

I never had the pleasure of meeting Professor del Junco, but I wrote to him in December 2013 about Theorem 2 with a preprint of [18]. He wrote back the same day saying he was glad that an old idea of his had found another application and that he always felt that the star-coupling was one of his best ideas.

His coupling was a key feature in our proof of Theorem 2, and will also be a star feature in the proof of Theorem 1.

3 Coupling and stochastic domination

Strassen’s theorem [23] holds in the much more general setting of a partially ordered Polish space. The proof, even in the case of a finite set, is non-trivial; see for example [14, Theorem 10.4]. However, in the special case of real-valued random variables or random variables taking values on a finite totally ordered set, the proof is easily obtained using a simple coupling of random variables.

3.1 Quantiles

Let X be a real-valued random variable, with cumulative distribution function or law given by \(F(z)=F_X(z):= {\mathbb {P}}(X \le z)\) for all \(z \in {\mathbb {R}}\). Define the generalized inverse of F via \(F^{-1}(y) := \sup \left\{ {x \in {\mathbb {R}}: F(x) <y} \right\} \). Let U be uniformly distributed in [0, 1], so that \(F_U(z) = z\) for all \(z \in [0,1]\). We call \(F_X^{-1}(U)\) the quantile representation of X. It is easy to see that the random variable \(F_X^{-1}(U)\) has the same law as X. When we define random variables using the quantile representation, we will sometimes refer to the random variable U as the randomization; often U will be chosen to be independent of any previously defined random variables.

If X and Y are two real-valued random variables, we say that X stochastically dominates Y if \({\mathbb {P}}(X \le z) \le {\mathbb {P}}(Y \le z)\) for all \(z \in {\mathbb {R}}\). A coupling of X and Y is a pair of random variables \((X', Y')\) defined on the same probability space such that \(X'\) has the same law as X and \(Y'\) has the same law as Y. Let U be uniformly distributed in [0, 1]. If we set \(X':= F_X^{-1}(U)\) and \(Y':=F_Y^{-1}(U)\), then the quantile coupling of X and Y is given by \((X', Y')\). We say that the coupling \((X',Y')\) is monotone if \(X' \ge Y'\). Strassen’s theorem implies that X stochastically dominates Y if and only if there exists a monotone coupling of X and Y. Clearly, the existence of a monotone coupling implies stochastic domination; on the other hand, it is easy to see that the quantile coupling is monotone under the assumption of stochastic domination.
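The quantile coupling is easy to realize on a finite alphabet: plug the same uniform randomization into both generalized inverse CDFs. A sketch (the helper name is ours), which also checks monotonicity under domination:

```python
import random
from bisect import bisect_left
from itertools import accumulate

def quantile_coupling(p, q, u):
    """Quantile coupling of one draw from each of the pmfs p and q on
    {0, ..., N-1}: the same uniform randomization u is plugged into
    both generalized inverse CDFs."""
    Fp, Fq = list(accumulate(p)), list(accumulate(q))
    # F^{-1}(u) on a finite support: the first point whose CDF value reaches u
    return bisect_left(Fp, u), bisect_left(Fq, u)

# (0.3, 0.7) stochastically dominates (0.7, 0.3), so the quantile
# coupling is monotone: X' >= Y' for every value of the randomization.
for _ in range(1000):
    x, y = quantile_coupling([0.3, 0.7], [0.7, 0.3], random.random())
    assert x >= y
```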

Let us remark that stochastic domination and the quantile coupling are also similarly defined in the case that the random variables take values in a finite totally ordered space.

Lemma 3

(Strassen’s theorem via the quantile coupling) Let X and Y be real-valued random variables or random variables taking values in a finite totally ordered space. If \((X',Y')\) is a quantile coupling of X and Y,  then \(X'\) is almost surely greater than or equal to \(Y'\) if and only if X stochastically dominates Y.

In Sect. 4, we will discuss an ingenious variation of the quantile coupling due to del Junco [7, Section 4], which will be a key ingredient in our proof of Theorem 1.

3.2 A simple application of Strassen’s theorem

Lemma 3 will be used to prove the following simple observation, which will serve as the starting point in our proof of Theorem 1. For two binary sequences x and y of the same length, we write \(x \preceq y\) if and only if \(x_i \le y_i\) for all indices i. Thus the relation \(\preceq \) defines a partial order on the set of binary sequences with the same length. We write \(x=1^{n}0^{\ell }\) to mean a binary sequence of n ones followed by \(\ell \) zeros.

Lemma 4

Let \(n \ge 1\) and \(p \in (\tfrac{1}{2}, 1)\). Let \(X=(X_1, \ldots , X_n)\) and \(Y= (Y_1, \ldots , Y_n)\) be i.i.d. sequences of Bernoulli random variables with parameters p and \(1-p,\) respectively. Let \(B_n\) be the set of size \(n+1\) of all binary sequences z of length n of the form \(z=1^{n-\ell }0^{\ell }\) for some \(\ell \in [0, n]\). Let \(X^{*}\) and \(Y^{*}\) be random variables whose laws are those of X and Y conditioned to be in \(B_n,\) respectively. Then with respect to the order \(\preceq ,\) defined on binary sequences,  \(X^{*}\) stochastically dominates \(Y^{*},\) and there is a monotone coupling of \(X^{*}\) and \(Y^{*}\).

Note that in Lemma 4, although the set of all binary sequences of a fixed length is only partially ordered by \(\preceq \), the set \(B_n\) is totally ordered by \(\preceq \). The set \(B_n\) can also be described as the set of binary sequences of length n that do not have a zero followed by a one. We will refer to \(B_n\) as a filler set.

Proof of Lemma 4

Lemma 4 is a simple consequence of the duality between p and \(1-p\). For every integer \(\ell \in [0,n]\), we have

$$\begin{aligned} \sum _{i=0}^{n- \ell } p^{n-\ell -i}(1-p)^i = \sum _{i=0}^{n- \ell } (1-p)^{n-\ell -i}p^i; \end{aligned}$$
(1)

this implies, using \(\ell =0\), that \({\mathbb {P}}(X \in B_n) = {\mathbb {P}}(Y \in B_n)\). Thus in order to prove that \(X^*\) stochastically dominates \(Y^*\), it suffices to show that for all \(z \in B_n\), we have \({\mathbb {P}}(X \preceq z, X \in B_n) \le {\mathbb {P}}(Y \preceq z, Y \in B_n).\) If \(z =1^{n-\ell }0^{\ell }\), then since \(p > 1-p\), by equality (1) we have

$$\begin{aligned} {\mathbb {P}}(X \preceq z, X \in B_n)= & {} (1-p)^{\ell }\left( \sum _{i=0}^{n- \ell } p^{n-\ell -i}(1-p)^i\right) \\\le & {} p^{\ell }\left( \sum _{i=0}^{n- \ell } (1-p)^{n-\ell -i}p^i\right) \\= & {} {\mathbb {P}}(Y \preceq z, Y \in B_n). \end{aligned}$$

The existence of a monotone coupling follows from Lemma 3. \(\square \)
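The statement of Lemma 4 can also be checked numerically: listing \(B_n\) from its \(\preceq \)-smallest element (all zeros, \(\ell = n\)) to its largest (all ones, \(\ell = 0\)), the conditioned law for parameter p should stochastically dominate the one for parameter \(1-p\). A sketch (helper name is ours):

```python
from itertools import accumulate

def conditioned_filler_law(param, n):
    """Law of n i.i.d. Bernoulli(param) bits conditioned on the filler set
    B_n = {1^{n-l} 0^l : 0 <= l <= n}, listed from the smallest element
    in the order (l = n, all zeros) to the largest (l = 0, all ones)."""
    w = [param ** (n - l) * (1 - param) ** l for l in range(n, -1, -1)]
    z = sum(w)
    return [x / z for x in w]

p, n = 0.7, 6
# Identity (1) with l = 0: the conditioning events have equal probability
# (reindex l -> n - l to see the two sums agree).
zx = sum(p ** (n - l) * (1 - p) ** l for l in range(n + 1))
zy = sum((1 - p) ** (n - l) * p ** l for l in range(n + 1))
assert abs(zx - zy) < 1e-12

# Stochastic domination of X* over Y*: each partial sum of the law of X*
# (from the bottom of the order) is at most that of Y*.
X_star, Y_star = conditioned_filler_law(p, n), conditioned_filler_law(1 - p, n)
assert all(a <= b + 1e-12 for a, b in zip(accumulate(X_star), accumulate(Y_star)))
```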

4 Markers, fillers, and joinings

4.1 Markers

Let us fix \(\Omega = \left\{ {0,1} \right\} ^{{\mathbb {Z}}}\). Let \(x \in \Omega \). We call the interval \([i, i+1] \subset {\mathbb {Z}}\) a primary marker if \(x_i=0\) and \(x_{i+1}=1\). Later, we will define secondary and tertiary markers which will consist of consecutive primary markers. Note that two distinct primary markers have an empty intersection. We call an interval of \({\mathbb {Z}}\) a filler if it is nonempty and lies between two primary markers. Thus each \(x \in \Omega \) partitions \({\mathbb {Z}}\) into intervals of primary markers and fillers.
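On a finite window, the primary markers and the fillers between them are easy to read off; the following sketch is our own illustration (it ignores the half-infinite fillers at the boundary of the window):

```python
def primary_markers(x):
    """Primary markers of a finite binary window x: the intervals
    [i, i+1] with x[i] = 0 and x[i+1] = 1 (the pattern '01')."""
    return [(i, i + 1) for i in range(len(x) - 1) if x[i] == 0 and x[i + 1] == 1]

x = [1, 1, 0, 1, 0, 0, 1, 1]
assert primary_markers(x) == [(2, 3), (5, 6)]
# Distinct markers are disjoint: a marker ends in a 1, so it cannot begin
# another marker.  The filler between the two markers here is the single
# position {4}; its content 0 lies in the filler set B_1 of Lemma 4.
```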

Let \(p \in (\tfrac{1}{2},1)\), and consider the product probability measures on \(\Omega = \left\{ {0,1} \right\} ^{{\mathbb {Z}}}\) given by \(\mu :=(1-p, p)^{{\mathbb {Z}}}\) and \(\nu := (p, 1-p)^{{\mathbb {Z}}}\). Thus the probability that the zeroth coordinate is a one under \(\mu \) is p and is \(1-p\) under \(\nu \). By conditioning, an instance of a random variable X with law \(\mu \) can be given by first deciding on the locations of the primary markers, and then deciding on the content of the filler; the same observation holds for a random variable Y with law \(\nu \).

To be more precise, let \({\mathsf {T}}= \left\{ {{\mathsf {M}},{\mathsf {F}}} \right\} ^{{\mathbb {Z}}}\), where \({\mathsf {M}}\) and \({\mathsf {F}}\) are two symbols that stand for ‘marker’ and ‘filler.’ For each \(x \in \Omega \), define the hat map by setting

$$\begin{aligned} \hat{x}(i) = {\left\{ \begin{array}{ll} {\mathsf {M}} &{}\text {if } i\in {\mathbb {Z}}\text { is in a primary marker};\\ {\mathsf {F}} &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

Let \(\tau \) and \(\tau '\) be the push-forwards of the measures \(\mu \) and \(\nu \), respectively, under the hat map. Sometimes we will refer to \(\tau \) as the marker measure. We have the following disintegration. For \(\tau \)-almost every \(t\in {\mathsf {T}}\), there exists a probability measure \(\mu _{t}\) on \(\Omega \) such that

$$\begin{aligned} \int f(x) d\mu (x) = \int \left( \int f(x) d\mu _{t}(x)\right) d\tau (t) \end{aligned}$$

for all measurable \(f:\Omega \rightarrow [0, \infty )\).

Remark 5

Keane and Smorodinsky [10, Lemma 4] give a concrete description of \(\mu _t\). The measure \(\mu _{t}\) assigns the sequence 01 to each primary marker interval of t, and is a product measure on the filler intervals, where on a filler interval of length n it is the law of n i.i.d. Bernoulli random variables with parameter p conditioned to be in the set \(B_n\) of sequences of consecutive ones followed by consecutive zeros (see Lemma 4). The analogous result holds for \(\nu \).\(\Diamond \)

Remark 6

Notice that the probability that the origin is contained in a primary marker is the same under \(\mu \) and \(\nu \). Keane and Smorodinsky [10, Lemma 3] proved that \(\tau = \tau '\). Thus the marker measure \(\tau \) is the same for \(\mu \) and \(\nu \) and depends only on the parameter p. This fact will also be important in our proof of Theorem 1.\(\Diamond \)

4.2 Joinings

A coupling of \(\mu \) and \(\nu \) is a probability measure \(\xi \) on \(\Omega \times \Omega \) that has marginals \(\mu \) and \(\nu \); a joining is a coupling that is invariant under the product shift \(T \times T\), so that \(\xi \circ (T \times T) = \xi \). A joining \(\xi \) is ergodic if every set that is \((T \times T)\)-invariant up to a \(\xi \)-null set has \(\xi \)-measure zero or one. A coupling \(\xi \) is monotone if

$$\begin{aligned} \xi \left\{ {(x, y)\in \Omega \times \Omega : x_i \ge y_i \text { for all } i \in {\mathbb {Z}}} \right\} =1. \end{aligned}$$

A joining \(\xi \) is of marker form if for \(\xi \)-almost every \((x,y)\in \Omega \times \Omega \) the binary sequences x and y have the same primary markers. It follows from Remark 6 that there exists a joining of \(\mu \) and \(\nu \) in marker form. We will use a monotone version of this fact.

Proposition 7

There exists a monotone joining of \(\mu \) and \(\nu \) of marker form.

Proof

By Remark 6, we have \(\tau = \tau '\). Hence we may assume that there exist random variables X and Y with laws \(\mu \) and \(\nu \) such that X and Y have the same primary markers and filler intervals. Consider a coupling of X and Y defined in the following way. By Remark 5, conditioned on the locations of the primary markers, for each filler interval I of X, we know that the law of the restriction of X to I is given by the law of a finite sequence of i.i.d. Bernoulli random variables with parameter p conditioned to be in a filler set; furthermore, conditioned on the locations of primary markers, the restrictions of X to each filler interval give independent random variables. The analogous statement holds for Y. For each filler interval I, by Lemma 4, there exists a monotone coupling of the restriction of X to I and the restriction of Y to I. Hence by applying Lemma 4 to each of the filler intervals independently, and leaving the primary markers alone, we obtain a coupling \((X', Y')\) of X and Y whose law is monotone and of marker form.

\(\square \)
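The construction in the proof can be sketched on a single filler interval: one uniform randomization drives the quantile coupling of the two conditioned filler laws of Lemma 4, which is monotone by stochastic domination. (The helper below is our own illustration, not the paper's construction verbatim.)

```python
import random
from itertools import accumulate

def coupled_fillers(p, n, u):
    """Monotone quantile coupling of the two conditioned filler laws of
    Lemma 4 on a filler interval of length n: the same uniform u is used
    for the conditioned law with parameter p (for X) and 1-p (for Y)."""
    def quantile(param):
        # weights of 1^{n-l} 0^l, listed from the smallest element of B_n
        # in the order (l = n) to the largest (l = 0)
        w = [param ** (n - l) * (1 - param) ** l for l in range(n, -1, -1)]
        total = sum(w)
        F = list(accumulate(x / total for x in w))
        k = next((i for i, f in enumerate(F) if f >= u), n)
        l = n - k
        return (1,) * (n - l) + (0,) * l

    return quantile(p), quantile(1 - p)

# Markers are left as '01'; on every filler the coupling is monotone,
# so the resulting joining is monotone and of marker form.
fx, fy = coupled_fillers(0.7, 5, random.random())
assert all(a >= b for a, b in zip(fx, fy))
```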

4.3 The Baire category approach of Burton and Rothstein

Let \(p \in (\tfrac{1}{2}, 1)\) and \(J=J(p)\) be the set of all monotone ergodic joinings of \(\mu =(1-p, p)^{{\mathbb {Z}}}\) and \(\nu =(p, 1-p)^{{\mathbb {Z}}}\) of marker form. Note that J is nonempty by Proposition 7. Following the approach of Burton and Rothstein [4], we will show that the monotone isomorphisms form a residual set in J, when we endow J with a suitable topology. Following del Junco [8], we assign a complete metric to J as follows. For \(i\ge 0\), let \({\mathcal {C}}_i\) be the set of measurable \(C \subset \Omega \times \Omega \) that only depend on the coordinates \(j\in [-i,i]\); we will call such sets cylinder sets. For any two measures \(\zeta \) and \(\xi \) on \(\Omega \times \Omega \) (which may not be joinings), set

$$\begin{aligned} {{\mathrm{d^{*}}}}(\zeta , \xi ):= \sum _{i=0}^ \infty 2^{-(i+1)} \sup _{C \in {\mathcal {C}}_i} |\zeta (C) - \xi (C)|. \end{aligned}$$

Thus \({{\mathrm{d^{*}}}}\) is the usual weak-star metric. For \(\xi \in J\), let \(\xi _{t}\) be \(\xi \) conditioned to have the primary markers given by \(t \in {\mathsf {T}}\). Let us remark that \(\xi _t\) is no longer a joining. Let \(\tau \) be the common marker measure. For \(\zeta , \xi \in J\), set

$$\begin{aligned} {{\mathrm{d}}}(\xi , \zeta ):= \int {{\mathrm{d^{*}}}}(\xi _{t}, \zeta _{t}) d\tau (t). \end{aligned}$$
(2)

Standard methods show that \((J, {{\mathrm{d}}})\) is a Baire space (see for example [18, Lemma 17]). We will show that the set of monotone isomorphisms contains an intersection of open dense sets of J, and hence is nonempty by the Baire category theorem. To be more precise, let \({\mathcal {F}}\) denote the product sigma-algebra for \(\Omega \). Let \({\mathcal {P}} := \left\{ {P_0, P_1} \right\} \) denote the partition of \(\Omega \) according to the zeroth coordinate so that \(P_i := \left\{ {x \in \Omega : x_0 = i} \right\} \). Let \(\zeta \in J\) and let \(\varepsilon > 0\). If there exists \({\mathsf {T}}' \subset {\mathsf {T}}\) with \(\tau ({\mathsf {T}}') >1-\varepsilon \) such that for every \(t \in {\mathsf {T}}'\), and each \(P \in \sigma ({\mathcal {P}})\) there exists a \(P' \in {\mathcal {F}}\) such that

$$\begin{aligned} \zeta _t((P' \times \Omega ) \ \triangle \ (\Omega \times P)) < \varepsilon , \end{aligned}$$

then we say that \(\zeta \) is an \(\varepsilon \)-almost factor from \(B(1-p, p)\) to \(B(p, 1-p)\). For each \(\varepsilon >0\), let \(U_{\varepsilon }\) be the set of all \(\varepsilon \)-almost factors from \(B(1-p,p)\) to \(B(p, 1-p)\). It is routine to verify that \(U_{\varepsilon }\) is an open set (see for example [5, pages 123–124]) and that an element in the intersection of all the \(U_{\varepsilon }\) defines a monotone factor from \(B(1-p,p)\) to \(B(p,1-p )\) (see for example [19, Theorem 2.8]). The real work lies in verifying that \(U_{\varepsilon }\) is dense; once this has been proved, the Baire category theorem gives that the set of monotone factors from \(B(1-p, p)\) to \(B(p, 1-p)\) contains an intersection of open dense sets, and hence is nonempty.

Theorem 1 asserts the existence of a monotone isomorphism, which appears to be a much stronger statement than the existence of a monotone factor. However, one of the advantages of the Baire category approach is that proving the existence of the isomorphism requires little additional work. We define an approximate factor from \(B(p, 1-p)\) to \(B(1-p, p)\) in the analogous way. Let \(\zeta \in J\) and let \(\varepsilon > 0\). If there exists \({\mathsf {T}}' \subset {\mathsf {T}}\) with \(\tau ({\mathsf {T}}') >1-\varepsilon \) such that for every \(t \in {\mathsf {T}}'\), and each \(P \in \sigma ({\mathcal {P}})\) there exists a \(P' \in {\mathcal {F}}\) such that \(\zeta _t( (P \times \Omega ) \ \triangle \ (\Omega \times P'))< \varepsilon ,\) then we say that \(\zeta \) is an \(\varepsilon \)-almost factor from \(B(p, 1-p)\) to \(B(1-p, p)\). For each \(\varepsilon >0\), let \(V_{\varepsilon }\) be the set of all \(\varepsilon \)-almost factors. Again, one can verify that \(V_{\varepsilon }\) is an open set, and that an element in the intersection of all the \(V_{\varepsilon }\) defines a monotone factor from \(B(p, 1-p)\) to \(B(1-p, p)\). Moreover, any element in the grand intersection of all the \(U_{\varepsilon }\) and \(V_{\varepsilon }\) defines a monotone isomorphism. It will become apparent that the same proof that shows that \(U_{\varepsilon }\) is dense can be essentially copied to show that \(V_{\varepsilon }\) is dense. Thus the Baire category theorem shows that the grand intersection is nonempty.

It remains to verify that for each \(\varepsilon >0\), the set \(U_{\varepsilon }\) of \(\varepsilon \)-almost factors is dense. Given \(\varepsilon >0\) and \(\xi \in J\), we need to find \(\xi ' \in U_{\varepsilon }\) with \({{\mathrm{d}}}(\xi , \xi ')<\varepsilon \). We will define \(\xi '\) as a certain perturbation of \(\xi \) which will be obtained using del Junco’s star-coupling [7, Section 4 and Proposition 4.7].

5 The star-coupling

Let X and Y be random variables taking values on finite sets A and B, respectively. In this section, we will discuss various couplings of X and Y; that is, random variables \(X'\) and \(Y'\) defined together on the same probability space with the same distribution as X and Y, respectively.

Let \(\rho \) be a joint probability mass function for X and Y. We say that an element \(a \in A\) is split by \(\rho \) if there exist distinct \(b,b' \in B\), such that \(\rho (a,b)>0\) and \(\rho (a,b')>0\). For the purposes of defining factors, we are interested in couplings that do not split many elements.

Remark 8

Let us remark that if we assign an arbitrary total ordering to A and B, then the law of a quantile coupling of X and Y will split at most \(|B|-1\) elements of A.\(\Diamond \)

Remark 9

Keane and Smorodinsky [10, Theorem 11] proved that there is a coupling of X and Y with law \(\rho '\) that will split at most \(|B|-1\) elements of A and in addition, \(\rho '\) is absolutely continuous with respect to \(\rho \); that is, \(\rho (a,b) = 0\) implies \(\rho '(a,b)=0\). A version of their theorem was used in the proof of Theorem 2, but we will not need to appeal to this result in our proof of Theorem 1.\(\Diamond \)

Let X and Y be jointly distributed random variables taking values on totally ordered finite sets \((A, \le )\) and \((B, \le )\), respectively. Let \(X'\) have the same law as X. One way to generate another random variable \(Y'\) so that \((X', Y')\) has the same joint distribution as (XY) is to appeal to a quantile representation. Consider the set of conditional cumulative distribution functions given by \(Q_a(b):={\mathbb {P}}(Y \le b \ | \ X=a)\) for each \(a \in A\) and \(b \in B\). Let U be uniformly distributed in [0, 1] and independent of \(X'\). Set \(Y' :=Q_{X'}^{-1}(U)\). It is easy to verify that \((X', Y')\) has the same joint distribution as (XY); we call \((X',Y')\) the conditional quantile representation of (XY).
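A sketch of the conditional quantile representation for finite-valued random variables; the joint law below is a hypothetical example and the helper names are ours:

```python
from itertools import accumulate

def conditional_quantile(joint, a, u):
    """Q_a^{-1}(u): draw Y' from the conditional law of Y given X = a,
    using the uniform randomization u."""
    row = sorted((b, q) for (aa, b), q in joint.items() if aa == a and q > 0)
    total = sum(q for _, q in row)
    F = list(accumulate(q / total for _, q in row))
    # smallest b whose conditional CDF value reaches u (guard for rounding)
    return next(b for (b, _), f in zip(row, F) if f >= u or f == F[-1])

# a hypothetical joint law on {0,1} x {0,1}
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}
assert conditional_quantile(joint, 0, 0.5) == 0   # P(Y=0 | X=0) = 2/3 >= 0.5
assert conditional_quantile(joint, 1, 0.9) == 1   # P(Y=0 | X=1) = 3/7 < 0.9
```

Pairing this with an independent draw \(X'\) from the marginal law of X reproduces the joint law of (X, Y).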

The next coupling we discuss is due to del Junco [7, 8]. Let \((X_1, Y_1)\) and \((X_2, Y_2)\) be random variables taking values on the finite sets \(A_1 \times B_1\) and \(A_2 \times B_2\), respectively. Suppose that each of the sets \(A_1, A_2, B_1,\) and \(B_2\) is totally ordered. We will define \((X_1', Y_1')\) and \((X_2', Y_2')\) such that \((X_i', Y_i')\) has the same law as \((X_i, Y_i)\) for \(i=1,2\). Let \(U_1, U_2\), and U be independent random variables uniformly distributed in the unit interval [0, 1]. Let \(X_2'\) and \(Y_1'\) be independent random variables that have the same laws as \(X_2\) and \(Y_1\), respectively; more specifically, we may assume that they are given by their respective quantile representations with sources of randomization given by \(U_2\) and \(U_1\). Next, using the same source of randomization U, let \(Y_2'\) be such that \((X_2', Y_2')\) is the conditional quantile representation of \((X_2, Y_2)\), and let \(X_1'\) be such that \((Y_1', X_1')\) is the conditional quantile representation of \((Y_1, X_1)\). We refer to \(((X_1', Y_1'), (X'_2, Y'_2))\) as the star-coupling of \((X_1, Y_1)\) and \((X_2, Y_2)\).
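The construction can be sketched as a sampler: \(Y_1'\) and \(X_2'\) are independent quantile draws from their marginals, and a single shared uniform U drives both conditional quantile draws. Helper names are ours; the final assertion uses a 'diagonal' joint law, under which each coordinate pair must come out equal:

```python
import random
from itertools import accumulate

def marginal_quantile(items, u):
    """Quantile draw from a pmf given as (value, prob) pairs sorted by value."""
    F = list(accumulate(q for _, q in items))
    return next(v for (v, _), f in zip(items, F) if f >= u or f == F[-1])

def cond_quantile(joint, first, u):
    """Conditional quantile of the second coordinate given the first."""
    row = sorted((b, q) for (a, b), q in joint.items() if a == first and q > 0)
    total = sum(q for _, q in row)
    return marginal_quantile([(b, q / total) for b, q in row], u)

def star_coupling_sample(rho1, rho2):
    """One draw from the star-coupling of (X1, Y1) ~ rho1 and (X2, Y2) ~ rho2."""
    u1, u2, u = random.random(), random.random(), random.random()
    # Y1' and X2' are independent, with their own randomizations U1, U2
    y1_marg, x2_marg = {}, {}
    for (a, b), q in rho1.items():
        y1_marg[b] = y1_marg.get(b, 0.0) + q
    for (a, b), q in rho2.items():
        x2_marg[a] = x2_marg.get(a, 0.0) + q
    y1 = marginal_quantile(sorted(y1_marg.items()), u1)
    x2 = marginal_quantile(sorted(x2_marg.items()), u2)
    # the *same* randomization U drives both conditional quantile draws
    x1 = cond_quantile({(b, a): q for (a, b), q in rho1.items()}, y1, u)
    y2 = cond_quantile(rho2, x2, u)
    return (x1, y1), (x2, y2)

# With the diagonal law (1/2)(delta_(0,0) + delta_(1,1)), each coupled pair
# must agree coordinate-wise, since X determines Y and vice versa.
diag = {(0, 0): 0.5, (1, 1): 0.5}
(x1, y1), (x2, y2) = star_coupling_sample(diag, diag)
assert x1 == y1 and x2 == y2
```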

Remark 10

It is immediate from the definition of the star-coupling that \(X_2'\) is independent of \((Y_1', X_1')\) and \(Y_1'\) is independent of \((X_2', Y_2')\).\(\Diamond \)

Remark 11

It follows from Remark 8 that the star-coupling of the random variables \((X_1, Y_1)\) and \((X_2, Y_2)\) taking values on \((A_1, B_1)\) and \((A_2, B_2)\), respectively, has the property that for a fixed \(a_2 \in A_2\) and \(b_1 \in B_1\), the number of \(a_1 \in A_1\) such that there are distinct \(b_2, b_2' \in B_2\) with both \((a_1,b_1, a_2,b_2)\) and \((a_1,b_1, a_2,b_2')\) receiving positive mass under the law of the star-coupling \((X_1', Y_1', X_2', Y_2')\) is at most \(|B_2| -1\).\(\Diamond \)

Remark 12

del Junco refers to his coupling as the \(*\)-joining [7, 8].\(\Diamond \)

We may also iterate the star-coupling to more than two pairs of random variables. For example, if \((X_i, Y_i)\) are finite-valued random variables taking values in totally ordered spaces \((A_i, B_i)\) for \(i=1,2,3\), we define its iterated star-coupling in the following way. Take the star-coupling of \((X_1, Y_1)\) and \((X_2, Y_2)\) to be given by \(((X_1', Y_1'), (X_2', Y_2'))\). Assign lexicographic orderings to the sets \(A_1 \times A_2\) and \(B_1 \times B_2\), and take the star-coupling of \(((X_1', X_2'), (Y_1', Y_2'))\) and \((X_3, Y_3)\), to obtain random variables \(((X_1^{\prime \prime }, X_2^{\prime \prime }, X_3^{\prime \prime }), (Y_1^{\prime \prime }, Y_2^{\prime \prime }, Y_3^{\prime \prime }))\). Notice that by definition of the star-coupling, \((X_i^{\prime \prime }, Y_i^{\prime \prime })\) has the same law as \((X_i, Y_i)\) for \(i=1,2,3\).

We will make use of the following variation of the iterated star-coupling. Let \(\rho \) be a probability measure on the finite set \(A \times B\) which has projections \(\alpha \) and \(\beta \) on the sets A and B, respectively. For every \(k \in {\mathbb {Z}}^{+}\), let \(\alpha ^k\) and \(\beta ^k\) denote the k-fold product measures on \(A^k\) and \(B^k\), respectively. Let \(n_{\mathrm {initial}}\) and \({k_{\mathrm {group}}}\ge 2\) be integers. Let \(Z_i=(X_i, Y_i)\) be random variables with the following grouping property with law \(\rho \) and constants \(n_{\mathrm {initial}}\) and \({k_{\mathrm {group}}}\). For each \(i \ge 1\), the random variable \(Z_i\) takes values on \((A \times B)^{{k_{\mathrm {group}}}} \equiv A^{{k_{\mathrm {group}}}} \times B^{{k_{\mathrm {group}}}}\) and has a law that has projections \(\alpha ^{{k_{\mathrm {group}}}}\) and \(\beta ^{{k_{\mathrm {group}}}}\) on \(A^{{k_{\mathrm {group}}}}\) and \(B^{{k_{\mathrm {group}}}}\), respectively, and a projection \(\rho \) on each copy of \(A \times B\). Similarly, for \(i=0\), the random variable \(Z_0\) takes values on \((A \times B)^{n_{\mathrm {initial}}}\) and has a law that has projections \(\alpha ^{n_{\mathrm {initial}}}\) and \(\beta ^{n_{\mathrm {initial}}}\) on \(A^{n_{\mathrm {initial}}}\) and \(B^{n_{\mathrm {initial}}}\), respectively, and a projection \(\rho \) on each copy of \(A \times B\).

For \(i \ge 1\), write \(X_i = (X_i^1, \ldots , X_i^{{k_{\mathrm {group}}}})\), \(Y_i = (Y_i^1, \ldots , Y_i^{{k_{\mathrm {group}}}})\), \(Z_i^j = (X_i^j, Y_i^j)\), \(Y_i^{{{\mathrm {miss}}}} = (Y_i^1, \ldots , Y_i^{{k_{\mathrm {group}}}- 1})\), and \(Z_i^{{{\mathrm {miss}}}} = (X_i, Y_i^{{{\mathrm {miss}}}})\).

First, consider the following coupling of \(X_0, X_1, Y_0\), and \(Y_1\). The resulting coupling will not be a coupling of \(Z_0=(X_0,Y_0)\) and \(Z_1=(X_1, Y_1)\), but the resulting coupling as a measure on \((A \times B)^{n_{\mathrm {initial}}+ {k_{\mathrm {group}}}}\) will have \(\rho \) as a projection on each copy of \(A \times B\). Let \(W =(E, F)\) have law given by the product measure \(\rho ^{{k_{\mathrm {group}}}}\). Let \((\underline{Z_0}, \underline{Z_1}^{{{\mathrm {miss}}}})\) be a star-coupling of \(Z_0=(X_0, Y_0)\) and \(W^{{{\mathrm {miss}}}} = (E, F^{{{\mathrm {miss}}}})\). Using independent randomization, let \(Y^{{{\mathrm {rep}}}}_1\) be such that the pair \((\underline{X_1}^{{k_{\mathrm {group}}}}, Y_1^{{{\mathrm {rep}}}})\) is the conditional quantile representation of \(W^{{k_{\mathrm {group}}}}=(E^{{k_{\mathrm {group}}}}, F^{{k_{\mathrm {group}}}})\). It is easy to verify from the properties of the star-coupling and the independence of \((W^1, \ldots , W^{{k_{\mathrm {group}}}})\) that \(Y_1^{{{\mathrm {rep}}}}\) is independent of \(\underline{Y_1}^{{{\mathrm {miss}}}}\); moreover, \(\big (\underline{Z_0}, (\underline{X_1}, (\underline{Y_1}^{{{\mathrm {miss}}}}, Y_1^{{{\mathrm {rep}}}}) \big )\) is a coupling of \(Z_0\) and W such that \((\underline{X_0}, \underline{X_1})\) has law \(\alpha ^{n_{\mathrm {initial}}+ {k_{\mathrm {group}}}}\) and \((\underline{Y_0}, \underline{Y_1}^{{{\mathrm {miss}}}}, Y_1^{{{\mathrm {rep}}}})\) has law \(\beta ^{n_{\mathrm {initial}}+ {k_{\mathrm {group}}}}\). We will refer to this coupling as the star-coupling with replacement of \(Z_0\) and \(Z_1\).

Here, two ‘replacements’ take place: \(Z_1\) was replaced by \(W=(E, F)\), which has the product measure \(\rho ^{{k_{\mathrm {group}}}}\) as its law, and we applied the star-coupling only to \(Z_0\) and \(W^{{{\mathrm {miss}}}}\); in the final construction, the ‘missing’ value is replaced with a conditional quantile representation.

We iterate this construction as follows. First, let \((Z_0', Z_1')\) be the star-coupling with replacement of \(Z_0\) and \(Z_1\). Next, we take the star-coupling with replacement of \(\big ( (X_0', X_1'), (Y_0', Y_1') \big )\) and \(Z_2\) to obtain random variables \(\big ( (\underline{X_0'}, \underline{X_1'}, \underline{X_2}), (\underline{Y_0'}, \underline{Y_1'}, \underline{Y_2}^{{{\mathrm {miss}}}}, Y_2^{{{\mathrm {rep}}}})\big )\) taking values in \(A^{n_{\mathrm {initial}}+2{k_{\mathrm {group}}}} \times B^{n_{\mathrm {initial}}+2 {k_{\mathrm {group}}}}\) with a law that has projections \(\alpha ^{n_{\mathrm {initial}}+2{k_{\mathrm {group}}}}\) and \(\beta ^{n_{\mathrm {initial}}+2{k_{\mathrm {group}}}}\), respectively. Finally, it is clear that this construction can be extended an arbitrary number of times in the obvious way. We call this construction the iterated star-coupling with replacement of \(Z_0, Z_1, \ldots , Z_n\).

The importance of the star-coupling is summarized in Proposition 13 below, which is a version of del Junco’s [8, Proposition 4.7].

Proposition 13

(del Junco) Let \(\rho \) be a probability measure on the finite set \(A \times B\) with marginals \(\alpha \) and \(\beta \) on A and B, respectively. Assume that \(H(\alpha ) = H(\beta )\). Let \({k_{\mathrm {group}}}\ge 2\). For \(\eta >0,\) there exists \(n_{\mathrm {initial}}= n_{\mathrm {initial}}(\eta , {k_{\mathrm {group}}}) \in {\mathbb {Z}}^{+}\) such that the following holds.

Let \(n \in {\mathbb {Z}}^{+}\). Let \(Z_i=(X_i, Y_i),\) for \(i=0,1, \ldots , n,\) have the grouping property with the law \(\rho \) and constants \(n_{\mathrm {initial}}\) and \({k_{\mathrm {group}}}\). Define the following product spaces

$$\begin{aligned} {\mathbf {I}}_j:= & {} A^ {n_{\mathrm {initial}}} \times A^{{k_{\mathrm {group}}}j} \equiv A^{n_{\mathrm {initial}}+ {k_{\mathrm {group}}}j}, \\ {\mathbf {J}}_j:= & {} B^ {n_{\mathrm {initial}}} \times B^{{k_{\mathrm {group}}}j} \equiv B^{n_{\mathrm {initial}}+ {k_{\mathrm {group}}}j}, \end{aligned}$$

and

$$\begin{aligned} \bar{{\mathbf {J}}}_j := B^{({k_{\mathrm {group}}}-1)j}. \end{aligned}$$

For \({\mathbf {y}} = (y_0, (y_1, y_1'), \ldots , (y_j, y_j')) \in {{\mathbf {J}}_j} = B^{n_{\mathrm {initial}}} \times B^{{k_{\mathrm {group}}}j} = B^{n_{\mathrm {initial}}} \times (B^{{k_{\mathrm {group}}}-1} \times B) \times \cdots \times (B^{{k_{\mathrm {group}}}-1} \times B)\), let

$$\begin{aligned} \bar{{\mathbf {y}}} = (y_1, \ldots , y_j) \in \bar{{\mathbf {J}}}_j. \end{aligned}$$

Let \({\mathbf {W}}_n= ({\mathbf {X}}_n, \mathbf {Y}_n)\) be a random variable given by the iterated star-coupling with replacement of \({Z}_0, Z_1, \ldots , Z_n\). There exists a deterministic function \(\Psi :{\mathbf {I}}_n \rightarrow \bar{{\mathbf {J}}}_n\) such that \({\mathbb {P}}(\bar{\mathbf {Y}}_n = \Psi ({\mathbf {X}}_n))>1-\eta \).

The proof of Proposition 13 uses the Shannon–McMillan–Breiman theorem and Remark 11. A version of Proposition 13 is also used in the proof of Theorem 2 of Quas and Soo; see [18, Proposition 14].

6 The proof of Proposition 13

Proof of Proposition 13

We will place conditions on \(n_{\mathrm {initial}}\) later. Let \(h:=H(\alpha ) = H(\beta )\). Let \(\varepsilon >0\) such that

$$\begin{aligned} h - 2\varepsilon > \left( 1- \tfrac{1}{{k_{\mathrm {group}}}}\right) (h+ \varepsilon ). \end{aligned}$$
(3)
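Such an \(\varepsilon \) exists whenever \(h > 0\); indeed, rearranging (3) shows that it is equivalent to an explicit upper bound on \(\varepsilon \):

$$\begin{aligned} h - 2\varepsilon > \left( 1- \tfrac{1}{{k_{\mathrm {group}}}}\right) (h+ \varepsilon ) \iff \varepsilon \left( 3 - \tfrac{1}{{k_{\mathrm {group}}}}\right) < \tfrac{h}{{k_{\mathrm {group}}}} \iff \varepsilon < \frac{h}{3{k_{\mathrm {group}}}-1}. \end{aligned}$$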

Set

$$\begin{aligned} {\mathbf {L}}_j:= n_{\mathrm {initial}}+ {k_{\mathrm {group}}}j, \quad \text {for } 0 \le j \le n. \end{aligned}$$

Let \({\mathbf {x}} \in {{\mathbf {I}}_j}\) be given by \({\mathbf {x}}= (x_{0}, \ldots , x_{j})\). We say that \({\mathbf {x}}\) is \(\alpha \)-good if

$$\begin{aligned} \alpha ^{{\mathbf {L}}_j}({\mathbf {x}}) < e^{-(h - \varepsilon ) {\mathbf {L}}_j}, \end{aligned}$$
(4)

and is \(\alpha \)-completely good if for all \( 0 \le i \le j\), the truncation \((x_{0}, \ldots , x_{i}) \in {\mathbf {I}}_i\) is \(\alpha \)-good.

The corresponding definition for \(\beta \) is more complicated. We remark that in the presence of a strict entropy gap, \(H(\alpha ) > H(\beta )\), the definition could be simpler and symmetric (see for example [10, page 366] or [18, Proof of Proposition 14]). We declare that every \({\mathbf {y}} \in {\mathbf {J}}_0\) is \(\beta \)-good. Set

$$\begin{aligned} \bar{{\mathbf {L}}}_j:= ({k_{\mathrm {group}}}-1)j \quad \text {for } 0 \le j \le n, \end{aligned}$$

so that \({\mathbf {L}}_j = \bar{{\mathbf {L}}}_j + n_{\mathrm {initial}}+ j\). We say that \({\mathbf {y}}=(y_0, (y_1, y_1'), \ldots , (y_j, y_j')) \in {\mathbf {J}}_j\) is \(\beta \)-good if

$$\begin{aligned} \beta ^{\bar{{\mathbf {L}}}_j}(\bar{{\mathbf {y}}}) > e^{-(h -2\varepsilon ) {\mathbf {L}}_j}. \end{aligned}$$
(5)

Note that being \(\beta \)-good does not depend on the behavior of \((y_0, y_1', \ldots , y_j')\), and that \({\mathbf {L}}_j\), rather than \(\bar{{\mathbf {L}}}_j\), appears in the exponent on the right-hand side of (5). We say that \({\mathbf {y}}\) is \(\beta \)-completely good if for all \( 0 \le i \le j \), the truncation \({\mathbf {y}}_i=(y_0, (y_1, y_1'), \ldots , (y_i, y_i')) \in {\mathbf {J}}_i \) is \(\beta \)-good.

Note that if \({\mathbf {y}} \in {\mathbf {J}}_n\) is not completely good, then for some \(j \ge 1\), we have \(\beta ^{\bar{{\mathbf {L}}}_j}(\bar{{\mathbf {y}}}_j) <e^{-(h-2\varepsilon ){\mathbf {L}}_j},\) and by (3),

$$\begin{aligned} \beta ^{\bar{{\mathbf {L}}}_j}(\bar{{\mathbf {y}}}_j) < e^{-(h-2\varepsilon ){\mathbf {L}}_j} \le e^{-(1- 1/{k_{\mathrm {group}}})(h+ \varepsilon ) {\mathbf {L}}_j} \le e^{-(h+\varepsilon ) \bar{{\mathbf {L}}}_j}. \end{aligned}$$
(6)
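The middle and last inequalities in (6) deserve a word: the middle one follows from (3) since \({\mathbf {L}}_j > 0\), and the last one follows from the elementary bound

$$\begin{aligned} \left( 1- \tfrac{1}{{k_{\mathrm {group}}}}\right) {\mathbf {L}}_j = \left( 1- \tfrac{1}{{k_{\mathrm {group}}}}\right) n_{\mathrm {initial}}+ ({k_{\mathrm {group}}}-1)j \ge \bar{{\mathbf {L}}}_j. \end{aligned}$$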

For two elements \({\mathbf {y}}, \mathbf {z} \in {\mathbf {J}}_j\), we say that they are equivalent if \(\bar{{\mathbf {y}}} = \bar{\mathbf {z}}\). We let \([{\mathbf {y}}] \subset {\mathbf {J}}_j\) be the equivalence class of \({\mathbf {y}}\). Given a measure on \({\mathbf {I}}_j \times {\mathbf {J}}_j\), we say it finely splits an element \({\mathbf {x}} \in {\mathbf {I}}_j\) if there exist \({\mathbf {y}}, \mathbf {z} \in {\mathbf {J}}_j\) such that \([{\mathbf {y}}] \not = [\mathbf {z}]\) and the measure assigns positive mass to both \(({\mathbf {x}}, {\mathbf {y}})\) and \(({\mathbf {x}}, \mathbf {z})\).
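Fine splitting is elementary to check for a finitely supported measure. The following sketch, with hypothetical toy data (it is not part of the construction), tests whether a measure finely splits a given \({\mathbf {x}}\): two y's that differ only in \(y_0\) or the primed coordinates are equivalent and do not witness a fine split.

```python
def bar(y):
    """Project y = (y0, (y1, y1'), ..., (yj, yj')) onto (y1, ..., yj)."""
    return tuple(pair[0] for pair in y[1:])

def finely_splits(mu, x):
    """mu: dict mapping pairs (x, y) to probabilities.

    Returns True iff mu gives positive mass to (x, y) and (x, z)
    with inequivalent y, z, i.e. bar(y) != bar(z)."""
    classes = {bar(y) for (xx, y), mass in mu.items() if xx == x and mass > 0}
    return len(classes) >= 2

y1 = ('a', (0, 'u'))
y2 = ('b', (0, 'v'))   # same destined coordinate as y1, so equivalent to y1
y3 = ('a', (1, 'u'))   # different destined coordinate, so inequivalent to y1
mu = {('x', y1): 0.5, ('x', y2): 0.5}
print(finely_splits(mu, 'x'))                                # False
print(finely_splits({('x', y1): 0.5, ('x', y3): 0.5}, 'x'))  # True
```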

For \(j \ge 0\), let \({\mathbf {W}}_j=({\mathbf {X}}_j, \mathbf {Y}_j)\) be a random variable given by the iterated star-coupling with replacement of \(Z_{0}, Z_1, \ldots , Z_{j}\), where we set \({\mathbf {W}}_0:={Z}_0\); thus \(({\mathbf {X}}_j, \mathbf {Y}_j)\) takes values in \({{\mathbf {I}}_j} \times {\mathbf {J}}_j\). We say that \({\mathbf {x}} \in {{\mathbf {I}}_j}\) is desirable if the following properties are satisfied.

  (a) The element \({\mathbf {x}}\) is \(\alpha \)-completely good.

  (b) The element \({\mathbf {x}}\) is not finely split by (the law of) \({\mathbf {W}}_j=({\mathbf {X}}_j, \mathbf {Y}_j)\).

  (c) Furthermore, up to equivalence, there is a unique \(\beta \)-completely good \({\mathbf {y}} \in {{\mathbf {J}}_j}\) for which \(({\mathbf {x}}, {\mathbf {y}})\) receives positive mass under (the law of) \({\mathbf {W}}_j\).

For desirable \({\mathbf {x}} \in {\mathbf {I}}_j\), set \(\Psi _{j}({\mathbf {x}}) = \bar{{\mathbf {y}}}\), where \({\mathbf {y}}\) is determined by condition (c); otherwise, if \({\mathbf {x}}\) is not desirable, simply set \(\Psi _{j}({\mathbf {x}})= {\mathbf {y}}'\) for some predetermined fixed \({\mathbf {y}}'\in {\mathbf {J}}_j\). Note that

$$\begin{aligned} {\mathbb {P}}\big (\bar{\mathbf {Y}}_j = \Psi _j({\mathbf {X}}_j)\big ) \ge {\mathbb {P}}({\mathbf {X}}_j \text { is desirable}). \end{aligned}$$

Remark 11 and del Junco’s inductive argument [8, Lemma 4.6] will be used to show that for all \(j \ge 0\),

$$\begin{aligned} {\mathbb {P}}({\mathbf {X}}_j \text { is not desirable})\le & {} {\mathbb {P}}({\mathbf {X}}_j \text { is not c.g.}) + {\mathbb {P}}(\mathbf {Y}_j \text { is not c.g.}) \nonumber \\&+\,|B|^{{k_{\mathrm {group}}}}\sum _{i=0} ^ {j-1} e^{-\varepsilon {\mathbf {L}}_i}, \end{aligned}$$
(7)

where “c.g.” is short for completely good.

The case \(j=0\) holds since, for \(j=0\), being good is the same as being completely good, and under \({Z}_0\) no elements are finely split.

Assume (7) for the case \(j-1 \ge 0\). We show that (7) holds for the case j. Let E be the event that \({\mathbf {X}}_{j-1} \text { is desirable, but } {\mathbf {X}}_{j} \text { is not desirable}\). Clearly,

$$\begin{aligned} {\mathbb {P}}({\mathbf {X}}_j \text { is not desirable}) \le {\mathbb {P}}({\mathbf {X}}_{j-1} \text { is not desirable}) + {\mathbb {P}}(E). \end{aligned}$$
(8)

Note that on the event E, the random variables \({\mathbf {X}}_{j-1}\) and \(\mathbf {Y}_{j-1}\) are completely good. Observe that the event E is contained in the union of the following three events.

  (I) \(E_1:=\) The random variable \({\mathbf {X}}_j\) is not good, but \({\mathbf {X}}_{j-1}\) is completely good.

  (II) \(E_2:=\) The random variable \({\mathbf {X}}_j\) is completely good, but is finely split under the iterated star-coupling \({\mathbf {W}}_j\), even though \({\mathbf {X}}_{j-1}\) is desirable.

  (III) \(E_3:=\) The random variable \({\mathbf {Y}}_j\) is not good, but \(\mathbf {Y}_{j-1}\) is completely good.

Clearly,

$$\begin{aligned} {\mathbb {P}}(E_1) + {\mathbb {P}}({\mathbf {X}}_{j-1} \text { is not c.g.}) = {\mathbb {P}}({\mathbf {X}}_j \text { is not c.g.}). \end{aligned}$$
(9)

Similarly,

$$\begin{aligned} {\mathbb {P}}(E_3) + {\mathbb {P}}(\mathbf {Y}_{j-1} \text { is not c.g.}) = {\mathbb {P}}(\mathbf {Y}_j \text { is not c.g.}). \end{aligned}$$
(10)

Let us focus on the event \(E_2\). Let \({\mathbf {X}}_{j} = ( {\mathbf {X}}_{j-1},X)\), so that X takes values in \(A^{{k_{\mathrm {group}}}}\). We show that for any \(x \in A^{{k_{\mathrm {group}}}}\) and any completely good \({\mathbf {y}} \in {{\mathbf {J}}_{j-1}}\),

$$\begin{aligned} {\mathbb {P}}(E_2 \ | \ X=x, \bar{\mathbf {Y}}_{j-1} = \bar{{\mathbf {y}}}) \le |B|^{{k_{\mathrm {group}}}} e^{-\varepsilon {\mathbf {L}}_{j-1}}, \end{aligned}$$
(11)

so that \({\mathbb {P}}(E_2) \le |B|^{{k_{\mathrm {group}}}} e^{-\varepsilon {\mathbf {L}}_{j-1}}\); it then follows that (7) holds by (8)–(10) and the inductive hypothesis.

Note that if \({\mathbf {x}}\) and \({\mathbf {y}}\) are good, then

$$\begin{aligned} {\mathbb {P}}( {\mathbf {X}}_{j-1} = {\mathbf {x}} |X=x, \bar{\mathbf {Y}}_{j-1} = \bar{{\mathbf {y}}})= & {} \frac{{\mathbb {P}}\left( {\mathbf {X}}_{j-1} = {\mathbf {x}}, \bar{\mathbf {Y}}_{j-1} = \bar{{\mathbf {y}}}, X=x \right) }{{\mathbb {P}}\left( \bar{\mathbf {Y}}_{j-1} = \bar{{\mathbf {y}}}, X=x\right) } \nonumber \\= & {} \frac{{\mathbb {P}}\left( {\mathbf {X}}_{j-1} = {\mathbf {x}}, \bar{\mathbf {Y}}_{j-1} = \bar{{\mathbf {y}}}\right) }{{\mathbb {P}}\left( \bar{\mathbf {Y}}_{j-1} = \bar{{\mathbf {y}}}\right) } \end{aligned}$$
(12)
$$\begin{aligned}\le & {} \frac{{\mathbb {P}}\left( {\mathbf {X}}_{j-1} = {\mathbf {x}}\right) }{{\mathbb {P}}\left( \bar{\mathbf {Y}}_{j-1} = \bar{{\mathbf {y}}}\right) } \nonumber \\\le & {} e^{-\varepsilon {\mathbf {L}}_{j-1}}, \end{aligned}$$
(13)

where (12) follows from the independence properties of the star-coupling (with replacement) and (13) follows from (4) and (5). Also note that if \({\mathbf {x}}\) is desirable and \(({\mathbf {x}}, x)\) is finely split under \({\mathbf {W}}_j\), then for the unique, up to equivalence, \({\mathbf {y}}\) for which \(({\mathbf {x}}, {\mathbf {y}})\) receives positive mass under \({\mathbf {W}}_{j-1}\), there exist \((y,u), (y', u') \in B^{{k_{\mathrm {group}}}} = B^{{k_{\mathrm {group}}}-1} \times B\) such that \(y \not = y'\) and both \((({\mathbf {x}}, x), ({\mathbf {y}}, y, u))\) and \((({\mathbf {x}}, x), ({\mathbf {y}}, y', u'))\) receive positive mass under \({\mathbf {W}}_j\). By Remark 11 and the definition of the star-coupling with replacement, for fixed \(x \in A^{{k_{\mathrm {group}}}}\) and \({\mathbf {y}} \in {{\mathbf {J}}_{j-1}}\), the set of all \({\mathbf {x}}\) such that there exist distinct \(y,y'\in B^{{k_{\mathrm {group}}}-1}\) for which there are \(u, u' \in B\) such that both \((({\mathbf {x}}, x), ({\mathbf {y}}, y,u))\) and \((({\mathbf {x}}, x), ({\mathbf {y}}, y',u'))\) receive positive mass under \({\mathbf {W}}_j\) has at most \( |B|^{{k_{\mathrm {group}}}-1}-1\) elements; thus summing (13) over all such \({\mathbf {x}}\) yields (11).

The Shannon–McMillan–Breiman theorem implies that \(n_{\mathrm {initial}}\) can be chosen so that all three terms in (7) can be made smaller than \(\eta /3\). This is done in the following way. Set

$$\begin{aligned} S_{A}(k, K):= \left\{ {\mathbf {a} \in A^K: \alpha ^{\ell }(\mathbf {a}) < e^{-(h-\varepsilon )\ell } \text { for all } k \le \ell \le K} \right\} \end{aligned}$$

and

$$\begin{aligned} S_{B}(k, K):= \left\{ {\mathbf {b} \in B^K: \beta ^{\ell }(\mathbf {b}) > e^{-(h + \varepsilon )\ell } \text { for all } k \le \ell \le K} \right\} , \end{aligned}$$

where we have the slight abuse of notation that if \(\mathbf {a} = (a_1, \ldots , a_K)\), then \(\alpha ^{\ell }(\mathbf {a}) = \alpha ^{\ell }(a_1, \ldots , a_{\ell })\).
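The sets \(S_A\) and \(S_B\) can be computed by brute force for small alphabets; the following sketch does so for \(S_B\), with illustrative values of \(\beta \), \(\varepsilon \), k, and K that are not taken from the paper. In this toy example, the most likely string (all zeros) is typical, since \(-\log 0.7 < h + \varepsilon \), while the all-ones string is excluded, since \(-\log 0.3 > h + \varepsilon \).

```python
import math
from itertools import product

beta = {0: 0.7, 1: 0.3}
h = -sum(p * math.log(p) for p in beta.values())   # Shannon entropy (nats)
eps, k, K = 0.2, 4, 12

def in_S_B(b):
    # b is in S_B(k, K) iff beta^ell(b_1, ..., b_ell) > e^{-(h+eps) ell}
    # for every ell with k <= ell <= K.
    prob = 1.0
    for ell, symbol in enumerate(b, start=1):
        prob *= beta[symbol]
        if ell >= k and not prob > math.exp(-(h + eps) * ell):
            return False
    return True

S = {b for b in product((0, 1), repeat=K) if in_S_B(b)}
mass = sum(math.prod(beta[s] for s in b) for b in S)   # beta^K(S_B(k, K))
print((0,) * K in S, (1,) * K in S)   # True False
```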

First, by the Shannon–McMillan–Breiman theorem choose \(\kappa \) so that for all \(K \ge \kappa \), we have

$$\begin{aligned} \beta ^K(S_{B}(\kappa , K)) > 1-\eta /3. \end{aligned}$$
(14)

Next, using the Shannon–McMillan–Breiman theorem again, choose \(n_{\mathrm {initial}}\) sufficiently large so that the following three inequalities are satisfied:

$$\begin{aligned}&\alpha ^K(S_{A}(n_{\mathrm {initial}}, K)) > 1-\eta /3 \quad \text {for all } K \ge n_{\mathrm {initial}}, \end{aligned}$$
(15)
$$\begin{aligned}&\min \left\{ {\beta ^{\ell }({\mathbf {y}})>0: {\mathbf {y}} \in B^{\ell }, 0 \le \ell \le \kappa } \right\} > e^{-(h - 2\varepsilon )n_{\mathrm {initial}}}, \end{aligned}$$
(16)

and

$$\begin{aligned} |B|^{{k_{\mathrm {group}}}}\sum _{i=n_{\mathrm {initial}}}^{\infty } e^{-\varepsilon i} < \eta /3. \end{aligned}$$
(17)
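Since the tail in (17) is geometric, it has the closed form \(|B|^{{k_{\mathrm {group}}}} e^{-\varepsilon n_{\mathrm {initial}}}/(1-e^{-\varepsilon })\), and a sufficient \(n_{\mathrm {initial}}\) can be solved for explicitly. The following sketch does this for illustrative parameter values (not taken from the paper):

```python
import math

def n_initial_for_tail(size_B, k_group, eps, eta):
    """Smallest integer n with |B|^k * e^{-eps n} / (1 - e^{-eps}) < eta/3."""
    bound = math.log(3 * size_B**k_group / (eta * (1 - math.exp(-eps)))) / eps
    return math.floor(bound) + 1   # smallest integer strictly above the bound

def tail(n, size_B, k_group, eps):
    # |B|^k * sum_{i >= n} e^{-eps i}, summed in closed form
    return size_B**k_group * math.exp(-eps * n) / (1 - math.exp(-eps))

n0 = n_initial_for_tail(size_B=2, k_group=5, eps=0.1, eta=0.05)
print(tail(n0, 2, 5, 0.1) < 0.05 / 3)   # True
```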

Finally, we will verify that this choice of \(n_{\mathrm {initial}}\) is sufficient. Condition (15) gives that \({\mathbb {P}}({\mathbf {X}}_j \text { is not c.g.}) < \eta /3\). Recall that by definition, \({\mathbf {L}}_i\ge n_{\mathrm {initial}}\) for all \(i \ge 0\), so that (17) ensures that \(|B|^{{k_{\mathrm {group}}}}\sum _{i=0} ^ {j-1} e^{-\varepsilon {\mathbf {L}}_i} < \eta /3\). It remains to verify that \({\mathbb {P}}(\mathbf {Y}_j \text { is not c.g.}) < \eta /3\).

The definition (5) of completely good gives that if

$$\begin{aligned} {\mathbf {y}}=(y_0, (y_1, y_1'), \ldots , (y_j, y_j')) \in {\mathbf {J}}_n \end{aligned}$$

is not completely good, then for some \(i > 0\), we have

$$\begin{aligned} \beta ^{\bar{{\mathbf {L}}}_i}({\bar{{\mathbf {y}}}_i}) < e^{-(h-2\varepsilon ){\mathbf {L}}_i} < e^{-(h-2\varepsilon )n_{\mathrm {initial}}}, \end{aligned}$$
(18)

where \({\mathbf {y}}_i = (y_0, (y_1, y_1'), \ldots , (y_i, y_i'))\); inequalities (18) and (16) imply that

$$\begin{aligned} \bar{{\mathbf {L}}}_i > \kappa ; \end{aligned}$$
(19)

moreover, (6) gives that

$$\begin{aligned} \beta ^{\bar{{\mathbf {L}}}_i} (\bar{{\mathbf {y}}}_i) \le e^{-(h+\varepsilon ) \bar{{\mathbf {L}}}_i}. \end{aligned}$$
(20)

Hence if \({\mathbf {y}}\) is not completely good, then by (19) and (20), \(\bar{{\mathbf {y}}}\) belongs to the complement of \(S_B(\kappa , K)\) for all \(K \ge \bar{{\mathbf {L}}}_i\). Thus (14) gives that \({\mathbb {P}}(\mathbf {Y}_j \text { is not c.g.}) < \eta /3\). \(\square \)

In Proposition 13, we have that given \({\mathbf {x}} \in {\mathbf {I}}_n\), with high probability, up to equivalence, it determines a corresponding

$$\begin{aligned} {\mathbf {y}} = (y_0, (y_1, y_1'), \ldots , (y_n, y_n')) \in {\mathbf {J}}_n. \end{aligned}$$

It will be useful to refer to \((y_0, y_1', \ldots , y_n')\) as the undetermined coordinates, and \((y_1, \ldots , y_n)\) as the destined coordinates. We say that there are \(n_{\mathrm {initial}}+ n\) undetermined coordinates, since \(y_0 \in B^{n_{\mathrm {initial}}}\), and \(y_i \in B\) for \(1 \le i \le n\), and there are \(({k_{\mathrm {group}}}-1)n\) destined coordinates.
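The coordinate counting above is elementary but worth recording; the following sketch (with illustrative numbers) checks that the undetermined and destined coordinates account for all \(n_{\mathrm {initial}}+ {k_{\mathrm {group}}}n\) coordinates, with the destined fraction approaching \(({k_{\mathrm {group}}}-1)/{k_{\mathrm {group}}}\) as n grows:

```python
def coordinate_counts(n_initial, k_group, n):
    """Undetermined coordinates: y_0 (a block of n_initial coordinates)
    plus one coordinate per group; destined: k_group - 1 per group."""
    undetermined = n_initial + n
    destined = (k_group - 1) * n
    assert undetermined + destined == n_initial + k_group * n
    return undetermined, destined

print(coordinate_counts(10, 4, 100))   # (110, 300)
```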

7 Perturbing the joining

Let \(\xi \in J\) be a monotone joining of marker form. We will define a perturbation \(\xi '\) of \(\xi \) using the iterated star-coupling with replacement. The perturbation will depend on a few parameters. With the help of Proposition 13, we will be able to make a choice of these parameters so that \(\xi '\) will be an almost factor and close to \(\xi \) in the metric defined in (2).

7.1 Defining the perturbation

Let \({k_{\mathrm {mark}}}< {r_{\mathrm {mark}}}\) be large integers to be chosen later. A secondary marker is a maximal union of at least \({k_{\mathrm {mark}}}\) consecutive primary markers, so that secondary markers have no filler between them, and if the interval \([i,j]\) is a secondary marker for \(x \in \Omega \), then x restricted to \([i,j]\) has the form \(0101 \cdots 0101\).

Similarly, a tertiary marker is a maximal union of at least \({r_{\mathrm {mark}}}\) consecutive primary markers. We call the set of integers between, but not including, two secondary markers a block, and the set of integers between, but not including, two tertiary markers a city. Thus within a city there are blocks, which we consider ordered from left to right. Note that a block may contain primary markers.

Let \(p \in (\tfrac{1}{2}, 1)\). Let \(\xi \in J(p)\). Let \(Z=(X, Y)\) have law \(\xi \). Suppose that we are given that Z has primary markers given by \(t \in {\mathsf {T}}\). Let \(I \subset {\mathbb {Z}}\) be a block of length n. We are interested in the distribution of the random variable taking values in \( \left\{ {0,1} \right\} ^n \times \left\{ {0,1} \right\} ^n\) given by the distribution of Z, conditioned on t, restricted to I. The type of the block I is defined to be the vector containing the alternating sequence of integers that are the lengths of the filler and marker intervals in I; the length of the type is the sum of the integers in the type, which equals the length of the block. The distribution of this random variable is determined by the parameter p and the type of I. There are countably many types. Fix an enumeration \(({{\mathrm {type}}}_i)_{i \in {\mathbb {N}}}\) of the types and let \(\rho _i\) be the corresponding law. Associate to each type-i block a large integer \(n_{\mathrm {initial}}^i\), which will be chosen later; here i is an index, not an exponent. A census of a city is the sequence of nonnegative integers \((c_i)_{i \in {\mathbb {N}}}\), where each \(c_i\) is the number of type-i blocks in the city.

Remark 14

Note that for every \(i \in {\mathbb {N}}\), if the length of \({{\mathrm {type}}}_i\) is n, then \(\rho _i\) is a probability measure on \( \left\{ {0,1} \right\} ^n \times \left\{ {0,1} \right\} ^n\) with projections \(\alpha _i\) and \(\beta _i\) that have equal entropy. The equal entropy assertion also follows from the duality between p and \(1-p\).\(\Diamond \)

A modification of \(Z =(X,Y)\) on a subset of \({\mathbb {Z}}\) is a coupling of X and Y given by \(Z'=(X',Y')\) such that \(Z'\) is equal to Z off the subset and has the same primary markers as Z. We will define a modification \(Z'\) of Z so that the law of \(Z'\) will be a member of J. The modifications will be made independently on each city, so that we need only define what changes occur on a city. On each city the modifications will be made independently on each set of types, so that we need only define what changes occur on each set of types.

Suppose that the primary markers of Z are given by \(t \in {\mathsf {T}}\). Fix \(i \in {\mathbb {N}}\). Let us focus on the type-i blocks in a single fixed city. We will refer to the following modification as the star-modification of type-i on a city. Suppose that the census c is such that we may write

$$\begin{aligned} c_i = n_{\mathrm {initial}}^{i} + q_i{k_{\mathrm {group}}}+ r_i, \end{aligned}$$
(21)

where \(0 \le r_i < {k_{\mathrm {group}}}\) and \(q_i\) is a nonnegative integer. We will not make modifications on the last \(r_i\) blocks. Suppose that the length of the type-i blocks is n. It may be helpful to think of two different copies of \( \left\{ {0,1} \right\} ^n\) by setting \(A=B = \left\{ {0,1} \right\} ^n\). Let \(W=(W_j)_{j=1} ^{c_i}\) be the sequence of random variables taking values in \(A \times B\) obtained by taking the restriction of Z, conditioned to have primary markers given by t, to each block of type-i in the city. Although W is an identically distributed sequence, where each \(W_j\) has law \(\rho _i\), it may not be independent. However, the projections on A and B are independent; if we write \(W_j=(X_j, Y_j)\), then by Remark 5, we have that \(X=(X_j)\) and \(Y=(Y_j)\) are i.i.d. sequences. Consider the first \(n_{\mathrm {initial}}^i\) random variables together as a single random variable taking values in \((A\times B)^{n_{\mathrm {initial}}^i}\), and each subsequent \({k_{\mathrm {group}}}\) random variables together as random variables taking values in \((A \times B)^{{k_{\mathrm {group}}}}\). We obtain a sequence of random variables \(M=(M_0, M_1, \ldots , M_{q_i})\). Thus M takes values in

$$\begin{aligned} (A \times B)^{n_{\mathrm {initial}}^i+ q_i{k_{\mathrm {group}}}} \equiv A^{n_{\mathrm {initial}}^i + q_i{k_{\mathrm {group}}}} \times B^{n_{\mathrm {initial}}^i + q_i{k_{\mathrm {group}}}}. \end{aligned}$$
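The bookkeeping of the decomposition (21) can be sketched as follows (the names and numbers are illustrative, not from the paper): the first \(n_{\mathrm {initial}}^i\) blocks form \(M_0\), each subsequent \({k_{\mathrm {group}}}\) blocks form \(M_1, M_2, \ldots \), and the final \(r_i\) blocks are untouched.

```python
def group_blocks(blocks, n_initial, k_group):
    """Split the c_i type-i blocks of a city into M = (M_0, ..., M_q)
    and the trailing r untouched blocks, per c = n_initial + q*k + r."""
    c = len(blocks)
    if c < n_initial + k_group:
        return None, blocks          # too few blocks: modify nothing
    q, r = divmod(c - n_initial, k_group)
    M = [blocks[:n_initial]]
    for j in range(q):
        start = n_initial + j * k_group
        M.append(blocks[start:start + k_group])
    return M, blocks[c - r:]

blocks = list(range(17))             # c_i = 17 blocks of some type i
M, untouched = group_blocks(blocks, n_initial=4, k_group=5)
print(len(M) - 1, len(untouched))    # q_i = 2, r_i = 3
```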

Take the iterated star-coupling with replacement of these random variables to obtain new random variables \(M'=(M_0', \ldots , M_{q_i}')\); furthermore, using independent randomization, we may stipulate that these random variables are independent of Z. We define a modification \(Z'\) of Z by replacing the values of M with those of \(M'\): we have \(Z=Z'\) off the type-i blocks in the city, and where Z restricted to the type-i blocks is given by M, the restriction of \(Z'\) to the type-i blocks is given by \(M'\). The iterated star-coupling with replacement gives that the law of each \(M_j'\) projected onto each of the \(n_{\mathrm {initial}}^i+q_i{k_{\mathrm {group}}}\) copies of \(A \times B\) is \(\rho _i\), so that monotonicity is preserved and the primary markers remain unchanged. Also, the projections of M and \(M'\) on \(A^{c_i - r_i}\) have the same law. Similarly, the projections of M and \(M'\) on \(B^{c_i - r_i}\) have the same law, so that by Remark 5, the random variable \(Z'\) gives the required coupling.

Note we have only defined the star modification of type-i when \(c_i \ge n_{\mathrm {initial}}^i + {k_{\mathrm {group}}}\). In the case that \(c_i\) is not sufficiently large, we simply do nothing, that is, we stipulate that the star modification of type-i leaves everything unchanged.

For a single type-i, if we apply the star-modification of the type on each city, independently, then we obtain a modification of Z with law belonging to J. We call this the star-modification of type-i of Z. We summarize our construction in the following proposition.

Proposition 15

Let \(p \in (\tfrac{1}{2}, 1)\) and \(\xi \in J(p)\). If Z has law \(\xi \) and \(Z'\) is a star-modification of Z of a particular type, then the law of \(Z'\) is also a member of J(p).

Given a finite set of types, the star-modification of Z on the set of types is obtained by applying the star-modification in succession, starting with the smallest type.

Remark 16

In our construction of the star-modification of type-i on a city, we relied on the fact that the law of \(M_j'\) still has a projection of \(\rho _i\) on each copy of \(A \times B\) to ensure that primary markers and monotonicity are preserved. This fact will also be important for us later in proving that the parameters of the star-modification \(Z'\) of Z can be chosen so that it is a small perturbation in the \({{\mathrm{d}}}\)-metric: on the event that the origin is contained in a block, and the coordinates of a cylinder set C lie in that block, the probabilities of C under Z and \(Z'\) are not only close, they are equal! This is another one of the nice features of del Junco’s star-coupling.\(\Diamond \)

7.2 Choosing the parameters

From the discussion in Sect. 3.3, it remains to show that given a joining \(\xi \in J(p)\), we can choose parameters so that the star-modification of a random variable Z with law \(\xi \) on a finite set of types results in a random variable with law \(\xi '\) that is close to \(\xi \) in the metric defined by (2) and is also an almost factor.

Proof of Theorem 1

Let \(p \in (\tfrac{1}{2},1)\). Let \(\varepsilon >0\). As discussed in Sect. 3.3, it suffices to show that \(U_{\varepsilon }\), the set of \(\varepsilon \)-almost factors from \(B(1-p, p)\) to \(B(p, 1-p)\), is dense. The proof that \(V_{\varepsilon }\), the set of \(\varepsilon \)-almost factors from \(B(p, 1-p)\) to \(B(1-p, p)\), is dense is similar, with the roles of p and \(1-p\) reversed.

Let \(\xi \in J(p)\) and let Z have law \(\xi \). We will choose the parameters for the star-modification \(Z'\) of Z as follows. The modification will occur on a finite set of types \(\mathcal {T}\), which will be specified later. Recall that in the star-modification, some blocks are left unchanged, so that Z equals \(Z'\) on those blocks, while other blocks are modified via the iterated star-coupling with replacement, so that Z may not equal \(Z'\) on those blocks. Note that \(Z'\) and Z always share the same primary markers; although markers may lie in the modified coordinates, they are always preserved. If \(\xi '\) is the law of \(Z'\), then the parameters will be chosen so that \({{\mathrm{d}}}(\xi , \xi ') < \varepsilon \) and \(\xi ' \in U_{\varepsilon }\). We choose the parameters as follows.

  (i) Set \(\varepsilon ' := \varepsilon /100\).

  (ii) Let \(\delta >0\) be small enough and \(\ell ^*\) be large enough so that two measures \(\zeta \) and \(\zeta '\) on \( \left\{ {0,1} \right\} ^{{\mathbb {Z}}} \times \left\{ {0,1} \right\} ^{{\mathbb {Z}}}\) are \(\varepsilon '\) close in the metric \({{\mathrm{d^{*}}}}\) if for all cylinder sets \(C \in \mathcal {C}_{\ell ^*}\), we have \(|\zeta (C) - \zeta '(C)| < \delta .\)

  (iii) Choose \({k_{\mathrm {mark}}}\) sufficiently large so that with probability at least \(1 - \varepsilon '\), the origin is in a block and the interval \([-2\ell ^*, 2\ell ^*]\) is in the block.

  (iv) With this choice of \({k_{\mathrm {mark}}}\), there exists \(L >0\) such that with probability at least \(1- 2\varepsilon '\), the origin will lie in a block and the length of the block will be between \(\ell ^*\) and L.

  (v) In particular, there exists a finite set of types \(\mathcal {T}\), those with lengths between \(\ell ^*\) and L, such that with probability at least \(1- 2\varepsilon '\), the block containing the origin will have a type in \(\mathcal {T}\). Since we have a fixed enumeration of the types, we will view \(\mathcal {T}\) as a subset of \({\mathbb {N}}\).

  (vi) Set \({k_{\mathrm {group}}}= \lceil 1/\varepsilon ' \rceil +1.\)

  (vii) For each \(i \in {\mathbb {N}}\), choose \(n_{\mathrm {initial}}^i\) via Proposition 13, by substituting \(\rho = \rho _i\), \(n_{\mathrm {initial}}=n_{\mathrm {initial}}^i\), and \(\eta = \varepsilon '\).

  (viii) Let c be the census of the city containing the origin. If \(c_i\) is sufficiently large, define \(q_i\) as in (21). Choose \({r_{\mathrm {mark}}}\) sufficiently large so that with probability at least \(1- \varepsilon '\), the origin is in a city, and for all \(i \in \mathcal {T}\), the census will satisfy \(n_{\mathrm {initial}}^i / q_i < \varepsilon '\).

Applying Proposition 15 a finite number of times gives that \(\xi ' \in J\). By Remark 16, conditions (i), (ii), and (iii) imply that \({{\mathrm{d}}}(\xi ', \xi ) < \varepsilon \).

It remains to verify that \(\xi ' \in U_{\varepsilon }\). Call \(t \in {\mathsf {T}}\) a model marker if the block containing the origin is a modified block and the origin lies in a destined coordinate. Property (vii) and Proposition 13 imply that for all model markers \(t \in {\mathsf {T}}\), there exists a deterministic measurable \(\psi : \Omega \rightarrow \left\{ {0,1} \right\} \) such that

$$\begin{aligned} \xi _t' \left\{ {(x,y): (x, y_0) = (x, \psi (x) )} \right\} > 1- \varepsilon '. \end{aligned}$$
(22)

For a particular type-i, with \(c_i = n_{\mathrm {initial}}^i + q_i {k_{\mathrm {group}}}+ r_i\) as in (21), the ratio of undetermined coordinates, plus those that are unchanged, to destined coordinates is \((n_{\mathrm {initial}}^i + q_i + r_i) \big / \big (q_i ({k_{\mathrm {group}}}-1)\big )\). Recall that \(r_i < {k_{\mathrm {group}}}\). Conditions (iv), (v), (vi), and (viii) ensure that the set of model markers has probability at least \(1- 7\varepsilon '\); this fact together with (22) and (i) implies that \(\xi ' \in U_{\varepsilon }\). \(\square \)
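As a rough quantitative check, and under the additional simplifying assumption that \(q_i \ge {k_{\mathrm {group}}}\) (a hypothetical normalization used only for this illustration), conditions (vi) and (viii) make the ratio above small:

$$\begin{aligned} \frac{n_{\mathrm {initial}}^i + q_i + r_i}{q_i({k_{\mathrm {group}}}-1)} < \frac{\varepsilon ' q_i + q_i + {k_{\mathrm {group}}}}{q_i({k_{\mathrm {group}}}-1)} \le \frac{\varepsilon '}{{k_{\mathrm {group}}}-1} + \varepsilon ' + \varepsilon ' \le 3\varepsilon ', \end{aligned}$$

since \({k_{\mathrm {group}}}-1 \ge 1/\varepsilon '\).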

8 Some other examples

One of the key observations of Keane and Smorodinsky [11, Lemmas 2 and 3] that allowed the definition of markers in their proof that two Bernoulli shifts \(B({\mathbf {p}})\) and \(B({\mathbf {q}})\) of equal entropy are isomorphic was that one may assume without loss of generality that \({p}_0 = {q}_0\) in the case where \({\mathbf {p}}\) and \({\mathbf {q}}\) give non-zero mass to three or more symbols, and in the case where \({\mathbf {p}}\) gives non-zero mass to only two symbols, one may assume that \({p}_0^k{p}_1 = {q}_0^k{q}_1\) for some k. In general, in the construction of monotone factors, we may not make this reduction, since monotonicity may not be preserved. However, by a straightforward adaptation of the proof of Theorem 1, the following monotone versions of the Keane and Smorodinsky reductions are enough to prove the existence of a monotone isomorphism.

Theorem 17

Let \(N \ge 2\). Let \({\mathbf {p}}\) and \({\mathbf {q}}\) be probability measures on [N] of equal entropy. Suppose \({\mathbf {p}}\) stochastically dominates \({\mathbf {q}},\) and furthermore there exist \(i \ge j\) such that \({p}_i = {q}_j\) and \({\mathbf {p}}^*\) stochastically dominates \({\mathbf {q}}^*,\) where \({\mathbf {p}}^*\) is the law of a random variable with law \({\mathbf {p}}\) conditioned not to take the value i, and \({\mathbf {q}}^{*}\) is the law of a random variable with law \({\mathbf {q}}\) conditioned not to take the value j. Then there exists a monotone isomorphism of \(B({\mathbf {p}})\) and \(B({\mathbf {q}})\).

Theorem 18

Let \(N \ge 2\). Let \({\mathbf {p}}\) and \({\mathbf {q}}\) be probability measures on [N] of equal entropy. Suppose \({\mathbf {p}}\) stochastically dominates \({\mathbf {q}},\) and furthermore there exist \(i \ge j\) and \(k \ge \ell \) such that \({p}_i {p}_k = {q}_j {q}_{\ell }\) and for all \(n \ge 1,\) we have that \({\mathbf {p}}^{n*}\) stochastically dominates \({\mathbf {q}}^{n*},\) where \({\mathbf {p}}^{n*}\) is the law of a random vector with law \({\mathbf {p}}^n\) conditioned so that an occurrence of an i is never immediately followed by an occurrence of a k, and \({\mathbf {q}}^{n*}\) is the law of a random vector with law \({\mathbf {q}}^n\) conditioned so that an occurrence of a j is never immediately followed by an occurrence of an \(\ell \). Then there exists a monotone isomorphism of \(B({\mathbf {p}})\) and \(B({\mathbf {q}})\).