1 Introduction

Given data, one finds regularity and predicts the future. This is what all living things do and what we want machines to do. Is there a universal way of doing this? If so, what properties should the prediction have?

Solomonoff’s algorithmic probability, or universal induction, answers these questions. For simplicity, consider the case of infinite binary sequences, that is, the Cantor space \(2^\omega \). We try to predict the next bit from a given finite binary string. It is known that there is an optimal c.e. semi-measure M on the Cantor space. Here, a function is c.e. if it can be computably approximated from below, and optimality means that it dominates every c.e. semi-measure up to a multiplicative constant; see the definitions in Sect. 2.1. By the usual correspondence between measures and martingales (or semi-measures and supermartingales), this roughly means that the prediction by M behaves at least as well as the one induced by any c.e. semi-measure. The prediction induced from an optimal c.e. semi-measure is called algorithmic probability or universal induction.

Algorithmic probability has some desirable properties. One of them is convergence to the true induced measure (Theorem 2.3 below). This roughly means that algorithmic probability can find any computable regularity that is not known in advance.

In this paper, we propose a framework for showing that all sufficiently general predictions should have certain properties. That a program has such a property does not mean that the program is general enough; however, that a program lacks the property means that it can be modified into a more general one. By evaluating the computational complexity that a function with a given property must have, we can also discuss how difficult it is to add the property, although we do not pursue this direction much in this paper.

We focus on the speed of the convergence. One of our results (Theorem 4.4) says that, for all sufficiently general computable predictions, the speed of the convergence to the true measure cannot be bounded by a computable function. Thus, the incomputability of the rate of convergence is not due to the incomputability of algorithmic probability. Rather, it is due to the existence of computable measures that are “close” to each other.

The structure of the paper is as follows. In Sect. 2 we review some notions and results on algorithmic randomness and algorithmic probability. In Sect. 3 we prove the convergence result for computable predictions along computably random sequences. In Sect. 4 we consider the case of Dirac measures and show the incomputability of the rate of the convergence.

2 Preliminaries

2.1 Algorithmic Randomness

We follow the standard notation of computability theory (see e.g. [10]) and of the theory of algorithmic randomness (see e.g. [1, 8]).

The Cantor space \(2^\omega \) is the class of all infinite binary sequences equipped with the topology generated by the cylinders \([\sigma ]=\{X\in 2^\omega : \sigma \prec X\}\) as basic open sets, where \(\prec \) denotes the prefix relation. A function \(f:\omega \rightarrow \omega \) is computable if it is computable by a Turing machine. Computability on \(\mathbb {Q}\) or \(2^{<\omega }\) is induced by their natural representations in \(\omega \). A real \(x\in \mathbb {R}\) is called computable if there exists a computable sequence \((a_n)_{n\in \omega }\) of rationals such that \(|x-a_n|\le 2^{-n}\) for all n. A real \(x\in \mathbb {R}\) is called left-c.e. if there exists an increasing computable sequence \((a_n)_{n\in \omega }\) of rationals converging to x.

A function \(f:2^{<\omega }\rightarrow \mathbb {R}\) is called computable or c.e. if \(f(\sigma )\) is computable or left-c.e. uniformly in \(\sigma \in 2^{<\omega }\), respectively. A measure \(\mu \) on \(2^\omega \) is computable if the function \(\sigma \mapsto \mu ([\sigma ])=:\mu (\sigma )\) is computable. A semi-measure is a function \(\mu :2^{<\omega }\rightarrow [0,1]\) such that \(\mu (\lambda )\le 1\) and \(\mu (\sigma )\ge \mu (\sigma 0)+\mu (\sigma 1)\) for every \(\sigma \in 2^{<\omega }\) where \(\lambda \) denotes the empty string. A measure \(\mu \) with \(\mu (\lambda )=\mu (2^\omega )=1\) is called a probability measure. Notice that each semi-measure \(\mu \) satisfying \(\mu (\lambda )=1\) and \(\mu (\sigma )=\mu (\sigma 0)+\mu (\sigma 1)\) for every \(\sigma \in 2^{<\omega }\) can be seen as a probability measure.
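To make these definitions concrete, here is a minimal Python sketch (ours, not from the paper; the names are illustrative) that checks the semi-measure inequalities on all strings up to a fixed depth.

    from fractions import Fraction

    def is_semimeasure(mu, depth):
        # Check mu(empty) <= 1 and mu(s) >= mu(s0) + mu(s1)
        # for every string s of length < depth.
        if mu('') > 1:
            return False
        level = ['']
        for _ in range(depth):
            for s in level:
                if mu(s) < mu(s + '0') + mu(s + '1'):
                    return False
            level = [s + b for s in level for b in '01']
        return True

    # The uniform measure: a probability measure, hence a semi-measure
    # with equality at every node.
    uniform = lambda s: Fraction(1, 2 ** len(s))
    assert is_semimeasure(uniform, 6)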

Let \(\mu ,\nu \) be measures or semi-measures on \(2^\omega \). We say \(\mu \) multiplicatively dominates (or m-dominates) \(\nu \) if there exists \(C\in \omega \) such that \(\nu (\sigma )\le C\mu (\sigma )\) for all \(\sigma \in 2^{<\omega }\). A c.e. semi-measure \(\mu \) is called optimal if it m-dominates every c.e. semi-measure. An optimal c.e. semi-measure exists.

Fix a computable probability measure \(\mu \). Martin-Löf randomness (or ML-randomness) is an important concept for discussing the randomness of individual sequences. ML-randomness is usually defined by tests, but we give an equivalent characterization to compare it with the definition of computable randomness below. Let \(X_{\le n}=X_1 X_2\cdots X_n\) be the initial segment of X with length n.

Theorem 2.1

([6]). A sequence \(X\in 2^\omega \) is ML-random w.r.t. \(\mu \) (or \(\mu \)-ML-random) if and only if there exists a constant \(C\in \omega \) such that \(\xi (X_{\le n})\le C \mu (X_{\le n})\) for all n, where \(\xi \) is an optimal c.e. semi-measure.

By the optimality, this is equivalent to the following statement: For every c.e. semi-measure \(\xi \), there exists a constant \(C\in \omega \) such that \(\xi (X_{\le n})\le C \mu (X_{\le n})\) for all n.

The central notion in this paper is computable randomness. Computable randomness for the uniform measure is defined by martingales. However, Rute [9] has suggested the following definition for a general computable probability measure.

Definition 2.2

A sequence \(X\in 2^\omega \) is computably random w.r.t. \(\mu \) (or \(\mu \)-computably random) if \(\mu (X_{\le n})>0\) for all n and \(\limsup _n \xi (X_{\le n})/\mu (X_{\le n})<\infty \) for all computable measures \(\xi \).

It is not difficult to see that this is equivalent to the following statement: For every computable measure \(\xi \), there exists a constant \(C\in \omega \) such that \(\xi (X_{\le n})\le C \mu (X_{\le n})\) for all n. ML-randomness implies computable randomness, but the converse does not hold in general.

2.2 Algorithmic Probability

We review some results from algorithmic probability. For details, see e.g. [2].

Let \(\mu \) be an optimal c.e. semi-measure. Fix a sequence \(X\in 2^\omega \). We are interested in the ratio

$$\begin{aligned} \mu (k|X_{<n})=\frac{\mu (X_{<n} k)}{\mu (X_{<n})}, \end{aligned}$$

where \(X_{<n}=X_1\cdots X_{n-1}\) and \(k\in \{0,1\}\). Notice that \(X_{<1}\) denotes the empty string. The function \(k\mapsto \mu (k|X_{<n})\) is a measure on \(\{0,1\}\), but the measure of the whole space \(\{0,1\}\) need not be 1. The ratio can be understood as the conditional probability of the n-th bit given the initial \((n-1)\) bits of X, and is called algorithmic probability.
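For illustration, the conditional values can be computed directly. In the following Python sketch (ours; measures are represented as functions on strings), the toy semi-measure loses half of its mass at every step, so the two conditional values sum to 1/2 rather than 1.

    from fractions import Fraction

    def predict(m, prefix):
        # m(k | prefix) = m(prefix + k) / m(prefix) for k in {0, 1}.
        return {k: m(prefix + k) / m(prefix) for k in '01'}

    # A strict semi-measure: semi(s0) + semi(s1) = semi(s) / 2.
    semi = lambda s: Fraction(1, 4 ** len(s))
    print(predict(semi, '0'))   # {'0': Fraction(1, 4), '1': Fraction(1, 4)}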

One of the desirable properties of algorithmic probability is the following convergence to the true induced measure.

Theorem 2.3

Let \(\mu \) be a computable probability measure on \(2^\omega \) and \(\xi \) be an optimal c.e. semi-measure. Suppose \(X\in 2^\omega \) is sampled from \(\mu \). Then,

$$\begin{aligned} \xi (k|X_{<n})-\mu (k|X_{<n})\rightarrow 0\; {\textit{as}} \; n\rightarrow \infty \end{aligned}$$
(1)

for both \(k\in \{0,1\}\) and

$$\begin{aligned} \frac{\xi (X_{n}|X_{<n})}{\mu (X_{n}|X_{<n})}\rightarrow 1\; {\textit{as}} \;n\rightarrow \infty \end{aligned}$$
(2)

with \(\mu \)-probability 1.

The convergence (1) is called convergence in difference by Solomonoff [11]. The convergence (2) is called convergence in ratio in Li-Vitányi’s book [7, Theorem 5.2.2, p. 433]. Note the difference between on-sequence and off-sequence convergence. The speed of the convergence is one of our interests; it has been discussed briefly in [3] but has not been established.

2.3 Distance Measures Between Probability Measures

The following notions are important in the proof of the convergence. For probability measures \(\mu ,\xi \) on \(\{0,1\}\), we define the squared Hellinger distance \(H^2(\mu ,\xi )\) by

$$\begin{aligned} H^2(\mu ,\xi )=\frac{1}{2}\sum _{k\in \{0,1\}}(\sqrt{\mu (k)}-\sqrt{\xi (k)})^2=1-\sum _{k\in \{0,1\}}\sqrt{\mu (k)\xi (k)}. \end{aligned}$$

From the equalities above, \(0\le H^2(\mu ,\xi )\le 1\). We also use the Kullback-Leibler divergence (or KL-divergence) of \(\mu \) with respect to \(\xi \) defined by

$$\begin{aligned} D(\mu || \xi )=\sum _{k\in \{0,1\}}\mu (k)\ln \frac{\mu (k)}{\xi (k)} \end{aligned}$$

where \(\ln \) is the natural logarithm and \(0\cdot \ln \frac{0}{z}=0\) for \(z\ge 0\) and \(y\ln \frac{y}{0}=\infty \) for \(y>0\). The two notions are related by the following inequality:

$$\begin{aligned} 2H^2(\mu ,\xi )\le D(\mu ||\xi ). \end{aligned}$$
(3)

One can check this by direct calculation or see Hutter [2, Lemma 3.11h].
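Inequality (3) can also be sanity-checked numerically. The following Python sketch (ours) tests it on random pairs of distributions on \(\{0,1\}\).

    import math, random

    def hellinger_sq(p, q):
        # H^2(p, q) = 1 - sum_k sqrt(p_k * q_k)
        return 1 - sum(math.sqrt(a * b) for a, b in zip(p, q))

    def kl(p, q):
        # D(p || q), with the convention 0 * ln(0/z) = 0.
        return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

    for _ in range(100000):
        x = random.uniform(1e-9, 1 - 1e-9)
        y = random.uniform(1e-9, 1 - 1e-9)
        p, q = (x, 1 - x), (y, 1 - y)
        assert 2 * hellinger_sq(p, q) <= kl(p, q) + 1e-12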

3 Convergence Along Computable Random Sequences

3.1 Convergence Results

Algorithmic probability is computably approximable, or \(\varDelta ^0_2\), but not computable, while all the functions we can actually implement are computable. The correct or ideal prediction may have some properties that algorithmic probability has; however, implementing a program with one of these properties may be impossible. Thus, this says nothing about whether our implementable predictions should have the properties, and that a program lacks one of these properties does not mean that the program is not general enough.

The goal of this paper is to give a framework for studying the properties of computable measures or predictions. Algorithmic probability uses an optimal c.e. semi-measure, while no computable measure m-dominates all computable measures. We abandon the pursuit of the unique correct prediction. Instead, we ask the following:

Which properties do all sufficiently general predictions have?

Notice that this statement is about a prediction that can be implemented in reality.

As a definition of the “generality” above, we use m-domination, inspired by the definition of optimality. More concretely, we construct a computable measure \(\nu \) such that a property P holds for all computable measures m-dominating \(\nu \). This means that P holds for all sufficiently general predictions. There are many quantifiers here, and we will see that their order is important.

Suppose that a property P is witnessed by a computable measure \(\nu _P\), that is, every computable measure \(\xi \) m-dominating \(\nu _P\) has the property P. Similarly, suppose that a property Q is witnessed by a computable measure \(\nu _Q\). Then, the property \(P\wedge Q\) is witnessed by the computable measure \((\nu _P+\nu _Q)/2\). This composition of properties extends to computable countable sums.

Some property P may be witnessed by a measure \(\mu \) executable in feasible time. If some good prediction induced from \(\nu \) does not have the property P, then the prediction by \(\epsilon \mu +(1-\epsilon )\nu \) for a positive rational \(\epsilon \) is more general than \(\nu \) in the sense above, and the computational cost may still be reasonable.
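A minimal sketch of this mixing step (our code; measures are represented as functions on strings): since the mixture m-dominates both components, a prediction based on it inherits every property witnessed by either component in the sense above.

    def mix(mu, nu, eps):
        # The measure eps*mu + (1-eps)*nu. Since
        #   mu(s) <= (1/eps) * mix(s)  and  nu(s) <= (1/(1-eps)) * mix(s),
        # the mixture m-dominates both components.
        return lambda s: eps * mu(s) + (1 - eps) * nu(s)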

Now, we give computable versions of Theorem 2.3 as follows.

Theorem 3.1

Let \(\mu \) be a computable probability measure on \(2^\omega \). For all computable probability measures \(\xi \) m-dominating \(\mu \) and all \(\mu \)-computably random sequences \(X\in 2^\omega \), we have

$$\begin{aligned} \sum _{n=1}^\infty D(\mu (\cdot |X_{<n})\ || \xi (\cdot |X_{<n}))<\infty . \end{aligned}$$

In particular,

$$\begin{aligned} \xi (k|X_{<n})-\mu (k|X_{<n})\rightarrow 0\; {\textit{as}} \;n\rightarrow \infty \end{aligned}$$

for both \(k\in \{0,1\}\).

Theorem 3.2

Let \(\mu \) be a computable probability measure. For all computable probability measures \(\xi \) m-dominating \(\mu \) and all \(\mu \)-computably random sequences \(X\in 2^\omega \), we have

$$\begin{aligned} \frac{\xi (X_n|X_{<n})}{\mu (X_n|X_{<n})}\rightarrow 1\; {\textit{as}} \; n\rightarrow \infty . \end{aligned}$$

Notice that both Theorems 3.1 and 3.2 claim the existence of a computable measure \(\nu (=\mu )\) such that all computable measures \(\xi \) m-dominating \(\nu \) have some property, that is, all sufficiently general computable measures have the property.

Hutter and Muchnik [3] have shown that “\(\mu \)-probability 1” in Theorem 2.3 cannot be replaced by “for all \(\mu \)-ML-random sequences X.” The above theorems say that, for computable predictions, we only need \(\mu \)-computable randomness.

3.2 Martingale Characterization and Convergence

We use a martingale characterization of computable randomness and a convergence theorem of martingales. The following characterization is due to [9].

Definition 3.3

Let \(\mu \) be a computable probability measure. A martingale M with respect to \(\mu \) is a partial function \(M:\subseteq 2^{<\omega }\rightarrow \mathbb {R}^+\) such that the following two conditions hold:

  (i) (Impossibility condition) If \(M(\sigma )\) is undefined, then \(\mu (\sigma )=0\).

  (ii) (Fairness condition) For all \(\sigma \in 2^{<\omega }\), we have

    $$\begin{aligned} M(\sigma 0)\mu (\sigma 0)+M(\sigma 1)\mu (\sigma 1)=M(\sigma )\mu (\sigma ) \end{aligned}$$

    where \({\textit{undefined}}\cdot 0=0\) and \(\mathbb {R}^+\) is the set of all non-negative reals.

We say M is an almost-everywhere computable martingale (or a.e. computable martingale) if M is a partial computable function. We say M succeeds on \(X\in 2^\omega \) if \(\limsup _{n\rightarrow \infty }M(X_{\le n})=\infty \).
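To make Definition 3.3 concrete, here is a Python sketch (ours; the toy measures are illustrative) of the ratio martingale \(M(\sigma )=\xi (\sigma )/\mu (\sigma )\) used in the proof of Theorem 3.2 below, together with a check of the fairness condition at a few nodes.

    from fractions import Fraction

    def ratio_martingale(xi, mu):
        # M(s) = xi(s) / mu(s), a mu-martingale wherever mu(s) > 0.
        return lambda s: xi(s) / mu(s)

    def fair_at(M, mu, s):
        # Fairness condition: M(s0)mu(s0) + M(s1)mu(s1) == M(s)mu(s).
        return sum(M(s + b) * mu(s + b) for b in '01') == M(s) * mu(s)

    uniform = lambda s: Fraction(1, 2 ** len(s))
    # A biased computable measure: each bit is 1 with probability 2/3.
    biased = lambda s: Fraction(2 ** s.count('1'), 3 ** len(s))
    M = ratio_martingale(biased, uniform)
    assert all(fair_at(M, uniform, s) for s in ['', '0', '1', '01', '110'])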

Proposition 3.4

Let \(\mu \) be a computable probability measure on \(2^\omega \). Then, \(X\in 2^\omega \) is \(\mu \)-computably random if and only if \(\mu (X_{\le n})>0\) for all n and there is no a.e. computable martingale M which succeeds on X.

Computable randomness with respect to the uniform measure can be characterized by the existence of the limit \(\lim _n M(X_{\le n})\) for all computable martingales M (see, e.g. [1, Theorem 7.1.3]), using Doob’s upcrossing argument. The same method applies to any computable measure.

Proposition 3.5

Let \(\mu \) be a computable probability measure on \(2^\omega \). Then, \(X\in 2^\omega \) is \(\mu \)-computably random if and only if \(\lim _{n\rightarrow \infty }M(X_{\le n})\) exists for all a.e. computable martingales M.

3.3 Proof of Theorem 3.1

Proof

Since \(\xi \) m-dominates \(\mu \), there exists a constant \(C\in \omega \) such that

$$\begin{aligned} \mu (\sigma )\le C\xi (\sigma ) \end{aligned}$$
(4)

for all \(\sigma \in 2^{<\omega }\). Let \(D(\sigma )\) be the KL-divergence of \(\mu \) w.r.t. \(\xi \) at \(\sigma \), that is,

$$\begin{aligned} D(\sigma )=D(\mu (\cdot |\sigma )\ ||\ \xi (\cdot |\sigma )). \end{aligned}$$

We define a function \(M:\subseteq 2^{<\omega }\rightarrow \mathbb {R}^+\) by

$$\begin{aligned} M(\sigma )=\ln C-\ln \frac{\mu (\sigma )}{\xi (\sigma )}+\sum _{t=1}^{|\sigma |}D(\sigma _{<t}) \end{aligned}$$

for every \(\sigma \in 2^{<\omega }\).

We claim that M is an a.e. computable martingale w.r.t. \(\mu \). The function M is non-negative by (4) and the non-negativity of D. For the impossibility condition of Definition 3.3, notice that if \(\mu (\sigma )>0\), then \(\xi (\sigma )>0\) because \(\xi \) m-dominates \(\mu \); thus \(M(\sigma )\) is defined. The a.e. computability of M then follows from the computability of \(\mu \), \(\xi \), and D. For the fairness condition,

$$\begin{aligned} \sum _{k\in \{0,1\}}\mu (\sigma k)M(\sigma k)-\mu (\sigma )M(\sigma ) =-\sum _{k\in \{0,1\}}\mu (\sigma k)\ln \frac{\mu (k|\sigma )}{\xi (k|\sigma )}+\mu (\sigma )D(\sigma )=0. \end{aligned}$$

Since X is \(\mu \)-computably random, we have \(\limsup _n M(X_{\le n})<\infty \). Since both \(\ln C-\ln \frac{\mu (\sigma )}{\xi (\sigma )}\) and \(D(\sigma )\) are always non-negative, \(\sum _{n=1}^\infty D(X_{<n})\) also converges. Finally, the last claim of the theorem follows by (3). \(\square \)
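As a sanity check of the fairness computation above, the following Python sketch (ours; the toy measures and the constant C are assumptions chosen so that \(\mu \le C\xi \)) verifies numerically that M is a non-negative martingale on short strings.

    import math

    def mu(s):
        # The uniform measure.
        return 0.5 ** len(s)

    def xi(s):
        # A mixture m-dominating mu with constant C = 2: half uniform,
        # half the measure "each bit is 1 with probability 2/3".
        return 0.5 * mu(s) + 0.5 * 2 ** s.count('1') / 3 ** len(s)

    C = 2.0

    def D(s):
        # KL-divergence of mu(.|s) with respect to xi(.|s).
        return sum(mu(s + k) / mu(s)
                   * math.log((mu(s + k) / mu(s)) / (xi(s + k) / xi(s)))
                   for k in '01')

    def M(s):
        # The martingale from the proof: ln C - ln(mu/xi) plus accumulated KL.
        return math.log(C) - math.log(mu(s) / xi(s)) + sum(D(s[:t]) for t in range(len(s)))

    for s in ['', '0', '1', '00', '01', '110']:
        assert M(s) >= 0
        lhs = mu(s + '0') * M(s + '0') + mu(s + '1') * M(s + '1')
        assert abs(lhs - mu(s) * M(s)) < 1e-9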

3.4 Proof of Theorem 3.2

Proof

Suppose that \(\xi \) is a computable measure m-dominating the measure \(\mu \). We define a function \(M:2^{<\omega }\rightarrow \mathbb {R}^+\) by

$$\begin{aligned} M(\sigma )=\frac{\xi (\sigma )}{\mu (\sigma )}. \end{aligned}$$

Then, M is an a.e. computable martingale w.r.t. \(\mu \). Hence, \(\lim _n M(X_{\le n})=\alpha \) exists for all \(\mu \)-computably random sequences \(X\in 2^\omega \).

Since \(\xi \) m-dominates \(\mu \), there exists \(C\in \omega \) such that \(\mu (\sigma )\le C\xi (\sigma )\) for every \(\sigma \in 2^{<\omega }\). Then,

$$\begin{aligned} M(X_{\le n})=\frac{\xi (X_{\le n})}{\mu (X_{\le n})}\ge \frac{1}{C} \end{aligned}$$

for every n. Thus, \(\alpha \ge \frac{1}{C}\).

Fix \(\epsilon >0\). Then, there exists \(N\in \omega \) such that

$$\begin{aligned} \left| \frac{\xi (X_{\le n})}{\mu (X_{\le n})}-\alpha \right| =|M(X_{\le n})-\alpha |\le \frac{\epsilon }{3C} \end{aligned}$$

for all \(n\ge N\). Thus,

$$\begin{aligned} \frac{\xi (X_n|X_{<n})}{\mu (X_n|X_{<n})} =\frac{\xi (X_{\le n})}{\mu (X_{\le n})}\cdot \frac{\mu (X_{<n})}{\xi (X_{<n})} \le \frac{\alpha +\epsilon /(3C)}{\alpha -\epsilon /(3C)} =1+\epsilon \cdot \frac{2}{3\alpha C-\epsilon }<1+\epsilon \end{aligned}$$

for all \(n\ge N+1\) if \(\epsilon \) is sufficiently small. Similarly, \(\frac{\xi (X_n|X_{<n})}{\mu (X_n|X_{<n})}>1-\epsilon \) for all \(n\ge N+1\). Since \(\epsilon \) is arbitrary, the claim follows. \(\square \)

4 Non-computability of the Convergence

From now on, we only consider the case where \(\mu \) is the Dirac measure on a point \(A\in 2^\omega \). If \(\mu \) is computable, then A must be computable. Theorem 3.1 in this case can be written as follows.

Corollary 4.1

Let \(A\in 2^\omega \) be a computable sequence. There exists a computable measure \(\nu \) such that

$$\begin{aligned} \sum _n(1-\xi (A_n|A_{<n}))<\infty \end{aligned}$$
(5)

for all computable measures \(\xi \) m-dominating \(\nu \). In particular, \(\xi (A_n|A_{<n})\rightarrow 1\) as \(n\rightarrow \infty \).

Proof

Let \(\mu \) be the Dirac measure on the point \(A\in 2^\omega \). Then, A is \(\mu \)-computably random. By Theorem 3.1, we have \(\sum _{n=1}^\infty \ln \frac{1}{\xi (A_n|A_{<n})}<\infty \). Finally, notice that, since \(x\le -\ln (1-x)\le 2x\) for \(x\in [0,1/2]\),

$$\begin{aligned} \sum _n\ln (1-(1-\xi (A_n|A_{<n})))>-\infty \iff \sum _n(1-\xi (A_n|A_{<n}))<\infty . \end{aligned}$$

\(\square \)

All sufficiently general computable measures can detect the pattern of a fixed computable sequence A, while no computable measure can detect the patterns of all computable sequences. One needs to pay attention to the order of quantifiers.

Remark 4.2

For each computable measure \(\xi \), there exists a computable sequence A such that \(\xi (A_n|A_{<n})\) does not converge to 1.

This claim is essentially the same as a famous fact in algorithmic learning theory: the class of all computable sequences is not BC-learnable. See e.g. [12] for a survey of algorithmic learning theory. A stronger result in the context of universal induction is in [4, Lemma 5.2.4]. To keep the paper self-contained, we give a short proof here in our terminology.

Proof

Let \((\epsilon _n)_n\) be a computable sequence of positive rationals such that \(\epsilon _n<\frac{1}{2}\) for all n and \(\prod _n (1+\epsilon _n)<\infty \). For each \(\sigma \), at least one \(i\in \{0,1\}\) satisfies \(\xi (\sigma i)<\frac{1+\epsilon _{|\sigma |}}{2}\xi (\sigma )\). This is a c.e. relation, and one can compute such an i from \(\sigma \) uniformly. By iterating this, one can compute a sequence A such that \(\xi (A_{\le n})<\frac{1+\epsilon _n}{2}\xi (A_{<n})\) for all n. Since \(\epsilon _n\rightarrow 0\), we have

$$\begin{aligned} \limsup _n \xi (A_n|A_{<n})\le \limsup _n \frac{1+\epsilon _n}{2}\le \frac{1}{2}. \end{aligned}$$

\(\square \)
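The construction in this proof is effective. The following Python sketch (ours; for simplicity it uses a constant \(\epsilon \) instead of the sequence \((\epsilon _n)_n\)) computes the first bits of such an A against a given computable probability measure.

    from fractions import Fraction

    def diagonalize(xi, n, eps=Fraction(1, 4)):
        # Build A bit by bit so that xi(A_{<=t}) < (1+eps)/2 * xi(A_{<t}).
        # For a probability measure with xi(A_{<t}) > 0, at least one
        # successor carries at most half of the mass, so some bit works.
        A = ''
        bound = (1 + eps) / 2
        for _ in range(n):
            A += '0' if xi(A + '0') < bound * xi(A) else '1'
        return A

    uniform = lambda s: Fraction(1, 2 ** len(s))
    print(diagonalize(uniform, 8))   # '00000000': each bit has mass 1/2 < 5/8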

The following theorem says that, for all sufficiently general predictions, \(\xi (A_n|A_{<n})\) is not too close to 1; they all have almost the same convergence speed (5) up to a multiplicative constant. The convergence speed of \(\xi (\overline{A_n}|A_{<n})\) to 0 is the slowest one among the sequences whose sums converge. Here, \(\overline{k}=1-k\) for \(k\in \{0,1\}\).

Theorem 4.3

Let \(A\in 2^\omega \) be a computable sequence and \((a_n)_n\) be a computable sequence of positive rationals such that \(\sum _n a_n<\infty \). Then, there exists a computable measure \(\nu \) with the following property: For each computable measure \(\xi \) m-dominating \(\nu \), there exists a natural number \(C\in \omega \) such that

$$\begin{aligned} \xi (\overline{A_n}|A_{<n})\ge \frac{a_n}{C} \end{aligned}$$

for all n.

Notice that \((a_n)_n\) need not be monotone.

Proof

Without loss of generality, we can assume that \(s=\sum _n a_n<1\). Define a measure \(\nu \) by

$$\begin{aligned} \nu =\sum _n a_n\mathbf {1}_{A_{<n}\overline{A_n}0^\omega }+(1-s)\mathbf {1}_{A} \end{aligned}$$

where \(\mathbf {1}_X\) is the point-mass measure on \(X\in 2^\omega \).

We claim that this measure \(\nu \) is computable. It suffices to show that \(\nu (\sigma )\) is computable uniformly in \(\sigma \in 2^{<\omega }\). If \(\sigma \prec A\), then

$$\begin{aligned} \nu (\sigma )=\sum _{n>|\sigma |}a_n+1-s=1-\sum _{n\le |\sigma |}a_n, \end{aligned}$$

since the point \(A_{<n}\overline{A_n}0^\omega \) extends \(\sigma \) exactly when \(n>|\sigma |\).

If \(\sigma =A_{<k}\overline{A_k}0^i\) for some \(k,i\in \omega \), then

$$\begin{aligned} \nu (\sigma )=a_k. \end{aligned}$$

If \(\sigma =A_{<k}\overline{A_k}0^i1\tau \) for some \(k,i\in \omega \) and \(\tau \in 2^{<\omega }\), then

$$\begin{aligned} \nu (\sigma )=0. \end{aligned}$$

In any case, \(\nu (\sigma )\) is computable from \(\sigma \) uniformly because A is computable.

Suppose that a computable measure \(\xi \) m-dominates \(\nu \). Then, there exists \(C\in \omega \) such that \(\nu (\sigma )\le C\xi (\sigma )\) for all \(\sigma \in 2^{<\omega }\). Then,

$$\begin{aligned} \xi (\overline{A_n}|A_{<n})=1-\frac{\xi (A_{\le n})}{\xi (A_{<n})} =\frac{\xi (A_{<n}\overline{A_n})}{\xi (A_{<n})} \ge \frac{\nu (A_{<n}\overline{A_n})}{C} =\frac{a_n}{C}. \end{aligned}$$

\(\square \)
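The measure \(\nu \) of this proof is easy to implement. The following Python sketch (ours, with 1-based bit positions as in the paper) evaluates \(\nu \) on cylinders for a toy choice of A and \((a_n)_n\).

    from fractions import Fraction

    def make_nu(A, a):
        # nu = sum_{n>=1} a(n) * 1_{A_{<n} (1-A_n) 0^omega} + (1-s) * 1_A,
        # where A(n) is the n-th bit of A and s = sum_n a(n) < 1.
        def nu(sigma):
            bits = [int(c) for c in sigma]
            # First position (1-based) where sigma deviates from A, if any.
            k = next((i + 1 for i, b in enumerate(bits) if b != A(i + 1)), None)
            if k is None:
                # sigma is a prefix of A: mass 1 minus the points lost so far.
                return 1 - sum(a(n) for n in range(1, len(bits) + 1))
            # sigma = A_{<k} (1-A_k) tau: only the k-th point can remain,
            # and only if tau is a run of zeros.
            return a(k) if all(b == 0 for b in bits[k:]) else Fraction(0)
        return nu

    # Toy instance: A = 000... and a(n) = 2^{-(n+1)}, so s = 1/2.
    nu = make_nu(lambda n: 0, lambda n: Fraction(1, 2 ** (n + 1)))
    assert nu('') == 1 and nu('0') + nu('1') == nu('')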

The rate of convergence of \(\xi (\overline{A_n}|A_{<n})\) to 0 need not be monotone. In fact, it cannot be bounded by a decreasing computable function converging to 0.

Theorem 4.4

Let \(A\in 2^\omega \) be a computable sequence. Then, there exists a computable measure \(\nu \) such that no decreasing computable sequence \((b_n)_n\) converging to 0 m-dominates \(\xi (\overline{A_n}|A_{<n})\) for any computable measure \(\xi \) m-dominating \(\nu \).

Proof

Let \(U:\subseteq 2^{<\omega }\rightarrow 2^{<\omega }\) be a universal prefix-free machine. By the usual convention, \(U(\sigma )[s]\uparrow \) for each \(s<|\sigma |\). For all n, let

$$\begin{aligned} a_n=\sum \{2^{-|\sigma |} : U(\sigma )\downarrow \text { at stage }n\}. \end{aligned}$$

Note that \((a_n)_n\) is a computable sequence because any such \(\sigma \) must satisfy \(|\sigma |\le n\) by the convention. Furthermore,

$$\begin{aligned} \sum _n a_n=\sum _{\sigma \in \mathrm {dom}(U)}2^{-|\sigma |}<1. \end{aligned}$$

Then, by Theorem 4.3, there exists a computable measure \(\nu \) such that \(\xi (\overline{A_n}|A_{<n})\) m-dominates \((a_n)_n\) for every computable measure \(\xi \) m-dominating \(\nu \).

Suppose that there exist such a measure \(\xi \) and a decreasing computable sequence \((b_n)_n\) such that \(b_n\rightarrow 0\) as \(n\rightarrow \infty \) and \((b_n)_n\) m-dominates \(\xi (\overline{A_n}|A_{<n})\). Then, \((b_n)_n\) also m-dominates \((a_n)_n\); let \(C\in \omega \) be such that \(a_n\le 2^C b_n\) for all n.

For each \(\sigma \), search the least \(n\in \omega \) such that \(b_n<2^{-|\sigma |-C}\). If \(U(\sigma )[n]\uparrow \) and \(U(\sigma )[s]\downarrow \) for some \(s>n\), then \(a_s\ge 2^{-|\sigma |}\) by the definition of \((a_n)_n\), and

$$\begin{aligned} a_s\le 2^C b_s \le 2^C b_n<2^{-|\sigma |}, \end{aligned}$$

which is a contradiction. Thus, \(U(\sigma )\downarrow \) if and only if \(U(\sigma )[n]\downarrow \). Since n is computable from \(\sigma \) uniformly, this means that the halting problem is computable, which is a contradiction. Hence, such \((b_n)_n\) does not exist. \(\square \)

It may be interesting to compare the above result with Laplace’s answer to the sunrise problem. The answer \(\frac{n}{n+1}\) converges to 1 slightly more slowly than (5) allows, since \(\sum _n\frac{1}{n+1}\) diverges. Theorem 4.4 means that all sufficiently general predictions violate Nicod’s criterion, just as the usual Solomonoff induction does [5].