Introduction

The gap between finite and infinite probabilities has been highlighted by probability-zero events ever since the Kolmogorov axiomatics was proposed; see [1].

Standard probability theory is a mathematically rigorous, consistent theory with an immense range of applications demonstrating its usefulness in modelling many situations in the physical world [2]. However, some infinite probabilistic scenarios cannot be described satisfactorily in this theory. One example is De Finetti's fair lottery, in which draws are taken from the infinite set \(\{1,2,3, \ldots \}\) with equal non-zero probability.

In this paper, we use Grossone [3,4,5,6], in a formalism inspired by Lolli [7], to construct and study a class of infinitesimal probabilities. We prove that these probabilities satisfy the properties of regularity, totality, perfect additivity and weak Laplaceanity discussed in Benci et al. [1] and provide a natural solution to De Finetti's infinite fair lottery; moreover, Williamson's objections [8] against infinitesimal probabilities are mathematically refuted.

In what follows \({\mathbb {N}}\) is the set of positive integers, \(\mathbb {R}\) is the set of reals, the power set of X is denoted by \({\mathscr {P}}(X)\) and the number of elements of a finite set is indicated by #.

In Sects. 2 and 3 we present random events, classical probabilities, Kolmogorov axioms and natural probability desiderata. Section 4 is a brief informal presentation of Grossone, which is followed by Lolli's formalism in Sect. 5. De Finetti's infinite lottery is discussed in Sect. 6. The main Sect. 7 is devoted to Grossone-like uniform probability spaces. The last two Sects. 8 and 9 are dedicated to non-uniform Grossone probabilities and conclusions.

Random Events and Classical Probabilities

Informally, a random experiment \({\mathscr {E}}\) (process/variable) is an experiment that is not, and cannot be made, exact; consequently, its outcomes cannot be predicted. Carrying out a random experiment is called a trial. An outcome obtained as a result of a trial is called an elementary event. The set of all possible outcomes of a random experiment is called the sample space and is denoted by \(\varOmega \).

The subsets of \(\varOmega \) are called events and the family of events is denoted by \({\mathscr {P}}\left( \varOmega \right) .\) For example, rolling two dice is a random experiment \({\mathscr {E}}\); the sample space is the set \(\varOmega =\left\{ \left( i,j\right) \mid 1\le i,j\le 6\right\} .\) This experiment has \(6^{2}\) elementary events \(\left\{ \left( i,j\right) \right\} \). The pair \(\left( 1,6\right) \) is one possible outcome of a trial. The event “the sum of the two dice is equal to 3” is \(A=\left\{ \left( 1,2\right) ,\left( 2,1\right) \right\} .\) Events can be combined with the usual set-theoretic operations of union, intersection and complement. The empty set, denoted by \(\emptyset \), is the impossible event. In summary, with a random experiment \({\mathscr {E}}\) we associate the mathematical object \(\left( \varOmega ,{\mathscr {P}}\left( \varOmega \right) \right) . \)

Let us consider a random experiment \({\mathscr {E}}\) and an event \(A\in {\mathscr {P}}\left( \varOmega \right) .\) If we perform n trials of \({\mathscr {E}}\), then the ratio between the number of occurrences of A and n is called the “frequency of A in n trials” and is denoted by \(f_{n}\left( A\right) .\) It is an empirical fact that, as n increases, \(f_{n}\left( A\right) \) tends to “stabilise” to a number called “the probability” of A, denoted by \(P\left( A\right) \). If instead of an arbitrary A we consider the elementary events, performing n trials produces a frequency distribution over the elementary events; this distribution approaches a probability distribution as the number of trials increases. Since \(f_{n}\left( A\cup B\right) =f_{n}\left( A\right) +f_{n}\left( B\right) \) whenever \(A\cap B=\emptyset ,\) it is natural to assume that \(P\left( A\cup B\right) =P\left( A\right) +P\left( B\right) \) if \(A\cap B=\emptyset \) and \(P\left( \varOmega \right) =1\).
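To illustrate the stabilisation of frequencies, here is a minimal simulation sketch in Python (our illustration, not part of the theory), estimating \(P(A)\) for the event A of the dice example above:

```python
import random

random.seed(1)  # make the trials reproducible
n = 200_000     # number of trials of the experiment E

# Count the occurrences of the event A: "the sum of the two dice equals 3"
occurrences = sum(
    1 for _ in range(n)
    if random.randint(1, 6) + random.randint(1, 6) == 3
)
print(occurrences / n)  # f_n(A), close to P(A) = 2/36 ≈ 0.0556
```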

In the example of rolling two dice, we have \(6^{2}\) elementary events. If the dice are “fair”, the probability of any elementary event is \(P\left( \left( i,j\right) \right) =1/36.\)

When \(\varOmega \) is a finite set and \(A\in {\mathscr {P}}\left( \varOmega \right) ,\) then the probability defined by \( P\left( A\right) =\frac{\#\left( A\right) }{\#\left( \varOmega \right) } \) is called the classical probability. For every event \(A\in {\mathscr {P}}\left( \varOmega \right) , P\left( A\right) =0\) if and only if \(A=\emptyset \). See more in Nelson [9].
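For the dice example, the classical probability can be computed by direct counting; a small Python sketch:

```python
from fractions import Fraction

# Sample space for rolling two dice
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
# Event A: "the sum of the two dice is equal to 3"
A = [(i, j) for (i, j) in omega if i + j == 3]
# Classical probability P(A) = #(A) / #(Omega)
print(Fraction(len(A), len(omega)))  # 1/18
```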

Kolmogorov Axioms and Probability Desiderata

The Kolmogorov axioms are the standard framework for probability theory. The sample space, that is, the set of (atomic) outcomes, is denoted by \(\varOmega \). A measurable space is a pair \(\left( \varOmega ,{\mathscr {B}}\left( \varOmega \right) \right) \), where the event space \({\mathscr {B}}\left( \varOmega \right) \subseteq {\mathscr {P}}\left( \varOmega \right) \) is a Borel field of subsets of \(\varOmega \). A probability space is a triple \(\left( \varOmega ,{\mathscr {B}}(\varOmega ), \Pr \right) \), where \(\left( \varOmega ,{\mathscr {B}}(\varOmega ) \right) \) is a measurable space and \(\Pr :{\mathscr {B}}(\varOmega ) \longrightarrow \left[ 0,1\right] \) is a probability measure, that is, \(\Pr \) satisfies the following two conditions: (a) the probability of a countable union of mutually exclusive sets in \({\mathscr {B}}( \varOmega )\) is equal to the countable sum of the probabilities of these sets, and (b) \(\Pr ( \varOmega ) =1.\) If condition (a) is satisfied only for finitely many mutually exclusive sets, the probability space is called finitely additive.

Kolmogorov axioms for infinite sample spaces do not respect any of the following—arguably natural—desiderata [1]:

  1. Regularity The probability of every possible event, that is, any non-empty subset of \(\varOmega \), should be strictly larger than that of the impossible event,Footnote 1

  2. Totality Every element of the event space should be assigned a probability value,

  3. Perfect additivity The probability of an arbitrary union of mutually disjoint events should be equal to the sum of the probabilities of the separate events, where “sum” has to be appropriately defined,

  4. Weak Laplaceanity The probability theory should allow for a uniform probability distribution on the sample space.Footnote 2

De Finetti's fair lottery, in which draws are taken from the infinite set \(\{1,2,3, \ldots \}\) with equal probability, is a classical example for which the Kolmogorov axioms fail the above desiderata [1, 10].

The above desiderata are achievable in rigorous mathematical terms even for infinite sample spaces; one way to achieve them is to use hyperreal-valued probabilities, see for example NAP [1, 11].

In what follows, we present a simpler and more natural construction of infinitesimal probabilities, based on Grossone and inspired by classical probability, which satisfies all of the above desiderata.

An Intuitive Glance at Grossone

The set of positive integers (called simply numbers in what follows)

$$\begin{aligned} {\mathbb {N}} = \{1,2,3,\ldots \} \end{aligned}$$
(1)

can be represented by different numeral systems. The most common ones use a positional numeral system with base 10 or 2. In some sense, these systems “reveal” the same information about the set (1). The system proposed by Sergeyev [3,4,5,6] gives more information about (1), using a formalism supporting Aristotle's Principle: “the whole is greater than its parts”.Footnote 3 To this aim, a new infinite number \(\textcircled {1}\) (called Grossone) is introduced: informally, \(\textcircled {1}\) is the number of elements of the set of numbers (1).

The infinite number \(\textcircled {1}\) can be used to calculate the number of elements of various infinite sets of numbers. For example, the number of elements of the set \({\mathbb {N}}{\setminus} \{10\}\) is \(\textcircled {1} -1\); the sets of even and odd numbers have the same number of elements, namely \(\frac{\textcircled {1}}{2}\); the set \(\{1\} \cup \{2n \mid n\in {\mathbb {N}}\}\) has \(\textcircled {1} + 1\) elements. The elements \(\textcircled {1}, \textcircled {1}+1, \textcircled {1} -1, \frac{\textcircled {1}}{2}\) and many others like

$$\begin{aligned} \ldots ,\ \frac{\textcircled {1}}{2}-2,\ \frac{\textcircled {1}}{2}-1,\ \frac{\textcircled {1}}{2},\ \frac{\textcircled {1}}{2}+1,\ \frac{\textcircled {1}}{2}+2,\ \ldots ,\ \textcircled {1}-2,\ \textcircled {1}-1,\ \textcircled {1} \end{aligned}$$

can be thought of as infinite numbers.

Accordingly, we can “magnify” \(\mathbb {N}\) to the set

$$\begin{aligned} \mathbb {N}^{\dagger }= \left\{ 1,2, \ldots ,\frac{\textcircled {1}}{2}-2, \frac{\textcircled {1}}{2}-1, \frac{\textcircled {1}}{2}, \frac{\textcircled {1}}{2}+1, \frac{\textcircled {1}}{2}+2, \ldots ,\textcircled {1}-2, \textcircled {1}-1, \textcircled {1}\right\} . \end{aligned}$$
(2)

We may think of the sets (1) and (2) as the results of two different experiments performed on the set of positive integers.Footnote 4 The set \(\mathbb {N}^{\dagger }\) will be used to “measure” the sizes of different, finite or infinite, subsets of \(\mathbb {N}\).

Grossone Calculus

We now introduce a calculus with \(\textcircled {1}\), following the axiomatic theory of Grossone developed and studied in Lolli [7]. Our approach differs from Lolli's in two ways: (a) we do not restrict ourselves to predicative second-order logic, that is, quantifications over sets are allowed; (b) while Lolli uses \(\mathbb {N}\) for analysing \(\mathbb {N}^{\dagger }\), we proceed the other way round: we study \(\mathbb {N}\) with \(\mathbb {N}^{\dagger }\). The notation \(n\in {\mathbb {N}} \) means that n is a finite element of \(\mathbb {N}^{\dagger }\).

The following rule schemas R1 and R2 (each an infinite list of rules, one instance for every \(n\in {\mathbb {N}}\)) define the arithmetic structure of \(\mathbb {N}^{\dagger }\).

R1:

Infinity For every \(n\in {\mathbb {N}}\), \(n<\,\textcircled {1}\).

R2:

Divisibility For every \(n\in {\mathbb {N}}\), \(\frac{\textcircled {1}}{n}\) is an infinite number.

As proved in Lolli [7], the rules R1 and R2 justify all infinite numbers in (2), operations with \(\textcircled {1}\) like \(n+ \textcircled {1}, \textcircled {1} - n, \frac{\textcircled {1}-n}{k}\) and identities like \(0\cdot \textcircled {1} =\textcircled {1} \cdot 0=0, \textcircled {1} - \textcircled {1} =0,\frac{\textcircled {1}}{\textcircled {1}}=1, 1^{\textcircled {1}}=1, 0^{\textcircled {1}}=0.\) Is \(\textcircled {1}\) prime? The answer is negative: in fact, \(\textcircled {1}\) is “maximally composite” because, by R2, for every finite \(n\in {\mathbb {N}}\), \(\frac{\textcircled {1}}{n} \in \mathbb {N}^{\dagger }\), hence n divides \(\textcircled {1}\).
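These identities can be explored mechanically. The following minimal sketch (our illustration, not part of Lolli's formalism) treats \(\textcircled {1}\) as a formal positive symbol g in Python's sympy; the listed identities then follow from ordinary field arithmetic:

```python
import sympy as sp

# Treat Grossone as a formal positive symbol g (an assumption of this sketch).
g, n, k = sp.symbols('g n k', positive=True)

assert g - g == 0                # ① - ① = 0
assert g / g == 1                # ① / ① = 1
assert sp.Integer(1) ** g == 1   # 1^① = 1
assert sp.Integer(0) ** g == 0   # 0^① = 0
assert 0 * g == 0                # 0 · ① = ① · 0 = 0

# Operations such as n + ①, ① - n, (① - n)/k remain formal expressions:
print(n + g, g - n, (g - n) / k)
```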

The \(\textcircled {1}\)-measure \(\mu :{\mathscr {P}}(\mathbb {N}) \rightarrow \mathbb {N}^{\dagger }\cup \{0\}\) is introduced using the following two axioms:Footnote 5

\(\mathbf{A}_{\mu }{} \mathbf{1}.\):

For every \(x \in \mathbb {N}\), \(\mu (\{x\})=1\).

\(\mathbf{A}_{\mu }{} \mathbf{2}.\):

For all sets \(X,Y\subseteq \mathbb {N}\), if \(X\cap Y=\emptyset \), then \(\mu (X\cup Y)=\mu (X)+\mu (Y)\).

In conjunction with R2 we introduce a partition \((\mathbb {N}_{i,n})_{1\le i\le n}\) of \(\mathbb {N}\): for every \(n\in {\mathbb {N}}\) and \(1\le i\le n\),

$$\begin{aligned} \mathbb {N}_{i,n}= \{i, n+i, 2n+i, \dots \}. \end{aligned}$$

For every \(x\in \mathbb {N}^{\dagger }\), \(\prec _x\) is the initial segment of numbers less than or equal to x, \(\prec _x \ = \{y\in \mathbb {N}\mid y \le x\}.\) If \(x\in {\mathbb {N}}\), then \(\prec _x\) is finite, but if x is an infinite number, then the set \(\prec _x\) is infinite; for example, \(\prec _{\textcircled {1}}\ = \mathbb {N}\). A set X is called bounded if there exists an x such that \(X \subseteq \prec _x\).

The following two theorems have been proved in Lolli [7].

Theorem 1

The following facts can be proved from R1, R2, \(\mathbf{A}_{\mu }{} \mathbf{1}\) and \(\mathbf{A}_{\mu }{} \mathbf{2}\):

  1. \(\mu (\emptyset )=0\).

  2. For all bounded sets \(X,Y \subseteq \mathbb {N}\), if \(X\subseteq Y\), then \(\mu (X) \le \mu (Y)\).

  3. For every \(x\in \mathbb {N}^{\dagger }\), \(\mu (\prec _x)=x\); in particular, \(\mu (\prec _{\textcircled {1}})=\textcircled {1}.\)

  4. Every proper subset of \(\mathbb {N}\) has a measure x with \(x < \textcircled {1}\).

Theorem 2

From R1, R2, \(\mathbf{A}_{\mu }{} \mathbf{1}\) and \(\mathbf{A}_{\mu }{} \mathbf{2}\) for every \(n\in {\mathbb {N}}\) and \(1\le i\le n\), we have \( \mu (\mathbb {N}_{i,n}) = \frac{\textcircled {1}}{n}.\)

By Theorem 2, the \(\textcircled {1}\)-measure of each \( \mathbb {N}_{i,n}\) is n times smaller than the \(\textcircled {1}\)-measure of the whole set \( \mathbb {N}\). This validates the intuition that the ‘number’ of even numbers is the same as the ‘number’ of odd numbers and is half the ‘number’ of all numbers (see also the sketch after Corollary 1).

Corollary 1

From R1, R2, \(\mathbf{A}_{\mu }{} \mathbf{1}\) and \(\mathbf{A}_{\mu }{} \mathbf{2}\) we have

  1. \( \mu (\mathbb {N}) = \textcircled {1}, \)

  2. \( \mu (\mathbb {N}_{1,2}) = \mu (\mathbb {N}_{2,2}) =\frac{\textcircled {1}}{2}, \)

  3. \( \mu (\mathbb {N}_{1,3}) = \mu (\mathbb {N}_{2,3}) = \mu (\mathbb {N}_{3,3}) = \frac{\textcircled {1}}{3}, \)

  4. \( \mu (\mathbb {N}\cup \{ 0 \}) = \textcircled {1}+1, \)

  5. \(\mu (\mathbb {N}{\setminus} \{ 3, 5, 10, 23, 114 \}) = \textcircled {1} -5, \)

  6. \( \mu (\{ x \in \mathbb {N}\mid x= n^2, n \in \mathbb {N}\}) =\lfloor \sqrt{\textcircled {1}} \rfloor . \)
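Assuming Theorem 2 and Corollary 1, these \(\textcircled {1}\)-measures can be tabulated symbolically; a sketch with \(\textcircled {1}\) as the formal symbol g and a hypothetical helper mu_residue_class:

```python
import sympy as sp

g = sp.symbols('g', positive=True)  # g stands for Grossone

def mu_residue_class(n):
    """mu(N_{i,n}) for {i, n+i, 2n+i, ...}; by Theorem 2 it equals g/n for any 1 <= i <= n."""
    return g / n

print(mu_residue_class(2))   # g/2: the even (or the odd) numbers
print(mu_residue_class(3))   # g/3: each residue class modulo 3
print(g + 1)                 # mu(N ∪ {0})
print(g - 5)                 # mu(N minus a 5-element set)
print(sp.floor(sp.sqrt(g)))  # mu of the set of perfect squares
```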

De Finetti Infinite Fair Lottery

By a finite lottery, we mean a process that assigns exactly one winner among a finite set of tickets (contained in an urn) in a fair way, that is, each ticket wins with the same probability. If the tickets are drawn from the label set \(\{1,2,3, \dots ,n\}\), then the probability that each i, \(1\le i \le n\), is the winner is 1/n. In particular, the probability that a combination of tickets contains the winner is the sum of the individual probabilities of the tickets. This process can be modelled by a uniform probability space which fulfils Kolmogorov's axioms.

What about an infinite lottery in which tickets are drawn from the infinite label set \(\{1,2,3, \dots \}\)? Can it be fair? “Classically” this is impossible because, informally, the uniform distribution would require each number to have probability \(1/\infty = 0\); for a formal argument see [14, 15, Ch. 11].

Using the Grossone framework, it was informally argued in Rizza [16] that the infinite fair lottery on \(\mathbb {N}\) – which has \(\textcircled {1}\) numbers – is possible and that each number wins with probability \(1/\textcircled {1}\), a positive infinitesimal. This intuition will be made rigorous in Theorem 3.

Grossone-Like Uniform Probability Spaces

In this section, we construct and study Grossone-like uniform probability spaces. The approach is inspired by and parallels the construction of classical probability.Footnote 6

Next, we introduce infinitesimals based on \(\textcircled {1}\). A number x is infinitesimal if \(\vert x\vert <r\) for every positive real r (see more in Lolli [17]).

R3:

Infinitesimals: There exists \(\textcircled {1}^{-1}\) with \(\textcircled {1}^{-1} \cdot \textcircled {1} = \textcircled {1} \cdot \textcircled {1}^{-1} = 1.\)

From R1 and R3, \(\textcircled {1}^{-1} >0\) and, for every \(n\in {\mathbb {N}}\), \(\textcircled {1}^{-1} < \frac{1}{n}\); hence \(\textcircled {1}^{-1}\) is a positive infinitesimal, in particular smaller than 1.

In what follows, we assume R1, R2, R3, \(\mathbf{A}_{\mu }{} \mathbf{1}\) and \(\mathbf{A}_{\mu }{} \mathbf{2}\). The probabilities defined in this section will also take infinitesimal values, drawn from the set

$$\begin{aligned} \mathbb {I}=\left\{ \frac{\mu (A)}{\mu (\varOmega )} \mid \emptyset \not = A \subset \varOmega \subseteq \mathbb {N}, \varOmega \text{ infinite } \right\} , \end{aligned}$$
(3)

(see R3). Hence, all the probabilities will be defined on \({\mathscr {P}}(\varOmega )\) and will take values in the extended unit interval \([0,1]^{\dagger }= [0,1] \cup \mathbb {I}\).

In this framework, Williamson’s arguments against infinitesimal probability are mathematically refuted.

Grossone Probability Space

Let us consider the sample space \(\mathbb {N}\) and the elementary events of the random experiment \({\mathscr {E}}\) consisting of picking a number from \(\mathbb {N}\) at random. The Grossone uniform probability space

$$\begin{aligned} \mathbb {N}(\textcircled {1})= \left( \mathbb {N}, {\mathscr {P}}(\mathbb {N}), \Pr \right) \end{aligned}$$

is defined by assigning to every event \(A\in {\mathscr {P}}( \mathbb {N})\) the probability

$$\begin{aligned} \Pr (A)= \frac{\mu (A)}{\mu (\mathbb {N})}= \frac{\mu (A)}{\textcircled {1}}. \end{aligned}$$
(4)

Theorem 3

The probability space \(\mathbb {N}(\textcircled {1})\) is regular, total, finitely additive and uniform.

Proof

In view of Theorem 1, the function \(\Pr :{\mathscr {P}}(\mathbb {N}) \rightarrow [0,1]^{\dagger }\) defined by (4) satisfies the properties of a finitely additive probability. Indeed, \(\Pr (A)\ge 0\) for all \(A\in {\mathscr {P}}(\mathbb {N})\), \(\Pr (\mathbb {N})=1\) and, for all sets \(A, B \in {\mathscr {P}}( \mathbb {N})\) such that \(A\cap B=\emptyset \), we have \(\Pr (A\cup B)= \Pr (A)+\Pr (B)\) by \(\mathbf{A}_{\mu }{} \mathbf{2}\).

By Theorem 1 and (4), every event has a probability and for every \(A\in {\mathscr {P}}(\mathbb {N})\), \(\Pr (A)=0\) iff \(A=\emptyset \).

Finally, \(\mathbb {N}(\textcircled {1})\) is uniform because, by \(\mathbf{A}_{\mu }{} \mathbf{1}\), every elementary event \(\{n\}\) has the same chance to occur:

$$\begin{aligned} \Pr (\{n\})= \frac{ \mu (\{n\}) }{\textcircled {1}} = \frac{1}{\textcircled {1}}. \end{aligned}$$
(5)

\(\square \)

Theorem 3 shows that the Grossone probability space \(\mathbb {N}(\textcircled {1})\) satisfies all four probability desiderata in Sect. 3. In contrast with standard probability theory, where there is no infinite fair lottery, the probability space \(\mathbb {N}(\textcircled {1}) = (\mathbb {N}, {\mathscr {P}}(\mathbb {N}), \Pr )\) is an adequate model for the fair lottery on \(\mathbb {N}\). In this probability space, we can calculate the probability of every event, in particular, the probability of each event in Corollary 1. For example, in \(\mathbb {N}(\textcircled {1})\), the probability of picking at random an arbitrary odd number is

$$\begin{aligned} \Pr (\mathbb {N}_{1,2}) =\frac{\frac{\textcircled {1}}{2}}{\textcircled {1}} = \frac{1}{2}. \end{aligned}$$
(6)
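The probability (4) can be computed directly from the \(\textcircled {1}\)-measure; a sketch with \(\textcircled {1}\) as the formal symbol g (the helper Pr is ours):

```python
import sympy as sp

g = sp.symbols('g', positive=True)  # g stands for Grossone

def Pr(mu_A):
    """Pr(A) = mu(A) / mu(N) = mu(A) / g, as in (4)."""
    return sp.simplify(mu_A / g)

print(Pr(sp.Integer(1)))  # 1/g: a single ticket, a positive infinitesimal, as in (5)
print(Pr(g / 2))          # 1/2: the odd numbers N_{1,2}, as in (6)
print(Pr(g - 5))          # (g - 5)/g: infinitesimally smaller than 1
```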

Grossone-Like Probability Spaces

Let \(\varOmega \subset \mathbb {N}\) be infinite. The probability \({\Pr }_{\varOmega }: {\mathscr {P}}(\varOmega ) \rightarrow [0,1]^{\dagger }\) is defined for every \(n\in \varOmega \) by

$$\begin{aligned} {\Pr }_\varOmega (\{n\})= \frac{1}{\mu (\varOmega )}. \end{aligned}$$

As a consequence of Theorem 3 we get

Corollary 2

The probability space \(\varOmega (\textcircled {1})=(\varOmega , {\mathscr {P}}(\varOmega ), {\Pr }_\varOmega )\) is regular, total, finitely additive and uniform.

The probability space \(\varOmega (\textcircled {1})\) will be called a Grossone-like probability space.Footnote 7 It is clear that, by (5), Theorem 1 and the assumption \(\varOmega \subset \mathbb {N}\) (which gives \(\mu (\varOmega ) < \textcircled {1}\)), for every \(n\in \varOmega \), \({\Pr } (\{n\}) < {\Pr }_\varOmega (\{n\})\).
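For instance, taking \(\varOmega = \mathbb {N}_{2,2}\) (the even numbers), Theorem 2 gives \(\mu (\varOmega ) = \frac{\textcircled {1}}{2}\), so each ticket wins with probability \(2/\textcircled {1}\); a one-line symbolic check of the comparison above:

```python
import sympy as sp

g = sp.symbols('g', positive=True)   # g stands for Grossone
pr_full = 1 / g                      # Pr({n}) in N(①), equation (5)
pr_even = sp.simplify(1 / (g / 2))   # Pr_Omega({n}) for Omega = the even numbers
print(pr_even, pr_even - pr_full > 0)  # 2/g, and 2/g - 1/g = 1/g > 0: True
```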

Grossone-like probability spaces can be defined not only on infinite sets of numbers, but also on other infinite discrete sets, like the sets of integers or rationals.

Williamson’s Arguments

Williamson's argument [8] purports to show that infinitesimal probabilities are inherently problematic; for more philosophical arguments refuting Williamson's conclusion see [1, 18]. In this section, we mathematically refute Williamson's conclusion.

Consider an urn containing tickets labelled with the elements of \(\mathbb {N}\) and a mechanism to implement a fair lottery for the tickets in the urn. We consider two cases.

In case (1), all tickets are in the urn and the probability that an arbitrary single ticket n wins this lottery is \(\Pr _{1}(n)\), a probability which can be infinitesimal but not zero.

In case (2), one ticket k is removed from the urn prior to the drawing of the winning ticket. Accordingly, the urn contains one ticket fewer, so the probability that each remaining ticket \(m\in \mathbb {N}{\setminus} \{k\}\) wins is (after renormalisation):

$$\begin{aligned} {\Pr }_{2}(m) = \frac{{\Pr }_{1}(m)}{1-{\Pr }_{1}(m)}. \end{aligned}$$
(7)

However, viewed in isolation, case (2) looks exactly like case (1), that is, like the situation before the removal of ticket k. Why? Because of a simple set-theoretic argument: both urns have the same set-theoretic cardinality. Accordingly, for every \(m\in \mathbb {N}{{\setminus}} \{k\}\):

$$\begin{aligned} {\Pr }_{2}(m) = {\Pr }_{1}(m). \end{aligned}$$
(8)

From (7) and (8) we deduce that, for every \(m \in \mathbb {N}{{\setminus}} \{k\}\), \({\Pr }_{1}(m) = \frac{{\Pr }_{1}(m)}{1-{\Pr }_{1}(m)}\), hence \({\Pr }_{1}(m)^{2}=0\), so \({\Pr }_{1}(m) = {\Pr }_{2}(m)=0\), a contradiction. In [8], this is interpreted as an inherently problematic situation arising in any attempt to use infinitesimal probabilities, irrespective of their mathematical construction.

The reason cases (1) and (2) look exactly the same is that the urns have the same set-theoretic cardinality, that is, informally, they have the same number of elements. For a finite set, the cardinality indeed coincides with the number of elements; however, this is not true for infinite sets.

Here is the resolution of Williamson’s contradiction using the Grossone probability space. In case (1), the probability space is

$$\begin{aligned} \mathbb {N}(\textcircled {1})= (\mathbb {N}, {\mathscr {P}}(\mathbb {N}),{\Pr }_{1}) \end{aligned}$$

in which the winning probability of every single ticket labelled by \(n\in \mathbb {N}\) is

$$\begin{aligned} {\Pr }_{1}(\{n\})= \frac{1}{\textcircled {1}}. \end{aligned}$$

In case (2), when k has been removed from the urn, the correct probability space is the Grossone-like space

$$\begin{aligned} (\mathbb {N}{\setminus} \{k\}) (\textcircled {1} -1)= \left( \mathbb {N}{\setminus} \{k\}, {\mathscr {P}}(\mathbb {N}{\setminus} \{k\}),{\Pr }_{2}\right) . \end{aligned}$$

In this space, the winning probability of every single ticket labelled by \(m\in \mathbb {N}{\setminus} \{k\}\) is obviously larger than the probability in the space \(\mathbb {N}(\textcircled {1})\):

$$\begin{aligned} 0 < {\Pr }_{2}(\{m\})= \frac{1}{\textcircled {1}-1}> \frac{1}{\textcircled {1}}={\Pr }_{1}(\{m\}). \end{aligned}$$

A similar analysis works if instead of removing one ticket from the urn we remove infinitely many tickets, say, all tickets labelled with even numbers. In this example, the first probability space remains unchanged, but the second probability space is the Grossone-like space

$$\begin{aligned} \mathbb {N}_{2,2} \left( \textcircled {1}\right) =(\mathbb {N}_{2,2}, {\mathscr {P}}(\mathbb {N}_{2,2}),{\Pr }_{3}). \end{aligned}$$

In this space, the winning probability of every single ticket labelled by \(m\in \mathbb {N}_{2,2}\) is larger than the probability in \({\mathbb {N}}(\textcircled {1})\):

$$\begin{aligned} {\Pr }_{3}(\{m\}) = \frac{1}{\frac{\textcircled {1}}{2}}> \frac{1}{\textcircled {1}}={\Pr }_{1}(\{m\})>0. \end{aligned}$$
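Symbolically, the resolution reflects that removing tickets changes the \(\textcircled {1}\)-measure of the urn even though its cardinality is unchanged; a sketch with \(\textcircled {1}\) as the formal symbol g:

```python
import sympy as sp

g = sp.symbols('g', positive=True)  # g stands for Grossone

pr1 = 1 / g        # case (1): all g tickets in the urn
pr2 = 1 / (g - 1)  # case (2): one ticket removed, g - 1 tickets left
pr3 = 2 / g        # even tickets removed, g/2 tickets left

print(sp.simplify(pr2 - pr1))  # 1/(g*(g - 1)): positive, so the two urns differ
print(sp.simplify(pr3 - pr1))  # 1/g: positive again
```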

Independence

Consider a Grossone-like probability space \(\left( \varOmega , {\mathscr {P}}(\varOmega ), {\Pr }_{\varOmega } \right) \). Two events A and B are independent if \({\Pr }_{\varOmega } (A\cap B)= {\Pr }_{\varOmega } (A) {\Pr }_{\varOmega } (B)\). A finite set of events is mutually independent if every event is independent of any intersection of the other events.

Example 1

Consider the Grossone-like model for De Finetti's infinite fair lottery, \(\mathbb {N}\left( \textcircled {1}\right) \), and the following two events: A: the winner is smaller than or equal to 10, and B: the winner is an odd number. We have:

$$\begin{aligned} \Pr (A) = \Pr ( \prec _{10}) = \frac{10}{\textcircled {1}}, \quad \Pr (B) = \Pr (\mathbb {N}_{1,2})=\frac{1}{2}, \quad \Pr (A \cap B) = \Pr (\{ 1,3,5,7,9\}) = \frac{5}{\textcircled {1}}, \end{aligned}$$

hence, \( \Pr (A \cap B) = \Pr (A)\Pr (B)\) so the events A and B are independent.

On the other hand, the events “the winner is an even number” and “the winner is an odd number” are not independent because \({\Pr }\left( {\mathbb {N}}_{1,2}\right) ={\Pr }\left( {\mathbb {N}}_{2,2}\right) =\frac{1}{2}\) while \({\Pr }\left( {\mathbb {N}}_{1,2}\cap {\mathbb {N}}_{2,2}\right) =0.\)
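Both computations of Example 1 can be checked mechanically; a sketch with \(\textcircled {1}\) as the formal symbol g:

```python
import sympy as sp

g = sp.symbols('g', positive=True)  # g stands for Grossone

pr_A  = 10 / g             # Pr(A): the winner is at most 10
pr_B  = sp.Rational(1, 2)  # Pr(B): the winner is odd
pr_AB = 5 / g              # Pr(A ∩ B): the winner is in {1, 3, 5, 7, 9}

print(sp.simplify(pr_AB - pr_A * pr_B) == 0)  # True: A and B are independent
# Odd vs. even: Pr(N_{1,2}) * Pr(N_{2,2}) = 1/4, but the intersection has probability 0:
print(sp.Rational(1, 2) * sp.Rational(1, 2) == 0)  # False: not independent
```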

Conditional Probabilities

Consider a Grossone-like probability space \(\left( \varOmega , {\mathscr {P}}(\varOmega ), {\Pr }_{\varOmega } \right) \). The conditional probability of an event A assuming that the event B with \({\Pr }_{\varOmega }(B)> 0\) has occurred is defined as

$$\begin{aligned} {\Pr }_{\varOmega }(A \mid B)= \frac{{\Pr }_{\varOmega }(A\cap B)}{{\Pr }_{\varOmega }(B)}. \end{aligned}$$

Note that two events A and B are independent iff \({\Pr }_{\varOmega }\left( B\right) =0\) or \({\Pr }_{\varOmega }\left( A\mid B\right) ={\Pr }_{\varOmega }\left( A\right) .\)

In Example 1, we have

$$\begin{aligned} {\Pr }\left( \prec _{10}\mid {\mathbb {N}}_{1,2}\right) =\frac{{\Pr }\left( \prec _{10}\cap {\mathbb {N}}_{1,2}\right) }{{\Pr }\left( {\mathbb {N}}_{1,2}\right) }=\frac{\frac{5}{\textcircled {1}}}{\frac{1}{2}} =\frac{10}{\textcircled {1}}= {\Pr }\left( \prec _{10}\right) . \end{aligned}$$

For \(A \subseteq \varOmega \), we denote \({\bar{A}}=\varOmega {\setminus} A \) and \(A\rightarrow B= {\bar{A}} \cup B\).

Example 2

(Negation elimination) For every \(A \in {\mathscr {P}}(\varOmega )\) such that \( A\not = \emptyset \) (\( {\bar{A}} \not = \emptyset \)) and for every \(B \in {\mathscr {P}}(\varOmega )\) we have: \({\Pr }_{\varOmega }({\bar{A}} \rightarrow B \mid A)=1\,({\Pr }_{\varOmega }(A \rightarrow B \mid {\bar{A}})=1).\)

Proof

We have: \( {\Pr }_{\varOmega }({\bar{A}} \rightarrow B \mid A) = {\Pr }_{\varOmega }(A \cup B \mid A) = \frac{{\Pr }_{\varOmega }(A \cap (A \cup B))}{{\Pr }_{\varOmega }(A)}= \frac{{\Pr }_{\varOmega }(A)}{{\Pr }_{\varOmega }(A)}=1\) because \(A \cap (A \cup B) = A\); similarly \({\Pr }_{\varOmega }\left( A\rightarrow B\mid {\overline{A}}\right) =1.\) \(\square \)

Lemma 1

(Law of Total Probability) If \(\left( B_{i}\right) _{1\le i\le n}\) is a partition of \( \varOmega \) into non-empty events, then for every event \(A\ne \emptyset \) we have:

$$\begin{aligned} {\Pr }_{\varOmega }\left( A\right) =\sum \limits _{j=1}^{n}{\Pr }_{\varOmega }\left( B_{j}\right) {\Pr }_{\varOmega }\left( A\mid B_{j}\right) =\sum \limits _{j=1} ^{n}{\Pr }_{\varOmega }\left( B_{j}\right) {\Pr }_{B_{j}}\left( A\right) . \end{aligned}$$

Proof

We have

$$\begin{aligned} A=\bigcup \limits _{j=1}^{n}\left( A\cap B_{j}\right) \text{ and } \left( A\cap B_{j}\right) \cap \left( A\cap B_{i}\right) =\emptyset , \text{ for } \text{ every } j\ne i. \end{aligned}$$

Then, by \(\mathbf{A}_{\mu }{} \mathbf{2}\) and the definition of conditional probability, we have

$$\begin{aligned} {\Pr }_{\varOmega }\left( A\right) =\sum \limits _{j=1}^{n}{\Pr }_{\varOmega }\left( A\cap B_{j}\right) =\sum \limits _{j=1}^{n}{\Pr }_{\varOmega }\left( B_{j}\right) {\Pr }_{\varOmega }\left( A\mid B_{j}\right) . \end{aligned}$$

\(\square \)

Theorem 4

(Bayes Theorem) If \(\left( B_{i}\right) _{1\ \le \ i\ \le \ n}\) is a partition of \(\varOmega \), then for every event \(A\not = \emptyset \) and all \(1\le i \le n\) we have:

$$\begin{aligned} {\Pr }_{\varOmega }(B_i \mid A)= \frac{{\Pr }_{\varOmega } (A \mid B_i){\Pr }_{\varOmega } (B_i)}{\sum _{j=1}^{n} { \Pr }_{\varOmega } (A \mid B_j) {\Pr }_{\varOmega } (B_j)}. \end{aligned}$$
(9)

Proof

The property follows from Lemma 1 and the equalities

$$\begin{aligned} {\Pr }_{\varOmega }\left( B_{i}\mid A\right) =\frac{{\Pr }_{\varOmega }\left( B_{i}\cap A\right) }{{\Pr }_{\varOmega }\left( A\right) }=\frac{{\Pr }_{\varOmega }\left( B_{i}\right) {\Pr }_{\varOmega }\left( A\mid B_{i}\right) }{{\Pr }_{\varOmega }\left( A\right) }. \end{aligned}$$

\(\square \)

Bayes' Theorem plays a very important role when we gain extra knowledge through an auxiliary experiment. Assume we are interested in the outcomes \(B_{1}, \ldots , B_{n}\) of a random experiment, which form a partition of the sample space \(\varOmega .\) Their prior probabilities are \(\left\{ { \Pr }_{\varOmega }\left( B_{i}\right) \mid 1\le i\le n\right\} .\)

We then perform an auxiliary random experiment and denote by A its outcome. Using Bayes' formula (9), we can calculate the posterior probabilities of \(B_{1},\ldots ,B_{n},\) that is, \(\left\{ { \Pr }_{\varOmega }\left( B_{i}\mid A\right) \mid 1\le i\le n\right\} .\)

Example 3

Consider an infinite fair lottery whose tickets, drawn from \({\mathbb {N}}\), are written in red and blue in such a way that both colours are used. We can formulate the following hypotheses about the lottery:

\(H_{i}:\) “The lottery contains i red tickets and \(\left( \textcircled {1}-i\right) \) blue tickets”, \(1\le i\le \textcircled {1}-1\),

and, accordingly, model the lottery with the sets

$$\begin{aligned} {\mathbb {N}}_{( i, \mathrm{red})} =\left\{ 1,2,3,\ldots \mid i \text{ tickets are red, } \textcircled {1}-i \text{ tickets are blue}\right\} , \quad 1 \le i \le \textcircled {1}-1. \end{aligned}$$

It is seen that

$$\begin{aligned} {\mathbb {N}}_{( i, \mathrm{red})} =\mathbb {RED}_{i}\cup \mathbb {BLUE}_{i},\quad \mu \left( \mathbb {RED}_{i}\right) =i,\quad \mu \left( \mathbb {BLUE}_{i}\right) =\textcircled {1}-i,\quad 1 \le i\le \textcircled {1}-1. \end{aligned}$$

A drawing of the lottery will produce a (winning) ticket in one of the sets \({\mathbb {N}}_{(i, \mathrm{red})}\), a process which can be formally described by the following Grossone-like space \( \left( \varOmega ,{\mathscr {P}}\left( \varOmega \right) ,{\Pr }_{\varOmega }\right) \), in which the sample space is

$$\begin{aligned} \varOmega&=\bigcup \limits _{i=1}^{\textcircled {1}-1}{\mathbb {N}}_{( i, \mathrm{red})} =\left( \bigcup \limits _{i=1}^{\textcircled {1}-1}\mathbb {RED}_i\right) \ \bigcup \ \left( \bigcup \limits _{i=1}^{\textcircled {1}-1}\mathbb {BLUE}_i\right) , \end{aligned}$$

and the probability is \({\Pr }_{\varOmega }\left( \left\{ x\right\} \right) =\frac{1}{\textcircled {1}\left( \textcircled {1}-1\right) }\), for all \(x\in \varOmega \).

The event that hypothesis \(H_{i}\) is true is \( {\mathbb {N}}_{( i, \mathrm{red})}\); for simplicity we write \(B_{i}={\mathbb {N}}_{( i, \mathrm{red}) }\).

A priori, the likelihood of the hypothesis \(H_{i}\) is given by the prior probability:

$$\begin{aligned} {\Pr }_{\varOmega }\left( B_{i}\right) =\frac{\mu (B_i)}{\textcircled {1}\left( \textcircled {1}-1\right) }=\frac{\textcircled {1}}{\textcircled {1}\left( \textcircled {1}-1\right) }=\frac{1}{\textcircled {1}-1}. \end{aligned}$$

The auxiliary experiment now consists in drawing one ticket, which should give more information about the events \(B_i\). Assume the winning ticket is red, which means that the event \(\left( \bigcup \nolimits _{i=1}^{\textcircled {1}-1}\mathbb {RED}_i\right) \in {\mathscr {P}}\left( \varOmega \right) \) has occurred. Using the simpler notation \(A=\bigcup \nolimits _{i=1}^{\textcircled {1}-1}\mathbb {RED}_i\), we have:

$$\begin{aligned} {\Pr }_{\varOmega }\left( A\right) &=\frac{\mu \left( A\right) }{\textcircled {1}\left( \textcircled {1}-1\right) }=\frac{1}{\textcircled {1}\left( \textcircled {1}-1\right) }\sum \limits _{i=1} ^{\textcircled {1}-1}\mu \left( \mathbb {RED}_{i}\right) \\&\quad =\frac{1}{\textcircled {1}\left( \textcircled {1}-1\right) }\sum \limits _{i=1}^{\textcircled {1}-1}i =\frac{1}{\textcircled {1}\left( \textcircled {1}-1\right) }\cdot \frac{\left( \textcircled {1}-1\right) \textcircled {1}}{2}=\frac{1}{2}. \end{aligned}$$

On the other hand, if the hypothesis \(H_{i}\) is true we have

$$\begin{aligned}&{\Pr }_{\varOmega }\left( A\mid B_{i}\right) =\frac{{\Pr }_{\varOmega }\left( A\cap B_{i}\right) }{\Pr _{\varOmega }\left( B_{i}\right) }=\frac{\frac{\mu \left( A\cap B_{i}\right) }{\textcircled {1}\left( \textcircled {1}-1\right) }}{{\Pr }_{\varOmega }\left( B_{i}\right) }\nonumber \\&\quad =\frac{\frac{\mu \left( \mathbb {RED}_{i}\right) }{\textcircled {1}\left( \textcircled {1}-1\right) }}{{\Pr }_{\varOmega }\left( B_{i}\right) }=\frac{i \cdot (\textcircled {1}-1)}{\textcircled {1}\left( \textcircled {1}-1\right) } =\frac{i}{\textcircled {1}}. \end{aligned}$$
(10)

As the events \(\{B_{i}\mid 1\le i \le \textcircled {1}-1\}\) form a partition of \(\varOmega \), we can now use Bayes' formula to calculate, for every \( 1 \le i \le \textcircled {1}-1\), the posterior probabilities:

$$\begin{aligned} {\Pr }_{\varOmega }\left( B_{i}\mid A\right) =\frac{{\Pr }_{\varOmega }\left( A\mid B_{i}\right) \cdot \Pr _{\varOmega }\left( B_{i}\right) }{{\Pr }_{\varOmega }\left( A\right) } =\frac{\frac{i}{\textcircled {1}}\cdot \frac{1}{\textcircled {1}-1}}{\frac{1}{2}}=\frac{2i}{\textcircled {1}\left( \textcircled {1}-1\right) }. \end{aligned}$$
(11)

First note that \( \underset{1\ \le \ i\ \le \ \textcircled {1}-1}{\max }{\Pr }_{\varOmega }\left( B_{i}\mid A\right) ={\Pr }_{\varOmega }\left( B_{\textcircled {1}-1}\mid A\right) =\frac{2}{\textcircled {1}}. \) Second, for some \(1 \le i, j\le \textcircled {1}-1\), \({\Pr }_{\varOmega }\left( B_{i}\right) < {\Pr }_{\varOmega }\left( B_{i}\mid A\right) \) and \({\Pr }_{\varOmega }\left( B_{j}\right) > {\Pr }_{\varOmega }\left( B_{j}\mid A\right) \). To locate a “switch” threshold we use (10) and (11) and get the inequalities

$$\begin{aligned} \frac{2i}{\textcircled {1}\left( \textcircled {1}-1\right) }<\frac{1}{\textcircled {1}-1}, \, \frac{2\left( i+1\right) }{\textcircled {1}\left( \textcircled {1}-1\right) }\ge \frac{1}{\textcircled {1}-1}, \end{aligned}$$

which have the solutions \(\frac{\textcircled {1}}{2}-1\le i<\frac{\textcircled {1}}{2}.\) Hence, the “switch” threshold is the Grossone number \(i=\frac{\textcircled {1}}{2}-1.\)
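The posterior (11) and the threshold can be reproduced symbolically; a sketch with \(\textcircled {1}\) as the formal symbol g and a formal index i:

```python
import sympy as sp

g, i = sp.symbols('g i', positive=True)  # g stands for Grossone; 1 <= i <= g - 1

prior      = 1 / (g - 1)        # Pr(B_i)
likelihood = i / g              # Pr(A | B_i), equation (10)
evidence   = sp.Rational(1, 2)  # Pr(A)

posterior = sp.simplify(likelihood * prior / evidence)
print(posterior)  # 2*i/(g*(g - 1)), matching (11)

# posterior < prior iff 2*i < g, i.e. iff i < g/2; equality holds exactly at i = g/2:
print(sp.simplify(posterior.subs(i, g / 2) - prior))  # 0
```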

Random Variables

Using (3), let \( \mathbb {R}^{\dagger }= \mathbb {R}\cup \mathbb {I}\). A real random variable on the Grossone-like probability space \(( \varOmega ,{\mathscr {P}}(\varOmega ),\Pr )\) is a function \(X:\varOmega \rightarrow \mathbb {R}^{\dagger }\) with the property that for every measurable set \(B\in {\mathscr {B}}\left( \mathbb {R}^{\dagger }\right) \) we have \(X^{-1}\left( B\right) \in {\mathscr {P}}(\varOmega ).\) A random variable X generates a probability space \(\left( \mathbb {R}^{\dagger },{\mathscr {B}}\left( \mathbb {R}^{\dagger }\right) , P_{X}\right) ,\) in which the probability \(P_{X}\)—called the probability distribution of X—is defined as follows:

$$\begin{aligned} P_{X}\left( B\right) =\Pr \left( X^{-1}\left( B\right) \right) ,\ B\in {\mathscr {B}}\left( \mathbb {R}^{\dagger }\right) . \end{aligned}$$

The distribution of the discrete random variable X can be described by \(P_{X}(\{x\}), x\in X(\varOmega )\). For example, if we consider the Grossone probability space \(\mathbb {N}\left( \textcircled {1}\right) =\left( \mathbb {N},{\mathscr {P}}\left( \mathbb {N}\right) ,\Pr \right) \) and the random variable \(X :\mathbb {N}\rightarrow \mathbb {N}, X\left( n\right) =n\), then the probability distribution of X is

$$\begin{aligned} P_{X}\left( \left\{ n\right\} \right) =\Pr \left( \left\{ X=n\right\} \right) =\frac{1}{\textcircled {1}}, \end{aligned}$$

that is, the Grossone-like uniform distribution.

Using Grossone calculus, one can calculate the moments and variance of the Grossone-like uniform distribution. For example,

$$\begin{aligned} E\left( X\right)&= \sum \limits _{n\in \mathbb {N}}n\cdot \frac{1}{\textcircled {1}}=\frac{1}{\textcircled {1}}\cdot \frac{\textcircled {1}\left( \textcircled {1}+1\right) }{2}=\frac{\textcircled {1}+1}{2},\\ E\left( X^{2}\right)&= \sum \limits _{n\in \mathbb {N}}n^{2}\cdot \frac{1}{\textcircled {1}}=\frac{1}{\textcircled {1}}\cdot \frac{\textcircled {1}\left( \textcircled {1}+1\right) \left( 2\textcircled {1}+1\right) }{6}=\frac{\left( \textcircled {1}+1\right) \left( 2\textcircled {1}+1\right) }{6},\\ Var\left( X\right)&= E\left( X^{2}\right) -E^{2}\left( X\right) =\frac{\left( \textcircled {1}+1\right) \left( 2\textcircled {1}+1\right) }{6}-\frac{\left( \textcircled {1}+1\right) ^{2}}{4}=\frac{\textcircled {1}^{2}-1}{12}. \end{aligned}$$
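These sums can be reproduced with a computer algebra system, which evaluates the Gauss sums with a symbolic upper limit; a sketch with \(\textcircled {1}\) as the formal symbol g:

```python
import sympy as sp

g = sp.symbols('g', positive=True, integer=True)  # g stands for Grossone
n = sp.symbols('n', positive=True, integer=True)

E_X  = sp.simplify(sp.summation(n / g, (n, 1, g)))     # equals (g + 1)/2
E_X2 = sp.simplify(sp.summation(n**2 / g, (n, 1, g)))  # equals (g + 1)*(2*g + 1)/6
Var  = sp.simplify(E_X2 - E_X**2)                      # equals (g**2 - 1)/12
print(E_X, E_X2, Var)
```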

A Non-Uniform Grossone-Like Probability

In this section, we discuss a non-uniform Grossone-like probability.

The binomial experiment with parameters \(n\in \mathbb {N}\) and rational \(p\in (0,1)\) is a sequence of n independent experiments, each asking a yes–no question whose outcome is “success” (with probability p) or “failure” (with probability \(q = 1 - p\)). If we take \(n=\textcircled {1}\), then the discrete probability distribution of the number of successes in this experiment can be modelled by the Grossone-like probability space \((\mathbb {N}\cup \{ 0\}, {\mathscr {P}}(\mathbb {N}\cup \{ 0\}),\mathrm{P}_{\mathrm{Binomial},p} )\), where

$$\begin{aligned} \mathrm{P}_{\mathrm{Binomial},p}\left( \left\{ k\right\} \right) =\left( \begin{array} [c]{c} \textcircled {1}\\ k \end{array} \right) \cdot p^{k}\left( 1-p\right) ^{\textcircled {1}-k},\ k\in \ \mathbb {N}\cup \left\{ 0\right\} . \end{aligned}$$

Using Grossone calculus we have

$$\begin{aligned} \sum \limits _{k=0}^{\textcircled {1}}\left( \begin{array} [c]{c} \textcircled {1}\\ k \end{array} \right) \cdot p^{k}\left( 1-p\right) ^{\textcircled {1}-k}=\left( p+\left( 1-p\right) \right) ^{\textcircled {1}}=1^{\textcircled {1}}=1. \end{aligned}$$
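The normalisation above is an instance of the binomial theorem with exponent \(\textcircled {1}\). As a plausibility check, the same computation with a finite exponent standing in for \(\textcircled {1}\) collapses symbolically to 1:

```python
import sympy as sp

p = sp.symbols('p', positive=True)
n = 8  # a finite stand-in for Grossone; the identity is just the binomial theorem

total = sum(sp.binomial(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
print(sp.simplify(total))  # 1
```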

Conclusion

In this paper, we have constructed and studied a class of infinitesimal probabilities on infinite sets of positive integers based on the notion of Grossone size. More precisely, we have used the “magnified” set \(\mathbb {N}^{\dagger }\) to measure the sizes of different, finite or infinite, subsets of \(\mathbb {N}\).

These probabilities have natural properties which are not satisfied by standard Kolmogorov probability theory on infinite sample spaces, such as regularity, totality, uniform distributivity and weak Laplaceanity. In this framework, De Finetti's fair lottery has a natural solution and Williamson's objections against infinitesimal probabilities are mathematically refuted.

An advantage of the proposed construction over other infinitesimal probabilities studied in the literature [1] comes from its naturalness and simplicity: the construction is operationally similar to classical (finite) probability.

It will be interesting to test the utility of these probabilities in areas in which their properties not shared by standard probability theory matter: a prime example is (quantum) physics (see for example [19,20,21]).