1 Introduction

It is well known that the state space of a classical mechanical system contains evolutions that induce entropy-increasing behavior in accordance with the Second Law of thermodynamics, but also evolutions that give rise to non-thermodynamic behavior. In our experience, the actual evolutions of thermodynamic systems seem to be of the first kind, and one of the most puzzling questions at the foundations of statistical mechanics is why this is so. The standard answer in statistical mechanics is given in terms of probability, namely that thermodynamic behavior is highly likely. But classical mechanics is a completely deterministic theory, which means that a Laplacian demon could predict with complete certainty the evolution of any mechanical system, given the system's microstate. This holds for thermodynamic systems no less than for other systems. Therefore, when one introduces the notion of probability into statistical mechanics in order to predict the future state of a thermodynamic system, one expresses ignorance with respect to the microstate of the system, and ipso facto with respect to the macrostate determined by the unknown microstate. Put in this way, one of the central questions at the foundations of statistical mechanics is how the notion of ignorance in statistical mechanics can be made physical or objective and acquire empirical significance. In other words, since what we wish to explain in statistical mechanics is thermodynamic behavior, which is as physically objective as anything can be, the probabilities in statistical mechanics must be understood in the same way. We will show in this paper how this can be done.

Recently it has been argued that typicality considerations play a crucial explanatory role in deterministic theories in physics (e.g. classical statistical mechanics and Bohmian mechanics). In this approach a sharp distinction is made between typicality and probability. We analyze the relation between the notions of typicality and probability, the question of the choice of measure in deterministic theories, and the way in which probability and typicality arise and should be understood in such theories. We will argue that even in theories with deterministic dynamics, like classical statistical mechanics and Bohm's theory, it is the notion of probability rather than typicality that may (sometimes) have explanatory value.

The paper is structured as follows. In Sect. 2 we explain how the probability measure in statistical mechanics should be chosen on the basis of transition probabilities. In Sect. 3 we consider the meaning of measure-1 theorems in mechanics; in particular, we focus on the significance of Lanford's theorem as a theorem about typical behavior. In Sect. 4 we analyze the so-called typicality approach in statistical mechanics and argue that on some prevalent ways of understanding it, the approach is wanting. In Sect. 5 we consider the implications of our discussion for Bohmian mechanics, a deterministic theory of quantum phenomena which reproduces the statistical predictions of standard quantum mechanics. Section 6 is the conclusion.

2 Probability and the Choice of Measure

To fix ideas, consider the paradigmatic example of a gas expanding in a container after the removal of a partition (see Fig. 1). At time \(t_0\) the gas occupies the left-hand side of the container. The gas evolves such that at \(t_1\) it occupies three quarters of the volume of the container, and at \(t_2\) it fills the entire volume. This is a special case of the thermodynamic Law of approach to equilibrium. It is well known that in classical mechanics there are so-called non-thermodynamic initial microstates, compatible with the gas's not evolving to equilibrium. For this reason there can be no general deterministic account of the Law of approach to equilibrium which explains the gas's behavior, and one must appeal to probability. This can be done as follows.

[Fig. 1: Approach to equilibrium]

In statistical mechanics the phase space of a classical system is partitioned into sets of microstates that are indistinguishable by an observer; these sets are called macrostates (see Fig. 2). Let \(M_0\) denote the initial macrostate of the gas, \(M_1\) the intermediate macrostate, and \(M_2\) the final macrostate in which the gas fills the entire volume of the container. Suppose that at time \(t_0\) the system S is in some low entropy macrostate \(M_0\). Each microstate in \(M_0\) evolves along a trajectory segment from \(t_0\) to \(t_1\). We call the set of the end points of all such trajectory segments the dynamical blob \(B(t_1)\) of S at \(t_1\) (given that it was in \(M_0\) at \(t_0\)). \(B(t_1)\) partly overlaps with \(M_1\) and partly with \(M_0\). These trajectory segments evolve further, such that at time \(t_2\) their end points make up the dynamical blob \(B(t_2)\), which overlaps with \(M_1\) and \(M_2\) (see Fig. 2).

[Fig. 2: Macrostates and dynamical evolution of a blob]

In general, the region covered by a dynamical blob \(B(t)\) overlaps with several macrostate regions; for instance, \(B(t_1)\) partially overlaps with \(M_0\) and \(M_1\). This means that some of the initial microstates of the gas at \(t_0\) that belong to \(M_0\) may evolve such that they remain in \(M_0\) at \(t_1\), and others may evolve such that they arrive in \(M_1\) at \(t_1\). If the system S, which started out in \(M_0\) at \(t_0\), actually happened to remain in \(M_0\) at \(t_1\), then the microstate of S at \(t_1\) is in the intersection of \(B(t_1)\) and \(M_0\); and if S actually happened to evolve to macrostate \(M_1\) at \(t_1\), then its microstate is at some point in the intersection of \(B(t_1)\) and \(M_1\). The question now is what the probability of each of these two evolutions is, given the initial macrostate (and a similar question can of course be posed with respect to \(B(t_2)\) at time \(t_2\)). For example: what is the probability that S will end up in \(M_1\) (or in \(M_0\)) at \(t_1\), given that it started out in \(M_0\) at \(t_0\)?
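To make the blob picture concrete, here is a minimal sketch in Python (ours, not the paper's; the baker's-map dynamics, the two-cell coarse-graining, and the uniform sampling of \(M_0\) are all illustrative assumptions). It evolves an ensemble of microstates from an initial macrostate and reports how the resulting blob \(B(t)\) is distributed over the macrostates:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, p):
    # Baker's map on the unit square, standing in for a deterministic,
    # measure-preserving Hamiltonian dynamics
    x_new = (2.0 * x) % 1.0
    p_new = np.where(x < 0.5, p / 2.0, p / 2.0 + 0.5)
    return x_new, p_new

def macrostate(x):
    # Two-cell coarse-graining: M0 is the left half, M1 the right half
    return np.where(x < 0.5, "M0", "M1")

# Sample the initial macrostate M0 uniformly; note that this sampling
# already presupposes a measure on M0
n = 100_000
x = rng.uniform(0.0, 0.5, n)
p = rng.uniform(0.0, 1.0, n)

for t in range(3):
    labels, counts = np.unique(macrostate(x), return_counts=True)
    # Fractions of the blob B(t) overlapping each macrostate
    print(f"t{t}:", dict(zip(labels, np.round(counts / n, 3))))
    x, p = step(x, p)
```

The uniform sampling of \(M_0\) is itself a choice of measure; which measure is the right one is precisely the question addressed next.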

Each point in the dynamical blob \(B(t)\) is a possible microstate of the system at time t, and therefore the macrostates with which \(B(t)\) overlaps at time t are the possible macrostates \(M_i\) of the system at t. It is therefore plausible that the relative frequencies of microstates in the different regions of overlap between \(B(t)\) and the \(M_i\) yield the probabilities of these macrostates at t. And since the state space is continuous, these relative frequencies must be given in terms of the relative sizes of the regions of overlap between the dynamical blob \(B(t)\) and the macrostates \(M_i\). Determining these sizes requires a measure on the phase space, and the question then becomes: what is the right measure with which to determine the size of the overlaps between the dynamical blobs and the macrostates, in order to determine the probability that S will end up in a given macrostate at a given time (given that it started out in \(M_0\) at \(t_0\))? In other words, what is the correct choice of measure for calculating the probabilities of macrostates? According to the orthodox wisdom in statistical mechanics the answer is invariably the Lebesgue measure. We will now show that in general the measure for calculating probabilities need not be the Lebesgue measure, and we will describe the special case in which the Lebesgue measure is the right choice.

If we run the same experiment many times, we will find that at \(t_1\) S is in \(M_0\) with some relative frequency \(F_0\) and in \(M_1\) with relative frequency \(F_1\) (and similarly with respect to \(t_2\)). These relative frequencies are the empirical basis on which probabilistic statements in statistical mechanics can be grounded, and on which they can be tested or justified.

The next step is a conjecture, on the basis of our experience, that this statistical behavior will be repeated (more or less) in the future. This means that the relative sizes of the regions of overlap between \(B(t)\) and the macrostates \(M_i\) should reflect the relative frequencies of the macrostates \(M_i\) found in experience. However, in order to determine the relative sizes of subsets of the phase space one needs to impose a measure on the phase space, and it is well known that, since the phase space is continuous, there are infinitely many measures which yield different relative sizes for its subsets. To motivate the choice of the measure that is suitable for determining the probabilities of macrostates on the basis of the sizes of the overlaps, we suggest the following consideration. We identify the set of probability measures that, if applied to the continuous phase space of S, yield measures of the regions of overlap of the blob \(B(t_1)\) with the macrostates \(M_0\) and \(M_1\) that are (to a satisfactory approximation) identical with the relative frequencies \(F_0\) and \(F_1\). There are many—possibly infinitely many—such measures, and all of them are empirically adequate. Among them we choose one measure, using pragmatic criteria such as simplicity, convenience, meshing with other theories, etc. Call this measure µ. The (normalized) measures of the regions of overlap are then given by \( \mu \left( B(t_1) \cap M_0 \right) \approx F_0 \) and \( \mu \left( B(t_1) \cap M_1 \right) \approx F_1 \). This measure µ provides the basis for predicting the evolution of systems of type S in terms of the following two-time transition Probability Rule:

Probability Rule: The transition probability that S evolves to macrostate \(M_i\) at \(t_0 + \Delta t\), given that it was in macrostate \(M_0\) at \(t_0\) (under the conditions of the above experiment), is equal to

$$ \mu \left( B(t_0 + \Delta t) \cap M_i \,\middle|\, M_0 \right) \approx F_i . $$

That is, the transition probability from the macrostate \(M_0\) at \(t_0\) to \(M_i\) at \(t_0 + \Delta t\) is equal to the (normalized) measure µ of the region of overlap of the blob \(B(t_0 + \Delta t)\) with the macrostate \(M_i\). This is the content of probability statements in statistical mechanics. Note that in general \( \mu(M_i)/\mu(M_j) \) need not be equal to \( \mu \left( B(t) \cap M_i \right) / \mu \left( B(t) \cap M_j \right) \). This implies that the probability of a transition to a given macrostate need not track the entropy of that macrostate, even if both are measured by the same measure µ.
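The selection procedure described above can be sketched in code. In the following hedged illustration (the frequencies, the one-dimensional Beta-density family of candidate measures, and the tolerance are all made-up assumptions, not anything from the paper), we keep the candidate measures whose overlap values reproduce the observed frequencies and then choose among them on pragmatic grounds:

```python
from scipy import stats  # assumes SciPy is available

# Hypothetical observed relative frequencies of the two macrostates at t1
F0, F1 = 0.25, 0.75

def overlap_prob(a, b, boundary=0.5):
    # mu(B(t1) ∩ M0) under a Beta(a, b) density on [0, 1], where M0 is
    # (for illustration) the region x < boundary and the blob B(t1) is
    # taken to cover the whole interval
    return stats.beta(a, b).cdf(boundary)

# The empirically adequate measures: those reproducing F0 (and hence F1)
# to a satisfactory approximation
candidates = [(a, b) for a in range(1, 6) for b in range(1, 6)]
adequate = [(a, b) for (a, b) in candidates
            if abs(overlap_prob(a, b) - F0) < 0.03]
print("empirically adequate candidates:", adequate)
# Among these, one measure is then chosen on pragmatic grounds
# (simplicity, convenience, meshing with other theories).
```

The point of the sketch is only that more than one measure can fit the frequencies, so the final choice among the adequate candidates is pragmatic.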

The above construction of transition probabilities brings out two central issues in the understanding of statistical mechanics. First, it is important to note that the transition probabilities given by the above rule supervene entirely on the interplay between the structure of trajectories in the phase space and the partition of the phase space into macrostates. In statistical mechanics both of these notions are objective: the structure of trajectories is fixed by the dynamics, and the partition into macrostates is determined by the structure of the accessible region of the phase space. For this reason the transition probabilities above can be understood as entirely objective, despite the fact that the underlying dynamics is deterministic. In this sense one may say that in statistical mechanics the probabilities describe objective ignorance, regardless of whether or not the underlying dynamics is deterministic (for further details see Hemmo and Shenker 2012, Ch. 5–6).

The second issue concerns the justification of the choice of measure, and more generally what justifies probabilistic statements in classical statistical mechanics. As we saw, probabilistic statements are grounded in the experience of relative frequencies, and therefore the choice of measure is based on inductive reasoning from the observed relative frequencies. In our view the fit with the observed relative frequencies is the only compelling argument for the choice of measure in statistical mechanics. In particular, it follows from the above construction of the transition probabilities that the measure chosen to express the probabilities of macrostates need not be the Lebesgue measure. However, it may happen that the Lebesgue measure is the most convenient measure among all the measures compatible with the relative frequencies of macrostates, and in this case the Lebesgue measure should be taken to express the transition probabilities in statistical mechanics.

In the literature the Lebesgue measure is standardly used to express probabilities over initial conditions rather than transition probabilities. There are various attempts to justify this choice of the Lebesgue measure over initial conditions; we consider some of them in Sect. 4. Here we focus on how to account for this choice in our framework. It is important to emphasize that in our framework the notion of transition probabilities is more fundamental than the notion of a probability distribution over initial conditions. That is, the notion of a probability distribution over initial conditions is derivative. Moreover, this notion has empirical content only because it can be derived from the transition probabilities between macrostates. Let us see why.

Suppose that the measure µ in terms of which we express the transition probabilities turns out to be invariant under the classical dynamics; suppose, for example, that µ happens to be the Lebesgue measure. Then one can map backwards (as it were) the Lebesgue measure of regions of the blob at later times to the corresponding regions of the initial macrostate. That is, in this case the Lebesgue measure of a set of points in \(M_0\) is equal to the Lebesgue measure of the time-evolved set to which it is mapped by the dynamics. Once the (normalized) invariant measure is fixed in this way (by the transition probabilities), one can distribute uniform probabilities relative to the Lebesgue measure over the initial macrostate. On the basis of this probability distribution over the initial conditions, which in our construction is derived from the transition probabilities, one can construct all the probabilistic statements in statistical mechanics. Note that in order to use the probability distribution over initial conditions for predictions of future macrostates, one must know the evolution of the blob \(B(t)\) at the relevant finite times. In some cases this may be relatively easy (for example, if the dynamics yields ergodic-like behavior at finite times), but at any rate the probability distribution over the initial conditions is never sufficient by itself for making the right predictions, which require knowing the dynamics. The probability distribution over initial conditions together with the dynamics yields the transition probabilities as given by our Probability Rule. In suitable cases this rule will entail that initial conditions that belong to sets of large measure have high probability of occurring. For example, if a set of initial conditions of a thermodynamic system that has measure one evolves to equilibrium in some time interval, then we shall say that with high probability the system satisfies the laws of thermodynamics.
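The backward-mapping step can be written out explicitly. Writing \(T_t\) for the flow generated by the dynamics (notation we introduce here for the sketch), the assumed invariance of µ says

$$ \mu \left( T_t^{-1}(A) \right) = \mu (A) \quad \text{for every measurable set } A . $$

Applying this to the overlap region \(A = B(t_1) \cap M_1\), whose preimage under \(T_{t_1}\) lies inside the initial macrostate \(M_0\), gives

$$ \mu \left( B(t_1) \cap M_1 \right) = \mu \left( T_{t_1}^{-1} \left( B(t_1) \cap M_1 \right) \right) , $$

so the transition probability to \(M_1\) equals the µ-measure of the set of initial microstates in \(M_0\) that evolve into \(M_1\). This is the sense in which a uniform distribution over \(M_0\) reproduces the transition probabilities.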

However, as we said, this interpretative move is derivative. One can understand the meaning of a probability distribution over initial conditions in this derivative way, namely as shorthand for the Probability Rule (in suitable cases), and thus make sense of a notion that otherwise seems empirically meaningless. The measure µ that appears in the Probability Rule is chosen so as to fit our observations of relative frequencies of macrostates. Whether or not this measure happens to be the Lebesgue measure, or more generally a measure that is invariant under the dynamics, is a contingent matter that depends on the observed relative frequencies. This shows that the two issues, the choice of measure and the primacy of transition probabilities over probabilities of initial conditions, are closely linked.

3 Measure-1 Theorems

The above account of probability in statistical mechanics has implications for the significance of measure-1 theorems. We focus here on Lanford's theorem as an example. Lanford proved, on the basis of the classical equations of motion, that, roughly, given a specific initial macrostate and a specific kind of Hamiltonian, a subset of Lebesgue measure one of the microstates in that macrostate will evolve to a macrostate of larger entropy after a certain short time. What is the significance of Lanford's theorem, given the above account of probabilities in statistical mechanics?

In terms of our transition probabilities, Lanford's theorem proves that the Lebesgue measure of the overlap between the blob \(B(t_0 + \Delta t)\) and the macrostate E of equilibrium (or some other high entropy macrostate) is 1. Of course, as we said above, since the Lebesgue measure is conserved under the dynamics, one may interpret Lanford's theorem as referring to the Lebesgue measure of subsets of the initial macrostate \(M_0\) at \(t_0\). However, the empirical significance of any such inference about the measure of subsets of the initial macrostate rests on the contingent fact that the Lebesgue measure matches the observed relative frequencies.

To appreciate this point, note that if the measure µ that matches our experience were not the Lebesgue measure but some other measure (one that need not be absolutely continuous with the Lebesgue measure), then Lanford's theorem would have a completely different significance: for instance, it could happen that by the measure µ the set of systems that go to equilibrium given Lanford's Hamiltonian would be small. The theorem that a set of Lebesgue measure one has a certain property (such as approaching equilibrium after some finite time interval) is empirically insignificant, unless it is supplemented by the additional fact that the Lebesgue measure happens to correspond (to a useful approximation) to the observed relative frequencies.
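A toy case (our illustration, not Lanford's) makes the logical gap vivid. Let \(A \subset M_0\) be the set of Lebesgue measure one whose members approach equilibrium, write λ for the Lebesgue measure, and let µ be a measure concentrated on a set \(N \subset M_0 \setminus A\) (so µ is not absolutely continuous with λ). Then

$$ \lambda(A) = 1 \quad \text{while} \quad \mu(A) = 0 , $$

so 'measure one' by λ is compatible with 'measure zero' by an alternative measure µ; only the match with observed frequencies can single out λ.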

4 The Typicality Approach

The above account of probability in statistical mechanics differs from a tradition in the literature (which we shall call the typicality approach) in the way in which the choice of measure is justified (see for example Dürr 2001; Goldstein 2012). According to the typicality approach, the choice of the Lebesgue measure in statistical mechanics is justified because the Lebesgue measure is natural. Of course, the main issue here is what is meant by ‘natural’. If by ‘natural’ one means that the Lebesgue measure matches the observed relative frequencies of macrostates, then our account (in Sect. 2) coincides with the typicality approach. But this is not what is usually meant by ‘natural’. We shall now list a few senses of ‘natural’ that are brought up in the literature and argue, very briefly, that none of them can justify the choice of measure in statistical mechanics.

(1) It is sometimes said that the Lebesgue measure in statistical mechanics is preferred because it is invariant under the classical dynamics, as expressed by Liouville's theorem. Invariance of the measure under the dynamics means that the measure of a given set of points in the state space is equal, at all times, to the measure of the set to which it is mapped by the time evolution equations. In the case of an ergodic dynamics this idea receives further support from a theorem according to which the Lebesgue measure is the only invariant measure among all the measures that give measure zero to all the sets of Lebesgue measure zero (the measures absolutely continuous with the Lebesgue measure).

Of course, a measure that is invariant under the dynamics has very attractive properties (simplicity, elegance, etc.). We already saw in Sect. 2 that if the measure µ that expresses the transition probabilities is taken to be the Lebesgue measure, then it is simple and perhaps natural to translate the transition probabilities into statements about a probability distribution over initial conditions. In the typicality approach the measure is used to determine the sizes of subsets of initial conditions, but in this approach the fact that the measure is invariant under the dynamics (even if uniquely so) is taken to be a merit in itself. As we said in Sect. 2, the fact that the measure is invariant under the dynamics does not entail that it is the measure that justifies probabilistic statements, unless one grounds this choice in the transition probabilities between macrostates.

(2) In the case of an ergodic system the Lebesgue measure is related to the relative frequencies of macrostates via the ergodic theorem (proved by Birkhoff and von Neumann), and this seems to give further justification for using the Lebesgue measure to determine probabilities. According to the ergodic theorem, the Lebesgue measure of a macrostate is equal to the relative frequency of that macrostate in the infinite time limit (for a set of initial conditions of Lebesgue measure one).
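In the form relevant here, Birkhoff's pointwise ergodic theorem says that for an ergodic flow \(T_t\) preserving the normalized Lebesgue measure λ, the long-run fraction of time a trajectory spends in a macrostate M equals the measure of M:

$$ \lim_{T \to \infty} \frac{1}{T} \int_0^T \mathbf{1}_{M} \left( T_t x \right) dt = \lambda(M) $$

for every initial condition x outside a set of Lebesgue measure zero. The bracketed clause in the statement above is exactly this exceptional measure-zero set.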

Here the main problem (in relying on the ergodic theorem to justify the choice of the Lebesgue measure as the one that determines the probabilities in statistical mechanics) is the clause in the brackets: namely, the relative frequency of a macrostate in the infinite time limit is equal to the Lebesgue measure of that macrostate only for a set of initial conditions of Lebesgue measure one. The measure µ that defines ergodicity, although it may formally be interpreted as a probability measure (since it is a normalized sigma-additive measure), need not be interpreted as probability. Therefore, in understanding the ergodic theorem as a theorem about probability, one must stipulate from the outset that a set of Lebesgue measure one has probability one. But such an identification is not part of the ergodic theorem. Any attempt to use measure-1 theorems in statistical mechanics as grounding probability statements will be subject to the same criticism (see our discussion of Lanford's theorem in the previous section).

(3) Another argument sometimes given for taking the Lebesgue measure as the natural measure of probability in statistical mechanics is that the Lebesgue measure of a macrostate corresponds to the thermodynamic entropy of that macrostate. Even if this argument were generally true, it would not entail that the measure of entropy (which is the measure of the macrostate of the system) should be the same as the measure of probability. Indeed, if the probability of a macrostate were invariably given by its Lebesgue measure, then systems would evolve with high probability from any low entropy state directly to equilibrium, and not via a sequence of macrostates of gradually increasing entropy.

5 Typicality in Bohmian Mechanics

The question of typicality has been considered extensively and explicitly not only in classical statistical mechanics, but also in the context of Bohm’s theory by Dürr et al. (1992, hereafter DGZ). DGZ prove that a typical Bohmian trajectory exhibits relative frequencies of measurement outcomes that conform to the quantum mechanical Born rule. Here is what they say about their proof:

…To demonstrate the compatibility of Bohmian mechanics with the predictions of the quantum formalism, we must show that for at least some choice of initial universal Ψ and q, the evolution [given by Bohm’s velocity equation] leads to apparently random pattern of events, with empirical distribution given by the quantum formalism. In fact we show much more.

We prove that for every initial ψ this agreement with the predictions of the quantum formalism is obtained for typical – i.e. for the overwhelming majority of – choices of initial q. And the sense of typicality here is with respect to the only mathematically natural – because equivariant – candidate at hand, namely, quantum equilibrium.

Thus, on the universal level, the physical significance of quantum equilibrium is as a measure of typicality, and the ultimate justification of the quantum equilibrium hypothesis is, as we shall show, in terms of the statistical behavior arising from a typical initial configuration. (DGZ 1992, p. 859)

In other words, DGZ argue that for every initial universal wavefunction and for a typical initial global configuration, the probability distribution over the Bohmian positions of subsystems of the universe is given by the absolute square of the effective wavefunction of the subsystems (when an effective wavefunction exists). From this result they show that the conditions sufficient for the laws of large numbers to hold are satisfied in Bohmian mechanics, with a probability distribution that recovers the predictions of standard quantum mechanics as given by Born's rule. Here the notion of a typical global configuration is understood relative to the quantum mechanical measure, i.e. the absolute square of the universal wavefunction.
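The equivariance property that DGZ invoke as making this measure ‘natural’ can be stated compactly: if the configuration is distributed according to \(|\Psi|^2\) at some initial time, the Bohmian dynamics preserves this form at all times,

$$ \rho_{t_0}(q) = |\Psi_{t_0}(q)|^2 \;\Longrightarrow\; \rho_{t}(q) = |\Psi_{t}(q)|^2 \quad \text{for all } t , $$

since the distribution \(\rho_t\) transported by the Bohmian velocity field satisfies the same continuity equation as \(|\Psi_t|^2\) does under the Schrödinger evolution.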

What is the role of the typicality assumption in this argument? Typicality is meant to replace the notion of a probability distribution over the initial conditions of the universe, and thus to explain the initial conditions while avoiding the fairy tale of a random or probabilistic choice of initial conditions (see Goldstein 2012). As we argued above, there are two problems with this approach. First, if the notion of typicality is non-probabilistic, it is unclear why a condition which is true of most initial conditions (relative to the measure) should obtain for a given system. The problem is to justify the choice of the measure of typicality (in this case, the absolute square of the universal wavefunction) in a non-circular way. The argument given by DGZ for preferring the quantum mechanical measure as natural is that it is equivariant under the dynamics of the wavefunction and the Bohmian dynamics of the initial q. However, this argument is irrelevant here, just as the classical Liouville's theorem is irrelevant as an argument for preferring the Lebesgue measure as the measure of probability in the case of ergodic dynamics.

It is important to stress that this criticism concerning the notion of typicality does not in any way undermine the important results by DGZ concerning the probabilistic content of Bohm's theory. In fact, it seems to us that one can retain all these results by making the simpler assumption that the initial condition of the universe just is—as a matter of fact—of the kind that yields the quantum mechanical predictions. This assumption yields the strongest support for the probabilistic content of Bohm's theory. No further justification is needed. Moreover, no further empirical justification is possible, since we have no empirical access to the initial configuration of the universe or to the universal wavefunction. Rather, we construct the set of initial conditions and the wavefunction for subsystems of the universe in a way that yields the quantum mechanical Born rule distribution (the absolute square of the (effective) wavefunction of subsystems of the universe in Bohm's theory), where the Born rule itself is subject to empirical tests in our experience. We then find a measure of typicality over the set of all possible configurations of the universe—the absolute square of the amplitude of the universal wavefunction—according to which the right sort of initial universal conditions turn out to be typical. But insofar as explanation is concerned, such a procedure is viciously circular, since we choose the measure precisely so that the properties in which we are interested are shared by the vast majority of the admissible initial conditions. And this leads us back to the same questions we encountered in the case of statistical mechanics in the previous section.

6 Conclusion

Typicality considerations appear in a variety of contexts in physics. Here are two further examples. One is Einstein's (1905) account of Brownian motion, as developed by Wiener (see Pitowsky 1992). As is well known, Wiener proved that the set of trajectories of a Brownian particle that are continuous but nowhere differentiable has Wiener measure one. The explanation of the actual behavior of Brownian particles is based on the assumption that their actual trajectories belong to this set of measure one; Avogadro's number is derived from this assumption. Another, more recent, example in quantum mechanics is an argument by Popescu et al. (2005) and (independently) by Goldstein et al. (2006), according to which the near uniform (or maximally mixed) state of a subsystem is explained by the fact that almost all pure states in the Hilbert space of the universe induce on a small enough subsystem a reduced state which is very close (in trace norm) to the maximally mixed state (see also Pitowsky 2012).

We argued that typicality considerations are not justified as grounds for probability statements in statistical mechanics and in Bohmian mechanics. These theories, however, are deterministic. The question remains whether the above criticism applies to stochastic approaches to understanding quantum mechanics. In such approaches it might be that the choice of measure that yields the right statistical mechanical probabilities for thermodynamic systems (for example) is invariably dictated by the probabilities in the stochastic dynamics. This may have interesting implications for the idea of typicality. We leave this question open for future research.