1 Introduction

By the end of June 1902, just after being accepted as Technical Assistant level III at the Federal Patent Office in Bern, the 23-year-old Albert Einstein sent the renowned journal Annalen der Physik a manuscript with the bold title “Kinetic Theory of Thermal Equilibrium and of the Second Law of Thermodynamics” [1]. In the introduction, he explains that he wishes to fill a gap in the foundations of the general theory of heat, “for one has not yet succeeded in deriving the laws of thermal equilibrium and the second law of thermodynamics using only the equations of mechanics and the probability calculus”. He also announces “an extension of the second law that is of importance for the application of thermodynamics”. Finally, he will provide “the mathematical expression of the entropy from the standpoint of mechanics”. Einstein’s papers and their translations are available on the Princeton University Press site [2].

In the following two years Einstein pursued this line of research, publishing a paper each year [3, 4]. The third one, entitled “On the general molecular theory of heat”, submitted on March 27, 1904, opened a new path by tacitly extending the results obtained for a general mechanical system (with a large, but finite, number of degrees of freedom) to the case of black-body radiation. In pursuing this line of research Einstein found an unexpected result, which pointed to an inconsistency between the current understanding of the processes of light emission and absorption and the statistical approach. To resolve this inconsistency, in the first paper [5] of his “Annus Mirabilis” 1905, Einstein renounced the detailed picture of light emission and absorption provided by Maxwell’s equations, while maintaining his statistical approach, in particular the statistical interpretation of entropy. He therefore introduced the concept of light quanta, presented from a “heuristic point of view”.

The importance of the 1902–1904 papers on the molecular theory of heat in Einstein’s intellectual development and in the advance of physics was stressed by Kuhn [6, p. 171], who states that

What brought Einstein to the blackbody problem in 1904 and to Planck in 1906 was the coherent development of a research program begun in 1902, a program so nearly independent of Planck’s that it would almost certainly have led to the blackbody law even if Planck had never lived.

In spite of their importance, the 1902–1904 papers have received comparatively little attention. One of the reasons was the publication in 1902 of Gibbs’ Elementary Principles in Statistical Mechanics [7]. This book is considered, especially since the publication of the influential book by Tolman [8], as the founding text of the discipline. Einstein himself contributed to the neglect of the 1902–1904 papers. In his answer to Paul Hertz’ criticism of his derivation of the second principle [9], he says

I only wish to add that the road taken by Gibbs in his book, which consists in one starting directly from the canonical ensemble, is in our opinion preferable to the road I took. If I had known Gibbs’ book at that time, I would have not published these papers at all, but I would have limited myself to the treatment of a few points.

In his scientific autobiography [10, p. 47] Einstein returned to this point, saying

Not acquainted with the earlier investigations by Boltzmann and Gibbs, which had appeared earlier and actually exhausted the subject, I developed the statistical mechanics and molecular-kinetic theory of thermodynamics which was based on the former. My major aim in this was to find facts which would guarantee as much as possible the existence of atoms of definite size.

The last sentence of this quotation highlights the different attitude of Einstein with respect to Gibbs. Einstein aims at using the statistical approach to establish the reality of atoms, while Gibbs aims at a rational foundation of thermodynamics, and consequently focuses on the regularities which emerge in systems with many degrees of freedom. Einstein’s papers contain a more direct and fundamental approach to the statistical mechanics of equilibrium, and could actually suggest a didactically effective path to the introduction of the fundamental ideas of the field. We shall therefore attempt to ease their reading by summarizing them, pointing out in particular the differences between Einstein’s and Gibbs’ points of view. We shall not try to discuss all the detailed analyses of the papers which have appeared in the literature (beyond Kuhn’s work [6], one can also read [11–16]), but shall only refer to the more interesting observations.

2 Kinetic Theory of Thermal Equilibrium and of the Second Principle of Thermodynamics

The first two papers [1, 3] have a very similar structure. The second paper aims to widen the scope of the first, by attempting to consider “general” dynamical systems and irreversible processes. We shall follow the first paper, and we shall then briefly review the points in which the second paper differs. We adapt Einstein’s discussion to modern notation.

Einstein begins by considering a general physical system as represented by a mechanical system with many coordinates \(q=(q_{1},\ldots ,q_{n})\) and the corresponding momenta \(p=(p_{1},\ldots ,p_{n})\), obeying the canonical equations of motion with a time-independent Hamiltonian that is the sum of a potential energy (a function of the q’s alone) and of a kinetic energy that is a quadratic function of the p’s, whose coefficients are arbitrary functions of the q’s (and which is implicitly assumed to be positive definite). Following Gibbs, we shall call the p’s and q’s collectively the phase variables, and the space they span the phase space. Einstein then considers a very large number N of such systems, with the same Hamiltonian, whose energies E lie between two very close values \(\overline{E}\) and \(\overline{E}+\delta E\). He then looks for the stationary distribution of these systems in phase space.

Here Einstein introduces a strong mechanical hypothesis by assuming that, apart from the energy, there is no other function defined on the phase space that is constant in time. He argues that this condition is equivalent to the requirement that the stationary distribution of the systems in phase space depends only on the value of the energy. He indeed proves that if there are other functions \(\phi (q,p)\) that are constants of the motion, the stationary distribution is not uniquely identified by the value of the energy, but does not attempt to prove the converse. He then shows that Liouville’s theorem implies that the local density of systems in phase space is constant in time and therefore, by the mentioned hypothesis, must be a function of the energy alone. Since the energies of all N systems are infinitely close to one another, this density must be uniform on the region of phase space defined by the corresponding value of the Hamiltonian. In this way Einstein has defined what is now called the microcanonical ensemble, i.e., the distribution in phase space which is uniform when the energy of the system lies between two closely lying values, and vanishes otherwise.
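
Liouville’s theorem, the one genuinely mechanical ingredient of this argument, is easy to probe numerically. The following sketch (ours, not Einstein’s; it assumes a simple pendulum Hamiltonian and the standard leapfrog integrator) estimates by finite differences the Jacobian determinant of the map from initial to evolved phase-space coordinates, and finds it equal to one: the flow preserves phase-space volume.

```python
import numpy as np

# Pendulum, H(q, p) = p**2/2 - cos(q); leapfrog (symplectic) integration.
def leapfrog(q, p, dt=0.01, steps=500):
    for _ in range(steps):
        p -= 0.5 * dt * np.sin(q)   # half kick: dp/dt = -dH/dq = -sin(q)
        q += dt * p                 # drift:     dq/dt =  dH/dp =  p
        p -= 0.5 * dt * np.sin(q)   # half kick
    return q, p

def jacobian_det(q0, p0, h=1e-6):
    # Finite-difference Jacobian of the flow map (q0, p0) -> (q(t), p(t)).
    qQ, qP = leapfrog(q0 + h, p0), leapfrog(q0 - h, p0)
    pQ, pP = leapfrog(q0, p0 + h), leapfrog(q0, p0 - h)
    a, c = (qQ[0] - qP[0]) / (2 * h), (qQ[1] - qP[1]) / (2 * h)
    b, d = (pQ[0] - pP[0]) / (2 * h), (pQ[1] - pP[1]) / (2 * h)
    return a * d - b * c

print(jacobian_det(1.0, 0.3))  # ~1.0: phase-space volume is conserved
```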

Einstein now turns to the consideration of thermal equilibrium between a system \(\mathsf {S}\) and a considerably larger one, \(\varSigma \). The second system acts as a thermal reservoir, and the first one as a thermometer. He assumes that the total energy \(\mathscr {E}\) of the global system \(\mathsf {S}\cup \varSigma \) can be written as

$$\begin{aligned} \mathscr {E}=E+H, \end{aligned}$$
(1)

up to negligible terms, where E pertains to \(\mathsf {S}\) and H to \(\varSigma \). Let the phase variables of \(\mathsf {S}\) be denoted by (p, q) and those of \(\varSigma \) by \((\pi ,\chi )\). The question is now to find the distribution of the phase variables of \(\mathsf {S}\) when the energy of the global system lies between \(\mathscr {E}_{0}\) and \(\mathscr {E}_{0}+\delta \mathscr {E}\), while the phase variables of \(\varSigma \) can take on any values. As pointed out by Uffink [15], this problem was considered several times by Boltzmann, who almost always solved it by taking an ideal gas for \(\varSigma \) and explicitly evaluating the resulting phase-space integral. Einstein instead introduces an elegant trick which leads directly to the desired result. Let us consider an infinitesimally small domain g in the phase space of the global system \(\mathsf {S}\cup \varSigma \), with energy \(\mathscr {E}\) between \(\mathscr {E}_{0}\) and \(\mathscr {E}_{0}+\delta \mathscr {E}\). Then the number dN of systems of the ensemble which are found in g is

$$\begin{aligned} d N = A \int _{g}d p\,d q\; d\pi \, d\chi , \end{aligned}$$
(2)

where A is a constant. Actually one can choose instead of A any function of the total energy \(\mathscr {E}\) which takes the value A for \(\mathscr {E}=\mathscr {E}_{0}\). Let us thus set

$$\begin{aligned} A = A^{\prime } \,e^{-\beta \, \mathscr {E}_{0}}=A^{\prime }\,e^{-\beta \,E}e^{-\beta \,H }, \end{aligned}$$
(3)

where \(\beta \) is a constant. Thus the number \(d N^{\prime }\) of systems such that the phase variables of \(\mathsf {S}\) lie in a region of volume \(d p\;d q\) around the point (p, q), while the variables of \(\varSigma \) can have any value, as long as \(\mathscr {E}\) lies between \(\mathscr {E}_{0}\) and \(\mathscr {E}_{0}+\delta \mathscr {E}\), is given by

$$\begin{aligned} d N^{\prime } = A^{\prime } e^{-\beta E}\,d p\, d q\int e^{-\beta H}\,d \pi \, d \chi , \end{aligned}$$
(4)

where the integral runs over all values of the phase variables of \(\varSigma \) such that the values of its Hamiltonian H lie between \(H_{0}\) and \(H_{0}+\delta \mathscr {E}\), and

$$\begin{aligned} H_{0}=\mathscr {E}_{0}-E. \end{aligned}$$
(5)

The value of the constant \(\beta \) can be fixed by requiring that the integral appearing on the right-hand side of Eq. (4) be independent of E. Indeed, once \(\delta \mathscr {E}\) is fixed, the integral can be considered as a function \(\varPhi (H)\) of H alone. Thus, since \(E\ll \mathscr {E}_{0}\), we have

$$\begin{aligned} \varPhi (H_{0})=\varPhi (\mathscr {E}_{0}-E)\simeq \varPhi (\mathscr {E}_{0})-E\,\varPhi ^{\prime }(\mathscr {E}_{0}), \end{aligned}$$
(6)

where \(\varPhi ^{\prime }\) is the derivative of \(\varPhi \) with respect to its argument. Thus \(\varPhi ^{\prime }(\mathscr {E}_{0})=0\). However, we can write

$$\begin{aligned} \varPhi (H)=e^{-\beta H}\cdot \omega (H), \end{aligned}$$
(7)

where \(\omega (H)=\int d\pi _{1}\ldots d\chi _{n}\), with the integral extended to the region in phase space such that the energy of \(\varSigma \) lies between H and \(H+\delta \mathscr {E}\). The condition now reads

$$\begin{aligned} e^{-\beta \mathscr {E}_{0}}\omega (\mathscr {E}_{0})\left[ -\beta +\frac{\omega ^{\prime }(\mathscr {E}_{0})}{\omega (\mathscr {E}_{0})}\right] =0, \end{aligned}$$
(8)

where \(\omega ^{\prime }\) is the derivative of \(\omega \) with respect to its argument. We therefore obtain the required condition for \(\beta \) in the form

$$\begin{aligned} \beta = \frac{\omega ^{\prime }(\mathscr {E}_{0})}{\omega (\mathscr {E}_{0})}. \end{aligned}$$
(9)
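
A minimal numerical illustration of Eq. (9), under the additional assumption (not made by Einstein, who keeps the reservoir completely general) that \(\varSigma \) is an ideal gas of N free particles, for which \(\omega (H)\propto H^{3N/2-1}\):

```python
import numpy as np

# Take for the reservoir an ideal gas of N free particles in 3D, so that the
# shell volume scales as omega(H) ~ H**(3*N/2 - 1) (momentum-sphere geometry).
N = 1000                                 # reservoir size (our arbitrary choice)
k = 3 * N / 2 - 1                        # exponent of omega(H)

log_omega = lambda H: k * np.log(H)      # overall constants drop out of Eq. (9)

H0, dH = 7.5, 1e-6                       # arbitrary reservoir energy
beta = (log_omega(H0 + dH) - log_omega(H0 - dH)) / (2 * dH)  # omega'/omega
print(beta, k / H0)                      # both ~ 3N/(2*H0)
# With H0 = (3N/2) k_B T for large N, this gives beta = 1/(k_B T),
# cf. Eq. (13) below.
```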

Einstein now turns to show that the quantity \(\beta \) is always positive. He first derives a lemma by considering a general (positive definite) quadratic function \(\varphi (x_{1},\ldots ,x_{n})\) of n variables (where n is large enough), and defining the function z(y) by the integral

$$\begin{aligned} z(y)=\int d x_{1}\ldots d x_{n}, \end{aligned}$$
(10)

where the integral is extended to all points for which \(\varphi \) lies between y and \(y+\Delta \), where \(\Delta \) is fixed. He then easily shows that, for \(n\ge 3\), z(y) is an increasing function of y. Let us now denote by \(\Gamma (H)\) the phase space available to the larger system \(\varSigma \) when the values of its Hamiltonian lie between H and \(H+\delta \mathscr {E}\). The Hamiltonian of \(\varSigma \) is given by the sum of the potential energy, that depends only on the coordinates, and of the kinetic energy, which is a quadratic form in the momenta, whose coefficients depend only on the coordinates. Let \(H_{0}\) and \(H_{1}\) be two values of H, with \(H_{1}>H_{0}\), and let \(\Gamma (H_{0})\) and \(\Gamma (H_{1})\) be the corresponding available space regions. Let \(Q(H_{0})\) be the region of coordinate space such that the potential energy of the system is smaller than \(H_{0}\). Thus if the point \((\pi ,\chi )\) belongs to \(\Gamma (H_{0})\), the point \((\chi )\) belongs to \(Q(H_{0})\). Within \(\Gamma (H_{1})\) let us identify the region \(\Gamma ^{\prime }(H_{1})\) where the coordinates \(\chi \) belong to \(Q(H_{0})\). Thus, for each such values of the coordinates, since the total energy is larger than \(H_{0}\), the kinetic energy must be larger. Therefore, by the lemma on the monotonic increase of z(y) with y, for each such point in coordinate space, the volume available to the momenta is larger for \(\Gamma ^{\prime }(H_{1})\) than for \(\Gamma (H_{0})\). Integrating over the coordinates we obtain that the volume of \(\Gamma ^{\prime }(H_{1})\) must be larger than that of \(\Gamma (H_{0})\). Since the volume of the region of \(\Gamma (H_{1})\) that does not belong to \(\Gamma ^{\prime }(H_{1})\) cannot be negative, the volume of \(\Gamma (H_{1})\) must be larger than that of \(\Gamma (H_{0})\), i.e., the function \(\omega (H)\) increases with H, and \(\beta \) given by the above expression must be positive.

Einstein now derives what is currently known as the zeroth law of thermodynamics. Since \(\beta \) depends only on the state of \(\varSigma \), but determines the distribution of \(\mathsf {S}\) in state space, independently of how \(\varSigma \) and \(\mathsf {S}\) interact, it follows that if a given system \(\varSigma \) interacts with two small systems \(\mathsf {S}\) and \(\mathsf {S}^{\prime }\) and is in equilibrium with them, \(\mathsf {S}\) and \(\mathsf {S}^{\prime }\) must have the same value of \(\beta \). In particular, if \(\mathsf {S}\) and \(\mathsf {S}^{\prime }\) are mechanically identical, the average value of any arbitrary observable function A(p, q) must be equal in \(\mathsf {S}\) and \(\mathsf {S}^{\prime }\). Einstein then calls \(\mathsf {S}\) and \(\mathsf {S}^{\prime }\) thermometers, \(\beta \) the temperature function and the average of A the temperature measure. Then Einstein goes on to prove the converse result, namely that if two systems having the same values of \(\beta \) are put in contact, they will be in thermal equilibrium. He considers two systems, \(\varSigma _{1}\) and \(\varSigma _{2}\), weakly interacting. Let each of them be in contact with an (infinitesimally) small thermometer, \(\mathsf {S}_{1}\) and \(\mathsf {S}_{2}\) respectively. The temperature measures \(A_{1}\) and \(A_{2}\) in each thermometer will be the same, since we are in fact dealing with a single interacting system in thermal equilibrium, and therefore also the corresponding temperature functions \(\beta _{1}\) and \(\beta _{2}\) will be equal. Let the interaction terms between \(\varSigma _{1}\) and \(\varSigma _{2}\) be slowly brought to zero. Then the readings of the thermometers will remain equal, but now the reading of \(\mathsf {S}_{1}\) refers only to \(\varSigma _{1}\) and that of \(\mathsf {S}_{2}\) only to \(\varSigma _{2}\). The process is reversible, since we are dealing with a sequence of thermal equilibrium states. Thus, by reversing it, we obtain the required result. As an immediate consequence, we obtain that if \(\varSigma _{1}\) and \(\varSigma _{2}\) are in thermal equilibrium, and so are \(\varSigma _{2}\) and \(\varSigma _{3}\), then \(\varSigma _{1}\) and \(\varSigma _{3}\) are in thermal equilibrium, since they share the same value of \(\beta \). Einstein concludes this section with the intriguing remark:

I would like to note here that until now we have made use of the assumption that our systems are mechanical only inasmuch as we have applied Liouville’s theorem and the energy principle. Probably the basic laws of the theory of heat can be developed for systems that are defined in a much more general way. We will not attempt to do this here, but will rely on the equations of mechanics. We will not deal here with the important question as to how far the train of thought can be separated from the model employed and generalized.

Uffink [15] has remarked that “this quote indicates (with hindsight) a remarkable underestimation of the logical dependence of [Einstein’s] approach on the ergodic hypothesis.” But the passage shows, as also stressed by Uffink, that already in 1902 Einstein was considering the need to extend the statistical approach beyond its application to mechanical systems, no matter how generally they might be conceived.

A simple calculation allows Einstein to derive the equipartition theorem in the following form. Let the kinetic energy of a system be represented by a quadratic expression of the form

$$\begin{aligned} K=\frac{1}{2}\left( \alpha _{1}p_{1}^{2}+\cdots +\alpha _{n}p_{n}^{2}\right) , \end{aligned}$$
(11)

where \(\alpha _{i}\), \(i=1,\ldots ,n\), are positive constants or functions of the coordinates q. This form can always be reached from a general quadratic expression by a suitable canonical transformation. Such transformed p variables had been dubbed “momentoids” by Boltzmann. Then the average of K at equilibrium is given by

$$\begin{aligned} \left<{K}\right>=\frac{n}{2\beta }. \end{aligned}$$
(12)

In particular, this result implies that the kinetic energy of a single molecule in an ideal gas in three-dimensional space is equal to \(3/(2\beta )\) on average. Kinetic theory teaches us that this quantity is proportional to the product of the pressure and the volume per particle in an ideal gas. Since this is proportional to the absolute temperature T, we obtain

$$\begin{aligned} \frac{1}{\beta }=k_{\mathrm {B}} T=\frac{\omega (H)}{\omega ^{\prime }(H)}, \end{aligned}$$
(13)

where \(k_{\mathrm {B}}\) is Boltzmann’s constant and \(\omega (H)\) is the volume of phase space contained between the energy surfaces of \(\varSigma \) corresponding to the values H and \(H+\delta \mathscr {E}\).
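
Equation (12) is easy to verify by direct sampling. The sketch below (ours; the coefficients \(\alpha _{i}\) and the value of \(\beta \) are arbitrary choices) draws momenta from the canonical weight \(e^{-\beta K}\) and compares the sampled mean kinetic energy with \(n/(2\beta )\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 12, 2.5                        # degrees of freedom, 1/(k_B T) (arbitrary)
alpha = rng.uniform(0.5, 3.0, size=n)    # coefficients of the quadratic form (11)

# The canonical weight exp(-beta*K), K = (1/2) sum_i alpha_i p_i**2, factorizes
# into independent Gaussians of variance 1/(beta*alpha_i): sample directly.
p = rng.normal(scale=1.0 / np.sqrt(beta * alpha), size=(200_000, n))
K = 0.5 * (alpha * p**2).sum(axis=1)

print(K.mean(), n / (2 * beta))          # Monte Carlo average vs Eq. (12)
```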

Having found the relation between \(\beta \) and the temperature, Einstein proceeds to the derivation of the second law of thermodynamics, which he here limits to the statement of the integrability of heat divided by the absolute temperature. He switches back to a Lagrangian setting, describing the system’s state by the coordinates q and their time derivatives \(\dot{q}\), and introduces externally applied forces. These forces are split into ones derived from a potential depending on the q’s, and others that allow for heat transfer. The first ones are assumed to vary slowly with time, while the second ones change very rapidly. The infinitesimal heat \(\delta Q\) is defined as the work of the second type of forces. Then a reversible transformation is one in which the system is led from an equilibrium state with given values of \(\beta \) and of the volume V to one with the values \(\beta +\delta \beta \) and \(V+\delta V\). Here Einstein tacitly assumes that the time average of the relevant quantities in a slow transformation can be obtained by averaging the same quantity over the distribution of the N systems in phase space. He thus finds that

$$\begin{aligned} \frac{\delta Q}{T}=d\left( \frac{\left<{E}\right>-F}{T}\right) , \end{aligned}$$
(14)

where \(\left<{E}\right>\) is the average total energy of the system, and F is a constant introduced so that the distribution \(P(p, q)= e^{\beta (F-E(p,q))}\) is normalized. Einstein remarks that this expression contains the total energy, and is independent of its splitting into kinetic and potential terms. One can readily integrate this expression, obtaining an explicit form of the entropy S:

$$\begin{aligned} S =\frac{\left<{E}\right>-F}{T}=\frac{\left<{E}\right>}{T}+k_{\mathrm {B}}\log \int e^{-\beta E(p,q)}\,d p\,d q+\text {const.} \end{aligned}$$
(15)
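
As a consistency check of Eq. (15), the following sketch (ours, for the special case of a monatomic ideal gas, whose phase integral is known in closed form) verifies numerically that the resulting S(T, V) satisfies the relations \(T\,\partial S/\partial T=C_{V}\) and \(\partial S/\partial V=p/T\), which express \(\delta Q/T=dS\):

```python
import numpy as np

# Monatomic ideal gas of N (distinguishable) particles, units m = k_B = 1:
# the phase integral is Z(T, V) = V**N * (2*pi*T)**(3*N/2), and <E> = (3/2) N T.
# Additive constants in S drop out of the derivatives checked below.
N = 100.0

def S(T, V):                             # S/k_B from Eq. (15)
    logZ = N * np.log(V) + 1.5 * N * np.log(2 * np.pi * T)
    return 1.5 * N + logZ                # <E>/(k_B T) + log Z

T, V, h = 1.7, 3.0, 1e-6
dS_dT = (S(T + h, V) - S(T - h, V)) / (2 * h)
dS_dV = (S(T, V + h) - S(T, V - h)) / (2 * h)
print(T * dS_dT, 1.5 * N)                # T dS/dT = C_V/k_B = (3/2) N
print(dS_dV, N / V)                      # dS/dV = p/(k_B T) = N/V
```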

Now Einstein states the announced generalization of the second principle. It is worth quoting this short paragraph in its entirety. Einstein denotes by \(V_{a}\) the potential of the conservative forces performing the reversible transformation. He then states

No assumptions had to be made about the nature of the forces that correspond to the potential \(V_{a}\) [the conservative ones], not even that such forces occur in nature. Thus, the mechanical theory of heat requires that we arrive at correct results if we apply Carnot’s principle to ideal processes, which can be produced from the observed processes by introducing arbitrarily chosen \(V_{a}\)’s. Of course, the results obtained from the theoretical consideration of those processes can have real meaning only when the ideal auxiliary forces \(V_{a}\) no longer appear in them.

Thus the strategy that led, in the 1905 paper on Brownian motion, to the establishment of the Einstein relation is already sketched here.

3 A Theory of the Foundations of Thermodynamics

In his 1903 memoir, entitled “A theory of the foundations of thermodynamics” [3], Einstein asks whether kinetic theory is essential for the derivation of the postulates of thermal equilibrium and of the entropy concept, or whether “assumptions of a more general nature” could be sufficient. He therefore goes on to consider a general dynamical system whose state is identified by a collection p of variables \(p=(p_{1},\ldots ,p_{n})\), which correspond to both coordinates and momenta for a mechanical system, and which evolve according to a general system of equations of motion of the kind

$$\begin{aligned} \frac{d p_{i}}{d t}=\varphi _{i}(p_{1},\ldots ,p_{n});\qquad i=1,\ldots ,n. \end{aligned}$$
(16)

Assuming that the system allows for a unique integral of motion, the energy E(p), he then introduces the equilibrium postulate, according to which a “physical system” eventually reaches a time-independent macroscopic state, in which any “perceptible quantity” assumes a time-independent value. Einstein then looks for the stationary distribution of a collection of N systems, with N large. Each system evolves according to Eqs. (16) and has an energy between E and \(E+\delta E\). He claims that the equilibrium postulate, along with the absence of any integral of motion beyond the energy, implies the existence of a well-defined probability distribution in p-space. Einstein’s argument reads

Starting at an arbitrary point of time and throughout time \(\mathscr {T}\), we consider a physical system which is represented by the Eqs. (16) and has the energy E. If we imagine having chosen some arbitrary region \(\Gamma \) of the state variables \(p_{1}\ldots p_{n}\), then at a given instant of time \(\mathscr {T}\) the values of the variables \(p_{1}\ldots p_{n}\) will lie within the chosen region \(\Gamma \) or outside it; hence, during a fraction of the time \(\mathscr {T}\), which we will call \(\tau \), they will lie in the chosen region \(\Gamma \). Our condition then reads as follows: If the \(p_{1}\ldots p_{n}\) are state variables of a physical system, i.e., of a system that assumes a stationary state, then for each region \(\Gamma \) the quantity \(\tau /\mathscr {T}\) has a definite limiting value for \(\mathscr {T}=\infty \). For each infinitesimally small region this value is infinitesimally small.

Thus the stationary distribution is identified by a function \(\varepsilon (p_{1},\ldots , p_{n})\) such that the number dN of systems which at any given instant in time are found in the infinitesimal region g located around \((p_{1},\ldots ,p_{n})\) is given by

$$\begin{aligned} d N = \varepsilon (p_{1},\ldots ,p_{n})\,d p_{1}\ldots d p_{n}. \end{aligned}$$
(17)

If this is true at a given instant t, then at the infinitesimally later instant \(t+d t\) one has

$$\begin{aligned} d N_{t+d t}=d N_{t}-\left( \sum _{\nu =1}^{n}\frac{\partial (\varepsilon \varphi _{\nu })}{\partial p_{\nu }}\right) d t\,d p_{1}\ldots d p_{n}. \end{aligned}$$
(18)

Since \(d N_{t+d t}=d N_{t}\), by the stationarity of the distribution, one must have

$$\begin{aligned} \sum _{\nu =1}^{n}\frac{\partial (\varepsilon \varphi _{\nu })}{\partial p_{\nu }}=0. \end{aligned}$$
(19)

Then

$$\begin{aligned} -\sum _{\nu =1}^{n}\frac{\partial \varphi _{\nu }}{\partial p_{\nu }}=\sum _{\nu =1}^{n}\frac{\partial \log \varepsilon }{\partial p_{\nu }}\varphi _{\nu }=\frac{d\log \varepsilon }{d t}. \end{aligned}$$
(20)

The solution of Eq. (20) is

$$\begin{aligned} \varepsilon =\exp \left[ -\int d t\;\sum _{\nu =1}^{n}\frac{\partial \varphi _{\nu }}{\partial p_{\nu }}+\psi (E)\right] , \end{aligned}$$
(21)

where \(\psi (E)\) is a time-independent integration constant that, by the previous hypotheses, can only depend on the p’s via the energy E. One thus obtains

$$\begin{aligned} \varepsilon =\text {const.}\times \exp \left[ -\int dt\;\sum _{\nu =1}^{n}\frac{\partial \varphi _{\nu }}{\partial p_{\nu }}\right] =\text {const.}\; e^{-m}, \end{aligned}$$
(22)

where m is given by

$$\begin{aligned} m = \int dt\;\sum _{\nu =1}^{n}\frac{\partial \varphi _{\nu }}{\partial p_{\nu }}. \end{aligned}$$
(23)

Einstein now assumes that it is possible to introduce new state variables, denoted by \(\pi _{1},\ldots ,\pi _{n}\), such that the factor \(e^{-m}\) is cancelled by the Jacobian of the transformation. With this transformation, one obtains a uniform stationary distribution in phase space. However, it is clear that this transformation cannot be performed unless m is time-independent, which implies \(d(\log \varepsilon )/d t=0\) throughout, i.e., a form of Liouville’s theorem. Einstein realized this oversight in March 1903, as witnessed by a letter to Michele Besso [23, Vol. 5, Doc. 7], quoted by Uffink [15]:

If you look at my paper more closely, you will find that the assumption of the energy principle & of the fundamental atomistic idea alone does not suffice for an explanation of the second law; instead, coordinates p must exist for the representation of things, such that for every conceivable total system \(\sum \partial \phi _{\nu }/\partial p_{\nu }=0\). [...] If that is true, then the entire generalization attained in my last paper consists in the elimination of the concept of force as well as in the fact that E can possess an arbitrary form (not completely)?
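
The condition \(\sum \partial \phi _{\nu }/\partial p_{\nu }=0\) singled out in the letter holds identically when the state variables form canonical coordinate-momentum pairs. A short symbolic check (ours), for a generic one-degree-of-freedom Hamiltonian:

```python
import sympy as sp

q, p = sp.symbols('q p')
H = sp.Function('H')(q, p)          # generic (smooth) Hamiltonian

phi_q = sp.diff(H, p)               # dq/dt =  dH/dp
phi_p = -sp.diff(H, q)              # dp/dt = -dH/dq

divergence = sp.diff(phi_q, q) + sp.diff(phi_p, p)
print(sp.simplify(divergence))      # 0: the mixed second derivatives cancel
```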

The sections that immediately follow, on the distribution of a system in contact with a reservoir, on the absolute temperature and thermal equilibrium, and on the definition of “infinitely slow” (quasistationary) processes, are not fundamentally different from the corresponding sections of the 1902 memoir. The derivation of the mechanical expression of the entropy is however slightly different, in particular because the possibility of resorting to the Lagrangian formulation is no longer available. Einstein considers a situation in which the functions \(\varphi _{\nu }\) which appear on the right-hand side of the Eqs. (16) depend not only on the coordinates \(p_{\nu }\), but also on some parameters \(\lambda \). He then considers an infinitely slow infinitesimal transformation, subdividing it into an isopycnic process, in which the \(\lambda \)’s are kept constant, but the system is put in thermal contact with a system at a different temperature, and an adiabatic process, in which the system is isolated, but the \(\lambda \)’s are allowed to vary. The energy change dE is given in general by

$$\begin{aligned} dE=\sum \frac{\partial E}{\partial \lambda }d\lambda +\sum _{\nu }\frac{\partial E}{\partial p_{\nu }}dp_{\nu }. \end{aligned}$$
(24)

In an isopycnic process the first term on the right-hand side of this equation vanishes, but the second term can be different from zero, since the equations of motion (16), which conserve E, do not hold when the system is not isolated. In an adiabatic process, on the other hand, the second term vanishes, since the equations of motion (16) satisfy energy conservation, but at the same time one has \(dQ=0\). One can therefore write in general

$$\begin{aligned} dQ=\sum _{\nu }\frac{\partial E}{\partial p_{\nu }}\,dp_{\nu }. \end{aligned}$$
(25)

Therefore, in the expression for the change of energy in an infinitely slow process given in Eq. (24), one can identify the second term on the right-hand side with the infinitesimal heat exchange dQ and the first one, accordingly, with the infinitesimal work. Einstein has thus obtained a mechanical expression of the first principle of thermodynamics.

Let us now denote by \(W(p_{1},\ldots ,p_{n})\) the probability distribution in phase space of the system when it is in equilibrium with an external body with a temperature function given by \(\beta \). As derived by Einstein in Sect. 3 of the paper, along the lines of his 1902 paper, it is given by

$$\begin{aligned} dW=e^{c-\beta E}\,dp_{1}\ldots dp_{n}, \end{aligned}$$
(26)

where the constant c is defined by the normalization condition

$$\begin{aligned} \int dW=\int e^{c-\beta E}\,dp_{1}\ldots dp_{n}=1. \end{aligned}$$
(27)

Let us assume that after the transformation, the system is in equilibrium with a body with temperature function \(\beta +d\beta \), while the parameters \(\lambda \) assume the values \(\lambda +d\lambda \). Then the normalization condition assumes the form

$$\begin{aligned} \int \exp \left[ c+dc -(\beta +d\beta )\left( E+\sum \frac{\partial E}{\partial \lambda }d\lambda \right) \right] \,dp_{1}\ldots dp_{n}=1. \end{aligned}$$
(28)

One thus obtains, to first order,

$$\begin{aligned} \int \left( dc - E\,d\beta -\beta \sum \frac{\partial E}{\partial \lambda }d\lambda \right) \, e^{c-\beta E}\,dp_{1}\ldots dp_{n}=0. \end{aligned}$$
(29)

Einstein now argues that the expression in parentheses can be considered as a constant, “because the system’s energy E never differs markedly from a fixed average before and after the process”, and thus obtains

$$\begin{aligned} dc - E\,d\beta -\beta \sum \frac{\partial E}{\partial \lambda }d\lambda =0. \end{aligned}$$
(30)

Since

$$\begin{aligned} E\,d\beta +\beta \sum \frac{\partial E}{\partial \lambda }d\lambda =d\left( \beta E\right) -\beta \sum _{\nu }\frac{\partial E}{\partial p_{\nu }}dp_{\nu }=d\left( \beta E\right) -\beta \,dQ, \end{aligned}$$
(31)

where Eq. (25) has been substituted, Einstein obtains the relation

$$\begin{aligned} \beta \,dQ=d(\beta E-c), \end{aligned}$$
(32)

and thus, since \(1/\beta =k_\mathrm {B}T\),

$$\begin{aligned} \frac{dQ}{T}=d\left( \frac{E}{T}-k_{\mathrm {B}}c\right) =dS, \end{aligned}$$
(33)

from which he obtains the expression of the entropy

$$\begin{aligned} S=\frac{E}{T}-k_{\mathrm {B}}c=\frac{E}{T}+k_{\mathrm {B}}\log \int e^{-E/k_\mathrm {B}T}\,dp_{1}\ldots dp_{n}. \end{aligned}$$
(34)

It is interesting to remark that in the 1902 paper Einstein had derived a similar expression for the exchanged heat dQ involving the average values of the kinetic and potential energies, while here Einstein states that the values of the energy E which matter are not very different from their mean value. This assumption is unnecessary, because the relation (30) holds if E is understood as the mean value of the energy, which is enough to reach Einstein’s goals. Moreover, Einstein has not yet derived this property of the energy distribution. We shall see that this assumption also leads Einstein to a quite dubious result in the next discussion, where he attempts to establish the property of entropy increase. In our opinion, Einstein later reconsidered this argument and was therefore led to investigate the fluctuations of energy, which he discusses in his next paper.

Einstein now attempts to prove that the entropy does not decrease in transformations involving an adiabatically isolated system. He passes from the probability distribution of a single system in its phase space, when the value of its energy is fixed, to the distribution of a collection of a very large number N of such systems with the same value of the energy. Dividing the phase space into \(\ell \) regions \(g_{i}\), \(i=1,\ldots ,\ell \), of equal volume, Einstein looks for the probability that \(n_{1}\) systems fall in \(g_{1}\), ..., \(n_{\ell }\) systems fall in \(g_{\ell }\). The result is obviously

$$\begin{aligned} W=\left( \frac{1}{\ell }\right) ^{N}\frac{N!}{n_{1}!\ldots n_{\ell }!}. \end{aligned}$$
(35)

One then has, by Stirling’s formula,

$$\begin{aligned} \log W =\text {const.}-\sum _{i} n_{i} \log n_{i} \simeq \text {const.}-\int \rho \log \rho \;dp_{1}\ldots dp_{n}, \end{aligned}$$
(36)

where \(\rho \) is the density of systems in the p-space, when \(\ell \rightarrow \infty \). It would have been a simple step to connect this expression explicitly to the entropy by means of Boltzmann’s formula, but Einstein does not do it. He instead uses it first to show that this expression reaches a maximum when \(\rho \) is constant on the whole region of phase space in which the energy has the assigned value. He then argues that if the density \(\rho \) differs noticeably from a constant (for states of a given value of the energy), it will be possible to find distributions with a larger value of W. In this case, if we follow the ensemble in time, the distribution will change with time, and then “we will have to assume that always more probable distributions will follow upon improbable ones, i.e., that W increases until the distribution of states has become constant and W a maximum”. Thus, if the distribution changes from \(\rho \) to \(\rho ^{\prime }\) as time goes by, and the probability correspondingly increases from W to \(W^{\prime }\), the integral on the right-hand side of Eq. (36) decreases. He then argues that if the values of \(\log \rho \) (when \(\rho \) does not essentially vanish) are close to uniform, and the probability increases, one obtains the relation

$$\begin{aligned} -\log \rho ^{\prime }\ge -\log \rho . \end{aligned}$$
(37)

This equation cannot be true without qualification, due to the normalization condition, and is however unnecessary for Einstein’s argument in the immediately following section. See, e.g., the discussion in [15, Sect. 2.2]. This is probably one of the points which led Einstein, in retrospect, to reconsider the assumption that the values of the energy which have non-vanishing probability are close to constant, and to evaluate the energy fluctuations.
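
The Stirling step leading to Eq. (36) is itself sound, as a quick numerical check (ours; the occupation numbers are drawn from a uniform multinomial for concreteness) confirms:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)
ell, Ntot = 50, 100_000
n = rng.multinomial(Ntot, np.ones(ell) / ell)     # occupation numbers n_i

# Exact log W from Eq. (35), using log-factorials:
logW = -Ntot * np.log(ell) + gammaln(Ntot + 1) - gammaln(n + 1).sum()

# Stirling form of Eq. (36): log W = const - sum_i n_i log n_i, where the
# n_i-independent part is const = N log N - N log ell (the -N and +sum n_i
# terms of Stirling's formula cancel because sum_i n_i = N).
stirling = Ntot * np.log(Ntot) - Ntot * np.log(ell) - (n * np.log(n)).sum()

print(logW, stirling)   # agreement up to subleading corrections
```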

Einstein then takes advantage of this result to obtain the law of entropy increase in the following way. He considers a finite number of systems \(\varSigma _{1},\ldots ,\varSigma _{\nu },\ldots \), which together form an isolated system with state variables \(p^{(1)}_{1},\ldots ,p^{(1)}_{n_{1}},\ldots ,\) \(p_{1}^{(\nu )},\ldots ,p_{n_{\nu }}^{(\nu )},\ldots \), such that \(n=\sum _{\nu }n_{\nu }\). System \(\varSigma _{\nu }\) is initially in equilibrium at a temperature \(T_{\nu }=1/(k_{\mathrm {B}}\beta _{\nu })\), and is therefore described by the distribution

$$\begin{aligned} dw_{\nu }=e^{c_{\nu }-\beta _{\nu }E_{\nu }}\,dp_{1}^{(\nu )}\ldots dp_{n_{\nu }}^{(\nu )}. \end{aligned}$$
(38)

Then the distribution of the global system is given by

$$\begin{aligned} dw=\prod _{\nu }dw_{\nu }=e^{\sum (c_{\nu }-\beta _{\nu }E_{\nu })}\,dp_{1}\ldots dp_{n}. \end{aligned}$$
(39)

Let us assume that the systems are now allowed to interact among themselves, and that at the end of the process a new equilibrium is reached, characterized by the temperature parameters \(\beta ^{\prime }_{\nu }\), etc. We then have, at the end of the process,

$$\begin{aligned} dw^{\prime }=\prod _{\nu }dw^{\prime }_{\nu }=e^{\sum (c^{\prime }_{\nu }-\beta ^{\prime }_{\nu }E^{\prime }_{\nu })}\,dp_{1}\ldots dp_{n}. \end{aligned}$$
(40)

Einstein now introduces an ensemble of a very large number N of global systems \(\varSigma \) to argue that, since W always increases, the distributions

$$\begin{aligned} \rho&= N \,e^{\sum (c_{\nu }-\beta _{\nu } E_{\nu })};\end{aligned}$$
(41a)
$$\begin{aligned} \rho ^{\prime }&= N \,e^{\sum (c^{\prime }_{\nu }-\beta ^{\prime }_{\nu } E^{\prime }_{\nu })}; \end{aligned}$$
(41b)

satisfy Eq. (37), i.e.,

$$\begin{aligned} \sum \left( c^{\prime }_{\nu }-\beta ^{\prime }_{\nu } E^{\prime }_{\nu }\right) \ge \sum (c_{\nu }-\beta _{\nu }E_{\nu }). \end{aligned}$$
(42)

But this implies, by Eq. (34),

$$\begin{aligned} \sum S^{\prime }_{\nu }\ge \sum S_{\nu }. \end{aligned}$$
(43)

Again, the detour through Eq. (37) is disputable and unnecessary. Indeed, it is sufficient to use Eq. (35) to obtain Eq. (42), where E is now taken as the mean value of the energy, and the result follows. The observations made after Eq. (30) also apply here. However, the main weakness of the argument lies in the petitio principii that the probability W of the ensemble distribution should always increase. This objection was raised by Paul Hertz in 1910 [24], and Einstein soon acknowledged [9] that the objection was “fully founded”.

In the closing section of this paper Einstein applies these results to a simple description of a thermal engine connected in turn to several heat reservoirs to derive the second principle in the form of Clausius.

4 On the General Molecular Theory of Heat

A change of pace is already noticeable in the first lines of the 1904 paper, entitled “On the general molecular theory of heat” [4]. Here Einstein refers to the theory developed in his previous papers, in which he had spoken of the “kinetic theory of heat” as laying the foundations of thermodynamics, by the less specific expression “molecular theory of heat”. The paper contains several results worth mentioning, as announced at the end of the introduction

First, I derive an expression for the entropy of a system, which is completely analogous to the expression found by Boltzmann for ideal gases and assumed by Planck in his theory of radiation. Then I give a simple derivation of the second law. After that I examine the meaning of a universal constant, which plays an important role in the general molecular theory of heat. I conclude with an application of the theory to black-body radiation, which yields a most interesting relationship between the above-mentioned universal constant, which is determined by the magnitudes of the elementary quanta of matter and electricity, and the order of magnitude of the radiation wave-lengths, without recourse to special hypotheses.

These results are obtained as independent developments of the theory reported in the previous two papers. In those papers he had derived the canonical expression of the entropy, namely

$$\begin{aligned} S=\frac{E}{T}+k_{\mathrm {B}}\log \int e^{-E/k_\mathrm {B}T}\,dp_{1}\ldots dp_{n}, \end{aligned}$$
(44)

where \((p_{1},\ldots ,p_{n})\) are the general state variables of the system, and E is the value of the internal energy. In Sect. 1 of this paper Einstein derives the expression we now call microcanonical, which is related to the density of states of energy E, \(\omega (E)\), by the relation

$$\begin{aligned} S=k_{\mathrm {B}}\log [\omega (E)]. \end{aligned}$$
(45)

He obtains this result by integrating the relation between the temperature and \(\omega (E)\) previously derived:

$$\begin{aligned} \frac{1}{k_{\mathrm {B}}T}=\frac{\omega ^{\prime }(E)}{\omega (E)}, \end{aligned}$$
(46)

where one assumes that the system’s energy lies between E and \(E+\delta E\). Note, however, that in the previous papers \(\omega (E)\) was the density of states of the thermal reservoir, while the relation is here tacitly applied to the density of states of the system. Interestingly, in this paper Einstein defines for the first time the density of states \(\omega (E)\) in the now customary way, by

$$\begin{aligned} \omega (E)\,\delta E=\int _{E}^{E+\delta E}d p_{1}\ldots d p_{n}, \end{aligned}$$
(47)

while in the previous papers he kept including the \(\delta E\) factor in its definition.
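
For large systems the canonical expression (44) and the microcanonical expression (45) agree, as can be illustrated numerically. The sketch below (ours) uses n classical harmonic oscillators, for which both expressions are available in closed form, and shows that the entropies per degree of freedom converge as n grows:

```python
import numpy as np
from scipy.special import gammaln

# n classical harmonic oscillators, units k_B = T = omega = m = 1, <E> = n:
#   canonical, Eq. (44):      S = <E>/T + log Z, with Z = (2*pi*T)**n
#   microcanonical, Eq. (45): S = log omega(E),
#                             omega(E) = (2*pi)**n * E**(n-1) / (n-1)!
for n in (10, 100, 1000, 10_000):
    S_can = n + n * np.log(2 * np.pi)
    E = float(n)                           # evaluate omega at the mean energy
    S_mic = n * np.log(2 * np.pi) + (n - 1) * np.log(E) - gammaln(n)
    print(n, S_can / n, S_mic / n)         # per-oscillator entropies converge
```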

The “derivation” of the second law in Sect. 2 suffers again, as in the 1903 paper, from the petitio principii of the assumption that more improbable states never follow more probable ones. The calculation is now simpler, but the result is also more restricted. First Einstein formulates the zeroth law of thermodynamics by assuming that if a system is in contact with an environment at temperature \(T_{0}\) it acquires the temperature \(T_{0}\) and keeps it from then on. However, according to the molecular theory of heat, this is not absolutely true, but true only to some approximation. In particular the probability \(W\,\delta E\) that the energy of such a system has a value lying between E and \(E+\delta E\) at an arbitrary point in time is given by

$$\begin{aligned} W\,\delta E=C \, e^{-E/{k_\mathrm {B}T}_{0}}\,\omega (E) \,\delta E, \end{aligned}$$
(48)

where C is a constant. Einstein argues that this distribution is very sharply peaked and that, because of the previous result, it can also be written in the form

$$\begin{aligned} W\,\delta E=C\, \exp \left[ \frac{1}{k_{\mathrm {B}}}\left( S-\frac{E}{T_{0}}\right) \right] \,\delta E, \end{aligned}$$
(49)

where \(S=S(E)\) is the value of the entropy pertaining to the value E of the internal energy. Note that here again the property of the distribution of being sharply peaked is not needed, and anyway has not yet been derived. More interestingly, as far as we know, this is the first statement of Einstein’s principle of fluctuations, which relates the probability of an energy fluctuation in a thermodynamic system to the corresponding change of the expression \(\mathscr {F}(E,T)=E-TS(E)\), now known as the availability. Now Einstein considers a system made of several such subsystems, all in contact with a large similar system at the temperature \(T_{0}\). The probability \(\mathfrak {W}\) of a given distribution \((E_{1},\ldots ,E_{\ell })\) of the energy among these subsystems is given by

$$\begin{aligned} \mathfrak {W}\propto \exp \left[ \frac{1}{k_{\mathrm {B}}}\left( \sum _{i=1}^{\ell }S_{i}-\frac{1}{T_{0}}\sum _{i=1}^{\ell }E_{i}\right) \right] . \end{aligned}$$
(50)

Let the subsystems exchange energy, possibly with the assistance of cyclic machines, reaching an energy distribution \((E^{\prime }_{1},\ldots ,E^{\prime }_{\ell })\). The corresponding probability is given by

$$\begin{aligned} \mathfrak {W}^{\prime }\propto \exp \left[ \frac{1}{k_{\mathrm {B}}}\left( \sum _{i=1}^{\ell }S^{\prime }_{i}-\frac{1}{T_{0}}\sum _{i=1}^{\ell }E^{\prime }_{i}\right) \right] . \end{aligned}$$
(51)

Assuming again that less probable states are followed by more probable ones, one must have

$$\begin{aligned} \mathfrak {W}^{\prime }\ge \mathfrak {W}. \end{aligned}$$
(52)

Since \(\sum _{i}E_{i}\) is conserved, this equation implies

$$\begin{aligned} \sum _{i=1}^{\ell }S^{\prime }_{i}\ge \sum _{i=1}^{\ell }S_{i}. \end{aligned}$$
(53)

It is hard for us to make sense of this derivation. The result seems restricted to systems in contact with a reservoir at a given temperature \(T_{0}\), and is therefore by no means general. In particular the inequality among the \(\mathfrak {W}\)’s cannot be absolutely satisfied without violating the normalization of probabilities, just as in the case of Eq. (37). The most interesting part is the way in which Einstein treats the distribution of energies among the subsystems as a collective state of a system made of several subsystems and, at the same time, as one possible macroscopic state of a system governed by a canonical distribution at the temperature \(T_{0}\). This device will be put to use in the 1910 work on critical fluctuations [25].
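
The sharply peaked character of the distribution (48), which Einstein asserts rather than proves, is nevertheless easy to illustrate. Under the model assumption (ours) of a density of states \(\omega (E)\propto E^{n-1}\), Eq. (48) becomes a Gamma distribution, and the relative width of the energy falls off as \(1/\sqrt{n}\):

```python
import numpy as np

# Model density of states omega(E) ~ E**(n-1), so that Eq. (48) gives
# W(E) ~ E**(n-1) * exp(-E/(k_B*T0)): a Gamma distribution of shape n
# (units k_B*T0 = 1), which we sample directly.
rng = np.random.default_rng(2)
for n in (10, 1000, 100_000):
    E = rng.gamma(shape=n, scale=1.0, size=200_000)
    print(n, E.std() / E.mean(), 1 / np.sqrt(n))  # relative width ~ n**-0.5
```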

The physical interpretation of the constant \(\kappa =k_{\mathrm {B}}/2\) is obtained by Einstein in Sect. 3 by evaluating, via his equipartition theorem, the kinetic energy of a mechanical system of n particles, and by relating the resulting expression to the one obtained by the kinetic theory for the ideal gas. He thus obtains an explicit estimate of \(\kappa =6.5\times 10^{-17} {\mathrm{erg}\, \mathrm{K}^{-1}}\), corresponding to \(k_{\mathrm {B}}=1.3\times 10^{-23}{\mathrm{J\,K}^{-1}}\). The discrepancy with modern values is due to the use of the value \(N_{\mathrm {A}}=6.4\times 10^{23}\,{\mathrm{mol}^{-1}}\) for Avogadro’s number, which Einstein found in O. E. Meyer’s book [26].
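
The arithmetic behind this estimate is elementary, \(\kappa =k_{\mathrm {B}}/2=R/(2N_{\mathrm {A}})\) (a one-line sketch, ours):

```python
# kappa = k_B/2 = R/(2*N_A), with the gas constant R from the ideal-gas law
# and N_A = 6.4e23 per mole, the value Einstein took from O. E. Meyer.
R = 8.31e7           # erg / (mol K)
N_A = 6.4e23         # 1 / mol

k_B = R / N_A
print(k_B, k_B / 2)  # ~1.3e-16 erg/K and kappa ~ 6.5e-17 erg/K, as in the paper
```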

In Sect. 4, under the title “General meaning of the constant \(\kappa \)” Einstein discusses the fluctuations of the energy in the canonical ensemble, deriving the relation between the specific heat and the amplitude of energy fluctuations as

$$\begin{aligned} \left<{E^{2}}\right>-\left<{E}\right>^{2}=k_{\mathrm {B}}T^{2}\frac{d\left<{E}\right>}{d T}, \end{aligned}$$
(54)

where \(\left<{\ldots }\right>\) denotes the canonical average. Gibbs had obtained the same expression in [7, Eq. (205), p. 72], but pointed out almost immediately that these fluctuations were not observable. With \(\varepsilon \), \(\varepsilon _{p}\) and \(\varepsilon _{q}\) the total, kinetic and potential energies respectively, and denoting averages by a bar, he states [7, p. 74f]

It follows that to human experience and observation with respect to such an ensemble as we are considering, or with respect to systems which may be regarded as taken at random from such an ensemble, when the number of degrees of freedom is of such order of magnitude as the number of molecules in the bodies subject to our observation and experiment, \(\varepsilon -\bar{\varepsilon }\), \(\varepsilon _{p}-\bar{\varepsilon }_{p}\), \(\varepsilon _{q}-\bar{\varepsilon }_{q}\) would be in general vanishing quantities, since such experience would not be wide enough to embrace the more considerable divergencies from the mean values, and such observation not nice enough to distinguish the ordinary divergencies. In other words, such ensembles would appear to human observation as ensembles of uniform energy, and in which the potential and kinetic energies (supposing that there were means of measuring these quantities separately) had each separately uniform values.

Characteristically, Einstein instead immediately goes on to look for a system in which these fluctuations could be observed, and he finds that black-body radiation could provide such a system. It is worth quoting his reasoning [4, Sect. 5]

If the linear dimensions of a space filled with temperature radiation are very large in comparison with the wavelength corresponding to the maximum energy of the radiation at the temperature in question, then the mean energy fluctuation will obviously be very small in comparison with the mean radiation energy of that space. In contrast, if the radiation space is of the same order of magnitude as that wavelength, then the energy fluctuation will be of the same order of magnitude as the energy of the radiation of the radiation space.

Einstein pauses only for a moment before proceeding to the application of his molecular theory of heat to black-body radiation [4, Sect. 5]

Of course, one can object that we are not permitted to assert that a radiation space should be viewed as a system of the kind we have assumed, not even if the applicability of the general molecular theory is conceded. Perhaps one would have to assume, for example, that the boundaries of the space vary with its electromagnetic state. However, these circumstances need not be considered, as we are dealing with orders of magnitude only.

Einstein can thus evaluate the size \(\left<{\varepsilon ^{2}}\right>\) of the energy fluctuations \(\varepsilon =E-\left<{E}\right>\) from Eq. (54) and from the Stefan-Boltzmann law

$$\begin{aligned} \left<{E}\right>=a\, v\, T^{4}, \end{aligned}$$
(55)

where \(a=7.06\times 10^{-15}\,{\mathrm{erg\,cm}^{-3}\,\mathrm{K}^{-4}}\) is the radiation constant, T is the absolute temperature, and v is the cavity volume. Then, the linear dimensions of a cavity for which \(\left<{\varepsilon ^{2}}\right>\simeq \left<{E}\right>\) are given by

$$\begin{aligned} \root 3 \of {v}=\frac{1}{T}\root 3 \of {\frac{4 k_{\mathrm {B}}}{a}}=\frac{0.42}{T}, \end{aligned}$$
(56)

which compares well (in order of magnitude) with the expression \(\lambda _{\max }=0.293/T\) obtained from Planck’s law (both lengths are expressed in cm, and T is expressed in Kelvin).
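
The numerical factor in Eq. (56) follows from Eq. (54) together with the Stefan-Boltzmann law: \(\left<{\varepsilon ^{2}}\right>=k_{\mathrm {B}}T^{2}\,d\left<{E}\right>/dT=4ak_{\mathrm {B}}vT^{5}\), and setting this equal to \(\left<{E}\right>^{2}=a^{2}v^{2}T^{8}\) gives \(v=4k_{\mathrm {B}}/(aT^{3})\). A quick check (ours, with the modern value of \(k_{\mathrm {B}}\)):

```python
# From Eq. (54) and <E> = a*v*T**4: <eps**2> = k_B*T**2 * 4*a*v*T**3.
# Setting <eps**2> = <E>**2 gives v = 4*k_B/(a*T**3), i.e.
# v**(1/3) = (4*k_B/a)**(1/3) / T.
a = 7.06e-15         # radiation constant, erg cm^-3 K^-4
k_B = 1.38e-16       # Boltzmann's constant, erg/K (modern value)

print((4 * k_B / a) ** (1 / 3))   # ~0.43 cm K, vs Einstein's 0.42 (his k_B)
print(0.293)                      # lambda_max * T from Planck's law, cm K
```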

However, in the following months, while trying to apply his theory explicitly to that system, he encountered a paradox, which he brilliantly overcame by renouncing the classical picture of the emission and absorption of light based on Maxwell’s equations, and by introducing the concept of light quanta [5]. But that is another story, which has already been told many times.

5 Einstein and Gibbs

One usually takes for granted that the research projects pursued by Einstein in these three papers and by Gibbs in his 1902 book [7] were equivalent, and that the more mathematically refined argumentation contained in the latter made Einstein’s approach redundant. A closer scrutiny shows however fundamental differences in their approaches, and makes Einstein’s approach more attractive to present-day physicists. Gibbs’ program focuses on understanding the properties of ensembles of mechanical systems, i.e., of systems whose dynamical equations are given, but whose initial conditions are given only through a probability distribution. He gives this discipline the name of “statistical mechanics”. He stresses that its relevance goes beyond establishing a foundation of thermodynamics [7, Preface, p. viii]

But although, as a matter of history, statistical mechanics owes its origin to investigations in thermodynamics, it seems eminently worthy of an independent development, both on account of the elegance and simplicity of its principles, and because it yields new results and places old truths in a new light in departments quite outside of thermodynamics.

Indeed, the laws of statistical mechanics are more general than those of thermodynamics [7, p. ix]

The laws of thermodynamics, as empirically determined, express the approximate and probable behavior of systems of a great number of particles, or, more precisely, they express the laws of mechanics for such systems as they appear to beings who have not the fineness of perception to enable them to appreciate quantities of the order of magnitude of those which relate to single particles, and who cannot repeat their experiments often enough to obtain any but the most probable results. The laws of statistical mechanics apply to conservative systems of any number of degrees of freedom, and are exact.

On the other hand, according to Gibbs, our ignorance of the basic constitution of material bodies makes unreliable any inference based on supposed models of matter, even when derived by the methods of statistical mechanics [7, pp. ix–x]

In the present state of science, it seems hardly possible to frame a dynamic theory of molecular action which shall embrace the phenomena of thermodynamics, of radiation, and of the electrical manifestations which accompany the union of atoms. [...] Even if we confine our attention to the phenomena distinctively thermodynamic, we do not escape difficulties in as simple a matter as the number of degrees of freedom of a diatomic gas. It is well known that while theory would assign to the gas six degrees of freedom per molecule, in our experiments on specific heat we cannot account for more than five. Certainly, one is building on an insecure foundation, who rests his work on hypotheses concerning the constitution of matter.

Gibbs therefore attempts to reduce his goal to a purely mathematical treatment [7, p. x]

Difficulties of this kind have deterred the author from attempting to explain the mysteries of nature, and have forced him to be contented with the more modest aim of deducing some of the more obvious propositions relating to the statistical branch of mechanics. Here, there can be no mistake in regard to the agreement of the hypotheses with the facts of nature, for nothing is assumed in that respect. The only error into which one can fall, is the want of agreement between the premises and the conclusions, and this, with care, one may hope, in the main, to avoid.

One can therefore only hope to establish analogies between quantities which are defined within statistical mechanics, and those which are empirically encountered in thermodynamics [7, p. x]

We meet with other quantities, in the development of the subject, which, when the number of degrees of freedom is very great, coincide sensibly with the modulus, and with the average index of probability, taken negatively, in a canonical ensemble, and which, therefore, may also be regarded as corresponding to temperature and entropy.

The relation of the laws of statistical mechanics to thermodynamics is further discussed in [7, Ch. XIV, p. 166]

A very little study of the statistical properties of conservative systems of a finite number of degrees of freedom is sufficient to make it appear, more or less distinctly, that the general laws of thermodynamics are the limit toward which the exact laws of such systems approximate, when their number of degrees of freedom is indefinitely increased. And the problem of finding the exact relations, as distinguished from the approximate, for systems of a great number of degrees of freedom, is practically the same as that of finding the relations which hold for any number of degrees of freedom, as distinguished from those which have been established on an empirical basis for systems of a great number of degrees of freedom.

The enunciation and proof of these exact laws, for systems of any finite number of degrees of freedom, has been a principal object of the preceding discussion. But it should be distinctly stated that, if the results obtained when the numbers of degrees of freedom are enormous coincide sensibly with the general laws of thermodynamics, however interesting and significant this coincidence may be, we are still far from having explained the phenomena of nature with respect to these laws. For, as compared with the case of nature, the systems which we have considered are of an ideal simplicity. [...] The phenomena of radiant heat, which certainly should not be neglected in any complete system of thermodynamics, and the electrical phenomena associated with the combination of atoms, seem to show that the hypothesis of systems of a finite number of degrees of freedom is inadequate for the explanation of the properties of bodies.

In Gibbs’ approach, the probability distribution is a datum of the problem, while in Einstein’s it is one of the unknowns. The greatest difference is that Gibbs starts from the equal a priori probability postulate, while for Einstein what is important is to evaluate time averages, and these are replaced by phase-space averages through an ergodic hypothesis. Thus Gibbs is allowed to introduce the canonical distribution a priori, as a particularly simple one, endowed with interesting properties, in particular because it factorizes when one considers the collection of two or more mechanically independent systems [7, Ch. IV, p. 33]

The distribution [...] seems to represent the most simple case conceivable, since it has the property that when the system consists of parts with separate energies, the laws of the distribution in phase of the separate parts are of the same nature, a property which enormously simplifies the discussion, and is the foundation of extremely important relations to thermodynamics.

On the contrary, for Einstein, the canonical distribution is the distribution which describes the mechanical state of a system in contact with a thermal reservoir at a given temperature, while the “simplest” distribution is rather the microcanonical, which represents the state of an isolated system at equilibrium. And the former is derived from the latter.

Einstein’s 1910 lecture notes on the Kinetic Theory of Heat at the University of Zurich [27] show, in Navarro’s words [14, Sect. 6.2], how his approach allowed him to proceed to

the systematic application of statistical mechanics, once the canonical distribution is attained, to a large variety of fields. This is a sample list of the applications presented in the lecture notes: paramagnetism, Brownian motion, magnetic properties of solids, electron theory of metals, thermoelectricity, particle suspensions and viscosity. Gibbs invented, instead, a method whereby he could find no direct physical application other than the detection of the already mentioned thermodynamic analogies. Had Gibbs lived longer (he died the year after the publication of Elementary Principles) this might have changed. But, given his rigorous and extremely cautious attitude, any assumption on the issue is enormously risky.

Even more strikingly, in Einstein’s hands, deviations from the expected behavior become a tool for the investigation of the microscopic dynamics. This difference in attitude was already highlighted above, in the discussion of energy fluctuations, but the clearest example is the 1905 paper on light emission and absorption [5], where one notably reads

This relation, found as a condition for the dynamical equilibrium, not only fails to agree with the experiments, but also intimates that in our model a well-defined distribution of the energy between ether and matter is out of the question. [...] In the following, we shall treat the “black-body radiation” in connection with the experiments, without establishing it on any model of the production or propagation of the radiation.

Thus Einstein brackets the contemporary models of light absorption and propagation, but maintains the statistical interpretation of entropy. He then evaluates the radiation entropy from the empirical distribution law and interprets it in terms of the statistical approach as describing the coexistence of point-like particles in a given volume (cf. [28]). This paper was soon followed by the equally bold application of Planck’s radiation theory to the specific heats of solids [29].

6 Concluding Remarks

We presented Einstein’s approach to statistical mechanics in contrast to the one taken by Gibbs. The results are equivalent, since both are based on Boltzmann’s contributions. Gibbs’ starting point is the equal a priori probability hypothesis in phase space, which leads to the microcanonical probability density for an ensemble (of representative systems, according to Tolman [8]). Einstein, on the other hand, starts by stating that what is important is the evaluation of time averages of appropriate quantities. These can be replaced by averages of the same quantities over an unknown density function on phase space, with the help of an ergodic hypothesis. The assumption that Einstein introduces to play the role of the ergodic hypothesis is that the energy is the only conserved quantity. Using this assumption and Liouville’s theorem, Einstein shows that the unknown density function mentioned before must be constant on the energy shell, that is, it must be the microcanonical distribution. From there, the interpretation of the canonical distribution is different: for Gibbs, it is the simplest distribution, which leads one to describe physically independent systems as statistically independent, while for Einstein it is the distribution which describes the state of a system in contact with a reservoir. Thus the modulus of the canonical distribution (as defined by Gibbs) is analogous to the temperature for Gibbs, but can be identified with the temperature for Einstein. It is also interesting to remark that in several points Einstein states (without proof) that the distribution of energy values in the canonical ensemble is sharply peaked, and deduces from this some dubious inequalities for the probability density itself. Only in the 1904 paper does he explicitly evaluate the size of fluctuations, obtaining a result already derived by Gibbs. Then, while Gibbs had stressed the non-observability of energy fluctuations in macroscopic systems (thus contributing to the “rational foundation of thermodynamics”), Einstein points to the use of fluctuations as a tool for investigating microscopic dynamics (as he did, in particular, in [30], where he hinted at the dual wave-particle nature of radiation by interpreting the two terms appearing in the expression of the energy fluctuations).

What interest can a present-day reader find in these papers? We think that they sketch a very neat road map for the introduction of the basic concepts of statistical mechanics, focusing on their heuristic value. One first focuses on isolated systems and identifies the microcanonical ensemble as the equilibrium distribution by means of the thermal equilibrium principle. For this step, Einstein’s reasoning given above, based on the postulate of the absence of integrals of motion beyond the energy, is excellent. Then, one looks at a small part of such an isolated system, and one shows that the corresponding distribution is the canonical one. Finally, one identifies the mechanical expressions of temperature, infinitesimal heat and, by integration, of entropy. All these steps can be tersely traced by following, more or less closely, Einstein’s path. At this point, the focus can be shifted to the evaluation of fluctuations, which allows one, on the one hand, to recover the equivalence of ensembles for large enough systems and, by the same token, to identify situations in which the underlying molecular reality shows up in the behavior of macroscopic systems (as, e.g., in Brownian motion). This road map has been more or less followed by several modern textbooks on statistical mechanics, but we think that it would be fair to stress that it had first been sketched in the papers we described.

In any case, we will be satisfied if the present note encourages some colleagues to have a look at these papers, in which the first steps in the making of a giant are recorded.