1 Introduction

The simplest classical model of the nonrotating electron in special relativity consists of a static spherically symmetric distribution of total electric charge e over the surface of a rigid sphere of radius \(r_0\), as measured by an observer at rest with respect to the sphere. This model was first developed and studied during the first decade of the 1900s by Abraham [1, 2], Lorentz [3], and Poincaré [4], based entirely on Maxwell’s theory of electromagnetism. For an unaccelerated electron, the rest frame integral of the local energy density of the Coulomb field over the exterior of the electron sphere, representing the total energy \(W=e^2/(2r_0)\) stored in that field, is equal to the self-energy of the charge distribution.

For any static isolated configuration of charge, this self-energy is equal to the work needed to assemble it by slowly bringing the charge elements in from spatial infinity. The factor of 1/2 in the energy formula is a geometric factor. It is replaced by 3/5 if the electron is instead modeled as a solid sphere of constant charge density rather than as a spherical surface of constant surface charge density, since one must then add the contribution of the electromagnetic field energy inside the sphere (zero in the surface distribution case by spherical symmetry): \(1/2 +1/10=3/5\). Dropping these factors and equating the Coulomb energy to the entire observed mass \(m_e\) of the electron through Einstein’s famous mass-energy relation \(E=mc^2\) defines the corresponding radius \(r_{\textrm{e}}=e^2/(m_e c^2)\), called the classical radius of the electron, which is also what pure dimensional analysis would suggest.
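The geometric factors 1/2, 1/10 and 3/5 can be checked numerically. The following is a minimal sketch, not part of the original derivation, which integrates the electrostatic energy density \(E^2/(8\pi )\) over spherical shells in Gaussian units with \(e=r_0=1\). The helper names `midpoint` and `shell_energy_integrand`, the number of integration steps, and the substitution \(r=1/s\) used to map the exterior region onto a finite interval are illustrative choices made here. The interior field \(E=er/r_0^3\) of the uniformly charged solid sphere is the standard result of Gauss’s law.

```python
def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def shell_energy_integrand(E, r):
    """Energy density E^2/(8 pi) times the shell area 4 pi r^2."""
    return 0.5 * E(r) ** 2 * r ** 2

e, r0 = 1.0, 1.0  # illustrative units: energies come out in units of e^2/r0

def coulomb(r):
    """Exterior Coulomb field, common to the shell and solid sphere models."""
    return e / r ** 2

def solid_interior(r):
    """Field inside a solid sphere of constant charge density."""
    return e * r / r0 ** 3

# Exterior region r0 < r < infinity, mapped to 0 < s < 1/r0 by r = 1/s, dr = ds/s^2.
W_exterior = midpoint(
    lambda s: shell_energy_integrand(coulomb, 1.0 / s) / s ** 2, 0.0, 1.0 / r0
)
# Interior region 0 < r < r0, which contributes only for the solid sphere.
W_interior = midpoint(lambda r: shell_energy_integrand(solid_interior, r), 0.0, r0)

print(W_exterior)               # close to 1/2 (surface charge model)
print(W_interior)               # close to 1/10
print(W_exterior + W_interior)  # close to 3/5 (solid sphere model)
```

With other values of `e` and `r0` the three numbers scale together as \(e^2/r_0\), so only the pure numbers 1/2, 1/10 and 3/5 depend on the charge profile.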

With the birth of special relativity occurring during the same years as the development of the Abraham-Lorentz model, there was the expectation that, apart from any additional “bare mass” \(m_0\) that the electron might have, the electromagnetic energy W should contribute to the inertial mass of the electron an amount \(m_{\textrm{em}}=W/c^2\) satisfying Einstein’s mass-energy relation, leading to a total mass \(m_e=m_0+m_{\textrm{em}}\). Instead Abraham and Lorentz had found \(m_{\textrm{em}}=\frac{4}{3} W/c^2\) in the limit of nonrelativistic accelerated motion of the electron. This became the famous “4/3” problem.

After three preliminary papers on the inertial and gravitational mass of electromagnetic fields in 1921–1922 [5,6,7], in 1923 Fermi [8,9,10,11] reconsidered this problem for any regular spherically symmetric distribution of charge in motion that satisfies Born’s definition of relativistic rigidity [12, 13], namely that this distribution is time-independent in its instantaneous rest frame. Re-examining the Abraham-Lorentz derivation of the inertial mass of such a distribution of charge due entirely to its self-field, Fermi managed to correct the troublesome factor of 4/3 in their result, which he showed to be entirely due to their imposition of conventional rigidity with respect to a single inertial frame instead of rigidity with respect to the sequence of instantaneous rest frames following Born’s criterion. The former is not a Lorentz invariant condition, unlike Born’s, and so is in direct conflict with special relativity.

By an unfortunate coincidence the same numerical factor of 4/3 appears in the integral definition of the total 4-momentum observed by any inertial observer moving relative to an (unaccelerated) spherically symmetric charge distribution which is time-translation symmetric in its own inertial rest frame. Contracting the stress-energy tensor of the electromagnetic field due to such a distribution with the 4-velocity of any inertial observer gives the local 4-momentum distribution as seen by that observer, and integrating it over a time slice in that observer’s reference frame gives the total 4-momentum seen by that observer at that moment of inertial time. Since it arises as the hypersurface integral of a second rank tensor field (invariant under translation in the rest frame of the charge distribution), this quantity is a linear 4-vector-valued function of the 4-velocity of the inertial observer and indeed the 4/3 factor enters because of the tensor transformation law. In the absence of sources, the 4-momentum of the electromagnetic field is actually independent of the observer, as shown by textbook applications of Gauss’s law to the divergence-free stress-energy tensor, but in the presence of sources, this divergence is nonzero and leads to the complications encountered in this problem that of course were not understood in the early days of special relativity.

Kwal in 1949 [14] and later independently Rohrlich [15] in 1960 made the observation that by fixing the 4-velocity of the inertial observer in this calculation to be the one associated with the rest frame of the unaccelerated charge distribution, one obtains a fixed 4-momentum independent of time which equals the rest frame 4-momentum by definition, and again the troublesome factor of 4/3 disappears. Unfortunately this is not the end of the story: the classical theory of charge distributions, electromagnetic self-forces, and radiation reaction forces is a complicated and controversial subject to which many have contributed over the past century, and Fermi’s own contribution has been largely ignored.

Indeed the Fermi coordinates and Fermi-Walker transport for which Fermi is well known in relativity were developed specifically in 1922 to treat this problem [7] while he was a university student already knowledgeable in general relativity only a few years after its birth in 1916. In that very paper in its final section he considers the Lagrangian for an extended charged body with a given charge and mass distribution moving in an external electromagnetic field, where the distributions are confined to a length scale that in his subsequent paper will be assumed to be small compared to the variation of the external field. In that next paper he focuses only on the contribution of the charge distribution to its equation of motion in the external field, but one can easily retain the mass contribution as well, as in many discussions of this problem, where this mass is referred to as the bare mass or mechanical mass of the object. The result is the Lorentz force law with the inertial rest mass contribution to that equation consisting of the sum of the bare mass and the electromagnetic energy in the self-field of the charge distribution, the latter energy not preceded by the famous 4/3 factor of Abraham and Lorentz. Fermi’s derivation in this larger context is discussed in detail in the textbook on special relativity by Aharoni [16] who came out with his second edition in 1965 specifically to include this part as explained in his preface, after attention had been called to the problem by Rohrlich in 1960.

Following the analysis by Abraham and then Lorentz of the accelerated version of their model for the electron, Fermi considered a regular spherically symmetric distribution of accelerated charges held in a rigid configuration by some external force and applied the Lagrangian variational principle to compute the time rate of change of the momentum in the force law, without specializing to a particular charge density profile. In order to show exactly how the mistaken 4/3 factor in the inertial mass due to the energy of the self-field arises, Fermi contrasted the Born rigid calculation of the Lagrangian variation (variation B) with that assuming rigidity with respect to a particular inertial observer (variation A), which is not relativistically invariant and hence not to be trusted. He showed that the latter assumption in deriving the equations of motion leads to the Abraham-Lorentz result with the mistaken 4/3 factor multiplying the electromagnetic energy of the charge distribution in its contribution to the inertial mass, but that the Born assumption gives the correct factor as expected by the equivalence of mass and energy through the famous equation \(E=mc^2\).

Operationally, a congruence of timelike world lines is said to be Born-rigid if its expansion tensor vanishes, so that both its expansion scalar and its shear are zero and the orthogonal distances between neighboring world lines remain constant. As discussed in detail by Salzman and Taub [17] in 1954, any timelike curve determines a family of orthogonal hyperplanes in special relativity, and their orthogonal trajectories define the world lines of a body in Born-rigid motion (referred to as planar motion by Herglotz and Noether [18, 19]). The remaining class of motions is called group motions and consists of curve segments from a continuous 1-parameter subgroup of Lorentz transformations of Minkowski spacetime into itself. The best example of these is given by the Rindler observers, whose world lines are the integral curves of a single generator of Lorentz transformations, each world line with a unique constant acceleration. For the electron model, a time-independent spherically symmetric distribution of charge in the Fermi coordinate system adapted to the central world line is Born-rigid.

In order to fix the 4/3 problem Poincaré [4] (followed up by von Laue [20]) seriously confused the issue by mixing it with the question of explaining the rigid configuration of charge through internal stresses. Long after Fermi’s resolution of the 4/3 problem, even in the commentary by his friend Persico on Fermi’s paper in the collected work of Fermi, it was thought that Poincaré stresses were necessary to explain this discrepancy. In fact the stability of the electron is an entirely different matter from the correct relation of the inertial mass to the electromagnetic energy as explained by Fermi.

Although Wilson [21] discussed the problem of the proper definition of the 4-momentum of the electromagnetic field in 1936 with no citations, he did not succeed in clarifying matters. In 1949 Kwal [14] showed that a slight modification of Abraham’s original integral definitions for the unaccelerated electron leads to an electromagnetic 4-momentum endowed with the correct Lorentz transformation properties. Even later Rohrlich [15] in 1960 came to the same conclusion without being aware of previous work. They both explained that the correct result can only be obtained from the usual special relativistic integrals over a hypersurface of constant inertial time if that hypersurface represents a time slice in the rest frame of the electron, although Kwal only discussed changing the element of hypersurface volume without relating the region of integration to that rest frame. The classical electron model has continued to intrigue people ever since; see, for example, Feynman [22], Teitelboim [23,24,25,26], Boyer [27], Rohrlich [28], Nodvik [29], Schwinger [30], Campos and Jiménez [31, 32], Cohen and Mustafa [33], Comay [34], Moylan [35], Kolbenstvedt [36], Rohrlich [37], Appel and Kiessling [38], de Leon [39], Harte [40], Pinto [41], Bettini [42], Galley et al. [43], Griffiths [44], Damour [45], and more.

At least three entire books are devoted to the classical theory of charge distributions, those by Rohrlich [46, 47], Yaghjian [48, 49], and Spohn [50], and the model is described in detail by Jackson [51, 52], the universally accepted reference textbook on classical electrodynamics (see also Chapter 8 of Anderson [53]). Some interesting historical details may be found in the recent article of Janssen and Mecklenburg [54, 55]. This whole problem is not without explicit controversy, as detailed by Parrott in his archived exchange with Physical Review, which would not publish his criticism of Rohrlich’s recent work [56].

Except for Aharoni [16], and much later Kolbenstvedt [36] in 1997, and for Nodvik [29] and Appel and Kiessling [38], who consider a spinning generalization of the relativistically rigidly rotating electron model reviewed by Spohn [50], none of these references seems to take into account Fermi’s actual argument or to connect it to that of Kwal and Rohrlich, even though most of them cite Fermi’s original article. Kolbenstvedt [36] called attention to Fermi’s argument with a slightly different but equivalent explanation of his own, and not in an obscure physics journal, and yet the latest editions of the books of Jackson, Rohrlich, and Yaghjian, all published after that year, still do not reflect this. Jackson does explain that his nonrelativistic treatment can be relativistically corrected, referring to Fermi, and to be fair, the stated purpose of Yaghjian was to update the Abraham-Lorentz model, which he did, apparently unaware of the content of Fermi’s articles. Misner, Thorne and Wheeler’s tome Gravitation [57], affectionately known as MTW, really raised the level of mathematical discussion of special and general relativity after 1973, and allowed Spohn to discuss the relativistic rigid electron model more cleanly and covariantly and to include spin, but without discussing the observer-dependent 4-momentum integral for the electromagnetic self-field.

An important element of this discussion is the conserved nature of the integrals of the local densities of energy and momentum associated with the divergence-free stress-energy tensor of the source-free electromagnetic field when integrated over an entire spacelike hyperplane of Minkowski spacetime, which follows from Gauss’s law. Such a conservation law fails to exist when the divergence is instead nonzero in the presence of sources or if a world tube containing sources is excluded from the integral, leading either to a spacetime volume integral of the divergence or (equivalently) to an internal boundary integral that must be taken into account in Gauss’s law. This is an important discussion since none of the textbooks on special or general relativity describe this more general situation, while textbooks on classical electrodynamics typically only use such integrals locally within bounded regions of space.

Since this discussion is crucial in understanding the present problem, it is included in the section following this introduction where the preliminary details about the electromagnetic field needed to consider the spherical model of an unaccelerated electron are introduced together with the definitions of the 4-momentum in the field as observed by any inertial observer, and the role played by Gauss’s law in conservation laws is then explained, leaving the details of more exotic regions of spacetime integration to the appendix. The calculation of the 4-momentum integrals for the Abraham-Lorentz model of the unaccelerated electron is then reproduced in the subsequent section to explain the role played by Kwal and Rohrlich in this matter. Next we present Fermi’s re-analysis of the Abraham-Lorentz calculation of the inertial mass for their model of the accelerated electron taking into account Born’s rigidity condition. Finally the Kwal-Rohrlich definition of 4-momentum is related directly back to this correction using Gauss’s law.

One finds that the Kwal-Rohrlich restriction of the observer-dependent electromagnetic field 4-momentum integrals to the time hyperplanes of the electron rest frame associates a unique 4-momentum with the unaccelerated electron, namely the one that special relativity assigns, as has long been known. However, for a single static electron configuration in the absence of interaction, the 4-momentum is not so interesting, since there is no way even of revealing its inertial mass from at most uniform translational motion in flat spacetime. To get information about the inertial mass and 4-momentum, the electron must be accelerated, and if we limit our attention to electromagnetic interactions, it will be accelerated by an external electromagnetic field through the Lorentz force law. We expect that the total momentum of the electron and the electromagnetic field (for a closed system) should be conserved. We will show here for the first time that indeed the natural conclusion of Fermi’s calculation of the lowest order contributions to the equations of motion of the electron is that the total 4-momentum as observed in the time slices in the sequence of instantaneous rest frames along its path is conserved, i.e., is independent of time, and is the usual one we associate with the system. Fermi’s key idea of the importance of this sequence of hyperplanes orthogonal to a given world line in spacetime was embedded in his Fermi coordinate system adapted to that world line, which outlived the purpose for which he introduced it in those initial days of the theory of general relativity. However, the literature on this classical problem over the past century has remained largely unaware of Fermi’s early resolution of it, undoubtedly due to the lack of an English translation of Fermi’s work. The present work aims at filling this gap. The calculations initiated by Fermi at the time are completed here by analyzing the conservation of the total 4-momentum of the accelerated electron, extending the discussion of Ref. [58] by displaying all the mathematical details and associated subtleties in a more systematic framework.

2 Electrodynamic Preliminaries

Although Fermi does not specify the density profile of the spherically symmetric charge distribution that he analyzes in his re-examination of the earliest classical electron theory proposed by Abraham [1, 2] and improved by Lorentz [3], he refers specifically to their spherical shell model of the electron in his introduction. Without acceleration of the electron this model cannot help identify the inertial mass which arises as the proportionality constant between the applied force and the resulting acceleration. However, it was the interest in their unaccelerated model which helped push towards the understanding of the 4-momentum hypersurface integrals for the electromagnetic field so it is useful to review this case first. We re-examine their work in light of modern notation and perspective.

The model for the electron first proposed by Abraham [1, 2] and improved by Lorentz [3] consisted of a uniform spherically symmetric distribution of total electric charge e over the surface of a rigid sphere of radius \(r_0\) in its rest frame. This was called the contractile electron since it would then undergo Lorentz contraction with respect to an inertial frame in relative motion, while Abraham had assumed that the electron remained a rigid sphere with respect to all inertial observers. Einstein’s understanding of special relativity only came after this model had been developed, and Lorentz had interpreted the Lorentz contraction as a dynamical effect rather than as a universal property of spacetime itself. They attempted to explain the mass-energy of the electron as due wholly to the electromagnetic field of the electron, equating the electron’s energy and momentum to the energy and momentum of its electromagnetic field, which can be evaluated by suitably integrating the normal components of the stress-energy tensor of the electromagnetic field over a spacelike hyperplane representing a moment of time in an inertial reference frame. This is a useful example to keep in mind.

In an inertial system of Cartesian coordinates \((x^\mu )=(t=x^0, x^1,x^2,x^3)\) associated with an inertial reference frame in Minkowski spacetime with signature (−+++) following the conventions of Misner, Thorne and Wheeler [57] with \(c=1\), Maxwell’s equations for the electromagnetic field tensor \(F_{\alpha \beta }\) due to the 4-current density \(J^\alpha \) are

$$\begin{aligned} F^{\alpha \beta }{}_{,\beta }=4\pi J^\alpha \,,\quad F_{\alpha \beta ,\gamma } +F_{\beta \gamma ,\alpha } + F_{\gamma \alpha ,\beta } =0\,, \end{aligned}$$
(1)

where Greek indices assume the values 0, 1, 2, 3, and Latin indices instead 1, 2, 3. Indices may be raised and lowered with the flat Minkowski spacetime metric whose inertial coordinate components are \((g_{\alpha \beta })=\textrm{diag}(-1,1,1,1)=(g^{\alpha \beta })\).

Of course, when these equations are expressed in noninertial coordinate systems, the comma here signifying partial coordinate derivatives \(f_{,\alpha }=\partial _\alpha f=\partial f/\partial x^\alpha \) must be replaced by the semicolon indicating the components of the covariant derivative. We will need later an arbitrary covariantly constant covector field \(Q_\alpha \), namely one of vanishing covariant derivative \(Q_{\alpha ;\beta }=0\), a condition which reduces to \(Q_{\alpha ,\beta }=0\) in an inertial coordinate system, where the components \(Q_\alpha \) (and \(Q^\alpha \)) are actual constants. In fact such covariantly constant vector fields \(Q^\alpha \) correspond to the translational Killing vector fields of Minkowski spacetime, which are special solutions of the general Killing equations \(Q_{(\alpha ;\beta )}=0\) requiring that the symmetrized covariant derivative vanish. The Killing vector fields that are not covariantly constant generate the rotation and boost symmetries of Minkowski spacetime.

The stress-energy tensor of the electromagnetic field

$$\begin{aligned} T_{\textrm{em}}^{\mu \nu } = \frac{1}{4\pi } \left( F^{\mu \alpha } F^\nu {}_\alpha - \frac{1}{4} g^{\mu \nu } F^{\alpha \beta } F_{\alpha \beta } \right) \end{aligned}$$
(2)

has the following explicit inertial coordinate components

$$\begin{aligned} T_{\textrm{em}}^{00}= & {} \frac{1}{8\pi } (E^2+B^2) =U_{\textrm{em}} ,\nonumber \\ T_{\textrm{em}}^{0i}= & {} \frac{1}{4\pi } (E\times B)^i = S^i ,\nonumber \\ T_{\textrm{em}}^{ij}= & {} \frac{1}{4\pi } [-E^i E^j - B^i B^j + \frac{1}{2} g^{ij} (E^2+B^2)], \end{aligned}$$
(3)

where \(U_{\textrm{em}}\) and S are the electromagnetic energy density and the Poynting vector respectively, and of course E and B are the usual electric and magnetic fields observed in the associated reference frame in index-free notation, with nontrivial inertial coordinate components \(E^i=F^{0i}=F_{i0}\) and \(B^1=F_{23}\) etc. In general if \(u^\alpha \) is the 4-velocity of an observer at a point of spacetime, the electric field as seen by that observer there is \(E(u)^\alpha =F^\alpha {}_\beta u^\beta \). In a system of inertial coordinates adapted to that observer, then \(u^\alpha =\delta ^\alpha {}_0\), so that one has \( E(u)^\alpha = F^\alpha {}_\beta \delta ^\beta {}_0=F^\alpha {}_0=\delta ^\alpha {}_i F^i{}_0\) since due to the change of sign under index raising and the antisymmetry of the field tensor \(F^0{}_0=-F^{00}=0\). Note that in inertial coordinates associated with a second inertial observer in relative motion to a given 4-velocity \(u^\alpha \), its components are given by \((u^\alpha )=(\gamma ,\gamma v^i)\), where \(v^i\) are the components of the relative velocity of the first observer and \(\gamma =(1-v^iv_i)^{-1/2}\) is the associated gamma factor.
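The component formulas (3) and the identification of the electric field seen by the adapted observer can be verified directly from the definition (2). The following is a minimal numerical sketch under the conventions stated above (signature \(-+++\), \(E^i=F^{0i}\), \(B^1=F_{23}\) and cyclic permutations); the function names and the sample field values are hypothetical choices made only for illustration.

```python
import math

# Minkowski metric diag(-1, 1, 1, 1), numerically equal to its own inverse.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def field_tensor_upper(E, B):
    """Contravariant F^{mu nu} with F^{0i} = E^i and F^{23} = B^1 (cyclic)."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = E[i]
        F[i + 1][0] = -E[i]
    for i, j, k in ((1, 2, 3), (2, 3, 1), (3, 1, 2)):
        F[i][j] = B[k - 1]
        F[j][i] = -B[k - 1]
    return F

def lower_second(F):
    """Mixed components F^mu_nu = F^{mu beta} eta_{beta nu}."""
    return [[sum(F[m][b] * ETA[b][n] for b in range(4)) for n in range(4)]
            for m in range(4)]

def stress_energy(F):
    """Contravariant T^{mu nu} of Eq. (2)."""
    Fm = lower_second(F)
    # Fully covariant components F_{alpha beta} = eta_{alpha mu} F^mu_beta.
    Fl = [[sum(ETA[a][m] * Fm[m][b] for m in range(4)) for b in range(4)]
          for a in range(4)]
    invariant = sum(F[a][b] * Fl[a][b] for a in range(4) for b in range(4))
    return [[(sum(F[m][a] * Fm[n][a] for a in range(4))
              - 0.25 * ETA[m][n] * invariant) / (4.0 * math.pi)
             for n in range(4)] for m in range(4)]

def electric_field_seen_by(F, u):
    """E(u)^alpha = F^alpha_beta u^beta for an observer with 4-velocity u."""
    Fm = lower_second(F)
    return [sum(Fm[a][b] * u[b] for b in range(4)) for a in range(4)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

E = [1.0, 2.0, 3.0]   # arbitrary sample values, not taken from the text
B = [-2.0, 0.5, 4.0]
F = field_tensor_upper(E, B)
T = stress_energy(F)

E2 = sum(c * c for c in E)
B2 = sum(c * c for c in B)
S = [c / (4.0 * math.pi) for c in cross(E, B)]

print(T[0][0], (E2 + B2) / (8.0 * math.pi))  # energy density, computed two ways
print([T[0][i + 1] for i in range(3)], S)    # Poynting vector, computed two ways
print(electric_field_seen_by(F, [1.0, 0.0, 0.0, 0.0]))  # components (0, E)
```

Passing the components \((\gamma ,\gamma v^i)\) of a moving observer to `electric_field_seen_by` returns \(\gamma (E\cdot v,\, E+v\times B)\), the familiar combination that appears in the Lorentz force.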

The divergence of this stress-energy tensor in inertial coordinates is easily calculated using Maxwell’s equations

$$\begin{aligned} T_{\textrm{em}}^{\mu \nu }{}_{,\nu } = -F^\mu {}_\nu J^\nu \,, \end{aligned}$$
(4)

as shown by Exercise 3.18 of Misner, Thorne and Wheeler [57], for example. Thus in source-free regions where the 4-current \(J^\mu =0\) vanishes, this divergence is zero, which is the condition needed to obtain a conserved 4-momentum for the free electromagnetic field in textbook discussions using Gauss’s law. When the 4-current density \(J^\alpha =\rho U^\alpha \) is due to the motion of a distribution of charge moving with 4-velocity field \(U^\alpha \) and rest frame charge density \(\rho \), then this divergence has the value

$$\begin{aligned} T_{\textrm{em}}^{\mu \nu }{}_{,\nu } = - \rho F^\mu {}_\nu U^\nu =-\rho E(U)^\mu \,, \end{aligned}$$
(5)

which apart from the sign is the 4-force density exerted by the electromagnetic field on the charge distribution, expressible as the product of the charge density and the electric field in the rest frame of the moving charge. This divergence plays a crucial role in the Lagrangian equations of motion of the electron and in the conservation or not of the 4-momentum of the electromagnetic field. Unlike the 4-momentum of a particle, which is locally defined and independent of the observer (but whose components depend on the choice of coordinates, of course), the 4-momentum of the electromagnetic field is nonlocal and can only be defined at a moment of time with respect to some inertial observer through an integral over an entire hyperplane \(\Sigma \) of spacetime corresponding to the extension of the local rest space of that observer at that moment. In the presence of sources \(J^\alpha \ne 0\), this 4-momentum not only generally depends on the time for nonstationary sources, but also on the choice of observer, since there is no a priori reason to expect integrals over different regions of spacetime to agree. When instead \(J^\alpha = 0\), as is the case for a free electromagnetic field, a conservation law applies due to the vanishing divergence, and if those integrals are finite, they in fact all define the same 4-momentum vector on Minkowski spacetime.

The components of the 4-momentum of the electromagnetic field as seen by an inertial observer with 4-velocity \(u^\alpha \) at a moment of time t in the observer rest frame, represented by a time coordinate hyperplane \(\Sigma \) (for which \(u^\alpha \) is in fact the future-pointing unit normal vector field), are given by the integral formula

$$\begin{aligned} P(\Sigma )^\alpha =\int _\Sigma T_{\textrm{em}}^{\alpha \beta }d\Sigma _\beta \,, \end{aligned}$$
(6)

where one can integrate over an object with a free index only if that index is expressed in some inertial coordinate system where it makes sense to compare 4-vectors at different spacetime points in the flat spacetime due to the path independence of parallel transport. The contracted pair of indices can be evaluated in any coordinates. For any spacelike hyperplane \(\Sigma \) with future-pointing timelike unit normal \(u^\alpha \), the hyperplane volume element

$$\begin{aligned} d\Sigma _\alpha = -u_\alpha dV_\Sigma \end{aligned}$$
(7)

and induced volume element \(dV_\Sigma \) are most easily evaluated in inertial coordinates \((t,x^i)\) adapted to the observer with 4-velocity \(u^\alpha \), where \(u^\alpha =\delta ^\alpha {}_0\) while \(u_\alpha =-\delta ^0{}_\alpha \) and \(dV_\Sigma =dx^1dx^2 dx^3\), while the spacetime volume element is simply \(d^4V=dt\, dV_\Sigma \). The minus sign \(-1=u^\alpha u_\alpha \) in \(d\Sigma _\alpha \) is needed to pick out the future normal component \(X_u=-X^\alpha u_\alpha \) of a vector field in its integral

$$\begin{aligned} \int _\Sigma X^\alpha d\Sigma _\alpha = -\int _\Sigma X^\alpha u_\alpha dV_\Sigma = \int _\Sigma X_u dV_\Sigma \,. \end{aligned}$$
(8)
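The sign convention in Eqs. (7) and (8) can be illustrated with a short numerical sketch. It assumes only the signature and 4-velocity components \((\gamma ,\gamma v^i)\) given above; the function names and the sample vector are hypothetical. The normal component \(X_u=-X^\alpha u_\alpha \) evaluated in one inertial frame coincides with the time component of \(X^\alpha \) in coordinates adapted to the observer with 4-velocity \(u^\alpha \).

```python
import math

def minkowski_dot(a, b):
    """Inner product with signature (-+++)."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def four_velocity(v):
    """Components (gamma, gamma v^i) for relative velocity v with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [gamma] + [gamma * c for c in v]

def normal_component(X, u):
    """Future normal component X_u = -X^alpha u_alpha."""
    return -minkowski_dot(X, u)

u = four_velocity([0.6, 0.0, 0.0])  # gamma = 1.25 for this sample speed
X = [2.0, 1.0, 0.0, 0.0]            # arbitrary sample vector

# Time component of X after boosting to the frame adapted to u.
X0_boosted = u[0] * (X[0] - 0.6 * X[1])

print(minkowski_dot(u, u))     # close to -1, the normalization used in Eq. (7)
print(normal_component(X, u))  # close to 1.75
print(X0_boosted)              # the same number from the Lorentz transformation
```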

In an inertial system of coordinates the above integral then has the components

$$\begin{aligned} P(\Sigma )^0 =\int _\Sigma T_{\textrm{em}}^{00}dV_\Sigma \,,\quad P(\Sigma )^i =\int _\Sigma T_{\textrm{em}}^{0i}dV_\Sigma \,,\quad \end{aligned}$$
(9)

which represent the integrals of the local densities of energy and momentum in the field as seen by the associated inertial observer.

For a given fixed choice of hyperplane \(\Sigma \), the above integral formula (6) for the 4-momentum defines a unique 4-vector whose components can be evaluated in (Cartesian) inertial coordinates with respect to any other inertial observer, resulting in a Lorentz transformation of those components. However, if the hypersurface is changed, the result is a different 4-vector, unrelated to the original one by any simple transformation.

Only in the special case of a divergence-free stress-energy tensor is the result actually independent of the hypersurface because of Gauss’s law, and so defines a single 4-vector no matter what time slice or what inertial observer is chosen. When the components of this single 4-vector are transformed from one system of inertial (Cartesian) coordinates to another, they then transform according to the associated Lorentz transformation. Perhaps influenced by this atypical special case, early on there was the expectation that the same should hold in general when sources are present which make the divergence of the electromagnetic stress-energy tensor nonzero, but this was a completely unjustified expectation.

Since Gauss’s law is so essential to this question, it is crucial to have its application understood before embarking on the details of the classical model of the electron. We consider the 4-dimensional spacetime region R bounded by two hyperplanes \(\Sigma _1\) and \(\Sigma _2\) each representing a moment of time with respect to some inertial observer and each oriented by its future-pointing unit normal vector field, a constant vector field which represents the 4-velocity of the observer. These hyperplanes are parallel for the same inertial observer and hence do not intersect, with one in the future of the other, but they do intersect for two observers in relative motion, in which case one has to be careful about the signs in the two disjoint contributions to the 4-dimensional integral relative to the future-pointing normals of the hyperplanes, since the future halves of each hyperplane switch passing from one to the other across the 2-plane of their intersection. The appendix discusses these details.

In its metric form rather than its metric-independent form involving only differential forms, Gauss’s law in Minkowski spacetime only applies to the integral of a vector field over the bounding hypersurface of a region R of spacetime, equating the integral of its divergence over R with respect to the spacetime volume element to the hypersurface integral of the outward normal component of the vector field with respect to the induced or intrinsic volume element on the hypersurface. Suppose \(\Sigma _1\) and \(\Sigma _2\) do not intersect, and \(\Sigma _2\) is to the future of \(\Sigma _1\). Then provided that the integral over the boundary at spatial infinity which closes the boundary between these two hyperplanes can be neglected due to the fall-off properties of the vector field there, Gauss’s law states that

$$\begin{aligned} \int _{R} \mathcal {J}^\beta {}_{;\beta } d^4V =\int _{\Sigma _2} \mathcal {J}^\beta d\Sigma _\beta -\int _{\Sigma _1} \mathcal {J}^\beta d\Sigma _\beta \,, \end{aligned}$$
(10)

where the negative sign precedes the second integral since the future-pointing normal of \(\Sigma _1\) is not outward but inward with respect to R.

If the divergence \(\mathcal {J}^\beta {}_{;\beta }=0\) vanishes, then

$$\begin{aligned} \int _{\Sigma _2} \mathcal {J}^\beta d\Sigma _\beta =\int _{\Sigma _1} \mathcal {J}^\beta d\Sigma _\beta \,, \end{aligned}$$
(11)

so the integral is the same for these two parallel hyperplanes and so is independent of the moment of time for this single inertial observer. To extend this “conservation law” to any two inertial observers in relative motion, we just need to be careful about the signs of the orientations of the interior and bounding hyperplanes in the two disjoint regions and pairs of boundaries into which their intersection divides them. However, if the divergence is zero, this is all irrelevant and one again finds that the two integrals are the same, and hence the result is independent of the choice of spacelike hyperplane, giving the same result for all observers and all moments of their time. When the divergence is nonzero, the two integrals differ by a nonzero amount which depends on the region of integration and hence in general one finds a different result for every inertial observer and every moment of their time.
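The content of Eq. (10) for a nonzero divergence can be illustrated with a toy example. The sketch below is an assumption-laden illustration rather than the electromagnetic case: it uses a hypothetical vector field in 1+1 dimensions that decays rapidly at spatial infinity, takes both hyperplanes to be slices of constant t for a single inertial observer, so that \(\mathcal {J}^\beta d\Sigma _\beta =\mathcal {J}^t dx\) by Eqs. (7) and (8), and approximates all derivatives and integrals numerically. The field, grid sizes and cutoff L are illustrative choices.

```python
import math

def J(t, x):
    """Toy vector field (J^t, J^x), decaying rapidly in x (not a physical field)."""
    g = math.exp(-x * x)
    return (t * g, t * t * x * g)

def divergence(t, x, h=1e-4):
    """Central-difference approximation of J^t_{,t} + J^x_{,x}."""
    dJt_dt = (J(t + h, x)[0] - J(t - h, x)[0]) / (2.0 * h)
    dJx_dx = (J(t, x + h)[1] - J(t, x - h)[1]) / (2.0 * h)
    return dJt_dt + dJx_dx

def midpoints(a, b, n):
    """Midpoints and width of n equal subintervals of [a, b]."""
    h = (b - a) / n
    return [a + (k + 0.5) * h for k in range(n)], h

def slice_integral(t, L=8.0, nx=400):
    """Integral of J^beta dSigma_beta = J^t dx over the slice of constant t."""
    xs, hx = midpoints(-L, L, nx)
    return hx * sum(J(t, x)[0] for x in xs)

def region_integral(t1, t2, L=8.0, nx=400, nt=100):
    """Integral of the divergence over the slab t1 < t < t2."""
    xs, hx = midpoints(-L, L, nx)
    ts, ht = midpoints(t1, t2, nt)
    return hx * ht * sum(divergence(t, x) for t in ts for x in xs)

t1, t2 = 0.5, 2.0
lhs = region_integral(t1, t2)                  # left-hand side of Eq. (10)
rhs = slice_integral(t2) - slice_integral(t1)  # right-hand side of Eq. (10)
print(lhs, rhs)  # both close to (t2 - t1) * sqrt(pi)
```

Because the divergence of this toy field does not vanish, the slice integral changes with t, and the change is accounted for exactly by the integral of the divergence over the region between the two slices.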

Gauss’s law can be applied to a second rank symmetric tensor \(T^{\alpha \beta }\) only by contracting it with a covector \(Q_\alpha \) to form a vector field \(\mathcal {J}^\beta =Q_\alpha T^{\alpha \beta }\). We therefore introduce such a covector that is covariantly constant, in terms of which the divergence becomes

$$\begin{aligned} \mathcal {J}^\beta {}_{;\beta }=Q_\alpha T^{\alpha \beta }{}_{;\beta }\,. \end{aligned}$$
(12)

We then get the result

$$\begin{aligned} \int _{R} \mathcal {J}^\beta {}_{;\beta } d^4V =\int _{\Sigma _2} Q_\alpha T^{\alpha \beta } d\Sigma _\beta -\int _{\Sigma _1} Q_\alpha T^{\alpha \beta } d\Sigma _\beta \,. \end{aligned}$$
(13)

If we agree to evaluate these expressions in inertial coordinates where \(Q_\alpha \) are constants, then they can be factored out of the equation and one gets a relation involving the 4-momentum as seen by the corresponding inertial observers

$$\begin{aligned} \int _{R} T^{\alpha \beta }{}_{;\beta } d^4V =\int _{\Sigma _2} T^{\alpha \beta } d\Sigma _\beta -\int _{\Sigma _1} T^{\alpha \beta } d\Sigma _\beta = P(\Sigma _2)^\alpha -P(\Sigma _1)^\alpha \,, \end{aligned}$$
(14)

or using Eq. (5) for the electromagnetic field we get

$$\begin{aligned} P(\Sigma _2)^\alpha -P(\Sigma _1)^\alpha = -\int _{R} \rho E^\alpha (U) d^4V \,. \end{aligned}$$
(15)

Thus if the divergence is nonzero, as occurs for the electromagnetic field in the presence of sources, the two 4-momenta differ by a quantity that depends on the region of integration, so there is no common agreement among inertial observers about the 4-momentum in the field, nor is the result independent of time for a single inertial observer. This is the source of the complication for defining the 4-momentum of the electromagnetic field in the classical model of the electron.

A covariant constant vector field is a Killing vector generating translational symmetries of Minkowski spacetime from which the conservation of linear momentum follows for translation invariant Lagrangians according to Noether’s theorem. The arbitrary translational Killing vector field \(Q^\alpha \) allows us to pick out the components of linear momentum. A general Killing vector field satisfies the condition that its symmetrized covariant derivative vanish \( Q_{(\alpha ;\beta )}=0\). If instead we use a nontranslational Killing vector field in the above argument, then since the stress-energy tensor is symmetric and only the symmetric part contributes to its contraction with the covariant derivative of \(Q_\alpha \), we get the same divergence formula as before

$$\begin{aligned} \mathcal {J}^\beta {}_{;\beta } =Q_\alpha T^{\alpha \beta }{}_{;\beta } +Q_{(\alpha ;\beta )} T^{\alpha \beta } =Q_\alpha T^{\alpha \beta }{}_{;\beta } \,. \end{aligned}$$
(16)

For the nontranslational Killing vector fields which generate rotations, for example, this process leads to picking out the components of the conserved angular momentum in the case of vanishing divergence. See Misner, Thorne and Wheeler [57], for example. However, we will not consider angular momentum here.

For a static electric field due to a static charge distribution \(\rho \) in its rest frame, when expressed in terms of inertial coordinates in that rest frame for a time slice \(\Sigma _{{\textrm{rest}}}\) in that frame, the quantity

$$\begin{aligned} P(\Sigma _{{\textrm{rest}}})^0 =\frac{1}{8\pi }\int _{\Sigma _{{\textrm{rest}}}} E^2(\textbf{x}) d^3\textbf{x} = W \end{aligned}$$
(17)

is just the self-energy of the charge configuration defined alternatively by

$$\begin{aligned} W =\frac{1}{2} \int \int d^3\textbf{x}d^3\textbf{x}'\,\frac{\rho (t,\textbf{x}) \rho (t,\textbf{x}')}{|\textbf{x}-\textbf{x}'|}\,, \end{aligned}$$
(18)

using the vector notation \(\textbf{x}=(x^i) \), \(d^3\textbf{x} =dx^1 dx^2 dx^3 =dV\). Jackson (see p. 41 of the Third edition [51, 52]) shows how the latter formula for the self-energy of such a static charge configuration is equivalent to the energy in its associated electric field using the integral formula for the potential

$$\begin{aligned} \phi (\textbf{x}) = \int d^3\textbf{x}'\,\frac{ \rho (\textbf{x}')}{|\textbf{x}-\textbf{x}'|} \end{aligned}$$
(19)

and the Poisson equation \(\nabla ^2\phi = -4\pi \rho \). Then replacing the primed factors in the double integral for W by this expression for the potential, and with a crucial integration by parts identity, we get

$$\begin{aligned} W= & {} \frac{1}{2} \int \int d^3\textbf{x}d^3\textbf{x}'\,\frac{\rho (\textbf{x}) \rho (\textbf{x}')}{|\textbf{x}-\textbf{x}'|} = \frac{1}{2}\int d^3\textbf{x}\, \rho (\textbf{x}) \phi (\textbf{x}) \nonumber \\= & {} -\frac{1}{8\pi } \int d^3\textbf{x}\, \phi (\textbf{x}) \nabla ^2\phi (\textbf{x}) = \frac{1}{8\pi } \int d^3\textbf{x}\, \left[ \nabla _i \phi (\textbf{x}) \nabla ^i\phi (\textbf{x})-\nabla _i( \phi (\textbf{x}) \nabla ^i\phi (\textbf{x}))\right] \nonumber \\= & {} \frac{1}{8\pi } \int d^3\textbf{x}\, \left[ E(\textbf{x})^2 -\nabla _i( \phi (\textbf{x}) \nabla ^i\phi (\textbf{x}))\right] . \end{aligned}$$
(20)

This integral extends only over the charge distribution, but it can be extended over all space since the extra contribution vanishes where the charge density is zero. As an integral over all space, the divergence term is equivalent by Gauss’s law to a surface integral at spatial infinity, where the integrand goes to zero fast enough in this static case that the surface integral vanishes in the limit. The result is just the first term, representing the total energy in the electric field:

$$\begin{aligned} W = \frac{1}{8\pi } \int E(\textbf{x})^2 \, d^3\textbf{x}=\int T^{00} d^3\textbf{x} \,. \end{aligned}$$
(21)

This self-energy plays a key role in the lowest order approximation to the equations of motion of the charge distribution.
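As a numerical cross-check of Eqs. (18) to (21), the following minimal sketch compares the two expressions for the self-energy in the case of a uniformly charged ball. It assumes Gaussian units and uses the standard interior potential \(\phi = e(3r_0^2-r^2)/(2r_0^3)\), which is not derived in the text. The function names and the choice of a midpoint rule are illustrative assumptions only.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def self_energy_from_charge(e=1.0, r0=1.0):
    """W = (1/2) * integral of rho * phi over the uniformly charged ball."""
    rho = 3.0 * e / (4.0 * math.pi * r0**3)
    # interior potential of a uniform ball (assumed standard result)
    phi = lambda r: e * (3.0 * r0**2 - r**2) / (2.0 * r0**3)
    return 0.5 * midpoint(lambda r: rho * phi(r) * 4.0 * math.pi * r**2, 0.0, r0)

def self_energy_from_field(e=1.0, r0=1.0):
    """W = (1/(8 pi)) * integral of E^2 over all space."""
    # interior field e r / r0^3; the radial measure is 4 pi r^2 dr / (8 pi) = r^2 dr / 2
    inside = midpoint(lambda r: (e * r / r0**3) ** 2 * r**2 / 2.0, 0.0, r0)
    # exterior field e / r^2; substituting s = 1/r maps r0..infinity to 0..1/r0
    outside = midpoint(lambda s: e**2 / 2.0, 0.0, 1.0 / r0)
    return inside + outside

if __name__ == "__main__":
    print(self_energy_from_charge(), self_energy_from_field())
```

Under these assumptions both functions should agree with \(3e^2/(5r_0)\), with the exterior field contributing \(e^2/(2r_0)\) and the interior field \(e^2/(10r_0)\).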

Returning now to the divergence \(-\rho E(U)^i\) in inertial coordinates of the rest frame of a static distribution of charge, its spatial integral reversed in sign is just the total electric force on the charge distribution which of course must be zero for a static configuration of charge, assuming that the charge elements are held in place by forces that are not addressed yet in this model. Otherwise the situation would not remain static. However, if the charge distribution is accelerated, there is no a priori reason to expect that the total electric force in its instantaneous rest frame be zero, and this was the error made in the Abraham and Lorentz model. Fermi showed that by requiring that the rigidity of the model respect Born’s special relativistic rigidity condition, this total force integral is modified by a simple factor that his Fermi coordinate system provided, and resolves the 4/3 problem. Gauss’s law is then the key to picking out the correct conserved 4-momentum of the total system which remains ambiguous in the static unaccelerated case, as we will show in the final section.

3 The Static Electron Model

Fermi considers an arbitrary spherically symmetric static distribution of total charge e with density \(\rho \) in the rest frame of the electron, while referring specifically to the Abraham-Lorentz model of a uniform surface distribution of charge on a sphere of radius \(r_0\) as the motivation for his analysis. The latter is an instructive example to keep in mind. Let the spherically symmetric charge distribution remain at rest at the spatial origin of a system of inertial coordinates \((t,x^i)\) associated with the inertial frame K in which it is at rest for all time. The inertial observer 4-velocity is \(u=\partial _t\) (in index-free notation). Let \((t,r,\theta ,\phi )\) be a corresponding system of spherical coordinates in terms of which the sphere containing the charge has the equation \(r=r_0\). The metric is

$$\begin{aligned} ds^2=-dt^2+dr^2+r^2(d\theta ^2+\sin ^2\theta \,d\phi ^2)\,. \end{aligned}$$
(22)

Then whatever the internal distribution of charge, the exterior field outside its outer surface at \(r=r_0\) in index-free notation is

$$\begin{aligned} F=-\frac{e}{r^2}dt\wedge dr\,,\qquad E=\frac{e}{r^2}\,\partial _r\,, \qquad B=0 \quad \ (r\ge r_0) \,, \end{aligned}$$
(23)

so that the Poynting vector is also zero. Introducing orthonormal components with respect to the normalized spherical coordinate frame via

$$\begin{aligned} X^{\hat{0}} = X^0\,,\ X^{\hat{r}} = X^r\,,\ X^{\hat{\theta }} = r X^\theta \,,\ X^{\hat{\phi }} = r\sin \theta \, X^\phi \,, \end{aligned}$$
(24)

the nonvanishing such components of the stress-energy tensor of the exterior field (for \(r\ge r_0\)) are

$$\begin{aligned} T_{\textrm{em}}^{00} = -T_{\textrm{em}}^{rr} = T_{\textrm{em}}^{\hat{\theta }\hat{\theta }} = T_{\textrm{em}}^{\hat{\phi }\hat{\phi }}= \frac{1}{8\pi } E^2 = \frac{e^2}{8\pi r^4} = U_{\textrm{em}}\,. \end{aligned}$$
(25)

Its divergence is zero in the exterior of the electron sphere. For the shell model, the interior electromagnetic field is zero by spherical symmetry, but if instead one assumes a constant density model inside a ball of radius \(r_0\), the interior electric field has magnitude \(e r/r^3_0\) (for \(r\le r_0\)). This interior field then contributes to the total energy of the field.

The inertial coordinate components of the 4-momentum (9) in this rest frame K for any rest frame time slice \(\Sigma _{{\textrm{rest}}}\) are

$$\begin{aligned} P(\Sigma _{{\textrm{rest}}})^0 =\int _{\Sigma _{{\textrm{rest}}}} U_{\textrm{em}}dV = W, \quad P(\Sigma _{{\textrm{rest}}})^k =\int _{\Sigma _{{\textrm{rest}}}} S^k dV =0, \end{aligned}$$
(26)

where W is the self-energy of the static charge distribution due to its own electric field. These are time-independent because of the time-independence of the electric field in this frame, leading to the constant 4-momentum

$$\begin{aligned} P(\Sigma _{{\textrm{rest}}})^\alpha = W U^\alpha \,. \end{aligned}$$
(27)

For the Coulomb field of the spherical shell electron, evaluating this quantity in spherical coordinates gives the energy of the Coulomb field

$$\begin{aligned} W = \frac{e^2}{2r_0} \,. \end{aligned}$$
(28)

For the constant density model of the electron, this integral over the internal field produces an additional contribution of \(e^2/(10r_0)\) leading to the total energy \(3e^2/(5r_0)\). If one assumes that this electromagnetic energy makes a contribution \(m_{\textrm{em}}\) to the inertial mass of the electron via Einstein’s mass-energy relation \(E=mc^2\), then \(m_{\textrm{em}}= W\) (in units where \(c=1\)). However, the inertial mass can only be ascertained from the equation of motion of an accelerated electron, so this must be confirmed by the evaluation of the equation of motion.

Note that the tracefree condition \(T_{\textrm{em}}^{00}=T_{\textrm{em}}^{11}+T_{\textrm{em}}^{22}+T_{\textrm{em}}^{33}\) in the Cartesian inertial coordinates when integrated over the same region yields the condition

$$\begin{aligned} \int _{\Sigma _{{\textrm{rest}}}} T_{\textrm{em}}^{00} dV = \int _{\Sigma _{{\textrm{rest}}}} \left( T_{\textrm{em}}^{11}+T_{\textrm{em}}^{22}+T_{\textrm{em}}^{33}\right) dV\,, \end{aligned}$$
(29)

but by the spherical symmetry of the electric field in the rest frame each of the terms on the right hand side has the same value

$$\begin{aligned} \int _{\Sigma _{{\textrm{rest}}}} T_{\textrm{em}}^{11} dV =\int _{\Sigma _{{\textrm{rest}}}} T_{\textrm{em}}^{22} dV = \int _{\Sigma _{{\textrm{rest}}}} T_{\textrm{em}}^{33} dV = \frac{1}{3} \int _{\Sigma _{{\textrm{rest}}}} T_{\textrm{em}}^{00} dV \,. \end{aligned}$$
(30)
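The symmetry statement (30) can be checked numerically. The sketch below assumes a purely radial field \(E^i = E n^i\), for which Eq. (25) implies the Cartesian components \(T_{\textrm{em}}^{kk} = U_{\textrm{em}}(1-2 n_k^2)\), and averages the ratio \(T_{\textrm{em}}^{kk}/T_{\textrm{em}}^{00}\) over the unit sphere. The helper names and grid sizes are illustrative assumptions.

```python
import math

def angular_average(f, n_theta=200, n_phi=200):
    """Average f(nx, ny, nz) over the unit sphere with a midpoint rule."""
    total = 0.0
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            n = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            total += f(*n) * math.sin(theta) * d_theta * d_phi
    return total / (4.0 * math.pi)

def stress_ratio(k):
    """Angular average of T^{kk} / T^{00} for a radial field, k = 0, 1, 2."""
    # radial field: T^{kk} = U * (1 - 2 n_k^2), with U = T^{00}
    return angular_average(lambda *n: 1.0 - 2.0 * n[k] ** 2)

ratios = [stress_ratio(k) for k in range(3)]
print(ratios)
```

Since the radial dependence of \(U_{\textrm{em}}\) is the same for all three components, an angular average close to 1/3 for each k is enough to reproduce Eq. (30).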

Consider a second inertial system \(K'\) with inertial Cartesian coordinates \((t',x^{1'},x^{2'},x^{3'})\), such that the original rest system K of the electron is moving with velocity v along the \(x^{1'}\)-axis, and let \(U'=\partial /\partial t'\) be the 4-velocity of the new time lines and let \(\Sigma '\) be a new time slice of constant time \(t'\). These are related to each other by the Lorentz coordinate transformation \(x^{\mu '} = L^\mu {}_\nu x^\nu \), namely

$$\begin{aligned} t'=\gamma (t + v x^1)\,,\ x^{1'}=\gamma (x^1 + v t)\,,\ \gamma =(1-v^2)^{-1/2}\,. \end{aligned}$$
(31)

If the 4-momentum (6) defined the same 4-vector for every inertial observer, then its inertial coordinate components would simply transform like those of a 4-vector should under this Lorentz transformation, namely the components (W, 0, 0, 0) would transform to \((W',p^{1'},p^{2'},p^{3'})\) whose nonzero values would be

$$\begin{aligned} W' =\gamma W=\gamma m_{\textrm{em}}\,, \qquad p^{1'} =\gamma Wv=\gamma m_{\textrm{em}}v\,, \end{aligned}$$
(32)

which in any case represent the new coordinate components of the 4-vector representing the 4-momentum as seen in the rest frame. However, the 4-momentum as seen by the new observer is a different 4-vector, as Gauss’s law requires, so this transformed 4-momentum is not the result of evaluating the 4-momentum formulas in the new frame, and it is senseless to actually compare the transformed components of the old 4-vector with the new components of the new 4-vector.

To instead evaluate the 4-momentum (6) as seen by the new inertial observer in the frame \(K'\) in the new inertial coordinates, we must first Lorentz transform the components of the electromagnetic energy-momentum tensor to the new frame and then perform the integration over the new time coordinate hyperplane \(\Sigma '\), and finally relate that integral to the integral over the original rest frame time coordinate hyperplane \(\Sigma _{{\textrm{rest}}}\) using the condition of time invariance in the rest frame. The stress-energy tensor transforms as follows

$$\begin{aligned} T_{\textrm{em}}^{\alpha '\beta '} = L^\alpha {}_\mu L^\beta {}_\nu T_{\textrm{em}}^{\mu \nu }\,. \end{aligned}$$
(33)

Using \(T{}_{\textrm{em}}^{01}=0\), the nontrivial part of this transformation in the t-\(x^1\) components is explicitly

$$\begin{aligned} T_{\textrm{em}}^{0'0'}= & {} \gamma ^2[T{}_{\textrm{em}}^{00}+2vT{}_{\textrm{em}}^{01}+v^2T{}_{\textrm{em}}^{11}] =\gamma ^2[T{}_{\textrm{em}}^{00}+v^2T{}_{\textrm{em}}^{11}] ,\nonumber \\ T_{\textrm{em}}^{0'1'}= & {} \gamma ^2[T{}_{\textrm{em}}^{01}+v(T{}_{\textrm{em}}^{00}+T{}_{\textrm{em}}^{11})+v^2T_{\textrm{em}}^{01}] =\gamma ^2 v(T{}_{\textrm{em}}^{00}+T{}_{\textrm{em}}^{11}) . \end{aligned}$$
(34)
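The component formulas (34) can be checked against the general transformation law (33). The following sketch builds the Lorentz matrix of Eq. (31) in pure Python and applies it to sample rest-frame values at a single point. The sample numbers are arbitrary, chosen only to satisfy \(T_{\textrm{em}}^{01}=0\) and the tracefree condition, and the function names are hypothetical.

```python
import math

def boost_matrix(v):
    """Lorentz matrix L for Eq. (31): t' = g (t + v x^1), x^1' = g (x^1 + v t)."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, g * v, 0.0, 0.0],
            [g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform_tensor(L, T):
    """T'^{ab} = L^a_m L^b_n T^{mn}, as in Eq. (33)."""
    return [[sum(L[a][m] * L[b][n] * T[m][n]
                 for m in range(4) for n in range(4))
             for b in range(4)] for a in range(4)]

v = 0.6
g = 1.0 / math.sqrt(1.0 - v * v)
# arbitrary sample values with T^{00} = T^{11} + T^{22} + T^{33} and T^{0i} = 0
T00, T11, T22, T33 = 1.0, -0.2, 0.7, 0.5
T = [[T00, 0.0, 0.0, 0.0],
     [0.0, T11, 0.0, 0.0],
     [0.0, 0.0, T22, 0.0],
     [0.0, 0.0, 0.0, T33]]
Tp = transform_tensor(boost_matrix(v), T)

print(Tp[0][0], g * g * (T00 + v * v * T11))   # both sides of the first line of (34)
print(Tp[0][1], g * g * v * (T00 + T11))       # both sides of the second line of (34)
```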

The 3-volume element on the hyperplane \(\Sigma '\) transforms according to \(dV'=dV/\gamma \) due to the Lorentz contraction of the differential \(dx^1\). This follows from the relation \(dx^1=\gamma (dx'{}^1-v dt')\) restricted to \(dt'=0\), while \(dx^2=dx'{}^2,dx^3=dx'{}^3\), so that

$$\begin{aligned} dV' =dx^{1'} dx^{2'} dx^{3'} = \gamma ^{-1}dx{}^1 dx{}^2 dx{}^3 =\gamma ^{-1} dV\,. \end{aligned}$$
(35)

Then taking the symmetry property (30) into account, one finds

$$\begin{aligned} P(\Sigma ')^{0'}= & {} \int _{\Sigma '} T_{\textrm{em}}^{0'0'}dV' =\gamma \left( 1+\frac{1}{3} v^2\right) \int _{\Sigma '} T{}_{\textrm{em}}^{00}dV\nonumber \\= & {} \gamma \left( 1+\frac{1}{3} v^2\right) \int _{\Sigma _{{\textrm{rest}}}} T{}_{\textrm{em}}^{00}dV \nonumber \\= & {} \gamma \left( 1+\frac{v^2}{3}\right) m_{\textrm{em}} = \left( \frac{4}{3}\gamma -\frac{1}{3\gamma }\right) m_{\textrm{em}} , \nonumber \\ P(\Sigma ')^{1'}= & {} \int _{\Sigma '} T_{\textrm{em}}^{0'1'}dV' =\gamma v\left( 1+\frac{1}{3}\right) \int _{\Sigma '} T{}_{\textrm{em}}^{00}dV\nonumber \\= & {} \gamma v\left( 1+\frac{1}{3}\right) \int _{\Sigma _{{\textrm{rest}}}} T{}_{\textrm{em}}^{00}dV =\frac{4}{3}\gamma m_{\textrm{em}} v. \end{aligned}$$
(36)

Here the integral over \(\Sigma '\) of the integrand with respect to dV in each case equals the integral over \(\Sigma _{{\textrm{rest}}}\) because its value at \(x^{i'}\), re-expressed in terms of the old coordinates \(x^i\) of the same point, is independent of t because the charge configuration is static in the rest frame, and so has the same value at the corresponding point of \(\Sigma _{{\textrm{rest}}}\). For example, on the hyperplane \(\Sigma '\), when expressed in terms of the old coordinates, the old components \(T{}_{\textrm{em}}^{00}(t',x^{1'},x^{2'},x^{3'})=T{}_{\textrm{em}}^{00}(x{}^1,x{}^2,x{}^3)\) simply don’t depend on t, and so the integral against dV on that hyperplane is equal to its integral against dV on the original rest frame hyperplane \(\Sigma _{{\textrm{rest}}}\). The appendix shows how to re-express the difference between the 4-vectors \(P(\Sigma ')\) and \(P(\Sigma _{{\textrm{rest}}})\) independent of inertial coordinates.

In the nonrelativistic limit \(|v|\ll 1\) where \(\gamma \rightarrow 1\), the energy is unchanged, but the momentum has an unwanted extra factor of 4/3. This is the famous 4/3 problem for the unaccelerated electron. Furthermore, at nonzero speeds for which \(v^2\) becomes appreciable compared to 1, the ratio between the magnitude of the linear momentum and the energy is a complicated function of the speed |v| rather than the simple result |v|/c as in special relativity. However, this apparent problem is based on a misconception since as explained after Eq. (5) in the previous section, the 4-momentum of the electromagnetic field depends on the observer in the presence of sources, and each distinct inertial observer produces a different 4-vector from this process, so it makes no sense to compare the result (36) to the Lorentz transformation of the original 4-vector produced by the rest frame observer. For some reason this was never understood in the early days of relativity. Because historically people insisted on finding some conserved 4-momentum to assign to the electromagnetic field, they arbitrarily picked the only natural choice for an unaccelerated electron, the 4-momentum as seen in the rest frame of the electron, and in fact, this is the one we associate with a particle whose rest mass is \(m_{\textrm{em}}\). This was first proposed by Kwal in 1939 [14] although not stated so clearly and later independently by Rohrlich [15] in 1960. The real 4/3 problem is instead its unwanted appearance as a factor in the inertial mass evaluated for the accelerated electron model developed by Abraham and Lorentz. Unfortunately their calculation preceded the introduction by Born of a relativistically invariant notion of rigidity for that model, which Fermi eventually realized was the key to resolving that apparent conflict with the equivalence of mass and energy in special relativity.
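To make the comparison between Eqs. (32) and (36) concrete, the short sketch below tabulates both sets of components for a few speeds. It assumes units with \(c=1\) and \(m_{\textrm{em}}=1\); the function names are illustrative and not taken from the original papers.

```python
import math

def gamma(v):
    """Lorentz factor for speed v in units where c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

def transformed_rest_momentum(v, m_em=1.0):
    """Eq. (32): Lorentz transform of the rest-frame components (W, 0, 0, 0)."""
    g = gamma(v)
    return g * m_em, g * m_em * v

def moving_slice_momentum(v, m_em=1.0):
    """Eq. (36): energy and momentum integrated over the moving observer's time slice."""
    g = gamma(v)
    return g * (1.0 + v * v / 3.0) * m_em, 4.0 / 3.0 * g * m_em * v

for v in (0.01, 0.3, 0.6, 0.9):
    E32, p32 = transformed_rest_momentum(v)
    E36, p36 = moving_slice_momentum(v)
    print(f"v={v:4.2f}  p36/p32={p36 / p32:.4f}  "
          f"p32/E32={p32 / E32:.4f}  p36/E36={p36 / E36:.4f}")
```

The momentum ratio stays at 4/3 for every speed, while the ratio of momentum to energy equals v only for the transformed rest-frame components of Eq. (32).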

For completeness we explain what Kwal and Rohrlich actually did. In the integral formulas in the primed inertial coordinates Kwal replaced the hypersurface volume element

$$\begin{aligned} d\Sigma '_{\beta '} = -u_{\beta '} dV' =\delta ^0{}_{\beta } dV' \end{aligned}$$
(37)

by the one corresponding to the rest frame hypersurface volume element at the same spacetime point but expressed in the new coordinates

$$\begin{aligned} d\Sigma _{\beta '} = -u^{\textrm{rest}}_{\beta '} \gamma dV' = -u^{\textrm{rest}}_{\beta '} dV \,, \end{aligned}$$
(38)

where \(dV=\gamma dV'\) and \((-u^{\textrm{rest}}_{\beta '}) = \gamma (1, -v_i)\). This substitution disconnects the hypersurface volume element 4-vector from the hypersurface of integration, changing both its direction and magnitude, and so changes the integral to a new one. See Fig. 1. The original integral

$$\begin{aligned} P(\Sigma ')^{\alpha '} =\int _{\Sigma '} T_{\textrm{em}}^{\alpha '\beta '}d\Sigma '_{\beta '}\,, \end{aligned}$$
(39)

is thereby replaced by

$$\begin{aligned} P_{\textrm{KR}}(\Sigma ')^{\alpha '}\equiv & {} \int _{\Sigma '} T_{\textrm{em}}^{\alpha '\beta '}(-u^{\textrm{rest}}_{\beta '} \gamma dV')\,=\,\int _{\Sigma '} L^\alpha {}_\delta T_{\textrm{em}}{}^{\delta \beta } (-u^{\textrm{rest}}_{\beta }) dV \nonumber \\=\,& {} \int _{\Sigma '} L^\alpha {}_\delta T_{\textrm{em}}{}^{\delta \beta }\delta ^0{}_{\beta } dV\,=\, L^\alpha {}_\delta \int _{\Sigma '} T_{\textrm{em}}{}^{\delta 0} dV \nonumber \\=\, & {} L^\alpha {}_\delta \int _{\Sigma _{{\textrm{rest}}}} T_{\textrm{em}}{}^{\delta 0} dV =\, L^\alpha {}_\delta P(\Sigma _{{\textrm{rest}}})^\delta , \end{aligned}$$
(40)

using \(-u^{\textrm{rest}}_{\beta }\,=\,\delta ^0{}_\beta \) for the rest frame inertial coordinate components. Again one must use the time invariance in the rest frame to conclude that the integral over \(\Sigma '\) when expressed in the rest frame inertial coordinates is independent of time and so agrees with the integral over \(\Sigma _{{\textrm{rest}}}\), allowing the components of the new 4-momentum to transform like those of a 4-vector from the components in the rest frame.

This redefinition of the momentum integral is perhaps more simply understood as the result of merely inserting the projection operator along the unit rest frame 4-velocity vector, \(-u^{\textrm{rest}}{}^{\delta '} u^{\textrm{rest}}_{\beta '} \), into the contracted pair of indices. The relation \(\gamma = -u^{\textrm{rest}}{}^{\delta '} u_{\delta '}\) for the relative gamma factor of the two 4-velocities then supplies the gamma factor in the integrand which undoes the Lorentz contraction, giving the rest frame volume element \(dV_{\Sigma _{{\textrm{rest}}}} = \gamma dV_{\Sigma '} \) at the same point:

$$\begin{aligned} P_{\textrm{KR}}(\Sigma ')^{\alpha '}= & {} \int _{\Sigma '} T_{\textrm{em}}^{\alpha '\beta '} (-u^{\textrm{rest}}{}^{\delta '} u^{\textrm{rest}}_{\beta '}) d\Sigma '_{\delta '} \nonumber \\= & {} \int _{\Sigma '} T_{\textrm{em}}^{\alpha '\beta '} (-u^{\textrm{rest}}{}^{\delta '} u^{\textrm{rest}}_{\beta '}) (-u_{\delta '}dV_{\Sigma '}) \nonumber \\= & {} -\int _{\Sigma '} T_{\textrm{em}}^{\alpha '\beta '} u^{\textrm{rest}}_{\beta '} (-u^{\textrm{rest}}{}^{\delta '} u_{\delta '}) dV_{\Sigma '} \nonumber \\= & {} -\int _{\Sigma '} T_{\textrm{em}}^{\alpha '\beta '} u^{\textrm{rest}}_{\beta '} (\gamma ) dV_{\Sigma '} \nonumber \\= & {} -\int _{\Sigma '} T_{\textrm{em}}^{\alpha '\beta '} u^{\textrm{rest}}_{\beta '} dV_{\Sigma _{{\textrm{rest}}}}. \end{aligned}$$
(41)

Since there is only one free index here, if we re-express the integral in the rest frame inertial coordinates, then we get

$$\begin{aligned} P_{\textrm{KR}}(\Sigma ')^{\alpha '}= & {} - L^\alpha {}_\mu \int _{\Sigma '} T_{\textrm{em}}{}^{\mu \beta } u^{\textrm{rest}}_{\beta } dV_{\Sigma _{{\textrm{rest}}}}, \end{aligned}$$
(42)

but the integral is still over the new time hyperplane. However, the integrand is a static function independent of the rest frame time coordinate t, so it is equivalent to the integral over \(\Sigma _{{\textrm{rest}}}\) instead

$$\begin{aligned} P_{\textrm{KR}}(\Sigma ')^{\alpha '}= & {} - L^\alpha {}_\mu \int _{\Sigma _{{\textrm{rest}}}} T_{\textrm{em}}{}^{\mu \beta } u^{\textrm{rest}}_{\beta } dV_{\Sigma _{{\textrm{rest}}}} \nonumber \\= & {} L^\alpha {}_\mu P(\Sigma _{{\textrm{rest}}})^\mu = P(\Sigma _{{\textrm{rest}}})^{\alpha '} . \end{aligned}$$
(43)
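The chain of equalities (40) to (43) can be illustrated with integrated quantities alone. The sketch below assumes that the rest-frame volume integrals of the stress-energy components are \(W\,\textrm{diag}(1,1/3,1/3,1/3)\), using Eqs. (26) and (30) together with the vanishing of the off-diagonal spatial integrals by spherical symmetry. It then forms both the Kwal-Rohrlich integral and the integral with the unmodified volume element of Eq. (37). The function name is hypothetical.

```python
import math

def kr_and_naive_momentum(v, W=1.0):
    """Return (P_KR, P(Sigma')) components in the primed frame for boost speed v."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    L = [[g, g * v, 0.0, 0.0],
         [g * v, g, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    # assumed rest-frame volume integrals of T^{mn}: W * diag(1, 1/3, 1/3, 1/3)
    M = [[(W if m == 0 else W / 3.0) if m == n else 0.0
          for n in range(4)] for m in range(4)]
    # integral of T^{a'b'} against the rest-frame volume element dV = gamma dV'
    Mp = [[sum(L[a][m] * L[b][n] * M[m][n] for m in range(4) for n in range(4))
           for b in range(4)] for a in range(4)]
    minus_u_rest = [g, -g * v, 0.0, 0.0]        # -u^rest_{b'} = gamma (1, -v_i)
    p_kr = [sum(Mp[a][b] * minus_u_rest[b] for b in range(4)) for a in range(4)]
    p_naive = [Mp[a][0] / g for a in range(4)]  # d Sigma'_{b'} = delta^0_b dV / gamma
    return p_kr, p_naive

p_kr, p_naive = kr_and_naive_momentum(0.6)
print(p_kr)     # components of the Lorentz-transformed rest-frame 4-momentum
print(p_naive)  # components given by Eq. (36)
```

With these assumptions the first list reproduces \((\gamma W, \gamma W v, 0, 0)\) and the second reproduces Eq. (36), so the two prescriptions differ exactly by the terms responsible for the 4/3 factor.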

Kwal was not sophisticated enough to do more than examine the volume element without ever referring explicitly to the actual region of integration, where the staticity condition in the rest frame is essential to allow the integral to be done on any time hyperplane. Rohrlich simply demanded that the original integral for the 4-momentum only be performed on a time hyperplane in the rest frame of the electron, which eliminates the consideration of the integrals on other hyperplanes which yield results different from that evaluated in the rest frame. Thus one always evaluates the 4-momentum integral to the same 4-vector, whose components one can express in any inertial coordinate system, and which will then transform under the corresponding relative Lorentz transformation.

Fig. 1

A 2-dimensional diagram of the rest frame time coordinate line t (slanted forward) and a moment of rest frame coordinate time \(\Sigma \) (slanted upward) and the moving frame with time coordinate line \(t'\) (vertical) and a moment \(\Sigma '\) of its time (horizontal). For a differential region independent of time in the rest frame, like the strip between the t axis and the parallel line immediately to its right, the differential of volume \(dV'\) on \(\Sigma '\) as seen in the moving frame is Lorentz contracted with respect to the rest frame differential on \(\Sigma \): \( dV'=\gamma ^{-1} dV\). Thus integrating on \(\Sigma '\) with respect to the differential \(\gamma dV'\) is equivalent to integrating over the corresponding region of \(\Sigma \) (obtained by projection from \(\Sigma '\) to \(\Sigma \) along the t coordinate lines), provided that the integrand is independent of time in the rest frame

4 Fermi’s Contribution

Fermi’s first paper in 1921 (“On the dynamics of a rigid system of electric charges in translational motion,” [5]) studied a special relativistic system of electrons in rigid motion as then understood by Abraham and Lorentz and found the 4/3 factor in its inertial mass formula, while this factor was not present in the mass corresponding to the “weight” he calculated using general relativity in his second paper (“On the electrostatics of a homogeneous gravitational field and on the weight of electromagnetic masses,” [6]), referring to Levi-Civita’s uniformly accelerated metric for the calculations [59]. This contradicted the assumed equivalence of these two masses in general relativity. These papers were both written within five years of the birth of Einstein’s theory of general relativity in 1916, during which Fermi was first a high school student and then a university student writing his first two scientific papers. During the next year, 1922, in preparation for his return to the problem, Fermi published his third paper on his famous Fermi comoving coordinate system adapted to the local rest spaces along the world line of a particle in motion (“On phenomena occurring close to a world line,” [7]), and calculated the variation of the action for a system of charges and masses interacting with an electromagnetic field in such a coordinate system. He then used this approach to resolve the 4/3 puzzle in his fourth paper (two versions, Fermi 4a and 4c, published in Italian and one in German, the most complete of which is “On a contradiction between electrodynamic theory and the relativistic theory of electromagnetic mass,” [8,9,10,11]) without explicitly referring to the third paper. These were published in 1922–1923. Still in 1923, in collaboration with A. Pontremoli [60], Fermi applied the same argument to correct the calculation of the inertial mass of the radiation in a cavity with reflecting walls, where the same 4/3 factor had appeared when the cavity is in rigid motion not respecting the Born criterion; Boughn and Rothman provide a detailed alternative analysis which confirms Fermi’s result in that case [61].

His approach was to use a variational principle in a region of spacetime containing the world tube of an accelerated electron charge distribution within which one has to make certain assumptions on how the relative motion of the individual charge elements in the distribution behaves. Following the Born notion of rigidity compatible with special relativity, the only way an electron can move rigidly so that its shape in its rest frame does not change is if the individual world lines of the charge distribution all cut the local rest frame time slices orthogonally, a Lorentz invariant geometrical condition which is equivalent to stating that their relative velocities are all zero at that moment. This condition must hold in a sequence of different inertial observers with respect to which the charge distribution is at rest. If instead one takes the family of time slices associated with a single inertial observer and requires that the shape not change, i.e., that the relative velocities are all zero at each such time, this corresponds to the nonrelativistic notion of rigidity, and the world lines may be varied by arbitrary time-dependent translations, so that their variations of the spatial inertial coordinates from a given state can be arbitrary functions of time. However, such a conventional rigid motion with respect to that single observer will not be seen as rigid in that sense with respect to any other single inertial observer, so it is clearly incompatible with special relativity as emphasized by Fermi. This was perhaps obvious, but no one had examined the equations of motion starting from the Lagrangian to understand that the usual starting point for the Abraham-Lorentz evaluation of their assumed equations of motion was equivalent to this assumption. This was the insight that Fermi had to resolve the problem. Assuming conventional rigidity, one finds the starting point equations of motion of the Abraham-Lorentz model, whose analysis yields the incorrect inertial mass with the 4/3 factor, but with Born rigidity one instead finds the one expected from Einstein’s mass-energy relation, which removes this factor. The only difference in the two calculations is the resulting Fermi correction factor in the integral of the total force on the charge distribution, a factor arising from the spacetime volume element in Fermi coordinates due to the acceleration of its central world line.

Fig. 2

A constant \(x^2,x^3\) slice of inertial coordinates \((t,x^i)\) showing the world tube of an electron sphere instantaneously at rest at \(t=0\) but accelerated in the negative \(x^1\) direction (\(\Gamma _1<0\)) and two successive rest frame Fermi time coordinate slices (\(\Sigma : t=0\) and \(\Sigma ': t=\Delta \tau (1+\Gamma _1x^1)\)) separated by infinitesimal proper time \(\Delta \tau \) at the center of the sphere, with the Fermi time slices intersecting to the right of the world tube (equivalent to the assumption \(|\Gamma _1|r_0<1\)). The spacetime region within the electron world tube between the two slices (shaded in this plane cross-section) occurs in the Gauss’s law application to the wedge between the two time slices, namely \(R_-\cup R_+\), two regions which are separated from each other by a plane of constant \(x^1\) within the hypersurface \(t=0\) shown as the intersection point in this diagram

Fermi considers a laboratory frame with inertial coordinates \((t,x^1,x^2,x^3)\) in which at the end of his argument, the accelerated electron is momentarily at rest centered about the spatial origin at the initial coordinate time which we will assume for simplicity to be \(t=0\). Assuming that the Fermi coordinate system \((T,X^1,X^2,X^3)\) is adapted to a world line in the electron charge distribution passing through the origin of these spatial coordinates at \(t=0\) when \(v^i = 0\), its time hypersurface \(T=0\) can be chosen to coincide with \(t=0\), but after a small interval dt of laboratory time along the central world line, equal to the increment dT in the proper time along that world line to first order, the Fermi time slice is instead tilted slightly to remain orthogonal to that world line as shown in Fig. 2. The metric in the Fermi coordinate system is

$$\begin{aligned} ds^2 = -N^2 dT^2 +\delta _{ij}dX^i \, dX^j\,,\quad N= c(1+\Gamma _i X^i/c^2) \,, \end{aligned}$$
(44)

where \(\Gamma _i=\dot{v}{}^i =dv^i/dT\) are the Cartesian components of the proper acceleration of the central world line (functions of T), and the speed of light c is not taken to be unity in this paragraph only in order to appreciate how factors of c enter the discussion. The proper time along the central Fermi coordinate time line is initially approximately \(dT=dt\) at \(t=0=T \), but away from the spatial origin at that world line there is a linear correction factor due to the lapse function N in the Fermi coordinate system. The proper time interval along the normal to the initial hypersurface (measured by the increment in t or T to first order) to a nearby Fermi time slice is the increment \(c^{-1}N\, dT = (1+ \Gamma _i x^i/c^2) dT\), namely the proper time along the time lines in the Fermi coordinate system. Misner, Thorne and Wheeler discuss the Fermi coordinate system in detail [57]. Of course because the proper time of each charge element world line varies by the Fermi lapse function factor compared to the central world line, the accelerations of the actual charge elements away from the central world line differ slightly from that of the central world line.
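The role of the lapse factor can be illustrated in the special case of constant proper acceleration along \(x^1\), where the Fermi coordinates reduce to Rindler-type coordinates. This is an assumption made here for the sake of a closed-form example; the text allows \(\Gamma _i\) to depend on T. The sketch below uses \(c=1\) and hypothetical function names. It checks by finite differences that the proper time rate along a Fermi time line at fixed X is \(1+\Gamma X\), and that a nearby Fermi time slice satisfies \(t \approx T(1+\Gamma x)\) as quoted in the caption of Fig. 2.

```python
import math

def fermi_to_inertial(T, X, Gamma):
    """Inertial (t, x) of the Fermi point (T, X), assuming constant proper acceleration Gamma."""
    R = 1.0 / Gamma + X
    return R * math.sinh(Gamma * T), R * math.cosh(Gamma * T) - 1.0 / Gamma

def proper_time_rate(X, Gamma, dT=1e-6):
    """d(tau)/dT along the Fermi time line at fixed X, by finite differences at T = 0."""
    t0, x0 = fermi_to_inertial(0.0, X, Gamma)
    t1, x1 = fermi_to_inertial(dT, X, Gamma)
    return math.sqrt((t1 - t0) ** 2 - (x1 - x0) ** 2) / dT

Gamma, X = -0.5, 0.3          # acceleration in the negative x^1 direction, |Gamma| X < 1
print(proper_time_rate(X, Gamma), 1.0 + Gamma * X)

T = 1e-4
t, x = fermi_to_inertial(T, X, Gamma)
print(t, T * (1.0 + Gamma * x))   # nearby Fermi slice: t is approximately T (1 + Gamma x)
```

The same factor \(1+\Gamma X\) is what multiplies the coordinate volume element in the Fermi coordinate system, since the spatial part of the metric (44) is flat.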

If we imagine doing a variation of the action integral over a spacetime region in inertial coordinates between two slices of inertial time (his variation A), then if we use the same coordinate symbols \((t,x^i)\) for the corresponding variation in Fermi coordinates between two slices of Fermi coordinate time (his variation B), the only formal difference in the action integrand is the additional Fermi lapse factor which enters through the spacetime volume element. This lapse correction factor is the entire basis for Fermi’s correction, and multiplies the coordinate volume element to provide the covariant spacetime volume element in Minkowski spacetime: \(d^4V\) which is \(d^4 x = dt\, dV\) in inertial coordinates but \(N dt\, dV\) in Fermi coordinates, where \(N dt=d\tau \) is the proper time along the time world lines orthogonal to the flat time slices and \(dV=dx^1 dx^2 dx^3\) is the spatial volume element in both cases. Fermi does not mention his mathematical article on these coordinates, but just presents a short derivation of the correction factor based on the curvature of the world line. The extra acceleration term in the integral with coefficient \(\Gamma _i x^i\) (with \(c=1\) again) provides exactly the necessary correction to produce the desired result in the inertial mass coefficient in the equations of motion for any smooth spherically symmetric model of the electron.

However, to justify this variation of the action yielding the Lagrange equations, the variations must vanish on the bounding time slices and be arbitrary functions of time for the intermediate times. For the variation A, Fermi explicitly states that the variations of the spatial coordinates are arbitrary functions of t which vanish at the end slices, but for the variation B he only examines an infinitesimal contribution of an interval of Fermi time to the whole 4-dimensional integral and he emphasizes that for that interval of time, the variations in the spatial coordinates of the world lines should be arbitrary constants to represent an overall translation of those world lines. However, in order to claim his resulting Lagrangian equation is valid, it has to be understood that as in the first case, the variations in the spatial coordinates must be arbitrary functions of the time coordinate which vanish at the end times. This implies that the Lagrangian variation extremizes the action among all those world lines which break the rigid Born symmetry assumed in the solution about which the variation takes place. It does not allow for a variation among the family of Born rigid motions of the electron near the given solution. None of this is made explicit in Fermi’s article.

If the spatial variations were arbitrary constants in the Fermi coordinate system in order to preserve the rigidity in the variation, and if they were to vanish on the end time slices, they would vanish everywhere, so one could not conclude that at every time along the world tube of the electron the spatial integral coefficients of the variation must vanish. On the other hand, if they did not vanish at the end times, one could not ignore the boundary terms which result from the integration by parts along the time lines. Furthermore, without being independent variations at each time, one cannot conclude that their coefficients must vanish. This is a very tricky point, since in general one cannot impose symmetries on a Lagrangian and be guaranteed to get the same equations of motion for the restricted variational principle as those that result from imposing the symmetries on the Lagrangian equations of motion derived from the general variational principle, as discussed by MacCallum and Taub for the complementary problem of spatial rather than temporal symmetry imposed on a Lagrangian [62]. It is the boundary terms which play the key role in this discussion. By not requiring that the variations about a symmetric solution conform to the symmetry, Fermi appears to have avoided these difficulties.

Note that the model of the charge distribution as some kind of rigid body is necessary in order to assign some common acceleration to the system at each moment of time (that of the central world line) so that its coefficient in the equations of motion can be interpreted as the inertial mass. Consider therefore, as Fermi does, such an accelerated system of electric charge in special relativity held at rest relative to each other by some external forces (i.e., in conventional or relativistic rigid motion). The corresponding action is given in inertial coordinates by the usual Lagrangian integral, with the additional term in the mechanical mass added back into the discussion representing a rest mass distribution with differential mass dm assumed to have the same rigidity properties as the charge distribution with differential charge de, i.e., the mass and charge elements share the same world lines:

$$\begin{aligned} S=S(A_\mu , x^\alpha )=\int \left( -\frac{1}{16\pi }F^{\alpha \beta }F_{\alpha \beta }+ A_\mu J^\mu \right) d^4 x -\int d\tau \,dm \,. \end{aligned}$$
(45)

The region of integration is an arbitrary region of spacetime, and the 4-current \(J^\mu = \rho \, U^\mu \) depends on the parametrized world lines of the charged particles, whose unit 4-velocity is \(U^\mu =dx^\mu /d\tau \) if \(d\tau \) is the increment of proper time along them. The charge and mass terms are first integrated over the world lines of the charge and mass elements and then over the family of these world lines. Both the charge and mass profiles as a function of the family of world lines of the matter distribution are assumed to be given and fixed along those world lines. Fermi discusses and varies this action in his Fermi coordinate article [7]. The line integrals in the charge and mass distribution terms are parametrization independent, so the world lines can be parametrized by any parameter, including coordinate time.

Varying S with respect to the vector potential \(A_\mu \), fixing the world lines of the charge distribution, leads to the inhomogeneous Maxwell’s equations. In fact

$$\begin{aligned} \delta S |_{x^\alpha =const.}= & {} \int d^4 x\,\left( -\frac{1}{8\pi } F^{\alpha \beta } \delta F_{\alpha \beta } +J^\mu \delta A_\mu \right) \nonumber \\= & {} \int d^4 x\,\left( -\frac{1}{4\pi } F^{\alpha \beta } \delta (\partial _\alpha A_\beta ) + J^\mu \delta A_\mu \right) \nonumber \\= & {} \int d^4 x\,\left( \frac{1}{4\pi } \partial _\alpha F^{\alpha \beta } \delta A_\beta + J^\mu \delta A_\mu \right) - \int d^4 x\, \frac{1}{4\pi } \partial _\alpha (F^{\alpha \beta }\delta A_\beta ) \nonumber \\= & {} \int d^4 x\, \left( \frac{1}{4\pi }\partial _\alpha F^{\alpha \mu }+J^\mu \right) \delta A_\mu , \end{aligned}$$
(46)

that is

$$\begin{aligned} \partial _\alpha F^{\mu \alpha }= 4\pi J^\mu \,. \end{aligned}$$
(47)

The next to last equality in this sequence follows from the usual Lagrangian variation integration by parts, resulting in the integral of a divergence which by Gauss’s law is equivalent to a boundary integral where the variation is assumed to vanish and hence does not contribute to the final expression.

The variation of S with respect to the coordinates of the charge element world lines, where the above variations A and B are relevant, requires first reinterpreting the spacetime volume integral of the interaction term as the integral over a family of line integrals along those world lines. This is most easily done using the adapted Fermi coordinate system where the spatial coordinates parametrize the world lines of the charge elements, which are the time lines of the system. The spacetime volume element is \(d^4x= N dt\, dV =d\tau \, dV\) with \(dV=d^3x\) and \(d\tau =N \,dt \). The 4-current is \(J^\mu =\rho \, U^\mu \), where \(\rho \) is the rest frame charge density, which is a constant (along the world lines but zero everywhere else), and \(U^\alpha =dx^\alpha /d\tau \) is the charge element 4-velocity. Then let \(de=\rho dV\). The interaction term in the action can then be represented as the integral of the line integral along the world line with respect to the rest frame charge element de

$$\begin{aligned} \int J^\mu A_\mu d^4 x = \int \int \rho A_\mu \frac{dx^\mu }{d\tau } d\tau \, dV = \int \left( \int A_\mu dx^\mu \right) \, de \,. \end{aligned}$$
(48)

Keeping in mind the geometrical origin of de, the line integral is coordinate independent and so one can use this expression also in inertial coordinates using any parametrization of the world lines.

Since variations of the electromagnetic field Lagrangian at constant \(A_\alpha \) vanish, we only have to vary the source term, where \(A_\alpha \) is instead evaluated along the charge element world lines so \(\delta A_\mu = A_{\mu ,\nu } \delta x^\nu \). Using the fact that \(\delta (dx^\mu )=d(\delta x^\mu )\) as usual in the Lagrangian variation, we find step by step for the variation of the interaction term

$$\begin{aligned} \delta S |_{A_\alpha =const.}= & {} \delta \left( \int d e\, dx^\mu A_\mu \right) \nonumber \\= & {} \int de \, \int \, \left[ A_{\mu ,\sigma } dx^\mu \delta x^\sigma +A_\sigma \delta dx^\sigma \right] \nonumber \\= & {} \int de \, \int \, \left[ A_{\mu ,\sigma } dx^\mu \delta x^\sigma - dA_\sigma \delta x^\sigma + d(A_\sigma \delta x^\sigma ) \right] \nonumber \\= & {} \int de \,\int \, \left[ ( A_{\mu ,\sigma } - A_{\sigma ,\mu } ) d x^\mu \delta x^\sigma + d(A_\sigma \delta x^\sigma )\right] \nonumber \\= & {} \int de \,\int \, F_{\sigma \mu }dx^\mu \delta x^\sigma +\int de\, \int \, d(A_\sigma \delta x^\sigma ). \end{aligned}$$
(49)

Ignoring the boundary term, the first integral (where the line integral part is independent of the parametrization of the world lines) can be expressed in terms of inertial coordinates or proper time in Fermi coordinates, where the Fermi lapse correction factor depends on the location of the charge element

$$\begin{aligned} \int \left( \int F_{\sigma \mu } \frac{dx^\mu }{dt} de\right) \delta x^\sigma \, dt = \int \left( \int F_{\sigma \mu } \frac{dx^\mu }{d\tau } N de\right) \delta x^\sigma \, dt \,. \end{aligned}$$
(50)

Both expressions are equivalent but the presence of a nonunit lapse function in the Fermi coordinate system is crucial.

If we consider the left expression in inertial coordinates in which the electron is momentarily at rest (so that \(N=1\), \(dx^\mu /dt=\delta ^\mu {}_0 \) and dV agrees with the Fermi coordinate volume element), it reduces to

$$\begin{aligned} \int \left( \int E_i \,de\right) \delta x^i \, dt =\int \, \left( \int \rho E_i dV\right) \delta x^i \, dt \end{aligned}$$
(51)

since \(F_{i0}=E_i\) is the electric field in inertial coordinates and \(F_{00}=0\). The factor in parentheses is just the total electric force on the distribution of electric charge at this moment. For the Fermi variation A in these inertial coordinates, one has \(\delta x^\sigma =\delta ^\sigma {}_i \delta x^i (t)\) and one can require that \( \delta x^i(t_1) =0 = \delta x^i (t_2)\) at the boundary inertial time hyperplanes of the region of integration, while leaving \(\delta x^i(t)\) arbitrary in between. This allows one to ignore the boundary term which integrates to the end times where the variation vanishes, while forcing the expression in parentheses to zero if we ignore the mechanical mass term in the Lagrangian for the moment, leading to the condition

$$\begin{aligned} \int \rho E_i dV = 0 \,. \end{aligned}$$
(52)

This is the starting point of the Abraham-Born derivation of the equations of motion in the model of the electron with zero mechanical mass, showing that it is equivalent to assuming the noncovariant rigidity condition, which Fermi concludes must obviously invalidate that model.

The only difference for his variation B in the Fermi coordinate system is the additional factor of the Fermi lapse in the differential of proper time needed to define the electric field in that coordinate system

$$\begin{aligned} 0=\int F_{\sigma \mu } \frac{dx^\mu }{d\tau } N de = \int F_{\sigma \mu } U^\mu N de = \int \rho E(U)_\mu N dV\,, \end{aligned}$$
(53)

an expression which only has nonzero components \(E(U)_\mu = \delta ^i{}_\mu E(U)_i\) in either Fermi coordinates or in inertial coordinates in which the electron is momentarily at rest, where the two then agree: \(E(U)_i=E_i\). Clearly when the acceleration is identically zero, so that \(\Gamma _i=0\) and \(N=1\), the final conditions are the same for both cases A and B, so one must have nonzero acceleration to see a difference in these two cases. Of course without acceleration one cannot measure the inertial mass.

To finish the story we must analyze these conditions in terms of the internal forces exerted on the charge elements by other charge elements and the forces exerted by the external electromagnetic field responsible for the acceleration of the electron. It is the separation of the self-field and the external field that allows one to extract the Lorentz force law relation to the acceleration of the central world line (corrected by radiation reaction terms if one expands it far enough in the acceleration) and thus identify the inertial mass coefficient where the 4/3 problem is apparent, and Fermi’s correction restores this factor to 1. The uncorrected Abraham-Lorentz condition is discussed in detail in Jackson [51, 52] (although the Third Edition omits the final explicit evaluation of the famous 4/3 term), so we only summarize it here. We then follow Fermi in explicitly evaluating the correction term to see its effect in removing the unwanted 4/3 factor. Finally we will consider the additional mechanical mass term in the Lagrangian to follow Fermi’s original Lagrangian discussion in his third paper. For the moment we set this term to zero as in Fermi’s fourth paper.

\(\bullet \) Field separation for variations of type A

Consider first the system of variations A, for which the condition (52) reads

$$\begin{aligned} 0= \int E_a\, de\,. \end{aligned}$$
(54)

Let \(E=E_{\textrm{self}}+E_{\textrm{ext}}\), where \(E_{\textrm{self}}\) and \(E_{\textrm{ext}}\) are the contributions to the total field due to the self-interaction of the system and to the external electric field respectively, the latter of which is assumed to be sufficiently uniform over the small dimensions of the system that it can be pulled out of the integral, which results in the total charge multiplying the external electric field evaluated at the central world line. Equation (54) thus becomes

$$\begin{aligned} F_{\textrm{ext}}^a\equiv \int E_{\textrm{ext}}^a\, de = E_{\textrm{ext}}^a \int \, de = -\int E_{\textrm{self}}^a\, de\equiv -F_{\textrm{self}}^a\,. \end{aligned}$$
(55)

The self-force is the result of the interaction of each element of charge of the sphere with every other element. The explicit details of the calculation involving the retarded times can be found in Jackson’s textbook [51, 52]. The self-field can be expressed in terms of the self-potentials A and \(\phi \) by

$$\begin{aligned} E_{\textrm{self}}=-\nabla \phi -\frac{1}{c}\frac{\partial A}{\partial t}\,, \end{aligned}$$
(56)

so that

$$\begin{aligned} F_{\textrm{ext}}= \int \rho \, \left[ \nabla \phi +\frac{1}{c}\frac{\partial A}{\partial t}\right] \, d^3\textbf{x}\,, \end{aligned}$$
(57)

since the charge element is \(de=\rho \,d^3\textbf{x}\). We now adopt the Jackson notation that \(\textbf{x}\) is the spatial position vector in the Cartesian coordinate system and \(dV=d^3\textbf{x}\) is the spatial volume element, and let \(\textbf{v}\) and \(\textbf{a}=\dot{\textbf{v}}=\Gamma \) be the velocity and acceleration of the charge distribution, which at the initial time t of our calculation satisfies \(\textbf{v}(t)=0\) (all elements of the charge distribution are simultaneously at rest) and \(\textbf{a}=\textbf{a}(t)\) (the acceleration is the same for all elements of the charge distribution at that moment), expressing the nonrelativistic rigidity of the charge distribution. We also reintroduce factors of the speed of light c into the discussion.

By evaluating the potentials at the retarded time \(t'=t-|\textbf{x}-\textbf{x}'|/c\), i.e.,

$$\begin{aligned} A= \frac{1}{c}\int \frac{[J(t',\textbf{x}')]_{\textrm{ret}}}{|\textbf{x}-\textbf{x}'|}\, d^3\textbf{x}'\,, \qquad \phi = \int \frac{[\rho (t',\textbf{x}')]_{\textrm{ret}}}{|\textbf{x}-\textbf{x}'|}\, d^3\textbf{x}'\,, \end{aligned}$$
(58)

and using the rule (Taylor series expansion about the time \(t'=t\))

$$\begin{aligned} {[}\ldots ]_{\textrm{ret}}=\sum _{n=0}^\infty \frac{(-1)^n}{n!}\left( \frac{|\textbf{x}-\textbf{x}'|}{c}\right) ^n\frac{\partial ^{n}}{\partial t^{n}}[\ldots ]\vert _{t'=t} \,, \end{aligned}$$
(59)

Eq. (57) becomes

$$\begin{aligned} F_{\textrm{ext}}= & {} \sum _{n=0}^\infty \frac{(-1)^n}{n!\,c^n}\int d^3\textbf{x} \int d^3\textbf{x}'\, \rho (t,\textbf{x})\frac{\partial ^{n}}{\partial t^{n}}\bigg [\rho (t,\textbf{x}')\nabla (|\textbf{x}-\textbf{x}'|^{n-1})\nonumber \\{} & {} +\frac{|\textbf{x}-\textbf{x}'|^{n-1}}{c^2}\frac{\partial J(t,\textbf{x}')}{\partial t}\bigg ]. \end{aligned}$$
(60)
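The expansion rule (59) used to obtain Eq. (60) can be checked numerically on a simple example. The following is only a minimal sketch: the test signal \(f(t)=\sin (\omega t)\), the numerical values and the function names are illustrative assumptions, not part of the derivation, and the variable `delay` stands for \(|\textbf{x}-\textbf{x}'|/c\).

```python
import math

def retarded_by_series(deriv_at_t, delay, n_terms):
    """Sum the series (59): sum_n (-1)^n / n! * delay^n * d^n f/dt^n at t' = t.

    deriv_at_t(n) must return the n-th time derivative of f at the time t;
    delay plays the role of |x - x'| / c.
    """
    total = 0.0
    for n in range(n_terms):
        total += (-1) ** n / math.factorial(n) * delay ** n * deriv_at_t(n)
    return total

# Hypothetical test signal f(t) = sin(omega t); its n-th derivative is
# omega^n * sin(omega t + n pi / 2). All numbers are arbitrary choices.
omega, t, delay = 2.0, 0.7, 0.3

def sin_derivative(n):
    return omega ** n * math.sin(omega * t + n * math.pi / 2)

exact = math.sin(omega * (t - delay))            # f evaluated at the retarded time
approx = retarded_by_series(sin_derivative, delay, n_terms=20)
print(exact, approx)
```

Truncating the sum after a few terms corresponds to keeping only the lowest time derivatives, which is the approximation used in the remainder of this section.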

Consider the first term in the brackets. The \(n=0\) term

$$\begin{aligned} \int d^3\textbf{x} \int d^3\textbf{x}'\, \rho (t,\textbf{x}) \rho (t,\textbf{x}')\nabla |\textbf{x}-\textbf{x}'|^{-1} \end{aligned}$$
(61)

vanishes in the case of a spherically symmetric charge distribution, whereas the \(n=1\) term is identically zero (gradient of a constant), implying that the first nonvanishing contribution comes from \(n=2\). Changing the summation indices thus leads to

$$\begin{aligned} F_{\textrm{ext}}= & {} \sum _{n=0}^\infty \frac{(-1)^n}{n!\,c^{n+2}}\int d^3\textbf{x} \int d^3\textbf{x}'\, \rho (t,\textbf{x})|\textbf{x}-\textbf{x}'|^{n-1}\frac{\partial ^{n+1}}{\partial t^{n+1}}\bigg [J(t,\textbf{x}')\nonumber \\{} & {} +\frac{\partial \rho (t,\textbf{x}')}{\partial t}\frac{\nabla (|\textbf{x}-\textbf{x}'|^{n+1})}{(n+1)(n+2)|\textbf{x}-\textbf{x}'|^{n-1}}\bigg ]. \end{aligned}$$
(62)

The continuity equation, spherical symmetry and angular averaging can be used to simplify this expression, taking into account also that for a rigid charge distribution the current is \(J(t,\textbf{x}')=\rho (t,\textbf{x}')\textbf{v}(t)\), where \(\textbf{v}(t)=0\) holds at the time t at which this calculation is carried out, so only its time derivatives contribute to the series expansion. The term in this expansion containing the first time derivative of the acceleration \(\dot{\Gamma }=\ddot{\textbf{v}}\) is associated with the radiation reaction, not discussed here.

The final result, obtained by neglecting all nonlinear powers of the acceleration and its derivatives (which appear for \(n\ge 4\)), at lowest order can be written as

$$\begin{aligned} F_{\textrm{ext}}=-F_{\textrm{self}}=\frac{2}{3}\sum _{n=0}^\infty \frac{(-1)^n}{n!}\frac{I_n}{c^{n+2}}\frac{\partial ^{n}}{\partial t^{n}} \dot{\textbf{v}}\,, \end{aligned}$$
(63)

where

$$\begin{aligned} I_n=\int \int d^3\textbf{x}d^3\textbf{x}'\,\rho (t,\textbf{x})|\textbf{x}-\textbf{x}'|^{n-1}\rho (t,\textbf{x}')\,. \end{aligned}$$
(64)

The lowest order term is the only one considered by Fermi to make his point and is twice the self-energy of the charge distribution

$$\begin{aligned} I_0 = 2 W =\int \int d^3\textbf{x}d^3\textbf{x}'\,\frac{\rho (t,\textbf{x}) \rho (t,\textbf{x}')}{|\textbf{x}-\textbf{x}'|}\,, \end{aligned}$$
(65)

which for the spherical shell model of the electron is \(2W=e^2/r_0\). In the point particle limit, \(I_0\) diverges corresponding to the infinite self-energy of a point particle, \(I_1=e^2\), and \(I_n=0\) for \(n>1\). When the charge is uniformly distributed over the surface of the sphere one has \(I_n=2e^2(2r_0)^{n-1}/(n+1)\).
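The closed form for \(I_n\) in the surface charge model can be compared with a direct quadrature of Eq. (64). The sketch below assumes units with \(e=r_0=1\) by default and uses hypothetical function names; by spherical symmetry one charge element is fixed at the pole, so that \(I_n\) is \(e^2\) times the average of \(|\textbf{x}-\textbf{x}'|^{n-1}\) over the position of the second element.

```python
import math

def shell_I_n(n, e=1.0, r0=1.0, steps=2000):
    """Midpoint-rule estimate of I_n, Eq. (64), for a uniform surface charge.

    With psi = theta / 2 the chord length is 2 r0 sin(psi), and the normalized
    solid-angle weight sin(theta) dtheta / 2 becomes 2 sin(psi) cos(psi) dpsi.
    """
    h = (math.pi / 2) / steps
    total = 0.0
    for k in range(steps):
        psi = (k + 0.5) * h
        chord = 2.0 * r0 * math.sin(psi)
        total += chord ** (n - 1) * 2.0 * math.sin(psi) * math.cos(psi) * h
    return e * e * total

def shell_I_n_closed(n, e=1.0, r0=1.0):
    """Closed form quoted in the text: 2 e^2 (2 r0)^(n-1) / (n+1)."""
    return 2.0 * e * e * (2.0 * r0) ** (n - 1) / (n + 1)

for n in range(4):
    print(n, shell_I_n(n), shell_I_n_closed(n))
```

In this check the \(n=0\) value is to be compared with \(2W=e^2/r_0\) and the \(n=1\) value with \(e^2\), independently of the radius.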

In the nonrelativistic limit for any smooth spherically symmetric distribution of charge (i.e., considering only the \(n=0\) term of the series) Eq. (63) becomes

$$\begin{aligned} F_{\textrm{self}}^{\textrm{NR}}=-\frac{4}{3}\frac{W}{c^2}\dot{\textbf{v}} \,, \end{aligned}$$
(66)

so that Newton’s equation of motion for the system takes the form

$$\begin{aligned} F_{\textrm{ext}}^{\textrm{NR}}=\frac{4}{3}m_{\textrm{em}}\dot{\textbf{v}} \,, \qquad m_{\textrm{em}}=\frac{W}{c^2}\,. \end{aligned}$$
(67)

This is 4/3 times the electromagnetic mass \(m_{\textrm{em}}\) defined by the Einstein mass-energy relation. Recall that this is understood to be expressed in an inertial frame in which the electron is momentarily at rest, ignoring higher order terms in the acceleration which include the famous radiation reaction terms.

\(\bullet \) Field separation for variations of type B

The “correct” result in which the unwanted factor of 4/3 is removed is achieved starting instead with Fermi’s corrected integral condition, so that in the previous calculation of Jackson we must replace the factor of \(\rho (t,\textbf{x})\) in the double spatial integral by \(\rho (t,\textbf{x}) (1+\dot{\textbf{v}}(t) \cdot \textbf{x}/c^2)\), assuming that we are using a Fermi coordinate system at a time slice which coincides with the previous inertial coordinate slice of the preceding discussion when the electron is momentarily at rest. Thus the vanishing integral \(n=0\) term, namely Eq. (61), of the original expansion now becomes

$$\begin{aligned}{} & {} \int d^3x \int d^3x'\, \rho (t,\textbf{x}) [1+\dot{\textbf{v}}(t) \cdot \textbf{x}/c^2] \rho (t,\textbf{x}')\nabla |\textbf{x}-\textbf{x}'|^{-1} \nonumber \\{} & {} \quad = \int d^3x \int d^3x'\, \rho (t,\textbf{x}) [\dot{\textbf{v}}(t) \cdot \textbf{x}/c^2] \rho (t,\textbf{x}')\nabla |\textbf{x}-\textbf{x}'|^{-1} . \end{aligned}$$
(68)

Fermi noted that this double spatial integral will give the same value if the two dummy vector integration variables are switched, and hence can also be replaced by the average of these two ways of writing the same integral. Using \(\nabla |\textbf{x}-\textbf{x}'|^{-1}=-(\textbf{x}-\textbf{x}')/ |\textbf{x}-\textbf{x}'|^{3}\), one finds

$$\begin{aligned}{} & {} c^{-2} \int d^3x \int d^3x'\, \rho (t,\textbf{x}) \rho (t,\textbf{x}')[\dot{\textbf{v}}(t) \cdot \textbf{x}] (\textbf{x}'-\textbf{x})/ |\textbf{x}-\textbf{x}'|^{3} \nonumber \\{} & {} \quad = c^{-2}\int d^3x \int d^3x'\, \rho (t,\textbf{x}') \rho (t,\textbf{x}) [\dot{\textbf{v}}(t) \cdot \textbf{x}'] (\textbf{x}-\textbf{x}')/ |\textbf{x}-\textbf{x}'|^{3} \nonumber \\{} & {} \quad = -c^{-2}\frac{1}{2}\int d^3x \int d^3x'\, \rho (t,\textbf{x})\rho (t,\textbf{x}') [\dot{\textbf{v}}(t) \cdot (\textbf{x}'-\textbf{x})] (\textbf{x}'-\textbf{x})/ |\textbf{x}-\textbf{x}'|^{3} . \end{aligned}$$
(69)

Now imposing spherical symmetry about the origin, the components of this vector integral are nonzero only along the acceleration vector, with a coefficient which can be replaced by the average value of the vector component integral

$$\begin{aligned} -[\dot{\textbf{v}}(t) \cdot (\textbf{x}'-\textbf{x})] (\textbf{x}'-\textbf{x}) \rightarrow -\dot{\textbf{v}}(t) \frac{1}{3} (\textbf{x}'-\textbf{x}) \cdot (\textbf{x}'-\textbf{x}) = -\dot{\textbf{v}}(t) \frac{1}{3} |\textbf{x}'-\textbf{x}|^2 \end{aligned}$$
(70)

so it reduces to

$$\begin{aligned}{} & {} -\frac{1}{3} \frac{\dot{\textbf{v}}(t)}{c^2} \left[ \frac{1}{2} \int d^3x \int d^3x'\, \rho (t,\textbf{x}) \rho (t,\textbf{x}')/ |\textbf{x}-\textbf{x}'| \right] = -\frac{1}{3} \frac{W}{c^2} \dot{\textbf{v}}(t) , \end{aligned}$$
(71)

since the expression in square brackets is the self-energy of the charge distribution at the time t. This is the only additional term linear in the acceleration which contributes to the lowest terms of the previous calculation (so that the lowest order radiation reaction term is unchanged, although not shown here)

$$\begin{aligned} F_{\textrm{ext}}^{\textrm{NR}}=\frac{4}{3}\frac{W}{c^2}\dot{\textbf{v}} -\frac{1}{3}\frac{W}{c^2}\dot{\textbf{v}} = \frac{W}{c^2}\dot{\textbf{v}} \,, \end{aligned}$$
(72)

which leads to the desired result

$$\begin{aligned} F_{\textrm{ext}}^{\textrm{NR}}= m_{\textrm{em}}\dot{\textbf{v}} \,, \qquad m_{\textrm{em}}=\frac{W}{c^2} \end{aligned}$$
(73)

in the nonrelativistic limit, according to Newton’s law with the electromagnetic mass \(m_{\textrm{em}}=W/c^2\).
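The angular averaging step (70), which is responsible for the factor 1/3 in Eq. (71), can be checked by a direct quadrature over the unit sphere. This is a minimal sketch in which the vector `a` is an arbitrary stand-in for the acceleration \(\dot{\textbf{v}}\) and the grid sizes are illustrative choices.

```python
import math

def averaged_projection(a, n_u=400, n_phi=16):
    """Average of (a . n) n over all unit vectors n, by midpoint quadrature."""
    result = [0.0, 0.0, 0.0]
    weight = 1.0 / (n_u * n_phi)  # uniform weight in (cos theta, phi)
    for i in range(n_u):
        u = -1.0 + (i + 0.5) * 2.0 / n_u  # u = cos(theta)
        s = math.sqrt(1.0 - u * u)        # sin(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * 2.0 * math.pi / n_phi
            n = (s * math.cos(phi), s * math.sin(phi), u)
            dot = a[0] * n[0] + a[1] * n[1] + a[2] * n[2]
            for k in range(3):
                result[k] += weight * dot * n[k]
    return result

a = (0.3, -1.2, 2.0)  # arbitrary stand-in for the acceleration vector
print(averaged_projection(a), [ak / 3.0 for ak in a])
```

The two printed vectors should agree to within the quadrature error, which is the content of the replacement \([\dot{\textbf{v}} \cdot \textbf{n}]\, \textbf{n} \rightarrow \dot{\textbf{v}}/3\) for a unit vector \(\textbf{n}\) along \(\textbf{x}'-\textbf{x}\).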

Finally to consider the contribution to the Lagrangian from a mechanical mass distribution, we must vary the final term in the Lagrangian which has been ignored until now. In the Fermi coordinate system this is trivial. The Lagrangian term is simply

$$\begin{aligned} -\int d\tau \,dm = -\int \left( \int N \, dt\right) dm = -\int \left( \int (1+\Gamma _i x^i) \, dt\right) dm \,, \end{aligned}$$
(74)

and its variation is

$$\begin{aligned}{} & {} -\delta \int \left( \int (1+\Gamma _i x^i) \, dt\right) dm = -\int \left( \int \Gamma _i \delta x^i \, dt\right) dm \nonumber \\{} & {} \quad = -\left( \int \, dm\right) \int \Gamma _i\delta x^i dt = - \int (m_0\Gamma _i)\delta x^i dt , \end{aligned}$$
(75)

where \(m_0\) is the total mechanical mass. The contribution to the above Fermi condition is the coefficient of the arbitrary variations \(\delta x^i = \delta x^i (t)\), namely just the term \( - m_0\Gamma _i = - m_0 \dot{v}{}^i\). The complete equation of motion is then first

$$\begin{aligned} \int \rho E(U)_i (1+\Gamma _j x^j)\, dV - m_0 \Gamma _i = 0 \,, \end{aligned}$$
(76)

and then after splitting off the self-force and passing to the lowest order approximation

$$\begin{aligned} (m_0+m_{\textrm{em}}) \dot{\textbf{v}} = F_{\textrm{ext}}^{\textrm{NR}} \,. \end{aligned}$$
(77)

Thus mechanical mass and the electromagnetic mass contribute in the same way to the total inertial rest mass of the spherical distribution of charged matter.
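To get a feeling for the numbers, the following sketch evaluates the lowest order inertial mass coefficient for the surface charge model with and without Fermi's correction. It is only an illustration: the constants are rounded CGS-Gaussian values, the shell radius and the mechanical mass are free model parameters, and the function names are hypothetical.

```python
# Rounded CGS-Gaussian constants; r0 and m0 are free model parameters here.
E_CHARGE = 4.80320e-10    # esu
C_LIGHT = 2.99792458e10   # cm/s
M_ELECTRON = 9.10938e-28  # g

def em_mass_shell(r0):
    """m_em = W / c^2 with W = e^2 / (2 r0) for the surface charge model."""
    return E_CHARGE ** 2 / (2.0 * r0) / C_LIGHT ** 2

def inertial_mass(r0, m0=0.0, fermi_corrected=True):
    """Coefficient of dv/dt at lowest order: Eq. (77), or its 4/3 analogue."""
    factor = 1.0 if fermi_corrected else 4.0 / 3.0
    return m0 + factor * em_mass_shell(r0)

# Shell radius for which a vanishing mechanical mass reproduces m_e
# (an assumption made only for this illustration).
r0_pure_em = E_CHARGE ** 2 / (2.0 * M_ELECTRON * C_LIGHT ** 2)
print(r0_pure_em)
print(inertial_mass(r0_pure_em) / M_ELECTRON)
print(inertial_mass(r0_pure_em, fermi_corrected=False) / M_ELECTRON)
```

With zero mechanical mass this radius is half the classical electron radius \(r_{\textrm{e}}=e^2/(m_e c^2)\), since the geometric factor 1/2 of the shell model is retained.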

5 Relating Kwal-Rohrlich Back to Fermi Through Gauss

Given the Kwal-Rohrlich 4-momentum evaluated for an unaccelerated electron and the inertial mass contribution from the electromagnetic field found by Fermi for the accelerated electron, it is natural to look for a relation between them. In the unaccelerated case, one has an entire family of distinct 4-momenta which depend on the inertial observer, but the one we usually associate with the electron of a certain rest energy is the one defined by the rest frame observer. Although Fermi stopped his analysis once he achieved his limited goal, in light of the 4-momentum integral situation in which interest later arose, it is natural to continue his line of thought to its logical conclusion. We do this here and find that Fermi’s corrected condition which generates the correct equations of motion guarantees the conservation of the total 4-momentum as seen in the instantaneous rest frame of the accelerated electron at each point of its world line.

All we need to do is specialize the Gauss law discussion begun in Section 2 to the electromagnetic stress-energy tensor over the spacetime region R between two successive time hyperplanes \(\Sigma _t\) and \(\Sigma _{t+\Delta t}\) associated with a Fermi coordinate system adapted to the central world line of the accelerated electron, as in Fig. 2 of the Appendix where the case of 1-dimensional motion is illustrated. Let \(\Delta t>0\) so \(t+\Delta t\) is to the future of t along the central world line where t measures the elapsed proper time. Figure 2 shows the tilting of the Fermi time slices to remain orthogonal to the central world line of the electron and to the common local rest space of the elements of charge which make up the electron sphere. Then Eqs. (13) and (5) lead to the fundamental relation

$$\begin{aligned} -\int _{R} Q_\alpha \rho E(U)^\alpha d^4V = Q_\alpha \left[ P(\Sigma _{t+\Delta t} )^\alpha -P(\Sigma _t)^\alpha \right] \,, \end{aligned}$$
(78)

where on the right hand side the components have to be expressed in inertial coordinates or the components \(Q_\alpha \) are not constant and cannot be factored out of the integral. On the left hand side, if evaluated in the Fermi coordinate system, these components are functions of time to compensate for the time-dependent change of direction of the 4-velocity of the central world line, and so can only be pulled out of the spatial integral. Recall that \(E(U)^\alpha \) is the electric field seen in the electron rest frame and \(\rho \) is the rest frame charge density.

Let \(R_-\) be the half-region for which the hyperplane \(\Sigma _{t+\Delta t}\) is in the future of \(\Sigma _t\), while \(R_+\) has the reverse relationship, as in Fig. 2, so that the world tube of the electron cuts through the region \(R_-\) as shown there. Splitting the integral into the spatial integral and then the temporal integral, using the spacetime volume element \( d^4V=(1+\Gamma _i x^i) dV\, dt\), one then has

$$\begin{aligned} -\int ^{t+\Delta t}_t Q_\alpha \int _{\Sigma _\tau \cap R_-} \left[ \rho E(U)^\alpha (1+\Gamma _i x^i) dV \right] d\tau = Q_\alpha \left[ P(\Sigma _{t+\Delta t} )^\alpha -P(\Sigma _t)^\alpha \right] \,. \end{aligned}$$
(79)

For the Born rigid distribution of charge according to the Fermi condition (76), the spatial integral in square brackets on the left hand side of Eq. (79) at each Fermi time (which is the proper time parameter along the central world line) equals the mechanical mass times the proper time covariant derivative \(D/d\tau \) of the 4-velocity of the central world line

$$\begin{aligned} m_0 \delta ^\alpha {}_i \Gamma ^i =m_0 \frac{D u^\alpha }{d\tau } =\frac{D}{d\tau }( m_0 u^\alpha ) =\frac{D}{d\tau } p^\alpha _0 \end{aligned}$$
(80)

where \(p_0^\alpha = m_0 u^\alpha \) is the mechanical momentum. Here we use the notation \(D/d\tau \) to remind us that in noninertial coordinates like those of Fermi, the covariant derivative along the parametrized curve does not coincide with the ordinary derivative of the components, but when we evaluate the expression in components with respect to a fixed inertial coordinate system, it does. The final integral with respect to the Fermi time coordinate, if performed with the components taken in an inertial coordinate system, is then just the difference of the mechanical momentum between the two Fermi times

$$\begin{aligned} \int _t^{t+\Delta t} Q_\alpha \left( \frac{dp_0^\alpha }{dt}\right) \,dt = Q_\alpha \left[ p_0(t+\Delta t)^\alpha -p_0(t)^\alpha \right] \,, \end{aligned}$$
(81)

so that

$$\begin{aligned} -Q_\alpha \left[ p_0^\alpha (t+\Delta t) -p_0^\alpha (t)\right] = Q_\alpha \left[ P(t+\Delta t)^\alpha - P(t)^\alpha \right] \,. \end{aligned}$$
(82)

Expressing this in inertial coordinates, since \(Q_\alpha \) are arbitrary constants, we find

$$\begin{aligned} p_0(t+\Delta t)^\alpha + P(t+\Delta t)^\alpha = p_0(t)^\alpha + P(t)^\alpha \,, \end{aligned}$$
(83)

namely that the sum of the mechanical 4-momentum and the 4-momentum of the external electromagnetic field \(p_0^\alpha +P^\alpha \) must be the same on the two Fermi time slices and hence on every Fermi time slice. In other words the Fermi condition is equivalent to the conservation of the Kwal-Rohrlich 4-momentum for the total system, a fact which no one seems to have realized until now. Thus Fermi also pointed the way towards selecting the only observer-defined total 4-momentum which is conserved and which corresponds to what we associate with this system. The proper time derivative of this relation gives its rate of change version

$$\begin{aligned} \frac{D}{dt} (p_0(t)^\alpha + P(t)^\alpha )=0\,. \end{aligned}$$
(84)

Thus the calculations initiated by Fermi nearly a century ago have finally reached their natural conclusion.

Apart from Kolbenstvedt [36] much later in 1997, only Aharoni [16] seems to have seen and understood Fermi’s argument, explaining exactly what Fermi did in detail in his 1965 textbook, revised because of the then recent Rohrlich work on this topic, and re-interpreting it in his own way, explaining in detail how the 4-momentum integrals first explained by Kwal and later Rohrlich are connected to Fermi’s approach to the problem. Aharoni’s equations (6.5), (6.18) and (6.19) for the total self-force due to the electron charge distribution involve through his (6.18) the proper time rate of change of an integral over the spacetime region between two successive proper time hypersurfaces of the electron (his own reformulation of the self-force in view of the Kwal-Rohrlich integral definition as noted in a footnote). Aharoni considers the following equivalent reformulation of the previous equations valid for the total electromagnetic field, but restricted only to the self-field in order to define the self-force due only to the self-field of the charge distribution

$$\begin{aligned} \frac{dP^\mu }{d\tau } =-\frac{d}{d\tau } \int _{\tau _0}^\tau \int F_{\textrm{self}}{}^\mu {}_\nu J^\nu \, d\tau d\Sigma = - \delta ^\mu {}_i \int (1+\Gamma _j x^j) E_{\textrm{self}}^i \rho \, d^3 x \,. \end{aligned}$$
(85)

However, Aharoni failed to relate his “postulated” self-force expression to Gauss’s law to show that it actually is related to the proper time rate of change of the Kwal-Rohrlich 4-momentum integral restricted to the self-field. Spohn and Yaghjian both have long bibliographies in their textbooks, but neither mentions Aharoni, while Rohrlich has an author index indicating Aharoni’s name on page 283 where no reference to anyone can be found. Only the much later work of Kolbenstvedt acknowledges Fermi’s approach, rederiving it in a slightly different but equivalent form, also ignored by Rohrlich, Spohn and Yaghjian in their later editions.

6 Concluding Remarks

The work presented in References [8,9,10,11] was the culmination of Fermi’s early work in relativity published in a series of his first four articles only a few years after the birth of general relativity and written while he was a university student. Its actual contents seem to have remained a mystery to nearly all those who have cited it in discussions of the classical theory of the electron which still interested people long after its beginnings in the early twentieth century, while the leading textbook on classical electrodynamics only repeats the Abraham-Lorentz derivation of the equations of motion without Fermi’s correction, although admitting that it can be relativistically corrected following Fermi. Ironically Fermi’s third paper (see Ref. [63] for a historical discussion), which he considered only a tool for obtaining his result in that fourth paper, and which Fermi never even explicitly cited there, did make an indelible mark on relativity with the terms Fermi coordinates and Fermi-Walker transport, although even the much later paper by Walker that coupled together their names forever also ignores Fermi’s original paper in Italian. Surprisingly even the text by Rohrlich [46, 47], updated in 2007 four decades after its original publication, fails to connect his own adjustment of the definition of the 4-momentum of the electromagnetic field of the classical electron to Fermi’s argument about the equations of motion, while recent books by Yaghjian and Spohn devoted to this area also show no sign that they have ever seen Fermi’s argument. We hope the present work restores Fermi’s message to its rightful place, and perhaps provokes some thought about its meaning.