1 Introduction

This paper presents a historical review of the measurement of the energy of earthquakes, in the framework of the parallel development of the concept of magnitude. In particular, we seek to understand why the classical formula

$$\begin{aligned} \log _{10} E = 1.5 M_{\mathrm{s}} + 11.8 \end{aligned}$$
(1)

referred to as “Gutenberg [and Richter]’s energy–magnitude relation” features a slope of 1.5 which is not predicted a priori by simple physical arguments. We will use Gutenberg and Richter’s (1956a) notation, Q [their Eq. (16) p. 133], for the slope of \(\log _{10} E\) versus magnitude [1.5 in (1)].

We are motivated by the fact that Eq. (1) appears nowhere in this exact form in any of the traditional references cited in its support, which incidentally were most probably copied from one referring publication to the next. These references are Gutenberg and Richter (1954) (Seismicity of the Earth), Gutenberg (1956) [the reference given by Kanamori (1977) in his paper introducing the concept of the “moment magnitude” \(M_{\mathrm{w}}\)], and Gutenberg and Richter (1956b). For example, Eq. (1) is not spelt out anywhere in Gutenberg (1956), although it can be obtained by combining the actual formula proposed for E [his Eq. (3) p. 3]

$$\begin{aligned} \log _{10} E = 2.4 m + 5.8 \end{aligned}$$
(2)

with the relationship between the “unified magnitude” m (Gutenberg’s own quotes) and the surface-wave magnitude \(M_{\mathrm{s}}\) [Eq. (1) p. 3 of Gutenberg (1956)]:

$$\begin{aligned} m = 0.63 M_{\mathrm{s}}+2.5, \end{aligned}$$
(3)

neither slope (2.4 or 0.63) having a simple physical justification. The same combination is also given by Richter (1958, pp. 365–366), even though he proposes the unexplained constant 11.4 instead of 11.8 in (1), a difference which may appear trivial, but still amounts to a factor of \(10^{0.4} \approx 2.5\) in energy. The combination is also given in the caption of the nomogram in Fig. 2 of Gutenberg and Richter (1956b), which does provide separate derivations of (2) and (3). As for Gutenberg and Richter (1954) (the third edition, generally regarded as definitive, of Seismicity of the Earth), the only mention of energy is found in its Introduction (p. 10):

“In this book, we have assumed for radiated energy the partly empirical equation

$$\begin{aligned} \log _{10} E = 12 + 1.8 M. \end{aligned}$$
(4)

This seems to give too great energy. At present (1953), the following form is preferred:

$$\begin{aligned} \log _{10} E = 11 + 1.6 M.” \end{aligned}$$
(5)

While Eq. (4), with \(Q = 1.8\), was derived from Gutenberg and Richter (1942), Eq. (5), with \(Q = 1.6\), was apparently never formally published or analytically explained.

The fact that none of the three key references to “Gutenberg and Richter’s energy–magnitude relation” actually spells it out warrants some research into the origin of the formula, from both historical and theoretical standpoints. In order to shed some light on the origin of (1), and to recast it within modern seismic source theory, this paper explores the development of the concept of earthquake energy and of its measurement, notably in the framework of the introduction of magnitude by Richter (1935). In particular, we examine all of Gutenberg’s papers on the subject, using the compilation of his bibliography available from his obituary (Richter 1962).
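Before proceeding, we note that the arithmetic behind the combination of (2) and (3) is easily verified; a minimal check in Python, using only the published coefficients:

```python
# Combining Gutenberg's (1956) Eqs. (2) and (3):
# log10 E = 2.4 m + 5.8, with m = 0.63 Ms + 2.5.
slope = 2.4 * 0.63            # coefficient of Ms: 1.512, quoted as 1.5
intercept = 2.4 * 2.5 + 5.8   # constant term: 11.8 exactly
print(f"log10 E = {slope:.3f} Ms + {intercept:.1f}")   # log10 E = 1.512 Ms + 11.8
```

The slope of 1.5 in (1) is thus simply the rounded product of two empirical coefficients, neither of which has a simple physical justification.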

2 The Modern Context and the Apparent “Energy Paradox”

Understanding the evolution of the concept of magnitude and the attempts to relate it to seismic energy must be based on our present command of seismic source theory. In this respect, this section attempts to provide a modern theoretical forecast of a possible relation between magnitude and energy. We base our discussion on the concept of the double couple \(\mathbf {M}\), introduced by Vvedenskaya (1956), and later Knopoff and Gilbert (1959), as the system of forces representing a seismic dislocation, its scalar value being the seismic moment \(M_0\) of the earthquake.

Note that we consider here as the “energy” of an earthquake only the release of elastic energy stored during the interseismic deformation of the Earth, and not the changes in gravitational and rotational kinetic energy resulting from the redistribution of mass during the earthquake, which may be several orders of magnitude larger (Dahlen 1977).

2.1 The Energy Paradox

We first recall that magnitude was introduced by Richter (1935) as a measure of the logarithm of the amplitude of the seismic trace recorded by a torsion instrument at a distance of 100 km, and thus essentially of the ground motion generated by the earthquake. In the absence of source finiteness effects, and given the linearity of the equations of mechanics governing the Earth’s response [traceable all the way to Newton’s (1687) “f = ma”], that ground motion, A in the notation of most of Gutenberg’s papers, should be proportional to \(M_0\), and hence any magnitude M should grow like \(\log _{10} M_{0}\). This is indeed what is predicted theoretically and observed empirically, for example for the surface-wave magnitude \(M_{\mathrm{s}}\) below about 6 (Geller 1976; Ekström and Dziewoński 1988; Okal 1989).

In most early contributions, it was generally assumed that the energy of a seismic source could be computed from the kinetic energy of the ground motion imparted to the Earth by the passage of a seismic wave, which would be expected to grow as the square of the amplitude of ground motion. Since the concept of magnitude measures the logarithm of the latter, this leads naturally to \(Q = 2\), as featured by earlier versions of Eq. (1) (Gutenberg and Richter 1936).

By contrast, using the model of a double couple \(\mathbf {M}\), the seismic energy E released by the source is simply its scalar product with the strain \(\mathbf {\varepsilon }\) released during the earthquake. The absolute value of the strain should be a characteristic of the rock fracturing during the earthquake, and as such an invariant in the problem, so that E should be proportional to \(M_0\). Again, in the absence of source finiteness effects, the linearity between seismic source and ground motion (“f = ma”) will then result in E being directly proportional to ground motion, and hence in a slope \(Q = 1\) in Eq. (1).
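In symbols, and as a sketch in modern notation (not found in the early papers; we further assume that the released strain can be written \(\varepsilon ^{*} = \Delta \sigma / \mu \), with the stress drop \(\Delta \sigma \) invariant under scaling laws):

$$\begin{aligned} E \simeq \mathbf {M} : \mathbf {\varepsilon } \sim \varepsilon ^{*} M_{0} = \frac{\Delta \sigma }{\mu } \, M_{0} \quad \Longrightarrow \quad \log _{10} E = \log _{10} M_{0} + \mathrm {const.}, \end{aligned}$$

so that, with M itself growing like \(\log _{10} M_{0}\) in the absence of finiteness, a slope \(Q = 1\) follows.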

We thus reach a paradox, in that the two arguments above predict contradictory values of Q. The much-quoted Gutenberg and Richter relationship (1), which uses the intermediate value \(Q = 1.5\), may appear as a somewhat acceptable compromise, but satisfies neither interpretation. It thus deserves full understanding and discussion.

2.2 A Modern Approach

The origin of this apparent, and well-known, “energy paradox” can be traced to at least three effects:

  (i) Most importantly, the proportionality of energy to the square of displacement holds only for a monochromatic harmonic oscillator (with the additional assumption of a frequency not varying with size), while the spectrum of seismic ground motion following an earthquake is distributed over a wide range of frequencies, and thus the resulting time-domain amplitude at any given point (which is what is measured by a magnitude scale) is a complex function of its various spectral components;

  (ii) Because of source finiteness (“a large earthquake takes time and space to occur”), destructive interference between individual elements of the source causes ground motion amplitudes measured at any given period to grow more slowly than linearly with moment, and eventually to saturate for large earthquakes (Geller 1976);

  (iii) Ground motion can be measured only through the use of instruments acting as filters; while some of them could in principle be so narrow as to give the seismogram the appearance of a monochromatic oscillator, justifying assumption (i), the inescapable fact remains that most of the energy of the source would then be hidden outside the bandwidth recorded by the instrument. In parallel, ground motion can be measured only at some distance from the source, and anelastic attenuation over the corresponding path will similarly affect the spectrum of the recorded seismogram.

Nowadays, the effect of (iii) is vastly reduced by the availability of broadband instrumentation. Using modern theory, it is possible to offer quantitative models of the concept of finiteness (ii), as first described by Ben-Menahem (1961), and, by integrating them over frequency [which takes care of (i)], to resolve quantitatively the paradox exposed above.

In practice, seismic magnitudes have been, and still are, measured either on body waves, or on surface waves (exceptionally on normal modes). As discussed by Vassiliou and Kanamori (1982), the energy radiated in P and S wavetrains can be written as

$$\begin{aligned} E^{\mathrm{Body}} = F^{\rm B} \cdot \frac{M_{0}^{2} }{t_{0}^{3}} \end{aligned}$$
(6)

where \(t_0\) represents the total duration of the source (the inverse of a corner frequency), which under seismic scaling laws (Aki 1967) is expected to grow like \(M_{0}^{1/3}\), and \(F^{\rm B}\) is a combination of structural parameters (density, seismic velocities) and of the ratio x of rise time to rupture time, all of which are expected to remain invariant under seismic scaling laws. As a result, \(E/M_0\) is also expected to remain constant, its logarithm being estimated at \(-4.33\) by Vassiliou and Kanamori (1982), and \(-4.90\) by Newman and Okal (1998). Extensive datasets compiled by Choy and Boatwright (1995) and Newman and Okal (1998) have upheld this invariance of \(\log _{10} (E/M_{0})\), with average values for shallow earthquakes of \(-4.80\) and \(-4.98\), respectively, a result later confirmed even for microscopic sources by Ide and Beroza (2001).
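This invariance can be made explicit with a short numerical sketch; the values of \(F^{\rm B}\) and of the scaling constant below are purely illustrative placeholders (tuned so that \(\log _{10}(E/M_0) \approx -4.9\)), not values taken from the cited studies:

```python
import numpy as np

# Sketch of Eq. (6) under the scaling law t0 = c * M0**(1/3) (Aki 1967):
# E / M0 = F_B / c**3 is then independent of M0.
# F_B and c are illustrative placeholders, not values from the cited papers.
F_B = 1.26e-26                     # hypothetical structural factor
c = 1.0e-7                         # hypothetical scaling constant (t0 ~ 100 s at M0 = 1e27 dyn cm)
M0 = np.logspace(24, 29, 6)        # seismic moments, dyn cm
t0 = c * M0**(1.0 / 3.0)           # source durations under scaling laws
E = F_B * M0**2 / t0**3            # Eq. (6)
print(np.log10(E / M0))            # ~ -4.9 for every M0, as observed empirically
```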

In the case of surface waves, we have shown (Okal 2003) that the energy of a Rayleigh wave can be similarly expressed as

$$\begin{aligned} E^{\mathrm{{Rayleigh}}} = F^{\rm R} \cdot \omega _{\rm c}^{3} M_{0}^{2}, \end{aligned}$$
(7)

where \(\omega _{\rm c}\) is a corner frequency, not necessarily equal to its body wave counterpart, but still expected to behave as \(M_{0}^{-1/3}\) [see Eq. (A8) of Okal (2003)], and \(F^{\rm R}\) is a combination of structural parameters and Rayleigh group and phase velocities, which can be taken as constant. While we argue in Okal (2003) that the energy carried by Rayleigh waves represents only a small fraction (less than 10 %) of the total energy released by the dislocation, the combined result from (6) and (7) is that energy, when properly measured across the full spectrum of the seismic field, does grow linearly with the seismic moment \(M_0\).

As for the growth of magnitudes, Geller (1976) has used the concept of source finiteness heralded by Ben-Menahem (1961) to explain how any magnitude measured at a constant period T starts by being proportional to \(\log _{10} M_0\) for small earthquakes, and then grows more slowly with moment, as the inverses of the corner frequencies characteristic of fault length, rise time, and fault width become successively comparable to, or longer than, the reference period T. Under Geller’s (1976) model, the slope of M versus \(\log _{10} M_0\) should decrease to 2/3, then 1/3, and finally 0, as any magnitude measured at a constant period reaches an eventual saturation. The latter is predicted and observed around 8.2 for the 20-s surface-wave magnitude \(M_{\mathrm{s}}\), and predicted around 6.0 for \(m_{\rm b}\), if consistently measured at 1 s (occasional values beyond this theoretical maximum would reflect measurements taken at periods significantly longer than the 1-s standard). Figure 1 plots this behavior (\(m_{\rm b}\) as a long-dashed blue curve, \(M_{\mathrm{s}}\) in short-dotted red), as summarized by the last set of (unnumbered) equations on p. 1520 of Geller (1976). Because of the straight proportionality between E and \(M_0\), the vertical axis also represents \(\log _{10} E\), except for an additive constant.

Fig. 1 Variation of body-wave magnitude \(m_{\rm b}\) (long dashes, blue) and surface-wave magnitude \(M_{\mathrm{s}}\) (short dots, red) with seismic moment \(M_0\), predicted theoretically from Eqs. (16) and (17) (Geller 1976). Superimposed in solid green is the relationship (18), casting values of \(M_0\) into the scale \(M_{\mathrm{w}}\) (Kanamori 1977)

The conclusion of these theoretical remarks is that Q is expected to grow with earthquake size, from its unperturbed value of 1 in the domain of small sources unaffected by finiteness, to 1.5 under moderate finiteness, and to a conceptually infinite value when M has fully saturated. Note that these conclusions will hold experimentally only under two conditions: (1) that the energy should be computed (either from body or surface waves) using an integration over the full wave packet, in either the time or frequency domain, these two approaches being equivalent under Parseval’s theorem; and (2) that magnitudes for events of all sizes should be computed using the same algorithms, most importantly at constant periods.
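Condition (1) hinges on Parseval’s theorem; the following minimal numerical illustration (a synthetic velocity record, not real data) shows the time-domain and frequency-domain energy integrals agreeing to machine precision:

```python
import numpy as np

# Parseval's theorem: the energy integral of a velocity record is the same
# whether computed in the time domain or from its Fourier spectrum.
rng = np.random.default_rng(0)
dt, n = 0.05, 4096                          # 0.05-s sampling, ~205-s window
t = np.arange(n) * dt
v = rng.standard_normal(n) * np.exp(-t / 40.0)   # synthetic decaying "wavetrain"

E_time = np.sum(v**2) * dt                  # time-domain integral of v^2
V = np.fft.rfft(v)
# one-sided spectrum: double all bins except DC (and Nyquist, since n is even)
w = np.full(V.size, 2.0); w[0] = 1.0; w[-1] = 1.0
E_freq = np.sum(w * np.abs(V)**2) * dt / n  # frequency-domain equivalent

print(E_time, E_freq)                       # equal to machine precision
```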

3 The Quest for Earthquake Energy: A Timeline

In this general framework, we present here a timeline of the development of measurements of earthquake energy, and of the refinement of the concept of magnitude. The discussion of the critical papers by Gutenberg and Richter which underlie Eq. (1) will be reserved for Sect. 4 and further detailed in the Appendix; contributions from other authors will be discussed here.

  1.

    To our knowledge, the first attempt at quantifying the energy released by an earthquake goes back to Mendenhall (1888), who proposed a value of \(3.3 \times 10^{21}\) erg (\(2.4 \times 10^{14}\) ft lbs) for the 1886 Charleston earthquake. This figure is absolutely remarkable, given that modern estimates of the moment of the event are around (1–10) \(\times 10^{26}\) dyn cm (Johnston 1996; Bakun and Hopper 2004), which would suggest an energy of about (1.2–12) \(\times 10^{21}\) erg, according to global scaling laws (Choy and Boatwright 1995; Newman and Okal 1998). Mendenhall’s (1888) calculation was based on Lord Kelvin’s description of the energy of a “cubic mile” of vibrating matter moving with the wave front (Thomson 1855), a concept nowadays expressed as an energy flux (Wu 1966; Boatwright and Choy 1986). However, Mendenhall uses grossly inadequate estimates of particle velocities (on the order of 15 cm/s), and of the cross section and “thickness” of the wavefront, taken as \(10^4\) square miles and 1 mile, respectively. Thus the surprising accuracy of his result stems from the fortuitous compensation of fatally erroneous assumptions.
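    Mendenhall’s figures are easy to check today; in the sketch below, the unit conversion is exact, while the density is our assumption (he quotes none), so the kinetic-energy line is only an order-of-magnitude check:

```python
# Unit check of Mendenhall's (1888) value: 2.4e14 ft lbs in erg.
FT_LB_TO_ERG = 1.3558e7
print(2.4e14 * FT_LB_TO_ERG)            # ~3.3e21 erg, as quoted

# Kinetic energy of his vibrating volume: cross section (1e4 square miles),
# thickness (1 mile) and v = 15 cm/s are his; the density is OUR assumption.
MILE_CM = 1.609344e5
volume = 1.0e4 * MILE_CM**2 * MILE_CM   # cm^3
rho, v = 2.7, 15.0                      # g/cm^3 (assumed); cm/s
print(0.5 * rho * volume * v**2)        # ~1e22 erg, the right order of magnitude
```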

  2.

    By contrast, the first calculation of the energy released by an earthquake whose methodology would be upheld under present standards goes back to Reid (1910). In his comprehensive report on the 1906 San Francisco earthquake, Reid proposed a value of \(1.75 \times 10^{24}\) erg, based on an estimate of the forces necessary to offset the strain accumulated around the fault. This number is about 10 times too large given generally accepted values of the event’s seismic moment (Wald et al. 1993), but its computation is nevertheless remarkable.

    A year later, Reid embarked on measuring the energy of 12 additional earthquakes from an analysis of the areas of their Rossi–Forel intensity III isoseismals, scaled to that of the 1906 San Francisco earthquake, an idea originally found in Milne (1898). Reid does comment on the apparent scatter of his results, and in a truly visionary discussion, identifies probable sources of errors which would be described in modern terms as source radiation patterns, departure from earthquake scaling laws, and even the preferential attenuation of shear waves by the asthenosphere (Reid 1912).

  3.

    The next reference to the computation of an earthquake’s energy seems to be the work of Golitsyn (1915) on the Sarez earthquake of 18 February 1911 in present-day Tajikistan, which was later assigned a magnitude of 7\(\frac{3}{4}\) by Gutenberg and Richter (1954). Golitsyn’s (1915) computation is important, because it is the first one making use of a recorded seismogram. It was then revised and discussed, first by Klotz (1915) and later by Jeffreys (1923, 1929), and it became part of the small original dataset used by Gutenberg and Richter (1936) in their first attempt to relate magnitude to energy [see item 6 in this list]. An extensive study of the Sarez earthquake was recently published by Ambraseys and Bilham (2012), who recomputed a surface-wave magnitude \(M_{\mathrm{s}} = 7.7\), but did not perform any waveform analysis. These numbers would suggest a moment of about \(5 \times 10^{27}\) dyn cm, under the assumption that the event follows scaling laws. We were able to obtain an independent, modern estimate of the moment of the earthquake by applying the \(M_{\mathrm{m}}\) algorithm (Okal and Talandier 1989, 1990) to original Love and Rayleigh wavetrains recorded at De Bilt and Uppsala, yielding an average value of \(M_0 = 3 \times 10^{27}\) dyn cm. As detailed by Ambraseys and Bilham (2012), the earthquake was accompanied by a catastrophic landslide, later surveyed by the Imperial Russian Army at an estimated volume of 2.5–3.5 \(\text {km}^{3}\), which dammed the Murgab River, creating Sarez Lake, a 17-\(\text {km}^3\) reservoir (Shpil’ko 1914, 1915); incidentally, the potential failure of the resulting natural Usoi Dam remains to this day a significant hazard in the region (Lim et al. 1997; Schuster and Alford 2004), especially in view of recent large-scale seismic activity in its neighborhood (Elliott et al. 2017; Negmatullayev et al. 2018).

    Golitsyn’s purpose in computing the energy of the seismic waves was to address the question of the causality of the landslide, i.e., was it a result of the earthquake or the opposite, the argument being that, if the slide was caused by the earthquake, it should have only a fraction of the seismic energy released. Golitsyn’s (1915) work constituted a significant improvement on Mendenhall’s (1888) in that he used actual seismograms (in this case Rayleigh waves at Pulkovo) to compute an energy flux, which he then integrated over the observed duration of the wavetrain, and, mistakenly as noticed by Jeffreys (1923), over the lower focal hemisphere, to derive the energy released at the source. Golitsyn (1915) obtained a value of \(4.3 \times 10^{23}\) erg, comparable to his estimate of (2–6) \(\times 10^{23}\) erg for the energy of the rockslide, and concluded that the slide was “not the consequence, but the cause of the earthquake.” Once again, and remarkably, his value of the seismic energy is not unreasonable, being only ten times larger than expected under the assumption of scaling laws (Boatwright and Choy 1986; Newman and Okal 1998) applied to our estimate of the seismic moment (\(3 \times 10^{27}\) dyn cm), but as we will show, this remains a coincidence, resulting from a number of compensating errors.

    In what amounts to a translation of Golitsyn’s paper, Klotz (1915) revised the seismic energy estimate slightly upwards, to \(7 \times 10^{23}\) erg, but offered no definitive comment on the matter of the possible trigger. By contrast, Jeffreys (1923) argued that Golitsyn’s (1915) computation was erroneous, since he had not taken into account the decay of Rayleigh wave amplitudes away from the Earth’s surface, which invalidates the integration over the focal hemisphere. Having corrected the calculation, and performed a cylindrical, rather than spherical, integration around the focus, Jeffreys (1923) came up with a considerably lower value for the seismic energy of the Rayleigh waves, of only \(1.8 \times 10^{21}\) erg. To this he added a much smaller contribution from the S waves, estimated at \(9 \times 10^{19}\) erg.
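    In modern notation, the station-side part of such calculations amounts to integrating an energy flux \(\rho U v^{2}\) over the wavetrain, then over a cross-section surrounding the source; the sketch below adopts Jeffreys’ cylindrical geometry [with the finite depth extent \(H = 1.1 \Lambda \) later advocated by Båth (1955); see item 11], and every number in it is illustrative rather than taken from the original papers:

```python
import numpy as np

# Illustrative Golitsyn/Jeffreys-style estimate of surface-wave energy:
# flux per unit area rho * U * v(t)^2, integrated over the wavetrain and
# over a cylindrical cross-section 2*pi*r*H around the source.
# Every number below is a placeholder, not a value from the 1915-1923 papers.
dt = 0.5
t = np.arange(0.0, 600.0, dt)                  # a 10-minute wavetrain
v = 1.0e-2 * np.sin(2 * np.pi * t / 14.0) * np.exp(-t / 200.0)   # cm/s, 14-s waves

rho, U = 3.0, 3.5e5                            # density (g/cm^3); group velocity (cm/s)
flux = rho * U * np.sum(v**2) * dt             # erg/cm^2 crossing the station
r = 3.0e8                                      # epicentral distance, cm (3000 km)
H = 1.1 * 14.0 * 3.5e5                         # Bath's depth extent, 1.1 * wavelength (taking c ~ U)
E = flux * 2 * np.pi * r * H                   # erg; attenuation and radiation pattern ignored
print(f"E ~ {E:.1e} erg")
```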

    In the context of modern theories and energy computations, we can point to a number of fatal shortcomings in Jeffreys’ (1923) arguments, the most obvious one being that he assigns more energy to Rayleigh waves than to S waves. Indeed, we now know that most of the energy radiated by a seismic source is initially carried by high-frequency S waves, but those attenuate so fast in the far field that their contribution must be calculated by scaling that of the less attenuated P waves (Boatwright and Choy 1986); note that Jeffreys (1923) neglects anelastic attenuation altogether. Furthermore, he uses the model of a harmonic oscillator (despite expressing some reservations in this respect), whereas modern computations using digital data show that the major contribution to the energy integral at teleseismic distances is usually around 1 Hz. In his surface wave calculation, Jeffreys (1923) uses a single period of 14 s, and by ignoring other spectral contributions, once again underestimates the final energy of the wave, by a factor of about six with respect to the theoretical value predicted for an earthquake of that size [Okal (2003); Eq. (45), p. 2209]. In short, Golitsyn (1915) was grossly overestimating the Rayleigh energies, but we now know that they carry only a fraction of the seismic energy released, so that in the end, his result might have been correct, while Jeffreys (1923) was underestimating both Rayleigh and S energies, by considering only single frequencies. A scientific exchange between Golitsyn and Jeffreys would certainly have been enlightening, but unfortunately, Prince Golitsyn died of natural causes on 17 May (n.s.) 1916, aged only 54.

    Incidentally, we now understand that the whole argument about causality was totally flawed, since the two phenomena considered (the earthquake and the rockslide) express the release of two forms of potential energy of a different nature, one elastic and the other gravitational. If one phenomenon simply triggers the other, the relative amounts of energy released by the two processes are unrelated, since they come from different energy reservoirs. As such, an earthquake could conceivably trigger a landslide more energetic than itself (this is possibly what happens during “orphan” slides for which the triggering mechanism is simply too small to be detected), and the converse might also be envisioned, i.e., a landslide triggering a more energetic earthquake in a tectonically ripe environment.

    Indeed, the interpretation of the 1911 Sarez earthquake by Golitsyn (1915) and Jeffreys (1923) as being due to the rockslide was questioned both by Oldham (1923), who argued that the earthquake source was normal (and in particular could not be superficial) on account of its isoseismal distribution and of its aftershock sequence, and by Macelwane (1926), based on a comparison with the case of the Frank, Alberta slide of 29 April 1903, for which no evidence of triggered seismicity could be found (admittedly for a significantly smaller, if still massive, rockslide, and during the dawn years of instrumental seismology). Jeffreys (1923) discusses in considerable detail the transfer of energy from a small body (the slide) falling on a big one (the Earth), and proposes a figure of 1/300 for its efficiency. His approach is correct only under his assumption “that the blow to the ground caused by the fall might have been the cause of the seismic disturbance,” and thus that the elastic energy carried by the seismic waves was traceable to the deformation of the Earth upon the impact of the slide, and not to the release of tectonic strain independently accumulated during the interseismic cycle. The low efficiency of this mode of triggering was verified in the case of the collapse of the World Trade Center on 11 September 2001, for which Kim et al. (2001) proposed a value of between \(10^{-4}\) and \(10^{-3}\). Incidentally, those authors also verified that the character of the seismic waves and the geometry of the spreading area differed significantly from those of a traditional earthquake, thus supporting in retrospect Oldham’s (1923) criticism of Golitsyn’s model for the 1911 Sarez events.

  4.

    A few years later, in the second edition of The Earth, Jeffreys (1929) listed several additional values of earthquake energies, apparently all obtained from S waves: \(10^{21}\) erg for the Montana earthquake of 28 June 1925, only 10 times smaller than suggested by (1) based on the magnitude of 6\(\frac{3}{4}\) later assigned by Gutenberg and Richter (1954); \(5 \times 10^{16}\) erg for the much smaller Hereford, England earthquake of 15 February 1926, deficient by a factor of 200 when applying (1) to the earthquake’s present magnitude estimate, \(M_{\mathrm{L}} = 4.8\) (Musson 2007); and about \(10^{19}\) erg for the Jersey event of 30 July 1926, for which no definitive magnitude is available. He also lists a value of \(5 \times 10^{16}\) erg for the Oppau, Germany explosion of ammonium-nitrate-based fertilizer on 21 September 1921 (Wrinch and Jeffreys 1923), although we would nowadays question the concept of using S waves to quantify the source of an explosion.

  5.

    In 1935, Richter published his landmark paper introducing the concept of magnitude, in which he cautioned that “[its] definition is in part arbitrary; an absolute scale, in which the numbers referred directly to shock energy [...] measured in physical units, would be preferable” (Richter 1935). Notwithstanding this disclaimer, Richter could not resist the temptation of relating his newly defined magnitude scale to physical units, using Jeffreys’ (1929) energy estimate for the 1925 Montana earthquake. Since the latter was outside the domain of his original study (limited to California and Nevada), Richter assigned it \(M = 7.5\) based on an interpretation of its isoseismals, and proceeded to scale magnitude to energy. While not expressed verbatim, the relation

    $$\begin{aligned} \log _{10} E = 2 M + 6 \end{aligned}$$
    (8)

    can be inferred from the several examples discussed on pp. 26–27 of Richter (1935). The slope \(Q = 2\) in Eq. (8) is not explained, probably because it looked obvious to Richter that energy should scale as the square of amplitude, under the model of a harmonic oscillator.
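    The inference can be checked directly against the anchor point quoted above, Jeffreys’ (1929) \(10^{21}\) erg for the Montana shock and Richter’s assigned \(M = 7.5\); a minimal sketch:

```python
import math

# Anchor point behind Eq. (8): E = 1e21 erg (Jeffreys 1929) at M = 7.5 (Richter 1935).
M, E = 7.5, 1.0e21
print(math.log10(E))   # 21.0
print(2 * M + 6)       # 21.0 -> consistent with log10 E = 2 M + 6
```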

  6.

    In collaboration with Richter, Gutenberg wasted no time expanding the concept of magnitude, and within one year the two Caltech scientists had published the third in their series of “On Seismic Waves” papers (Gutenberg and Richter 1936), in which they extended the concept of magnitude to teleseismic distances and thus to earthquakes worldwide, using exclusively surface waves. In parallel, they proposed the first formal relation between magnitude and energy [their Eq. (15) p.124]:

    $$\begin{aligned} \log _{10} E = 2 M + \log _{10} E_0, \end{aligned}$$
    (9)

    \(E_0\) being the energy of a shock of magnitude 0, “the smallest ones recorded” (of course we now know that events of negative magnitude can exist and be recorded). They also laid the foundations for the extension of the magnitude concept to teleseismic distances, in the process revising down the magnitude of the 1925 Montana earthquake to 6.8 [later transcribed as 6\(\frac{3}{4}\) in Gutenberg and Richter (1954)]. This has the effect of increasing the constant 6 in (8) [or \(E_0\) in (9)], but Gutenberg and Richter (1936) do not publish a definitive value of \(E_0\), indicating rather that it ranges between \(10^7\) and \(3 \times 10^8\) erg. On the other hand, they specifically justify the slope \(Q = 2\) in (9), stating “Since the magnitude scale is logarithmic in the amplitudes, doubling the magnitudes gives a scale logarithmic in the energies.”

    Note that, up to this stage, Gutenberg and Richter do not compute their own energies based on any personal analytical approach, but rather use published values, e.g., from Reid (1910) or Jeffreys (1929); while they write “Energies have been found for earthquakes by many investigators using different methods,” the modern reader is left at a loss to figure out exactly who these scientists were and what their methods might have been.

  7.

    This situation apparently changes in the next few years, resulting in the first compilation of magnitudes derived from individual measurements of ground motion, as part of the first version of Seismicity of the Earth, published as a “Special Paper” of the Geological Society of America (Gutenberg and Richter 1941), although earthquakes remain grouped in “magnitude classes,” from a (\(M \approx 8\)) down to e (\(M \approx 5\)).

  8.

    Gutenberg and Richter’s (1942) next paper on the subject makes a number of fundamental breakthroughs. A critical discussion of the analytical parts of this paper will be given in Sect. 4.1. We present here only the general milestones in that contribution. First, the authors work out a detailed expression of the energy of the source based on a model of the vertical energy flux at the epicenter, expressed as a function of the maximum acceleration \(a_0\) observed above a point source buried at depth h [their Eq. (27) p. 178]:

    $$\begin{aligned} \log _{10} E = 14.9 + 2 \log _{10} h + \log _{10} t_0 + 2 \log _{10} T_{0} + 2 \log _{10} a_0, \end{aligned}$$
    (10)

    where \(t_0\) is the duration of the signal, and \(T_0\) its (dominant) period. In this respect, and as indicated by the title of their paper, the authors are clearly motivated by relating magnitude to maximum intensities in the epicentral area, which are generally related to accelerations, rather than displacements. Similarly, the experimental data available to them in epicentral areas came primarily in the form of strong-motion accelerograms, hence the emphasis on acceleration \(a_0\), rather than displacement \(A_0\), in (10). Second, and perhaps more remarkably, they propose that parameters such as the duration \(t_0\) or the prominent period \(T_0\) of the wavetrains are not constant, but rather vary with the size of the earthquake. As such, their study represents the first introduction of the concept of a scaling law for the parameters of a seismic source. The authors suggest that \(\log _{10} t_0\) varies as \(\frac{1}{4} M\) [their Eq. (28)], albeit without much justification. In modern terms, \(t_0\) would be called a source duration, controlled by the propagation of rupture along the fault, and hence expected to grow like \(M_0^{1/3}\), which would be reconciled with Gutenberg and Richter’s (1942) Eq. (28) if \(\log _{10} M_0\) were to grow like \(\frac{3}{4} M\), an unlikely behavior. The fundamental empirical observation in Gutenberg and Richter (1942) is that the maximum acceleration \(a_0\) at the epicenter grows more slowly with magnitude than the amplitude of ground motion used to compute M [their Eq. (20) p. 176]:

    $$\begin{aligned} M = 2.2 + 1.8~\log _{10}~ a_0, \end{aligned}$$
    (11)

    from which they infer a slope of 0.22 between \(\log _{10} T_0\) and M [their Eq. (32), p. 179]. \(T_0\) represents the dominant period of an accelerogram at the epicenter, which in modern terms may be related to the inverse of a corner frequency, itself controlled by a combination of source duration and rise time; thus \(T_0\) would be expected to grow like \(M_0^{1/3}\) (Geller 1976), which is reconciled with Eq. (32) of Gutenberg and Richter (1942) if \(\log _{10} M_0\) were to grow like \(\frac{2}{3} M\), again an unlikely behavior. When substituting the dependence of \(t_0\) and \(T_0\) on M into their Eq. (27), the authors finally obtain their Eq. (34) p. 179:

    $$\begin{aligned} \log _{10}~E = 8.8 + 2~ \log _{10}~h + 1.8~ M \end{aligned}$$
    (12)

    as a replacement for the previous version featuring a slope \(Q = 2\) (Gutenberg and Richter 1936). Note that this formula would correspond to (10) for \(h = 40\) km, which is significantly greater than the typical depth of the Southern California earthquakes that constituted the dataset used by the authors for most of their investigations, and would even be greater than the probable depth of focus of most of their teleseismic events. They correctly assign the reduction in slope to issues nowadays described as stemming from source finiteness, even though these effects remain underestimated by modern standards.

    Gutenberg and Richter (1942) then proceed to estimate (only to the nearest order of magnitude) energies for 17 large earthquakes, based on intensity reports at their epicenter, or on their radius of perceptibility, both approaches being related to epicentral ground acceleration, and the latter reminiscent of Milne’s (1898) methodology. The values proposed are significantly overestimated (reaching \(10^{26}\) to \(10^{27}\) erg), which may result from underestimating the effect of source finiteness, ignoring in particular the existence of several corner frequencies, a phenomenon leading to a saturation of acceleration [ground accelerations in excess of one g are not observed systematically for great earthquakes, but rather in certain tectonic environments in the case of even moderate, but “snappy,” shocks (e.g., Fry et al. 2011)]. As a result, the growth of E with M [the slope \(Q = 1.8\) in Eq. (12)] remains too strong, and this eventually overpredicts energy values, especially for large events. However, from a relative standpoint, it is remarkable that, in addition to the truly great earthquakes in Assam (1897) and San Francisco (1906), the three events assigned the largest energies by Gutenberg and Richter (1942) are the shocks of 1926 off Rhodos, 1939 in Chillán, Chile, and 1940 in Vrancea, Romania. The Chillán event has been shown by Okal and Kirby (2002) to feature an anomalously high energy-to-moment ratio, a property expected to be shared by the other two on account of their location as intermediate-depth intraslab events (Radulian and Popa 1996; Ambraseys and Adams 1998). Finally, and in retrospect, a significant problem with the multiple regression (10) underlying Gutenberg and Richter’s (1942) model is that it predicts a logarithmic divergence of E as the source reaches the surface (\(h \rightarrow 0\)). This limitation reflects the simplified model of a point source, which is rendered invalid as soon as the fault’s width W and length L become comparable to h.
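    The slope \(Q = 1.8\) of Eq. (12) can be recovered by simple bookkeeping of the coefficients of M; a minimal sketch [intercepts and the depth term of Eq. (10), which do not affect the slope, are left aside]:

```python
# Coefficients of M entering Eq. (10), log10 E = ... + log10 t0 + 2 log10 T0 + 2 log10 a0:
d_log_a0 = 1.0 / 1.8   # inverting Eq. (11): log10 a0 = (M - 2.2) / 1.8
d_log_t0 = 0.25        # their Eq. (28): log10 t0 = 0.25 M - 0.7
d_log_T0 = 0.22        # their Eq. (32): slope of log10 T0 versus M
Q = 2 * d_log_a0 + d_log_t0 + 2 * d_log_T0
print(round(Q, 2))     # 1.8, the slope of Eq. (12)
```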

  9.

    In 1945, Gutenberg published three papers establishing the computation of magnitudes from surface and body waves on a stronger operational basis. First, in Gutenberg (1945a), he formalized the calculation of a surface-wave magnitude \(M_{\mathrm{s}}\) by introducing the distance correction \(1.656 \log _{10} \Delta \) [his Eq. (4)]. This slope, still of an empirical nature, is significantly less than suggested by Gutenberg and Richter (1936); for example, the 42 points at distances less than 150\(^{\circ }\) on their Fig. 6 (p. 120) regress with a slope of \((-2.08 \pm 0.09) \log _{10} \Delta \) (or \((-1.94 \pm 0.15)\) for \(\Delta < 55^{\circ }\)). The new slope (rounded to 1.66) was later incorporated (albeit after considerable debate) into the Prague formula for \(M_{\mathrm{s}}\) (Vaněk et al. 1962); while never derived theoretically, it was justified as an acceptable empirical fit to a modeled decay of 20-s Rayleigh wave amplitudes with distance (Okal 1989).

    In the second paper, Gutenberg (1945b) used the concept of geometrical spreading, initially described by Zöppritz [and written up as Zöppritz et al. (1912) following his untimely death], to extend to teleseismic distances the calculation of an energy flux pioneered at the epicenter by Gutenberg and Richter (1942), thus defining a magnitude from the body-wave phases P, PP and S. The most significant aspects of this paper are (1) the author’s difficulty in obtaining both local and distant values of magnitudes for the same event; (2) the necessity of invoking station corrections reflecting site responses; (3) the relatively large values of the periods involved (up to 7 s even when considering only P phases); and (4) the introduction of correction terms for magnitudes larger than seven, clearly related to the effects of source finiteness.

    In the third paper, Gutenberg (1945c) attacked the problem of deep earthquakes, and produced (for P, PP, and S) the first versions of the familiar charts for the distance–depth correction, generally referred to as \(Q ( \Delta , h )\), but as \(A ( \Delta , h)\) in early papers, to be applied to the logarithm of ground motion amplitude. After this correction was significantly adjusted by Gutenberg and Richter (1956b) [their Fig. 5, as compared with Gutenberg’s (1945c) Fig. 2], it was to be retained in the Prague formula for \(m_{\rm b}\) (Vaněk et al. 1962), and has remained to this day the standard for the computation of magnitudes from P waves. By imposing that the same earthquake should have the same magnitude when measured at different distances, it was possible, at least in principle, to obtain empirically the variation of \(Q (\Delta , h)\) with distance, and in particular to tie the body-wave magnitude scale to Richter’s (1935) original one. However, the dependence on depth obviously required a different approach. In the absence of a physical representation of the source by a system of forces, Gutenberg (1945c) elected to impose that shallow and deep earthquakes of similar magnitude should have the same radiated energy, which allowed him to obtain the first version of the \(A ( \Delta , h )\) chart (his Fig. 2), through a generalization of the concept of geometrical spreading. We now understand that this approach tacitly assumes a constant stress drop \(\Delta \sigma \), which may not be realistic for deep sources at the bottom of subduction zones.

    In hindsight, Gutenberg’s (1945c) approach suffered from ignoring the presence of the low-velocity zone in the asthenosphere, as well as of the seismic discontinuities in the transition zone, which result in significant distortion of slowness as a function of distance, and hence of its derivative controlling geometrical spreading. In addition, the attenuation structure of the mantle was at the time little if at all understood, which later prompted Veith and Clawson (1972) to propose a new, more streamlined, distance–depth correction, \(P ( \Delta ,h)\), based on a more modern representation of attenuation as a function of depth; however, they elected to use a structure derived from Herrin’s (1968) tables, which does not include mantle discontinuities, even though the latter were documented beyond doubt by the end of the 1960s (e.g., Julian and Anderson 1968). Note finally that Gutenberg (1945b, Eq. 18, p. 66; 1945c, Eq. (1), p. 118) introduces the ratio A/T rather than A in the computation of \(m_{\rm B}\), presumably motivated by the quest for a closer relation between magnitude and energy, the latter involving ground velocity rather than displacement in its kinetic form. This change from A to A/T is potentially very significant, since T is expected to vary with earthquake size.

  10.

    The next decade saw the compilation of the definitive version of Seismicity of the Earth, published as a monograph in two successive editions (Gutenberg and Richter 1949, 1954), in which earthquakes are assigned individual magnitudes. In their introduction to the final edition (p. 10), the authors argued that the value \(Q = 1.8\), derived in Gutenberg and Richter (1942) and used in the first edition, overestimated energies, and suggested the lower value \(Q = 1.6\); however, they did not revise their discussion of energy, perhaps because they felt that their new formula (5), which they had not yet formally published (and eventually would never publish), was itself not definitive.

  11.

    In the meantime, the scientific value of the concept of magnitude as a quantification of earthquake sources had become obvious to the seismological community, and many investigators developed personal, occasionally competing, algorithms, and more generally offered comments on Gutenberg and Richter’s ongoing work. Among them, Bullen (1953) suggested that energies derived from (12) were too large for the largest magnitudes, which argued in the direction of reducing Q. On the other hand, following Jeffreys’ (1923) criticism of Golitsyn’s (1915) calculation of the energy of Rayleigh waves, Båth (1955) proposed to correct it by restricting the vertical cross-section of the teleseismic energy flux (which he evaluates at the Earth’s surface) to a finite depth, H, which he took as 1.1 times the wavelength \(\Lambda \). While this approach is indeed more sound than Golitsyn’s, it is still limited to Rayleigh waves, and thus does not quantify the contribution of body waves, even though Båth (1955) does state that the latter is probably important. Finally, his empirically derived slope Q reverts to the value 2, presumably because of his exclusive use of narrowband instruments, essentially converting the Earth’s motion into that of a harmonic oscillator. Similarly, Di Filippo and Marcelli (1950) obtained \(Q = 2.14\) from a dataset of Italian earthquakes using Gutenberg and Richter’s (1942) methodology, this higher value reflecting the departure of the dominant period T from the scaling proposed by Gutenberg and Richter. Sagisaka (1954) attempted to reconcile Reid’s (1910) approach, based on integrating the energy of the elastic deformation, with Gutenberg and Richter’s (1942) values derived from magnitudes in the case of several Japanese earthquakes (both shallow and deep), and noticed that the latter were consistently excessive, e.g., by a factor of at least 100 in the case of the great Kanto earthquake of 1923. We have proposed (Okal 1992) a seismic moment of \(3 \times 10^{28}\) dyn cm for that event, which under modern scaling laws would suggest an energy of about \(4 \times 10^{23}\) erg, in general agreement with Sagisaka’s (1954) figure, and confirming that (12) overestimates energies, at least for large events.

  12.

    The year 1956 sees no fewer than three new publications on the subject by Gutenberg and Richter. Gutenberg (1956) and Gutenberg and Richter (1956b) constitute in particular their last efforts at trying to reconcile the various magnitude scales they had arduously built over the previous 20 years, into a single “unified” magnitude. In this framework, they proceed to develop empirical relations between magnitude scales (and hence with energy) which become more complex, and in particular involve nonlinear terms. We now understand that, because they were measuring different parts of the seismograms at different periods and the scaling laws underlying the concept of a unified magnitude were distorted differently [different corner frequencies apply to different wavetrains (Geller 1976)], it was impossible for them to find a simple relation between magnitudes that would apply for all earthquake sizes. For example, we note that the relation proposed in Eq. (20), p. 134 of Gutenberg and Richter (1956a):

    $$\begin{aligned} \log _{10}~E = 9.4 + 2.14~M -0.054~ M^{2} \end{aligned}$$
    (13)

    would regress with a slope \(Q = 1.6\) between magnitudes 1 and 9, but only \(Q = 1.4\) between the more usual values of 5.5 and 8.5. We note that Gutenberg (1956) elects to align a “unified” magnitude m with his body-wave \(m_{\rm B}\). This may sound surprising, since we now understand that, being measured at higher frequencies than a surface-wave magnitude, \(m_{\rm B}\) is bound to suffer stronger distortion from the effect of source finiteness, and to saturate earlier (Geller 1976). However, as suggested in Gutenberg and Richter (1956a), Gutenberg may have been motivated by the goal of matching Richter’s (1935) initial scale, which would be regarded today as a local magnitude \(M_{\mathrm{L}}\), measured at even higher frequencies, and therefore more closely related to a body-wave scale. Also, Richter’s initial magnitude used torsion records with deficient response at longer periods.
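    These effective slopes are easily verified by regressing the quadratic (13) linearly over the corresponding magnitude windows; a minimal sketch:

```python
import numpy as np

# Effective slope Q of Eq. (13), log10 E = 9.4 + 2.14 M - 0.054 M**2,
# when fit by a straight line over a given magnitude window.
def q_over(m_lo, m_hi):
    M = np.linspace(m_lo, m_hi, 201)
    logE = 9.4 + 2.14 * M - 0.054 * M**2
    return np.polyfit(M, logE, 1)[0]    # least-squares slope

print(round(q_over(1.0, 9.0), 2))       # 1.60
print(round(q_over(5.5, 8.5), 2))       # 1.38, i.e. about 1.4
```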

    In many respects, the third paper (Gutenberg and Richter 1956a) [whose preparation apparently predated Gutenberg and Richter (1956b)] stands in a class by itself, and constitutes a superb swan song of the authors’ collaboration on this subject. In particular, it presents a review of previous efforts at extracting energy from seismograms, and revises the power laws relating phase duration \(t_0\) and dominant period \(T_0\) to magnitude. The authors now propose \(\log _{10} t_0 = 0.32 M - 1.4\) [rather than \(0.25 M - 0.7\) in Gutenberg and Richter (1942)], which would be in remarkable agreement with modern scaling laws predicting a slope of 1/3, but only in the absence of source finiteness. However, the dominant period \(T_0\) in the seismogram is described as being essentially independent of magnitude. The authors suggest an algorithm for the computation of seismic energy from P waves (“integrating \((A_0/T_0 )^{2}\) over the duration of large motion of the seismogram”), which constitutes a blueprint for later and definitive algorithms, such as Boatwright and Choy’s (1986), but of course, in the absence of digital data, they can only propose an approximate methodology (integrating the maximum value of velocity squared over an estimated duration of the phase), which, by modern standards, keeps too much emphasis on representing the source as a monochromatic oscillator.

    In the course of what was to be their last contribution on this topic, Gutenberg and Richter (1956a) make a number of visionary statements, notably concerning the limitations inherent in the model of a point source, which they argue make it uncertain that “there is a one-to-one correspondence between magnitude [...] and the total energy radiation.” They conclude by emphasizing the necessity of a complete revision of the magnitude scale, which they claim to be in preparation, based on ground velocity (A/T) rather than amplitude, this statement reflecting once again both the pursuit of Richter’s (1935) original goal of relating magnitudes to the physical units of energy, and the common and erroneous model of a monochromatic oscillator.

    Beno Gutenberg died suddenly on 25 January 1960, and Charles Richter, the inventor of the magnitude concept, never published any further contribution on the topic of magnitudes or seismic energy.

  13.

    Developments regarding magnitude scales and the computation of seismic energy following Gutenberg’s death are better known and will be only briefly summarized here. The torch was then passed to Eastern Bloc scientists. Following the International Union of Geodesy and Geophysics (IUGG) meeting in Helsinki in 1960, Czech and Russian scientists met in Prague in 1961, and standardized the calculation of body- and surface-wave magnitudes, respectively \(m_{\rm b}\) (measured in principle at 1 s) and \(M_{\mathrm{s}}\) (at 20 s) (Vaněk et al. 1962). Both algorithms were based on the use of A/T, which the authors specifically justified as linked to energy, thus once again tacitly assuming the model of a monochromatic oscillator. In operational terms, a major difference arises between \(m_{\rm b}\) and \(M_{\mathrm{s}}\): because of the prominence of 20-s waves strongly dispersed along oceanic paths and suffering little attenuation in the crust, and of the universal availability of narrowband long-period instruments peaked around its reference period, \(M_{\mathrm{s}}\) has indeed been measured close to 20 s, while measurements of \(m_{\rm b}\) have often been taken at periods of several seconds, which considerably affects not only its relation to physical source size (i.e., seismic moment and hence energy), but also its saturation for large earthquakes. As a consequence, the Prague-standardized \(M_{\mathrm{s}}\) has been more comparable to previous versions of surface-wave magnitudes than its \(m_{\rm b}\) counterpart may have been to previous body-wave scales such as Gutenberg’s (1945b) \(m_{\rm B}\), for which measurements were often taken at more variable periods, apparently as long as 7 s.

  14.

    In the meantime, an interesting development had taken place in the then Soviet Union with the establishment of the so-called K-class scale for regional earthquakes. We refer to Rautian et al. (2007) and Bormann et al. (2012) for detailed reviews, and will only discuss here a number of significant points. That work originated with the need to compile and classify the very abundant seismicity occurring in the area of Garm, Tajikistan, notably in the aftermath of the Khait earthquake of 10 July 1949. It is remarkable that, even though it was conducted under what amounted to material and academic autarky (Hamburger et al. 2007), it nevertheless proceeded with great vision, and was rooted, at least initially, in the principles which would later constitute the foundations of modern algorithms using digital data (e.g., Boatwright and Choy 1986).

    Initial work on the K-class was clearly aimed at deriving a physically rigorous measurement of energy from seismic recordings, by setting an a priori relation with energy (Bune 1955), later formalized as

    $$\begin{aligned} K = \log _{10}~E \end{aligned}$$
    (14)

    with E in joules (Rautian 1960). This approach differed fundamentally from Richter’s (1935) empirical definition of magnitude, and indeed pursued the work of Golitsyn (1915) by seeking to compute an energy flux at the receiver (Bune 1956). It is interesting to note that Bune clearly mentions the need to work in the Fourier domain, but in the absence of digital data (and of computational infrastructure), he resorted to a time-domain integration over a succession of individual wavetrain oscillations, which may be justifiable under the combination of Parseval’s theorem and the presence of strong systematic dispersion during propagation; however, the latter may not be sufficiently developed at the short distances involved. Faced with these challenges, Rautian (1960) reverted to the simplified practice of adding the absolute maximum amplitudes of P and S traces, in order to define an adequate “amplitude” to be used in the computation of the energy flux. She also pointedly recognized the influence of instrumentation, and later the variability required when exporting the algorithm to geologically different provinces (e.g., Fedotov 1963; Solov’ev and Solov’eva 1967), both of which can be attributed to a filtering effect (by instrument response and regional anelastic attenuation), before interpreting a complex source spectrum through a single number in the time domain.

    Attempts to relate K-class values to magnitudes (as measured in the Soviet Union) were marred by the fact that the former was built for small events recorded at regional distances, while the latter, based on the application of the Prague formula to Love waves (Rautian et al. 2007), tacitly assumed larger shocks recorded at teleseismic distances. Nevertheless, the initial work of Rautian (1960) and systematic formal regressions later performed by Rautian et al. (2007) indicate an average slope \(Q = 1.8 \pm 0.3\) [note, however, that in an application to larger earthquakes, Rautian (1960) suggested the use of a “total” seismic energy \(E_0\), growing more slowly than E (in fact like \(E^{2/3}\); her Eq. (19)), leading to a lower value of \(Q = 1.1\)]. The origin of the relatively high value \(Q = 1.8\) is unclear, but probably stems from the simplified algorithm used by Bune (1956) and Rautian (1960), which may amount to narrow-bandpass filtering, thus approaching the conditions of a monochromatic signal (\(Q = 2\)).

    In retrospect, the clearly missing element in the K-class algorithm is the key link between the energy flux at the station and the radiation at the source, which is now understood in the model of ray tubes, and quantified using a formula expressing geometrical spreading, e.g., in the algorithm later developed by Boatwright and Choy (1986). This concept, initially published as Zöppritz et al. (1912), could not be applied to the regional phases (\(P_n\), \(S_n\), \(L_g\)) on which the K-class was built, which motivated Rautian (1960) to develop other, more empirical algorithms.

  15.

    Further significant progress could be made in the early 1960s on account of several theoretical developments. First, the double couple was introduced by Vvedenskaya (1956) as the physical representation of a dislocation along a fault in an elastic medium, and later formalized by Knopoff and Gilbert (1959); its amplitude, the scalar moment \(M_0\), provides the physical agent, measurable in physical units, to which Richter (1935) had lamented he had no access. In addition, Ben-Menahem (1961) published a landmark investigation of the effect of source finiteness on the spectrum of seismic surface waves, introducing the concept of directivity, which was to prove critical in resolving the “energy paradox” exposed in Sect. 2.1. Based on the representation theorem, and considering a source of finite dimension, Haskell (1964) gave the first expression for the energy radiated into P and S waves by a finite source of arbitrary geometry, which he found proportional to

    $$\begin{aligned} E \propto \frac{ W^2 \, ( \Delta u )^{2} \, L }{ \tau ^2}, \end{aligned}$$
    (15)

    where we have rewritten his Eq. (39) (p. 1821) with the more usual notation W for fault width, L for fault length, \(\Delta u\) for fault slip, and \(\tau \) for rise time. Under generally accepted scaling laws (Aki 1967), we anticipate that E would indeed be proportional to \(M_0\), in agreement with Vassiliou and Kanamori’s (1982) later work. Haskell’s paper is remarkable in that it is the first one to express seismic energy in the frequency domain by means of a Fourier decomposition. By then, efficient Fourier-transform algorithms were becoming available (Cooley and Tukey 1965), but digitizing analog records at time samplings adequate to compute energy fluxes remained a formidable challenge, which would be resolved only with the implementation of analog-to-digital converters in the 1970s (Hutt et al. 2002); it is not fortuitous that the only spectral data published in Haskell (1964) and its statistical sequel (Haskell 1966) are limited to \(f < 12 \) mHz [his Fig. 2, reproduced from Ben-Menahem and Toksöz (1963), and incidentally relating only to source phase]. The same improvement in computational capabilities in the 1960s had led to the systematic development of ray-tracing techniques which revitalized the estimate of energy fluxes in the far field [whose idea, we recall, can be traced all the way back to Mendenhall (1888)], based in particular on the concept of geometrical spreading defined by Zöppritz et al. (1912). In particular, Wu (1966) laid the bases for the computation of seismic energy from digitized teleseismic body waves, but in practice, it could be applied only to long-period records until digital data became available in the 1970s. The state of affairs in the mid-1960s is detailed in Båth’s (1967) review paper, which rather surprisingly does not mention Ben-Menahem’s (1961) work on source finiteness, which was to play a crucial role in any understanding of magnitude and energy for large earthquakes.
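    Returning to Eq. (15), a one-line bookkeeping of exponents, assuming that W, L, \(\Delta u\) and \(\tau \) each grow like \(M_0^{1/3}\) under scaling laws (Aki 1967), confirms this linearity:

```python
from fractions import Fraction as F

# Exponent of M0 in Haskell's Eq. (15), E ~ W^2 (Du)^2 L / tau^2,
# with W, L, Du and tau each scaling as M0**(1/3).
exponent = 2 * F(1, 3) + 2 * F(1, 3) + F(1, 3) - 2 * F(1, 3)
print(exponent)   # 1 -> E grows linearly with M0
```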

  16.

    By contrast, Brune and King (1967) addressed the question of the influence of finiteness for large earthquakes when using conventional magnitudes measured at insufficient periods (e.g., \(M_{\mathrm{s}}\)), and embarked on computing “mantle” magnitudes using 100-s Rayleigh waves (their proposed \(M_{\mathrm{M}}\), which we call here \(M_{100} \)), thus becoming the first to propose a systematic use of mantle waves to quantify large earthquake sources. They documented a dependence of \(M_{100}\) on \(M_{\mathrm{s}}\) (actually more precisely on an undefined M) in the shape of a stylized “S”, whose intermediate regime corresponds to events with initial corner periods between 20 and 100 s. Brune and Engen (1969) complemented this study with measurements on Love waves, this time taken in the frequency domain in order to separate individual periods in the absence of dispersion. Remarkably, Brune and Engen (1969) recognized two striking outliers in their \((M_{100}: M)\) dataset: they found the 1933 Sanriku event deficient in \(M_{100}\) despite a record \(M =8.9\) assigned by Richter (1958); we have documented this shock as a “snappy” intraplate earthquake, which violated the scaling laws tacitly implied by Richter when converting a long-period magnitude into a “unified” one based on a short-period algorithm [see Okal et al. (2016) for a detailed discussion]. On the opposite side, Brune and Engen (1969) noticed that the 1946 Aleutian earthquake, now known as a “tsunami earthquake” of exceptional source slowness (Kanamori 1972; López and Okal 2006), featured a mantle magnitude significantly larger than its standard \(M_{\mathrm{s}}\). In a visionary statement, which unfortunately remained largely unnoticed at the time, they stressed the potential value of a mantle magnitude in the field of tsunami warning (Brune and Engen 1969; p. 933).

  17.

    The connection between directivity, as introduced by Ben-Menahem (1961), and full saturation of magnitude scales measured at constant periods was described in the now classic papers by Kanamori and Anderson (1975) and Geller (1976). As mentioned in Sect. 2, Geller’s last set of (unnumbered) equations on p. 1520 details the relationship between \(M_{\mathrm{s}}\), measured at 20 s, and \(\log _{10} M_0\), and in particular the evolution of the slope of \(\log _{10} M_0\) versus \(M_{\mathrm{s}}\), from a value of 1 at low magnitudes, through 1.5 over a significant range of “large” earthquakes (\(6.76 \le M_{\mathrm{s}} \le 8.12\)) and 3 over a narrow interval, to a final saturation of \(M_{\mathrm{s}}\) at a value of 8.22. We reproduce these relations, complete with moment ranges, here:

    $$\begin{aligned}&M_{\mathrm{s}} = \log _{10} M_0 - 18.89 \quad \text {for}\quad \log _{10} M_0 \le 25.65 \,(M_{\mathrm{s}} \le 6.76), \end{aligned}$$
    (16a)
    $$\begin{aligned}&M_{\mathrm{s}} = \frac{2}{3} (\log _{10} M_0-15.51) \quad \text {for}\quad 25.65 \le \log _{10} M_0 \le 27.69 \,(6.76 \le M_{\mathrm{s}} \le 8.12),\end{aligned}$$
    (16b)
    $$\begin{aligned}&M_{\mathrm{s}} = \frac{1}{3} (\log _{10} M_0 - 3.33) \quad \text {for}\quad 27.69 \le \log _{10} M_0 \le 28.00 \,(8.12 \le M_{\mathrm{s}} \le 8.22),\end{aligned}$$
    (16c)
    $$\begin{aligned}&M_{\mathrm{s}} = 8.22\quad ~\text {for}\quad \log _{10} M_0 \ge 28.00. \end{aligned}$$
    (16d)

    The combination of these equations with the relationships derived between \(m_{\rm b}\) and \(M_{\mathrm{s}} \) (Geller 1976; first set of unnumbered equations, p. 1520) leads to the following four-segment expression for the variation of \(m_{\rm b}\) with moment:

    $$\begin{aligned}&m_{\rm b} = \log _{10} M_0 - 17.56 \quad \text {for}\quad \log _{10} M_0 \le 21.75 \,(m_{\rm b} \le 4.19),\end{aligned}$$
    (17a)
    $$\begin{aligned}&m_{\rm b} = \frac{2}{3} (\log _{10} M_0 - 15.47) \quad \text {for}\quad 21.75 \le \log _{10} M_0 \le 23.79 \,(4.19 \le m_{\rm b} \le 5.55), \end{aligned}$$
    (17b)
    $$\begin{aligned}&m_{\rm b} = \frac{1}{3} (\log _{10} M_0 - 7.16) \quad \text {for}\quad 23.79 \le \log _{10} M_0 \le 25.16 \,(5.55 \le m_{\rm b} \le 6.00),\end{aligned}$$
    (17c)
    $$\begin{aligned}&m_{\rm b} = 6.00 \quad \text {for}\quad \log _{10} M_0 \ge 25.16. \end{aligned}$$
    (17d)

    These relations were obtained assuming a constant stress drop, a constant aspect ratio (L/W) of the fault, and constant particle and rupture velocities. Equations (16) and (17) are plotted in Fig. 1.
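
    The piecewise relations (16) and (17) are compact enough to be transcribed directly; the following minimal Python sketch (ours, with function names chosen purely for illustration) reproduces them and checks continuity at the elbows quoted above.

    ```python
    # Minimal sketch (ours) of Geller's (1976) piecewise relations (16)
    # and (17); moments M0 are in dyn cm, entered as log10(M0).

    def ms_geller(log10_m0):
        """Ms predicted from log10(M0), Eq. (16)."""
        if log10_m0 <= 25.65:
            return log10_m0 - 18.89                    # (16a), slope 1
        if log10_m0 <= 27.69:
            return (2.0 / 3.0) * (log10_m0 - 15.51)    # (16b), slope 2/3
        if log10_m0 <= 28.00:
            return (1.0 / 3.0) * (log10_m0 - 3.33)     # (16c), slope 1/3
        return 8.22                                    # (16d), saturation

    def mb_geller(log10_m0):
        """mb predicted from log10(M0), Eq. (17)."""
        if log10_m0 <= 21.75:
            return log10_m0 - 17.56                    # (17a)
        if log10_m0 <= 23.79:
            return (2.0 / 3.0) * (log10_m0 - 15.47)    # (17b)
        if log10_m0 <= 25.16:
            return (1.0 / 3.0) * (log10_m0 - 7.16)     # (17c)
        return 6.00                                    # (17d)

    # Continuity at the elbows (recovers the quoted Ms of 6.76, 8.12, 8.22):
    for lm in (25.65, 27.69, 28.00):
        print(round(ms_geller(lm), 2))
    ```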

  18.

    In his landmark paper, Kanamori (1977) introduced the concept of “moment magnitude” \(M_{\mathrm{w}}\) by casting an independently obtained bona fide scientific measurement of the seismic moment \(M_0\) (in dyn cm) into a magnitude scale through

    $$\begin{aligned} M_{\mathrm{w}} = \frac{2}{3} \Big [\log _{10} M_0-16.1 \Big ]. \end{aligned}$$
    (18)

    This definition of \(M_{\mathrm{w}}\) seeks to (1) relate \(M_{\mathrm{w}}\) to earthquake energy (hence the subscript “w”); and (2) make its values comparable to those (M) previously published, notably by Gutenberg and Richter. In doing so, it specifically assumes a ratio of \(0.5 \times 10^{-4}\) between energy and moment [Kanamori 1977, Eq. (4′) p. 2983], and also “Gutenberg and Richter’s energy–magnitude relation” (1). While the former can be derived under scaling laws, we have seen that the latter lacked a satisfactory theoretical derivation.

    By further seeking to ensure the largest possible continuity between \(M_{\mathrm{w}}\) and traditional M, Kanamori (1977) forces the factor 2/3 into (18) and thus tacitly assumes that the dataset of M is always taken in the size range where the magnitude has started to feel the effects of source finiteness, characterized by a slope of 1.5 in Fig. 1. This was further developed in the case of local magnitudes \(M_{\mathrm{L}}\) by Hanks and Kanamori (1979) and can also be applied conceptually to the case of the body-wave magnitude \(m_{\rm b}\). The relationship (18) is then predicted to give an estimate \(M_{\mathrm{w}}\) approaching a classical magnitude as long as the latter is computed exclusively in a domain where it has started to be affected by source finiteness, but with only one corner frequency (that relating to fault length L) lower than the reference frequency of that magnitude scale, thus resulting in a slope of 1.5 in Fig. 1. In other words, \(M_{\mathrm{w}}\) should coincide with a traditional magnitude measurement if and only if that magnitude is \(M_{\mathrm{s}}\) for reasonably large earthquakes (\( 25.65 \le \log _{10} M_0 \le 27.69\)), \(m_{\rm b}\) for smaller events (\( 21.75 \le \log _{10} M_0 \le 23.79\)), and presumably \(M_{\mathrm{L}}\) at even smaller moments. Otherwise, and notably for smaller events (e.g., if using \(M_{\mathrm{s}}\) for magnitude 5 events), one should expect a discrepancy between \(M_{\mathrm{w}}\) and \(M_{\mathrm{s}}\); this is indeed the subject of the “bias” reported by Ekström and Dziewoński (1988).

    In general terms, Fig. 1, on which we have superimposed (in solid green) the relationship predicted by (18), features such a trend, but the values of \(M_{\mathrm{w}}\) are smaller, by about 0.35 logarithmic units, than predicted by the dotted red and dashed blue lines. Similarly, in the regime for smaller-size events (slope of 1), the constant (18.89) relating \(M_{\mathrm{s}}\) to \(\log _{10}~M_0\) in (16a) is smaller than the value derived theoretically by Okal (1989) (19.46, a difference of 0.57 units) or the one obtained experimentally by Ekström and Dziewoński (1988) (19.24; 0.35 units). This observation is traceable to the modeling of the \(M_{\mathrm{s}} : M_0\) relationship by Geller (1976), copied here as Eq. (16); note in particular that the fit of these relations to the dataset in his Fig. 7 deteriorates significantly for earthquakes in the moment range \(10^{27}\) to \(10^{28}\) dyn cm. A possible explanation is Geller’s use of a relatively high \(\Delta \sigma = 50\) bar as an average stress drop. For this reason, we prefer to replace Eq. (16) with

    $$\begin{aligned}&M_{\mathrm{s}} = \log _{10} M_0- 19.46\quad \text {for}\quad \log _{10} M_0 \le 26.22 \,(M_{\mathrm{s}} \le 6.76),\end{aligned}$$
    (19a)
    $$\begin{aligned}&M_{\mathrm{s}} = \frac{2}{3} ( \log _{10} M_0-16.08)\quad \text {for}\quad 26.22 \le \log _{10} M_0 \le 28.26 \,(6.76 \le M_{\mathrm{s}} \le 8.12),\end{aligned}$$
    (19b)
    $$\begin{aligned}&M_{\mathrm{s}} = \frac{1}{3} (\log _{10} M_0 - 3.90) \quad \text {for}\quad 28.26 \le \log _{10} M_0 \le 28.56 \,(8.12 \le M_{\mathrm{s}} \le 8.22),\end{aligned}$$
    (19c)
    $$\begin{aligned}&M_{\mathrm{s}} = 8.22\quad \text {for}\quad \log _{10} M_0 \ge 28.56, \end{aligned}$$
    (19d)

    shown in Fig. 2 as the thick red line superimposed on the background of Geller’s (1976) Fig. 7. Note the better fit to interplate earthquakes (solid dots), especially in the range of moments \(10^{27}\)–\(10^{28}\) dyn cm, characterized by the slope of 2/3 in (19b); upon reduction of stress drop, corner frequencies are in principle lowered, which results in a slight displacement of saturation effects to higher moments, and a better agreement with (18). We similarly replace (17) with

    $$\begin{aligned}&m_{\rm b} = \log _{10} M_0-18.18\quad \text {for} \quad \log _{10} M_0 \le 22.36 \,(m_{\rm b}\,\le \,4.19),\end{aligned}$$
    (20a)
    $$\begin{aligned}&m_{\rm b} = \frac{2}{3} (\log _{10} M_0 - 16.08 )\quad \text {for}\quad 22.36 \le \,\log _{10} M_0 \le 24.41 \,(4.19 \le \,m_{\rm b}\,\le \,5.55),\end{aligned}$$
    (20b)
    $$\begin{aligned}&m_{\rm b} = \frac{1}{3} (\log _{10} M_0-7.76 )\quad \text {for}\quad 24.41 \le \,\log _{10} M_0 \le 25.76 \,(5.55 \le \,m_{\rm b}\,\le \,6.00),\end{aligned}$$
    (20c)
    $$\begin{aligned}&m_{\rm b} = 6.00\quad \text {for}\quad \log _{10} M_0 \ge 25.76. \end{aligned}$$
    (20d)

    In Fig. 3, it is clear that the fit to \(M_{\mathrm{s}}\) and \(m_{\rm b}\) in the ranges where they feature a slope of 2/3 is much improved; we also achieve a small reduction in the gap between moments separating regimes for which \(M_{\mathrm{w}}\) coincides with either \(M_{\mathrm{s}}\) or \(m_{\rm b}\).
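
    This improved agreement can be checked numerically; in the sketch below (ours; the segments are transcribed from Eqs. (18) and (19b)), the revised slope-2/3 branch differs from Kanamori’s (18) by only 0.02 logarithmic units in moment, i.e., about 0.01 magnitude units.

    ```python
    # Sketch (ours): Kanamori's (18) versus the revised slope-2/3 branch (19b).

    def mw_kanamori(log10_m0):
        return (2.0 / 3.0) * (log10_m0 - 16.1)     # Eq. (18), M0 in dyn cm

    def ms_revised_19b(log10_m0):
        return (2.0 / 3.0) * (log10_m0 - 16.08)    # Eq. (19b), valid 26.22-28.26

    for lm in (26.5, 27.0, 28.0):
        print(lm, round(mw_kanamori(lm), 2), round(ms_revised_19b(lm), 2))
    # Mw and Ms now agree to ~0.01 over the whole slope-2/3 range, against
    # the offset of about 0.35 units noted above for Geller's original (16b).
    ```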

    Fig. 2

    Reproduction of Fig. 7 of Geller (1976), with our preferred relationship (19) between \(M_{\mathrm{s}}\) and \(\log _{10}~ M_0\) superimposed in red

    Fig. 3

    Same as Fig. 1, with our preferred relationships (19) and (20) for \(m_{\rm b}\) and \(M_{\mathrm{s}}\). Note the better agreement of \(M_{\mathrm{w}}\) with both, in the ranges where either has started to saturate

    In this context, and in retrospect, it is unfortunate that Brune and Engen (1969) would not (or could not) push their measurements to periods longer than 100 s. This should have been possible at least for the largest earthquakes they studied, and would probably have led them to the explanation of the full saturation of magnitude scales, given that even \(M_{100}\) will eventually be affected by finite source dimensions (bringing the second elbow in the stylized “S” in their Fig. 3), and then fully saturate for even larger events. In this respect, note that the dataset plotted on that figure uses as abscissæ “Magnitudes” which cannot be strict \(M_{\mathrm{s}}\) values, the latter saturating around 8.2 (Geller 1976, Fig. 7, p. 1517), and that the stylized “S” fails to take into account the effect of subsequent corner frequencies, for both \(M_{\mathrm{s}}\) and \(M_{100}\), which is expected to lead to the saturation of both. This is shown in Fig. 4, where we plot a theoretical version of their relationship, obtained by adapting Eq. (19) to a reference period 5 times longer [which simply amounts to multiplying all elbow moments by a factor of \(125 = 5^3\), moments at the elbows scaling as the cube of the corner period under scaling laws]. However, the fully saturated value \(M_{100} = 9.77\) would occur at \(M_0 = 1.2 \times 10^{30}\) dyn cm, which would make it essentially unobservable.

    Fig. 4

    Left: Variation of mantle magnitudes versus conventional ones, reproduced from Brune and King (1967) (Rayleigh; top) and Brune and Engen (1969) (Love; bottom). Note that the abscissæ are probably not strict \(M_{\mathrm{s}}\) values, which are supposed to saturate around 8.2. Right: Theoretical behavior of a 100-s magnitude versus 20-s \(M_{\mathrm{s}}\) as predicted using Geller’s (1976) model. The dashed line reproduces the domain of study of Brune’s papers, in agreement with the stylized “S” curves at left. Note the eventual saturation of \(M_{100}\) at a value of 9.77 (\(M_0 = 1.2 \times 10^{30}\) dyn cm)

  19.

    Vassiliou and Kanamori (1982) applied Haskell’s (1964) concept to derive energy estimates by extracting the seismic moment \(M_0\) and the time-integrated square of the source rate function (\(I_{\mathrm{t}}\) in their notation) from hand-digitized analog long-period records of teleseismic body waves. They concluded that \(E/ M_0\) was essentially constant for shallow (and even a few deep and intermediate) sources, but suggested a slope \(Q = 1.8\) when regressing their estimates of \(\log _{10} E\) against published values of \(M_{\mathrm{s}}\). This is probably because the regression samples a range of moments (\(\ge 10^{27.9}\)) where severe saturation affects \(M_{\mathrm{s}}\) and drives it away from its domain of variation as \(\frac{2}{3}~\log _{10}~M_0 \) (their Fig. 9a and Table 1). It is noteworthy that Vassiliou and Kanamori (1982) were the first to publish logarithmic plots of energy-to-moment datasets, later used systematically by Choy and Boatwright (1995) and Newman and Okal (1998). Without access to high-frequency digital data, they elected to directly estimate the ratio \(E/M_0^{ 2}\) [their Eq. (3) p. 373] from what amounts to modeling the shape of long-period body waves, which were at the time the only ones they could hand-digitize from paper records. However robust this procedure may be with respect to parameters in the simple tapered-boxcar source model they use (their Fig. 1), it will perform poorly with complex, jagged sources such as those characteristic of a number of tsunami earthquakes (Tanioka et al. 1997; Polet and Kanamori 2000).

  20.

    Following the deployment of short-period and broadband digital networks in the 1970s, the landmark paper by Boatwright and Choy (1986) finally regrouped all necessary ingredients for the routine computation of radiated energy, including the far-field energy flux approach (Wu 1966), a computation in the Fourier domain as suggested by Haskell (1964), and adequate corrections for geometrical spreading, anelastic attenuation, and focal mechanism orientation. It set the stage for the systematic computation of radiated energies, the catalog of Choy and Boatwright (1995) providing the first extensive demonstration of a generally constant ratio \(E/M_0\). Outliers to this trend, few in number but of critical importance in terms of seismic or tsunami risk and providing insight into ancillary problems in plate tectonics, were later identified by Choy et al. (2006). Newman and Okal (1998) developed a simplified alternative to Boatwright and Choy’s (1986) algorithm, allowing rapid computation of \(\Theta = \log _{10}~ (E/M_0 )\), a robust estimate of the slowness of a seismic source, which was implemented as part of tsunami warning procedures (Weinstein and Okal 2005). Finally, Convers and Newman (2013) have combined the measurement of radiated energy with that of rupture duration, both obtained from P waves, to identify as rapidly as possible anomalously slow events bearing enhanced tsunami risk; a similar approach can be found in the definition of Okal’s (2013) parameter \(\Phi \).
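
    For illustration, the slowness parameter of Newman and Okal (1998) reduces to a one-line computation; in this sketch (ours), the cutoff separating “slow” events is an assumed round number for illustration, not a published constant.

    ```python
    # Sketch (ours) of the slowness parameter Theta = log10(E/M0) of
    # Newman and Okal (1998); E in ergs and M0 in dyn cm (dimensionally
    # equivalent units, so the ratio is dimensionless).
    import math

    def theta(energy, moment):
        return math.log10(energy / moment)

    # Under scaling laws E/M0 ~ 0.5e-4 [Kanamori's (4')], i.e., Theta ~ -4.3;
    # "tsunami earthquakes" fall one or more units below. The -5.5 cutoff
    # here is an illustrative assumption only.
    if theta(1.0e21, 1.0e27) < -5.5:       # hypothetical slow event
        print("anomalously slow source: enhanced tsunami risk")
    ```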

  21.

    Later developments in the study of the energy radiated by seismic sources largely transcend the mainly historical scope of the present paper, and we will only review them succinctly.

    Using a wide variety of sources from the largest megaearthquakes to microearthquakes induced during shaft excavation in granite (Gibowicz et al. 1991), McGarr (1999) and Ide and Beroza (2001) documented that, remarkably, the general constancy of \(E/M_0\) can be extended over 17 orders of magnitude of seismic moment. However, as summarized, e.g., by Walter et al. (2006), later studies failed to bring a firm consensus on either a constant stress drop across the range of significant earthquake sources, as suggested, e.g., by Prieto et al. (2004) and more recently Ye et al. (2016), or a detectable increase of “scaled energy” \(E/ M_0 \) for large earthquakes (e.g., Mayeda et al. 2005). The origin of this disparity of results remains obscure, but may be rooted in the difficulty (or impossibility) of obtaining adequate, universal models of attenuation at high frequencies (Sonley and Abercrombie 2006), with Venkataraman and Kanamori (2004) also mentioning the possible effect of source directivity in biasing the computation of radiated energy for large earthquakes.

4 A Discussion of Gutenberg and Richter’s Derivations

The timeline in Sect. 3 provides a comprehensive examination of the history of estimates of the energy radiated by seismic sources, as well as of the development of the concept of magnitude. We return here to the origin of Eq. (1) and specifically to the derivation of the slope \(Q = 1.5\) implicit in Gutenberg and Richter (1956a, b), and of the earlier value \(Q = 1.8\) (Gutenberg and Richter 1942). The process by which these values were obtained results from a combination of parameters (\(\sigma _i \) and \(\mu _j\), see below) generally expressing power laws controlling the relative growth with earthquake size of various physical quantities; these parameters were usually obtained in an empirical fashion by Gutenberg and Richter, but can now be explored in the context of seismic source scaling laws. Tables 1 and 2 summarize their values and properties, in particular their poor robustness.

As explained in Sect. 2.2, radiated energy is proportional to seismic moment \(M_0\), and this was verified eventually from datasets such as Choy and Boatwright’s (1995) or Newman and Okal’s (1998). On the other hand, the concept of magnitude, which measures the logarithm of ground displacement, should be proportional to \(\log _{10}~M_0\), at least in the absence of saturation effects due to source finiteness. The combination of these two remarks should lead to a theoretical value of \(Q = 1\).
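
For clarity, this expectation can be spelt out in one line, with all constants of proportionality absorbed into the intercepts:

$$\begin{aligned} E \propto M_0 \quad \text {and}\quad M \simeq \log _{10} M_0 + \text {const} \quad \Longrightarrow \quad \log _{10} E = 1 \cdot M + \text {const}, \end{aligned}$$

i.e., a slope \(Q = 1\).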

We start by noting that energy is carried mostly by high-frequency body waves, and as such should be measured on high-frequency seismograms. On a global scale, this restricted early authors to measurements of acceleration by strong-motion instruments, at least initially (in the 1930s and 1940s). It is probable that, partly because of this instrumental restriction and partly because their model tracing the energy flux back to a point source was applicable only directly above the hypocenter, all their energy calculations were performed at the epicenter (with the subscript zero on all relevant variables, such as \(A_0\), \(a_0\), etc.). This remains surprising since short-period instruments providing high performance in the far field had been developed at Caltech (Benioff 1932), and were operating routinely by the late 1930s. Their records were indeed used by Gutenberg (1945b, c) to develop his body-wave magnitude \(m_{\rm B}\). One can only speculate as to why Gutenberg and Richter did not embark on a teleseismic measurement of radiated energy, especially since Gutenberg was obviously cognizant of the concept of geometrical spreading, having written up (with Geiger) the landmark paper by Zöppritz et al. (1912) following the first author’s untimely death in 1908.

As a result, when trying to relate energy and magnitude on a global scale, the authors compute an energy in the near field, from what are primarily measurements of accelerations, but a magnitude from ground motions (or perhaps estimates of velocity) obtained in the regional or far field. This is the source of the complex, occasionally arcane, nature of their derivations; in order to streamline the argument, we have relegated critical details of their calculations to the Appendix, which also presents a discussion of the (generally poor) robustness of the resulting parameters, including the slopes Q.

4.1 The Derivation of \(Q = 1.8\) by Gutenberg and Richter (1942)

The derivation proposed in that paper is based on:

  (i)

    The definition of magnitude M from the amplitude of ground motion \(A_0\) recorded by a torsion instrument, which we can write schematically as

    $$\begin{aligned} M = \sigma _1~\log _{10}~A + c_1 = \sigma _1~\log _{10}~A_0 + {c'}_1 \end{aligned}$$
    (21)

    with a slope \(\sigma _1\) identically equal to 1, as imposed by Richter (1935); note that A in (21), being measured at a regional distance (typically up to a few hundred kilometers), is not necessarily an epicentral value (\(A_0\)), but is taken as such by the authors, the difference being absorbed into the constant \({c'}_1\).

  (ii)

    The correlation (11) between magnitude and epicentral acceleration \(a_0\) [their Eq. (20) p. 176] reproduced here as

    $$\begin{aligned} M = \sigma _2~\log _{10}~a_0 + c_2, \end{aligned}$$
    (22)

    the slope \(\sigma _2 = 1.8\) being obtained empirically from the comparison of magnitude values measured on torsion instruments in the regional field and accelerations derived from strong-motion seismograms in the vicinity of the epicenter;

  (iii)

    An empirical relation between the duration of “strong ground shaking” \(t_0\) and magnitude [their Eq. (28) p. 178]:

    $$\begin{aligned} \log _{10}~ t_0 = \sigma _3~M + c_3, \end{aligned}$$
    (23)

    where \(\sigma _3 = 0.25\);

  (iv)

    The calculation (10) of seismic energy E from the energy flux radiated vertically at the epicenter above a point source [their Eq. (24) p. 178], which we rewrite as

    $$\begin{aligned} E = C_4 \cdot t_0 \cdot V_0^{ 2}, \end{aligned}$$
    (24)

    where \(V_0\) is ground velocity, and all \(c_i\) and \(C_i\) are constants independent of event size.

We have rewritten (10) as (24) to emphasize that (21), (22), and (24) involve different physical quantities, namely displacement [used in Richter’s (1935) original definition], acceleration (available as strong-motion data in the epicentral area), and ground velocity (defining the kinetic energy flux). The authors’ ensuing combinations of these equations through the use of the “period \(T_0\)” of the signal, presumably the dominant one, tacitly imply a harmonic character for the source, an additional complexity being that the concept of signal duration, \(t_0\), is sensu stricto incompatible with this model, since a monochromatic signal is by definition of infinite duration.

Under that ad hoc assumption, they then derive their Eqs. (24) or (27) for E, which we rewrite as

$$\begin{aligned} \log _{10}~E = c_4 + \log _{10}~t_0 + \sigma _4\,\log _{10}~T_0 + \sigma _5~\log _{10}~a_0 \end{aligned}$$
(25)

with \(\sigma _5 = 2\) identically [from (24), i.e., the power of 2 in the kinetic energy] and \(\sigma _4 = 2\) identically (two powers of \(T_0\) going from \(V_0^{ 2}\) to \(a_0^{ 2}\)). Similarly, they transform (21) into

$$\begin{aligned} M = \sigma _1~\log _{10}~a_0 + \sigma _4~\log _{10}~T_0 + c_5, \end{aligned}$$
(26)

equivalent to their Eq. (31). Substituting (22), (23), and (26) into (25) [note that \(\sigma _4\), in principle equal to 2, and with it any direct reference to \(T_0\), is eliminated], they obtain

$$\begin{aligned} \log _{10}~ E = QM + C_0 \end{aligned}$$
(27)

with

$$\begin{aligned} Q = \sigma _1 + \sigma _3 + \frac{\sigma _5 - \sigma _1 }{ \sigma _2 } = 1 + \sigma _3 + \frac{1}{\sigma _2} \approx 1.8. \end{aligned}$$
(28)

The slope \(Q = 1.8\) is thus explained as a combination of the various slopes \(\sigma _i\). The latter are of a very different nature: \(\sigma _1 = 1\) and \(\sigma _5 = 2\) were fixed in Gutenberg and Richter’s (1942) model, while \(\sigma _2 \) and \(\sigma _3\) (1.8 and 0.25, respectively) were obtained empirically by the authors, and as such could vary, impacting significantly the value of Q derived from (28). We emphasize this point in the third member of Eq. (28), which leaves only \(\sigma _2\) and \(\sigma _3\) as variables.
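
The sensitivity of Q to the two empirical slopes is easily explored; the following sketch (ours) simply evaluates the third member of (28).

```python
# Sketch (ours): Eq. (28) with sigma1 = 1 and sigma5 = 2 held fixed.

def q_1942(sigma2, sigma3):
    return 1.0 + sigma3 + 1.0 / sigma2

print(round(q_1942(1.8, 0.25), 2))   # 1.81 -> the published Q = 1.8
# The later sigma3 = 0.32 (Gutenberg and Richter 1956a) alone would
# push Q to ~1.88; sigma2 = 2.0 would lower it to 1.75.
print(round(q_1942(1.8, 0.32), 2), round(q_1942(2.0, 0.25), 2))
```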

As summarized in Table 1, the critical examination of this derivation detailed in the Appendix shows that the robustness of \(Q = 1.8\) with respect to various assumptions underlying the computations is poor.

Table 1 Summary of parameters \(\pmb {\sigma _i}\) used by Gutenberg and Richter (1942)
Table 2 Summary of parameters \(\pmb {\mu _j}\) used by Gutenberg and Richter (1956b)

Note finally that Gutenberg and Richter (1942) combined their Eqs. (20) and (31) (our Eqs. 22 and 26) to obtain the dependence of the dominant period \(T_0\) on magnitude, which had dropped out of their derivation of Q:

$$\begin{aligned} \log _{10}~T_0 = \sigma _6~M + c_6\quad \quad \text {with}\quad \quad \sigma _6 = \frac{1}{\sigma _4} - \frac{\sigma _1}{ \sigma _4 \cdot \sigma _2 } = 0.22, \end{aligned}$$
(29)

further leading to

$$\begin{aligned} \log _{10}~ T_0 = \sigma _7~\log _{10}~ E + c_7;\quad \quad ~~\sigma _7 = \frac{\sigma _6}{Q} = 0.12, \end{aligned}$$
(30)

some limitations of those two relations being further discussed in the Appendix.
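
The numerical values quoted in (29) and (30) follow directly from the 1942 parameters; a two-line check (ours):

```python
# Sketch (ours): Eqs. (29)-(30) with sigma1 = 1, sigma2 = 1.8, sigma4 = 2.
sigma6 = 1.0 / 2.0 - 1.0 / (2.0 * 1.8)   # Eq. (29): 0.222, quoted as 0.22
sigma7 = sigma6 / 1.8                    # Eq. (30): sigma6 / Q ~ 0.12
print(round(sigma6, 2), round(sigma7, 2))
```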

4.2 The Final Regressions: the Road to \(Q = 1.5\) (Gutenberg and Richter 1956b)

In this section, we paraphrase that paper to underscore the fundamental steps which led the authors to \(Q = 1.5\). In the various equations relating E, m, A, etc., we will use formal slopes comparable to the \(\sigma _i\) used above, but with the notation \(\mu _j\) to avoid confusion; \(\gamma _j\) will be constants substituting for the \(c_i\) above, largely irrelevant in the present discussion. Note that indexing of the parameters \(\sigma _i \) and \(\mu _j\) is in both cases sequential in the derivation, so that \(\sigma \) and \(\mu \) of identical index (\(i = j\)) may not describe comparable physical relations. The results are summarized in Table 2, in a format similar to that of Table 1; again, additional details can be found in the Appendix.

The situation differs from the previous derivation [Sect. 4.1 above, after Gutenberg and Richter (1942)] in several respects. First, and most importantly, the authors introduce the variable \(q_{0} = \log _{10}~ (A_0 / T_0 )\); following Gutenberg (1945c), the body-wave magnitudes, in particular \(m_{\rm B}\) and hence m, are now defined from \(q_{0}\), rather than from the logarithm of \(A_0\):

$$\begin{aligned} m = \mu _1 \cdot q_{0} + \gamma _1 = \mu _1( \log _{10}~A_0 - \log _{10}~T_0) + \gamma _1 \end{aligned}$$
(31)

with \(\mu _1 = 1\) identically (Gutenberg and Richter 1956b; Eq. 11). Next, the authors consider as their reference magnitude the unified magnitude m rather than Richter’s (1935) original scale (restricted to local records of California earthquakes). Finally, several relationships now feature nonlinear (albeit weak) terms.

The fundamental difference between (31) and (21) stems from an implicit variation of the dominant period \(T_0\) with earthquake size, which would be expected from scaling laws. However, the combined influence of anelastic attenuation and instrument response can act as a bandpass filter, reducing variations in the dominant period, which may then significantly violate scaling laws, an effect enhanced at the short periods considered by the authors. For this reason, it may not be possible to define an expected value of the parameters \(\mu \) under modern theories, and we leave blank several entries in the relevant column in Table 2.

Gutenberg and Richter’s (1956b) next step is their Eq. (8) p. 10, equivalent to (24), and leading to

$$\begin{aligned} \log _{10} E = \gamma _2+\mu _2 \log _{10} t_0 + \mu _3 ( \log _{10} A_0 - \log _{10} T_0 ) = \gamma _2 + \mu _2 \log _{10} t_0 + \mu _3 q_{0} \end{aligned}$$
(32)

with \(\mu _2 = 1\) and \(\mu _3 = 2\) identically. The introduction of \(q_{0}\) rather than \(A_0\) (with a nonconstant period \(T_0\)) then results in their new Eq. (9):

$$\begin{aligned} q_{0}=\gamma _{3} + \mu ^{\prime }_4 M_{\mathrm{L}} - \mu ^{\prime \prime }_4 M_{\mathrm{L}}^2 \end{aligned}$$
(33)

with \(\mu ^{\prime }_4 = 0.8 \) and \(\mu ^{\prime \prime }_4 = 0.01\) (empirical values). Note that (33) expresses the relationship between m and \(M_{\mathrm{L}}\) (Gutenberg and Richter 1956b; Eq. 14), and under the assumption that its curvature can be neglected, \(\mu _4 \) is also the slope of m versus \(M_{\mathrm{L}}\):

$$\begin{aligned} m = \mu _4 M_{\mathrm{L}} + \gamma _{4}. \end{aligned}$$
(34)

Note that, by comparison with (29), and assuming again that \(\mu ^{\prime \prime }_4\) can be neglected, Eq. (33) leads to

$$\begin{aligned} \sigma _6 = \frac{1 -\mu _4}{\sigma _1} = 0.2. \end{aligned}$$
(35)

Next, the authors study empirically the relation between the duration \(t_0\) in the epicentral area and the newly defined parameter \(q_{0}\):

$$\begin{aligned} \log _{10} t_0 = \mu _5 q_{0} + \gamma _{5} .\end{aligned}$$
(36)

They obtain a slope \(\mu _5=0.4\) [their Eq. (10)], which we have confirmed (\(0.40 \pm 0.06)\) by regressing the 19 points on their Fig. 6a [which are probably a subset of the previously mentioned dataset in Fig. 1 of Gutenberg and Richter (1956a)].

Note that the combination of (21), (23), (35), and (36) leads to

$$\begin{aligned} \mu _5 = \frac{\sigma _3 \sigma _1}{ 1 - \sigma _6 } = \frac{\sigma _3}{\mu _4}, \end{aligned}$$
(37)

which takes the observed value \(\mu _5 = 0.4\) for \(\mu _4 = 0.8\) and \(\sigma _3 = 0.32\), as favored by Gutenberg and Richter (1956a), but \(\mu _5 = 0.31\) for the earlier value \(\sigma _3 = 0.25\) (Gutenberg and Richter 1942). Although not specifically spelt out in Gutenberg and Richter (1956b), the combination of (31), (32), and (37) then leads, for \(\mu _5 = 0.4\), to

$$\begin{aligned} \log _{10} E = \mu _6 m + \gamma _{6}\, \text {with}\, \mu _6 = \frac{\mu _3 + \mu _2 \mu _5}{\mu _1 } = \mu _3 + \mu _5 = 2.4, \end{aligned}$$
(38)

and then to (2), which, as discussed in the Appendix, remains dependent on \(\sigma _3\) through (37), and hence on the dataset in Fig. 1 of Gutenberg and Richter (1956a).

Using this slope of 2.4 in (2), and in order to finalize \(Q = 1.5\) in (1), it remains to justify (3). That relation traces its origin to Fig. 9 of Gutenberg and Richter (1956a, p. 138), which examines the slope \(\mu _7\) regressing m versus \(M_{\mathrm{s}}\):

$$\begin{aligned} m = \mu _7 M_{\mathrm{s}} + \gamma _{7} \end{aligned}$$
(39)

[note, however, that instead of m in (39), Gutenberg and Richter (1956b) apparently use the uncorrected body-wave magnitude (but with the notation \(M_{\mathrm{B}}\), as opposed to \(m_{\rm B}\))]. They describe an average slope of 0.4 between (\(M_{\mathrm{s}} - M_{\mathrm{B}}\)) and \(M_{\mathrm{s}}\), equivalent to \(\mu _7 = 0.6\), later revised to \(\mu _7 = 0.63\) by Gutenberg and Richter (1956b). We have verified that a modern regression of the 105 points in their Fig. 9 does yield \(1 - \mu _7 = 0.37 \pm 0.03\). Then, the slope Q is just the product

$$\begin{aligned} Q = \mu _6 \cdot \mu _7, \end{aligned}$$
(40)

which takes the value 1.51, justifying (1).
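
The full 1956 chain can thus be condensed into a few lines; the sketch below (ours) strings together (38) and (40) and illustrates the sensitivity to \(\sigma _3\) implied by (37).

```python
# Sketch (ours): the 1956 route to Q = 1.5, via Eqs. (37), (38) and (40).
mu3 = 2.0                    # exponent in (32), fixed
mu4 = 0.8                    # slope of m vs ML, Eq. (34)
mu7 = 0.63                   # slope of m vs Ms, Eq. (39)

for sigma3 in (0.32, 0.25):  # 1956a versus 1942 duration slopes
    mu5 = sigma3 / mu4       # Eq. (37)
    mu6 = mu3 + mu5          # Eq. (38)
    print(sigma3, round(mu6, 2), round(mu6 * mu7, 2))   # Eq. (40)
# sigma3 = 0.32 gives mu6 = 2.4 and Q = 1.51, the canonical 1.5;
# sigma3 = 0.25 would already lower Q to ~1.46.
```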

4.3 Discussion

The detailed examination of the derivations of the slopes \(Q = 1.8\) in Gutenberg and Richter (1942) and later \(Q = 1.5\) inferred from Gutenberg and Richter (1956b), and the critical analysis of their underlying assumptions carried out in the Appendix, should severely limit the confidence of the modern reader in the resulting slopes Q and in particular in the value 1.5, proposed in their last contributions and now universally enshrined into seismological dogma. We recap here some of the most serious limitations documented in the present study:

  • First, it is not clear that they correctly measured the energy flux at the epicenter.

As we have stated several times, the algorithm used by Gutenberg and Richter (1942) to evaluate energy flux near the epicenter consists of squaring the product of the maximum ground acceleration and the dominant period, and multiplying the result by the duration of sustained maximum ground motion. This is only a gross approximation to the integral defining energy flux, as defined later, e.g., by Haskell (1964), and suffers from an inherent flaw, the concept of a single (or dominant) frequency being incompatible with that of a finite signal duration.
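
To make the nature of this approximation concrete, here is a small sketch (ours, on a purely synthetic signal) contrasting the time integral of squared ground velocity with the peak-times-duration recipe; under the harmonic assumption \(V_0 = a_0 T_0 / 2\pi \), the two typically agree in order of magnitude only.

```python
# Sketch (ours): Gutenberg and Richter's flux recipe (24) versus the
# time integral of squared ground velocity, for a synthetic signal.
import numpy as np

dt = 0.01
t = np.arange(0.0, 30.0, dt)
T0 = 1.5                                  # dominant period of the toy signal
v = np.exp(-t / 5.0) * np.sin(2 * np.pi * t / T0)   # toy ground velocity

integral = np.sum(v**2) * dt              # Haskell-style integral of v^2
a0 = np.max(np.abs(np.gradient(v, dt)))   # peak ground acceleration
v0 = a0 * T0 / (2 * np.pi)                # harmonic assumption: V0 = a0 T0 / 2pi
t0 = 5.0                                  # nominal duration of strong shaking
recipe = t0 * v0**2                       # Eq. (24), up to the constant C4

print(integral, recipe)                   # same order of magnitude only
```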

  • Next, the authors were using a simplified analytical model.

The latter essentially predated any knowledge of anelastic attenuation, for which models would start being available (albeit at much lower frequencies) in the late 1950s and early 1960s (Ewing and Press 1954; Satô 1958; Anderson and Archambeau 1964); of an adequate physical representation of the earthquake source as a double-couple (Knopoff and Gilbert 1959); or even of the concept of a spatially extended source, an idea pioneered by Lamb (1916) and heralded analytically by Ben-Menahem (1961). Feeling constrained to work in the near field, Gutenberg and Richter were using the model of a point source, whereas most earthquakes in their datasets would have had fault lengths comparable to, or greater than, their hypocentral depth. It would take six years following Gutenberg’s death for Wu (1966) to regroup these later developments into a blueprint for the definitive computation of radiated energy from teleseismic datasets, which would be brought to fruition only when digital data became available in the 1970s.

  • In addition, the robustness of their results is cast in serious doubt.

The final slopes Q are obtained from the parameters \(\sigma _ i\) and \(\mu _j\), which relate observables of a high-frequency nature, strongly affected if not fully controlled by small-scale heterogeneity within the seismic source, as now universally documented by source tomography studies of recent megathrust events (Ishii et al. 2007; Lay et al. 2011). Such source structure acts to defy the scaling laws which are inherent in the derivations of Q proposed by Gutenberg and Richter (1942, 1956b). Not surprisingly then, Eqs. (22) and (23) [or (31) and (36)] formalizing these scaling laws have not been upheld by modern strong-motion studies in the near field.

One can only speculate as to how Gutenberg and Richter might have pursued this line of work, but for the former’s sudden death in 1960. Despite the promise of a full revision of magnitude scales (Gutenberg and Richter 1956a; abstract p. 165), no such work came forth during the next three years, a time when Gutenberg apparently shifted his main activity to studies in structural seismology, especially regarding the Earth’s core (Gutenberg 1958). However, it is highly probable that he would have been, at the time of his death, cognizant of the new representation of the seismic source as a double-couple quantified by a seismic moment \(M_0 \), introduced in the West a year earlier by Knopoff and Gilbert (1959). One can only assume that, eventually, Gutenberg would have sought to relate magnitude to \(M_0\), even though the first measurement of seismic moment from long-period waves had to wait until Aki’s (1966) study of the 1964 Niigata earthquake, and a sufficient dataset of values until the early 1970s, allowing the landmark studies by Kanamori and Anderson (1975) and Geller (1976).

Figure 5 summarizes the evolution of the slope factor Q in Gutenberg and Richter’s papers from 1935 until Gutenberg’s death. A further step in speculation would address how this inescapable, regular decrease would have continued, had the two authors been able to keep a proactive collaboration. A linear regression of the three points in the 1950s, when the authors’ work was most supported analytically, predicts that they would have reached the more justifiable value \(Q = 1\) in 1968. A full regression of the six values published since 1935 would suggest 1981. However, we note that, by then, Gutenberg would have been 92 years old.

Fig. 5

Evolution with time of the value of the slope Q in the various relations proposed by Gutenberg and Richter between seismic energy and magnitude. The dashed line is the value (\(Q~=~1\)) derived theoretically for a point source double-couple. The vertical downward-pointing arrow indicates Gutenberg’s death on 25 January 1960. The green line shows a linear regression through the six published values (1935–1956) and the pink one through the last three (1949–1956). The dashed blue line expresses the value \(Q~=~1.5\), frozen into perpetuity as the last one published before B. Gutenberg’s death

5 Conclusion and Perspective

A detailed, occasionally forensic examination of the algorithms used by Gutenberg and Richter (1942, 1956b) suggests that they may have been legitimate under what was the state of the art of seismology in the 1940s and 1950s, but are nowadays difficult to reconcile with modern advances in source theory and Earth structure, including anelasticity, as well as with the advent of high-frequency digital data. In this context, the physical models underlying the derivation of the successive values of the slope Q, including the empirical relations used by Gutenberg and Richter, can now appear simplistic if not outright inaccurate, e.g., the model of a point source, and the implicit assumption of a monochromatic signal. As a result, the derived values of Q clearly lack robustness, and in particular the last one (1.5) hardly deserves the apparently sacred character which led to its being enshrined into the definition of moment magnitude \(M_{\mathrm{w}}\) by Kanamori (1977).

We recall that, in their last paper on the subject, Gutenberg and Richter (1956a) commented that there may not necessarily be a single relationship between seismic energy and any magnitude scale. In retrospect, this statement appears visionary, as we now understand that the concept of radiated energy is fundamentally anchored in the high-frequency part of the source spectrum, whereas individual magnitude scales target specific periods which can belong to significantly different parts of the spectrum. Note that Kanamori (1983) echoed this statement, asserting that “It is impossible to represent all [...] parameters by a single number, the magnitude.”

Since the advent of digital seismology in the late 1970s, it has become clear that many seismic sources do indeed follow laws of similitude allowing the description of many, if not all, of an event’s source properties on the basis of a single number, namely its seismic moment \(M_0\). Kanamori (1977) proposed to cast this number into the “magnitude” \(M_{\mathrm{w}}\), thus introducing a parameter combining the rigor of the underlying quantification of a physical quantity with the practicality of empirically derived magnitude scales. Such source properties generally include radiated energy E, as documented by extensive catalogs such as Choy and Boatwright’s (1995).

At the same time, a number of violations of these scaling laws have been regularly documented, both in the form of “slow” events, whose red-shifted spectrum leads to a deficiency in radiated energy, and of “snappy” ones featuring a blue-shifted spectrum. Such rogue events bear crucial societal relevance, since the former include the so-called tsunami earthquakes (Kanamori 1972), whose real-time identification remains a challenge in operational tsunami warning, while the latter have proven ominously destructive due to exceptional levels of ground acceleration, e.g., reaching 2.2g during the relatively small (\(M_{\mathrm{w}} = 6.2)\) 2011 Christchurch earthquake (Holden 2011).

However, little progress has been made in the understanding of parameters possibly controlling their occurrence. For example, it is not clear whether all megaquakes (defined as having moments greater than \(10^{29}\) dyn cm) feature source slowness (Okal 2013, Fig. 10), or whether the distribution (and hence the predictability) of tsunami earthquakes follows a regional trend controlled by simple tectonic patterns, an idea hinted at by Okal and Newman (2001), and further supported by the 2006 Java and 2012 El Salvador events. In this context, the availability of catalogs documenting a diversity of source parameters is crucial for further research into such topics.

Yet, such catalogs are not (or no longer) made readily available. For example, an updated extension of Choy and Boatwright’s (1995) extremely valuable dataset is not widely distributed, and to our best knowledge, their dataset is no longer being routinely updated. Similarly, for many years, the National Earthquake Information Center of the United States Geological Survey had distributed an electronic catalog of epicenters, listing for each event a set of conventional magnitudes (\(m_{\rm b} , M_{\mathrm{s}}, M_{\mathrm{L}}\)), occasionally differing among themselves, in addition to moment estimates transcribed as \(M_{\mathrm{w}}\). This allowed the easy identification of anomalous events, e.g., through a simple search for an \(m_{\rm b} : M_{\mathrm{s}}\) disparity (a predecessor to the \(E / M_0\) ratio in the pre-digital age). As of 2015, this practice was discontinued and replaced by the issuance of a single magnitude, whose nature (body, surface, local, moment, etc.) is not clearly specified, with the probable goal of making the catalog less confusing to the lay user, admittedly a legitimate concern. As a result, the individual researcher has lost a powerful tool to further our understanding of earthquake source properties. Within a few decades, the careless or simply uninformed investigator might be tempted to conclude that earthquakes violating scaling laws ceased to occur after 2014.

Incidentally, seismology is not alone in featuring a diversity of source properties acting to restrict the range of validity of scaling laws. For example, the size of hurricanes has been described through a Saffir–Simpson “category” index, a single number quantifying several of their properties, such as geometrical extent, maximum wind velocity, and underpressure at the center of the eye; as such, it constitutes a measure conceptually similar to an earthquake magnitude. However, major hurricanes have shown that such an approach constitutes a drastic simplification, with Sandy in 2012 featuring a diameter of 1800 km for a minimum pressure of 940 mbar and winds of 185 km/h (Blake et al. 2013), while the 2015 Mexican hurricane Patricia registered a record low pressure of 872 mbar and winds of 345 km/h concentrated in a system not more than 800 km across (Kimberlain et al. 2016). Such limitations for hurricane scaling laws are not intrinsically different from the diversity expressed for seismic sources through variations in energy-to-moment ratios, and similarly prevent an accurate societal prediction of the level of natural hazard based on a single quantifier (magnitude or category) of the event. Another example would be the Volcanic Explosivity Index (Newhall and Self 1982), widely used to rank large volcanic eruptions, but ignoring specific properties significantly affecting their dynamics as well as their climatic and societal effects (Miles et al. 2004).

In this context, it behooves the seismological community to observe, respect, and fully document the occasional diversity in source properties of earthquakes, notably by restoring the now abandoned practice of systematic reporting of radiated seismic energy. This is a necessary step towards the furtherance of our understanding of seismic source properties in the context of their societal impact, in keeping with the pioneering work of Beno Gutenberg and Charles Richter.