1 Introduction

Over the last twenty-five years, arguments from naturalness have played an increasingly important role in particle physics. Gerard ’t Hooft was the first to introduce naturalness in this context. He connected it with symmetry:

The naturalness criterion states that one such [dimensionless and measured in units of the cut-off] parameter is allowed to be much smaller than unity only if setting it to zero increases the symmetry of the theory. If this does not happen, the theory is unnatural. [52]

Since Plato, and through the 17th-century French debate between Claude Perrault and François Blondel, two opposing views have taken symmetry to be, respectively, an expression of the aesthetic imperative of beauty and a human-invented instrument for better executing the work of an engineer. Naturalness, in turn, has both a connection with beauty and a road-mapping role in science. On the basis of ’t Hooft’s definition, it could have received a double conceptual foundation similar to that of symmetry. But history has chosen a more intriguing path.

’t Hooft’s original idea has gradually faded away, giving way to a many-faceted use of naturalness in particle physics. In what physicists today say about the meaning of naturalness one encounters infrequent heuristic arguments as well as abundant references to beauty: naturalness is an “aesthetic criterion” [5], a “question of aesthetics” [29], an “aesthetic choice” [7]. Sometimes the aesthetic significance of naturalness and the heuristic role are mixed: “the sense of ‘aesthetic beauty’ is a powerful guiding principle for physicists” [34]. One should not belittle the place of beauty in the scientist’s thinking. Mathematical reasoning helps to develop an intuitive aesthetic sense that can subsequently serve as a thinking aid. In mathematics proper, after beauty and elegance have pointed the way to new discoveries, valid results must be rigorously established through formal proof. In natural science, on the contrary, “rational beauty” [46] can only be admired once we have given a sound scientific account in agreement with experiment. Einstein vividly supported this view early in his life, saying that aesthetically motivated arguments “may be valuable when an already found [his emphasis] truth needs to be formulated in a final form, but fail almost always as heuristic aids” [31]. Used as a guide for discovering reality, aesthetic arguments may indeed turn out to be extraordinarily fruitful as well as completely misleading, for two reasons.

First, because the real universe is not just beautiful: one can also discern in it futility [53] or inefficiency [30]. Nature has proved wrong the American physicist Karl Darrow, who stated that it would be more “elegant” if there were only two particles in the atomic nucleus [25]. Dirac, an outspoken promoter of mathematical beauty in physics, was many times led by this argument into scientifically sterile byways [39, chapter 14]. Thus beauty is not an exclusive characteristic of sound science; it should not be elevated to the status of a research imperative.

Second, because there is no necessary link between beauty and empirically verified truth. Without entering a debate on this subject (e.g., see [21]), we maintain that beauty and truth, as well as beauty and good, are distinct categories, in particular in their application to physics. The beautiful may be false and the true may be ugly. To summarize, aesthetic arguments are a methodologically problematic and potentially misleading beacon on the way to sound science in the physical universe.

In Sect. 2 we review the physics of the Higgs mechanism and remind the reader that the argument from naturalness often gives the impression of being a perfectly normal scientific argument. Section 3 describes the development of the concept of naturalness in particle physics. Among all fine-tuning arguments, the valid one is neither anthropic (Sect. 4.1) nor an argument from beauty. We argue in Sect. 4.2 that it involves a special interpretation of probability and is meaningful only if naturalness in particle physics is understood as a heuristic. The practical sense of fine tuning then stems not so much from aesthetics as from down-to-earth sociological factors.

2 The Higgs Mechanism

The observed weak interaction is not locally gauge invariant, and its unification with electromagnetism cannot ignore this. A mechanism must be introduced within any unified theory of electroweak (EW) interactions that puts the electromagnetic force and the weak force back on unequal footing. By offering one such mechanism, the Standard Model (SM) describes electroweak symmetry breaking quantitatively. Invented in 1964 independently by several different groups, this so-called Higgs mechanism builds on the fact that a massless spin-one particle has two polarization states while a massive one has three. Electroweak symmetry breaking produces a would-be Goldstone boson, whose physical degree of freedom is absorbed by the massless gauge boson. The number of polarization states of the latter then increases from two to three as it becomes massive. Such massive gauge bosons account for the absence of gauge symmetry in the observed weak interaction.

Such an account was quickly recognized to be not very compelling due to its lack of explanatory power [34, 48]. Many physicists considered the problems of the Higgs mechanism unimportant, because they took it for no more than a provisional, convenient solution of the problem of electroweak symmetry breaking. Jean Iliopoulos said at the 1979 Einstein Symposium: “Several people believe, and I share this view, that the Higgs scheme is a convenient parametrization of our ignorance concerning the dynamics of spontaneous symmetry breaking, and elementary scalar particles do not exist” [38]. Then, over just a few years, the situation changed. The discovery of the W and Z bosons and the growing amount of electroweak precision data confirmed the ideas of Weinberg and Salam. Today, not only is there confidence in the Standard Model, but it is clear that changing it ought to be exceptionally difficult, due to the exceedingly large number of tests with which any model of physics beyond the Standard Model (BSM) must comply. By 2004, Ken Wilson was completely assured: “A claim that scalar elementary particles were unlikely to occur in elementary particle physics at currently measurable energies …makes no sense” [55].

The SM Higgs mechanism is a pleasingly economical solution for breaking the electroweak symmetry. However, the global fit of the electroweak precision data is consistent with the Standard Model only if one takes an average value over all available experimental results: then the usual prediction of a relatively light Higgs, \(m_{H}<182\;\mathrm{GeV}\), arises [37]. The details of the data look highly problematic: the ways of calculating the Higgs mass \(m_H\) based on distinct experimental measurements lead to incompatible predictions. Figure 1 is a vivid illustration that there is practically no overlap between the values preferred by different EW precision tests.

Fig. 1 Values of the Higgs mass extracted from different EW observables. The vertical line is the direct LEP lower limit of 114 GeV. The average is shown as a green band [37]

To give a technical example, the value of the top quark mass extracted from EW data is \(m_{t}=178.9^{+11.7}_{-8.6}~\mbox{GeV}\), while the Tevatron result is \(m_{t}=172.6\pm0.8(\mathrm{stat})\pm1.1(\mathrm{syst})~\mbox{GeV}\) [33]. The SM fit can be worsened by such seemingly minor discrepancies in the measurement of \(m_t\). Of more direct impact on the light Higgs hypothesis is the observation that the two most precise measurements of the Weinberg angle \(\sin^2\theta_W\) do not agree very well, differing by more than 3σ. The \(b\bar{b}\) forward-backward asymmetry \(A_{fb}^{0,b}\) measured at LEP gives a large value of \(\sin^2\theta_W\), which leads to the prediction of a relatively heavy Higgs with \(m_{H}=420^{+420}_{-190}~\mbox{GeV}\). On the other hand, the lepton left-right asymmetry \(A_l\) measured at SLD gives a low value of \(\sin^2\theta_W\), corresponding to \(m_{H}=31^{+33}_{-19}~\mbox{GeV}\), in conflict with the lower limit \(m_H>114~\mbox{GeV}\) from the direct LEP searches [9]. Moreover, the world average of the W mass, \(m_W=80.392\pm0.029~\mbox{GeV}\), is larger than the value extracted from a SM fit, again requiring \(m_H\) to be smaller than what is allowed by the LEP Higgs searches [35].
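To give a rough sense of what this lack of overlap amounts to numerically, the following sketch (a crude back-of-the-envelope estimate, not the statistical procedure of [9] or [37]) computes a naive pull between the two Higgs-mass preferences quoted above, symmetrizing their asymmetric errors in \(\log m_H\).

```python
# A crude, hedged illustration (not the fit performed in [9] or [37]): quantify the
# tension between the two Higgs-mass preferences quoted in the text by a naive
# "pull" computed in log(m_H), with the asymmetric errors symmetrized.
import math

def log_half_width(central, err_up, err_down):
    """Half-width, in log(m_H), of the interval [central - err_down, central + err_up]."""
    return 0.5 * (math.log(central + err_up) - math.log(central - err_down))

def naive_pull(m1, up1, down1, m2, up2, down2):
    """Separation of the two central values in log(m_H), in units of the combined widths."""
    s1 = log_half_width(m1, up1, down1)
    s2 = log_half_width(m2, up2, down2)
    return abs(math.log(m1) - math.log(m2)) / math.hypot(s1, s2)

# m_H = 420 +420/-190 GeV (from A_fb^{0,b}) versus m_H = 31 +33/-19 GeV (from A_l):
# even this crude treatment signals a tension at the level of a few sigma.
print(round(naive_pull(420, 420, 190, 31, 33, 19), 2))
```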

For a physicist, the inconsistency between the Higgs mass predictions typically entails that the argument in favour of the SM with a light Higgs is ‘less compelling’. What message exactly is encoded in the vanishing overlap between different measurements? Does it correspond to some very low probability? In what sense does its smallness make the SM Higgs less compelling?

3 Measures of Naturalness

3.1 Hierarchy Problems

The Standard Model suffers from a ‘big’ hierarchy problem: in the Lagrangian, the Higgs mass parameter \(m_{H}^{2}\), which is related to the physical mass by \(m_{h}^{2} = -2 m_{H}^{2}\), is affected by incalculable cut-off dependent quantum corrections. If a new theory, possibly including gravitation, replaces the Standard Model above some energy scale \(\Lambda_{\mathrm{NP}}\), one can expect the Higgs mass parameter to be of the same size as, or bigger than, the SM contribution computed with a cut-off scale \(\Lambda_{\mathrm{NP}}\). This way of estimating the size of the Higgs mass is made reasonable by the analogy with the electromagnetic contribution to \(m_{\pi^{+}}^{2}-m_{\pi^{0}}^{2}\). The leading quantum correction is then expected to come from the top quark sector and is estimated to be [48]:

$$\delta m_H^2\sim-\frac{3\lambda_t^2}{8\pi^2}\Lambda_{\mathrm{NP}}^2, $$
(1)

where \(\lambda_t\) is the coupling between the Higgs boson and the top quark. This contribution is compatible with the allowed range of \(m_{h}^{2}\) only if the cut-off is rather low:

$$ \Lambda_{\mathrm{NP}}< 600 \times\biggl(\frac {m_h}{200~\mathrm{GeV}}\biggr)~\mathrm{GeV}.$$
(2)

Now, if the scale up to which the SM is valid is as low as 500 GeV–1 TeV, why did previous experiments not detect any deviation from the SM predictions? Even though the center-of-mass energy of these experiments was significantly lower than 1 TeV, their precision was high enough to make them sensitive to the virtual effects associated with a much higher scale.

To state it in other terms, note that effects from new physics at a scale \(\Lambda_{\mathrm{NP}}\) can in general be parametrized by adding to the SM renormalizable Lagrangian a tower of higher-dimensional local operators, with coefficients suppressed by suitable powers of \(\Lambda_{\mathrm{NP}}\). The lower bound on \(\Lambda_{\mathrm{NP}}\) for each individual operator, neglecting the effects of all the others and after normalization, ranges between 2 and 10 TeV [48]. Taking into account several operators at the same time does not qualitatively change the result unless parameters are tuned. This can be interpreted as an indication that if new physics beyond the SM affects electroweak observables at the tree level, then the generic lower bound on its threshold \(\Lambda_{\mathrm{NP}}\) is a few TeV. The tension between this lower bound and the bound in (2) defines what is known as the ‘little’ hierarchy problem.

The little hierarchy problem is apparently mild, but its behaviour with respect to fine tuning is problematic. If a fine tuning of order ϵ is tolerated, then the bound in (2) is relaxed by a factor \(1/\sqrt{\epsilon}\). The required amount of tuning thus grows quadratically with \(\Lambda_{\mathrm{NP}}\), so that for \(\Lambda_{\mathrm{NP}}=6~\mathrm{TeV}\) one needs to tune to 1 part in a hundred in order to have \(m_{H}=200~\mathrm{GeV}\). The goal of this section is to make a precise statement about the meaning of this fine-tuning problem.
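The arithmetic behind these numbers can be spelled out in a few lines; the sketch below simply applies the stated rule that tolerating a tuning of order ϵ relaxes the bound (2) by a factor \(1/\sqrt{\epsilon}\).

```python
# A minimal sketch of the fine-tuning arithmetic stated above: if a tuning of order
# epsilon relaxes the bound (2) by 1/sqrt(epsilon), then the tuning required to push
# the cut-off up to Lambda_NP is epsilon = (Lambda_max / Lambda_NP)^2.

def lambda_max_gev(m_higgs_gev):
    """Untuned upper bound on the cut-off from Eq. (2), in GeV."""
    return 600.0 * (m_higgs_gev / 200.0)

def required_tuning(lambda_np_gev, m_higgs_gev=200.0):
    """Tuning epsilon needed to accommodate the cut-off lambda_np_gev."""
    return (lambda_max_gev(m_higgs_gev) / lambda_np_gev) ** 2

# For a 200 GeV Higgs and Lambda_NP = 6 TeV: (600 / 6000)^2 = 0.01,
# i.e. a tuning of 1 part in a hundred, as quoted in the text.
print(required_tuning(6000.0))
```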

3.2 Standard Definition

The first modern meaning of naturalness is a reformulation of the hierarchy problem. It arises from the fact that masses of scalar particles are not protected against quantum corrections, and keeping a hierarchical separation between the scale of EW symmetry breaking and the Planck scale requires the existence of a mechanism that would ‘naturally’ explain this hierarchy. Although the ratio of the two scales is a dimensionless parameter much smaller than unity (\(\frac{10^{3}~\mathrm{GeV}}{10^{19}~\mathrm{GeV}} = 10^{-16}\)), setting it to zero in accordance with ’t Hooft’s prescription is out of the question, because gravity exists even if it is weak. With all its known problems, the Standard Model does not become more symmetric in the hypothetical case where gravity is infinitely weaker than the weak interaction. Naturalness therefore needs a new definition.

According to Wilson’s idea popularized by Susskind [51], naturalness means that the observable properties of a system are stable against small variations of fundamental parameters. This formulation, given in 1978 at the end of a decade filled with debates on the instability of the Higgs mass, is precisely the lesson learned from the hierarchy problem. In an article written at the end of 1970, Wilson had clearly stated his doubt that the Higgs mechanism could be fundamental: “It is interesting to note that there are no weakly coupled scalar particles in nature; scalar particles are the only kind of free particles whose mass term does not break either an internal or a gauge symmetry. …Mass or symmetry-breaking terms must be ‘protected’ from large corrections at large momenta due to various interactions (electromagnetic, weak, or strong). …This requirement means that weak interactions cannot be mediated by scalar particles” [54]. After a decade of such doubts about electroweak symmetry breaking, the Standard Model was experimentally verified and little room was left for challenging its core components. If the hierarchy problem were to be tackled, the Standard Model now had to be complemented rather than discarded.

In the years around 1980, supersymmetry became the leading candidate for an extension of the uncertain physics of electroweak symmetry breaking. Consequently, naturalness began to be discussed in the context of supersymmetric models with their enlarged content of particles and new predicted phenomena, e.g., in a seminal paper by Witten [56]. As the number of proposed supersymmetric extensions of the Standard Model grew, there appeared an acute need for a formal definition of naturalness, in order to evaluate the effectiveness of various SM extensions in solving the big hierarchy problem. The first quantitative measure of naturalness was proposed in the mid-1980s as a mathematical analogue of Wilson’s idea.

Barbieri and Giudice looked at a variety of realizations of the low-energy supersymmetric phenomenology arising from supergravity models [10, 32]. They interpreted the notion of naturalness by equating it with the sensitivity of the electroweak symmetry breaking scale (instantiated as the Z-boson mass \(m_Z\)) with respect to variations in model parameters. For a general observable O depending on parameters \(p_i\) at a point P′, this sensitivity is:

$$ \Delta_{\mathit{BG}}(O;p_i) = \biggl\vert \frac{p_i}{O(p_i)} \frac{\partial O(p_i)}{\partial p_i}\biggr\vert.$$
(3)

Barbieri and Giudice then chose the number 10 as a natural upper bound on \(\Delta_{\mathit{BG}}\). Their motivation was a subjective belief that if the discrepancies between quantities are to be natural, they must be less than one order of magnitude; thus the number 10 is a sheer convention. In a different context, for example, Lewis, when discussing a notion in semantic chains he calls ‘naturalness’, shows that the establishment of an endpoint of perfect naturalness is connected with our own appreciation of what is “not too complicated” [40, p. 61]. Opinion in such matters can apparently evolve: ten years after the Barbieri-Giudice definition, when experimental constraints on the leading BSM candidate, the minimal supersymmetric standard model (MSSM), had become stronger, the survival of the model required a fine tuning of 20 [11, 22]. Though double the value of the old endpoint, this new limit of naturalness was also hailed as “reasonable” [20].
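To make the definition concrete, here is a minimal numerical sketch of (3) using a central finite difference; the toy observable is hypothetical (a near-cancellation of two inputs, standing in for, e.g., \(m_Z^2\) computed from Lagrangian parameters), not a realistic spectrum calculation.

```python
# A minimal sketch of evaluating the Barbieri-Giudice sensitivity (3) by a central
# finite difference. The toy observable is hypothetical: O = p0 - p1, the textbook
# near-cancellation that produces a large Delta_BG.

def delta_bg(observable, params, i, rel_step=1e-6):
    """|p_i / O * dO/dp_i| evaluated numerically at the point `params`."""
    p = list(params)
    h = abs(p[i]) * rel_step or rel_step   # absolute step if p_i happens to be zero
    p[i] += h
    o_plus = observable(p)
    p[i] -= 2 * h
    o_minus = observable(p)
    p[i] += h                              # restore the central value
    derivative = (o_plus - o_minus) / (2 * h)
    return abs(p[i] / observable(p) * derivative)

toy = lambda p: p[0] - p[1]
point = [1000.0, 999.0]                    # O = 1 obtained from two inputs of order 1000
print([round(delta_bg(toy, point, i)) for i in range(2)])   # ~[1000, 999]: far above 10
```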

Note that (3) only involves infinitesimal variations in \(p_i\). It follows that the Barbieri-Giudice definition gives the measure of naturalness of a model independently of its rivals, which have finite differences in the values of parameters and also claim to solve the big hierarchy problem. This definition has been widely used and has helped to sort out the claims of success of different supersymmetric models in solving the big hierarchy problem. At the same time, it failed to address a new set of issues in the then-flourishing enterprise of model building.

3.3 Naturalness in Supersymmetric Models

In the late 1980s, BSM models began to be studied more thoroughly and a multitude of their consequences became apparent, often unconnected with the big hierarchy problem. Comparing this predicted phenomenology with an ever-growing set of experimental data from particle accelerators required a new notion of fine tuning. It became necessary to take into account many observables and not just the Z mass; and the definition of naturalness no longer considered only infinitesimal changes in the values of parameters, but a finite range.

Even if supersymmetry has never been the only available solution of the big hierarchy problem, a long line of studies was dedicated to the use of fine tuning in order to make guesses about the masses of particles. At an early stage, the MSSM parameter space was scrutinized, later giving way to that of the NMSSM. In an article belonging to this line of research, de Carlos and Casas [26], who were critically reviewing an earlier work that used the Barbieri-Giudice measure [49], realized that a measure of sensitivity need not always be a measure of fine tuning. At the time, this only led them to conclude that one should take 20 rather than 10 as the numerical limit of a natural \(\Delta_{\mathit{BG}}\).

More radically, a newly defined measure appeared in 1994, when Anderson and Castaño refined the Barbieri-Giudice definition in order to exclude situations in which sensitivity is present in a model for reasons other than fine tuning, e.g., when there is global sensitivity at all points [3]. Anderson and Castaño divided the Barbieri-Giudice measure by its average value \(\bar{\Delta}_{\mathit{BG}}\) over some “sensible” range of parameters \(p_i\) at different points P′:

$$\Delta_{AC}(O;p_i)= \frac{\Delta_{\mathit{BG}}(O;p_i)}{\bar{\Delta}_{\mathit{BG}}}. $$
(4)

This range can be specified by fiat or can be chosen to include all parameter values at which the model’s experimentally valid predictions remain ‘unperturbed’. Naturalness can then be defined, in a slight modification of Wilson’s language, as the condition that observable properties of a system be “not unusually unstable” against small variations of fundamental parameters. The new word “unusually” implies a comparison with the introduced range of parameters and is of first-order conceptual importance. We shall see that, historically, it has brought the meaning of the fine-tuning argument in particle physics closer to probability estimates.
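A schematic implementation of (4), again with a hypothetical toy observable, divides \(\Delta_{\mathit{BG}}\) at the point of interest by its average over a chosen ‘sensible’ range; for a globally sensitive observable such as \(O=p^2\), the normalized measure correctly reports nothing unusual.

```python
# A schematic sketch of the Anderson-Castano measure (4): Delta_BG at the point of
# interest divided by its average over a "sensible" parameter range. The toy
# observable O = p^2 has Delta_BG = 2 everywhere (global sensitivity), so the
# normalized measure stays close to 1.

def delta_bg(observable, p, rel_step=1e-6):
    h = abs(p) * rel_step or rel_step
    derivative = (observable(p + h) - observable(p - h)) / (2 * h)
    return abs(p / observable(p) * derivative)

def delta_ac(observable, p, p_range, n_samples=1000):
    """Delta_BG(p) divided by its average over p_range = (p_min, p_max)."""
    p_min, p_max = p_range
    grid = [p_min + (p_max - p_min) * (k + 0.5) / n_samples for k in range(n_samples)]
    average = sum(delta_bg(observable, q) for q in grid) / n_samples
    return delta_bg(observable, p) / average

toy = lambda p: p ** 2
print(round(delta_ac(toy, 0.01, (0.001, 10.0)), 3))   # ~1.0: sensitive, but not unusually so
```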

That a range of parameters is involved in the definition of naturalness means that the values of parameters in a particular model begin to be seen as just one instantiation within a broader distribution of possible parameters. Anderson and Castaño became the first to connect naturalness with the “likelihood” of a given set of Lagrangian parameters, as they presupposed that there exists a way in which “we parametrize our assumptions about the likelihood distribution of the theory’s fundamental parameters” [4]. The range over which the fundamental parameters \(p_i\) vary arises as a mathematical representation of such assumptions. Measurable quantities \(X_i\) are then functions of the fundamental parameters, and their probability is conditional on the likelihood of the underlying values of \(p_i\). This argument has paved the way for the consideration of a class of structurally identical models, which differ only in the values of fundamental parameters, i.e., what is today called a landscape of scenarios defined by the values of \(p_i\). The distribution of parameters over their allowed range was taken as uniform and all the values were considered equally likely.

While Anderson and Castaño were careful to speak about naturalness only as the likelihood of certain fundamental parameters, the word ‘probability’ soon entered the stage. Introduced by Strumia and his co-authors, probability was not yet the probability of a particular scenario seen on a landscape of many, but a mere inverse of the Barbieri-Giudice measure of fine tuning. The latter was now “supposed to measure, although in a rough way, the inverse probability of an unnatural cancellation to occur” [11]:

$$P \sim\Delta_{\mathit{BG}}^{-1}.$$
(5)

In a paper discussing the naturalness of the MSSM, Ciafaloni and Strumia speak about probability as a “chance to obtain accidental cancellations” in \(m_Z\) [24]. They attempt to demonstrate that the choice of a particular limiting value of \(\Delta_{\mathit{BG}}\) is no more than a choice of a “confidence limit on unprobable [sic] calculations”, and they suggest that probability could be normalized by requiring that it equal 1 if and only if “we see nothing unnatural”. This is how probability in the Bayesian sense, through a degree of belief or subjective confidence, made its way into particle physics. The precise meaning of the normalization condition is unclear: the normalization of probability is an intractable problem, and the main difficulty here is that most attempts to rigorously define the parameter space lead to non-normalizable solutions, so that it is impossible to define the ratios between the regions of these spaces [41]. Thus Strumia’s use of ‘probability’ is metaphoric. This was perhaps the reason why Anderson and Castaño had been careful not to use the word with respect to fundamental parameters.

Though originally metaphoric, the phrase “roughly speaking, \(\Delta ^{-1}_{\mathit{BG}}\) measures the probability of a cancellation” has proved popular (see, e.g., [17]). It was used by Giusti et al., who variously spoke about “naturalness probability” or “naturalness distribution probability” [36]. This line of thought refers to probability because it needs a justification for performing Monte Carlo calculations of “how frequently numerical accidents can make the Z boson sufficiently lighter than the unobserved supersymmetric particles”. Note that, although Bayesian in its roots (Monte Carlo being a Bayesian method), probability is seen here as the frequency of an event occurring only in thought experiments, performed by an agent who imagines worlds with different values of supersymmetric parameters. Such experiments cannot be made in the real world, not even in the future. We call this interpretation of probability Gedankenfrequenz; it is typical of a group of papers on fine tuning in supersymmetric models. Gedankenfrequenz is a mixture of Bayesianism and frequentism: it relies on a subjective assignment of priors, but it involves frequencies of unperformed (and in principle unperformable) experiments.
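The Gedankenfrequenz reading can be made vivid with a toy Monte Carlo in the spirit of the calculations just described (a hypothetical two-parameter cancellation, not an actual supersymmetric spectrum): draw imaginary ‘worlds’ with random fundamental parameters and count how often an accidental cancellation produces an output as small as the one observed.

```python
# A toy illustration (hypothetical model, not an MSSM calculation) of the
# "Gedankenfrequenz" reading of naturalness: sample imaginary worlds with uniformly
# distributed parameters a, b and count how often the accidental cancellation a - b
# is at least as small as the value 'observed' in our world.
import random

random.seed(1)

def gedanken_frequency(observed=1.0, scale=1000.0, n_worlds=200_000):
    """Fraction of imagined worlds in which |a - b| <= observed, with a, b ~ U(0, scale)."""
    hits = sum(
        1
        for _ in range(n_worlds)
        if abs(random.uniform(0.0, scale) - random.uniform(0.0, scale)) <= observed
    )
    return hits / n_worlds

# Roughly 2 * observed / scale ~ 0.002: a 'probability' of the same order as the
# inverse fine tuning Delta_BG^{-1} ~ observed / scale.
print(gedanken_frequency())
```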

For Anderson and Castaño, likelihood arises from an attribution of prior probabilities to fundamental parameters. These priors reflect “the way in which we parametrize our assumptions”. The agent’s initial freedom to assume any a priori probability is limited by the boundaries of the allowed region in parameter space. Even inside the allowed region, strategies for choosing a priori values can vary. On the one hand, Giusti et al. propose to assign values randomly. On the other hand, among many articles using the Markov Chain Monte Carlo (MCMC) procedure for the MSSM, one encounters particular choices of priors such as a “naturalness-favouring prior” [2] or a “theoretical probability of a state of nature” [14]. Resulting in what they called ‘LHC forecasts’, these Bayesian studies make predictions about future experiments; although reasonable, they are rather arbitrary. More importantly, such approaches have paved the way for understanding Strumia’s metaphoric probability in the statistical sense. When, ten years later, Casas et al. compared definition (9) with definition (8), they felt entitled to speak about “the statistical meaning” of fine tuning [19].

3.4 Naturalness in Model Comparison

Defining naturalness through a finite range of parameters corresponding to different model-building scenarios has become a dominant trend. Particle physics was now to be seen as consisting of scenarios defined by the values of fundamental parameters [7, 18, 19]; in this language naturalness has become a measure of “how atypical” a given scenario is [7]. Whereas previously the use of fine tuning had been limited to highlighting the difficulties of a particular model, naturalness now began to be used for comparing different models.

Anderson and Castaño modified the Barbieri-Giudice measure, (4) in place of (3), because of the problem of global sensitivity. Athron and Miller went further to consider models with several tuned observables as well as finite variations of parameters [7]. The parameters themselves were no longer required to be uniformly distributed over a certain range in parameter space. To give a quantifiable version of this larger notion, Athron and Miller explore the larger parameter space far from the initial point P′ and introduce “generic” scenarios and “typical” volumes of parameter space formed by “similar” scenarios. They claim, in opposition to Anderson and Castaño, that the definitions of these terms must be “chosen to fit to the type of problem one is considering”. A typical volume of parameter space cannot be the Anderson-Castaño average of volumes G throughout the whole parameter space, 〈G〉, for it would depend only on how far the parameters are from some “hypothesized upper limits on their values”. For example, an observable O which depends on a parameter p according to \(O=\alpha p\) will display fine tuning for small values of p if one chooses the maximum possible value of p to be large. In the Anderson-Castaño approach, upper limits on parameters arise from the requirement that the model’s meaningful predictions be preserved. For Athron and Miller this is too generic.

To fit the choice to particular cases, they introduce similar scenarios defined by a “sensible” choice of how far numerically the observable value may deviate from a given one. Let F be the volume of dimensionless variations in the parameters over some arbitrary range [a,b] around point P′ and G be the volume in which dimensionless variations of the observable fall into the same range:

$$a \leq\frac{p_i ( P )}{p_i ( P^\prime)} \leq b,\qquad a \leq\frac {O_j ( \{ p_i ( P )\} )}{O_j ( \{ p_i ( P^\prime) \} )} \leq b.$$
(6)

In their MSSM calculation Athron and Miller use a=b=0.1, claiming that this 10% threshold amounts to not encountering “dramatically different” physics. The measure of fine tuning then is

$$ \Delta_{\mathit{AM}} = \frac{F}{G}.$$
(7)
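A toy Monte Carlo estimate of the ratio (7) for a hypothetical fine-tuned observable is sketched below; the 10% threshold is read here as dimensionless ratios lying in [0.9, 1.1], which is an interpretive assumption on our part.

```python
# A toy Monte Carlo estimate of Delta_AM = F / G from (7), for the hypothetical
# observable O(p) = p0 - p1. F corresponds to parameter ratios drawn in [a, b]
# (read here as [0.9, 1.1], i.e. +/-10%); G is the subset of those draws for which
# the ratio O(P)/O(P') also falls in [a, b], so F/G is estimated by 1/(accepted fraction).
import random

random.seed(2)

def delta_am(observable, reference_point, a=0.9, b=1.1, n_samples=100_000):
    o_ref = observable(reference_point)
    in_g = 0
    for _ in range(n_samples):
        varied = [p * random.uniform(a, b) for p in reference_point]
        if a <= observable(varied) / o_ref <= b:
            in_g += 1
    return n_samples / in_g if in_g else float("inf")

toy = lambda p: p[0] - p[1]
# Large result: ~10% parameter variations almost never keep the tuned difference within 10%.
print(delta_am(toy, [1000.0, 999.0]))
```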

This measure can be applied straightforwardly in the case of a single observable like the Z mass, but it can also be used to compare fine tuning between different observables. In the latter case, F and G are volumes in the multi-dimensional spaces of, respectively, parameters and observables. To consider a multi-dimensional space of observables is a novelty, while the multi-dimensional parameter space dates back to the time when it became clear that the big hierarchy problem was not the only fine tuning to be found. Many experimental results constraining the parameters of BSM models were accumulating: quark masses, the strong coupling constant, the anomalous magnetic moment of the muon, the relic density of thermal dark matter, the smallness of flavor violation, the non-observation of sparticles below certain thresholds, and so forth. Motivations for discarding models that arise from these different fine tunings may not be equally compelling, although each of the calculations is mathematically identical and “morally similar” [50] to the fine tuning from \(m_Z\). Barbieri and Giudice considered the most constraining fine tuning among all parameters:

$$ \Delta= \max_i \bigl\{\Delta_{\mathit{BG}} { (p_i )}\bigr\}.$$
(8)

Later, alternative solutions have been considered, such as [16, 18]:

$$ \Delta= \sqrt{\sum_i\Delta_{\mathit{BG}}^2 ( p_i ) }.$$
(9)
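When several parameters are tuned at once, the two combination rules can give noticeably different verdicts; the short sketch below contrasts the most-constraining choice (8) with the quadrature-style sum of (9), for a set of hypothetical individual sensitivities.

```python
# A small sketch contrasting the combination rules: the most constraining individual
# tuning, as in (8), versus the root of the sum of squares, in the style of (9).
import math

def combine_max(deltas):
    """Eq. (8): the largest individual sensitivity dominates."""
    return max(deltas)

def combine_quadrature(deltas):
    """Quadrature-style combination: root of the sum of squared sensitivities."""
    return math.sqrt(sum(d * d for d in deltas))

# Hypothetical individual Delta_BG values for three parameters of some model.
deltas = [12.0, 9.0, 30.0]
print(combine_max(deltas))                     # 30.0
print(round(combine_quadrature(deltas), 1))    # 33.5: several moderate tunings add up
```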

Still, the Anderson-Castaño problem of upper limits on \(p_i\) cannot be avoided even if one defines similar scenarios independently. Athron and Miller wish to maintain decorrelated tunings and to vary each observable without regard for the others. Individual contributions to the volume G are then made with no concern for the contributions from other observables. At this point Athron and Miller realize that two observables can only be compared if \(\Delta_{\mathit{AM}}\) is normalized. To do so, they are forced to reintroduce the Anderson-Castaño average value from (4):

$$ \hat{\Delta}_{\mathit{AM}} = \frac{1}{\bar{\Delta}}\frac{F}{G},$$
(10)

which relies on the knowledge of the total allowed range of parameters in a particular model. The hypothesized upper limit of this range determines how compelling the naturalness argument for new physics will be. The same normalization procedure is essential if one wants to use fine tuning to compare different models.

Although it appears in the literature as an incremental refinement of Wilson’s original idea through the work of Barbieri, Giudice, Anderson, Castaño and others, the Athron-Miller notion of naturalness lies very far from Wilson’s. Naturalness has become a statistical measure of how atypical a particular scenario is. This reinforces the temptation to use the numeric value of fine tuning as an indicator and to set several models against each other: a less tuned model is to be preferred to a more tuned one. In the literature the comparison bears not only on scenarios similar in the Athron-Miller sense, but also on models predicting completely different physics. On the one hand, one reads:

The focus point region of mSUGRA model is especially compelling in that heavy scalar masses can co-exist with low fine-tuning…[8, our emphasis]

We …find preferable ratios which reduce the degree of fine tuning. [1, our emphasis]

On the other hand, such claims are mixed with assertions going beyond the applicability of the Anderson-Castaño or even the Athron-Miller definitions:

Some existing models…are not elevated to the position of supersymmetric standard models by the community. That may be because they involve fine-tunings… [13, our emphasis]

In order to be competitive with supersymmetry, Little Higgs models should not worsen the MSSM performance [in terms of the degree of fine tuning]. Fine tuning much higher than the one associated to the Little Hierarchy problem of the SM …or than that of supersymmetric models …is a serious drawback. [18, our emphasis]

…the fine-tuning price of LEP… [11, 22, our emphasis]

Comparing altogether different models by confronting the numbers, e.g., being tuned at 1% against being tuned at 10%, is only meaningful in the strict sense if the models can be put in a common parameter space. Short of that, any conclusion drawn from such a comparison loses its precise meaning and should be understood as a metaphor shaping the historical and sociological competition between otherwise incommensurable models.

4 Interpretation

4.1 The Anthropic Connection

Fine tuning in the big hierarchy problem was one of the factors that gave rise to theories beyond the Standard Model. In the 1970s, the SM began to be viewed as an approximation to some future ‘new physics’, i.e., an effective field theory (EFT) valid up to some limiting scale \(\Lambda_{\mathrm{NP}}\). The EFT approach relies crucially on the assumption of decoupling between energy scales and on the possibility of encoding such a decoupling in a few modified constants of the field-theoretic perturbation series. This connects EFT with naturalness, which is a measure of stability against higher-order corrections in the perturbation series. If these corrections are non-negligible, the use of the perturbation expansion, and with it the EFT method, is invalidated. Thus naturalness becomes amenable to experimental test in the searches for new physics: “If the experiments at the LHC find no new phenomena linked to the TeV scale, the naturalness criterion would fail and the explanation of the hierarchy between electroweak and gravitational scales would be beyond the reach of effective field theories” [34].

If BSM models (e.g., the MSSM) are EFTs with respect to some unified theory involving gravitation (e.g., supersymmetric models of gravity), it is possible to speak within one and the same theory about the fine-tuning of low-energy observables (like \(m_Z\)) and the fine-tuning of gravitational parameters (like the cosmological constant). Thus fine tuning in particle physics and fine tuning in cosmology become ‘morally’ connected; and through this connection a long tradition of interpreting the latter notion in anthropic terms influences the interpretation of the former.

Introducing a landscape of different scenarios in the definitions of naturalness, (4), (7), and (10), may prompt, on the ontological side, a realistic interpretation of such scenarios. If every value from the landscape is realized in some world, one can justify fine tuning as a probability distribution corresponding to our chances to find ourselves in one of these worlds. The argument goes as follows. (1) Establish that the descriptions of worlds with different values of parameters are mathematically consistent and not forbidden by the theory. (2) Claim that such worlds really exist. For this, refer to Gell-Mann’s “totalitarian principle”, requiring that anything which is not prohibited be compulsory [12]. Alternatively, refer to what Dirac called “Eddington’s principle of identification”, i.e., asserting the realist interpretation of mathematical quantities as physical entities [27]. Or extrapolate to all physics Peierls’s position that “in quantum electrodynamics one has always succeeded with the principle that the effects, for which one does not obtain diverging results, also correspond to reality” [45]. (3) Establish that among all possible worlds those containing highly fine-tuned models are statistically rare, for their probability is defined by the inverse fine tuning. Indeed, the definition of “unnatural” was so chosen that, compared to the full number of worlds, the proportion of unnatural worlds is necessarily tiny. (4) Conclude that if we evaluate our chances to be in such a world, the resulting probability must be low.

This interpretation seems extremely imaginative, but it is the one shared intuitively by many physicists, particularly string theorists and cosmologists [15]. The above argument can be, and has been, criticized at every step from (1) to (4). Its specific problem in particle physics is that the ‘full number of worlds’ (step 3) can only be defined arbitrarily. Upper limits on the range of parameter values have to be set by fiat or convention. If one goes too far in extending the landscape, some worlds would contradict the theory or experimental findings and violate the requirement of step 1. Just how far one is allowed to go in parameter variations while keeping the premises of step 1 intact is not obvious. This seems an unremarkable difficulty at first sight, but it is real and raises the question of the validity of the anthropic interpretation as a whole.

4.2 Interpretation of Probability

Casas et al. consider two tunings of two different observables and propose that “since Δ and Δ(λ) represent independent inverse probabilities, they should be multiplied to estimate the total fine tuning Δ⋅Δ(λ) in the model” [18]. This is clear evidence of the statistical meaning of naturalness, shared by many particle physicists since the work of Ciafaloni and Strumia [24]. We argued in Sect. 3.3 that the notion of probability must here be interpreted in a peculiar way combining a frequentist approach with Bayesianism. Frequency is Gedankenfrequenz, because one counts the number of particular occurrences in a class of imaginary, untestable numerical scenarios, instantiated as points in parameter space. On the one hand, the Bayesian component arises in the form of a subjective degree of belief, for ‘we’ are concerned with ‘our’ current ignorance of the true value of a parameter, which we believe will be measured in the future. Here and now the future state of knowledge does not exist, and the bet is subjective. On the other hand, the frequentist component arises because in our mind this state of knowledge does exist, and it is in this mental reality that naturalness can be interpreted as the frequency of experiments that are never to be performed in the external world, not even in principle or in some technologically advanced society, for they involve fictitious values of fundamental parameters.

As in the general case of probabilistic reasoning in a situation of uncertainty (e.g., [42]), the fine tuning argument is the last resort of the mind when no rational guidance to future results can be provided. Although a subjective bet adds nothing to objective knowledge of external reality, simply “living with the existence of fine tuning” [29] would be too hard a way of life. Psychologically, we do not resist the temptation to make a “statistical guess” [18] about the future state of knowledge. If we believe that a hidden new principle in particle physics will be uncovered, running a competition between models by comparing their amount of fine tuning may seem to bring us closer to uncovering the principle. However, the principles of nature, both known and unknown, are unique and unstatistical. Therefore, there is no firm epistemological ground to believe that fine-tuning actually leads to a true theory. We argue that naturalness can at best be understood as a sociological heuristic.

4.3 Naturalness as a Heuristic

Karl Popper’s falsificationism, which drew much of its inspiration from the claim to adequately describe the methodology of high-energy physics, relies on the assumption that physical experiment can definitively rule out certain predictions made within theoretical models. If this is the case, then the models, or at least those elements of the models that are directly responsible for the unfulfilled predictions, do not describe physical reality; hence they are false.

The Popperian methodology depends critically on the possibility of interpreting experimental data. If the findings are not conclusive, models cannot be falsified in Popper’s original sense. Yet in the particle physics of the last twenty-five years experimental findings have not been conclusive (with rare exceptions). While the power of particle accelerators was growing and their exploratory capacity was gradually augmented, no recent accelerator experiment has completely falsified a BSM model. Often all that happens is that the parameters of the model are shifted to a new, unexplored region of parameter space. This is chiefly because the experiments at particle accelerators, as well as the gathered cosmological data, are so complex that one is unable to set up a unique correspondence between the data and the predictions made within theoretical models. At best, experimental findings will suggest that certain options, while not completely ruled out, are rather difficult to sustain.

That the methodology of particle physics has thus mutated into a probabilistic version of falsificationism is a significant departure from Popper’s original view. Complex experiments at the accelerators leave any model with a chance to die and a chance to survive, but never act as definitive model murderers. Nevertheless, a model can still die: not because it was falsified, but merely by falling out of fashion. The rise and fall of theories and models in contemporary particle physics is more a matter of partly circumstantial history than of rigorous epistemology. For instance, the influence of sociological factors can be decisive, e.g., the choice at the leading universities of professors with a particular taste in physics, or the abrupt reversals between fashionable and worn-out lines of research. The argument from naturalness is a powerful instrument for influencing such developments, due to its persuasive form as a normal scientific argument. The arbitrariness of the measure of naturalness used for comparing different models is disguised; on the surface, one only sees a presumably legitimate comparison of numbers, without any sign of the underlying problematic choice of limiting parameter values.

Those who are the first to fix the arbitrary convention of what is natural and what is not exercise significant influence over those who follow. Wilson, Barbieri and Giudice did so. In particular, Barbieri and Giudice gave a mathematical definition of fine tuning, providing a definite form to what had only been a vague feeling of aesthetic unease. Ever since the 1990s, their work has been turned, usually in the hands of others, into a powerful sociological instrument.

Imagine two models which theoretically explain away the big hierarchy problem, while no experimental measurement can distinguish between them. The only possible competition between the models is then based on purely mathematical criteria, such as the numerical value of fine tuning. Provided that the experimentalists are unable to settle the question, one can only make informed guesses about which of the two competing models will win in the course of history. And to make a better bet in this situation of uncertainty, it may be helpful to use the heuristic of naturalness.

Now imagine that at some other moment in history one definition of naturalness is replaced by another. The persuasive power of the argument survives largely intact, because its practical ‘reputation’ was fixed during the previous stage. One such modification happened when fine-tuning began to be used in model comparison, while continuity was asserted with the original notion, defined only with regard to a single parameter space. Such use of naturalness has influenced trends and fashions in particle physics by dismissing “unnatural” models and by giving hope that it could yet lead to a “more complete model” [3] explaining the stability of the weak scale. Physicists have been mostly lucid about the limits of this sociological turn. For example, Binétruy et al. warn:

The [fine-tuning] approach should be treated as providing guidance and should not be used for conclusions such as “the heterotic string theory on an orbifold is 3.2σ better at fitting data than a Type I theory…” [13]

Even if such warnings have been heard and a direct judgment of the kind “one model is better than another” avoided, the heuristic is still at work. A clear manifestation of this is that naturalness has over time influenced model building so that no simple model without significant fine tuning remains in the valid model space (Fig. 2). The unnaturalness of the simpler models has led to the development of more sophisticated ones, which are allegedly less tuned. Although upon further investigation the latter often turn out to be as tuned as the former [50], the very development of such models began thanks to the naturalness heuristic.

Fig. 2 Schematic graph of fine tuning versus model complexity in the space of models beyond the SM [23]

Figure 2 shows how research in particle physics has drifted away from the striving for simplicity. This was hardly imaginable even a short while ago, when, e.g., Quine put on a par “simplicity, economy and naturalness [as] contribut[ing] to the molding of scientific theories generally” [47]. Contrary to this view, naturalness and simplicity in particle physics have become frequent rivals pulling physics in different directions. Dirac believed that in such cases aesthetic criteria must be preferred:

The research worker, in his efforts to express the laws of Nature in mathematical form, should strive mainly for mathematical beauty. He should still take simplicity into consideration in a subordinate way to beauty.…It often happens that the requirements of simplicity and beauty are the same, but when they clash the latter must take precedence [28].

Clashes happen more often these days, and the lack of simplicity can become dramatic. Some BSM models are so complex that they become less comprehensible and harder to calculate with, and they move closer to the status of a theory that we only believe, but do not know, to exist. Yet the beauty and elegance of simpler models, which are easier to grasp, come with such fine tuning that physicists lose faith in them. On the other hand, many researchers find it repulsive to look for less tuned but more complex models. Perhaps the difficulty of extracting unambiguous predictions from such models portends a rapid end of the heuristic of naturalness, as physicists will seek dramatically different, but in a new way simpler, solutions.

To this day, naturalness as a heuristic has mainly served to support the claim that falsificationism needs to be tempered. If today we are concerned with the role of metaphysical and aesthetic arguments in science, in the future simplicity may yet prevail over beauty. For a Popperian, this might mean that naturalness would then be reduced to a purely circumstantial desire of certain scientists for a self-justification of their continuing work on semi-deceased physical models.