1 Introduction

A major challenge facing humanity is the possibility of climate change due to human and/or natural forcings, and how best to respond in a rational and informed manner. To this end, detailed global circulation models (GCMs) have been developed to predict the behaviour of the Earth climate system (atmosphere and oceans), involving solution of the continuity, Navier-Stokes, angular momentum and energy equations and constitutive relations over two- or three-dimensional domains, subject to various initial and boundary conditions (Peixoto and Oort 1992; McGuffie and Henderson-Sellers 2005). These are run interrogatively to yield climate projections—predictions as a function of future time – to examine various forcing and response scenarios. However, a serious difficulty for policy-makers is the promulgation of multiple models by different research groups, due to different modelling priorities, assumptions and input parameters, and inherent difficulties in the construction of climate models, especially in the handling of coupled phenomena [e.g. humidity (Paltridge et al. 2009)] and the need to dramatically reduce their computational complexity, necessitating a turbulence closure scheme. Even with the same (or similar) inputs, different models can provide significantly different climate projections (Stocker et al. 2010). A rational framework for the synthesis of such projections—which operates in a transparent and fully defensible manner—is urgently required, to avoid the lack of objectivity of seemingly ad hoc amalgamations of projections from different groups.

Over the past century, maximum entropy (MaxEnt) methods have been developed for the construction of probabilistic models, initially in thermodynamics (Boltzmann 1877; Planck 1901) and subsequently for all probabilistic systems (Jaynes 1957, 1963; Jaynes and Bretthorst 2003). Although imbued with several information-theoretic interpretations (Jaynes 1957, 1963; Jaynes and Bretthorst 2003; Shannon 1948), the success of such models rests ultimately on the maximum probability (MaxProb) principle of (Boltzmann 1877; Planck 1901; Vincze 1974; Grendár et al. 2001; Niven 2005, 2006, 2009, 2009; Niven and Grendar 2009; Grendar and Niven 2010): “a system can be represented by its most probable state.” This provides a probabilistic definition of the (relative) entropy function:

$$ {\mathfrak{H}}_{rel} = K \ln {\mathbb{P}} $$
(1)

where \(\mathbb{P}\) is the governing probability of an observable realization (macrostate) of the system and K is a constant. The maximum of \(\mathfrak{H}_{rel}\) thus coincides with that of \(\mathbb{P}.\) If the system can be represented by the allocation of distinguishable balls (objects) to distinguishable boxes (categories), then \(\mathbb{P}\) will satisfy the multinomial distribution (Brillouin 1930; Fortet 1977; Read 1983):

$$ {\mathbb{P}}=\hbox {Prob}(n_1,\ldots,n_s|q_1,\ldots,q_s,N) =N! \prod\limits_{i = 1}^s {\frac{q_{i}^{n_{i}} } {n_{i}!}} $$
(2)

where n i is the observed occupancy (number of balls) and q i is the prior probability of the ith category, N is the total number of balls and s the number of categories. Insertion of (2) into (1) with \(K=N^{-1},\) taking the asymptotic limit \(N \to \infty\) and \(n_i/N \to p_i,\) gives the (negative) Kullback-Leibler entropy function:

$$ {\mathfrak{H}}_{KL} = -\sum\limits_{i=1}^s p_i \ln {\frac{p_i} {q_i}}. $$
(3)

Maximisation of (3) for a system which satisfies (2), subject to its constraints, is therefore equivalent to seeking its most probable realization in the asymptotic limit, subject to the same constraints.

We therefore adopt a broader concept of ‘entropy’ than that normally used in the physical sciences. Climatologists will be familiar with the thermodynamic entropy S, which has a clearly defined meaning as the state function S = ∫δQ/T (Clausius) or \(S = k \ln \mathbb{W}\) (Boltzmann), where δQ is the increment of heat entering a system, T is temperature, k is the Boltzmann constant and \(\mathbb{W}\) is the number of microstates within a given realization (macrostate) of a system. Its rate of change is dS/dt, of which the excess (exported) component is commonly termed the thermodynamic entropy production \(\dot{\sigma}\) (de Groot and Mazur 1984; Niven 2009, 2010). However, under the MaxProb or MaxEnt approach adopted here, entropy acquires a more fundamental meaning in terms of the probabilistic state space of a system, however defined. To emphasise their generic character, such entropies are here denoted \(\mathfrak{H}.\) The thermodynamic entropy is in fact a special case of the generic, being derivable by the application of MaxEnt to an energetic system (Boltzmann 1877; Planck 1901; Jaynes 1957, 1963; Jaynes and Bretthorst 2003). The ensuing analyses are based entirely on generic entropy functions, not necessarily related to S; that said, much of the underlying mathematical structure is identical.

The aim of this study is to construct a framework for the synthesis of climate projections from multiple climate models, based on the MaxProb (hence MaxEnt) principle. By analogy with thermodynamics, two approaches are presented, involving constraints on the properties of individual climate models or of ensembles of climate models. In each case, the analysis identifies the most representative (most probable) model from a set of climate models, circumventing the need to calculate the entire set. Other implications of these frameworks, which arise from the mathematical structure given by (Jaynes 1957, 1963; Jaynes and Bretthorst 2003), are examined. In addition, we report a curious finite-time limit on the minimum cost of varying the overall framework at specified rates of change, using a theorem from finite-time thermodynamics (Ruppeiner 1979; Salamon et al. 1980; Salamon and Berry 1983; Nulton et al. 1985; Niven and Andresen 2009).

2 Derivations

Consider an individual Earth general climate model (GCM), composed of J separable computational components. Each component j = 1, …,J is executed by a single choice i(j) of algorithm, methodology or paradigm, from a total of I(j) possible choices. As shown in Fig. 1, this gives a combinatorial scheme in which an individual model is constructed from a set of unique choices i(j) ∈ {1, …, I(j)}, ∀ j. We assume that all models are calculated using the same set of input parameters and assumptions \({\varvec{\theta}};\) moreover, to accommodate variability or errors in \({\varvec{\theta}},\) each model will yield a set or domain of climate projections, which can be explored by Monte Carlo analysis or by some other means. If we move beyond the deterministic mindset that an individual climate model must be the “correct” one, how should we weight the projection sets from different climate models, to obtain a (statistical) picture of their merged sets of projections? One could simply combine an available set of model outputs using equal or assigned weighting factors, as suggested in Stocker et al. (2010), but unless every possible combination has been computed, the resulting composite model will be rather arbitrary. In addition, if the model space is infinite (or merely very large), it will be impossible to compute the composite model in the lifetime of the universe (or in any reasonable time frame). Moreover, the use of equal weights does not allow the incorporation of additional constraints on the model space. We therefore propose a MaxProb-based (hence MaxEnt-based) framework for the weighting of multiple climate models, for which two distinct approaches are available.

Fig. 1
figure 1

Generic combinatorial representation of the climate model weighting framework, showing a single model composed of individual discrete choices of the i(j)th algorithm or methodology for each model component \(j=1,\ldots,J\)

2.1 “Microcanonical” framework

We first construct a “microcanonical” climate model weighting framework, based on the properties of individual climate models. Extending the representation in Fig. 1, consider a single climate model shown in Fig. 2a, in which we choose to rank each choice of algorithm or method i(j) by its cost or energy ε ij , indicating (for example) the relative programming and computational cost of execution of this particular choice. Each energy level i(j) is considered to have the degeneracy g ij  ≥ 1, equal to the number of choices which share the same cost ε ij . The ranking scheme i(j) therefore accounts for, but does not distinguish between, choices of equal cost. Each level i(j) is taken to have the occupancy m ij  ∈ {0,1} (the choices are unique). From simple probabilistic considerations (Brillouin 1930; Fortet 1977; Read 1983) for equiprobable degenerate choices, the probability of a given choice i(j) is given by the reduced multinomial distribution:

$$ {\mathbb{P}}_{i|j}^{(\mu)} = \hbox{Prob}({\bf m}_j|{\bf g}_j,{\varvec{\theta}}) = {\frac{1} {G_j}} \prod\limits_{i=1}^{I(j)} {\frac{g_{ij}^{m_{ij}}} {m_{ij}!}} $$
(4)

where \({\bf m}_j = \{m_{1j},\ldots,m_{I(j)j}\}, {\bf g}_j= \{g_{1j},\ldots,g_{I(j)j}\}, G_j = \sum\nolimits_{i=1}^{I(j)} g_{ij}\) and superscript μ denotes the microcanonical framework. Equation (4) reduces to \(\mathbb{P}_{\tau|j}^{(\mu)} = {g_{\tau j}}/{G_j},\) where τ(j) is the selected choice, but it is preferable to keep the m ij explicit using (4). The probability of selecting a single overall model, assuming that the J components are independent, is therefore given by the “multi-multinomial”:

$$ \begin{aligned} {\mathbb{P}}^{(\mu)} &= \hbox{Prob}({\bf m}|{\bf g},{\varvec{\theta}}) = \prod\limits_{j=1}^{J} \hbox{Prob}({\bf m}_j|{\bf g}_j,{\varvec{\theta}})\\ &= \prod\limits_{j=1}^{J} {\mathbb{P}}_{i|j}^{(\mu)} = \prod\limits_{j=1}^{J} {\frac{1} {G_j}} \prod\limits_{i=1}^{I(j)} {\frac{g_{ij}^{m_{ij}}} {m_{ij}!}} \end{aligned} $$
(5)

where m and g are the respective matrices of m ij and g ij . Each model is subject to J constraints on the total occupancy within each component j:

$$ \sum\limits_{i=1}^{I(j)} m_{ij} = 1, \forall j. $$
(6)

Assuming that the costs are additive over the J components, we can also include a constraint on the total cost E of running the overall model:

$$ \sum\limits_{j=1}^J \sum\limits_{i=1}^{I(j)} \epsilon_{ij} m_{ij} = E. $$
(7)

To determine the most probable or equilibrium model, given the above occupancy and total energy constraints, we should maximise (5) with respect to the unknowns {m ij }, subject to (6)–(7). From the Boltzmann definition (1) with K = 1, this is equivalent to maximising the entropy:

$$ {\mathfrak{H}}^{(\mu)} = \ln {\mathbb{P}}^{(\mu)} = \sum\limits_{j=1}^{J} \left \{ - \ln G_j + \sum\limits_{i=1}^{I(j)} (m_{ij} \ln g_{ij} - \ln {m_{ij}!}) \right \} $$
(8)

subject to the same constraints. We again emphasise that (8) is defined on the space of climate models, and has no connection to the thermodynamic entropy S. If one adopts the Stirling (1730) approximation \(\ln {m_{ij}!}\approx m_{ij} \ln m_{ij}- m_{ij}\) for large m ij (in fact this is not strictly valid), (8) reduces to:

$$ {\mathfrak{H}}_{St}^{(\mu)} = \sum\limits_{j=1}^{J} \left \{ - \ln G_j - \sum\limits_{i=1}^{I(j)} (m_{ij} \ln {\frac{m_{ij}} {g_{ij}}} + m_{ij}) \right \}. $$
(9)

Extremisation of (9) subject to (6)–(7) yields the (microcanonical) Boltzmann distribution at equilibrium:

$$ m_{ij}^{*} = g_{ij} e^{-(\lambda_{0j}+1) - \lambda_E \epsilon_{ij}} = {\frac{1} {Z_j^{(\mu)}}} g_{ij} e^{- \lambda_E \epsilon_{ij}} $$
(10)

where * denotes the asymptotic (Stirling-approximate) extremum, λ0j and λ E are Lagrangian multipliers respectively for the allocation (6) and total energy (7) constraints, and \(Z_j^{(\mu)} = e^{\lambda_{0j}+1} = \sum\nolimits_{i=1}^{I(j)} g_{ij} e^{- \lambda_E \epsilon_{ij}}\) is the jth microcanonical partition function. Equation (10) can be solved in conjunction with (7) to calculate the predicted occupancies \(m_{ij}^{*}.\) If the occupancies are restricted to discrete values {0,1}, this will yield the choices i(j) of the optimal climate model, subject to the total energy constraint E. In practice, numerical solution will typically give floating-point values of \(m_{ij}^{*},\) which can be used as weighting factors with which to combine multiple models of the same total energy E.

Fig. 2
figure 2

Combinatorial representations of a the microcanonical framework, showing a single model composed of g ij degenerate choices i(j) for each model component \(j=1,\ldots,J\) (ranked by energy level \(\epsilon_{ij}\)); and b the canonical framework, composed of an ensemble of N amalgamated microcanonical models. Ball numbers denote the model index

As noted, since m ij  ∈ {0,1}, Stirling’s approximation does not strictly apply to the above analysis, and so (10) is only an approximate solution. This can be addressed by directly maximising the non-asymptotic entropy (8) with respect to m ij , subject to (6)–(7), giving the equilibrium distribution (Niven 2005, 2006, 2009, 2009):

$$ m_{ij}^{\#} = \Uplambda^{-1} \left( \ln g_{ij} - \lambda_{0j} - \lambda_E \epsilon_{ij} \right) $$
(11)

where # denotes the non-asymptotic extremum and \(\Uplambda^{-1}(y)= \psi^{-1}(y-1)\) is the upper inverse of the function \(\Uplambda(x)=\psi(x+1),\) defined for convenience, in which ψ(x) is the digamma function. (Note (11) can be written with additional terms in G j and N (c.f. Niven 2009); these are here incorporated in λ0j .) In this case, no explicit partition functions exist, and (11) must be solved in conjunction with (6)–(7). This method will give more precise values of the optimal weighting factors \(m_{ij}^{\#}\), although in practice, its numerical solution can be difficult. The non-asymptotic solution (11) is itself an approximation to the true discrete MaxProb solution (with m ij  ∈ {0,1}), which must be identified by a (computationally expensive) combinatorial search scheme.

Example

The above framework can be demonstrated by a simple example, in which a climate model is constructed from J = 3 components, with I = [3, 4, 3] choices of algorithm. The degeneracies and energy levels are taken as:

$$ \user2{g} = \left[ \begin{array}{lll} 1&2&1\\ 2&4&2 \\ 4&8&4\\ &16&\\ \end{array} \right], \qquad {\varvec{{\epsilon}}}= \left[ \begin{array}{lll} 1&1&1\\ 4&4&4 \\ 9&9&9\\ &16&\\ \end{array} \right] \hbox{units}. $$
(12)

In this framework, more (degenerate) algorithms, and algorithms with a fourth energy level, are available for model component 2. For a total energy per model of E = 17 units, the inferred asymptotic (10) and non-asymptotic (11) solutions are, respectively:

$$ \begin{aligned} \user2{m}^{*} &= \left[ \begin{array}{lll} 0.2823& 0.2250& 0.2823 \\ 0.3650& 0.2909& 0.3650 \\ 0.3527& 0.2811& 0.3527 \\ & 0.2031&\\ \end{array}\right],\\ {{\varvec{\lambda}_0}}^{*} &= \left[0.1192, 1.0393, 0.1191 \right]^{\top}, \quad \lambda_E^{*}=0.1455 \end{aligned} $$
(13)

and

$$ \begin{aligned} \user2{m}^{\#} &= \left[ \begin{array}{lll} 0.1950& 0.1695& 0.1950 \\ 0.4206& 0.3883& 0.4206 \\ 0.3844& 0.3532& 0.3844 \\ & 0.0890&\\ \end{array}\right],\\ {{\varvec{\lambda}_0}}^{\#} &= \left[0.1494, 0.8755, 0.1494 \right]^{\top}, \quad \lambda_E^{\#}=0.1460 \end{aligned}. $$
(14)

All calculations were conducted in Maple 14. The equilibrium model should thus be constructed using the weights in \(\user2{m}^{*}\) (or, arguably, \(\user2{m}^{\#}\)). In this example, algorithms of intermediate cost (the second energy level) have the highest weighting. Some difference is evident between the asymptotic and non-asymptotic solutions m and Massieu functions \({{\varvec{\lambda}_{0}}}\), due to the small model space of this simplified example. The energy multipliers λ E of the two solutions are, however, quite similar.

2.2 “Canonical” framework I

The foregoing methodology is mathematically sound and provides a formal framework for the combination of different climate models. It is, however, somewhat restrictive in that it only includes models of a single total energy E. It is possible to conduct the analysis at a higher “canonical” level—in the same manner as in thermodynamics—by the analysis of “systems of systems”, in this case involving ensembles of individual climate models. This is shown in Fig. 2b, in which an ensemble is constructed by collecting a sample (without replacement) of N individual models, and amalgamating the results. This can be represented by a combinatorial scheme in which distinguishable balls—labelled by the model index z ∈ {1, …, N}—are allocated to distinguishable levels i(j), again indicating choices of energy level ε ij with degeneracy g ij . This gives the occupancies \(n_{ij} \in \{0,\mathbb{N} \}\) for each energy level of the ensemble, which are connected to those for each model by:

$$ n_{ij} = \sum\limits_{z=1}^{N} m_{ij}^{(z)}, \quad \forall j $$
(15)
$$ N = \sum\limits_{i=1}^{I(j)} n_{ij} = \sum\limits_{i=1}^{I(j)} \sum\limits_{z=1}^{N} m_{ij}^{(z)}, \quad \forall j $$
(16)

The probability of a specified set of occupancies n ij for a particular j is now given by (Brillouin 1930; Fortet 1977; Read 1983):

$$ {\mathbb{P}}_{j}^{ (\chi)} = \hbox {Prob}({\bf n}_j|{\bf g}_j,N,{\varvec{\theta}}) ={\frac{N!} {G_j^{N}}} \prod\limits_{i=1}^{I(j)} {\frac{g_{ij}^{n_{ij}}} {n_{ij}!}} $$
(17)

where n j  = {n 1j ,…, n I(j)j }, while χ denotes the canonical framework. The multinomial factor \(N! \left / \prod\nolimits_{i=1}^{I(j)}\right. {n_{ij}!}\) accounts for number of permutations of models which attain the same set of occupancies n j . The probability of a specified ensemble, again assuming J independent components, is thus given by the “multi-multinomial”:

$$ \begin{aligned} {\mathbb{P}}^{ (\chi)} &= \hbox {Prob}({\bf n}|{\bf g},N,{\varvec{\theta}}) = \prod\limits_{j=1}^{J} \hbox {Prob}({\bf n}_j|{\bf g}_j,N,{\varvec{\theta}}) \\ &= \prod\limits_{j=1}^{J} {\mathbb{P}}_{j}^{ (\chi)} = \prod\limits_{j=1}^{J} {\frac{N!} {G_j^{N}}} \prod\limits_{i=1}^{I(j)} {\frac{g_{ij}^{n_{ij}}} {n_{ij}!}} \end{aligned} $$
(18)

where n is the matrix of n ij , whence (1) with \(K=N^{-1}\) gives the entropy:

$$ \begin{aligned} {\mathfrak{H}}^{(\chi)} &= {\frac{1} {N}} \ln {\mathbb{P}}^{(\chi)} = {\frac{1} {N}} \sum\limits_{j=1}^{J} \left\{\ln N! - N \ln G_j\right.\\ & \quad \left. + \sum\limits_{i=1}^{I(j)} (n_{ij} \ln g_{ij} - \ln {n_{ij}!}) \right\} \end{aligned}. $$
(19)

This is subject to constraints on the occupancies, given by the first part of (16).

How should we analyse ensembles of models? We could, in the first instance, examine the set of all possible models, of cardinality \(\prod\nolimits_{j=1}^{J} G_j^{N}\) (Niven and Grendar 2009). This would not, however, be very informative, since all models would a priori be of equal weight and so would not be discriminated by the MaxProb (or MaxEnt) method. The total ensemble also does not allow the inclusion of additional information about the desired set of models. If, on the other hand, we impose a constraint on the mean energy of the ensemble:

$$ {\frac{1} {N}} \sum\limits_{j=1}^{J} \sum\limits_{i=1}^{I(j)} \epsilon_{ij} n_{ij} = \langle E \rangle $$
(20)

we then impose a decision rule on its desired composition, namely, on the average cost of constructing its constituent models. In contrast to the microcanonical framework, this allows models of greater-than-average total energy E > 〈E〉, so long as these are balanced in the ensemble by models of lower energy E < 〈E〉. Combining (19) with (16) and (20) gives the Lagrangian:

$$ \begin{aligned} L^{(\chi)} &= \sum\limits_{j=1}^{J} \left \{{\frac{1} {N}} \ln N! - \ln G_j + \sum\limits_{i=1}^{I(j)} \left({\frac{n_{ij}} {N}} \ln g_{ij} - {\frac{1} {N}} \ln {n_{ij}!}\right) \right \} \\ &\quad - \sum\limits_{j=1}^{J} \kappa_{0j} \left \{ \sum\limits_{i=1}^{I(j)} {\frac{n_{ij}} {N}} -1 \right \} - \kappa_E \left \{ \sum\limits_{j=1}^{J} \sum\limits_{i=1}^{I(j)} \epsilon_{ij} {\frac{n_{ij}} {N}} - \langle E \rangle \right \} \end{aligned} $$
(21)

where κ0j and κ E are Lagrangian multipliers for the allocation (16) and energy (20) constraints. Extremisation gives the non-asymptotic equilibrium solution:

$$ n_{ij}^{\#} = \Uplambda^{-1} \left( \ln g_{ij} - \kappa_{0j} - \kappa_E \epsilon_{ij} \right), \forall j $$
(22)

(again all constant terms are brought into κ0j ). For any given N, these can be solved numerically in conjunction with the constraints (16) and (20), to give the optimum number of times (weighting factor) n # ij that each choice i(j) should be included in the ensemble, subject to 〈E〉.

When the factorials in (19) satisfy Stirling’s approximation, (22) gives the (canonical) Boltzmann distribution at equilibrium:

$$ {\frac{n_{ij}^{*}} {N}} = {\frac{1} {Z_j^{(\chi)}}} g_{ij} e^{- \kappa_E \epsilon_{ij}} $$
(23)

where \(Z_j^{(\chi)} = N e^{\kappa_{0j}+1} = \sum\nolimits_{i=1}^{I(j)} g_{ij} e^{- \kappa_E \epsilon_{ij}}\) is the jth canonical partition function.

Example

The canonical framework can be demonstrated using the example described previously (12), now constrained by a mean total energy per model of 〈E〉 = 17 units (less than the mean of all possible models, 〈E〉 = 24.3904 units). The inferred asymptotic solution (23) is identical to (13), i.e.:

$$ \begin{aligned} \frac{\user2{n}^{*}} {N} &= \user2{m}^{*},\\ {\varvec{\kappa_0}}^{*} &= \left[ \begin{array}{l} \ln \left( 3.062388620 \, {N}^{-1} \right) - 1\cr \noalign{}\ln \left( 7.685321125 \, {N}^{-1} \right) -1 \cr \noalign{}\ln \left( 3.062388620 \, {N}^{-1} \right) -1 \cr \end{array} \right], \, \kappa_E^{*}=\lambda_E^{*} \end{aligned}. $$
(24)

For N = 27 (say) this gives \({\varvec{\kappa_0}}^{*} =[- 3.1766, -2.2565, - 3.1766]^{\top}\). In comparison, the non-asymptotic solution (22) at N = 27 is:

$$ \begin{aligned} {\frac{\user2{n}^{\#}}{N}} &= \left[ \begin{array}{lll} 0.2795& 0.2232& 0.2795\\ 0.3668& 0.2940& 0.3668 \\ 0.3537& 0.2834& 0.3537\\ & 0.1994& \\ \end{array} \right],\\ {\varvec{\kappa_0}}^{\#} &= \left[-2.2315, -1.3292, -2.2315 \right]^{\top}, \, \kappa_E^{\#}=0.1455 \end{aligned}. $$
(25)

Compared to (14), the latter exhibits a more uniform distribution for each j, and is closer to the asymptotic form (24).

2.3 “Canonical” framework II

One difficulty with the above canonical framework is that it—like its microcanonical precursor—still requires separability of the model into J distinct components, for which the costs ε ij are additive. In more general situations, this separability may not be possible due to coupling between components. In that case we must revert to a model space based on ensembles of entire models. Severing all connection to the components j, we consider a model space from which we collect a sample (ensemble) of \(\mathcal{{N}}\) models, containing \({n}_\imath\) models each of total energy \(E_\imath\). Each energy level has degeneracy g i . The probability of a specified ensemble is:

$$ {\mathbb{P}}^{(\chi)}_{II} = \hbox{Prob}({\bf n}|{\bf g},{\mathcal{{N}}},{\varvec{\theta}}) = {\frac{{\mathcal{{N}}}!} {{\mathcal{{G}}}^{\mathcal{{N}}}}} \prod\limits_{\imath=1}^{I} {\frac{g_{\imath}^{n_{\imath}}} {n_{\imath}!}} $$
(26)

where \(\mathcal{{G}} = \sum\nolimits_{\imath=1}^{I} g_{\imath}\). Boltzmann’s equation (1) with \(K=\mathcal{{N}}^{-1}\) gives the entropy:

$$\begin{aligned} {\mathfrak{H}}^{(\chi)}_{II} &={\frac{1}{{\mathcal{{N}}}}} \ln {\mathbb{P}}^{(\chi)}_{II} ={\frac{1}{{\mathcal{{N}}}}} \left\{\ln{\mathcal{{N}}}! - {\mathcal{{N}}}\ln{\mathcal{{G}}} + \sum\limits_{\imath=1}^{I} (n_{\imath} \ln g_{\imath} - \ln {n_{\imath}!})\right\} \end{aligned}$$
(27)

This is subject to the occupancy and mean ensemble energy constraints:

$$ \sum\limits_{\imath=1}^{I} n_{\imath} = {\mathcal{{N}}} $$
(28)
$$ {\frac{1} {{\mathcal{{N}}}}} \sum\limits_{\imath=1}^{I} E_{\imath} n_{\imath} = \langle E \rangle $$
(29)

Forming the Lagrangian and extremisation gives the non-asymptotic equilibrium occupancies:

$$ n_{\imath}^{\#} = \Uplambda^{-1} \left( \ln g_{\imath} - \varphi_{0} - \varphi_E E_{\imath} \right) $$
(30)

where φ0 and φ E are Lagrangian multipliers for the occupancy and energy constraints. If (27) satisfies the Stirling approximation, the distribution reduces to:

$$ {\frac{n_{\imath}^{*}} {{\mathcal{{N}}}}} = {\frac{1} {Z^{(\chi)}_{II} }} g_{\imath} e^{- \varphi_E E_{\imath}} $$
(31)

where \(Z^{(\chi)}_{II}= \mathcal{{N}} e^{\varphi_{0}+1} = \sum\nolimits_{\imath=1}^{I} g_{\imath} e^{- \varphi_E E_{\imath}}\) is the canonical II partition function. Either (30) or (if valid) (31) can be solved in conjunction with the constraints (28)–(29), to give the weights \(n_\imath\) of the most representative model.

2.4 Summary

At this point, it is worth summarising some important features of the microcanonical and two canonical frameworks proposed:

  • As evident from the predicted solutions (10)–(11) and (22)–(23), if one seeks the optimal model to describe a set of climate models, it is not necessary to compute all possible combinations of models. Using the MaxProb method, one can directly calculate the single model or a reduced set of models which best represents the model set, subject to constraint(s) on the model or ensemble properties. The effect of two competing constraints is examined in the next section.

  • The microcanonical framework imposes constraint(s) on individual models, whereas the two canonical frameworks impose constraint(s) over ensembles of models. The latter enable the synthesis of larger sets of models.

  • Note that, due to the assumed independence of the J model components, the microcanonical and canonical I frameworks are “multi-multinomial” (5) and (18). The choices i(j) for a specified j = ϑ are thus independent of the other choices j ≠ ϑ. The MaxProb prediction can therefore be computed using individual models composed of whichever choices i(j) are convenient, so long as the overall set conforms to the MaxProb prediction. In the canonical II model, we overcome the difficulty of coupled model components by considering ensembles of entire models, with constraints on the total energy of each model.

  • How should we interpret the Lagrangian multipliers on the energy constraint? By analogy with thermodynamics, these can be interpreted as λ E  = 1/kT (μ), κ E  = 1/kT (χ) and φ E  = 1/kT (χ) II , where the T parameters are framework “temperatures” and k is a constant with units of energy (or cost) per temperature unit. The T’s are not thermodynamic temperatures, but express the distribution of energy over the available energy levels, in the relevant model or ensemble space. In effect, they serve as proxy variables for the total model cost E or mean ensemble cost 〈E〉.

  • Although the MaxProb framework is primarily designed to determine the most probable (maximum entropy) model, it is also possible to interrogate the Lagrangian to determine the minimum entropy model(s), i.e. those which lie farthest from the optimum. In this manner, one can explore the extremities of the model or ensemble space, to identify model outliers. Since minimum entropy solutions tend to lie on non-continuous boundaries of the solution domain, they are generally inaccessible to extremisation methods (Kapur and Kesevan 1992); nonetheless, they should be identifiable by numerical optimisation algorithms such as simulated annealing.

  • The mathematical structure of the output from the MaxEnt algorithm gives rise to many more features of the predicted solution. Some of these features are explored in later sections.

3 “Canonical” framework II with cost and benefit constraints

3.1 Derivation

We now examine a more comprehensive canonical II framework, in which we impose two constraints at the ensemble level: a constraint on the mean ensemble cost or energy 〈E〉, as before (29), and also a constraint on some measure of the average ensemble “worthiness” or “benefit” 〈B〉 (for example, a measure of its precision or accuracy). In this manner, we construct a MaxProb framework with which to conduct cost-benefit analyses of various ensembles of models, and to interrogate the trade-off between costs and benefits.Footnote 1 In general, the energy and benefit levels will have different ranks, necessitating the use of different indices i ∈ {1, …, I} (as before) and \(\ell \in \{1, \ldots, \mathfrak{L}\}\). We therefore consider model choices ranked by total model energies E i and benefits B i, of joint degeneracy g i. The probability that an ensemble of \(\mathcal{{N}}\) models has the occupancies {n i} is governed by the multinomial:

$$ {\mathbb{P}} = {\frac{{\mathcal{{N}}}!} {{\mathcal{{G}}}^{\mathcal{{N}}}}} \prod\limits_{i=1}^{I} \prod\limits_{\ell=1}^{{\mathfrak{L}}} {\frac{g_{i\ell}^{n_{i\ell}}} {n_{i\ell}!}} $$
(32)

where now \(\mathcal{{G}} = \sum\nolimits_{i=1}^{I} \sum\nolimits_{\ell=1}^{\mathfrak{L}} g_{i\ell}\) (for convenience we drop the superscript labels). From (1) with \(K=\mathcal{{N}}^{-1},\) we maximise the entropy:

$$ \begin{aligned} {\mathfrak{H}} &= {\frac{1} {{\mathcal{{N}}}}} \left \{ \ln {\mathcal{{N}}}! - {\mathcal{{N}}} \ln {\mathcal{{G}}} + \sum\limits_{i=1}^{I} \sum\limits_{\ell=1}^{{\mathfrak{L}}} (n_{i\ell} \ln g_{i\ell} - \ln {n_{i\ell}!}) \right\} \end{aligned} $$
(33)

subject to the constraints:

$$ {\frac{1} {{\mathcal{{N}}}}} \sum\limits_{i=1}^{I} \sum\limits_{\ell=1}^{{\mathfrak{L}}} n_{i\ell} = 1 $$
(34)
$$ {\frac{1} {{\mathcal{{N}}}}} \sum\limits_{i=1}^{I} \sum\limits_{\ell=1}^{{\mathfrak{L}}} E_{i\ell} n_{i\ell} = \langle E \rangle $$
(35)
$$ {\frac{1} {{\mathcal{{N}}}}} \sum\limits_{i=1}^{I} \sum\limits_{\ell=1}^{{\mathfrak{L}}} B_{i\ell} n_{i\ell} = \langle B \rangle $$
(36)

to give the non-asymptotic equilibrium solution:

$$ n_{i\ell}^{\#} = \Uplambda^{-1} \left( \ln g_{i\ell} - \omega_{0} - \omega_E E_{i\ell} - \omega_B B_{i\ell} \right) $$
(37)

where ω0, ω E and ω B are the Lagrangian multipliers. If Stirling’s approximation applies, the entropy is:

$${\mathfrak{H}}_{St} = \ln {\mathcal{{N}}} - \ln{\mathcal{{G}}} - \sum\limits_{i=1}^{I}\sum\limits_{\ell=1}^{{\mathfrak{L}}} \left({\frac{n_{i\ell}}{{\mathcal{{N}}}}} \ln {\frac{n_{i\ell}} {g_{i\ell}}} \right)$$
(38)

whence extremisation gives:

$$ \begin{aligned} {\frac{n_{i\ell}^{*}} {{\mathcal{{N}}}}} &= {\frac{ g_{i\ell}} {Z}} e^{- \omega_E E_{i\ell} - \omega_B B_{i\ell}}\\ Z &= {\mathcal{{N}}} e^{\omega_{0}+1} = \sum\limits_{i=1}^{I} \sum\limits_{\ell=1}^{{\mathfrak{L}}} g_{i \ell} e^{- \omega_E E_{i\ell} - \omega_B B_{i\ell}} \end{aligned} $$
(39)

where Z is the partition function.

The Lagrangian multiplier ω E can again be interpreted as an inverse ensemble temperature ω E  = 1/kT, where k is a constant. The multiplier ω B can be considered as a measure of the overall benefit provided by the ensemble, in reciprocal benefit units. In effect, it acts as a proxy variable for the mean benefit 〈B〉. Since 〈B〉 measures the average information or value provided by the framework, it can be interpreted very crudely as a reciprocal density or volume, whereupon we can interpret ω B  = P/kT, in which P is a mean ensemble pressure (this interpretation should not be taken too seriously).

3.2 Jaynesian mathematical structure

Now that we have the main results, we can examine several important mathematical features of the solution. Most of these were reported in a generic context by Jaynes (Jaynes 1957, 1963; Jaynes and Bretthorst 2003) [see also Kapur and Kesavan (1992) and Tribus (1961)], although many were previously known in thermodynamics. The foregoing microcanonical and canonical I and II frameworks also exhibit these features, but it is more interesting to examine the effect of two competing constraints.

Firstly, for the Stirling-approximate case, substitution of (39) into (38), by sorting into expectations along the lines of (Jaynes 1963), gives the asymptotic maximum entropy:

$$ {\mathfrak{H}}^{*} = - \ln {\mathcal{{G}}} - \phi + \omega_E \langle E \rangle + \omega_B \langle B \rangle $$
(40)

where for convenience we define the potential function (negative Massieu function) ϕ = −ω0 =  −ln Z. The most probable state of the ensemble is thus given by a constant term, plus the Massieu function, plus the sum of products of the constraints and their conjugate Lagrangian multipliers.

Since the entropy function and constraints are state variables on the space of ensembles of models, (40) provides a linear homogenous equation which describes the framework.Footnote 2 This can be used to examine the response of the framework to changes in the constraints and/or multipliers. For constant G and ϕ we immediately see that (Jaynes 1957, 1963; Jaynes and Bretthorst 2003):

$$ {\frac{\partial {\mathfrak{H}}^{*}} {\partial \langle E \rangle}} \biggr|_{\langle B \rangle} = \omega_E, \quad {\frac{\partial {\mathfrak{H}}^{*}} {\partial \langle B \rangle}} \biggr|_{\langle E \rangle} = \omega_B. $$
(41)

Second differentiation gives the Hessian matrix:

$$ -{\mathbf{a}}= \left[\begin{array}{ll}{\frac{\partial^{2} {\mathfrak{H}}^{*}} {\partial \langle E \rangle^{2}}} & {\frac{\partial^{2} {\mathfrak{H}}^{*}}{\partial \langle E \rangle \partial \langle B \rangle}}\\ {\frac{\partial^{2} {\mathfrak{H}}^{*}}{\partial \langle B \rangle \partial \langle E \rangle}} &{\frac{\partial^{2} {\mathfrak{H}}^{*}}{\partial \langle B \rangle^{2}}} \end{array}\right] = \left[\begin{array}{ll}{\frac{\partial \omega_E}{\partial \langle E \rangle}} &{\frac{\partial \omega_B }{\partial \langle E \rangle}}\\ {\frac{\partial \omega_E}{\partial \langle B \rangle}} &{\frac{\partial \omega_B }{\partial \langle B \rangle}}\end{array}\right]. $$
(42)

If the mixed derivatives are equivalent (i.e. \(\mathfrak{H}^{*}\) is continuous and continuously differentiable, at least up to second order), this gives the reciprocal or Maxwell-like relation (Jaynes 1963; Jaynes and Bretthorst 2003):

$$ {\frac{\partial \omega_E} {\partial \langle B \rangle}} = {\frac{\partial \omega_B}{\partial \langle E \rangle}}. $$
(43)

Equivalently, (40) can be rewritten as a function of the potential ϕ, whence it can be shown that (Jaynes 1957, 1963; Jaynes and Bretthorst 2003):

$$ {\frac{\partial \phi} {\partial \omega_E}} \biggr|_{\omega_B} = \langle E \rangle, \quad { \frac{\partial \phi} {\partial \omega_B}} \biggr|_{\omega_E} = \langle B \rangle. $$
(44)

Second differentiation gives:

$$ - {\varvec{\alpha}} = \left[\begin{array}{ll}{\frac{\partial^{2} \phi}{\partial \omega_E^{2}}} &{\frac{\partial^{2} \phi}{\partial \omega_E \partial \omega_B}}\\ {\frac{\partial^{2} \phi}{\partial \omega_B \partial \omega_E}} &{\frac{\partial^{2} \phi} {\partial \omega_B^{2}}}\end{array}\right] = \left[\begin{array}{ll}{\frac{\partial \langle E \rangle}{\partial \omega_E}} &{\frac{\partial \langle B \rangle}{\partial \omega_E}}\\ {\frac{\partial \langle E \rangle} {\partial \omega_B }} &{\frac{\partial \langle B \rangle}{\partial \omega_B }}\end{array}\right] $$
(45)

giving, again for equivalent mixed derivatives (Jaynes 1963, Jaynes and Bretthorst 2003):

$$ {\frac{\partial \langle B \rangle} {\partial \omega_E}} = {\frac{\partial \langle E \rangle} {\partial \omega_B }}. $$
(46)

From (42) and (45), it is evident that:

$$ {\mathbf a} = {\varvec{\alpha}}^{-1}. $$
(47)

This defines a Legendre transformation between \(\mathfrak{H}^{*}\) and ϕ representations of the system (Jaynes 1963; Jaynes and Bretthorst 2003; Kapur and Kesevan 1992).

Finally, we note that it may be desirable to rank climate models by more than two properties, e.g. the model cost E and several different benefits B 1B 2, …, B M . The foregoing analysis can readily be extended into as many dimensions as desired, giving the above mathematical structure as a function of the constraints 〈E〉, and 〈B 1〉, …, 〈B M 〉.

3.3 Implications

What are the implications of the above Jaynesian mathematical structure? In essence, it governs the effect of changes to the constraints and/or multipliers on the manifold of equilibrium positions of the framework. This includes:

  • Firstly, the first derivatives (41) and (44) can be interpreted as equations of state on the space of ensembles of models, describing the relationship between the rate of change of the entropy or potential as a function of the constraints or their conjugate multipliers (Jackson 1968).

  • Secondly, the second derivatives (42) and (45) describe the susceptibilities of the framework, i.e. the functional connections between the constraints and multipliers. In thermodynamics, such susceptibilities include the heat capacity, isothermal compressibility, coefficient of thermal expansion and so on (e.g. Gilmore 1982; Callen 1985; Niven 2009; Niven and Andresen 2009); if desired, such parameters can also be defined for the model framework proposed here. The Maxwell-like relations (43) and (46) reflect the coupling between the constraints, such that changes in one constraint or its multiplier, at constant \(\mathfrak{H}^{*}\) or ϕ, will produce adjustments to the other pair.

  • Thirdly, the second derivative matrix (45) of the potential function ϕ contains even further information, since in the asymptotic limit (N → ∞), it is equivalent (with change of sign) to the variance-covariance matrix of the constraints (Jaynes 1957, 1963; Jaynes and Bretthorst 2003; Kapur and Kesevan 1992):

    $$ {\varvec{\alpha}} = \left[\begin{array}{ll}\langle E^{2} \rangle - \langle E \rangle^{2}, &\langle E B \rangle - \langle E \rangle \langle B \rangle\\ \langle E B \rangle - \langle E \rangle \langle B \rangle, &\langle B^{2} \rangle - \langle B \rangle^{2}\end{array}\right]. $$
    (48)

    Accordingly, \({\varvec{\alpha}}\) is positive definite (or semi-definite if singularities exist) (Kapur and Kesevan 1992). From the Legendre transformation (47), \({\mathbf a}\) is also positive definite (or semi-definite) (Kapur and Kesevan 1992). In consequence, from (42) and (45) (including the tensor sign reversals), \({\mathfrak{H}^{*}}{(\langle E \rangle, \langle B \rangle)}\) and ϕ(ω E , ω B ) are both concave functions.

    Furthermore, the diagonal of (48) gives the magnitude of the standard deviation or “fluctuations” of the ensemble with respect to each constraint, usually expressed in normalised form by the coefficients of variation (Callen 1985):

    $$ \begin{aligned} C_V(E) &= {\frac{\sqrt{\langle E^{2} \rangle - \langle E \rangle^{2}}} {\langle E \rangle}} = {\frac{1} {\langle E \rangle}} \sqrt{-{\frac{\partial \langle E \rangle} {\partial \omega_E}}}\\ C_V(B) &= {\frac{\sqrt{\langle B^{2} \rangle - \langle B \rangle^{2}}} {\langle B \rangle}} = {\frac{1} {\langle B \rangle}} \sqrt{-{\frac{\partial \langle B \rangle} {\partial \omega_B }}} \end{aligned}. $$
    (49)

    The covariance, similarly normalised, provides a measure of the coupling between constraints (Callen 1985):

    $$ \begin{aligned} C_V(E,B) &= \sqrt{{\frac{ \langle E B \rangle - \langle E \rangle \langle B \rangle } {\langle E \rangle \langle B \rangle}}}\\ &= \sqrt{{\frac{1} {\langle E \rangle \langle B \rangle}} \left | - {\frac{\partial \langle B \rangle}{\partial \omega_E}} \right | } &= \sqrt{{\frac{1}{\langle E \rangle \langle B \rangle}} \left | - {\frac{\partial \langle E \rangle}{\partial \omega_B}} \right | } \end{aligned} $$
    (50)
  • Fourthly, the manifold of predicted equilibrium positions defined by \({\mathfrak{H}^{*}}{(\langle E \rangle, \langle B \rangle)}\) or ϕ(ω E , ω B ) can be interpreted as a framework geometry, analogous to the thermodynamic geometry examined by (Gibbs 1873, 1873, 1875) [see also (Callen 1985; Gaggioli et al. 2002; Gaggioli and Paulus 2002)]. For example, if we consider 〈B〉 as a function of 〈E〉, as shown graphically in Fig. 3, we can represent positions of constant entropy \(\mathfrak{H}^{*}\) by a series of isentropic curves on this graph. From (40), these will be straight lines with negative gradient −ω E B , indicating that an increase in the energy or cost 〈E〉, at constant \(\mathfrak{H}^{*}\) (and ϕ), causes a corresponding decrease in 〈B〉. Of course, many other curves can also be plotted on the diagram, including isoenergetic, isobenefit, iso-ω E and iso-ω B curves, defined by rearrangements of (40). We can also plot ω B as a function of ω E , on which we can construct isopotential curves with negative gradient −〈E〉/〈B〉. (Adopting the crude analogy of Sect. 3.1, these can be transformed to plots of P as a function of T for the model framework.) Three-dimensional graphs such as \({\mathfrak{H}^{*}}{(\langle E \rangle, \langle B \rangle)}\) or ϕ(ω E , ω B ) can also be constructed, containing isosurfaces of various kinds (Gibbs 1873, 1875). As pointed out by (Gibbs 1873, 1873, 1875), it is advantageous to plot “fundamental equations” such as \({\mathfrak{H}^{*}}{(\langle E \rangle, \langle B \rangle)}\) or ϕ(ω E , ω B ), rather than forms unobtainable from these by Legendre transformation (such as \({\mathfrak{H}^{*}}{(\omega_E, \omega_B)}\)), so that all parameters not represented on the axes can be calculated for a given path simply by differentiation.

    Recalling that the frameworks herein consist of all possible models consistent with the constraints, the resulting manifold \({\mathfrak{H}^{*}}{(\langle E \rangle, \langle B \rangle)}\) or ϕ(ω E , ω B ) should for the most part be continuous in its geometric space, reflecting infinitesimal changes in parameters and incremental changes in model algorithms. However, in some circumstances there may be discontinuities in the manifold, due to abrupt changes in model algorithm or adoption of different scientific paradigms. Such changes can be described as phase changes or tipping points within the model space, leading to assortments of stable and unstable solutions and path-dependent hysteresis effects. These may create particular difficulties, but can of course be handled in much the same manner as in thermodynamics.

  • Finally, it can be shown that either framework \({\mathfrak{H}^{*}}{(\langle E \rangle, \langle B \rangle)}\) or ϕ(ω E , ω B ) can be endowed with a Riemannian geometry (entirely distinct from the framework geometry just described), using the metric furnished by the respective (positive definite) Hessian matrix \({\mathbf a}\) or \({\varvec{\alpha}}\) (Weinhold 1975; Ruppeiner 1979; Salamon et al. 1980; Salamon and Berry 1983; Nulton et al. 1985; Niven and Andresen 2009). As noted, the two metrics and hence the geometries are connected by Legendre transformation (47). The Riemannian interpretation leads to an important physical limit: a least action bound on the cost, in units of \(\mathfrak{H}^{*}\)or ϕ, to move the framework from one equilibrium position to another at finite rates of change of the constraints or multipliers. This bound—which constitutes an extension of finite time thermodynamics (Weinhold 1975; Ruppeiner 1979; Salamon et al. 1980; Salamon and Berry 1983; Nulton et al. 1985; Niven and Andresen 2009), but is in some sense allied to the informational limits identified by Szilard (1929), Landauer (1961), Bennett (1973) and similar workers (Leff and Rex 1990)—is examined in more detail in the Appendix.

Fig. 3
figure 3

Schematic diagram of Gibbs’ geometry, for the MaxEnt cost-benefit climate model weighting framework of §3

4 Conclusions

In this study, several maximum-entropy frameworks are presented for the synthesis of outputs from multiple Earth climate models, based on constraints on the properties of individual models (microcanonical framework) or ensembles of models (two canonical frameworks). The asymptotic and non-asymptotic entropy functions for each case are derived by combinatorial reasoning, and applied to simple systems constrained by the total model energy E (microcanonical) or mean ensemble energy 〈E〉 (canonical). In each case it is shown that the MaxEnt method identifies the most representative (most probable) model from a set of climate models, subject to the specified constraints, eliminating the need to calculate the entire set. The parametric and geometric implications of the underlying Jaynesian mathematical structure are examined, with reference to a canonical framework with competing cost and benefit constraints, allowing interrogation of the trade-off between costs and benefits. Finally, a finite-time limit on the minimum cost of modification of the synthesis framework, at finite rates of change, is also reported.

The foregoing analysis therefore provides climate modellers—or those who must rank and combine climate models—with a rational tool to amalgamate a large set of models into a single representative model (or a small representative set). This enables the weighting of climate projections from different groups, and will also dramatically reduce the computational demand on the climate modelling community. Indeed, the benefits extend into other fields: as commented by a reviewer, for long-range weather forecasts it is common practice to combine projections from different meteorological models, to improve reliability. The MaxEnt frameworks proposed here could equally be applied to this task.

A caveat to the foregoing analysis is that the inferred equilibrium climate model will not necessarily be the “most correct” model, but merely the one which is most representative of the available set of models. If the model space is incomplete, or their underlying physical or modelling assumptions are incorrect, any resulting errors will also be incorporated in the equilibrium model. A more comprehensive probabilistic framework, which incorporates the errors associated with our lack of knowledge (of data, phenomena and models), would consist of a Bayesian inferential framework extending back to all raw climate data, a substantial endeavour which—as its minimum condition—would require climate scientists to abandon their use of orthodox methods for statistical inference and parameter estimation (Jaynes and Bretthorst 2003).