1 Introduction

The Atlantic Meridional Overturning Circulation (AMOC) plays an important role in the climate of the Northern hemisphere through its transport of heat into the North Atlantic (Bryden and Imawaki 2001; Vellinga and Wood 2002; Jackson et al. 2015). Stommel (1961) identified the AMOC’s potential to have multiple stable states, due to a simple salinity advection feedback mechanism. Beyond a certain threshold in the freshwater forcing of the North Atlantic, the AMOC becomes unsustainable and collapses. If freshwater forcing then returns to below the threshold value, the AMOC does not restart. If the AMOC were close to such a threshold, a small additional freshwater input to the Atlantic (e.g. from accelerated melting of the Greenland ice sheet) could trigger AMOC collapse (Fichefet et al. 2003).

Such theoretical AMOC behaviour has been demonstrated in a range of models, including more complex box models (e.g. Rahmstorf 1996; Lucarini and Stone 2005), intermediate complexity climate models (e.g. Rahmstorf et al. 2005; Lenton et al. 2007) and ocean general circulation models (GCMs) (Rahmstorf 1996; Dijkstra 2007; Hofmann and Rahmstorf 2009). It has also been proposed to be relevant to a number of transitions seen in the palaeoclimatic record (e.g. Alley 2003). Evidence of similar behaviour has been seen in some coupled atmosphere–ocean GCMs (AOGCMs) (Manabe and Stouffer 1988; Mikolajewicz et al. 2007), but due to computational constraints a full AMOC hysteresis curve has to date only been calculated for one, low resolution AOGCM (FAMOUS) for conditions of pre-industrial atmospheric carbon dioxide (CO2) (Hawkins et al. 2011, hereafter H11). In H11 and many previous studies using simpler models, the thresholds are explored through a ‘hosing’ experiment in which a standard model equilibrium state is perturbed by adding an extra source of fresh water, H, to the North Atlantic. The strength of the hosing H is increased very slowly, with the aim of allowing the model to adjust towards its equilibrium state for each value of H. Hence a model run of several thousand years is required, and even then as shown in H11 a full equilibrium is not reached. Typically in such experiments, once H passes a critical value Hcrit the AMOC collapses. H is then slowly reduced again, but in general the AMOC does not recover when H crosses back below Hcrit. Instead AMOC recovery occurs at a lower (or even negative) value of H, giving a hysteresis in the AMOC strength and a range of values of H for which the AMOC is bistable (both strong and weak/reversed AMOC states are possible). Recently Jackson et al. (2017, hereafter J17) have analysed the detailed dynamics of the AMOC thresholds seen in the H11 study, showing that the salinity budget of the North Atlantic can be used to understand the dynamics of the thresholds.

The region of H values for which two stable states exist is bounded by bifurcation points beyond which either only the strong AMOC (small or negative H), or only the weak AMOC state (large H) is sustainable. Many studies have pointed to the importance of the fresh water budget of the Atlantic basin (north of 34°S) in determining the bistable region, and in particular the importance of the fresh water transport across 34°S due to the AMOC itself (denoted here by MOV; de Vries and Weber 2005; Drijfhout et al. 2011). If MOV < 0 there is a positive salinity advection feedback in which negative anomalies in the AMOC induce a freshening of the Atlantic basin and hence further AMOC weakening. It has been suggested that current AOGCMs are biased towards an over-stable AMOC, due to a common positive bias in MOV (e.g. Weber et al. 2007; Valdes 2011; Mecking et al. 2017). However Sijp (2012) pointed out that other feedbacks, specifically anomalous fresh water transports due to advection of salinity anomalies by the mean AMOC (⟨q⟩S′) and the gyre/eddy components, are always stabilising, so MOV < 0 is not a sufficient condition for instability. It is therefore likely that the location of AMOC thresholds or bifurcation points is not simply determined by MOV, but by a more complex set of feedbacks involving the fresh water budget of the Atlantic or North Atlantic basins. Recently Cheng et al. (2018) have shown that in two AOGCM control runs the salinity advection feedback is not the dominant factor in variability of the North Atlantic AMOC, again emphasising the more complex nature of the processes controlling AMOC dynamics.

To quantify how far the AMOC is from a threshold, based on AOGCM hosing results, would require a wider range of AOGCM runs than is currently possible, although advances in computational power are beginning to enable a more thorough investigation of thresholds in current generation climate models including eddy-permitting ocean components (Jackson and Wood 2018). Dijkstra et al. (2004) propose an alternative approach involving energetic analysis of the discrete GCM equations; however this involves a very large matrix inversion problem which is also likely to present computational challenges as model resolution and complexity increase. In this study we explore a new approach to quantifying AMOC thresholds: we hypothesise that AMOC thresholds are controlled by low-order dynamical processes which are quantitatively captured by a simple but physically-based box model. The box model structure is motivated by well-established understanding of the leading order water mass structure of the current AMOC. The crucial novelties of this model, compared to previous AMOC box models, are that the model is designed to represent a physically closed global circulation/water mass system, and that the model’s control parameters can be simply determined from observable, large-scale properties of the present day (H = 0) ocean state. Hence the box model cannot be ‘tuned’ to have a particular threshold—rather it is calibrated to the H = 0 ocean state and predicts where the threshold Hcrit will lie. To test the chosen dynamics of the box model we calibrate it to the unperturbed ocean state simulated using the FAMOUS AOGCM of H11 and J17. We demonstrate that the box model captures the leading mechanisms in the threshold dynamics of FAMOUS, as analysed by J17, particularly well for the first (‘ramp-up’) threshold in the hosing experiment described above. The box model dynamics are in this sense traceable to those of the AOGCM. 
Our calibration method implies that the present day ocean state contains sufficient information to determine the threshold hosing Hcrit (to within errors which we quantify). We test this claim by repeating the H11 hosing experiment using a modified version of the AOGCM and various atmospheric CO2 concentrations, yielding various values of Hcrit. We calibrate the box model to the various baseline (H = 0) AOGCM states and test its ability to predict the different values of Hcrit.

The box model also provides a simple diagnostic framework that allows us to identify the key processes and ocean properties that determine the position of the AMOC threshold over a range of modelled states, and so acts as an ‘emergent constraint’ (e.g. Hall and Qu 2006; Cox et al. 2018), allowing the threshold position to be estimated by calibrating the box model to present day observations. Here (Sect. 6) we calibrate the box model to a data-assimilating ocean reanalysis to provide a preliminary estimate of Hcrit for the present day ocean. However a more in-depth analysis would be needed to generate a robust estimate including error bars.

The question of whether increasing greenhouse gases will bring the AMOC closer to a threshold has not to date been directly addressed using AOGCMs. Schneider (2007) concluded from a variety of studies (including expert elicitations) that increasing greenhouse gases will increase the likelihood of substantial AMOC responses. Drijfhout et al. (2011) studied the response of MOV to increasing greenhouse gases, finding a complex response with MOV generally decreasing and the strongest change at medium levels of greenhouse gas increase; however it is not clear whether MOV has a close relationship to the threshold position, and they did not calculate the changes in AMOC thresholds explicitly. Here we directly calculate the AMOC hysteresis curve in FAMOUS, for a climate state with increased atmospheric CO2. We find that for this AOGCM the amount of freshwater Hcrit needed to provoke AMOC collapse is greater with elevated CO2. This change is reproduced by the box model when we calibrate it to the higher CO2 AOGCM state. We then use the dynamical understanding provided by the box model to assess whether this change is likely to be robust or merely an artefact of the particular AOGCM used.

Section 2 provides a brief description of the FAMOUS AOGCM, introduces the box model, and explains how the box model parameters are calibrated to the AOGCM state. Section 3 explores the processes behind AMOC thresholds in the AOGCM and box model, showing that the box model captures the essential dynamics of the AOGCM thresholds to within quantifiable errors. Section 4 explores the sensitivity of the AMOC collapse threshold to box model parameters, pointing to key features of the ocean state that determine the threshold position, and uses this insight to understand why Hcrit increases under increased CO2 in FAMOUS. Section 5 discusses limitations of the traceability between the box model and AOGCM. Section 6 draws together the results and discusses their implications for monitoring and early warning of AMOC thresholds, and the likely implications of climate change for future AMOC stability.

2 Model descriptions

2.1 The AOGCM

FAMOUS (Smith et al. 2008; Smith 2012) is a coarse resolution AOGCM based on the widely used HadCM3 model (Gordon et al. 2000). The atmospheric component has a horizontal resolution of 5° × 7.5° with 11 vertical levels, while the ocean has a horizontal resolution of 2.5° × 3.75° with 20 vertical levels. The model provides a three-dimensional simulation of atmosphere and ocean, with physically detailed representations of processes such as clouds, precipitation and atmosphere–ocean feedbacks. FAMOUS does not employ artificial flux adjustments, which are known to distort the AMOC hysteresis behaviour (Marotzke and Stone 1995; Dijkstra and Neelin 1999). We use two versions here: the first [‘XDBUA’, Smith et al. 2008, hereafter FAMOUSA] is the version used by H11, while the second is an updated version including a range of minor changes [version ‘XFXWB’, Smith 2012, hereafter FAMOUSB]. These model changes result in a change in the position of the AMOC threshold, and will provide an additional test of our model hierarchy.

2.2 The box model

Our box model is represented in Fig. 1a. Its five boxes represent large contiguous regions of the global ocean, corresponding to large scale water mass structures (Talley et al. 2011) (Fig. 1b): the ‘T’ box represents the Atlantic thermocline; the ‘N’ box the North Atlantic Deep Water (NADW) formation region and Arctic; the ‘B’ box the southward propagating NADW and its upwelling in the Southern Ocean as Circumpolar Deep Water; the ‘S’ box fresh Southern Ocean near-surface waters and their return into the Atlantic as Antarctic Intermediate Water; and the ‘IP’ box the Indo-Pacific thermocline. The boxes are connected by pipes of negligible volume that carry the flow. The flow is separated into a ‘cold water path’ (CWP), representing AMOC return flow via the South Pacific and Drake Passage, and a ‘warm water path’ (WWP), representing AMOC return via the Indo-Pacific thermocline and Agulhas leakage.

Fig. 1

Box model definition. a Schematic representation of the box model. The control parameters of the model are the temperature difference between N and S boxes, the pipe constant (λ), the surface freshwater fluxes (Fi), the wind-driven transport constants (Ki), the S–B box mixing parameter (η) and the proportion of the cold water path (γ). All parameters except γ can be diagnosed from any GCM state, or in principle from observations. b Boundaries of model boxes used in the calibration of the box model to the FAMOUSA pre-industrial (1xCO2) run, superimposed on the zonal average of the FAMOUSA salinity distribution across the Atlantic and Indo-Pacific Oceans

The box model physics is governed by salt conservation in each box, and a linear dependence of the overturning circulation on the density difference of the North Atlantic and Southern Ocean boxes:

$$q \, = \lambda \, \left[ {\alpha \left( {T_{S} {-} \, T_{N} } \right) \, + \, \beta \left( {S_{N} {-} \, S_{S} } \right)} \right]$$
(1)

where q is the AMOC flow and λ is a constant. A linear equation of state is used, with thermal and haline coefficients α = 0.12 kg m⁻³ K⁻¹ and β = 0.79 kg m⁻³ psu⁻¹. T and S denote mean temperature and salinity over the boxes. Such a relationship has previously been demonstrated in a range of models (e.g. Hughes and Weaver 1994; Rahmstorf 1996; Thorpe et al. 2001; Sijp 2012), and we find it holds in our FAMOUS runs over the entire hysteresis loop described below (Fig. 2a), justifying its use in our box model a posteriori.

Fig. 2

a AMOC strength as function of N-S density difference. Scatter plot of FAMOUSA AMOC strength vs. density difference between the two portions of the ocean that define the N and S boxes in the box model. The points shown cover the entire hysteresis run with preindustrial CO2. b Temperature of N box as a function of AMOC strength. Scatter plot of FAMOUSA box-mean temperature TN vs. AMOC strength q. The points shown cover the part of hysteresis between the unhosed state and the first threshold crossing, for the run with preindustrial CO2

The salinities of the five boxes are governed by salt conservation:

$$q \ge 0:$$
$$V_{N} \frac{{dS_{N} }}{dt} = q\left( {S_{T} - S_{N} } \right) + K_{N} (S_{T} - S_{N} ) - F_{N} S_{0}$$
(2)
$$\begin{aligned} V_{T} \frac{{dS_{T} }}{dt} &= q\left[ {\gamma S_{S} + \left( {1 - \gamma } \right)S_{IP} - S_{T} } \right] + K_{S} (S_{S} - S_{T} ) \\&\quad+ K_{N} (S_{N} - S_{T} ) - F_{T} S_{0} \end{aligned}$$
(3)
$$\begin{aligned} V_{S} \frac{{dS_{S} }}{dt}& = \gamma q\left( {S_{B} - S_{S} } \right) + K_{IP} (S_{IP} - S_{S} ) + K_{S} (S_{T} - S_{S} )\\&\quad + \eta \left( {S_{B} - S_{S} } \right) - F_{S} S_{0} \end{aligned}$$
(4)
$$V_{IP} \frac{{dS_{IP} }}{dt} = \left( {1 - \gamma } \right)q\left( {S_{B} - S_{IP} } \right) + K_{IP} (S_{S} - S_{IP} ) - F_{IP} S_{0}$$
(5)
$$V_{B} \frac{{dS_{B} }}{dt} = q\left( {S_{N} - S_{B} } \right) + \eta \left( {S_{S} - S_{B} } \right)$$
(6)
$$q < 0:$$
$$V_{N} \frac{{dS_{N} }}{dt} = \left| q \right|\left( {S_{B} - S_{N} } \right) + K_{N} \left( {S_{T} - S_{N} } \right) - F_{N} S_{0}$$
(7)
$$V_{T} \frac{{dS_{T} }}{dt} = \left| q \right|\left( {S_{N} - S_{T} } \right) + K_{S} \left( {S_{S} - S_{T} } \right) + K_{N} \left( {S_{N} - S_{T} } \right) - F_{T} S_{0}$$
(8)
$$\begin{aligned} V_{S} \frac{{dS_{S} }}{dt}& = \gamma \left| q \right|\left( {S_{T} - S_{S} } \right) + K_{IP} \left( {S_{IP} - S_{S} } \right) + K_{S} \left( {S_{T} - S_{S} } \right)\\&\quad + \eta \left( {S_{B} - S_{S} } \right) - F_{S} S_{0} \end{aligned}$$
(9)
$$V_{IP} \frac{{dS_{IP} }}{dt} = \left( {1 - \gamma } \right)\left| q \right|\left( {S_{T} - S_{IP} } \right) + K_{IP} \left( {S_{S} - S_{IP} } \right) - F_{IP} S_{0}$$
(10)
$$V_{B} \frac{{dS_{B} }}{dt} = \gamma \left| q \right|S_{S} + \left( {1 - \gamma } \right)\left| q \right|S_{IP} - \left| q \right|S_{B} + \eta \left( {S_{S} - S_{B} } \right)$$
(11)

where Vi is the volume of box i, γ denotes the proportion of the cold water path, and η is an S–B box mixing parameter, representing mixing of NADW with fresher waters as it passes around the global circulation. Oceanographically η represents the mixing of Circumpolar Deep Water with fresher surface water masses in the Southern Ocean (Talley et al. 2011). Wind-driven salinity transports between boxes are represented by a diffusive flux with coefficients KN, KS, KIP associated with the gyre strengths.
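The salt budgets (2)–(11) can be sketched as a single function. This is a minimal illustration, not the implementation used in the study: the function name, the dictionary layout and all numerical values are hypothetical, and q is passed in directly so that no equation of state is needed. One assumption is made explicit in the code: for q < 0 the outflow from the B box is written as |q|S_B, so that the advective terms cancel when summed over all boxes.

```python
BOXES = ("N", "T", "S", "IP", "B")


def salinity_tendencies(S, q, p):
    """Return dS_i/dt for the five boxes, following Eqs. (2)-(11).

    S : dict of box-mean salinities keyed by box name
    q : AMOC strength (same volume-flux units as K, eta and F)
    p : dict with entries V, K, F (dicts), eta, gamma and S0
    """
    V, K, F = p["V"], p["K"], p["F"]
    eta, g, S0 = p["eta"], p["gamma"], p["S0"]

    # Gyre, mixing and surface-flux terms are identical for both signs of q.
    rhs = {
        "N": K["N"] * (S["T"] - S["N"]) - F["N"] * S0,
        "T": K["S"] * (S["S"] - S["T"]) + K["N"] * (S["N"] - S["T"]) - F["T"] * S0,
        "S": (K["IP"] * (S["IP"] - S["S"]) + K["S"] * (S["T"] - S["S"])
              + eta * (S["B"] - S["S"]) - F["S"] * S0),
        "IP": K["IP"] * (S["S"] - S["IP"]) - F["IP"] * S0,
        "B": eta * (S["S"] - S["B"]),
    }

    if q >= 0:  # Eqs. (2)-(6): N -> B -> (S, IP) -> T -> N
        rhs["N"] += q * (S["T"] - S["N"])
        rhs["T"] += q * (g * S["S"] + (1 - g) * S["IP"] - S["T"])
        rhs["S"] += g * q * (S["B"] - S["S"])
        rhs["IP"] += (1 - g) * q * (S["B"] - S["IP"])
        rhs["B"] += q * (S["N"] - S["B"])
    else:  # Eqs. (7)-(11): reversed circulation
        a = abs(q)
        rhs["N"] += a * (S["B"] - S["N"])
        rhs["T"] += a * (S["N"] - S["T"])
        rhs["S"] += g * a * (S["T"] - S["S"])
        rhs["IP"] += (1 - g) * a * (S["T"] - S["IP"])
        # Assumption: the full flow |q| leaves the B box (salt-conserving form).
        rhs["B"] += a * (g * S["S"] + (1 - g) * S["IP"] - S["B"])

    return {b: rhs[b] / V[b] for b in BOXES}


# Hypothetical values for illustration only; NOT the calibrated values of Table 1.
params = {
    "V": {"N": 4e16, "T": 6e16, "S": 9e16, "IP": 2e17, "B": 9e17},  # m^3
    "K": {"N": 5e6, "S": 5e6, "IP": 9e7},                            # m^3 s^-1
    "F": {"N": 4e5, "T": -7e5, "S": 1e6, "IP": -7e5},                # m^3 s^-1, sums to zero
    "eta": 7e7, "gamma": 0.4, "S0": 0.035,
}
salt = {"N": 0.0350, "T": 0.0357, "S": 0.0344, "IP": 0.0348, "B": 0.0349}
dSdt = salinity_tendencies(salt, q=15e6, p=params)
```

A useful check on any such implementation is that the volume-weighted sum of the tendencies equals −S0 ΣFi, which vanishes when the surface fluxes sum to zero.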

The box volumes Vi, gyre coefficients Ki, surface freshwater fluxes Fi, along with λ, η and γ are specified, time-invariant parameters. S0 is a reference salinity set to 0.035. We assume that the mean temperature TN of the North Atlantic box increases linearly with AMOC strength, reflecting the role of the AMOC in transporting heat into the North Atlantic:

$$T_{N} = \mu q + T_{0}$$
(12)

The other box temperatures are fixed. While not as tight as the q vs. density relationship (1) over the whole hysteresis loop, there is nonetheless a close linear relationship between q and TN, over the portion of the curve between the un-hosed state and the first threshold crossing, which is the part of the experiment which we will focus on in our analysis below (Fig. 2b). We found empirically that allowing for this variation in TN slightly increases the sharpness of the transition to the off state near the threshold, but temperature variations only play a minor role in density variations in these experiments (Fig. 4a) and there is little sensitivity of Hcrit to the value of µ (see discussion in Sect. 4.1). A more sophisticated treatment of temperature effects would be needed for thermally driven scenarios such as the response of the AMOC to transient global warming.
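Because TN depends on q through (12) while q depends on TN through (1), the two relations can be combined into an explicit expression for q. The rearrangement below is a sketch added for illustration, assuming 1 + λαμ ≠ 0; it introduces no symbols beyond those of (1) and (12).

$$\begin{aligned} q &= \lambda \left[ \alpha \left( T_{S} - \mu q - T_{0} \right) + \beta \left( S_{N} - S_{S} \right) \right] \\ \Rightarrow \quad q \left( 1 + \lambda \alpha \mu \right) &= \lambda \left[ \alpha \left( T_{S} - T_{0} \right) + \beta \left( S_{N} - S_{S} \right) \right] \\ \Rightarrow \quad q &= \frac{\lambda \left[ \alpha \left( T_{S} - T_{0} \right) + \beta \left( S_{N} - S_{S} \right) \right]}{1 + \lambda \alpha \mu } \end{aligned}$$

In this form q follows directly from the box salinities, so the temperature feedback acts as a rescaling of λ by the factor 1/(1 + λαμ) together with a shift of the reference temperature from TN to T0.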

Our model adopts a similar broad approach to the box model of Rahmstorf (1996), but with several important additions:

  1. Our model is designed to achieve a degree of quantitative, as well as qualitative agreement with corresponding AOGCM experiments. For this reason our boxes represent contiguous regions that span the majority of the global ocean, and are assigned different volumes that are identified with the largest scale water masses;

  2. The choice of separate N and B boxes was partly driven by the desire for quantitative comparison with the AOGCM: in an earlier prototype of the model where the N and B boxes were merged, the relationship between the density difference and MOC strength (Fig. 2a) was less tight, leading to large quantitative errors in the hysteresis loop. In the Rahmstorf model the B box (Rahmstorf’s Box 4) is essentially passive and isolated (S4 = S2 at equilibrium), whereas here we allow for mixing between the B box and the surface ocean (S box);

  3. Our model explicitly represents a closed global circulation and its associated fresh water transports, including the different roles of the cold and warm water paths. In contrast, in the Rahmstorf (1996) model the closure of the MOC outside the Atlantic basin (Rahmstorf’s Box 1), and the role of gyre transports, must be specified through the concept of a fixed ‘active fresh water flux’ which is hard to associate with a specific observable quantity and does not respond to the evolving salinity fields. The additional physics in our model allows it to generate self-consistent solutions that can be identified with physical variables.

Our representation of the WWP/CWP has limitations: due to the large extent of the IP box the water coming back into the Atlantic basin through the WWP is not as saline as the real Agulhas return flow. Therefore our model may underestimate the importance of the WWP/CWP parameter γ. We note that for the parameter values studied here, variations in SS and SB are small compared to the other boxes. This means that a 3-box reduction of the model (with SS and SB fixed) is possible that contains the essential dynamical behaviour of the 5-box model in the most relevant parameter ranges, at the cost of some quantitative fidelity. Even the 3-box reduction has one extra degree of freedom compared with the Stommel (1961) and Rahmstorf (1996) models, allowing a much richer dynamical structure including homoclinic and Hopf bifurcations in addition to the saddle-node bifurcations that are seen in the simpler models (Alkhayuon et al. 2019).

Our model has several similarities to the model of Johnson et al. (2007), which showed how more recent theories of the AMOC which emphasise closure of the potential energy budget through Southern Ocean winds and interior diapycnal mixing (e.g. Gnanadesikan 1999) can be reconciled with salinity-budget considerations and bistability as emphasised by the Stommel (1961) model. However our model differs from that of Johnson et al. (2007) in that we do not attempt to parametrise the processes that determine the transformation of NADW to cold, fresh Antarctic Intermediate Water or warm, salty thermocline water, and then solve for the pycnocline structure and AMOC. Instead in our model these transformations, and the basic geometry of the water masses are to some extent prescribed through the model parameters and the specified box boundaries. Our emphasis is on describing the dynamical mechanisms that occur when the AMOC passes from a strong (‘on’) state to a weak or reversed state (i.e. when the current strong AMOC state becomes unsustainable), on demonstrating that the box model dynamics accurately describe the dynamics of this transition in the AOGCM, and on identifying observable properties of the ocean circulation that determine where the transition lies.

2.3 Calibration of the box model to the AOGCM

To calibrate the box model to a GCM such as FAMOUS we use decadal mean variables diagnosed purely from large scale properties of the GCM’s unperturbed equilibrium state (red dot in Fig. 3c), without knowledge of the GCM’s response to hosing. First, box boundaries are chosen to reflect approximate water mass boundaries in the GCM salinity field (Fig. 1b). Once the box volumes are fixed, all but one of the control parameters of the box model can be diagnosed from emergent properties of FAMOUS (box average temperature and salinity, surface fluxes and section freshwater transports), and so could also in principle be diagnosed from observations. Box mean salinities, temperature and surface fresh water fluxes are obtained directly from the GCM. KN, KS and KIP are determined by diagnosing the gyre salt transport M in the GCM across the corresponding box boundaries:

Fig. 3

Comparison between FAMOUSA and box model simulations. a Salinity evolution in the five model boxes through the 5000 years of the FAMOUSA hosing experiment (H11). b As (a) but for the corresponding box model experiment. The same rate of increase of hosing is used for both experiments. c AMOC strength as function of hosing applied. Dots: FAMOUSA (decadal means). Red line: box model. The box model has been calibrated solely to the unperturbed initial state of FAMOUSA (shown by the red dot). The dashed lines show the critical hosing value Hcrit

$$K_{ij} = \frac{1000\,M}{\rho_{0} \left( {S_{i} - S_{j} } \right)}$$
(13)

where ρ0 is the mean seawater density. The Kij above are in units of m³ s⁻¹, M in kg s⁻¹ and the salinities in psu.
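As a sketch of how (13) might be applied to diagnosed quantities, the helper below converts a gyre salt transport into a gyre coefficient. The function name, the default value of ρ0 and the example numbers are assumptions made for illustration; none are taken from the FAMOUS diagnostics.

```python
def gyre_coefficient(M, S_i, S_j, rho0=1027.0):
    """Eq. (13): gyre coefficient K_ij in m^3 s^-1.

    M     : gyre salt transport across the box boundary (kg s^-1)
    S_i,j : box-mean salinities on either side of the boundary (psu)
    rho0  : mean seawater density in kg m^-3 (assumed default, not from the source)
    """
    dS = S_i - S_j
    if dS == 0:
        raise ValueError("K_ij is undefined when the two box salinities are equal")
    # The factor 1000 converts the salt transport to psu-based units.
    return 1000.0 * M / (rho0 * dS)


# Hypothetical example: a transport of 1e7 kg/s across a 0.5 psu contrast.
K_example = gyre_coefficient(M=1.0e7, S_i=35.5, S_j=35.0)
```

Since K is inversely proportional to the salinity contrast, the diagnosed value becomes poorly constrained when the two boxes have nearly equal salinity.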

The flow constant λ is calculated from (1), after diagnosing q from the GCM as the maximum of the Atlantic overturning streamfunction at 30°S.

The parameters μ and T0 are calibrated by comparison with the North Pacific, a basin without a strong overturning circulation: we diagnose T0 as the mean oceanic temperature of a full-depth box covering the North Pacific and choose μ to balance (12) using the diagnosed values of TN and q. Finally γ, the proportion of the return AMOC flow carried by the cold water path, is chosen in the range 0 ≤ γ ≤ 1 to optimise the model fit to the box average salinities in the GCM control state. We find γ in the range 0.39–0.85 in the cases considered here, somewhat larger than the values diagnosed directly from ocean GCMs by Döös (1995) and Speich et al. (2001). The sensitivity of the AMOC threshold to γ is discussed in Sect. 4. In this paper we calibrate the box model to a number of AOGCM states, discussed below. The resulting parameter values are shown in Table 1.
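The calibration of λ and μ described above amounts to inverting (1) and (12) for the unperturbed state. The following is a minimal sketch under that reading; the function names and the example inputs are hypothetical, while α and β are the values quoted for the linear equation of state.

```python
ALPHA = 0.12  # thermal coefficient, kg m^-3 K^-1
BETA = 0.79   # haline coefficient, kg m^-3 psu^-1


def calibrate_lambda(q, T_S, T_N, S_N, S_S):
    """Invert Eq. (1): lambda = q / [alpha (T_S - T_N) + beta (S_N - S_S)]."""
    drho = ALPHA * (T_S - T_N) + BETA * (S_N - S_S)
    if drho == 0:
        raise ValueError("density difference is zero; lambda is undefined")
    return q / drho


def calibrate_mu(q, T_N, T_0):
    """Invert Eq. (12): mu = (T_N - T_0) / q, with T_0 from the North Pacific box."""
    if q == 0:
        raise ValueError("q is zero; mu is undefined")
    return (T_N - T_0) / q


# Hypothetical control-state values (q in m^3 s^-1, T in deg C, S in psu).
lam = calibrate_lambda(q=15e6, T_S=5.0, T_N=4.0, S_N=35.0, S_S=34.4)
mu = calibrate_mu(q=15e6, T_N=4.0, T_0=3.0)
```

By construction, substituting the calibrated λ and μ back into (1) and (12) reproduces the control-state q and TN exactly, so these two parameters add no freedom to tune the threshold.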

Table 1 Box model parameter values for all calibrations used in this paper

3 AMOC thresholds in the GCM and box model

3.1 Dynamics of the hysteresis

The AMOC hysteresis structure and thresholds were assessed in FAMOUSA in a series of ‘hosing’ experiments by H11. A freshwater flux H was artificially applied to the North Atlantic surface between 20°N and 50°N. The same flux was removed uniformly from the rest of the ocean surface to conserve global salinity. The AMOC response is sensitive to the region to which H is applied (Smith and Gregory 2009), and other regions may be more appropriate if the goal were to simulate, say, additional fresh water discharge from the Greenland Ice Sheet (Swingedouw et al. 2015; Bakker et al. 2016). However our focus here is on elucidating the dynamics of the AMOC thresholds, so we retain a single region of application for consistency with the existing AOGCM experiment.

H was gradually increased at a rate of 5 × 10⁻⁴ Sv/year (1 Sv = 10⁶ m³ s⁻¹), allowing the AMOC to adjust towards equilibrium with the hosing at any time. When H reached 1 Sv (after 2000 years), it was gradually reduced until it reached −0.4 Sv. In the period of increasing hosing, the AMOC collapsed when H reached about 0.55 Sv (Fig. 3c, dotted curve). When H was reduced, the AMOC stayed collapsed, only recovering once H became less than about −0.1 Sv.
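The hosing schedule described above can be written as a simple piecewise-linear function of time. This is an illustrative sketch: the function name and argument names are hypothetical, the ramp-down is assumed to proceed at the same rate as the ramp-up, and holding H at its minimum after the ramp-down is an assumption, not something stated for the AOGCM run.

```python
def hosing(t_years, rate=5e-4, h_max=1.0, h_min=-0.4):
    """Hosing H (Sv) at time t (years): linear ramp up to h_max, then down to h_min.

    rate is in Sv/year. After h_min is reached, H is held constant (assumption).
    """
    t_peak = h_max / rate  # time at which the ramp-up ends
    if t_years <= t_peak:
        return rate * t_years
    return max(h_min, h_max - rate * (t_years - t_peak))


# With the default rate, the ramp-up takes 1.0 / 5e-4 = 2000 years and the
# ramp-down from 1 Sv to -0.4 Sv a further 1.4 / 5e-4 = 2800 years.
H_at_collapse_time = hosing(1100.0)
```

Under this schedule a hosing of 0.55 Sv is reached 1100 years into the ramp-up.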

Even though H is increased and decreased slowly, the experiments do not capture fully equilibrated AMOC solutions. This was shown in H11, which demonstrated that the region of bistable equilibrium solutions in FAMOUSA is narrower than the hysteresis region that appears in response to the slow increase then decrease of H. However in what follows we adopt a pragmatic definition of the ‘AMOC threshold’ as the value Hcrit of the additional freshwater flux H when the AMOC strength first reaches zero in the ‘ramp-up’ phase of the experiment (see dashed lines in Fig. 3c). Further discussion of the response of the box model to time-varying H, including rate-dependent tipping responses, can be found in Alkhayuon et al. (2019).
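The pragmatic threshold definition above can be applied to a time series of the ramp-up phase as follows. This is a sketch only: the function name is hypothetical, and linear interpolation between the last positive and first non-positive AMOC value is our illustrative choice for locating the zero crossing, not a procedure taken from H11 or J17.

```python
import numpy as np


def critical_hosing(H, q):
    """Return H at the first point where the AMOC strength q reaches zero.

    H, q : sequences of hosing and AMOC strength covering the ramp-up phase
           only, in time order. Returns None if q stays positive throughout.
    """
    H = np.asarray(H, dtype=float)
    q = np.asarray(q, dtype=float)
    idx = np.flatnonzero(q <= 0)
    if idx.size == 0:
        return None
    k = idx[0]
    if k == 0:
        return float(H[0])
    # q[k-1] > 0 >= q[k], so the denominator is strictly positive.
    frac = q[k - 1] / (q[k - 1] - q[k])
    return float(H[k - 1] + frac * (H[k] - H[k - 1]))


# Hypothetical series: the AMOC passes through zero midway between 0.4 and 0.6 Sv.
H_crit = critical_hosing([0.0, 0.2, 0.4, 0.6], [15.0, 10.0, 5.0, -5.0])
```

For noisy AOGCM output, applying the function to decadal means (as plotted in Fig. 3c) rather than annual values avoids spurious early crossings.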

The dynamics driving the AMOC thresholds in FAMOUSA are captured by the simple physics of the box model. When the same hosing experiment is performed with the box model calibrated to FAMOUSA, box-average salinities in the regions represented by the box model evolve similarly in FAMOUSA and the box model (Fig. 3a, b). The box model’s AMOC shows hysteresis similar to that in FAMOUSA (Fig. 3c), collapsing at a similar hosing value (0.48 Sv). Together the salinities and AMOC in the box model represent its full state vector. This strongly suggests that the dynamics of AMOC hysteresis in the AOGCM are described to leading order by the dynamics of the box model. This will be confirmed below by a comparison of the box model dynamics with the detailed analysis of the FAMOUSA run by J17.

We note that our measure of the AMOC in AOGCMs is the maximum (negative value) of the overturning streamfunction at 30°S, which has been proposed as the key latitude at which the salinity advection feedback operates (e.g. Rahmstorf 1996; Drijfhout et al. 2011), rather than taking the maximum over the whole Atlantic, or around 30°N, as used by many previous studies. This explains why the FAMOUSA AMOC is negative in the collapsed state in Fig. 3, rather than close to zero as shown in H11 and J17 (whose Fig. 5a shows the maximum streamfunction at 26°N). The collapsed state in FAMOUSA has a reverse overturning cell that is largely confined to the South Atlantic and so not seen in the streamfunction at 26°N (see J17 Fig. 3c or H11 Fig. 1). The use of 30°S gives a tighter and more linear relationship between the density difference and the AMOC (compare Fig. 2a with Fig. 5a of J17, which defines the AMOC at 26°N), and the relationship passes through the origin, whereas if 26°N were used an offset would need to be added to Eq. (1) to obtain a good fit (J17), and it would be hard to calibrate the offset from the un-hosed state alone. The threshold values of H diagnosed for the AOGCM differ little between the two latitudes (compare Fig. 3c with Fig. 2a of J17).

The agreement between box model and AOGCM is particularly good in the initial ‘ramp-up’ part of the hosing experiment, up to the point where the right-hand threshold is crossed (after about 1100 years, Fig. 3), although the decline of the AMOC as H is increased is more gradual in the box model. We show in Sect. 5.3 below that the more gradual AMOC decline in the box model is a consequence of the limited vertical resolution of the box model, with surface fluxes being distributed over the full depth of the boxes. Once the collapsed AMOC state is established, changes in AOGCM water mass structure (see J17) result in larger quantitative differences between the box model and AOGCM solutions. We discuss these differences briefly in Sect. 5.2, but our focus in this paper is primarily on the ‘ramp-up’ stage and the right-hand threshold, as this is the most relevant for assessing the resilience of the current AMOC.

3.2 Detailed dynamics of the ‘ramp-up’ threshold

The AMOC threshold behaviour in the FAMOUSA experiment has been analysed in detail by J17, in terms of the salinity budget of the North Atlantic/Arctic from 40° to 90°N, the same region as the N box in our box model calibration. AMOC changes in FAMOUSA are driven primarily by changes in the salinity component of density in this region. We therefore compare here the salinity budget of the N box (Eqs. 2 and 7) with the corresponding budget in FAMOUSA from J17, as the right-hand threshold is crossed, to obtain a more detailed understanding of how well the box model captures the threshold dynamics of the AOGCM. Having demonstrated very similar dynamics in the box model and AOGCM we exploit the simplicity of the box model to gain further insight into the threshold dynamics.

Figure 4a shows terms in the N box salinity budget for FAMOUSA, during the ‘ramp up’ part of the experiment, adapted from J17. During most of the ramp-up phase the North Atlantic freshens slowly in response to the increasing hosing (red). However the freshening is partly offset by increasing salinification due to advection by the gyre component of the flow, which transports the fresh anomalies out across 40°N (blue). Advection by the overturning component of the flow (green) is remarkably constant for most of the ramp-up phase. However as the threshold is approached (from about 800 years into the run) two factors act to accelerate the freshening. First, atmospheric feedbacks act to increase the surface fresh water flux into the North Atlantic (seen as a slight increase in the slope of the red line in Fig. 4a from about t = 800 years), attributed by J17 to a spin-up of the Pacific MOC and consequent increase in inter-basin atmospheric water transport. Secondly a strong salinity advection feedback begins to operate, leading to a rapid decrease in the salinity advection by the overturning component of the flow (green line). These two processes lead to rapid freshening of the North Atlantic and collapse of the AMOC. The box model does not include the atmospheric feedback on fresh water fluxes since its surface fresh water flux is fixed. So the question arises whether this atmospheric feedback plays a critical qualitative or quantitative role in the AMOC threshold. Figure 4a suggests that the atmospheric feedback (which can be seen more clearly in Fig. 6e of J17) is relatively small.

Fig. 4

Salinity budget terms for the North Atlantic box in years 0–1200, for a FAMOUSA (adapted from J17), b box model. Black: dSN/dt; red: surface flux (including hosing); green: advection by MOC; blue: advection by gyre (FAMOUS)/diffusion by KN (box model). Also shown is the density change due to temperature response to the AMOC, converted into an equivalent salinity change (pink). Average slope lines for years 601–800 and 801–1000 are shown for the surface flux term in a to illustrate the atmospheric water flux feedback. The individual components of the fresh water transport by the MOC, −q(ST − SN), are shown for the box model in c [Red: q (Sv); blue: (ST − SN) (psu × 10); Green: −q(ST − SN) (Sv psu)]

Figure 4b shows the corresponding salinity budget terms for the box model. We see quantitatively similar behaviour to FAMOUSA for all the budget terms, in the first 800 years. The salinity advection by the overturning is again roughly constant. From year 800, the box model surface fluxes do not include the atmospheric feedback described for FAMOUSA above. However the salinity advection by the MOC does decrease from this point in the box model just as in FAMOUSA, leading to AMOC collapse. Hence the atmospheric feedback identified by J17 does not appear to be an essential element in the AMOC collapse, which instead is primarily due to the sudden collapse of the salinity advection by the MOC. However the atmospheric feedback may be expected to hasten the AMOC collapse, as suggested by J17. To confirm this we have rerun the box model with time-varying FN diagnosed from the FAMOUSA run; the value of Hcrit diagnosed with time-varying FN is 0.40 Sv, compared with 0.48 Sv for the constant FN case. The total fresh water input (hosing plus increase in FN) at collapse is approximately the same in both cases, suggesting that the additional water input from the atmospheric feedback behaves simply as an additional hosing.

To elucidate the sudden reduction in the salinity advection by the MOC, we rewrite the salinity advection term in (2) by substituting for q from (1) and reformulating in terms of (ST − SN):

$$\begin{aligned} q\left( {S_{T} - S_{N} } \right) \,& = \, \lambda \left[ {\alpha \, \left( {T_{S} - T_{N} } \right) \, + \, \beta \left( {S_{T} - S_{S} } \right)} \right]\\&\quad\times \, \left( {S_{T} {-} \, S_{N} } \right) \, - \, \lambda \beta \left( {S_{T} - S_{N} } \right)^{2} \end{aligned}$$
(14)

Noting that over the first 800 years, salinity changes are dominated by changes in SN (Fig. 3b), we can approximate ST and SS as constant over this period. As ST − SN increases due to freshening of SN, the −λβ(ST − SN)² term eventually dominates, resulting in the rapid collapse of q(ST − SN).
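The quadratic structure of (14) can be illustrated numerically. The sketch below is not the calibrated box model: it holds TS − TN and ST − SS fixed, and the names and values (`lam`, `alpha`, `beta`, `dT`, `dS_TS`) are illustrative assumptions rather than the Table 1 calibration. Under those assumptions the advection term is a downward parabola in x = ST − SN, with its maximum at x = A/(2β), where A = α(TS − TN) + β(ST − SS), and a zero at x = A/β.

```python
def advection_term(x, lam, alpha, beta, dT, dS_TS):
    """q * (S_T - S_N) from Eq. (14), written in terms of x = S_T - S_N.

    dT    : T_S - T_N, held fixed (an assumption of this sketch)
    dS_TS : S_T - S_S, held fixed (the approximation used in the text)
    """
    A = alpha * dT + beta * dS_TS
    return lam * A * x - lam * beta * x ** 2


def peak_location(alpha, beta, dT, dS_TS):
    """Value of x = S_T - S_N at which the parabola reaches its maximum."""
    return (alpha * dT + beta * dS_TS) / (2.0 * beta)


# Illustrative values only (hypothetical, not the Table 1 calibration).
params = dict(alpha=0.12, beta=0.79, dT=4.0, dS_TS=0.5)
lam = 2.0e7

x_peak = peak_location(**params)
for x in (0.5 * x_peak, x_peak, 1.5 * x_peak, 2.0 * x_peak):
    value = advection_term(x, lam, **params)
    print(f"x = {x:.3f}   q*(S_T - S_N) = {value:.3e}")
```

Near the maximum the term changes little as x grows, and beyond it the decline steepens with every further increment of freshening, which is the behaviour described for the green curves in Fig. 4.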

Note that −q(ST − SN), the fresh water transport across 40ºN by the MOC, is the equivalent at 40ºN of the diagnostic commonly associated with AMOC stability through a linear salinity advection feedback argument [often referred to as MOV or FOV, e.g. Rahmstorf (1996); Mecking et al. (2017)]. We will use the notation LMOV to denote MOV at latitude L, where necessary for clarity. The linear feedback argument requires LMOV to be negative at latitude L for the salinity advection feedback to become positive/destabilising at that latitude. However, as pointed out by Sijp (2012), what is important for stability is not MOV but ∂MOV/∂q; positive ∂MOV/∂q implies a negative (stabilising) feedback. In the initial phase (years 0–800), decreases in q are offset by increases in (ST − SN) as the hosing freshens the North Atlantic (Fig. 4c). So although 40NMOV is negative in the initial state, the net salinity advection feedback ∂(40NMOV)/∂q is approximately zero until the (ST − SN)² term begins to dominate around year 800.

3.3 The ‘ramp up’ threshold in other AOGCM states

To test the ability of the box model to provide quantitative insight into the position of the right-hand threshold, we have performed two new hosing experiments with FAMOUS. For these we use the more recent model version FAMOUSB. The baseline state for the first new experiment is the basic FAMOUSB model spun up from rest with pre-industrial CO2 (Smith 2012), while for the second experiment CO2 is doubled from pre-industrial values and the model is spun up for 920 years to adjust to the higher CO2 forcing. We then repeat the hosing experiments, starting from these two new baseline states. The first of these experiments is identical to the experiment of H11, except for the use of FAMOUSB rather than FAMOUSA, while the second experiment, also using FAMOUSB, starts from a different climate state representing a climate with increased greenhouse gas concentrations.

First we repeat the ‘ramp up’ part of the hosing experiment using FAMOUSB, with preindustrial CO2. The model change from FAMOUSA to FAMOUSB results in a reduction of Hcrit by about 0.1 Sv (Fig. 5a). This change is captured by the box model when calibrated to the different climate states of the two FAMOUS versions (Fig. 5b), providing further confidence in the box model. The different box model parameters for the FAMOUSA and FAMOUSB states are shown in Table 1.

Fig. 5
AMOC thresholds in preindustrial and increased CO2 simulations. AMOC strength as function of hosing applied in transient experiments from various near-equilibrated CO2 states. Only the ‘ramp-up’ part of the experiment (hosing increasing up to 1.0 Sv) is shown. a FAMOUSA at pre-industrial CO2 (black), FAMOUSB at pre-industrial (blue) and 2 × CO2 (brown); b box model calibrated to the three FAMOUS runs shown in a; c box model calibrated to HadGEM2-AO at preindustrial (blue), 2 × CO2 (brown) and 4 × CO2 (red); d box model calibrated to Smith et al. (2007) ocean reanalyses for the decades 1979–1989 (black), 1989–1999 (cyan), 2000–2009 (blue)

As a further test of the ability of the box model to estimate Hcrit for different ocean states, we have rerun the FAMOUSB hosing experiment, but now starting from a state reached after 920 years of integration at twice preindustrial CO2. We find that around 0.35 Sv more freshwater input is needed to shut down the AMOC in the 2 × CO2 state, compared with the pre-industrial state (Fig. 5a). The same simulation is done with the box model, re-calibrated to the un-hosed 2 × CO2 state of FAMOUSB. The box model response to increased CO2 is qualitatively similar to that of FAMOUSB, with 0.23 Sv more hosing required than in the preindustrial state (Fig. 5b).

Overall the box model, when calibrated to different AOGCM states, appears to provide quantitative information on the value of Hcrit. This implies that large scale, emergent properties of the unperturbed ocean state contain enough information to constrain Hcrit. The simplicity of the box model allows us to understand the key factors and processes that determine Hcrit, and we pursue this in Sect. 4 through a set of parameter sensitivity studies.

4 Parameter sensitivity of the box model

In this section we examine the sensitivity of the ‘ramp-up’ threshold Hcrit to changes in individual box model parameters, and provide a physical interpretation of those sensitivities. We then discuss whether the fresh water transport by the AMOC in the baseline state (MOV) is a good predictor of the value of Hcrit, and assess the impact of the parameter changes seen at increased CO2.

4.1 Parameter sensitivity of the threshold

Figure 6a shows the value of hosing Hcrit at which q crosses zero in the ramp-up phase, as a function of the various box model parameters. Each parameter is varied individually with other parameters held fixed at their baseline values for the FAMOUSA experiment. Most parameters have been set to zero, one half and two times their baseline values, except where this did not make physical sense. We also varied the strength of the global atmospheric water cycle by simultaneously scaling all the surface fresh water fluxes Fi by 0.5 and 1.5 (thus maintaining zero global mean flux in each case).
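The one-at-a-time design described above can be sketched as a small loop. This is only an outline of the experimental design: `one_at_a_time`, `placeholder_model` and the parameter names `a` and `b` are hypothetical, and the placeholder function stands in for "run the ramp-up experiment and return Hcrit"; it is not the box model and its outputs carry no physical meaning.

```python
def one_at_a_time(model, baseline, scalings):
    """Run `model` once per (parameter, scale factor) pair, changing only
    that parameter and leaving all others at their baseline values."""
    results = {}
    for name, factors in scalings.items():
        for factor in factors:
            params = dict(baseline)              # fresh copy of the baseline
            params[name] = factor * baseline[name]
            results[(name, factor)] = model(params)
    return results


# Placeholder standing in for the full ramp-up experiment.
# It is NOT the box model; it only lets the loop above be exercised.
def placeholder_model(params):
    return params["a"] + 2.0 * params["b"]


baseline = {"a": 1.0, "b": 3.0}                  # hypothetical parameter set
scalings = {"a": (0.0, 0.5, 2.0), "b": (0.5, 2.0)}

sweep = one_at_a_time(placeholder_model, baseline, scalings)
for key, value in sweep.items():
    print(key, value)
```

Copying the baseline dictionary for every run ensures that each result reflects a change in a single parameter, which is what allows the sensitivities to be interpreted individually.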

Fig. 6
Sensitivity of Hcrit to box model parameters. a Sensitivity of Hcrit to changes in the values of a single box model parameter, relative to a baseline state calibrated to the FAMOUSA AOGCM experiment. The baseline parameter values are given in Table 1, and the parameter changes are shown along the horizontal axis as a proportion of the baseline value. b For the same box model parameter sensitivity experiments as in a, sensitivity of Hcrit to the value of the fresh water transport by the AMOC (Sv) in the un-hosed state, for the three diagnostics NOV (short dashed, left), TOV (long dashed, right) and BOV (solid, centre)

The physical mechanisms of the different parameter sensitivities during the ramp-up phase can be understood in terms of the analysis of the fresh water budget of the North Atlantic (N box) in Sect. 3 above. Rewriting Eq. (1) as

$$q \, = \, \lambda \, \left[ {\alpha \left( {T_{S} - T_{0} } \right) \, + \, \beta \left( {S_{N} - S_{S} } \right)} \right]/\left( {1 \, + \, \lambda \alpha \mu } \right)$$
(15)

we see that the temperature driving of the flow is constant in time (and positive, Table 1). Figure 3a shows that the salinity driving is also initially positive (SN > SS), and that the freshening of SN is much greater than variations in SS during the ramp-up phase. As the hosing increases, SN eventually becomes less than SS (Fig. 3a) and the salinity driving becomes sufficiently negative to counteract the temperature driving, giving q = 0. We use this framework to interpret the parameter sensitivities in the following.
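As a minimal sketch of this framework, Eq. (15) can be evaluated directly and solved for the N box salinity at which q = 0. The function names and all numerical values below are assumptions made for illustration (they are not the Table 1 calibration), and SS is treated as fixed.

```python
def overturning(S_N, S_S, lam, alpha, beta, mu, dT0):
    """AMOC strength q from Eq. (15); dT0 stands for T_S - T_0."""
    return lam * (alpha * dT0 + beta * (S_N - S_S)) / (1.0 + lam * alpha * mu)


def critical_S_N(S_S, alpha, beta, dT0):
    """N box salinity at which the bracket in Eq. (15) vanishes (q = 0)."""
    return S_S - alpha * dT0 / beta


# Hypothetical values in arbitrary but consistent units.
alpha, beta = 0.12, 0.79
lam, mu = 2.0e7, 5.0e-8
S_S, dT0 = 34.5, 3.0

S_N_crit = critical_S_N(S_S, alpha, beta, dT0)
print("S_N at which q = 0:", S_N_crit)
print("q at that salinity:", overturning(S_N_crit, S_S, lam, alpha, beta, mu, dT0))
```

In this form the salinity at which q vanishes depends on SS, TS − T0 and the ratio α/β, but not on λ or µ, which only rescale q.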

KN:

Higher values of KN result in a larger Hcrit. As KN increases there is an increasingly strong negative feedback through salting of the N box by the gyre term as SN freshens, counteracting and delaying the positive salinity advection feedback due to advection by the MOC (the λβ(ST − SN)² term in (14)). This can be seen by comparing the N box salinity budget in the case where KN = 0 (Fig. 7a) with the corresponding figure in the baseline case (Fig. 4b). Without the negative feedback from KN the salinity advection feedback is much sharper (green line), leading to an earlier and more abrupt collapse of the AMOC. A similar sensitivity has recently been reported in simulations of the Last Glacial Maximum using the UVic intermediate complexity climate model (Muglia et al. 2018): applying the stronger North Atlantic wind stress typical of the LGM (equivalent to increasing the gyre strength and hence KN) results in a stronger fresh water perturbation being required to shut down the AMOC.

Fig. 7
N box salinity budget for selected box model parameter sensitivity tests relative to the baseline FAMOUSA calibration: a KN = 0, b KS = 2 × baseline value, c KIP = 0.3 × baseline value. Legend as for Fig. 4b

KS:

Larger values of KS result in a smaller Hcrit. Increasing KS increases SS, and so reduces (SN − SS) in the un-hosed state. Hence less freshening of SN is needed to bring q to zero. This can be seen in Fig. 7b, which shows the case with doubled KS. The cases of doubled KS and zero KN (Fig. 7a) therefore result in similar values of Hcrit but for different physical reasons.

KIP:

Larger values of KIP result in a smaller Hcrit. This sensitivity is the only one where we find significant nonlinearity: it is particularly strong at low values of KIP because as KIP becomes small the only mechanism available to balance the net evaporation from the Indo-Pacific in (5) is the advective flux convergence (1 − γ)q(SB − SIP). So as q decreases SIP must increase rapidly to maintain the same advective flux convergence. This can be seen in the different evolution of SIP in runs with low and high KIP (Fig. 8). For low KIP, the rapid increase of SIP results in a negative feedback on q: weakening q results in saltier Indo-Pacific water, which then enters the Atlantic via the warm water path. This negative feedback from the warm water path swamps the more commonly emphasised positive salinity advection feedback (e.g. Rahmstorf 1996); the positive feedback results from advection of the mean salinity by the anomalous flow (q′‹S›), whereas the negative feedback that we identify here results from advection of anomalous salinity by the mean flow (‹q›S′, Sijp 2012). Advection of anomalous salinity was also found to make a significant contribution to the natural internal variability of MOV and the AMOC in two modern AOGCMs by Cheng et al. (2018). In the low KIP situation it is likely that the consequent large increase in SIP (Fig. 8a) would result in changes to the Indo-Pacific circulation (e.g. the Pacific MOC, see J17), with possible oceanic or atmospheric feedbacks that are not included in the box model. So the strong sensitivity to KIP seen here may to some extent be an artefact of the limited Pacific Ocean and atmospheric processes in the box model.

Fig. 8
Box model salinity evolution over the ramp-up stage in the parameter sensitivity studies for a KIP = 8.9778 Sv (0.1 × baseline value) and b KIP = 179.556 Sv (2 × baseline value)

TS − T0:

Larger values imply stronger temperature driving of the flow. Hence greater freshening of SN (stronger hosing) is needed before the salinity gradient is strong enough to counteract the temperature gradient in (15).

µ:

In this case as µ was varied, TS − T0 was adjusted to keep the same value of q in the baseline state. Larger values of µ imply larger values of TS − T0, and hence the same sign of sensitivity as was seen to TS − T0. If µ is instead changed without adjusting TS − T0, there is virtually no sensitivity of Hcrit to µ, since the amount of North Atlantic freshening (hosing) required to bring the density gradient to zero in (15) is not directly changed. Thus the apparent sensitivity to µ is mostly due to sensitivity to the invariant part of the temperature gradient TS − T0.

λ:

The sensitivity is weak because a change in λ does not directly change the North Atlantic freshening (hosing) needed to bring the N–S density difference to zero in (15). Although increased λ produces a stronger baseline flow, there is a balancing change in the amount that q changes for a given density change.

η:

Sensitivity to η is weak. η effectively relaxes SS toward the salinity of the large deep water reservoir SB, resulting in the small variation in SS seen in the baseline experiment (Fig. 3a). For small η, SS is free to vary more in response to advection by the changing q, but these salinity variations are simply advected around the CWP and cause corresponding changes in ST and SN. So the overall variations in (SN − SS) in (15) are not much different from the baseline case.

γ:

Larger values of γ result in smaller values of Hcrit. Large values of γ imply a dominant CWP. In this case the Atlantic is fresher and the Southern Ocean saltier than in the low γ (WWP) case. In terms of (15), (SN − SS) begins at a lower value and so less freshening is required to reverse the density gradient.

Fi:

Here all the surface fresh water fluxes are scaled by a factor of 0.5 or 1.5, maintaining zero global mean flux in each case. A stronger mean hydrological cycle results in a larger initial salinity difference (SN − SS) in (15). Hence more hosing is needed to reverse the density gradient, and larger fresh water fluxes result in a larger Hcrit.

Overall, we see that Hcrit is sensitive to many of the box model parameters, including those involving the thermohaline forcing (TS − T0, Fi, µ), and those involving wind-driven gyre exchange (Ki). It is perhaps surprising (but explained by the analysis above) that the sensitivity to parameters involving internal dynamics of the AMOC (λ, γ, η) is relatively weak. The parameter sensitivity is generally linear in the range considered, except for KIP, where the strong nonlinearity at low values may be a consequence of the simplicity of the box model dynamics.

4.2 Role of the AMOC fresh water transport MOV

The fresh water transport into the Atlantic basin across the southern boundary of the basin (around 34°S) by the AMOC itself (often denoted MOV or FOV) has been proposed as an important diagnostic of AMOC bi-stability at equilibrium, with negative MOV implying that the AMOC is in a bi-stable regime, and positive MOV implying a mono-stable AMOC (Rahmstorf 1996; deVries and Weber 2005; Mecking et al. 2017). MOV also plays a role in the transient response of the AMOC to hosing: modifying MOV by applying flux adjustments at the Southern boundary or throughout the Atlantic can change the response of the AMOC in AOGCM hosing experiments (Cimatoribus et al. 2012; Jackson 2013; Liu et al. 2017). The sign of MOV has been associated with the sign of the salinity advection feedback, with positive MOV implying a negative (stabilising) feedback and negative MOV implying a positive (destabilising) feedback on AMOC changes (Stommel 1961; Rahmstorf 1996). However the relationship between the role of MOV in AMOC bistability (a property of the equilibrium state) and the salinity advection feedback (a transient process) is unclear.

The role of MOV in AMOC feedbacks and stability was shown by Sijp (2012) to be more complicated than the above advection feedback argument. In the standard argument a negative MOV at a given latitude implies that the AMOC is removing fresh water from the Atlantic basin north of that latitude. A weakening of the AMOC leads to less fresh water removal and hence a fresher Atlantic basin and further AMOC weakening. This feedback focuses on fresh water transport anomalies arising from advection of the mean salinity field by the anomalous flow (q′‹S›); however as noted by Sijp (2012), advection of salinity anomalies by the mean flow (‹q›S′) can also be an important term, is stabilising whatever the sign of MOV in the un-hosed state, and can be larger than the first term. A compensation between these two terms can be seen (for MOV at 40ºN) in Fig. 4c. Further, the gyre/eddy components of fresh water transport are always down-gradient and are expected to be stabilising. Hence there are both stabilising and destabilising feedbacks, and a stable AMOC is possible even when MOV < 0, as is believed to be the case in the real present-day ocean.
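The split between the two advection terms is an exact algebraic identity for a product, and a short sketch makes the compensation concrete. The function name and the numbers are hypothetical: `dS` stands for the salinity difference carried by the overturning (for example ST − SN), and the reference and perturbed states are chosen purely so that the two leading terms nearly cancel.

```python
def decompose_transport_anomaly(q0, dS0, q1, dS1):
    """Split the change in q * dS between a reference state (q0, dS0)
    and a perturbed state (q1, dS1) into three contributions."""
    dq = q1 - q0
    ddS = dS1 - dS0
    return {
        "anomalous_flow": dq * dS0,       # q' <S>: anomalous flow, mean salinity
        "anomalous_salinity": q0 * ddS,   # <q> S': mean flow, anomalous salinity
        "nonlinear": dq * ddS,            # q' S': product of the two anomalies
    }


# Hypothetical states: q weakens from 15 to 12 while dS grows from 0.5 to 0.625.
terms = decompose_transport_anomaly(q0=15.0, dS0=0.5, q1=12.0, dS1=0.625)
total = sum(terms.values())

for name, value in terms.items():
    print(f"{name:20s} {value:+.3f}")
print(f"{'total change':20s} {total:+.3f}")
```

In this constructed example the weakening flow and the growing salinity contrast have opposite effects on q·dS, so the product is unchanged even though each individual term is substantial.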

Given the theoretical importance of and interest in MOV as a diagnostic of AMOC bi-stability, we ask whether MOV in the un-hosed state contains any information about the distance of the AMOC from the right hand stability threshold, Hcrit. This distance does not a priori depend on whether the unperturbed AMOC is in a mono- or bi-stable regime. Our box model does not contain a physical boundary at 34°S, so we examine three alternative definitions of the fresh water transport by the AMOC into the Atlantic basin:

$$N_{OV} = \, - q \, \left( {S_{T} - S_{N} } \right)/S_{0}$$
(16)

is the transport into the N box (equivalent to the value of MOV at around 40°N in FAMOUS, and close to the North Atlantic region used for analysis of the FAMOUSA run in J17);

$$T_{OV} = - q \left[ \gamma S_{S} + ({1 - \, \gamma })S_{IP} - S_{N} \right]/S_{0}$$
(17)

is the transport into the combined T and N boxes (North Atlantic above the NADW layer); and

$$B_{OV} = \, - q \, \left[ {\gamma \left( {S_{S} {-}S_{B} } \right) \, + \, \left( {1 - \, \gamma } \right)\left( {S_{IP} {-} \, S_{B} } \right)} \right]/S_{0}$$
(18)

is the transport into the combined T, N and B boxes (whole Atlantic plus the global NADW/CDW water mass). BOV is the closest box model equivalent to the conventional 34SMOV, if we assume that the southward transport across 34°S is qSB. The first term on the right hand side is positive, representing northward fresh water transport by the CWP, and the second term is negative, representing southward transport by the WWP.
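The three diagnostics in (16)–(18) translate directly into code. The sketch below simply transcribes the equations; the function names and the box salinities, q, γ and S0 used in the example are hypothetical values chosen for illustration, not a calibrated state.

```python
def n_ov(q, S_T, S_N, S0):
    """Eq. (16): overturning fresh water transport into the N box."""
    return -q * (S_T - S_N) / S0


def t_ov(q, S_S, S_IP, S_N, gamma, S0):
    """Eq. (17): transport into the combined T and N boxes."""
    return -q * (gamma * S_S + (1.0 - gamma) * S_IP - S_N) / S0


def b_ov(q, S_S, S_IP, S_B, gamma, S0):
    """Eq. (18): transport into the combined T, N and B boxes."""
    cwp = gamma * (S_S - S_B)             # cold water path contribution
    wwp = (1.0 - gamma) * (S_IP - S_B)    # warm water path contribution
    return -q * (cwp + wwp) / S0


# Hypothetical un-hosed state (q in Sv, salinities in psu).
q, gamma, S0 = 15.0, 0.4, 35.0
S_N, S_T, S_S, S_IP, S_B = 35.0, 35.4, 34.4, 34.8, 34.7

print("N_OV =", n_ov(q, S_T, S_N, S0))
print("T_OV =", t_ov(q, S_S, S_IP, S_N, gamma, S0))
print("B_OV =", b_ov(q, S_S, S_IP, S_B, gamma, S0))
```

Comparing (17) and (18) term by term shows that BOV and TOV differ only by q(SB − SN)/S0, so the two diagnostics can differ in sign as well as magnitude for the same state.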

The dependence of Hcrit on the un-hosed value of NOV, TOV and BOV, for the box model parameter sensitivity experiments described above, is shown in Fig. 6b. We see that none of these diagnostics has a clear relationship with Hcrit overall. This is unsurprising given the variety of mechanisms by which parameter changes result in changes in Hcrit, as discussed in Sect. 4.1. For example, the sensitivity of Hcrit to KN is a consequence of changes in NOV (see discussion in Sect. 4.1 and Fig. 7a), and the ‘expected’ relationship between Hcrit and NOV (i.e. larger Hcrit as NOV increases) is seen in Fig. 6b. On the other hand, the sensitivity of Hcrit to KIP is primarily due to changes in the salinity of the Indo-Pacific water (Sect. 4.1), and we see large changes in Hcrit in response to changes in KIP, despite only small changes in the un-hosed value of any of NOV, TOV and BOV (Fig. 6b).

Overall we conclude that while the advection of fresh water by the AMOC (quantified by MOV) plays an important role in the stability of the AMOC, the distance of the unperturbed AMOC from the threshold (Hcrit) is sensitive to a number of processes, so that the unperturbed value of MOV does not in itself provide a reliable indicator of Hcrit.

4.3 Parameter changes at increased CO2 concentration

Comparing the two FAMOUSB experiments with pre-industrial and doubled CO2, we see that increased CO2 results in an increase in Hcrit by several tenths of a Sverdrup. The different box model parameters for the two states are given in Table 1, and we have performed further box model parameter sensitivity studies changing each of these parameters individually from its 1 × CO2 to its 2 × CO2 value, to determine the main causes of the threshold shift under increased CO2. From these sensitivity studies we find that the dominant factors contributing to the increase in Hcrit are:

  (a) An increase in the average temperature difference between the North Pacific and the S box, TS − T0. This causes an increase in Hcrit of 0.16 Sv.

  (b) An increase in the overall strength of the global water cycle, particularly an increase in net Atlantic evaporation −(FN + FT). This causes an increase in Hcrit of 0.12 Sv.

  (c) Changes in the efficiency of the ‘gyre’ freshwater transports in the Atlantic (KS, KN). These roughly cancel, leaving an overall increase in Hcrit of 0.02 Sv.

The enhanced atmospheric water cycle at increased CO2 (b) is a robust feature of climate model simulations (Collins 2013). The increase in TS − T0 (a) is also likely to be a robust result: most of the ocean warming occurs in the upper layers (cf. Gregory 2000; Landerer et al. 2007), so for the same change in heat content the box-mean temperature TS (covering only the top 1000 m or so of the ocean) changes more than T0 (for which a full-depth North Pacific box is used). Changes in gyre transports (c) are less well understood.

To explore whether the increase in Hcrit with increasing CO2 is likely to be robust, we have calibrated the box model to the more recent (CMIP5-generation) AOGCM HadGEM2-AO (Martin et al. 2011), in quasi-equilibrium states with 1×, 2× and 4× pre-industrial CO2, and performed hosing experiments to determine Hcrit. Parameter values for these three calibrations are given in Table 1. For HadGEM2-AO we find that Hcrit increases by 0.27 Sv and 0.43 Sv at 2 × and 4 × CO2 respectively, compared to the 1 × CO2 state (Fig. 5c). As was seen for FAMOUSB, a strengthened fresh water cycle (b) and increased temperature driving (a) both contribute to the increase in Hcrit; however for the HadGEM2-AO calibrations, increases in KN dominate the changes in the ‘gyre’ components (c), and make a large contribution to the increase in Hcrit. Changes to gyre exchange are less well understood than the other factors above, so more uncertainty remains about this contribution. We also see a flattening of the response curve, with a less sharp threshold at higher CO2 in HadGEM2-AO but not in FAMOUSB. Through single-parameter perturbation experiments (not shown), we find that the flattening is due to the increase of KN at higher CO2 in HadGEM2-AO.

5 Limits of traceability

An advantage of our box modelling approach is that since all the box model state variables and control parameters can be diagnosed directly from GCM solutions (and in principle from observations), the box model provides a low order dynamical framework to analyse the GCM; we can examine discrepancies between the box model and GCM solutions directly, and so understand where the box model breaks down. Indeed we used this process in the development of the box model. For example an earlier, four-box version of the model treated the N and B boxes as a single box. While this provided solutions that were qualitatively similar to the GCM, quite large quantitative discrepancies arose, and diagnosis of the discrepancies pointed to the relationship between density and circulation strength (1), which was not as tight as in Fig. 2a when the density of the merged N and B boxes was used rather than the N box alone. In this section we examine aspects of the solution where quantitative agreement between box model and GCM solutions remains less good, and diagnose the reasons behind these discrepancies.

5.1 Atmospheric fresh water feedbacks

As discussed in Sect. 3 above and in J17, the climate variations associated with AMOC changes through the FAMOUSA hosing experiment result in a slight increase in the surface fresh water flux into the North Atlantic, which accelerates the AMOC weakening. This atmospheric feedback is not included in our box model but by re-running the box model using the time-dependent surface fluxes diagnosed from the FAMOUSA run we assessed that the atmospheric feedback reduces the value of Hcrit by about 0.08 Sv in FAMOUSA. In principle the atmospheric feedback could be parametrised in the box model. However, when we assessed the impact of the feedback in the same way for the FAMOUSB 2 × CO2 run we found that in this case it resulted in an increase in Hcrit (again by around 0.08 Sv). This suggests that the atmospheric feedback on fresh water flux may be noisy and/or difficult to parametrise, so we do not attempt this here but rather consider it an error term in the box model leading to an uncertainty of ± 0.08 Sv in Hcrit as estimated by the box model.

5.2 Left hand threshold

We note that in Fig. 3 the left hand (‘ramp down’) threshold appears to be less accurately captured than the right hand (‘ramp up’) threshold. This can be understood as an inherent limitation of the box model, based on the analysis of FAMOUSA by J17. J17 interpreted the AMOC recovery in the ramp-down phase in terms of the North Atlantic salinity budget, as for the ramp up phase. The AMOC-off state and ramp down phase are characterised by a weak reverse overturning circulation (− 4 Sv at 26°N), and the recovery is driven by advection of salinity anomalies by this circulation. However in the South Atlantic the reverse overturning circulation in the off state is much stronger (− 8 Sv, see Fig. 3 and J17 Fig. 3c). The box model does not differentiate between the AMOC in the North and South Atlantic, and its ‘off’ state has a strong reverse circulation (− 14 Sv) which extends into the North Atlantic boxes, introducing quantitative errors in the salinity advection feedbacks there (note the stronger salinity advection term in the box model than in FAMOUSA during the ramp-down phase, green lines in Fig. 9a, b). We conclude that the box model is more quantitatively accurate for the ‘ramp up’ threshold (which is the threshold of most direct interest for future changes), and that the quantitative errors in the ‘ramp down’ threshold are structural errors that could only be reduced by the addition of extra complexity in the box model (providing meridional structure in the reversed MOC cell).

Fig. 9
As Fig. 4, but for the ramp-down phase from year 2000 (H = 1.0 Sv) to year 4800 (H = − 0.4 Sv)

5.3 Sensitivity to the method of applying fresh water perturbations

In our baseline FAMOUSA hosing hysteresis experiment, as analysed by H11 and J17, the hosing is compensated by an opposite surface fresh water extraction over the rest of the ocean surface, to maintain zero global mean fresh water flux (this experiment is called ‘SCOMP’ in J17). J17 also analyse an alternative FAMOUSA experiment in which the hosing is compensated by fresh water extraction distributed over the entire ocean volume (designated ‘VCOMP’). The VCOMP experiment behaves somewhat differently to SCOMP, showing:

  (a) A more gradual weakening of the AMOC in VCOMP during the ramp-up phase, although the value of Hcrit is similar to SCOMP. J17 attribute this difference to increased near-surface salinities in the subtropical Atlantic in SCOMP (due to the surface hosing compensation) being advected northwards by the MOC (‹q›S′, where ‹ › denotes the unhosed state and a prime denotes departures from it) and so counteracting the freshening effect of the Stommel advection feedback (q′‹S›). In VCOMP the near-surface salinity increase is not present, as the compensation is distributed through the water column, so the ‹q›S′ term is smaller and the AMOC weakens more gradually as H increases (compare the total fresh water advection by the MOC in FAMOUSA, green curves in Figs. 4a (SCOMP) and 10a (VCOMP)).

    Fig. 10
    AMOC hysteresis in the VCOMP version of FAMOUSA and the corresponding box model. Shown in a, b are the FAMOUSA and box model salinity budgets for the N box in the ramp-up phase (cf. Fig. 4a, b for SCOMP), while c shows the whole hysteresis loop (red), with the corresponding loop from the SCOMP run in black dashed (reproduced from Fig. 3c)

  (b) The left hand (ramp-down) threshold occurs at a much higher value of H in VCOMP, resulting in a very narrow hysteresis region in the ramp-up/ramp-down experiment, and possibly an almost completely monostable AMOC when more equilibrated solutions are considered (J17 Fig. 2b). This is attributed by J17 to the different South Atlantic reverse cells in the ‘off’ state in SCOMP and VCOMP.

We have emulated the VCOMP experiment in the box model by distributing the hosing compensation over the whole box model volume. We find only small differences from the box model SCOMP solution in the hysteresis loop and in the detail of the salinity budgets (Fig. 10, compare with Figs. 3c and 4b). We attribute the lack of impact on the sharpness of the threshold ((a) above) to the limited vertical resolution of the box model: a change in surface flux into the T box in the box model is necessarily spread over a depth of around 1000 m, limiting the surface-intensified ‹q›S′ feedback which delays AMOC weakening in FAMOUS. In fact this difference explains why the standard SCOMP box model solution has a more gradual AMOC reduction than seen in FAMOUS (Fig. 3c); in this respect the box model SCOMP solution is intermediate between the FAMOUS SCOMP and VCOMP solutions. This limited vertical resolution is a fundamental structural bias in the box model, when used to emulate SCOMP-type hosing experiments. Turning to the differences (b) between the left-hand thresholds in VCOMP and SCOMP, we have already noted in Sect. 5.2 that the ‘off’ state involves changes in the inter-hemispheric structure of the MOC that are not represented by the box model, so it is not surprising that these differences found in FAMOUSA by J17 are not present in the box model ramp-down phase.

5.4 Discussion of differences between box model and FAMOUS solutions

Overall we conclude that the box model tends to under-estimate the FAMOUS Hcrit by around 0.1–0.2 Sv. Some of this bias is attributable to the lack of feedbacks through atmospheric fresh water fluxes (Sect. 5.1), and some to the limited vertical resolution of the box model, which reduces a stabilising advection feedback in the SCOMP experiment (Sect. 5.3). However the box model does include the primary driver of the rapid MOC decline near the ramp-up threshold, namely the quadratic dependence of the salinity advection by the MOC on the North Atlantic salinity itself. This means that the box model is able to pick up the qualitative (and to some extent quantitative) differences in Hcrit between different ocean states, and provide a simple framework to understand the main factors determining Hcrit.

The box model also produces a more gradual AMOC decline in the ramp-up phase than is seen in the surface-compensated FAMOUS hosing experiments (SCOMP). This reflects the limited vertical resolution of the box model (Sect. 5.3).

By calibrating the box model to different decades in FAMOUS (not shown) and in an ocean reanalysis (Fig. 5d), we estimate an additional uncertainty in the right-hand threshold position of at least ± 0.04 Sv due to decadal ocean variability in the calibration variables.

The quantitative biases are greater for the left-hand (ramp-down) threshold, due to water mass reorganisations in the FAMOUS off state that are not captured by the limited vertical and hemispheric resolution of the box model. However, the qualitative similarity between Fig. 9a, b suggests that the box model may still provide useful qualitative insights into the dynamics of the left-hand threshold.

6 Discussion and conclusions

Our results show that the AMOC threshold and hysteresis behaviour in the FAMOUS AOGCM is controlled by low order dynamics, as represented by a 5-box dynamical model. The agreement between the box model and FAMOUS is particularly good for the ‘ramp-up’ threshold, which is the most relevant for future climate change. The box model parameters are determined by calibration to the baseline (un-hosed) ocean state, implying that the current ocean state contains sufficient information to estimate how far it is from threshold behaviour (e.g. in response to future fresh water input from the Greenland ice sheet).

The simplicity of the box model allows us to identify the factors in the ocean state that determine the position of the threshold Hcrit. Because the overturning is strongly correlated with the North Atlantic density, we focus here on the salinity budget of the North Atlantic rather than the whole Atlantic basin, following Jackson et al. (2017). As in many previous studies, the approach to the threshold is dependent on the ‘salinity advection feedback’, which involves a quadratic dependence of the AMOC on the North Atlantic salinity (Eq. 14). However, the exact value of Hcrit depends on a balance between the salinity advection feedback and other processes. The un-hosed (‘present day’) value of MOV at either the southern boundary of the Atlantic or in the northern subtropical Atlantic is not in itself a good predictor of Hcrit. Other factors often play more important roles in determining Hcrit, including the overall strength of the surface fresh water fluxes (hydrological cycle), the strength of the temperature driving of the flow, and the strength of the ‘gyre’ (i.e. non-AMOC) exchanges between the different water masses.
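As a purely illustrative aside, the way a quadratic salinity advection term produces a hosing threshold and hysteresis can be sketched with a textbook one-variable, nondimensional Stommel-type model. This is an assumption made for illustration only: it is not the 5-box model used here, it does not reproduce Eq. 14, and its numbers carry no physical units. In the sketch, S is a scaled salinity contrast, the overturning is q = 1 − S (thermal driving fixed at 1), and the hosing F enters the salt budget dS/dt = F − |q|S. All function and variable names are hypothetical.

```python
def overturning(S):
    """Nondimensional overturning: fixed thermal driving (1) minus haline braking (S)."""
    return 1.0 - S


def ramp(F_values, S0, dt=0.05, steps_per_value=400):
    """Step the hosing F through F_values, integrating S with forward Euler.

    At each hosing value the model is advanced for steps_per_value * dt time
    units, mimicking a slow (quasi-equilibrium) hosing ramp.
    """
    S = S0
    q_out = []
    for F in F_values:
        for _ in range(steps_per_value):
            q = overturning(S)
            # Salt advection |q| * S = |1 - S| * S is quadratic in S.
            S += dt * (F - abs(q) * S)
        q_out.append(overturning(S))
    return q_out, S


# Hosing ramp: 0 -> 0.4 (ramp-up), then 0.4 -> 0 (ramp-down).
F_up = [0.4 * i / 200 for i in range(201)]
F_down = F_up[::-1]

q_up, S_end = ramp(F_up, S0=0.0)        # start from the un-hosed 'on' state
q_down, _ = ramp(F_down, S0=S_end)      # continue from the collapsed state
q_down = q_down[::-1]                   # reorder so q_down[i] matches F_up[i]

# First hosing value on the ramp-up at which the overturning has reversed.
F_collapse = next(F for F, q in zip(F_up, q_up) if q < 0.0)

print("ramp-up collapse near F =", F_collapse)
print("q at F = 0.15 (ramp-up, ramp-down):", q_up[75], q_down[75])
```

In this toy model the steady 'on' branch satisfies F = S(1 − S), which has no solution beyond F = 1/4, so the ramp-up collapses just past that value (slightly delayed by the finite ramp rate), while the ramp-down stays on the reversed branch until F returns to about zero. The bistable range between those two values is the analogue of the hysteresis loop discussed above; the additional processes in the 5-box model (surface fluxes, temperature driving, gyre exchanges) shift where the threshold actually lies.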

In our FAMOUS run with increased CO2 concentrations, Hcrit increases by several tenths of a Sverdrup compared to the state with pre-industrial CO2. To the best of our knowledge this is the first time that the AMOC threshold has been evaluated explicitly with increased greenhouse gases. Analysis of the box model calibrated to the FAMOUS runs identifies three main factors driving the increase in Hcrit, of which two (surface-intensified ocean warming and a strengthening global water cycle) are likely to be robust features of climate change. The intensified global water cycle means that even though more fresh water is delivered to the deep water formation region, the Atlantic basin as a whole becomes more evaporative (FN + FT becomes more negative, Table 1), leading to the increase in Hcrit. The same warming and water cycle sensitivities are also seen when the box model is calibrated to a more advanced AOGCM, HadGEM2-AO, with various CO2 concentrations. However, changes in the gyre mixing efficiencies also influence the value of Hcrit at increased CO2, and these changes appear less robust between models, perhaps because they result from changes in the wind field that are model-dependent. Analysis of more AOGCMs would be needed to understand how robust the increase in Hcrit with increased CO2 is.

The box model can be calibrated to any AOGCM solution, and therefore opens up the possibility of obtaining a dynamical understanding of the different responses to hosing seen across different AOGCMs (e.g. Rahmstorf et al. 2005; Stouffer et al. 2006; Kageyama et al. 2013). Hysteresis experiments with other AOGCMs will also provide an important test of our model hierarchy, testing the robustness of our conclusions about the dominant AMOC stability mechanisms and allowing the importance of other modelling factors such as Bering Strait throughflow (Hu et al. 2012) or higher resolution (Jungclaus et al. 2013; den Toom et al. 2014; Cheng et al. 2018) to be considered. Hysteresis experiments with eddy-resolving coupled models are computationally prohibitive at present but potentially feasible in future; a partial exploration of the hysteresis structure in a current generation (prototype-CMIP6) AOGCM, including an eddy-permitting ocean, has recently been carried out by Jackson and Wood (2018) and will be the subject of future study.

We stress that our study focuses on the response of the AMOC to slowly-varying fresh water forcing. Other processes, beyond those currently included in the box model, may come into play when considering the transient AMOC response to more rapidly varying forcing, such as transient greenhouse gas increase (e.g. Stocker and Schmittner 1997; Thorpe et al. 2001; Gregory et al. 2005; Lucarini and Stone 2005). Such scenarios will be considered in a future study. We note that even the present box model exhibits a range of rate-dependent and duration-dependent responses to rapid changes in fresh water forcing (Alkhayuon et al. 2019).

While uncertainty remains over the quantitative modelling of changes in the AMOC threshold under increased greenhouse gases, our model hierarchy approach has identified some simple, low order dynamical controls on the threshold that can in principle be determined from observations (directly or through data-assimilating reanalyses). These observations provide a dynamically-based ‘emergent constraint’ (Hall and Qu 2006; Cox et al. 2018) on the position of the threshold. Hence it may be possible to monitor whether the threshold is becoming closer or further away, using large-scale oceanographic observations, to provide early warning of any approaching regime shift. This is particularly important because, as with many AOGCMs, FAMOUS and HadGEM2-AO overestimate the northward freshwater flux MOV carried across 34°S by the AMOC (Huisman et al. 2010; H11; Rodríguez et al. 2011; Mecking et al. 2017). While we showed in Sect. 4.3 that MOV is not a direct indicator of Hcrit, this bias suggests that the salinity advection feedback may excessively stabilise the AMOC in our AOGCMs (Drijfhout et al. 2011; Cimatoribus et al. 2012; Jackson 2013). So, even if it were possible to perform hosing runs with all current AOGCMs, relying on the current ensemble of AOGCMs to estimate Hcrit may give a biased result. To obtain a preliminary estimate of Hcrit based on observations, we have calibrated the box model to ocean states derived from an ocean reanalysis (Smith et al. 2007), which has MOV around − 0.2 Sv, close to observational estimates (H11) (Fig. 5d). This yields an AMOC threshold at about 0.35 Sv, suggesting that the GCMs studied here (FAMOUSA, FAMOUSB and HadGEM2-AO) may all be slightly further from an AMOC threshold than the real ocean. Calibration of the box model to a wider range of both AOGCMs and ocean analyses, and a thorough uncertainty analysis of the observational constraints, are needed to provide a robust result; this will be the subject of a future study.