1 Introduction

The Atlantic meridional overturning circulation (AMOC), with large temperature and salinity differences between the northward- and southward-flowing components, is responsible for a large oceanic transport of heat (northward) and fresh water (southward) and plays a fundamental role in establishing the mean state and the variability of the climate system. Heat carried by the warm Atlantic water in the upper AMOC limb is known to warm the Northern Hemisphere in the mean state, western Europe in particular (Rhines et al. 2008). It may also have triggered the recent rapid melting of the Arctic Sea ice (Serreze et al. 2007; Mahajan et al. 2011; Day et al. 2012) and Greenland glaciers (Holland et al. 2008; Straneo et al. 2010). On a broader scale, fluctuations of the AMOC are often linked to the Atlantic Multidecadal Variability/Oscillation (AMV/AMO), the multidecadal fluctuation of the North Atlantic sea surface temperature that has wide-ranging climate impacts (e.g., Knight et al. 2005; Zhang and Delworth 2006; Delworth et al. 2007; Zhang and Wang 2013; Klöwer et al. 2014), even though it remains debatable whether the AMOC variability is the causative driver (e.g., Clement et al. 2015, 2016; Zhang et al. 2016). In light of the AMOC’s major role in the climate system, there is a strong need to quantify and understand its temporal variability, as well as to assess the impact of a changing AMOC in the coming decades (e.g., Srokosz 2012; Kirtman et al. 2013; Collins et al. 2013).

Observations of the AMOC are limited, so one must rely on long-term climate simulations to understand the AMOC variability and to assess its impact on climate. Much attention has been paid to the decline of the AMOC transports in climate models, in response to the warming and associated freshening in high latitudes under scenarios of increasing greenhouse gas concentration (e.g., Cheng et al. 2013; Kirtman et al. 2013; Collins et al. 2013). However, some basic characteristics of the AMOC variability in the climate models, such as the magnitude and meridional coherence, have not been well established. Very recently, Kim et al. (2017) and Yan et al. (2018) suggested that the coupled climate models underestimate the multidecadal variability of the AMOC and that the underestimation has significant implications for the representation of AMV and other climate phenomena in the coupled models. The main goal of this paper, therefore, is to document and gain insights into the AMOC variability at different timescales as represented by the current generation of coupled climate models. This is achieved by comparing the basic characteristics of the AMOC variability in fully coupled models to the variability displayed by ocean-ice models forced with prescribed atmospheric fields. In the latter models, key components of AMOC variability, such as potential temperature and salinity anomalies averaged over the 150–1000 m depth range within the central Labrador Sea region, have been shown to be in good agreement with observations (Danabasoglu et al. 2016). The two sets of global, long-term numerical simulations used in this paper are: phase 5 of the Coupled Model Intercomparison Project (CMIP5, e.g., Taylor et al. 2012) and phase 2 of the Coordinated Ocean-ice Reference Experiments (CORE-II, e.g., Danabasoglu et al. 2014, 2016). The main difference between these two sets of simulations is that the atmospheric state is prescribed in CORE-II, whereas the atmosphere is interactively coupled in CMIP5. Half of the CORE-II simulations use the same ocean-sea ice models and share nearly identical configurations as in the CMIP5 simulations.

The proposed driving mechanisms for the AMOC variability depend on the timescale. On interannual and shorter timescales, the AMOC variability is mostly driven by local and remote wind stress forcing (e.g., Xu et al. 2014; Zhao and Johns 2014; Yang 2015) and, to a lesser degree, by the intrinsic ocean dynamics in the eddying regime (e.g., Gregorio et al. 2015). On decadal and multidecadal timescales, the AMOC variability is often described as a lagged oceanic response to the buoyancy forcing in the western subpolar North Atlantic associated with the North Atlantic Oscillation (NAO) (e.g., Böning et al. 2006; Deshayes and Frankignoul 2008; Xu et al. 2013; Danabasoglu et al. 2016). We find that on interannual and decadal timescales, the AMOC variability in the CMIP5 exhibits a similar magnitude and meridional coherence as in CORE-II simulations, indicating that the modeled atmospheric variability responsible for AMOC variability in the CMIP5 is in reasonable agreement with that in the CORE-II forcing. On multidecadal timescales, however, the AMOC variability is much weaker in the CMIP5 than in the CORE-II simulations. The CMIP5 simulations also exhibit a weaker long-term variability of the NAO than in the CORE-II forcing and climate data. The weaker AMOC variability cannot be fully attributed to the weaker NAO variability, however, because the CMIP5 simulations do not exhibit a robust NAO-AMOC linkage, a result that is different from the atmospherically-forced models.

The paper is organized as follows. First, the two sets of global, long-term numerical simulations (CMIP5 and CORE-II) used in this paper and the corresponding mean AMOC are described in Sect. 2. In Sect. 3, the AMOC variability on multidecadal, decadal, and interannual timescales is examined. The relationship between AMOC variability on multidecadal timescales and the long-term atmospheric variability of the NAO is then discussed in Sect. 4. A summary follows in Sect. 5.

2 CORE-II and CMIP5 simulations and time mean AMOC

Approximately 20 climate modeling groups from around the world contributed to the CMIP5 experiments as part of the Fifth Assessment Report (AR5) by the Intergovernmental Panel on Climate Change (IPCC); see Table 1 in the “Appendix”. Many of these groups performed and provided multiple simulations with different ocean models and/or different complexities, i.e., coupled climate models that only include ocean, sea-ice, and atmosphere versus more complex earth system models that also include biochemical processes. Furthermore, the CMIP5 simulations include (a) multi-century preindustrial control integrations (referred to as pi-control simulations) to obtain a quasi-equilibrium; (b) historical runs that are started from the pi-control simulations and are forced with the observed atmospheric composition change (both anthropogenic and natural sources) from the mid-nineteenth century to 2005 (total of ~ 150 years); and (c) future projection runs that are forced with specified concentrations and/or emissions, also referred to as “representative concentration pathways” or RCP runs (see Taylor et al. 2012 for more detailed discussion on CMIP5 design). In total, 44 historical simulations provided meridional transports or meridional velocity that can be used to calculate meridional volume transport (CMIP5 model outputs are distributed through http://cmip-pcmdi.llnl.gov/cmip5). These historical simulations are the main focus of this study. Many of the historical simulations include several (ensemble) members using different initializations. We focus on the last 60 years (1946–2005, a time period comparable to that for CORE-II simulations) of the first ensemble member for each of the historical simulations in the evaluation of the AMOC variability in CMIP5. The results of the full 150-year historical simulations (1850–2005), different ensemble members of selected historical simulations, and the pi-control simulations are then considered in the discussion of the AMOC linkage to the atmospheric variability.

Fig. 1

Structure of the 60-year mean Atlantic meridional overturning stream function Ψ (in Sv) in a 18 forced CORE-II simulations from the last cycle of years 1948–2007 and b 20 fully-coupled CMIP5 simulations (bold in Table 1) for years 1946–2005. The first nine simulations share the same ocean-sea ice models between the CORE-II and CMIP5 simulations. The contour interval is 5 Sv; the thick black contour (Ψ = 0) marks the deep extension of the southward AMOC limb

The CORE-II dataset consists of 18 ocean-sea ice hindcast simulations (see Table 2) forced with the identical interannually-varying atmospheric dataset over the 60-year period 1948–2007 developed by Large and Yeager (2004, 2009) and the same bulk formulations for surface forcing. Details of the 18 simulations can be found in Danabasoglu et al. (2014, 2016) and are not repeated here. One noteworthy detail is that CORE-II simulations, unlike the fully-coupled CMIP5 simulations, apply surface salinity restoring toward ocean climatology on timescales ranging from 50 to 1500 days for different models. This restoring is not physical, but has been found necessary to prevent uncontrolled drifts in salinity in response to inaccuracies in the precipitation data as well as in the model representation of the location and strength of the currents, upwelling, etc. The salinity restoring affects the time mean AMOC transports but has much less effect on the AMOC variability; see Fig. 21 in Danabasoglu et al. (2014). The simulations are integrated for five repeated cycles of the 60-year CORE-II atmospheric forcing, and the AMOC variability is similar in each cycle; see Fig. 1 in Danabasoglu et al. (2014). Thus, the last cycle is used for the diagnostics performed in this study. Since one main reason for the CORE-II comparison effort is to evaluate the ocean and sea ice models used in the CMIP5, nine of the 18 CORE-II simulations discussed in this paper (first 9 models listed in Table 2) use the same ocean and sea ice models as in the CMIP5 simulations. The horizontal resolution used in the CORE-II and CMIP5 ocean models is on the order of 1°.

Figure 1a displays the 60-year mean meridional overturning stream function \(\psi (y,z)\), defined as the integrated meridional volume transport (Sv) across the basin above a given depth z at a latitude y, in the Atlantic Ocean for the CORE-II simulations:

$$\psi(y,z)=\overline{\iint_{z^\prime<z} v(x,y,z^\prime)\,dz^\prime\,dx},$$
(1)

in which \(v\) is the meridional velocity and the overbar indicates the time average. The AMOC transport is defined as the maximum of the stream function \(\psi\) over depth. Except for one outlier, the INMOM (Institute of Numerical Mathematics Ocean Model), which is the only terrain-following ocean model in CORE-II and exhibits a noisy mean overturning stream function, the simulations generally agree and show a similar pattern for the overturning stream function. With respect to latitude, most of the CORE-II simulations exhibit a stronger overturning in the 40–50°N range, the transition between the subpolar and subtropical regimes. As described by Danabasoglu et al. (2014, 2016), the magnitude of the AMOC varies significantly despite the same atmospheric forcing. Near 26°N, the 60-year mean AMOC transports range from 10.4 to 17.8 Sv for the 18 CORE-II simulations and have an ensemble average of 13.9 Sv. The maximum overturning depth (i.e., the depth where maximum \(\psi\) is found at a given latitude) is similar among the simulations (~ 1000 m), but the maximum depth of the southward limb varies from 2500 m in AWI to 4500 m in GISSH (Fig. 1a).
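As an illustration of Eq. (1) and of the AMOC transport definition, the following is a minimal sketch, assuming a regular depth-level grid with velocity stored as a (time, depth, latitude, longitude) array restricted to the Atlantic, land points flagged as NaN, and levels ordered from the surface downward. The function names and array layout are illustrative assumptions, not the diagnostics actually used for the CORE-II or CMIP5 output.

```python
import numpy as np

def overturning_streamfunction(v, dx, dz):
    """Time-mean overturning stream function psi(z, y) in Sv.

    v  : meridional velocity (m/s), shape (time, depth, lat, lon), land = NaN
    dx : zonal cell width (m), shape (lat, lon)
    dz : layer thickness (m), shape (depth,), ordered from the surface down
    """
    v = np.nan_to_num(v)  # land points carry no transport
    # Zonal integral of v*dx, then multiply by layer thickness:
    # transport per layer in m^3/s, shape (time, depth, lat)
    layer_transport = (v * dx[None, None, :, :]).sum(axis=-1) * dz[None, :, None]
    # Vertical integral from the surface to the bottom of each layer
    psi = np.cumsum(layer_transport, axis=1)
    # Time average (the overbar in Eq. 1), converted from m^3/s to Sv
    return psi.mean(axis=0) / 1.0e6

def amoc_transport(psi):
    """AMOC transport at each latitude: maximum of psi over depth."""
    return psi.max(axis=0)
```

For output on curvilinear grids or with partial bottom cells, the zonal integration would need the model's own grid metrics; the sketch ignores both.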

For comparison, the 60-year mean overturning stream functions \(\psi\) for the 20 CMIP5 simulations are shown in Fig. 1b. The selected 20 CMIP5 simulations are listed in bold text in Table 1 (one for each modeling group, except for GFDL-CM2p1 and GFDL-ESM2G, which use different ocean models and have corresponding CORE-II simulations). As in the case of the CORE-II simulations, the overturning patterns are similar among the CMIP5 simulations, but there is a much wider range (model spread) in the magnitude of AMOC transport. Near 26°N, the 60-year mean AMOC transports of the 20 simulations in Fig. 1b range from 11 to 31 Sv, with an ensemble average of 17.1 Sv (16.5 Sv for 44 simulations). These multi-model averages are closer to the observational estimate (~ 17 Sv from the decade-long RAPID observations begun in 2004; e.g., McCarthy et al. 2015) than that of the 18 CORE-II simulations (13.9 Sv). In general, the CMIP5 simulations exhibit a higher 60-year mean AMOC transport than their CORE-II counterparts: 17.6 versus 14.7 Sv for the first nine panels in Fig. 1a, b that use the same ocean-sea ice models. However, the difference is small compared to the model spread and we consider the mean AMOC transports of the two sets to be comparable.

3 AMOC variability on multidecadal, decadal, and interannual timescales

In this section, we document the CMIP5 and CORE-II AMOC variability on different timescales defined by the ensemble empirical mode decomposition (EEMD; Huang and Wu 2008; Wu and Huang 2009). The EEMD extracts the amplitude-frequency modulated oscillatory components (termed “intrinsic mode functions” or IMFs) successively from the highest to the lowest frequencies, without using an a priori determined or assumed basis function. The number of IMFs is the integer part of the binary logarithm of the time series length N (\(\log_2 N\)), i.e., five for a 60-year annual time series. The advantage of using this method is that both the frequency and amplitude of each IMF are determined adaptively from the local characteristic of the time series and vary as a function of time. This is illustrated in Fig. 2a by using the AMOC transport time series at 26°N of the GFDL CORE-II simulation; the colored lines in the lower panel are the five decomposed IMFs, and the colored lines in the upper panel show how the summation of the IMFs approaches the original time series when higher-frequency IMFs are added successively (the sum of five IMFs is the same as the original time series). Since the period or frequency for each IMF varies as a function of time, the average period (calculated as the length of the record divided by the number of cycles) can be viewed as the characteristic period of the IMF. For example, the average period in Fig. 2a is 6.7 years for IMF2, 12.5 years for IMF3, and about 45 years for IMF4 (the accuracy of the averages depends on the number of cycles in each IMF). It is also useful to examine how the EEMD result compares to that of more basic, low-pass filtering. In Fig. 2b, the sum of IMF3, IMF4, and IMF5 (representing variability on decadal and longer timescales) is shown to be similar to the result of a 10-year average (performed using a zero-phase moving average filtering), and the sum of IMF4 and IMF5 (representing variability on multidecadal timescales) is somewhat similar to the result of a 30-year average.
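Two of the quantities described above are simple enough to sketch directly: the number of IMFs for a record of length N, and the average period of an IMF. The sketch below is only illustrative. In particular, counting cycles as half the number of zero crossings is an assumption made here, since the text does not state how cycles are counted, and the function names are hypothetical.

```python
import math

def number_of_imfs(n_samples):
    """Integer part of log2(N), the IMF count quoted in the text."""
    return int(math.floor(math.log2(n_samples)))

def average_period(imf, dt=1.0):
    """Record length divided by the number of cycles in one IMF.

    Assumption of this sketch: one cycle corresponds to two zero
    crossings, so cycles = crossings / 2. Samples that are exactly
    zero are not counted as crossings.
    """
    crossings = sum(1 for a, b in zip(imf[:-1], imf[1:]) if a * b < 0)
    cycles = crossings / 2.0
    if cycles == 0:
        return float("inf")  # no complete oscillation in the record
    return len(imf) * dt / cycles
```

With only a handful of cycles in a 60-year record, the estimate is coarse for the low-frequency IMFs, which is consistent with the caveat on accuracy given above.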

Fig. 2

a Decomposition of the AMOC transport variability (of CORE-II GFDL simulation at 26°N) on different timescales using the ensemble empirical mode decomposition (EEMD). The black line in the upper panel is the original time series, the colored lines in the lower panel are the five decomposed intrinsic mode functions (IMFs), and the colored lines in the upper panel are the summation of IMFs in comparison with the original time series. b A comparison of variability defined by EEMD and by a low-pass running-average filter. The upper panel compares the summation of IMF5, IMF4, and IMF3 (representing decadal and longer term variability) and a 10-year average; the lower panel compares the summation of IMF5 and IMF4 (multidecadal variability) and 20-, 30-, and 40-year averages

For each CMIP5 and CORE-II simulation, the last 60-year time series of the annual mean AMOC transports at each latitude is decomposed into five IMFs independently. Then, the same IMFs at each latitude are pieced together to construct the overall pattern of variability across the whole Atlantic Basin. For simplicity, we refer to the sum of IMF4 and IMF5 as multidecadal variability, IMF3 as decadal variability, and the sum of IMF1 and IMF2 as interannual variability.
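The latitude-by-latitude procedure and the grouping of IMFs into three timescale bands can be sketched as follows. Here `decompose` is a hypothetical placeholder for an EEMD routine that returns five IMFs ordered from the highest to the lowest frequency; it is supplied by the caller and is not part of any specific library.

```python
import numpy as np

def timescale_bands(amoc, decompose):
    """Split AMOC anomalies into interannual, decadal, multidecadal bands.

    amoc      : annual-mean AMOC transport, shape (time, lat)
    decompose : callable mapping a 1-D series of length `time` to an
                array of shape (5, time), IMF1 (fastest) to IMF5 (slowest).
                Hypothetical stand-in for an EEMD implementation.
    """
    amoc = np.asarray(amoc, dtype=float)
    # Decompose each latitude independently -> (lat, 5, time)
    imfs = np.stack([decompose(amoc[:, j]) for j in range(amoc.shape[1])])
    # Piece the same IMF together across latitudes -> (5, time, lat)
    imfs = np.moveaxis(imfs, 0, -1)
    if imfs.shape[0] != 5:
        raise ValueError("expected five IMFs per latitude")
    return {
        "interannual": imfs[0] + imfs[1],   # IMF1 + IMF2
        "decadal": imfs[2],                 # IMF3
        "multidecadal": imfs[3] + imfs[4],  # IMF4 + IMF5
    }
```

Because the five IMFs sum to the original series, the three bands returned here also sum to the input at every latitude.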

3.1 Time-latitude distribution of the AMOC variability

The multidecadal variability of the AMOC transports is displayed as a function of time and latitude in Fig. 3a, b for the 18 CORE-II and 20 CMIP5 simulations, respectively. The CORE-II simulations mostly exhibit a consistent, meridionally-coherent multidecadal variability: a decrease in the earlier years, followed by a major shift from a low to high AMOC transport regime in the 1980s (the last two simulations, MIT and NOCS, differ somewhat from the other simulations by displaying some northward propagation). In contrast, the variability in the CMIP5 models is not consistent. Some models show a clear increase (e.g., ACCESS1-3, NorESM1-M), while others show a clear decrease (e.g., CNRM-CM5-2, CCSM4, BNU-ESM, FIO-ESM). Thus, there is no systematic long-term variability in AMOC transports over this time period, despite the common atmospheric composition change in these historical runs. This suggests that the impact of external forcing (such as greenhouse gases and aerosols) on the AMOC is smaller than that of the internal variability in the historical simulations. This is different from the RCP simulations, in which a consistent decrease in AMOC transport can be found (e.g., Cheng et al. 2013; Collins et al. 2013). More importantly, it is clear from Fig. 3a, b that the multidecadal variability is significantly weaker in the CMIP5 simulations than in the CORE-II simulations; this is discussed further in Sect. 3.2.

Fig. 3

Variability of the modeled AMOC transports (in Sv) on multidecadal timescale between a 18 forced CORE-II simulations (Table 2), and b 20 coupled CMIP5 simulations denoted in bold in Table 1. The 60-year mean AMOC transports at each latitude are removed. The results show that the variability in CORE-II is stronger and meridionally more coherent than in CMIP5 simulations

Figure 4 displays the decadal variability of the AMOC transports for the CORE-II and CMIP5 simulations. As in the case for the multidecadal variability, the decadal variability is consistent within the CORE-II simulations but not within the CMIP5 simulations. For example, most of the CORE-II simulations exhibit a low AMOC transport in the 1950s and a high AMOC transport in the 1990s. In contrast to the multidecadal variability, however, the decadal variability has comparable magnitude between the 18 CORE-II and 20 CMIP5 simulations as a whole. Note that there is significant model spread within the CMIP5 simulations, for example, much weaker decadal variability in CCSM4 and FIO-ESM than in GFDL-CM2.1. Furthermore, the decadal AMOC variability exhibits a comparable meridional coherence between the CORE-II and CMIP5 simulations as a whole, even though there exist more model-to-model differences in the CMIP5 simulations.

Fig. 4

Variability of the modeled AMOC transports (in Sv) on decadal timescale between a 18 forced CORE-II simulations (Table 2) and b 20 coupled CMIP5 simulations denoted in bold in Table 1. The results show that the variability exhibits a similar magnitude and degree of meridional coherence between the CORE-II and CMIP5 simulations as a whole

The interannual variability of the AMOC transports in the CORE-II and CMIP5 simulations is displayed in Fig. 5. Although not as clear as in the case of the longer-term variability (Figs. 3, 4), the interannual variability also exhibits some similarity among the CORE-II simulations. For example, a high AMOC transport can be seen during the early 1950s in most simulations. Such similarity cannot be seen among the CMIP5 simulations. The CORE-II and CMIP5 simulations, as a whole, exhibit similar magnitudes of variability. The meridional coherence pattern is not very clear, as the signals are noisy for both sets of simulations. The variability is more coherent to the south of about 40°N as compared to north of this latitude. Coherent interannual AMOC variability south of 30–40°N is also found in observations (Kelly et al. 2014) and in high-resolution model results (Xu et al. 2014).

Fig. 5

Variability of the modeled AMOC transports (in Sv) on interannual timescale between a 18 forced CORE-II simulations (Table 2) and b 20 coupled CMIP5 simulations denoted in bold in Table 1. The results show that the variability exhibits a similar magnitude and degree of meridional coherence between the CORE-II and CMIP5 simulations as a whole

3.2 Magnitude and meridional coherence of the AMOC variability

The qualitative differences or similarities of the AMOC variability between the CORE-II and CMIP5 simulations (Figs. 3, 4, 5) can be examined more quantitatively. Figure 6 displays the variability of the AMOC averaged over the whole Atlantic Basin from 30°S to 60°N. All 44 simulations in Table 1 are considered to obtain a comprehensive measure for the CMIP5 simulations as a whole. The results show that, for all three timescales, the basin-wide AMOC variability is consistent in the CORE-II simulations but not in the CMIP5 simulations. This contrast is clearer in Fig. 6 than in Figs. 3, 4 and 5. The magnitude of the basin-wide AMOC variability is higher in CORE-II than in CMIP5 simulations on multidecadal timescales (Fig. 6a, b), but comparable on decadal (Fig. 6c, d) and interannual (Fig. 6e, f) timescales. The standard deviation of the multidecadal AMOC variability is 0.89 Sv for the 18 CORE-II simulations and 0.34 Sv for the 44 CMIP5 simulations (a factor of 2.6). For comparison, the standard deviation values are 0.24 and 0.20 Sv for decadal variability, and 0.36 and 0.38 Sv for interannual variability. The averaged standard deviation values for 20 CMIP5 simulations (shown in Figs. 3, 4, 5) are essentially the same as for 44 simulations.

Fig. 6

Variability of the basin-wide averaged AMOC transports (in Sv) from 30°S to 60°N in the forced CORE-II and fully-coupled CMIP5 simulations: a, b multidecadal, c, d decadal, and e, f interannual. Colored lines from blue to red represent models listed in order in Tables 1 and 2. The numbers are multi-model averages of the standard deviation at each timescale (the multidecadal variability is much higher in CORE-II simulation than in CMIP5 simulations). The results also show similar variability among 18 CORE-II simulations, but not among CMIP5 simulations

To examine whether the difference on multidecadal timescales is due to meridional averaging, the standard deviation of the AMOC variability on different timescales is displayed in Fig. 7 as a function of latitude. The differences or similarities between the CORE-II and CMIP5 simulations are consistent with Fig. 6 even though the overall magnitude of AMOC variability is higher. The magnitude of the AMOC variability in the CORE-II and CMIP5 simulations, represented by the multi-model averaged standard deviation value from 30°S to 60°N, differs by a factor of 2.2 on multidecadal (0.96 versus 0.42 Sv), while being similar on decadal (0.37 versus 0.32 Sv) and interannual (0.71 versus 0.70 Sv) timescales. Given the weaker multidecadal variability in the CMIP5 simulations, the model-to-model difference for multidecadal variability in the CMIP5 simulations is less than that in the CORE-II simulations. For decadal and interannual variability, the model spread is generally smaller in CORE-II than in CMIP5. A weaker AMOC variability on multidecadal timescales and similar variability on decadal and interannual timescales in coupled models, when compared to forced models, were also found by Kim et al. (2017) based on a large ensemble of simulations performed with the Community Earth System Model (CESM). Figure 7 also shows that the averages of nine common models of the CORE-II and CMIP5 simulations (dashed thick black lines) and the averages of all 18/44 simulations (solid thick black lines) are similar. Thus, the results are not significantly impacted by the number of CORE-II/CMIP5 simulations used.
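The two magnitude measures compared in Figs. 6 and 7 differ only in the order of averaging, which the following sketch makes explicit. It assumes anomalies stored as a (model, time, latitude) array, equal weighting of latitudes, and the population standard deviation; none of these choices is specified in the text, and the function names are illustrative.

```python
import numpy as np

def _select(anom, lat, lat_min, lat_max):
    """Keep latitudes within [lat_min, lat_max]; latitude is the last axis."""
    lat = np.asarray(lat, dtype=float)
    keep = (lat >= lat_min) & (lat <= lat_max)
    return np.asarray(anom, dtype=float)[..., keep]

def basinwide_std(anom, lat, lat_min=-30.0, lat_max=60.0):
    """Fig. 6 style: average over latitude first, then std in time."""
    sub = _select(anom, lat, lat_min, lat_max)
    basin_mean = sub.mean(axis=-1)            # shape (model, time)
    return float(basin_mean.std(axis=-1).mean())

def latitude_mean_std(anom, lat, lat_min=-30.0, lat_max=60.0):
    """Fig. 7 style: std in time at each latitude, then average."""
    sub = _select(anom, lat, lat_min, lat_max)
    return float(sub.std(axis=1).mean())      # mean over models and latitudes
```

When the variability is meridionally coherent the two measures coincide; when anomalies at different latitudes partly cancel, the basin-wide value is smaller. This is one way to read why the values in Fig. 7 exceed those in Fig. 6.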

Fig. 7

Standard deviation of the AMOC transports as a function of latitude in the forced CORE-II and coupled CMIP5 simulations: a, b multidecadal, c, d decadal, and e, f interannual. Colored lines from blue to red represent individual models listed in order in Tables 1 and 2. The thick black lines are multi-model averages for all the simulations (solid lines) and for the first nine simulations (dashed lines) that share the same ocean-sea ice models between CORE-II and CMIP5. The numbers are the averaged standard deviation value over the whole Atlantic (30°S–60°N) for all the CORE-II/CMIP5 simulations

As shown in Fig. 3, CORE-II simulations exhibit a meridionally-coherent multidecadal variability from 35°S to about 40°N (some extend further north to 60°N) and most of the simulations do not show significant phase shift, although the MIT and NOCS models do show some variability in the southern Atlantic leading the variability in the North Atlantic. On the other hand, the meridional coherence in CMIP5 is not as strong as in the CORE-II models. The degree of meridional coherence can be evaluated by computing the correlation between the variability at a specific latitude and the variability averaged over the entire Atlantic Basin from 30°S to 60°N (Fig. 8). The comparison shows that, for multidecadal variability, the correlation coefficient is above 0.9 for most of the Atlantic Basin in CORE-II simulations (Fig. 8a). The correlation is lower in the CMIP5 simulations overall, with several models exhibiting very little meridional coherence (Fig. 8b). On decadal and interannual timescales, the meridional coherence becomes similar between the CORE-II and CMIP5 (Fig. 8c–f). On interannual timescales, the correlation coefficient is highest in the tropical region and decreases toward the north and south. The results in Fig. 8 also show that the multi-model averages are similar between nine common models and all 18 CORE-II/44 CMIP5 models.
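The coherence measure described above, the zero-lag correlation between the variability at each latitude and the basin-wide average, can be sketched as below for a single simulation and a single timescale band. Equal weighting of latitudes in the basin average is an assumption of this sketch, and the function name is illustrative.

```python
import numpy as np

def meridional_coherence(anom, lat, lat_min=-30.0, lat_max=60.0):
    """Zero-lag correlation of each latitude with the basin-wide mean.

    anom : AMOC anomalies for one timescale band, shape (time, lat)
    lat  : latitudes in degrees north, shape (lat,)
    Returns one correlation coefficient per latitude.
    """
    anom = np.asarray(anom, dtype=float)
    lat = np.asarray(lat, dtype=float)
    keep = (lat >= lat_min) & (lat <= lat_max)
    basin = anom[:, keep].mean(axis=1)        # basin-wide average series
    a = anom - anom.mean(axis=0)              # remove time mean per latitude
    b = basin - basin.mean()
    cov = (a * b[:, None]).sum(axis=0)
    norm = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum())
    return cov / norm                         # undefined for constant series
```

Values near 1 at most latitudes correspond to the meridionally coherent behavior seen in the CORE-II simulations on multidecadal timescales.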

Fig. 8

Zero-lag correlation coefficient between the AMOC variability at specific latitude and the basin-wide averaged AMOC variability (30°S–60°N) in the forced CORE-II and coupled CMIP5 simulations: a, b multidecadal, c, d decadal, and e, f interannual. Colored lines from blue to red represent models listed in order in Tables 1 and 2. The thick black lines are multi-model averages for all the simulations (solid lines) and for the first nine simulations (dashed lines) that share the same ocean-sea ice models between CORE-II and CMIP5

The fact that the CORE-II simulations are mostly consistent with each other largely validates the initial hypothesis put forward by Griffies et al. (2009) and Danabasoglu et al. (2014, 2016), namely that global ocean-sea ice models run under the same atmospheric state and bulk formulas should produce qualitatively similar AMOC variability (although the time mean AMOC transport differs). The consistent AMOC variability under the same atmospheric state also suggests that the AMOC variability represented in these models is not generated by internal ocean dynamics, but is rather an external response to the atmospheric variability. This is consistent with the finding that the intrinsic variability is absent in coarse-resolution models (~ 1° for both CORE-II and CMIP5) and becomes significant in eddy-permitting and eddying regimes (e.g., Gregorio et al. 2015). The inconsistency among the coupled CMIP5 experiments is not surprising considering the different ocean–atmosphere initializations and the chaotic nature of the coupled solutions. Because the modeled AMOC variability is an external response to the atmospheric state, the similarity between the CORE-II and CMIP5 simulations on interannual to decadal timescales implies that the atmospheric variability responsible for the AMOC variability in the CMIP5 simulations is in reasonable agreement with that in the CORE-II forcing.

4 AMOC variability and long-term atmospheric variability

The striking differences between the forced and coupled simulations on multidecadal timescales merit further discussion. It is often surmised, based on forced ocean simulations of various resolutions (e.g., Böning et al. 2006; Deshayes and Frankignoul 2008; Xu et al. 2013; Danabasoglu et al. 2016) and coupled climate models (e.g., Delworth and Zeng 2016; Kim et al. 2017), that the long-term AMOC variability is modulated by the NAO, the most prominent and recurrent pattern of the atmospheric variability over the middle and high latitudes of the Northern Hemisphere (Hurrell 1995; Hurrell et al. 2003). The NAO is associated with the strength and direction of the westerly winds and the location of storm tracks across the North Atlantic, which affect the deep-water formation processes in the subpolar North Atlantic and hence impact the AMOC. Note that the processes in the Nordic Seas can also impact the AMOC variability through the dense overflow water (e.g., Lohmann et al. 2014). This impact needs to be examined further and it may be sensitive to the choice made to model or parameterize the overflow in climate models (e.g., Legg et al. 2009; Danabasoglu et al. 2010). In this section, we focus on the subpolar North Atlantic to examine (1) if the long-term variability of the NAO is weaker in the CMIP5 than in the CORE-II simulations, and (2) if there is a robust NAO-AMOC linkage in these simulations. The NAO index is defined as the normalized (by the standard deviation) fluctuations in the difference of atmospheric pressure at sea level between the Icelandic low (21.82°W, 64.13°N) and the Azores high (9.14°W, 38.72°N) during December-March (i.e., the station-based wintertime NAO index as in Hurrell 1995). We also examined the NAO index based on the leading Empirical Orthogonal Function (EOF) of sea level pressure (not shown) and the results are very similar. 
The AMOC index is defined as the basin-wide average of the AMOC transport anomalies from 30°S to 60°N to minimize the potential influence of local dynamics.
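A minimal sketch of the station-based NAO index defined above is given below. It assumes that the winter (December–March) mean sea level pressure has already been extracted at the two station locations, one value per winter, and that the difference is taken as southern minus northern station so that a positive index corresponds to a stronger pressure gradient. The sign convention and the function name are assumptions of this sketch.

```python
import numpy as np

def station_nao_index(slp_south, slp_north):
    """Normalized winter SLP difference between two stations.

    slp_south : winter-mean SLP at the southern (Azores high) station
    slp_north : winter-mean SLP at the northern (Icelandic low) station
    Both are 1-D arrays with one value per winter.
    """
    diff = np.asarray(slp_south, dtype=float) - np.asarray(slp_north, dtype=float)
    # Remove the mean and normalize by the standard deviation
    return (diff - diff.mean()) / diff.std()
```

For gridded model output, the two series would typically be taken from the grid points nearest to the station coordinates given above.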

The NAO-AMOC linkage is examined in terms of lead-lag correlation, and the statistical significance of the lead-lag correlation is evaluated using a two-tailed Student’s t test with an effective number of degrees of freedom \({N^*}\). The value of \({N^*}\) is given by the following approximation (see Pyper and Peterman 1998; Wang et al. 2017):

$$\frac{1}{N^*} \approx \frac{1}{N}+\frac{2}{N}\sum_{j=1}^{N} \frac{N-j}{N}\,\rho_{XX}(j)\,\rho_{YY}(j),$$
(2)

in which N is the record length, and \({\rho _{XX}}(j)\) and \({\rho _{YY}}(j)\) are the auto-correlations of the two sampled time series, NAO and AMOC, at time lag j. Given the value of \({N^*}\), the critical correlation value \({r_{\text{crit}}}\) at the significance level \(\alpha\) can be derived from the t distribution for a two-tailed test:

$$r_{\text{crit}}=\sqrt{\frac{t_{\alpha,N^*}^{2}}{t_{\alpha,N^*}^{2}+N^*}}$$
(3)

in which \({t_{\alpha ,{N^*}}}\) is the Student’s critical t value. Figure 9 shows the time series of the NAO and AMOC in CORE-II simulations. The CORE-II NAO index is calculated using the CORE-II forcing and it is essentially identical to the one derived from climate data (Hurrell 1995). There is a shift from low to high NAO index on multidecadal timescales and a clear, lagged AMOC increase in most CORE-II simulations. The values of the maximum correlation coefficient and the NAO lead in years are listed in Fig. 9, and the lead-lag correlation coefficient is shown in Fig. 10a. Sixteen out of the 18 CORE-II simulations exhibit a positive correlation that is significant at the 90% level or higher, with the NAO leading the AMOC by 7–13 years (the lower correlation in the two outliers, MIT and NOCS, may be impacted by basin-scale meridional averaging in the AMOC). Overall, the CORE-II simulations exhibit a robust NAO-AMOC linkage.
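The significance test of Eqs. (2) and (3) and the lead-lag correlation can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used here: the sum in Eq. (2) is evaluated up to lag N − 1 by default (the j = N term has zero weight), an optional `max_lag` allows truncation, and the critical t value is passed in by the caller (it could be obtained, for example, from `scipy.stats.t.ppf` with a choice of degrees of freedom that the text does not specify). All function names are illustrative.

```python
import numpy as np

def autocorr(x, j):
    """Sample autocorrelation of x at lag j (1 <= j < len(x))."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-j] * x[j:]) / np.sum(x * x))

def effective_dof(x, y, max_lag=None):
    """Effective number of degrees of freedom N* from Eq. (2)."""
    n = len(x)
    max_lag = n - 1 if max_lag is None else max_lag
    s = sum((n - j) / n * autocorr(x, j) * autocorr(y, j)
            for j in range(1, max_lag + 1))
    return 1.0 / (1.0 / n + 2.0 / n * s)

def critical_r(t_crit, n_eff):
    """Critical correlation from Eq. (3) for a given critical t value."""
    return float(np.sqrt(t_crit ** 2 / (t_crit ** 2 + n_eff)))

def lagged_correlation(nao, amoc, lag):
    """Correlation at a given lag in years; negative lag = NAO leads AMOC."""
    nao = np.asarray(nao, dtype=float)
    amoc = np.asarray(amoc, dtype=float)
    if lag < 0:        # NAO leads by -lag years: pair nao(t) with amoc(t - lag)
        a, b = nao[:lag], amoc[-lag:]
    elif lag > 0:      # AMOC leads by lag years
        a, b = nao[lag:], amoc[:-lag]
    else:
        a, b = nao, amoc
    return float(np.corrcoef(a, b)[0, 1])
```

A correlation at a given lag would then be called significant when its absolute value exceeds `critical_r(t_crit, effective_dof(nao, amoc))`. For strongly autocorrelated series such as the multidecadal components, N* is much smaller than N, which raises the critical correlation considerably.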

Fig. 9

Variability of the station-based wintertime (December-March) North Atlantic Oscillation (NAO) index (blue) and the basin-wide AMOC variability in Sv (red) in 18 CORE-II simulations. The thin lines denote the annual mean and thick lines denote multi-decadal variability. The numbers are maximum correlation coefficient and the lead/lag time in years (negative values denote NAO leading the AMOC) based on the variability on multidecadal timescales; see Fig. 10a for the corresponding lead-lag coefficient

Fig. 10

Lead-lag correlation between the station-based wintertime NAO index and the basin-wide averaged AMOC variability in a 18 CORE-II simulations (1948–2007), b 20 historical CMIP5 simulations denoted in bold in Table 1 for the last 60 years (1946–2005) as shown in Fig. 11, c 20 CMIP5 historical simulations denoted in bold in Table 1 for the full integration period of about 150 years, d 44 CMIP5 simulations in Table 1 for the full integration period, e 15 pi-control simulations denoted in italic bold in Table 1 for record lengths ranging from 250 to 1000 years. Negative lag values indicate that the NAO leads the AMOC. Colors from blue to red represent models listed in order in Tables 1 and 2. Solid lines (along with the red dots) denote that the NAO-AMOC correlation is significant (at the 90% level), whereas the dashed lines denote that the correlation is insignificant. The red squares denote a significant AMOC-NAO correlation (with the AMOC leading). The results highlight that, unlike in the CORE-II simulations, the NAO-AMOC linkage is not robust in the CMIP5 simulations

Fig. 11

Similar to Fig. 9, but for variability of the station-based wintertime NAO index (blue) and the basin-wide AMOC variability in Sv (red) in 20 CMIP5 historical simulations denoted in bold in Table 2 over 1946–2005. The thin lines denote the annual mean and thick lines denote multi-decadal variability. The numbers are maximum correlation coefficient and the lead/lag time in years (negative values denote NAO leading the AMOC) based on the variability on multidecadal timescales. The corresponding lead-lag correlation coefficient is shown in Fig. 10b

The results in the coupled CMIP5 simulations are, however, very different (Fig. 11). First, the long-term variability of the modeled NAO is weaker in the CMIP5 simulations than in the CORE-II forcing. This is consistent with Wang et al. (2017), who documented that almost all CMIP5 models underestimate the NAO fluctuation on multidecadal timescales, even though most of them capture the basic characteristics of the interannual NAO pattern reasonably well. Kim et al. (2017) also found similarly weak multidecadal NAO variability in their large ensemble simulations of CESM. Thus, although some have suggested that the NAO may be more predictable than previously thought and that skillful forecasts may be possible (e.g., Scaife et al. 2014), representing the long-term fluctuation of this prominent large-scale atmospheric variability pattern in climate models clearly remains a challenge. Second, and probably more importantly, there is no clear or robust NAO-AMOC linkage in the coupled CMIP5 simulations as a whole (Figs. 10b, 11). For example, the NAO index in the GFDL-ESM2G simulation actually exhibits a phase similar to (but with a smaller magnitude than) that in the CORE-II forcing and the climate data, but the corresponding AMOC transport variability is opposite to that in the CORE-II simulations (see also Fig. 3). Among the 20 CMIP5 simulations, six (NorESM1-M, bcc-csm1-1-m, CanESM2, EC-EARTH, FIO-ESM, and MIROC5) show a positive correlation that is significant at the 90% level, with the NAO leading the AMOC.

To determine the NAO-AMOC linkage on multidecadal timescales (IMF4-5) with more confidence, one needs to examine their correlation based on a longer time series than the 1948–2007 period. The lead-lag correlations for the full 150-year time series (1850–2005) for 20 CMIP5 simulations are shown in Fig. 10c. Eight simulations (ACCESS1-3, NorESM1-M, CNRM-CM5-2, GFDL-ESM2G, MRI-CGCM3, CCSM4, and CanESM2) show a positive NAO-AMOC correlation that is significant at the 90% level, with the NAO leading the AMOC by 3–15 years. This result differs from that based on the last 60 years of the simulations, making it clear that the robustness of the NAO-AMOC linkage in climate models needs to be further evaluated with longer simulations. We further examined the NAO-AMOC linkage in the CMIP5 models over the 1850–2005 period by including the decadal timescales (IMF3-5, not shown). A significant NAO-AMOC linkage is found for the eight models that displayed a positive correlation on multidecadal timescales, plus two more, EC-EARTH and IPSL-CM5A-MR. Wang et al. (2017) also examined the NAO-AMOC correlation in ten CMIP5 simulations during 1900–2005, focusing on decadal and longer timescales, and found that two (out of ten) simulations exhibited a positive correlation that is significant at the 90% level. The main reason for the difference in the number of positive correlations (50% of the models in our study versus 20% in Wang et al. 2017) is that we use the wintertime NAO index whereas Wang et al. (2017) used an annual mean NAO index: (a) the NAO variance is stronger in winter and its oceanic influence is then at its maximum (Czaja and Frankignoul 2002), and (b) the deep-water formation process takes place in winter and is expected to directly impact the AMOC (although the warm seasons can impact the deep convection as well, through restratification and pre-conditioning of the water column).

When all historical CMIP5 simulations with the full 150-year record are considered, 16 (out of 44) simulations show a significant NAO-AMOC correlation on multidecadal timescales (Fig. 10d). We also examined the NAO-AMOC correlation in 15 pre-industrial control simulations (listed in italic bold text in Table 1, with time series lengths ranging from 250 to 1000 years), and the results are similar: 7 of them show a significant correlation, whereas the remaining 8 do not (Fig. 10e). Thus, the results in Fig. 10 suggest that, unlike the CORE-II simulations, the CMIP5 simulations do not exhibit a robust NAO-AMOC linkage on multidecadal timescales. Some coupled CMIP5 simulations also exhibit a negative AMOC-NAO correlation, with the AMOC leading the NAO (red squares in Fig. 10).
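The lead-lag correlations in Fig. 10 follow the convention that negative lags mean the NAO leads the AMOC. A minimal sketch of such a calculation is given below, assuming two equally long annual series that have already been filtered to the timescale of interest. The function names are hypothetical and the plain Pearson correlation of the overlapping segments is an assumption of this sketch, not a description of the processing used for the figures.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def lead_lag_correlation(nao, amoc, max_lag):
    """Correlation between nao and amoc for lags from -max_lag to +max_lag.

    A negative lag of -k pairs nao[t] with amoc[t + k], i.e. the NAO leads
    the AMOC by k time steps. max_lag should be much smaller than the
    record length so that each overlapping segment stays long enough.
    """
    n = len(nao)
    if len(amoc) != n:
        raise ValueError("nao and amoc must have the same length")
    correlations = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            k = -lag
            a, b = nao[: n - k], amoc[k:]
        else:
            a, b = nao[lag:], amoc[: n - lag]
        correlations[lag] = pearson(a, b)
    return correlations
```

The lag of maximum correlation is then `max(correlations, key=correlations.get)`, and its significance can be judged against the critical value from Eq. (3).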

The NAO-AMOC linkage in the 150-year historical simulation of CCSM4 is consistent with Kim et al. (2017), who found a significant NAO-AMOC linkage in CESM, a newer climate model developed from CCSM4. The results in Kim et al. (2017) are based on a large ensemble of CESM simulations, and their Fig. 12 does show a significant spread among the ensemble members. To determine if the NAO-AMOC linkage is fully model independent, we examine the NAO-AMOC linkage among the multiple ensemble members of the 14 CMIP5 simulations and find that no CMIP5 simulation shows a robust NAO-AMOC linkage among all the available ensemble members. Two examples are displayed in Fig. 12. Their results are similar when multiple members are considered: four out of ten members in CSIRO-Mk3-6-0 and two out of six members in CCSM4 show a positive NAO-AMOC linkage that is significant, while the other members do not. Thus, the lack of a robust NAO-AMOC linkage applies not only to different CMIP5 models, but also to ensemble members of the same CMIP5 model. We also examined the correlation in CCSM4 and CSIRO-Mk3-6-0 with the ensemble means removed, to investigate the impact of the externally forced signals, and found that the result of a non-robust linkage among the ensemble members is the same (not shown).
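Removing the ensemble mean, as done above to separate the externally forced signal from internal variability, amounts to subtracting the across-member average at each time step. The following sketch illustrates the operation for a list of equally long member time series; the function name and the data layout are assumptions made for illustration.

```python
def remove_ensemble_mean(members):
    """Subtract the across-member mean at each time step.

    members is a list of equally long time series, one per ensemble member.
    Returns (anomalies, ensemble_mean), where anomalies has the same layout
    as members and ensemble_mean is the estimate of the forced signal.
    """
    n_members = len(members)
    n_time = len(members[0])
    if any(len(m) != n_time for m in members):
        raise ValueError("all ensemble members must have the same length")
    ensemble_mean = [
        sum(m[t] for m in members) / n_members for t in range(n_time)
    ]
    anomalies = [
        [m[t] - ensemble_mean[t] for t in range(n_time)] for m in members
    ]
    return anomalies, ensemble_mean
```

With only six or ten members the ensemble mean still contains some internal variability, so the anomalies are an approximation to the unforced component.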

Fig. 12

Lead-lag correlation between the station-based wintertime NAO index and the basin-wide averaged AMOC variability in a six ensemble members of the CCSM4 simulations, b ten ensemble members of the CSIRO-Mk3-6-0 simulations. Results are based on the full, nearly 150-year historical integration. Negative lag values indicate that the NAO leads the AMOC. Solid lines (along with the red dots) indicate that the correlation is significant (at the 90% level), whereas the dashed lines indicate that the NAO-AMOC correlation is insignificant

The lack of a robust NAO-AMOC linkage in the CMIP5 simulations seems to contradict the results of Delworth and Zeng (2016), who found that the NAO impacts the AMOC in three GFDL coupled models. Their results are based on perturbation experiments in which a pattern of anomalous heat flux corresponding to the observed NAO was added to the coupled model. To investigate this, we consider the variability of the wintertime heat flux and maximum mixed layer depth (MLD) as the middle link between the NAO and AMOC since, after all, the NAO presumably impacts the AMOC through air-sea heat flux and deep-water formation (indicated by the wintertime MLD) in the western subpolar North Atlantic. Figure 13 displays the multidecadal variability of the heat flux and mixed layer depth. At each location, the multidecadal variability is defined using the EEMD method. The variability is very similar among different ensemble members, so only the first member is shown. The variability of the heat flux exhibits a somewhat similar distribution in CCSM4 and CSIRO-Mk3-6-0. Both show high variability in the western Labrador Sea (but further to the north in CSIRO-Mk3-6-0), the northern Irminger Sea, and the Nordic Seas (Fig. 13a, b). The variability of the mixed layer depth differs significantly among CMIP5 simulations. It is higher and covers a larger area in the western subpolar region in CCSM4 than in CSIRO-Mk3-6-0 (Fig. 13c, d).

Fig. 13

The magnitude of the multidecadal variability of the modeled a, b wintertime (December through March) heat flux (in W/m²) and c, d maximum mixed layer depth (in m) in CCSM4 and CSIRO-Mk3-6-0 simulations. The mixed layer depth is defined by a density difference equivalent to a temperature change of 0.2 °C. The results are based on the first ensemble member (results are very similar among different members). The black box defines the western subpolar North Atlantic (30–70°W, 50–65°N)

Despite the differences in detail between CCSM4 and CSIRO-Mk3-6-0, the modeled variability of the wintertime heat flux and the mixed layer depth in the western subpolar North Atlantic (30–70°W, 50–65°N) is consistently linked to the AMOC variability, with a correlation that is significant across all ensemble members of the CCSM4 and CSIRO-Mk3-6-0 simulations (Fig. 14). This is consistent with Delworth and Zeng (2016), who found that the variability of the heat flux associated with the observed NAO fluctuations impacts the deep-water formation and is strongly linked to AMOC variability in three GFDL coupled models. On the other hand, the modeled variability of the wintertime NAO index is not always linked to the variability of the wintertime heat flux and MLD in the western subpolar North Atlantic (Fig. 15). For those ensemble members that do show a significant correlation between the NAO and heat flux or between the NAO and MLD, the NAO-AMOC linkage becomes significant (Figs. 13, 15). Therefore, the lack of a robust NAO-AMOC linkage in the CMIP5 simulations arises mainly because the modeled NAO variability is not consistently linked to the variability of wintertime heat flux and deep-water formation in the western subpolar North Atlantic. This linkage is more robust in the CORE-II simulations: 17 out of 18 CORE-II simulations (all except NOCS) exhibit a significant linkage between the NAO and MLD in the western subpolar North Atlantic (not shown; the consistent MLD variability in CORE-II simulations is discussed in Danabasoglu et al. 2016).
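A regional index such as the heat flux or MLD averaged over the western subpolar North Atlantic box (30–70°W, 50–65°N) can be formed in several ways, and the text does not specify the averaging used. The sketch below shows one plausible option under stated assumptions: a regular latitude-longitude grid, longitudes given in degrees east between -180 and 180, cosine-of-latitude area weights, and land points marked with `None`. All names are hypothetical, and model output on curvilinear grids would need the model's own cell areas instead.

```python
import math


def box_mean(field, lats, lons, lat_range=(50.0, 65.0), lon_range=(-70.0, -30.0)):
    """Area-weighted mean of field[i][j] over a latitude-longitude box.

    field is indexed as field[lat_index][lon_index]. Grid points with value
    None (e.g. land) are skipped. Weights are cos(latitude), which is an
    assumption appropriate only for a regular latitude-longitude grid.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for i, lat in enumerate(lats):
        if not lat_range[0] <= lat <= lat_range[1]:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if not lon_range[0] <= lon <= lon_range[1]:
                continue
            value = field[i][j]
            if value is None:
                continue
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no valid grid points inside the box")
    return weighted_sum / weight_total
```

Applying `box_mean` to each winter's field yields a time series that can be filtered and correlated with the NAO or AMOC indices in the same way as in Figs. 14 and 15.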

Fig. 14

Lead-lag correlation between the basin-wide averaged AMOC variability and the variabilities of the wintertime a, b heat flux and c, d mixed layer depth in the western subpolar North Atlantic (see black box in Fig. 13) in CCSM4 and CSIRO-Mk3-6-0 simulations. Results are based on the full, nearly 150-year historical integration. Negative lag values indicate that the heat flux/mixed layer depth leads the AMOC. Colored lines indicate results from different ensemble members. Solid lines (along with the red dots) indicate that the correlation is significant (at the 90% level), which is the case for all ensemble members in both CCSM4 and CSIRO-Mk3-6-0 simulations

Fig. 15

Lead-lag correlation between the fluctuations of the station-based wintertime NAO index and the variabilities of the wintertime a, b heat flux and c, d mixed layer depth in the western subpolar North Atlantic (see black box in Fig. 13) in CCSM4 and CSIRO-Mk3-6-0 simulations. Results are based on the full, nearly 150-year historical integration. Negative lag values indicate that the NAO leads the heat flux/mixed layer depth. Colored lines indicate results from different ensemble members. Solid lines (along with the red dots) indicate that the correlation is significant (at the 90% level), whereas the dashed lines indicate that the correlation is insignificant

5 Summary and discussion

The Atlantic meridional overturning circulation (AMOC) plays a fundamental role in the Earth's climate system. However, because observations are limited, long-term climate simulations must be relied upon to better understand the AMOC variability and to assess its impacts on climate in the coming decades. It is therefore important to gain insights into the AMOC variability represented in the current generation of climate simulations. Specifically, we need to know whether the AMOC variability in coupled simulations is similar to or different from that in observations and atmospherically-forced simulations. This is examined by comparing the AMOC variability in 44 CMIP5 simulations, in which the atmospheric state is fully coupled, to that in 18 CORE-II simulations, in which a common atmospheric state is prescribed. These simple comparisons lead to the following two points:

On interannual and decadal timescales, the AMOC variability in CMIP5 and CORE-II simulations exhibits a similar magnitude and meridional coherence. This implies that the atmospheric variability represented in the coupled models on these timescales is in reasonable agreement with the prescribed atmospheric forcing variability.

On multidecadal timescales, however, the AMOC variability is stronger by a factor of more than two and is meridionally more coherent in the CORE-II simulations than in the CMIP5 simulations. The long-term AMOC variability is often described as a lagged oceanic response to the atmospheric variability, the NAO in particular (e.g., Böning et al. 2006; Deshayes and Frankignoul 2008; Xu et al. 2013; Danabasoglu et al. 2016; Delworth and Zeng 2016; Kim et al. 2017). The CMIP5 simulations do exhibit a weaker NAO variability on multidecadal timescales, compared to climate data. One cannot fully attribute the weaker AMOC variability to the weaker NAO, however, because the CMIP5 simulations do not exhibit a robust NAO-AMOC linkage. While the variability of the modeled wintertime heat flux and mixed layer depth in the western subpolar North Atlantic is consistently linked to the AMOC variability, the modeled NAO variability is not.

Our finding of a weak multi-decadal variability of the AMOC and NAO in the CMIP5 simulations is consistent with the recent works of Wang et al. (2017), Kim et al. (2017), and Yan et al. (2018). It supports the general assessment that the current state-of-the-art coupled models lack natural variability on multidecadal timescales in the North Atlantic. Peings et al. (2016) showed that the modeled Atlantic Multidecadal Variability (AMV) is also weaker than observed in most of the CMIP5 simulations. Given the weak variability, it is probably not too surprising that the relationships between NAO, AMOC, and AMV are also weak or non-robust. As for the NAO-AMOC linkage discussed here, there is also a broad spread in the relationships between the NAO and AMV (Peings et al. 2016) and between the AMV and AMOC (Frankignoul et al. 2017) among the CMIP5 simulations and/or in different ensemble members of the same CMIP5 model.

Finally, our results of a non-robust NAO-AMOC linkage in the CMIP5 models are not sensitive to the definitions of the NAO index (i.e., station-based versus principal component-based) and/or the AMOC index (basin-averaged versus one latitude). It is important to keep in mind, however, that these indexes primarily reflect the magnitude of the NAO and AMOC and that changes in the spatial structure are not considered. For example, what matters most for the deep-water formation, and hence to some extent for the AMOC, is the meridional gradient of the sea level pressure and the westerlies over the western subpolar North Atlantic, which can be altered by changes in both the magnitude and the structure of the NAO. Similarly, the impact of the NAO on the AMOC may be reflected in the water mass properties (i.e., temperature, salinity, and density) of the upper and lower limbs of the AMOC. These aspects of the variability need to be further investigated.