1 Introduction

The regionalization of climate studies can be viewed in terms of two fundamental objectives. The first concerns improved understanding, assessing the relative roles of remote and local processes. The second is utilitarian: the transmission of information from large to small scales for impact studies. Both are of great importance for studies of the West African monsoon, which has seen major climate variations on interannual timescales with severe local impacts. To understand and predict these variations, mechanisms have been identified involving both teleconnections to remote sea-surface temperature anomalies (Folland et al. 1986; Janicot et al. 1996) and local land surface feedbacks (Charney 1975; Zheng and Eltahir 1998; Zeng et al. 1999; Douville et al. 2001; Philippon and Fontaine 2002). Both types of processes matter, and their relative importance depends on the time and space scales considered.

One approach to the remote/local dichotomy and the problem of multiple scales is to use a regional climate model (RCM), forced at the boundaries by observational analyses or general circulation model (GCM) results. Such models are able to simulate regional climates at higher resolutions than GCMs and with a dedicated physics adapted to the region of interest. RCMs can be used in experiments designed to isolate the effects of different boundary influences in a setting that is more realistic than typical GCM sensitivity experiments because of the imposition of observed conditions at the boundary. Studies of the West African monsoon with RCMs have recently been carried out by Paeth et al. (2005), Gallée et al. (2004), Vizy and Cook (2002) and Ramel et al. (2006). Since regional climate models are relatively expensive to run, sensitivity experiments are usually limited to two realisations, with differing boundary conditions or differing internal processes or both. Conclusions are drawn from the difference between the two runs and if further runs are to be carried out the resources are usually put into another study: a different year, a different parameterisation, some change in the boundary, etc. The statistical significance of the result is taken for granted, and general experience is used to decide which diagnostics are reliable.

Regional models can be used as scale filters between large-scale climate variations and the impact scale. In this application they become a more interdisciplinary tool, shared by different communities, and it becomes more important to establish some guidance to their interpretation. The purpose of this paper is to give a more objective guide as to which diagnostics are reliable, and how large a signal needs to be before conclusions can be safely drawn from it. This type of information is equally applicable to assessing confidence in sensitivity experiments and to providing a reasonable range of inputs to impact models.

We present a pair of experiments over West Africa which differ only in their internal variability. All boundary conditions, surface conditions and parameterisations are exactly the same for the two runs. Only the initial condition is changed. This gives rise to differences between the two experiments that are independent of all the factors usually considered in sensitivity experiments. After an initial spin-up time (3 months, which are not considered in the following analysis) this arbitrary change in initial conditions is forgotten and the differences between the two experiments are small. They represent the noise level: an amplitude for random unpredictable perturbations. This noise level depends on the variable and the time period under consideration, and a range of diagnostics will be presented here that are of interest to impact modellers. For sensitivity studies the results give a measure of significance that is necessary if inferences are to be drawn about mechanisms. For smaller scale impact models that rely on these regional simulations, the results provide a lower limit for meaningful changes in input, or alternatively a measure of the range of inputs that should be provided for an ensemble study.

This is the first dedicated internal variability study for the West African region, although recently Paeth and Feichter (2006) have used an analysis of variance approach to assess the significance of simulated changes over the region due to greenhouse gas modifications. In that study they applied an F-test to compare their signal with their inter-sample variability (or “treatment effect”, see Von Storch and Zwiers 1999). In this study our sole objective is to measure the inter-sample variability itself (where a sample is a temporal or spatial average on a subset of the data). By comparing two independent experiments we obtain a clearly separated measure of the internal variability, which consists of the treatment effect plus any effects due to non-independence of samples (i.e. lower frequency internal variability). This measure can in turn be applied to a sensitivity experiment to assess the significance of the changes observed.

Previous work in other regions has taken the form of analysis of variance in sensitivity experiments (Weisse et al. 2000; Christensen et al. 2001) or experiments such as ours, specifically designed to assess confidence levels in RCMs. These latter studies have generally initiated different integrations with random perturbations and found that the size and form of the perturbation has little influence on the result. Instead they find that the internal noise can vary with region and with season. For example, Giorgi and Bi (2000) initiated an RCM over China with various spatial arrangements for the initial perturbations and found that the noise level was sensitive to season, but not to the form of the initial perturbation. Using a range of perturbation methods, Caya and Biner (2004) and Rinke et al. (2004) draw similar conclusions for simulations over eastern North America and the Arctic respectively, with each simulation possessing special internal variability characteristics linked to the local climatology.

It is evidently of interest to carry out a separate study for a tropical region, where the characteristics of the climatology and thus the internal variability may be quite different. Such an experiment produces many interesting avenues of investigation, but here we concentrate on providing practical quantitative information, particularly on the relationship between the noise and the spatial or temporal scale considered. In Sect. 2 the model and experimental design are described and in Sect. 3 the results are shown. Conclusions are given in Sect. 4.

2 Model and experimental setup

The regional climate model MAR (Modèle Atmosphérique Régional) used in this study is a hydrostatic primitive equation model in which the vertical coordinate is the normalized pressure. The dynamics of the model are described in Gallée and Schayes (1994). The warm part of the cloud microphysics is based on an explicit representation associated with the work of Kessler (1969) and Gallée (1995). Solar and infrared radiation schemes are taken from Fouquart and Bonnel (1980) and Morcrette (1984), respectively. Originally conceived for polar regions, MAR has been adapted to tropical regions by implementing the convective adjustment scheme of Bechtold et al. (2001). Further details of the tropical implementation of MAR can be found in Gallée et al. (2004) and Messager et al. (2004).

The simulated domain covers Western Africa, from 25°W to 22°E and 6°S to 35°N (see Fig. 1). The horizontal grid spacing is 40 km and the vertical dimension is represented by 40 vertical levels irregularly spaced with a finer resolution close to the surface (first level 10 m above the ground). The atmospheric variables are initialized and forced every 6 h at the lateral boundaries using the ERA40 reanalyses from the ECMWF (European Centre for Medium-range Weather Forecasts). Details of the boundary forcing can be found in Marbaix et al. (2003). In all figures presented here, the buffer zone (a band 200 km wide) and the damped layer at the top of the model (6 sigma levels) are removed. The MAR is coupled with the one-dimensional land surface scheme SISVAT (Soil Ice Snow Vegetation Atmosphere Transfer) (De Ridder and Schayes 1997) in which water and energy budgets are solved independently for soil and vegetation. The SISVAT model has been validated by Derive (2003) for the Sahelian region.

Fig. 1 Domain of the simulation. The boxes represent subregions of interest: West Africa Monsoon (WAM), Western Sahel (WSA), Central Sahel (CSA), Eastern Sahel (ESA), Central Sudan (CSU), Central Guinea (CGU), Ouémé catchment (OUE), Niamey area (NIA), and Gourma area (GOU)

Two simulations were performed for the year 1986 (1 January to 31 December). Within the context of the dry conditions experienced during the 1970s–1980s, this year is considered neither wet nor dry (see Nicholson 1993, in which the wet year 1987 and the dry year 1988 are compared). The first run (EXP1) is a standard run, using the model configuration described previously. The second run (EXP2) is identical to the first in every respect except the initial conditions, i.e. the model state (including the boundaries) at the first time step. In order to introduce an initial perturbation, the zonal wind, meridional wind, potential temperature, pressure, specific humidity and surface temperature were initialised with the values for 1 January 1987 (18:00 UT). This effective initial perturbation is shown in Fig. 2. This is a large perturbation and for our purposes it is completely arbitrary. Figure 3a shows the subsequent development of the area-averaged kinetic energy of the difference field EXP1–EXP2 in the mid-troposphere (500 hPa). After the initial shock the value diminishes over a period of a few months (January until March) and reaches an equilibrium level from April. It is the size of this “equilibrium” signal that interests us here for variables relevant to process and impact studies. This is the noise level, which represents the part of the signal from any experiment that is essentially unpredictable. It will depend on the physical and dynamical characteristics of the experiment (the region) and on the climatic situation that is imposed by the lateral boundary conditions (the season or year), even though neither of these factors changes from one integration to the other. On the other hand, to be generally meaningful it should be independent of the form and magnitude of the initial perturbation. The difference between the two runs as measured in Fig. 3a shows that the initial perturbation is large and that it subsequently decays. The constant level to which it equilibrates could clearly be achieved with a smaller initial perturbation over a shorter equilibration time. It is also possible that an initial perturbation that is smaller than the equilibrium level would result in a growing difference field that would equilibrate to the same noise level, although we have not tested this possibility. To test a variable that has a longer memory and thus a slower equilibration time, the development of the difference in integrated soil moisture over the region is plotted in Fig. 3b. The value recovers quickly from a sharp initial peak but continues to decline steadily into the summer. It should be noted, however, that the value of this anomaly is extremely small compared to the actual soil moisture (0.01%) and is unlikely to have a significant feedback effect, although whatever effect it does have is of course measured as part of our experiment.
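As an illustration of the diagnostic plotted in Fig. 3a, the following is a minimal sketch of an area-averaged kinetic energy of a difference wind field. It assumes kinetic energy per unit mass, 0.5 (Δu² + Δv²), and an unweighted mean over grid boxes of equal area; the function name and the toy values are hypothetical and are not taken from the MAR code.

```python
def difference_kinetic_energy(u1, v1, u2, v2):
    """Area-mean kinetic energy per unit mass of the difference wind field.

    u1, v1: zonal and meridional wind of the first run (2-D lists, m/s).
    u2, v2: the same fields from the second run.
    Assumption: all grid boxes have equal area, so a plain mean is used.
    """
    total, count = 0.0, 0
    for row_u1, row_v1, row_u2, row_v2 in zip(u1, v1, u2, v2):
        for a, b, c, d in zip(row_u1, row_v1, row_u2, row_v2):
            du, dv = a - c, b - d          # difference wind components
            total += 0.5 * (du * du + dv * dv)
            count += 1
    return total / count


# Hypothetical two-box domain: the runs differ in the first box only.
ke = difference_kinetic_energy([[3.0, 0.0]], [[4.0, 0.0]],
                               [[0.0, 0.0]], [[0.0, 0.0]])
print(ke)
```

Evaluating such a function once per day on the 500 hPa winds would give a time series of the kind described in the text, under the stated assumptions.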

Fig. 2 Difference between the standard run EXP1 and the initially-perturbed run EXP2, at the first time step (01/01/1986, 18:00 UT), for the zonal wind (top) and specific humidity (bottom), at 10 m in height (left) and at 0°E (right). Dotted curves denote negative values

Fig. 3 Kinetic energy of the difference field EXP1–EXP2 at 500 hPa (a) and the soil moisture difference field EXP1–EXP2 (all layers) (b), averaged over the whole domain (daily timescale)

3 Results

3.1 Time scales

The magnitude of the internal variability of the model is first established by looking at the difference EXP1–EXP2 for a selection of variables. Time-averages of this difference were calculated for various time intervals, for the rainfall, the air temperature (T) at 10 m and the wind speed in the boundary layer (BL). The time intervals considered are daily, 5-day, 10-day, monthly, “seasonal” (3-month averages) and “annual” (April–September average). Table 1 summarises the maximum absolute differences observed across the domain. This absolute difference decreases as the averaging period gets longer. At a daily time interval, the rainfall difference can reach 27 mm/day, but typical values of the daily difference are around 10 mm/day.
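The calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes consecutive, non-overlapping averaging windows (the text does not say whether windows overlap), and the function name and toy data are hypothetical.

```python
def max_abs_difference(exp1, exp2, window):
    """Largest |time-averaged EXP1 - EXP2| over all windows and grid boxes.

    exp1, exp2: lists of daily 2-D fields, indexed as [day][y][x].
    window: averaging period in days (1 = daily, 5 = 5-day, ...).
    Assumption: windows are consecutive and non-overlapping; an
    incomplete trailing window is ignored.
    """
    n_blocks = len(exp1) // window
    ny, nx = len(exp1[0]), len(exp1[0][0])
    largest = 0.0
    for b in range(n_blocks):
        days = range(b * window, (b + 1) * window)
        for y in range(ny):
            for x in range(nx):
                mean_diff = sum(exp1[t][y][x] - exp2[t][y][x]
                                for t in days) / window
                largest = max(largest, abs(mean_diff))
    return largest


# Hypothetical rainfall (mm/day): the same event falls one day apart
# in the two runs, so the difference cancels in the 2-day mean.
run1 = [[[10.0, 0.0]], [[0.0, 0.0]]]
run2 = [[[0.0, 0.0]], [[10.0, 0.0]]]
print(max_abs_difference(run1, run2, 1), max_abs_difference(run1, run2, 2))
```

The toy case shows why the maximum difference shrinks with longer averaging: a small displacement in time of an otherwise identical event contributes fully at the daily scale and cancels over a longer window.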

Table 1 Absolute maximum differences between EXP1 and EXP2 for each considered variable and time interval, across the whole domain, from April to September

It should be remembered that Table 1 shows maximum values. These values represent a worst case for uncertainty for a given time interval if one is interested in the smallest scale present in the model (recall the resolution is 40 km). For example, if a sensitivity experiment yields a 5-day mean change in rainfall of 8 mm/day at a given location, this result may have no statistical significance. If the same value appears in the monthly mean signal it is probably safe to attach a physical interpretation to it and examine its effect in an impact study. The same general principle applies to the values for temperature and wind also shown in Table 1.

In order to illustrate the spatial characteristics of these anomalies, 14 July has been chosen as a typical day, and the difference EXP1–EXP2 is shown in Fig. 4 along with results averaged over longer periods that span this day. The boundary-layer wind, temperature and precipitation are shown and the structures seen are representative of the rainy season in general. Although the origin of the difference signal can be considered random, the signal itself shows clear synoptic scale organisation associated with dynamical balances for wind and temperature (if both runs are dynamically balanced at the large scale then the difference between them must be also). The precipitation anomalies are smaller scale, consistent with their convective generation, and are co-located with areas of maximum rainfall in the individual runs. The typical annual mean ratio between convective and stratiform rainfall in the model is 87% versus 13% over West Africa, and separate diagnostics confirm that most of the differences in rainfall seen here are of convective origin. Longer averaging leads to a universal reduction in the rainfall signal (note the changing colour scales) as small displacements of the grainy signal lead to cancellations. Owing to their greater coherence at larger scales and slower temporal evolution, the signal for the dynamical variables diminishes less rapidly as the averaging period increases.

Fig. 4 Differences between EXP1 and EXP2 for the wind in the boundary layer (BL) (left, arrows), the temperature (T) at 10 m (left, colours), and the precipitation (right) at daily, 5-day, 10-day and monthly timescales

To progress from a case study to a measure that is more generally applicable, we must quantify these effects and their geographical distribution over the whole of the rainy season. A long term average of the difference between the two runs will of course be extremely small, but we are interested in finding a typical difference between the two runs for the whole season, on a range of sub-seasonal time scales. The appropriate measure is the standard deviation, which we calculate according to:

$$ \text{standard deviation} = \sqrt{ \overline{ \left[ \left( \text{EXP1} - \text{EXP2} \right)_{\Delta t} - \overline{ \left( \text{EXP1} - \text{EXP2} \right) }_{\text{Apr-Sep}} \right]^{2} }_{\text{Apr-Sep}} } $$

where EXP1–EXP2 is the anomaly signal, Δt is the averaging period considered (Δt = daily, 5-day, 10-day, monthly, seasonal) and the overbar denotes a temporal average over the period indicated by the subscript. In each case the calculation is made relative to the April–September mean.
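The definition above can be sketched in code for a single grid point. This is a minimal illustration under stated assumptions: the averaging windows are taken as consecutive and non-overlapping, the overbar is implemented as a plain mean (dividing by the number of windows), and the function names and toy series are hypothetical.

```python
import math


def block_means(series, window):
    """Average a daily series over consecutive, non-overlapping windows."""
    n_blocks = len(series) // window  # an incomplete trailing window is dropped
    return [sum(series[i * window:(i + 1) * window]) / window
            for i in range(n_blocks)]


def noise_std(exp1, exp2, window=1):
    """Standard deviation of the window-averaged difference EXP1 - EXP2,
    taken about the mean difference over the whole period."""
    diff = [a - b for a, b in zip(exp1, exp2)]
    period_mean = sum(diff) / len(diff)      # inner overbar (whole period)
    averaged = block_means(diff, window)     # (EXP1 - EXP2) averaged over window
    mean_square = sum((d - period_mean) ** 2 for d in averaged) / len(averaged)
    return math.sqrt(mean_square)            # outer overbar, then square root


# Hypothetical four-day series at one grid point.
a = [1.0, 3.0, 2.0, 6.0]
b = [0.0, 0.0, 0.0, 0.0]
print(noise_std(a, b, window=1), noise_std(a, b, window=2))
```

Applied to daily fields for April to September with windows of 1, 5 and 10 days and so on, and repeated for every grid point, a function of this kind would produce maps analogous to those discussed next, subject to the assumptions listed.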

Figure 5 shows the standard deviation of the wind speed in the boundary layer, the temperature at 10 m and the precipitation for the five different timescales. The signal shows large-scale structure and in all cases decreases steadily in magnitude with increasing timescale (for these plots the colour scales have been kept constant so they can be directly compared). There is a great deal of variance in the wind off the coast near Fouta Djallon. There are corresponding variations in temperature, which remain strong out to the 10-day timescale despite the fixed sea-surface temperatures. Variations in temperature are strongest over the southern part of the continental surface, where the strongest rainfall variations are located. Variations in rainfall appear to mimic the actual rainfall distribution and can probably be understood as essentially random variations in convection. The spatial distribution does not change with increasing timescale. It should be remembered that these variations are random in origin even though they display some spatial structure. They are different from the systematic errors also present in the model and must be considered separately when assigning confidence measures to the model output. Like many other models, the MAR has a tendency to produce too many weak rainfall events and not enough strong ones (see Ramel 2006). If this systematic error were not present, the random error measured here might well be different, but it would still be a source of uncertainty.

Fig. 5 Standard deviations of the anomaly field EXP1–EXP2, shown for the wind in the boundary layer (BL) (left), the air temperature at 10 m (T) (middle) and the precipitation (right) at daily, 5-day, 10-day, monthly and seasonal timescales

3.2 Spatial scales

If the internal variability in the model is truly random, we can reasonably expect the signal to decrease when considering larger spatial scales in exactly the same way as when considering longer timescales. The fact that the signal we are interested in has various physical and dynamical origins means that this is not necessarily so, and it is interesting to consider separately the question of spatial scales. Here we examine rainfall and temperature on daily timescales, using the standard deviation as defined above for the rainy season. To measure the variability associated with spatial scales larger than the model grid size, the model resolution was progressively degraded by spatial averaging prior to calculating the standard deviation. The spatial degradation was performed as follows. The coordinates of each grid box centre in the new resolution were chosen to match the coordinates of the previous grid box centre (i.e. in the 40-km resolution). For the new enlarged grid box, the mean was computed on the basis of an 80 × 80, 160 × 160 or 320 × 320 km area, as the sum of: (1) the n² full grid boxes forming a square centred on the 40-km initial grid box (where n = 1, 3 or 7 for an 80, 160 or 320-km resolution, respectively), (2) half of the m adjacent grid boxes to that new square (where m = 4, 12 or 28, respectively), and (3) a quarter of the four remaining corner boxes.
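The weighting just described can be sketched as follows. The sketch assumes the weighted sum is normalised by the total weight, n² + m/2 + 1 = (n + 1)², which equals the enlarged area in units of 40-km grid boxes (4, 16 or 64); the function names are hypothetical and the treatment of points near the domain edge (here simply rejected) is an assumption.

```python
def degrade_weights(n):
    """1-D weights: n full boxes flanked by one half box on each side.
    The outer product of this vector with itself gives weight 1 for the
    n*n inner boxes, 0.5 for the 4n edge boxes and 0.25 for the corners."""
    return [0.5] + [1.0] * n + [0.5]


def degraded_mean(field, i, j, n):
    """Weighted mean of a 2-D list `field` centred on grid box (i, j).
    n = 1, 3 or 7 corresponds to 80, 160 or 320 km on a 40-km grid."""
    w = degrade_weights(n)
    half = (n + 1) // 2  # distance from the centre to the outer partial boxes
    if (i - half < 0 or j - half < 0 or
            i + half >= len(field) or j + half >= len(field[0])):
        raise ValueError("averaging square extends beyond the domain")
    total = 0.0
    for di, wi in enumerate(w):
        for dj, wj in enumerate(w):
            total += wi * wj * field[i - half + di][j - half + dj]
    return total / (n + 1) ** 2  # total weight = (n + 1)^2


# Hypothetical 3 x 3 field with a single non-zero value at the centre.
spike = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(degraded_mean(spike, 1, 1, 1))
```

Because the output is computed at every original grid box centre, the degraded field keeps the 40-km sampling while representing averages over the larger area.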

Results are shown in Fig. 6 for the model resolution of 40 km (as before) and the degraded resolutions of 80, 160 and 320 km. Unsurprisingly, the precipitation signal decreases uniformly without any obvious change in its spatial distribution. The temperature signal also decreases, but it decreases more over the continent than over the sea. The structures associated with temperature variance on the daily timescale are more spatially coherent on large scales over the sea than over the continent. We may conclude that over the continent the temperature variance is driven by changes in convection, whereas over the sea it results from propagating synoptic scale systems, and the convection signal is less influential.

Fig. 6 Spatially degraded standard deviation of the anomaly field EXP1–EXP2 for the 10-m air temperature and the precipitation at a daily timescale. The spatial resolution was progressively degraded from 40 km to 80, 160 and 320 km before the calculation of the standard deviation

To bring together information about internal variability on different spatial and temporal scales in a practical way, we present in Table 2 the maximum values, and in Table 3 the standard deviation, averaged over nine regions used for standard diagnostics and field data (see Fig. 1). This is done for the five standard temporal categories, plus the “annual” mean (i.e. the rainy season, April–September). The regions are not all the same size, and geographically they span maxima and minima in our observed variability signals. This is reflected in the wide range of magnitudes in Tables 2 and 3, with the smaller study areas obviously showing much larger values.

Table 2 Maximum absolute differences between EXP1 and EXP2 for nine regions of interest and each considered variable and timescale
Table 3 Standard deviation of the difference between EXP1 and EXP2 for nine regions of interest and each considered variable and timescale

4 Conclusions

It is necessary to have a measure of uncertainty associated with any experiment. This is true for experiments with regional climate models (RCMs) where the influence of some change in physical processes or boundary conditions is investigated. It is also necessary to have a measure of uncertainty in simulations of climate change. This is true for impact studies, which can amplify uncertainties in small-scale input data. In this paper we provide a practical measure of random uncertainty suitable for both applications by performing an initial condition perturbation experiment with a regional climate model (MAR). Unlike an ensemble forecast, where ensemble members diverge, the RCM experiments converge because they are controlled by identical boundary conditions. The initial condition is not of primary interest; it serves only to perturb the system. An equilibrium state is reached in which any further difference signal is internally generated and self-sustaining, and independent of the initial condition. It is sufficient that the initial perturbation is large enough to allow this to occur.

This signal represents the extent to which two identical integrations can differ. Non-identical pairs of experiments performed to evaluate a sensitivity or a response must differ by more than this amount for the conclusions to be reliable. Ensembles of inputs for impact studies must span at least a range of values equal to the size of this signal.
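As a minimal sketch of how this criterion might be applied in practice, the function below flags the grid boxes where a sensitivity signal exceeds a given noise level. The function name and the numbers are hypothetical placeholders, not values from the tables of this paper; in a real application the noise level would be taken from the diagnostic matching the variable, timescale and spatial scale of interest.

```python
def significant_mask(signal, noise_level):
    """True where the absolute signal exceeds the internal-variability
    noise level. Both arguments are 2-D lists of the same shape."""
    return [[abs(s) > n for s, n in zip(signal_row, noise_row)]
            for signal_row, noise_row in zip(signal, noise_level)]


# Placeholder values (mm/day) for two grid boxes.
signal = [[8.0, -12.0]]
noise = [[10.0, 10.0]]
print(significant_mask(signal, noise))
```

A strict inequality is used because the text requires the experiments to differ by more than the noise amplitude.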

We find that the daily signal for precipitation at 40 km scale can reach values of about 25 mm. The rainfall signal has a great deal of small-scale structure and we deduce that it is associated with essentially random variations in convective triggering. Temporal or spatial averaging thus always reduces this signal, and the reduction occurs relatively rapidly as we move to longer periods and larger scales. Daily temperature and wind variations at the 40 km scale can reach values of around 3°C and 4 m/s. These values persist in time and space to a greater extent than the precipitation signal, particularly over the ocean where the convective influence is weaker and large-scale dynamically coherent structures appear in the difference field. Figure 7 provides a visual summary of these findings in the form of a plot of the standard deviation in the time-space domain for the three variables considered in the study. Shown in Fig. 8 is a spatio-temporal spectrum for the low-level meridional wind. The peak for internal variability is around 4 days with a spatial scale of about 2,500 km, consistent with the internal generation of African easterly waves. This further illustrates the dynamical coherence of the internal variability signal for the circulation, but again this result is not reproduced for the precipitation.

Fig. 7 Maximum standard deviations of the anomaly field EXP1–EXP2 over the WAM subregion, shown in the space-time domain for the wind in the boundary layer (BL) (left), the 10-m air temperature (T) (middle) and the precipitation (right)

Fig. 8 Power spectrum of the anomaly field EXP1–EXP2 for the wind in the boundary layer, averaged over 5°N–20°N in the sub-region WAM. The contours are drawn on the basis of the base-10 logarithm of the power in arbitrary units (Wheeler and Kiladis 1999)
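The idea behind a spatio-temporal spectrum can be illustrated with a deliberately simple sketch: a direct two-dimensional discrete Fourier transform of a time-longitude section of the difference field. This is not the procedure of Wheeler and Kiladis (1999), which involves further processing steps; the function name, the normalisation and the synthetic wave are assumptions made for illustration only.

```python
import cmath
import math


def space_time_power(field):
    """Power of a time-longitude section field[t][x] at each pair of
    frequency index f and zonal wavenumber index k (direct 2-D DFT).
    Cost grows as (nt * nx)^2, so this is only suitable for small arrays."""
    nt, nx = len(field), len(field[0])
    power = []
    for f in range(nt):
        row = []
        for k in range(nx):
            s = 0j
            for t in range(nt):
                for x in range(nx):
                    phase = -2j * cmath.pi * (f * t / nt + k * x / nx)
                    s += field[t][x] * cmath.exp(phase)
            row.append(abs(s) ** 2 / (nt * nx) ** 2)
        power.append(row)
    return power


# Synthetic propagating wave: wavenumber 1, two cycles over the record.
nt = nx = 8
wave = [[math.cos(2 * math.pi * (x / nx - 2 * t / nt)) for x in range(nx)]
        for t in range(nt)]
spectrum = space_time_power(wave)
```

A single propagating wave appears as a pair of conjugate peaks, and the relative sign of frequency and wavenumber at the peaks distinguishes the direction of propagation. For real model output a fast Fourier transform routine would normally replace the nested loops.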

Naturally these results are most relevant for future experiments with the same model. Other models may produce different results, especially if there is a large difference in simulation of physical processes (the location of maxima in convection, for example). Nevertheless, to the extent that MAR is realistic, or at least to the extent that it shares systematic errors with other models, our findings may have a more general applicability. Comparisons between different RCMs, and the general question of the relationship between systematic errors and the random errors studied here will provide interesting material for future work in the limited area modelling community. In a related point, it should be stressed that this study only addresses the internal variability of the regional model itself. In some applications (climate change experiments for example) there will be further uncertainty associated with the larger scales, which will enter the regional model through uncertainties in the boundary conditions, which we have kept fixed in this study. If the climatological situation at larger scales changes sufficiently, it is also possible that it will affect the structure of the internal noise in the RCM. This comment applies equally to RCMs driven by different GCMs with different systematic errors. These sources of uncertainty must be added to the uncertainty that we have measured in this study. However, we believe that it is useful to assess these factors separately and independently. We hope that the information contained in the figures and tables of this paper will be a useful reference for climate modelling and impact studies over West Africa.