1 Introduction

To assess the climate and to plan the sustainable development of economic and scientific activities in European areas, there has recently been growing concern about the need for long-term information on prevailing environmental conditions. Additionally, there has been a major resurgence of interest in climate variability and trends on a regional basis (Zorita et al. 1992; González-Rouco et al. 2000; Xoplaki et al. 2003). The major methodological drawback for a long-term assessment of regional climate and its variability is the lack of suitable observational and simulated data. The existing long-term databases, in terms of both local observations and analysed products, present inhomogeneities due to irregular spatial distributions as well as temporal inhomogeneities linked to changes and improvements in the measurement methodologies and devices. In order to create useful homogeneous climate databases, a number of institutions (such as NCEP/NCAR, ECMWF, NASA, and others) have made efforts to produce the so-called global reanalyses (Kalnay et al. 1996; Gibson et al. 1997; Uppala 2001; Rood et al. 2001). To that end, they have analysed weather observations of the past decades with the same frozen state-of-the-art analysis scheme. Although these global reanalysis data are an adequate tool to evaluate trends at global scale, their use in regional climate studies presents some limitations because of their coarse spatial resolution. The resolution of global reanalyses allows large-scale features to be resolved, but not the smaller-scale details resulting from the interaction of large-scale flows with regional geographical features such as orography, land–sea distribution, and soil types (von Storch 1999). Therefore, when the interest lies in areas marked by complex orography (e.g. the Mediterranean basin), and especially in surface parameters, the coarse resolution of the global reanalysis data is certainly a shortcoming (Sotillo et al. 2005).

Many efforts have been made to fill the existing gap in adequate climatological databases at regional scale. In that sense, several projects have aimed at the reconstruction and generation of homogeneous, long-term, high-resolution environmental databases (Juang et al. 1997; Günther et al. 1998; Ebisuzaki et al. 1998; Cox and Swail 2001; Messinger et al. 2003). The HIPOCAS (Hindcast of Dynamic Processes of the Ocean and Coastal Areas of Europe) Project was established to produce a high-resolution, homogeneous, long-term database comprising atmospheric and oceanic parameters for the assessment of the climate of European waters, its trends, and its variability (Guedes et al. 2002). Within the HIPOCAS Project, Puertos del Estado (PE) was responsible for the implementation and execution of atmospheric and oceanographic hindcast systems for the whole Mediterranean basin. Ratsimandresy and Sotillo (2003), Sotillo (2003), and Sotillo et al. (2005) showed that the HIPOCAS hindcast data reproduce with reasonable accuracy the Mediterranean atmospheric state as well as the wave and sea-level climate of this basin. In addition, Sotillo et al. (2005) showed that the dynamical downscaling performed to produce the HIPOCAS hindcast data introduced a substantial regional improvement in comparison with the NCEP/NCAR global reanalysis. These earlier works, derived from the HIPOCAS data validation process, focused mainly on oceanic parameters and surface atmospheric variables such as mean sea-level pressure, 2-m temperature, and 10-m wind field. This paper focuses on the behaviour of the HIPOCAS precipitation data over the Iberian Peninsula and the Balearics, thereby completing the HIPOCAS validation process outlined in those works.

Precipitation over Iberia has a strong seasonal character. Winter precipitation is mainly related to baroclinic synoptic-scale perturbations (Zorita et al. 1992) and is the main contribution to the annual regime over many areas of Iberia. Nevertheless, in some regions, particularly over the eastern flank, the maximum seasonal precipitation occurs in autumn or spring. In contrast, the sparse summer rainfall over most of the peninsula depends on local factors and is mainly caused by convective storms associated with ground heating, high moisture content, and upper-level instability (Sumner et al. 2001). Topography plays a leading role in the characterization of the Iberian rainfall regime (Doswell et al. 1998; Romero et al. 2000). At local scales, the orography can be a decisive factor in the development of cloud systems or in the enhancement of precipitation from pre-existing precipitating systems, leading to enhanced rainfall differences between uplands and lowlands or between slopes with different exposures to the humid flows. At larger scales, the synoptic and mesoscale flows are generated or redirected, enhancing precipitation in favourably exposed areas and suppressing it in other, more sheltered ones (Valero et al. 2004).

This paper attempts a first characterization of the winter HIPOCAS rainfall regime over the Iberian Peninsula and the Balearics, providing a validation of these hindcasted data. To achieve this goal, a high-resolution precipitation database derived from the Spanish Meteorological Service observing networks is used to validate the HIPOCAS precipitation. In addition, the potential improvement of these new hindcasted data over current global reanalyses is evaluated. Although the global reanalyses can realistically characterize the large-scale atmospheric features, the regional ones are less accurately resolved. This point highlights the need for precipitation databases able to reproduce the regional rainfall structures occurring over areas marked by complex orography. In that sense, we expect the hindcasted HIPOCAS precipitation data to be especially useful to environmental scientists working with regional and local models who need high-resolution atmospheric surface parameters to drive their models. Moreover, the important improvement of this new regional database with regard to the current global reanalyses makes the HIPOCAS database particularly helpful in areas such as the southern Mediterranean, where long-term observations are scarce and the global reanalyses are almost the only adequate homogeneous climatic databases.

The paper is organized as follows. A brief description of the HIPOCAS precipitation dataset, the observed Iberian precipitation dataset, and the global reanalysis data used is given in Sect. 2. Sect. 3 presents the methodology used to validate the simulated precipitation field. Sect. 4 is devoted to verifying the quality of the hindcasted data through extensive comparisons with the observed precipitation database. Sect. 5 compares how HIPOCAS and the global reanalyses characterize the observed precipitation. Finally, the main conclusions are drawn in Sect. 6.

2 Data

2.1 Mediterranean HIPOCAS precipitation dataset

The HIPOCAS long-term database is the result of an atmospheric hindcast performed over the whole Mediterranean basin. To produce the 44-year (1958–2001) hindcast, the regional atmospheric climate model REMO (Jacob and Podzun 1997) was used. The hydrostatic REMO model was set up in its climatic mode. This specific configuration implies the use of the same parameterization scheme as in the general circulation model ECHAM4 (Roeckner et al. 1996). The dynamical scheme of REMO is similar to the one used by the Deutscher Wetterdienst Europe Model/Deutsches Model (DWD EM/DM) operational regional forecast system (Majewski 1991). It is based on the primitive equations in a terrain-following hybrid co-ordinate system. Prognostic variables are surface air pressure, horizontal wind components, temperature, specific humidity, and cloud water content. Physical processes such as radiation (short and long wave), vertical diffusion, stratiform condensation, and convective and surface processes, all of them acting at the sub-grid scale, were treated as in the ECHAM4 model. The parameterization of short-wave and long-wave radiative energy transfer follows Morcrette et al. (1986), with modifications for additional greenhouse gases, the 14.6-μm ozone band, and various types of aerosols. Vertical diffusion and turbulent surface fluxes are computed from Monin–Obukhov theory following Louis (1979).

Concerning precipitation, two different cloud schemes were applied. To evaluate the cloud water content of stratiform clouds, a scheme based on the Sundquist parameterization (Sundquist 1978) was used. Cumulus convection was parameterized by a mass flux scheme (Tiedtke 1989) with modifications after Nordeng (1994). It is also worth noting that a 5-layer soil model was included in order to take into account heat and water budgets within the soil (DKRZ 1994).

The 44-year (1958–2001) Mediterranean hindcast was performed with a horizontal resolution of 0.5°×0.5° (roughly 50×50 km²). The model domain (Fig. 1a) is covered by 101×61 grid points and is wide enough to include the whole Mediterranean basin within the forecast area. Twenty hybrid levels (η) were used in the vertical, with a time step of 300 s. The NCEP/NCAR global reanalysis (Kalnay et al. 1996), available at 00:00, 06:00, 12:00, and 18:00 UTC, was used to set initial and boundary conditions. In order to minimize the propagation of errors from the lateral boundaries into the forecast area, a cosine-shaped relaxation function over an 8-point “sponge zone” adjacent to the boundaries was used (Davies 1976). In order to impose time-variable, large-scale atmospheric states, a spectral nudging technique was used in the REMO hindcast (von Storch et al. 2000). Further information on the Mediterranean REMO hindcast as well as on the HIPOCAS database can be found in Sotillo (2003).

Fig. 1 (a) Mediterranean HIPOCAS domain and its orography at the HIPOCAS resolution. The study domain (10°W–5°E, 35°N–45°N) is framed. (b) Orography and spatial distribution of the stations over the Iberian Peninsula and the Balearics

This paper is focused on the analysis of the winter precipitation over the Iberian Peninsula and the Balearic Islands. Thus, from the whole HIPOCAS Mediterranean domain, a precipitation data subset was extracted over the area of interest keeping the same resolution (0.5°×0.5°) as the HIPOCAS database. Figure 1a also frames the geographical area (10°W–5°E, 35°N–45°N) used in this paper as study domain, as well as the orography detailed with the HIPOCAS resolution. The analysis is performed with monthly accumulated precipitation values for a 41-year period, covering from 1 January 1961 to 31 December 2001.

2.2 Iberian observed monthly precipitation dataset

A high-resolution, daily precipitation database derived from in situ measurements from the station network of the Spanish Meteorological Service (Instituto Nacional de Meteorología, INM) is used to validate the HIPOCAS precipitation. The INM Climatological Area has compiled an Iberian daily precipitation database by means of statistical spatial interpolation of in situ measurements onto a regular grid. The purpose was to build a complete daily dataset needed as input to spatially distributed models and for understanding climate variability at daily scale. All the in situ measurements of daily precipitation from the Historical Database of INM were extracted for the period 1 January 1961 to 31 December 2003, regardless of their time coverage. Thus, the number of available stations depends on the date. These stations, irregularly distributed over the Iberian Peninsula and the Balearics, provided good coverage of the domain. In order to complete the observations in Portugal, data from the European Climate Assessment & Dataset project (ECA&D) were used. The distribution of the available stations on 31 December 1990 and the orography are shown in Fig. 1b.

A 25-km regular grid was chosen by the INM, this resolution being suitable for the spatial scale needed for risk analysis models and for climatic variability studies, including the evaluation of climate change impacts in Spain. The Kriging method was used to interpolate daily precipitation. This interpolation technique preserves more variance than other methods such as inverse distance weighting (Shen et al. 2001); additionally, it is a spatial interpolation tool widely available in Geographical Information System software, which allows comparisons with databases from different countries. All these considerations are outlined in the COST Action 719 (The Use of GIS in Meteorology and Climatology), in which the INM is an active participant. The INM is currently working on extending the time coverage back to 1931, but results are not yet available. Further information about this precipitation dataset can be found in Luna and Almarza (2004).

In order to make the comparison with the REMO precipitation data, it was necessary to interpolate the daily INM dataset to the HIPOCAS grid (0.5°×0.5°, roughly 50×50 km²). From the daily data, the monthly accumulated precipitation was calculated over the 41-year period (1 January 1961–31 December 2001), yielding the Iberian Precipitation Dataset (hereafter called IPD). The observed winter IPD will be used to validate the HIPOCAS hindcasted precipitation.

2.3 NCEP and ERA global reanalyses

In order to complete the validation of the HIPOCAS dataset, an additional comparative study of the above-mentioned HIPOCAS and IPD datasets will be performed with other climatological datasets. In this paper, NCEP/NCAR and ERA global reanalyses will be used for this purpose. To achieve this and to fulfil the comparative study, the monthly NCEP and ERA precipitation data were interpolated to the above-mentioned HIPOCAS (0.5°×0.5°) grid, spanning the same selected spatial domain centred on the Iberian Peninsula and the Balearics and covering the same 41-year HIPOCAS and IPD period. Further information on the NCEP precipitation data can be found in Kalnay et al. (1996). Concerning ERA, Gibson et al. (1997) and Simmons and Gibson (2000) provide information on the reanalysis performed by the ECMWF and its averaged databases.

3 Methodology

Once the monthly winter precipitation series were obtained, several products were derived to validate the HIPOCAS precipitation dataset. A general description of the HIPOCAS, NCEP, ERA, and IPD datasets was made by means of statistics such as precipitation means, the spatial distribution of the bias (simulated value minus observed one), and the root mean squared error (RMSE). Moreover, other statistical parameters, such as correlation coefficients and the temporal evolution of the spatially averaged bias and RMSE, were derived to evaluate the performance of the different models.
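The point-wise statistics described above can be sketched as follows. This is a minimal illustration, not the code used in this study: the function name `validation_stats` and the array layout (time along the first axis, grid points along the second) are assumptions made only for the example.

```python
import numpy as np

def validation_stats(sim, obs):
    """Bias, RMSE and temporal correlation at each grid point.

    sim, obs: arrays of shape (n_times, n_points) holding monthly
    accumulated precipitation (mm) on a common grid (assumed layout).
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)

    diff = sim - obs                       # simulated minus observed
    bias = diff.mean(axis=0)               # time-mean bias per grid point
    rmse = np.sqrt((diff ** 2).mean(axis=0))

    # Pearson correlation in time, computed separately at each grid point.
    sim_anom = sim - sim.mean(axis=0)
    obs_anom = obs - obs.mean(axis=0)
    corr = (sim_anom * obs_anom).sum(axis=0) / np.sqrt(
        (sim_anom ** 2).sum(axis=0) * (obs_anom ** 2).sum(axis=0)
    )
    return bias, rmse, corr
```

Averaging `diff` over the grid-point axis instead of the time axis would give the temporal evolution of the spatially averaged bias mentioned in the text.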

To extract the general behaviour of each dataset, a principal component analysis (PCA) was applied to the four databases. Briefly, the PCA (Preisendorfer 1988) has proven to be a reliable method for data reduction and for examining the variance structure. Beyond mere data compression, the PCA is a very useful tool for exploring large multivariate datasets because of its potential for yielding substantial insights into both the spatial and temporal variations of the analysed fields. This methodology applied to spatial data, known as S-mode decomposition (Richman 1988), enables the identification of patterns that can be attributed to specific physical processes by statistical assessment. The new uncorrelated variables are called principal components (PCs) and consist of linear combinations of the original variables derived from the diagonalization of the covariance/correlation matrix. Let $\mathbf{X}_k$ represent the vector of observations at the grid points for the kth of N observations in time. $\mathbf{X}_k$ may then be expressed by using summation notation as

$$ {\mathbf{X}}_{k} = {\sum\limits_{i = 1}^M {y_{{ik}} {\mathbf{E}}_{i} } },\quad k = 1,2, \ldots ,N, $$
(1)

where M is the total number of grid points and $\mathbf{E}_i$ is the ith eigenvector (loadings or PC patterns) associated with the ith eigenvalue. These patterns are the coefficients of the linear combinations and they represent the weight of the original variables in the PCs; $y_{ik}$ is the time-dependent coefficient (scores or PC time series) of the ith eigenvector for the kth observation in time, built as the projection of the original series onto each eigenvector. The PCs indicate patterns of variation of the original field and are numbered according to their related variance. Thus, the first PC is the linear combination with the maximum possible variance; the second one is the linear combination with the maximum possible variance which is uncorrelated with the first PC, and so on. More details about PCA are found in Jollife (1986) and Sneyers et al. (1989). In our case, the PCA was applied to the correlation matrices of both the simulated and the observed precipitation fields, producing a set of eigenvalues and eigenvectors for each. Generally, the most important (the first) eigenvectors tend to describe the regions with the largest fluctuations. Thus, most of the information in the data can be represented using a small number of PCs, resulting in a much smaller dataset.
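As an illustration of the decomposition in Eq. (1), a minimal sketch of a correlation-matrix PCA is given below. The function name `pca_correlation` and the data layout (N times along rows, M grid points along columns) are assumptions for the example. Note that, when the correlation matrix is diagonalized, Eq. (1) reconstructs the standardized anomalies rather than the raw precipitation values.

```python
import numpy as np

def pca_correlation(data):
    """PCA of an (N times x M grid points) array via its correlation matrix.

    Returns eigenvalues (descending), eigenvectors E_i as columns,
    the scores y_ik, and the standardized anomalies that were decomposed.
    """
    data = np.asarray(data, dtype=float)
    # Standardize each grid-point series so that the covariance of z
    # is the correlation matrix of the original data.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = (z.T @ z) / (z.shape[0] - 1)

    # eigh is appropriate because the correlation matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]      # sort by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Scores: projection of each observation onto each eigenvector.
    scores = z @ eigvecs
    return eigvals, eigvecs, scores, z
```

With all M components retained, `scores @ eigvecs.T` recovers `z` exactly, which is Eq. (1) written in matrix form; the fraction of variance explained by the ith PC is `eigvals[i] / eigvals.sum()`.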

The spatial patterns of these PCs and their interpretation can be enhanced by rotation. Rotation makes the patterns independent of the domain shape and maximizes the correlation of a few variables with a given PC, to the detriment of explaining the total variance across the dataset (Karl et al. 1990). Beyond data compression, the rotation of a subset of eigenvectors also allows easier interpretation of the spatial patterns. As a consequence of the rotation of eigenvectors, a second set of new variables, the rotated PCs, is produced. This subset has specific properties (von Storch and Zwiers 1999). The rotated PCs are basis vectors containing simple geometrical patterns such as compact areas used for regionalization or maps composed of a dipolar region. Moreover, they have time coefficients with specific types of behaviour, such as non-zero values only during some time episodes.

Additionally, rotation produces distributions that are less sensitive to location than the conventional PCs. The rotated PCs are obtained from the original data in a way similar to the PCA equation, as the dot product of the data vectors and the rotated eigenvectors. Because PCA concentrates the variance in a few linear combinations of variables, the number of PCs to retain must be decided prior to rotation. Kaiser’s rule, with 1.0 as threshold, was applied to determine the number of significant PCs to retain (Wilks 1995). In this paper, the Varimax (orthogonal) procedure was used to rotate the significant PCs (Richman 1986; Preisendorfer 1988).
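The retention and rotation steps can be sketched as follows. This is an illustrative implementation of Kaiser’s rule and of the standard iterative Varimax algorithm, not the code used in this study. The function names `kaiser_count` and `varimax` are hypothetical, and scaling the retained eigenvectors by the square root of their eigenvalues before rotation is an assumption, since the scaling used here is not stated in the text.

```python
import numpy as np

def kaiser_count(eigvals, threshold=1.0):
    """Number of PCs whose eigenvalue exceeds the threshold (Kaiser's rule)."""
    return int((np.asarray(eigvals, dtype=float) > threshold).sum())

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a (p variables x k PCs) loading matrix.

    Returns the rotated loadings and the k x k orthogonal rotation matrix.
    """
    loadings = np.asarray(loadings, dtype=float)
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the Varimax criterion, projected back onto the
        # set of orthogonal matrices through an SVD.
        grad = loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break                          # no further improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Usage with hypothetical eigenvalues/eigenvectors of a correlation matrix:
# n_keep = kaiser_count(eigvals)
# loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])  # assumed scaling
# rotated_loadings, rotation = varimax(loadings)
```

Because the rotation is orthogonal, the total variance explained by the retained subset is unchanged; it is only redistributed among the rotated PCs.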

To identify significant signals in the obtained PC time series, correlation analyses, Mann-Kendall tests, and spectral analyses have been applied (Goossens and Berger 1986). Additionally, a wavelet multiresolution analysis has been used in order to extract more information from these time series at all timescales. The wavelet transform technique was introduced and formulated by Morlet et al. (1982) and Grossman and Morlet (1984). Wavelet transforms have been applied successfully in different studies of geophysical time series in order to understand their temporal scales of variability (Kumar and Foufoula-Georgiou 1993; Weng and Lau 1994). In meteorological and climatological studies, wavelet decompositions have also found many applications (Farge 1992; Mahrt 1991; Gamage and Blumen 1993; Weng and Lau 1994; Gao and Li 1993). These authors have shown the advantages of this technique compared with the Fourier transform, because wavelets show structures on different time or spatial scales at different time or spatial locations. The Fourier transform does not retain any time dependence of the signal and therefore provides no local information regarding the time evolution of its spectrum.
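For reference, a minimal sketch of the classical Mann-Kendall trend test is given below. It is an assumption that this classical form is adequate for illustration: the sequential version discussed by Goossens and Berger (1986) is not reproduced, no correction for tied values is applied, and the function name `mann_kendall` is hypothetical.

```python
import math
import numpy as np

def mann_kendall(series):
    """Classical Mann-Kendall test (no tie correction).

    Returns the S statistic, the standardized Z value and the
    two-sided p-value under the normal approximation.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    # S counts concordant minus discordant pairs (x_j - x_i for j > i).
    s = 0.0
    for i in range(n - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()

    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction of one unit towards zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value
```

A trend would be declared significant at the 0.05 level when the returned p-value is below 0.05; the normal approximation is usually considered acceptable for series of about ten values or more.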

In the wavelet decomposition, the time-frequency variations of a time series are analysed. The main property is that the analysing functions, named wavelet functions, are localized both in time and in frequency, i.e., they oscillate only over a finite interval of time. Like Fourier sines and cosines, wavelets are a family of basis functions that can be used to represent any given signal; however, whereas sines and cosines contain information about the frequencies of the signal over all times, wavelets show the frequency variations in time (Morlet et al. 1982). Therefore, one of the differences with the Fourier transform is that the latter enables localization in frequency but not in time. Thus, wavelets turn out to be an appropriate and powerful tool to study time series. The wavelet transform not only has good local properties in the time and frequency domains, but also works as a mathematical microscope, decomposing a time series into scale components and allowing discrimination between oscillations occurring at fast scales and those occurring at slow scales (Morlet et al. 1982; Grossman and Morlet 1984). Wavelets are characterized by pairs of orthogonal functions: a mother wavelet, interpreted as the impulse response of a band-pass filter, and a scaling function, interpreted as the impulse response of a low-pass filter. In this paper, the continuous wavelet transform (Mallat 1998) was used as a filter to decompose and isolate characteristics of the Iberian precipitation field at different frequencies. Furthermore, the continuous wavelet analysis has the advantage that it is usually easier to interpret because all the information tends to be more visible.
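A continuous wavelet transform with a Morlet mother wavelet can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this study: it follows a widely used FFT-based formulation, the non-dimensional frequency `omega0 = 6`, the normalization, and the function name `morlet_cwt` are choices made for the example, and no zero padding or cone-of-influence treatment is included, so values near the ends of the record are affected by wrap-around.

```python
import numpy as np

def morlet_cwt(x, dt, scales, omega0=6.0):
    """Continuous wavelet transform of a series using a Morlet wavelet.

    x: 1-D series sampled every dt (e.g. years); scales: wavelet scales
    in the same units as dt. Returns the complex coefficients with shape
    (n_scales, n_times) and the equivalent Fourier period of each scale.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # remove the mean before transforming
    n = x.size
    scales = np.asarray(scales, dtype=float)

    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)   # angular frequencies
    x_hat = np.fft.fft(x)

    coeffs = np.empty((scales.size, n), dtype=complex)
    for j, s in enumerate(scales):
        # Fourier transform of the scaled Morlet wavelet; it is an
        # analytic wavelet, so negative frequencies are set to zero.
        norm = np.pi ** -0.25 * np.sqrt(2.0 * np.pi * s / dt)
        psi_hat = np.where(
            omega > 0.0, norm * np.exp(-0.5 * (s * omega - omega0) ** 2), 0.0
        )
        # Convolution theorem: multiply in frequency, transform back.
        coeffs[j] = np.fft.ifft(x_hat * psi_hat)

    periods = 4.0 * np.pi * scales / (omega0 + np.sqrt(2.0 + omega0 ** 2))
    return coeffs, periods
```

The wavelet power spectrum is then `np.abs(coeffs) ** 2`, displayed as a function of period and time.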

4 Validation of the HIPOCAS dataset through IPD

This section presents the main results obtained from the validation of the HIPOCAS precipitation field over the Iberian Peninsula and the Balearic Islands. This validation was performed through comparisons with the observed IPD set, following the methodology described in Sect. 3. The results are presented as follows. First, the IPD allows the characterization of the winter rainfall regime and, at the same time, is a useful tool to evaluate the model performance, pointing out similarities and differences between the hindcasted HIPOCAS and the observed fields. Second, the most significant rotated PCs of both simulated and observed precipitation datasets allow the main regional characteristics to be identified and show the ability of the model to reproduce the main observed rainfall patterns as well as their time evolution.

4.1 General description

Figure 2a shows the 41-winter monthly mean IPD precipitation field over Iberia. This field is characterized by a strong gradient with a maximum (minimum) located over the north-western (south-eastern) side of the Iberian Peninsula. The well-known differences in precipitation behaviour between the Atlantic and Mediterranean areas of the Iberian Peninsula (González-Rouco 1997; Esteban-Parra et al. 1998; Font 2000) are shown by the spatial distribution pattern. This observed precipitation pattern is well reproduced by the analogous HIPOCAS precipitation field (Fig. 2b), with agreement not only in the spatial gradient but also in the absolute precipitation values. The bias between the simulated and observed fields (Fig. 2c) is lower than 20 mm in absolute value over most of the Iberian Peninsula, indicating that the HIPOCAS precipitation field realistically reproduces the observed values. Over north-western Iberia, this figure displays zones of maximum positive and negative bias (absolute values of the order of 50 mm). Rather than indicating an unrealistic HIPOCAS simulation, the existence of these neighbouring, highly biased areas seems to be linked to a southward displacement of the hindcasted precipitation maximum (located in north-western Iberia, as can be seen in Fig. 2b) compared with the observed IPD one (see Fig. 2a). A similar maximum/minimum bias distribution is obtained along the northern Iberian coast, as well as over the Pyrenees. Moreover, a zone of positive bias is found over south-western Iberia, with a regional maximum along the Strait of Gibraltar. It is in this area that the bias is most significant as a percentage of total observed precipitation (not shown), with values ranging from 40% to more than 100%. Figure 2d shows that the aforementioned areas (north-western coast and Strait of Gibraltar area) are also the ones with maximum RMSE, with values higher than 100 mm. In contrast, most of the Iberian Peninsula shows RMSE values lower than 40 mm. The spatial distribution of the time correlations between the observed and the HIPOCAS precipitation fields is shown in Fig. 2e. This figure highlights the strong agreement between both fields, showing a clear E–W gradient with correlation increasing westward. Values higher than 0.80 are obtained over most of Iberia, the highest values being located over the western side of the Iberian Peninsula and the lowest ones along the Mediterranean coast, especially over its south-eastern part. This area shows low winter precipitation records (as can be seen in Fig. 2a), and it is characterized by a seasonal maximum in autumn, this maximum being linked to strong convective storm activity related to the typical intense Mediterranean cyclogenesis (Font 2000; Valero et al. 2004; Martín et al. 2004). It is also worth mentioning that other HIPOCAS validation works, focused on variables such as the 10-m wind field, waves, and sea-level data, have identified the south-eastern flank of Iberia as an area where the HIPOCAS hindcast does not show its best performance (Sotillo et al. 2005; Ratsimandresy et al. 2005).

Fig. 2 Winter spatial distributions of: (a) monthly mean IPD precipitation field (mm), (b) monthly mean HIPOCAS precipitation field (mm), (c) bias (mm), (d) RMSE (mm), and (e) temporal correlation between IPD and HIPOCAS precipitation fields

The preceding statistical comparison between the long-term simulated HIPOCAS precipitation field and the observed IPD highlights the good agreement in terms of spatial and temporal distribution, as well as in terms of total amount of precipitation. Likewise, it shows that this agreement is stronger in the Atlantic region than in the Mediterranean one. The following PCA study of both observed and simulated precipitation fields identifies the regional differences between them and describes the ability of the model to reproduce the main observed rainfall patterns.

4.2 PC results

As described in Sect. 3, the selected PCs are intended to represent the variation of the original dataset adequately, without loss of significant information. The number of retained PCs and the percentage of total variance explained by them are listed in Table 1. For the observed data, a total of five PCs were retained, explaining more than 90% of the total variance. For the simulated data, five PCs with eigenvalues greater than 1 were also retained, explaining 90.4% of the total variance. The significant PCs retained were subsequently rotated through a Varimax rotation technique. The five rotated PCs of the IPD and HIPOCAS data are displayed in Figs. 3 and 4, respectively.

Table 1 Percentages of explained variance for the unrotated and rotated PCs derived from IPD and HIPOCAS datasets
Fig. 3 Patterns of the selected Varimax rotated PCs corresponding to the IPD precipitation field: (a) first to (e) fifth

Fig. 4 Same as Fig. 3 except for the HIPOCAS precipitation field

The first observed rotated PC pattern (Fig. 3a) is marked by a clear west–east gradient with the highest loading values located in western Atlantic Iberia. This precipitation pattern is linked to the westerly circulation regime, which is dominant from October to May (Capel 1981). The first rotated PC pattern obtained from the HIPOCAS dataset (Fig. 4a) also reproduces this structure, exhibiting good agreement with the observed PC. In order to give an objective measure of similarity between observed and simulated patterns, spatial point-to-point correlation values between both PC maps were also computed. For this first case, a correlation of 0.98 gives confidence in the ability of the model to reproduce the observed precipitation pattern. Both observed and simulated patterns match the first rotated winter PC described in Serrano et al. (1999). The second rotated IPD PC (Fig. 3b) shows a centre of high loading values stretching over south-western Iberia. Such a precipitation pattern is usually produced by westerly flows associated with tropical maritime air mass advection (Linés 1970). The same pattern is also found by Serrano et al. (1999) as their second rotated PC in winter. The strong resemblance between the observed pattern and the corresponding simulated HIPOCAS PC (Fig. 4b), with a spatial correlation value of 0.92, can again be noted. The third observed rotated PC (Fig. 3c) is associated with precipitation regimes of the northern Iberian coast. Rainfall over this zone is usually related to cold maritime air masses coming from the North Atlantic Ocean and flowing at low levels against the coastal mountains, promoting high convective instability and enhancing updraft motions (Font 2000). Figure 4c shows a similar simulated pattern, with a spatial correlation of the order of 0.90, displaying high loadings in the north of the study area, whereas the lowest values are located in the east of Iberia. These patterns (Figs. 3c, 4c) are similar to the second and fourth configurations found by Esteban-Parra et al. (1998) and Serrano et al. (1999), respectively.

The following two winter patterns are shown, despite their low contribution in terms of explained variance, to illustrate the different behaviour of the Mediterranean region in contrast to the variability of the Atlantic region, which is better captured by the previous patterns. The fourth observed rotated PC (Fig. 3d) is related to precipitation over NE Iberia. Both observed and simulated (Fig. 4d) patterns have loadings concentrated to the east of the Pyrenees, along the north-eastern coastline and its immediate surroundings. Rainfall in this zone is usually related to intrusions of cold fronts associated with low-pressure systems centred to the north of the western Mediterranean Basin (Font 2000; Sotillo et al. 2003). These patterns show some correspondence with the ninth and tenth Oblimin-rotated PCs of Romero et al. (1999) and with the fourth Varimax PC of Garcia et al. (2002).

The last selected PCs (Figs. 3e, 4e) show high loadings on the south-eastern flank of the Iberian Peninsula, with a decreasing gradient from the coast to the interior of the Iberian Peninsula. Rainfall over this area is mainly related to easterly air masses, usually associated with upper-level cut-off lows over south-western Iberia. These lows first carry maritime air from the Atlantic Ocean; after passing over North Africa, the air is warmed and moistened along its path over the Mediterranean Sea, promoting convective instability in this area (Valero et al. 1997; Sotillo et al. 2003). These patterns are similar to the third rotated PC found by Garcia et al. (2002).

The PC time series associated with the aforementioned five rotated PC patterns for both observed and simulated precipitation fields are shown in Fig. 5. A similar time evolution can be observed in all cases: the observed PC time series fluctuate consistently with their corresponding simulated scores. A correlation analysis between the pairs of time series yielded values never below 0.66 (significant at the 0.01 level). The first three PCs, which explain more than 70% of the variance, show values of 0.97, 0.96, and 0.82, respectively. These high time correlation values, along with the good agreement of the PC spatial patterns, indicate the high degree of similarity between the HIPOCAS and IPD datasets and demonstrate the remarkable performance of the model. After applying the Mann-Kendall test (Goossens and Berger 1986), no significant trends are found. To identify more detailed features, the PC time series are analysed by means of the continuous wavelet transform with the Morlet wavelet (Morlet et al. 1982). The wavelet analysis is applied to each time series associated with the observed and the hindcasted fields.

Fig. 5 Time series of: (a) first to (e) fifth Varimax rotated PC. Continuous (dotted) line corresponds to the IPD (HIPOCAS) field. Correlations between both fields are indicated at the right corner of each panel

Figures 6 and 7 display the wavelet analyses corresponding to the PCs that present the highest and lowest correlation values between HIPOCAS and IPD scores, respectively. These figures show the wavelet power spectra, displayed as a function of period and time, of the first and fifth rotated PC time series, respectively, for both observed and simulated precipitation fields. The magnitude of the wavelet coefficients gives a measure of the correlation between the signal and the wavelet basis. Comparing panels a and b in each of Figs. 6 and 7 reveals some noteworthy characteristics. The zones of maximum and minimum power match and present similar intensity. This coincidence in time, period, and intensity between the HIPOCAS and IPD wavelet spectra indicates that the HIPOCAS data are able to capture the main features of the signal present in the observed precipitation field.
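A wavelet power spectrum of this kind can be sketched as a direct correlation of the series with scaled and shifted Morlet wavelets. The sketch below rests on several assumptions that are not stated in the text: a nondimensional frequency of 6, the usual conversion between scale and equivalent Fourier period for the Morlet wavelet, and no treatment of edge effects. All function names are illustrative.

```python
import cmath
import math

OMEGA0 = 6.0  # nondimensional Morlet frequency (assumed, a common choice)

def morlet(eta):
    """Morlet mother wavelet evaluated at nondimensional time eta."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * eta) * math.exp(-0.5 * eta * eta)

def period_to_scale(period):
    """Wavelet scale whose equivalent Fourier period equals `period`."""
    return period * (OMEGA0 + math.sqrt(2.0 + OMEGA0 ** 2)) / (4.0 * math.pi)

def wavelet_power(series, periods, dt=1.0):
    """Return power[i][k] = |W|^2 for periods[i] at time step k."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]  # remove the mean before transforming
    power = []
    for period in periods:
        s = period_to_scale(period)
        norm = math.sqrt(dt / s)  # keeps wavelet energy comparable across scales
        row = []
        for k in range(n):
            # correlation of the signal with the wavelet centred at step k
            w = sum(x[j] * morlet((j - k) * dt / s).conjugate() for j in range(n))
            row.append(abs(norm * w) ** 2)
        power.append(row)
    return power
```

Plotting `power` with time on the x-axis and period on the y-axis gives a picture of the type shown in Figs. 6 and 7. Values close to the start and end of the record are affected by the truncation of the wavelet and should be read with caution.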

Fig. 6 Wavelet power spectra of the first Varimax rotated PC time series of: (a) IPD and (b) HIPOCAS fields. The y-axis represents the period (years) and the x-axis corresponds to the time (year)

Fig. 7 Same as Fig. 6 except for the fifth PC time series

Additionally, Figs. 6 and 7 may be considered representative of the Atlantic and Mediterranean patterns, respectively, and their power spectra present some differences. The Mediterranean case is mainly characterized by scales between 5 and 8 years (see y-axis of Fig. 7) throughout the whole time period (1961–2001). Although the power is mainly concentrated in periods between 5 and 8 years, the most energetic oscillation arises around 1994, with a period between 4 and 9 years and maximum amplitude at 6 years. This core tends to be maintained up to the late 1990s, although the proximity of the end of the record reduces confidence in this assertion. Pronounced peaks around these dates are also observed in the corresponding PC time series (Fig. 5e). In contrast, the Atlantic power spectrum (Fig. 6) is not restricted to the aforementioned Mediterranean timescales. The IPD and HIPOCAS maxima cover similar areas. The amplitude and intensity of the oscillations change with time, with highly energetic amplitudes located in the central part of the record at periods between 3 and 10 years. Oscillations arise around 1970 and reach their maximum amplitudes between 1974 and 1984, which is also observed in Fig. 5a as oscillations over those dates. Throughout the record, some short-lived episodes of quasi-biennial oscillation are found. This quasi-biennial oscillation predominates during the period 1964–1968, exhibiting high intensity around 1966. Additionally, periodograms of the PC time series (not shown) reveal that the maximum power is concentrated in periods of less than 12 years for the Atlantic case and less than 8 years for the Mediterranean one, in agreement with the wavelet results.

5 Regional comparison of HIPOCAS dataset versus NCEP and ERA reanalyses

As mentioned earlier, the main objective of the atmospheric hindcast performed within the HIPOCAS Project was to create a long-term set of consistent climate data on a regional scale. Having validated the hindcasted precipitation data, we now assess the improvement introduced by the downscaling relative to the currently existing climate datasets. Since the NCEP/NCAR global reanalysis was used to drive the regional REMO run, it seems natural to evaluate the improvement in quality and accuracy introduced through the dynamical downscaling. Moreover, to provide a more complete comparison against current climatological datasets, a similar comparison was performed using the ERA global reanalysis dataset. Thus, the comparisons between the IPD and the HIPOCAS precipitation fields presented in the previous section were repeated, this time using the NCEP and ERA global reanalysis datasets instead of the HIPOCAS hindcast. For these comparisons, monthly NCEP and ERA winter precipitation data over the 41-year period were interpolated to the HIPOCAS (0.5°×0.5°) grid, as mentioned in Sect. 2.
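The interpolation of a coarse reanalysis field to the 0.5° grid can be sketched as follows. The text does not state which interpolation scheme was used, so bilinear interpolation on a regular latitude-longitude grid is purely an assumption made for illustration, and the helper names `bilinear` and `regrid` are hypothetical.

```python
from bisect import bisect_right

def bilinear(lons, lats, field, lon, lat):
    """Interpolate field[j][i] (lat index j, lon index i) to (lon, lat).

    lons and lats must be strictly increasing, and the target point must
    lie inside the source grid.
    """
    i = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    j = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    south = (1 - tx) * field[j][i] + tx * field[j][i + 1]
    north = (1 - tx) * field[j + 1][i] + tx * field[j + 1][i + 1]
    return (1 - ty) * south + ty * north

def regrid(lons, lats, field, step=0.5):
    """Resample a coarse field onto a regular grid with spacing `step`."""
    nx = int((lons[-1] - lons[0]) / step + 1e-9) + 1
    ny = int((lats[-1] - lats[0]) / step + 1e-9) + 1
    new_lons = [lons[0] + k * step for k in range(nx)]
    new_lats = [lats[0] + k * step for k in range(ny)]
    new_field = [[bilinear(lons, lats, field, x, y) for x in new_lons]
                 for y in new_lats]
    return new_lons, new_lats, new_field
```

Applied to every monthly field, `regrid` places the reanalysis values on the same points as the hindcast, so that the grid-point statistics can be computed in the same way for all datasets. Interpolation of this kind does not add regional detail; it only makes the fields comparable.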

Figure 8 shows the behaviour of the precipitation obtained from the global NCEP and ERA reanalyses. Figure 8a, b displays their respective mean precipitation fields. Both fields show similar spatial distributions marked by a clear west–east gradient. The maximum is located over north-western Iberia and reaches up to 140 mm in the case of ERA and 95 mm for NCEP. These maximum values are clearly lower than those of the observed IPD and hindcasted HIPOCAS fields (Fig. 2a, b), which are of the order of 220 mm in both cases. The HIPOCAS hindcast thus introduces an important improvement over the global reanalyses in the characterization of the observed precipitation. The improvement is noted both in total amounts and in the spatial distribution: the HIPOCAS data reproduce the IPD field more realistically than the smoother and negatively biased global reanalysis data. Furthermore, Fig. 8c, d shows the RMSE spatial distributions for NCEP/IPD and ERA/IPD, respectively. With regard to errors, both reanalyses perform similarly; however, ERA errors are lower than NCEP ones over some specific areas. This slight improvement of ERA over NCEP is largely exceeded by the HIPOCAS performance, as the comparison of both panels with Fig. 2d corroborates. Figure 8e, f displays the spatial distribution of time correlation values between IPD and the global reanalyses. In general, both cases present values higher than 0.90 over most of the Iberian Peninsula, with the highest values over western Iberia and the lowest over the Mediterranean coast. These patterns largely coincide with the corresponding HIPOCAS time correlation spatial pattern (Fig. 2e).

Fig. 8 Winter spatial distributions of: (a) monthly mean NCEP precipitation field (mm), (b) monthly mean ERA precipitation field (mm), (c) RMSE (mm) between IPD and NCEP fields, (d) RMSE (mm) between IPD and ERA fields, (e) temporal correlation between IPD and NCEP fields, and (f) temporal correlation between IPD and ERA fields. In order to facilitate the visual comparison, the scales used in this figure are coincident with those of Fig. 2

To provide a more complete view of how well NCEP, ERA, and HIPOCAS characterize the observed Iberian precipitation field over the 41-year period, Fig. 9 displays the time evolution of the bias (upper panel) and the RMSE (lower panel) averaged over the whole spatial domain. Both panels highlight the much better performance of HIPOCAS in reproducing the observed IPD field in comparison with the global reanalysis data. Figure 9a shows that the reanalyses have a negative bias over the whole time period, largely underestimating the observed precipitation over the Iberian Peninsula. In contrast, the HIPOCAS bias fluctuates around zero and shows no such large underestimation, which gives a measure of its better skill in reproducing the IPD. The bias averaged over the whole 41-year period (Table 2) confirms the better characterization of the observed IPD by the HIPOCAS dataset in comparison with the reanalyses. Although both reanalyses show a similar underestimation of the observed monthly precipitation (up to 29 mm), HIPOCAS drastically reduces this value to 5 mm. Throughout the whole period, the RMSE (Fig. 9b) shows a similar evolution for the three datasets. The magnitude of the errors is quite similar in both reanalyses, the ERA errors being slightly lower than the NCEP ones. However, HIPOCAS reduces these errors, emphasizing the noticeable improvement introduced by the hindcast over the existing global reanalysis data. This improvement is also seen in Table 2, which shows a decrease in the time-averaged errors: whereas NCEP and ERA differ from the observed precipitation by around 40 mm, the HIPOCAS errors are of the order of 30 mm.
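The domain-averaged statistics discussed here can be sketched as follows. The sketch assumes that the bias is the plain (unweighted) mean of model minus observation over all grid points and that the RMSE is the root of the mean squared difference over the same points, computed separately for each time step. Whether the original analysis applied any area weighting is not stated, and the function names are illustrative.

```python
import math

def domain_bias_rmse(model, obs):
    """Spatially averaged bias and RMSE for each time step.

    model[t][p] and obs[t][p] hold precipitation (mm) at time step t and
    grid point p; both inputs must have the same shape.
    """
    bias, rmse = [], []
    for m_t, o_t in zip(model, obs):
        diffs = [m - o for m, o in zip(m_t, o_t)]
        bias.append(sum(diffs) / len(diffs))  # negative means underestimation
        rmse.append(math.sqrt(sum(d * d for d in diffs) / len(diffs)))
    return bias, rmse

def time_mean(values):
    """Average of a per-time-step statistic over the whole period."""
    return sum(values) / len(values)
```

The two lists returned by `domain_bias_rmse` correspond to curves of the kind shown in the upper and lower panels of Fig. 9, and `time_mean` applied to each of them gives single values comparable to those summarized in Table 2.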

Fig. 9 Temporal evolution of: (a) the bias and (b) the RMSE averaged over the whole spatial domain for the used datasets

Table 2 Temporal mean of spatial averaged bias and RMSE obtained from the HIPOCAS, NCEP, and ERA versus the observed IPD over the 41-year period

6 Summary and conclusions

A 44-year (1958–2001) high-resolution, hourly atmospheric hindcast was performed over the Mediterranean basin by EPPE within the EU-funded HIPOCAS Project framework. The hindcasted data were produced by means of dynamical downscaling from the NCEP/NCAR global reanalysis using the regional atmospheric model REMO. A spectral nudging technique was applied to the simulated wind field to keep it close to the imposed time-variable, large-scale atmospheric state provided by the NCEP forcing. The use of global reanalysis data to drive the regional model, instead of other data sources, was motivated by the need for a guaranteed temporally homogeneous output over the whole multi-decade run period.

This paper has focused on the analysis of the hindcasted HIPOCAS winter precipitation over the Iberian Peninsula and the Balearic Islands. To do this, a precipitation data subset was extracted over the area of interest from the whole HIPOCAS Mediterranean domain, keeping the same resolution (0.5°×0.5°) as the original HIPOCAS database. The analysis was based on winter (December–February) monthly-accumulated precipitation values for a 41-year period, from 1 January 1961 to 31 December 2001. To validate this hindcasted HIPOCAS precipitation database, an observed monthly IPD, generated by the Spanish Meteorological Service (INM), was used.

A general description of the model performance was made through comparisons between the IPD and HIPOCAS precipitation fields. The statistical comparative analysis highlighted a very good agreement not only in terms of spatial and time distribution, but also in terms of total amount of precipitation. Furthermore, comparisons of the PC patterns of the HIPOCAS and IPD datasets illustrated how the HIPOCAS hindcasted field largely captures the main characteristics of the IPD field. The agreement was quite good for the first three PC patterns, the Atlantic ones, but not to the same extent for the last two selected PCs, which are more localized over Mediterranean Iberia. It is also remarkable how the HIPOCAS patterns reproduce with reasonable accuracy the observed regional characteristics linked to the main orographic features in the study domain. In addition, high time correlation values are obtained between hindcasted and observed PC time series, pointing to the high degree of similarity between both sets and corroborating the model performance. The wavelet analysis applied to the PC time series again revealed the capability of the HIPOCAS data to capture the main signals present in the observed precipitation data.

To provide a broader picture of the HIPOCAS quality, the previous validation of the HIPOCAS winter precipitation was completed through a comparative study with global reanalysis data (NCEP and ERA). This study highlighted the important improvement in the characterization of the observed precipitation introduced by the HIPOCAS hindcast relative to these global reanalyses. It is worth noting that, although both reanalyses significantly underestimate the IPD field, showing a negative bias of the mean precipitation over the whole time period, the HIPOCAS bias fluctuates around zero and shows no important underestimation of the observed precipitation. This provides a measure of the better skill of HIPOCAS in reproducing the observed IPD. It is also remarkable that the improvement is effective not only in terms of total amounts, but also in the spatial distribution: the IPD field was reproduced much more realistically by HIPOCAS than by the smoother and negatively biased global reanalysis data.

A specific analysis was carried out for the months in which HIPOCAS and the global reanalyses show their worst performance in terms of precipitation bias and RMSE. The selected dates correspond to December 1978, 1982, 1987, 1995, and 1996 (numbered in Fig. 9 as time steps 54, 63, 87, 105, and 108, respectively). Some of these dates (e.g. 105 and 108) coincide with months marked by the occurrence of precipitation over the Iberian Mediterranean flank. The Mediterranean signal over Iberia is identified in these months by the preponderance of the fourth and, especially, the fifth PCs. It is in these cases that the global reanalyses perform worst, and HIPOCAS is not able to improve much on that performance. In contrast, there are other months (e.g. 54, 63, and 87) with high global reanalysis errors in which HIPOCAS significantly reduces the error and the bias. Notably, these months, in which HIPOCAS does not seem to be dragged so much by the errors of the NCEP reanalysis used as forcing, are more related to Atlantic regimes, with the first three PCs prevailing.

Finally, it is worth noting that the validation performed over Iberia, along with the remarkable improvement relative to global reanalysis data, enhances confidence in the HIPOCAS data. Furthermore, their use can be very helpful in regional climatological studies focused on specific Mediterranean areas, such as offshore ones, that are handicapped by a lack of observations.