1 Introduction

The goal of climatological studies is to understand the basic principles of global and regional climates and, furthermore, to map trends in past climate variations and the causes of possible future climate change. Reliable data, together with estimates of their quality, are the basis for any such study.

It has been common practice to treat meteorological observations and their statistics as the true state, without synthesizing the information from different observation types. Because observation coverage is irregular and varies in time, large unobserved areas introduce uncertainties into the global climate diagnosis. Since the development of modern numerical weather prediction (NWP) models, atmospheric data assimilation has been among the most feasible methods for reconstructing reliable states of the atmosphere. Besides data assimilation by numerical models, satellite measurements can be used independently, and comprehensive surface temperature analyses also exist (NASA GISS http://data.giss.nasa.gov/gistemp and HadCRU http://www.cru.uea.ac.uk/) and are widely used for global temperature analysis.

Atmospheric data assimilation comprises a sequence of analysis steps (or cycles) in which first-guess information (usually a short model forecast from the previous analysis) is combined with observations to produce an estimate of the state of the atmosphere (the analysis) at that time.

While long operational databases of atmospheric analyses exist, they are not suitable for climatological studies because the numerical models and data assimilation techniques have changed over time.

These considerations led Bengtsson and Shukla (1988) and Trenberth and Olson (1988) to propose a reanalysis of archived meteorological observations, using up-to-date data assimilation systems and realistic physical models, in order to create long-term, internally consistent, homogeneous, multivariable datasets for the Earth’s climate system (Uppala et al. 2005). Bengtsson also proposed regional reanalysis with mesoscale models in order to create more comprehensive four-dimensional datasets of regional climate systems. In a reanalysis, it is possible to use meteorological data from sources that were not available during the operational runs. It is also possible to extend the analysis back to times when numerical models were not available.

The predecessor of contemporary reanalysis projects was the First GARP Global Experiment (Bengtsson et al. 1982). However, as it covered only the single year 1979, it did not provide enough data for climate research. The first comprehensive global reanalyses were carried out by the European Centre for Medium-Range Weather Forecasts (ECMWF), the National Centers for Environmental Prediction (NCEP) and NASA at the beginning of the 1990s. These projects were successful and were therefore followed by long-term reanalysis projects using more elaborate methods. As a result, the two most extensive homogeneous datasets of the global atmospheric state were established: the ECMWF’s ERA-40 for 1957–2002 (Uppala et al. 2005) and the NCEP–NCAR 50-year reanalysis database for 1948–1998 (Kistler et al. 2001). Both databases are being continually extended in near-real-time mode within the ERA-Interim (Simmons et al. 2006) and NCEP–DOE AMIP-II Reanalysis (Kanamitsu et al. 2002) projects.

The establishment of long-term global databases laid the foundation for several regional reanalysis projects, that is, reanalyses of observational data that use assimilation based on high-resolution limited area models. Such projects are ongoing or have already been carried out around the world. The largest regional reanalysis projects are the NCEP-based North American Regional Reanalysis (Mesinger et al. 2006), which continues in real-time mode, the high-resolution 44-year hindcast database for the Mediterranean Basin (Sotillo et al. 2005), and the ERA-40-based 43-year atmospheric hindcast for South America (Silvestri et al. 2009). In the context of the present BaltAn65+ project it is also essential to mention a Scandinavian pilot reanalysis project in the BALTEX framework (Fortelius et al. 2002), which used the HIgh Resolution Limited Area Model (HIRLAM) (Unden et al. 2002) to reanalyse climatic and hydrological parameters over the Baltic Sea drainage basin during one annual cycle (from September 1999 to October 2000). All these projects have met expectations and have created high-quality regional meteorological datasets that surpass global reanalyses in resolution and quality.

The main advantage of regional reanalysis is the increased horizontal resolution, which provides an enhanced description of surface and boundary layer effects. The interaction of the large-scale circulation with regional geographic features such as orography, land-sea distribution and soil types is represented better, and better accuracy can be expected for near-surface and boundary layer fields.

The main deficiency of any regional reanalysis is that it is not global. However, a limited area model reanalysis implicitly includes the global observational data through boundary conditions that are obtained from a (supposedly best available) global reanalysis project. From this perspective, a regional reanalysis should be considered not as a competitor, but rather as a complement to global reanalysis datasets.

The aim of the current project is to create a high-resolution database, BaltAn65+, covering the Baltic Sea region (Figs. 1 and 2). The area is characterized by complicated, variable landscapes, indented coastlines, numerous islands and abundant inland waters, which makes it a rewarding object for a regional climatic database.

Fig. 1 Global view of the Baltic Sea regional reanalysis domain

Fig. 2 Modelling area of BaltAn65+. The grey-tone bar shows the geopotential (m²/s²) of the underlying sea (= 0) and ground (> 0) surface

The limited usability of ERA-40 in such a diverse area is well illustrated by the local example of Estonia. The existing ERA-40 dataset, with its 125 km grid resolution, does not allow local climatic differences and developments (local pollution or pollen dispersion, wind energy statistics, etc.) to be studied, because the Estonian territory is small: 45,228 km². At the resolution of the ERA-40 ground surface data, the indented and rugged Estonian coastline is left completely unresolved, although it is of great importance for surface parameters, especially for the 10 m coastal winds (Fig. 3).

Fig. 3 ERA-40 equivalent-grid (125 km resolution) approximation of the surface geopotential (m²/s²) in a part of the BaltAn65+ area (Estonia, Latvia, part of the Baltic Sea, fragments of Finland and Sweden)

Moreover, despite the small territory, the local climate of Estonia is rather diverse. Estonia is located in the transitional zone between maritime and continental climates, which causes great differences in meteorological and phenological conditions between the west coast and the eastern parts of the country. Depending on the relief, vegetation, inland waters and the distance from the coast, a remarkable meso- and micro-climatic variety is observed (Jaagus and Ahas 2000). These important regional differences are smoothed out by the coarse resolution of the global database. Similar examples can be given for other areas, such as different parts of Scandinavia. Thus, a resolution higher than the 125 km of ERA-40 is essential for local studies (Fig. 4).

Fig. 4 Representation of the surface geopotential of the same area as shown in Fig. 3 by the BaltAn65+ model grid (11 km resolution)
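The contrast between Figs. 3 and 4 can be made concrete with a back-of-the-envelope count of how many grid cells fall on Estonia at each resolution. The sketch below is only an illustration: it assumes square grid cells and ignores the shape of the territory, and the function name is hypothetical.

```python
# Rough count of grid cells covering Estonia at two grid spacings.
# Assumption: square cells; the shape of the territory is ignored.
ESTONIA_AREA_KM2 = 45_228  # area quoted in the text


def cells_covering(area_km2: float, spacing_km: float) -> float:
    """Approximate number of square cells of side spacing_km in area_km2."""
    return area_km2 / spacing_km**2


era40_cells = cells_covering(ESTONIA_AREA_KM2, 125.0)   # ERA-40 grid
baltan_cells = cells_covering(ESTONIA_AREA_KM2, 11.0)   # BaltAn65+ grid

print(f"ERA-40 (125 km):   about {era40_cells:.1f} cells")
print(f"BaltAn65+ (11 km): about {baltan_cells:.0f} cells")
```

Under these assumptions the whole country is represented by roughly three ERA-40 cells, against several hundred cells on the 11 km grid.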

Another reason for creating a new database is that Estonia started its NWP activities only recently and thus lacks a comprehensive archive of operational NWP output of the kind available to the scientific communities of many other countries. Such information has proved its usefulness in scientific research and in many other applications.

In Sect. 2, more details about the reanalysis domain and model setup are given. Section 3 contains an overview of the numerical model, which is followed in Sect. 4 by an analysis of the actual quality of the analysed meteorological data.

2 Reanalysis domain and model setup

The modelling area is shown in Fig. 2. The horizontal grid resolution is 0.1 degrees (about 11 km), giving 206 × 206 grid points in the horizontal and 60 levels in the vertical. The lowest model level is located at a height of 30 m and the model top at approximately 30 km, with layer depths varying from 60 m at the ground to approximately 3 km at the topmost levels. BaltAn65+, like ERA-40, uses 60 vertical levels, but the distribution of the levels is different. The lower part of the atmosphere is shown in Fig. 5a: ERA-40 has slightly more levels in the boundary layer (six ERA-40 levels lie within the height range of the five lowest BaltAn65+ levels), whereas BaltAn65+ has more levels in the troposphere (see Fig. 5b). No orographic smoothing is applied.
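As a quick check of the quoted grid figures, the following sketch converts the 0.1 degree spacing into kilometres and estimates the side length of the 206 × 206 domain. It assumes that the spacing is measured along a great circle and uses a mean Earth radius of 6371 km; both are assumptions made here for illustration and are not stated in the model setup.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def degrees_to_km(delta_deg: float, radius_km: float = EARTH_RADIUS_KM) -> float:
    """Arc length along a great circle spanned by delta_deg degrees."""
    return math.radians(delta_deg) * radius_km


spacing_km = degrees_to_km(0.1)        # grid spacing in km
n_points = 206                         # grid points per horizontal direction
extent_deg = (n_points - 1) * 0.1      # 205 intervals between 206 points
extent_km = degrees_to_km(extent_deg)  # side length of the domain in km

print(f"grid spacing: {spacing_km:.1f} km")
print(f"domain side:  {extent_deg:.1f} degrees, about {extent_km:.0f} km")
```

The result is consistent with the "about 11 km" quoted above and gives a domain side of roughly 20.5 degrees.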

Fig. 5 Comparison of the ERA-40 and BaltAn65+ vertical grids: a lower part of the atmosphere, b full domain. In both panels the ERA-40 grid is on the left and the BaltAn65+ grid on the right

The reanalysis database created for the Baltic Sea region describes the development of the climate system according to HIRLAM model analyses and forecasts. The reanalysis period is 01.01.1965–31.12.2005. This period is of interest because of the systematic climate warming in Northern Europe, accompanied by a rise in CO₂ concentrations and atmospheric transparency until the end of the 1980s (Wild et al. 2005; Solomon et al. 2007). Model states (snapshots) are saved at 6 h intervals, four times a day at the standard meteorological hours 00, 06, 12 and 18 UTC. Each time-stamped record of the database consists of three data files in World Meteorological Organization (WMO) GRIB (GRIdded Binary) version 1 format. Tables with the exact content and structure of the GRIB files are given in Appendix A.
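The size of the archive follows directly from the period and the 6 h saving interval. The sketch below enumerates the time stamps and counts records and data files; the helper name is hypothetical, and the count covers only the three data files per record mentioned above.

```python
from datetime import datetime, timedelta

START = datetime(1965, 1, 1, 0)    # first analysis time, 00 UTC
END = datetime(2005, 12, 31, 18)   # last analysis time, 18 UTC
STEP = timedelta(hours=6)          # saving interval
FILES_PER_RECORD = 3               # three GRIB data files per time stamp


def analysis_times(start: datetime, end: datetime, step: timedelta):
    """Yield every analysis time from start to end, inclusive."""
    t = start
    while t <= end:
        yield t
        t += step


times = list(analysis_times(START, END, STEP))
n_records = len(times)
n_files = n_records * FILES_PER_RECORD

print(f"{n_records} time stamps, {n_files} data files")
print("first:", times[0].strftime("%Y-%m-%d %H UTC"))
print("last: ", times[-1].strftime("%Y-%m-%d %H UTC"))
```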

The fields that result directly from the 3D-Var upper air data assimilation are surface pressure, wind, temperature and humidity, the latter three on the model levels. The surface data assimilation, which is performed before the upper air assimilation, updates the values of surface and deep-soil temperature and moisture and of 2 m temperature and humidity. All other fields (precipitation, clouds, etc.) are taken from the background (previous cycle) forecast. Snow cover is not analysed.

  1. The main analysis file contains the parameters describing the atmospheric state (temperature, wind components, specific humidity, cloudiness, amount of cloud liquid water and ice, turbulent kinetic energy) on every model level (Appendix table 1) and also the most important surface and screen level parameters (Appendix table 2) over the whole area. Surface level and screen level parameters and surface fluxes are given separately for each surface-type tile (sea/lake water, ice, bare land, agricultural terrain/low vegetation and forest) (Appendix table 3).

  2. The model diagnostics file consists of single-level (top and bottom) atmospheric data (Appendix table 4) and additional ground surface and diagnostic parameters characterizing the forecast cycle (Appendix table 5).

  3. The verification file of the analysis consists of atmospheric parameters interpolated to 10 standard pressure levels (100, 200, 250, 300, 400, 500, 700, 850, 925 and 1,000 hPa) (Appendix table 6), and of ground surface and screen level parameters such as the mean sea level pressure, geopotential, surface and 2 m temperatures, 10 m wind speed and direction, dew point temperature, relative humidity, total precipitation, and depth and water equivalent of the snow cover (Appendix table 7). These can be used to compare model products with meteorological observations.

In addition to the analysis files, the base record of each time stamp contains verification files, which include the mean biases and root mean square (RMS) errors of the analyses, the initialized analyses and the first-guess fields with respect to the observations in the analysis cycle. Log files of all model runs are archived as well.
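The two verification scores mentioned above have simple definitions, illustrated by the following minimal sketch. It assumes that the model values have already been interpolated to the observation locations; the numbers are invented for the illustration and do not come from the database.

```python
import math


def bias(model: list[float], obs: list[float]) -> float:
    """Mean of (model - observation): the systematic error."""
    diffs = [m - o for m, o in zip(model, obs)]
    return sum(diffs) / len(diffs)


def rms_error(model: list[float], obs: list[float]) -> float:
    """Root mean square of (model - observation)."""
    squares = [(m - o) ** 2 for m, o in zip(model, obs)]
    return math.sqrt(sum(squares) / len(squares))


# Hypothetical mean sea level pressure values (hPa) at four stations.
analysis = [1012.3, 1008.1, 1015.0, 1003.4]
observed = [1012.0, 1008.5, 1014.2, 1003.9]

print(f"bias: {bias(analysis, observed):+.2f} hPa")
print(f"RMS error: {rms_error(analysis, observed):.2f} hPa")
```

A bias close to zero combined with a larger RMS error indicates errors that are mostly random rather than systematic.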

3 Description of the applied NWP model HIRLAM

The NWP model used for creating the reanalysis database is HIRLAM version 7.1.4 (released 18.02.2008), which is the latest fixed version of the HIRLAM 7.1 model used operationally in 2007–2008. The most comprehensive description of the HIRLAM model is the official scientific documentation (Unden et al. 2002), which is available for model version 5. Since its publication, many parts of the model have been changed and improved, mainly the physical parameterizations and the data assimilation. These updates, technical details and properties of the model are described in the HIRLAM Technical Reports and Newsletters, which are published regularly by the HIRLAM Management Group. All recent HIRLAM-related documentation is available at the HIRLAM web site http://www.hirlam.org. In the following, a short description of the current model and the main parameters used in the reanalysis simulations is presented.

HIRLAM is a grid-point, limited area numerical model of atmospheric dynamics, intended for short-range weather prediction. A hybrid pressure coordinate is used as the vertical coordinate, and a semi-implicit semi-Lagrangian scheme is used for temporal integration. The integration time step is 360 s (6 min), and a 6-hour forecast is made in each analysis cycle.

For the physical parameterizations, the following model components are used:

Radiation is handled by the Savijärvi radiation scheme (Wyser et al. 1999; Sass et al. 1994; Savijarvi 1990; Unden et al. 2002).

The planetary boundary layer parameterization in HIRLAM is based on a prognostic turbulent kinetic energy equation with a diagnostic length scale (TKE-l); more precisely, it is a modification of the Cuxart scheme (Cuxart 2000) by Lenderink and de Rooy (2000). Surface wind fields are handled as in Geleyn (1988).

For clouds and condensation, the STRACO scheme is used (Unden et al. 2002).

The surface scheme is based on ISBA (Noilhan and Planton 1989; Noilhan and Mahfouf 1996). For a better representation of the surface, each grid cell is tiled into five different surface types. The fluxes of each type are calculated separately and interact with the atmosphere directly [the aggregation of fluxes approach (Avissar and Pielke 1989)].
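The idea behind tiling can be illustrated with a short sketch in which the grid-cell flux is taken as the area-weighted sum of the tile fluxes. The weighting by tile fraction is an assumption made here for illustration (it is the usual form of flux aggregation), not a transcription of the HIRLAM code; the tile fractions and flux values are hypothetical.

```python
# Five surface types, following the tile list used in the database.
TILES = ("water", "ice", "bare land", "low vegetation", "forest")


def aggregate_flux(fractions: dict[str, float], fluxes: dict[str, float]) -> float:
    """Area-weighted sum of per-tile fluxes for one grid cell."""
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"tile fractions must sum to 1, got {total}")
    return sum(fractions[tile] * fluxes[tile] for tile in fractions)


# Hypothetical coastal grid cell: tile fractions and sensible heat
# fluxes (W/m^2) computed separately for each tile.
fractions = {"water": 0.4, "ice": 0.0, "bare land": 0.1,
             "low vegetation": 0.2, "forest": 0.3}
sensible = {"water": 20.0, "ice": 0.0, "bare land": 60.0,
            "low vegetation": 80.0, "forest": 100.0}

cell_flux = aggregate_flux(fractions, sensible)
print(f"grid-cell sensible heat flux: {cell_flux:.1f} W/m^2")
```

Because each tile keeps its own surface state, a coastal cell of this kind can carry a cool water surface and a warm forest surface at the same time, instead of a single blended surface.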

In the reanalysis, the standard HIRLAM physics modules are used, with one exception. In HIRLAM 7.1.4, wintertime temperatures over sea ice are sometimes unphysically low; to work around this problem, the routine that calculates this temperature is taken from the earlier HIRLAM 6.4 model.

The modelling cycle of the reanalysis is 6 h. Every cycle begins with the provision of new observational data and boundary conditions. The boundary fields are created by interpolating the corresponding ERA-40 fields to the HIRLAM grid. The boundary fields applied are surface pressure, temperature, wind, specific humidity and sea surface temperature.

The preparation of the initial condition for a model run consists of two steps, data assimilation and initialization. In the current project, the HIRLAM 3D-Var (3-dimensional variational) data assimilation scheme with statistical balance is used (Gustafsson et al. 2001; Lindskog et al. 2001). It minimizes a cost function that measures the distance between the model state and the background field and the distance between the model state and the observations. HIRLAM also offers the possibility of blending in the ECMWF analysis, but this is not used in the current reanalysis.
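The principle of the variational cost function can be shown with a deliberately reduced example: a single state variable observed directly by a single observation. This is a textbook sketch and not the HIRLAM 3D-Var formulation, in which the state is a large vector, the error variances become covariance matrices and an observation operator maps the model state to the observed quantities. All names and numbers below are hypothetical.

```python
def cost(x: float, xb: float, y: float, var_b: float, var_o: float) -> float:
    """Scalar cost: squared distances to background and observation,
    each weighted by the inverse of its error variance."""
    return 0.5 * ((x - xb) ** 2 / var_b + (y - x) ** 2 / var_o)


def analysis(xb: float, y: float, var_b: float, var_o: float) -> float:
    """Minimizer of the scalar cost function (closed form)."""
    gain = var_b / (var_b + var_o)   # weight given to the observation
    return xb + gain * (y - xb)


# Hypothetical temperature (K): background forecast and one observation.
xb, y = 271.0, 273.0
var_b, var_o = 1.0, 0.25             # background and observation error variances

xa = analysis(xb, y, var_b, var_o)
print(f"analysis: {xa:.2f} K, cost at analysis: {cost(xa, xb, y, var_b, var_o):.3f}")
```

The analysis lies between the background and the observation and is drawn closer to whichever has the smaller error variance; here the observation is trusted more, so the analysis moves most of the way towards it.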

Near-surface fields, such as the 2 m temperature and soil temperature, are analysed separately by a surface analysis based on optimum interpolation. This analysis affects the upper air analysis via the surface fluxes. Sea surface temperatures are analysed from ECMWF SST model fields, used as pseudo-observations, with the successive correction method (Unden et al. 2002). The data assimilation also includes quality control of both the observations and the background field. The observations used in the BaltAn65+ reanalysis are conventional TEMP, AIREP, PILOT, SYNOP, SHIP and DRIBU data from the WMO observational network. Observational data for the BaltAn65+ project are taken from the ERA-40 database where possible and from the operational HIRLAM observation archive of the Finnish Meteorological Institute for the period after 2002. Boundary fields come from ERA-40 where possible; after 2002, the ECMWF operational model is used for the boundary data.

During initialization, fast ageostrophic gravity waves are removed from the analysed atmospheric state in order to obtain a smoother initial condition that is more suitable for the dynamic model. In the current project, incremental digital filtering with a Dolph-Chebyshev filter (Lynch 1997) is used.

4 Performance of the analysis system, quality of BaltAn65+ data

While the 3D-Var scheme can use different kinds of observations, radiosonde (TEMP) measurements have by far the largest influence on the upper air analysis. The number of radiosonde measurements used is shown in Fig. 6.

Fig. 6 Number of TEMP measurements

The number of observations at 00 UTC and 12 UTC remains fairly constant over time, apart from a drop from around 40 to around 25 at about 1990, but there is a major drop in the number of 06 UTC and 18 UTC measurements near 1990. The geographic locations behind this drop are shown in Fig. 7: the measurements disappeared from the territory of the former USSR.

The SHIP, DRIBU and AIREP measurements have start times similar to those in ERA-40: SHIP and buoy data are available since 1965 and AIREP data since 1973.

The number of surface synoptic (SYNOP) measurements is shown in Fig. 8. The data before 1967 do not cover Norway, Sweden, Finland and Poland, so the surface fields are of only limited value for these countries until 1967.

Fig. 7 Locations of TEMP measurements at 06 UTC in a 1990 and b 1992

Fig. 8 Number of SYNOP measurements

Standard verification scores for mean sea level pressure at 00 UTC are presented in Figs. 9–12. Figures 9 and 10 show the bias of the background forecast and of the analysis, while the RMS errors of the background and the analysis are shown in Figs. 11 and 12. The most notable feature of these graphs is that both the analysis and the background errors decrease significantly after 2002. Although a slight increase in the number of surface observations (Fig. 8) takes place at about the same time, most of the effect is probably caused by the change of the boundary fields from ERA-40 to the ECMWF operational model. The RMS error differs from the bias mainly in showing a clear annual variability, with larger errors and variances in the colder seasons.

Fig. 9 Background mean sea level pressure bias at 00 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 10 Analysed mean sea level pressure bias at 00 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 11 Background mean sea level pressure RMS error at 00 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 12 Analysed mean sea level pressure RMS error at 00 UTC. The grey line represents the exact values, the black line the 30-day moving average
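The smoothed curves in the verification figures are 30-day moving averages of the daily scores. A minimal sketch of such a filter is given below; it assumes one value per day and a trailing window (the average of the current and the preceding 29 values), since the alignment of the window used for the figures is not stated.

```python
def moving_average(values: list[float], window: int = 30) -> list[float]:
    """Trailing moving average; the result has len(values) - window + 1
    entries, one for each position where a full window is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]   # drop the value leaving the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Small check with a 3-value window instead of 30 days.
print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
```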

For the upper air fields, statistics of the analysed 500 hPa geopotential height and 850 hPa temperature are presented. For the 500 hPa geopotential height, Figs. 13, 14 and 15 show the bias and Figs. 16, 17 and 18 the RMS errors for 00, 06 and 12 UTC, respectively. The change in verification scores seen in the mean sea level pressure data is not observable in the 500 hPa statistics at 00 UTC and 12 UTC. While only a very slight trend towards more accurate results is observable in the bias, a clear improvement over the 40-year period can be seen in the RMS error graphs for 00 UTC (Fig. 16) and 12 UTC (Fig. 18). Owing to the small number of measurements, the analysed field for 06 UTC has a larger variance and also shows a rise in RMS error during the last two years, which coincides with the change in the number of observations shown in Fig. 6. For the 850 hPa temperature, the analysis bias and RMS error are presented in Figs. 19 and 20 for 00 UTC and in Figs. 21 and 22 for 06 UTC. Again, no clear trend exists in the bias, but the RMS error shows a small improvement over time.

Fig. 13 Analysed 500 hPa geopotential bias at 00 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 14 Analysed 500 hPa geopotential bias at 06 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 15 Analysed 500 hPa geopotential bias at 12 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 16 Analysed 500 hPa geopotential RMS error at 00 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 17 Analysed 500 hPa geopotential RMS error at 06 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 18 Analysed 500 hPa geopotential RMS error at 12 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 19 Analysed 850 hPa temperature bias at 00 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 20 Analysed 850 hPa temperature RMS error at 00 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 21 Analysed 850 hPa temperature bias at 06 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 22 Analysed 850 hPa temperature RMS error at 06 UTC. The grey line represents the exact values, the black line the 30-day moving average

Finally, statistics of the calculated analysis increments (the RMS of the difference between the analysed and background fields) are presented in Figs. 23, 24 and 25. Figure 23 shows the 500 hPa geopotential height analysis increment at 00 UTC. Although this quantity has a large annual variance, a significant systematic improvement can be seen over the forty-year period. For comparison, similar plots for 06 UTC and 12 UTC are presented in Figs. 24 and 25. While the three plots look similar, the analysis increments for 06 UTC tend to be smaller. This can be explained by the different numbers of radiosonde measurements: the larger number of observations at 00 UTC and 12 UTC results in a better background forecast for the 06 UTC and 18 UTC analyses.

Fig. 23 500 hPa geopotential height analysis increment at 00 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 24 500 hPa geopotential height analysis increment at 06 UTC. The grey line represents the exact values, the black line the 30-day moving average

Fig. 25 500 hPa geopotential height analysis increment at 12 UTC. The grey line represents the exact values, the black line the 30-day moving average

5 Conclusion and discussion

A regional reanalysis database for the Baltic Sea region has been created. All the data will be made available to all interested parties via the web page of the Estonian Meteorological and Hydrological Institute, together with essential information about the software tools necessary for handling the database.

The quality of the analysis depends on the quality and availability of the data, which vary strongly in time. The trends in the RMS error scores and analysis increments are a clear sign of the steady improvement in the quality of the initial data and boundary fields. Some problems persist, such as the lack of SYNOP measurements during the first two years and the increase in the RMS error scores of the upper air fields at 06 UTC and 18 UTC during the last two years. Nevertheless, the overall performance of the system is acceptable, and the database can be considered homogeneous enough to allow comparisons between the different time periods that it covers. The basic quality control gives the authors enough confidence that the data are comparable in quality to the data assimilation products of operational NWP models and can be exploited in a similar way.

The BaltAn65+ development team hopes and expects that this new meteorological database will be a prominent and useful supplement to global databases such as ERA-40, allowing a better and more precise assessment of known effects and possibly the discovery of new fine-scale effects and trends in the climatology of the Baltic region over the last forty years.