1 Introduction

The eddy-covariance (EC) technique is widely used in environmental research to quantify the biosphere-atmosphere exchange of mass and energy. However, in numerous field experiments over the last few decades, it was found that the surface energy balance at the land surface could not be closed. The available energy, which is the sum of net radiation \((Q_\mathrm{S}^{*})\) and ground heat flux \((Q_\mathrm{G})\), should be totally balanced by the surface fluxes of sensible \((Q_\mathrm{H})\) and latent heat \((Q_\mathrm{E})\) except for minor terms, e.g. photosynthesis. In most field experiments and long-term measurements, the sum of turbulent fluxes is usually about 10–30 % lower than the available energy, which produces an energy balance residual \(\Delta \) of up to more than 100 W \(\hbox { m}^{-2}\) around noon in summer (Desjardins 1985; Lee and Black 1993; Twine et al. 2000; Wilson et al. 2002; Oncley et al. 2007; Foken et al. 2010; Hendricks-Franssen et al. 2010; Barr et al. 2012; Stoy et al. 2013),

$$\begin{aligned} -Q_\mathrm{S}^{*}-Q_\mathrm{G} =Q_\mathrm{H} +Q_\mathrm{E} +\Delta . \end{aligned}$$
(1)

The fluxes are defined to be positive when directed away from the earth’s surface. As an alternative to Eq. 1, an energy balance ratio \((R)\) can be defined as the ratio of turbulent energy to available energy (Wilson et al. 2002),

$$\begin{aligned} R=\frac{Q_\mathrm{H} +Q_\mathrm{E} }{-{Q_\mathrm{S}}^*-Q_\mathrm{G} }. \end{aligned}$$
(2)
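As a quick numerical illustration of Eqs. 1 and 2, the following minimal sketch computes the residual \(\Delta \) and the ratio \(R\) from the four flux terms. The function name and the midday flux values are hypothetical and chosen only to show the sign convention (fluxes positive when directed away from the surface).

```python
def energy_balance_closure(q_s_star, q_g, q_h, q_e):
    """Return (residual, ratio) following Eqs. 1 and 2.

    All fluxes in W m^-2, positive when directed away from the surface.
    """
    available = -q_s_star - q_g      # available energy
    turbulent = q_h + q_e            # sum of the turbulent fluxes
    residual = available - turbulent # Delta in Eq. 1
    ratio = turbulent / available    # R in Eq. 2
    return residual, ratio


# Hypothetical midday values: net radiation directed towards the surface
# (negative), ground heat flux directed into the soil (positive).
delta, r = energy_balance_closure(q_s_star=-500.0, q_g=50.0,
                                  q_h=120.0, q_e=240.0)
print(delta, r)
```

With these made-up numbers the available energy is 450 W m\(^{-2}\), so the residual is 90 W m\(^{-2}\) and \(R = 0.8\).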

In the literature, various hypotheses for the non-closure of the energy balance can be found. Instrumental issues have always been under discussion, but intercomparison experiments such as Mauder et al. (2007c) could not confirm that instrumental errors are the reason for the systematic underestimation of turbulent fluxes. Currently, the underestimation of vertical wind velocity by non-orthogonal sonic anemometers is under consideration (Kochendorfer et al. 2012). Furthermore, a considerable amount of energy can be stored in the soil, vegetation and air between the surface and the location of the measurement devices (Leuning et al. 2012). The energy storage in the atmosphere is directly related to the flux divergence (Betts et al. 1992). For measurements above low vegetation, the energy storage in air and biomass is negligible, but not the energy storage in the soil (Oncley et al. 2007). In this study, the latter was determined according to Liebethal et al. (2005) and included in \(Q_\mathrm{G}\).

The aim of our study is to test two parametrizations of the energy balance closure that follow the hypothesis that large-scale circulations are the reason for the non-closure of the energy balance (Sect. 2). Huang et al. (2008) argue that the EC method underestimates turbulent fluxes as compared to spatially-averaged fluxes due to large, organized structures, whereas Panin and Bernhofer (2008) claim that non-closure is correlated with the heterogeneity of the land surface, which can be regarded as an initiator of secondary circulations. Therefore, the role of large-scale circulations on EC measurements will be explained further in the following paragraphs.

Conventionally, EC flux estimates are based on block averages over time periods of a certain length, usually 30 min. It is assumed that within this time interval, the meteorological conditions are sufficiently stationary and all turbulent scales are captured. However, low-frequency motions with time scales larger than the selected averaging time (Finnigan et al. 2003) and non-propagating eddies that do not move with the mean wind (Lee and Black 1993; Mahrt 1998) are not detected. When calculating the EC flux, a Reynolds decomposition is applied and the flux is calculated from the covariance of the fluctuating parts of two atmospheric quantities, whereas the structures mentioned above only affect the mean components. During daytime, when the absolute values of the fluxes are large, the air close to the ground is usually warmer and moister than above, so that neglecting any turbulent transport mechanism would lead to an underestimation of \(Q_\mathrm{H}\) and \(Q_\mathrm{E}\), i.e. to a non-closure of the energy balance (Mauder et al. 2010). In addition, low-frequency motions and non-propagating eddies might also be responsible for the net advection of energy into the control volume of the measurement, which is an alternative explanation for the energy balance closure problem (Aubinet et al. 2003).
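The block-averaging step can be sketched in a few lines. The example below is only an illustration with synthetic data, not the processing chain used in this study: the function name and the signal are hypothetical, and steps such as coordinate rotation or spectral corrections are omitted. It shows that a constant offset in \(w\) and in the scalar, which stands in for a structure that does not change within the averaging block, drops out of the covariance.

```python
import numpy as np


def block_average_flux(w, phi, samples_per_block):
    """Covariance of w and phi within consecutive averaging blocks."""
    w = np.asarray(w, dtype=float)
    phi = np.asarray(phi, dtype=float)
    n_blocks = len(w) // samples_per_block
    n = n_blocks * samples_per_block          # drop an incomplete last block
    w_blocks = w[:n].reshape(n_blocks, samples_per_block)
    phi_blocks = phi[:n].reshape(n_blocks, samples_per_block)
    # Reynolds decomposition: subtract the block mean from each sample
    w_prime = w_blocks - w_blocks.mean(axis=1, keepdims=True)
    phi_prime = phi_blocks - phi_blocks.mean(axis=1, keepdims=True)
    return (w_prime * phi_prime).mean(axis=1)


# Synthetic series: a fluctuating part plus a constant offset in both signals
fluct = np.tile([1.0, -1.0], 50)              # 100 samples, zero mean
w = fluct + 0.3                               # steady updraft of 0.3
phi = 2.0 * fluct + 5.0                       # scalar with mean value 5
fluxes = block_average_flux(w, phi, samples_per_block=50)
print(fluxes)
```

Both blocks return a covariance of 2.0, whatever the offsets are; the product of the two means (0.3 × 5) never enters the result.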

Horizontal homogeneity is one condition for the validity of EC theory (Finnigan et al. 2003), and above such surfaces, energy balance ratios \(\approx \)1 have been reported (Heusinkveld et al. 2004; Mauder et al. 2007a; Leuning et al. 2012). However, most sites do not fulfil the condition of horizontal homogeneity, especially at the landscape scale. Secondary circulations that are induced by surface heterogeneities, e.g. differences in surface roughness, temperature and moisture (Segal et al. 1988; Segal and Arritt 1992; Sun et al. 1997; Strunin and Hiyama 2005), transport up to 30 % of scalar fluxes (Randow et al. 2002) and were related to the energy balance closure problem (Foken 2008; Foken et al. 2010; Ingwersen et al. 2011). Moreover, Desjardins et al. (1992) showed during the First ISLSCP Field Experiment (FIFE) that heterogeneous topography also induces a lack of energy balance closure.

Secondary circulations are also called ‘mesoscale’ motions because the respective size of the surface patches is crucial: for the mesoscale heterogeneity regime, the established theories developed for the homogeneous atmospheric boundary layer (ABL) are not valid (Mahrt 2000). The minimum size that surface patches must have in order to induce mesoscale motions is considered to be of the order of the ABL height (Shen and Leclerc 1995; Raasch and Harbusch 2001). Patton et al. (2005) found that the wavelength of any surface heterogeneity should be around 4–9 times the ABL height in order to be effective in generating secondary circulations. Simulations above multi-scale landscapes showed that mesoscale motions have a preferred scale \(\approx \)10–20 km (Baidya Roy et al. 2003) and they are mainly formed due to surface differences in the upstream area (Maronga and Raasch 2013). However, complex topography, small-scale heterogeneities and flow singularities might also be responsible for the non-closure of the energy balance, at least at certain sites in ‘non-ideal’ terrain (Panin et al. 1998; Foken et al. 2010).

To date, however, there is still no clear evidence that surface heterogeneities and secondary circulations cause the non-closure of the energy balance. The arguments presented above should be regarded as hypotheses, not as facts. Nevertheless, because surface fluxes are necessary input parameters for atmospheric models, and the EC method is widely used to determine trace gas budgets on an ecosystem scale, a robust parametrization of the missing turbulent energy is necessary. If the magnitude of underestimation, as well as the partitioning of the missing energy among sensible and latent heat are known, the measured fluxes can be corrected.

Therefore, we tested the two above-mentioned approaches that attempt to parametrize the non-closure of the energy balance, one of Huang et al. (2008) and one of Panin and Bernhofer (2008). Firstly, the two approaches are presented (Sect. 2) and data from aircraft and ground-based tower measurements (Sect. 3) are used to determine the respective value of \(R\) and to test both parametrizations (Sect. 4). The observed energy balance closure is discussed in the light of landscape heterogeneities. Fundamental shortcomings of the existing parametrization approaches are highlighted and potential improvements suggested (Sect. 5). We debate whether the underlying hypothesis, i.e. the existence of mesoscale circulations, can be verified. In addition, a note on the partitioning of the missing energy between \(Q_\mathrm{H}\) and \(Q_\mathrm{E}\) is made, which is not included in either of the two parametrizations.

2 Approaches to Parametrize the Non-Closure of the Energy Balance

2.1 Parametrization of Huang et al. (2008)

Huang et al. (2008) used large-eddy simulation (LES) to develop a parametrization scheme, based on earlier LES studies of Kanda et al. (2004) and Steinfeld et al. (2007). They showed that the spatial average of the EC flux of any scalar \(\varphi \) that is based on the temporal covariance \(\left\langle {\overline{{w}'{\varphi }'} } \right\rangle \) (the ‘temporal’ EC method) is smaller than the temporally-averaged spatial covariance \(\overline{\left\langle {{w}''{\varphi }''} \right\rangle } \) (the ‘spatial’ EC method), where \(w\) represents the vertical wind component. Here, the single prime denotes deviations from the temporal mean, the double prime denotes deviations from the spatial mean, angular brackets represent spatial averages, and the overbar represents temporal averages. Under horizontally homogeneous conditions and a negligible spatially-averaged vertical velocity \(\langle w \rangle \), the spatial EC flux can be regarded as the ‘true’ ensemble flux (Kanda et al. 2004; Steinfeld et al. 2007), so that the mean imbalance \(\langle I \rangle \) of the temporal EC method can be defined as

$$\begin{aligned} \left\langle I \right\rangle =\frac{\left\langle \, {\overline{{w}'{\varphi }'} } \,\right\rangle -\overline{\left\langle \, {{w}''{\varphi }''} \,\right\rangle } }{\overline{\left\langle {{w}''{\varphi }''} \right\rangle } }. \end{aligned}$$
(3)

In terms of conventional EC tower measurements, this definition is equivalent to the ratio \(\Delta /(-Q_\mathrm{S}^{*}-Q_\mathrm{G})\). The underestimation of the ‘true’ flux by the temporal EC method can be explained by the presence of turbulent organized structures, which are a fundamental property of the homogeneous convective ABL. They are characterized by narrow bands of relatively strong thermal updrafts that are surrounded by larger regions of weak subsidence. Under low wind conditions, the thermal updrafts are organized into spoke-like patterns, whereas with a significant background flow, streaky patterns appear near the surface, which merge into roll-like vortices at larger heights (Schmidt and Schumann 1989; Moeng and Sullivan 1994). It is important to note that these coherent structures remain at the same place even after 1-h averaging (Kanda et al. 2004). Therefore, they produce low-frequency trends and mean vertical advection, which cannot be reliably captured by point measurements and the use of the temporal EC technique. Consequently, turbulent fluxes are systematically underestimated, resulting in a negative imbalance. Kanda et al. (2004), Steinfeld et al. (2007) and Huang et al. (2008) argued that this imbalance can explain the energy balance residual detected by the EC approach in field measurements.
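The difference between the two averaging orders in Eq. 3 can be made concrete with a toy field. The sketch below is purely illustrative and is not an LES: it assumes a synthetic \(w\) and \(\varphi \) on a periodic time and space grid, built from a travelling wave (standing in for turbulence advected past the sensors) plus a pattern that is fixed in space (standing in for a stationary organized structure). All names and amplitudes are hypothetical.

```python
import numpy as np

nt, nx = 64, 64
t = np.arange(nt)[:, None] / nt               # time axis, shape (nt, 1)
x = np.arange(nx)[None, :] / nx               # space axis, shape (1, nx)

travelling = np.sin(2 * np.pi * (8 * x - 5 * t))   # moves past every point
standing = 0.5 * np.cos(2 * np.pi * x)             # fixed in space

# Arrays of shape (nt, nx); w and phi are perfectly correlated here
w = travelling + standing
phi = travelling + standing


def temporal_ec(w, phi):
    """Spatial average of the temporal covariance at each location."""
    w_p = w - w.mean(axis=0, keepdims=True)        # deviation from time mean
    phi_p = phi - phi.mean(axis=0, keepdims=True)
    return (w_p * phi_p).mean(axis=0).mean()


def spatial_ec(w, phi):
    """Temporal average of the spatial covariance at each time step."""
    w_pp = w - w.mean(axis=1, keepdims=True)       # deviation from space mean
    phi_pp = phi - phi.mean(axis=1, keepdims=True)
    return (w_pp * phi_pp).mean(axis=1).mean()


imbalance = (temporal_ec(w, phi) - spatial_ec(w, phi)) / spatial_ec(w, phi)
print(temporal_ec(w, phi), spatial_ec(w, phi), imbalance)
```

In this construction the temporal method only sees the travelling part (covariance 0.5), whereas the spatial method also sees the fixed pattern (0.5 + 0.125), so the imbalance is \(-0.2\).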

Following Huang et al. (2008), the magnitude of the underestimation depends on friction velocity \(u_{*}\), the Deardorff convective velocity \(w_{*}\), and the ratio of measuring height \(z\) to the ABL height \(z_{i}\),

$$\begin{aligned} \left\langle I \right\rangle =-f_1 \left( {\frac{u_*}{w_*}} \right) f_2 \left( {\frac{z}{z_i}}\right) . \end{aligned}$$
(4)

The empirical functions \(f_{1}\) and \(f_{2}\) capture an exponentially decreasing dependence of \(\langle I\rangle \) on \(u_{*}/w_{*}\) and an elliptic relationship to \(z/z_{i}\) and are further explained in Eqs. 11a, b and Table 2 in Huang et al. (2008). With increasing \(u_{*}\) the absolute value of the imbalance should decrease (Aubinet et al. 2000; Wilson et al. 2002; Hendricks-Franssen et al. 2010), because the stronger mechanical turbulence increases the probability that large-scale structures are advected past the point measurement and can be captured by the temporal EC method (Kanda et al. 2004). Larger sensible heat fluxes, inducing larger values of \(w_{*}\), are supposed to be accompanied by lower flux imbalances (Kanda et al. 2004; Steinfeld et al. 2007). Finally, the imbalance relates to the height within the ABL \((z/z_{i})\) because turbulent organized structures are strongest in the middle of the ABL and the magnitude of imbalance might also be influenced by entrainment processes at the top of the ABL (Steinfeld et al. 2007; Huang et al. 2008).

2.2 Parametrization of Panin and Bernhofer (2008)

Besides the parametrization of Huang et al. (2008) an alternative approach was proposed by Panin and Bernhofer (2008). Whereas the former concentrates on the large-scale structure of turbulence in a homogeneously heated ABL, the latter explicitly focuses on the role of surface heterogeneities by advancing the concept of Panin et al. (1998). Moreover, Panin and Bernhofer (2008) only propose a parameter that describes the mean closure of a site, whereas the approach of Huang et al. (2008) is designed to parametrize the energy balance closure on a 30-min basis.

The presence of surface heterogeneities violates basic assumptions of the EC methodology, so that they can be regarded as a potential reason for the lack of energy balance closure. In heterogeneous landscapes, cospectra of surface fluxes show additional peaks in the low-frequency range, far beyond the typical averaging period of the temporal EC method (Finnigan et al. 2003). Consequently, flux contributions associated with these low-frequency motions are not captured. Therefore, the sum of sensible and latent heat fluxes determined by the EC method under the assumption of horizontal homogeneity \((Q_\mathrm{H}+Q_\mathrm{E})_\mathrm{HH}\) has to be corrected by a factor \(k_\mathrm{f}\), which is independent of the magnitude of the fluxes, in order to retrieve the ‘true’ sum of the heat fluxes in inhomogeneous terrain \((Q_\mathrm{H} + Q_\mathrm{E})_\mathrm{INH}\),

$$\begin{aligned} \left( {Q_\mathrm{H} +Q_\mathrm{E}}\right) _\mathrm{HH} =k_\mathrm{f} \left( {Q_\mathrm{H} +Q_\mathrm{E} }\right) _\mathrm{INH}. \end{aligned}$$
(5)

Here, \((Q_\mathrm{H}+Q_\mathrm{E})_\mathrm{INH}\) is equivalent to the amount of available energy \((- Q_\mathrm{S}^{*}-Q_\mathrm{G})\). Panin and Bernhofer (2008) showed that the magnitude of the correction factor \(k_\mathrm{f}\) relates to the heterogeneity of the surrounding landscape. As a measure of non-homogeneity, they primarily focussed on surface roughness expressed by the roughness length \(z_{0}\), and suggested a parameter \(z_{0}^\mathrm{eff}/L^\mathrm{eff}\), where \(z_{0}^\mathrm{eff}\) is the effective roughness length of the surrounding area of the measurement location and \(L^\mathrm{eff}\) is the dominant horizontal scale of landscape inhomogeneities. Following Panin and Bernhofer (2008), there should be a linear dependence between \(k_\mathrm{f}\) and \(z_{0}^\mathrm{eff}/L^\mathrm{eff}\),

$$\begin{aligned} k_\mathrm{f} =K\frac{z_0^\mathrm{eff} }{L^\mathrm{eff}}+C, \end{aligned}$$
(6)

with \(K\) and \(C\) being empirical constants.
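Given pairs of \(z_{0}^\mathrm{eff}/L^\mathrm{eff}\) and an observed closure factor for several sites, the constants in Eq. 6 could be estimated by an ordinary least-squares line fit. The following sketch assumes exactly that; the function name and all numbers are made up for illustration and are not values from Panin and Bernhofer (2008), nor do they imply a particular sign of \(K\).

```python
import numpy as np


def fit_closure_line(z0_over_l, k_f):
    """Least-squares estimate of K (slope) and C (intercept) in Eq. 6."""
    slope, intercept = np.polyfit(z0_over_l, k_f, 1)
    return slope, intercept


# Synthetic, noise-free data generated from an arbitrary line
ratio = np.array([1e-4, 2e-4, 4e-4, 8e-4])    # hypothetical z0_eff / L_eff
k_f_obs = -300.0 * ratio + 0.95               # hypothetical closure factors
K, C = fit_closure_line(ratio, k_f_obs)

# Eq. 5 rearranged: flux corrected for inhomogeneous terrain
q_measured = 300.0                            # hypothetical (QH + QE)_HH, W m^-2
q_corrected = q_measured / (K * 4e-4 + C)
print(K, C, q_corrected)
```

With real data the points scatter around the line, so the quality of the fit should be checked before Eq. 5 is used for any correction.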

The analysis is restricted to several km around the measuring point because only heterogeneities within that distance have a significant influence on the exchange process.

3 Data and Analysis

3.1 Description of Aircraft Data and Tower Measurements

Parts of the dataset for this study were taken from flights of the Twin Otter aircraft of the Canadian National Research Council (NRC). An intercomparison study proved the reliability of the data with respect to spectra, standard deviation of horizontal and vertical wind and flux estimates (Dobosy et al. 1997). In the framework of the Boreal Ecosystem-Atmosphere Study (BOREAS) experiment (Sellers et al. 1997), 16 flights were conducted over the boreal forest at Candle Lake, Saskatchewan, Canada between May and September 1994. These “Candle Lake Runs” always followed the same flight route along an approximately 115 km long transect between \(53.57^{\circ }\hbox {N}, 106.40^{\circ }\hbox {W}\) and \(53.98^{\circ }\hbox {N}, 104.29^{\circ }\hbox {W}\) at a height of roughly 30 m a.g.l. (MacPherson 1996). The landscape is flat and can be characterized as a homogeneous boreal forest with numerous major or minor lakes distributed within; for details, see Mauder et al. (2007b). Measurements of the wind vector, temperature \((T)\) and water vapour mixing ratio \((q)\) were conducted on board of the Twin Otter aircraft, with a sampling rate of 16 Hz (MacPherson 1996). In April 2002, four additional flights took place on the same track in the framework of the Boreal Ecosystem Research and Monitoring Sites (BERMS) programme (Barr et al. 2002), when data were recorded at a sampling rate of 32 Hz.

In addition, tower-based measurements were also used to test both parametrizations. These measurements were conducted in the framework of the Terrestrial Environmental Observatories (TERENO) programme (Zacharias et al. 2011), which aims at observing the long-term impact of global change at a regional level. EC masts were mounted at three sites: Graswang (\(47.57^{\circ }\hbox {N}, 11.03^{\circ }\hbox {E}\), 865 m a.s.l.), Rottenbuch (\(47.72^{\circ }\hbox {N}\), \(10.97^{\circ }\hbox {E}\), 763 m a.s.l.) and Fendt (\(47.82^{\circ }\hbox {N}\), \(11.06^{\circ }\hbox {E}\), 600 m a.s.l.). They are located in the Ammer catchment in southern Bavaria, Germany and are part of TERENO’s Bavarian Alps/pre-Alps Observatory (Figs. 1, 2). The landscape is a varied mosaic of villages, grassland, some arable land and coniferous and mixed forests. Furthermore, the sites are surrounded by hilly terrain and the southernmost site, Graswang, is already located in the “Ammergebirge” mountains. All three EC stations are located on grassland (Fig. 2b, d, f). The EC stations run continuously and here, data for the period 1 April 2012–30 September 2012 were used. The equipment used for this study is listed in Table 1, and further information on instrumentation and data storage can be found in Zacharias et al. (2011).

Fig. 1

Overview of the Bavarian Alps/pre-Alps Observatory located in southern Germany (a) and overview of the Ammer catchment (b) with black squares showing the locations of the TERENO sites (Zacharias et al. 2011)

Fig. 2

Satellite images (Google 2013) at the landscape scale (area roughly 25 km \(\times \) 25 km; a, c, e) and at the flux footprint scale (2 km \(\times \) 2 km; b, d, f) of Fendt (a, b), Rottenbuch (c, d) and Graswang (e, f), where yellow crosses indicate the approximate location of the EC masts

Table 1 Part of the equipment of the TERENO stations (Zacharias et al. 2011) of the Bavarian Alps/pre-Alps Observatory used for this study

3.2 Calculation of Turbulent Fluxes and the Energy Balance Residual

In order to derive estimates of \(Q_\mathrm{H}\) and \(Q_\mathrm{E}\) from the time series of the Twin Otter aircraft, wavelet analysis (Grossmann and Morlet 1984; Kronland-Martinet et al. 1987; Grossmann et al. 1989) was applied. As compared to the Fourier transform, the wavelet transform has the advantage that it is able to resolve the signal in frequency domain and time domain. Therefore, it especially allows spectral analysis of non-stationary data, e.g. if the dominant transport scales vary along the flight track. Details on the applied wavelet methodology can be found in Torrence and Compo (1998) and Mauder et al. (2007b). Here, only the fundamental steps are given.

In analogy to Mauder et al. (2007b), the Morlet wavelet with frequency parameter \(\omega _{0} = 6\) served as mother wavelet because of its better resolution in the frequency domain than other mother wavelets that are frequently used for analyzing atmospheric time series, such as the Haar wavelet or the Mexican Hat wavelet. Let \(T_{x}(a, b)\) and \(T_{y}(a, b)\) denote the wavelet coefficients of two time series \(x(n)\) and \(y(n)\), where \(a\) is the scaling and \(b\) is the translation parameter. The integration over the product \(T_{x}(a, b) T_{y}^{*}(a, b)\) in time and frequency domain, where the asterisk represents the complex conjugate, gives an estimate of the covariance, i.e. an estimate of the turbulent flux. We considered only those wavelet coefficients outside of the cone of influence; accordingly, the cross-spectrum \(W_{xy}(a)\) is defined as

$$\begin{aligned} W_{xy} \left( a \right) =\frac{\delta j}{a}\frac{\delta t}{C_\delta }\frac{1}{N}\sum _{b=0}^{N-1} {T_x \left( {a,b} \right) \;T_{y} ^{*}\left( {a,b} \right) } \end{aligned}$$
(7)

and gives the spectral contributions to the turbulent flux \(\overline{{x}'{y}'} \) (Hudgins et al. 1993). The non-dimensional factor \(\delta j = 0.25\) determines the spacing between the scales \(a\) of the wavelet transform, \(\delta t\) denotes the timestep of the time series, \(C_{\delta } =0.776\) is a constant for the Morlet wavelet and \(N\) is the length of the time series. The integration over the cross-spectrum in the frequency domain gives the total flux. When the integration is restricted to distinct scales, their respective contribution to the total turbulent flux can be quantified.
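A compact way to see how Eq. 7 recovers a covariance is to apply it to a synthetic signal. The sketch below assumes the FFT-based Morlet transform described by Torrence and Compo (1998) with a smallest scale of \(2\delta t\); it does not mask the cone of influence, and the function names and the test signal are hypothetical, so it should be read as an illustration rather than the exact procedure of this study.

```python
import numpy as np

C_DELTA = 0.776   # reconstruction constant for the Morlet wavelet (omega0 = 6)


def morlet_cwt(x, dt, dj=0.25, omega0=6.0):
    """Morlet wavelet coefficients T(a, b), computed in Fourier space."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s0 = 2.0 * dt                                     # smallest scale
    n_scales = int(np.log2(n * dt / s0) / dj) + 1
    scales = s0 * 2.0 ** (dj * np.arange(n_scales))   # scales a
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)     # angular frequencies
    x_hat = np.fft.fft(x - x.mean())
    coeffs = np.empty((n_scales, n), dtype=complex)
    for j, a in enumerate(scales):
        # Fourier transform of the scaled, normalized Morlet wavelet
        psi_hat = (np.sqrt(2.0 * np.pi * a / dt) * np.pi ** -0.25
                   * np.exp(-0.5 * (a * omega - omega0) ** 2) * (omega > 0))
        coeffs[j] = np.fft.ifft(x_hat * psi_hat)
    return scales, coeffs


def cross_spectrum(t_x, t_y, scales, dt, dj=0.25):
    """Real part of Eq. 7: flux contribution of each scale a."""
    n = t_x.shape[1]
    summed = (t_x * np.conj(t_y)).sum(axis=1).real    # sum over translations b
    return dj / scales * dt / C_DELTA * summed / n


# Synthetic example: two in-phase sine waves with a covariance of 0.25
dt = 1.0
time = np.arange(1024) * dt
x = np.sin(2.0 * np.pi * time / 32.0)
y = 0.5 * x
scales, t_x = morlet_cwt(x, dt)
_, t_y = morlet_cwt(y, dt)
w_xy = cross_spectrum(t_x, t_y, scales, dt)
total_flux = w_xy.sum()                               # sum over all scales
print(total_flux, np.mean(x * y))
```

The summed cross-spectrum agrees with the directly computed covariance to within a few per cent; the small difference stems from the discrete approximation behind \(C_{\delta }\).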

As suggested by Williams et al. (1996) and Strunin and Hiyama (2005), 2 km was chosen as the cut-off wavelength to separate small-scale from mesoscale motions. We claim that the scales larger than 2 km are not captured by typical EC masts, because they represent secondary circulations that do not move with the mean wind. Under the assumption that the neglect of mesoscale fluxes is the major reason for the lack of energy balance closure, their respective flux contributions could be regarded as an estimate of the corresponding value of \(\Delta \) (Eq. 1) and the ratio of small-scale contributions to total turbulent fluxes should be equivalent to \(R\) (Eq. 2). Accordingly, the negative mean energy imbalance \(-\langle I\rangle \) (Eq. 3) is the mesoscale contribution divided by the total turbulent flux. Please note that this is a very crucial step, since this method might underestimate \(\Delta \) because other potential reasons for the lack of energy balance closure, such as instrumental problems or horizontal heterogeneity within the footprint of the measurement, are not considered. The method could also overestimate \(\Delta \) if at least some parts of the scales \(>\)2 km completely pass by an EC tower within the averaging period of 30 min (Sect. 5.1).
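The scale separation described above amounts to splitting the summed cross-spectrum at the cut-off wavelength. The following sketch assumes that the cross-spectrum has already been evaluated per scale and that each scale has been converted to a horizontal wavelength in metres (for aircraft data this conversion would involve the aircraft speed, which is not shown here). The function name and the spectral values are hypothetical.

```python
import numpy as np


def mesoscale_fraction(wavelengths_m, cospectrum, cutoff_m=2000.0):
    """Share of the total flux carried by wavelengths above the cut-off.

    Under the assumption stated in the text, this share is an estimate of
    the negative mean imbalance -<I>, and 1 minus this share estimates R.
    """
    wavelengths_m = np.asarray(wavelengths_m, dtype=float)
    cospectrum = np.asarray(cospectrum, dtype=float)
    total = cospectrum.sum()
    meso = cospectrum[wavelengths_m > cutoff_m].sum()
    return meso / total


# Made-up flux contributions per scale (W m^-2)
wavelengths = [250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0]
contributions = [10.0, 30.0, 40.0, 20.0, 15.0, 10.0]
neg_imbalance = mesoscale_fraction(wavelengths, contributions)
r_estimate = 1.0 - neg_imbalance
print(neg_imbalance, r_estimate)
```

For these numbers 25 of 125 W m\(^{-2}\) are found at wavelengths above 2 km, which gives \(-\langle I\rangle = 0.2\) and \(R = 0.8\).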

Turbulent fluxes at the EC stations in Graswang, Rottenbuch and Fendt were derived by using the software package TK3.1 (Mauder and Foken 2011); for detailed information on post-processing of the high-frequency tower data, including quality assessment and quality control, see Mauder et al. (2013). The respective EC estimates of \(Q_\mathrm{H}\) and \(Q_\mathrm{E}\) were derived on a 30-min basis. A considerable amount of energy is stored in the soil layer between the soil heat flux plate at depth \(z_\mathrm{HFP}\) and the soil surface (Oncley et al. 2007). This storage term was calculated using a calorimetric approach and added to the measured soil heat flux \(Q_\mathrm{G,HFP}\) (Liebethal et al. 2005), so that the ground heat flux \((Q_\mathrm{G})\) is defined as

$$\begin{aligned} Q_\mathrm{G} =Q_\mathrm{G,HFP} +\int \limits _{z_\mathrm{HFP}}^0 {c_\mathrm{v,soil} \frac{\delta T}{\delta t}\text {d}z}. \end{aligned}$$
(8)

The amount of stored energy equals the vertically integrated temperature change \(\delta T/\delta t\) within the considered time interval multiplied by the volumetric heat capacity of the soil \(c_\mathrm{v,soil}\). For testing both parametrizations, only those EC data with the best quality flag of 0 according to the scheme described in Mauder et al. (2013) were used; these high-quality data are characterized as being suitable for fundamental research. For the Huang et al. (2008) parametrization, only daytime data with absolute values of global radiation \(>\)30 W m\(^{-2}\) were considered. With respect to the approach of Panin and Bernhofer (2008), \(R\) was calculated on a daily basis.
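In discrete form, the storage integral in Eq. 8 becomes a sum over soil layers between the heat flux plate and the surface. The sketch below assumes a constant volumetric heat capacity, layer-mean temperatures at the start and end of the averaging interval, and layer thicknesses given as positive numbers; the function name and all values are hypothetical.

```python
import numpy as np


def ground_heat_flux(q_hfp, temp_start, temp_end, layer_thickness,
                     c_v_soil, dt_seconds):
    """Plate flux plus calorimetric storage above the plate (Eq. 8).

    q_hfp           : flux measured by the heat flux plate (W m^-2)
    temp_start/end  : layer-mean soil temperatures (K or deg C)
    layer_thickness : thickness of each layer (m), positive
    c_v_soil        : volumetric heat capacity (J m^-3 K^-1)
    dt_seconds      : length of the time interval (s)
    """
    temp_start = np.asarray(temp_start, dtype=float)
    temp_end = np.asarray(temp_end, dtype=float)
    thickness = np.asarray(layer_thickness, dtype=float)
    dT_dt = (temp_end - temp_start) / dt_seconds      # warming rate per layer
    storage = np.sum(c_v_soil * dT_dt * thickness)    # W m^-2
    return q_hfp + storage


# Hypothetical 30-min interval with two 4-cm layers above the plate
q_g = ground_heat_flux(q_hfp=40.0,
                       temp_start=[15.0, 14.0],
                       temp_end=[15.9, 14.45],
                       layer_thickness=[0.04, 0.04],
                       c_v_soil=2.0e6,
                       dt_seconds=1800.0)
print(q_g)
```

In this made-up case the warming of the two layers stores 60 W m\(^{-2}\), so the ground heat flux at the surface is 100 W m\(^{-2}\) rather than the 40 W m\(^{-2}\) seen by the plate.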

3.3 Data Preparation for the Parametrization of Huang et al. (2008)

According to Huang et al. (2008), the energy balance residual is a function of \(u_{*}, w_{*}, z\) and \(z_{i}\) (Sect. 2). For the calculation of \(u_{*}\), high-frequency data of the wind vector are needed,

$$\begin{aligned} u_*=\left( {\overline{{u}'{w}'} ^{2}+\overline{{v}'{w}'} ^{2}} \right) ^{\frac{1}{4}}. \end{aligned}$$
(9)

With respect to the Twin Otter aircraft data, the wind vector was determined from the vector difference between the air velocity relative to aircraft, measured by pressure transducers of a Rosemount 858AJ 5-hole probe, and the aircraft velocity relative to the ground (MacPherson 1996). For the EC tower data from the TERENO sites, the wind vector was measured with a CSAT3 sonic anemometer (Table 1) and \(u_{*}\) was automatically calculated with the TK3.1 software.
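Eq. 9 can be evaluated directly from the three wind components once the fluctuations have been separated from the means. The following sketch is a bare illustration with a synthetic signal; it omits the coordinate rotation and quality control that a package such as TK3.1 applies, and the function name is hypothetical.

```python
import numpy as np


def friction_velocity(u, v, w):
    """Friction velocity from the two momentum-flux covariances (Eq. 9)."""
    u, v, w = (np.asarray(a, dtype=float) for a in (u, v, w))
    w_prime = w - w.mean()
    uw = np.mean((u - u.mean()) * w_prime)    # covariance of u and w
    vw = np.mean((v - v.mean()) * w_prime)    # covariance of v and w
    return (uw ** 2 + vw ** 2) ** 0.25


# Synthetic wind components (m s^-1) with known covariances
fluct = np.tile([1.0, -1.0], 100)
w = 0.4 * fluct
u = 3.0 - 0.3 * fluct                         # covariance with w: -0.12
v = 0.4 * fluct                               # covariance with w: +0.16
u_star = friction_velocity(u, v, w)
print(u_star)
```

The two covariances are \(-0.12\) and \(0.16\) m\(^{2}\) s\(^{-2}\), so the result is \(0.04^{1/4} \approx 0.45\) m s\(^{-1}\).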

For the calculation of the convective velocity scale \(w_{*}\) (Deardorff 1972), the surface sensible heat flux \(\overline{{w}'\theta ^{\prime }}_{0}\), a reference potential temperature \(\theta \) of the convective ABL, and \(z_{i}\) are needed,

$$\begin{aligned} w_*=\left( {\frac{g}{\theta } z_i \overline{{w}'\theta ^{\prime }}_0 } \right) ^{\frac{1}{3}}, \end{aligned}$$
(10)

where \(g\) represents the acceleration due to gravity. For the aircraft data, \(\overline{{w}'\theta ^{\prime }} _0 \) was determined by integration of the cross-spectrum of \(w\) and \(T\) (Sect. 3.2). For \(\theta \), the mean potential temperature along the flight track was used and, during BOREAS, \(z_{i}\) was determined every hour by a ground-based 915-MHz wind profiling radar system near the north-eastern edge of the flight track at \(104.67^{\circ }\hbox {W}\) and \(53.91^{\circ }\hbox {N}\) (Wilczak et al. 1997). For BERMS, no values of \(z_{i}\) are available, so that the test of the parametrization of Huang et al. (2008) is restricted to the BOREAS dataset. With respect to the TERENO stations, the surface sensible heat flux was calculated with the TK3.1 software and for \(\theta \), the 30-min average of the sonic temperature was used. The ABL height was determined by continuously running CL51 ceilometers (Table 1) at all three TERENO sites, which measure profiles of optical backscatter intensities. The height of the mixed layer, which can be regarded as the ABL height during convective conditions, was determined with the software tool ‘BL_Matlab’ that was run in Matlab2012a (version 7.14). It uses the algorithm of ‘BL-VIEW’(Vaisala Oyj., Finland) that is based on the gradient method (Emeis et al. 2007, 2008). In the mixed layer, particle concentrations are considerably higher than in the free atmosphere, and accordingly, \(z_{i}\) is the height at which the first derivative of the measured backscatter intensity has a minimum.
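The two steps described in this subsection, locating \(z_{i}\) at the minimum of the backscatter gradient and inserting it into Eq. 10, can be sketched as follows. This is a simplified stand-in for the gradient method, not the BL-VIEW algorithm itself (which is not reproduced in the text); the function names and the idealized backscatter profile are hypothetical.

```python
import numpy as np

G = 9.81   # acceleration due to gravity (m s^-2)


def mixed_layer_height(heights_m, backscatter):
    """Height at which the vertical backscatter gradient has its minimum."""
    heights_m = np.asarray(heights_m, dtype=float)
    backscatter = np.asarray(backscatter, dtype=float)
    gradient = np.gradient(backscatter, heights_m)    # first derivative
    return heights_m[np.argmin(gradient)]


def convective_velocity(theta_k, z_i_m, w_theta_0):
    """Deardorff convective velocity scale (Eq. 10).

    Only meaningful for an upward surface heat flux (w_theta_0 > 0).
    """
    return (G / theta_k * z_i_m * w_theta_0) ** (1.0 / 3.0)


# Idealized profile: high backscatter in the mixed layer, sharp drop at 1000 m
heights = np.arange(0.0, 2000.0, 50.0)
profile = 0.1 + 1.0 / (1.0 + np.exp((heights - 1000.0) / 50.0))
z_i = mixed_layer_height(heights, profile)
w_star = convective_velocity(theta_k=300.0, z_i_m=z_i, w_theta_0=0.1)
print(z_i, w_star)
```

For this profile the gradient minimum lies at 1000 m, and a kinematic heat flux of 0.1 K m s\(^{-1}\) then gives \(w_{*} \approx 1.5\) m s\(^{-1}\). Real ceilometer profiles are noisy, so an operational retrieval needs smoothing and plausibility checks that are not shown here.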

3.4 Data Preparation for the Parametrization of Panin and Bernhofer (2008)

In order to test the parametrization of Panin and Bernhofer (2008), spatial distributions of \(z_{0}\) for the surrounding area are necessary, which can be derived from land-use maps. For the Candle Lake area, data from the Landsat Thematic Mapper image at a grid resolution of 30 m are available (Hall et al. 1997). For the TERENO sites, raster data on land use at a grid resolution of 100 m were obtained from the Coordination of Information on the Environment (CORINE) programme of the European Union. Here, raster data from version 16 of the CORINE Land Cover 2006 project (CLC2006) were used (URL: http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2006-raster-2). Afterwards, each land-use class was assigned a \(z_{0}\) value according to Table 2. The effective surface roughness \(z_{0}^\mathrm{eff}\) was evaluated using

$$\begin{aligned} \log z_{0}^\mathrm{eff} =\frac{1}{n}\sum _{i=1}^n {\log z_{0,i} } \end{aligned}$$
(11)

where \(n\) is the number of pixels of the \(z_{0}\) raster map of the analyzed terrain. This approach was originally proposed by Taylor (1987), but there are more sophisticated schemes developed using LES that also consider the effect of horizontal variability, such as the blending-height concept of Bou-Zeid et al. (2007). In addition, the \(z_{0}\) for water (Table 2) is quite high, especially at low and moderate wind speeds (Smith 1988), but we wish to test the parametrization of Panin and Bernhofer (2008) without serious modification.
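Eq. 11 is a logarithmic (geometric) mean over all pixels, so the base of the logarithm does not affect the result. A minimal sketch, with a hypothetical function name and made-up roughness values that are not taken from Table 2:

```python
import numpy as np


def effective_roughness_length(z0_map):
    """Logarithmic average of z0 over all raster pixels (Eq. 11)."""
    z0 = np.asarray(z0_map, dtype=float)
    return 10.0 ** np.mean(np.log10(z0))


# Hypothetical 2 x 2 raster: half smooth (0.01 m), half rough (1.0 m)
z0_raster = [[0.01, 0.01],
             [1.0, 1.0]]
z0_eff = effective_roughness_length(z0_raster)
print(z0_eff)
```

The result is 0.1 m, well below the arithmetic mean of about 0.5 m, which shows how strongly the logarithmic average weights the smooth patches.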

Table 2 Roughness length \(z_{0}\) for homogenous land-use patches (Wieringa 1993; Jong et al. 1999)

Here, \(L^\mathrm{eff}\) was determined from selected transects along the \(z_{0}\) map. For the Candle Lake area, both in the north–south direction and in the east–west direction, 10 transects of the \(z_{0}\) pixel map were randomly chosen. For the TERENO sites, the surrounding area was divided into eight wind sectors, and at the centre of each sector—at \(0^{\circ }, 45^{\circ },\ldots ,315^{\circ }\)—one transect of 15-km length and starting at the location of the EC mast was taken. For each transect, a Fourier spectrum was calculated. The dominant horizontal scale \(L\) of one transect is half the wavelength of the maximum of the Fourier spectrum. Generally, values of \(L\) were discarded if no clear maximum could be found for the transect, i.e. the Fourier coefficients increased with wavelength and the maximum of the spectrum appeared at the largest wavelength. Only for the Rottenbuch site, along two of the eight transects (\(135^{\circ }\) and \(180^{\circ }\)), no maximum was found in the Fourier spectrum. The \(L^\mathrm{eff}\) value of each area is the arithmetic mean of the \(L\) values of the respective transects. Panin and Bernhofer (2008) used only two transects, but it is assumed that the method applied here delivers a measure that is more representative of the surrounding area.
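The transect analysis can be sketched with a discrete Fourier transform. The example below assumes an amplitude spectrum of the mean-removed \(z_{0}\) transect, discards a transect when the maximum sits at the largest resolved wavelength, and averages the remaining values; function names, pixel size and transects are hypothetical and only illustrate the rule described above.

```python
import numpy as np


def dominant_scale(z0_transect, pixel_size_m):
    """Half the wavelength of the spectral maximum, or None if no clear peak."""
    z0 = np.asarray(z0_transect, dtype=float)
    n = z0.size
    # Amplitude spectrum without the zero-frequency (mean) component
    amplitude = np.abs(np.fft.rfft(z0 - z0.mean()))[1:]
    wavelengths = n * pixel_size_m / np.arange(1, amplitude.size + 1)
    k_max = np.argmax(amplitude)
    if k_max == 0:            # maximum at the largest wavelength: discard
        return None
    return 0.5 * wavelengths[k_max]


def effective_scale(transects, pixel_size_m):
    """Arithmetic mean of the dominant scales of all usable transects."""
    scales = [dominant_scale(tr, pixel_size_m) for tr in transects]
    scales = [s for s in scales if s is not None]
    return float(np.mean(scales))


# 15-km transects at 100-m resolution (150 pixels)
pixels = np.arange(150)
patchy = np.where((pixels // 15) % 2 == 0, 1.0, 0.03)   # 1.5-km patches
ramp = np.linspace(0.03, 1.0, 150)                       # no clear maximum
l_eff = effective_scale([patchy, ramp], pixel_size_m=100.0)
print(dominant_scale(patchy, 100.0), dominant_scale(ramp, 100.0), l_eff)
```

The patchy transect alternates every 1.5 km, so its spectral peak lies at a wavelength of 3 km and \(L = 1.5\) km; the ramp is discarded because its spectrum keeps increasing towards the largest wavelength.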

The CORINE raster data are projected in Lambert azimuthal equal-area projection centred at \(52^{\circ }\hbox {N}\) and \(10^{\circ }\hbox {E}\). The projection is not length preserving and as the TERENO sites are located at a considerable distance from the projection centre (Table 1), this introduces an error into the estimation of horizontal length scales. However, in order to keep the regular grid, the original raster data were not re-projected. All calculations, if not stated otherwise, were made with the statistical software R, version 14.2.

4 Results

4.1 Energy Balance Closure for the Candle Lake Aircraft Flights and the TERENO EC Measurements

For the Twin Otter aircraft flights in the Candle Lake area, the values of \(R\) range from 0.73 to 0.99 and for 15 of the 20 flights during BOREAS and BERMS, \(R\) is between 0.86 and 0.91. The missing energy at all TERENO sites within the period from 1 July 2012 to 31 October 2012 (Table 3) is larger than for the Candle Lake area. The Graswang \((R =0.75)\) and Fendt \((R =0.77)\) sites are similar, whereas the Rottenbuch site exhibits an exceptionally poor energy balance closure \((R = 0.60)\). Thus, it is of interest whether the two parametrization approaches are able to explain these large differences.

Table 3 Observed values of \(R\) at the TERENO sites Graswang, Rottenbuch and Fendt in the period from July to October 2012; due to a malfunction of the LI-7200, no reliable data are available at the Rottenbuch site between 30 August and 12 October 2012
Table 4 Height of aircraft above the ground \(z\), boundary-layer height \(z_{i}\), friction velocity \(u_{*}\), convective velocity scale \(w_{*}\), and mesoscale flux contribution to the sum of the sensible and latent heat fluxes (interpreted as the negative mean energy imbalance along the flight track \(-\langle I\rangle \), Sects. 2.1 and 3.3) during the Twin Otter aircraft flights of the BOREAS experiment between May 1994 and September 1994 and the BERMS programme in April 2002

4.2 Parametrization of Huang et al. (2008)

Table 4 gives an overview of the data from the Twin Otter aircraft that are necessary for testing the parametrization of Huang et al. (2008). All flights were conducted at heights between 29 m and 65 m a.g.l. during fair weather conditions, i.e. the sky was either cloudless or had less than 50 % cloud cover. The friction velocities range between 0.14 and 0.84 m s\(^{-1}\); values of \(w_{*}\) are highly variable (0.55–3.82 m s\(^{-1}\)), which could be attributed to the large differences in \(z_{i}\) (500–2,250 m). Thus, during these convective conditions, buoyant plumes need \(\approx \)10–20 min to rise from the ground to the top of the ABL. The mesoscale flux contributions determined from the aircraft flights range between 1.1 and 27.2 %.

With respect to the EC data from the TERENO sites, only daytime data were analyzed and values of \(u_{*}, w_{*}\) and \(z_{i}\) were calculated on a 30-min basis (data not shown). At the Rottenbuch site, the values of \(u_{*}\) (median: 0.21 m s\(^{-1}\)) and \(w_{*}\) (median: 1.01 m s\(^{-1}\)) are larger than at the Graswang (\(u_{*}\): 0.20 m s\(^{-1}\), \(w_{*}\): 0.95 m s\(^{-1}\)) and Fendt (\(u_{*}\): 0.17 m s\(^{-1}\), \(w_{*}\): 0.87 m s\(^{-1}\)) sites. The ABL height is largest at Graswang (median of \(z_{i}\): 997 m), which is located in the Alps, and smallest at the Fendt site (median of \(z_{i}\): 813 m), which is located furthest away from the mountains.

Huang et al. (2008) derived the parametrization in the following way: for constant \(z/z_{i}\), they attempted to find the function \(f_{1}\) that describes the dependence of \(-\langle I\rangle \) on \(u_{*}/w_{*}\). Then, they determined \(f_{2}\) by finding a fit for \(-\langle I\rangle /f_{1}\) versus \(z/z_{i}\) for zero background wind, i.e. free convection conditions and constant \(u_{*}/w_{*}\). Neglecting any interaction between both terms, the product of these two functions should be a reliable parametrization of \(-\langle I\rangle \). Figure 3 shows the attempt to find the function \(f_{1}\) for the BOREAS aircraft data and for the EC data from the TERENO sites. The data do not follow the exponential relation suggested by Huang et al. (2008) and no correlation was found between \(-\langle I\rangle \) and \(u_{*}/w_{*}\). This implies that \(u_{*}/w_{*}\) cannot explain the observed variability in \(-\langle I\rangle \), but the suggested universal function of Huang et al. (2008) might serve as a lower bound for the data. The function \(f_{2}\) also cannot be confirmed, simply because it does not make sense to calculate \(-\langle I\rangle /f_{1}\) when \(f_{1}\) itself could not be confirmed. Furthermore, we could not find a significant correlation between \(-\langle I\rangle \) and \(z/z_{i}\). It should be noted that close to the surface \((z/z_{i} < 0.08)\), the function \(f_{2}\) becomes imaginary and the parametrization becomes invalid. All in all, the parametrization did not work for our near-surface measurements.

Fig. 3

Observed energy imbalance \(-I\) for the 16 Candle Lake runs during BOREAS and for the TERENO sites Graswang, Rottenbuch and Fendt as a function of the ratio of friction velocity to convective velocity scale \(u_{*}/w_{*}\). Due to the large amount of data, the respective data from the TERENO sites were binned into 15 groups of equal size according to their value of \(u_{*}/w_{*}\); for each group, the median and the interquartile range are displayed; for the Candle Lake data, the error bars represent the random error of the airborne flux measurements (Lenschow and Stankov 1986). In addition, the suggested function \(f_{1}\) for \(Q_\mathrm{H}\) by Huang et al. (2008) is indicated as a dashed line
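The equal-size binning described in the caption can be sketched as follows. This is an illustrative reconstruction under assumptions, not the authors' processing code: the function name is hypothetical, and the quartile convention (`method="inclusive"`) is an arbitrary choice that the paper does not specify.

```python
import statistics


def bin_equal_size(x, y, n_bins=15):
    """Sort (x, y) pairs by x and split them into n_bins groups of
    (nearly) equal size. For each group return a tuple
    (median x, median y, lower quartile of y, upper quartile of y)."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    if n < 2 * n_bins:
        raise ValueError("need at least two samples per bin")
    summary = []
    for k in range(n_bins):
        lo = k * n // n_bins          # first index of group k
        hi = (k + 1) * n // n_bins    # one past the last index of group k
        xs = [p[0] for p in pairs[lo:hi]]
        ys = [p[1] for p in pairs[lo:hi]]
        q1, med, q3 = statistics.quantiles(ys, n=4, method="inclusive")
        summary.append((statistics.median(xs), med, q1, q3))
    return summary
```

Here `x` would hold the 30-min values of \(u_{*}/w_{*}\) and `y` the corresponding imbalance; grouping by rank rather than by fixed bin width keeps the sample size per group constant.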

4.3 Parametrization of Panin and Bernhofer (2008)

The maps of the Candle Lake area and the TERENO sites that show the spatial distribution of the roughness length \(z_{0}\) are displayed in Fig. 4. Lakes and rocky areas are green, grassland and agricultural land are yellow and villages and forests are red. The values of the heterogeneity index \(z_{0}^\mathrm{eff}/L^\mathrm{eff}\) are \(3.66\times 10^{-5}\) for the Candle Lake area, \(1.4 \times 10^{-4}\) for the Graswang site, \(2.11 \times 10^{-4}\) for the Rottenbuch site and \(4.52 \times 10^{-5}\) for the Fendt site. Thus, according to Panin and Bernhofer (2008), the Candle Lake area represents the least heterogeneous landscape.

Fig. 4

Surface roughness maps of the Candle Lake area (a) and the TERENO sites at Graswang (b), Rottenbuch (c) and Fendt (d); in (b)–(d), black crosses mark the approximate location of the EC masts; please note that the colour scale is logarithmic

As shown in Fig. 5, only the data from Candle Lake and Graswang are in accordance with the results of Panin and Bernhofer (2008) and all sites lie above a linear fit \((R^{2} =0.87, p= 0.002)\) through the data of Panin and Bernhofer (2008). Please note that in Fig. 5, the regression line does not appear as a straight line due to the logarithmic \(x\)-axis. The energy balance closure, especially at the TERENO stations Fendt and Rottenbuch, is poorer than this parametrization suggests. However, at least the order of the \(k_\mathrm{f}\) values can roughly be explained by the heterogeneity index, e.g. the Rottenbuch site has the poorest closure of all sites and the largest value of \(z_{0}^\mathrm{eff}/L^\mathrm{eff}\). To summarize, the parametrized imbalance using the approach of Panin and Bernhofer (2008) is lower than the measured imbalance, especially for the sites in the pre-alpine region.

Fig. 5

Energy balance correction factor \(k_\mathrm{f}\) for the Candle Lake area and the EC masts at Graswang, Rottenbuch and Fendt versus heterogeneity index \(z_{0}^\mathrm{eff}/L^\mathrm{eff}\); median values and interquartile ranges are displayed; data shown in Panin and Bernhofer (2008) are indicated as grey dots and a fit through these data is displayed as a grey dotted line; please note that the abscissa is logarithmic

5 Discussion

5.1 Short Note on the Aircraft-Based Estimate of the Energy Balance Closure

Owing to its motion in space, an aircraft is able to detect mesoscale structures that EC towers cannot capture (Mauder et al. 2007b). Consequently, we assumed that the energy balance residual measured with EC towers can be estimated with the aircraft by determining the flux contribution of eddies of scales \(>\)2 km (Sect. 3.2). However, several conditions need to be fulfilled in order to apply this method.

First, the flight track should be sufficiently long in order to capture the complete longwave fraction of the flux, i.e. it should be \({>}50 z_{i}\) (Lothon et al. 2007). This condition is fulfilled, because the largest value of \(z_{i}\) during the BOREAS flights (2,250 m, Table 4) would require a track length of 112.5 km, which is shorter than the actual flight track of 115 km. This long flight track also ensures that the random error of the flux measurement is below 10 % (Lenschow and Stankov 1986). Moreover, the aircraft-based fluxes are a good estimate of the surface fluxes, because all flights were conducted within the surface layer whose height is approximately 0.1\(z_{i}\) (Stull 1988). Finally, the selection of the cut-off wavelength is a crucial step, because it is assumed that atmospheric motions with wavelengths longer than the cut-off wavelength are not detected with an EC tower. Here, a cut-off wavelength of 2 km was used, as suggested by Williams et al. (1996) and Strunin and Hiyama (2005). Mauder et al. (2007b) also used this threshold and showed, for the Twin Otter aircraft flights, that the flux contribution from scales \(>\)2 km is approximately equal to the missing energy measured at EC stations in the Candle Lake area. Moreover, eddies of scale \(>\)2 km usually exceed the ABL height, and there is a spectral gap in the wavelet cospectra at a wavelength of 2 km (Mauder et al. 2007b) that separates the small-scale region from the mesoscale region (Vickers and Mahrt 2003).

The mean yearly values of \(R\) in the Candle Lake area, based on long-term EC measurements, vary between 0.76 and 0.95 with a mean \(R\) of 0.85 for all sites and years (Barr et al. 2012), which is close to the estimated value of \(R\) from the Twin Otter runs (Sect. 4.1). Consequently, the chosen cut-off wavelength delivers reliable results and it is assumed that the energy balance closure of near-surface EC masts can be assessed with airborne flux measurements in the surface layer.

5.2 Energy Balance Closure at the TERENO Sites

This study is the first to present the energy balance closure at all three sites of the TERENO Bavarian Alps/pre-Alps Observatory. With respect to the Fendt and Graswang sites, Mauder et al. (2013) already reported estimates of systematic errors, which were based on monthly-averaged energy balance ratios of 0.81 and 0.72, respectively. Thus, the energy balance at the Fendt site is usually closer to closure than at the Graswang site, although the data used in this study only show a minor difference (Table 3). The observed non-closure is in accordance with similar studies conducted over short vegetation (Wilson et al. 2002; Hendricks-Franssen et al. 2010; Stoy et al. 2013). The \(R\) value from 32 grassland sites from the FLUXNET database is \(0.86 \pm 0.20\), and for the crop sites, it is \(0.78 \pm 0.16\) (Stoy et al. 2013). Above agricultural land near Ottawa, Canada, a mean \(R\) of 0.77 was found (Mauder et al. 2010). During the LITFASS-2003 (Lindenberg Inhomogeneous Terrain – Fluxes between Atmosphere and Surface: a long term Study) experiment, a mean \(R\) of 0.75 was measured at a grassland site during daytime (Mauder et al. 2006; Foken et al. 2010). With respect to the patchiness of the land use, the last two experiments were conducted in landscapes that are similar to the TERENO area, but the topography around the TERENO sites is more complex. Nevertheless, the values of \(R\) are comparable, which suggests that surface heterogeneities due to fragmented land use might play a key role for the energy balance closure problem (Sect. 5.4).

Sites with a closure as poor as at Rottenbuch \((R =0.60)\) can rarely be found in the literature. We can exclude serious measurement errors because the measurement set-up as well as the post-processing is identical to that at the other two sites except for the enclosed-path gas analyzer used at Rottenbuch (Table 1). However, in the FLUXNET database, there are several sites with a comparable or even poorer closure: Stoy et al. (2013) report values of \(R\) down to 0.26, even after excluding severe outliers because of quality control concerns. In addition, Hendricks-Franssen et al. (2010) mention two EC stations above short vegetation with a comparably poor closure. The site of ‘East Saltoun’ is located east of Edinburgh, United Kingdom, approximately 10 km from the coastline, and has a mean \(R\) of 0.49. The site ‘Grignon’ lies west of Paris, France, within a densely populated area and has a mean \(R\) of 0.68. These two EC stations might be influenced by local circulations, due to sea-breeze effects or due to an urban heat-island circulation. The measurements at Rottenbuch might be influenced by similar circulations because of its proximity to a village and to the steep Ammer valley (Fig. 2d).

5.3 Parametrization of Huang et al. (2008)

No correlation was found between \(-\langle I \rangle \) and \(u_{*}\), \(w_{*}\) or \(z_{i}\). A first possible explanation is the difficulty of determining \(z_{i}\), which also affects the values of \(w_{*}\) (Eq. 10). The \(z_{i}\) values at the Candle Lake site vary considerably (Table 4), which causes most of the variability in \(u_{*}/w_{*}\). At the TERENO sites, the mixed-layer height \(z_{i}\) was determined with ceilometers by detecting the sharp decrease of particle concentration at the top of the ABL (Emeis et al. 2008). This requires (a) a significant particle source at the surface, and (b) good vertical mixing up to the ABL top. Aerosol layers in the free atmosphere might cause unrealistically high estimates of \(z_{i}\), whereas weak turbulent mixing results in an underestimation of \(z_{i}\). Nevertheless, single daily cycles measured at the TERENO sites show the well-known pattern with increasing values of \(z_{i}\) during the morning hours and the highest values in the late afternoon. Thus, at least during daytime convective conditions, ceilometers produce reliable results.

Furthermore, Huang et al. (2008) derived the functions \(f_{1}\) and \(f_{2}\) in a different way than used here. They first derived \(f_{1}\) by restricting their LES analysis to the heights between 0.3 and 0.5\(z_{i}\), where they found the maximum imbalance in the model, and afterwards, they determined \(f_{2}\) for zero background wind. In contrast, the measurements at Candle Lake and at the TERENO sites were conducted in the surface layer, as with most experiments focusing on air-land interactions. Furthermore, all daytime data with sufficient data quality (Sect. 3.2) were used, including wind speeds up to 10 m s\(^{-1}\).

In addition, it should be mentioned that this parametrization is based on LES above homogeneous terrain. However, according to Inagaki et al. (2006) and Steinfeld et al. (2007), turbulent organized structures are not the only transport mechanism that the temporal EC method is not able to capture. Thermally-driven mesoscale circulations also play a significant role, especially in the lower half of the ABL above heterogeneous terrain (Strunin and Hiyama 2005). Maronga and Raasch (2013) confirm the formation of secondary circulations due to surface differences in the upstream area. According to Hiyama et al. (2007), low values of \(u_{*}/w_{*}\) serve as a good indicator for the presence of such heterogeneity-induced local circulations, but we could not confirm that (Fig. 3).

The usual dimensionless scaling parameter for universal functions in the surface layer is \(z/L\), with \(L\) being the Obukhov length (Stull 1988), which differs from \(u_{*}/w_{*}\) (Eqs. 8, 9) mainly by the height scaling. For \(z/L\), the measurement height is an important parameter, but the term \(u_{*}/w_{*}\) is height-invariant due to the use of \(z_{i}\) instead of \(z\). Attempts to explain the observed energy imbalance by \(z/L\) have shown that \(R\) is lowest for very unstable conditions (Stoy et al. 2013). We did not find a comparable dependence on \(u_{*}/w_{*}\) (Fig. 3), so that the use of \(w_{*}\) for parametrizing the energy imbalance in the surface layer does not appear to be a promising approach.
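The relation between the two scaling parameters can be made explicit. As a sketch, assume the common textbook definitions \(w_{*}^{3} = (g/\overline{\theta })\,(\overline{w'\theta '})_{0}\,z_{i}\) and \(L = -u_{*}^{3}\,\overline{\theta }/[\kappa \,g\,(\overline{w'\theta '})_{0}]\), where \(\kappa \) is the von Kármán constant, \(g\) the acceleration due to gravity, \(\overline{\theta }\) a mean potential temperature and \((\overline{w'\theta '})_{0}\) the kinematic surface heat flux. These symbols are introduced here for illustration and may differ in detail (e.g. use of the buoyancy flux) from Eqs. 8 and 9. Eliminating the surface heat flux then gives

$$\begin{aligned} \frac{z}{L}=-\kappa \,\frac{z}{z_{i}}\left( \frac{u_{*}}{w_{*}}\right) ^{-3}. \end{aligned}$$

Under these assumptions, \(z/L\) at a fixed value of \(u_{*}/w_{*}\) still varies with the measurement height through \(z/z_{i}\), which is the height dependence that \(u_{*}/w_{*}\) alone does not carry.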

Finally, a shortcoming of the configuration of current LES models should be mentioned. If the boundary conditions at the ground surface are prescribed by the heat fluxes and the model correctly captures the physics, the energy balance near the surface has to be closed. In LES models, the strength of turbulent organized structures, as well as the simulated energy imbalance, are negligible within the entire surface layer (\(z/z_{i} < 0.1\), Kanda et al. 2004; Inagaki et al. 2006; Steinfeld et al. 2007; Huang et al. 2008; Maronga and Raasch 2013). The latter contradicts almost all EC tower measurements so that we conclude that either (a) mesoscale motions are not the major reason for the unclosed energy balance of near-surface measurements, or (b) the grid resolution of LES models is too coarse near the surface. The simulations of Huang et al. (2008) were obviously not able to resolve all turbulent motions due to the limited grid resolution, so that the major part of the turbulent energy had to be parametrized. Furthermore, many subgrid parametrization schemes work better away from wall regions, and Monin-Obukhov similarity theory, which was used for the first grid layer of the LES grid, is based on horizontal homogeneity. In other words, heterogeneity-induced secondary circulations might also have a high impact on transport processes near the surface and might even make contact with the ground (Foken 2008), but this can neither be verified nor falsified with current LES models.

5.4 Parametrization of Panin and Bernhofer (2008)

In contrast to the functions suggested by Huang et al. (2008), which consider boundary-layer dynamics, the empirical approach of Panin and Bernhofer (2008) focuses on the role of roughness heterogeneities for the non-closure of near-surface EC measurements. Following a similar hypothesis, Stoy et al. (2013) recently demonstrated for 180 FLUXNET sites across the world that the energy imbalance can be related to the variability of the vegetation within an area of 20 km by 20 km surrounding the measurement site.

The parametrization of Panin and Bernhofer (2008) does not explain the entire missing energy (Fig. 5). This finding suggests that heterogeneity-induced mesoscale motions are not the only reason for the unclosed energy balance, and other effects that are not included in the parametrization, such as instrumental issues, also play a role. Moreover, there might be factors other than \(z_{0}^\mathrm{eff}/L^\mathrm{eff}\), such as topography, differences in surface temperature or surface moisture, that induce mesoscale circulations (see below).

For the Candle Lake area, the observed imbalance is quite close to that derived from the data used by Panin and Bernhofer (2008), but for the TERENO sites, especially for Rottenbuch and Fendt, the imbalance is larger than expected (Fig. 5). One reason for the poor closure at the TERENO sites might be the use of non-orthogonal sonic anemometers that are suspected to underestimate sensible heat fluxes (Kochendorfer et al. 2012; Frank et al. 2013). However, this problem is still under debate (Mauder 2013) and it also cannot explain the different magnitudes of underestimation at the different sites. Probably the observed deviations from the data of Panin and Bernhofer (2008) can be better explained by the fact that not all surface heterogeneities are represented in a \(z_{0}\) map. It should be noted that the experiments on which they based their parametrization were mainly conducted over flat terrain and in remote areas. There, surface heterogeneities are mainly caused by differences in vegetation and, therefore, calculating \(L^\mathrm{eff}\) based on a \(z_{0}\) map is sufficient to determine the heterogeneity of the landscape. The topography of the Candle Lake area is also flat and the lake-forest boundaries, which are clearly visible in the \(z_{0}\) maps (Fig. 4a), are the dominant heterogeneities. The data from Graswang are also close to the data from Panin and Bernhofer (2008), although the topography is very complex. However, the mountains can be recognised in the \(z_{0}\) map, because the mountain slopes are usually covered with forest and mountain tops are rocky areas. Thus, at Graswang, \(L^\mathrm{eff}\) determined on the basis of \(z_{0}\) is a good estimate for the dominant horizontal scale. Regarding the \(z_{0}\) maps of the Rottenbuch and Fendt sites (Fig. 4c, d), not all surface heterogeneities are clearly visible. Forests and villages, for example, have a similar roughness length, but usually a different surface temperature. However, a re-analysis of the data with an unrealistically high surface roughness for villages (\(z_{0} = 30\) m) did not improve the results (data not shown).

If we plot the values of \(R\) from the Fendt (Fig. 6a) and Rottenbuch (Fig. 6b) sites for different wind directions \((\varphi )\), it can be seen that the surrounding landscape affects the EC measurements. For southerly winds (\(157.5^{\circ }<\varphi < 247.5^{\circ }\) for Fendt, \(112.5^{\circ } < \varphi < 202.5^{\circ }\) for Rottenbuch), the energy balance closure is significantly lower than for all other directions (one-sided Wilcoxon rank sum test, \(p < 0.05\)). Due to the foehn effect of the Alpine mountains, warm and dry air is brought from southern directions, but the EC method does not account for horizontal advection. The secondary minimum at the Fendt site for the northern wind sector \((-22.5^{\circ }< \varphi < 22.5^{\circ })\) might be caused by the advection of moist air, because there is a small lake surrounded by swampy terrain approximately 3 km north of the site (visible in Fig. 4d as a small bluish spot). As explained in Sect. 5.2, small-scale effects might also play a role at the Rottenbuch site, which exhibits an extraordinarily poor closure.

Fig. 6

30-min \(R\) values for different wind directions at Fendt (a) and Rottenbuch (b) sites; the data were grouped into wind sectors of \(45^{\circ }\) size and the respective sample sizes are indicated as grey italic numbers. The black dots indicate the sample median and the error bars represent the interquartile range
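The grouping into \(45^{\circ }\) wind sectors can be sketched as below. The function names are hypothetical, and the sketch assumes that the sectors are centred on \(0^{\circ }, 45^{\circ }, \ldots \), which is consistent with the northern sector \((-22.5^{\circ }, 22.5^{\circ })\) mentioned in the text; it is not the authors' code.

```python
import statistics
from collections import defaultdict


def sector_index(wind_dir_deg, width=45.0):
    """Return the index of the wind sector that contains wind_dir_deg.
    Sector 0 is centred on north (-22.5..22.5 deg), sector 1 on 45 deg, etc."""
    return int(((wind_dir_deg + width / 2.0) % 360.0) // width)


def median_r_by_sector(wind_dirs, r_values, width=45.0):
    """Group 30-min R values by wind sector and return a dict
    {sector index: (sample size, median R)}."""
    groups = defaultdict(list)
    for direction, r in zip(wind_dirs, r_values):
        groups[sector_index(direction, width)].append(r)
    return {k: (len(v), statistics.median(v)) for k, v in sorted(groups.items())}
```

With this indexing, the southerly range quoted for Fendt corresponds to sectors 4 and 5, and the range for Rottenbuch to sectors 3 and 4. A one-sided rank sum comparison of two sector groups could then be run with, for example, `scipy.stats.mannwhitneyu(..., alternative="less")`, which is equivalent to the Wilcoxon rank sum test.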

Besides our main criticism that Panin and Bernhofer (2008) only consider roughness heterogeneities, we also argue that the suggested heterogeneity index \(z_{0}^\mathrm{eff}/L^\mathrm{eff}\) is not well chosen. There is no evidence that the magnitude of \(z_{0}^\mathrm{eff}\) has any influence on the energy balance closure because large imbalances were found above various land-use types (Hendricks-Franssen et al. 2010; Stoy et al. 2013). For constant \(z_{0}^\mathrm{eff}\), the heterogeneity index increases with decreasing \(L^\mathrm{eff}\). Thus, the shorter \(L^\mathrm{eff}\), the larger the energy balance residual should be. However, this contradicts theoretical considerations (Dalu et al. 1996), scaling approaches (Mahrt 2000) and LES results (Raasch and Harbusch 2001; Patton et al. 2005) on heterogeneity-induced mesoscale transport, which found that there is an optimal surface length scale with respect to the strength of mesoscale motions. In particular, the heterogeneities should be at least as large as \(z_{i}\), so that there should be a minimum threshold for \(L^\mathrm{eff}\), below which no considerable imbalance occurs. However, the estimated values of \(L^\mathrm{eff}\) for the TERENO sites and the Candle Lake area are larger than the corresponding values of \(z_{i}\), so that we could not identify a decrease of the imbalance for small horizontal length scales.

Finally, a short methodological note should be made: when calculating the Fourier spectrum along a transect on a \(z_{0}\) map, multiple local maxima can occur in the spectrum, implying that the landscape is not dominated by one single heterogeneity scale, but is a superposition of multiple scales. Here, we followed Panin and Bernhofer (2008) and only considered the global maximum, although we might have neglected additional heterogeneity scales that also have an impact on the exchange process.
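Selecting the global spectral maximum along a transect can be sketched as follows. This is only an illustration under assumptions: the function name is hypothetical, the transect is taken to be uniformly sampled, and details of the procedure of Panin and Bernhofer (2008), such as any detrending, transformation of \(z_{0}\) or averaging over several transects, are not reproduced.

```python
import cmath
import math


def dominant_wavelength(z0, dx):
    """Return the wavelength (same unit as dx) of the largest peak in the
    discrete Fourier power spectrum of a uniformly sampled z0 transect.
    The transect mean is removed, so wavenumber zero is ignored."""
    n = len(z0)
    if n < 2:
        raise ValueError("transect needs at least two samples")
    mean = sum(z0) / n
    anomalies = [v - mean for v in z0]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # positive wavenumbers up to Nyquist
        coeff = sum(a * cmath.exp(-2j * math.pi * k * i / n)
                    for i, a in enumerate(anomalies))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n * dx / best_k
```

The direct sum scales with the square of the transect length; for long transects, a fast Fourier transform such as `numpy.fft.rfft` would be the usual choice. Returning the ranked list of local maxima instead of only the largest one would expose the additional heterogeneity scales mentioned above.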

All in all, the parametrization approach of Panin and Bernhofer (2008) appears to work only if \(L^\mathrm{eff}\) captures, through \(z_{0}\), the dominant landscape-scale heterogeneities. In our study, this is only the case for the Candle Lake area and, to a minor extent, for the Graswang site. Consequently, future parametrizations should not neglect the influences of surface temperature, surface moisture and complex topography, which might also be responsible for the generation of secondary circulations. A thorough assessment of the importance of all surface heterogeneities for the energy balance closure problem is beyond the scope of the present paper.

5.5 Partitioning of the Missing Energy Between \(Q_\mathrm{H}\) and \(Q_\mathrm{E}\)

A parametrization suitable for correcting the measured turbulent heat fluxes also has to consider the partitioning of the missing energy between \(Q_\mathrm{H}\) and \(Q_\mathrm{E}\). Foken et al. (2012) recommend correcting both energy fluxes using the measured Bowen ratio. This implies the assumption of scalar similarity, which is justified for the high-frequency range of turbulence (Pearson et al. 1998), but cannot be taken for granted at longer wavelengths (Ruppert et al. 2006). Therefore, the Bowen ratios of the small-scale \((Bo_\mathrm{ss})\) and the mesoscale range \((Bo_\mathrm{ms})\) were determined from the flux contributions of the Twin Otter aircraft flights (Fig. 7). In accordance with Lamaud and Irvine (2006), the mean Bowen ratio itself influences the partitioning of \(T\) and \(q\) in the mesoscale range. For \(Bo_\mathrm{ss} \approx 1\), the assumption of scalar similarity could be confirmed, but if \(Bo_\mathrm{ss}\) differs substantially from unity, \(Bo_\mathrm{ms}\) is mostly smaller than the respective \(Bo_\mathrm{ss}\). This implies that assigning the missing energy to \(Q_\mathrm{H}\) and \(Q_\mathrm{E}\) according to the measured Bowen ratio would result in an underestimation of \(Q_\mathrm{E}\).
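The effect of the assumed Bowen ratio on the correction can be illustrated with a short sketch. The function name and the numbers are hypothetical; the sketch simply splits a residual \(\Delta \) so that the added fluxes have a prescribed ratio, with the measured Bowen ratio as the default (the Bowen-ratio-preserving correction).

```python
def partition_residual(q_h, q_e, residual, bowen=None):
    """Distribute the energy balance residual between sensible (q_h) and
    latent (q_e) heat flux so that the added fluxes have the ratio `bowen`.
    If `bowen` is None, the measured Bowen ratio q_h / q_e is used, which
    leaves the Bowen ratio of the corrected fluxes unchanged.
    All fluxes in W m-2; `bowen` must not equal -1."""
    if bowen is None:
        bowen = q_h / q_e
    add_h = residual * bowen / (1.0 + bowen)  # share assigned to q_h
    add_e = residual / (1.0 + bowen)          # share assigned to q_e
    return q_h + add_h, q_e + add_e


# Hypothetical fluxes: measured Bowen ratio 0.5, residual 60 W m-2.
print(partition_residual(100.0, 200.0, 60.0))              # (120.0, 240.0)
# Same residual, but with an assumed smaller mesoscale Bowen ratio of 0.25.
print(partition_residual(100.0, 200.0, 60.0, bowen=0.25))  # (112.0, 248.0)
```

In this example, using the measured Bowen ratio assigns 40 W m\(^{-2}\) to \(Q_\mathrm{E}\), whereas the smaller mesoscale Bowen ratio assigns 48 W m\(^{-2}\), which illustrates the underestimation of \(Q_\mathrm{E}\) described above.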

Fig. 7

Bowen ratio of the small-scale turbulence range (\(Bo_\mathrm{ss}\), eddy scale \(<\)2 km), that is supposed to be measurable by standard EC techniques, versus the respective Bowen ratio of the mesoscale range (\(Bo_\mathrm{ms}\), eddy scale \(>\)2 km) from the Twin Otter aircraft flights during BOREAS and BERMS. For comparison, the dashed line indicates the assumption of scalar similarity for closing the energy balance

This finding is in accordance with Barr et al. (2006), who found a larger flux deficit for \(Q_\mathrm{E}\) in the Candle Lake area. This can be explained by the fact that the dominating surface heterogeneities are the lakes that are distributed within the boreal forest. These lakes can be regarded as small, wet patches within a relatively homogeneous ‘matrix’ and, following Moene and Schüttemeyer (2008), these wet patches of mesoscale size mainly affect the mesoscale fluxes of \(Q_\mathrm{E}\). Consequently, the finding that \(Q_\mathrm{E}\) is more sensitive to an underestimation seems to be a site-specific effect and cannot be transferred to other areas. Studies above forests or agricultural sites, where differences in surface temperature are larger, found that the residual should be mainly added to \(Q_\mathrm{H}\) (Stoy et al. 2006; Ingwersen et al. 2011). In summary, we hypothesize that the partitioning of the missing energy between \(Q_\mathrm{H}\) and \(Q_\mathrm{E}\) depends on the surface properties of the surrounding landscape: if surface moisture differences are more important than surface temperature differences, the EC method underestimates \(Q_\mathrm{E}\) to a greater extent than \(Q_\mathrm{H}\).

6 Conclusions

We investigated the energy balance closure of tower-based measurements from the Bavarian Alps/pre-Alps observatory of the German TERENO network. In addition, we estimated the closure from aircraft data obtained in Canada: the flux contribution of atmospheric motions with wavelengths longer than 2 km served as an estimate of the missing energy of near-surface tower measurements. The energy balance ratios of the aircraft flights (0.86–0.91) are larger than at the TERENO sites (0.60–0.77). The main focus was on testing the two energy balance closure parametrization schemes of Huang et al. (2008) and of Panin and Bernhofer (2008). Neither approach is able to predict the closure quantitatively, but the latter leads at least to qualitatively correct results.

The approach of Huang et al. (2008) was not suitable for our measurements in the surface layer and the proposed dependencies of the imbalance on \(u_{*}/w_{*}\) and \(z/z_{i}\) could not be confirmed. The main problem of this approach is that it is only based on LES studies, which have insufficient grid resolution near the surface, so that the relevant transport processes are not resolved. Additionally, this parametrization only considers the homogeneous ABL and neglects the influence of an underlying heterogeneous surface. Accounting for such heterogeneity is the main asset of the approach of Panin and Bernhofer (2008), which focuses on surface roughness heterogeneities. The non-closure at the Candle Lake site is in good agreement with the data analyzed by Panin and Bernhofer (2008), but the observed non-closure at the TERENO sites in the pre-alpine region is worse than expected using this approach. This might be due to neglecting additional surface heterogeneities, e.g. complex topography or differences in surface temperature and moisture. We are not able to state which type of surface heterogeneity is the most important for parametrizing the energy balance closure, but our wind-direction dependent analysis indicates that there is indeed an influence of the upwind surface characteristics on the landscape scale. In particular, the advection of warm air through foehn winds from the south strongly contributes to the non-closure of the energy balance.

In addition, we have analyzed the aircraft data from the Candle Lake area with regard to the partitioning of the mesoscale flux between \(Q_\mathrm{H}\) and \(Q_\mathrm{E}\). For Bowen ratios around unity, we could confirm the recommendation of Foken et al. (2012) to correct the fluxes with the Bowen ratio. But for Bowen ratios different from unity, the mesoscale structures mostly transport more latent heat than sensible heat.

Therefore, we suggest that an analysis of tower and aircraft data for the same area might give more insights into the role of large-scale structures on point measurements. With respect to the tower measurements, special attention should be paid to longer wavelengths, i.e. time scales \(>\)30 min. Such investigations should go hand in hand with LES models that have a fine grid resolution close to the surface. Future parametrization approaches should probably also include the effects of topography and surface temperature besides surface roughness. Appropriate parameters for characterizing the relevant meteorological conditions that control the energy balance closure remain to be found.