1 Introduction

Processes such as frost, fog formation, and ground freezing as well as thermal comfort indices are directly affected by nocturnal near-surface air temperature, itself strongly reflecting the magnitude of the surface-atmosphere interaction. As a consequence, local surface features, such as terrain, vegetation, land-use, and proximity to obstacles may exert a significant control by enhancing the horizontal variability of near-surface air temperature, complicating its forecast (Gustavsson et al. 1998; Acevedo and Fitzjarrald 2003; Mahrt 2006; Bodine et al. 2009; Acevedo et al. 2013; Medeiros and Fitzjarrald 2014, 2015).

The degree of nocturnal horizontal temperature variability is itself highly dependent on the stable boundary-layer (SBL) flow. On nights with a sufficient horizontal boundary-layer pressure gradient, alone or in concert with reduced radiative cooling associated with cloudiness (the weakly stable case), air near the surface is connected through turbulence to the upper SBL. For such conditions, the horizontal variability of near-surface temperature is largely reduced over areas of linear dimension from hundreds of metres (Bodine et al. 2009; Acevedo et al. 2013) to tens of kilometres (Acevedo and Fitzjarrald 2001). Conversely, on clear-sky nights with reduced large-scale forcing, the surface often decouples from the upper SBL. These nights are classified as very stable and, for such conditions, cold-air pooling is initiated, so that nocturnal air temperature varies significantly over small differences in altitude. Bodine et al. (2009) found a 5 K temperature difference over 300 m of horizontal separation with an altitude difference of 25 m. Acevedo et al. (2013) found that such a difference reached 10 K over similar horizontal and vertical distances. In the latter case, it is likely that the temperature difference was increased by the fact that the lower station was obstructed, further limiting local mixing and subsequent turbulent interaction with other regions of the SBL. Thus, on calm nights, although much of the nocturnal temperature variability may be attributed to topography, other local features such as the proximity to obstacles must also be considered. On such nights, the lower atmosphere over higher locations usually remains turbulent throughout the night, therefore becoming substantially warmer relative to the lower, decoupled locations (Acevedo and Fitzjarrald 2003; Medeiros and Fitzjarrald 2014, 2015).
According to an objective classification of the nocturnal boundary layer based on the heat-flux dependency on stability (Mahrt 1999; Acevedo and Fitzjarrald 2003), weakly stable conditions occur when the absolute flux increases with stability because of the enhanced thermal gradient, while in the very stable case the enhanced stability dampens turbulence such that the heat-flux magnitude decreases with stability.

Forecasting nocturnal temperature is, therefore, a difficult task. The large horizontal variability of temperature over relatively small areas implies that, unless windy conditions prevail, no single temperature is representative over the area of a common grid cell of a mesoscale numerical model, which is usually of a few kilometres square. Furthermore, each model is sensitive to the boundary-layer turbulence parametrization scheme (Cuxart et al. 2006; Steeneveld et al. 2006; Svensson et al. 2011).

The parametrization schemes usually relate turbulence to atmospheric stability through a stability function, which is obtained from theoretical considerations and observations (Louis 1979; Delage 1997). One particularly important issue regarding stability functions refers to the turbulence they prescribe for strong stability. Basic SBL theory indicates that there is a threshold (related to the critical Richardson number) above which turbulence is effectively suppressed. However, when such a characteristic is implemented in a numerical weather model, it often leads to a process known as “runaway cooling” (Louis 1979; Steeneveld et al. 2006; Holtslag et al. 2013), for which the surface temperature decreases unrealistically due to radiative loss not being counteracted by the turbulent transport of warmer air from above. In fact, the stability function proposed by Louis (1979) retains some turbulence in very stable conditions to avoid such a problem. The same idea of retaining finite mixing even for intense stratification has also been justified as a means of accounting for localized turbulent activity within the area of a model grid cell, as is usually observed at locations with relatively higher altitudes in a region (Mahrt 1987; Delage 1997). Medeiros and Fitzjarrald (2014, 2015) found that the model use of the high Richardson number threshold could be understood observationally to be a result of spatially averaging surface temperatures in regions of moderately complex topography.
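The contrast between a stability function with a hard cutoff and one that retains some mixing at strong stability can be sketched as follows. Both functional forms below are purely illustrative assumptions chosen for their shape; neither is the formulation of Louis (1979) or of any WRF scheme, and the critical value of 0.25 and the slope of 10 are likewise assumed.

```python
RI_CRITICAL = 0.25  # classical critical Richardson number (assumed value)


def sharp_cutoff(ri, ri_c=RI_CRITICAL):
    """Illustrative stability function: mixing is fully suppressed for Ri >= ri_c."""
    if ri <= 0.0:
        return 1.0
    if ri >= ri_c:
        return 0.0
    return (1.0 - ri / ri_c) ** 2


def long_tail(ri, slope=10.0):
    """Illustrative stability function: mixing decays with Ri but never reaches zero."""
    if ri <= 0.0:
        return 1.0
    return 1.0 / (1.0 + slope * ri)


# Compare the two shapes from neutral to very stable conditions.
for ri in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"Ri={ri:4.2f}  sharp={sharp_cutoff(ri):.3f}  long_tail={long_tail(ri):.3f}")
```

With the sharp form, no turbulent heat transport remains to offset radiative loss once the threshold is exceeded, which is the situation that can lead to runaway cooling; the long-tail form always leaves a small residual mixing.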

Cuxart et al. (2006) compared single-column models using different stability functions and concluded that schemes used by each operational weather forecasting model tended to overestimate mixing in the SBL. Svensson et al. (2011) made a similar comparison, finding that all compared schemes provided excessive turbulence at night, except for those considered in the Weather Research and Forecasting (WRF) model, viz., the Yonsei University (YSU) and Mellor–Yamada–Janjic (MYJ) schemes.

In the WRF model (Skamarock et al. 2008), the planetary boundary-layer (PBL) parametrization schemes have been subject to different adjustments, many of which focus on the level of nocturnal mixing for very stable conditions. Hu et al. (2010) found that the YSU PBL scheme provided higher nighttime temperatures based on an update that increased nocturnal mixing in the referred scheme (implemented in version 3.0), while the MYJ parametrization scheme produced lower temperatures when compared to observations. Jiménez et al. (2012) found that wind-speed simulations improved with the use of different stability functions and of a lower friction velocity minimum. The proposed changes have been implemented in subsequent versions of the WRF model (from 3.1.1). An additional update to the YSU PBL parametrization scheme that reduced turbulence (implemented in version 3.4.1) has been tested by Hu et al. (2013) and found to reduce the simulated nocturnal temperatures, which then approached values simulated using the MYJ parametrization scheme. Kleczek et al. (2014) found that the WRF model tends to underestimate nocturnal temperatures, regardless of the PBL scheme used.

In the present study, it is hypothesized that, in addition to the difficulty in properly quantifying turbulence in the stable environment, a large fraction of the nocturnal temperature forecast errors arises from the naturally occurring large horizontal variability of meteorological variables during the period. The WRF model nocturnal temperature simulations are compared to observations over a network of 26 stations in southern Brazil, for 31 different nights. The results using four PBL parametrization schemes, Bougeault–Lacarrere (BouLac), Quasi-Normal Scale Elimination (QNSE), YSU and MYJ, are compared. The MYJ and YSU schemes have been chosen because they are popular choices of local and non-local schemes in similar studies, having been chosen by Svensson et al. (2011) to represent the WRF model in their extensive comparison of numerical models and respective PBL schemes. The BouLac scheme is tested because it provides higher turbulence levels than the other schemes, so that its inclusion better allows association of the quality of the temperature forecasts with turbulence-related quantities. The QNSE scheme is considered because it has been proposed specifically to address the problems that arise in the SBL (Sukoriansky et al. 2005). The influence of horizontal and vertical grid resolutions is also considered, but only for the MYJ scheme.

The main novelty is that the range of stations and nights used in the comparison allows us to investigate not only the influence of the PBL schemes on temperature forecasts, but also how the schemes perform for locations with different terrain characteristics and for nights with different large-scale forcing. To this end, we use stations located at lower and higher altitudes relative to their surroundings. Furthermore, there are nights when the boundary layer at most of the stations is in a weakly stable state, while on other nights very stable conditions prevail.

We focus on the influence of terrain characteristics and atmospheric stability on nocturnal temperature forecast errors, and the results of an analysis of the effects of cloud cover on the temperature forecast are also presented. Influences that are not considered include those from land use, soil temperature and humidity, proximity to the coast or obstacles, and the occurrence of breezes. These aspects are not addressed for three main reasons: simplicity of the analysis; lack of observational data on many of these quantities at all stations; and the fact that, as shown below, a large fraction of the errors is explained by the factors included in the analysis.

2 Model Setup

We use the ARW-WRF model, version 3.6.1, with initial conditions provided by the Global Forecast System (GFS) model analysis. Boundary condition adjustments are provided every 6 h, also by the GFS analysis; all 31 nights of July 2012 have been simulated. A 15-h long spin-up time has been applied, so that the model simulation commences at 1200 UTC (0900 Local Standard Time, LST) of the previous day, being integrated until 0900 UTC. Hourly fields are extracted from 0300 UTC to 0900 UTC, resulting in seven values per station and per night for each variable. Three two-way nested domains are used (Fig. 1), where the larger covers longitudes \(75.5^{\circ }\hbox {W}{-}31.9^{\circ }\hbox {W}\) and latitudes \(11.3^{\circ }\hbox {S}{-}48.1^{\circ }\hbox {S}\), with a 48-km resolution. The second grid covers \(66.5^{\circ }\hbox {W}{-}40.9^{\circ }\hbox {W}\) and \(41.0^{\circ }\hbox {S}{-}18.4^{\circ }\hbox {S}\), with a 12-km resolution. In four of the model configurations, only the solutions from the second grid are considered. To address specifically the improvements that may arise from the use of a finer inner grid, several simulations with a higher horizontal resolution were also done. In these cases, a third grid covers longitudes \(60.4^{\circ }\hbox {W}{-}47.0^{\circ }\hbox {W}\) and latitudes \(35.3^{\circ }\hbox {S}{-}24.1^{\circ }\hbox {S}\) with a 4-km horizontal grid.
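The timing described above (15-h spin-up, hourly output from 0300 to 0900 UTC) can be checked with a short sketch. The function name and its arguments are hypothetical and are not part of the WRF model or its configuration files; the example date is arbitrary.

```python
from datetime import datetime, timedelta


def output_times(night_date, spinup_hours=15, first_hour_utc=3, last_hour_utc=9):
    """Return the model start time and the hourly output times for one night.

    night_date is the calendar day (UTC) on which the analysed hours fall.
    """
    first = datetime(night_date.year, night_date.month, night_date.day, first_hour_utc)
    start = first - timedelta(hours=spinup_hours)  # start of the spin-up period
    n_outputs = last_hour_utc - first_hour_utc + 1
    times = [first + timedelta(hours=h) for h in range(n_outputs)]
    return start, times


start, times = output_times(datetime(2012, 7, 15))
print("model start:", start)          # 1200 UTC of the previous day
print("values per night:", len(times))
```

Counting both end points, the 0300 to 0900 UTC window yields the seven hourly values per station and night quoted in the text.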

Fig. 1 Positions of the three domains used in the simulations with respect to the South American map

The Lambert projection is used with 28 vertical terrain-following hydrostatic-pressure levels (Skamarock et al. 2008; Wang et al. 2015) in the three domains. The lowest level is located at 28 m and the thickness of the subsequent layers increases with height until the domain top, located at 100 hPa, such that there are five levels in the lowest 500 m. Simulations with higher vertical resolution were done, and in such cases, there were 56 vertical levels, with the lowest near 10 m and a total of 18 levels in the lowest 500 m.

In the simulations with 28 vertical levels, the timestep for the coarsest domain is 288 s, decreasing in proportion to the grid spacing of the inner domains, to 72 s for the second domain and to 24 s for the finest grid. When the 56 vertical levels are used, all timesteps are halved. The topographic data used in the two coarser grids are from the United States Geological Survey (USGS) and have a horizontal resolution of 2 min, adjusted to the model grid points. For the finest grid, the terrain information is obtained from a topographic dataset with a 30-s horizontal resolution. Land-use, roughness and vegetation type have also been obtained from a 2-min resolution USGS database.
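The timesteps quoted above are consistent with scaling the coarsest-domain value by the ratio of grid spacings. A minimal sketch of that arithmetic follows; the function is a hypothetical helper, not a WRF utility, and the proportional rule is inferred from the quoted numbers rather than stated as the authors' procedure.

```python
def nested_timesteps(parent_dt_s, grid_spacings_km, halve=False):
    """Scale the coarsest-domain timestep in proportion to each grid spacing.

    grid_spacings_km lists the spacings from the coarsest to the finest domain.
    If halve is True, all timesteps are divided by two (56-level runs).
    """
    parent_dx = grid_spacings_km[0]
    steps = [parent_dt_s * dx / parent_dx for dx in grid_spacings_km]
    if halve:
        steps = [s / 2.0 for s in steps]
    return steps


print(nested_timesteps(288.0, [48.0, 12.0, 4.0]))              # 28-level runs
print(nested_timesteps(288.0, [48.0, 12.0, 4.0], halve=True))  # 56-level runs
```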

For the two coarser grids, the Lin microphysics scheme is used (Lin et al. 1983), while for the finer grid the WRF-model single-moment-3-class microphysics scheme is applied (Hong et al. 2004). The Noah land-surface scheme (Tewari et al. 2004) and the Kain–Fritsch cumulus physics scheme (Kain 2004) are used; for the longwave radiation, the Rapid Radiative Transfer Model (RRTM) scheme is applied (Mlawer et al. 1997), and the Dudhia shortwave scheme is also used (Dudhia 1989).

Four PBL parametrization schemes are used for the two coarser grids: Bougeault–Lacarrere (BouLac; Bougeault and Lacarrere 1989), Quasi-Normal Scale Elimination (QNSE; Sukoriansky et al. 2005), Yonsei-University (YSU; Hong and Pan 1996; Hong et al. 2006) and Mellor–Yamada–Janjic (MYJ; Mellor and Yamada 1982; Janjic 1994, 2002). The simulations with higher horizontal and vertical grid resolutions are performed only using the MYJ scheme, chosen because this is the parametrization scheme that provides the lowest turbulence levels of all, so that it tends to enhance local effects. For this reason, it is also expected that the use of a finer grid in this scheme will have a greater influence than in the others. A summary of all simulations is presented in Table 1, while the main characteristics of the PBL parametrization schemes are described in the Appendix.

Table 1 Summary of all simulations performed

For similar values of the Obukhov stability parameter z / L (z is height, L is the Obukhov length), the MYJ scheme (regardless of the horizontal and vertical grid resolutions used) usually provides lower turbulence than the BouLac, QNSE and YSU schemes (Fig. 2). The exception is in the very stable limit (large z / L), when all schemes tend, on average, to provide similar values of friction velocity \(u_*\). In the most extreme cases of stability, the minimum values of \(u_*\) provided by the MYJ scheme are actually slightly greater than those provided by the other schemes (figure not shown), indicating that in the very stable limit the MYJ scheme retains greater turbulence levels than the others. For \(z/L>0.02\), the BouLac and YSU schemes provide similar average values of \(u_*\), a consequence of the use of the same surface-layer scheme in both parametrization schemes. In the near-neutral limit (small z / L), however, the BouLac scheme tends to provide higher turbulence levels than does the YSU scheme, while the QNSE scheme generally provides higher turbulence levels than the MYJ scheme, but lower levels than the BouLac and YSU schemes (for similar values of z / L).
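For reference, the stability parameter can be computed from the friction velocity and the surface kinematic heat flux using the standard textbook definition of the Obukhov length, \(L = -u_*^3 \theta / (\kappa g \overline{w'\theta'})\). The sketch below assumes a von Kármán constant of 0.4 and uses made-up input values; how each WRF surface-layer scheme evaluates z / L internally is not described in the text and may differ.

```python
KAPPA = 0.4  # von Karman constant (assumed value)
G = 9.8      # acceleration due to gravity, m s^-2


def obukhov_length(u_star, theta_k, kinematic_heat_flux):
    """Obukhov length (m) from friction velocity (m s^-1), a reference
    potential temperature (K) and the kinematic heat flux (K m s^-1).

    A downward (negative) heat flux gives L > 0, i.e. stable stratification.
    """
    return -(u_star ** 3) * theta_k / (KAPPA * G * kinematic_heat_flux)


def stability_parameter(z, u_star, theta_k, kinematic_heat_flux):
    """Dimensionless stability parameter z / L."""
    return z / obukhov_length(u_star, theta_k, kinematic_heat_flux)


# Hypothetical nocturnal values, for illustration only.
print(stability_parameter(z=28.0, u_star=0.2, theta_k=280.0, kinematic_heat_flux=-0.01))
```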

Fig. 2 Friction velocity \(u_*\) as a function of the stability parameter z / L. All values are obtained from the model fields. The lines represent the different PBL schemes and the horizontal and vertical grid resolutions used (see Table 1). Results are block-averaged over all stations and nights analyzed. Each point represents the average over 200 values
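One plausible reading of the block averaging used in the figures is sketched below: the pairs are sorted by the abscissa and averaged in consecutive blocks of 200 values. The sorting step and the treatment of an incomplete final block (dropped here) are assumptions, since the text specifies only the block size.

```python
def block_average(x, y, block_size=200):
    """Sort (x, y) pairs by x and average them in consecutive blocks.

    Returns a list of (mean_x, mean_y) tuples, one per complete block.
    Values left over after the last complete block are discarded (assumption).
    """
    pairs = sorted(zip(x, y))
    n_blocks = len(pairs) // block_size
    averages = []
    for b in range(n_blocks):
        chunk = pairs[b * block_size:(b + 1) * block_size]
        mean_x = sum(p[0] for p in chunk) / block_size
        mean_y = sum(p[1] for p in chunk) / block_size
        averages.append((mean_x, mean_y))
    return averages


# Synthetic data, for illustration only: y is simply twice x.
xs = list(range(399, -1, -1))
ys = [2 * v for v in xs]
print(block_average(xs, ys))
```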

3 Observations

The model fields are compared to observations collected at 26 stations from the Brazilian National Meteorology Service (Instituto Nacional de Meteorologia, INMET) during all 31 simulated nights. These are all the stations in operation during that period in the southernmost Brazilian state, Rio Grande do Sul (Fig. 3). The stations represent a range of surface conditions, and include coastal stations, with those furthest inland about 550 km from the coast. Station altitudes vary from sea level to 1230 m (Online Resource 1), with the highest altitudes being located in the north-eastern portion of the state. In these higher regions, forests are still present. The western, central and south-western regions of the state are part of the Pampa plains, whose dominant vegetation is pasture, used for cattle grazing. Crop cultivation dominates the northern half of the state. Air temperature (at 2-m height) and wind speed (at 10 m) are observed as hourly grab samples at these stations, and those from 0000 to 0600 LST are used in the present study. The period of observations is during winter, with July climatologically the coldest month of the year.

Fig. 3 Rio Grande do Sul state map, showing the locations of the stations used in the study. Grey scale shows topography (m)

The temperature and wind observations are compared to model values at the nearest model grid point, following common practice (Buckley et al. 2004; Jiménez et al. 2010; Müller 2011; Soares et al. 2012; Xie et al. 2012). This procedure introduces an error because the nearest grid point and the station are usually located at different altitudes, and in some cases the difference is large. In the present case, using the 12-km horizontal grid resolution, the lowest station with respect to its closest grid point is Santa Maria (the grid point is 93 m higher than the station), while the opposite occurs at Caçapava do Sul, where the station is located at an altitude 200 m higher than the nearest grid point. When the 4-km horizontal resolution is used, these differences tend to diminish, but not substantially (Online Resource 1).
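A nearest-grid-point match of the kind described above can be sketched as follows. The great-circle (haversine) distance is an assumption made for illustration; the text does not state which distance measure was used, and the coordinates below are hypothetical rather than actual station or grid positions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def nearest_grid_point(station, grid_points):
    """Return the (lat, lon) grid point closest to the station (lat, lon)."""
    return min(grid_points, key=lambda p: haversine_km(station[0], station[1], p[0], p[1]))


# Hypothetical coordinates, for illustration only.
station = (-30.00, -54.00)
grid = [(-30.10, -54.00), (-29.95, -54.02), (-30.50, -53.50)]
print(nearest_grid_point(station, grid))
```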

Because there is an altitude difference between the station and the nearest grid point, it is important that the model temperatures are compensated for the adiabatic variation before they are compared to the observations. This is done here by subtracting \(g \left( {H_{\textit{station}} -H_{\textit{gridpoint}}} \right) C_p^{-1} \) from the model temperature fields, where g is the acceleration due to gravity, taken as 9.8 m s\(^{-2}\), H is the altitude of the station or nearest grid point with respect to mean sea level, and \(C_p \) is the specific heat of air at constant pressure, taken as 1005 J kg\(^{-1}\) K\(^{-1}\). At the station with the maximum height difference, Caçapava do Sul, this correction reduces the model temperature values by \(2.0\,^{\circ }\hbox {C}\), while at Santa Maria, the station lowest relative to the nearest grid point, the model temperature values are increased by \(0.9\,^{\circ }\hbox {C}\). In less than 10 % of the cases, the atmosphere was near saturation, implying the need for a pseudo-adiabatic lapse rate in the correction. Different corrections, dependent on the proximity to saturation, have been tested, and they affected the results only minimally. For this reason, to retain simplicity, the dry-adiabatic correction was used in all cases.
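The dry-adiabatic height correction can be reproduced with a few lines. The constants are those given in the text; the function names are hypothetical, and only altitude differences (not absolute station altitudes) enter the calculation.

```python
G = 9.8      # acceleration due to gravity, m s^-2 (value used in the text)
CP = 1005.0  # specific heat of air at constant pressure, J kg^-1 K^-1


def adiabatic_correction(delta_h_m):
    """Temperature offset g * (H_station - H_gridpoint) / Cp, in K (or deg C)."""
    return G * delta_h_m / CP


def height_corrected(model_temp_c, h_station_m, h_gridpoint_m):
    """Model temperature adjusted to the station altitude (dry adiabatic)."""
    return model_temp_c - adiabatic_correction(h_station_m - h_gridpoint_m)


# Station 200 m above its grid point: model values are reduced by about 2.0 deg C.
print(round(adiabatic_correction(200.0), 1))
# Station 93 m below its grid point: model values are increased by about 0.9 deg C.
print(round(-adiabatic_correction(-93.0), 1))
```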

Cloud conditions have been analyzed in order to address whether there is any systematic temperature error associated with a given cloud cover. The cloud conditions were inferred for each hour and station using an algorithm based on brightness temperature \((T_{\textit{sat}} )\) from Geostationary Operational Environmental Satellite (GOES) 13 images. This was performed for 22 out of the 31 nights presently analyzed because such information was unavailable from 1 to 9 July. For each hour of analysis, an image produced within 20 min of the exact hour was used, and if no image was available in that period, no image was considered for that hour. The absence of images occurred in 16 % of the hours in the 22 nights considered. The cloud condition was classified as “clear sky” if \(T_{\textit{sat}} >\left( {T_{\textit{station}} -5} \right) ^{\circ }\hbox {C}\) where \(T_{\textit{station}}\) is the temperature observation from the surface network in \(^{\circ }\hbox {C}\). When \(-20\,^{\circ }\text {C}< T_{\textit{sat}} < ({T_{\textit{station}}-5})^{\circ }\hbox {C}\), it was classified as “low cloud”. The \(5\,^{\circ }\hbox {C}\) threshold is used because \(T_{\textit{sat}} \) has been found to overestimate \(T_{\textit{station}}\) in clear-sky conditions (Prihodko and Goward 1997), and guarantees the correct classification of the cases classified as “low-clouds”, although it is possible that some cases classified as “clear sky” actually presented very low clouds with warm cloud tops. It was classified as “mid-level clouds or shallow moist convection” when \(-30\,^{\circ }\hbox {C}< T_{\textit{sat}} < -20\,^{\circ }\hbox {C}\) and as “high-level cloud or deep moist convection” when \(T_{\textit{sat}} < -30\,^{\circ }\hbox {C}\). This discrimination criterion has been based on the mean temperature of the cloud layer as determined by the observed temperature profile. Despite the efforts to ensure the best classification, it is a simple method and other errors may occur.
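The brightness-temperature thresholds above translate into a simple decision sequence, sketched here. The text uses strict inequalities and does not say how values exactly equal to a threshold are treated; in this sketch they fall into the colder class, which is an assumption. The function name is hypothetical.

```python
def classify_cloud(t_sat_c, t_station_c, margin_c=5.0):
    """Classify cloud condition from satellite brightness temperature (deg C)
    and the station air temperature (deg C), using the thresholds in the text.
    """
    if t_sat_c > t_station_c - margin_c:
        return "clear sky"
    if t_sat_c > -20.0:
        return "low cloud"
    if t_sat_c > -30.0:
        return "mid-level clouds or shallow moist convection"
    return "high-level cloud or deep moist convection"


# Hypothetical values, for illustration only.
for t_sat in (8.0, 0.0, -25.0, -40.0):
    print(t_sat, "->", classify_cloud(t_sat, t_station_c=10.0))
```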

Fig. 4 Comparison between observed and simulated 2-m temperatures (height-corrected, as described in Sect. 3) for all nights and stations. Each panel shows the results using a different PBL scheme, as identified above the panels. All cases refer to the 12-km horizontal and lower vertical grid resolutions

4 Results

4.1 Observed and Simulated Temperatures

The direct comparison between observed and simulated temperatures indicates that PBL parametrization schemes underestimate the nocturnal temperature (Fig. 4), in agreement with Xie et al. (2012) utilizing version 3 of the WRF model, Hu et al. (2013) utilizing the MYJ scheme and an updated version of the YSU scheme implemented in version 3.4.1, and Kleczek et al. (2014), using version 3.4.1. Using earlier WRF-model versions, previous studies that found temperature overestimation include Hu et al. (2010) for the YSU scheme, version 3.0.1, García-Díez et al. (2013) in winter with the MYJ and YSU schemes, version 3.1.1, and Bosveld et al. (2014), version 3.0. The aforementioned studies used different sets of PBL schemes. In general, the different choices of parametrization schemes give similar scatter with respect to the observations, with nearly identical root-mean-square (r.m.s.) error between parametrization schemes and observations (Table 2). Even the use of higher vertical or horizontal resolution has little effect on the mean r.m.s. error. The mean temperature bias is negative for all parametrization schemes, except for the BouLac scheme, confirming the tendency of the WRF model to underestimate surface temperatures. It should be noted that the higher temperatures are obtained with parametrization schemes that give higher turbulence levels for the same stability condition. For this reason, the BouLac scheme provides a warm bias \((0.2\,^{\circ }\hbox {C})\), while amongst the simulations with low horizontal and vertical resolutions, the QNSE scheme gives the coldest bias \((-0.4\,^{\circ }\hbox {C})\), despite yielding higher average turbulent intensity than does the MYJ scheme. This improved performance of the QNSE scheme is mostly influenced by the results from weak-wind cases, see below. This general result contrasts with Xie et al. (2012) and Kleczek et al. (2014), who found that non-local PBL schemes, such as the YSU scheme, give higher temperatures than local schemes, such as the BouLac scheme. The use of a finer horizontal resolution reduces only slightly the cold bias of the MYJ scheme (to \(-0.2\,^{\circ }\hbox {C}\), in run MYJ-4), and this improvement arises mostly because the finer horizontal grid allows a given station to be represented by a closer grid point. Along this line, Müller (2011) found that enhanced horizontal resolution only improved the temperature forecast in some cases, mainly in complex terrain, while Zhang et al. (2013) stated that “... simulations at finer resolutions do not outperform those at coarser resolutions in most cases”. The finer vertical resolution does not affect the mean r.m.s. error by much, but enhances the cold bias of the corresponding MYJ simulations. This result is similar to that of Kleczek et al. (2014), who found slightly lower temperatures at night when a higher vertical resolution is used, and to that of Zhang et al. (2013), who found that the mean absolute error of nocturnal forecasts is not affected by the increase in the vertical resolution.

Table 2 Root-mean square (r.m.s.) and mean bias errors of simulated 2-m temperatures with respect to the observed temperatures, for each simulation performed
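The two error measures reported in Table 2 follow their usual definitions, sketched below for paired simulated and observed values. The sign convention (simulated minus observed, so that a negative bias means the model is too cold) matches the way the bias is discussed in the text; the sample numbers are synthetic and are not taken from the study.

```python
import math


def mean_bias(simulated, observed):
    """Mean of (simulated - observed); negative values indicate a cold bias."""
    diffs = [s - o for s, o in zip(simulated, observed)]
    return sum(diffs) / len(diffs)


def rms_error(simulated, observed):
    """Root-mean-square of (simulated - observed)."""
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic temperatures (deg C), for illustration only.
sim = [1.0, 2.0, 3.0]
obs = [2.0, 2.0, 5.0]
print(mean_bias(sim, obs), rms_error(sim, obs))
```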

The model reduces the observed spatial variability by underestimating the highest temperatures and slightly overestimating the lowest values. This tendency is confirmed when the observed spatial variability of temperatures over the network is compared with the variability of the simulated values. Regardless of the parametrization scheme, the spatial variability of temperatures in the model is always reduced with respect to what is observed across the surface stations (Fig. 5).

Fig. 5 Block-averaged simulated spatial standard deviation of 2-m temperatures, inferred from the 26 values from all stations at each hour, as a function of the same quantity inferred from the observations. Each line represents a different combination of PBL scheme and horizontal and vertical grid resolutions
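The spatial standard deviation used here is computed across stations separately for each hour. A minimal sketch follows; the population form of the standard deviation is an assumption (the text does not say whether the population or sample form was used), and the input layout and values are hypothetical.

```python
from statistics import pstdev


def spatial_std_by_hour(temps_by_hour):
    """Standard deviation across stations for each hour.

    temps_by_hour maps an hour label to the list of station temperatures
    (26 values per hour in the study) valid at that hour.
    """
    return {hour: pstdev(values) for hour, values in temps_by_hour.items()}


# Synthetic temperatures (deg C) with fewer stations, for illustration only.
example = {
    "0300 UTC": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0],
    "0400 UTC": [3.0, 3.0, 3.0],
}
print(spatial_std_by_hour(example))
```

Applying the same function to the observed and to the simulated temperatures gives the two quantities compared in Fig. 5.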

4.2 Dependence on Wind Speed

Larger deviations amongst the different parametrization schemes appear when the temperature forecast error is compared to the wind speed (Fig. 6a). In all cases, there is mean temperature overestimation with observed small wind speeds, and underestimation with larger wind speeds. With larger wind speeds, the degree of temperature underestimation is directly dependent on the intensity of turbulence that the parametrization scheme gives for similar stabilities (shown in Fig. 2). For wind speeds >2 m s\(^{-1}\), the smallest cold bias is obtained with the scheme that provides the highest turbulence levels (BouLac) and the cold bias progressively increases for the YSU, QNSE and MYJ schemes, in the same order as the turbulent intensity calculated by each scheme decreases. This fact indicates that, for wind speeds >2 m s\(^{-1}\), parametrization schemes that provide higher turbulence levels for similar values of stability also produce higher temperatures, as expected. This result, along with the fact that in these conditions (observed wind speeds >2 m s\(^{-1}\)) all parametrization schemes underestimate the observed temperatures, leads to the conclusion that the temperature forecasts provided by all schemes would tend to improve if the schemes provided higher turbulence levels in weakly stable conditions.

Fig. 6 a Block-averaged 2-m temperature simulation errors (height-corrected) as a function of the observed 10-m wind speeds, for each PBL scheme, horizontal and vertical grid resolutions used. b Same as in (a), but for block-averaged simulated 10-m wind speeds. c Same as in (a), but for block-averaged simulated friction velocities

For weak winds, on the other hand, there is not a perfect correspondence between the temperature bias and the turbulence provided by each scheme. In such conditions, the QNSE scheme performs better than all other schemes, having the smallest warm bias for observed wind speeds \({<}\)1 m s\(^{-1}\), showing that this scheme is suited for simulating very stable conditions. The use of higher vertical resolution affects mostly weak-wind conditions, as could be expected, given that this is the situation when the turbulence vertical scales are reduced. For small wind speeds, temperatures from the simulations with high vertical resolution (labeled “MYJ-h” and “MYJ-4h” in Fig. 6a) are usually lower than those from the corresponding simulations with lower vertical resolution (“MYJ” and “MYJ-4”, respectively). For the most stable cases, when there is a warm bias, this cooling effect causes the simulated temperatures to approach the observed values, but for wind speeds \({>}\)1.5 m s\(^{-1}\), the opposite occurs. For 10-m wind speeds \({>}\)5 m s\(^{-1}\), the vertical resolution has no effect on the average final temperatures.

The model tends to largely overestimate wind speed in weak-wind conditions, regardless of the chosen parametrization scheme (Fig. 6b). For larger wind speeds, the wind-speed forecast approaches the observations on average, and these are the conditions when the temperature becomes underestimated. Averaged over all stations, the friction velocity seems to approach a constant value in the low-wind limit (Fig. 6c), a possible consequence of intentionally enhancing mixing for very stable conditions in all parametrization schemes. The cooling effect of the higher vertical resolution is associated with slightly lower turbulence levels (Fig. 6c). This fact suggests that the higher vertical resolution allows a better representation of surface decoupling in very stable conditions. However, Fig. 6c shows that the decrease in \(u_*\) with higher vertical resolution is very subtle, indicating that, in most cases, doubling the vertical resolution is not enough to simulate this phenomenon properly.

A better understanding of how wind speed, turbulence and temperature interplay in the model with respect to the observations can be obtained if the results from specific stations are analyzed separately. To that end, the stations Alegrete and Caçapava do Sul are chosen, because these sites represent contrasting extremes of terrain. Alegrete station (121-m altitude) is located in relatively low terrain with respect to its surroundings, a topography that favours nocturnal cold-air accumulation around the station. The consequence is that small wind speeds are common at Alegrete, where 60 % of the nocturnal observations show wind speeds \(\le \)1 m s\(^{-1}\) (Fig. 7a). The problem in forecasting temperature at this station is enhanced because the nearest grid point is situated at a height considerably higher than the actual station (198 m in the 12-km grid, 181 m in the 4-km grid). The second station considered is Caçapava do Sul (450-m altitude), where the actual altitude of the station is considerably higher than that of the nearest grid point (250 m in the 12-km grid, 268 m in the 4-km grid). At Caçapava do Sul, 88 % of the nocturnal observations used in the present study were between 2 and \(6 \hbox { m } \hbox {s}^{-1}\) (Fig. 7a).

Fig. 7 a Frequency distributions of observed 10-m mean wind speeds in Alegrete (blue) and Caçapava do Sul (red). b Block-averaged simulated 10-m wind speeds as a function of the observed wind speeds for Alegrete (solid lines) and Caçapava do Sul (dotted), for each PBL scheme, horizontal and vertical grid resolutions used. c The same as in (b), but for the block-averaged 2-m temperature simulation errors (height-corrected). d The same as in (b), but for the block-averaged simulated friction velocities

An obvious consequence of the fairly large differences in altitude \(H_{\textit{station}} -H_{\textit{gridpoint}} \) is that the WRF model wind speeds are consistently overestimated at Alegrete and often underestimated at Caçapava do Sul, for all parametrization schemes and regardless of the horizontal and vertical grids (Fig. 7b). The largest deviations between simulated wind speeds and observations occur for the very small wind speeds that are common at Alegrete, which are not at all reproduced by the WRF model, suggesting that the model does not reproduce the decoupling between the near-surface flow and that at higher levels in the SBL.

Fig. 8 2-m temperature simulation errors (height-corrected) for each station, averaged for all nights, as a function of altitude difference between station and model nearest grid point \((H_{\textit{station}} -H_{\textit{gridpoint}} )\). Each panel refers to a different PBL scheme, as identified above the panels. All cases refer to the 12-km horizontal and lower vertical grid resolutions

The wind-speed model bias explains the general trends of the temperature bias. Temperatures are always overestimated at Alegrete (Fig. 7c) and this fact can be directly attributed to the corresponding overestimation of wind speed. The largest warm bias occurs at Alegrete with observed small wind speeds, when all parametrization schemes give a reasonably large friction velocity (Fig. 7d). It is hypothesized here that most of the warm bias at Alegrete is associated with excessive turbulence in the model results. At Caçapava do Sul, on the other hand, temperatures are, on average, underestimated for all wind speeds, except for the largest wind speeds (Fig. 7c). The largest cold bias at Caçapava do Sul occurs with wind speeds between 2 and \(4\,\hbox {m}\, \hbox {s}^{-1}\), the same range for which the wind speed is well reproduced by the model (Fig. 7b). In contrast, good temperature forecasts at that station are obtained with wind speeds between 4 and \(5\,\hbox {m}\, \hbox {s}^{-1}\) (Fig. 7c); such wind speeds tend to be underestimated by all parametrization schemes (Fig. 7b).

4.3 Dependence on Terrain

When all stations are considered, it is clear that the altitude difference \((H_{\textit{station}} -H_{\textit{gridpoint}})\) explains a large fraction of the temperature bias, although other influences not analyzed here, such as land use, may also be important. For all parametrization schemes, stations that are lower than the grid point used for the comparison clearly tend to have a positive temperature bias, while at stations higher than the nearest grid point the simulated temperatures tend to be lower than the observations (Fig. 8). This is mainly caused by the fact that wind speeds and, consequently, turbulence intensity, tend to be greater at higher altitudes. This fact alone causes the model to overestimate wind speeds at locations that are lower relative to the model grid point being compared. Furthermore, the model topography is a smoothed representation of a region, such that “the simulation tends to overestimate the wind speed over the valleys and to underestimate it at the mountain tops” (Jiménez et al. 2012). The excessive mixing at stations lower relative to the model grid point being compared warms the surface accordingly and the opposite occurs at stations that are higher relative to the closest grid point. If a station is at the bottom of a hill, for example, the local terrain favours cold-air pooling, which in turn favours surface decoupling from the upper boundary layer. When decoupling occurs, the observed nighttime temperature becomes proportional to the difference between the altitude of the station and that of its surroundings (Acevedo and Fitzjarrald 2003; Acevedo et al. 2013). Medeiros and Fitzjarrald (2014, 2015) showed that convex locations served as ‘hot spots’ that fostered surface-atmosphere exchange on strongly stable nights, which points toward seeking better horizontal resolution to define such locations in a landscape.

To compare the results for the different nights, it is important to classify them in terms of stability. To do so, based solely on the observations provided by the station network, a “spatial Richardson number” is defined as

$$\begin{aligned} Ri_{\textit{spat}} \equiv \frac{g}{\bar{{\Theta }}}\Delta z\left( {\frac{\theta _{\max } -\theta _{\min } }{V_{\textit{mean}}^2 }} \right) , \end{aligned}$$
(1)

where \(\theta _{\max } \) and \(\theta _{\min } \) are respectively the maximum and minimum potential temperatures observed across the entire network for a given night, and \(V_{\textit{mean}}\) and \(\bar{{\Theta }}\) are respectively the mean wind speed and potential temperature for the entire network during that night. The height of the wind-speed observations, 10 m, is used for \(\Delta z\).

Equation 1 provides a value of \(Ri_{\textit{spat}} \) for each night and is, therefore, used as a stability classifier for the entire network. It is based on the idea that the horizontal temperature variability is enhanced in more stable conditions, while in less stable conditions the increased turbulent mixing provides horizontal temperature homogenization. Although the idea of a spatial Richardson number may seem to contradict the conventional definition of the parameter, which is based on vertical gradients, it is used here merely as a means of contrasting the background SBL state for the entire network over the different nights (Online Resource 2). Medeiros and Fitzjarrald (2014) applied a similar concept, a regional bulk Richardson number, to a spatially smaller network of stations than that used herein.
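As an illustration of Eq. 1, the following minimal Python sketch computes \(Ri_{\textit{spat}}\) for a single night. It assumes that each station contributes one representative potential temperature and one representative 10-m wind speed for that night; the function name `spatial_richardson`, the input layout, and the value used for \(g\) are choices made here for illustration and are not taken from the original analysis.

```python
G = 9.81  # gravitational acceleration (m s^-2), assumed value


def spatial_richardson(theta, wind_speed, delta_z=10.0):
    """Spatial Richardson number (Eq. 1) for one night.

    theta      -- potential temperature at each station (K), one value per station
    wind_speed -- 10-m wind speed at each station (m s^-1), one value per station
    delta_z    -- height of the wind-speed observations (m), 10 m in the text
    """
    theta = list(theta)
    wind_speed = list(wind_speed)
    if not theta or not wind_speed:
        raise ValueError("at least one station value is required")

    theta_mean = sum(theta) / len(theta)       # network-mean potential temperature
    v_mean = sum(wind_speed) / len(wind_speed)  # network-mean wind speed
    if v_mean == 0.0:
        raise ValueError("mean wind speed is zero; Ri_spat is undefined")

    theta_range = max(theta) - min(theta)       # theta_max - theta_min over the network
    return (G / theta_mean) * delta_z * theta_range / v_mean ** 2


# Hypothetical values for three stations on a calm and on a windy night
calm_night = spatial_richardson([280.0, 285.0, 290.0], [1.0, 2.0, 3.0])
windy_night = spatial_richardson([284.0, 285.0, 286.0], [5.0, 6.0, 7.0])
```

With these made-up inputs the calm night yields the larger value, consistent with the use of \(Ri_{\textit{spat}}\) as a stability classifier: a large horizontal temperature range combined with weak winds indicates a more stable night.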

Fig. 9 Average 2-m temperature simulation errors (height-corrected) as a function of the spatial Richardson number (\(Ri_{\textit{spat}}\), see text) and the altitude difference between station and model nearest grid point \((H_{\textit{station}} -H_{\textit{gridpoint}} )\). Each panel refers to a different PBL scheme, as identified above the panels. All cases refer to the 12-km horizontal and lower vertical grid resolutions

Fig. 10 The same as in Fig. 9, but for the average 10-m mean wind speed errors

Among all nights, the largest wind speeds and corresponding lowest \(Ri_{\textit{spat}} \) occurred on 31 July 2012. In this case, the temperature simulation error is not highly correlated with \(H_{\textit{station}} -H_{\textit{gridpoint}} \), as given by the \(R^{2}\) values between these two quantities, which never exceed 0.24 for any parametrization scheme (figure not shown). The most stable night was 26 July 2012, when the temperature simulation error was much more strongly correlated with \(H_{\textit{station}} -H_{\textit{gridpoint}} \), with \(R^{2}\) values always exceeding 0.6, indicating that lower locations are more likely to experience decoupling. For example, the simulated temperature at the Alegrete station (Fig. 7) is much warmer than the observations. On the very stable night of 26 July 2012, the QNSE scheme showed the smallest correlation between temperature error and \(H_{\textit{station}} -H_{\textit{gridpoint}} \) and also the smallest r.m.s. error between forecast and observed temperatures. This is further evidence that this scheme is the most appropriate to simulate this type of night. The use of a finer vertical grid (figure not shown) improves the temperature simulation for this very stable night (r.m.s. error of \(2.7\,^{\circ }\hbox {C}\) for the MYJ scheme decreasing to \(2.5\,^{\circ }\hbox {C}\) for the MYJ-h scheme), indicating that in such an environment, with reduced spatial scales of turbulence, the use of higher vertical resolution provides a better representation of the physical process.
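The \(R^{2}\) values quoted above measure how much of the station-to-station variance in temperature error is explained by a linear dependence on \(H_{\textit{station}} -H_{\textit{gridpoint}}\). A minimal Python sketch of that calculation for one night and one scheme is given below; the function name `r_squared` and the sample numbers are hypothetical and serve only to show the arithmetic, not to reproduce the values reported in the text.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("R^2 is undefined when either variable is constant")

    return sxy ** 2 / (sxx * syy)


# Hypothetical altitude differences (m) and temperature errors (deg C),
# with error defined as simulated minus observed temperature
dh = [-50.0, 0.0, 50.0]
err_stable = [2.0, 0.0, -2.0]  # error varies linearly with altitude difference
r2_stable = r_squared(dh, err_stable)
```

For a simple linear regression with one predictor, this squared correlation equals the coefficient of determination of the fit, so either quantity can be used to compare nights.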

The overall results are summarized in Fig. 9. In general, at stations higher relative to the nearest grid point the model provides lower temperatures than the observations, a fact that is independent of parametrization scheme and grid resolution. More interestingly, this result is also generally independent of nocturnal stability. This is because at stations located higher relative to the nearest grid point, there is a general tendency towards windier and more turbulent conditions regardless of the night. In such locations, the surface tends to connect to the upper SBL both in the real world and in the model. At stations with positive \(H_{\textit{station}} -H_{\textit{gridpoint}} \) the wind speed is generally overestimated, with the exception of some extreme cases with the largest \(H_{\textit{station}} -H_{\textit{gridpoint}} \) (Fig. 10). Therefore, although the model grid point represents a lower location than the station, the model still overestimates wind speed, a result similar to that found earlier (e.g., Zhang and Zheng 2004; Svensson et al. 2011; Jiménez and Dudhia 2012; Xie et al. 2012; Hu et al. 2013). At stations lower relative to the nearest grid point (negative \(H_{\textit{station}} -H_{\textit{gridpoint}}\)), on the other hand, the average temperature simulation error is highly dependent on \(Ri_{\textit{spat}} \). In these cases a large warm bias occurs on the most stable nights, when such lower stations experience surface decoupling, which is not reproduced by any of the parametrization schemes compared here. However, the average error at these stations approaches zero or becomes slightly negative on weakly stable nights, when windier conditions prevail over the entire area and decoupling is unlikely, even at lower locations.

4.4 Dependence on Cloudiness

When the temperature error is classified in terms of the cloud condition, as described in Sect. 3, some trends become clear (Table 3). The largest r.m.s. errors occur in clear-sky conditions, while the smallest r.m.s. errors occur for high-level clouds or deep convection. For cloudy conditions, the largest errors occur for low-level clouds.

Weather situations with observed low-level clouds are the only conditions in which the WRF model produces a mean cold bias regardless of the parametrization scheme, although such a bias varies substantially among them (from \(-0.1\,^{\circ }\hbox {C}\) for BouLac to \(-0.7\,^{\circ }\hbox {C}\) for MYJ-4h). This result may indicate a difficulty of the model in properly resolving this cloud type, which in turn affects the temperature forecasts. A similar result has been found for simulations of high-latitude weather by Dong and Mace (2003), Hines et al. (2011) and Bromwich et al. (2013), who concluded that “the model has difficulty predicting the downward longwave radiation and hence the surface energy balance...” and that “inadequate cloud representation is inferred to be responsible for the summer cold bias”. Furthermore, it is also interesting to note that the mean bias for low clouds is generally colder for parametrization schemes with lower turbulence levels, exactly as has been found to occur in windy conditions (Sect. 4.2) and on the weakly stable nights (Sect. 4.3).

To determine whether the cold bias is produced by problems in the turbulence or in the low-cloud representation, the mean temperature bias is compared for cases with low clouds or clear skies and those with small or large wind speeds (Table 4). For simplicity, only the bias averaged over all parametrization schemes is compared in this analysis, because these results do not depend strongly on the parametrization scheme. For mean wind speeds \(>2\,\hbox {m}\,\hbox {s}^{-1}\), the cold bias occurs regardless of cloud condition. For windy, clear-sky conditions, the averaged bias of \(-0.8\,^{\circ }\hbox {C}\) shows that not all of the cold bias is produced by cloud misrepresentation. On the other hand, for the same range of wind speeds, the cold bias is enhanced to \(-1.3\,^{\circ }\hbox {C}\) when low clouds are present, indicating that their misrepresentation increases the problem. We conclude that the nocturnal temperature forecast improves if each scheme artificially allows for higher turbulence in weakly stable conditions. This improvement arises partially as a counteraction to poor low-cloud representation or deficiencies in other parametrization schemes, which are beyond the scope of the study.

Table 3 R.m.s. and mean bias errors of simulated 2-m temperatures with respect to the observed temperatures, for each simulation performed, classified according to cloud condition inferred from satellite image
Table 4 Mean bias errors of simulated 2-m temperatures with respect to the observed temperatures (\(^{\circ }\)C), averaged for all simulations, classified in terms of the two most common cloud conditions found and 10-m mean wind speed

5 Conclusion

We analyzed the influence of turbulence parametrization schemes on the quality of nocturnal temperature forecasts using the WRF model. The study is incomplete, since errors may arise due to other factors, such as deficiencies in the surface-layer scheme, in the land-surface scheme, or in the radiation scheme, or errors in the GFS boundary and initial conditions. A thorough analysis that included surface radiation and energy balance, as well as vertical profiles of the mean and turbulent quantities, was not possible because no observations of these variables were available.

We examined 31 nights, each with distinct nocturnal stability conditions at 26 sites over a broad area with diverse surface characteristics. Furthermore, four PBL parametrization schemes have been compared: BouLac, QNSE, YSU and MYJ. For the MYJ scheme, different horizontal and vertical grid resolutions have also been employed. Based on the results, a proper adjustment of the PBL schemes may correct the following deficiencies found in the model:

  1. The model has a cold bias in windy conditions, regardless of the PBL scheme used. Such a bias is reduced for formulations that allow higher turbulence levels for a given value of the stability parameter \(z/L\), such as the BouLac scheme. This result, therefore, indicates that enhancement of turbulence in weakly-stable conditions tends to improve the nocturnal temperature forecasts. On the other hand, such turbulence enhancement also tends to increase the model 10-m wind speed. Given that this variable is generally overestimated by the model, such an action tends to worsen the model wind-speed forecast. It has also been found that the model has a cold bias when low-level clouds are present. Given that low-level clouds tend to occur along with weakly-stable conditions, in many of these situations the cold bias may be partially attributed to the low-level cloud misrepresentation. However, a mean cold bias is also observed when large wind speeds occur with clear skies, indicating that the poor turbulence representation also plays a role.

  2. The PBL schemes compared have difficulties in reproducing the decoupling between the surface and the upper SBL. This conclusion, also obtained in Shin and Hong (2011), is inferred from the fact that on the more stable nights the model temperature errors are highly correlated with \(H_{\textit{station}} -H_{\textit{gridpoint}} \). On those nights, stations at a lower altitude than their surroundings (where \(H_{\textit{station}} -H_{\textit{gridpoint}} \) is likely to be negative) experience decoupling in the real world, but not in the model, where the maintenance of turbulence throughout the night produces a warm bias. Therefore, the inability of the model to reproduce correctly the decoupled state is responsible for a large diversity of temperature biases on the low-wind nights, with a large warm bias at stations situated at lower heights relative to the nearest grid point and a large cold bias otherwise. It is important to stress that the QNSE scheme performs appreciably better relative to the others in such very stable conditions.

The use of finer horizontal and vertical grids has little effect on the temperature forecasts. Slight improvements have been found with higher vertical resolution on the most stable nights, but in windier conditions the results are insensitive to the vertical grid resolution.

There is an inherent difficulty in forecasting nocturnal temperatures in very stable conditions because cold-air pooling and the consequent SBL surface decoupling lead to large variability over relatively small horizontal distances. Observations have shown large temperature differences over distances as small as a few hundred metres (Bodine et al. 2009; Acevedo et al. 2013). No mesoscale operational numerical weather forecast model can reproduce such variability, although recently Boutle et al. (2016) have shown that this may be improved through the use of a very fine nested grid. Therefore, an interesting strategy to predict the spatial distribution of temperatures on such nights may be to apply a post-processing algorithm that relates surface temperature to the height difference between a given location and its surroundings. An advantage of such an approach is that it can be applied to a much finer horizontal grid than that of the model simulations. The near-linear relationship between temperature simulation error and \(H_{\textit{station}} -H_{\textit{gridpoint}} \) found in the present study during those nights suggests that a post-processing correction of this kind may be very effective and fairly simple to implement. However, the successful implementation of such a correction also demands a better representation of the coupling state of the SBL by the numerical model.
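One possible form of such a post-processing correction is sketched below in Python. It is only an illustration under stated assumptions, not a procedure tested in this study: the temperature error is defined as simulated minus observed, a straight line is fitted by ordinary least squares to the errors of very stable nights, and the fitted error is subtracted from the forecast at a location with known altitude difference. The function names `fit_linear_error` and `correct_forecast` and all numbers are hypothetical.

```python
def fit_linear_error(dh, err):
    """Least-squares fit of err = intercept + slope * dh.

    dh  -- altitude differences H_station - H_gridpoint (m)
    err -- temperature errors, simulated minus observed (deg C)
    Returns (slope, intercept).
    """
    n = len(dh)
    if n != len(err) or n < 2:
        raise ValueError("dh and err must have the same length (at least 2)")

    mean_dh = sum(dh) / n
    mean_err = sum(err) / n
    sxx = sum((d - mean_dh) ** 2 for d in dh)
    if sxx == 0.0:
        raise ValueError("all altitude differences are equal; slope is undefined")

    sxy = sum((d - mean_dh) * (e - mean_err) for d, e in zip(dh, err))
    slope = sxy / sxx
    intercept = mean_err - slope * mean_dh
    return slope, intercept


def correct_forecast(t_forecast, dh_local, slope, intercept):
    """Remove the fitted altitude-dependent error from a forecast temperature."""
    return t_forecast - (intercept + slope * dh_local)


# Hypothetical training data from very stable nights:
# lower stations (negative dh) show a warm bias, higher stations a cold bias
dh_train = [-40.0, -20.0, 0.0, 20.0, 40.0]
err_train = [3.0, 2.0, 1.0, 0.0, -1.0]
slope, intercept = fit_linear_error(dh_train, err_train)

# A location 40 m below its grid point with a 10 deg C forecast
t_corrected = correct_forecast(10.0, -40.0, slope, intercept)
```

In this sketch the correction would be applied only on nights classified as very stable, for instance by a threshold on \(Ri_{\textit{spat}}\), since the relationship between error and altitude difference is weak on windy nights. The choice of such a threshold is left open here.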