1 Introduction

It is well known that air pollution and air quality models make use of numerical weather prediction (NWP) models to provide the input data for transport and dispersion simulations (see, for instance, Bisignano et al. 2017). In addition to wind and temperature fields, diffusion coefficients and turbulent kinetic energy (TKE) are important for characterizing the dispersion processes. These quantities are provided by the planetary-boundary-layer (PBL) scheme in the NWP models. Non-local turbulence transport is critically important to PBL mixing processes, especially in turbulent fluxes where buoyancy plays a major role. Among the well-known and validated procedures in the literature, the Reynolds-averaged Navier–Stokes (RANS) model is a useful tool for describing turbulent phenomena. RANS models may be categorized depending on the highest order moments that are solved: first-order models solve only the equation for the mean values and are based on the down-gradient approximation; second-order models solve moment equations also for the second-order moments and adopt a parametrization for the third-order moments (Mellor and Yamada 1982; Byggstoyl and Kollmann 1986; Jones and Musange 1988; Durbin 1993; Cheng and Canuto 1994; Canuto et al. 1999; Speziale et al. 1996; Trini-Castelli et al. 2001). The common parametrizations employed in first-order models are not appropriate when applied to convective fluxes because they do not account for the complex mixing processes affecting the turbulent quantities (Moeng and Wyngaard 1989; Holtslag and Boville 1993). In fact, in these models, it is usually assumed there is a one-to-one correspondence between turbulent fluxes at a given height and other parameters of the flow, such as the local gradients of the mean field. This approximation is justified only when the turbulent mixing length is much smaller than the length scale of heterogeneity of the mean flow. 
However, observational and large-eddy-simulation (LES) studies have demonstrated that turbulence in the convective boundary layer (CBL) is associated with non-local integral properties of the PBL (Holtslag and Moeng 1991; Moeng and Sullivan 1994; Schmidt et al. 2006), implying that mixing is also active in PBL regions where there is no local production of turbulence. In the CBL, these effects are essentially due to large-scale semi-organized structures, namely buoyancy-driven cells covering the entire PBL. A similar effect is also observed in shear flows generated by large turbulent structures such as rolls or streaks (Moeng and Sullivan 1994). In past studies (Ferrero and Racca 2004; Ferrero 2005; Colonna et al. 2009), we demonstrated that non-locality plays an important role in these flows, even though the structures involved are smaller than those found in the CBL. In fact, a model including the third-order moments can better describe the PBL height in the neutral case (Ferrero and Racca 2004). Therefore, to properly represent non-locality, it is important to include all the effects from the surface to the given height at which the fluxes are computed. Several improvements of local schemes have been made by including non-local adjusting terms (Deardorff 1972; Holtslag and Moeng 1991; Wyngaard and Weil 1991; Canuto et al. 2005); however, to overcome the shortcomings of local schemes, a non-local scheme including the dynamical equations for the third-order moments should be employed (Canuto 1992; Cheng and Canuto 1994; Zilitinkevich et al. 1999; Canuto et al. 2001; Ferrero and Racca 2004; Ferrero 2005; Cheng et al. 2005; Gryanik et al. 2005; Ferrero and Colonna 2006). In these studies, it is hypothesized that the third-order moments, which are the flux-of-the-flux, represent the non-locality, even though they are not parametrized on the basis of lower-order moments from other levels or, more generally, other locations, as for example in Stull (1988). 
The state-of-the-art NWP models include only first-order or TKE PBL schemes that are not able to account for the complex turbulent dynamics that take place in the lowest layers of the atmosphere. Currently, the most widely used NWP model is the Weather Research and Forecasting (WRF) model (Skamarock and Klemp 2008; Chen et al. 2011), which includes several PBL schemes coupled to different surface-layer schemes. For these reasons, it is of interest to evaluate the performance of the WRF model in simulating, not only the mean temperature and wind fields, but also the TKE from which the diffusion coefficients are derived. Several of the PBL schemes include a TKE equation, but only two of them can be coupled with the multi-layer urban model developed by Martilli et al. (2002), Salamanca and Martilli (2010) and Salamanca et al. (2010).

Many studies reported in the literature have verified PBL schemes in the WRF model against experimental datasets in different situations. Hu et al. (2010) compared three PBL schemes: one TKE scheme [the Mellor–Yamada–Janjic (MYJ, Mellor and Yamada 1974, 1982; Janjic 1990, 2001)] and two first-order schemes [YSU (Yonsei University, Hong et al. 2006) and ACM2 (Asymmetric Convective Model version 2, Pleim 2007a, b)]; they found that the MYJ scheme yielded the coldest and wettest biases within the PBL. Shin and Hong (2011) compared five PBL schemes in the WRF model for a single day: these include two first-order closures, YSU and ACM2, and three TKE closures, namely the MYJ, Quasi-Normal Scale Elimination (QNSE, Sukoriansky et al. 2005) and Bougeault–Lacarrère (BOULAC, Bougeault and Lacarrere 1989) schemes. They found that the first-order schemes are preferable in unstable conditions, while a TKE closure performs better in stable conditions. Kleczek et al. (2014) compared three first-order closures, YSU, ACM2 and VH96 (Vogelezang and Holtslag 1996), and four TKE closures, MYJ, QNSE, Mellor–Yamada–Nakanishi–Niino (MYNN25, Nakanishi and Niino 2004) and BOULAC. Generally, non-TKE schemes tend to produce higher temperatures and higher wind speeds than TKE schemes, especially at night. During the night, TKE schemes give lower wind speeds than other schemes. The non-TKE PBL schemes underestimate the PBL depth and the altitude of the low-level jet at night by about 50 m, while the TKE schemes better estimate the low-level-jet altitude and strength. Roulet et al. (2005) performed a validation of a one-dimensional model in a street canyon, and used a simplified representation of the city as a combination of urban classes. Each class has a different characterization for street orientation and width, and for building height and width. 
Hamdia and Schayes (2008) studied the heat island of a European city using an urban surface-exchange parametrization implemented in a mesoscale atmospheric model adopted in a one-dimensional column model. Shin and Hong (2011) simulated the CASES-99 (23 October 1999) experiment and found that all PBL schemes (ACM2, MYJ, QNSE, and BOULAC) slightly overestimated the 10-m wind speed, except for the YSU scheme, which gave lower values. Coniglio et al. (2013) evaluated the thermodynamic variables predicted by different PBL schemes (MYJ, QNSE, ACM2, YSU, MYNN) in the WRF model, with results suggesting use of the MYNN scheme reduces the cool and moist bias of the MYJ scheme in the CBL upstream from convection. Nelson et al. (2016) compared the WRF model performances against measurements made by sonic anemometers on a tower, and used the MYJ scheme to predict the TKE. Dimitrova et al. (2016) showed that no single PBL scheme is superior to the others, as temperature at 2 m is poorly reproduced and, similarly, the vertical gradients of wind speed and wind direction were not modelled well.

We compare six PBL schemes embedded in the WRF model with experimental data from the Urban Turbulence Project (UTP, Mortarini et al. 2013; Trini Castelli et al. 2014) gathered on the outskirts of a large city (Torino) located in the Po Valley, Italy. The observation site is characterized by frequent cases of low wind speed and calm conditions, which correspond to strong convection during summer and strong inversions in winter. The dataset includes measurements of the velocity, temperature, and turbulence at three different levels; two of these levels are within the canopy layer and the third is within the inertial sub-layer.

It is worth stressing that, to our knowledge, no evaluation of the WRF model performance against turbulence observations in an urban environment has been published, and so our attempt is the first in this direction. Furthermore, the model comparisons in the literature are generally focused on cases limited in time, while we extend the analysis to two one-month periods, and account both for the seasonal and daily variability. In order to understand how PBL schemes perform in conditions such as an urban environment with frequent low wind speeds, we first compare six PBL schemes, built into the WRF model, with data from the highest anemometer located in the inertial sub-layer. The availability of the TKE observations allows us to directly evaluate the performance of the models, and in particular of those that entail a differential equation for TKE. Results of the model evaluation are presented in terms of the quantile versus quantile plots (qq-plots) and statistical indices. Then, we consider those PBL schemes that can be coupled to the urban-surface exchange parametrization and compare the simulation results against measurements from the two lower anemometers inside the canopy layer. Furthermore, to analyze the vertical structure of the turbulence between the inertial and the roughness sub-layers, we present a comparison of the probability density function (PDF) of the differences between the measurements taken at two heights.

The experimental dataset is presented in Sect. 2. Section 3 includes the WRF model simulation set-up, while in Sect. 4 the PBL schemes are described. The results are discussed in Sect. 5 and conclusions follow in Sect. 6.

2 The Experimental Dataset

For comparison, we used the UTP dataset (Mortarini et al. 2013; Trini Castelli et al. 2014) gathered on the outskirts of Turin, Italy, located at the western edge of the Po Valley at about 220 m above mean sea level (a.m.s.l.). A hill chain (maximum altitude of about 700 m a.m.s.l.) surrounds Turin in the eastern sector, while the Alps (whose ridgeline is about 100 km away) encircle the city in the other three directions. The instruments were located on the southern outskirts of the town on a grassy, flat terrain surrounded by buildings, with buildings around 30 m tall located 150 m to the north–north-east of the station. Sparse low buildings (with heights from 4 m to 18 m) are distributed at a distance of 70 to 90 m in all other directions. The area experiences frequent occurrences of low wind-speed conditions (about 90% of the time). A 25-m mast, equipped with horizontal booms pointing west and east at heights of 5, 9, and 25 m above ground level (a.g.l.), was located at the centre of the station, with two booms pointing north and south installed at a height of 25 m. The three anemometers included two Solent 1012R2s (Gill Instruments, Lymington, UK) placed at 5 and 9 m, and one Solent 1012R2A (Gill Instruments, Lymington, UK) installed at 25 m. The UTP measurement campaign began on 18 January 2007 and spanned 15 months with several short interruptions for maintenance. More details about the measurements are available in Mortarini et al. (2009, 2013) and Trini-Castelli et al. (2014). The data were stored at a frequency of 20 Hz and include temperature, wind speed and direction. The temperature was computed from the sonic anemometer following Richiardone et al. (2012), who also discussed in detail the accuracy of the derivation. A preliminary analysis aimed at synchronizing the signals of the three anemometers was carried out, with the mean and the standard deviations of the measured quantities then calculated on an hourly basis (Richiardone et al. 2012). 
To account for different meteorological conditions with daily (both convective and stable) and seasonal variations, we performed simulations both for July 2007 and January 2008.
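As an illustration of the hourly statistics mentioned above, the following minimal sketch computes a mean wind speed and the TKE for one averaging block of sonic-anemometer samples. It is not the processing chain of the campaign: the function name `block_statistics` is hypothetical, the wind speed is taken as the magnitude of the mean horizontal vector (an assumption), and the TKE uses the standard definition as half the sum of the three velocity variances.

```python
import math
from statistics import fmean, pvariance


def block_statistics(u, v, w):
    """Return (mean horizontal wind speed, TKE) for one averaging block.

    u, v, w are equal-length sequences of velocity samples in m/s,
    e.g. one hour of 20-Hz data. The name and interface are hypothetical.
    """
    if not (len(u) == len(v) == len(w)) or len(u) == 0:
        raise ValueError("u, v, w must be non-empty and of equal length")
    # Assumption: wind speed is the magnitude of the mean horizontal vector.
    speed = math.hypot(fmean(u), fmean(v))
    # Standard definition: e = 0.5 * (var(u) + var(v) + var(w)).
    tke = 0.5 * (pvariance(u) + pvariance(v) + pvariance(w))
    return speed, tke
```

For 20-Hz data, one hourly block would contain 72,000 samples per component.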

3 WRF Model Simulations

The WRFV3.8.1 model simulations were performed with four nested grids (Fig. 1) with a resolution of 30, 10, 3.3, and 1.1 km, respectively, centred at the mast location. The three coarser grids used 133 × 133 grid points, and the finest used 67 × 67 grid points (Fig. 2, left). This horizontal resolution could not be considered sufficient for comparing model results with data from one anemometer located in the urban canopy layer. On the other hand, a finer resolution may not be coherent with the assumptions of the PBL scheme, which is based on the Reynolds hypothesis of separation between the large and small scales. When using a finer resolution, parts of the small-scale spectrum are explicitly resolved and the Reynolds-average model acts as a LES model, which is not consistent with the PBL scheme hypothesis. This is the so-called grey zone of turbulence or “Terra Incognita” (Wyngaard 2004) where assumptions about the PBL structure used in turbulence models may not be valid. Here, PBL thermals are partly resolved, noting that almost all previous studies on the analysis of PBL schemes have adopted a resolution of 1 km or less (Dimitrova et al. 2016 is the only exception). However, we performed test simulations that indicated that the model performances with respect to TKE are much worse (on average) when reducing the grid size.
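The nesting geometry described above can be checked with simple arithmetic. The sketch below assumes that the quoted grid-point counts apply to each side of a square domain and that the domain width equals the grid spacing multiplied by the number of intervals; both are assumptions made for illustration only.

```python
# (grid spacing in km, grid points per side), from the outermost to the innermost grid
GRIDS = [(30.0, 133), (10.0, 133), (3.3, 133), (1.1, 67)]


def domain_width_km(dx_km, n_points):
    """Width of a square domain, assuming n_points - 1 intervals of size dx_km."""
    return dx_km * (n_points - 1)


# Width of each domain and the refinement ratio between consecutive grids
widths = [domain_width_km(dx, n) for dx, n in GRIDS]
ratios = [GRIDS[i][0] / GRIDS[i + 1][0] for i in range(len(GRIDS) - 1)]
```

Under these assumptions each nest is narrower than its parent and the refinement ratio between consecutive grids is approximately three.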

Fig. 1 WRF model domain of the four nested grids

Fig. 2 Left panel: the WRF model domain of the inner grid. Right panel: land use on the inner grid (black: urban; yellow: crop; green: prairie; dark green: forest; beige: barren soil; blue: water and waterland). The white cross indicates the position of the observational mast

In the vertical direction, the model used 38 levels from the surface to 50 hPa, with six levels in the first 100 m (7, 16, 26, 42, 61, and 96 m) and 13 levels in the first 1000 m (as before plus additional levels at 154, 242, 366, 508, 651, 796, and 942 m). We chose such a high resolution near the ground to compare the model results with the measurements performed at the mast levels.

As previously mentioned, the measurement site is located on the outskirts of a large Italian city and is surrounded by an inhomogeneous distribution of buildings of variable height. The maximum building height is around 30 m (but with an average height between 3.9 and 10.3 m), which is close to the level of the higher anemometer (25 m). The measurements performed at this level can be considered in the inertial sub-layer (Rotach 1999), while the lower level anemometers (5 and 9 m) are located in the roughness sub-layer. It should be noted that the PBL schemes are based on surface-layer similarity theory and thus do not account for a roughness sub-layer in the lower layers. However, when PBL schemes are applied, it is usually supposed that the first model level is located in the inertial sub-layer, i.e., well above the urban canopy, but the first three model levels that we chose do not meet this requirement. To assess the impact of violating this assumption in the simulations, we performed preliminary tests. We considered the MYJ scheme since it is commonly adopted as a PBL scheme for use with the WRF model, and no other PBL scheme was involved in this preliminary test. In fact, if the first model level height affects the results, this should not depend on the particular PBL scheme.

We performed a simulation with a higher first model level (25 m) and compared the results at the 25-m observational level, the only one that can serve, in this case, as an appropriate comparison level. We also compared these results with those obtained with the original resolution. The model results are comparable with those obtained using the two model levels inside the canopy layer. The wind-speed overestimation is slightly reduced, while the temperature and TKE indices are very similar in the two cases. To verify whether this result indeed reflects a missing sensitivity or is due to compensating behaviour, we compared the two vertical resolutions, stratifying by stability conditions and wind regimes, low (< 1.5 m s−1) or high wind speed. When comparing the statistical indices obtained in the two cases (not shown), no significant difference was found using a first model level either within or above the canopy layer.

For the microphysics we chose the Thompson graupel scheme; longwave radiation was parametrized with the rapid radiative transfer model scheme, and shortwave radiation used the rapid radiative transfer model general scheme. Regarding the surface layer, different schemes, depending on the PBL scheme, were used, while the surface was represented with the Unified Noah land-surface model (LSM; Chen et al. 1996). The cumulus parametrization Kain–Fritsch (new Eta) scheme was activated for the first and the second grids only. Global 1/2° reanalysis data from the NOAA Climate Forecast System Reanalysis system (CFSR; Saha et al. 2010) were used to initialize and force the WRF model at the boundaries. Daily sea-surface temperature at 1/12° from the NOAA Real-Time Global Sea (Thiébaux et al. 2003) server was used over the ocean. The WRF model LSM was initialized with 1/4° temperature and moisture analyses from the NASA Global Land Data Assimilation System (Rodell et al. 2004). Spectral grid nudging (Miguez-Macho et al. 2004) to 6-h CFSR global data was applied to the largest domain. The four-dimensional data assimilation (Stauffer and Seaman 1990) scheme was activated to assimilate conventional observations (surface stations, soundings, aircraft and satellite wind vectors) that were available from the World Meteorological Organization at the time of the field campaign. The MODIS land-use scheme with 21 categories was used (Fig. 2, right). The input parameters to the LSM are provided depending on the land-use categories through the specific file LANDUSE.TBL. For example, the urban category gives a roughness length of 0.8 m and an emissivity of 0.88; the crop category is associated with a roughness length of 0.15 m and emissivity of 0.985.

Then, we performed other simulations activating the Building Effect Parametrization (BEP) multi-layer urban model developed by Martilli et al. (2002) and the Building Energy Model (BEM; Salamanca and Martilli 2010; Salamanca et al. 2010). This can be used with only two of the PBL schemes (BOULAC and MYJ); it allows a direct interaction with the PBL and accounts for the three-dimensional nature of urban building surfaces. Sources and sinks of heat, moisture, and momentum are vertically distributed throughout the whole urban canopy layer, as are the effects of vertical (walls) and horizontal (streets and roofs) surfaces on momentum (drag-force approach), TKE, and potential temperature. Moreover, the interaction between radiation and walls and roads accounts for shadows, reflections, and trapping of shortwave and longwave radiation in the street canyons. The urban model is coupled with the turbulence scheme of the WRF model by introducing a source term in the TKE equation within the urban canopy and by modifying turbulence length scales to account for the presence of buildings. In the standard version of the BEP model (Martilli et al. 2002), the internal temperature of the buildings is kept constant. To improve the estimation of exchanges of energy between the interior of buildings and the outdoor atmosphere, the simple BEM model (Salamanca and Martilli 2010; Salamanca et al. 2010) has been linked to the BEP model. The BEM model accounts for the diffusion of heat through the walls, roofs, and floors of the buildings; radiation exchanged through windows; longwave radiation exchanged between indoor surfaces; generation of heat due to occupants and equipment; and air conditioning, ventilation and heating. We did not modify the urban morphology and physical parameters (e.g. mean roof height, anthropogenic heat) but used the default values, which are appropriate for the city under study. As described in Trini Castelli et al. (2014), the site is characterized by an average building height zH between 3.9 and 10.3 m, depending on wind direction and on the method adopted for estimating zH; this range is consistent with the values given for the Martilli et al. (2002) urban model (category 2), which are between 5 and 20 m, with 60% for 15 m. We also evaluated the plan area as having values between 0.24 and 0.8, which are consistent with the value of 0.4 calculated from the street parameters for the Martilli et al. (2002) urban model (category 2). Furthermore, the land-use map in Fig. 2 (right panel) shows that the area surrounding the mast is classified as urban.

4 PBL Schemes

For all PBL schemes, a simple down-gradient approximation is adopted by introducing the eddy viscosity K,

$$ \overline{{u_{i}^{'} u_{j}^{'} }} = - K\frac{{\partial U_{i} }}{{\partial x_{j} }}, $$
(1)

where primes indicate the fluctuations of the velocity components, and the upper case represents the mean value. The schemes differ in the way that K is determined; basically, K is related to a velocity scale and a length scale,

$$ K = \tilde{V}l, $$
(2)

so that any PBL scheme provides a particular expression for these two scales. For example, Monti et al. (2002) suggested a semi-empirical model for the eddy diffusivities of heat (Kh) and momentum (Km) under stable conditions, based on observations. It was shown that for predominantly stably-stratified slope flows, Kh/Km ≈ 1 for Ri < 0.2, indicating that stratification effects are of minor importance for such Richardson numbers, with turbulent eddies transporting heat and momentum with equal efficiency.
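Equations 1 and 2 can be illustrated with a few lines of code. This is a minimal sketch rather than the formulation of any specific WRF scheme: the function names are hypothetical, and the choice of a velocity scale proportional to the square root of the TKE, with a coefficient `c` set to 1 by default, is an assumption used only to give the velocity scale a concrete form.

```python
import math


def eddy_diffusivity(velocity_scale, length_scale):
    """Eq. 2: K = V * l, in m2/s for V in m/s and l in m."""
    return velocity_scale * length_scale


def tke_velocity_scale(tke, c=1.0):
    """Assumed velocity scale V = c * sqrt(e); c is a hypothetical coefficient."""
    if tke < 0.0:
        raise ValueError("TKE must be non-negative")
    return c * math.sqrt(tke)


def down_gradient_flux(k, mean_gradient):
    """Eq. 1: turbulent flux = -K * (mean gradient)."""
    return -k * mean_gradient
```

For instance, a TKE of 0.25 m2 s−2 and a length scale of 40 m give K = 20 m2 s−1 when c = 1, and a mean shear of 0.01 s−1 then yields a down-gradient momentum flux of about −0.2 m2 s−2.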

We can distinguish between TKE schemes that entail a dynamical equation for the TKE (but not for the other second-order moments) and first-order closures that are based on parametrizations generally derived from the Monin–Obukhov similarity theory (MOST). In the WRF model, PBL schemes belonging to both categories exist, and the PBL schemes we considered for comparison are summarized in Table 1.

Table 1 The different PBL schemes available in the WRF model

Two of them (YSU and ACM2) are first-order closures (no prognostic equation is required) that attempt to account for the non-locality of the turbulence empirically. In the CBL, both schemes adopt a diffusion coefficient depending on height and on a velocity scale, which is a function of the friction and convective velocities (YSU) or a mixing length and the friction velocity (ACM2). The non-locality is introduced in the YSU scheme by a gradient adjustment term to the local gradient. The ACM2 scheme accounts for non-locality as a result of upward fluxes from the surface and downward fluxes from (to) the adjacent upper (lower) vertical level. Four PBL schemes consider the TKE (MYJ, BOULAC, QNSE, and MYNN2); the MYJ and BOULAC schemes can be coupled with the urban BEP + BEM model (Martilli et al. 2002; Salamanca and Martilli 2010; Salamanca et al. 2010). TKE closure schemes entail a prognostic equation for TKE, with the diffusion coefficient proportional to the square root of the TKE and a mixing length. Different TKE schemes are obtained by specifying the mixing length and the proportionality coefficient, and each PBL scheme is coupled to a surface-layer scheme, which provides the turbulent fluxes. It should be stressed that in the WRF model each PBL scheme is built with a specific surface-layer scheme that provides the surface-layer parameters. For this reason, we changed the surface-layer scheme for each PBL scheme. Using the same surface-layer scheme for all simulations would imply coupling specific PBL schemes to a surface-layer scheme for which they have not been designed. In contrast, the Unified Noah LSM (Chen et al. 1996) is used for all simulations. For the YSU and ACM2 schemes, we considered the MM5 similarity scheme based on MOST with a Carlson–Boland viscous sub-layer and standard similarity functions. 
The MYJ and BOULAC schemes were used together with the Eta similarity model based on MOST, with the Zilitinkevich (1995) thermal roughness length and standard similarity functions from look-up tables. The QNSE PBL scheme uses the QNSE surface-layer scheme, and the MYNN2 PBL scheme is coupled with the MYNN surface-layer scheme. The model configurations used herein are summarized in Table 2.

Table 2 Model configurations
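The pairing of PBL and surface-layer schemes described in the text can be restated compactly as a data structure. The sketch below only encodes what is stated above; the class and function names are hypothetical, and no WRF namelist option numbers are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PblConfig:
    pbl: str             # PBL scheme acronym used in the text
    closure: str         # "first-order" or "TKE"
    surface_layer: str   # surface-layer scheme paired with the PBL scheme
    urban_capable: bool  # can be coupled with the BEP + BEM urban model


CONFIGS = [
    PblConfig("YSU", "first-order", "MM5 similarity", False),
    PblConfig("ACM2", "first-order", "MM5 similarity", False),
    PblConfig("MYJ", "TKE", "Eta similarity", True),
    PblConfig("BOULAC", "TKE", "Eta similarity", True),
    PblConfig("QNSE", "TKE", "QNSE", False),
    PblConfig("MYNN2", "TKE", "MYNN", False),
]


def urban_runs(configs):
    """Labels of the additional runs with the urban model activated."""
    return [c.pbl + " + BEP + BEM" for c in configs if c.urban_capable]
```

All six configurations share the Unified Noah LSM, so it is not repeated per entry.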

5 Results and Discussion

We examined 2 months of data selected from the entire campaign. To cover different stability conditions, we chose 1 month during winter and 1 month during summer, January 2008 (from 1 January 0000 UTC to 1 February 0000 UTC) and July 2007 (from 1 July 0000 UTC to 1 August 0000 UTC), respectively.

We ran the WRF model with the six PBL schemes listed in Table 2, with the model variables linearly interpolated in height to the anemometer elevation. In the horizontal directions, an inverse-square-distance weighting accounting for the four closest grid points was performed. First, we carried out a preliminary investigation of the temporal trend and the qq-plots of the variables and, to evaluate the PBL scheme performances related to wind speed, temperature and TKE \( \left( {\overline{e} } \right) \), we computed the indices suggested by Chang and Hanna (2004): fractional bias (FB), normalized mean-square-error (NMSE) and correlation coefficient. Among the three anemometers, we selected the one located at 25 m a.g.l. on the mast, since it is not in the urban roughness sub-layer. Nevertheless, the overall urban roughness likely still affects the measurements at this elevation.
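For reference, the three indices can be sketched as follows, using the definitions commonly given with Chang and Hanna (2004). The function names are hypothetical. FB is written here as observed minus predicted, so that a model overestimation gives a negative FB; this sign convention is an assumption about the presentation, not a statement about the plots in this paper.

```python
from statistics import fmean, pstdev


def fractional_bias(obs, mod):
    """FB = (mean(obs) - mean(mod)) / (0.5 * (mean(obs) + mean(mod)))."""
    o, m = fmean(obs), fmean(mod)
    return (o - m) / (0.5 * (o + m))


def nmse(obs, mod):
    """NMSE = mean((obs - mod)^2) / (mean(obs) * mean(mod))."""
    mse = fmean([(o - m) ** 2 for o, m in zip(obs, mod)])
    return mse / (fmean(obs) * fmean(mod))


def correlation(obs, mod):
    """Pearson correlation coefficient between observed and modelled values."""
    o_mean, m_mean = fmean(obs), fmean(mod)
    cov = fmean([(o - o_mean) * (m - m_mean) for o, m in zip(obs, mod)])
    return cov / (pstdev(obs) * pstdev(mod))
```

A perfect model gives FB = 0, NMSE = 0 and a correlation of 1. A model that doubles every observed value keeps a correlation of 1 but gives FB = −2/3, which shows why the indices are considered together.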

Then, to better investigate the effect of the urban canopy and the ability of the WRF model to reproduce the observations, we considered the two PBL schemes MYJ and BOULAC coupled to the BEP + BEM urban model and compared them with the measurements from the lower anemometers. For the 5 and 9-m heights, in some cases, the model values were linearly extrapolated from the two closest grid points, depending on the elevation of the lower level of the grid (centred or staggered) on which the variable is defined.
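The mapping of model values to the anemometer positions, i.e., linear interpolation or extrapolation in height and inverse-square-distance weighting in the horizontal, can be sketched as below. The function names and argument layout are hypothetical, and the sketch ignores details such as grid staggering.

```python
def interp_height(z_target, z1, f1, z2, f2):
    """Linear interpolation between two levels (z1, f1) and (z2, f2).

    If z_target lies outside [z1, z2], the same formula extrapolates.
    """
    if z1 == z2:
        raise ValueError("the two levels must differ")
    return f1 + (f2 - f1) * (z_target - z1) / (z2 - z1)


def inverse_square_distance(points, x, y):
    """Weighted mean of (xi, yi, value) triples with weights 1 / distance^2."""
    num = 0.0
    den = 0.0
    for xi, yi, value in points:
        d2 = (xi - x) ** 2 + (yi - y) ** 2
        if d2 == 0.0:
            return value  # target coincides with a grid point
        weight = 1.0 / d2
        num += weight * value
        den += weight
    return num / den
```

With model levels at 16 and 26 m, the 25-m value is an interpolation, whereas with levels at 7 and 16 m the 5-m value requires extrapolation below the lowest level.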

To further analyze the vertical structure of the flow and turbulence inside the canopy layer, we computed the PDF of the differences between the values at two heights (25 m and 9 m). For this analysis, we considered only the MYJ and BOULAC schemes coupled with the urban model, and the YSU scheme, which accounts for neither the TKE nor the urban model.
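An empirical PDF of the differences between two heights can be estimated with a normalized histogram. The sketch below is one simple way to do this and is not necessarily the estimator used here; the function name, the fixed bin width and the bin origin `start` are hypothetical choices.

```python
import math


def difference_pdf(upper, lower, bin_width, start=0.0):
    """Histogram estimate of the PDF of (upper - lower).

    Returns a dict mapping each occupied bin centre to a probability
    density, normalized so that the densities times bin_width sum to one.
    """
    diffs = [a - b for a, b in zip(upper, lower)]
    if not diffs:
        raise ValueError("no paired values")
    counts = {}
    for d in diffs:
        k = math.floor((d - start) / bin_width)  # bin index, may be negative
        counts[k] = counts.get(k, 0) + 1
    n = len(diffs)
    return {
        start + (k + 0.5) * bin_width: c / (n * bin_width)
        for k, c in sorted(counts.items())
    }
```

Applying the same function to the observed and to the simulated pairs, with identical bins, gives two PDFs that can be compared directly.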

5.1 Comparison of the PBL Schemes

Figure 3 shows a time series of the four variables (wind speed and direction, temperature, and TKE) for 48 h in January (5 January 2008 0430 LT–7 January 2008 0430 LT) measured at an elevation of 25 m a.g.l. It can be noted that the sonic temperature is used as potential temperature (Richiardone et al. 2012), which may represent a source of inaccuracy in the measured values. For these preliminary analyses we considered all PBL schemes including the MYJ and BOULAC schemes coupled with the urban model (Martilli et al. 2002).

Fig. 3 Time series (5 January 2008 0430–7 January 2008 0430 LT) of wind speed, temperature, TKE \( \left( {\overline{e} } \right) \) and wind direction from the 25-m anemometer. Circles: observational data; solid lines: model results: blue: MYJ; red: BOULAC; black: MYNN2; green: QNSE; yellow: YSU; magenta: ACM2; short dashed line: MYJ + BEP + BEM; long dashed line: BOULAC + BEP + BEM

It can be observed that the measured wind speed is always below 2 m s−1, and for these low wind speeds, the agreement between simulations and observations is usually poor. However, the MYJ scheme performs slightly better than the other schemes. In contrast, all six of the PBL schemes, without the urban model, do not show significant differences between simulated temperatures. These schemes approximately reproduce the measured trend and underestimate the observations by about 2 K. The two schemes coupled with the urban model (MYJ + BEP + BEM and BOULAC + BEP + BEM) are more accurate than the others and reduce the underestimation.

Concerning the TKE, the agreement of the results for each scheme with the measurements is better compared to that for wind speed. Coupling the urban model to the MYJ (MYJ + BEP + BEM) and BOULAC (BOULAC + BEP + BEM) schemes yields higher TKE values and a better agreement in the cases of TKE maxima, particularly for the BOULAC scheme, which however occasionally overestimates the measurements. Wind direction is generally well reproduced by all schemes despite a large number of low wind-speed cases. The MYNN2 scheme gives the worst performance.

An example for July 2007 is shown in Fig. 4. The time series (5 July 2007 0430 LT–7 July 2007 0430 LT) of wind speed and direction, temperature, and TKE, both simulated and measured at the 25-m anemometer, are presented. It can be noted that, for the first 24 h, all schemes reproduce the daily cycle for the variables considered, while for the second 24 h the observed wind speed is always lower than 3 m s−1 and the schemes overestimate the observed values. The MYNN2 scheme agrees better than the others with the lower values while underestimating the higher values. The first diurnal peak is better simulated by several schemes, while all schemes predict a peak during the second day, which is not found in the observations. The temperature is more accurately simulated than in January, with an underestimation of the minimum value. The TKE is underestimated during the first 24 h and overestimated during the second 24 h, and it is worth noting that the same behaviour is observed both for TKE and non-TKE schemes, suggesting that, in these conditions, the TKE performance is mainly influenced by mean wind speed. Wind direction is not well reproduced by any of the schemes, and in Fig. 4, it can be observed that there is better agreement during some periods than others. However, this behaviour does not seem to be related to daytime or night-time conditions but is likely due to the presence of buildings. Interestingly, the wind speed is higher in July than in January, so the influence of the obstacles on the flow may be enhanced.

Fig. 4 Time series (5 July 2007 0430–7 July 2007 0430 LT) of wind speed, temperature, TKE \( \left( {\overline{e} } \right) \) measured at the 25-m anemometer. Circles: observational data; solid lines: model results: blue: MYJ; red: BOULAC; black: MYNN2; green: QNSE; yellow: YSU; magenta: ACM2; dotted line: MYJ + BEP + BEM; long dashed line: BOULAC + BEP + BEM

After this initial qualitative analysis of a short time window (48 h) showing that it is difficult to evaluate the PBL scheme performances on the basis of a limited period of time, we present a statistical verification on the two one-month periods. We considered 1 month in winter and 1 month in summer to include most of the possible stability conditions and we analyzed the hourly-average observations. The results are presented hereafter. It is worth noting that, in comparison with the literature that focuses on short periods, we have considered a longer simulation time. This allows us to better evaluate the model performance from a statistical point of view when many different meteorological conditions are simulated, accounting for both seasonal and daily variability.

To this aim, we considered the quantile–quantile plots (qq-plots), which are useful since they represent and compare the distribution of both results and observations (see, for instance, García-Díez et al. 2013), allowing us to evaluate the model performance for different quantiles of the distribution.
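The points of a qq-plot are obtained by pairing the same quantiles of the observed and simulated samples. The following sketch uses the Python standard library; the function name `qq_pairs`, the default number of quantile intervals and the "inclusive" interpolation method are assumptions for illustration and not necessarily those used to produce the figures.

```python
from statistics import quantiles


def qq_pairs(obs, mod, n=100):
    """Return a list of (observed quantile, modelled quantile) pairs.

    The two samples are treated as distributions, so they need not be
    paired in time or have the same length. n - 1 cut points are returned.
    """
    q_obs = quantiles(obs, n=n, method="inclusive")
    q_mod = quantiles(mod, n=n, method="inclusive")
    return list(zip(q_obs, q_mod))
```

Points lying on the 1:1 line indicate that the model reproduces the observed distribution at that quantile; points above the line indicate overestimation when the modelled quantiles are plotted on the vertical axis.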

The six top panels of Fig. 5 refer to wind speed in January. All PBL schemes overestimate the observed values. These results show that the PBL schemes are generally unable to reproduce low wind-speed conditions. In fact, except for a few cases, when the observed wind speeds are below 4 m s−1, the models predict values up to 6 m s−1. This result suggests that the diffusion coefficients, which are predicted by a low-order closure scheme, are not able to simulate low wind-speed conditions. Recently, examining the performance of MOST in low wind-speed conditions, Luhar et al. (2009) came to the same conclusion. However, it should be stressed that it is equally likely that the observations are affected by the wake of the nearest buildings. This, at least, suggests the possibility of the observations underestimating the typical wind speed for the surface type that the model is simulating, rather than the model overestimating wind speed. In other words, the presence of the individual buildings is accounted for by an increased value of roughness length associated with the urban land-use category, which affects the flow over a larger area (one grid cell as a minimum) compared to the measurement site. This is particularly remarkable in our case (as often in similar cases) where the measurements are taken in a green area surrounded by relatively tall buildings. This fact may introduce discrepancies between simulation results and observations.

Fig. 5 qq-plots (January 2008). Six top panels: wind speed (m s−1); six middle panels: temperature (K); six bottom panels: wind direction (degrees), all at the 25-m anemometer level

The qq-plots for temperature in January 2008 (Fig. 5, six middle panels) show very small differences among the schemes, with all underestimating the observed temperature, but with performances improving at higher temperatures. Close to the land surface the temperature is mostly determined by the heat exchange between the land surface and the atmosphere, and no significant difference is observed between the schemes because all use the same land-surface scheme.

The six bottom panels in Fig. 5 depict the results for the wind direction (January 2008), with all schemes performing similarly. It is worth noting that Wang et al. (2016) found the best performances for wind direction using the QNSE and MYNN2 schemes. The modelled direction distributions show poor agreement with the observed ones between about 025° and 150°, and between about 225° and 350°. In contrast, they are similar for wind directions around 000° and 180°, noting that the site is characterized by frequent north–south wind directions (Trini Castelli et al. 2014). The PBL schemes are able to reproduce this feature, but they yield a rather uniform distribution. It may be useful to recall that the mast is located on the outskirts of the city, and so the distribution of the heights of the surrounding buildings is non-homogeneous and the measurements are influenced to a different extent for each direction. Models that do not consider the urban morphology cannot take into account the effect of the complex building characteristics. In fact, in these models the presence of the urban environment is simply parametrized through the surface-layer fluxes.

On the qq-plots for TKE in January 2008 (Fig. 6), one can see that the PBL schemes produce different results and behave differently. Better performances are found for the BOULAC and MYNN2 schemes, while the MYJ and QNSE schemes underestimate the observed values.

Fig. 6 TKE (m2 s−2) qq-plots (January 2008) at the 25-m anemometer

The wind-speed qq-plots for July 2007 are presented in the top six panels of Fig. 7, where the speeds are generally higher than in January 2008. All schemes slightly overestimate the measurements at 25 m, but perform much better than in winter, when there are many cases of low wind speed. The flow is also less complex, most likely because the roughness sub-layer with lower wind speeds has less influence at this level.

Fig. 7 As in Fig. 5 but for July 2007

The six middle panels in Fig. 7 depict the comparison between the modelled and observed temperatures in July 2007, and except for a small underestimation, the agreement is good for all schemes and the whole range of values. The result is even better than in January 2008, which suggests that the model’s accuracy improves in the warm season. During the cold season, it is more difficult for the model to reproduce the meteorological conditions. During winter in the Po Valley, low wind speeds and stable conditions are very frequent, but in contrast during the summer season the strong convective conditions are easier to simulate.

The wind direction (Fig. 7, six bottom panels) simulated in July 2007 agrees better with the observations than in January 2008, while the TKE schemes show superior performance. This result differs from that found for January 2008 probably because of the more local characteristic of the wind regimes in summer. This also implies that the diurnal cycle generated by the mountain-valley breeze circulation is well reproduced.

Looking at Fig. 8, where the comparison of the TKE in July 2007 is shown, it can be observed that, as expected, the observed values are much higher than in January 2008, but not all the schemes are able to reproduce this strong turbulence. Only the BOULAC scheme agrees with the observations, at least for the lower values; the MYJ, MYNN2 and QNSE schemes underestimate the observed TKE.

Fig. 8 TKE (m2 s−2) qq-plots (July 2007) at the 25-m anemometer

As a general conclusion of the qq-plot analysis it appears that the mean values (wind speed, wind direction and temperature) do not show large differences among the PBL schemes, regardless of whether the TKE equation is included in the model or not. The TKE is reproduced with variable accuracy: the BOULAC and MYNN2 schemes give the best results in January 2008, while in July 2007 only the BOULAC scheme outperforms the other schemes.

Table 3 gives the verification metrics for the 25-m anemometer in January 2008. In this section we discuss the results of the six PBL schemes, while the performances of the MYJ + BEP + BEM and BOULAC + BEP + BEM schemes are analyzed below. The positive values of the fractional bias indicate a general overestimation of wind speed, and the wind-speed correlations are very low for all schemes. Wang et al. (2016) found the best agreement for wind speed with the YSU, QNSE and BOULAC schemes when comparing those PBL schemes against WRF-LES model simulations. All schemes underestimate the mean temperature by about 3 K, except for the BOULAC scheme, while the correlation coefficients for temperature are satisfactory. It may be noted that Wang et al. (2016) obtained the best performance for temperature with the YSU and ACM2 schemes. These closure schemes performed better in that study because the comparison covered the vertical profile throughout the whole PBL. Concerning TKE, the best value of fractional bias is provided by the MYNN2 scheme, while the correlations exceed 0.5 for all schemes, reaching 0.65 for the BOULAC scheme. This PBL scheme also yields the lowest NMSE values. The YSU and ACM2 schemes do not provide TKE since they do not use a TKE equation.

Table 3 Statistical indices for January 2008
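For readers who wish to reproduce indices of this kind, the following is a minimal sketch of the three metrics. It assumes the commonly used definitions of fractional bias and normalized mean-square error, with the sign of the fractional bias chosen so that positive values indicate overestimation, as in the discussion above. The function name `verification_metrics` and the sample values are hypothetical.

```python
import numpy as np

def verification_metrics(observed, predicted):
    """Fractional bias, NMSE and correlation for paired hourly values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    # Keep only the hours at which both series are available.
    ok = ~(np.isnan(obs) | np.isnan(pred))
    obs, pred = obs[ok], pred[ok]
    # Assumed convention: FB > 0 means the model overestimates on average.
    fb = 2.0 * (pred.mean() - obs.mean()) / (pred.mean() + obs.mean())
    nmse = np.mean((pred - obs) ** 2) / (pred.mean() * obs.mean())
    corr = np.corrcoef(obs, pred)[0, 1]
    return {"FB": fb, "NMSE": nmse, "R": corr}

# Illustrative values only: a model with a constant +1 m/s wind-speed bias.
obs = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([2.0, 3.0, 4.0, 5.0])
m = verification_metrics(obs, pred)
```

With these definitions both the fractional bias and the NMSE are normalized by the mean values, so a bias of a few kelvin in temperatures near 280 K gives very small indices even when the absolute error is not negligible.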

To investigate the seasonal variations and account for the widest range of meteorological conditions, we performed the same analysis for July 2007. Table 4 reports the results of the model evaluation for the 25-m anemometer. As in January 2008, all schemes overestimate the wind speed; the wind-speed correlations are higher than those in January 2008, but still below 0.6. For temperature, all schemes slightly underestimate the observations, with differences between simulations and observations of 1–2 K. Fractional bias and NMSE values are very small and the correlation coefficients are approximately 0.9. Our results partially agree with those of Hu et al. (2010) who, in a series of simulations spanning 3 months during summer 2005 with three PBL schemes (MYJ, YSU, and ACM2), found that WRF model simulations underpredicted the temperature and the MYJ scheme had the largest bias. In the lower atmosphere during the daytime, both the YSU and ACM2 schemes predicted higher temperatures and thus gave smaller biases than the MYJ scheme because of their stronger vertical mixing. The last statistical analysis in Table 4 addresses the TKE during July 2007, with the MYJ and QNSE schemes significantly underestimating the TKE measurements. In general, the best performance is obtained with the MYNN2 and BOULAC schemes, the latter irrespective of its coupling with the urban model. These results are similar to those of Wang et al. (2016) who, although in different conditions, found that TKE values from the BOULAC and MYNN2 schemes were the closest to those of the WRF-LES model, although they were underestimated. In contrast, they found that the QNSE and MYJ schemes largely underestimated the TKE values obtained with the WRF-LES model. In our case we found TKE correlation coefficient values of 0.5–0.6.

Table 4 Statistical indices for July 2007

It is worth noting that Garcìa-Dìez et al. (2013) reported negative biases in the surface temperatures throughout the summer. The MYJ scheme resulted in the most substantial cold biases during the daytime, compared to the YSU and ACM2 schemes, with the YSU scheme most commonly exhibiting the highest temperatures among the three schemes throughout the day and night. Kleczek et al. (2014) performed an intercomparison during 1 day in July (GABLS3 case), and observed that the 10-m wind speed showed wide disparity between the parametrizations (YSU, ACM2, MYJ, BOULAC, MYNN25, and QNSE). The highest values were generally obtained with the BOULAC scheme, which overestimated the observations, while the MYNN2 scheme provided the lowest values.

In our case, the performances of all PBL schemes show a seasonal dependence. The temperature simulation was better in July 2007 than in January 2008, when the temperature was underestimated by all schemes, and TKE results are more accurate for the summer than for winter. When comparing the performances of the different PBL schemes, it is hard to identify the optimum scheme for every situation and for every meteorological variable. However, the schemes including an equation for the TKE generally show better performance compared to the first-order schemes.

5.2 Comparison Between the PBL Schemes Coupled with the Urban Model

As previously mentioned, two of the PBL schemes (MYJ and BOULAC) can be coupled with the urban model of Martilli et al. (2002), Salamanca and Martilli (2010) and Salamanca et al. (2010). In this case, the urban model provides a parametrization for the surface exchanges, allowing us to compare the measurements at the 5 and 9-m levels with the model results. In particular, we wish to investigate the effect of this parametrization by comparing the two PBL schemes with and without the urban option.

The results of the model evaluation concerning the MYJ + BEP + BEM and the BOULAC + BEP + BEM schemes together with the other PBL schemes at the 25-m level are shown in Table 3 for January 2008 and in Table 4 for July 2007. We see that in January 2008 the MYJ scheme with the BEP + BEM model slightly improves the results with respect to the MYJ scheme for wind speed and shows a more remarkable improvement for the TKE. The agreement for the temperature is satisfactory if the urban model is coupled to the MYJ scheme, while the MYJ scheme without the urban model underestimates the temperature. The BOULAC + BEP + BEM scheme slightly increases the overestimation of the measured wind speed, while for temperature it slightly decreases the underestimation with respect to the BOULAC scheme. For TKE, the BOULAC + BEP + BEM scheme overestimates the observations, whereas the BOULAC scheme without the BEP + BEM model underestimates them. This demonstrates the ability of the surface-exchange parametrization to increase the TKE. In July 2007 the MYJ scheme overestimation of the wind speed is reduced when the scheme is coupled with the BEP + BEM model, while a similar accuracy is exhibited for temperature and TKE. The BOULAC + BEP + BEM scheme performance is slightly worse for wind speed, which is overestimated, while temperature results are almost the same as in the case without the urban model. TKE is slightly underestimated if the urban model is used with the PBL scheme and slightly overestimated for the case without the urban model.

Looking at these results, one could conclude that the surface-exchange parametrization does not improve the performance of the PBL schemes. However, as we already stressed, this parametrization should be assessed in the roughness sub-layer rather than in the inertial layer. For this reason, hereafter we consider the two lower anemometers, at heights of 5 m and 9 m respectively, which are located below the top of the canopy.

The four top panels in Fig. 9 show the qq-plots for the wind speed in January 2008. While the MYJ and BOULAC schemes overestimate at the 5 and 9-m levels, the same PBL schemes, when coupled with the BEP + BEM urban-surface-exchange model, agree satisfactorily with the measurements, at least for low wind speeds (< 2 m s−1). The four middle panels in Fig. 9 show the qq-plots for the temperature in January 2008, where we see that temperature predictions are slightly better when using the MYJ + BEP + BEM and BOULAC + BEP + BEM schemes, even though the models still underestimate the lowest and the highest values. Neither wind speed nor temperature shows differences between the performances at the two anemometer levels. Looking at the four bottom panels in Fig. 9, where the qq-plots for the wind direction in January 2008 are shown, it can be noted that both the MYJ and BOULAC schemes behave almost in the same way at the two levels. They show, as in the case of the 25-m anemometer, good agreement with the observations in the north–south directions. However, coupling the urban model to the PBL scheme improves the agreement in the range between 200° and 350°, while a similar improvement does not appear in the range between 050° and 200°.

Fig. 9 qq-plots (January 2008): black 5 m, red 9 m. Four top panels: wind speed (m s−1); four middle panels: temperature (K); four bottom panels: wind direction (degrees)

Figure 10 shows the qq-plots for the TKE in January 2008, with the MYJ scheme underestimating at the 9-m level and agreeing with measurements at the 5-m level. Using the urban model BEP + BEM, which should increase the TKE production, improves the model performance at 9-m level but yields an underestimation at 5-m level. In the case of the BOULAC scheme, the results are satisfactory if the urban model is not used, but they are worse when the BEP + BEM model is coupled to the BOULAC scheme, leading to an overestimation of the TKE.

Fig. 10 TKE (m2 s−2) qq-plots (January 2008): black 5 m; red 9 m

As far as the wind speed in July 2007 is concerned (Fig. 11, four top panels), when the BEP + BEM model is coupled to the MYJ scheme, the agreement between modelled values and measurements improves at both levels in the canopy layer. The BOULAC performance also improves, although to a lesser extent. At the lower levels, the effect of the buildings on the wind field cannot be captured by any PBL scheme unless a surface-layer exchange model is introduced.

Fig. 11 As in Fig. 9 but for July 2007

The results for the temperature in July 2007 (Fig. 11, four middle panels) are generally very satisfactory at 5 m, while at 9 m a general underestimation is found for all schemes, showing that the urban model slightly influences the performance of the schemes. For wind direction in July 2007 (Fig. 11, four bottom panels), both the MYJ and BOULAC schemes show a smaller number of cases than the observations in the range between 050° and 100°, and the urban model, when coupled with the MYJ and BOULAC schemes, reduces this underestimation.

Coupling the MYJ and BOULAC schemes with the BEP + BEM model does not significantly improve the accuracy of the model TKE predictions in July 2007 (Fig. 12). It is well known that during summertime turbulence is generally more intense during daytime and the urban heat island sustains weak turbulence at night as well. Thus, the effect of the urban model in these conditions may be less significant than in winter.

Fig. 12 As in Fig. 10 but for July 2007

One conclusion for the qq-plot analysis of the MYJ and BOULAC scheme performances is that the urban model improves the results for wind speed and direction (both in winter and in summer) in the roughness sub-layer, and slightly improves the results for the temperature in winter, while in summer coupling the urban model does not improve the results with respect to the MYJ and BOULAC schemes. Concerning the TKE, the two PBL schemes differ: the performance of the BOULAC scheme is slightly better in July 2007 if it is coupled with the BEP + BEM model, and worse in January 2008, while the MYJ + BEP + BEM scheme improves the results for the MYJ scheme at the height of 9 m in January 2008, with no change in July 2007, but the results worsen at the height of 5 m.

To investigate the turbulence structure in the surface layer, we calculated PDF(δ) of the differences (δθ) between the wind speed, temperature, and TKE at the different elevations above the ground surface,

$$ \delta \theta \left( {r,t} \right) = \theta \left( {x + r,t} \right) - \theta \left( {x,t} \right) $$
(3)

where θ is a generic mean variable (wind speed, temperature, TKE), x is position, r is the separation distance and t is time. The statistics of the difference between measurements performed at different heights give information about the coupling of the flow at different levels. In our case, this is particularly interesting since the higher measurement point we considered (25 m) is approximately in the inertial layer while the lower measurement point (9 m) is in the roughness sub-layer. The comparison between the statistics of the simulated and measured differences allows us to assess how well the model reproduces the flow both within and above the canopy layer, and the interactions between the two layers. In addition, the PDF(δ) values for the differences reveal whether the model correctly predicts the vertical temperature gradient (and hence the stability) and the wind shear within the inertial and roughness sub-layers.
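A minimal sketch of how an empirical PDF(δ) can be estimated from two time series is given below. It assumes that the difference is taken as the upper-level value minus the lower-level value and that a normalized histogram is an adequate estimator. The function name `difference_pdf`, the number of bins, and the synthetic wind speeds are illustrative assumptions rather than the settings used in this study.

```python
import numpy as np

def difference_pdf(upper, lower, bins=40, value_range=None):
    """Empirical PDF of the difference between two levels (upper - lower)."""
    upper = np.asarray(upper, dtype=float)
    lower = np.asarray(lower, dtype=float)
    # Use only the hours at which both levels have valid data.
    ok = ~(np.isnan(upper) | np.isnan(lower))
    delta = upper[ok] - lower[ok]
    # density=True normalizes the histogram so that it integrates to one.
    density, edges = np.histogram(delta, bins=bins, range=value_range,
                                  density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, density, edges

# Synthetic hourly wind speeds at 9 m and 25 m (illustrative values only).
rng = np.random.default_rng(1)
ws_9m = rng.uniform(0.5, 3.0, size=744)
ws_25m = ws_9m + rng.normal(1.0, 0.5, size=744)

centres, pdf, edges = difference_pdf(ws_25m, ws_9m, bins=30)
area = np.sum(pdf * np.diff(edges))  # check of the normalization
```

Applying the same function, with the same bin edges, to the observed and the simulated series gives two curves that can be compared directly.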

For this analysis we considered only three PBL schemes, two of them being TKE schemes (MYJ and BOULAC) coupled with the BEP + BEM urban model, which are the only schemes able to reproduce the flow inside the canopy. Furthermore, as a reference, we considered the first-order closure YSU scheme.

In Fig. 13 (top), PDF(δ) values of the wind-speed differences, between 25 m and 9 m, for the observations (black) and for the model results (red), are depicted for January 2008. When comparing the simulated and measured PDF(δ) values, the best performance is obtained with the YSU scheme, which reproduces the measured PDF(δ) values very well. The BOULAC and MYJ schemes also reproduce the observed PDF(δ) values, although with less accuracy. Both simulated and measured PDF(δ) values show high occurrences of positive values, indicating the presence of positive vertical gradients. The BOULAC and MYJ schemes tend to slightly overestimate the wind-speed gradients compared to the measurements. The observations show a considerable number of cases close to zero, which is not found in the simulated PDF(δ) values.

Fig. 13 Wind-speed (top) and temperature (bottom) difference PDF(δ) between 25 and 9 m (January 2008). The BOULAC and MYJ schemes are coupled with the BEP + BEM model. Black: observations, red: model

In Fig. 13 (bottom panels), the PDF(δ) values of the temperature difference are plotted (January 2008), with the maximum on the vertical axis limited to a value of 2 to improve the clarity. The simulated PDF(δ) values overestimate the frequencies of the small temperature differences, with the observed PDF(δ) less noisy than the modelled values, and with its maximum value corresponding to a negative vertical temperature gradient. Both BOULAC and MYJ schemes give an almost constant temperature profile. This result agrees with Sterk et al. (2016), who found that, using a single-column version of the WRF model, the simulated temperature and moisture gradients were underestimated. The PBL schemes generate excessive vertical mixing, which reduces the vertical stratification, and this may be due to the overestimation of the vertical wind-speed gradient. In fact, overestimating the occurrences of a large vertical wind-speed gradient would also overestimate the cases of large vertical momentum flux, inducing a homogeneous vertical temperature profile in the surface layer. As will be shown hereafter, the urban model does not influence the temperature but only the TKE. It may be noted that the PDF(δ) for the YSU scheme is asymmetric (skewed) towards positive values and overestimates the number of cases with gradient equal to zero. Also, the YSU scheme does not generate negative vertical gradients.

In Fig. 14 (top), the PDF(δ) values of the wind-speed vertical differences for July 2007 are shown, where the PDF is wider than in January 2008. The BOULAC scheme shows a high dispersion of values, which is lower in the case of the MYJ scheme, while the YSU scheme gives values closest to observations. Compared to January 2008, there are fewer measurements close to zero, and as in January 2008, the YSU scheme yields the best PDF(δ).

Fig. 14 Wind-speed (top) and temperature (bottom) difference PDF(δ) between 25 and 9 m (July 2007). The BOULAC and MYJ schemes are coupled with the BEP + BEM model. Black: observations, red: model

The temperature difference PDF(δ) for July 2007 is shown in Fig. 14 (bottom). The observed PDF(δ) values suggest a large variability in the vertical temperature gradient due to the more intense turbulence in summertime than in wintertime, and this gradient is more frequently negative because of surface heating. Only the BOULAC scheme generates a similar PDF(δ), even though this scheme underestimates the vertical gradient (there are no cases with a vertical gradient < − 2 K over the distance between the anemometers). The YSU scheme largely overestimates the frequency of the vertical gradients close to zero and does not generate negative vertical gradients < − 1 K over the distance between the anemometers. As in January 2008, the PBL schemes generate strong mixing, which reduces the vertical temperature gradient, though the overestimation of the wind speed may also contribute to the excessive mixing.
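For reference, a temperature difference over the distance between the anemometers can be converted into a mean vertical gradient. The layer depth Δz = 16 m follows from the two measurement heights (25 m and 9 m); the symbols δT and Δz are introduced here only for illustration. For the −2 K threshold quoted above,

$$ \frac{\delta T}{\Delta z} = \frac{ - 2\;{\text{K}}}{25\;{\text{m}} - 9\;{\text{m}}} = - 0.125\;{\text{K}}\;{\text{m}}^{ - 1} $$

and, in the same way, the −1 K threshold corresponds to about −0.06 K m−1.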

The PDF(δ) values of the TKE differences for January 2008 are shown in Fig. 15 (top), where we note that the observed PDF(δ) is skewed toward positive values. The PDF(δ) values simulated by the two schemes (BOULAC + BEP + BEM and MYJ + BEP + BEM) differ: while the results of the BOULAC scheme agree better with the observations, the PDF(δ) obtained with the MYJ scheme is sharper than the observed PDF(δ), indicating an overestimation of the cases with a uniform vertical distribution of the TKE. Due to the intense turbulence generated by convection in July 2007, the TKE PDF(δ) values cover a wider range than in January 2008 (Fig. 15, bottom). Comparing the results of the two schemes, the PDF(δ) values obtained using the BOULAC scheme agree better with the observed PDF(δ) than those predicted by the MYJ scheme, even though both simulated PDF(δ) show an excessive number of values close to zero with respect to the observations. The observed PDF(δ) has its maximum at a positive value, indicating that the TKE increases from the surface to the top of the roughness sub-layer, but neither PBL scheme is able to reproduce this behaviour.

Fig. 15 TKE difference PDF(δ) between 25 and 9 m, for January 2008 (top) and for July 2007 (bottom). The BOULAC and MYJ schemes are coupled with the BEP + BEM model. Black: observations, red: model

6 Conclusions

We carried out an evaluation of six PBL schemes available in the WRF model. For the sake of comparison, we considered the Urban Turbulence Project (UTP, Mortarini et al. 2013; Trini Castelli et al. 2014) dataset that provided measurements from three sonic anemometers located at different heights on a mast in the outskirts of the city of Turin in the Po Valley, Italy. We performed several quantitative analyses to highlight possible strengths and weaknesses of the PBL schemes. Two out of the six schemes can be coupled to the urban model proposed by Martilli et al. (2002), Salamanca and Martilli (2010) and Salamanca et al. (2010). Four schemes solve the TKE equation. After performing a qualitative comparison over a 48-h time window, an in-depth statistical analysis was performed for the months of July 2007 and January 2008.

A first comparison with the data from the highest anemometer (25 m), located in the inertial sub-layer, was carried out for the six PBL schemes (MYJ, BOULAC, QNSE, MYNN2, YSU, ACM2). Analyzing the qq-plots, which assess the agreement between modelled and observed distributions, it appears that wind speed, wind direction and temperature results do not show remarkable differences among the PBL schemes, even for the PBL schemes that solve the TKE equation. The agreement of the predicted wind speed with the observations is particularly poor for all schemes for the winter case. The accuracy improves for both the wind direction and temperature for July 2007; in contrast, the TKE is reproduced with different degrees of accuracy. The BOULAC and MYNN2 schemes give the best results in January 2008, while in July 2007, the BOULAC scheme outperforms the other schemes.

Then, several metrics (fractional bias, normalized mean-square error, and correlation) reveal that all PBL schemes show similar performances during January 2008 for wind speed and temperature. The wind speed is overestimated while the temperature is underestimated for January 2008; all schemes exhibit good correlation regarding the temperature. The results for July 2007 indicate that the PBL schemes overestimate the wind speed while slightly underestimating the temperature. The TKE underestimation is greater in July 2007 than in January 2008, except for the MYNN2 and BOULAC schemes that show a good performance.

Then, we considered the MYJ and BOULAC schemes, coupled with the urban model (BEP + BEM: Martilli et al. 2002; Salamanca and Martilli 2010; Salamanca et al. 2010), in the roughness sub-layer. The model results were compared with the anemometer data at 5 and 9 m. The urban model coupled to the MYJ and BOULAC schemes improves the results for wind speed (both in winter and in summer), and slightly improves the results for wind direction and temperature in winter. In summer, temperature results remain of the same quality as in the case of the MYJ and BOULAC schemes without the urban model and wind-direction performance improves to a small degree. Concerning the TKE, the two PBL scheme performances differ. The BOULAC scheme performance is improved when the PBL scheme is coupled with the BEP + BEM model. The results of the MYJ + BEP + BEM scheme are similar to those of the MYJ scheme in winter and worse in July 2007 (particularly at a height of 5 m). It is worth stressing that the urban model is essentially a source of TKE, which is likely to be more appropriate in winter than in summer.

Finally, we compared the measured and simulated PDF(δ) of the differences (δ) between the values at two heights (25 and 9 m). Only three schemes were considered, the MYJ and BOULAC schemes coupled with the urban BEP + BEM model and (except for TKE) the first-order non-local YSU scheme as a reference. The BOULAC + BEP + BEM and MYJ + BEP + BEM schemes are not able to correctly reproduce PDF(δ) for wind speed, while the YSU scheme yields the best result. The number of cases with a small vertical temperature difference is overestimated by both the BOULAC + BEP + BEM and MYJ + BEP + BEM schemes, while the YSU scheme overestimates the cases with a positive vertical temperature difference. The BOULAC + BEP + BEM scheme shows a satisfactory performance concerning PDF(δ) for TKE.

The overall conclusion is that the six PBL schemes analyzed perform with a similar degree of accuracy when only the inertial sub-layer is considered. When focusing on the flow in the roughness sub-layer generated by the urban canopy, the best results are obtained when the surface exchange model (BEP + BEM) is coupled to the TKE schemes, particularly in the winter season, since the production of TKE is enhanced.

Finally, it is worth noting that all the schemes, including those that can be coupled with the urban-surface exchange model (BEP + BEM), are lower-order schemes. They cannot reproduce the effects of the turbulent transport (non-locality) that is represented by the third-order moments. Coupling the BEP + BEM model to the TKE schemes improves the local production of TKE, which however is not transported upward to the inertial layer.