1 Introduction

The estimation of land surface temperature (LST) from the thermal infrared radiation (TIR) emitted by the Earth became possible a few years after the arrival of remote sensing from space. Rao (1972) was the first to apply TIR to estimate temperature patterns for the cities along the USA mid-Atlantic coast from data collected by the Improved TIROS Operational Satellite (Gallo et al. 1995). TIR measurement from space provides the LST assigned to a pixel (Dash et al. 2001, 2002), without distinguishing the multiple elements contained in this pixel, such as vegetation, sunlit and shadowed soils, irregular urban surfaces or soil moisture, with different albedo and emissivity values (Tomlinson et al. 2011; Benali et al. 2012). In fact, TIR is derived from the top of atmosphere radiances, from which LST is obtained after applying corrections for atmospheric attenuation, angular effects and emissivity values at the heterogeneous surface. Water vapour and aerosols are the main agents causing variable attenuation in the TIR signal. This restricts LST availability to cloud-free conditions, to avoid a systematic bias toward colder-than-true values (Williamson et al. 2013). In this sense, the integration of synergistic information from satellite optical-IR and passive microwave remote sensing has recently been shown to permit consistent and reasonable temperature estimations under cloudy skies (Jang et al. 2014).

The accurate estimation of the air temperature at ≈ 2-m height above ground (Ta) from LST is possible but complex. The vertical lapse rate to be applied is a function of the surface energy balance, which varies with the nature of the surface and the time of day, as well as with advection, adiabatic processes, turbulence and latent heat fluxes, all of them affected by cloud cover, water vapour content and vegetation (Benali et al. 2012). During the night, the estimation of Ta becomes simpler because the Earth's surface behaves almost as a homogeneous surface (Didari et al. 2017).

Ta estimated from satellite measurements would compensate for the scarcity of weather stations in wide regions where geospatial interpolation methods, such as kriging or splines, cannot provide accurate estimations, as happens, for instance, in mountainous terrain (Lin et al. 2016) or undeveloped countries. In this way, Ta estimation becomes of crucial importance to fill spatial gaps for a wide range of applications, to the point that TIR is accepted to produce better Ta estimations than those obtained by interpolating ground-station temperatures (Mendelsohn et al. 2007).

In the first years of remote sensing, this TIR-based strategy made it possible to obtain extensive and automatic LST for large regions, but with a limited spatial resolution, as in the case of the Meteosat satellite (e.g. Cresswell et al. 1999). Since the year 2000, the MODerate resolution Imaging Spectroradiometer (MODIS) sensor on board the Terra and Aqua polar satellites (http://modis.gsfc.nasa.gov/) has refined the spatial resolution for LST to 1 km per pixel. In this way, two images per satellite per day are generated (Terra satellite passes daily over the equator close to 10:30 UTC and 22:30 UTC; Aqua at 13:30 UTC and 01:30 UTC), using both the 10.78–11.28 μm and 11.77–12.27 μm spectral bands, together with split-window algorithms (Wan et al. 2002). These MODIS products have been subjected to consistent validation (Coll et al. 2005; Wang et al. 2008). In addition to climatological applications, other biological and physical processes on the land and the ocean may be derived using the 36 electromagnetic spectral bands from visible to TIR available from MODIS (Zhang et al. 2003; Wan et al. 2004; Wang et al. 2009), as may maps of the global distribution of urban land (Schneider et al. 2009).

Given the practical impossibility of directly determining Ta from MODIS Terra LST, different estimation methods have been applied (Zaksek and Schroedter-Homscheidt 2009). The simplest one assumes a linear relation between Ta and LST, either distinguishing different land cover types, as Shen and Leptoukh (2011) have done for Central and Eastern Eurasia, or without this distinction for smaller regions (Fu et al. 2011; Sohrabinia et al. 2015). The so-called temperature-vegetation index (TVX), proposed by Nemani and Running (1989), which applies the normalised difference vegetation index (NDVI) while ignoring its seasonal, ecosystem type and soil moisture variability, has permitted the inclusion of the vegetation cover as a relevant factor (Prihodko and Goward 1997; Vancutsem et al. 2010; Cristóbal et al. 2008; Nieto et al. 2011; Wenbin et al. 2013; Shah et al. 2013; Bustos and Meza 2015). Modifications have also been added, such as the differential TVX method (Sun et al. 2014). Certainly, the vegetation cover is a determining factor through its transpiration cooling and latent heat fluxes, as well as through its low albedo and roughness, which aid efficient sensible heat dissipation (Benali et al. 2012). Alternatively, multiple linear regression applies different variables in addition to LST and NDVI to estimate Ta, such as latitude, distance from coast, altitude and solar radiation (Cristóbal et al. 2008) or albedo and solar radiation (Xu et al. 2014). Nevertheless, after having considered different predictors, Lin et al. (2012) have shown that altitude and LST alone suffice to obtain Ta for East Africa. Zhang et al. (2011) also apply the solar declination along the year together with LST to derive Ta in China. Kloog et al. (2017) derive daily Ta estimations from LST, NDVI, elevation and the grid cell percentage of urbanicity for France.
In general, however, for extended regions, spatio-temporal variables such as Julian day of the year, latitude, longitude, height above sea level, slope, curvature and distance to the coast are usually considered (Recondo et al. 2013; Peón et al. 2014; Good 2015; Thanh et al. 2016; Yang et al. 2017). A list of the variables applied by different authors can be found in Janatian et al. (2017). Also, the use of spatio-temporal regression-kriging and the incorporation of time series of remote sensing images have been shown to yield significantly more accurate temperature maps than plain spatial techniques (Hengl et al. 2012). At the planetary scale, the new dataset of spatially interpolated monthly climate data for global land areas at a very high spatial resolution (approximately 1 km2) has considered LST observations to cover areas with a low station density (Fick and Hijmans 2017). This effort improves on the first LST map at the planetary scale from MODIS measurements (Kilibarda et al. 2014).

Besides the different valuable applications of estimating Ta from LST, this study should be the basis of future detailed analyses of the urban heat island (UHI) of Barcelona considering its entire Metropolitan Region (BMR). Unless otherwise indicated, UHI intensity is derived as the difference in spatially averaged surface temperatures between urban and non-urbanised surroundings, as a measure of the excess warmth of the urban atmosphere (Voogt and Oke 2003). The UHI phenomenon has usually been analysed from air temperature measurements at a small number of gauges across the city, sometimes at sites open to criticism, such as gardens or roofs, and at others outside the city influence (Stewart 2011). This humble starting point (Landsberg 1981), due to the scarcity of measurement points, led to a first spatial and temporal characterisation of the phenomenon (Arnfield 2003). Later on, thermometers installed in automobiles made it possible to extend the analysis to selected transects across the city (Caselles et al. 1991; Moreno-Garcia 1994) and to derive a finer description of the anomalous urban thermal behaviour in relation to the rural surroundings of the city. Urban networks of stations, together with different rural temperature observatories around the city, have also been deployed, permitting detailed descriptions (Giannaros and Melas 2012; Yang et al. 2013). The multiplicity of urban internal configurations, due to the diversity of geometry, morphology and size of the cities, together with local or regional air dynamics, such as the mesoscale sea breeze in the case of littoral cities, makes the urban climate difficult to model correctly. This shortcoming is notably alleviated by considering Ta series derived from satellite remote sensing. Voogt and Oke (2003) have reviewed the research done with thermal remote sensing before MODIS application.
They conclude that the complexity of the urban surface should be analysed by coupling canopy radiative transfer models with both sensor view models and surface energy balance models to simulate air temperature in and above the urban canopy layer. With this purpose, Miao et al. (2009) have applied MODIS observations with dynamical models to simulate urban weather features for comparison with observations in Beijing. Nevertheless, most studies analyse the spatial UHI taking advantage of the high spatial resolution provided by MODIS but without considering air dynamics, for single cities (Cheval et al. 2009; Cheval and Dumitrescu 2009; Fabrizi et al. 2011; Tomlinson et al. 2012; Ma et al. 2016), as well as for sets of cities (Jin et al. 2005; Hung et al. 2006; Yasuoka 2006; Pongrácz et al. 2006, 2010; Imhoff et al. 2010), for selected episodes, months or a few years.
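
The UHI intensity definition quoted above from Voogt and Oke (2003), a difference between spatially averaged temperatures over urban and non-urbanised surroundings, can be illustrated with a minimal sketch. The function name `uhi_intensity`, the boolean urban mask and the sample values are hypothetical and serve only as an illustration; how urban and non-urban pixels are delimited is not specified here.

```python
import numpy as np

def uhi_intensity(temperature, urban_mask):
    """Difference between mean urban and mean non-urban temperature (same units as input).

    temperature: array of temperatures per pixel (NaN allowed for missing pixels).
    urban_mask:  boolean array of the same shape, True for urban pixels (assumed given).
    """
    t = np.asarray(temperature, dtype=float)
    urban = np.asarray(urban_mask, dtype=bool)
    return float(np.nanmean(t[urban]) - np.nanmean(t[~urban]))

# Hypothetical 2 x 2 temperature field (°C) with the top row classified as urban
temps = np.array([[12.0, 11.0], [8.0, 9.0]])
mask = np.array([[True, True], [False, False]])
print(uhi_intensity(temps, mask))
```

A positive value indicates that the urban pixels are, on average, warmer than their surroundings.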

2 Database

2.1 Study area

The Barcelona Metropolitan Region, BMR, with an area of 3242.2 km2 and a population density of 1566.2 inhabitants/km2, according to IdesCat-2017 (Institut Català d’Estadística), is a densely populated area close to the Mediterranean Sea. In particular, Barcelona city, with a population of 1.6 million inhabitants, covers an area close to 100 km2 with a population density close to 16,000 inhabitants/km2. The orography of the analysed region is characterised by the Littoral and Pre-Littoral chains, with moderate altitudes up to 1700 m a.s.l., both parallel to the Mediterranean coast. The Vallès valley and the Penedès Basin lie between the two chains. The most extended urban area (Barcelona city) is bounded by the Mediterranean shoreline, the Littoral chain and the Llobregat and Besós rivers. The main orographic features of the region and the distribution of altitude in meters are shown in Fig. 1a,b. Details of the spatial distribution of CORINE land cover classes (http://land.copernicus.eu/pan-european/corine-land-cover/clc-2012), at level 3 for the year 2012, are shown in Fig. 1c and Table 1.

Fig. 1 Spatial distribution of main orographic features (a), altitude above sea level (b), CORINE land cover classes (c) and the thermometric stations (d)

Table 1 Types of land cover (percentage and areas) on BMR

2.2 Meteorological station data

Observed daily minimum, Tmin; mean, Tmean; and maximum, Tmax, temperatures are the dependent variables of this study. These come from 48 meteorological stations, 37 of them belonging to the Servei Meteorològic de Catalunya (www.meteocat.cat) and 11 to the Agencia Estatal de Meteorología (www.aemet.es), for the year 2015. Data are obtained in both cases from automatic weather stations, and their quality is guaranteed by periodic instrumental controls of the two governmental institutions. Additionally, the 48 thermometric records are free of perturbations, such as sharp changes or artificial trends, in agreement with the results of the Buishand (1982) and Pettitt (1979) tests, as proposed by Wijngaard et al. (2003). Given that these possible perturbations cannot be detected by analysing only 1 year, the two mentioned tests have been applied to longer records including the year 2015, and the results have also been compared with a previous analysis of the thermometric regime in a wider area of Catalonia (Martínez et al. 2010). Figure 1d shows the spatial distribution of the stations, five of which are outside but very close to the studied region. The stations are well spread over the BMR except in the north, where they are scarce. Table 2 gives the main geographical and topographic variables of the thermometric station sites. Figure 2a and b show the histograms of altitudes for the 1 km2 pixels covering the whole BMR and for the set of meteorological stations, respectively. The available thermometric stations are mostly located below 600 m a.s.l.

Table 2 GIS characteristics of the stations: longitude (Lon), latitude (Lat), altitude above sea level (Alt), distance to shoreline (Con), orientation of the slope (Ori) and topographic slope (Slp). Ori equal to − 1.0 indicates flat terrains (slope equal to 0.0)
Fig. 2 Histograms of altitude for all the pixels (a) and meteorological stations (b). Histograms of NDVI (c) and NDBI (d) only for meteorological stations

2.3 Satellite data

The daily MOD11A1 LST measured by MODIS Terra, including daytime 10:30 UTC surface temperature, LSTd, and night time 22:30 UTC, LSTn, with 1 km2 spatial resolution, has been used in this study. The normalised difference vegetation index, NDVI, is obtained from the 16-day MOD13Q1 product (with a resolution of 250 m):

$$ \mathrm{NDVI}=\frac{\mathrm{NIR}-\mathrm{RED}}{\mathrm{NIR}+\mathrm{RED}} $$
(1)

where NIR is the near-infrared reflectance of band 2 (841–876 nm) and RED is the reflectance of the red band 1 (620–670 nm). The normalised difference built-up index, NDBI, is calculated as:

$$ \mathrm{NDBI}=\frac{\mathrm{MIR}-\mathrm{NIR}}{\mathrm{MIR}+\mathrm{NIR}} $$
(2)

where MIR is the surface reflectance of band 6 (1628–1652 nm) from the 8-day average MOD09A1 product (with a resolution of 500 m). Figure 2c and d show the histograms of NDVI and NDBI for the sites of the temperature stations. Half of the NDVI values are within the 0.3–0.5 interval, corresponding to sites with low vegetation cover. For the whole set of 1 km2 pixels, the modal value of NDVI is shifted toward 0.6. Consequently, rural areas are slightly predominant in comparison with urban domains. With respect to NDBI, it is worth mentioning that most of the station sites have values ranging from − 0.2 to 0.1. For the whole domain, the mode of NDBI is − 0.1, suggesting a slight predominance of non-urban areas.
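
As an illustration of Eqs. (1) and (2), the following minimal sketch computes both indices from reflectance arrays. It assumes that the NIR, RED and MIR bands have already been brought to a common grid (the resampling between the 250-m and 500-m products is not described here), and the function names and sample reflectances are hypothetical.

```python
import numpy as np

def normalised_difference(band_a, band_b):
    """Return (a - b) / (a + b); pixels with a zero denominator are set to NaN."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi(nir, red):
    """Eq. (1): normalised difference vegetation index."""
    return normalised_difference(nir, red)

def ndbi(mir, nir):
    """Eq. (2): normalised difference built-up index."""
    return normalised_difference(mir, nir)

# Hypothetical reflectances for three pixels (not taken from the study)
nir = np.array([0.40, 0.30, 0.25])
red = np.array([0.10, 0.15, 0.20])
mir = np.array([0.20, 0.27, 0.30])
print(ndvi(nir, red))
print(ndbi(mir, nir))
```

Both indices are bounded between − 1 and 1 for non-negative reflectances, which is consistent with the ranges of the histograms discussed above.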

2.4 GIS data and calendar day

Besides the satellite variables LSTd, LSTn, NDVI and NDBI, six other geographical and topographic variables are considered. These are latitude (lat), longitude (lon), distance to coast or continentality (con), altitude (alt), orientation (ori) and slope (slp) of the terrain for every meteorological station and pixel. The first three are derived with ArcGIS software (Geographic Information Systems, GIS). Altitude, orientation and slope are obtained from the digital terrain model (MDT, Institut Cartogràfic i Geològic de Catalunya, ICGC) with a 15 × 15 m2 resolution. Table 3 summarises the minimum, mean and maximum of LSTd and LSTn, NDVI and NDBI, together with geographic and topographic variables. Furthermore, the calendar day, cd, has been transformed into a new calendar day, cd*, according to:

$$ {\mathrm{cd}}^{\ast }=\cos \frac{2\pi \left(\mathrm{cd}-{\mathrm{cd}}_{\mathrm{max}}\right)}{365} $$
(3)

to obtain an approximately linear relationship with the air temperature (Janatian et al. 2017). Here, cdmax is the calendar day on which the mean temperature along the year is the highest. Figure 3a and b show the relationship between Tmean and cd or cd*, respectively. For the year 2015, cdmax is equal to 200 (July 19th). Figure 3b shows clearer signs of linearity between the air temperature and the transformed calendar day, cd*.
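
A minimal sketch of the transformation in Eq. (3) is given below. The default cd_max = 200 is the value reported for 2015; the function name and the 365-day year passed as a parameter are illustrative choices.

```python
import math

def transformed_calendar_day(cd, cd_max=200, days_in_year=365):
    """Eq. (3): cd* = cos(2*pi*(cd - cd_max) / 365)."""
    return math.cos(2.0 * math.pi * (cd - cd_max) / days_in_year)

# cd* equals 1 on the warmest day and decreases symmetrically on both sides of it
for day in (1, 100, 200, 300, 365):
    print(day, round(transformed_calendar_day(day), 3))
```

Because the cosine is symmetric around cd_max, days equally distant from the warmest day receive the same cd*, which is what makes a linear dependence of temperature on cd* plausible.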

Table 3 Mean, minimum and maximum of recorded diurnal (LSTd) and nocturnal (LSTn) temperatures, normalised difference vegetation index (NDVI) and normalised difference built-up index (NDBI), latitude (Lat), longitude (Lon), distance to shoreline (Con), altitude (Alt), orientation (Ori) and slope (Slp) for all 1 × 1 km pixels and for the 48 stations
Fig. 3 Evolution of Tmean with the calendar day, cd (a) and the transformed calendar day, cd* (b)

Figure 4a shows the dependence of Tmax on the orientation. This dependence is unclear and a linear relationship should be discarded. To address this lack of linearity, the orientation is expressed through its sine and cosine components. Figure 4b shows Tmax against the sine component, where a small increasing linear tendency is observed. Conversely, the cosine component does not show signs of a linear tendency.
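
The decomposition of the orientation into sine and cosine components can be sketched as follows. The sketch assumes that the orientation is given in degrees and that flat terrain is coded as − 1.0, as in Table 2; assigning 0.0 to both components for flat terrain is a hypothetical choice, since the treatment of flat sites is not specified in the text.

```python
import numpy as np

def orientation_components(ori_deg, flat_code=-1.0, flat_fill=0.0):
    """Return (sine, cosine) components of the orientation given in degrees.

    Sites coded as flat (flat_code) receive flat_fill in both components;
    this fill value is an assumption made only for illustration.
    """
    ori = np.asarray(ori_deg, dtype=float)
    flat = ori == flat_code
    rad = np.deg2rad(ori)
    ori_sin = np.where(flat, flat_fill, np.sin(rad))
    ori_cos = np.where(flat, flat_fill, np.cos(rad))
    return ori_sin, ori_cos

# Orientations of 0, 90, 180 and 270 degrees, plus one flat site
ori_sin, ori_cos = orientation_components([0.0, 90.0, 180.0, 270.0, -1.0])
print(np.round(ori_sin, 3))
print(np.round(ori_cos, 3))
```

Unlike the raw angle, the two components are continuous across the 0/360 degree boundary, so that neighbouring orientations receive neighbouring predictor values.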

Fig. 4 Dependence of Tmax on orientation, Ori (a) and sine component, ORIsin (b)

3 Methodology

The estimation of surface air temperatures is based on the relationships between variables obtained from satellite (LSTd, LSTn, NDVI and NDBI), geographic and topographic data (latitude, longitude, altitude above sea level, orientation, slope and distance to coast), the modified calendar day (cd*) and empiric data (Tmin, Tmean and Tmax) recorded at the thermometric stations.

The first step consists of computing the Pearson correlation coefficient for all possible pairs of data, including empiric data. In this way, possible relationships between assumed independent variables can be detected. Additionally, the dependence of empiric data on the set of independent variables can be determined. The rotated principal component analysis (RPCA) (Jolliffe 1986; Richman 1986; Preisendorfer 1988) is the second step. In this way, more detailed characteristics of relationships between independent variables can be established; particularly, the ratio of data variance explained by every rotated principal component, RPC, and the contribution (factor loading) of every independent variable in the RPCs. These strategies, Pearson correlation and PCA, have been also applied by Thanh et al. (2016).
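
The first two steps can be sketched with NumPy as shown below. The data are synthetic, and the varimax criterion is an assumption made for illustration, since the specific rotation applied in the RPCA is not stated in the text; the retention of components with eigenvalues above 1.0 follows the criterion used later in the results.

```python
import numpy as np

def pca_loadings(X):
    """PCA on the correlation matrix: returns the matrix, its eigenvalues and the loadings."""
    R = np.corrcoef(X, rowvar=False)          # Pearson correlations between columns
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]         # sort components by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return R, eigvals, loadings

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        grad = loadings.T @ (rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Synthetic stand-in for the predictors: 6 variables driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = np.hstack([latent[:, [0]] + 0.3 * rng.normal(size=(200, 3)),
               latent[:, [1]] + 0.3 * rng.normal(size=(200, 3))])

R, eigvals, loadings = pca_loadings(X)
keep = eigvals > 1.0                          # eigenvalue-greater-than-one criterion
rotated = varimax(loadings[:, keep])
explained = 100.0 * (rotated**2).sum(axis=0) / X.shape[1]
print(np.round(rotated, 2))
print(np.round(explained, 1))
```

The rotation redistributes the explained variance among the retained components without changing their total, which is why the rotated components tend to have more similar percentages of explained variance than the unrotated ones.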

The third step consists of a multiple regression process, carried out with the Statistical Package for the Social Sciences (IBM-SPSS), with an assumed linear relationship between empiric data and independent variables. The goodness of fit of the multiple regression is quantified by the coefficient of determination (square regression coefficient), R2, including all the significant independent variables according to P values and α = 0.05 (Harrell 2001), and by the residuals between empirical, emp, and estimated, est, temperatures, summarised by the root mean square error, RMSE, and the mean absolute error, MAE. These errors are computed as:

$$ RMSE={\left\{{N}^{-1}{\sum}_{i=1}^N{\left({emp}_i-{est}_i\right)}^2\right\}}^{1/2} $$
(4a)
$$ MAE=\left\{{N}^{-1}{\sum}_{i=1}^N\left|{emp}_i-{est}_i\right|\right\} $$
(4b)
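
Equations (4a) and (4b) translate directly into code. The sketch below uses hypothetical function names and sample temperatures.

```python
import numpy as np

def rmse(emp, est):
    """Eq. (4a): root mean square error between empirical and estimated values."""
    diff = np.asarray(emp, dtype=float) - np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))

def mae(emp, est):
    """Eq. (4b): mean of the absolute differences between empirical and estimated values."""
    diff = np.asarray(emp, dtype=float) - np.asarray(est, dtype=float)
    return float(np.mean(np.abs(diff)))

# Hypothetical observed and estimated temperatures (°C)
emp = [10.0, 12.0, 15.0]
est = [10.0, 12.0, 17.0]
print(rmse(emp, est), mae(emp, est))
```

RMSE is never smaller than MAE, and the gap between them grows when a few large residuals dominate.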

The multiple regression process is repeated adding step by step a new independent variable. In this way, the relevance of every variable on the multiple regression process is contrasted by observing the changes in R2 and RMSE. In addition to the coefficients of the multiple regression equations, the beta weights (standardised coefficients) are also computed to determine the relevance of every independent variable on the multiple regression equation.
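
The step-by-step procedure and the beta weights can be sketched as follows. This is a simplified illustration on synthetic data, not the SPSS procedure used in the study: at each step it adds the variable giving the largest R2, and it omits the significance test based on P values. All names and coefficients are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with intercept; returns coefficients, R2 and RMSE."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    est = A @ coef
    ss_res = np.sum((y - est) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot, np.sqrt(ss_res / len(y))

def forward_steps(X, y, names):
    """Add one variable per step (largest R2 first) and record R2 and RMSE."""
    remaining, chosen, history = list(range(X.shape[1])), [], []
    while remaining:
        best = max(remaining, key=lambda j: fit_ols(X[:, chosen + [j]], y)[1])
        chosen.append(best)
        remaining.remove(best)
        _, r2, err = fit_ols(X[:, chosen], y)
        history.append(([names[j] for j in chosen], r2, err))
    return history

def beta_weights(X, y):
    """Standardised coefficients: b_j * sd(x_j) / sd(y)."""
    coef, _, _ = fit_ols(X, y)
    return coef[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Synthetic predictors and response (hypothetical values)
rng = np.random.default_rng(1)
n = 300
lstn = rng.normal(10.0, 5.0, n)
lstd = lstn + rng.normal(8.0, 3.0, n)
alt = rng.uniform(0.0, 600.0, n)
X = np.column_stack([lstn, lstd, alt])
y = 0.6 * lstn + 0.3 * lstd - 0.005 * alt + rng.normal(0.0, 1.0, n)
names = ["LSTn", "LSTd", "alt"]

history = forward_steps(X, y, names)
for variables, r2, err in history:
    print(variables, round(r2, 3), round(err, 2))
print(np.round(beta_weights(X, y), 3))
```

Since the models are nested, R2 cannot decrease from one step to the next; the size of each increment indicates how much the newly added variable contributes beyond those already included.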

In summary, in comparison with other similar studies based on satellite and thermometric station data, it should be mentioned that in this paper, first, cross-correlation and principal component analysis (PCA) permit the detection of possibly redundant variables in the multiple regression process. Second, new geographic variables (orientation and slope) are tested. Third, both satellite thermometric variables (LSTd and LSTn) are used in the multilinear regression, whichever daily temperature (minimum, Tmin; mean, Tmean; or maximum, Tmax) is estimated. It should also be remembered that only variables with absolute values of beta weight exceeding 0.01 have been finally considered for the multiple regression equations.

4 Results and discussion

4.1 Correlation coefficients and PCA

The Pearson correlation and the RPCA, based on the principal component analysis, PCA, have made it possible to detect the degree of dependence between empiric data (Tmin, Tmean and Tmax) and the rest of the parameters (geographic and topographic variables and satellite data). Table 4 shows the Pearson correlation coefficients among all variables. The high correlations, ranging from 0.86 to 0.97, of Tmin, Tmean and Tmax with daytime and nighttime LST and also with cd* are outstanding. The correlation of the empiric temperatures with the other parameters is always within ± 0.23.

Table 4 Pearson correlation coefficients for empiric temperature data, satellite data, geographic and topographic variables and transformed calendar day. Non-significant coefficients (α = 0.05) are coded as ns

NDVI and NDBI are strongly correlated (− 0.79) as expected. There is also some correlation between these indices and the geographical variables latitude, longitude, altitude and slope, with values for NDVI between 0.26 and 0.52 and for NDBI between − 0.48 and − 0.36. It is important to mention that the correlations between the NDVI and NDBI indices with temperatures are low, with absolute values around 0.20. Latitude shows its highest correlation with longitude (0.62) and also with altitude (0.55), continentality (0.50), NDVI (0.48) and NDBI (− 0.48). Altitude has the highest correlation with continentality (0.60) and with latitude and NDVI index (0.52). The orientation has low correlations with all the other variables, although the highest ones correspond to those of the NDVI and NDBI indices. Finally, the slope, with low correlations in general, presents the highest ones with NDVI, NDBI, continentality and altitude.

Figure 5 shows some examples of the possible linear relationship between Tmin, Tmean or Tmax and some of the independent variables. The variables with the clearest linear relationship are, as expected, LSTd and LSTn according to the high correlation value obtained between these variables and the temperatures.

Fig. 5 Dependence of Tmax and Tmin on some of the variables of the multiple regression process

The first four RPCs, selected with the criterion of eigenvalues exceeding 1.0, have similar percentages of explained variance (from 23.4 to 14.6%). Whereas they explain 77.2% of data variance, the remaining eight RPCs are associated with 22.8% of data variance. Consequently, the review of the PCA results is centred on these first four RPCs. Tables 5a and 6a show the RPC factor loadings and the variance explained by the components for the set of thermometric stations. The first component, RPC1, is strongly related to LSTd, LSTn and cd*, explaining 23.4% of data variance. The second component, RPC2, is mainly correlated with continentality and altitude, and also related to slope, latitude and NDVI. This component explains 22.6% of data variance. RPC3 is highly correlated with longitude and latitude, and slightly negatively with NDBI. The third component explains 16.6% of data variance. Finally, RPC4 is notably correlated with the sine of the orientation, and moderately with the cosine of the orientation and with NDVI and NDBI. This last component explains 14.6% of data variance. It is worth mentioning that NDVI and NDBI have moderate weights on the second, third and fourth components, which suggests that these two variables will probably play a lesser role in the multiple regression equations than expected. Given that PCA is applied to 12 variables for the 48 thermometric stations, this relatively low number of samples in comparison with the high number of 1 km2 pixels could mask the relevance of some variables on the spatial distribution of Tmin, Tmean and Tmax. In order to detect these possible differences, RPCA has also been applied to pixels with 1 km2 resolution. Tables 5b and 6b summarise the RPC factor loadings and the variance explained by the components for the 4042 pixels of 1 km2. Comparing with the results of Tables 5a and 6a, the first RPC is quite similar in both cases.
The second RPC of Table 5a is equivalent to the third RPC of Table 5b, with the contribution of slope, NDVI and latitude almost disappearing. The third component (Table 5a), linked to latitude, longitude and NDBI, corresponds to the fourth RPC (Table 5b). The fourth RPC (Table 5a), with the contribution of NDVI, NDBI and orientation, is replaced by the fifth RPC (Table 5b), basically the sine of the orientation. Finally, the second RPC (Table 5b) is constituted by NDBI, NDVI and slope. In contrast to Table 5a, where NDVI and NDBI are linked with moderate weights to more than one component, in Table 5b these parameters are clearly related only to the second RPC. In summary, the degrees of independence of the multiple regression variables are quite similar whether data from the set of thermometric stations or from the denser network of 1 km2 pixels are considered. Only some discrepancies are detected when comparing the factor loadings of NDVI and NDBI for both spatial resolutions.

Table 5 Rotated principal components
Table 6 Total variance and percentage of variance and cumulated variance for every RPC

4.2 Multiple regression

4.2.1 Annual case

The first multiple regression is applied to the annual case, including all days of the year 2015 that satisfy two conditions: first, only cloud-free days can be selected, given that LSTn, LSTd, NDVI and NDBI cannot be accurately computed for cloudy days; second, days with missing Tmin, Tmean or Tmax are not chosen for the multiple regression procedure. The days satisfying both conditions are designated as complete data days, and the same dataset with the same constraints is used at seasonal and monthly scales. Table 7 gives the different models obtained at every step of the stepwise regression, with one more variable added at each step until all the significant variables are used. This table also gives the R2 coefficients and RMSE. The goodness of fit improves step by step, but with minor differences. For Tmin, Tmean and Tmax in the first step with a single variable, values of R2 from 0.860 to 0.935 and RMSE from 1.8 to 2.7 °C are obtained. When all significant variables are involved, R2 coefficients of 0.920, 0.955 and 0.918 and RMSE of 1.9 °C, 1.5 °C and 2.0 °C are reached.
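
The selection of complete data days can be sketched as a simple mask. The sketch assumes that cloudy retrievals and missing station temperatures are both encoded as NaN, which is a hypothetical convention; the function name and the sample values are illustrative only.

```python
import numpy as np

def complete_data_mask(lstd, lstn, ndvi, ndbi, tmin, tmean, tmax):
    """True where every satellite variable and every temperature is available."""
    stacked = np.vstack([lstd, lstn, ndvi, ndbi, tmin, tmean, tmax]).astype(float)
    return ~np.isnan(stacked).any(axis=0)

nan = float("nan")
# Hypothetical records for one station on four days
mask = complete_data_mask(
    lstd=[20.0, nan, 22.0, 25.0],    # day 2: cloudy, no LSTd retrieval
    lstn=[10.0, 11.0, 12.0, 13.0],
    ndvi=[0.4, 0.5, 0.3, 0.6],
    ndbi=[-0.1, 0.0, 0.1, -0.2],
    tmin=[8.0, 9.0, nan, 11.0],      # day 3: missing station Tmin
    tmean=[14.0, 15.0, 16.0, 17.0],
    tmax=[19.0, 20.0, 21.0, 23.0],
)
print(mask)
```

Only the records flagged as True would enter the regressions, so the number of samples, N, depends on both cloud cover and station data availability.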

Table 7 Models obtained with the stepwise regression analysis for Tmin, Tmean and Tmax at annual scale

Table 8 gives the standardised (beta weight) and non-standardised multiple regression coefficients for the last models of every dependent variable. In the case of Tmin, the variable with the highest beta weight is LSTn, while LSTd does not appear in the multiple regression equation because it is not significant. Latitude, longitude and calendar day also have prominent standardised coefficients. For Tmean, the most important variables are LSTn and LSTd, with the rest of the variables having small or non-significant coefficients. Finally, in the case of Tmax, the variables LSTd and LSTn have the highest beta weights and are also quite similar. Latitude, longitude and altitude also play a significant role in Tmax. It should be noted that NDVI and NDBI indices have small or non-significant coefficients.

Table 8 Standardised and non-standardised multiple linear regression coefficients for the annual case. Variables discarded from the multiple regression are coded as ns

Orientation and slope are the variables with the lowest beta weights for the three temperatures, possibly because a much denser network of stations would be necessary to represent the full variety of slopes and orientations. Figure 6 shows the bar histogram of the beta coefficients of the last model, including all the significant variables. The positive values for LSTn are outstanding and, to a lesser extent, so are the positive values for LSTd. Among the negative values, the beta weight corresponding to latitude for Tmin is the most relevant.

Fig. 6 Histogram of the beta coefficients of the last model at an annual scale for Tmin, Tmean and Tmax

The relationships between the temperatures obtained from the multiple regression equations and the observed Tmin, Tmean and Tmax are plotted in Fig. 7. While R2 for Tmean is 0.96, it is 0.92 for Tmin and Tmax. The RMSE (MAE) ranges from 1.5 (1.2) °C for Tmean to 2.0 (1.6) °C for Tmax. The histogram of Tmean residuals is also shown in this figure. Fifty-one per cent of the differences between estimated and observed temperatures are lower than or equal to 1.0 °C.

Fig. 7 Estimated versus observed Tmin, Tmean and Tmax, and the histogram for residual Tmean at an annual scale

4.2.2 Seasonal and monthly cases

Table 9 shows the R2, RMSE and the number of samples, N, with complete data for seasonal multiple regressions and for the different temperatures. The highest correlations correspond to Tmean for spring and autumn, possibly due to the moderate range of the temperatures in these seasons. The lowest R2 corresponds to the winter Tmin. The RMSE values do not exceed 2.0 °C, which corresponds to Tmax in summer.

Table 9 Square regression coefficient, R2, root mean square error, RMSE and number of samples, N, at seasonal scale

The standardised beta coefficients for each variable are given in Table 10, with only significant coefficients. The italic entries correspond to coefficients greater than 0.10, so that the most important variables in the multiple regression equations can be seen at a glance. Tmean is the temperature that depends on the smallest number of variables in any season of the year, especially in spring and autumn. For example, in autumn, only the LSTn and LSTd temperatures have notable beta coefficients, and in spring, cd* is also important. The NDVI and NDBI indices do not have high coefficients in any of the seasons. Only NDVI has values slightly higher than 0.10 in winter and summer for Tmax, while NDBI has a negative coefficient in spring for Tmax. The geographical variables that contribute most to the multiple regression are latitude, longitude, continentality and altitude, especially for Tmax and Tmin. Slope and orientation have small or non-significant beta weights.

Table 10 Standardised and non-standardised multiple linear regression coefficients for the seasonal scale. Variables discarded from the multiple regression are coded as ns

Table 11 shows R2, RMSE and N for every monthly multiple regression and for each temperature. For monthly cases, R2 is lower than for seasonal or annual cases, ranging from 0.503 for July (Tmax) to 0.867 for November (Tmean). However, RMSE has lower values than in the seasonal or annual cases, especially for Tmean, for which it ranges from 1.2 to 1.5 °C. The highest RMSE values correspond to April (Tmin) and July (Tmax), both with 2.1 °C. Figure 8 shows the RMSE for every month and Fig. 9 the estimated versus observed Tmean for November. Table 12 summarises the significant variables in the multiple regression process for every month and the different temperatures. A significant difference in comparison with the annual and seasonal scales is the relevance of LST. Whereas LST is the most relevant variable at annual and seasonal scales, some differences are detected at monthly scale depending on the specific month and on Tmin, Tmean and Tmax. The first multiple regression variable for Tmin is latitude for 8 months, longitude for 1 month and LSTn for only 3 months. Conversely, for Tmean, LSTn is the most relevant for 11 months; only for August is a slightly higher relevance of LSTd in comparison with LSTn detected. Finally, for Tmax, the first multiple regression variable is altitude for 5 months, LSTd for 5 months and LSTn for 2 months.

Table 11 Square regression coefficient, R2, root mean square error, RMSE and number of samples, N, at monthly scale
Fig. 8 RMSE for every month

Fig. 9 Estimated versus observed Tmean for November

Table 12 Significant variables in decreasing order exceeding Beta equal to 0.20 for the multiple regression process at monthly scale of Tmin, Tmean and Tmax

4.3 Spatial distribution of Tmin, Tmean and Tmax

Some examples of the spatial distribution of temperatures over the BMR obtained by multiple regression are shown in Figs. 10, 11 and 12. Figure 10 corresponds to Tmin, Tmean and Tmax obtained for November 29. The UHI phenomenon is quite evident for Tmin, with the highest temperatures associated with Barcelona city and a neighbouring area to the south of the city. Yellow, green and blue areas represent zones of lower temperatures, which correspond to the Littoral and Pre-Littoral chains (yellow and blue areas, respectively) and the Vallès valley (green area). The combined effect of the proximity to the Mediterranean coast and the UHI phenomenon is detected on the Tmean map, with the highest temperatures along a narrow littoral fringe. Two nuclei of high temperatures, spatially coincident with those observed for Tmin, are also detected. For Tmax, the effect of the proximity to the coast disappears and high temperatures cover a good part of the metropolitan region. Only at the northern extreme of the BMR (Pre-Littoral chain) are small green and blue areas with lower temperatures detected.

Fig. 10 Example of the spatial distribution of Tmin, Tmean and Tmax derived by multiple regression for November 29, 2015

Fig. 11 Winter season spatial distribution of average Tmin derived by multiple regression

Fig. 12 Monthly spatial distribution (winter season) of average Tmin derived by multiple regression

At the seasonal scale, the spatial distribution of average Tmin for the winter season is shown in Fig. 11. Figure 12 depicts the monthly average of Tmin for the winter months (January, February and March). Although the spatial distributions for the other three seasons are also obtained with a notable degree of accuracy, the winter case for Tmin is presented here because these four maps clearly manifest the UHI phenomenon in the BMR. The map of average winter Tmin reproduces the two nuclei of UHI, one over the downtown of Barcelona city and the other to the south along the coast. When the spatial analysis is repeated at the monthly scale, spatial patterns very similar to those observed for November 29 (Tmin) are found for the cold months of January and February. In the case of March (a more temperate month), the UHI is not so clear, but the two nuclei appear again.

4.4 Discussion of the results

With respect to the correlations among the dependent and independent variables used in the multiple regression, the high correlation (0.97) between the three daily temperatures (minimum, mean and maximum) and the two LST temperatures, and also the calendar day, is noticeable, as expected. The correlations are notably smaller for the rest of the independent variables, sometimes falling below 0.23. Another relevant point is that the results obtained by PC analysis are very similar whether the thermometric stations or the dense network of 1 km2 pixels is taken into account. Consequently, the relatively sparse distribution of the 48 thermometric gauges should not be a shortcoming for obtaining a reliable spatial distribution of temperatures, and a relatively good image of the thermometric variability over the BMR is thus obtained.

With respect to the multiple linear regression processes, the squared regression coefficients obtained at the annual scale are notably good, even though the RMSE varies within the range 1.5–2.0 °C, results quite similar to those obtained by Cristóbal et al. (2008) for the whole of Catalonia. At the seasonal and monthly scales, even though the squared regression coefficients are lower than those obtained at the annual scale, the RMSE values are very similar and do not exceed 2.1 °C. Consequently, the images of the spatial distribution of temperatures should be of similar quality at the annual, seasonal and monthly scales. Nevertheless, the highest RMSE values are detected at the monthly scale (Tmin for April and Tmax for July).

Another noticeable characteristic is the low weight of NDVI and NDBI in the multiple linear regression, which has also been detected in other similar analyses around the world. Although both indices could be relevant, as they represent the type of vegetation and building, respectively, the LST obtained from satellite data could itself include a great percentage of the information conveyed by NDVI and NDBI.

From an applied point of view, given that the thermometric maps obtained are subject to a maximum RMSE of 2 °C, a dense network of minimum, mean and maximum temperature data could be used to analyse thermometric phenomena (UHI and hot and cold outbreaks) affecting the quality of life and health of the BMR population. Additionally, data obtained with smaller pixels, for instance from the LANDSAT satellite, would permit a notable increase in the spatial resolution of temperatures. Unfortunately, the data available from LANDSAT are nowadays scarcer than those from the MODIS satellite, and the accuracy of the results would then be questionable.

5 Conclusions

Prior to the multiple regression process, the Pearson correlation coefficient and the PCA have made it possible to detect links between the empirical temperatures recorded at 48 meteorological stations and satellite data, geographic and topographic data and transformed calendar days. The PCA has also made it possible to assess whether the set of thermometric stations is appropriate for a good multiple regression process, by comparing the RPCs and factor loadings corresponding to the 48-station dataset and the 1 km2 pixel network. In spite of the very different spatial data density of the thermometric stations and the pixel coverage, only a few discrepancies are found, and these concern the factor loadings of NDVI and NDBI. Accordingly, a denser network of thermometric data would improve the role of NDVI and NDBI in the multiple regression. It is also worth mentioning the substitution of the topographic parameter of orientation (Ori) by cos(Ori) and sin(Ori): a slight improvement in relevance is detected when sin(Ori) is used instead of Ori in the multiple regression process.
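
The reason for replacing Ori by its sine and cosine can be illustrated with a minimal Python sketch. It assumes that the orientation is an angle in degrees (the angular convention is an assumption, not stated in the text), and the function name is hypothetical. Two slopes facing almost the same direction, 1° and 359°, are far apart as raw numbers but nearly identical as (sin, cos) pairs.

```python
from math import cos, hypot, radians, sin


def orientation_components(ori_deg):
    """Return (sin(Ori), cos(Ori)) for an orientation given in degrees.

    Assumption: Ori is an angle in degrees; the reference direction used
    in the paper is not specified, and it does not affect this argument.
    """
    angle = radians(ori_deg)
    return sin(angle), cos(angle)


# Two almost identical orientations on either side of the 0/360 wrap-around.
s1, c1 = orientation_components(1.0)
s2, c2 = orientation_components(359.0)

raw_gap = abs(359.0 - 1.0)                 # large gap in the raw variable
component_gap = hypot(s1 - s2, c1 - c2)    # small gap in (sin, cos) space
print(raw_gap, round(component_gap, 4))
```

A linear regression on the raw angle treats 1° and 359° as opposite extremes, whereas the two components vary continuously around the circle, which is consistent with the slight improvement reported for sin(Ori).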

A review of the multiple regression results shows the strong relevance of LSTn for Tmin and Tmean and of LSTd for Tmax at the annual scale, as obtained by Thanh et al. (2016). A similar pattern is observed at the seasonal scale. At the monthly scale, the relevance of LSTd for Tmin is not significant for February, April, May, June and September. Additionally, the relevance of LSTn for Tmax is not significant for September.

With respect to specific results at the annual scale, it is noticeable, first of all, that the best fit between the empirical temperatures and those generated by multiple regression is usually found for Tmean, and the worst for Tmax, in terms of R2 and RMSE. It is also noticeable that cd* only plays a relatively important role for Tmin and that the set of relevant variables is not the same for Tmin, Tmean and Tmax. At the monthly scale, whereas the best fits are obtained for Tmean from January to December, the worst fits are obtained for Tmax (June, July) and Tmin (April). At the seasonal scale, the results are quite different: whereas the minimum residual for Tmean is detected in winter and spring, for Tmin it is found in summer and for Tmax in autumn.

In short, the reasonably good results of the multiple regression process would permit:

  • Describing in detail (pixels of 1 km2) the spatial distribution of temperatures, notably improving on the spatial data density over the BMR provided by the thermometric network, without applying interpolations.

  • Obtaining detailed maps of the UHI phenomenon in urban areas. It should be remembered that these details of the UHI intensity could not be obtained from a few thermometric stations. In particular, two clear foci of high UHI intensity for Tmin in winter have been detected. Additionally, the smooth temperatures along the Mediterranean coast are verified by observing the Tmax and Tmean maps.
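
How a fitted regression equation yields a temperature for every pixel without interpolation can be sketched as below. This is a minimal Python illustration under stated assumptions: the intercept, the coefficients, the variable names and the pixel values are all hypothetical and do not correspond to any equation fitted in this work.

```python
def estimate_temperature(intercept, coefs, pixel):
    """Evaluate a linear regression equation for one pixel.

    coefs: dict name -> regression coefficient
    pixel: dict name -> predictor value at that pixel
    """
    return intercept + sum(b * pixel[name] for name, b in coefs.items())


def temperature_map(intercept, coefs, grid):
    """Apply the equation to every pixel of a grid (list of rows of dicts)."""
    return [[estimate_temperature(intercept, coefs, px) for px in row] for row in grid]


# Hypothetical equation and a tiny 1 x 2 grid (NOT values from the paper).
intercept = 2.0
coefs = {"LSTn": 0.8, "altitude": -0.005}
grid = [[
    {"LSTn": 10.0, "altitude": 0.0},    # coastal pixel at sea level
    {"LSTn": 5.0, "altitude": 400.0},   # inland pixel at 400 m
]]

tmin_map = temperature_map(intercept, coefs, grid)
print(tmin_map)
```

Each pixel receives its own estimate from its own predictor values, so the density of the resulting map is set by the pixel grid and not by the number of thermometric stations.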

A systematic and detailed spatial description of temperatures with the methodology used in this paper could be a significant improvement in the analysis of cold and hot outbreaks. It should be underlined that these analyses could be very useful to study the effects on the quality of life and health of the inhabitants of Barcelona city and its metropolitan area. In contrast to the relatively sparse distribution of thermometric stations used to analyse these outbreaks, the multiple linear regression method provides a more detailed (1 km × 1 km) spatial distribution of air temperatures.

Finally, comparing R2 and RMSE for the BMR with those obtained by Cristóbal et al. (2008) for the whole of Catalonia, better results are generally obtained for the BMR. For Tmin and Tmax, the best results are achieved for the BMR (R2 equal to 0.92 versus 0.54–0.57 and RMSE equal to 1.5–1.6 °C versus 2.3–1.8 °C), and for Tmean, R2 is again better for the BMR (0.96 versus 0.66). However, the RMSE for Tmean obtained for the whole of Catalonia (1.3 °C) is slightly better than that obtained for the BMR (1.5 °C). One reason for these differences could be that the BMR area is ten times smaller than Catalonia, implying a lower variability of temperatures and geographical characteristics and permitting a better description of the spatial distribution of temperatures. A higher spatial density of thermometric stations could be another factor favouring the BMR results.