1 Introduction

High-resolution daily temperature gridded datasets are widely used for many purposes. They serve as input data for numerous models across various research fields, such as agronomy, meteorology, hydrology, ecology, and climatology. Researchers use spatial or spatio-temporal interpolation methods to create maps from point data and covariates. Nowadays, point data are available from weather stations on a global level (e.g., GHCN (Menne et al. 2012), GSOD (https://data.noaa.gov/dataset/dataset/global-surface-summary-of-the-day-gsod)), regional level (ECA&D (Klein Tank and coauthors 2002)), and local (e.g., national hydrometeorological services) level. Furthermore, many of these point data sources have open data policy so they are easily accessible.

One needs to consider the extent, resolution, and support while performing an interpolation. In this case, the support is a time interval, an area, or a volume over which a measurement or prediction is made. A variety of gridded temperature datasets exists in various spatial and temporal resolutions and supports (an extensive list is available at https://www.esrl.noaa.gov/psd/data/gridded/). For example, researchers have investigated spatial ranges from 5° (Osborn and Jones 2014) to 250 m (Holden et al. 2016), temporal resolution ranges from 30 year period (PRISM Climate Group, Oregon State University, http://prism.oregonstate.edu/normals/) to daily (Kilibarda et al. 2014) gridded datasets, spatial extent ranges from areas covering the whole world (Kilibarda et al. 2014) to relatively minute area extents (Rosenfeld et al. 2017), and finally temporal extent ranges from more than 50 years (Oyler et al. 2015) to a single year period (Parmentier et al. 2015). However, the global datasets are not optimal for most of the applications mentioned above due to their coarse spatial and temporal resolution and insufficient accuracy. Coarser spatial and temporal support leads to the averaging of spatial and temporal variability. This results in the omission of microclimatic areas and short time phenomena. Due to the shortcomings of global models, there is a need for the development of local models that can produce gridded datasets at a much finer spatial and temporal resolution with improved accuracy.

Longitude, latitude, and elevation are most commonly used covariates in temperature modeling—especially in linear models (Wu and Li 2013; Yuan et al. 2014). Generalized additive models based on longitude, latitude, and elevation gave the best results for generating a gridded daily dataset for maximum air temperature surfaces at 1 km spatial resolution for the state of Oregon, USA (Parmentier et al. 2014). The elevation is often an essential covariate due to the average temperature decreases with altitude. The elevation is used either directly in temperature models (e.g., Jarvis and Stuart 2001) or in the form of a topographic index or any other DEM derivatives. For example, Dodson and Marks (1997) used elevation in the form of hydrostatic and potential temperature equations in the inverse distance weighting (IDW) method to interpolate the minimum and maximum temperature at 1 km resolution for the mountainous region in the US Pacific Northwest. However, many other covariates have been proven to be beneficial for temperature interpolation. Courault and Monestiez (1999) used general atmospheric circulation patterns along with elevation, Jarvis and Stuart (2001) introduced land cover as a covariate, specifically useful in modeling urban effects. In recent years, the moderate resolution imaging spectroradiometer land surface temperature (MODIS LST) is widely used as one of the most important covariates for temperature interpolation. Zhu et al. (2013), Hengl et al. (2012), Kilibarda et al. (2014, 2015), Williamson et al. (2014), Xu et al. (2014), Kloog et al. (2014), Parmentier et al. (2015), Huang et al. (2015), Stewart and Nitschke (2017), and Li et al. (2018) used MODIS as the main covariate for their models. The MODIS LST is highly correlated with surface measured air temperature, where specifically daytime images are correlated well with maximum temperatures and nighttime images with minimum temperatures (Oyler et al. 2016). The problem with MODIS LST images is that they have spatial and/or temporal gaps that need to be filled. Filling the gaps using spatial or temporal interpolation together with the processing of images are computationally consuming processes. Proximity to the sea, land cover, vegetation indices, canopy height, cloud cover, etc. are also used as covariates in temperature modeling.

The most commonly used methods for interpolation of temperature involve distance criteria methods (Dodson and Marks 1997; Srivastava et al. 2009), splines—being mostly thin plate splines (Jarvis and Stuart 2001; Hutchinson et al. 2009; Yuan et al. 2014; Stewart and Nitschke 2017), regression and geostatistical methods (Courault and Monestiez 1999; Kurtzman and Kadmon 1999; Hunter and Meentemeyer 2005; Carrera-Hernández and Gaskin 2007; Haylock et al. 2008; Perčec Tadić 2010; Hengl et al. 2012; Wu and Li 2013; Krähenmann and Ahrens 2013; Kilibarda et al. 2014), and recent machine learning techniques (Xu et al. 2014; Gasch et al. 2015).

Kriging has become a very popular interpolation method for temperature and other meteorological variables due to its ability to take into account spatial correlation, to estimate target variables at unobserved locations, and to quantify the uncertainty associated with the estimator. Courault and Monestiez (1999) used ordinary kriging (OK) to interpolate maximum and minimum temperatures at a 1 km spatial resolution for southeast France with the RMSE of 1–2 °C. Afterwards, regression kriging (RK) was introduced, and it was proven that it gives better results than OK (Hunter and Meentemeyer 2005; Carrera-Hernández and Gaskin 2007) or any other interpolation methods like distance criteria, regressions, and splines (Hofstra et al. 2008). Haylock et al. (2008) interpolated, amongst other variables, the mean surface temperature for Europe at 25 km spatial resolution (E-OBS) by using kriging with an external drift (KED) on anomalies from monthly averages. Perčec Tadić (2010) made 20 climatological (climatological normals) maps for Croatia for the period of 1961–1990 at a resolution of 1 km using regression kriging. Frick et al. (2014) provided gridded daily datasets of surface air temperatures for Germany at a 5 km spatial resolution using RK. Brinckmann et al. (2016) and Berezowski et al. (2016) interpolated the daily anomalies from the monthly averages using simple kriging (SK) for minimum and maximum temperatures for Europe. Spatio-temporal regression kriging (STRK) has recently become popular due to the development of R gstat (Pebesma 2004) and spacetime (Pebesma 2012) packages. Gräler et al. (2016) added extensions to the R gstat package for handling data formats from the R spacetime package, spatio-temporal variogram modeling, and spatio-temporal interpolation. Hengl et al. (2012) used STRK for interpolation of daily temperatures for Croatia and Kilibarda et al. (2014) for global land areas.

Nowadays, machine learning (ML) methods are becoming popular because they are easy to use and have decent accuracy performances. One of the reasons why we did not use ML methods in this study is that they cannot be easily explained (black-box approach). Even though there are some initiatives to establish a framework for spatio-temporal interpolation using ML (Hengl et al. 2018), the accuracy is still lower in comparison with RK. As opposed to an ML approach, the use of STRK spatial and temporal correlations can be recognized and explained through variograms.

The first objective of this research is to examine the performance of the existing global STRK model (STRK_global, Kilibarda et al. 2014) over Croatia using an independent station dataset from dense Croatian national meteorological observing network. The second objective is to develop more accurate local model for mean daily temperature for Croatia based on smaller number of covariates (without MODIS LST) with respect to already existing model, i.e., Hengl et al. (2012). Finally, validation results of the developed local model will be compared and discussed in relation to the (1) existing STRK_global without high density station dataset from Croatia and (2) existing local model relying on MODIS data as covariate.

2 Study area and datasets

2.1 Study area

Although Croatia is a medium sized European country, the diverse topography, openness toward the Pannonian Plain and position on the eastern Adriatic coast characterizes the country with three main climatic regions: continental, mountainous, and maritime (Zaninović et al. 2008). This climate diversity has inspired the testing of different climatic (Antonić et al. 2001; Hengl et al. 2012) or physical models, where the research of the strong and gusty bora wind is amongst the most interesting examples (Bajić 1989; Belušić and Bencetić Klaić 2004; Horvath et al. 2009; Ivatek-Sahdan and Ivancan-Picek 2006). The diversity of climate conditions is explored and mapped in detail for the most recent standard climate normal 1961–1990 as reported in the climate atlas of Croatia (Zaninović et al. 2008), where the large range of values of different temperature parameters are presented, amongst which are the mean monthly and annual temperature, annual number of frosty, warm, and days with summer nights. In the recent decade, the observed climate change in the region is especially supported by pronounced warming and extended dry periods (Cindrić et al. 2010), which emphasize the need for spatio-temporal interpolation of temperature on fine spatial and daily temporal scale. The maps produced by these studies can serve as data sources for climate assessment and monitoring.

2.2 Datasets

Two different sets of the measurements from meteorological stations were used, namely the Global Surface Summary of the Day (GSOD) and Croatian mean daily temperature dataset (CMDT). In addition, the digital elevation model (DEM) and topographic wetness index (TWI) were used as static covariates.

GSOD (https://data.noaa.gov/dataset/dataset/global-surface-summary-of-the-day-gsod) is a global dataset of the daily summaries of meteorological variables including mean daily temperature. The data is collected from over 9000 stations. Mean temperatures are measured with the precision of 0.1 °F (0.055 °C). Daily summaries are calculated only if there are a minimum of 4 observations at the station during the day. There are 48 GSOD stations in Croatia for the year 2008 (Fig. 1, blue circles). GSOD is used in this study because it provides more measurements of the mean daily temperature than other open datasets (e.g., GHCN-daily, ECA&D), and it allows the prediction of mean air temperature with the STRK_global model to be independent with respect to CMDT. For the purpose of this study, the mean temperatures were converted from °F to °C.

Fig. 1
figure 1

Spatial distribution of GSOD (blue circles) and CMDT (green squares) meteorological stations for mean daily temperature, CMDT stations which are included in GSOD dataset (orange diamond), and CMDT stations with missing DEM and TWI values (red triangles). DEMSRE3 is global relief model (DEM) produced by combining SRTM 30+ and ETOPO DEM, and TWISRE3 is derived from the SAGA GIS Topographic wetness index (TWI). DEMSRE3 and TWISRE3, both at a 1 km spatial resolution, were downloaded from worldgrids.org (not active anymore) (Hengl et al. 2015) (Fig. 2)

CMDT (http://spatial-analyst.net/book/HRclim2008) provides data from 159 stations in Croatia (Fig. 1, green squares). Furthermore, there are 57,282 measurements of the daily mean temperature available for the year 2008. A detailed description of this dataset is given by Hengl et al. (2012). The daily mean temperature is calculated as a weighted average of measurements taken at 07, 14, and 21 UTC. The precision of the measurements is 0.1 °C which is comparable with the GSOD dataset.

GSOD and CMDT datasets are stored in R STFDF objects (space-time full data frame) (Pebesma 2012), which are appropriate space-time objects, because the data exist for nearly all of the days at all of the stations’ locations. For each CMDT station, DEM derivatives are extracted and added as an attribute to STFDF. Not all 157 CMDT stations are used for accuracy assessment. The 9 coastal stations were not used for accuracy assessment (Fig. 1, red triangles) because static covariates DEM and TWI were missing for these stations due to a poorly defined coastline on 1 km DEMSRE3. Furthermore, the stations located in the vicinity of 2 km from the GSOD stations, 37 of them that are considered as duplicates, were not used to test the STRK_global predictions made by GSOD (Fig. 1, orange diamonds).

3 Methods

3.1 Spatio-temporal regression-kriging

In a purely spatial variant of regression-kriging, the multiple linear regression (MLR) and SK are combined into RK, where the MLR is used to model a trend, and the SK is used for modeling the spatial correlation between regression residuals. MLR finds the relationships between covariates and the observed variable. Then, it uses the MLR model on the covariates to make a prediction at unknown locations. Even though multi-iteration generalized least squares (GLS) represents an optimal solution for MLR trend modeling, Kitanidis (1993) showed that OLS produces almost the same results as GLS in the case of kriging. Based on stationarity assumption, which means that the mean and variance of the residuals are constant throughout the space-time domain (the mean is zero), SK can be used for MLR residual modeling. The spatial correlation between the MLR residuals can be explained using a spatial variogram. Thus, RK provides better results than MLR and SK (or OK) used independently, except in special cases. For example, when MLR describes all of the variability of the observed variable and residuals have no spatial structure, then there is no need for SK. In the opposite case, when there are no relationships between the covariates and the observed variable, then only SK (or OK) could be performed (Zhu et al. 2013). The term RK is often used interchangeably with universal kriging (UK) and KED. Although these interpolation methods have differences in the means of computation, the predictions and accuracy of the predictions are the same (proof Hengl et al. 2007, Appendix). UK was first presented by Matheron (1963). All the computation was performed in one step, and the trend was modeled as a function of coordinates. Wackernagel (1998) started to use the term KED as an improved version of UK by introducing auxiliary variables, i.e., covariates in trend modeling. Consequently, RK imposed itself as a two-step interpolation technique that separates trend and residual modeling (Ahmed and de Marsily 1987; Odeh et al. 1995). Therefore, the trend could be modeled using linear, machine learning, or any other regression technique. Considering that the target variable is air temperature, the spatial trend in RK should not be confused with e.g., “positive temperature trend” as is used in global warming research and discussions.

STRK is an extension of RK. Besides the spatial component, it considers the influence of the time component and the space-time interaction on a prediction, i.e., it replaces spatial RK with spatio-temporal RK. STRK is a suitable candidate for the modeling of mean daily temperatures because of its ability to describe the spatio-temporal variability of a certain variable. Following the STRK interpolation method, the mean temperature variable Z(s, t) that varies over space (s) and time (t) can be decomposed as (Heuvelink and Griffith 2010):

$$ Z\left(\boldsymbol{s},t\right)=m\left(\boldsymbol{s},t\right)+V\left(\boldsymbol{s},t\right). $$
(1)

In previous equation (Eq.1), m is a deterministic component of the variable (trend) and is modeled using MLR:

$$ m\left(\boldsymbol{s},t\right)=\sum \limits_{i=0}^p{\beta}_i{f}_i\left(\boldsymbol{s},t\right), $$
(2)

where the βi are regression coefficients estimated using ordinary least squares (OLS) and β0 is model intercept (by imposing f0 is equal to 1), the fi are covariates that are known over the spatio-temporal domain, and p is the number of covariates (Eq. 2).

V is a zero-mean spatio-temporal stochastic residual and is modeled using a spatio-temporal sum-metric variogram (Gräler et al. 2016):

$$ \boldsymbol{\gamma} \left(\boldsymbol{h},u\right)={\gamma}_S\left(\boldsymbol{h}\right)+{\gamma}_T(u)+{\gamma}_{ST}\left(\sqrt{{\boldsymbol{h}}^2+{\left(\alpha \cdotp u\right)}^2}\right), $$
(3)

where γ(h, u) denotes the semivariance of residuals at h units of a distance in space and u units of a distance in time, γS, γT are purely spatial and temporal components, γST is the space-time interaction component and α is a spatio-temporal anisotropy ratio which converts units of temporal separation (u) into spatial distances (h) (Eq.3).

3.2 Mean daily temperature model for global land areas

The mean daily temperature gridded dataset for Croatia at a 1 km spatial resolution was produced using the STRK_global implemented in the R meteo package (Kilibarda et al. 2014) and GSOD stations. The tmeanGSODECAD_noMODIS model from the tregcoef data of the R meteo package were used for trend estimation. Then the tmeanGSODECAD_noMODIS fitted variogram from the tvgms data of the R meteo package was used for residual prediction. An up-to-date version of R meteo package is available for download at https://r-forge.r-project.org/projects/meteo/.

Geometric temperature trend (GTT), DEM, and TWI are covariate layers used for MLR. The only dynamic covariate layer is GTT proposed by Kilibarda et al. (2014). GTT is a function of latitude (φ) and the day of the year (day). GTT is defined with the following function (Eqs. 4 and 5) for the mean daily temperature:

$$ GTT=30.4\cos \varphi -15.5\ \left(1-\cos \theta \right)\sin \mid \varphi \mid, $$
(4)

where θ is:

$$ \theta =\left( day-18\right)\ \frac{2\pi }{365}+{2}^{1-\operatorname{sgn}\left(\varphi \right)}\pi . $$
(5)

The original model for global land areas uses MODIS LST as a covariate (Kilibarda et al. 2014). However, the MODIS LST daily images have spatial gaps while MODIS LST 8-day images have temporal gaps due to cloud contamination, so those gaps need to be filled. Since the idea of our research was to develop a simple, accurate, and fast model for mean daily air temperature estimation, MODIS LST data are omitted. The STRK_global model is explained in detail by Kilibarda et al. (2014).

3.3 Mean daily temperature model for Croatia

In order to make a better estimation of the mean daily temperature for Croatia at a 1 km resolution, an adaptation of the presented STRK_global for mean daily temperature was made. The STRK_Croatia was developed using the data from CMDT. This dataset contains observations from more than 150 stations, which is about three times the amount compared with 48 GSOD stations used for the making of the STRK_global (Kilibarda et al. 2014). A trend model was made using the same covariates applied in the STRK_global: GTT, DEM, and TWI. Consequently, a spatio-temporal sum-metric variogram was made for the residuals calculated at the stations locations.

The trend modeling, estimation of the sample variogram, and fitting of the spatio-temporal variogram are performed in the R software (R Development Core Team 2012) using the lm base function and the vgmST and fit.StVariogram functions from the R gstat package. The code is available at http://osgl.grf.bg.ac.rs/materials/tac_hr/. The STRK_Croatia is now available in R meteo package, i.e., trend in tregcoef data and spatio-temporal variogram in tvgms data named hr. It was used to produce a local mean daily temperature gridded dataset for Croatia for year 2008.

3.4 Accuracy assessment

The accuracy of the STRK_global and STRK_Croatia is assessed by leave-one-out (LOO) and stratified 5-fold cross-validation. Before that STRK_global predictions made using GSOD stations were compared with the CMDT data. Five stratified folds are created using modified stratfold3d function of the R sparsereg3D package (https://github.com/pejovic/sparsereg3D, Pejović et al. 2018). This function creates stratified folds in three steps:

  1. 1.

    Stations are clustered using k-means clustering according to spatial location

  2. 2.

    Each cluster is split to folds, stratified according to the altitude of the station

  3. 3.

    Each final fold is obtained by merging one fold from each cluster.

Each of the folds was used once for cross-validation based on the data from the other four folds. This method is chosen because temperature observations in CMDT, as well as in GSOD, are not represented well enough in the areas at higher altitudes. Rather than LOO cross-validation, stratified cross-validation was used for two reasons. (1) It is better in terms of bias and variance (Kohavi 1995), and (2) it has an ability to separate data in such a way that each fold of the data is a representative sample of the whole dataset with regard to altitude and spatial distribution of the stations.

The coefficient of determination (R2) and root mean squared error (RMSE) were calculated as performance measures for both (STRK_global and STRK_Croatia) examined models. Also, the annual average RMSE per test or cross-validated station is calculated in order to find a cause of the worst results which occur at some stations. All of the figures used to present annual average RMSE per station are available as interactive maps at http://osgl.grf.bg.ac.rs/materials/tac_hr/. These maps were produced by R package plotGoogleMaps (Kilibarda and Bajat 2012).

Fig. 2
figure 2

DEM (left) and TWI (right) values for Croatia

4 Results

4.1 Mean daily temperature model for global land areas and prediction

The STRK_global was already implemented in the R meteo package (tmeanGSODECAD_noMODIS model, Kilibarda et al. 2014), so predictions were made for the limited area of Croatia. The spatio-temporal trend model for STRK_global is given by Eq. 6:

$$ trend=-2.44+1.02\ GTT+0.0004\ DEM-0.025\ TWI $$
(6)

The parameters of the fitted sum-metric variogram are shown in the Table 1.

Table 1 Sum-metric variogram parameters for the STRK_global (Kilibarda et al. 2014)

Kilibarda et al. (2014) found that GTT by itself explains 75% of mean daily temperature variations with a standard error of ± 5.7 °C which makes GTT the most important covariate of the model. Furthermore, they concluded that there is no pure temporal correlation between the residuals and also that the temporal correlation is caught by the spatio-temporal component.

Mean daily temperatures at a 1 km spatial resolution for Croatia for the year 2008 were estimated using above described STRK_global model and both GSOD and CMDT dataset. These datasets are available at http://osgl.grf.bg.ac.rs/materials/tac_hr/ in GeoTIFF format. Predictions for each pixel are made using the 30 nearest GSOD or CMDT stations, and observations at these stations are not only for a specific day, but also for the day before.

4.2 Mean daily temperature model calibration for Croatia and prediction

The estimated spatio-temporal trend for the STRK_Croatia is defined as:

$$ trend=18.73+0.86\ GTT-0.00921\ DEM-0.606\ TWI $$
(7)

This trend explains about 80% of the variation of the mean daily temperature with RMSE = 3.5 °C, and GTT by itself explains 74% of the mean daily temperature variation with a standard error of ± 4 °C.

In Fig. 3 the scatterplot of observations and predictions is presented (left). Residuals from the trend are normally distributed allowing for the kriging interpolation (Fig. 3, right). In Fig. 4 sample variogram and fitted sum-metric variogram are presented. The sample variogram shows that there is obviously a spatio-temporal correlation between the residuals and on account of this spatio-temporal kriging that is applicable. The parameters of the fitted sum-metric variogram are shown in Table 2.

Fig. 3
figure 3

The scatterplot of estimated mean daily temperature values from the trend for STRK_Croatia vs. observed values, (left). Histogram of the residuals from the trend for STRK_Croatia (right). It shows that residuals follow the normal distribution which justifies the use of the kriging

Fig. 4
figure 4

Sample variogram (left) and fitted sum-metric variogram (right) of the residuals from the trend model for STRK_Croatia. Variograms are presented in 3D

Table 2 Sum-metric variogram parameters for the STRK_Croatia

Mean daily temperatures at a 1 km spatial resolution for Croatia for the year 2008 were estimated using above described STRK_Croatia and CMDT dataset. They are also available at http://osgl.grf.bg.ac.rs/materials/tac_hr/ in GeoTIFF format. Predictions for each pixel were made using the 30 nearest CMDT stations and observations from them for a specific day and previous 6 days as it could be inferred from the range in the temporal component of the sum-metric variogram (Table 2).

The STRK_Croatia is also added to the R meteo package.

4.3 Accuracy assessment

STRK_global predictions based on GSOD stations was tested with CMDT. It is important to emphasize that these 111 stations from the CMDT were not used in the making of the STRK_global. The R2 of the test is 92.9% and RMSE is 2.1 °C. The annual average RMSEs per station are presented in the Fig. 5. The test shows that R2 is about 4% lower than for cross-validation (96.6%, Kilibarda et al. 2014), and RMSE is in a range of the result for the cross-validation for the whole world (2.4 °C, Kilibarda et al. 2014) and averaged for Croatia (2 °C, http://dailymeteo.org/node/3). These results are explainable by larger number of stations used by Kilibarda et al. (2014) since they merged ECA&D dataset with GSOD.

Fig. 5
figure 5

Annual average RMSE per station for testing of STRK_global predictions made by using GSOD stations (http://osgl.grf.bg.ac.rs/materials/tac_hr/). RMSE values are presented by the radius of the circles

The LOO cross-validation was performed for STRK_global and STRK_Croatia models with 148 CMDT stations (nine stations without DEM and TWI were excluded). The R2 of STRK_global and STRK_Croatia equals 94.4% and 97.8%, respectively, while the RMSE equals 1.9 °C and 1.2 °C, respectively. The annual average RMSE per station is presented in the Fig. 6. For the STRK_Croatia, three stations at altitudes higher than 1000 m got the highest RMSEs and they are around 3 °C (Fig. 7). All the other stations at altitudes lower than 1000 m got an RMSE less than 2.5 °C.

Fig. 6
figure 6

Annual average RMSE per station. Results of LOO cross-validation, STRK_global on the left and STRK_Croatia on the right (http://osgl.grf.bg.ac.rs/materials/tac_hr/). RMSE values are presented by the radius of the circles

Fig. 7
figure 7

Scatter plot DEM vs annual average RMSE from LOO cross-validation, STRK_global on the left and STRK_Croatia on the right. Stations at altitudes above 1000 m (red) in the top right corner have the highest RMSEs. Notice the smaller scale on the y-axis for the STRK_Croatia model

The 5-fold stratification folds from 148 CMDT stations are shown in Fig. 8. Each of the folds is a representative sample of the entire dataset considering the elevations and spatial distribution of the stations and considering that the median and mean of the folds do not differ more than 10 to 20 m from the median and mean of the whole dataset. The spatial distribution of the stations per fold is presented in the Fig. 9.

Fig. 8
figure 8

Boxplot of the altitude per fold

Fig. 9
figure 9

Annual average RMSE per station for 5-fold cross-validation, STRK_global on the left and STRK_Croatia on the right (available at http://osgl.grf.bg.ac.rs/materials/tac_hr/). RMSE values are presented by the radius of the circles

The results from the stratified 5-fold cross-validation show that the STRK_global explains about 95.3% of the variation with 1.7 °C RMSE, while proposed STRK_Croatia explains about 98.2% of the variation with 1.1 °C RMSE. These results are in agreement with the LOO cross-validation. The RMSE per station are presented in the Fig. 9 with different color coding for each fold.

The accuracy per month for both models was also assessed (Table 3). The STRK_global does not show noticeable seasonal differences in average monthly RMSEs. On the other hand, STRK_Croatia shows noticeably larger RMSEs in cold season with the largest RMSE in January and smaller RMSEs in warm season with the smallest RMSE in April. The improvements in changing from a global to local model are also larger in warm season.

Table 3 RMSE values [°C] for each month for STRK_global and STRK_Croatia obtained by LOO and 5-fold cross-validation and differences between them

5 Discussion

5.1 Global vs local model

Besides GTT, DEM and TWI proved to be significant covariates in the STRK_Croatia trend model (Eq. 7). They have a larger influence in the prediction of the STRK_Croatia compared with the one for STRK_global as is supported by the t test for the significance of the regression coefficients. The significant difference between the trend of the STRK_Croatia and the STRK_global is in the intercept value. The aforementioned value is 18.73 °C for Croatia, which is significantly higher compared with − 2.43 °C for the STRK_global. This can be explained by the mean annual temperature that is higher for Croatia (around 13 °C) than for the entire world (around 1 °C).

A striking difference between the STRK_Croatia and STRK_global fitted variograms is that a temporal variogram component appears in the STRK_Croatia. This means that there is a pure temporal correlation between the data in the range of 7 days. Nugget effects from the spatial and spatio-temporal components indicate that the short-range variability is 0.3 °C. This, in turn, shows that there is room for model improvement because the precision of the measurements being 0.1 °C is declared, which suggests that the stations with lower precision are presented in the data or are themselves potential outliers in data. As expected, the ranges and the anisotropy ratio (excluding the temporal component) for the STRK_Croatia are lower than for the STRK_global due to the higher density of stations and smaller spatial extent.

Accuracy assessment shows that the STRK_Croatia, which is an adaptation of the STRK_global, significantly improves interpolation accuracy by 3.3% in R2 and 0.6 °C in RMSE. When comparing accuracies per month (Table 3), the STRK_Croatia performs better than STRK_global in each month and the improvements are larger in warm parts of the year. The largest improvement is for April, from 1.84 °C to 0.93 °C and the smallest for January, from 1.87 °C to 1.51 °C RMSE. The average monthly RMSEs of the STRK_Croatia, for those that are larger in the cold season compared with the warm one, indicate that there are still some influences that modifies winter temperatures (like e.g., cold air pool and temperature inversions) that cannot be explained by the model. Similar conclusions were obtained in Hiebl et al. (2009) and in Perčec Tadić (2010) when comparing monthly normals and in Hiebl and Frei (2016) when comparing daily minimum and daily maximum temperatures. In these papers, the cold months/season had prediction errors that were larger than in warm months/season. The adjustment of the global model and a benefit of the larger observations density become obvious if we take a look at predictions in Fig. 11. The STRK_Croatia model shows a more pronounced spatial variability, especially in the mountainous regions. On the other hand, the STRK_global smooths the prediction because it was trained on the sparser station network for the whole world (Kilibarda et al. 2014). Further on, the spatial range of 221 km for the STRK_Croatia variogram is much shorter than 5130 km for STRK_global. Also, the spatial nugget of 2.24 °C for the STRK_Croatia variogram is much larger than 0.56 °C for STRK_global. This results in a loss of local variability and accuracy in the STRK_global. For the STRK_global, the highest errors occur in the western part of Croatia and near the coastline (Figs. 7 and 11) because it represents a mountainous region (Fig. 2). The STRK_Croatia managed to reduce errors not only in that region but for the whole area of Croatia. However, the error in the mountainous region is still higher compared with the other parts of Croatia. Figure 10 shows time series of predictions from LOO cross-validation and observations for Zavižan (1514 m) and Zagreb-Maksimir (121 m) stations. It can be noticed that both STRK_global and STRK_Croatia predict mean daily temperature with high accuracy at lower altitude (Fig. 10, Zagreb-Maksimir), which confirms Kilibarda et al.’s (2014) claim that STRK_global performs better for areas at lower altitude. STRK_Croatia predictions are much closer to observations with slight underestimation while STRK_global mostly overestimates mean daily temperature for stations at higher altitude (Fig. 10, Zavižan). This improvement at higher altitude was expected because STRK_global models variability of mean daily temperature for the whole world, while STRK_Croatia tends to explain variability just for Croatia. It looks like that STRK_Croatia predictions are STRK_global predictions shifted to observations values (Fig. 10, Zavižan) with small adjustments. This shift is a consequence of the shift in trend models, i.e., trend for STRK_Croatia performs better than trend for STRK_global. Another reason is that residuals for STRK_global and STRK_Croatia follow the same spatio-temporal patterns, even though residuals from STRK_global are larger than STRK_Croatia (Fig. 11).

Fig. 10
figure 10

Time series of predictions from LOO cross-validation (red—STRK_global, blue—STRK_Croatia) and observations (green) for station Zavižan and station Zagreb-Maksimir

Fig. 11
figure 11

Maps of predicted mean daily temperatures at 1 km spatial resolution using STRK_global on the left and STRK_Croatia on the right with CMDT stations for the first 4 days of January 2018 for Croatia

5.2 Mean daily temperature model for Croatia and comparison with other models

The mean temperature for Croatia has already been modeled by regression-kriging and the same dataset of CMDT stations in a previous study (Hengl et al. 2012). Latitude, longitude, DEM, topographically weighted distance from the coastline, and TWI were used as static, and DEM-derived total potential insolation (INSOL) and MODIS LST images as dynamic covariates in that study. Hengl et al. (2012) explained 86% of variation with 3.4 °C RMSE by MLR using these covariates, which is slightly better compared with 80% of the variation with 3.5 °C RMSE for the STRK_Croatia. This result may be explained by the larger number of covariates used and the effect of dynamic predictors in a model. Consequently, the fitted variograms are also different. The STRK_Croatia fitted variogram has lower nuggets, sills, and ranges, and the spatio-temporal component is also more significant. However, the overall accuracy is improved by 1.2 °C in RMSE and 7 s% in R2 (RMSE = 2.4 °C and R2 = 91% in Hengl et al. 2012), even though the MODIS LST images were omitted. The trend model proposed by Hengl et al. (2012), which includes MODIS, already explained a lot of spatial patterns and there was not much spatio-temporal relation left for SK to model. On the other hand, the simple STRK_Croatia trend model performed slightly worse but the fitted variogram explained more spatio-temporal variation. GTT explains a lot of temperature variation, which is comparable with MODIS LST. However, GTT obviously leaves a stronger spatio-temporal relation between residuals that can be explained by kriging.

When comparing the accuracy of other local (country) models at 1 km spatial resolution, STRK_Croatia performs similar or even better than some of them. Frei (2014) interpolated daily temperature at 1 km spatial resolution for Switzerland (European Alps) using nonlinear profiles and non-Euclidean distances and Rosenfeld et al. (2017) applied linear mixed effect models (3-step model with MODIS LST) for Israel. They both achieved RMSE of around 1 °C and R2 of around 97%. One must keep in mind that both Switzerland and Israel cover a smaller area (around 41,000 and 21,000 km2, respectively) than Croatia (57,000 km2), while the number of stations used for model development were comparable or larger (100 and 239, respectively). Nonetheless, the results of the STRK_Croatia are in the range of this accuracy. Huang et al. (2015) used linear regression models with MODIS LST as a covariate for central China’s Shaanxi Province, and Janatian et al. (2017) used a similar method with 11 more covariates in the eastern region of Iran. The accuracies of these two models (RMSE ranged from 2.5–3.5 °C and R2 was around 90%) are lower compared with STRK_Croatia because of a much larger area of these two countries and due to the fact that only around 20 stations were used for model development. Extensive research is available that interpolates daily minimum and maximum temperatures at 1 km spatial resolution for different areas. For example, Jarvis and Stuart (2001) performed the analysis for England and Wales, Zhu et al. (2013) for Xiangride River basin in the north Tibetan Plateau, Parmentier et al. (2014 and 2015) for the state of OR, USA, Oyler et al. (2015) and Li et al. (2018) for the conterminous USA, Hiebl and Frei (2016) for Austria. RMSE values were around 1–3 °C and R2 did not exceed 97%. Some of the models performed even better (RMSE below 1 °C) but the reason for this was due to a larger number of stations available. Others modeled daily temperature at coarser spatial or temporal resolution. Benali et al. (2012) provided weekly 1 km mean temperature estimations for Portugal (RMSE was 1.33 °C and R2 was 94.1%). Frick et al. (2014) provided 5 × 5 km gridded daily datasets of surface air temperature for Germany (RMSE was 1.39 °C and R2 was 98.3%). Brinckmann et al. (2016) provided daily mean temperature dataset for Europe at 5 km spatial resolution (RMSE was 1–2 K and R2 was 90%). Most of the above mentioned models are generally more complex or they use a large number of covariates including MODIS LST, which has a well-known problem with missing values and cloudiness. However, their accuracy is not better in comparison with STRK_Croatia. As a result, we recommend STRK_Croatia as a simple framework not only for mean but also for maximum and minimum temperature interpolation that can be applied to other countries or local areas.

There is still some room for model improvement in terms of mean daily temperature prediction at higher altitudes (specifically over 1000 m altitudes). Microclimate at higher altitudes is more complex. Also, insufficient number of stations and their distribution at higher altitudes do not cover temperature variability that could be explained by STRK (Kilibarda et al. 2015). Model underperformance and station deficiency problem at higher altitudes are also confirmed by Hengl et al. (2012). Many other publications point to the same problem. Perčec Tadić (2010) mapped monthly means of 20 climatological parameters, including the mean temperature, for the 1961–1990 period for Croatia with a resolution of 1 km. She proved that mapping accuracy is lower at higher altitudes due to the station deficiency problem. Dodson and Marks (1997) also confirmed that interpolation of the temperatures on higher altitudes will be biased toward temperature at lower elevations. Stahl et al. (2006) compared 12 interpolation methods for interpolating daily maximum and minimum temperatures over British Columbia , Canada, and the main conclusion was that the prediction was better with a denser distribution of stations on higher altitudes. Many others, like Krähenmann and Ahrens (2013) and Frei (2014) also have drawn the same conclusion. Benali et al. (2012) suggest that MODIS LST can improve accuracy in areas with low station density. MODIS LST could explain microclimatic conditions but the model will become more complex. The stations of neighboring countries are even more likely to improve the STRK_Croatia due to the fact that they have a significant impact on the prediction for the areas near the Croatian border.

6 Conclusions

Considering both accuracy and model simplicity, the STRK_Croatia has proved to be a good solution for production of high resolution mean daily temperature grids for local areas. Compared with STRK_global (Kilibarda et al. 2014), the improvement was made in 3.4% in R2 and 0.7 °C in RMSE. Suggested methodology uses only tree covariates, DEM, TWI, and GTT, and improves overall accuracy by 7% in R2 and 1.2 °C in RMSE in comparison with the study (Hengl et al. 2012), where seven covariates, including MODIS LST images, were used. Having in mind that all covariates (DEM, TWI, and GTT) used in our study are available in real-time, the proposed STRK_Croatia can be used for obtaining real-time temperature grids, which is not the case with models based on MODIS LST images. Most of existing temperature models are generally more complex or they use large number of covariates that also include MODIS LST. However, in most cases, their accuracy is lower in comparison with STRK_Croatia. Nonetheless, accuracy assessment shows that the STRK_Croatia model still does not perform well enough for the prediction of mean daily temperatures at higher altitudes (> 1000 m) by reporting similar errors as before with spatial or spatio-temporal interpolation methods for this area. Additional stations and measurements at higher altitudes and stations from countries around Croatia and MODIS LST could improve prediction accuracy at higher altitudes. This limitation makes the model most suitable for application on lower elevations such as in agriculture, health care, spatial planning, tourism, etc. Future research should focus on the enhancement of model prediction accuracy at higher altitudes. The proposed framework for the development of the STRK model could be applicable to any local area not only for mean but also for daily maximum and minimum temperature.