Introduction

River basin management is crucial for water allocation and distribution within a country or among several countries in transboundary river basins (Bai et al., 2016; Gao et al., 2010). Monitoring water availability and demand within a basin is a primary requirement of effective and sustainable river basin management, in which water availability mainly depends on the hydrology and ecology of the basin (Lakshmi et al., 2018). Water availability in a basin is significantly influenced by climate change (Bai et al., 2016; Moghim, 2018). Changes in streamflow considerably impact ecological systems and human societies (Bai et al., 2016; Mohammed et al., 2018b; Deliry et al., 2020). River basin management requires accurate delineation of watersheds and their stream channels based on the basin's terrain. For basin water budget estimation, information on soil, vegetation, and water budget components (i.e., precipitation, evapotranspiration, runoff, and surface and groundwater storage) is required.

Precipitation can be measured directly using in situ observations (rain gauges) and remote sensing techniques such as satellite sensors and weather radars (Tang et al., 2016; Shen et al., 2020). In gauge-based observations, given sufficient gauge density, spatial variations can be resolved (Gao et al., 2010); however, since gauge-based observations are point-based, uncertainty in precipitation values increases with distance from the measuring station (Kidd et al., 2017; Shen et al., 2020). In addition, in regions with sparse gauge stations, especially over mountainous areas, point-based rainfall observation leads to an uneven spatial distribution of gauge data. Ground-based rainfall observation is generally challenging due to high costs and unavailability in remote areas (Gao et al., 2010). Spatial variability is also generally high when evapotranspiration and terrestrial water storage change are measured at large scales via in situ methods (Gao et al., 2010; Lakshmi et al., 2018; Lv et al., 2017; Pan et al., 2020; Yin et al., 2019). Evapotranspiration is the sum of evaporation from the land surface and transpiration from plants, and it depends on many variables (i.e., solar radiation at the surface, land and air temperatures, surface winds, humidity, soil conditions, and vegetation cover and types). Terrestrial water storage (TWS) is a key component of the hydrological cycle and includes all forms of surface and subsurface water (Syed et al., 2008). Runoff variability can be monitored using hydrological stations; however, many basins in the world have few or no hydrological stations (Gao et al., 2010; Lakshmi et al., 2018).

Land surface models (LSMs) that simulate surface-atmosphere interactions have been efficient tools for studying the terrestrial water budget and for projecting and predicting land surface dynamics (Rodell et al., 2004; Gao et al., 2010; Bai et al., 2016; Fisher & Koven, 2020). Coupled water, energy, and carbon fluxes between the earth’s surface and atmosphere can be solved using mathematically represented models (Fisher & Koven, 2020). LSMs are the most sophisticated tools that can be used for global climate change studies because the spatial and temporal variability of water and energy cycles can be characterized using LSMs (Bai et al., 2016). The Global Land Data Assimilation System (GLDAS), provided by the National Aeronautics and Space Administration (NASA), offers uniform and frequent information about water and energy components (Rui et al., 2020). GLDAS integrates remote sensing and ground-based observations and provides quantities (e.g., evapotranspiration, runoff, and snow water equivalent) that cannot be directly observed by satellites (Rui et al., 2020) or provided by simple models. Despite the availability of advanced LSMs such as GLDAS, simulation of human-induced alterations within a large basin is still a challenge (Gao et al., 2010; Oliveira et al., 2014; Bai et al., 2016; Wang et al., 2016; Lv et al., 2017; Qi et al., 2020). Nevertheless, due to their high spatial resolution, hydrological models at smaller scales (microscale models) might be suitable for closing the water budget.

Satellite remote sensing products are becoming increasingly important in water resources management. Satellite remote sensing provides global coverage and spatially uniform data compared to ground-based non-uniform measurements. Satellite observations can provide reliable precipitation estimates on a global scale with fine spatial and temporal resolution; offering precipitation data over data-sparse regions is one of the distinct advantages of earth-observing satellites (Kidd et al., 2017). Many researchers have recently studied the performance of satellite-based rainfall estimates (Funk et al., 2015; Le et al., 2018; Hosseini-Moghari & Tang, 2020; Shen et al., 2020; Hsu et al., 2021); they have reported promising results and recommended remotely sensed precipitation products as an alternative for data-scarce areas in terms of temporal and spatial coverage. However, in regions with elevation variations, high spatial variability is expected in satellite-based precipitation estimations (Jia et al., 2020). Remotely sensed evapotranspiration can be calculated based on geophysical parameters observed by satellite sensors. Similarly, the potential of evapotranspiration products retrieved from space-borne remote sensing data has been evaluated by several researchers (Velpuri et al., 2013; Long et al., 2014; Du & Song, 2018; Dzikiti et al., 2019; Chen et al., 2020; Senay et al., 2020); despite uncertainties, the studies have revealed the advantages of satellite-based evapotranspiration products compared to expensive conventional methods. Although runoff cannot be obtained directly from satellite data, it can be inferred as a residual of the water budget (Sheffield et al., 2009; Gao et al., 2010; Lv et al., 2017).
Some individual components of TWS, such as surface water and soil moisture, can be measured using different satellite data; however, integrated measurement of TWS using remote sensing is only possible with the Gravity Recovery and Climate Experiment (GRACE) satellite mission (Jia et al., 2020; Syed et al., 2008). Global water mass changes can be inferred from changes in gravity using GRACE data, which was not possible before the launch of the GRACE satellites (Yin et al., 2019; Jia et al., 2020; Rzepecka & Birylo, 2020). GRACE terrestrial water storage change (TWSC) includes all aspects of change in water storage, including human alterations (Gao et al., 2010; Lakshmi et al., 2018).

Despite some limitations, the potential for satellite remote sensing to estimate the water budget is high and uncertainties vary from basin to basin (Long et al., 2014; Lakshmi et al., 2018; Yin et al., 2019); thus, more studies over different basins can better reveal the potential of satellite-based water budget estimation. This paper presents estimates of the Kizilirmak River Basin terrestrial water budget from remote sensing and GLDAS-2.1 model outputs and intercompares the results. Assessments are made for the water years 2014 and 2015. The main objective of this paper is to evaluate the performance of satellite remote sensing in water budget estimation and to analyze and compare the consistency of spatial patterns between satellite data and earth system-modeled data. We take precipitation data from two satellite-based remote sensing products (GPM IMERG and CHIRPS) and two models (GLDAS-2.1 CLSM and Noah). Evapotranspiration products are taken from Terra MODIS satellite data and three models (SSEBop, CLSM, and Noah). Terrestrial water storage is taken from GRACE satellite data and model outputs. Since no explicit runoff retrievals are made from satellite remote sensing, we infer runoff from remote sensing estimations based on water balance and compare it with runoff data taken from streamflow observations and two model outputs. We use the observed runoff data as a target to assess the feasibility of water budget closure from remote sensing data for ungauged rivers. We first process each dataset in the ArcGIS environment and evaluate each of the remotely sensed water budget components against data taken from GLDAS-2.1 model outputs and other satellite datasets. We then calculate the total water budget and runoff as a residual of the water budget and compare them with the model outputs and gauge measurements. Finally, we evaluate the uncertainties and discuss barriers to water budget closure using remote sensing data.

Materials and Methods

Study Area

The Kizilirmak River, with a length of approximately 1355 km, is Turkey’s longest river that originates and ends within the country. The river rises in the eastern part of Central Anatolia; it first flows to the west and south-west, then forms an arc and flows into the Black Sea through a delta (Fig. 1). The river collects water from many rivers as it passes, in turn, through the provinces of Sivas, Kayseri, Nevsehir, Kirsehir, Kirikkale, Ankara, Aksaray, Cankiri, Corum, and Samsun. Its main tributaries are the Delice, Devrez, and Gokirmak rivers. Central Anatolia is a region where drought is intense; since it is surrounded by mountains, the region is under the influence of a continental climate with hot and dry summers and cold and snowy winters, where the average air temperature is 13.7 °C (Yüce & Ercan, 2015). The Kizilirmak River is fed by rain and snow; it has its lowest flow in September and reaches its peak in April (Harmancioglu & Altinbilek, 2020). The Kizilirmak basin has ten major sub-basins; in some areas of the basin, the valley widens and turns into a plain. The drainage area is 82,197 km²; the annual average precipitation, evapotranspiration, and runoff are 451 mm, 243 mm, and 74.46 mm, respectively (Harmancioglu & Altinbilek, 2020; Selek & Aksu, 2020). There are 11 dams on the river (Ozturk & Sesli, 2015); the river supplies water to Ankara.

Fig. 1 Study area

Model Description

The NASA Global Land Data Assimilation System (GLDAS) project provides optimal fields of land surface states (e.g., soil moisture, temperature) and fluxes (e.g., evapotranspiration, runoff) by incorporating satellite- and ground-based observational data products as well as data assimilation techniques (Rodell et al., 2004; Rui et al., 2020). GLDAS drives multiple land surface models (LSMs) by integrating a large volume of global observation data. Currently, four LSMs are driven by GLDAS (Rui et al., 2020), namely Noah (Chen et al., 1996), the Catchment Land Surface Model (CLSM; Koster et al., 2000), the Community Land Model (CLM; Dai et al., 2003), and the Variable Infiltration Capacity model (VIC; Liang et al., 1994, 1996).

In this section, the reprocessed data products of GLDAS Version 2 (hereafter, GLDAS-2) are discussed. GLDAS-2 has three components: GLDAS-2.0, GLDAS-2.1, and GLDAS-2.2 (Rui et al., 2020). GLDAS-2.0 provides time-series data from 1948 to 2014, which are temporally consistent and forced completely with the Princeton meteorological input data (Sheffield et al., 2006). GLDAS-2.1 provides data from 2000 to present and is forced with a combination of observation and model data from the NOAA/GDAS (Global Data Assimilation System; Derber et al., 1991), GPCP (Global Precipitation Climatology Project; Huffman et al., 2001; Adler et al., 2003), and the AGRMET (Air Force Weather Agency’s AGRicultural METeorological modeling system). GLDAS-2.0 and GLDAS-2.1 products are publicly available and do not include data assimilation, whereas GLDAS-2.2 includes data assimilation from the Gravity Recovery and Climate Experiment (GRACE) and provides data from 2003 to present (Li et al., 2019a, 2019b, 2019c). The temporal resolutions of these products are 3-hourly and daily; the GLDAS-2 monthly products are generated from the 3-hourly products through temporal averaging. The monthly model outputs include three categories of data: water balance (such as rainfall rate, snowfall rate, surface and subsurface runoff, evapotranspiration, and soil moisture), energy balance (such as latent heat net flux, sensible heat net flux, and ground heat flux), and forcing parameters (such as temperature, wind speed, and short- and long-wave radiation). For a complete specification of the GLDAS-2 products, the reader is referred to Rui et al. (2020).

The Noah LSM was developed in 1993 by collaborating researchers from public and private institutions, initiated by the National Centers for Environmental Prediction (NCEP; Chen et al., 1996; Ek et al., 2003), and has been used operationally for climate predictions since 1996 (Bai et al., 2016). The Noah model continues to evolve by adding new functions and enhancing the existing equations of land surface processes (Cai et al., 2014). The Noah model for hydrological modeling has a multilayer soil structure that simulates the freezing and thawing of soil water in all layers. The Noah model describes soil water movement using the Richards equation and calculates the surface and subsurface runoffs based on a water balance scheme (Schaake et al., 1996; Bai et al., 2016).

The catchment LSM (Koster et al., 2000) was designed and is constantly being developed in NASA’s Global Modeling and Assimilation Office (GMAO; GES DISC, 2021). In the traditional LSMs, a grid cell is used as the land surface element, while CLSM uses a topographically derived hydrological catchment as the model’s basic computational unit (Xia et al., 2017). Groundwater is also included in the CLSM by associating the spatial distribution of water table depth to the catchment’s topography statistics (Koster et al., 2000; GES DISC, 2021). In the CLSM, three non-traditional bulk moisture variables (the catchment deficit, the surface layer excess, and the root zone excess) are used to represent the catchment moisture conditions—equilibrium conditions related to the distribution of water table and non-equilibrium conditions near the surface (Koster et al., 2000).

Data

In this study, remote sensing and earth system-modeled datasets (see Table 1) were used for estimating the water budget in the Kizilirmak River basin for the water years 2014 and 2015 (October 01, 2013 to September 30, 2015). A Digital Elevation Model (DEM) with 1 arc-sec (~ 30 m) grid resolution was obtained for the study area from the Shuttle Radar Topography Mission (SRTM). The DEM was used to delineate the basin and its stream network. GLDAS-2.1 Noah (Li et al., 2020b) and CLSM (Li et al., 2020a) models were selected to use their Level-4 monthly output data for relative comparisons because of their ability in representing groundwater as well as the high performance of the data assimilation framework (Getirana et al., 2017; Jung et al., 2019). The GLDAS-2.1 Noah and CLSM data products are available in 0.25° and 1° spatial resolutions, respectively. Monthly averaged Precipitation (P), Evapotranspiration (ET), Surface Runoff (R), and Terrestrial Water Storage (TWS) data outputs of the two models were downloaded from NASA’s Goddard Earth Sciences Data and Information Services Center (https://daac.gsfc.nasa.gov). Satellite-based hydrological datasets were obtained from different sources to evaluate the water budget estimation using only remote sensing data by comparing them with the model outputs and in situ observations.

Table 1 List of hydrological variables used in this study

Methods

To achieve the study aim, first, the basin and its stream network were delineated from the DEM data using Arc Hydro Tools in the ArcGIS software environment. Then image pre-processing was performed on the hydrological raster data to make them ready for analysis. Next, the variable units were converted to mm/month using the Raster Calculator function. Subsequently, monthly basin-averaged values were extracted from each variable using Zonal Statistics. For calculating the basin water budget, the monthly data were accumulated, and the calculations were performed based on the general water balance equation (Gao et al., 2010; Lakshmi et al., 2018; Yin et al., 2019). Finally, the yearly accumulated components were multiplied by the basin area to obtain total annual quantities. The general equation of water balance is given below.

$$P = {\text{ET}} + R + \Delta S \tag{1}$$

where \(P\) is precipitation, \({\text{ET}}\) is evapotranspiration, \(R\) is runoff, and \(\Delta S = \frac{dS}{dt}\) is the change in surface and subsurface water storage.
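As a rough illustration of the workflow described above, the following Python sketch converts a mean flux rate to a monthly depth, rearranges Eq. 1 to infer runoff as a residual, and converts a basin-averaged depth to a volume. It is not the ArcGIS procedure used in this study. The function names, the assumption that the input is a mean flux in kg m-2 s-1 (1 kg m-2 of liquid water equals 1 mm of depth), and all sample values are hypothetical.

```python
import calendar

SECONDS_PER_DAY = 86_400


def rate_to_mm_per_month(rate_kg_m2_s, year, month):
    """Convert a mean flux (kg m-2 s-1) to a monthly depth (mm/month).

    Assumes 1 kg m-2 of liquid water equals 1 mm of depth.
    """
    days = calendar.monthrange(year, month)[1]
    return rate_kg_m2_s * SECONDS_PER_DAY * days


def residual_runoff(p_mm, et_mm, ds_mm):
    """Rearrange Eq. 1 (P = ET + R + dS) to infer runoff: R = P - ET - dS."""
    return p_mm - et_mm - ds_mm


def depth_to_volume_bcm(depth_mm, area_km2):
    """Convert a basin-averaged depth (mm) over an area (km2) to billion m3.

    1 mm over 1 km2 is 1e-3 m * 1e6 m2 = 1e3 m3, i.e. 1e-6 billion m3.
    """
    return depth_mm * area_km2 * 1e-6


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from this study.
    p = rate_to_mm_per_month(2.0e-5, 2014, 1)
    r = residual_runoff(p_mm=p, et_mm=20.0, ds_mm=15.0)
    print(f"P = {p:.3f} mm/month, inferred R = {r:.3f} mm/month")
    print(f"Volume = {depth_to_volume_bcm(500.0, 80_000.0):.1f} billion m3")
```

Because the residual inherits the errors of all three input terms, a small bias in P or ET can translate into a large relative error in the inferred R.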

It is worth noting that water quantities used for irrigation or other domestic uses are not explicitly included in Eq. 1 because of the lack of a globally consistent method for estimation of such quantities (Lakshmi et al., 2018). Before and after launching earth observation satellites and introducing new products, studies are usually conducted to ensure the accuracy and quality of observations. Validation studies can be conducted by comparing the results with the in situ measurements, remotely sensed data, as well as model outputs. All the data used in this study have been extensively validated using in situ studies and other methods. Remotely sensed precipitation datasets such as TRMM and GPM have been independently evaluated by many researchers (Nicholson et al., 2003; Huffman et al., 2007; Xu et al., 2017; Hosseini-Moghari & Tang, 2020). MODIS retrieved evapotranspiration has been assessed in several studies (Mu et al., 2007; Kim et al., 2012; Velpuri et al., 2013; Gemitzi et al., 2017). GLDAS outputs and the forcing data have also been validated in numerous studies (Lohmann et al., 2004; Luo et al., 2007; Zaitchik et al., 2010; Rodell et al., 2011; Chen et al., 2013; Wang et al., 2016; Bai et al., 2016).

This study focuses on the comparison of remotely sensed water budget components with model outputs and in situ observations to analyze their spatial patterns and the correlation between them. For this purpose, water budget components from satellite observations were compared with those of the GLDAS-2.1 Noah and CLSM outputs and available station observations. Modeled precipitation was compared with remote sensing data (IMERG) and integrated station and remote sensing data (CHIRPS). Modeled evapotranspiration was compared with the MOD16 ET and actual ET from the SSEBop model. Modeled runoff data were evaluated against the stream gauge observations and the runoff inferred from the water balance equation. For evaluating TWSC, comparisons were made between the GRACE and GLDAS model data. Relative comparisons were performed, and the coefficient of determination (R²) was used as a metric to assess agreement between the model and remote sensing data. The coefficient of determination is the square of the correlation coefficient; it indicates the proportion of variance explained and ranges between 0 and 1, with higher values indicating better agreement.
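The metric described above can be sketched in a few lines of Python. This is a minimal illustration, assuming R² is computed as the square of the Pearson correlation between two paired monthly series; the function name and the sample values are hypothetical and do not come from this study.

```python
import math


def coefficient_of_determination(x, y):
    """Return R2 as the square of the Pearson correlation between x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R2 is undefined for a constant series")
    r = sxy / math.sqrt(sxx * syy)
    return r * r


if __name__ == "__main__":
    # Hypothetical monthly values (mm/month) from two products.
    product_a = [1.0, 2.0, 3.0]
    product_b = [1.0, 3.0, 2.0]
    print(coefficient_of_determination(product_a, product_b))  # 0.25 up to rounding
```

Note that R² measures linear association only: two products can have a high R² while one is systematically higher than the other, so the metric is best read alongside the time series themselves.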

Results and Discussion

Remote sensing data products and GLDAS-2.1 model outputs were used for estimating water budgets in the Kizilirmak River Basin for the water years 2014 and 2015. Quantities of precipitation, evapotranspiration, runoff, and terrestrial water storage change were calculated for each year. This section evaluates satellite-based water budget components by comparing them with model outputs and measured data.

Precipitation

Figure 2 shows the spatial distribution of total precipitation over the water years 2014 and 2015 from remote sensing observation (GPM IMERG) and GLDAS model outputs (Noah and CLSM). Since GLDAS outputs have low spatial resolution, extracting raster by mask can lead to data loss; considering this issue, the shape extent coordinates were used, and then the area average values were extracted using the Zonal Statistics tool in ArcGIS software. From Fig. 2, it can be seen that GPM IMERG underestimates precipitation over the southeastern regions, while both models overestimate in the eastern regions and underestimate over the northern (coastal) areas.
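The concern about losing coarse grid cells at the basin boundary can be illustrated with an area-weighted basin mean. The Python sketch below is a hypothetical alternative for illustration only; the study itself used the Zonal Statistics tool in ArcGIS. The function name, the use of the basin area inside each cell as the weight, and the sample values are all assumptions.

```python
def basin_weighted_mean(cell_values, cell_weights):
    """Weighted mean of grid-cell values.

    cell_weights holds the basin area (or area fraction) inside each cell;
    cells with no data (None) or zero overlap are ignored.
    """
    if len(cell_values) != len(cell_weights):
        raise ValueError("values and weights must have the same length")
    pairs = [
        (v, w) for v, w in zip(cell_values, cell_weights)
        if v is not None and w > 0
    ]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no valid cells overlap the basin")
    return sum(v * w for v, w in pairs) / total_weight


if __name__ == "__main__":
    # Hypothetical monthly precipitation (mm) in four coarse cells and the
    # fraction of each cell that lies inside the basin.
    values = [40.0, 60.0, None, 50.0]
    fractions = [0.5, 1.0, 0.8, 0.25]
    print(f"{basin_weighted_mean(values, fractions):.2f} mm")
```

With this kind of weighting, a partially covered cell still contributes in proportion to its overlap instead of being dropped by a mask.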

Fig. 2 Average annual total precipitation for 2014 and 2015 from (a, d) GPM IMERG (remote sensing observation), (b, e) GLDAS Noah, and (c, f) GLDAS CLSM

Figure 3 compares monthly basin-averaged precipitation datasets. The comparison of the four datasets illustrates that the satellite-based remote sensing observation (GPM IMERG) and the combined rain gauge and satellite observation (CHIRPS) provide lower precipitation rates, whereas both GLDAS models present higher rates. The differences between the datasets can be due to the different forcing data used in the models. Scatter plots show a strong linear correlation between the GLDAS CLSM and Noah precipitation datasets (see Fig. 3c). Figure 3a shows higher consistency between GPM IMERG and CHIRPS than with the model outputs. Pairwise comparison of GPM IMERG and CHIRPS shows a positive correlation with R² = 0.79 (Fig. 3b), while a better linear association (R² > 0.85) can be seen between GPM IMERG and the GLDAS models (see Fig. 3d, e). Compared to CHIRPS, GPM IMERG tends to overestimate the amount of precipitation by 2% to 50%; this is consistent with the results of Hosseini-Moghari and Tang (2020). CHIRPS datasets are reliable gridded precipitation datasets available globally because they combine rain gauge and satellite observations and have been validated in several studies (Dinku et al., 2018; Haghtalab et al., 2019; Katsanos et al., 2016). Alejo and Alejandro (2021) reported that CHIRPS showed adequate performance in their validation study, and they recommended using these data in water resources planning for regions with data scarcity and sparse weather monitoring networks. On the other hand, Hsu et al. (2021) showed that IMERG performed slightly better than CHIRPS in their study area. Precipitation is one of the most complex processes in the hydrologic cycle; thus, diverse inputs and retrieval algorithms lead to different estimates, especially over mountainous regions.

Fig. 3 Comparison of (a) monthly basin-averaged precipitation and pairwise scatter plots of (b) GPM IMERG versus CHIRPS, (c) CLSM versus Noah, (d) GPM IMERG versus Noah, and (e) GPM IMERG versus CLSM

Evapotranspiration

Figure 4 illustrates the annual area-averaged total evapotranspiration for the water years 2014 and 2015 from remote sensing observation (MOD16) and the Noah and CLSM models. The figure shows that the MOD16 product underestimates ET compared to the model outputs. Although the spatial resolutions of ET maps derived from the model outputs are very low, the patterns show some similarities.

Fig. 4 Average annual total evapotranspiration for 2014 and 2015 from (a, d) MOD16 (remote sensing observation), (b, e) GLDAS Noah, and (c, f) GLDAS CLSM

Monthly basin-averaged evapotranspiration derived from satellite data (MOD16) and the outputs of the three models (SSEBop, GLDAS Noah, and GLDAS CLSM) are shown in Fig. 5. The figure shows a high correlation between Noah and CLSM, both of which provide the highest ET throughout the water years. The ET values from the SSEBop model are the lowest, especially during the wet seasons. According to Alemayehu et al. (2017), since land surface temperature retrieved from remote sensing data is the primary forcing data for the SSEBop model, the weak performance of the SSEBop model is mainly associated with the use of a constant calibration coefficient for determining the cold reference temperature. The remote sensing product shows lower values, ranging from 40 to 60% of the modeled (GLDAS) ET. Depending on basin characteristics, MOD16 ET may have significant uncertainties (Velpuri et al., 2013; Du & Song, 2018; Dzikiti et al., 2019; Souza et al., 2019). Since ET is an essential component of water budget estimation and in situ ET measurements are difficult to obtain, we also added the SSEBop model ET product, which is based on satellite thermal data and assimilated weather fields (Senay et al., 2013), for better comparison. In the study conducted by Kim et al. (2012), MOD16 actual ET showed reasonable accuracy. Velpuri et al. (2013) evaluated MOD16 and SSEBop ET data. They reported that MOD16 ET was effective in their study; nonetheless, they indicated that both MOD16 and SSEBop have their advantages and limitations in different land cover classes. Because of the lack of data, the ensemble mean ET from LSMs is usually used for estimating the water budget (Jimenez et al., 2011; Mueller et al., 2011; Lv et al., 2017; Yao et al., 2017; Wartenburger et al., 2018; Pan et al., 2020).

Fig. 5 Comparison of monthly basin-averaged evapotranspiration extracted from remote sensing data and products of the three models

Terrestrial Water Storage Change

Terrestrial water storage is a key component of the water cycle. Estimates of TWSC derived from GRACE and the GLDAS models were calculated by taking the difference of monthly TWS over the study period. Changes in TWS derived from the two model outputs and the GRACE product are shown in Fig. 6. Inconsistencies between the model outputs and GRACE TWSC are seen over most of the regions. Figure 7 compares monthly basin-averaged TWSC values from the models and remote sensing data. Some months are missing in the GRACE archive (the missing months for our study period are Feb-2014, Jul-2014, Dec-2014, Jun-2015, and Oct-2015). Data gaps in GRACE have occurred in many years since 2011 due to the active management of the aging satellite batteries (Cooley & Landerer, 2021).
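The differencing step can be sketched as follows. This is a minimal Python illustration, assuming a simple backward difference between consecutive months (the text does not state which differencing scheme was used); the function name and the sample values are hypothetical.

```python
def storage_change(tws_mm):
    """Backward difference: TWSC[t] = TWS[t] - TWS[t-1].

    Missing months are given as None. A change is undefined (None) for the
    first month and whenever either month of a pair is missing.
    """
    change = [None]  # no previous month for the first value
    for prev, curr in zip(tws_mm, tws_mm[1:]):
        if prev is None or curr is None:
            change.append(None)
        else:
            change.append(curr - prev)
    return change


if __name__ == "__main__":
    # Hypothetical monthly TWS anomalies (mm) with one missing month.
    tws = [12.0, 15.5, None, 9.0, 4.0]
    print(storage_change(tws))  # [None, 3.5, None, None, -5.0]
```

The example shows why gaps matter for TWSC: a single missing month of TWS removes two consecutive monthly change values unless the gap is filled first.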

Fig. 6 Annual total terrestrial water storage change over the years 2014 and 2015

Fig. 7 Comparison of monthly basin-averaged TWSC derived from the GLDAS model outputs and remote sensing data product

Lv et al. (2017) and Long et al. (2015) recommended filling missing monthly data by interpolating between the two adjacent months. In our case, this interpolation showed larger deviations from the mean, so we filled the missing months with the average of the five previous years instead. As a result, for some months (e.g., Jan-2014, Feb-2014, Jun-2014), significant variations between GRACE and modeled values are observed (Fig. 7). In addition to the data gap, the differences could be due to many reasons, details of which can be found in the GRACE L-3 Product User Handbook (Cooley & Landerer, 2021). Due to the coarse resolution of GRACE (330 km × 330 km), spatial signal leakage from surrounding areas is possible, especially at the sea boundary (Gao et al., 2010; Yin et al., 2019; Cooley & Landerer, 2021). When the orbit is close to an exact repeat, the monthly grids have larger inaccuracies, resulting in inaccurate gravity field calculations (Cooley & Landerer, 2021).
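The two gap-filling options discussed above can be compared with a short Python sketch. It is only an illustration under stated assumptions: "the average of five previous years" is interpreted here as the mean of the same calendar month in the five preceding years, the adjacent-month option is taken as the midpoint of the two neighboring months, and the function names, the dictionary layout keyed by (year, month), and the sample values are hypothetical.

```python
def fill_with_previous_years(series, year, month, n_years=5):
    """Mean of the same calendar month in the n_years preceding `year`."""
    values = [series.get((year - k, month)) for k in range(1, n_years + 1)]
    values = [v for v in values if v is not None]
    if not values:
        raise ValueError("no data available in the preceding years")
    return sum(values) / len(values)


def fill_with_adjacent_months(series, year, month):
    """Midpoint of the months immediately before and after the gap."""
    prev_key = (year, month - 1) if month > 1 else (year - 1, 12)
    next_key = (year, month + 1) if month < 12 else (year + 1, 1)
    prev_val, next_val = series.get(prev_key), series.get(next_key)
    if prev_val is None or next_val is None:
        raise ValueError("an adjacent month is also missing")
    return (prev_val + next_val) / 2


if __name__ == "__main__":
    # Hypothetical TWS anomalies (mm); Feb-2014 is the missing month.
    tws = {
        (2009, 2): 30.0, (2010, 2): 40.0, (2011, 2): 35.0,
        (2012, 2): 45.0, (2013, 2): 50.0,
        (2014, 1): 20.0, (2014, 3): 70.0,
    }
    print(fill_with_previous_years(tws, 2014, 2))   # 40.0
    print(fill_with_adjacent_months(tws, 2014, 2))  # 45.0
```

The multi-year mean preserves the typical seasonal level but ignores conditions in the gap year itself, whereas the adjacent-month midpoint tracks the current year but fails when neighboring months are also missing.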

Furthermore, uncertainties in P, ET, and R lead to uncertainties in TWSC. Another explanation for the discrepancies between the model and GRACE TWSC might be the lack of lake and river modules in the GLDAS models (Gao et al., 2010; Xia et al., 2017; Lakshmi et al., 2018). Considering the low spatial resolution of the GRACE and GLDAS data, comparable TWSC results may be obtained over a large basin (> 150,000 km²) because the effective spatial resolution of GRACE is around 150,000 km² (Li et al., 2019a, 2019b). GRACE data have been used in many studies related to water balance for obtaining TWS anomalies for a given time period (Rodell et al., 2007; Landerer & Swenson, 2012; Ouma et al., 2015; Xiao et al., 2015; Jia et al., 2020; Rzepecka & Birylo, 2020). Nevertheless, the data gap in the GRACE archive is a crucial challenge for estimating monthly TWSC (Li et al., 2019c; Wang et al., 2021). Moreover, due to the coarse sensor resolution, GRACE data are not practical for basins at smaller scales (Lakshmi, 2016; Lakshmi et al., 2018), which is the major limitation of GRACE and GRACE-FO. For small watersheds, total water change can be estimated using remote sensing P and ET data and surface runoff from observation stations. Ensemble mean TWSC from LSMs is also used for reducing uncertainties when using model data (Li et al., 2019c).

Runoff

Figure 8 shows annual total runoff from the Noah and CLSM models and the calculated residual (P-ET-TWSC) from the water balance equation for the water years 2014 and 2015. Since R cannot be obtained directly from satellite data, we calculated the residuals to see whether those values can be interpreted as runoff in the basin. The aim of this study is not to close the water balance but to examine the behavior of each component over the Kizilirmak Basin. For this reason, residuals were also calculated from the model data to see how they differ from the actual R. The figure indicates that the exact R values cannot be obtained even from the residuals of the model outputs. Figure 9 compares monthly modeled runoff values with the stream gauge flow rates.

Fig. 8 Average annual total runoff for 2014 and 2015 from (a, c) GLDAS Noah and water balance equation residual, (b, d) GLDAS CLSM and water balance equation residual, and (e) GRACE water balance residual

Fig. 9 Comparison of monthly basin-averaged modeled runoff with the runoff obtained from gauge observation

Figures 8 and 9 show significant differences in R values derived from the Noah and CLSM models. There are also considerable variations between the modeled R and the stream gauge observation data. GLDAS simulates runoff, which is not directly comparable to observed streamflow at the basin outlet; to obtain a more accurate R that is comparable to streamflow, river routing models are used (Li et al., 2013; Bai et al., 2016). Since streamflow routing is not included in the GLDAS-2.1 simulations, the errors in the modeled R are quite large compared with the observed values. The results show that modeled R values vary significantly from the observed streamflow; this difference is consistent with the results of previous studies (Bai et al., 2016; Lv et al., 2017; Pan et al., 2017; Yin et al., 2019; Liu et al., 2020; Qi et al., 2020). Nevertheless, Noah runoff seems to be closer to the in situ values than CLSM runoff. From Fig. 8e, it can be seen that the inferred runoff is greatly overestimated; thus, it is difficult to consider the R inferred from the water balance equation residual as discharge. However, since obtaining streamflow records over regions with sensitive water issues is challenging, particularly over transboundary river basins, the residual value can still serve as an approximate estimate of the runoff over a basin. Although modeled runoff data have limitations, such as the neglect of water management practices due to their low spatial resolution, these data are useful because of their temporal resolution and global coverage.

Comparison of P-ET-R and TWSC

Figure 10 presents area-averaged monthly P-ET-R and water storage change from the GLDAS CLSM model. Since runoff cannot be obtained directly from remote sensing observations, and TWSC from GRACE has data gaps, the comparison was made using the GLDAS model output to see the variations. Figure 10a shows that P-ET-R and TWSC follow the same temporal variability of the water equivalent thickness anomaly because both reflect increases and decreases in water storage. A positive P-ET-R value corresponds to a positive value of TWSC and vice versa. Figure 10b shows the correlation between P-ET-R and TWSC from GLDAS CLSM outputs.

Fig. 10 Comparison of (a) time series and (b) scatter plot of P-ET-R and TWSC from GLDAS CLSM outputs

The amount of groundwater withdrawal in a basin can be obtained from the difference between P-ET-R and ΔS (Lakshmi et al., 2018), but the dynamics of withdrawal are quite complicated. To determine the actual dynamics, extensive analysis needs to be done at the sub-basin level. Considering the results from the GLDAS CLSM model, if GRACE data with no gaps are available for the study period, runoff can be inferred from the water balance equation. However, R and TWSC inferred from the water balance equation are subject to uncertainties, which differ from basin to basin (Long et al., 2014; Lakshmi et al., 2018; Yin et al., 2019). Lakshmi et al. (2018) studied the correlation between P-ET-R and TWSC over the world’s major river basins; their results showed R² values ranging from 0.35 to 0.9. The authors concluded that human activities affect the water system in a basin because basins with less human activity (e.g., the Amazon River Basin) showed less uncertainty in total water change. Surface water (lakes and reservoirs) and snowmelt can be further sources of variation in TWS.

Evaluation of Satellite-Based Water Budget Estimation

Figure 11 shows basin-averaged water budget components in billion cubic meters for the water years 2014 and 2015.

Fig. 11 Total annual water budget of the Kizilirmak River Basin for the water years 2014 and 2015

Figure 11 compares total annual P, ET, R, and TWSC from the GLDAS models and remote sensing observations. Total P in 2015 is much higher than in 2014; the differences are approximately 21% for CLSM, 22% for Noah, and 21% for remote sensing (GPM IMERG). Likewise, total ET is higher in 2015 than in 2014, with differences of about 26% for CLSM, 13% for Noah, and 20% for remote sensing (MOD16). In the same manner, modeled R increases in the water year 2015 by about 62% for CLSM and 50% for Noah. TWSC from CLSM and GRACE, but not from Noah, agrees with precipitation in both years. The results in Fig. 11 indicate consistency in the hydrological cycle with respect to the water balance in both years. The annual average precipitation for the Kizilirmak basin was estimated as 689 mm, 690 mm, and 613 mm from CLSM, Noah, and remote sensing, respectively, which corresponds to 55.7, 55.8, and 49.5 billion m³ of water. Comparing the average values of total P and ET shows that ET losses account for over 75% of P in the model products and about 50% in the remote sensing data. Selek and Aksu (2020) reported an average ET loss of 49% for all of Turkey and 54% for the Kizilirmak basin, which is in good agreement with the MOD16 ET in this study.
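The conversion from a basin-averaged depth to a volume can be sketched as below. The basin area is not stated in this section; the value of 80,800 km² used here is an assumption, chosen only because it is roughly what the reported pair of 689 mm and 55.7 billion m³ implies. The function and constant names are hypothetical.

```python
def depth_mm_to_billion_m3(depth_mm, area_km2):
    """Convert a basin-averaged water depth (mm) to a volume in billion m3."""
    depth_m = depth_mm / 1000.0        # mm -> m
    area_m2 = area_km2 * 1.0e6         # km2 -> m2
    return depth_m * area_m2 / 1.0e9   # m3 -> billion m3


# Assumed basin area (km2), back-calculated from the reported figures
ASSUMED_AREA_KM2 = 80_800

clsm_volume = depth_mm_to_billion_m3(689, ASSUMED_AREA_KM2)
noah_volume = depth_mm_to_billion_m3(690, ASSUMED_AREA_KM2)
rs_volume = depth_mm_to_billion_m3(613, ASSUMED_AREA_KM2)
```

Under this assumed area, the three depths convert to roughly 55.7, 55.8, and 49.5 billion m³, which matches the rounded volumes given in the text.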

Numerous studies have examined water budget closure using satellite remote sensing data by comparison with in situ or model data, and most of them agree with our findings. Sheffield et al. (2009) performed a study over the Mississippi River Basin and found a great overestimation of R due to a high bias in P. Gao et al. (2010) studied water budget estimation using remote sensing data over major US river basins. They reported considerable spatial variations in ET and TWS and significant inconsistencies among P products. The authors also indicated that inferred R values (the residual of the water balance equation) from satellite data were overestimated. Sahoo et al. (2011) estimated the water budget from satellite data over ten global river basins, but water budget closure was not achieved. Similarly, other researchers evaluated water budget closure over various river basins in the world using satellite data (Oliveira et al., 2014; H. Wang et al., 2014; Penatti et al., 2015; Lv et al., 2017). Overestimation of R and underestimation of ET have been observed as common barriers to water budget closure. Although water budget closure was not achieved in any of the above studies, the authors concluded that satellite data are quite useful for evaluating trends and assessing changes in water balance.

In summary, water budget closure at the basin scale based on satellite data alone is still not possible. In addition to spatial and temporal discrepancies, instrumental errors and the use of different retrieval algorithms and parameterizations are barriers to closing the water budget. However, satellite-based data can be extremely useful for hydrological modeling, basin management, and predictions in data-scarce regions. For example, Mohammed et al. (2018a) developed a regional hydrological decision support system based on multiple satellite earth observations along with the Soil and Water Assessment Tool (SWAT), which showed promising results over the Mekong River Basin.

Conclusions

In this study, major water balance components (P, ET, R, and TWSC) were examined based on GIS analysis over the Kizilirmak River Basin using publicly available monthly satellite data and GLDAS model output products for the water years 2014 and 2015. Amounts of precipitation, evapotranspiration, runoff, and terrestrial water storage change were calculated from GPM IMERG, MODIS, GRACE, and GLDAS-2.1 Noah and CLSM products. The results revealed the monthly and yearly changes of the overall water budget components over the basin. For the water years 2014 and 2015, the annual values obtained from remote sensing observations and GLDAS-2.1 CLSM differed by approximately 11% and 12% for precipitation and by 47% and 51% for evapotranspiration, respectively. Similarly, annual differences between the remote sensing observations and GLDAS-2.1 Noah were 11% and 12% for precipitation and 48% and 43% for evapotranspiration. Since ET showed significant uncertainties, we also calculated the total ET from the SSEBop model product, which is produced based on satellite data. The differences between the SSEBop ET and GLDAS CLSM ET were 47% for 2014 and 53% for 2015. The largest uncertainty was observed in estimating the change in terrestrial water storage. There were large differences between the modeled runoff products; however, Noah showed a better correlation with the in situ streamflow observations. The differences between the runoff obtained from the Noah model and stream gauge observations were around 58% and 6% for the years 2014 and 2015, respectively. Exact runoff cannot be obtained from remote sensing; nonetheless, we examined the indirect approach of interpreting the residual of the water balance equation (P-ET-TWSC) as runoff. The residual quantities were compared with the modeled runoff values, and the results showed significant discrepancies.

Closing the water balance continues to be a challenge due to a variety of uncertainties, such as the low resolution of GRACE and GRACE-FO and significant errors in MODIS evapotranspiration. There are constraints in calculating the total water budget using GLDAS model outputs and satellite-based remote sensing data due to limitations in modeling or observing all the water components in a basin. For example, streamflow, irrigation, groundwater pumping, and other anthropogenic influences are not included. The advantage of estimating the water budget based on GLDAS model products is the consistency of spatial resolution across all components. However, lakes and reservoirs are not included, which leads to uncertainties. This study demonstrated the strengths and limitations of satellite-based remote sensing and the GLDAS-2.1 CLSM and Noah models in estimating the water budget. Although CLSM has a lower resolution (1°) than the Noah model (0.25°), it provides the TWS component as an output. In contrast, in the Noah model, TWS needs to be calculated from the soil moisture, snow water equivalent, and canopy water variables. Water budget components estimated from satellite remote sensing data come from different datasets with different resolutions and error characteristics; therefore, the quantities are not absolute. Even if long-term remote sensing data are obtained, it will be difficult to close the water budget because of the lack of explicit runoff information. An ensemble modeling approach using remote sensing and in situ data, combined with streamflow routing simulation, would likely yield better estimates. In spite of the uncertainties in GLDAS and remote sensing data, such data can be quite useful for evaluating seasonal and interannual changes in water components and for river basin management, particularly in data-sparse regions. Moreover, remote sensing and LSM datasets can be used as ancillary data for calibrating and validating regional hydrological models.
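As a rough illustration of how TWS might be assembled from Noah-type outputs, the sketch below sums the soil moisture layers, snow water equivalent, and canopy water for each month, and then takes the month-to-month difference as the storage change. The differencing scheme, the function names, and the sample values are assumptions made for illustration; they do not reproduce the exact procedure used in this study.

```python
def noah_tws(soil_layers, swe, canopy):
    """Monthly TWS (mm) as the sum of soil moisture layers, SWE and canopy water.

    soil_layers: list of per-layer monthly series (mm)
    swe, canopy: monthly series (mm)
    """
    months = len(swe)
    if len(canopy) != months or any(len(layer) != months for layer in soil_layers):
        raise ValueError("all series must cover the same months")
    return [
        sum(layer[t] for layer in soil_layers) + swe[t] + canopy[t]
        for t in range(months)
    ]


def storage_change(tws):
    """TWSC as the simple difference between consecutive months (assumed scheme)."""
    return [later - earlier for earlier, later in zip(tws, tws[1:])]


# Illustrative values only: four soil layers over two months (mm)
soil = [[20.0, 22.0], [60.0, 63.0], [100.0, 101.0], [150.0, 150.0]]
tws = noah_tws(soil, swe=[5.0, 0.0], canopy=[0.5, 0.5])
twsc = storage_change(tws)
```

A centered difference or a difference of monthly means could be used instead; the choice affects the timing of the derived TWSC relative to the monthly fluxes.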