1 Introduction

Understanding the spatial variability of rainfall is essential in densely populated regions that depend on water supply for agriculture and human consumption, and it matters to many sectors of the economy (Zhu et al. 2018; Cunha et al. 2021). Acquiring this knowledge is particularly important in regions that have scarce data, suffer from flooding, and have historic problems with access to adequate water, such as the Recife Metropolitan Region (RMR), located in the coastal zone of northeastern Brazil (Braga et al. 2013). Rainfall is considered one of the most important variables of the hydrological cycle and the main input variable for hydrological modeling and for representing rainfall-runoff transformation processes (Lima et al. 2021).

Fortunately, various organizations are providing remote sensing products, such as rainfall, temperature, and air humidity data, needed for hydrologic modeling (Gajbhiye et al. 2014; Meshram and Sharma 2017; Musie et al. 2019). For instance, Climate Forecast System Reanalysis (CFSR), Tropical Rainfall Measuring Mission (TRMM), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR), and Climate Hazards Group Infrared Precipitation with Stations (CHIRPS) are datasets at global or quasi-global scales. These datasets have been available in the last few years (Tapiador et al. 2012; Ferreira da Silva et al. 2020; Santos et al. 2021a, b), and several studies have been carried out using open access meteorological data for the flow simulation. However, the majority of such studies focused only on rainfall data.

Several studies assessed the performance of TRMM and CFSR rainfall data in driving the Soil and Water Assessment Tool (SWAT) model (Arnold et al. 1998) for streamflow simulation (Worqlul et al. 2017; Duan et al. 2019), and contrasting findings were reported (Mararakanye et al. 2020; Zhang et al. 2020). For instance, Dile and Srinivasan (2014) reported that CFSR rainfall data yielded satisfactory streamflow simulation in the Lake Tana River basin, Ethiopia. Fuka et al. (2014) also evaluated the use of CFSR data in four small catchments in the United States. Yang et al. (2014) assessed streamflow simulation in two upstream basins of the Three Gorges Reservoir in China. However, those works only analyzed the performance of CFSR rainfall data and did not comprehensively evaluate the other weather variables obtained from CFSR, such as maximum and minimum temperature, air relative humidity, wind speed, and solar radiation. Monteiro et al. (2015) compared different gridded precipitation data for the Tocantins River basin, in Brazil, and evaluated which gridded dataset best represents precipitation. Li et al. (2018) evaluated the use of the TRMM product and its role in hydrologic simulations for a large basin in China. De Almeida et al. (2020) evaluated the TRMM-estimated rainfall data for a humid basin in southern Brazil and concluded that the values projected for rainfall totals and rainfall occurrences are reliable, showing that TRMM-estimated data could be a suitable alternative for evaluating rainfall in areas with a sparse density of rain gauges.

Santos et al. (2017), Santos et al. (2018a), and Santos et al. (2018b) performed spatiotemporal drought analyses over several areas in Brazil using TRMM-estimated rainfall data and several techniques, such as cluster analysis, dendrograms, and the spatial distribution of the Standardized Precipitation Index (SPI). The authors concluded that the TRMM products could be a powerful tool in identifying homogeneous regions. However, TRMM data have scarcely been applied to hydrological modeling for the eastern portion of northeastern Brazil (Tobin and Bennett 2014; De Medeiros et al. 2018). Furthermore, some studies have reported that the accuracy of these products is weak, which indicates that a high cloud cover in a region likely cannot be converted into rainfall (Fedorova et al. 2016). Thus, further research is needed to verify this possibility and to demonstrate the applicability of TRMM data in hydrological modeling for the eastern part of northeastern Brazil. Satellite precipitation products can be validated by direct comparison with existing rain gauges (Bitew and Gebremichael 2012) or by testing the ability of those data to forecast streamflow using a hydrologic model (Aouissi et al. 2019; Dinku et al. 2007).

Those studies focused on assessing gridded meteorological datasets and their potential for hydrological application in different areas worldwide with scarce data. In this sense, the TRMM has been widely used because it offers global coverage, high temporal resolution, and data with a relatively high spatial resolution. However, the potential hydrological application of TRMM products to a humid area on the coast of northeastern Brazil has not yet been proven (Soares Cruz et al. 2018). In this study, CFSR and TRMM data were used in the SWAT model. The SWAT has been satisfactorily applied to several basins throughout northeastern Brazil (Silva et al. 2013; De Medeiros et al. 2018; Silva et al. 2018), especially for basin management and for simulating evapotranspiration, infiltration, and discharge. However, more effort is required to test the reliability of freely available precipitation products for hydrologic modelling. In addition, there is a need to assess the performance, applicability, and accuracy of these products using hydrologic models. Thus, this study analyzes the performance of TRMM satellite data in the simulation of streamflow in the Pirapama River basin by using the SWAT model and compares the generated flow estimates with those obtained using the observed rainfall data from the study area.

The climate in the study area is influenced by two atmospheric systems, the Upper Tropospheric Cyclonic Vortices (TCV) and the Easterly Wave Disturbances (EWD). The TCV cause instability at their edges and highly concentrated rainfall. For the rainfall that occurs in autumn and winter, the EWD stand out as the main modulating system, spreading from the ocean towards the continent. These characteristics affect the behavior of hydrological variables, such as surface and subsurface runoff, evapotranspiration, percolation, evaporation, and aquifer recharge. The Pirapama River basin is strategic because it is the most important water source for the RMR, one of the largest population concentrations in Brazil (Braga et al. 2013), and it has been one of the main sources of water for the RMR since 2001, when the Pirapama reservoir was first implemented. However, this region has low rain gauge density and scarce hydrological data. Thus, this study seeks to provide an analysis of the quality of the satellite-estimated rainfall data in hydrological modeling, since these data can be considered as an alternative for carrying out this type of study. Therefore, the Pirapama River basin was chosen to understand the functioning of the hydrological processes using gridded meteorological datasets.

2 Material and methods

2.1 Studied area

The Pirapama River basin is situated in the central portion of the RMR and in the forest zone of the state of Pernambuco, more precisely between the latitudes 8° 07′ 29″ S and 8° 21′ 00″ S and the longitudes 34° 56′ 20″ W and 35° 23′ 13″ W. This basin has an area of approximately 600 km2, covers a distance of 80 km from the source to the outlet, and rises to an average altitude of 450 m. The basin’s outlet is located in the Jaboatão River between the cities of Jaboatão dos Guararapes and Cabo de Santo Agostinho (CPRH 1998) (Fig. 1). The main reservoirs in the basin are Pirapama, Gurjaú, and Sicupema. The Pirapama reservoir is the largest, with a storage capacity of 55,234,000 m3, and the Gurjaú and Sicupema reservoirs have maximum capacities of 1.0 and 3.2 million m3, respectively. These reservoirs are used to supply approximately 2.5 million inhabitants of the RMR (ANA 2019).

Fig. 1 Location map of the Pirapama River basin in the state of Pernambuco, Brazil

This study provides a theoretical and methodological basis for understanding the hydrological dynamics of the basin using two types of data, i.e., satellite-estimated and rain gauge-measured rainfall data, which can guide more in-depth studies that associate these results with the operating rules applied to existing reservoirs in the basin. In addition, simulations of water behavior for the entire basin or for the Pirapama, Gurjaú, and Sicupema reservoirs give managers support for anticipating possible problems in the region, such as coping with drought or flood events. The results serve as a basis for decision-making, which involves the proper planning of water resources.

The Pirapama River basin encompasses seven cities in the RMR, which contain a total of approximately 1,158,595 inhabitants, 84.4% of which live in urban areas (IBGE 2010). The LULC settings within the Pirapama River basin are quite diverse: the basin is characterized by urban and industrial settings, small farms, polyculture (rural settlements), two small hydroelectric power stations, sugarcane cultivation areas, Atlantic forest coastline and mangroves (Santos and Silva 2007).

The climate of the region is the Tropical type with dry summer (As), which is warm and humid with improved strong solar radiation by trade winds, according to Köppen’s climate classification (Florencio et al. 2001). The monthly average temperature varies between 26 and 28°C, while the air relative humidity is higher than 70% from March to September (CPRH 2003). Within the rainfall regime, the region has two well-defined periods: a dry season between September and February with a monthly average rainfall of less than 60 mm and evaporation that exceeds precipitation and a rainy season between March and August, in which the hydrological balance is generally positive (Viana et al. 2019). The annual averages of precipitation and evaporation in the region are approximately 1500 mm and 1200 mm, respectively (Stretta 2000). In relation to the pedology of the Pirapama River basin, the predominant soils in the area are red-yellow ultisol, yellow ultisol and gleysols. To a lesser extent, psamment (close to the coastline), nitosols, yellow oxisol, and mangrove soils also occur in the basin.

2.2 Meteorological datasets and streamflow data

In this study, three meteorological datasets with data from 2000 to 2010 were used, i.e., rain gauge-measured, TRMM-estimated, and CFSR data (Table 1). Rain gauge-measured rainfall data were acquired from the Agência Pernambucana de Águas e Climas (APAC 2013). Rainfall CFSR data (Saha et al. 2010) related to two grids were used. This product is a grid of 0.31° spatial resolution covering almost the whole globe from 1979 to 2014. CFSR utilizes precipitation from the National Oceanic and Atmospheric Administration, unified daily gauge analysis from Climate Prediction Center, and data assimilation scheme. According to Essou et al. (2017), the coupling of the atmosphere-ocean-land surface-sea ice system has improved the precision of the CFSR dataset. As the atmospheric model-generated rainfall is considered too biased, the land surface component does not use rainfall from this product.

Table 1 Rainfall, flow, and weather data and TRMM grid cells used for this research

The daily rainfall data from the TRMM satellite for the study region were obtained from the TRMM 3B42 V7 product for the period from 2000 to 2010. The TRMM 3B42 product provides estimates of the cumulative precipitation over 24 h (mm/day), which are generated with a spatial resolution of 0.25° (Huffman et al. 2007; Huffman and Bolvin 2013). The TRMM, which provides crucial information on rainfall, has been fundamental to several studies because it provides valuable rainfall data in portions of the world where such data are scarce, as exemplified in Baker and Miller (2013), Pombo and de Oliveira (2015), Tekeli and Fouli (2016), Nastos et al. (2016), Santos et al. (2017), Kiany et al. (2018), and Li et al. (2018).

The TRMM-derived and CFSR precipitation data were compared with the observed precipitation by performing a point-by-point analysis at daily, monthly, and annual scales. As the basin is not equipped with a robust rain gauge network and because the rainfall records contained gaps, only the four rainfall stations with the most consistent data were used for the analysis. However, there was no rain gauge in the grid area of the CFSR 83353 and CFSR 83350 cells, so the closest rain gauge (Pirapama gauge) was used for comparison. The observed monthly streamflow data from three stations within the study area were obtained from Agência Nacional de Águas (ANA). We selected the period that corresponded to the same period of the flow series employed for hydrological modeling (2000-2010).
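The comparison at monthly and annual scales presupposes that the daily series are first summed over each month and year. A minimal sketch of that aggregation step is given below; the function names and the input layout (a list of date and depth pairs) are assumptions made for illustration and are not taken from the study.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily):
    """Sum daily rainfall depths (mm) into (year, month) totals."""
    totals = defaultdict(float)
    for day, depth_mm in daily:
        totals[(day.year, day.month)] += depth_mm
    return dict(totals)


def annual_totals(daily):
    """Sum daily rainfall depths (mm) into yearly totals."""
    totals = defaultdict(float)
    for day, depth_mm in daily:
        totals[day.year] += depth_mm
    return dict(totals)


# Illustrative values only, not data from the basin.
sample = [
    (date(2000, 1, 1), 5.0),
    (date(2000, 1, 2), 3.0),
    (date(2000, 2, 1), 2.0),
    (date(2001, 1, 1), 1.0),
]
print(monthly_totals(sample))
print(annual_totals(sample))
```

The same routine would be applied to each gauge series and to the matching TRMM or CFSR cell before the statistics are computed at each timescale.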

2.3 Performance evaluation of the meteorological datasets and simulated streamflow

The quality of the precipitation time series estimated by the TRMM 3B42 product and CFSR data were verified by the root mean square error (RMSE), normalized root mean square error (NRMSE), coefficient of determination (R2), and percent bias (PBIAS) as recommended by Brito et al. (2021) and Brasil Neto et al. (2021). In this way, the daily, monthly, and annual time series were analyzed to compare the behaviors of the measured and estimated data.
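The four indicators can be sketched in a few lines of Python. The exact formulations used in the study are not given in the text, so the following are assumptions: NRMSE is normalised here by the range of the observed series, R2 is taken as the squared Pearson correlation, and PBIAS is signed so that negative values indicate underestimation by the estimated series, which matches how the PBIAS values are read later in the paper.

```python
import math


def rmse(obs, est):
    """Root mean square error between observed and estimated series."""
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(obs, est)) / len(obs))


def nrmse(obs, est):
    """RMSE divided by the observed range (assumed normalisation)."""
    return rmse(obs, est) / (max(obs) - min(obs))


def r_squared(obs, est):
    """Squared Pearson correlation between the two series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_e = sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov ** 2 / (var_o * var_e)


def pbias(obs, est):
    """Percent bias; negative means the estimates are too low (assumed sign)."""
    return 100.0 * sum(e - o for o, e in zip(obs, est)) / sum(obs)


# Illustrative values only, not data from the basin.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [8.0, 18.0, 27.0, 37.0]
print(rmse(observed, estimated), nrmse(observed, estimated))
print(r_squared(observed, estimated), pbias(observed, estimated))
```

Other normalisations of the RMSE (for example by the observed mean) are also common, so reported NRMSE values should be read together with the definition adopted.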

To analyze hits and misses between TRMM-estimated and rain gauge-measured rainfall data, we analyzed the occurrence statistics that are linked to the amount of rain events in the region, i.e., rainfall depth equal or greater than 1 mm (De Almeida et al. 2020). The four occurrence statistics were bias, detection probability, false alarm rate, and correct proportion. Bias indicates quantitative analysis; i.e., it is an indicator of the underestimation or overestimation of the number of rainfall events that are correctly identified by the TRMM products, which is calculated using Eq. 1:

$$ \mathrm{Bias}=\left(S+{F}_a\right)/\left(S+F\right) $$
(1)

where S is success when the estimated and the observed rainfalls indicate daily total rainfall equal to or greater than 1 mm, Fa is false alarm when the estimated rainfall records rainfall while there is not observed rainfall, and F is failure when the estimated rainfall does not record rainfall within the study area while there is observed rainfall.

The detection probability (DP) indicates the percentage of rainy days identified by the estimated rainfall (Eq. 2), in which DP equal to 1 represents a perfect detection.

$$ \mathrm{DP}=S/\left(S+F\right) $$
(2)

The false alarm rate (FAR), calculated using Eq. 3, represents the fraction of rainy days reported by the satellite on which no rainfall was observed.

$$ \mathrm{FAR}={F}_a/\left(S+{F}_a\right) $$
(3)

Correct proportion (CP) identifies the estimated rainfall data accuracy percentage, without distinction between the correct existence and the correct absence of rainfall, which is computed as:

$$ \mathrm{CP}=\left(S+C\right)/T $$
(4)

where C is correct negative when the estimated rainfall and the observed rainfall do not capture rainfall within the basin in a day, and T is equal to the total of successes, correct negatives, false alarms, and failures.
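Equations 1 to 4 can be illustrated with a short Python sketch that classifies each day as a success, false alarm, failure, or correct negative using the 1 mm threshold and then forms the four ratios. The function name, the dictionary keys, and the choice to return NaN when a denominator is zero are assumptions made for this example.

```python
def occurrence_stats(obs, est, threshold=1.0):
    """Compute Bias, DP, FAR and CP (Eqs. 1-4) from paired daily rainfall."""
    s = fa = f = c = 0
    for o, e in zip(obs, est):
        obs_rain = o >= threshold
        est_rain = e >= threshold
        if obs_rain and est_rain:
            s += 1   # success: both record rain
        elif est_rain:
            fa += 1  # false alarm: estimate only
        elif obs_rain:
            f += 1   # failure: observation only
        else:
            c += 1   # correct negative: neither records rain
    t = s + fa + f + c

    def ratio(num, den):
        # Assumed convention: undefined ratios are reported as NaN.
        return num / den if den else float("nan")

    return {
        "Bias": ratio(s + fa, s + f),
        "DP": ratio(s, s + f),
        "FAR": ratio(fa, s + fa),
        "CP": ratio(s + c, t),
    }


# Illustrative values only (mm/day), not data from the basin.
gauge = [0.0, 5.0, 2.0, 0.0, 10.0, 0.5]
satellite = [0.0, 4.0, 0.0, 3.0, 12.0, 0.0]
stats = occurrence_stats(gauge, satellite)
print(stats)
```

In this small example there are two successes, one false alarm, one failure, and two correct negatives, so the bias is 1 even though one rainy day was missed and one was falsely detected, which is why bias is read together with DP and FAR.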

In this study, the quality of the streamflow simulated using rain gauge-measured rainfall data (Qsim_obs) and of the streamflow simulated using TRMM-estimated rainfall data (Qsim_TRMM) was analyzed using the coefficient of determination (R2) and percent bias (PBIAS), according to Silva et al. (2013), Silva et al. (2018), and Viana et al. (2019).

2.4 SWAT model setup, calibration, and validation

The SWAT model is a physically based, distributed, continuous-time model that simulates streamflow, erosion in planes and channels, and the transport of nutrients and pesticides on daily, monthly, and annual timescales. The hydrological model is based on the water balance equation (Eq. 5):

$$ {SW}_t={SW}_0+\sum \limits_{i=0}^t\left({R}_d-{Q}_{\mathrm{sup}}-{E}_a-{W}_{vad}-{Q}_{sub}\right) $$
(5)

where SWt is the final soil water storage (mm), SW0 is the initial storage of water in the soil on day i (mm), t is the time (days), Rd is the precipitation on day i (mm), Qsup is the surface runoff on day i (mm), Ea is the evapotranspiration on day i (mm), Wvad is the percolation on day i (mm), and Qsub is the return flow (capillary action from the vadose zone) on day i (mm). The surface runoff was calculated based on the Soil Conservation Service method (Neitsch et al. 2011).
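As a worked illustration of Eq. 5, the sketch below accumulates the daily balance terms onto an initial soil water storage. It is not SWAT code; the function name, the tuple layout, and the numbers are assumptions chosen only to show how the terms combine.

```python
def soil_water_storage(sw0, daily_terms):
    """Apply Eq. 5: add daily (Rd - Qsup - Ea - Wvad - Qsub) to SW0.

    daily_terms is a sequence of (Rd, Qsup, Ea, Wvad, Qsub) tuples in mm.
    """
    sw = sw0
    for r_d, q_sup, e_a, w_vad, q_sub in daily_terms:
        sw += r_d - q_sup - e_a - w_vad - q_sub
    return sw


# Illustrative two-day sequence (mm): one rainy day, then one dry day.
days = [
    (20.0, 5.0, 3.0, 2.0, 1.0),  # net gain of 9 mm
    (0.0, 0.0, 4.0, 1.0, 1.0),   # net loss of 6 mm
]
final_storage = soil_water_storage(100.0, days)
print(final_storage)
```

Starting from 100 mm, the storage rises by 9 mm on the rainy day and falls by 6 mm on the dry day, ending at 103 mm.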

Hydrological simulation in the Pirapama River basin was performed with the 2012 version of the SWAT model in ArcGIS 10.2® software through the ArcSWAT interface. To apply the model, we initially used a digital elevation model (DEM) from the Shuttle Radar Topography Mission at 30 m resolution (Fig. 2a), together with LULC (Fig. 2b) and soil type (Fig. 2c) maps. The model discretizes the watershed into subbasins using the flow direction of the drainage network, derived from the DEM. In addition, we used meteorological data (air relative humidity, maximum and minimum temperature, wind speed, and solar radiation) from CFSR, and TRMM-estimated and rain gauge-measured rainfall data.

Fig. 2 (a) Digital elevation model, (b) LULC map, and (c) soil types of the Pirapama River basin

The LULC map was based on two Landsat 5/TM satellite images with a spatial resolution of 30 m (orbit 214 and point 66), obtained from the National Institute of Space Research, Brazil. The images were acquired on July 6, 2005, and July 28, 2007, the dates with the fewest clouds over the region. The LULC map was produced by supervised classification, and six classes were determined and associated with the corresponding LULC in the SWAT database: (a) water, (b) urban area, (c) bare soil, (d) dense vegetation, (e) pasture, and (f) sugarcane (Fig. 2b and Table 2). To estimate the streamflow in the Pirapama River basin with the SWAT model, data from four rain gauges and the TRMM satellite were used.

Table 2 Area, percentage of soil types, and LULC used in this study

To validate the supervised classification, a set of points was collected. These points were determined using samples that were classified by visual interpretation and checked in the field using a LULC map developed by APAC (2013). These procedures were used to verify hits, misses, and the accuracy of the classified LULC map. In this process, a number of samples was defined for each of the six LULC classes according to its area in the basin, totaling 170 samples. After defining the samples, the data were tabulated in a confusion matrix. Based on this matrix, it was possible to calculate the global accuracy, producer accuracy, user accuracy, and the kappa agreement statistic (κ). The kappa statistic compares the proportion of correctly classified samples (the sum of the main diagonal of the error matrix divided by the total number of samples) with the agreement expected by chance from the row and column totals (Eq. 6).

$$ \kappa =\frac{N{\sum}_{i=1}^r{n}_{ii}-{\sum}_{i=1}^r\left({n}_{i+}\times {n}_{+i}\right)}{N^2-{\sum}_{i=1}^r\left({n}_{i+}\times {n}_{+i}\right)} $$
(6)

where nii is the number of observations in row i and column i, r is the number of rows in the matrix, ni+ and n+i are the marginal totals for row i and column i, respectively, and N is the total number of observations (Congalton and Green 2009).
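Equation 6 can be checked numerically with a small sketch that takes a square confusion matrix (rows as reference classes, columns as classified classes) and returns the overall accuracy and κ. The function names and the two-class matrix are assumptions for illustration; they are not the study's 170-sample matrix.

```python
def overall_accuracy(matrix):
    """Share of samples on the main diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / total


def kappa(matrix):
    """Kappa agreement statistic following Eq. 6."""
    r = len(matrix)
    n = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(r))
    row_totals = [sum(matrix[i]) for i in range(r)]
    col_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]
    # Sum of products of the marginal totals (chance agreement term).
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    return (n * diagonal - chance) / (n ** 2 - chance)


# Illustrative 2 x 2 matrix with 50 samples.
example = [
    [20, 5],
    [10, 15],
]
print(overall_accuracy(example), kappa(example))
```

For this matrix the overall accuracy is 0.70 while κ is 0.40, which shows how κ discounts the agreement that the marginal totals would produce by chance.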

The degree of performance of the data was assessed according to the kappa statistic: (a) none: κ < 0, (b) poor: 0 < κ ≤ 0.2, (c) fair: 0.2 < κ ≤ 0.4, (d) moderate: 0.4 < κ ≤ 0.6, (e) good: 0.6 < κ ≤ 0.8, and (f) very good: 0.8 < κ ≤ 1.0. The soil type map, whose scale is 1:100,000 (Fig. 2c), was obtained from EMBRAPA (2013). The soil parameters followed the EMBRAPA soil classification (available at http://www.sisolos.cnptia.embrapa.br).

The Pirapama River basin and its sub-basins were delineated using a rectangular cut-out of the DEM and the vector file of the basin drainage network (i.e., the Pirapama River basin). During this process, 29 sub-basins were generated. In this study, the river basin was divided in multiple hydrological response units (HRUs), i.e., homogeneous areas with the same types of soil, LULC and slope. To define the HRUs, unique combinations of LULC, soil types and slope were entered. The maps were overlaid in such a way that all cells with the same combination of LULC, soil type and slope classes generated a single map, and an identifier number was assigned to each combined area, representing an HRU. For this research, 1641 HRUs were generated.

Five categories of slope were defined for the HRUs: 0–3%, 3–8%, 8–20%, 20–45%, and >45%, according to Santos et al. (2021a, b). The multiple-HRU method was adopted, with the threshold set to 0% for the three categories (LULC classes, soil types, and slope), so that all LULC types, soil types, and slope ranges are considered in the model without loss of information. After these processes, meteorological and precipitation data were introduced into the model. It is worth highlighting that daily rainfall data are automatically distributed by the SWAT model based on the nearest neighbor method, which defines the area of influence of each rain gauge (Zhang et al. 2009).
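The HRU definition described above amounts to enumerating the unique combinations of LULC, soil type, and slope class. The sketch below illustrates the idea on a handful of cells; it is not the ArcSWAT implementation, the function and variable names are hypothetical, and treating each class upper bound as inclusive is an assumption, since the text does not say which class a slope of exactly 3% falls into.

```python
SLOPE_UPPER_BOUNDS = [3.0, 8.0, 20.0, 45.0]
SLOPE_LABELS = ["0-3%", "3-8%", "8-20%", "20-45%", ">45%"]


def slope_class(slope_pct):
    """Return the slope class label (upper bounds assumed inclusive)."""
    for label, upper in zip(SLOPE_LABELS, SLOPE_UPPER_BOUNDS):
        if slope_pct <= upper:
            return label
    return SLOPE_LABELS[-1]


def define_hrus(cells):
    """Assign one HRU id per unique (LULC, soil, slope class) combination.

    With a 0% threshold no combination is discarded, however small.
    """
    hru_ids = {}
    cell_counts = {}
    for lulc, soil, slope_pct in cells:
        combo = (lulc, soil, slope_class(slope_pct))
        if combo not in hru_ids:
            hru_ids[combo] = len(hru_ids) + 1
        cell_counts[combo] = cell_counts.get(combo, 0) + 1
    return hru_ids, cell_counts


# Illustrative cells: (LULC, soil type, slope in percent).
cells = [
    ("sugarcane", "ultisol", 2.0),
    ("sugarcane", "ultisol", 2.5),
    ("pasture", "ultisol", 10.0),
    ("sugarcane", "gleysol", 50.0),
]
hru_ids, cell_counts = define_hrus(cells)
print(hru_ids)
print(cell_counts)
```

Raising the threshold above 0% would drop combinations that occupy only a small share of a sub-basin, which reduces the number of HRUs at the cost of the information loss the study chose to avoid.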

2.5 SWAT model performance evaluation

The SWAT was semiautomatically calibrated and the uncertainty analyses were calculated using the SUFI2 algorithm in the software named SWAT Calibration and Uncertainty Programs (SWAT-CUP) 2012 v.5.1.6.2 (Abbaspour et al. 2007). As described by Rouholahnejad et al. (2012), SUFI2 uses the Latin hypercube method to define the parameters, and the calibration process starts with a range of values determined by the user. More details regarding the SWAT-CUP operation and calibration algorithms are described in Abbaspour et al. (2007) and Abbaspour (2012).
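To make the sampling step concrete, the sketch below shows a generic Latin hypercube sampler in plain Python. It is not the SWAT-CUP implementation: each parameter range is split into as many equal strata as there are samples, one value is drawn inside each stratum, and the values are shuffled independently per parameter. The function name and the parameter ranges are hypothetical and are not the ranges used in the study.

```python
import random


def latin_hypercube(ranges, n_samples, seed=None):
    """Draw n_samples parameter sets, one value per stratum per parameter."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        width = (high - low) / n_samples
        # One random point inside each of the n_samples equal strata.
        points = [low + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(points)  # decouple strata across parameters
        columns[name] = points
    return [{name: columns[name][k] for name in ranges} for k in range(n_samples)]


# Hypothetical ranges for two parameters, for illustration only.
example_ranges = {"param_a": (0.0, 1.0), "param_b": (10.0, 20.0)}
samples = latin_hypercube(example_ranges, 5, seed=1)
for parameter_set in samples:
    print(parameter_set)
```

Compared with purely random sampling, this stratification guarantees that every part of each user-defined range is visited within a single iteration.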

The SWAT model was warmed up for 3 years (1997–1999); the subsequent period of 2000–2006 was used for calibration, and the data from 2007–2010 were used for validation, for both projects (rain gauge-measured and TRMM-estimated rainfall data). The calibration consisted of a maximum of four iterations, and each iteration consisted of 500 simulations with a combination of parameters for the sub-basin corresponding to each streamflow gauge. The calibration and validation of the model for the study area were carried out on a monthly time step because of the large number of gaps in the daily streamflow data.

For the selection of rainfall stations, those with data from similar period to the streamflow time series and with the lowest number of missing data (1997–2010) were considered. The same time series period was considered for the TRMM-estimated rainfall data. The SWAT model was calibrated from upstream to downstream, and the parameter values that showed the best results were replaced in the model.

The automatic calibration of the SWAT model was preceded by a parameter sensitivity analysis, in which the influence of each parameter on the hydrological modeling of the basin was analyzed. The sensitivity of the parameters was determined by applying a multiple regression system that relates the parameters to the objective functions. The SUFI-2 algorithm offers two measures for global sensitivity analysis, i.e., the t-stat and the p-value. The t-stat is used to detect the relative significance of each parameter: the higher its absolute value, the more sensitive the parameter is. The p-value indicates the significance of the sensitivity (Abbaspour et al. 2015).

For this research, 19 parameters influencing the flow rate under the conditions of the basins in northeastern Brazil were considered in the sensitivity analysis: Alpha_BF, Biomix, Canmx, CNII, CH_K2, CH_N2, Epco, Esco, GW_Delay, GW_Revap, Gwqmn, Rchrg_DP, Revapmn, Slsubbsn, Sol_Alb, Sol_Awc, Sol_K, Sol_Z, and Surlag, in agreement with Santos et al. (2015) and Silva et al. (2018). The interval of variation of each parameter and the parameter modification method used in the calibration process were defined based on the recommendations of Arnold et al. (2012), Pinto et al. (2013), Pereira et al. (2014, 2016a, 2016b), and Andrade et al. (2021). In this study, only the monthly streamflow was calibrated and validated. The performance of the model was verified through the following objective functions: PBIAS, Nash-Sutcliffe efficiency (NS), and R2, based on studies about the evaluation of modeling performance (Bonumá et al. 2015; Bressiani et al. 2015; Faramarzi et al. 2015; Zeiger and Hubbart 2018; Ren et al. 2018; Santos et al. 2021a, b). PBIAS evaluates the average tendency of the simulated data to be larger or smaller than the observed data, and NS looks for the best fit for the maximum flows and can range from negative infinity to 1. The ranges of values considered satisfactory were NS ≥ 0.5, PBIAS within ±25%, and R2 ≥ 0.6 (Moriasi et al. 2007).
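A minimal sketch of the Nash-Sutcliffe efficiency and of the satisfactory-performance check is given below. The NS expression is the standard one (one minus the ratio of the squared simulation errors to the squared deviations of the observations from their mean); the function names and the sample flows are assumptions for illustration.

```python
def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency; 1 is a perfect fit, 0 equals the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def is_satisfactory(ns, pbias_pct, r2):
    """Apply the thresholds NS >= 0.5, |PBIAS| <= 25% and R2 >= 0.6."""
    return ns >= 0.5 and abs(pbias_pct) <= 25.0 and r2 >= 0.6


# Illustrative monthly flows only, not data from the basin.
observed_flow = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated_flow = [2.0, 3.0, 4.0, 5.0, 6.0]
ns_value = nash_sutcliffe(observed_flow, simulated_flow)
print(ns_value, is_satisfactory(ns_value, 20.0, 1.0))
```

In this example the simulation tracks the observed pattern perfectly but carries a constant offset of one unit, which lowers NS to 0.5; a simulation equal to the observed mean at every step would score 0.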

3 Results and discussion

3.1 Classification accuracy assessment

Table 3 shows the confusion matrix, user accuracy, and producer accuracy obtained after validating the samples selected in the classified LULC map of the Pirapama River basin. The overall accuracy and the kappa statistic were also obtained from this cross-tabulation. The rows contain the information obtained from the maps used as reference, and the columns contain the information from the automatically classified (supervised) maps. The diagonal of the table shows the correctly classified samples of each LULC class. The values outside the main diagonal refer to the errors of omission and commission of each class; more details about these errors can be found in Carvalho et al. (2004).

Table 3 Accuracy assessment results and confusion matrix

According to the confusion matrix assessment, most classes showed good accuracy or precision in terms of classifications; however, the classes referring to sugarcane, dense vegetation and pasture showed a greater number of poorly classified samples, i.e., those classes that had greater number of pixels attributed to other categories or classes of LULC (producer accuracy and omission error). The producer and the user accuracies showed that the results were considered good to excellent (Congalton and Green 2009). In the user accuracy, the bare soil, dense vegetation, and water classes showed 100, 97, and 73% accuracy, respectively. In the producer category, the urban area class had 100% accuracy, whereas the bare soil, water, and sugarcane had 97, 93, and 89%, respectively. The total accuracy of LULC classes was 84%, which is considered excellent (Table 3).

In general, although the classifications did not present 100% accuracy for all LULC classes, both in terms of producer and user accuracies, the results were considered satisfactory, since each category presented correctly classified pixels above 60% accuracy, of the total samples collected for each LULC class. With regard to global accuracy and the kappa statistic, the values were 0.77 and 0.80, respectively, which are considered very good according to Congalton and Green (2009), because they are closer to 1.

3.2 Comparison between observed rainfall and gridded meteorological datasets

In this section, daily, monthly, and annual comparisons and evaluations of the ground-measured and TRMM-estimated rainfall data (2000 to 2010) for the Pirapama River basin are carried out. Figure 3a shows comparisons of the daily rainfall obtained by the rain gauges (observed) with the daily rainfall estimated by the TRMM satellite according to the area of each centroid. In these daily comparisons, the TRMM rainfall peaks were very high in relation to all analyzed rainfall stations, as can be observed in Pirapama and Recife (Fig. 3a). These two rainfall stations are located in regions within the RMR with high rainfall. However, although this region receives more rainfall than the rest of the basin, the TRMM estimates do not correspond to reality or to the rain peaks analyzed at the other stations.

Fig. 3 Daily comparisons between the accumulated rainfall obtained from (a) TRMM 3B42 and the rain gauge measurements (2000–2010), and (b) CFSR 83350 and CFSR 83353 for the study area

Figure 3b shows the daily comparison of the accumulated rainfall obtained from the Pirapama rain gauge and CFSR data. The comparison shows that the estimated data overestimated the observed values. However, the results show that there is a similarity in the pattern of rainfall that is well represented in the largest precipitated volumes, especially in the values of rainfall peaks. In general, the global data are not able to represent well the trend of increasing rainfall in the region throughout the time series, as well as its variability.

Table 4 shows the efficiency indicators for the observed and estimated rainfall in the Pirapama River basin on daily, monthly, and annual timescales. The four efficiency indicators for the daily rainfall confirm the abovementioned visual analysis results and have unsatisfactory values. For two of the four stations analyzed, the minimum values were the same, but the maxima differed, as the TRMM data overestimated the observed rainfall. Regarding the average, this overestimation occurred at only two of the analyzed stations (Vitória de Santo Antão/TRMM_P2 and Pombos/TRMM_P1). The R2, RMSE, NRMSE, and PBIAS values indicate a low correlation between the measured data and satellite-estimated rainfall. In addition, the PBIAS values indicate an overestimation by the satellite data for two data sets and an underestimation for the other two.

Table 4 Daily, monthly, and annual efficiency indicators for the rain gauge-measured, TRMM- and CFSR-estimated rainfall data

However, considering the coverage of a TRMM grid (≈ 625 km2), the statistics obtained in the individual analysis of each station can be considered reasonable, except for R2. Because of this coverage, discrepancies can occur between the precipitation recorded at stations separated by a few kilometers but within the same TRMM pixel. Thus, the lowest correlations observed at these stations (Recife/TRMM_P3 and Vitória de Santo Antão/TRMM_P2) can be related to some error in the estimation of the precipitation in the pixel that encompasses those stations. Regarding the monthly statistics, Table 4 shows that the maximum and average values varied little in the comparison between the estimated and observed values. The R2, RMSE, NRMSE, and PBIAS values indicated good correlation between the data sets, with PBIAS indicating an underestimation in the satellite data (−8.44%). The statistical data showed good accuracy between the observed and estimated data, with an R2 of 0.62, a PBIAS of −11.51% (indicating an underestimation of TRMM data), an RMSE below 100 mm (89.50 mm) and an almost adjusted NRMSE (0.07 mm) (Table 4).

Table 4 also shows the efficiency indicators for the measured (Pirapama rain gauge) and estimated rainfall data from reanalysis (CFSR 83350 and 83353) in the Pirapama River basin on daily, monthly, and annual timescales. For the daily scale, the maximum estimated values showed a difference of up to 78.4 mm, underestimating the measured values. The average of the estimated values showed a smaller variation in relation to the measured data, with a difference of up to 1.41 mm/day, with an overestimation of the estimated values. The values of R2, RMSE, and NRMSE indicate a low precision in the estimation of the rainfall values in relation to the measured ones, owing to some large discrepancies between the daily data, especially for the larger values. The PBIAS, when comparing the two CFSR points with the measured data from the Pirapama gauge, indicates an overestimation by the estimated data, with values within the satisfactory range according to Moriasi et al. (2007).

With regard to monthly and annual statistics, Table 4 shows that the estimated maximum values, as well as the averages, varied little from the measured rainfall data, despite the overestimation of these values. The R2 values were better on both the monthly and annual scales than on the daily scale, but are still unsatisfactory according to Moriasi et al. (2007). The RMSE presented values of greater magnitude than those on the daily scale; however, this is expected, since the total precipitation in one month or in one year is considerably higher than in a single day. The results for RMSE and NRMSE indicate that on monthly and annual scales, the estimated rainfall data are more accurate.
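The indicators discussed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes R2 is the squared Pearson correlation, that NRMSE is the RMSE divided by the range of the measured values (the normalization is not stated in this section), and that PBIAS is signed so that negative values mean the estimates fall below the measurements, as the text interprets it. The function names and the sample values are hypothetical.

```python
import math

def r_squared(measured, estimated):
    """Squared Pearson correlation between the two series (assumed definition)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov ** 2 / (var_m * var_e)

def rmse(measured, estimated):
    """Root mean square error, in the units of the data (mm)."""
    return math.sqrt(
        sum((e - m) ** 2 for m, e in zip(measured, estimated)) / len(measured)
    )

def nrmse(measured, estimated):
    """RMSE normalized by the measured range (assumption: range normalization)."""
    return rmse(measured, estimated) / (max(measured) - min(measured))

def pbias(measured, estimated):
    """Percent bias; negative means underestimation (sign convention assumed)."""
    return 100.0 * sum(e - m for m, e in zip(measured, estimated)) / sum(measured)

# Hypothetical monthly totals (mm), for illustration only.
measured = [100.0, 200.0, 300.0, 400.0]
estimated = [90.0, 180.0, 270.0, 360.0]
```

With these made-up series the estimates are uniformly 10% low, so `pbias` returns a negative value while `r_squared` stays at 1. This shows why a high correlation alone does not rule out systematic bias.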

Figure 4a-d shows the analysis of the rainfall variability in the Pirapama River basin from 2000 to 2010. The data show that the TRMM-estimated rainfall was underestimated compared with the measured data, except in 2000, 2008, and 2009, when the estimated values were higher than the measured data. In general, the annual values estimated by the TRMM satellite were close to the measured annual averages (1367.97 mm observed rainfall compared with 1210.56 mm estimated by the TRMM satellite, i.e., a discrepancy of 157.41 mm). The year 2000 showed the highest annual average rainfall recorded among the years analyzed, both for the data estimated by remote sensing and for the measured data, in which similar values were recorded with annual averages of approximately 2300–2400 mm. The lowest annual average rainfall was recorded in 2001, when the rainfall estimated by the TRMM presented a greater discrepancy in relation to the measured data (Fig. 4a).
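The discrepancy quoted above can be checked with simple arithmetic. The sketch below uses only the two annual averages given in the text. The variable names are illustrative, and the relative bias is signed so that negative values mean underestimation (an assumed convention). The result appears consistent with the PBIAS of −11.51% reported for the TRMM data.

```python
# Annual averages quoted in the text (mm).
observed_mean_mm = 1367.97
trmm_mean_mm = 1210.56

# Absolute discrepancy between measured and TRMM-estimated rainfall.
discrepancy_mm = observed_mean_mm - trmm_mean_mm

# Relative bias in percent; negative means TRMM is below the measurements.
relative_bias_pct = 100.0 * (trmm_mean_mm - observed_mean_mm) / observed_mean_mm

print(round(discrepancy_mm, 2), round(relative_bias_pct, 2))
```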

Fig. 4

Analysis of the rainfall variability in the Pirapama River basin from 2000 to 2010: (a) annual, (b) monthly, (c) correlation between CFSR and measured at Pirapama gauge, and (d) correlation between mean TRMM-estimated and measured rainfall data

Figure 4b shows the seasonal variability in the basin for the data measured by the rain gauge and estimated by the CFSR and Fig. 4c-d shows the dispersion of these data. It is possible to observe that between October and March the estimates are more different from the observed values, a period in which it rains less in the basin, according to the historical time series analyzed (2000 to 2010). Between April and September, estimates are closer to the measured values, even though overestimating in some months and underestimating in others. During this period, June showed a higher peak of rainfall, showing an underestimation of the estimated data. The rainy period in the basin is from March to August, and the dry period is from September to February. Figure 4b shows that the estimated rainfall data better represented the rainy period, with less divergent values than in the dry period, except for March. Figure 4c-d shows a good correlation between the measured and estimated rainfall data for the two grids, with R2 equal to 0.72 and 0.60, between the CFSR 83350/Pirapama and CFSR 83353/Pirapama, respectively. Overall, the data represented well the seasonal variability of the Pirapama River basin, with R2 values considered good to satisfactory, according to Moriasi et al. (2007).

These results are close to those from works developed in Brazil and in many other countries, which were carried out to evaluate TRMM-estimated rainfall data against measured rainfall data, as reported by Franchito et al. (2009), Oliveira et al. (2014), Ochoa et al. (2014), Santos et al. (2019a, 2019b), and Brasil Neto et al. (2021). Moreover, in general, the coefficients were satisfactory on the monthly and annual scales because the results improved as the temporal scale increased. Such behavior was also observed in many works that applied the TRMM products to the northeastern coast of Brazil (Pereira et al. 2013; Soares et al. 2016; Santos et al. 2018a; Soares Cruz et al. 2018). Thus, these daily, monthly, and annual assessment analyses indicate that the TRMM 3B42 precipitation product performs better on the monthly and annual scales than on the daily scale when compared with measured data. However, although the daily data did not agree as well with the measurements, they were tested to evaluate their accuracy in the study area when generating flow estimates through SWAT. As the model works with the average rainfall distributed by sub-basins, the daily data tend to adjust better to this average.
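The improvement with temporal scale can be explored by aggregating the daily series before computing any statistic. Day-to-day timing mismatches between satellite and gauge tend to offset each other once values are summed over a month or a year. The following is a minimal sketch of that aggregation step with hypothetical function names and made-up data, not the procedure used in the study.

```python
from collections import defaultdict
from datetime import date, timedelta

def monthly_totals(daily):
    """Sum daily rainfall (mm) into a {(year, month): total} mapping."""
    totals = defaultdict(float)
    for day, rain_mm in daily.items():
        totals[(day.year, day.month)] += rain_mm
    return dict(totals)

def annual_totals(daily):
    """Sum daily rainfall (mm) into a {year: total} mapping."""
    totals = defaultdict(float)
    for day, rain_mm in daily.items():
        totals[day.year] += rain_mm
    return dict(totals)

# Hypothetical series: 2 mm on each of the first 60 days of 2000
# (31 days in January plus 29 days in February, a leap year).
start = date(2000, 1, 1)
daily = {start + timedelta(days=i): 2.0 for i in range(60)}

monthly = monthly_totals(daily)
annual = annual_totals(daily)
```

The same aggregation would be applied to both the measured and the estimated series, after which the statistics are recomputed on the monthly or annual totals.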

3.3 Occurrence statistics

Table 5 shows the values of the occurrence statistics for the Pirapama River basin. For all analyzed rain gauges, the BIAS was less than 1, which indicates that the estimated rainfall data underestimated the measured data. The BIAS values ranged from 0.67 to 0.80. Bernardi (2016) studied a basin in the state of Rio Grande do Sul, Brazil, and the BIAS statistics ranged from 0.82 to 1.04, with an average of 0.95. De Almeida et al. (2020) reported BIAS values ranging from 0.81 to 0.97, which were considered satisfactory, indicating a good estimated rainfall data response.

Table 5 Results of the occurrence statistics for the Pirapama River basin

The DP and FAR statistics indicate the percentages of wet days that were correctly identified and those of dry days that were not correctly identified, respectively. The average values of the DP and FAR statistics were 0.50 and 0.32, respectively, indicating that about 50% of wet days were correctly identified, whereas 32% of dry days were not. Jiang et al. (2018) analyzed the performance of DP and FAR parameters comparing TRMM-estimated and measured rainfall data for Shanghai city and obtained values of 0.65 and 0.35, respectively. Similar values were also obtained by Ouatiki et al. (2017) for an extensive area in Morocco (PD = 0.40 and FAR = 0.60). In contrast, Anjum et al. (2018) obtained values of PD = 0.76 and FAR = 0.26 in Pakistan, while Yang et al. (2018) obtained PD = 0.77 and FAR = 0.37 for Dadu River basin, in China, and De Almeida et al. (2020) reported values around 0.5 for both parameters for Itapemirim River basin, in Brazil. In this study, CP values ranged from 0.56 to 0.66, which were lower than the values obtained by Soares et al. (2016) and De Almeida et al. (2020), who reported values above 0.70. However, the results can be considered satisfactory, because they indicated that the estimated rainfall data showed efficiencies greater than 50%, which can be explained by the location of the Pirapama River basin in the coastal zone of northeastern Brazil. According to Gadelha et al. (2019), this region presents a great overestimation of estimated rainfall data due to the high cloudiness that is usual in this region. In addition, these results are probably due to the inability of the passive microwave and infrared sensors to detect warm-rain processes over land in this region. Even so, these results show that the TRMM-estimated rainfall data can be a good source of data for Pirapama River basin, as well as for most of northeastern Brazil, although some uncertainties are found and need to be further studied.
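Occurrence statistics of this kind are usually derived from a wet/dry contingency table. The sketch below uses the standard textbook definitions of frequency bias, probability of detection, and false alarm ratio. These may differ in detail from the formulas adopted in this study, and the 1 mm/day wet-day threshold, the function name, and the sample data are all assumptions. CP is omitted because its definition is not given in this section.

```python
def occurrence_stats(measured, estimated, threshold=1.0):
    """Contingency-table statistics for wet-day detection.

    A day counts as wet when rainfall >= threshold (mm/day, assumed value).
    Assumes at least one measured wet day and one estimated wet day.
    """
    hits = misses = false_alarms = 0
    for m, e in zip(measured, estimated):
        wet_measured = m >= threshold
        wet_estimated = e >= threshold
        if wet_measured and wet_estimated:
            hits += 1
        elif wet_measured:
            misses += 1
        elif wet_estimated:
            false_alarms += 1
    return {
        # Ratio of estimated wet days to measured wet days (< 1: too few).
        "BIAS": (hits + false_alarms) / (hits + misses),
        # Fraction of measured wet days that the estimate also flags as wet.
        "PD": hits / (hits + misses),
        # Fraction of estimated wet days that were dry at the gauge.
        "FAR": false_alarms / (hits + false_alarms),
    }

# Hypothetical daily rainfall (mm), for illustration only.
measured = [0.0, 5.0, 10.0, 0.0, 3.0, 0.0, 8.0, 2.0]
estimated = [0.0, 4.0, 0.0, 2.0, 6.0, 0.0, 0.0, 1.5]
stats = occurrence_stats(measured, estimated)
```

In this made-up example the estimate flags four wet days against five measured ones, so BIAS is below 1, matching the underestimation pattern described for Table 5.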

3.4 Hydrological modeling performance

Based on the input data, the SWAT model was used to simulate the flow, after which the model was calibrated and validated in the same way using the two types of rainfall data (measured by rain gauges and estimated by TRMM). Before the calibration, a sensitivity analysis was performed. Figure 5 shows the most sensitive parameters and their order according to the degree of sensitivity. Based on this analysis, eight parameters were identified as the most sensitive for streamflow calibration in the study area, with a p-value equal to or less than 0.1 and t-stat above 1 (GW_Delay, GW_Revap, Esco, Gwqmn, Revapmn, Alpha_Bf, CN2, and CANMX) (Fig. 5). However, four parameters that were not considered sensitive were also included in the calibration because of their importance for the study region (i.e., Ch_K2, Ch_N2, Rchrg_Dp, and Sol_Awc), according to Andrade et al. (2019). Thus, twelve parameters were considered for the calibration. According to Daggupati et al. (2015), calibration need not be limited to the parameters identified as sensitive in the sensitivity analysis; all parameters can be evaluated based on the modeler's experience.

Fig. 5

Result of the sensitivity analysis of the parameters in the SWAT model for the Pirapama River basin

Of the eight parameters considered most sensitive during the sensitivity analysis, five are related to shallow and underground aquifers, which influence the base flow (GW_Revap, GW_delay, Gwqmn, Alpha_Bf, and Revapmn), two parameters are related to evapotranspiration and evaporation (Canmx and Esco), and one parameter is related to streamflow (CN2).
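The screening rule described for the sensitivity analysis (p-value of at most 0.1 and t-stat above 1, plus parameters retained for regional relevance) can be written as a short filter. This is only a sketch. The t-stat and p-value numbers below are invented for illustration and are not the values behind Fig. 5, taking the absolute value of the t-stat is an assumption, and `select_parameters` is a hypothetical helper.

```python
def select_parameters(sensitivity, forced=(), p_max=0.1, t_min=1.0):
    """Return parameters passing the screening rule, then any forced extras.

    sensitivity maps a parameter name to a (t_stat, p_value) pair.
    """
    selected = [
        name
        for name, (t_stat, p_value) in sensitivity.items()
        if p_value <= p_max and abs(t_stat) > t_min
    ]
    # Parameters kept for their regional importance despite low sensitivity.
    selected += [name for name in forced if name not in selected]
    return selected

# Made-up sensitivity results for four of the parameters named in the text.
sensitivity = {
    "CN2": (2.4, 0.02),
    "Esco": (-1.8, 0.08),
    "Ch_K2": (0.6, 0.55),
    "Sol_Awc": (0.3, 0.76),
}

screened = select_parameters(sensitivity)
calibrated = select_parameters(sensitivity, forced=("Ch_K2", "Sol_Awc"))
```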

After selecting the parameters most sensitive to the flow adjustment for the Pirapama River basin, the parameters were automatically calibrated with SWAT-CUP to adjust the calculated flows to match the observed flow data. Table 6 shows the parameters that were used in the calibration, the methods used, and the values adjusted after this process in the three contribution areas of the flow stations.

Table 6 Parameters and methods used in the calibration of the SWAT model, and the fitted values using rain gauge-measured and TRMM-estimated rainfall data

Methods: v replace, r relative, and a absolute
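The three methods listed above change a parameter in different ways. As SWAT-CUP is commonly documented, "v" replaces the existing value, "a" adds the fitted value to it, and "r" multiplies the existing value by one plus the fitted value. The sketch below illustrates that convention under this assumption; the function name and the numbers are hypothetical and are not taken from Table 6.

```python
def apply_change(current, method, value):
    """Apply a calibration change to a parameter value.

    method: "v" (replace), "r" (relative), or "a" (absolute).
    """
    if method == "v":
        return value                    # existing value is replaced
    if method == "r":
        return current * (1.0 + value)  # existing value is scaled
    if method == "a":
        return current + value          # fitted value is added
    raise ValueError(f"unknown method: {method!r}")

# Illustrative values only.
new_cn2 = apply_change(70.0, "r", -0.1)      # CN2 reduced by 10%
new_gw_delay = apply_change(31.0, "a", 10.0) # GW_Delay increased by 10 days
new_alpha_bf = apply_change(0.048, "v", 0.05)
```

The relative method preserves the spatial pattern of a parameter that varies between sub-basins, which is one reason it is often preferred for spatially distributed parameters such as CN2.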

3.4.1 Calibration and validation

The hydrographs show the observed and calculated flows after the calibration (2000 to 2006) and validation (2007 to 2010) for three streamflow stations (Fig. 6a-c). Figure 6a shows that after the calibration with measured rainfall data (the simulated Q), the model simulated the flow data well for Destilaria Bom Jesus station, exhibiting improvements in the flow peak, base flow and hydrograph recession, which more closely matched the observed values. However, for the period between January and November 2004 and in July 2005, the model did not simulate the flow very well. Regarding the adopted statistics, according to Moriasi et al. (2007), the results are considered very good (NS = 0.82, R2 = 0.83 and PBIAS = 9.9%) (Table 7). In the validation period, the hydrological modeling presented satisfactory simulations with good fit between the hydrographs (measured and simulated) for Destilaria Bom Jesus station. The results also show a good representation of the peaks and base flows, except for the periods between March and August 2008 and between August and October 2010 (Fig. 6a). For Destilaria Bom Jesus station, the values of NS, R2, and PBIAS obtained in the validation were lower than those found in the calibration. However, the validation modeling results were still classified as very good for NS (0.78) and good for R2 (0.72) and PBIAS (−12.55%) (Table 7).
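For reference, the NS statistic used throughout this section is the Nash–Sutcliffe efficiency, which compares the squared simulation errors with the variance of the observations. A minimal sketch follows; the function name and the sample flows are illustrative.

```python
def nash_sutcliffe(observed, simulated):
    """Nash–Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    performance of always predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum

# Hypothetical monthly flows (m3/s), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [1.5, 2.5, 3.5, 4.5, 5.5]
ns = nash_sutcliffe(observed, simulated)
```

A negative NS means that the mean of the observed flows would have been a better predictor than the simulation.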

Fig. 6

Calibration and validation of the monthly streamflow for (a) Destilaria Bom Jesus, (b) Destilaria Inexport, and (c) Cachoeira Tapada

Table 7 Performance statistics after calibration and validation of the results based on measured and estimated rainfall data

After adjusting the parameters, the simulated values of the TRMM-obtained flow (Qsim_TRMM) were also correlated well with the observed data (Fig. 6). Nevertheless, these values were less accurate than the simulations that used the measured precipitation data, especially in relation to the peak flows. The statistical values of NS (0.75) and R2 (0.74) for the Destilaria Bom Jesus station were considered good, as were those of PBIAS (−7.1%), which reflects underestimation of the data (Table 7). These statistics were lower than those obtained in the calibration of the flow generated by the measured data. After validation, the flow data obtained through the TRMM rain estimates (Q_TRMM) also fit relatively well with the observed data with good representations of the peak and base flows in general, except in 2008, which showed a different variability from the measured data (with early hydrograph recession). This statistical analysis indicates satisfactory adjustments to NS = 0.61 and R2 = 0.64 and a very good adjustment to PBIAS = 6.07%. However, in comparison, the validation statistics from Q_TRMM showed less accuracy, except for PBIAS (6.7%) (Table 7).

Figure 6b shows the calibration and validation results for the monthly streamflow at the Destilaria Inexport station. The hydrographs show that after calibration, the simulated flow data obtained through the measured rainfall fit closely with the observed data; however, the periods between July and October 2000 and between June and December 2004 did not adjust well to the flow peaks, presenting an overestimate and an underestimate, respectively. In addition, the base flow did not show a good adjustment, especially between August 2004 and April 2005, when the values were highly discrepant. In general, the adopted statistics show that the results were considered very good with an NS of 0.81, an R2 of 0.84 and a PBIAS of 2.33% for the Destilaria Inexport station. For the values obtained after the validation (2007–2010), the estimated data also fit well to the observed data, with good adjustments in the peak, average and base flows. The period between March and August 2009 did not display such a good fit for the peak flow and overestimated the observed flow, but the variation trend of the flow during the period was represented well. The values of NS, R2, and PBIAS obtained after validation were lower than those found during the calibration, except for the R2 value, which was higher (Table 7). According to standards in Moriasi et al. (2007), the values obtained after the validation for this flow station were considered good for NS (0.72), very good for R2 (0.86) and satisfactory for PBIAS (19.11%).

After calibration, the simulated flow from the TRMM data was adjusted satisfactorily, improving the peak flow in relation to the rainfall peaks during the studied period. However, the peaks did not fit as well as the simulated values that used measured rainfall data for Destilaria Inexport station. The simulated values from the TRMM data resulted in statistics with acceptable values, although these were lower than those found in the calibration with the measured rainfall. Thus, the values of NS (0.75) and R2 (0.70) were considered good, while the value of PBIAS (5.51%) was very good (Table 7). In the validation, part of the series did not fit well to the observed data, with a delay in the response of the simulated hydrograph in relation to the measured flow. The variability of the hydrograph was represented well only in 2010. Thus, the statistical analysis also did not effectively represent the validation of the flow data with TRMM, as unsatisfactory NS (0.04) and R2 (0.40) values were obtained. However, the value of PBIAS (10.40%) was considered very good, indicating an overestimation of the estimated data (Table 7).

Figure 6c shows the hydrographs of the calculated and observed discharge data after the calibration and validation for the Cachoeira Tapada station. After the calibration, the simulated Q fit relatively well to the observed data, mainly improving the base flow. The flow peaks were not estimated as well. The NS value was satisfactory (0.68), the R2 value (0.71) was good, and the PBIAS value (−1.5%) was very good (Moriasi et al. 2007). During the validation, the simulations fit better to the observed data, especially the peak flows, whereas the base flows were underestimated in much of the series (October 2007 to January 2008 and October 2009 to January 2010) and overestimated between January and May 2007. Regarding the statistical data, the NS value (0.67) was good, the R2 value (0.85) was very good, and the PBIAS value (−19.18%) was satisfactory.

The calibration results with TRMM-estimated rainfall data for this streamflow gauge did not fit as well as those obtained using the observed rainfall, especially for the base flow (Fig. 6c). The highest peaks were overestimated by the model, especially in 2000, 2002, and 2005. The performance indicators obtained during the calibration ranged from satisfactory to good and very good (NS = 0.54, R2 = 0.75 and PBIAS = −2.43%, respectively). In the validation, the hydrograph showed a delay in the response of the simulated data in much of the series; the peak flow and recession were delayed in relation to the measured flow, except in 2010, in which similar variability to the measured flow was observed, but with significant overestimation between June and July 2010 (Fig. 6c). The statistics for the validation using the TRMM data indicated that the simulated values did not fit reliably (NS = −0.24 and R2 = 0.35), except for PBIAS (−8.99%), which was considered very good (Table 7).
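The qualitative ratings quoted for NS and PBIAS can be reproduced with a small classifier. The sketch assumes the monthly streamflow thresholds commonly attributed to Moriasi et al. (2007); they should be checked against that reference before reuse. R2 is left out because no rating thresholds for it are stated in this section, and the function names are hypothetical.

```python
def rate_ns(ns):
    """Rating for Nash–Sutcliffe efficiency (assumed monthly thresholds)."""
    if ns > 0.75:
        return "very good"
    if ns > 0.65:
        return "good"
    if ns > 0.50:
        return "satisfactory"
    return "unsatisfactory"

def rate_pbias(pbias_pct):
    """Rating for streamflow percent bias, using its absolute value."""
    magnitude = abs(pbias_pct)
    if magnitude < 10.0:
        return "very good"
    if magnitude < 15.0:
        return "good"
    if magnitude < 25.0:
        return "satisfactory"
    return "unsatisfactory"

# Values quoted in the text for selected calibration/validation runs.
ratings = {
    "NS 0.82": rate_ns(0.82),
    "NS 0.61": rate_ns(0.61),
    "NS -0.24": rate_ns(-0.24),
    "PBIAS -12.55": rate_pbias(-12.55),
    "PBIAS 19.11": rate_pbias(19.11),
}
```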

Some of the R2, NS, and PBIAS values obtained in both the calibration and the validation stages with the measured data and the TRMM estimates were higher than the values considered acceptable by Green et al. (2006), Green and Van Griensven (2008), Ren et al. (2018), Santhi et al. (2001), and Brighenti et al. (2019), with some exceptions, for example, the values estimated using the TRMM data in the validation at the Destilaria Inexport and Cachoeira Tapada stations. The hydrological modeling results presented in this research are similar to those obtained by Santos et al. (2014), who performed simulations with the SWAT model for the Tapacurá River Basin in northeastern Brazil; the authors obtained good results in both calibration and validation, with NS and R2 values of 0.78 and 0.79, respectively, for the calibration period and 0.85 and 0.86, respectively, for the validation period.

4 Conclusions

This study analyzed the simulation of streamflow using the SWAT model based on measured and estimated gridded meteorological datasets for Pirapama River basin, located in a humid area with scarce data in northeastern Brazil. The conclusions from the present study can be summarized as follows:

  • The accuracy of the LULC mapping for the Pirapama River basin was considered very good, based on the confusion matrix and the adopted statistics.

  • On the daily scale, the precipitation product derived from TRMM 3B42 showed poor correlation with the gauge-measured precipitation. In addition, the daily analysis overestimated precipitation compared with the gauge data, especially in the rain peaks.

  • The monthly satellite data showed considerable improvement and more closely matched the measured precipitation data. However, the satellite data underestimated the monthly average flow rates in a considerable proportion of the series.

  • The CFSR data are not able to represent well the trend of increasing rainfall within the region, as well as its variability. For the daily scale, however, the R2, RMSE, and NRMSE values indicated a low precision of the estimated rainfall values in relation to the measured data.

  • Results of modeling with the satellite-derived precipitation data showed good statistical results in the calibration.

  • The data from the TRMM satellite are capable of generating satisfactory results, despite the unsatisfactory values in the validation, although not as good as those obtained with the rain gauge-measured rainfall data.

  • The results show that satellite-estimated rainfall data can serve as a supporting alternative for areas with scarce rain gauge data, given that rainfall is the main input variable for hydrological modeling, especially for the SWAT model. As a result, the TRMM 3B42 product can be a source of data for streamflow simulations of the Pirapama River basin and can provide valuable information for the future management of water resources in ungauged basins.