1 Introduction

Water vapor is the dominant greenhouse gas in the Earth’s atmosphere and, at the same time, highly variable. Analyzing its spatial and temporal variations is a major objective of climate research (Bennartz and Fischer 2001) and matters in several areas of atmospheric science, on scales ranging from turbulence to synoptic-scale systems, including cloud formation and maintenance, radiation and climate (Couvreux et al. 2005). Information about the distribution and variability of atmospheric water vapor is essential for understanding the processes that control the Earth's radiative budget and the hydrological cycle (Thies and Bendix 2011). Its accurate measurement is therefore of great interest (Raja et al. 2008).

Precipitable Water Vapor (PWV) is the total atmospheric water vapor contained in a vertical column extending from the Earth’s surface to the top of the atmosphere. It is expressed as the depth, in millimeters, of the liquid water that would result if all the water vapor in the column were condensed in a vessel of the same cross section (Ruckstuhl et al. 2007).
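For reference, this definition corresponds to the standard column integral of specific humidity \(q\) over pressure \(p\), from the top of the atmosphere to the surface pressure \(p_{s}\), where \(g\) is the gravitational acceleration and \(\rho_{w}\) the density of liquid water (symbols introduced here for illustration):

$${\text{PWV}} = \frac{1}{\rho_{w}\, g}\int_{0}^{p_{s}} q\,dp$$

Since \((1/g)\int q\,dp\) is a mass per unit area (kg m\(^{-2}\)) and \(\rho_{w} \approx 1000\) kg m\(^{-3}\), a column mass of 1 kg m\(^{-2}\) corresponds to a liquid-water depth of 1 mm.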

There are various techniques for estimating PWV, including the Global Positioning System (GPS), the Aerosol Robotic Network (AERONET), radiosondes, satellite measurements, and reanalysis data. PWV data derived from ground-based measurements, such as radiosonde and AERONET PWV, are considered reference data sets because of their accuracy. The radiosonde network has long been the primary in situ observing system for monitoring atmospheric water vapor. However, the use of radiosondes is restricted by their high operational costs, decreasing sensor performance in cold and dry conditions, poor coverage over the oceans and in the Southern Hemisphere, and very inhomogeneous spatial distribution (Li et al. 2003; Vey et al. 2010). These limitations, together with the capabilities of satellite sensors, have drawn increasing attention to satellite data. The Moderate Resolution Imaging Spectroradiometer (MODIS) offers a new opportunity to improve global monitoring of temperature, moisture, and ozone distributions and their changes. The wide spectral range, high spatial resolution, and near-daily global coverage of MODIS enable it to observe the Earth’s atmosphere and continuously monitor changes (Seemann et al. 2003).

Although MODIS observations have continuous spatial coverage, their accuracy must be evaluated against ground-based measurements. Gao and Kaufman (2003) pointed out that typical errors in the derived water vapor values are between 5 and 10%, and can reach 14% under hazy conditions. Several sources of error have been reported for water vapor retrievals from near-IR channels, including the spectral reflectance of the surface, sensor radiometric and spectral calibration, pixel registration between channels, atmospheric temperature and moisture profiles, and the amount of haze. In addition, Seemann et al. (2003) noted that moisture retrievals are inaccurate in cloud-contaminated regions. Albert et al. (2005) and Prasad and Singh (2009) used only cloud-free, clear-sky samples to validate MODIS PWV data.

Numerous studies have evaluated the accuracy of satellite data against other data sets in different parts of the world, and many have compared MODIS PWV with GPS data. Some, such as Li et al. (2003) in Germany, Torres et al. (2010) in Spain, Gui et al. (2017) in China (under clear sky conditions), Wang et al. (2017) on the Tibetan Plateau, and recently Khaniani et al. (2020) over Iran, concluded that MODIS overestimates PWV relative to GPS. Others, such as Lu et al. (2011) over southern Tibet under clear sky, Ningombam et al. (2016) over the trans-Himalayan region and Bai et al. (2021) over China, reported that MODIS underestimates PWV relative to GPS. Liu et al. (2015) and Gui et al. (2017) showed that MODIS PWV tends to be higher than radiosonde PWV in China. Other studies have compared PWV from reanalysis data with satellite observations. ERA5, from the European Centre for Medium-Range Weather Forecasts (ECMWF), is the latest reanalysis data set and an improvement over ERA-Interim. Zhang et al. (2019) indicated that ERA5 PWV is more accurate than ERA-Interim PWV. Comparisons of ERA5 PWV with GPS data have shown satisfactory results over China; recently, Bai et al. (2021) showed that the difference between ERA5 and GPS PWV over China is negligible (− 0.28 mm).

In Iran, few studies have addressed precipitable water vapor. For instance, Asakereh et al. (2015) studied the anomalies and cycles of precipitable water over Iran using NCEP–NCAR data sets. They found that coastal areas experienced positive anomalies owing to their proximity to large bodies of water, while upland areas and the northwest and northeast of the country experienced negative anomalies because of their altitude and distance from water sources. Recently, Khaniani et al. (2020) concluded that MODIS PWV over Iran is overestimated compared to GPS, but they did not use radiosonde or reanalysis data, and their study period was only one year long. To date, therefore, no study has evaluated the MODIS PWV products with respect to the climatology of atmospheric water vapor or compared them with the most recent reanalysis, ERA5, over Iran.

The aim of this paper is to assess the quality of MODIS PWV products at monthly and daily scales over Iran. The results are therefore presented in two parts. The first compares the long-term (2003–2015) monthly mean MODIS Level 3 and ERA5 PWV data sets. The second validates the Level 2 MODIS PWV products against radiosonde data at the daily scale, using 12 radiosonde stations over Iran. Additionally, we take the sky conditions (cloudiness and visibility) into account in the comparison.

2 Data and methodology

The data sets used in the present study are listed in Table 1. Radiosonde and ERA5 data were used as reference data for evaluating the MODIS PWV estimates. MODIS has five near-infrared bands located within and around the \(0.94\,\upmu \text{m}\) water vapor band for remote sensing of column water vapor over clear land areas and over sunlit oceanic areas. The algorithm uses ratios of the water vapor absorbing bands (within the \(0.94\,\upmu \text{m}\) band) to the atmospheric window bands at 0.86 and 1.24 \(\upmu \text{m}\) (King et al. 2003).
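The band-ratio idea can be illustrated with a toy calculation. The sketch below is not the operational MOD05 algorithm: it assumes the surface reflectance varies linearly with wavelength between the nominal 0.865 and 1.24 µm window bands, forms an apparent transmittance for an absorbing band from that interpolated continuum, and inverts a placeholder transmittance model \(T = \exp(\alpha - \beta\sqrt{W})\) whose coefficients are hypothetical. The operational retrieval is based on radiative-transfer calculations rather than on this toy model.

```python
import math

def continuum_reflectance(lam, lam1, rho1, lam2, rho2):
    """Linearly interpolate window-band reflectance to wavelength lam (micrometres)."""
    w2 = (lam - lam1) / (lam2 - lam1)
    return (1.0 - w2) * rho1 + w2 * rho2

def apparent_transmittance(rho_abs, lam_abs, rho_086, rho_124):
    """Ratio of absorbing-band reflectance to the interpolated window continuum.

    0.865 and 1.24 um are the nominal window-band centres assumed here.
    """
    cont = continuum_reflectance(lam_abs, 0.865, rho_086, 1.24, rho_124)
    return rho_abs / cont

# Placeholder model T = exp(alpha - beta * sqrt(W)); hypothetical coefficients,
# for illustration only (units of W are whatever the coefficients imply).
ALPHA, BETA = 0.0, 1.0

def transmittance_from_w(w, alpha=ALPHA, beta=BETA):
    return math.exp(alpha - beta * math.sqrt(w))

def w_from_transmittance(t, alpha=ALPHA, beta=BETA):
    """Invert the placeholder model; valid while alpha - ln(t) >= 0."""
    return ((alpha - math.log(t)) / beta) ** 2

# Example: window reflectances 0.30 (0.865 um) and 0.40 (1.24 um),
# absorbing-band reflectance 0.16 at 0.94 um.
t_obs = apparent_transmittance(0.16, 0.94, 0.30, 0.40)
w_est = w_from_transmittance(t_obs)
```

The linear continuum is what lets a single ratio cancel most of the surface-reflectance effect; the remaining dependence on water vapor is carried entirely by the transmittance model.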

Table 1 Details of the data used in the study

These data were obtained at monthly and daily scales. As mentioned above, the study consists of two main parts. In the first, the long-term (2003–2020) spatial and temporal characteristics of monthly mean PWV over Iran are investigated. For this purpose, Level-3 MODIS Terra (MOD08_M3) products and ERA5 data were obtained for Iran at 1° and 0.25° latitude/longitude resolution, respectively. At these resolutions, there are 196 grid points for MODIS and 3136 for ERA5. We therefore extracted ERA5 PWV at the geographical location of the center of each 1° × 1° MODIS grid cell, obtaining 196 ERA5 samples consistent with the MODIS samples over Iran. The two data sets were compared in terms of descriptive statistics, including the spatial mean and the coefficient of variation (CV). The CV was calculated spatially, by dividing the spatial standard deviation (σ) by the spatial mean of the PWV over the 196 1° grid points of Iran (Eq. 1).

$${\text{spatial}}\,\,{\text{CV}} = \,\frac{{{\text{spatial}}\,\sigma }}{{{\text{spatial}}\,\upmu }}$$
(1)
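A minimal sketch of the point extraction and of Eq. 1, assuming ERA5 PWV is held in a Python dict keyed by (lat, lon) on the 0.25° grid and that the MODIS 1° cell centers fall on that grid (e.g. 25.5° N, 60.5° E). The dict layout and function names are hypothetical, and the population standard deviation is used because the text does not say which estimator was applied.

```python
import statistics

def nearest_era5(era5, lat, lon, res=0.25):
    """Value at the ERA5 node nearest to (lat, lon) on a regular `res`-degree grid."""
    key = (round(lat / res) * res, round(lon / res) * res)
    return era5[key]

def subsample_to_modis(era5, modis_centres, res=0.25):
    """One ERA5 value per MODIS 1-degree cell centre (196 centres over Iran)."""
    return [nearest_era5(era5, lat, lon, res) for lat, lon in modis_centres]

def spatial_cv(values):
    """Eq. 1: spatial standard deviation divided by spatial mean."""
    return statistics.pstdev(values) / statistics.mean(values)

# Usage with a toy ERA5 field
era5 = {(25.5, 60.5): 30.0, (25.25, 60.5): 29.0, (35.5, 50.5): 10.0}
samples = subsample_to_modis(era5, [(25.5, 60.5), (35.5, 50.5)])
cv = spatial_cv(samples)
```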

In addition, the spatial relationship of MODIS and ERA5 PWV with topography was investigated. Digital elevation model (DEM) data for Iran were obtained from the NASA Shuttle Radar Topography Mission (SRTM). SRTM DEMs are available at 3-arc-second resolution (approx. 90 m) with near-global coverage from http://www.cgiar-csi.org/data/srtm90m-digital-elevation-database-v4-1. In this study, the DEM was resampled from 90 m to the 1° × 1° grid of the 196 MODIS points over Iran, and the spatial relationship of MODIS and ERA5 PWV with topography was calculated.
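The text does not state which resampling method was used to bring the 90 m SRTM DEM onto the 1° MODIS grid. A simple block average is one plausible choice; the sketch below assumes it, and further assumes that DEM samples arrive as (lat, lon, elevation) tuples and that a hypothetical no-data value of -32768 marks pixels to skip.

```python
import math
from collections import defaultdict

def block_mean(samples, cell_deg=1.0, nodata=-32768):
    """Average fine-resolution DEM samples into cell_deg x cell_deg boxes.

    Returns {(lat_centre, lon_centre): mean elevation}, keyed by box centre so
    the result can be matched to MODIS L3 1-degree cell centres.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for lat, lon, elev in samples:
        if elev == nodata:
            continue  # skip voids / no-data pixels
        key = ((math.floor(lat / cell_deg) + 0.5) * cell_deg,
               (math.floor(lon / cell_deg) + 0.5) * cell_deg)
        sums[key] += elev
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

cells = block_mean([(35.2, 51.3, 1000.0), (35.8, 51.9, 2000.0),
                    (36.1, 51.5, 500.0), (35.5, 51.5, -32768.0)])
```

A mean keeps the cell value representative of the whole 1° box; a nearest-pixel pick would be cheaper but could land on a single peak or valley.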

We calculated the Percentage of Error (PE) of the MODIS PWV data for all grid points in Iran. First, the Root Mean Square Error (RMSE) was calculated as follows:

$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {e_{i}^{2} } }$$
(2)

where \(e_{i}\) is the \(i\)th of \(n\) differences between monthly MODIS and ERA5 PWV at a given 1° latitude/longitude grid point. The Percentage of Error of MODIS PWV is then calculated as follows:

$${\text{PE}} = \frac{{\text{RMSE}}}{{\text{Average monthly mean ERA5 PWV}}} \times 100$$
(3)
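Eqs. 2 and 3 for a single grid point can be sketched as follows. The inputs are assumed to be two paired sequences of monthly PWV (MODIS and ERA5) at the same 1° grid point; the function names are illustrative.

```python
import math

def rmse(modis, era5):
    """Eq. 2: root mean square of e_i = MODIS_i - ERA5_i over n paired values."""
    if len(modis) != len(era5) or not modis:
        raise ValueError("need two non-empty series of equal length")
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(modis, era5)) / len(modis))

def percentage_error(modis, era5):
    """Eq. 3: RMSE divided by the mean ERA5 PWV, times 100."""
    return rmse(modis, era5) / (sum(era5) / len(era5)) * 100.0

pe = percentage_error([12.0, 14.0, 16.0], [10.0, 15.0, 14.0])
```

Normalizing by the ERA5 mean makes PE comparable between the humid coasts and the dry interior, where the same absolute RMSE would mean very different relative errors.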

Next, the spatial relationship of PE with the long-term (2003–2020) spatial monthly means of cloud fraction (CF) and aerosol optical depth (AOD) was calculated, once by ordinary least squares (OLS) and once by Geographically Weighted Regression (GWR). GWR extends the linear regression model by estimating local parameters (Fotheringham et al. 1998). The GWR model is:

$$y_{i} = a_{0}(u_{i}, v_{i}) + \sum_{k} a_{k}(u_{i}, v_{i})\, x_{ik} + \varepsilon_{i}$$
(4)

In Eq. 4, \(y_{i}\) is the dependent variable at point \(i\), \(x_{ik}\) is the value of the \(k\)th independent variable at that point, \(\varepsilon_{i}\) is the error term, \((u_{i}, v_{i})\) are the coordinates of the \(i\)th point in space, and \(a_{k}(u_{i}, v_{i})\) is a realization of the continuous function \(a_{k}(u, v)\) at point \(i\). The advantage of GWR is that it can reveal spatial non-stationarity in the relationship between the PE of MODIS PWV (the dependent variable) and Cloud Fraction (CF) and Aerosol Optical Depth (AOD) (the independent variables). The Akaike Information Criterion (AIC) is commonly recommended for bandwidth selection in GWR (Fotheringham et al. 2003).
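To make the OLS/GWR contrast concrete, here is a small pure-Python sketch of Eq. 4. It assumes a Gaussian distance kernel, Euclidean distances between (u, v) coordinates, and a bandwidth chosen by grid search on one common form of the corrected AIC (AICc). These are common choices in the GWR literature, but the text does not specify the kernel or the search procedure, so treat them as assumptions; a production analysis would use a dedicated GWR package. In the setting of this study, each row of `X` would be `[1, CF, AOD]`.

```python
import math

def solve(a, b):
    """Solve the small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def weighted_fit(X, y, w):
    """Weighted least squares: returns (beta, X^T W X)."""
    p = len(X[0])
    xtwx = [[sum(wi * xi[r] * xi[c] for wi, xi in zip(w, X)) for c in range(p)]
            for r in range(p)]
    xtwy = [sum(wi * xi[r] * yi for wi, xi, yi in zip(w, X, y)) for r in range(p)]
    return solve(xtwx, xtwy), xtwx

def gwr(coords, X, y, bandwidth):
    """Fit Eq. 4 locally at every point; return (local betas, fitted values, trace(S))."""
    betas, fitted, trace_s = [], [], 0.0
    for i, (ui, vi) in enumerate(coords):
        w = [math.exp(-0.5 * (math.hypot(ui - u, vi - v) / bandwidth) ** 2)
             for u, v in coords]
        beta, xtwx = weighted_fit(X, y, w)
        betas.append(beta)
        fitted.append(sum(b * x for b, x in zip(beta, X[i])))
        # Diagonal of the hat matrix: S_ii = w_ii * x_i^T (X^T W_i X)^-1 x_i
        z = solve(xtwx, X[i])
        trace_s += w[i] * sum(a * b for a, b in zip(X[i], z))
    return betas, fitted, trace_s

def ols(X, y):
    """Global OLS fitted values (all weights equal to 1)."""
    beta, _ = weighted_fit(X, y, [1.0] * len(y))
    return [sum(b * x for b, x in zip(beta, xi)) for xi in X]

def r_squared(y, fitted):
    mean = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def aicc(y, fitted, trace_s):
    """Corrected AIC with sigma^2 = RSS / n; infinite when the correction term blows up."""
    n = len(y)
    if n - 2 - trace_s <= 0:
        return math.inf
    sigma = math.sqrt(sum((a - f) ** 2 for a, f in zip(y, fitted)) / n)
    return (2 * n * math.log(sigma) + n * math.log(2 * math.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))

def select_bandwidth(coords, X, y, candidates):
    """Grid search: keep the candidate bandwidth with the lowest AICc."""
    return min(candidates, key=lambda bw: aicc(y, *gwr(coords, X, y, bw)[1:]))
```

With a very large bandwidth all weights approach 1 and the local fits collapse to the OLS fit, which is a useful sanity check; smaller bandwidths let the coefficients \(a_k(u_i, v_i)\) vary in space, which is why GWR can reach a much higher R2 than OLS when the PE relationship is non-stationary.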

In the second part, January 2004 (a month with low PWV and an unstable atmosphere) and July 2008 (a month with high PWV and a stable atmosphere) were selected for comparing the daily MODIS (MOD05-L2) PWV product with radiosonde data at 10 radiosonde stations in Iran. The geographical locations and elevations of these stations are shown in Fig. 1. Radiosondes provide vertical profiles of meteorological variables such as pressure, temperature, relative humidity and wind (Li et al. 2003). Radiosondes are generally expected to yield PWV with an uncertainty of a few millimeters, which meteorologists regard as the accuracy standard for PWV (Niell et al. 2001).
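The text does not describe how PWV was derived from the soundings. A common approach, sketched here as an assumption, integrates specific humidity \(q\) over pressure, PWV = (1/g)∫q dp, which gives kg m⁻² and hence millimeters of liquid water. Dew point is converted to vapor pressure with the Bolton (1980) approximation, levels are assumed to be ordered from the surface upward, and the function names are illustrative.

```python
import math

G = 9.80665  # standard gravity, m s^-2

def specific_humidity(p_hpa, td_c):
    """Specific humidity (kg/kg) from pressure (hPa) and dew point (deg C).

    Vapour pressure from Bolton (1980): e = 6.112 exp(17.67 Td / (Td + 243.5)) hPa.
    """
    e = 6.112 * math.exp(17.67 * td_c / (td_c + 243.5))
    return 0.622 * e / (p_hpa - 0.378 * e)

def pwv_mm(p_hpa, q):
    """Trapezoidal integral of q over pressure, divided by g.

    The result is in kg m^-2, numerically equal to mm of liquid water.
    p_hpa must decrease with height (surface first).
    """
    total = 0.0
    for k in range(len(p_hpa) - 1):
        dp_pa = (p_hpa[k] - p_hpa[k + 1]) * 100.0  # hPa -> Pa
        total += 0.5 * (q[k] + q[k + 1]) * dp_pa
    return total / G

# Toy sounding: three levels with dew points in deg C
levels = [1000.0, 850.0, 700.0]
q_levels = [specific_humidity(p, td) for p, td in zip(levels, [15.0, 8.0, -5.0])]
column = pwv_mm(levels, q_levels)
```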

Fig. 1

The distribution of elevation and radiosonde stations over Iran

For an appropriate comparison, we created simultaneous time series of MODIS (Level 2) and radiosonde PWV. Because the two data sets have different horizontal resolutions, both were resampled in GIS software to a common grid at the geographical locations of the radiosonde stations. The capability of the MODIS data was then evaluated at each station using statistical measures such as the coefficient of determination (R2) and the Root Mean Square Error (RMSE). This comparison was carried out under suitable (clear sky and visibility of more than 10 km) and unsuitable (cloudy sky and poor visibility, i.e., less than 10 km) atmospheric conditions.
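The station-level statistics can be sketched as follows, assuming each matched day is a dict with the hypothetical keys `modis_pwv`, `sonde_pwv`, `clear_sky` and `visibility_km`. Here R2 is taken as the squared Pearson correlation (the coefficient of determination of a simple linear fit), and any day that fails the clear-sky/visibility test is treated as unsuitable; both are simplifying assumptions.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def is_suitable(day, min_visibility_km=10.0):
    """Clear sky and visibility above 10 km, the threshold used in the text."""
    return day["clear_sky"] and day["visibility_km"] > min_visibility_km

def compare_by_condition(days, min_days=3):
    """Return {'suitable': (R2, RMSE, n) or None, 'unsuitable': (R2, RMSE, n) or None}."""
    result = {}
    for label, wanted in (("suitable", True), ("unsuitable", False)):
        subset = [d for d in days if is_suitable(d) == wanted]
        if len(subset) < min_days:
            result[label] = None  # too few matched days for meaningful statistics
            continue
        modis = [d["modis_pwv"] for d in subset]
        sonde = [d["sonde_pwv"] for d in subset]
        result[label] = (pearson_r(modis, sonde) ** 2, rmse(modis, sonde), len(subset))
    return result
```

Pooling all stations, as done later for Fig. 7, amounts to calling `compare_by_condition` on the concatenated list of matched days.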

3 Results and discussion

3.1 Comparison of MODIS and ERA-5 PWV (Monthly scale)

Boxplots of the average monthly PWV (2003–2020) from MODIS and ERA5 are presented in Fig. 2a, b. From January to March, the average MODIS PWV is lower than ERA5 PWV. The two are equal in April, while from May to September (the warm period) the average MODIS PWV is higher than ERA5. During the fall season, it again becomes lower than ERA5. The average annual MODIS and ERA5 PWV values are 13.3 and 13 mm, respectively. These values are very close to each other and to the value derived by Asakereh and Doostkamian (2014) from NCEP reanalysis data (about 14.3 mm). In addition, Kern et al. (2008) concluded that the ERA5 and MODIS PWV fields are very similar. The coefficients of determination between the monthly means and between the standard deviations of PWV from the two data sets are 0.89 and 0.8, respectively (Fig. 2c, d), indicating that MODIS and ERA5 PWV values agree closely. For both data sets, the maximum and minimum PWV values are observed in July and January, respectively. Tuller (1968) indicated that February and July are the months of lowest and highest precipitable water, respectively, at most stations; at some stations August replaces July, and at a smaller number January replaces February. Our results are also consistent with the study of Maghrabi and Dajani (2014) over Saudi Arabia, which reported the lowest PWV values in December and January and the highest in June and July. They pointed out that during warm periods, increases in temperature and in the height of constant pressure levels increase the water vapor capacity of the air mass, keeping it away from the saturation point and consequently preserving high PWV values. In contrast, in cold periods, the decrease in the height of constant pressure levels reduces the water vapor capacity of the air mass and facilitates condensation, resulting in lower PWV.
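A minimal sketch of how a long-term monthly climatology (mean and spread per calendar month) and the R2 between two climatologies might be computed, assuming each data set is supplied as ((year, month), PWV) pairs for one grid point or a domain average. The function names are illustrative, and the population standard deviation is an assumption.

```python
import math
from collections import defaultdict

def monthly_climatology(series):
    """Group ((year, month), pwv) pairs by calendar month -> {month: (mean, std)}."""
    groups = defaultdict(list)
    for (_year, month), value in series:
        groups[month].append(value)
    clim = {}
    for month in sorted(groups):
        vals = groups[month]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))  # population SD
        clim[month] = (mean, std)
    return clim

def r_squared(x, y):
    """Squared Pearson correlation, e.g. between 12 MODIS and 12 ERA5 monthly means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Applying `r_squared` once to the 12 monthly means and once to the 12 monthly standard deviations of the two data sets corresponds to the two comparisons in Fig. 2c, d.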

Fig. 2

Long-term (2003–2020) monthly average of PWV from MODIS (a), and ERA-5 (b) and comparison of monthly mean (c) and standard deviation (d) of both data sets over Iran

Figure 3 shows the spatial variations of PWV for MODIS (first and third rows) and ERA5 (second and fourth rows). Temporally, the highest PWV for both MODIS (44.4 mm in the southeast, on the Oman Sea coast) and ERA5 (41.6 mm on the southern shores of the Caspian Sea) occurs in July, although the location of the maximum differs. For ERA5, the maximum PWV is located in southeastern Iran during the six cold months of the year (November to April), whereas in the warmer months (May–October) it is located on the southern shores of the Caspian Sea in the north of the country. For MODIS, by contrast, the maximum occurs on the southeastern shores of the Oman Sea in all months. The minimum PWV is observed in January over the Zagros Mountains in northwestern Iran (3.7 mm for ERA5 and 2.9 mm for MODIS). Asakereh et al. (2015) concluded that factors such as proximity to water bodies in the south and north of Iran and the roughness of the Zagros Mountains and inland deserts could shape the patterns of precipitable water.

Fig. 3

Spatial average of the PWV in Iran between 2003 and 2020, separately for each month. (The first and third rows for MODIS data, the second and fourth rows for ERA5 reanalysis data)

Figure 4 shows scatter diagrams of MODIS and ERA5 PWV for each month. The highest coefficient of determination, 0.89, is obtained in January. In January, the spatial coefficient of variation of PWV is also higher than in other months (43% for MODIS and 38% for ERA5). This high spatial variation results from the high PWV on the southern coast and the low values elsewhere, a pattern seen in both data sets. The lowest coefficient of determination is observed in May (R2 = 0.57), when the spatial coefficient of variation of PWV is 26% for ERA5 and 20% for MODIS. This difference arises because MODIS shows higher PWV than ERA5 over the central deserts of Iran (in the eastern half).

Fig. 4

The scatter diagram between the MODIS and ERA 5 PWV over Iran separately for each month

3.2 The PE of MODIS-PWV (in comparison to ERA-5)

In this section, the monthly PE values of MODIS data relative to ERA5 are calculated. The spatial relationship of the PE values with CF and AOD is then calculated, once with the OLS method and once with GWR.

Figure 5 shows the long-term (2003–2020) distribution of the monthly PE of MODIS PWV (relative to ERA5), CF and AOD. Table 2 gives the average PE, CF and AOD for each month. In general, the average PE is highest in the summer months (June, July and August), peaking at 29.3% in July, and the spatial standard deviation of PE also increases in summer. The maximum PEs are mostly observed over the Central Zagros Mountains and the Kerman Mountains (in the southeast). In October, the PE reaches its lowest value (15% ± 7.8). The seasonal pattern of AOD over Iran is generally opposite to that of CF (Table 2): CF decreases in summer (9.4% in September), while AOD increases (0.4 in July). The spatial coefficient of variation of CF is nevertheless high in the summer months (especially in September), because of relatively high CF in the southeast and on the southern coast of the Caspian Sea and low CF elsewhere. By contrast, the spatial coefficient of variation of AOD remains similar throughout the year. Rezaei et al. (2019) showed that the distribution of AOD in Iran is associated with topographic characteristics; as seen in Fig. 5, AOD is higher in the southwestern lowlands of Khuzestan than in other areas. These patterns suggest that CF and AOD contribute to errors in the retrieval of PWV over Iran in all months. The spatial relationship of PE with CF and AOD was calculated with both OLS and GWR for each month. In all cases, GWR yielded a higher coefficient of determination than OLS. With OLS, the strongest relationship between CF and PE was obtained in November (R2 = 0.26), because CF and PE both increase only in the northwest of Iran. With GWR, however, the coefficient of determination between PE and CF reaches its highest value in June (R2 = 0.71), compared with 0.15 for OLS.
The spatial distribution of local R2 (not shown) shows relatively high values in the eastern regions of Iran, where the model coefficients are positive, indicating a positive relationship between PE and CF in this region. In May, the coefficient of determination between PE and CF obtained with GWR (R2 = 0.66) is much higher than that obtained with OLS (R2 = 0.06). The highest coefficient of determination between AOD and PE is observed in June (R2 = 0.72) with GWR. In June, local R2 values are high in the southern region, where the relationship between PE and AOD is negative, because AOD is low and PE is simultaneously high over the mountains of Kerman province. In the west and northwest, by contrast, the relationship between AOD and PE is positive.

Fig. 5

Spatial distribution of the monthly PE of MODIS PWV (first and fourth rows), AOD (second and fifth rows) and cloud fraction (third and sixth rows) over Iran during 2003–2014

Table 2 MODIS PWV percentage errors (in relation to ERA-5) and their spatial relation with CF and AOD in Iran

3.3 Relationship between PWV and topography

The correlation coefficients of MODIS and ERA5 PWV with topography are presented in Table 3. In all months, ERA5 PWV is more strongly correlated with topography than MODIS PWV is. However, PWV from both data sets has a significant negative relationship with elevation in all months. This means that PWV is low over the highlands and high over the lowlands. In both data sets, the correlation coefficients are relatively lower during the warm season (June and July), owing to a widespread increase in PWV and to the stability of weather patterns in summer associated with the subtropical high pressure over Iran (Alijani 1994). Topography is therefore a key factor in the spatial distribution of PWV, consistent with previous studies such as Tuller (1968), Alijani (1994), Wang et al. (2013) and Asakereh et al. (2015). Asakereh and Doostkamian (2014) suggested that the low PWV over the mountains and inland areas of Iran could result from a decrease in moist advection over the country in recent decades.
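Whether a spatial correlation over the 196 grid points is significant can be checked with the usual test of H0: ρ = 0, where t = r√((n − 2)/(1 − r²)). The sketch below inverts that relation to a critical |r|, using a normal quantile as an approximation to Student's t with n − 2 degrees of freedom (for n = 196 the two critical values are about 1.96 and 1.97). The significance level and function names are assumptions, not taken from the text.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation, e.g. between grid-point PWV and grid-point elevation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def critical_r(n, alpha=0.05):
    """Approximate two-sided critical |r|: solve t = r sqrt((n-2)/(1-r^2)) for r."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal stand-in for Student's t
    return z / math.sqrt(n - 2 + z * z)

def is_significant(r, n, alpha=0.05):
    return abs(r) > critical_r(n, alpha)
```

With 196 grid points the threshold is roughly |r| ≈ 0.14, so even moderate negative correlations with elevation clear it comfortably.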

Table 3 Spatial correlation coefficients between PWV of MODIS and ERA5 data sets with elevation on a monthly basis

3.4 Comparison of MODIS and radiosonde PWV (daily scale)

In this section, the results of the comparison between MODIS and radiosonde PWV at each radiosonde station are presented. For this purpose, January 2004 (the month with the most rainfall) and July 2008 (representative of the warm season) were selected. Because weather conditions such as cloud cover and visibility affect the accuracy of the retrievals, the comparisons were conducted once for all days and once for days with suitable weather conditions (clear sky with appropriate visibility).

The RMSE and coefficient of determination between MODIS and radiosonde PWV values are given in Table 4 for January 2004 and July 2008. They are calculated once for all days and once only for days with clear sky and suitable visibility (more than 10 km). As shown in Table 4 and Fig. 6, both the coefficients of determination and the RMSE values improve significantly at all stations under suitable weather conditions.

Table 4 RMSE and coefficients of determination between MODIS and radiosonde PWV in different weather conditions during January of 2004 and July of 2008
Fig. 6

Comparison of coefficients of determination and RMSE values between MODIS and radiosonde PWV in radiosonde stations. a R2-January 2004. b RMSE-January 2004. c R2-July 2008. d RMSE-July 2008

During January 2004, the errors range from 5.53 mm (best case, Tabriz) to 16.02 mm (worst case, Ahwaz). At all stations, the coefficients of determination are low when all days are included in the comparison, while RMSE is lower under suitable weather conditions. Fig. 6 shows the marked differences in RMSE and coefficients of determination between MODIS and radiosonde PWV under the two types of weather conditions in January 2004 and July 2008. During July 2008, cloud cover and visibility (more than 10 km) were favorable at many stations, such as Zahedan, Kerman and Esfahan, while at Bandar Abbas visibility was poor (less than 5 km) on all days. Cloud cover and visibility conditions thus appear to explain the high coefficients of determination at Esfahan, Kerman and Zahedan (77, 80 and 66%, respectively) and the large errors at Bandar Abbas. Fig. 6 shows that removing days with unsuitable weather conditions markedly improves both the coefficient of determination and the RMSE.

Another way to compare MODIS and radiosonde PWV is to pool all stations regardless of their geographical location. From all samples, days with clear sky and suitable visibility (more than 10 km) were compared with days with cloudy sky and poor visibility for January 2004 and July 2008. As can be seen in Fig. 6, under suitable weather conditions the coefficient of determination between MODIS and radiosonde PWV differs markedly from that under unsuitable conditions: in January 2004, R2 is 0.71 under suitable and 0.003 under unsuitable conditions (Fig. 7a, b), and in July 2008 it is 0.73 and 0.05, respectively (Fig. 7c, d). These results agree with other studies. Gao and Kaufman (2003), Kern et al. (2008) and Prasad and Singh (2009) concluded that MODIS NIR column water vapor under cloudy conditions correlates very poorly with other data sets. For example, the coefficients of determination obtained by Prasad and Singh (2009) over Indian cities are very low (R2 = 0.33, 0.04, 0.10). These researchers noted that satellite water vapor shows systematic biases with month and season that are sensitive to sky conditions. Such errors arise because, in the presence of clouds, the MODIS channels in the 0.8–2.5 \({\upmu\, \text{m}}\) region contain information on absorption only above and within the clouds (Gao and Yoram J. 1992). Under suitable weather conditions, the overall RMSE values are 2.46 mm in July and 3.3 mm in January. These values are comparable to those obtained by Chen et al. (2008), who found an RMSE of 3.3 mm between GPS PWV and MODIS near-infrared (NIR) PWV over the United States.

Fig. 7

Comparison of MODIS and radiosonde PWV values under two different weather conditions at all stations. a January 2004, suitable weather conditions. b January 2004, unsuitable weather conditions. c July 2008, suitable weather conditions. d July 2008, unsuitable weather conditions

It should be noted that there is a time lag between the radiosonde observations (12 UTC) and the overpass time of the Terra satellite over Iran (6–8 UTC). At stations with high diurnal variability of PWV, this time lag can produce large, misleading errors. The coefficients of variation of radiosonde PWV (as a reliable source) are given in Table 5 separately for January 2004 and July 2008. The correlation between the coefficient of variation of radiosonde PWV and the percentage of error (relative to the average radiosonde PWV) is strong at the stations studied (0.81 in January 2004 and 0.608 in July 2008). In other words, stations with high coefficients of variation also have high percentages of error. For example, in January 2004 the highest coefficient of variation (59.1%) and the highest percentage of error (91.1%) were both observed at Kermanshah station in the west of the country.
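The station-level quantities behind Table 5 can be sketched as follows. We assume the CV is the (population) standard deviation of daily radiosonde PWV divided by its mean, and the percentage of error is the MODIS-minus-radiosonde RMSE divided by the mean radiosonde PWV, both multiplied by 100; the text describes these only in words, and the function names are hypothetical.

```python
import math

def station_stats(modis, sonde):
    """Return (CV %, PE %) for one station's matched daily PWV series."""
    n = len(sonde)
    mean = sum(sonde) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sonde) / n)
    err = math.sqrt(sum((m - s) ** 2 for m, s in zip(modis, sonde)) / n)
    return 100 * sd / mean, 100 * err / mean

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cv_pe_correlation(stations):
    """stations: {name: (modis_list, sonde_list)} -> Pearson r between CV and PE."""
    pairs = [station_stats(m, s) for m, s in stations.values()]
    return pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
```

Because both quantities are normalized by the same radiosonde mean, a station with very low mean PWV inflates CV and PE together, which is worth keeping in mind when reading a strong correlation between them.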

Table 5 Coefficients of variation of radiosonde PWV and percentage of errors in MODIS PWV during January 2004 and July 2008 (in daily scale)

4 Conclusion

In this study, we presented a statistical comparison of MODIS PWV products (Level 3 and Level 2) with reference data at monthly and daily scales over Iran. The monthly (2003–2014) and daily (January 2004 and July 2008) MODIS products were compared with ERA5 and radiosonde data sets, respectively. Among the factors that cause errors in MODIS PWV retrievals, we investigated the effect of weather conditions, including cloudiness and aerosols. The main results are as follows:

  • At the monthly scale, the percentage of error of MODIS PWV increases during the summer. Our analysis indicates that cloudiness and aerosol optical depth contribute to the errors in the MODIS PWV retrieval.

  • Annual average MODIS and ERA5 PWV values are close to each other. In addition, MODIS PWV, like ERA5 PWV, is strongly negatively correlated with topography. This suggests that MODIS Level-3 monthly PWV data are valuable for estimating the long-term monthly climatology of PWV over Iran.

  • At the daily scale, the comparison of MODIS and radiosonde PWV under different atmospheric conditions showed significant differences. On clear days with appropriate visibility, R2 values are higher than on cloudy days with poor visibility, despite the time lag between the two data sets. However, the accuracy of MODIS PWV over Iran depends strongly on weather conditions and geographic location, and errors can occur throughout the year.