1 Introduction

General circulation models (GCMs) are sophisticated tools widely used to study the climate and the large-scale upper-air features of the Earth; however, they do not give reliable information at the local scale (Hanssen-Bauer et al. 2003; Eden and Widmann 2014; Das et al. 2016; Gaur and Simonovic 2017). Direct GCM output is of limited use at subregional or local scales because of this scale mismatch. To overcome these scale differences, several downscaling methods have been developed to bridge the gap between coarse-resolution, large-scale GCM simulations and the high-resolution, local-scale information required for climate impact studies (Wilby and Wigley 1997; Huth 1999; Eden and Widmann 2014; Meher 2019). Empirical-statistical downscaling (ESD) is one of them. Statistical downscaling involves developing empirical relationships between large-scale atmospheric predictors (for example, mean sea level pressure, geopotential height, humidity, or wind) and local-scale surface predictands (for example, temperature or rainfall at a weather station) (Hanssen-Bauer et al. 2005; Dabanlı and Şen 2017). The statistical downscaling techniques developed so far fall into three categories, namely, (i) weather classification or weather typing, (ii) regression/transfer functions, and (iii) weather generators (Wilby et al. 2004; Anandhi et al. 2009; Hofer 2010; Blazak 2012; Kannan and Ghosh 2013). The first two categories may follow a perfect prognosis approach (Kannan and Ghosh 2013), which uses empirical models that relate an observation-based predictand to large-scale predictors over a common time period; these models are then applied to simulated predictors (for example, GCM scenario runs) for the future (Kannan and Ghosh 2013; Eden and Widmann 2014). The perfect prognosis approach assumes that the relationship established between the large-scale predictors and the predictands will remain valid in the future (Blazak 2012).

There is no general consensus on the selection of suitable predictor variables for developing downscaling models (Hu et al. 2013; Eden and Widmann 2014). The choice of appropriate predictors depends on the domain under consideration, the predictand to be downscaled, the characteristics of the prevailing large-scale circulation, seasonality, the topographic context, and other factors (Anandhi et al. 2008; Anandhi et al. 2009; Forland et al. 2011). Earlier studies on predictor selection (Wilby et al. 1999; Wilby et al. 2004; Ghosh and Mujumdar 2006; Anandhi et al. 2009; Shashikanth and Ghosh 2013; Salvi and Ghosh 2013) reported that suitable predictors should have the following characteristics. First, their statistical features need to be well reproduced by GCMs and reanalysis data products. Second, they should be strongly correlated with the predictand under consideration. Third, they should be physically and/or conceptually sensible. It is often advised to experiment with different geographical domains (preferably larger than the target predictand domain) when selecting a suitable predictor for a target region (Wilby and Wigley 2000; Sauter and Venema 2011), because a small domain over the target region may fail to capture the strongest correlation between the predictand and the predictor (Wilby and Wigley 2000). When downscaling precipitation for a particular location, the predictor domain should be selected so that it captures the mechanisms that lead to the formation of precipitation over that location. For example, the rainfall downscaling studies by Anandhi et al. (2008) and Hu et al. (2013) screened various predictors from the NCEP/NCAR (National Centers for Environmental Prediction/National Center for Atmospheric Research) reanalysis datasets on the basis of each predictor’s role in generating monsoon rainfall over the “Malaprabha river basin of India” and the “Yellow River source region of China”, respectively.

The major portion of annual rainfall (70–80%) over the Indian subcontinent occurs in the south-west monsoon season from June to September owing to the large-scale monsoonal wind circulation. Similarly, winter precipitation over the Western Himalayan region (WHR) of India occurs during the cooler months of December to February owing to extratropical weather systems commonly known as western disturbances (Dimri et al. 2015; Dimri et al. 2016; Das and Meher 2019). Lower-tropospheric planetary waves over the mid-latitudes play a significant role in generating monsoon rainfall over India (Bawiskar 2005) and neighboring regions such as the Western Himalayas (Priya et al. 2016; Meher 2019) and the Indus basin (Saeed et al. 2013). Several studies showed that the meridional (v wind) velocity (Bawiskar et al. 2005; Parthasarathy et al. 1991) and mean sea level pressure (Douville 2006) are major parameters governing the occurrence of all-India monsoon rainfall. Mean sea level pressure (mslp) can be linked directly to south-west monsoon rainfall over India (Douville 2006) through the pressure gradient that develops between the Thar Desert (low pressure) and the Bay of Bengal (high pressure) during the active phase of the south-west monsoon season. The monsoonal circulation over the Indian region intensifies as this pressure gradient increases, enhancing moisture advection. Saeed et al. (2013) reported that geopotential height over central Asia could be used as a predictor that serves as a precursor of rainfall in the upper Indus basin region.
Pervez and Henebry (2014) reported that precipitation over two major river basins of South Asia, namely the Ganges and the Brahmaputra, was significantly influenced by predictors such as geopotential height, u wind (the zonal component, positive from west to east, i.e., along lines of latitude), v wind (the meridional component, positive from south to north, i.e., along lines of longitude) (at the 850 and 1000 hPa pressure levels), and specific humidity (at the 500 and 1000 hPa pressure levels), whereas the influence of air temperature was found to be weak. Dimri et al. (2016) reported that the mid-tropospheric circulation associated with the zonal wind (u wind) and geopotential at 500 and 850 hPa plays a crucial role in the convergence that triggers winter precipitation over the Western Himalayas. Statistical metrics for evaluating individual predictors against a predictand are not firmly established in the literature. We reviewed the work of several authors (see Table 1) on the selection of suitable predictors for different statistical downscaling studies and found that scatter plots, partial correlation, and stepwise regression are among the tools most commonly used to select a suitable subset of predictors from reanalysis datasets. For a further review of the predictors used in different downscaling studies, readers are referred to Anandhi et al. (2008). The studies reviewed here (mentioned in the above paragraphs and Table 1) point to various upper-air large-scale predictors and surface variables that could serve as suitable predictors for downscaling precipitation over different regions of the South Asian domain and, in particular, the Indian region (excluding its north-western part). The Western Himalayan region of India is one of the least studied regions of the globe in terms of downscaling.
In the present paper, we aim to select suitable predictors for reliably predicting station-level precipitation over the WHR, in support of future downscaling studies over this region, using conventional statistical techniques such as correlation maps, EOF-based variance analysis, and correlation coefficients. All the predictors were evaluated exhaustively through several statistical measures over three different domains to ensure a reliable choice of predictors for estimating annual and seasonal rainfall over the WHR through statistical downscaling. With this background, the present study was carried out with the following objectives:

  1. To show how sensitive the selection of appropriate predictors over the WHR is to the technique and the domain size used.

  2. To select potential predictors for each domain on the basis of the ranks obtained from the predictor–predictand relationship.

Table 1 A brief literature review on the selection of suitable predictors using the NCEP/NCAR datasets in different domains over India and South Asia. Bold references denote studies carried out near the Western Himalayas. See the Appendix for the long names of the variables

The remainder of this paper is organized into three major sections. Section 2 gives a short description of the study region, the data used, and the detailed methodology. Section 3 presents the results and discussion. The key messages and conclusions of the present work are given in Sect. 4.

2 Data used and methodology of predictor selection

The study area is the Western Himalayan region of India (WHR), which extends from 28°42′ to 33°12′N and from 75°34′ to 81°05′E and comprises the two northern Indian states of Himachal Pradesh and Uttarakhand. Monthly gridded rainfall data (Pai et al. 2014) from the India Meteorological Department (IMD) were used in the present study. The area-averaged rainfall over these two states was treated as the reference predictand. In addition to the observational data, 24 large-scale atmospheric variables were extracted from the NCEP/NCAR reanalysis dataset (Kalnay et al. 1996) on a 2.5° × 2.5° grid for the same period as the observational data (1951–2005). These variables include geopotential height, zonal and meridional wind components, specific humidity, relative humidity, divergence at various pressure levels, vorticity, wind speed, sea level pressure, precipitable water content, precipitation rate, and air temperature (see Table 2 for details).
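The area-averaged predictand (and the area-averaged predictors used later) can be formed as a weighted mean over the grid cells of a region. The sketch below is a minimal NumPy illustration under stated assumptions: a regular latitude–longitude grid, cos(latitude) weights as a proxy for cell area, and an optional boolean region mask (the hypothetical `mask` argument, e.g. cells inside Himachal Pradesh and Uttarakhand). The text does not state which weighting was actually used.

```python
import numpy as np


def area_weighted_mean(field, lats, mask=None):
    """Area-weighted spatial mean of a (time, lat, lon) field.

    cos(latitude) weights approximate grid-cell area on a regular
    latitude-longitude grid (assumption). `mask` is an optional (lat, lon)
    boolean array that is True for cells inside the region (hypothetical input).
    Missing values (NaN) receive zero weight.
    """
    field = np.asarray(field, dtype=float)
    w2d = np.cos(np.deg2rad(np.asarray(lats, dtype=float)))[:, None]
    w2d = np.broadcast_to(w2d, field.shape[1:])
    if mask is not None:
        w2d = np.where(mask, w2d, 0.0)      # cells outside the region get zero weight
    valid = ~np.isnan(field)
    w = np.where(valid, w2d, 0.0)           # (time, lat, lon) weights
    vals = np.where(valid, field, 0.0)      # NaNs replaced by 0 (their weight is 0)
    return (vals * w).sum(axis=(1, 2)) / w.sum(axis=(1, 2))
```

For example, `area_weighted_mean(rain, lats, mask=whr_mask)` would return one value per year or season, where `rain`, `lats`, and `whr_mask` are hypothetical arrays holding the gridded rainfall, the grid latitudes, and the two-state mask.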

Table 2 The twenty-four NCEP/NCAR large-scale variables used in the present study are listed in the first column. For simplicity, each variable is represented by a variable I.D. in the second column

2.1 Spatial region and correlation mapping

As downscaling results are sensitive to the size of the predictor domain, developing downscaling models for domains of different sizes may provide more reliable information for policymaking. Forland et al. (2011) showed that smaller predictor domains are more reliable than larger ones; however, GCMs have a minimum skillful scale, and the local state is expected to depend on the ambient large-scale conditions. To determine which domain size provides reliable downscaling results, three domains of different sizes, namely (i) South Asia (10°S–40°N, 20°–120°E), (ii) India (8–38°N, 68–98°E), and (iii) the Western Himalayan region (27–38°N, 72–82°E), were considered in the present work. For each domain, we separately tested how well the different predictor variables reproduce the observed features of rainfall over the WHR. The statistical analyses described below were first performed over the largest domain, South Asia, to select suitable predictor(s), and the same procedure was then repeated for the other two domains.

Spatial (or pattern) correlation has been a commonly used metric for quantifying the similarity between predictands and the spatial patterns of predictors (Srinivasan et al. 1995; Parding et al. 2019). Here, it is quantified by computing the correlation coefficient between the predictor time series at each grid point and the predictand time series over a common time period. The resulting spatial correlation maps were used to identify the regions where the correlation coefficients were higher than elsewhere.
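A grid-point correlation map of this kind can be sketched as follows. The sketch assumes that a predictor is stored as a NumPy array of shape (time, lat, lon) holding annual or seasonal values and that the predictand is a (time,) series for the same years; the `DOMAINS` boxes copy the limits given in Sect. 2.1 (10°S written as −10), and `subset` and `correlation_map` are hypothetical helper names rather than functions from any package.

```python
import numpy as np

# Domain boxes (lat_min, lat_max, lon_min, lon_max) as listed in Sect. 2.1.
DOMAINS = {
    "South Asia": (-10.0, 40.0, 20.0, 120.0),
    "India": (8.0, 38.0, 68.0, 98.0),
    "WHR": (27.0, 38.0, 72.0, 82.0),
}


def subset(field, lats, lons, box):
    """Cut a (time, lat, lon) field to a lat/lon box; returns (field, lats, lons)."""
    lat0, lat1, lon0, lon1 = box
    ilat = (lats >= lat0) & (lats <= lat1)
    ilon = (lons >= lon0) & (lons <= lon1)
    return field[:, ilat][:, :, ilon], lats[ilat], lons[ilon]


def correlation_map(field, series):
    """Pearson r between `series` (time,) and the time series at every grid point."""
    x = field - field.mean(axis=0)           # anomalies at each grid point
    y = series - series.mean()               # predictand anomalies
    num = np.tensordot(y, x, axes=(0, 0))    # covariance numerators, shape (lat, lon)
    den = np.sqrt((y ** 2).sum() * (x ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den                     # NaN where a grid point is constant
```

Applying `correlation_map` to the output of `subset` for each entry of `DOMAINS` gives one map per domain, which can then be plotted to locate the regions of strongest correlation.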

2.2 Multiple regression and temporal correlation

Predictor selection often requires transforming the raw predictors into a more useful form, because the information in neighboring grid boxes of the predictor data is not independent (Maraun 2010). Empirical orthogonal function (EOF) analysis, or more generally principal component analysis, is a prominent technique for reducing high-dimensional fields (for example, raw predictors) to a set of orthogonal basis vectors whose associated principal component time series are mutually uncorrelated (Lorenz 1963; Hannachi et al. 2007). One merit of EOF analysis is that the orthogonal basis vectors reduce the problem of collinearity in subsequent regression analysis, and a small set of components captures most of the variability (often > 90% in the first seven vectors) through a lower-dimensional representation of the original data (Huth 1999; Maraun 2010).
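One standard way to compute EOFs and their principal components (PCs) is a singular value decomposition (SVD) of the time × space anomaly matrix. The following is a minimal NumPy sketch, not the authors' code; `eof_analysis` is a hypothetical name, and the optional sqrt(cos φ) area weighting is an assumption that the text does not mention.

```python
import numpy as np


def eof_analysis(field, n_modes=7, lats=None):
    """EOF/PCA of a (time, lat, lon) field via SVD of the anomaly matrix.

    Returns (eofs, pcs, eigenvalues): eofs (n_modes, lat, lon) spatial patterns,
    pcs (time, n_modes) mutually uncorrelated time series, and the eigenvalues
    of the anomaly covariance matrix for all modes (largest first).
    """
    nt = field.shape[0]
    anom = field - field.mean(axis=0)            # remove the time mean at each grid point
    if lats is not None:
        # Optional sqrt(cos(lat)) area weighting (assumption, not stated in the text).
        anom = anom * np.sqrt(np.cos(np.deg2rad(lats)))[None, :, None]
    X = anom.reshape(nt, -1)                     # time x space matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigenvalues = s ** 2 / (nt - 1)              # variance carried by each mode
    pcs = U[:, :n_modes] * s[:n_modes]           # principal component time series
    eofs = Vt[:n_modes].reshape((n_modes,) + field.shape[1:])
    return eofs, pcs, eigenvalues
```

Because the anomaly matrix is centered, the PCs have zero mean, their variances equal the leading eigenvalues, and they are uncorrelated with one another, which is the property that eases collinearity in the later regressions.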

First, we subtracted the long-term (1951–2005) mean of the observed rainfall from the raw rainfall data (the predictand). For the predictor variables, the long-term mean was subtracted at each grid point. In the present study, we retained the principal components (PCs) of the seven leading EOFs of each predictor variable and estimated the fraction of the total variance explained by each EOF. The percentage of variance explained by the Nth EOF is defined as the ratio of the eigenvalue of the Nth EOF to the sum of the eigenvalues of all EOFs, expressed as a percentage (Wilks 2011; Lorenzo-Seva 2013). The higher-order EOFs (beyond those explaining 90% of the variance) carry negligible variance, are considered to represent noise, and are not expected to add any value to the regression used in the downscaling.
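The explained-variance ratio defined above, EV_N = 100 λ_N / Σ_k λ_k, and the number of leading EOFs needed to reach a cumulative threshold (90% here) can be computed from the eigenvalues as in this sketch. The helper names are hypothetical, and the eigenvalues are assumed to be sorted from largest to smallest, as returned by a standard EOF/SVD computation.

```python
import numpy as np


def explained_variance(eigenvalues):
    """Percentage of variance explained by each EOF: 100 * lambda_N / sum_k lambda_k."""
    lam = np.asarray(eigenvalues, dtype=float)
    return 100.0 * lam / lam.sum()


def n_modes_for(eigenvalues, threshold=90.0):
    """Smallest number of leading EOFs whose cumulative explained variance reaches `threshold` (%)."""
    cum = np.cumsum(explained_variance(eigenvalues))   # eigenvalues assumed sorted, largest first
    return int(np.searchsorted(cum, threshold) + 1)
```

With eigenvalues of 50, 30, 10, 5, 3, and 2 (arbitrary illustrative numbers), the first three modes explain 90% of the variance and the remaining modes would be treated as noise.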

Backward elimination is a special case of stepwise regression. In this study, backward elimination was carried out using the R statistical package (R Core Team 2002). For each predictor variable, backward elimination begins with the seven leading EOFs in the model, and at each step one EOF is removed from the model. The final (best-fit) model retains only those EOFs that give the minimum AIC (Akaike information criterion) value, i.e., removing any one of the retained EOFs does not lower the AIC (Ripley 2002). We identified the better-performing predictors from the adjusted R2 values of the best-fit models: the larger the adjusted R2, the better the variable acts as a predictor (Hofer et al. 2010).
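Backward elimination by AIC can be sketched in Python as below; this is an illustration under stated assumptions, not the authors' R script. It assumes ordinary least squares with an intercept, the AIC form n·ln(RSS/n) + 2k that R's `extractAIC` uses for linear models (k = number of fitted coefficients), and a greedy loop that drops the single column whose removal gives the lowest AIC until no removal lowers it. The columns of `X` would be the seven leading PCs of one predictor and `y` the predictand anomalies; all function names are hypothetical.

```python
import numpy as np


def _rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def aic(X, y, cols):
    """n * log(RSS / n) + 2 * (number of coefficients, intercept included)."""
    n = len(y)
    return n * np.log(_rss(X, y, cols) / n) + 2 * (len(cols) + 1)


def adjusted_r2(X, y, cols):
    n, p = len(y), len(cols)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - _rss(X, y, cols) / tss
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def backward_elimination(X, y):
    """Start with all columns and drop one at a time while the AIC decreases."""
    cols = list(range(X.shape[1]))
    current = aic(X, y, cols)
    while cols:
        # AIC of every candidate model with exactly one column removed
        trials = [(aic(X, y, [c for c in cols if c != j]), j) for j in cols]
        best_aic, drop = min(trials)
        if best_aic >= current:          # no single removal improves the AIC: stop
            break
        cols.remove(drop)
        current = best_aic
    return cols, adjusted_r2(X, y, cols)
```

The returned adjusted R2 is the quantity used above to compare predictors: the same routine is run once per predictor, and larger values indicate a more suitable predictor.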

In another exercise, stepwise regression was carried out between the area-averaged observations (predictand) and the area-averaged large-scale predictors in all three predictor domains. Before the multiple regressions were carried out, all area-averaged datasets were linearly detrended and standardized to avoid spurious results associated with coincidental trends. The best-fit model with the highest adjusted R2 was used to select the suitable combination of predictors in each predictor domain. The fitted values of the models were compared with the observational data using agreement indices (the d-index and the Pearson correlation coefficient) and an error index, the normalized root mean squared error (NRMSE). Details of these indices are given in Meher et al. (2017) and Meher and Das (2019).
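The preprocessing step and the comparison indices mentioned above can be sketched as follows. The d-index follows Willmott's index of agreement, d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)². The NRMSE normalization (by the standard deviation of the observations) is an assumption, since the text defers the exact definitions to Meher et al. (2017) and Meher and Das (2019); the function names are hypothetical.

```python
import numpy as np


def detrend_standardize(x):
    """Remove a least-squares linear trend, then scale to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    resid = x - (slope * t + intercept)
    return (resid - resid.mean()) / resid.std(ddof=1)


def d_index(obs, sim):
    """Willmott's index of agreement (1 = perfect agreement)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    om = obs.mean()
    num = ((sim - obs) ** 2).sum()
    den = ((np.abs(sim - om) + np.abs(obs - om)) ** 2).sum()
    return float(1.0 - num / den)


def nrmse(obs, sim):
    """RMSE divided by the standard deviation of the observations (assumed normalization)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    return float(rmse / obs.std(ddof=1))
```

Normalizing by the standard deviation rather than the mean is chosen here because the detrended, standardized series have a mean close to zero, which would make a mean-based normalization unstable.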

The correlation between the predictors and the predictand was analyzed in two different ways, as follows.

  1. The linear relation between each predictor and the predictand was analyzed using Pearson’s product-moment correlation between the area-averaged predictor (X) and the area-averaged predictand (Y) for different seasons (T) (annual, monsoon, and winter). For simplicity, we refer to this correlation as RA throughout the document.

  2. Pearson’s product-moment correlation was also used to quantify the linear relation between the predictand and each of the leading EOFs (i.e., their principal component time series) of the predictor variables. For simplicity, we refer to this correlation as RE throughout the document.
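The two correlations defined above can be sketched as follows. The sketch assumes that the predictor and predictand have already been aggregated to the same seasonal (annual, monsoon, or winter) series for a common set of years, and that `pcs` holds the leading principal component time series of one predictor field as columns; `pearson`, `ra_value`, and `re_values` are hypothetical helper names.

```python
import numpy as np


def pearson(x, y):
    """Pearson product-moment correlation of two 1-D series."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float((x @ y) / np.sqrt((x @ x) * (y @ y)))


def ra_value(predictor_mean, predictand_mean):
    """RA: correlation between the area-averaged predictor X and predictand Y for one season T."""
    return pearson(predictor_mean, predictand_mean)


def re_values(pcs, predictand_mean):
    """RE: correlation between the predictand and each leading PC (columns of `pcs`)."""
    pcs = np.asarray(pcs, float)
    return np.array([pearson(pcs[:, k], predictand_mean) for k in range(pcs.shape[1])])
```

For seven retained EOFs, `re_values` returns seven RE values per predictor, season, and time period, which are the inputs to the RE ranking described in the next subsection.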

Each of the methods mentioned above was repeated for four time periods of different lengths (25 years [1981–2005], 35 years [1971–2005], 45 years [1961–2005], and 55 years [1951–2005]) to increase confidence in the selection of suitable predictors.

2.3 Ranking of predictor variables and sensitivity analysis

The overall aim of the ranking approach is to identify the top- and bottom-ranked (rank 1 and rank 24) predictors in all domains irrespective of season. Predictors were ranked for each of the three selected domains using the results of the three methods (i.e., R2, RA, and RE) discussed in Sect. 2.2. The ranking of predictors for a single domain (say, India) and a particular season (say, monsoon) was carried out as follows.

2.3.1 Ranking for R2 and RA values

  1. First, we calculated the values of R2 and RA for the predictor variables in each of the four time periods mentioned in Sect. 2.2.

  2. Second, ranks (1–24) were assigned to the predictor variables based on the absolute values of R2 and RA. Thus, the predictor with the highest (absolute) value is top-ranked (1), whereas the predictor with the lowest value is bottom-ranked (24).

  3. The final rank (between 1 and 24) of a predictor was calculated as the arithmetic mean of its ranks in the four time periods.

The same steps were repeated for the other two domains and for the winter season and the annual time scale.
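The R2/RA ranking steps can be sketched as follows for one domain and one season. The input is assumed to be an array of shape (4, 24) holding R2 or RA values (four periods × 24 predictors). Ties are broken by predictor order, which is an assumption, as the text does not say how ties were handled; the function names are hypothetical.

```python
import numpy as np


def rank_by_magnitude(values):
    """Ranks 1..n by |value| (1 = largest magnitude); ties keep predictor order."""
    order = np.argsort(-np.abs(np.asarray(values, float)), kind="stable")
    ranks = np.empty(order.size, dtype=int)
    ranks[order] = np.arange(1, order.size + 1)
    return ranks


def final_rank(values_by_period):
    """values_by_period: (n_periods, n_predictors) array of R2 or RA values.

    Ranks the predictors separately in each period (step 2) and returns the
    arithmetic mean of each predictor's ranks over the periods (step 3).
    """
    per_period = np.array([rank_by_magnitude(v) for v in values_by_period])
    return per_period.mean(axis=0)
```

Because the mean is taken over integer ranks, the final rank can be fractional (for example, 1.5), while still lying between 1 and 24.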

2.3.2 Ranking of RE values

  1. First, we calculated the RE values of all the predictors for the seven leading EOFs in each of the four time periods mentioned above.

  2. Ranks (1–24) were then assigned to the predictors based on the absolute RE values, separately for each EOF and time period, as in step 2 of Sect. 2.3.1.

  3. For each predictor and each of the seven EOFs, the ranks obtained in the four time periods were summed.

  4. For each predictor, the seven values obtained in step 3 were summed, and these totals were used to arrive at the final rank.

The same steps were repeated for the other two domains and for the winter season and the annual time scale.
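The RE ranking steps can be sketched in the same way, assuming an RE array of shape (4 periods, 7 EOFs, 24 predictors). The text does not state how the summed totals are converted into the final rank; here, as an assumption, the smallest total receives rank 1. The ranking helper is repeated so that the block runs on its own, and all names are hypothetical.

```python
import numpy as np


def rank_by_magnitude(values):
    """Ranks 1..n by |value| (1 = largest magnitude); ties keep predictor order."""
    order = np.argsort(-np.abs(np.asarray(values, float)), kind="stable")
    ranks = np.empty(order.size, dtype=int)
    ranks[order] = np.arange(1, order.size + 1)
    return ranks


def re_final_rank(re_vals):
    """re_vals: (n_periods, n_eofs, n_predictors) array of RE values.

    Step 2: rank predictors by |RE| separately for every period and EOF.
    Step 3: for each EOF, sum each predictor's ranks over the periods.
    Step 4: sum over the EOFs; the smallest total gets final rank 1 (assumption).
    """
    ranks = np.apply_along_axis(rank_by_magnitude, 2, np.asarray(re_vals, float))
    per_eof = ranks.sum(axis=0)            # (n_eofs, n_predictors)
    total = per_eof.sum(axis=0)            # (n_predictors,)
    order = np.argsort(total, kind="stable")
    final = np.empty(total.size, dtype=int)
    final[order] = np.arange(1, total.size + 1)
    return total, final
```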

The overall rank of a predictor variable was obtained by combining the ranks it received in all seasons and at the annual scale (for example, see Table 3). The sensitivity of the ranks was tested to check whether the methods used in the study are meaningful and whether the predictors are ranked consistently. This sensitivity was analyzed in two ways:

  1. Comparison of the overall ranks obtained by each predictor in the different domains. Box plots were used to visualize the sensitivity of the ranks.

  2. Comparison of the overall ranks with the ranks recalculated after excluding only the R2 values, only the RE values, or only the RA values, over South Asia, India, and the WHR. All ranks were calculated considering all seasons and the annual scale.
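Both sensitivity checks can be sketched as below. The sketch assumes, hypothetically, that the final ranks for each metric are stored in a dictionary keyed by "R2", "RA", and "RE", each of shape (3 time scales, 24 predictors); that the overall rank is the rank of the summed ranks (smallest sum = rank 1); and that agreement between two rankings is measured by the Pearson correlation of the rank vectors (Spearman's rho when there are no ties). `rank_spread` summarizes across domains what the box plots show.

```python
import numpy as np


def rank_totals(totals):
    """Turn aggregate scores into ranks 1..n (smallest total = rank 1)."""
    order = np.argsort(np.asarray(totals, float), kind="stable")
    ranks = np.empty(order.size, dtype=int)
    ranks[order] = np.arange(1, order.size + 1)
    return ranks


def overall_rank(metric_ranks, exclude=None):
    """metric_ranks: dict metric -> (n_timescales, n_predictors) ranks.
    Sums ranks over time scales and metrics, optionally leaving one metric out."""
    used = [m for m in metric_ranks if m != exclude]
    total = sum(np.asarray(metric_ranks[m], float).sum(axis=0) for m in used)
    return rank_totals(total)


def leave_one_metric_out(metric_ranks):
    """Correlation between the overall ranks and the ranks recomputed without each metric."""
    full = overall_rank(metric_ranks)
    return {m: float(np.corrcoef(full, overall_rank(metric_ranks, exclude=m))[0, 1])
            for m in metric_ranks}


def rank_spread(overall_by_domain):
    """Standard deviation of each predictor's overall rank across domains."""
    return np.vstack(list(overall_by_domain.values())).std(axis=0, ddof=1)
```

High correlations from `leave_one_metric_out` indicate that no single metric dominates the ranking, and a small `rank_spread` for a predictor indicates that its rank is insensitive to the choice of domain.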

Table 3 Suitable combinations of predictors selected using multiple regression of all 24 predictors in the different predictor domains and time scales. Agreement indices (d-index and R2) and an error index (normalized root mean squared error, NRMSE) between the observed and modeled data were also calculated for each predictor domain in two seasons (monsoon and winter) and at the annual time scale. Cells in italics represent predictors that are common to all domains in the different seasons and at the annual scale

3 Results and discussion

3.1 Analysis of spatial correlation maps

Figure 1 shows correlation maps of two randomly selected predictors (z0500 and r1000) over the South Asia domain for the annual, monsoon, and winter seasons. Correlation maps of all the other predictors are shown in Figs. S1–S3 of the supplementary material. In all three figures, predictors such as u, v, and ▽ at all pressure levels and Ʊ exhibited scattered patches of positive and negative correlations over and around the Indian subcontinent. Some predictors, such as z at all pressure levels and mslp, had negative correlations (< −0.35) over the South Asian region, while v1000, s0500, s0850, r at all pressure levels, prw, and pr gave mostly positive correlations (> 0.40). Over the target predictand domain (i.e., the WHR), predictors such as u0850 and t showed positive correlations in all seasons and at the annual scale, but not in other regions. The correlations (positive or negative) were stronger at the annual scale and in the monsoon season than in the winter season. Both u and v were well correlated with the predictand. In this regard, Satyanarayana and Srinivas (2008) reported that u responds to heating in the monsoon trough in North India, while v has more local effects; together, u and v are therefore responsible for the convergence of moisture and hence related to precipitation over India. On the basis of correlation maps, Sinha et al. (2013) found that s, u, and v (at different levels) over different domains around India are potential NCEP/NCAR predictors of Indian monsoon rainfall. The results of the present work are broadly consistent with the findings of Sinha et al. (2013).

Fig. 1

Correlation maps of two different predictors at the annual, monsoon, and winter time scales over the South Asian domain. Here, the correlations were calculated between the total annual/monsoon/winter rainfall of the observational data and the aggregated annual/monsoon/winter values of the predictors at each grid point. For all other predictors, refer to Figs. S1 to S3 of the supplementary material

3.2 Analysis of variance explained by EOFs and multiple regression

Figure 2 shows the variance explained by the first 20 leading EOFs for all the predictors over the three study domains at the annual time scale and in the monsoon season. The variance shown for each EOF is the average of the variances obtained for the respective predictor in the four time periods of different lengths given in Sect. 2.2. The variance explained by the higher-order EOFs (e.g., 8–20) was negligible compared with that of the first seven leading EOFs. Similar results were found for the winter season (not shown). In most cases, the first seven leading EOFs together explained > 90% of the variance of the raw data. As expected, the first few EOFs explain a major portion of the variance compared with the remaining higher-order EOFs. The explained variance of the first EOF of most predictors was also higher over the WHR than over the Indian and South Asian regions (not shown here). A similar result was reported by Akhter et al. (2019), who found that the explained variance of the downscaling model in the monsoon season was higher in the Western Himalayan region (referred to as North Mountainous India in their paper) than in four other homogeneous rainfall zones of India. In the present study, we used the first seven leading EOFs in the multiple regression so that most of the regional and subregional variance is incorporated in the selection of suitable predictors. Figure 3 shows the R2 values obtained for each predictor over the different domains at the annual, monsoon, and winter time frames. Over the South Asian region, the mean R2 value of all the predictors was higher in the monsoon season (0.41) than at the annual scale (0.24) and in the winter season (0.18). Similar results were also found for the other two regions, namely India and the WHR.
In all seasons, the average R2 of all the predictors over the WHR was lower than over the Indian and South Asian regions, whereas the values over the Indian and South Asian regions were close to each other. Some predictors, such as z0850, z1000, s0500, r0500, mslp, and pr, gave higher R2 values in most cases (seasons and domains). The predictors with the highest R2 at the annual time scale were z1000 (0.34), u0850 (0.42), and z0850 (0.46) over South Asia, India, and the WHR, respectively. In the monsoon season, the predictors with the highest R2 were z0850 (0.53) over South Asia and s0500 over both India (0.53) and the WHR (0.59). Similarly, in the winter season, the predictors with the highest R2 were u1000 (0.30) over South Asia and pr over both India (0.41) and the WHR (0.39). The results obtained in the monsoon season are consistent with the findings of Akhter et al. (2019), who, in their predictor selection study over seven homogeneous regions of India, reported that a downscaling model with s500, s850, s1000, and prw was able to explain more than 70% of the observed rainfall variance over the Western Himalayan region, whereas predictors such as ta500 and u1000 explained little of the observed variance. The ▽ and Ʊ parameters at different pressure levels showed poor skill in all the selected predictor domains. An EOF-based downscaling study over China by Nicholas and Battisti (2012) found that the most skillful predictors from the NCEP/NCAR data were all combinations of low-level specific humidity and one or more other fields at the same level, which is consistent with our findings over South Asia and the WHR. Pervez and Henebry (2014) reported that predictors such as z, s, u0500, u0850, u1000, mslp, and w0500 explained a higher proportion of the observed variance in the Ganges–Brahmaputra basin of the South Asian region, which supports the findings of the present study.

Fig. 2

Variance explained by the first 20 leading EOFs for all the predictors over the selected study domains at the annual time scale and in the monsoon season. The variances shown are the averages of the variances of the respective predictors in four time periods of different lengths (25 years [1981–2005], 35 years [1971–2005], 45 years [1961–2005], and 55 years [1951–2005]). The variance explained by the higher-order EOFs (e.g., 8–20) was negligible (~0–5%) compared with that of the first seven leading EOFs. Similar results were found for the winter season (not shown here)

Fig. 3

R2 from the regression between the predictand and the predictors over (a) South Asia (upper), (b) India (middle), and (c) the Western Himalayan region (lower) for the annual total, monsoon total, and winter total precipitation, calculated using the backward elimination method. The seven leading empirical orthogonal functions (EOFs) were regressed against the observations in the backward elimination method, and the R2 values were calculated from the best-fit model

3.3 Analysis of product-moment correlation coefficient

Figure 4 shows the correlations between the area-averaged predictand and the area-averaged predictors for all domains at the annual, monsoon, and winter time scales. Predictors such as u0850, v0850, u1000, v1000, s and r at all pressure levels, prw, and pr had high positive correlation coefficients (RA > 0.4) in all domains at the annual and monsoon time frames, whereas in winter the RA values were lower (< 0.3). These predictors gave higher RA values over the Indian domain (average RA = 0.54) than over the other two domains (average RA = 0.46) at the annual and monsoon time scales, whereas in the winter season the RA values over the WHR (average RA = 0.28) were higher than those over the South Asia (average RA = 0.06) and India (average RA = 0.18) domains. The predictors with the strongest positive correlation with the predictand were s0500 (average RA = 0.66), r0500 (average RA = 0.60), and prw (average RA = 0.57) in all three domains at the annual and monsoon time frames. In the winter season, the predictors with high positive correlation were r1000 (RA = 0.29), v1000 (RA = 0.40), and pr (RA = 0.55) over South Asia, India, and the WHR, respectively. In contrast, predictors such as z at all pressure levels and mslp gave strong negative correlations (average RA = −0.55) with the predictand over all the domains in all seasons and at the annual time scale. These predictors exhibited stronger (more negative) RA values over the South Asia domain (average RA = −0.59) than over the other two domains (average RA = −0.55). Similar results were also found for the winter season.

Fig. 4

Comparison of the correlation coefficients between the observed rainfall and the area-averaged predictor variables over South Asia (black bars), India (gray bars), and the Western Himalayan region (white bars) at the annual, monsoon, and winter time scales. The correlation coefficients were higher in the monsoon season than at the annual and winter time scales

Figure 5 shows the correlation coefficients (RE) between the area-averaged predictand and the seven leading EOFs of each predictor field over the three selected domains. The first EOF (and sometimes the second EOF) of most of the predictors was consistently well correlated with the predictand at the annual and monsoon time scales over all domains, whereas in the winter season none of the EOFs showed such consistent results. In all seasons and at the annual time scale, the first EOF of u0850, u1000, v1000, s at all pressure levels, r0500, prw, and pr had a strong positive correlation with the predictand over all domains. These predictors had a nominally higher correlation over the WHR domain (average RE of 0.54 at the annual scale and 0.60 in the monsoon season) than over the South Asian domain (average RE of 0.53 at the annual scale and 0.58 in the monsoon season) and the Indian domain (average RE of 0.53 at the annual scale and 0.49 in the monsoon season). In the winter season, the correlations shown by the first EOF of these predictors were positive but low (< 0.20). The leading EOF of z at all pressure levels, ▽0850, Ʊ, and mslp had a strong negative correlation with the predictand in all the domains at the annual (average RE = −0.51), monsoon (average RE = −0.58), and winter (average RE = −0.17) time frames. In general, the predictors had higher RE values in the monsoon season than in the winter season and at the annual time scale. Besides the first EOF, other EOFs also correlated well (positively or negatively) with the predictand in particular domains and seasons, but no general pattern emerged for these correlations. Hence, we included the seven leading EOFs when ranking the different predictors in the subsequent sections.

Fig. 5

Comparison of the correlation coefficients (RE) between the observed rainfall and the seven leading EOFs of different predictor variables over (a) South Asia, (b) India, and (c) the Western Himalayan region at the annual, monsoon, and winter time scales. The correlation coefficients are either strongly positive or strongly negative for the first or second EOF. In several cases, higher-order EOFs also showed good correlation with the observations

3.4 Ranking of predictors and sensitivity analysis

Figures 6, 7, and 8 show the ranking of all the predictors over the South Asia, India, and WHR domains, respectively. The three statistical metrics RA, RE, and R2 were used to evaluate the final rank (1 to 24) of each predictor over all domains. In all the study domains, rank 1 denotes the best predictor, whereas rank 24 denotes the poorest. For RA and RE, we used the absolute values in evaluating the ranks. The final rank of a predictor was calculated by ranking the sum of all the ranks it obtained in all seasons and at the annual time scale.

Fig. 6

Ranking of the 24 predictor variables in the South Asia domain for (a) annual, (b) monsoon, and (c) winter. Here, rank 1 denotes the best-performing variable, whereas rank 24 denotes the poorest-performing variable. The bottom panel shows the final ranks of all the predictors considering all three time scales in (a)–(c). The aggregated rank of a predictor was calculated as the arithmetic sum of its ranks in all seasons and at the annual scale, and the aggregated ranks were then ranked to give the overall rank

Fig. 7

Same as Fig. 6, but for India domain

Fig. 8

Same as Fig. 6, but for WHR domain

Over the South Asia domain, the top-ranked predictors were r1000 (rank 1), r0500 (rank 2), and z0850 (rank 3), whereas the bottom-ranked predictors were Ʊ and v1000 (tied at rank 23) and s0850 (rank 22). Over the India domain, the top-ranked predictors were z0850 (rank 1), r1000 (rank 2), and r0850 (rank 3), whereas the bottom-ranked predictors were v1000 (rank 24), ▽0500 (rank 23), and v0500 (rank 22). Similarly, over the WHR domain, the top-ranked predictors were pr (rank 1), r0500 (rank 2), and s0500 (rank 3), whereas the bottom-ranked predictors were v0500 (rank 24), ▽0500 (rank 23), and w (rank 22). A separate study reported that downscaling models using precipitable water (prw) and specific humidity as predictors gave better validation results than other predictors over the WHR of India. Overall, the poorest-performing predictors were v0500, v1000, and ▽0500, whereas the best-performing predictors were z0850, r0500, r1000, and pr; each of these obtained either a top rank (1–4) or a bottom rank (22–24) in at least two of the three domains.

Figure 9 shows the sensitivity of each predictor's rank to the choice of domain. The final ranks were used in this analysis because they take all seasons and the annual field into account. Five predictors (z0500, z0850, v0500, s0500, and ▽0850) had a lower standard deviation of rank across domains than the other predictors, indicating that their ranks were largely independent of the domain, method, and season of predictor selection. The overall rank of z0850 was consistently good and never exceeded 5 in any domain. Conversely, v0500 was consistently poor, and its rank was never better than 21. The predictor z0500 consistently ranked between 10 and 13 in all domains. All other predictors were sensitive to the domain under consideration. For example, pr ranked 1 and 4 over the WHR and India domains, respectively, but 17 over South Asia. Similarly, s1000 ranked poorly over the South Asian and Indian regions (20 and 21, respectively) but better (7) over the WHR.

Figure 10 shows a second sensitivity test, in which the overall ranks are compared with ranks recalculated after excluding only R2, only RE, or only RA over the three selected domains. In each domain, the overall ranks and the ranks obtained after excluding any one metric were significantly correlated (0.90 to 0.95, at the 1% level); hence, the ranking used in this study is robust to the choice of metric and is suitable for selecting predictors over all three domains. Over the South Asia domain, the overall ranks of eight predictors (z1000, v0850, v1000, s0500, r at all pressure levels, and Ʊ1000) changed by no more than ±3 when any one of the metrics was excluded. Similar results were obtained for five predictors (z0850, u0850, v0500, s1000, and pr) over the India domain and for eight predictors (v0500, v1000, r0500, ▽0500, Ʊ1000, w, t, and pr) over the WHR domain.
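Both sensitivity tests can be sketched in a few lines. The sketch assumes a population standard deviation for the spread across domains and a Pearson correlation of ranks for the leave-one-metric-out comparison; the type of correlation is not specified here, so both choices are assumptions. The pr and s1000 ranks are taken from the text; "P_stable" and the per-metric rank sums are hypothetical.

```python
import math
import statistics


def competition_rank(values):
    """Smallest value gets rank 1; tied values share the best rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def rank_spread(ranks_by_predictor):
    """Population standard deviation of each predictor's overall rank across domains."""
    return {name: statistics.pstdev(r) for name, r in ranks_by_predictor.items()}


def overall_rank(rank_sums, exclude=None):
    """Rank predictors by their summed ranks, optionally leaving one metric out.

    rank_sums[metric] holds one rank sum per predictor (already summed over
    the annual, monsoon, and winter time scales); hypothetical layout.
    """
    kept = [ranks for metric, ranks in rank_sums.items() if metric != exclude]
    totals = [sum(col) for col in zip(*kept)]
    return competition_rank(totals)


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Test 1: spread of overall ranks over South Asia, India, and WHR.
spread = rank_spread({"pr": [17, 4, 1], "s1000": [20, 21, 7], "P_stable": [10, 12, 11]})
print({k: round(v, 2) for k, v in spread.items()})  # {'pr': 6.94, 's1000': 6.38, 'P_stable': 0.82}

# Test 2: leave one metric out and compare with the overall ranks.
rank_sums = {"RA": [3, 7, 9, 12, 15], "RE": [4, 6, 10, 11, 14], "R2": [5, 9, 6, 13, 12]}
base = overall_rank(rank_sums)
for metric in rank_sums:
    reduced = overall_rank(rank_sums, exclude=metric)
    max_shift = max(abs(a - b) for a, b in zip(base, reduced))  # compare with the ±3 criterion
    print(metric, round(pearson(base, reduced), 2), max_shift)
# RA 1.0 0 / RE 0.9 1 / R2 1.0 0
```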

Fig. 9

Sensitivity analysis of ranks across domains. The plots are based on the overall rank of each predictor variable over South Asia, India, and the Western Himalayan region

Fig. 10

Comparison of the overall ranks with ranks recalculated after excluding only the R2 values, only the RE values, or only the RA values over South Asia, India, and WHR. All ranks consider all seasons and the annual scale

3.5 Analysis of multiple regressions of the raw data

Table 3 lists the combinations of variables selected by backward multiple regression between all 24 variables (predictors) considered in this study and the observational data (predictand), together with several statistical metrics computed between the observations and the best-fit model values. The variables that occurred most frequently in the best-fit models were v0850, s1000, r0850, r1000, Ʊ1000, mslp, and prw. The adjusted R2 values over the smaller WHR predictor domain were significant (at the 5% level) and higher than over the other domains, and R2 values were significantly higher in the monsoon season than on the annual and winter time scales. On the annual and monsoon time scales, the d-index ranged from 0.60 to 0.90 and the correlation coefficient from 0.50 to 0.82, whereas in winter the d-index was below 0.31 and the correlation coefficients were low and negative. Thus, the values fitted with the selected variable combinations agreed well with the observations on the annual and monsoon time scales but poorly in winter over all selected predictor domains. Likewise, the NRMSE values showed low normalized errors between the fitted and observed data on the annual and monsoon time scales and high errors in winter.
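A minimal sketch of the regression step and the agreement metrics, under stated assumptions. The elimination criterion is not given in this section, so the sketch drops one predictor at a time while adjusted R2 increases (a stand-in for, e.g., p-value-based elimination). The d-index is assumed to be Willmott's index of agreement, and NRMSE is normalized by the observed range, which is only one of several conventions. The variable names x1 to x6 and the data are synthetic.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the fitted values."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef


def adjusted_r2(y, y_hat, n_predictors):
    n = len(y)
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def backward_select(X, y, names):
    """Backward elimination: drop the predictor whose removal most raises adjusted R2."""
    keep = list(range(X.shape[1]))
    best = adjusted_r2(y, fit_ols(X[:, keep], y), len(keep))
    while len(keep) > 1:
        trials = []
        for j in keep:
            subset = [k for k in keep if k != j]
            trials.append((adjusted_r2(y, fit_ols(X[:, subset], y), len(subset)), j))
        score, drop = max(trials)
        if score <= best:  # no removal improves the model: stop
            break
        best = score
        keep.remove(drop)
    return [names[k] for k in keep], best


def d_index(obs, pred):
    """Willmott's index of agreement (0 = no agreement, 1 = perfect agreement)."""
    o_bar = obs.mean()
    num = np.sum((pred - obs) ** 2)
    den = np.sum((np.abs(pred - o_bar) + np.abs(obs - o_bar)) ** 2)
    return 1.0 - num / den


def nrmse(obs, pred):
    """Root-mean-square error normalized by the observed range (assumed convention)."""
    return np.sqrt(np.mean((pred - obs) ** 2)) / (obs.max() - obs.min())


# Synthetic example: y depends on x1, x3, and x4; x2, x5, and x6 are noise.
rng = np.random.default_rng(1)
names = ["x1", "x2", "x3", "x4", "x5", "x6"]
X = rng.standard_normal((60, len(names)))
y = 1.5 * X[:, 0] + 1.0 * X[:, 2] - 0.8 * X[:, 3] + 0.5 * rng.standard_normal(60)

selected, adj = backward_select(X, y, names)
y_hat = fit_ols(X[:, [names.index(s) for s in selected]], y)
print(selected, round(adj, 2))
print("r =", round(np.corrcoef(y, y_hat)[0, 1], 2),
      "d =", round(d_index(y, y_hat), 2),
      "NRMSE =", round(nrmse(y, y_hat), 2))
```

Adjusted R2 penalizes each additional predictor, so a pure-noise variable may still be retained occasionally; a stricter significance-based rule would prune more aggressively.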

4 Conclusions

The major conclusions of the present study are as follows:

  • The examined predictors showed stronger positive or negative spatial correlation with the observed regionally averaged rainfall (the reference predictand) on the annual and monsoon time scales than in winter. In all selected domains, the mean of the regression coefficient values obtained for all predictors was higher during the monsoon than on the annual and winter time scales. In all seasons, the average R2 values of all predictors over the WHR were lower than those over the Indian and South Asian regions, which were close to each other.

  • Predictors such as u0850, u1000, v0850, v1000, s and r at all pressure levels, prw, and pr showed high positive correlation coefficients (calculated through the areal-average method, RA) in all selected domains on the annual and monsoon time scales, whereas in winter their RA values were lower. These predictors also had higher RA values over the Indian domain than over the other two domains on the annual and monsoon time scales, whereas in winter their RA values over the WHR were higher than over the South Asia and India domains.

  • The first EOFs of most predictors were consistently well correlated (RE) with the predictand on the annual and monsoon time scales over all selected domains, whereas in winter none of the EOFs showed such consistent results. In general, the predictors had higher RE values during the monsoon season than on the winter and annual time scales.

  • The WHR predictor domain defined in this study is a potential predictor domain for downscaling monsoon rainfall over the Western Himalayan region. In contrast, the statistical analysis of predictor selection for winter rainfall over the region gave poor results (low agreement with observations), so winter rainfall over the Western Himalayan region must be downscaled with particular care.

  • The ranks of z0500, z0850, v0500, s0500, and ▽0850 were largely independent of the method, season, and domain size.