1 Introduction

The agricultural yield of major crops in the Himalayan region has stagnated over the last few decades (Shrestha et al. 2017). Various factors have been identified as major causes of this poor productivity, including inadequate irrigation and improper cultivation practices (Joshi and Tiwari 2014). Inadequate irrigation can be attributed to the long dry spells in the Himalayas, compounded by deep aquifers during the dry periods. Moreover, despite the presence of water retention structures at high altitudes in Nepal (Sharma et al. 2010), proper irrigation cannot be provided because information on the water retention characteristics of the soil is lacking (Kumar 2012). The suitability of crops to the soil is another important factor in cultivation. For instance, in past decades farmers have introduced crops such as wheat, which is suited to well-drained loamy soils, in the Himalayan ranges of Nepal (Tripathi and Jones 2010); however, silty soils dominate the hilly regions of Nepal (Bajracharya et al. 2007). Similarly, in Bhutan, a major decline in maize productivity in the mid to high hills of the Himalayas is postulated to result from highly acidic soils (Baillie et al. 2004). Several other studies have also identified soil characteristics as the major mechanism behind the region's poor production compared with nearby countries (Paudyal et al. 2001; Kshetri 2010; Chapagain and Raizada 2017).

Agriculture is closely linked to soil physical and chemical properties. These properties vary widely with soil type and, in turn, control water and nutrient uptake by plants (Mueller et al. 2010). For instance, soil with higher organic matter content retains more moisture and can therefore lower irrigation water demand relative to soil with lower organic matter content (Murphy 2015). This can help crops grow during dry seasons, when water resources are limited. In addition, the energy consumed by agricultural practices such as tillage, seeding, fertilization and irrigation depends on soil physical properties (Collins and Fowler 1996), which also directly affect bearing capacity, moisture storage capacity and the availability of that moisture to plants. Among the soil chemical properties, organic matter has been found to enhance plant nutrient uptake through the decomposition of plant and animal residues by microbial biomass (Asadi et al. 2012). Furthermore, low pH in the surface soil restricts microbial activity and nodulation, which in turn results in nitrogen deficiency in plants (Cienciala et al. 2016). Thus, both soil physical and chemical properties are of utmost importance to agriculture, and both are in turn influenced by changes in elevation.

Altitude plays a significant role in changing climatic characteristics, soil properties and land use patterns, which in turn drive soil microbial functions and nutrient interactions with plants (Awasthe et al. 2005). For example, microbial activity declines with the reduction in mean soil temperature and the increase in precipitation in the higher Himalayas (Hu et al. 2016). A study by Soethe et al. (2008) in Ecuador concluded that the nutrients available to plants decrease significantly with increasing elevation. Also, Pandey et al. (1983) observed soil nutrients being washed down the steep slopes of Nainital in Uttar Pradesh, India, which leads to nutrient deficiency in plants at higher altitudes. Soil water retention characteristics are also highly dependent on the land use pattern, which itself varies with elevation: agriculture is practised at lower elevations, whereas forestry dominates at higher altitudes (Kidanemariam et al. 2012). Therefore, surface soil at lower elevations is more susceptible to disturbance by agricultural practices, whereas surface soil at higher elevations is more likely to be covered by organic matter from the forest cover. Furthermore, Jha et al. (2009) compared the soil moisture content of vegetated and barren land in the Western Himalayas and concluded that soil under vegetation cover retains more moisture, so that more irrigation water must be supplied to grow crops on barren land. Similarly, another study noted higher soil moisture storage in a dense forest than in a degraded one (Tyagi et al. 2013). Gupta and Pandey (2008) found that soil organic carbon (SOC) is highly sensitive to forest type in the Himalayan ranges and observed higher SOC with an increase in Eucalyptus, Shisham and Teak.
Since the altitudinal gradient directly affects soil characteristics, which in turn affect the soil–water–plant relationship, in situ measurements of the spatial variation of soil properties are crucial.

The application of pedotransfer functions to estimate soil hydraulic properties is recommended in the literature. However, in many developing countries such functions are either unavailable or poorly suited to hilly watersheds, where soil properties are highly variable (Santra and Das 2008). To address this issue, numerous studies have attempted to relate the maximum water holding capacity (WHC) to readily obtainable textural and structural soil properties using regression models (Dabral and Pandey 2016; Gupta et al. 1977; Gupta and Larson 1979; Petersen et al. 1968). Although these models have successfully estimated soil WHC, their application has been limited to American catchments and flatter basins. Therefore, developing simple regression equations to estimate soil WHC, and assessing their suitability, is imperative. As the literature shows, knowledge gaps remain on how soil physical and chemical properties vary along the altitudinal gradient in the Himalayas. In addition, the applicability of simple regression models for estimating WHC in different agroecological zones (AEZs) is unknown. The novelty of this study therefore lies in addressing the following questions: (1) how do soil physical and chemical properties vary across the AEZs of the Himalayan region?; (2) how are the soil properties correlated with one another within and across AEZs?; and (3) can simple multiple linear regression (MLR) models be developed to estimate WHC in three AEZs of the Himalayan region? The outcomes of this study can be used to estimate irrigation water requirements, enabling more efficient use of water in the Himalayas, where water scarcity is an important issue during dry seasons. The results can also be used to identify crop suitability in the different AEZs of the region.

2 Materials and methods

2.1 Site description

The study was conducted in the Rangeet River catchment in the state of Sikkim, India. The catchment has a total geographical area of approximately 944 km² (up to the Rangeet dam site) and lies within 27°06′–27°58′N latitude and 88°01′–88°41′E longitude (Fig. 1). Elevation ranges from 620 m at the dam site to approximately 7300 m at the ridge. Based on local climatology and land use pattern, the study site can be divided into three AEZs: the subtropical zone (620–2500 m asl), the temperate zone (2501–4000 m asl) and the trans-himalayan zone (> 4000 m asl) (Sharma et al. 2016). The land use land cover (LULC) map of the catchment was created from an IRS P6 LISS III satellite image dated 23 March 2012, with a spatial resolution of 24 m. The image was reclassified into major land use classes using ArcGIS software (v. 10.3.1). The LULC map shows that approximately 60% of the area is covered by forest (including evergreen trees, shrubs and deciduous trees), followed by 22% under rainfed agriculture. Grasslands/savannas cover 2% of the land, and approximately 16% of the area is covered by permanent snow and water bodies (such as lakes and glaciers). Precipitation is concentrated in the monsoon season (May–September), which contributes 85% of the total rainfall, which ranges from 2200 to 3900 mm. Maximum and minimum temperatures range from 17 to 24 °C and from 9 to 13 °C, respectively (Yadav et al. 2016). In addition, increasing population pressure on the land is leading to the deliberate clearing of forests for other land uses such as agroforestry and terracing (Subba 2008; Deb et al. 2018). Furthermore, farmers in the region generally apply farmyard manure (FYM) rather than artificial fertilizers (Deb et al. 2015a, b). At the soil order level, Inceptisols dominate the region, while a small area comprises Entisols.
Under the WRB classification, Leptosols and Umbrisols are dominant in the region (WRB 2014) (Table 1).

Fig. 1

Location of the study site, showing the DEM, sampling points and land use land cover map

Table 1 Temperature, rainfall, slope, aspect, land use and dominant forms of agriculture in the AEZs selected for the study

2.2 Soil sampling and laboratory analysis

A discrete soil sampling approach was applied, in which a single soil sample was collected from each location at a depth of 0–20 cm. The representative sample size was decided based on the homogeneity of the land use pattern in the catchment (evergreen forest and rainfed agriculture dominate the higher and lower altitudes, respectively). Moreover, since climatology does not vary significantly within each AEZ, the influence of climate on soil properties within an AEZ is also minimal. Furthermore, too many samples can introduce spatial autocorrelation, which violates the independence assumed by the statistical tests (Cochran 1977; Wilks 2006). Therefore, following the recommendations of similar studies, 50 soil samples each were collected from the subtropical (lower elevation) and temperate (middle elevation) AEZs. Only 29 samples were collected from the trans-himalayan AEZ (higher elevation) because water bodies and permanent snow cover limited accessibility (Fig. 1). The sampling density was decided based on land use pattern, accessibility and topography across the area, so that the samples were not concentrated in a single land use pattern, which could bias the soil attributes. In addition, care was taken to collect samples from locations undisturbed by human intervention. Therefore, the samples from the trans-himalayan and temperate AEZs were collected from mountain slopes under natural land cover.

Soil sampling in the subtropical and temperate AEZs was carried out during the winter season (December to February) of 2014–2015, whereas the trans-himalayan AEZ was sampled in spring 2015. Since the winter and spring seasons are dominated by drought (Kusre and Lalringliana 2014), agricultural practices and rainfall have limited influence on soil characteristics. Sampling sites in all three AEZs were identified using the first identification, later sampling (FILS) technique. First, the latitude and longitude of the sampling points were selected randomly from the digital elevation model (DEM) of the watershed. Second, each location was transferred to the ground using a global positioning system (GPS) receiver (model: Garmin 76CSX). The DEM of the watershed was derived from the 30-m resolution Shuttle Radar Topography Mission (SRTM) DEM (http://www2.jpl.nasa.gov/srtm/). Furthermore, a major part of the study site lies within the Khangchendzonga National Park, a conservation area where research is prohibited. Special permission was therefore obtained from the regional forest office of the Sikkim government to collect the samples. Topsoil samples were taken at these sites by scraping away the surface litter and inserting a soil auger to a depth of 20 cm. The collected samples were then stored in jars, each labelled with its sample number and AEZ.
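The random selection of sampling points from the DEM in the FILS step can be sketched in Python as shown below. This is a minimal illustration, not the procedure the authors ran. The synthetic elevation grid, the function names and the north-up georeferencing parameters (`origin_lon`, `origin_lat`, `cell_deg`) are hypothetical. Only the AEZ elevation limits (620, 2500 and 4000 m asl) and the per-zone sample counts come from the text.

```python
import numpy as np


def aez_masks(dem):
    """Boolean masks of DEM cells in each AEZ; NaN (no-data) cells fall in none."""
    return {
        "subtropical": (dem >= 620.0) & (dem <= 2500.0),
        "temperate": (dem > 2500.0) & (dem <= 4000.0),
        "trans-himalayan": dem > 4000.0,
    }


def sample_points_by_aez(dem, n_per_zone, rng):
    """Draw distinct random (row, col) cells inside each AEZ."""
    picks = {}
    for zone, mask in aez_masks(dem).items():
        rows, cols = np.nonzero(mask)
        n = n_per_zone[zone]
        if n > rows.size:
            raise ValueError(f"{zone}: only {rows.size} cells, cannot draw {n}")
        idx = rng.choice(rows.size, size=n, replace=False)  # sampling without replacement
        picks[zone] = np.column_stack((rows[idx], cols[idx]))
    return picks


def cell_to_lonlat(rc, origin_lon, origin_lat, cell_deg):
    """Cell-centre lon/lat for a north-up grid whose upper-left corner is the origin (hypothetical georeferencing)."""
    lon = origin_lon + (rc[:, 1] + 0.5) * cell_deg
    lat = origin_lat - (rc[:, 0] + 0.5) * cell_deg
    return np.column_stack((lon, lat))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic west-to-east elevation ramp standing in for the SRTM-derived DEM.
    dem = np.tile(np.linspace(620.0, 7300.0, 200), (100, 1))
    counts = {"subtropical": 50, "temperate": 50, "trans-himalayan": 29}
    for zone, rc in sample_points_by_aez(dem, counts, rng).items():
        # Hypothetical georeferencing; real values would come from the DEM metadata.
        lonlat = cell_to_lonlat(rc, origin_lon=88.0, origin_lat=27.98, cell_deg=0.0003)
        print(zone, lonlat[:2].round(5))
```

The resulting coordinates are what would be loaded onto the GPS receiver for field navigation. Drawing without replacement ensures that no cell is selected twice within a zone.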

The collected soil samples were air-dried separately and passed through a 2-mm sieve for laboratory analysis. Particle size distribution (mechanical analysis) was determined by the standard Bouyoucos hydrometer method (Day 1965), and each sample was assigned a textural class using the USDA textural triangle (USDA 1987). Soil pH was measured with a glass electrode against a calomel reference electrode at a 1:2.5 soil-to-water ratio, as suggested by Peech (1965). Organic carbon (OC) was determined by the wet digestion method (Walkley and Black 1934). Particle density (PD), bulk density (BD), air-dry soil water content (SW), total porosity (TP) and WHC were determined using the Hilgard apparatus as described in Baruah and Barthakur (1998).
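For reference, the USDA textural-triangle classification can be expressed as an ordered chain of inequality rules on the sand, silt and clay percentages. The Python sketch below transcribes the rule set commonly used in the USDA NRCS soil texture calculator. It assumes the same class boundaries as USDA (1987) and is not the authors' code. As an illustration, applying it to the mean trans-himalayan fractions reported in Sect. 3.1.3 (46.2% sand, 32.6% silt, 21.2% clay) returns loam.

```python
def usda_texture_class(sand, silt, clay):
    """USDA textural class from sand, silt and clay percentages (summing to ~100)."""
    if min(sand, silt, clay) < 0 or abs(sand + silt + clay - 100.0) > 1.0:
        raise ValueError("expected non-negative fractions summing to about 100%")
    # Rules are evaluated in order; each branch assumes all earlier ones failed.
    if silt + 1.5 * clay < 15:
        return "sand"
    if silt + 2 * clay < 30:
        return "loamy sand"
    if (7 <= clay < 20 and sand > 52) or (clay < 7 and silt < 50):
        return "sandy loam"
    if 7 <= clay < 27 and 28 <= silt < 50 and sand <= 52:
        return "loam"
    if (silt >= 50 and 12 <= clay < 27) or (50 <= silt < 80 and clay < 12):
        return "silt loam"
    if silt >= 80 and clay < 12:
        return "silt"
    if 20 <= clay < 35 and silt < 28 and sand > 45:
        return "sandy clay loam"
    if 27 <= clay < 40 and 20 < sand <= 45:
        return "clay loam"
    if 27 <= clay < 40 and sand <= 20:
        return "silty clay loam"
    if clay >= 35 and sand > 45:
        return "sandy clay"
    if clay >= 40 and silt >= 40:
        return "silty clay"
    if clay >= 40 and sand <= 45 and silt < 40:
        return "clay"
    return "unclassified"  # should not occur for valid inputs


print(usda_texture_class(46.2, 32.6, 21.2))  # -> loam
```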

2.3 Statistical analysis

Descriptive statistics were used to characterize the 10 soil attributes (pH, OC, sand, silt and clay content, BD, PD, TP, SW and WHC) measured in the soil samples from the three AEZs. Single-factor analysis of variance (ANOVA) was applied to test for variation in each soil attribute among the three AEZs. The null hypothesis was that the soil attributes do not vary significantly among the AEZs. Because ANOVA cannot identify which particular AEZs differ, the Bonferroni post hoc test was then applied. Its null hypothesis was that the magnitude of the soil attribute does not differ significantly between the two AEZs of each pair. Since ANOVA assumes equality of variances (homogeneity), the Fligner–Killeen test was applied to all 10 soil attributes among the three AEZs before the single-factor ANOVA was performed. The null hypothesis for this test was that the variances of the soil attribute are equal. All tests were performed at a significance level of 0.05. In addition, correlations among the soil attributes within each AEZ were assessed using the Pearson correlation coefficient at a significance level of 0.05. Furthermore, to identify generalized relationships among the soil attributes throughout the catchment, correlation coefficients were also derived from all soil samples pooled across the three AEZs (referred to as all AEZs combined).
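The testing sequence above (Fligner–Killeen, then single-factor ANOVA, then Bonferroni-adjusted pairwise comparisons) can be sketched in Python with SciPy. The authors worked in R, and this sketch makes an assumption about the form of the post hoc test. It uses pairwise t-tests based on the pooled within-group standard deviation, which is what R's `pairwise.t.test` does by default when called with `p.adjust.method = "bonferroni"`. The BD values in the demo are synthetic draws that only mimic the reported means, standard deviations and sample sizes. They are not the study data.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def compare_aezs(groups, alpha=0.05):
    """Fligner-Killeen, one-way ANOVA and Bonferroni pairwise t-tests for one soil attribute.

    groups: dict mapping AEZ name -> 1-D sequence of values of that attribute.
    """
    samples = {k: np.asarray(v, dtype=float) for k, v in groups.items()}
    fk = stats.fligner(*samples.values())  # H0: variances are equal across AEZs
    anova = stats.f_oneway(*samples.values())  # H0: means are equal across AEZs

    # Pooled within-group variance (the ANOVA error mean square) and its degrees of freedom.
    df_err = sum(s.size for s in samples.values()) - len(samples)
    pooled_var = sum((s.size - 1) * s.var(ddof=1) for s in samples.values()) / df_err

    pairs = list(combinations(samples, 2))
    bonferroni_p = {}
    for a, b in pairs:
        xa, xb = samples[a], samples[b]
        se = np.sqrt(pooled_var * (1.0 / xa.size + 1.0 / xb.size))
        t_stat = (xa.mean() - xb.mean()) / se
        p_raw = 2.0 * stats.t.sf(abs(t_stat), df_err)  # two-sided p value
        bonferroni_p[(a, b)] = min(1.0, p_raw * len(pairs))  # Bonferroni correction
    return {
        "fligner_p": float(fk.pvalue),
        "equal_variances": bool(fk.pvalue >= alpha),
        "anova_F": float(anova.statistic),
        "anova_p": float(anova.pvalue),
        "bonferroni_p": bonferroni_p,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic BD (g/cm3) mimicking the reported means/SDs and sample sizes; NOT the study data.
    bd = {
        "subtropical": rng.normal(1.20, 0.13, 50),
        "temperate": rng.normal(1.07, 0.09, 50),
        "trans-himalayan": rng.normal(1.09, 0.15, 29),
    }
    print(compare_aezs(bd))
```

In practice, the ANOVA and post hoc results would be interpreted only when `equal_variances` is true, mirroring the order of tests described above.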

Finally, MLR models were developed for each AEZ, with WHC as the response (dependent) variable and all other soil attributes as the predictor (independent) variables. Before the MLR models were developed, multicollinearity among the soil attributes was evaluated using the variance inflation factor (VIF), with a threshold of 10. Also, MLR models assume a linear relationship between the dependent variable (maximum WHC) and the independent variables (pH, OC, sand, silt and clay content, BD, PD, TP and SW). Linearity was therefore checked with scatter plots before the MLR equations were derived. Scatter plots were created between maximum WHC and each independent variable while holding the others fixed, using the soil attributes from all three AEZs. All independent variables showed a linear relationship with maximum WHC, with a correlation coefficient |r| ≥ 0.65. All variables were therefore retained for MLR modelling, as recommended by Zhang et al. (2018).
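The VIF screening can be written directly from its definition, VIF_j = 1/(1 − R_j²), where R_j² is obtained by regressing predictor j on all the other predictors. The NumPy sketch below is illustrative only, and the predictor names and data are hypothetical. One design point is worth noting. If the sand, silt and clay percentages sum exactly to 100%, including all three alongside an intercept makes them perfectly collinear. At least one of them would then receive an infinite (or numerically huge) VIF.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2) for each column j of the predictor matrix X.

    R_j^2 comes from an OLS regression (with intercept) of column j on all other columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        A = np.column_stack((np.ones(n), np.delete(X, j, axis=1)))
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def flag_collinear(names, X, threshold=10.0):
    """Names of predictors whose VIF exceeds the threshold (10 in this study)."""
    return [name for name, v in zip(names, variance_inflation_factors(X)) if v > threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    a = rng.normal(size=200)
    b = rng.normal(size=200)
    c = a + 0.01 * rng.normal(size=200)  # nearly a copy of a -> strong collinearity
    print(flag_collinear(["a", "b", "c"], np.column_stack((a, b, c))))  # -> ['a', 'c']
```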

Because the predictor variables were multicollinear, stepwise MLR modelling with the forward addition (forward selection) approach was applied to select appropriate variables. Stepwise MLR constructs a multiple regression model for the dependent variable Y from a set of independent variables, as given in Eq. (1).

$$Y = b_{0} + b_{1} X_{1} + b_{2} X_{2} + \cdots + b_{n} X_{n}$$
(1)

where b0 is the intercept of the regression line on the Y axis; b1, b2, …, bn are the standard partial regression coefficients; and X1, X2, …, Xn are the independent variables. In stepwise MLR modelling, the independent variable most correlated with WHC was selected first. Its regression coefficient was then estimated and retained if it was statistically significant. The next most correlated variable was then added. Because including a new variable can change the contribution of the previously selected variables, the regression coefficient of each variable was re-evaluated at each step using the F-statistic and p value. Non-significant terms were eliminated, and variable selection continued until the lowest Akaike Information Criterion (AIC) was obtained. AIC is calculated as in Eq. (2).

$${\text{AIC}} = n \times \ln \left( {\frac{\text{SSE}}{n}} \right) + 2k$$
(2)

where SSE is the sum of squared errors, n is the number of observations, k is the number of independent variables and ln is the natural logarithm. The advantage of stepwise MLR over traditional linear regression is that it identifies the best combination of predictor variables for the response variable while limiting the impact of inter-correlations among the predictor variables (Du et al. 2013). MLR models also assume that the regression residuals are normally distributed. The normality of the residuals of each best-fit model was therefore checked using the Anderson–Darling normality test, with the null hypothesis that the residuals are normally distributed. Autocorrelation among the residuals of each best-fit model was evaluated using the Durbin–Watson test, with the null hypothesis that there is no autocorrelation among the residuals, i.e. the residuals are independent. Both tests were performed at a significance level of 0.05. The performance of the best-fit models was evaluated using the adjusted coefficient of determination (adjusted R²). All statistical analyses were performed in the R statistical programming environment.
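A simplified forward-selection loop driven by Eq. (2) is sketched below in NumPy. It is an illustration that rests on several assumptions, not the authors' R workflow. At each step it adds the predictor that lowers AIC the most, and it stops when no remaining candidate lowers AIC further. It omits the F-statistic and p-value re-check of previously selected terms described above. The predictor labels and synthetic data are hypothetical. Following the definition in the text, k in Eq. (2) is taken as the number of predictors. Some software also counts the intercept and error variance, which shifts AIC by a constant but does not change the ranking of models fitted to the same data.

```python
import numpy as np


def aic(sse, n, k):
    """Eq. (2): AIC = n * ln(SSE / n) + 2k, with k = number of predictors."""
    return n * np.log(sse / n) + 2 * k


def fit_ols(X, y):
    """Least-squares fit of Eq. (1) with intercept b0; returns (coefficients, SSE)."""
    A = np.column_stack((np.ones(len(y)), X))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)


def forward_stepwise_aic(X, y, names):
    """Greedy forward selection: add the predictor that lowers AIC most; stop when none does."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    selected, remaining = [], list(range(X.shape[1]))
    best_aic = aic(float(np.sum((y - y.mean()) ** 2)), n, 0)  # intercept-only model
    while remaining:
        trials = []
        for j in remaining:
            _, sse = fit_ols(X[:, selected + [j]], y)
            trials.append((aic(sse, n, len(selected) + 1), j))
        cand_aic, cand_j = min(trials)
        if cand_aic >= best_aic:  # no candidate improves AIC -> stop
            break
        best_aic = cand_aic
        selected.append(cand_j)
        remaining.remove(cand_j)
    coef, _ = fit_ols(X[:, selected], y)  # coef = [b0, then one b per selected predictor]
    return [names[j] for j in selected], coef, best_aic


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(50, 4))
    # Synthetic WHC-like response that truly depends on columns 0 and 2 only.
    y = 40.0 + 5.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=50)
    names = ["BD", "OC", "SW", "pH"]  # hypothetical labels for the synthetic columns
    chosen, coef, best_aic = forward_stepwise_aic(X, y, names)
    print(chosen, np.round(coef, 2), round(float(best_aic), 2))
```

With pure-noise candidates, a spurious predictor can still occasionally enter when it happens to lower AIC by more than the penalty of 2. This is one reason the significance re-check described in the text is useful.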

2.4 Methodological limitations

The present study is based on 50 soil samples each from the subtropical and temperate AEZs and 29 samples from the trans-himalayan AEZ. Because the soil in the area is highly heterogeneous, this number of samples may not be representative of the three AEZs, especially the trans-himalayan AEZ. Furthermore, although utmost care was taken, human-induced errors are inevitable when soil attributes are determined by analytical methods. Despite these limitations, the study is useful for identifying the soil physical and chemical attributes of Himalayan AEZs and their relation to WHC, although the reported values may vary slightly because of these limitations.

3 Results and discussion

3.1 Variation of soil physicochemical properties across AEZs

The variation in soil physicochemical properties across the AEZs is presented through descriptive statistics and single-factor ANOVA (Sect. 2.3). The Fligner–Killeen test for homogeneity of variance across the AEZs yields p ≥ 0.05 for all 10 soil attributes. Therefore, the null hypothesis of equal variances of the soil attributes across the three AEZs cannot be rejected.

3.1.1 Soil pH

pH decreases sharply from 6.28 ± 0.63 in the subtropical AEZ to 4.79 ± 0.51 in the trans-himalayan zone, indicating highly acidic soils at the higher altitudes (Fig. 2a). In addition, pH is less dispersed and closer to the median value in the temperate zone than in the other two zones. The F and p values of the ANOVA (Table 2) show significant variation in pH among the three AEZs. Furthermore, the Bonferroni post hoc test reveals statistically significant differences in pH between the trans-himalayan and temperate zones and between the subtropical and trans-himalayan zones (Table 3). No statistically significant difference is observed between the subtropical and temperate zones. The acidic soils of the trans-himalayan AEZ can be attributed to surface runoff on the steeper slopes, which washes out basic cations, specifically calcium and sodium ions, from the higher altitudes.
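pH is the negative base-10 logarithm of hydrogen-ion activity, so the reported drop in pH corresponds to a large change in acidity. The following worked example uses the two reported means. Because averaging pH values is not the same as averaging activities, the result is indicative only:

$$\frac{a_{\mathrm{H^{+}},\,\text{trans-himalayan}}}{a_{\mathrm{H^{+}},\,\text{subtropical}}} = \frac{10^{-4.79}}{10^{-6.28}} = 10^{6.28-4.79} = 10^{1.49} \approx 31$$

That is, the hydrogen-ion activity in the trans-himalayan topsoil is roughly 30 times that in the subtropical topsoil.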

Fig. 2

Effect of elevation on a pH, b OC, c sand, d silt, e clay, f BD, g PD, h TP, i SW and j maximum WHC of the soil (boxes show the interquartile range and the middle line represents the median; outliers are also shown)

Table 2 Single-factor ANOVA analysis for the calculated soil attributes
Table 3 p values for Bonferroni post hoc test calculated for all soil attributes at the three AEZs considered

3.1.2 OC content

The analysis of OC for the three AEZs shows that OC is lowest in the subtropical zone, with a median value of 1.6 ± 0.5%. In the trans-himalayan zone, OC is less dispersed, and the samples cluster around the median value of 2.8%. The higher temperature and restricted water availability in the lowlands relative to the trans-himalayan zone may explain the observed pattern of the soil OC pool (Gupta and Sharma 2013). Also, the larger herbaceous forest cover at higher elevation leads to greater litter accumulation in the soil. Together with bacterial activity restricted by low temperatures, this can explain the higher OC in the soil (Michael 2006). Another possible explanation for the higher OC in the surface soil of this zone is minimal human disturbance, since the trans-himalayan region is uninhabited (Subba 2008). Moreover, OC differs markedly between the trans-himalayan and temperate zones and between the trans-himalayan and subtropical AEZs. The variation in the OC pool between the temperate and subtropical zones is insignificant, both visually (Fig. 2b) and according to the ANOVA and Bonferroni post hoc test (Tables 2, 3). This is probably because outliers in the temperate and subtropical samples offset the differences in OC content. Similar trends in the soil OC pool were observed by Arrouays et al. (2006) in France and by Jones et al. (2005) across Europe, where OC was higher at higher altitudes than at lower elevations.

3.1.3 Sand, silt and clay content

The sand proportions of the collected samples show that sand content is highest in the subtropical zone (55.2 ± 9.4%), followed by the trans-himalayan (46.2 ± 10.0%) and temperate zones (45.9 ± 7.4%). Interestingly, the trans-himalayan zone has higher silt (32.6 ± 8.7%) and clay contents (21.2 ± 5.7%) than the other two zones. A plausible cause is the weathering of rocks and the transport of larger particles (e.g. sand) by surface runoff from higher to lower elevations (Römkens et al. 2002). As the runoff reaches flatter regions, its flow velocity decreases and the larger particles, i.e. sand, settle out of suspension. Conversely, the higher silt and clay contents at higher altitude (trans-himalayan zone) can be attributed to the roots of mature trees, which promote the aggregation of clay and silt particles (Burri et al. 2009). Similar results were observed by Natake (2012) in the humid region of Puerto Rico, where higher clay and silt contents were found on the ridges. It is also noteworthy that the differences in sand and silt content between the trans-himalayan and temperate zones are insignificant both visually (Fig. 2c–e) and statistically (Tables 2, 3). Furthermore, the ANOVA shows that the variation in clay content among the three AEZs is statistically insignificant.

3.1.4 Soil BD and PD

Visual inspection of the results shows that soil samples from the subtropical zone have the highest BD (1.20 ± 0.13 g/cm³), followed by the trans-himalayan (1.09 ± 0.15 g/cm³) and temperate zones (1.07 ± 0.09 g/cm³). There is no statistically significant difference between the latter two (Fig. 2f). Furthermore, the statistical analysis (Tables 2, 3) reveals a statistically significant difference between the temperate and subtropical zones. The higher BD of the subtropical zone relative to the other AEZs can be attributed to its higher sand content (Sect. 3.1.3). Organic matter from forestland also strongly influences BD, and its contribution is expected to be larger in the trans-himalayan and temperate zones. In the present study, however, the higher sand content is the most probable explanation. Another possible explanation for the higher BD in the subtropical zone is agricultural practice, under which the soil mass is constantly disturbed by tillage (Anghinoni et al. 2017).

PD varies only slightly among the three AEZs (2.56 ± 0.42, 2.41 ± 0.16 and 2.43 ± 0.17 g/cm³ for the subtropical, temperate and trans-himalayan zones, respectively) (Fig. 2g). The ANOVA indicates no significant variation in PD among the three elevation ranges of the catchment (Table 2). The slightly higher PD in the subtropical zone can likewise be attributed to its higher sand content. The similar PD of the temperate and trans-himalayan zones reflects their similar silt and clay contents.

3.1.5 TP and SW

TP is lower in the subtropical zone (0.52 ± 0.07) than in the temperate (0.55 ± 0.03) and trans-himalayan zones (0.54 ± 0.06) (Fig. 2h). This is possibly because the subtropical soils contain more sand than those of the other AEZs, which results in less pore space (Pla et al. 2017). In contrast, the trans-himalayan and temperate zones have higher proportions of clay, which provides more micropore space and ultimately higher porosity. However, the ANOVA indicates that the variation in TP among the three AEZs is statistically insignificant. Similarly, the variation in the water content of air-dry soil among the three AEZs is statistically insignificant. Interestingly, the subtropical samples retain more water (4.1 ± 2.8%) than those from the other two zones (1.7 ± 0.6% for the temperate and 2.1 ± 0.5% for the trans-himalayan zone). The higher BD of the subtropical soils may contribute to their higher SW (Li et al. 2016).
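The link between TP, BD and PD follows from the standard relation TP = 1 − BD/PD. The short Python check below applies this relation to the AEZ mean BD and PD reported in Sect. 3.1.4 and compares the results with the measured mean TP reported above. It is a consistency sketch only, for two reasons. First, TP in this study was measured with the Hilgard apparatus. Second, the ratio of means is not the same as the mean of per-sample ratios.

```python
def total_porosity(bulk_density, particle_density):
    """Total porosity (fraction of soil volume) from BD and PD: TP = 1 - BD/PD."""
    if particle_density <= 0 or not 0 <= bulk_density <= particle_density:
        raise ValueError("expected 0 <= BD <= PD and PD > 0 (g/cm3)")
    return 1.0 - bulk_density / particle_density


# Reported mean BD and PD (g/cm3) and measured mean TP for each AEZ.
reported = {
    "subtropical": (1.20, 2.56, 0.52),
    "temperate": (1.07, 2.41, 0.55),
    "trans-himalayan": (1.09, 2.43, 0.54),
}

for zone, (bd, pdens, tp_measured) in reported.items():
    tp_calc = total_porosity(bd, pdens)
    print(f"{zone:<16} TP from means = {tp_calc:.3f}, measured mean TP = {tp_measured:.2f}")
```

The computed values (0.531, 0.556 and 0.551) are close to the measured means. The relation also makes explicit why BD and TP are negatively correlated: for a given PD, TP falls linearly as BD rises.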

3.1.6 Maximum WHC

The maximum WHC results show that the ability of soil to retain water increases with elevation from the subtropical to the trans-himalayan zone. Maximum WHC is highest in the trans-himalayan zone (59.7 ± 12.7%), followed by the temperate zone (58.6 ± 7.1%), and lowest in the subtropical AEZ (49.9 ± 11.0%). OC content, which is highest in the trans-himalayan zone followed by the temperate and subtropical zones, may explain the observed pattern of WHC across the three elevation zones (Wang et al. 2013). Higher OC in the soil increases the number of micro- and macropores, either by “glueing” soil particles together or by creating favourable living conditions for soil organisms, which further increases the capability of the soil to hold water (Reicosky 2005). WHC is also more variable in the subtropical and trans-himalayan zones (Fig. 2j). This is probably due to the high variability of TP in these AEZs, which influences WHC (Horne and Scotter 2016). Additionally, silt and clay, which provide micropores that hold more water, are more widely dispersed around their medians in the trans-himalayan zone, and this may contribute to the WHC pattern in that zone. The ANOVA shows significant variation in maximum WHC among the three elevation ranges. In particular, the Bonferroni post hoc test indicates that the differences are statistically significant between the temperate and subtropical zones and between the subtropical and trans-himalayan zones, with p values of 0.017 and 0.034, respectively (Table 3).

3.2 Correlation among soil attributes

The Pearson correlation coefficients among the soil attributes in the three AEZs show that most relationships are insignificant. The exceptions are some pairs of attributes that show similar relationships in all three AEZs. For instance, sand content has a significant negative relationship with silt and clay content in all three AEZs. This negative correlation partly reflects the fact that the three fractions sum to 100%, so an increase in sand necessarily reduces the combined silt and clay fraction. The predominance of sand over silt and clay in all three AEZs may further contribute to it (Deb et al. 2014). Similarly, a statistically significant negative correlation is observed between BD and TP in all three AEZs and for all AEZs combined. This is because TP reflects the void space among soil particles (occupied by air and water), whereas BD reflects soil compaction, so the two vary inversely. OC has a significant, strong negative correlation with pH in the temperate AEZ and for all AEZs combined (Fig. 3). OC also has strong and moderate positive correlations with WHC in the subtropical AEZ and for all AEZs combined, respectively. This is possibly due to the greater use of FYM in agriculture in the subtropical AEZ, which increases the macropores in the soil and thus WHC (Debnath et al. 2012). Moreover, the enhanced WHC may promote the leaching of H+ ions from higher to lower elevations, which may explain the negative correlation between pH and OC in the temperate AEZ (Gruba and Socha 2016). BD exhibits a statistically significant negative correlation with WHC in all three AEZs, possibly because the pore space available for water storage decreases as BD increases. Bi et al. (2014) also observed a negative correlation between maximum WHC and BD in the Shenfu Dongsheng coalfields in China.

Fig. 3

Corrplots of soil attributes at a subtropical AEZ, b temperate AEZ, c trans-himalayan AEZ and d all AEZs combined (only significant correlations are shown)
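A correlation matrix restricted to significant pairs, like the corrplots in Fig. 3, can be assembled with SciPy's `pearsonr`. The sketch below is illustrative. The attribute names and synthetic values are hypothetical, and plotting is omitted; the NaN-masked matrix could be passed to any heat-map routine. The demo also illustrates a closure effect: because silt is computed as 100 − sand − clay, sand and silt come out strongly negatively correlated.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def significant_correlations(data, alpha=0.05):
    """Pearson r and p for every attribute pair, keeping only pairs with p < alpha."""
    out = {}
    for a, b in combinations(data, 2):
        r, p = stats.pearsonr(data[a], data[b])
        if p < alpha:
            out[(a, b)] = (float(r), float(p))
    return out


def masked_corr_matrix(data, alpha=0.05):
    """Symmetric r matrix with non-significant entries set to NaN (diagonal = 1)."""
    names = list(data)
    m = np.full((len(names), len(names)), np.nan)
    np.fill_diagonal(m, 1.0)
    for (a, b), (r, _) in significant_correlations(data, alpha).items():
        i, j = names.index(a), names.index(b)
        m[i, j] = m[j, i] = r
    return names, m


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    sand = rng.uniform(35.0, 70.0, 50)  # synthetic percentages, not the study data
    clay = rng.uniform(10.0, 25.0, 50)
    silt = 100.0 - sand - clay  # closure: the three fractions sum to 100%
    names, m = masked_corr_matrix({"sand": sand, "silt": silt, "clay": clay})
    print(names)
    print(np.round(m, 2))
```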

These inter-correlations among soil attributes can also be compared with the findings of Heshmati et al. (2011). That study observed negative correlations between pH and OC and between clay and sand content of soils in semi-arid regions of Iran. It also found soil saturation percentage to be positively correlated with sand content, in line with the present results for the temperate AEZ. Similarly, a separate study by Borůvka et al. (2002) at Klučov, Central Bohemia, showed that correlations among soil attributes vary spatially with altitude. In that study, TP was negatively correlated with silt content when all elevation zones were combined. TP was positively correlated with soil moisture content at lower and higher elevations, whereas pH was negatively correlated with soil OC content at mid-elevation. Soil moisture content was also negatively correlated with silt content when all elevation ranges were combined. The present study shows similar correlations among all these attributes. These results indicate that correlations among soil attributes vary widely in space and must be considered and handled carefully when developing MLR models.

3.3 Stepwise multiple linear regression analysis

The variance inflation factor (VIF) test for multicollinearity among the predictor variables identifies several strongly collinear variables: pH and clay for the subtropical AEZ; pH, OC and clay for the temperate AEZ; and sand, silt, clay and TP for the trans-himalayan AEZ. Since including these variables in the MLR model can lead to over-fitting (Latt and Wittenberg 2014), they were eliminated from the subsequent steps. The lowest AIC values obtained for the subtropical, temperate and trans-himalayan AEZs are 28.15, 1.75 and 58.42, respectively. The final stepwise MLR model for the subtropical AEZ consists of OC, sand and silt content, BD, TP and SW as predictors. Similarly, sand and silt content, BD, PD, TP and SW are retained for the temperate AEZ, whereas pH, OC, BD, PD and SW are retained for the trans-himalayan AEZ. It is worth noting that BD and SW are the only two of the nine independent variables selected for all three AEZs. The adjusted R² values obtained for the models are 0.93, 0.90 and 0.72 for the subtropical, temperate and trans-himalayan AEZs, respectively (Table 4). The lower performance of the model for the trans-himalayan AEZ is probably due to the smaller number of samples used.
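The multicollinearity screening and AIC-based selection described above can be sketched in plain Python as follows. This is a hedged illustration, not the authors' procedure: it fits ordinary least squares through the normal equations, computes VIF_j = 1/(1 − R_j²) from an auxiliary regression of each predictor on the others, and uses backward elimination (one form of stepwise selection, assumed here) with AIC = n·ln(RSS/n) + 2p, where p counts the fitted coefficients including the intercept. This AIC convention and all function names are assumptions.

```python
import math


def ols_rss(x_cols, y):
    """Least-squares fit of y = b0 + sum_j b_j * x_j; returns (coefficients, RSS).

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination
    with partial pivoting, which is adequate for a handful of predictors.
    """
    rows = [[1.0] + [col[i] for col in x_cols] for i in range(len(y))]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    m = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for k in range(p):
            if k != col:
                f = m[k][col]
                m[k] = [vk - f * vc for vk, vc in zip(m[k], m[col])]
    beta = [m[k][p] for k in range(p)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, rss


def r_squared(x_cols, y):
    """Coefficient of determination of the OLS fit."""
    _, rss = ols_rss(x_cols, y)
    mean = sum(y) / len(y)
    return 1.0 - rss / sum((v - mean) ** 2 for v in y)


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept not counted in k)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def vif(data, name):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others."""
    others = [col for key, col in data.items() if key != name]
    r2 = r_squared(others, data[name])
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


def aic(x_cols, y):
    """AIC up to an additive constant: n * ln(RSS / n) + 2 * (number of coefficients)."""
    n = len(y)
    _, rss = ols_rss(x_cols, y)
    return n * math.log(rss / n) + 2 * (len(x_cols) + 1)


def backward_stepwise(data, y):
    """Repeatedly drop the predictor whose removal lowers AIC most; stop when no removal helps."""
    current = list(data)
    best = aic([data[v] for v in current], y)
    while current:
        trials = [(aic([data[v] for v in current if v != drop], y), drop)
                  for drop in current]
        score, drop = min(trials)
        if score >= best:
            break
        best, current = score, [v for v in current if v != drop]
    return current, best
```

In practice one would compute `vif` for every candidate predictor, drop those above a chosen cut-off (5 and 10 are common rules of thumb; the threshold used in the study is not stated here), pass the remaining predictors to `backward_stepwise`, and report `adjusted_r2` for the final model.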

Table 4 Stepwise MLR models generated for predicting the maximum WHC of the collected soil samples at the three AEZs

The predictive ability of the models is further supported by the model standard error and the F-statistic. Because the model standard error measures how closely the model predicts the dependent variable from the given independent variables, a lower value is preferred (Stojanovic et al. 2013). For the models generated, the standard errors for the subtropical, temperate and trans-himalayan AEZs are 0.42, 0.51 and 0.66, respectively (Table 4). This implies that predicted values of maximum WHC typically deviate from the observed values by about 0.42, 0.51 and 0.66% for the corresponding AEZs. Similarly, the F-statistics calculated for the models are higher than the critical F values (1.59 for the subtropical and temperate AEZs and 1.84 for the trans-himalayan AEZ at a significance level of 0.05). In addition, the p values of the Anderson–Darling test for normality of residuals (0.39, 0.16 and 0.61 for the subtropical, temperate and trans-himalayan AEZs, respectively) indicate that the null hypothesis that the residuals of the best-fit MLR models follow a normal distribution cannot be rejected. Likewise, the Durbin–Watson test indicates that, for all three AEZs, the null hypothesis of no autocorrelation among the residuals of the best-fit models cannot be rejected (p values of 0.82, 0.33 and 0.36 for the subtropical, temperate and trans-himalayan AEZs, respectively). This statistical evaluation of the MLR assumptions indicates that the models are acceptable and can be applied to determine WHC at the corresponding AEZs.
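The model checks reported above can be outlined with the following sketch (Python, standard library only). It assumes the predictions come from an OLS model with an intercept and k predictors, so that R² = 1 − RSS/TSS and F = (R²/k)/((1 − R²)/(n − k − 1)). The Durbin–Watson function returns only the statistic d (values near 2 suggest no first-order autocorrelation), not the p values quoted in the text, and the Anderson–Darling p-value uses the D'Agostino–Stephens approximation for a normal distribution with estimated mean and variance. All function names are hypothetical.

```python
import math


def regression_diagnostics(observed, predicted, k):
    """Residual standard error, overall F-statistic and adjusted R^2.

    Assumes `predicted` comes from an OLS fit with an intercept and k predictors,
    so that R^2 = 1 - RSS/TSS and F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean = sum(observed) / n
    tss = sum((o - mean) ** 2 for o in observed)
    r2 = 1.0 - rss / tss
    dof = n - k - 1  # residual degrees of freedom
    return {
        "rse": math.sqrt(rss / dof),
        "F": (r2 / k) / ((1.0 - r2) / dof),
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / dof,
    }


def durbin_watson(resid):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2); d near 2 suggests no lag-1 autocorrelation."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)


def anderson_darling_normal(resid):
    """A^2 against a normal with estimated mean and sd, plus an approximate p-value.

    The p-value uses the D'Agostino-Stephens approximation (intended for n >= 8).
    """
    x = sorted(resid)
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    z = [(v - mean) / sd for v in x]
    log_cdf = [math.log(0.5 * math.erfc(-zi / math.sqrt(2))) for zi in z]  # ln Phi(z)
    log_sf = [math.log(0.5 * math.erfc(zi / math.sqrt(2))) for zi in z]    # ln(1 - Phi(z))
    s = sum((2 * i + 1) * (log_cdf[i] + log_sf[n - 1 - i]) for i in range(n))
    a2 = -n - s / n
    aa = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if aa < 0.2:
        p = 1.0 - math.exp(-13.436 + 101.14 * aa - 223.73 * aa ** 2)
    elif aa < 0.34:
        p = 1.0 - math.exp(-8.318 + 42.796 * aa - 59.938 * aa ** 2)
    elif aa < 0.6:
        p = math.exp(0.9177 - 4.279 * aa - 1.38 * aa ** 2)
    elif aa < 10:
        p = math.exp(1.2937 - 5.709 * aa + 0.0186 * aa ** 2)
    else:
        p = 3.7e-24
    return a2, p
```

Comparing the returned F with a tabulated critical value, and checking that the Anderson–Darling p-value exceeds the chosen significance level, reproduces the two acceptance criteria described in the text.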

The results of the present study can also be compared with those of Qiu et al. (2003), who generated three MLR models to predict soil moisture using variables related to land use, topography and meteorology in a small catchment in the hilly region of China. Their results suggest that the MLR models that considered land use and topography variables performed best. The present study likewise uses soil attributes as independent variables, and these attributes are controlled by land use and topography. Another recent study compared artificial neural networks, support vector machines for regression, MLR and k-nearest neighbour (kNN) methods for estimating soil water retention in a tropical delta region of Vietnam and found that MLR and kNN gave the best results (Nguyen et al. 2017). The independent variables used in that study were sand, silt and clay content, BD, OC and the logarithm of OC, collected from 160 sites; data from 140 sites were used to train the models, and soil water content data from the remaining 20 sites were used to test them. The MLR model achieved an average adjusted R² of 0.89, similar to the values obtained in the present study.

4 Conclusion

The present study evaluates the effect of AEZs on the physical and chemical properties of surface soil and their relationship with the maximum WHC. The results confirm that pH, sand content, BD and SW decrease with increasing altitude. On the other hand, OC, silt and clay content, TP and maximum WHC increase with altitude, whereas PD remains nearly constant across all elevation ranges. In addition, statistical tests suggest that pH, OC, BD, sand and silt content and maximum WHC vary significantly among the AEZs. Furthermore, BD, TP, SW and sand and clay content are found to be strongly correlated with maximum WHC across the three AEZs. Finally, MLR models were generated to predict maximum WHC from various soil physical and chemical properties at the three AEZs. High adjusted R² (≥ 0.70), F-statistics exceeding the critical F values and low model standard errors reflect the suitability of the models for predicting maximum WHC in the three AEZs. Based on these findings, it is suggested that (a) soil properties vary widely along the altitudinal gradient and (b) simple multiple linear regression-based models perform well in estimating the WHC of soil at different altitudes. The outcomes of this study can be applied to identifying crop suitability and cropping calendars in the AEZs of the Himalayan region. The results can also serve as a guide for estimating the maximum WHC of soils and for designing irrigation systems for hilly catchments.