Introduction

Saudi Arabia is one of the arid regions that lack specific studies related to the estimation of CN from field data since runoff stations are absent in most of the basins. Therefore, flood mitigation studies rely on the soil conservation service curve number (NRCS-CN) procedure. The NRCS-CN method is an acceptable procedure for estimating the volume of runoff in basins for various rainfall-runoff events worldwide USDA-SCS (1985). This procedure uses tables that are easily accessible and experimental formulas to create representative values for CN. The tables are based on λ = 0.2 and derived mainly NRCS-CN for agricultural basins. Therefore, there is a need to transfer these tables to λ = 0.01 for arid basins (Farran and Elfeki 2020a). This point is addressed in this paper. There are various methods to estimate CN value for basins. These methods are NRCS-CN table USDA-SCS (1985), least square method (LSM) (e.g., Hawkins et al. 2008; Mishra and Singh 2013; Farran and Elfeki 2020a), and asymptotic fitting method (AFM) (e.g., Hawkins 1993; D’Asaro and Grillone 2012; Farran and Elfeki 2020b). Hawkins (1975) showed the importance of the estimation of an accurate CN. Therefore, the research question in this study is regarding which of these methods provides an accurate estimation of CN and consequently a reliable runoff prediction in natural arid basins? To the best of the authors’ knowledge, there is no answer to this question in the available literature. In the next paragraphs, a review of the methods for CN estimation is presented followed by the main objectives of the research.

To calculate the curve number based on the NRCS method, two main components, namely hydrologic soil group (HSG) and land use, are needed. Most research on literature uses the NRCS-CN procedure to evaluate the generation of runoff. For example, in many river basins in Italy, statistical models have been used with CN methods to find runoff volumes (Kottegoda et al. 2000). In addition, Lin et al. (2007) have used the CN procedure to see how runoff changed due to land-use changes. The runoff has been significantly affected by changes in land use. Moreover, Hamdan et al. (2007) and Hamad et al. (2012) have studied the effect of land-use changes on runoff. Furthermore, Isik et al. (2013) have investigated how land-use changes impact runoff using the CN procedure and artificial neural networks. In West Bank, Shadeed and Almasri (2010) utilized GIS with CN procedure to predict runoff and CN values. Also, Hamad et al. (2012) have utilized GIS to assess how land-use change affects the water budget of the Gaza Strip. Additionally, Al-Juaidi (2018) has assessed the effect of land-use change on potential runoff volumes in the Gaza Strip using the NRCS-CN method and GIS with HEC-GeoHMS. Therefore, the mentioned studies among many others use the common NRCS-CN table to estimate runoff from rainfall events for ungagged basins to provide a design value for CN. However, there are other approaches to determining CN based on rainfall-runoff measurements. For example, one of the methods applied in the literature is the least-squares method (LSM). Walker (1970) is believed to be the first to use LSM, where he applied an experience-and error LSM to the data of runoff from many small basins in the state of Utah in the USA. Several studies have used LSM to reach the potential maximum retention values, e.g., (Simanton et al. 1973; Bales and Betson 1981; Cooley and Lane 1982; Curtis et al. 1983). Hanson et al. (1981) used LSM to find CNs on twenty-five small basins in Montana, Wyoming, and South Dakota and reached a general agreement with the NRCS handbook values. Hawkins et al. (2002 and 2008) used LSM to determine the maximum retention, S, of the basin, and the abstraction ratio, λ, of the runoff equation.

Hawkins (1993) used the asymptotic fitting method (AFM) for the asymptotic estimation of CN values and observed three different types of CN behaviors resulting from CN and rainfall relationship named standard, complacent, and violent behaviors. Thereafter, several studies followed Hawkins’s method of using AFM. Bonta (1997) proposed the development of Hawkins’ technique for the asymptotic estimation of CN values from measured data in violent and standard basins using the derived frequency distributions. He discovered that the derived distribution method is promising for CN estimation when rainfall-runoff data is limited. Hjelmfelt et al. (2001) provided complimentary examples of basins that highlight similar behaviors. Fennessey and Hawkins (2001) have compared the CN values for 60 catchments in the USA resulting from three methods and they named them CNbest( LSM based), CN(AFM based), and CNdesign determined from the CNtable. D’Asaro and Grillone (2012) assessed CN from rainfall-runoff events of 61 Sicilian basins by three various methods: the least-squares method, National Engineering Handbook-4 (NEH4) method, and the asymptotic fitting method (AFM). The results showed that the value of the initial abstraction ratio (λ) studied for basins with the standard CN behavior is well below the original value (λ = 0.2) mentioned in the USDA-NRCS reports. The NEH4 method was unable to determine the CN value of the correct basins. As AFM indicates a significant response to the standard CN behavior in about 77% of the basins studied versus a complacent behavior (18%) and a violent behavior (5%).

The geographic information system (GIS) has turned into a basic tool in hydrological modeling since it can deal with a vast measure of spatial information. Also, remote sensing (RS) can supply estimations of several hydrologic variables utilized in hydrologic modeling applications. In recent years, satellite remote sensing and GIS have emerged as powerful tools for collecting information on land use and land cover of large areas (Shih 1988; Subudhi et al. 1989). The information on land use, land cover, and hydrologic soil type can be integrated into a GIS environment for a quick and accurate estimation of runoff curve numbers (Stuebe and Johnston 1990). In addition, Still and Shih (1991) utilized remote sensing data to develop the runoff index at the level of the basin and showed how successfully can the remotely sensed data be utilized to track changes in surface runoff caused by land-use changes. Moreover, Saleh, and AI-Hatrushi (2009), and Youssef et al. (2011) have used remote sensing (RS) and GIS procedures as effective tools in flood risk evaluation.

In the past few years, several flood assessment studies have been conducted in Saudi Arabia mainly using GIS and RS procedures(e.g., Metwaly et al. 2010; Al Saud 2010; Dawod and Koshak 2011; Dawod et al. 2011; Al-Hasan and Mattar 2014; Abdulrazzak et al. 2017; Abdulrazzak et al. 2018). Additionally, Al-Subai (1992), and Şen and Al-Suba'i (2002) have used the traditional NRCS-CN tables for modeling rainfall-runoff in the southwestern region of Saudi Arabia. Furthermore, Dawod et al. (2013) have recommended using the NRCS-CN method as an ideal method for estimating the flood in Makkah, Saudi Arabia. Moreover, Mahmoud et al. (2014) have used GIS-based land use, hydrological soil group, and slope to estimate runoff in the Al-Baha region, the Kingdom of Saudi Arabia (KSA). The above studies considered only the traditional NRCS-CN tables (USDA-SCS, 1985) because they deal with ungauged basins and there are no runoff measurements.

In recent times, some quantitative studies have been conducted on CN in Saudi Arabia using rainfall-runoff data. For example, Alagha et al. (2016) studied the CN behavior of the Yiba basin in KSA, and the results showed a standard behavior with an approached value of fifty-two. Farran and Elfeki (2020a) used LSM to obtain the optimum range of values for λ and CN based on three estimation methods. The analysis showed that the range of CN varies between 45 and 85 at λ = 0.2. They concluded that the initial abstraction ratio λ of 0.01 is more representative to the arid basins. Also, Farran and Elfeki (2020b) used AFM to estimate the asymptotic CN (CN). The analysis showed that the dominant CN-P relationship is the standard behavior and CN shows high variability over the studied basins concerning three values of λ (0.01, 0.2 and 0.3) and the two sorting methods.

Therefore, the aims of this study are the following: (1) estimation of the curve number using the traditional (NRCS-CN) method based on land use and land cover evaluated by using GIS and RS techniques, (2) transforming the NRCS-CN table at λ = 0.2 into CN table at λ = 0.01 and apply the resulted values on flood predications, and (3) comparison between the different estimation methods, namely LSM, AFM, and SCS-CN table methods for flood evaluation. The objective of this comparison is to see how the values of the traditional (SCS-CN) method (used in ungauged basins) are related to the values obtained in both the LSM and the AFM (used in gauged basins). The results of this research will guide hydrologists and engineers during the estimation of the CN value of ungauged basins for flood mitigation studies in KSA.

Study area and data collection

The study has taken place on the eastern cliff of Asir region in the south western part of Saudi Arabia. The elevation the cliff of Asir is up to 3000 m and extends west to the Red Sea coast. Geologically, the study region is located in the Asir escarpment which is part of the Arabian Shield (Farran and Elfeki 2020b). The main geology of the region consists of volcanic rocks (Precambrian) and sedimentary rocks (Elfeki et al. 2020). The region is characterized as being very arid with extremist temperature differences, high evaporation rates, and low annual rainfall and random rainfall events are causing flash floods (Abdulrazzak et al. 2020). Also, annual rainfall is closely related to altitude, with total rainfall ranging from 30 to 100 mm on the coastal plain of the Red Sea and up to 450 mm at altitudes over 2000 m (Wheater et al. 1991).

The study included five main basins which are Al-Lith, Tabalah, Yiba, Liyyah, and Habawnah. It is divided into sub-basins (19 sub-basins in total: 4 subbasins in Al-Lith, 3 subbasins in Tabalah, 4 subbasins in Yiba, 3 subbasins in Liyyah, and 5 subbasins in Habawnah). Al-Lith, Yiba, and Liyyah basins drain to the west towards the Red Sea, while Habawnah and Tabalah basins drain to the east towards the Rub al Khali. The study depended on the historical data from measurements of storms (rainfall and runoff events) reaching 161 events collected over a 4-years period (1984 to 1987) by Saudi Arabian Dames and Moore (1988). The historical data was found in a hard copy format, and it has to be transformed into a soft copy format (digital  format) to be used. Therefore, Get Data Graph Digitizer software is necessary for this process. Then, the data is transformed to the Excel program. The Watershed Modeling System (WMS) is used to extract basin characteristics and the stream network through the delineation process. Figure 1 shows the general layout of the basins and the locations of the runoff stations at each outlet of the sub-basins and the stream networks.

Fig. 1
figure 1

Locations of the study basins in KSA. a General layout of the basins, b AL-Lith basin, c Tabalah basin, d Yiba basin, e Liyyah basin, and f Habawnah basin

Methodology

The methodology can be summarized in the following steps:

  1. 1.

    Estimation of CN based on the LSM for natural and ordered sorting of rainfall-runoff data pairs to get CNobs for both λ = 0.2 and 0.01,

  2. 2.

    Estimation of CN based on the AFM for natural and ordered sorting of rainfall-runoff data pairs to get CN for both λ = 0.2 and 0.01,

  3. 3.

    Developing a formula to transfer the corresponding NRCS-CN tables at λ = 0.2 to tables for λ = 0.01,

  4. 4.

    Estimation of CN using NRCS-CN tables for both λ = 0.2 and the developed tables in step 3 for λ = 0.01 using GIS and RS for land use and land cover classifications. These values are called CNdesign.

The detailed methodology is explained in the following sections.

The NRCS-CN method equations which are introduced by the USDA-SCS (1985) are given by:

$$ Q=\left\{\begin{array}{c}\frac{{\left(P-\lambda S\right)}^2}{P+\left(1-\lambda \right)S},\kern0.5em P>\lambda S\\ {}0,\kern0.5em P\le \lambda S\end{array}\right.\kern1em $$
(1)

Where: P is the rainfall depth,

Q:

is the direct runoff depth,

λ:

is the initial abstraction ratio, and

S:

is the potential maximum retention of the basin (S = 25,400/CN  -254), where CN is the curve number.

From rainfall and runoff measurements of individual storms, and for a given value of λ, one may calculate CN by combing Eq. 1 with the equation for S to get the equation of CN as,

$$ \mathrm{CN}=\frac{25400}{\ \frac{P}{\lambda }+\frac{\left(1-\lambda \right)Q-\sqrt{{\left(1-\lambda \right)}^2{Q}^2+4\lambda PQ}}{2\ {\lambda}^2}+254} $$
(2)

In the following sections, the methods used to estimate CN is reviewed and a brief description of each method is presented.

The least squares method

The LSM for CN estimation is introduced by Hawkins et al. (2008). The method determines the parameters λ and CN of the well-known runoff equation. The LSM is a minimization of the sum of squared differences between the observed direct runoff (Q) and the estimated runoff \( \left(\hat{Q}\right) \). It gives the best estimation of the parameters. The estimated runoff \( \hat{Q} \) is obtained by Mishra and Singh (2013),

$$ \hat{Q}=\frac{{\left[P-254\lambda \left(\frac{100}{\mathrm{CN}}-1\right)\right]}^2}{P+254\left(1-\lambda \right)\ \left(\frac{100}{\mathrm{CN}}-1\right)} $$
(3)

The minimization problem is formulated by the objective function as given by Hawkins et al. (2008),

$$ f\left(\mathrm{CN},\lambda \right)=\min \sum \limits_{i=1}^n{\left({Q}_i-{\hat{Q}}_i\right)}^2 $$
(4)

Where: Qi is the observed runoff of an event i (mm), \( {\hat{Q}}_i \) is the estimated runoff of the same event i (mm), and n is the no. of events (observations).

The values of CN and λ obtained by LSM is called CNbest and λbest. Farran and Elfeki (2020a) have used the MLS to obtain CNbest and λbest for these basins and their sub-basins and even for each rainfall-runoff event. The best value for λ was found to be 0.01. In the current study, both the value of λ = 0.01 and the common value of λ = 0.2 (USDA-SCS 1985) are used for comparison reason. Therefore, the authors use the terminology CNobs instead of CNbest to represent the value of the curve number obtained by substituting the rainfall depth, the corresponding runoff depth, and the value of λ (0.01 or 0.2) at each event in Eq. 2 to get CNobs at λ = 0.01 and λ = 0.2 respectively in the rest of the paper.

Asymptotic fitting method

The asymptotic fitting method (AFM) is a technique to estimate the behavior of the precipitation and the CN value (the so-called CN-P relationship). As the precipitation approaches infinity (P → ∞), the CN approaches an asymptotic value called CN. Hawkins (1993) showed three different behavior between CN and P. These behaviors are as follows: (1) the complacent behavior, (2) the standard behavior, and (3) the violent behavior. For more details about these behaviors, a reference is made to Hawkins (1993). The formula of the complacent behavior which represents the lower limit of the CN-P relationship is given by Hawkins (1993),

$$ \kern5.25em {\mathrm{CN}}_{\mathrm{O}}=\frac{25400}{\left(\frac{P}{\lambda}\right)+254} $$
(5)

Where CNO is the complacent behavior that represents the curve number at P = Ia (the initial abstraction, Ia = λS), and therefore, Q = 0 (a reference is made to Eqs. 1 and 2).

The formula of the standard behavior is given by Hawkins (1993),

$$ \mathrm{CN}(P)={\mathrm{CN}}_{\infty }+\left(100-{\mathrm{CN}}_{\infty}\right)\ \exp \left(-\frac{P}{b}\right) $$
(6)

Where CN is the asymptotic value ; and b is a constant .

The formula of violent behavior is not considered in this context because it is beyond the scope of the study. It does not show up in the analysis of these basins (Farran and Elfeki 2020b). An extensive study of CN on the same basins is presented in Farran and Elfeki (2020b). The estimation of the fitting parameters (CN and b) is made through the minimization of the squared difference of CN for the given P, Q, and λ as,

$$ f\left({\mathrm{CN}}_{\infty },b\right)=\min \sum \limits_{i=1}^n{\left[\left(\frac{25400}{\ \frac{P_i}{\lambda }+\frac{\left(1-\lambda \right){Q}_i-\sqrt{{\left(1-\lambda \right)}^2{Q_i}^2+4\lambda P{Q}_i}}{2\ {\lambda}^2}+254}\right)-\left({\mathrm{CN}}_{\infty }+\left(100-{\mathrm{CN}}_{\infty}\right)\ \exp \left(-\frac{P_i}{b}\right)\ \right)\right]}^2 $$
(7)

Where Pi and Qi are the precipitation and the runoff depths respectively of the event i.

The results of this technique are discussed in the “results and discussions” section.

Remote sensing and geographical information system for CN estimation

In this approach, the CN is estimated from land use and land cover based on the NRCS table at λ = 0.2 (NRCS 2004) and not from rainfall-runoff events as explained in the previous methods. The procedure to implement this approach is given in the following steps.

RS-based land cover classification approach

Data collection

Data were collected freely from the open-source of the Earth Explorer website (https://earthexplorer.usgs.gov/). Images of Operational Land Imager (OLI) were used in the current study. OLI consists of 11 bands including three bands in the visible region, one band in the InfraRed, two bands in the Shortwave Infrared (30-m resolution), one Panchromatic band (15 m resolution), and two bands in the Thermal Infrared (100 m resolution) in addition to two bands targeting the Ultra blue and Cirrus.

Eight OLI scenes were collected coherently during November 2017 to minimize the variation of the atmospheric conditions. The scenes were preprocessed for geometric, radiometric, and atmospheric correction (Itten and Meyer 1993; Baboo and Devi 2011) then mosaicked to cover the whole study area.

Image classification process

The classification process is an information extraction procedure that detects these spectral signatures and then assigns pixels into classes according to the similarity of their spectral signatures (Briem et al. 2002). Image classification is an essential procedure in the field of remote sensing, digital image analysis, and pattern recognition (Kloer 1994; Richards and Jia 1999).

Supervised classification

Supervised classification is a computer-based process where each pixel is assigned to a category based on a preferable decision rule (Swain and Davis 1981). Supervised classification starts initially with the selection of the training sites for different terrain categories. Based on previous scholarly work on classification accuracies (Elhag and Boteva 2016; Elhag 2016), two different supervised classification algorithms are exercised in the current research study, maximum likelihood—ML and support vector machine—SVM. Maximum likelihood classification is achieved according to the following equation:

$$ {g}_i\ \left(\boldsymbol{x}\right)=\ln P\left({\omega}_i\right)-\raisebox{1ex}{$1$}\!\left/ \!\raisebox{-1ex}{$2$}\right.\ln \left|{\Sigma}_i\right|-\frac{d}{2}\ln 2\pi -{\left(\boldsymbol{x}-{\boldsymbol{m}}_{\boldsymbol{i}}\right)}^T{\Sigma}_i^{-1}\left(\boldsymbol{x}-{\boldsymbol{m}}_{\boldsymbol{i}}\right) $$
(8)

Where:

i:

is the class number,

x:

is the d-dimensional data (where d is the number of bands),

P(ωi):

is the probability that class ωi occurs in the image and is assumed the same for all classes,

|Σi|:

is the determinant of the covariance matrix of the data in class ωi,

Σi−1:

is the inverse of the covariance matrix, and

mi:

is the mean vector of class i,

Support vector machine is performed according to the two-layer neural net using the following inner product kernel,

$$ K\left({x}_i,{x}_j\right)=\mathit{\tanh}\left({\alpha x}_i^T{x}_j+\beta\ \right) $$
(9)

Where:

K(xi, xj) is the inner product kernel function,

α:

is the gamma term in the kernel function for all kernel types except linear, and

β:

is the bias term in the kernel function for the polynomial and sigmoid kernels.

This approach is employed using Erdas software. With this technique, three land cover classes were identified that contain rocks, vegetation, and soil as shown in Fig. 2 and the details of estimation CN, based on this approach, will be explained in the next section.

Fig. 2
figure 2

Maps of the study basins. a AL-Lith, b Tabalah, c Habawnah, d Liyyah, and e Yiba

Classification accuracy assessment

Accuracy assessment was realized based on 100 points of randomly distributed points in each basin collected from Google Earth, the points representing the major land use land cover in the designated study area. The points were evenly distributed in each scene. The points were converted into 50-m2 polygons under the GIS environment for approachability reasons.

Validation points were individually assigned to three different land cover categories: rocks, alluvium, and vegetation. Points were used to compute users, producers, and overall accuracies.

Producer’s accuracy is calculated as follows:

$$ \mathrm{Producer}\ \mathrm{accuracy}\%=\frac{C_{ii}\ }{C_{\ast i}}\ 100 $$
(10)

Where,

Cii:

are the elements at position ith row and ith column.

Ci:

is the column sums.

User’s accuracy is calculated using the formula,

$$ \mathrm{User}\ \mathrm{accuracy}\%=\frac{C_{ii}\ }{C_{i\ast }}\ 100 $$
(11)
C ii :

is the element at position ith row and ith column, and

C i :

is the row sums.

The overall accuracy is estimated by the following formula,

$$ \mathrm{overall}\ \mathrm{accuracy}\%=\frac{\sum \limits_{j=1}^U{C}_{jj}\ }{O}\ 100 $$
(12)

Where,

Cjj:

is the element at position jth row and jth column, and

O and U:

are the total number of pixels and classes respectively.

Matching of user’s and producer’s accuracies delivers accurateness to the classification and assures a robust liability of the implemented accuracy assessment (Cohen (1960), Congalton et al. (1983), and Congalton (1991)).

\( \hat{\mathrm{K}} \) statistics is a second measure accuracy agreement. This measure of agreement is based on Congalton and Mead (1983) findings. It is defined as the maximum likelihood estimation from the multinomial distribution. \( \hat{\mathrm{K}} \) is calculated using the following equation:

$$ \overset{\wedge }{\mathrm{K}}=\frac{\sum \limits_{i=1}^r{x}_{ii}-\sum \limits_{i=1}^r\left({x}_{i+}.{x}_{+i}\right)}{N^2-\sum \limits_{i=1}^r\left({x}_{i+}.{x}_{+i}\right)} $$
(13)

Where

r:

is the number of rows and columns in the error matrix,

xii:

is the number of observations in row i and column i (the diagonal elements),

xi+:

is the marginal total of row i,

x+i:

is the marginal total of column i, and

N:

is the total number of observations in the matrix.

To increase the accuracy assessment of the vegetation land cover class, Normalized Difference Vegetation Index (NDVI) was exercised according to Pettorelli et al. (2005) where the NDVI layer was overlaid over the slope layer under GIS environment to examine the vegetation cover in steeply areas. NDVI was carried out as follows:

$$ \mathrm{NDVI}=\left(\mathrm{NIR}-\mathrm{RED}\right)/\left(\mathrm{NIR}+\mathrm{RED}\right) $$
(14)

Where:

NDVI is the Normalized Difference Vegetation Index.

NIR and RED are the amounts of near-infrared and red light, respectively, reflected by the vegetation and captured by the sensor of the satellite.

Figure 2 shows the study basins projected on the base map to show the land cover in the study area. Since these basins are natural basins, therefore the only thing one may see is the rocks and the alluvium. Figure 3 shows the basins based on the above classification, and three different classifications are distinguished, namely the alluvium, the vegetation, and the rocks. Since most of the basins contain rock and alluvium, therefore, these classifications are reduced to alluviums and rocks to simplify the estimations. Therefore, it has been decided that the vegetation area located in the alluvium is added to the alluvium area and the vegetation located in the rock is added to the rock area. Table 1 shows the area of the vegetation, the rock, and the alluvium and the percentage of the vegetation, the rock, and the alluvium for each sub-basin. According to the NRCS-CN table, the hydrologic condition is estimated for each sub-basin based on the percentage of grass in the ground cover to poor: < 30%, fair: 30 to 70%, and good: > 70%. Based on this classification, the hydrologic conditions of the sub-basins are determined. The table shows that most of the sub-basins have a poor hydrologic condition. Table 1 shows also the hydrologic soil groups A, B, C, and D, with group A corresponding to high infiltration, while group D corresponds to low infiltration rate. The values of CN assigned to these groups are estimated based on the common landscape of the arid and the semi-arid basins as given in the table of NRCS-CN (NRCS 2004), namely desert shrub-major plants include saltbush, greasewood, creosote bush, black bush, bur sage, paloverde, mesquite, and cactus. Therefore, the  values in the tables are obtained from the NRCS-CN table at λ = 0.2 (NRCS, 2004).

Fig. 3
figure 3

Land cover (LC) map of the study basins. a AL-Lith, b Tabalah, c Habawnah, d Liyyah, and e Yiba. Three classifications are identified, namely rock, alluvium, and vegetation

Table 1 Area of alluvium, vegetation, and rocks and the estimated CN values of sub-basins based on NRCS-CN table at λ = 0.2 (USDA-NRCS 2004)

Design CN value based on RS-GIS and NRCS-CN table: the updated traditional approach

In this section, the traditional approach to estimate CN based on the NRCS-CN table is reviewed. However, the traditional approach has been updated by two new aspects. The first aspect is the use of the RS-GIS techniques, as explained in the previous section in the classification of the land cover of the basins into alluvium, rock, and vegetation and estimation of the proportion of each of these classes in the basins. The second aspect is that the NRCS-CN table values are estimated based on the abstraction ratio, λ = 0.2 (USDA-NRCS 2004). However, in the current study, a value of λ = 0.01 is also considered in the analysis since it has been shown to best represent the arid basins in Saudi Arabia (Farran and Elfeki, 2020a and b). Therefore, there is a need to convert the NRCS-CN table values form λ = 0.2 to λ = 0.01 to test the effect of λ on the results. Figure 4 shows the regression equations that have been developed to make such a transformation. The figure can be expanded as follows. The left image shows the relationship between CN estimated from the rainfall-runoff observations at λ = 0.2 based on Eq. 2 (CN values are also presented in Table 2 under column CNobs) and plotted on the vertical axis. The CN values from the NRCS-CN table estimated from the values in Table 2 (the bold italic face in the table under HSG, they are presented in column CN value (HSG) close to CNobs) and plotted on the horizontal axis. A best-fit line is made between the percentiles of both CN values as shown in Fig. 4 (left) to find the relationship between the observed and tabulated CN at λ = 0.2. The developed relationship is shown in the graph. It is almost a one to one relationship (linear relation with R2 = 0.83). It is worth mentioning that the CNobs at λ = 0.2 can be classified as soil C and D which indicates relatively low infiltration. However, this seems to be in the contrary to the actual soil condition in these basins which normally have high infiltration during the phenomenon of transmission losses. This point will be discussed later in the “results and discussions” section. Figure 4 (right) shows a percentile plot between the observed CN at λ = 0.2 on the vertical axis and the observed CN at λ = 0.01. A best-fit line is made through the data to find a relationship between CNobs at λ = 0.2 and at CNobs at λ = 0.01. This relation takes the form in the figure (linear relation with R2 = 0.91). Combining the two formulas in Fig. 4 and rearranging, one may obtain a relationship between NRCS-CN table at λ = 0.2, CNT(0.2) , and the developed CN table at λ = 0.01, CNT(0.01) as,

Fig. 4
figure 4

The developed regression equations to convert NRCS-CN table values at λ = 0.2 to the developed CN table values at λ = 0.01 suited for the study basins

Table 2 Classification of the observed CN from rainfall-runoff measurements into hydrologic soil groups A, B, C, and D (NRCS-CN table at λ = 0.2, USDA-NRCS 2004) for the sub-basins
$$ {\mathrm{CN}}_{\mathrm{T}(0.01)}=2.4{\mathrm{CN}}_{\mathrm{T}(0.2)}-143,{\mathrm{CN}}_{\mathrm{T}(0.2)}>60 $$
(15)

Equation 15 is then used to t constract a table for CN at λ = 0.01. It should be noted that Eq. 15 is only valid for CNT(0.2) larger than 60. The reason is that CN is directly proportional to λ (Farran and Elfeki 2020a) and therefore when λ decreases, the value of CN decreases as well; however, CN cannot be negative, so the limit is CNT(0.2) = 60, which corresponds to CNT(0.01) = 1. Table 3 shows a comparison between CNT(0.2) and the corresponding CNT(0.01) for different hydrologic conditions and HSG. It is obvious from the table that the CNT(0.01) is lower than the CNT(0.2). The low values of CNT(0.01) reflect the effect of transmission losses in these basins. Both values of CNT(0.2) and CNT(0.01) are going to be used to estimate design CN values for the basins as will be explained in the coming section.

Table 3 Comparison between runoff curve numbers for arid and semiarid rangelands CNT(0.2) and estimated CNT(0.01) in the current study

The common approach of CN estimation in ungagged basins is made using the NRCS-CN table (USDA-NRCS 2004). Hydrologists normally use a weighted average CN (called the composite CN, and is abbreviated as CNc) over the basin through the application of the following equation,

$$ {\mathrm{CN}}_{\mathrm{c}}=\frac{\sum \limits_{i=1}^n{\mathrm{CN}}_i{\mathrm{A}}_i}{\sum \limits_{i=1}^n{A}_i} $$
(16)

Where,

CNc:

is the composite runoff curve number,

Ai:

is the area of class i,

CNi:

is the curve number of class i, and

n:

is the number of classes.

Since the natural basins in the current study are classified as containing only two classes, namely rock and alluvium, therefore, for practical applications point of view, Eq. 16 can be simplified to read (USDA-NRCS 2004),

$$ {\mathrm{CN}}_{\mathrm{c}}={\mathrm{CN}}_p+\left(\frac{P_{\mathrm{im}}}{100}\right)\left({\mathrm{CN}}_{\mathrm{im}\mathrm{p}}-{\mathrm{CN}}_p\right) $$
(17)

where:

CNp:

is the pervious runoff curve number (in case of natural basins is the alluvium),

Pim:

is the percent imperviousness (in the case of the natural basins in the rock), and

CNimp:

is the curve number of the impervious area (rock) which is equal to 98 based on (USDA-NRCS 2004).

Equation 16 is mainly used in the literature to study the runoff in urban areas with CNimp used to describe the runoff curve number in paved regions with a value of 98. This equation is considered to estimate the design CN (which is called CNdesign = CNc) based on the traditional approach used by hydrologists and engineers to estimate the CN of a basin.

The aforementioned methodologies will lead to three values of CN for each λ, namely CNobs which is estimated from the observation, CN which is the asymptotic CN, and the estimated CN from the NRCS-CN table (CNdesign). The results are discussed in the following section.

Results and discussions

Analysis of the results based on LSM

Figure 5 shows a comparison between the NRCS-CN theory and the observed rainfall-runoff data under two scenarios, namely the natural versus the ordered sorting of rainfall-runoff data pairs (P: Q). The natural sorting of data is used in the literature by, e.g., D’Asaro and Grillone (2012). The pairs of rainfall and the corresponding runoff are combined to estimate CN. This method is consistent since the generated runoff is coming from the same rainfall that caused it. However, in the ordered data sorting method (Hjelmfelt 1980), the rainfall and runoff data are independent and are rearranged by ranking as one may do in frequency analysis of rainfall. This ordered method is also called the frequency matching technique since a new group of pairs of P: Q is generated (in this case, Q is not associated with its causative P). Both techniques are considered in the analysis. The left column of Fig. 5 displays the results for both techniques at λ = 0.01 and the right column displays the results at λ = 0.2. The natural data appeared scattered in the figure with the red circles. However, the ordered data are less scattered and presented in the blue triangles. The NRCS-CN theory given by Eq.1 is plotted through the data for both cases (i.e., natural and ordered) at both values of λ. Visual inspection of the figure shows that the theory fits better the ordered data rather than the natural data. The reason is that the ranking process leads to less scatted data and therefore most of the literature studies use ordered data. Another important observation is that at λ = 0.01, the data is better represented by the NRCS-CN theory in comparison with λ = 0.2 (the common value). This confirms that in the Saudi arid environment, it is recommended to use λ = 0.01 rather than 0.2. Table 4 presents a quantitative analysis of the results presented in Fig. 5. Both the coefficient of determination, R2, and the root mean square error (RMSE) of the fitting are estimated for both cases at both values of λ. The table shows that R2 is the highest at λ = 0.01 under ordered sorting of data and provides the minimum RMSE. This result is previously supported by Farran and Elfeki (2020a). Table 5 displays the results of CN based on LSM. The table shows that the values of CN are lower at λ = 0.01 when compared with λ = 0.2. This is due to the significance of the transmission losses in these basins which is a common phenomenon in arid regions. Also, one may notice that the CN values in the case of the natural data are relatively lower than the case of ordered data. The reason is that in the natural data, the runoff is generated from its rainfall; therefore, the obtained CN is a real representation of the event. It accounts for the actual transmission losses from that specific event. However, in the ordered data, it is not the case. The rainfall and runoff are not from the same event and therefore the CN does not reflect the actual transmission losses.

Fig. 5
figure 5

Comparison between the NRCS-CN theory and the observed rainfall-runoff data under two scenarios, namely the ordered versus the natural sorting of data pairs (P: Q): Left column (λ = 0.01) and right column (λ = 0.2)

Table 4 R2 and RMSE of the runoff, Q, estimated by the LSM at the two values of λ under both natural and ordered sorting of data
Table 5 Summary of the results of CN based on LSM (CNobs) at the two values of λ under both natural and ordered sorting of data

Figure 6 shows the mapping of the CNobs values over the basins based on λ = 0.2 (right column) and λ = 0.01(left column). The figure displays the magnitude of CN for each subbasin with different colors noticing that the subbasin with gray color has no data since there are no measurements of runoff at the outlet of such subbasin (see Table 1 and Fig. 1 for each runoff station for each subbasin). It is quite obvious that the values of CN are varying between each subbasin according to the rainfall-runoff measurement at the outlet station. The values of CN for λ = 0.2 is relativity higher than the values of CN for λ = 0.01. This observation is because there is a direct proportion between the CN and λ as explained by Farran and Elfeki (2020a). However, the values of CN at λ = 0.01 seem more realistic to represent these basins since it accounts for the transmission losses which is a significant phenomenon in these mountainous arid basins (Farran and Elfeki, 2020a and c). The values of CN in these maps are useful to study the rainfall-runoff simulations in these basins for flood studies.

Fig. 6
figure 6

Mapping CNobs values over the basins based on λ = 0.2 (right column) and λ = 0.01(left column). Zones with gray color have no values because there are no runoff stations at the outlet of these basins to estimate CN

Analysis of the results of the AFM

Figure 7 shows the results of the application of the AFM to the rainfall data under the two values of λ = 0.01 and 0.2 and under the natural (left column) and ordered (right column) sorting of data pairs (P: Q). The standard behavior is the dominant pattern in the basins. The figure shows a relatively higher scatter for the natural sorting of data when compared with the ordered sorting data. Another observation is that the scatter is relatively high for λ = 0.01 when compared with λ = 0.2; however, the asymptotic value, CN, is reached for most of the basins within the size of the rainfall data for λ = 0.01and not reached for λ = 0.2.

Fig. 7
figure 7

AFM applied to the rainfall data under the two values of λ = 0.01 and 0.2 respectively, and the natural (left column) and ordered (right column) sorting of data pairs (P: Q)

The developed CN-P relationship for the basins is displayed in Table 6. The fitting parameters (CN and b) of the CN-P equations are given in the table. The CN varies from 12.1 to 32.6 in the case of λ = 0.01and from 9.5 to 56.1 for λ = 0.2 under the natural sorting of data. The decay coefficient, b, varies between 7.5 and 20.3 mm in the case of λ = 0.01and from 32.2 to 98.4 mm for λ = 0.2 under the natural sorting of data. This means that low CN is reached for λ = 0.01 with a fast decay when compared with λ = 0.2 which again reflects the high transmission losses in these basins. The high transmission losses have been supported by an earlier study on the Yiba basin (Elfeki et al. 2015). They showed that the transmission losses, in some reaches in the Yiba basin, can reach up to 84% of the inflow hydrograph.

Table 6 Summary of the developed equations for the standard behavior under both natural and ordered sorting of data pairs with both values of λ

On the other hand, the CN varies between 29.2 and 63.6 in the case of λ = 0.01and from 49.9 to 80 for λ = 0.2 under the ordered sorting of data. The decay coefficient, b, varies between 1.7 and 18.7 mm in the case of λ = 0.01and from 12.7 to 44.6 mm for λ = 0.2 under the ordered sorting of data. In this case, the CN and b have the same features as the natural case; however, the CN is somewhat higher due to the ordering of the data and also the scatter is reduced. The decay is faster to reach CN in the sorting case while the decay at λ = 0.01 is faster than at λ = 0.2.

A quantitative measure of the discrepancies between the cases is displayed in Table 6. The coefficient of determination, R2, is relatively higher in the case of λ = 0.2 than in the case of λ = 0.01. However, this does not mean that λ = 0.2 is better to represent the data than λ = 0.01 because the NRCS-CN theory has proven that λ = 0.01 is the best to fit the data (see the results in Table 4). Therefore, AFM seems to be elusive in testing the validity of the NRCS-CN theory. Since the results of AFM is on the contrary of the LSM in the previous section. Therefore, it is not recommended to be used to represent CN for these natural arid basins.

Comparison of the CNc (CNdesgin) estimated from NRCS table and CN estimated from the observation (CNobs)

Figure 8 shows a scatter plot between the composite CN (CNc) obtained from Eq. 17 and observed CN (CNobs) at various values of λ for different values of CNimp at different HSG (A, B, C, and D) as shown in Table 1. The average value of CN of the various values of HSG (A, B, C, and D) is also plotted. In this figure, a tuning with the parameter CNimp is made to get the scattered data points around line 1:1. In Fig. 8a at λ = 0.2, and CNimp = 98 (it represents impervious pavements in urban areas), the scatter points are mostly located above the line of perfect fit (1:1 line). This means that the tabulated NRCS-CN values (Table 3 CNT(0.2), obtained from the reference NRCS 2004) overestimate the observations at λ = 0.2 by using CNimp = 98. However, by tuning the CNimp to a value of 90 at the same λ = 0.2, the line of perfect fit is somewhat located in the middle of the scattered data (Fig. 8b). This implies that when analyzing runoff in these basins at λ = 0.2, the best value for CNimp should be 90 instead of 98. Table 7 shows the estimation of the RMSE for both cases at λ = 0.2. It shows that the RMSE in the CN is reduced from 11at CNimp = 98 to 7 at CNimp = 90. The interpretation of these results is that the mountainous rocks in the study area contain a relatively high density of fractures and fissures, due to weathering conditions, that contribute to the infiltration process and consequently lead to lower CNimp. Figure 8c shows the same relationship between the composite CN (CNc) obtained from Eq. 17 and the observed CN (CNobs) at λ = 0.01 and CNimp = 98. The figure shows the overestimation of the CNc when compared with CNobs. The CNc is estimated based on the results in Table 3 under the column of CNT(0.01), based on the developed equation (Eq. 15). However, CNc, in this case, is relatively much more scattered in the vertical direction than that of Fig. 8a. This scatter is reduced by tuning the CNimp to a value equal to 72 as shown in Fig. 8d. Table 7 shows a reduction in RMSE of CN from 34 to 18 for CNimp = 98 to CNimp = 72 respectively. Again, this reflects the infiltration from these rocks due to fissures and cracks in these rocks. Therefore, it is recommended to use λ = 0.2 with CNimp = 90 or λ = 0.01 with CNimp = 72 in analyzing runoff in such basins. In both cases, the error in runoff estimation will be minimal.

Fig. 8
figure 8

Comparison between composite CN (CNc) obtained from Eq. (17) and observed CN (CNobs) at various values of λ and CNimp at different HSG (A, B, C, and D) and the average value of CN. a λ = 0.2 and CNimp = 98, b λ = 0.2 and CNimp = 90, c λ = 0.01 and CNimp = 98, and d λ = 0.01 and CNimp = 72

Table 7 Comparison between RMSE of CN observed (CNobs) and composite runoff CN (CNc) at different values of CNimp and λ

Relationship between CNc from Eq. (17) and CNc from the data as a function of the percentage of the impervious area (P im)

Figure 9 displays the theoretical relationship between CNc obtained from Eq. 17 and the CNc from the data as a function of the percentage of the impervious area (Pim) at various values of λ and CNimp. The figure shows four scenarios. The first scenario is when λ = 0.2 and CNimp = 98; this case is presented in Fig. 9a. It shows that the data for the basins are located at the line of CNp = 70 which means that the alluvium area has CN = 70 which seems relatively high. The same observation can be noticed in the second scenario when λ = 0.2 and CNimp = 90; the CN data for the alluvium is also located at the line with CNp = 70. The third scenario is presented in Fig. 7c where λ = 0.01 and CNimp = 98. In this scenario, the data is located on the line with CNp = 40. This leads to an estimation of the CN of the alluvium in these basins to be about 40. This seems more realistic for these basins since it would account for the transmission losses in the alluvium. The last scenario that appeared in Fig. 9d is similar to the previous scenario with λ = 0.01, while with CNimp = 72. Table 8 summarizes the parameters (P%, Imp%, CNp, CNimp, and CNc) used for fitting the data for all scenarios for these basins. These parameters can be used for future studies of rainfall-runoff simulations in these basins.

Fig. 9
figure 9

Comparison between CNc obtained from Eq. 17 CNc from the data as a function of the percentage of the impervious area (Pim) at various values of λ and CNimp. a λ = 0.2 and CNimp = 98, b λ = 0.2 and CNimp = 90, c λ = 0.01 and CNimp = 98, and d λ = 0.01 and CNimp = 72

Table 8 Summary of the percentage of pervious (alluvium) and impervious (rocks) and the corresponding CN (CNp) for alluvium, CNimp for rocks, and the composite CN (CNc) at different values of λ

Comparison of various methods of CN estimation

Figure 10 summarizes a comparison between the various estimation methods of CN. namely CNobs, CN, and CNdesign, for the basins under ordered and natural sorting of P: Q pairs and at λ = 0.01 and λ = 0.2. Figure 10a and b are for the natural sorting of data at λ = 0.01 and 0.2 respectively, while Figs. 10b and c are for ordered sorting of data at λ = 0.01 and 0.2 respectively. The figures showthat CNdesign is the highest for all basins for both values of λ and at both types of data sorting. The second highest value is CNobs (that is based on LSM), while the lowest value is for CN (based on AFM). The design CN is made through the NRCS-CN table for λ = 0.2 which is commonly used by hydrologists and engineers worldwide. However, the design CN for λ = 0.01 is somewhat lower than the CN at  λ = 0.2. The low values of CNdesign for these arid basins are obtained from the developed equation (Eq. 15) which accounts for the transmission losses in these basins. It is also obvious that the values based on ordered sorting of data are relatively higher than the natural sorting of data for both values of λ and for both CNobs and CN. However, CNdesign does not depend on the order because it is based on the NRCS table. These results show high variability between the various methods. However, the variability is relatively reduced in the case of ordered sorting of P: Q data pairs, since ordering provides a relatively higher correlation between the P: Q pairs. The ordered sorting is commonly used in consideration for design purposes, while the natural sorting of data is mainly used for natural simulation of the rainfall-runoff process in the basins.

Fig. 10
figure 10

Comparison between various methods for CN estimation (CNobs, CN, and CNdesign) at λ = 0.01 and λ = 0.2. a Natural sorting of data pairs (P: Q) at λ = 0.01, b natural sorting of data pairs (P: Q) at λ = 0.2, c ordered sorting of data pairs (P: Q) at λ = 0.01, and d ordered sorting of data pairs (P: Q) at λ = 0.2

Validation of various methods of CN estimation

Figure 11 shows a comparison between the observed and the estimated runoff depths (mm) using the values of CN estimated from the various methods (CNobs, CN, and CNdesign) at both λ = 0.2 and 0.01 under the case of ordered and sorted data pairs (P: Q). The figure shows that CNdesign overestimated the observed values for all cases. This indicates that the use of the NRCS-CN table (Table 3 at CN(0.2)) at λ = 0.2 or the modified NRCS-CN table (Table 3 at CN(0.01)) at λ = 0.01 produces high values for CN estimation that in turn leads to a safer design of the flood mitigation structures. However, CNobs and CN are around the line of perfect fit (1:1 line) for the case of λ = 0.01 under both sorted and ordered data points (see Figs. 11 a and b). This result suggests the use of both CNobs and CN for the simulation of the hydrological processes in the basins since it matches the observed runoff values. The case of λ = 0.01, in Figs. 11 c and d, shows however a highly scatter data especially at the natural sorting of data pairs (P: Q). This confirms that λ = 0.2 is not suitable to be used in the Saudi arid environment (Farran and Elfeki 2020c). Table 9 summarizes a quantification of Fig. 11. Both R2 and RMSE are estimated for the aforementioned cases and the results are tabulated. The table indicated that the highest R2 = 0.83 and minimum RMSE = 1.47 mm at CNobs with λ = 0.01 at the natural sorting of data. While the lowest R2 = 0.0 and a maximum RMSE = 32.54 mm at CN with λ = 0.2 at the natural sorting of data. This result shows that CN is not a good estimation of the CN in these arid basins which is also confirmed by Farran and Elfeki (2020b). The CNdesign for both λ = 0.2 and λ = 0.01 shows relatively the same R2 = 0.72 and 0.74 respectively; however, the RMSE is minimal at λ = 0.01 (5.98 mm) in comparison with λ = 0.2 (10.42 mm). CNdesign does not depend on data sorting or ordering. This last result concludes that the NRCS-CN table at λ = 0.01 should be used in the design of mitigation structures rather than the NRCS-CN at λ = 0.2 since it reduces the RMSE.

Fig. 11
figure 11

Validation of the various methods for CN estimation (CNobs, CN, and CNdesign) at λ = 0.01 and λ = 0.2. a Natural sorting of data pairs (P: Q) at λ = 0.01, b ordered sorting of data pairs (P: Q) at λ = 0.01, c natural sorting of data pairs (P: Q) at λ = 0.2, and d ordered sorting of data pairs (P: Q) at λ = 0.2

Table 9 Summary of the validation of the CN estimation methods. Comparison between estimated and observed runoff depth (mm)

Summary and conclusions

A comparison between various methods for CN estimation has been carried out. These methods are the NRCS-CN table (CNdesign), the least-squares method (LSM) which leads to the so-called CNobs, and the asymptotic fitting method (AFM) which leads to obtaining the so-called asymptotic CN, CN. A comparison between these methods is carried out under the effect of changing the coefficient of abstraction ratio, λ, and under the effect of sorting the data pairs (P: Q). Moreover,  a methodology has also been developed to convert the NRCS-CN table values from λ = 0.2 to λ = 0.01 for arid basins. The following conclusion can be drawn from the study:

  1. 1.

    The values of CNdesign, CNobs, and CN are relatively lower in the case of λ = 0.01 when compared with λ = 0.2. This reflects the transmission losses occurring in these basins.

  2. 2.

    The highest CN value obtained is the CNdesign, then CNobs, while CN shows the lowest value. Therefore, for a safe design of the hydraulic structures, it is recommended to use CNdesign. However, for the simulation of the rainfall-runoff process in the natural basins, it is recommended to use CNobs at the natural sorting of data pairs (P: Q). CN is not a good estimation of CN in the Saudi arid environment since the asymptotic value is not reached within the rainfall record in most of the basins.

  3. 3.

    The relationship between the observed CN and the NRCS-CN table shows that estimating runoff using λ = 0.2 is best made by CN of the impervious areas (CNimp = 90) instead of CNimp = 98 used in the literature. The RMSE of CN is reduced from 11 at CNimp = 98 to 7 at CNimp = 90. This value reflects the infiltration process in the rocks of the mountains (impervious region of the basin) due to the high density of fractures and fissures in these mountains.

  4. 4.

    The developed NRCS-CN table at λ = 0.01 should be used in the design of mitigation structures in the natural basins in the Saudi arid environment rather than the NRCS-CN at λ = 0.2 since the developed table reduces the RMSE in the estimated runoff by 57%.

The results of this research recommend the use of λ = 0.01 for flood studies in the Saudi arid environment for natural arid basins together with the developed NRCS-CN table at λ = 0.01.