Introduction

Around 9% of the world’s population is dependent on karst water resources for drinking water (Stevanović 2019). Karst systems are heterogeneous media with high contrasts in porosity and permeability, inducing a high variability in infiltration and internal flow processes (Bakalowicz 2005; Ford and Williams 2007). With the increasing demand for water, characterising the functioning of karst aquifers has become a major challenge for water resource management (Bakalowicz 2011). Among the numerous methods used to study karst aquifers, analyses of spring discharge time series (recession curves, signal, statistics) are the most accessible as they only require the monitoring of spring discharge. Therefore, they are generally used as a preliminary step for characterising the hydrological functioning of karst systems, and subsequently for developing and designing hydrological models. Many authors have also derived classifications from these analyses to differentiate karst systems, based on recession curves (Bonacci 1993; Cinkus et al. 2021; Dewandel et al. 2003; Fiorillo 2014; Kullman 2000; Malík and Vojtková 2012; Mangin 1975; Soulios 1991), signal (Gárfias-Soliz et al. 2010; Mangin 1984), hydrograph (Kovács 2021; Zhang et al. 2020) and statistical indicators (Flora 2004; Hakoun et al. 2022; Rashed 2012; Springer et al. 2008).

Recession curve analysis has been widely developed over the past century (Barnes 1939; Boussinesq 1903; Coutagne 1948; Drogue 1972; Horton 1933; Kullman 2000; Maillet 1905; Mangin 1975; Padilla et al. 1994). It consists of calibrating numerical models on a selection of recession curves from a hydrograph and interpreting the parameters of the equations. Signal analyses, originally developed by Box and Jenkins (1976), Brillinger (1975), and Jenkins and Watts (1968), were introduced in karst hydrology by Mangin (1984). Their purpose is to characterise the temporal structure of hydrological signals, which makes it possible to infer information on the inertia of a karst system (Jeannin and Sauter 1998; Kovács 2003; Larocque et al. 1998). Statistical analyses include distribution indicators such as mean, standard deviation and quantiles, but also cumulative frequency curves (Malík 2015; Mangin 1971). Numerous studies across the world rely on these analyses for characterising the properties of karst systems, determining the hydrodynamic parameters of aquifers and providing information on flow dynamics (Guo et al. 2021; Lorette et al. 2018; Malík et al. 2021; Nurkholis et al. 2019; Sağır et al. 2020; Vrsalović et al. 2022; Zerouali et al. 2021).

Completing these analyses often requires a meticulous reading of the literature and appropriate programming skills. Statistical and signal analyses (e.g. simple correlational and spectral analyses) are generally applied by using or writing dedicated code functions. Recession curve analysis requires (i) selecting and isolating several parts of the discharge time series, (ii) calibrating a recession model over each recession curve, and (iii) calculating indicators of functioning from the parameter values of the model. These operations can be tedious, especially for long time series, and are subject to errors in selection, calibration and indicator calculation. For these reasons, several authors have proposed toolboxes and software to facilitate recession curve analysis (Arciniega-Esparza et al. 2017; Carlotto and Chaffe 2019; Gregor and Malík 2016; Posavec et al. 2017), one of them also including statistical and signal analyses (BRGM 2022).

This paper presents an application (KarstID) that provides the user with a toolbox for both the analysis of karst spring discharge time series and the characterisation of the hydrological functioning of karst systems. KarstID differs from other software because (i) it supports multiple analyses of discharge time series (statistical, recession curves, simple correlational and spectral, classified discharges) and automatic calibration of the recession model; (ii) it proposes a classification of the hydrological functioning of karst systems (following the proposal of Cinkus et al. (2021)) and a comparison of the results to a database of 78 karst systems; and (iii) it is free, open source and actively developed on a developer community platform. KarstID is built with the R Shiny framework (Chang et al. 2021) and is embedded into an R package (R Core Team 2021), which makes the installation and launch easy even for non-programmers.

Software overview

The links to the user guide, the source code and the git repository are available on the French SNO KARST (Service National d’Observation du Karst) website (https://sokarst.org/en/softwares-en/karstid-en/). The user guide provides guidelines for the installation and launch, as well as a technical, in-depth and visual description of all the features of the application. The source code includes the data and functions used (i) for applying the analyses, (ii) for generating the plots, (iii) for managing the application and (iv) for building the R package. Users can start discussions or raise issues in the git repository, as well as propose new code or modify existing code with Pull Requests.

Workflow

First, the user has to load an appropriate dataset using the “Data import” tab. The second step is to apply four different methods for analysing the hydrological functioning of the system (Fig. 1). Two of these analyses (statistical and classified discharges analyses) require no action from the user, and their results are directly displayed in their respective tabs. Two complementary analyses (recession curves and signal analyses) require the user to select curves and/or define parameters for the functions (i.e. recession model, autocorrelation function). The completion of the recession curve analysis automatically launches the third step, which is the classification of the hydrological functioning of the system according to the methodology proposed by Cinkus et al. (2021). The user can then examine the results of the classification and compare the various hydrological characteristics of the analysed karst system with those of 78 karst systems located worldwide.

Fig. 1

Synthetic workflow of the KarstID application. Green, yellow, blue, and purple boxes represent, respectively, (i) input data, (ii) action within KarstID, (iii) available analyses within KarstID, and (iv) output data

Data import

The “Data import” tab allows the user to load a karst spring discharge time series into KarstID. The raw data can be either a plain text or an Excel file, and must have only two columns referring to date and discharge, respectively. KarstID supports date and date-time format for the date column, and numeric format for the discharge column. The application proposes several features to minimise the preprocessing of the data: it is thus possible to (i) skip rows, (ii) select a specific Excel sheet, (iii) use a header or not, (iv) define the decimal mark, (v) define the delimiter and (vi) specify the date format. The user can give a name to the dataset, which will be used when displaying or downloading results.

The user can choose to interpolate missing discharge values and specify the maximum gap that will be covered. The interpolation is performed with the na.spline(method = “monoH.FC”) R function from the zoo package (Zeileis and Grothendieck 2005). The method is particularly well suited for the interpolation of small gaps, but users must be careful when using it for large gaps. A critical gap length cannot be specified a priori since it depends on both the time step of the time series and the hydrological behaviour of the investigated system. The user can also choose to either (i) keep all missing values in the time series or (ii) keep only the longest part of the time series without missing values.
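The two import options described above (filling only gaps up to a maximum length, or keeping the longest gap-free part) can be illustrated with a minimal sketch. It is written in Python for illustration only and is not KarstID's R code. Linear interpolation is used here as a simple stand-in for the monotone spline applied by KarstID, and the function names `fill_small_gaps` and `longest_complete_run` are hypothetical.

```python
from typing import List, Optional, Tuple


def fill_small_gaps(values: List[Optional[float]], max_gap: int) -> List[Optional[float]]:
    """Fill runs of missing values (None) that are at most max_gap long.

    Assumption: linear interpolation replaces the monotone spline used by
    KarstID. Gaps at the start or end of the series are left untouched.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i                      # first missing index of the gap
        while i < n and filled[i] is None:
            i += 1
        end = i                        # first valid index after the gap
        if start > 0 and end < n and (end - start) <= max_gap:
            left, right = filled[start - 1], filled[end]
            span = end - (start - 1)
            for j in range(start, end):
                weight = (j - (start - 1)) / span
                filled[j] = left + weight * (right - left)
    return filled


def longest_complete_run(values: List[Optional[float]]) -> Tuple[int, int]:
    """Return (start, end) of the longest run without missing values, end excluded."""
    best = (0, 0)
    start = None
    for i, v in enumerate(list(values) + [None]):   # sentinel closes the last run
        if v is not None and start is None:
            start = i
        elif v is None and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best
```

With a maximum gap of 2, a single missing value is filled while a run of three missing values is kept as missing, which mirrors the caution recommended above for large gaps.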

After defining the import options and starting the import, the application will (i) look for missing date entries and fill the blanks if necessary (adapted to the time step of the time series), (ii) interpolate missing discharge values, (iii) compute a daily or hourly mean over the discharge time series, and (iv) display a hydrograph on the same page. The interpolation and the daily/hourly mean are performed according to the user-defined options. Note that the hourly mean can only be applied if the initial time step of the time series is hourly or shorter.

Methods

Four different methods are proposed in KarstID for analysing karst spring discharge time series. The methods can be applied independently of each other in their respective tabs.

Statistical analyses

Statistical analyses of spring discharge provide fundamental information about the hydrological functioning of a system. In KarstID, the following indicators are automatically calculated over the discharge time series when a dataset is imported: mean, maximum, minimum, standard deviation, 0.1 quantile (Q10), 0.9 quantile (Q90), Coefficient of Variation (CV) and Spring Variability Coefficient (SVC). The coefficient of variation corresponds to the ratio between the standard deviation \(\sigma\) and the mean \(\mu\) of the values:

$$\mathrm{CV}=\frac{\sigma }{\mu },$$

The SVC, which corresponds to the proposal of a “characteristic discharge” by Netopil (1971), is the ratio between Q90 (value that is exceeded 10% of the time) and Q10:

$$\mathrm{SVC}=\frac{Q90}{Q10}.$$
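The two ratios above can be illustrated with a short sketch, written in Python for illustration only. Two choices are assumptions rather than documented KarstID behaviour: the quantile rule (linear interpolation between order statistics, which is the default of R's quantile function) and the use of the sample standard deviation. The function names are hypothetical.

```python
import statistics
from typing import Dict, List, Optional


def quantile(values: List[float], p: float) -> float:
    """Quantile by linear interpolation between order statistics (assumed rule)."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def discharge_indicators(q: List[Optional[float]]) -> Dict[str, float]:
    """Compute CV and SVC on the non-missing discharge values."""
    obs = [v for v in q if v is not None]
    mean = statistics.fmean(obs)
    sd = statistics.stdev(obs)          # assumption: sample standard deviation
    q10 = quantile(obs, 0.1)
    q90 = quantile(obs, 0.9)
    return {
        "mean": mean,
        "cv": sd / mean,                # CV = sigma / mu
        "svc": q90 / q10,               # SVC = Q90 / Q10
        "n_missing": len(q) - len(obs),
    }
```

As a check against the values reported in the test case, a Q90 of about 32.0 and a Q10 of about 5.6 give an SVC of about 5.7.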

The statistical indicators appear in a table below the hydrograph in the “Data import” tab. The number of missing discharge values is given in the last column of the table. The statistical analyses can be performed even if there are missing discharge values in the discharge time series.

Recession curve analysis

Recession curves correspond to the periods when the discharge gradually decreases without replenishment of water (Toebes and Strang 1964). The analysis of recession curves can be used to assess groundwater storage and gain insights into the hydrological functioning of an aquifer (Drogue 1972; Forkasiewicz and Paloc 1967; Kovács 2003; Krešić 2007; Kullman 2000; Malík 2006; Malík and Vojtková 2012; Mangin 1975). Generally, a recession curve can be divided into (i) an influenced regime or quickflow component, and (ii) a non-influenced regime or baseflow component. Usually, the influenced regime results from the fast infiltration of precipitation through large fractures and conduits, while the non-influenced regime results from slow infiltration through a less transmissive medium such as a porous matrix (Mangin 1975). Numerous recession models exist, and their indicators of functioning and interpretation differ.

To date, after analyses of the various aforementioned methods, Mangin’s recession model (Mangin 1975) was identified as the most informative model (Cinkus et al. 2021). Accordingly, KarstID only proposes Mangin’s recession model to identify the indicators needed for classifying the hydrological functioning of karst systems (see section “Classification”). Mangin’s model is a two-equation recession model that requires the manual definition of an inflexion point for distinguishing between influenced and non-influenced regimes:

$${Q}_{t}={Q}_{R0}{e}^{-\alpha t}+{q}_{0}\frac{1-\eta t}{1+\varepsilon t},$$

with \({Q}_{t}\) the discharge at time \(t\), \(\alpha\) the recession coefficient, \({Q}_{R0}\) the baseflow extrapolated at \(t=0\), \({q}_{0}\) the influenced discharge corresponding to the difference between \({Q}_{0}\) (discharge at \(t=0\)) and \({Q}_{R0}\), \(\eta\) a constant characterising the speed of infiltration (\(\eta =1/{t}_{i}\), with \({t}_{i}\) the beginning of the non-influenced regime) and \(\varepsilon\) a constant characterising the concavity of the influenced part of the recession curve.

Mangin’s recession model is widely used because several indicators can be calculated from it for characterising the hydrological functioning of a karst system. The indicator \(k\) gives information about the capacity of a system to store and release recharge water, and is calculated as follows:

$$k=\frac{{V}_{\mathrm{DYN}}}{{V}_{an}},$$

with \({V}_{an}\) the yearly mean volume of water discharged at the spring. The dynamic volume \({V}_{\mathrm{DYN}}\) corresponds to the integral of the exponential function of the recession model:

$${V}_{\mathrm{DYN}}={\int }_{0}^{\infty }{Q}_{i}{e}^{-\alpha t}dt=\frac{{Q}_{i}}{\alpha }.$$

The indicator \(i\) can be used to characterise the capacity of a system to dampen the precipitation signal, and corresponds to the discharge generated by the influenced regime two days after the flood peak:

$$i=\frac{1-2\eta }{1+2\varepsilon }.$$
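The model and its indicators can be sketched as follows, in Python for illustration only (KarstID itself is an R application). The sketch assumes that time is expressed in days and discharge in m³ s⁻¹, so that a factor of 86,400 s per day converts \({Q}_{i}/\alpha\) into a volume. It also assumes that the influenced term is zero once \(t\) exceeds \({t}_{i}=1/\eta\), which is how the two-equation form is usually written. \({Q}_{i}\) is taken as an input, and all function names and numerical values are hypothetical.

```python
import math

SECONDS_PER_DAY = 86400.0  # assumption: discharge in m3/s, time and alpha in days


def mangin_discharge(t: float, q_r0: float, alpha: float,
                     q0: float, eta: float, eps: float) -> float:
    """Discharge at time t (days) after the flood peak."""
    baseflow = q_r0 * math.exp(-alpha * t)
    t_i = 1.0 / eta                       # start of the non-influenced regime
    if t < t_i:
        quickflow = q0 * (1.0 - eta * t) / (1.0 + eps * t)
    else:
        quickflow = 0.0                   # influenced regime is over
    return baseflow + quickflow


def indicator_i(eta: float, eps: float) -> float:
    """i = (1 - 2*eta) / (1 + 2*eps), the influenced term shape at t = 2 days."""
    return (1.0 - 2.0 * eta) / (1.0 + 2.0 * eps)


def dynamic_volume(q_i: float, alpha: float) -> float:
    """V_DYN = Q_i / alpha, converted to m3 under the unit assumption above."""
    return q_i * SECONDS_PER_DAY / alpha


def indicator_k(v_dyn: float, v_an: float) -> float:
    """k = V_DYN / V_an, with v_an the yearly mean discharged volume in m3."""
    return v_dyn / v_an
```

For example, with the hypothetical values \(\eta = 0.05\) and \(\varepsilon = 0.1\), the indicator \(i\) equals \(0.9/1.2 = 0.75\).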

The “Recession curve analysis” tab is used to perform the recession curve analysis. Recession curves are selected with the cursor in the graphical interface. The retained recession curves appear in a recap table, where they can be selected to apply Mangin’s model. The user has to define the inflexion point of each curve, based on their knowledge and experience. The recession model is calibrated with the nonlinear least squares nlsLM() function from the minpack.lm package (Elzhov et al. 2016), which minimises the sum of squared residuals between observed and simulated discharges. The Root Mean Square Error (RMSE) between observed and simulated discharges is displayed below the recession model plot and helps assess the performance of the model. Once a recession model is calibrated and validated, the indicators of functioning are calculated. They appear in the recap table when saved by the user.

The user can choose to remove spikes on the recession curves, which usually correspond to the system’s response to small precipitation events and can be considered as noise for the modelling. Recession curve analysis can be performed even if there are missing discharge values in the discharge time series.

Simple correlational and spectral analyses

Simple correlational and spectral analyses are used to study the frequency content of a signal (Massei et al. 2006) by calculating the autocorrelation function and the associated spectrum with a Fourier transform. Mangin (1984) first applied these signal analyses to karst hydrology and proposed three indicators of karst hydrological functioning: the memory effect, the regulation time and the cut-off frequency. These three indicators mainly help to characterise the inertia of a karst system and its capacity to filter unitary impulse (Larocque et al. 1998; Marsaud 1997; Massei et al. 2006). The autocorrelation \({r}_{k}\) and autocovariance \({C}_{k}\) functions are calculated as follows:

$${r}_{k}=\frac{{C}_{k}}{{C}_{0}},$$
$$C_{k} = \frac{1}{n}\sum_{i=1}^{n - k} \left( {x_{i} - \overline{x}} \right)\left( {x_{i + k} - \overline{x}} \right),$$

with \(n\) the length of the series, \(m\) the maximum possible shift (usually \(m<n/3\)), \(k\) the shift (between 0 and \(m\)), \(\overline{x}\) the mean of the series, and \({x}_{i}\) and \({x}_{i+k}\) the ith and the (i + k)th elements of the series, respectively. The spectrum \({s}_{f}\) is derived from the autocorrelation function:

$${s}_{f}=2\left[1+2\sum_{k=1}^{m}{D}_{k}{r}_{k}cos\left(2\pi fk\right)\right],$$

with \(f\) the frequency (\(f=j/2m\) at daily time step) and \({D}_{k}\) a weighting function to ensure that \({s}_{f}\) is not biased (Mangin 1984):

$${D}_{k}=\frac{1+cos\left(\frac{\pi k}{m}\right)}{2}.$$

The correlogram is the plot of \({r}_{k}\) against \(k\), and the spectrum is the plot of \({s}_{f}\) against \(f\). The memory effect corresponds to the value of \(k\) for which \({r}_{k}\) equals 0.2, which can be read on the correlogram or calculated from the data. The regulation time corresponds to the value of the integral of the spectrum between \(0\) and \(+\infty\), i.e. the maximum value of the spectrum divided by 2.
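The formulas above translate into a few lines of code. The following sketch is in Python for illustration only and is not KarstID's implementation. Three choices are assumptions: the memory effect is taken as the first lag at which \({r}_{k}\) falls below 0.2 (without interpolation between lags), the frequency index \(j\) is taken from 0 to \(m\), and the regulation time is half of the maximum of the spectrum over those frequencies. Function names are hypothetical.

```python
import math
from typing import List, Optional


def autocorrelation(x: List[float], m: int) -> List[float]:
    """Return [r_0, ..., r_m] with C_k computed using the 1/n normalisation."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    r = []
    for k in range(m + 1):
        c_k = sum(dev[i] * dev[i + k] for i in range(n - k)) / n
        r.append(c_k / c0)
    return r


def spectrum(r: List[float], f: float) -> float:
    """s_f = 2 * [1 + 2 * sum_k D_k * r_k * cos(2*pi*f*k)], k = 1..m."""
    m = len(r) - 1
    total = 0.0
    for k in range(1, m + 1):
        d_k = (1.0 + math.cos(math.pi * k / m)) / 2.0   # weighting function D_k
        total += d_k * r[k] * math.cos(2.0 * math.pi * f * k)
    return 2.0 * (1.0 + 2.0 * total)


def memory_effect(r: List[float], threshold: float = 0.2) -> Optional[int]:
    """First lag at which r_k drops below the threshold (None if never)."""
    for k, r_k in enumerate(r):
        if r_k < threshold:
            return k
    return None


def regulation_time(r: List[float]) -> float:
    """Half of the maximum spectrum value over f = j / (2m), j = 0..m (assumed range)."""
    m = len(r) - 1
    return max(spectrum(r, j / (2.0 * m)) for j in range(m + 1)) / 2.0
```

At a daily time step, both the memory effect and the regulation time returned by this sketch are expressed in days.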

The “Simple correlational and spectral analyses” tab displays the results of the simple correlational and spectral analyses. The user can define the cutting point \(m\), which corresponds to the maximum shift possible for the calculation. The cut-off frequency is not displayed as it results from a visual, subjective assessment of the spectrum. Simple correlational and spectral analyses cannot be performed if there are any missing discharge values in the discharge time series. Appendix A presents a comparison of the results obtained by Mangin (1984) and those calculated with KarstID, although the datasets differ because those used in Mangin (1984) are unavailable.

Analysis of classified discharges

The analysis of classified discharges provides information on flow dynamics within a system by analysing the distribution of the discharges at the spring. For most authors, classified discharges are equivalent to the empirical cumulative distribution function of discharge (Stevanović 2015). Mangin (1971) proposed a variant based on the assumption that the distribution of the discharges can be approximated by a half-normal distribution. From this perspective, classified discharges refer to the quantile-quantile plot of the quantiles of observed discharges against the quantiles of the half-normal distribution. Homogeneous hydrological functioning should appear as a straight line in the classified discharge plot. Interpretation of Mangin’s classified discharges thus consists of assessing the discontinuities of the curve and relating them to changes in the hydrological functioning (e.g. activation of overflow springs, storage and release of water, leakage to another aquifer or miscalibration of the gauging station). The cumulative distribution function of the standard normal distribution is calculated as follows:

$$P\left(X\le z\right)=\frac{1}{2}\left[1+erf\left(\frac{z}{\sqrt{2}}\right)\right].$$

For a half-Gaussian distribution:

$$P\left(X\le z\right)=erf\left(\frac{z}{\sqrt{2}}\right).$$
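Building the quantile-quantile pairs requires inverting the half-Gaussian expression above. Since \(P=2\Phi (z)-1\), with \(\Phi\) the standard normal cumulative distribution function, the quantile is \(z={\Phi }^{-1}((1+P)/2)\). The Python sketch below is for illustration only. The plotting position rank/(n + 1) used for the empirical probabilities is an assumption, not necessarily the rule applied in KarstID, and the function names are hypothetical.

```python
from statistics import NormalDist
from typing import List, Optional, Tuple


def half_normal_quantile(p: float) -> float:
    """Invert P = erf(z / sqrt(2)) for 0 < p < 1, using P = 2*Phi(z) - 1."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0)


def classified_discharges(q: List[Optional[float]]) -> List[Tuple[float, float]]:
    """Return (half-normal quantile, discharge) pairs sorted by discharge."""
    obs = sorted(v for v in q if v is not None)   # missing values are ignored
    n = len(obs)
    pairs = []
    for rank, value in enumerate(obs, start=1):
        p = rank / (n + 1)                        # assumed plotting position
        pairs.append((half_normal_quantile(p), value))
    return pairs
```

Plotting the discharge against the first element of each pair gives the classified discharge curve, where changes in slope are the discontinuities to interpret.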

The “Analysis of classified discharges” tab displays the results of both analyses of classified discharges. No user action is needed for the calculation. Analyses of classified discharges can be performed even if there are missing discharge values in the discharge time series.

Classification

In KarstID, it is possible to characterise a karst system following the methodology proposed by Cinkus et al. (2021) and compare the results with 78 karst systems located worldwide, whose discharge data were extracted from different databases: Banque Hydro (Banque Hydro 2021), SNO KARST (Jourde et al. 2018) and WoKaS (World Karst Spring hydrograph; Olarinoye et al. 2020). This dataset covers a wide diversity of karst hydrological functioning (from very reactive to inertial responses) with data from 17 countries in 12 different climatic conditions, according to the Köppen–Geiger classification (Cinkus et al. 2021). The classification characterises the hydrological functioning of karst systems according to 6 classes based on 3 indicators of functioning (Table 1). The indicators are derived from the results of the analysis of at least two recession curves. The draining of the capacitive function \({\alpha }_{mean}\) is calculated by averaging the \(\alpha\) parameters of the recession models. The capacity of dynamic storage \({k}_{max}\) corresponds to the maximum value of \(k\) among the analysed recession curves. The variability of the hydrological functioning \(IR\) corresponds to the difference between the maximum and minimum of the \(i\) distribution:

$$IR={i}_{\mathrm{max}}-{i}_{\mathrm{min}}.$$

Table 1 Indicator thresholds and corresponding characterisation of hydrological functioning for each class
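The aggregation of per-recession results into the three classification indicators can be sketched as follows, in Python for illustration only. The class thresholds are deliberately left out, since they are not reproduced in this text. The function name and the dictionary keys are hypothetical, as are the numerical values used in the check.

```python
from typing import Dict, List


def classification_indicators(recessions: List[Dict[str, float]]) -> Dict[str, float]:
    """Aggregate alpha, k and i values from at least two analysed recession curves."""
    if len(recessions) < 2:
        raise ValueError("at least two recession curves are required")
    alphas = [rec["alpha"] for rec in recessions]
    ks = [rec["k"] for rec in recessions]
    i_values = [rec["i"] for rec in recessions]
    return {
        "alpha_mean": sum(alphas) / len(alphas),   # draining of the capacitive function
        "k_max": max(ks),                          # capacity of dynamic storage
        "IR": max(i_values) - min(i_values),       # variability of functioning
    }
```

Requiring at least two recession curves follows the text above: with a single curve, \(IR\) would always be zero and would carry no information on variability.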

The “Classification” tab highlights the results obtained for the analysed karst system and summarises the values of the various indicators considered for the classification (Fig. 6). A flowchart thus indicates how the system is classified according to the values of the indicators of functioning. The associated text section (i) describes the hydrological functioning of the system according to its class, (ii) displays the indicators values and (iii) shows the distance to other classes. A 3D scatter plot shows the investigated system (highlighted in red) alongside 78 other karst systems, with each axis corresponding to one indicator of functioning. Results from statistical, recession curves, simple correlational and spectral analyses, as well as indicators of functioning of all 78 systems also appear in a recap table. By default, the systems in the table are ordered by increasing distance to the investigated system. The user can select a system in the table to highlight (in yellow) its position on the 3D scatter plot.

Test case

Fontaine de Vaucluse is a karst spring located in southeastern France. Its recharge area is estimated to be about 1160 km² (Ollivier et al. 2019), resulting in one of the highest interannual mean discharges of any karst spring in Europe (17.5 m³ s⁻¹ over the 1966–2018 period).

Fontaine de Vaucluse’s daily discharges over the 2013–2019 period (amounting to 1923 observations) are provided in KarstID as a test dataset. After import using the “load test dataset” button, the hydrograph is loaded on the import page (Fig. 2). The statistical indicators and number of missing discharge values are displayed in the table below the plot. For this period, Fontaine de Vaucluse’s interannual mean discharge is about 15.9 m³ s⁻¹ with no missing discharge values. The maximum observed discharge (about 67.1 m³ s⁻¹) and the 0.9 quantile of observed discharges (about 32.0 m³ s⁻¹) show that the discharge at the spring can reach and remain at very high values during wet periods. The minimum observed discharge and the 0.1 quantile of observed discharges are relatively close (about 3.3 and 5.6 m³ s⁻¹, respectively), highlighting a slow and consistent release of water from storage during dry periods. The coefficient of variation (72.3%) and SVC (5.7) are average and correspond to a “moderate” and “balanced” discharge variability, respectively (Flora 2004; Springer et al. 2008). The moderate discharge variability and the fact that the discharge can attain very high values can be related to a strong karstification of a part of the system. Using cross-correlation analyses between precipitation and discharge, Ollivier et al. (2015) found a transfer time between 1 and 6 days, indicating a somewhat reactive response of the system to precipitation events.

Fig. 2

Import and statistical analyses tab. Left pane is dedicated to data import (section “Data import”). Right part presents the hydrograph and the results of statistical analyses (section “Statistical analyses”)

The autocorrelation function of discharge (Fig. 3) declines slowly and steadily, reaching 0 at 117 days. The memory effect and the regulation time are about 56.0 and 44.0 days, respectively. These values testify to a significant capacity to filter the precipitation signal, which relates to the overall organization of flows in the system (Jeannin and Sauter 1998). The noticeable dampening of the recharge in the Fontaine de Vaucluse karst system can be related to the very large dimensions of its recharge area and unsaturated zone (Ollivier et al. 2019) or the characteristics of the Urgonian limestones (Carrière et al. 2016).

Fig. 3

Simple correlational and spectral analyses tab. Left and right graphs present the autocorrelation function and the variance density spectrum, respectively (section “Simple correlational and spectral analyses”)

The analysis of classified discharges (Fig. 4) according to the methodology proposed by Mangin (1975) suggests that flow properties change beyond 20 m³ s⁻¹ (less steep slope following the inflexion point). This discontinuity reflects the overflow threshold of the upper spring pool (Mangin 1975). The other inflexion point, occurring at 57.5 m³ s⁻¹, can be related to several hydrological processes: activation of an overflow, temporary storage of water or leakage to another aquifer. It can also be due to a miscalibration of the gauging station or to uncertainties in the water level-discharge calibration curve.

Fig. 4

Analysis of classified discharges tab. Left and right graphs present the empirical cumulative function of discharge and the Mangin classified discharges, respectively (section “Analysis of classified discharges”)

Three recession curves were selected, and Mangin’s recession model was applied to each of them to identify relevant indicators (Fig. 5). The recession curves were chosen according to the following criteria to ensure the relevance of the analysis and its results: (i) the peak discharge must be at least one tenth of the maximum discharge of the time series, (ii) there must be few or no secondary peaks during the recession, and (iii) the recession must include both influenced and non-influenced regimes. The inflexion points (n.b. “breakpoint” in the application) were defined manually based on expert knowledge and RMSE values. The indicators k, i, and \(\alpha\) are then calculated for each recession curve and appear in the recap table.

Fig. 5

Recession curves and modelling tab. The left graph presents the studied time series and retained/selected recession curves. The right graph displays the selected recession with the Mangin recession model. The table shows the details of each recession curve and its corresponding indicator values (section “Recession curve analysis”)

Fontaine de Vaucluse is classified C6 with a \({k}_{max}\) of 0.403, an \({\alpha }_{mean}\) of 0.006 and an \(IR\) of 0.022 (Fig. 6). This class characterises a system with noticeable capacity of dynamic storage, slow draining of the capacitive function and low variability of hydrological functioning. Fontaine de Vaucluse is considered close to the C4 class with a distance of about 0.8% (normalised Euclidean distance in the three-dimensional criteria space), meaning that C4 characteristics can also be considered in the interpretation. This proximity indicates that the system may have a capacity of dynamic storage and a draining of the capacitive function in between C4 and C6 characteristics. Fontaine de Vaucluse’s class is also far from the classes C3 and C5 with distances of about 91.4%, which is due to the very low \(IR\) (0.022). The low variability of hydrological functioning and noticeable capacity of dynamic storage assigned to this system can also be related to the large extent of the recharge area and the thick unsaturated zone. Local variability of hydrological functioning may thus be mitigated as a consequence of the spatial averaging (indirectly inducing a strong filtration of the precipitation signal). The dampening of the rainfall-discharge relationship and the noticeable capacity of dynamic storage may also be related to the particular hydrological behaviour of Urgonian limestones (Carrière et al. 2016). On the first page of the database table, users can find other karst systems with similar hydrological functioning, e.g. Taillade, PR_0005 and IE_0018. These systems are highlighted in yellow on the 3D scatter plot, alongside the investigated system highlighted in red. Studying the characteristics of other similar systems may help to support the interpretation of the investigated system. Note that Fontaine de Vaucluse also appears in the table, but this entry is the permanent entry of the database, which results from the analysis of the whole discharge time series.

Fig. 6

Classification tab. The top part shows the classification flowchart and its associated text description: indicator values and distance to other classes (section “Classification”). The bottom part presents the classification of the 78 karst systems in a 3D scatter plot with the values of key indicators in the associated table

Conclusion

KarstID can be seen as a useful tool for gaining preliminary insights into the hydrological functioning of a karst system. The application supports different methods for analysing discharge time series and proposes a classification of the hydrological functioning of karst systems. It is also possible to compare the results with a database of 78 karst systems located worldwide. KarstID is free, open source, and available on a developer community platform, which allows interaction between users and developers for improving software efficiency or adding new features. Other than the installation of R and R packages, no programming skills are required to use the application. KarstID could therefore also be relevant for occasional users or for educational purposes. Future developments of the application include (i) a continuous consideration of feature requests and bug reports to improve user experience, (ii) the addition of other recession models (Drogue 1972; Kullman 2000; Padilla et al. 1994), and (iii) the addition of other discharge time series analyses (e.g. wavelet analyses).