Introduction

Characterization of water quality time series and their accurate modeling are of prime concern for environmental modelers, as they help in ensuring the quality standards of natural rivers. The characterization of water quality signals can be performed by appropriate decomposition and spectral analysis procedures to extract the trend, periodicity and seasonality (Wang et al. 2014; Duan et al. 2018). The use of data-driven hybrid models has been gaining popularity in the prediction of water quality parameters (Aghel et al. 2018; Najafzadeh et al. 2018), and an improved understanding of the periodic structure of the time series may eventually help in developing decomposition-data-driven hybrid models for the prediction of these parameters (Wang et al. 2014). Time series data of water quality parameters (WQPs) are nonlinear, non-stationary and multiscaling in character (Rao and Hsu 2008). Traditional spectral analysis methods such as the Fourier transform are inappropriate for analyzing such series because of their trigonometric functional form and global character. Over the years, alternatives such as wavelet transforms were proposed, which can analyze hydro-environmental datasets at multiple time scales and may give more insight into the modeling of such series. However, the ‘a priori’ selection of the mother wavelet function and of the optimal decomposition level are two challenging issues while using wavelet transforms (Sang et al. 2016). The Hilbert–Huang transform (HHT) propounded by Huang et al. (1998) is a popular tool for the spectral characterization of non-stationary time series datasets, which can overcome such challenges and can identify the ‘true’ inherent time scales of hydro-environmental time series. The HHT involves two stages: (1) decomposition of a given time series into different orthogonal modes of specific periodic scale; (2) the Hilbert transform of the modes for extracting the time–frequency–amplitude characteristics of the series. 
The former step is performed by empirical mode decomposition (EMD), in which the time series is decomposed iteratively in a data-adaptive manner. The complete description of the EMD algorithm is available elsewhere (Huang et al. 1998; Rao and Hsu 2008). On employing EMD, the original time series is expected to separate into a set of oscillatory modes, each having a distinctly different periodicity (representing the time scales of the processes). However, the basic EMD algorithm may produce modes containing more than one frequency (popularly known as ‘mode mixing’). The time–frequency transformation of such modes may yield physically meaningless frequencies (such as negative instantaneous frequencies) and hence erroneous interpretations. To avoid these problems, a noise-assisted and ensemble-averaged version of EMD called ensemble EMD (EEMD) was propounded by Wu and Huang (2005). It was later noticed, however, that the ensemble averaging in EEMD may compromise the orthogonality of the modes when applied to real field datasets. To rectify the issues related to EEMD, Torres et al. (2011) proposed an improved version called complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). The orthogonal modes resulting from any of these decomposition procedures are appropriate inputs for the time–frequency characterization of the signal.

The application of HHT for the characterization of hydro-environmental time series has become popular in recent years (Huang and Wu 2008; Huang et al. 2009a; Kuai and Tsai 2012; Massei and Fournier 2012; Adarsh and Janga Reddy 2016a, b). Even though water quality datasets possess characteristics fairly similar to those of hydro-meteorological datasets, only a few studies have analyzed such series employing the HHT (Franceschini and Tsai 2010; Naixia et al. 2011). Adarsh and Janga Reddy (2014) made a preliminary attempt at tracking the basic characteristics of four WQPs (electrical conductivity, temperature, magnesium and total hardness) from Elunuthimangalam station in Noyyal River in the state of Tamil Nadu in Southern India. However, their study did not investigate the association between the WQPs at multiple time scales, and such an analysis can have more practical appeal. The characteristic analysis of water quality datasets has established their multiscaling property (Franceschini and Tsai 2010), while for the estimation of WQPs the selection of input parameters is still made based on the conventional Pearson coefficient of correlation between time series. To get more insight into this issue, a decomposition procedure and a subsequent running correlation step can be used conjunctively. Wavelet-based procedures such as wavelet coherency may suit such problems, but a different choice of mother wavelet function may change the final output, and the method needs a smoothing operation, which in turn affects the time–frequency localization (Grinsted et al. 2004). Moreover, when applying a running correlation procedure, fixing the size of the sliding window is a major challenge to the modeler (Rodo and Rodriguez-Arias 2006). To circumvent such issues, Chen et al. (2010) propounded the time-dependent intrinsic correlation (TDIC) technique, in which the size of the sliding window is set adaptively based on the instantaneous frequencies derived by HHT. This technique was found to be promising in finding the association between dissolved oxygen and temperature of maritime datasets (Huang and Schmitt 2014). The technique is becoming popular for analyzing the multiscale association between two correlated variables (Ismail et al. 2015; Calif et al. 2016; Derot et al. 2016; Adarsh and Janga Reddy 2018). The correlation analysis of water quality parameters is one of the popular research domains for water quality modelers. Starting with the simple Pearson or Spearman correlation, different methods have been used (Lehmann and Rode 2001; Ouyang et al. 2006; Prathumratana et al. 2008; Bhandari and Nayal 2008; Manache and Mecching 2008; Noori et al. 2010; Lee et al. 2016). In some recent studies, HHT was applied for investigating the association between different geophysical and environmental time series (Xiang et al. 2016), where the association between different modes was examined by considering the complete length of the time series. Applying a running correlation procedure may give more insight into such associations, for which techniques like TDIC can be adopted.

This study investigates the multiscale running correlation properties among different water quality parameters based on the observations made at Elunuthimangalam station in Noyyal River in Southern India by the Central Water Commission (CWC), India. The dyeing industries are responsible for the pollution of Noyyal River, and many studies have been reported on the water quality analysis of Noyyal River (Govindarajalu 2003; Mohan and Vanalakshmi 2013; Rajkumar 2011; Srinivasan et al. 2014; Marimuthu et al. 2015; Babu et al. 2017). Even though many studies on the development of water quality indices, situation analysis, efficacy of treatment methods, etc., were performed on Noyyal River water quality parameters, none of them focused on their spectral characterization and multiscale correlation properties. The scientific contribution of this study is the time–frequency characterization of, and the multiscale correlation analysis between, different pairs of water quality parameters using an innovative framework. The specific objectives of this study are: (1) to perform the spectral characterization of time series of six water quality parameters from Noyyal River in Southern India using HHT; (2) to examine the association of EC with other water quality parameters, and the popular DO–BOD and DO–temperature links, at different time scales by employing the TDIC method.

This research work is performed at TKM College of Engineering, Kollam, during 2017–2018 period, by collecting the water quality data of Elunuthimangalam station in Noyyal River published by the CWC, India.

Materials and methods

This section of the paper presents the theoretical background of the methods used in the study.

Hilbert–Huang Transform

The Hilbert–Huang transform is a well-accepted technique for the spectral analysis of time series that possess nonlinear and non-stationary characteristics. The technique comprises two stages: (1) multiscale disaggregation of the time series to obtain zero-mean orthogonal modes, namely intrinsic mode functions (IMFs), and a final residue; (2) the Hilbert transform (HT) of the obtained IMF components. The empirical mode decomposition (EMD) stage involves: (1) detection of the local extrema (maxima/minima) and fitting of appropriate mathematical functions (envelopes) connecting these points; (2) estimation of the mean series of the envelopes and computation of a difference series (by subtracting this mean series from the actual time series). These two steps are collectively called ‘sifting’, and the sifting procedure is continued iteratively until an IMF evolves. An oscillatory mode is called an IMF only if (a) it is a zero-mean series and (b) the number of extrema and the number of zero crossings differ by at most one (Huang et al. 1998). The extracted IMF is then subtracted from the series, and the procedure is repeated on the remainder until the final residue is monotonically decreasing/increasing or single peaked (or has one trough). In this study, the improved version of EMD, namely CEEMDAN, is followed for the orthogonal decomposition, and the steps of the CEEMDAN algorithm are presented below (Adarsh and Janga Reddy 2016a):

  1.

    Execute the EMD of the N realizations of artificial series \(X_{n} (t) = X(t) + \upsilon_{0} w_{n} (t)\) and estimate the initial mode of CEEMDAN \(\overline{\overline{{C_{1} (t)}}}\) by ensemble averaging

    $$\overline{\overline{{C_{1} (t)}}} = \frac{1}{N}\sum\limits_{n = 1}^{N} {C_{n} (t)},$$
    (1)

    where n = 1, 2, …, N is the index for realizations; \(X(t)\) is the time series signal; \(w_{n} (t)\) is the white noise series; \(\upsilon_{0}\) is the noise parameter for the initial step; and \(C_{n} (t)\) is the first mode obtained by EMD of each realization.

  2.

    For stage 1 (m = 1), calculate the initial residue as:

    $$r_{1} (t) = X(t) - \overline{\overline{{C_{1} (t)}}}$$
    (2)
  3.

    Decompose the ensemble of artificial signals \(r_{1n} (t) = r_{1} (t) + \upsilon_{1} E_{1} (w_{n} (t))\), n = 1, 2, …, N, until their first EMD mode evolves. Then, estimate the subsequent mode \(\overline{\overline{{C_{2} (t)}}}\) as

    $$\overline{\overline{{C_{2} (t)}}} = \frac{1}{N}\sum\limits_{n = 1}^{N} {E_{1} [r_{1} (t) + \upsilon_{1} E_{1} (w_{n} (t))]}$$
    (3)

    where \(\upsilon_{1}\) is the noise parameter for the first stage (m = 1); the operator \(E_{m} (\cdot)\) represents the extraction of the mth mode by EMD.

  4.

    Compute the mth residue as

    $$r_{m} (t) = r_{m - 1} (t) - \overline{\overline{{C_{m} (t)}}} \quad {\text{for}}\;m = 2,3, \ldots ,M$$
    (4)

    where \(\overline{\overline{{C_{m} (t)}}}\) is the mth IMF obtained by CEEMDAN.

  5.

    Estimate the first EMD mode of \(r_{m} (t) + \upsilon_{m} E_{m} (w_{n} (t))\), and define the (m + 1)th mode by CEEMDAN as

    $$\overline{\overline{{C_{m + 1} (t)}}} = \frac{1}{N}\sum\limits_{n = 1}^{N} {E_{1} [r_{m} (t) + \upsilon_{m} E_{m} (w_{n} (t))]}$$
    (5)
  6.

    Return to step 4 for the next m.

    Steps 4–6 are repeated as a loop until the residue obtained is monotonically increasing/decreasing or has a single extremum. The final residue is

    $$r_{M} (t) = X(t) - \sum\limits_{m = 1}^{M} {\overline{\overline{{C_{m} (t)}}} } .$$
    (6)

    In the above algorithm, C refers to an IMF component.
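The loop in steps 1–6 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the MATLAB implementation used in the study: the function names (`local_extrema`, `first_mode`, `emd_modes`, `ceemdan`) are hypothetical, the sifting uses a fixed number of passes, the envelopes are linear interpolants rather than the cubic splines normally used in EMD, and the noise parameter \(\upsilon_{m}\) is assumed to be a fixed fraction `eps` of the standard deviation of the current residue.

```python
import numpy as np

def local_extrema(x):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def first_mode(x, n_sift=10):
    """E_1(.): first EMD mode from a fixed number of sifting passes.

    Simplification: envelopes are linear interpolants of the extrema,
    not the cubic splines normally used in EMD.
    """
    h = np.array(x, dtype=float)
    t = np.arange(len(h))
    for _ in range(n_sift):
        mx, mn = local_extrema(h)
        if len(mx) < 2 or len(mn) < 2:
            break  # too few extrema to build envelopes
        mean_env = 0.5 * (np.interp(t, mx, h[mx]) + np.interp(t, mn, h[mn]))
        h = h - mean_env
    return h

def emd_modes(x, max_modes):
    """Plain EMD: list of modes [E_1(x), E_2(x), ...]."""
    modes, r = [], np.array(x, dtype=float)
    for _ in range(max_modes):
        mx, mn = local_extrema(r)
        if len(mx) + len(mn) < 2:
            break
        c = first_mode(r)
        modes.append(c)
        r = r - c
    return modes

def ceemdan(x, n_real=50, eps=0.2, max_modes=8, seed=0):
    """Return (modes, residue) following steps 1-6 and Eqs. 1-6."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_real, len(x)))           # w_n(t)
    noise_modes = [emd_modes(w, max_modes) for w in noise]  # E_m(w_n)
    # Step 1 (Eq. 1): ensemble mean of the first modes of X + v0 * w_n
    c = np.mean([first_mode(x + eps * np.std(x) * w) for w in noise], axis=0)
    modes, r = [c], x - c                                   # Step 2 (Eq. 2)
    for m in range(1, max_modes):
        mx, mn = local_extrema(r)
        if len(mx) + len(mn) <= 1:
            break  # residue is monotonic or has a single extremum
        scale = eps * np.std(r)                             # v_m (assumed form)
        trials = [first_mode(r + scale * (nm[m - 1] if m - 1 < len(nm) else 0.0))
                  for nm in noise_modes]
        c = np.mean(trials, axis=0)                         # Eqs. 3 and 5
        modes.append(c)
        r = r - c                                           # Eq. 4
    return np.array(modes), r
```

Because each residue is obtained by subtracting the latest mode from the previous residue, the modes and the final residue returned by `ceemdan` add up to the input series, which is the completeness property expressed by Eq. 6.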

Orthogonality and statistical significance of modes

The IMFs from the decomposition of signal help in better understanding of the signal, and the higher the orthogonality, the less the leakage of information (Klionski et al. 2008). To measure the efficiency of the IMFs, the orthogonality of the decomposition should also be checked. The index of orthogonality among different pairs of IMFs can be computed as

$$O_{ij} = \frac{1}{{N_{t} }}\sum\limits_{t} {\left\{ {\frac{{C_{i} (t).C_{j} (t)}}{{\left[ {C_{i}^{2} (t) + C_{j}^{2} (t)} \right]}}} \right\}} \quad i,j = 1,2, \ldots ,M;t = 1, \, 2, \ldots ,N_{t}$$
(7)

where \(C_{i} (t)\) and \(C_{j} (t)\) are the ith and jth IMFs; M is the number of IMFs; and \(N_{t}\) is the length of the signal. If the decomposition is orthogonal, then the value of \(O_{ij}\) should be zero (Zhang and Gai 2006). In practice, a value smaller than 0.1 is acceptable (Wu et al. 2011). Another possible means of assessing the orthogonality among the different modes obtained by the decomposition is the computation of the overall index of orthogonality (IO) defined by Huang et al. (1998) as follows:

$${\text{IO}} = \frac{1}{{N_{t} }}\sum\limits_{t} {\frac{1}{{[X(t)]^{2} }}} \left( {\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{M} {C_{i} (t)C_{j} (t)} } } \right)$$
(8)

In this case also, the lower value of IO indicates the closeness to orthogonal behavior. A set of perfect orthogonal IMF components will give zero, but in practice, the value of IO smaller than 0.1 is acceptable (Wu et al. 2011).
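As a hedged illustration of Eqs. 7 and 8, the two indices can be computed with NumPy as sketched below. The function names are hypothetical. Two assumptions are made and flagged in the comments: time steps at which the denominator of Eq. 7 vanishes are skipped, and the double sum in Eq. 8 is taken over the cross terms only (i ≠ j), since including the squared terms would not measure leakage between modes.

```python
import numpy as np

def pairwise_orthogonality(modes):
    """O_ij of Eq. 7 for every pair of modes (rows of `modes`)."""
    modes = np.asarray(modes, dtype=float)
    n_modes = modes.shape[0]
    O = np.zeros((n_modes, n_modes))
    for i in range(n_modes):
        for j in range(i + 1, n_modes):
            num = modes[i] * modes[j]
            den = modes[i] ** 2 + modes[j] ** 2
            # Assumption: time steps with a zero denominator contribute zero
            ratio = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
            O[i, j] = O[j, i] = ratio.mean()
    return O

def overall_orthogonality(modes, x):
    """IO of Eq. 8, assuming the double sum runs over i != j only."""
    modes = np.asarray(modes, dtype=float)
    x = np.asarray(x, dtype=float)
    # (sum_i C_i)^2 - sum_i C_i^2 equals the sum of all cross products
    cross = modes.sum(axis=0) ** 2 - (modes ** 2).sum(axis=0)
    return np.mean(cross / x ** 2)  # requires X(t) != 0 at every t

# Example: sine and cosine over whole periods plus a constant offset
t = np.arange(200)
theta = 2 * np.pi * 5 * t / 200
example_modes = np.array([np.sin(theta), np.cos(theta), np.full(200, 3.0)])
O = pairwise_orthogonality(example_modes)
io = overall_orthogonality(example_modes, example_modes.sum(axis=0))
```

In this toy example, the sine–cosine entry of `O` is zero to machine precision, as expected for an orthogonal pair sampled over whole periods.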

To identify the most significant and influential components to model the process, statistical significance test (SST) propounded by Wu and Huang (2004) is helpful. The SST involves the following steps:

(1) estimate the energy (squared sum of the signal) and mean period of IMFs of the time series signal X(t); (2) normalize the energy of IMFs with respect to that of a reference IMF (normally the first IMF); (3) generate a time series of white noise by Monte Carlo simulations, perform its decomposition and estimate the confidence bands of the spread function of white noise series at a specified level of significance (normally at 5% level); (4) locate the points in a 2D plane (mean period of IMFs in X-axis and normalized energy in Y-axis) along with the confidence band of white noise; (5) compare the energy level of different IMFs with the confidence band. The IMFs that have their energy level located above the upper confidence line of the white noise series are considered to be statistically significant at the specified significance level.

The Hilbert transformation of the IMFs obtained by the CEEMDAN algorithm is used for the time–frequency spectral characterization of the selected signal. The theoretical details of the HT are elaborated in Rao and Hsu (2008). The conventional Hilbert transform algorithm may sometimes yield physically meaningless instantaneous frequencies (IFs), such as negative frequencies, or mathematically inconsistent results. To rectify such problems, Huang et al. (2009b) proposed a normalization scheme for the HT with a direct quadrature operation (NHT-DQ). The ‘normalization’ scheme comprises the following steps: (1) identification of the local peaks (maxima points) of the IMF component; (2) fitting of a spline function connecting the maxima points; (3) term-by-term division of the IMF series by the spline series. These steps are repeated on the divided series until all the normalized maxima become unity. The resulting series is the frequency-modulated (FM) part of the signal, from which the amplitude-modulated (AM) part of the signal is then estimated. The normalization followed by the Hilbert transformation of the FM part is known as the normalized Hilbert transform (NHT). In the NHT, the phase angle is computed as the inverse tangent of the ratio of the Hilbert-transformed series to the series itself; the alternative implementation, which obtains the phase angle from the ‘arccosine’ of the FM part instead of the ‘arctan’, is called ‘direct quadrature (DQ).’
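A minimal sketch of the normalization and direct quadrature steps is given below, under several assumptions that are not part of the original scheme: the envelope is a linear interpolant through the maxima of the absolute value of the IMF (a spline would normally be used), the AM part is taken as the product of the fitted envelopes, and the folding of the arccosine phase is handled crudely by taking absolute phase differences. The function names are hypothetical.

```python
import numpy as np

def normalize_imf(imf, max_iter=10):
    """Split an IMF into FM and AM parts by repeated envelope division."""
    fm = np.array(imf, dtype=float)
    am = np.ones_like(fm)
    t = np.arange(len(fm))
    for _ in range(max_iter):
        a = np.abs(fm)
        d = np.diff(a)
        peaks = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
        if len(peaks) < 2:
            break
        # Assumption: linear envelope through the maxima of |IMF|
        env = np.interp(t, peaks, a[peaks])
        fm = fm / env
        am = am * env
        if np.max(np.abs(fm)) <= 1.0 + 1e-12:
            break  # all normalized maxima are (numerically) unity
    return np.clip(fm, -1.0, 1.0), am

def dq_instantaneous_frequency(imf, dt=1.0):
    """Instantaneous frequency (cycles per unit time) via arccos of the FM part."""
    fm, am = normalize_imf(imf)
    phase = np.arccos(fm)  # folded into [0, pi]
    freq = np.abs(np.diff(phase)) / (2 * np.pi * dt)
    return freq, am

# Example: a pure tone of 0.05 cycles per sample with amplitude 2
t = np.arange(200)
tone = 2.0 * np.cos(2 * np.pi * 0.05 * t)
freq, am = dq_instantaneous_frequency(tone)
```

For this pure tone, the median of `freq` recovers the tone frequency and `am` recovers the constant amplitude; real IMFs would need a more careful treatment of the points where the phase folds.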

Multiscale correlation analysis of water quality parameters

As the time series of WQPs hold the multiscaling property, a scale-specific dynamic (running) correlation procedure is more suitable for investigating the association between two correlated parameters. First, the decomposition of the time series pairs is performed using CEEMDAN for the scale separation. Subsequently, the correlation between the modes is computed to estimate the linear association between the signals at different time scales. Then, the running correlation between the signals at a specific time scale is computed using the HHT-based method, namely time-dependent intrinsic correlation (TDIC). In TDIC, the time window for the running correlation between the two signals is selected adaptively, so as to ensure stationarity of the data within the sliding window, and the instantaneous frequencies computed by the HHT are used for fixing the window size. The complete framework of multiscale correlation analysis of WQPs is provided in Fig. 1.
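The adaptive-window running correlation can be sketched as follows. This is a simplified illustration, not the implementation used in the study: the window at time t is taken as n times the larger of the two instantaneous periods, centred on t, and the correlation is left undefined (NaN) wherever the full window does not fit inside the record. The function name `tdic` and its arguments are hypothetical.

```python
import numpy as np

def tdic(imf1, imf2, period1, period2, n=1.0):
    """Running correlation of two IMFs with a period-dependent window.

    period1 and period2 are the instantaneous periods (in samples) of the
    two IMFs, given either as scalars or as arrays of the IMF length.
    """
    imf1 = np.asarray(imf1, dtype=float)
    imf2 = np.asarray(imf2, dtype=float)
    n_t = len(imf1)
    td = np.broadcast_to(np.maximum(period1, period2), (n_t,))
    out = np.full(n_t, np.nan)
    for k in range(n_t):
        half = int(round(n * td[k] / 2))
        lo, hi = k - half, k + half + 1
        if half < 1 or lo < 0 or hi > n_t:
            continue  # assumption: skip windows that leave the record
        a, b = imf1[lo:hi], imf2[lo:hi]
        if a.std() > 0 and b.std() > 0:
            out[k] = np.corrcoef(a, b)[0, 1]
    return out

# Example: an annual cycle in monthly data compared with itself and its mirror
t = np.arange(120)
annual = np.sin(2 * np.pi * t / 12)
r_same = tdic(annual, annual, 12.0, 12.0)
r_opposite = tdic(annual, -annual, 12.0, 12.0)
```

Repeating the computation for several values of `n` gives the correlation as a function of both time and window size, which is how TDIC results are usually displayed.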

Fig. 1 Framework of multiscale correlation analysis of water quality parameters using TDIC method

Study area and data

Noyyal River originates in the Velliangiri hills in Coimbatore and drains into the river Cauvery at Noyyal in Erode. The river is 173 km long, with a basin area of around 3510 km². It flows through the districts of Coimbatore, Tiruppur, Erode and Karur in the state of Tamil Nadu in Southern India. The textile industry is dominant in these districts, with numerous units engaged in knitting, dyeing/bleaching, processing and manufacturing operations located on the banks of the river. In textile processing, bleaching and dyeing are the two major activities that require a large amount of water, and most of the water used by the associated units is discharged as effluent after processing, which causes severe pollution affecting the people living along the downstream reaches. In this study, the HHT method is used for the spectral and multiscale correlation analysis of six WQPs of the Elunuthimangalam (EMangalam) water quality monitoring station (11°01′54″ 77°53′15″) in Erode, located on Noyyal River (in Noyyal River Basin) in the state of Tamil Nadu in Southern India. Figure 2 presents the location map of the Elunuthimangalam station. In India, there are nearly 400 water quality monitoring stations operated by the CWC, which collect and analyze more than 65 water quality parameters at daily/biweekly/monthly scales. The acceptability of river water for drinking and other purposes is fixed with regard to certain permissible limits set by the Bureau of Indian Standards (BIS). By comparing the values of numerous WQPs collected at different monitoring stations with the respective permissible limits, the CWC recognized certain water quality ‘hotspots’ in India in 2011. EMangalam is one such hotspot, where the highest number of WQPs fail to meet the quality standards specified by the BIS (CWC 2011). Three physical parameters [temperature (T), pH and electrical conductivity (EC)] together with dissolved oxygen (DO), total hardness (TH) and biochemical oxygen demand (BOD) are considered for the analysis. Monthly data of these six WQPs for the November 2004–May 2012 period (91 data points), collected from http://www.india-wris.nrsc.gov.in/, are used.

Fig. 2 Location map of Elunuthimangalam station in Noyyal River Basin, Tamil Nadu, India

Results and discussion

The time series of six WQPs considered for the study are decomposed to multiple time scales by invoking the CEEMDAN algorithm after setting the standard deviation of noise parameter as 0.2 and number of realizations as 500 (Antico et al. 2014). The authors used the MATLAB program provided by Patrick Flandrin (http://perso.ens-lyon.fr/patrick.flandrin/emd.html) with appropriate modifications, for the implementation of CEEMDAN algorithm. The time series of different parameters and the IMFs and residue obtained by CEEMDAN decomposition are provided in Fig. 3. It is noticed that all the six time series are decomposable to six IMFs and one residue component. The mean period of oscillatory modes calculated by zero-crossing method (Huang et al. 2009b) and the % variability explained by each mode are presented in Table 1.
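The two summary quantities reported for each mode can be sketched as follows. The definitions used here are assumptions made for illustration: the mean period is taken as twice the average spacing between successive zero crossings, and the percentage variability is taken as the variance of each component divided by the sum of the variances of all components. The function names are hypothetical.

```python
import numpy as np

def mean_period_zero_crossing(c, dt=1.0):
    """Mean period as twice the mean spacing between sign changes."""
    s = np.sign(np.asarray(c, dtype=float))
    idx = np.where(s[:-1] * s[1:] < 0)[0]
    if len(idx) < 2:
        return np.nan  # fewer than two crossings: period undefined
    return 2.0 * dt * (idx[-1] - idx[0]) / (len(idx) - 1)

def variance_explained(components):
    """Percentage of the summed component variance carried by each row."""
    v = np.var(np.asarray(components, dtype=float), axis=1)
    return 100.0 * v / v.sum()

# Example: a 12-month cycle sampled monthly for 10 years
t = np.arange(120)
cycle = np.sin(2 * np.pi * t / 12 + 0.3)
period = mean_period_zero_crossing(cycle)           # in months
shares = variance_explained([cycle, 2.0 * cycle])   # variance ratio 1:4
```

With monthly data and `dt=1.0`, the returned period is in months, which matches the units used in Table 1.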

Fig. 3 Decomposition of different water quality parameters using CEEMDAN: a electrical conductivity; b pH; c temperature; d total hardness; e dissolved oxygen; f biochemical oxygen demand

Table 1 Mean period (T in months) of oscillatory modes of different water quality parameters obtained by CEEMDAN method and the percentage variability explained (V) by them

To check the orthogonality, two methods were used in this study. First, the indices of orthogonality (Oij) between all possible pairs of IMFs obtained by the CEEMDAN-based and EEMD-based decomposition are presented in Fig. 4. In both cases, even the maximum values are of the order of 5 × 10−3, which is substantially less than 0.1. Therefore, to quantify the relative performance of CEEMDAN and EEMD, further the orthogonality is estimated by determining the IO.

Fig. 4 The indices of orthogonality among different pairs of IMFs by CEEMDAN and EEMD of different water quality time series: a electrical conductivity; b pH; c temperature; d total hardness; e DO; f BOD. The upper panel shows the results based on CEEMDAN, and the lower panel shows the results based on EEMD

The IO values by CEEMDAN-based and EEMD-based decomposition are summarized in Table 2. The IO values show that the overall orthogonality property is better for the CEEMDAN method when compared with that by EEMD. This may be because the ensemble averaged modes may not always be satisfying the properties of an IMF and when such IMFs are subjected to Hilbert transform, they may lead to negative values of instantaneous frequency which lacks physical meaning.

Table 2 The number of modes and IO by EEMD and CEEMDAN methods

The SST of IMF components of all the six WQP time series is performed, and results are provided in Fig. 5. The SST of different parameters showed that the IMF5 is statistically significant for all parameters, as the energy level of this component falls above the upper significance line and such components may be influential upon the predictability efforts of the respective parameters (Lee and Ouarda 2011). However, it is to be noted that their mean period is different for different cases (2–4 years) and in general it can be inferred that modes with periods of 2–4-year time scale are influential.

Fig. 5 Statistical significance test of IMF components of different parameters: a electrical conductivity; b pH; c temperature; d TH; e DO; f BOD. Solid line represents the upper significance line at 5% significance level, and dashed line represents the upper significance line at 1% significance level

Results presented in Table 1 show that the IMF4 possesses annual periodicity (~ 11–13 months), while the first three IMFs have intra-annual periodicity (of seasonal scales) and the fifth and sixth modes are with inter annual periodicity, for all the six time series. The residue infers the long-term trend of the respective time series signal, and for the datasets used in the study, the trend of different water quality parameters except temperature and BOD is noted to be decreasing. Then, the Hilbert transform of IMFs of all the six WQP time series is carried out by invoking the NHT-DQ scheme. The instantaneous frequency trajectories obtained by the HT of IMFs of different parameters are presented in Fig. 6.

Fig. 6 Time–frequency spectra of different IMFs obtained by the decomposition of different water quality time series of Noyyal River: a electrical conductivity; b pH; c temperature; d total hardness; e dissolved oxygen; f BOD. The magnitude of the amplitude can be discerned from the provided color scale

In Fig. 6, the time–frequency spectra corresponding to different process scales are presented and the amplitude values of WQPs are discerned by distinct color scheme. In general, it is noticed that less frequent physical factors (natural events) and frequent occurrences (like human activities) have a role on variability of WQPs of the river. Intermittent frequency is noticed in the spectra of high-frequency modes (represented by dots/break in the spectra) while continuous regions are noticed in the spectra of low-frequency modes, which is a typical signature of nonlinear processes. The intermittency could also be linked with the possible fractal behavior of different WQP time series which could be confirmed by executing classical procedures for the detection of fractality (Koirala et al. 2011; Kuai and Tsai 2012; Parmar and Bhardwaj 2013). There exists high-frequency modulation in the spectra, which indicates non-stationarity of different signals. The variability of the frequency suggests that these parameters are the results of natural events or anthropogenic activities that might repeat at different intervals of time (Franceschini and Tsai 2010). Further it is noticed that the dominant frequency (where a concentration of the highest amplitude occurs) is not constant, but varying with time.

Then, by integrating the Hilbert spectra of the different IMFs over the time domain, the marginal Hilbert spectrum (MHS) is prepared. A spectral peak in the MHS at a specific frequency ω indicates that there is a higher probability that a physical process with that frequency influences the variability of the time series. From the time–frequency representations of the different IMFs, the mean MHS of all the six WQPs is prepared and presented in Fig. 7. From Fig. 7, it is noted that the spectral peaks of all the series except pH are at frequencies greater than 0.085 cycles/month (i.e., in the intra-annual or high-frequency range).
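The time integration that leads to the marginal spectrum can be sketched as a weighted histogram of the instantaneous frequencies. This is an illustrative sketch only: the function name, the number of bins and the use of amplitude (rather than squared amplitude) as the weight are assumptions, and the upper limit of 0.5 cycles/month is simply the Nyquist frequency of monthly data.

```python
import numpy as np

def marginal_spectrum(inst_freq, amplitude, n_bins=50, f_max=0.5):
    """Accumulate amplitude over time in frequency bins.

    inst_freq and amplitude have one row per IMF and one column per time
    step; frequencies are in cycles per sampling interval.
    """
    f = np.asarray(inst_freq, dtype=float).ravel()
    a = np.asarray(amplitude, dtype=float).ravel()
    keep = np.isfinite(f) & np.isfinite(a) & (f >= 0) & (f <= f_max)
    h, edges = np.histogram(f[keep], bins=n_bins, range=(0.0, f_max),
                            weights=a[keep])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, h

# Example: one mode oscillating at 0.185 cycles/month for 91 months
freqs = np.full((1, 91), 0.185)
amps = np.ones((1, 91))
centers, h = marginal_spectrum(freqs, amps)
peak_freq = centers[np.argmax(h)]
peak_period = 1.0 / peak_freq  # months per cycle
```

The last line shows the conversion used when reading such a spectrum: a peak at frequency f (cycles/month) corresponds to a period of 1/f months, so a peak near 0.185 cycles/month corresponds to roughly 5.4 months.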

Fig. 7 Mean marginal spectrum of different water quality time series of Noyyal River: a electrical conductivity; b pH; c temperature; d total hardness; e dissolved oxygen; f biochemical oxygen demand

Individual or combined IMFs and their associated spectra can provide insights into the physical characteristics of the system under study that are not easily identifiable by other methods (Franceschini and Tsai 2010). In addition, representing each IMF using the Hilbert spectrum would allow quantifying the direct influence of a given factor on the WQP. In the case of the analyzed series, it might provide some insights into the origins of the high concentration spikes. The amplitude at a given frequency in the marginal spectrum indicates the likelihood that an oscillation with such a frequency exists (Huang et al. 1998). Accordingly, from the different panels of Fig. 7 the dominant frequencies (spikes) can be identified. For the EC series, the prominent peak is at ~ 0.18 cycles/month, which corresponds to a periodicity of approximately 6 months. Two more spikes (at 0.37 cycles/month and 0.09 cycles/month) are also noticed in the spectrum, which correspond to periodicities from seasonal (about 3 months) to approximately annual scale. The prominent peaks of temperature range from 0.115 to 0.02 cycles/month (corresponding to periods of about 9–50 months), which are also in line with the global warming cycle as reported in earlier studies (Franceschini and Tsai 2010). For the spectrum of the pH series, the prominent peaks are at ~ 0.37 cycles/month and 0.41 cycles/month, which correspond to about 2.5 months (seasonal time scales). Similarly, two spikes at 0.23 cycles/month and 0.3 cycles/month are noticed in the spectrum of hardness, and one spike at ~ 0.13 cycles/month is noticed in the spectrum of BOD. The dominant variation in the dissolved oxygen is also at the seasonal scale (approximately 2.5–4.5 months). 
In general, several predominant frequencies and corresponding time scales can be identified from the Hilbert spectra of the different series, and there is a higher likelihood of the highest amplitude at seasonal/intra-annual time scales, which can be associated with human interferences such as the disposal of pollutants into the river system, which changes the concentration of the parameters. Dyeing industries are polluting Noyyal River, which flows through Tiruppur, Erode and Karur districts in the state of Tamil Nadu, by releasing their effluents into the river without any treatment (Marimuthu et al. 2015). Some of the storage structures like Orathupalayam Dam have become tanks holding effluent (where the TDS level in the water is above 9000 ppm) and release water after every rainfall, effectively polluting the downstream villages in the nearby districts. Many studies performed water quality analysis and explicitly highlighted that the deterioration of the water quality of Noyyal River is due to human activities (Mohan and Vanalakshmi 2013). It is also worth mentioning that these studies were performed within the span of the data chosen for this study (Magudeswaran 2004; Magudeswaran and Ramachandran 2007; Shashi Prabha 2010; Rajkumar and Nagan 2010a, b, 2011; Jayanth Sarathi et al. 2011; Rajkumar 2011; Kumar 2012; Mohan and Vanalakshmi 2013). These studies are in the form of water quality analysis, investigation of alternative practices of wastewater treatment or situation analysis to summarize the current state of knowledge regarding water resources management in the Noyyal sub-basin. One of the recent studies also reports that more than 700 small-to-medium scale dyeing units are present in the Tiruppur and Erode regions (Babu et al. 2017). Despite the establishment of treatment plants following many strict government regulations and court orders in the last decade, many illegal small-scale home-run units continue to pollute the river. 
The observations and the reports of all these studies confirm that the anthropogenic activities are dominant in the area and it is found that the Hilbert spectrum successfully captured such interventions.

Multiscale local correlation analysis of water quality parameters

The water quality characters of EMangalam station show that the concentration of EC exceeds drastically over the permissible limit (59% for monsoon and 46% for non-monsoon periods) (CWC 2011). Therefore, in the present work the association of different WQPs with EC is investigated in multiple time scales. The total hardness and two important parameters pertaining to aquatic system DO and BOD are also considered in this study for the multiscale correlation analysis. First, the association of different parameters with EC is considered, and then, the two most debated links such as DO–temperature and DO–BOD are investigated. In this exercise, first the linear correlation between decomposed modes of EC with that of different parameters is considered. The resulting correlation matrices are provided in Table 3.

Table 3 Cross-correlation analysis between modes of EC with modes of different water quality parameters

From Table 3, it is noticed that the nature of the correlation between corresponding modes (diagonal values of the correlation matrices) is consistent across time scales for the TH–EC and pH–EC pairs. In the rest of the cases, the nature of the association (positive/negative correlation) differs at different time scales. Strong correlation is noted between TH and EC at different time scales, but the degree of association is not alike for the rest of the cases. For the DO–temperature link, the overall correlation is low but its nature is consistent at different time scales, while for the DO–BOD link the nature differs between time scales. It is to be noted, however, that these associations are based on the linear correlation computed over the complete dataset. In all cases, very strong correlation is noticed between the respective residue components (more than 0.98 in magnitude), except for the links involving temperature, where the correlation is lower (~ 0.66) because of the inherent character of the available temperature records.

The association between the modes of EC and those of the other parameters at the corresponding process scales can also be depicted by comparing the modes. As the intra-annual to seasonal association is more perceptible in the EC–TH, EC–BOD and EC–DO links, the comparison plots of these pairs are provided in Figs. 8, 9 and 10. From Figs. 8, 9 and 10, it is clear that the phase-locked (in-phase), anti-phase and positive (or negative) associations do not prevail over the complete length of the series; instead, they are localized in most of the cases. The overall Pearson correlation between the original series of EC and those of pH, temperature, total hardness, BOD and DO is 0.156, − 0.049, 0.163, − 0.059 and 0.066, respectively. Figures 8, 9 and 10 show that one cannot ignore the reasonably good correlation in localized time spells and at different process scales, even though the overall correlation could be very low. This low correlation may arise because positive and negative associations along the time domain cancel each other. That is, cycles of opposite nature prevailing in different time spells could mutually cancel.
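The cancellation argument can be illustrated with a small synthetic example. The sketch below assumes two hypothetical monthly series that are in phase during the first half of the record and in anti-phase during the second half; it is not based on the observed data.

```python
import numpy as np

# Hypothetical 10-year monthly record with an annual cycle.
t = np.arange(120)
x = np.sin(2 * np.pi * t / 12)
# y follows x for the first 5 years and opposes it for the last 5 years.
y = np.where(t < 60, 1.0, -1.0) * x

overall = np.corrcoef(x, y)[0, 1]
first_half = np.corrcoef(x[:60], y[:60])[0, 1]
second_half = np.corrcoef(x[60:], y[60:])[0, 1]

print(f"overall: {overall:.3f}")
print(f"first half: {first_half:.3f}, second half: {second_half:.3f}")
```

The overall coefficient is essentially zero even though the two series are perfectly correlated within each spell, which is why a local, scale-wise correlation measure is needed.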

Fig. 8 Comparison plots between the modes of EC and those of TH

Fig. 9 Comparison plots between the modes of EC and those of BOD

Fig. 10 Comparison plots between the modes of EC and those of DO

The long-term association between water quality parameters is examined further by comparing the zero-mean residues of the different cases (shown in Fig. 11). The comparison shows that the DO–EC and TH–EC relationships are positive and that the evolution is similar, with a minor phase shift in the latter case (as the zero-mean crossing does not occur in the same year for TH and EC). The BOD–EC link is negative, but a similar evolution of the trend is noticed in this case. The BOD–DO link is opposing, with a similar nature of evolution, while the temperature–DO link does not show any similarity in long-term association. This may be because of the character of the temperature dataset, which exhibited a change in ~ 2006 that may be due to an anthropogenic effect such as the disposal of cold industrial water.

Fig. 11 Comparison between the residue components of different parameters with EC and DO. a–e show the comparison of residues of different parameters with that of EC. f, g show the comparison of residues of different parameters with DO

To gain more insight into the local correlations in the temporal domain and across time scales, a dynamic correlation estimation procedure is preferable, for which the TDIC method is chosen. The authors used the MATLAB program provided by Yongxiang Huang (https://zenodo.org/record/9748#.XBhUF2lS-Uk), with appropriate modifications, to determine the TDIC values of the different WQP–EC pairs considered in this study. The results are displayed as a TDIC plot. The X-axis of the TDIC plot gives the instants corresponding to the mid-point of each sliding window, and the Y-axis gives the sliding window size. The plot is triangular, and, if the length of the dataset is chosen as the maximum window size, the correlation coefficient at the apex of the triangle is the correlation coefficient between the two IMFs (at the same process scale) over the complete data (Chen et al. 2010). The TDIC plots of WQP–EC pairs are shown in Fig. 12.
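The layout of a TDIC plot (window mid-point on the X-axis, window size on the Y-axis, triangular shape) can be sketched with a plain running correlation. This is a simplified illustration under stated assumptions, not the MATLAB program used by the authors: the window sizes are a fixed, user-supplied list rather than being tied to the instantaneous periods of the IMFs, and significance is judged with the p-value returned by `scipy.stats.pearsonr`, which assumes independent samples. The function name `tdic_sketch` and the test series are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def tdic_sketch(imf_a, imf_b, half_widths, alpha=0.05):
    """Running correlation of two same-scale IMFs for several window sizes.

    Row k of the result uses windows of 2 * half_widths[k] + 1 samples
    centred on each instant. Entries stay NaN where the window does not
    fit inside the record (this produces the triangular shape) or where
    the correlation is not significant at level alpha.
    """
    a = np.asarray(imf_a, dtype=float)
    b = np.asarray(imf_b, dtype=float)
    n = a.size
    out = np.full((len(half_widths), n), np.nan)
    for k, h in enumerate(half_widths):
        for c in range(h, n - h):
            r, p = pearsonr(a[c - h:c + h + 1], b[c - h:c + h + 1])
            if p < alpha:
                out[k, c] = r
    return out

# Hypothetical pair of annual-scale modes whose association switches sign.
t = np.arange(121)
a = np.sin(2 * np.pi * t / 12)
b = np.where(t < 60, 1.0, -1.0) * a

# The largest window (half-width 60) spans the whole record, so its single
# entry at the mid-point corresponds to the apex of the triangle.
tdic = tdic_sketch(a, b, half_widths=[6, 12, 24, 60])
```

With this input, the short-window rows are positive in the first half of the record and negative in the second, while the apex entry is masked because the full-record correlation is not significant.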

Fig. 12 Results of TDIC analysis between EC and different water quality parameters. The white space in the plots depicts that the correlation is not significant at 5% level

From Fig. 12, it is clear that the pH–EC association is fairly consistent and positive at most time scales, but there are some localized time spells where the association is negative (since 2008 in IMF2 and IMF4). The EC–temperature link does not have the same character at all time scales: in some of the short-term intra-annual scales (IMF1 and IMF2), the association is long range and strongly negative, while in the annual scale range it is strongly positive. The TH–EC association has the same character at different time scales and over the temporal domain, even though the strength of association differs. The DO–EC relation is dominantly negative at IMF2 and IMF3 and positive at the annual scale, but a couple of ‘switchovers’ in the nature of association are noticed along the time domain. That is, the relationship is not unique over the complete span of the dataset under consideration. The BOD–EC relationship is consistent and strongly positive at scales IMF2–IMF5, but it shows rich dynamics (the presence of many couplets of contrasting correlations) in the time domain for the high-frequency mode IMF1. Because of the contrasting behavior of DO and BOD, an opposing nature of association is noticed between the EC–DO and EC–BOD links at most time scales, as anticipated. The plots also show that at the annual scale (IMF4), the associations between most of the pairs are strongly positive.

The well-debated DO–BOD and DO–temperature links are also analyzed by TDIC, and the results are presented in Fig. 13. The commonly reported negative relationship between DO and temperature does not hold at every time scale or over the complete time domain. Contrasting correlations are noticed in many localized time spells at different time scales, as also noted earlier by Huang and Schmitt (2014) in datasets from the marine environment. Similar behavior is noticeable in the association between BOD and DO: they are positively associated in IMF3 and IMF4 (close to annual scales), but in the rest of the cases the association is dominantly negative. It is hard to provide a physical explanation for each of these switchovers, but in a broad sense pollutant disposal and environmental factors such as discharge variations in the river could be the reason.

Fig. 13 Results of TDIC analysis of DO with temperature and BOD. The upper panels show the DO–temperature links. Lower panels show the DO–BOD links. The white space in the plots depicts that the correlation is not significant at 5% level

River discharge (Q) is also one of the key parameters that decide the fate of the water quality of the river system. Therefore, the monthly discharge of the corresponding period (2004–2012) is collected, and the decomposition and HT of the modes are performed. The modes and the corresponding frequency spectra of the discharge time series are presented in Fig. 14. Subsequently, TDIC analysis between discharge and the different water quality parameters is performed, and the results are presented in Fig. 15. From Fig. 15, the dominant negative association between discharge and the different quality parameters is evident, except for discharge–DO and discharge–temperature at IMF2 and discharge–pH at IMF5, although there could be many localized reversals in the nature of association. Because of the contrasting behavior of DO and BOD, an opposing nature of association is noticed between the discharge–DO and discharge–BOD links at all the time scales.
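The Hilbert transform step applied to each mode can be sketched as below. This is a minimal illustration using `scipy.signal.hilbert` on a synthetic annual cycle. The function name `instantaneous_amp_freq` and the input series are hypothetical, and the authors' own spectral computation may differ in detail.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_amp_freq(mode, dt=1.0):
    """Instantaneous amplitude and frequency of a single mode.

    mode: 1-D array holding one IMF; dt: sampling interval (months here).
    Frequency is returned in cycles per unit time and has one sample
    fewer than the input because it is a finite difference of the phase.
    """
    analytic = hilbert(np.asarray(mode, dtype=float))
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic))
    frequency = np.diff(phase) / (2 * np.pi * dt)
    return amplitude, frequency

# Synthetic stand-in (assumption): a pure 12-month cycle over 9 years.
t = np.arange(108)
mode = np.sin(2 * np.pi * t / 12)
amp, freq = instantaneous_amp_freq(mode)
typical_period = 1.0 / np.median(freq)
print(f"typical period: {typical_period:.1f} months")
```

For real IMFs the instantaneous frequency fluctuates, so a robust summary such as the median is a reasonable way to assign a representative period to each mode.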

Fig. 14 Results of CEEMDAN of discharge time series of Noyyal River at Elunuthimangalam station and corresponding instantaneous spectra. The top left panel shows the discharge time series, and the top right panel shows the corresponding residue

Fig. 15 Results of TDIC analysis between discharge and different water quality parameters. The white space in the plots depicts that the correlation is not significant at 5% level

This study presented a multiscale framework for investigating the association between different pairs of water quality parameters, using datasets from a water quality hotspot in Southern India. The HHT-based multiscale running correlation framework presented here is a general one, and it can be applied to analyze the association between any two correlated non-stationary time series, for example, the link between DO and temperature in a marine environment (Huang and Schmitt 2014), the link between marine temperature time series of different locations (Ismail et al. 2015; Derot et al. 2016), the streamflow–sediment link (Adarsh and Janga Reddy 2016a) and the teleconnection between climatic and hydrological variables (Adarsh and Janga Reddy 2016b). This study shows that the framework can be applied in a similar way to any pair of correlated stream water quality datasets. This approach provides a better understanding of the processes and physical events governing the measured pollutant concentrations in river systems. The analysis helps to identify the particular processes or events that each IMF represents by comparing the periodicity or the amplitudes of the spectra derived from the IMFs. The TDIC analysis breaks the notion of uniqueness in the nature of association between different water quality parameters; instead, the association can differ across time scales and along the time domain. Such observations are also problem specific (for example, in the TH–EC link the relation is similar at all time scales); the differences may not exist universally and may be data specific. Nevertheless, capturing such associations can improve the predictability of water quality parameters through hybrid modeling frameworks that capture scale-specific information (Hu and Si 2013; Liu et al. 2016; Adarsh et al. 2017; Adarsh and Janga Reddy 2018, 2019), which may subsequently help in the risk assessment of the water quality parameters of the study area. As the study area is dominated by the textile industry, the selection and understanding of the characteristics of typical textile factory discharges may give more insight into the problem (Yaseen and Scholz 2018), provided long and reliable datasets are available to proceed with the predictive modeling.

Conclusion

This study performed the HHT analysis of six important water quality parameters from the EMangalam monitoring station of the Noyyal River in Southern India and proposed an alternative method for investigating the correlation between prominent parameters at multiple time scales. The specific conclusions of the study are:

  • The HHT analysis is successful in capturing the anthropogenic interventions at the Noyyal River in the form of frequent pollutant disposals

  • From the HHT-based multiscale correlation analysis, it can be concluded that the nature of linear association (positive or negative) of different parameters with EC varies at different time scales except for total hardness (TH)–EC and pH–EC links

  • The nature and strength of association among different pairwise combinations of water quality parameters vary across time scales and over the time domain, with the exception of the total hardness–EC link

  • Contrary to the well-known DO–temperature and DO–BOD relationships, the nature of these associations can differ at different process scales and in localized spells in the time domain; capturing such associations may provide better insight into the modeling of these parameters in river systems

  • This enhanced understanding of the behavior of mutual associations among the water quality parameters of the Noyyal River in multiple time scales may finally help us in improving the predictability efforts and subsequent risk assessment.