Introduction

Risk-based hydrological and hydraulic water-related queries, i.e. engineering-based flood defence infrastructure designs or non-structural assessments (i.e. flood control pressure or flood diversion practices), often demand an accurate estimation of flood exceedance probabilities or flow quantiles, obtained by extrapolating the long-term streamflow characteristics of a catchment in the light of a probability distribution framework (Bobee 1975; Rao 1980; Singh and Singh 1988; Chow et al. 1988; Bobee and Ashkar 1989; Adamowski 1989; Cunnane 1987, 1988, 1989; Adamowski and Feluch 1990; Choulakian et al. 1990; Bras 1990; Yue et al. 1999; Yue 1999, 2000; Rao and Hameed 2000; Shiau 2003; Salvadori 2004; Sraj et al. 2014; Serinaldi 2015; Sarhadi et al. 2016). In actuality, the high degree of uncertainty and complexity distributed over hydrological or flood characteristics does not permit their exact or accurate prediction within any physical or deterministic framework, but rather demands the establishment of a probabilistic framework (Sen 1999; Requena et al. 2016). Therefore, over the past decades several mathematical or statistical strategies have been motivated towards incorporating a probability distribution framework for hydroclimatic or flood observation series (i.e. Hosking et al. 1985; Adamowski 1985; Silverman 1986; Bardsley 1988; Adamowski and Labatiuk 1987; Bobee and Rasmussen 1994; Goel et al. 1998; Yue et al. 2001; Yue 2001a, 2001b; Coles 2001; Katz et al. 2002 and references therein). A flood signifies an inundation attributed to the overflowing of river water from its banks due to an abnormality in hydrometeorological conditions, such as an intensive precipitation structure (Reddy and Ganguli 2012a).
Flood frequency analysis (FFA) statistically defines the inter-association between extreme event quantiles and their non-exceedance probabilities by fitting a probability distribution function or pdf (i.e. either flood peak or volume as a function of its non-exceedance probability) (Yue 1999; Yue 2001a; Yue and Rasmussen 2002; Yue and Wang 2004; Xu et al. 2015).
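As a minimal illustration of this quantile/non-exceedance relation, the sketch below fits a Gumbel (EV1) marginal to a synthetic annual-peak series by the method of moments and reads off the 100-year quantile. The discharge values are hypothetical, and the method-of-moments estimator is only one of several options used in the cited studies.

```python
import math
import random

def fit_gumbel_moments(sample):
    """Method-of-moments fit of the Gumbel (EV1) distribution."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi      # scale
    mu = mean - 0.5772156649 * beta            # location (Euler-Mascheroni constant)
    return mu, beta

def gumbel_quantile(mu, beta, p):
    """Flow quantile x such that F(x) = p (non-exceedance probability p)."""
    return mu - beta * math.log(-math.log(p))

random.seed(1)
# hypothetical annual peak discharges (m^3/s), drawn from a Gumbel-like model
peaks = [500 + 150 * (-math.log(-math.log(random.random()))) for _ in range(60)]
mu, beta = fit_gumbel_moments(peaks)
# the T-year event corresponds to non-exceedance probability 1 - 1/T
q100 = gumbel_quantile(mu, beta, 1 - 1 / 100)
```

The same pattern applies to any of the parametric families discussed later: fit the marginal, then invert its cdf at the target non-exceedance probability.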

Hydrometeorological simulation, either via the extension of historical rainfall samples in order to recognize the catchment profile, or through joint probability simulation in conjunction with a univariate or multivariate statistical framework over the variables of interest, offers two distinct ways to address risk assessment for an extreme flood scenario. Numerous attempts, i.e. Calver and Lamb (1995), Boughton et al. (2002), Blazkova and Beven (2004) and Lawrence et al. (2014), retrieved the flood frequency curve by integrating hydrological models with probabilistic rainfall models to demonstrate the catchment's rainfall-runoff profile. Such incorporations usually adapted conventional lumped and distributed models, via either continuous or event-based hydroclimatic simulations. However, the long computational runs required by the high spatial and temporal resolutions needed for a satisfactory demonstration of the flood simulation procedure can result in an ineffective characterization of catchment behaviour (Requena et al. 2016). Similarly, a few other approaches to flood analysis are based on data-driven flood prediction using stochastic or time series models, i.e. AR (autoregressive), ARMA (autoregressive moving average) and ARIMA (autoregressive integrated moving average) models: the generation of synthetic flow series and their forecasting using the ARMA model (O'Connel 1977), time series modelling of annual maximum observations using the ARIMA model (Shakeel et al. 1993), forecasting of rainfall and runoff using stochastic time series modelling with the AR model (Sherring et al. 2009), and the generation and forecasting of annual inflow observations using the ARMA model (Vijayakumar and Vennila 2016). Machekposhti et al. (2017) demonstrated the efficacy of the ARIMA model for flood forecasting using annual streamflow (i.e. peak and maximum discharge) observations for the Karkheh River basin in western Iran.
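The data-driven idea behind these AR/ARMA/ARIMA studies can be sketched in its most reduced form. The block below fits a zero-mean AR(1) recursion by ordinary least squares and iterates it forward; it is a deliberately simplified stand-in for the full ARIMA machinery of the cited studies, and all series values are hypothetical.

```python
import random

def fit_ar1(series):
    """Least-squares fit of x_t - m = phi * (x_{t-1} - m) + e_t."""
    m = sum(series) / len(series)
    x = [v - m for v in series]
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return m, num / den

def forecast_ar1(mean, phi, last, steps):
    """Iterate the fitted recursion; forecasts decay back towards the mean."""
    out, x = [], last - mean
    for _ in range(steps):
        x = phi * x
        out.append(mean + x)
    return out

random.seed(7)
# hypothetical annual mean flows with year-to-year persistence
flows, x = [], 0.0
for _ in range(200):
    x = 0.6 * x + random.gauss(0, 1)
    flows.append(300 + 20 * x)
mean, phi = fit_ar1(flows)
pred = forecast_ar1(mean, phi, flows[-1], 5)
```

For a stationary fit (|phi| < 1) the multi-step forecasts relax towards the long-term mean, which is why such models capture persistence but not the extreme tail on their own.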
Besides this literature, a few other studies, such as Ghanbarpour et al. (2010), Tian et al. (2011), Huang et al. (2016) and references therein, explored the efficacy of stochastic time series modelling approaches for solving several water resources problems. 'Development of few ML algorithms in the flood prediction' discusses the development of several other machine learning (ML) algorithms in the field of flood modelling and forecasting. Overall, the above demonstrations are limited to a single target flood vector, i.e. the annual peak or maximum discharge. A flood is a trivariate stochastic consequence, completely characterized only through its intercorrelated random vectors, i.e. the peak discharge, volume and duration of the flood hydrograph (Zhang and Singh 2007b; Veronika and Halmova 2014). This limits the reliability of univariate design estimations or return periods, which cannot provide a full picture of the flood hydrograph and may lead to underestimation (i.e. a low design value that increases the risk of failure) or overestimation (i.e. increased hydraulic construction cost) of the hydrologic risk (Grimaldi and Serinaldi 2006; Serinaldi and Grimaldi 2007; Genest et al. 2007; Grimaldi et al. 2013; Fan and Zheng 2016). For instance, a flood event with a peak flow of 100-year recurrence interval could be less intensive and damaging than the same event described through the joint occurrence of multiple flood vectors, i.e. peak-volume, peak-duration or volume-duration.
In actuality, the potential damage is likely a function of several associated random variables, and ignorance of the mutual dependency among multiple flood vectors may lead to underestimation of the uncertainty distributed over the estimated design quantiles; joint distributional assessment of multiple flood variables is therefore often demanded to reveal a more insightful understanding of the flood structure (Renard and Lang 2007; Graler et al. 2013; Vernieuwe et al. 2015). This is especially true from the prospect of hydraulic design procedures, where accounting for multivariate design parameters through their multivariate exceedance probabilities can be desirable (Brunner et al. 2016; Reddy and Ganguli 2013).

In multivariate risk statistics, return periods are usually associated with certain exceedance probabilities that demonstrate the risk of an extreme scenario from multiple aspects, i.e. based on the joint, conditional or Kendall's distribution relation (Shiau 2003; Salvadori 2004; Zhang and Singh 2006; Kao and Govindaraju 2008; Salvadori et al. 2011; Salvadori et al. 2015; Serinaldi 2015; Tosunoglu and Kisi 2016). According to Salvadori et al. (2011), for hydraulic design facilities the selection of appropriate concurrence probabilities is a function of the structure undertaken and the consequences of its failure. The selection of a return period is not an arbitrary process; it is based solely on the nature of the work assessment, which in turn decides the importance of the design vectors under consideration (Salvadori 2004; Serinaldi 2015; Brunner et al. 2016). Multivariate constructions usually comprise a combination of joint probability density functions (pdfs) and joint cumulative distribution functions (cdfs), where the cdf statistically defines the probability of an event 'X' being less than a pre-defined critical or threshold value 'x', i.e. P(X ≤ x) (Yue and Rasmussen 2002; Veronika and Halmova 2013). This literature review is intended to overview the practice of copula-based stochastic synthesis of flood consequences in the light of a multivariate probability distribution framework. In this review, different methodological attempts in the light of bivariate and trivariate copula distribution analysis are pointed out for tackling multivariate design problems or estimating design variable quantiles under different notions of return period. The second section points out the different attempts and strategies towards the incorporation of univariate frequency analysis, i.e. defining the marginal distribution structure, which is often a mandatory prerequisite in the copula distribution framework.
Figure 1 illustrates the methodological flowchart of this literature review. Distinguished varieties of one-dimensional parametric functions, and also the efficacy of non-parametric distributions for the treatment of hydroclimatic samples, are reviewed in 'Flood frequency analysis via one-dimensional probability distribution framework or approximation of marginal distributions' under two different sub-sections. The necessity of establishing a multivariate joint distribution of flood samples is pointed out in the third section, which is further divided into sub-sections reviewing the applicability and flexibility of copula distributions for establishing bivariate joint relationships over traditional multivariate functions; the desire to capture the flood design hydrograph by introducing all the relevant flood vectors simultaneously in the light of trivariate or 3-dimensional copula constructions; and the distinguished varieties of some standard trivariate copulas and their efficacy for establishing joint distributions. This section also points out the flexibility of vine or pair-copula construction (PCC) methodologies, as well as the minimum-information PCC model, for revealing more comprehensive attempts at the uncertainty analysis of flood episodes in comparison with traditional trivariate copula functions. 'Return periods under multivariate settings' reviews the importance of the different notions of return period, i.e. the joint return period, the conditional return period and Kendall's (or survival) return period, whose choice depends solely upon the nature of the work assessment in water-related issues. The development of machine learning (ML) algorithms in the field of flood prediction and forecasting is discussed separately in 'Development of few ML algorithms in the flood prediction'. 'Research discussion' and 'Research conclusion' comprise the research discussions and conclusions.
Lastly, a few ideas to strengthen the current attempts at multivariate practice in the light of the time-varying copula framework are discussed in the last section of this literature review.

Fig. 1

Methodological flowchart of the literature review

Flood frequency analysis via one-dimensional probability distribution framework or approximation of marginal distributions

An approach via parametric distribution function

Hydrological episodes can be characterized as rare and extreme consequences from the perspective of time and magnitude scales. Three different approaches to flood modelling are usually motivated in the literature, i.e. regional analysis, stream-based analysis and time-series analysis, in which flood frequency estimation via the annual peak discharge series can be effective when a long data record is available (Rao and Hameed 2000). Regional hydrological modelling, also called pooling-group analysis, targets data from multiple gauge sites to derive a regional distribution of multivariate extremes, which may reduce the chance of sampling variation in the model parameters and would thus be effective for ungauged streams in comparison with at-site frequency analysis (Burn 1990; Hosking and Wallis 1997; Viglione et al. 2007; Kyselý et al. 2011). The region-of-influence (ROI) technique, based on a unique flexible pooling group for each target site (i.e. Burn 1990), and the Hosking-Wallis (HW) approach, based on delineating fixed regions in which each site within the target region carries the same weight (i.e. Hosking and Wallis 1997), are the two distinct variants of regional frequency analysis.

Conventional flood frequency practice is frequently motivated either by block (annual) maxima (i.e. the highest flood peak) or by peaks over a threshold in a partial series of the data, under the assumption of stationary, independent and identically distributed (i.i.d.) historical samples (Hosking et al. 1985; Bras 1990; Coles 2001; Katz et al. 2002). Annual maxima records often signify a justifiable basis for design problems, in that the expected structural design life establishes a simple relation with their magnitude as well as their distributional structure, and thus forms a basis for estimating design quantiles or event exceedances by selecting an appropriate distribution for the given target maxima (Bardsley and Manly 1987). An interactive set of univariate parametric families is often targeted for univariate density modelling or defining the marginal distributions of extreme random vectors, such as the 3-parameter generalized extreme value (GEV) distribution (i.e. Jenkinson 1955; Ouarda et al. 2001; Yue and Wang 2004), the 2-parameter gamma distribution (i.e. Yevjevich 1972; Yue 2001a), the light-tailed 2-parameter Gumbel or extreme value type-1 (EV1) distribution (i.e. Adamowski 1989; Yue et al. 1999), the 2-parameter Weibull distribution with bounded upper tail (i.e. Johnson 1994; Zhang et al. 2016), the 1-parameter exponential distribution (i.e. Choulakian et al. 1990; Bacchi et al. 1994; Karmakar and Simonovic 2008), the 2-parameter log-normal distribution (i.e. Yue 2000; Xu et al. 2015), the normal or Gaussian distribution (Goel et al. 1998; Yue 1999), the log-logistic distribution (Bobee and Ashkar 1989), the generalized logistic (GLO) distribution (i.e. Requena et al. 2016), the heavy-tailed 3-parameter Frechet distribution (i.e. Graler et al. 2013; Reddy and Ganguli 2013), the 3-parameter generalized Pareto (GP) distribution (i.e. Johnson 1994; Zhang et al. 2016), the 3-parameter log-gamma distribution (i.e. Veronika and Halmova 2014) and the log-Pearson type-3 distribution (i.e. Bobee 1975). The GEV distribution has held a significant place among hydrologists in extreme value practice; it encompasses three distinct functions, i.e. the Gumbel, the (reversed) Weibull and the Frechet distributions (i.e. Jenkinson 1955; Coles 2001; Khaliq et al. 2006). Each function exhibits a different tail behaviour according to the shape parameter 'ξ': the Gumbel is characterized by a light tail, the Frechet by a heavy tail and the Weibull by a bounded upper tail (Graler et al. 2013). If the shape parameter ξ is equal to 0, corresponding to a thin upper and unbounded tail, the GEV reduces to the Gumbel function; for ξ > 0 it is termed the Frechet distribution, which signifies a long and heavy tail, unbounded and decreasing polynomially; and for ξ < 0 it is the (reversed) Weibull distribution with a bounded upper tail (Khaliq et al. 2006; Graler et al. 2013; Reddy and Ganguli 2013). The flexibility of the available univariate models to justify an appropriate fit to the sample depends upon the associated vector of unknown statistical (model) parameters; i.e. the 3-parameter log-gamma distribution has been extensively employed in flood modelling over many regions owing to its capability to adjust its shape in accordance with the flood series (Veronika and Halmova 2014). Also, different density structures attribute different estimations of design quantiles, especially in the distribution tail (Karmakar and Simonovic 2008, 2009). Readers are advised to follow Coles (2001), Katz et al. (2002) and Khaliq et al. (2006) for extended details on the varieties of univariate models for hydrological observations.
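The tail behaviour attributed to the GEV shape parameter can be made concrete with a small sketch. The implementation below is the textbook GEV cdf, not drawn from any of the cited studies, and the parameter values are illustrative only.

```python
import math

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV cdf: xi = 0 -> Gumbel (light tail), xi > 0 -> Frechet-type
    (heavy, polynomially decaying tail), xi < 0 -> (reversed) Weibull-type
    (upper tail bounded at mu - sigma/xi)."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# survival probabilities far in the upper tail illustrate the tail ordering
x = 10.0
frechet_tail = 1 - gev_cdf(x, xi=0.3)   # heaviest tail
gumbel_tail  = 1 - gev_cdf(x, xi=0.0)
weibull_tail = 1 - gev_cdf(x, xi=-0.3)  # zero beyond the upper bound
```

The same exceedance probability thus maps to very different design quantiles depending on the fitted shape, which is why tail-sensitive families dominate the discussion above.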

An approach via non-parametric distribution framework

The above-cited literature frequently adapted parametric distribution functions to approximate the probability density or marginal distribution of flood characteristics. Simulation via parametric functions imposes the assumption that the random samples are drawn from a population whose density structure is pre-defined, i.e. the marginal distribution of the flood characteristics is assumed to follow some specific family of parametric density functions (Silverman 1986; Adamowski 1985, 1990, 1996; Botev et al. 2010). In actuality, no specific model is universally categorized and adopted for any specific hydrologic variable: different variables follow different distributions, or in other words the best-fitted marginal distributions are not from the same probability distribution family (Adamowski 1985; Kim and Heo 2002; Karmakar and Simonovic 2008; Santhosh and Srinivas 2013). Dooge (1986) already pointed out that no amount of statistical refinement can overcome the consequences of a lack of prior probability distribution information on the observed random samples. Also, approximating a distribution tail beyond the largest observed value under a parametric distribution framework is difficult (Bardsley 1988; Bardsley and Manly 1987), more especially in the case of multimodal or skewed distributions, where parametric functions may be incompatible and introduce inconsistencies in the estimated quantiles. Therefore, over the last few decades, demonstrations such as Schwartz (1967), Duins (1976), Singh (1977), Bowman (1984), Silverman (1986), Scott (1992), Lall et al. (1993), Lall (1995), Wand and Jones (1995), Jones and Foster (1996), Lall et al. (1996), Adamowski (1996, 2000), Bowman and Azzalini (1997), Efromovich (1999), Duong and Hazelton (2003), Kim et al. (2003, 2006), Ghosh and Mujumdar (2007) and Santhosh and Srinivas (2013) have pointed out the flexibility of the non-parametric probability concept in the light of kernel density estimation (kde).
The kernel estimator is recognized as a stable data-smoothing procedure in the field of hydrologic or flood frequency analysis, and it yields a bona fide density. Theoretical treatments of the non-parametric setting were conducted in earlier literature such as Rosenblatt (1956), Parzen (1962) and Bartlett (1963). The non-parametric framework does not require any prior distributional assumption; the density is retrieved directly from the data series with a higher extent of flexibility compared with parametric density estimators (Adamowski 1989; Moon and Lall 1994).
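A minimal sketch of the kernel idea, assuming a Gaussian kernel with Silverman's (1986) rule-of-thumb bandwidth, applied to a synthetic bimodal sample (a case where a single parametric family is awkward). All data are hypothetical.

```python
import math
import random

def silverman_bandwidth(sample):
    """Silverman's rule of thumb for a Gaussian kernel: h = 1.06 * s * n^(-1/5)."""
    n = len(sample)
    m = sum(sample) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)

def kde(sample, x, h=None):
    """Gaussian kernel density estimate at x; no parametric form is assumed."""
    h = h or silverman_bandwidth(sample)
    n = len(sample)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2)
               for xi in sample) / (n * h * math.sqrt(2 * math.pi))

random.seed(3)
# bimodal sample -- two flood-generating regimes, hypothetically
data = ([random.gauss(0, 1) for _ in range(300)] +
        [random.gauss(6, 1) for _ in range(300)])
```

The estimate recovers both modes directly from the data, whereas any single unimodal parametric fit would smear them together.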

Although the univariate approach defines the general concept of non-exceedance probability or return period via the cumulative distribution function (cdf), it may be unsatisfactory when the requirement demands the consideration of multivariate design parameters, which is often an essential concern in water-related queries. A flood is a multidimensional phenomenon, characterized comprehensively only by accounting for its triplet of intercorrelated random vectors, and may thus demand multivariate constructions for estimating the design hydrograph, instead of estimating design quantiles from a single flood vector, i.e. univariate frequency analysis or return period (Choulakian et al. 1990; Bacchi et al. 1994; Goel et al. 1998; Yue 1999; Yue 2001a; Nadarajah and Shiau 2005). Actually, the selection of a suitable recurrence interval depends upon the selected design variable quantiles (Brunner et al. 2016); in other words, the importance of the different notions of return period, i.e. joint, conditional, Kendall's or survival, depends solely upon the nature of the assessment to be tackled in the water-related issue (Salvadori 2004; Serinaldi 2015). For example, in non-structural water-related queries, i.e. flood control and mitigation practices, demonstrating the mutual concurrence of the flood peak with its volume would be a defensible approach in flood diversion practice, as would the joint dependency between flood peak and event duration for flood control pressure practice (Fan et al. 2015; Xu et al. 2015).
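For annual maxima with an inter-arrival time of μ_T = 1 year, the joint notions of return period mentioned here are often written as T_OR = μ_T/(1 − C(u, v)) (either variable exceeds its design level) and T_AND = μ_T/(1 − u − v + C(u, v)) (both exceed). The sketch below evaluates both under an assumed Gumbel-Hougaard copula with an illustrative parameter, not fitted to any real record.

```python
import math

def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C(u, v), valid for theta >= 1 (theta = 1: independence)."""
    return math.exp(-(((-math.log(u)) ** theta +
                       (-math.log(v)) ** theta) ** (1 / theta)))

def joint_return_periods(u, v, theta, mu_t=1.0):
    """OR ('either exceeded') and AND ('both exceeded') return periods."""
    c = gumbel_copula(u, v, theta)
    t_or = mu_t / (1 - c)
    t_and = mu_t / (1 - u - v + c)
    return t_or, t_and

# flood peak and volume both at their marginal 100-year level (u = v = 0.99)
t_or, t_and = joint_return_periods(0.99, 0.99, theta=2.0)
```

The OR return period is always shorter than the univariate one and the AND return period longer, which quantifies the under/over-estimation risk of the univariate analysis described above.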

Bivariate joint distribution framework of flood characteristics

Limitation of traditional multivariate distribution framework

Capturing the correlation structure among multiple hydrologic or flood vectors under classical statistical formulations, such as the Pearson correlation coefficient ('ρ') or Kendall's tau ('τ'), would be ineffective in characterizing the co-movement tendencies of extreme vectors (Poulin et al. 2007). The unreliability and impractical consequences of univariate frequency analysis motivated numerous demonstrations of multivariate joint probability constructions to investigate the mutual concurrence among flood vectors (Sackl and Bergmann 1987; Krstanovic and Singh 1987; Singh and Singh 1991; Raynal-Villasenor and Salas 1987; Cuadras 1992; Bacchi et al. 1994; Goel et al. 1998; Choulakian et al. 1990; Yue et al. 1999; Yue 1999, 2000, 2001a, 2001b; Yue and Rasmussen 2002; Durrans et al. 2003; Yue and Wang 2004; Nadarajah and Shiau 2005; Escalante 2007 and references therein). Distinguished varieties of traditional multivariate functions have been incorporated for establishing bivariate joint relations and frequencies between flood peak-volume, volume-duration or peak-duration, such as the bivariate normal, lognormal and gamma functions (i.e. Yue 1999, 2000 and 2001), the bivariate exponential distribution (i.e. Singh and Singh 1991; Choulakian et al. 1990), generalized extreme value distributions (i.e. Yue et al. 1999; Yue 2001b; Nadarajah and Shiau 2005), the Pearson type III distribution (i.e. Durrans 1992), and the Gumbel mixed and Gumbel logistic functions (i.e. Yue and Wang 2004).
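The contrast between the two dependence measures named here is that Kendall's tau depends only on ranks, while Pearson's ρ depends on the actual values. The sketch below verifies the rank invariance on synthetic paired anomalies subjected to a hypothetical monotone (heavy-tailed) re-expression; none of the numbers come from the reviewed studies.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient (linear dependence)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def kendall_tau(x, y):
    """Kendall's tau via O(n^2) concordance count -- fine for short flood records."""
    n = len(x)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return (conc - disc) / (n * (n - 1) / 2)

random.seed(11)
# hypothetical paired peak/volume anomalies with positive dependence
x = [random.gauss(0, 1) for _ in range(150)]
y = [a + random.gauss(0, 0.5) for a in x]
# a monotone (rank-preserving) distortion of both margins
xt, yt = [math.exp(3 * a) for a in x], [math.exp(3 * b) for b in y]
```

Because tau is invariant under any monotone transform of the margins, it is the natural dependence measure in the copula setting discussed below, whereas Pearson's ρ changes with the marginal shapes.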

Multivariate practice via traditional probability functions suffers from several statistical constraints and shortcomings in joint dependency measurement: each individual hydrological entity or flood vector must have an identical marginal structure, or is assumed to have a Gaussian (normal) distribution, or is transformed or forced to have a normal distribution through a data transformation procedure, even though the vectors may follow different marginal structures and would need to be modelled separately (Zhang 2005; Zhang and Singh 2006, 2007a; Reddy and Ganguli 2012a). Also, the statistical parameters of the marginal structure are employed to model the joint association, which often demands separate modelling of the marginal and joint structures (Schmidt 2007). Limited space is usually available to justify the joint structure under conventional multivariate functions, which often poses a tough challenge (Song and Singh 2010). Besides this, conventional models place a heavy dependency of flood exceedance on the right tail, which can introduce complexity in demonstrating the observed samples and demands separate modelling of the margins from the joint dependence structure to secure the joint association (Zhang and Singh 2006; Reddy and Ganguli 2013). Actually, separate modelling of the univariate margins and the joint structure can optimize the reliability of the modelling outcome (Ane and Kharoubi 2003; Reddy and Ganguli 2012a).

These limitations first motivated De Michele and Salvadori (2003) and Favre et al. (2004) to introduce the concept of the copula function as a risk model for hydrological observations. Thereafter, a series of studies incorporated copula functions, i.e. for flood samples (Salvadori and De Michele 2004; De Michele et al. 2005; Grimaldi and Serinaldi 2006; Zhang and Singh 2006; Zhang and Singh 2007b; Renard and Lang 2007; Genest et al. 2007; Salvadori et al. 2011; Grimaldi et al. 2013; Graler et al. 2013; Sraj et al. 2014; Daneshkhan et al. 2015; Bedford et al. 2015; Fan and Zheng 2016 and references therein), for rainfall characteristics (Salvadori and De Michele 2006; Zhang and Singh 2007a; Kao and Govindaraju 2008; Vernieuwe et al. 2015) and for drought episodes (Shiau 2006; Shiau and Modarres 2009; Song and Singh 2010; Ma et al. 2013; Saghafian and Mehdikhani 2014; Rauf and Zeephongsekul 2014; Zhang et al. 2016). Besides their extended applicability in extreme event modelling, copulas have been applied significantly in groundwater modelling (Reddy and Ganguli 2012) and in the modelling of hydroclimatic samples (Maity and Kumar 2008; Cong and Brady 2011). Actually, copulas segregate the modelling of the individual univariate vectors and their joint structure into two distinct stages, which provides higher flexibility in selecting the most appropriate and justifiable marginal and joint structures among the peer family members, capturing a wider extent of dependency while preserving the joint association (Sklar 1959; De Michele and Salvadori 2003; Salvadori and De Michele 2004; Nelsen 2006). For the essential mathematical terminology and theorems associated with copula functions, readers are advised to follow Sklar (1959) and Nelsen (2006), and also the 'International Association of Hydrological Sciences (IAHS)' for extended details and lists of their applicability in the field of hydroclimatological observations.

Copula-based bivariate probability distributions

In extreme hydrological modelling, copula-based methodologies can be classified into parametric, semiparametric and non-parametric estimation procedures, depending upon the way the univariate marginals and the joint dependence structure are estimated (Choros et al. 2010; Santhosh and Srinivas 2013). Copula attempts of recent decades (i.e. Favre et al. 2004; Grimaldi and Serinaldi 2006; Zhang 2005; Sraj et al. 2014 and references therein) frequently incorporated the parametric setting for establishing multivariate flood distribution analyses using the standard parametric distribution approach. On the other side, a few demonstrations (i.e. Karmakar and Simonovic 2008, 2009; Reddy and Ganguli 2012a) incorporated semiparametric copulas, also called a heterogeneous or mixed marginal environment, where the flood marginals are approximated using a non-parametric distribution approach (i.e. kernel density estimators or orthonormal series) but parametric copula functions are still introduced to model their joint dependencies. Besides this, a few attempts (i.e. Dupuis 2007) pointed out limitations of the copula function in the context of finding the best-fitted copula among the peer classes, which is not a simple and consistent procedure, and also in the context of the different extents of dependence each copula function can capture (Nelsen 2006). Therefore, Santhosh and Srinivas (2013) incorporated a non-parametric approach for multivariate flood frequency analysis using diffusion kernel functions, earlier motivated by Botev et al. (2010).
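A common first step when the marginal choice is left open, as in the semiparametric route, is to map each flood vector to rank-based pseudo-observations u_i = rank(x_i)/(n + 1) before any copula fit. A minimal sketch (ties ignored; the discharge values are hypothetical):

```python
def pseudo_observations(sample):
    """Rank-based probability integral transform u_i = rank(x_i) / (n + 1).
    No parametric marginal is assumed; ties are not handled here."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    u = [0.0] * n
    for r, i in enumerate(order, start=1):
        u[i] = r / (n + 1)   # dividing by n + 1 keeps u strictly inside (0, 1)
    return u

# hypothetical annual flood peaks (m^3/s)
peaks = [410.0, 655.0, 520.0, 980.0, 470.0]
u = pseudo_observations(peaks)
```

The resulting u values feed directly into a parametric copula likelihood, keeping the dependence fit free of any assumption about the marginal family.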

Among the interactive sets of frequently incorporated copulas, such as the extreme value class (i.e. Gumbel-Hougaard, Galambos and Husler-Reiss), the elliptical class (i.e. the Gaussian family), the unclassified Plackett and Farlie-Gumbel-Morgenstern (FGM) parametric functions and the three-parameter Tawn family (belonging to the extreme value class), the Archimedean class copulas (i.e. the Ali-Mikhail-Haq or A-M-H family, the Frank family, the Clayton or Cook-Johnson (C-J) family and the Gumbel-Hougaard family) are frequently accepted owing to their large variety of families and their capability to capture joint dependencies over a wider extent; they also exhibit several desirable properties that afford much flexibility during joint probability simulation (De Michele and Salvadori 2003; Salvadori and De Michele 2004; Favre et al. 2004; Nelsen 2006; Grimaldi and Serinaldi 2006; Zhang and Singh 2006; Salvadori and De Michele 2007; Corbella and Stretch 2013; Madadgar and Moradkhani 2013; Chebana et al. 2013; Rauf and Zeephongsekul 2014; Bender et al. 2014; Jiang et al. 2015; Papaioannou et al. 2016; Galiatsatou and Prinos 2016; Requena et al. 2016). Mathematically, a copula C : [0, 1]² ⟶ [0, 1] belongs to the bivariate Archimedean class if it admits the representation C(u, v) = φ⁻¹(φ(u) + φ(v)) for u, v ∈ [0, 1], where φ(.) and φ⁻¹(.) signify the generator function of the specified Archimedean copula and its inverse, such that the generator φ : [0, 1] ⟶ [0, ∞) is a convex, decreasing function with φ(1) = 0, termed strict when φ(0) = ∞ (Nelsen 2006). Each family of the Archimedean class is characterized by a specific extent of dependency-capturing capability, which is constrained by the degree of association between the random vectors and is investigated through a dependence measure.
As such, the A-M-H family can model both positive and negative association, but its dependence parameter is restricted to Kendall's tau τ ∈ [−0.181, 0.333] and would be insignificant outside this range; similarly, for the C-J and Gumbel-Hougaard families Kendall's tau satisfies τ ≥ 0, so they are only significant for capturing positive dependency (Salvadori and De Michele 2004; Nelsen 2006; Xu et al. 2015). The Frank family exhibits higher versatility owing to its capability to accommodate and capture the entire range of dependencies (i.e. τ ∈ [−1, 1]), and it is the only member that also justifies radial symmetry (i.e. symmetry about u + v = 1) (De Michele and Salvadori 2003; Favre et al. 2004; Nelsen 2006; Grimaldi and Serinaldi 2006; Zhang and Singh 2007a). All the Archimedean copulas except the Frank family exhibit non-symmetrical behaviour with respect to the secondary diagonal: the Gumbel-Hougaard copula is well suited to modelling dependence structures with upper-tail dependence; similarly, the Clayton copula exhibits a strong capability for modelling lower-tail dependency, while the Frank copula has no tail dependency (Poulin et al. 2007). Besides the above families, the extreme-value (EV) copula is also incorporated for establishing bivariate joint relations; it can be formulated as C(u, v) = (uv)^A(log(u)/log(uv)) for u, v ∈ [0, 1], uniquely defined through the Pickands dependence function A : [0, 1] ⟶ [1/2, 1], and has non-symmetrical behaviour over the secondary diagonal (Tawn 1988; Papaioannou et al. 2016). Nelsen (2006) demonstrates extended examples of the Archimedean class functions; see also Tawn (1988) for the extreme value functions.
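The generator representation can be exercised directly. The sketch below builds a bivariate Clayton (Cook-Johnson) copula from its generator φ(t) = (t^(−θ) − 1)/θ and checks it against the closed form and the Kendall's tau relation τ = θ/(θ + 2); the parameter value is illustrative only.

```python
def clayton_gen(t, theta):
    """Clayton generator phi(t) = (t**-theta - 1) / theta, theta > 0."""
    return (t ** -theta - 1) / theta

def clayton_gen_inv(s, theta):
    """Inverse generator phi^{-1}(s) = (1 + theta * s)**(-1/theta)."""
    return (1 + theta * s) ** (-1 / theta)

def archimedean(u, v, gen, gen_inv, theta):
    """The Archimedean template C(u, v) = phi^{-1}(phi(u) + phi(v))."""
    return gen_inv(gen(u, theta) + gen(v, theta), theta)

theta = 2.0
c = archimedean(0.6, 0.3, clayton_gen, clayton_gen_inv, theta)
# closed form for Clayton: (u^-theta + v^-theta - 1)^(-1/theta)
closed = (0.6 ** -theta + 0.3 ** -theta - 1) ** (-1 / theta)
# Kendall's tau for Clayton, tying theta to an observable dependence measure
tau = theta / (theta + 2)
```

Swapping in another generator (Frank, Gumbel-Hougaard, A-M-H) yields the other family members from the same template, which is exactly the flexibility the class is valued for.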

Trivariate joint dependency constructions via 3-dimensional copulas

Although extended efforts have been motivated towards establishing copula-based methodologies for estimating bivariate design variable quantiles under different notions of return period, such attempts may still be insufficient for a justifiable and comprehensive study of flood probability. Actually, dealing with multiple design variables, i.e. flood peak, volume and duration, limits the applicability of analysing flood episodes only through bivariate joint concurrence; their triplet distribution behaviour demands the simultaneous accounting of all the intercorrelated vectors (Salvadori et al. 2011; Graler et al. 2013; Fan and Zheng 2016; Reddy and Ganguli 2013; Daneshkhan et al. 2016). The potential damage is likely a function of the multiple relevant vectors of the specified hydrological episode, such that ignorance of the mutual dependency among these uncertain vectors may lead to the underestimation of uncertainty frequently encountered during risk evaluation (Renard and Lang 2007; Graler et al. 2013; Vernieuwe et al. 2015). A few studies have incorporated copula-based methodology for establishing trivariate joint distributions and defining the concept of trivariate return periods by introducing an interactive class of 3-dimensional copula functions, but such computational strategies are still quite limited in the literature.

Grimaldi and Serinaldi (2006) performed flood probability analysis by adapting three distinct forms of trivariate functions, i.e. the mono-parametric and the asymmetric (fully nested) structures of the Frank function along with the Gumbel logistic distribution, and pointed out the significance of the Frank function under the fully nested (FNA) structure. Similarly, Serinaldi and Grimaldi (2007) fitted the same fully nested structure for deriving the trivariate flood structure. Genest et al. (2007) adopted meta-elliptical copulas for annual spring flood analysis of the Romaine River in Canada and revealed that they can be an effective tool for multidimensional hydrological data, preserving the pairwise dependencies among the random vectors through the correlation matrix, but exhibiting some modelling limitations, i.e. possible ineffectiveness at low probabilities unless the asymptotic properties of the data are justified through strong arguments. Reddy and Ganguli (2013) examined the significance of multidimensional design events by comparing univariate, bivariate and trivariate return periods for flood episodes via fully nested Archimedean (FNA) class copulas and the Student's t copula (elliptical class), and revealed that this can be an essential effort to demonstrate joint and conditional flood occurrence in the light of trivariate return periods. Fan and Zheng (2016) adopted an entropy copula structure in conjunction with Gibbs sampling, along with Gaussian and Archimedean copulas, for the simulation of trivariate flood episodes, and revealed that the entropy copula can be projected directly into a higher-dimensional frame, just like the Gaussian copula.

Similarly, Kao and Govindaraju (2008) applied a non-Archimedean copula function for simulating the trivariate structure of extreme rainfall episodes. This demonstration pointed out the modelling flexibility of the Plackett family of copulas, which faithfully preserves the lower-level dependencies among the relevantly associated vectors, a crucial requirement in trivariate or higher-dimensional dependence simulations. Madadgar and Moradkhani (2013) captured the joint behaviour of drought episodes under a climate change scenario using a trivariate copula structure. This study integrated the significant drought vectors of severity, duration and intensity using the trivariate Gumbel copula (Archimedean family) and the t copula (elliptical family) to capture joint and conditional probabilities. The stress of a dynamic environment on the occurrence of future drought risk was also addressed by integrating GCM output under the A1B scenario. Other methodological efforts include Song and Singh (2010) (drought frequency analysis under a meta-elliptical copula structure) and Wong et al. (2010) (modelling of trivariate drought characteristics).

The n-dimensional Archimedean copula can be formulated by extending the two-dimensional form into an 'n'-order series, expressed as \( C\left({x}_1,{x}_2,\dots, {x}_n\right)={\phi}^{-1}\left(\phi \left({x}_1\right)+\phi \left({x}_2\right)+\cdots +\phi \left({x}_n\right)\right) \), where the consistency of this equation is preserved as long as the generator function \( \phi \left(\cdot \right) \) is completely monotonic; otherwise, it might be inconsistent for hydrological samples owing to the hypothesis of homogeneous dependency across the variables. Also, \( \phi \) is a strict generator if the pseudo-inverse \( {\phi}^{\left[-1\right]}\left(\cdot \right) \) becomes the ordinary inverse, i.e. when \( \phi (0) \) tends to infinity (Grimaldi and Serinaldi 2006; Nelsen 2006; Reddy and Ganguli 2013). From the perspective of lower-dimensional, i.e. bivariate, copula modelling, the symmetric Archimedean copulas are frequently favoured in the literature and often yield significant outcomes under inferential (i.e. goodness-of-fit) testing, but they can prove inconsistent when projected into a higher-dimensional distributional frame (i.e. n ≥ 3). In actuality, the symmetric form approximates the dependencies between the multiple vector pairs with a single dependence parameter, owing to its mono-parametric behaviour, and is therefore incapable of preserving all pairwise dependencies at the lower stages (Renard and Lang 2007; Genest et al. 2007; Kao and Govindaraju 2008; Madadgar and Moradkhani 2013). It is therefore desirable to approximate each random pair individually through multi-parameter asymmetric joint functions (Serinaldi and Grimaldi 2007; Savu and Trede 2010; Reddy and Ganguli 2013).
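The generic n-dimensional construction above can be sketched directly in code. The following is a minimal sketch using the Frank generator, for which the d = 3 extension is valid for θ > 0 (the generator is then strict and completely monotone); θ = 3 in the checks below is a purely hypothetical parameter value, not one from the source.

```python
import math

def frank_gen(t, theta):
    """Frank generator phi(t) = -ln((e^{-theta*t} - 1) / (e^{-theta} - 1))."""
    return -math.log((math.exp(-theta * t) - 1.0) / (math.exp(-theta) - 1.0))

def frank_gen_inv(s, theta):
    """Ordinary inverse of the (strict) Frank generator."""
    return -math.log1p(math.exp(-s) * (math.exp(-theta) - 1.0)) / theta

def archimedean3(u1, u2, u3, theta):
    """Symmetric trivariate Archimedean copula:
    C(u1, u2, u3) = phi^{-1}(phi(u1) + phi(u2) + phi(u3))."""
    s = frank_gen(u1, theta) + frank_gen(u2, theta) + frank_gen(u3, theta)
    return frank_gen_inv(s, theta)
```

Note that the margins are recovered (C(u, 1, 1) = u) and that a single θ governs all three pairs, which is precisely the mono-parametric limitation discussed above.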
Whelan (2004) pointed out a flexible structure permitting heterogeneous dependency across vectors in the context of fully nested Archimedean (FNA) copulas, which is based on joining two or more bivariate (or lower-dimensional) Archimedean copula structures through another Archimedean structure and can be formulated as \( C\left({x}_1,{x}_2,{x}_3\right)={\phi}_2\left({\phi}_2^{-1}\circ {\phi}_1\left[{\phi}_1^{-1}\left({x}_1\right)+{\phi}_1^{-1}\left({x}_2\right)\right]+{\phi}_2^{-1}\left({x}_3\right)\right)={C}_2\left[{C}_1\left({x}_1,{x}_2\right),{x}_3\right] \), where \( {\phi}_1 \) and \( {\phi}_2 \) signify Laplace transforms such that the composition \( {\phi}_2^{-1}\circ {\phi}_1 \) has completely monotone derivatives, and the symbol '\( \circ \)' indicates the composition of functions. The formulated copula C(x1, x2, x3) signifies the joint simulation of two bivariate structures through a trivariate asymmetric Archimedean function, but its applicability is significantly justified only if the dependence strength between the inner pair, i.e. (x1, x2), dominates the correlation structure between these variables and the third variable, i.e. (x1, x3) and (x2, x3) (Savu and Trede 2010; Reddy and Ganguli 2013). Literature such as Grimaldi and Serinaldi (2006), Serinaldi and Grimaldi (2007), Madadgar and Moradkhani (2013) and Reddy and Ganguli (2013) demonstrated the flexibility of the FNA structure for hydrological observations. However, other studies pointed to the issue of faithful preservation of lower-stage dependencies via the FNA structure and to its modelling limitation of being restricted to positive dependence, and thus highlighted the applicability of a few other standard classes of trivariate copulas (i.e. Renard and Lang 2007; Genest et al. 2007; Kao and Govindaraju 2008; Fan and Zheng 2016). Renard and Lang (2007) pointed to the Gaussian copula (elliptical class) for hydrological observations, which can be projected directly into any higher-dimensional frame owing to its symmetric positive-definite correlation matrix, which demonstrates the dependence between the various attribute pairs. Genest et al. (2007) pointed to the meta-elliptical copulas, which preserve pairwise dependencies via the correlation matrix but exhibit some modelling limitations at low probabilities unless the asymptotic properties of the data are justified through strong arguments. Similarly, Kao and Govindaraju (2008) pointed to the non-Archimedean Plackett families, based on the principle of the constant cross-product ratio, as another alternative to address the preservation issue at lower-level dependencies. Ma et al. (2013) modelled trivariate drought characteristics via the Gaussian and Student's t copula structures. Fan and Zheng (2016) highlighted the significance of maximum entropy theory in conjunction with the entropy copula as a dynamic modelling strategy for higher-dimensional spaces that imposes no assumption on the copula family, especially in conjunction with the Gibbs sampling technique, which justified a much more comprehensive demonstration but is surrounded by computational complexity owing to the lack of analytical parameter estimation. Justifiable preservation of all the lower-level dependencies is often a challenging task in higher-dimensional copula-based methodology, especially when a complex pattern of dependency is exhibited in the multidimensional data structure (Joe 1997; Kurowicka and Cooke 2006; Aas et al. 2009).
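Whelan's fully nested construction C2[C1(x1, x2), x3] can be sketched with Gumbel generators, the nesting being admissible when the inner parameter dominates (θ1 ≥ θ2 ≥ 1). This is a minimal illustrative sketch; the parameter values in the checks are hypothetical, not taken from any cited study.

```python
import math

def gumbel2(u, v, theta):
    """Bivariate Gumbel copula, generator phi(t) = (-ln t)^theta, theta >= 1."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def fna3(u1, u2, u3, theta1, theta2):
    """Fully nested Archimedean copula C(u1, u2, u3) = C_theta2(C_theta1(u1, u2), u3);
    requires theta1 >= theta2 >= 1, i.e. the inner pair (u1, u2) is the more dependent one."""
    return gumbel2(gumbel2(u1, u2, theta1), u3, theta2)
```

The two generator parameters allow the (u1, u2) pair its own dependence level, unlike the symmetric mono-parametric form, though the two cross pairs (u1, u3) and (u2, u3) still share the single parameter θ2.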

Vine copulas or PCC framework for trivariate joint distributions

The previous section highlighted various efforts towards the simultaneous accounting of multiple design vectors via higher-dimensional (i.e. n ≥ 3) copula-based joint probability simulations of hydrological characteristics, but such incorporations are still quite limited. In actuality, the above copulas encounter several statistical issues, such as the complexity of approximating justifiable parametric distributions for higher-dimensional hydrological attributes (Aas et al. 2009), and they might also be quite ineffective at capturing all the possible mutual concurrencies among multidimensional vectors (Daneshkhan et al. 2016). Owing to the high degree of uncertainty and complexity, resolving the dependence structure of multivariate extremes via conventional copula formulations is quite difficult and often demands a flexible methodology with precise estimation of the tail dependence coefficient under various tail dependencies (Aas et al. 2009; Daneshkhan et al. 2016). Therefore, literature such as Joe (1997), Bedford and Cooke (2001, 2002), Kurowicka and Cooke (2006) and Aas et al. (2009) was directed towards a comprehensive characterization of uncertainty for higher-dimensional hydrological entities using the vine or pair-copula constructions (PCC). The applicability of PCC simulation is well established in finance and risk management (i.e. Aas et al. 2009; Czado and Min 2010; Nikoloulopoulos et al. 2012; Zhang 2014), but in the past few years such incorporations have also been significantly recognized for hydroclimatic observations, such as frequency analysis of drought episodes (i.e. Song and Singh 2010; Saghafian and Mehdikhani 2014), flood characteristics (i.e. Song and Kang 2011; Graler et al. 2013; Daneshkhan et al. 2016) and storm or rainfall modelling (i.e. Gyasi-Agyei and Melching 2012; Vernieuwe et al. 2015).

Actually, vine copula construction is based on the principle of decomposing the full multivariate density into a cascade of simple local building blocks via conditional independence or pair-copulas (Aas and Berg 2009; Bedford and Cooke 2002; Graler et al. 2013). Owing to conditional mixing via a stage-wise hierarchical nesting procedure, the pair-copula concept offers a more effective and flexible modelling environment. Such multivariate simulation originated from the work of Joe (1997); its underlying structural theory was subsequently extended by Bedford and Cooke (2001, 2002) and Aas et al. (2009), and Hobaek et al. (2010) demonstrated different aspects of its structural and computational framework. The construction proceeds through interactive sets of multiple bivariate (2-dimensional) copulas, cascaded by fitting a copula to the random vectors and their conditional and unconditional distribution functions, instead of imposing a fixed multidimensional structure on all the characteristics, which can prove ineffective for data exhibiting a complex dependence structure in the tail, often a stringent challenge in hydrological modelling (Joe 1997; Bedford and Cooke 2001, 2002). Distinct varieties of pair-copula decomposition fall under the regular vine structure, of which the canonical (C-vine) and D-vine distributions are two special modes of parametric regular vine construction (Kurowicka and Cooke 2006; Czado and Min 2010; Czado et al. 2013). The applicability of the D-vine structure is frequently favoured in the literature owing to its greater flexibility compared with the C-vine structure, which in turn is effective when a particular vector regulating the level of mutual interaction within the observations is predefined or known (Aas et al. 2009; Daneshkhan et al. 2016).
In actuality, the degree of mutual concurrency among the targeted vectors forms the basis for adopting a justifiable vine tree structure (Graler et al. 2013). For instance, for trivariate flood characteristics, if a stronger association is exhibited between flood peak (P) and volume (V) and between volume (V) and duration (D), this points to the selection of a D-vine structure with 'V' placed between peak and duration. Czado et al. (2013) explored the selection procedure for regular vine constructions in extended detail. The approximation capability of the vine copula for a multidimensional structure depends on the manner of its decomposition, which further reveals that the choice of conditioning is not fixed or unique in a vine or PCC (Hobaek et al. 2010; Graler et al. 2013). For further details of the C- and D-vine structures, readers are referred to Kurowicka and Cooke (2006), Aas et al. (2009) and Aas and Berg (2009).

Figure 2 illustrates the general computational flow of the vine copula framework (i.e. Bedford and Cooke 2002; Aas and Berg 2009; Aas et al. 2009; Czado et al. 2013; Graler et al. 2013; Daneshkhan et al. 2016). The computation is initiated by selecting a significant vine structure, which depends on the degree of mutual concurrency, and proceeds through the following stages, as reviewed from the above-mentioned literature:

  • First stage of modelling

  • Capturing the correlation structure or pairwise dependency by selecting a justifiable bivariate parametric copula function for each flood pair.

  • Estimating the conditional cumulative functions or 'h-functions' by conditioning each joint structure on the variable shared with both other flood vectors, e.g. flood volume V (Fig. 2).

  • Mathematically, the conditioning structure can be derived through partial differentiation of each bivariate structure, as formulated in Eq. (1):

    $$ {F}_{P\mid V}\left(p|v\right)=\frac{\partial {C}_{PV}\left(p,v\right)}{\partial v}\kern0.5em \mathrm{and}\kern0.5em {F}_{D\mid V}\left(d|v\right)=\frac{\partial {C}_{VD}\left(v,d\right)}{\partial v} $$
    (1)
Fig. 2

Stage-wise hierarchy of bivariate copulas density under 3-dimensional pair-copulas construction (or PCC)

where \( {C}_{PV} \) and \( {C}_{VD} \) signify the bivariate copula structures; \( {F}_{P\mid V} \) and \( {F}_{D\mid V} \) define the conditional cumulative functions

  • Second stage of modelling

  • Synthesizing the full density structure of the 3-dimensional copula function using the conditional CDFs of Eq. (1), as given by Eqs. (2) and (3):

$$ {C}_{PVD}\left(p,v,d\right)={C}_{PD\mid V}\left({F}_{P\mid V}\left(p|v\right),{F}_{D\mid V}\left(d|v\right)\right)\cdot {C}_{PV}\left(p,v\right)\cdot {C}_{VD}\left(v,d\right) $$
(2)

also,

$$ {f}_{PVD}\left(p,v,d\right)={C}_{PVD}\left(p,v,d\right)\cdot {f}_P(p)\cdot {f}_V(v)\cdot {f}_D(d)={C}_{PD\mid V}\left({F}_{P\mid V}\left(p|v\right),{F}_{D\mid V}\left(d|v\right)\right)\cdot {C}_{PV}\left(p,v\right)\cdot {C}_{VD}\left(v,d\right)\cdot {f}_P(p)\cdot {f}_V(v)\cdot {f}_D(d) $$
(3)
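The two modelling stages can be sketched numerically. The following is a minimal sketch that assumes Gaussian pair-copulas for all three building blocks (the section does not prescribe a family; this is an illustrative choice), with hypothetical dependence parameters `rho_pv`, `rho_vd` and `rho_pd_v`. It evaluates the h-functions of Eq. (1) and the D-vine product structure of Eq. (2) on the copula scale.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal, underlying the Gaussian pair-copulas

def gauss_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    x, y = _N.inv_cdf(u), _N.inv_cdf(v)
    r2 = 1.0 - rho * rho
    return math.exp(-(rho * rho * (x * x + y * y) - 2.0 * rho * x * y)
                    / (2.0 * r2)) / math.sqrt(r2)

def h_func(u, v, rho):
    """Conditional CDF ('h-function') F_{U|V}(u|v) of Eq. (1) for the Gaussian pair-copula."""
    x, y = _N.inv_cdf(u), _N.inv_cdf(v)
    return _N.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho))

def dvine_copula_density(p, v, d, rho_pv, rho_vd, rho_pd_v):
    """Trivariate D-vine copula density, the product structure of Eq. (2):
    c_PV * c_VD * c_PD|V, the last factor evaluated at the h-transformed arguments."""
    return (gauss_density(p, v, rho_pv)
            * gauss_density(v, d, rho_vd)
            * gauss_density(h_func(p, v, rho_pv),
                            h_func(d, v, rho_vd), rho_pd_v))
```

A quick sanity check: with all parameters zero (full independence), every factor is 1 and the trivariate copula density reduces to 1 everywhere, as it should.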

An approach via minimum information PCC

Hydrological samples are often surrounded by a high degree of randomness and complexity in their multivariate dependence structure, which poses a stringent challenge for the precise approximation of the multidimensional joint density structure. Also, justifiable accuracy in the estimated exceedance probability of river flow response often demands a long historical time series. The efficacy and modelling potential of vine copula constructions for trivariate distributions have been reviewed in the above-cited literature, but some modelling issues remain, i.e. the complexity of selecting and synthesizing a justifiable copula structure under the parametric density concept for vine constructions (Bedford et al. 2015). Therefore, a new methodological framework is pointed out by introducing the concept of the minimum information based vine framework. Such non-informative vine methodology provides a basis for further extending the modelling potential of the traditional PCC (i.e. defined via the parametric copula framework) by approximating any copula density to the desired degree of approximation, as already demonstrated by Daneshkhan et al. (2016) for trivariate flood distribution analysis. The minimum information PCC captures the complex multivariate structure under various tail dependencies through precise estimation of the tail coefficient for a given copula and also facilitates modelling multivariate extremes in the presence of limited data length (Daneshkhan et al. 2016).

The fundamental concept behind building the minimum information PCC for any two bivariate joint density structures, say D1 and D2, can be demonstrated by establishing the relative information between the densities, which is minimized to 0 for identical bivariate densities (i.e. D1 = D2), as given by Eq. (4) (Bedford and Meeuwissen 1997; Daneshkhan et al. 2016).

$$ I\left({D}_1,{D}_2\right)=\iint \ln \left(\frac{D_1\left({x}_1,{x}_2\right)}{D_2\left({x}_1,{x}_2\right)}\right){D}_1\left({x}_1,{x}_2\right)d{x}_1d{x}_2 $$
(4)
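Eq. (4) can be approximated numerically. Below is a minimal midpoint-rule sketch on the unit square, using bivariate Gaussian copula densities purely as illustrative stand-ins for D1 and D2; the grid size n is a hypothetical choice trading accuracy against run time, as discussed later for the grid of the minimum information method.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def gauss_density(u, v, rho):
    """Bivariate Gaussian copula density, used here only as an example density."""
    x, y = _N.inv_cdf(u), _N.inv_cdf(v)
    r2 = 1.0 - rho * rho
    return math.exp(-(rho * rho * (x * x + y * y) - 2.0 * rho * x * y)
                    / (2.0 * r2)) / math.sqrt(r2)

def relative_information(d1, d2, n=200):
    """Midpoint-rule approximation of Eq. (4): I(D1, D2) over [0, 1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) * h, (j + 0.5) * h
            p1 = d1(u, v)
            total += math.log(p1 / d2(u, v)) * p1 * h * h
    return total
```

As expected from the definition, the measure vanishes for identical densities and is strictly positive otherwise.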

The generalized algorithmic explanation for establishing a minimum information structure between any adjacent arbitrary pair of targeted extreme vectors under the vine model, say between 'P' and 'V', can be formulated by integrating the concept of moment constraints using Eq. (5) (i.e. Bedford et al. 2015; Daneshkhan et al. 2015, 2016):

$$ {\upphi}_{\mathrm{i}}\left(\mathrm{P},\mathrm{V}\right)={\upphi}_{\mathrm{i}}^{\prime}\left({\mathrm{F}}_1^{-1}\left(\mathrm{P}\right),{\mathrm{F}}_2^{-1}\left(\mathrm{V}\right)\right),\kern0.5em \mathrm{for}\ i=1,2,\dots, k $$
(5)

where \( {\mathrm{F}}_1^{-1}\left(\mathrm{P}\right)\ \mathrm{and}\kern0.50em {\mathrm{F}}_2^{-1}\left(\mathrm{V}\right) \) represent the inverses of the univariate cumulative distribution functions of the targeted vectors. The selection of appropriate basis functions (i.e. ϕi for i = 1, 2, …, k) controls the fitness level of the copula structure for each random pair (Daneshkhan et al. 2016). The chosen grid size also influences the approximation level, such that a larger value leads to a longer computational period (Bedford et al. 2015). Therefore, a balance between analysis duration and accuracy level is often demanded (Daneshkhan et al. 2016). Readers are referred to Bedford et al. (2015) and Daneshkhan et al. (2015, 2016) for extended details of this non-informative copula framework.

Return periods under multivariate settings

This section overviews the statistical significance of return periods under the multidimensional design concept for tackling different hydrologic problems. In actuality, the selection of the return period depends on the importance of the structure under consideration as well as on the consequences of its failure, and this selection impacts the strength of the design variable quantiles (Brunner et al. 2016). Hydrologic and hydraulic applications are mostly interested in evaluating the mean inter-arrival period between two design events, usually expressed in years and called the return period (Shiau 2003; Salvadori 2004). In particular, design quantiles defined for a high return period are common practice in hydraulic structure design (Requena et al. 2016). In the multidimensional risk framework, return periods can be derived from the exceedance probabilities of flood attribute pairs, such that the joint return period is retrieved from the joint exceedance probabilities. Estimating multivariate design variable quantiles under different notions of the return period, i.e. based on joint and conditional probability distribution functions or via Kendall's distribution or survival functions, is often a justifiable and essential concern in hydrologic risk assessment (Salvadori 2004; Graler et al. 2013; Salvadori et al. 2013; Brunner et al. 2016). Shiau (2003), Salvadori (2004), Salvadori and De Michele (2004, 2007), Salvadori et al. (2011) and Serinaldi (2015) presented extended mathematical frameworks for deriving the different notions of return period under copula-based methodology.

Primary return periods

Return periods can be segregated into two distinct groups, i.e. the primary return periods, comprising the inclusive-probability 'AND' and 'OR' return periods, and the secondary or 'Kendall' return period, which is defined via Kendall's probability distribution or survival function (Salvadori 2004; Salvadori et al. 2011; Salvadori et al. 2013). The concurrence probability usually defines the probability that an extreme event, i.e. a flood episode, characterized through either a univariate variable (say flood peak discharge 'X') or multivariate variables (say 'X', 'Y', …), exceeds a certain threshold level, say 'x' (or 'x', 'y', … for the multivariate structure) (Yue and Rasmussen 2002; Shiau 2003; Salvadori 2004). Under the one-dimensional probability framework, the return period of hydrological or flood events exceeding a threshold value, say {X ≥ x}, can be defined through the fitted univariate cumulative distribution function (CDF) using Eq. (6), as given below:

$$ T=\frac{\mu }{\mathrm{P}\left(X\ge x\right)}=\frac{\mu }{1-F(x)} $$
(6)

where F(x) is the univariate CDF and μ is the mean inter-arrival duration between two consecutive episodes, with μ = 1 for annual maxima based extreme modelling (Yue and Rasmussen 2002).
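Eq. (6) translates directly into code. A minimal sketch, assuming a Gumbel (EV1) distribution as the fitted annual-maxima model; this distribution choice and its location/scale parameters are illustrative assumptions, not prescribed by the text.

```python
import math

def gumbel_cdf(x, loc=0.0, scale=1.0):
    """CDF of the Gumbel (EV1) distribution: F(x) = exp(-exp(-(x - loc)/scale))."""
    return math.exp(-math.exp(-(x - loc) / scale))

def return_period(x, cdf, mu=1.0):
    """Eq. (6): T = mu / P(X >= x) = mu / (1 - F(x)); mu = 1 for annual maxima."""
    return mu / (1.0 - cdf(x))
```

For example, the standard-Gumbel quantile x = −ln(−ln 0.99) corresponds, by construction, to the 100-year event.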

In actuality, the notion of the return period under the univariate concept is useful only if a single hydrological attribute satisfies the requirements of the design process; conversely, its use would also indicate that no significant inter-association exists between the multiple relevant vectors (Veronika and Halmova 2014). Each approach to the return period has its own significance, depending on the nature of the problem at hand; the approaches cannot be interchanged, and it is impossible to single out a most consistent one in general (Serinaldi 2015). Therefore, one should select the most consistent and justifiable return period, i.e. the one that best demonstrates the requirements of the undertaken assessment (Tosunoglu and Kisi 2016). The demonstration of Reddy and Ganguli (2013) revealed that assessing both the primary (i.e. 'OR' and 'AND') and secondary (i.e. Kendall's) return periods can be an effective practice, especially from the perspective of hydraulic or flood defence infrastructure design, since concentrating on the return period of only the 'OR' case or only the 'AND' case may result in under- or over-dimensioning. Actually, the joint return period facilitates several possible ways of capturing the joint relationship between the targeted vectors; for a bivariate distribution between flood vectors, say 'X' and 'Y', some alternative probability relations are given below (Yue and Rasmussen 2002; Salvadori 2004; Brunner et al. 2016).

•when both targeted vectors, say 'X' and 'Y', simultaneously exceed certain values, say 'x' and 'y', i.e. {X > x, Y > y},

•when only vector 'Y' exceeds its threshold, say 'y', i.e. {X ≤ x, Y > y},

•when neither 'X' nor 'Y' exceeds its threshold, i.e. {X ≤ x, Y ≤ y},

•when only vector 'X' exceeds its threshold, say 'x', i.e. {X > x, Y ≤ y}.

Suppose 'X ≥ x' and 'Y ≥ y' denote two potential flood vectors, representing the peak and volume series exceeding certain threshold values, say 'x' and 'y'; then the return periods for the joint probability under the 'OR' and 'AND' cases (i.e. Yue and Rasmussen 2002; Salvadori 2004; Salvadori and De Michele 2004; Zhang and Singh 2006, 2007a) can be formulated using Eqs. (7) and (8):

For the 'OR' case

$$ {T}_{XY}=\frac{\mu }{P\left(X\ge x\ \mathrm{OR}\ Y\ge y\right)}=\frac{\mu }{1-C\left[F(x),F(y)\right]} $$
(7)

and similarly for the 'AND' case

$$ {T}_{XY}^{\prime }=\frac{\mu }{P\left(X\ge x\ \mathrm{AND}\ Y\ge y\right)}=\frac{\mu }{1-F(x)-F(y)+C\left[F(x),F(y)\right]} $$
(8)

where C[F(x), F(y)] signifies the copula joint distribution of the flood margins F(x) and F(y) of the undertaken vectors, and μ is the mean inter-arrival duration of two successive episodes, equal to 1 for annual maxima based extreme generation.
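Eqs. (7) and (8) can be sketched directly. The sketch below assumes a bivariate Frank copula as the joint model (any fitted copula C and margins F(x), F(y) could be substituted), and θ = 5 in the checks is a hypothetical dependence parameter.

```python
import math

def frank_copula(u, v, theta):
    """Bivariate Frank copula C(u, v; theta), theta != 0."""
    num = (math.exp(-theta * u) - 1.0) * (math.exp(-theta * v) - 1.0)
    return -math.log1p(num / (math.exp(-theta) - 1.0)) / theta

def t_or(fx, fy, copula, mu=1.0):
    """'OR' joint return period, Eq. (7): mu / (1 - C[F(x), F(y)])."""
    return mu / (1.0 - copula(fx, fy))

def t_and(fx, fy, copula, mu=1.0):
    """'AND' joint return period, Eq. (8): mu / (1 - F(x) - F(y) + C[F(x), F(y)])."""
    return mu / (1.0 - fx - fy + copula(fx, fy))
```

A useful sanity check on any implementation is the ordering T_OR ≤ min(T_X, T_Y) ≤ T_AND, which holds by construction since the 'OR' event contains each marginal exceedance and the 'AND' event is contained in it.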

In many hydrological design settings, it is demanded to define events by highlighting the significance or priority of one design variable over another, and thus the literature points out the necessity of a conditional distributional framework for defining the concept of conditional return periods, i.e. Salvadori and De Michele (2004), Shiau (2006), Zhang and Singh (2006, 2007a), Kao and Govindaraju (2008), Salvadori and De Michele (2010), Salvadori et al. (2011), Vandenberghe et al. (2011), Rauf and Zeephongsekul (2014), Veronika and Halmova (2014), Salvadori et al. (2014), Saghafian and Mehdikhani (2014), Zhang et al. (2016), Brunner et al. (2016) and Tosunoglu and Kisi (2016). For example, the probability of flood peak conditional on volume or duration, of flood volume conditional on peak or duration, or of flood duration conditional on flood peak or volume would benefit hydraulic design prospects. Considering 'X' and 'Y' as the flood vectors, the conditional distribution of 'X' given various percentile values of 'Y', or vice versa, can be formulated using Eqs. (9) and (10):

$$ {P}_{X\mid Y}=1-\frac{P(x)-H\left(x,y\right)}{1-P(y)} $$
(9)
$$ {P}_{Y\mid X}=1-\frac{P(y)-H\left(x,y\right)}{1-P(x)} $$
(10)

The conditional probability framework under the bivariate distribution between any pair of targeted flood vectors, say 'X' and 'Y', can be formulated using Eqs. (11) and (12) for the various possible combinations in accordance with the nature of the undertaken problem, as given below (Shiau 2003; Reddy and Ganguli 2012a; Veronika and Halmova 2013).

$$ P\left(X\le x\mid Y\le y\right)=\frac{P\left(X\le x,Y\le y\right)}{P\left(Y\le y\right)}=\frac{H\left(X,Y\right)\ \mathrm{or}\ C\left(X,Y\right)}{F(Y)} $$
(11)
$$ P\left(X\le x\mid Y\ge y\right)=\frac{P\left(X\le x,Y\ge y\right)}{P\left(Y\ge y\right)}=\frac{F(X)-C\left(X,Y\right)}{1-F(Y)} $$
(12)

Similarly, Eqs. (13) and (14) represent the conditional cumulative functions of Y given X ≤ x and X ≥ x, respectively, which can be expressed as

$$ P\left(Y\le y\mid X\le x\right)=\frac{P\left(Y\le y,X\le x\right)}{P\left(X\le x\right)}=\frac{C\left(X,Y\right)}{F(X)} $$
(13)
$$ P\left(Y\le y\mid X\ge x\right)=\frac{P\left(Y\le y,X\ge x\right)}{P\left(X\ge x\right)}=\frac{F(Y)-C\left(X,Y\right)}{1-F(X)} $$
(14)

where H(X, Y) and C(X, Y) signify the joint cumulative distributions estimated using the conventional and the copula-based density structures, respectively, of the univariate margins F(X) and F(Y) of the targeted vectors 'X' and 'Y'. Therefore, the cumulative structure H(X, Y) can be expressed in the context of the bivariate copula structure, say C(X, Y), for the representation of the conditional return periods, as expressed in Eqs. (15) and (16):

$$ {T}_{X\mid Y\ge y}=\frac{1}{\left(1-F(y)\right)\left(1-F(x)-F(y)+H\left(X,Y\right)\ \mathrm{or}\ C\left(X,Y\right)\right)} $$
(15)
$$ {T}_{Y\mid X\ge x}=\frac{1}{\left(1-F(x)\right)\left(1-F(x)-F(y)+H\left(X,Y\right)\ \mathrm{or}\ C\left(X,Y\right)\right)} $$
(16)
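As a concreteness check on Eqs. (11), (12) and (15), the expressions can be coded directly. The sketch below takes the already-evaluated margins F(x), F(y) and the copula value C as plain numbers; the usage assumes the independence copula C = F(x)·F(y) purely for illustration, under which both conditional probabilities reduce to F(x).

```python
def p_x_le_given_y_le(fx, fy, cxy):
    """Eq. (11): P(X <= x | Y <= y) = C / F(y)."""
    return cxy / fy

def p_x_le_given_y_ge(fx, fy, cxy):
    """Eq. (12): P(X <= x | Y >= y) = (F(x) - C) / (1 - F(y))."""
    return (fx - cxy) / (1.0 - fy)

def t_x_given_y_ge(fx, fy, cxy, mu=1.0):
    """Eq. (15): T_{X|Y>=y} = mu / ((1 - F(y)) * (1 - F(x) - F(y) + C))."""
    return mu / ((1.0 - fy) * (1.0 - fx - fy + cxy))
```

With F(x) = 0.6, F(y) = 0.8 and C = 0.48 (independence), Eq. (15) evaluates to 1/(0.2 × 0.08) = 62.5 years, illustrating how strongly the conditioning inflates the return period relative to the univariate value of 2.5 years for X alone.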

In reality, it is a difficult challenge in the design process to point out which definition of the return contour performs as a better and more consistent measure of the design event for attributing a justifiable and significant risk expectation to the water-related problem at hand.

Demonstrating the risk of supercritical extremes via Kendall's distribution and survival functions (or secondary return periods)

Utilizing the standard definition of the return period solely in the light of inclusion probability, or the primary returns, might be problematic and can lead to underestimation of the correct value (Salvadori and De Michele 2010). In actuality, hydrologic consequences, i.e. flood, drought or rainfall, exhibit critical, sub-critical or super-critical behaviour. The primary return periods (i.e. joint and conditional distributions) for annual flood analysis are often attributed to capturing the mean forecast and do not facilitate demonstrating the risk of supercritical or dangerous scenarios, which are rare and can be outlined by investigating the mean time elapsed between occurrences of supercritical episodes (Salvadori and De Michele 2010; Salvadori et al. 2011; Vandenberghe et al. 2011; Mirabbasi et al. 2012). Appropriate reliability of a hydraulic design system is often aimed at defining the exceedance probabilities of rare episodes (Sarhadi et al. 2016). Actually, the super-critical scenario of hydrological extremes often reveals a serious potential threat to the designed facilities owing to its rare-occurrence risk in comparison with the given design return period (Graler et al. 2013; Reddy and Ganguli 2013). Therefore, it is often demanded to make a sharp distinction by segregating the probability distribution space into non-critical and super-critical regions based on a critical cumulative probability level through the Kendall distribution function \( {K}_C \) (Graler et al. 2013; Brunner et al. 2016). Thus, literature such as Salvadori (2004), Salvadori and De Michele (2004) and Salvadori and De Michele (2007) demonstrated efforts towards recognizing the concept of the return period under the supercritical extreme scenario for defining design events from the Kendall distribution function, also called the secondary return period.
The Kendall return period is usually demonstrated through an appropriate discrimination between non-critical and supercritical episodes using critical cumulative probability levels, further extended into the multidimensional frame in the context of the Kendall distribution function \( ^{\prime }{K}_{C_{\theta }}(.)^{\prime } \) (Graler et al. 2013). Under the copula framework, the Kendall joint return period can be derived from the Kendall probability function in two different computational ways, i.e. analytically or numerically via a simulation algorithm (Salvadori and De Michele 2007; Vandenberghe et al. 2010; Salvadori et al. 2011). According to Salvadori and De Michele (2010) and Salvadori et al. (2011), the Kendall distribution can be expressed using Eq. (17):

$$ {K}_C(t)=\Pr \left[W\le t\right]=\Pr \left[C\left(U,V\right)\le t\right], $$
(17)

where W = C(U, V) is a univariate random variable and KC(t) depends only on the copula C(U, V), whose level curves, called isolines, separate the distribution space into super-critical and non-critical segments. Also, for a given probability level 't', Kendall's quantile can be obtained through the inverse of the Kendall distribution (i.e. \( {q}_t={K}_C^{-1}(t) \)) (Brunner et al. 2016). Actually, the above Kendall equation makes it possible to investigate the chance that a random point in the unit square exhibits a copula value larger or smaller than a given critical probability level, representing the multidimensional information in univariate form through the cumulative function of the copula's level curves (Salvadori et al. 2011; Graler et al. 2013). Statistical evaluation of the analytical expression for the Kendall function was pursued by Ghoudi et al. (1998) and Salvadori and De Michele (2007) for extreme-value and Archimedean bivariate copula distributions. On the other side, Salvadori et al. (2011) focused on simulation-based (numerical) algorithms for defining \( {K}_C \) in the absence of an analytical expression. Salvadori et al. (2013) tackled some critical issues in standard Kendall return estimation, originally pointed out by Graler et al. (2013), by introducing the concept of the survival function in conjunction with Kendall's return period. According to Graler et al. (2013), a few non-critical events may reveal larger values than a given design value, while the conventional definition of Kendall's function attributes longer joint concurrence probabilities to all super-critical scenarios above the design value.
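The simulation route mentioned above can be sketched in a few lines. A minimal Monte Carlo sketch of Eq. (17), illustrated with the independence copula C(u, v) = uv, for which the closed form K(t) = t − t·ln t is known and serves as a check; the sampler must draw pairs from the copula itself (trivially, iid uniforms in the independence case).

```python
import random

def kendall_mc(copula, sampler, t, n=200_000, seed=42):
    """Monte Carlo estimate of Eq. (17): K_C(t) = Pr[C(U, V) <= t],
    where (U, V) are drawn from the copula via `sampler`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u, v = sampler(rng)
        if copula(u, v) <= t:
            hits += 1
    return hits / n

# Independence copula and its (trivial) sampler -- illustrative choices only.
indep = lambda u, v: u * v
unif_pairs = lambda rng: (rng.random(), rng.random())
```

For t = 0.5 the estimate should approach t − t·ln t ≈ 0.8466; note that K_C(t) ≥ t in general, i.e. the supercritical region {C > t} is smaller than the naive exceedance probability 1 − t suggests.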
Therefore, such computational challenges can be handled in the light of the survival Kendall structure by replacing the Kendall function with the survival Kendall function under the copula structure, as demonstrated mathematically by Eq. (18) (Salvadori et al. 2013).

$$ {T}_{\mathrm{Kendall}\ \mathrm{survival}}=\frac{\mu }{1-{\overline{K}}_C(t)}\kern0.5em \mathrm{and}\kern0.5em {\overline{K}}_C(t)=\Pr \left[C\left(1-U,1-V\right)\ge t\right] $$
(18)

where C(1 − U, 1 − V) is the survival function of the bivariate random vector and \( 1-{\overline{K}}_C(t) \) is the chance of a multivariate extreme occurring in the super-critical region at critical probability level ‘t’ (Salvadori et al. 2014). Likewise, the survival Kendall quantile is obtained by replacing the inverse Kendall function with the inverse survival Kendall distribution, i.e. \( {q}_t={{\overline{K}}_C}^{-1}(t) \) (Salvadori et al. 2014). Volpi and Fiori (2014) demonstrated structure-based concurrence probability estimation, an essential concern in hydraulic design. Their approach links the hydrological variables to the design parameters through a strictly monotonic structure function, formulated statistically, from which the return period of structural failure follows, as shown in Eqs. (19) and (20) (i.e. Volpi and Fiori 2014).

$$ Z=g\left(X,Y\right) $$
(19)
$$ {T}_Z=\frac{\mu }{1-{F}_Z(z)} $$
(20)

where FZ is the cumulative distribution function of the structure variable Z.
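As an illustration of the simulation route to ′KC′ discussed above, the sketch below estimates the Kendall distribution of a Clayton copula by Monte Carlo and compares it with the closed-form Archimedean expression; the dependence parameter, probability level and mean inter-arrival time are hypothetical choices, not values from the reviewed studies. The same empirical counting, applied to the copula evaluated at (1 − U, 1 − V), would give the survival Kendall analogue of Eq. (18).

```python
import numpy as np

rng = np.random.default_rng(42)
theta = 2.0    # Clayton dependence parameter (hypothetical value)
mu = 1.0       # mean inter-arrival time of events, in years (assumed)
n = 200_000

# Conditional-inversion sampling of the Clayton copula
u = rng.uniform(size=n)
w = rng.uniform(size=n)
v = (u**-theta * (w**(-theta / (1.0 + theta)) - 1.0) + 1.0)**(-1.0 / theta)

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v)."""
    return (u**-theta + v**-theta - 1.0)**(-1.0 / theta)

t = 0.9  # critical probability level
# Empirical Kendall distribution K_C(t) = Pr[C(U, V) <= t]
kc_mc = np.mean(clayton_cdf(u, v, theta) <= t)
# Analytical Archimedean form K_C(t) = t - phi(t)/phi'(t) for the Clayton generator
kc_exact = t + (t - t**(theta + 1.0)) / theta

T_kendall = mu / (1.0 - kc_exact)  # secondary (Kendall) return period
```

Because KC(t) > t for Archimedean copulas, the Kendall return period here exceeds the primary ‘OR’ return period μ/(1 − t), consistent with the super-critical reading of the Kendall framework.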

Once the return period is recognised, the next concern is the characterisation of the most appropriate design. The multivariate nature of the design problem generally requires selecting several design events for a given return period, which are then parameterized within the hydraulic design procedure (Salvadori et al. 2011; Graler et al. 2013). In fact, an infinite number of combinations of the target flood vectors correspond to each concurrence probability in the multidimensional framework, which makes selecting the most promising and effective design event a tough challenge. Salvadori et al. (2011) addressed this requirement from two different perspectives: one approach follows the ‘component-wise excess design realization’, while the other follows the ‘most-likely design realization’. Under the latter, the design is chosen as the point with the largest joint probability density (Salvadori et al. 2011; Graler et al. 2013). As a further alternative, Salvadori et al. (2014) proposed the H-conditional approach to design realization, which can be defined in the presence of a ruling variable. As pointed out by Brunner et al. (2016), multivariate simulation often yields a large set of outcomes, so restricting attention to a single design realization sacrifices flexibility. Practitioners who require richer design information can instead select a sub-set of design events, either by splitting the return curve into two distinct parts, called the naive and the proper part, as demonstrated by Chebana and Ouarda (2009), or by sampling across the return contour according to the likelihood function, yielding ensembles of design events (Graler et al. 2013). The statistical significance of such ensemble-based strategies is also discussed in the literature, e.g. Vandenberghe et al. (2010) and Salvadori et al. (2011).

For a 2-D joint probability structure, the most-likely design realization can be formulated using Eq. (21):

$$ \left({U}_1,{U}_2\right)={\arg \max}_{T_{X_1,{X}_2}}{f}_{XY}\left({F}_1^{-1}\left({u}_1\right),\kern0.5em {F}_2^{-1}\left({u}_2\right)\right) $$
(21)

Similarly, for the 3-D probability framework, using Eq. (22):

$$ \left({U}_1,{U}_2,{U}_3\right)={\arg \max}_{T_{X_1,{X}_2,{X}_3}}{f}_{XYZ}\left({F}_1^{-1}\left({u}_1\right),\kern0.5em {F}_2^{-1}\left({u}_2\right),{F}_3^{-1}\left({u}_3\right)\right) $$
(22)
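The most-likely design realization of Eqs. (21) and (22) can be approximated by a simple grid search along the copula isoline. The sketch below does this for a bivariate Clayton copula with standard normal margins standing in for fitted flood marginals; the copula family, its parameter and the probability level are illustrative assumptions only.

```python
import numpy as np
from scipy.stats import norm

theta = 2.0   # Clayton parameter (hypothetical)
t = 0.99      # joint probability level defining the copula isoline

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v)."""
    return (u**-theta + v**-theta - 1.0)**(-1.0 / theta)

def clayton_density(u, v, theta):
    """Clayton copula density c(u, v)."""
    s = u**-theta + v**-theta - 1.0
    return (1.0 + theta) * (u * v)**(-theta - 1.0) * s**(-2.0 - 1.0 / theta)

# Points on the isoline C(u1, u2) = t: solve for u2 given u1 (requires u1 > t)
u1 = np.linspace(t + 1e-6, 1.0 - 1e-6, 2000)
u2 = (t**-theta - u1**-theta + 1.0)**(-1.0 / theta)

# Joint density f_XY at the corresponding quantiles, with standard normal
# margins as a stand-in for the fitted flood peak/volume marginals
x1, x2 = norm.ppf(u1), norm.ppf(u2)
f_joint = clayton_density(u1, u2, theta) * norm.pdf(x1) * norm.pdf(x2)

i = np.argmax(f_joint)
u1_star, u2_star = u1[i], u2[i]   # most-likely design realization on the isoline
```

With an exchangeable copula and identical margins, the maximizer lies on (or symmetrically about) the diagonal of the isoline; with real, asymmetric flood marginals the search is the same, only the quantile transforms change.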

Development of ML algorithms for flood prediction

Floods are often considered the most destructive natural disaster and have therefore motivated hydrologists and water practitioners to search for more efficient and accurate flood forecasting models and machine learning algorithms (MLAs) for the appropriate assessment of extreme hazards. This section outlines the distinct varieties of MLA that are frequently and widely accepted among researchers for the treatment of hydroclimatic samples, also listed in Table 1. Artificial neural networks (ANNs) are considered among the most efficient MLAs in terms of accurate approximation, high modelling speed and the ability to model complex flood structures (i.e. Mosavi et al. 2018; Li et al. 2010; Wu and Chau 2010; Jain and Prasad Indurthy 2004). As Table 1 shows, they are frequently applied to the modelling of river flow characteristics, rainfall-runoff modelling and the prediction or extrapolation of streamflow characteristics. Despite these advantages, however, ANNs exhibit some issues in flood modelling, such as complexity in data handling and network architecture (Deo and Sahin 2015). Besides this algorithm, the support vector machine (SVM), a supervised machine learning algorithm based on statistical learning theory and the principle of structural risk minimization, is recognized as a most efficient and robust approach, especially for solving non-linear regression problems in flood prediction and modelling (i.e. Ortiz-García et al. 2014; Gizaw and Gan 2016; Gong et al. 2016; Jajarmizadeh et al. 2015; Tehrany et al. 2015). Trained on historical observations, an SVM can extrapolate the data into a future time frame; in some of the literature it is also incorporated as a regression tool called support vector regression (SVR) (i.e. Li et al. 2016; Tehrany et al. 2015).
The wavelet neural network (WNN) is another widely used machine learning approach for the time series extrapolation of flood characteristics, based on the principle of decomposing the initial observation sets into individual resolution levels. It is widely applied to the modelling of daily streamflow characteristics, rainfall-runoff and reservoir inflow (i.e. Supratid et al. 2017; Ravansalar et al. 2017). The adaptive neuro-fuzzy inference system (ANFIS) algorithm, in turn, offers quick and easy implementation together with accurate and strong learning abilities, and is therefore often a good choice for forecasting flood episodes (i.e. Choubin et al. 2014; Lafdani et al. 2013; Shu and Ouarda 2008). Besides this, the decision tree (DT) algorithm, based on the tree-of-decision-making technique, is widely applied to the prediction of flood events (Tehrany et al. 2014; Liaw and Wiener 2002); it is further classified into the fast algorithm (Tehrany et al. 2013), classification and regression trees (CART) (i.e. Dehghani et al. 2017), the random forest method (RFM) (i.e. Liaw and Wiener 2002) and the M5 decision tree algorithm (i.e. Etemad-Shahidi and Mahjoobi 2009). All the above-mentioned ML algorithms can be classified into two groups, short term and long term, depending upon the sample length or prediction lead-time under consideration, and can be further categorized as single and hybrid methods.
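To make the decision-tree idea concrete, the following sketch fits a single CART-style regression split (a ‘stump’) to a synthetic rainfall-runoff sample in which runoff jumps once rainfall exceeds an assumed infiltration threshold; the data, the threshold and the variable names are invented for illustration and do not come from the cited studies. Full CART, RFM or M5 models grow and combine many such splits.

```python
import numpy as np

def fit_stump(x, y):
    """One CART-style split: choose the threshold on x that minimises the
    summed squared error of the two leaf means."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, None, None, None)
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[0]:
            thr = 0.5 * (xs[i - 1] + xs[i])
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]  # (threshold, left_mean, right_mean)

def predict_stump(stump, x):
    thr, lo, hi = stump
    return np.where(x < thr, lo, hi)

# Synthetic rainfall-runoff toy data (illustrative only): runoff jumps
# once rainfall exceeds an assumed infiltration threshold of about 40 mm
rng = np.random.default_rng(0)
rain = rng.uniform(0.0, 100.0, 300)
runoff = np.where(rain > 40.0, 30.0, 5.0) + rng.normal(0.0, 1.0, 300)

stump = fit_stump(rain, runoff)
```

The fitted threshold recovers the assumed breakpoint near 40 mm, which is exactly the piecewise-constant partitioning that tree-based flood classifiers exploit.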

Table 1 Machine learning algorithms (MLAs) in the treatment of hydrologic samples

Research discussion

Statistical inference on extreme hydroclimatic samples, for retrieving flow exceedance probabilities or return periods, is an insightful concern for assessing hydrologic risk in basin-scale water resources planning, management and design. Hydrometeorological simulation, either via the extension of historical rainfall samples or through a joint distribution framework over the variables of interest, offers two distinct ways to address risk assessment for an extreme flood scenario. A few attempts have extracted the flood frequency curve by integrating hydrological models with probabilistic rainfall models, i.e. either via conventional lumped and distributed models or via continuous or event-based hydroclimatic simulations; however, the long computational analysis, with its demand for high spatial and temporal resolution, can lead to an ineffective characterization of catchment behaviour. Owing to the high degree of uncertainty and the complexity of flood characteristics, a probability distribution framework is often demanded instead of any deterministic procedure (Sen 1999; Hosking et al. 1985). The multidimensional behaviour of floods calls for multivariate constructions that retrieve design variable quantiles under the different notions of return period by accounting for the multiple design vectors, instead of merely examining a univariate frequency relationship or return period. Univariate frequency analysis cannot capture the full picture of the flood or inflow hydrograph; it is therefore necessary to introduce the multiple intercorrelated flood vectors, i.e. flood peak, volume and duration, to establish joint probability density functions (pdfs) and joint cumulative distribution functions (cdfs), especially from the viewpoint of hydraulic design procedures where accounting for multivariate design parameters based on their multivariate exceedance probabilities may be desired. In other words, the selection of the return period depends upon the importance of the undertaken structure and the consequences of its failure, and its appropriate selection affects the strength of the design variable quantiles (Brunner et al. 2016). The unreliability and impractical consequences of univariate flood modelling therefore motivated numerous demonstrations of multivariate distribution frameworks, introducing distinct varieties of traditional probability functions to establish bivariate joint relations between flood peak-volume, volume-duration or peak-duration (i.e. Choulakian et al. 1990; Yue et al. 1999; Yue 1999, 2000, and references therein); but several statistical shortcomings limited the applicability of the traditional multivariate functions and thus motivated extended demonstrations in the light of bivariate copula simulation under parametric or semiparametric distribution settings (De Michele and Salvadori 2003; Salvadori and De Michele 2004; Salvadori 2004; Nelsen 2006; Karmakar and Simonovic 2009 and references therein). In multivariate risk statistics, the return period is usually associated with certain exceedance probabilities; its selection is not an arbitrary process but is based solely on the nature of the assessment, which in turn decides the importance of the design vectors under consideration.

During copula construction, approximating the marginal distributions of the univariate random vectors with parametric functions can be problematic owing to the unsymmetrical or skewed distribution behaviour of hydrologic samples. Parametric functions also impose the assumption that the random samples are drawn from a population whose density structure is pre-defined, i.e. the marginal distribution of the flood characteristics is assumed to follow a specific family of parametric density functions; in actuality, no universally accepted model is fixed for any hydrologic vector. Much of the literature therefore points to the flexibility of the non-parametric probability concept in the light of kernel density estimation (kde), which is recognized as a stable data smoothing procedure in the field of hydrologic or flood frequency analysis and yields a bona fide density (i.e. Adamowski 1996, 2000; Ghosh and Mujumdar 2007; Santhosh and Srinivas 2013 and references therein). Distinct varieties of copulas have been incorporated for establishing bivariate dependencies of hydroclimatic samples, such as the extreme value class (i.e. Gumbel-Hougaard, Galambos and Husler-Reiss), the elliptical class (i.e. Gaussian family), the unclassified Plackett and Farlie-Gumbel-Morgenstern (FGM) parametric functions and the three-parameter Twan family (belonging to the extreme value class); among these, the Archimedean class (i.e. Ali-Mikhail-Haq or A-M-H, Frank, Clayton or Cook-Johnson (C-J) and Gumbel-Hougaard families) is most frequently accepted because of its large variety of families, its capability to capture joint dependencies over a wide extent and several desirable properties that lend much flexibility to joint probability simulation (i.e. De Michele and Salvadori 2003; Salvadori and De Michele 2004; Favre et al. 2004; Nelsen 2006; Grimaldi and Serinaldi 2006; Papaioannou et al. 2016; Galiatsatou and Prinos 2016; Requena et al. 2016 and references therein). Each family of the Archimedean class is characterized by a specific extent of dependency-capturing capability, constrained by the degree of association between the random vectors and investigated through a dependency measure. Extended efforts have been directed towards copula-based bivariate design estimation, but such attempts may still be insufficient for a comprehensive flood probability analysis because of the trivariate distribution behaviour of floods, which demands the simultaneous accounting of all the intercorrelated vectors. The potential damage may depend upon multiple relevant vectors of the specified hydrological episode, such that ignoring the spatial dependency among these random vectors can lead to an underestimation of uncertainty (Renard and Lang 2007; Graler et al. 2013; Vernieuwe et al. 2015). A limited body of literature has thus appeared on 3-dimensional copula distribution analysis for establishing the trivariate joint relationship and the associated return periods (i.e. Reddy and Ganguli 2013; Graler et al. 2013; Daneshkhan et al. 2016 and references therein). Distinct varieties of standard trivariate copulas have been incorporated, i.e. Grimaldi and Serinaldi (2006) (mono-parametric and asymmetric, or FNA, structure of the Frank function), Serinaldi and Grimaldi (2007) (FNA structure), Genest et al. (2007) (meta-elliptical copulas), Reddy and Ganguli (2013) (FNA and Student’s t copulas of the elliptical class) and Fan and Zheng (2016) (entropy copulas). Genest et al. (2007) revealed that the meta-elliptical copulas can effectively preserve the pairwise dependencies among the random vectors through the correlation matrix, but may be ineffective at low probabilities unless the asymptotic properties of the data are justified through strong arguments.
Similarly, the flexibility of the Plackett family of copulas, which was found to preserve lower-level dependencies faithfully during higher-dimensional modelling, was pointed out by Kao and Govindaraju (2008) for rainfall samples. Madadgar and Moradkhani (2013) captured the joint behaviour of drought episodes using the trivariate Gumbel copula (an Archimedean family function) and the t copula (elliptical family). Some of the literature still raises the issue of faithfully preserving lower-level dependencies via the FNA structure, whose modelling is limited to the positive dependence range, and therefore points to the applicability of other standard classes of trivariate copulas. Justifiable preservation of all the lower-level dependencies is in fact a challenging effort in higher dimensional copula-based methodology, especially when a complex pattern of dependency is exhibited over the multidimensional data structure; it also demands a flexible methodology with precise estimation of the tail dependence coefficient under various tail dependencies. Therefore, literature such as Kurowicka and Cooke (2006), Joe (1997), Aas et al. (2009) and Bedford and Cooke (2001, 2002) directed attention towards a comprehensive way of characterizing uncertainty for higher dimensional hydrological entities using the vine or pair-copula construction (PCC). The vine copula construction is based on the principle of decomposing the full multivariate density into a cascade of simple local building blocks via conditional independence, or pair-copulae (Aas and Berg 2009; Bedford and Cooke 2002). Owing to the conditional mixing via a stage-wise hierarchical nesting procedure, the pair-copula concept offers a much more effective and flexible modelling environment.
In PCC construction, an interactive set of multiple bivariate copulas in cascade form is employed, fitting a copula to the random vectors and their conditional and unconditional distributions, instead of imposing a single fixed multidimensional structure on all the characteristics; the latter can be ineffective for data exhibiting a complex dependence structure in the tail, which is a stringent challenge in hydrological modelling (Joe 1997; Bedford and Cooke 2001; Bedford and Cooke 2002). Distinct varieties of pair-copula decomposition fall under the regular vine structure, such as the canonical (C-vine) and D-vine distributions, of which the D-vine structure is more frequently favoured in the existing literature because of its higher flexibility compared with the C-vine structure. The degree of mutual concurrency among the multiple target vectors forms the basis for adopting a justifiable vine tree structure (Graler et al. 2013). The approximation capability of a vine copula for a multidimensional structure depends upon the manner of decomposition; although the modelling efficacy of the PCC structure is reviewed in the above-cited literature, some modelling issues remain, i.e. complexity in selecting and synthesizing a justifiable copula structure under the parametric density concept for vine constructions (Bedford et al. 2015). The concept of a minimum information vine structure has therefore been discussed, as this non-informative vine concept can further extend the flexibility of the conventional PCC structure (Daneshkhan et al. 2016). The minimum information PCC captures the complex multidimensional flood structure under various tail dependencies through precise estimation of the tail coefficient for the selected copulas and also facilitates modelling multivariate extremes in the presence of limited data length (Daneshkhan et al. 2016).
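The cascade idea behind PCC can be sketched in a few lines: a trivariate D-vine copula density is assembled from three bivariate building blocks, with h-functions supplying the conditional margins for the second tree. Clayton pairs are used here purely as a convenient worked example; any mix of pair families could be substituted, and the parameter values are arbitrary.

```python
import numpy as np

def clayton_density(u, v, theta):
    """Bivariate Clayton copula density c(u, v)."""
    s = u**-theta + v**-theta - 1.0
    return (1.0 + theta) * (u * v)**(-theta - 1.0) * s**(-2.0 - 1.0 / theta)

def clayton_h(u, v, theta):
    """h-function h(u | v) = dC(u, v)/dv, the conditional cdf of U given V = v."""
    return v**(-theta - 1.0) * (u**-theta + v**-theta - 1.0)**(-1.0 / theta - 1.0)

def dvine3_density(u1, u2, u3, th12, th23, th13_2):
    """Trivariate D-vine copula density built from three bivariate Clayton
    blocks: the pairs (1,2) and (2,3) in the first tree, and the
    conditional pair (1,3 | 2) in the second tree."""
    c12 = clayton_density(u1, u2, th12)
    c23 = clayton_density(u2, u3, th23)
    a = clayton_h(u1, u2, th12)   # F(u1 | u2)
    b = clayton_h(u3, u2, th23)   # F(u3 | u2)
    return c12 * c23 * clayton_density(a, b, th13_2)
```

As the pair parameters approach zero the Clayton blocks tend to independence and the vine density collapses to one, which is a quick sanity check on any hand-rolled PCC.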

The statistical significance of return periods under the multidimensional design concept, for tackling different hydrologic problems, is reviewed in a separate section. Estimating multivariate design quantiles under the different notions of return period, i.e. based on joint and conditional probability distribution functions or via the Kendall distribution or survival function, is an essential concern in hydrologic risk assessment (Salvadori 2004; Graler et al. 2013; Salvadori et al. 2013). Brunner et al. (2016), Shiau (2003), Salvadori (2004), Salvadori and De Michele (2004, 2007), Salvadori et al. (2011) and Serinaldi (2015) set out the extended mathematical formulation of the different notions of return period using copula-based methodology. A univariate return period may be useful only if a single hydrological vector satisfies the requirements of the design process; each return period approach has its own significance, depends solely on the nature of the problem at hand, and deciding on the most consistent formulation is a difficult task (Veronika and Halmova 2013; Serinaldi 2015). A return period therefore best demonstrates the assessment requirements when the most consistent and justifiable definition is selected. According to Reddy and Ganguli (2013), considering both the primary and the secondary return period can be an effective practice, especially from the prospect of flood defence infrastructure design, since concentrating only on the ‘OR’ or only on the ‘AND’ return period may yield an under-dimensioned or over-dimensioned structure. The joint return period, in fact, offers different possible ways to capture the joint relationship for the various combinations of the multiple flood vectors.
In other words, for a given return period, various design combinations are possible, and vice versa. Besides the importance of the joint return contour, most hydrological design requirements demand that events be defined by highlighting the significance or priority of one design variable over another, i.e. via conditional distributions or conditional return periods (i.e. Salvadori and De Michele 2004; Shiau 2006; Zhang and Singh 2006, 2007a; Kao and Govindaraju 2008; Salvadori and De Michele 2010; Salvadori et al. 2011). For instance, the probability of flood peak conditional on volume or duration, of flood volume conditional on peak or duration, or of flood duration conditional on peak or volume can all benefit hydraulic design.

Hydrological consequences, i.e. floods, droughts or rainfall, may exhibit critical, sub-critical or super-critical behaviour; in such circumstances, accounting only for the primary return period can be problematic and may underestimate the correct value (Salvadori and De Michele 2010). Capturing only the mean forecast does not demonstrate the risk of the super-critical, or dangerous, episodes, which are rare. Appropriate reliability in hydraulic design facilities requires defining the exceedance probabilities of rare episodes (Sarhadi et al. 2016). It may therefore be demanded that a sharp distinction be drawn by segregating the probability distribution space into a non-critical and a super-critical region based on a critical cumulative probability level, further extended into the multidimensional frame via the Kendall distribution function \( ^{\prime }{K}_{C_{\theta }}(.)^{\prime } \) (i.e. Salvadori 2004; Salvadori and De Michele 2004; Salvadori and De Michele 2007; Graler et al. 2013). Analytical effort or a numerical approach based on simulation algorithms are the two computational ways to estimate the Kendall joint return period derived from the Kendall probability function under the copula distribution framework (Salvadori et al. 2007; Vandenberghe et al. 2010; Salvadori et al. 2011). Analytical expressions for the Kendall function were derived by Ghoudi et al. (1998) and Salvadori and De Michele (2007) for the bivariate extreme value and Archimedean copula distributions, while Salvadori et al. (2011) focused on simulation algorithms (via numerical analysis) for defining ′KC′ in the absence of an analytical expression.
It is possible for some non-critical events to exceed a given design value even though the Kendall function assigns joint concurrence probabilities only to the super-critical scenarios above the design value; this can be handled in the light of the survival Kendall function (Salvadori et al. 2013). Structure-based concurrence probability estimation establishes an inter-association between the hydrological characteristics and the design parameters via a strictly monotonic structure function, formulated statistically, from which the return period of structural failure follows (i.e. Volpi and Fiori 2014). The multivariate nature of design problems demands the characterization of the most justifiable design for a given return period requirement under two different perspectives, i.e. one approach concentrates on the ‘component-wise excess design realization’ while the other focuses on the ‘most-likely design realization’ (Salvadori et al. 2011). Design realization via the H-conditional approach is another alternative, defined in the presence of a ruling variable (Salvadori et al. 2014).

Research conclusions

Basin-scale water resources operational planning, management and hydraulic structural design demand an accurate estimation of flow exceedance probability for assessing flood risk. Owing to the high degree of uncertainty and the complex flood dependence structure, a probabilistic approach is demanded for the treatment of the historical streamflow observations within the catchment region, based on several mathematical and statistical frameworks. Flood frequency analysis (FFA) statistically defines the inter-association between flood design quantiles and their recurrence intervals by fitting univariate or multivariate probability distribution functions (pdfs). Multivariate flood distribution analysis provides a comprehensive understanding of the flood-generating probability, which usually comprises a combination of joint probability density functions (pdfs) and joint cumulative distribution functions (cdfs). A flood is a complex, stochastic, multivariate hydrologic consequence that is completely characterized only through its multiple intercorrelated random vectors, i.e. the flood peak discharge, the volume and the duration of the flood hydrograph. The reliability of univariate flood frequency analysis, or of univariate return periods, is therefore open to question, as it may lead to underestimation or overestimation of hydrologic risk; this demands the establishment of a multivariate joint distribution of the flood characteristics that accounts for the multiple intercorrelated flood vectors. Univariate flood probability constructions cannot recognize the full picture of the flood or inflow hydrograph, nor reduce the uncertainty in the estimated design quantiles.

In this review, the efficacy of copula-based methodologies for establishing multivariate distributions of flood episodes is examined; the copula is recognized as a highly flexible tool for establishing multivariate joint dependency and the associated return periods in comparison with the traditional multivariate functions, and is discussed in the context of theoretical and mathematical simulation of the flood characteristics. Different methodological attempts in the light of bivariate and trivariate copula distribution analysis are presented for tackling multivariate design problems and estimating design variable quantiles under different notions of return period. The section ‘Flood frequency analysis via one-dimensional probability distribution framework or approximation of marginal distributions’ presented distinct varieties of one-dimensional mono-parametric, bi-parametric and tri-parametric distribution families, which are employed for establishing univariate marginal distributions, a mandatory prerequisite before introducing the individual hydrologic or flood vectors into the multivariate or copula framework. It was also revealed that different density structures yield different estimations of the design quantiles, especially in the tail of the distributions; the flexibility of the available univariate functions to fit the given samples depends upon the associated vector of unknown statistical or model parameters. Simulation via parametric functions, however, imposes the assumption that the random samples are drawn from a population whose density structure is pre-defined; in actuality, no specific model is universally assigned to any specific hydrologic variable, which may follow different distributions.
The non-parametric kernel density estimator is therefore recognized as a stable data smoothing procedure in hydrologic or flood modelling that yields a bona fide density, as revealed in section ‘An approach via non-parametric distribution framework’. The non-parametric framework does not require any prior distribution assumption and is derived directly from the data series, with a higher extent of flexibility compared with parametric density estimators. Although univariate frequency analysis defines the general concept of the return period via the estimated cumulative distribution function, it can be unsatisfactory when the requirement demands the consideration of multivariate design parameters, which is an essential concern in water-related queries.
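The kernel density route recapped here can be sketched with SciPy’s Gaussian KDE on a synthetic, positively skewed ‘annual peak flow’ sample; the gamma-shaped data and all parameter choices are invented for illustration, and in practice boundary corrections are often added for strictly positive flow variables.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic positively skewed sample standing in for annual peak flows (m^3/s)
rng = np.random.default_rng(1)
flows = rng.gamma(shape=3.0, scale=100.0, size=500)

kde = gaussian_kde(flows)   # Gaussian kernel, Scott's-rule bandwidth by default

# A bona fide density: non-negative and integrating to one
grid = np.linspace(flows.min() - 250.0, flows.max() + 250.0, 4000)
dens = kde(grid)
mass = float(np.sum(dens) * (grid[1] - grid[0]))
```

No parametric family is assumed here; the smoothed density adapts to the skewness of the sample, which is exactly the flexibility argued for in the text.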

Multivariate practice via the traditional probability distribution functions suffers from several statistical shortcomings and limitations, as revealed in section ‘Limitation of traditional multivariate distribution framework’. The classical statistical approaches to estimating the degree of association are incapable of characterizing the co-movement tendencies of hydrologic or flood vectors. In this respect, the copula function appears as a most effective multivariate tool, separating the modelling of the individual univariate vectors and of their joint structure into two distinct stages; this gives higher flexibility in selecting the most appropriate and justifiable marginal distributions and joint structure, so as to capture a wider extent of dependency while preserving the joint structure, as revealed in section ‘Copula-based bivariate probability distributions’. Copula-based methodology can be classified into parametric, semiparametric and non-parametric estimation procedures, depending on how the univariate marginals and the joint dependence structure are estimated. An interactive set of copula families, such as the extreme value class (i.e. Gumbel-Hougaard, Galambos and Husler-Reiss), the elliptical class (i.e. Gaussian family), the unclassified Plackett and Farlie-Gumbel-Morgenstern (FGM) parametric functions and the three-parameter Twan family (belonging to the extreme value class), is incorporated for establishing the bivariate joint dependence structure; among these, the Archimedean class (i.e. Ali-Mikhail-Haq or A-M-H, Frank, Clayton or Cook-Johnson (C-J) and Gumbel-Hougaard families) is most frequently accepted because of its large variety of families, its capability to capture joint dependencies over a wide extent and several desirable properties that lend much flexibility to joint probability simulation, as revealed in the same section.
Although extended efforts have been motivated towards copula-based bivariate simulation and the estimation of bivariate design variable quantiles under different notions of return period, such attempts may still be insufficient for a justifiable and comprehensive flood probability analysis, owing to the trivariate behaviour of floods. The potential damage is likely a function of multiple relevant vectors of the specified hydrological episode, such that ignoring the spatial dependency among these uncertain vectors can lead to an underestimation of uncertainty, which is frequently encountered during risk evaluation. Section ‘Trivariate joint dependency constructions via 3-dimensional copulas’ therefore discussed the applicability of 3-dimensional copula functions for establishing the trivariate joint simulation of the flood characteristics and the associated return periods, although the computational strategies are quite limited in the existing literature. Conventional trivariate copula simulation encounters some statistical issues, such as complexity in approximating justifiable parametric distributions for higher dimensional hydrological attributes, and it can also be quite ineffective at capturing and reflecting all the possible mutual concurrency among multidimensional flood vectors, as revealed in section ‘Vine copulas or PCC framework for trivariate joint distributions’. Owing to the high degree of uncertainty and the complex flood dependence structure, resolving the dependence of multivariate extremes via the conventional copula formulation is quite complex and demands a flexible methodology with precise estimation of the tail dependence coefficient under various tail dependencies.
To address these issues, the vine or pair-copula construction (PCC) provides a comprehensive way of characterizing uncertainty in higher-dimensional hydrological entities. It is based on decomposing the full multivariate density into a cascade of simple local building blocks via conditional independence and pair-copulas. Because of this conditional mixing through a stage-wise hierarchical nesting procedure, the pair-copula concept offers a highly effective and flexible modelling environment. The PCC structure nevertheless exhibits some modelling issues of its own, as revealed in section ‘Vine copulas or PCC framework for trivariate joint distributions’, which motivates the minimum information PCC: it captures complex multidimensional flood structures under various tail dependencies through precise estimation of the tail coefficient for the selected copulas, and it also facilitates modelling multivariate extremes when only a limited data length is available.
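The decomposition idea can be sketched for the trivariate case with a D-vine whose building blocks are bivariate Clayton copulas; the choice of pair-copula family and the parameter values are illustrative assumptions, not those of any particular study:

```python
import math

def clayton_density(u, v, theta):
    """Density of the bivariate Clayton copula (theta > 0)."""
    return (1.0 + theta) * (u * v) ** (-1.0 - theta) * \
           (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 2.0)

def clayton_h(u, v, theta):
    """h-function h(u | v) = dC(u, v)/dv, the conditional CDF of U given V = v."""
    return v ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta - 1.0)

def dvine_density_3d(u1, u2, u3, th12, th23, th13_2):
    """Trivariate D-vine copula density built from three bivariate pair-copulas:
    c(u1,u2,u3) = c12(u1,u2) * c23(u2,u3) * c13|2(h(u1|u2), h(u3|u2))."""
    h1 = clayton_h(u1, u2, th12)   # conditional pseudo-observation from tree 1
    h3 = clayton_h(u3, u2, th23)
    return clayton_density(u1, u2, th12) * clayton_density(u2, u3, th23) * \
           clayton_density(h1, h3, th13_2)

d = dvine_density_3d(0.3, 0.6, 0.8, 2.0, 1.5, 0.5)
print(d)
```

Unlike the symmetric trivariate copula, each of the three pair-copulas here may come from a different family with its own parameter, which is precisely the flexibility that the PCC framework exploits.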

The statistical significance of return periods under the multidimensional design concept, for tackling different hydrologic problems, is discussed in section ‘Return periods under multivariate settings’. In the multidimensional risk framework, return periods are derived from the exceedance probabilities of pairs of flood attributes. Joint return periods obtained from joint exceedance probabilities fall into two distinct groups: the primary return periods based on inclusive probabilities, namely the ‘AND’ and ‘OR’ return periods, and the secondary or ‘Kendall’ return period, defined through Kendall’s probability distribution or survival function. Using the standard definition of the return period based on inclusion probabilities (the primary return periods) can lead to underestimation of the correct value, because primary return periods capture the mean forecast and are therefore incapable of demonstrating the risk of supercritical or dangerous scenarios. The reliability of a hydraulic design system depends on defining exceedance probabilities for rare episodes, which points to the mathematical significance and derivation of the secondary return period derived from Kendall’s distribution and survival function, called the Kendall return period, as discussed in section ‘Demonstrating the risk of supercritical extreme via the Kendall’s distribution and survival functions (or secondary return periods)’.
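These three notions of return period can be written down compactly. The sketch below uses the Gumbel-Hougaard copula, for which the Kendall distribution has the closed form K_C(t) = t(1 - ln t / theta), and assumes an annual sampling interval (mu = 1) together with illustrative parameter and quantile values:

```python
import math

def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C(u, v)."""
    return math.exp(-(((-math.log(u)) ** theta + (-math.log(v)) ** theta) ** (1.0 / theta)))

def kendall_K(t, theta):
    """Kendall distribution of the Gumbel-Hougaard copula:
    K_C(t) = t - phi(t)/phi'(t) = t * (1 - ln(t) / theta)."""
    return t * (1.0 - math.log(t) / theta)

mu, theta, u, v = 1.0, 2.0, 0.99, 0.99        # hypothetical values, annual series
Cuv = gumbel_copula(u, v, theta)

T_or  = mu / (1.0 - Cuv)                      # 'OR': either attribute exceeds its threshold
T_and = mu / (1.0 - u - v + Cuv)              # 'AND': both attributes exceed their thresholds
T_ken = mu / (1.0 - kendall_K(Cuv, theta))    # secondary (Kendall) return period
print(T_or, T_and, T_ken)
```

The ordering T_or < T_ken < T_and that this example produces illustrates why relying on the primary return periods alone can misstate the risk of supercritical events.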

Some ideas for strengthening current multivariate practices by incorporating a time-varying copula framework

Although considerable effort has been devoted to the multivariate stochastic generation of flood characteristics via bivariate or trivariate copula simulations, for retrieving flood exceedance probabilities or design quantiles under different notions of return period, a defensive design task cannot be appropriately justified without addressing dynamic environmental forcings (climate change and/or LULCC), which such analyses usually treat as isolated from, or independent of, the flood phenomenon (Katz et al. 2002; Strupczewski and Kaczmarek 2001; Khaliq et al. 2006; El Adlouni et al. 2007; Villarini et al. 2009; Wigley 2009; Lopez-Paz et al. 2013; Condon et al. 2015). The consistency and accuracy of design quantiles estimated under a stationary risk framework may be doubtful, because they ignore changing conditions either in the univariate structure of the individual flood vectors (i.e. temporal variability in their mean and variance) or in their joint correlation structure (Zhang 2005; Bender et al. 2014; Galiatsatou and Prinos 2016; Sarhadi et al. 2016). Non-stationarity induced by external controlling factors disturbs the hydrological behaviour within a catchment region and can further alter the expected frequency of such extremes relative to time-invariant hydrologic risk assessments. Traditional flood modelling is designed under the hypothesis of independent and identically distributed (i.i.d.) hydrologic samples, and this assumption is usually adopted as a standard design procedure for tackling water-related issues; time-varying influences, however, disturb the statistical characteristics of the hydrological samples and can lead to non-stationarity (Gilroy and McCuen 2012; Lima et al. 2015).
Time-varying controlling covariates may stress hydrological characteristics differently in the future than they did in the past or present (Khaliq et al. 2006; Chebana et al. 2013; Jiang et al. 2015), so that the chance of future flood episodes is likely to change as a function of time; for example, the magnitude of the 100-year flood in a given year would change (Gilroy and McCuen 2012; Du et al. 2015). The effect of time-varying influences on flood exceedance probabilities may therefore make the actual associated risk either greater or smaller than the hazard statistics obtained under the stationary risk concept, leading to under-dimensioned or over-dimensioned designs (Lima et al. 2015). This matters especially in engineering-based hydraulic structural design, where it is essential to anticipate potential future changes in the design value in order to justify the anticipated structural design life both from the present and from the future perspective (Bender et al. 2014; Sarhadi et al. 2016). Numerous efforts have addressed dynamic influences on univariate hydrological characteristics through univariate extreme value modelling with covariate analysis (i.e. Strupczewski and Kaczmarek 2001; Coles 2001; Katz et al. 2002; Zhang 2005; Wong et al. 2006; Clarke 2007; El Adlouni et al. 2007; Villarini et al. 2010; Gilroy and McCuen 2012; Lopez and Frances 2013; Lima et al. 2015). The multidimensional behaviour of flood episodes, however, demands a multivariate stochastic framework in conjunction with treatment of the dynamic influences on the design quantile framework.
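A minimal sketch of such covariate analysis for one margin, assuming a Gumbel (EV1) marginal whose location drifts linearly in time, mu(t) = a + b*t, fitted by maximum likelihood to synthetic data (the trend, scale and sample size are invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
years = np.arange(80.0)
# synthetic annual maxima: Gumbel with an upward-drifting location (hypothetical)
x = rng.gumbel(loc=100.0 + 0.8 * years, scale=15.0)

def nll(params):
    """Negative log-likelihood of a Gumbel model with mu(t) = a + b*t."""
    a, b, log_scale = params
    scale = np.exp(log_scale)            # keep the scale parameter positive
    z = (x - (a + b * years)) / scale
    return np.sum(np.log(scale) + z + np.exp(-z))

b0, a0 = np.polyfit(years, x, 1)         # least-squares start values
resid = x - (a0 + b0 * years)
res = minimize(nll, x0=[a0, b0, np.log(resid.std())], method="Nelder-Mead")
a_hat, b_hat, scale_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(a_hat, b_hat, scale_hat)
```

With the fitted parameters, the non-stationary 100-year quantile at year t is a_hat + b_hat*t - scale_hat*log(-log(1 - 1/100)), so the design value itself becomes a function of time, which is exactly the point made above about the 100-year flood changing from year to year.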
In the traditional copula-based methodology, neither the marginal distribution parameters nor the copula-based joint dependence parameters are allowed to vary over time to incorporate the stress of covariates on the flood characteristics (Corbella and Stretch 2013; Jiang et al. 2015; Galiatsatou and Prinos 2016). To the best of our knowledge, only a few attempts have adopted multivariate modelling of hydrological characteristics within a dynamic copula framework, i.e. Corbella and Stretch (2013), Bender et al. (2014), Jiang et al. (2015), Sarhadi et al. (2016) and Galiatsatou and Prinos (2016). These computational strategies are usually segregated into two distinct stages: modelling the non-stationary behaviour of the univariate flood attributes through time-varying marginal distributions, and modelling the dynamic behaviour of the joint probability structure of the multiple random vectors through copulas with time-varying dependence parameters. The above literature shows that dynamic copula simulation has so far been incorporated mainly to demonstrate temporal variation in bivariate joint relationships and their return periods, whereas the importance of trivariate joint distributions and their return periods has already been pointed out above (i.e. Graler et al. 2013; Reddy and Ganguli 2013 and references therein). On the other hand, the flexibility of PCC and minimum information PCC structures for constructing higher-dimensional copulas has also been noted in the cited literature (i.e. Daneshkhah et al. 2015, 2016). The overall conclusion therefore points towards integrating the dynamic concept so as to capture temporal influences on the trivariate joint distribution and its design quantiles.
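One simple, purely empirical way to expose such temporal variation in the dependence structure, as a precursor to a fully parametric dynamic-copula likelihood, is to track Kendall's tau in a sliding window and map it to a time-varying Gumbel-Hougaard parameter theta(t) = 1/(1 - tau(t)). The synthetic series below, whose dependence strengthens over time, is an invented illustration:

```python
import numpy as np
from scipy.stats import kendalltau

def moving_window_theta(peaks, volumes, window=15):
    """Sliding-window estimate of a time-varying Gumbel-Hougaard parameter
    theta(t) = 1 / (1 - tau(t)); a simple empirical diagnostic, not a
    substitute for joint maximum-likelihood estimation."""
    thetas = []
    for start in range(len(peaks) - window + 1):
        tau, _ = kendalltau(peaks[start:start + window],
                            volumes[start:start + window])
        tau = min(max(tau, 0.0), 0.99)   # guard: Gumbel family needs tau in [0, 1)
        thetas.append(1.0 / (1.0 - tau))
    return np.array(thetas)

# synthetic flood-attribute series whose dependence strengthens over time (hypothetical)
rng = np.random.default_rng(7)
n = 60
common = rng.normal(size=n)
w = np.linspace(0.2, 0.9, n)             # growing weight of the shared signal
peaks = w * common + (1 - w) * rng.normal(size=n)
vols  = w * common + (1 - w) * rng.normal(size=n)
thetas = moving_window_theta(peaks, vols)
print(thetas[0], thetas[-1])
```

In a full dynamic-copula analysis of the kind advocated here, theta(t) would instead be given a parametric link function and estimated jointly with the time-varying marginals by maximum likelihood.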