1 Introduction

When the goal is to find an appropriate model for a random vector \(\textbf{X}\in \mathbb {R}^d\), a well-established strategy is to model the marginal behaviours and the dependence structure separately. This approach is possible thanks to a theorem of Sklar (1959), which states that there exists \(C:[0,1]^d \rightarrow [0,1]\) such that for all \({\textbf{x}}= (x_1, \ldots , x_d) \in \mathbb {R}^d\),

$$\begin{aligned} {\mathbb {P}}(\textbf{X}\le {\textbf{x}}) = C \left\{ {\mathbb {P}}(X_1\le x_1), \ldots , {\mathbb {P}}(X_d\le x_d) \right\} . \end{aligned}$$

Whenever the marginal distributions are continuous, the function C is unique and is called the copula of \(\textbf{X}\). See Nelsen (2007) and Joe (2015) for details on copulas. In that context, it is customary to assume that C belongs to a parametric family \(\mathcal C = \{ C_{\varvec{\theta }}; {\varvec{\theta }}\in \varTheta \subseteq \mathbb {R}^p \}\) and then estimate the unknown parameter from a sample of copies \(\textbf{X}_1, \ldots , \textbf{X}_T\) of \(\textbf{X}\).

A popular estimator of a copula parameter is the pseudo-maximum likelihood (PML) estimator introduced by Oakes (1994) and later investigated by Genest et al. (1995). In principle, this method is applicable regardless of the dimension of the vector and of the number of parameters. However, the PML estimator requires an explicit expression for the copula density, which is not always available; and even when the density is explicit, it may be intractable. The PML estimator can also be numerically unstable, especially when the family has several parameters. Some authors have proposed approaches to address these shortcomings. For example, minimum-distance estimators were considered by Biau and Wegkamp (2005) based on copula densities, and by Tsukahara (2005) and Weiß (2011) relying on goodness-of-fit metrics. When there is only one parameter to estimate, a common strategy is the inversion of Kendall’s tau. This estimator was considered by Genest et al. (2006) in the bivariate case and by Genest et al. (2011) for d-dimensional copulas.

This paper extends the use of the inversion of Kendall’s tau to families comprising multiple parameters. The proposed estimators are based on unbiased estimators of the moments of the multivariate probability integral transformation (MPIT) random variable, thereby avoiding the need to estimate the marginal distributions. The proposed method is similar to that of Brahimi and Necir (2012), who suggested using higher moments of the MPIT. However, their estimators are biased and their approach is limited to cases where the vector of theoretical moments is explicitly invertible. To circumvent this constraint, an approach based on simulated moments à la McFadden (1989) is adopted. Hence, the proposed estimators can be computed as soon as it is possible to simulate from a given copula family.

The manuscript is organized as follows. A generic method-of-moments estimator based on a vector of U-statistics is introduced in Sect. 2; its consistency and asymptotic normality are established under mild conditions for both the standard and simulated versions. Section 3 explains how to estimate the moments of the multivariate probability integral transformation without bias and describes the new estimators of copula parameters suitably adapted to the parametric structure at hand. In Sect. 4, the performance of the proposed estimators is investigated and compared to competing procedures through an extensive simulation study. Section 5 illustrates the proposed methodology through the modelling of multivariate data with copula models that have a complex parametric structure. Section 6 ends the paper with a brief discussion. The mathematical proofs can be found in an appendix and the Matlab code is available at www.uqtr.ca/MyMatlabWebpage.

2 A generic method-of-moments estimator of copula parameters

2.1 Statistical functionals

The estimators proposed in this paper are special cases of a generic method-of-moments estimator based on a vector of U-statistics. Specifically, let \(\textbf{X}= (X_1,\ldots , X_d)\) be a random vector from a d-variate distribution function F that has continuous marginals \(F_1, \ldots , F_d\) and a unique copula C. Consider the vector \({\varvec{\kappa }}= ({\varvec{\kappa }}_1, \ldots , {\varvec{\kappa }}_L)\), where for each \(\ell \in \{1, \ldots , L\}\), \({\varvec{\kappa }}_\ell := {\varvec{\kappa }}_\ell ({\textbf{x}}_1, \ldots , {\textbf{x}}_m)\) is a symmetric function in its m arguments. Then, for \(\textbf{X}_1, \ldots , \textbf{X}_m\) i.i.d. F, define the statistical functional

$$\begin{aligned} \mathcal S_{\varvec{\kappa }}(F) = {\mathbb {E}}\left\{ {\varvec{\kappa }}(\textbf{X}_1, \ldots , \textbf{X}_m) \right\} . \end{aligned}$$

In order to develop semi-parametric estimators of the parameters of a given family of copulas, it is necessary that \(\mathcal S_{\varvec{\kappa }}(F)\) be free of the marginal distributions. To this end, \({\varvec{\kappa }}\) must be such that \(\mathcal S_{\varvec{\kappa }}(F) = \mathcal S_{\varvec{\kappa }}(C)\), i.e.,

$$\begin{aligned} {\mathbb {E}}\left\{ {\varvec{\kappa }}(\textbf{X}_1, \ldots , \textbf{X}_m) \right\} = {\mathbb {E}}\left\{ {\varvec{\kappa }}({\textbf{U}}_1, \ldots , {\textbf{U}}_m) \right\} , \end{aligned}$$
(1)

where \({\textbf{U}}_1, \ldots , {\textbf{U}}_m\), with \({\textbf{U}}_j = ( F_1(X_{j1}), \ldots , F_d(X_{jd}) )\), are i.i.d. C.

Assuming a random sample \(\textbf{X}_1, \ldots , \textbf{X}_T\) i.i.d. F, an unbiased estimator of the vector of means \({\varvec{\mu }}_C^{\varvec{\kappa }}= {\mathbb {E}}\left\{ {\varvec{\kappa }}({\textbf{U}}_1, \ldots , {\textbf{U}}_m) \right\} \) is given by the L-dimensional vector of U-statistics

$$\begin{aligned} {\varvec{\mu }}_T^{\varvec{\kappa }}= {T \atopwithdelims ()m}^{-1} \sum _{1\le t_1< \ldots < t_m \le T} {\varvec{\kappa }}\left( \textbf{X}_{t_1}, \ldots , \textbf{X}_{t_m} \right) , \end{aligned}$$
(2)

where the sum of vectors is taken componentwise. From Theorem 3, p. 122 of Lee (1990), \({\varvec{\mu }}_T^{\varvec{\kappa }}\) converges almost surely to \({\varvec{\mu }}_C^{\varvec{\kappa }}\) as long as \({\mathbb {E}}\{ |{\varvec{\kappa }}(\textbf{X}_1,\ldots ,\textbf{X}_m)| \} < \infty \).
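To fix ideas, the U-statistic in (2) can be computed by brute-force enumeration of all m-subsets. The following is an illustrative Python sketch (the paper's own code is in MATLAB); the function name `u_stat_vector` and the concordance kernel used in the example are choices of this sketch, not part of the original implementation.

```python
import itertools
import math
import numpy as np

def u_stat_vector(X, kernel, m):
    """Average a symmetric L-valued kernel over all m-subsets of the sample (Eq. 2)."""
    T = X.shape[0]
    total = np.zeros_like(np.asarray(kernel(*X[:m]), dtype=float))
    for idx in itertools.combinations(range(T), m):
        total += np.asarray(kernel(*(X[i] for i in idx)), dtype=float)
    return total / math.comb(T, m)

# Kernel of order m = 2 whose expectation is E(W) = E{C(U)}: it equals 1/2 when
# one observation componentwise dominates the other, and 0 otherwise.  The value
# depends only on componentwise orderings, hence it is marginal-free, as in (1).
def concordance(x, y):
    return [0.5 * (np.all(x < y) + np.all(y < x))]

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 2))          # independence copula: E(W) = 1/4
mu_T = u_stat_vector(X, concordance, 2)   # close to 0.25 for this sample
```

The enumeration cost grows as \(\binom{T}{m}\), so vectorized or recursive formulas are preferable for large samples.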

The next result states the asymptotic normality of \(\sqrt{T} ({\varvec{\mu }}_T^{\varvec{\kappa }}- {\varvec{\mu }}_C^{\varvec{\kappa }})\). The proof is a direct application of Theorem 2, p. 76 of Lee (1990) and is therefore omitted. Here and in the sequel, \(\rightsquigarrow \) means “converges in distribution to”.

Lemma 1

Let \({\textbf{U}}_1, \ldots , {\textbf{U}}_m\) be i.i.d. C and define \({\varvec{\kappa }}_C^\star = ({\varvec{\kappa }}^\star _1, \ldots , {\varvec{\kappa }}^\star _L)\), where for each \(\ell \in \{ 1, \ldots , L \}\),

$$\begin{aligned} {\varvec{\kappa }}^\star _\ell ({\textbf{u}}) = m \left[ {\mathbb {E}}\left\{ {\varvec{\kappa }}_\ell \left( {\textbf{u}}, {\textbf{U}}_2, \ldots , {\textbf{U}}_m \right) \right\} - {\varvec{\mu }}_{C,\ell }^{\varvec{\kappa }}\right] . \end{aligned}$$

Provided that \(\varSigma _C^{\varvec{\kappa }}= {\mathbb {E}}\{ {\varvec{\kappa }}_C^\star ({\textbf{U}}_1)^\top {\varvec{\kappa }}_C^\star ({\textbf{U}}_1) \} \in \mathbb {R}^{L\times L}\) is non-singular, \(\sqrt{T} ({\varvec{\mu }}_T^{\varvec{\kappa }}- {\varvec{\mu }}_C^{\varvec{\kappa }}) \rightsquigarrow \mathcal {N}(\textbf{0}_L, \varSigma _C^{\varvec{\kappa }})\).

2.2 A generalized method-of-moments estimator

Suppose that the copula C of the population belongs to a copula family

$$\begin{aligned} \mathcal C = \left\{ C_{\varvec{\theta }}; {\varvec{\theta }}\in \varTheta \subset \mathbb {R}^L \right\} . \end{aligned}$$

In that case, there exists \({\varvec{\theta }}_0 \in \varTheta \), called the “true value”, such that \(C = C_{{\varvec{\theta }}_0}\). As is well known, a method-of-moments estimator consists of estimating \({\varvec{\theta }}_0\) by selecting \({\varvec{\theta }}\in \varTheta \) such that the sample moments match their theoretical counterparts. In the current context, let \({\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }}) = {\varvec{\mu }}_{C_{\varvec{\theta }}}^{\varvec{\kappa }}\) and assume that the map \({\varvec{\mu }}^{\varvec{\kappa }}: \varTheta \rightarrow \mathbb {R}^L\) is one-to-one on the open set \(\varTheta \subset \mathbb {R}^L\). A method-of-moments estimator of \({\varvec{\theta }}_0\) is then the unique solution of \({\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }}) = {\varvec{\mu }}_T^{\varvec{\kappa }}\), namely

$$\begin{aligned} {\varvec{\theta }}_T^{\varvec{\kappa }}= ({\varvec{\mu }}^{\varvec{\kappa }})^{-1}({\varvec{\mu }}_T^{\varvec{\kappa }}). \end{aligned}$$

The consistency and asymptotic normality of \({\varvec{\theta }}_T^{\varvec{\kappa }}\) are stated next.

Proposition 1

Let \(\textbf{X}_1, \ldots , \textbf{X}_T\) be i.i.d. \(C_{{\varvec{\theta }}_0}\) and assume that \({\varvec{\mu }}^{\varvec{\kappa }}: \varTheta \rightarrow \mathbb {R}^L\) is one-to-one and continuously differentiable at \({\varvec{\theta }}_0 \in \varTheta \) with nonsingular first-order derivative \({\varvec{\nu }}_0^{\varvec{\kappa }}\in \mathbb {R}^{L\times L}\) at \({\varvec{\theta }}_0\). Let also \(A_0^{\varvec{\kappa }}= ({\varvec{\nu }}_0^{\varvec{\kappa }})^{-1}\) and define \(\varSigma _0^{\varvec{\kappa }}\) as the covariance matrix in Lemma 1 when \(C = C_{{\varvec{\theta }}_0}\). Then \({\varvec{\theta }}_T^{\varvec{\kappa }}\) exists with probability tending to one and

$$\begin{aligned} \sqrt{T} \left( {\varvec{\theta }}_T^{\varvec{\kappa }}- {\varvec{\theta }}_0 \right) \rightsquigarrow \mathcal {N}\left( \textbf{0}_L, A_0^{\varvec{\kappa }}\, \varSigma _0^{\varvec{\kappa }}\, (A_0^{\varvec{\kappa }})^\top \right) . \end{aligned}$$

2.3 Simulated version of the generalized method-of-moments estimator

The proposed method-of-moments estimator is difficult to compute when \({\varvec{\mu }}^{\varvec{\kappa }}\) is not explicitly invertible; the situation is even worse when there is no explicit expression for \({\varvec{\mu }}^{\varvec{\kappa }}\) at all. In such cases, it is useful to express \({\varvec{\theta }}_T^{\varvec{\kappa }}\) as the minimum-distance estimator

$$\begin{aligned} {\varvec{\theta }}_T^{\varvec{\kappa }}= \mathop {\mathrm {arg\,min}}\limits _{{\varvec{\theta }}\in \varTheta } \left\{ {\varvec{\mu }}_T^{\varvec{\kappa }}- {\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }}) \right\} M_T \left\{ {\varvec{\mu }}_T^{\varvec{\kappa }}- {\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }}) \right\} ^\top , \end{aligned}$$

where \(M_T \in \mathbb {R}^{L\times L}\) is a weight matrix that converges in probability to a positive definite matrix \(M_0 \in \mathbb {R}^{L\times L}\) as \(T \rightarrow \infty \). Nevertheless, this expression does not solve the problem of cases where \({\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }})\) admits no explicit expression.

To avoid the above-mentioned drawbacks, a simulated version of the generic method-of-moments estimator \({\varvec{\theta }}_T^{\varvec{\kappa }}\) is proposed. The idea is in the same spirit as that investigated by Oh and Patton (2013) for copula-based time series models, which is itself inspired by the simulated method-of-moments estimators studied by McFadden (1989), Pakes and Pollard (1989) and Newey and McFadden (1994). To describe the idea in the current context, consider a version of \({\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }})\) based on a simulated sample \({\textbf{U}}_1^{\varvec{\theta }}, \ldots , {\textbf{U}}_S^{\varvec{\theta }}\) i.i.d. \(C_{\varvec{\theta }}\), namely

$$\begin{aligned} {\varvec{\mu }}_S^{\varvec{\kappa }}({\varvec{\theta }}) = {S \atopwithdelims ()m}^{-1} \sum _{1\le s_1< \ldots < s_m \le S} {\varvec{\kappa }}\left( {\textbf{U}}_{s_1}^{\varvec{\theta }}, \ldots , {\textbf{U}}_{s_m}^{\varvec{\theta }}\right) . \end{aligned}$$

The proposed simulated method-of-moments estimator of \({\varvec{\theta }}_0 \in \varTheta \) is then

$$\begin{aligned} {\varvec{\theta }}_{T,S}^{\varvec{\kappa }}= \mathop {\mathrm {arg\,min}}\limits _{{\varvec{\theta }}\in \varTheta } \{ {\varvec{\mu }}_T^{\varvec{\kappa }}- {\varvec{\mu }}_S^{\varvec{\kappa }}({\varvec{\theta }}) \} M_T \{ {\varvec{\mu }}_T^{\varvec{\kappa }}- {\varvec{\mu }}_S^{\varvec{\kappa }}({\varvec{\theta }}) \}^\top . \end{aligned}$$
(3)

Proposition 2 states the consistency of \({\varvec{\theta }}_{T,S}^{\varvec{\kappa }}\) as \(T,S \rightarrow \infty \), i.e., \({\varvec{\theta }}_{T,S}^{\varvec{\kappa }}\) converges in probability to the true value \({\varvec{\theta }}_0 \in \varTheta \). The conditions under which this happens are mild; in particular, it is no longer required that \({\varvec{\mu }}^{\varvec{\kappa }}\) be one-to-one.
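As an illustration of the estimator in (3), here is a minimal Python sketch (the authors' code is in MATLAB) for the one-parameter bivariate Clayton family, where Kendall's tau serves as the single moment and \(M_T\) is the identity. The function names, the use of common random numbers, and the sampling of the gamma frailty through its inverse CDF (so that the simulated moment varies smoothly in \(\theta \)) are choices of this sketch, not prescriptions of the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

def tau_hat(U):
    """Empirical Kendall's tau: mean of sign((xi-xj)(yi-yj)) over all pairs."""
    sgn = np.sign(np.subtract.outer(U[:, 0], U[:, 0])
                  * np.subtract.outer(U[:, 1], U[:, 1]))
    return sgn[np.triu_indices(U.shape[0], 1)].mean()

def smm_clayton(U_data, S=500, seed=42):
    """Simulated method-of-moments for Clayton (L = 1, identity weight)."""
    mu_T = tau_hat(U_data)
    rng = np.random.default_rng(seed)     # seed fixed once: the same uniforms
    u_frail = rng.uniform(size=S)         # and exponentials are reused at every
    e = rng.exponential(size=(S, 2))      # evaluation of the objective
    def mu_S(theta):
        v = gamma.ppf(u_frail, 1.0 / theta)            # frailty, smooth in theta
        U = (1.0 + e / v[:, None]) ** (-1.0 / theta)   # Marshall-Olkin step
        return tau_hat(U)
    obj = lambda th: (mu_T - mu_S(th)) ** 2
    return minimize_scalar(obj, bounds=(0.05, 20.0), method="bounded").x

# Data from the Clayton copula with theta = 2 (Kendall's tau = 1/2).
rng = np.random.default_rng(1)
v = rng.gamma(0.5, size=300)
U = (1.0 + rng.exponential(size=(300, 2)) / v[:, None]) ** (-0.5)
theta_hat = smm_clayton(U)                # close to the true value 2
```

Freezing the seed makes \({\varvec{\mu }}_S^{\varvec{\kappa }}({\varvec{\theta }})\) a deterministic function of \({\varvec{\theta }}\), which is what allows a standard optimizer to minimize (3).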

Proposition 2

Let \(\textbf{X}_1, \ldots , \textbf{X}_T\) be i.i.d. from a distribution with continuous marginals and unique copula that belongs to a family \(\{ C_{\varvec{\theta }}; {\varvec{\theta }}\in \varTheta \subset \mathbb {R}^L \}\), where \(\varTheta \) is compact. For \({\varvec{\theta }}_0 \in \varTheta \) being the true value, assume that \({\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }}_0) \ne {\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }})\) as soon as \({\varvec{\theta }}\ne {\varvec{\theta }}_0\) and that \({\varvec{\mu }}^{\varvec{\kappa }}\) is Lipschitz continuous on \(\varTheta \). Then, as \(T, S \rightarrow \infty \), \({\varvec{\theta }}_{T,S}^{\varvec{\kappa }}\) is consistent for \({\varvec{\theta }}_0\).

Remark 1

Unlike standard results on the consistency of simulated method-of-moments estimators (see Pakes and Pollard 1989; McFadden 1989, for instance), Proposition 2 allows S and T to go to infinity at different rates. In other words, it is assumed more generally that \(T/S \rightarrow \zeta \in [0,\infty )\) as \(T,S \rightarrow \infty \), so that the case when \({\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }})\) is explicit is recovered in the limit when \(T/S \rightarrow 0\).

Lemma 2

A sufficient condition for the Lipschitz continuity of \({\varvec{\mu }}^{\varvec{\kappa }}\) is that the density \(c_{\varvec{\theta }}\) of \(C_{\varvec{\theta }}\) be uniformly Lipschitz, i.e., there exists \(K \in (0,\infty )\) such that

$$\begin{aligned} \sup _{{\textbf{u}}\in [0,1]^d} \left| c_{{\varvec{\theta }}_1}({\textbf{u}}) - c_{{\varvec{\theta }}_2}({\textbf{u}}) \right| \le K \left\| {\varvec{\theta }}_1 - {\varvec{\theta }}_2 \right\| , \end{aligned}$$

where \(\Vert \cdot \Vert \) is the Euclidean norm.

The uniform Lipschitz property as stated in Lemma 2 holds for the Farlie–Gumbel–Morgenstern family of bivariate copulas whose members have density \(c_\theta (u_1,u_2) = 1 + \theta (1-2u_1)(1-2u_2)\) for \(\theta \in [-1,1]\), since one readily obtains

$$\begin{aligned} \left| c_{\theta _1}(u_1,u_2) - c_{\theta _2}(u_1,u_2) \right| \le \left| \theta _1 - \theta _2 \right| . \end{aligned}$$

However, this uniform Lipschitz condition is rather strong and fails for many commonly used copula families.
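The displayed bound for the Farlie–Gumbel–Morgenstern family can be checked numerically; the following sketch evaluates the density gap over a grid (the grid resolution and parameter values are arbitrary choices of this illustration):

```python
import numpy as np

# FGM copula density: c_theta(u1, u2) = 1 + theta * (1 - 2*u1) * (1 - 2*u2).
def c_fgm(theta, u1, u2):
    return 1.0 + theta * (1.0 - 2.0 * u1) * (1.0 - 2.0 * u2)

u = np.linspace(0.0, 1.0, 201)
U1, U2 = np.meshgrid(u, u)
t1, t2 = -0.7, 0.4
gap = np.abs(c_fgm(t1, U1, U2) - c_fgm(t2, U1, U2)).max()
# The supremum equals |t1 - t2|, attained at the corners of [0,1]^2 where
# |(1 - 2*u1)(1 - 2*u2)| = 1, so K = 1 works in the Lipschitz bound.
```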

It is possible to show the almost sure convergence (strong consistency) of \({\varvec{\theta }}_{T,S}^{\varvec{\kappa }}\), but its weak consistency is enough to establish its asymptotic normality. This result is precisely the subject of the next proposition.

Proposition 3

Let \(\textbf{X}_1, \ldots , \textbf{X}_T\) be i.i.d. from a distribution with continuous marginals and unique copula that belongs to a family \(\{ C_{\varvec{\theta }}; {\varvec{\theta }}\in \varTheta \subset \mathbb {R}^L \}\). Assume that the true value \({\varvec{\theta }}_0\) is an interior point of \(\varTheta \) and that \(g_0^{\varvec{\kappa }}({\varvec{\theta }}) = {\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }}_0) - {\varvec{\mu }}^{\varvec{\kappa }}({\varvec{\theta }})\) possesses a derivative \(G_0^{\varvec{\kappa }}\) at \({\varvec{\theta }}_0\) such that \(B_0^{\varvec{\kappa }}= G_0^{\varvec{\kappa }}\, M_0 \, (G_0^{\varvec{\kappa }})^\top \) is nonsingular. Also, suppose that for \(g_{T,S}^{\varvec{\kappa }}({\varvec{\theta }}) = {\varvec{\mu }}_T^{\varvec{\kappa }}- {\varvec{\mu }}_S^{\varvec{\kappa }}({\varvec{\theta }})\),

$$\begin{aligned} g_{T,S}^{\varvec{\kappa }}({\varvec{\theta }}_{T,S}^{\varvec{\kappa }}) \, M_T \, g_{T,S}^{\varvec{\kappa }}({\varvec{\theta }}_{T,S}^{\varvec{\kappa }})^\top - \inf _{{\varvec{\theta }}\in \varTheta } g_{T,S}^{\varvec{\kappa }}({\varvec{\theta }}) \, M_T \, g_{T,S}^{\varvec{\kappa }}({\varvec{\theta }})^\top \le o_{\mathbb {P}}(T^{-1}). \end{aligned}$$

Then for \(\varOmega _0^{\varvec{\kappa }}= (B_0^{\varvec{\kappa }})^{-1} \, G_0^{\varvec{\kappa }}\, M_0 \, \varSigma _0^{\varvec{\kappa }}\, M_0 \, (G_0^{\varvec{\kappa }})^\top (B_0^{\varvec{\kappa }})^{-1}\),

$$\begin{aligned} \sqrt{ T S \over T+S } \left( {\varvec{\theta }}_{T,S}^{\varvec{\kappa }}- {\varvec{\theta }}_0 \right) \rightsquigarrow \mathcal {N}\left( \textbf{0}_L, \varOmega _0^{\varvec{\kappa }}\right) ~\text{ as } T, S \rightarrow \infty . \end{aligned}$$

3 Moments of the multivariate probability integral transformation

3.1 Population versions

The estimators that will be developed in the sequel are based on the multivariate probability integral transformation (MPIT). The MPIT was first introduced by Genest and Rivest (1993) as a tool for testing the fit to bivariate copulas. Specifically, the MPIT of a random vector \(\textbf{X}= (X_1, \ldots , X_d) \sim F\) is the random variable \(W = F(\textbf{X})\). When the marginals \(F_1, \ldots , F_d\) of F are continuous, one can invoke Sklar’s Theorem and write

$$\begin{aligned} W = C \left\{ F_1(X_1), \ldots , F_d(X_d) \right\} = C({\textbf{U}}), \end{aligned}$$

where \({\textbf{U}}= (F_1(X_1), \ldots , F_d(X_d)) \sim C\). In other words, the stochastic behaviour of W depends only on the unique copula C of F.

For certain parametric structures, it is advantageous to work with the pair-by-pair probability integral transformations related to \(\textbf{X}\). Specifically, let \(C_{jj'}\) be the copula of the pair \((X_j,X_{j'})\) and consider \(W_{jj'} = C_{jj'}(U_j,U_{j'})\), where \(U_j = F_j(X_j)\) and \(U_{j'} = F_{j'}(X_{j'})\). The first moment of \(W_{jj'}\) is related to Kendall’s measure of association of \((U_j,U_{j'})\) through the relationship

$$\begin{aligned} \tau _{jj'} = 4 \, {\mathbb {E}}(W_{jj'}) - 1. \end{aligned}$$

A similar claim can be made about a d-variate extension due to Kendall and Smith (1940) defined as the mean of the pair-by-pair Kendall’s tau. Another extension of Kendall’s tau proposed by Joe (1990) is related to the first moment of W. These two d-variate Kendall measures are given respectively by

$$\begin{aligned} \tau _d = {d\atopwithdelims ()2}^{-1} \sum _{j<j'} \left\{ 4 \, {\mathbb {E}}(W_{jj'}) - 1 \right\} \quad \text{ and }\quad \widetilde{\tau }_d = { 2^d \, {\mathbb {E}}(W) - 1 \over 2^{d-1} - 1 } . \end{aligned}$$
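Both multivariate versions can be estimated from unbiased pairwise and joint estimators of \({\mathbb {E}}(W)\). Here is a small Python sketch under the independence copula, where both measures equal zero; the function names are illustrative and the brute-force pair enumeration is only practical for moderate T.

```python
import math
import numpy as np
from itertools import combinations

def mean_W(X):
    """Unbiased U-statistic for E(W) with kernel {I(x1<x2) + I(x2<x1)}/2,
    where < is taken componentwise."""
    T = X.shape[0]
    s = sum(1.0 for i, j in combinations(range(T), 2)
            if np.all(X[i] < X[j]) or np.all(X[j] < X[i]))
    return 0.5 * s / math.comb(T, 2)

def multivariate_taus(X):
    d = X.shape[1]
    pair = [4.0 * mean_W(X[:, [j, jp]]) - 1.0
            for j, jp in combinations(range(d), 2)]
    tau_d = float(np.mean(pair))                                   # Kendall-Smith
    tau_joe = (2.0 ** d * mean_W(X) - 1.0) / (2.0 ** (d - 1) - 1.0)  # Joe
    return tau_d, tau_joe

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))        # independent uniforms: both taus near 0
tau_d, tau_joe = multivariate_taus(X)
```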

3.2 Unbiased estimation of the moments of the MPIT

The next proposition provides an unbiased estimator of the moments of W based on a sample \(\textbf{X}_1, \ldots , \textbf{X}_T\) i.i.d. from a d-variate distribution F with continuous marginals \(F_1, \ldots , F_d\) and unique copula C.

Proposition 4

For \(a \in \mathbb {N}\), let \(\mu _a = {\mathbb {E}}(W^a)\), where \(W = C({\textbf{U}})\) and \({\textbf{U}}\sim C\). Then an unbiased estimator of \(\mu _a\) is provided by the U-statistic \(\widehat{\mu }_a\) with symmetric kernel of order \(a+1\) defined for \({\textbf{x}}_1, \ldots , {\textbf{x}}_{a+1} \in \mathbb {R}^d\) by

$$\begin{aligned} \mathcal {K}_a({\textbf{x}}_1, \ldots , {\textbf{x}}_{a+1}) = \frac{1}{a+1} \sum _{j=1}^{a+1} \mathbb {I}\left\{ \left( \max _{\ell \in \{1, \ldots , a+1 \}, \ell \ne j} {\textbf{x}}_\ell \right) < {\textbf{x}}_j \right\} . \end{aligned}$$

This kernel satisfies (1), i.e., is marginal-free.

Note that when \(a=1\), \(\mathcal {K}_1({\textbf{x}}_1,{\textbf{x}}_2) = \{ \mathbb {I}({\textbf{x}}_1< {\textbf{x}}_2) + \mathbb {I}({\textbf{x}}_2 < {\textbf{x}}_1) \} / 2\). This is, up to a constant, the kernel of the empirical Kendall’s measure of association.
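A direct, if brute-force, Python implementation of the kernel \(\mathcal {K}_a\) and of the U-statistic \(\widehat{\mu }_a\) may look as follows; the naming and the small sample size are illustrative, and the cost grows as \(\binom{T}{a+1}\):

```python
import math
import numpy as np
from itertools import combinations

def kernel_K(points, a):
    """K_a(x_1, ..., x_{a+1}): fraction of the a+1 points that componentwise
    dominate the maximum of the other a points."""
    hits = 0
    for j in range(a + 1):
        others_max = np.delete(points, j, axis=0).max(axis=0)
        hits += np.all(others_max < points[j])
    return hits / (a + 1)

def mu_hat(X, a):
    """Unbiased U-statistic for mu_a = E(W^a), as in Proposition 4."""
    T = X.shape[0]
    s = sum(kernel_K(X[list(idx)], a) for idx in combinations(range(T), a + 1))
    return s / math.comb(T, a + 1)

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 2))   # independence copula: mu_1 = 1/4, mu_2 = 1/9
m1, m2 = mu_hat(X, 1), mu_hat(X, 2)
```

Unbiasedness follows because, for \({\textbf{U}}_1, \ldots , {\textbf{U}}_{a+1}\) i.i.d. C, the probability that a others fall below a fixed \({\textbf{u}}\) is \(\{C({\textbf{u}})\}^a\), whose expectation is \(\mu _a\).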

One is now in a position to establish the convergence in distribution of the vector of U-statistics \(\widehat{\varvec{\mu }}= ( \widehat{\mu }_1, \ldots , \widehat{\mu }_L)\).

Proposition 5

Let \(\textbf{X}_1, \ldots , \textbf{X}_T\) be i.i.d. from a d-variate distribution with continuous marginals and unique copula C. For \(\widehat{\varvec{\mu }}= ( \widehat{\mu }_1, \ldots , \widehat{\mu }_L)\) and \({\varvec{\mu }}= ( \mu _1, \ldots , \mu _L)\), \(\sqrt{T} (\widehat{\varvec{\mu }}- {\varvec{\mu }})\) converges in distribution to the L-variate normal distribution with zero mean vector and variance-covariance matrix \(\varSigma \in \mathbb {R}^{L\times L}\) such that for each \(a, a' \in \{ 1, \ldots , L \}\), \(\varSigma _{aa'} = {\mathbb {E}}\{ \mathcal {K}_a^\star ({\textbf{U}}) \, \mathcal {K}_{a'}^\star ({\textbf{U}}) \}\), \({\textbf{U}}\sim C\), where

$$\begin{aligned} \mathcal {K}_a^\star ({\textbf{u}}) = \{ C({\textbf{u}}) \}^a + a \, {\mathbb {E}}\left\{ \mathbb {I}({\textbf{U}}>{\textbf{u}}) \left( C({\textbf{U}}) \right) ^{a-1} \right\} - (a+1) \, \mu _a. \end{aligned}$$

3.3 Estimators of copula parameters

Let \(\textbf{X}_1, \ldots , \textbf{X}_T\) be a random sample of independent and identically distributed \(\mathbb {R}^d\)-valued vectors whose joint distribution F has continuous marginal distributions. It is assumed that the unique copula C of F belongs to \(\{ C_{\varvec{\theta }}; {\varvec{\theta }}\in \varTheta \subseteq \mathbb {R}^L \}\) and the goal is to estimate the unknown parameter \({\varvec{\theta }}_0 \in \varTheta \subset \mathbb {R}^L\).

One possibility to estimate \({\varvec{\theta }}_0\) is to use the first L moments of \(W = C_{\varvec{\theta }}({\textbf{U}})\), i.e., let \({\varvec{\mu }}({\varvec{\theta }}) = ({\mathbb {E}}(W), \ldots , {\mathbb {E}}(W^L) )\), where it is understood that the expectation is taken with respect to \(C_{\varvec{\theta }}\). An empirical version of \({\varvec{\mu }}({\varvec{\theta }})\) is the vector \({\varvec{\mu }}_T\) of the first L empirical moments. The corresponding vector of kernels is then \({\varvec{\kappa }}= (\mathcal {K}_1, \ldots , \mathcal {K}_L)\), where \(\mathcal {K}_a\) is defined in Proposition 4. The simulated method-of-moments estimator is thus of the form given in (3), i.e.,

$$\begin{aligned} \widehat{\varvec{\theta }}= \mathop {\mathrm {arg\,min}}\limits _{{\varvec{\theta }}\in \varTheta } \{ {\varvec{\mu }}_T - {\varvec{\mu }}_S({\varvec{\theta }}) \} M_T \{ {\varvec{\mu }}_T - {\varvec{\mu }}_S({\varvec{\theta }}) \}^\top , \end{aligned}$$
(4)

where \({\varvec{\mu }}_S({\varvec{\theta }})\) estimates \({\varvec{\mu }}({\varvec{\theta }})\) based on a sample of size S from \(C_{\varvec{\theta }}\).

Many parametrization schemes have a pair-by-pair structure of the form \({\varvec{\theta }}= (\varSigma ,{\varvec{\gamma }})\), where \(\varSigma \in \mathbb {R}^{d\times d}\) is a correlation matrix whose off-diagonal entry \(\varSigma _{jj'}\) appears only in the distribution of \((X_j,X_{j'})\) for each \(j \ne j' \in \{1,\ldots ,d\}\). This pattern occurs, for example, for models derived from the multivariate Normal. If \({\varvec{\gamma }}\in \mathbb {R}^q\) is a parameter that appears in the distribution of every sub-vector of \(\textbf{X}\), one can estimate \(\varSigma _{jj'}\) and \({\varvec{\gamma }}\) from the first \(q+1\) moments of \(W_{jj'} = C_{\varSigma _{jj'},{\varvec{\gamma }}}(U_j,U_{j'})\), yielding \(\widehat{\varSigma }_{jj'}\) and \(\widehat{{\varvec{\gamma }}}_{jj'}\). The global parameter can then be estimated with

$$\begin{aligned} \widehat{{\varvec{\gamma }}} = {d\atopwithdelims ()2}^{-1} \sum _{j<j' \in \{1,\ldots ,d\}} \widehat{{\varvec{\gamma }}}_{jj'}. \end{aligned}$$
(5)

Remark 2

It is worth mentioning the work of Brahimi and Necir (2012), who also suggested using the first L moments of W. Their methodology is however limited to cases where the vector of theoretical moments is explicitly invertible, and it relies on a biased estimation of the moments based on the empirical copula.

3.4 On the use of alternative probability integral random variables

Other moments of a copula could be used for parameter estimation. For instance, as pointed out by Quessy (2009), Spearman’s measure of association and some of its multivariate extensions can be expressed as the expectation of a symmetric kernel of a U-statistic. Indeed, Spearman’s rho is an affine transformation of the expectation of \(W_\textrm{Sp} = F_1(X_1) \times \cdots \times F_d(X_d)\), where \(\textbf{X}= (X_1, \ldots , X_d)\) follows a distribution F with continuous marginals \(F_1, \ldots , F_d\). One could then consider \({\mathbb {E}}(W_\textrm{Sp}^a)\), but its estimation with a U-statistic involves a kernel of order \(a \times (d+1)\). To see this, note that

$$\begin{aligned} {\mathbb {E}}(W_\textrm{Sp}^a) = {\mathbb {E}}_\textbf{X}\left[ \left\{ {\mathbb {E}}\, \mathbb {I}\left( \textbf{Y}^\varPi \le \textbf{X}\right) \right\} ^a \right] = {\mathbb {E}}_{\textbf{X},\textbf{Y}_1^\varPi , \ldots ,\textbf{Y}_a^\varPi } \left\{ \prod _{j=1}^a \mathbb {I}\left( \textbf{Y}_j^\varPi \le \textbf{X}\right) \right\} , \end{aligned}$$

where \(\textbf{Y}_1^\varPi , \ldots , \textbf{Y}_a^\varPi \) are i.i.d. \(F_1 \times \cdots \times F_d\) and \(\textbf{X}\sim F\). The expression inside the brackets involves \(a \times (d+1)\) independent random variables. This makes the use of \(W_\textrm{Sp}\) less attractive than the use of W, especially as the number of parameters increases.

4 Sampling properties of the estimators

4.1 Preliminaries

This section investigates the performance of the estimators defined in Eqs. (4) and (5). Comparisons with competing estimators are also made. In the sequel, the accuracy of an estimator \(\widehat{\theta }\) of a given parameter \(\theta \in \mathbb {R}\) is measured by its relative bias (RB) and relative root mean-squared error (RRMSE), namely

$$\begin{aligned} \textrm{RB}_\theta (\widehat{\theta }) = {1\over \theta } \, {\mathbb {E}}\left( \widehat{\theta }- \theta \right) \quad \text{ and }\quad \textrm{RRMSE}_\theta (\widehat{\theta }) = {1\over \theta } \sqrt{ {\mathbb {E}}\left\{ \left( \widehat{\theta }-\theta \right) ^2 \right\} } . \end{aligned}$$
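These two accuracy measures are straightforward to compute from Monte Carlo replicates; a quick Python sketch (the replicates below are toy values, not the paper's simulation design):

```python
import numpy as np

def rb_rrmse(estimates, theta):
    """Relative bias and relative root mean-squared error of replicated estimates."""
    err = np.asarray(estimates) - theta
    rb = err.mean() / theta
    rrmse = np.sqrt((err ** 2).mean()) / theta
    return rb, rrmse

rng = np.random.default_rng(0)
replicates = 2.0 + 0.1 * rng.standard_normal(1000)  # toy estimates of theta = 2
rb, rrmse = rb_rrmse(replicates, 2.0)               # rb near 0, rrmse near 0.05
```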

As explained by Oh and Patton (2013) (see also Gouriéroux et al. 1996), it is crucial that the seed of the random number generator be fixed across evaluations of the objective function, so that the same simulated datasets of size S are used throughout the optimization; otherwise, the evaluation of the objective is unstable and the optimization fails to converge. The minimization in Eq. (3) is performed using the MATLAB routine fminsearchbnd written by John D’Errico; unlike, for example, Newton-type algorithms, this routine does not require the existence of derivatives. The maximum number of iterations has been set to 40. Many choices are possible for the weight matrix, including the efficient choice given by the inverse of the asymptotic covariance matrix of the moments. However, in line with a recommendation by Oh and Patton (2013), only \(M_T = \textrm{I}_L\) is considered throughout in order to simplify the analyses.

4.2 Calibration of the simulated estimator

A popular estimator in the case of one-parameter bivariate copula families \(\mathcal C = \{ C_\theta ; \theta \in \varTheta \subseteq \mathbb {R}\}\) is the inversion of Kendall’s tau (IKT). Specifically, if \((X_{11},X_{12}), \ldots , (X_{T1},X_{T2})\) are random pairs from a population with continuous marginals and a copula \(C \in \mathcal C\), the IKT estimator is defined by

$$\begin{aligned} \widehat{\theta }^\tau = \tau _C^{-1}(\tau _T). \end{aligned}$$

In that expression, \(\tau _C\) is the population value of Kendall’s tau as a function of \(\theta \), i.e.,

$$\begin{aligned} \tau _C(\theta ) = 4 \int _0^1 \int _0^1 C_\theta (u_1,u_2) \textrm{d}C_\theta (u_1,u_2) - 1, \end{aligned}$$

and \(\tau _T\) is its empirical counterpart, i.e.,

$$\begin{aligned} \tau _T = {4 \over T(T-1)} \sum _{i<j} \mathbb {I}\left\{ (X_{i1}-X_{j1}) (X_{i2}-X_{j2}) > 0 \right\} - 1. \end{aligned}$$
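For the Clayton family, \(\tau _C(\theta ) = \theta /(\theta +2)\) inverts explicitly to \(\theta = 2\tau /(1-\tau )\), so the IKT estimator takes one line once \(\tau _T\) is computed. A Python sketch follows; sampling through the gamma-frailty (Marshall–Olkin) construction is an implementation choice of this illustration:

```python
import numpy as np

def kendall_tau(x, y):
    """tau_T: average of sign((xi - xj)*(yi - yj)) over all pairs i < j."""
    sgn = np.sign(np.subtract.outer(x, x) * np.subtract.outer(y, y))
    return sgn[np.triu_indices(len(x), 1)].mean()

# Clayton data with theta = 2, i.e., Kendall's tau = theta/(theta + 2) = 1/2.
rng = np.random.default_rng(3)
v = rng.gamma(0.5, size=500)                 # gamma frailty with shape 1/theta
U = (1.0 + rng.exponential(size=(500, 2)) / v[:, None]) ** (-0.5)

tau_T = kendall_tau(U[:, 0], U[:, 1])
theta_ikt = 2.0 * tau_T / (1.0 - tau_T)      # inversion of Kendall's tau
```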

Whereas the simulated method-of-moments estimator is designed for cases where an explicit expression for the vector of moments is unavailable, it can also be applied to copula families whose moments are explicit. This is the case for the one-parameter Clayton, Gumbel, Normal and Frank bivariate copula families described in Table 1.

Table 1 Copula \(C_\theta \), Kendall’s tau \(\tau _C(\theta )\) and parameter space \(\varTheta \) for parametric families of one-parameter bivariate copulas

An investigation of the accuracy of \(\widehat{\theta }^\tau \) and of its simulated version \(\widehat{\theta }\) has been performed in the light of their RRMSE estimated from 1000 replicates. This study provides information not only on the loss of accuracy incurred by using the simulated version instead of the mathematical inversion, but also on the role of the number of simulated observations S in the performance of \(\widehat{\theta }\). To this end, the values \(S \in \{ 100, 250, 500 \}\) have been considered; the corresponding estimators are referred to respectively as \(\widehat{\theta }_{100}\), \(\widehat{\theta }_{250}\) and \(\widehat{\theta }_{500}\). The results for sample sizes \(T \in \{50,100\}\) and for a Kendall’s tau that belongs to \(\{ 1/4, 1/2, 3/4 \}\) are found in Table 2. Note that the expression of Kendall’s tau for Frank’s copula has been inverted numerically.

Table 2 Estimation, based on 1000 replicates, of the relative root mean-squared error (RRMSE) of the inversion of Kendall’s tau estimator \(\widehat{\theta }^\tau \) and of the simulated estimator \(\widehat{\theta }\) when \(S \in \{100,250,500\}\) for sample sizes \(T \in \{50,100\}\) under four one-parameter families of bivariate copulas

Looking at Table 2, one can see that the RRMSE values are smaller when \(T=100\) than when \(T=50\), as expected. It is also not a surprise that \(\widehat{\theta }^\tau \) is more accurate than its simulated versions. However, the loss of efficiency from using \(\mu _S(\theta )\) instead of \(\mu (\theta )\) is rather mild as soon as \(S=250\). In fact, when \(T=100\), the average of the relative efficiency \(\textrm{RRMSE}(\widehat{\theta }^\tau ) / \textrm{RRMSE}(\widehat{\theta })\) over the twelve models is 74.0%, 87.2% and 93.3% when \(S=100\), 250 and 500, respectively. Note finally that for all the models, the RRMSE decreases as the level of dependence, i.e. \(\tau _C\), increases.

4.3 On the use of higher moments of the MPIT

One could ask whether considering higher moments of W can lead to better estimators. To answer this question, at least in part, let \(\widehat{\theta }_{\{2\}}\) and \(\widehat{\theta }_{\{1,2\}}\) be the simulated method-of-moments estimators based respectively on the second moment of W and on its first two moments. The estimated relative bias and relative root mean-squared error of these two estimators, as well as of \(\widehat{\theta }\), are found in Table 3 for the one-parameter families of Table 1. The sample size is \(T=100\) and the number of simulated samples has been set to \(S=100\); it is reasonable to think that higher values of S would not much influence the relative efficiency of the estimators. The conclusion from Table 3 is that using higher moments does not result in more efficient estimators. Even though the RRMSE of \(\widehat{\theta }_{\{1,2\}}\) is slightly smaller than that of \(\widehat{\theta }\) in some cases, the gain in accuracy is rather small. As a general rule, one can advocate using the simulated method-of-moments estimator with as many moments of W as there are parameters to estimate.

Table 3 Estimation, based on 1000 replicates, of the relative bias (RB) and relative root mean-squared error (RRMSE) of the simulated method-of-moments estimators based on the first, second, and first two moments of W when \(T=100\) and \(S=100\) under four one-parameter families of bivariate copulas

4.4 Comparison with competing procedures

In this section, the performance of \(\widehat{\varvec{\theta }}\) will be compared to two other semi-parametric procedures, namely the pseudo-maximum likelihood (PML) estimator and the simulated method-of-moments estimator of Oh and Patton (2013). Other possibilities exist, for instance minimum-distance (MD) estimators derived from goodness-of-fit criteria. Based on an extensive simulation study, however, Weiß (2011) concluded that no MD estimator stands out and that all are worse than the PML estimator in terms of bias and mean-squared error. For that reason, no MD estimator will be considered in the upcoming simulation study.

The PML estimator is the rank-based version of the maximum-likelihood estimator and is sometimes referred to as the canonical maximum likelihood estimator (see Cherubini et al. 2004, for instance). If \(C_{\varvec{\theta }}\) admits a density \(c_{\varvec{\theta }}\), the PML estimator of \({\varvec{\theta }}\) as defined by Shih and Louis (1995) and Genest et al. (1995) is

$$\begin{aligned} \widehat{\varvec{\theta }}_\textrm{ML} = \mathop {\mathrm {arg\,max}}\limits _{{\varvec{\theta }}\in \varTheta } \sum _{t=1}^T \ln c_{\varvec{\theta }}\left( { R_{t1} \over T+1 }, \ldots , { R_{td} \over T+1 } \right) , \end{aligned}$$

where for each \(j \in \{ 1, \ldots , d \}\), \(R_{tj}\) is the rank of \(X_{tj}\) among \(X_{1j}, \ldots , X_{Tj}\). The estimator of Oh and Patton (2013) is based on the vector of statistics \({\varvec{\nu }}_T = (\rho _T^\textrm{Sp}, \lambda _T^{.05}, \lambda _T^{.10}, \lambda _T^{.90}, \lambda _T^{.95})\), where \(\rho _T^\textrm{Sp}\) is Spearman’s rank correlation and

$$\begin{aligned} \lambda _T^q = \left\{ \begin{array}{ll} \displaystyle {C_T(q,q) \over q}, &{} \text{ when } q\le 1/2; \\[2ex] \displaystyle {1 - 2q + C_T(q,q) \over 1-q}, &{} \text{ when } q>1/2, \end{array} \right. \end{aligned}$$

where \(C_T\) is the bivariate empirical copula. Recall that

$$\begin{aligned} C_T(u_1,u_2) = {1\over T} \sum _{t=1}^T \mathbb {I}\left( {R_{t1} \over T+1} \le u_1, {R_{t2} \over T+1} \le u_2 \right) . \end{aligned}$$

Letting \({\varvec{\nu }}_S({\varvec{\theta }})\) be a version of \({\varvec{\nu }}_T\) based on a sample of size S from \(C_{\varvec{\theta }}\), the estimator of Oh and Patton (2013) is

$$\begin{aligned} \widehat{\varvec{\theta }}_\textrm{OP} = \mathop {\mathrm {arg\,min}}\limits _{{\varvec{\theta }}\in \varTheta } \{ {\varvec{\nu }}_T - {\varvec{\nu }}_S({\varvec{\theta }}) \} M_T \{ {\varvec{\nu }}_T - {\varvec{\nu }}_S({\varvec{\theta }}) \}^\top . \end{aligned}$$

As suggested by Oh and Patton (2013), the weight matrix is set to \(M_T = \textrm{I}_5\).
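The moment vector \({\varvec{\nu }}_T\) entering this estimator can be sketched as follows, assuming the formulas above; the function names are illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def pseudo_obs(X):
    """Normalized ranks R_tj / (T + 1)."""
    X = np.asarray(X)
    T = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    return ranks / (T + 1)

def empirical_copula(U, u1, u2):
    """Bivariate empirical copula C_T evaluated at (u1, u2)."""
    return np.mean((U[:, 0] <= u1) & (U[:, 1] <= u2))

def quantile_dependence(U, q):
    """lambda_T^q as in Oh and Patton's moment vector."""
    C = empirical_copula(U, q, q)
    return C / q if q <= 0.5 else (1 - 2 * q + C) / (1 - q)

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
U = pseudo_obs(X)
nu = [spearmanr(X[:, 0], X[:, 1]).correlation] + \
     [quantile_dependence(U, q) for q in (0.05, 0.10, 0.90, 0.95)]
print(np.round(nu, 3))
```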

4.4.1 One-parameter families

The performances of \(\widehat{\theta }_\textrm{ML}\), \(\widehat{\theta }_\textrm{OP}\) and \(\widehat{\theta }\) have been compared under the four families of Table 1. The maximization of the pseudo-likelihood for the computation of \(\widehat{\theta }_\textrm{ML}\) uses the MATLAB procedure fminsearchbnd. To increase the numerical stability, the computation of Gumbel’s density uses the finite-difference approximation \(c_\theta ^\textrm{Gu}(u_1,u_2) \approx \epsilon ^{-2} \{ C_\theta ^\textrm{Gu}(u_1+\epsilon ,u_2+\epsilon ) + C_\theta ^\textrm{Gu}(u_1,u_2) - C_\theta ^\textrm{Gu}(u_1+\epsilon ,u_2) - C_\theta ^\textrm{Gu}(u_1,u_2+\epsilon ) \}\), where \(\epsilon = 1\times 10^{-8}\). The four copula models are parametrized in terms of Kendall’s tau \(\tau _C \in [0,1]\) and the optimum is searched inside [.01, .99] with the initial value \(x_0 = 1/2\). The results on the relative bias and relative root mean-squared error when \(T=100\) are found in Table 4. The number of simulated samples has been set to \(S=500\).
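This finite-difference trick can be sketched as follows, with the Gumbel copula written in its Kendall's tau parametrization, i.e., with generator \(\varPsi _\theta ^\textrm{Gu}(t) = e^{-t^{1-\theta }}\). Note that the mixed second difference must be scaled by \(\epsilon ^{-2}\) to approximate the density; a somewhat larger \(\epsilon \) than the value quoted above is used here for double-precision stability.

```python
import numpy as np

def gumbel_cdf(u1, u2, theta):
    """Gumbel copula parametrized by Kendall's tau (theta in [0, 1)),
    built from the generator psi(t) = exp(-t**(1 - theta))."""
    a = 1.0 / (1.0 - theta)
    s = (-np.log(u1)) ** a + (-np.log(u2)) ** a
    return np.exp(-s ** (1.0 - theta))

def gumbel_density_fd(u1, u2, theta, eps=1e-6):
    """Mixed second finite difference of the CDF; dividing by eps**2
    approximates the copula density c_theta(u1, u2)."""
    num = (gumbel_cdf(u1 + eps, u2 + eps, theta) + gumbel_cdf(u1, u2, theta)
           - gumbel_cdf(u1 + eps, u2, theta) - gumbel_cdf(u1, u2 + eps, theta))
    return num / eps ** 2

# At theta = 0 (independence), the density is identically 1
print(gumbel_density_fd(0.3, 0.7, 0.0))
```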

Table 4 Estimation, based on 1000 replicates, of the relative bias (RB) and relative root mean-squared error (RRMSE) of \(\widehat{\theta }_\textrm{ML}\), \(\widehat{\theta }_\textrm{OP}\) and \(\widehat{\theta }\) when \(T=100\) and \(S=500\) under four one-parameter families of bivariate copulas

First note that the estimator \(\widehat{\theta }_\textrm{OP}\) of Oh and Patton (2013) is significantly more biased than \(\widehat{\theta }\), except when \(\tau _C=1/4\) for the Normal and Frank copulas. This difference in accuracy is also reflected in their respective RRMSE. As expected, the pseudo-maximum likelihood (PML) estimator generally performs better than the simulated method-of-moments (SMM) estimator in terms of RRMSE. However, the average of the relative efficiency over the twelve models is 87.8%, so the loss of efficiency for using \(\widehat{\theta }\) instead of \(\widehat{\theta }_\textrm{ML}\) is small. Interestingly, the relative bias of the SMM estimator is often smaller than that of the PML estimator when the level of dependence is small or moderate, i.e., when \(\tau _C \in \{1/4, 1/2\}\). Overall, the SMM estimator can be safely recommended when the use of the PML estimator is problematic, e.g., when the density is intractable and/or when there is a large number of parameters to estimate.

4.4.2 The two-parameter chi-square copula

As defined by Bárdossy (2006) and later investigated by Quessy et al. (2016), the d-variate chi-square copula is the dependence structure of \(\textbf{X}= ( (Z_1+\gamma _1)^2, \ldots , (Z_d+\gamma _d)^2 )\), where \((Z_1, \ldots , Z_d)\) is d-variate normal with zero means, unit variances and correlation matrix \(\varSigma \), and \(\gamma _1, \ldots , \gamma _d \in [0,\infty )\) are non-centrality parameters. One recovers the Normal copula in the limit as \(\gamma _1 = \cdots = \gamma _d \rightarrow \infty \). Unlike the Normal copula, the chi-square copula is radially asymmetric.
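Sampling from this construction is straightforward; the sketch below draws a pseudo-sample from the bivariate chi-square copula by squaring shifted correlated normals and taking normalized ranks (the function name is illustrative).

```python
import numpy as np

def chi_square_copula_sample(T, theta, gamma, rng):
    """Pseudo-sample from the bivariate chi-square copula: squares of
    shifted correlated normals, reduced to normalized ranks."""
    cov = np.array([[1.0, theta], [theta, 1.0]])
    Z = rng.multivariate_normal(np.zeros(2), cov, size=T)
    X = (Z + gamma) ** 2
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    return ranks / (T + 1)

rng = np.random.default_rng(3)
U = chi_square_copula_sample(1000, 0.6, 1.0, rng)
print(U.min(), U.max())
```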

The results in Table 5 concern the estimation of the parameters of the bivariate chi-square copula when \(\gamma _1 = \gamma _2 = \gamma \), where one can find the estimated relative bias and relative root mean-squared error of \((\widehat{\gamma }_\textrm{ML},\widehat{\theta }_\textrm{ML})\), \((\widehat{\gamma }_\textrm{OP},\widehat{\theta }_\textrm{OP})\) and \((\widehat{\gamma },\widehat{\theta })\) when \(T=100\) and \(S=100\). In that case, the density is

$$\begin{aligned} c_{\gamma ,\theta }(u_1,u_2) &= { \phi _\theta \left\{ h_\gamma (u_1) - \gamma , h_\gamma (u_2) - \gamma \right\} + \phi _\theta \left\{ h_\gamma (u_1) + \gamma , -h_\gamma (u_2) + \gamma \right\} \over D_\gamma (u_1) \, D_\gamma (u_2) } \\ &\quad + { \phi _\theta \left\{ h_\gamma (u_1) - \gamma , - h_\gamma (u_2) - \gamma \right\} + \phi _\theta \left\{ h_\gamma (u_1) + \gamma , h_\gamma (u_2) + \gamma \right\} \over D_\gamma (u_1) \, D_\gamma (u_2) }, \end{aligned}$$

where \(\phi _\theta \) is the density of the bivariate normal distribution with zero means, unit variances and correlation \(\theta \in [-1,1]\), and for \(G_\gamma (x) = \varPhi (\sqrt{x}-\gamma ) + \varPhi (\sqrt{x}+\gamma ) - 1\), \(h_\gamma (u) = \{ G_\gamma ^{-1}(u) \}^{1/2}\) and \(D_\gamma (u) = \phi \{h_\gamma (u)-\gamma \} + \phi \{h_\gamma (u)+\gamma \}\).
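A direct, hedged implementation of this density, with \(G_\gamma ^{-1}\) obtained by root-finding, can be sketched as follows; a useful sanity check is that the density reduces to 1 at \(\theta =0\).

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

def G(x, g):
    """G_gamma(x): cdf of (Z + gamma)^2 for standard normal Z."""
    return norm.cdf(np.sqrt(x) - g) + norm.cdf(np.sqrt(x) + g) - 1.0

def h(u, g):
    """h_gamma(u) = sqrt(G_gamma^{-1}(u)), via bracketed root-finding."""
    x = brentq(lambda x: G(x, g) - u, 1e-12, 1e4)
    return np.sqrt(x)

def D(u, g):
    """D_gamma(u) = phi(h - gamma) + phi(h + gamma)."""
    hu = h(u, g)
    return norm.pdf(hu - g) + norm.pdf(hu + g)

def chi2_copula_density(u1, u2, g, theta):
    """Density of the bivariate chi-square copula, as displayed above."""
    phi2 = lambda x, y: multivariate_normal.pdf(
        [x, y], mean=[0, 0], cov=[[1, theta], [theta, 1]])
    h1, h2 = h(u1, g), h(u2, g)
    num = (phi2(h1 - g, h2 - g) + phi2(h1 + g, -h2 + g)
           + phi2(h1 - g, -h2 - g) + phi2(h1 + g, h2 + g))
    return num / (D(u1, g) * D(u2, g))

print(chi2_copula_density(0.3, 0.7, 1.0, 0.0))  # independence: density is 1
```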

For the estimation of the non-centrality parameter \(\gamma \), one observes that the accuracy of the three estimators, both in terms of RB and RRMSE, increases as \(\gamma \) increases. For a given value of \(\gamma \), their accuracy also increases as \(\tau _C\) increases, except when \(\gamma =2\). Looking at the relative performance of the three estimators, the PML estimator is clearly the best. Overall, \(\widehat{\gamma }_\textrm{OP}\) is systematically, although only slightly, more accurate than \(\widehat{\gamma }\). Turning to the estimation of the dependence parameter \(\theta \), \(\widehat{\theta }\) stands out positively from its two competitors when \(\gamma \in \{ 3/2, 2 \}\). When \(\gamma \in \{1/2, 1 \}\), there is no clear trend as to which estimator performs better overall. Generally speaking, \(\widehat{\theta }\) is the best when \(\tau _C = 1/4\), \(\widehat{\theta }_\textrm{OP}\) when \(\tau _C = 1/2\) and \(\widehat{\theta }_\textrm{ML}\) when \(\tau _C = 3/4\).

Table 5 Estimation, based on 1000 replicates, of the relative bias (RB) and relative root mean-squared error (RRMSE) of the estimators \(\widehat{\theta }_\textrm{ML}\), \(\widehat{\theta }_\textrm{OP}\) and \(\widehat{\theta }\) of \(\theta \) and the estimators \(\widehat{\gamma }_\textrm{ML}\), \(\widehat{\gamma }_\textrm{OP}\) and \(\widehat{\gamma }\) of the non-centrality parameter \(\gamma \) under the two-parameter chi-square copula when \(T=100\) and \(S=100\)

4.5 Multivariate models

4.5.1 The Archimedean family

A d-dimensional copula is a member of the Archimedean family if it can be expressed in the form \(C_\varPsi ({\textbf{u}}) = \varPsi \left\{ \varPsi ^{-1}(u_1) + \cdots + \varPsi ^{-1}(u_d) \right\} \), where \(\varPsi : [0,\infty ) \rightarrow [0,1]\) is called the generator and satisfies \((-1)^j \, \varPsi ^{[j]} \ge 0\) for each \(j \in \{1,\ldots ,d\}\), where \(\varPsi ^{[j]}(t) = \partial ^j \, \varPsi (t) / \partial t^j\). See McNeil and Nešlehová (2009) for more details. The Clayton, Gumbel and Frank copulas whose bivariate versions are detailed in Table 1 are particular cases of this class, where the generators are respectively \(\varPsi _\theta ^{\textrm{C}\ell }(t) = (\theta t + 1)^{-1/\theta }\), \(\varPsi _\theta ^\textrm{Gu}(t) = e^{-t^{1-\theta }}\) and

$$\begin{aligned} \varPsi _\theta ^\textrm{Fr}(t) = -{1\over \theta } \, \ln \left\{ 1 + e^{-t} \left( e^{-\theta } - 1 \right) \right\} . \end{aligned}$$
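For simulation-based estimation, Archimedean samples can be drawn with the Marshall–Olkin frailty algorithm; the sketch below does so for the Clayton generator \(\varPsi _\theta ^{\textrm{C}\ell }(t) = (\theta t + 1)^{-1/\theta }\), whose frailty is Gamma-distributed. This is a standard construction from the copula literature, not a procedure stated in this paper.

```python
import numpy as np

def clayton_sample(T, d, theta, rng):
    """Marshall-Olkin algorithm: psi(t) = (theta*t + 1)^(-1/theta) is the
    Laplace transform of V ~ Gamma(shape 1/theta, scale theta), and
    U_j = psi(E_j / V) with E_j ~ Exp(1) has the Clayton copula."""
    V = rng.gamma(1.0 / theta, theta, size=(T, 1))
    E = rng.exponential(1.0, size=(T, d))
    return (theta * E / V + 1.0) ** (-1.0 / theta)

rng = np.random.default_rng(4)
U = clayton_sample(2000, 3, 2.0, rng)   # theta = 2, i.e. Kendall's tau = 1/2
print(U.shape)
```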

If it is assumed that the d-variate copula of a population belongs to a given parametric Archimedean family with \({\varvec{\theta }}\in \varTheta \subset \mathbb {R}^L\), then \({\varvec{\theta }}_0\) can be estimated by the simulated method-of-moments estimator in (4) based on the first L moments of W. Alternatively, since \({\varvec{\theta }}_0\) appears in the distribution of any possible pair of variables, another estimator similar to the one in (5) is

$$\begin{aligned} \widehat{\varvec{\theta }}^\star = \binom{d}{2}^{-1} \sum _{j<j' \in \{1,\ldots ,d\}} \widehat{\varvec{\theta }}_{jj'}. \end{aligned}$$
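The pairwise-averaging strategy can be sketched as below; the per-pair estimator used here is a stand-in based on the inversion of Kendall's tau for the Clayton family (\(\tau = \theta /(\theta +2)\)), chosen purely for illustration in place of the simulated method-of-moments pairwise estimates.

```python
import numpy as np
from itertools import combinations
from scipy.stats import kendalltau

def pairwise_average_estimate(U, pair_estimator):
    """theta_star: average of pairwise estimates over all C(d, 2) pairs."""
    d = U.shape[1]
    ests = [pair_estimator(U[:, j], U[:, k])
            for j, k in combinations(range(d), 2)]
    return np.mean(ests, axis=0)

def clayton_tau_inversion(u, v):
    """Stand-in pairwise estimator: invert tau = theta / (theta + 2)."""
    tau = kendalltau(u, v)[0]
    return 2.0 * tau / (1.0 - tau)

rng = np.random.default_rng(5)
U = rng.uniform(size=(500, 4))   # independence: estimate should be near 0
print(pairwise_average_estimate(U, clayton_tau_inversion))
```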

Table 6 reports the performance of \(\widehat{\theta }\) and \(\widehat{\theta }^\star \) in terms of relative bias and RRMSE when \(T=100\) and \(S=250\) for the multivariate one-parameter Clayton, Gumbel and Frank copulas in dimensions \(d \in \{ 3, 4, 5 \}\). First observe that the RRMSE decreases as the dimension d increases. Note also the drastic decrease in RRMSE as Kendall’s tau passes from 1/4 to 1/2, followed by a somewhat smaller decrease from 1/2 to 3/4. Note also that in most cases, \(\widehat{\theta }\) is less biased than \(\widehat{\theta }^\star \). However, the opposite is true when looking at the RRMSEs, although the differences are minimal. In short, it can be concluded that both estimation strategies work well.

Table 6 Estimation, based on 1000 replicates, of the relative bias (upper panel) and root mean-squared error (lower panel) of the simulated method-of-moments estimators \(\widehat{\theta }\) and \(\widehat{\theta }^\star \) when \(T=100\) and \(S=250\) under one-parameter multivariate Archimedean copulas

4.5.2 Elliptical copulas and their squared versions

Elliptical distributions are parametrized in a pair-by-pair fashion. Specifically, as initially defined by Cambanis et al. (1981), a vector \(\textbf{X}\in \mathbb {R}^d\) is said to follow an elliptically contoured distribution if it admits the stochastic representation \(\textbf{X}= R A \mathcal U\), where \(R>0\) is the radial random variable, \(\varSigma = A^\top A \in \mathbb {R}^{d\times d}\) is symmetric and positive definite and \(\mathcal U\) is uniformly distributed on the unit sphere in \(\mathbb {R}^d\). An elliptical copula is simply the copula extracted from an elliptical distribution, as first investigated by Fang et al. (2002). Elliptical copulas thus inherit the pairwise parametrization of elliptical distributions.

Apart from the Normal copula, a popular elliptical model is Student’s copula with \(\gamma \in (0,\infty )\) degrees of freedom and parameter \(\theta \in (-1,1)\), which can be expressed implicitly by \(C_{\gamma ,\theta }(u_1,u_2) = \varOmega _{\gamma ,\theta } \left\{ \varOmega _\gamma ^{-1}(u_1), \varOmega _\gamma ^{-1}(u_2) \right\} \), where \(\varOmega _\gamma \) is the cumulative distribution function of the univariate Student distribution and \(\varOmega _{\gamma ,\theta }\) is the cdf of the bivariate Student distribution. Another model is the generalized Laplace copula (see Kozubowski et al. 2013, for details) extracted from the multivariate distribution whose density is

$$\begin{aligned} f_{\varSigma ,\gamma }({\textbf{x}}) = { 2 \, |\varSigma |^{-1/2} \over (2\pi )^{d/2} \, \varGamma (\gamma ) } \left( \sqrt{ {\textbf{x}}\varSigma ^{-1} {\textbf{x}}^\top \over 2 } \right) ^{\gamma - d/2} K_{\gamma - d/2} \left( \sqrt{ 2 {\textbf{x}}\varSigma ^{-1} {\textbf{x}}^\top } \right) , \end{aligned}$$

with \(K_\lambda \) the modified Bessel function of index \(\lambda \).
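This density is straightforward to evaluate with the modified Bessel function of the second kind; a hedged sketch (the function name is illustrative):

```python
import numpy as np
from scipy.special import kv, gamma as Gamma

def laplace_density(x, Sigma, g):
    """Generalized Laplace density with Bessel function K_{g - d/2},
    following the displayed formula term by term."""
    x = np.asarray(x, dtype=float)
    d = x.size
    Sinv = np.linalg.inv(Sigma)
    q = float(x @ Sinv @ x)                 # quadratic form x Sigma^{-1} x^T
    lam = g - d / 2.0
    return (2.0 * np.linalg.det(Sigma) ** -0.5
            / ((2.0 * np.pi) ** (d / 2.0) * Gamma(g))
            * np.sqrt(q / 2.0) ** lam * kv(lam, np.sqrt(2.0 * q)))

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
print(laplace_density(np.array([0.5, -0.2]), Sigma, g=1.5))
```

By symmetry of the quadratic form, the density is invariant under \({\textbf{x}}\mapsto -{\textbf{x}}\), which provides a quick check.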

As is well known, Kendall’s tau of a bivariate elliptical copula is given by the simple formula \(\tau _C(\theta ) = (2/\pi ) \sin ^{-1}\theta \), whatever the form of the radial random variable (see Fang et al. 2002, for instance). Hence, \({\mathbb {E}}(W) = \{ \tau _C(\theta ) + 1 \} / 4\) does not depend on R. As stated in the next proposition, this invariance in fact holds for every moment of order \(a \in \mathbb {N}\) of W. This theoretical result is illustrated in the two top panels of Fig. 1, where one can find the estimated curves of \({\mathbb {E}}(W^2)\) as a function of \(\theta \in (0,1)\) for various values of \(\gamma \) in the case of the Student and Laplace copulas.

Proposition 6

The moment of order \(a \in \mathbb {N}\) of an elliptical copula characterized by some radial variable R does not depend on R.

In light of Proposition 6, it is not possible to estimate \((\gamma ,\theta )\) with the SMM estimator based on the MPIT random variable. It is, however, possible to estimate the parameters of the so-called squared version of an elliptical copula. As defined by Quessy and Durocher (2019), the squared copula associated with a d-variate copula C is the distribution of \((|1-2U_1|, \ldots , |1-2U_d|)\) when \((U_1, \ldots , U_d) \sim C\). When C is an elliptical copula, the resulting squared construction has a pair-by-pair parameter matrix \(\varSigma \in \mathbb {R}^{d\times d}\) and a global parameter \({\varvec{\gamma }}\in \mathbb {R}^q\). See the bottom panels of Fig. 1 for \({\mathbb {E}}(W^2)\) as a function of \(\theta \) for the squared–Student (introduced by Favre et al. (2018) as the Fisher copula) and squared–Laplace copulas.
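Proposition 6 and the squared-copula construction can be illustrated numerically; the sketch below draws pseudo-samples from Student copulas with 3 and 30 degrees of freedom and compares empirical values of \({\mathbb {E}}(W)\) before and after the transform \(u \mapsto |1-2u|\). The sampling and MPIT routines are illustrative, not the paper's code.

```python
import numpy as np

def student_copula_sample(T, theta, df, rng):
    """Bivariate Student copula pseudo-sample: normal scale mixture + ranks."""
    Z = rng.multivariate_normal([0, 0], [[1, theta], [theta, 1]], size=T)
    chi = rng.chisquare(df, size=(T, 1))
    X = Z / np.sqrt(chi / df)
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    return ranks / (T + 1)

def mpit_mean(U):
    """Empirical E(W): mean of the empirical copula at the sample points."""
    leq = np.all(U[:, None, :] <= U[None, :, :], axis=2)
    return leq.mean(axis=0).mean()

def squared(U):
    """Squared-copula transform of Quessy and Durocher (2019)."""
    return np.abs(1.0 - 2.0 * U)

rng = np.random.default_rng(6)
U3 = student_copula_sample(4000, 0.6, 3, rng)
U30 = student_copula_sample(4000, 0.6, 30, rng)
# Proposition 6: E(W) should not depend on the degrees of freedom
print(mpit_mean(U3), mpit_mean(U30))
# the squared versions, in contrast, need not coincide across df
print(mpit_mean(squared(U3)), mpit_mean(squared(U30)))
```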

Fig. 1
figure 1

Moment of order two of the probability integral random variable as a function of \(\theta \in [0,1]\) for the bivariate Student, Laplace, Fisher and Squared–Laplace copulas

As explained in Sect. 3.3, the entries of \(\varSigma \) can be estimated using the first \(q+1\) moments of \(W_{jj'}\) for each \(j<j' \in \{1, \ldots , d\}\), and \({\varvec{\gamma }}\) can then be estimated with the mean of \(\widehat{{\varvec{\gamma }}}_{jj'}\), \(j<j' \in \{1, \ldots , d\}\), i.e., with the estimator in (5). The results in Table 7 concern the Fisher and squared–Laplace copulas in dimension \(d \in \{ 3, 4, 5 \}\) when \(T=100\) and \(S=100\). To simplify the presentation of results, the correlation matrix \(\varSigma \in \mathbb {R}^{d\times d}\) has been taken equicorrelated, in such a way that \(\varSigma _{jj'} = \sin (\pi \tau _C /2)\) for each \(j \ne j' \in \{1, \ldots , d\}\), where \(\tau _C \in \{ 1/4,1/2,3/4\}\). From experiments not reported here, considering more general correlation matrices has a negligible influence on the performance of the estimator.

Looking at the results in Table 7, it can be seen that the performance of \(\widehat{\gamma }\) is roughly equivalent across the three dimensions considered. For the Fisher copula, the accuracy increases markedly as \(\tau _C\) increases when \(\gamma \in \{3,6\}\); when \(\gamma =10\), the estimator is most accurate when \(\tau _C=1/2\). Turning to the squared–Laplace copula, the accuracy of the estimator in relation to the level of Kendall’s tau depends on \(\gamma \): whereas the accuracy increases with \(\tau _C\) when \(\gamma =1\), there is no clear trend when \(\gamma \in \{3,5\}\).

Table 7 Estimation, based on 1000 replicates, of the relative bias (RB) and relative root mean-squared error (RRMSE) of the simulated estimator \(\widehat{\gamma }\) when \(T=100\) and \(S=100\) under multivariate two-parameter pairwise copulas

5 Data analysis of hockey data

Ice hockey is a fast-paced team game that is played continuously and for which measuring a player’s quality with appropriate indicators is a real challenge. For the illustration that follows, the \(T=410\) forwards who played at least 16,000 s in the 2019–2020 season of the National Hockey League (NHL) have been considered. Five variables, namely \(X_1\): Points, \(X_2\): Expected goals (xg) with a player on, \(X_3\): Playing in attack, \(X_4\): Scoring chances and \(X_5\): Number of shots, have been selected to characterize their offensive skills. Each variable has been rescaled to a block of 60 min. The pairwise scatterplots of the raw data and of the standardized ranks are found in Fig. 2.

Fig. 2
figure 2

Five-dimensional Hockey data set: histograms (on the diagonal), pairwise scatterplots of the raw data (above the diagonal) and of standardized ranks (below the diagonal)

Looking at Fig. 2, a radially asymmetric dependence structure featuring more weight in the upper tail seems to emerge. This is confirmed by the test of radial symmetry of Bahraoui and Quessy (2017) based on the copula characteristic function with the normal weight and smoothing parameter \(\sigma = 1\): the p-value of the test, as estimated from 10,000 multiplier bootstrap samples, is 4.61%. For this reason, parameter estimation has been performed for the following seven radially asymmetric copula families:

  (i) The one-parameter survival-Clayton and Gumbel Archimedean copulas;

  (ii) The chi-square copula with non-centrality parameter \(\gamma \in [0,\infty )\);

  (iii) The squared versions of the Student, Laplace and Pearson type II copulas;

  (iv) A special case of the skew–Student copula as defined by Demarta and McNeil (2005), i.e., the dependence structure of \(\textbf{X}= {\textbf{Z}}/ \sqrt{Y} + \gamma _2 \, \textbf{1}_d / Y\), where \({\textbf{Z}}\) is standard normal with correlation matrix \(\varSigma \in \mathbb {R}^{d\times d}\), \(\gamma _1 \, Y\) is chi-square with \(\gamma _1 \ge 1\) degrees of freedom, \(\gamma _2 \in \mathbb {R}\) is an asymmetry parameter and \(\textbf{1}_d = (1, \ldots , 1)\).

The results of the estimation based on the simulated method-of-moments estimator performed with \(S=250\) are in Table 8. As a criterion for choosing an appropriate model among the seven copula families, the ability of each model to reproduce Kendall’s matrix has been considered. Specifically, a sample of size \(T=2,500\) has been simulated from each estimated model and the Frobenius distance between the sample Kendall matrix \(K_T\) of the data and that of the simulated sample has been computed. For the Hockey data,

$$\begin{aligned} K_T = \left( \begin{array}{ccccc} 1 &{} 0.460 &{} 0.359 &{} 0.411 &{} 0.378 \\ 0.460 &{} 1 &{} 0.671 &{} 0.583 &{} 0.502 \\ 0.359 &{} 0.671 &{} 1 &{} 0.476 &{} 0.470 \\ 0.411 &{} 0.583 &{} 0.476 &{} 1 &{} 0.623 \\ 0.378 &{} 0.502 &{} 0.470 &{} 0.623 &{} 1 \end{array} \right) . \end{aligned}$$
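The model-selection criterion above amounts to comparing Kendall matrices in Frobenius norm; a sketch with illustrative function names:

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_matrix(X):
    """Matrix of pairwise sample Kendall's taus (ones on the diagonal)."""
    d = X.shape[1]
    K = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            K[j, k] = K[k, j] = kendalltau(X[:, j], X[:, k])[0]
    return K

def frobenius_distance(A, B):
    """Frobenius norm of the difference between two matrices."""
    return np.linalg.norm(A - B, ord="fro")

rng = np.random.default_rng(7)
cov = [[1, .5, .3], [.5, 1, .4], [.3, .4, 1]]
X = rng.multivariate_normal(np.zeros(3), cov, size=400)
K = kendall_matrix(X)
print(np.round(K, 3), round(frobenius_distance(K, np.eye(3)), 3))
```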
Table 8 Results of the parameter estimation for the five-dimensional Hockey data set

Looking at the results in the third column of Table 8, the chi-square (\(\chi ^2\)) and skew–Student (Sk) copulas stand out among the seven models considered. In order to see whether these models can adequately reproduce the observed data, artificial samples of size \(T=410\) have been simulated from both copula models at the estimated parameter values. The corresponding estimates of \(\varSigma \) are

$$\begin{aligned} \widehat{\varSigma }^{\chi ^2} = \left( \begin{array}{ccccc} 1 &{} .720 &{} .536 &{} .639 &{} .575 \\ .720 &{} 1 &{} .838 &{} .827 &{} .762 \\ .536 &{} .838 &{} 1 &{} .731 &{} .735 \\ .639 &{} .827 &{} .731 &{} 1 &{} .864 \\ .575 &{} .761 &{} .735 &{} .864 &{} 1 \end{array} \right) , \quad \widehat{\varSigma }^\textrm{Sk} = \left( \begin{array}{ccccc} 1 &{} .015 &{} .016 &{} .022 &{} .000 \\ .015 &{} 1 &{} .396 &{} .334 &{} .410 \\ .016 &{} .396 &{} 1 &{} .225 &{} .102 \\ .022 &{} .334 &{} .225 &{} 1 &{} .332 \\ .000 &{} .410 &{} .102 &{} .332 &{} 1 \end{array} \right) . \end{aligned}$$

Note that \(\widehat{\varSigma }^\textrm{Sk}\) results from a transformation due to Higham (2002) to make it positive definite. The resulting samples have then been put on the same scales as the raw data by taking the empirical percentiles. The corresponding scatterplots are found in Figs. 3 and 4. Whereas the chi-square copula is better at reproducing Kendall’s matrix, the skew–Student copula seems better at reproducing the upper-tail behaviour of the dependence structure.
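Higham's (2002) nearest-correlation-matrix computation alternates projections onto the positive semidefinite cone and onto the set of unit-diagonal matrices, with Dykstra corrections. The sketch below is a simplified version of that scheme, not the exact implementation used for the data analysis.

```python
import numpy as np

def nearest_correlation(A, n_iter=100, tol=1e-8):
    """Alternating projections with Dykstra correction, in the spirit of
    Higham (2002), to find a nearby unit-diagonal PSD matrix."""
    Y = A.copy()
    dS = np.zeros_like(A)
    for _ in range(n_iter):
        R = Y - dS                               # Dykstra correction
        w, V = np.linalg.eigh(R)
        X = (V * np.clip(w, 0, None)) @ V.T      # project onto PSD cone
        dS = X - R
        Y_new = X.copy()
        np.fill_diagonal(Y_new, 1.0)             # project onto unit diagonal
        if np.linalg.norm(Y_new - Y, ord="fro") < tol:
            Y = Y_new
            break
        Y = Y_new
    return Y

# An indefinite "correlation" matrix to repair
A = np.array([[1.0, 0.9, 0.7], [0.9, 1.0, -0.9], [0.7, -0.9, 1.0]])
B = nearest_correlation(A)
print(np.round(B, 3))
```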

Fig. 3
figure 3

Simulated data of size \(T=410\) from the estimated Chi-square copula with marginal distributions matching those of the Hockey data set

Fig. 4
figure 4

Simulated data of size \(T=410\) from the estimated skew–Student copula with marginal distributions matching those of the Hockey data set

6 Conclusion

This paper developed a general parameter estimation procedure for multivariate copula models. The proposed estimators are based on the moments of the multivariate probability integral transformation (MPIT), thereby generalizing the inversion-of-Kendall’s-tau estimator. On the one hand, moments of order greater than one are considered, making estimation possible in multi-parameter models. On the other hand, the use of simulated moments makes the methodology applicable as soon as it is possible to simulate from a given parametric model. Compared to Brahimi and Necir (2012), the proposed estimators are not restricted to the (few) cases where the mapping induced by the theoretical moments is explicitly invertible. Oh and Patton (2013) also developed simulated method-of-moments estimators in copula models; two advantages of the present approach over their method are that (i) it is not necessary to base the estimation on pairwise dependence measures and (ii) the number of moments can match the number of parameters of the assumed parametric copula model.

Knowing how to estimate parameters for dimensions \(d>2\) and multi-parameter models is important, especially in the context of big data. However, the applicability of the pseudo-maximum likelihood estimator (i.e., any dimension d, any number of parameters to estimate) is more theoretical than practical. Indeed, even in cases where an explicit (and numerically tractable) copula density is available, the PML estimator can be computationally very intensive, especially when d is large. The flexibility of the method introduced in this work then appears to be of great interest for multi-parameter complex dependence models. One can mention the skew–Student copulas introduced by Demarta and McNeil (2005), the factor copulas investigated for instance by Krupskii and Joe (2013), Krupskii and Joe (2015) and Mazo et al. (2016), as well as the vine copulas (see Czado 2019).