Abstract
Copula is a powerful tool to model multivariate data. We propose the modelling of intraday returns of multiple financial assets through copula. The problem originates due to the asynchronous nature of intraday financial data. We propose a consistent estimator of the correlation coefficient in case of elliptical copula and show that the plug-in copula estimator is uniformly convergent. For non-elliptical copulas, we capture the dependence through Kendall’s Tau (leveraging the relation between copula parameter and Kendall’s tau). We demonstrate underestimation of the copula parameter and propose an alternative method to obtain an improved estimator. In simulations, the proposed estimator reduces the bias significantly for a general class of copulas. We apply the proposed methods to real data of several stock prices.
1 Introduction
A very rich collection of market models has been developed and thoroughly investigated for intraday financial data. Although univariate modeling is important for addressing certain kinds of problems, it is not enough to unveil the nature and dynamics of the financial market. Interactions between different financial instruments are left out of univariate studies. Such interactions can arise in a systematic way if the companies belong to related business sectors, are affected by the same socio-political conditions, or are owned by the same business house. High dependence between several constituent stocks of a portfolio can increase the probability of a large loss, so accurate estimation of the dependence between assets is of paramount importance. Correlation dynamics models have therefore become an important aspect of the theory and practice of finance. Correlation trading, which exploits changes in the dependence structure of financial assets, and correlation risk, which captures the exposure to losses due to changes in correlation, have attracted the attention of many practitioners, see Krishnan et al. (2009). Beyond these direct applications, accurate modelling of dependence is also important indirectly, in a range of practical scenarios. For example, basket options are widely used, primarily because they are cheaper for portfolio insurance, although their accurate pricing is challenging; the cost-saving relies on the dependence structure between the assets, see Salmon et al. (2006). In the actuarial world, as shown in Embrechts et al. (2002), Monte Carlo-based approaches to the joint modelling of risks, like Dynamic Financial Analysis, depend heavily on the dependence structure. Frey and McNeil (2002) and Breymann et al. (2003) showed that the choice of model and correlation have a significant impact on the tail of the loss distribution and on measures of extreme risk.
It follows from the above discussion that we need accurate multivariate modeling and analysis. In order to perform multivariate analysis, we need multivariate data. This means that we need data for all p (≥ 2) variables at n (sufficiently large) time points. For example, in the case of daily financial data we would expect to observe the price of all p stocks on a particular day. This kind of data is called synchronously observed data. On the other hand, if we do not have observations for one or several variables (or stocks) at a particular time point, we call the data nonsynchronous/asynchronous. An example of such data is intraday stock price data: within a particular day we cannot expect to observe transactions in all stocks simultaneously. In Fig. 1, we show the transaction/arrival times of two stocks within a small time interval. This cannot be handled as a missing data problem where a fraction of observations are missing; here, it is extremely rare to observe transactions in two stocks at the same point of time.
The effect of asynchronicity can be quite serious on the estimation of model parameters. One such phenomenon, reported by Epps (1979), is called the Epps effect. Empirical results reported in that paper showed that the realized covariance between stock returns decreases as the sampling frequency increases. Later the same phenomenon was reported in several other studies on different stock markets, see Zebedee and Kasch-Haroutounian (2009), and on the foreign exchange market, see Muthuswamy et al. (2001). It has also been shown empirically, see Renò (2003), that taking into account only the synchronous, or nearly synchronous, observations alleviates this underestimation problem.
Several studies have been devoted to the estimation of the covariance from intraday data. Mancino and Sanfelici (2011) analysed the performance of the Fourier estimator originally proposed by Malliavin et al. (2009). Peluso et al. (2014) adopted a Bayesian dynamic linear model and treated asynchronous trading as missing observations in an otherwise synchronous series. Corsi and Audrino (2012) proposed two covariance estimators, adapted to the case of rounding in the price time stamps, which can be seen as a general way of computing the Hayashi-Yoshida estimator (see Hayashi et al. 2005) at a lower frequency. Zhang (2011) proposed the two-scale realized covariance estimator (TSCV), which combines two-scale sub-sampling and the previous tick method and can simultaneously remove the bias due to microstructure noise and asynchronicity. Along similar lines, the average realized volatility matrix (ARVM) has been proposed to modify the TSCV estimator such that no bias-correction is required for the off-diagonal elements, see Hwang and Shin (2018). Fan et al. (2012) studied TSCV in a high-dimensional setting. Aït-Sahalia et al. (2010) proposed a quasi-maximum likelihood estimator of the quadratic covariance. Attempts have also been made to resolve the problems of high-frequency data by introducing appropriate filtering techniques, see Ho and Xin (2019).
The correlation coefficient captures only linear dependence. In this work, we focus on the estimation of the non-linear dependence structure through copula. Apart from modelling the complete dependence structure, one of the many advantages of copula is the flexibility it offers to model complex relationships between variables in a rather simple manner. It allows us to model the marginal distributions as needed and takes care of the dependence structure separately. It is also one of the most important tools to model tail dependence, which is the probability of an extremely large or small return on one asset given that the other asset yielded an extremely large or small return, see Xu and Li (2009). For this reason, copula is also a useful tool for modelling the joint distribution of default times. It is used for pricing Credit Spread Baskets, Credit Debt Obligations, First to Default, N-to-Default and other credit derivative baskets, see Malgrat (2013), who also shows how copula helps to relate systematic risk to idiosyncratic risk. Zhang and Zhu (2016) developed a class of copula structured multivariate maxima and moving maxima processes, which is a flexible and statistically workable model to characterize joint extremes.
In this paper, we discuss in detail the impact of asynchronicity on several measures of association in a general class of copulas. We explain why there is a serious underestimation of the measures of association and demonstrate the need for careful treatment. We propose an alternative method for the estimation of correlation. Moreover, we prescribe methods for accurate estimation of the associated copula. We also show that the estimation of some commonly used measures of association, like Kendall’s tau, is challenging. The rest of the paper is organized as follows. In Section 2 we deal with elliptical copula parameter estimation for nonsynchronous data and prove the main theorems. Section 3 deals with a more general class of copulas. In Sections 4 and 5 the results of the simulation study and real data analysis are presented. We present the conclusions in Section 6. All the proofs are given in Appendix A.
2 Estimation of Elliptical Copula
Suppose there are two stocks whose log-prices at time t ∈ (0,T) are denoted by Xt and Yt. By \({R_{t}^{1}}\) and \({R_{t}^{2}}\) we denote the corresponding log-returns. Although in the ideal world of the Black-Scholes model the log returns are assumed to follow a Gaussian distribution, the stylized facts about the financial market suggest that a distribution with a heavier tail needs to be considered. In the multivariate scenario, the search for such a model is challenging. In such situations copula appears to be a central tool at our disposal.
In Section 4, the results of a simulation study are reported where the effect of asynchronicity on the estimation of the correlation coefficient is shown. The simulation results display severe underestimation. Before attempting to understand the problem and propose a remedy, we present an algorithm to synchronize the data to make it suitable for standard multivariate analysis. We should note that some studies (see Hayashi et al. 2005; Buccheri et al. 2020) attempt to calculate the integrated covariance without synchronizing the data.
2.1 Pairing Method
The prices of the stocks are observed at random times when transactions take place. As a transaction in one stock would not influence the transaction time in the other, it is reasonable to assume that the observation times of the two stocks are independent point processes. Therefore, if we have log prices of the first stock along with its time of occurrence as \((X_{i},{t_{i}^{1}}), i=1,2,..,n_{1}\) and that of the second stock as \((Y_{j},{t_{j}^{2}}), j=1,2,...,n_{2}\), then \({t_{i}^{1}}\)’s and \({t_{j}^{2}}\)’s are independent. Here n1 and n2 are the number of observations of first and second stock respectively, available on a particular day.
Before fitting a copula model, the observations of the two stock prices need to be paired such that they can be treated as synchronously observed. Conventional synchronizing methods, like previous tick sampling, require a set (or sample) of n time points τi, i = 1(1)n, at which we would like to observe a synchronized pair. For each stock, the tick observed just before each such sampled time point τi is chosen to construct the synchronized pair (\(X_{\tau _{i}},Y_{\tau _{i}}\)), yielding n such pairs.
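As a concrete illustration, previous tick sampling can be written in a few lines. The sketch below is ours, not the paper's: `previous_tick` is a hypothetical helper that, for each sampled time τi, returns the last price observed at or before τi.

```python
import bisect

def previous_tick(times, prices, grid):
    """For each grid point tau, return the last price observed at or before tau.

    times: sorted transaction times of one stock; prices: matching prices;
    grid:  sampling times tau_1 < ... < tau_n.
    """
    sampled = []
    for tau in grid:
        k = bisect.bisect_right(times, tau) - 1  # index of last time <= tau
        sampled.append(prices[k] if k >= 0 else None)  # None: no prior tick
    return sampled
```

Applying this to each stock on a common grid yields the synchronized pairs \((X_{\tau _{i}},Y_{\tau _{i}})\).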
It is evident from the above discussion that the number of synchronized pairs is less than both n1 and n2, unless we allow repetition. This means many observations on each stock will be discarded and not used for further analysis. Generalized sampling times are defined as follows.
DEFINITION 1
Ait-Sahalia et al. (2005) Suppose we have M stocks. Let \({t_{k}^{i}}\) be the k-th arrival time of the i-th asset. Then {τj: 1 ≤ j ≤ n}, are called generalized sampling times if
1. 0 = τ0 < τ1 < ... < τn = T.

2. \((\tau _{j-1},\tau _{j}]\cap \{{t_{k}^{i}}:k=1,...,n_{i}\}\neq \varnothing \) for some i = 1,...,M.

3. \(\max \limits _{1\leq j\leq n}\delta _{j}\rightarrow 0\) in probability, where δj = τj − τj− 1.
In the above-mentioned method, an observation is uprooted from its original time point and assigned to a sampled time point τj, for some j. In contrast, we want to retain the actual times of the prices that are chosen to be paired. In other words, instead of having a pair like \((X_{\tau _{j}},Y_{\tau _{j}})\), we want to have a pair \((X_{t_{{k_{i}^{1}}}^{1}},Y_{t_{{k_{i}^{2}}}^{2}})\) where \(t_{{k_{i}^{1}}}^{1}\text {and }t_{{k_{i}^{2}}}^{2}\) are the times at which the i-th pair of stock-prices were observed. To emphasise this, we call the algorithm the ‘pairing method’ (in contrast to a ‘synchronizing method’). The pairing method, followed throughout this paper, is described through the following algorithm (A0):
The pairs created by this algorithm are identical to the pairs created by “refresh time sampling” (see Barndorff-Nielsen et al. 2011) but accommodates more information by retaining the transaction times. Instead of writing \((X_{t_{{k_{i}^{1}}}^{1}},Y_{t_{{k_{i}^{2}}}^{2}})\) we shall henceforth write \((X_{t({k_{i}^{1}})},Y_{t({k_{i}^{2}})})\).
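Since the pairs produced by the algorithm coincide with those of refresh time sampling while retaining the actual transaction times, the pairing step can be sketched as follows. This is an illustrative implementation based on our reading of the description around Fig. 2; the function name and details are ours.

```python
def pair_refresh(t1, t2):
    """Pair two sorted lists of transaction times, refresh-time style.

    After each pair, the next 'refresh' time is the first instant at which
    both stocks have traded again; each stock contributes its most recent
    tick at or before that instant, and the actual tick times are retained.
    Returns a list of (time in stock 1, time in stock 2) pairs.
    """
    pairs = []
    i = j = 0
    n1, n2 = len(t1), len(t2)
    while i < n1 and j < n2:
        tau = max(t1[i], t2[j])  # both stocks have traded by time tau
        # advance each stock to its last tick at or before tau
        while i + 1 < n1 and t1[i + 1] <= tau:
            i += 1
        while j + 1 < n2 and t2[j + 1] <= tau:
            j += 1
        pairs.append((t1[i], t2[j]))
        i += 1
        j += 1
    return pairs
```

Prices are paired by carrying them along with their tick times.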
In Fig. 2a and b, \({t_{j}^{1}}\) and \({t_{i}^{2}}\) are paired together as (\(t({k_{l}^{1}}),t({k_{l}^{2}})\)). The figures illustrate how the next pair is chosen by the algorithm. In Fig. 2a, \(t_{j+1}^{1}<t_{i+1}^{2}\). So \(t(k_{l+1}^{2})=t_{i+1}^{2}\) and \(t(k_{l+1}^{1})\) is chosen to be the largest of the arrival times of the first stock that are less than \(t_{i+1}^{2}\). In Fig. 2b, \(t_{j+1}^{1}>t_{i+1}^{2}\). So \(t(k_{l+1}^{1})=t_{j+1}^{1}\) and \(t(k_{l+1}^{2})\) is chosen to be the largest of the arrival times of the second stock that are less than \(t_{j+1}^{1}\). The pairs are represented by the arrows.
2.2 Estimation of Correlation Coefficient
Suppose we have n paired observations from the algorithm A0. Now we can proceed to calculate the correlation coefficient. We denote a pair of centered and scaled log price processes by (Xt,Yt) and assume that this is independent of the arrival processes. Since the log returns \(X_{t({k_{i}^{1}})}-X_{t(k_{i-1}^{1})}\) and \(Y_{t({k_{i}^{2}})}-Y_{t(k_{i-1}^{2})}\) are calculated over two nonidentical time-intervals, namely \((t(k_{i-1}^{1}),t({k_{i}^{1}}))\) and \((t(k_{i-1}^{2}),t({k_{i}^{2}}))\), the correlation between the returns depends heavily on the lengths of the overlapping and non-overlapping portions of these two time-intervals. To see this, first suppose \(X_{t({k_{i}^{1}})}-X_{t(k_{i-1}^{1})}={\sum }_{j=m}^{l}(X_{t_{j+1}}-X_{t_{j}})\) for some m and l. Here {ti : i = 1(1)(n1 + n2)} is the set of combined (ordered) time points at which a transaction (in either of the stocks) is noted. Then one of these four configurations is true:
See Fig. 3a and b for illustrations of the first two configurations where two consecutive pairs of log-prices are \((X_{t(k_{i-1}^{1})},Y_{t(k_{i-1}^{2})})\) and \((X_{t({k_{i}^{1}})},Y_{t({k_{i}^{2}})})\) with their corresponding transaction times \((t(k_{i-1}^{1}),t(k_{i-1}^{2}))\) and \((t({k_{i}^{1}}),t({k_{i}^{2}}))\).
DEFINITION 2
We define a random variable Ii, denoting the overlap of the i-th interarrival intervals corresponding to \(X_{t({k_{i}^{1}})}-X_{t(k_{i-1}^{1})}\) and \(Y_{t({k_{i}^{2}})}-Y_{t(k_{i-1}^{2})}\), as
For example in Fig. 3a, \(I_{i}=t({k_{i}^{2}})-t(k_{i-1}^{2})\) and for Fig. 3b, \(I_{i}=t({k_{i}^{2}})-t(k_{i-1}^{1})\).
DEFINITION 3
We define \(\hat {\theta }\) as the following,

$$ \hat{\theta}=\frac{\sqrt{m_{1}m_{2}}}{m(I)} \hat{\rho}, $$

where, for l = 1,2, \(m_{l}=\frac {1}{n}{\sum }_{i=1}^{n}(t({k_{i}^{l}})-t(k_{i-1}^{l}))\), \(m(I)=\frac {1}{n}{\sum }_{i=1}^{n}I_{i}\) and \(\hat {\rho }\) is the sample correlation coefficient based on the pairs obtained by algorithm (A0).
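A direct transcription of this definition might look like the sketch below, assuming the paired transaction times, the overlap lengths Ii, and the paired returns have already been computed; `sample_corr` and `theta_hat` are hypothetical helper names.

```python
from statistics import mean

def sample_corr(x, y):
    """Plain sample (Pearson) correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def theta_hat(tk1, tk2, overlaps, rx, ry):
    """Corrected estimator: sqrt(m1 * m2) / m(I) times the sample correlation.

    tk1, tk2: paired transaction times of the two stocks (length n + 1),
    overlaps: overlap lengths I_i of the i-th interarrival pair (length n),
    rx, ry:   the corresponding paired log-returns (length n each).
    """
    n = len(overlaps)
    m1 = mean(tk1[i] - tk1[i - 1] for i in range(1, n + 1))
    m2 = mean(tk2[i] - tk2[i - 1] for i in range(1, n + 1))
    m_I = mean(overlaps)
    return (m1 * m2) ** 0.5 / m_I * sample_corr(rx, ry)
```

When the intervals fully overlap (m1 = m2 = m(I)), the correction factor is 1 and the estimator reduces to the usual sample correlation.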
With these definitions and notations we are now ready to state the first theorem.
We consider the following assumptions (\(\mathcal {A}\)):
\(\mathcal {A}_{1}\): The log price processes have independent and stationary increments, and the increments of the two processes over the same time interval have correlation 𝜃.
\(\mathcal {A}_{2}\): The observation times (arrival processes) of the two stocks are independent renewal processes and \(n\rightarrow \infty \) as \(n_{1},n_{2}\rightarrow \infty \).
\(\mathcal {A}_{3}\): Estimation is based on paired data obtained by algorithm A0.
Theorem 1.
Under the assumptions \(\mathcal {A}_{1}-\mathcal {A}_{3},\)
1. \(\hat {\theta }\) is a consistent estimator of the true correlation coefficient 𝜃.

2. Moreover,
$$ \sqrt{n}(\hat{\theta}-\theta)\stackrel{d}{\rightarrow}N(0,\gamma^{2}(1-\rho^{2})^{2}+\rho^{2}{\sigma_{0}^{2}}), $$where \({\sigma _{0}^{2}}\), defined in Appendix A, depends only on the distribution of the arrival times and
$$ \gamma=\frac{\sqrt{E(t({k_{2}^{1}})-t({k_{1}^{1}}))E(t({k_{2}^{2}})-t({k_{1}^{2}}))}}{E(I_{1})}. $$
Proof of Theorem 1 is given in Appendix A.
According to this theorem, in order to get a consistent estimator, we need to multiply the usual sample correlation coefficient, based on the paired observations (by algorithm A0), by a correction factor. The correction factor is a function only of \(t_{{k_{i}^{1}}}\) and \(t_{{k_{i}^{2}}}\) for i = 1(1)n, i.e., it depends only on the arrival process.
2.3 Nonlinear dependence and elliptical copula
So far we were dealing with linear dependence through the correlation coefficient. In this section we will deal with nonlinear dependence through copula. A copula is formally defined as follows.
DEFINITION 4
A d-dimensional distribution function C(u1,u2,...,ud) : [0,1]d \(\rightarrow [0,1]\), where the margins satisfy Cj(uj) = C(1,1,...,uj,...,1) = uj for all uj ∈ [0,1] and j = 1,...,d, is called a copula.
It is clear from the definition that a copula is a distribution function with uniform margins. There are many families of copula functions, like the elliptical copulas, Archimedean copulas, vine copulas, etc. In this section, we restrict our attention to elliptical copulas. An elliptical copula has the form

$$ C(u_{1},u_{2},...,u_{d})=F_{R}(F^{-1}(u_{1}),F^{-1}(u_{2}),...,F^{-1}(u_{d})), $$

where FR(x1,x2,...,xd) is a d-dimensional elliptical distribution function with correlation matrix R and F(.) is the marginal distribution of FR (Hyrš and Schwarz, 2015). The Gaussian copula is the most widely used elliptical copula; it mimics the dependence structure of a multivariate Gaussian distribution but does not capture nonlinear dependence. It is well-known that a Gaussian copula with correlation coefficient zero reduces to the independence copula, but this is not true for elliptical copulas in general. For example, in the case of another common elliptical copula, the t copula, the parameter captures the linear dependence while the form of the copula function accommodates nonlinear dependence. We will now discuss the effect of asynchronicity on copula estimation.
By Sklar’s theorem (see Nelsen 2007), the distribution function of the log returns R1 and R2 can be expressed as F(r1,r2) = C(F1(r1),F2(r2);𝜃), where C is the unique copula associated with F, and F1, F2 are the distribution functions of the scaled returns on a unit interval i.e.
Asynchronicity affects not only the estimation of 𝜃 but also the estimation of the copula function, because R1 and R2 are assumed to be observed synchronously. The convergence of \(\hat {C}(\hat {F}_{1}(r^{1}),\hat {F}_{2}(r^{2});\hat {\theta })\), where \(\hat {F}_{1}(.)\) and \(\hat {F}_{2}(.)\) are the empirical distribution functions of R1 and R2, needs more than the convergence of \(\hat {\theta }\). The next theorem addresses this concern. Before stating the theorem, we make an additional assumption, which ensures that the probability of both the missing value of the scaled return at \(t_{{k_{i}^{1}}}\) and the observed value of the scaled return at \(t_{{k_{i}^{2}}}\) lying in an interval of length 2δ is of the order \(\frac {\delta }{n^{\psi }}\) for ψ > 0. \(\mathcal {A}_{4}: P[\mid R_{{k^{1}_{i}}}^{l}-R_{{k^{2}_{i}}}^{l}\mid \leq 2\delta ]=O(\frac {\delta }{n^{\psi }})\) for l = 1,2 with ψ > 0 and \(\mid \mathrm {max_{i}}(R_{{k_{i}^{1}}}-R_{{k_{i}^{2}}})\mid <M\), where M is a positive real number. This assumption is reasonable for two reasons. First, as δ gets smaller, we expect the probability \(P[\mid R_{{k^{1}_{i}}}^{l}-R_{{k^{2}_{i}}}^{l}\mid \leq 2\delta ]\) to be smaller as well. Second, since we are looking at high-frequency data, an increase in n implies higher liquidity, so the time between two consecutive transactions shortens. As we assumed an underlying diffusion process driven by geometric Brownian motion, the fluctuations in price (and therefore in returns) are much smaller over a short interarrival. Therefore the probability is smaller for a higher value of n.
Theorem 2.
If the true underlying copula is an elliptical copula then under \(\mathcal {A}_{1}-\mathcal {A}_{4}\), \(C(\hat {F}_{1}(r_{1}),\tilde {F}_{2}(r_{2});\hat {\theta })\) is uniformly convergent to the true copula, where \(\hat {F}_{1}(.)\) and \(\tilde {F}_{2}(.)\) are the empirical distribution functions of the marginals of R1 and R2 computed from the scaled paired data and \(\hat {\theta }\) is defined as in Theorem 1.
Proof of Theorem 2 is given in Appendix A.
2.4 γ and Expected Loss of Data
Recall that the correction factor γ in Theorem 1 is a function of the arrival process only. It is worthwhile to express γ in terms of the underlying parameters of the arrival processes, which we do in the next theorem. But the implication of the theorem goes beyond this purpose. Recall that all the synchronization methods we discussed share one problem: they result in a loss of data, which is evident from Fig. 2. The second observation of the first stock will not be included in any of the pairs and is therefore wasted. One can thus ask what proportion of observations (of each stock) will be wasted by our pairing method (A0). This can be answered by comparing the average interarrival length in a stock (for example \(E({t_{i}^{1}}-t_{i-1}^{1})\) for the first stock) with the average interarrival length formed by the pairs (\(E(t_{k_{i}}^{1}-t_{k_{i-1}}^{1})\) for the first stock). One important point to note here is that even if the two initial point processes \({{t_{i}^{1}}}:i=1(1)n_{1}\) and \({{t_{i}^{2}}}:i=1(1)n_{2}\) are independent, the point processes after pairing, \({t_{k_{i}}^{1}}:i=1(1)n\) and \({t_{k_{i}}^{2}}:i=1(1)n\), are not independent, because the pairing method (A0) involves arrivals of both stocks. For this reason, we will see in the next theorem that both \(E(t_{k_{i}}^{1}-t_{k_{i-1}}^{1})\) and \(E(t_{k_{i}}^{2}-t_{k_{i-1}}^{2})\) involve λ1 and λ2, the parameters of the two point processes.
Theorem 3.
Suppose the two underlying point processes are Poisson processes with parameters λ1 and λ2. Then,
1. \(E(I) = \frac {1}{2}{\sum }_{n=1}^{\infty }n[(\frac {1}{\lambda _{1}}+\frac {1}{\lambda _{2}})(p_{n}+q_{n})].\)

2. \(E(t_{k_{i}}^{1}-t_{k_{i-1}}^{1}) = \frac {\eta _{1}}{\lambda _{1}}.\)

3. \(E(t_{k_{i}}^{2}-t_{k_{i-1}}^{2}) = \frac {\eta _{2}}{\lambda _{2}}.\)
where
and for i = 1,2
with FB(a,b) denoting the cumulative distribution function (cdf) of the Beta (a,b) distribution.
Proof of Theorem 3 is given in Appendix A.
As a consequence of this theorem we have
Before moving to the next section, we note two points. First, if one of the stocks has much lower liquidity than the other, then the number of paired observations will be significantly reduced; the extent of this reduction can be obtained precisely from Theorem 3. Second, the parameter of an elliptical copula is the correlation matrix. As a result, all the methods described above are applicable to multivariate analysis with dimension greater than 2.
3 Extension to general copula
In this section, we deal with a more general class of copulas. As the argument in Section 2 is based entirely on the correlation coefficient, it cannot be directly extended to a larger class of copulas. This is precisely because for a general copula there is no direct relation between Pearson’s correlation coefficient and the copula parameter.
We propose to use Kendall’s tau to capture the copula dependence. Kendall’s tau is defined as

$$ \rho_{\tau}=P[(X-\tilde{X})(Y-\tilde{Y})>0]-P[(X-\tilde{X})(Y-\tilde{Y})<0], $$

where \((\tilde {X},\tilde {Y})\) is an independent copy of (X,Y). The relation between Kendall’s tau and the copula C is captured through the equation

$$ \rho_{\tau}=4{\int}_{[0,1]^{2}}C(u,v) dC(u,v)-1. $$
If X and Y are random variables with an Archimedean copula C generated by ϕ in Ω, then

$$ \rho_{\tau}=1+4{\int}_{0}^{1}\frac{\phi(t)}{\phi^{\prime}(t)} dt. $$

For the elliptical copulas a simplified form can be derived,

$$ \rho_{\tau}=\frac{2}{\pi}\arcsin(\theta), $$

where 𝜃 is the correlation parameter.
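For concreteness, the standard parameter–tau maps for some common families (Clayton: τ = 𝜃/(𝜃 + 2); Gumbel: τ = 1 − 1/𝜃; elliptical: τ = (2/π)arcsin(𝜃)) can be coded as below; the function names are ours.

```python
import math

def tau_from_clayton(theta):
    """Kendall's tau of the Clayton copula with parameter theta > 0."""
    return theta / (theta + 2)

def tau_from_gumbel(theta):
    """Kendall's tau of the Gumbel copula with parameter theta >= 1."""
    return 1 - 1 / theta

def tau_from_elliptical(rho):
    """Kendall's tau of an elliptical copula with correlation rho."""
    return 2 / math.pi * math.asin(rho)

def rho_from_tau_elliptical(tau):
    """Inverse map: correlation parameter from Kendall's tau."""
    return math.sin(math.pi * tau / 2)
```

The last function is the one used in practice: estimate tau, then read off the copula parameter.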
So we can study how Kendall’s tau is affected by asynchronicity, and thereupon gauge the impact on the copula parameter using the above-mentioned relations.
3.1 Underestimation of Kendall’s Tau
To have a closer look at the dependence, in this section we consider the conditional distribution of the return given the underlying configuration. The problem with nonsynchronous data is that any two independent pairs of returns cannot be taken as identical copies of each other. To see this, consider Fig. 4, where the arrival times of the first stock are denoted by triangles and the arrival times of the second stock are denoted by circles. After applying the pairing method, suppose the first circle and first triangle represent the location of the first pair of prices. Similarly, the second circle and the second triangle represent the next pair. From the figure, it is evident that these two pairs form an example of the second configuration (see Eq. 1). Similarly, the 3rd and 4th pairs constitute an example of the 4th configuration. So the corresponding returns may not be considered identically distributed. In this subsection, we measure Kendall’s tau using only the returns with the same configuration. Figure 5 represents the arrival times of two pairs of the same configuration.
As illustrated in Fig. 5, suppose we have two non-overlapping inter-arrivals u1 and u2 for the first stock and 𝜖1 + u1 + η1 and 𝜖2 + u2 + η2 for the second stock, with arrival times denoted by the triangles and circles respectively. The log returns corresponding to the inter-arrivals of the first stock are given by \({R^{1}_{1}}=R^{1}(u_{1})\) and \({R^{1}_{2}}=R^{1}(u_{2})\). Similarly, the log returns corresponding to the intervals of the second stock are denoted by \({R^{2}_{1}}=R^{2}(\epsilon _{1}+u_{1}+\eta _{1})=R^{2}(\epsilon _{1})+R^{2}(u_{1})+R^{2}(\eta _{1})\) (due to the independent increment property) and \({R^{2}_{2}}=R^{2}(\epsilon _{2}+u_{2}+\eta _{2})=R^{2}(\epsilon _{2})+R^{2}(u_{2})+R^{2}(\eta _{2})\). In the following section, we will focus on the two specific configurations in (1).
Define,
and
where Ii and \({I_{i}^{c}}\) are respectively the overlapping and non-overlapping regions of the i-th pair of returns. In the above example, length(I1) = u1, length(I2) = u2, \(\text {length}({I_{1}^{c}})=\epsilon _{1}+\eta _{1}\) and \(\text {length}({I_{2}^{c}})=\epsilon _{2}+\eta _{2}\). Note that for the first and fourth configurations, E(sign(A)) gives us the true Kendall’s tau. We cannot calculate E(sign(A)) because R2(I1) and R2(I2) are not observed. Instead, we observe \(R^{2}(I_{1}\cup {I_{1}^{c}})\) and \(R^{2}(I_{2}\cup {I_{2}^{c}})\). Therefore the observed Kendall’s tau is E(sign(A + B)). In this section, we try to find the relation between E(sign(A)) and E(sign(A + B)).
In order to establish our result, we need some assumptions. Suppose X and Y are positively associated random variables. Let (X1,Y1) and (X2,Y2) be two identical copies of (X,Y ). Then, given the information that Y1 − Y2 > 0, we would expect that X1 − X2 is more likely to be positive than negative. Intuitively, positive association would also suggest that given the information \(Y_{1}-Y_{2}\in S\subset \mathbb {R}^{+}\), X1 − X2 is more likely to be positive. This notion is not in general captured by any known measure of association. For each of the following we define (X1,Y1) and (X2,Y2) as two identical copies of (X,Y ), U = X1 − X2 and V = Y1 − Y2.
Assumptions (\(\mathbf {{\mathscr{B}}}\)), stated below, try to capture the above idea. We expand on this more in Appendix A.
\(\mathbf {{\mathscr{B}}_{1}}\): If P(UV > 0) > 1/2 (or < 1/2) then for all M > 0,
P(U > 0∣0 < V < M) ≥ 1/2 (or < 1/2) and
P(V > 0∣0 < U < M) ≥ 1/2 (or < 1/2).
\(\mathbf {{\mathscr{B}}_{2}}\): If P(UV > 0) > 1/2 (or < 1/2) then for all M > 0,

P(UV > 0∣ ∣V ∣ > M) > 1/2 (or < 1/2) and

P(UV > 0∣ ∣U∣ > M) > 1/2 (or < 1/2).
Before stating the main theorems, we first state some lemmas that will help us prove the theorems.
Lemma 1.
E(sign(A)∣sign(A)≠sign(B)) = E(sign(A)).
Proof of Lemma 1 is given in Appendix A.
Lemma 2.
E(sign(A)∣sign(A)≠sign(B),∣A∣ < ∣B∣) = E(sign(A)∣∣A∣ < ∣B∣).
This is a straightforward consequence of Lemma 1 and the independence of {sign(A)≠sign(B)} and {∣A∣ < ∣B∣}.
Theorem 4.
Under the Assumption \({\mathscr{B}}\), for the pairs with 1st and 4th configuration,
where ρτ is the Kendall’s tau calculated on the paired data with 1st and 4th configurations, i.e. ρτ = E(sign((X1 − X2)(Y1 − Y2))), where (X1,Y1) and (X2,Y2) are independent pairs of the same configuration.
Proof of Theorem 4 is given in Appendix A. Now we will show that \(\text {sign}(\rho _{\tau })=\text {sign}(\hat {\rho }_{\tau })\).
Theorem 5.
For the pairs with 1st and 4th configuration,
where ρτ is the Kendall’s tau calculated on the paired data with 1st and 4th configurations, i.e. ρτ = E(sign((X1 − X2)(Y1 − Y2))), where (X1,Y1) and (X2,Y2) are independent pairs of the same configuration.
Proof of Theorem 5 is given in Appendix A. By Lemma 2,
This together with Assumption \({\mathscr{B}}\) implies that, \(\text {sign}(\rho _{\tau })=\text {sign}(\hat {\rho }_{\tau })\).
Theorems 4 and 5 together imply that the estimator of Kendall’s tau obtained after pairing the observed asynchronous data underestimates the true parameter under the assumption \({\mathscr{B}}\) for the 1st and 4th configurations. Similar results can be established under the other two configurations.
3.2 Corrected Estimator
Similar to Section 2, we would like to find a correction factor, depending only on the arrival times, for a more general class of copulas. For elliptical copulas, the value of the correction factor does not depend on the value of the parameter. This is evident from Fig. 6 (left panel), showing the true parameter and the mean uncorrected estimated parameter of the Gaussian copula for simulated nonsynchronous data, with arrival times generated according to a pre-specified Poisson process. We can see that the true and estimated parameters lie along a regression line whose intercept term is insignificant. This suggests that the corrected estimator should be a constant times the uncorrected one; this constant is the correction factor derived in Theorem 1.

On the other hand, in the figure for the Clayton copula (right panel, Fig. 6), we can see that a straight line would not be a good candidate to model the relation between the true and the uncorrected estimated Kendall’s tau. This means we should not aspire to find a simple multiplicative correction factor that would recover the true parameter. On inspection, a second-degree polynomial seems to be a good model. Following the same procedure, a second-degree polynomial seems appropriate for the Gumbel copula as well. We therefore use a quadratic model to obtain the corrected estimator. The detailed steps are outlined below:
1. From the observed data, estimate the two arrival processes independently.

2. Estimate the univariate marginal distributions.

3. Using the pairing algorithm described in Section 2.2, pair the observations.

4. With the paired data, determine which copula fits best; this can be done using the AIC or BIC criterion.

5. Estimate the (uncorrected) Kendall’s tau from the paired data.

6. Pre-specify K copula parameters (or equivalently, values of Kendall’s tau). For each parameter, with the information on the underlying copula, arrival processes, and marginals, simulate N nonsynchronous samples (the technique for generating nonsynchronous data is discussed in Section 4).

7. For each sample, calculate the uncorrected estimate and plot the estimates against the true Kendall’s tau, as in Fig. 8 (right panel).

8. Fit a suitable quadratic regression to this plot.

9. From the regression equation, find the corrected Kendall’s tau corresponding to the estimated value of Kendall’s tau (obtained in step 5).
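Steps 7-9 above can be sketched as follows, assuming the simulated calibration points (true tau, mean uncorrected estimate) are already available. We fit the quadratic by ordinary least squares via the normal equations and invert it by bisection, under the assumption that the calibration curve is increasing on the search interval; all names are ours.

```python
def quad_fit(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2; returns (a, b, c)."""
    Sx = [sum(x ** k for x in xs) for k in range(5)]   # sums of powers of x
    A = [[Sx[i + j] for j in range(3)] for i in range(3)]  # normal equations
    b = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, 3))) / A[r][r]
    return coef

def corrected_tau(tau_est, coef, lo=-1.0, hi=1.0):
    """Invert the fitted curve g(true) = a + b*t + c*t^2 by bisection,
    assuming g is increasing on [lo, hi]."""
    a, b, c = coef
    def g(t):
        return a + b * t + c * t * t
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if g(mid) < tau_est:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Given an uncorrected estimate from step 5, `corrected_tau` returns the value of the true Kendall's tau whose average simulated estimate matches it.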
Note that the above procedure yields an interval estimator for Kendall’s tau by considering the confidence interval in the regression. In Section 4 we study the coverage probability of such intervals through simulations and compare them to other interval estimates.
4 Simulation
We simulated synchronized log-returns of two stocks at n1 + n2 time points. The time points are generated by a Poisson process. The corresponding n1 + n2 returns are drawn randomly from a bivariate distribution determined by a pre-specified copula and margins. These n1 + n2 pairs are then transformed appropriately to represent log-prices on the corresponding interarrivals. From the first stock, we then randomly delete n2 time points and their corresponding prices; the remaining n1 data points constitute the data for the first stock. For the second stock, we keep the time points that were deleted from the first stock and delete the rest. These time points, along with their corresponding log-prices, constitute the data for the second stock. We now have nonsynchronous data for the two stocks.
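A minimal sketch of this generation scheme is given below. For illustration we use a Gaussian copula with standard normal margins and an exponential-interarrival (Poisson) clock; these concrete choices, and the function name, are assumptions of the sketch rather than fixed choices of the paper, and we return raw returns instead of transforming them to log-prices.

```python
import random

def simulate_nonsynchronous(n1, n2, rho, lam=1.0, seed=0):
    """Generate nonsynchronous data for two stocks.

    Draws n1 + n2 Poisson arrival times, attaches a correlated (Gaussian)
    return pair to each, then assigns each time point to exactly one stock:
    n1 random points go to stock 1, the remaining n2 go to stock 2.
    Returns two lists of (time, return) tuples.
    """
    rng = random.Random(seed)
    n = n1 + n2
    # Poisson process: cumulative sums of exponential interarrivals
    times, t = [], 0.0
    for _ in range(n):
        t += rng.expovariate(lam)
        times.append(t)
    # bivariate Gaussian pairs with correlation rho (Cholesky factorization)
    returns = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        returns.append((z1, rho * z1 + (1 - rho ** 2) ** 0.5 * z2))
    idx1 = set(rng.sample(range(n), n1))  # time points kept for stock 1
    stock1 = [(times[i], returns[i][0]) for i in range(n) if i in idx1]
    stock2 = [(times[i], returns[i][1]) for i in range(n) if i not in idx1]
    return stock1, stock2
```

By construction, the two stocks never share a time point, mimicking fully asynchronous trading.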
4.1 Estimation of copula parameter of elliptical copulas
In the following simulation study, we test the performance of the method prescribed in Theorem 1 for estimating the copula parameter. We first choose a Gaussian copula and generate 100 instances of nonsynchronous data by the method described above. Initially, n1 and n2 are taken to be equal. The mean, variance and mean squared error of the 100 estimates are reported in Tables 1 and 2. In Fig. 7, we show the boxplots for ρ = 0.8. The boxplot on the left corresponds to the corrected estimate, and those in the middle and on the right correspond to the uncorrected estimates obtained from refresh-time sampling and previous-tick sampling, respectively. The horizontal line indicates the true parameter.
From the tables, we see that both previous-tick and refresh-time sampling fail to capture the magnitude of the true dependence. In fact, the previous-tick method is the worst choice for synchronization.
We carry out the same analysis with the t copula, with different marginal distributions and degrees of freedom, which is a more realistic scenario for intraday financial data. The result is similar: not only does our prescribed correction give a good estimate, but the uncorrected method also returns a substantially biased estimate. The results of 100 simulations with parameter −0.4 are summarized in Table 3.
4.2 Interval Estimation of Kendall’s Tau in Non-elliptical Copula
We take three approaches to interval estimation of the true Kendall’s tau and apply those on simulated data from several Archimedean copulas. In the first approach, we follow the method described in Section 3.2 and get the 95% confidence interval. Here we are assuming a quadratic relation between the true and the estimated parameters. The blue dotted lines in the right panel of Fig. 8 show the confidence intervals for Clayton copula.
The second approach is similar to the first, but we do not fit a regression line. Instead, for each true Kendall’s tau, we plot the interval that contains the (under)estimated Kendall’s tau 95% of the time. In the left panel of Fig. 8, we plot these (horizontal) intervals against the true Kendall’s tau. We then take as the confidence interval for the true Kendall’s tau the vertical interval corresponding to the estimated Kendall’s tau (see the red vertical lines corresponding to 0.1, 0.2 and 0.32 in the figure). This is a completely non-parametric approach and relies on inverting the acceptance regions of hypothesis tests for Kendall’s tau.
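The interval-inversion idea can be sketched as below: `band` records the central 95% range of the simulated (biased) estimates for one true tau, and `invert_bands` returns the range of grid values whose band covers the observed estimate. The helper names and the grid are illustrative, not from the paper.

```python
def band(estimates, level=0.95):
    """Central `level` range of the simulated (biased) estimates."""
    s = sorted(estimates)
    k = int(len(s) * (1 - level) / 2)   # e.g. 25 of 1000 at the 95% level
    return s[k], s[-k - 1]

def invert_bands(tau_grid, bands, tau_hat):
    """Confidence interval for the true tau: the range of grid values whose
    band contains the observed (uncorrected) estimate tau_hat."""
    covered = [t for t, (lo, hi) in zip(tau_grid, bands) if lo <= tau_hat <= hi]
    return (min(covered), max(covered)) if covered else None
```

For instance, with hypothetical bands of the form 0.8t ± 0.05 on a grid of true taus, an observed estimate of 0.4 inverts to the interval [0.45, 0.55] for the true tau.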
In the third approach, we deliberately mis-specify the underlying copula as a Gaussian copula and use Theorem 1 to calculate the confidence interval (using the relation between the correlation coefficient and Kendall’s tau for elliptical copulas; see Eq. 5). The results of these three approaches to interval estimation are given in Table 4.
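The third approach relies on the standard elliptical-copula link between Kendall's tau and the correlation coefficient, ρ = sin(πτ/2) (Eq. 5). A small sketch of the two directions of this map:

```python
import math

def tau_to_rho(tau):
    """Kendall's tau -> correlation coefficient, valid for elliptical copulas."""
    return math.sin(math.pi * tau / 2)

def rho_to_tau(rho):
    """Correlation coefficient -> Kendall's tau (the inverse map)."""
    return 2 / math.pi * math.asin(rho)
```

The two functions are exact inverses of each other on [−1, 1], so an interval for ρ translates directly into an interval for τ and vice versa.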
The coverage probabilities and interval lengths for the three methods of interval estimation described in Section 4.2 are shown in Table 4. An important takeaway comes from the last column, which demonstrates that the effect of model misspecification can be quite serious. Note that the second method, being completely non-parametric, does not assume anything about the shape of the dependence between the true parameter and the uncorrected estimate. The first method assumes a quadratic model, which reduces the computational burden substantially. From the table we see that the coverage probability of the first method is always at least the nominal 95%, so the quadratic assumption does not compromise the coverage probability. Its intervals are slightly wider than those of the second method, making the first method more conservative. Another observation is that the lengths of the intervals do not depend much on the value of the underlying parameter.
5 Real Data Analysis
We analyze real financial intraday data to see which kind of copula is most likely to be encountered in practice and to obtain the corresponding parameter estimates. We use AIC to compare candidate copulas and select the best one. In many cases, we find the t copula to be a good choice for modelling bivariate intraday data. To see the impact of asynchronicity on real data, we record the relative extent of the correction undertaken. The intraday data for Apple and Facebook stocks are plotted in Fig. 9. These have been modelled by a bivariate t copula for three consecutive days. For all three days, both the uncorrected and the corrected estimates are reported in Table 5, and the percentage change between the uncorrected and corrected estimates is reported in the third column. We notice that almost 30 to 35% of the data is lost (deleted) after constructing the pairs by algorithm \(\mathcal {A}_{0}\).
We also perform the same analysis for other pairs of stocks and obtain very similar results. For example, for Amazon and Netflix on three nearly consecutive days, the percentage changes in the copula parameter with the t copula are 41.75%, 39.84% and 42.76%, respectively.
6 Conclusion and Future Directions
Both the simulations and the real data analysis clearly show that the impact of asynchronicity can be very serious if not tackled properly. We discuss several methods to circumvent the problem. Careful pre-processing of intraday data is necessary to model, or draw inferences about, the underlying dynamics. We propose a consistent estimator of the correlation coefficient and, more generally, of the elliptical copula function. For a more general class of copulas, in which there is a one-to-one relation between Kendall’s tau and the copula parameter, we suggest a way of estimating the copula parameter. Alongside the point estimates, three methods of interval estimation are discussed and compared. From the results it is evident that the impact of asynchronicity can be quite serious under model mis-specification. The real data analysis corroborates our findings. For the two chosen stocks, as the correlation is very low, the absolute change in the value after the correction is small; the relative change, however, is considerable, as expected.
There are several directions in which this work can be extended. First, we did not assume the presence of microstructure noise; with noisy observations, the estimator may demand further modifications. The estimation procedure can be even more challenging if the parameter is time-dependent. As time-dependent copula modelling is gaining popularity in financial data analysis, it is worthwhile to investigate the effect of asynchronicity on time-varying parameter estimation. Another question one can look into is how asynchronicity affects the estimation of popular risk measures, such as the Value at Risk (VaR), for portfolios involving multiple assets.
Data Availability
Data collected from public source (https://www.dukascopy.com/).
References
Aït-Sahalia, Y., Fan, J. and Xiu, D. (2010). High-frequency covariance estimates with noisy and asynchronous financial data. Journal of the American Statistical Association 105, 492, 1504–1517.
Ait-Sahalia, Y., Mykland, P.A. and Zhang, L. (2005). How often to sample a continuous-time process in the presence of market microstructure noise. The Review of Financial Studies 18, 2, 351–416.
Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A. and Shephard, N. (2011). Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Journal of Econometrics 162, 2, 149–169.
Breymann, W., Dias, A. and Embrechts, P. (2003). Dependence structures for multivariate high-frequency data in finance. Quantitative Finance 3, 1, 1–14.
Buccheri, G., Bormetti, G., Corsi, F. and Lillo, F. (2020). A score-driven conditional correlation model for noisy and asynchronous data: An application to high-frequency covariance dynamics. Journal of Business & Economic Statistics, 1–17.
Corsi, F. and Audrino, F. (2012). Realized covariance tick-by-tick in presence of rounded time stamps and general microstructure effects. Journal of Financial Econometrics 10, 4, 591–616.
Embrechts, P., McNeil, A. and Straumann, D. (2002). Correlation and dependence in risk management: properties and pitfalls. Risk Management: Value at Risk and Beyond 1, 176–223.
Epps, T.W. (1979). Comovements in stock prices in the very short run. Journal of the American Statistical Association 74, 366a, 291–298.
Fan, J., Li, Y. and Yu, K. (2012). Vast volatility matrix estimation using high-frequency data for portfolio selection. Journal of the American Statistical Association 107, 497, 412–428.
Frey, R. and McNeil, A.J. (2002). VaR and expected shortfall in portfolios of dependent credit risks: conceptual and practical insights. Journal of Banking & Finance 26, 7, 1317–1334.
Hayashi, T., Yoshida, N. et al. (2005). On covariance estimation of non-synchronously observed diffusion processes. Bernoulli 11, 2, 359–379.
Ho, M. and Xin, J. (2019). Sparse Kalman filtering approaches to realized covariance estimation from high frequency financial data. Mathematical Programming 176, 1, 247–278.
Hwang, E. and Shin, D.W. (2018). Two-stage stationary bootstrapping for bivariate average realized volatility matrix under market microstructure noise and asynchronicity. Journal of Econometrics 202, 2, 178–195.
Hyrš, M. and Schwarz, J. (2015). Elliptical and Archimedean copulas in estimation of distribution algorithm with model migration, 1. IEEE, p. 212–219.
Krishnan, C.N.V., Petkova, R. and Ritchken, P. (2009). Correlation risk. Journal of Empirical Finance 16, 3, 353–367.
Malgrat, M. (2013). Pricing of a “worst of” option using a copula method. Ph.D. Thesis, KTH.
Malliavin, P., Mancino, M.E. et al. (2009). A Fourier transform method for nonparametric estimation of multivariate volatility. The Annals of Statistics 37, 4, 1983–2010.
Mancino, M.E. and Sanfelici, S. (2011). Estimating covariance via fourier method in the presence of asynchronous trading and microstructure noise. Journal of Financial Econometrics 9, 2, 367–408.
Muthuswamy, J., Sarkar, S., Low, A. and Terry, E. (2001). Time variation in the correlation structure of exchange rates: high-frequency analyses. Journal of Futures Markets: Futures, Options, and Other Derivative Products 21, 2, 127–144.
Nelsen, R.B. (2007). An introduction to copulas. Springer Science and Business Media, Berlin.
Peluso, S., Corsi, F. and Mira, A. (2014). A bayesian high-frequency estimator of the multivariate covariance of noisy and asynchronous returns. Journal of Financial Econometrics 13, 3, 665–697.
Renò, R. (2003). A closer look at the epps effect. International Journal of Theoretical and Applied Finance 6, 01, 87–102.
Salmon, M., Schleicher, C. et al. (2006). Pricing multivariate currency options with copulas. Copulas: From Theory to Application in Finance, Risk Books, London.
Xu, Q. and Li, X.-M. (2009). Estimation of dynamic asymmetric tail dependences: an empirical study on asian developed futures markets. Applied Financial Economics 19, 4, 273–290.
Zebedee, A.A. and Kasch-Haroutounian, M. (2009). A closer look at co-movements among stock returns. Journal of Economics and Business 61, 4, 279–294.
Zhang, L. (2011). Estimating covariation: Epps effect and microstructure noise. Journal of Econometrics 160, 1, 33–47.
Zhang, Z. and Zhu, B. (2016). Copula structured m4 processes with application to high-frequency financial data. Journal of Econometrics 194, 2, 231–241.
Funding
The authors did not receive support from any organization for the submitted work.
Ethics declarations
Ethics Approval. No ethical approval is required.
Conflict of interests. The authors have no conflicts of interest to declare that are relevant to the content of this article.
Appendices
Appendix A
Proofs
A.1 Proof of Theorem 1
Proof.
The conditional expectation:
This is a consequence of the assumption A1 (as illustrated in the examples in Section 2.2). As we have
the unconditional expectation \(E(X_{t({k_{i}^{1}})}-X_{t(k_{i-1}^{1})})^{2}=E(t({k_{2}^{1}})-t({k_{1}^{1}}))E(X^{2})\). Similarly, \(E(Y_{t({k_{i}^{2}})}-Y_{t(k_{i-1}^{2})})^{2}=E(t({k_{2}^{2}})-t({k_{1}^{2}}))E(Y^{2})\).
Therefore,
Note that the estimate \(\hat {\theta }\) is defined as
where \(\hat {\rho }=\hat {Cor}((X_{t({k_{2}^{1}})}-X_{t({k_{1}^{1}})}),(Y_{t({k_{2}^{2}})}-Y_{t({k_{1}^{2}})}))\), the sample correlation coefficient.
Let us define the correction factor: \(w_{n}=\frac {\sqrt {m_{1}m_{2}}}{m(I)}\).
By the WLLN, \(w_{n}\) and \(\hat {\rho }\) converge in probability to γ and ρ respectively. Hence \(\hat \theta = w_{n}\hat {\rho }\) is consistent for 𝜃 = γρ.
By CLT and independence of the arrival process and the price process,
for some \({\sigma _{0}^{2}}\) that depends only on the distribution of the arrival times.
By delta method,
This completes the proof. □
A.2 Proof of Theorem 2
Proof.
Note that F(x,y) = C(F1(x),F2(y),𝜃). So \(\hat {C}=\hat {C}(\hat {F}_{1}(R^{1}_{{k_{i}^{1}}}),\hat {F}_{2}(R^{2}_{{k_{i}^{1}}}),\hat {\theta };\ i=1(1)n)\) is a consistent estimator for the copula C. But the \(R^{2}_{{k_{i}^{1}}}\)’s are unobserved, whereas the \(R^{2}_{{k_{i}^{2}}}\)’s are actually observed. Let us use the notation \(\hat {F}_{2}(.)\) and \(\tilde {F}_{2}(.)\) for the empirical distribution function of R2 based on the values \(\{R^{2}_{{k_{i}^{1}}}:i=1(1)n\}\) and \(\{R^{2}_{{k_{i}^{2}}}:i=1(1)n\}\) respectively. Here \(R^{2}_{{k_{i}^{1}}}= \frac {(Y_{t({k^{1}_{i}})}-Y_{t(k^{1}_{i-1})})}{(t({k^{1}_{i}})-t(k^{1}_{i-1}))}\). Therefore, to claim that the estimated copula based on the paired (observed) data is consistent, we have to show that \(\mid \hat {C}(\hat {F}_{1},\hat {F}_{2},\hat {\theta })-\hat {C}(\hat {F}_{1},\tilde {F}_{2},\hat {\theta })\mid \rightarrow 0\) a.s.
Suppose \(\delta _{i}=\mid R^{2}_{{k_{i}^{1}}}-R^{2}_{{k_{i}^{2}}}\mid \). Note that by Assumption \(\mathcal {A}_{4}\), \(\delta = \max \limits ({\delta _{i}}) < M\). Then,
This implies that
The second inequality is due to Chebyshev’s inequality and the last equality is due to \(\mathcal {A}_{4}\). The third inequality is a consequence of asynchronicity as for each i, there are at most two j’s (the preceding and the next) for which \((R^{1}_{{k_{i}^{1}}},R^{2}_{{k_{i}^{2}}})\) and \((R^{1}_{{k_{j}^{1}}},R^{2}_{{k_{j}^{2}}})\) are dependent.
Hence by the Borel–Cantelli lemma, \(\mid \tilde {F}_{2}(r)-\hat {F}_{2}(r)\mid \stackrel {a.s.}{\rightarrow } 0\). Again, as \(\mid \hat {F}_{2}(r)-F_{2}(r)\mid \stackrel {a.s.}{\rightarrow } 0\), we have \(\mid \tilde {F}_{2}(r)-F_{2}(r)\mid \stackrel {a.s.}{\rightarrow } 0\).
Now we have to show that the convergence is uniform. That is, we want to show that, almost surely, for every \(\epsilon >0\), \(\sup _{r}\mid \tilde {F}_{2}(r)-F_{2}(r)\mid <\epsilon \) for all sufficiently large n. This can be done using standard arguments and properties of distribution functions, so we skip the details. Now, using properties of copulas, we can see that
As both \(\hat {F}_{1}\) and \(\tilde {F}_{2}\) are uniformly convergent to F1 and F2 respectively, the result follows. □
A.3 Proof of Theorem 3
Proof.
(a) If we fix the point \(t({k_{i}^{1}})\), then there can be two situations depending on the position of \(t({k_{i}^{2}})\).
Case 1: \(t({k_{i}^{1}})>t({k_{i}^{2}})\) and Case 2: \(t({k_{i}^{1}})<t({k_{i}^{2}})\)
First consider Case 1. We are interested in the overlapping interval. As illustrated in Fig. 10, define S1 as the first interarrival of the first stock after \(t({k_{i}^{1}})\), T1′ as the first interarrival of the second stock after \(t({k_{i}^{2}})\), and T1 as the first interarrival of the second stock after \(t({k_{i}^{1}})\); i.e., if we start observing the process only from \(t({k_{i}^{1}})\), then T1 is the first arrival time of the second stock. As the arrival processes are Poisson processes, T1 and T1′ have the same distribution (by the memoryless property). Denote the subsequent interarrivals by Si : i = 2,3,... and Ti : i = 2,3,....
Now (for case 1),
Therefore (for case 1),
Now, \(E\big ({\sum }_{i=1}^{n}T_{i}\big )=nE(T_{1})=\frac {n}{\lambda _{2}}\) and \(E\big ({\sum }_{i=1}^{n}S_{i}\big )=nE(S_{1})=\frac {n}{\lambda _{1}}\).
Define \(p_{n}=P\big ({\sum }_{i=1}^{n}T_{i}<S_{1}<{\sum }_{i=1}^{n+1}T_{i})\) and \(q_{n}=P\big ({\sum }_{i=1}^{n}S_{i}<T_{1}<{\sum }_{i=1}^{n+1}S_{i}\big )\). So,
Note that \(T_{1}\sim exp(\lambda _{2})\equiv Gamma(1,\lambda _{2})\) and \({\sum }_{i=1}^{n}S_{i}\sim Gamma(n,\lambda _{1})\). Therefore,
where \(Z_{n}=\frac {T_{1}/\lambda _{2}}{\frac {{\sum }_{i=1}^{n}S_{i}}{\lambda _{1}}+\frac {T_{1}}{\lambda _{2}}}\sim Beta(1,n)\). Therefore,
Similarly,
Similarly we can derive for case 2 by interchanging the role of X and Y. Hence for case 2 we have,
Now the two cases are equally likely: \(P(\text {Case}\ 1)=\frac {1}{2}=P(\mathrm {Case\ 2})\). Therefore, combining both cases, we have \(E(I)=\frac {1}{2}{\sum }_{n=1}^{\infty }n[(\frac {1}{\lambda _{1}}+\frac {1}{\lambda _{2}})(p_{n}+q_{n})]\).
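As a quick sanity check (our illustration, not part of the proof) of the Beta(1, n) fact used above: if T ∼ Exp(1) and S is the sum of n i.i.d. Exp(1) variables, then T/(T + S) ∼ Beta(1, n), whose CDF is F(z) = 1 − (1 − z)^n. This can be verified by Monte Carlo:

```python
import random

def beta1n_check(n=5, reps=100_000, z=0.3, seed=1):
    """Compare the empirical P(T/(T+S) <= z), with T ~ Exp(1) and S a sum of
    n i.i.d. Exp(1) variables, against the exact Beta(1, n) CDF 1-(1-z)**n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        t = rng.expovariate(1.0)
        s = sum(rng.expovariate(1.0) for _ in range(n))
        hits += t / (t + s) <= z
    return hits / reps, 1 - (1 - z) ** n
```

With 100,000 replications the empirical and exact probabilities agree to within Monte Carlo error.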
(b) Now we have to calculate \(E(t({k_{i}^{1}})-t(k_{i-1}^{1}))\).
For case 1: Define,
Then, \(E(t({k_{i}^{1}})-t(k_{i-1}^{1}))=\frac {1}{\lambda _{1}}\eta _{1}\), where E(N) = η1
For case 2 the derivation is similar.
(c) This part is symmetric to part (b). There are two possibilities. First, the first interarrival (after \(t({k^{1}_{i}})\)) of the second stock is smaller than the first interarrival of the first stock; in this case there must be a k such that the first interarrival of the first stock is more than the k-th interarrival of the second stock but less than the (k + 1)-th interarrival of the second stock, and then N′ = k. Second, the first interarrival (after \(t({k^{1}_{i}})\)) of the first stock is smaller than the first interarrival of the second stock, and then N′ = 1. Therefore, \(E(t({k_{i}^{2}})-t(k_{i-1}^{2}))=\frac {1}{\lambda _{2}}\eta _{2}\), where η2 is the mean of N′ defined below,
$$ N'=\begin{cases} 1 & \text{if}\quad {\sum}_{i=1}^{k}S_{i}<T_{1}<{\sum}_{i=1}^{k+1}S_{i}\quad\text{for some}\ k\ (\geq 1),\\ k & \text{if}\quad {\sum}_{i=1}^{k}T_{i}<S_{1}<{\sum}_{i=1}^{k+1}T_{i}\quad\text{for some}\ k\ (\geq 1) \end{cases} $$
and
$$ \begin{array}{@{}rcl@{}} \eta_{2}=\sum\limits_{k=1}^{\infty}\Bigg[\Big\{F_{B(1,k+1)}\Big(\frac{\lambda_{1}}{\lambda_{1}+\lambda_{2}}\Big)-F_{B(1,k)}\Big(\frac{\lambda_{1}}{\lambda_{1}+\lambda_{2}}\Big)\Big\}+ \\ k\Big\{F_{B(1,k+1)}\Big(\frac{\lambda_{2}}{\lambda_{1}+\lambda_{2}}\Big)-F_{B(1,k)}\Big(\frac{\lambda_{2}}{\lambda_{1}+\lambda_{2}}\Big)\Big\}\Bigg]. \end{array} $$
□
A.4 Proof of Lemma 1
Proof.
Note that Y (η1) + Y (𝜖1) − Y (η2) − Y (𝜖2) is independent of Y (u1) − Y (u2) and symmetric around 0. Now the proof can be completed using symmetry arguments. □
A.5 Proof of Theorem 4
Proof.
Due to symmetry of two configurations, it is enough to prove the theorem for one configuration. Let us consider the 4th configuration. Here I1 = u1, I2 = u2, \({I_{1}^{c}}=\epsilon _{1}+\eta _{1}\) and \({I_{2}^{c}}=\epsilon _{2}+\eta _{2}\).
The Kendall’s tau for this nonsynchronous configuration, as defined in section 1, is:
According to our notation, A = (X1(u1) − X2(u2))(Y1(u1) − Y2(u2)) and B = (X1(u1) − X2(u2))(Y1(𝜖1) + Y1(η1) − Y2(𝜖2) − Y2(η2)). So, ρτ = E(sign(A + B)) and \(\tilde {\rho _{\tau }}=E(\text {sign}(A))\). Let us denote the region, where sign(A)≠sign(A + B), by N i.e. N = {sign(A)≠sign(B) &∣B∣ > ∣A∣}.
But
From expressions of E(sign(A∣N)), using Lemma 2 and assumption \({\mathscr{B}}_{2}\), we get the result. □
A.6 Proof of Theorem 5
Proof.
Due to symmetry of two configurations, it is enough to prove the theorem for one configuration. We consider the case of Fig. 5 (4th configuration). Here I1 = u1, I2 = u2, \({I_{1}^{c}}=\epsilon _{1}+\eta _{1}\) and \({I_{2}^{c}}=\epsilon _{2}+\eta _{2}\).
The Kendall’s tau for this nonsynchronous configuration, as defined in section 1, is:
According to our notation, A = (X1(u1) − X2(u2))(Y1(u1) − Y2(u2)) and B = (X1(u1) − X2(u2))(Y1(𝜖1) + Y1(η1) − Y2(𝜖2) − Y2(η2)). So,
and
where \(\tilde {\rho }_{\tau }\) is the true Kendall’s tau and p = P(sign(A + B) = sign(A)).
Therefore,
Now we have to calculate the probability p.
The 3rd step of the above derivation is justified because {sign(A) = sign(B)} and {sign(A)≠sign(B) & ∣A∣ > ∣B∣} are clearly disjoint. The independence of {sign(A)≠sign(B)} and {∣A∣ > ∣B∣} is self-evident, which yields step 4. Note that sign(A) and sign(B) depend on sign(Y1(u1) − Y2(u2)) and sign(Y1(𝜖1) + Y1(η1) − Y2(𝜖2) − Y2(η2)) respectively. By the independent-increment property, sign(Y1(u1) − Y2(u2)) and sign(Y1(𝜖1) + Y1(η1) − Y2(𝜖2) − Y2(η2)) are independent. Therefore \(P(\text {sign}(A)=\text {sign}(B))=P(\text {sign}(A)\neq \text {sign}(B))=\frac {1}{2}\).
Let us denote the events C = {sign(A) = sign(B)} and D = {sign(A)≠sign(B) & ∣A∣ > ∣B∣}. Note that \(\tilde {\rho }_{\tau }=E(\text {sign}(A))=E(\text {sign}(A)\mid C)\), as conditioning on the event C does not influence the expected value of sign(A).
Inserting this into Eq. A.1 we get
Hence the result is proved. □
B Positive and Negative Connection
Suppose X and Y are positively associated, and (X1,Y1) and (X2,Y2) are two independent, identically distributed pairs. Then the larger Y1 is relative to Y2, the larger we expect the probability of {X1 > X2} to be. This intuition is formalized in the following definition.
DEFINITION 5
X is said to be positively connected to Y if ∀M > 0,
X is said to be negatively connected to Y if ∀M > 0, the signs of the above inequalities are reversed.
Note that the definition is not symmetric in X and Y: that X is positively connected to Y does not imply that Y is positively connected to X.
DEFINITION 6
If X is positively (or negatively) connected to Y and Y is positively (or negatively) connected to X, then we say there is a positive (or negative) connection between X and Y.
It is easy to see that if X and Y have a positive (or negative) connection, then Assumption \({\mathscr{B}}\) is satisfied. This is because \(X_{1}-X_{2}\mid Y_{1}-Y_{2}\stackrel {d}{=}X_{2}-X_{1}\mid Y_{2}-Y_{1}\). Therefore this stronger but reasonable assumption unifies the previous assumptions and enables us to state the following theorem.
Theorem 6.
Under the assumption that the returns of the two stocks have a positive (or negative) connection, for the pairs with the 1st and 4th configurations,
1. \( \mid \tilde {\rho _{\tau }}\mid >\mid \rho _{\tau }\mid \),
2. \( \text {sign}(\tilde {\rho _{\tau }})=\text {sign}(\rho _{\tau }) \),
where ρτ is the Kendall’s tau calculated on the paired data with the 1st and 4th configurations, i.e. ρτ = E(sign(X1 − X2)(Y1 − Y2)), where (X1,Y1) and (X2,Y2) are independent pairs of the same configurations.
Cite this article
Chakrabarti, A., Sen, R. Copula Estimation for Nonsynchronous Financial Data. Sankhya B 85 (Suppl 1), 116–149 (2023). https://doi.org/10.1007/s13571-022-00276-3