
6.1 Introduction and Summary

One often comes across problems that question whether the probability of a birth being male and that of it being female are each equal to 50%. This question can be framed in terms of the human sex ratio X:Y (currently 101 males to 100 females; CIA Fact Book, 2013): are the corresponding proportions, and indeed their distributions, identical? In this context, X and Y are thought of as nonnegative random variables. If X and Y are independent and identically distributed (i.i.d.), it is well known that the ratios \(X\!{/}(X+Y)\) and \(Y\!{/}(X+Y)\) are equal in distribution. This prompts the question: if we remove the assumption of mutual independence of X and Y, can the equidistribution of these ratios still hold, and under what reasonable conditions? In what follows, we explore some general answers to this question. We show that if X and Y have the same distribution, then \(\frac{X}{X+Y}\) need not have the same distribution as \(\frac{Y}{X+Y}\), and we identify sufficient conditions for an affirmative answer. An extension of our main result to n-dimensional random vectors \((X_1, \cdots, X_n)\), \(n \geq 2\), is indicated.

Generically, the cumulative distribution function (c.d.f.) of a random vector \((X,Y)\) is denoted by \(F_{X,Y}\) and its probability density function (p.d.f.), when it exists, by \(f_{X,Y}\). For higher-dimensional random vectors \((X_1, \cdots, X_n)\), \(n \geq 2\), \(F_{X_1, \cdots, X_n}\) and \(f_{X_1, \cdots, X_n}\) denote the c.d.f. and p.d.f., respectively. We use \(\stackrel{d}{=}\) to denote equality in distribution of random variables (r.v.s).

6.2 Counterexample

We present a counterexample demonstrating that \(X \stackrel{d}{=}Y\) does not guarantee equality in distribution of the ratios \(\frac{X}{X+Y}\) and \(\frac{Y}{X+Y}\). For this purpose, we construct a suitable joint density of \((X,Y)\) via the standard normal density

$$\begin{aligned} \phi(x) = \frac{1}{\sqrt{2\pi}} {\rm exp} \left(-\frac{x^2}{2} \right), \qquad -\infty < x < \infty.\end{aligned}$$

Consider the joint density function on \(R^2 = (-\infty, \infty) \times (-\infty, \infty)\), given by

$$\begin{aligned} f_{X,Y}(x,y) = \big[1+xy\phi(x)\phi^2(y)\big]\phi(x)\phi(y).\end{aligned}$$

To see that \(f_{X,Y}\) is a valid joint density, observe that \(\phi(x) < 1\) and \(|x\phi(x)| < 1\) (because \(\frac{x^2}{2\pi} < \exp(x^2)\)), which together give \(1+xy\phi(x)\phi^2(y) > 0\), so that \(f_{X,Y}\) is nonnegative. Moreover, the cross term \(xy\phi^2(x)\phi^3(y)\) is odd in each variable and hence integrates to zero (the mean of a scaled standard normal random variable is zero), so \(f_{X,Y}\) integrates to 1 and both marginals are standard normal. Hence, X and Y have the same distribution. We will now derive the density of \(V = \frac{Y}{X+Y}\) and then show that the densities of V and \(1-V = \frac{X}{X+Y}\) are not the same. Let \(W = X\), so that \(Y = \frac{VW}{1-V}\). The absolute value of the Jacobian of this transformation is \(\frac{|w|}{(1-v)^2}\). Hence, the joint density \(f_{W,V}\) of \((W, V)\) on the \(R^2\) plane is given by,

$$ f_{W,V}(w,v) = f_{X,Y}\left(w, \frac{wv}{(1-v)} \right) \frac{|w|}{{(1-v)}^2}, $$

which simplifies to,

$$\begin{aligned} f_{W,V}(w,v) = \frac{|w|}{2\pi(1-v)^2}\left[1+\left(\frac{w^2v}{1-v}\right) \frac{{\rm exp} \left(-\frac{w^{2}}{2}-\frac{w^2v^2}{(1-v)^2} \right)}{(2\pi)^{3/2}} \right] {\rm exp}\left(-\frac{w^2}{2}\right){\rm exp}\left(-\frac{w^2v^2}{2(1-v)^2}\right).\end{aligned}$$

In the above joint density, we integrate out the w variable to get the marginal density of V. Note that a closed form of the density of V can be obtained by using the facts that if N is a normal random variable with mean zero and variance \(\sigma_N^2\), then \(E|N| = \sqrt{\frac{2}{\pi}}\sigma_N\) and \(E|N|^3 = 2\sqrt{\frac{2}{\pi}}\sigma_N^3.\) Hence, the density of V is given by

$$\begin{aligned}f_V (v) = \int_{-\infty}^{\infty}f_{W,V}(w,v)\, dw = \frac{1}{\pi\big(v^2+(1-v)^2\big)}+\frac{v(1-v)}{\sqrt{2}\pi^{5/2}\big(2(1-v)^2 + 3v^2\big)^2}.\end{aligned}$$

Clearly, \(f_V(v) \neq f_V(1-v)\); and since \(U:=\frac{X}{X+Y} = 1-V\), the function \(f_V(1-v)\) is precisely the density of U evaluated at v. The two ratios U and V are therefore not equal in distribution.
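As an illustrative aside (not part of the derivation above), the following short numerical sketch checks these computations, assuming Python with NumPy and SciPy available; the function names are ours. It verifies that \(f_{X,Y}\) and the closed-form \(f_V\) each integrate to 1, and that \(f_V(v) \neq f_V(1-v)\) at a few points.

```python
# Numerical sanity check of the counterexample (illustrative sketch only).
import numpy as np
from scipy.integrate import dblquad, quad

def phi(x):
    # standard normal density
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def f_xy(x, y):
    # the constructed joint density [1 + x y phi(x) phi(y)^2] phi(x) phi(y)
    return (1 + x * y * phi(x) * phi(y)**2) * phi(x) * phi(y)

def f_v(v):
    # closed-form density of V = Y/(X+Y) derived above
    return (1 / (np.pi * (v**2 + (1 - v)**2))
            + v * (1 - v) / (np.sqrt(2) * np.pi**2.5
                             * (2 * (1 - v)**2 + 3 * v**2)**2))

# total masses (dblquad passes the inner variable first)
mass_xy, _ = dblquad(lambda y, x: f_xy(x, y), -np.inf, np.inf, -np.inf, np.inf)
mass_v, _ = quad(f_v, -np.inf, np.inf)
print(mass_xy, mass_v)              # both ~ 1.0

for v in (0.2, 0.3, 0.4):
    print(v, f_v(v), f_v(1 - v))    # the last two columns differ
```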

The dependence between X and Y in this counterexample does not establish that statistical independence is necessary for the ratios U and V to be equal in distribution. In fact, our results below are stated in terms of the joint distribution and cover independence as a special case.

6.3 Main Results

For a random vector \((X,Y)\), denote the ratios of the two component r.v.s to their sum, by

$$U:= \frac{X}{X+Y}, \qquad V:= \frac{Y}{X+Y}.$$
(6.1)

It may be noted that while \(U+V=1\), the r.v.s U and V cannot in general be thought of as the proportional contributions of the components of \((X,Y)\) to their sum, as is obvious from the preceding counterexample, where the ratios may fall outside \([0,1]\).

If X, Y are absolutely continuous with a (joint) density, then so are U and V, with their respective densities related via

$$f_V (v) = f_U (1-v).$$
(6.2)

Standard calculations yield an expression for the density of U. In particular, choosing the transformation

$$\begin{aligned} U = \frac{X}{X+Y}, \qquad T = X+Y;\end{aligned}$$

the joint density of \((U,T)\) is easily seen to be \(f_{U,T} (u,t) = f_{X,Y} (ut,\, (1-u)t)\;|t|\), so that the marginal density of U is

$$f_U (u) = \int_{- \infty}^{\infty} f_{X,Y} \bigg (ut,\, (1-u)t \bigg) |t| dt,$$
(6.3)

which together with (6.2) implies

$$\begin{aligned}f_V (v) & = & \int_{- \infty}^{\infty} f_{X,Y} \bigg ((1-v)t,\, vt \bigg) |t| dt\\ \nonumber & \neq & \int_{- \infty}^{\infty} f_{X,Y} \bigg (vt,\, (1-v)t \bigg) |t| dt = f_U (v), -\infty < v < \infty,\end{aligned}$$

in general.
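As a quick numerical illustration of formula (6.3) (our addition, with function names and the use of SciPy assumed for the sketch), take X, Y i.i.d. Exp(1); then \(U = X/(X+Y)\) is well known to be Uniform(0,1), and (6.3) reproduces the constant density on (0,1):

```python
# Formula (6.3) evaluated numerically for i.i.d. Exp(1) variables (a sketch).
import numpy as np
from scipy.integrate import quad

def f_xy(x, y):
    # joint density of independent Exp(1) r.v.s; zero off the first quadrant
    return np.exp(-(x + y)) if (x > 0 and y > 0) else 0.0

def f_u(u):
    # (6.3): integrate f_XY(ut, (1-u)t) |t| over t; for 0 < u < 1 the
    # integrand vanishes for t <= 0, so integration over (0, inf) suffices
    val, _ = quad(lambda t: f_xy(u * t, (1 - u) * t) * abs(t), 0, np.inf)
    return val

for u in (0.1, 0.25, 0.5, 0.9):
    print(u, f_u(u))   # each value ~ 1.0, the Uniform(0,1) density
```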

Define H to be symmetric in its arguments \((x,y)\) if

$$\begin{aligned} H(x,y) = H(y,x), \mbox{all}\, (x,y).\end{aligned}$$

If, however, \(f_{X,Y}\) has this symmetry, then equality clearly holds in the last display. We thus have the following proposition.

Proposition 6.1

If \((X,Y)\) admits a joint density that is symmetric in its arguments, then the ratios in (6.1) are equal in distribution (\(U \stackrel{d}{=} V\)).

Remark 1.

There is no explicit assumption that \(X \stackrel{d}{=} Y\) in the premise of the above proposition, as it is an easy consequence of the symmetry; viz.,

$$\begin{aligned} f_X (x) = \int_{-\infty}^{\infty} f_{X,Y} (x,y) \, dy = \int_{-\infty}^{\infty} f_{X,Y} (y,x) \, dy = f_Y (x).\end{aligned}$$

Remark 2.

In view of Remark 1, in the absolutely continuous case the classic result that X, Y i.i.d. implies \(U \stackrel{d}{=} V\) follows as a special case of Proposition 6.1: if X, Y are i.i.d. with a common p.d.f. \(f_X(\cdot) \equiv f_Y(\cdot)\), then the joint p.d.f. satisfies

$$\begin{aligned} f_{X,Y}(x,y) = f_X(x)f_Y(y) = f_Y(x)f_X(y) = f_{X,Y}(y,x).\end{aligned}$$
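To see Proposition 6.1 at work with dependent components, the following simulation sketch (ours; the correlation value and sample size are arbitrary illustrative choices) uses a bivariate normal with equal means and variances, whose joint density is symmetric in its arguments, and compares empirical quantiles of U and V:

```python
# Simulation sketch for Proposition 6.1 with dependent X, Y.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.7                                   # illustrative correlation
cov = [[1.0, rho], [rho, 1.0]]              # equal variances => symmetric density
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
x, y = xy[:, 0], xy[:, 1]

u = x / (x + y)
v = y / (x + y)
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.round(np.quantile(u, qs), 3))      # the two rows should be
print(np.round(np.quantile(v, qs), 3))      # nearly identical
```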

While Proposition 6.1 provides an answer to our question when X, Y are absolutely continuous, an affirmative answer in the general case, where the joint c.d.f. of X, Y may also have discrete and/or singular components, is given by our next proposition. Note that \(F_{X,Y}(x,y)\) being symmetric in \((x,y)\) implies that \(P\{(X,Y) \in (-\infty,x] \times (-\infty,y]\} = P\{(Y,X) \in (-\infty,x] \times (-\infty,y]\}\) for all \((x, y) \in R^2\). This, in turn, implies that \((X,Y) \stackrel{d}{=}(Y, X).\)

Proposition 6.2

If the joint c.d.f. \(F_{X,Y}(x,y)\) is symmetric in \((x,y)\), then \(U {\stackrel{d}{=} V}\).

Proof.

With \(F_{X,Y}(x,y)\) also denoting the Lebesgue–Stieltjes measure on the plane induced by the joint c.d.f., we have,

$$\begin{aligned}E(e^{itU}) &=& \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} {\rm exp} \bigg(it \big(\frac{x}{x+y}\big)\bigg) dF_{X,Y}(x,y)\nonumber\\ \nonumber&=& \int_{- \infty}^{\infty} \int_{- \infty}^{\infty} {\rm exp} \bigg(it \big(1- \frac{y}{x+y}\big)\bigg) dF_{X,Y}(y,x)\end{aligned}$$
(6.4)
$$\begin{aligned}&=& E(e^{it(1-U)}) = E(e^{itV}), \qquad -\infty < t < \infty,\end{aligned}$$
(6.5)

where the second equality uses the symmetry of the joint c.d.f.: the two induced measures agree because they agree on the determining class of sets \((-\infty,x] \times (-\infty,y].\) Thus, the ratios U and V have the same characteristic function and therefore must be equal in distribution. Alternatively, \(F_{X,Y}(x,y) = F_{X,Y}(y,x)\) implies that \((X, Y) \stackrel{d}{=} (Y, X)\), and since \(h(x, y)= \frac{x}{x+y}\) is a continuous function, \(h(X,Y) \stackrel{d}{=} h(Y,X).\)

Interestingly, the converse of Proposition 6.2 is not true; namely, \(X/(X + Y) \stackrel{d}{=}Y/(X + Y)\) does not imply that the joint c.d.f. of \((X, Y)\) is symmetric in its arguments. To see this, let \((X, Y)\) take on the pairs (1,2) and (4,2) with probability 1/2 each. Then \(X/(X+Y)\) and \(Y/(X+Y)\) have identical distributions, each taking the values 1/3 and 2/3 with probability 1/2. Yet, \(1/2 = P[X = 1,Y = 2] \neq P[X = 2, Y = 1] = 0\).
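The discrete example above is easily checked directly; a minimal sketch (ours):

```python
# Direct check of the two-point counterexample to the converse.
from collections import Counter

support = [(1, 2), (4, 2)]                        # each pair has probability 1/2
law_u = Counter(x / (x + y) for x, y in support)  # law of X/(X+Y)
law_v = Counter(y / (x + y) for x, y in support)  # law of Y/(X+Y)
print(law_u, law_v)   # both put mass 1/2 on 1/3 and on 2/3,
                      # yet P[X=1, Y=2] = 1/2 != 0 = P[X=2, Y=1]
```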

The joint c.d.f.’s symmetry condition was motivated by the corresponding assumption in Proposition 6.1 and the following observation.

Lemma 6.3

  1. (i)

    Suppose X,Y are absolutely continuous. Then \(F_{X,Y}\) is symmetric in its arguments \((x,y)\) if and only if so is \(f_{X,Y}\).

  2. (ii)

    The symmetry condition in Proposition 6.2 implies X and Y are identically distributed.

Proof.

  1. (i)

    Suppose \(f_{X,Y}\) is symmetric in \((x,y)\). Then the nonnegativity of the integrand and Fubini’s theorem imply,

    $$\begin{aligned}F_{X,Y}(x,y)=P(X \leq x,\;Y \leq y) & = & \int_{- \infty}^x \int_{- \infty}^y f_{X,Y}(u,v)\; dv du\\ & = & \int_{- \infty}^x \int_{- \infty}^y f_{X,Y}(v,u)\; dv du\\ & = & \int_{- \infty}^y \int_{- \infty}^x f_{X,Y}(v,u)\; du dv\\ & = & P(X \leq y,\;Y \leq x) \equiv F_{X,Y}(y,x).\end{aligned}$$

    Conversely, supposing \(F_{X,Y}\) is symmetric in its arguments \((x,y)\) and has a joint density, we have,

    $$\begin{aligned} f_{X,Y}(x,y) = \frac{\partial^2}{\partial x \,\partial y} F_{X,Y}(x,y)=\frac{\partial^2}{\partial x \,\partial y} F_{X,Y}(y,x)=f_{X,Y}(y,x).\end{aligned}$$
  2. (ii)

    Using the pointwise symmetry of \(F_{X,Y}(\cdot, \cdot)\) on \(R^2\),

    $$\begin{aligned} P(X \leq x) = \lim_{y \rightarrow \infty} F_{X,Y}(x,y) = \lim_{y \rightarrow \infty} F_{X,Y}(y,x)= P(Y \leq x).\end{aligned}$$

Remark 3.

The symmetry condition in Proposition 6.2 is of course equivalent to X, Y being “exchangeable”, i.e., \((X,Y) \stackrel{d}{=} (Y,X)\). For a pair of r.v.s, however, it is much more simply stated as the property that the joint c.d.f. \(F_{X,Y} (\cdot,\cdot): R^2 \longrightarrow [0,1]\) is symmetric in its arguments. For random vectors of higher dimensions, the corresponding condition that the c.d.f. \(F_{X_1, \cdots, X_n}\) is permutation invariant in its arguments is more succinctly and elegantly described as \(X_1, \cdots, X_n\) being exchangeable; this generalizes our earlier proposition as follows.

Proposition 6.4

If \(X_1, \cdots, X_n (n \geq 2)\) is a finite, exchangeable sequence, then

$$\begin{aligned} \frac{X_j}{S_n} \stackrel{d}{=} \frac{X_k}{S_n}, \qquad j,k \in \{1,2, \cdots,n\}, j \neq k\end{aligned}$$

where \(S_n:= \sum_{i=1}^n X_i\).

Proof.

Suppose \(X_1, \cdots, X_n (n \,\geq\, 2)\) are exchangeable, i.e., \((X_{i_1}, \cdots, X_{i_n}) \stackrel{d}{=}\) \((X_1, \cdots, X_n)\) for all permutations \((i_1, \cdots, i_n)\) of \((1, \cdots, n)\). For brevity, denote by

$$\begin{aligned}{\boldsymbol{X}} &:= & (X_1, \cdots, X_n), \mbox{and}\\ 0_j {\boldsymbol{X}} &:= & (X_1, \cdots,X_{j-1},X_{j+1}, \cdots, X_n),\end{aligned}$$

be the corresponding vector that skips the j-th coordinate \(X_j\), with the values assumed by these vectors denoted \({\boldsymbol{x}}\) and \(0_j {\boldsymbol{x}}\), respectively. Then,

$$\begin{aligned}E \bigg \{{{\rm exp} \bigg(it \frac{X_j}{S_n} \bigg)} \bigg\} &=& \int_{R^n} {{\rm exp} \bigg(it \frac{u}{s_n} \bigg)} dF_{\boldsymbol{X}} (x_1,\cdots,x_{j-1},u,x_{j+1},\cdots,x_n)\\ &=& \int_{R^n} {{\rm exp} \bigg(it \frac{u}{s_n} \bigg)} dF_{(X_j, 0_j {\boldsymbol{X}})} (u, 0_j {\boldsymbol{x}})\\ &=& \int_{R^n} {{\rm exp} \bigg(it \frac{u}{s_n} \bigg)} dF_{(X_k, 0_k {\boldsymbol{X}})} (u, 0_k {\boldsymbol{x}})\\ &=& E \bigg \{{{\rm exp} \bigg(it \frac{X_k}{S_n} \bigg)} \bigg\},\end{aligned}$$

where the value \(s_n\) of \(S_n\) is given by \(s_n = u+ \sum_{i=1, i \neq j}^n x_i\) or \(s_n = u+ \sum_{i=1, i \neq k}^n x_i\) in the second and third integrals above, respectively. Note that the two equalities preceding the last step hold since \((X_j, 0_j {\boldsymbol{X}}) \stackrel{d}{=} {\boldsymbol{X}} \stackrel{d}{=} (X_k, 0_k {\boldsymbol{X}})\) for all pairs j, k, by exchangeability. Alternatively, since \((X_j, 0_j {\boldsymbol{X}}) \stackrel{d}{=}(X_k, 0_k {\boldsymbol{X}})\) and \(h(x_1, \cdots, x_n) = \frac{x_{1}}{x_1 + \cdots + x_n}\) is a continuous function, \(h(X_j, 0_j {\boldsymbol{X}}) \stackrel{d}{=}h(X_k, 0_k {\boldsymbol{X}})\). Hence the result.
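As a numerical companion to Proposition 6.4 (our sketch; the common-shock construction, distributions, and parameters are illustrative choices), exchangeable but dependent variables can be produced as conditionally i.i.d. ones given a shared random effect, and the ratios \(X_j/S_n\) then share one distribution:

```python
# Simulation sketch for Proposition 6.4 with an exchangeable, dependent vector.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 4, 200_000
z = rng.normal(size=(reps, 1))        # common shock shared by all coordinates
eps = rng.normal(size=(reps, n))      # i.i.d. idiosyncratic terms
x = z + eps                           # exchangeable but not independent
ratios = x / x.sum(axis=1, keepdims=True)

qs = [0.25, 0.5, 0.75]
for j in range(n):                    # the n rows should be nearly identical
    print(j + 1, np.round(np.quantile(ratios[:, j], qs), 3))
```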

In conclusion, any Archimedean copula can be used as a generator of such exchangeable r.v.s; see Nelsen (1999) and Genest et al. (1986). These results also apply in Bayesian contexts, where the observations are conditionally i.i.d. given an environmental variable with a prior distribution.
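For instance, a Clayton copula (one Archimedean family) yields such exchangeable pairs via the standard Marshall–Olkin frailty construction; the sketch below (ours, with the parameter \(\theta\) and Exp(1) marginals chosen arbitrarily for illustration) shows the resulting ratios agreeing in distribution.

```python
# Exchangeable pair from a Clayton (Archimedean) copula via a Gamma frailty.
import numpy as np

rng = np.random.default_rng(2)
theta, reps = 2.0, 200_000
v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(reps, 1))  # frailty variable
e = rng.exponential(size=(reps, 2))
uniforms = (1.0 + e / v) ** (-1.0 / theta)   # Clayton-dependent Uniform(0,1) pair
x = -np.log(1.0 - uniforms)                  # Exp(1) marginals via inverse c.d.f.

u = x[:, 0] / x.sum(axis=1)                  # X/(X+Y); the other ratio is 1 - u
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.round(np.quantile(u, qs), 3))       # the two rows should be
print(np.round(np.quantile(1 - u, qs), 3))   # nearly identical
```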
