1 Introduction

An important step in statistical meta-analysis is to carry out an appropriate test of homogeneity of the relevant effect sizes before pooling evidence or information across studies. While the familiar Cochran (1954) Chi-square goodness-of-fit test is widely used in this context, it turns out that this test may perform poorly, failing to maintain the Type I error rate in many problems. In particular, this is a serious drawback of Cochran’s test for testing the homogeneity of several proportions in the case of sparse data. A recent meta-analysis (Nissen and Wolski 2007), addressing the cardiovascular safety concerns associated with rosiglitazone, has received wide attention (Cai et al. 2010; Tian et al. 2009; Shuster et al. 2007; Shuster 2010; Stijnen et al. 2010). Two difficulties appear in this study: first, the study sizes (N) are highly unequal, especially in the control arm, with over \(95 \%\) of the studies having sizes below 400 and two studies having sizes over 2500; second, the event rate is extremely low, especially for the death endpoint, with the maximum death rate in the treatment arm being \(2 \%\), while in the control arm, over \(80 \%\) of the studies have zero events. The original meta-analysis (Nissen and Wolski 2007) was performed under a fixed effects framework, as the diagnostic test based on Cochran’s Chi-square test failed to reject homogeneity. However, with two large studies dominating the combined result, it is generally agreed that a random effects analysis is superior to a fixed effects one (Shuster et al. 2007). Moreover, the results of the fixed and random effects analyses are discordant. While various fixed effect and random effect approaches have been proposed, the problem of testing for homogeneity of effect sizes is less familiar and often not properly addressed. This is precisely the object of this paper, namely a thorough discussion of tests of homogeneity of proportions in sparse data situations. Recently, there have been some studies on testing the equality of means when the number of groups increases with fixed sample sizes in either ANOVA (analysis of variance) or MANOVA (multivariate analysis of variance); for example, see Bathke and Harrar (2008), Bathke and Lankowski (2005) and Boos and Brownie (1995). Those studies have limited asymptotic results since they assume that all sample sizes are equal, i.e., a balanced design. In contrast, we emphasize the case in which sample sizes are highly unbalanced and present more complete asymptotic results for a variety of cases, including unbalanced designs and small binomial proportions.

In this paper, we first point out that the classical Chi-square test may fail to control the size when the number of groups is large and data are sparse. We modify the classical Chi-square test and provide supporting asymptotic results. Moreover, we propose two new tests for homogeneity of proportions when there are many groups with sparse count data. Throughout this study, we present theoretical conditions under which our proposed tests achieve asymptotic normality, whereas most existing tests lack a rigorous investigation of asymptotic properties.

A formulation of the testing problem for proportions is provided in Sect. 2 along with a review of the literature and suggestions for new tests. The asymptotic theory necessary to ease the application of the suggested tests is developed there. Results of simulation studies are reported in Sect. 3, and an application to the Nissen and Wolski (2007) data set is made in Sect. 4. Concluding remarks are presented in Sect. 5.

2 Testing the homogeneity of proportions with sparse data

In this section, we present a modification of a classical test, Cochran’s test, and also propose two types of new tests. Throughout this paper, our theoretical studies are based on a triangular array, which is commonly used in asymptotic theories in high dimensions. See Park and Ghosh (2007) and Park (2009) for triangular arrays in binary data and Greenshtein and Ritov (2004) for more general cases. More specifically, let \({\varTheta }^{(k)} = \{ (\pi _1^{(k)}, \pi _2^{(k)}, \ldots , \pi _k^{(k)}) : 0<\pi _{i}^{(k)}<1~~\text{ for } 1\le i \le k\}\) be the parameter space in which the \(\pi _i^{(k)}\)s are allowed to vary with k as k increases. Additionally, the sample sizes \((n_1^{(k)}, \ldots , n_k^{(k)})\) also change with k. However, for notational simplicity, we suppress the superscript k from \(\pi _i^{(k)}\) and \(n_i^{(k)}\). The triangular array accommodates more flexible situations, for example, all sample sizes increasing and all \(\pi _i\)s decreasing. On the other hand, the asymptotic results in Bathke and Lankowski (2005) and Boos and Brownie (1995) are based on increasing k with all sample sizes and \(\pi _i\)s fixed. That setup yields somewhat limited results, while we present asymptotic results on the triangular array. Our results also include the asymptotic power functions of the proposed tests, which existing studies do not provide.

2.1 Modification of Cochran’s test

Suppose that there are k independent populations and the ith population has \(X_{i} \sim \hbox {Binomial}(n_i, \pi _i)\). Denote the total sample size and the weighted average of \(\pi _i\)’s by \(N=\sum _{i=1}^kn_i\) and \(\bar{\pi }= \frac{1}{N} \sum _{i=1}^kn_i \pi _i\), respectively. We are interested in testing the homogeneity of \(\pi _{i}\)’s from different groups,

$$\begin{aligned} H_0 : \pi _1 = \pi _2 = \cdots = \pi _k \equiv \pi (\hbox {unknown}). \end{aligned}$$
(1)

To test the above hypothesis in (1), one familiar procedure is Cochran’s Chi-square test in Cochran (1954), namely \(T_S\):

$$\begin{aligned} {T}_S = \sum _{i=1}^k\frac{( X_i - n_i \hat{\bar{\pi }} )^2 }{n_i \hat{\bar{\pi }} (1-\hat{\bar{\pi }})} \end{aligned}$$
(2)

where \(\hat{\bar{\pi }} = \frac{\sum _{i=1}^kX_i }{\sum _{i=1}^kn_i}\). Under \(H_0\), \(T_S\) approximately follows a Chi-square distribution with \((k-1)\) degrees of freedom. \(H_0\) is rejected when \({T}_S > \chi ^2_{1-\alpha , k-1}\) where \(\chi ^2_{1-\alpha , k-1}\) is the \(1-\alpha \) quantile of the Chi-square distribution with \((k-1)\) degrees of freedom. In particular, when k is large, \( \frac{T_S -k}{\sqrt{2k}}\) is approximated by a standard normal distribution under \(H_0\). Although Cochran’s test for homogeneity is widely used, the Chi-square approximation to \(T_S\) or the normal approximation may be poor when the sample sizes within the groups are small or when some counts in one of the two categories are low. This is partly because the test statistic becomes noticeably discontinuous and partly because its moments beyond the first may be rather different from those of a \(\chi ^2\) random variable.
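For concreteness, the test is straightforward to compute. The sketch below is in Python with NumPy/SciPy; the language and the function name are our own choices, not code from any published implementation.

```python
import numpy as np
from scipy import stats

def cochran_test(x, n):
    """Cochran's Chi-square homogeneity test T_S of (2).

    x : event counts X_1,...,X_k;  n : sample sizes n_1,...,n_k.
    Assumes the pooled estimate lies strictly between 0 and 1.
    """
    x, n = np.asarray(x, float), np.asarray(n, float)
    k = len(x)
    pi_bar_hat = x.sum() / n.sum()              # pooled estimate of pi
    T_S = np.sum((x - n * pi_bar_hat) ** 2 / (n * pi_bar_hat * (1 - pi_bar_hat)))
    return T_S, stats.chi2.sf(T_S, df=k - 1)    # chi-square approximation
```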

We demonstrate that the asymptotic Chi-square approximation to \({T}_S\), or the normal approximation based on \(\frac{{T}_S-k}{\sqrt{2k}}\), may be very poor when k is large or the \(\pi _i\)s are small relative to the \(n_i\)s. We provide the following theorem and propose a modified approximation to \(T_S\) which is expected to be more accurate. Let us define

$$\begin{aligned} T=\frac{\mathcal{T}_S - E(\mathcal{T}_S)}{\sqrt{\mathcal{B}_k}} \end{aligned}$$
(3)

where \(\mathcal{T}_S = \sum _{i=1}^k \frac{(X_i - n_i \bar{\pi })^2}{n_i \bar{\pi }(1-\bar{\pi })}\), \(\mathcal{B}_k \equiv \hbox {Var}(\mathcal{T}_S) = \sum _{i=1}^k \hbox {Var} \left( \frac{(X_i - n_i \bar{\pi })^2}{n_i \bar{\pi }(1-\bar{\pi })} \right) \equiv \sum _{i=1}^k B_i\) and

$$\begin{aligned} \hbox {Var}\left( \frac{(X_i - n_i \bar{\pi })^2}{n_i \bar{\pi }(1- \bar{\pi })} \right) = B_i= & {} \frac{2\pi _i^2(1-\pi _i)^2}{\bar{\pi }^2 (1-\bar{\pi })^2}+ \frac{\pi _i(1-\pi _i)(1-6\pi _i(1-\pi _i))}{n_i\bar{\pi }^2(1-\bar{\pi })^2} \\&+\, \frac{3\pi _i(1-\pi _i)(1-2\pi _i)(\pi _i-\bar{\pi })}{\bar{\pi }^2 (1-\bar{\pi })^2}\\&+\, \frac{4n_i \pi _i(1- \pi _i)(\pi _i-\bar{\pi })^2}{\bar{\pi }^2 (1-\bar{\pi })^2}, \\ E(\mathcal{T}_S)= & {} \sum _{i=1}^k \left( \frac{n_i(\pi _i -\bar{\pi })^2 }{\bar{\pi }(1-\bar{\pi })} + \frac{\pi _i(1-\pi _i)}{\bar{\pi }(1-\bar{\pi })} \right) . \end{aligned}$$

Note that \(\mathcal{T}_S\) is not a statistic since it still includes the unknown parameter \(\bar{\pi }= \sum _{i=1}^k \frac{n_i \pi _i}{N}\). It will be shown later that \(\bar{\pi }\) can be replaced by \(\hat{\bar{\pi }} = \frac{1}{N} \sum _{i=1}^k n_i \hat{\pi }_i\) under \(H_0\) since \(\hat{\bar{\pi }}\) is ratio-consistent (\(\frac{\hat{\bar{\pi }}}{\bar{\pi }} \rightarrow 1\) in probability) under some mild conditions. Define

$$\begin{aligned} \mathcal{B}_{0k} = \sum _{i=1}^k B_{0i} = \sum _{i=1}^k \left( 2-\frac{6}{n_i} + \frac{1}{n_i \bar{\pi }(1-\bar{\pi })} \right) \end{aligned}$$

and

$$\begin{aligned} T_0 = \frac{\mathcal{T}_S - k}{\sqrt{\mathcal{B}_{0k}}} \end{aligned}$$
(4)

which is the T defined in (3) under \(H_0\) since \(E(\mathcal{T}_S) =k\) and \(\mathcal{B}_k = \mathcal{B}_{0k}\) under \(H_0\). The following theorem shows the asymptotic properties of \(T_0\) in (4).

Theorem 1

For \(\theta _i=\pi _i(1-\pi _i)\) and \(\bar{\theta }= \bar{\pi }(1-\bar{\pi })\), if \(\frac{\sum _{i=1}^k\left( \theta _i^4 + \frac{\theta _i}{n_i} \right) }{(\bar{\pi }(1-\bar{\pi }))^4 \mathcal{B}_k^2 } \rightarrow 0\) and \( \frac{\sum _{i=1}^kn_i^2 \theta _i (\pi _i -\bar{\pi })^4 (\theta _i + \frac{1}{n_i}) }{(\bar{\pi }(1-\bar{\pi }))^4\mathcal{B}_k^2} \rightarrow 0\) as \(k\rightarrow \infty \), then we have

$$\begin{aligned} P( T_0 > z_{1-\alpha }) - \bar{\varPhi }\left( \frac{z_{1-\alpha }}{\sigma _k} -\mu _k \right) \rightarrow 0 \end{aligned}$$

where \(\mu _k = \frac{E(\mathcal{T}_S)-k}{\sqrt{\mathcal{B}_{k}}}\), \(\sigma ^2_k = \frac{\mathcal{B}_k}{\mathcal{B}_{0k}}\) and \(\bar{\varPhi }(z) = 1-{\varPhi }(z) =P(Z\ge z) \) for a standard normal distribution Z.

Proof

See “Appendix”. \(\square \)
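Theorem 1 gives an explicit normal approximation to the rejection probability of \(T_0\), so the approximate power can be evaluated numerically at any configuration \((\pi _i, n_i)\). A minimal sketch (the function name and vectorized layout are ours):

```python
import numpy as np
from scipy import stats

def approx_power_T0(pi, n, alpha=0.05):
    """Theorem 1 approximation: P(T_0 > z_{1-alpha}) ~ Phibar(z/sigma_k - mu_k)."""
    pi, n = np.asarray(pi, float), np.asarray(n, float)
    k = len(pi)
    pibar = np.sum(n * pi) / n.sum()
    tb = pibar * (1 - pibar)                    # bar(pi)(1 - bar(pi))
    th = pi * (1 - pi)                          # theta_i
    B = (2 * th**2 / tb**2                      # the four terms of B_i
         + th * (1 - 6 * th) / (n * tb**2)
         + 3 * th * (1 - 2 * pi) * (pi - pibar) / tb**2
         + 4 * n * th * (pi - pibar)**2 / tb**2)
    Bk = B.sum()
    ETS = np.sum((n * (pi - pibar)**2 + th) / tb)   # E(T_S)
    B0k = np.sum(2 - 6 / n + 1 / (n * tb))
    mu_k, sigma_k = (ETS - k) / np.sqrt(Bk), np.sqrt(Bk / B0k)
    z = stats.norm.ppf(1 - alpha)
    return stats.norm.sf(z / sigma_k - mu_k)
```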

We propose to use the test which rejects \(H_0\) if

$$\begin{aligned} T_{\chi } \equiv \frac{{T}_S - k}{ \sqrt{\hat{\mathcal{B}}_{0k}}} > z_{1-\alpha } \end{aligned}$$
(5)

where \(z_{1-\alpha }\) is the \(1-\alpha \) quantile of a standard normal distribution, \(\hat{\mathcal{B}}_{0k} = \sum _{i=1}^k \left( 2-\frac{6}{n_i} + \frac{1}{n_i \hat{\bar{\pi }}(1-\hat{\bar{\pi }})} \right) \) and \(\hat{\bar{\pi }} = \frac{\sum _{i=1}^k n_i \hat{\pi }_i}{N}\).
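A sketch of the modified test, following the same conventions as the earlier snippet (names ours):

```python
import numpy as np
from scipy import stats

def modified_cochran_test(x, n):
    """Modified Cochran test T_chi of (5) with one-sided normal p value."""
    x, n = np.asarray(x, float), np.asarray(n, float)
    k = len(x)
    pi_bar_hat = x.sum() / n.sum()              # pooled estimate under H0
    T_S = np.sum((x - n * pi_bar_hat) ** 2 / (n * pi_bar_hat * (1 - pi_bar_hat)))
    B0k_hat = np.sum(2 - 6 / n + 1 / (n * pi_bar_hat * (1 - pi_bar_hat)))
    T_chi = (T_S - k) / np.sqrt(B0k_hat)
    return T_chi, stats.norm.sf(T_chi)          # reject if T_chi > z_{1-alpha}
```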

Using Theorem 1, we obtain the following result, which states that our proposed modification of Cochran’s test in (5) is an asymptotically size \(\alpha \) test, while \(\frac{{T}_S -k}{\sqrt{2k}}\) may fail to control the size \(\alpha \) under some conditions.

Corollary 1

Under \(H_0\) and the conditions in Theorem 1, \(T_{\chi }\) in (5) is an asymptotically size \(\alpha \) test. The normal approximation to \(\frac{{T}_S -k}{\sqrt{2k}}\) is not an asymptotically size \(\alpha \) test unless \(\frac{\mathcal{B}_{0k}}{2k} \rightarrow 1\).

Proof

We first show that \(\hat{\bar{\pi }}/\bar{\pi }\rightarrow 1\) in probability. Under \(H_0\) (\(\pi _i \equiv \pi \)), we have \(\sum _{i=1}^k n_i \hat{\pi }_i \sim \hbox {Binomial}(N,\pi )\). Using \( \sum _{i=1}^k n_i \pi _i = N\pi \rightarrow \infty \) under \(H_0\), we have

$$\begin{aligned} E \left( \frac{\hat{\bar{\pi }}}{\bar{\pi }} -1 \right) ^2 = \frac{1-\pi }{N \pi } \le \frac{1}{N\pi } \rightarrow 0 \end{aligned}$$

leading to \(\hat{\bar{\pi }}/\bar{\pi }\rightarrow 1\) in probability. From this, we have \( \frac{\hat{\mathcal{B}}_{0k}}{\mathcal{B}_{0k} } \rightarrow 1 \) in probability under \(H_0\). Furthermore, under \(H_0\), since we have \( \frac{\mathcal{B}_{0k}}{\mathcal{B}_k}=1\) and \(E(\mathcal{T}_S)=k\), we obtain \(T_{\chi } - T = ( \sqrt{\frac{\mathcal{B}_{0k}}{ \hat{\mathcal{B}}_{0k} }} -1 )T = o_p(1) O_p(1) = o_p(1)\), which means that \(T_{\chi }\) and T are asymptotically equivalent under \(H_0\). Since \(P_{H_0}(T > z_{1-\alpha } ) - \bar{\varPhi }(z_{1-\alpha }) \rightarrow 0 \), we have \(P_{H_0}(T_{\chi } > z_{1-\alpha } ) - \alpha \rightarrow 0\), which means that \(T_{\chi }\) is an asymptotically size \(\alpha \) test. On the other hand, \( \frac{\mathcal{T}_S -k}{\sqrt{2k}}\) does not have an asymptotic standard normal distribution unless \({\mathcal{B}_{0k}}/(2k) \rightarrow 1\) since \( \frac{\mathcal{T}_S-k}{\sqrt{2k}} = \sqrt{\frac{\hat{\mathcal{B}}_{0k}}{2k}} T_{\chi }\) under \(H_0\). \(\square \)

Under \(H_0\), since \(\mathcal{B}_{0k} = 2k + (\frac{1}{ \bar{\pi }(1-\bar{\pi })} -6 ) \sum _{i=1}^k \frac{1}{n_i}\), we expect \(\frac{\mathcal{B}_{0k}}{2k}\) to converge to 1 when \( (\frac{1}{\pi (1-\pi )} -6 ) \sum _{i=1}^k \frac{1}{n_i} =o(k)\), where \(\pi _i= \bar{\pi }\equiv \pi \) under \(H_0\). This may happen when \(\pi \) is bounded away from 0 and 1 and the \(n_i\)s are large. If all \(n_i\)s are bounded by some constant, say C, and \(|\frac{1}{\pi (1-\pi )} -6| \ge \delta >0\) (this can happen when \(\pi <\epsilon _1\) or \(\pi > 1-\epsilon _2\) for some \(\epsilon _1>0\) and \(\epsilon _2>0\)), then \(\frac{\mathcal{B}_{0k}}{2k}\) does not converge to 1. Even when the \(n_i\)s are large, if \(\pi \rightarrow 0\) fast enough, then \(\frac{\mathcal{B}_{0k}}{2k}\) does not converge to 1. For example, if \(\pi =1/k\) and \(n_i=k\) as \(k \rightarrow \infty \), then \(\frac{\mathcal{B}_{0k}}{2k} \rightarrow 3/2\), which leads to \(\frac{\mathcal{T}_S-k}{\sqrt{2k}} \rightarrow N(0, \frac{3}{2})\) in distribution. This implies that \(P(\frac{\mathcal{T}_S -k}{\sqrt{2k}}> z_{1-\alpha }) \rightarrow 1-{\varPhi }( \sqrt{\frac{2}{3}} z_{1-\alpha } ) > \alpha \), so the test has an asymptotic size larger than the nominal level. To summarize, if either \(\pi \) or the \(n_i\)s are small, we may not expect an accurate normal approximation to \(\frac{\mathcal{T}_S-k}{\sqrt{2k}}\), so sparse binary data with small \(n_i\)s and a large number of groups (k) need to be handled more carefully.
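The limit in the example above is easy to verify numerically; a short check (ours) of \(\mathcal{B}_{0k}/(2k)\) along \(\pi = 1/k\), \(n_i = k\):

```python
import numpy as np

# With pi = 1/k and n_i = k, B_0k / (2k) should approach 3/2 as k grows.
for k in (10, 100, 1000, 10000):
    pi = 1.0 / k
    n = np.full(k, k, dtype=float)
    B0k = np.sum(2 - 6 / n + 1 / (n * pi * (1 - pi)))
    print(k, B0k / (2 * k))   # 1.256, 1.475, 1.4975, ... -> 1.5
```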

2.2 New tests

In addition to the modified Cochran’s test \(T_{\chi }\), we also propose new tests designed for sparse data when k is large. As with the asymptotic normality of \(T_{\chi }\), it will be justified that our proposed tests are asymptotically normal as \(k \rightarrow \infty \), although the \(n_i\)s are not required to increase. Toward this end, we proceed as follows. Let \(||{\varvec{\pi }} - {\varvec{\bar{\pi }}}||^2_{\mathbf{n}} = \sum _{i=1}^k n_i (\pi _i - \bar{\pi })^2\), which is the weighted \(l_2\) distance from \({\varvec{\pi }}=(\pi _1,\pi _2,\ldots , \pi _k)\) to \({\varvec{\bar{\pi }}} = (\bar{\pi }, \bar{\pi }, \ldots , \bar{\pi })\), where \(\mathbf{n}=(n_1,\ldots , n_k)\). The proposed test is based on measuring \(||{\varvec{\pi }} - {\varvec{\bar{\pi }}}||^2_{\mathbf{n}}\). Since this quantity is unknown, one needs to estimate it. One typical estimator is a plug-in estimator such as \(||\hat{{\varvec{\pi }}} - {{\varvec{\hat{\bar{\pi }}}}}||^2_{\mathbf{n}}\); however, this estimator may have a significant bias. To illustrate this, note that

$$\begin{aligned} E||\hat{{\varvec{\pi }}} - {{\varvec{\hat{ \bar{\pi }}}}}||^2_{\mathbf{n}}= & {} \sum _{i=1}^k\pi _i (1-\pi _i)+ \sum _{i=1}^k\frac{n_i \pi _i (1-\pi _i)}{N} - \frac{2}{N} \sum _{i=1}^kn_i \pi _i (1-\pi _i)\nonumber \\&+ \sum _{i=1}^kn_i (\pi _i - \bar{\pi })^2 \\= & {} \sum _{i=1}^kc_i \pi _i (1-\pi _i)+ ||{\varvec{\pi }} - {\varvec{\bar{\pi }}}||^2_{\mathbf{n}} \end{aligned}$$

where \(c_i =(1 - \frac{n_i}{N})\). This shows that \(||\hat{{\varvec{\pi }}} - {{\varvec{\hat{ \bar{\pi }}}}}||^2_{\mathbf{n}}\) is an overestimate of \(||{\varvec{\pi }} - {\varvec{\bar{\pi }}}||^2_{\mathbf{n}}\) by \(\sum _{i=1}^kc_i \pi _i (1-\pi _i)\) which needs to be corrected. Using \(E\left[ \frac{n_i}{n_i-1} \hat{\pi }_i (1-\hat{\pi }_i)\right] = \pi _i (1-\pi _i)\) for \(\hat{\pi }_i = \frac{x_i}{n_i}\), we define \(d_i=\frac{n_i c_i}{n_i-1}\) and

$$\begin{aligned} T = \sum _{i=1}^k n_i (\hat{\pi }_i - \hat{\bar{\pi }})^2 - \sum _{i=1}^kd_i \hat{\pi }_i (1-\hat{\pi }_i)\equiv ||\hat{{\varvec{\pi }}} - {{\varvec{\hat{ \bar{\pi }}}}}||^2_{\mathbf{n}} - \sum _{i=1}^kd_i \hat{\pi }_i (1-\hat{\pi }_i) \end{aligned}$$
(6)

which is an unbiased estimator of \(||{\varvec{\pi }} - {\varvec{\bar{\pi }}}||^2_{\mathbf{n}}\). This implies \(E(T)= ||{\varvec{\pi }} - {\varvec{\bar{\pi }}}||^2_{\mathbf{n}} \ge 0\), where “=” holds only when \(H_0\) is true. Therefore, it is natural to consider large values of T as evidence supporting \(H_1\), and we thus propose a one-sided (upper) rejection region based on T for testing \(H_0\). Our proposed test statistics are based on T, whose asymptotic distribution is normal under some conditions.
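The unbiased estimator T in (6) is immediate to compute; a sketch (assuming \(n_i \ge 2\) so that \(d_i\) is well defined):

```python
import numpy as np

def T_statistic(x, n):
    """Unbiased estimator T of ||pi - pibar||^2_n in (6); requires n_i >= 2."""
    x, n = np.asarray(x, float), np.asarray(n, float)
    N = n.sum()
    pi_hat = x / n
    pi_bar_hat = x.sum() / N
    d = (n / (n - 1)) * (1 - n / N)             # d_i = n_i c_i / (n_i - 1)
    return (np.sum(n * (pi_hat - pi_bar_hat) ** 2)
            - np.sum(d * pi_hat * (1 - pi_hat)))
```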

We derive the asymptotic normality of a standardized version of T under some regularity conditions. Let us decompose T into two components, say \(T_1\) and \(T_2\):

$$\begin{aligned}&T\displaystyle = \sum _{i=1}^kn_i(\hat{\pi }_i - \pi _i + \pi _i - {\bar{\pi }} + {\bar{\pi }} - \hat{\bar{\pi }} )^2 - \sum _{i=1}^kd_i \hat{\pi }_i (1-\hat{\pi }_i)\nonumber \\&\displaystyle = \underbrace{\sum \nolimits _{i=1}^k \left\{ n_i (\hat{\pi }_i - \pi _i)^2 - d_i \hat{\pi }_i (1-\hat{\pi }_i)+ 2 n_i (\hat{\pi }_i - \pi _i ) (\pi _i-\bar{\pi }) +n_i (\pi _i -\bar{\pi })^2 \right\} }_{T_1} \nonumber \\ \end{aligned}$$
(7)
$$\begin{aligned}&\displaystyle - \underbrace{ N (\hat{\bar{\pi }} - \bar{\pi })^2 }_{T_2} \end{aligned}$$
(8)

where \(T_1 \equiv \sum _{i=1}^kT_{1i}\) for \(T_{1i} = n_i (\hat{\pi }_i - \pi _i)^2 - d_i \hat{\pi }_i (1-\hat{\pi }_i)+2 n_i (\hat{\pi }_i - \pi _i ) (\pi _i-\bar{\pi })+n_i (\pi _i -\bar{\pi })^2\). To prove the asymptotic normality of the proposed test, we need some preliminary results, stated below in Lemmas 1, 2 and 3, and the ratio consistency of the proposed estimators of \(\hbox {Var}(T_1)\), shown in Lemma 5.

Lemma 1

Let \(\theta _i= \pi _i (1-\pi _i)\). When \(X_i \sim Binomial (n_i, \pi _i)\) and \(\hat{\pi }_i = \frac{X_i}{n_i}\), we have

$$\begin{aligned} E[(\hat{\pi }_i - \pi _i)^3]= & {} \frac{(1-2\pi _i)\theta _i}{n_i^2},~~ E[(\hat{\pi }_i - \pi _i)^4] = \frac{3\theta _i^2}{n_i^2} + \frac{(1-6\theta _i) \theta _i}{n_i^3}\\ E[\hat{\pi }_i (1-\hat{\pi }_i)]= & {} \frac{n_i-1}{n_i} \theta _i, \\ \pi _i^l= & {} E\left[ \frac{n_i^l}{\prod _{j=0}^{l-1} \left( n_i -j \right) } \prod _{j=0}^{l-1} \left( \hat{\pi }_i -\frac{j}{n_i} \right) \right] ,\quad \text{ for } n_i \ge l \text{ and } l=1,2,3,4. \end{aligned}$$

Proof

The first three results are easily derived by direct computation. For the last result, note that when \(X_{i}\sim \hbox {Binomial}(n_{i},\pi _{i})\), \(E[X_i(X_i-1)\cdots (X_i-l+1)] = n_i(n_i-1)\cdots (n_i-l+1) \pi _{i}^l\). Let \(X = \sum _{i=1}^k X_i\), which follows \(\hbox {Binomial}(N, \pi )\) under \(H_0\); then we obtain the corresponding unbiased estimators under \(H_0\) using \(\hat{\pi }= \frac{X}{N} = \frac{1}{N} \sum _{i=1}^kn_i \hat{\pi }_i\). \(\square \)
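The last identity in Lemma 1 can be checked quickly by Monte Carlo; for instance, for \(l=2\) the unbiased estimator reduces to \(X_i(X_i-1)/(n_i(n_i-1))\). A sketch, with arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, pi_i = 5, 0.3
x = rng.binomial(n_i, pi_i, size=200_000)
est = x * (x - 1) / (n_i * (n_i - 1))   # unbiased estimator of pi_i^2 (l = 2)
print(est.mean(), pi_i ** 2)            # both close to 0.09
```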

We now derive the asymptotic null distribution of \(\frac{T_1}{\sqrt{\hbox {Var}(T_1)}}\) and propose an unbiased estimator of \({\hbox {Var}(T_1)}\) which is ratio-consistent. We first compute \(\hbox {Var}(T_1)\) and then propose an estimator \(\widehat{\hbox {Var}(T_1)}\).

Lemma 2

The variance of \(T_1\), \(\hbox {Var}(T_1)\), is

$$\begin{aligned} \hbox {Var}(T_1)= & {} \sum _{i=1}^k\mathcal{A}_{1i} \theta _i^2 + \sum _{i=1}^k\mathcal{A}_{2i} \theta _i \nonumber \\&+\, 4 \sum _{i=1}^k n_i(\pi _i - \bar{\pi })^2 \theta _i +\frac{4}{N}\sum _{i=1}^kn_i(\pi _i - \bar{\pi })(1-2\pi _i)\theta _i \end{aligned}$$
(9)

where \( \mathcal{A}_{1i}= \left( 2-\frac{6}{n_i} - \frac{d_i^2}{n_i} + \frac{8d_i^2}{n_i^2}-\frac{6d_i^2}{n_i^3} + 12 d_i \frac{n_i-1}{n_i^2} \right) \) and \(\mathcal{A}_{2i}= \frac{ n_i}{N^2}\) for \(d_i = \frac{n_i}{n_i-1} \left( 1-\frac{n_i}{N} \right) \) .

Proof

See “Appendix”. \(\square \)

Under \(H_0\) (\(\pi _i = \pi \) for all \(1\le i \le k\)), the third and fourth terms in (9), which involve \(\pi _i-\bar{\pi }\), vanish, and therefore we obtain \(\hbox {Var}(T_1)\) under \(H_0\) as follows:

$$\begin{aligned} \hbox {Var}_{H_0}(T_1)\equiv & {} \mathcal{V}_1 = \sum _{i=1}^k \left\{ \mathcal{A}_{1i} \theta _i^2 + \mathcal{A}_{2i} \theta _i \right\} \end{aligned}$$
(10)
$$\begin{aligned}= & {} \mathcal{V}_{1*} = (\pi (1-\pi ))^2 \sum _{i=1}^k \mathcal{A}_{1i}+ \pi (1-\pi ) \sum _{i=1}^k \mathcal{A}_{2i} . \end{aligned}$$
(11)

\(\mathcal{V}_1\) in (10) and \(\mathcal{V}_{1*}\) in (11) are equal under \(H_0\); however, their estimators may differ depending on whether the \(\theta _i\)s are estimated individually from the \(x_i\) or the common value \(\pi \) in \(\mathcal{V}_{1*}\) is estimated by the pooled estimator \(\hat{\pi }\). We consider these two approaches for estimating \(\mathcal{V}_1\) and \(\mathcal{V}_{1*}\).

First, we construct the estimator of \(\mathcal{V}_1\) in (10). \( \mathcal{V}_{1i} \equiv \mathcal{A}_{1i} \theta _i^2 + \mathcal{A}_{2i} \theta _i\) is a fourth-degree polynomial in \(\pi _i\); in other words, \(\mathcal{V}_{1i} = a_{1i} \pi _i + a_{2i} \pi _i^2 + a_{3i} \pi _i^3 + a_{4i} \pi _i^4\) where the \(a_{li}\)’s depend only on N and \(n_i\). As an estimator of \(\mathcal{V}_1 = \sum _{i=1}^k(a_{1i} \pi _i + a_{2i} \pi _i^2 + a_{3i} \pi _i^3 + a_{4i} \pi _i^4)\), we consider unbiased estimators of \(\pi _i\), \(\pi _i^2\), \(\pi _i^3\) and \(\pi _i^4\). Let \(\eta _{li}= \pi _{i}^l\), \(l=1,2,3,4\); then unbiased estimators of \(\eta _{li}\), say \(\hat{\eta }_{li}\), are obtained directly from Lemma 1, leading to the first estimator of \(\mathcal{V}_{1}\),

$$\begin{aligned} \hat{\mathcal{V}}_{1} = \sum _{i=1}^k\sum _{l=1}^4 a_{li} \hat{\eta }_{li} \end{aligned}$$
(12)

where \( \hat{\eta }_{li} = \frac{n_i^l}{\prod _{j=0}^{l-1} (n_i-j)} \prod _{j=0}^{l-1} \left( \hat{\pi }_i - \frac{j}{n_i} \right) \) for \(l=1,2,3,4\) from Lemma 1 and

$$\begin{aligned} a_{1i}= \mathcal{A}_{2i},\quad a_{2i} = \mathcal{A}_{1i}-\mathcal{A}_{2i},\quad a_{3i} = -2\mathcal{A}_{1i},\quad a_{4i}= \mathcal{A}_{1i}. \end{aligned}$$

The second estimator is based on estimating \(\mathcal{V}_{1*}\) in (11). Since all \(\pi _i=\pi \) under \(H_0\), we can write \(\mathcal{V}_{1*}= \sum _{i=1}^{k} \sum _{l=1}^4 a_{li} \pi _i^l = \sum _{i=1}^k\sum _{l=1}^4 a_{li} \pi ^l\) and use an unbiased estimator of \(\pi ^l\) based on \(\sum _{i=1}^k X_i \sim \hbox {Binomial}(N, \pi )\) from Lemma 1. This leads to the estimator of \(\mathcal{V}_{1*}\) under \(H_0\):

$$\begin{aligned} \hat{\mathcal{V}}_{1*} = \sum _{i=1}^k\sum _{l=1}^4 a_{li} \hat{\eta }_{l}. \end{aligned}$$
(13)

where \(\hat{\eta }_{l} = \frac{N^l}{\prod _{j=0}^{l-1}(N-j)} \prod _{j=0}^{l-1} \left( \hat{\pi }- \frac{j}{N} \right) \) and \(\hat{\pi }= \frac{1}{N} \sum _{i=1}^kn_i \hat{\pi }_i\), as used earlier.
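The two estimators can be sketched as follows, reusing the coefficients \(a_{li}\) above (again in Python with our own function names; `eta_hat` implements the unbiased estimators of Lemma 1):

```python
import numpy as np

def eta_hat(x, n, l):
    """Unbiased estimator of pi^l from Lemma 1; requires n >= l."""
    pi_hat = x / n
    prod = np.ones_like(pi_hat)
    for j in range(l):
        prod *= (pi_hat - j / n) * n / (n - j)
    return prod

def a_coeffs(n):
    """Coefficients a_{1i},...,a_{4i} built from A_{1i}, A_{2i} of Lemma 2."""
    N = n.sum()
    d = (n / (n - 1)) * (1 - n / N)
    A1 = 2 - 6/n - d**2/n + 8*d**2/n**2 - 6*d**2/n**3 + 12*d*(n - 1)/n**2
    A2 = n / N**2
    return [A2, A1 - A2, -2 * A1, A1]

def V1_hat(x, n):
    """First variance estimator (12): unbiased eta_hat for each group."""
    x, n = np.asarray(x, float), np.asarray(n, float)
    a = a_coeffs(n)
    return np.sum(sum(a[l - 1] * eta_hat(x, n, l) for l in (1, 2, 3, 4)))

def V1star_hat(x, n):
    """Second variance estimator (13): pooled X ~ Binomial(N, pi) under H0."""
    x, n = np.asarray(x, float), np.asarray(n, float)
    N, X = n.sum(), x.sum()
    a = a_coeffs(n)
    return sum(np.sum(a[l - 1]) * eta_hat(np.array([X]), np.array([N]), l)[0]
               for l in (1, 2, 3, 4))
```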

Remark 1

Note that \(\hat{\mathcal{V}}_1\) is an unbiased estimator of \(\mathcal{V}_1\) under both \(H_0\) and \(H_1\). On the other hand, \(\hat{\mathcal{V}}_{1*}\) is an unbiased estimator of \(\mathcal{V}_{1*}\) only under \(H_0\), since it uses the binomial distribution of the pooled data \(\sum _{i=1}^k X_i\) via Lemma 1.

For sequences of \(a_n (>0)\) and \(b_n(>0)\), let us define \(a_n \asymp b_n \) if \( 0< \liminf \frac{a_n}{b_n} \le \limsup \frac{a_n}{b_n} < \infty \). The following lemmas will be used in the asymptotic normality of the proposed test.

Lemma 3

Suppose \(n_i \ge 2\) for \(1\le i \le k\). Then,

  1.

    we have \(\mathcal{V}_1 \asymp \sum _{i=1}^k \theta _i^2 + \frac{1}{N^2} \sum _{i=1}^k n_i \theta _i\). In particular, if \(0<c\le \pi _i \le 1-c <1\) for all i and some constant c, we have \(\mathcal{V}_1\asymp k\).

  2.

    we have

    $$\begin{aligned} \sum _{i=1}^k \mathcal{A}_{1i} \theta _i^2 \le \hbox {Var}(T_1) \le K\left( \mathcal{V}_1 + ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||_{\mathbf{n} \theta }^2\right) \end{aligned}$$
    (14)

    for some constant \(K>0\) where \(||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||_{\mathbf{n} \theta }^2 = \sum _{i=1}^k n_i (\pi _i - \bar{\pi })^2 \theta _i\). If \( |\pi _i - \bar{\pi }| \ge \frac{1+\epsilon }{N}\) for some \(\epsilon >0\) and \(1\le i\le k\), we have

    $$\begin{aligned} \hbox {Var}(T_1) \asymp \mathcal{V}_1 + ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||_{\mathbf{n} \theta }^2. \end{aligned}$$
    (15)

Proof

See “Appendix”. \(\square \)

We provide another lemma which plays a crucial role in the proof of the main result. As mentioned, we have two types of variances, \(\mathcal{V}_1\) in (10) and \(\mathcal{V}_{1*}\) in (11), and their estimators \(\hat{\mathcal{V}}_1\) and \(\hat{\mathcal{V}}_{1*}\). For \(T_1\) in (7), we consider two types of standard deviations based on \(\hbox {Var}(T_1)\) and \(\hbox {Var}(T_1)_*\).

The following lemma provides upper bounds on \(n^4 E(\hat{\pi }- \pi )^8\) and \(E(\hat{\pi }(1-\hat{\pi }))^4\) which are needed in the proof of our main results.

Lemma 4

If \(X \sim \hbox {Binomial}(n,\pi )\), \(\hat{\pi }= \frac{X}{n}\) and \(\hat{\eta }_l\) is the unbiased estimator of \(\pi ^l\) defined in Lemma  1, then we have, for \(\theta \equiv \pi (1-\pi )\),

$$\begin{aligned} n^4 E(\hat{\pi }- \pi )^8\le & {} C \min \left\{ \theta ^4, \frac{\theta }{n} \right\} \nonumber \\ E(\hat{\pi }(1-\hat{\pi }))^4\le & {} C'\min \left\{ \theta ^4, \frac{\theta }{n^3} \right\} \nonumber \\ E \hat{\pi }^l= & {} \pi ^l + O \left( \frac{\pi }{n^{l-1}} + \frac{\pi ^{l-1}}{n} \right) \quad \text{ for } l \ge 2 \end{aligned}$$
(16)
$$\begin{aligned} E (\hat{\pi }^l - \pi ^l)^2= & {} O\left( \frac{\pi ^{2l-1}}{n} + \frac{\pi }{n^{2l-1}} \right) \quad \text{ for } l\ge 2 \nonumber \\ E (\hat{\eta }_l - \pi ^l)^2= & {} O\left( \frac{\pi ^{2l-1}}{n} + \frac{\pi }{n^{2l-1}} \right) \quad \text{ for } l\ge 2 \end{aligned}$$
(17)

where C and \(C'\) are universal constants which do not depend on \(\pi \) and n.

Proof

See “Appendix”. \(\square \)

Remark 2

It should be noted that the bounds in Lemma 4 depend on the behavior of \(\theta =\pi (1-\pi )\) and the sample size n in the binomial distribution. In the classical asymptotic theory for a fixed value of \(\pi \), if \(\pi \) is bounded away from 0 and 1 and n is large, then \(\theta ^4\) dominates \(\frac{\theta }{n}\) (or \(\frac{\theta }{n^3}\)). However, when n is not large and \(\pi \) is close to 0 or 1, then \(\frac{\theta }{n}\) (or \(\frac{\theta }{n^3}\)) is a tighter bound on \(n^4 E(\hat{\pi }- \pi )^8\) (or \(E(\hat{\pi }(1-\hat{\pi }))^4\)) than \(\theta ^4\).

The following lemma shows that \(\hat{\mathcal{V}}_1\) and \(\hat{\mathcal{V}}_{1*}\) are ratio-consistent under some conditions.

Lemma 5

For \(\tilde{\theta }= \bar{\pi }(1-\bar{\pi })\), \(\bar{\pi }= \frac{1}{N} \sum _{i=1}^kn_i \pi _i\) and \(\pi _i \le \delta <1\) for some \(0<\delta <1\), we have the following:

  1.

    if \(\frac{\sum _{i=1}^k \left( \frac{\theta _i^3}{n_i}+ \frac{\theta _i}{n_i^3}\right) }{\left( \sum _{i=1}^k \left( \theta _i^2 + \frac{1}{N^2} \frac{\theta _i}{n_i} \right) \right) ^2} \rightarrow 0\) as \(k\rightarrow \infty \), then \(\frac{\hat{\mathcal{V}}_1}{\mathcal{V}_1} \rightarrow 1\) in probability.

  2.

    if \(\frac{ (\tilde{\theta })^3 \sum _{i=1}^k\frac{1}{n_i} + \tilde{\theta }\sum _{i=1}^k \frac{1}{n_i^3}}{ \left( k (\tilde{\theta })^2+ \frac{\tilde{\theta }}{N^2} \sum _{i=1}^k \frac{1}{n_i} \right) ^2 } \rightarrow 0 \), then \(\frac{\hat{\mathcal{V}}_{1*}}{\mathcal{V}_{1*}} \rightarrow 1\) in probability.

Proof

See “Appendix”. \(\square \)

Remark 3

Lemma 5 includes the condition \(\pi _i \le \delta <1\), which rules out the dense case in which the majority of observations are 1. Since our study focuses on the sparse case, it is realistic to exclude \(\pi _i\)s that are very close to 1. When data are dense, the homogeneity test of the \(\pi _i\)s can be carried out through testing \(\pi _i^* \equiv 1-\pi _i\) based on \(x_{ij}^*=1-x_{ij}\).

Remark 4

As estimators of \(\pi _i^l\) or \(\pi ^l\) for \(l=1,2,3,4\), we used unbiased estimators. Instead, we may simply consider the MLEs, \( (\hat{\pi }_i)^l\) or \( (\hat{\pi })^l\) for \(l=1,2,3,4\). For the first estimator \(\hat{\mathcal{V}}_1\), when the sample sizes \(n_i\) are not large, the unbiased estimators and the MLEs are different. In particular, if all \(n_i\)s are small and k is large, such small differences accumulate, so the behavior of the variance estimators is expected to be significantly different. This will be demonstrated in our simulation studies. On the other hand, for \(\hat{\mathcal{V}}_{1*}\), the unbiased estimators and the MLEs of \(\pi ^l\) under \(H_0\) behave almost the same way even for small \(n_i\), since the total sample size \(N=\sum _{i=1}^k n_i\) is large due to large k. The estimator of \(\mathcal{V}_{1}\) based on \(\hat{\pi }_i\), namely \(\hat{\mathcal{V}}_{1}^{mle}\), has the larger variance

$$\begin{aligned} E\left( \hat{\mathcal{V}}_1^{mle} - \mathcal{V}_1\right) ^2 \asymp \sum _{i=1}^k\left( \frac{\theta _i^3}{n_i} + \frac{\theta _i}{n_i^3}\right) + \sum _{i\ne j} \frac{\theta _i \theta _j}{n_in_j} \end{aligned}$$

while \(E(\hat{\mathcal{V}}_1 - \mathcal{V}_1)^2 \asymp \sum _{i=1}^k\left( \frac{\theta _i^3}{n_i} + \frac{\theta _i}{n_i^3}\right) \). Similarly, we can define \({\hat{\mathcal{V}}}_{1*}^{mle}\) based on \(\hat{\pi } = \frac{\sum _{i=1}^k x_{i}}{N}\). Even under the condition \( \sum _{i=1}^k\left( \frac{\theta _i^3}{n_i}+\frac{\theta _i}{n_i^3}\right) /(\sum _{i=1}^k\theta _i^2 + \frac{1}{N^2} \sum _{i=1}^k\frac{\theta _i}{n_i})^2 =o(1)\), \(\hat{\mathcal{V}}_{1}^{mle}\) may not be a ratio-consistent estimator due to the additional variation from the biased estimation of \(\pi _i^l\) for \(l=2,3,4\). We present simulation studies comparing tests with \(\hat{\mathcal{V}}_1\) and \(\hat{\mathcal{V}}_1^{mle}\) later.

In Lemma 5, we present the ratio consistency of \(\hat{\mathcal{V}}_1\) and \(\hat{\mathcal{V}}_{1*}\) under some conditions. Both conditions rule out \(\pi _i\)s that are too small relative to the \(n_i\)s among the k groups. The conditions are clearly satisfied if all \(\pi _i\)s are uniformly bounded away from 0 and 1. In general, however, the conditions allow small \(\pi _i\)s which may converge to zero at rates satisfying the conditions on the \(\theta _i\)s presented in the lemmas and theorems.

Under \(H_0\), we have two different estimators, \(\hat{\mathcal{V}}_1\) and \(\hat{\mathcal{V}}_{1*}\), and their corresponding test statistics, namely \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\), respectively:

$$\begin{aligned} T_\mathrm{new1} = \frac{T}{\sqrt{\hat{\mathcal{V}}_1 }},~~~~T_\mathrm{new2} = \frac{T}{\sqrt{\hat{\mathcal{V}}_{1*} }}. \end{aligned}$$

The following theorem shows that the proposed tests, \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\), are asymptotically size \(\alpha \) tests.

Theorem 2

Under \(H_0 : \pi _i \equiv \pi \) for all \(1\le i \le k\), if the condition in Lemma 5 holds and \(\frac{ \sum _{i=1}^k \frac{1}{n_i} }{k\theta ^3} \rightarrow 0\) for \(\theta = \pi (1-\pi )\) under \(H_0\), then \(T_\mathrm{new1}\rightarrow N(0,1)\) in distribution and \(T_\mathrm{new2} \rightarrow N(0,1)\) in distribution as \(k \rightarrow \infty \).

Proof

See “Appendix”. \(\square \)

Remark 5

The condition in Lemma 5 under \(H_0\) is \(\frac{\theta ^3 \sum _{i=1}^k\frac{1}{n_i} + \theta \sum _{i=1}^k\frac{1}{n_i^3}}{ \left( k\theta ^2 + \frac{\theta }{N^2} \sum _{i=1}^k \frac{1}{n_i} \right) ^2 } =o(1)\). This condition covers a variety of situations, such as small values of \(\pi \) as well as small sample sizes. Furthermore, inhomogeneous sample sizes are also covered. For example, when the sample sizes are bounded, we have \(\sum _{i=1}^k\frac{1}{n_i} \asymp k\) and \(\sum _{i=1}^k\frac{1}{n_i^3} \asymp k\), leading to \(\frac{\theta ^3 \sum _{i=1}^k\frac{1}{n_i} + \theta \sum _{i=1}^k\frac{1}{n_i^3}}{ \left( k\theta ^2 + \frac{\theta }{N^2} \sum _{i=1}^k \frac{1}{n_i} \right) ^2 } \le \frac{1}{k\theta ^3}\), which converges to 0 when \(k \theta ^3 \rightarrow \infty \). This happens when \(\pi = k^{\epsilon -1/3}\) for \(0< \epsilon <1/3\), which is allowed to converge to 0. Another case is that of highly unbalanced sample sizes. For example, suppose \( n_i \asymp i^{\alpha }\) for \( \alpha >1\), which implies \(\sum _{i=1}^{\infty } \frac{1}{n_i} < \infty \) and \(\sum _{i=1}^{\infty } \frac{1}{n_i^3} <\infty \). Then the condition becomes \( \frac{\theta ^3 \sum _{i=1}^k\frac{1}{n_i} + \theta \sum _{i=1}^k\frac{1}{n_i^3}}{ \left( k\theta ^2 + \theta \sum _{i=1}^k \frac{1}{n_i} \right) ^2 } \asymp \frac{ \theta ^3 + \theta }{ (k \theta ^2 + \theta )^2 } \le \frac{ \theta ^3 + \theta }{k^2 \theta ^4 } = \frac{1}{k^2 \theta } + \frac{1}{k^2 \theta ^3} \rightarrow 0 \) if \( \pi \asymp k ^{\epsilon } \) for \( -\frac{2}{3}< \epsilon < 0\). In this case, the sample size \(n_i\) diverges as \(i \rightarrow \infty \), so the sample sizes are highly unbalanced. For the asymptotic normality, the additional condition \(\sum _{i=1}^k\frac{1}{n_i}/(k\theta ^3) \rightarrow 0\) in Theorem 2 is satisfied for \( -\frac{1}{3}< \epsilon <0\).

From Theorem 2, we reject the \(H_0\) if

$$\begin{aligned} T_\mathrm{new1} (\text{ or }~~ T_\mathrm{new2}) > z_{1-\alpha } \end{aligned}$$

where \(z_{1-\alpha }\) is the \((1-\alpha )\) quantile of a standard normal distribution. As explained above, the rejection region is one-sided since \(E(T) \ge 0\), implying that large values of the statistics support the alternative hypothesis.
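Combining the pieces sketched earlier (`T_statistic`, `V1_hat`, `V1star_hat`), the two tests can be assembled as one-sided normal tests; a sketch:

```python
import numpy as np
from scipy import stats

def homogeneity_tests(x, n):
    """One-sided p values of T_new1 and T_new2 (Theorem 2),
    built from T_statistic, V1_hat and V1star_hat sketched above."""
    T = T_statistic(x, n)
    p_new1 = stats.norm.sf(T / np.sqrt(V1_hat(x, n)))
    p_new2 = stats.norm.sf(T / np.sqrt(V1star_hat(x, n)))
    return p_new1, p_new2
```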

Although \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\) have the same asymptotic null distribution, their power functions differ due to the different behaviors of \(\hat{\mathcal{V}}_1\) and \(\hat{\mathcal{V}}_{1*}\) under \(H_1\). In general, asymptotic normality under \(H_1\) is not necessary; however, explicit asymptotic power functions allow the powers to be compared analytically.

The following theorem states the asymptotic normality of \(T/\sqrt{{\hbox {Var}(T_1)}}\), where \(\hbox {Var}(T_1)\) is given in (9) in Lemma 2. In the following asymptotic results, it is worth mentioning that we impose some conditions on the \(\theta _i\)s so that they do not approach 0 too fast.

Theorem 3

If (i) \( |\pi _i -\bar{\pi }| \ge \frac{1+\epsilon }{N}\) for \(1\le i \le k\), (ii) \(\frac{\sum _{i=1}^k(\theta _i^4 + \frac{\theta _i}{n_i})}{\left( \sum _{i=1}^k\theta _i^4 + \frac{1}{N^2} \sum _{i=1}^k\frac{\theta _i}{N} \right) ^2 } \rightarrow 0\) and (iii) \(\frac{ \max _i (\pi _i - \bar{\pi })^2 (n_i\theta _i +1)}{ \mathcal{V}_1 + ||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||^2_{\theta \mathbf{n}}} \rightarrow 0\) where \(||{{\varvec{\pi }}} - \bar{{\varvec{\pi }}} ||^2_{\theta \mathbf{n}} = \sum _{i=1}^k n_i (\pi _i-\bar{\pi })^2 \theta _i\), then

$$\begin{aligned} \frac{T -\sum _{i=1}^k n_i (\pi _i - \bar{\pi })^2}{\sqrt{{\hbox {Var}(T_1)}}} \rightarrow N(0,1) ~~\text{ in } \text{ distribution } \end{aligned}$$

where \(\hbox {Var}(T_1)\) is defined in (9).

Proof

See “Appendix”. \(\square \)

Using Theorem 3, we obtain the asymptotic power of the proposed tests. We state this in the following corollary.

Corollary 2

Under the assumptions in Lemma 5 and Theorem  3, the powers of \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\) are

$$\begin{aligned} P(T_\mathrm{new1} > z_{1-\alpha }) - \bar{\varPhi }\left( \frac{\sqrt{\mathcal{V}_1}}{\sqrt{{\hbox {Var}(T_1)}}}\, z_{1-\alpha } - \frac{\sum _{i=1}^k n_i(\pi _i-\bar{\pi })^2}{\sqrt{\hbox {Var}(T_1)}} \right) \rightarrow 0 \end{aligned}$$

and

$$\begin{aligned} P(T_\mathrm{new2} > z_{1-\alpha }) - \bar{\varPhi }\left( \frac{\sqrt{\mathcal{V}_{1*}}}{\sqrt{\hbox {Var}(T_1)}}\, z_{1-\alpha }- \frac{\sum _{i=1}^k n_i(\pi _i-\bar{\pi })^2}{\sqrt{\hbox {Var}(T_1)}} \right) \rightarrow 0 \end{aligned}$$

where \(\bar{\varPhi }(x) = 1-{\varPhi }(x) = P(Z > x)\) for a standard normal random variable Z and \(\hbox {Var}(T_1)\) defined in (9).

2.3 Comparison of powers

In the previous section, we presented the asymptotic powers of the tests \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\). It does not appear straightforward to show that one test is uniformly better than the others. However, one may consider some specific scenarios and compare the tests under those scenarios, which helps in understanding their properties. The asymptotic powers depend on the configurations of the \(\pi _i\)s, the \(n_i\)s and k. It is not possible to consider all configurations; however, what we want to show through simulations is that neither \(T_\mathrm{new1}\) nor \(T_\mathrm{new2}\) dominates the other.

Let \(\beta (T) = \lim _{k\rightarrow \infty } P(T >z_{1-\alpha })\) be the asymptotic power of a test statistic T, where T is one of \(T_{\chi }\), \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\).

Theorem 4

  1.

    If the sample sizes satisfy \(n_1=\cdots =n_k\equiv n\) and \(\max _{1\le i \le k} \pi _i < \frac{1}{2}-\frac{1}{2\sqrt{3}}\), then

    $$\begin{aligned} \lim _{k\rightarrow \infty }(\beta (T_\mathrm{new2}) - \beta (T_\mathrm{new1}) ) \ge 0. \end{aligned}$$

    If \(n_i=n\) for all \(1\le i \le k\) and \( n\bar{\pi }(1-\bar{\pi }) \rightarrow \infty \), then

    $$\begin{aligned} \lim _{k \rightarrow \infty } (\beta ({T_\mathrm{new2}}) - \beta (T_{\chi }) ) \ge 0. \end{aligned}$$
  2.

    Suppose \(\pi _i =\pi =k^{-\gamma }\) for \(1\le i\le k-1\) and \(\pi _k = k^{-\gamma }+\delta \) for \(0<\gamma <1\) as well as \(n_i =n\) for \(1\le i\le k-1\), and \(n_k = [nk^{\alpha }]\) for \(0<\alpha < 1\) where [x] is the greatest integer which does not exceed x. Then, if \(n \rightarrow \infty \),

    (a)

      for \(\{(\alpha ,\gamma ) : 0<\alpha<1, 0<\gamma<1, 0<\alpha + \gamma<1, 0<\gamma \le \frac{1}{2}\}\), we have \(\lim _k (\beta (T_\mathrm{new1}) -\beta (T_\mathrm{new2}))=0\).

    (b)

      for \(\{ (\alpha , \gamma ) :0<\alpha<1, 0<\gamma <1, \alpha + \gamma>1, \alpha > \frac{1}{2} \} \), we have \(\lim _k (\beta (T_\mathrm{new1}) -\beta (T_\mathrm{new2})) >0\).

  3.

    Suppose \(\pi _1 = k^{-\gamma }+\delta \) and \(n_1 =n \rightarrow \infty \), and \(\pi _i = k^{-\gamma }\) and \(n_i = [n k^{\alpha }]\) for \(2 \le i \le k\). For \(0<\gamma <1\) and \(0<\alpha <1\), if \(0<\gamma <1/2\) and \(k^{1-\alpha -\gamma } =o(n)\), then

    $$\begin{aligned} \lim _{k \rightarrow \infty } (\beta (T_\mathrm{new2}) -\beta (T_\mathrm{new1})) >0. \end{aligned}$$
    (18)

Proof

See “Appendix”. \(\square \)

From Theorem 4, we conjecture that \(T_\mathrm{new2}\) has better power than the others when the sample sizes are homogeneous or similar to each other. For inhomogeneous sample sizes, \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\) perform differently, as seen in cases 2 and 3 of Theorem 4. We show numerical studies reflecting these cases later.

Although we compare the powers of the proposed tests under some local alternatives, it is interesting to consider different scenarios and compare the powers there. Instead of an analytical approach, we present numerical studies as follows. Since the asymptotic powers of \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\) depend on the behavior of \(\mathcal{V}_1\) and \(\mathcal{V}_{1*}\), we compare these two variances under a variety of situations. If \(\mathcal{V}_{1*}> \mathcal{V}_1\), then \(T_\mathrm{new1}\) is more powerful than \(T_\mathrm{new2}\); otherwise, the result is reversed. Although we compared the powers of the tests in Theorem 4, there are numerous additional situations which are not covered analytically, and we examine some of them numerically here. We take \(k=100\) and generate sample sizes \(n_i\) uniformly from \(\{20,21,\ldots ,200 \}\). The left panel of Fig. 1 is for \(\pi _i \sim U(0.01, 0.2)\) and the right panel is for \(\pi _i \sim U(0.01,0.5)\), where U(a, b) is the uniform distribution on (a, b). We consider 1,000 different configurations of \((n_i, \pi _i)_{1\le i \le 100}\) for each panel. We see that \(\hbox {Var}(T_1)\) and \(\hbox {Var}(T_1)_*\) behave differently when the \(\pi _i\)s are generated in different ways: if the \(\pi _i\)s are widely spread out, then \(\hbox {Var}(T_1)_*\) is larger; otherwise, \(\hbox {Var}(T_1)\) tends to be larger in our simulations (Fig. 1).

Fig. 1 Comparison of \(\hbox {Var}(T_1)\) and \(\hbox {Var}(T_1)_*\)
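A sketch reproducing one configuration of this comparison (left panel), evaluating \(\mathcal{V}_1\) of (10) at the generated \((\pi _i, n_i)\) and \(\mathcal{V}_{1*}\) of (11) at the common value \(\bar{\pi }\); the random seed and variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 100
n = rng.integers(20, 201, size=k).astype(float)   # n_i uniform on {20,...,200}
pi = rng.uniform(0.01, 0.2, size=k)               # left-panel configuration

a = a_coeffs(n)                 # reuse from the earlier sketch
A1, A2 = a[3], a[0]             # since a_{4i} = A_{1i} and a_{1i} = A_{2i}
theta = pi * (1 - pi)
V1 = np.sum(A1 * theta**2 + A2 * theta)           # V_1 of (10)

pibar = np.sum(n * pi) / n.sum()
tb = pibar * (1 - pibar)
V1star = tb**2 * A1.sum() + tb * A2.sum()         # V_1* of (11) at pi = pibar
print(V1, V1star)
```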

We present simulation studies comparing the performance of \(T_\mathrm{new1}\), \(T_\mathrm{new2}\) and existing tests; their relative performance varies across situations.

3 Simulations

In this section, we present simulation studies to compare our proposed tests with existing procedures.

We first adopt the following simulation setup and evaluate our proposed tests. Let us define

$$\begin{aligned} {\varvec{n}}_8= & {} 20(2,2^2,2^3,2^4,2^5,2^6,2^7,2^8) \\ {\varvec{n}}_{40}= & {} 20({\varvec{n}}_{1}^*,{\varvec{n}}_{2}^*,\ldots , {\varvec{n}}_{8}^*) = 20(2,\ldots ,2,2^2,\ldots ,2^2,\ldots , 2^8,\ldots ,2^8) \end{aligned}$$

where \({\varvec{n}}_{m}^*=(2^m,2^m,\ldots , 2^m)\) is a 5-dimensional vector, so that \({\varvec{n}}_{40}\) has 40 components. We consider the following simulation setups (Tables 1, 2, 3, 4, 5, 6).

  1. Setup 1

    \(\pi _i=0.001\) for \(1\le i \le k-1\) and \(\pi _k=0.001+\delta \) for \(k=8\) and \({\varvec{n}}_8\)

  2. Setup 2

    \(\pi _1=0.001+\delta \) and \(\pi _i = 0.001\) for \(2\le i \le k\), for \(k=8\) and \({\varvec{n}}_8\)

  3. Setup 3

    \(\pi _1 = 0.001 + \delta \) and \(\pi _i=0.001\) for \(2\le i \le 8\), \(k=8\), \(n_i= 2560\) for \(1\le i \le 8\)

  4. Setup 4

    \(\pi _i=0.001\) for \(1\le i \le k-1\) and \(\pi _k=0.001+\delta \) for \(k=40\) and \({\varvec{n}}_{40}\)

  5. Setup 5

    \(\pi _1=0.001+\delta \) and \(\pi _i = 0.001\) for \(2\le i \le k\), for \(k=40\) and \({\varvec{n}}_{40}\)

  6. Setup 6

    \(\pi _i=0.001+\delta \) for \(i=1\) and \(\pi _i=0.001\) for \(2 \le i \le k\). \(n_i= 2560\) for \(1\le i \le 40\)

Table 1 Powers under Setup 1
Table 2 Powers under Setup 2
Table 3 Powers under Setup 3

As test statistics, we use \(T_\mathrm{new1}\), \(M_1\), \(T_\mathrm{new2}\), \(M_2\), TS, modTS and PW. Here, as discussed in Remark 4, \(M_1\) uses \(\hat{\mathcal{V}}_1^{mle}\) as an estimator of \(\mathcal{V}_1\) in \(T_\mathrm{new1}\) and \(M_2\) uses \(\hat{\mathcal{V}}_{1*}^{mle}\) for \(\mathcal{V}_{1*}\) in \(T_\mathrm{new2}\). TS represents the normal approximation based on \(\frac{T_S-k}{\sqrt{2k}}\) for \(T_S\) in (2), and modTS represents the modified test \(T_{\chi }\) in (5). Chi represents the Chi-square test based on \(T_S > \chi ^2_{k-1, 1-\alpha }\) where \(\chi ^2_{k-1,1-\alpha }\) is the \((1-\alpha )\) quantile of the Chi-square distribution with \(k-1\) degrees of freedom. PW is the test in Potthoff and Whittinghill (1966), and BL represents the test in Bathke and Lankowski (2005). Note that BL is available only when the sample sizes are all equal. For the calculation of the size and power of each test, we simulate 10,000 samples and compute the empirical size and power based on the 10,000 p values.
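The size/power computation is a plain Monte Carlo loop; a sketch (degenerate all-zero samples, for which the variance estimates vanish, are counted as non-rejections here; that bookkeeping choice is ours):

```python
import numpy as np

def empirical_power(pi, n, test_pvalue, alpha=0.05, n_sim=10_000, seed=0):
    """Fraction of n_sim simulated data sets rejected at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x = rng.binomial(n.astype(int), pi)     # one data set X_1,...,X_k
        with np.errstate(divide="ignore", invalid="ignore"):
            p = test_pvalue(x, n)
        rejections += bool(np.isfinite(p) and p < alpha)
    return rejections / n_sim

# e.g. the empirical size of T_new1 under Setup 1 with delta = 0:
# n8 = 20.0 * 2.0 ** np.arange(1, 9)
# empirical_power(np.full(8, 0.001), n8, lambda x, n: homogeneity_tests(x, n)[0])
```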

From the above scenarios, we consider inhomogeneous sample sizes (Setups 1, 2, 4 and 5) and homogeneous sample sizes (Setups 3 and 6). Furthermore, when the sample sizes are inhomogeneous, two cases are considered: one in which the differing \(\pi _i\) occurs in a study with a large sample size (Setups 1 and 4) and the other in which it occurs in a study with a small sample size (Setups 2 and 5). Setups 1–6 consider the cases where only one study has a different probability (\(0.001+\delta \)) and all the others have the same probability (0.001). On the other hand, we may also consider the following cases, in which all probabilities differ from each other (Tables 7, 8).

  1. Setup 7

    \(\pi _i=0.001(1+\epsilon _i)\), \(k=40\), \(n_i = 2560\) for \(1\le i \le 40\), where the \(\epsilon _i\)s lie on an equally spaced grid in \([-\delta , \delta ]\).

  2. Setup 8

    \(\pi _i=0.01(1+\epsilon _i)\), \(k=40\), \({\varvec{n}}_{40}\), where the \(\epsilon _i\)s lie on an equally spaced grid in \([-\delta , \delta ]\).

Table 4 Powers under Setup 4
Table 5 Powers under Setup 5
Table 6 Powers under Setup 6
Table 7 Powers under Setup 7
Table 8 Powers under Setup 8

From our simulations, we first see that \(T_\mathrm{new1}\) attains higher power than \(M_1\), while \(T_\mathrm{new2}\) and \(M_2\) attain almost identical powers. The performance of \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\) differs across situations. When the sample sizes are homogeneous (Setups 3, 6 and 7), \(T_\mathrm{new2}\) attains slightly more power than \(T_\mathrm{new1}\), as shown in case 1 of Theorem 4. On the other hand, when the sample sizes are inhomogeneous, \(T_\mathrm{new1}\) seems to have the advantage when the differing probability occurs in a study with a large sample size, while \(T_\mathrm{new2}\) seems to attain better power in the opposite case. Overall, the performances of \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\) depend on the situation. Cochran’s test seems to fail to control the nominal size; however, the modified TS achieves reasonable empirical sizes. When the sample sizes are homogeneous, the modified TS has comparable power; however, for inhomogeneous sample sizes, the modified TS has significantly smaller power compared to \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\) in Setup 8 (Table 8).

Table 9 Powers under Setup 9

As suggested by a reviewer, we consider two more numerical studies in which k is extremely large (Tables 9, 10).

  1. Setup 9

    \(\pi _i=0.01(1+\epsilon _i)\), \(k=2000\), \(n_i = 100\) for \(1\le i \le 2000\), where the \(\epsilon _i\)s lie on an equally spaced grid in \([-\delta , \delta ]\).

  2. Setup 10

    \(\pi _i=0.01(1+\epsilon _i)\), \(k=2000\), \({\varvec{n}}=({\varvec{n}}_{1,250}, {\varvec{n}}_{2,250},\ldots , {\varvec{n}}_{8,250})\), where \({\varvec{n}}_{m,250}=(2^m, 2^m, \ldots , 2^m)\) is a 250-dimensional vector with all components equal to \(2^m\) and the \(\epsilon _i\)s lie on an equally spaced grid in \([-\delta , \delta ]\).

Setup 9 is the case of an extremely large number of groups with small sample sizes. As mentioned in the introduction, we focus on sparse count data in the sense that the \(\pi _i\)s are small, so we take \(\pi _i=0.01\) and homogeneous sample sizes \(n_i=100\), so that \(E(X_i)=n_i\pi _i = 1\), which represents very sparse data in each group. For the number of groups, we use \(k=2000\), which is much larger than \(n_i=100\). Table 9 shows the sizes and powers of all tests, and we see that all tests perform similarly when the sample sizes are homogeneous. On the other hand, when the sample sizes are highly unbalanced, which is the case of Setup 10, Table 10 shows that our proposed tests control the nominal size and show an increasing pattern of power, while the tests based on Chi-square statistics fail to control the nominal size and to attain power. In particular, those Chi-square-based tests show decreasing patterns of power even though the effect size (\(\delta \) in this case) increases. PW controls the size and shows an increasing pattern of power; however, its powers are much smaller than those of our proposed tests. All codes will be available upon request.

4 Real examples

In this section, we provide real examples for testing the homogeneity of binomial proportions from a large number of independent groups.

Table 10 Powers under Setup 10
Table 11 p values for homogeneity tests

We apply our proposed tests and existing tests to the rosiglitazone data in Nissen and Wolski (2007). The data set includes 42 studies and consists of the study size (N), the number of myocardial infarctions (MI) and the number of deaths (D) for rosiglitazone (treatment), together with the corresponding quantities under the control arm for each study.

We consider testing (1) for the proportions of myocardial infarction and of death (D) from cardiovascular causes. There are four situations: (i) MI/rosiglitazone, (ii) death from cardiovascular causes (DCV)/rosiglitazone, (iii) MI/control and (iv) DCV/control. Table 11 shows the p values for the different situations and the different test statistics. For MI/rosiglitazone and MI/control, all tests have a p value of 0. On the other hand, for the other two cases, the tests give different results. For DCV/rosiglitazone, \(T_\mathrm{new2}\), TS and modTS have small p values, while \(T_\mathrm{new1}\) and PW have slightly larger p values. For DCV/control, \(T_\mathrm{new1}\) and \(T_\mathrm{new2}\) have much smaller p values (0.107 and 0.079) compared to \(T_S\), modTS, Chi and PW (0.609, 0.406, 0.584 and 0.229, respectively) (Table 11).
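For reference, applying the sketched tests to such a data set amounts to a few lines; the array names below are hypothetical placeholders for the published counts, which we do not reproduce here:

```python
# Hypothetical usage on the rosiglitazone data (42 studies); fill the arrays
# from the table in Nissen and Wolski (2007).
# n_rosi  = np.array([...])   # study sizes, rosiglitazone arm
# mi_rosi = np.array([...])   # MI counts, rosiglitazone arm
# p_new1, p_new2 = homogeneity_tests(mi_rosi, n_rosi)
```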

5 Concluding remarks

In this paper, we considered testing the homogeneity of binomial proportions across a large number of independent studies. In particular, we focused on sparse data and heterogeneous sample sizes, which may affect the identification of null distributions. We proposed new tests and established their asymptotic properties under some regularity conditions. We provided simulations and real data examples, which show that our proposed tests are convincing in the case of sparse data and a large number of studies. This is a convincing result since our proposed tests are the most reliable in controlling a given size in our simulations, so small p values from our proposed tests constitute strong evidence against the null hypothesis.