1 INTRODUCTION: HETEROSCEDASTIC META-ANALYSIS MODEL

In a research synthesis problem one has to combine several estimates of the quantity of interest \(\mu\). The popular random effects model (REM) postulates the form of these estimators \(x_{i},i=1,\ldots,N\),

$$x_{i}=\mu+\lambda_{i}+\epsilon_{i}.$$
(1)

Here, the parameter of primary focus \(\mu\) is the common mean, representing the treatment effect in biostatistics or the reference value in metrology. The measurement error \(\epsilon_{i}\) of the \(i\)th study is supposed to have zero mean and a variance representing the within-study variability. The protocol requires that the participants, who possibly use different measuring techniques, report not only their estimates of \(\mu\) but also the estimates \(s_{i}^{2}\) of this variance (the within-study uncertainty).

The term \(\lambda_{i}\) is commonly taken to have zero mean with some heterogeneity variance traditionally denoted by \(\tau^{2}\). One can view \(\lambda_{i}\) as additional noise imposed by Nature on the experiment, in which one observes the results of independent individual studies all measuring the same overall effect \(\mu\). This variance can be estimated by one of several established procedures [8, 10]. The classical fixed effects model (FEM), corresponding to the situation when \(\tau^{2}=0\), provides a poor fit in many practical situations [1]. Indeed, the heterogeneity in most research synthesis studies can be quite substantial (Thompson and Sharp, 1999). Moreover, some medical researchers believe that ‘‘examination of heterogeneity is perhaps the most important task in meta-analysis’’ [7, p. 30]. In many heterogeneous studies the assumption that all \(\lambda\)’s have the same dispersion seems to be violated when the smallest reported uncertainties correspond to the cases most deviant from the rest of the data. Indeed, this assumption appears to be due mainly to mathematical expediency, and it limits the applicability of REM. We advocate a more detailed analysis admitting several distinct heterogeneity variances whose larger values are assigned to the aberrant, outlying cases. The most homogeneous data subset gets zero heterogeneity variance.

By independence of \(\lambda_{i}\) and \(\epsilon_{i}\), the variance of \(x_{i}\) cannot be smaller than the variance of \(\epsilon_{i}\), which is typically estimable by \(s_{i}^{2}\), although it commonly includes non-statistically derived components. In many situations there is doubt about the validity of these estimates, especially when they are identified with the unknown uncertainties. Hoaglin [10] discusses possible dire consequences of such identification in the problem of testing the homogeneity hypothesis \(\tau^{2}=0\). The case of REM allowing \(\tau^{2}\) to depend on the study was investigated by Rukhin (2019a), who suggested using \(s_{i}\) only as a lower bound on the unknown \(i\)th uncertainty, so that \(\tau^{2}_{i}\geq s_{i}^{2}\). In this work \(s_{i}^{2}\) also serves as the lower bound for the unknown variance of \(x_{i}\).

More generally, one can entertain an augmented random effects model (AREM) which puts the studies into different classes or clusters according to several values of the unknown heterogeneity variance, say, \(\tau^{2}=\tau^{2}_{k}>0,k=1,\ldots,K.\) Then the cluster corresponding to \(\tau^{2}_{0}=0\) plays a special role: it includes all conforming studies satisfying FEM. This class represents the largest consistent subset of all studies, allowing one to identify, from likelihood calculations, the cases affected by ‘‘excess-variation.’’ Its definition is a controversial issue in metrology [3, 20, 4]. We illustrate our definition of this concept by two practical examples in Section 6. The situation when each heterogeneous cluster consists of just one element corresponds to the setting mentioned above, with \(s_{i}^{2}\) representing a lower bound on the variance of \(x_{i}\).

There is a body of work aimed at extending REM for meta-analysis needs; see Ohlssen et al., 2007, Lee and Thompson, 2008, or Kulinskaya and Olkin, 2014 for more flexible parametric and nonparametric models. The main objective of this paper is to suggest a methodology for selecting one of the AREM models with distinct heterogeneity variances. We use maximum likelihood estimators of the mean and variances, which provide likelihood-based information criteria studied numerically in Section 7. For this purpose, iterative algorithms to find the maximum likelihood procedures and the conditions for their convergence are discussed in the next section. Section 3 contains a necessary condition for the global extremum, and Section 4 gives an example with equal \(s_{i}^{2}\) illustrating that the traditional REM has a smaller likelihood than some other AREM models. Section 5 discusses the algebraic complexity of the likelihood equations. The proofs are collected in the Appendix.

2 MAXIMUM LIKELIHOOD ESTIMATORS

Assume that the data are represented by a series of independent but not equally distributed random variables \(x_{i}\sim N(\mu,\sigma^{2}_{i})\) with the unknown common mean \(\mu\) and variances \(\sigma^{2}_{i},\sigma^{2}_{i}=\tau^{2}_{i}+s^{2}_{i},\) where \(\tau^{2}_{i}\geq 0\) is unknown and \(s_{i}^{2}\) represents the within-study variance, \(i=1,\ldots,N\). Supposing that there are no more than \(K\) different positive \(\tau\)’s, the variance of \(x_{i}\) for \(i\) in the class \(I_{k}\) is \(s^{2}_{i}+\tau^{2}_{k}\). Clusters \(I_{0},I_{1},\ldots,I_{K}\) define a partition of the set \(\{1,\ldots,N\}\) with \(I_{k}\) containing \(n_{k}\) elements, \(n_{0}+n_{1}+\cdots+n_{K}=N.\) The total number of heterogeneous clusters \(K\) cannot exceed \(N-n_{0}\), and there is no generality loss in assuming that \(n_{1}\geq\cdots\geq n_{K}\). The special cluster \(I_{0}\) corresponds to the vanishing heterogeneity variance \(\tau^{2}_{0}=0\).

For given \(n_{0}\) and \(K\), the number of partitions of the remaining \(N-n_{0}\) studies into \(K\) non-empty heterogeneity clusters is the Stirling number of the second kind \(\Big{\{}\begin{matrix}N-n_{0}\\ K\end{matrix}\Big{\}}\) (Graham et al., 1994). Thus the total number of all different partitions is

$$\sum_{0\leq n_{0}\leq N}\Bigg{\{}\begin{matrix}N-n_{0}\\ K\end{matrix}\Bigg{\}}\binom{N}{n_{0}}=\Bigg{\{}\begin{matrix}N+1\\ K+1\end{matrix}\Bigg{\}},$$

which allows \(I_{0}\) to be empty. When \(K=1\), this number is \(2^{N}-1\); for \(K=2\) it is \((3^{N}-2^{N+1}+1)/2\); for \(K=N-1\) it is \(N(N+1)/2.\) An implication is that in practice \(K\) should be chosen either small or close to \(N\).
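As a sanity check on this count, the displayed identity can be verified numerically. The sketch below (plain Python, with the Stirling numbers computed from their standard recurrence) enumerates partitions by the size of \(I_{0}\); the data-free computation only illustrates the combinatorics.

```python
from math import comb

def stirling2(n, k):
    """Stirling number of the second kind {n, k} via the standard recurrence."""
    if n == k:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def n_partitions(N, K):
    """Count the clusterings I_0, I_1, ..., I_K (I_0 may be empty)."""
    return sum(comb(N, n0) * stirling2(N - n0, K) for n0 in range(N + 1))

N = 8
assert n_partitions(N, 1) == 2**N - 1
assert n_partitions(N, 2) == (3**N - 2**(N + 1) + 1) // 2
assert n_partitions(N, N - 1) == N * (N + 1) // 2
for K in range(1, N):
    # the displayed identity: the total equals {N+1, K+1}
    assert n_partitions(N, K) == stirling2(N + 1, K + 1)
```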

For given clusters \(I_{0},I_{1},\ldots,I_{K}\) the (classical) log-likelihood function (times \(-2\)) is

$$L(\mu,\tau^{2}_{1},\ldots,\tau_{K}^{2};I_{0},I_{1},\ldots,I_{K})$$
$${}=L(\mu,\tau^{2}_{1},\ldots,\tau_{K}^{2})=\sum_{k=0}^{K}\sum_{I_{k}}\left[\frac{(x_{i}-\mu)^{2}}{\tau^{2}_{k}+s_{i}^{2}}+\log(\tau^{2}_{k}+s_{i}^{2})\right],\quad\tau^{2}_{0}=0.$$
(2)

We call (2) the augmented random effects model (AREM.CL.\(K\)) likelihood. A standard argument shows that for given clusters the model AREM.CL.\(K\) is identifiable.

The form of the restricted likelihood function \(\tilde{L}\) (AREM.RL.\(K\)) is also well known:

$$\widetilde{L}(\tau^{2}_{1},\ldots,\tau^{2}_{K})=\sum_{k=0}^{K}\sum_{I_{k}}\left[\frac{(x_{i}-\tilde{\mu})^{2}}{\tau^{2}_{k}+s_{i}^{2}}+\log(\tau^{2}_{k}+s_{i}^{2})\right]$$
$${}+\log\left(\sum_{k=0}^{K}\sum_{I_{k}}\frac{1}{\tau^{2}_{k}+s_{i}^{2}}\right).$$
(3)

Here the cluster weighted mean

$$\tilde{\mu}=\tilde{\mu}(\tau^{2}_{1},\ldots,\tau^{2}_{K})=\sum_{k=0}^{K}\sum_{I_{k}}\frac{x_{i}}{\tau^{2}_{k}+s_{i}^{2}}\left[\sum_{k=0}^{K}\sum_{I_{k}}\frac{1}{\tau^{2}_{k}+s_{i}^{2}}\right]^{-1}$$
(4)

is the best linear unbiased estimator of \(\mu\) which also minimizes (2) in \(\mu\) for given \(\tau\)’s.
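A minimal numerical sketch of (2) and (4), on made-up data with one heterogeneous cluster, confirms that the cluster weighted mean minimizes (2) in \(\mu\) for fixed \(\tau\)'s:

```python
import numpy as np

# made-up data: N = 6 studies, clusters I_0 and I_1, one heterogeneity variance
x  = np.array([0.1, -0.2, 0.05, 1.4, 1.1, -1.3])
s2 = np.array([0.2, 0.3, 0.25, 0.2, 0.3, 0.25])
clusters = [np.array([0, 1, 2]), np.array([3, 4, 5])]   # I_0, I_1
tau2 = [0.0, 0.9]                                       # tau_0^2 = 0 is fixed

def neg2loglik(mu):
    """(2): the log-likelihood times -2 for the given clusters and tau's."""
    total = 0.0
    for I, t2 in zip(clusters, tau2):
        v = t2 + s2[I]
        total += np.sum((x[I] - mu) ** 2 / v + np.log(v))
    return total

def mu_tilde():
    """(4): the cluster weighted mean."""
    num = den = 0.0
    for I, t2 in zip(clusters, tau2):
        w = 1.0 / (t2 + s2[I])
        num += np.sum(w * x[I])
        den += np.sum(w)
    return num / den

# (2) is quadratic in mu, so mu-tilde beats any perturbed value
m = mu_tilde()
for eps in (1e-3, 1e-2, 0.1):
    assert neg2loglik(m) < neg2loglik(m + eps)
    assert neg2loglik(m) < neg2loglik(m - eps)
```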

Thus to find strictly positive maximum likelihood estimates \(\hat{\tau}^{2}_{k}\), one has to solve equations

$$\frac{\partial L}{\partial\tau_{k}^{2}}=0,\quad k=1,\ldots,K,$$

with positive definite matrix of second derivatives. These equations can be written in the form

$$\sum_{I_{k}}\frac{(x_{i}-\tilde{\mu})^{2}-s_{i}^{2}}{(\tau^{2}_{k}+s_{i}^{2})^{2}}=\tau^{2}_{k}\sum_{I_{k}}\frac{1}{(\tau^{2}_{k}+s_{i}^{2})^{2}}.$$
(5)

The following Theorem 2.1 gives an explicit form of the Hessian and offers an iterative algorithm motivated by (5).

Let \(a\) and \(d\) be \(K\)-dimensional vectors with the coordinates

$$a_{k}=2^{1/2}\sum_{I_{k}}\frac{x_{i}-\tilde{\mu}}{(\tau_{k}^{2}+s_{i}^{2})^{2}}\left(\sum_{m}\sum_{I_{m}}\frac{1}{\tau_{m}^{2}+s_{i}^{2}}\right)^{-1/2}$$

and

$$d_{k}=2\sum_{I_{k}}\frac{(x_{i}-\tilde{\mu})^{2}}{(\tau_{k}^{2}+s_{i}^{2})^{3}}-\sum_{I_{k}}\frac{1}{(\tau_{k}^{2}+s_{i}^{2})^{2}},\quad k=1,\ldots,K.$$

Theorem 2.1. For AREM.CL.\(K\) the Hessian \(H=(\partial^{2}L/[\partial\tau_{k}^{2}\partial\tau^{2}_{\ell}])_{k,\ell=1}^{K}\) has the form

$$H=\textrm{diag}(d)-aa^{T}.$$
(6)

With \(\hat{\tau}^{2}_{k}\) substituted for \(\tau_{k}^{2}\) in the definition of \(a,d\), and \(\tilde{\mu}\) to get \(\hat{a}\), \(\hat{d}\), \(\hat{\mu}\), the sufficient condition for the minimum of (2) to be attained is:

$$\sum_{k=1}^{K}\frac{\hat{a}_{k}^{2}}{\hat{d}_{k}}<1,\quad\hat{d}_{k}>\hat{a}_{k}^{2},\quad k=1,\ldots,K.$$
(7)

A necessary condition is that \(\sum_{k=1}^{K}\hat{a}_{k}^{2}/\hat{d}_{k}\leq 1\).

Provided that (7) holds and

$$\hat{d}_{k}-\hat{a}_{k}^{2}<2\sum_{I_{k}}\frac{1}{(\hat{\tau}_{k}^{2}+s_{i}^{2})^{2}},\quad k=1,\ldots,K,$$
(8)

the positive maximum likelihood estimators \(\hat{\tau}^{2}_{k},k=1,\ldots,K\) exist and can be determined by iterations as

$$\hat{\tau}^{2}_{k}=\max\left[0,\sum_{I_{k}}\frac{(x_{i}-\hat{\mu})^{2}-s_{i}^{2}}{(\hat{\tau}_{k}^{2}+s_{i}^{2})^{2}}\right]\left[\sum_{I_{k}}\frac{1}{(\hat{\tau}_{k}^{2}+s_{i}^{2})^{2}}\right]^{-1}.$$
(9)
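Since the Hessian (6) is a diagonal matrix minus a rank-one term, condition (7) can be tested against a direct eigenvalue computation. The sketch below does this for randomly generated vectors playing the role of \(\hat{a}\) and \(\hat{d}\) (with positive \(d\)-entries, as holds at a minimum); the equivalence is the standard rank-one-update criterion:

```python
import numpy as np

def hessian_pd(d, a):
    """Check positive definiteness of H = diag(d) - a a^T from (6) by eigenvalues."""
    H = np.diag(d) - np.outer(a, a)
    return bool(np.all(np.linalg.eigvalsh(H) > 0))

# for random vectors in the role of a-hat, d-hat, condition (7) matches H > 0 exactly
rng = np.random.default_rng(1)
for _ in range(200):
    K = int(rng.integers(2, 6))
    d = rng.uniform(0.5, 3.0, K)
    a = rng.uniform(-1.0, 1.0, K)
    cond7 = bool(np.all(d > a**2) and np.sum(a**2 / d) < 1)
    assert hessian_pd(d, a) == cond7
```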

There are many numerical methods, like the Newton–Raphson rule, which need only condition (7) for convergence. They are more reliable than the EM algorithm and can be used instead of (9) or of (14) in Theorem 2.2 below. An attractive feature of the iterations in Theorem 2.1 is that they decrease the value of (2) at each step (Rukhin, 2011). This fact matters since the likelihood may not be a unimodal function of \(\tau^{2}\); indeed, the (polynomial) Eq. (5) can have several positive solutions. Therefore the choice of a good starting value is important.

To find the global optimizers \(\hat{I}_{0},\ldots,\hat{I}_{K}\), enumerate all \(\Big{\{}\begin{matrix}N+1\\ K+1\end{matrix}\Big{\}}\) partitions \(I_{0},\ldots,I_{K}\) of the index set \(\{1,\ldots,N\}\) and for each partition determine the solution of (5) via Theorem 2.1. Failure to converge is interpreted as non-existence of solutions satisfying (8) resulting in rejection of the candidate clustering. The partition \(\hat{I}_{0},\ldots,\hat{I}_{K}\) which provides the overall minimizer is taken as the final choice delivering the maximum likelihood estimators \(\hat{\tau}^{2}_{1},\ldots,\hat{\tau}^{2}_{K}\) and \(\hat{\mu}\). A necessary optimality condition is provided in Section 3.

In the simplest but important case \(K=1\) (AREM.CL.\(K=1\)), we have to choose two clusters: one with zero heterogeneity, \(I_{0}\), and its heterogeneous complement \(I_{1}\). Then one has to determine the likelihood defined by the conditions: for \(i\in I_{1}\), \(\tau^{2}_{i}=\tau^{2}\) with unknown but positive \(\tau^{2}\), while \(I_{0}\) corresponds to the \(j\)’s for which \(\tau^{2}_{j}=0\). The maximum likelihood estimator of \(\mu\) is of the form (4) with \(\tau^{2}=\hat{\tau}^{2}\),

$$\hat{\mu}=\left[\sum_{I_{0}}\frac{x_{j}}{s_{j}^{2}}+\sum_{I_{1}}\frac{x_{i}}{\hat{\tau}^{2}+s_{i}^{2}}\right]\left[\sum_{I_{0}}\frac{1}{s_{j}^{2}}+\sum_{I_{1}}\frac{1}{\hat{\tau}^{2}+s_{i}^{2}}\right]^{-1},$$

and \(\hat{\tau}^{2}=\hat{\tau}^{2}(I_{1})\) is a solution of the equation

$$\sum_{I_{1}}\frac{(x_{i}-\hat{\mu})^{2}}{(\hat{\tau}^{2}+s_{i}^{2})^{2}}=\sum_{I_{1}}\frac{1}{\hat{\tau}^{2}+s_{i}^{2}}.$$
(10)

To get the \(\mu\)-estimator for a fixed \(I_{1}\), it suffices to solve (10) choosing the true minimizer of \(\min_{\mu,\tau^{2}}L(\mu,\tau^{2})\) out of possibly several solutions with positive second derivative, i.e.,

$$\left(\sum_{I_{1}}\frac{1}{\hat{\tau}^{2}+s_{i}^{2}}+\sum_{I_{0}}\frac{1}{s_{j}^{2}}\right)\sum_{I_{1}}\frac{(x_{i}-\hat{\mu})^{2}}{(\hat{\tau}^{2}+s_{i}^{2})^{3}}-\left(\sum_{I_{1}}\frac{x_{i}-\hat{\mu}}{(\hat{\tau}^{2}+s_{i}^{2})^{2}}\right)^{2}$$
$${}>\frac{1}{2}\left(\sum_{I_{1}}\frac{1}{\hat{\tau}^{2}+s_{i}^{2}}+\sum_{I_{0}}\frac{1}{s_{j}^{2}}\right)\sum_{I_{1}}\frac{1}{(\hat{\tau}^{2}+s_{i}^{2})^{2}}.$$
(11)
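For a fixed \(I_{1}\), the iteration (9) specialized to \(K=1\) is easy to program. The sketch below uses hypothetical data, with \(I_{1}\) holding the two deviant studies, runs the fixed-point iteration, and checks that the limit solves (10):

```python
import numpy as np

# hypothetical data: I_0 holds three conforming studies, I_1 two deviant ones
x  = np.array([0.02, -0.05, 0.01, 0.95, 1.20])
s2 = np.array([0.04, 0.05, 0.04, 0.04, 0.05])
I0, I1 = np.array([0, 1, 2]), np.array([3, 4])

def mu_hat(t2):
    """(4) for K = 1: weights 1/s_j^2 on I_0 and 1/(t2 + s_i^2) on I_1."""
    v = s2.copy()
    v[I1] += t2
    w = 1.0 / v
    return np.sum(w * x) / np.sum(w)

def step(t2):
    """One sweep of the fixed-point iteration (9)."""
    m = mu_hat(t2)
    u = (t2 + s2[I1]) ** 2
    return max(0.0, np.sum(((x[I1] - m) ** 2 - s2[I1]) / u)) / np.sum(1.0 / u)

t2 = 1.0            # starting value matters: (5) may have several positive roots
for _ in range(1000):
    t2 = step(t2)

# a positive fixed point of (9) solves the likelihood equation (10)
m = mu_hat(t2)
lhs = np.sum((x[I1] - m) ** 2 / (t2 + s2[I1]) ** 2)
rhs = np.sum(1.0 / (t2 + s2[I1]))
assert t2 > 0 and abs(lhs - rhs) < 1e-8
```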

When \(I_{0}=\emptyset\), \(I_{1}=\{1,\ldots,N\}\), one gets the setting of the traditional REM with just one \(\tau^{2}\), and (9) presents a commonly used procedure to determine this parameter (e.g., Rukhin, 2019b). However, in all our examples the likelihood of this model cannot exceed that of models with non-empty \(I_{0}\) (see Section 4). The seemingly new convergence conditions (7) and (8) mean that in this situation \(\hat{a}_{1}^{2}<\hat{d}_{1}<2\sum_{I_{1}}(\hat{\tau}^{2}+s_{i}^{2})^{-2}+\hat{a}_{1}^{2}\). The likelihood of this case typically is smaller than that of some other AREM.CL.\(K=1\) models.

Another special case is the model AREM.CL.\(K=N-n_{0}\), with lower bounded variances, \(n_{k}\equiv 1,k\not\in I_{0}\). In this situation

$$\hat{\mu}=\left[\sum_{k\not\in I_{0}}\frac{x_{k}}{\hat{\tau}^{2}_{k}+s_{k}^{2}}+\sum_{I_{0}}\frac{x_{j}}{s_{j}^{2}}\right]\left[\sum_{k\not\in I_{0}}\frac{1}{\hat{\tau}^{2}_{k}+s_{k}^{2}}+\sum_{I_{0}}\frac{1}{s_{j}^{2}}\right]^{-1}.$$

Then the iteration scheme of Theorem 2.1 can be reduced to a one-dimensional problem involving only \(\hat{\mu}.\)

Theorem 2.2. A sufficient condition for the minimum in AREM.CL.\(K=N-n_{0}\) is that for any \(k\not\in I_{0},\)

$$\frac{2}{\hat{\tau}^{2}_{k}+s_{k}^{2}}<\sum_{\ell\not\in I_{0}}\frac{1}{\hat{\tau}^{2}_{\ell}+s_{\ell}^{2}}+\sum_{I_{0}}\frac{1}{s_{j}^{2}}<\sum_{I_{0}}\frac{2}{s_{j}^{2}}.$$
(12)

A non-strict version of (12) and inequalities

$$(x_{k}-\hat{\mu})^{2}>s_{k}^{2},\quad k\not\in I_{0},$$
(13)

form necessary conditions. The iteration scheme to find the maximum likelihood estimator

$$\hat{\mu}=\left[\sum_{k\not\in I_{0}}\frac{x_{k}}{(x_{k}-\hat{\mu})^{2}}+\sum_{I_{0}}\frac{x_{j}}{s_{j}^{2}}\right]\left[\sum_{k\not\in I_{0}}\frac{1}{(x_{k}-\hat{\mu})^{2}}+\sum_{I_{0}}\frac{1}{s_{j}^{2}}\right]^{-1}$$
(14)

converges if (12) holds. Under condition (13) the maximum likelihood estimators \(\hat{\tau}^{2}_{k}=(x_{k}-\hat{\mu})^{2}-s_{k}^{2},k\not\in I_{0},\) are positive.
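A sketch of the one-dimensional iteration (14), on hypothetical data with two studies outside \(I_{0}\); condition (13) is checked for the resulting \(\hat{\tau}^{2}_{k}\):

```python
import numpy as np

# hypothetical data: I_0 = conforming studies; each remaining study is its own cluster
x  = np.array([0.00, 0.03, -0.02, 0.80, -0.90])
s2 = np.array([0.01, 0.01, 0.01, 0.02, 0.02])
I0 = np.array([0, 1, 2])
H  = np.array([3, 4])                     # the indices k outside I_0 (n_k = 1)

def step(mu):
    """One sweep of the fixed-point iteration (14)."""
    wh = 1.0 / (x[H] - mu) ** 2           # heterogeneous studies: weight 1/(x_k - mu)^2
    w0 = 1.0 / s2[I0]                     # conforming studies: weight 1/s_j^2
    return (np.sum(wh * x[H]) + np.sum(w0 * x[I0])) / (np.sum(wh) + np.sum(w0))

mu = x.mean()
for _ in range(500):
    mu = step(mu)

# condition (13) holds here, so the implied tau-hat's are positive
tau2 = (x[H] - mu) ** 2 - s2[H]
assert np.all(tau2 > 0) and abs(step(mu) - mu) < 1e-10
```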

Similar results can be derived for restricted maximum likelihood estimators of \(\tau^{2}\) with \(K\) simultaneous equations

$$\sum_{i\in I_{k}}\frac{(x_{i}-\tilde{\mu})^{2}}{(\tilde{\tau}^{2}_{k}+s_{i}^{2})^{2}}=\sum_{i\in I_{k}}\frac{1}{\tilde{\tau}^{2}_{k}+s_{i}^{2}}-\sum_{i\in I_{k}}\frac{1}{(\tilde{\tau}^{2}_{k}+s_{i}^{2})^{2}}\left(\sum_{m}\sum_{i\in I_{m}}\frac{1}{s_{i}^{2}+\tilde{\tau}_{m}^{2}}\right)^{-1}.$$
(15)

Define the \(K\)-dimensional vectors \(b\) and \(c\) by their coordinates

$$b_{k}=\sum_{I_{k}}\frac{1}{(\tau_{k}^{2}+s_{i}^{2})^{2}}\left(\sum_{m}\sum_{I_{m}}\frac{1}{\tau_{m}^{2}+s_{i}^{2}}\right)^{-1}$$

and

$$c_{k}=2\sum_{I_{k}}\frac{(x_{i}-\tilde{\mu})^{2}}{(\tau_{k}^{2}+s_{i}^{2})^{3}}-\sum_{I_{k}}\frac{1}{(\tau_{k}^{2}+s_{i}^{2})^{2}}+\sum_{I_{k}}\frac{2}{(\tau_{k}^{2}+s_{i}^{2})^{3}}\left(\sum_{m}\sum_{I_{m}}\frac{1}{\tau_{m}^{2}+s_{i}^{2}}\right)^{-1},$$

\(k=1,\ldots,K\). By substituting the restricted maximum likelihood estimators \(\tilde{\tau}_{k}^{2}\), one gets vectors \(\tilde{b},\tilde{c}\) as well as \(\tilde{a}\).

Theorem 2.3. For AREM.RL.\(K\) the Hessian corresponding to (15) has the form

$$\tilde{H}=C-\tilde{a}\tilde{a}^{T}-\tilde{b}\tilde{b}^{T},$$
(16)

\(C=\textrm{diag}(\tilde{c})\). A sufficient condition for the minimum of \(\widetilde{L}(\tau^{2}_{1},\ldots,\tau^{2}_{K})\) in (3) to be attained at \(\tilde{\tau}_{k}^{2},k=1,\ldots,K\) is that for all \(k\)

$$\tilde{a}_{k}^{2}+\tilde{b}_{k}^{2}<\tilde{c}_{k}$$
(17)

and

$$\sum_{k}\frac{\tilde{a}_{k}^{2}+\tilde{b}_{k}^{2}}{\tilde{c}_{k}}<\min\left[1+\sum_{k}\frac{\tilde{a}^{2}_{k}}{\tilde{c}_{k}}\sum_{k}\frac{\tilde{b}^{2}_{k}}{\tilde{c}_{k}}-\left(\sum_{k}\frac{\tilde{a}_{k}\tilde{b}_{k}}{\tilde{c}_{k}}\right)^{2},2\right].$$
(18)

For any minimizer non-strict inequalities in (17) and (18) must hold. Under these conditions an iteration scheme to find positive restricted maximum likelihood estimators \(\tilde{\tau}^{2}_{k},\) \(k=1,\ldots,K,\)

$$\tilde{\tau}^{2}_{k}=\max\Big{\{}0,\sum_{I_{k}}\frac{(x_{i}-\tilde{\mu})^{2}-s_{i}^{2}+[\sum_{m}\sum_{I_{m}}(\tilde{\tau}_{m}^{2}+s_{i}^{2})^{-1}]^{-1}}{(\tilde{\tau}_{k}^{2}+s_{i}^{2})^{2}}\Big{\}}\left[\sum_{I_{k}}\frac{1}{(\tilde{\tau}_{k}^{2}+s_{i}^{2})^{2}}\right]^{-1},$$

\(\tilde{\mu}=\tilde{\mu}(\tilde{\tau}_{1}^{2},\ldots,\tilde{\tau}_{K}^{2})\), converges if (17) and (18) are valid.

Specification of Theorem 2.3 for AREM.RL.\(K=1\) and AREM.RL.\(K=N-n_{0}\) is given in Section 9.4; heuristic discussion of the conditions in Theorems 2.1–2.3 is postponed until Section 4.

3 CONDITION FOR GLOBAL EXTREMUM

In this section we give a necessary condition for attaining the global extremum of (2) over all clusters \(I_{0},I_{1},\ldots,I_{K}\).

Given the partition \(I_{0},I_{1},\ldots,I_{K}\), consider \(L(\tilde{\mu},\tau_{1}^{2},\ldots,\tau_{K}^{2})\) as a function of \(\tau_{1}^{2},\ldots,\tau_{K}^{2}\). Let

$$S=S(\tau_{1}^{2},\ldots,\tau_{K}^{2})=\sum_{m}\sum_{I_{m}}(\tau_{m}^{2}+s_{i}^{2})^{-1},$$
(19)

so that \(\tilde{\mu}=\sum_{m}\sum_{I_{m}}x_{i}(\tau_{m}^{2}+s_{i}^{2})^{-1}/S\).

For a fixed \(q=0,\ldots,K\), choose any \(n\in I_{q}\) to be moved from \(I_{q}\) to \(I_{p}\), and put \(\bar{I}_{q}=I_{q}\!\setminus\!\{n\}\), with the set \(I_{p}\) (\(p\neq q\)) replaced by \(\bar{I}_{p}=I_{p}\,\bigcup\,\{n\}\). Let \(\bar{I}_{k}=I_{k},k\neq p,q\), and denote the similar modification of \(\tilde{\mu}\) by

$$\bar{\mu}=\sum_{m}\sum_{\bar{I}_{m}}\frac{x_{i}}{\tau_{m}^{2}+s_{i}^{2}}\left[\sum_{m}\sum_{\bar{I}_{m}}\frac{1}{\tau_{m}^{2}+s_{i}^{2}}\right]^{-1}=\frac{x_{n}\Delta+\tilde{\mu}S}{\Delta+S},$$
(20)

\(\Delta=(\tau_{p}^{2}+s_{n}^{2})^{-1}-(\tau_{q}^{2}+s_{n}^{2})^{-1}.\) In this notation one has for any \(\tau^{2}_{k},k=1,\ldots,K\)

$$\sum_{k}\sum_{\bar{I}_{k}}\frac{(x_{i}-\bar{\mu})^{2}}{\tau_{k}^{2}+s_{i}^{2}}=\sum_{k}\sum_{I_{k}}\frac{(x_{i}-\tilde{\mu})^{2}}{\tau_{k}^{2}+s_{i}^{2}}+\frac{\Delta S(x_{n}-\tilde{\mu})^{2}}{\Delta+S}.$$
(21)

To prove (21) write the sum in its left-hand side as

$$\sum_{k}\sum_{I_{k}}\frac{(x_{i}-\bar{\mu})^{2}}{\tau_{k}^{2}+s_{i}^{2}}+\Delta(x_{n}-\bar{\mu})^{2}=\sum_{k}\sum_{I_{k}}\frac{(x_{i}-\tilde{\mu})^{2}}{\tau_{k}^{2}+s_{i}^{2}}-S(\tilde{\mu}-\bar{\mu})^{2}+\frac{\Delta S^{2}(x_{n}-\tilde{\mu})^{2}}{(\Delta+S)^{2}}$$

and employ (20) to simplify.

This formula and the representation (2) of the likelihood function lead to the following result.

Theorem 3.1. If in AREM.CL.\(K\) the clusters \(I_{0},I_{1},\ldots,I_{K}\) with estimates \(\hat{\tau}_{1}^{2},\ldots,\hat{\tau}_{K}^{2}\) and \(\hat{\mu}\) provide the global minimum of (2), then for any \(n\in I_{q}\) and \(p,0\leq p\neq q\leq K\), one must have

$$\frac{\Delta S(x_{n}-\hat{\mu})^{2}}{\Delta+S}\geq\log\left(\frac{\hat{\tau}_{q}^{2}+s_{n}^{2}}{\hat{\tau}_{p}^{2}+s_{n}^{2}}\right).$$
(22)

Here \(S\) is defined by (19) with \(\hat{\tau}_{1}^{2},\ldots,\hat{\tau}_{K}^{2}\) replacing \(\tau_{1}^{2},\ldots,\tau_{K}^{2}\).

According to Theorem 3.1 the distance between elements of heterogeneity clusters \(I_{k},k\geq 1\) and \(\hat{\mu}\) measured relative to \(\tau_{k}^{2}\) cannot be too small. Therefore the optimal homogeneity cluster \(I_{0}\) must consist of data points which are fairly close to the consensus estimate \(\hat{\mu}\).
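On a small artificial data set the necessary condition (22) can be confirmed by brute force: enumerate all partitions for \(K=1\), minimize (2) over a grid of \(\tau^{2}\) values, and test (22) at the winner. The data below are made up, with one obvious outlier.

```python
import numpy as np
from itertools import combinations

# made-up K = 1 data with one clear outlier; brute-force the global minimum of (2)
x  = np.array([0.0, 0.1, -0.1, 2.0])
s2 = np.array([0.1, 0.1, 0.1, 0.1])
N  = len(x)

def fit(I1, t2):
    """mu-tilde from (4) and the value of (2) for the partition (I_1^c, I_1)."""
    v = s2 + np.where(np.isin(np.arange(N), I1), t2, 0.0)
    mu = np.sum(x / v) / np.sum(1.0 / v)
    return np.sum((x - mu) ** 2 / v + np.log(v)), mu

grid = np.linspace(1e-6, 10.0, 4001)         # crude grid search over tau^2
best = None
for m in range(1, N + 1):
    for cand in combinations(range(N), m):
        vals = [fit(cand, t)[0] for t in grid]
        k = int(np.argmin(vals))
        if best is None or vals[k] < best[0]:
            best = (vals[k], cand, grid[k], fit(cand, grid[k])[1])

val, I1, t2, mu = best
# Theorem 3.1, condition (22): moving any single n between clusters cannot pay
v = s2 + np.where(np.isin(np.arange(N), I1), t2, 0.0)
S = np.sum(1.0 / v)
for n in range(N):
    tq = t2 if n in I1 else 0.0              # tau^2 of n's current cluster
    tp = 0.0 if n in I1 else t2              # tau^2 of the other cluster
    delta = 1.0 / (tp + s2[n]) - 1.0 / (tq + s2[n])
    lhs = delta * S * (x[n] - mu) ** 2 / (delta + S)
    assert lhs >= np.log((tq + s2[n]) / (tp + s2[n])) - 1e-6
```

As the theorem suggests, the outlying observation ends up in the heterogeneity cluster, while the points close to the consensus mean form \(\hat{I}_{0}\).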

4 EQUAL VARIANCES

To elucidate the conditions of Theorems 2.1–2.3 in this section we look at the simplest case \(K=1\) when all heterogeneity is in \(x\)’s, i.e., \(s_{i}^{2}\) are equal, \(s_{i}^{2}\equiv s^{2}\). Then \(\hat{\mu}=\bar{x}=\sum x_{i}/N\), \(\hat{\tau}^{2}=\max[0,\sum_{i}(x_{i}-\bar{x})^{2}/N-s^{2}]\). Provided that \(\hat{\tau}^{2}>0\),

$$\mathcal{L}(\emptyset,\{1,\ldots,N\})=N[1+\log(\hat{\tau}^{2}+s^{2})],$$

which can be assumed smaller than \(\mathcal{L}(\{1,\ldots,N\},\emptyset)\). For a heterogeneity cluster \(I=I_{1}\) of cardinality \(M\),

$$\bar{\mu}=\frac{\sum_{i\notin I}x_{i}/s^{2}+\sum_{i\in I}x_{i}/(\tau^{2}+s^{2})}{(N-M)/s^{2}+M/(\tau^{2}+s^{2})}=(1-\omega)\bar{x}_{I^{c}}+\omega\bar{x}_{I},$$

where \(\omega=M(\tau^{2}+s^{2})^{-1}/[(N-M)s^{-2}+M(\tau^{2}+s^{2})^{-1}],\omega\leq\omega_{0}=M/N,\bar{x}_{I}=\sum_{i\in I}x_{i}/M,\bar{x}_{I^{c}}=\sum_{i\notin I}x_{i}/(N-M),N\geq 3\).
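In this equal-variance setting, \(\hat{\mu}=\bar{x}\) and \(\hat{\tau}^{2}=\max[0,\sum_{i}(x_{i}-\bar{x})^{2}/N-s^{2}]\) can be checked directly against (10); a small sketch with made-up numbers:

```python
import numpy as np

# made-up data with equal within-study variances
x  = np.array([1.0, 2.0, 4.0, 7.0])
s2 = 0.5
N  = len(x)
mu = x.mean()                                       # mu-hat = xbar
tau2 = max(0.0, np.sum((x - mu) ** 2) / N - s2)     # tau-hat^2
assert tau2 > 0

# these closed-form values solve (10) with I_1 = {1, ..., N}
lhs = np.sum((x - mu) ** 2 / (tau2 + s2) ** 2)
rhs = N / (tau2 + s2)
assert abs(lhs - rhs) < 1e-10
```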

By putting \(\alpha_{I}=\sum_{i\in I}(x_{i}-\bar{x}_{I})^{2}/(Ms^{2})\), \(\beta_{I}=\sum_{i\notin I}(x_{i}-\bar{x}_{I^{c}})^{2}/[(N-M)s^{2}]\), \(\gamma_{I}=(\bar{x}_{I}-\bar{x}_{I^{c}})^{2}/s^{2}\), it is convenient to work with the difference between likelihoods

$$[L(\tilde{\mu},\tau^{2}(\omega);I^{c},I)-\mathcal{L}(\emptyset,\{1,\ldots,N\})]/N=\omega_{0}\log\left(\frac{\omega_{0}(1-\omega)}{(1-\omega_{0})\omega}\right)$$
$${}+\frac{(1-\omega_{0})}{1-\omega}\left[\alpha_{I}\omega+\beta_{I}(1-\omega)+\gamma_{I}\omega(1-\omega)\right]-1-\log(v).$$
(23)

Here

$$v=\omega_{0}\alpha_{I}+(1-\omega_{0})\beta_{I}+\omega_{0}(1-\omega_{0})\gamma_{I}=\frac{\sum_{i}(x_{i}-\bar{x})^{2}}{Ns^{2}}.$$
(24)
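The identity (24) is the usual between/within decomposition of \(\sum_{i}(x_{i}-\bar{x})^{2}\); it is easy to confirm numerically on arbitrary (here made-up) data:

```python
import numpy as np

# arbitrary made-up data and a cluster I; check the decomposition (24)
x  = np.array([1.2, 0.7, 2.5, 3.1, 0.2])
s2 = 0.6
N  = len(x)
I  = np.array([2, 3])                      # heterogeneity cluster, M = 2
Ic = np.setdiff1d(np.arange(N), I)
M  = len(I)
w0 = M / N
xI, xIc = x[I].mean(), x[Ic].mean()
alpha = np.sum((x[I] - xI) ** 2) / (M * s2)
beta  = np.sum((x[Ic] - xIc) ** 2) / ((N - M) * s2)
gamma = (xI - xIc) ** 2 / s2
v = np.sum((x - x.mean()) ** 2) / (N * s2)
# (24): omega_0 alpha + (1 - omega_0) beta + omega_0 (1 - omega_0) gamma = v
assert abs(w0 * alpha + (1 - w0) * beta + w0 * (1 - w0) * gamma - v) < 1e-12
```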

The quantities \(a_{1}\) and \(d_{1}\) from Theorem 2.1 are such that

$$a_{1}^{2}=\frac{2M(1-\omega)^{2}\omega s^{2}\gamma_{I}}{(\tau^{2}+s^{2})^{3}}=\frac{2(N-M)\omega^{2}(1-\omega)\gamma_{I}}{(\tau^{2}+s^{2})^{2}},$$
$$d_{1}=\frac{2(N-M)\omega[\alpha_{I}+\gamma_{I}(1-\omega)^{2}]}{(1-\omega)(\tau^{2}+s^{2})^{2}}-\frac{M}{(\tau^{2}+s^{2})^{2}}.$$

The condition (7) of Theorem 2.1 means that at the maximum likelihood solution

$$\alpha_{I}+\gamma_{I}(1-\omega)^{3}\geq\frac{\omega_{0}(1-\omega)}{2(1-\omega_{0})\omega},$$

which means that \(\alpha_{I}\geq[\omega_{0}(2\omega-1)(1-\omega)]/[2(1-\omega_{0})\omega^{2}]\), and then the iteration algorithm is convergent. The condition (12) of Theorem 2.2 is much simpler, \(\omega<\frac{1}{2},\) indicating that the relative weight of the increased-variance cases cannot exceed one-half.

The restricted likelihoods (up to a constant term) are \(\tilde{{\mathcal{L}}}(\emptyset,\{1,\ldots,N\})=(N-1)[1+\log(Nv/(N-1))],\) \(v>(N-1)/N\), and

$$\tilde{L}(\tilde{\mu},\tau^{2}(\omega);I^{c},I)=\log\left(\frac{1-\omega_{0}}{1-\omega}\right)$$
$${}+N\left[\omega_{0}\log\left(\frac{\omega_{0}(1-\omega)}{(1-\omega_{0})\omega}\right)+\frac{(1-\omega_{0})}{1-\omega}\left[\alpha_{I}\omega+\beta_{I}(1-\omega)+\gamma_{I}\omega(1-\omega)\right]\right].$$

In Theorem 2.3 one has \(b_{1}=\omega^{2}(\tau^{2}+s^{2})^{-2}\) and

$$c_{1}=b_{1}+\frac{2\omega}{(\tau^{2}+s^{2})^{2}},$$

so that the condition (17) means that

$$\alpha_{I}\geq\frac{(1-\omega)(M-2\omega+\omega^{2})}{2(N-M)\omega}.$$

Since \(K=1\), (18) is equivalent to (17) and under this condition the iteration algorithm converges.

The situation with equal uncertainties is also helpful to find out if the classical random effects model (\(I_{0}=\emptyset\)) can have a higher likelihood than AREM.CL.\(K\) with a non-empty homogeneous data set. Define for any \(K\)

$${\mathcal{L}}(I_{0},\ldots,I_{K})=\min_{\tau^{2}_{1},\ldots,\tau^{2}_{K}}L(\tilde{\mu},\tau^{2}_{1},\ldots,\tau^{2}_{K};I_{0},\ldots,I_{K}).$$
(25)

Conjecture. If \(\mathcal{L}\) is defined by (25) and \(K=1\), then

$$\mathcal{L}(\emptyset,\{1,\ldots,N\})\geq\min_{I_{0}\neq\emptyset}\mathcal{L}(I_{0},I_{0}^{c}).$$
(26)

This conjecture holds in many practical examples. It is a feature of the model with lower bounded variances (Rukhin, 2019a). We confirm (26) here for the equal variances case for all \(N\geq 3,K=1\).

For \(0<\omega\leq\omega_{0}\) and \(v\geq 1\) given in (24), we define \(F_{I}(\omega;\alpha_{I},\beta_{I},\gamma_{I})\) as the left-hand side of (23),

$$F_{I}(\omega;\alpha_{I},\beta_{I},\gamma_{I})=\omega_{0}\log\left(\frac{\omega_{0}(1-\omega)}{(1-\omega_{0})\omega}\right)$$
$${}+\frac{(1-\omega_{0})}{1-\omega}\left[\alpha_{I}\omega+\beta_{I}(1-\omega)+\gamma_{I}\omega(1-\omega)\right]-1-\log(v).$$
(27)

The goal is to show that for some \(M,1\leq M<N\), there is \(I\) such that \(\min_{0<\omega\leq\omega_{0}}F_{I}(\omega;\alpha_{I},\beta_{I},\gamma_{I})<0\). One has \(F_{I}(\omega_{0};\alpha_{I},\beta_{I},\gamma_{I})=v-\log(v)-1>0\) and \(\lim_{\omega\to 0}F_{I}(\omega;\alpha_{I},\beta_{I},\gamma_{I})=+\infty\). Thus the sought minimum cannot be attained close to the boundary.

For fixed \(I\) the desired minimizer \(\omega\) satisfies the cubic equation

$$\alpha_{I}+\gamma_{I}(1-\omega)^{2}=\frac{\omega_{0}(1-\omega)}{(1-\omega_{0})\omega}.$$
(28)

If (28) does not have a solution \(\omega\) in the interval \((0,\omega_{0})\) for which the second derivative is positive, then \(\min_{0<\omega\leq\omega_{0}}F_{I}(\omega;\alpha_{I},\beta_{I},\gamma_{I})=v-\log(v)-1>0\), and the partition \((I^{c},I)\) cannot have a larger likelihood than \((\emptyset,\{1,\ldots,N\})\). The existence of such a solution implies that \(\alpha_{I}+\gamma_{I}(1-\omega_{0})^{2}>1\).
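Clearing denominators in (28) gives an explicit cubic in \(\omega\), so candidate minimizers can be found with any polynomial root finder; a sketch with hypothetical values of \(\alpha_{I}\), \(\gamma_{I}\), and \(\omega_{0}\):

```python
import numpy as np

# hypothetical values of alpha_I, gamma_I, omega_0 for illustration
N, M = 6, 4
w0 = M / N
alpha, gamma = 0.8, 2.5

# clearing denominators in (28):
# gamma(1-w0) w^3 - 2 gamma(1-w0) w^2 + [(1-w0)(alpha+gamma) + w0] w - w0 = 0
coeffs = [gamma * (1 - w0),
          -2 * gamma * (1 - w0),
          (1 - w0) * (alpha + gamma) + w0,
          -w0]
roots = np.roots(coeffs)

# keep the real roots in (0, w0] and check each against (28) itself
good = [w.real for w in roots
        if abs(w.imag) < 1e-10 and 0 < w.real <= w0]
assert good                       # a candidate minimizer exists for these values
for w in good:
    lhs = alpha + gamma * (1 - w) ** 2
    rhs = w0 * (1 - w) / ((1 - w0) * w)
    assert abs(lhs - rhs) < 1e-8
```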

One can express \(\alpha_{I}\) through \(\omega\) and \(\gamma_{I}\) to get the inequality in terms of these two variables

$$\omega_{0}\log\left(\frac{\omega_{0}(1-\omega)}{(1-\omega_{0})\omega}\right)+(1-\omega_{0})(\gamma_{I}\omega^{2}-1)-\log(v)\leq 0,$$
(29)

which for a fixed \(\omega\) allows one to find the region of \((\alpha_{I},\beta_{I},\gamma_{I})\)-values where (29) holds.

It suffices to look at the simplest situation \(M=N-1\), in which case \(\beta_{I}=0\). If \(\min_{\omega}\sum_{I}F_{I}(\omega,v)\leq 0\), then \(\sum_{I}\min_{\omega}F_{I}(\omega,v)\leq 0\), so that for some \(I\), \(F_{I}(\omega,v)\leq 0\). To evaluate \(\min_{\omega}\sum_{I}F_{I}(\omega,v)\), the averages of \(\alpha_{I}\) and \(\gamma_{I}\) are needed. These quantities can be obtained by summing over all \(M\)-element sets \(I\),

$$\omega_{0}\sum_{I}\alpha_{I}=\binom{N}{M}\frac{(M-1)v}{(N-1)},$$
$$(1-\omega_{0})\sum_{I}\beta_{I}=\binom{N}{M}\frac{(N-M-1)v}{(N-1)},$$

which vanishes when \(M=N-1\), and

$$\omega_{0}(1-\omega_{0})\sum_{I}\gamma_{I}=\binom{N}{M}\frac{v}{(N-1)}.$$

Thus for \(M=N-1\) the inequality \(\min_{\omega}\sum_{I}F_{I}(\omega;\alpha_{I},0,\gamma_{I})<0\) means that (27) is negative for some \(\omega\) when \(\gamma_{I}=v/[(N-1)\omega_{0}(1-\omega_{0})]=v[N/(N-1)]^{2}\) and \(\alpha_{I}=(N-2)v/[(N-1)\omega_{0}]=vN(N-2)/(N-1)^{2}\). The direct calculation given in Section 9.4 implies that this inequality holds for all \(N\geq 3\). Indeed, the region where (29) does not hold is convex, and \(\alpha_{I}^{0}=N(N-2)/(N-1)^{2}\) and \(\gamma_{I}^{0}=[N/(N-1)]^{2}\) provide its extreme point. Thus if \(\alpha_{I}-\alpha_{I}^{0}\geq\lambda_{N}(\gamma_{I}-\gamma_{I}^{0})\), then (29) holds. Here \(\lambda_{N}\) denotes the slope of the boundary at \((\alpha_{I}^{0},\gamma_{I}^{0})\). It is shown in the Appendix that \(\lambda_{N}=(N^{2}-2N-2+(N-1)\sqrt{N^{2}-N-2})[N(N+2)]^{-1}\). Thus for \(N\geq 3\), \(\max_{I}\alpha_{I}-\alpha_{I}^{0}\geq\lambda_{N}(\min_{I}\gamma_{I}-\gamma_{I}^{0})\), and one can determine the optimal cluster \(I\) from the condition \(I^{c}=\{\arg\min_{k}\gamma_{k}\}\).

The counterpart of (27) for the restricted likelihood (up to a constant term) is

$$\frac{N-1}{N}\left[\left(\frac{Nv}{N-1}\right)\log\left(\frac{Nv}{N-1}\right)-1\right]\leq\frac{1}{N}\log\left(\frac{1-\omega}{1-\omega_{0}}\right)$$
$${}+\omega_{0}\log\left(\frac{(1-\omega_{0})\omega}{\omega_{0}(1-\omega)}\right)+\frac{(\omega_{0}-\omega)(N\omega_{0}+\omega)}{N\omega}-(\omega_{0}-\omega)^{2}c_{I},$$

where

$$\frac{\alpha_{I}}{1-\omega}+(1-\omega)\gamma_{I}=\frac{\omega_{0}}{(1-\omega_{0})\omega}+\frac{1}{N(1-\omega_{0})}.$$

However this inequality typically does not hold. The simplest example is given by a centered data set: \(N=3,x_{1}-\bar{x}=-5/(2\sqrt{3}),x_{2}-\bar{x}=1/(2\sqrt{3}),x_{3}-\bar{x}=2/\sqrt{3},s^{2}=1\). Then the cluster \(\{1,2,3\}\) provides the restricted likelihood solution with \(\tilde{\tau}^{2}=7/4\). In contrast the maximum likelihood method chooses heterogeneity cluster \(I_{1}=\{2,3\}\) with a smaller variance estimate \(\hat{\tau}^{2}=0.73\).

Now we formulate the main results of this section.

Theorem 4.1. When \(s_{i}^{2}\equiv s^{2}\), for the optimal cluster \(I\), \(\omega=M(\tau^{2}+s^{2})^{-1}/[(N-M)s^{-2}+M(\tau^{2}+s^{2})^{-1}]\) solves cubic equation (28). For all \(N\geq 3\), (26) holds when \(M=N-1\), with \(I^{c}=\{i\},i=\arg\min_{k}\gamma_{k}\). For the restricted likelihood function the corresponding inequality does not hold.

The estimates of \(\tau^{2}\) obtained for \(I_{0}\neq\emptyset\) by minimizing \(\mathcal{L}(I_{0},I_{0}^{c})\) are typically (much) larger than the estimate \(\hat{\tau}^{2}\) of the traditional REM. Indeed the inequality (29) cannot be true for \(\omega_{1}=\omega_{0}/[\omega_{0}+(1-\omega_{0})v]<\omega_{0}\), as \(\omega_{0}(1-\omega_{1})/[(1-\omega_{0})\omega_{1}]=v.\) Thus the minimizer \(\omega\) in (27) cannot exceed \(\omega_{1}\), which corresponds to \(\hat{\tau}^{2}\).

In the general case of arbitrary uncertainties one cannot restrict attention to the case \(M=N-1\). As a matter of fact the cluster leading to a larger likelihood value can correspond to any \(M,M=1,\ldots,N-1\). According to Theorem 3.1 all elements of the heterogeneity class must be far away from the consensus mean estimate.

5 MAXIMUM LIKELIHOOD DEGREE

The goal here is to give a formula for the degree of the (polynomial) likelihood Eqs. (5), representing the algebraic complexity of the problem. In the situation of the previous section this degree is three. Our formula is based on a different representation of the likelihood function (2),

$$L(\tilde{\mu},\tau^{2}_{1},\ldots,\tau^{2}_{K})=\sum_{0\leq k,\ell\leq K}\sum_{i\in I_{k},j\in I_{\ell}}\frac{(x_{i}-x_{j})^{2}}{(\tau^{2}_{k}+s_{i}^{2})(\tau^{2}_{\ell}+s_{j}^{2})}\left[\sum_{k}\sum_{I_{k}}\frac{1}{\tau^{2}_{k}+s_{i}^{2}}\right]^{-1}$$
$${}+\log\left(\prod_{k=0}^{K}\prod_{I_{k}}(\tau^{2}_{k}+s_{i}^{2})\right),$$

which follows from the Lagrange identity. It will be assumed that all \(x\)’s in each cluster and all \(s\)’s are distinct (i.e., that the data is generic). This is a usual condition imposed when studying the maximum likelihood degree (cf. Gross et al., 2012) for a homogeneous variance component problem.

To represent the solutions of likelihood equations as those of polynomial equations let

$$P=P(\tau^{2}_{1},\ldots,\tau^{2}_{K};I_{0},I_{1},\ldots,I_{K})=\prod_{k=1}^{K}\prod_{i\in I_{k}}(\tau^{2}_{k}+s_{i}^{2})$$

be a polynomial of degree \(n_{1}+\cdots+n_{K}=N-n_{0}>0\). Then with \(S_{0}=\sum_{j\in I_{0}}s_{j}^{-2}\),

$$\sum_{I_{0}}\frac{1}{s_{j}^{2}}+\sum_{k=1}^{K}\sum_{I_{k}}\frac{1}{\tau^{2}_{k}+s_{i}^{2}}=S_{0}+\sum_{k=1}^{K}\frac{\partial\log P}{\partial\tau^{2}_{k}}=S_{0}+\sum_{k=1}^{K}\frac{P^{\prime}_{k}}{P}.$$

We also define two polynomials

$$Q=\sum_{0\leq k,\ell\leq K}\sum_{i\in I_{k},j\in I_{\ell}}\frac{(x_{i}-x_{j})^{2}P(\tau^{2}_{1},\ldots,\tau^{2}_{K})}{(\tau^{2}_{k}+s_{i}^{2})(\tau^{2}_{\ell}+s_{j}^{2})},$$

\(\tau^{2}_{0}=0\), and

$$R=S_{0}P+\sum_{k=1}^{K}P^{\prime}_{k}.$$

The degree of \(Q\) is \(N-\max(n_{0},2)\), that of \(R\) is \(N-\max(n_{0},1).\)

The scaled likelihood function \(L\) from (2) takes the form

$$L=\frac{Q}{R}+\log P+\sum_{j\in I_{0}}\log s_{j}^{2},$$

and the likelihood equations for \(\tau^{2}_{\ell},\ell=1,\ldots,K\) become

$$Q^{\prime}_{\ell}PR-QPR^{\prime}_{\ell}+P^{\prime}_{\ell}R^{2}=0.$$
(30)

For the restricted likelihood function \(\widetilde{L}=Q/R+\log R\) (up to an additive constant), these equations are simpler

$$Q^{\prime}_{\ell}R-QR^{\prime}_{\ell}+RR^{\prime}_{\ell}=0.$$
(31)

Combining these facts with information about the degree of \(P,Q\), and \(R\), one gets the next result.

Theorem 5.1. The degree \(DL\) of the polynomial equation (30) is

$$DL=\begin{cases}3N-3n_{0}-1,&n_{0}\geq 1,\\ 3N-3,&n_{0}=0;\end{cases}$$

the degree \(RL\) of Eq. (31) is

$$RL=\begin{cases}2N-2n_{0}-1,&n_{0}\geq 1,\\ 2N-3,&n_{0}=0.\end{cases}$$
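These degree counts can be checked symbolically on a small generic instance. A SymPy sketch for \(N=3\), \(n_{0}=1\), \(K=1\), with hypothetical \(x_{i}\) and \(s_{i}^{2}\) values, constructing \(P\), \(Q\), \(R\) and Eqs. (30)–(31) as above:

```python
import sympy as sp

t = sp.symbols('t')                                     # tau_1^2
N, n0 = 3, 1
I = [[0], [1, 2]]                                       # clusters I_0, I_1
tau = [sp.Integer(0), t]                                # tau_0^2 = 0 by convention
s2 = [sp.Rational(1), sp.Rational(2), sp.Rational(3)]   # hypothetical s_i^2
x = [sp.Rational(0), sp.Rational(1), sp.Rational(3)]    # hypothetical x_i (generic data)

P = (t + s2[1]) * (t + s2[2])
S0 = 1 / s2[0]
Q = sp.cancel(sum((x[i] - x[j])**2 * P / ((tau[k] + s2[i]) * (tau[l] + s2[j]))
                  for k in range(2) for l in range(2)
                  for i in I[k] for j in I[l]))
R = sp.expand(S0 * P + sp.diff(P, t))

eq30 = sp.expand(sp.diff(Q, t)*P*R - Q*P*sp.diff(R, t) + sp.diff(P, t)*R**2)
eq31 = sp.expand(sp.diff(Q, t)*R - Q*sp.diff(R, t) + R*sp.diff(R, t))

assert sp.degree(Q, t) == N - max(n0, 2)                # deg Q = 1
assert sp.degree(R, t) == N - max(n0, 1)                # deg R = 2
assert sp.degree(eq30, t) == 3*N - 3*n0 - 1             # DL = 5
assert sp.degree(eq31, t) == 2*N - 2*n0 - 1             # RL = 3
```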

6 TWO PRACTICAL EXAMPLES

In many applications the smallest reported uncertainties \(s_{k}^{2}\) often correspond to the outlying cases. Two examples are provided in this section; many more can be found in the metrology literature.

6.1 Key Comparison CCL-K1

Length key comparison CCL-K1 was carried out to compare the deviations from nominal length of steel and tungsten carbide gauge blocks (Thalmann, 2002). The measurement results for one of these blocks, namely, the deviations reported by eleven participating national metrology institutes for a tungsten carbide gauge block of nominal length \(1\) mm, are given in Table 1.

Table 1 CCL-K1 data on deviations from nominal length of a gauge block (tungsten carbide) in nm units

When \(K=1\), both maximum likelihood estimators give \(\hat{I}_{1}=\tilde{I}_{1}=\{6,7,8\}\), increasing the uncertainties of these three labs to \(23.61\), \(23.95\), and \(24.27\) nm (or to \(23.81\), \(24.12\), and \(24.47\) nm for the restricted version), with \(\hat{\tau}=22.75\) nm and \(\hat{\mu}=18.07\) nm. For the restricted maximum likelihood estimator, \(\tilde{\tau}=22.76\) nm and \(\tilde{\mu}=18.09\) nm.

Cox (2007) analyzed the same data set and concluded that laboratories \(6\) and \(7\) should be excluded from the largest consistent subset, which he defined as the set of studies for which \(\sum_{j}(x_{j}-\tilde{\mu})^{2}/s_{j}^{2}\) does not exceed the critical point of the \(\chi^{2}\)-distribution at the \(0.05\) significance level. This approach, which gives the consensus deviation as \(20.3\) nm, has been criticized by Toman and Possolo (2009) and Elster and Toman (2010).
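Cox's screening statistic is straightforward to compute. A minimal sketch, assuming the weighted mean as \(\tilde{\mu}\) and using hypothetical data (not the CCL-K1 values); the \(\chi^{2}\) critical point is passed in explicitly:

```python
import numpy as np

def chi2_consistency(x, s2, crit):
    """Weighted-mean estimate of mu and the chi-square consistency statistic;
    `crit` is the 0.95 quantile of chi^2 with len(x) - 1 degrees of freedom."""
    x, s2 = np.asarray(x, float), np.asarray(s2, float)
    w = 1.0 / s2
    mu = (w * x).sum() / w.sum()             # weighted mean as mu-tilde
    stat = ((x - mu) ** 2 / s2).sum()        # sum_j (x_j - mu)^2 / s_j^2
    return mu, stat, bool(stat <= crit)

# Hypothetical data: three concordant labs plus one outlier
mu, stat, ok = chi2_consistency([10.0, 11.0, 9.0, 25.0],
                                [1.0, 1.0, 1.0, 1.0],
                                crit=7.815)  # chi^2_{0.95} with 3 d.f.
# stat = 170.75, far above the critical point: this subset fails the check
```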

The maximum of the likelihood function is not attained at \(I_{1}=\{6,7\}\), so a better definition of the largest consistent subset is \(\hat{I}_{0}=\hat{I}_{1}^{c}=\{1,2,3,4,5,9,10,11\}\). The same homogeneity cluster persists when \(K=2\). The estimate \(\hat{\tau}_{1}=22.55\) nm is then replaced by \(\hat{\tau}_{1}=11.07\) nm and \(\hat{\tau}_{2}=26.70\) nm. The classical maximum likelihood estimator of \(\tau\) based on the cluster \(\hat{I}_{0}\) above is positive but small, \(0.056\) nm, and the commonly used DerSimonian–Laird estimator of this parameter vanishes, so that \(\hat{I}_{0}\) looks homogeneous.

Larger values of \(K\) do not lead to substantial gains in the likelihood functions. The best (classical likelihood) choice for \(K=3\) recommends removing lab \(3\) from \(\hat{I}_{0}\), i.e., taking \(\{1,2,4,5,9,10,11\}\) as the new consistent subset; the restricted likelihood choice is \(\{1,2,3,4,5,9,11\}\).

6.2 CIPM CCM.FF-K3 Study

Another example of discrepant data is the international fluid flow comparison of air speed measurement (CCM.FF-K3) (Terao et al., 2007). An ultrasonic anemometer chosen as a transfer standard was circulated among four national metrology institutes, which reported calibration results at certain speeds. The (dimensionless) data given in Table 2 represent the ratio of each laboratory's reference air speed to the one measured by the transfer standard.

Table 2 Relative air speed data (nominal speed 2 m/s)

Since the data is aberrant, the organizers of this study decided to use the median as an estimator of \(\mu\). When \(K=1\), the maximum likelihood estimator is \(\hat{I}_{0}=\{3\},\hat{I}_{1}=\{1,2,4\},\hat{\mu}=1.0193\), while the restricted maximum likelihood estimator gives \(\tilde{I}_{0}=\emptyset,\tilde{I}_{1}=\{1,2,3,4\},\tilde{\mu}=1.0138\). All increased uncertainties are about \(0.0137\) for MLE (\(\hat{\tau}^{2}=1.81\times 10^{-4}\)) or \(0.0118\) for REML (\(\tilde{\tau}^{2}=1.49\times 10^{-4}\)). If \(K=2\), the maximum likelihood estimator practically remains the same, \(\hat{\mu}=1.020\), but the negative likelihood (2) is \(-36.35\) (attained at \(\hat{I}_{0}=\emptyset,\hat{I}_{1}=\{1,2\},\hat{I}_{2}=\{3,4\}\)), while the corresponding value is \(-35.45\) for \(K=1\).

For the restricted maximum likelihood estimator \(\tilde{I}_{0}=\{3\},\tilde{I}_{1}=\{2\},\tilde{I}_{2}=\{1,4\}\), \(\tilde{\mu}=1.020,\tilde{\tau}_{1}=0.0199,\tilde{\tau}_{2}=0.0087\), with the first increased uncertainty larger than that for \(K=1\), but the second one smaller. The negative restricted likelihood (3) decreases from \(-22.89\) to \(-23.52\). For \(K=2\) both \(\mu\)-estimators evaluated according to different likelihood methods turn out to be practically equal to \(1.0196\) (larger than the median \(1.0143\)).

Table 3 summarizes the results. It indicates that, from the viewpoint of information criteria, AREM with \(K=2\) provides the best fit to the data. The two maximum likelihood estimators then coincide, with \(\hat{I}_{0}=\{3\},\hat{I}_{1}=\{2\},\hat{I}_{2}=\{1,4\}.\) The value \(K=3\) gives only small gains in the likelihood.

Table 3 Likelihoods of different AREM models and information criterion numbers

7 SIMULATION RESULTS FOR INFORMATION QUANTITIES

The quantities (25) provide statistics for Akaike’s information criterion (AIC) or for the Bayesian information criterion (BIC) based on the classical or restricted likelihood (Claeskens and Hjort, 2008). Thus one can get the information criteria numbers for these models, e.g.,

$$AIC_{\textrm{AREM.CL}}={\mathcal{L}}(I_{0},\ldots,I_{K})+2(K+1)$$

and

$$BIC_{\textrm{AREM.CL}}={\mathcal{L}}(I_{0},\ldots,I_{K})+(K+1)\log N.$$
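The four criterion numbers can be assembled in a few lines. In this sketch `L_cl` and `L_rl` stand for the classical and restricted likelihood values \({\mathcal{L}}\) and \(\tilde{{\mathcal{L}}}\) of the fitted partition (the function name is ours):

```python
import math

def arem_info_criteria(L_cl, L_rl, K, N):
    """AIC/BIC numbers for an AREM fit with K heterogeneity clusters and N studies.
    The restricted-likelihood penalty uses K in place of K + 1."""
    return {
        "AIC.CL": L_cl + 2 * (K + 1),
        "BIC.CL": L_cl + (K + 1) * math.log(N),
        "AIC.RL": L_rl + 2 * K,
        "BIC.RL": L_rl + K * math.log(N),
    }
```

The model with the smallest criterion number is preferred, which is how Table 3 is read in Section 6.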

In the corresponding formulas for \(AIC_{\textrm{AREM.RL}}\) and \(BIC_{\textrm{AREM.RL}}\), based on \(\tilde{{\mathcal{L}}}(I_{0},\ldots,I_{K})\), one replaces \(K+1\) in the second term by \(K\). These numbers are employed in the two practical examples of Section 6. Here we compare them numerically for AREM with \(K=1\) and with \(K=N-n_{0}\).

To compare the properties of the considered likelihood procedures we performed Monte Carlo simulations involving AREM when \(I_{0}^{c}=\{1,\ldots,M\}\) with \(M=1\) or \(2\), \(N=5\), \(K=1\). In these models \(x_{i}\sim N(0,\tau^{2}+s_{i}^{2}),i=1,\ldots,M\) and \(x_{j}\sim N(0,s_{j}^{2})\) for \(j\geq M+1\). The variances \(s_{k}^{2}\) were obtained from realizations of standard exponential random variables for \(50\,000\) Monte Carlo runs.
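One replication of this simulation design can be generated as follows; the fixed seed and the reduced number of runs are choices made here for illustration, not the paper's settings:

```python
import numpy as np

def simulate_arem(N=5, M=1, tau2=1.0, rng=None):
    """One AREM dataset from the simulation design: variances s_i^2 drawn from
    a standard exponential; the first M studies carry heterogeneity tau^2."""
    rng = rng if rng is not None else np.random.default_rng()
    s2 = rng.standard_exponential(N)                    # s_i^2 ~ Exp(1)
    total = s2 + np.where(np.arange(N) < M, tau2, 0.0)  # tau^2 + s_i^2 for i <= M
    x = rng.normal(0.0, np.sqrt(total))                 # x_i ~ N(0, total_i)
    return x, s2

rng = np.random.default_rng(1)                          # fixed seed (our choice)
draws = [simulate_arem(N=5, M=2, tau2=3.0, rng=rng) for _ in range(1000)]
```

Each draw would then be fed to the model-selection procedure, and the fraction of runs in which the criterion picks the true cluster \(I_{0}^{c}=\{1,\ldots,M\}\) is recorded.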

Figure 1 displays the percentage of correct identification by the information criteria \(AIC_{\textrm{AREM.CL}}\) and \(AIC_{\textrm{AREM.RL}}\). The Bayesian information criteria \(BIC_{\textrm{AREM.CL}}\) and \(BIC_{\textrm{AREM.RL}}\) are not reported as their values differ only by a constant.

Fig. 1 The percentage of correct model identification as a function of \(\tau^{2}\) for AIC based on classical likelihood (\(M=1\), solid line; \(M=2\), line marked by \(+\)) and on restricted likelihood (\(M=1\), line marked by \(\ast\); \(M=2\), line marked by circles).

This quantity ranges from \(0.12\) (\(\tau^{2}=0\)) to \(0.31\) (\(\tau^{2}=3\)) for AREM.CL when \(M=1\), which is reasonable in view of the total number of models (\(2^{N}-1=31\)). When \(M=2\) it grows from \(0.02\) to \(0.17\). The probabilities of the correct choice behave similarly for \(K=N-n_{0}\) but are somewhat smaller: e.g., when \(M=1\) they increase from \(0.09\) to \(0.21\), and from \(0.02\) to \(0.16\) when \(M=2\).

Neither the classical nor the restricted likelihood procedure performs well in the case of the traditional REM (\(M=N\), \(K=1\)): both favor AREM with smaller \(M\).

8 DISCUSSION

This work proposes a class of models to handle discrepant heterogeneous data in research synthesis. The traditional random effects model is extended to allow for different heterogeneity variances. Algorithms for determining these variances, along with convergence conditions, are presented. The procedures are based on the Gaussian likelihood, although this distribution can be replaced by a non-normal location/scale parameter family. However, the underlying density typically cannot be estimated well because the data are usually too scarce.

The suggested approach assigns additional error to some outlying studies: their summary results are given larger uncertainties but still enter the final answer. Typically the uncertainty enlargements apply only to a few cases. Straightforward iterative numerical algorithms for evaluating the maximum likelihood estimators can be employed for a small or moderate number of studies. If there are many studies, the likelihood calculations become impractical, but the present methodology can still be used provided that the sizes \(n_{k}\) of the heterogeneity clusters \(I_{k},k=1,\ldots,K\), are small.
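For the single-cluster case (\(K=1\), \(n_{0}=0\), i.e., the ordinary REM), one such iteration alternates the weighted mean for \(\mu\) with the standard ML fixed-point update for \(\tau^{2}\); this is a generic sketch of that well-known scheme, not the authors' exact algorithm:

```python
import numpy as np

def rem_ml(x, s2, tol=1e-10, max_iter=500):
    """Iterative ML for the one-cluster random-effects model: with weights
    w_i = 1/(tau^2 + s_i^2), alternate mu = sum(w x)/sum(w) with the score-based
    update tau^2 = sum(w^2 ((x - mu)^2 - s^2)) / sum(w^2), truncated at zero."""
    x, s2 = np.asarray(x, float), np.asarray(s2, float)
    tau2 = 0.0
    for _ in range(max_iter):
        w = 1.0 / (tau2 + s2)
        mu = (w * x).sum() / w.sum()
        new = max(0.0, (w**2 * ((x - mu)**2 - s2)).sum() / (w**2).sum())
        if abs(new - tau2) < tol:
            tau2 = new
            break
        tau2 = new
    return mu, tau2
```

With equal reported variances the fixed point reduces to the mean squared deviation of the \(x\)'s about their mean minus the common \(s^{2}\), truncated at zero.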

We do not recommend more than two clusters unless there are good practical reasons to believe in so many categories; then the \(K=N-n_{0}\) model can be implemented. The largest consistent subset obtained for \(K=1\) or \(2\) as a rule shrinks as \(K\) increases, while the chance of correct model identification diminishes.