
6.1 Introduction

Information geometry provides geometric insights and methods for studying the statistical efficiency of estimators, testing, prediction and model selection. The field of algebraic statistics has proceeded somewhat separately, but recently efforts have been made to bring the two subjects together, notably [15]. This paper should be seen as part of that effort.

A straightforward way of linking the two areas is to ask how far algebraic methods can be used when the statistical manifolds of information geometry are algebraic, that is algebraic varieties or derived forms, such as rational quotients. We call such models “algebraic statistical models” and will give formal definitions.

In the standard theory for non-singular statistical models, maximum likelihood estimators (MLEs) have first-order asymptotic efficiency and bias-corrected MLEs have second-order asymptotic efficiency. A short section briefly covers the basic theory of asymptotic efficiency via differential geometry, which is necessary for our development.

We shall show that for some important algebraic models, the estimating equations of MLE type become polynomial and the degrees usually become very high if the model has a high-dimensional parameter space. In this paper, asymptotically efficient algebraic estimators, a generalization of the bias-corrected MLE, are studied. By algebraic estimators we mean estimators which are the solution of algebraic equations. A main result is that for an (algebraic) curved exponential family, there are second-order efficient estimators whose polynomial degree is at most two. These are computed by decreasing the degree of the estimating equations using Gröbner basis methods, the main tool of algebraic statistics. We supply some of the basic Gröbner basis theory in Appendix A; see [22].

The reduction of the degree saves computational costs dramatically when we use computational methods for solving the algebraic estimating equations. Here we use homotopy continuation methods of [19, 24] to demonstrate this effect for a few simple examples, for which we are able to carry out the Gröbner basis reduction. Appendix B discusses homotopy continuation methods.

Although, as mentioned, the links between computational algebraic methods and the theory of efficient estimators based on differential geometry are recent, two other areas of statistics, not covered here, exploit differential-geometric methods. The first is tube theory. The seminal paper by Weyl [26] has been used to give exact confidence level values (sizes of tests), and bounds, for certain Gaussian simultaneous inference problems: [17, 20]. This is closely related to the theory of up-crossings of Gaussian processes using expected Euler characteristic methods, see [1] and earlier papers. The second area is the use of the resolution of singularities (incidentally related to tube theory), in which confidence levels are related to the dimension and the solid angle of tangent cones with apex at a singularity in parameter space [12, 25]. Moreover, the degree of the estimating equations for the MLE has been studied for some specific algebraic models, which are not necessarily singular [11]. In this paper we cover the non-singular case, for rather more general estimators than the MLE, and show that algebraic methods have a part to play.

Most of the theory in this paper can be applied to a wider class of multivariate Gaussian models with restrictions on their covariance matrices, for example the models studied in [6, 14]. Although the second-order efficient estimators proposed here can in principle be applied to such models, the cost of computing Gröbner bases prevents their direct application. Further innovation in algebraic computation is required for real applications, a situation shared with several other areas of algebraic statistics.

Section 6.2 gives some basic background on estimation and the differential geometry needed for it. Sections 6.3 and 6.4, which are the heart of the paper, give the algebraic developments, and Sect. 6.5 gives some examples. Section 6.6 carries out some computation using homotopy continuation.

6.2 Statistical Manifolds and Efficiency of Estimators

In this section, we introduce the standard setting of statistical estimation theory, via information geometry. See [2, 4] for details. It is recognized that the ideas go back to at least the work of Rao [23], Efron [13] and Dawid [10]. The subject of information geometry was initiated by Amari and his collaborators [3, 5].

Central to this family of ideas is that the rates of convergence of statistical estimators and other test statistics depend on the metric and curvature of the parametric manifolds in a neighborhood of the MLE or the null hypothesis. In addition, Amari realized the importance of two special models, the affine exponential model and the affine mixture model, giving the \(e\)- and \(m\)-frames respectively. In this paper we concentrate on the exponential family model but also look at curved subfamilies. By extending the dimension of the parameter space of the exponential family, we are able to cover some classes of mixture models. The extension of the exponential model to infinite dimensions is covered by [21].

6.2.1 Exponential Family and Estimators

A full exponential family is a set of probability distributions \(\{\mathrm {d}P(x|\theta ) \mid \theta \in \varTheta \}\) with a parameter space \(\varTheta \subset \mathbb {R}^d\) such that

$$ \mathrm {d}P(x|\theta )=\exp (x_i \theta ^i-\psi (\theta )) \mathrm {d}\nu , $$

where \(x \in \mathbb {R}^d\) is a variable representing a sufficient statistic and \(\nu \) is a carrier measure on \(\mathbb {R}^d\). Here \(x_i \theta ^i\) means \(\sum _{i} x_i \theta ^i\) (Einstein summation notation).

We call \(\theta \) a natural parameter and \(\eta =\eta (\theta ):=E[x|\theta ]\) an expectation parameter. Denote \(E= E(\varTheta ):=\{\eta (\theta ) \mid \theta \in \varTheta \}\subset \mathbb {R}^d\) as the corresponding expectation parameter space. Note that the relation \(\eta (\theta )=\nabla _\theta \psi (\theta )\) holds. If the parameter space is restricted to a subset \(\mathcal {V}_\varTheta \subset \varTheta \), we obtain a curved exponential family

$$ \{\mathrm {d}P(x|\theta ) \mid \theta \in \mathcal {V}_\varTheta \}. $$

The corresponding space of the expectation parameter is denoted by \(\mathcal {V}_E:=\{\eta (\theta ) \mid \theta \in \mathcal {V}_\varTheta \}\subset E\).

Fig. 6.1 A projection to the model manifold according to a local coordinate defines an estimator

Figure 6.1 explains how a local coordinate defines an estimator. Let \((u,v)\in \mathbb {R}^p \times \mathbb {R}^{d-p}\), where \(p\) is the dimension of \(\mathcal {V}_\varTheta \), be a local coordinate system around the true parameter \(\theta ^*\), and define \(\mathcal {U} \subset \mathbb {R}^p\) such that \(\{\theta (u,0)\mid u\in \mathcal {U}\}=\mathcal {V}_\varTheta \). For a full exponential model with \(N\) samples, by composing the map \((X^{(1)},\ldots , X^{(N)}) \mapsto \theta (\eta )|_{\eta =\bar{X}}\) with the coordinate projection map \(\theta (u,v)\mapsto u\), we can define a (local) estimator \((X^{(1)},\ldots , X^{(N)})\mapsto u\). We define an estimator via \(\eta (u,v)\) similarly. Since \(\bar{X}\) is a sufficient statistic for \(\theta \) (and \(\eta \)) in the full exponential family, every estimator can be computed from \(\bar{X}\) rather than the original data \(\{X_i\}\). Therefore, in the rest of the paper, we write \(X\) as shorthand for \(\bar{X}\).

6.2.2 Differential Geometrical Entities

Let \(w:=(u,v)\) and use indices \(\{i,j,\ldots \}\) for \(\theta \) and \(\eta \), \(\{a,b,\ldots \}\) for \(u\), \(\{\kappa ,\lambda ,\ldots \}\) for \(v\) and \(\{\alpha ,\beta ,\ldots \}\) for \(w\). The following entities are used to express the conditions for asymptotic efficiency of estimators; Einstein summation notation is used.

Differential geometrical entities

  • \(\eta _i(\theta ) = \displaystyle \frac{\partial }{ \partial \theta ^i}\psi (\theta )\),

  • Fisher metric \(\underline{G}=(g_{ij})\) w.r.t. \(\theta \): \(g_{ij}(\theta )=\displaystyle \frac{\partial ^2 \psi (\theta )}{\partial \theta ^i \partial \theta ^j}\),

  • Fisher metric \(\bar{G}=(g^{ij})\) w.r.t. \(\eta \): \(\bar{G}=\underline{G}^{-1}\),

  • Jacobian: \(B_{i\alpha }(\theta ):=\displaystyle \frac{\partial \eta _i(w)}{\partial w^\alpha }\),

  • e-connection: \(\varGamma ^{(e)}_{\alpha \beta ,\gamma }=\left( \displaystyle \frac{\partial ^2}{\partial w^\alpha \partial w^\beta } \theta ^i(w)\right) \left( \displaystyle \frac{\partial }{\partial w^\gamma } \eta _i(w)\right) \),

  • m-connection: \(\varGamma ^{(m)}_{\alpha \beta ,\gamma }=\left( \displaystyle \frac{\partial ^2}{\partial w^\alpha \partial w^\beta } \eta _i(w)\right) \left( \displaystyle \frac{\partial }{\partial w^\gamma } \theta ^i(w)\right) \).
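The entities above can be generated mechanically by symbolic differentiation. The following is a minimal sympy sketch (a toy model, not from the paper) using the potential \(\psi (\theta )=\log (1+e^{\theta ^1}+e^{\theta ^2})\) of a three-cell multinomial and a hypothetical curved subfamily \(\theta (u)=(u,2u)\); it computes \(\eta (\theta )\), the Fisher metrics \(\underline{G}\) and \(\bar{G}\), the Jacobian and the \(m\)-connection component \(\varGamma ^{(m)}_{uu,u}\).

```python
import sympy as sp

th1, th2, u = sp.symbols('theta1 theta2 u', real=True)
psi = sp.log(1 + sp.exp(th1) + sp.exp(th2))              # toy potential function

eta = sp.Matrix([sp.diff(psi, t) for t in (th1, th2)])   # eta_i(theta) = d psi / d theta^i
G = sp.hessian(psi, (th1, th2))                          # Fisher metric g_ij w.r.t. theta
Gbar = sp.simplify(G.inv())                              # Fisher metric g^{ij} w.r.t. eta

theta_u = sp.Matrix([u, 2*u])                            # hypothetical curved subfamily theta(u)
eta_u = eta.subs({th1: theta_u[0], th2: theta_u[1]})     # eta(u, 0)
B = eta_u.diff(u)                                        # Jacobian B_{i a} = d eta_i / d u^a

# m-connection component Gamma^{(m)}_{uu,u} = (d^2 eta_i / du^2) (d theta^i / du)
Gamma_m = sp.simplify((eta_u.diff(u, 2).T * theta_u.diff(u))[0, 0])
print(Gamma_m)
```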

6.2.3 Asymptotic Statistical Inference Theory

Under some regularity conditions on the carrier measure \(\nu \), the potential function \(\psi \) and the manifolds \(\mathcal {V}_\varTheta \) or \(\mathcal {V}_E\), the asymptotic theory below holds. These conditions guarantee the finiteness of the moments and that expectation and partial differentiation commute, \(\frac{\partial }{\partial \theta } E_\theta [f]= E_\theta [\frac{\partial f}{\partial \theta }]\). For more details of the required regularity conditions, see Sect. 2.1 of [4].

  1.

    If \(\hat{u}\) is a consistent estimator (i.e. \(P(\Vert \hat{u}-u\Vert >\epsilon )\rightarrow 0\) as \(N\rightarrow \infty \) for any \(\epsilon >0\)), the squared error matrix of \(\hat{u}\) is

    $$\begin{aligned} E_u[(\hat{u}-u)(\hat{u}-u)^\top ]=\,&E_u[(\hat{u}^a-u^a)(\hat{u}^b-u^b)]\\ =\,&N^{-1}[g_{ab}-g_{a\kappa }g^{\kappa \lambda }g_{b\lambda }]^{-1}+O(N^{-2}). \end{aligned}$$

    Here \([\cdot ]^{-1}\) means the matrix inverse. Thus, if \(g_{a\kappa }=0\) for all \(a\) and \(\kappa \), the leading term on the r.h.s. is minimized. We call such an estimator a first-order efficient estimator.

  2.

    The bias term becomes

    $$ E_u[\hat{u}^a-u^a]=(2N)^{-1}b^a(u)+ O(N^{-2}) $$

    for each \(a\) where \(b^a(u):=\varGamma ^{(m)}{}_{cd}^a(u) g^{cd}(u)\). Then, the bias corrected estimator \(\check{u}^a:=\hat{u}^a-b^a(\hat{u})\) satisfies \(E_u[\check{u}^a-u^a]=O(N^{-2})\).

  3.

    Assume \(g_{a\kappa }=0\) for all \(a\) and \(\kappa \); then the squared error matrix is represented by

    $$\begin{aligned} E_u[(\check{u}^a-u^a)(\check{u}^b-u^b)]=\frac{1}{N}g^{ab}+\frac{1}{2N^2} \left( \mathop {{\varGamma }^{2ab}_M}^{(m)}+2\mathop {{H}^{2ab}_M}^{(e)}+\mathop {{H}^{2ab}_A}^{(m)}\right) +o(N^{-2}). \end{aligned}$$

    See Theorem 5.3 of [4] and Theorem 4.4 of [2] for the definition of the terms in the r.h.s. Of the four dominating terms in the r.h.s., only

    $$\begin{aligned} \mathop {{H}^{2ab}_A}^{(m)}:=g^{\kappa \mu }g^{\lambda \nu } H^{(m)}{}_{\kappa \lambda }^a H^{(m)}{}_{\mu \nu }^b \end{aligned}$$

    depends on the selection of the estimator.

    Here \(H^{(m)}{}_{\kappa \lambda }^a\) is an embedding curvature and equal to \(\varGamma ^{(m)}{}_{\kappa \lambda }^a\) when \(g_{a\kappa }=0\) for every \(a\) and \(\kappa \). Since \(\mathop {{H}^{2ab}_A}^{(m)}\) is the square of \(\varGamma ^{(m)}{}_{\kappa \lambda }^a\), the square error matrix attains the minimum in the sense of positive definiteness if and only if

    $$\begin{aligned} \left. \varGamma ^{(m)}{}_{\kappa \lambda ,a}(w)\right| _{v=0}=\left. \left( \frac{\partial ^2}{\partial v^\kappa \partial v^\lambda } \eta _i(w)\right) \left( \frac{\partial }{\partial u^a} \theta ^i(w)\right) \right| _{v=0}=0. \end{aligned}$$
    (6.1)

    Therefore the elimination of the m-connection (6.1) implies second-order efficiency of the estimator after a bias correction, i.e. it becomes optimal among the bias-corrected first-order efficient estimators up to \(O(N^{-2})\).

6.3 Algebraic Models and Efficiency of Algebraic Estimators

This section studies asymptotic efficiency for statistical models and estimators which are defined algebraically. Many models in statistics are defined algebraically. Perhaps the most well known are polynomial regression models and algebraic conditions on probability models such as independence and conditional independence. Recently there has been considerable interest in marginal models [7], which are typically linear restrictions on raw probabilities. In time series, autoregressive models expressed by linear transfer functions induce algebraic restrictions on covariance matrices. Our desire is to have a definition of algebraic statistical model which can be expressed from within the curved exponential family framework but is sufficiently broad to cover cases such as those just mentioned. Our solution is to allow algebraic conditions in the natural parameter \(\theta \), the mean parameter \(\eta \), or both. The second way in which algebra enters is in the form of the estimator.

6.3.1 Algebraic Curved Exponential Family

We say a curved exponential family is algebraic if the following two conditions are satisfied.

(C1):

\(\mathcal {V}_\varTheta \) or \(\mathcal {V}_E\) is represented by a real algebraic variety, i.e. \(\mathcal {V}_\varTheta :=\mathcal {V}(f_1,\ldots ,f_k)=\{\theta \in \mathbb {R}^d | f_1(\theta )=\cdots =f_k(\theta )=0\}\) or similarly \(\mathcal {V}_E :=\mathcal {V}(g_1,\ldots ,g_k)\) for \(f_i \in \mathbb {R}[\theta ^1,\ldots ,\theta ^d]\) and \(g_i \in \mathbb {R}[\eta _1,\ldots ,\eta _d]\).

(C2):

\(\theta \mapsto \eta (\theta )\) or \(\eta \mapsto \theta (\eta )\) is represented by some algebraic equations, i.e. there are \(h_1,\ldots ,h_k \in \mathbb {R}[\theta ,\eta ]\) such that locally in \(\mathcal {V}_\varTheta \times \mathcal {V}_E\), \(h_i(\theta ,\eta )=0\) iff \(\eta (\theta )=\eta \) or \(\theta (\eta )=\theta \).

Here \(\mathbb {R}[\theta ^1,\ldots ,\theta ^d]\) denotes the ring of polynomials in \(\theta ^1,\ldots ,\theta ^d\) over the real field \(\mathbb {R}\), and \(\mathbb {R}[\theta ,\eta ]\) means \(\mathbb {R}[\theta ^1,\ldots ,\theta ^d,\eta _1,\ldots ,\eta _d]\). The integer \(k\), the number of generators, is not necessarily equal to \(d-p\), but we assume \(\mathcal {V}_\varTheta \) (or \(\mathcal {V}_E\)) has dimension \(p\) around the true parameter. Note that if \(\psi (\theta )\) is a rational form or the logarithm of a rational form, (C2) is satisfied.
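As an illustration of the last remark, the following small sympy sketch (with a toy potential, not from the paper) shows that when \(\psi (\theta )\) is the logarithm of a rational function, the map \(\theta \mapsto \eta (\theta )=\nabla _\theta \psi (\theta )\) is rational and can be encoded by polynomial relations \(h_i(\theta ,\eta )=0\), as required by (C2).

```python
import sympy as sp

t1, t2 = sp.symbols('theta1 theta2')
e1, e2 = sp.symbols('eta1 eta2')
psi = sp.log(t1**2 + t2**2 + 1)                      # hypothetical log-rational potential

eta_of_theta = [sp.diff(psi, t) for t in (t1, t2)]   # rational functions of theta
# clear denominators: h_i(theta, eta) = 0 holds iff eta_i = eta_i(theta) (locally)
h = [sp.numer(sp.together(e - expr)) for e, expr in zip((e1, e2), eta_of_theta)]
print(h)   # e.g. [eta1*(theta1**2 + theta2**2 + 1) - 2*theta1, eta2*(...) - 2*theta2]
```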

6.3.2 Algebraic Estimators

The parameter set \(\mathcal {V}_\varTheta \) (or \(\mathcal {V}_E\)) is sometimes singular for algebraic models. But throughout the following analysis, we assume non-singularity around the true parameter \(\theta ^*\in \mathcal {V}_\varTheta \) (or \(\eta ^*\in \mathcal {V}_E\) respectively).

Following the discussion at the end of Sect. 6.2.1, we call \(\theta (u,v)\) or \(\eta (u,v)\) an algebraic estimator if

(C3):

\(w \mapsto \eta (w)\) or \(w \mapsto \theta (w)\) is represented algebraically.

  We remark that the MLE for an algebraic curved exponential family is an algebraic estimator.

If conditions (C1), (C2) and (C3) hold, then all of the geometrical entities in Sect. 6.2.2 are characterized by special polynomial equations. Furthermore, if \(\psi (\theta )\in \mathbb {R}(\theta ) \cup \log \mathbb {R}(\theta )\) and \(\theta (w)\in \mathbb {R}(w) \cup \log \mathbb {R}(w)\), then the geometrical objects have the additional property of being rational.

6.3.3 Second-Order Efficient Algebraic Estimators, Vector Version

Consider an algebraic estimator \(\eta (u,v)\in \mathbb {R}[u,v]^d\) satisfying the following vector equation:

$$\begin{aligned} X=\eta (u,0)+\sum _{i=p+1}^{d} v_{i-p} e_i(u)+ c\cdot \sum _{j=1}^p f_j(u,v) e_j(u) \end{aligned}$$
(6.2)

where, for each \(u\), \(\{e_j(u); j=1,\ldots ,p\}\cup \{e_i(u); i=p+1,\ldots ,d\}\) is a complete basis of \(\mathbb {R}^d\) such that \(\langle e_i(u),(\bigtriangledown _u \eta )\rangle _g=0\) for \(i=p+1,\ldots ,d\), and \(f_j(u,v)\in \mathbb {R}[u][v]_{\ge 3}\), namely a polynomial whose degree in \(v\) is at least 3 with coefficients polynomial in \(u\), for \(j=1,\ldots ,p\). Recall that we write \(X=\bar{X}=\frac{1}{N}\sum _i X_i\). The constant \(c\) controls the perturbation (see below).

A straightforward computation of the \(m\)-connection in (6.1) at \(v=0\) for

$$ \eta (w)=\eta (u,0)+\sum _{i=p+1}^{d} v_{i-p} e_i(u)+ c\cdot \sum _{j=1}^p f_j(u,v) e_j(u) $$

shows it to be zero. This gives

Theorem 1

Vector equation (6.2) satisfies the second-order efficiency (6.1).

We call (6.2) a vector version of a second-order efficient estimator. Note that if \(c=0\), (6.2) gives an estimating equation for the MLE. Thus the last term in (6.2) can be recognized as a perturbation from the MLE.

Figure 6.2 is a rough sketch of the second-order efficient estimators. Here the model is embedded in an \(m\)-affine space. Given a sample (red point), the MLE is an orthogonal projection (yellow point) to the model with respect to the Fisher metric. But a second-order efficient estimator maps the sample to the model along a “cubically” curved manifold (red curve).

Fig. 6.2 Image of the vector version of the second-order efficient estimators

6.3.4 Second-Order Efficient Algebraic Estimators, Algebraic Version

A second class of second-order efficient algebraic estimators, which we call the algebraic version, is defined by the following simultaneous polynomial equations with \(\eta _u=\eta (u,0)\).

$$\begin{aligned} (X-\eta _u)^\top \tilde{e}_j(u,\eta _u)&+c\cdot h_j(X,u,\eta _u,X-\eta _u)=0 \text{ for } j=1,\ldots ,p \end{aligned}$$
(6.3)

where \(\{\tilde{e}_j(u,\eta _u)\in \mathbb {R}[u,\eta _u]^d; j=1,\ldots ,p\} \) span \(((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}\) for every \(u\) and \(h_j(X,u,\eta _u,t) \in \mathbb {R}[X,u,\eta _u][t]_3\) (\(\text{ degree }=3\) in \(t\)) for \(j=1,\ldots ,p\). The constant \(c\) controls the perturbation. The notation \(\bar{G}\) represents the Fisher metric on the full exponential family with respect to \(\eta \). The notation \((\nabla _u \eta (u,0))^{\perp _{\bar{G}}}\) means the subspace orthogonal to \(\mathrm{span}(\partial _a \eta (u,0))_{a=1}^p\) with respect to \(\bar{G}\), and \((\cdot )^{\perp _E}\) means the orthogonal complement in the sense of the Euclidean vector space. Here, the “degree” of a polynomial means the maximum degree of its terms. Note that the case \((X-\eta _u)^\top \tilde{e}_j(u,\eta _u)=0\) for \(j=1,\ldots ,p\) gives a special set of the estimating equations of the MLE.
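The vectors \(\tilde{e}_j\) can be obtained by elementary linear algebra: as noted in the proof of Theorem 3 below, \(((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}\) is spanned by the columns of \(\bar{G}\,\nabla _u \eta (u,0)\). The following numpy sketch (with hypothetical numerical values for \(\bar{G}\) and the Jacobian) checks this at a single point.

```python
import numpy as np

Gbar = np.array([[2.0, 0.5, 0.0],
                 [0.5, 1.0, 0.2],
                 [0.0, 0.2, 1.5]])       # hypothetical Fisher metric w.r.t. eta (d = 3)
B = np.array([[1.0], [2.0], [0.0]])      # hypothetical Jacobian grad_u eta(u, 0), p = 1

e_tilde = Gbar @ B                       # spans ((grad_u eta)^{perp_Gbar})^{perp_E}

# any w with B^T Gbar w = 0 lies in (grad_u eta)^{perp_Gbar};
# it must be Euclidean-orthogonal to e_tilde
w = np.linalg.svd(B.T @ Gbar)[2][-1]
print(np.allclose(e_tilde.T @ w, 0.0))   # True
```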

Theorem 2

An estimator defined by a vector version (6.2) of the second-order efficient estimators is also represented by an algebraic version (6.3), where \(h_j(X,u,\eta _u,t)=\tilde{f}_j(u,(\tilde{e}_i^\top t)_{i=1}^p,(\tilde{e}_i^\top (X-\eta _u))_{i=1}^p)\) with a function \(\tilde{f}_j(u,v,\tilde{v})\in \mathbb {R}[u,\tilde{v}][v]_3\) such that \(\tilde{f}_j(u,v,v)=f_j(u,v)\).

Proof

Take the Euclidean inner product of both sides of (6.2) with each \(\tilde{e}_j\), which is a vector Euclidean-orthogonal to the subspace \(\mathrm{span}(\{e_i \mid i\ne j\})\), to obtain a system of polynomial equations. Eliminating the variables \(v\) from these polynomial equations yields an algebraic version. \(\square \)

Theorem 3

Every algebraic version (6.3) gives a second-order efficient estimator, i.e. one satisfying (6.1).

Proof

Writing \(X=\eta (u,v)\) in (6.3), we obtain

$$\begin{aligned} (\eta (u,v)-\eta (u,0))^\top \tilde{e}_j(u) + c \cdot h_j(\eta (u,v),u,\eta (u,0),\eta (u,v)-\eta (u,0))=0. \end{aligned}$$

Partially differentiating this twice with respect to \(v\), we obtain

$$\begin{aligned} \left. \left( \frac{\partial ^2 \eta (u,v)}{\partial v^\lambda \partial v^\kappa }\right) ^\top \tilde{e}_j(u)\right| _{v=0} = 0, \end{aligned}$$

since each term of \(h_j(\eta (u,v),u,\eta (u,0),\eta (u,v)-\eta (u,0))\) has degree at least 3 in its last argument \((\eta _i(u,v)-\eta _i(u,0))_{i=1}^{d}\) and \(\left. \eta (u,v)-\eta (u,0)\right| _{v=0}=0\). Since \(\mathrm{span}\{\tilde{e}_j(u); j=1,\ldots ,p\}=((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}=\mathrm{span}\{\bar{G} \partial _{u_a} \eta ;a=1,\ldots ,p\}\), we obtain

$$\begin{aligned} \left. \varGamma _{\kappa \lambda a}^{(m)} \right| _{v=0} = \left. \frac{\partial ^2 \eta _i}{\partial v^\lambda \partial v^\kappa } g^{ij} \frac{\partial \eta _j}{\partial u^a} \right| _{v=0} = 0. \end{aligned}$$

This implies the estimator is second-order efficient. \(\square \)

By Theorems 1, 2 and 3, the relationship between the three forms of the second-order efficient algebraic estimators is summarized as

$$ (6.1) \Leftarrow (6.2) \Rightarrow (6.3) \Rightarrow (6.1). $$

Furthermore, if we assume the estimator has a form \(\eta \in \mathbb {R}(u)[v]\), that is a polynomial in \(v\) with coefficients rational in \(u\), every first-order efficient estimator satisfying (6.1) can be written in a form (6.2) after resetting the coordinates \(v\) of the estimating manifold. In this sense, we can say \((6.1) \Rightarrow (6.2)\), and the following corollary holds.

Corollary 1

If \(\eta \in \mathbb {R}(u)[v]\), the forms (6.1), (6.2) and (6.3) are equivalent.

6.3.5 Properties of the Estimators

The following proposition is a straightforward extension of the local existence of the MLE, that is to say, existence for a sufficiently large sample size. The regularity conditions are essentially the same as for the MLE but with an additional condition referring to the control constant \(c\).

Proposition 1

(Existence and uniqueness of the estimate) Assume that the Fisher matrix is non-degenerate around \(\eta (u^*)\in \mathcal {V}_E\). Then the estimate given by (6.2) or (6.3) exists locally and uniquely for small \(c\), i.e. there is a neighborhood \(G(u^*)\subset \mathbb {R}^d\) of \(\eta (u^*)\) and \(\delta >0\) such that for every fixed \(X \in G(u^*)\) and \(-\delta <c<\delta \), a unique estimate exists.

Proof

Under the conditions of the proposition, the MLE always exists locally. Furthermore, because of the non-singular Fisher matrix, the MLE is locally bijective (by the implicit function theorem). Thus \((u_1,\ldots ,u_p)\mapsto (g_1(x-\eta _u),\ldots ,g_p(x-\eta _u))\) for \(g_j(x-\eta _u):=(X-\eta _u)^\top \tilde{e}_j(u,\eta _u)\) in (6.3) is locally bijective. Since \(\{g_i\}\) and \(\{h_i\}\) are continuous, we can select \(\delta >0\) such that (6.3) is locally bijective for every \(-\delta <c<\delta \). \(\square \)

6.3.6 Summary of Estimator Construction

We summarize how to define a second-order efficient algebraic estimator (vector version) and how to compute an algebraic version from it.

Input:  

  • a potential function \(\psi \) satisfying (C2),

  • polynomial equations of \(\eta ,u\) and \(v\) satisfying (C3),

  • \(m_1,\ldots ,m_{d-p}\in \mathbb {R}[\eta ]\) such that \(\mathcal {V}_E=V(m_1,\ldots ,m_{d-p})\) gives the model,

  • \(f_j \in \mathbb {R}[u][v]_{\ge 3}\) and \(c\in \mathbb {R}\) for a vector version

 

Step 1:

Compute \(\psi \) and \(\theta (\eta )\), \(G(\eta )\), (\(\varGamma ^{(m)}(\eta )\) for bias correction)

Step 2:

Compute \(f_{ai} \in \mathbb {R}[\eta ][\xi _{11},\ldots ,\xi _{pd}]_1\) s.t. \(f_{aj}(\xi _{11},\ldots ,\xi _{pd}):=\partial _{u^a} m_j\) for \( \xi _{b i} :=\partial _{u^b}\eta _i.\)

Step 3:

Find \(e_{p+1},\ldots ,e_{d}\in (\nabla _u \eta )^{\perp _{\bar{G}}}\) by eliminating \(\{\xi _{aj}\}\) from \(\langle e_i,\partial _{u^a} \eta \rangle _{\bar{G}}=e_{ik}(\eta ) g^{kj}(\eta ) \xi _{aj}=0\) and \(f_{aj}(\xi _{11},\ldots ,\xi _{pd})=0\).

Step 4:

Select \(e_{1},\ldots ,e_p \in \mathbb {R}[\eta ]\) s.t. \(e_1(\eta ),\ldots ,e_d(\eta )\) are linearly independent.

Step 5:

Eliminate \(v\) from

$$ X=\eta (u,0)+\sum \nolimits _{i=p+1}^{d} v_{i-p} e_i+ c\cdot \sum \nolimits _{j=1}^p f_j(u,v) e_j $$

and compute \((X-\eta )^\top \tilde{e}_j\) and \(h\in (\mathbb {R}[\eta ][X-\eta ]_3)^p\), given by Theorem 2.

  Output(Vector version):

$$ X=\eta (u,0)+\sum \nolimits _{i=p+1}^{d} v_{i-p} e_i(\eta )+ c\cdot \sum \nolimits _{j=1}^p f_j(u,v) e_j(\eta ). $$

Output(Algebraic version):

$$ (X-\eta )^\top \tilde{e}+c \cdot h(X-\eta )=0. $$
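The elimination in Step 5 can be carried out with a Gröbner basis in a lexicographic order in which \(v\) precedes the remaining variables. The following is a small sympy sketch under toy assumptions (d = 2, p = 1, \(\eta (u,0)=(u,u^2)\), hypothetical basis vectors, \(f_1(u,v)=v^3\) and \(c=1/10\)); the generators free of \(v\) give an algebraic version of the estimating equations.

```python
import sympy as sp

u, v, X1, X2 = sp.symbols('u v X1 X2')
c = sp.Rational(1, 10)                   # perturbation constant (fixed for the sketch)
eta0 = sp.Matrix([u, u**2])              # toy eta(u, 0)
e2 = sp.Matrix([1, 1])                   # hypothetical vector multiplying v
e1 = sp.Matrix([1, -1])                  # hypothetical vector multiplying c*f_1(u, v)

# vector version: X - eta(u,0) - v*e2 - c*v**3*e1 = 0
eqs = list(sp.Matrix([X1, X2]) - eta0 - v*e2 - c*v**3*e1)

# lex Groebner basis with v first; generators free of v form an algebraic version
G = sp.groebner(eqs, v, u, X1, X2, order='lex')
print([g for g in G.exprs if not g.has(v)])
```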

6.3.7 Reduction of the Degree of the Estimating Equations

As we noted in Sect. 6.3.4, if we set \(h_j=0\) for all \(j\), the estimator becomes the MLE. In this sense, \(c\, h_j\) can be recognized as a perturbation of the likelihood equations. If we select each \(h_j(X,u,\eta _u,t) \in \mathbb {R}[X,u,\eta _u][t]_3\) judiciously, we can reduce the degree of the polynomial estimating equations. For the algebraic background, we refer the reader to Appendix A.

Here, we assume \(u\in \mathbb {R}[\eta _u]\). For example, we can set \(u_i=\eta _i\). Then \(\tilde{e}_j(u,\eta _u)\) is a function of \(\eta _u\), so we write it as \(\tilde{e}_j(\eta )\). Define an ideal \(\mathcal {I}_3\) of \(\mathbb {R}[X,\eta ]\) as

$$ \mathcal {I}_3:=\langle \{(X_i-\eta _i)(X_j-\eta _j)(X_k-\eta _k)\mid 1\le i,j,k \le d \}\rangle . $$

Select a monomial order \(\prec \) such that \(\eta _1\succ \cdots \succ \eta _d \succ X_1\succ \cdots \succ X_d\). Let \(G_{\prec }=\{g_1,\ldots ,g_m\}\) be a Gröbner basis of \(\mathcal {I}_3\) with respect to \(\prec \). Then the remainder (normal form) \(r_j\) of \((X-\eta )^\top \tilde{e}_j(\eta )\), the first term of the l.h.s. of (6.3), with respect to \(G_{\prec }\) is uniquely determined for each \(j\).

Theorem 4

If the monomial order \(\prec \) is the pure lexicographic,

  1.

    \(r_j\) for \(j=1,\ldots , p\) has degree at most 2 with respect to \(\eta \), and

  2.

    \(r_j=0\) for \(j=1,\ldots , p\) are the estimating equations for a second-order efficient estimator.

Proof

Assume \(r_j\) has a monomial term whose degree is more than 2 with respect to \(\eta \), and represent the term as \(\eta _a \eta _b \eta _c\, q(\eta ,X)\) with a polynomial \(q\in \mathbb {R}[\eta ,X]\) and a combination of indices \(a,b,c\). Then \(\{\eta _a \eta _b \eta _c + (X_a-\eta _a)(X_b-\eta _b)(X_c-\eta _c)\}q(\eta ,X)\) has a smaller leading monomial with respect to \(\prec \) than \(\eta _a \eta _b \eta _c\, q(\eta ,X)\), since \(\prec \) is pure lexicographic satisfying \(\eta _1\succ \cdots \succ \eta _d \succ X_1\succ \cdots \succ X_d\). Therefore, by adding \((X_a-\eta _a)(X_b-\eta _b)(X_c-\eta _c)\,q(\eta ,X)\in \mathcal {I}_3\) to \(r_j\), the leading monomial decreases. This contradicts the fact that \(r_j\) is the normal form, so each \(r_j\) has degree at most 2.

Furthermore each polynomial in \(\mathcal {I}_3\) is in \(\mathbb {R}[X,u,\eta _u][X-\eta ]_3\) and therefore by taking the normal form, the condition for the algebraic version (6.3) of second-order efficiency still holds. \(\square \)

The reduction of the degree is important when we use algebraic algorithms such as homotopy continuation methods [18] to solve simultaneous polynomial equations, since the computational cost depends strongly on the degrees of the polynomials.
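The following sympy sketch (d = 2 and a hypothetical cubic term, not from the paper) illustrates Theorem 4: reducing a polynomial modulo a lexicographic Gröbner basis of \(\mathcal {I}_3\) with \(\eta _1\succ \eta _2\succ X_1\succ X_2\) leaves a normal form of degree at most 2 in \(\eta \).

```python
import sympy as sp
from itertools import combinations_with_replacement

e1, e2, X1, X2 = sp.symbols('eta1 eta2 X1 X2')
eta, X = [e1, e2], [X1, X2]

# generators of I_3 = <(X_i - eta_i)(X_j - eta_j)(X_k - eta_k)>, d = 2
gens_I3 = [sp.expand((X[i]-eta[i])*(X[j]-eta[j])*(X[k]-eta[k]))
           for i, j, k in combinations_with_replacement(range(2), 3)]

# pure lex order with eta1 > eta2 > X1 > X2 (fixed by the generator order)
G = sp.groebner(gens_I3, e1, e2, X1, X2, order='lex')

f = e1**2 * e2                  # hypothetical degree-3 term of an estimating equation
_, r = G.reduce(f)              # normal form of f modulo G
print(sp.expand(r))             # every monomial has degree <= 2 in eta1, eta2
```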

6.4 First-Order Efficiency

It is not surprising that, for first-order efficiency, almost the same arguments hold as for second-order efficiency.

By Theorem 5.2 of [4], a consistent estimator is first-order efficient if and only if

$$\begin{aligned} g_{\kappa a}=0. \end{aligned}$$
(6.4)

Consider an algebraic estimator \(\eta (u,v)\in \mathbb {R}[u,v]^d\) satisfying the following vector equation:

$$\begin{aligned} X=\eta (u,0)+\sum _{i=p+1}^{d} v_{i-p} e_i(u)+ c\cdot \sum _{j=1}^p f_j(u,v) e_j(u) \end{aligned}$$
(6.5)

where, for each \(u\), \(\{e_j(u); j=1,\ldots ,p\}\cup \{e_i(u); i=p+1,\ldots ,d\}\) is a complete basis of \(\mathbb {R}^d\) s.t. \(\langle e_i(u),(\bigtriangledown _u \eta )\rangle _g=0\) for \(i=p+1,\ldots ,d\), and \(f_j(u,v)\in \mathbb {R}[u][v]_{\ge 2}\), a polynomial whose degree in \(v\) is at least 2, for \(j=1,\ldots ,p\). Similarly, \(c\in \mathbb {R}\) is a perturbation constant. Here, the only difference between (6.2) for second-order efficiency and (6.5) for first-order efficiency is the degree of the \(f_j(u,v)\) with respect to \(v\).

The algebraic version of the first-order efficient algebraic estimator is defined by the following simultaneous polynomial equations with \(\eta _u=\eta (u,0)\).

$$\begin{aligned} (X-\eta _u)^\top \tilde{e}_j(u,\eta _u)&+c\cdot h_j(X,u,\eta _u,X-\eta _u)=0 \text{ for } j=1,\ldots ,p \end{aligned}$$
(6.6)

where \(\{\tilde{e}_j(u,\eta _u)\in \mathbb {R}[u,\eta _u]^d; j=1,\ldots ,p\} \) span \(((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}\) for every \(u\) and \(h_j(X,u,\eta _u,t) \in \mathbb {R}[X,u,\eta _u][t]_2\) (\(\text{ degree }=2\) w.r.t. \(t\)) for \(j=1,\ldots ,p\). Here, the only difference between (6.3) for the second-order efficiency and (6.6) for the first-order efficiency is the degree of the \(h_j(X,u,\eta _u,t)\) with respect to \(t\).

Then the relation between the three different forms of first-order efficiency can be proved in the same manner as for Theorems 1, 2 and 3.

Theorem 5

(i) Vector version (6.5) satisfies the first-order efficiency.

(ii) An estimator defined by a vector version (6.5) of the first-order efficient estimators is also represented by an algebraic version (6.6).

(iii) Every algebraic version (6.6) gives a first-order efficient estimator.

The relationship between the three forms of the first-order efficient algebraic estimators is summarized as \((6.4) \Leftarrow (6.5) \Rightarrow (6.6) \Rightarrow (6.4)\). Furthermore, if we assume the estimator has a form \(\eta \in \mathbb {R}(u)[v]\), the forms (6.4), (6.5) and (6.6) are equivalent.

Let \(\mathcal {R}:=\mathbb {Z}[X,\eta ]\) and define

$$\begin{aligned} \mathcal {I}_2:=\langle \{(X_i-\eta _i)(X_j-\eta _j)\mid 1\le i,j \le d \}\rangle \end{aligned}$$

as an ideal of \(\mathcal {R}\). In a similar manner, let \(\prec \) be a monomial order such that \(\eta _1\succ \cdots \succ \eta _d \succ X_1\succ \cdots \succ X_d\). Let \(G_{\prec }=\{g_1,\ldots ,g_m\}\) be a Gröbner basis of \(\mathcal {I}_2\) with respect to \(\prec \). The properties of the normal form \(r_i\) of \((X-\eta (u,0))^\top \tilde{e}_i(u)\) with respect to \(G_{\prec }\) are then covered by the following:

Theorem 6

If the monomial order \(\prec \) is the pure lexicographic,

(i) \(r_i\) for \(i=1,\ldots , p\) has degree at most 1 with respect to \(\eta \), and

(ii) \(r_i=0\) for \(i=1,\ldots , p\) are the estimating equations for a first-order efficient estimator.
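A parallel sympy sketch (d = 2, with a hypothetical vector \(\tilde{e}=(\eta _2,\eta _1)\)) illustrates Theorem 6: the normal form of \((X-\eta )^\top \tilde{e}(\eta )\) modulo a lexicographic Gröbner basis of \(\mathcal {I}_2\) is linear in \(\eta \).

```python
import sympy as sp
from itertools import combinations_with_replacement

e1, e2, X1, X2 = sp.symbols('eta1 eta2 X1 X2')
eta, X = [e1, e2], [X1, X2]

gens_I2 = [sp.expand((X[i]-eta[i])*(X[j]-eta[j]))
           for i, j in combinations_with_replacement(range(2), 2)]
G = sp.groebner(gens_I2, e1, e2, X1, X2, order='lex')   # eta before X, pure lex

f = (X1 - e1)*e2 + (X2 - e2)*e1     # (X - eta)^T e~ with hypothetical e~ = (eta2, eta1)
_, r = G.reduce(sp.expand(f))
print(sp.expand(r))                  # the normal form is linear in eta1, eta2
```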

6.5 Examples

In this section, we show how to use algebraic computation to design asymptotically efficient estimators for two simple examples. The examples satisfy the algebraic conditions (C1), (C2) and (C3), so the necessary geometric entities are guaranteed to have an algebraic form, as mentioned in Sect. 6.3.2.

6.5.1 Example: Periodic Gaussian Model

The following periodic Gaussian model shows how to compute second-order efficient estimators and their biases.

  • Statistical Model:

    $$\begin{aligned} X\sim N(\mu , \varSigma (a))\, \text {with}\, \mu = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}\,\text {and}\, \varSigma (a)= \begin{bmatrix} 1&a&a^2&a \\ a&1&a&a^2 \\ a^2&a&1&a \\ a&a^2&a&1 \end{bmatrix}\, \text {for}\, 0\le a < 1. \end{aligned}$$

    Here, the dimension of the full exponential family and the curved exponential family are \(d=3\) and \(p=1\), respectively.

  • Curved exponential family:

    $$ \log f(x|\theta ) = 2 \left( x_1 x_2 + x_2 x_3 + x_3 x_4 + x_4 x_1\right) \theta _{{2}}+ 2\left( x_{{3}}x_{{1}}+x_{{4}}x_{{2}} \right) \theta _{{3}}-\psi (\theta ), $$
  • Potential function:

    $$\begin{aligned} \psi (\theta )=-\frac{1}{2}\,\log ( {\theta _{{1}}}^{4}-4\,{\theta _{{1}}}^{2}{\theta _{{2}}}^{2}+8\,\theta _{{1}}{\theta _{{2}}}^{2}\theta _{{3}}-2\,{\theta _{{1}}}^{2}{\theta _{{3}}}^{2}-4\,{\theta _{{2}}}^{2}{\theta _{{3}}}^{2}+{\theta _{{3}}}^{4} )+2\,\log ( 2\,\pi ), \end{aligned}$$
  • Natural parameter:

    $$ \theta (a)=\left[ \frac{1}{1-2a^2+4a^4},-\frac{a}{1-2a^2+4a^4}, \frac{a^2}{1-2a^2+4a^4}\right] ^\top , $$
  • Expectation parameter:   \(\eta (a)=[-2,-4a,-2a^2]^\top ,\)

  • Fisher metric with respect to \(\eta \):

    $$ (g^{ij})=\begin{bmatrix} 2\,{a}^{4}+4\,{a}^{2}+2 & 8\,a \left( 1+{a}^{2} \right) & 8\,{a}^{2}\\ 8\,a \left( 1+{a}^{2} \right) & 4+24\,{a}^{2}+4\,{a}^{4} & 8\,a \left( 1+{a}^{2} \right) \\ 8\,{a}^{2} & 8\,a \left( 1+{a}^{2} \right) & 2\,{a}^{4}+4\,{a}^{2}+2 \end{bmatrix}, $$
  • A set of vectors \(e_i\in \mathbb {R}^3\):

    $$ e_0(a):=[0,-1,a]^\top \in \partial _a \eta (a), $$
    $$ e_1(a):=[3a^2+1,4a,0]^\top ,~ e_2(a):=[-a^2-1,0,2]^\top \in (\partial _a \eta (a))^{\perp _{\bar{G}}}. $$
  • A vector version of the second-order efficient estimator is, for example,

    $$ x-\eta +v_1\cdot e_1 + v_2 \cdot e_2 +c \cdot v_1^3\cdot e_0=0. $$
  • A corresponding algebraic version of the second-order efficient estimator: by eliminating \(v_1\) and \(v_2\), we get \(g(a)+c\cdot h(a)=0\) where

    $$\begin{aligned} g(a):=8( a-1) ^{2} ( a+1) ^{2} ( 1+2\,{a}^{2}) ^{2} ( 4\,{a}^{5}-8\,{a}^{3}+2\,{a}^{3}{ x_3}-3\,{ x_2}\,{a}^{2}+4\,a+4\,a\,{ x_1}+2\,a\,{ x_3}-{ x_2}) \end{aligned}$$

    and

    $$ h(a):=( 2{a}^{4}+{a}^{3}{ x_2}-{a}^{2}{ x_3}+2{a}^{2}+a{ x_2 }-2{ x_1}-{ x_3}-4) ^{3}. $$
  • An estimating equation for the MLE (solved numerically in the sketch following this list):

    $$ 4{a}^{5}-8{a}^{3}+2{a}^{3}{ x_3}-3{ x_2}{a}^{2}+4a+4a{ x_1}+2a{ x_3}-{ x_2}=0. $$
  • Bias correction term for an estimator \({\hat{a}}\):   \(\hat{a} ( {\hat{a}}^{8}-4{\hat{a}}^{6}+6{\hat{a}}^{4}-4{\hat{a}}^{2}+1 ) / ( 1+2{\hat{a}}^{2} ) ^{2}.\)
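For concreteness, the degree-5 MLE estimating equation above can be solved numerically; the following numpy sketch uses hypothetical sufficient statistics chosen near \(\eta (0.5)=(-2,-2,-0.5)\) (the data are assumptions, not from the paper) and keeps the real roots in \([0,1)\).

```python
import numpy as np

x1, x2, x3 = -2.1, -2.0, -0.6              # hypothetical sufficient statistics
# 4 a^5 + 0 a^4 + (2 x3 - 8) a^3 - 3 x2 a^2 + (4 + 4 x1 + 2 x3) a - x2 = 0
coeffs = [4.0, 0.0, 2*x3 - 8, -3*x2, 4 + 4*x1 + 2*x3, -x2]
roots = np.roots(coeffs)
print([r.real for r in roots if abs(r.imag) < 1e-9 and 0 <= r.real < 1])
```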

6.5.2 Example: Log Marginal Model

Here, we consider a log marginal model. See [7] for more on marginal models.

  • Statistical model (Poisson regression):

    \(X_{ij} \displaystyle \mathop {\sim }^{ i.i.d} \mathrm{Po}(Np_{ij})\) s.t. \(p_{ij}\in (0,1)\) for \(i=1,2\) and \(j=1,2,3\) with model constraints:

    $$\begin{aligned} p_{11} + p_{12} + p_{13}&+p_{21} + p_{22} + p_{23} = 1,\nonumber \\ p_{11} + p_{12} + p_{13}&=p_{21} + p_{22} + p_{23},\nonumber \\ \displaystyle \frac{p_{11}/p_{21}}{p_{12}/p_{22}}&=\frac{p_{12}/p_{22}}{p_{13}/p_{23}}. \end{aligned}$$
    (6.7)

    Condition (6.7) can appear in a statistical test of whether the acceleration of the ratio \(p_{1j}/p_{2j}\) is constant; its polynomial form is verified symbolically in the sketch at the end of this subsection.

    In this case, \(d=6\) and \(p=3\).

  • Log density w.r.t. the point mass measure on \(\mathbb {Z}_{\ge 0}^6\):

    $$ \log f(x|p)=\log \left\{ \prod _{ij} e^{-Np_{ij}} (Np_{ij})^{X_{ij}}\right\} = -N + \sum _{ij} X_{ij} \log (Np_{ij}). $$
  • The full expectation family is given by

    $$ \begin{bmatrix} X_{1}&X_{2}&X_{3} \\ X_{4}&X_{5}&X_{6} \\ \end{bmatrix} := \begin{bmatrix} X_{11}&X_{12}&X_{13} \\ X_{21}&X_{22}&X_{23} \\ \end{bmatrix} ,$$
    $$ \begin{bmatrix} \eta _{1}&\eta _{2}&\eta _{3} \\ \eta _{4}&\eta _{5}&\eta _{6} \\ \end{bmatrix} = N \begin{bmatrix} p_{11}&p_{12}&p_{13} \\ p_{21}&p_{22}&p_{23} \\ \end{bmatrix} ,$$

    \(\theta ^i=\log (\eta _i)\) and \(\psi (\theta )=N\).

  • The Fisher metric w.r.t. \(\theta \):   \(g_{ij}=\frac{\partial ^2 \psi }{\partial \theta ^i \partial \theta ^j}=\delta _{ij}\eta _i\).

  • Selection of the model parameters:

    $$ [u_1,u_2,u_3]:=[\eta _1,\eta _3,\eta _5] \,\mathrm{{and}}\, [v_1,v_2,v_3]:=[\eta _2,\eta _4,\eta _6]. $$
  • A set of vectors \(e_i\in \mathbb {R}^6\):

    $$ e_0:= \begin{bmatrix} {\eta _{2}^2 (\eta _{4}-\eta _{6})}\\{-\eta _{2}^2 (\eta _{4}-\eta _{6})}\\{0}\\{-\eta _{3} \eta _{5}^2-2 \eta _{2} \eta _{4} \eta _{6}}\\{0}\\{\eta _{3} \eta _{5}^2+2 \eta _{2} \eta _{4} \eta _{6}} \end{bmatrix}\in (\nabla _u \eta ) , $$
    $$\begin{aligned}{}[e_1,e_2,e_3]:&= \left[ \begin{bmatrix} \eta _1\\ \eta _2\\ \eta _3\\ 0\\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} {\eta _{1} (-\eta _{1} \eta _{5}^2+\eta _{3} \eta _{5}^2)}\\{\eta _{2} (-\eta _{1} \eta _{5}^2-2 \eta _{2} \eta _{4} \eta _{6})}\\{0}\\{\eta _{4} (\eta _{2}^2 \eta _{4}-\eta _{2}^2 \eta _{6})}\\{\eta _{5} (\eta _{2}^2 \eta _{4}+2 \eta _{1} \eta _{3} \eta _{5})}\\{0} \end{bmatrix}, \begin{bmatrix} {\eta _{1} (\eta _{1} \eta _{5}^2-\eta _{3} \eta _{5}^2)}\\{\eta _{2} (\eta _{1} \eta _{5}^2+2 \eta _{2} \eta _{4} \eta _{6})}\\{0}\\{\eta _{4} (2 \eta _{1} \eta _{3} \eta _{5}+\eta _{2}^2 \eta _{6})}\\{0}\\{\eta _{6} (\eta _{2}^2 \eta _{4}+2 \eta _{1} \eta _{3} \eta _{5})} \end{bmatrix} \right] \\&\quad \in ((\nabla _u \eta )^{\perp _{\bar{G}}})^3 \end{aligned}$$
  • A vector version of the second-order efficient estimator is, for example,

    $$ X-\eta +v_1\cdot e_1 + v_2 \cdot e_2 + v_3 \cdot e_3 + c \cdot v_1^3\cdot e_0=0. $$
  • The bias correction term of the estimator = 0.

  • A set of estimating equations for MLE:

    {\(x_{{1}}{\eta _{{2}}}^{2}{\eta _{{4}}}^{2}\eta _{{6}}-x_{{1}}{\eta _{{2}}}^{2}\eta _{{4}}{ \eta _{{6}}}^{2}-x_{{2}}\eta _{{1}}\eta _{{2}}{\eta _{{4}}}^{2}\eta _{{6}}+x_{{2}}\eta _{{1}}\eta _{{2}}\eta _{{4}}{\eta _{{6}}}^{2}-2\,x_{{4}}\eta _{{1}}\eta _{{2}}\eta _{{4}}{\eta _{{6}}}^{2 }-x_{{4}}\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}\eta _{{6}}+2\,x_{{6}}\eta _{{1}}\eta _{{2}}{\eta _{{4}}}^{2}\eta _{{6}}+x_{{6}}\eta _{{1}}\eta _{{3}}\eta _{{4}}{\eta _{{5}}}^{2}\),

    \(-x_{{2}}\eta _{{2}}\eta _{{3}}{\eta _{{4}}}^{2}\eta _{{6}}+x_{{2}}\eta _{{2}}\eta _{{3}}\eta _{{4}}{\eta _{{6}} }^{2}+x_{{3}}{\eta _{{2}}}^{2}{\eta _{{4}}}^{2}\eta _{{6}}-x_{{3}}{\eta _{{2}}}^{2}\eta _{ {4}}{\eta _{{6}}}^{2}-x_{{4}}\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}\eta _{{6}}-2\,x_{{4}} \eta _{{2}}\eta _{{3}}\eta _{{4}}{\eta _{{6}}}^{2}+x_{{6}}\eta _{{1}}\eta _{{3}}\eta _{{4}}{\eta _{{5} }}^{2}+2\,x_{{6}}\eta _{{2}}\eta _{{3}}{\eta _{{4}}}^{2}\eta _{{6}}\),

    \(-2\,x_{{4}}\eta _{{1}} \eta _{{3}}{\eta _{{5}}}^{2}\eta _{{6}}-x_{{4}}{\eta _{{2}}}^{2}\eta _{{4}}\eta _{{5}}\eta _{{6}}+ x_{{5}}{\eta _{{2}}}^{2}{\eta _{{4}}}^{2}\eta _{{6}}-x_{{5}}{\eta _{{2}}}^{2}\eta _{{4}}{\eta _{{6}}}^{2}+2\,x_{{6}}\eta _{{1}}\eta _{{3}}\eta _{{4}}{\eta _{{5}}}^{2}+x_{{6}}{\eta _{{2 }}}^{2}\eta _{{4}}\eta _{{5}}\eta _{{6}}\),

    \(\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}-{\eta _{{2}}}^{2} \eta _{{4}}\eta _{{6}}\), \(\eta _{{1}}+\eta _{{2}}+\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}\), \(-\eta _{{1} }-\eta _{{2}}-\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}+1\)}

    The total degree of the equations is \(5\times 5\times 5 \times 4\times 1\times 1= 500.\)

  • A set of estimating equations for a 2nd-order efficient estimator with degree at most 2:

    {\(-3\,x_{{1}}x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}+6\,x_{{1}}x_{{2}}{x_{{4 }}}^{2}x_{{6}}\eta _{{6}}+x_{{1}}x_{{2}}{x_{{4}}}^{2}\eta _{{2}}\eta _{{6}}-2\,x_{ {1}}x_{{2}}{x_{{4}}}^{2}{\eta _{{6}}}^{2}+3\,x_{{1}}x_{{2}}x_{{4}}{x_{{6}} }^{2}\eta _{{2}}-6\,x_{{1}}x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{4}}+2\,x_{{1}}x_ {{2}}x_{{4}}x_{{6}}\eta _{{2}}\eta _{{4}}-2\,x_{{1}}x_{{2}}x_{{4}}x_{{6}}\eta _{{2 }}\eta _{{6}}-x_{{1}}x_{{2}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{4}}+2\,x_{{1}}x_{{2}}{ x_{{6}}}^{2}{\eta _{{4}}}^{2}+3\,x_{{1}}x_{{3}}x_{{4}}{x_{{5}}}^{2}\eta _{{6}} -2\,x_{{1}}x_{{3}}x_{{4}}x_{{5}}\eta _{{5}}\eta _{{6}}-3\,x_{{1}}x_{{3}}{x_{{5 }}}^{2}x_{{6}}\eta _{{4}}+2\,x_{{1}}x_{{3}}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{5}}+x_ {{1}}{x_{{4}}}^{2}x_{{6}}{\eta _{{2}}}^{2}-2\,x_{{1}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{6}}-x_{{1}}x_{{4}}{x_{{5}}}^{2}\eta _{{3}}\eta _{{6}}-x_{{1}}x_{{4}} {x_{{6}}}^{2}{\eta _{{2}}}^{2}+2\,x_{{1}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{4} }+x_{{1}}{x_{{5}}}^{2}x_{{6}}\eta _{{3}}\eta _{{4}}+3\,{x_{{2}}}^{2}{x_{{4}}}^ {2}x_{{6}}\eta _{{1}}-{x_{{2}}}^{2}{x_{{4}}}^{2}\eta _{{1}}\eta _{{6}}-3\,{x_{{2}} }^{2}x_{{4}}{x_{{6}}}^{2}\eta _{{1}}-2\,{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{1}} \eta _{{4}}+2\,{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{1}}\eta _{{6}}+{x_{{2}}}^{2}{x_{ {6}}}^{2}\eta _{{1}}\eta _{{4}}-x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{1}}\eta _{{2}}-2\,x _{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{1}}\eta _{{6}}+x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _ {{1}}\eta _{{2}}+2\,x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{1}}\eta _{{4}}-x_{{3}}x_{{4 }}{x_{{5}}}^{2}\eta _{{1}}\eta _{{6}}+x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{1}}\eta _{{4} }\),

    \(3\,x_{{1}}x_{{3}}x_{{4}}{x_{{5}}}^{2}\eta _{{6}}-2\,x_{{1}}x_{{3}}x_{{4} }x_{{5}}\eta _{{5}}\eta _{{6}}-3\,x_{{1}}x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{4}}+2 \,x_{{1}}x_{{3}}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{5}}-x_{{1}}x_{{4}}{x_{{5}}}^{ 2}\eta _{{3}}\eta _{{6}}+x_{{1}}{x_{{5}}}^{2}x_{{6}}\eta _{{3}}\eta _{{4}}+3\,{x_{{2}} }^{2}{x_{{4}}}^{2}x_{{6}}\eta _{{3}}-{x_{{2}}}^{2}{x_{{4}}}^{2}\eta _{{3}}\eta _{{ 6}}-3\,{x_{{2}}}^{2}x_{{4}}{x_{{6}}}^{2}\eta _{{3}}-2\,{x_{{2}}}^{2}x_{{4} }x_{{6}}\eta _{{3}}\eta _{{4}}+2\,{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{3}}\eta _{{6}}+{x _{{2}}}^{2}{x_{{6}}}^{2}\eta _{{3}}\eta _{{4}}-3\,x_{{2}}x_{{3}}{x_{{4}}}^{2}x _{{6}}\eta _{{2}}+6\,x_{{2}}x_{{3}}{x_{{4}}}^{2}x_{{6}}\eta _{{6}}+x_{{2}}x_{{ 3}}{x_{{4}}}^{2}\eta _{{2}}\eta _{{6}}-2\,x_{{2}}x_{{3}}{x_{{4}}}^{2}{\eta _{{6}}} ^{2}+3\,x_{{2}}x_{{3}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}-6\,x_{{2}}x_{{3}}x_{ {4}}{x_{{6}}}^{2}\eta _{{4}}+2\,x_{{2}}x_{{3}}x_{{4}}x_{{6}}\eta _{{2}}\eta _{{4}} -2\,x_{{2}}x_{{3}}x_{{4}}x_{{6}}\eta _{{2}}\eta _{{6}}-x_{{2}}x_{{3}}{x_{{6}}} ^{2}\eta _{{2}}\eta _{{4}}+2\,x_{{2}}x_{{3}}{x_{{6}}}^{2}{\eta _{{4}}}^{2}-x_{{2}} {x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{3}}-2\,x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{3 }}\eta _{{6}}+x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{3}}+2\,x_{{2}}x_{{4}}{ x_{{6}}}^{2}\eta _{{3}}\eta _{{4}}+x_{{3}}{x_{{4}}}^{2}x_{{6}}{\eta _{{2}}}^{2}-2 \,x_{{3}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{6}}-x_{{3}}x_{{4}}{x_{{5}}}^{2 }\eta _{{1}}\eta _{{6}}-x_{{3}}x_{{4}}{x_{{6}}}^{2}{\eta _{{2}}}^{2}+2\,x_{{3}}x_{ {4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{4}}+x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{1}}\eta _{{ 4}}\),

    \(6\,x_{{1}}x_{{3}}x_{{4}}{x_{{5}}}^{2}\eta _{{6}}-4\,x_{{1}}x_{{3}}x_{{ 4}}x_{{5}}\eta _{{5}}\eta _{{6}}-6\,x_{{1}}x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{4}}+ 4\,x_{{1}}x_{{3}}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{5}}-2\,x_{{1}}x_{{4}}{x_{{5} }}^{2}\eta _{{3}}\eta _{{6}}+2\,x_{{1}}{x_{{5}}}^{2}x_{{6}}\eta _{{3}}\eta _{{4}}+3\,{ x_{{2}}}^{2}{x_{{4}}}^{2}x_{{6}}\eta _{{5}}-{x_{{2}}}^{2}{x_{{4}}}^{2}\eta _{{ 5}}\eta _{{6}}-3\,{x_{{2}}}^{2}x_{{4}}x_{{5}}x_{{6}}\eta _{{4}}+3\,{x_{{2}}}^{ 2}x_{{4}}x_{{5}}x_{{6}}\eta _{{6}}+{x_{{2}}}^{2}x_{{4}}x_{{5}}\eta _{{4}}\eta _{{6 }}-{x_{{2}}}^{2}x_{{4}}x_{{5}}{\eta _{{6}}}^{2}-3\,{x_{{2}}}^{2}x_{{4}}{x_ {{6}}}^{2}\eta _{{5}}-{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{4}}\eta _{{5}}+{x_{{2}}}^ {2}x_{{4}}x_{{6}}\eta _{{5}}\eta _{{6}}+{x_{{2}}}^{2}x_{{5}}x_{{6}}{\eta _{{4}}}^{ 2}-{x_{{2}}}^{2}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{6}}+{x_{{2}}}^{2}{x_{{6}}}^{2 }\eta _{{4}}\eta _{{5}}-2\,x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{5}}+2\,x_{{2} }x_{{4}}x_{{5}}x_{{6}}\eta _{{2}}\eta _{{4}}-2\,x_{{2}}x_{{4}}x_{{5}}x_{{6}}\eta _ {{2}}\eta _{{6}}+2\,x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{5}}-2\,x_{{3}}x_ {{4}}{x_{{5}}}^{2}\eta _{{1}}\eta _{{6}}+2\,x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{1}} \eta _{{4}}\),

    \(\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}-{\eta _{{2}}}^{2}\eta _{{4}}\eta _{{6}}\), \(\eta _{{1}}+\eta _{{2}}+\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}\) , \(-\eta _{{1}}-\eta _{{2}}-\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}+1\)}.

    The total degree of the polynomial equations is \(32\).

  • A set of estimating equations for a first-order-efficient estimator with degree at most 1:

    {\(-{x_{{5}}}^{2}x_{{4}}\eta _{{6}}x_{{1}}x_{{3}}+{x_{{5}}}^{2}x_{{6}}\eta _{{4} }x_{{1}}x_{{3}}+2\,{x_{{6}}}^{2}\eta _{{4}}x_{{1}}x_{{2}}x_{{4}}-2\,{x_{{4 }}}^{2}\eta _{{6}}x_{{1}}x_{{2}}x_{{6}}-{x_{{6}}}^{2}x_{{1}}x_{{2}}\eta _{{2}} x_{{4}}+{x_{{4}}}^{2}x_{{1}}x_{{2}}\eta _{{2}}x_{{6}}+{x_{{2}}}^{2}{x_{{6} }}^{2}\eta _{{1}}x_{{4}}-{x_{{4}}}^{2}{x_{{2}}}^{2}\eta _{{1}}x_{{6}} ,\)

    \( -{x_{{5} }}^{2}x_{{4}}\eta _{{6}}x_{{1}}x_{{3}}+{x_{{5}}}^{2}x_{{6}}\eta _{{4}}x_{{1}}x _{{3}}+2\,{x_{{6}}}^{2}\eta _{{4}}x_{{2}}x_{{3}}x_{{4}}-2\,{x_{{4}}}^{2}\eta _ {{6}}x_{{2}}x_{{3}}x_{{6}}-{x_{{6}}}^{2}x_{{2}}x_{{3}}\eta _{{2}}x_{{4}}+{ x_{{4}}}^{2}x_{{2}}x_{{3}}\eta _{{2}}x_{{6}}+{x_{{2}}}^{2}{x_{{6}}}^{2}\eta _{ {3}}x_{{4}}-{x_{{4}}}^{2}{x_{{2}}}^{2}\eta _{{3}}x_{{6}} ,\)

    \( -2\,{x_{{5}}}^{2} x_{{4}}\eta _{{6}}x_{{1}}x_{{3}}+2\,{x_{{5}}}^{2}x_{{6}}\eta _{{4}}x_{{1}}x_{{ 3}}-x_{{4}}x_{{6}}x_{{5}}{x_{{2}}}^{2}\eta _{{6}}+x_{{4}}x_{{5}}{x_{{2}}}^ {2}\eta _{{4}}x_{{6}}-{x_{{4}}}^{2}{x_{{2}}}^{2}\eta _{{5}}x_{{6}}+x_{{4}}{x_{ {6}}}^{2}{x_{{2}}}^{2}\eta _{{5}} ,\)

    \(\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}-{\eta _{{2}}}^{2}\eta _{{4}}\eta _{{6}},\) \(\eta _{{1}}+\eta _{{2}}+\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}\) , \(-\eta _{{1}}-\eta _{{2}}-\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}+6\)}.

The estimating equations for the second-order efficient estimator above look much more complicated than the estimating equations for the MLE, but each term of the first three polynomials has degree at most 2 in \(\eta \). Thanks to this degree reduction, the computational cost of obtaining the estimates becomes much smaller, as we will see in Sect. 6.6.
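As a consistency check (not in the paper), the following sympy sketch verifies that the third constraint in (6.7), after clearing denominators and substituting \(\eta =Np\) as above, is exactly the polynomial \(\eta _1\eta _3\eta _5^2-\eta _2^2\eta _4\eta _6\) that appears in each set of estimating equations.

```python
import sympy as sp

N = sp.symbols('N', positive=True)
p11, p12, p13, p21, p22, p23 = sp.symbols('p11 p12 p13 p21 p22 p23', positive=True)
eta1, eta2, eta3, eta4, eta5, eta6 = sp.symbols('eta1 eta2 eta3 eta4 eta5 eta6')

# third line of (6.7): (p11/p21)/(p12/p22) = (p12/p22)/(p13/p23)
lhs_minus_rhs = (p11/p21)/(p12/p22) - (p12/p22)/(p13/p23)
numer = sp.expand(sp.numer(sp.together(lhs_minus_rhs)))
print(numer)                                   # p11*p13*p22**2 - p12**2*p21*p23

# substitute eta = N p (eta1..eta3 = N p_1j, eta4..eta6 = N p_2j) and clear N
subs = {p11: eta1/N, p12: eta2/N, p13: eta3/N, p21: eta4/N, p22: eta5/N, p23: eta6/N}
print(sp.expand(numer.subs(subs) * N**4))      # eta1*eta3*eta5**2 - eta2**2*eta4*eta6
```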

6.6 Computation

To obtain estimates based on the method of this paper, we need fast algorithms to find the solutions of polynomial equations. The authors have carried out computations using the homotopy continuation method (the Matlab program HOM4PS2 by Lee, Li and Tsai [18]) for the log marginal model in Sect. 6.5.2 and the data \(\bar{X}=(1,1,1,1,1,1)\).

The run time to compute each estimate on a standard laptop (Intel(R) Core(TM) i7-2670QM CPU, 2.20 GHz, 4.00 GB memory) is given in Table 6.1. The computation is repeated 10 times and the averages and standard deviations are displayed. Note that the increase in speed for the second-order efficient estimators is due to the degree reduction technique. The term “path” in the table heading refers to a primitive iteration step within the homotopy method. In the faster polyhedral version, the solution region is subdivided into polyhedral domains.

Table 6.1 Computational time for each estimate by the homotopy continuation methods

Figure 6.3 shows the mean squared error and the computational time of the MLE, the first-order efficient estimator and the second-order efficient estimator of Sect. 6.5.2. The true parameter is set to \(\eta ^*=(1/6,1/4,1/12,1/12,1/4,1/6)\), a point in the model manifold, and \(N\) random samples are generated i.i.d. from the distribution with that parameter. The computation is repeated for exponentially increasing sample sizes \(N=1,\ldots ,10^5\). In general, the polynomial equations have multiple roots, and here we selected the root closest to the sample mean in the Euclidean norm. Figure 6.3(1) also shows that the mean squared error is approximately the same for the three estimators, but (2) shows that the computational time is much greater for the MLE.
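The root-selection rule just described is straightforward to implement; the following numpy sketch (with hypothetical candidate roots standing in for the solver output) keeps the solution closest to the sample mean in the Euclidean norm.

```python
import numpy as np

X_bar = np.ones(6)                                            # the data used above
candidates = np.array([[0.17, 0.25, 0.08, 0.08, 0.25, 0.17],  # hypothetical real roots
                       [0.30, 0.10, 0.10, 0.20, 0.20, 0.10]])
best = candidates[np.argmin(np.linalg.norm(candidates - X_bar, axis=1))]
print(best)
```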

Fig. 6.3 The mean squared error and computation time for each estimate by the homotopy continuation method

6.7 Discussion

In this paper we have concentrated on reduction of the polynomial degree of the estimating equations and shown the benefits for the computation of the solutions. We do not expect the estimators to be in closed form, such as a rational function of the data. The most we can expect is that they are algebraic, that is, they are solutions of algebraic equations; they lie on a zero-dimensional algebraic variety. It is clear that there is no escape from using mixed symbolic-numerical methods. In algebraic statistics the number of solutions of the ML equations is called the ML degree. Given that we have more general estimating equations than the pure ML equations, this points to an extended theory of a “quasi” ML degree or efficient estimator degree. The issue of exactly which solution to use as our estimator persists. In the paper we suggest taking the solution closest to the sufficient statistic in the Euclidean metric. We could use other metrics, and more theory is needed.

Here we have put forward estimating equations with reduced degree and shown the benefits in terms of computation. But we could have used other criteria for choosing the equations, while remaining in the efficient class. We might prefer to choose an equation which further reduces the bias by decreasing the next-order term. There may thus be some trade-off between degree and bias.

Beyond the limited ambitions of this paper to look at second-order efficiency lie several other areas, notably hypothesis testing and model selection. But the question is the same: to what extent can we bring the algebraic methods to bear, for example by expressing additional differential forms and curvatures in algebraic terms. Although estimation typically requires a mixture of symbolic and numeric methods, in some cases only the computation of the efficient estimate requires numeric procedures and the other computations can be carried out symbolically.