Abstract
A strong link between information geometry and algebraic statistics is made by investigating statistical manifolds which are algebraic varieties. In particular it is shown how first and second order efficient estimators can be constructed, such as bias corrected Maximum Likelihood and more general estimators, and for which the estimating equations are purely algebraic. In addition it is shown how Gröbner basis technology, which is at the heart of algebraic statistics, can be used to reduce the degrees of the terms in the estimating equations. This points the way to the feasible use, to find the estimators, of special methods for solving polynomial equations, such as homotopy continuation methods. Simple examples are given showing both equations and computations.
Access provided by Autonomous University of Puebla. Download chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
6.1 Introduction
Information geometry gives geometric insights and methods for studying the statistical efficiency of estimators, testing, prediction and model selection. The field of algebraic statistics has proceeded somewhat separately but recently a positive effort is being made to bring the two subjects together, notably [15]. This paper should be seen as part of this effort.
A straightforward way of linking the two areas is to ask how far algebraic methods can be used when the statistical manifolds of information geometry are algebraic, that is algebraic varieties or derived forms, such as rational quotients. We call such models “algebraic statistical models” and will give formal definitions.
In the standard theory for non-singular statistical models, maximum likelihood estimators (MLEs) have first-order asymptotic efficiency and bias-corrected MLEs have second-order asymptotic efficiency. A short section covers briefly the basic theory of asymptotic efficiency using differential geometry, necessary for our development.
We shall show that for some important algebraic models, the estimating equations of MLE type become polynomial and the degrees usually become very high if the model has a high-dimensional parameter space. In this paper, asymptotically efficient algebraic estimators, a generalization of bias corrected MLE, are studied. By algebraic estimators we mean estimators which are the solution of algebraic equations. A main result is that for (algebraic) curved exponential family, there are second-order efficient estimators whose polynomial degree is at most two. These are computed by decreasing the degree of the estimating equations using Gröbner basis methods, the main tool of algebraic statistics. We supply some the basic Gröbner theory in Appendix A. See [22].
The reduction of the degree saves computational costs dramatically when we use computational methods for solving the algebraic estimating equations. Here we use homotopy continuation methods of [19, 24] to demonstrate this effect for a few simple examples, for which we are able to carry out the Gröbner basis reduction. Appendix B discusses homotopy continuation methods.
Although, as mentioned, the links between computational algebraic methods and the theory of efficient estimators based on differential geometry are recent, two other areas of statistics, not covered here, exploit differential geometry methods. The first is tube theory. The seminal paper by Weyl [26] has been used to give exact confidence level values (size of tests), and bounds, for certain Gaussian simultaneous inference problems: [17, 20]. This is very much related to the theory of up-crossings of Gaussian processes using expected Euler characteristic methods, see [1] and earlier papers. The second area is the use of the resolution of singularities (incidentally related to the tube theory) in which confidence levels are related to the dimension and the solid angle tangent of cones with apex at a singularity in parameters space [12, 25]. Moreover, the degree of estimating equations for MLE has been studied for some specific algebraic models, which are not necessarily singular [11]. In this paper we cover the non-singular case, for rather more general estimators than MLE, and show that algebraic methods have a part to play.
Most of the theories in the paper can be applied to a wider class of Multivariate Gaussian models with some restrictions on their covariance matrices, for example models studied in [6, 14]. Though the second-order efficient estimators proposed in the paper can be applied to them potentially, the cost for computing Gröbner basis prevents their direct application. Further innovation in the algebraic computation is required for real applications, which is a feature of several other areas of algebraic statistics.
Section 6.2 gives some basic background in estimation and differential geometry for it. Sections 6.3 and 6.4, which are the heart of the paper, give the algebraic developments and Sect. 6.5 gives some examples. Section 6.6 carries out some computation using homotopy continuation.
6.2 Statistical Manifolds and Efficiency of Estimators
In this section, we introduce the standard setting of statistical estimation theory, via information geometry. See [2, 4] for details. It is recognized that the ideas go back to at least the work of Rao [23], Efron [13] and Dawid [10]. The subject of information geometry was initiated by Amari and his collaborators [3, 5].
Central to this family of ideas is that the rates of convergence of statistical estimators and other test statistics depend on the metric and curvature of the parametric manifolds in a neighborhood of the MLE or the null hypothesis. In addition Amari realized the importance of two special models, the affine exponential model and the affine mixture model, \(e\) and \(m\) frame respectively. In this paper we concentrate on the exponential family model but also look at curved subfamilies. By extending the dimension of the parameter space of the exponential family, we are able to cover some classes of mixture models. The extension of the exponential model to infinite dimensions is covered by [21].
6.2.1 Exponential Family and Estimators
A full exponential family is a set of probability distributions \(\{\mathrm {d}P(x|\theta ) \mid \theta \in \varTheta \}\) with a parameter space \(\varTheta \subset \mathbb {R}^d\) such that
where \(x \in \mathbb {R}^d\) is a variable representing a sufficient statistic and \(\nu \) is a carrier measure on \(\mathbb {R}^d\). Here \(x_i \theta ^i\) means \(\sum _{i} x_i \theta ^i\) (Einstein summation notation).
We call \(\theta \) a natural parameter and \(\eta =\eta (\theta ):=E[x|\theta ]\) an expectation parameter. Denote \(E= E(\varTheta ):=\{\eta (\theta ) \mid \theta \in \varTheta \}\subset \mathbb {R}^d\) as the corresponding expectation parameter space. Note that the relation \(\eta (\theta )=\nabla _\theta \psi (\theta )\) holds. If the parameter space is restricted to a subset \(\mathcal {V}_\varTheta \subset \varTheta \), we obtain a curved exponential family
The corresponding space of the expectation parameter is denoted by \(\mathcal {V}_E:=\{\eta (\theta ) \mid \theta \in \mathcal {V}_\varTheta \}\subset E\).
Figure 6.1 explains how to define an estimator by a local coordinate. Let \((u,v)\in \mathbb {R}^p \times \mathbb {R}^{d-p}\) with a dimension \(p\) of \(\mathcal {V}_\varTheta \) be a local coordinate system around the true parameter \(\theta ^*\) and define \(\mathcal {U} \subset \mathbb {R}^p\) such that \(\{\theta (u,0)| u\in \mathcal {U}\}=\mathcal {V}_\varTheta \). For a full exponential model with \(N\) samples obtained by composing a map \((X^{(1)},\ldots , X^{(N)}) \mapsto \theta (\eta )|_{\eta =\bar{X}}\) and a coordinate projection map \(\theta (u,v)\mapsto u\), we can define a (local) estimator \((X^{(1)},\ldots , X^{(N)})\mapsto u\). We define an estimator by \(\eta (u,v)\) similarly. Since \(\bar{X}\) is a sufficient statistic of \(\theta \) (and \(\eta \)) in the full exponential family, every estimator can be computed by \(\bar{X}\) rather than the original data \(\{X_i\}\). Therefore in the rest of the paper, we write \(X\) as shorthand for \(\bar{X}\).
6.2.2 Differential Geometrical Entities
Let \(w:=(u,v)\) and use indexes \(\{i,j,\ldots \}\) for \(\theta \) and \(\eta \), \(\{a,b,\ldots \}\) for \(u\), \(\{\kappa ,\lambda ,\ldots \}\) for \(v\) and \(\{\alpha ,\beta ,\ldots \}\) for \(w\). The following are used for expressing conditions for asymptotic efficiency of estimators, where Einstein notation is used.
Differential geometrical entities
-
\(\eta _i(\theta ) = \displaystyle \frac{\partial }{ \partial \theta ^i}\psi (\theta )\),
-
Fisher metric \(\underline{G}=(g_{ij})\) w.r.t. \(\theta \): \(g_{ij}(\theta )=\displaystyle \frac{\partial ^2 \psi (\theta )}{\partial \theta ^i \partial \theta ^j}\),
-
Fisher metric \(\bar{G}=(g^{ij})\) w.r.t. \(\eta \): \(\bar{G}=\underline{G}^{-1}\),
-
Jacobian: \(B_{i\alpha }(\theta ):=\displaystyle \frac{\partial \eta _i(w)}{\partial w^\alpha }\),
-
e-connection: \(\varGamma ^{(e)}_{\alpha \beta ,\gamma }=\left( \displaystyle \frac{\partial ^2}{\partial w^\alpha \partial w^\beta } \theta ^i(w)\right) \left( \displaystyle \frac{\partial }{\partial w^\gamma } \eta _i(w)\right) \),
-
m-connection: \(\varGamma ^{(m)}_{\alpha \beta ,\gamma }=\left( \displaystyle \frac{\partial ^2}{\partial w^\alpha \partial w^\beta } \eta _i(w)\right) \left( \displaystyle \frac{\partial }{\partial w^\gamma } \theta ^i(w)\right) \),
6.2.3 Asymptotic Statistical Inference Theory
Under some regularity conditions on the carrier measure \(\nu \), potential function \(\psi \) and the manifolds \(\mathcal {V}_\varTheta \) or \(\mathcal {V}_E\), the asymptotic theory below is available. These conditions are for guaranteeing the finiteness of the moments and the commuting of the expectation and the partial derivative \(\frac{\partial }{\partial \theta } E_\theta [f]= E_\theta [\frac{\partial f}{\partial \theta }]\). For more details of the required regularity conditions, see Sect. 2.1 of [4].
-
1.
If \(\hat{u}\) is a consistent estimator (i.e. \(P(\Vert \hat{u}-u\Vert >\epsilon )\rightarrow 0\) as \(N\rightarrow \infty \) for any \(\epsilon >0\)), the squared error matrix of \(\hat{u}\) is
$$\begin{aligned} E_u[(\hat{u}-u)(\hat{u}-u)^\top ]=\,&E_u[(\hat{u}^a-u^a)(\hat{u}^b-u^b)]\\ =\,&N^{-1}[g_{ab}-g_{a\kappa }g^{\kappa \lambda }g_{b\lambda }]^{-1}+O(N^{-2}). \end{aligned}$$Here \([\cdot ]^{-1}\) means the matrix inverse. Thus, if \(g_{a\kappa }=0\) for all \(a\) and \(\kappa \), the main term in the r.h.s. becomes minimum. We call such an estimator as a first-order efficient estimator.
-
2.
The bias term becomes
$$ E_u[\hat{u}^a-u^a]=(2N)^{-1}b^a(u)+ O(N^{-2}) $$for each \(a\) where \(b^a(u):=\varGamma ^{(m)}{}_{cd}^a(u) g^{cd}(u)\). Then, the bias corrected estimator \(\check{u}^a:=\hat{u}^a-b^a(\hat{u})\) satisfies \(E_u[\check{u}^a-u^a]=O(N^{-2})\).
-
3.
Assume \(g_{a\kappa }=0\) for all \(a\) and \(\kappa \), then the square error matrix is represented by
$$\begin{aligned} E_u[(\check{u}^a-u^a)(\check{u}^b-u^b)]=\frac{1}{N}g^{ab}+\frac{1}{2N^2} \left( \mathop {{\varGamma }^{2ab}_M}^{(m)}+2\mathop {{H}^{2ab}_M}^{(e)}+\mathop {{H}^{2ab}_A}^{(m)}\right) +o(N^{-2}). \end{aligned}$$See Theorem 5.3 of [4] and Theorem 4.4 of [2] for the definition of the terms in the r.h.s. Of the four dominating terms in the r.h.s., only
$$\begin{aligned} \mathop {{H}^{2ab}_A}^{(m)}:=g^{\kappa \mu }g^{\lambda \nu } H^{(m)}{}_{\kappa \lambda }^a H^{(m)}{}_{\mu \nu }^b \end{aligned}$$depends on the selection of the estimator.
Here \(H^{(m)}{}_{\kappa \lambda }^a\) is an embedding curvature and equal to \(\varGamma ^{(m)}{}_{\kappa \lambda }^a\) when \(g_{a\kappa }=0\) for every \(a\) and \(\kappa \). Since \(\mathop {{H}^{2ab}_A}^{(m)}\) is the square of \(\varGamma ^{(m)}{}_{\kappa \lambda }^a\), the square error matrix attains the minimum in the sense of positive definiteness if and only if
$$\begin{aligned} \left. \varGamma ^{(m)}{}_{\kappa \lambda ,a}(w)\right| _{v=0}=\left. \left( \frac{\partial ^2}{\partial v^\kappa \partial v^\lambda } \eta _i(w)\right) \left( \frac{\partial }{\partial u^a} \theta ^i(w)\right) \right| _{v=0}=0. \end{aligned}$$(6.1)Therefore the elimination of the m-connection (6.1) implies second-order efficiency of the estimator after a bias correction, i.e. it becomes optimal among the bias-corrected first-order efficient estimators up to \(O(N^{-2})\).
6.3 Algebraic Models and Efficiency of Algebraic Estimators
This section studies asymptotic efficiency for statistical models and estimators which are defined algebraically. Many models in statistics are defined algebraically. Perhaps most well known are polynomial regression models and algebraic conditions on probability models such as independence and conditional independence. Recently there has been considerable interest in marginal models [7] which are typically linear restrictions on raw probabilities. In time series autoregressive models expressed by linear transfer functions induce algebraic restrictions on covariance matrices. Our desire is to have a definition of algebraic statistical model which can be expressed from within the curved exponential family framework but is sufficiently broad to cover cases such as those just mentioned. Our solution is to allow algebraic conditions in the natural parameter \(\theta \), mean parameter \(\eta \) or both. The second way in which algebra enters is in the form of the estimator.
6.3.1 Algebraic Curved Exponential Family
We say a curved exponential family is algebraic if the following two conditions are satisfied.
- (C1):
-
\(\mathcal {V}_\varTheta \) or \(\mathcal {V}_E\) is represented by a real algebraic variety, i.e. \(\mathcal {V}_\varTheta :=\mathcal {V}(f_1,\ldots ,f_k)=\{\theta \in \mathbb {R}^d | f_1(\theta )=\cdots =f_k(\theta )=0\}\) or similarly \(\mathcal {V}_E :=\mathcal {V}(g_1,\ldots ,g_k)\) for \(f_i \in \mathbb {R}[\theta ^1,\ldots ,\theta ^d]\) and \(g_i \in \mathbb {R}[\eta _1,\ldots ,\eta _d]\).
- (C2):
-
\(\theta \mapsto \eta (\theta )\) or \(\eta \mapsto \theta (\eta )\) is represented by some algebraic equations, i.e. there are \(h_1,\ldots ,h_k \in \mathbb {R}[\theta ,\eta ]\) such that locally in \(\mathcal {V}_\varTheta \times \mathcal {V}_E\), \(h_i(\theta ,\eta )=0\) iff \(\eta (\theta )=\eta \) or \(\theta (\eta )=\theta \).
Here \(\mathbb {R}[\theta ^1,\ldots ,\theta ^d]\) means a polynomial of \(\theta ^1,\ldots ,\theta ^d\) over the real number field \(\mathbb {R}\) and \(\mathbb {R}[\theta ,\eta ]\) means \(\mathbb {R}[\theta ^1,\ldots ,\theta ^d,\eta _1,\ldots ,\eta _d]\). The integer \(k\), the size of the generators, is not necessarily equal to \(d-p\) but we assume \(\mathcal {V}_\varTheta \) (or \(\mathcal {V}_E\)) has dimension \(p\) around the true parameter. Note that if \(\psi (\theta )\) is a rational form or the logarithm of a rational form, (C2) is satisfied.
6.3.2 Algebraic Estimators
The parameter set \(\mathcal {V}_\varTheta \) (or \(\mathcal {V}_E\)) is sometimes singular for algebraic models. But throughout the following analysis, we assume non-singularity around the true parameter \(\theta ^*\in \mathcal {V}_\varTheta \) (or \(\eta ^*\in \mathcal {V}_E\) respectively).
Following the discussion at the end of Sect. 6.2.1. We call \(\theta (u,v)\) or \(\eta (u,v)\) an algebraic estimator if
- (C3):
-
\(w \mapsto \eta (w)\) or \(w \mapsto \theta (w)\) is represented algebraically.
We remark that the MLE for an algebraic curved exponential family is an algebraic estimator.
If conditions (C1), (C2) and (C3) hold, then all of the geometrical entities in Sect. 6.2.2 are characterized by special polynomial equations. Furthermore, if \(\psi (\theta )\in \mathbb {R}(\theta ) \cup \log \mathbb {R}(\theta )\) and \(\theta (w)\in \mathbb {R}(w) \cup \log \mathbb {R}(w)\), then the geometrical objects have the additional property of being rational.
6.3.3 Second-Order Efficient Algebraic Estimators, Vector Version
Consider an algebraic estimator \(\eta (u,v)\in \mathbb {R}[u,v]^d\) satisfying the following vector equation:
where, for each \(u\), \(\{e_j(u); j=1,\ldots ,p\}\cup \{e_i(u); i=p+1,\ldots ,d\}\) is a complete basis of \(\mathbb {R}^d\) such that \(\langle e_j(u),(\bigtriangledown _u \eta )\rangle _g=0\) and \(f_j(u,v)\in \mathbb {R}[u][v]_{\ge 3}\), namely a polynomial whose degree in \(v\) is at least 3 with coefficients polynomial in \(u\), for \(j=1,\ldots ,p\). Remember we use a notation \(X=\bar{X}=\frac{1}{N}\sum _i X_i\). The constant \(c\) is to control the perturbation (see below).
A straightforward computation of the \(m\)-connection in (6.1) at \(v=0\) for
shows it to be zero. This gives
Theorem 1
Vector equation (6.2) satisfies the second-order efficiency (6.1).
We call (6.2) a vector version of a second-order efficient estimator. Note that if \(c=0\), (6.2) gives an estimating equation for the MLE. Thus the last term in (6.2) can be recognized as a perturbation from the MLE.
Figure 6.2 is a rough sketch of the second-order efficient estimators. Here the model is embedded in an \(m\)-affine space. Given a sample (red point), the MLE is an orthogonal projection (yellow point) to the model with respect to the Fisher metric. But a second-order efficient estimator maps the sample to the model along a “cubically” curved manifold (red curve).
6.3.4 Second-Order Efficient Algebraic Estimators, Algebraic Version
Another class of second-order efficient algebraic estimators we call the algebraic version, which is defined by the following simultaneous polynomial equations with \(\eta _u=\eta (u,0)\).
where \(\{\tilde{e}_j(u,\eta _u)\in \mathbb {R}[u,\eta _u]^d; j=1,\ldots ,p\} \) span \(((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}\) for every \(u\) and \(h_j(X,u,\eta _u,t) \in \mathbb {R}[X,u,\eta _u][t]_3\) (\(\text{ degree }=3\) in \(t\)) for \(j=1,\ldots ,p\). The constant \(c\) is to control the perturbation. The notation \(\bar{G}\) represents the Fisher metric on the full-exponential family with respect to \(\eta \). The notation \((\nabla _u \eta (u,0))^{\perp _{\bar{G}}}\) means the subspace orthogonal to \(span(\partial _a \eta (u,0))_{a=1}^p\) with respect to \(\bar{G}\) and \((\cdot )^{\perp _E}\) means the orthogonal complement in the sense of Euclidean vector space. Here, the term “degree” of a polynomial means the maximum degree of its terms. Note that the case \((X-\eta _u)^\top \tilde{e}_j(u,\eta _u)=0\) for \(j=1,\ldots ,p\) gives a special set of the estimating equations of the MLE.
Theorem 2
An estimator defined by a vector version (6.2) of the second-order efficient estimators is also represented by an algebraic version (6.3) where \(h_j(X,u,\eta _u,t)=\tilde{f}_j(u,(\tilde{e}_i^\top t)_{i=1}^p,(\tilde{e}_i^\top (X-\eta _u))_{i=1}^p)\) with a function \(\tilde{f}_j(u,v,\tilde{v})\in \mathbb {R}[u,\tilde{v}][v]_3\) such that \(\tilde{f}(u,v,v)=f(u,v)\).
Proof
Take the Euclidean inner product of both sides of (6.2) with each \(\tilde{e}_j\) which is a vector Euclidean orthogonal to the subspace \(span(\{e_i |i\ne j\})\) and obtain a system of polynomial equations. By eliminating variables \(v\) from the polynomial equations, an algebraic version is obtained. \(\square \)
Theorem 3
Every algebraic equation (6.3) gives a second-order efficient estimator (6.1).
Proof
Writing \(X=\eta (u,v)\) in (6.3), we obtain
Partially differentiate this by \(v\) twice, we obtain
since each term of \(h_j(\eta (u,v),u,\eta (u,0),\eta (u,v)-\eta (u,0))\) has degree more than 3 in its third component \((\eta _i(u,v)-\eta _i(u,0))_{i=1}^{d}\) and \(\left. \eta (u,v)-\eta (u,0)\right| _{v=0}=0\). Since \(\mathrm{span}\{\tilde{e}_j(u); j=1,\ldots ,p\}=((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}=\mathrm{span}\{\bar{G} \partial _{u_a} \eta ;a=1,\ldots ,p\}\), we obtain
This implies the estimator is second-order efficient. \(\square \)
By Theorems 1, 2 and 3, the relationship between the three forms of the second-order efficient algebraic estimators is summarized as
Furthermore, if we assume the estimator has a form \(\eta \in \mathbb {R}(u)[v]\), that is a polynomial in \(v\) with coefficients rational in \(u\), every first-order efficient estimator satisfying (6.1) can be written in a form (6.2) after resetting coordinates \(v\) for the estimating manifold. In this sense, we can say \((1) \Rightarrow (2)\) and the following corollary holds.
Corollary 1
If \(\eta \in \mathbb {R}(u)[v]\), the forms (1), (2) and (3) are equivalent.
6.3.5 Properties of the Estimators
The following theorem is a straightforward extension of the local existence of MLE. That is to say, the existence for sufficiently large sample size. The regularity conditions are essentially the same as for the MLE but with an additional condition referring to the control constant \(c\).
Proposition 1
(Existence and uniqueness of the estimate) Assume that the Fisher matrix is non-degenerate around \(\eta (u^*)\in \mathcal {V}_E\). Then the estimate given by (6.2) or (6.3) locally uniquely exists for small \(c\), i.e. there is a neighborhood \(G(u^*)\subset \mathbb {R}^d\) of \(\eta (u^*)\) and \(\delta >0\) such that for every fixed \(X \in G(u^*)\) and \(-\delta <c<\delta \), a unique estimate exists.
Proof
Under the condition of the theorem, the MLE always exists locally. Furthermore, because of the nonsingular Fisher matrix, the MLE is locally bijective (by the implicit representation theorem). Thus \((u_1,\ldots ,u_p)\mapsto (g_1(x-\eta _u),\ldots ,g_p(x-\eta _u))\) for \(g_j(x-\eta _u):=(X-\eta _u)^\top \tilde{e}_j(u,\eta _u)\) in (6.3) is locally bijective. Since \(\{g_i\}\) and \(\{h_i\}\) are continuous, we can select \(\delta >0\) for (6.3) to be locally bijective for every \(-\delta <c<\delta \). \(\square \)
6.3.6 Summary of Estimator Construction
We summarize how to define a second-order efficient algebraic estimator (vector version) and how to compute an algebraic version from it.
Input:
-
a potential function \(\psi \) satisfying (C2),
-
polynomial equations of \(\eta ,u\) and \(v\) satisfying (C3),
-
\(m_1,\ldots ,m_{d-p}\in \mathbb {R}[\eta ]\) such that \(\mathcal {V}_E=V(m_1,\ldots ,m_{d-p})\) gives the model,
-
\(f_j \in \mathbb {R}[u][v]_{\ge 3}\) and \(c\in \mathbb {R}\) for a vector version
- Step 1:
-
Compute \(\psi \) and \(\theta (\eta )\), \(G(\eta )\), (\(\varGamma ^{(m)}(\eta )\) for bias correction)
- Step 2:
-
Compute \(f_{ai} \in \mathbb {R}[\eta ][\xi _{11},\ldots ,\xi _{pd}]_1\) s.t. \(f_{aj}(\xi _{11},\ldots ,\xi _{pd}):=\partial _{u^a} m_j\) for \( \xi _{b i} :=\partial _{u^b}\eta _i.\)
- Step 3:
-
Find \(e_{p+1},\ldots ,e_{d}\in (\nabla _u \eta )^{\perp _{\bar{G}}}\) by eliminating \(\{\xi _{aj}\}\) from \(\langle e_i,\partial _{u^a} \eta \rangle _{\bar{G}}=e_{ik}(\eta ) g^{kj}(\eta ) \xi _{aj}=0\) and \(f_{aj}(\xi _{11},\ldots ,\xi _{pd})=0\).
- Step 4:
-
Select \(e_{1},\ldots ,e_p \in \mathbb {R}[\eta ]\) s.t. \(e_1(\eta ),\ldots ,e_d(\eta )\) are linearly independent.
- Step 5:
-
Eliminate \(v\) from
$$ X=\eta (u,0)+\sum \nolimits _{i=p+1}^{d} v_{i-p} e_i+ c\cdot \sum \nolimits _{j=1}^p f_j(u,v) e_j $$and compute \((X-\eta )^\top \tilde{e}_j\) and \(h\in (\mathbb {R}[\eta ][X-\eta ]_3)^p\), given by Theorem 2.
Output(Vector version):
Output(Algebraic version):
6.3.7 Reduction of the Degree of the Estimating Equations
As we noted in Sect. 6.3.4, if we set \(h_j=0\) for all \(j\), the estimator becomes the MLE. In this sense, \(c h_j\) can be recognized as a perturbation from the likelihood equations. If we select each \(h_j(X,u,\eta _u,t) \in \mathbb {R}[X,u,\eta _u][t]_3\) tactically, we can reduce the degree of the polynomial estimating equation. For algebraic background, the reader refers to Appendix A.
Here, we assume \(u\in \mathbb {R}[\eta _u]\). For example, we can set \(u_i=\eta _i\). Then \(\tilde{e}_j(u,\eta _u)\) is a function of \(\eta _u\), so we write it as \(\tilde{e}_j(\eta )\). Define an ideal \(\mathcal {I}_3\) of \(\mathbb {R}[X,\eta ]\) as
Select a monomial order \(\prec \) and set \(\eta _1\succ \cdots \succ \eta _d \succ X_1\succ \cdots \succ X_d\). Let \(G_{\prec }=\{g_1,\ldots ,g_m\}\) be a Gröbner basis of \(\mathcal {I}_3\) with respect to \(\prec \). Then the remainder (normal form)\(r_j\) of \((X-\eta )^\top \tilde{e}_j(\eta )\), the first term of the l.h.s. of (6.3), with respect to \(G_{\prec }\), is uniquely determined for each \(j\).
Theorem 4
If the monomial order \(\prec \) is the pure lexicographic,
-
1.
\(r_j\) for \(j=1,\ldots , p\) has degree at most 2 with respect to \(\eta \), and
-
2.
\(r_j=0\) for \(j=1,\ldots , p\) are the estimating equations for a second-order efficient estimator.
Proof
Assume \(r_j\) has a monomial term whose degree is more than 2 with respect to \(\eta \) and represent the term as \(\eta _a \eta _b \eta _c q(\eta ,X)\) with a polynomial \(q\in \mathbb {R}(\eta ,X)\) and a combination of indices \(a,b,c\). Then \(\{\eta _a \eta _b \eta _c + (X_a-\eta _a)(X_a-\eta _a)(X_a-\eta _a)\}q(\eta ,X)\) has a smaller polynomial order than \(\eta _a \eta _b \eta _c q(\eta ,X)\) since \(\prec \) is pure lexicographic satisfying \(\eta _1\succ \cdots \succ \eta _d \succ X_1\succ \cdots \succ X_d\). Therefore by subtracting \((X_a-\eta _a)(X_a-\eta _a)(X_a-\eta _a)\}q(\eta ,X)\in \mathcal {I}_3\) from \(r_j\), the polynomial degree decreases. This contradicts the fact \(r_j\) is the normal form so each \(r_j\) has degree at most 2.
Furthermore each polynomial in \(\mathcal {I}_3\) is in \(\mathbb {R}[X,u,\eta _u][X-\eta ]_3\) and therefore by taking the normal form, the condition for the algebraic version (6.3) of second-order efficiency still holds. \(\square \)
The reduction of the degree is important when we use algebraic algorithms such as homotopy continuation methods [18] to solve simultaneous polynomial equations since computational cost depends highly on the degree of the polynomials.
6.4 First-Order Efficiency
It is not surprising that, for first-order efficiency, almost the same arguments hold as for second-order efficiency.
By Theorem 5.2 of [4], a consistent estimator is first-order efficient if and only if
Consider an algebraic estimator \(\eta (u,v)\in \mathbb {R}[u,v]^d\) satisfying the following vector equation:
where, for each \(u\), \(\{e_j(u); j=1,\ldots ,p\}\cup \{e_i(u); i=p+1,\ldots ,d\}\) is a complete basis of \(\mathbb {R}^d\) s.t. \(\langle e_j(u),(\bigtriangledown _u \eta )\rangle _g=0\) and \(f_j(u,v)\in \mathbb {R}[u][v]_{\ge 2}\), a polynomial whose degree of \(v\) is at least 2, for \(j=1,\ldots ,p\). Similarly, \(c\in \mathbb {R}\) is a constant for perturbation. Here, the only difference between (6.2) for the second-order efficiency and (6.5) for the first-order efficiency is the degree of the \(f_j(u,v)\) with respect to \(v\).
The algebraic version of the first-order efficient algebraic estimator is defined by the following simultaneous polynomial equalities with \(\eta _u=\eta (u,0)\).
where \(\{\tilde{e}_j(u,\eta _u)\in \mathbb {R}[u,\eta _u]^d; j=1,\ldots ,p\} \) span \(((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}\) for every \(u\) and \(h_j(X,u,\eta _u,t) \in \mathbb {R}[X,u,\eta _u][t]_2\) (\(\text{ degree }=2\) w.r.t. \(t\)) for \(j=1,\ldots ,p\). Here, the only difference between (6.3) for the second-order efficiency and (6.6) for the first-order efficiency is the degree of the \(h_j(X,u,\eta _u,t)\) with respect to \(t\).
Then the relation between the three different forms of first-order efficiency can be proved in the same way manner as for Theorem 1, 2 and 3.
Theorem 5
(i) Vector version (6.5) satisfies the first-order efficiency.
(ii) An estimator defined by a vector version (6.5) of the first-order efficient estimators is also represented by an algebraic version (6.6).
(iii) Every algebraic version (6.6) gives a first-order efficient estimator.
The relationship between the three forms of the first-order efficient algebraic estimators is summarized as \((4) \Leftarrow (5) \Rightarrow (6) \Rightarrow (4)\). Furthermore, if we assume the estimator has a form \(\eta \in \mathbb {R}(u)[v]\), the forms (6.4), (6.5) and (6.6) are equivalent.
Let \(\mathcal {R}:=\mathbb {Z}[X,\eta ]\) and define
as an ideal of \(\mathcal {R}\). In a similar manner, let \(\prec \) be a monomial order such that \(\eta _1\succ \cdots \succ \eta _d \succ X_1\succ \cdots \succ X_d\). Let \(G_{\prec }=\{g_1,\ldots ,g_m\}\) be a Gröbner basis of \(\mathcal {I}_2\) with respect to \(\prec \). The properties of the normal form \(r_i\) of \((X-\eta (u,0))^\top \tilde{e}_i(u)\) with respect to \(G_{\prec }\) are then covered by the following:
Theorem 6
If the monomial order \(\prec \) is the pure lexicographic,
(i) \(r_i\) for \(i=1,\ldots , d\) has degree at most 1 with respect to \(\eta \), and
(ii) \(r_i=0\) for \(i=1,\ldots , d\) are the estimating equations for a first-order efficient estimator.
6.5 Examples
In this section, we show how to use the algebraic computation to design asymptotically efficient estimators for two simple examples. The examples satisfy the algebraic conditions (C1), (C2) and (C3) so it is verified that necessary geometric entities have an algebraic form as mentioned in Sect. 6.3.2.
6.5.1 Example: Periodic Gaussian Model
The following periodic Gaussian model shows how to compute second-order efficients estimators and their biases.
-
Statistical Model:
$$\begin{aligned} X\sim N(\mu , \varSigma (a))\, \text {with}\, \mu = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}\,\text {and}\, \varSigma (a)= \begin{bmatrix} 1&a&a^2&a \\ a&1&a&a^2 \\ a^2&a&1&a \\ a&a^2&a&1 \end{bmatrix}\, \text {for}\, 0\le a < 1. \end{aligned}$$Here, the dimension of the full exponential family and the curved exponential family are \(d=3\) and \(p=1\), respectively.
-
Curved exponential family:
$$ \log f(x|\theta ) = 2 \left( x_1 x_2 + x_2 x_3 + x_3 x_4 + x_4 x_1\right) \theta _{{2}}+ 2\left( x_{{3}}x_{{1}}+x_{{4}}x_{{2}} \right) \theta _{{3}}-\psi (\theta ), $$ -
Potential function:
$$\begin{aligned} \psi (\theta )=~-~1/2\,\log ( {\theta _{{1}}}^{4}~-~4\,{\theta _{{1}}}^{2}{\theta _{{2} }}^{2}~+~8\,\theta _{{1}}{\theta _{{2}}}^{2}\theta _{{3}}~-~2\,{\theta _{{1}}} ^{2}{\theta _{{3}}}^{2}~-~4\,{\theta _{{2}}}^{2}{\theta _{{3}}}^{2}~+~{\theta _{{3}}}^{4} )~+~2\,\log ( 2\,\pi ), \end{aligned}$$ -
Natural parameter:
$$ \theta (a)=\left[ \frac{1}{1-2a^2+4a^4},-\frac{a}{1-2a^2+4a^4}, \frac{a^2}{1-2a^2+4a^4}\right] ^\top , $$ -
Expectation parameter: \(\eta (a)=[-2,-4a,-2a^2]^\top ,\)
-
Fisher metric with respect to \(\eta \):
$$ (g^{ij})=\left[ {\small \begin{array}{ccc} 2\,{a}^{4}+4\,{a}^{2}+2&{}8\,a \left( 1+{a}^ {2} \right) &{}8\,{a}^{2}\\ 8\,a \left( 1+{a}^{2} \right) &{}4+24\,{a}^{2}+4\,{a}^{4}&{}8\,a \left( 1+{a}^{2} \right) \\ 8\,{a}^{2}&{}8\,a \left( 1+{a}^{2} \right) &{}2\,{a}^ {4}+4\,{a}^{2}+2\end{array} }\right] , $$ -
A set of vectors \(e_i\in \mathbb {R}^3\):
$$ e_0(a):=[0,-1,a]^\top \in \partial _a \eta (a), $$$$ e_1(a):=[3a^2+1,4a,0]^\top ,~ e_2(a):=[-a^2-1,0,2]^\top \in (\partial _a \eta (a))^{\perp _{\bar{G}}}. $$ -
A vector version of the second-order efficient estimator is, for example,
$$ x-\eta +v_1\cdot e_1 + v_2 \cdot e_2 +c \cdot v_1^3\cdot e_0=0. $$ -
A corresponding algebraic version of the second-order efficient estimator: by eliminating \(v_1\) and \(v_2\), we get \(g(a)+c\cdot h(a)=0\) where
$$\begin{aligned} g(a):=8( a~-~1) ^{2} ( a~+~1) ^{2} ( 1~+~2{a}^{2}) ^{2} ( 4{a}^{5}~-8{a}^{3}~+~2{a}^{3}{ x_3}~-~3{ x_2}{a}^{2}~+~4a~+~4a{ x_1}~+~2a{ x_3}-{ x_2}) \end{aligned}$$and
$$ h(a):=( 2{a}^{4}+{a}^{3}{ x_2}-{a}^{2}{ x_3}+2{a}^{2}+a{ x_2 }-2{ x_1}-{ x_3}-4) ^{3}. $$ -
An estimating equation for MLE:
$$ 4{a}^{5}-8{a}^{3}+2{a}^{3}{ x_3}-3{ x_2}{a}^{2}+4a+4a{ x_1}+2a{ x_3}-{ x_2}=0. $$ -
Bias correction term for an estimator \({\hat{a}}\): \(\hat{a} ( {\hat{a}}^{8}-4{\hat{a}}^{6}+6{\hat{a}}^{4}-4{\hat{a}}^{2}+1 ) / ( 1+2{\hat{a}}^{2} ) ^{2}.\)
6.5.2 Example: Log Marginal Model
Here, we consider a log marginal model. See [7] for more on marginal models.
-
Statistical model (Poisson regression):
\(X_{ij} \displaystyle \mathop {\sim }^{ i.i.d} \mathrm{Po}(Np_{ij})\) s.t. \(p_{ij}\in (0,1)\) for \(i=1,2\) and \(j=1,2,3\) with model constraints:
$$\begin{aligned} p_{11} + p_{12} + p_{13}&+p_{21} + p_{22} + p_{23} = 1,\nonumber \\ p_{11} + p_{12} + p_{13}&=p_{21} + p_{22} + p_{23},\nonumber \\ \displaystyle \frac{p_{11}/p_{21}}{p_{12}/p_{22}}&=\frac{p_{12}/p_{22}}{p_{13}/p_{23}}. \end{aligned}$$(6.7)Condition (6.7) can appear in a statistical test of whether acceleration of the ratio \(p_{1j}/p_{2j}\) is constant.
In this case, \(d=6\) and \(p=3\).
-
Log density w.r.t. the point mass measure on \(\mathbb {Z}_{\ge 0}^6\):
$$ \log f(x|p)=\log \left\{ \prod _{ij} e^{-Np_{ij}} (Np_{ij})^{X_{ij}}\right\} = -N + \sum _{ij} X_{ij} \log (Np_{ij}). $$ -
The full expectation family is given by
$$ \begin{bmatrix} X_{1}&X_{2}&X_{3} \\ X_{4}&X_{5}&X_{6} \\ \end{bmatrix} := \begin{bmatrix} X_{11}&X_{12}&X_{13} \\ X_{21}&X_{22}&X_{23} \\ \end{bmatrix} ,$$$$ \begin{bmatrix} \eta _{1}&\eta _{2}&\eta _{3} \\ \eta _{4}&\eta _{5}&\eta _{6} \\ \end{bmatrix} = N \begin{bmatrix} p_{11}&p_{12}&p_{13} \\ p_{21}&p_{22}&p_{23} \\ \end{bmatrix} ,$$\(\theta ^i=\log (\eta _i)\) and \(\psi (\theta )=N\).
-
The Fisher metric w.r.t. \(\theta \): \(g_{ij}=\frac{\partial ^2 \psi }{\partial \theta ^i \partial \theta ^j}=\delta _{ij}\eta _i\).
-
Selection of the model parameters:
$$ [u_1,u_2,u_3]:=[\eta _1,\eta _3,\eta _5] \,\mathrm{{and}}\, [v_1,v_2,v_3]:=[\eta _2,\eta _4,\eta _6]. $$ -
A set of vectors \(e_i\in \mathbb {R}^6\):
$$ e_0:= \begin{bmatrix} {\eta _{2}^2 (\eta _{4}-\eta _{6})}\\{-\eta _{2}^2 (\eta _{4}-\eta _{6})}\\{0}\\{-\eta _{3} \eta _{5}^2-2 \eta _{2} \eta _{4} \eta _{6}}\\{0}\\{\eta _{3} \eta _{5}^2+2 \eta _{2} \eta _{4} \eta _{6}} \end{bmatrix}\in (\nabla _u \eta ) , $$$$\begin{aligned}{}[e_1,e_2,e_3]:&= \left[ \begin{bmatrix} \eta _1\\ \eta _2\\ \eta _3\\ 0\\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} {\eta _{1} (-\eta _{1} \eta _{5}^2+\eta _{3} \eta _{5}^2)}\\{\eta _{2} (-\eta _{1} \eta _{5}^2-2 \eta _{2} \eta _{4} \eta _{6})}\\{0}\\{\eta _{4} (\eta _{2}^2 \eta _{4}-\eta _{2}^2 \eta _{6})}\\{\eta _{5} (\eta _{2}^2 \eta _{4}+2 \eta _{1} \eta _{3} \eta _{5})}\\{0} \end{bmatrix}, \begin{bmatrix} {\eta _{1} (\eta _{1} \eta _{5}^2-\eta _{3} \eta _{5}^2)}\\{\eta _{2} (\eta _{1} \eta _{5}^2+2 \eta _{2} \eta _{4} \eta _{6})}\\{0}\\{\eta _{4} (2 \eta _{1} \eta _{3} \eta _{5}+\eta _{2}^2 \eta _{6})}\\{0}\\{\eta _{6} (\eta _{2}^2 \eta _{4}+2 \eta _{1} \eta _{3} \eta _{5})} \end{bmatrix} \right] \\&\quad \in ((\nabla _u \eta )^{\perp _{\bar{G}}})^3 \end{aligned}$$ -
A vector version of the second-order efficient estimator is, for example,
$$ X-\eta +v_1\cdot e_1 + v_2 \cdot e_2 + v_3 \cdot e_3 + c \cdot v_1^3\cdot e_0=0. $$ -
The bias correction term of the estimator = 0.
-
A set of estimating equations for MLE:
{\(x_{{1}}{\eta _{{2}}}^{2}{\eta _{{4}}}^{2}\eta _{{6}}-x_{{1}}{\eta _{{2}}}^{2}\eta _{{4}}{ \eta _{{6}}}^{2}-x_{{2}}\eta _{{1}}\eta _{{2}}{\eta _{{4}}}^{2}\eta _{{6}}+x_{{2}}\eta _{{1}}\eta _{{2}}\eta _{{4}}{\eta _{{6}}}^{2}-2\,x_{{4}}\eta _{{1}}\eta _{{2}}\eta _{{4}}{\eta _{{6}}}^{2 }-x_{{4}}\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}\eta _{{6}}+2\,x_{{6}}\eta _{{1}}\eta _{{2}}{\eta _{{4}}}^{2}\eta _{{6}}+x_{{6}}\eta _{{1}}\eta _{{3}}\eta _{{4}}{\eta _{{5}}}^{2}\),
\(-x_{{2}}\eta _{{2}}\eta _{{3}}{\eta _{{4}}}^{2}\eta _{{6}}+x_{{2}}\eta _{{2}}\eta _{{3}}\eta _{{4}}{\eta _{{6}} }^{2}+x_{{3}}{\eta _{{2}}}^{2}{\eta _{{4}}}^{2}\eta _{{6}}-x_{{3}}{\eta _{{2}}}^{2}\eta _{ {4}}{\eta _{{6}}}^{2}-x_{{4}}\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}\eta _{{6}}-2\,x_{{4}} \eta _{{2}}\eta _{{3}}\eta _{{4}}{\eta _{{6}}}^{2}+x_{{6}}\eta _{{1}}\eta _{{3}}\eta _{{4}}{\eta _{{5} }}^{2}+2\,x_{{6}}\eta _{{2}}\eta _{{3}}{\eta _{{4}}}^{2}\eta _{{6}}\),
\(-2\,x_{{4}}\eta _{{1}} \eta _{{3}}{\eta _{{5}}}^{2}\eta _{{6}}-x_{{4}}{\eta _{{2}}}^{2}\eta _{{4}}\eta _{{5}}\eta _{{6}}+ x_{{5}}{\eta _{{2}}}^{2}{\eta _{{4}}}^{2}\eta _{{6}}-x_{{5}}{\eta _{{2}}}^{2}\eta _{{4}}{\eta _{{6}}}^{2}+2\,x_{{6}}\eta _{{1}}\eta _{{3}}\eta _{{4}}{\eta _{{5}}}^{2}+x_{{6}}{\eta _{{2 }}}^{2}\eta _{{4}}\eta _{{5}}\eta _{{6}}\),
\(\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}-{\eta _{{2}}}^{2} \eta _{{4}}\eta _{{6}}\), \(\eta _{{1}}+\eta _{{2}}+\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}\), \(-\eta _{{1} }-\eta _{{2}}-\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}+1\)}
The total degree of the equations is \(5\times 5\times 5 \times 4\times 1\times 1= 500.\)
-
A set of estimating equations for a 2nd-order efficient estimator with degree at most 2:
{\(-3\,x_{{1}}x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}+6\,x_{{1}}x_{{2}}{x_{{4 }}}^{2}x_{{6}}\eta _{{6}}+x_{{1}}x_{{2}}{x_{{4}}}^{2}\eta _{{2}}\eta _{{6}}-2\,x_{ {1}}x_{{2}}{x_{{4}}}^{2}{\eta _{{6}}}^{2}+3\,x_{{1}}x_{{2}}x_{{4}}{x_{{6}} }^{2}\eta _{{2}}-6\,x_{{1}}x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{4}}+2\,x_{{1}}x_ {{2}}x_{{4}}x_{{6}}\eta _{{2}}\eta _{{4}}-2\,x_{{1}}x_{{2}}x_{{4}}x_{{6}}\eta _{{2 }}\eta _{{6}}-x_{{1}}x_{{2}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{4}}+2\,x_{{1}}x_{{2}}{ x_{{6}}}^{2}{\eta _{{4}}}^{2}+3\,x_{{1}}x_{{3}}x_{{4}}{x_{{5}}}^{2}\eta _{{6}} -2\,x_{{1}}x_{{3}}x_{{4}}x_{{5}}\eta _{{5}}\eta _{{6}}-3\,x_{{1}}x_{{3}}{x_{{5 }}}^{2}x_{{6}}\eta _{{4}}+2\,x_{{1}}x_{{3}}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{5}}+x_ {{1}}{x_{{4}}}^{2}x_{{6}}{\eta _{{2}}}^{2}-2\,x_{{1}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{6}}-x_{{1}}x_{{4}}{x_{{5}}}^{2}\eta _{{3}}\eta _{{6}}-x_{{1}}x_{{4}} {x_{{6}}}^{2}{\eta _{{2}}}^{2}+2\,x_{{1}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{4} }+x_{{1}}{x_{{5}}}^{2}x_{{6}}\eta _{{3}}\eta _{{4}}+3\,{x_{{2}}}^{2}{x_{{4}}}^ {2}x_{{6}}\eta _{{1}}-{x_{{2}}}^{2}{x_{{4}}}^{2}\eta _{{1}}\eta _{{6}}-3\,{x_{{2}} }^{2}x_{{4}}{x_{{6}}}^{2}\eta _{{1}}-2\,{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{1}} \eta _{{4}}+2\,{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{1}}\eta _{{6}}+{x_{{2}}}^{2}{x_{ {6}}}^{2}\eta _{{1}}\eta _{{4}}-x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{1}}\eta _{{2}}-2\,x _{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{1}}\eta _{{6}}+x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _ {{1}}\eta _{{2}}+2\,x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{1}}\eta _{{4}}-x_{{3}}x_{{4 }}{x_{{5}}}^{2}\eta _{{1}}\eta _{{6}}+x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{1}}\eta _{{4} }\),
\(3\,x_{{1}}x_{{3}}x_{{4}}{x_{{5}}}^{2}\eta _{{6}}-2\,x_{{1}}x_{{3}}x_{{4} }x_{{5}}\eta _{{5}}\eta _{{6}}-3\,x_{{1}}x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{4}}+2 \,x_{{1}}x_{{3}}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{5}}-x_{{1}}x_{{4}}{x_{{5}}}^{ 2}\eta _{{3}}\eta _{{6}}+x_{{1}}{x_{{5}}}^{2}x_{{6}}\eta _{{3}}\eta _{{4}}+3\,{x_{{2}} }^{2}{x_{{4}}}^{2}x_{{6}}\eta _{{3}}-{x_{{2}}}^{2}{x_{{4}}}^{2}\eta _{{3}}\eta _{{ 6}}-3\,{x_{{2}}}^{2}x_{{4}}{x_{{6}}}^{2}\eta _{{3}}-2\,{x_{{2}}}^{2}x_{{4} }x_{{6}}\eta _{{3}}\eta _{{4}}+2\,{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{3}}\eta _{{6}}+{x _{{2}}}^{2}{x_{{6}}}^{2}\eta _{{3}}\eta _{{4}}-3\,x_{{2}}x_{{3}}{x_{{4}}}^{2}x _{{6}}\eta _{{2}}+6\,x_{{2}}x_{{3}}{x_{{4}}}^{2}x_{{6}}\eta _{{6}}+x_{{2}}x_{{ 3}}{x_{{4}}}^{2}\eta _{{2}}\eta _{{6}}-2\,x_{{2}}x_{{3}}{x_{{4}}}^{2}{\eta _{{6}}} ^{2}+3\,x_{{2}}x_{{3}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}-6\,x_{{2}}x_{{3}}x_{ {4}}{x_{{6}}}^{2}\eta _{{4}}+2\,x_{{2}}x_{{3}}x_{{4}}x_{{6}}\eta _{{2}}\eta _{{4}} -2\,x_{{2}}x_{{3}}x_{{4}}x_{{6}}\eta _{{2}}\eta _{{6}}-x_{{2}}x_{{3}}{x_{{6}}} ^{2}\eta _{{2}}\eta _{{4}}+2\,x_{{2}}x_{{3}}{x_{{6}}}^{2}{\eta _{{4}}}^{2}-x_{{2}} {x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{3}}-2\,x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{3 }}\eta _{{6}}+x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{3}}+2\,x_{{2}}x_{{4}}{ x_{{6}}}^{2}\eta _{{3}}\eta _{{4}}+x_{{3}}{x_{{4}}}^{2}x_{{6}}{\eta _{{2}}}^{2}-2 \,x_{{3}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{6}}-x_{{3}}x_{{4}}{x_{{5}}}^{2 }\eta _{{1}}\eta _{{6}}-x_{{3}}x_{{4}}{x_{{6}}}^{2}{\eta _{{2}}}^{2}+2\,x_{{3}}x_{ {4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{4}}+x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{1}}\eta _{{ 4}}\),
\(6\,x_{{1}}x_{{3}}x_{{4}}{x_{{5}}}^{2}\eta _{{6}}-4\,x_{{1}}x_{{3}}x_{{ 4}}x_{{5}}\eta _{{5}}\eta _{{6}}-6\,x_{{1}}x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{4}}+ 4\,x_{{1}}x_{{3}}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{5}}-2\,x_{{1}}x_{{4}}{x_{{5} }}^{2}\eta _{{3}}\eta _{{6}}+2\,x_{{1}}{x_{{5}}}^{2}x_{{6}}\eta _{{3}}\eta _{{4}}+3\,{ x_{{2}}}^{2}{x_{{4}}}^{2}x_{{6}}\eta _{{5}}-{x_{{2}}}^{2}{x_{{4}}}^{2}\eta _{{ 5}}\eta _{{6}}-3\,{x_{{2}}}^{2}x_{{4}}x_{{5}}x_{{6}}\eta _{{4}}+3\,{x_{{2}}}^{ 2}x_{{4}}x_{{5}}x_{{6}}\eta _{{6}}+{x_{{2}}}^{2}x_{{4}}x_{{5}}\eta _{{4}}\eta _{{6 }}-{x_{{2}}}^{2}x_{{4}}x_{{5}}{\eta _{{6}}}^{2}-3\,{x_{{2}}}^{2}x_{{4}}{x_ {{6}}}^{2}\eta _{{5}}-{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{4}}\eta _{{5}}+{x_{{2}}}^ {2}x_{{4}}x_{{6}}\eta _{{5}}\eta _{{6}}+{x_{{2}}}^{2}x_{{5}}x_{{6}}{\eta _{{4}}}^{ 2}-{x_{{2}}}^{2}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{6}}+{x_{{2}}}^{2}{x_{{6}}}^{2 }\eta _{{4}}\eta _{{5}}-2\,x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{5}}+2\,x_{{2} }x_{{4}}x_{{5}}x_{{6}}\eta _{{2}}\eta _{{4}}-2\,x_{{2}}x_{{4}}x_{{5}}x_{{6}}\eta _ {{2}}\eta _{{6}}+2\,x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{5}}-2\,x_{{3}}x_ {{4}}{x_{{5}}}^{2}\eta _{{1}}\eta _{{6}}+2\,x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{1}} \eta _{{4}}\),
\(\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}-{\eta _{{2}}}^{2}\eta _{{4}}\eta _{{6}}\), \(\eta _{{1}}+\eta _{{2}}+\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}\) , \(-\eta _{{1}}-\eta _{{2}}-\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}+1\)}.
The total degree of the polynomial equations is \(32\).
-
A set of estimating equations for a first-order-efficient estimator with degree at most 1:
{\(-{x_{{5}}}^{2}x_{{4}}\eta _{{6}}x_{{1}}x_{{3}}+{x_{{5}}}^{2}x_{{6}}\eta _{{4} }x_{{1}}x_{{3}}+2\,{x_{{6}}}^{2}\eta _{{4}}x_{{1}}x_{{2}}x_{{4}}-2\,{x_{{4 }}}^{2}\eta _{{6}}x_{{1}}x_{{2}}x_{{6}}-{x_{{6}}}^{2}x_{{1}}x_{{2}}\eta _{{2}} x_{{4}}+{x_{{4}}}^{2}x_{{1}}x_{{2}}\eta _{{2}}x_{{6}}+{x_{{2}}}^{2}{x_{{6} }}^{2}\eta _{{1}}x_{{4}}-{x_{{4}}}^{2}{x_{{2}}}^{2}\eta _{{1}}x_{{6}} ,\)
\( -{x_{{5} }}^{2}x_{{4}}\eta _{{6}}x_{{1}}x_{{3}}+{x_{{5}}}^{2}x_{{6}}\eta _{{4}}x_{{1}}x _{{3}}+2\,{x_{{6}}}^{2}\eta _{{4}}x_{{2}}x_{{3}}x_{{4}}-2\,{x_{{4}}}^{2}\eta _ {{6}}x_{{2}}x_{{3}}x_{{6}}-{x_{{6}}}^{2}x_{{2}}x_{{3}}\eta _{{2}}x_{{4}}+{ x_{{4}}}^{2}x_{{2}}x_{{3}}\eta _{{2}}x_{{6}}+{x_{{2}}}^{2}{x_{{6}}}^{2}\eta _{ {3}}x_{{4}}-{x_{{4}}}^{2}{x_{{2}}}^{2}\eta _{{3}}x_{{6}} ,\)
\( -2\,{x_{{5}}}^{2} x_{{4}}\eta _{{6}}x_{{1}}x_{{3}}+2\,{x_{{5}}}^{2}x_{{6}}\eta _{{4}}x_{{1}}x_{{ 3}}-x_{{4}}x_{{6}}x_{{5}}{x_{{2}}}^{2}\eta _{{6}}+x_{{4}}x_{{5}}{x_{{2}}}^ {2}\eta _{{4}}x_{{6}}-{x_{{4}}}^{2}{x_{{2}}}^{2}\eta _{{5}}x_{{6}}+x_{{4}}{x_{ {6}}}^{2}{x_{{2}}}^{2}\eta _{{5}} ,\)
\(\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}-{\eta _{{2}}}^{2}\eta _{{4}}\eta _{{6}},\) \(\eta _{{1}}+\eta _{{2}}+\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}\) , \(-\eta _{{1}}-\eta _{{2}}-\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}+6\)}.
The estimating equations for a second-order-efficient estimator above look much more complicated than the estimating equation for the MLE, but each term of the first three polynomials are at most degree 2. Thanks to this degree reduction, the computational costs for the estimates become much smaller as we will see in Sect. 6.6.
6.6 Computation
To obtain estimates based on the method of this paper, we need fast algorithms to find the solution of polynomial equations. The authors have carried out computations using homotopy continuation method (matlab program HOM4PS2 by Lee, Li and Tsuai [18]) for the log marginal model in Sect. 6.5.2 and a data \(\bar{X}=(1,1,1,1,1,1)\).
The run time to compute each estimate on a standard laptop (Intel(R) Core (TM) i7-2670QM CPU, 2.20 GHz, 4.00 GB memory) is given by Table 6.1. The computation is repeated 10 times and the averages and the standard deviations are displayed. Note the increasing of the speed for the second-order efficient estimators is due to the degree reduction technique. The term “path” in the table heading refers to a primitive iteration step within the homotopy method. In the faster polyhedron version, the solution region is subdivided into polyhedral domains.
Figure 6.3 shows the mean squared error and the computational time of the MLE, the first-order estimator and the second-order efficient estimator of Sect. 6.5.2. The true parameter is set \(\eta ^*=(1/6,1/4,1/12,1/12,1/4,1/6)\), a point in the model manifold, and \(N\) random samples are generated i.i.d. from the distribution with the parameter. The computation is repeated for exponentially increasing sample sizes \(N=1,\ldots ,10^5\). In general, there are multiple roots for polynomial equations and here we selected the root closest to the sample mean by the Euclidean norm. Figure 6.3(1) also shows that the mean squared error is approximately the same for the three estimators, but (2) shows that the computational time is much more for the MLE.
6.7 Discussion
In this paper we have concentrated on reduction of the polynomial degree of the estimating equations and shown the benefits in computation of the solutions. We do not expect the estimators to be closed form, such as a rational polynomial form in the data. The most we can expect is they are algebraic, that is they are the solution of algebraic equations. They lie on a zero dimensional algebraic variety. It is clear that there is no escape from using mixed symbolic-numerical methods. In algebraic statistics the number of solution of the ML equation is called the ML degree. Given that we have more general estimating equations than pure ML equations this points to an extended theory or “quasi” ML degree of efficient estimator degree. The issue of exactly which solution to use as our estimators persists. In the paper we suggest taking the solution closest to the sufficient statistic in the Euclidian metric. We could use other metrics and more theory is needed.
Here we have put forward estimating equations with reduced degree and shown the benefits in terms of computation. But we could have used other criteria for choosing the equations, while remaining in the efficient class. We might prefer to choose an equation which reduces the bias further via decreasing the next order term. There may thus be some trade off between degree and bias.
Beyond the limited ambitions of this paper to look at second-order efficiency lie several other areas, notably hypothesis testing and model selection. But the question is the same: to what extent can we bring the algebraic methods to bear, for example by expressing additional differential forms and curvatures in algebraic terms. Although estimation typically requires a mixture of symbolic and numeric methods in some cases only the computation of the efficient estimate requires numeric procedures and the other computations can be carrying out symbolically.
References
Adler, R.J., Taylor, J.E.: Random Fields and Geometry. Springer Monographs in Mathematics. Springer, New York (2007)
Amari, S., Nagaoka, H.: Methods of Information Geometry, vol. 191. American Mathematical Society, Providence (2007)
Amari, S.: Differential geometry of curved exponential families-curvatures and information loss. Ann. Stat. 10, 357–385 (1982)
Amari, S.: Differential-Geometrical Methods in Statistics. Springer, New York (1985)
Amari, S., Kumon, M.: Differential geometry of Edgeworth expansions in curved exponential family. Ann. Inst. Stat. Math. 35(1), 1–24 (1983)
Andersson, S., Madsen, J.: Symmetry and lattice conditional independence in a multivariate normal distribution. Ann. Stat. 26(2), 525–572 (1998)
Bergsma, W.P., Croon, M., Hagenaars, J.A.: Marginal Models for Dependent, Clustered, and Longitudinal Categorical Data. Springer, New York (2009)
Buchberger, B.: Bruno Buchberger’s PhD thesis 1965: an algorithm for finding the basis elements of the residue class ring of a zero dimensional polynomial ideal. J. Symbol. Comput. 41(3), 475–511 (2006)
Cox, D.A., Little, J., O’Shea, D.: Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, 3/e (Undergraduate Texts in Mathematics). Springer, New York (2007)
Dawid, A.P.: Further comments on some comments on a paper by Bradley Efron. Ann. Stat. 5(6), 1249 (1977)
Drton, M., Sturmfels B., Sullivant, S.: Lectures on Algebraic Statistics. Springer, New York (2009)
Drton, M.: Likelihood ratio tests and singularities. Ann. Stat. 37, 979–1012 (2009)
Efron, B.: Defining the curvature of a statistical problem (with applications to second order efficiency). Ann. Stat. 3, 1189–1242 (1975)
Gehrmann, H., Lauritzen, S.L.: Estimation of means in graphical Gaussian models with symmetries. Ann. Stat. 40(2), 1061–1073 (2012)
Gibilisco, P., Riccomagno, E., Rogantin, M.P., Wynn, H.P.: Algebraic and Geometric Methods in Statistics. Cambridge University Press, Cambridge (2009)
Huber, B., Sturmfels, B.: A polyhedral method for solving sparse polynomial systems. Math. Comput. 64, 1541–1555 (1995)
Kuriki, S., Takemura, A.: On the equivalence of the tube and Euler characteristic methods for the distribution of the maximum of Gaussian fields over piecewise smooth domains. Ann. Appl. Probab. 12(2), 768–796 (2002)
Lee, T.L., Li, T.Y., Tsai, C.H.: HOM4PS2.0: a software package for solving polynomial systems by the polyhedral homotopy continuation method. Computing 83(2–3), 109–133 (2008)
Li, T.Y.: Numerical solution of multivariate polynomial systems by homotopy continuation methods. Acta Numer 6(1), 399–436 (1997)
Naiman, D.Q.: Conservative confidence bands in curvilinear regression. Ann. Stat. 14(3), 896–906 (1986)
Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Stat. 23, 1543–1561 (1995)
Pistone, G., Wynn, H.P.: Generalised confounding with Gröbner bases. Biometrika 83, 653–666 (1996)
Rao, R.C.: Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37(3), 81–91 (1945)
Verschelde, J.: Algorithm 795: PHCpack: a general-purpose solver for polynomial systems by homotopy continuation. ACM Trans. Math. Softw. (TOMS) 25(2), 251–276 (1999)
Watanabe, S.: Algebraic analysis for singular statistical estimation. In: Watanabe, O., Yokomori, T. (eds.) Algorithmic Learning Theory. Springer, Berlin (1999)
Weyl, H.: On the volume of tubes. Am. J. Math. 61, 461–472 (1939)
Acknowledgments
This paper has benefited from conversations with and advice from a number of colleagues. We should thank Satoshi Kuriki, Tomonari Sei, Wicher Bergsma and Wilfred Kendall. The first author acknowledges support by JSPS KAKENHI Grant 20700258, 24700288 and the second author acknowledges support from the Institute of Statistical Mathematics for two visits in 2012 and 2013 and from UK EPSRC Grant EP/H007377/1. A first version of this paper was delivered at the WOGAS3 meeting at the University of Warwick in 2011. We thank the sponsors. The authors also thank the referees of the short version in GSI2013 and the referees of the first long version of the paper for insightful suggestions.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Appendices
A Normal Forms
A basic text for the materials in this section is [9]. The rapid growth of modern computational algebra can be credited to the celebrated Buchberger’s algorithm [8].
A monomial ideal \(I\) in a polynomial ring \(K[x_1,\ldots ,x_n]\) over a field \(K\) is an ideal for which there is a collection of monomials \(f_1,\ldots , f_m\) such that any \(g \in I\) can be expressed as a sum
with some polynomials \(g_i\in K[x_1,\ldots ,x_n]\). We can appeal to the representation of a monomial \(x^{\alpha }=x_1^{\alpha _1}\ldots x_n^{\alpha _n}\) by its exponent \(\alpha =(\alpha _1,\ldots ,\alpha _n)\). If \(\beta \ge 0\) is another exponent then
and \(\alpha + \beta \) is in the positive (shorthand for non-negative) “orthant” with corner at \(\alpha \). The set of all monomials in a monomial ideal is the union of all positive orthants whose “corners” are given by the exponent vectors of the generating monomial \(f_1, \ldots , f_m\). A monomial ordering written \(x^{\alpha } \prec x^{\beta }\) is a total (linear) ordering on monomials such that for \(\gamma \ge 0\), \(x^{\alpha } \prec x^{\beta } \Rightarrow x^{\alpha +\gamma } \prec x^{\beta + \gamma }\). Any polynomial \(f(x)\) has a leading terms with respect to \(\prec \), written \(LT(f)\).
There are, in general, many ways to express a given ideal \(I\) as being generated from a basis \(I = \langle f_1,\ldots , f_m\rangle \). That is to say, there are many choices of basis. Given an ideal \(I\) a set \(\{g_1, \ldots g_m\}\) is called a Gröbner basis (G-basis) if:
where \(\langle LT(I)\rangle \) is the ideal generated by all the monomials in \(I\). We sometimes refer to \(\langle LT(I) \rangle \) as the leading term ideal. Any ideal \(I\) has a Gröbner basis and any Gröbner basis in the ideal is a basis of the ideal.
Given a monomial ordering and an ideal expressed in terms of the G-basis, \(I\;=\; \langle g_1,\ldots , g_m\rangle \), any polynomial \(f\) has a unique remainder with respect the quotient operation \(K[x_1, \ldots , x_k]/I\). That is
We call the remainder \(r(x)\) the normal form of \(f\) with respect to \(I\) and write \(NF(f)\). Or, to stress the fact that it may depend on \(\prec \), we write \(NF(f, \prec )\). Given a monomial ordering \(\prec \), a polynomial \(f=\sum _{\alpha \in L} \theta _{\alpha } x^{\alpha }\) for some \(L\) is a normal form with respect to \(\prec \) if \(x^{\alpha } \notin \langle LT(f) \rangle \) for all \(\alpha \in L\). An equivalent way of saying this is: given an ideal \(I\) and a monomial ordering \(\prec \), for every \(f \in K[x_1,\ldots ,x_k]\) there is a unique normal form \(NF(f)\) such that \(f-NF(f) \in I\).
B Homotopy Continuation Method
Homotopy continuation method is an algorithm to find the solutions of simultaneous polynomial equations numerically. See, for example, [19, 24] for more details of the algorithm and theory.
We will explain the method briefly by a simple example of 2 equations with 2 unknowns
- Step 1 :
-
Select arbitrary polynomials of the form:
$$\begin{aligned} f_0(x,y):=f_0(x):=a_1x^{d_1}-b_1&=0,\nonumber \\ g_0(x,y):=g_0(y):=a_2y^{d_2}-b_2&=0 \end{aligned}$$(6.8)where \(d_1= \deg (f)\) and \(d_2=\deg (g)\). Polynomial equations in this form are easy to solve.
- Step 2 :
-
Take the convex combinations:
$$\begin{aligned} f_t(x,y):=\,&t f(x,y)+ (1-t) f_0(x,y),\\ g_t(x,y):=\,&t g(x,y)+ (1-t) g_0(x,y) \end{aligned}$$then our target becomes the solution for \(t=1\).
- Step 3 :
-
Compute the solution for \(t=\delta \) for small \(\delta \) by the solution for \(t=0\) numerically.
- Step 4 :
-
Repeat this until we obtain the solution for \(t=1\).
Figure 6.4 shows a sketch of the algorithm. This algorithm is called the (linear) homotopy continuation method and justified if the path connects \(t=0\) and \(t=1\) continuously without an intersection. That can be proved for almost all \(a\) and \(b\). See [19].
For each computation for the homotopy continuation method, the number of the paths is the number of the solutions of (6.8). In this case, the number of paths is \(d_1 d_2\). In general case with \(m\) unknowns, it becomes \(\prod _{i=1}^m d_i\) and this causes a serious problem for computational cost. Therefore decreasing the degree of second-order efficient estimators plays an important role for the homotopy continuation method.
Note that in order to solve this computational problem, the authors of [16] proposed the nonlinear homotopy continuation methods (or the polyhedral continuation methods). But as we can see in Sect. 6.5.2, the degree of the polynomials still affects the computational costs.
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Kobayashi, K., Wynn, H.P. (2014). Computational Algebraic Methods in Efficient Estimation. In: Nielsen, F. (eds) Geometric Theory of Information. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-05317-2_6
Download citation
DOI: https://doi.org/10.1007/978-3-319-05317-2_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05316-5
Online ISBN: 978-3-319-05317-2
eBook Packages: EngineeringEngineering (R0)