Computational Algebraic Methods in Efficient Estimation

Kobayashi, Kei; Wynn, Henry P.

doi:10.1007/978-3-319-05317-2_6

Kei Kobayashi² &
Henry P. Wynn³

Part of the book series: Signals and Communication Technology ((SCT))

1782 Accesses

Abstract

A strong link between information geometry and algebraic statistics is made by investigating statistical manifolds which are algebraic varieties. In particular it is shown how first and second order efficient estimators can be constructed, such as bias corrected Maximum Likelihood and more general estimators, and for which the estimating equations are purely algebraic. In addition it is shown how Gröbner basis technology, which is at the heart of algebraic statistics, can be used to reduce the degrees of the terms in the estimating equations. This points the way to the feasible use, to find the estimators, of special methods for solving polynomial equations, such as homotopy continuation methods. Simple examples are given showing both equations and computations.

Access provided by Autonomous University of Puebla. Download chapter PDF

Asymptotically Efficient Estimators for Algebraic Statistical Manifolds

Recent advances in algebraic geometry and Bayesian statistics

Article 06 December 2022

Computational Information Geometry in Statistics: Foundations

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

6.1 Introduction

Information geometry gives geometric insights and methods for studying the statistical efficiency of estimators, testing, prediction and model selection. The field of algebraic statistics has proceeded somewhat separately but recently a positive effort is being made to bring the two subjects together, notably [15]. This paper should be seen as part of this effort.

A straightforward way of linking the two areas is to ask how far algebraic methods can be used when the statistical manifolds of information geometry are algebraic, that is algebraic varieties or derived forms, such as rational quotients. We call such models “algebraic statistical models” and will give formal definitions.

In the standard theory for non-singular statistical models, maximum likelihood estimators (MLEs) have first-order asymptotic efficiency and bias-corrected MLEs have second-order asymptotic efficiency. A short section covers briefly the basic theory of asymptotic efficiency using differential geometry, necessary for our development.

We shall show that for some important algebraic models, the estimating equations of MLE type become polynomial and the degrees usually become very high if the model has a high-dimensional parameter space. In this paper, asymptotically efficient algebraic estimators, a generalization of bias corrected MLE, are studied. By algebraic estimators we mean estimators which are the solution of algebraic equations. A main result is that for (algebraic) curved exponential family, there are second-order efficient estimators whose polynomial degree is at most two. These are computed by decreasing the degree of the estimating equations using Gröbner basis methods, the main tool of algebraic statistics. We supply some the basic Gröbner theory in Appendix A. See [22].

The reduction of the degree saves computational costs dramatically when we use computational methods for solving the algebraic estimating equations. Here we use homotopy continuation methods of [19, 24] to demonstrate this effect for a few simple examples, for which we are able to carry out the Gröbner basis reduction. Appendix B discusses homotopy continuation methods.

Although, as mentioned, the links between computational algebraic methods and the theory of efficient estimators based on differential geometry are recent, two other areas of statistics, not covered here, exploit differential geometry methods. The first is tube theory. The seminal paper by Weyl [26] has been used to give exact confidence level values (size of tests), and bounds, for certain Gaussian simultaneous inference problems: [17, 20]. This is very much related to the theory of up-crossings of Gaussian processes using expected Euler characteristic methods, see [1] and earlier papers. The second area is the use of the resolution of singularities (incidentally related to the tube theory) in which confidence levels are related to the dimension and the solid angle tangent of cones with apex at a singularity in parameters space [12, 25]. Moreover, the degree of estimating equations for MLE has been studied for some specific algebraic models, which are not necessarily singular [11]. In this paper we cover the non-singular case, for rather more general estimators than MLE, and show that algebraic methods have a part to play.

Most of the theories in the paper can be applied to a wider class of Multivariate Gaussian models with some restrictions on their covariance matrices, for example models studied in [6, 14]. Though the second-order efficient estimators proposed in the paper can be applied to them potentially, the cost for computing Gröbner basis prevents their direct application. Further innovation in the algebraic computation is required for real applications, which is a feature of several other areas of algebraic statistics.

Section 6.2 gives some basic background in estimation and differential geometry for it. Sections 6.3 and 6.4, which are the heart of the paper, give the algebraic developments and Sect. 6.5 gives some examples. Section 6.6 carries out some computation using homotopy continuation.

6.2 Statistical Manifolds and Efficiency of Estimators

In this section, we introduce the standard setting of statistical estimation theory, via information geometry. See [2, 4] for details. It is recognized that the ideas go back to at least the work of Rao [23], Efron [13] and Dawid [10]. The subject of information geometry was initiated by Amari and his collaborators [3, 5].

Central to this family of ideas is that the rates of convergence of statistical estimators and other test statistics depend on the metric and curvature of the parametric manifolds in a neighborhood of the MLE or the null hypothesis. In addition Amari realized the importance of two special models, the affine exponential model and the affine mixture model, $e$ and $m$ frame respectively. In this paper we concentrate on the exponential family model but also look at curved subfamilies. By extending the dimension of the parameter space of the exponential family, we are able to cover some classes of mixture models. The extension of the exponential model to infinite dimensions is covered by [21].

6.2.1 Exponential Family and Estimators

A full exponential family is a set of probability distributions $\{\mathrm {d}P(x|\theta ) \mid \theta \in \varTheta \}$ with a parameter space $\varTheta \subset \mathbb {R}^d$ such that

$$ \mathrm {d}P(x|\theta )=\exp (x_i \theta ^i-\psi (\theta )) \mathrm {d}\nu , $$

where $x \in \mathbb {R}^d$ is a variable representing a sufficient statistic and $\nu $ is a carrier measure on $\mathbb {R}^d$. Here $x_i \theta ^i$ means $\sum _{i} x_i \theta ^i$ (Einstein summation notation).

We call $\theta $ a natural parameter and $\eta =\eta (\theta ):=E[x|\theta ]$ an expectation parameter. Denote $E= E(\varTheta ):=\{\eta (\theta ) \mid \theta \in \varTheta \}\subset \mathbb {R}^d$ as the corresponding expectation parameter space. Note that the relation $\eta (\theta )=\nabla _\theta \psi (\theta )$ holds. If the parameter space is restricted to a subset $\mathcal {V}_\varTheta \subset \varTheta $, we obtain a curved exponential family

$$ \{\mathrm {d}P(x|\theta ) \mid \theta \in \mathcal {V}_\varTheta \}. $$

The corresponding space of the expectation parameter is denoted by $\mathcal {V}_E:=\{\eta (\theta ) \mid \theta \in \mathcal {V}_\varTheta \}\subset E$.

Figure 6.1 explains how to define an estimator by a local coordinate. Let $(u,v)\in \mathbb {R}^p \times \mathbb {R}^{d-p}$ with a dimension $p$ of $\mathcal {V}_\varTheta $ be a local coordinate system around the true parameter $\theta ^*$ and define $\mathcal {U} \subset \mathbb {R}^p$ such that $\{\theta (u,0)| u\in \mathcal {U}\}=\mathcal {V}_\varTheta $. For a full exponential model with $N$ samples obtained by composing a map $(X^{(1)},\ldots , X^{(N)}) \mapsto \theta (\eta )|_{\eta =\bar{X}}$ and a coordinate projection map $\theta (u,v)\mapsto u$, we can define a (local) estimator $(X^{(1)},\ldots , X^{(N)})\mapsto u$. We define an estimator by $\eta (u,v)$ similarly. Since $\bar{X}$ is a sufficient statistic of $\theta $ (and $\eta $) in the full exponential family, every estimator can be computed by $\bar{X}$ rather than the original data $\{X_i\}$. Therefore in the rest of the paper, we write $X$ as shorthand for $\bar{X}$.

6.2.2 Differential Geometrical Entities

Let $w:=(u,v)$ and use indexes $\{i,j,\ldots \}$ for $\theta $ and $\eta $, $\{a,b,\ldots \}$ for $u$, $\{\kappa ,\lambda ,\ldots \}$ for $v$ and $\{\alpha ,\beta ,\ldots \}$ for $w$. The following are used for expressing conditions for asymptotic efficiency of estimators, where Einstein notation is used.

Differential geometrical entities

$\eta _i(\theta ) = \displaystyle \frac{\partial }{ \partial \theta ^i}\psi (\theta )$,
Fisher metric $\underline{G}=(g_{ij})$ w.r.t. $\theta $: $g_{ij}(\theta )=\displaystyle \frac{\partial ^2 \psi (\theta )}{\partial \theta ^i \partial \theta ^j}$,
Fisher metric $\bar{G}=(g^{ij})$ w.r.t. $\eta $: $\bar{G}=\underline{G}^{-1}$,
Jacobian: $B_{i\alpha }(\theta ):=\displaystyle \frac{\partial \eta _i(w)}{\partial w^\alpha }$,
e-connection: $\varGamma ^{(e)}_{\alpha \beta ,\gamma }=\left( \displaystyle \frac{\partial ^2}{\partial w^\alpha \partial w^\beta } \theta ^i(w)\right) \left( \displaystyle \frac{\partial }{\partial w^\gamma } \eta _i(w)\right) $,
m-connection: $\varGamma ^{(m)}_{\alpha \beta ,\gamma }=\left( \displaystyle \frac{\partial ^2}{\partial w^\alpha \partial w^\beta } \eta _i(w)\right) \left( \displaystyle \frac{\partial }{\partial w^\gamma } \theta ^i(w)\right) $,

6.2.3 Asymptotic Statistical Inference Theory

Under some regularity conditions on the carrier measure $\nu $, potential function $\psi $ and the manifolds $\mathcal {V}_\varTheta $ or $\mathcal {V}_E$, the asymptotic theory below is available. These conditions are for guaranteeing the finiteness of the moments and the commuting of the expectation and the partial derivative $\frac{\partial }{\partial \theta } E_\theta [f]= E_\theta [\frac{\partial f}{\partial \theta }]$. For more details of the required regularity conditions, see Sect. 2.1 of [4].

1.
If $\hat{u}$ is a consistent estimator (i.e. $P(\Vert \hat{u}-u\Vert >\epsilon )\rightarrow 0$ as $N\rightarrow \infty $ for any $\epsilon >0$), the squared error matrix of $\hat{u}$ is
$$\begin{aligned} E_u[(\hat{u}-u)(\hat{u}-u)^\top ]=\,&E_u[(\hat{u}^a-u^a)(\hat{u}^b-u^b)]\\ =\,&N^{-1}[g_{ab}-g_{a\kappa }g^{\kappa \lambda }g_{b\lambda }]^{-1}+O(N^{-2}). \end{aligned}$$
Here $[\cdot ]^{-1}$ means the matrix inverse. Thus, if $g_{a\kappa }=0$ for all $a$ and $\kappa $, the main term in the r.h.s. becomes minimum. We call such an estimator as a first-order efficient estimator.
2.
The bias term becomes
$$ E_u[\hat{u}^a-u^a]=(2N)^{-1}b^a(u)+ O(N^{-2}) $$
for each $a$ where $b^a(u):=\varGamma ^{(m)}{}_{cd}^a(u) g^{cd}(u)$. Then, the bias corrected estimator $\check{u}^a:=\hat{u}^a-b^a(\hat{u})$ satisfies $E_u[\check{u}^a-u^a]=O(N^{-2})$.
3.
Assume $g_{a\kappa }=0$ for all $a$ and $\kappa $, then the square error matrix is represented by
$$\begin{aligned} E_u[(\check{u}^a-u^a)(\check{u}^b-u^b)]=\frac{1}{N}g^{ab}+\frac{1}{2N^2} \left( \mathop {{\varGamma }^{2ab}_M}^{(m)}+2\mathop {{H}^{2ab}_M}^{(e)}+\mathop {{H}^{2ab}_A}^{(m)}\right) +o(N^{-2}). \end{aligned}$$
See Theorem 5.3 of [4] and Theorem 4.4 of [2] for the definition of the terms in the r.h.s. Of the four dominating terms in the r.h.s., only
$$\begin{aligned} \mathop {{H}^{2ab}_A}^{(m)}:=g^{\kappa \mu }g^{\lambda \nu } H^{(m)}{}_{\kappa \lambda }^a H^{(m)}{}_{\mu \nu }^b \end{aligned}$$
depends on the selection of the estimator.

Here $H^{(m)}{}_{\kappa \lambda }^a$ is an embedding curvature and equal to $\varGamma ^{(m)}{}_{\kappa \lambda }^a$ when $g_{a\kappa }=0$ for every $a$ and $\kappa $. Since $\mathop {{H}^{2ab}_A}^{(m)}$ is the square of $\varGamma ^{(m)}{}_{\kappa \lambda }^a$, the square error matrix attains the minimum in the sense of positive definiteness if and only if
$$\begin{aligned} \left. \varGamma ^{(m)}{}_{\kappa \lambda ,a}(w)\right| _{v=0}=\left. \left( \frac{\partial ^2}{\partial v^\kappa \partial v^\lambda } \eta _i(w)\right) \left( \frac{\partial }{\partial u^a} \theta ^i(w)\right) \right| _{v=0}=0. \end{aligned}$$
(6.1)
Therefore the elimination of the m-connection (6.1) implies second-order efficiency of the estimator after a bias correction, i.e. it becomes optimal among the bias-corrected first-order efficient estimators up to $O(N^{-2})$.

6.3 Algebraic Models and Efficiency of Algebraic Estimators

This section studies asymptotic efficiency for statistical models and estimators which are defined algebraically. Many models in statistics are defined algebraically. Perhaps most well known are polynomial regression models and algebraic conditions on probability models such as independence and conditional independence. Recently there has been considerable interest in marginal models [7] which are typically linear restrictions on raw probabilities. In time series autoregressive models expressed by linear transfer functions induce algebraic restrictions on covariance matrices. Our desire is to have a definition of algebraic statistical model which can be expressed from within the curved exponential family framework but is sufficiently broad to cover cases such as those just mentioned. Our solution is to allow algebraic conditions in the natural parameter $\theta $, mean parameter $\eta $ or both. The second way in which algebra enters is in the form of the estimator.

6.3.1 Algebraic Curved Exponential Family

We say a curved exponential family is algebraic if the following two conditions are satisfied.

(C1):: $\mathcal {V}_\varTheta $ or $\mathcal {V}_E$ is represented by a real algebraic variety, i.e. $\mathcal {V}_\varTheta :=\mathcal {V}(f_1,\ldots ,f_k)=\{\theta \in \mathbb {R}^d | f_1(\theta )=\cdots =f_k(\theta )=0\}$ or similarly $\mathcal {V}_E :=\mathcal {V}(g_1,\ldots ,g_k)$ for $f_i \in \mathbb {R}[\theta ^1,\ldots ,\theta ^d]$ and $g_i \in \mathbb {R}[\eta _1,\ldots ,\eta _d]$.
(C2):: $\theta \mapsto \eta (\theta )$ or $\eta \mapsto \theta (\eta )$ is represented by some algebraic equations, i.e. there are $h_1,\ldots ,h_k \in \mathbb {R}[\theta ,\eta ]$ such that locally in $\mathcal {V}_\varTheta \times \mathcal {V}_E$, $h_i(\theta ,\eta )=0$ iff $\eta (\theta )=\eta $ or $\theta (\eta )=\theta $.

Here $\mathbb {R}[\theta ^1,\ldots ,\theta ^d]$ means a polynomial of $\theta ^1,\ldots ,\theta ^d$ over the real number field $\mathbb {R}$ and $\mathbb {R}[\theta ,\eta ]$ means $\mathbb {R}[\theta ^1,\ldots ,\theta ^d,\eta _1,\ldots ,\eta _d]$. The integer $k$, the size of the generators, is not necessarily equal to $d-p$ but we assume $\mathcal {V}_\varTheta $ (or $\mathcal {V}_E$) has dimension $p$ around the true parameter. Note that if $\psi (\theta )$ is a rational form or the logarithm of a rational form, (C2) is satisfied.

6.3.2 Algebraic Estimators

The parameter set $\mathcal {V}_\varTheta $ (or $\mathcal {V}_E$) is sometimes singular for algebraic models. But throughout the following analysis, we assume non-singularity around the true parameter $\theta ^*\in \mathcal {V}_\varTheta $ (or $\eta ^*\in \mathcal {V}_E$ respectively).

Following the discussion at the end of Sect. 6.2.1. We call $\theta (u,v)$ or $\eta (u,v)$ an algebraic estimator if

(C3):: $w \mapsto \eta (w)$ or $w \mapsto \theta (w)$ is represented algebraically.

We remark that the MLE for an algebraic curved exponential family is an algebraic estimator.

If conditions (C1), (C2) and (C3) hold, then all of the geometrical entities in Sect. 6.2.2 are characterized by special polynomial equations. Furthermore, if $\psi (\theta )\in \mathbb {R}(\theta ) \cup \log \mathbb {R}(\theta )$ and $\theta (w)\in \mathbb {R}(w) \cup \log \mathbb {R}(w)$, then the geometrical objects have the additional property of being rational.

6.3.3 Second-Order Efficient Algebraic Estimators, Vector Version

Consider an algebraic estimator $\eta (u,v)\in \mathbb {R}[u,v]^d$ satisfying the following vector equation:

$$\begin{aligned} X=\eta (u,0)+\sum _{i=p+1}^{d} v_{i-p} e_i(u)+ c\cdot \sum _{j=1}^p f_j(u,v) e_j(u) \end{aligned}$$

(6.2)

where, for each $u$, $\{e_j(u); j=1,\ldots ,p\}\cup \{e_i(u); i=p+1,\ldots ,d\}$ is a complete basis of $\mathbb {R}^d$ such that $\langle e_j(u),(\bigtriangledown _u \eta )\rangle _g=0$ and $f_j(u,v)\in \mathbb {R}[u][v]_{\ge 3}$, namely a polynomial whose degree in $v$ is at least 3 with coefficients polynomial in $u$, for $j=1,\ldots ,p$. Remember we use a notation $X=\bar{X}=\frac{1}{N}\sum _i X_i$. The constant $c$ is to control the perturbation (see below).

A straightforward computation of the $m$-connection in (6.1) at $v=0$ for

$$ \eta (w)=\eta (u,0)+\sum _{i=p+1}^{d} v_{i-p} e_i(u)+ c\cdot \sum _{j=1}^p f_j(u,v) e_j(u) $$

shows it to be zero. This gives

Theorem 1

Vector equation (6.2) satisfies the second-order efficiency (6.1).

We call (6.2) a vector version of a second-order efficient estimator. Note that if $c=0$, (6.2) gives an estimating equation for the MLE. Thus the last term in (6.2) can be recognized as a perturbation from the MLE.

Figure 6.2 is a rough sketch of the second-order efficient estimators. Here the model is embedded in an $m$-affine space. Given a sample (red point), the MLE is an orthogonal projection (yellow point) to the model with respect to the Fisher metric. But a second-order efficient estimator maps the sample to the model along a “cubically” curved manifold (red curve).

6.3.4 Second-Order Efficient Algebraic Estimators, Algebraic Version

Another class of second-order efficient algebraic estimators we call the algebraic version, which is defined by the following simultaneous polynomial equations with $\eta _u=\eta (u,0)$.

$$\begin{aligned} (X-\eta _u)^\top \tilde{e}_j(u,\eta _u)&+c\cdot h_j(X,u,\eta _u,X-\eta _u)=0 \text{ for } j=1,\ldots ,p \end{aligned}$$

(6.3)

where $\{\tilde{e}_j(u,\eta _u)\in \mathbb {R}[u,\eta _u]^d; j=1,\ldots ,p\} $ span $((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}$ for every $u$ and $h_j(X,u,\eta _u,t) \in \mathbb {R}[X,u,\eta _u][t]_3$ ($\text{ degree }=3$ in $t$) for $j=1,\ldots ,p$. The constant $c$ is to control the perturbation. The notation $\bar{G}$ represents the Fisher metric on the full-exponential family with respect to $\eta $. The notation $(\nabla _u \eta (u,0))^{\perp _{\bar{G}}}$ means the subspace orthogonal to $span(\partial _a \eta (u,0))_{a=1}^p$ with respect to $\bar{G}$ and $(\cdot )^{\perp _E}$ means the orthogonal complement in the sense of Euclidean vector space. Here, the term “degree” of a polynomial means the maximum degree of its terms. Note that the case $(X-\eta _u)^\top \tilde{e}_j(u,\eta _u)=0$ for $j=1,\ldots ,p$ gives a special set of the estimating equations of the MLE.

Theorem 2

An estimator defined by a vector version (6.2) of the second-order efficient estimators is also represented by an algebraic version (6.3) where $h_j(X,u,\eta _u,t)=\tilde{f}_j(u,(\tilde{e}_i^\top t)_{i=1}^p,(\tilde{e}_i^\top (X-\eta _u))_{i=1}^p)$ with a function $\tilde{f}_j(u,v,\tilde{v})\in \mathbb {R}[u,\tilde{v}][v]_3$ such that $\tilde{f}(u,v,v)=f(u,v)$.

Proof

Take the Euclidean inner product of both sides of (6.2) with each $\tilde{e}_j$ which is a vector Euclidean orthogonal to the subspace $span(\{e_i |i\ne j\})$ and obtain a system of polynomial equations. By eliminating variables $v$ from the polynomial equations, an algebraic version is obtained. $\square $

Theorem 3

Every algebraic equation (6.3) gives a second-order efficient estimator (6.1).

Proof

Writing $X=\eta (u,v)$ in (6.3), we obtain

$$\begin{aligned} (\eta (u,v)-\eta (u,0))^\top \tilde{e}_j(u) + c \cdot h_j(\eta (u,v),u,\eta (u,0),\eta (u,v)-\eta (u,0))=0. \end{aligned}$$

Partially differentiate this by $v$ twice, we obtain

$$\begin{aligned} \left. \left( \frac{\partial ^2 \eta (u,v)}{\partial v^\lambda \partial v^\kappa }\right) ^\top \tilde{e}_j(u)\right| _{v=0} = 0, \end{aligned}$$

since each term of $h_j(\eta (u,v),u,\eta (u,0),\eta (u,v)-\eta (u,0))$ has degree more than 3 in its third component $(\eta _i(u,v)-\eta _i(u,0))_{i=1}^{d}$ and $\left. \eta (u,v)-\eta (u,0)\right| _{v=0}=0$. Since $\mathrm{span}\{\tilde{e}_j(u); j=1,\ldots ,p\}=((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}=\mathrm{span}\{\bar{G} \partial _{u_a} \eta ;a=1,\ldots ,p\}$, we obtain

$$\begin{aligned} \left. \varGamma _{\kappa \lambda a}^{(m)} \right| _{v=0} = \left. \frac{\partial ^2 \eta _i}{\partial v^\lambda \partial v^\kappa } g^{ij} \frac{\partial \eta _j}{\partial u^a} \right| _{v=0} = 0. \end{aligned}$$

This implies the estimator is second-order efficient. $\square $

By Theorems 1, 2 and 3, the relationship between the three forms of the second-order efficient algebraic estimators is summarized as

$$ (1) \Leftarrow (2) \Rightarrow (3) \Rightarrow (1). $$

Furthermore, if we assume the estimator has a form $\eta \in \mathbb {R}(u)[v]$, that is a polynomial in $v$ with coefficients rational in $u$, every first-order efficient estimator satisfying (6.1) can be written in a form (6.2) after resetting coordinates $v$ for the estimating manifold. In this sense, we can say $(1) \Rightarrow (2)$ and the following corollary holds.

Corollary 1

If $\eta \in \mathbb {R}(u)[v]$, the forms (1), (2) and (3) are equivalent.

6.3.5 Properties of the Estimators

The following theorem is a straightforward extension of the local existence of MLE. That is to say, the existence for sufficiently large sample size. The regularity conditions are essentially the same as for the MLE but with an additional condition referring to the control constant $c$.

Proposition 1

(Existence and uniqueness of the estimate) Assume that the Fisher matrix is non-degenerate around $\eta (u^*)\in \mathcal {V}_E$. Then the estimate given by (6.2) or (6.3) locally uniquely exists for small $c$, i.e. there is a neighborhood $G(u^*)\subset \mathbb {R}^d$ of $\eta (u^*)$ and $\delta >0$ such that for every fixed $X \in G(u^*)$ and $-\delta <c<\delta $, a unique estimate exists.

Proof

Under the condition of the theorem, the MLE always exists locally. Furthermore, because of the nonsingular Fisher matrix, the MLE is locally bijective (by the implicit representation theorem). Thus $(u_1,\ldots ,u_p)\mapsto (g_1(x-\eta _u),\ldots ,g_p(x-\eta _u))$ for $g_j(x-\eta _u):=(X-\eta _u)^\top \tilde{e}_j(u,\eta _u)$ in (6.3) is locally bijective. Since $\{g_i\}$ and $\{h_i\}$ are continuous, we can select $\delta >0$ for (6.3) to be locally bijective for every $-\delta <c<\delta $. $\square $

6.3.6 Summary of Estimator Construction

We summarize how to define a second-order efficient algebraic estimator (vector version) and how to compute an algebraic version from it.

Input:

a potential function $\psi $ satisfying (C2),
polynomial equations of $\eta ,u$ and $v$ satisfying (C3),
$m_1,\ldots ,m_{d-p}\in \mathbb {R}[\eta ]$ such that $\mathcal {V}_E=V(m_1,\ldots ,m_{d-p})$ gives the model,
$f_j \in \mathbb {R}[u][v]_{\ge 3}$ and $c\in \mathbb {R}$ for a vector version

Step 1:

Compute $\psi $ and $\theta (\eta )$, $G(\eta )$, ($\varGamma ^{(m)}(\eta )$ for bias correction)

Step 2:

Compute $f_{ai} \in \mathbb {R}[\eta ][\xi _{11},\ldots ,\xi _{pd}]_1$ s.t. $f_{aj}(\xi _{11},\ldots ,\xi _{pd}):=\partial _{u^a} m_j$ for $ \xi _{b i} :=\partial _{u^b}\eta _i.$

Step 3:

Find $e_{p+1},\ldots ,e_{d}\in (\nabla _u \eta )^{\perp _{\bar{G}}}$ by eliminating $\{\xi _{aj}\}$ from $\langle e_i,\partial _{u^a} \eta \rangle _{\bar{G}}=e_{ik}(\eta ) g^{kj}(\eta ) \xi _{aj}=0$ and $f_{aj}(\xi _{11},\ldots ,\xi _{pd})=0$.

Step 4:

Select $e_{1},\ldots ,e_p \in \mathbb {R}[\eta ]$ s.t. $e_1(\eta ),\ldots ,e_d(\eta )$ are linearly independent.

Step 5:

Eliminate $v$ from

$$ X=\eta (u,0)+\sum \nolimits _{i=p+1}^{d} v_{i-p} e_i+ c\cdot \sum \nolimits _{j=1}^p f_j(u,v) e_j $$

and compute $(X-\eta )^\top \tilde{e}_j$ and $h\in (\mathbb {R}[\eta ][X-\eta ]_3)^p$, given by Theorem 2.

Output(Vector version):

$$ X=\eta (u,0)+\sum \nolimits _{i=p+1}^{d} v_{i-p} e_i(\eta )+ c\cdot \sum \nolimits _{j=1}^p f_j(u,v) e_j(\eta ). $$

Output(Algebraic version):

$$ (X-\eta )^\top \tilde{e}+c \cdot h(X-\eta )=0. $$

6.3.7 Reduction of the Degree of the Estimating Equations

As we noted in Sect. 6.3.4, if we set $h_j=0$ for all $j$, the estimator becomes the MLE. In this sense, $c h_j$ can be recognized as a perturbation from the likelihood equations. If we select each $h_j(X,u,\eta _u,t) \in \mathbb {R}[X,u,\eta _u][t]_3$ tactically, we can reduce the degree of the polynomial estimating equation. For algebraic background, the reader refers to Appendix A.

Here, we assume $u\in \mathbb {R}[\eta _u]$. For example, we can set $u_i=\eta _i$. Then $\tilde{e}_j(u,\eta _u)$ is a function of $\eta _u$, so we write it as $\tilde{e}_j(\eta )$. Define an ideal $\mathcal {I}_3$ of $\mathbb {R}[X,\eta ]$ as

$$ \mathcal {I}_3:=\langle \{(X_i-\eta _i)(X_j-\eta _j)(X_k-\eta _k)\mid 1\le i,j,k \le d \}\rangle . $$

Select a monomial order $\prec $ and set $\eta _1\succ \cdots \succ \eta _d \succ X_1\succ \cdots \succ X_d$. Let $G_{\prec }=\{g_1,\ldots ,g_m\}$ be a Gröbner basis of $\mathcal {I}_3$ with respect to $\prec $. Then the remainder (normal form)$r_j$ of $(X-\eta )^\top \tilde{e}_j(\eta )$, the first term of the l.h.s. of (6.3), with respect to $G_{\prec }$, is uniquely determined for each $j$.

Theorem 4

If the monomial order $\prec $ is the pure lexicographic,

1.
$r_j$ for $j=1,\ldots , p$ has degree at most 2 with respect to $\eta $, and
2.
$r_j=0$ for $j=1,\ldots , p$ are the estimating equations for a second-order efficient estimator.

Proof

Assume $r_j$ has a monomial term whose degree is more than 2 with respect to $\eta $ and represent the term as $\eta _a \eta _b \eta _c q(\eta ,X)$ with a polynomial $q\in \mathbb {R}(\eta ,X)$ and a combination of indices $a,b,c$. Then $\{\eta _a \eta _b \eta _c + (X_a-\eta _a)(X_a-\eta _a)(X_a-\eta _a)\}q(\eta ,X)$ has a smaller polynomial order than $\eta _a \eta _b \eta _c q(\eta ,X)$ since $\prec $ is pure lexicographic satisfying $\eta _1\succ \cdots \succ \eta _d \succ X_1\succ \cdots \succ X_d$. Therefore by subtracting $(X_a-\eta _a)(X_a-\eta _a)(X_a-\eta _a)\}q(\eta ,X)\in \mathcal {I}_3$ from $r_j$, the polynomial degree decreases. This contradicts the fact $r_j$ is the normal form so each $r_j$ has degree at most 2.

Furthermore each polynomial in $\mathcal {I}_3$ is in $\mathbb {R}[X,u,\eta _u][X-\eta ]_3$ and therefore by taking the normal form, the condition for the algebraic version (6.3) of second-order efficiency still holds. $\square $

The reduction of the degree is important when we use algebraic algorithms such as homotopy continuation methods [18] to solve simultaneous polynomial equations since computational cost depends highly on the degree of the polynomials.

6.4 First-Order Efficiency

It is not surprising that, for first-order efficiency, almost the same arguments hold as for second-order efficiency.

By Theorem 5.2 of [4], a consistent estimator is first-order efficient if and only if

$$\begin{aligned} g_{\kappa a}=0. \end{aligned}$$

(6.4)

Consider an algebraic estimator $\eta (u,v)\in \mathbb {R}[u,v]^d$ satisfying the following vector equation:

$$\begin{aligned} X=\eta (u,0)+\sum _{i=p+1}^{d} v_{i-p} e_i(u)+ c\cdot \sum _{j=1}^p f_j(u,v) e_j(u) \end{aligned}$$

(6.5)

where, for each $u$, $\{e_j(u); j=1,\ldots ,p\}\cup \{e_i(u); i=p+1,\ldots ,d\}$ is a complete basis of $\mathbb {R}^d$ s.t. $\langle e_j(u),(\bigtriangledown _u \eta )\rangle _g=0$ and $f_j(u,v)\in \mathbb {R}[u][v]_{\ge 2}$, a polynomial whose degree of $v$ is at least 2, for $j=1,\ldots ,p$. Similarly, $c\in \mathbb {R}$ is a constant for perturbation. Here, the only difference between (6.2) for the second-order efficiency and (6.5) for the first-order efficiency is the degree of the $f_j(u,v)$ with respect to $v$.

The algebraic version of the first-order efficient algebraic estimator is defined by the following simultaneous polynomial equalities with $\eta _u=\eta (u,0)$.

$$\begin{aligned} (X-\eta _u)^\top \tilde{e}_j(u,\eta _u)&+c\cdot h_j(X,u,\eta _u,X-\eta _u)=0 \text{ for } j=1,\ldots ,p \end{aligned}$$

(6.6)

where $\{\tilde{e}_j(u,\eta _u)\in \mathbb {R}[u,\eta _u]^d; j=1,\ldots ,p\} $ span $((\nabla _u \eta (u,0))^{\perp _{\bar{G}}})^{\perp _E}$ for every $u$ and $h_j(X,u,\eta _u,t) \in \mathbb {R}[X,u,\eta _u][t]_2$ ($\text{ degree }=2$ w.r.t. $t$) for $j=1,\ldots ,p$. Here, the only difference between (6.3) for the second-order efficiency and (6.6) for the first-order efficiency is the degree of the $h_j(X,u,\eta _u,t)$ with respect to $t$.

Then the relation between the three different forms of first-order efficiency can be proved in the same way manner as for Theorem 1, 2 and 3.

Theorem 5

(i) Vector version (6.5) satisfies the first-order efficiency.

(ii) An estimator defined by a vector version (6.5) of the first-order efficient estimators is also represented by an algebraic version (6.6).

(iii) Every algebraic version (6.6) gives a first-order efficient estimator.

The relationship between the three forms of the first-order efficient algebraic estimators is summarized as $(4) \Leftarrow (5) \Rightarrow (6) \Rightarrow (4)$. Furthermore, if we assume the estimator has a form $\eta \in \mathbb {R}(u)[v]$, the forms (6.4), (6.5) and (6.6) are equivalent.

Let $\mathcal {R}:=\mathbb {Z}[X,\eta ]$ and define

$$\begin{aligned} \mathcal {I}_2:=\langle \{(X_i-\eta _i)(X_j-\eta _j)\mid 1\le i,j \le d \}\rangle \end{aligned}$$

as an ideal of $\mathcal {R}$. In a similar manner, let $\prec $ be a monomial order such that $\eta _1\succ \cdots \succ \eta _d \succ X_1\succ \cdots \succ X_d$. Let $G_{\prec }=\{g_1,\ldots ,g_m\}$ be a Gröbner basis of $\mathcal {I}_2$ with respect to $\prec $. The properties of the normal form $r_i$ of $(X-\eta (u,0))^\top \tilde{e}_i(u)$ with respect to $G_{\prec }$ are then covered by the following:

Theorem 6

If the monomial order $\prec $ is the pure lexicographic,

(i) $r_i$ for $i=1,\ldots , d$ has degree at most 1 with respect to $\eta $, and

(ii) $r_i=0$ for $i=1,\ldots , d$ are the estimating equations for a first-order efficient estimator.

6.5 Examples

In this section, we show how to use the algebraic computation to design asymptotically efficient estimators for two simple examples. The examples satisfy the algebraic conditions (C1), (C2) and (C3) so it is verified that necessary geometric entities have an algebraic form as mentioned in Sect. 6.3.2.

6.5.1 Example: Periodic Gaussian Model

The following periodic Gaussian model shows how to compute second-order efficients estimators and their biases.

Statistical Model:
$$\begin{aligned} X\sim N(\mu , \varSigma (a))\, \text {with}\, \mu = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}\,\text {and}\, \varSigma (a)= \begin{bmatrix} 1&a&a^2&a \\ a&1&a&a^2 \\ a^2&a&1&a \\ a&a^2&a&1 \end{bmatrix}\, \text {for}\, 0\le a < 1. \end{aligned}$$
Here, the dimension of the full exponential family and the curved exponential family are $d=3$ and $p=1$, respectively.
Curved exponential family:
$$ \log f(x|\theta ) = 2 \left( x_1 x_2 + x_2 x_3 + x_3 x_4 + x_4 x_1\right) \theta _{{2}}+ 2\left( x_{{3}}x_{{1}}+x_{{4}}x_{{2}} \right) \theta _{{3}}-\psi (\theta ), $$
Potential function:
$$\begin{aligned} \psi (\theta )=~-~1/2\,\log ( {\theta _{{1}}}^{4}~-~4\,{\theta _{{1}}}^{2}{\theta _{{2} }}^{2}~+~8\,\theta _{{1}}{\theta _{{2}}}^{2}\theta _{{3}}~-~2\,{\theta _{{1}}} ^{2}{\theta _{{3}}}^{2}~-~4\,{\theta _{{2}}}^{2}{\theta _{{3}}}^{2}~+~{\theta _{{3}}}^{4} )~+~2\,\log ( 2\,\pi ), \end{aligned}$$
Natural parameter:
$$ \theta (a)=\left[ \frac{1}{1-2a^2+4a^4},-\frac{a}{1-2a^2+4a^4}, \frac{a^2}{1-2a^2+4a^4}\right] ^\top , $$
Expectation parameter: $\eta (a)=[-2,-4a,-2a^2]^\top ,$
Fisher metric with respect to $\eta $:
$$ (g^{ij})=\left[ {\small \begin{array}{ccc} 2\,{a}^{4}+4\,{a}^{2}+2&{}8\,a \left( 1+{a}^ {2} \right) &{}8\,{a}^{2}\\ 8\,a \left( 1+{a}^{2} \right) &{}4+24\,{a}^{2}+4\,{a}^{4}&{}8\,a \left( 1+{a}^{2} \right) \\ 8\,{a}^{2}&{}8\,a \left( 1+{a}^{2} \right) &{}2\,{a}^ {4}+4\,{a}^{2}+2\end{array} }\right] , $$
A set of vectors $e_i\in \mathbb {R}^3$:
$$ e_0(a):=[0,-1,a]^\top \in \partial _a \eta (a), $$

$$ e_1(a):=[3a^2+1,4a,0]^\top ,~ e_2(a):=[-a^2-1,0,2]^\top \in (\partial _a \eta (a))^{\perp _{\bar{G}}}. $$
A vector version of the second-order efficient estimator is, for example,
$$ x-\eta +v_1\cdot e_1 + v_2 \cdot e_2 +c \cdot v_1^3\cdot e_0=0. $$
A corresponding algebraic version of the second-order efficient estimator: by eliminating $v_1$ and $v_2$, we get $g(a)+c\cdot h(a)=0$ where
$$\begin{aligned} g(a):=8( a~-~1) ^{2} ( a~+~1) ^{2} ( 1~+~2{a}^{2}) ^{2} ( 4{a}^{5}~-8{a}^{3}~+~2{a}^{3}{ x_3}~-~3{ x_2}{a}^{2}~+~4a~+~4a{ x_1}~+~2a{ x_3}-{ x_2}) \end{aligned}$$
and
$$ h(a):=( 2{a}^{4}+{a}^{3}{ x_2}-{a}^{2}{ x_3}+2{a}^{2}+a{ x_2 }-2{ x_1}-{ x_3}-4) ^{3}. $$
An estimating equation for MLE:
$$ 4{a}^{5}-8{a}^{3}+2{a}^{3}{ x_3}-3{ x_2}{a}^{2}+4a+4a{ x_1}+2a{ x_3}-{ x_2}=0. $$
Bias correction term for an estimator ${\hat{a}}$: $\hat{a} ( {\hat{a}}^{8}-4{\hat{a}}^{6}+6{\hat{a}}^{4}-4{\hat{a}}^{2}+1 ) / ( 1+2{\hat{a}}^{2} ) ^{2}.$

6.5.2 Example: Log Marginal Model

Here, we consider a log marginal model. See [7] for more on marginal models.

Statistical model (Poisson regression):

$X_{ij} \displaystyle \mathop {\sim }^{ i.i.d} \mathrm{Po}(Np_{ij})$ s.t. $p_{ij}\in (0,1)$ for $i=1,2$ and $j=1,2,3$ with model constraints:
$$\begin{aligned} p_{11} + p_{12} + p_{13}&+p_{21} + p_{22} + p_{23} = 1,\nonumber \\ p_{11} + p_{12} + p_{13}&=p_{21} + p_{22} + p_{23},\nonumber \\ \displaystyle \frac{p_{11}/p_{21}}{p_{12}/p_{22}}&=\frac{p_{12}/p_{22}}{p_{13}/p_{23}}. \end{aligned}$$
(6.7)
Condition (6.7) can appear in a statistical test of whether acceleration of the ratio $p_{1j}/p_{2j}$ is constant.

In this case, $d=6$ and $p=3$.
Log density w.r.t. the point mass measure on $\mathbb {Z}_{\ge 0}^6$:
$$ \log f(x|p)=\log \left\{ \prod _{ij} e^{-Np_{ij}} (Np_{ij})^{X_{ij}}\right\} = -N + \sum _{ij} X_{ij} \log (Np_{ij}). $$
The full expectation family is given by
$$ \begin{bmatrix} X_{1}&X_{2}&X_{3} \\ X_{4}&X_{5}&X_{6} \\ \end{bmatrix} := \begin{bmatrix} X_{11}&X_{12}&X_{13} \\ X_{21}&X_{22}&X_{23} \\ \end{bmatrix} ,$$

$$ \begin{bmatrix} \eta _{1}&\eta _{2}&\eta _{3} \\ \eta _{4}&\eta _{5}&\eta _{6} \\ \end{bmatrix} = N \begin{bmatrix} p_{11}&p_{12}&p_{13} \\ p_{21}&p_{22}&p_{23} \\ \end{bmatrix} ,$$
$\theta ^i=\log (\eta _i)$ and $\psi (\theta )=N$.
The Fisher metric w.r.t. $\theta $: $g_{ij}=\frac{\partial ^2 \psi }{\partial \theta ^i \partial \theta ^j}=\delta _{ij}\eta _i$.
Selection of the model parameters:
$$ [u_1,u_2,u_3]:=[\eta _1,\eta _3,\eta _5] \,\mathrm{{and}}\, [v_1,v_2,v_3]:=[\eta _2,\eta _4,\eta _6]. $$
A set of vectors $e_i\in \mathbb {R}^6$:
$$ e_0:= \begin{bmatrix} {\eta _{2}^2 (\eta _{4}-\eta _{6})}\\{-\eta _{2}^2 (\eta _{4}-\eta _{6})}\\{0}\\{-\eta _{3} \eta _{5}^2-2 \eta _{2} \eta _{4} \eta _{6}}\\{0}\\{\eta _{3} \eta _{5}^2+2 \eta _{2} \eta _{4} \eta _{6}} \end{bmatrix}\in (\nabla _u \eta ) , $$

$$\begin{aligned}{}[e_1,e_2,e_3]:&= \left[ \begin{bmatrix} \eta _1\\ \eta _2\\ \eta _3\\ 0\\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} {\eta _{1} (-\eta _{1} \eta _{5}^2+\eta _{3} \eta _{5}^2)}\\{\eta _{2} (-\eta _{1} \eta _{5}^2-2 \eta _{2} \eta _{4} \eta _{6})}\\{0}\\{\eta _{4} (\eta _{2}^2 \eta _{4}-\eta _{2}^2 \eta _{6})}\\{\eta _{5} (\eta _{2}^2 \eta _{4}+2 \eta _{1} \eta _{3} \eta _{5})}\\{0} \end{bmatrix}, \begin{bmatrix} {\eta _{1} (\eta _{1} \eta _{5}^2-\eta _{3} \eta _{5}^2)}\\{\eta _{2} (\eta _{1} \eta _{5}^2+2 \eta _{2} \eta _{4} \eta _{6})}\\{0}\\{\eta _{4} (2 \eta _{1} \eta _{3} \eta _{5}+\eta _{2}^2 \eta _{6})}\\{0}\\{\eta _{6} (\eta _{2}^2 \eta _{4}+2 \eta _{1} \eta _{3} \eta _{5})} \end{bmatrix} \right] \\&\quad \in ((\nabla _u \eta )^{\perp _{\bar{G}}})^3 \end{aligned}$$
A vector version of the second-order efficient estimator is, for example,
$$ X-\eta +v_1\cdot e_1 + v_2 \cdot e_2 + v_3 \cdot e_3 + c \cdot v_1^3\cdot e_0=0. $$
The bias correction term of the estimator = 0.
A set of estimating equations for MLE:

{$x_{{1}}{\eta _{{2}}}^{2}{\eta _{{4}}}^{2}\eta _{{6}}-x_{{1}}{\eta _{{2}}}^{2}\eta _{{4}}{ \eta _{{6}}}^{2}-x_{{2}}\eta _{{1}}\eta _{{2}}{\eta _{{4}}}^{2}\eta _{{6}}+x_{{2}}\eta _{{1}}\eta _{{2}}\eta _{{4}}{\eta _{{6}}}^{2}-2\,x_{{4}}\eta _{{1}}\eta _{{2}}\eta _{{4}}{\eta _{{6}}}^{2 }-x_{{4}}\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}\eta _{{6}}+2\,x_{{6}}\eta _{{1}}\eta _{{2}}{\eta _{{4}}}^{2}\eta _{{6}}+x_{{6}}\eta _{{1}}\eta _{{3}}\eta _{{4}}{\eta _{{5}}}^{2}$,

$-x_{{2}}\eta _{{2}}\eta _{{3}}{\eta _{{4}}}^{2}\eta _{{6}}+x_{{2}}\eta _{{2}}\eta _{{3}}\eta _{{4}}{\eta _{{6}} }^{2}+x_{{3}}{\eta _{{2}}}^{2}{\eta _{{4}}}^{2}\eta _{{6}}-x_{{3}}{\eta _{{2}}}^{2}\eta _{ {4}}{\eta _{{6}}}^{2}-x_{{4}}\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}\eta _{{6}}-2\,x_{{4}} \eta _{{2}}\eta _{{3}}\eta _{{4}}{\eta _{{6}}}^{2}+x_{{6}}\eta _{{1}}\eta _{{3}}\eta _{{4}}{\eta _{{5} }}^{2}+2\,x_{{6}}\eta _{{2}}\eta _{{3}}{\eta _{{4}}}^{2}\eta _{{6}}$,

$-2\,x_{{4}}\eta _{{1}} \eta _{{3}}{\eta _{{5}}}^{2}\eta _{{6}}-x_{{4}}{\eta _{{2}}}^{2}\eta _{{4}}\eta _{{5}}\eta _{{6}}+ x_{{5}}{\eta _{{2}}}^{2}{\eta _{{4}}}^{2}\eta _{{6}}-x_{{5}}{\eta _{{2}}}^{2}\eta _{{4}}{\eta _{{6}}}^{2}+2\,x_{{6}}\eta _{{1}}\eta _{{3}}\eta _{{4}}{\eta _{{5}}}^{2}+x_{{6}}{\eta _{{2 }}}^{2}\eta _{{4}}\eta _{{5}}\eta _{{6}}$,

$\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}-{\eta _{{2}}}^{2} \eta _{{4}}\eta _{{6}}$, $\eta _{{1}}+\eta _{{2}}+\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}$, $-\eta _{{1} }-\eta _{{2}}-\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}+1$}

The total degree of the equations is $5\times 5\times 5 \times 4\times 1\times 1= 500.$
A set of estimating equations for a 2nd-order efficient estimator with degree at most 2:

{\(-3\,x_{{1}}x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}+6\,x_{{1}}x_{{2}}{x_{{4 }}}^{2}x_{{6}}\eta _{{6}}+x_{{1}}x_{{2}}{x_{{4}}}^{2}\eta _{{2}}\eta _{{6}}-2\,x_{ {1}}x_{{2}}{x_{{4}}}^{2}{\eta _{{6}}}^{2}+3\,x_{{1}}x_{{2}}x_{{4}}{x_{{6}} }^{2}\eta _{{2}}-6\,x_{{1}}x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{4}}+2\,x_{{1}}x_ {{2}}x_{{4}}x_{{6}}\eta _{{2}}\eta _{{4}}-2\,x_{{1}}x_{{2}}x_{{4}}x_{{6}}\eta _{{2 }}\eta _{{6}}-x_{{1}}x_{{2}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{4}}+2\,x_{{1}}x_{{2}}{ x_{{6}}}^{2}{\eta _{{4}}}^{2}+3\,x_{{1}}x_{{3}}x_{{4}}{x_{{5}}}^{2}\eta _{{6}} -2\,x_{{1}}x_{{3}}x_{{4}}x_{{5}}\eta _{{5}}\eta _{{6}}-3\,x_{{1}}x_{{3}}{x_{{5 }}}^{2}x_{{6}}\eta _{{4}}+2\,x_{{1}}x_{{3}}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{5}}+x_ {{1}}{x_{{4}}}^{2}x_{{6}}{\eta _{{2}}}^{2}-2\,x_{{1}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{6}}-x_{{1}}x_{{4}}{x_{{5}}}^{2}\eta _{{3}}\eta _{{6}}-x_{{1}}x_{{4}} {x_{{6}}}^{2}{\eta _{{2}}}^{2}+2\,x_{{1}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{4} }+x_{{1}}{x_{{5}}}^{2}x_{{6}}\eta _{{3}}\eta _{{4}}+3\,{x_{{2}}}^{2}{x_{{4}}}^ {2}x_{{6}}\eta _{{1}}-{x_{{2}}}^{2}{x_{{4}}}^{2}\eta _{{1}}\eta _{{6}}-3\,{x_{{2}} }^{2}x_{{4}}{x_{{6}}}^{2}\eta _{{1}}-2\,{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{1}} \eta _{{4}}+2\,{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{1}}\eta _{{6}}+{x_{{2}}}^{2}{x_{ {6}}}^{2}\eta _{{1}}\eta _{{4}}-x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{1}}\eta _{{2}}-2\,x _{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{1}}\eta _{{6}}+x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _ {{1}}\eta _{{2}}+2\,x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{1}}\eta _{{4}}-x_{{3}}x_{{4 }}{x_{{5}}}^{2}\eta _{{1}}\eta _{{6}}+x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{1}}\eta _{{4} }\),

\(3\,x_{{1}}x_{{3}}x_{{4}}{x_{{5}}}^{2}\eta _{{6}}-2\,x_{{1}}x_{{3}}x_{{4} }x_{{5}}\eta _{{5}}\eta _{{6}}-3\,x_{{1}}x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{4}}+2 \,x_{{1}}x_{{3}}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{5}}-x_{{1}}x_{{4}}{x_{{5}}}^{ 2}\eta _{{3}}\eta _{{6}}+x_{{1}}{x_{{5}}}^{2}x_{{6}}\eta _{{3}}\eta _{{4}}+3\,{x_{{2}} }^{2}{x_{{4}}}^{2}x_{{6}}\eta _{{3}}-{x_{{2}}}^{2}{x_{{4}}}^{2}\eta _{{3}}\eta _{{ 6}}-3\,{x_{{2}}}^{2}x_{{4}}{x_{{6}}}^{2}\eta _{{3}}-2\,{x_{{2}}}^{2}x_{{4} }x_{{6}}\eta _{{3}}\eta _{{4}}+2\,{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{3}}\eta _{{6}}+{x _{{2}}}^{2}{x_{{6}}}^{2}\eta _{{3}}\eta _{{4}}-3\,x_{{2}}x_{{3}}{x_{{4}}}^{2}x _{{6}}\eta _{{2}}+6\,x_{{2}}x_{{3}}{x_{{4}}}^{2}x_{{6}}\eta _{{6}}+x_{{2}}x_{{ 3}}{x_{{4}}}^{2}\eta _{{2}}\eta _{{6}}-2\,x_{{2}}x_{{3}}{x_{{4}}}^{2}{\eta _{{6}}} ^{2}+3\,x_{{2}}x_{{3}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}-6\,x_{{2}}x_{{3}}x_{ {4}}{x_{{6}}}^{2}\eta _{{4}}+2\,x_{{2}}x_{{3}}x_{{4}}x_{{6}}\eta _{{2}}\eta _{{4}} -2\,x_{{2}}x_{{3}}x_{{4}}x_{{6}}\eta _{{2}}\eta _{{6}}-x_{{2}}x_{{3}}{x_{{6}}} ^{2}\eta _{{2}}\eta _{{4}}+2\,x_{{2}}x_{{3}}{x_{{6}}}^{2}{\eta _{{4}}}^{2}-x_{{2}} {x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{3}}-2\,x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{3 }}\eta _{{6}}+x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{3}}+2\,x_{{2}}x_{{4}}{ x_{{6}}}^{2}\eta _{{3}}\eta _{{4}}+x_{{3}}{x_{{4}}}^{2}x_{{6}}{\eta _{{2}}}^{2}-2 \,x_{{3}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{6}}-x_{{3}}x_{{4}}{x_{{5}}}^{2 }\eta _{{1}}\eta _{{6}}-x_{{3}}x_{{4}}{x_{{6}}}^{2}{\eta _{{2}}}^{2}+2\,x_{{3}}x_{ {4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{4}}+x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{1}}\eta _{{ 4}}\),

\(6\,x_{{1}}x_{{3}}x_{{4}}{x_{{5}}}^{2}\eta _{{6}}-4\,x_{{1}}x_{{3}}x_{{ 4}}x_{{5}}\eta _{{5}}\eta _{{6}}-6\,x_{{1}}x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{4}}+ 4\,x_{{1}}x_{{3}}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{5}}-2\,x_{{1}}x_{{4}}{x_{{5} }}^{2}\eta _{{3}}\eta _{{6}}+2\,x_{{1}}{x_{{5}}}^{2}x_{{6}}\eta _{{3}}\eta _{{4}}+3\,{ x_{{2}}}^{2}{x_{{4}}}^{2}x_{{6}}\eta _{{5}}-{x_{{2}}}^{2}{x_{{4}}}^{2}\eta _{{ 5}}\eta _{{6}}-3\,{x_{{2}}}^{2}x_{{4}}x_{{5}}x_{{6}}\eta _{{4}}+3\,{x_{{2}}}^{ 2}x_{{4}}x_{{5}}x_{{6}}\eta _{{6}}+{x_{{2}}}^{2}x_{{4}}x_{{5}}\eta _{{4}}\eta _{{6 }}-{x_{{2}}}^{2}x_{{4}}x_{{5}}{\eta _{{6}}}^{2}-3\,{x_{{2}}}^{2}x_{{4}}{x_ {{6}}}^{2}\eta _{{5}}-{x_{{2}}}^{2}x_{{4}}x_{{6}}\eta _{{4}}\eta _{{5}}+{x_{{2}}}^ {2}x_{{4}}x_{{6}}\eta _{{5}}\eta _{{6}}+{x_{{2}}}^{2}x_{{5}}x_{{6}}{\eta _{{4}}}^{ 2}-{x_{{2}}}^{2}x_{{5}}x_{{6}}\eta _{{4}}\eta _{{6}}+{x_{{2}}}^{2}{x_{{6}}}^{2 }\eta _{{4}}\eta _{{5}}-2\,x_{{2}}{x_{{4}}}^{2}x_{{6}}\eta _{{2}}\eta _{{5}}+2\,x_{{2} }x_{{4}}x_{{5}}x_{{6}}\eta _{{2}}\eta _{{4}}-2\,x_{{2}}x_{{4}}x_{{5}}x_{{6}}\eta _ {{2}}\eta _{{6}}+2\,x_{{2}}x_{{4}}{x_{{6}}}^{2}\eta _{{2}}\eta _{{5}}-2\,x_{{3}}x_ {{4}}{x_{{5}}}^{2}\eta _{{1}}\eta _{{6}}+2\,x_{{3}}{x_{{5}}}^{2}x_{{6}}\eta _{{1}} \eta _{{4}}\),

$\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}-{\eta _{{2}}}^{2}\eta _{{4}}\eta _{{6}}$, $\eta _{{1}}+\eta _{{2}}+\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}$ , $-\eta _{{1}}-\eta _{{2}}-\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}+1$}.

The total degree of the polynomial equations is $32$.
A set of estimating equations for a first-order-efficient estimator with degree at most 1:

{$-{x_{{5}}}^{2}x_{{4}}\eta _{{6}}x_{{1}}x_{{3}}+{x_{{5}}}^{2}x_{{6}}\eta _{{4} }x_{{1}}x_{{3}}+2\,{x_{{6}}}^{2}\eta _{{4}}x_{{1}}x_{{2}}x_{{4}}-2\,{x_{{4 }}}^{2}\eta _{{6}}x_{{1}}x_{{2}}x_{{6}}-{x_{{6}}}^{2}x_{{1}}x_{{2}}\eta _{{2}} x_{{4}}+{x_{{4}}}^{2}x_{{1}}x_{{2}}\eta _{{2}}x_{{6}}+{x_{{2}}}^{2}{x_{{6} }}^{2}\eta _{{1}}x_{{4}}-{x_{{4}}}^{2}{x_{{2}}}^{2}\eta _{{1}}x_{{6}} ,$

$ -{x_{{5} }}^{2}x_{{4}}\eta _{{6}}x_{{1}}x_{{3}}+{x_{{5}}}^{2}x_{{6}}\eta _{{4}}x_{{1}}x _{{3}}+2\,{x_{{6}}}^{2}\eta _{{4}}x_{{2}}x_{{3}}x_{{4}}-2\,{x_{{4}}}^{2}\eta _ {{6}}x_{{2}}x_{{3}}x_{{6}}-{x_{{6}}}^{2}x_{{2}}x_{{3}}\eta _{{2}}x_{{4}}+{ x_{{4}}}^{2}x_{{2}}x_{{3}}\eta _{{2}}x_{{6}}+{x_{{2}}}^{2}{x_{{6}}}^{2}\eta _{ {3}}x_{{4}}-{x_{{4}}}^{2}{x_{{2}}}^{2}\eta _{{3}}x_{{6}} ,$

$ -2\,{x_{{5}}}^{2} x_{{4}}\eta _{{6}}x_{{1}}x_{{3}}+2\,{x_{{5}}}^{2}x_{{6}}\eta _{{4}}x_{{1}}x_{{ 3}}-x_{{4}}x_{{6}}x_{{5}}{x_{{2}}}^{2}\eta _{{6}}+x_{{4}}x_{{5}}{x_{{2}}}^ {2}\eta _{{4}}x_{{6}}-{x_{{4}}}^{2}{x_{{2}}}^{2}\eta _{{5}}x_{{6}}+x_{{4}}{x_{ {6}}}^{2}{x_{{2}}}^{2}\eta _{{5}} ,$

$\eta _{{1}}\eta _{{3}}{\eta _{{5}}}^{2}-{\eta _{{2}}}^{2}\eta _{{4}}\eta _{{6}},$ $\eta _{{1}}+\eta _{{2}}+\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}$ , $-\eta _{{1}}-\eta _{{2}}-\eta _{{3}}-\eta _{{4}}-\eta _{{5}}-\eta _{{6}}+6$}.

The estimating equations for a second-order-efficient estimator above look much more complicated than the estimating equation for the MLE, but each term of the first three polynomials are at most degree 2. Thanks to this degree reduction, the computational costs for the estimates become much smaller as we will see in Sect. 6.6.

6.6 Computation

To obtain estimates based on the method of this paper, we need fast algorithms to find the solution of polynomial equations. The authors have carried out computations using homotopy continuation method (matlab program HOM4PS2 by Lee, Li and Tsuai [18]) for the log marginal model in Sect. 6.5.2 and a data $\bar{X}=(1,1,1,1,1,1)$.

The run time to compute each estimate on a standard laptop (Intel(R) Core (TM) i7-2670QM CPU, 2.20 GHz, 4.00 GB memory) is given by Table 6.1. The computation is repeated 10 times and the averages and the standard deviations are displayed. Note the increasing of the speed for the second-order efficient estimators is due to the degree reduction technique. The term “path” in the table heading refers to a primitive iteration step within the homotopy method. In the faster polyhedron version, the solution region is subdivided into polyhedral domains.

Table 6.1 Computational time for each estimate by the homotopy continuation methods

Full size table

Figure 6.3 shows the mean squared error and the computational time of the MLE, the first-order estimator and the second-order efficient estimator of Sect. 6.5.2. The true parameter is set $\eta ^*=(1/6,1/4,1/12,1/12,1/4,1/6)$, a point in the model manifold, and $N$ random samples are generated i.i.d. from the distribution with the parameter. The computation is repeated for exponentially increasing sample sizes $N=1,\ldots ,10^5$. In general, there are multiple roots for polynomial equations and here we selected the root closest to the sample mean by the Euclidean norm. Figure 6.3(1) also shows that the mean squared error is approximately the same for the three estimators, but (2) shows that the computational time is much more for the MLE.

6.7 Discussion

In this paper we have concentrated on reduction of the polynomial degree of the estimating equations and shown the benefits in computation of the solutions. We do not expect the estimators to be closed form, such as a rational polynomial form in the data. The most we can expect is they are algebraic, that is they are the solution of algebraic equations. They lie on a zero dimensional algebraic variety. It is clear that there is no escape from using mixed symbolic-numerical methods. In algebraic statistics the number of solution of the ML equation is called the ML degree. Given that we have more general estimating equations than pure ML equations this points to an extended theory or “quasi” ML degree of efficient estimator degree. The issue of exactly which solution to use as our estimators persists. In the paper we suggest taking the solution closest to the sufficient statistic in the Euclidian metric. We could use other metrics and more theory is needed.

Here we have put forward estimating equations with reduced degree and shown the benefits in terms of computation. But we could have used other criteria for choosing the equations, while remaining in the efficient class. We might prefer to choose an equation which reduces the bias further via decreasing the next order term. There may thus be some trade off between degree and bias.

Beyond the limited ambitions of this paper to look at second-order efficiency lie several other areas, notably hypothesis testing and model selection. But the question is the same: to what extent can we bring the algebraic methods to bear, for example by expressing additional differential forms and curvatures in algebraic terms. Although estimation typically requires a mixture of symbolic and numeric methods in some cases only the computation of the efficient estimate requires numeric procedures and the other computations can be carrying out symbolically.

References

Adler, R.J., Taylor, J.E.: Random Fields and Geometry. Springer Monographs in Mathematics. Springer, New York (2007)
Google Scholar
Amari, S., Nagaoka, H.: Methods of Information Geometry, vol. 191. American Mathematical Society, Providence (2007)
Google Scholar
Amari, S.: Differential geometry of curved exponential families-curvatures and information loss. Ann. Stat. 10, 357–385 (1982)
Google Scholar
Amari, S.: Differential-Geometrical Methods in Statistics. Springer, New York (1985)
Google Scholar
Amari, S., Kumon, M.: Differential geometry of Edgeworth expansions in curved exponential family. Ann. Inst. Stat. Math. 35(1), 1–24 (1983)
MATH MathSciNet Google Scholar
Andersson, S., Madsen, J.: Symmetry and lattice conditional independence in a multivariate normal distribution. Ann. Stat. 26(2), 525–572 (1998)
MATH MathSciNet Google Scholar
Bergsma, W.P., Croon, M., Hagenaars, J.A.: Marginal Models for Dependent, Clustered, and Longitudinal Categorical Data. Springer, New York (2009)
Google Scholar
Buchberger, B.: Bruno Buchberger’s PhD thesis 1965: an algorithm for finding the basis elements of the residue class ring of a zero dimensional polynomial ideal. J. Symbol. Comput. 41(3), 475–511 (2006)
Google Scholar
Cox, D.A., Little, J., O’Shea, D.: Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, 3/e (Undergraduate Texts in Mathematics). Springer, New York (2007)
Google Scholar
Dawid, A.P.: Further comments on some comments on a paper by Bradley Efron. Ann. Stat. 5(6), 1249 (1977)
Article MATH MathSciNet Google Scholar
Drton, M., Sturmfels B., Sullivant, S.: Lectures on Algebraic Statistics. Springer, New York (2009)
Google Scholar
Drton, M.: Likelihood ratio tests and singularities. Ann. Stat. 37, 979–1012 (2009)
Google Scholar
Efron, B.: Defining the curvature of a statistical problem (with applications to second order efficiency). Ann. Stat. 3, 1189–1242 (1975)
Article MATH MathSciNet Google Scholar
Gehrmann, H., Lauritzen, S.L.: Estimation of means in graphical Gaussian models with symmetries. Ann. Stat. 40(2), 1061–1073 (2012)
Article MATH MathSciNet Google Scholar
Gibilisco, P., Riccomagno, E., Rogantin, M.P., Wynn, H.P.: Algebraic and Geometric Methods in Statistics. Cambridge University Press, Cambridge (2009)
Google Scholar
Huber, B., Sturmfels, B.: A polyhedral method for solving sparse polynomial systems. Math. Comput. 64, 1541–1555 (1995)
Article MATH MathSciNet Google Scholar
Kuriki, S., Takemura, A.: On the equivalence of the tube and Euler characteristic methods for the distribution of the maximum of Gaussian fields over piecewise smooth domains. Ann. Appl. Probab. 12(2), 768–796 (2002)
Article MATH MathSciNet Google Scholar
Lee, T.L., Li, T.Y., Tsai, C.H.: HOM4PS2.0: a software package for solving polynomial systems by the polyhedral homotopy continuation method. Computing 83(2–3), 109–133 (2008)
Article MATH MathSciNet Google Scholar
Li, T.Y.: Numerical solution of multivariate polynomial systems by homotopy continuation methods. Acta Numer 6(1), 399–436 (1997)
Article Google Scholar
Naiman, D.Q.: Conservative confidence bands in curvilinear regression. Ann. Stat. 14(3), 896–906 (1986)
Article MATH MathSciNet Google Scholar
Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Stat. 23, 1543–1561 (1995)
Google Scholar
Pistone, G., Wynn, H.P.: Generalised confounding with Gröbner bases. Biometrika 83, 653–666 (1996)
Article MATH MathSciNet Google Scholar
Rao, R.C.: Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37(3), 81–91 (1945)
MATH MathSciNet Google Scholar
Verschelde, J.: Algorithm 795: PHCpack: a general-purpose solver for polynomial systems by homotopy continuation. ACM Trans. Math. Softw. (TOMS) 25(2), 251–276 (1999)
Article MATH Google Scholar
Watanabe, S.: Algebraic analysis for singular statistical estimation. In: Watanabe, O., Yokomori, T. (eds.) Algorithmic Learning Theory. Springer, Berlin (1999)
Google Scholar
Weyl, H.: On the volume of tubes. Am. J. Math. 61, 461–472 (1939)
Article MathSciNet Google Scholar

Download references

Acknowledgments

This paper has benefited from conversations with and advice from a number of colleagues. We should thank Satoshi Kuriki, Tomonari Sei, Wicher Bergsma and Wilfred Kendall. The first author acknowledges support by JSPS KAKENHI Grant 20700258, 24700288 and the second author acknowledges support from the Institute of Statistical Mathematics for two visits in 2012 and 2013 and from UK EPSRC Grant EP/H007377/1. A first version of this paper was delivered at the WOGAS3 meeting at the University of Warwick in 2011. We thank the sponsors. The authors also thank the referees of the short version in GSI2013 and the referees of the first long version of the paper for insightful suggestions.

Author information

Authors and Affiliations

The Institute of Statistical Mathematics, 10-3, Midori-cho, Tachikawa, Tokyo, Japan
Kei Kobayashi
London School of Economics, London, UK
Henry P. Wynn

Authors

Kei Kobayashi
View author publications
You can also search for this author in PubMed Google Scholar
Henry P. Wynn
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kei Kobayashi .

Editor information

Editors and Affiliations

Laboratoire d'Informatique (LIX), Polytechnique School, Palaiseau Cedex, France
Frank Nielsen

Appendices

A Normal Forms

A basic text for the materials in this section is [9]. The rapid growth of modern computational algebra can be credited to the celebrated Buchberger’s algorithm [8].

A monomial ideal $I$ in a polynomial ring $K[x_1,\ldots ,x_n]$ over a field $K$ is an ideal for which there is a collection of monomials $f_1,\ldots , f_m$ such that any $g \in I$ can be expressed as a sum

$$ g = \sum _{i=1}^m g_i(x)f_i(x) $$

with some polynomials $g_i\in K[x_1,\ldots ,x_n]$. We can appeal to the representation of a monomial $x^{\alpha }=x_1^{\alpha _1}\ldots x_n^{\alpha _n}$ by its exponent $\alpha =(\alpha _1,\ldots ,\alpha _n)$. If $\beta \ge 0$ is another exponent then

$$x^{\alpha } x^{\beta } = x^{\alpha + \beta },$$

and $\alpha + \beta $ is in the positive (shorthand for non-negative) “orthant” with corner at $\alpha $. The set of all monomials in a monomial ideal is the union of all positive orthants whose “corners” are given by the exponent vectors of the generating monomial $f_1, \ldots , f_m$. A monomial ordering written $x^{\alpha } \prec x^{\beta }$ is a total (linear) ordering on monomials such that for $\gamma \ge 0$, $x^{\alpha } \prec x^{\beta } \Rightarrow x^{\alpha +\gamma } \prec x^{\beta + \gamma }$. Any polynomial $f(x)$ has a leading terms with respect to $\prec $, written $LT(f)$.

There are, in general, many ways to express a given ideal $I$ as being generated from a basis $I = \langle f_1,\ldots , f_m\rangle $. That is to say, there are many choices of basis. Given an ideal $I$ a set $\{g_1, \ldots g_m\}$ is called a Gröbner basis (G-basis) if:

$$\langle LT(g_1), \ldots , LT(g_m)\rangle \; = \; \langle LT(I)\rangle ,$$

where $\langle LT(I)\rangle $ is the ideal generated by all the monomials in $I$. We sometimes refer to $\langle LT(I) \rangle $ as the leading term ideal. Any ideal $I$ has a Gröbner basis and any Gröbner basis in the ideal is a basis of the ideal.

Given a monomial ordering and an ideal expressed in terms of the G-basis, $I\;=\; \langle g_1,\ldots , g_m\rangle $, any polynomial $f$ has a unique remainder with respect the quotient operation $K[x_1, \ldots , x_k]/I$. That is

$$f = \sum _{i=1}^m s_i(x)g_i(x) + r(x).$$

We call the remainder $r(x)$ the normal form of $f$ with respect to $I$ and write $NF(f)$. Or, to stress the fact that it may depend on $\prec $, we write $NF(f, \prec )$. Given a monomial ordering $\prec $, a polynomial $f=\sum _{\alpha \in L} \theta _{\alpha } x^{\alpha }$ for some $L$ is a normal form with respect to $\prec $ if $x^{\alpha } \notin \langle LT(f) \rangle $ for all $\alpha \in L$. An equivalent way of saying this is: given an ideal $I$ and a monomial ordering $\prec $, for every $f \in K[x_1,\ldots ,x_k]$ there is a unique normal form $NF(f)$ such that $f-NF(f) \in I$.

B Homotopy Continuation Method

Homotopy continuation method is an algorithm to find the solutions of simultaneous polynomial equations numerically. See, for example, [19, 24] for more details of the algorithm and theory.

We will explain the method briefly by a simple example of 2 equations with 2 unknowns

$$ \text {Input:} f, g \in \mathbb {R}[x,y] $$

$$ \text {Output: The solutions of} \,f(x,y)=g(x,y)=0. $$

Step 1 :

Select arbitrary polynomials of the form:

$$\begin{aligned} f_0(x,y):=f_0(x):=a_1x^{d_1}-b_1&=0,\nonumber \\ g_0(x,y):=g_0(y):=a_2y^{d_2}-b_2&=0 \end{aligned}$$

(6.8)

where $d_1= \deg (f)$ and $d_2=\deg (g)$. Polynomial equations in this form are easy to solve.

Step 2 :

Take the convex combinations:

$$\begin{aligned} f_t(x,y):=\,&t f(x,y)+ (1-t) f_0(x,y),\\ g_t(x,y):=\,&t g(x,y)+ (1-t) g_0(x,y) \end{aligned}$$

then our target becomes the solution for $t=1$.

Step 3 :

Compute the solution for $t=\delta $ for small $\delta $ by the solution for $t=0$ numerically.

Step 4 :

Repeat this until we obtain the solution for $t=1$.

Figure 6.4 shows a sketch of the algorithm. This algorithm is called the (linear) homotopy continuation method and justified if the path connects $t=0$ and $t=1$ continuously without an intersection. That can be proved for almost all $a$ and $b$. See [19].

For each computation for the homotopy continuation method, the number of the paths is the number of the solutions of (6.8). In this case, the number of paths is $d_1 d_2$. In general case with $m$ unknowns, it becomes $\prod _{i=1}^m d_i$ and this causes a serious problem for computational cost. Therefore decreasing the degree of second-order efficient estimators plays an important role for the homotopy continuation method.

Note that in order to solve this computational problem, the authors of [16] proposed the nonlinear homotopy continuation methods (or the polyhedral continuation methods). But as we can see in Sect. 6.5.2, the degree of the polynomials still affects the computational costs.

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Kobayashi, K., Wynn, H.P. (2014). Computational Algebraic Methods in Efficient Estimation. In: Nielsen, F. (eds) Geometric Theory of Information. Signals and Communication Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-05317-2_6

Download citation

DOI: https://doi.org/10.1007/978-3-319-05317-2_6
Published: 09 May 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05316-5
Online ISBN: 978-3-319-05317-2
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics

Computational Algebraic Methods in Efficient Estimation

Abstract

Similar content being viewed by others

Asymptotically Efficient Estimators for Algebraic Statistical Manifolds

Recent advances in algebraic geometry and Bayesian statistics

Computational Information Geometry in Statistics: Foundations

Keywords

6.1 Introduction

6.2 Statistical Manifolds and Efficiency of Estimators

6.2.1 Exponential Family and Estimators

6.2.2 Differential Geometrical Entities

6.2.3 Asymptotic Statistical Inference Theory

6.3 Algebraic Models and Efficiency of Algebraic Estimators

6.3.1 Algebraic Curved Exponential Family

6.3.2 Algebraic Estimators

6.3.3 Second-Order Efficient Algebraic Estimators, Vector Version

Theorem 1

6.3.4 Second-Order Efficient Algebraic Estimators, Algebraic Version

Theorem 2

Proof

Theorem 3

Proof

Corollary 1

6.3.5 Properties of the Estimators

Proposition 1

Proof

6.3.6 Summary of Estimator Construction

6.3.7 Reduction of the Degree of the Estimating Equations

Theorem 4

Proof

6.4 First-Order Efficiency

Theorem 5

Theorem 6

6.5 Examples

6.5.1 Example: Periodic Gaussian Model

6.5.2 Example: Log Marginal Model

6.6 Computation

6.7 Discussion

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendices

A Normal Forms

B Homotopy Continuation Method

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation