Abstract
In this paper, we investigate the least squares (LS) estimator of the nonlinear regression model based on extended negatively dependent (END) errors, which form a wide class of dependence structures. Under general conditions, we establish some large deviation results for the LS estimator of the nonlinear regression parameter, which can be applied to obtain a weak uniform consistency and a complete convergence rate for this estimator. In addition, some examples and simulations are presented for illustration.
1 Introduction
1.1 The nonlinear regression model
First, we consider the nonlinear regression model for the observations \(X^n:=\left( X_1,X_2,\ldots ,X_n\right) \):
$$\begin{aligned} X_t=f_t(\theta )+\varepsilon _t,\quad t=1,2,\ldots ,n, \end{aligned}$$(1)
where the \(f_t\) are known continuous functions on a parameter set \(\varTheta \subset \mathscr {R}^k\), the \(\varepsilon _t\) are random errors and \(\theta \in \varTheta \) is the true value of the parameter. Denote
Let \(\hat{\theta }_n(X_1,X_2,\ldots ,X_n)\) denote the least squares (LS) estimator of the parameter \(\theta \in \varTheta \), that is, a minimizer over \(\varTheta \) of the residual sum of squares \(\sum \nolimits _{t=1}^n\left( X_t-f_t(\theta )\right) ^2\).
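For concreteness, the following sketch (ours, in Python; the paper's own computations in Sect. 3 use MATLAB, and all function names here are illustrative) computes \(\hat{\theta }_n\) numerically by minimizing the residual sum of squares, using the power curve regression function \(f_t(\theta )=(t+\theta )^2\) that appears later in Simulation 3.1.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(t, theta):
    # Regression function of model (1); here the power curve of Simulation 3.1
    return (t + theta) ** 2

def ls_estimate(X, bounds=(0.1, 10.0)):
    # LS estimator: minimize Q_n(theta) = sum_{t=1}^n (X_t - f_t(theta))^2 over Theta
    t = np.arange(1, len(X) + 1)
    res = minimize_scalar(lambda th: np.sum((X - f(t, th)) ** 2),
                          bounds=bounds, method="bounded")
    return res.x

# Illustration: true theta = 1 with (for simplicity) i.i.d. N(0,1) errors,
# which are a boundary case of END errors with dominating coefficient M = 1.
rng = np.random.default_rng(0)
n, theta0 = 100, 1.0
t = np.arange(1, n + 1)
X = f(t, theta0) + rng.standard_normal(n)
print(ls_estimate(X))  # close to 1.0
```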
The LS method plays a central role in the inference of parameters in nonlinear regression models. The study of asymptotic properties of the LS estimator for parameters in nonlinear regression models has been a main subject of investigation, since it is, in general, difficult to obtain the exact distribution of the LS estimator for any fixed sample. For the LS estimator of the nonlinear model based on independent identically distributed (i.i.d.) errors, Jennrich (1969) presented the asymptotic normality, Malinvaud (1970) obtained the consistency, and Wu (1981) established a necessary and sufficient condition for the strong consistency, etc. In addition, for the nonlinear model based on independent but not necessarily identically distributed errors, Bunke and Schmidt (1980) established the strong consistency and asymptotic normality of the weighted LS estimator, Ibragimov and Has’minskii (1981) obtained some large deviation results for the maximum likelihood (ML) estimator, and Sieders and Dzhaparidze (1987) extended the results of Ibragimov and Has’minskii (1981) to the M-estimator and applied them to obtain large deviation results for the LS estimator. For the LS estimator of the nonlinear model, Prakasa Rao (1984a) extended the result of Ibragimov and Has’minskii (1981) to the case of i.i.d. Gaussian errors, and Hu (1993) extended the results of Prakasa Rao (1984a) and Sieders and Dzhaparidze (1987) to the cases of locally generalized Gaussian errors, martingale differences, etc.
As far as we know, there is no large deviation result for the LS estimator of model (1) based on extended negatively dependent (END, Liu 2009) errors. END sequences form a wide class of dependence structures covering several negatively dependent sequences, such as negatively orthant dependent (NOD, Lehmann 1966), negatively superadditive dependent (NSD, Hu 2000) and negatively associated (NA, Joag-Dev and Proschan 1983) sequences. Building on the M-estimator framework of Sieders and Dzhaparidze (1987), we obtain large deviation results, namely Theorems 2.1–2.4 and Corollary 2.1, for the LS estimator \(\hat{\theta }_n\) of \(\theta \in \mathscr {R}^k\) in model (1), which can be applied to establish a weak uniform consistency and a complete convergence rate.
Now, we recall the M-estimator. Let \(\mathscr {E}^{(n)}=\{\mathscr {X}^{(n)},\mathscr {U}^{(n)},P_\theta ^{(n)},\theta \in \varTheta \}\) be a family of probability spaces, where the \(P_\theta ^{(n)}\) does not necessarily have a known form. The parameter set \(\varTheta \) is a Borel subset of k-dimensional Euclidean space. We shall consider the M-estimator maximizing an M-functional \(C_n:\mathscr {X}^{(n)}\times \varTheta \rightarrow [0,\infty )\), which is assumed to be, for all \(X^n\in \mathscr {X}^{(n)}\), a positive continuous function of \(\theta \) and, for all \(\theta \in \varTheta \), a measurable functional of \(X^n\).
Throughout the paper, we assume that, for all \(\theta \in \varTheta \) and \(P_\theta ^{(n)}\)-almost all \(X^n\), a solution \(\hat{\theta }_n\) to the equation
exists (this is certainly true if \(\varTheta \) is compact). Then \(\hat{\theta }_n\) is called the M-estimator of \(\theta \). In particular, the LS estimator \(\hat{\theta }_n\) maximizes the M-functional
For all \(n\in \mathcal {N}\) and \(\theta \in \varTheta \subset \mathscr {R}^k\), let \(u\in \mathscr {R}^k\), let \(\phi _n(\theta )\) be a nonsingular \(k\times k\) matrix, and define the normalized M-ratio
which, for fixed observation \(X^n\), is a continuous, nonnegative finite function on the set
Throughout the paper, for a matrix \(A_{m\times n}\), \(|A_{m\times n}|\) denotes its norm. Define
where \(\bar{U}_{n,\theta }\) denotes the closure of \(U_{n,\theta }\).
Similar to Theorem 1.5.1 of Ibragimov and Has’minskii (1981) and Theorem 2.1 of Sieders and Dzhaparidze (1987), we define the following sets of functions.
\(\mathbf G \) is the set of all functions \(g_n(\cdot )\) possessing the following properties:
(i) for fixed n, \(g_n(\cdot )\) is a function on \([0,\infty )\) monotonically increasing to infinity;
(ii) for all \(N>0\),
Remark 1.1
If \(g_n(R)=R^{\alpha }\) and \(\alpha >0\), then \(g_n\in \mathbf G \).
Let K be a measurable subset of \(\varTheta \) and \(\mathbf H _K\) be the set of all functions \(\eta _{n,\theta }(\cdot )\) possessing the following properties:
(iii) for fixed n and \(\theta \in \varTheta \), \(\eta _{n,\theta }(\cdot )\) is a function \(U_{n,\theta }\rightarrow (0,\infty )\);
(iv) there exists a polynomial pol\(_K(R)\) in R such that, for R and n sufficiently large,
For each n and \(\theta \), let \(\zeta _{n,\theta }:[0,\infty )\rightarrow \mathscr {R}\) be a monotonically nondecreasing continuous function and define the random function
As a generalization of Theorem 1.5.1 of Ibragimov and Has’minskii (1981), Sieders and Dzhaparidze (1987, Theorem 2.1) obtained a large deviation result for the M-estimator as follows.
Theorem 1.1
Let the functionals \(\zeta _{n,\theta }(u)\) possess the following properties: given a measurable subset \(K\subset \varTheta \subset \mathscr {R}^k\), there correspond to it numbers m and \(\alpha \), where \(m\ge \alpha >k\), functions \(g_{n}\in \mathbf G \) and \(\eta _{n,\theta } \in \mathbf H _K\), and a polynomial pol\(_K(R)\) in R such that, for all R and n large enough,
Then there exist positive constants \(B_0\) and \(b_0\) such that, for all n and H large enough,
The constant \(b_0\) can be made arbitrarily close to \((\alpha -k)/(\alpha -k+mk)\) by choosing \(B_0\) large enough.
Remark 1.2
In view of Sieders and Dzhaparidze (1987), the condition (8) can be replaced by the following condition
for all \(\theta \in K\) and \(u \in {\varGamma }_{n,\theta , R}\), where C is a positive constant independent of n and \(\theta \).
Ibragimov and Has’minskii (1981, Theorem 1.5.1) obtained the large deviation (9) for the ML estimator. Under the i.i.d. Gaussian errors, Prakasa Rao (1984a) obtained the result of LS estimator \(\hat{\theta }_n\) such that for all \(\rho >0\) and \(n\ge 1\)
where K is a compact subset of \(\varTheta \subset \mathscr {R}\), and B and b are positive constants. Hu (1993) extended (9) to the locally generalized Gaussian and martingale difference cases. In addition, Ivanov (1976) investigated the LS estimator \(\hat{\theta }_n\) of model (1) based on i.i.d. errors. Assume that there exist positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \subset \mathscr {R}\),
By (11) and some other conditions, Ivanov (1976) presented that for all \(\rho >0\) and all \(n\ge 1\),
where p is a positive constant with \(p\ge 2\), and C is a positive constant independent of n and \(\rho \). Prakasa Rao (1984b) extended (12) to the dependent cases of \(\varphi \)-mixing and \(\alpha \)-mixing errors. Under some general conditions and \(\sup \nolimits _{n\ge 1} E|\varepsilon _n|^p<\infty \) for some \(p>2\), Hu (2002) also obtained (12) and gave some applications to the dependent cases of martingale differences, \(\varphi \)-mixing sequences and NA sequences. Under the condition \(\sup \nolimits _{n\ge 1} E|\varepsilon _n|^p<\infty \) for some \(1<p\le 2\), Hu (2004) established that
for all \(\rho >0\), \(n\ge 1\) and some \(C>0\), which was also applied to some dependent errors. In view of (12) and (13), using some moment information on the errors, Yang and Hu (2014) obtained results similar to (12) and (13), which can be used in some cases satisfying \(\sup \nolimits _{n\ge 1}E|\xi _n|^p=\infty \) for some \(p>1\).
For more works on the nonlinear regression models, one can refer to Ivanov and Leonenko (1989) and Ivanov (1997) for some basic asymptotic theories, Midi (1999) for the robustness of weighted LS estimator under i.i.d. errors with mean zero and unknown variance \(\sigma ^2\), Ivanov and Leonenko (2008) for the consistency and asymptotic distribution theory of LS estimator under long-range-dependent noise, etc.
Due to the importance of END random variables and the LS estimator of a nonlinear regression parameter, we investigate the LS estimator \(\hat{\theta }_n\) for model (1) based on END errors which are not necessarily identically distributed. With the techniques of some exponential inequalities for END random variables given in Sect. 5, we obtain large deviation results for the LS estimator \(\hat{\theta }_n\), which can be applied to get a weak uniform consistency and a convergence rate \(\hat{\theta }_n-\theta =O(n^{-1/2}\log ^{1/2} n)\) in the sense of complete convergence (see our results in Sect. 2). Some examples and simulations for nonlinear models are illustrated in Sect. 3, and the conclusions are presented in Sect. 4. Finally, we give the proofs in Sect. 5.
1.2 The concept of END random variables
In this subsection, let us recall the concept of END random variables which was introduced by Liu (2009).
Definition 1.1
Random variables \(\left\{ Z_n,n\ge 1\right\} \) are called END if there exists a constant \(M>0\) such that both
and
hold for each \(n\ge 1\) and all real numbers \(z_1,z_2,\ldots ,z_n\).
If \(\left\{ Z_n,n\ge 1\right\} \) is a sequence of END random variables, then for any fixed \(m\ge 1\), \(\left\{ Z_{n+m},n\ge 1\right\} \) is also a sequence of END random variables with the same dominating coefficient M. In fact, this property follows from Definition 1.1 and the continuity of probability.
Let \(\left\{ Z_n,n\ge 1\right\} \) be a sequence of random variables. For some \(1\le i\le n\), if \(P\left( Z_i\le z_i\right) =0\), then \(P\left( Z_1\le z_1, Z_2\le z_2,\ldots ,Z_n\le z_n\right) =0\). Similarly, if for some \(1\le i\le n\), \(P \left( Z_i> z_i\right) =0\), then \(P\left( Z_1> z_1, Z_2> z_2,\ldots ,Z_n>z_n\right) =0\). Define \(\frac{0}{0}=1\). If
and
then we take \(M=\max \{M_1,M_2\}\) in Definition 1.1 and obtain that \(\{Z_n,n\ge 1\}\) are END random variables. Obviously, letting \(z_i=-\infty \) or \(z_i=+\infty \) for all \(1\le i \le n\) in Definition 1.1, it is easy to see that the dominating coefficient satisfies \(M\ge 1\).
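As a quick numerical illustration (our Monte Carlo sketch, not part of the original paper), the two defining inequalities can be checked with \(M=1\) for a negatively correlated bivariate normal vector, which is NA by Joag-Dev and Proschan (1983) and hence NOD and END:

```python
import numpy as np

rng = np.random.default_rng(1)
# Non-positively correlated normal vector: NA, hence NOD and END with M = 1
cov = np.array([[1.0, -0.5], [-0.5, 1.0]])
Z = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000)

z = np.array([0.5, -0.3])  # arbitrary thresholds z_1, z_2
# Upper tail: P(Z_1 > z_1, Z_2 > z_2) vs. M * P(Z_1 > z_1) * P(Z_2 > z_2)
print(np.mean((Z > z).all(axis=1)), np.prod((Z > z).mean(axis=0)))
# Lower tail: P(Z_1 <= z_1, Z_2 <= z_2) vs. M * P(Z_1 <= z_1) * P(Z_2 <= z_2)
print(np.mean((Z <= z).all(axis=1)), np.prod((Z <= z).mean(axis=0)))
# In both cases the joint probability falls below the product, so M = 1 suffices
# (up to Monte Carlo error).
```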
Moreover, for any \(n\ge 1\), let \(Z_1,Z_2,\ldots ,Z_n\) be dependent according to a multivariate copula function \(C(u_1,\ldots ,u_n)\) with absolutely continuous marginal distribution functions \(F_1,\ldots ,F_n\). Assume that the joint copula density
exists and is uniformly bounded in the whole domain. Then random variables \(\{Z_n,n\ge 1\}\) are END (see Example 4.2 of Liu 2009). By Remark 3.1 of Ko and Tang (2008), the copulas in the Frank family of the form
belong to this category. Meanwhile, Chen et al. (2010) showed that every n-dimensional Farlie–Gumbel–Morgenstern (FGM) distribution described a specific END structure.
If \(M=1\), then END random variables reduce to NOD random variables (see Lehmann 1966), which contain NA random variables and NSD random variables (see Joag-Dev and Proschan 1983; Hu 2000; Wang et al. 2015a). Joag-Dev and Proschan (1983) established that a permutation distribution is NA. Recall that a family of real-valued random variables \(Z=\left\{ Z_t,t\in T\right\} \) is called a normal (or Gaussian) system if all its finite-dimensional distributions are Gaussian. Let \(Z=\left( Z_1,\ldots ,Z_n\right) \) be a normal random vector, \(n\ge 2\). Joag-Dev and Proschan (1983) proved that it is NA if and only if its components are non-positively correlated. They also pointed out that NA random variables are NOD random variables, but the converse is not always true. For various examples of NA random variables and related fields, we refer to Bulinski and Shaskin (2007), Prakasa Rao (2012), Oliveira (2012) and so on. Since END random variables form a wide class of dependent random variables, many researchers have studied their properties. For example, Liu (2009, 2010) studied the precise large deviations and moderate deviations of END sequences with heavy tails; Chen et al. (2010) obtained the strong law of large numbers for END sequences and also established some large deviation inequalities with applications to risk theory and renewal theory; Shen (2011) obtained some moment inequalities for END sequences; Wang et al. (2013) and Hu et al. (2015) investigated the complete convergence of END sequences; Wang et al. (2015b) investigated nonparametric regression models under END errors, etc.
2 The large deviation results of the LS estimator
Let \(\varTheta \) be a Borel subset of \(\mathscr {R}^k\) and \(f_t(\theta )\) be a continuous deterministic function from \(\varTheta \) to \(\mathscr {R}\) for each \(t\in \mathcal N\). Assume that \(X^n:=\left( X_1,X_2,\ldots ,X_n\right) \) are the observed random variables of the nonlinear regression model (1).
The LS estimator \(\hat{\theta }_n\), which we assume to exist (see (2)), maximizes the M-functional
Given a sequence of nonsingular \(k\times k\) norming matrices \(\phi _n(\theta )\), we define the ratio
where
Similar to Theorem 3.1 of Sieders and Dzhaparidze (1987), we assume that, for some Borel subset K of \(\varTheta \), there exist functions \(g_n(R)\in \mathbf G \), constants \(r>0\), \(\varLambda _1\in (0,\infty ]\), \(\delta \in (0,1/2)\), \(k_1>0\), \(\rho \in (0,1]\) and a polynomial pol(R) in R such that, for all n and R large enough, the following inequalities hold:
(N.1) for all \(t\in \mathcal {N}\) and \(|\lambda |\le \varLambda _1\),
(N.2) for all \(\theta \in K\) and \(u,v\in {\varGamma }_{n,\theta , R}\), where \(|u-v|\le k_1\), one has
and
(N.3) for all \(\theta \in K\) and \(u\in {\varGamma }_{n,\theta , R}\), one has
By (N.1)–(N.3), we have the following large deviation result.
Theorem 2.1
Assume that the errors \(\left\{ \varepsilon _t \right\} \) in the nonlinear regression model (1) are END random variables with \(E\varepsilon _t=0\) for all \(t\in \mathcal {N}\). For some \(K\subset \varTheta \subset \mathscr {R}^k\) and suitably chosen nonsingular \(\phi _n(\theta )\), let the assumptions (N.1)–(N.3) be fulfilled. Then there exist positive constants \(B_0\) and \(b_0\) such that, for all n and H large enough,
Moreover, for any \(\beta >0\), we can choose \(B_0\) large enough such that \(b_0\ge \frac{\rho }{\rho +k}-\beta \).
We list two assumptions (N.1)\(^\prime \) and (N.4) as follows:
- (N.1)\(^{\prime }\) for some \(r>0\), condition (N.1) holds with \(\varLambda _1=\infty \);
- (N.4) there exist positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \subset \mathscr {R}^k\) and n large enough,
$$\begin{aligned} D_1|\phi ^{-1}_n\left( \theta -\theta ^{\prime }\right) |^2\le \sum \limits _{t=1}^n\left( f_t(\theta )-f_t(\theta ^{\prime })\right) ^2\le D_2|\phi ^{-1}_n\left( \theta -\theta ^{\prime }\right) |^2. \end{aligned}$$(21)
Replacing (N.1)–(N.3) by (N.1)\(^\prime \) and (N.4), we have a result as follows.
Theorem 2.2
Assume that the errors \(\left\{ \varepsilon _t \right\} \) in the nonlinear regression model (1) are END random variables with \(E\varepsilon _t=0\) for all \(t\in \mathcal {N}\). For a suitably chosen nonsingular \(\phi _n(\theta )\), let the assumptions (N.1)\(^\prime \) and (N.4) be fulfilled. Then there exist positive constants \(B_0\) and b such that, for all n and H large enough,
For any \(\beta >0\), one can choose \(B_0\) large enough that \(b\ge \frac{D_1}{32r(1+k)}-\beta \).
Similar to (N.1) and (N.3), we give the following assumptions:
(N.1)\(^{*}\) for all \(t\in \mathcal {N}\), suppose that there exists a positive number L such that
for all positive integers \(m\ge 2\), where \(\sigma ^2=Var(\varepsilon _t)\);
(N.3)\(^{\prime }\) for all \(\theta \in K\) and \(u\in {\varGamma }_{n,\theta , R}\), one has
where \(0<\delta <1/2\).
Then, similar to Theorems 2.1 and 2.2, we establish the following results:
Theorem 2.3
Assume that the errors \(\left\{ \varepsilon _t \right\} \) in the nonlinear regression model (1) are END random variables with \(E\varepsilon _t=0\) and \(Var(\varepsilon _t)=\sigma ^2\) for all \(t\in \mathcal {N}\). For some \(K\subset \varTheta \subset \mathscr {R}^k\) and suitably chosen nonsingular \(\phi _n(\theta )\), let the assumptions (N.1)\(^{*}\), (N.2) and (N.3)\(^{\prime }\) be fulfilled. Then there exist positive constants \(B_0\) and \(b_0\) such that, for all n and H large enough, (20) holds. For any \(\beta >0\), one can choose \(B_0\) large enough that \(b_0\ge \frac{\rho }{\rho +k}-\beta \).
Theorem 2.4
Assume that the errors \(\left\{ \varepsilon _t \right\} \) in the nonlinear regression model (1) are END random variables with \(E\varepsilon _t=0\) and \(Var(\varepsilon _t)=\sigma ^2\) for all \(t\in \mathcal {N}\). For a suitably chosen nonsingular \(\phi _n(\theta )\), let the conditions (N.1)\(^{*}\) and (N.4) be fulfilled. Then there exist positive constants \(B_0\) and \(C_0\) such that, for all n and H large enough,
For all \(\theta \in \varTheta \subset \mathscr {R}^k\), all \(\rho >0\) and n large enough, by taking \(\phi _n(\theta )=n^{-1/2}I_k\) and \(H=n^{1/2}\rho \) in Theorem 2.2, we obtain the following corollary, where \(I_k\) is a \(k\times k\) unit matrix.
Corollary 2.1
Assume that the errors \(\{\varepsilon _t \}\) in the nonlinear regression model (1) are mean zero END random variables satisfying (N.1)\(^{\prime }\). Assume that there exist positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \subset \mathscr {R}^k\) and n large enough,
Then for all \(\rho >0\) and n large enough,
where \(B_0\) and b are defined by (22). So it follows
Remark 2.1
Conditions (N.1), (N.1)\(^\prime \) and (N.1)\(^{*}\) control the tails of the errors \(\varepsilon _t\) for all \(t\in \mathcal {N}\). Similar to Condition III of Ivanov (1976) (see (11)), Assumption A(ii) of Wu (1981), (2.5) of Prakasa Rao (1984a) and (N.2) of Sieders and Dzhaparidze (1987), condition (N.2) is a Hölder-type continuity condition on the parametrization \(\theta \rightarrow f(\theta )\). Similar to (N.3) of Sieders and Dzhaparidze (1987), conditions (N.3) and (N.3)\(^\prime \) prescribe the rate of asymptotic separation. Asymptotic separation is a necessary condition for consistent estimation (see Theorem 1 of Wu 1981); similar conditions can be found in Condition III of Ivanov (1976), (2.6) of Prakasa Rao (1984a), etc. In addition, by the proof of Theorem 2.2 in Sect. 5, (N.2) and (N.3) follow from condition (N.4) with \(\varLambda _1=\infty \). Similarly, by the proof of Theorem 2.4, (N.2) and (N.3)\(^\prime \) follow from (N.4).
Assume that \(\{a_n,n\ge 1\}\) is a sequence of positive constants satisfying \(a_n\rightarrow 0\). For all \(\theta \in \varTheta \subset \mathscr {R}^k\), let \(\phi _n(\theta )=a_nI_k\) and let the conditions of Theorem 2.1 be fulfilled, where \(I_k\) is the \(k\times k\) unit matrix. Then for all \(\rho >0\), taking \(H=\frac{1}{a_n}\rho \) in (20), we obtain that
in view of (i) of \(\mathbf G \). So the LS estimator \(\hat{\theta }_n\) is weakly uniformly consistent for \(\theta \).
3 Some examples and simulations
In this section, some examples and simulations for the LS estimator of nonlinear regression models are illustrated.
Example 3.1
In the nonlinear model (1), let
where \(\theta \in \varTheta =\{\theta :0<\delta _1\le \theta \le \delta _2<\infty \}\). Obviously, there exist some positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \) and n large enough,
where \(D_1<\frac{1}{\delta _2^4}\) and \(D_1\) can be chosen arbitrarily close to \( \frac{1}{\delta _2^4}\). Let the conditions of Theorem 2.2 hold. Then there exist some constants \(B_0\) and b such that, for all n and H large enough,
where b can be chosen arbitrarily close (from below) to \( \frac{1}{64r\delta _2^4}\).
If the errors \(\{\varepsilon _t\}\) are i.i.d. random variables and \(\varepsilon _1\sim N(0,\sigma ^2)\), then by Theorem 5 of Wu (1981), it holds that
which yields
(see Example 1 of Sieders and Dzhaparidze 1987).
Moreover, by (28), we can get that
By comparing (31) with (30), we see that the large deviation result (28) under END errors has the same optimal order of bound as in the independent case.
Example 3.2
Consider the linear model
$$\begin{aligned} X_t=\theta +\varepsilon _t,\quad t=1,2,\ldots ,n, \end{aligned}$$
where the errors \(\left\{ \varepsilon _t\right\} \) are mean zero END random variables satisfying (N.1)\(^{\prime }\). Applying Theorem 2.2 with \(D_1=D_2=1\) and \(\phi _n(\theta )=n^{-1/2}\), we have that for all n and H large enough,
where \(B_0\) and b are positive constants and b can be chosen arbitrarily close (from below) to \( \frac{1}{64r}\). For all \(\theta \in \varTheta \), we take \(H=\sqrt{\frac{2}{b}}\log ^{\frac{1}{2}}n\) in (32) and obtain
i.e.,
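The \(O(n^{-1/2}\log ^{1/2} n)\) complete convergence rate of this example can be checked empirically; a minimal Monte Carlo sketch (ours, in Python, using i.i.d. N(0,1) errors as the \(M=1\) boundary case of END errors) exploits the fact that, for this linear model, the LS estimator is the sample mean.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, reps = 2.0, 2000
for n in [100, 1_000, 10_000]:
    eps = rng.standard_normal((reps, n))       # i.i.d. N(0,1): END with M = 1
    theta_hat = theta0 + eps.mean(axis=1)      # LS estimator of theta: sample mean
    # Normalize the error by the claimed rate n^{-1/2} log^{1/2} n
    scaled = np.abs(theta_hat - theta0) * np.sqrt(n / np.log(n))
    print(n, scaled.max())                     # remains bounded as n grows
```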
Example 3.3
Consider the power curve model
$$\begin{aligned} X_t=(t+\theta )^d+\varepsilon _t,\quad t=1,2,\ldots ,n, \end{aligned}$$(33)
where \(d>1/2\) and \(\theta \in \varTheta =\{\theta :0<\delta _1\le \theta \le \delta _2<\infty \}\). Let \(f_t(\theta )=(t+\theta )^d\). Then there exist some positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \),
Let the errors \(\{\varepsilon _t\}\) be mean zero END random variables satisfying (N.1)\(^{\prime }\). Applying Theorem 2.2 with \(\phi _n(\theta )=n^{1/2-d}\), we establish that for all n and H large enough,
where \(B_0\) and b are positive constants and b can be chosen arbitrarily close (from below) to \( \frac{D_1}{64r}\). For all \(\theta \in \varTheta \), taking \(H=\sqrt{\frac{2}{b}}\log ^{\frac{1}{2}}n\) in (34), we obtain that
i.e.,
where \(d>1/2\). Under independent errors, Wu (1981) investigated the power curve model and obtained the strong consistency of the LS estimator \(\hat{\theta }_n\) of \(\theta \). We extend the result of Wu (1981) to the END case and establish the complete convergence rate for the LS estimator \(\hat{\theta }_n\) of \(\theta \).
Simulation 3.1 For simplicity, we do the simulation for the power curve model (33) with \(d=2\), i.e.,
$$\begin{aligned} X_t=(t+\theta )^2+\varepsilon _t,\quad t=1,2,\ldots ,n, \end{aligned}$$
where \(\theta \in \varTheta =\left\{ \theta :0<\delta _1\le \theta \le \delta _2<\infty \right\} \). Let \((\varepsilon _1,\varepsilon _2,\ldots ,\varepsilon _n)\) be a normal random vector such that \((\varepsilon _1,\varepsilon _2,\ldots ,\varepsilon _n)\sim N_n(\mathbf 0 ,\Sigma )\), where \(\mathbf 0 \) is the zero vector and \(\Sigma \) is
for \(0<\rho <1\). By Joag-Dev and Proschan (1983), it can be seen that \((\varepsilon _1,\varepsilon _2,\ldots ,\varepsilon _n)\) is an NA vector, so it is also an END vector. By (14), the LS estimator is \({\hat{\theta }}_{n}=\mathop {{{\mathrm{arg\,min}}}}\limits _{\theta \in {\varTheta }}\sum \nolimits _{t=1}^n(X_t-(t+\theta )^2)^{2}\). Setting \(\frac{\mathrm{d}(\sum \nolimits _{t=1}^n(X_t-(t+\theta )^2)^2)}{\mathrm{d}\theta }=0\) yields
$$\begin{aligned} n\theta ^3+3\left( \sum \limits _{t=1}^n t\right) \theta ^2+\left( 3\sum \limits _{t=1}^n t^2-\sum \limits _{t=1}^n X_t\right) \theta +\sum \limits _{t=1}^n t^3-\sum \limits _{t=1}^n tX_t=0. \end{aligned}$$(35)
This is a cubic equation in \(\theta \), and \(\hat{\theta }_n\) is obtained from its roots. For \(\theta =1\), \(\rho =0.1,0.2,0.3,0.4,0.5\) and sample sizes \(n=10,50,100,200\), we use MATLAB to obtain the roots of the cubic equation (35), repeating the experiments 10,000 times, and find that each experiment yields one real root and two complex roots. So we choose the real root as the LS estimator \(\hat{\theta }_n\) and plot the box plots in Fig. 1.
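A reproduction sketch of this experiment (ours, in Python rather than MATLAB): since the specific \(\Sigma \) above is given only through its parameter \(\rho \), the code below assumes, purely for illustration, an NA covariance with unit variances and common correlation \(-\rho /(n-1)\) (all off-diagonal correlations non-positive, hence NA and END); the cubic (35) is then solved with numpy.roots.

```python
import numpy as np

def simulate_theta_hat(n, rho, theta0, reps, rng):
    # NA (hence END) normal errors: unit variances with common negative
    # correlation -rho/(n-1); this covariance is our illustrative choice,
    # positive definite for 0 < rho < 1.
    t = np.arange(1, n + 1, dtype=float)
    Sigma = np.full((n, n), -rho / (n - 1)) + (1.0 + rho / (n - 1)) * np.eye(n)
    eps = rng.multivariate_normal(np.zeros(n), Sigma, size=reps)
    X = (t + theta0) ** 2 + eps
    est = np.empty(reps)
    for i in range(reps):
        # Coefficients of the cubic (35) in theta
        coeffs = [n,
                  3.0 * t.sum(),
                  3.0 * (t ** 2).sum() - X[i].sum(),
                  (t ** 3).sum() - (t * X[i]).sum()]
        roots = np.roots(coeffs)
        est[i] = roots[np.argmin(np.abs(roots.imag))].real  # pick the real root
    return est

rng = np.random.default_rng(4)
est = simulate_theta_hat(n=50, rho=0.3, theta0=1.0, reps=1_000, rng=rng)
print(np.median(est), est.std())  # median near 1; spread shrinks as n grows
```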
Similarly, for \(\theta =2\), \(\rho =0.1,0.2,0.3,0.4,0.5\) and sample sizes \(n=10,50,100,200\), we do the simulation, repeating the experiments 10,000 times, and plot the box plots for the LS estimator \(\hat{\theta }_n\) in Fig. 2.
In Fig. 1a–d, with the same \(\theta =1\) but different \(\rho =0.1,\ldots ,0.5\), the median of the LS estimator \(\hat{\theta }_n\) is close to 1 and the variation range becomes smaller as the sample size n increases through 10, 50, 100 and 200. Likewise, in Fig. 2e–h, with the same \(\theta =2\) but different \(\rho =0.1,\ldots ,0.5\), the median of \(\hat{\theta }_n\) is close to 2 and the variation range becomes smaller as the sample size n increases.
We also give Q–Q plots with \(\theta =1,2\), \(\rho =0.1,0.2\) and \(n=100\), repeating the experiments 10,000 times, to examine the normality of the LS estimator \(\hat{\theta }_n\); see Fig. 3. From Fig. 3, it can be seen that the LS estimator \(\hat{\theta }_n\) exhibits asymptotic normality in this multivariate normal experiment.
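A Q–Q check of this kind can be scripted directly; the sketch below (ours) reuses the illustrative simulate_theta_hat function from the previous listing.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
est = simulate_theta_hat(n=100, rho=0.1, theta0=1.0, reps=10_000, rng=rng)
# A near-straight probability plot against the normal distribution
# supports the asymptotic normality suggested by Fig. 3.
stats.probplot(est, dist="norm", plot=plt)
plt.show()
```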
4 Conclusion
In this paper, we investigate the LS estimator \(\hat{\theta }_n\) of \(\theta \) for the nonlinear model based on END errors which are not necessarily identically distributed. Under general conditions, we establish some large deviation results, namely Theorems 2.1–2.4, for the LS estimator \(\hat{\theta }_n\). As applications, under some simple conditions, a weak uniform consistency of \(\hat{\theta }_n\) is established (see Remark 2.1), and a convergence rate \(\hat{\theta }_n-\theta =O(n^{-1/2}\log ^{1/2} n)\) in the sense of complete convergence is presented in Corollary 2.1. Some examples of nonlinear regression models and simulations are given in Sect. 3 for illustration. We extend the results of Sieders and Dzhaparidze (1987), Prakasa Rao (1984a) and Hu (1993) for independent, Gaussian, locally generalized Gaussian and martingale difference errors to the case of END random variables. Since END random variables include NOD, NSD and NA random variables, the results obtained in this paper also hold true for these dependent structures.
5 Proofs
Before proving our results, we give some technical preliminaries as follows.
Lemma 5.1
(cf. Liu 2010, Lemma 3.1) Let \(\{Y_n,n\ge 1\}\) be a sequence of END random variables. Then:
(1) if \(\{f_n,n\ge 1\}\) is a sequence of all nondecreasing (or all nonincreasing) functions, then \(\{f_n(Y_n),n\ge 1\}\) is also a sequence of END random variables;
(2) for each \(n\ge 1\), there exists a positive constant M such that
Lemma 5.2
Let \(\{Y_n, n\ge 1\}\) be a sequence of END random variables and \(\{r_n,n\ge 1\}\) be a sequence of positive numbers. For fixed \(n\ge 1\), suppose that there exists a positive number \(\varLambda _1\) such that
Denote \(S_n=\sum \nolimits _{i=1}^n Y_i\) and \(G_n=\sum _{i=1}^nr_i\), \(n\ge 1\). Then there exists a positive constant M such that
and
Consequently,
Proof
For all x, by Markov’s inequality, Lemma 5.1 and (36), we obtain that
Hence, one has
For fixed \(x\ge 0\), if \(\varLambda _1\ge \frac{x}{G_n}\ge 0\), then
Meanwhile, for fixed \(x\ge 0\), if \(\varLambda _1\le \frac{x}{G_n}\), then
Consequently, (37) follows from (40)–(42) immediately.
According to Lemma 5.1 (1), \(\{-Y_n\}\) are also END random variables. Therefore, by (37), it yields
which implies (38). Combining (37) with (38), we obtain (39) finally. \(\square \)
Remark 5.1
Lemma 5.2 is an extension of exponential inequalities for the independent case (see Theorem 2.6 of Petrov 1995) and NOD case (see Theorem 2.1 of Wang et al. 2010) to the END structure case.
Corollary 5.1
Let \(\{Y_n, n\ge 1\}\) be a sequence of END random variables, \(\{d_n, n\ge 1\}\) be a sequence of real numbers and \(\{r_n, n\ge 1\}\) be a sequence of positive numbers. Suppose there exists a positive constant \(\varLambda _1\) (\(\varLambda _1\) possibly \(\infty \)) such that for all \(|\lambda |\le \varLambda _1\), (36) holds true. Denote \(\tilde{S}_n=\sum \nolimits _{i=1}^n d_i Y_i\), \(\tilde{G}_n=\sum \nolimits _{i=1}^n r_id_i^2\) and \(\varLambda =\varLambda _1/\max \nolimits _{1\le i\le n}|d_i|\). Then for all \(x\ge 0\), there exists a positive constant M such that
Proof
Obviously, for \(|\lambda |\le \varLambda =\varLambda _1/\max \nolimits _{1\le i\le n}|d_i|\), we have \(|\lambda d_i^+|\le \varLambda |d_i|\le \varLambda _1\). So by (36), it can be argued that
According to Lemma 5.1 (1), \(d_1^{+}Y_1,\ldots ,d_n^{+}Y_n\) are still END random variables. Denote \(\tilde{S}_n(1)=\sum \nolimits _{i=1}^n d_i^+Y_i\) and \(\tilde{G}_n=\sum \nolimits _{i=1}^n r_id_i^2\). Then, we apply Lemma 5.2 and establish that
Meanwhile, \(d_1^{-}Y_1,\ldots ,d_n^{-}Y_n\) are still END random variables. Denote \(\tilde{S}_n(2)=\sum \nolimits _{i=1}^n d_i^-Y_i\). Similar to the proof of (46), one has
Since
by (46)–(48), we obtain the result (43). Combining the proofs of (38), (39) and (43), we have the results (44) and (45) immediately. \(\square \)
Corollary 5.2
Let \(\left\{ Y_n, n\ge 1\right\} \) be a sequence of END random variables satisfying \(EY_i=0\) and \(EY_i^2=\sigma _i^2<\infty \), \(i=1,2,\ldots \), and let \(\{d_n, n\ge 1\}\) be a sequence of real numbers. Denote \(\tilde{S}_n=\sum \nolimits _{i=1}^n d_i Y_i\) and \(\tilde{B}_n^2=\sum \nolimits _{i=1}^n \sigma _i^2d_i^2\). For fixed \(n\ge 1\), suppose that there exists a positive number L such that
for all positive integers \(m\ge 2\). Then there exists a positive constant M such that for all \(x\ge 0\),
Proof
It can be argued by \(EY_i=0\), \(EY_i^2=\sigma _i^2\) and (49) that
If \(|\lambda |\le \frac{1}{2L}\), then
where \(r_i=2\sigma _i^2\) and \(i=1,2,\ldots ,n\). Taking \(\varLambda _1=\frac{1}{2L}\) and \(\tilde{G}_n=\sum _{i=1}^nr_id_i^2=2\sum _{i=1}^n\sigma _i^2d_i^2=2\tilde{B}_n^2\) in Corollary 5.1, we have the results (50)–(52) immediately. \(\square \)
Lemma 5.3
For some \(m\ge 2\), let \(\left\{ Y_n, n\ge 1\right\} \) be a sequence of END random variables with \(EY_n=0\) and \(E|Y_n|^m<\infty \), \(n=1,2,\ldots \). Assume that \(\left\{ a_{ni}, 1\le i\le n, n\ge 1\right\} \) is a triangular array of real numbers. Denote \(S_n=\sum \nolimits _{i=1}^n a_{ni}Y_i\). Then there exists a positive constant C independent of n such that
Proof
Denote \(S_n(1)=\sum \nolimits _{i=1}^n a_{ni}^{+}Y_i\) and \(S_n(2)=\sum \nolimits _{i=1}^n a_{ni}^{-}Y_i\). For \(m\ge 1\), by the \(C_r\) inequality, one has
Obviously, by Lemma 5.1 (1), \(\{a_{ni}^{+}Y_i,1\le i\le n\}\) and \(\{a_{ni}^{-}Y_i,1\le i\le n\}\) are also END random variables. Then, for \(m\ge 2\), by Corollary 3.2 of Shen (2011), it yields that
Similarly, it has
Thus, (54) follows from (55)–(57) immediately. \(\square \)
Proof of Theorem 2.1
Let \(\theta \in K\) and \(u,v\in {\varGamma }_{n,\theta ,R}\). If \(|u-v|\ge k_1\), then we obtain by (18) that
By (17), it can be found that
Taking \(\zeta _{n,\theta }(u):=\log Z_{n,\theta }(u)\) in (15), we establish that
where
For all \(m\ge 1\), by the \(C_r\) inequality, one has
Obviously, the condition (N.1) implies that \(\left\{ E|\varepsilon _t|^m,t\in \mathcal {N}\right\} \) is uniformly bounded. So, by (58) and Lemma 5.3 with (N.1) and \(E\varepsilon _t=0\), we obtain that for all \(m\ge 2\)
Meanwhile, by the Cauchy–Schwarz inequality, (18) and (58), one has that
So it follows from (61)–(63) that for all \(\theta \in K\) and \(u,v\in {\varGamma }_{n,\theta ,R}\)
Taking \(m>\max (2,k/\rho )\) in (64), we find that (7) is fulfilled with \(\alpha =\rho m\).
Next, we verify that (10) holds true. For \(0<\delta <\frac{1}{2}\), let
By (19), it can be argued that
which shows that \(\eta _{n,\theta }(u)\in \mathbf H _K\), because from (5), \(g^{-1}_n(R)\le 1\) for all n and R large enough. Denote \(d_{tn\theta }(u)=d_t\) and \(\max \limits _{1\le t\le n}|d_{tn\theta }(u)|=\max |d_{t}|\). Then, by (19), (59), (60), (65), (66) and Corollary 5.1, we get
which implies that (10) is fulfilled. Combining Theorem 1.1 with Remark 1.2, we obtain (20). Meanwhile, by choosing \(\alpha =\rho m\) in Theorem 1.1, for all \(\beta >0\) and m large enough, there exists a positive \(B_0\) such that (20) holds, where \(b_0\ge \frac{\rho }{\rho +k}-\beta \). \(\square \)
Proof of Theorem 2.2
By (21), (17) is fulfilled with \(\rho =1\) and pol\((R)=D_2\). Obviously, for all \(u\in {\varGamma }_{n,\theta ,R}\), \(\theta \in \varTheta \), one has
which implies that (18) is satisfied. Meanwhile, for all \(0<\delta <1/2\) and \(u\in {\varGamma }_{n,\theta ,R}\), it can be argued that
Then, by (68) and \(\varLambda _1=\infty \), (19) is fulfilled with \(g_n(R)=\frac{D_1\delta ^2}{8r}R^2\). Letting \(\delta \rightarrow \frac{1}{2}\), we apply Theorem 2.1 and obtain the result (22) finally. \(\square \)
Proof of Theorem 2.3
Combining Corollary 5.2 with the proof of Theorem 2.1, where r is replaced by \(2\sigma ^2\) and \(\varLambda _1\) is replaced by \(\frac{1}{2L}\), we have (20) immediately. \(\square \)
Proof of Theorem 2.4
Combining (21) with (67), for all \(u\in {\varGamma }_{n,\theta ,R}\), \(\theta \in \varTheta \), it can be checked that
which implies
Let \(g_n(R)=C_1R\), where \(C_1\) is a positive constant to be specified later. Next, we prove that (23) in (N.3)\(^\prime \) is fulfilled for all R large enough. By (21), (68) and (69), for all \(u\in {\varGamma }_{n,\theta ,R}\), \(\theta \in \varTheta \), we can take a positive constant \(C_1\) such that, for all R large enough,
and
Thus, by (70) and (71), (23) is fulfilled. Consequently, by the proofs of Theorems 2.1 and 2.3, we apply Theorem 2.3 and establish that for all n and H large enough,
where \(C_0=b_0C_1\), \(B_0\) and \(b_0\) are defined in Theorem 2.3. \(\square \)
Proof of Corollary 2.1
For all \(\theta \in \varTheta \subset \mathscr {R}^k\) and all \(\rho >0\), taking \(\phi _n(\theta )=n^{-1/2}I_k\) and \(H=n^{1/2}\rho \) in (22), one establishes the result (26) immediately, where \(I_k\) is a \(k\times k\) unit matrix. Taking \(C_1\) large enough, we apply (26) and establish that
This completes the proof of (27). \(\square \)
References
Bulinski AV, Shaskin A (2007) Limit theorems for associated random fields and related systems. World Scientific, Singapore
Bunke H, Schmidt WH (1980) Asymptotic results on nonlinear approximation of regression functions and weighted least squares. Math Oper Stat Ser Stat 11(1):3–32
Chen YQ, Chen AY, Ng KW (2010) The strong law of large numbers for extended negatively dependent random variables. J Appl Probab 47(4):908–922
Hu SH (1993) A large deviation result for the least squares estimators in nonlinear regression. Stoch Process Appl 47(2):345–352
Hu SH (2002) The rate of convergence for the least squares estimator in nonlinear regression model with dependent errors. Sci China Ser A 45(2):137–146
Hu SH (2004) Consistency for the least squares estimator in nonlinear regression model. Stat Probab Lett 67(2):183–192
Hu T-C, Rosalsky A, Wang KL (2015) Complete convergence theorems for extended negatively dependent random variables. Sankhya 77A(1):1–29
Hu TZ (2000) Negatively superadditive dependence of random variables with applications. Chin J Appl Probab Stat 16(2):133–144
Ibragimov IA, Has’minskii RZ (1981) Statistical estimation: asymptotic theory (translated by Samuel Kotz). Springer, New York
Ivanov AV (1976) An asymptotic expansion for the distribution of the least squares estimator of the nonlinear regression parameter. Theory Probab Appl 21(3):557–570
Ivanov AV (1997) Asymptotic theory of nonlinear regression. Kluwer Academic Publishers, Dordrecht
Ivanov AV, Leonenko NN (1989) Statistical analysis of random fields. Kluwer Academic Publishers, Dordrecht
Ivanov AV, Leonenko NN (2008) Semiparametric analysis of long-range dependence in nonlinear regression. J Stat Plan Inference 138(6):1733–1753
Jennrich RI (1969) Asymptotic properties of nonlinear least squares estimators. Ann Math Stat 40(2):633–643
Joag-Dev K, Proschan F (1983) Negative association of random variables with applications. Ann Stat 11(1):286–295
Ko B, Tang Q (2008) Sums of dependent nonnegative random variables with subexponential tails. J Appl Probab 45(1):85–94
Lehmann EL (1966) Some concepts of dependence. Ann Math Stat 37(5):1137–1153
Liu L (2009) Precise large deviations for dependent random variables with heavy tails. Stat Probab Lett 79(9):1290–1298
Liu L (2010) Necessary and sufficient conditions for moderate deviations of dependent random variables with heavy tails. Sci China Ser A 53(6):1421–1434
Malinvaud E (1970) The consistency of nonlinear regression. Ann Math Stat 41(3):956–969
Midi H (1999) Preliminary estimators for robust non-linear regression estimation. J Appl Stat 26(5):591–600
Oliveira PE (2012) Asymptotics for associated random variables. Springer, Berlin
Petrov VV (1995) Limit theorems of probability theory: sequences of independent random variables. Clarendon Press, Oxford
Prakasa Rao BLS (1984a) On the exponential rate of convergence of the least squares estimator in the nonlinear regression model with Gaussian errors. Stat Probab Lett 2(3):139–142
Prakasa Rao BLS (1984b) The rate of convergence of the least squares estimator in a non-linear regression model with dependent errors. J Multivar Anal 14(3):315–322
Prakasa Rao BLS (2012) Associated sequences, demimartingales and nonparametric inference. Birkhäuser, Springer, Basel
Shen AT (2011) Probability inequalities for END sequence and their applications. J Inequal Appl 2011:98
Sieders A, Dzhaparidze K (1987) A large deviation result for parameter estimators and its application to nonlinear regression analysis. Ann Stat 15(3):1031–1049
Wang XJ, Hu SH, Yang WZ, Ling NX (2010) Exponential inequalities and inverse moment for NOD sequence. Stat Probab Lett 80(5–6):452–461
Wang XJ, Hu T-C, Volodin AI, Hu SH (2013) Complete convergence for weighted sums and arrays of rowwise extended negatively dependent random variables. Commun Stat Theory Methods 42(13):2391–2401
Wang XJ, Shen AT, Chen ZY, Hu SH (2015a) Complete convergence for weighted sums of NSD random variables and its application in the EV regression model. Test 24(1):166–184
Wang XJ, Zheng LL, Xu C, Hu SH (2015b) Complete consistency for the estimator of nonparametric regression models based on extended negatively dependent errors. Statistics 49(2):396–407
Wu CF (1981) Asymptotic theory of nonlinear least squares estimation. Ann Stat 9(3):501–513
Yang WZ, Hu SH (2014) Large deviation for a least squares estimator in a nonlinear regression model. Stat Probab Lett 91:135–144
Acknowledgements
The authors are deeply grateful to the editor, the associate editor and three anonymous referees for their careful reading and insightful comments. The comments led us to significantly improve the paper. This work is supported by the National Natural Science Foundation of China (Grant: 11426032, 11501005, 11526033, 11671012), National Social Science Fund of China (Grant: 14ATJ005), Natural Science Foundation of Anhui Province (Grant: 1408085QA02, 1508085J06, 1608085QA02), Provincial Natural Science Research Project of Anhui Colleges (Grant: KJ2014A020, KJ2015A065, KJ2016A027), Quality Engineering Project of Anhui Province (2015jyxm054) and Applied Teaching Model Curriculum of Anhui University (XJYYKC1401, ZLTS2015053).