1 Introduction

1.1 The nonlinear regression model

First, we consider the nonlinear regression model for the observations \(X^n:=\left( X_1,X_2,\ldots ,X_n\right) \):

$$\begin{aligned} X_t=f_t\left( \theta \right) +\varepsilon _t,~~t=1,2,\ldots ,n, \end{aligned}$$
(1)

where the \(f_t\) are known continuous functions on a parameter set \(\varTheta \subset \mathscr {R}^k\), the \(\varepsilon _t\) are random errors and \(\theta \in \varTheta \) is the true value of the parameter. Denote

$$\begin{aligned} Q_n\left( \theta \right) =\sum \limits _{t=1}^n\left( X_t-f_t(\theta )\right) ^2. \end{aligned}$$

Let \(\hat{\theta }_n(X_1,X_2,\ldots ,X_n)\) denote the least squares (LS) estimator of the parameter \(\theta \in \varTheta \), that is, an estimator such that

$$\begin{aligned} Q_n\left( \hat{\theta }_n\right) =\inf \limits _{\theta \in \varTheta }\sum \limits _{t=1}^n\left( X_t-f_t(\theta )\right) ^2. \end{aligned}$$
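
As a concrete illustration (not part of the theoretical development), the following minimal Python sketch computes \(\hat{\theta }_n\) numerically for a scalar parameter by direct minimization of \(Q_n\); the helper name ls_estimate, the choice of optimizer and the regression function \(f_t(\theta )=(t+\theta )^2\) (the power curve model used later in Sect. 3) are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ls_estimate(X, f, theta_bounds):
    """LS estimator: minimize Q_n(theta) = sum_t (X_t - f_t(theta))^2 over a bounded interval."""
    t = np.arange(1, len(X) + 1)
    Q = lambda theta: np.sum((X - f(t, theta)) ** 2)
    return minimize_scalar(Q, bounds=theta_bounds, method="bounded").x

# Hypothetical example: f_t(theta) = (t + theta)^2 with i.i.d. N(0,1) errors
rng = np.random.default_rng(0)
n, theta0 = 100, 1.0
t = np.arange(1, n + 1)
X = (t + theta0) ** 2 + rng.normal(size=n)
print(ls_estimate(X, lambda t, th: (t + th) ** 2, (0.1, 5.0)))   # close to theta0 = 1
```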

The LS method plays a central role in the inference on parameters in nonlinear regression models. The asymptotic properties of the LS estimator have been the main subject of investigation, since it is in general difficult to obtain the exact distribution of the LS estimator for a fixed sample size. For the LS estimator of the nonlinear model based on independent identically distributed (i.i.d.) errors, Jennrich (1969) presented the asymptotic normality, Malinvaud (1970) obtained the consistency, and Wu (1981) established a necessary and sufficient condition for the strong consistency. In addition, for the nonlinear model based on independent but not necessarily identically distributed errors, Bunke and Schmidt (1980) established the strong consistency and asymptotic normality of the weighted LS estimator, Ibragimov and Has’minskii (1981) obtained some large deviation results for the maximum likelihood (ML) estimator, and Sieders and Dzhaparidze (1987) extended the results of Ibragimov and Has’minskii (1981) to the M-estimator and applied them to obtain large deviation results for the LS estimator. For the LS estimator of the nonlinear model, Prakasa Rao (1984a) extended the result of Ibragimov and Has’minskii (1981) to the case of i.i.d. Gaussian errors, and Hu (1993) extended the results of Prakasa Rao (1984a) and Sieders and Dzhaparidze (1987) to the cases of locally generalized Gaussian errors, martingale differences, etc.

As far as we know, there is no large deviation result for the LS estimator of model (1) based on extended negatively dependent (END, Liu 2009) errors. END sequences form a wide class of dependence structures which covers several negatively dependent sequences such as negatively orthant dependent (NOD, Lehmann 1966), negatively superadditive dependent (NSD, Hu 2000) and negatively associated (NA, Joag-Dev and Proschan 1983) sequences. Based on the M-estimator framework of Sieders and Dzhaparidze (1987), we obtain large deviation results, namely Theorems 2.1–2.4 and Corollary 2.1, for the LS estimator \(\hat{\theta }_n\) of \(\theta \in \mathscr {R}^k\) in model (1), which can be applied to establish a weak uniform consistency result and a complete convergence rate.

Now, we recall the M-estimator. Let \(\mathscr {E}^{(n)}=\{\mathscr {X}^{(n)},\mathscr {U}^{(n)},P_\theta ^{(n)},\theta \in \varTheta \}\) be a family of probability spaces, where \(P_\theta ^{(n)}\) does not necessarily have a known form. The parameter set \(\varTheta \) is a Borel subset of k-dimensional Euclidean space. We shall consider the M-estimator maximizing an M-functional \(C_n:\mathscr {X}^{(n)}\times \varTheta \rightarrow [0,\infty )\), which is assumed to be, for all \(X^n\in \mathscr {X}^{(n)}\), a positive continuous function of \(\theta \) and, for all \(\theta \in \varTheta \), a measurable functional of \(X^n\).

Throughout the paper, we assume that, for all \(\theta \in \varTheta \) and \(P_\theta ^{(n)}\)-almost all \(X^n\), a solution \(\hat{\theta }_n\) to the equation

$$\begin{aligned} C_n\left( X^n,\hat{\theta }_n\right) =\sup \limits _{\theta \in \varTheta }C_n\left( X^n,\theta \right) \end{aligned}$$
(2)

exists (this is certainly true if \(\varTheta \) is compact). The solution \(\hat{\theta }_n\) is called the M-estimator of \(\theta \). In particular, the LS estimator \(\hat{\theta }_n\) maximizes the M-functional

$$\begin{aligned} C_n\left( X^n,\theta \right) :=\exp \Bigg (-\frac{1}{2}\sum \limits _{t=1}^n\left( X_t-f_t(\theta )\right) ^2\Bigg ). \end{aligned}$$

For all \(n\in \mathcal {N}\) and \(\theta \in \varTheta \subset \mathscr {R}^k\), let \(u\in \mathscr {R}^k\), let \(\phi _n(\theta )\) be a nonsingular \(k\times k\) matrix, and define the normalized M-ratio

$$\begin{aligned} Z_{n,\theta }(u):=Z_{n,\theta }\left( X^n,u\right) =\frac{C_n\left( X^n,\theta +\phi _n(\theta )u\right) }{C_n\left( X^n,\theta \right) }, \end{aligned}$$
(3)

which, for fixed observation \(X^n\), is a continuous, nonnegative finite function on the set

$$\begin{aligned} U_{n,\theta }=\phi ^{-1}_n(\theta )\left( \varTheta -\theta \right) . \end{aligned}$$

Throughout the paper, for a matrix \(A_{m\times n}, |A_{m\times n}|\) denotes its norm. Define

$$\begin{aligned} {\varGamma }_{n,\theta , R}:=\bar{U}_{n,\theta }\cap \left\{ u:R\le |u|\le R+1\right\} , \end{aligned}$$

where \(\bar{U}_{n,\theta }\) is the closure of \(U_{n,\theta }\).

Similar to Theorem 1.5.1 of Ibragimov and Has’minskii (1981) and Theorem 2.1 of Sieders and Dzhaparidze (1987), we define the following sets of functions.

\(\mathbf G \) is the set of all functions \(g_n(\cdot )\) possessing the following properties:

(i) for fixed n, \(g_n(\cdot )\) is a function on \([0,\infty )\) monotonically increasing to infinity;

(ii) for all \(N>0\),

$$\begin{aligned} \lim \limits _{\begin{array}{c} R\rightarrow \infty \\ n\rightarrow \infty \end{array}}R^N\exp (-g_n(R))=0. \end{aligned}$$
(4)

Remark 1.1

If \(g_n(R)=R^{\alpha }\) and \(\alpha >0\), then \(g_n\in \mathbf G \).
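
Indeed, property (i) is immediate, and (4) holds because, for any fixed \(N>0\),

$$\begin{aligned} R^N\exp (-g_n(R))=\exp \left( N\log R-R^{\alpha }\right) \rightarrow 0\quad \text {as }R\rightarrow \infty , \end{aligned}$$

uniformly in n, since \(R^{\alpha }\) eventually dominates \(N\log R\).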

Let K be a measurable subset of \(\varTheta \) and \(\mathbf H _K\) be the set of all functions \(\eta _{n,\theta }(\cdot )\) possessing the following properties:

(iii) for fixed n and \(\theta \in \varTheta \), \(\eta _{n,\theta }(\cdot )\) is a function \(U_{n,\theta }\rightarrow (0,\infty )\);

(iv) there exists a polynomial pol\(_K(R)\) in R such that, for R and n sufficiently large,

$$\begin{aligned} \sup \limits _{\theta \in K;~u\in {\varGamma }_{n,\theta , R}}\left( \eta _{n,\theta }(u)\right) ^{-1}\le \text {pol}_K(R). \end{aligned}$$
(5)

For each n and \(\theta \), let \(\zeta _{n,\theta }:[0,\infty )\rightarrow \mathscr {R}\) be a monotonically nondecreasing continuous function and define the random function

$$\begin{aligned} \zeta _{n,\theta }(u):=\zeta _{n,\theta }\left( Z_{n,\theta }(u)\right) . \end{aligned}$$
(6)

As a generalization of Theorem 1.5.1 of Ibragimov and Has’minskii (1981), Sieders and Dzhaparidze (1987, Theorem 2.1) obtained a large deviation result for the M-estimator as follows.

Theorem 1.1

Let the functionals \(\zeta _{n,\theta }(u)\) possess the following properties: given a measurable subset \(K\subset \varTheta \subset \mathscr {R}^k\), there correspond to it numbers m and \(\alpha \), where \(m\ge \alpha >k\), functions \(g_{n}\in \mathbf G \) and \(\eta _{n,\theta } \in \mathbf H _K\), and a polynomial pol\(_K(R)\) in R such that, for all R and n large enough,

$$\begin{aligned} E_\theta ^{(n)}|\zeta _{n,\theta }(u)-\zeta _{n,\theta }(v)|^m\le & {} |u-v|^\alpha \text {pol}_K(R), \nonumber \\&\text {for all} ~\theta \in K~ \text {and}~ u,v \in {\varGamma }_{n,\theta , R}, \end{aligned}$$
(7)
$$\begin{aligned} P_\theta ^{(n)}(\zeta _{n,\theta }(u)-\zeta _{n,\theta }(0)\ge -\eta _{n,\theta }(u))\le & {} \exp (-g_{n}(R)) \nonumber \\&\text {for all} ~\theta \in K~ \text {and}~ u \in {\varGamma }_{n,\theta , R}. \end{aligned}$$
(8)

Then there exist positive constants \(B_0\) and \(b_0\) such that, for all n and H large enough,

$$\begin{aligned} \sup \limits _{\theta \in K}P_{\theta }^{(n)}\Big \{|\phi ^{-1}_n(\theta )\left( \hat{\theta }_n-\theta \right) |\ge H\Big \}\le B_0\exp \left( -b_0g_n(H)\right) . \end{aligned}$$
(9)

The constant \(b_0\) can be made arbitrarily close to \((\alpha -k)/(\alpha -k+mk)\) by choosing \(B_0\) large enough.

Remark 1.2

In view of Sieders and Dzhaparidze (1987), the condition (8) can be replaced by the following condition

$$\begin{aligned} P_\theta ^{(n)}\left( \zeta _{n,\theta }(u)-\zeta _{n,\theta }(0)\ge -\eta _{n,\theta }(u)\right) \le C\exp \left( -g_{n}(R)\right) \end{aligned}$$
(10)

for all \(\theta \in K\) and \(u \in {\varGamma }_{n,\theta , R}\), where C is a positive constant independent of n and \(\theta \).

Ibragimov and Has’minskii (1981, Theorem 1.5.1) obtained the large deviation result (9) for the ML estimator. Under i.i.d. Gaussian errors, Prakasa Rao (1984a) obtained the following result for the LS estimator \(\hat{\theta }_n\): for all \(\rho >0\) and \(n\ge 1\),

$$\begin{aligned} \sup \limits _{\theta \in K}P_\theta \left( n^{1/2}|\hat{\theta }_n-\theta |>\rho \right) \le B \mathrm{e}^{-b\rho ^2}, \end{aligned}$$

where K is a compact subset of \(\varTheta \subset \mathscr {R}\), and B and b are some positive constants. Hu (1993) extended (9) to the cases of locally generalized Gaussian errors and martingale differences. In addition, Ivanov (1976) investigated the LS estimator \(\hat{\theta }_n\) of model (1) based on i.i.d. errors, assuming that there exist positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \subset \mathscr {R}\),

$$\begin{aligned} D_1n\left( \theta -\theta ^{\prime }\right) ^2\le \sum \limits _{t=1}^n\left( f_t(\theta )-f_t(\theta ^{\prime })\right) ^2\le D_2n\left( \theta -\theta ^{\prime }\right) ^2. \end{aligned}$$
(11)

Under (11) and some other conditions, Ivanov (1976) showed that for all \(\rho >0\) and all \(n\ge 1\),

$$\begin{aligned} P_\theta \left( n^{1/2}|\hat{\theta }_n-\theta |>\rho \right) \le C\rho ^{-p}, \end{aligned}$$
(12)

where p is a positive constant with \(p\ge 2\), and C is a positive constant independent of n and \(\rho \). Prakasa Rao (1984b) extended (12) to the dependent cases of \(\varphi \)-mixing and \(\alpha \)-mixing errors. Under some general conditions and \(\sup \nolimits _{n\ge 1} E|\varepsilon _n|^p<\infty \) for some \(p>2\), Hu (2002) also obtained (12) and gave some applications to the dependent cases of martingale differences, \(\varphi \)-mixing sequences and NA sequences. Under the condition \(\sup \nolimits _{n\ge 1} E|\varepsilon _n|^p<\infty \) for some \(1<p\le 2\), Hu (2004) established that

$$\begin{aligned} P_\theta \left( n^{1/2}|\hat{\theta }_n-\theta |>\rho \right) \le Cn^{1-p/2}\rho ^{-p}, \end{aligned}$$
(13)

for all \(\rho >0\), \(n\ge 1\) and some \(C>0\), which was also applied to some dependent errors. In view of (12) and (13), using only some moment information on the errors, Yang and Hu (2014) obtained results similar to (12) and (13), which can be used in some cases where \(\sup \nolimits _{n\ge 1}E|\varepsilon _n|^p=\infty \) for some \(p>1\).

For more work on nonlinear regression models, one can refer to Ivanov and Leonenko (1989) and Ivanov (1997) for some basic asymptotic theory, Midi (1999) for the robustness of the weighted LS estimator under i.i.d. errors with mean zero and unknown variance \(\sigma ^2\), Ivanov and Leonenko (2008) for the consistency and asymptotic distribution theory of the LS estimator under long-range-dependent noise, etc.

Due to the importance of END random variables and of the LS estimator of a nonlinear regression parameter, we investigate the LS estimator \(\hat{\theta }_n\) for the model (1) based on END errors which are not necessarily identically distributed. Using the exponential inequalities for END random variables given in Sect. 5, we obtain large deviation results for the LS estimator \(\hat{\theta }_n\), which can be applied to obtain a weak uniform consistency result and the complete convergence rate \(\hat{\theta }_n-\theta =O(n^{-1/2}\log ^{1/2} n)\), completely (see our results in Sect. 2). Some examples and simulations for the nonlinear models are presented in Sect. 3, and the conclusions are given in Sect. 4. Finally, the proofs are collected in Sect. 5.

1.2 The concept of END random variables

In this subsection, we recall the concept of END random variables, which was introduced by Liu (2009).

Definition 1.1

Random variables \(\left\{ Z_n,n\ge 1\right\} \) are said to be END if there exists a constant \(M>0\) such that both

$$\begin{aligned} P\left( Z_i>z_i, i=1, 2, \ldots , n\right) \le M\prod \nolimits _{i=1}^nP\left( Z_i>z_i\right) \end{aligned}$$

and

$$\begin{aligned} P\left( Z_i\le z_i, i=1, 2, \ldots , n\right) \le M\prod \nolimits _{i=1}^nP\left( Z_i\le z_i\right) \end{aligned}$$

hold for each \(n\ge 1\) and all real numbers \(z_1,z_2,\ldots ,z_n\).
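
As a quick numerical illustration of Definition 1.1 (a Monte Carlo check, not a proof), the following Python sketch estimates both sides of the first inequality for a negatively correlated bivariate normal vector, which is NA and hence END with \(M=1\) (see the discussion of NA normal vectors later in this subsection); the sample size, correlation and thresholds are arbitrary choices, and the comparison holds up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = -0.4                                    # negative correlation: NA, hence END with M = 1
cov = np.array([[1.0, rho], [rho, 1.0]])
Z = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)

for z1, z2 in [(0.0, 0.0), (0.5, -0.5), (1.0, 1.0)]:
    joint = np.mean((Z[:, 0] > z1) & (Z[:, 1] > z2))          # estimate of P(Z1 > z1, Z2 > z2)
    product = np.mean(Z[:, 0] > z1) * np.mean(Z[:, 1] > z2)   # estimate of P(Z1 > z1) P(Z2 > z2)
    print(f"z=({z1:+.1f},{z2:+.1f}): joint={joint:.4f} <= product={product:.4f}")
```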

If \(\left\{ Z_n,n\ge 1\right\} \) is a sequence of END random variables, then for any fixed \(m\ge 1\), \(\left\{ Z_{n+m},n\ge 1\right\} \) is also a sequence of END random variables with the same dominating coefficient M. This property follows from Definition 1.1 and the continuity of probability.

Let \(\left\{ Z_n,n\ge 1\right\} \) be a sequence of random variables. If \(P\left( Z_i\le z_i\right) =0\) for some \(1\le i\le n\), then \(P\left( Z_1\le z_1, Z_2\le z_2,\ldots ,Z_n\le z_n\right) =0\). Similarly, if \(P \left( Z_i> z_i\right) =0\) for some \(1\le i\le n\), then \(P\left( Z_1> z_1, Z_2> z_2,\ldots ,Z_n>z_n\right) =0\). Adopt the convention \(\frac{0}{0}=1\). If

$$\begin{aligned} M_1=\sup \limits _{n\ge 1}\sup \limits _{z_i\in (-\infty ,\infty ),1\le i\le n}\frac{P\left( Z_i>z_i, i=1, 2, \ldots , n\right) }{\prod \nolimits _{i=1}^nP(Z_i>z_i)}<\infty \end{aligned}$$

and

$$\begin{aligned} M_2=\sup \limits _{n\ge 1}\sup \limits _{z_i\in (-\infty ,\infty ),1\le i\le n}\frac{P\left( Z_i\le z_i, i=1, 2, \ldots , n\right) }{\prod \nolimits _{i=1}^nP(Z_i\le z_i)}<\infty \end{aligned}$$

then we can take \(M=\max \{M_1,M_2\}\) in Definition 1.1 and conclude that \(\{Z_n,n\ge 1\}\) are END random variables. Obviously, letting \(z_i=-\infty \) or \(z_i=+\infty \) for all \(1\le i \le n\) in Definition 1.1, it is easy to see that the dominating coefficient satisfies \(M\ge 1\).

Moreover, for any \(n\ge 1\), let \(Z_1,Z_2,\ldots ,Z_n\) be dependent according to a multivariate copula function \(C(u_1,\ldots ,u_n)\) with absolutely continuous marginal distribution functions \(F_1,\ldots ,F_n\). Assume that the joint copula density

$$\begin{aligned} C_{1,\ldots ,n}(u_1,\ldots , u_n)=\frac{\partial ^{n}}{\partial u_1\ldots \partial u_n}C(u_1,\ldots ,u_n) \end{aligned}$$

exists and is uniformly bounded in the whole domain. Then random variables \(\{Z_n,n\ge 1\}\) are END (see Example 4.2 of Liu 2009). By Remark 3.1 of Ko and Tang (2008), the copulas in the Frank family of the form

$$\begin{aligned} C(u_1,\ldots ,u_n;\theta )=-\frac{1}{\theta }\ln \Big (1+\frac{(\mathrm{e}^{-\theta u_1}-1)\ldots (\mathrm{e}^{-\theta u_n}-1)}{(\mathrm{e}^{-\theta }-1)^{n-1}}\Big ),~~\theta <0 \end{aligned}$$

belong to this category. Meanwhile, Chen et al. (2010) showed that every n-dimensional Farlie–Gumbel–Morgenstern (FGM) distribution describes a specific END structure.

If \(M=1\), then END random variables reduce to NOD random variables (see Lehmann 1966), which contain NA random variables and NSD random variables (see Joag-Dev and Proschan 1983; Hu 2000; Wang et al. 2015a). Joag-Dev and Proschan (1983) established that a permutation distribution is NA. Recall that a family of real-valued random variables \(Z=\left\{ Z_t,t\in T\right\} \) is called a normal (or Gaussian) system if all its finite-dimensional distributions are Gaussian. Let \(Z=\left( Z_1,\ldots ,Z_n\right) \) be a normal random vector, \(n\ge 2\). Joag-Dev and Proschan (1983) proved that it is NA if and only if its components are non-positively correlated. They also pointed out that NA random variables are NOD random variables, but that the converse is not always true. For various examples of NA random variables and related fields, we refer to Bulinski and Shaskin (2007), Prakasa Rao (2012), Oliveira (2012) and so on. Since END random variables form a wide class of dependent random variables, many researchers have paid attention to their properties. For example, Liu (2009, 2010) studied precise large deviations and moderate deviations of END sequences with heavy tails; Chen et al. (2010) obtained a strong law of large numbers for END sequences and also established some large deviation inequalities with applications to risk theory and renewal theory; Shen (2011) obtained some moment inequalities for END sequences; Wang et al. (2013) and Hu et al. (2015) investigated complete convergence for END sequences; Wang et al. (2015b) investigated the nonparametric regression model under END errors, etc.

2 The large deviation results of the LS estimator

Let \(\varTheta \) be a Borel subset of \(\mathscr {R}^k, f_t(\theta )\) be a continuous deterministic function from \(\varTheta \) to \(\mathscr {R}\) for each \(t\in \mathcal N\). Assume that \(X^n:=\left( X_1,X_2,\ldots ,X_n\right) \) are the observed random variables of the nonlinear regression model (1).

The LS estimator \(\hat{\theta }_n\), which we assume to exist (see (2)), maximizes the M-functional

$$\begin{aligned} C_n(X^n,\theta ):=\exp \Big (-\frac{1}{2}\sum \limits _{t=1}^n\left( X_t-f_t(\theta )\right) ^2\Big ). \end{aligned}$$
(14)

Given a sequence of nonsingular \(k\times k\) matrix norming factors \(\phi _n(\theta )\), we define the ratio

$$\begin{aligned} Z_{n,\theta }(u):= & {} \frac{C_n\left( X^n,\theta +\phi _n(\theta )u\right) }{C_n\left( X^n,\theta \right) }\nonumber \\= & {} \exp \Bigg (\sum \limits _{t=1}^n d_{tn\theta }(u)\varepsilon _t-\frac{1}{2}\sum \limits _{t=1}^n d^2_{tn\theta }(u)\Bigg ), \end{aligned}$$
(15)

where

$$\begin{aligned} d_{tn\theta }(u)=f_t\left( \theta +\phi _n(\theta )u\right) -f_t(\theta ). \end{aligned}$$
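
The second equality in (15) follows from a direct computation: under \(P_\theta ^{(n)}\) we have \(X_t-f_t(\theta )=\varepsilon _t\) and \(X_t-f_t(\theta +\phi _n(\theta )u)=\varepsilon _t-d_{tn\theta }(u)\), so that

$$\begin{aligned} \log Z_{n,\theta }(u)=\frac{1}{2}\sum \limits _{t=1}^n\left[ \varepsilon _t^2-\left( \varepsilon _t-d_{tn\theta }(u)\right) ^2\right] =\sum \limits _{t=1}^n d_{tn\theta }(u)\varepsilon _t-\frac{1}{2}\sum \limits _{t=1}^n d^2_{tn\theta }(u). \end{aligned}$$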

Similar to Theorem 3.1 of Sieders and Dzhaparidze (1987), we assume that, for some Borel subset K of \(\varTheta \), there exist functions \(g_n(R)\in \mathbf G \), constants \(r>0\), \(\varLambda _1\in (0,\infty ]\), \(\delta \in (0,1/2)\), \(k_1>0\), \(\rho \in (0,1]\) and a polynomial pol(R) in R such that, for all n and R large enough, the following inequalities hold:

(N.1) for all \(t\in \mathcal {N}\) and \(|\lambda |\le \varLambda _1\),

$$\begin{aligned} E\exp \left( \lambda \varepsilon _t\right) \le \exp \left( \frac{1}{2}r\lambda ^2\right) ; \end{aligned}$$
(16)

(N.2) for all \(\theta \in K\) and \(u,v\in {\varGamma }_{n,\theta , R}\), where \(|u-v|\le k_1\), one has

$$\begin{aligned} \sum \limits _{t=1}^n\left[ f_t\left( \theta +\phi _n(\theta )u\right) -f_t\left( \theta +\phi _n(\theta )v\right) \right] ^2\le |u-v|^{2\rho }\text {pol}(R) \end{aligned}$$
(17)

and

$$\begin{aligned} \sum \limits _{t=1}^nd_{tn\theta }^2(u)\le \text {pol}(R); \end{aligned}$$
(18)

(N.3) for all \(\theta \in K\) and \(u\in {\varGamma }_{n,\theta , R}\), one has

$$\begin{aligned} \sum \limits _{t=1}^nd_{tn\theta }^2(u)\ge \max \Big (\frac{8r}{\delta ^2},\frac{4}{\varLambda _1\delta }\max \limits _{1\le t\le n}\left| d_{tn\theta }(u)\right| \Big )g_n(R). \end{aligned}$$
(19)

By (N.1)–(N.3), we have the following large deviation result.

Theorem 2.1

Assume that the errors \(\left\{ \varepsilon _t \right\} \) in the nonlinear regression model (1) are END random variables with \(E\varepsilon _t=0\) for all \(t\in \mathcal {N}\). For some \(K\subset \varTheta \subset \mathscr {R}^k\) and suitably chosen nonsingular \(\phi _n(\theta )\), let the assumptions (N.1)–(N.3) be fulfilled. Then there exist positive constants \(B_0\) and \(b_0\) such that, for all n and H large enough,

$$\begin{aligned} \sup \limits _{\theta \in K}P_{\theta }^{(n)}\Big \{|\phi ^{-1}_n(\theta )(\hat{\theta }_n-\theta )|\ge H\Big \}\le B_0\exp \left( -b_0g_n(H)\right) . \end{aligned}$$
(20)

Moreover, for any \(\beta >0\), we can choose \(B_0\) large enough such that \(b_0\ge \frac{\rho }{\rho +k}-\beta \).

We list two assumptions (N.1)\(^\prime \) and (N.4) as follows:

  • (N.1)\(^{\prime }\) for some \(r>0\), condition (N.1) holds with \(\varLambda _1=\infty \);

  • (N.4) there exist positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \subset \mathscr {R}^k\) and n large enough,

    $$\begin{aligned} D_1|\phi ^{-1}_n\left( \theta -\theta ^{\prime }\right) |^2\le \sum \limits _{t=1}^n\left( f_t(\theta )-f_t(\theta ^{\prime })\right) ^2\le D_2|\phi ^{-1}_n\left( \theta -\theta ^{\prime }\right) |^2. \end{aligned}$$
    (21)

Replacing (N.1)–(N.3) by (N.1)\(^\prime \) and (N.4), we obtain the following result.

Theorem 2.2

Assume that the errors \(\left\{ \varepsilon _t \right\} \) in the nonlinear regression model (1) are END random variables with \(E\varepsilon _t=0\) for all \(t\in \mathcal {N}\). For a suitably chosen nonsingular \(\phi _n(\theta )\), let the assumptions (N.1)\(^\prime \) and (N.4) be fulfilled. Then there exist positive constants \(B_0\) and b such that, for all n and H large enough,

$$\begin{aligned} \sup \limits _{\theta \in \varTheta }P_{\theta }^{(n)}\Big \{|\phi ^{-1}_n(\theta )(\hat{\theta }_n-\theta )|\ge H\Big \}\le B_0\exp (-bH^2). \end{aligned}$$
(22)

For any \(\beta >0\), \(B_0\) can be chosen large enough that \(b\ge \frac{D_1}{32r(1+k)}-\beta \).

Similar to (N.1) and (N.3), we give the following assumptions:

(N.1)\(^{*}\) for all \(t\in \mathcal {N}\), suppose that there exists a positive number L such that

$$\begin{aligned} |E\varepsilon _t^m|\le \frac{m!}{2}\sigma ^2L^{m-2}, \end{aligned}$$

for all positive integers \(m\ge 2\), where \(\sigma ^2=Var(\varepsilon _t)\);

(N.3)\(^{\prime }\) for all \(\theta \in K\) and \(u\in {\varGamma }_{n,\theta , R}\), one has

$$\begin{aligned} \sum \limits _{t=1}^nd_{tn\theta }^2(u)\ge \max \Big (\frac{16\sigma ^2}{\delta ^2},\frac{8L}{\delta }\max \limits _{1\le t\le n}|d_{tn\theta }(u)|\Big )g_n(R), \end{aligned}$$
(23)

where \(0<\delta <1/2\).

Therefore, similar to Theorems 2.1 and 2.2, we also establish the following results:

Theorem 2.3

Assume that the errors \(\left\{ \varepsilon _t \right\} \) in the nonlinear regression model (1) are END random variables with \(E\varepsilon _t=0\) and \(Var(\varepsilon _t)=\sigma ^2\) for all \(t\in \mathcal {N}\). For some \(K\subset \varTheta \subset \mathscr {R}^k\) and suitably chosen nonsingular \(\phi _n(\theta )\), let the assumptions (N.1)\(^{*}\), (N.2) and (N.3)\(^{\prime }\) be fulfilled. Then there exist positive constants \(B_0\) and \(b_0\) such that, for all n and H large enough, (20) holds. For any \(\beta >0\), \(B_0\) can be chosen large enough that \(b_0\ge \frac{\rho }{\rho +k}-\beta \).

Theorem 2.4

Assume that the errors \(\left\{ \varepsilon _t \right\} \) in the nonlinear regression model (1) are END random variables with \(E\varepsilon _t=0\) and \(Var(\varepsilon _t)=\sigma ^2\) for all \(t\in \mathcal {N}\). For a suitably chosen nonsingular \(\phi _n(\theta )\), let the conditions (N.1)\(^{*}\) and (N.4) be fulfilled. Then there exist positive constants \(B_0\) and \(C_0\) such that, for all n and H large enough,

$$\begin{aligned} \sup \limits _{\theta \in \varTheta }P_{\theta }^{(n)}\Big \{|\phi ^{-1}_n(\theta )\left( \hat{\theta }_n-\theta \right) |\ge H\Big \}\le B_0\exp (-C_0H). \end{aligned}$$
(24)

For all \(\theta \in \varTheta \subset \mathscr {R}^k\), all \(\rho >0\) and n large enough, by taking \(\phi _n(\theta )=n^{-1/2}I_k\) and \(H=n^{1/2}\rho \) in Theorem 2.2, we obtain the following corollary, where \(I_k\) is a \(k\times k\) unit matrix.

Corollary 2.1

Assume that the errors \(\{\varepsilon _t \}\) in the nonlinear regression model (1) are mean zero END random variables satisfying (N.1)\(^{\prime }\). Assume that there exist positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \subset \mathscr {R}^k\) and n large enough,

$$\begin{aligned} D_1n|\theta -\theta ^{\prime }|^2\le \sum \limits _{t=1}^n\left( f_t(\theta )-f_t(\theta ^{\prime })\right) ^2\le D_2n|\theta -\theta ^{\prime }|^2. \end{aligned}$$
(25)

Then for all \(\rho >0\) and n large enough,

$$\begin{aligned} \sup \limits _{\theta \in \varTheta }P_{\theta }^{(n)}\Big \{|\hat{\theta }_n-\theta |\ge \rho \Big \}\le B_0\exp (-b\rho ^2n), \end{aligned}$$
(26)

where \(B_0\) and b are as in (22). It follows that

$$\begin{aligned} \hat{\theta }_n-\theta =O\left( n^{-1/2}\log ^{1/2} n\right) ,\quad completely,\quad as\quad n\rightarrow \infty . \end{aligned}$$
(27)

Remark 2.1

Conditions (N.1), (N.1)\(^\prime \) and (N.1)\(^{*}\) control the tails of the errors \(\varepsilon _t\) for all \(t\in \mathcal {N}\). Similar to Condition III of Ivanov (1976) (see (11)), Assumption A(ii) of Wu (1981), (2.5) of Prakasa Rao (1984a) and (N.2) of Sieders and Dzhaparidze (1987), condition (N.2) is a Hölder-type continuity condition on the parametrization \(\theta \rightarrow f(\theta )\). Similar to (N.3) of Sieders and Dzhaparidze (1987), conditions (N.3) and (N.3)\(^\prime \) prescribe the rate of asymptotic separation. Asymptotic separation is a necessary condition for consistent estimation (see Theorem 1 of Wu 1981); similar conditions can be found in Condition III of Ivanov (1976), (2.6) of Prakasa Rao (1984a), etc. In addition, by the proof of Theorem 2.2 in Sect. 5, (N.2) and (N.3) follow from condition (N.4) together with \(\varLambda _1=\infty \). Similarly, by the proof of Theorem 2.4, (N.2) and (N.3)\(^\prime \) follow from (N.4).
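
For instance, (N.1)\(^\prime \) holds for Gaussian errors: if \(\varepsilon _t\sim N(0,\sigma _t^2)\) with \(\sup _{t\ge 1}\sigma _t^2\le r\), then for all \(\lambda \in \mathscr {R}\),

$$\begin{aligned} E\exp \left( \lambda \varepsilon _t\right) =\exp \Big (\frac{1}{2}\sigma _t^2\lambda ^2\Big )\le \exp \Big (\frac{1}{2}r\lambda ^2\Big ). \end{aligned}$$

Similarly, by Hoeffding's lemma, bounded mean zero errors with \(|\varepsilon _t|\le b\) satisfy (N.1)\(^\prime \) with \(r=b^2\), and (N.1)\(^{*}\) holds, for example, whenever \(|\varepsilon _t|\le L\) and \(Var(\varepsilon _t)=\sigma ^2\), since then \(|E\varepsilon _t^m|\le \sigma ^2L^{m-2}\le \frac{m!}{2}\sigma ^2L^{m-2}\) for all \(m\ge 2\).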

Assume that \(\{a_n,n\ge 1\}\) is a sequence of positive constants satisfying \(a_n\rightarrow 0\). For all \(\theta \in \varTheta \subset \mathscr {R}^k\), let \(\phi _n(\theta )=a_nI_k\) and let the conditions of Theorem 2.1 be fulfilled, where \(I_k\) is a \(k\times k\) unit matrix. Then for all \(\rho >0\), taking \(H=\frac{1}{a_n}\rho \) in (20), we obtain that

$$\begin{aligned} \sup \limits _{\theta \in \varTheta }P_\theta ^{(n)}\left( |\hat{\theta }_n-\theta |>\rho \right)= & {} \sup \limits _{\theta \in \varTheta }P_\theta ^{(n)}\left( \frac{1}{a_n}|I_k\left( \hat{\theta }_n-\theta \right) |>\frac{\rho }{a_n}\right) \\\le & {} B_0\exp \left( -b_0g_n(\rho /a_n)\right) \rightarrow 0, \end{aligned}$$

in view of property (i) of \(\mathbf G \). Hence the LS estimator \(\hat{\theta }_n\) is a weakly uniformly consistent estimator of \(\theta \).

3 Some examples and simulations

In this section, some examples and simulations for the LS estimator of nonlinear regression models are presented.

Example 3.1

In the nonlinear model (1), let

$$\begin{aligned} f_t(\theta )=\frac{1}{\theta ^{-1}+t^{1/4}},~~t=1,2,\ldots ,n, \end{aligned}$$

where \(\theta \in \varTheta =\{\theta :0<\delta _1\le \theta \le \delta _2<\infty \}\). Obviously, there exist some positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \) and n large enough,

$$\begin{aligned} D_1\left( \theta -\theta ^{\prime }\right) ^2\log n\le \sum \limits _{t=1}^n \left( f_t(\theta )-f_t(\theta ^{\prime })\right) ^2\le D_2 \left( \theta -\theta ^{\prime }\right) ^2\log n, \end{aligned}$$

where \(D_1<\frac{1}{\delta _2^4}\) and \(D_1\) can be chosen arbitrarily close to \( \frac{1}{\delta _2^4}\). Let the conditions of Theorem 2.2 hold. Then there exist some constants \(B_0\) and b such that, for all n and H large enough,

$$\begin{aligned} \sup \limits _{\theta \in \varTheta }P_{\theta }^{(n)}\Big \{(\log {n})^{1/2}|\hat{\theta }_n-\theta |\ge H\Big \}\le B_0\exp (-bH^2), \end{aligned}$$
(28)

where b can be chosen arbitrarily close (from below) to \( \frac{1}{64r\delta _2^4}\).
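
The \(\log n\) rate and the constant \(\frac{1}{\delta _2^4}\) above can be seen from the elementary identity

$$\begin{aligned} f_t(\theta )-f_t(\theta ^{\prime })=\frac{1/\theta ^{\prime }-1/\theta }{\left( \theta ^{-1}+t^{1/4}\right) \left( (\theta ^{\prime })^{-1}+t^{1/4}\right) }=\frac{\theta -\theta ^{\prime }}{\left( 1+\theta t^{1/4}\right) \left( 1+\theta ^{\prime }t^{1/4}\right) }, \end{aligned}$$

so that \(\left( f_t(\theta )-f_t(\theta ^{\prime })\right) ^2\sim \frac{(\theta -\theta ^{\prime })^2}{\theta ^2\theta ^{\prime 2}t}\) as \(t\rightarrow \infty \), \(\sum \nolimits _{t=1}^n t^{-1}\sim \log n\), and \(\frac{1}{\theta ^2\theta ^{\prime 2}}\ge \frac{1}{\delta _2^4}\) for all \(\theta ,\theta ^{\prime }\in \varTheta \).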

If the errors \(\{\varepsilon _t\}\) are i.i.d. random variables and \(\varepsilon _1\sim N(0,\sigma ^2)\), then by Theorem 5 of Wu (1981), it holds that

$$\begin{aligned} (\log {n})^{1/2}(\hat{\theta }_n-\theta )\xrightarrow {\mathscr {L}}N(0,\sigma ^2 \theta ^4), \end{aligned}$$
(29)

which yields

$$\begin{aligned} \lim \limits _{\begin{array}{c} H\rightarrow \infty \\ n\rightarrow \infty \end{array}}\Big (-H^{-2}\log P_{\theta }^{(n)}\Big \{(\log {n})^{1/2}|\hat{\theta }_n-\theta |\ge H\Big \}\Big )=\frac{1}{2\sigma ^2\theta ^4}, \end{aligned}$$
(30)

(see Example 1 of Sieders and Dzhaparidze 1987).

Moreover, by (28), we can get that

$$\begin{aligned} \liminf \limits _{\begin{array}{c} H\rightarrow \infty \\ n\rightarrow \infty \end{array}}\Big (-H^{-2}\log P_{\theta }^{(n)}\Big \{(\log {n})^{1/2}|\hat{\theta }_n-\theta |\ge H\Big \}\Big )\ge \frac{1}{64r\theta ^4}. \end{aligned}$$
(31)

Comparing (31) with (30), we see that the large deviation result (28) under END errors has a bound of the same order as the optimal bound in the independent case.

Example 3.2

Consider the linear model

$$\begin{aligned} X_t=\theta +\varepsilon _t,~~t=1,2,\ldots ,n,~\theta \in \varTheta \subset \mathscr {R}, \end{aligned}$$

where the errors \(\left\{ \varepsilon _t\right\} \) are mean zero END random variables satisfying (N.1)\(^{\prime }\). Applying Theorem 2.2 with \(D_1=D_2=1\) and \(\phi _n(\theta )=n^{-1/2}\), we have that for all n and H large enough,

$$\begin{aligned} \sup \limits _{\theta \in \varTheta }P_{\theta }^{(n)}\Big \{n^{1/2}|\hat{\theta }_n-\theta |\ge H\Big \}\le B_0\exp (-bH^2), \end{aligned}$$
(32)

where \(B_0\) and b are positive constants and b can be chosen arbitrarily close (from below) to \( \frac{1}{64r}\). For all \(\theta \in \varTheta \), we take \(H=\sqrt{\frac{2}{b}}\log ^{\frac{1}{2}}n\) in (32) and obtain

$$\begin{aligned} \sum _{n=1}^\infty P_{\theta }^{(n)}\Big \{n^{1/2}|\hat{\theta }_n-\theta |\ge \sqrt{\frac{2}{b}}\log ^{\frac{1}{2}}n\Big \}\le B_0\sum _{n=1}^\infty \exp (-2\log n)<\infty , \end{aligned}$$

i.e.,

$$\begin{aligned} \hat{\theta }_n-\theta =O(n^{-\frac{1}{2}}\log ^{\frac{1}{2}}n),~~\text {completely},~\text {as}~n\rightarrow \infty .\end{aligned}$$

Example 3.3

Consider the power curve model

$$\begin{aligned} X_t=(t+\theta )^d+\varepsilon _t,~~t=1,2,\ldots ,n, \end{aligned}$$
(33)

where \(d>1/2\) and \(\theta \in \varTheta =\{\theta :0<\delta _1\le \theta \le \delta _2<\infty \}\). Let \(f_t(\theta )=(t+\theta )^d\). Then there exist some positive constants \(D_1\) and \(D_2\) such that, for all \(\theta ,\theta ^{\prime }\in \varTheta \),

$$\begin{aligned} D_1 n^{2d-1}\left( \theta -\theta ^{\prime }\right) ^2\le \sum \limits _{t=1}^n \left( f_t(\theta )-f_t(\theta ^{\prime })\right) ^2\le D_2 n^{2d-1}\left( \theta -\theta ^{\prime }\right) ^2. \end{aligned}$$
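
Indeed, by the mean value theorem, for each t there is \(\xi _t\) between \(\theta ^{\prime }\) and \(\theta \) (so \(\delta _1\le \xi _t\le \delta _2\)) such that

$$\begin{aligned} \sum \limits _{t=1}^n \left( f_t(\theta )-f_t(\theta ^{\prime })\right) ^2=d^2\left( \theta -\theta ^{\prime }\right) ^2\sum \limits _{t=1}^n\left( t+\xi _t\right) ^{2d-2}, \end{aligned}$$

and, since \(2d-2>-1\), the last sum is bounded above and below by constant multiples of \(\sum \nolimits _{t=1}^n t^{2d-2}\asymp \frac{n^{2d-1}}{2d-1}\).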

Let the errors \(\{\varepsilon _t\}\) be mean zero END random variables satisfying (N.1)\(^{\prime }\). Applying Theorem 2.2 with \(\phi _n(\theta )=n^{1/2-d}\), we establish that for all n and H large enough,

$$\begin{aligned} \sup \limits _{\theta \in \varTheta }P_{\theta }^{(n)}\Big \{n^{d-1/2}|\hat{\theta }_n-\theta |\ge H\Big \}\le B_0\exp (-bH^2), \end{aligned}$$
(34)

where \(B_0\) and b are positive constants and b can be chosen arbitrarily close (from below) to \( \frac{D_1}{64r}\). For all \(\theta \in \varTheta \), taking \(H=\sqrt{\frac{2}{b}}\log ^{\frac{1}{2}}n\) in (34), we obtain that

$$\begin{aligned} \sum _{n=1}^\infty P_{\theta }^{(n)}\Big \{n^{d-\frac{1}{2}}|\hat{\theta }_n-\theta |\ge \sqrt{\frac{2}{b}}\log ^{\frac{1}{2}}n\Big \}\le B_0\sum _{n=1}^\infty \exp (-2\log n)<\infty , \end{aligned}$$

i.e.,

$$\begin{aligned} \hat{\theta }_n-\theta =O(n^{-(d-\frac{1}{2})}\log ^{\frac{1}{2}}n),~~\text {completely},~\text {as}~n\rightarrow \infty , \end{aligned}$$

where \(d>1/2\). Under independent errors, Wu (1981) investigated the power curve model and obtained the strong consistency of the LS estimator \(\hat{\theta }_n\) of \(\theta \). We extend the result of Wu (1981) to the END case and establish the complete convergence rate for the LS estimator \(\hat{\theta }_n\) of \(\theta \).

Fig. 1 Box plots of the LS estimator for the power curve model with \(d=2\), \(\theta =1\) and \(n=10,50,100,200\), based on 10,000 replications

Simulation 3.1 For simplicity, we carry out the simulation for the power curve model (33) with \(d=2\), i.e.,

$$\begin{aligned} X_t=(t+\theta )^2+\varepsilon _t,~~t=1,2,\ldots ,n, \end{aligned}$$

where \(\theta \in \varTheta =\left\{ \theta :0<\delta _1\le \theta \le \delta _2<\infty \right\} \). Let \((\varepsilon _1,\varepsilon _2,\ldots ,\varepsilon _n)\) be a normal random vector such that \((\varepsilon _1,\varepsilon _2,\ldots ,\varepsilon _n)\sim N_n(\mathbf 0 ,\Sigma )\), where \(\mathbf 0 \) is the zero vector and \(\Sigma \) is

$$\begin{aligned} \Sigma = \begin{bmatrix} 1+\rho&-\rho&-\rho ^2&0&\cdots&0&0&0&0\\ -\rho&1+\rho&-\rho&-\rho ^2&\cdots&0&0&0&0 \\ -\rho ^2&-\rho&1+\rho&-\rho&\cdots&0&0&0&0 \\ 0&-\rho ^2&-\rho&1+\rho&\cdots&0&0&0&0 \\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\vdots \\ 0&0&0&0&\cdots&1+\rho&-\rho&-\rho ^2&0 \\ 0&0&0&0&\cdots&-\rho&1+\rho&-\rho&-\rho ^2 \\ 0&0&0&0&\cdots&-\rho ^2&-\rho&1+\rho&-\rho \\ 0&0&0&0&\cdots&0&-\rho ^2&-\rho&1+\rho \\ \end{bmatrix}_{n\times n}, \end{aligned}$$

for \(0<\rho <1\). By Joag-Dev and Proschan (1983), it can be seen that \((\varepsilon _1,\varepsilon _2,\ldots ,\varepsilon _n)\) is an NA vector, and hence also an END vector. By (14), the LS estimator is \({\hat{\theta }}_{n}=\mathop {{{\mathrm{arg\,min}}}}\limits _{\theta \in {\varTheta }}\sum \nolimits _{t=1}^n(X_t-(t+\theta )^2)^{2}\). Setting \(\frac{\mathrm{d}(\sum \nolimits _{t=1}^n(X_t-(t+\theta )^2)^2)}{\mathrm{d}\theta }=0\) gives

$$\begin{aligned} n\theta ^3+\left( 3\sum \limits _{t=1}^n t\right) \theta ^2+\left( 3\sum \limits _{t=1}^nt^2-\sum \limits _{t=1}^n X_t\right) \theta +\sum \limits _{t=1}^nt^3-\sum \limits _{t=1}^ntX_t=0. \end{aligned}$$
(35)

Equation (35) is a cubic equation in \(\theta \). By selecting the appropriate root of this cubic equation, one can obtain \(\hat{\theta }_n\). For \(\theta =1\), \(\rho =0.1,0.2,0.3,0.4,0.5\) and sample sizes \(n=10,50,100,200\), we use MATLAB to obtain the roots of the cubic equation (35), repeating the experiment 10,000 times, and find that in each experiment there are one real root and two complex roots. We therefore take the real root as the LS estimator \(\hat{\theta }_n\) and plot the box plots in Fig. 1.
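
For readers who prefer an open-source environment, the following Python sketch reproduces this experiment on a smaller scale (the original computations were done in MATLAB); the helper name simulate_theta_hat, the seed, the value \(\rho =0.3\) and the number of replications are illustrative assumptions.

```python
import numpy as np

def simulate_theta_hat(n, theta, rho, rng):
    """One replication: draw NA normal errors with the banded covariance Sigma,
    form X_t = (t + theta)^2 + eps_t, and solve the cubic equation (35)."""
    Sigma = ((1 + rho) * np.eye(n)
             - rho * (np.eye(n, k=1) + np.eye(n, k=-1))
             - rho ** 2 * (np.eye(n, k=2) + np.eye(n, k=-2)))
    eps = rng.multivariate_normal(np.zeros(n), Sigma)
    t = np.arange(1, n + 1)
    X = (t + theta) ** 2 + eps
    # coefficients of (35): n, 3*sum(t), 3*sum(t^2) - sum(X), sum(t^3) - sum(t*X)
    roots = np.roots([n, 3 * t.sum(), 3 * (t ** 2).sum() - X.sum(),
                      (t ** 3).sum() - (t * X).sum()])
    real_roots = roots.real[np.isclose(roots.imag, 0.0, atol=1e-6)]
    # one real root is expected; if several, take the one minimizing Q_n
    return min(real_roots, key=lambda th: np.sum((X - (t + th) ** 2) ** 2))

# rho = 0.3 keeps Sigma strictly diagonally dominant, hence positive definite
rng = np.random.default_rng(2024)
estimates = [simulate_theta_hat(n=100, theta=1.0, rho=0.3, rng=rng) for _ in range(1000)]
print(np.median(estimates))   # should be close to the true theta = 1
```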

Similarly, for \(\theta =2\), \(\rho =0.1,0.2,0.3,0.4,0.5\) and sample sizes \(n=10,50,100,200\), we carry out the simulation by repeating the experiment 10,000 times and plot the box plots of the LS estimator \(\hat{\theta }_n\) in Fig. 2.

Fig. 2 Box plots of the LS estimator for the power curve model with \(d=2\), \(\theta =2\) and \(n=10,50,100,200\), based on 10,000 replications

In Fig. 1a–d, with the same \(\theta =1\) but different \(\rho =0.1,\ldots ,0.5\), the median of the LS estimator \(\hat{\theta }_n\) is close to 1, and the variation range shrinks as the sample size n increases through 10, 50, 100 and 200. Likewise, in Fig. 2e–h, with the same \(\theta =2\) but different \(\rho =0.1,\ldots ,0.5\), the median of \(\hat{\theta }_n\) is close to 2 and the variation range shrinks as n increases.

We also give normal Q–Q plots with \(\theta =1,2\), \(\rho =0.1,0.2\) and \(n=100\), based on 10,000 replications, to examine the normality of the LS estimator \(\hat{\theta }_n\); see Fig. 3. Figure 3 suggests that the LS estimator \(\hat{\theta }_n\) is asymptotically normal in this multivariate normal experiment.

Fig. 3 Normal Q–Q plots of the LS estimator with \(\theta =1,2\), \(\rho =0.1,0.2\) and \(n=100\), based on 10,000 replications

4 Conclusion

In this paper, we investigate the LS estimator \(\hat{\theta }_n\) of \(\theta \) for the nonlinear model based on END errors which are not necessarily identically distributed. Under general conditions, we establish some large deviation results, namely Theorems 2.1–2.4, for the LS estimator \(\hat{\theta }_n\). As applications, under some simple conditions, a weak uniform consistency result for \(\hat{\theta }_n\) is established (see Remark 2.1), and the complete convergence rate \(\hat{\theta }_n-\theta =O(n^{-1/2}\log ^{1/2} n)\), completely, is presented in Corollary 2.1. Some examples of nonlinear regression models and simulations are given as illustrations in Sect. 3. We extend the results of Sieders and Dzhaparidze (1987), Prakasa Rao (1984a) and Hu (1993) for independent, Gaussian, locally generalized Gaussian and martingale difference errors to the case of END random variables. Since the class of END random variables contains NOD, NSD and NA random variables, the results obtained in this paper also hold for these dependence structures.

5 Proofs

Before proving our results, we give some technical preliminaries as follows.

Lemma 5.1

(cf. Liu 2010, Lemma 3.1) Let \(\{Y_n,n\ge 1\}\) be a sequence of END random variables. Then the following hold:

(1) if the functions \(\{f_n,n\ge 1\}\) are all nondecreasing (or all nonincreasing), then \(\{f_n(Y_n),n\ge 1\}\) is also a sequence of END random variables;

(2) for each \(n\ge 1\), there exists a positive constant M such that

$$\begin{aligned} E\Bigg (\prod \limits _{i=1}^n Y_i^{+}\Bigg )\le M\prod \limits _{i=1}^n EY_i^{+}. \end{aligned}$$

Lemma 5.2

Let \(\{Y_n, n\ge 1\}\) be a sequence of END random variables and \(\{r_n,n\ge 1\}\) be a sequence of positive numbers. For fixed \(n\ge 1\), suppose that there exists a positive number \(\varLambda _1\) such that

$$\begin{aligned} E\exp \left( \lambda Y_i\right) \le \exp \left( \frac{1}{2}r_i\lambda ^2\right) ,~~0\le |\lambda |\le \varLambda _1,~~i=1,2,\ldots , n. \end{aligned}$$
(36)

Denote \(S_n=\sum \nolimits _{i=1}^n Y_i\) and \(G_n=\sum _{i=1}^nr_i\), \(n\ge 1\). Then there exists a positive constant M such that

$$\begin{aligned} P(S_n\ge x)\le \left\{ \begin{array}{ccc} M\exp \left( -\frac{x^2}{2G_n}\right) ,~ 0\le x\le G_n\varLambda _1,\\ M\exp \left( -\frac{\varLambda _1 x}{2}\right) ,~~x\ge G_n\varLambda _1,~~~~ \end{array}\right. \end{aligned}$$
(37)

and

$$\begin{aligned} P(S_n\le -x)\le \left\{ \begin{array}{ccc} M\exp \left( -\frac{x^2}{2G_n}\right) ,~ 0\le x\le G_n\varLambda _1,\\ M\exp \left( -\frac{\varLambda _1 x}{2}\right) ,~~x\ge G_n\varLambda _1.~~~~ \end{array}\right. \end{aligned}$$
(38)

Consequently,

$$\begin{aligned} P(|S_n|\ge x)\le \left\{ \begin{array}{ccc} 2M\exp \left( -\frac{x^2}{2G_n}\right) ,~ 0\le x\le G_n\varLambda _1,\\ 2M\exp \left( -\frac{\varLambda _1 x}{2}\right) ,~~x\ge G_n\varLambda _1.~~~~ \end{array}\right. \end{aligned}$$
(39)

Proof

For all x, by Markov’s inequality, Lemma 5.1 and (36), we obtain that

$$\begin{aligned} P\left( S_n\ge x\right)\le & {} \exp (-\lambda x)E\exp (\lambda S_n)= \exp (-\lambda x)E\left( \prod \nolimits _{i=1}^n\exp (\lambda Y_i)\right) \\\le & {} M\exp (-\lambda x)\prod \nolimits _{i=1}^nE\exp (\lambda Y_i)\\\le & {} M\exp \left( \frac{G_n\lambda ^2}{2}-\lambda x\right) , \quad \text {for}~ 0<\lambda \le \varLambda _1. \end{aligned}$$

Hence,

$$\begin{aligned} P(S_n\ge x)\le M\inf _{0<\lambda \le \varLambda _1}\exp \left( \frac{G_n\lambda ^2}{2}-\lambda x\right) =M\exp \left( \inf \limits _{0<\lambda \le \varLambda _1}\left( \frac{G_n\lambda ^2}{2}-\lambda x\right) \right) . \end{aligned}$$
(40)

For the fixed \(x\ge 0\), if \(\varLambda _1\ge \frac{x}{G_n}\ge 0\), then

$$\begin{aligned} \exp \left( \inf \limits _{0<\lambda \le \varLambda _1}\left( \frac{G_n\lambda ^2}{2}-\lambda x\right) \right) =\exp \left( -\frac{x^2}{2G_n}\right) . \end{aligned}$$
(41)

Meanwhile, for fixed \(x\ge 0\), if \(\varLambda _1\le \frac{x}{G_n}\), then

$$\begin{aligned} \exp \left( \inf \limits _{0<\lambda \le \varLambda _1}\left( \frac{G_n\lambda ^2}{2}-\lambda x\right) \right) =\exp \left( \frac{G_n\varLambda _1^2}{2}-\varLambda _1 x\right) \le \exp \left( -\frac{\varLambda _1 x}{2}\right) . \end{aligned}$$
(42)

Consequently, (37) follows from (40)–(42) immediately.

According to Lemma 5.1 (1), \(\{-Y_n\}\) are also END random variables. Therefore, (37) yields

$$\begin{aligned} P(S_n\le -x)=P(-S_n\ge x)\le \left\{ \begin{array}{ccc} M\exp \left( -\frac{x^2}{2G_n}\right) ,~ 0\le x\le G_n\varLambda _1,\\ M\exp \left( -\frac{\varLambda _1 x}{2}\right) ,~~x\ge G_n\varLambda _1,~~~~ \end{array}\right. \end{aligned}$$

which implies (38). Combining (37) with (38), we obtain (39) finally. \(\square \)

Remark 5.1

Lemma 5.2 is an extension of exponential inequalities for the independent case (see Theorem 2.6 of Petrov 1995) and NOD case (see Theorem 2.1 of Wang et al. 2010) to the END structure case.

Corollary 5.1

Let \(\{Y_n, n\ge 1\}\) be a sequence of END random variables, \(\{d_n, n\ge 1\}\) be a sequence of real numbers and \(\{r_n, n\ge 1\}\) be a sequence of positive numbers. Suppose there exists a positive constant \(\varLambda _1\) (\(\varLambda _1\) possibly \(\infty \)) such that for all \(|\lambda |\le \varLambda _1\), (36) holds true. Denote \(\tilde{S}_n=\sum \nolimits _{i=1}^n d_i Y_i\), \(\tilde{G}_n=\sum \nolimits _{i=1}^n r_id_i^2\) and \(\varLambda =\varLambda _1/\max \nolimits _{1\le i\le n}|d_i|\). Then for all \(x\ge 0\), there exists a positive constant M such that

$$\begin{aligned} P(\tilde{S}_n\ge x)\le & {} 2M\exp \left\{ -\min \left( \frac{x^2}{8\tilde{G}_n},\frac{\varLambda x}{4}\right) \right\} , \end{aligned}$$
(43)
$$\begin{aligned} P(\tilde{S}_n\le -x)\le & {} 2M\exp \left\{ -\min \left( \frac{x^2}{8\tilde{G}_n},\frac{\varLambda x}{4}\right) \right\} , \end{aligned}$$
(44)
$$\begin{aligned} P(|\tilde{S}_n|\ge x)\le & {} 4M\exp \left\{ -\min \left( \frac{x^2}{8\tilde{G}_n},\frac{\varLambda x}{4}\right) \right\} . \end{aligned}$$
(45)

Proof

Obviously, for \(|\lambda |\le \varLambda =\varLambda _1/\max \nolimits _{1\le i\le n}|d_i|\), we have \(|\lambda d_i^+|\le \varLambda |d_i|\le \varLambda _1\). So by (36), it can be argued that

$$\begin{aligned} E\exp \left( \lambda d_i^+Y_i\right) \le \exp \left( \frac{1}{2}r_i\left( d_i^+\right) ^2\lambda ^2\right) \le \exp \left( \frac{1}{2}r_id_i^2\lambda ^2\right) ,~\text {for}~|\lambda |\le \varLambda . \end{aligned}$$

According to Lemma 5.1 (1), \(d_1^{+}Y_1,\ldots ,d_n^{+}Y_n\) are still END random variables. Denote \(\tilde{S}_n(1)=\sum \nolimits _{i=1}^n d_i^+Y_i\) and \(\tilde{G}_n=\sum \nolimits _{i=1}^n r_id_i^2\). Then, we apply Lemma 5.2 and establish that

$$\begin{aligned} P(\tilde{S}_n(1)\ge x)\le M\exp \left\{ -\min \left( \frac{x^2}{2\tilde{G}_n},\frac{\varLambda x}{2}\right) \right\} . \end{aligned}$$
(46)

Meanwhile, \(d_1^{-}Y_1,\ldots ,d_n^{-}Y_n\) are still END random variables. Denote \(\tilde{S}_n(2)=\sum \nolimits _{i=1}^n d_i^-Y_i\). Similar to the proof of (46), now using the lower-tail bound (38) of Lemma 5.2, one has

$$\begin{aligned} P(\tilde{S}_n(2)\le -x)\le M\exp \left\{ -\min \left( \frac{x^2}{2\tilde{G}_n},\frac{\varLambda x}{2}\right) \right\} . \end{aligned}$$
(47)

Since \(\tilde{S}_n=\tilde{S}_n(1)-\tilde{S}_n(2)\), we have

$$\begin{aligned} P(\tilde{S}_n\ge x) \le P\left( \tilde{S}_n(1)\ge x/2\right) +P\left( \tilde{S}_n(2)\le -x/2\right) , \end{aligned}$$
(48)

and thus, by (46)–(48), we obtain the result (43). Combining the arguments for (38), (39) and (43), we have the results (44) and (45) immediately. \(\square \)

Corollary 5.2

Let \(\left\{ Y_n, n\ge 1\right\} \) be a sequence of END random variables satisfying \(EY_i=0\) and \(EY_i^2=\sigma _i^2<\infty \), \(i=1,2,\ldots \), and let \(\{d_n, n\ge 1\}\) be a sequence of real numbers. Denote \(\tilde{S}_n=\sum \nolimits _{i=1}^n d_i Y_i\) and \(\tilde{B}_n^2=\sum \nolimits _{i=1}^n \sigma _i^2d_i^2\). For fixed \(n\ge 1\), suppose that there exists a positive number L such that

$$\begin{aligned} |EY_i^m|\le \frac{m!}{2}\sigma _i^2L^{m-2},~~i=1,2,\ldots , n \end{aligned}$$
(49)

for all positive integers \(m\ge 2\). Then there exists a positive constant M such that for all \(x\ge 0\),

$$\begin{aligned} P(\tilde{S}_n\ge x)\le & {} 2M\exp \left\{ -\min \left( \frac{x^2}{16\tilde{B}_n^2},\frac{x}{8L\max \limits _{1\le i\le n}|d_i|}\right) \right\} , \end{aligned}$$
(50)
$$\begin{aligned} P(\tilde{S}_n\le -x)\le & {} 2M\exp \left\{ -\min \left( \frac{x^2}{16\tilde{B}_n^2},\frac{x}{8L\max \limits _{1\le i\le n}|d_i|}\right) \right\} , \end{aligned}$$
(51)
$$\begin{aligned} P(|\tilde{S}_n|\ge x)\le & {} 4M\exp \left\{ -\min \left( \frac{x^2}{16\tilde{B}_n^2},\frac{x}{8L\max \limits _{1\le i\le n}|d_i|}\right) \right\} . \end{aligned}$$
(52)

Proof

It can be argued by \(EY_i=0\), \(EY_i^2=\sigma _i^2\) and (49) that

$$\begin{aligned} E\exp (\lambda Y_i)= & {} 1+\frac{\lambda ^2}{2}\sigma _i^2+\frac{\lambda ^3}{6}EY_i^3+\cdots +\frac{\lambda ^k}{k!}EY_i^k+\cdots \\\le & {} 1+\frac{\lambda ^2}{2}\sigma _i^2\left( 1+L|\lambda |+L^2\lambda ^2+\cdots +L^{k-2}|\lambda |^{k-2}+\cdots \right) ,~i=1,2,\ldots ,n. \end{aligned}$$

If \(|\lambda |\le \frac{1}{2L}\), then

$$\begin{aligned} E\exp (\lambda Y_i)\le 1+\frac{\lambda ^2\sigma _i^2}{2}\frac{1}{1-L|\lambda |}\le 1+\lambda ^2\sigma _i^2\le \exp \left( \lambda ^2\sigma _i^2\right) := \exp \left( \frac{1}{2}r_i\lambda ^2\right) , \end{aligned}$$
(53)

where \(r_i=2\sigma _i^2\) and \(i=1,2,\ldots ,n\). Taking \(\varLambda _1=\frac{1}{2L}\) and \(\tilde{G}_n=\sum _{i=1}^nr_id_i^2=2\sum _{i=1}^n\sigma _i^2d_i^2=2\tilde{B}_n^2\) in Corollary 5.1, we have the results (50)–(52) immediately. \(\square \)

Lemma 5.3

For some \(m\ge 2\), let \(\left\{ Y_n, n\ge 1\right\} \) be a sequence of END random variables with \(EY_n=0\) and \(E|Y_n|^m<\infty \), \(n=1,2,\ldots \). Assume that \(\left\{ a_{ni}, 1\le i\le n, n\ge 1\right\} \) is a triangular array of real numbers. Denote \(S_n=\sum \nolimits _{i=1}^n a_{ni}Y_i\). Then there exists a positive constant C not depending on n such that

$$\begin{aligned} E|S_n|^m\le C\max \limits _{1\le i\le n}E|Y_i|^m\Bigg (\sum \limits _{i=1}^na_{ni}^2\Bigg )^{m/2}. \end{aligned}$$
(54)

Proof

Denote \(S_n(1)=\sum \nolimits _{i=1}^n a_{ni}^{+}Y_i\) and \(S_n(2)=\sum \nolimits _{i=1}^n a_{ni}^{-}Y_i\). For \(m\ge 1\), by \(C_r\) inequality, one has

$$\begin{aligned} E|S_n|^m=E|S_n(1)-S_n(2)|^m\le 2^{m-1}(E|S_n(1)|^m+E|S_n(2)|^m). \end{aligned}$$
(55)

Obviously, by Lemma 5.1 (1), \(\{a_{ni}^{+}Y_i,1\le i\le n\}\) and \(\{a_{ni}^{-}Y_i,1\le i\le n\}\) are also END random variables. Then, for \(m\ge 2\), Corollary 3.2 of Shen (2011) yields that

$$\begin{aligned} E|S_n(1)|^m\le & {} C_1\Bigg (\sum \limits _{i=1}^n (a_{ni}^{+})^mE|Y_i|^m+\Bigg (\sum \limits _{i=1}^n (a_{ni}^{+})^2EY_i^2\Bigg )^{m/2}\Bigg )\nonumber \\\le & {} C_1\max \limits _{1\le i\le n}E|Y_i|^m\Bigg (\sum \limits _{i=1}^n |a_{ni}|^m+\Bigg (\sum \limits _{i=1}^n a_{ni}^2\Bigg )^{m/2}\Bigg )\nonumber \\\le & {} C_2\max \limits _{1\le i\le n}E|Y_i|^m\Bigg (\sum \limits _{i=1}^n a_{ni}^2\Bigg )^{m/2}. \end{aligned}$$
(56)

Similarly, one has

$$\begin{aligned} E|S_n(2)|^m\le C_3\max \limits _{1\le i\le n}E|Y_i|^m\Bigg (\sum \limits _{i=1}^n a_{ni}^2\Bigg )^{m/2}. \end{aligned}$$
(57)

Thus, (54) follows from (55) to (57) immediately. \(\square \)

Proof of Theorem 2.1

Let \(\theta \in K\) and \(u,v\in {\varGamma }_{n,\theta ,R}\). If \(|u-v|\ge k_1\), then we obtain by (18) that

$$\begin{aligned} \sum \limits _{t=1}^n\left( d_{tn\theta }(u)-d_{tn\theta }(v)\right) ^2\le 2\text {pol}(R)\le 2|u-v|^{2\rho }k_1^{-2\rho }\text {pol}(R). \end{aligned}$$

Combining this with (17), which covers the case \(|u-v|\le k_1\), we find that

$$\begin{aligned} \sum \limits _{t=1}^n\left( d_{tn\theta }(u)-d_{tn\theta }(v)\right) ^2\le |u-v|^{2\rho }\text {pol}(R),~~\text {for all}~u,v\in {\varGamma }_{n,\theta ,R}. \end{aligned}$$
(58)

Taking \(\zeta _{n,\theta }(u):=\log Z_{n,\theta }(u)\) (i.e., choosing the logarithm as the monotone function in (6)), we obtain from (15) that

$$\begin{aligned} \zeta _{n,\theta }(u)-\zeta _{n,\theta }(v)=\sum \limits _{t=1}^n(A_t\varepsilon _t-B_t), \end{aligned}$$
(59)

where

$$\begin{aligned} A_t=d_{tn\theta }(u)-d_{tn\theta }(v),~~~B_t=\frac{1}{2}(d^2_{tn\theta }(u)-d^2_{tn\theta }(v)). \end{aligned}$$
(60)

For all \(m\ge 1\), the \(C_r\) inequality yields

$$\begin{aligned} E_\theta ^{(n)}|\zeta _{n,\theta }(u)-\zeta _{n,\theta }(v)|^m\le 2^{m-1}\left( E_\theta ^{(n)}\Big |\sum \limits _{t=1}^nA_t\varepsilon _t\Big |^m+\Big |\sum \limits _{t=1}^nB_t\Big |^m\right) . \end{aligned}$$
(61)

Obviously, the condition (N.1) implies that \(\left\{ E|\varepsilon _t|^m,t\in \mathcal {N}\right\} \) is uniformly bounded. So, by (58) and Lemma 5.3 with (N.1) and \(E\varepsilon _t=0\), we obtain that for all \(m\ge 2\)

$$\begin{aligned} E_\theta ^{(n)}\Big |\sum \limits _{t=1}^nA_t\varepsilon _t\Big |^m\le C\left( \sum \limits _{t=1}^n A_t^2\right) ^{m/2}\le |u-v|^{\rho m}\text {pol}(R). \end{aligned}$$
(62)

Meanwhile, by the Cauchy–Schwarz inequality, (18) and (58), one has that

$$\begin{aligned} \Bigg |\sum \limits _{t=1}^n B_t\Bigg |\le & {} \frac{1}{2}\sum \limits _{t=1}^n|d_{tn\theta }(u)-d_{tn\theta }(v)|\cdot |d_{tn\theta }(u)+d_{tn\theta }(v)|\nonumber \\\le & {} \frac{1}{2}\Bigg \{\sum \limits _{t=1}^n A_t^2\cdot \sum \limits _{t=1}^n(d_{tn\theta }(u)+d_{tn\theta }(v))^2\Bigg \}^{1/2}\le |u-v|^{\rho }\text {pol}(R). \end{aligned}$$
(63)

So it follows from (61)–(63) that for all \(\theta \in K\) and \(u,v\in {\varGamma }_{n,\theta ,R}\)

$$\begin{aligned} E_\theta ^{(n)}|\zeta _{n,\theta }(u)-\zeta _{n,\theta }(v)|^m\le |u-v|^{\rho m} \text {pol}(R). \end{aligned}$$
(64)

Taking \(m>\max (2,k/\rho )\) in (64), we find that (7) is fulfilled with \(\alpha =\rho m\).

Next, we verify that (10) holds true. For \(0<\delta <\frac{1}{2}\), let

$$\begin{aligned} \eta _{n,\theta }(u)=\left( \frac{1}{2}-\delta \right) \sum \limits _{t=1}^n d^2_{tn\theta }(u). \end{aligned}$$
(65)

By (19), it can be argued that

$$\begin{aligned} \sum \limits _{t=1}^n d^2_{tn\theta }(u)\ge 32rg_n(R), \end{aligned}$$
(66)

which shows that \(\eta _{n,\theta }(u)\in \mathbf H _K\): since \(g^{-1}_n(R)\le 1\) for all n and R large enough, (5) holds with a constant polynomial. Denote \(d_{tn\theta }(u)=d_t\) and \(\max \limits _{1\le t\le n}|d_{tn\theta }(u)|=\max |d_{t}|\). Then, by (19), (59), (60), (65), (66) and Corollary 5.1, we get

$$\begin{aligned} P_\theta ^{(n)}\Big (\zeta _{n,\theta }(u)-\zeta _{n,\theta }(0)\ge -\eta _{n,\theta }(u)\Big )= & {} P_\theta ^{(n)}\Bigg (\sum \limits _{t=1}^n d_t\varepsilon _t\ge \delta \sum \limits _{t=1}^n d_t^2\Bigg )\\\le & {} 2M\exp \left\{ -\sum \limits _{t=1}^n d_t^2\min \Big (\frac{\delta ^2}{8r},\frac{\delta \varLambda _1}{4\max |d_t|}\Big )\right\} \\= & {} 2M\exp \left\{ -\sum \limits _{t=1}^n d_t^2/\max \Big (\frac{8r}{\delta ^2},\frac{4\max |d_t|}{\delta \varLambda _1}\Big )\right\} \\\le & {} 2M\exp (-g_n(R)), \end{aligned}$$

which implies that (10) is fulfilled. Combining Theorem 1.1 with Remark 1.2, we obtain (20). Meanwhile, by choosing \(\alpha =\rho m\) in Theorem 1.1, for all \(\beta >0\) and m large enough, there exists a positive \(B_0\) such that (20) holds, where \(b_0\ge \frac{\rho }{\rho +k}-\beta \).   \(\square \)

Proof of Theorem 2.2

By (21), (17) is fulfilled with \(\rho =1\) and pol\((R)=D_2\). Obviously, for all \(u\in {\varGamma }_{n,\theta ,R}\), \(\theta \in \varTheta \), one has

$$\begin{aligned} \sum \limits _{t=1}^n d_{tn\theta }^2\le D_2|u|^2\le D_2(R+1)^2, \end{aligned}$$
(67)

which implies that (18) is satisfied. Meanwhile, for all \(0<\delta <1/2\) and \(u\in {\varGamma }_{n,\theta ,R}\), it can be argued that

$$\begin{aligned} \sum \limits _{t=1}^n d_{tn\theta }^2\ge D_1|u|^2\ge D_1R^2. \end{aligned}$$
(68)

Then, by (68) and \(\varLambda _1=\infty \), (19) is fulfilled with \(g_n(R)=\frac{D_1\delta ^2}{8r}R^2\). Letting \(\delta \rightarrow \frac{1}{2}\), we apply Theorem 2.1 and obtain the result (22) finally. \(\square \)

Proof of Theorem 2.3

Combining Corollary 5.2 with the proof of Theorem 2.1, where r is replaced by \(2\sigma ^2\) and \(\varLambda _1\) is replaced by \(\frac{1}{2L}\), we have (20) immediately. \(\square \)

Proof of Theorem 2.4

Combining (21) with (67), for all \(u\in {\varGamma }_{n,\theta ,R}\), \(\theta \in \varTheta \), it can be checked that

$$\begin{aligned} d_{tn\theta }^2\le \sum \limits _{t=1}^n d_{tn\theta }^2\le D_2|u|^2\le D_2(R+1)^2,~~1\le t\le n, \end{aligned}$$

which implies

$$\begin{aligned} \max \limits _{1\le t\le n}|d_{tn\theta }|\le \sqrt{D_2}(R+1). \end{aligned}$$
(69)

Let \(g_n(R)=C_1R\), where \(C_1\) is a positive constant to be specified below. Next, we prove that (23) in (N.3)\(^\prime \) is fulfilled for all R large enough. By (21), (68) and (69), for all \(u\in {\varGamma }_{n,\theta ,R}\), \(\theta \in \varTheta \), we can take a positive constant \(C_1\) such that, for all R large enough,

$$\begin{aligned} \sum \limits _{t=1}^n d_{tn\theta }^2\ge D_1R^2\ge \frac{16\sigma ^2}{\delta ^2}C_1R=\frac{16\sigma ^2}{\delta ^2}g_n(R) \end{aligned}$$
(70)

and

$$\begin{aligned} \sum \limits _{t=1}^n d_{tn\theta }^2\ge D_1R^2\ge \frac{8L}{\delta }\sqrt{D_2}(R+1)C_1R\ge \frac{8L}{\delta }\max \limits _{1\le t\le n}|d_{tn\theta }|g_n(R). \end{aligned}$$
(71)

Thus, by (70) and (71), (23) is fulfilled. Consequently, by the proofs of Theorems 2.1 and 2.3, we apply Theorem 2.3 and establish that for all n and H large enough,

$$\begin{aligned} \sup \limits _{\theta \in \varTheta }P_{\theta }^{(n)}\Big \{|\phi ^{-1}_n(\theta )(\hat{\theta }_n-\theta )|\ge H\Big \}\le B_0\exp (-C_0H), \end{aligned}$$
(72)

where \(C_0=b_0C_1\), and \(B_0\) and \(b_0\) are as in Theorem 2.3. \(\square \)

Proof of Corollary 2.1

For all \(\theta \in \varTheta \subset \mathscr {R}^k\) and all \(\rho >0\), taking \(\phi _n(\theta )=n^{-1/2}I_k\) and \(H=n^{1/2}\rho \) in (22), one establishes the result (26) immediately, where \(I_k\) is a \(k\times k\) unit matrix. Taking \(C_1\) large enough that \(bC_1^2>1\) and applying (26) with \(\rho =C_1n^{-\frac{1}{2}}\log ^{\frac{1}{2}}n\), we establish that

$$\begin{aligned} \sum _{n=1}^\infty P_{\theta }^{(n)}\Big \{|\hat{\theta }_n-\theta |\ge C_1n^{-\frac{1}{2}}\log ^{\frac{1}{2}}n\Big \}\le & {} B_0\sum _{n=1}^\infty \exp (-bC_1^2\log n)<\infty . \end{aligned}$$

This completes the proof of (27). \(\square \)