1 Introduction

Consider the nonlinear regression model:

$$\begin{aligned} X_{n}=g_{n}(\theta )+\xi _{n},~n\ge 1, \end{aligned}$$
(1.1)

where \(X_{n}\) is observed, \(\{g_{n}(\theta )\}\) is a known sequence of continuous functions, possibly nonlinear in \(\theta \in \Theta \), a closed interval on the real line, and \(\{\xi _{n},n\ge 1\}\) is a sequence of random errors with zero mean. Nonlinear regression models have significant advantages over linear models: they typically involve essentially fewer unknown parameters, and these parameters often carry the meaning of physical variables, whereas the parameters of linear models are usually devoid of physical significance. It is therefore of great interest to study the nonlinear regression model. In most studies devoted to regression analysis over the past decades, the central place has been occupied by the least squares method of parameter estimation, which has a long history. Let

$$\begin{aligned} Q_{n}(\theta )=\frac{1}{n}\sum _{i=1}^{n}\omega _{i}^{2}(X_{i}-g_{i}(\theta ))^{2}, \end{aligned}$$

where \(\{\omega _{i}\}\) is a known sequence of positive numbers. An estimator \(\theta _{n}\) is said to be an ordinary least squares estimator (OLSE, for short) of \(\theta \) if it minimizes \(Q_{n}(\theta )\), that is, \(Q_{n}(\theta _{n})=\inf _{\theta \in \Theta }Q_{n}(\theta )\).
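
For concreteness, the following minimal sketch computes the OLSE numerically; the interface \(g(i,\theta )\) for evaluating \(g_{i}(\theta )\), the array arguments, and the bounded-interval optimizer are illustrative assumptions rather than part of the model.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def olse(X, g, omega, theta_interval):
    """Minimize Q_n(theta) = (1/n) * sum_i omega_i^2 * (X_i - g_i(theta))^2
    over the closed interval Theta = theta_interval = (a, b).

    X     : observations X_1, ..., X_n (numpy array)
    g     : callable g(i, theta) returning g_i(theta) (hypothetical interface)
    omega : known positive weights omega_1, ..., omega_n (numpy array)
    """
    n = len(X)
    def Q(theta):
        g_vals = np.array([g(i, theta) for i in range(1, n + 1)])
        return np.mean(omega ** 2 * (X - g_vals) ** 2)
    return minimize_scalar(Q, bounds=theta_interval, method="bounded").x
```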

The asymptotic properties of the OLSE for parameters in nonlinear regression models have been a main subject of investigation. The model is challenging to analyze because the OLSE of parameters that enter the regression function nonlinearly cannot be written in explicit form, which complicates the description of its mathematical properties. Hence, introducing nonlinear regression analysis into statistics requires overcoming a series of mathematical difficulties that have no analogues in the linear theory. For the OLSE of the nonlinear model based on i.i.d. random errors, Jennrich (1969) established the asymptotic normality, Malinvaud (1970) investigated the consistency, and Wu (1981) established a necessary and sufficient condition for the strong consistency, among others. In particular, Ivanov (1976) obtained the following large deviation result for the OLSE with \(\omega _{i}\equiv 1\) based on i.i.d. random errors.

Theorem 1.1

Let \(\{\xi _{n},n\ge 1\}\) be i.i.d. random variables with \(E|\xi _{1}|^{p}<\infty \) for some \(p\ge 2\). Suppose there exist some constants \(0<c_{1}\le c_{2}<\infty \) such that

$$\begin{aligned} c_{1}(\theta _{1}-\theta _{2})^{2}\le \frac{1}{n}\sum _{i=1}^{n}(g_{i}(\theta _{1})-g_{i}(\theta _2))^{2}\le c_{2}(\theta _{1}-\theta _{2})^{2},~\text { for any }\theta _{1},\theta _{2}\in \Theta ,~n\ge 1. \end{aligned}$$

Then for every \(\rho >0\) and all \(n\ge 1\), we have

$$\begin{aligned} P(n^{1/2}|\theta _{n}-\theta _{0}|>\rho )\le c\rho ^{-p}, \end{aligned}$$

where \(\theta _{0}\) is the true parameter lying in the interior of \(\Theta \) and c is a positive constant not depending on n and \(\rho \).

Prakasa Rao (1984) extended Theorem 1.1 from the i.i.d. case to some dependent cases such as \(\varphi \)-mixing and \(\alpha \)-mixing assumptions. Hu (2002) extended Theorem 1.1 to martingale differences, \(\varphi \)-mixing and negative association (NA, for short) assumptions under \(\sup _{i\ge 1}E|\xi _{i}|^{p}<\infty \) for some \(p>2\) without assuming identical distributions. Hu (2004) further considered the large deviation result under the moment condition \(\sup _{i\ge 1}E|\xi _{i}|^{p}<\infty \) for some \(1<p\le 2\). Yang and Hu (2014) obtained some general large deviation results, which remain applicable in some cases where \(\sup _{i\ge 1}E|\xi _{i}|^{p}=\infty \) for some \(p>1\); Yang et al. (2017) established some large deviation results under extended negatively dependent (END, for short) random errors, and so on. However, a new challenge emerges if the errors are heteroscedastic: estimating the variances of the errors is not easy.

It is well known that the bootstrap is an excellent method, which has been used extensively in many statistical models including the nonlinear regression model; see Staniewski (1984) for example. As an alternative approach, the random weighting method, or Bayesian bootstrap method, has received increasing attention since it was originally suggested by Rubin (1981). The random weighting method is motivated by the bootstrap method and can be regarded as a kind of smoothing of the bootstrap. Instead of re-sampling from the original data set, the random weighting method generates a group of random weights directly and uses them to weight the original samples. Compared with the bootstrap method, the random weighting method has advantages such as simplicity in computation and suitability for large samples, and there is no need to know the distribution function. Therefore, this method has been adopted in various statistical models. For more details, we refer the readers to Zheng (1987), Gao et al. (2003), Xue and Zhu (2005), Fang and Zhao (2006), Barvinok and Samorodnitsky (2007), Gao and Zhong (2010), and so forth.

However, to the best of our knowledge, there is no literature considering randomly weighted estimation in nonlinear regression models. In this paper, the random weighting method is adopted for the first time for least squares estimation in nonlinear regression models. We are now in a position to present this method.

Definition 1.1

(cf. Ng et al. 2011) Let \((W_{1},\cdots ,W_{n})\) be a random vector with \(W_{i}\ge 0\) and \(\sum _{i=1}^{n}W_{i}=1\). Then the Dirichlet probability density function of \((W_{1},\cdots ,W_{n})\) is defined as

$$\begin{aligned} f(w_{1},\cdots ,w_{n})=\frac{\Gamma (\alpha _{0})}{\prod _{i=1}^{n}\Gamma (\alpha _{i})}\prod _{i=1}^{n}w_{i}^{\alpha _{i}-1}, \end{aligned}$$

where \(\alpha _{i}>0\), \(\alpha _{0}=\sum _{i=1}^{n}\alpha _{i}\), \(w_{i}\ge 0\), \(\sum _{i=1}^{n-1}w_{i}\le 1\) and \(w_{n}=1-\sum _{i=1}^{n-1}w_{i}\). This distribution is denoted by \(Dir(\alpha _{1},\cdots ,\alpha _{n})\).

By virtue of the concept of Dirichlet distribution, we can propose the randomly weighted least squares estimator of \(\theta \) as follows. Let

$$\begin{aligned} H_{n}(\theta )=\sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta ))^{2}, \end{aligned}$$
(1.2)

where \(W_{i}\)’s are independent of \(\xi _{i}\)’s and the random vector \({\varvec{W}}=(W_{1},\cdots ,W_{n})\) obeys the Dirichlet distribution \(Dir(4,4,\ldots ,4)\), namely, \(\sum _{i=1}^{n}W_{i}=1\) and the joint density of \(W_{1},\cdots ,W_{n-1}\) is

$$\begin{aligned} f(w_{1},\cdots ,w_{n-1})=\frac{\Gamma (4n)}{(\Gamma (4))^{n}}w_{1}^{3}\cdots w_{n-1}^{3}(1-w_{1}-\cdots -w_{n-1})^{3}, \end{aligned}$$

where \((w_{1},\cdots ,w_{n-1})\in D_{n-1}\) and \(D_{n-1}=\{(w_{1},\cdots ,w_{n-1}):w_{i}\ge 0,i=1,\ldots ,n-1,\sum _{i=1}^{n-1}w_{i}\le 1\}\). An estimator \(\hat{\theta }_{n}\) is said to be a randomly weighted least squares estimator (RWLSE, for short) of \(\theta \) if \(\hat{\theta }_{n}=\arg \inf _{\theta \in \Theta }H_{n}(\theta )\).
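
A single realization of the RWLSE can be computed along the lines of the following sketch; as in the earlier sketch, the interface \(g(i,\theta )\) and the bounded-interval optimizer are illustrative assumptions, and the Dirichlet weights are drawn with numpy's built-in sampler.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rwlse(X, g, theta_interval, rng=None):
    """One draw of the RWLSE: generate W ~ Dir(4, ..., 4), independent of the data,
    and minimize H_n(theta) in (1.2) over the closed interval Theta = theta_interval."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    W = rng.dirichlet(np.full(n, 4.0))          # W_i >= 0 and sum_i W_i = 1
    def H(theta):
        g_vals = np.array([g(i, theta) for i in range(1, n + 1)])
        return np.sum(W * (X - g_vals) ** 2)
    return minimize_scalar(H, bounds=theta_interval, method="bounded").x
```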

Since the independence assumption is usually implausible in practice, we adopt a relatively broad dependence structure, namely the END assumption, in the sequel. The concept of END random variables was introduced by Liu (2009) as follows.

Definition 1.2

A finite collection of random variables \(X_1,X_2,\cdots ,X_n\) is said to be END if there exists a constant \(M > 0\) such that both

$$\begin{aligned} P(X_1>x_1,X_2>x_2,\cdots ,X_n>x_n)\le M\prod _{i=1}^nP(X_i>x_i)\end{aligned}$$

and

$$\begin{aligned} P(X_1\le x_1, X_2\le x_2,\cdots ,X_n\le x_n)\le M\prod _{i=1}^nP(X_i\le x_i) \end{aligned}$$

hold for all real numbers \(x_1, x_2, \cdots , x_n\). An infinite sequence \(\{X_n, n\ge 1\}\) is said to be END if every finite sub-collection is END.

Liu (2009) provided some examples satisfying the END structure, one of which shows that if \(X_1,X_2,\cdots ,X_n\) have absolutely continuous distribution functions \(F_{1},\cdots ,F_{n}\) and are dependent according to a multivariate copula function \(C(u_{1},\cdots ,u_{n})\) whose copula density \(c(u_{1},\cdots ,u_{n})=\frac{\partial ^{n}C(u_{1},\cdots ,u_{n})}{\partial u_{1}\cdots \partial u_{n}}\) exists and is uniformly bounded on the whole domain, then \(\{X_{n},n\ge 1\}\) are END. If we take \(M=1\), then the END structure reduces to the negatively orthant dependent (NOD, for short) structure, which was introduced by Lehmann (1966) (cf. also Joag-Dev and Proschan 1983). The END structure can reflect not only a negative dependence structure but also, to some extent, a positive one. Liu (2009) pointed out that END random variables can be negatively or positively dependent and provided some interesting examples to support this idea. Joag-Dev and Proschan (1983) also pointed out that negatively associated (NA, for short) random variables are NOD but the converse is not necessarily true; thus NA random variables are also END. Hence, the consideration of the END structure is reasonable and of great interest. Many applications have been found for END random variables. For example, Liu (2010) studied the sufficient and necessary conditions of moderate deviations for END random variables with heavy tails; Chen et al. (2010) established the strong law of large numbers for END random variables and gave applications to risk theory and renewal theory; Shen (2011) established some exponential probability inequalities for END random variables and presented some applications; Wang and Wang (2013) investigated the precise large deviations for random sums of END real-valued random variables with consistent variation; Wang et al. (2014) proved some results on complete convergence of END random variables; Lita da Silva (2015) established the almost sure convergence for sequences of END random variables; Wang et al. (2015) and Yang et al. (2018) studied the complete consistency of estimators of nonparametric regression models based on END errors; Wu et al. (2019) investigated the complete f-moment convergence for END random variables, and so on.

For the proposed RWLSE, we establish two general large deviation results for the estimator of the parameter \(\theta \) with \(p>2\) and, respectively, \(1<p\le 2\) under END errors. As direct corollaries, the rates of complete consistency, strong consistency, and weak consistency are obtained, which show that the proposed RWLSE is a consistent estimator of \(\theta \). The numerical analysis reveals that the RWLSE performs as well as the OLSE in heteroscedastic nonlinear regression models, and sometimes better. As pointed out earlier, it is not easy to estimate the variances of heteroscedastic errors, so this paper provides an alternative method for estimating the parameters in a heteroscedastic nonlinear regression model.

Throughout this paper, the symbol C represents a positive constant which may differ in different places. \(C(p),C'(p),C_{1}(p),C_{2}(p),\cdots \) denote positive constants depending only on p. Let I(A) be the indicator function of the event A and \(\lfloor x\rfloor \) denote the integer part of x. Denote \(x^{+}=xI(x\ge 0)\) and \(x^{-}=-xI(x<0)\). We write \(\log x=\ln \max (x,e)\), where \(\ln x\) denotes the natural logarithm of x.

The rest of this paper is organized as follows: The main results are stated in Sect. 2. The numerical analysis is provided in Sect. 3. The proofs of the main results are presented in Sect. 4. Some lemmas for proving the main results are given in Appendix.

2 Main results

The main results on large deviations are presented as follows.

Theorem 2.1

Let \(p>2\). In model (1.1), assume that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(E|\xi _{n}|^{p}<\infty \) for each \(n\ge 1\). If there exist positive numbers \(\lambda _n\le \Lambda _n\) for each \(n\ge 1\), such that

$$\begin{aligned} \lambda _n|\theta _{1}-\theta _{2}|\le |g_{i}(\theta _{1})-g_{i}(\theta _2)|\le \Lambda _n|\theta _{1}-\theta _{2}|,~\text { for any }\theta _{1},\theta _{2}\in \Theta ,~1\le i\le n,~n\ge 1,\nonumber \\ \end{aligned}$$
(2.1)

then there exists a positive constant C(p) depending only on p such that for all \(\rho >0\) and each \(n\ge 1\),

$$\begin{aligned} P(n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\rho )\le C(p)\left[ n^{-1}\lambda _n^{-p}\Delta _{np}+n^{-p/2}(\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})\right] \rho ^{-p},\nonumber \\\end{aligned}$$
(2.2)

where \(\Delta _{np}=\sum _{i=1}^{n}E|\xi _{i}|^{p}\) and \(\nabla _{np}=\left( \sum _{i=1}^{n}E\xi _{i}^{2}\right) ^{p/2}\).

Theorem 2.2

Let \(1<p\le 2\). In model (1.1), assume that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(E|\xi _{n}|^{p}<\infty \) for each \(n\ge 1\). If (2.1) holds, then there exists a positive constant \(C'(p)\) depending only on p such that for all \(\rho >0\) and each \(n\ge 1\),

$$\begin{aligned} P(n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\rho )\le C'(p)(\Lambda _n/\lambda _n^{2})^{p}n^{-p/2}\Delta _{np}\rho ^{-p}. \end{aligned}$$
(2.3)

Remark 2.1

It is easy to see that if \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(p>2\), Theorem 2.1 extends Theorem 1.1 from the i.i.d. assumption to END random errors with not necessarily identical distributions. Similarly, if \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(1<p\le 2\), then Theorem 2.2 also extends the corresponding result of Hu (2004).
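
Indeed, if \(\sup _{n\ge 1}E|\xi _{n}|^{p}\le K<\infty \) for some \(p>2\) and \(\lambda _{n}=c_{1}\), \(\Lambda _{n}=c_{2}\), then Lyapunov's inequality gives

$$\begin{aligned} \Delta _{np}\le Kn,\qquad \nabla _{np}=\left( \sum _{i=1}^{n}E\xi _{i}^{2}\right) ^{p/2}\le \left( \sum _{i=1}^{n}(E|\xi _{i}|^{p})^{2/p}\right) ^{p/2}\le Kn^{p/2}, \end{aligned}$$

so the right-hand side of (2.2) is at most \(C(p)K\left[ c_{1}^{-p}+(c_{2}/c_{1}^{2})^{p}(n^{1-p/2}+1)\right] \rho ^{-p}\le c\rho ^{-p}\), which recovers the form of the bound in Theorem 1.1.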

Remark 2.2

Yang and Hu (2014) also established similar results for the OLSE of \(\theta \) with NOD errors. Taking \(\lambda _{n}=c_{1}\) and \(\Lambda _{n}=c_{2}\) for some \(0<c_1\le c_2\), we point out that the result in Theorem 2.1 and the corresponding one of Yang and Hu (2014) do not imply each other. For example, \(n^{-1}\sum _{i=1}^{n}E|\xi _{i}|^{p}>n^{-p/2}\sum _{i=1}^{n}E|\xi _{i}|^{p}\) but \(n^{-p/2}\left( \sum _{i=1}^{n}E\xi _{i}^{2}\right) ^{p/2}\le n^{-p/2}\left( \sum _{i=1}^{n}(E|\xi _{i}|^{p})^{2/p}\right) ^{p/2}\). However, if \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(p>2\), they are equivalent. Hence, our results extend the corresponding ones of Yang and Hu (2014).

By Theorem 2.1, we can obtain the result concerning the rate of complete consistency and strong consistency as follows.

Corollary 2.1

In model (1.1), assume that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(p>2\). If (2.1) holds with \(\lambda _{n}=c_1\) and \(\Lambda _{n}=c_{2}\) for some \(0<c_{1}\le c_{2}<\infty \), then for any \(\epsilon >0\),

$$\begin{aligned} \sum _{n=1}^{\infty }P(|\hat{\theta }_{n}-\theta _{0}|>\epsilon n^{1/p-1/2}\sqrt{\log n})<\infty ,\end{aligned}$$

and thus

$$\begin{aligned} |\hat{\theta }_{n}-\theta _{0}|=o\left( n^{1/p-1/2}\sqrt{\log n}\right) ~a.s.,~\text {as }n\rightarrow \infty . \end{aligned}$$

By Theorem 2.2, we can also obtain the following result on rate of weak consistency of the RWLSE \(\hat{\theta }_{n}\).

Corollary 2.2

In model (1.1), assume that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(1<p\le 2\). If (2.1) holds with \(\lambda _{n}=c_1\) and \(\Lambda _{n}=c_{2}\) for some \(0<c_{1}\le c_{2}<\infty \), then

$$\begin{aligned}|\hat{\theta }_{n}-\theta _{0}|=O_P\left( n^{1/p-1}\right) .\end{aligned}$$

In particular, if \(p=2\), then for any positive sequence \(\{\tau _{n},n\ge 1\}\) satisfying \(\tau _{n}=o(n)\),

$$\begin{aligned}\sqrt{\tau _{n}}|\hat{\theta }_{n}-\theta _{0}|\xrightarrow {P}0.\end{aligned}$$

3 Some examples and numerical analysis

3.1 Some examples

In this subsection, we present some examples for the RWLSE of nonlinear regression models.

Example 3.1

Consider the linear model

$$\begin{aligned} X_{i}=\theta +\xi _{i},~i=1,2,\ldots ,n,~\theta \in \Theta , \end{aligned}$$
(3.4)

where \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(E|\xi _{n}|^{p}<\infty \) for each \(n\ge 1\). Obviously, (2.1) holds with \(\lambda _{n}=\Lambda _{n}=1\). Hence, Theorems 2.1 and 2.2 apply under \(E|\xi _{n}|^{p}<\infty \) with \(p>2\) and \(1<p\le 2\), respectively.
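
In this location model the RWLSE even has an explicit form: since \(\sum _{i=1}^{n}W_{i}=1\), minimizing \(H_{n}(\theta )=\sum _{i=1}^{n}W_{i}(X_{i}-\theta )^{2}\) over the real line yields

$$\begin{aligned} \hat{\theta }_{n}=\sum _{i=1}^{n}W_{i}X_{i}, \end{aligned}$$

a randomly weighted sample mean (projected to the nearest endpoint of \(\Theta \) if this value falls outside the interval).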

Example 3.2

Consider the Michaelis-Menten model (see Sieders and Dzhaparidze (1987) or Miao and Tang (2021), for example)

$$\begin{aligned} V(v,L,N)=\frac{Lv}{N+v}, \end{aligned}$$

which is used to describe the relation between the velocity V of an enzyme reaction and the concentration v of the substrate. The parameter L denotes the maximal reaction velocity and the parameter N represents the chemical affinity. Based on the model above, for each concentration \(v_{i}\) there is a measurement of the velocity with error \(\xi _{i}\), i.e.,

$$\begin{aligned} X_{i}=V(v_i,L,N)+\xi _{i}=\frac{Lv_{i}}{N+v_{i}}+\xi _{i},~i\ge 1. \end{aligned}$$
(3.5)

Assume that the parameter \((L, N)\) belongs to \(\Theta \), a bounded open set in the positive quadrant. Consider the following simple form of model (3.5):

$$\begin{aligned} X_{i}=g_{i}(N)+\xi _{i}=\frac{1}{N^{-1}+i^{\mu }}+\xi _{i}, \end{aligned}$$
(3.6)

which follows from (3.5) by assuming N/L is known (without loss of generality, we may assume that \(N/L=1\)) and letting \(v_{i}=i^{-\mu }\), \(0<\mu <\min \{(p-1)/(4p),1/8\}\), where \(p>1\). It is easy to see that

$$\begin{aligned} c_{3}n^{-2\mu }|N_{1}-N_{2}|\le & {} |g_{i}(N_{1})-g_{i}(N_{2})|=\frac{|N_{1}-N_{2}|}{(1+N_{1}i^{\mu })(1+N_{2}i^{\mu })}\\\le & {} c_{4}|N_{1}-N_{2}|,~1\le i\le n \end{aligned}$$

for some \(0<c_{3}\le c_{4}<\infty \). Assume further that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(E|\xi _{n}|^{p}<\infty \) for each \(n\ge 1\); then Theorems 2.1 and 2.2 hold. Moreover, by choosing \(\rho =n^{1/2}\epsilon \), we can obtain weak consistency for \(p>1\) and strong consistency for p large enough.
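
The following short sketch checks the displayed bound numerically; the values of \(\mu \), n and the range of N below are chosen purely for illustration (for instance, \(\mu =0.1\) satisfies the constraint when \(p=2\)).

```python
import numpy as np

# Numerical check of |g_i(N1) - g_i(N2)| / |N1 - N2|
#   = 1 / ((1 + N1 * i**mu) * (1 + N2 * i**mu))
# for g_i(N) = 1 / (N**-1 + i**mu), over an assumed bounded range of N.
mu, n = 0.1, 1000
i = np.arange(1, n + 1)
N_grid = np.linspace(0.05, 1.0, 40)          # assumed parameter range for N

ratio_min, ratio_max = np.inf, 0.0
for N1 in N_grid:
    for N2 in N_grid:
        r = 1.0 / ((1 + N1 * i ** mu) * (1 + N2 * i ** mu))
        ratio_min, ratio_max = min(ratio_min, r.min()), max(ratio_max, r.max())

print("admissible c_3 (= min ratio * n^(2*mu)):", ratio_min * n ** (2 * mu))
print("admissible c_4 (= max ratio):           ", ratio_max)
```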

3.2 Numerical analysis

In this subsection, we carry out some simulations to study the finite sample performance of the RWLSE in homoscedastic and heteroscedastic nonlinear regression models. The data are generated from models (3.4) (denoted as Model 1) and (3.6) (denoted as Model 2), respectively. For Model 1, set \(\theta =1\); for Model 2, set \(N=1/5\). Set the sample size \(n=50,100,200,400,800,1600\). Let \((\epsilon _1,\cdots ,\epsilon _n)\sim N_n(0,\Sigma )\) with

$$\begin{aligned} \Sigma =\begin{pmatrix} 1&{}\quad -0.3&{}\quad 0&{}\quad \cdots &{}\quad 0\\ -0.3&{}\quad 1&{} \quad -0.3&{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad -0.3&{}\quad 1 &{}\quad \ddots &{}\quad 0\\ \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad -0.3\\ 0&{}\quad \cdots &{}\quad 0&{}\quad -0.3&{}\quad 1 \end{pmatrix}. \end{aligned}$$
(3.7)

The weight vector \({\varvec{W}}\sim Dir(4,4,\cdots ,4)\), and \({\varvec{W}}\) is generated by the method of Narayanan (1990).

We first use the RWLSE to estimate \(\theta \) for Model 1 and N for Model 2 under homoscedasticity, i.e., \(\xi _i=\epsilon _i\) for each \(1\le i\le n\). We repeat the procedure 1000 times and calculate the mean and variance of the estimator. The results are given in Table 1. We can see that \(\hat{\theta }\) for Model 1 is unbiased, while \(\hat{N}\) for Model 2 is asymptotically unbiased. The fact that \(Var[\sqrt{n}(\hat{\theta }-\theta )]\) and \(Var[\sqrt{n}(\hat{N}-N)]\) remain bounded indicates that the convergence rate of the RWLSE is asymptotically \(O(n^{-1/2})\). To compare the RWLSE with the OLSE under END errors, we further present the mean and variance of the OLSE in Table 2. The results show that there is no essential difference between the means of the two estimators. The mean and the variance of the RWLSE are slightly inferior to those of the OLSE in both models.

Table 1 Mean and variance of the RWLSE under END errors for homoscedastic Models 1 and 2
Table 2 Mean and variance of the OLSE under END errors for homoscedastic Models 1 and 2
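
For illustration, a minimal sketch of one possible implementation of the homoscedastic experiment for Model 1 is given below; the seed, the sample size, the parameter interval, the optimizer, and the use of numpy's built-in Dirichlet sampler are our own assumptions and are not meant to reproduce Tables 1 and 2 exactly.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def make_sigma(n, rho=-0.3):
    """Tridiagonal covariance matrix (3.7): 1 on the diagonal, -0.3 next to it."""
    S = np.eye(n)
    S[np.arange(n - 1), np.arange(1, n)] = rho
    S[np.arange(1, n), np.arange(n - 1)] = rho
    return S

def one_replication(n, theta0=1.0, interval=(-5.0, 5.0)):
    """One replication of homoscedastic Model 1: X_i = theta0 + eps_i."""
    eps = rng.multivariate_normal(np.zeros(n), make_sigma(n))
    X = theta0 + eps
    W = rng.dirichlet(np.full(n, 4.0))           # W ~ Dir(4, ..., 4)
    rw = minimize_scalar(lambda t: np.sum(W * (X - t) ** 2),
                         bounds=interval, method="bounded").x
    ols = minimize_scalar(lambda t: np.mean((X - t) ** 2),
                          bounds=interval, method="bounded").x
    return rw, ols

n, theta0 = 200, 1.0
est = np.array([one_replication(n, theta0) for _ in range(1000)])
for name, col in (("RWLSE", 0), ("OLSE ", 1)):
    print(name, "mean:", est[:, col].mean(),
          " var of sqrt(n)*(est - theta):", np.var(np.sqrt(n) * (est[:, col] - theta0)))
```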

We now consider the heteroscedastic case, i.e., \(\xi _i=\big [1+\frac{(-1)^{i}(i-1)}{n}\big ]\epsilon _{i}\) for each \(1\le i\le n\). The other settings are the same as above. The results are given in Tables 3 and 4. The mean and the variance of the RWLSE are better than those of the OLSE in Model 1 but slightly weaker in Model 2. However, the convergence rates of the two estimators are almost the same. Note that in our simulation the heteroscedasticity is known. However, in many realistic applications it is not easy to estimate the variances of the errors when they are heteroscedastic. Therefore, our simulation results show that the RWLSE performs well without first estimating the variances of the errors, which provides an alternative choice when dealing with similar issues.

Table 3 Mean and variance of the RWLSE under END errors for heteroscedastic Models 1 and 2
Table 4 Mean and variance of the OLSE under END errors for heteroscedastic Models 1 and 2
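
In this setting the heteroscedastic errors are obtained from the homoscedastic ones by a simple rescaling, for instance as in the following sketch (a hypothetical helper mirroring the scaling factor stated above).

```python
import numpy as np

def heteroscedastic_errors(eps):
    """Rescale eps_1, ..., eps_n into xi_i = [1 + (-1)^i * (i - 1) / n] * eps_i."""
    n = len(eps)
    i = np.arange(1, n + 1)
    return (1.0 + ((-1.0) ** i) * (i - 1) / n) * eps
```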

4 Proofs of the main results

Proof of Theorem 2.1

Denote

$$\begin{aligned} \Psi _{n}(\theta _{1},\theta _{2})=\sum _{i=1}^{n}W_{i}(g_{i}(\theta _{1})-g_{i}(\theta _{2}))^{2}, ~~V_{n}(\theta )=\sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta )-g_{i}(\theta _{0})), \end{aligned}$$

and

$$\begin{aligned} U_{n}(\theta )=\frac{V_{n}(\theta )}{\Psi _{n}(\theta ,\theta _{0})},~\theta \ne \theta _{0}. \end{aligned}$$

Note from \(\sum _{i=1}^{n}W_{i}=1\) and (2.1) that

$$\begin{aligned} \Psi _{n}(\theta ,\theta _{0})\ge \lambda _n^{2}(\theta -\theta _{0})^{2}. \end{aligned}$$
(4.1)

For all \(\omega \in (|\hat{\theta }_{n}-\theta _0|>\varepsilon )\), where \(\varepsilon >0\) is arbitrary, we have that \(\hat{\theta }_{n}\ne \theta _0\) and thus

$$\begin{aligned} \sum _{i=1}^{n}W_{i}\xi _{i}^{2}= & {} \sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta _{0}))^{2}\ge \sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\hat{\theta }_{n}))^{2}\\= & {} \sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta _{0}))^{2}+2\sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta _{0}))(g_{i}(\theta _{0})-g_{i}(\hat{\theta }_{n}))\\{} & {} +\sum _{i=1}^{n}W_{i}(g_{i}(\theta _{0})-g_{i}(\hat{\theta }_{n}))^{2}\\= & {} \sum _{i=1}^{n}W_{i}\xi _{i}^{2}-2U_{n}(\hat{\theta }_{n})\Psi _{n}(\hat{\theta }_{n},\theta _{0})+\Psi _{n}(\hat{\theta }_{n},\theta _{0}), \end{aligned}$$

which together with \(\Psi _{n}(\hat{\theta }_{n},\theta _{0})>0\) implies \(U_{n}(\hat{\theta }_{n})\ge 1/2\). Hence, \((|\hat{\theta }_{n}-\theta _0|>\varepsilon )\subseteq (U_{n}(\hat{\theta }_{n})\ge 1/2)\). By choosing \(\varepsilon =\rho n^{-1/2}\), we have

$$\begin{aligned} P(n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\rho )\le & {} P\left( \sup _{|\theta -\theta _{0}|>\rho n^{-1/2}}|U_{n}(\theta )|\ge 1/2\right) \nonumber \\\le & {} P\left( \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|\ge 1/2\right) \nonumber \\{} & {} \quad +P\left( \sup _{\rho n^{-1/2}<|\theta -\theta _{0}|\le \rho }|U_{n}(\theta )|\ge 1/2\right) . \end{aligned}$$
(4.2)

By Cauchy’s inequality, we can see that for all \(\theta \ne \theta _{0}\),

$$\begin{aligned} \frac{|V_{n}(\theta )|^{2}}{\Psi _{n}(\theta ,\theta _{0})}=\frac{\left[ \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta )-g_{i}(\theta _{0}))\right] ^{2}}{\sum _{i=1}^{n}W_{i}(g_{i}(\theta )-g_{i}(\theta _{0}))^{2}} \le \sum _{i=1}^{n}W_{i}\xi _{i}^{2}.\end{aligned}$$
(4.3)

Observing that \(\sum _{i=1}^{n}W_{i}=1\) and \(f(x)=|x|^{r}\) is a convex function for all \(r\ge 1\), we have by \(p>2\) and Lemma A.3 that

$$\begin{aligned} E\left( \sum _{i=1}^{n}W_{i}\xi _{i}^{2}\right) ^{p/2}\le E\left( \sum _{i=1}^{n}W_{i}|\xi _{i}|^{p}\right) =\sum _{i=1}^{n}EW_{i}E|\xi _{i}|^{p}=\frac{1}{n}\sum _{i=1}^{n}E|\xi _{i}|^{p}. \nonumber \\ \end{aligned}$$
(4.4)

Moreover, we obtain by (4.1) that

$$\begin{aligned} \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|=\sup _{|\theta -\theta _{0}|>\rho }\frac{|V_{n}(\theta )|}{\Psi _{n}^{1/2}(\theta ,\theta _{0})\Psi _{n}^{1/2}(\theta ,\theta _{0})} \le (\lambda _n\rho )^{-1}\sup _{|\theta -\theta _{0}|>\rho }\frac{|V_{n}(\theta )|}{\Psi _{n}^{1/2}(\theta ,\theta _{0})}. \nonumber \\ \end{aligned}$$
(4.5)

Hence, it follows from (4.3)–(4.5) and Markov’s inequality that

$$\begin{aligned} P\left( \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|\ge 1/2\right)\le & {} P\left( \sum _{i=1}^{n}W_{i}\xi _{i}^{2}\ge \frac{\lambda _n^{2}\rho ^{2}}{4}\right) \nonumber \\\le & {} \left( \frac{2}{\lambda _n\rho }\right) ^{p}E\left( \sum _{i=1}^{n}W_{i}\xi _{i}^{2}\right) ^{p/2}\nonumber \\\le & {} C_{1}(p)\Delta _{np}n^{-1}\lambda _n^{-p}\rho ^{-p}. \end{aligned}$$
(4.6)

For \(m=0,1,2,\ldots ,\lfloor n^{1/2}\rfloor \), let \(\theta (m)=\theta _{0}+\frac{\rho }{n^{1/2}}+\frac{m\rho }{\lfloor n^{1/2}\rfloor }\) and \(\rho _{m}=\theta (m)-\theta _{0}=\frac{\rho }{n^{1/2}}+\frac{m\rho }{\lfloor n^{1/2}\rfloor }\). It follows from (4.1) again that

$$\begin{aligned} \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}|U_{n}(\theta )|\le & {} \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}\frac{|V_{n}(\theta )|}{\lambda _n^{2}(\theta -\theta _{0})^{2}} \le \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}\frac{|V_{n}(\theta )|}{\lambda _n^{2}\rho _{m}^{2}}\\\le & {} \frac{|V_{n}(\theta (m))|}{\lambda _n^{2}\rho _{m}^{2}}+\sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}\frac{|V_{n}(\theta _2)-V_{n}(\theta _1)|}{\lambda _n^{2}\rho _{m}^{2}}. \end{aligned}$$

Hence, it yields that

$$\begin{aligned} P\left( \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}|U_{n}(\theta )|\ge 1/2\right)\le & {} P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} +P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) .~~~~~\nonumber \\ \end{aligned}$$
(4.7)

By Lemma A.3 and Stirling’s approximation, we have that for each \(1\le i\le n\), when n is sufficiently large,

$$\begin{aligned} EW_{i}^{p}= & {} \frac{\Gamma (4n)\Gamma (4+p)}{\Gamma (4n+p)\Gamma (4)}\approx \frac{\Gamma (4+p)}{\Gamma (4)}\cdot \frac{\sqrt{2\pi (4n-1)}\left( \frac{4n-1}{e}\right) ^{4n-1}}{\sqrt{2\pi (4n+p-1)}\left( \frac{4n+p-1}{e}\right) ^{4n+p-1}}\nonumber \\\le & {} C\left( \frac{4n-1}{4n+p-1}\right) ^{4n-1}n^{-p}\le Cn^{-p}. \end{aligned}$$
(4.8)

Note that \(0=EW_{i}E\xi _{i}=EW_{i}\xi _{i}=EW_{i}\xi _{i}^{+}-EW_{i}\xi _{i}^{-}\) for \(1\le i\le n\), and, by Lemma A.1, \(\{W_{n}\xi _{n}^{+}-EW_{n}\xi _{n}^{+},n\ge 1\}\) and \(\{W_{n}\xi _{n}^{-}-EW_{n}\xi _{n}^{-},n\ge 1\}\) are still sequences of END random variables with zero mean. Hence, applying Markov's inequality, Lemma A.2, (2.1) and (4.8), one can easily obtain that

$$\begin{aligned}{} & {} P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \le \left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E|V_{n}(\theta (m))|^{p}\nonumber \\{} & {} \quad \quad =\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E\left| \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta (m))-g_{i}(\theta _{0}))\right| ^{p}\nonumber \\{} & {} \quad \quad \le 2^{p-1}\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E\left| \sum _{i=1}^{n}(W_{i}\xi _{i}^{+}-EW_{i}\xi _{i}^{+})(g_{i}(\theta (m))-g_{i}(\theta _{0}))\right| ^{p}\nonumber \\{} & {} \qquad \quad +2^{p-1}\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E\left| \sum _{i=1}^{n}(W_{i}\xi _{i}^{-}-EW_{i}\xi _{i}^{-})(g_{i}(\theta (m))-g_{i}(\theta _{0}))\right| ^{p}\nonumber \\{} & {} \quad \quad \le C\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}\left\{ \sum _{i=1}^{n}E|W_{i}\xi _{i}|^{p}|(g_{i}(\theta (m))-g_{i}(\theta _{0}))|^{p}\right. \nonumber \\{} & {} \quad \qquad \left. +\left( \sum _{i=1}^{n}E|W_{i}\xi _{i}|^{2}(g_{i}(\theta (m))-g_{i}(\theta _{0}))^{2}\right) ^{p/2}\right\} \nonumber \\{} & {} \quad \quad \le C\left( \frac{4\Lambda _n}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}|\theta (m)-\theta _{0}|^{p}\left\{ \sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p} +\left( \sum _{i=1}^{n}EW_{i}^{2}E\xi _{i}^{2}\right) ^{p/2}\right\} \nonumber \\{} & {} \quad \quad \le C_{2}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho _{m}^{-p}n^{-p}(\Delta _{np}+\nabla _{np}). \end{aligned}$$
(4.9)

Similarly, we also obtain by Lemma A.2, (2.1) and (4.8) that for all \(\theta _{1},\theta _{2}\in \Theta \) and n large enough,

$$\begin{aligned} E|V_{n}(\theta _{2})-V_{n}(\theta _{1})|^{p}= & {} E\left| \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta _{2})-g_{i}(\theta _{1}))\right| ^{p}\nonumber \\\le & {} C\Lambda _n^{p}|\theta _{2}-\theta _{1}|^{p}\left\{ \sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p} +\left( \sum _{i=1}^{n}EW_{i}^{2}E\xi _{i}^{2}\right) ^{p/2}\right\} \nonumber \\\le & {} C\Lambda _n^{p}n^{-p}(\Delta _{np}+\nabla _{np})|\theta _{2}-\theta _{1}|^{p}=:C(n,p)|\theta _{2}-\theta _{1}|^{p}. \end{aligned}$$

Hence, taking \(r=1+\alpha =p\), \(C=C(n,p)\), \(\varepsilon =\rho /\lfloor n^{1/2}\rfloor \), \(a=\lambda _n^{2}\rho _{m}^{2}/4\), and \(\gamma \in (2,p+1)\) in Lemma A.4, we obtain

$$\begin{aligned}{} & {} P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad =P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m)+\rho /\lfloor n^{1/2}\rfloor }|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad \le \frac{8C\Lambda _n^{p}n^{-p}(\Delta _{np}+\nabla _{np})}{(p+1-\gamma )(p+2-\gamma )}\left( \frac{8\gamma }{\gamma -2}\right) ^{p}\left( \frac{\rho }{\lfloor n^{1/2}\rfloor }\right) ^{p}\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}\nonumber \\{} & {} \quad \le C_{3}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho ^{p}\rho _{m}^{-2p}n^{-3p/2}(\Delta _{np}+\nabla _{np}). \end{aligned}$$
(4.10)

Noting that \(\rho _{0}=\rho n^{-1/2}\), \(\rho _{m}>m\rho n^{-1/2}\) and \(p>2\), we obtain by (4.7), (4.9) and (4.10) that

$$\begin{aligned}{} & {} P\left( \sup _{\rho n^{-1/2}<\theta -\theta _{0}\le \rho }|U_{n}(\theta )|\ge 1/2\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}|U_{n}(\theta )|\ge 1/2\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \qquad +\sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}\left[ C_{2}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho _{m}^{-p}n^{-p}(\Delta _{np}+\nabla _{np})\right. \nonumber \\{} & {} \qquad \left. +C_{3}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho ^{p}\rho _{m}^{-2p}n^{-3p/2}(\Delta _{np}+\nabla _{np})\right] \nonumber \\{} & {} \quad \le \left[ C_{2}(p)+C_{3}(p)\right] (\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})n^{-p/2}\rho ^{-p}\nonumber \\{} & {} \qquad +(\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})n^{-p/2}\rho ^{-p}\sum _{m=1}^{\lfloor n^{1/2}\rfloor -1}\left( \frac{C_{2}(p)}{m^{p}}+\frac{C_{3}(p)}{m^{2p}}\right) \nonumber \\{} & {} \quad \le C_{4}(p)(\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})n^{-p/2}\rho ^{-p}. \end{aligned}$$
(4.11)

Similarly, we also have

$$\begin{aligned}{} & {} P\left( \sup _{\rho n^{-1/2}<\theta _{0}-\theta \le \rho }|U_{n}(\theta )|\ge 1/2\right) \le C_{5}(p)(\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})n^{-p/2}\rho ^{-p}.\nonumber \\ \end{aligned}$$
(4.12)

The desired result (2.2) follows from (4.2), (4.6), (4.11) and (4.12) immediately. \(\square \)

Proof of Theorem 2.2

The proof is similar to that of Theorem 2.1. Thus, we only present the differences. It follows from (4.1) that

$$\begin{aligned} \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|= & {} \sup _{|\theta -\theta _{0}|>\rho }\frac{|V_{n}(\theta )|}{\Psi _{n}(\theta ,\theta _{0})} \le \sup _{|\theta -\theta _{0}|>\rho }\frac{\sum _{i=1}^{n}W_{i}|\xi _{i}|\cdot \Lambda _{n}|\theta -\theta _{0}|}{\lambda _{n}^{2}(\theta -\theta _{0})^{2}}\nonumber \\\le & {} \frac{\Lambda _{n}}{\lambda _{n}^{2}\rho }\sum _{i=1}^{n}W_{i}|\xi _{i}|.~~~~\end{aligned}$$
(4.13)

Therefore, we have by (4.8), (4.13) and \(C_{r}\)-inequality that

$$\begin{aligned} P\left( \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|\ge 1/2\right)\le & {} P\left( \sum _{i=1}^{n}W_{i}|\xi _{i}|\ge \frac{\lambda _{n}^{2}\rho }{2\Lambda _{n}}\right) \nonumber \\\le & {} \left( \frac{2\Lambda _{n}}{\lambda _{n}^{2}\rho }\right) ^{p}E\left( \sum _{i=1}^{n}W_{i}|\xi _{i}|\right) ^{p}\nonumber \\\le & {} \left( \frac{2\Lambda _{n}}{\lambda _{n}^{2}\rho }\right) ^{p}n^{p-1}\sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p}\nonumber \\\le & {} C_{6}(p)(\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-1}\rho ^{-p}. \end{aligned}$$
(4.14)

Applying Markov’s inequality, the Marcinkiewicz-Zygmund inequality in Lemma A.2, (2.1) and (4.8), we can also obtain that for all n large enough,

$$\begin{aligned} P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right)\le & {} \left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E\left| \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta (m))-g_{i}(\theta _{0}))\right| ^{p}\nonumber \\\le & {} C\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}\sum _{i=1}^{n}E|W_{i}\xi _{i}|^{p}|(g_{i}(\theta (m))-g_{i}(\theta _{0}))|^{p}\nonumber \\\le & {} C\left( \frac{4\Lambda _n}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}|\theta (m)-\theta _{0}|^{p}\sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p}\nonumber \\\le & {} C_{7}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho _{m}^{-p}n^{-p}\Delta _{np}, \end{aligned}$$
(4.15)

and

$$\begin{aligned} E|V_{n}(\theta _{2})-V_{n}(\theta _{1})|^{p}= & {} E\left| \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta _{2})-g_{i}(\theta _{1}))\right| ^{p}\\\le & {} C\Lambda _n^{p}|\theta _{2}-\theta _{1}|^{p}\sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p}\\\le & {} C\Lambda _n^{p}n^{-p}\Delta _{np}|\theta _{2}-\theta _{1}|^{p}. \end{aligned}$$

Hence, analogous to the proof of (4.10), we have

$$\begin{aligned}{} & {} P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad \le \frac{8C\Lambda _n^{p}n^{-p}\Delta _{np}}{(p+1-\gamma )(p+2-\gamma )}\left( \frac{8\gamma }{\gamma -2}\right) ^{p}\left( \frac{\rho }{\lfloor n^{1/2}\rfloor }\right) ^{p}\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}\nonumber \\{} & {} \quad \le C_{8}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho ^{p}\rho _{m}^{-2p}n^{-3p/2}\Delta _{np}. \end{aligned}$$
(4.16)

Analogous to the proof of (4.11), we obtain by (4.7), (4.15) and (4.16) that

$$\begin{aligned}{} & {} P\left( \sup _{\rho n^{-1/2}<\theta -\theta _{0}\le \rho }|U_{n}(\theta )|\ge 1/2\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _{n}^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \qquad +\sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _{n}^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}\left[ C_{7}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho _{m}^{-p}n^{-p}\Delta _{np}+C_{8}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho ^{p}\rho _{m}^{-2p}n^{-3p/2}\Delta _{np}\right] \nonumber \\{} & {} \quad \le \left[ C_{7}(p)+C_{8}(p)\right] (\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-p/2}\rho ^{-p}\nonumber \\{} & {} \qquad +(\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-p/2}\rho ^{-p}\sum _{m=1}^{\lfloor n^{1/2}\rfloor -1}\left( \frac{C_{7}(p)}{m^{p}}+\frac{C_{8}(p)}{m^{2p}}\right) \nonumber \\{} & {} \quad \le C_{9}(p)(\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-p/2}\rho ^{-p}. \end{aligned}$$
(4.17)

Similarly, we also have

$$\begin{aligned}{} & {} P\left( \sup _{\rho n^{-1/2}<\theta _{0}-\theta \le \rho }|U_{n}(\theta )|\ge 1/2\right) \le C_{10}(p)(\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-p/2}\rho ^{-p}. \end{aligned}$$
(4.18)

Combining (4.2), (4.14), (4.17), and (4.18), we obtain (2.3) immediately. \(\square \)

Proof of Corollary 2.1

Taking \(\rho =\epsilon n^{1/p}\sqrt{\log n}\) in Theorem 2.1, we have that

$$\begin{aligned} \sum _{n=1}^{\infty }P(n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\rho )= & {} \sum _{n=1}^{\infty }P(|\hat{\theta }_{n}-\theta _{0}|>\epsilon n^{1/p-1/2}\sqrt{\log n})\\\le & {} C\sum _{n=1}^{\infty }n^{-1}\log ^{-p/2}n<\infty ,\end{aligned}$$

which together with the Borel-Cantelli lemma yields the rate of strong consistency. \(\square \)

Proof of Corollary 2.2

Noting that \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(1<p\le 2\), we may assume without loss of generality that \(\sup _{n\ge 1}E|\xi _{n}|^{p}\le 1\), so that \(\Delta _{np}\le n\) for each \(n\ge 1\). Hence, for any \(\epsilon >0\), taking \(\rho =\left( \frac{C'(p)}{\epsilon }\right) ^{1/p}n^{1/p-1/2}\) in Theorem 2.2, we have

$$\begin{aligned} P\left( |\hat{\theta }_{n}-\theta _{0}|>\left( \frac{C'(p)}{\epsilon }\right) ^{1/p}n^{1/p-1}\right)= & {} P\left( n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\left( \frac{C'(p)}{\epsilon }\right) ^{1/p}n^{1/p-1/2}\right) \\\le & {} C'(p)n^{1-p/2}\frac{\epsilon }{C'(p)}n^{-1+p/2}=\epsilon . \end{aligned}$$

The second conclusion follows immediately by choosing \(\rho =\epsilon \sqrt{n/\tau _n}\) in Theorem 2.2. This completes the proof. \(\square \)

5 Conclusions

In this work, we mainly consider the following nonlinear regression model:

$$\begin{aligned} X_{n}=g_{n}(\theta )+\xi _{n},~n\ge 1, \end{aligned}$$
(5.1)

where \(X_{n}\) is observed, \(\{g_{n}(\theta )\}\) is a known sequence of continuous functions possibly nonlinear in \(\theta \in \Theta \), and \(\{\xi _{n},n\ge 1\}\) is a sequence of random errors with zero mean.

The nonlinear regression model not only involves essentially fewer unknown parameters, but its parameters also carry the meaning of physical variables, while the parameters of linear models are usually devoid of physical significance. Therefore, it is of great interest to study the nonlinear regression model.

In this work, in view of the concept of Dirichlet distribution, we introduce the random weighting method to the nonlinear regression model and propose the randomly weighted least squares estimator of \(\theta \) as follows:

$$\begin{aligned} \hat{\theta }_{n}=\arg \inf _{\theta \in \Theta }\sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta ))^{2}, \end{aligned}$$
(5.2)

where \(W_{i}\)’s are independent of \(\xi _{i}\)’s and the random vector \({\varvec{W}}=(W_{1},\cdots ,W_{n})\) obeys the Dirichlet distribution \(Dir(4,4,\ldots ,4)\).

In this work, we establish the asymptotic properties of the randomly weighted least squares estimator with END errors. The results reveal that this new estimator is consistent. Moreover, simulations are carried out which show that it is competitive with the ordinary least squares estimator and can outperform it in a heteroscedastic nonlinear regression model.