1 Introduction

Consider the nonlinear regression model:

$$\begin{aligned} X_{n}=g_{n}(\theta )+\xi _{n},~n\ge 1, \end{aligned}$$
(1.1)

where \(X_{n}\) is observed, \(\{g_{n}(\theta )\}\) is a known sequence of continuous functions, possibly nonlinear in \(\theta \in \Theta \), a closed interval on the real line, and \(\{\xi _{n},n\ge 1\}\) is a sequence of random errors with zero mean. Nonlinear regression models have significant advantages over linear models: they typically involve essentially fewer unknown parameters, and these parameters often carry the meaning of physical variables, whereas the parameters of linear models are usually devoid of physical significance. It is therefore of great interest to study the nonlinear regression model. In most studies devoted to regression analysis over the past decades, the central place has been occupied by the least squares method of parameter estimation, which has a long history. Let

$$\begin{aligned} Q_{n}(\theta )=\frac{1}{n}\sum _{i=1}^{n}\omega _{i}^{2}(X_{i}-g_{i}(\theta ))^{2}, \end{aligned}$$

where \(\{\omega _{i}\}\) is a known sequence of positive numbers. An estimator \(\theta _{n}\) is said to be an ordinary least squares estimator (OLSE, for short) of \(\theta \) if it minimizes \(Q_{n}(\theta )\), that is, \(Q_{n}(\theta _{n})=\inf _{\theta \in \Theta }Q_{n}(\theta )\).
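
For concreteness, the following minimal sketch computes the OLSE numerically; the interface \(g(i,\theta )\) for evaluating \(g_{i}(\theta )\), the array arguments, and the bounded-interval optimizer are illustrative assumptions rather than part of the model.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def olse(X, g, omega, theta_interval):
    """Minimize Q_n(theta) = (1/n) * sum_i omega_i^2 * (X_i - g_i(theta))^2
    over the closed interval Theta = theta_interval = (a, b).

    X     : observations X_1, ..., X_n (numpy array)
    g     : callable g(i, theta) returning g_i(theta) (hypothetical interface)
    omega : known positive weights omega_1, ..., omega_n (numpy array)
    """
    n = len(X)
    def Q(theta):
        g_vals = np.array([g(i, theta) for i in range(1, n + 1)])
        return np.mean(omega ** 2 * (X - g_vals) ** 2)
    return minimize_scalar(Q, bounds=theta_interval, method="bounded").x
```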

The asymptotic properties of the OLSE for parameters in nonlinear regression models have been a main subject of investigation. The model is challenging to analyze because the OLSE of parameters that enter the regression function nonlinearly cannot be written in explicit form, which complicates the description of its mathematical properties. Hence, introducing nonlinear regression analysis into statistics requires overcoming a series of mathematical difficulties that have no analogues in the linear theory. For the OLSE of the nonlinear model based on i.i.d. random errors, Jennrich (1969) established the asymptotic normality, Malinvaud (1970) investigated the consistency, and Wu (1981) established a necessary and sufficient condition for the strong consistency, among others. In particular, Ivanov (1976) obtained the following large deviation result for the OLSE with \(\omega _{i}\equiv 1\) based on i.i.d. random errors.

Theorem 1.1

Let \(\{\xi _{n},n\ge 1\}\) be i.i.d. random variables with \(E|\xi _{1}|^{p}<\infty \) for some \(p\ge 2\). Suppose there exist some constants \(0<c_{1}\le c_{2}<\infty \) such that

$$\begin{aligned} c_{1}(\theta _{1}-\theta _{2})^{2}\le \frac{1}{n}\sum _{i=1}^{n}(g_{i}(\theta _{1})-g_{i}(\theta _2))^{2}\le c_{2}(\theta _{1}-\theta _{2})^{2},~\text { for any }\theta _{1},\theta _{2}\in \Theta ,~n\ge 1. \end{aligned}$$

Then for every \(\rho >0\) and all \(n\ge 1\), we have

$$\begin{aligned} P(n^{1/2}|\theta _{n}-\theta _{0}|>\rho )\le c\rho ^{-p}, \end{aligned}$$

where \(\theta _{0}\) is the true parameter lying in the interior of \(\Theta \) and c is a positive constant not depending on n and \(\rho \).

Prakasa Rao (1984) extended Theorem 1.1 from the i.i.d. case to some dependent cases such as \(\varphi \)-mixing and \(\alpha \)-mixing assumptions. Hu (2002) extended Theorem 1.1 to martingale differences, \(\varphi \)-mixing and negative association (NA, for short) assumptions under \(\sup _{i\ge 1}E|\xi _{i}|^{p}<\infty \) for some \(p>2\) without assuming identical distributions. Hu (2004) further considered the large deviation result under the moment condition \(\sup _{i\ge 1}E|\xi _{i}|^{p}<\infty \) for some \(1<p\le 2\). Yang and Hu (2014) obtained some general large deviation results, which remain applicable in some cases where \(\sup _{i\ge 1}E|\xi _{i}|^{p}=\infty \) for some \(p>1\); Yang et al. (2017) established some large deviation results under extended negatively dependent (END, for short) random errors, and so on. However, a new challenge emerges if the errors are heteroscedastic: estimating the variances of the errors is not easy.

It is well known that the bootstrap is an excellent method, which has been used extensively in many statistical models including the nonlinear regression model; see Staniewski (1984) for example. As an alternative approach, the random weighting method, or Bayesian bootstrap method, has received increasing attention since it was originally suggested by Rubin (1981). The random weighting method is motivated by the bootstrap method and can be regarded as a kind of smoothing of the bootstrap. Instead of re-sampling from the original data set, the random weighting method generates a group of random weights directly and uses them to weight the original samples. Compared with the bootstrap method, the random weighting method has advantages such as simplicity in computation and suitability for large samples, and there is no need to know the distribution function. Therefore, this method has been adopted in various statistical models. For more details, we refer the readers to Zheng (1987), Gao et al. (2003), Xue and Zhu (2005), Fang and Zhao (2006), Barvinok and Samorodnitsky (2007), Gao and Zhong (2010), and so forth.

However, to the best of our knowledge, there is no literature considering randomly weighted estimation in nonlinear regression models. In this paper, the random weighting method is adopted for the first time for least squares estimation in nonlinear regression models. We are now in a position to present this method.

Definition 1.1

(cf. Ng et al. 2011) Let \((W_{1},\cdots ,W_{n})\) be a random vector with \(W_{i}\ge 0\) and \(\sum _{i=1}^{n}W_{i}=1\). Then the Dirichlet probability density function of \((W_{1},\cdots ,W_{n})\) is defined as

$$\begin{aligned} f(w_{1},\cdots ,w_{n})=\frac{\Gamma (\alpha _{0})}{\prod _{i=1}^{n}\Gamma (\alpha _{i})}\prod _{i=1}^{n}w_{i}^{\alpha _{i}-1}, \end{aligned}$$

where \(\alpha _{i}>0\), \(\alpha _{0}=\sum _{i=1}^{n}\alpha _{i}\), \(w_{i}\ge 0\), \(\sum _{i=1}^{n-1}w_{i}\le 1\) and \(w_{n}=1-\sum _{i=1}^{n-1}w_{i}\). This distribution is denoted by \(Dir(\alpha _{1},\cdots ,\alpha _{n})\).

By virtue of the concept of Dirichlet distribution, we can propose the randomly weighted least squares estimator of \(\theta \) as follows. Let

$$\begin{aligned} H_{n}(\theta )=\sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta ))^{2}, \end{aligned}$$
(1.2)

where \(W_{i}\)’s are independent of \(\xi _{i}\)’s and the random vector \({\varvec{W}}=(W_{1},\cdots ,W_{n})\) obeys the Dirichlet distribution \(Dir(4,4,\ldots ,4)\), namely, \(\sum _{i=1}^{n}W_{i}=1\) and the joint density of \(W_{1},\cdots ,W_{n-1}\) is

$$\begin{aligned} f(w_{1},\cdots ,w_{n-1})=\frac{\Gamma (4n)}{(\Gamma (4))^{n}}w_{1}^{3}\cdots w_{n-1}^{3}(1-w_{1}-\cdots -w_{n-1})^{3}, \end{aligned}$$

where \((w_{1},\cdots ,w_{n-1})\in D_{n-1}\) and \(D_{n-1}=\{(w_{1},\cdots ,w_{n-1}):w_{i}\ge 0,i=1,\ldots ,n-1,\sum _{i=1}^{n-1}w_{i}\le 1\}\). An estimator \(\hat{\theta }_{n}\) is said to be a randomly weighted least squares estimator (RWLSE, for short) of \(\theta \) if \(\hat{\theta }_{n}=\arg \inf _{\theta \in \Theta }H_{n}(\theta )\).
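
A single realization of the RWLSE can be computed along the lines of the following sketch; as in the earlier sketch, the interface \(g(i,\theta )\) and the bounded-interval optimizer are illustrative assumptions, and the Dirichlet weights are drawn with numpy's built-in sampler.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rwlse(X, g, theta_interval, rng=None):
    """One draw of the RWLSE: generate W ~ Dir(4, ..., 4), independent of the data,
    and minimize H_n(theta) in (1.2) over the closed interval Theta = theta_interval."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    W = rng.dirichlet(np.full(n, 4.0))          # W_i >= 0 and sum_i W_i = 1
    def H(theta):
        g_vals = np.array([g(i, theta) for i in range(1, n + 1)])
        return np.sum(W * (X - g_vals) ** 2)
    return minimize_scalar(H, bounds=theta_interval, method="bounded").x
```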

Since the independence assumption is usually implausible in practice, we adopt a relatively broad dependence structure, namely the END assumption, in the sequel. The concept of END random variables was introduced by Liu (2009) as follows.

Definition 1.2

A finite collection of random variables \(X_1,X_2,\cdots ,X_n\) is said to be END if there exists a constant \(M > 0\) such that both

$$\begin{aligned} P(X_1>x_1,X_2>x_2,\cdots ,X_n>x_n)\le M\prod _{i=1}^nP(X_i>x_i)\end{aligned}$$

and

$$\begin{aligned} P(X_1\le x_1, X_2\le x_2,\cdots ,X_n\le x_n)\le M\prod _{i=1}^nP(X_i\le x_i) \end{aligned}$$

hold for all real numbers \(x_1, x_2, \cdots , x_n\). An infinite sequence \(\{X_n, n\ge 1\}\) is said to be END if every finite sub-collection is END.

Liu (2009) provided some examples satisfying the END structure, one of which shows that if \(X_1,X_2,\cdots ,X_n\) have absolutely continuous distribution functions \(F_{1},\cdots ,F_{n}\) and are dependent according to a multivariate copula function \(C(u_{1},\cdots ,u_{n})\) whose copula density \(c(u_{1},\cdots ,u_{n})=\frac{\partial ^{n}C(u_{1},\cdots ,u_{n})}{\partial u_{1}\cdots \partial u_{n}}\) exists and is uniformly bounded on the whole domain, then \(\{X_{n},n\ge 1\}\) are END. If we take \(M=1\), then the END structure reduces to the negatively orthant dependent (NOD, for short) structure, which was introduced by Lehmann (1966) (cf. also Joag-Dev and Proschan 1983). The END structure can reflect not only a negative dependence structure but also, to some extent, a positive one. Liu (2009) pointed out that END random variables can be negatively or positively dependent and provided some interesting examples to support this idea. Joag-Dev and Proschan (1983) also pointed out that negatively associated (NA, for short) random variables are NOD but the converse is not necessarily true; thus NA random variables are also END. Hence, the consideration of the END structure is reasonable and of great interest. Many applications have been found for END random variables. For example, Liu (2010) studied the sufficient and necessary conditions of moderate deviations for END random variables with heavy tails; Chen et al. (2010) established the strong law of large numbers for END random variables and gave applications to risk theory and renewal theory; Shen (2011) established some exponential probability inequalities for END random variables and presented some applications; Wang and Wang (2013) investigated the precise large deviations for random sums of END real-valued random variables with consistent variation; Wang et al. (2014) proved some results on complete convergence of END random variables; Lita da Silva (2015) established the almost sure convergence for sequences of END random variables; Wang et al. (2015) and Yang et al. (2018) studied the complete consistency of estimators of nonparametric regression models based on END errors; Wu et al. (2019) investigated the complete f-moment convergence for END random variables, and so on.

For the proposed RWLSE, we establish two general large deviation results for the estimator of the parameter \(\theta \) with \(p>2\) and, respectively, \(1<p\le 2\) under END errors. As direct corollaries, the rates of complete consistency, strong consistency, and weak consistency are obtained, which show that the proposed RWLSE is a consistent estimator of \(\theta \). The numerical analysis reveals that the RWLSE performs as well as the OLSE in heteroscedastic nonlinear regression models, and sometimes better. As pointed out earlier, it is not easy to estimate the variances of heteroscedastic errors, so this paper provides an alternative method for estimating the parameters in a heteroscedastic nonlinear regression model.

Throughout this paper, the symbol C represents a positive constant which may differ in different places. \(C(p),C'(p),C_{1}(p),C_{2}(p),\cdots \) denote positive constants depending only on p. Let I(A) be the indicator function of the event A and \(\lfloor x\rfloor \) denote the integer part of x. Denote \(x^{+}=xI(x\ge 0)\) and \(x^{-}=-xI(x<0)\). We write \(\log x=\ln \max (x,e)\), where \(\ln x\) denotes the natural logarithm of x.

The rest of this paper is organized as follows: The main results are stated in Sect. 2. The numerical analysis is provided in Sect. 3. The proofs of the main results are presented in Sect. 4. Some lemmas for proving the main results are given in Appendix.

2 Main results

The main results on large deviations are presented as follows.

Theorem 2.1

Let \(p>2\). In model (1.1), assume that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(E|\xi _{n}|^{p}<\infty \) for each \(n\ge 1\). If there exist positive numbers \(\lambda _n\le \Lambda _n\) for each \(n\ge 1\), such that

$$\begin{aligned} \lambda _n|\theta _{1}-\theta _{2}|\le |g_{i}(\theta _{1})-g_{i}(\theta _2)|\le \Lambda _n|\theta _{1}-\theta _{2}|,~\text { for any }\theta _{1},\theta _{2}\in \Theta ,~1\le i\le n,~n\ge 1,\nonumber \\ \end{aligned}$$
(2.1)

then there exists a positive constant C(p) depending only on p such that for all \(\rho >0\) and each \(n\ge 1\),

$$\begin{aligned} P(n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\rho )\le C(p)\left[ n^{-1}\lambda _n^{-p}\Delta _{np}+n^{-p/2}(\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})\right] \rho ^{-p},\nonumber \\\end{aligned}$$
(2.2)

where \(\Delta _{np}=\sum _{i=1}^{n}E|\xi _{i}|^{p}\) and \(\nabla _{np}=\left( \sum _{i=1}^{n}E\xi _{i}^{2}\right) ^{p/2}\).

Theorem 2.2

Let \(1<p\le 2\). In model (1.1), assume that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(E|\xi _{n}|^{p}<\infty \) for each \(n\ge 1\). If (2.1) holds, then there exists a positive constant \(C'(p)\) depending only on p such that for all \(\rho >0\) and each \(n\ge 1\),

$$\begin{aligned} P(n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\rho )\le C'(p)(\Lambda _n/\lambda _n^{2})^{p}n^{-p/2}\Delta _{np}\rho ^{-p}. \end{aligned}$$
(2.3)

Remark 2.1

It is easy to see that if \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(p>2\), Theorem 2.1 extends Theorem 1.1 from the i.i.d. assumption to END random errors with not necessarily identical distributions. Similarly, if \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(1<p\le 2\), then Theorem 2.2 also extends the corresponding result of Hu (2004).
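
Indeed, if \(\sup _{n\ge 1}E|\xi _{n}|^{p}\le K<\infty \) for some \(p>2\) and \(\lambda _{n}=c_{1}\), \(\Lambda _{n}=c_{2}\), then Lyapunov's inequality gives

$$\begin{aligned} \Delta _{np}\le Kn,\qquad \nabla _{np}=\left( \sum _{i=1}^{n}E\xi _{i}^{2}\right) ^{p/2}\le \left( \sum _{i=1}^{n}(E|\xi _{i}|^{p})^{2/p}\right) ^{p/2}\le Kn^{p/2}, \end{aligned}$$

so the right-hand side of (2.2) is at most \(C(p)K\left[ c_{1}^{-p}+(c_{2}/c_{1}^{2})^{p}(n^{1-p/2}+1)\right] \rho ^{-p}\le c\rho ^{-p}\), which recovers the form of the bound in Theorem 1.1.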

Remark 2.2

Yang and Hu (2014) also established similar results for the OLSE of \(\theta \) with NOD errors. Taking \(\lambda _{n}=c_{1}\) and \(\Lambda _{n}=c_{2}\) for some \(0<c_1\le c_2\), we point out that the result in Theorem 2.1 and the corresponding one of Yang and Hu (2014) do not imply each other. For example, \(n^{-1}\sum _{i=1}^{n}E|\xi _{i}|^{p}>n^{-p/2}\sum _{i=1}^{n}E|\xi _{i}|^{p}\) but \(n^{-p/2}\left( \sum _{i=1}^{n}E\xi _{i}^{2}\right) ^{p/2}\le n^{-p/2}\left( \sum _{i=1}^{n}(E|\xi _{i}|^{p})^{2/p}\right) ^{p/2}\). However, if \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(p>2\), they are equivalent. Hence, our results extend the corresponding ones of Yang and Hu (2014).

By Theorem 2.1, we can obtain the result concerning the rate of complete consistency and strong consistency as follows.

Corollary 2.1

In model (1.1), assume that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(p>2\). If (2.1) holds with \(\lambda _{n}=c_1\) and \(\Lambda _{n}=c_{2}\) for some \(0<c_{1}\le c_{2}<\infty \), then for any \(\epsilon >0\),

$$\begin{aligned} \sum _{n=1}^{\infty }P(|\hat{\theta }_{n}-\theta _{0}|>\epsilon n^{1/p-1/2}\sqrt{\log n})<\infty ,\end{aligned}$$

and thus

$$\begin{aligned} |\hat{\theta }_{n}-\theta _{0}|=o\left( n^{1/p-1/2}\sqrt{\log n}\right) ~a.s.,~\text {as }n\rightarrow \infty . \end{aligned}$$

By Theorem 2.2, we can also obtain the following result on rate of weak consistency of the RWLSE \(\hat{\theta }_{n}\).

Corollary 2.2

In model (1.1), assume that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(1<p\le 2\). If (2.1) holds with \(\lambda _{n}=c_1\) and \(\Lambda _{n}=c_{2}\) for some \(0<c_{1}\le c_{2}<\infty \), then

$$\begin{aligned}|\hat{\theta }_{n}-\theta _{0}|=O_P\left( n^{1/p-1}\right) .\end{aligned}$$

In particular, if \(p=2\), then for any positive sequence \(\{\tau _{n},n\ge 1\}\) satisfying \(\tau _{n}=o(n)\),

$$\begin{aligned}\sqrt{\tau _{n}}|\hat{\theta }_{n}-\theta _{0}|\xrightarrow {P}0.\end{aligned}$$

3 Some examples and numerical analysis

3.1 Some examples

In this subsection, we present some examples for the RWLSE of nonlinear regression models.

Example 3.1

Consider the linear model

$$\begin{aligned} X_{i}=\theta +\xi _{i},~i=1,2,\ldots ,n,~\theta \in \Theta , \end{aligned}$$
(3.4)

where \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(E|\xi _{n}|^{p}<\infty \) for each \(n\ge 1\). Obviously, (2.1) holds with \(\lambda _{n}=\Lambda _{n}=1\). Hence, Theorems 2.1 and 2.2 apply under \(E|\xi _{n}|^{p}<\infty \) with \(p>2\) and \(1<p\le 2\), respectively.
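
In this location model the RWLSE even has an explicit form: since \(\sum _{i=1}^{n}W_{i}=1\), minimizing \(H_{n}(\theta )=\sum _{i=1}^{n}W_{i}(X_{i}-\theta )^{2}\) over the real line yields

$$\begin{aligned} \hat{\theta }_{n}=\sum _{i=1}^{n}W_{i}X_{i}, \end{aligned}$$

a randomly weighted sample mean (projected to the nearest endpoint of \(\Theta \) if this value falls outside the interval).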

Example 3.2

Consider the Michaelis-Menten model (see Sieders and Dzhaparidze (1987) or Miao and Tang (2021), for example)

$$\begin{aligned} V(v,L,N)=\frac{Lv}{N+v}, \end{aligned}$$

which is used to describe the relation between the velocity V of an enzyme reaction and the concentration v of the substrate. The parameter L denotes the maximal reaction velocity and the parameter N represents the chemical affinity. Based on the model above, for each concentration \(v_{i}\) there is a measurement of the velocity with error \(\xi _{i}\), i.e.,

$$\begin{aligned} X_{i}=V(v_i,L,N)+\xi _{i}=\frac{Lv_{i}}{N+v_{i}}+\xi _{i},~i\ge 1. \end{aligned}$$
(3.5)

Assume that the parameter \((L, N)\) belongs to \(\Theta \), a bounded open set in the positive quadrant. Consider the following simple form of model (3.5):

$$\begin{aligned} X_{i}=g_{i}(N)+\xi _{i}=\frac{1}{N^{-1}+i^{\mu }}+\xi _{i}, \end{aligned}$$
(3.6)

which follows from (3.5) by assuming N/L is known (without loss of generality, we may assume that \(N/L=1\)) and letting \(v_{i}=i^{-\mu }\), \(0<\mu <\min \{(p-1)/(4p),1/8\}\), where \(p>1\). It is easy to see that

$$\begin{aligned} c_{3}n^{-2\mu }|N_{1}-N_{2}|\le & {} |g_{i}(N_{1})-g_{i}(N_{2})|=\frac{|N_{1}-N_{2}|}{(1+N_{1}i^{\mu })(1+N_{2}i^{\mu })}\\\le & {} c_{4}|N_{1}-N_{2}|,~1\le i\le n \end{aligned}$$

for some \(0<c_{3}\le c_{4}<\infty \). Assume further that \(\{\xi _{n},n\ge 1\}\) is a sequence of END random errors with zero mean and \(E|\xi _{n}|^{p}<\infty \) for each \(n\ge 1\); then Theorems 2.1 and 2.2 hold. Moreover, by choosing \(\rho =n^{1/2}\epsilon \), we can obtain weak consistency for \(p>1\) and strong consistency for p large enough.
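
The following short sketch checks the displayed bound numerically; the values of \(\mu \), n and the range of N below are chosen purely for illustration (for instance, \(\mu =0.1\) satisfies the constraint when \(p=2\)).

```python
import numpy as np

# Numerical check of |g_i(N1) - g_i(N2)| / |N1 - N2|
#   = 1 / ((1 + N1 * i**mu) * (1 + N2 * i**mu))
# for g_i(N) = 1 / (N**-1 + i**mu), over an assumed bounded range of N.
mu, n = 0.1, 1000
i = np.arange(1, n + 1)
N_grid = np.linspace(0.05, 1.0, 40)          # assumed parameter range for N

ratio_min, ratio_max = np.inf, 0.0
for N1 in N_grid:
    for N2 in N_grid:
        r = 1.0 / ((1 + N1 * i ** mu) * (1 + N2 * i ** mu))
        ratio_min, ratio_max = min(ratio_min, r.min()), max(ratio_max, r.max())

print("admissible c_3 (= min ratio * n^(2*mu)):", ratio_min * n ** (2 * mu))
print("admissible c_4 (= max ratio):           ", ratio_max)
```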

3.2 Numerical analysis

In this subsection, we carry out some simulations to study the finite sample performance of the RWLSE in homoscedastic and heteroscedastic nonlinear regression models. The data are generated from models (3.4) (denoted as Model 1) and (3.6) (denoted as Model 2), respectively. For Model 1, set \(\theta =1\); for Model 2, set \(N=1/5\). Set the sample size \(n=50,100,200,400,800,1600\). Let \((\epsilon _1,\cdots ,\epsilon _n)\sim N_n(0,\Sigma )\) with

$$\begin{aligned} \Sigma =\begin{pmatrix} 1&{}\quad -0.3&{}\quad 0&{}\quad \cdots &{}\quad 0\\ -0.3&{}\quad 1&{} \quad -0.3&{}\quad \ddots &{}\quad \vdots \\ 0&{}\quad -0.3&{}\quad 1 &{}\quad \ddots &{}\quad 0\\ \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad -0.3\\ 0&{}\quad \cdots &{}\quad 0&{}\quad -0.3&{}\quad 1 \end{pmatrix}. \end{aligned}$$
(3.7)

The weight vector \({\varvec{W}}\sim Dir(4,4,\cdots ,4)\), and \({\varvec{W}}\) is generated by the method of Narayanan (1990).

We first use the RWLSE to estimate \(\theta \) for Model 1 and N for Model 2 under homoscedasticity, i.e., \(\xi _i=\epsilon _i\) for each \(1\le i\le n\). We repeat the procedure 1000 times and calculate the mean and variance of the estimator. The results are given in Table 1. We can see that \(\hat{\theta }\) for Model 1 is unbiased, while \(\hat{N}\) for Model 2 is asymptotically unbiased. The fact that \(Var[\sqrt{n}(\hat{\theta }-\theta )]\) and \(Var[\sqrt{n}(\hat{N}-N)]\) remain bounded indicates that the convergence rate of the RWLSE is asymptotically \(O(n^{-1/2})\). To compare the RWLSE with the OLSE under END errors, we further present the mean and variance of the OLSE in Table 2. The results show that there is no essential difference between the means of the two estimators. The mean and the variance of the RWLSE are slightly inferior to those of the OLSE in both models.

Table 1 Mean and variance of the RWLSE under END errors for homoscedastic Models 1 and 2
Table 2 Mean and variance of the OLSE under END errors for homoscedastic Models 1 and 2
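
For illustration, a minimal sketch of one possible implementation of the homoscedastic experiment for Model 1 is given below; the seed, the sample size, the parameter interval, the optimizer, and the use of numpy's built-in Dirichlet sampler are our own assumptions and are not meant to reproduce Tables 1 and 2 exactly.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def make_sigma(n, rho=-0.3):
    """Tridiagonal covariance matrix (3.7): 1 on the diagonal, -0.3 next to it."""
    S = np.eye(n)
    S[np.arange(n - 1), np.arange(1, n)] = rho
    S[np.arange(1, n), np.arange(n - 1)] = rho
    return S

def one_replication(n, theta0=1.0, interval=(-5.0, 5.0)):
    """One replication of homoscedastic Model 1: X_i = theta0 + eps_i."""
    eps = rng.multivariate_normal(np.zeros(n), make_sigma(n))
    X = theta0 + eps
    W = rng.dirichlet(np.full(n, 4.0))           # W ~ Dir(4, ..., 4)
    rw = minimize_scalar(lambda t: np.sum(W * (X - t) ** 2),
                         bounds=interval, method="bounded").x
    ols = minimize_scalar(lambda t: np.mean((X - t) ** 2),
                          bounds=interval, method="bounded").x
    return rw, ols

n, theta0 = 200, 1.0
est = np.array([one_replication(n, theta0) for _ in range(1000)])
for name, col in (("RWLSE", 0), ("OLSE ", 1)):
    print(name, "mean:", est[:, col].mean(),
          " var of sqrt(n)*(est - theta):", np.var(np.sqrt(n) * (est[:, col] - theta0)))
```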

We now consider the heteroscedastic case, i.e., \(\xi _i=\big [1+\frac{(-1)^{i}(i-1)}{n}\big ]\epsilon _{i}\) for each \(1\le i\le n\). The other settings are the same as above. The results are given in Tables 3 and 4. The mean and the variance of the RWLSE are better than those of the OLSE in Model 1 but slightly weaker in Model 2. However, the convergence rates of the two estimators are almost the same. Note that in our simulation the heteroscedasticity is known. However, in many realistic applications it is not easy to estimate the variances of the errors when they are heteroscedastic. Therefore, our simulation results show that the RWLSE performs well without first estimating the variances of the errors, which provides an alternative choice when dealing with similar issues.

Table 3 Mean and variance of the RWLSE under END errors for heteroscedastic Models 1 and 2
Table 4 Mean and variance of the OLSE under END errors for heteroscedastic Models 1 and 2
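
In this setting the heteroscedastic errors are obtained from the homoscedastic ones by a simple rescaling, for instance as in the following sketch (a hypothetical helper mirroring the scaling factor stated above).

```python
import numpy as np

def heteroscedastic_errors(eps):
    """Rescale eps_1, ..., eps_n into xi_i = [1 + (-1)^i * (i - 1) / n] * eps_i."""
    n = len(eps)
    i = np.arange(1, n + 1)
    return (1.0 + ((-1.0) ** i) * (i - 1) / n) * eps
```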

4 Proofs of the main results

Proof of Theorem 2.1

Denote

$$\begin{aligned} \Psi _{n}(\theta _{1},\theta _{2})=\sum _{i=1}^{n}W_{i}(g_{i}(\theta _{1})-g_{i}(\theta _{2}))^{2}, ~~V_{n}(\theta )=\sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta )-g_{i}(\theta _{0})), \end{aligned}$$

and

$$\begin{aligned} U_{n}(\theta )=\frac{V_{n}(\theta )}{\Psi _{n}(\theta ,\theta _{0})},~\theta \ne \theta _{0}. \end{aligned}$$

Note from \(\sum _{i=1}^{n}W_{i}=1\) and (2.1) that

$$\begin{aligned} \Psi _{n}(\theta ,\theta _{0})\ge \lambda _n^{2}(\theta -\theta _{0})^{2}. \end{aligned}$$
(4.1)

For all \(\omega \in (|\hat{\theta }_{n}-\theta _0|>\varepsilon )\), where \(\varepsilon >0\) is arbitrary, we have that \(\hat{\theta }_{n}\ne \theta _0\) and thus

$$\begin{aligned} \sum _{i=1}^{n}W_{i}\xi _{i}^{2}= & {} \sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta _{0}))^{2}\ge \sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\hat{\theta }_{n}))^{2}\\= & {} \sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta _{0}))^{2}+2\sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta _{0}))(g_{i}(\theta _{0})-g_{i}(\hat{\theta }_{n}))\\{} & {} +\sum _{i=1}^{n}W_{i}(g_{i}(\theta _{0})-g_{i}(\hat{\theta }_{n}))^{2}\\= & {} \sum _{i=1}^{n}W_{i}\xi _{i}^{2}-2U_{n}(\hat{\theta }_{n})\Psi _{n}(\hat{\theta }_{n},\theta _{0})+\Psi _{n}(\hat{\theta }_{n},\theta _{0}), \end{aligned}$$

which together with \(\Psi _{n}(\hat{\theta }_{n},\theta _{0})>0\) implies \(U_{n}(\hat{\theta }_{n})\ge 1/2\). Hence, \((|\hat{\theta }_{n}-\theta _0|>\varepsilon )\subseteq (U_{n}(\hat{\theta }_{n})\ge 1/2)\). By choosing \(\varepsilon =\rho n^{-1/2}\), we have

$$\begin{aligned} P(n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\rho )\le & {} P\left( \sup _{|\theta -\theta _{0}|>\rho n^{-1/2}}|U_{n}(\theta )|\ge 1/2\right) \nonumber \\\le & {} P\left( \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|\ge 1/2\right) \nonumber \\{} & {} \quad +P\left( \sup _{\rho n^{-1/2}<|\theta -\theta _{0}|\le \rho }|U_{n}(\theta )|\ge 1/2\right) . \end{aligned}$$
(4.2)

By Cauchy’s inequality, we can see that for all \(\theta \ne \theta _{0}\),

$$\begin{aligned} \frac{|V_{n}(\theta )|^{2}}{\Psi _{n}(\theta ,\theta _{0})}=\frac{\left[ \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta )-g_{i}(\theta _{0}))\right] ^{2}}{\sum _{i=1}^{n}W_{i}(g_{i}(\theta )-g_{i}(\theta _{0}))^{2}} \le \sum _{i=1}^{n}W_{i}\xi _{i}^{2}.\end{aligned}$$
(4.3)

Observing that \(\sum _{i=1}^{n}W_{i}=1\) and \(f(x)=|x|^{r}\) is a convex function for all \(r\ge 1\), we have by \(p>2\) and Lemma A.3 that

$$\begin{aligned} E\left( \sum _{i=1}^{n}W_{i}\xi _{i}^{2}\right) ^{p/2}\le E\left( \sum _{i=1}^{n}W_{i}|\xi _{i}|^{p}\right) =\sum _{i=1}^{n}EW_{i}E|\xi _{i}|^{p}=\frac{1}{n}\sum _{i=1}^{n}E|\xi _{i}|^{p}. \nonumber \\ \end{aligned}$$
(4.4)

Moreover, we obtain by (4.1) that

$$\begin{aligned} \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|=\sup _{|\theta -\theta _{0}|>\rho }\frac{|V_{n}(\theta )|}{\Psi _{n}^{1/2}(\theta ,\theta _{0})\Psi _{n}^{1/2}(\theta ,\theta _{0})} \le (\lambda _n\rho )^{-1}\sup _{|\theta -\theta _{0}|>\rho }\frac{|V_{n}(\theta )|}{\Psi _{n}^{1/2}(\theta ,\theta _{0})}. \nonumber \\ \end{aligned}$$
(4.5)

Hence, it follows from (4.3)–(4.5) and Markov’s inequality that

$$\begin{aligned} P\left( \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|\ge 1/2\right)\le & {} P\left( \sum _{i=1}^{n}W_{i}\xi _{i}^{2}\ge \frac{\lambda _n^{2}\rho ^{2}}{4}\right) \nonumber \\\le & {} \left( \frac{2}{\lambda _n\rho }\right) ^{p}E\left( \sum _{i=1}^{n}W_{i}\xi _{i}^{2}\right) ^{p/2}\nonumber \\\le & {} C_{1}(p)\Delta _{np}n^{-1}\lambda _n^{-p}\rho ^{-p}. \end{aligned}$$
(4.6)

For \(m=0,1,2,\ldots ,\lfloor n^{1/2}\rfloor \), let \(\theta (m)=\theta _{0}+\frac{\rho }{n^{1/2}}+\frac{m\rho }{\lfloor n^{1/2}\rfloor }\) and \(\rho _{m}=\theta (m)-\theta _{0}=\frac{\rho }{n^{1/2}}+\frac{m\rho }{\lfloor n^{1/2}\rfloor }\). It follows from (4.1) again that

$$\begin{aligned} \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}|U_{n}(\theta )|\le & {} \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}\frac{|V_{n}(\theta )|}{\lambda _n^{2}(\theta -\theta _{0})^{2}} \le \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}\frac{|V_{n}(\theta )|}{\lambda _n^{2}\rho _{m}^{2}}\\\le & {} \frac{|V_{n}(\theta (m))|}{\lambda _n^{2}\rho _{m}^{2}}+\sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}\frac{|V_{n}(\theta _2)-V_{n}(\theta _1)|}{\lambda _n^{2}\rho _{m}^{2}}. \end{aligned}$$

Hence, it yields that

$$\begin{aligned} P\left( \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}|U_{n}(\theta )|\ge 1/2\right)\le & {} P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} +P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) .~~~~~\nonumber \\ \end{aligned}$$
(4.7)

By Lemma A.3 and Stirling’s approximation, we have that for each \(1\le i\le n\), when n is sufficiently large,

$$\begin{aligned} EW_{i}^{p}= & {} \frac{\Gamma (4n)\Gamma (4+p)}{\Gamma (4n+p)\Gamma (4)}\approx \frac{\Gamma (4+p)}{\Gamma (4)}\cdot \frac{\sqrt{2\pi (4n-1)}\left( \frac{4n-1}{e}\right) ^{4n-1}}{\sqrt{2\pi (4n+p-1)}\left( \frac{4n+p-1}{e}\right) ^{4n+p-1}}\nonumber \\\le & {} C\left( \frac{4n-1}{4n+p-1}\right) ^{4n-1}n^{-p}\le Cn^{-p}. \end{aligned}$$
(4.8)

Note that \(0=EW_{i}E\xi _{i}=EW_{i}\xi _{i}=EW_{i}\xi _{i}^{+}-EW_{i}\xi _{i}^{-}\) for \(1\le i\le n\), and, by Lemma A.1, \(\{W_{n}\xi _{n}^{+}-EW_{n}\xi _{n}^{+},n\ge 1\}\) and \(\{W_{n}\xi _{n}^{-}-EW_{n}\xi _{n}^{-},n\ge 1\}\) are still sequences of END random variables with zero mean. Hence, applying Markov's inequality, Lemma A.2, (2.1) and (4.8), one can easily obtain that

$$\begin{aligned}{} & {} P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \le \left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E|V_{n}(\theta (m))|^{p}\nonumber \\{} & {} \quad \quad =\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E\left| \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta (m))-g_{i}(\theta _{0}))\right| ^{p}\nonumber \\{} & {} \quad \quad \le 2^{p-1}\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E\left| \sum _{i=1}^{n}(W_{i}\xi _{i}^{+}-EW_{i}\xi _{i}^{+})(g_{i}(\theta (m))-g_{i}(\theta _{0}))\right| ^{p}\nonumber \\{} & {} \qquad \quad +2^{p-1}\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E\left| \sum _{i=1}^{n}(W_{i}\xi _{i}^{-}-EW_{i}\xi _{i}^{-})(g_{i}(\theta (m))-g_{i}(\theta _{0}))\right| ^{p}\nonumber \\{} & {} \quad \quad \le C\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}\left\{ \sum _{i=1}^{n}E|W_{i}\xi _{i}|^{p}|(g_{i}(\theta (m))-g_{i}(\theta _{0}))|^{p}\right. \nonumber \\{} & {} \quad \qquad \left. +\left( \sum _{i=1}^{n}E|W_{i}\xi _{i}|^{2}(g_{i}(\theta (m))-g_{i}(\theta _{0}))^{2}\right) ^{p/2}\right\} \nonumber \\{} & {} \quad \quad \le C\left( \frac{4\Lambda _n}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}|\theta (m)-\theta _{0}|^{p}\left\{ \sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p} +\left( \sum _{i=1}^{n}EW_{i}^{2}E\xi _{i}^{2}\right) ^{p/2}\right\} \nonumber \\{} & {} \quad \quad \le C_{2}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho _{m}^{-p}n^{-p}(\Delta _{np}+\nabla _{np}). \end{aligned}$$
(4.9)

Similarly, we also obtain by Lemma A.2, (2.1) and (4.8) that for all \(\theta _{1},\theta _{2}\in \Theta \) and n large enough,

$$\begin{aligned} E|V_{n}(\theta _{2})-V_{n}(\theta _{1})|^{p}= & {} E\left| \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta _{2})-g_{i}(\theta _{1}))\right| ^{p}\nonumber \\\le & {} C\Lambda _n^{p}|\theta _{2}-\theta _{1}|^{p}\left\{ \sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p} +\left( \sum _{i=1}^{n}EW_{i}^{2}E\xi _{i}^{2}\right) ^{p/2}\right\} \nonumber \\\le & {} C\Lambda _n^{p}n^{-p}(\Delta _{np}+\nabla _{np})|\theta _{2}-\theta _{1}|^{p}=:C(n,p)|\theta _{2}-\theta _{1}|^{p}. \end{aligned}$$

Hence, taking \(r=1+\alpha =p\), \(C=C(n,p)\), \(\varepsilon =\rho /\lfloor n^{1/2}\rfloor \), \(a=\lambda _n^{2}\rho _{m}^{2}/4\), and \(\gamma \in (2,p+1)\) in Lemma A.4, we obtain

$$\begin{aligned}{} & {} P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad =P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m)+\rho /\lfloor n^{1/2}\rfloor }|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad \le \frac{8C\Lambda _n^{p}n^{-p}(\Delta _{np}+\nabla _{np})}{(p+1-\gamma )(p+2-\gamma )}\left( \frac{8\gamma }{\gamma -2}\right) ^{p}\left( \frac{\rho }{\lfloor n^{1/2}\rfloor }\right) ^{p}\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}\nonumber \\{} & {} \quad \le C_{3}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho ^{p}\rho _{m}^{-2p}n^{-3p/2}(\Delta _{np}+\nabla _{np}). \end{aligned}$$
(4.10)

Noting that \(\rho _{0}=\rho n^{-1/2}\), \(\rho _{m}>m\rho n^{-1/2}\) and \(p>2\), we obtain by (4.7), (4.9) and (4.10) that

$$\begin{aligned}{} & {} P\left( \sup _{\rho n^{-1/2}<\theta -\theta _{0}\le \rho }|U_{n}(\theta )|\ge 1/2\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( \sup _{\rho _{m}<\theta -\theta _{0}\le \rho _{m+1}}|U_{n}(\theta )|\ge 1/2\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \qquad +\sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}\left[ C_{2}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho _{m}^{-p}n^{-p}(\Delta _{np}+\nabla _{np})\right. \nonumber \\{} & {} \qquad \left. +C_{3}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho ^{p}\rho _{m}^{-2p}n^{-3p/2}(\Delta _{np}+\nabla _{np})\right] \nonumber \\{} & {} \quad \le \left[ C_{2}(p)+C_{3}(p)\right] (\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})n^{-p/2}\rho ^{-p}\nonumber \\{} & {} \qquad +(\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})n^{-p/2}\rho ^{-p}\sum _{m=1}^{\lfloor n^{1/2}\rfloor -1}\left( \frac{C_{2}(p)}{m^{p}}+\frac{C_{3}(p)}{m^{2p}}\right) \nonumber \\{} & {} \quad \le C_{4}(p)(\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})n^{-p/2}\rho ^{-p}. \end{aligned}$$
(4.11)

Similarly, we also have

$$\begin{aligned}{} & {} P\left( \sup _{\rho n^{-1/2}<\theta _{0}-\theta \le \rho }|U_{n}(\theta )|\ge 1/2\right) \le C_{5}(p)(\Lambda _n/\lambda _n^{2})^{p}(\Delta _{np}+\nabla _{np})n^{-p/2}\rho ^{-p}.\nonumber \\ \end{aligned}$$
(4.12)

The desired result (2.2) follows from (4.2), (4.6), (4.11) and (4.12) immediately. \(\square \)

Proof of Theorem 2.2

The proof is similar to that of Theorem 2.1. Thus, we only present the differences. It follows from (4.1) that

$$\begin{aligned} \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|= & {} \sup _{|\theta -\theta _{0}|>\rho }\frac{|V_{n}(\theta )|}{\Psi _{n}(\theta ,\theta _{0})} \le \sup _{|\theta -\theta _{0}|>\rho }\frac{\sum _{i=1}^{n}W_{i}|\xi _{i}|\cdot \Lambda _{n}|\theta -\theta _{0}|}{\lambda _{n}^{2}(\theta -\theta _{0})^{2}}\nonumber \\\le & {} \frac{\Lambda _{n}}{\lambda _{n}^{2}\rho }\sum _{i=1}^{n}W_{i}|\xi _{i}|.~~~~\end{aligned}$$
(4.13)

Therefore, we have by (4.8), (4.13) and \(C_{r}\)-inequality that

$$\begin{aligned} P\left( \sup _{|\theta -\theta _{0}|>\rho }|U_{n}(\theta )|\ge 1/2\right)\le & {} P\left( \sum _{i=1}^{n}W_{i}|\xi _{i}|\ge \frac{\lambda _{n}^{2}\rho }{2\Lambda _{n}}\right) \nonumber \\\le & {} \left( \frac{2\Lambda _{n}}{\lambda _{n}^{2}\rho }\right) ^{p}E\left( \sum _{i=1}^{n}W_{i}|\xi _{i}|\right) ^{p}\nonumber \\\le & {} \left( \frac{2\Lambda _{n}}{\lambda _{n}^{2}\rho }\right) ^{p}n^{p-1}\sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p}\nonumber \\\le & {} C_{6}(p)(\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-1}\rho ^{-p}. \end{aligned}$$
(4.14)

Applying Markov’s inequality, the Marcinkiewicz-Zygmund inequality in Lemma A.2, (2.1) and (4.8), we can also obtain that for all n large enough,

$$\begin{aligned} P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right)\le & {} \left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}E\left| \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta (m))-g_{i}(\theta _{0}))\right| ^{p}\nonumber \\\le & {} C\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}\sum _{i=1}^{n}E|W_{i}\xi _{i}|^{p}|(g_{i}(\theta (m))-g_{i}(\theta _{0}))|^{p}\nonumber \\\le & {} C\left( \frac{4\Lambda _n}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}|\theta (m)-\theta _{0}|^{p}\sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p}\nonumber \\\le & {} C_{7}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho _{m}^{-p}n^{-p}\Delta _{np}, \end{aligned}$$
(4.15)

and

$$\begin{aligned} E|V_{n}(\theta _{2})-V_{n}(\theta _{1})|^{p}= & {} E\left| \sum _{i=1}^{n}W_{i}\xi _{i}(g_{i}(\theta _{2})-g_{i}(\theta _{1}))\right| ^{p}\\\le & {} C\Lambda _n^{p}|\theta _{2}-\theta _{1}|^{p}\sum _{i=1}^{n}EW_{i}^{p}E|\xi _{i}|^{p}\\\le & {} C\Lambda _n^{p}n^{-p}\Delta _{np}|\theta _{2}-\theta _{1}|^{p}. \end{aligned}$$

Hence, analogous to the proof of (4.10), we have

$$\begin{aligned}{} & {} P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _n^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad \le \frac{8C\Lambda _n^{p}n^{-p}\Delta _{np}}{(p+1-\gamma )(p+2-\gamma )}\left( \frac{8\gamma }{\gamma -2}\right) ^{p}\left( \frac{\rho }{\lfloor n^{1/2}\rfloor }\right) ^{p}\left( \frac{4}{\lambda _n^{2}\rho _{m}^{2}}\right) ^{p}\nonumber \\{} & {} \quad \le C_{8}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho ^{p}\rho _{m}^{-2p}n^{-3p/2}\Delta _{np}. \end{aligned}$$
(4.16)

Analogous to the proof of (4.11), we obtain by (4.7), (4.15) and (4.16) that

$$\begin{aligned}{} & {} P\left( \sup _{\rho n^{-1/2}<\theta -\theta _{0}\le \rho }|U_{n}(\theta )|\ge 1/2\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( |V_{n}(\theta (m))|\ge \frac{1}{4}\lambda _{n}^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \qquad +\sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}P\left( \sup _{\theta (m)<\theta _{1},\theta _{2}\le \theta (m+1)}|V_{n}(\theta _2)-V_{n}(\theta _1)|\ge \frac{1}{4}\lambda _{n}^{2}\rho _{m}^{2}\right) \nonumber \\{} & {} \quad \le \sum _{m=0}^{\lfloor n^{1/2}\rfloor -1}\left[ C_{7}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho _{m}^{-p}n^{-p}\Delta _{np}+C_{8}(p)(\Lambda _n/\lambda _n^{2})^{p}\rho ^{p}\rho _{m}^{-2p}n^{-3p/2}\Delta _{np}\right] \nonumber \\{} & {} \quad \le \left[ C_{7}(p)+C_{8}(p)\right] (\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-p/2}\rho ^{-p}\nonumber \\{} & {} \qquad +(\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-p/2}\rho ^{-p}\sum _{m=1}^{\lfloor n^{1/2}\rfloor -1}\left( \frac{C_{7}(p)}{m^{p}}+\frac{C_{8}(p)}{m^{2p}}\right) \nonumber \\{} & {} \quad \le C_{9}(p)(\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-p/2}\rho ^{-p}. \end{aligned}$$
(4.17)

Similarly, we also have

$$\begin{aligned}{} & {} P\left( \sup _{\rho n^{-1/2}<\theta _{0}-\theta \le \rho }|U_{n}(\theta )|\ge 1/2\right) \le C_{10}(p)(\Lambda _n/\lambda _n^{2})^{p}\Delta _{np}n^{-p/2}\rho ^{-p}. \end{aligned}$$
(4.18)

Combining (4.2), (4.14), (4.17), and (4.18), we obtain (2.3) immediately. \(\square \)

Proof of Corollary 2.1

Taking \(\rho =\epsilon n^{1/p}\sqrt{\log n}\) in Theorem 2.1, we have that

$$\begin{aligned} \sum _{n=1}^{\infty }P(n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\rho )= & {} \sum _{n=1}^{\infty }P(|\hat{\theta }_{n}-\theta _{0}|>\epsilon n^{1/p-1/2}\sqrt{\log n})\\\le & {} C\sum _{n=1}^{\infty }n^{-1}\log ^{-p/2}n<\infty ,\end{aligned}$$

which together with the Borel-Cantelli lemma yields the rate of strong consistency. \(\square \)

Proof of Corollary 2.2

Noting that \(\sup _{n\ge 1}E|\xi _{n}|^{p}<\infty \) for some \(1<p\le 2\), we may assume without loss of generality that \(\sup _{n\ge 1}E|\xi _{n}|^{p}\le 1\), so that \(\Delta _{np}\le n\) for each \(n\ge 1\). Hence, for any \(\epsilon >0\), taking \(\rho =\left( \frac{C'(p)}{\epsilon }\right) ^{1/p}n^{1/p-1/2}\) in Theorem 2.2, we have

$$\begin{aligned} P\left( |\hat{\theta }_{n}-\theta _{0}|>\left( \frac{C'(p)}{\epsilon }\right) ^{1/p}n^{1/p-1}\right)= & {} P\left( n^{1/2}|\hat{\theta }_{n}-\theta _{0}|>\left( \frac{C'(p)}{\epsilon }\right) ^{1/p}n^{1/p-1/2}\right) \\\le & {} C'(p)n^{1-p/2}\frac{\epsilon }{C'(p)}n^{-1+p/2}=\epsilon . \end{aligned}$$

The second conclusion follows immediately by choosing \(\rho =\epsilon \sqrt{n/\tau _n}\) in Theorem 2.2. This completes the proof. \(\square \)

5 Conclusions

In this work, we mainly consider the following nonlinear regression model:

$$\begin{aligned} X_{n}=g_{n}(\theta )+\xi _{n},~n\ge 1, \end{aligned}$$
(5.1)

where \(X_{n}\) is observed, \(\{g_{n}(\theta )\}\) is a known sequence of continuous functions possibly nonlinear in \(\theta \in \Theta \), and \(\{\xi _{n},n\ge 1\}\) is a sequence of random errors with zero mean.

The nonlinear regression model not only involves essentially fewer unknown parameters, but its parameters also carry the meaning of physical variables, while the parameters of linear models are usually devoid of physical significance. Therefore, it is of great interest to study the nonlinear regression model.

In this work, in view of the concept of Dirichlet distribution, we introduce the random weighting method to the nonlinear regression model and propose the randomly weighted least squares estimator of \(\theta \) as follows:

$$\begin{aligned} \hat{\theta }_{n}=\arg \inf _{\theta \in \Theta }\sum _{i=1}^{n}W_{i}(X_{i}-g_{i}(\theta ))^{2}, \end{aligned}$$
(5.2)

where \(W_{i}\)’s are independent of \(\xi _{i}\)’s and the random vector \({\varvec{W}}=(W_{1},\cdots ,W_{n})\) obeys the Dirichlet distribution \(Dir(4,4,\ldots ,4)\).

In this work, we establish the asymptotic properties of the randomly weighted least squares estimator with END errors. The results reveal that this new estimator is consistent. Moreover, simulations are carried out which show that it is competitive with the ordinary least squares estimator and can outperform it in a heteroscedastic nonlinear regression model.