Abstract
We propose and investigate the statistical properties of shrinkage M-estimators based on Stein-rule estimation for partially linear models under the assumption of sparsity. We are mainly interested in estimating the sub-vector of regression coefficients with strong signals when the sparsity assumption may or may not hold. Thus, we consider two models: one including all the predictors, leading to a full (unrestricted, or over-fitted) model estimation problem; and the other with only a few influential predictors, resulting in a submodel (restricted, or under-fitted model) estimation problem. Generally speaking, submodel estimators perform better than full model estimators when the assumption of sparsity is nearly correct. However, a small departure from this assumption makes submodel estimators biased and inefficient, questioning their applicability for practical purposes. On the other hand, the full model estimators may not be desirable due to poor interpretability and higher estimation errors, especially when a large number of predictors are included in the model. For this reason, we propose shrinkage strategies which combine the full model and submodel estimators in an optimal way. The asymptotic properties of the suggested estimators are studied both analytically and numerically. The asymptotic bias and risk of the estimators are derived in closed form. In addition, a simulation study is conducted to examine the performance of the estimators in practical settings when the sparsity assumption may or may not hold. Our simulation results consolidate the theoretical properties of the estimators.
1 Introduction
We study robust shrinkage M-estimation in partially linear models (PLM) with a scaled error term. Ahmed et al. (2006) considered robust shrinkage estimation of regression parameters when it is a priori suspected that the regression parameters could be restricted to a linear subspace. They studied the asymptotic properties of variants of Stein-rule M-estimators. For some insights on shrinkage estimation strategies, we refer to Ahmed and Fallahpour (2012), Ahmed and Raheem (2012), Raheem et al. (2012), Ahmed (2014), Ma et al. (2014), Ahmed et al. (2016), Sun et al. (2020) and Opoku et al. (2021). Recently, Ahmed et al. (2023) extended shrinkage strategies to high-dimensional settings where the sparsity assumption cannot be judiciously justified. Maruyama et al. (2023) presented both classical and recent developments in shrinkage estimation, including results on the admissibility of generalized Bayes estimators in the presence of a nuisance scale parameter.
Applications of robust statistical techniques, including M-estimation and related approaches, to modeling and prediction challenges are found in various domains, e.g., industrial data modeling (Zhou et al., 2020), disease incidence prediction (Susanti et al., 2020), and ratio-type estimation (Rather et al., 2022). These techniques are valuable for handling data imperfections, outliers, and non-normality.
Related works also include Arashi et al. (2014), who discussed improved preliminary test and Stein-rule Liu estimators specifically tailored for the ill-conditioned elliptical linear regression model. The authors focused on addressing the challenges posed by data that exhibit multicollinearity or non-normality, providing more accurate estimation in such scenarios. Norouzirad and Arashi (2019) discussed the use of preliminary test and Stein-type shrinkage ridge estimators in the context of robust regression. They explored methods for improving the robustness and accuracy of regression models, particularly when dealing with outliers or influential data points. Recently, Shih et al. (2021) proposed robust ridge M-estimators incorporating pretests and Stein-rule shrinkage techniques for estimating the intercept term in regression models. They aimed to enhance the robustness of ridge regression when dealing with outliers and influential observations. In addition to the original results on Huber-type M-estimation, a review of recent results and applications was given by, e.g., Farcomeni and Ventura (2012).
Generally speaking, a PLM is more flexible than a linear model since it includes a nonlinear component along with the linear components. A PLM may provide a better alternative to the classical linear regression model when one or more predictors have a nonlinear relationship with the response variable. Robust regression models are designed to overcome some of the limitations of classical linear regression in a host of scenarios. For example, least squares regression is highly sensitive to outliers and rests on strong underlying assumptions; any violation of these assumptions may have a serious impact on the validity of the fitted model.
In this paper, we extend the available work to a PLM and develop shrinkage M-estimators. We construct shrinkage M-estimators of the regression parameters under the sparsity assumption. Our analytical and numerical results establish the superiority of shrinkage M-estimators over the full model and submodel M-estimators. We focus on shrinkage M-estimation of regression coefficients in a PLM when the sparsity assumption may or may not hold. In our setup, the nonparametric part is estimated by a kernel-based method.
1.1 Statement of the problem
Consider a PLM of the form
$$\begin{aligned} \varvec{Y}= \varvec{X}\varvec{\beta }+ g(T) + \varvec{e}, \end{aligned}$$
where \(\varvec{Y}= (y_1, y_2, \dots , y_n)^\top \) is the n-vector of responses, \(\varvec{X}= (\varvec{x}_1^\top , \varvec{x}_2^\top , \dots , \varvec{x}_n^\top )^\top \) is the \(n \times p\) design matrix with the \(\varvec{x}_i\)'s known row p-vectors, \(\varvec{\beta }=(\beta _1, \beta _2, \dots , \beta _p)^\top \) is the p-vector of regression parameters, \(g(T)=(g(t_1), g(t_2), \dots , g(t_n))^\top \) with \(g(\cdot )\) an unknown real-valued function, and \(\varvec{e} =(e_1, e_2, \dots , e_n)^\top \) is the n-vector of random errors with mean \(E(\varvec{e})=\varvec{0}\); the \(e_i\)'s are independent and identically distributed (iid) random variables having a continuous distribution \(F\), free from any unknown scale parameter \(\sigma >0\). Here \((\cdot )^\top \) denotes the transpose of a vector or a matrix. In passing, we remark that, without loss of generality, the intercept term is not included in establishing the asymptotic properties of the estimators.
Under the assumption of sparsity, the data matrix \(\varvec{X}\) can be partitioned as \(\varvec{X}=(\varvec{X}_1: \varvec{X}_2)\) with \(\varvec{\beta }=(\varvec{\beta }_1^\top , \varvec{\beta }_2^\top )^\top \), where \(\varvec{X}_1\) and \(\varvec{X}_2\) are \(n \times p_1\) and \(n \times p_2\) submatrices of predictors with strong signals and no signals, respectively. Thus, the model can be rewritten as
$$\begin{aligned} \varvec{Y}= \varvec{X}_1\varvec{\beta }_1 + \varvec{X}_2\varvec{\beta }_2 + g(T) + \varvec{e}. \end{aligned}$$
Under the assumption of sparsity, that is, \(\varvec{\beta }_2 = \varvec{0}\), we have the submodel (under-fitted, or restricted) model
$$\begin{aligned} \varvec{Y}= \varvec{X}_1\varvec{\beta }_1 + g(T) + \varvec{e}, \end{aligned}$$
and the remaining discussion follows.
In practice, the submodel can be readily obtained by applying a suitable variable selection method to the full model. We are primarily interested in estimating \(\varvec{\beta }_1\) when \(\varvec{\beta }_2\) may or may not be a null vector; in other words, when practitioners cannot be certain whether the model is fully sparse. In an effort to help data analysts, we propose shrinkage M-estimators based on the Stein rule to improve the performance of the under-fitted model estimators.
The remainder of the paper is organized as follows. In Sect. 2, we define kernel-based least-squares (LS) estimators and discuss a two-step procedure to estimate the nonparametric function in a PLM. In Sect. 3, we propose Stein-rule shrinkage M-estimators. Asymptotic properties of the estimators are presented in Sect. 4. The expressions for the asymptotic bias and risk of the estimators are derived in Sect. 5. Monte Carlo simulation experiments are reported in Sect. 6. Our concluding remarks are made in Sect. 7. Finally, our derivations of the theoretical results are included in the appendix.
2 Proposed LS estimation
In this section, we propose our robust LS estimation method with a two-step procedure.
Again, consider a PLM of the form
$$\begin{aligned} \varvec{Y}= \varvec{X}\varvec{\beta }+ g(T) + \varvec{e}. \end{aligned}$$
We first linearize (2.1) by estimating \(g(\cdot )\) using kernel smoothing. We then confine ourselves to the estimation of \(\varvec{\beta }\) based on the partial residuals which attains the usual parametric convergence rate \(n^{-1/2}\) without under-smoothing the nonparametric component \(g(\cdot )\); see e.g. Speckman (1988).
Now, we describe the estimation process. We assume \(\left\{ y_i, \varvec{x}_i^\top , t_i; i=1, 2, \dots , n \right\} \) satisfy (2.1). If \(\varvec{\beta }\) is the true parameter, then since \(E(e_i)=0\), we have
A natural nonparametric estimator of \(g(\cdot )\) given \(\varvec{\beta }\) is
where
with \(K(\cdot )\) being a kernel function which is a non-negative function integrable on \({\mathfrak {R}}\), and h being a bandwidth parameter. We need to make the assumptions as outlined in Appendix B.
Now, we define the conditional expectations
where \(\gamma _j(t) = E(\varvec{x}_j|T=t)\).
We estimate \(\varvec{\beta }\) using
with
where \({\widetilde{\varvec{Y}}} = ({\tilde{y}}_1, {\tilde{y}}_2, \dots , \tilde{y}_n)^\top \), \({\widetilde{\varvec{X}}} = ({\tilde{\varvec{x}}}_1, {\tilde{\varvec{x}}}_2, \dots , \tilde{\varvec{x}}_n)^\top \), \({\tilde{y}}_i = y_i - \gamma _0(t_i)\), and \({\tilde{\varvec{x}}}_i = \varvec{x}_i - \varvec{\gamma }(t_i)\) for \(i =1, 2, \dots , n\).
The conditional expectations \(\gamma _0(t)\) and \(\varvec{\gamma }(t)\) are obtained using a classical nonparametric approach through
where \(W_{ni}(t)\) is defined in (2.2). Clearly, once we obtain the estimates \(\hat{\gamma }_0(t)\) and \(\hat{\gamma }_j(t)\), they can be plugged into (2.4) prior to the estimation of \(\varvec{\beta }\).
The above procedure was independently proposed by Denby (1986) and Speckman (1988). A similar approach was taken by Ahmed et al. (2007) in estimating the nonparametric component in a PLM.
We obtain the robust M-estimators of the parameters of a PLM using a two-step procedure as follows:
-
Step 1
We first estimate \(\gamma _0(t)\) and \(\gamma _j(t)\) through kernel smoothing as described above. We denote the estimates by \(\hat{\gamma }_0(t)\) and \(\hat{\gamma }_j(t)\), respectively.
-
Step 2
The estimates in Step 1 are then plugged into (2.4). The estimator \({\hat{\varvec{\beta }}}\) of \(\varvec{\beta }\) is then obtained by regressing the residuals \({\hat{r}}_i = y_i -\hat{\gamma }_0(t_i)\) on \(\varvec{u}_i = \varvec{x}_i - \hat{\varvec{\gamma }}(t_i)\) using a robust procedure.
Consistency and asymptotic normality of the estimators can be found in Appendix Section B.1 and the references therein.
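As an illustration, the two-step procedure above can be sketched in a few lines (a minimal sketch assuming a Gaussian kernel and an ordinary least-squares second step; the function names are ours, not the authors'):

```python
import numpy as np

def nw_weights(t0, t, h):
    """Nadaraya-Watson weights W_ni(t0) with a Gaussian kernel (illustrative choice)."""
    k = np.exp(-0.5 * ((t0 - t) / h) ** 2)
    return k / k.sum()

def two_step_pl_fit(y, X, t, h=0.1):
    """Speckman-type two-step fit: smooth out g(.) via partial residuals,
    then regress the residualized response on the residualized covariates."""
    n = len(y)
    y_tilde = np.empty(n)
    X_tilde = np.empty_like(X)
    for i in range(n):
        w = nw_weights(t[i], t, h)
        y_tilde[i] = y[i] - w @ y      # y_i - gamma0_hat(t_i)
        X_tilde[i] = X[i] - w @ X      # x_i - gamma_hat(t_i)
    beta_hat, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)
    return beta_hat
```

In Step 2 the paper uses a robust procedure; here plain least squares stands in to keep the sketch short.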
3 Proposed shrinkage M-estimation strategies
In this section, we propose our full model and submodel estimators, and formulate a test statistic which has asymptotically a non-central \(\chi ^2\) distribution.
Let \({\hat{\varvec{\beta }}}_1^{\text {RM}}\) be the restricted estimator of \(\varvec{\beta }_1\) under the restriction \(\varvec{\beta }_2=\varvec{0}\), and \({\hat{\varvec{\beta }}}_1^{\text {UM}}\) be the unrestricted estimator of \(\varvec{\beta }_1\) when \(\varvec{\beta }_2\) may not be a null vector. Following Ahmed (2014), a Stein-type M-estimator (SM), \(\hat{\varvec{\beta }}^{\text {SM}}_1\), of \(\varvec{\beta }_1\) can be defined as
where \(\psi _n\) is a distance statistic defined later in (3.7). To avoid the over-shrinkage problem, the positive-rule Stein-type M-estimator (SM+) has the form
where \(z^+=\max (0,z)\).
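Concretely, given the two base estimators and the distance statistic, the SM and SM+ rules are simple closed-form combinations (a sketch assuming the usual Stein constant \(\kappa = p_2 - 2\); the function name is ours):

```python
import numpy as np

def stein_shrinkage(beta_um, beta_rm, psi_n, p2):
    """Stein-type (SM) and positive-rule (SM+) combinations of the
    unrestricted (UM) and restricted (RM) estimators; kappa = p2 - 2."""
    kappa = p2 - 2
    shrink = 1.0 - kappa / psi_n           # data-driven shrinkage factor
    beta_sm = beta_rm + shrink * (beta_um - beta_rm)
    # SM+ truncates the factor at zero to avoid over-shrinkage
    beta_smp = beta_rm + max(0.0, shrink) * (beta_um - beta_rm)
    return beta_sm, beta_smp
```

A large \(\psi_n\) (strong evidence against sparsity) leaves the unrestricted estimator essentially untouched, while a small \(\psi_n\) pulls the estimate toward (for SM+, exactly onto) the submodel estimator.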
3.1 Full model and submodel estimation strategies for \(\hat{\varvec{\beta }}_1\)
For a suitable absolutely continuous function \(\rho : {\mathfrak {R}} \rightarrow {\mathfrak {R}}\), with derivative \(\phi \), an M-estimator of \(\varvec{\beta }\) is defined as a solution of the minimization
Generally, an M-estimator is regression-equivariant, i.e.,
and robustness depends on the choice of \(\rho (\cdot )\). But it is generally not scale-equivariant. That is, it may not satisfy
To make the estimators scale and regression equivariant, we need to studentize them. The studentized M-estimator is defined as a solution of the minimization
where \(S_n =S_n(\varvec{Y})\ge 0\) is an appropriate scale statistic that is regression equivariant and scale equivariant, i.e.,
According to Jurečková and Sen (1996), the minimization in (3.2) should be supplemented by a rule for defining \(\varvec{M}_n\) when \(S_n(\varvec{Y})=0\). However, in general, this happens with probability zero, and the specific rule does not affect the asymptotic properties of \(\varvec{M}_n\). There are additional regularity conditions needed with (3.2), which we present in Appendix A. Further details may be found in Jurečková and Sen (1996, page 217).
Now, in an effort to define the M-estimator of \(\varvec{\beta }_1\), we define \(\varvec{C}=\varvec{X}^\top \varvec{X}\) with \(\varvec{X}=(\varvec{X}_1:\varvec{X}_2)\) as follows:
Also, we define
Note that, if \(\varvec{C}_{21}=\varvec{0}\), then \(\varvec{C}_{22.1}= \varvec{C}_{22}\). Otherwise, \(\varvec{C}_{22}- \varvec{C}_{22.1}\) is positive semi-definite. We assume that \(\varvec{C}\) and \(\varvec{C}_{22.1}\) are positive definite.
A studentized unrestricted M-estimator of \(\varvec{\beta }\) is defined as a solution of (3.2). Let us denote it by
A studentized restricted M-estimator of \(\varvec{\beta }_1\) is obtained by minimizing
where \(S_n\) is regression-invariant and so is not affected by the restriction. Since \(\rho (\cdot )\) is assumed to have derivative \(\phi (\cdot )\), we rewrite \(\hat{\varvec{\beta }}^{\text {UM}}\) as a solution of
In other words,
Similarly, \(\hat{\varvec{\beta }}_1^{\text {RM}}\) is a solution of
Now, let
Note that \({\varvec{M}}_{n}\) is a \((p_1+p_2)\)-vector, \({\varvec{M}}_{n_1}\) is a \(p_1\)-vector and \(\hat{\varvec{M}}_{n_2}^{\text {RM}}\) is a \(p_2\)-vector.
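The studentized M-estimation problem above is typically solved numerically. A minimal sketch of a Huber M-estimator computed by iteratively reweighted least squares, with the MAD playing the role of the scale statistic \(S_n\) (our illustrative implementation, not the authors' exact algorithm):

```python
import numpy as np

def huber_psi(u, c=1.345):
    """Huber score function: identity in the middle, clipped in the tails."""
    return np.clip(u, -c, c)

def m_estimate(X, y, c=1.345, n_iter=50):
    """Huber M-estimate of beta via IRLS, studentized by the MAD scale."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # LS starting value
    for _ in range(n_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale S_n
        u = r / s
        w = np.ones_like(u)
        big = np.abs(u) > 1e-8
        w[big] = huber_psi(u[big], c) / u[big]            # IRLS weights psi(u)/u
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * y))   # weighted LS step
    return beta
```

Because the scale is re-estimated robustly at each step, the fit is both regression and scale equivariant, as required of the studentized estimator.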
3.2 Test statistic
Following Jurečková and Sen (1996, Sect. 10.2), a suitable test statistic can be formulated as follows:
where
Directly applying Lemma 5.5.1 in Jurečková and Sen (1996, page 220), it can be shown that, under the sparsity assumption, that is, when \(\varvec{\beta }_2\) is a null vector,
Further, under the (local) alternative hypothesis, \(\psi _n\) has a non-central \(\chi ^2\) distribution.
It is to be mentioned here that, unlike LS estimators, M-estimators are not linear. Even if the distribution function \(F\) is normal, the finite-sample distribution theory of M-estimators is not simple. Asymptotic methods (Jurečková & Sen, 1996) have been used to overcome this difficulty.
4 Asymptotic properties of the estimators
In this section, we establish the asymptotic properties of the estimators. This facilitates finding the asymptotic distributional bias (ADB), asymptotic distributional quadratic bias (ADQB), and asymptotic distributional quadratic risk (ADQR) of the estimators of the regression parameter vector \(\varvec{\beta }_1\).
Under the assumed regularity conditions
where
it is known that, under the non-sparsity assumption, that is, under a fixed alternative \(\varvec{\beta }_2 \ne \varvec{0}\),
such that the shrinkage factor \(\kappa \psi ^{-1}_n = {\mathcal {O}}_p (n^{-1})\). This implies that, asymptotically, there is no shrinkage effect. Therefore, to obtain meaningful asymptotics, we consider a class of local alternatives, \(\{K_n\}\), given by
where \(\varvec{\omega }= (\omega _1, \omega _2, \dots , \omega _{p_2})^\top \in {\mathfrak {R}}^{p_2}\) is a fixed vector and \(||\varvec{\omega }|| < \infty \), so that the null hypothesis \(H_0: \varvec{\beta }_2 = \varvec{0}\) reduces to \(H_0: \varvec{\omega }= \varvec{0}\).
For an estimator \(\varvec{\beta }^{*}_1\) and a positive-definite matrix \(\varvec{W}\), we define the loss function of the form
Thus, the risk function is defined as follows:
where tr denotes the trace operator and \(\varvec{\Omega }^{*}\) is the covariance matrix of \(\sqrt{n} (\varvec{\beta }_1^*-\varvec{\beta }_1)\). Whenever \( \lim _{n \rightarrow \infty }\hat{\varvec{\Omega }}^*_n = \hat{\varvec{\Omega }}^* \) exists, the asymptotic risk is defined by
Suppose the asymptotic cumulative distribution function (cdf) of \(\sqrt{n}(\varvec{\beta }^{*}_{1n} - \varvec{\beta }_1)\) under \(\{ K_n \}\) exists, and define it as
This is known as the asymptotic distribution function (ADF) of \(\varvec{\beta }^{*}_1\). Suppose that \(G_n \rightarrow G\) at all points of continuity as \(n \rightarrow \infty \), and let \(\hat{\varvec{\Omega }}^*\) be the covariance matrix of G. Then the ADR is defined as
As noted in Ahmed et al. (2006), if \(G_n \rightarrow G\) in second moment, then the ADR is the asymptotic risk. However, this is a stronger mode of convergence and is hard to prove analytically for shrinkage M-estimators. Therefore, they suggested using the asymptotic distributional risk.
Now let
be the dispersion matrix which is obtained from ADF. The asymptotic distributional quadratic risk (ADQR) may be defined as
where \(\varvec{\Gamma }\) is the asymptotic distributional mean squared error (ADMSE) of the estimators.
To establish the asymptotic properties of the estimators, we present two important theorems.
Theorem 1
Consider an absolutely continuous function \(f(\cdot )\) with derivative \(f'(\cdot )\) which exists everywhere, and finite Fisher information
Under \(\{K_n\}\) and the assumed regularity conditions, \(\psi _n\) has asymptotically a non-central chi-square distribution with non-centrality parameter \(\Delta = \varvec{\omega }^\top \varvec{Q}_{22.1}\varvec{\omega }\gamma ^{-2}\), where
and \(\phi (\cdot )\) is defined in (3.4) or Appendix A.
Theorem 2
We have, under the assumed regularity conditions, as \(n \rightarrow \infty \)
Proofs of these theorems are available in Jurečková and Sen (1996).
5 Asymptotic bias and risk of the estimators
In this section, we present the asymptotic distribution, bias and risk results for each of our estimators. We also compare their risk performances.
Theorem 3
Under the local alternative \(K_n\) and the assumed regularity conditions, we have as \(n\rightarrow \infty \)
-
(i)
\(\eta _1 = \sqrt{n}(\varvec{{\hat{\beta }}}^{\text {UM}}_1 - \varvec{\beta }_1) {\mathop {\rightarrow }\limits ^{d}} N(\varvec{0}, \gamma ^2\varvec{Q}^{-1}_{11.2}),\)
-
(ii)
\(\eta _2 = \sqrt{n}(\varvec{{\hat{\beta }}}^{\text {UM}}_1 - \varvec{{\hat{\beta }}}^{\text {RM}}_1) {\mathop {\rightarrow }\limits ^{d}} N(\varvec{\delta }, \varvec{\Sigma }^*)\), \(\quad \varvec{\delta }=-\varvec{Q}^{-1}_{11}\varvec{Q}_{12}\varvec{\omega },\)
-
(iii)
\(\eta _3 = \sqrt{n}(\varvec{{\hat{\beta }}}^{\text {RM}}_1-\varvec{\beta }_1) {\mathop {\rightarrow }\limits ^{d}} N(-\varvec{\delta }, \varvec{\Omega }^*), \quad \varvec{\Omega }^*=\gamma ^2 \varvec{Q}^{-1}_{11}.\)
We have, under \(\{K_{n}\}\)
where \(\varvec{Q}\) is partitioned as in (4.1).
Also, we have the joint distributions as follows:
The proof for this theorem is given in Appendix C.
5.1 Asymptotic bias of the estimators
The asymptotic distributional bias (ADB) of an estimator \(\varvec{\beta }^*\) is defined as
Theorem 4
Under the assumed regularity conditions and the stated theorems above, and under \(\{K_n\}\), the ADB of the estimators are as follows:
where \({E\left\{ \chi ^{-2}_{a}(\Delta )\right\} }\) is the expected value of the inverse of a non-central \(\chi ^2\) random variable with \(a\) degrees of freedom and non-centrality parameter \(\Delta \), and \(H_{a}(y, \Delta )\) is the cdf of a non-central \(\chi ^2\) random variable with \(a\) degrees of freedom and non-centrality parameter \(\Delta \).
The proof for this theorem is given in Appendix D.
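The quantity \(E\left\{ \chi ^{-2}_{a}(\Delta )\right\} \) appearing in the bias expressions has no elementary closed form, but it is easy to approximate by Monte Carlo; for \(\Delta =0\) it reduces to the known value \(1/(a-2)\) for \(a>2\), which gives a quick sanity check (a sketch; the function name is ours):

```python
import numpy as np

def e_inv_ncx2(df, nc, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[chi^{-2}_df(nc)], the expected inverse
    of a non-central chi-square variate driving the bias expressions."""
    rng = np.random.default_rng(seed)
    x = rng.noncentral_chisquare(df, nc, size=n_draws)
    return np.mean(1.0 / x)
```

As the non-centrality \(\Delta \) grows, the expectation decreases toward zero, which is exactly why the shrinkage bias terms remain bounded in \(\Delta \).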
Let us define the asymptotic distributional quadratic bias (ADQB) of an estimator \(\varvec{\beta ^*}\) of \(\varvec{\beta }_1\) by
where \(\varvec{\Sigma }\) is the dispersion matrix of \(\varvec{{\hat{\beta }}}^{\text {UM}}_1\) as \( n \rightarrow \infty \). In our case, the dispersion matrix is \(\varvec{Q}_{11}\). Thus, the ADQBs of the estimators are given below.
The above expressions reveal that, as expected, the unrestricted estimator of \(\varvec{\beta }_1\) is asymptotically unbiased. On the other hand, the bias of the restricted estimator is a function of the sparsity parameter (the non-centrality parameter \(\Delta \)), so under the sparsity assumption the estimator is asymptotically unbiased. However, it is an unbounded function of \(\Delta \), which is not a desirable property.
It can be seen that both shrinkage estimators are also functions of \(\Delta \); more importantly, they are bounded functions of the non-centrality parameter. The magnitude of the bias increases as \(\Delta \) increases and then converges to zero as \(\Delta \rightarrow \infty \). As expected, the bias curve of the positive-rule shrinkage estimator lies below or coincides with that of the shrinkage estimator.
Since the bias enters the MSE (risk), from here onward we focus on the risk properties of the estimators.
5.2 Asymptotic risk and risk performance of the estimators
In Appendix E, we present the derivation of the expressions for asymptotic distributional mean square error (ADMSE), and consequentially the risk expressions of the respective estimators.
From the ADMSE and ADQR results in Appendix E, we clearly see that the risk of the classical unrestricted estimator does not depend on the sparsity assumption, so its risk takes the constant value \(\text {tr}(\varvec{W} \varvec{\Gamma }(\varvec{{\hat{\beta }}}^{\text {UM}}_1))\). On the other hand, the risk of the restricted estimator depends on the sparsity assumption: when the assumption is nearly correct, \(R(\varvec{{\hat{\beta }}}^{\text {RM}}_1)\le R(\varvec{{\hat{\beta }}}^{\text {UM}}_1)\), with strict inequality for some values in the parameter space induced by the sparsity parameter. Beyond this small interval of the parameter space, however, the unrestricted estimator dominates the restricted estimator. In fact, the risk of the restricted estimator is an unbounded function of the sparsity parameter, an undesirable property.
Interestingly, but not surprisingly, both shrinkage estimators are superior to the benchmark estimators over the entire parameter space. For a suitable choice of \(\varvec{W}\), it can be verified that \(R(\varvec{{\hat{\beta }}}^{\text {SM+}}_1) \le R(\varvec{{\hat{\beta }}}^{\text {SM}}_1) \le R(\varvec{{\hat{\beta }}}^{\text {UM}}_1)\), with strict inequality for some values in the parameter space. Thus, the shrinkage estimators dominate the classical M-estimator. Further, the shrinkage estimators outperform the restricted estimator except in a small interval where the sparsity assumption may hold. Thus, we recommend the use of the shrinkage estimators, as they are available in closed form and free from any tuning parameter.
6 Simulation studies
In this section, we conduct a simulation study to appraise the performance of the estimators in practical settings and to quantify their relative behavior. We perform Monte Carlo simulation experiments to examine the quadratic risk performance of the proposed estimators. We simulate the response from the following model:
where \(\beta _l\) is a \(p_1 \times 1\) vector and \(\beta _m\) is a \(p_2 \times 1\) vector of parameters with \(p=p_1+p_2\), and \(\varepsilon _i \sim N(0,1)\), \(i=1, \ldots , n\). Furthermore, \(x_{i1}=(\zeta ^{(1)}_{i1})^2+\zeta ^{(1)}_{i}+ \xi _{i1}\), \(x_{i2}=(\zeta ^{(1)}_{i2})^2+\zeta ^{(1)}_{i}+ 2\xi _{i2}\), and \(x_{is}=(\zeta ^{(1)}_{is})^2+\zeta ^{(1)}_{i}\) for all \(s=3,\ldots , p\), with \(\zeta ^{(1)}_{is}\sim N(0,1)\), \(\zeta ^{(1)}_{i}\sim N(0,1)\), \(\xi _{i1}\sim \) Bernoulli(0.35) and \(\xi _{i2}\sim \) Bernoulli(0.35).
We are interested in testing the sparsity assumption in the form of the statistical hypothesis \(H_0: (\beta _{p_1+1}, \beta _{p_1+2}, \ldots , \beta _{p_1+p_2})=\varvec{0}\). Our aim is to estimate \(\varvec{\beta }_1\) when the sparsity assumption may or may not be true. We partition the regression coefficients as \(\varvec{\beta }= (\varvec{\beta }_1^\top , \varvec{\beta }_2^\top )^\top \). Each realization was repeated 5000 times to obtain stable results. For each realization, we calculated the MSE of the estimators.
We define \(\Delta ^* = ||\varvec{\beta } - \varvec{\beta }^{(0)}||\), where \(\varvec{\beta }^{(0)}= (\varvec{\beta }_1^\top , \varvec{0}^\top )^\top \) and \(||\cdot ||\) is the Euclidean norm. In addition, \(\Delta ^*\) and \(S_n\) were estimated using the median absolute deviation (MAD). To examine the behavior of the estimators for \(\Delta ^* >0\), further data sets were generated from those distributions under the alternative hypothesis.
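For reproducibility, the covariate design described above can be generated as follows (a sketch under our reading of the design; the response and nonparametric component are omitted, and the function name is ours):

```python
import numpy as np

def simulate_design(n, p, rng=None):
    """Covariates per the simulation design: x_is = zeta_is^2 + zeta_i,
    with Bernoulli(0.35) shifts added to the first two columns."""
    rng = np.random.default_rng(rng)
    zeta = rng.normal(size=(n, p))          # zeta_is, iid N(0,1)
    zeta_common = rng.normal(size=n)        # shared zeta_i, N(0,1)
    X = zeta ** 2 + zeta_common[:, None]
    X[:, 0] += rng.binomial(1, 0.35, size=n)       # xi_i1
    X[:, 1] += 2 * rng.binomial(1, 0.35, size=n)   # 2 * xi_i2
    return X
```

The shared term \(\zeta ^{(1)}_{i}\) induces dependence across the columns, so the design is correlated rather than orthogonal.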
6.1 Error distributions
In an effort to evaluate the performance of the proposed estimators numerically, we perform a simulation study. We generate data from four different error distributions, namely the standard normal, contaminated normal, standard logistic, and standard Laplace distributions.
The cumulative distribution function
was used to generate the standard normal and contaminated normal errors, where \(\lambda \) is the parameter indicating whether the standard normal or its contaminated version is returned. We consider \(\lambda =0\) and \(\lambda =0.9\), respectively. Indeed, for \(\lambda =0\) we get the standard normal errors, while for \(\lambda =0.9\), with \(\omega ^2 \ne 1\), we obtain scale-contaminated normal errors.
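The contaminated normal errors can be drawn as a two-component scale mixture (a sketch; we read \(F\) as \((1-\lambda )\Phi (x) + \lambda \Phi (x/\omega )\), which reduces to the standard normal at \(\lambda =0\); the function name and default \(\omega \) are ours):

```python
import numpy as np

def contaminated_normal(n, lam=0.0, omega=3.0, rng=None):
    """Draws from the scale mixture (1-lam)*N(0,1) + lam*N(0, omega^2);
    lam=0 recovers standard normal errors."""
    rng = np.random.default_rng(rng)
    # with probability lam, inflate the scale to omega
    scale = np.where(rng.uniform(size=n) < lam, omega, 1.0)
    return scale * rng.normal(size=n)
```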
The standard logistic distribution has cdf
The standard Laplace distribution has cdf
6.2 Relative risk comparison
The risk performance of an estimator of \(\varvec{\beta }_1\) was measured by comparing its MSE with that of the unrestricted M-estimator. We numerically calculated the relative MSE (RMSE) of the proposed estimators \(\varvec{{\hat{\beta }}}^{\text {RM}}_1,\) \(\varvec{{\hat{\beta }}}^{\text {SM}}_1\), and \(\varvec{{\hat{\beta }}}^{\text {SM+}}_1\) relative to the unrestricted estimator \(\varvec{{\hat{\beta }}}^{\text {UM}}_1\), given by
where \(\hat{\varvec{\beta }}_1^\text {*}\) is one of the proposed estimators. The amount by which an RMSE exceeds unity indicates the degree of superiority of the estimator \(\hat{\varvec{\beta }}_1^\text {*}\) over \(\varvec{{\hat{\beta }}}^{\text {UM}}_1\); see also Fig. 1.
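In code, the simulated RMSE is simply a ratio of empirical risks over the Monte Carlo replications (a sketch; arrays are replications \(\times \, p_1\), and the function name is ours):

```python
import numpy as np

def rmse(beta_true, draws_um, draws_star):
    """Simulated relative MSE: MSE(UM) / MSE(candidate); values above 1
    favour the candidate estimator, as in the tables."""
    mse_um = np.mean(np.sum((draws_um - beta_true) ** 2, axis=1))
    mse_star = np.mean(np.sum((draws_star - beta_true) ** 2, axis=1))
    return mse_um / mse_star
```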
We compute the RMSE values for \(n=30, 50\) and various configurations of \((p_1, p_2)\) based on Huber's \(\rho \)-function. Our results are presented in Fig. 1 and Tables 1–4.
Figure 1 shows the RMSE values of the various M-estimators. Here, \(\Delta ^*\) indicates the correctness of the submodel under the sparsity assumption; thus, \(\Delta ^* > 0\) quantifies the degree of deviation from the assumed model. Figure 1 clearly shows that the restricted estimator is the best when \(\Delta ^*\) is close to the origin. However, the restricted estimator becomes inefficient, and its RMSE drops below 1 very quickly, as \(\Delta ^*\) moves away from zero. The RMSE of the restricted estimator is depicted by the dashed line in Fig. 1. In the simulation study, the restricted estimator shows similar behaviour for all the error distributions considered.
Tables 1–4 portray similar characteristics of the estimators. Both shrinkage estimators dominate the classical M-estimator, and the positive-rule shrinkage estimator (SM+) dominates the shrinkage estimator. For example, Table 1 presents the RMSEs for \((p_1, p_2) = (3, 5)\) and \(n=30\). For the standard normal error, the gain in risk for the positive-rule shrinkage M-estimator is 3.161 times that of the classical M-estimator, provided that the model specification is correct (i.e., \(\Delta ^*=0\)). For the same configuration, when the error distribution is the standard Laplace, the gain in risk for SM+ is 2.273 times that of the unrestricted estimator. Interestingly, for the larger-dimensional case \((p_1, p_2) = (5, 20)\) in Table 3, the gains are much higher, with values of 7.325 and 4.200, respectively, demonstrating the applicability and power of the Stein-rule estimators in high-dimensional cases.
In closing, our numerical results strongly corroborate the theoretical properties of the suggested estimators.
7 Concluding remarks
In this paper, shrinkage M-estimation strategies in the context of a partially linear regression model are developed. The statistical properties of the shrinkage and positive-rule shrinkage M-estimators are investigated when the sparsity assumption may or may not hold. The expressions for the bias and risk of the estimators are presented in closed form. The relative performance of the estimators is critically examined: the positive-rule shrinkage estimator is found to perform better than the unrestricted estimator. Further, it outshines the restricted estimator except in a small interval where the submodel at hand is assumed to be nearly the true model.
In the simulation study, we numerically compute the relative mean squared errors of the restricted, shrinkage, and positive-rule shrinkage M-estimators compared to the unrestricted M-estimator. Four different error distributions are considered to study the performance of the proposed estimators. Our numerical results also provide support for the positive-rule shrinkage estimator under varying degrees of model misidentification. The submodel (restricted) M-estimator outperforms all other estimators when sparsity holds. However, a small departure from this condition makes the restricted estimator very inefficient, questioning its applicability for practical purposes. We suggest using the positive-rule shrinkage M-estimator due to its performance over the entire parameter space.
More importantly, the performance of the positive-rule shrinkage M-estimator is most noticeable when \(p_2\) is large. This work can be extended to high-dimensional cases; we refer to Ahmed et al. (2023). We plan to study such extensions in a separate communication.
References
Ahmed, S. E. (2014). Penalty, Shrinkage and Pretest Strategies: Variable Selection and Estimation. New York, USA: Springer.
Ahmed, S. E., Ahmed, F., & Yüzbaşı, B. (2023). Post-Shrinkage Strategies in Statistical and Machine Learning for High Dimensional Data. Boca Raton, USA: CRC Press.
Ahmed, S. E., Doksum, K. A., Hossain, S., & You, J. (2007). Shrinkage, pretest and absolute penalty estimators in partially linear models. Australian & New Zealand Journal of Statistics, 49, 435–454.
Ahmed, S. E., & Fallahpour, S. (2012). Shrinkage estimation strategy in quasi-likelihood models. Statistics & Probability Letters, 82(12), 2170–2179.
Ahmed, S. E., Hussein, A. A., & Sen, P. K. (2006). Risk comparison of some shrinkage M-estimators in linear models. Nonparametric Statistics, 18, 401–415.
Ahmed, S. E., & Raheem, S. M. E. (2012). Shrinkage and absolute penalty estimation in linear regression models. Wiley Interdisciplinary Reviews: Computational Statistics, 4(6), 541–553.
Ahmed, S. E., & Yüzbaşı, B. (2016). Big data analytics: integrating penalty strategies. International Journal of Management Science and Engineering Management, 11(2), 105–115.
Arashi, M., Kibria, B. G., Norouzirad, M., & Nadarajah, S. (2014). Improved preliminary test and Stein-rule Liu estimators for the ill-conditioned elliptical linear regression model. Journal of Multivariate Analysis, 126, 53–74.
Bianco, A., & Boente, G. (2004). Robust estimators in semiparametric partly linear regression models. Journal of Statistical Planning and Inference, 122(1–2), 229–252.
Denby, L. (1986). Smooth regression functions. Statistical Research Report, 26. AT &T Bell Laboratories, Murray Hill.
Farcomeni, A., & Ventura, L. (2012). An overview of robust methods in medical research. Statistical Methods in Medical Research, 21(2), 111–133.
Jurečková, J., & Sen, P. K. (1996). Robust Statistical Procedures: Asymptotics and Interrelations. New York: Wiley.
Ma, T., Liu, S., & Ahmed, S. E. (2014). Shrinkage estimation for the mean of the inverse Gaussian population. Metrika, 77, 733–752.
Maruyama, Y., Kubokawa, T., & Strawderman, W. E. (2023). Stein Estimation. Singapore: Springer.
Norouzirad, M., & Arashi, M. (2019). Preliminary test and Stein-type shrinkage ridge estimators in robust regression. Statistical Papers, 60, 1849–1882.
Opoku, E. A., Ahmed, S. E., & Nathoo, F. S. (2021). Sparse estimation strategies in linear mixed effect models for high-dimensional data application. Entropy, 23(10), 1348.
Raheem, S. M. E., Ahmed, S. E., & Doksum, K. A. (2012). Absolute penalty and shrinkage estimation in partially linear models. Computational Statistics & Data Analysis, 56, 874–891.
Rather, K. U. I., Koçyiğit, E. G., Onyango, R., & Kadilar, C. (2022). Improved regression in ratio type estimators based on robust M-estimation. PLoS ONE, 17(12), e0278868.
Robinson, P. (1988). Root-n-consistent semiparametric regression. Econometrica, 56, 931–954.
Shih, J. H., Lin, T. Y., Jimichi, M., & Emura, T. (2021). Robust ridge M-estimators with pretest and Stein-rule shrinkage for an intercept term. Japanese Journal of Statistics and Data Science, 4, 107–150.
Speckman, P. (1988). Kernel smoothing in partial linear models. Journal of the Royal Statistical Society. Series. B, 50, 413–437.
Sun, R., Ma, T., & Liu, S. (2020). Portfolio selection: shrinking the time-varying inverse conditional covariance matrix. Statistical Papers, 61, 2583–2604.
Susanti, Y., Qona'ah, N., Ferawati, K., & Qumillaila, C. (2020). Prediction modeling of annual parasite incidence (API) of Malaria in Indonesia using robust regression of M-estimation and S-estimation. AIP Conference Proceedings, 2296, 020100. https://doi.org/10.1063/5.0037417
Zhou, P., Xie, J., Li, W., Wang, H., & Chai, T. (2020). Robust neural networks with random weights based on generalized M-estimation and PLS for imperfect industrial data modeling. Control Engineering Practice, 105, 104633.
Acknowledgements
We would like to express our sincere gratitude to the Reviewers and Editors for their constructive comments and valuable feedback, which greatly contributed to the enhancement of this manuscript. Their meticulous review and thoughtful suggestions played a pivotal role in improving the quality and clarity of our work. Furthermore, S. Ejaz Ahmed would like to thank the colleagues at the University of Canberra for their hospitality and support. The support provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) has been invaluable in conducting the research presented in this manuscript and is gratefully acknowledged.
Ethics declarations
Conflicts of interest
The authors declare no conflict of interest.
Appendices
Appendix
A. Regularity conditions
Here, we list the regularity conditions needed for the minimization problem in (3.2). Detailed discussions about these conditions can be found in Jurečková and Sen (1996, pp. 217–218).
For the studentized M-estimators, consider that \(\phi = \rho '\) can be decomposed as
$$\begin{aligned} \phi = \phi _1 + \phi _2 + \phi _3, \end{aligned}$$
where \(\phi _1\) is an absolutely continuous function with absolutely continuous derivative, \(\phi _2\) is a continuous piecewise linear function that is constant in a neighbourhood of \(\pm \infty \), and \(\phi _3\) is a non-decreasing step function.
The following conditions are imposed on (3.2).
-
RC1
\(S_n(Y)\) is regression invariant and scale equivariant, \(S_n >0\) a.s., and
$$\begin{aligned} \sqrt{n} (S_n - S) = O_p(1) \end{aligned}$$for some functional \(S= S(F) >0\).
-
RC2
The function \(h(t) = \int \rho ((z-t)/S) \textrm{d}F(z)\) has the unique minimum at \(t=0\).
-
RC3
For some \(\delta > 0\) and \(\eta > 1\),
$$\begin{aligned} { \int _{-\infty }^{\infty }} \left\{ |z| \sup _{|u| \le \delta } \sup _{|v| \le \delta } \bigg | \phi _1''\left( \frac{e^{-v}(z + u)}{S}\right) \bigg | \right\} ^\eta \textrm{d}F(z) < \infty \end{aligned}$$and
$$\begin{aligned} { \int _{-\infty }^{\infty }} \left\{ |z|^2 \sup _{|u| \le \delta } \bigg | \frac{\phi _1'' (z+u)}{S} \bigg | \right\} ^\eta \textrm{d}F(z) < \infty , \end{aligned}$$where \(\phi _1'(z) = \frac{d}{dz} \phi _1(z)\) and \(\phi _1''(z) = \frac{d^2}{dz^2}\phi _1(z)\).
-
RC4
\(\phi _2\) is a continuous, piecewise linear function with knots at \(\mu _1, \dots , \mu _k\), which is constant in a neighbourhood of \(\pm \infty \). Hence the derivative \(\phi _2'\) of \(\phi _2\) is a step function
$$\begin{aligned} \phi _2'(z) = \alpha _\nu \quad \text{ for } \mu _\nu< z < \mu _{\nu +1}, \nu = 0, 1, \dots , k, \end{aligned}$$where \(\alpha _0, \alpha _1, \dots , \alpha _k \in {\mathfrak {R}}_1\), \(\alpha _0 = \alpha _k = 0\) and \(-\infty = \mu _0< \mu _1< \dots< \mu _k < \mu _{k+1} = \infty \). Further, we assume that \(f(z) = \frac{\textrm{d}F(z)}{\textrm{d}z}\) is bounded in a neighbourhood of \(S_{\mu _j}, j = 1, 2, \dots , k\).
-
RC5
\(\phi _3(z) = \lambda _{\nu }\) for \(q_\nu < z \le q_{\nu +1}\), \(\nu = 0, 1, \dots , m\), where \(-\infty = q_0< q_1< \dots< q_m < q_{m+1}= \infty \) and \(-\infty< \lambda _0< \lambda _1< \dots< \lambda _m < \infty \). We further assume that \(f'(z)\) and \(f''(z)\) are bounded in a neighbourhood of \(S_{q_j}, j = 1, 2, \dots , m\).
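As a concrete illustration of the decomposition above (our own toy example, not taken from the paper), the following sketch builds a score \(\phi = \phi _1 + \phi _2 + \phi _3\) from a smooth part, a Huber-type piecewise linear part, and a step part, and checks the qualitative properties the conditions require:

```python
import numpy as np

# Illustrative decomposition phi = phi_1 + phi_2 + phi_3 (a toy example):
#   phi_1: absolutely continuous with absolutely continuous derivative (tanh)
#   phi_2: Huber's psi -- continuous, piecewise linear, constant near +/- inf
#   phi_3: a non-decreasing step function (a scaled sign function)

def phi1(z):               # smooth component
    return np.tanh(z)

def phi2(z, c=1.345):      # Huber psi: linear on [-c, c], constant beyond
    return np.clip(z, -c, c)

def phi3(z):               # non-decreasing step function with a knot at 0
    return 0.5 * np.sign(z)

def phi(z):
    return phi1(z) + phi2(z) + phi3(z)

z = np.linspace(-5, 5, 11)
# phi2 is constant in a neighbourhood of +/- infinity:
assert phi2(4.0) == phi2(5.0) == 1.345
# phi3 is non-decreasing:
assert np.all(np.diff(phi3(z)) >= 0)
# phi is odd, as each component is odd:
assert np.allclose(phi(z), -phi(-z))
```

The decomposition is not unique; the point is only that each component can be handled by a different asymptotic argument (smoothness for \(\phi _1\), piecewise linearity for \(\phi _2\), monotonicity for \(\phi _3\)).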
B. Assumptions
Assumption B.1
The function \(g(\cdot )\) satisfies the Lipschitz condition of order 1 on [0, 1].
Assumption B.2
The probability weight functions \(W_{ni}(\cdot )\) satisfy
-
a)
\(\max _{1\le i\le n}\sum _{j=1}^{n} W_{ni}(t_j) = {\mathcal {O}}(1)\),
-
b)
\(\max _{1\le i, \, j \le n} W_{ni}(t_j) = {\mathcal {O}}(n^{-2/3})\),
-
c)
\(\max _{1\le j\le n}\sum _{i=1}^{n} W_{ni}(t_j) I(|t_i - t_j| > c_n) = {\mathcal {O}}(d_n)\), where I is the indicator function, \(c_n\) satisfies \(\limsup _{n\rightarrow \infty } nc^{3}_n < \infty \), and \(d_n\) satisfies \(\limsup _{n \rightarrow \infty } n d^3_n < \infty \).
Remark 1
The usual polynomial and trigonometric functions satisfy Assumption B.1.
Remark 2
Under regular conditions, the Nadaraya-Watson kernel weights, Priestley and Chao kernel weights, locally linear weights and Gasser-Müller kernel weights satisfy Assumption B.2. If we take the pdf of \(U[-1, 1]\) as the kernel function, i.e.,
$$\begin{aligned} K(u) = \frac{1}{2} I(|u| \le 1), \end{aligned}$$
with \(t_i = \frac{i}{n}\) and bandwidth \(h_n = cn^{-1/3}\), where c is a constant, then the Priestley and Chao kernel weights satisfy Assumption B.2, and the weights are
$$\begin{aligned} W_{ni}(t) = \frac{1}{nh_n} K\left( \frac{t_i - t}{h_n}\right) . \end{aligned}$$
For a detailed discussion on the assumptions above, see Ahmed et al. (2007).
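The order conditions in Assumption B.2 can be checked numerically for the uniform-kernel Priestley–Chao weights of Remark 2. A minimal sketch, assuming \(t_i = i/n\), \(K(u) = \frac{1}{2} I(|u| \le 1)\) and \(h_n = cn^{-1/3}\) (the function and variable names are ours):

```python
import numpy as np

# Numerical check that Priestley-Chao weights with the uniform kernel,
# design t_i = i/n and bandwidth h_n = c * n^(-1/3) behave as Assumption B.2
# requires: row sums bounded, O(1), and individual weights O(n^(-2/3)).

def pc_weights(n, c=1.0):
    t = np.arange(1, n + 1) / n
    h = c * n ** (-1.0 / 3.0)
    u = (t[:, None] - t[None, :]) / h      # (t_i - t_j) / h_n
    K = 0.5 * (np.abs(u) <= 1.0)           # uniform kernel on [-1, 1]
    return K / (n * h)                     # W_{ni}(t_j)

for n in (100, 500, 2000):
    W = pc_weights(n)
    # row sums stay bounded as n grows ...
    assert W.sum(axis=1).max() < 1.5
    # ... and the largest single weight equals 0.5 * n^(-2/3)
    assert abs(W.max() - 0.5 * n ** (-2.0 / 3.0)) < 1e-12
```

With this bandwidth each weight is exactly \(\tfrac{1}{2}n^{-2/3}/c\) on its support, which is what makes condition b) hold with equality up to the constant.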
1.1 B.1 Consistency and Asymptotic Normality
We now denote by \((R(T), \varvec{U}(T)^\top )^\top \) a random vector with the same distribution as \((r_i, \varvec{u}_i^\top )^\top \).
Consistency of the regression parameters in a semi-parametric model has been proved in great detail in Bianco and Boente (2004). We omit the details but present only the set of assumptions, lemma, and theorem which are needed for proving asymptotic normality and consistency of the estimators.
Let \({\tilde{\rho }}\) and \({\widetilde{W}}\) be score and weight functions, respectively. The estimator \(\hat{\varvec{\beta }}\) is defined as a solution of
$$\begin{aligned} \sum _{i=1}^{n} {\tilde{\rho }} \left( \frac{{\hat{r}}_i - \varvec{u}_i^\top \varvec{\beta }}{s_n}\right) {\widetilde{W}} (||\varvec{u}_i||)\, \varvec{u}_i = \varvec{0}, \end{aligned}$$
where \({\hat{r}}_i = y_i -\hat{\gamma }_0(t_i)\), \(\varvec{u}_i = \varvec{x}_i - \hat{\varvec{\gamma }}(t_i)\), and \(s_n\) is an estimate of the residual scale.
To derive the asymptotic distribution of \(\hat{\varvec{\beta }}\), we must have \(t_i\) in a compact set; without loss of generality, we assume that \(t_i \in [0,1]\). We need the following set of assumptions; see Bianco and Boente (2004) for details.
-
A1
\({\tilde{\rho }}\) is odd, bounded, continuous, and twice differentiable with bounded derivatives \({\tilde{\rho }}'\) and \({\tilde{\rho }}''\), such that \(\phi _1(t) = t\tilde{\rho }'(t)\) and \(\phi _2(t) = t\tilde{\rho }''(t)\) are bounded.
-
A2
\(E({\tilde{W}}(||\varvec{U}(T)||) ||\varvec{U}(T)||^2) < \infty \) and the matrix
$$\begin{aligned} \varvec{A} = E \left( {\tilde{\rho }}' \left( \frac{R(T) - \varvec{U}(T)^\top \varvec{\beta }}{\sigma }\right) {\widetilde{W}} (||\varvec{U}(T)||) \varvec{U}(T)\varvec{U}(T)^\top \right) \end{aligned}$$is nonsingular.
-
A3
\(\widetilde{W}(u) = {\tilde{\rho }}_1(u)u^{-1} >0\) is a bounded function which satisfies the Lipschitz condition of order 1. Further, \({\tilde{\rho }}_1\) is bounded with bounded derivative.
-
A4
\(E(\widetilde{W} (||\varvec{U}(T)||)\varvec{U}(T)|T=t) =0\) for almost all t.
-
A5
The functions \(\varvec{x}_j(t), 0 \le j \le p\) are continuous in [0, 1] with continuous first derivative.
Remark 3
According to Robinson (1988), condition A2 is needed so that no element of \(\varvec{X}\) can be perfectly predicted from T. A2 guarantees that there is no multicollinearity among the columns of \(\varvec{X}- {\tilde{\varvec{X}}}_j(T)\); in other words, \(\varvec{X}\) has to be free from multicollinearity. Condition A5 is a standard requirement in kernel estimation for semi-parametric models, needed to guarantee asymptotic normality.
Lemma B.1
Let \((y_i, \varvec{x}_i^\top , t_i)^\top , 1 \le i \le n\) be independent random vectors satisfying (2.1) with \(e_i\) independent of \((\varvec{x}_i^\top , t_i)^\top \). Assume that the \(t_i\) are random variables with \(t_i \in [0, 1]\). Denote by \((R(T), \varvec{U}(T)^\top )^\top \) a random vector with the same distribution as \((r_i, \varvec{u}_i^\top )^\top \).
Further, let \(\hat{\varvec{\gamma }}_j(t_i), \, 0 \le j \le p\) be estimates of \(\gamma _j(t_i)\) satisfying the uniform consistency conditions of Bianco and Boente (2004).
If \({\tilde{\varvec{\beta }}} {\mathop {\longrightarrow }\limits ^{p}} \varvec{\beta }\) and \(s_n {\mathop {\longrightarrow }\limits ^{p}} \sigma \), then under assumptions A1-A3, \(\varvec{A}_n {\mathop {\longrightarrow }\limits ^{p}} \varvec{A}\), where \(\varvec{A}\) is defined in A2 and \({\mathop {\longrightarrow }\limits ^{p}}\) denotes convergence in probability.
Proof
The proof is available in the appendix of Bianco and Boente (2004).
Theorem B.1
Let \((y_i, \varvec{x}_i^\top , t_i)^\top , 1 \le i \le n\) be independent random vectors satisfying (2.1) with \(e_i\) independent of \((\varvec{x}_i^\top , t_i)^\top \). Assume that the \(t_i\) are random variables with \(t_i \in [0, 1]\). Denote by \((R(T), \varvec{U}(T)^\top )^\top \) a random vector with the same distribution as \((r_i, \varvec{u}_i^\top )^\top \).
Further, let \(\hat{\gamma }_j(t),\, 0 \le j \le p\) be estimates of \(\gamma _j(t)\) such that the first derivative of \(\hat{\gamma }_j(t)\) exists and is continuous, and the \(\hat{\gamma }_j\) satisfy the uniform consistency conditions of Bianco and Boente (2004).
Then, if \(s_n {\mathop {\longrightarrow }\limits ^{p}} \sigma \), under A1-A5,
$$\begin{aligned} \sqrt{n}\, (\hat{\varvec{\beta }} - \varvec{\beta }) {\mathop {\longrightarrow }\limits ^{d}} \mathcal {N}_p(\varvec{0}, \sigma ^2 \varvec{Q}), \end{aligned}$$
with \(\varvec{Q} = \varvec{A}^{-1}\varvec{\Sigma } (\varvec{A}^{-1})^\top \), where \(\varvec{A}\) is defined in A2 and
$$\begin{aligned} \varvec{\Sigma } = E \left( {\tilde{\rho }}^2 \left( \frac{R(T) - \varvec{U}(T)^\top \varvec{\beta }}{\sigma }\right) {\widetilde{W}}^2 (||\varvec{U}(T)||)\, \varvec{U}(T)\varvec{U}(T)^\top \right) . \end{aligned}$$
Proof
The proof is available in Bianco and Boente (2004).
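The two-step construction behind Lemma B.1 and Theorem B.1 — smooth out \(t\), then M-estimate \(\varvec{\beta }\) from the partial residuals — can be sketched numerically. This is a simplified illustration with standard substitutes of our choosing (Nadaraya-Watson smoothing, Huber score via IRLS, MAD scale), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- simulate a partially linear model y = x'beta + g(t) + e ---------------
n, p = 500, 3
beta_true = np.array([2.0, 0.0, -1.0])
t = rng.uniform(0, 1, n)
x = rng.normal(size=(n, p)) + np.sin(2 * np.pi * t)[:, None]   # x depends on t
y = x @ beta_true + np.cos(2 * np.pi * t) + rng.standard_t(3, n)  # heavy tails

# --- step 1: kernel-smooth y and x on t (Nadaraya-Watson) ------------------
def nw_smooth(v, t, h=0.05):
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    W = K / K.sum(axis=1, keepdims=True)
    return W @ v

r_hat = y - nw_smooth(y, t)     # partial residuals of y:  r_i - gamma_0(t_i)
u = x - nw_smooth(x, t)         # partial residuals of x:  x_i - gamma(t_i)

# --- step 2: Huber M-estimation of beta via IRLS ---------------------------
def huber_irls(u, r, c=1.345, n_iter=50):
    beta = np.linalg.lstsq(u, r, rcond=None)[0]
    for _ in range(n_iter):
        res = r - u @ beta
        s = np.median(np.abs(res)) / 0.6745          # robust (MAD) scale
        w = np.clip(c * s / np.maximum(np.abs(res), 1e-12), None, 1.0)
        sw = np.sqrt(w)                              # sqrt-weights for WLS
        beta = np.linalg.lstsq(u * sw[:, None], r * sw, rcond=None)[0]
    return beta

beta_hat = huber_irls(u, r_hat)
assert np.max(np.abs(beta_hat - beta_true)) < 0.3    # recovers beta closely
```

Despite the \(t_3\) errors, the Huber step keeps the estimate close to \(\varvec{\beta }\), illustrating the robustness that Theorem B.1's normality result quantifies asymptotically.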
C. Proof for Theorem 5.1
For Theorem 5.1, we derive \(\varvec{\Sigma }_{12}\) as follows:
where
Therefore,
and
D. Proof for Theorem 5.2
We present our proof as follows: Obviously, ADB\((\varvec{{\hat{\beta }}}^{\text {UM}}_1)=0\) and
where \(I(\cdot )\) denotes an indicator function.
E. Derivation of Asymptotic Risk of the Estimators
Let us denote the ADMSE by \(\varvec{\Gamma }\), and then the expressions are listed as follows:
Proof
Now
By substituting \(E\{\psi _n^{-1}\eta _2\eta _2^\top \}\) in (A), we get
Using the law of iterated expectations, we obtain
Substituting the above in (B), we get
Using the definition in (4.4), we have the ADQR expressions as follows:
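The shrinkage estimators whose ADB and ADQR are derived above combine the full-model and submodel estimates through a data-driven weight. A minimal generic sketch of the usual Stein-rule and positive-part recipe follows; the function, the form of \(T_n\), and all names here are our assumptions, not reproduced from the paper's equations:

```python
import numpy as np

# Generic Stein-rule and positive-part shrinkage of a full-model estimate
# toward a submodel (sparsity-restricted) estimate. T_n is a test statistic
# for the restriction and p2 the number of restricted coefficients (p2 >= 3).

def shrinkage(beta_full, beta_sub, T_n, p2):
    shrink = (p2 - 2) / T_n
    beta_s  = beta_full - shrink * (beta_full - beta_sub)            # Stein-rule
    beta_ps = beta_full - min(shrink, 1.0) * (beta_full - beta_sub)  # positive part
    return beta_s, beta_ps

beta_full = np.array([1.0, 2.0, 0.3, -0.2, 0.1])
beta_sub  = np.array([1.0, 2.0, 0.0,  0.0, 0.0])   # sparsity-restricted fit

# large T_n (restriction doubtful): stay close to the full model
beta_s, beta_ps = shrinkage(beta_full, beta_sub, T_n=10.0, p2=5)
assert np.allclose(beta_s, beta_full - 0.3 * (beta_full - beta_sub))

# small T_n (restriction plausible): Stein-rule over-shoots past the
# submodel, while the positive-part version stops at it
beta_s2, beta_ps2 = shrinkage(beta_full, beta_sub, T_n=1.0, p2=5)
assert np.allclose(beta_ps2, beta_sub)
```

The second case shows why the positive-part truncation matters: it prevents the over-shrinking that makes the plain Stein-rule estimator inadmissible, which is exactly what the ADQR comparison above formalizes.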
About this article
Cite this article
Raheem, E., Ahmed, S.E. & Liu, S. Stein-rule M-estimation in sparse partially linear models. Jpn J Stat Data Sci 7, 507–535 (2024). https://doi.org/10.1007/s42081-023-00231-0