1 Introduction

Consider the following partially linear single-index model (PLSIM):

$$\begin{aligned} Y = \beta ^{\top }X + g(\alpha ^{\top }Z) + \epsilon , \end{aligned}$$
(1.1)

where Y is the scalar response, \(W=(X^{\top },Z^{\top })^{\top }\in {\mathbb {R}}^{p_1+p_2}\) is the \((p_1+p_2)\)-dimensional covariate, \(g(\cdot )\) is an unknown smooth function, \(\beta \) and \(\alpha \) are unknown parameter vectors, and \(\epsilon \) is the error term such that \(E(\epsilon |X, Z)=0\). For identifiability, \(\alpha \) satisfies \(||\alpha ||=1\). Model (1.1) is a reasonable compromise between fully parametric and fully nonparametric modelling. The literature on PLSIM estimation is enormous. For example, Carroll et al. (1997) first proposed a maximum quasi-likelihood method to estimate the generalised PLSIM, Xia and Härdle (2006) extended the minimum average variance estimation (MAVE) approach developed by Xia et al. (2002), Zhu and Xue (2006) proposed empirical-likelihood-based inference for the PLSIM, Wang et al. (2010) developed a two-step method to estimate the model, Liang et al. (2010) proposed semi-parametrically efficient profile least-squares estimators of the coefficients, and Lu et al. (2019) devised a method to consistently estimate the biased PLSIM. Zhao et al. (2020) considered PLSIMs for panel data with errors correlated in space and time and proposed a generalised F-type test to check the index parameters. However, it is not always easy to determine whether a real data set corresponds to a given statistical formalisation. Therefore, it is crucial to perform suitable and efficient model checking before further statistical analysis.

Many efforts have been devoted to checking parametric models since the 1980s. The two most popular methodologies in the literature are local and global smoothing methods. Local smoothing methods are sensitive to high-frequency/oscillating alternative models in low-dimensional cases. However, they suffer from slow convergence rates due to nonparametric estimation and thus are greatly affected by the curse of dimensionality. For examples of local smoothing methods based on nonparametric estimation, see Härdle and Mammen (1993), Zheng (1996), Fan and Li (1996), Dette (1999), Fan et al. (2001), Fan and Huang (2001), Koul and Ni (2004) and Van Keilegom et al. (2008).

Global smoothing methods involve empirical process-based tests that are typically functions of averages of weighted sums of residuals. These tests converge to their weak limits at the rate of \(n^{-1/2}\) and can detect local alternatives that converge to the null hypothesis at the fastest possible rate of \(n^{-1/2}\). Thus, they have theoretical advantages over local smoothing tests. However, when the dimension is greater than 1, the intractability of the limiting null distributions requires a resampling approximation, such as the wild bootstrap, to determine critical values. Practical evidence shows that these types of tests are less sensitive to oscillating alternative models. For examples of these methods, see Stute (1997), Stute et al. (1998), Zhu (2003), Khmaladze and Koul (2004), and Stute et al. (2008). González Manteiga and Crujeiras (2013) provided a comprehensive review of the studies in this domain.

A direct way to alleviate the curse of dimensionality is to project the high-dimensional covariates onto one-dimensional spaces. Escanciano (2006) and Lavergne and Patilea (2008, 2012) proposed tests based on projected covariates. Zhu (2003) and Stute et al. (2008) used residual processes to construct tests that can also be considered dimension reduction types. These tests typically require Monte Carlo approximations to determine critical values (e.g. Escanciano 2006; Lavergne and Patilea 2008), although some of them, such as that of Lavergne and Patilea (2012), are asymptotically distribution-free. All of these tests use one-dimensional projections to overcome the curse of dimensionality. However, the computation of the test statistics is a serious burden, and the computations become even more elaborate if a re-sampling approximation such as the bootstrap is further needed to determine critical values. Recently, Guo et al. (2016) developed an innovative model-adaptive local smoothing methodology to test the specification of parametric single-index models, thus alleviating the dimensionality problem. Zhu et al. (2017), Tan et al. (2018), Tan and Zhu (2019), Zhu et al. (2021) and Zhu et al. (2022) extended this strategy to other parametric dimension reduction models. Li et al. (2021) proposed an adaptive-to-model hybrid of tests for parametric regression models, which fully inherits the merits of nonparametric estimation-based tests and empirical process-based tests while avoiding their shortcomings. However, few studies have considered model checking for PLSIMs based on dimension reduction. In this paper, we construct a groupwise dimension reduction-based adaptive-to-model test for PLSIMs to mitigate the curse of dimensionality. The groupwise dimension reduction method ensures that the proposed test statistic is automatically adaptive to the underlying model under the respective null and alternative hypotheses; it can thus alleviate the dimensionality problem and simultaneously achieve the omnibus property under the alternative hypothesis. Under the null hypothesis, in probability, the test statistic involves only the two one-dimensional projections \(\beta ^{\top }X\) and \(\alpha ^{\top }Z\). The test statistic converges to its limiting null distribution at the faster rate \(O_p(nh)\), where h is a bandwidth converging to zero at a certain rate. Moreover, by fully utilising the information of the low-dimensional null model, the proposed test can detect local alternatives distinct from the null hypothesis at the rate \(O_p(n^{-1/2}h^{-1/2})\), which is faster than the rate \(O_p(n^{-1/2}h^{-(p_{1}+p_{2})/4})\) of existing local smoothing tests for parametric models, such as the tests developed by Fan and Li (1996), where \(p_1+p_2\) is the dimension of the complete covariate set \((X, Z)\). Therefore, when the dimensions of X and Z are large, the new test is superior to existing local smoothing tests in terms of significance level maintenance and power enhancement.

The rest of the paper is organised as follows. In Sect. 2, we construct the test statistic. Because groupwise sufficient dimension reduction techniques play a crucial role in the proposed test, we also review groupwise least-squares (GLS) estimation. Section 3 discusses the asymptotic properties of the new test. Section 4 includes simulation studies and analyses of two real data sets. The regularity conditions and all proofs of the theoretical results are presented in the Appendix.

2 Model-adaptive test construction

2.1 Basic construction

As we often have no knowledge of the model structure under the alternative hypothesis, the general alternative hypothesis takes the following form:

$$\begin{aligned} Y = m(X,Z) + \epsilon , \end{aligned}$$
(2.1)

where \(m(X,Z)=E(Y|X,Z)\) and \(m(\cdot ,\cdot )\) is an unknown smooth function. For any \(p_1\times p_1\) orthogonal matrix \(B_1\) and \(p_2\times p_2\) orthogonal matrix \(B_2\), we have \(m(X,Z)=m(B_1B^{\top }_1X,B_2B^{\top }_2Z)=:{\tilde{m}}(B^{\top }_1X,B^{\top }_2Z)\) with \({\tilde{m}}(\cdot ,\cdot )=m(B_1\cdot ,B_2\cdot )\); because the function m is unknown, no generality is lost. Therefore, any purely nonparametric regression model (2.1) can be reformulated as the groupwise dimension reduction model:

$$\begin{aligned} Y = m(B^{\top }_1X,B^{\top }_2Z) + \epsilon , \end{aligned}$$
(2.2)

where \(B_1\) is a \(p_1 \times q_1\) matrix with \(q_1\) orthogonal columns, \(B_2\) is a \(p_2 \times q_2\) matrix with \(q_2\) orthogonal columns, and \(q_1\) and \(q_2\) are unknown integers such that \(1 \le q_1 \le p_1\) and \(1 \le q_2 \le p_2\). For identifiability, we assume that the matrices \(B_1\) and \(B_2\) satisfy \(B^{\top }_1B_1=I_{q_1}\) and \(B^{\top }_2B_2=I_{q_2}\). Based on this observation, we consider the alternative model (2.2), which covers more model structures and is widely used in the sufficient dimension reduction field. We reformulate the hypotheses as

$$\begin{aligned}&H_0:\ E(Y|X, Z) = \beta ^{\top }X + g(\alpha ^{\top }Z)\ {\text { for some }} \beta \in {\mathbb {R}}^{p_1}, \alpha \in {\mathbb {R}}^{p_2},\\&H_1:\ E(Y|X, Z) = m(B^{\top }_1X,B^{\top }_2Z) \ne \beta ^{\top }X + g(\alpha ^{\top }Z) \ \mathrm{{for\ any\ }} \beta \in {\mathbb {R}}^{p_1}, \alpha \in {\mathbb {R}}^{p_2}. \end{aligned}$$

The null and alternative models can then be unified. Under the null hypothesis, \(q_1=1\), \(q_2=1\), \(B_1=\beta /||\beta ||_2\) and \(B_2=\alpha \), where \(||A||_2\) denotes the \(L_2\)-norm of a vector A throughout this paper. Under the alternative hypothesis, \(q_1\ge 1\) and \(q_2\ge 1\). Therefore, we can construct a test that is automatically adaptive to the null and alternative models by consistently estimating \(q_1\), \(q_2\), \(B_1\) and \(B_2\) under the null and the alternative models, respectively.

Let \(\epsilon =Y-\beta ^{\top }X-g(\alpha ^{\top }Z)\). Under the null hypothesis \(H_0\), \(B_1=\kappa \beta \) with \(\kappa =1/||\beta ||_2\), and \(B_2=\alpha \). Thus, we have

$$\begin{aligned} E(\epsilon |X, Z)=0 \Rightarrow E(\epsilon |X, Z) =E(\epsilon | B^{\top }_1X,B^{\top }_2Z)= 0. \end{aligned}$$

It follows that

$$\begin{aligned}&E(\epsilon E(\epsilon |B^{\top }_1X,B^{\top }_2Z)f(B^{\top }_1X,B^{\top }_2Z))=E(E^2(\epsilon |B^{\top }_1X,B^{\top }_2Z)f(B^{\top }_1X,B^{\top }_2Z))=0, \end{aligned}$$

where \(f(\cdot )\) denotes the density function of \((B^{\top }_1X,B^{\top }_2Z)\). Under the alternative hypothesis \(H_1\), as

$$\begin{aligned} E(\epsilon |X, Z)=E(\epsilon |B^{\top }_1X,B^{\top }_2Z)=m(B^{\top }_1X,B^{\top }_2Z)-\beta ^{\top }X - g(\alpha ^{\top }Z)\ne 0, \end{aligned}$$

we have

$$\begin{aligned} E\left[ \epsilon E(\epsilon |B^{\top }_1X,B^{\top }_2Z)f(B^{\top }_1X,B^{\top }_2Z)\right] =E\left[ E^2(\epsilon |B^{\top }_1X,B^{\top }_2Z)f(B^{\top }_1X,B^{\top }_2Z)\right] >0. \end{aligned}$$
(2.3)

The above argument implies that we can construct a consistent test based on the left-hand side of (2.3). The null hypothesis \(H_0\) is rejected for large values of the test statistic.

Let \(\{(x_1,z_1,y_1), \cdots , (x_n,z_n,y_n)\}\) denote independent and identically distributed (i.i.d.) samples. We estimate \(E(\epsilon |B^{\top }_1X,B^{\top }_2Z)\) using the kernel estimate:

$$\begin{aligned}&{{\hat{E}}}({\hat{\epsilon }}_i|{B_{1n}}^\top x_i,{B_{2n}}^\top z_i )=\frac{\frac{1}{n-1}\sum _{j\ne i}^nK_{h}(B^{\top }_{1n} x_j-B^{\top }_{1n} x_i,B^{\top }_{2n} z_j-B^{\top }_{2n} z_i){\hat{\epsilon }}_j}{\frac{1}{n-1}\sum _{j\ne i}^n K_{h}(B^{\top }_{1n} x_j-B^{\top }_{1n} x_i,B^{\top }_{2n} z_j-B^{\top }_{2n} z_i)}, \end{aligned}$$

where \(K_{h}=K(\cdot /h)/h^{{\hat{q}}_1+{\hat{q}}_2}\) with \(K(\cdot )\) being a \(({\hat{q}}_1+{\hat{q}}_2)\)-dimensional product kernel function with univariate kernel \(k(\cdot )\) and h being a bandwidth, and \(B_{1n}\) and \(B_{2n}\) are estimates of \(B_1\) and \(B_2\) with estimated structural dimensions \({\hat{q}}_1\) and \({\hat{q}}_2\), respectively, which we discuss in the next subsection. Here, \({\hat{\epsilon }}_i\) denotes the estimate of the residual term \(\epsilon _i\), namely \({\hat{\epsilon }}_i = y_i-{\hat{\beta }}^{\top }x_i-{\hat{g}}({\hat{\alpha }}^{\top } z_i)\), where \({\hat{\beta }}\), \({\hat{\alpha }}\) and \({\hat{g}}(\cdot )\) denote the estimators of \(\beta \), \(\alpha \) and \(g(\cdot )\), respectively. More details are presented in the Appendix.

The density function \(f({B}^{\top }_{1n} x_i,{B}^{\top }_{2n} z_i)\) for \(i=1,2,...,n\), can be estimated by the following kernel form:

$$\begin{aligned} {\hat{f}}_{i}=\frac{1}{n-1}\sum _{j\ne i}^n K_{h}({B}^{\top }_{1n}x_j-{B}^{\top }_{1n} x_i,{B}^{\top }_{2n} z_j-{B}^{\top }_{2n} z_i). \end{aligned}$$

Therefore, a non-standardised test statistic is defined by

$$\begin{aligned} S_{n}= & {} \frac{1}{n(n-1)}\sum _{i=1}^n\sum _{j\ne i}^n {\hat{\epsilon }}_i{\hat{\epsilon }}_j K_{h}({B}^{\top }_{1n} x_j-{B}^{\top }_{1n}x_i,{B}^{\top }_{2n}z_j-{B}^{\top }_{2n}z_i). \end{aligned}$$
(2.4)
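To make the construction concrete, the following Python sketch computes \(S_n\) in (2.4) with a product Gaussian kernel. It is a minimal illustration assuming the residuals \({\hat{\epsilon }}_i\) and the projected covariates \((B^{\top }_{1n}x_i, B^{\top }_{2n}z_i)\) have already been obtained; the function and variable names are ours, not part of the original procedure.

```python
import numpy as np

def product_gaussian_kernel(u, h):
    # K_h(u) = prod_k phi(u_k / h) / h^d for a d-dimensional argument u
    d = u.shape[-1]
    return np.exp(-0.5 * np.sum((u / h) ** 2, axis=-1)) / ((np.sqrt(2.0 * np.pi) * h) ** d)

def S_n(eps_hat, proj, h):
    """Non-standardised statistic (2.4).

    eps_hat : (n,) residuals from the fitted PLSIM
    proj    : (n, q1_hat + q2_hat) array with rows (B_1n^T x_i, B_2n^T z_i)
    h       : bandwidth
    """
    n = eps_hat.shape[0]
    K = product_gaussian_kernel(proj[:, None, :] - proj[None, :, :], h)
    np.fill_diagonal(K, 0.0)                      # leave-one-out sum over j != i
    return eps_hat @ K @ eps_hat / (n * (n - 1))
```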

Remark 2.1

The nonparametric kernel-based test in Zheng (1996) can also be extended to check the PLSIM, and its test statistic can be defined as:

$$\begin{aligned} S_{ZHn} = \frac{1}{n(n-1)} \sum _{i=1}^{n} \sum _{\begin{array}{c} j=1 \\ j \ne i \end{array}}^{n} \frac{1}{h^{p_1+p_2}} {\tilde{K}}\left( \frac{x_{i}-x_{j}}{h}, \frac{z_{i}-z_{j}}{h} \right) \hat{\epsilon _{i}} \hat{\epsilon _{j}}, \end{aligned}$$
(2.5)

where \({\tilde{K}}_{h}={\tilde{K}}(\cdot /h,\cdot /h)/h^{p_1+p_2}\) with \({\tilde{K}}(\cdot ,\cdot )\) being a \((p_1+p_2)\)-dimensional kernel function. There are two differences between formulae (2.4) and (2.5). First, we use \(K_h(\cdot )\) in \(S_n\) instead of \({\tilde{K}}_h(\cdot ,\cdot )\) in \(S_{ZHn}\). We prove that under \(H_0\), \({\hat{q}}_1={\hat{q}}_2= 1\), \({B}_{1n}\rightarrow \beta /||\beta ||_2\) and \({B}_{2n}\rightarrow \alpha /||\alpha ||_2\) in probability, implying that our test reduces the dimension from \(p_1+p_2\) to 2. Under the alternative hypothesis, we show that \(B_{1n}\) and \(B_{2n}\) are automatically consistent estimates of \(B_1\) and \(B_2\). Second, it can easily be shown that for the test statistic \(S_{ZHn}\) to have a finite limiting distribution under the null hypothesis, the standardising constant must be \(nh^{(p_1+p_2)/2}\), yielding \(nh^{(p_1+p_2)/2}S_{ZHn}\). We will see that the standardised form \(nhS_n\) has a finite limit under \(H_0\) and diverges to infinity much faster under \(H_1\) than the typical rate \(nh^{(p_1+p_2)/2}\) of the test in Zheng (1996). The results are presented in Sect. 3.

2.2 A review of groupwise least squares estimation

In general, the matrices \(B_1\) and \(B_2\) are not identifiable. For any \(q_1 \times q_1\) orthogonal matrix \(C_1\) and \(q_2 \times q_2\) orthogonal matrix \(C_2\), the \(\sigma \)-field generated by the random vector \((B^{\top }_1X,B^{\top }_2Z)\) is equal to that generated by \(({\tilde{B}}^{\top }_1X,{\tilde{B}}^{\top }_2Z)\) with \({\tilde{B}}_1=B_1C_1\) and \({\tilde{B}}_2=B_2C_2\), and hence we have

$$\begin{aligned} E(\epsilon |B^{\top }_1X,B^{\top }_2Z)=E(\epsilon |{\tilde{B}}^{\top }_1X,{\tilde{B}}^{\top }_2Z). \end{aligned}$$

However, identifying the spaces spanned by \(B_1\) and \(B_2\) is sufficient for constructing the adaptive-to-model test in Sect. 2.1. Groupwise dimension reduction (Li 2009) can be used to identify the subspaces spanned by the column vectors of the matrices \(B_1\) and \(B_2\).

There are several groupwise dimension reduction approaches available in the literature. Li (2009) proposed a framework for grouped sufficient dimension reduction. Motivated by the MAVE method in Xia et al. (2002), Li et al. (2010) proposed an estimator that incorporates group information into dimension reduction. Guo et al. (2015) developed groupwise dimension reduction based on the “direct sum envelope”. Zhou et al. (2016) discussed overlapped groupwise dimension reduction. Generally, these methods are computationally demanding because the resulting estimators must be solved by an iterative procedure that alternates between estimating the nonparametric and the parametric components. Zhu et al. (2021) proposed a GLS estimation method for groupwise dimension reduction that avoids the iterative procedure. Thus, we recommend the GLS method in Zhu et al. (2021) to estimate \(B_1\) and \(B_2\).

According to Zhu et al. (2021), GLS estimation proceeds as follows; a code sketch is given after the steps.

  1.

    Consider transformation functions of the response variable, \(f_1(Y),\ldots ,f_t(Y)\), satisfying \(E(f_k(Y))=0\), where t is a prespecified number. The least-squares coefficient from regressing \(f_k(Y)\) on \(W=(X^{\top },Z^{\top })^{\top }\) is

    $$\begin{aligned} \beta _{k}=\arg \min _{\beta _k}E\{f_k(Y)-W^{\top }\beta _k\}^2, \end{aligned}$$

    for \(k=1,\cdots , t\). Then the target matrix M can be constructed as follows:

    $$\begin{aligned} M=(M^{\top }_1, M^{\top }_2)^{\top }=\left( \frac{\beta _{1}}{1+||\beta _{1}||},\ldots , \frac{\beta _{t}}{1+||\beta _{t}||}\right) , \end{aligned}$$
    (2.6)

    where \(M_1\) and \(M_2\) are \(p_1\times t\) and \(p_2\times t\) matrices, respectively.

  2.

    Then \(M_iM^{\top }_i\) is a \( p_i\times p_i\) positive semi-definite matrix satisfying \(\mathrm{{Span}}(M_iM^{\top }_i) =\mathrm{{Span}}(B_i)\), for \(i=1,2\).

  3.

    When the observations \(\{w_i, y_i\}_{i=1}^n\) are available, \(\beta _k\) can be estimated by least squares:

    $$\begin{aligned} \beta _{kn}=\arg \min _{\beta _k}\sum _{i=1}^n\left\{ f_k(y_i)-w^{\top }_i\beta _k\right\} ^2. \end{aligned}$$

    The target matrix M in (2.6) can be estimated by:

    $$\begin{aligned} M_n=(M^{\top }_{1n}, M^{\top }_{2n})^{\top }=\left( \frac{\beta _{1n}}{1+||\beta _{1n}||},\ldots , \frac{\beta _{tn}}{1+||\beta _{tn}||}\right) . \end{aligned}$$

    When \(q_i\) is given, an estimate \(B_{in}\) of \(B_i\) consists of the eigenvectors associated with the \(q_i\) largest eigenvalues of \(M_{in}M^{\top }_{in}\), for \(i=1,2\). For more details, refer to Zhu et al. (2021).
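A minimal Python sketch of these three steps follows. It assumes column-centred covariates and takes centred powers of Y as the transformation functions \(f_k\); this transformation choice and all names are illustrative rather than a prescription of Zhu et al. (2021).

```python
import numpy as np

def gls_target(W, y, t=5):
    """Steps 1-2: build the target matrix M_n column by column via least squares."""
    cols = []
    for k in range(1, t + 1):
        fy = y ** k - np.mean(y ** k)                 # f_k(y), centred so E f_k(Y) = 0
        beta_k, *_ = np.linalg.lstsq(W, fy, rcond=None)
        cols.append(beta_k / (1.0 + np.linalg.norm(beta_k)))
    return np.column_stack(cols)                      # (p1 + p2) x t matrix M_n

def estimate_basis(M_i, q_i):
    """Step 3: eigenvectors of M_in M_in^T for the q_i largest eigenvalues."""
    _, vecs = np.linalg.eigh(M_i @ M_i.T)             # eigenvalues in ascending order
    return vecs[:, ::-1][:, :q_i]                     # B_in, a p_i x q_i matrix

# Usage sketch: W = np.hstack([X, Z]), centred column-wise;
# M_n = gls_target(W, y); M_1n, M_2n = M_n[:p1], M_n[p1:]
# B_1n = estimate_basis(M_1n, q1_hat); B_2n = estimate_basis(M_2n, q2_hat)
```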

2.3 Dimensionality estimation

As illustrated above, the two structural dimensions \(q_1\) and \(q_2\) are essential for estimating the matrices \(B_1\) and \(B_2\). We adopt the thresholding double ridge ratio criterion proposed by Zhu et al. (2020). To address the special case \(q_i=p_i\), we define the artificial eigenvalues \({\hat{\lambda }}_{i(p_i+1)}={\hat{\lambda }}_{i(p_i+2)}=0\). Let

$$\begin{aligned} {\hat{r}}_{ij}=\frac{{\hat{s}}^*_{i(j+1)}+c_{2n}}{{\hat{s}}^*_{ij}+c_{2n}} \ \ \mathrm{{with }}\ \ {\hat{s}}^*_{ij}=\frac{{\hat{\lambda }}_{i(j+1)}+c_{1n}}{{\hat{\lambda }}_{ij}+c_{1n}}-1, \end{aligned}$$
(2.7)

where \(c_{1n}\) and \(c_{2n}\) are the two ridges converging to 0, and \(0 \le {\hat{\lambda }}_{ip_i} \le \ldots \le {\hat{\lambda }}_{i1}\) are the eigenvalues of matrix \(M_{in}M^{\top }_{in}/t\). The dimension \(q_i\) can be estimated by:

$$\begin{aligned} {\hat{q}}_i:= \left\{ \begin{array}{ll} 0, &{} \mathrm{{\ if}}\ {\hat{r}}_{ij}> \tau , \ \mathrm{{\ for\ all}}\ 1\le j\le p_i,\\ \arg \max _{1\le j \le p_i}\left\{ j: \ {\hat{r}}_{ij}\le \tau \right\} , &{} \mathrm{{otherwise}}, \end{array}\right. \end{aligned}$$
(2.8)

where \(0<\tau <1\). Based on the rule of thumb in Zhu et al. (2020), we set \(\tau =0.5\) to avoid the overestimation caused by a large \(\tau \) and the underestimation caused by a small \(\tau \). As the target matrix here differs from that in Zhu et al. (2020), we recommend the ridge values \(c_{1n}=0.4\log (n)/\sqrt{n}\) and \(c_{2n}=0.8\log (n)/\sqrt{n}\), based on some numerical studies.
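The criterion (2.7)-(2.8) can be implemented in a few lines. The sketch below assumes \(M_{in}\) has been obtained from the GLS step; the function name is ours.

```python
import numpy as np

def estimate_q(M_in, t, n, tau=0.5):
    """Thresholding double ridge ratio estimate (2.7)-(2.8) of q_i."""
    c1 = 0.4 * np.log(n) / np.sqrt(n)                 # recommended ridge c_{1n}
    c2 = 0.8 * np.log(n) / np.sqrt(n)                 # recommended ridge c_{2n}
    p = M_in.shape[0]
    lam = np.linalg.eigvalsh(M_in @ M_in.T / t)[::-1]  # descending eigenvalues
    lam = np.concatenate([lam, [0.0, 0.0]])            # artificial lambda_{p+1} = lambda_{p+2} = 0
    s = (lam[1:p + 2] + c1) / (lam[:p + 1] + c1) - 1.0  # s*_j for j = 1, ..., p+1
    r = (s[1:] + c2) / (s[:-1] + c2)                    # r_j  for j = 1, ..., p
    hits = np.flatnonzero(r <= tau)
    return 0 if hits.size == 0 else int(hits[-1] + 1)   # argmax{j : r_j <= tau}
```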

The following proposition presents the consistency of the estimators \({\hat{q}}_k\), for \(k=1,2\).

Proposition 2.1

Under conditions A1 and A2 in the Appendix, assume that \(c_{1n} \rightarrow 0\), \(c_{2n} \rightarrow 0\) and \(c_{1n}c_{2n}n \rightarrow \infty \). Then we have

$$\begin{aligned} \lim _{n \rightarrow \infty }P\left( {\hat{q}}_1=q_1,{\hat{q}}_2=q_2\right) =1. \end{aligned}$$

3 Asymptotic properties

3.1 Limiting null distribution

First, we provide some notation. Let

$$\begin{aligned} s^2 = 2\int {}K^2(u,v)dudvE\{[Var(\epsilon |B^{\top }_1X,B^{\top }_2Z)]^2p(B^{\top }_1X,B^{\top }_2Z)\}, \end{aligned}$$
(3.1)

and

$$\begin{aligned} {\hat{s}}^2 = \frac{2h^2}{n(n-1)}\sum _{i=1}^n\sum _{j \ne i}^nK^2_{h}({B}^{\top }_{1n} x_j-{B}^{\top }_{1n} x_i,{B}^{\top }_{2n} z_j-{B}^{\top }_{2n} z_i){\hat{\epsilon }}^2_i{\hat{\epsilon }}^2_j. \end{aligned}$$
(3.2)

Next, we state the result for the null limiting distribution.

Theorem 3.1

Under \(H_0\) and the regularity conditions in the Appendix, we have

$$\begin{aligned} nhS_n {\mathop {\longrightarrow }\limits ^{d}} N(0, {s^2}). \end{aligned}$$

Furthermore, \({s^2}\) can be consistently estimated by \({\hat{s}}^2\).

From Theorem 3.1, we have the standardised test statistic \(T_n\):

$$\begin{aligned} T_n=nhS_n / {\hat{s}} {\mathop {\longrightarrow }\limits ^{d}} N(0,1). \end{aligned}$$
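In code, the standardisation reuses the kernel helper from the sketch in Sect. 2.1; the following is again an illustrative sketch, computing \({\hat{s}}^2\) in (3.2) and then \(T_n\) under the same naming assumptions.

```python
import numpy as np
# assumes product_gaussian_kernel from the sketch in Sect. 2.1 is in scope

def T_n(eps_hat, proj, h):
    """Standardised statistic T_n = n h S_n / s_hat, with s_hat^2 as in (3.2)."""
    n = eps_hat.shape[0]
    K = product_gaussian_kernel(proj[:, None, :] - proj[None, :, :], h)
    np.fill_diagonal(K, 0.0)                          # sums run over j != i
    Sn = eps_hat @ K @ eps_hat / (n * (n - 1))
    e2 = eps_hat ** 2
    s2_hat = 2.0 * h ** 2 * (e2 @ K ** 2 @ e2) / (n * (n - 1))
    return n * h * Sn / np.sqrt(s2_hat)
```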

3.2 Power study

To study the sensitivity of the new test against alternative hypotheses, we consider the following sequence of local alternative models:

$$\begin{aligned} H_{1n} : Y=\beta ^{\top }X + g(\alpha ^{\top }Z)+ C_nf(B^{\top }_1X,B^{\top }_2Z)+\eta , \end{aligned}$$
(3.3)

where the function \(f(\cdot ,\cdot )\) is continuous and differentiable and satisfies \(E[f^2(B^{\top }_1X,B^{\top }_2Z)] < \infty \), \(E(\eta |X,Z)=0\), and \(\beta \) and \(\alpha \) are linear combinations of the columns of \(B_1\) and \(B_2\), respectively.

Lemma 3.1

Under the local alternative \(H_{1n}\) in (3.3) with \(C_n = n^{-1/2}h^{-1/2}\) and the same conditions as in Proposition 2.1, except that \(C_n^2\log {n} \le c_n \rightarrow 0\), we have

$$\begin{aligned} \lim _{n \rightarrow \infty }P\left( {\hat{q}}_1=q_1,{\hat{q}}_2=q_2\right) =1. \end{aligned}$$

Next, we state the results of the power performance of the test statistic.

Theorem 3.2

Under the regularity conditions in the Appendix, we have the following results.

  1. (i)

    Under \(H_{1n}\) with a fixed \(C_n >0\),

    $$\begin{aligned} T_{n}/(n h) {\mathop {\longrightarrow }\limits ^{P}} {Constant} >0. \end{aligned}$$
  2. (ii)

    Under the sequence of local alternative hypotheses \(H_{1n}\) in (3.3), \(T_n\) has different asymptotic properties based on the rates of \(C_n\) as follows.

    1. (a)

      If \(q_1=q_2=1\) and \(C_n = n^{-1/2}h^{-1/2}\),

      $$\begin{aligned} T_n {\mathop {\longrightarrow }\limits ^{d}} N(u, 1), \end{aligned}$$

      where

      $$\begin{aligned} u=E\{f^2(B^{\top }_1X,B^{\top }_2Z)p_{B_1B_2}(B^{\top }_1X,B^{\top }_2Z)\}/s \end{aligned}$$

      with \(s^2\) defined by (3.1);

    2. (b)

      if \(q_1+q_2>2\) and either \(C_nn^{1/2} h^{1/2}\rightarrow c_0>0\) for some constant \(c_{0}\), or \(C_nn^{1/2} h^{1/2}\rightarrow \infty \) and \(C_nn^{1/2} h^{(q_1+q_2)/4}\rightarrow 0\), we have

      $$\begin{aligned} T_n/h^{(q_1+q_2-2)/2} {\mathop {\longrightarrow }\limits ^{\mathrm {d}}} N(0,1); \end{aligned}$$
    3. (c)

      if \(q_1+q_2>2\) and \(C_n=n^{-1/2} h^{-(q_1+q_2)/4}\), we have

      $$\begin{aligned} T_n/h^{(q_1+q_2-2)/2} {\mathop {\longrightarrow }\limits ^{\mathrm {d}}} N(u,1); \end{aligned}$$
    4. (d)

      if \(q_1=q_2=1\) and \(C_n n^{1/2}h^{1/2} \rightarrow \infty \), or \(q_1+q_2>2\) and \(C_nn^{1/2} h^{(q_1+q_2)/4}\rightarrow \infty \), we have

      $$\begin{aligned} T_n/(C^2_nnh){\mathop {\longrightarrow }\limits ^{\mathrm {P}}} {u} >0. \end{aligned}$$

4 Numerical studies

4.1 Simulations

In this subsection, we examine the finite-sample performance of the proposed test using several numerical examples. Each experiment is repeated 1000 times to compute the empirical sizes and powers at the significance level \(\alpha = 0.05\). The parameter vectors \(\beta \) and \(\alpha \) are estimated using the MAVE method in Xia and Härdle (2006) (see the Appendix). Because the asymptotic normal approximation may not work well in finite samples, re-sampling techniques are often used. We apply the wild bootstrap algorithm adopted by Guo et al. (2016). Consider the bootstrap observations \(y_{i}^{*}={{\hat{\beta }}}^{\top } x_i +{\hat{g}}({\hat{\alpha }}^{\top }z_i)+\epsilon _{i}^{*}\), where \(\epsilon _{i}^{*}={\hat{\epsilon }}_{i} \times U_{i}\) and \(\left\{ U_{i}\right\} _{i=1}^{n}\) can be chosen to be i.i.d. Bernoulli variates with

$$\begin{aligned} P\left( U_{i}=\frac{1-\sqrt{5}}{2}\right) =\frac{1+\sqrt{5}}{2 \sqrt{5}}, \quad P\left( U_{i}=\frac{1+\sqrt{5}}{2}\right) =1-\frac{1+\sqrt{5}}{2 \sqrt{5}} \text{. } \end{aligned}$$

Let \(T^*_n\) be the bootstrap version of \(T_n\), based on the bootstrap samples \(\{(x_1,z_1,y^*_1)\), \(\cdots , (x_n,z_n,y^*_n)\}\). The null hypothesis is rejected if \(T_n\) is larger than the corresponding quantile of the bootstrap distribution of \(T^*_n\).
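The bootstrap loop can be sketched as follows, treating the PLSIM fit and the computation of \(T_n\) as black boxes; `fit_plsim` and `compute_Tn` are hypothetical helpers standing in for the estimation and testing routines described above.

```python
import numpy as np

def wild_bootstrap_pvalue(x, z, y, fit_plsim, compute_Tn, B=500, seed=0):
    """Bootstrap p-value for T_n using the two-point weights displayed above."""
    rng = np.random.default_rng(seed)
    y_fit = fit_plsim(x, z, y)        # fitted values beta_hat^T x + g_hat(alpha_hat^T z)
    eps_hat = y - y_fit
    Tn = compute_Tn(x, z, y)
    a, b = (1.0 - np.sqrt(5.0)) / 2.0, (1.0 + np.sqrt(5.0)) / 2.0
    p_a = (1.0 + np.sqrt(5.0)) / (2.0 * np.sqrt(5.0))   # P(U = a); gives E U = 0, E U^2 = 1
    Tn_star = np.empty(B)
    for m in range(B):
        U = rng.choice([a, b], size=y.shape[0], p=[p_a, 1.0 - p_a])
        Tn_star[m] = compute_Tn(x, z, y_fit + eps_hat * U)
    return np.mean(Tn_star >= Tn)     # reject H_0 for small p-values
```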

For our test, we select the Gaussian kernel for \(K(\cdot )\) and set the bandwidths as \(h_1=0.8n^{-1 /(1+4)}\) and \(h=0.8n^{-1 /\left( {\hat{q}}_{1}+{\hat{q}}_{2}+4\right) }\), where \({\hat{q}}_{1}\) and \({\hat{q}}_{2}\) are the estimated structural dimensions of \(B_{1 n}^{\top } X\) and \(B_{2 n}^{\top } Z\); \(h_1\) is used to estimate \(\beta \), \(\alpha \) and the function \(g(\cdot )\), and h is used to construct the test statistic.

As in Remark 2.1, we extend the test in Zheng (1996) to check the PLSIMs for comparison and denote its test statistic by \(T^{ZH}_n\). For this test, we set the bandwidth as \(h=1.5 n^{-1 / (4+p_1+p_2)}\) and select the quartic kernel \(K(u)=\frac{15}{16}\left( 1-u^{2}\right) ^{2}\) if \(|u| \le 1\), and 0 otherwise. We compare the results of the two test statistics computed from 500 wild bootstrap samples.
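These tuning choices translate directly into code; the following sketch simply transcribes the bandwidth rules and the quartic kernel above (the function names are ours).

```python
import numpy as np

def bandwidths(n, q1_hat, q2_hat, p1, p2):
    h1 = 0.8 * n ** (-1.0 / 5.0)                       # for estimating beta, alpha and g
    h = 0.8 * n ** (-1.0 / (q1_hat + q2_hat + 4.0))    # for the proposed statistic T_n
    h_zh = 1.5 * n ** (-1.0 / (4.0 + p1 + p2))         # for the Zheng-type statistic T_n^ZH
    return h1, h, h_zh

def quartic_kernel(u):
    # K(u) = (15/16)(1 - u^2)^2 for |u| <= 1, and 0 otherwise
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)
```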

The observations \(\left\{ x_i\right\} ^n_{i=1}\) and \(\left\{ z_i\right\} ^n_{i=1}\) are generated i.i.d. from multivariate normal distributions \(N(0_{p_1}, \Sigma _{1})\) and \(N(0_{p_2}, \Sigma _{2})\), respectively, and independent of the standard normal errors. Here, \(\Sigma _{1}=I_{p_1 \times p_1}\) and \(\Sigma _{2}=I_{p_2 \times p_2}\). The sample sizes are set as n = 200, 400 and the dimensions are \(p_{1} = p_{2} = 4, 8\).

Example 1

Consider the model

$$\begin{aligned} Y=\beta ^\top X+ (\alpha ^\top Z+1)^2+a(0.4 \beta ^\top X+1)^2 +\epsilon , \end{aligned}$$

where \(\beta \) and \(\alpha \) are set using the following two cases:

  • Case 1: \(\beta =(\underbrace{1, \ldots , 1}_{p_{1} / 2}, 0, \ldots , 0)^{\top } / \sqrt{p_{1} / 2}\), and \(\alpha =(0, \ldots , 0, \underbrace{1, \ldots , 1}_{p_{2} / 2})^{\top } / \sqrt{p_{2} / 2}\);

  • Case 2: \(\beta =(\underbrace{1, \ldots , 1}_{p_{1}}) / \sqrt{p_{1}}\), and \(\alpha =(\underbrace{1, \ldots , 1}_{p_{2}}) / \sqrt{p_{2}}\).

The error term \(\epsilon \) follows the standard normal distribution. In this example, \(a=0\) and \(a \ne 0\) correspond to the null and the alternative hypothesis, respectively. The results of the empirical sizes and powers are displayed in Table 1.
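For reproducibility, the following sketch draws one Monte Carlo sample from this design under Case 1; the function name is ours, and \(a=0\) yields the null model.

```python
import numpy as np

def example1_sample(n, p1, p2, a, rng):
    # Case 1 index directions
    beta = np.r_[np.ones(p1 // 2), np.zeros(p1 - p1 // 2)] / np.sqrt(p1 / 2)
    alpha = np.r_[np.zeros(p2 - p2 // 2), np.ones(p2 // 2)] / np.sqrt(p2 / 2)
    X = rng.standard_normal((n, p1))              # X ~ N(0, I_{p1})
    Z = rng.standard_normal((n, p2))              # Z ~ N(0, I_{p2})
    eps = rng.standard_normal(n)                  # standard normal error
    bX, aZ = X @ beta, Z @ alpha
    Y = bX + (aZ + 1.0) ** 2 + a * (0.4 * bX + 1.0) ** 2 + eps
    return X, Z, Y
```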

From Table 1, we make the following observations. First, the proposed test \(T_n\) controls the empirical sizes well. \(T^{ZH}_n\) tends to be conservative, especially when \(p_1=p_2=4\), although it maintains the significance level better when \(p_1 =p_2= 8\). Second, the empirical powers of \(T_n\) increase reasonably as a grows, and the test is significantly and uniformly more powerful than \(T^{ZH}_n\); indeed, \(T^{ZH}_n\) fails to detect the alternative hypothesis in the higher-dimensional case \(p_1=p_2=8\). Finally, for both \(p_1=p_2=4\) and \(p_1=p_2=8\), the dimensions of X and Z have less influence on \(T_n\) than on \(T^{ZH}_n\), and \(T_n\) performs uniformly better than \(T^{ZH}_n\) in the above two cases.

Table 1 Empirical sizes and powers of the tests for Example 1

Example 2

In this example, we examine the finite sample performance of the proposed method under the following model:

$$\begin{aligned} Y=\beta ^\top X+ \exp (\alpha ^\top Z) +0.6(\alpha ^\top Z)^2+a(\beta ^\top X)^2 +\epsilon , \end{aligned}$$

where all parameter values are set to be the same as those in Example 1. \(a = 0\) corresponds to the null hypothetical model. The empirical sizes and powers are presented in Table 2.

Comparing Tables 1 and 2, we observe that the power performance of \(T_n\) in Example 2 is similar to that in Example 1. The proposed test maintains the significance level as well as in the previous example, and it is significantly and uniformly more powerful than the test in Zheng (1996).

Table 2 Empirical sizes and powers of the tests for Example 2

Example 3

In this example, we examine the finite sample performance of the proposed method under the model:

$$\begin{aligned} Y=\beta ^\top X+ 0.9(\alpha ^\top Z)^3 + a(X_1+0.5X_2^2) +\epsilon , \end{aligned}$$

where \(X_i\) and \(Z_i\) denote the \(i\)-th components of X and Z, respectively. All parameter values are the same as those used in Example 1. The related results are displayed in Table 3.

From Table 3, both tests still control the size well. When \(p_1=p_2=4\), the empirical powers of \(T_{n}\) increase faster than those of \(T^{ZH}_n\) as a increases. For \(T^{ZH}_n\), both the empirical sizes and the empirical powers are close to the significance level \(\alpha =0.05\) when \(p_1=p_2=8\), indicating that it has little power in this case. \(T_{n}\) is significantly better than \(T^{ZH}_n\) at detecting the alternative hypothesis because dimension reduction mitigates the curse of dimensionality.

Table 3 Empirical sizes and powers of the tests for Example 3

Example 4

To examine the stability of the power against the structural dimensions, we consider a more complex model with higher structural dimensions \(q_1\) and \(q_2\):

$$\begin{aligned} Y= & {} \beta ^\top X+ \sin (\alpha ^\top Z)+2\alpha ^\top Z+a(X_1^3+0.3X_2^2+\exp (X_3)+|X_4| \\&+0.2Z_1^2+Z_2^2+\cos (Z_3)+Z_4^3)/4 +\epsilon , \end{aligned}$$

where \(X_i\) and \(Z_i\) denote the \(i\)-th components of X and Z, respectively. All parameter values are the same as those used in Example 1, and \(a = 0\) corresponds to the null hypothetical model. Under the alternative hypotheses, the structural dimensions \(q_1\) and \(q_2\) of the alternative models are both equal to 4. The results are presented in Table 4.

Comparing Table 4 with Tables 1, 2 and 3, we observe that the empirical sizes and powers in Example 4 are similar to those in Examples 1-3. The empirical sizes of the two tests are close to the significance level. However, the proposed test has significantly higher power than \(T_{n}^{ZH}\), especially when \(p_1=p_2=8\). We also find that the structural dimensions have a negligible effect on the empirical sizes and powers of our test.

Table 4 Empirical sizes and powers of the tests for Example 4

4.2 Real data analysis

4.2.1 Body fat data

We now apply the proposed method to the analysis of the body fat data, which can be found at http://lib.stat.cmu.edu/datasets/bodyfat. Chen et al. (2016) analysed this dataset, which provides estimates of the percentage of body fat determined by underwater weighing together with various body circumference measurements. The dataset contains 252 samples with 14 attributes; we delete one observation in which the percentage of body fat is equal to 0, leaving 251 samples. Consistent with Chen et al. (2016), we select the following 12 attributes: \(\mathrm{height}^4/\mathrm{weight}^2\), age, and the circumferences of 10 body parts, namely, neck, chest, abdomen, hip, thigh, knee, ankle, biceps, forearm, and wrist. We set the knee, ankle, and forearm circumferences as \(X=(X_1, X_2, X_3)\); the other predictors are set as \(Z=(Z_1,Z_2,\cdots ,Z_9)\). The response variable Y is the logarithm of the percentage of body fat.

The value of the test statistic is \(T_n=18.2261\), and the \(p\)-value is approximately 0. Therefore, there is enough evidence to reject the null hypothesis at the significance level \(\alpha = 0.05\). The plot of the residuals of the fitted PLSIM against \({\hat{\beta }}^{\top }X\) and \({\hat{\alpha }}^{\top }Z\) in Fig. 1 shows a linear relationship between \({\hat{\beta }}^{\top }X\) and the residuals.

Fig. 1 Plot of the residuals against the linear part and the single-index direction in the body fat data

4.2.2 Auto MPG

This data set is obtained from the Machine Learning Repository at the University of California, Irvine (http://archive.ics.uci.edu/ml/datasets/Auto+MPG); Xia (2007) and Guo et al. (2016) analysed it. It has 406 samples and 8 attributes. The predictors are the number of cylinders, engine displacement, horsepower, vehicle weight, time to accelerate from 0 to 60 mph, model year, and the origin of the car; miles per gallon (Y) is the response variable. Consistent with Xia (2007), we set cylinders, engine displacement, horsepower, vehicle weight, time to accelerate from 0 to 60 mph, and model year as \(Z_1,Z_2,\cdots ,Z_6\), and code the origin by two dummy variables: \(X_1=1\) if a car is from America, and 0 otherwise; \(X_2=1\) if a car is from Europe, and 0 otherwise.

The value of the proposed test statistic is \(T_n= -1.1447\), and the p-value is 0.2523. Hence, the null hypothesis cannot be rejected at the significance level \(\alpha = 0.05\). We plot the residuals against the single-index direction in Fig. 2; the plot shows no systematic pattern. Thus, it is reasonable to fit this dataset with the PLSIM.

Fig. 2 Plot of the residuals against the single-index direction in the Auto MPG data