1 Introduction

Spatial data, collected at different sites on the surface of the earth, arise in various areas of research, including econometrics, epidemiology, environmental science, image analysis, oceanography and many others. Many authors (e.g. Ripley 1981; Cressie 1991) have studied parametric methods for statistical inference in spatial models. However, one often encounters situations where a particular parametric model cannot be adopted with confidence, and a nonparametric method is then used as an alternative. In the last two decades, efforts have been made in the literature to explore nonlinear relationships in spatial data. Among them, Tran (1990), Hallin et al. (2001) and Lee et al. (2004) studied various asymptotic properties of density estimators for spatial processes. Hallin et al. (2004) established weak consistency and asymptotic normality for local linear regression estimation. Gao et al. (2006) developed the properties of the local linear kernel estimator for a semiparametric spatial regression model. Al-Sulami et al. (2017) developed a two-step estimation procedure for a semiparametric spatial time-series autoregressive model and established the asymptotic properties of the estimators. Nandy et al. (2017) developed a regularized variable selection technique for building a spatial additive model. The focus of our work is on the estimation and inference of partially linear additive regression with spatial data.

Let \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij})\}\) be a random field indexed by \((i,j)\in {\mathbf {E}}^{2}\), where \({\mathbf {E}}=\{0,\pm 1, \pm 2, \cdots \}\), and \(Y_{ij}\), with values in R, \(\pmb {Z}_{ij}=(Z_{ij1},\ldots ,Z_{ijd_{1}})^{T}\), with values in \(R^{d_{1}}\), and \(\pmb {X}_{ij}=(X_{ij1},\ldots ,X_{ijd_{2}})^{T}\), with values in \(R^{d_{2}}\), are defined over some probability space \((\Omega ,{\mathcal {F}},P)\). A point \((i,j)\in {\mathbf {E}}^{2}\) is referred to as a site. Let S and \(S'\) be two sets of sites. The Borel fields \(\mathcal {B(S)}={\mathcal {B}}((Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij}): (i,j)\in S)\) and \(\mathcal {B(S')}={\mathcal {B}}((Y_{ij}, \pmb {Z}_{ij},\pmb {X}_{ij}): (i,j)\in S')\) are the \(\sigma \)-fields generated by the random vectors \((Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij})\) with \((i,j)\) ranging over S and \(S'\), respectively. Let \(d(S,S')\) be the Euclidean distance between S and \(S'\). We will assume that \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij})\}\) satisfies the following mixing condition as defined in the literature (cf. Tran 1990; Hallin et al. 2004): there exists a function \(\varphi (t)\downarrow 0\) as \(t\rightarrow \infty \), such that whenever \(S, S'\subset {\mathbf {E}}^{2}\),

$$\begin{aligned} \begin{array}{ll} \alpha ({\mathcal {B}}(S),{\mathcal {B}}(S')) &{} =\sup \{|P(AB)-P(A)P(B)|, A\in {\mathcal {B}}(S),B\in {\mathcal {B}}(S')\} \\ &{}\le \psi (Card(S),Card(S'))\varphi (d(S,S')), \end{array} \end{aligned}$$
(1.1)

where Card(S) denotes the cardinality of S, and \(\psi \) is a symmetric positive function nondecreasing in each variable. In particular, if \(\psi (\infty ,\infty )\le C_{0}\) for some positive constant \(C_{0}\), then \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij})\}\) is \(\alpha \)-mixing (or strongly mixing). Introduced by Rosenblatt (1956), \(\alpha \)-mixing dependence is a property shared by many time series models, such as autoregressive linear processes (see Athreya and Pantula 1986).

A problem crucial to a number of applications is spatial regression, where the influence of the vectors \(\pmb {X}_{ij}\) and \(\pmb {Z}_{ij}\) on some response variable \(Y_{ij}\) is to be studied in a context of complex spatial dependence. In many practical situations, it is desirable to model multiple covariates nonparametrically. However, it is well known that multivariate nonparametric estimation suffers from the curse of dimensionality. A widely used approach for dimensionality reduction is to consider an additive model for the nonparametric part of the regression function in the partially linear model, which in turn results in the partially linear additive model. In this paper, we approximate the conditional mean function \(E(Y_{ij}|\pmb {Z}_{ij},\pmb {X}_{ij})\) by a partially linear additive function of the form

$$\begin{aligned} \Psi (\pmb {Z}_{ij},\pmb {X}_{ij}) =\mu +\pmb {Z}_{ij}^{T}\pmb {\beta }_{0}+\sum _{r=1}^{d_{2}}f_{r}(X_{ijr}), \end{aligned}$$
(1.2)

such that \(E(Y_{ij}-\Psi (\pmb {Z}_{ij},\pmb {X}_{ij}))^{2}\) or, equivalently, \(E(E(Y_{ij}|\pmb {Z}_{ij},\pmb {X}_{ij}) -\Psi (\pmb {Z}_{ij},\pmb {X}_{ij}))^{2}\) is minimized over the class of partially linear additive functions of the form \(\Psi (\pmb {Z}_{ij},\pmb {X}_{ij})\), where \(\mu \) is an unknown constant, \(\pmb {\beta }_{0}=(\beta _{01},\ldots ,\beta _{0d_{1}})^{T}\) is a \(d_{1}\)-dimensional vector of unknown parameters, and \(f_{r}(x_{r})\), \(x_{r}\in [c_{r1},c_{r2}]\), \(r=1,\ldots ,d_{2}\), are unknown functions. For identifiability, we assume that \(E[f_{r}(X_{r})]=0\) for all \(1\le r\le d_{2}\).

Additive models are a special case of the more general projection pursuit regression model of Friedman and Stuetzle (1981). They are useful for multivariate data analysis because they avoid the curse of dimensionality while retaining great flexibility in the regression function. However, in many empirical situations the mean of the outcome is assumed to depend on some covariates parametrically and on other covariates nonparametrically. Partially linear additive regression (PLAR), which contains both linear and nonlinear additive components, is more flexible than stringent linear models and more parsimonious than general nonparametric regression models. PLAR has been studied by many authors; see, for example, Li (2000), Fan and Li (2003), Liang et al. (2008), You and Zhou (2013), Cheng et al. (2014), Lian et al. (2015), Sherwood and Wang (2016) and Yang et al. (2019), among others.

Gao et al. (2006) used the marginal integration technique to estimate the additive components in PLAR with spatial data. However, this method is not suitable for a high-dimensional vector \(\pmb {X}_{ij}\). In this paper, we develop a new method to estimate the additive components \(f_{r}(x_{r})\) in (1.2). Our method has the following advantages compared with existing methods in the literature: (1) unlike the spline approach (Stone 1985; Huang et al. 2010), which requires solving a constrained minimization problem, and backfitting procedures (Breiman and Friedman 1985; Buja et al. 1989), which require iterative algorithms, our estimators of the additive functions have closed-form expressions; no constrained minimization and no iteration are needed, resulting in fast and efficient estimation; (2) since there is only one smoothing parameter in the B-spline procedure, the estimators of some functions become inefficient when the additive functions have different degrees of smoothness; in our proposed method, we first obtain the estimator of \(\pmb {\beta }_{0}\) and the piecewise polynomial estimators of \(f_{r}(x_{r}), r=1,\ldots , d_{2}\), then the local linear estimator of each \(f_{r}(x_{r})\) is obtained by solving a univariate regression problem, so the bandwidth \(h_{r}\) is easily chosen; (3) unlike the marginal integration approach of Gao et al. (2006), our estimators behave well when \(\pmb {X}_{ij}\) is a high-dimensional vector and the sample size is not large, as can be seen from the simulations and application below; (4) under mild regularity conditions, the asymptotic properties of the proposed estimators are established.

In our spatial PLAR, spatial information enters the regression model through the spatial dependence of the data \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij}),(i,j)\in {\mathbf {E}}^{2}\}\), where \(\pmb {Z}_{ij}\) and \(\pmb {X}_{ij}\) may contain both exogenous and endogenous variables, that is, neighboring values of \(Y_{ij}\). Moreover, a component \(Z_{ijr}\) of \(\pmb {Z}_{ij}\) or a component \({X}_{ijr}\) of \(\pmb {X}_{ij}\) may itself be a linear combination of neighboring values of \(Y_{ij}\). There is another stream of spatial regression modeling in the literature in which the spatial effect is modeled by a bivariate function; for example, Ramsay (2002) used a deterministic smooth surface function to describe the variations and connections among values at different locations, and Yu et al. (2020) introduced a class of generalized geoadditive models, a synthesis of geostatistics and generalized additive models, where the effects of explanatory variables are modeled with additive univariate functions and the spatial effect is modeled via a bivariate function.

The paper is organized as follows. Section 2 describes the estimation method. Section 3 presents the asymptotic theory of our estimators. In Sect. 4, we conduct simulation studies to examine the finite-sample performance of the proposed procedures. In Sect. 5, the proposed methods are illustrated by analyzing a real estate data set. Technical proofs are relegated to the Appendix.

2 Estimation method

As stated in Sect. 1, the conditional mean function \(E(Y_{ij}|\pmb {Z}_{ij},\pmb {X}_{ij})\) is approximated by a partially linear additive function of the form \(\Psi (\pmb {Z}_{ij},\pmb {X}_{ij}) =\mu +\pmb {Z}_{ij}^{T}\pmb {\beta }_{0}+\sum _{r=1}^{d_{2}}f_{r}(X_{ijr})\) such that

$$\begin{aligned} E(Y_{ij}-\Psi (\pmb {Z}_{ij},\pmb {X}_{ij}))^{2}=E[Y_{ij}-(\mu +\pmb {Z}_{ij}^{T} \pmb {\beta }_{0}+\sum _{r=1}^{d_{2}}f_{r}(X_{ijr}))]^{2} \end{aligned}$$

is minimized over a class of partially linear additive functions of the form \(\Psi (\pmb {Z}_{ij},\pmb {X}_{ij})\) with \(E(f_{r}(X_{ijr}))=0\), which implies that \( \mu =E(Y_{ij})-E(\pmb {Z}_{ij})^{T}\pmb {\beta }_{0}\). Since

$$\begin{aligned} \begin{array}{l} E[Y_{ij}-(\mu +\pmb {Z}_{ij}^{T}\pmb {\beta }_{0}+\sum _{r=1}^{d_{2}}f_{r}(X_{ijr}))]^{2} \\ =E(E([Y_{ij}-E(Y_{ij}|\pmb {X}_{ij})-(\pmb {Z}_{ij}-E(\pmb {Z}_{ij}|\pmb {X}_{ij}))^{T} \pmb {\beta }_{0}]^{2}|\pmb {X}_{ij})) \\ +E[E(Y_{ij}|\pmb {X}_{ij})-\mu -E(\pmb {Z}_{ij}|\pmb {X}_{ij})^{T}\pmb {\beta }_{0} -\sum _{r=1}^{d_{2}}f_{r}(X_{ijr})]^{2}, \end{array} \end{aligned}$$

we then have

$$\begin{aligned} \pmb {\beta }_{0}=(E[(\pmb {Z}_{ij}-E(\pmb {Z}_{ij}|\pmb {X}_{ij}))(\pmb {Z}_{ij} -E(\pmb {Z}_{ij}|\pmb {X}_{ij}))^{T}])^{-1}E[(\pmb {Z}_{ij}-E(\pmb {Z}_{ij}|\pmb {X}_{ij}))(Y_{ij}-E(Y_{ij}|\pmb {X}_{ij}))] \end{aligned}$$

provided that the inverse exists, and \(\sum _{r=1}^{d_{2}}f_{r}(X_{ijr})\) is the projection of \(E(Y_{ij}|\pmb {X}_{ij})-\mu -E(\pmb {Z}_{ij}|\pmb {X}_{ij})^{T}\pmb {\beta }_{0}\) onto the space of additive functions of \(X_{ijr},r=1,\ldots ,d_{2}\).

Assume that we have observations \((Y_{ij}, \pmb {Z}_{ij},\pmb {X}_{ij})\) for \(1\le i\le m\) and \(1\le j\le n\). The total sample size is thus \(N=m\times n\). We first estimate \(\pmb {\beta }_{0}\). In order to approximate \(f_{r}(x_{r})\) for \(x_{r}\in [c_{r1},c_{r2}]\), we construct piecewise polynomial estimators of \(f_{r}(x_{r})\) of degree \(p_{r}\). We split \([c_{r1},c_{r2}]\) equally into \(M_{rN}\) subintervals, so that the length of every subinterval is \(2h_{r0}=(c_{r2}-c_{r1})/M_{rN}\). Let \(I_{r\nu }=[c_{r1}+2(\nu -1)h_{r0},c_{r1}+2\nu h_{r0})\) for \(1\le \nu \le M_{rN}-1\) and \(I_{rM_{rN}}=[c_{r2}-2h_{r0},c_{r2}]\). Let \(x_{r\nu }\) denote the center of the interval \(I_{r\nu }\) and \(\chi _{r\nu }\) denote the indicator function of \(I_{r\nu }\), so that \(\chi _{r\nu }(x_{r})=1\) if \(x_{r}\in I_{r\nu }\) and \(\chi _{r\nu }(x_{r})=0\) otherwise.

Denote \(\breve{A}_{r\nu k}(x_{r})=\chi _{r\nu }(x_{r})[(x_{r}-x_{r\nu })/h_{r0}]^{k}\) for \(k=0,1,\ldots ,p_{r}\); \(\nu =1,\cdots ,M_{rN}\); \(r=1,\ldots ,d_{2}\). We use \(\breve{f}_{r}(x_{r})=\sum _{\nu =1}^{M_{rN}}\sum _{k=0}^{p_{r}}a_{r\nu k}\breve{A}_{r\nu k}(x_{r})\) to approximate \(f_{r}(x_{r})\). Since \(E[f_{r}(X_{r})]=0\), we also impose \(E[\breve{f}_{r}(X_{r})]=0\); this can be done by replacing \(\breve{A}_{r\nu k}(x_{r})\) with \(\breve{A}_{r\nu k}(x_{r})-\chi _{r\nu }(x_{r})E[\breve{A}_{r\nu k}(X_{r})]/E[\chi _{r\nu }(X_{r})]\) for \(k=1,\ldots ,p_{r}\) and setting \(\sum _{\nu =1}^{M_{rN}}a_{r\nu 0}E[\chi _{r\nu }(X_{r})]=0\), that is, \(a_{rM_{rN}0}=-\sum _{\nu =1}^{M_{rN}-1}a_{r\nu 0}E[\chi _{r\nu }(X_{r})]/E[\chi _{rM_{rN}}(X_{r})]\). Since \(E[\breve{A}_{r\nu k}(X_{r})]\) and \(E[\chi _{r\nu }(X_{r})]\) are unknown, we use

$$\begin{aligned} {\bar{A}}_{r\nu k}=\sum _{i=1}^{m}\sum _{j=1}^{n}\chi _{r\nu }(X_{ijr})[(X_{ijr} -x_{r\nu })/h_{r0}]^{k}/\Big [\sum _{i=1}^{m}\sum _{j=1}^{n}\chi _{r\nu }(X_{ijr})\Big ] \end{aligned}$$

to replace \(E[\breve{A}_{r\nu k}(X_{r})]/E[\chi _{r\nu }(X_{r})]\). Denote

$$\begin{aligned} \begin{array}{l} A_{r\nu 0}(x_{r})=\chi _{r\nu }(x_{r})-\frac{\sum _{i=1}^{m}\sum _{j=1}^{n}\chi _{r\nu } (X_{ijr})}{\sum _{i=1}^{m}\sum _{j=1}^{n}\chi _{rM_{rN}}(X_{ijr})}\chi _{rM_{rN}}(x_{r}), \ \ \ \nu =1,\ldots ,M_{rN}-1, \\ A_{r\nu k}(x_{r})=\chi _{r\nu }(x_{r})\{[(x_{r}-x_{r\nu })/h_{r0}]^{k}-{\bar{A}}_{r\nu k}\}, \ \ \ k=1,\ldots ,p_{r}, \\ \pmb {A}_{r\nu }(x_{r})=(A_{r\nu 0}(x_{r}),A_{r\nu 1}(x_{r}),\cdots ,A_{r\nu p_{r}}(x_{r}))^{T}, \ \ \nu =1,\cdots ,M_{rN}-1, \\ \pmb {A}_{rM_{rN}}(x_{r})=(A_{rM_{rN} 1}(x_{r}),\cdots ,A_{rM_{rN} p_{r}}(x_{r}))^{T}, \\ \pmb {A}_{r}(x_{r})=(\pmb {A}_{r1}(x_{r})^{T},\cdots ,\pmb {A}_{rM_{rN}}(x_{r})^{T})^{T}, \ \ r=1,\ldots ,d_{2}. \end{array} \end{aligned}$$
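To make the construction above concrete, the following is a minimal Python sketch of the design block \(\pmb {A}_{r}(x_{r})\) for a single covariate (our own illustration; the helper name `piecewise_basis` is an assumption, and we assume every subinterval contains at least one observation, which is plausible under assumption 1 of Sect. 3). The columns are grouped by type rather than blockwise in \(\nu \), but they span the same space.

```python
import numpy as np

def piecewise_basis(X_r, c1, c2, M, p):
    """Centered piecewise polynomial design block for one covariate:
    M equal subintervals of [c1, c2], polynomials of degree p."""
    h0 = (c2 - c1) / (2 * M)                     # half-length of a subinterval
    # subinterval index in {0, ..., M-1}; the right endpoint joins the last cell
    nu = np.minimum(((X_r - c1) / (2 * h0)).astype(int), M - 1)
    centers = c1 + (2 * np.arange(M) + 1) * h0   # cell centers x_{r nu}
    counts = np.bincount(nu, minlength=M).astype(float)
    cols = []
    # constant columns chi_nu - (n_nu / n_M) chi_M for nu = 1, ..., M-1; this
    # builds the empirical mean-zero constraint into the step part
    for v in range(M - 1):
        cols.append((nu == v).astype(float)
                    - (counts[v] / counts[M - 1]) * (nu == M - 1))
    # polynomial columns, each centered by its within-cell sample mean
    for v in range(M):
        chi = (nu == v).astype(float)
        t = (X_r - centers[v]) / h0
        for k in range(1, p + 1):
            Abar = np.sum(chi * t**k) / counts[v]    # \bar{A}_{r nu k}
            cols.append(chi * (t**k - Abar))
    return np.column_stack(cols)                     # N x (M(p+1) - 1)
```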

Let \(\pmb {a}_{r\nu }=(a_{r\nu 0},a_{r\nu 1},\cdots ,a_{r\nu p_{r}})^{T}\) for \(\nu =1,\ldots ,M_{rN}-1\), \(\pmb {a}_{rM_{rN}}=(a_{rM_{rN} 1},\cdots , a_{rM_{rN} p_{r}})^{T}\) and \(\pmb {a}_{r}=(\pmb {a}_{r1}^{T},\cdots ,\pmb {a}_{rM_{rN}}^{T})^{T}\). We use \(\vec {f}_{r}(x_{r})=\pmb {A}_{r}^{T}(x_{r})\pmb {a}_{r}\) to approximate \(f_{r}(x_{r})\). Note that \(\vec {f}_{r}(x_{r})\) is a piecewise polynomial of degree \(p_{r}\) and satisfies \(\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}\vec {f}_{r}(X_{ijr})=0\). Let \({\bar{Y}}=\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}Y_{ij}\) and \(\bar{\pmb {Z}}=\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}\pmb {Z}_{ij}\) be the estimators of \(E(Y_{ij})\) and \(E(\pmb {Z}_{ij})\), respectively. Based on the spatial observations \(\{(Y_{ij}, \pmb {Z}_{ij},\pmb {X}_{ij}): 1\le i\le m; 1\le j\le n\}\), we use \({\bar{Y}}-\bar{\pmb {Z}}^{T}\pmb {\beta }\) to approximate \(\mu \) and solve the following minimization problem

$$\begin{aligned} \begin{array}{l} \min \sum _{i=1}^{m}\sum _{j=1}^{n}\{Y_{ij}-{\bar{Y}}- (\pmb {Z}_{ij}-\bar{\pmb {Z}})^{T}\pmb {\beta }-\sum _{r=1}^{d_{2}}\pmb {A}_{r}^{T} (X_{ijr})\pmb {a}_{r}\}^{2} \end{array} \end{aligned}$$
(2.1)

with respect to \(\pmb {\beta }\) and \(\pmb {a}_{r}\). Denote \( {\bar{Y}}_{ij}=Y_{ij}-{\bar{Y}}\), \(\bar{\pmb {Z}}_{ij}=\pmb {Z}_{ij}-\bar{\pmb {Z}}\), \(\pmb {A}(\pmb {X}_{ij}) =(\pmb {A}_{1}^{T}(X_{ij1}), \ldots ,\pmb {A}_{d_{2}}^{T}(X_{ijd_{2}}))^{T}\) and \(\pmb {a}=(\pmb {a}_{1}^{T},\ldots ,\pmb {a}_{d_{2}}^{T})^{T}\). Then the estimator \(\hat{\pmb {\beta }}\) of \(\pmb {\beta }_{0}\) is given by

$$\begin{aligned} \hat{\pmb {\beta }}=\pmb {\Gamma }_{N}^{-1}\Big [\sum _{i=1}^{m}\sum _{j=1}^{n} \bar{\pmb {Z}}_{ij}{\bar{Y}}_{ij} -\pmb {W}_{N}\pmb {A}_{N}^{-1}\big (\sum _{i=1}^{m}\sum _{j=1}^{n}\pmb {A}(\pmb {X}_{ij}) {\bar{Y}}_{ij}\big )\Big ], \end{aligned}$$
(2.2)

where

$$\begin{aligned} \pmb {\Gamma }_{N}=\sum _{i=1}^{m}\sum _{j=1}^{n}\bar{\pmb {Z}}_{ij}\bar{\pmb {Z}}_{ij}^{T} -\pmb {W}_{N} \pmb {A}_{N}^{-1}\pmb {W}_{N}^{T}, \end{aligned}$$
(2.3)
$$\begin{aligned} \pmb {W}_{N}=\sum _{i=1}^{m}\sum _{j=1}^{n}\bar{\pmb {Z}}_{ij}\pmb {A}(\pmb {X}_{ij})^{T}, \ \ \ \pmb {A}_{N}=\sum _{i=1}^{m}\sum _{j=1}^{n}\pmb {A}(\pmb {X}_{ij})\pmb {A} (\pmb {X}_{ij})^{T}. \end{aligned}$$
(2.4)

The estimator \(\tilde{\pmb {a}}=(\tilde{\pmb {a}}_{1}^{T},\ldots ,\tilde{\pmb {a}}_{d_{2}}^{T})^{T}\) of \(\pmb {a}\) is given by

$$\begin{aligned} \tilde{\pmb {a}}=\pmb {A}_{N}^{-1}\Big (\sum _{i=1}^{m}\sum _{j=1}^{n} \pmb {A}(\pmb {X}_{ij})\big ({\bar{Y}}_{ij}- \bar{\pmb {Z}}_{ij}^{T}\hat{\pmb {\beta }}\big )\Big ) \end{aligned}$$
(2.5)

and the piecewise polynomial estimator \({\tilde{f}}_{r}(x_{r})\) of \(f_{r}(x_{r})\) is given by

$$\begin{aligned} {\tilde{f}}_{r}(x_{r})=\pmb {A}_{r}^{T}(x_{r})\tilde{\pmb {a}}_{r}, \ \ x_{r}\in [c_{r1},c_{r2}]. \end{aligned}$$

The estimator of \(\mu \) is given by

$$\begin{aligned} {\hat{\mu }}={\bar{Y}}-\bar{\pmb {Z}}^{T}\hat{\pmb {\beta }}. \end{aligned}$$
(2.6)
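The closed-form character of (2.2)–(2.6) can be seen directly in code. Below is a minimal sketch (assuming the stacked basis matrix has been built, for instance with the hypothetical `piecewise_basis` helper sketched above, and that \(\pmb {A}_{N}\) and \(\pmb {\Gamma }_{N}\) are invertible):

```python
import numpy as np

def fit_plar(Y, Z, A):
    """First-stage estimators (2.2), (2.5) and (2.6).
    Y: (N,) responses; Z: (N, d1) linear covariates;
    A: (N, q) stacked centered basis evaluated at the X_ij."""
    Yb = Y - Y.mean()                                    # \bar{Y}_{ij}
    Zb = Z - Z.mean(axis=0)                              # \bar{Z}_{ij}
    A_N = A.T @ A                                        # A_N in (2.4)
    W_N = Zb.T @ A                                       # W_N in (2.4)
    Gamma_N = Zb.T @ Zb - W_N @ np.linalg.solve(A_N, W_N.T)         # (2.3)
    beta_hat = np.linalg.solve(
        Gamma_N, Zb.T @ Yb - W_N @ np.linalg.solve(A_N, A.T @ Yb))  # (2.2)
    a_tilde = np.linalg.solve(A_N, A.T @ (Yb - Zb @ beta_hat))      # (2.5)
    mu_hat = Y.mean() - Z.mean(axis=0) @ beta_hat                   # (2.6)
    return beta_hat, a_tilde, mu_hat
```

The piecewise polynomial estimates \({\tilde{f}}_{r}(x_{r})\) are then obtained by multiplying the corresponding block of `a_tilde` by \(\pmb {A}_{r}(x_{r})\), as in the display above.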

Since \({\tilde{f}}_{r}(x_{r})\) is only piecewise smooth, it may not be appealing to use this piecewise polynomial directly as the estimate of the function \(f_{r}(x_{r})\). After estimating \(\pmb {\beta }_{0}\), for a given \(x_{0r}\in [c_{r1},c_{r2}]\) and for \(x_{r}\) in a neighborhood of \(x_{0r}\), we use \(b_{r0} + b_{r1}(x_{r}-x_{0r})\) to approximate the unknown function \(f_{r}(x_{r})\). We then solve the following minimization problem

$$\begin{aligned} \min \sum _{i=1}^{m}\sum _{j=1}^{n}\Big \{Y_{ij}-{\bar{Y}}- (\pmb {Z}_{ij}-\bar{\pmb {Z}})^{T}\hat{\pmb {\beta }}-\sum _{k\ne r}{\tilde{f}}_{k}(X_{ijk})-[b_{r0} + b_{r1}(X_{ijr}-x_{0r})]\Big \}^{2}K\big (\frac{X_{ijr}-x_{0r}}{h_{r}}\big ) \end{aligned}$$
(2.7)

with respect to \(b_{r0}, b_{r1}\), where \(K(\cdot )\) is a given kernel function and \(h_{r}\) is a chosen bandwidth. Let \(\pmb {b}_{r}=(b_{r0}, h_{r}b_{r1})^{T}\) and \(\pmb {B}_{ijr}=(1,(X_{ijr}-x_{0r})/h_{r})^{T}\). Then the estimator of \(\pmb {b}_{r}\) is given by

$$\begin{aligned} \hat{\pmb {b}}_{r}=\Big (\sum _{i=1}^{m}\sum _{j=1}^{n}K\big (\frac{X_{ijr} -x_{0r}}{h_{r}}\big )\pmb {B}_{ijr}\pmb {B}_{ijr}^{T}\Big )^{-1} \Big (\sum _{i=1}^{m}\sum _{j=1}^{n}K\big (\frac{X_{ijr}-x_{0r}}{h_{r}}\big ) \pmb {B}_{ijr}{\tilde{Y}}_{ijr}\Big ), \end{aligned}$$
(2.8)

where

$$\begin{aligned} {\tilde{Y}}_{ijr}=Y_{ij}-{\bar{Y}}- (\pmb {Z}_{ij}-\bar{\pmb {Z}})^{T}\hat{\pmb {\beta }}-\sum _{r'\ne r}{\tilde{f}}_{r'}(X_{ijr'}). \end{aligned}$$
(2.9)

The estimator of \(f_{r}(x_{0r})\) is given by \({\hat{f}}_{r}(x_{0r})={\hat{b}}_{r0}=(1,0)\hat{\pmb {b}}_{r}\) for \(r=1,\ldots ,d_{2}\).
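A minimal sketch of this second-stage step, assuming the partial residuals \({\tilde{Y}}_{ijr}\) of (2.9) have been stacked into a vector (the Epanechnikov kernel is the choice we use in Sect. 4; any kernel satisfying assumption 7 of Sect. 3 would do):

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * np.maximum(1 - u**2, 0.0)

def local_linear(x0, X_r, Ytil, h):
    """Local linear estimate (2.8) of f_r at x0.
    X_r: (N,) covariate values; Ytil: (N,) partial residuals (2.9)."""
    u = (X_r - x0) / h
    w = epanechnikov(u)
    B = np.column_stack([np.ones_like(u), u])    # B_ijr = (1, (X - x0)/h)^T
    S = B.T @ (w[:, None] * B)                   # kernel-weighted Gram matrix
    b = np.linalg.solve(S, B.T @ (w * Ytil))     # \hat{b}_r
    return b[0]                                  # \hat{f}_r(x0) = \hat{b}_{r0}
```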

To implement our estimation method, appropriate values of \(M_{rN}\) and \(h_{r}\) are necessary. Since \(M_{rN}\) is mainly used to estimate the parameter vector \(\pmb {\beta }_{0}\), for simplicity we set \(M_{1N}=\cdots =M_{d_{2}N}=M_{N}\). The value of \(M_{N}\) can be selected by the following BIC criterion:

$$\begin{aligned} BIC(M_{N})=\log \Big \{\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}\big [Y_{ij}-{\bar{Y}}- (\pmb {Z}_{ij}-\bar{\pmb {Z}})^{T}\hat{\pmb {\beta }} -\sum _{r=1}^{d_2} \pmb {A}_{r}^{T}(X_{ijr})\tilde{\pmb {a}}_{r}\big ]^{2}\Big \} +\frac{M_{N}\log N}{2N}. \end{aligned}$$
(2.10)

Large values of BIC indicate poor fits, so \(M_{N}\) is chosen to minimize \(BIC(M_{N})\).
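A sketch of the resulting selection loop, reusing the hypothetical helpers `piecewise_basis` and `fit_plar` from the sketches above; the candidate grid and the use of the sample range for \([c_{r1},c_{r2}]\) are our assumptions:

```python
import numpy as np

def select_M(Y, Z, X, grid, p=1):
    """Choose M_N minimizing BIC(M_N) in (2.10).
    X: (N, d2) nonparametric covariates; grid: candidate values of M_N."""
    N, d2 = X.shape
    best_M, best_bic = None, np.inf
    for M in grid:                               # e.g. grid = range(2, 11)
        A = np.hstack([piecewise_basis(X[:, r], X[:, r].min(),
                                       X[:, r].max(), M, p)
                       for r in range(d2)])
        beta, a, mu = fit_plar(Y, Z, A)
        resid = (Y - Y.mean()) - (Z - Z.mean(axis=0)) @ beta - A @ a
        bic = np.log(np.mean(resid**2)) + M * np.log(N) / (2 * N)
        if bic < best_bic:
            best_M, best_bic = M, bic
    return best_M
```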

The bandwidth \(h_{r}\) can be selected by the following cross-validation (CV) score:

$$\begin{aligned} CV(h_{r})=\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}\Big (Y_{ij} -{\hat{\mu }}^{-ij}-\pmb {Z}_{ij}^{T}\hat{\pmb {\beta }}^{-ij}-\sum _{r'\ne r}{\tilde{f}}_{r'}^{-ij}(X_{ijr'})-{\hat{f}}_{r}^{-ij}(X_{ijr})\Big )^{2}, \end{aligned}$$
(2.11)

where \({\hat{\mu }}^{-ij}, \hat{\pmb {\beta }}^{-ij},{\tilde{f}}_{r'}^{-ij}(X_{ijr'})\) and \({\hat{f}}_{r}^{-ij}(X_{ijr})\) are computed with the (i,j)th observation removed.
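Conceptually this is a one-dimensional grid search; a sketch, where `refit` is a hypothetical callback that re-runs the whole two-stage fit with observation l removed at bandwidth h and returns the fitted value for that observation:

```python
import numpy as np

def select_h(Y, grid, refit):
    """Choose h_r minimizing the leave-one-out CV score (2.11).
    grid: candidate bandwidths; refit(l, h): fitted value for
    observation l from the fit that excludes observation l."""
    N = Y.size
    best_h, best_cv = None, np.inf
    for h in grid:
        cv = np.mean([(Y[l] - refit(l, h))**2 for l in range(N)])
        if cv < best_cv:
            best_h, best_cv = h, cv
    return best_h
```

Holding the first-stage quantities fixed across l is a common shortcut to reduce the computational cost, at the price of some approximation error.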

3 Asymptotic properties of the estimators

In this section, we describe the asymptotic properties of the estimators \(\hat{\pmb {\beta }}\) and \({\hat{f}}_{r}(x_{r}), r=1,\ldots ,d_{2}\). We first list the following assumptions.

  1. The random field \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij}): (i,j)\in {\mathbf {E}}^{2}\}\) is strictly stationary. The marginal density \(g_{r}(x_{r})\) of \(X_{ijr}\) is continuous and bounded away from 0 uniformly over \([c_{r1}, c_{r2}]\) for \(r=1,\ldots ,d_{2}\).

  2. \(f_{r}(x_{r})\in C^{p_{r}+1}[c_{r1}, c_{r2}]\) for \(r=1,\ldots ,d_{2}\).

  3. \(E\Vert \pmb {Z}_{ij}\Vert ^{4+2\delta }<\infty \) and \(E|\varepsilon _{ij}|^{4+2\delta }<\infty \) for some \(\delta >0\), where \(\varepsilon _{ij}=Y_{ij}-(\mu +\pmb {Z}_{ij}^{T}\pmb {\beta }_{0} +\sum _{r=1}^{d_{2}}f_{r}(X_{ijr}))\).

  4. The functions \(\psi (\cdot , \cdot )\) and \(\varphi (\cdot )\) satisfy \(\psi (m,n)\le \min (m,n)\) and

     $$\begin{aligned} \lim _{k\rightarrow \infty }k^{\gamma }\sum _{t=k}^{\infty }t[\varphi (t)]^{\delta /(2+\delta )} =0 \end{aligned}$$

     for some constant \(\gamma >(4+\delta )/(2+\delta )\).

  5. \(M_{N}^{4+2\delta /(4+\delta )}/N\rightarrow 0\) and \(NM_{rN}^{-4(p_{r}+1)}\rightarrow 0\), where \(M_{N}=\max _{1\le r\le d_{2}}M_{rN}\).

  6. \(\min \{m,n\}\rightarrow \infty \) and there exist two sequences of positive integer vectors, \({\mathbf {p}}_{m,n}=(p_{1},p_{2})\in {\mathbf {E}}^{2}\) and \({\mathbf {q}}_{m,n}=(q,q)\in {\mathbf {E}}^{2}\), with \(q\rightarrow \infty \), such that \(q/p_{1}\rightarrow 0\), \(q/p_{2}\rightarrow 0\), \(m/p_{1}\rightarrow \infty \), \(n/p_{2}\rightarrow \infty \), \(p_{N}=p_{1}p_{2}=o((Nh_{r})^{1/2})\) and \(N\varphi (q)\rightarrow 0\).

  7. The kernel function \(K(\cdot )\ge 0\) is a bounded symmetric function with compact support.

Assumption 1 is standard in this context; it has been used, for instance, by Gao et al. (2006) in the spatial setting. The constants \(\gamma \) and \(\delta \) in assumption 4 may be interpreted as two indices of the mixing dependence; generally, a larger \(\gamma \) and a smaller \(\delta \) indicate less dependence. If \(\varphi (t)=O(t^{-\kappa })\) for some \(\kappa >4(3+\delta )/\delta \) or \(\varphi (t)=O(e^{-\iota t})\) for some \(\iota >0\), then assumption 4 holds. Assumption 6 is used to derive the asymptotic normality of the estimators.

Let \({\mathcal {F}}\) denote the class of additive functions such that \(F(\pmb {x})\in {\mathcal {F}}\) if \(F(\pmb {x}) =\mu +\sum _{r=1}^{d_{2}}F_{r}(x_{r})\) and \(F_{r}(x_{r})\) satisfies assumption 2 and \(E(F_{r}(X_{r}))=0\) for \(r=1,\ldots ,d_{2}\). To obtain the asymptotic distribution of \(\hat{\pmb {\beta }}\), we first need to adjust for the dependence between \(\pmb {Z}_{ij}\) and \(\pmb {X}_{ij}\), which is a common complication in semiparametric models. Let

$$\begin{aligned} f_{k}(\pmb {X}_{ij})=\mu _{k}+\sum _{r=1}^{d_{2}}f_{kr}(X_{ijr}) =\arg \inf _{F(\pmb {x})\in {\mathcal {F}}}E[Z_{ijk}-F(\pmb {X}_{ij})]^{2} \end{aligned}$$

and \(\theta _{k}(\pmb {X}_{ij})=E(Z_{ijk}|\pmb {X}_{ij})\). Since

$$\begin{aligned} E[Z_{ijk}-f_{k}(\pmb {X}_{ij})]^{2} =E[Z_{ijk}-\theta _{k}(\pmb {X}_{ij})]^{2}+E[\theta _{k}(\pmb {X}_{ij}) -f_{k}(\pmb {X}_{ij})]^{2}, \end{aligned}$$

it follows that \(f_{k}(\pmb {X}_{ij})\) is the projection of \(\theta _{k}(\pmb {X}_{ij})\) onto the additive functional space \({\mathcal {F}}\) (under the \(L_{2}\)-norm). In other words, \(f_{k}(\pmb {X}_{ij})\) is the element of \({\mathcal {F}}\) closest to \(\theta _{k}(\pmb {X}_{ij})\) among all the functions in \({\mathcal {F}}\), for any \(k =1,\ldots ,d_{1}\). Let \(V_{ijk}=Z_{ijk}-f_{k}(\pmb {X}_{ij})\) and \(\pmb {V}_{ij}=(V_{ij1},\ldots ,V_{ijd_{1}})^{T}\). Set \(\pmb {\Gamma }=E(\pmb {V}_{ij}\pmb {V}_{ij}^{T})\) and \(\pmb {\Omega }=\sum _{i=-\infty }^{+\infty }\sum _{j=-\infty }^{+\infty }E[\pmb {V}_{00}\pmb {V}_{ij}^{T}\varepsilon _{00}\varepsilon _{ij}]\). The following theorem gives the asymptotic distribution of the estimator \(\hat{\pmb {\beta }}\).

Theorem 3.1

Suppose that assumptions 1–6 hold. Then

$$\begin{aligned} \sqrt{N}(\hat{\pmb {\beta }}-\pmb {\beta }_{0})\overset{d}{\rightarrow }N (\pmb {0},\pmb {\Gamma }^{-1}\pmb {\Omega }\pmb {\Gamma }^{-1}). \end{aligned}$$
(3.1)
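The sandwich covariance \(\pmb {\Gamma }^{-1}\pmb {\Omega }\pmb {\Gamma }^{-1}\) involves an infinite sum over spatial lags. Purely as an illustration (this plug-in device is ours and is not studied in the paper), one could truncate the sum at a lag L and use residual proxies for \(\pmb {V}_{ij}\) and \(\varepsilon _{ij}\):

```python
import numpy as np

def sandwich_cov(V, eps, coords, L):
    """Truncated plug-in for Gamma^{-1} Omega Gamma^{-1} in (3.1).
    V: (N, d1) proxies for V_ij, eps: (N,) residuals,
    coords: (N, 2) integer site coordinates, L: truncation lag."""
    N, d1 = V.shape
    Gamma = V.T @ V / N                           # estimate of Gamma
    site = {(int(a), int(b)): k for k, (a, b) in enumerate(coords)}
    Omega = np.zeros((d1, d1))
    for (i, j), k in site.items():
        for di in range(-L, L + 1):               # lags within distance L
            for dj in range(-L, L + 1):
                l = site.get((i + di, j + dj))
                if l is not None:
                    Omega += np.outer(V[k], V[l]) * eps[k] * eps[l] / N
    Gi = np.linalg.inv(Gamma)
    return Gi @ Omega @ Gi                        # divide by N for Var(beta_hat)
```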

Let \(\mu _{k}=\int x^{k}K(x)dx\), \(\nu _{k} =\int x^{k}K^{2}(x)dx\) for \(k=0, 1, \ldots \), and \(\sigma _{r}^{2}(x_{r})=E(\varepsilon _{ij}^{2}|X_{ijr}=x_{r})\). The following Theorem 3.2 gives the asymptotic distribution of the estimators \({\hat{f}}_{r}(x_{r})\) for \(r=1,\ldots ,d_{2}\).

Theorem 3.2

Suppose that assumptions 1–7 hold, that \(\sigma _{r}^{2}(x_{r})\) is continuous in some neighborhood of \(x_{0r}\), and that \(M_{N}^{1+\delta /(4+\delta )}h_{r}\le C_{2}\) for some positive constant \(C_{2}\). If \(x_{0r}\) is an interior point of \([c_{r1}, c_{r2}]\), then

$$\begin{aligned} \sqrt{Nh_{r}}\big ({\hat{f}}_{r}(x_{0r})-f_{r}(x_{0r})-\frac{\mu _{2}h_{r}^{2}}{2\mu _{0}}f''_{r}(x_{0r})\big )\overset{d}{\rightarrow }N\big (0,\frac{\nu _{0} \sigma _{r}^{2}(x_{0r})}{\mu _{0}^{2}g_{r}(x_{0r})}\big ). \end{aligned}$$
(3.2)

The result of Theorem 3.2 indicates that the estimator of \(f_{r}(x_{0r})\) in the partially linear additive model has the same asymptotic distribution as the corresponding estimator in spatial univariate nonparametric regression.
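For pointwise inference, (3.2) suggests the usual interval; a sketch in LaTeX, under the additional assumptions (ours, for illustration) that the bias is made negligible by undersmoothing, \(Nh_{r}^{5}\rightarrow 0\), and that consistent estimates \({\hat{\sigma }}_{r}^{2}(x_{0r})\) and \({\hat{g}}_{r}(x_{0r})\) are available:

$$\begin{aligned} {\hat{f}}_{r}(x_{0r})\pm z_{1-\alpha /2}\sqrt{\frac{\nu _{0}{\hat{\sigma }}_{r}^{2}(x_{0r})}{\mu _{0}^{2}{\hat{g}}_{r}(x_{0r})Nh_{r}}}. \end{aligned}$$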

4 Simulations

We now study the finite sample performance of the method proposed in Sect. 2 using simulated spatial data. We use the following spatial partially linear additive model. Let \(\{\varepsilon _{ij}^{(k)}: (i,j)\in {\mathbf {E}}^{2}\}, k=1,2,3\), be three mutually independent i.i.d. N(0, 1) white-noise processes and \(\{\varepsilon _{ij}^{(4)}=(\varepsilon _{ij1}^{(4)},\varepsilon _{ij2}^{(4)}, \varepsilon _{ij3}^{(4)})^{T}: (i,j)\in {\mathbf {E}}^{2}\}\) be an i.i.d. \(N(0, \pmb {\Theta })\) process with \(\pmb {\Theta }=(\theta _{kk'})_{3\times 3}\) and \(\theta _{kk'}=\exp (-|k-k'|/3)\). Let \(\{\varepsilon _{ij}^{(5)}: (i,j)\in {\mathbf {E}}^{2}\}\) be an i.i.d. N(0, 0.1) white-noise process. Let

$$\begin{aligned} Y_{ij}=\mu +Z_{ij1}\beta _{1}+Z_{ij2}\beta _{2}+Z_{ij3}\beta _{3}+\sum _{r=1}^{3}f_{r} (X_{ijr})+\varepsilon _{ij} \end{aligned}$$
(4.1)

with \(\mu =2.4\), \(\beta _{1}=1.6\), \(\beta _{2}=-2.5\), \(\beta _{3}=3.1\) and \(f_{r}(X_{r})=f_{0r}(X_{r})-E(f_{0r}(X_{r}))\), where

$$\begin{aligned} f_{01}(X_{1})=1.2(X_{1}-0.9)^{2}, \ \ f_{02}(X_{2})=-2.1\sin (X_{2}+1.5), \ \ f_{03}(X_{3})=1.8\cos (3X_{3}-0.6). \end{aligned}$$

We set \(Z_{ij1}=X_{ij1}^{2}-2X_{ij2}+\varepsilon _{ij}^{(1)}\), \(Z_{ij2}=X_{ij1}+X_{ij2}^{2}-0.6X_{ij3}^{2}+\varepsilon _{ij}^{(2)}\), \(Z_{ij3}=0.9+\sin (2X_{ij2}-X_{ij3})+\varepsilon _{ij}^{(3)}\), \(X_{ij1}=(U_{ij1}+U_{ij2})/2\), \(X_{ij2}=(U_{ij2}+U_{ij3})/2\) and \(X_{ij3}=(U_{ij1}+U_{ij3})/2\), where \(\{U_{ijr}: (i,j)\in {\mathbf {E}}^{2}\}, r=1,2,3\), are generated by the spatial autoregression

$$\begin{aligned} U_{ijr}=\sin (U_{(i-1)jr}+U_{i(j-1)r}+U_{(i+1)jr}+U_{i(j+1)r}) +\varepsilon _{ijr}^{(4)}, \end{aligned}$$

and \(\{\varepsilon _{ij}: (i,j)\in {\mathbf {E}}^{2}\}\) is generated by the spatial autoregression \(\varepsilon _{ij}=(\varepsilon _{(i-1)j}+\varepsilon _{i(j-1)}+\varepsilon _{(i+1)j} +\varepsilon _{i(j+1)})/5+\varepsilon _{ij}^{(5)}.\)
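A sketch of this data-generating process (our own reconstruction: the toroidal neighborhoods via `np.roll`, the burn-in margin, the number of fixed-point sweeps and the empirical centering of \(f_{0r}\) are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(m, n, burn=75, iters=50):
    """Generate (Y, Z, X) from model (4.1) on an enlarged grid,
    keeping the inner m x n block after a burn-in margin."""
    M, Nn = m + 2 * burn, n + 2 * burn
    Theta = np.exp(-np.abs(np.subtract.outer(np.arange(3), np.arange(3))) / 3)
    e4 = rng.multivariate_normal(np.zeros(3), Theta, size=(M, Nn))
    U = np.zeros((M, Nn, 3))
    for _ in range(iters):                       # iterate the SAR for U
        nb = (np.roll(U, 1, 0) + np.roll(U, -1, 0)
              + np.roll(U, 1, 1) + np.roll(U, -1, 1))
        U = np.sin(nb) + e4
    eps = np.zeros((M, Nn))
    e5 = rng.normal(0, np.sqrt(0.1), (M, Nn))
    for _ in range(iters):                       # iterate the SAR for eps
        nb = (np.roll(eps, 1, 0) + np.roll(eps, -1, 0)
              + np.roll(eps, 1, 1) + np.roll(eps, -1, 1))
        eps = nb / 5 + e5
    U = U[burn:burn + m, burn:burn + n]
    eps = eps[burn:burn + m, burn:burn + n]
    X1, X2 = (U[..., 0] + U[..., 1]) / 2, (U[..., 1] + U[..., 2]) / 2
    X3 = (U[..., 0] + U[..., 2]) / 2
    Z1 = X1**2 - 2 * X2 + rng.normal(size=(m, n))
    Z2 = X1 + X2**2 - 0.6 * X3**2 + rng.normal(size=(m, n))
    Z3 = 0.9 + np.sin(2 * X2 - X3) + rng.normal(size=(m, n))
    f1 = 1.2 * (X1 - 0.9)**2
    f2 = -2.1 * np.sin(X2 + 1.5)
    f3 = 1.8 * np.cos(3 * X3 - 0.6)
    # center by empirical means as a stand-in for E(f_{0r}(X_r))
    f1, f2, f3 = f1 - f1.mean(), f2 - f2.mean(), f3 - f3.mean()
    Y = 2.4 + 1.6 * Z1 - 2.5 * Z2 + 3.1 * Z3 + f1 + f2 + f3 + eps
    return Y, np.stack([Z1, Z2, Z3], -1), np.stack([X1, X2, X3], -1)
```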

Data were simulated from this model over a rectangular domain of \(m\times n\) sites, more precisely over the grid \(\{(i,j): 76\le i\le 75+m,76\le j\le 75+n\}\), for various values of m and n. Each replication was obtained iteratively by the steps given in Hallin et al. (2004). 500 spatial data sets were independently generated. For each simulated data set, the estimators of \(\beta _{r},r=1,2,3\), and \(\mu \) were computed by (2.2) with \(p_{r}=1\) and by (2.6). The number of subintervals \(M_{1N}=M_{2N}=M_{3N}=M_{N}\) was determined by the BIC criterion (2.10). The estimators of \(f_{r}(x_{r}),r=1,2,3\), were computed by (2.8) with the Epanechnikov kernel. The bandwidth \(h_{r}\) was selected by the "leave-one-out" cross-validation procedure (2.11).

Table 1 summarizes the mean squared errors (MSE) of the estimators \({\hat{\mu }}\) and \({\hat{\beta }}_{r}\) for \(r=1,2,3\) and the mean integrated squared errors (MISE) of the estimators \({\hat{f}}_{r}(x_{r})\) for \(r=1,2,3\) based on 500 simulations for different m and n. Figure 1 depicts the actual functions \(f_{r}(x_{r}),r=1,2,3\), and the mean estimated curves over 500 simulations with \(m=20,n=20\), together with their \(95\%\) pointwise confidence bands. We see from Table 1 and Fig. 1 that our estimation procedure gives satisfactory results even for small m and n.

Table 1 MSEs and MISEs of the estimators based on regular grid data
Fig. 1

The actual and the mean estimated curves for \(f_{r}(x_{r}),r=1,2,3\), in model (4.1) with \(m=20,n=20\), and the 95% pointwise confidence bands. —, true curves; - - -, mean estimated curves; ..., 95% pointwise confidence bands

We now compare the proposed method (PM) with the B-spline method (BM) given in Huang et al. (2010). In the B-spline method, cubic splines with equally spaced knots are used to approximate the additive functions \(f_{r}(x_{r}), r=1,2,3\), and the smoothing parameter \(K_{n}\) was determined by the BIC criterion. For different m and n, Table 2 reports the MSEs of the parametric estimators \({\hat{\mu }}\) and \({\hat{\beta }}_{r},r=1,2,3\), and the mean of the weighted average squared errors \(WASE_{r}\) of the estimators \({\hat{f}}_{r}(x_{r}),r=1,2,3\), based on 500 simulations, where \(WASE_{r}\) is defined as

$$\begin{aligned} WASE_{r}=\frac{1}{mn+1-2[0.025mn]}\sum _{i=[0.025mn]}^{mn -[0.025mn]}\frac{[{\hat{f}}_{r}(x_{(ir)})-f_{r}(x_{(ir)})]^{2}}{[\mathrm{range}(f_{r})]^{2}}, \end{aligned}$$

\(x_{(1r)}\le x_{(2r)}\le \cdots \le x_{(mnr)}\) are the ordered values of \(\{x_{ijr}: 76\le i\le 75+m,76\le j\le 75+n\}\) and range\((f_{r})\) is the range of the function \(f_{r}(x_{r})\). Since the data near the boundary points are sparse, the estimators near the boundary become poor, so we removed these points from \(WASE_{r}\). Based on 500 simulations with \(m=20,n=10\), Fig. 2 displays the boxplots for the squared errors (SE) of the parametric estimators \({\hat{\mu }}\) and \({\hat{\beta }}_{r},r=1,2,3\), and the \(WASE_{r}\) of the estimators \({\hat{f}}_{r}(x_{r}),r=1,2,3\). We see from Table 2 and Fig. 2 that there are no obvious differences between the parametric estimators of the two methods, but the estimators of the additive components \(f_{r}(x_{r}), r=1,2,3\), obtained by the proposed method outperform those obtained by the B-spline method. Because the number of knots is the same for all the B-spline functions approximating the additive components \(f_{r}(x_{r}), r=1,2,3\), some estimators become inefficient when the additive functions have different degrees of smoothness. In the proposed method, we first obtain the estimator of \(\pmb {\beta }_{0}\) and the piecewise polynomial estimators of \(f_{r}(x_{r}), r=1,2,3\); the local linear estimator of each \(f_{r}(x_{r})\) is then obtained by solving a univariate regression problem, and the bandwidth \(h_{r}\) is easily chosen.
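For reference, a sketch of the \(WASE_{r}\) computation, assuming `f_hat` and `f_true` are vectorized callables and approximating range\((f_{r})\) by the range over the sampled points:

```python
import numpy as np

def wase(f_hat, f_true, x):
    """WASE_r as defined above: trim [0.025 mn] order statistics at each
    boundary and normalize by the squared range of f_r."""
    x = np.sort(x)
    mn = x.size
    t = int(0.025 * mn)                     # [0.025 mn]
    xs = x[max(t - 1, 0):mn - t]            # x_(t), ..., x_(mn - t)
    err = (f_hat(xs) - f_true(xs))**2       # mean over mn + 1 - 2t points
    return err.mean() / np.ptp(f_true(x))**2
```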

Table 2 Comparison of the proposed method with B-spline method
Fig. 2

Box plots for SE and WASE. Label 1 is boxplot for the proposed method and Label 2 is boxplot for B-spline method. a is boxplot for \({\hat{\mu }}\), b is boxplot for \({\hat{\beta }}_{1}\), c is boxplot for \({\hat{\beta }}_{2}\), d is boxplot for \({\hat{\beta }}_{3}\), e is boxplot for \({\hat{f}}_{1}(x_{1})\), f is boxplot for \({\hat{f}}_{2}(x_{2})\) and g is boxplot for \({\hat{f}}_{3}(x_{3})\)

To investigate the performance of the estimators under different \(M_{N}\), Table 3 displays the simulation results based on 500 simulations with \(m=20, n=10\). Table 3 shows that the MSEs for the parametric estimators and the WASEs for the nonparametric estimators are not sensitive to the choice of \(M_{N}\), and these estimators are efficient over broad ranges of \(M_{N}\). These results also illustrate that the above BIC smoothing parameter selection procedure generally gives satisfactory results.

Table 3 The MSEs and WASEs for the estimators with different \(M_{N}\)

In order to investigate the influence of the degree of the piecewise polynomial on the estimators, Table 4 reports the simulation results under \(p_{r}=1,2,3\), based on 500 simulations, corresponding to piecewise linear, piecewise quadratic and piecewise cubic polynomials, respectively. We see from Table 4 that there is no essential difference between the three estimates; of the three, the piecewise linear fit is the easiest to implement.

Table 4 The MSEs and WASEs for the estimators with different \(p_{r}\)

In order to investigate the performance of our estimators on irregular grid data, we first generate data over the rectangular grid of \(m\times n\) sites. We then randomly select half of the data to fit the model, that is, each site has probability 0.5 of being selected, so that the final sample size is about mn/2. For different m and n, Table 5 summarizes the MSEs of the parametric estimators and the MISEs of the nonparametric estimators \({\hat{f}}_{r}(x_{r})\) for \(r=1,2,3\) based on 500 simulations. Comparing Table 5 with Table 1, we see that there is no clear difference between the two sets of estimators of \(\mu \) and \(\beta _{r}, r=1,2,3\). When the sample size is small (N=100, 200), the estimators of \(f_{r}(x_{r})\) for \(r=1,2,3\) based on regular grid data are slightly better than those based on irregular grid data, but when the sample size is large (N=300, 400, 600), there is no clear difference between the two sets of estimators of \(f_{r}(x_{r})\) for \(r=1,2,3\). All this shows that our estimation method is also efficient for irregular grid data.
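The irregular design is straightforward to reproduce; a sketch, reusing the hypothetical `simulate` generator from the sketch in this section:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 10
Y, Z, X = simulate(m, n)            # hypothetical generator sketched above

# Keep each site independently with probability 0.5, giving an irregular
# design with roughly mn/2 observed sites, as described above.
keep = rng.random((m, n)) < 0.5
Y_irr, Z_irr, X_irr = Y[keep], Z[keep], X[keep]
```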

Table 5 MSEs and MISEs of the estimators based on irregular grid data

5 A real data example

In this section, we apply our methods to analyze a real estate data set covering 203 second-, third- and fourth-tier cities in China in 2016. Our purpose is to study the relationship between urban housing prices and their influencing factors. The response variable is the urban housing price (Price). The covariates of primary interest include the average annual income of urban residents (Income), urban category, urban population, urban GDP, the ratio of no house and one house (RNO), bank interest rate (BIR), urban livability index (ULI), urban comprehensive competitiveness (UCC) and urban development index (UDI). Among these covariates, the values of some variables such as population and GDP are very large, whereas those of variables such as ULI and UCC are small. We therefore first rescaled the data: each observation was divided by the maximum of all observations for that variable, so that the maximum of the rescaled data for each variable is 1.

We first consider the following partially linear additive regression

$$\begin{aligned} \begin{array}{ll} \log (Price) &{} =\mu +\beta _{1}Z_{1}+\beta _{2}Z_{2}+f_{1}(Income)+f_{2}(Population)+f_{3}(GDP) \\ &{} +f_{4}(RNO)+f_{5}(BIR)+f_{6}(ULI)+f_{7}(UCC)+f_{8}(UDI)+\varepsilon , \\ \end{array} \nonumber \\ \end{aligned}$$
(5.1)

where \(Z_{1}=1, Z_{2}=0\) stands for a second-tier city, \(Z_{1}=0, Z_{2}=1\) stands for a third-tier city and \(Z_{1}=0, Z_{2}=0\) stands for a fourth-tier city.

The estimators of \(\beta _{1}\) and \(\beta _{2}\) are computed by (2.2) with \(p_{r}=1\), and the estimator of \(\mu \) is computed by (2.6). The number of subintervals \(M_{N}\) is determined by the BIC criterion given in Sect. 2. The estimators of \(f_{r}(\cdot ),r=1,\ldots ,8\), are computed by (2.8) with the Epanechnikov kernel. The bandwidths \(h_{r}\) are selected by the "leave-one-out" cross-validation procedure. Table 6 exhibits the parametric estimators, and Fig. 3 shows the estimated curves of the additive components and their \(95\%\) pointwise confidence intervals. The fact that \({\hat{\beta }}_{1}>{\hat{\beta }}_{2}>0\) in Table 6 indicates that housing prices in second-tier cities are higher than those in third-tier cities, which in turn are higher than those in fourth-tier cities. We see from Fig. 3 that the estimated curves \({\hat{f}}_{1}(Income)\), \({\hat{f}}_{4}(RNO)\) and \({\hat{f}}_{8}(UDI)\) show rising trends as Income, RNO and UDI increase, while \({\hat{f}}_{5}(BIR)\) shows a decreasing trend as BIR increases. Figure 3 shows the subtle impact of the urban population and the urban livability index (ULI) on housing prices: prices overall decline as the urban population increases, while they overall rise with the urban livability index. This can be explained by the fact that some central and western cities, such as Fuyang, Shangqiu, Nanyang and Handan, have large populations but low housing prices. Figure 3 shows that \({\hat{f}}_{3}(GDP)\) first decreases and then rises; this is due to lower house prices in some central and western cities and higher prices in some eastern cities with high GDP. Figure 3 also shows that \({\hat{f}}_{7}(UCC)\) first rises and then decreases.

Table 6 The parametric estimators for model (5.1). The values in parentheses are standard errors
Fig. 3

The estimated curves (solid line) and their \(95\%\) pointwise confidence intervals (dotted lines)

We find from Fig. 3 that the estimated curves of \(f_{2}(Population),f_{4}(RNO)\) and \(f_{6}(ULI)\) are approximately straight lines. We therefore further consider the following partially linear additive regression

$$\begin{aligned} \begin{array}{ll} \log (Price) &{} =\mu +\beta _{1}Z_{1}+\beta _{2}Z_{2}+\beta _{3}Population+\beta _{4}RNO+\beta _{5}ULI \\ &{} +f_{1}(Income)+f_{2}(GDP)+f_{3}(BIR)+f_{4}(UCC)+f_{5}(UDI)+\varepsilon . \\ \end{array} \end{aligned}$$
(5.2)

Table 7 displays the parametric estimators, and Fig. 4 shows the estimated curves of the additive components and their \(95\%\) pointwise confidence intervals in model (5.2). We see from Table 7 that \(\log (Price)\) is negatively associated with the urban population and positively associated with the ratio of no house and one house and with the urban livability index. Since the absolute values of \({\hat{\beta }}_{3}\) and \({\hat{\beta }}_{5}\) are small, these covariates have only small effects on \(\log (Price)\). Figure 4 shows that the estimated curves in model (5.2) are similar to those in model (5.1).

Table 7 The parametric estimators for model (5.2). The values in parentheses are standard errors
Fig. 4

The estimated curves (solid line) and their \(95\%\) pointwise confidence intervals (dotted lines)

To evaluate the prediction performance of models (5.1) and (5.2), we applied leave-one-out cross-validation to the data; i.e., when predicting the housing price for the ith city, we omitted the data for this city when fitting the models. Figure 5 displays the boxplots of the absolute prediction errors \(|\widehat{\log (Price_{i})}-\log (Price_{i})|,\ i=1,\ldots ,203,\) for models (5.1) and (5.2), where \(Price_{i}\) denotes the housing price for the ith city. The mean values of these errors for the two models are 0.3050 and 0.2031, respectively. We see from Fig. 5 and the mean absolute prediction errors that model (5.2) performs better than model (5.1) in prediction.

Fig. 5

Boxplots of the absolute prediction errors \(|\widehat{\log (Price_{i})}-\log (Price_{i})|,\ i=1,\ldots ,203\), for the two models. Here 1 is the boxplot for model (5.1) and 2 is the boxplot for model (5.2)

6 Conclusion and future studies

We have proposed a new estimation procedure for partially linear additive regressions with spatial data using piecewise polynomials combined with local linear methods. The proposed method is suitable for high-dimensional data; it requires neither solving a constrained minimization problem nor running iterative algorithms. Under mild regularity assumptions, the asymptotic distribution of the estimator of the unknown parameter vector is established, and the asymptotic distributions of the estimators of the unknown functions are also derived.

Our method focuses on data on a regular grid. The simulation studies in Sect. 4 demonstrate that the proposed procedure is also efficient for irregularly positioned data, but the relevant theoretical properties remain to be developed and are left for future research. As pointed out by a referee, geoadditive models in which the spatial effect is modeled via a bivariate function have been proposed and investigated in the literature. Our method can be extended to geoadditive models. We leave such extensions to future work.