1 Introduction

Spatial data, collected at different sites on the surface of the earth, arise in various areas of research, including econometrics, epidemiology, environmental science, image analysis, oceanography and many others. Many authors (e.g. Ripley 1981; Cressie 1991) have studied parametric methods for statistical inference in spatial models. However, one often encounters situations where a particular parametric model cannot be adopted with confidence, and a nonparametric method is then used as an alternative. In the last two decades, efforts have been made in the literature to explore nonlinear relationships in spatial data. Among them, Tran (1990), Hallin et al. (2001) and Lee et al. (2004) studied various asymptotic properties of density estimators for spatial processes. Hallin et al. (2004) established weak consistency and asymptotic normality for local linear regression estimation. Gao et al. (2006) developed the properties of the local linear kernel estimator for a semiparametric spatial regression model. Al-Sulami et al. (2017) developed a two-step estimation procedure for a semiparametric spatial time-series autoregressive model and established the asymptotic properties of the estimators. Nandy et al. (2017) developed a regularized variable selection technique for building a spatial additive model. The focus of our work is on the estimation and inference of partially linear additive regression with spatial data.

Let \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij})\}\) be a random field indexed by \((i,j)\in {\mathbf {E}}^{2}\), where \({\mathbf {E}}=\{0,\pm 1, \pm 2, \cdots \}\), and \(Y_{ij}\), with values in R, \(\pmb {Z}_{ij}=(Z_{ij1},\ldots ,Z_{ijd_{1}})^{T}\), with values in \(R^{d_{1}}\), and \(\pmb {X}_{ij}=(X_{ij1},\ldots ,X_{ijd_{2}})^{T}\), with values in \(R^{d_{2}}\), are defined over some probability space \((\Omega ,{\mathcal {F}},P)\). A point \((i,j)\in {\mathbf {E}}^{2}\) is referred to as a site. Let S and \(S'\) be two sets of sites. The Borel fields \(\mathcal {B(S)}={\mathcal {B}}((Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij}): (i,j)\in S)\) and \(\mathcal {B(S')}={\mathcal {B}}((Y_{ij}, \pmb {Z}_{ij},\pmb {X}_{ij}): (i,j)\in S')\) are the \(\sigma \)-fields generated by the random vectors \((Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij})\) with \((i,j)\) ranging over S and \(S'\), respectively. Let \(d(S,S')\) be the Euclidean distance between S and \(S'\). We will assume that \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij})\}\) satisfies the following mixing condition as defined in the literature (cf. Tran 1990; Hallin et al. 2004): there exists a function \(\varphi (t)\downarrow 0\) as \(t\rightarrow \infty \), such that whenever \(S, S'\subset {\mathbf {E}}^{2}\),

$$\begin{aligned} \begin{array}{ll} \alpha ({\mathcal {B}}(S),{\mathcal {B}}(S')) &{} =\sup \{|P(AB)-P(A)P(B)|, A\in {\mathcal {B}}(S),B\in {\mathcal {B}}(S')\} \\ &{}\le \psi (Card(S),Card(S'))\varphi (d(S,S')), \end{array} \end{aligned}$$
(1.1)

where Card(S) denotes the cardinality of S, and \(\psi \) is a symmetric positive function nondecreasing in each variable. In particular, if \(\psi (\infty ,\infty )\le C_{0}\) for some positive constant \(C_{0}\), then \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij})\}\) is \(\alpha \)-mixing (or strongly mixing). Introduced by Rosenblatt (1956), \(\alpha \)-mixing dependence is a property shared by many time series models, such as autoregressive linear processes (see Athreya and Pantula 1986).

A problem crucial to a number of applications is spatial regression, where the influence of the vectors \(\pmb {X}_{ij}\) and \(\pmb {Z}_{ij}\) on some response variable \(Y_{ij}\) is to be studied in a context of complex spatial dependence. In many practical situations, it is desirable to model multiple covariates nonparametrically. However, it is well known that multivariate nonparametric estimation suffers from the curse of dimensionality. A widely used approach for dimensionality reduction is to consider an additive model for the nonparametric part of the regression function in the partially linear model, which in turn results in the partially linear additive model. In this paper, we approximate the conditional mean function \(E(Y_{ij}|\pmb {Z}_{ij},\pmb {X}_{ij})\) by a partially linear additive function of the form

$$\begin{aligned} \Psi (\pmb {Z}_{ij},\pmb {X}_{ij}) =\mu +\pmb {Z}_{ij}^{T}\pmb {\beta }_{0}+\sum _{r=1}^{d_{2}}f_{r}(X_{ijr}), \end{aligned}$$
(1.2)

such that \(E(Y_{ij}-\Psi (\pmb {Z}_{ij},\pmb {X}_{ij}))^{2}\) or, equivalently, \(E(E(Y_{ij}|\pmb {Z}_{ij},\pmb {X}_{ij}) -\Psi (\pmb {Z}_{ij},\pmb {X}_{ij}))^{2}\) is minimized over the class of partially linear additive functions of the form \(\Psi (\pmb {Z}_{ij},\pmb {X}_{ij})\), where \(\mu \) is an unknown constant, \(\pmb {\beta }_{0}=(\beta _{01},\ldots ,\beta _{0d_{1}})^{T}\) is a \(d_{1}\)-dimensional vector of unknown parameters, and \(f_{r}(x_{r})\), \(x_{r}\in [c_{r1},c_{r2}]\), \(r=1,\ldots ,d_{2}\), are unknown functions. For identifiability, we assume that \(E[f_{r}(X_{r})]=0\) for all \(1\le r\le d_{2}\).

Additive models are a special case of the more general projection pursuit regression model of Friedman and Stuetzle (1981). They are useful for multivariate data analysis because they avoid the curse of dimensionality while retaining great flexibility in the regression function. However, in many empirical situations the mean of the outcome is assumed to depend on some covariates parametrically and on other covariates nonparametrically. Partially linear additive regression (PLAR), which contains both linear and nonlinear additive components, is more flexible than stringent linear models and more parsimonious than general nonparametric regression models. PLAR has been studied by many authors; see, for example, Li (2000), Fan and Li (2003), Liang et al. (2008), You and Zhou (2013), Cheng et al. (2014), Lian et al. (2015), Sherwood and Wang (2016) and Yang et al. (2019), among others.

Gao et al. (2006) used the marginal integration technique to estimate the additive components in PLAR with spatial data. However, this method is not suitable for a high-dimensional vector \(\pmb {X}_{ij}\). In this paper, we develop a new method to estimate the additive components \(f_{r}(x_{r})\) in (1.2). Our method has the following advantages compared with existing methods in the literature: (1) unlike the spline approach (Stone 1985; Huang et al. 2010), which requires solving a constrained minimization problem, and backfitting procedures (Breiman and Friedman 1985; Buja et al. 1989), which require iterative algorithms, our estimators of the additive functions have closed-form expressions; no constrained minimization and no iteration are needed, resulting in fast and efficient estimation; (2) since there is only one smoothing parameter in the B-spline procedure, the estimators of some functions become inefficient when the additive functions have different degrees of smoothness; in our proposed method, we first obtain the estimator of \(\pmb {\beta }_{0}\) and the piecewise polynomial estimators of \(f_{r}(x_{r}), r=1,\ldots , d_{2}\), then the local linear estimator of each \(f_{r}(x_{r})\) is obtained by solving a univariate regression problem, so the bandwidth \(h_{r}\) is easily chosen; (3) unlike the marginal integration approach of Gao et al. (2006), our estimators behave well when \(\pmb {X}_{ij}\) is a high-dimensional vector and the sample size is not large, as can be seen from the simulations and application below; (4) under mild regularity conditions, the asymptotic properties of the proposed estimators are established.

In our spatial PLAR, spatial information enters the regression model through the spatial dependence of the data \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij}),(i,j)\in {\mathbf {E}}^{2}\}\), where \(\pmb {Z}_{ij}\) and \(\pmb {X}_{ij}\) may contain both exogenous and endogenous variables, that is, neighboring values of \(Y_{ij}\). Moreover, a component \(Z_{ijr}\) of \(\pmb {Z}_{ij}\) or a component \({X}_{ijr}\) of \(\pmb {X}_{ij}\) may itself be a linear combination of neighboring values of \(Y_{ij}\). There is another stream of spatial regression modeling in the literature in which the spatial effect is modeled by a bivariate function; for example, Ramsay (2002) used a deterministic smooth surface function to describe the variations and connections among values at different locations, and Yu et al. (2020) introduced a class of generalized geoadditive models, a synthesis of geostatistics and generalized additive models, where the effects of explanatory variables are modeled with additive univariate functions and the spatial effect is modeled via a bivariate function.

The paper is organized as follows. Section 2 describes the estimation method. Section 3 presents the asymptotic theory of our estimators. In Sect. 4, we conduct simulation studies to examine the finite-sample performance of the proposed procedures. In Sect. 5, the proposed methods are illustrated by analyzing a real estate data set. Technical proofs are relegated to the Appendix.

2 Estimation method

As stated in Sect. 1, the conditional mean function \(E(Y_{ij}|\pmb {Z}_{ij},\pmb {X}_{ij})\) is approximated by a partially linear additive function of the form \(\Psi (\pmb {Z}_{ij},\pmb {X}_{ij}) =\mu +\pmb {Z}_{ij}^{T}\pmb {\beta }_{0}+\sum _{r=1}^{d_{2}}f_{r}(X_{ijr})\) such that

$$\begin{aligned} E(Y_{ij}-\Psi (\pmb {Z}_{ij},\pmb {X}_{ij}))^{2}=E[Y_{ij}-(\mu +\pmb {Z}_{ij}^{T} \pmb {\beta }_{0}+\sum _{r=1}^{d_{2}}f_{r}(X_{ijr}))]^{2} \end{aligned}$$

is minimized over a class of partially linear additive functions of the form \(\Psi (\pmb {Z}_{ij},\pmb {X}_{ij})\) with \(E(f_{r}(X_{ijr}))=0\), which implies that \( \mu =E(Y_{ij})-E(\pmb {Z}_{ij})^{T}\pmb {\beta }_{0}\). Since

$$\begin{aligned} \begin{array}{l} E[Y_{ij}-(\mu +\pmb {Z}_{ij}^{T}\pmb {\beta }_{0}+\sum _{r=1}^{d_{2}}f_{r}(X_{ijr}))]^{2} \\ =E(E([Y_{ij}-E(Y_{ij}|\pmb {X}_{ij})-(\pmb {Z}_{ij}-E(\pmb {Z}_{ij}|\pmb {X}_{ij}))^{T} \pmb {\beta }_{0}]^{2}|\pmb {X}_{ij})) \\ +E[E(Y_{ij}|\pmb {X}_{ij})-\mu -E(\pmb {Z}_{ij}|\pmb {X}_{ij})^{T}\pmb {\beta }_{0} -\sum _{r=1}^{d_{2}}f_{r}(X_{ijr})]^{2}, \end{array} \end{aligned}$$

we then have

$$\begin{aligned} \pmb {\beta }_{0}=(E[(\pmb {Z}_{ij}-E(\pmb {Z}_{ij}|\pmb {X}_{ij}))(\pmb {Z}_{ij} -E(\pmb {Z}_{ij}|\pmb {X}_{ij}))^{T}])^{-1}E[(\pmb {Z}_{ij}-E(\pmb {Z}_{ij}|\pmb {X}_{ij}))(Y_{ij}-E(Y_{ij}|\pmb {X}_{ij}))] \end{aligned}$$

provided that the inverse exists, and \(\sum _{r=1}^{d_{2}}f_{r}(X_{ijr})\) is the projection of \(E(Y_{ij}|\pmb {X}_{ij})-\mu -E(\pmb {Z}_{ij}|\pmb {X}_{ij})^{T}\pmb {\beta }_{0}\) onto the space of additive functions of \(X_{ijr},r=1,\ldots ,d_{2}\).

Assume that we have observations \((Y_{ij}, \pmb {Z}_{ij},\pmb {X}_{ij})\) for \(1\le i\le m\) and \(1\le j\le n\). The total sample size is thus \(N=m\times n\). We first estimate \(\pmb {\beta }_{0}\). In order to approximate \(f_{r}(x_{r})\) for \(x_{r}\in [c_{r1},c_{r2}]\), we construct piecewise polynomial estimators of \(f_{r}(x_{r})\) of degree \(p_{r}\). We split \([c_{r1},c_{r2}]\) equally into \(M_{rN}\) subintervals, so that the length of every subinterval is \(2h_{r0}=(c_{r2}-c_{r1})/M_{rN}\). Let \(I_{r\nu }=[c_{r1}+2(\nu -1)h_{r0},c_{r1}+2\nu h_{r0})\) for \(1\le \nu \le M_{rN}-1\) and \(I_{rM_{rN}}=[c_{r2}-2h_{r0},c_{r2}]\). Let \(x_{r\nu }\) denote the center of the interval \(I_{r\nu }\) and \(\chi _{r\nu }\) denote the indicator function of \(I_{r\nu }\), so that \(\chi _{r\nu }(x_{r})=1\) if \(x_{r}\in I_{r\nu }\) and \(\chi _{r\nu }(x_{r})=0\) otherwise.

Denote \(\breve{A}_{r\nu k}(x_{r})=\chi _{r\nu }(x_{r})[(x_{r}-x_{r\nu })/h_{r0}]^{k}\) for \(k=0,1,\ldots ,p_{r}\); \(\nu =1,\cdots ,M_{rN}\); \(r=1,\ldots ,d_{2}\). We use \(\breve{f}_{r}(x_{r})=\sum _{\nu =1}^{M_{rN}}\sum _{k=0}^{p_{r}}a_{r\nu k}\breve{A}_{r\nu k}(x_{r})\) to approximate \(f_{r}(x_{r})\). Since \(E[f_{r}(X_{r})]=0\), we also impose \(E[\breve{f}_{r}(X_{r})]=0\); this can be done by replacing \(\breve{A}_{r\nu k}(x_{r})\) with \(\breve{A}_{r\nu k}(x_{r})-\chi _{r\nu }(x_{r})E[\breve{A}_{r\nu k}(X_{r})]/E[\chi _{r\nu }(X_{r})]\) for \(k=1,\ldots ,p_{r}\) and setting \(\sum _{\nu =1}^{M_{rN}}a_{r\nu 0}E[\chi _{r\nu }(X_{r})]=0\), that is, \(a_{rM_{rN}0}=-\sum _{\nu =1}^{M_{rN}-1}a_{r\nu 0}E[\chi _{r\nu }(X_{r})]/E[\chi _{rM_{rN}}(X_{r})]\). Since \(E[\breve{A}_{r\nu k}(X_{r})]\) and \(E[\chi _{r\nu }(X_{r})]\) are unknown, we use

$$\begin{aligned} {\bar{A}}_{r\nu k}=\sum _{i=1}^{m}\sum _{j=1}^{n}\chi _{r\nu }(X_{ijr})[(X_{ijr} -x_{r\nu })/h_{r0}]^{k}/\Big [\sum _{i=1}^{m}\sum _{j=1}^{n}\chi _{r\nu }(X_{ijr})\Big ] \end{aligned}$$

to replace \(E[\breve{A}_{r\nu k}(X_{r})]/E[\chi _{r\nu }(X_{r})]\). Denote

$$\begin{aligned} \begin{array}{l} A_{r\nu 0}(x_{r})=\chi _{r\nu }(x_{r})-\frac{\sum _{i=1}^{m}\sum _{j=1}^{n}\chi _{r\nu } (X_{ijr})}{\sum _{i=1}^{m}\sum _{j=1}^{n}\chi _{rM_{rN}}(X_{ijr})}\chi _{rM_{rN}}(x_{r}), \ \ \ \nu =1,\ldots ,M_{rN}-1, \\ A_{r\nu k}(x_{r})=\chi _{r\nu }(x_{r})\{[(x_{r}-x_{r\nu })/h_{r0}]^{k}-{\bar{A}}_{r\nu k}\}, \ \ \ k=1,\ldots ,p_{r}, \\ \pmb {A}_{r\nu }(x_{r})=(A_{r\nu 0}(x_{r}),A_{r\nu 1}(x_{r}),\cdots ,A_{r\nu p_{r}}(x_{r}))^{T}, \ \ \nu =1,\cdots ,M_{rN}-1, \\ \pmb {A}_{rM_{rN}}(x_{r})=(A_{rM_{rN} 1}(x_{r}),\cdots ,A_{rM_{rN} p_{r}}(x_{r}))^{T}, \\ \pmb {A}_{r}(x_{r})=(\pmb {A}_{r1}(x_{r})^{T},\cdots ,\pmb {A}_{rM_{rN}}(x_{r})^{T})^{T}, \ \ r=1,\ldots ,d_{2}. \end{array} \end{aligned}$$
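To make the construction above concrete, the following is a minimal Python sketch of the design block \(\pmb {A}_{r}(x_{r})\) for a single covariate (our own illustration; the helper name `piecewise_basis` is an assumption, and we assume every subinterval contains at least one observation, which is plausible under assumption 1 of Sect. 3). The columns are grouped by type rather than blockwise in \(\nu \), but they span the same space.

```python
import numpy as np

def piecewise_basis(X_r, c1, c2, M, p):
    """Centered piecewise polynomial design block for one covariate:
    M equal subintervals of [c1, c2], polynomials of degree p."""
    h0 = (c2 - c1) / (2 * M)                     # half-length of a subinterval
    # subinterval index in {0, ..., M-1}; the right endpoint joins the last cell
    nu = np.minimum(((X_r - c1) / (2 * h0)).astype(int), M - 1)
    centers = c1 + (2 * np.arange(M) + 1) * h0   # cell centers x_{r nu}
    counts = np.bincount(nu, minlength=M).astype(float)
    cols = []
    # constant columns chi_nu - (n_nu / n_M) chi_M for nu = 1, ..., M-1; this
    # builds the empirical mean-zero constraint into the step part
    for v in range(M - 1):
        cols.append((nu == v).astype(float)
                    - (counts[v] / counts[M - 1]) * (nu == M - 1))
    # polynomial columns, each centered by its within-cell sample mean
    for v in range(M):
        chi = (nu == v).astype(float)
        t = (X_r - centers[v]) / h0
        for k in range(1, p + 1):
            Abar = np.sum(chi * t**k) / counts[v]    # \bar{A}_{r nu k}
            cols.append(chi * (t**k - Abar))
    return np.column_stack(cols)                     # N x (M(p+1) - 1)
```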

Let \(\pmb {a}_{r\nu }=(a_{r\nu 0},a_{r\nu 1},\cdots ,a_{r\nu p_{r}})^{T}\) for \(\nu =1,\ldots ,M_{rN}-1\), \(\pmb {a}_{rM_{rN}}=(a_{rM_{rN} 1},\cdots , a_{rM_{rN} p_{r}})^{T}\) and \(\pmb {a}_{r}=(\pmb {a}_{r1}^{T},\cdots ,\pmb {a}_{rM_{rN}}^{T})^{T}\). We use \(\vec {f}_{r}(x_{r})=\pmb {A}_{r}^{T}(x_{r})\pmb {a}_{r}\) to approximate \(f_{r}(x_{r})\). Note that \(\vec {f}_{r}(x_{r})\) is a piecewise polynomial of degree \(p_{r}\) and satisfies \(\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}\vec {f}_{r}(X_{ijr})=0\). Let \({\bar{Y}}=\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}Y_{ij}\) and \(\bar{\pmb {Z}}=\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}\pmb {Z}_{ij}\) be the estimators of \(E(Y_{ij})\) and \(E(\pmb {Z}_{ij})\), respectively. Based on the spatial observations \(\{(Y_{ij}, \pmb {Z}_{ij},\pmb {X}_{ij}): 1\le i\le m; 1\le j\le n\}\), we use \({\bar{Y}}-\bar{\pmb {Z}}^{T}\pmb {\beta }\) to approximate \(\mu \) and solve the following minimization problem

$$\begin{aligned} \begin{array}{l} \min \sum _{i=1}^{m}\sum _{j=1}^{n}\{Y_{ij}-{\bar{Y}}- (\pmb {Z}_{ij}-\bar{\pmb {Z}})^{T}\pmb {\beta }-\sum _{r=1}^{d_{2}}\pmb {A}_{r}^{T} (X_{ijr})\pmb {a}_{r}\}^{2} \end{array} \end{aligned}$$
(2.1)

with respect to \(\pmb {\beta }\) and \(\pmb {a}_{r}\). Denote \( {\bar{Y}}_{ij}=Y_{ij}-{\bar{Y}}\), \(\bar{\pmb {Z}}_{ij}=\pmb {Z}_{ij}-\bar{\pmb {Z}}\), \(\pmb {A}(\pmb {X}_{ij}) =(\pmb {A}_{1}^{T}(X_{ij1}), \ldots ,\pmb {A}_{d_{2}}^{T}(X_{ijd_{2}}))^{T}\) and \(\pmb {a}=(\pmb {a}_{1}^{T},\ldots ,\pmb {a}_{d_{2}}^{T})^{T}\). Then the estimator \(\hat{\pmb {\beta }}\) of \(\pmb {\beta }_{0}\) is given by

$$\begin{aligned} \hat{\pmb {\beta }}=\pmb {\Gamma }_{N}^{-1}\Big [\sum _{i=1}^{m}\sum _{j=1}^{n} \bar{\pmb {Z}}_{ij}{\bar{Y}}_{ij} -\pmb {W}_{N}\pmb {A}_{N}^{-1}\big (\sum _{i=1}^{m}\sum _{j=1}^{n}\pmb {A}(\pmb {X}_{ij}) {\bar{Y}}_{ij}\big )\Big ], \end{aligned}$$
(2.2)

where

$$\begin{aligned} \pmb {\Gamma }_{N}=\sum _{i=1}^{m}\sum _{j=1}^{n}\bar{\pmb {Z}}_{ij}\bar{\pmb {Z}}_{ij}^{T} -\pmb {W}_{N} \pmb {A}_{N}^{-1}\pmb {W}_{N}^{T}, \end{aligned}$$
(2.3)
$$\begin{aligned} \pmb {W}_{N}=\sum _{i=1}^{m}\sum _{j=1}^{n}\bar{\pmb {Z}}_{ij}\pmb {A}(\pmb {X}_{ij})^{T}, \ \ \ \pmb {A}_{N}=\sum _{i=1}^{m}\sum _{j=1}^{n}\pmb {A}(\pmb {X}_{ij})\pmb {A} (\pmb {X}_{ij})^{T}. \end{aligned}$$
(2.4)

The estimator \(\tilde{\pmb {a}}=(\tilde{\pmb {a}}_{1}^{T},\ldots ,\tilde{\pmb {a}}_{d_{2}}^{T})^{T}\) of \(\pmb {a}\) is given by

$$\begin{aligned} \tilde{\pmb {a}}=\pmb {A}_{N}^{-1}\Big (\sum _{i=1}^{m}\sum _{j=1}^{n} \pmb {A}(\pmb {X}_{ij})\big ({\bar{Y}}_{ij}- \bar{\pmb {Z}}_{ij}^{T}\hat{\pmb {\beta }}\big )\Big ) \end{aligned}$$
(2.5)

and the piecewise polynomial estimator \({\tilde{f}}_{r}(x_{r})\) of \(f_{r}(x_{r})\) is given by

$$\begin{aligned} {\tilde{f}}_{r}(x_{r})=\pmb {A}_{r}^{T}(x_{r})\tilde{\pmb {a}}_{r}, \ \ x_{r}\in [c_{r1},c_{r2}]. \end{aligned}$$

The estimator of \(\mu \) is given by

$$\begin{aligned} {\hat{\mu }}={\bar{Y}}-\bar{\pmb {Z}}^{T}\hat{\pmb {\beta }}. \end{aligned}$$
(2.6)
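The closed-form character of (2.2)–(2.6) can be seen directly in code. Below is a minimal sketch (assuming the stacked basis matrix has been built, for instance with the hypothetical `piecewise_basis` helper sketched above, and that \(\pmb {A}_{N}\) and \(\pmb {\Gamma }_{N}\) are invertible):

```python
import numpy as np

def fit_plar(Y, Z, A):
    """First-stage estimators (2.2), (2.5) and (2.6).
    Y: (N,) responses; Z: (N, d1) linear covariates;
    A: (N, q) stacked centered basis evaluated at the X_ij."""
    Yb = Y - Y.mean()                                    # \bar{Y}_{ij}
    Zb = Z - Z.mean(axis=0)                              # \bar{Z}_{ij}
    A_N = A.T @ A                                        # A_N in (2.4)
    W_N = Zb.T @ A                                       # W_N in (2.4)
    Gamma_N = Zb.T @ Zb - W_N @ np.linalg.solve(A_N, W_N.T)         # (2.3)
    beta_hat = np.linalg.solve(
        Gamma_N, Zb.T @ Yb - W_N @ np.linalg.solve(A_N, A.T @ Yb))  # (2.2)
    a_tilde = np.linalg.solve(A_N, A.T @ (Yb - Zb @ beta_hat))      # (2.5)
    mu_hat = Y.mean() - Z.mean(axis=0) @ beta_hat                   # (2.6)
    return beta_hat, a_tilde, mu_hat
```

The piecewise polynomial estimates \({\tilde{f}}_{r}(x_{r})\) are then obtained by multiplying the corresponding block of `a_tilde` by \(\pmb {A}_{r}(x_{r})\), as in the display above.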

Since \({\tilde{f}}_{r}(x_{r})\) is only piecewise smooth, it may not be appealing to use this piecewise polynomial directly as the estimate of the function \(f_{r}(x_{r})\). After estimating \(\pmb {\beta }_{0}\), for a given \(x_{0r}\in [c_{r1},c_{r2}]\) and for \(x_{r}\) in a neighborhood of \(x_{0r}\), we use \(b_{r0} + b_{r1}(x_{r}-x_{0r})\) to approximate the unknown function \(f_{r}(x_{r})\). We then solve the following minimization problem

$$\begin{aligned} \min \sum _{i=1}^{m}\sum _{j=1}^{n}\Big \{Y_{ij}-{\bar{Y}}- (\pmb {Z}_{ij}-\bar{\pmb {Z}})^{T}\hat{\pmb {\beta }}-\sum _{k\ne r}{\tilde{f}}_{k}(X_{ijk})-[b_{r0} + b_{r1}(X_{ijr}-x_{0r})]\Big \}^{2}K\big (\frac{X_{ijr}-x_{0r}}{h_{r}}\big ) \end{aligned}$$
(2.7)

with respect to \(b_{r0}, b_{r1}\), where \(K(\cdot )\) is a given kernel function and \(h_{r}\) is a chosen bandwidth. Let \(\pmb {b}_{r}=(b_{r0}, h_{r}b_{r1})^{T}\) and \(\pmb {B}_{ijr}=(1,(X_{ijr}-x_{0r})/h_{r})^{T}\). Then the estimator of \(\pmb {b}_{r}\) is given by

$$\begin{aligned} \hat{\pmb {b}}_{r}=\Big (\sum _{i=1}^{m}\sum _{j=1}^{n}K\big (\frac{X_{ijr} -x_{0r}}{h_{r}}\big )\pmb {B}_{ijr}\pmb {B}_{ijr}^{T}\Big )^{-1} \Big (\sum _{i=1}^{m}\sum _{j=1}^{n}K\big (\frac{X_{ijr}-x_{0r}}{h_{r}}\big ) \pmb {B}_{ijr}{\tilde{Y}}_{ijr}\Big ), \end{aligned}$$
(2.8)

where

$$\begin{aligned} {\tilde{Y}}_{ijr}=Y_{ij}-{\bar{Y}}- (\pmb {Z}_{ij}-\bar{\pmb {Z}})^{T}\hat{\pmb {\beta }}-\sum _{r'\ne r}{\tilde{f}}_{r'}(X_{ijr'}). \end{aligned}$$
(2.9)

The estimator of \(f_{r}(x_{0r})\) is given by \({\hat{f}}_{r}(x_{0r})={\hat{b}}_{r0}=(1,0)\hat{\pmb {b}}_{r}\) for \(r=1,\ldots ,d_{2}\).
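A minimal sketch of this second-stage step, assuming the partial residuals \({\tilde{Y}}_{ijr}\) of (2.9) have been stacked into a vector (the Epanechnikov kernel is the choice we use in Sect. 4; any kernel satisfying assumption 7 of Sect. 3 would do):

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * np.maximum(1 - u**2, 0.0)

def local_linear(x0, X_r, Ytil, h):
    """Local linear estimate (2.8) of f_r at x0.
    X_r: (N,) covariate values; Ytil: (N,) partial residuals (2.9)."""
    u = (X_r - x0) / h
    w = epanechnikov(u)
    B = np.column_stack([np.ones_like(u), u])    # B_ijr = (1, (X - x0)/h)^T
    S = B.T @ (w[:, None] * B)                   # kernel-weighted Gram matrix
    b = np.linalg.solve(S, B.T @ (w * Ytil))     # \hat{b}_r
    return b[0]                                  # \hat{f}_r(x0) = \hat{b}_{r0}
```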

To implement our estimation method, appropriate values of \(M_{rN}\) and \(h_{r}\) are necessary. Since \(M_{rN}\) is mainly used to estimate the parameter vector \(\pmb {\beta }_{0}\), for simplicity we set \(M_{1N}=\cdots =M_{d_{2}N}=M_{N}\). The value of \(M_{N}\) can be selected by the following BIC criterion:

$$\begin{aligned} BIC(M_{N})=\log \Big \{\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}\big [Y_{ij}-{\bar{Y}}- (\pmb {Z}_{ij}-\bar{\pmb {Z}})^{T}\hat{\pmb {\beta }} -\sum _{r=1}^{d_2} \pmb {A}_{r}^{T}(X_{ijr})\tilde{\pmb {a}}_{r}\big ]^{2}\Big \} +\frac{M_{N}\log N}{2N}. \end{aligned}$$
(2.10)

Large values of BIC indicate poor fits, so \(M_{N}\) is chosen to minimize \(BIC(M_{N})\).
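A sketch of the resulting selection loop, reusing the hypothetical helpers `piecewise_basis` and `fit_plar` from the sketches above; the candidate grid and the use of the sample range for \([c_{r1},c_{r2}]\) are our assumptions:

```python
import numpy as np

def select_M(Y, Z, X, grid, p=1):
    """Choose M_N minimizing BIC(M_N) in (2.10).
    X: (N, d2) nonparametric covariates; grid: candidate values of M_N."""
    N, d2 = X.shape
    best_M, best_bic = None, np.inf
    for M in grid:                               # e.g. grid = range(2, 11)
        A = np.hstack([piecewise_basis(X[:, r], X[:, r].min(),
                                       X[:, r].max(), M, p)
                       for r in range(d2)])
        beta, a, mu = fit_plar(Y, Z, A)
        resid = (Y - Y.mean()) - (Z - Z.mean(axis=0)) @ beta - A @ a
        bic = np.log(np.mean(resid**2)) + M * np.log(N) / (2 * N)
        if bic < best_bic:
            best_M, best_bic = M, bic
    return best_M
```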

The bandwidth \(h_{r}\) can be selected by the following cross-validation (CV) score:

$$\begin{aligned} CV(h_{r})=\frac{1}{N}\sum _{i=1}^{m}\sum _{j=1}^{n}\Big (Y_{ij} -{\hat{\mu }}^{-ij}-\pmb {Z}_{ij}^{T}\hat{\pmb {\beta }}^{-ij}-\sum _{r'\ne r}{\tilde{f}}_{r'}^{-ij}(X_{ijr'})-{\hat{f}}_{r}^{-ij}(X_{ijr})\Big )^{2}, \end{aligned}$$
(2.11)

where \({\hat{\mu }}^{-ij}, \hat{\pmb {\beta }}^{-ij},{\tilde{f}}_{r'}^{-ij}(X_{ijr'})\) and \({\hat{f}}_{r}^{-ij}(X_{ijr})\) are computed with the (i,j)th observation removed.
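Conceptually this is a one-dimensional grid search; a sketch, where `refit` is a hypothetical callback that re-runs the whole two-stage fit with observation l removed at bandwidth h and returns the fitted value for that observation:

```python
import numpy as np

def select_h(Y, grid, refit):
    """Choose h_r minimizing the leave-one-out CV score (2.11).
    grid: candidate bandwidths; refit(l, h): fitted value for
    observation l from the fit that excludes observation l."""
    N = Y.size
    best_h, best_cv = None, np.inf
    for h in grid:
        cv = np.mean([(Y[l] - refit(l, h))**2 for l in range(N)])
        if cv < best_cv:
            best_h, best_cv = h, cv
    return best_h
```

Holding the first-stage quantities fixed across l is a common shortcut to reduce the computational cost, at the price of some approximation error.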

3 Asymptotic properties of the estimators

In this section, we describe the asymptotic properties of the estimators \(\hat{\pmb {\beta }}\) and \({\hat{f}}_{r}(x_{r}), r=1,\ldots ,d_{2}\). We first list the following assumptions.

  1. The random field \(\{(Y_{ij},\pmb {Z}_{ij},\pmb {X}_{ij}): (i,j)\in {\mathbf {E}}^{2}\}\) is strictly stationary. The marginal density \(g_{r}(x_{r})\) of \(X_{ijr}\) is continuous and bounded away from 0 uniformly over \([c_{r1}, c_{r2}]\) for \(r=1,\ldots ,d_{2}\).

  2. \(f_{r}(x_{r})\in C^{p_{r}+1}[c_{r1}, c_{r2}]\) for \(r=1,\ldots ,d_{2}\).

  3. \(E\Vert \pmb {Z}_{ij}\Vert ^{4+2\delta }<\infty \) and \(E|\varepsilon _{ij}|^{4+2\delta }<\infty \) for some \(\delta >0\), where \(\varepsilon _{ij}=Y_{ij}-(\mu +\pmb {Z}_{ij}^{T}\pmb {\beta }_{0} +\sum _{r=1}^{d_{2}}f_{r}(X_{ijr}))\).

  4. The functions \(\psi (\cdot , \cdot )\) and \(\varphi (\cdot )\) satisfy \(\psi (m,n)\le \min (m,n)\) and

     $$\begin{aligned} \lim _{k\rightarrow \infty }k^{\gamma }\sum _{t=k}^{\infty }t[\varphi (t)]^{\delta /(2+\delta )} =0 \end{aligned}$$

     for some constant \(\gamma >(4+\delta )/(2+\delta )\).

  5. \(M_{N}^{4+2\delta /(4+\delta )}/N\rightarrow 0\) and \(NM_{rN}^{-4(p_{r}+1)}\rightarrow 0\), where \(M_{N}=\max _{1\le r\le d_{2}}M_{rN}\).

  6. \(\min \{m,n\}\rightarrow \infty \) and there exist two sequences of positive integer vectors, \({\mathbf {p}}_{m,n}=(p_{1},p_{2})\in {\mathbf {E}}^{2}\) and \({\mathbf {q}}_{m,n}=(q,q)\in {\mathbf {E}}^{2}\), with \(q\rightarrow \infty \), such that \(q/p_{1}\rightarrow 0\), \(q/p_{2}\rightarrow 0\), \(m/p_{1}\rightarrow \infty \), \(n/p_{2}\rightarrow \infty \), \(p_{N}=p_{1}p_{2}=o((Nh_{r})^{1/2})\) and \(N\varphi (q)\rightarrow 0\).

  7. The kernel function \(K(\cdot )\ge 0\) is a bounded symmetric function with compact support.

Assumption 1 is standard in this context; it has been used, for instance, by Gao et al. (2006) in the spatial setting. The constants \(\gamma \) and \(\delta \) in assumption 4 may be interpreted as two indices of the mixing dependence; generally, a larger \(\gamma \) and a smaller \(\delta \) indicate less dependence. If \(\varphi (t)=O(t^{-\kappa })\) for some \(\kappa >4(3+\delta )/\delta \) or \(\varphi (t)=O(e^{-\iota t})\) for some \(\iota >0\), then assumption 4 holds. Assumption 6 is used to derive the asymptotic normality of the estimators.

Let \({\mathcal {F}}\) denote the class of additive functions such that \(F(\pmb {x})\in {\mathcal {F}}\) if \(F(\pmb {x}) =\mu +\sum _{r=1}^{d_{2}}F_{r}(x_{r})\) and \(F_{r}(x_{r})\) satisfies assumption 2 and \(E(F_{r}(X_{r}))=0\) for \(r=1,\ldots ,d_{2}\). To obtain the asymptotic distribution of \(\hat{\pmb {\beta }}\), we first need to adjust for the dependence between \(\pmb {Z}_{ij}\) and \(\pmb {X}_{ij}\), which is a common complication in semiparametric models. Let

$$\begin{aligned} f_{k}(\pmb {X}_{ij})=\mu _{k}+\sum _{r=1}^{d_{2}}f_{kr}(X_{ijr}) =\arg \inf _{F(\pmb {x})\in {\mathcal {F}}}E[Z_{ijk}-F(\pmb {X}_{ij})]^{2} \end{aligned}$$

and \(\theta _{k}(\pmb {X}_{ij})=E(Z_{ijk}|\pmb {X}_{ij})\). Since

$$\begin{aligned} E[Z_{ijk}-f_{k}(\pmb {X}_{ij})]^{2} =E[Z_{ijk}-\theta _{k}(\pmb {X}_{ij})]^{2}+E[\theta _{k}(\pmb {X}_{ij}) -f_{k}(\pmb {X}_{ij})]^{2}, \end{aligned}$$

it follows that \(f_{k}(\pmb {X}_{ij})\) is the projection of \(\theta _{k}(\pmb {X}_{ij})\) onto the additive functional space \({\mathcal {F}}\) (under the \(L_{2}\)-norm). In other words, \(f_{k}(\pmb {X}_{ij})\) is the element of \({\mathcal {F}}\) closest to \(\theta _{k}(\pmb {X}_{ij})\) among all the functions in \({\mathcal {F}}\), for any \(k =1,\ldots ,d_{1}\). Let \(V_{ijk}=Z_{ijk}-f_{k}(\pmb {X}_{ij})\) and \(\pmb {V}_{ij}=(V_{ij1},\ldots ,V_{ijd_{1}})^{T}\). Set \(\pmb {\Gamma }=E(\pmb {V}_{ij}\pmb {V}_{ij}^{T})\) and \(\pmb {\Omega }=\sum _{i=-\infty }^{+\infty }\sum _{j=-\infty }^{+\infty }E[\pmb {V}_{00}\pmb {V}_{ij}^{T}\varepsilon _{00}\varepsilon _{ij}]\). The following theorem gives the asymptotic distribution of the estimator \(\hat{\pmb {\beta }}\).

Theorem 3.1

Suppose that assumptions 1–6 hold. Then

$$\begin{aligned} \sqrt{N}(\hat{\pmb {\beta }}-\pmb {\beta }_{0})\overset{d}{\rightarrow }N (\pmb {0},\pmb {\Gamma }^{-1}\pmb {\Omega }\pmb {\Gamma }^{-1}). \end{aligned}$$
(3.1)
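The sandwich covariance \(\pmb {\Gamma }^{-1}\pmb {\Omega }\pmb {\Gamma }^{-1}\) involves an infinite sum over spatial lags. Purely as an illustration (this plug-in device is ours and is not studied in the paper), one could truncate the sum at a lag L and use residual proxies for \(\pmb {V}_{ij}\) and \(\varepsilon _{ij}\):

```python
import numpy as np

def sandwich_cov(V, eps, coords, L):
    """Truncated plug-in for Gamma^{-1} Omega Gamma^{-1} in (3.1).
    V: (N, d1) proxies for V_ij, eps: (N,) residuals,
    coords: (N, 2) integer site coordinates, L: truncation lag."""
    N, d1 = V.shape
    Gamma = V.T @ V / N                           # estimate of Gamma
    site = {(int(a), int(b)): k for k, (a, b) in enumerate(coords)}
    Omega = np.zeros((d1, d1))
    for (i, j), k in site.items():
        for di in range(-L, L + 1):               # lags within distance L
            for dj in range(-L, L + 1):
                l = site.get((i + di, j + dj))
                if l is not None:
                    Omega += np.outer(V[k], V[l]) * eps[k] * eps[l] / N
    Gi = np.linalg.inv(Gamma)
    return Gi @ Omega @ Gi                        # divide by N for Var(beta_hat)
```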

Let \(\mu _{k}=\int x^{k}K(x)dx\), \(\nu _{k} =\int x^{k}K^{2}(x)dx\) for \(k=0, 1, \ldots \), and \(\sigma _{r}^{2}(x_{r})=E(\varepsilon _{ij}^{2}|X_{ijr}=x_{r})\). The following Theorem 3.2 gives the asymptotic distribution of the estimators \({\hat{f}}_{r}(x_{r})\) for \(r=1,\ldots ,d_{2}\).

Theorem 3.2

Suppose that assumptions 1–7 hold, that \(\sigma _{r}^{2}(x_{r})\) is continuous in some neighborhood of \(x_{0r}\), and that \(M_{N}^{1+\delta /(4+\delta )}h_{r}\le C_{2}\) for some positive constant \(C_{2}\). If \(x_{0r}\) is an interior point of \([c_{r1}, c_{r2}]\), then

$$\begin{aligned} \sqrt{Nh_{r}}\big ({\hat{f}}_{r}(x_{0r})-f_{r}(x_{0r})-\frac{\mu _{2}h_{r}^{2}}{2\mu _{0}}f''_{r}(x_{0r})\big )\overset{d}{\rightarrow }N\big (0,\frac{\nu _{0} \sigma _{r}^{2}(x_{0r})}{\mu _{0}^{2}g_{r}(x_{0r})}\big ). \end{aligned}$$
(3.2)

The result of Theorem 3.2 indicates that the estimator of \(f_{r}(x_{0r})\) in the partially linear additive model has the same asymptotic distribution as the corresponding estimator in spatial univariate nonparametric regression.
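For pointwise inference, (3.2) suggests the usual interval; a sketch in LaTeX, under the additional assumptions (ours, for illustration) that the bias is made negligible by undersmoothing, \(Nh_{r}^{5}\rightarrow 0\), and that consistent estimates \({\hat{\sigma }}_{r}^{2}(x_{0r})\) and \({\hat{g}}_{r}(x_{0r})\) are available:

$$\begin{aligned} {\hat{f}}_{r}(x_{0r})\pm z_{1-\alpha /2}\sqrt{\frac{\nu _{0}{\hat{\sigma }}_{r}^{2}(x_{0r})}{\mu _{0}^{2}{\hat{g}}_{r}(x_{0r})Nh_{r}}}. \end{aligned}$$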

4 Simulations

We now study the finite sample performance of the method proposed in Sect. 2 using simulated spatial data. We use the following spatial partially linear additive model. Let \(\{\varepsilon _{ij}^{(k)}: (i,j)\in {\mathbf {E}}^{2}\}, k=1,2,3\), be three mutually independent i.i.d. N(0, 1) white-noise processes and \(\{\varepsilon _{ij}^{(4)}=(\varepsilon _{ij1}^{(4)},\varepsilon _{ij2}^{(4)}, \varepsilon _{ij3}^{(4)})^{T}: (i,j)\in {\mathbf {E}}^{2}\}\) be an i.i.d. \(N(0, \pmb {\Theta })\) process with \(\pmb {\Theta }=(\theta _{kk'})_{3\times 3}\) and \(\theta _{kk'}=\exp (-|k-k'|/3)\). Let \(\{\varepsilon _{ij}^{(5)}: (i,j)\in {\mathbf {E}}^{2}\}\) be an i.i.d. N(0, 0.1) white-noise process. Let

$$\begin{aligned} Y_{ij}=\mu +Z_{ij1}\beta _{1}+Z_{ij2}\beta _{2}+Z_{ij3}\beta _{3}+\sum _{r=1}^{3}f_{r} (X_{ijr})+\varepsilon _{ij} \end{aligned}$$
(4.1)

with \(\mu =2.4\), \(\beta _{1}=1.6\), \(\beta _{2}=-2.5\), \(\beta _{3}=3.1\) and \(f_{r}(X_{r})=f_{0r}(X_{r})-E(f_{0r}(X_{r}))\), where

$$\begin{aligned} f_{01}(X_{1})=1.2(X_{1}-0.9)^{2}, \ \ f_{02}(X_{2})=-2.1\sin (X_{2}+1.5), \ \ f_{03}(X_{3})=1.8\cos (3X_{3}-0.6). \end{aligned}$$

We set \(Z_{ij1}=X_{ij1}^{2}-2X_{ij2}+\varepsilon _{ij}^{(1)}\), \(Z_{ij2}=X_{ij1}+X_{ij2}^{2}-0.6X_{ij3}^{2}+\varepsilon _{ij}^{(2)}\), \(Z_{ij3}=0.9+\sin (2X_{ij2}-X_{ij3})+\varepsilon _{ij}^{(3)}\), \(X_{ij1}=(U_{ij1}+U_{ij2})/2\), \(X_{ij2}=(U_{ij2}+U_{ij3})/2\) and \(X_{ij3}=(U_{ij1}+U_{ij3})/2\), where \(\{U_{ijr}: (i,j)\in {\mathbf {E}}^{2}\}, r=1,2,3\), are generated by the spatial autoregression

$$\begin{aligned} U_{ijr}=\sin (U_{(i-1)jr}+U_{i(j-1)r}+U_{(i+1)jr}+U_{i(j+1)r}) +\varepsilon _{ijr}^{(4)}, \end{aligned}$$

and \(\{\varepsilon _{ij}: (i,j)\in {\mathbf {E}}^{2}\}\) is generated by the spatial autoregression \(\varepsilon _{ij}=(\varepsilon _{(i-1)j}+\varepsilon _{i(j-1)}+\varepsilon _{(i+1)j} +\varepsilon _{i(j+1)})/5+\varepsilon _{ij}^{(5)}.\)
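A sketch of this data-generating process (our own reconstruction: the toroidal neighborhoods via `np.roll`, the burn-in margin, the number of fixed-point sweeps and the empirical centering of \(f_{0r}\) are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(m, n, burn=75, iters=50):
    """Generate (Y, Z, X) from model (4.1) on an enlarged grid,
    keeping the inner m x n block after a burn-in margin."""
    M, Nn = m + 2 * burn, n + 2 * burn
    Theta = np.exp(-np.abs(np.subtract.outer(np.arange(3), np.arange(3))) / 3)
    e4 = rng.multivariate_normal(np.zeros(3), Theta, size=(M, Nn))
    U = np.zeros((M, Nn, 3))
    for _ in range(iters):                       # iterate the SAR for U
        nb = (np.roll(U, 1, 0) + np.roll(U, -1, 0)
              + np.roll(U, 1, 1) + np.roll(U, -1, 1))
        U = np.sin(nb) + e4
    eps = np.zeros((M, Nn))
    e5 = rng.normal(0, np.sqrt(0.1), (M, Nn))
    for _ in range(iters):                       # iterate the SAR for eps
        nb = (np.roll(eps, 1, 0) + np.roll(eps, -1, 0)
              + np.roll(eps, 1, 1) + np.roll(eps, -1, 1))
        eps = nb / 5 + e5
    U = U[burn:burn + m, burn:burn + n]
    eps = eps[burn:burn + m, burn:burn + n]
    X1, X2 = (U[..., 0] + U[..., 1]) / 2, (U[..., 1] + U[..., 2]) / 2
    X3 = (U[..., 0] + U[..., 2]) / 2
    Z1 = X1**2 - 2 * X2 + rng.normal(size=(m, n))
    Z2 = X1 + X2**2 - 0.6 * X3**2 + rng.normal(size=(m, n))
    Z3 = 0.9 + np.sin(2 * X2 - X3) + rng.normal(size=(m, n))
    f1 = 1.2 * (X1 - 0.9)**2
    f2 = -2.1 * np.sin(X2 + 1.5)
    f3 = 1.8 * np.cos(3 * X3 - 0.6)
    # center by empirical means as a stand-in for E(f_{0r}(X_r))
    f1, f2, f3 = f1 - f1.mean(), f2 - f2.mean(), f3 - f3.mean()
    Y = 2.4 + 1.6 * Z1 - 2.5 * Z2 + 3.1 * Z3 + f1 + f2 + f3 + eps
    return Y, np.stack([Z1, Z2, Z3], -1), np.stack([X1, X2, X3], -1)
```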

Data were simulated from this model over a rectangular domain of \(m\times n\) sites, more precisely over the grid \(\{(i,j): 76\le i\le 75+m,76\le j\le 75+n\}\), for various values of m and n. Each replication was obtained iteratively by the steps given in Hallin et al. (2004). 500 spatial data sets were independently generated. For each simulated data set, the estimators of \(\beta _{r},r=1,2,3\), and \(\mu \) were computed by (2.2) with \(p_{r}=1\) and by (2.6). The number of subintervals \(M_{1N}=M_{2N}=M_{3N}=M_{N}\) was determined by the BIC criterion (2.10). The estimators of \(f_{r}(x_{r}),r=1,2,3\), were computed by (2.8) with the Epanechnikov kernel. The bandwidth \(h_{r}\) was selected by the "leave-one-out" cross-validation procedure (2.11).

Table 1 summarizes the mean squared errors (MSE) of the estimators \({\hat{\mu }}\) and \({\hat{\beta }}_{r}\) for \(r=1,2,3\) and the mean integrated squared errors (MISE) of the estimators \({\hat{f}}_{r}(x_{r})\) for \(r=1,2,3\) based on 500 simulations for different m and n. Figure 1 depicts the actual functions \(f_{r}(x_{r}),r=1,2,3\), and the mean estimated curves over 500 simulations with \(m=20,n=20\), together with their \(95\%\) pointwise confidence bands. We see from Table 1 and Fig. 1 that our estimation procedure gives satisfactory results even for small m and n.

Table 1 MSEs and MISEs of the estimators based on regular grid data
Fig. 1

The actual and the mean estimated curves for \(f_{r}(x_{r}),r=1,2,3\), in model (4.1) with \(m=20,n=20\), and the 95% pointwise confidence bands. —, true curves; - - -, mean estimated curves; ..., 95% pointwise confidence bands

We now compare the proposed method (PM) with the B-spline method (BM) given in Huang et al. (2010). In the B-spline method, cubic splines with equally spaced knots are used to approximate the additive functions \(f_{r}(x_{r}), r=1,2,3\), and the smoothing parameter \(K_{n}\) was determined by the BIC criterion. For different m and n, Table 2 reports the MSEs of the parametric estimators \({\hat{\mu }}\) and \({\hat{\beta }}_{r},r=1,2,3\), and the mean of the weighted average squared errors \(WASE_{r}\) of the estimators \({\hat{f}}_{r}(x_{r}),r=1,2,3\), based on 500 simulations, where \(WASE_{r}\) is defined as

$$\begin{aligned} WASE_{r}=\frac{1}{mn+1-2[0.025mn]}\sum _{i=[0.025mn]}^{mn -[0.025mn]}\frac{[{\hat{f}}_{r}(x_{(ir)})-f_{r}(x_{(ir)})]^{2}}{[\mathrm{range}(f_{r})]^{2}}, \end{aligned}$$

\(x_{(1r)}\le x_{(2r)}\le \cdots \le x_{(mnr)}\) are the ordered values of \(\{x_{ijr}: 76\le i\le 75+m,76\le j\le 75+n\}\) and range\((f_{r})\) is the range of the function \(f_{r}(x_{r})\). Since the data near the boundary points are sparse, the estimators near the boundary become poor, so we removed these points from \(WASE_{r}\). Based on 500 simulations with \(m=20,n=10\), Fig. 2 displays the boxplots for the squared errors (SE) of the parametric estimators \({\hat{\mu }}\) and \({\hat{\beta }}_{r},r=1,2,3\), and the \(WASE_{r}\) of the estimators \({\hat{f}}_{r}(x_{r}),r=1,2,3\). We see from Table 2 and Fig. 2 that there are no obvious differences between the parametric estimators of the two methods, but the estimators of the additive components \(f_{r}(x_{r}), r=1,2,3\), obtained by the proposed method outperform those obtained by the B-spline method. Because the number of knots is the same for all the B-spline functions approximating the additive components \(f_{r}(x_{r}), r=1,2,3\), some estimators become inefficient when the additive functions have different degrees of smoothness. In the proposed method, we first obtain the estimator of \(\pmb {\beta }_{0}\) and the piecewise polynomial estimators of \(f_{r}(x_{r}), r=1,2,3\); the local linear estimator of each \(f_{r}(x_{r})\) is then obtained by solving a univariate regression problem, and the bandwidth \(h_{r}\) is easily chosen.
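For reference, a sketch of the \(WASE_{r}\) computation, assuming `f_hat` and `f_true` are vectorized callables and approximating range\((f_{r})\) by the range over the sampled points:

```python
import numpy as np

def wase(f_hat, f_true, x):
    """WASE_r as defined above: trim [0.025 mn] order statistics at each
    boundary and normalize by the squared range of f_r."""
    x = np.sort(x)
    mn = x.size
    t = int(0.025 * mn)                     # [0.025 mn]
    xs = x[max(t - 1, 0):mn - t]            # x_(t), ..., x_(mn - t)
    err = (f_hat(xs) - f_true(xs))**2       # mean over mn + 1 - 2t points
    return err.mean() / np.ptp(f_true(x))**2
```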

Table 2 Comparison of the proposed method with B-spline method
Fig. 2

Box plots for SE and WASE. Label 1 is boxplot for the proposed method and Label 2 is boxplot for B-spline method. a is boxplot for \({\hat{\mu }}\), b is boxplot for \({\hat{\beta }}_{1}\), c is boxplot for \({\hat{\beta }}_{2}\), d is boxplot for \({\hat{\beta }}_{3}\), e is boxplot for \({\hat{f}}_{1}(x_{1})\), f is boxplot for \({\hat{f}}_{2}(x_{2})\) and g is boxplot for \({\hat{f}}_{3}(x_{3})\)

To investigate the performance of the estimators under different \(M_{N}\), Table 3 displays the simulation results based on 500 simulations with \(m=20, n=10\). Table 3 shows that the MSEs for the parametric estimators and the WASEs for the nonparametric estimators are not sensitive to the choice of \(M_{N}\), and these estimators are efficient over broad ranges of \(M_{N}\). These results also illustrate that the above BIC smoothing parameter selection procedure generally gives satisfactory results.

Table 3 The MSEs and WASEs for the estimators with different \(M_{N}\)

In order to investigate the influence of the degree of the piecewise polynomial on the estimators, Table 4 reports the simulation results under \(p_{r}=1,2,3\), based on 500 simulations, corresponding to piecewise linear, piecewise quadratic and piecewise cubic polynomials, respectively. We see from Table 4 that there is no essential difference between the three estimates; of the three, the piecewise linear fit is the easiest to implement.

Table 4 The MSEs and WASEs for the estimators with different \(p_{r}\)

In order to investigate the performance of our estimators on irregular grid data, we first generate data over the rectangular grid of \(m\times n\) sites. We then randomly select half of the data to fit the model, that is, each site has probability 0.5 of being selected, so that the final sample size is about mn/2. For different m and n, Table 5 summarizes the MSEs of the parametric estimators and the MISEs of the nonparametric estimators \({\hat{f}}_{r}(x_{r})\) for \(r=1,2,3\) based on 500 simulations. Comparing Table 5 with Table 1, we see that there is no clear difference between the two sets of estimators of \(\mu \) and \(\beta _{r}, r=1,2,3\). When the sample size is small (N=100, 200), the estimators of \(f_{r}(x_{r})\) for \(r=1,2,3\) based on regular grid data are slightly better than those based on irregular grid data, but when the sample size is large (N=300, 400, 600), there is no clear difference between the two sets of estimators of \(f_{r}(x_{r})\) for \(r=1,2,3\). All this shows that our estimation method is also efficient for irregular grid data.
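The irregular design is straightforward to reproduce; a sketch, reusing the hypothetical `simulate` generator from the sketch in this section:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 10
Y, Z, X = simulate(m, n)            # hypothetical generator sketched above

# Keep each site independently with probability 0.5, giving an irregular
# design with roughly mn/2 observed sites, as described above.
keep = rng.random((m, n)) < 0.5
Y_irr, Z_irr, X_irr = Y[keep], Z[keep], X[keep]
```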

Table 5 MSEs and MISEs of the estimators based on irregular grid data

5 A real data example

In this section, we apply our methods to analyze a real estate data set covering 203 second-, third- and fourth-tier cities in China in 2016. Our purpose is to study the relationship between urban housing prices and their influencing factors. The response variable is the urban housing price (Price). The covariates of primary interest include the average annual income of urban residents (Income), urban category, urban population, urban GDP, the ratio of no house and one house (RNO), bank interest rate (BIR), urban livability index (ULI), urban comprehensive competitiveness (UCC) and urban development index (UDI). Among these covariates, the values of some variables such as population and GDP are very large, whereas those of variables such as ULI and UCC are small. We therefore first rescaled the data: each observation was divided by the maximum of all observations for that variable, so that the maximum of the rescaled data for each variable is 1.

We first consider the following partially linear additive regression

$$\begin{aligned} \begin{array}{ll} \log (Price) &{} =\mu +\beta _{1}Z_{1}+\beta _{2}Z_{2}+f_{1}(Income)+f_{2}(Population)+f_{3}(GDP) \\ &{} +f_{4}(RNO)+f_{5}(BIR)+f_{6}(ULI)+f_{7}(UCC)+f_{8}(UDI)+\varepsilon , \\ \end{array} \nonumber \\ \end{aligned}$$
(5.1)

where \(Z_{1}=1, Z_{2}=0\) stands for a second-tier city, \(Z_{1}=0, Z_{2}=1\) stands for a third-tier city and \(Z_{1}=0, Z_{2}=0\) stands for a fourth-tier city.

The estimators of \(\beta _{1}\) and \(\beta _{2}\) are computed by (2.2) with \(p_{r}=1\), and the estimator of \(\mu \) is computed by (2.6). The number of subintervals \(M_{N}\) is determined by the BIC criterion given in Sect. 2. The estimators of \(f_{r}(\cdot ),r=1,\ldots ,8\), are computed by (2.8) with the Epanechnikov kernel. The bandwidths \(h_{r}\) are selected by the "leave-one-out" cross-validation procedure. Table 6 exhibits the parametric estimators, and Fig. 3 shows the estimated curves of the additive components and their \(95\%\) pointwise confidence intervals. The fact that \({\hat{\beta }}_{1}>{\hat{\beta }}_{2}>0\) in Table 6 indicates that housing prices in second-tier cities are higher than those in third-tier cities, which in turn are higher than those in fourth-tier cities. We see from Fig. 3 that the estimated curves \({\hat{f}}_{1}(Income)\), \({\hat{f}}_{4}(RNO)\) and \({\hat{f}}_{8}(UDI)\) show rising trends as Income, RNO and UDI increase, while \({\hat{f}}_{5}(BIR)\) shows a decreasing trend as BIR increases. Figure 3 shows the subtle impact of the urban population and the urban livability index (ULI) on housing prices: prices overall decline as the urban population increases, while they overall rise with the urban livability index. This can be explained by the fact that some central and western cities, such as Fuyang, Shangqiu, Nanyang and Handan, have large populations but low housing prices. Figure 3 shows that \({\hat{f}}_{3}(GDP)\) first decreases and then rises; this is due to lower house prices in some central and western cities and higher prices in some eastern cities with high GDP. Figure 3 also shows that \({\hat{f}}_{7}(UCC)\) first rises and then decreases.

Table 6 The parametric estimators for model (5.1). The values in parentheses are standard errors
Fig. 3

The estimated curves (solid line) and their \(95\%\) pointwise confidence intervals (dotted lines)

We find from Fig. 3 that the estimated curves of \(f_{2}(Population),f_{4}(RNO)\) and \(f_{6}(ULI)\) are approximately straight lines. We therefore further consider the following partially linear additive regression

$$\begin{aligned} \begin{array}{ll} \log (Price) &{} =\mu +\beta _{1}Z_{1}+\beta _{2}Z_{2}+\beta _{3}Population+\beta _{4}RNO+\beta _{5}ULI \\ &{} +f_{1}(Income)+f_{2}(GDP)+f_{3}(BIR)+f_{4}(UCC)+f_{5}(UDI)+\varepsilon . \\ \end{array} \end{aligned}$$
(5.2)

Table 7 displays the parametric estimators, and Fig. 4 shows the estimated curves of the additive components and their \(95\%\) pointwise confidence intervals in model (5.2). We see from Table 7 that \(\log (Price)\) is negatively associated with the urban population and positively associated with the ratio of no house and one house and with the urban livability index. Since the absolute values of \({\hat{\beta }}_{3}\) and \({\hat{\beta }}_{5}\) are small, these covariates have only small effects on \(\log (Price)\). Figure 4 shows that the estimated curves in model (5.2) are similar to those in model (5.1).

Table 7 The parametric estimators for model (5.2). The values in parentheses are standard errors
Fig. 4

The estimated curves (solid line) and their \(95\%\) pointwise confidence intervals (dotted lines)

To evaluate the prediction performance of models (5.1) and (5.2), we applied leave-one-out cross-validation to the data; i.e., when predicting the housing price for the ith city, we omitted the data for this city when fitting the models. Figure 5 displays the boxplots of the absolute prediction errors \(|\widehat{\log (Price_{i})}-\log (Price_{i})|,\ i=1,\ldots ,203,\) for models (5.1) and (5.2), where \(Price_{i}\) denotes the housing price for the ith city. The mean values of these errors for the two models are 0.3050 and 0.2031, respectively. We see from Fig. 5 and the mean absolute prediction errors that model (5.2) performs better than model (5.1) in prediction.

Fig. 5

Boxplots of the absolute prediction errors \(|\widehat{\log (Price_{i})}-\log (Price_{i})|,\ i=1,\ldots ,203\), for the two models. Here 1 is the boxplot for model (5.1) and 2 is the boxplot for model (5.2)

6 Conclusion and future studies

We have proposed a new estimation procedure for partially linear additive regressions with spatial data using piecewise polynomials combined with local linear methods. The proposed method is suitable for high-dimensional data; it requires neither solving a constrained minimization problem nor running iterative algorithms. Under mild regularity assumptions, the asymptotic distribution of the estimator of the unknown parameter vector is established, and the asymptotic distributions of the estimators of the unknown functions are also derived.

Our method focuses on data on a regular grid. The simulation studies in Sect. 4 demonstrate that the proposed procedure is also efficient for irregularly positioned data, but the relevant theoretical properties remain to be developed and are left for future research. As pointed out by a referee, geoadditive models in which the spatial effect is modeled via a bivariate function have been proposed and investigated in the literature. Our method can be extended to geoadditive models. We leave such extensions to future work.