Abstract
This study uses a Monte Carlo analysis to investigate the performance of the two most important information criteria, namely Akaike’s Information Criterion and the Bayesian Information Criterion, not only in terms of selecting the true spatial econometric model but also in terms of detecting spatial dependence, in comparison with the LM tests for the two simple spatial models, the SLM and the SEM. The analysis is then extended to several other spatial econometric models, namely the SLX, SDM, SARAR and SDEM, along with heteroscedastic and non-normal errors. Simulation results show that under ideal conditions these criteria can help the analyst select the true spatial econometric model and properly detect spatial dependence.
1 Introduction
Spatial autocorrelation in errors is a very common problem in linear regression models with spatial data, which should be treated with caution, since it violates the assumption of a random sample, leading the analyst to ambiguous results. This behavior typically arises from observations corresponding to geographically proximate locations that are correlated because of their spatial dependence. As Anselin (1988a) has clarified, this spatial dependence along with spatial heterogeneity define the concept of spatial effects. For this purpose, a set of Lagrange Multiplier (LM) tests, known as spatial dependence tests (see Burridge 1980; Anselin 1988b; Anselin et al. 1996), have been developed in the literature to assist the analyst in selecting and estimating the most appropriate spatial econometric model that accounts for the presence of spatial dependence. Moreover, spatially autocorrelated errors can also appear in spatial regression analysis as a false indication of spatial dependence due to spurious behavior, as Fingleton (1999), Mur and Trivez (2003) and Agiakloglou et al. (2015) have indicated.
Nevertheless, these LM tests, which have been widely used in many empirical applications, have two important drawbacks. The first is related to the restrictive alternative model structure imposed by these tests and the second is associated with their reliability in selecting the right model. Indeed, as is well known, these tests apply exclusively to the choice between a simple econometric model and a spatial model with a spatial lag structure either in the dependent variable or in the error, and in several cases their application leads to inconsistent conclusions as to the choice of the right spatial econometric model, a problem that has been addressed by Anselin and Florax (1995). In addition, LeSage and Pace (2009) point out that spatial dependence tests were developed so that their statistics are calculated solely from the residuals of the simple econometric model estimated by least squares, without requiring estimation of the corresponding spatial econometric model. In other words, these tests for spatial dependence against the null hypothesis of no dependence do not require maximum likelihood estimation of the spatial model under the alternative hypothesis. For this reason, and given the current availability of software, LeSage and Pace (2009) suggested that the selection of a spatial model should be made by comparing the likelihoods of different models, and that the analysis should start from a more general model that nests both the spatial lag model and the spatial error model.
Thus, it is of interest to investigate whether information criteria can help the analyst select the true spatial econometric model, given that these criteria are routinely applied in quantitative analysis but their performance has been explored only to a limited extent in spatial econometric analysis. For this purpose, a Monte Carlo analysis is conducted to evaluate the performance of the two most frequently used information criteria, namely Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC), using only the three most important spatial econometric models, namely the SIM, SLM and SEM, which are suitable for the application of the LM tests. The criteria are evaluated not only in terms of detecting spatial dependence but also in terms of selecting the right spatial econometric model, as a complementary approach to model selection using the LM tests for these models. Simulation results show that these criteria can help the analyst identify the right spatial econometric model and spatial dependence, in some cases more effectively than the LM tests. Note that the simulation process is conducted using rook and queen contiguity matrices along with a real geographical structure, i.e., the spatial structure of Greece, which has many geographical peculiarities resulting in quite asymmetric spatial weights matrices. The research is also expanded, in terms of selecting the right spatial econometric model with the two aforementioned information criteria, by considering more spatial econometric models, namely the SLX, SDM, SARAR, and SDEM, as well as two non-ideal situations, namely heteroscedasticity and non-normality, where the results vary considerably. Hence, the objective of this research is to select the best-fitted spatial econometric model given the weights matrix formation, rather than to search for the most appropriate weights matrix formation given the spatial econometric model, as discussed in Zhang and Yu (2018).
The remainder of the paper is organized as follows. Section 2 presents the three simple econometric models, namely the SIM, SLM and SEM, along with the SDM, of which the SEM is a special case, as well as the information criteria AIC and BIC, and analyses the strategies applied for the LM tests. Section 3 describes the design of the simulation process and discusses the results. Section 4 presents the extension of the Monte Carlo analysis, including the additional spatial econometric models, namely the SLX, SARAR, and SDEM, along with heteroscedastic and non-normal errors, and discusses the results. The concluding remarks are included in Sect. 5.
2 The LM tests for spatial econometric models and the information criteria
Consider the Spatial Independent Model (SIM), also known as the Non-Spatial Econometric Model (NSEM), defined as:
$${\varvec{y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{\varepsilon}}$$
where y is a (\(n \times 1\)) vector of observations of the dependent variable, X is a [\(n \times (k + 1)\)] matrix of observations of k independent variables, with values of 1 in its first column to account for the constant term, β is the [\((k + 1) \times 1\)] vector of coefficients of the model and ε is the (\(n \times 1\)) random vector following the standard assumptions of regression, i.e., ε ~ N(0, σ²I).
Spatial econometric models have been introduced in the literature as multidirectional extensions of time series econometric models to geographical space. Dependence among the values of a variable is defined according to the geographical positions of its values and of the values of all independent variables in the model, including the error term, rather than according to their chronological order. Spatial dependence is incorporated into the model through the spatial weights matrix W, which defines the spatial interactions between the n neighboring regions and is used in its row-standardized form (see footnote 1).
The two most important spatial econometric models are the Spatial Lag Model (SLM) defined as:
$${\varvec{y}}=\rho {\varvec{W}}{\varvec{y}}+{\varvec{X}}{\varvec{\beta}}+{\varvec{\varepsilon}}$$
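and the Spatial Error Model (SEM) defined as:
$${\varvec{y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{\varepsilon}}$$
with
$${\varvec{\varepsilon}}=\lambda {\varvec{W}}{\varvec{\varepsilon}}+{\varvec{u}}$$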
where u is a (\(n \times 1\)) random vector and Wy and Wε are the spatially lagged vectors that incorporate spatial dependence, consisting of weighted averages of the values of the variables in the n neighboring regions, while ρ and λ are the spatial lag coefficients of the dependent variable and the errors respectively. In addition, the SEM can be expressed as a spatial econometric model with spatial lags for all variables of the model, known as the Spatial Durbin Model (SDM), defined as:
$${\varvec{y}}=\rho {\varvec{W}}{\varvec{y}}+{\varvec{X}}{\varvec{\beta}}+{\varvec{W}}{\varvec{X}}{\varvec{\theta}}+{\varvec{\varepsilon}}$$
where WX is the spatial lag matrix of the independent variables and ρ = λ, \(-\lambda{\varvec{\beta}}={\varvec{\theta}}\). This condition can be tested with the common factor test presented by Mur and Angulo (2006), which investigates whether the model is actually a spatial error model or a more general spatial model. Hence, when the SDM is estimated while the true generating spatial model is the SEM, one should expect to get the same results.
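As a short worked derivation of the common factor restriction, using only the symbols defined above, the SEM can be written in reduced form and pre-multiplied by \(({\varvec{I}}-\lambda {\varvec{W}})\):

$$\begin{aligned}{\varvec{y}}&={\varvec{X}}{\varvec{\beta}}+{\left({\varvec{I}}-\lambda {\varvec{W}}\right)}^{-1}{\varvec{u}}\\ \left({\varvec{I}}-\lambda {\varvec{W}}\right){\varvec{y}}&=\left({\varvec{I}}-\lambda {\varvec{W}}\right){\varvec{X}}{\varvec{\beta}}+{\varvec{u}}\\ {\varvec{y}}&=\lambda {\varvec{W}}{\varvec{y}}+{\varvec{X}}{\varvec{\beta}}+{\varvec{W}}{\varvec{X}}\left(-\lambda {\varvec{\beta}}\right)+{\varvec{u}}\end{aligned}$$

Matching this term by term with an SDM of the form \({\varvec{y}}=\rho {\varvec{W}}{\varvec{y}}+{\varvec{X}}{\varvec{\beta}}+{\varvec{W}}{\varvec{X}}{\varvec{\theta}}+\text{error}\) gives ρ = λ and \({\varvec{\theta}}=-\lambda {\varvec{\beta}}\), so the SEM is an SDM on which this restriction has been imposed.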
It is also important to mention that the values of the coefficients ρ and λ are not necessarily restricted to the interval (−1, +1), as they are in time series analysis. Estimation requires the Jacobian matrix to be non-singular, a condition that depends on the eigenvalues of the spatial weights matrix. More specifically, the largest eigenvalue of a row-standardized spatial weights matrix is always equal to unity, which ensures that the upper limit of the interval is always +1, whereas the lower limit is not known in advance and can often be smaller than −1 (see LeSage and Pace 2009). Thus, if the coefficient takes values inside the feasible interval corresponding to the applied weights matrix, the Jacobian determinant is positive, which ensures that its logarithm exists and that the log-likelihood function is well defined.
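To make the role of the Jacobian explicit, the standard Gaussian log-likelihood of the SLM can be written as follows. This is a textbook form rather than a formula taken from this paper, and the notation \(\omega_i\) for the eigenvalues of W is an assumption introduced here:

$$\ln L\left(\rho ,{\varvec{\beta}},{\sigma }^{2}\right)=-\frac{n}{2}\ln \left(2\pi {\sigma }^{2}\right)+\ln \left|{\varvec{I}}-\rho {\varvec{W}}\right|-\frac{1}{2{\sigma }^{2}}{\left({\varvec{y}}-\rho {\varvec{W}}{\varvec{y}}-{\varvec{X}}{\varvec{\beta}}\right)}^{\prime}\left({\varvec{y}}-\rho {\varvec{W}}{\varvec{y}}-{\varvec{X}}{\varvec{\beta}}\right)$$

$$\ln \left|{\varvec{I}}-\rho {\varvec{W}}\right|=\sum_{i=1}^{n}\ln \left(1-\rho {\omega }_{i}\right)$$

When the eigenvalues are real, as they are for row-standardized matrices built from symmetric contiguity relations, every factor \(1-\rho {\omega }_{i}\) is positive if \(1/{\omega }_{min}<\rho <1/{\omega }_{max}=1\), where \({\omega }_{min}\) is the most negative eigenvalue. This is the feasible interval referred to above.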
The best-fitted model for a given set of spatial data is typically selected through the LM tests for spatial dependence. In particular, the LM test for a SEM (LM-ERR), introduced by Burridge (1980), and the LM test for a SLM (LM-LAG), presented by Anselin (1988b), take the SIM as the null hypothesis and are conducted against a SEM and a SLM under the alternative hypothesis, respectively; the data are taken to contain no spatial effects if neither test rejects the null hypothesis. Issues arise when both tests reject the null hypothesis, a behavior that appears very often in practice, as indicated by Anselin et al. (1996): the tests can clearly identify the type of spatial effect only when one test rejects and the other does not reject the null hypothesis. For this reason, Florax et al. (2003) proposed selecting the spatial econometric model whose LM statistic has the highest value, hereafter strategy I. Moreover, in an effort to minimize this problematic behavior, comparable robust LM tests were constructed by Anselin et al. (1996), namely the Robust LM-Error test (LM-EL) and the Robust LM-Lag test (LM-LE), which were shown to have more power in locating the correct spatial model than the simple LM tests. Hence, the presence of spatial dependence is inspected through these new tests, hereafter strategy II, and if the null hypothesis is rejected by both robust LM tests, the spatial econometric model is selected according to the higher of the two robust LM statistics. Lastly, another strategy has been proposed by Florax et al. (2003), known as the hybrid strategy, which combines the classical and the robust LM tests, but it leads to the same results as the classical approach (strategy I), as indicated by Florax et al. (2003) and proved by Mur and Angulo (2009).
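The decision rule shared by strategies I and II can be sketched as a small function. This is an illustrative sketch rather than the authors' code: the function name `select_by_lm`, the tie-breaking choice, and the use of the 5% critical value of the chi-square distribution with one degree of freedom are assumptions made here.

```python
# 5% critical value of the chi-square distribution with 1 degree of freedom
# (assumed significance level for this sketch).
CHI2_1_CRIT_5PCT = 3.841


def select_by_lm(stat_err, stat_lag, crit=CHI2_1_CRIT_5PCT):
    """Pick 'SIM', 'SEM' or 'SLM' from an error-type and a lag-type LM statistic.

    Strategy I: pass LM-ERR and LM-LAG.
    Strategy II: pass the robust statistics LM-EL and LM-LE instead.
    """
    reject_err = stat_err > crit
    reject_lag = stat_lag > crit
    if not reject_err and not reject_lag:
        return "SIM"  # neither test rejects: no spatial effects
    if reject_err and not reject_lag:
        return "SEM"  # only the error test rejects
    if reject_lag and not reject_err:
        return "SLM"  # only the lag test rejects
    # Both reject: choose the model with the larger statistic.
    # Ties go to the SLM, an arbitrary choice made for this sketch.
    return "SEM" if stat_err > stat_lag else "SLM"
```

For example, `select_by_lm(9.0, 6.0)` returns `"SEM"` because both statistics exceed the critical value and the error statistic is the larger of the two.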
The best-fitted model for a given set of spatial data can also be chosen based on the values of a pre-selected information criterion, a technique that is typically applied in every quantitative analysis that involves model selection. The two information criteria most often used in practice are the Akaike Information Criterion (AIC), presented by Akaike (1973), and the Bayesian Information Criterion (BIC), developed by Schwarz (1978) as an attempt to improve the performance of the AIC, defined respectively as:
$$AIC=-2\ln \hat{L}+2p$$
and
$$BIC=-2\ln \hat{L}+p\ln n$$
where \(\ln \hat{L}\) is the maximized value of the log-likelihood function, p is the number of parameters estimated from the econometric model and n is the sample size used for the estimation of the model. Both criteria can be computed right after the estimation of any spatial model by maximum likelihood, and the best-fitted model is the one with the minimum value of the chosen criterion, a technique that has been used in practice by Chi and Zhu (2008) in their empirical work on demographic data to select the best-fitted spatial econometric model.
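A minimal sketch of how the two criteria rank candidate models is given below. The log-likelihood values are made-up numbers chosen only to illustrate the different penalties, and the parameter counts (3, 4 and 5, counting the error variance) are an assumption of this sketch, not figures from the study.

```python
import math


def aic(loglik, p):
    """Akaike Information Criterion: -2 ln L + 2p."""
    return -2.0 * loglik + 2.0 * p


def bic(loglik, p, n):
    """Bayesian Information Criterion: -2 ln L + p ln n."""
    return -2.0 * loglik + p * math.log(n)


def select_model(fits, n, criterion="bic"):
    """Return (best model name, scores) for fits = {name: (loglik, p)}."""
    scores = {}
    for name, (loglik, p) in fits.items():
        if criterion == "aic":
            scores[name] = aic(loglik, p)
        else:
            scores[name] = bic(loglik, p, n)
    best = min(scores, key=scores.get)  # the smallest value wins
    return best, scores


# Hypothetical maximized log-likelihoods and parameter counts (illustration only).
fits = {"SIM": (-150.0, 3), "SLM": (-148.0, 4), "SDM": (-147.6, 5)}

best_aic, _ = select_model(fits, n=100, criterion="aic")
best_bic, _ = select_model(fits, n=100, criterion="bic")
print(best_aic, best_bic)
```

With these illustrative inputs the AIC prefers the SLM while the BIC, whose penalty per parameter is ln(100) ≈ 4.6 rather than 2, falls back to the SIM.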
For this reason, it is of interest to study the performance of these criteria for spatial data in comparison with the performance of the LM tests, not only in terms of identifying spatial dependence but also in terms of selecting the best spatial econometric model within this limited set of alternative spatial models, knowing that each criterion has its own penalty function and therefore a different pattern of model selection. Indeed, as is well known, the AIC tends to select models with a large number of parameters, whereas the BIC typically chooses small models, as a true approximation of their unknown population behavior (see footnote 2). Note that the behavior of these criteria has been investigated for geostatistical models by Hoeting et al. (2006) and Lee and Ghosh (2009) and for spatial processes by Agiakloglou and Tsimpanos (2021), but not for spatial econometric models.
3 Simulation results
The performance of the two previously presented information criteria, i.e., the AIC and the BIC, is investigated in terms of selecting the true spatial econometric model among three alternative models, namely the SIM, the SLM and the SDM, using a Monte Carlo analysis in which the SEM is included as an estimated SDM. The simulation process considers only one independent variable, drawn from a uniform U(0, 10) distribution, which is simulated only once and then held fixed across all iterations. Thus, the matrix \({\varvec{X}}\) has dimensions (\(n\times 2\)), consisting of one independent variable and a column of ones to estimate the constant term. The vector of coefficients \({\varvec{\beta}}\) with dimension \((2\times 1)\) is assumed to take values of one for both of its elements. The random error vector \({\varvec{\varepsilon}}\) is drawn from a \(N(0,\boldsymbol{\rm I})\) distribution and is added to the vector \({\varvec{X}}{\varvec{\beta}}\) to produce the vector of the dependent variable \({\varvec{y}}\) for the non-spatial econometric model.
Spatial dependence is introduced into the models by defining a row-standardized spatial weights matrix \({\varvec{W}}\) with dimensions (\(n\times n\)), which is constructed using the rook (four neighbors, common edge) and the queen (eight neighbors, common edge and vertex) contiguity definitions over a square regular lattice of dimensions 10 × 10 and 20 × 20, providing samples of 100 and 400 observations. Calculation of the eigenvalues for the four spatial matrices shows that the lower bound of the spatial coefficient for the rook criterion is −1 for both sample sizes, meaning that the feasible range of values that the parameter can take is (−1, 1), whereas the lower bound for the queen criterion is −1.97 for the sample size of 100 observations and −1.921 for the sample size of 400 observations, leading to feasible ranges of (−1.97, 1) and (−1.921, 1) for sample sizes of 100 and 400 observations respectively (see Bivand et al. 2013 and Agiakloglou and Tsimpanos 2021). In addition, the simulation process is extended by including weights matrices derived from the geographical structure of Greece at the level of the local authority districts of the Kallikrates Operational Programme, consisting of 325 municipalities (see footnote 3). These weights matrices are constructed to capture real geographical structure according to the 4-nearest neighbors and the 8-nearest neighbors definitions, based on the geographical coordinates of the centroid of each municipality. The formation of the weights matrices is therefore considered as given for the whole Monte Carlo analysis, although Kelejian and Piras (2011) proposed a J-test for investigating alternative spatial econometric models with different weights matrices under the null hypothesis of a specific spatial econometric model (see also Jin and Lee 2013).
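The lattice matrices and their feasible ranges can be sketched in a few lines. The study itself was carried out in R; the NumPy code below is only an illustrative reconstruction, and the function names `lattice_weights` and `feasible_range` are hypothetical.

```python
import numpy as np


def lattice_weights(side, rule="rook"):
    """Row-standardized contiguity matrix for a side x side regular lattice."""
    if rule == "rook":      # common edge: up, down, left, right
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":   # common edge or vertex: all eight surrounding cells
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:  # stay inside the grid
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)  # each row sums to one


def feasible_range(W):
    """Return (1/omega_min, 1/omega_max) from the eigenvalues of W."""
    # Eigenvalues are real for these matrices; .real drops round-off noise.
    omega = np.linalg.eigvals(W).real
    return 1.0 / omega.min(), 1.0 / omega.max()


for rule in ("rook", "queen"):
    for side in (10, 20):
        lower, upper = feasible_range(lattice_weights(side, rule))
        print(f"{rule:5s} {side}x{side}: ({lower:.3f}, {upper:.3f})")
```

The rook lower bound of −1 follows from the checkerboard structure of the lattice, whereas the queen matrices give a lower bound below −1, in line with the ranges reported above.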
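The simulation process is conducted in R using the SPDEP package developed by Bivand (2015) to generate: a) the SIM, b) the SLM, by multiplying the right-hand side of the SIM by the spatial multiplier \({\left(\boldsymbol{\rm I}-\rho {\varvec{W}}\right)}^{-1}\), that is:
$${\varvec{y}}={\left(\boldsymbol{\rm I}-\rho {\varvec{W}}\right)}^{-1}{\varvec{X}}{\varvec{\beta}}+{\left(\boldsymbol{\rm I}-\rho {\varvec{W}}\right)}^{-1}{\varvec{\varepsilon}}$$
c) the SEM as:
$${\varvec{y}}={\varvec{X}}{\varvec{\beta}}+{\left(\boldsymbol{\rm I}-\lambda {\varvec{W}}\right)}^{-1}{\varvec{u}}$$
where the error vector \({\varvec{u}}\) with dimensions (\(n\times 1\)) is drawn from a \(N(0,{\varvec{I}})\) distribution and the spatial parameters ρ and λ can take values within the feasible intervals previously mentioned for these spatial weights matrices, and d) the SDM as follows:
$${\varvec{y}}={\left(\boldsymbol{\rm I}-\rho {\varvec{W}}\right)}^{-1}\left({\varvec{X}}{\varvec{\beta}}+{\varvec{W}}{\varvec{X}}{\varvec{\theta}}\right)+{\left(\boldsymbol{\rm I}-\rho {\varvec{W}}\right)}^{-1}{\varvec{\varepsilon}}$$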
while no restrictions are applied for the values of θ. All models are estimated by maximizing the log-likelihood function so that the values of both information criteria can be calculated. The best fitted econometric model is selected according to the minimum value of the pre-defined criterion based on 1000 replications. Note that the SEM is not estimated directly but only indirectly as a SDM and the information criterion should select the SDM when the true generating model is the SEM.
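The data-generating step for the four models can be sketched as follows. The original simulations were run in R; this NumPy version is an illustrative assumption, the helper names `rook_weights` and `generate` are hypothetical, and lagging only the non-constant regressor in the SDM is a choice made here because the spatial lag of a column of ones under a row-standardized W is again a column of ones.

```python
import numpy as np


def rook_weights(side):
    """Row-standardized rook contiguity matrix on a side x side lattice."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def generate(model, X, beta, W, rng, rho=0.0, lam=0.0, theta=0.0):
    """Draw one sample of y from the SIM, SLM, SEM or SDM."""
    n = X.shape[0]
    I = np.eye(n)
    e = rng.standard_normal(n)  # N(0, I) innovations
    xb = X @ beta
    if model == "SIM":
        return xb + e
    if model == "SLM":   # y = (I - rho W)^{-1} (X beta + e)
        return np.linalg.solve(I - rho * W, xb + e)
    if model == "SEM":   # y = X beta + (I - lam W)^{-1} u
        return xb + np.linalg.solve(I - lam * W, e)
    if model == "SDM":   # y = (I - rho W)^{-1} (X beta + WX theta + e)
        wx_theta = theta * (W @ X[:, 1])
        return np.linalg.solve(I - rho * W, xb + wx_theta + e)
    raise ValueError(f"unknown model: {model}")


rng = np.random.default_rng(12345)
side = 10
W = rook_weights(side)
x = rng.uniform(0.0, 10.0, side * side)  # drawn once, then held fixed
X = np.column_stack([np.ones(side * side), x])
beta = np.array([1.0, 1.0])

y_slm = generate("SLM", X, beta, W, rng, rho=0.5)
y_sem = generate("SEM", X, beta, W, rng, lam=0.5)
y_sdm = generate("SDM", X, beta, W, rng, rho=0.5, theta=0.5)
```

Solving the linear system with `np.linalg.solve` applies the spatial multiplier without forming the inverse explicitly, which is numerically preferable and gives the same y.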
Table 1 presents the percentage selection rates of both criteria when the true generating model is the SIM. As can be seen from this table, the BIC performs very well in terms of selecting the true model, with selection rates close to 95% and 98% for samples of 100 and 400 observations respectively, regardless of the spatial weights matrix formation, including the Greek weights matrices. On the other hand, the selection rate of the true model based on both strategies using the LM tests is smaller than the selection rate of the BIC, as can be seen from Table 1, although the empirical levels of all LM tests are close to the nominal level of 5% regardless of the sample size and the spatial weights matrix formation, a result that can also be found in Anselin and Florax (1995) with two independent variables used in the regression analysis. Hence, the BIC, unlike the AIC, leads the analyst to the right model selection with slightly more confidence than either of the LM test strategies, especially for large sample sizes.
The selection rates for all three econometric models based on both information criteria when the true generating model is the SLM are reported in Table 2. As can be seen from this table, the selection rate of the true model by both criteria is affected neither by the value and sign of the spatial autoregressive parameter ρ, including the extreme case of a negative value smaller than −1 for the queen formation, nor by the spatial matrix formation. The performance of both criteria is determined mainly by the sample size, i.e., the AIC selects the true model at rates of 82% and 84% for sample sizes of 100 and 400 observations respectively, while the BIC selects the true model more accurately, at rates of 96% and 99%, respectively. Hence, the BIC outperforms the AIC, as in the previous case, in terms of selecting the right spatial econometric model for every given value of the spatial autoregressive parameter and sample size, reaching levels close to certainty. It is also important to note that the SIM is not selected at all by either criterion, except at a very low rate for small values of ρ and the small sample size. Furthermore, spatial dependence as well as the right spatial econometric model are also recognized successfully by the LM tests, since one of the two LM tests is designed for this alternative model structure. In that sense, the LM tests using both strategies select the true model, i.e., the SLM, with more confidence than the BIC, very frequently reaching levels of certainty, as can be seen from Table 3, regardless of the matrix formation (see footnote 4).
Table 4 reports the selection rates for all three econometric models by both information criteria when the true generating model is the SEM. As can be seen from this table, both criteria select the true model in its equivalent form, i.e., the SDM, more decisively than in the previous case, very frequently reaching levels of 100%. However, unlike the previous case, the performance of both criteria is strongly affected by the absolute value of the spatial error parameter λ, by the sample size and by the weights matrix formation. The true model is selected more frequently by both criteria as the absolute value of λ increases, regardless of the weights matrix formation, reaching the level of certainty for large absolute values of the spatial error parameter. For small absolute values of λ the SDM is not selected with certainty, simply because the spatial effects of a spatial error model are then not significant and therefore appear in regression analysis as a nuisance. This symptom is more pronounced for small sample sizes under both criteria, and especially under the BIC with the queen formation or the K = 8 matrix formation, where the selected best-fitted model is the SIM. The rook formation gives better results than the other weights matrix formations. In general, the AIC performs better than the BIC in this case for every given value of the spatial error parameter, sample size and weights matrix formation, except for cases where both criteria select the right model with certainty. In addition, the presence of spatial error dependence is also identified by the LM tests, since one of them is specifically designed for this alternative spatial model structure. The LM tests eventually select the true model using both strategies with slightly less certainty than the AIC, also reaching levels of 100%, as can be seen from Table 5.
Table 6 presents the selection rates of both information criteria when the true generating process is the SDM. The objective in this case is to investigate the performance of the information criteria in terms of selecting a more complicated spatial econometric model, which is equivalent to the SEM under certain restrictions, a task that cannot be carried out directly by the LM tests, since these tests have not been constructed for this spatial model structure. As can be seen from this table, which reports simulation results for values of ρ and θ equal to 0.2, 0.5 and 0.8, based on 1000 replications, both information criteria perform very well in terms of selecting the true model (see footnote 5). Moreover, neither criterion ever selects the SIM, indicating that spatial dependence is clearly recognized either through the true model or, alternatively, through the SLM in cases where the values of one or both parameters are small. However, even when the SLM is selected, although it is not the true model, the mis-selection problem is mitigated, since this model contains a spatial lag of the dependent variable. In general, the selection rate of the true model increases as the sample size and the absolute values of both parameters increase, reaching levels of certainty very quickly even for small sample sizes, regardless of the weights matrix formation. For a given sample size and value of θ (ρ), the performance of both criteria increases as the value of ρ (θ) increases. Finally, the AIC has an overall better performance than the BIC, regardless of the weights matrix formation, since the true model contains more parameters.
The presence of spatial dependence along with the selection of the true model are also investigated by the LM tests, knowing that these tests are not suitable for this spatial model structure. The aim is to examine their behavior under a realistic condition, where the limitation to the two simple spatial econometric models is relaxed, since there is a variety of spatial econometric models that can be considered and, more importantly, the spatial structure is not given a priori. The good news is that spatial dependence is detected by all LM tests with certainty, with some minor abnormal behaviors, although the results are not reported. However, both strategies select the SLM with certainty when the values of ρ and θ have the same sign, regardless of their magnitude, as can be seen from Table 7, which reports simulation results only for cases where both values are positive (see footnote 6). When the two parameters have opposite signs, the strategies tend to select the SEM instead of the SLM, especially when the absolute values of ρ and θ are equal (or close) to each other, perhaps due to the common factor, as in this case the SDM is indeed the SEM. Hence, the LM tests should be used with caution, simply because they cannot recognize any type of spatial econometric model other than the SLM or the SEM, even in the case of the SDM which approximates the SEM, as opposed to the information criteria, which can easily identify a more complicated spatial econometric structure. In fact, our results confirm the findings of Elhorst and Halleck Vega (2017) that the classical LM tests and the robust LM tests are not suitable for the SDM case. In general, both the information criteria and the strategies of the LM tests can help the analyst successfully detect the true simple spatial econometric model.
4 Further simulation results
The Monte Carlo analysis is extended by considering several other spatial econometric models, in an effort to remove the restrictions imposed by the LM tests and to evaluate the performance of the information criteria in selecting the right model among a larger variety of spatial econometric models. The models are:
(a) the Spatial AutoRegressive AutoRegressive (SARAR) model, also known as the Spatial Autoregressive Combined Model (SACM):
$${\varvec{y}}=\rho {\varvec{W}}{\varvec{y}}+{\varvec{X}}{\varvec{\beta}}+{\varvec{\varepsilon}}$$
with
$${\varvec{\varepsilon}}=\lambda {\varvec{W}}{\varvec{\varepsilon}}+{\varvec{u}}$$
(b) the Spatial Lag X (SLX) model:
$${\varvec{y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{W}}{\varvec{X}}{\varvec{\theta}}+{\varvec{\varepsilon}}$$
and (c) the Spatial Durbin Error Model (SDEM):
$${\varvec{y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{W}}{\varvec{X}}{\varvec{\theta}}+{\varvec{\varepsilon}}$$
with
$${\varvec{\varepsilon}}=\lambda {\varvec{W}}{\varvec{\varepsilon}}+{\varvec{u}}$$
and they have been generated as:
(a) the SARAR model as:
$${\varvec{y}}={\left(\boldsymbol{\rm I}-\rho {\varvec{W}}\right)}^{-1}{\varvec{X}}{\varvec{\beta}}+{\left(\boldsymbol{\rm I}-\rho {\varvec{W}}\right)}^{-1}{\left(\boldsymbol{\rm I}-\lambda {\varvec{W}}\right)}^{-1}{\varvec{u}}$$
(b) the SLX model as:
$${\varvec{y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{W}}{\varvec{X}}{\varvec{\theta}}+{\varvec{\varepsilon}}$$
(c) the SDEM as:
$${\varvec{y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{W}}{\varvec{X}}{\varvec{\theta}}+{\left(\boldsymbol{\rm I}-\lambda {\varvec{W}}\right)}^{-1}{\varvec{u}}$$
using the same simulation process.
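The three generating equations for the SARAR, SLX and SDEM can be sketched in NumPy as follows. This is an illustrative assumption rather than the authors' R code: the helper names are hypothetical, the parameter values in the usage lines are arbitrary, and only the non-constant regressor is spatially lagged.

```python
import numpy as np


def rook_weights(side):
    """Row-standardized rook contiguity matrix on a side x side lattice."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def generate_extended(model, X, beta, W, u, rho=0.0, lam=0.0, theta=0.0):
    """Build y for the SARAR, SLX or SDEM from a given innovation vector u."""
    I = np.eye(len(u))
    xb = X @ beta
    wx_theta = theta * (W @ X[:, 1])  # spatial lag of the regressor times theta
    if model == "SARAR":
        eps = np.linalg.solve(I - lam * W, u)          # spatially correlated errors
        return np.linalg.solve(I - rho * W, xb + eps)  # then the lag multiplier
    if model == "SLX":
        return xb + wx_theta + u
    if model == "SDEM":
        return xb + wx_theta + np.linalg.solve(I - lam * W, u)
    raise ValueError(f"unknown model: {model}")


rng = np.random.default_rng(2024)
side = 10
W = rook_weights(side)
X = np.column_stack([np.ones(side * side), rng.uniform(0.0, 10.0, side * side)])
beta = np.array([1.0, 1.0])
u = rng.standard_normal(side * side)  # N(0, I) innovations

y_sarar = generate_extended("SARAR", X, beta, W, u, rho=0.5, lam=0.2)
y_slx = generate_extended("SLX", X, beta, W, u, theta=0.5)
y_sdem = generate_extended("SDEM", X, beta, W, u, lam=0.5, theta=0.5)
```

Passing u explicitly keeps the function deterministic, so the same innovations can be reused across models within one replication.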
Table 8 presents the percentage selection rates of both criteria when the true generating model is the SIM and the set of candidate models is enlarged with the additional spatial econometric models. As can be seen from this table, the performance of both criteria remains almost unchanged relative to that obtained with fewer spatial econometric models and reported in Table 1. The selection rates are slightly smaller for both criteria, since more models are considered, with the BIC outperforming the AIC, but they are still very high for selecting the true model.
Things change when the true model is the SLM, especially for the AIC: as can be seen from Table 9, relative to the results reported in Table 2, the presence of the SARAR model leads the AIC to select the true model less frequently, with a reduction of approximately 10% (see footnote 7). The BIC, on the other hand, keeps nearly the same selection rate of the true model, slightly smaller than before due to the presence of more spatial econometric models, but still very high, around 95%, regardless of the sample size and the weights matrix formation. In contrast, the results become considerably more complicated when the true model is the SEM, as can be seen by comparing Tables 4 and 10. Both criteria behave very similarly: for moderate and large values of the parameter they select the true model around 40% of the time, instead of 100% as before, the SARAR model 32% of the time and the SDEM 28% of the time, whereas for small values both criteria select the SIM, failing to detect the presence of spatial dependence. Hence, the presence of more spatial econometric models undermines the good performance that both criteria showed with fewer models when the true spatial econometric model is the SEM.
Lastly, Table 11 reports the selection rates of both criteria for the SLX model among all six spatial econometric models, whereas Table 12 reports the selection rates only for the generated models SDM, SARAR and SDEM, which have two parameters. For the SLX case the selection rates of the true model increase as the value of the parameter and/or the sample size increases, with the BIC outperforming the AIC and reporting very high rates. Similarly, for the other models the selection rates increase as the values of both parameters increase and/or as the sample size increases, regardless of the weights matrix formation, reaching levels of certainty (see footnote 8).
The simulation process is extended by considering two non-ideal situations, namely a non-constant variance of the error term and non-normality, as in Mur and Angulo (2009), although these conditions are not taken into account in the calculation of either criterion and therefore they are not expected to improve their behavior. Heteroscedasticity is incorporated into the model by assuming that the error term is defined as:
where \(d_r\) is the distance of the centroid of each cell r from the centroid of the upper left cell of the grid. Of course, there are several other ways of modeling heteroscedasticity, for example by taking \(d_r\) to be the distance of the centroid of each cell r from the centroid of the central or the lower left cell of the grid (see for example Mur and Angulo, 2009), including the possibility of employing Bayesian analysis even for the case of selecting spatial econometric models under different weights matrix formations (see for example LeSage and Parent, 2007; Crespo Cuaresma and Feldkircher, 2013; Debarsy and LeSage, 2022; and Fernandez et al., 2001). However, this research focuses on the influence of heteroscedasticity on the model selection process by the AIC and the BIC and not on the form in which it appears, which is why only one case is considered.
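The distances used in this specification can be computed as sketched below. The sketch covers only the distances themselves, not how they enter the error variance; unit-sized cells and row-by-row numbering starting from the upper left cell are assumptions made here, and the function name is hypothetical.

```python
import math


def distances_from_upper_left(side):
    """Distance of each cell centroid from the upper left cell centroid.

    Cells are assumed to be unit squares numbered row by row, so the cell in
    row r and column c has its centroid at (c + 0.5, r + 0.5) and lies at
    distance sqrt(r^2 + c^2) from the centroid of cell (0, 0).
    """
    return [math.hypot(r, c) for r in range(side) for c in range(side)]


d = distances_from_upper_left(10)  # 100 distances for the 10 x 10 lattice
```

Switching the reference cell to the central or the lower left cell, as mentioned above, only changes the point from which the offsets are measured.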
Indeed, as can be seen from Table 13, which reports the selection rates only for the generated models with one parameter, the behavior of both criteria changes, either partially or significantly, under heteroscedasticity (see footnote 9). For the SIM and the SLM cases, the selection rate decreases by a small percentage using the BIC and by a large percentage, close to 15%, using the AIC, while for the SEM and the SLX cases both criteria fail to detect the true model. In fact, although it is not reported, the best-fitted model for the SEM case, i.e., the one with the highest selection rate, is the SLM, whereas for the SLM model it is the SIM, in which case both criteria fail to detect spatial dependence under heteroscedasticity. The selection failure of both criteria also appears for the SDM and the SDEM under heteroscedasticity, unlike the SARAR model, which in some cases is even selected with certainty, as can be seen from Table 14, which reports the selection rates only for the generated model. Although it is not reported, both criteria prefer to select either the SLM or the SARAR model when the true generated model is either the SDM or the SDEM.
Non-normality, on the other hand, does not seem to be a very important issue, since the selection behavior of both criteria does not change drastically, as can be seen from Tables 15 and 16, which report results for models with one and two parameters respectively, assuming that the errors are generated from a log-normal distribution. Indeed, the selection rates are slightly smaller than those obtained under normality and are identical under a Student's t distribution, although these results are not reported.
5 Concluding remarks
The identification of the model that best expresses the spatial dependence in a given data set is a matter of paramount importance for every researcher in any spatial econometric analysis. The models typically considered to express spatial dependence in empirical applications are the SLM and the SEM, because these two models can be identified by the existing spatial dependence tests, while other more general or more complicated spatial models that could better express the form of spatial dependence are less frequently considered. For this reason, this study examines the behavior of the two most important information criteria, i.e., the AIC and the BIC, in terms of detecting the true spatial econometric model, not only in comparison with the existing LM tests for the two simple spatial models but also for additional spatial econometric models whose presence cannot be identified by these tests. Simulation results show that these criteria contribute significantly, and more effectively than the existing LM tests, to the selection of the true spatial econometric model for the two simple models, the SLM and the SEM.
For the additional spatial econometric models considered in this simulation process, namely the SLX, SDM, SARAR and SDEM, the results are mixed. For most of the spatial econometric models the BIC selects the true model at very high rates, with the AIC typically, but not always, having smaller selection rates of the true model. The exception is the SEM, which both criteria select at a low rate, close to 40%, very frequently picking instead its related models, the SARAR and the SDEM.
The simulation process is extended to heteroscedasticity and non-normality, two non-ideal conditions that produce smaller selection rates of the true model by both criteria, with the AIC showing the larger decrease. However, for the generated models SEM, SLX, SDM and SDEM, heteroscedasticity strongly influences the selection process, leading to similar or related models but not to the true one and, in some cases, even failing to recognize spatial dependence. On the other hand, things do not change drastically when normality is relaxed. Indeed, smaller selection rates of the true model are obtained under the log-normal distribution, without any abnormal behavior being observed, whereas under the Student's t distribution the selection rates are almost the same.
Notes
The Greek geographical structure is obtained from https://geodata.gov.gr/en/dataset/oria-demon-kallikrates, excluding, though, the Mount Athos region.
Abnormal behavior of the LM-EL test is observed for the queen weights matrix formation and for small sample sizes.
Simulation trials have also been conducted for negative values of θ as well as for absolute values of θ greater than one using positive values of ρ. The results obtained for negative values of θ are identical to the positive ones, except for small values of θ, where the percentage rates are typically smaller, while the results obtained for absolute values of θ greater than one are the same as those for the largest positive value of θ. Similar results are also obtained in general for negative values of ρ, with some minor differences.
Simulation trials have also been conducted for positive values of ρ and absolute values of θ greater than one, as well as for negative values of ρ and for all other values of θ including values greater than one in absolute terms. The results obtained for negative values of ρ are identical to the positive ones, except for small values of θ, where the percentage rates are typically smaller, whereas the results, obtained for absolute values of θ greater than one, are the same as those for the positive value of θ for both positive and negative values of ρ.
Note that only results for positive values of the parameters are reported since the negative values gave the same results.
The simulation results for the real geographical structure of Greece are not reported for reasons of space and because they are very similar to the results of Table 12 obtained with the other two weights matrix formations.
Note that the simulation results for the real geographical structure of Greece are not included due to many instability problems that arose during the simulation process, while in general the behavior of both criteria was similar to that observed for the other two weights matrix formations.
References
Agiakloglou C, Tsimpanos A (2021) Evaluating information criteria for selecting spatial processes. Ann Reg Sci 66:677–697
Agiakloglou C, Tsimbos C, Tsimpanos A (2015) Is spurious behaviour an issue for two independent stationary spatial autoregressive SAR(1) processes? Appl Econ Lett 22:1372–1377
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Proceedings of the second international symposium on information theory. Akademiai Kiado, Budapest, pp 267–281
Anselin L (1988a) Spatial econometrics: methods and models. Kluwer Academic Publishers, Dordrecht
Anselin L (1988b) Lagrange multiplier test diagnostics for spatial dependence and spatial heterogeneity. Geogr Anal 20:1–17
Anselin L, Florax RJ (1995) Small sample properties of tests for spatial dependence in regression models: Some further results. In: Anselin L, Florax RJ (eds) New directions in spatial econometrics. Springer-Verlag, Berlin, pp 21–74
Anselin L, Bera A, Florax R, Yoon M (1996) Simple diagnostic tests for spatial dependence. Reg Sci Urban Econ 26:77–104
Bivand R, Hauke J, Kossowski T (2013) Computing the Jacobian in Gaussian spatial autoregressive models: an illustrated comparison of available methods. Geogr Anal 45:150–179
Bivand R (2015) spdep: spatial dependence: weighting schemes, statistics and models. R package version 0.5-82. http://CRAN.R-project.org/package=spdep
Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, New York
Burridge P (1980) On the Cliff-Ord test for spatial correlation. J R Stat Soc B 42:107–108
Chi G, Zhu J (2008) Spatial regression models for demographic analysis. Popul Res Policy Rev 27:17–42
Cliff AD, Ord JK (1981) Spatial processes: models and applications. Pion Ltd, London
Crespo Cuaresma J, Feldkircher M (2013) Spatial filtering, model uncertainty and the speed of income convergence in Europe. J Appl Economet 28(4):720–741
Debarsy N, LeSage J (2022) Bayesian model averaging for spatial autoregressive models based on convex combinations of different types of connectivity matrices. J Bus Econ Stat 40:547–558
Diebold FX (2007) Elements of forecasting, 4th edn. Thomson South-Western, Mason
Elhorst JP, Halleck Vega S (2017) The SLX model: extensions and the sensitivity of spatial spillovers to W. Pap De Econ Esp 152:34–50
Fernandez C, Ley E, Steel M (2001) Benchmark priors for Bayesian model averaging. J Econom 16(5):563–576
Fingleton B (1999) Spurious spatial regression: some Monte Carlo results with a spatial unit root and spatial cointegration. J Reg Sci 39:1–19
Florax RJGM, Folmer H, Rey SJ (2003) Specification searches in spatial econometrics: the relevance of Hendry’s methodology. Reg Sci Urban Econ 33(5):557–579
Hoeting JA, Davis RA, Merton AA, Thompson SE (2006) Model selection for geostatistical models. Ecol Appl 16(1):87–98
Jin F, Lee L-F (2013) Cox-type tests for competing spatial autoregressive models with spatial autoregressive disturbances. Reg Sci Urban Econ 43:590–616
Judge GG, Griffiths WE, Hill RC, Lutkepohl H, Lee T-C (1985) The theory and practice of econometrics, 2nd edn. Wiley, New York
Kelejian HH, Piras G (2011) An extension of Kelejian’s J-test for non-nested spatial model. Reg Sci Urban Econ 41:281–292
Lee H, Ghosh SK (2009) Performance of information criteria for spatial models. J Stat Comput Simul 79(1):93–106
LeSage J, Pace R (2009) Introduction to spatial econometrics. Chapman & Hall/CRC, Boca Raton
LeSage J, Parent O (2007) Bayesian model averaging for spatial econometric models. Geogr Anal 39:241–267
Mur J, Angulo AM (2006) The spatial Durbin model and the common factor tests. Spat Econ Anal 1:207–226
Mur J, Angulo AM (2009) Model selection strategies in a spatial setting: some additional results. Reg Sci Urban Econ 39:200–213
Mur J, Trivez FJ (2003) Unit roots and deterministic trend in spatial econometrics models. Int Reg Sci Rev 26:289–312
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Zhang X, Yu J (2018) Spatial weights matrix selection and model averaging for spatial autoregressive models. J Econom 203:1–18
Acknowledgements
We are indebted to the editor Giuseppe Arbia and to the two anonymous referees for their useful comments and suggestions that improved the overall research and the presentation of this manuscript.
Funding
Open access funding provided by HEAL-Link Greece.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Agiakloglou, C., Tsimpanos, A. Evaluating the performance of AIC and BIC for selecting spatial econometric models. J Spat Econometrics 4, 2 (2023). https://doi.org/10.1007/s43071-022-00030-x