1 Introduction

Georeferenced data, some of them real-time, are continuously returned by a variety of sensors (e.g., public transport vehicles equipped with a global positioning system, remote sensing satellites, and smartphones) and are obtained from more and more open sources (such as government health statistics and demographic data); consequently, the amount of spatial data is increasing at an explosive rate. This kind of data is called big spatial data or big geospatial data. Compared with traditional spatial data, these data have a much bigger volume, much more variety, and much higher velocity, and the tools for processing and analyzing them are more complex (van Zyl 2014; Lee and Kang 2015; Li et al. 2016; Haynes et al. 2018). One feature of this kind of data is massive sample sizes, which raises two basic questions: whether the latent spatial autocorrelation (SA) across different geographical structures can still be detected, and whether statistical properties established for small-to-medium size datasets pertain to large-to-massive datasets.

For a traditional analysis, two typical statistics that have been devised to quantify the nature and degree of SA are the Moran coefficient (MC) and the Geary ratio (GR), whose mathematical expressions employ different metrics. The MC contains a cross-product term in its numerator pertaining to deviations from the mean [see Eq. (1)]; this construction is similar to Pearson’s product moment correlation coefficient, r, whose spatial counterpart may be the SA parameter of a spatial autoregressive model, which is widely employed by researchers across a range of disciplines. In other words, the MC corresponds to the spatial autoregressive perspective. The GR contains a paired comparison term in its numerator, in which differences are taken between observation attribute values [see Eq. (2)]; this quantity is similar to that used to construct a semivariogram, in which the geographic variation between two locations is expressed as a difference between two observation attribute values. In other words, the GR corresponds to the geostatistical semivariogram perspective (Legendre and Fortin 1989). Spatial autoregression works with the inverse covariance matrix, whereas geostatistics works directly with the spatial covariance matrix.

Therefore, the comparison is not only between two single statistics, but between two different conceptualizations. Although these two indices were introduced many decades ago by Moran (1950) and Geary (1954), respectively, they were not widely employed as SA indexes until Cliff and Ord (1973, 1981) published their fundamental and pioneering works, in which these two statistics’ distributional properties, including their asymptotic normal sampling distributions and power (i.e., the probability of rejecting the null hypothesis when it is not true) comparisons for positive SA with small sample sizes, were established in detail. Thereafter, various studies related to these SA statistics began to appear. For example, Griffith (1987, p. 44) first pointed out the relationship function expressing the MC in terms of the GR. Tiefelsdorf and Boots (1995) derived the exact distribution of the MC for small samples, which is a seminal work that helped to establish the novel Moran eigenvector spatial filtering spatial statistics methodology (Griffith 1996). Anselin (1995, 1996) introduced local indicators of spatial association (LISA) and the Moran scatter plot, which visualizes SA with a regression trend line superimposed on those geographical attribute points appearing in the numerator of the MC distributed across the four quadrants of the plane. Boots (2003) also furnished local SA indices for categorical data. Lee (2001) developed a bivariate spatial correlation coefficient as well as its local form by integrating Pearson’s r and the MC. Boots and Tiefelsdorf (2000) investigated the behavior of SA test statistics in three regular tessellations; Bivand et al. (2009) implemented the saddlepoint approximation instead of the normal approximation, and the exact distribution of the MC, in the R spdep package, which makes power analysis easier because many geographic information system (GIS) software packages do not have this function. Chun (2008), Cheng et al. (2012), Bavaud (2013), and de la Mata and Llano (2013) discussed issues relating to network spatial autocorrelation. More recently, Carrijo and da Silva (2017) devised a modified MC to solve the problem of underestimating real SA when sample size is small; Anselin (2018) extended the Local GR to a multivariate context. Apart from those theoretical studies on the statistical properties of these two statistics, the MC and GR often are used as tools in explanatory works for descriptive and visualization purposes. In addition, the MC is used as a tool for the diagnosis of SA in regression modeling (Cliff and Ord 1969, 1970).

For a massive spatial data analysis, the mathematical or statistical properties of the MC and the GR need to be extended to much larger sample sizes on the basis of Cliff and Ord’s (1973, 1981) pioneering works. One question asks why a researcher still uses SA coefficients to describe large sample size datasets. Similar to those summary statistics (e.g., the mean, variance, and median) that portray data from different angles, and that are computed as initial descriptions when a researcher obtains his or her dataset, an SA coefficient can be seen as a summary statistic in spatial statistics. Thus, regardless of the sample size, knowing the degree of SA is useful so that researchers can have a first impression of the spatial data at hand. Moreover, calculating this statistic is not the goal of a spatial data analysis. Rather, it is a tool for determining subsequent treatments, such as the selection of a spatial model for describing data when an MC value indicates strong positive SA. If the MC is not calculated, or is ignored, the selection may instead be a nonspatial model. Consequently, extension of small sample size results to large or massive sample sizes is necessary and is the major purpose of this paper. Specifically, we derive the mathematical proofs of the asymptotic variances of the MC and the GR for different types of random variables, through which the MC is shown to be more efficient than the GR for large sample sizes. We also develop an analytical approach to compare the statistical power of the two statistics for any size dataset.

This article substantiates the findings in Luo et al. (2017), with detailed mathematical derivations and interpretations. It includes a methodology part in Sect. 2, followed by a mathematics section. Section 4 analyzes efficiency in terms of different surface partitionings and distribution conditions. Section 5 compares statistical power. Section 6 discusses relationships between the MC, the GR, and the join count statistics based on the work of Cliff and Ord (1973). This paper also provides results in Sect. 7 for two massive spatial dataset examples to verify the findings of the previous sections. Finally, this paper states conclusions and presents discussions in its last section. Its contributions beyond the 2017 paper are the following: detailed proofs for theorems, an alternative visualization of statistical power, a comment on the join count statistics that are applicable to nominal data, and two empirical examples to validate results.

2 Methodology

Most spatial analysts deal with only a few of the many possible types of random variables (e.g., normal, binomial, Poisson). Furthermore, the geospatial literature suggests that a particular set of geographic configurations furnishes useful insights into, and understanding of, many spatial statistics concepts. These are the topics of this section.

2.1 Distributional assumptions and geographic configurations

Throughout this paper, two sets of postulates are adopted: one pertains to the types of random variable (i.e., distributional assumptions) and the other pertains to geographic configuration, or surface partitioning. These foci are inspired by Cliff and Ord’s (1973, 1981) work, in which the moments of the MC and the GR are derived under normality and randomization assumptions, and power curves are drawn for several geographical configurations (i.e., circular, rook, queen, queen on torus, and an empirical surface partitioning).

This paper analyzes not only the normal distribution, but also three other specific distributions (i.e., uniform, beta with equal shape parameters less than one, and exponential). It also includes six additional geographical configurations (i.e., linear, hexagonal, maximum planar, the two versions of maximum hexagonal, and rook on torus). Essentially, different distributions render different kurtosis terms, and different geographical configurations produce different connectivity matrices. The following subsections describe details about these cases.

2.1.1 Four types of random variables

The four selected distributions are for continuous random variables; those for discrete random variables are not discussed in this paper. Figure 1 portrays their probability density function plots with their respective kurtosis terms (\(b_{2}\)).

Fig. 1 Probability density function (PDF) plots. a Normal distribution, \(b_{2} = 3\). b Uniform distribution, \(b_{2} = 9/5\). c Beta distribution (\(\alpha = \beta = 0.5\)), \(b_{2} = 3/2\). d Exponential distribution, \(b_{2} = 9\)

These distributions are selected because they furnish a representative sample of the full range of probability distributions. Specifically, the normal family is the most typical case. Each of the other three distributions has no direct connection with the normal distribution, although the exponential distribution can be subjected to a Box–Cox power transformation that approximates a normal distribution; nor does any of these three non-normal distributions have a connection with either of the other two (see Footnote 1). More specifically, the normal distribution represents a non-skewed distribution with the reference kurtosis (\(b_{2} = 3\)), the uniform distribution depicts a flat distribution within a finite interval, the beta distribution with \(\alpha = \beta = 0.5\) (this is the case employed throughout this paper) represents a sinusoidal distribution within a confined interval, and the exponential distribution depicts a skewed and leptokurtic distribution.
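The kurtosis values quoted for the four distributions can be checked exactly from their raw moments. The following is a minimal sketch, assuming standard parameterizations (a standard normal, a uniform on [0, 1], a beta with \(\alpha = \beta = 0.5\), and an exponential with rate one); the function names are hypothetical and are not part of the original analysis.

```python
from fractions import Fraction
from math import factorial


def kurtosis_from_raw_moments(m1, m2, m3, m4):
    """Return b2 = mu4 / mu2**2 from the first four raw moments."""
    mu2 = m2 - m1**2
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu4 / mu2**2


def beta_raw_moment(k, a, b):
    """E[X^k] for Beta(a, b): product of (a + r) / (a + b + r), r = 0..k-1."""
    out = Fraction(1)
    for r in range(k):
        out *= (a + r) / (a + b + r)
    return out


half = Fraction(1, 2)
# First four raw moments under the assumed parameterizations.
raw_moments = {
    "normal": [Fraction(0), Fraction(1), Fraction(0), Fraction(3)],  # N(0, 1)
    "uniform": [Fraction(1, k + 1) for k in range(1, 5)],  # U(0, 1)
    "beta": [beta_raw_moment(k, half, half) for k in range(1, 5)],
    "exponential": [Fraction(factorial(k)) for k in range(1, 5)],  # rate 1
}

b2 = {name: kurtosis_from_raw_moments(*m) for name, m in raw_moments.items()}
for name, value in b2.items():
    print(f"{name:12s} b2 = {value}")
```

Because kurtosis is invariant to location and scale, the choice of these particular parameter values should not affect the result.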

2.1.2 Geographic configurations

Ten geographic partitionings are employed: three of them, namely the maximum planar connectivity case, and the two maximum hexagonal cases (with odd and even columns), are theoretically constructed. These settings furnish a relatively comprehensive representation of possible realistic and theoretical configurations. For example, a square rook and a square queen articulation are common in the surface partitioning for remotely sensed images, whereas a hexagonal partitioning often is employed in spatial sampling designs (e.g., Chun and Griffith 2013, pp. 24–29). The linear [each of the internal areal units has two geographic neighbors, while each of the two end areal units has only one neighbor], the circle [a two-dimensional (2-D) counterpart to the linear case], and the torus [a 3-D counterpart to the square rook or queen case] do not relate to empirical landscapes; these 2-D and 3-D cases are configurations in which each areal unit has the same number of neighbors (e.g., for a circle, every areal unit has two neighbors, whereas for a torus with rook adjacency, each cell has four neighbors, and for a torus with queen adjacency, each has eight). The maximum planar case (Tait and Tobin 2017) can be seen as one possible realization of a planar graph that has the maximum number of edges \(3\left( {n - 2} \right)\), where \(n\) (i.e., the number of areal units) is the number of nodes in a graph. Furthermore, to gain a better understanding of this partitioning, maximum hexagonal cases with different numbers of columns have been designed (the internal linear units of a maximum planar case are replaced with hexagonal cells).
Figure 2 portrays three of these situations, and Table 1 lists their corresponding neighbor sums, where \(P\) and \(Q\) are the number of rows and columns, respectively, in a configuration, \(n = P \times Q\) is the number of areal units under study, and \(\varvec{C} = \left( {c_{ij} } \right)_{n \times n}\) is the connectivity matrix, where \(c_{ij} = 1\) if areal units \(i\) and \(j\) are adjacent (i.e., they have a common edge or point, and hence are neighbors), and 0 otherwise; matrix \(\varvec{C}\) is symmetric.
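To make the definition of matrix \(\varvec{C}\) concrete, the following minimal sketch builds binary connectivity matrices for three of the simpler configurations (linear, circle, and square rook) and reports their row sums. The function names are hypothetical, and the row-major numbering of grid cells is an assumption made only for illustration.

```python
def linear_c(n):
    """Linear case: unit i is adjacent to units i - 1 and i + 1 only."""
    c = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        c[i][i + 1] = c[i + 1][i] = 1
    return c


def circle_c(n):
    """Circle case (n >= 3): the linear case with its two end units joined."""
    c = linear_c(n)
    c[0][n - 1] = c[n - 1][0] = 1
    return c


def rook_c(p, q):
    """Square rook case on a P-by-Q grid: neighbors share a common edge."""
    n = p * q
    c = [[0] * n for _ in range(n)]
    for r in range(p):
        for s in range(q):
            i = r * q + s  # row-major cell index (an illustrative choice)
            if s + 1 < q:  # neighbor to the right
                c[i][i + 1] = c[i + 1][i] = 1
            if r + 1 < p:  # neighbor below
                c[i][i + q] = c[i + q][i] = 1
    return c


def row_sums(c):
    """Number of neighbors of each areal unit."""
    return [sum(row) for row in c]


def is_symmetric(c):
    """Check c_ij == c_ji for all pairs."""
    n = len(c)
    return all(c[i][j] == c[j][i] for i in range(n) for j in range(n))


print(row_sums(linear_c(5)))
print(row_sums(circle_c(5)))
print(sum(row_sums(rook_c(3, 3))))
```

The grand total of the row sums is the number of ones in \(\varvec{C}\), that is, twice the number of adjacent pairs.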

Fig. 2 Selected surface partitionings. a A regular square rook configuration. b A maximum hexagonal configuration with an odd Q. c A maximum hexagonal configuration with an even Q

Table 1 Neighbor sums of selected geographic configurations

In the hexagonal cases, the number of areal units \(n\) is no longer \(P \times Q\), but rather \(P \times Q + 2\), where the additional two areal units are those surrounding the outside of the geographic landscape. These connections between internal hexagons and the outer two areal units are designed to attain the maximum neighbor sums.

To illustrate the variation in different geographic connections and sample sizes, Table 2 presents the extreme eigenvalues of matrix \(\varvec{C}\) for each configuration, as well as their corresponding extreme MC and GR values. Discussion of the relationship between these matrix \(\varvec{C}\) eigenvalues, \(\lambda\), and the MC as well as the GR appears in Sect. 5.2.

Table 2 Selected eigenvalues and affiliated MC and GR values for different sample sizes and connectivity

The analyses for Table 2 employed essentially two different sample sizes, \(n \approx 100\) and 10,000. For L and CN-C, \(n = 1 \times 100\) or \(1 \times 10 ,000\); for SR, SQ, H, CN-TR, and CN-TQ, \(n = 10 \times 10\) or \(100 \times 100\); for MPC, \(n = 1 \times 100 + 2\) or \(1 \times 10 ,000 + 2\); for MH-O and MH-E, \(n = 10 \times 11 + 2\) or \(100 \times 101 + 2\), and \(n = 10 \times 10 + 2\) or \(100 \times 100 + 2\), respectively. Except for MPC, MH-O, and MH-E, all calculated SA index values are well behaved, in that \({\text{MC}} + {\text{GR}} \approx 1\), and the maximum eigenvalue lies between the minimum row sum and the maximum row sum of matrix \(\varvec{C}\). For example, for the linear case with the smaller sample size, because \(n - 2\) of the row sums of matrix \(\varvec{C}\) are two, and only the first and last rows have a sum of one, the maximum eigenvalue is close to, but slightly less than, two. Also, because of the symmetry of those eigenvalues (see Footnote 2) and the zero trace (see Footnote 3), the minimum eigenvalue is minus the maximum. Furthermore, the sum for the strongest positive SA is \({\text{MC}}_{ \hbox{max} } + {\text{GR}}_{ \hbox{min} } = 1.00815 + 0.00186 = 1.01001\), and its negative counterpart is \({\text{MC}}_{ \hbox{min} } + {\text{GR}}_{ \hbox{max} } = - 1.00961 + 1.99950 = 0.98989\). However, this appealing property no longer holds when the distribution of ones is highly skewed toward the two rows that represent connectivities of the outer two units (i.e., MPC, MH-O, and MH-E). This skewness is more serious in the maximum planar case because the peripheral two cells are connected not only with each other, but also with all inner cells; in other words, each of these two rows contains \(n - 1\) ones, because these cells are adjacent to every cell except themselves. Hence, a severely uneven adjacency structure yields considerably large extreme eigenvalues, and MC and GR values; but these undesirable values appear only for the first and last few eigenvalues. Table 6 in Appendix 1 reports the first ten positive and last ten negative \(\lambda\), and MC and GR values for these three anomalous configurations. It reveals a big difference between selected extreme values (marked in red) and other values in the same row; this difference seems most conspicuous for the MPC case, especially with a sample size of 10,002. Reasons for these discrepancies include: (1) the number of ones in matrix \(\varvec{C}\) for at least one areal unit increases as \(n\) increases, and (2) the diameter of the affiliated graph is relatively small.
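The linear-case figures quoted for \(n = 100\) can be checked numerically. The sketch below rests on two assumptions that are not stated in this section: (i) the closed-form spectrum of a path-graph adjacency matrix, \(\lambda_{k} = 2\cos \left[ {k\pi /\left( {n + 1} \right)} \right]\), and (ii) the scaling \({\text{MC}} = \left( {n/S_{0} } \right)\lambda\), applied to eigenvalues whose eigenvectors sum to zero (even \(k\)), so that centering leaves them unchanged. The function name is hypothetical.

```python
from math import cos, pi


def path_eigenvalue(k, n):
    """k-th largest eigenvalue (k = 1..n) of the n-node path-graph adjacency matrix."""
    return 2.0 * cos(k * pi / (n + 1))


n = 100
s0 = 2 * (n - 1)  # number of ones in C for the linear case

lam_max = path_eigenvalue(1, n)  # largest eigenvalue of C, just below two
lam_min = path_eigenvalue(n, n)  # smallest eigenvalue, equal to -lam_max

# Assumption: extreme MC values use eigenvalues with zero-sum eigenvectors,
# i.e., k = 2 for the maximum and k = n (n even) for the minimum.
mc_max = n / s0 * path_eigenvalue(2, n)
mc_min = n / s0 * path_eigenvalue(n, n)

print(f"lambda_max = {lam_max:.5f}, lambda_min = {lam_min:.5f}")
print(f"MC_max = {mc_max:.5f}, MC_min = {mc_min:.5f}")
```

Under these assumptions, the computed extremes agree with the quoted values of 1.00815 and -1.00961 to five decimal places.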

2.2 Efficiency and variance

There are two “efficiency” measures in statistics. One is used as a criterion that qualifies an estimator: of two unbiased estimators, the one with the smaller variance is more efficient. The other is used in hypothesis testing: of two test procedures, the one that needs fewer observations for a given power is more efficient. This paper uses the first “efficiency” measure, but for another purpose, namely comparing two statistics rather than estimators.

An additional reason for pursuing this efficiency comparison is that Cliff and Ord (1969, p. 45) point out that the variance of the MC is “less affected by the distribution of the sample data” than the variance of the GR. This paper seeks to prove their finding from a more general perspective; the asymptotic variance is employed to achieve this goal.

2.3 Statistical power and its visualization

Because hypothesis tests are conducted on the basis of samples, they do not always yield correct conclusions. Consequently, considering both Type I (rejecting a true null hypothesis) and Type II (failing to reject a false null hypothesis) errors in hypothesis testing is important. The probability of committing a Type I error is denoted as \(\alpha\), which also is known as the significance level and is preset at the beginning of a test procedure; the probability of committing a Type II error is denoted as \(\beta\), which depends on the sample size, the significance level, and the probability distribution under the null hypothesis. As originally devised, \({\text{power}} = 1 - \beta\), which is the probability of rejecting a false null hypothesis. The power of a hypothesis test is between zero and one, with a value closer to one indicating a better ability to reject a false null hypothesis. For illustrative purposes, Fig. 11 (Appendix 2) portrays power and is accompanied by a description furnishing an intuitive impression about this statistical concept. In a spatial data analysis procedure, testing for SA in (large) spatial datasets is crucial; accordingly, an obvious question asks about the quality of a test. A formal way or a “standard approach” (Cliff and Ord 1973, p. 131) to evaluate a test is to calculate its statistical power. As Weiss (2017, p. 449) states: “even more helpful is a visual display of the effectiveness of the hypothesis test, obtained by plotting points of power against various values of the parameter and then connecting the points with a smooth curve.” This notion of power can be applied to a two-sided (or two-tailed) as well as a one-sided (or one-tailed) situation, in keeping with the type of hypothesis test (see Footnote 4). For convenience and comparison purposes, Sect. 5 presents the critical values in terms of the MC so that all power curves, including those for the GR and the join counts, can be shown with a single plot. To do so requires establishing theoretical relationship functions linking the MC, the GR, and the join count statistics.
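The definition \({\text{power}} = 1 - \beta\) can be illustrated with a generic one-sided test. The following is a minimal sketch, assuming a test statistic that is normally distributed under both the null and the alternative hypothesis; it is not the analytical approach developed in this paper, and the function name and parameter values are hypothetical.

```python
from statistics import NormalDist


def one_sided_power(mu0, sigma0, mu1, sigma1, alpha=0.05):
    """Power of an upper-tailed test when the statistic is normal under H0 and H1."""
    # Critical value: the (1 - alpha) quantile of the null distribution.
    critical = NormalDist(mu0, sigma0).inv_cdf(1.0 - alpha)
    # beta: probability of failing to reject when the alternative is true.
    beta = NormalDist(mu1, sigma1).cdf(critical)
    return 1.0 - beta


# Hypothetical alternatives, each shifted further from the null mean of zero.
for shift in (0.0, 1.0, 2.0, 3.0):
    print(shift, round(one_sided_power(0.0, 1.0, shift, 1.0), 4))
```

When the alternative coincides with the null distribution, the computed power reduces to \(\alpha\); plotting power against the shift gives a curve of the kind Weiss describes.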

3 Notation and theorems

This section presents necessary notation and limit theorems about variances of the MC and the GR.

Let \(X\) be the georeferenced variable of interest distributed over a tessellation. Its observations are \(x_{1} , x_{2} , \ldots , x_{n}\). The average of these observations is denoted by \(\bar{x} = \sum\nolimits_{i = 1}^{n} {x_{i} } /n\). \(\varvec{C} = \left( {c_{ij} } \right)_{n \times n}\) is the connectivity matrix defined in Sect. 2.1.2. The sample MC and GR for variable \(X\) are defined as follows:

$${\text{MC}} = \frac{{n\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} \left( {x_{i} - \bar{x}} \right)\left( {x_{j} - \bar{x}} \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} \mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} }},$$
(1)

and

$${\text{GR}} = \frac{{\left( {n - 1} \right)\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} \left( {x_{i} - x_{j} } \right)^{2} }}{{2\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} \mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} }}.$$
(2)

The GR can be rewritten as (Griffith 1987)

$${\text{GR}} = \frac{n - 1}{{2\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} }}\frac{{2\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} \left( {\mathop \sum \nolimits_{j = 1}^{n} c_{ij} } \right)}}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - \bar{x}} \right)^{2} }} - \frac{n - 1}{n}{\text{MC}}.$$
(3)

Derivation of this formula appears as proof 1 in Appendix 3.
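Equations (1) to (3) can be checked numerically on a small example. The sketch below computes the MC and the GR directly from their definitions and then evaluates the right-hand side of Eq. (3), assuming a symmetric binary matrix \(\varvec{C}\). The function names and the six data values are hypothetical and serve only as an illustration.

```python
def moran_geary(x, c):
    """Return (MC, GR) from Eqs. (1) and (2) for values x and connectivity matrix c."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in c)  # sum of all c_ij
    ss = sum(v * v for v in z)  # sum of squared deviations
    cross = sum(c[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    diff = sum(c[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    mc = n * cross / (s0 * ss)
    gr = (n - 1) * diff / (2 * s0 * ss)
    return mc, gr


def geary_from_moran(x, c, mc):
    """Right-hand side of Eq. (3); assumes c is symmetric."""
    n = len(x)
    mean = sum(x) / n
    z2 = [(v - mean) ** 2 for v in x]
    s0 = sum(sum(row) for row in c)
    weighted = sum(z2[i] * sum(c[i]) for i in range(n))  # deviations weighted by row sums
    return (n - 1) / (2 * s0) * 2 * weighted / sum(z2) - (n - 1) / n * mc


# Hypothetical data: six areal units on a line (linear configuration).
x = [1.0, 2.0, 4.0, 3.0, 6.0, 7.0]
c = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
mc, gr = moran_geary(x, c)
print(mc, gr, geary_from_moran(x, c, mc))
```

The two GR values should agree up to floating-point rounding, which illustrates that Eq. (3) is an algebraic identity rather than an approximation.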

Cliff and Ord (1973) establish the exact variances of these two statistics. In the following, the subscript N denotes normality and R denotes randomization:

$${\text{Var}}_{N} \left( {\text{MC}} \right) = \frac{{n^{2} S_{1} - nS_{2} + 3S_{0}^{2} }}{{\left( {n - 1} \right)\left( {n + 1} \right)S_{0}^{2} }} - \frac{1}{{\left( {n - 1} \right)^{2} }},$$
(4)
$${\text{Var}}_{R} \left( {\text{MC}} \right) = \frac{{n\left[ {\left( {n^{2} - 3n + 3} \right)S_{1} - nS_{2} + 3S_{0}^{2} } \right] - b_{2} \left[ {\left( {n^{2} - n} \right)S_{1} - 2nS_{2} + 6S_{0}^{2} } \right]}}{{\left( {n - 1} \right)\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} - \frac{1}{{\left( {n - 1} \right)^{2} }},$$
(5)
$${\text{Var}}_{N} \left( {\text{GR}} \right) = \frac{{\left[ {\left( {2S_{1} + S_{2} } \right)\left( {n - 1} \right) - 4S_{0}^{2} } \right]}}{{2\left( {n + 1} \right)S_{0}^{2} }},$$
(6)

and

$${\text{Var}}_{R} \left( {\text{GR}} \right) = \frac{{\left( {n - 1} \right)S_{1} \left[ {n^{2} - 3n + 3 - \left( {n - 1} \right)b_{2} } \right] - \frac{1}{4}\left( {n - 1} \right)S_{2} \left[ {n^{2} + 3n - 6 - \left( {n^{2} - n + 2} \right)b_{2} } \right]}}{{n\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }} + \frac{{S_{0}^{2} \left[ {n^{2} - 3 - \left( {n - 1} \right)^{2} b_{2} } \right]}}{{n\left( {n - 2} \right)\left( {n - 3} \right)S_{0}^{2} }},$$
(7)

where \(S_{0} = \sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{n} {c_{ij} } }\), \(S_{1} = \frac{1}{2}\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{n} {\left( {c_{ij} + c_{ji} } \right)^{2} } }\), \(S_{2} = \sum\nolimits_{i = 1}^{n} {\left[ {\sum\nolimits_{j = 1}^{n} {\left( {c_{ij} + c_{ji} } \right)} } \right]^{2} }\), and for \(z_{i} = x_{i} - \bar{x}\), \(b_{2} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {z_{i}^{4} } /\left( {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {z_{i}^{2} } } \right)^{2}\) defines kurtosis. Again, because matrix \(\varvec{C}\) is symmetric and binary, \(S_{1} = 2\sum\nolimits_{i = 1}^{n} {\sum\nolimits_{j = 1}^{n} {c_{ij} } } = 2S_{0}\), and \(S_{2} = 4\sum\nolimits_{i = 1}^{n} {\left( {\sum\nolimits_{j = 1}^{n} {c_{ij} } } \right)^{2} }\).
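Equations (4) to (7) are straightforward to evaluate once \(S_{0}\), \(S_{1}\), and \(S_{2}\) are known. The following minimal sketch uses exact rational arithmetic and a ten-unit circle configuration chosen only for illustration; the function names are hypothetical, and \(b_{2}\) is passed in as a known value rather than estimated from data.

```python
from fractions import Fraction


def s_terms(c):
    """Return (S0, S1, S2) for a connectivity matrix c, as defined below Eq. (7)."""
    n = len(c)
    s0 = sum(sum(row) for row in c)
    s1 = Fraction(1, 2) * sum((c[i][j] + c[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum(sum(c[i][j] + c[j][i] for j in range(n)) ** 2 for i in range(n))
    return Fraction(s0), s1, Fraction(s2)


def var_n_mc(n, s0, s1, s2):
    """Eq. (4): exact variance of the MC under normality."""
    return (n**2 * s1 - n * s2 + 3 * s0**2) / ((n - 1) * (n + 1) * s0**2) - Fraction(1, (n - 1) ** 2)


def var_r_mc(n, s0, s1, s2, b2):
    """Eq. (5): exact variance of the MC under randomization."""
    num = n * ((n**2 - 3 * n + 3) * s1 - n * s2 + 3 * s0**2)
    num -= b2 * ((n**2 - n) * s1 - 2 * n * s2 + 6 * s0**2)
    return num / ((n - 1) * (n - 2) * (n - 3) * s0**2) - Fraction(1, (n - 1) ** 2)


def var_n_gr(n, s0, s1, s2):
    """Eq. (6): exact variance of the GR under normality."""
    return ((2 * s1 + s2) * (n - 1) - 4 * s0**2) / (2 * (n + 1) * s0**2)


def var_r_gr(n, s0, s1, s2, b2):
    """Eq. (7): exact variance of the GR under randomization."""
    num = (
        (n - 1) * s1 * (n**2 - 3 * n + 3 - (n - 1) * b2)
        - Fraction(1, 4) * (n - 1) * s2 * (n**2 + 3 * n - 6 - (n**2 - n + 2) * b2)
        + s0**2 * (n**2 - 3 - (n - 1) ** 2 * b2)
    )
    return num / (n * (n - 2) * (n - 3) * s0**2)


# Illustrative circle configuration with n = 10 (every unit has two neighbors).
n = 10
c = [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)] for i in range(n)]
s0, s1, s2 = s_terms(c)
print(s0, s1, s2)
print(var_n_mc(n, s0, s1, s2), var_n_gr(n, s0, s1, s2))
print(var_r_mc(n, s0, s1, s2, 3), var_r_gr(n, s0, s1, s2, 3))
```

For this symmetric binary matrix, the computed \(S_{1}\) equals \(2S_{0}\), in agreement with the simplification stated above.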

Griffith (2010) proposes simplifying Eqs. (4)–(7) through asymptotics, assuming a normal distribution, producing

$${\text{Var}}_{A} \left( {\text{MC}} \right) = \frac{2}{{\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} }} = \frac{2}{{S_{0} }},$$
(8)

and

$${\text{Var}}_{A} \left( {\text{GR}} \right) = \frac{2}{{\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} }} + \frac{{2\mathop \sum \nolimits_{i = 1}^{n} \left( {\mathop \sum \nolimits_{j = 1}^{n} c_{ij} } \right)^{2} }}{{\left( {\mathop \sum \nolimits_{i = 1}^{n} \mathop \sum \nolimits_{j = 1}^{n} c_{ij} } \right)^{2} }} = \frac{2}{{S_{0} }} + \frac{{S_{2} }}{{2S_{0}^{2} }},$$
(9)

where subscript A denotes asymptotic.

Theorems 1 and 2 indicate that the asymptotic variance for the MC is insensitive to swapping the normality and randomization assumptions. They also reveal that the asymptotic variance of the MC approximates the exact variances well for both of these cases.

Theorem 1

\(\mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{N} \left( {\text{MC}} \right) = {\text{Var}}_{A} \left( {\text{MC}} \right).\)

Theorem 2

\(\mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{R} \left( {\text{MC}} \right) = {\text{Var}}_{A} \left( {\text{MC}} \right).\)

Theorems 3 and 4 discuss the convergence of the GR exact variance for different probability assumptions when sample size approaches infinity. Unlike the MC, the GR does not yield a single limit under the randomization (permutation sampling) assumption; its asymptotic variance depends on distributional assumptions.

Theorem 3

\(\mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{N} \left( {\text{GR}} \right) = {\text{Var}}_{A} \left( {\text{GR}} \right)\).

Theorem 4

\(\mathop {\lim }\limits_{n \to \infty } {\text{Var}}_{R} \left( {\text{GR}} \right)\) depends on \(b_{2}\), the kurtosis of a distribution.

Proofs for Theorems 1 to 4 appear in Appendix 3. For normal, uniform, beta, and exponential distributions, \(b_{2}\) has the values 3, 9/5, 3/2, and 9, respectively. Thus,

$${\text{Var}}_{\text{AN}} \left( {\text{GR}} \right) = \frac{2}{{S_{0} }} + \frac{{S_{2} }}{{2S_{0}^{2} }},$$
(10)
$${\text{Var}}_{\text{AU}} \left( {\text{GR}} \right) = \frac{2}{{S_{0} }} + \frac{{S_{2} }}{{5S_{0}^{2} }},$$
(11)
$${\text{Var}}_{\text{AB}} \left( {\text{GR}} \right) = \frac{2}{{S_{0} }} + \frac{{S_{2} }}{{8S_{0}^{2} }}, \quad \left( {\alpha = \beta = 0.5} \right),$$
(12)

and

$${\text{Var}}_{\text{AE}} \left( {\text{GR}} \right) = \frac{2}{{S_{0} }} + \frac{{2S_{2} }}{{S_{0}^{2} }},$$
(13)

where the subscripts AN, AU, AB, and AE, respectively, denote the asymptotic variance of the normal, uniform, beta, and exponential distribution. That is to say, the asymptotic variance of the GR is sensitive to distributional assumptions.
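The four \(S_{2}\) coefficients in Eqs. (10) to (13) follow a common pattern in \(b_{2}\). The sketch below assumes the coefficient has the form \(\left( {b_{2} - 1} \right)/4\); this form is inferred here because it reproduces all four equations, and it is not quoted from the proofs in Appendix 3. The function names are hypothetical.

```python
from fractions import Fraction

# Kurtosis values stated in the text for the four distributions.
KURTOSIS = {
    "normal": Fraction(3),
    "uniform": Fraction(9, 5),
    "beta": Fraction(3, 2),  # alpha = beta = 0.5
    "exponential": Fraction(9),
}


def s2_coefficient(b2):
    """Multiplier of S2 / S0**2; the form (b2 - 1) / 4 is an inferred assumption."""
    return (Fraction(b2) - 1) / 4


def var_a_mc(s0):
    """Eq. (8): asymptotic variance of the MC."""
    return Fraction(2) / s0


def var_a_gr(s0, s2, b2):
    """Asymptotic GR variance of the form 2 / S0 + coefficient * S2 / S0**2."""
    return Fraction(2) / s0 + s2_coefficient(b2) * s2 / s0**2


for name, b2 in KURTOSIS.items():
    print(f"{name:12s} coefficient of S2/S0^2 = {s2_coefficient(b2)}")
```

The printed coefficients can be compared directly with the fractions appearing in Eqs. (10) to (13).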

Equation (10) coincides with Griffith’s (2010) result [Eq. (9)].

4 Efficiency analysis

This section summarizes results for both asymptotic and exact variances.

4.1 Asymptotic variance ratios

Considering that a statistic with a smaller variance is more efficient, define the exact variance ratio of the MC to the GR as \(r_{\text{exact}} = {\text{Var}}_{\text{exact}} \left( {\text{MC}} \right)/{\text{Var}}_{\text{exact}} \left( {\text{GR}} \right)\), where subscript “exact” denotes the exact MC and GR variances, given by Eqs. (4) and (6), or by Eqs. (5) and (7). If \(r_{\text{exact}} < 1\), then the MC is more efficient than the GR; if \(r_{\text{exact}} > 1\), then the GR is more efficient. The following asymptotic variance ratio also is of interest:

$$r = {\text{Var}}_{A} \left( {\text{MC}} \right)/{\text{Var}}_{A*} \left( {\text{GR}} \right) = \frac{{2/S_{0} }}{S},$$
(14)

where A* denotes AN, AU, AB, or AE and \(S\) denotes the right-hand side of Eq. (10), (11), (12), or (13). Similarly, if \(r < 1\), then the MC is more efficient than the GR; if \(r > 1\), then the GR is more efficient. Equation (14) indicates that \(S_{0}\), the sum of ones in matrix \(\varvec{C}\), and \(S_{2}\), which is four times the sum of the squared row sums of matrix \(\varvec{C}\) (see Sect. 3), are needed to calculate the variance ratio. These two quantities take different values for different geographical configurations; values of \(S_{0}\) and \(S_{2}\) for selected geographical configurations are listed in Table 1. More values can be found in Tables 1 and 2 of Luo et al. (2017).

Table 3 presents selected asymptotic variance ratios. Figure 3 portrays their respective ratio curves.

Table 3 Asymptotic variance ratios of the MC and the GR

Fig. 3 Asymptotic ratio curves. a Curves for normal distribution. b Curves for uniform distribution. c Curves for beta distribution. d Curves for exponential distribution

Term \(k\) in the last column of Table 3 is the number of constant neighbors; for example, \(k\) may be 2, 4, and 8 for the circle, torus rook, and torus queen cases, respectively. All ratios are less than one, and for the maximum planar connectivity case in particular the values go to zero, which indicates that the MC is more efficient in terms of the asymptotic variances. This property can be seen more clearly from Fig. 3 because its curves have convergent trends whose trajectory values are less than one before \(n\) reaches 100. The CN case is not shown because when \(k\) is 2, 4, and 8, its ratios become the same values as those for the L, SR, and SQ cases.
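For a constant-neighbor configuration, Eq. (14) can be evaluated in closed form. The sketch below assumes that every areal unit has exactly \(k\) neighbors, so that \(S_{0} = nk\) and \(S_{2} = 4nk^{2}\), and it again assumes the inferred \(S_{2}\) coefficient \(\left( {b_{2} - 1} \right)/4\), which reproduces Eqs. (10) to (13). The function names are hypothetical, and the output is not claimed to reproduce Table 3.

```python
from fractions import Fraction


def asymptotic_ratio(s0, s2, b2):
    """Eq. (14): Var_A(MC) / Var_A*(GR), with an inferred (b2 - 1) / 4 coefficient."""
    var_mc = Fraction(2) / s0
    var_gr = Fraction(2) / s0 + (Fraction(b2) - 1) / 4 * s2 / s0**2
    return var_mc / var_gr


def constant_neighbor_ratio(n, k, b2):
    """Ratio when every unit has k neighbors: S0 = n * k and S2 = 4 * n * k**2."""
    return asymptotic_ratio(n * k, 4 * n * k**2, b2)


# Circle (k = 2), torus rook (k = 4), and torus queen (k = 8) under normality.
for k in (2, 4, 8):
    print(k, constant_neighbor_ratio(100, k, 3))
```

Under these assumptions, the ratio simplifies to \(2/\left[ {2 + k\left( {b_{2} - 1} \right)} \right]\), which does not depend on \(n\) and is below one whenever \(b_{2} > 1\).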

4.2 Exact variance ratios

Section 4.1 discusses asymptotic variance ratios as well as the efficiency priority (i.e., \(r < 1\)) of the MC versus the GR for the selected probability distributions. One remaining question asks whether or not identical results can be obtained when their exact variances [Eqs. (4) to (7)] are considered. Fortunately, by substituting appropriate \(S_{0}\) and \(S_{2}\) values into these formulae, and using mathematical calculation software (e.g., Wolfram Mathematica 10.2, which has been used for this paper), these exact variance ratios are not difficult to compute. Table 3 in Luo et al. (2017) already summarizes them. That table reveals that all exact variance ratios of the MC versus the GR are one except those for the MPC and MH cases. More specifically, all exact ratios for the MPC case are zero, and they are (approximately) 0.4286, 0.6522, 0.75, and 0.1579 for the MH cases for the selected normal, uniform, beta, and exponential distributions, respectively. These results indicate that, based on the exact variances, the MC is more efficient than the GR only for the MPC and MH cases. Because the MPC exact ratio is the same as its asymptotic counterpart (both are zero), and because all other asymptotic variance ratios are not one, these latter asymptotic versions need to be adjusted in order for the ratios to become one. Furthermore, calculating ratios of asymptotic and exact variances of the MC and the GR reveals that the GR’s asymptotic variances are the ones that need to be adjusted. The necessary GR adjustment factors equal those exact ratios divided by their asymptotic ratios. For the MPC case, the MC asymptotic variance also needs to be adjusted: it should be multiplied by 1/3 for all probability distributions. This adjustment assessment furnishes quantitative evidence that the GR is far more sensitive to the underlying frequency distribution of an attribute variable.

Luo et al. (2017) present the exact variance ratio curves as well as values for 184 specimen irregular surface partitions (see Footnote 5). For illustrative and comparative purposes, Fig. 4 reproduces some of these plots in finer detail (with a higher resolution and more distinguishable colors); they depict convergence in the intervals [13, 7250], [10, 7250], [8, 7250], and [23, 7250] for the normal, uniform, beta, and exponential probability distributions, respectively.

Fig. 4 Exact variance ratio curves with 184 specimen points superimposed. a Curves for a normal distribution. b Curves for a uniform distribution. c Curves for a beta distribution. d Curves for an exponential distribution

Figure 4 shows that, except for the MPC (the purple curve) ratio converging on zero, and the MH (the pink curve) ratio converging on a specific value less than one for each probability distribution, these ratios converge on one. In addition, those specimen geographic landscapes, presented as black dots superimposed on the ratio curves, mostly scatter between the regular SQ (the green curve) and the MH (the pink curve) cases.

In this section, asymptotic as well as exact variance ratios of the MC and the GR are discussed. These asymptotic variances are far simpler in their expressions than their exact counterparts. This simplicity motivates an exploration of the sample size (or threshold) above which results obtained with asymptotic methods are close to those obtained with exact methods. More detailed work about these two statistics is included in Luo et al. (2017, p. 263, Table 4). One also is interested in these statistics in terms of their asymptotic variance, especially if they have better statistical properties when sample size goes to infinity. For example, for a 1000-by-1000 remotely sensed image, whose size of 1,000,000 far exceeds those thresholds above which asymptotic results are close to exact results (see the square rook row in Table 4 in Luo et al. 2017), both asymptotic variances of the MC and the GR achieve good accuracy. Consequently, one question asks how to choose between these two indices; this section answers this particular question.

5 Statistical power visualization

Cliff and Ord (1973) conduct simulation experiments to compare the power of the MC and the GR by employing 12-by-2, 4-by-3, 5-by-5, and 7-by-7 lattices in which both the SR and SQ cases are discussed, a 25-cell circle, and the 26 counties of Eire (an irregular surface partitioning), and conclude that the MC is more powerful. Subsequently, they (1981) updated the largest sample size to 81 (a 9-by-9 lattice) by referring to Haining’s (1978) work. Being different from the spatial Markov scheme that Cliff and Ord used, Haining introduces a two-dimensional moving average spatial model as the alternative hypothesis, compares the power of the likelihood ratio (denoted by L.R. in his paper) and the MC, and draws the conclusion that the L.R. statistic is more powerful. Writing a year earlier, Bartels and Hordijk (1977) discuss the MC power by using three different error estimators (OLS, BLUS, and RELUS) in their four illustrative examples (the dataset for the first three is the Netherlands with 39 regions, but with a different number of variables for each example, whereas the dataset for the last case is Eire with 26 regions and three artificial variables). All OLS estimators achieve the highest power, except for a very few high (0.9) and low (0.1) SA values. More recently, Dray (2011) develops two new SA indexes to describe a more complex situation (positive and negative SA are involved simultaneously, and their summation is zero or nearly zero), uses a Monte Carlo method to test the significance of these new statistics as well as the MC, and concludes that these two new statistics are as powerful as the MC for purely positive or negative SA structures, but are more powerful than the MC for complex situations.

However, all of these power assessments are calculated based upon Monte Carlo simulations, and only for several selected positive SA values (a one-tailed test). Actually, in the spatial analysis literature, Monte Carlo approaches used for inference are widely adopted not only for areal unit data, but also for point data (Diggle 2010) because of their flexibility, intelligibility, and extendibility. Although Hope (1968) suggests a simplified Monte Carlo test procedure to reduce the size of a reference set (i.e., the number of iterations), generating thousands or even millions of random numbers that are in keeping with a tested distribution still is time-consuming. In addition, this procedure needs to include repetitions. Even reducing the number of iterations by a few in order to reduce the processing time can be at the expense of precision. In contrast, an analytical approach is more rapid and accurate. The following section presents various power curves for the MC and GR with different sample sizes and geographical configurations, which are plotted by an alternative method that appears in Luo et al. (2017).

5.1 A method for calculating statistical power

The definition of statistical power states that if \(1 - \beta_{\text{MC}} > 1 - \beta_{\text{GR}}\), then the MC is more powerful than the GR (i.e., the MC test is more likely than the GR test to reject a false null hypothesis, or the MC test is more likely to obtain a significant result to support the existence of a spatially autocorrelated phenomenon); otherwise, the GR is more efficient. Luo et al. (2017) state an alternative hypothesis, \(H_{1}\), of nonzero SA, which results in two-tailed tests for the MC and GR. But in order to parallel pioneering work and make a clearer comparison, this section focuses on the one-tailed counterpart (Footnote 6). Thus, the null hypothesis, \(H_{0}\), still is no SA, but \(H_{1}\) becomes a hypothesis of positive SA; here \(\alpha\) is set to 0.05.
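To make the quantity \(1 - \beta\) concrete, the following is a minimal sketch of a one-tailed power calculation under a normal approximation. It is not the procedure of Luo et al. (2017), whose details are not reproduced here; the function name `one_tailed_power` and its arguments are hypothetical, and the means and standard deviations of the test statistic under \(H_{0}\) and under a given alternative are assumed to be supplied by the analyst.

```python
from statistics import NormalDist


def one_tailed_power(mu0, sd0, mu1, sd1, alpha=0.05):
    """Power of an upper-tailed test under a normal approximation.

    mu0, sd0: mean and standard deviation of the statistic under H0.
    mu1, sd1: mean and standard deviation under one specific alternative.
    (Hypothetical helper; all four moments are inputs, not derived here.)
    """
    std_normal = NormalDist()
    # Reject H0 when the statistic exceeds this critical value.
    critical = mu0 + std_normal.inv_cdf(1.0 - alpha) * sd0
    # Power = probability of exceeding the critical value when H1 holds.
    return 1.0 - std_normal.cdf((critical - mu1) / sd1)
```

Positive SA corresponds to large MC values but to GR values below one, so under this sketch the GR would be handled by applying the same function to the negated statistic (i.e., negated means). When the alternative coincides with the null, the function returns \(\alpha\), as it should.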

Figure 5 displays the MC and GR power curves for positive SA and various surface partitionings, where the horizontal axis presents the degree of SA, and the vertical axis stands for the value of statistical power. Each plot shows two sample sizes, 5-by-5 and 9-by-9 (except for the MH with an odd Q, which has two more cells than the other sample sizes), where the solid red–green lines represent the MC-GR power curves with 25 (or 27) cells, and the dashed pink–blue lines represent the MC-GR power curves with 81 (or 83) units. Several conclusions can be made: (1) power increases with increasing sample size (which is a standard result) and the degree of SA; (2) for the SR, SQ, and H cases, the MC is more powerful than the GR; (3) for the L and CN (circle, torus rook, and torus queen) cases, the GR is slightly more powerful than the MC for very small sample sizes (e.g., 5-by-5), but this small advantage disappears with increasing sample size (e.g., 9-by-9); and, (4) for the MH case, the MC is more powerful. Findings (1) to (3) are consistent with Cliff and Ord’s (1981) summaries, whereas the MH as well as the H case is newly shown here. Moreover, compared with the early power curves, these Fig. 5 curves are smoother because they are drawn with an analytical method rather than simulation experiments employing only several specific sample sizes; any size power curve can be plotted this way.

Fig. 5

The MC and GR positive power curves for various geographic configurations. a The L case. b The CN-C case. c The SR case. d The CN-TR case. e The SQ case. f The CN-TQ case. g The H case. h The MH case

5.2 A theoretical evaluation

A key step in the method outlined in Sect. 5.1 is to evaluate the relationship function between the MC and the GR so that their power curves can be plotted with a common measurement scale. In the preceding power analysis, SA is quantified by the MC, so all the GR values are replaced by their respective MC expressions. Fortunately, Eq. (3) furnishes a primary form that indicates a negative correlation between the MC and GR. Referring to this formula, theoretical equations can be constructed. However, in order to construct these equations, the MC and GR values need to be generated. One technique is to take advantage of the matrix \(\left( {\varvec{I} - \frac{{\varvec{1}\varvec{1}^{\text{T}} }}{n}} \right)\varvec{C}\left( {\varvec{I} - \frac{{\varvec{1}\varvec{1}^{\text{T}} }}{n}} \right)\), which appears in the numerator of Eq. (1) when the MC is written using matrix notation, where \(\varvec{I}\) is the identity matrix, \(\varvec{1}\) is an \(n\)-by-1 vector of ones, and T denotes the matrix transpose operation. Multiplying the eigenvalues of this matrix by \(\frac{n}{{\varvec{1}^{\text{T}} \varvec{C}\varvec{1}}}\) furnishes the complete set of distinct MC values for a geographic landscape, with the extreme values establishing the minimum and maximum possible MC values (de Jong et al. 1984). Corresponding GR values also can be calculated with the eigenvectors of this matrix: using matrix notation, the numerator of Eq. (2) may be written as a quadratic form in the matrix \(2\left( {\left( {\varvec{C}\varvec{1}} \right)_{{{\mathbf{diagonal}}}} - \varvec{C}} \right)\) (de Jong et al. 1984; Griffith 2003), where \(\left( {\varvec{C}\varvec{1}} \right)_{{{\mathbf{diagonal}}}}\) is a diagonal matrix whose diagonal entries are the row sums of the connectivity matrix \(\varvec{C}\). The resulting theoretical relationship functions appear in Luo et al. (2017).
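The eigenvector technique described above can be sketched as follows for a small regular square lattice with rook adjacency. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the GR is assumed to have the usual form with \((n - 1)\) in its numerator and \(2\,\varvec{1}^{\text{T}}\varvec{C}\varvec{1}\) in its denominator, and a dense eigendecomposition is used, which is practical only for small \(n\).

```python
import numpy as np


def rook_connectivity(p, q):
    """Binary rook-adjacency matrix C for a p-by-q square lattice."""
    n = p * q
    C = np.zeros((n, n))
    for r in range(p):
        for c in range(q):
            i = r * q + c
            if c + 1 < q:  # east neighbour
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < p:  # south neighbour
                C[i, i + q] = C[i + q, i] = 1.0
    return C


def mc_gr_from_eigenvectors(C):
    """MC and GR values generated by the eigenvectors of (I - 11'/n) C (I - 11'/n)."""
    n = C.shape[0]
    s0 = C.sum()  # 1'C1
    # Orthonormal basis Q of the subspace orthogonal to the vector of ones:
    # QR of [1, e_1, ..., e_{n-1}]; the first column spans 1, the rest its complement.
    A = np.column_stack([np.ones(n), np.eye(n)[:, : n - 1]])
    Q = np.linalg.qr(A)[0][:, 1:]
    # Eigenpairs of Q'CQ are the n - 1 non-constant eigenpairs of the projected matrix.
    lam, V = np.linalg.eigh(Q.T @ C @ Q)
    E = Q @ V  # zero-mean, unit-length eigenvectors
    mc = n / s0 * lam
    L = np.diag(C.sum(axis=1)) - C  # (C1)_diagonal - C
    gr = (n - 1) / s0 * np.sum(E * (L @ E), axis=0)
    return mc, gr, E


C = rook_connectivity(5, 5)
mc, gr, _ = mc_gr_from_eigenvectors(C)
print("MC range:", mc.min(), mc.max())
print("correlation between MC and GR:", np.corrcoef(mc, gr)[0, 1])
```

Working in the subspace orthogonal to the constant vector avoids the eigenvector for which the variance term is zero, leaving \(n - 1\) paired (MC, GR) values that can be plotted as a scatter of the kind shown in Fig. 6.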

Figure 6 portrays selected scatter plots with fitted lines (shown in red) superimposed on them. These plots depict the relationship between the GR (the vertical axis) and the MC (the horizontal axis) for regular tessellations (the SR, SQ, and H cases for \(n = 10{,}000\)) and the CN case. Overall, these scatter plots have a negative sloping trend line, although thick line portions appear in the SR, SQ, and H scatter plots. All of the fitted lines evaluated by the functions that have the same form as Eq. (3) closely correspond to their respective scatter plots.

Fig. 6

MC versus GR scatter plots with superimposed fitted lines. a The SR case. b The SQ case. c The H case. d The CN-TR case. e The CN-TQ case. f The CN-C case

6 The MC and GR versus the join count statistics

As one type of test for SA, the join count statistics (Cliff and Ord 1973) apply to nominal (e.g., binary 0–1) data. Three different join count statistics exist: BB, WW, and BW, where BB denotes a one area adjacent to a one area, WW denotes a zero area adjacent to a zero area, and BW denotes a one adjacent to a zero area. Assignment of the value one or zero to the \(i\)th areal unit depends on the presence or absence of some phenomenon in that unit. If it is present, then this unit has \(x_{i} = 1\); otherwise, it has \(x_{i} = 0\).

Cliff and Ord (1973) point out the similarity of the BB and MC, and the BW and GR, furnish an equation relating the BB and MC in which the attribute variable \(X\) also is included, and derive WW as a linear combination of BB and BW. Although these join count statistics are less popular today than several decades ago, Chun and Griffith (2013) furnish equations relating the MC and BB + WW, and the GR and BW for nonfree sampling (sampling without replacement):

$${\text{MC}} = \frac{2n}{{S_{0} }}\left( {\frac{\text{BB}}{{n_{1} }} + \frac{\text{WW}}{{n_{2} }}} \right) - 1,$$
(15)

and

$${\text{GR}} = \frac{{n\left( {n - 1} \right)}}{{S_{0} }}\frac{\text{BW}}{{n_{1} n_{2} }},$$
(16)

where \(n_{1}\) is the number of areal units with one, \(n_{2}\) is the number of areal units with zero, \(n_{1}\) and \(n_{2}\) are preset, and \(n_{1} + n_{2} = n\).
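Equations (15) and (16) can be checked numerically on a small binary grid. The sketch below assumes rook adjacency, counts each join once, takes \(S_{0}\) as the sum of all entries of the binary connectivity matrix (twice the number of joins), and uses the standard definitions of the MC and the GR for the direct calculation; the function names and the random test image are hypothetical.

```python
import numpy as np


def rook_pairs(img):
    """Values at both ends of every rook join (each join listed once)."""
    a = np.concatenate([img[:, :-1].ravel(), img[:-1, :].ravel()])
    b = np.concatenate([img[:, 1:].ravel(), img[1:, :].ravel()])
    return a, b


def join_counts(img):
    """BB, WW, and BW join counts for a 0-1 image."""
    a, b = rook_pairs(img)
    bb = int(np.sum((a == 1) & (b == 1)))
    ww = int(np.sum((a == 0) & (b == 0)))
    bw = int(np.sum(a != b))
    return bb, ww, bw


def mc_gr_from_joins(img):
    """MC and GR obtained from the join counts via Eqs. (15) and (16)."""
    n = img.size
    n1 = int(img.sum())
    n2 = n - n1
    bb, ww, bw = join_counts(img)
    s0 = 2 * (bb + ww + bw)  # every join appears twice in the double sum
    mc = 2 * n / s0 * (bb / n1 + ww / n2) - 1  # Eq. (15)
    gr = n * (n - 1) / s0 * bw / (n1 * n2)  # Eq. (16)
    return mc, gr


def mc_gr_direct(img):
    """MC and GR computed from their definitions, for comparison."""
    x = img.astype(float)
    n = x.size
    a, b = rook_pairs(x)
    s0 = 2 * a.size
    mean = x.mean()
    ss = np.sum((x - mean) ** 2)
    mc = n / s0 * 2 * np.sum((a - mean) * (b - mean)) / ss
    gr = (n - 1) / (2 * s0) * 2 * np.sum((a - b) ** 2) / ss
    return mc, gr


rng = np.random.default_rng(0)
img = (rng.random((6, 7)) < 0.3).astype(int)
img[0, 0], img[0, 1] = 1, 0  # guarantee that both classes occur
print(mc_gr_from_joins(img))
print(mc_gr_direct(img))
```

The two printed pairs should agree to floating-point precision, because \(\sum_{i} (x_{i} - \bar{x})^{2} = n_{1} n_{2} / n\) for binary data and \({\text{BW}} = S_{0}/2 - {\text{BB}} - {\text{WW}}\).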

On one hand, Eq. (16) confirms the similarity between the GR and BW; on the other hand, Eq. (15) indicates that the MC is related not only to the BB but also to the WW. Cliff and Ord (1973) only considered the similarity between the MC and BB. Because WW can be written as a linear combination of BB and BW, the MC finally relates to BB and BW.

Figure 7, again produced with the technique described in Sect. 5, portrays selected GR and BW power plots of two-tailed tests for the CN-C, SR, SQ, and H cases and the 5-by-5 and 9-by-9 sample sizes. The solid red–green lines represent the 5-by-5 GR-BW power curves, whereas the dashed pink–blue lines represent the 9-by-9 GR-BW power curves. Except for the CN-C case, all plots show the BW to be more powerful than the GR, which runs counter to Cliff and Ord’s (1973) results.

Fig. 7

The two-tailed test power plots of the GR versus the BW. a The CN-C case. b The SR case. c The SQ case. d The H case

7 Two massive spatial data examples

Two remotely sensed images are employed to verify the findings furnished in the previous sections. For the continuous random variable case, the normalized difference vegetation index (NDVI) was calculated for a Landsat 7 Enhanced Thematic Mapper Plus (ETM+) image of the Yellow Mountain region (Anhui, China) to illustrate the efficiency of the MC versus the GR. To illustrate a nominal data case, pixels constituting an image of the Huairou Reservoir region (Beijing, China) captured from Map World are classified as water or not water to calculate the join count test as well as to indicate weaknesses of the GR versus the MC for this measurement scale.

7.1 A continuous random variable case

A Yellow Mountain image (Fig. 8), downloaded from the USGS EarthExplorer website (https://earthexplorer.usgs.gov/), is for October 8, 2002, and forms a 7811-by-7051 rectangular region with \(n = 55{,}075{,}361\) pixels. It includes spectral bands B1–B8, with B1–B7 having 30 m spatial resolution, and B8 having 15 m spatial resolution. Because the image contains some regions with zero spectral values (black areas in Fig. 8a) and its borders are indented, a 5140-by-4754 pixel (\(n = 24{,}435{,}560\)) sub-image (Fig. 8b) was cropped and serves as the study area across which the NDVI is calculated. Figure 8b is the zoomed-in version of the area demarcated by the red border in Fig. 8a.

Fig. 8

The Yellow Mountain region remotely sensed image and the subarea extracted for analysis

The distribution of the NDVI is shown in Fig. 9; three normal distributions (denoted by red, green, and blue curves) are fitted to these data as components of a finite mixture distribution (black dotted line curve).

Fig. 9

Normal finite mixture distribution with three components for the Yellow Mountain subregion

Table 4 includes the MC and GR values as well as some hypothesis testing statistics obtained with the normality assumption for the NDVI. With rook adjacency and a binary spatial weights matrix, the values of the MC and the GR are 0.9294 and 0.0705, respectively, which indicate very strong positive SA. The expected values, variances, and Z-scores under the null hypothesis of zero SA also are listed; the extremely large Z-scores imply rejection of the null hypothesis. The asymptotic variances as well as their ratio of 0.2 (this calculated value coincides with the theoretically derived value; see the entry of the normal row and the SR column in Table 3) indicate that the MC is more efficient than the GR. The power values go to one because of the large sample size, which is 24,435,560 here.

Table 4 Selected statistics for the NDVI of the Yellow Mountain region sub-image
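For an image of this size, the \(n\)-by-\(n\) connectivity matrix need not be formed explicitly: with rook adjacency on a complete rectangular grid, the sums in the MC and the GR reduce to products and differences of horizontally and vertically shifted arrays. The following sketch illustrates this idea; it is not the authors' implementation, the function names are hypothetical, and the NDVI helper assumes the usual definition (NIR − Red)/(NIR + Red), where for ETM+ the near-infrared and red bands are conventionally B4 and B3 (the band choice is not stated in the text).

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where both bands are zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


def rook_mc_gr(grid):
    """MC and GR for a complete rectangular grid with binary rook adjacency.

    Assumes no missing (NaN) cells, as in the cropped sub-image.
    """
    x = np.asarray(grid, dtype=float)
    p, q = x.shape
    n = x.size
    d = x - x.mean()
    # Each rook join is visited once (horizontal, then vertical neighbours).
    cross = np.sum(d[:, :-1] * d[:, 1:]) + np.sum(d[:-1, :] * d[1:, :])
    sqdiff = np.sum((x[:, :-1] - x[:, 1:]) ** 2) + np.sum((x[:-1, :] - x[1:, :]) ** 2)
    s0 = 2 * (p * (q - 1) + q * (p - 1))  # sum of all connectivity entries
    ss = np.sum(d ** 2)
    mc = n / s0 * 2 * cross / ss  # factor 2: each join counts in both directions
    gr = (n - 1) / (2 * s0) * 2 * sqdiff / ss
    return mc, gr


# A 4-by-4 checkerboard is the extreme negative SA pattern under rook adjacency.
board = np.indices((4, 4)).sum(axis=0) % 2
print(rook_mc_gr(board))
```

For the checkerboard, every join links opposite values, so the MC equals −1 and the GR equals \(2(n - 1)/n = 1.875\); a strongly clustered surface such as the NDVI sub-image would instead give an MC near its positive extreme and a GR near zero.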

7.2 A binary random variable case

The Huairou Reservoir region image is captured from Map World (http://www.tianditu.cn/). It was obtained in the summer of 2010 by ZY-3 and covers a 6843-by-7895 rectangular area comprising \(n = 54{,}025{,}485\) pixels; it is an RGB image. However, for analysis purposes, this image is dichotomized, with pixels being classified as being water or not water. Specifically, the values of pixels with no water are set to 0 (\(n_{2} = 45{,}977{,}103\); 85.10% of the total), and the values of pixels with water are set to 1 (\(n_{1} = 8{,}048{,}382\); 14.90% of the total). Figure 10 portrays the original image and its binary counterpart.

Fig. 10

The Huairou Reservoir remotely sensed image and its binary counterpart

Table 5 summarizes results for statistical hypothesis tests conducted in terms of the join count statistics, the MC, and the GR for this binary image. Hypothesis testing with the join count statistics is under nonfree sampling, whereas hypothesis testing with the other two statistics is under normality; all three utilize rook adjacency. Statistics in Table 5 imply the presence of significant positive SA because the counts of BB and WW joins are larger than their respective expectations, and the BW join count is significantly less than its expectation, both of which confirm the rejection of zero SA. Meanwhile, the MC and GR values are very close to their extreme positive values. In addition, the asymptotic variance ratio for the MC versus the GR also indicates a weakness of the GR for this large sample size. As an aside, these quantities together with the \(n_{1}\) and \(n_{2}\) values confirm Eqs. (15) and (16). The large sample size of 54,025,485 produces statistical powers of one.

Table 5 Selected statistics for the dichotomized Huairou Reservoir image

8 Conclusions and discussions

In its formative years, spatial statistics restricted much of its attention to small-to-medium datasets mostly because of computer technology constraints; more recently, it commonly engages large-to-massive datasets because computer technology allows it to. Therefore, analyses of properties of SA statistics for massive spatial data are necessary. This paper focuses on the efficiency and statistical power of the MC and the GR for massively large sample sizes and draws two main conclusions. Firstly, the MC is more efficient than the GR in terms of asymptotic variances, but only for the MPC and the MH cases when exact variances are discussed. (The MC and the GR can be equally efficient for other geographical configurations.) This is a finding that alters our understanding of the MC and the GR.

Secondly, the statistical power of these two indexes goes to one when sample size is large, negating some results established with small datasets. A number of additional findings also are important. One is that the asymptotic variance of the MC is more stable across, and hence less sensitive to, distributional assumptions, a conclusion implied by Theorems 1 to 4, because the asymptotic variance of the MC may be uniformly expressed by the formula introduced by Griffith (2010), whereas the one for the GR is determined by an underlying distribution’s kurtosis. The second finding is that the relative efficiency positions of the 184 empirical irregular surface partitioning specimens indicate that realistic geographic surface partitionings are between the MH and the regular SR or SQ configurations. The third finding is that the relationship between the MC and the GR may be expressed by Eq. (3), which highlights a negative correlation between these two statistics, and allows them to be differentiated according to attribute variable and connectivity features. A final finding complements results obtained by Cliff and Ord (1973): the MC is not more powerful than the GR for all possible geographical configuration types (e.g., the L and CN cases) and relatively large sample sizes. These asymptotic variance, efficiency, and power comparison results for large sample sizes and various spatial structures are relevant to especially massive spatial data analyses.

In addition, a comparative power visualization technique is presented in this paper that produces smoother power curves for any sample size. Plots appearing in Fig. 5 are generated by this technique; they contain more connectivity cases than those presented in Luo et al. (2017) and reveal that the MC is not more powerful than the GR for positive SA when the connectivity criteria are L or CN. Instead of obtaining the p value by ranking the test criteria with those random sample results (Hope 1968), this technique calculates the probability and the power through a formal inference protocol, and its significant results can lead to a rejection of the null hypothesis of zero SA, whereas the significant results of a Monte Carlo test can only indicate no spatial randomness. Finally, a discussion of the join count statistics reveals that Cliff and Ord (1973) might have focused on BB + BW rather than only on BB when considering the similarity between the MC and the join count statistics. The GR-BW power plots appearing in Fig. 7 reveal a surprising conclusion that the BW is more powerful than the GR for the SR, SQ, and H cases.

Once again, the conclusion is that the MC is preferable to the GR for a big spatial data analysis, which always involves massive samples and complex geographical configurations, because the asymptotic variance of the former is smaller and more stable than that of the latter. Furthermore, the statistical power of these two statistics, as well as that of the join count statistics, approaches one for big spatial data; i.e., the power advantage that any statistic has in small samples is lost.

In conclusion, this paper focuses on statistical properties of two SA coefficients in the context of big spatial data. These statistics need to be calculated no matter how big a sample size is; they emphasize different features of a spatial dataset and furnish input for choosing a proper model specification, which is a different issue from the meaningless statistical significance that arises from a massive sample size. The discussion of SA throughout this paper is from a global perspective; a local perspective that may relate to spatial heteroskedasticity also is relevant sometimes. This spatial heteroskedasticity refers to unstable/different means, variances, and possibly frequency distributions across a geographical landscape. Spatially varying means can be described with regression covariates. Spatially varying variances can be adjusted for along the lines of Oden (1995), Waldhör (1996), and Jackson et al. (2010); this is a future research topic. Griffith and Chun (2016) address yet another aspect of varying variance, namely the uncertainty of the SA parameter in a simultaneous autoregressive (SAR) model; they describe it with a beta–beta mixture distribution. Spatially varying frequency distributions can be assessed with diagnostic statistics (one goal here is sensitivity to specification error). Spatially varying SA can generate an outcome of different levels of local SA [i.e., LISA (e.g., local Moran’s I), whose linear combination is proportional to the global SA coefficient]. Consequently, one manifestation for a given geographic landscape is that many significant LISA average to approximately zero, implying that their global SA measure is not significant. Spatial heteroskedasticity reflects the uneven changes of geographical phenomena, changes that may be attributable to geographic diversity. Thus, spatial heteroskedasticity is a symptom of an inhomogeneous geographical landscape.

Transcending this local perspective, SA also can be discussed in terms of a model perspective, because the MC can express the SA parameter rho in a SAR model, for example, as a sigmoid function (Griffith 2003, p. 33); this link function provides a connection between the MC and spatial autoregressive models that contain many meaningful covariates. In this context, the SA term including rho in a SAR model represents missing spatially structured variables, hence substituting for omitted covariates. These themes constitute topics for future research.