
Introduction

The complexity of computational experimentation in regional science has increased drastically in recent decades. Regional scientists are constantly developing more efficient methods that take advantage of modern computational resources and geocomputational tools to solve larger problem instances, generate faster solutions or approach asymptotic behavior. The first formulation of the p-median problem included a numerical example that required 1.51 min to optimally locate four facilities in a 10-node network [52]; three decades later, Church [16] located five facilities in a 500-node network in 1.68 min. As noted by Anselin et al. [7], spatial econometrics has also benefited from computational advances; the computation of the determinant required for maximum likelihood estimation of the spatial autoregressive model proposed by Ord [47] was feasible only for data sets of up to about 1000 observations. Later, Pace and LeSage [48] introduced a Chebyshev matrix determinant approximation that allows this determinant to be computed for over a million observations in less than a second. According to Blommestein and Koper [11], one of the first algorithms for constructing higher-order spatial lag operators, devised by Ross and Harary [54], required approximately 8000 s to calculate the sixth-order contiguity matrix on a 100×100 regular lattice. Anselin and Smirnov [5] propose new algorithms that compute a sixth-order contiguity matrix for the 3111 contiguous U.S. counties in less than a second.

An important aspect of conducting computational experiments in regional science is the choice of how the spatial phenomena are represented or conceptualized. This aspect is of special relevance when using a discrete representation of continuous space, such as polygons [34]. This representation can be accomplished through regular or irregular lattices, and the use of one or the other can cause important differences in computational times, solution quality or statistical properties. We offer four examples: (1) The method proposed by Duque et al. [21] for running the AMOEBA algorithm [1] requires an average of 109 s to delimit four spatial clusters on a regular lattice with 1849 polygons; this time rises to 229 s on an irregular lattice with the same number of polygons. (2) For the location set covering problem, Murray and O’Kelly [46] concluded that the spatial configuration, number of needed facilities, computational requirements and coverage error all varied significantly as the spatial representation was modified. (3) Elhorst [24] warns that the random effects spatial error and spatial lag models might not be appropriate specifications when the observations are taken from irregular lattices. (4) Anselin and Moreno [4] find that the use of a regular or irregular lattice affects the performance of test statistics against alternatives of the spatial error components form.

Returning to the tendency toward computational experiments with large instances, there is an important difference between generating large instances of regular and irregular lattices. On the one hand, regular lattices are easy to generate, and there is no restriction on the maximum number of polygons. On the other hand, instances of irregular lattices are usually obtained by sampling real maps. Table 1 shows some examples of this practice.

Table 1 Annotated chronological listing of studies that use irregular lattices generated by sampling real maps

The generation of large instances of irregular lattices has several complications that are of special interest in this paper. First, the size of an instance is limited by the number of polygons in the available real lattices. Second, the possibility of generating a large number of different instances of a given size is also limited (e.g., generating 1000 instances of irregular lattices with 3000 polygons). Third, as shown in Fig. 1, the topological characteristics of irregular lattices built from real maps vary drastically depending on the region from which they are sampled, which could bias the results of the computational experiments.

Fig. 1 Examples of two instances of 900 irregular polygons. (a) United States. (b) Spain

This paper seeks to contribute to the design of computational experiments in regional science by proposing a scalable recursive algorithm (RI-Maps) that combines concepts from stochastic calculus (mean reverting processes), fractal theory and computational geometry to generate instances of irregular lattices with a large number of polygons. The resulting instances have topological characteristics that are a good representation of irregular lattices sampled from around the world. Last, the use of these instances guarantees that differences in the results of computational experiments are not a consequence of differences in the topological characteristics of the lattices used.

The remainder of this paper is organized as follows: Section “Conceptualizing Polygons and Lattices” introduces basic definitions of polygons and lattices and proposes a consensus taxonomy of lattices. Section “Topological Characteristics of Regular and Irregular Lattices” presents a set of indicators used to characterize the topology of a lattice and shows the topological differences between regular and irregular lattices. Section “RI-Maps: An Algorithm for Generating Realistic Irregular Lattices” presents the algorithm for generating irregular lattices. Section “Results” evaluates the capacity of the algorithm to generate realistic irregular lattices. Section “Application of RI-Maps” illustrates the use of the algorithm in a computational experiment, and the final section presents the conclusions.

Conceptualizing Polygons and Lattices

A polygon is a plane figure enclosed by a finite set of straight line segments. Polygons can be categorized according to their boundary, convexity and symmetry properties, as follows:

  (i) Boundary: A polygon is simple when it is formed by a single plane figure with no holes, and it is complex when it contains holes or multiple parts.

  (ii) Convexity: In a convex polygon, every pair of points can be connected by a straight line without crossing its boundary. A concave polygon is simple and non-convex.

  (iii) Symmetry: A regular polygon has all of its angles of equal magnitude and all of its sides of equal length. A non-regular polygon is also called irregular [19, 38].

A lattice is a set of polygons of any type, with no gaps and no overlaps, that covers a subspace or the entire space. More formally, a lattice is the division of a subspace \(S \subseteq \mathbb{R}^{n}\) into \(k\) subsets \(s_{i} \subseteq S\) such that \(\bigcup _{i=1}^{k}s_{i} = S\) and \(s_{i} \cap s_{j} =\phi\) for every \(i\neq j\), where ϕ is the empty set [32]. There exist different taxonomies of lattices depending on the field of study. In an attempt to unify these taxonomies, a consensus lattice taxonomy is presented in Fig. 2. This taxonomy classifies lattices according to the shapes of their polygons, their spatial relationships and the use, or not, of symmetric relationships to construct the lattice:

Fig. 2 Consensus taxonomy of lattices

  (i) According to the variety of the shapes of the polygons that form the lattice: homomorphisms are lattices formed by polygons that have the same shape, and polymorphisms are lattices formed by polygons that have different shapes.

  (ii) According to the regularity of the polygons that form the lattice and the way in which they intersect at each vertex: regular, lattices formed by regular polygons in which every vertex joins the same arrangement of polygons [57]; semi-regular, when the polygons are regular but there are different configurations of vertexes; and irregular otherwise [28].

  (iii) According to the existence of symmetric relationships within the lattice: symmetric, when the lattice implies the presence of at least one symmetric relationship; and asymmetric otherwise.

  (iv) According to the symmetric relationship of translation: a lattice is periodic if and only if it implies the use of translation without rotation or reflection; it is aperiodic otherwise [57].

Table 2 shows an example of each category of this consensus taxonomy.

Table 2 Example lattices

The topological characteristics of lattices are usually summarized through the properties of the sparse matrix that represents the neighboring relationships between the polygons in the map, the so-called W matrix [8, 12, 30, 41, 50]. This paper uses six indicators, of which the first three are self-explanatory: the maximum (\(M_{n}\)), minimum (\(m_{n}\)) and average (\(\boldsymbol{\mu }_{1}\)) number of neighbors per polygon. The fourth indicator, the sparseness (S), see Eq. (1), is defined as the percentage of nonzero entries with respect to the total number of entries in a binary W matrix (\(k^{2}\), where k is the number of polygons in the lattice). The fifth indicator is the first eigenvalue of the W matrix (\(\boldsymbol{\lambda }_{1}\)). It is an algebraic construct commonly used in graph theory [26, 58] and regional science [12–14, 30] to summarize different aspects of the W matrix; \(\lambda _{1}\) is the maximum real value λ that solves the system given by Eq. (2), where \(I_{k}\) is the identity matrix of order k × k. The last indicator, \(\boldsymbol{\mu }_{2}\), is the variance of the number of neighbors per polygon. It measures the spatial disorder of a lattice and is given by Eq. (3), where \(W_{ij}\) denotes the entry of W in row i and column j.

$$\displaystyle\begin{array}{rcl} S = \frac{\sum W} {k^{2}} & &{}\end{array}$$
(1)
$$\displaystyle\begin{array}{rcl} (W -\lambda I_{k})v& =& 0{}\end{array}$$
(2)
$$\displaystyle\begin{array}{rcl} \mu _{2} = \frac{\sum _{i=1}^{k}\left (\sum _{j=1}^{k}W_{ij} -\mu _{1}\right )^{2}} {k - 1} & &{}\end{array}$$
(3)
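To make the indicators concrete, the following Python sketch (our own illustration, not code from the original study) computes \(M_{n}\), \(m_{n}\), \(\mu _{1}\), \(\mu _{2}\), S and \(\lambda _{1}\) from a binary W matrix; the helper rook_w, which builds rook contiguity for a small regular lattice, is a hypothetical convenience used only to have an input to feed the function.

import numpy as np

def rook_w(nrows, ncols):
    """Binary rook-contiguity W matrix for a regular lattice (illustrative helper)."""
    k = nrows * ncols
    W = np.zeros((k, k))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:              # east neighbor
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:              # south neighbor
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W

def topological_indicators(W):
    """Return (M_n, m_n, mu_1, mu_2, S, lambda_1) as defined in Eqs. (1)-(3)."""
    k = W.shape[0]
    neighbors = W.sum(axis=1)                          # neighbors per polygon
    M_n, m_n, mu_1 = neighbors.max(), neighbors.min(), neighbors.mean()
    mu_2 = ((neighbors - mu_1) ** 2).sum() / (k - 1)   # Eq. (3)
    S = W.sum() / k ** 2                               # Eq. (1)
    lambda_1 = np.max(np.real(np.linalg.eigvals(W)))   # largest real eigenvalue, Eq. (2)
    return M_n, m_n, mu_1, mu_2, S, lambda_1

print(topological_indicators(rook_w(3, 3)))            # toy 3x3 regular lattice

On the toy 3×3 rook lattice, the nonzero value of \(\mu _{2}\) comes entirely from the boundary polygons; the comparisons later in the paper remove this boundary effect before contrasting lattice types.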

Within the field of regional science, lattices are frequently used for two purposes. First, real lattices can be used to study real phenomena, e.g., to analyze spatial patterns, confirm spatial relationships between variables and detect spatio-temporal regimes within a spatial panel, among others. Second, lattices can be used to evaluate the behavior of statistical tests [4, 45], algorithms [21] and the topological characteristics of lattices themselves [8, 40, 41]. In these cases, it is necessary to use sets of lattices that satisfy requirements imposed by the regional scientist, e.g., the number of polygons, the regularity or irregularity of the polygons and the number of instances. To accomplish this goal, a common approach is to use, as the geographical base for real or simulated data, either polymorphic irregular aperiodic asymmetric lattices (e.g., real lattices and Voronoi diagrams) or homomorphic regular periodic symmetric lattices (e.g., regular lattices). The following sections are restricted to the second use of lattices.

Topological Characteristics of Regular and Irregular Lattices

As stated above, regional scientists have the option of using regular or irregular lattices in their computational experiments. However, this section will show that there are important topological differences between these types of lattices.

Real lattices have topological characteristics that vary substantially from location to location. As an example, Fig. 3 presents the topological characteristics of lattices of different sizes (100, 400 and 900 polygons) sampled in Spain and the United States. Each box-plot summarizes 1000 instances. Important differences emerge between these two places: Spanish polygons tend to have more neighbors, are more disordered and their first eigenvalues are higher in mean and variance. These differences in the topological characteristics have direct repercussions on the performance of algorithms whose complexity depends on the neighboring structure [1, 21].

Fig. 3 Topological differences of lattices from Spain and the United States

Regular lattices and Voronoi diagrams are also commonly used for computational experiments because they are easy to generate, there is no restriction on the size of the instances (the number of polygons in the map) and their over-simplified structure allows for some mathematical simplifications or reductions [9, 31, 61]. However, the topological characteristics of these lattices are substantially different from those of real, irregular lattices. These differences can lead to biased results in theoretical and empirical experiments, e.g., regarding spatial stationarity in STARMA models [36], improper conclusions about power and sample size in hypothesis testing [4, 45] and overestimation of the computational efficiency of algorithms [1, 21], among others. Table 3 shows the topological differences between real maps, two types of regular lattices and Voronoi diagrams.

Table 3 Average topological characteristics for real maps, regular lattices and Voronoi diagrams

To illustrate the magnitude of these differences, we calculated the topological indicators (\(M_{n}\), \(m_{n}\), \(\mu _{1}\), \(\mu _{2}\), S and \(\lambda _{1}\)) for six thousand lattices of different sizes (1000 instances each of 100, 400, 900, 1600, 2500 and 3600 polygons) that were sampled around the world at the smallest administrative division available in Hijmans et al. [35]. As an example, Fig. 4 shows seven of those instances. These real instances are then compared to regular lattices with square and hexagonal polygons and to Voronoi diagrams. To avoid the boundary effect on \(M_{n}\), \(m_{n}\), \(\mu _{1}\) and \(\mu _{2}\), the bordering polygons are only considered to be neighbors of interior polygons; S and \(\lambda _{1}\) are calculated using all of the polygons. Table 3 shows that regular lattices are not capable of emulating the topological characteristics of real lattices in any of the indicators: \(\mu _{2} = 0\) and \(M_{n} = m_{n} =\mu _{1} = 4\) (squares) or 6 (hexagons) are values that are far from those of real lattices. The values obtained for \(\lambda _{1}\) and S indicate that regular lattices of hexagons are more connected than real lattices, while regular lattices of squares are less connected. With regard to Voronoi diagrams, \(M_{n}\) and \(m_{n}\) indicate that they are not capable of generating atypically connected polygons, although their values of \(\mu _{1}\) are close to those of real lattices. Finally, Voronoi diagrams are more ordered than real lattices, with values of \(\mu _{2}\) close to 1.7, while real lattices report values of \(\mu _{2}\) close to 8.

Fig. 4 Base map and example of a random irregular lattice obtained from it

RI-Maps: An Algorithm for Generating Realistic Irregular Lattices

This section is divided into two parts. The first part introduces an algorithm that generates irregular polygons based on a mean reverting process in polar coordinates, and the second part proposes a novel method to create polymorphic irregular aperiodic lattices with topological characteristics that are similar to those of real lattices.

Mean Reverting Polygons (MR-Polygons)

The problem of characterizing the shape of an irregular polygon is commonly addressed in two ways: evaluating its similarity to a circle [33] or describing its boundary roughness through its fractal dimension [10, 25]. In this paper, we apply both concepts at different stages during the creation of a polygon: the similarity to a circle guides a mean reverting process in polar coordinates, and the fractal dimension parameterizes the mean reverting process.

Mean Reverting Process in Polar Coordinates

Different indexes are used to compare irregular polygons with a circle: the elongation ratio [60], form ratio [37], circularity ratio [44], compactness ratio [18, 29, 53], ellipticity index [56] and the radial shape index [17]. As Chen [15] states, all of these indexes are based on comparisons between the irregular polygon and its area-equivalent circle. Under this relationship, an irregular polygon can be conceptualized as an irregular boundary with random variations around a circle, which leads us to use a mean reverting process in polar coordinates to create irregular polygons. A mean reverting process is a stochastic process whose values follow a long-term tendency in the presence of short-term variations. Formally, the process x at time t is the solution of the stochastic differential equation (4), where μ is the long-term tendency, α is the mean reversion speed, σ is the gain in the diffusion term, \(x(t_{0})\) is the value of the process at the initial time \(t_{0}\) and \(\{B_{t}\}_{t\geq 0}\) is a one-dimensional Brownian motion [43]. Equation (5) shows the general solution; however, for practical purposes, hereafter we use the Euler discretization, which is given by Eq. (6), where \(\epsilon _{t}\) is white noise.

$$\displaystyle\begin{array}{rcl} dX_{t}& =& \alpha (\mu -X_{t})\,dt +\sigma \,dB_{t}{}\end{array}$$
(4)
$$\displaystyle\begin{array}{rcl} x(t)& =& e^{-\alpha (t-t_{0})}\left (x(t_{0}) +\int _{t_{0}}^{t}e^{\alpha (s-t_{0})}\alpha \mu \,ds +\int _{t_{0}}^{t}e^{\alpha (s-t_{0})}\sigma \,dB(s)\right ),{}\end{array}$$
(5)
$$\displaystyle\begin{array}{rcl} X_{t}& =& X_{t-1} +\alpha (\mu -X_{t-1})\varDelta _{t} +\sigma \sqrt{\varDelta _{t}}\epsilon _{t}{}\end{array}$$
(6)
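As an illustration of Eq. (6) (a minimal sketch of our own, not code from the original paper), the Euler discretization can be simulated directly; the parameter values below follow the ranges used later in the paper (μ = X_0 = 10, Δt = 0.001, σ ≈ 1.3, α ≈ 0.3).

import numpy as np

def mean_reverting_path(alpha, mu, sigma, x0, dt, n_steps, seed=0):
    """Euler discretization of Eq. (6): X_t = X_{t-1} + alpha*(mu - X_{t-1})*dt + sigma*sqrt(dt)*eps_t."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(1, n_steps + 1):
        x[t] = x[t - 1] + alpha * (mu - x[t - 1]) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

X = mean_reverting_path(alpha=0.3, mu=10.0, sigma=1.3, x0=10.0, dt=0.001, n_steps=5000)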

Algorithm 1 MR-Polygon: mean reverting polygon.

Algorithm 1 presents the procedure for generating an irregular polygon P in polar coordinates using a mean reverting process (\(X_{t}\)) as the data generator. This algorithm guarantees that the distance between two points along the path of \(X_{t}\) is equal to the distance between the corresponding points along the boundary of P when P is traversed counterclockwise. The purpose of this equivalence is to preserve the fractal dimension of \(X_{t}\) in P. The quantities \(\varDelta _{R}\) and \(\phi _{1}\) in Algorithm 1 are the result of solving the geometric problem presented in Fig. 5, and they are used in Eq. (7) to establish the location of the next point in P. The points of P are denoted as \(P_{\theta }\), with θ between 0 and 2π.

Fig. 5 Geometric problem to preserve the length and the fractal dimension of the mean reverting process when it is used to create an irregular polygon. (a) \(X_{t} \geq X_{t-\varDelta _{t}}\). (b) \(X_{t} < X_{t-\varDelta _{t}}\)

$$\displaystyle{ P_{\theta +\phi _{1}} = \left \{\begin{array}{l l} P_{\theta } +\varDelta _{R}&\quad \text{if}\ X_{t+\varDelta _{t}} \geq X_{t} \\ P_{\theta } -\varDelta _{R}&\quad \text{if}\ X_{t+\varDelta _{t}} < X_{t}.\\ \end{array} \right. }$$
(7)

Because the process P depends on the parameters α, μ and σ, it is worthwhile to clarify their effect on the shape of the polygon P: α is the speed at which the process reverts to the circle of radius μ, and σ is the scaling factor of the irregularity of the polygon. High values of α and low values of σ generate polygons whose shapes are close to a circle of radius μ. Finally, \(\varDelta _{t}\) is utilized to preserve the fractal dimension of both processes, X and P, and determines the angular step \(\phi _{1}\) (see Fig. 5).
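As a rough illustration of the idea, and not the exact length-preserving construction of Algorithm 1 (whose geometric details are solved in Fig. 5), the sketch below simply interprets a simulated mean reverting series as radii at equally spaced angles; the function name and the fixed angular step are our own simplifying assumptions.

import numpy as np

def radii_to_polygon(radii):
    """Interpret a positive series as radii sampled at equally spaced angles in [0, 2*pi)."""
    radii = np.asarray(radii, dtype=float)
    theta = np.linspace(0.0, 2.0 * np.pi, len(radii), endpoint=False)
    # Vertices in Cartesian coordinates, traversed counterclockwise; the ring closes
    # implicitly because the last vertex connects back to the first.
    return np.column_stack([radii * np.cos(theta), radii * np.sin(theta)])

# A constant series maps to (a discretization of) the circle of radius mu = 10;
# feeding in a mean reverting series such as X from the previous sketch yields an irregular ring.
circle_like = radii_to_polygon(np.full(360, 10.0))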

MR-Polygon Parameterization

The process of establishing the values of α, μ, σ, \(\varDelta _{t}\) and \(X_{0}\) is not an easy task, and their values must be set in such a way that the shape of P is similar to that of a real irregular polygon. How do we determine whether a polygon P satisfies this condition? In this case, the fractal dimension is a tool that offers strong theoretical support for assessing the shape of a given polygon.

According to Richardson [53], the fractal dimension D of an irregular polygon (such as a coast) is a number between 1 and 2 (1 for smooth boundaries and 2 for rough boundaries) that measures the way in which the length of an irregular boundary L (Eq. (8)) changes when the length of the measurement instrument (ε) changes. The fractal dimension is given by Eq. (9), where \(\hat{C}\) is a constant.

In general, an object is considered to be a fractal if it exhibits irregular characteristics at different scales of study [42]. For practical purposes, D is obtained from Eq. (9) as 1 minus the slope of log(L(ε)) plotted against log(ε). This procedure is commonly known as the Richardson plot.

$$\displaystyle\begin{array}{rcl} L(\epsilon )& =& \hat{C}\epsilon ^{1-D}{}\end{array}$$
(8)
$$\displaystyle\begin{array}{rcl} \log (L(\epsilon ))& =& (1 - D)\log (\epsilon ) +\log (\hat{C}){}\end{array}$$
(9)
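The divider method behind Eqs. (8) and (9) is straightforward to approximate in code. The sketch below is our own illustration (the function names and the choice of ruler lengths are assumptions): it walks a densely sampled boundary with rulers of different lengths ε, records L(ε), and recovers D as 1 minus the slope of the log-log regression.

import numpy as np

def ruler_length(points, eps):
    """Approximate boundary length measured with a ruler of length eps (divider method)."""
    steps, anchor = 0, points[0]
    for p in points[1:]:
        if np.linalg.norm(p - anchor) >= eps:
            steps += 1
            anchor = p
    return steps * eps

def fractal_dimension(points, rulers):
    """Estimate D from Eq. (9): log L(eps) = (1 - D) log eps + log C."""
    lengths = np.array([ruler_length(points, e) for e in rulers])
    slope, _ = np.polyfit(np.log(rulers), np.log(lengths), 1)
    return 1.0 - slope

# Sanity check: a circle is smooth, so the estimate should be close to D = 1.
theta = np.linspace(0.0, 2.0 * np.pi, 20000)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
print(fractal_dimension(circle, rulers=np.geomspace(0.005, 0.2, 10)))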

In almost all cases, the Richardson plot can be explained by two line segments with different slopes; thus, two fractal dimensions can be obtained: textural, for small scales, and structural, for large scales [39]. As an illustration, Fig. 6 shows a segment of the United States east coast taken from Google Maps at two resolutions. Note that as the resolution increases, some irregularities that were imperceptible at low resolution become visible. In this sense, it can be said that irregularities at low resolution define the general shape and are related to the structural dimension, while irregularities at high resolution capture the noise and are related to the textural dimension. Regional scientists tend to use strongly simplified (sub-sampled) maps, which preserve the general shape but remove the small variations. This simplification does not change the topological configuration of the maps [20]. Figure 7 presents the Richardson plot of the external boundary of the United States and its textural and structural fractal dimensions.

Fig. 6 Illustrative example of irregularities explained by the structural and textural dimension

Fig. 7 Richardson plot to estimate the textural and structural dimension of the external boundary of the United States

In the field of stochastic processes, several approaches based on different estimations of length have been proposed to characterize processes through their fractal dimension. In our case, an experimental approach based on the fractal dimension of real polygons is proposed to select an appropriate combination of the parameters α and σ to generate realistic irregular polygons. Because our interest is in the general shape rather than in the small variations, we account only for the structural dimension. The parameterization process is divided into two parts: in the first part, the frequency histogram of the fractal dimensions of real polygons is constructed; in the second part, we propose a range of possible values for α and σ, given μ, \(X_{0}\) and \(\varDelta _{t}\), that generates fractal dimensions close to those obtained in the first part. Because the level of the long-term tendency μ does not affect the length of X and because Algorithm 1 guarantees that the length is preserved, μ can be defined as a constant without affecting the fractal dimension. Hereafter, it is assumed that μ = \(X_{0}\) = 10. The value of \(\varDelta _{t}\) is set to 0.001 to properly infer both fractal dimensions.

The empirical distribution of the fractal dimension of irregular polygons is calculated over a random sample of 10,000 polygons from the world map used in Section “Topological Characteristics of Regular and Irregular Lattices”. This empirical distribution is presented in Fig. 8a. To find the fractal dimension of the MR-Polygons, we generate a surface of the average dimensions as a function of the values of α and σ, which range from 0.01 to 5 with steps of 0.1 (Fig. 8b). The resulting surface indicates that the fractal dimension is mainly affected by σ, especially for small dimensions. Additionally, it is found that fractal dimensions close to 1.23 are obtained when σ takes values between 1.2 and 1.5, regardless of the value of α.

Fig. 8 Stages to find the values of α and σ. (a) Fractal dimensions of real polygons. (b) Fractal dimension of simulated polygons as a function of α and σ

Figure 9 presents some examples of polygons generated using different values of α and σ. The polygons in the second row, which correspond to σ = 1.5, have a realistic structural fractal dimension. Additionally, in the same figure, both the original (gray line) and sampled (black line) polygons reinforce the fact that sampling a polygon does not affect the structural dimension. From now on, we will use sampled polygons to improve the computational efficiency.

Fig. 9 Examples of stochastic polygons generated using Algorithm 1 with different values of σ and α

Recursive Irregular Maps (RI-Maps)

Up to this point, we have been able to generate irregular polygons with fractal dimensions similar to those of real maps. The next step is to use these polygons to create irregular lattices of any size whose topological characteristics are close to the average values observed for these characteristics in real lattices around the world. For this step, we formulate a recursive algorithm in which an irregular frontier is divided into a predefined number of polygons using MR-Polygons. The algorithm was conceived under three principles: (1) scalability: preserving the computational complexity of the algorithm when the number of polygons increases; (2) fractality: preserving the fractal characteristics of the map at any scale; and (3) correlativity: encouraging the presence of spatial agglomerations of polygons with similar sizes, which are commonly present in real maps, where clusters of small polygons correspond to urban areas.

Algorithm 2 presents the RI-Maps algorithm for creating polymorphic irregular aperiodic asymmetric lattices with realistic topological characteristics. This algorithm starts with an initial empty irregular polygon, pol (the outer border of the RI-Map), and the number of polygons, n, to fit inside it. In a recursive manner, a portion of the initial polygon pol is divided following a depth-first strategy until that portion is split into small polygons. This process is repeated for a new uncovered portion of pol until the whole area of pol is covered. Because the recursive partitions are made using MR-Polygons, we take the values of α from a uniform distribution between 0.1 and 0.5, and the values of σ from a uniform distribution between 1.2 and 1.5. For μ, \(X_{0}\) and \(\varDelta _{t}\), we use the values proposed in Section “Mean Reverting Polygons (MR-Polygons)”. Finally, to guarantee the computational tractability of the geometric operations, each polygon is sampled down to 30 points. The main steps of the RI-Maps algorithm are summarized in Fig. 10.

Fig. 10 Diagram of the main steps of the RI-Maps algorithm

Algorithm 2 RI-Map: recursive irregular map.

The RI-Maps algorithm has three unknown parameters:

  • \(p_{1}\): Because each polygon is created by MR-Polygons using a polar coordinate system that is unrelated to the map being constructed with RI-Maps, it is necessary to apply a scaling factor, \(\sqrt{\frac{p_{1}\times area(pol)} {n\times \pi \times \mu ^{2}}}\), that adjusts the size of the MR-Polygon before it is included in the RI-Map.

  • \(p_{2}\): When a new polygon is used to divide its predecessor, its capacity to contain new polygons (measured by the number of polygons) is proportional to its share of the unused area of its predecessor. However, to encourage the appearance of spatial agglomerations of small polygons, the number of polygons that the new polygon can hold is increased with probability \(p_{2}\).

  • \(p_{3}\): When \(p_{2}\) indicates that a new polygon will hold more polygons, the number of extra polygons is calculated as \(p_{3}\) percent of the number of missing polygons that are expected to fit into the unused area of its predecessor. The number of extra polygons is subtracted from the unused area to keep the final number of polygons (n) constant.

Table 4 illustrates the effect of the parameters \(p_{2}\) and \(p_{3}\) on the topological characteristics of RI-Maps. In the first row, \(p_{2}\) and \(p_{3}\) equal 0, which generates highly ordered lattices without spatial agglomerations. The lattices in the second and third rows are more disordered than those in the first row and show spatial agglomerations, which are less frequent and less pronounced in the second row than in the third row. As will be shown in the next section, the lattices in the third row are more realistic in terms of their topological characteristics.

Table 4 Examples of RI-Maps of 400, 1600 and 3600 polygons using different combinations of parameters

To find a combination of \(p_{1}\), \(p_{2}\) and \(p_{3}\) that generates realistic RI-Maps in terms of their topological characteristics, we use a standard genetic algorithm in which the population γ at iteration i, denoted as \(\gamma ^{i}\), is formed by the genomes \(\gamma _{j}^{i} = [p_{j_{1}}^{i},p_{j_{2}}^{i},p_{j_{3}}^{i}]\), where \(p_{j_{1}}^{i}\), \(p_{j_{2}}^{i}\) and \(p_{j_{3}}^{i}\) are real numbers between 0 and 1 representing instances of \(p_{1}\), \(p_{2}\) and \(p_{3}\), which are denoted as phenomes. In this case, \(i \in \mathbb{N}\) ranges between 0 and 20 and \(j \in \mathbb{N}\) between 0 and 100. To evaluate the quality of each genome, the fitness function \(F(\gamma _{j}^{i})\) is defined in Eq. (10), where θ is a set of lattice sizes (numbers of polygons), \(\phi _{k}\) is the relative importance assigned to a map of k polygons and \(f_{k}(\gamma _{j}^{i})\) is the function given by Eq. (11), which measures the average relative difference between the values of the topological indicators of real lattices and those of RI-Maps formed by k polygons using the phenome \(\gamma _{j}^{i}\). For the sake of simplicity, in Eq. (11), \(\varPsi _{k} = [M_{n},m_{n},\mu _{1},\mu _{2},S,\lambda _{1}]\) denotes the vector of real indicators and \(\varPsi _{k}(\gamma _{j}^{i})\) denotes the vector of values for RI-Maps with k polygons using \(\gamma _{j}^{i}\). The superscript l in \(\varPsi _{k}^{l}\) and \(\varPsi _{k}^{l}(\gamma _{j}^{i})\) refers to the lth indicator in the real and simulated values, respectively. Finally, ns is the number of simulations generated with each genome.

$$\displaystyle\begin{array}{rcl} F(\gamma _{j}^{i}) = \frac{\left (\sum _{k\in \theta }\phi _{k}f_{k}(\gamma _{j}^{i})\right )} {\sum _{k\in \theta }\phi _{k}} & &{}\end{array}$$
(10)
$$\displaystyle\begin{array}{rcl} f_{k}(\gamma _{j}^{i}) = \frac{\sum _{l=1}^{6}\frac{(\sum _{s=1}^{ns}\varPsi _{ k}^{l}(\gamma _{ j}^{i}))-ns\varPsi _{ k}^{l}} {ns\varPsi _{k}^{l}} } {6} & &{}\end{array}$$
(11)
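The fitness computation in Eqs. (10) and (11) reduces to an importance-weighted average of relative deviations. The following sketch is our own illustration of that calculation; the array shapes, the helper names and the numeric values are assumptions, not data from the paper.

import numpy as np

def f_k(simulated, real):
    """Eq. (11): average relative deviation over the six indicators.
    simulated: (ns, 6) array, one row of indicators per simulated RI-Map of size k.
    real: length-6 vector of indicators measured on real lattices of size k."""
    ns = simulated.shape[0]
    return np.mean((simulated.sum(axis=0) - ns * real) / (ns * real))

def fitness(sim_by_size, real_by_size, phi):
    """Eq. (10): weight f_k by the relative importance phi_k of each lattice size k."""
    weighted = sum(phi[k] * f_k(sim_by_size[k], real_by_size[k]) for k in phi)
    return weighted / sum(phi.values())

# Hypothetical usage for theta = {400, 1600} with phi_400 = 1 and phi_1600 = 2.
rng = np.random.default_rng(0)
phi = {400: 1.0, 1600: 2.0}
real = {400: np.array([21.0, 1.0, 6.0, 8.0, 0.016, 6.3]),    # made-up indicator values
        1600: np.array([23.0, 1.0, 6.0, 8.0, 0.004, 6.3])}
sims = {k: real[k] * (1 + 0.05 * rng.standard_normal((5, 6))) for k in phi}
print(fitness(sims, real, phi))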

The algorithm starts with an initial random population of 100 genomes, from which the best four genomes are selected. The subsequent populations are composed of two parts: the first 64 genomes are all possible combinations of the parameter values of the previous best four genomes, and the other 36 genomes are random modifications of those 64 genomes. Because of the computational time required to evaluate Eq. (10), only lattices of 400 and 1600 polygons were used, with importances of \(\phi _{400} = 1\) and \(\phi _{1600} = 2\), respectively. The algorithm reached its best value after 13 iterations, with \(p_{1} = 0.010\), \(p_{2} = 0.050\) and \(p_{3} = 0.315\).

Results

Figure 11 presents a graphical comparison of the topological characteristics of real lattices, RI-Maps and Voronoi diagrams. The values for the RI-Maps were obtained from 100 instances. The results show that RI-Maps have maximum (\(M_{n}\)) and minimum (\(m_{n}\)) numbers of neighbors that are very close to the values found in the real lattices. Regarding the average number of neighbors, both RI-Maps and Voronoi diagrams show similar values that are slightly higher than those observed in real lattices; however, because the number of neighbors is an integer value, it can be concluded in all three cases that the average number of neighbors is 6, which verifies the findings of Weaire and Rivier [59] for irregular lattices. Regarding \(\mu _{2}\), RI-Maps are a better approach for simulating the level of disorder found in real lattices. To facilitate the visualization, the values of S are reported as \(S\times \sqrt{n}\). The results show that RI-Maps replicate the values of real lattices at any size, while Voronoi diagrams report higher values that tend to increase with the number of polygons. Last, RI-Maps have values of \(\lambda _{1}\) that are closer to the values of real lattices, especially for large instances.

Fig. 11 Comparison of the topological characteristics of real lattices, RI-Maps and Voronoi diagrams

Table 5 presents the averages and standard deviations for RI-Maps under the optimal parameters (\(p_{1} = 0.010\), \(p_{2} = 0.050\), \(p_{3} = 0.315\)) found in the previous section; this table complements the topological information on lattices presented in Table 3. Figure 12 shows the running times for different instance sizes using an HP ProLiant DL140 Generation 3 computer running the Linux Rocks 6.0 operating system, equipped with 8 GB of RAM and a 2.33 GHz Intel Xeon 5140 processor. The dotted line shows the x = y values; its non-linear appearance is due to the quadratic scale used on the x-axis to improve the visualization of the plot. Although the reported times correspond to non-optimized code, the plot shows an almost linear relationship between the problem size and the running time.

Table 5 Topological characteristics (mean and standard deviation) for RI-Maps
Fig. 12 Running times of RI-Maps as the number of areas increases

Application of RI-Maps

In this section, we present an example of the use of RI-Maps based on the computational experiments designed by Duque et al. [21] to evaluate the efficiency of the improved AMOEBA algorithm. Duque et al. [21] proposed three computational experiments; one of them reports the running time of AMOEBA as the number of polygons of a regular lattice increases. In this paper, we run the same experiment not only for regular lattices but also for real irregular and simulated irregular lattices (RI-Maps). First, we want to see whether the conclusions obtained for regular lattices can be extrapolated to irregular lattices. Second, we want to see whether the results obtained with RI-Maps are also valid for real irregular maps. This experiment was executed on an HP ProLiant DL140 Generation 3 computer running the Linux Rocks 6.0 operating system, equipped with 8 GB of RAM and a 2.33 GHz Intel Xeon 5140 processor.

In the generated experiment, for each type of lattice, there were 30 instances with 1600 polygons. For each instance, we generated a spatial process with four clusters using the methodology proposed by Duque et al. [21]. The instances of real maps were obtained by sampling the same world map used in the previous sections. Figure 13 presents the distribution of the running times obtained for each type of lattice, and Table 6 compares the distributions with the two-sided Kolmogorov-Smirnov test [27], whose null hypothesis is that the two samples come from the same probability distribution. Regarding the first question, it is clear that using a regular lattice for testing AMOEBA underestimates the execution times. On the other hand, the distributions of the running times obtained for real maps and for RI-Maps are statistically indistinguishable, which shows the benefit of using RI-Maps: instances can be generated automatically without limiting the maximum number of polygons.
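The comparison reported in Table 6 can be reproduced with a standard two-sample Kolmogorov-Smirnov test, available in SciPy as scipy.stats.ks_2samp. The sketch below is our own illustration, with made-up running-time arrays standing in for the measured ones.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Stand-in samples of 30 running times (seconds) per lattice type; not the measured values.
times_regular = rng.normal(loc=40.0, scale=5.0, size=30)
times_rimaps = rng.normal(loc=55.0, scale=8.0, size=30)
times_real = rng.normal(loc=55.0, scale=8.0, size=30)

# Two-sided KS test: the null hypothesis is that both samples come from the same distribution.
for label, (a, b) in [("regular vs. real", (times_regular, times_real)),
                      ("RI-Maps vs. real", (times_rimaps, times_real))]:
    stat, p_value = ks_2samp(a, b)
    print(f"{label}: D = {stat:.3f}, p = {p_value:.3f}")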

Fig. 13 Execution times of AMOEBA over regular lattices and RI-Maps of 1600 polygons

Table 6 Kolmogorov-Smirnov test to compare the distributions of AMOEBA execution times using different lattices

Conclusions

This paper introduces an algorithm that combines fractal theory, the theory of stochastic processes and computational geometry for simulating realistic irregular lattices with a predefined number of polygons. The main goal of this contribution is to provide a tool that can be used for geocomputational experiments in the fields of exploratory spatial data analysis, spatial statistics and spatial econometrics. This tool will allow theoretical and empirical researchers to create irregular lattices of any size and with topological characteristics that are close to the average characteristics found in irregular lattices around the world.

As shown in the last section, the performance of some geocomputational algorithms can be affected by the topological characteristics of the lattices in which these algorithms are tested. This situation can lead to an unfair comparison of algorithm performances in the literature. With the algorithm proposed in this paper, the differences in the computational performances will not be affected by the topological characteristics of the lattices.

This paper also shows that the topological characteristics of regular lattices (with square and hexagonal polygons) and Voronoi diagrams (commonly used to emulate irregular lattices) are far from the topological characteristics found in real lattices.