When sampling an environmental resource, it is important to choose the sampling locations randomly over the study area (Conn et al. 2016) so that formal statistical inference can be made from the sample to the population. Smith et al. (2017) reported that, in ecology, 12% of field studies selected samples using simple random sampling (SRS) and 9% used systematic sampling. These methods are probabilistic sampling designs (meaning there is an element of randomness in selecting their samples) and have well-established statistical properties (MacKenzie 2006; Sica 2006; Stehman 2009). However, most of the ecological studies in Smith et al.’s (2017) review did not use probabilistic sampling designs. Some studies used haphazard or subjective judgement sampling methods, and some did not specify how their samples were drawn (Smith et al. 2017). This is troubling because data gathered in a haphazard or subjective way can produce unrepresentative samples and biased estimates of population characteristics (Albert et al. 2010; Levy and Lemeshow 2013).

Choosing an appropriate sampling design for a particular study can be difficult, and there is no best design for all research questions (Kenkel et al. 1990; Stehman and Overton 1994). The choice depends on many things, including the study objectives, available sampling frames and known auxiliary variables. This paper focuses on making an inference from a sample to the entire population using a specific class of probabilistic sampling designs called spatially balanced sampling designs. These designs were chosen because they are particularly useful for sampling natural resources (Stevens and Olsen 2004). For a full treatment of the subject, the reader is referred to Benedetti et al. (2015).

What is spatially balanced sampling?

To achieve good estimates of population characteristics, the spatial pattern of the sample should be similar to the spatial pattern of the population. However, the spatial pattern of the response variable may not be known before the sample is drawn. Fortunately, a common spatial feature in environmental sampling is that nearby locations tend to be more similar because they interact with one another and are influenced by the same set of factors (Stevens and Olsen 2004). Therefore, an effective strategy is to spatially spread the sample evenly over the resource. A sample that is evenly spread over the resource is called a spatially balanced sample. Stevens and Olsen (2004) introduced the phrase spatially balanced sampling and proposed a statistic that measures the spatial balance or regularity of a sample using Voronoi polygons.
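The idea behind the Voronoi-based statistic can be illustrated with a minimal Python sketch for a finite point population. It assumes the form of the statistic commonly used in the literature: each population unit is assigned to its nearest sample point (its Voronoi polygon), the inclusion probabilities are summed within each polygon, and the mean squared deviation of these sums from one is reported. Values close to zero indicate good spatial balance. The function name and the toy grid population are hypothetical and are not taken from any of the packages discussed in this paper.

```python
import math


def spatial_balance(population, sample_idx, incl_prob):
    """Mean squared deviation from 1 of the Voronoi inclusion-probability sums.

    population : list of (x, y) coordinates of all population units
    sample_idx : indices (into population) of the sampled units
    incl_prob  : inclusion probability of every population unit
    """
    n = len(sample_idx)
    sums = [0.0] * n
    for unit, location in enumerate(population):
        # The Voronoi polygon of a sample point holds the units closest to it.
        nearest = min(
            range(n),
            key=lambda k: math.dist(location, population[sample_idx[k]]),
        )
        sums[nearest] += incl_prob[unit]
    return sum((v - 1.0) ** 2 for v in sums) / n


if __name__ == "__main__":
    # Hypothetical population: a 4 x 4 grid, sample size 4, equal probabilities.
    grid = [(i, j) for i in range(4) for j in range(4)]
    probs = [4 / 16] * 16
    spread = [0, 3, 12, 15]   # one unit in each corner of the grid
    clustered = [0, 1, 4, 5]  # four neighbouring units in one corner
    print("spread sample:   ", spatial_balance(grid, spread, probs))
    print("clustered sample:", spatial_balance(grid, clustered, probs))
```

In this toy example, the sample that is spread over the grid gives a smaller value than the clustered sample, which is the behaviour the statistic is designed to capture.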

Why should environmental scientists use spatially balanced designs?

The potential advantages of using spatially balanced sampling have been demonstrated in the field of environmental science (Stevens and Olsen 2004; Christianson and Kaufman 2016; McGarvey et al. 2016). The first advantage is that spatially balanced samples are evenly spread over the resource. Covering the resource avoids under-coverage and over-coverage, which can happen with probabilistic sampling designs with poor spatial balance (Stevens and Olsen 2004; Christianson and Kaufman 2016). If a researcher’s analysis requires clusters of nearby observations, spatially balanced cluster sampling could be useful (Robertson et al. 2017).

Spatially balanced samples can be very efficient when the response variable has a strong spatial trend (Stevens and Olsen 2004; Barabesi and Franceschi 2011; Grafström and Lundström 2013; Robertson et al. 2013; Benedetti et al. 2017), because their design-based estimators take into account spatial heterogeneity (Wang et al. 2012) and spatial autocorrelation (Haining 2003). When estimating a population total or mean using the Horvitz-Thompson estimator, the local mean variance estimator (Stevens and Olsen 2003) is a popular variance estimator. Many studies have shown the effectiveness of spatially balanced sampling with this estimator on a variety of populations with different spatial structures (cf. Stevens and Olsen 2004; Grafström et al. 2012; Grafström and Lundström 2013; Robertson et al. 2013, 2018; Grafström and Matei 2018). If the spatial trend is weak or if there is no trend at all, there is no statistical advantage in choosing spatially balanced designs over other probabilistic designs (Robertson et al. 2013).
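For readers unfamiliar with the Horvitz-Thompson estimator, the following Python sketch shows the standard form of the point estimator: each observed value is weighted by the inverse of its inclusion probability. The function names and the numbers are hypothetical, and the local mean variance estimator of Stevens and Olsen (2003) is not implemented here.

```python
def horvitz_thompson_total(values, incl_prob):
    """Estimate a population total from sampled values.

    values    : response values observed on the sampled units
    incl_prob : inclusion probabilities of the same sampled units
    """
    return sum(y / p for y, p in zip(values, incl_prob))


def horvitz_thompson_mean(values, incl_prob, population_size):
    """Estimate a population mean when the population size N is known."""
    return horvitz_thompson_total(values, incl_prob) / population_size


if __name__ == "__main__":
    # Hypothetical sample of three units drawn with unequal probabilities.
    y = [2.0, 4.0, 6.0]
    pi = [0.5, 0.25, 0.5]
    print(horvitz_thompson_total(y, pi))     # 2/0.5 + 4/0.25 + 6/0.5 = 32.0
    print(horvitz_thompson_mean(y, pi, 10))  # 32.0 / 10 = 3.2
```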

Another potential advantage of spatially balanced sampling is reduced sample cost. As mentioned above, spatially balanced designs can produce precise design-based estimators when there is a spatial trend in the response variable. Hence, to achieve a desired level of precision, fewer observations may be required when spatially balanced sampling is used. This was illustrated by Kermorvant et al. (2017, 2019) for a clam monitoring program in Arcachon Bay, France. They showed that when spatially balanced designs were used, the total survey cost was reduced by 30% when compared with simple random sampling.

Another useful feature of some spatially balanced designs is that they can draw spatially balanced oversamples (replacement units). This is particularly useful for environmental sampling because when sample units cannot be observed (private property, inaccessible, too dangerous, etc.), replacement units are often required to achieve the desired sample size (Stevens and Olsen 2004; Robertson et al. 2018; Theobald et al. 2007). For example, in the Oklahoma statewide stream and river monitoring program, only 130 of the 177 randomly chosen sites could be observed (Oklahoma Water Resources Board 2013). Of the unobservable sites, eight were on private land and 39 had dry channels or were not accessible. To achieve the desired sample size, the researchers selected replacement sites from an oversample drawn using simple random sampling. The potential advantage of using spatially balanced oversamples is that the observed sample maintains some degree of spatial balance over the observable resource (Stevens and Olsen 2004; Robertson et al. 2018). Although oversampling is of practical importance, it does not eliminate the non-response from unobservable units or the bias of an inference (Robertson et al. 2018).

Generalized Random Tessellation Stratified

Generalized Random Tessellation Stratified (GRTS) (Stevens and Olsen 2004) is the most popular spatially balanced sampling design for environmental studies (Grafström and Tillé 2013; Foster 2016). It was developed by the U.S. Environmental Protection Agency (EPA) for the National Environmental Monitoring and Assessment Program (Messer et al. 1991; Stevens and Olsen 2004). The GRTS uses a complex algorithm to draw its sample, and we briefly discuss its main steps here. The reader is referred to Olsen et al. (2012) for a full, non-technical description of the GRTS. Initially, a grid is superimposed over the study area and each grid cell is hierarchically numbered using a base-four numbering system. The numbered grid cells are then randomly permuted using reverse hierarchical ordering and mapped (in order) to the real line. A systematic sample from the ordered grid cells is then drawn, and one sampling unit is randomly selected from each of these grid cells. The selected units are then mapped back to their respective locations in the study area to yield the sample locations. The systematic sampling and hierarchical ordering that the GRTS uses ensure that the sample is spatially balanced (Stevens and Olsen 2004).
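The main steps can be illustrated with a heavily simplified Python sketch. It is not the GRTS algorithm as implemented in spsurvey or SDraw. It assumes a finite point population in the unit square with equal inclusion probabilities, places the units themselves (rather than grid cells) on the line, and omits the reverse hierarchical ordering step. All function names and parameters are hypothetical.

```python
import random


def quadrant_address(x, y, depth):
    """Base-four hierarchical address of a point in the unit square [0, 1)^2."""
    digits = []
    for _ in range(depth):
        x, y = 2 * x, 2 * y
        qx, qy = min(int(x), 1), min(int(y), 1)  # which half in each direction
        digits.append(2 * qy + qx)               # quadrant label 0, 1, 2 or 3
        x, y = x - qx, y - qy                    # rescale to the sub-quadrant
    return digits


def grts_like_sample(points, n, depth=8, rng=None):
    """Equal-probability sample of n units using a randomized quadrant order."""
    if rng is None:
        rng = random.Random()
    perms = {}  # one random permutation of the four quadrants per parent cell
    keyed = []
    for idx, (x, y) in enumerate(points):
        prefix, key = (), []
        for digit in quadrant_address(x, y, depth):
            if prefix not in perms:
                perms[prefix] = rng.sample(range(4), 4)
            key.append(perms[prefix][digit])
            prefix += (digit,)
        keyed.append((key, idx))
    # Sorting by the randomized address maps the units, in order, to a line.
    ordered = [idx for _, idx in sorted(keyed)]
    # Systematic sample along the line with a random start.
    size = len(ordered)
    start = rng.random()
    sample = []
    for k in range(n):
        position = int((start + k) * size / n)
        sample.append(ordered[min(position, size - 1)])
    return sample


if __name__ == "__main__":
    rng = random.Random(2019)
    population = [(rng.random(), rng.random()) for _ in range(500)]
    print(grts_like_sample(population, 20, rng=rng))
```

Because units that are close on the line tend to be close in space, taking every (N/n)-th unit along the line tends to spread the sample over the study area.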

To investigate GRTS’s popularity, the Google Scholar search engine was used because it provided access to a wide range of publication types. It indexes journal papers (published online or in paper format), conference proceedings, posters and technical reports from research organizations in both the public and private sectors. The keyword “GRTS” was used for the search, and we did not include citations from 2018. All the documents found were categorized as either “publications” or “reports”. Publications included peer-reviewed journal articles and refereed conference proceedings, and reports included all other publication types. Results are displayed in Fig. 1.

Fig. 1

Flow representation of the use and/or citation of the GRTS in the literature. The publication dates of several spatially balanced designs (stars) and R packages (arrows) are also shown

Our analysis found 600 documents citing the GRTS throughout the world. The citation and/or use of the GRTS showed a steady increase until 2013, after which it flattened out. At the beginning, there were more reports than publications, but this trend appears to have reversed since 2014. Most of the documents found were in the fields of environmental science (mostly ecology but also environmental chemistry) and statistics (new designs, tests and comparisons). There were only two publications from other fields. The first was in economics, where the GRTS was compared with existing sampling designs for business surveys (Dickson et al. 2014), and the second was a thesis on sampling standards for maintenance management quality assurance (Liu and Chen 2018).

Other spatially balanced sampling designs

Many spatially balanced designs have been proposed in the literature. In this section, we mention several approaches.

The Local Pivotal Method (LPM) (Grafström et al. 2012) is a flexible spatially balanced design that can draw equal and unequal probability samples in multiple dimensions. Unequal probability sampling can be more efficient than equal probability sampling if there is a positive correlation between the inclusion probabilities and the response values (Robertson et al. 2013). Additional dimensions could include auxiliary information such as ecological threats, time intervals, species population structure or environmental data (Brown et al. 2015). The Swedish national forest inventory, for example, has implemented the LPM with five auxiliary variables (Grafström et al. 2017). The LPM is a popular method with 100 citations (according to Google Scholar), and most of its applications are related to forestry.
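The basic mechanism of the LPM can be sketched in a few lines of Python. The sketch assumes a nearest-neighbour variant in which a randomly chosen undecided unit competes with its nearest undecided neighbour, and the pair's inclusion probabilities are updated with the pivotal rule until every unit is either selected or rejected. The function name is hypothetical, the code is not the implementation used in BalancedSampling, and exact fractions are used only to avoid rounding problems.

```python
import math
import random
from fractions import Fraction


def local_pivotal_sample(points, incl_prob, rng=None):
    """Draw a sample by repeated pivotal updates between neighbouring units.

    points    : list of coordinate tuples (any number of dimensions)
    incl_prob : inclusion probabilities, ideally Fractions summing to an integer
    """
    if rng is None:
        rng = random.Random()
    prob = [Fraction(p) for p in incl_prob]
    undecided = [i for i, p in enumerate(prob) if 0 < p < 1]
    while len(undecided) > 1:
        i = rng.choice(undecided)
        # Nearest undecided neighbour of unit i.
        j = min(
            (k for k in undecided if k != i),
            key=lambda k: math.dist(points[i], points[k]),
        )
        total = prob[i] + prob[j]
        if total < 1:
            # One unit is rejected and the other takes the combined probability.
            if rng.random() < prob[j] / total:
                prob[i], prob[j] = Fraction(0), total
            else:
                prob[i], prob[j] = total, Fraction(0)
        else:
            # One unit is selected and the other keeps the remainder.
            if rng.random() < (1 - prob[j]) / (2 - total):
                prob[i], prob[j] = Fraction(1), total - 1
            else:
                prob[i], prob[j] = total - 1, Fraction(1)
        undecided = [k for k in undecided if 0 < prob[k] < 1]
    for k in undecided:
        # At most one unit is left, and only when the probabilities do not
        # sum to an integer; it is decided by a single random draw.
        prob[k] = Fraction(1) if rng.random() < prob[k] else Fraction(0)
    return [i for i, p in enumerate(prob) if p == 1]


if __name__ == "__main__":
    rng = random.Random(2012)
    population = [(rng.random(), rng.random()) for _ in range(100)]
    print(local_pivotal_sample(population, [Fraction(10, 100)] * 100, rng=rng))
```

Because neighbouring units compete for inclusion, two units that are close together are unlikely to be selected together, which spreads the sample. Supplying auxiliary variables as extra coordinates spreads the sample in those dimensions as well.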

Grafström (2012) also presented spatially correlated Poisson sampling (SCPS). This design is a modification of correlated Poisson sampling (Bondesson and Thorburn 2008) that draws spatially balanced samples. The LPM is algorithmically easier than the SCPS, but the SCPS may produce better results for some populations (Grafström and Schelin 2014).

Balanced acceptance sampling (BAS) (Robertson et al. 2013, 2017) is another spatially balanced design. The BAS uses the Halton sequence (Halton 1960) to spread its sample across multiple dimensions. The BAS is conceptually simple, computationally efficient and is particularly useful for drawing spatially balanced oversamples (Robertson et al. 2018). We found 34 publications citing the BAS, where most of the papers were methodological rather than applied. The BAS has been used to survey bats in Bighorn Canyon National Recreation Area (Keinath and NRA 2016) and is being used for New Zealand’s national monitoring program (van Dam-Bates et al. 2018). The BAS is well suited for areal resources (geographic areas), but it can be inefficient on some point resources (Robertson et al. 2018). To improve the performance of the BAS on point resources, Robertson et al. (2018) presented Halton iterative partitioning (HIP). This spatially balanced design uses properties of the Halton sequence to partition a point resource into nested boxes to draw its sample, rather than using the sequence itself.
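The acceptance step of the BAS can be illustrated with a short Python sketch for an areal resource inside the unit square. It assumes an equal-probability design and a random start obtained by skipping a random number of terms in each dimension of the Halton sequence. The function names, the `max_skip` parameter and the disc-shaped study area are hypothetical, and the code is not the SDraw implementation.

```python
import random


def radical_inverse(index, base):
    """Radical inverse of a non-negative integer: one Halton coordinate."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def bas_sample(inside, n, rng=None, max_skip=10**6):
    """Accept Halton points that fall inside the study area until n are found.

    inside : function (x, y) -> bool describing a study area with positive
             area in the unit square (otherwise the loop never ends)
    """
    if rng is None:
        rng = random.Random()
    skip_x = rng.randrange(max_skip)  # random start in the base-2 dimension
    skip_y = rng.randrange(max_skip)  # random start in the base-3 dimension
    sample, k = [], 0
    while len(sample) < n:
        point = (radical_inverse(skip_x + k, 2), radical_inverse(skip_y + k, 3))
        if inside(*point):
            sample.append(point)
        k += 1
    return sample


def in_disc(x, y):
    """Hypothetical study area: a disc of radius 0.4 centred in the square."""
    return (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.16


if __name__ == "__main__":
    points = bas_sample(in_disc, 25, rng=random.Random(2013))
    primary, oversample = points[:20], points[20:]
    print(len(primary), "primary sites and", len(oversample), "oversample sites")
```

In this sketch, an oversample is obtained simply by continuing the same sequence, so the replacement sites are drawn by the same mechanism as the primary sample.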

Benedetti and Piersimoni (2017) presented a flexible class of spatially balanced designs that draw their samples based on a within-sample distance. The algorithm is simple to implement in multiple dimensions, and any distance or similarity measure can be used to define the within-sample distance.

Spatially balanced sampling packages in R

Several packages for the R software (R Core Team 2014) are freely available to draw spatially balanced samples. To draw GRTS samples, spsurvey (Kincaid and Olsen 2015) or SDraw (McDonald 2016) can be used. These packages can draw samples from point resources (geographic locations), linear resources (rivers) and areal resources (geographic areas), and can also draw spatially balanced oversamples. The spsurvey package can also draw stratified spatially balanced GRTS samples with user-defined strata.

The other spatially balanced designs mentioned in this article can be selected using the following packages. BalancedSampling (Grafström and Lisic 2016) draws equal and unequal probability LPM and SCPS samples from point resources. The BAS and HIP samples/oversamples can be selected from point, linear and areal resources using SDraw (McDonald 2016). Historical or legacy sites can also be incorporated into a BAS design (Foster et al. 2014) using the MBHdesign package (Foster 2016). Finally, the R package Spbsampling (Pantalone et al. 2019) can be used to draw samples with the within-sample distance-based methods of Benedetti and Piersimoni (2017).

Figure 2 shows examples of equal probability spatially balanced samples of 150 points drawn from a point resource using the SDraw and BalancedSampling packages. An oversample of 20 points is also illustrated for the BAS and GRTS. Note how the oversample points are spatially balanced with respect to the primary sample. To illustrate the R syntax for these packages, an annotated R script that creates Fig. 2 is given in the supplementary material section.

Fig. 2

Several spatially balanced samples drawn using different designs, where open symbols denote oversample sites

Conclusion

Environmental scientists are beginning to use more advanced sampling designs to achieve robust statistical results. Spatially balanced designs are particularly useful for environmental science because they can produce good sample coverage over a resource and precise design-based estimators, and can potentially reduce sampling cost. The GRTS is the most popular spatially balanced sampling design, and it is easy to implement using freely available R packages such as spsurvey and SDraw. Another useful feature of the GRTS is that spatially balanced oversamples can be drawn. Although oversampling is of practical importance, it does not eliminate the non-response from unobservable units or the bias of an inference. Several other spatially balanced designs are also available, each with an accompanying R package. The LPM and SCPS samples can be drawn using BalancedSampling, and SDraw selects the BAS and HIP samples/oversamples. Although spatially balanced sampling has mostly been used in ecology, we encourage all environmental scientists to consider these designs when the research objective is to make an inference from a sample to the entire population.