
1 Introduction

Uncertainty refers to an overall lack of confidence in the quality of scientific data, analyses, and the decisions that are based upon them (Brown 2010). Understanding, quantifying, and communicating uncertainty in spatial data, and its propagation through geospatial analyses, have been challenges long recognized in the geospatial community. However, consistent, agreed-upon methods for addressing, managing, and communicating uncertainty in the results of geospatial analyses have not been integrated into common geospatial practice. If results of spatial analyses are unaccompanied by measures of their uncertainty, how can we ensure that users of spatial data and associated analyses are provided with the requisite information for responsible data use and decision-making?

Identifying and utilizing mechanisms for addressing uncertainty and integrating these approaches into common geospatial practice remain challenges for geospatial researchers and practitioners. Over the decades, extensive research has contributed to our understanding of geospatial uncertainty (see, for example, Couclelis 2003; Fisher 1999, 2006; Heuvelink and Burrough 2002; Hunsaker et al. 2013; Li et al. 2018). The intent of this chapter is to: (1) present generally agreed-upon concepts of uncertainty and (2) provide examples of approaches for addressing uncertainty that can be applied to specific spatial datasets.

We begin this chapter with definitions of terms and associated concepts. This first section provides an overview of the complexities of uncertainty in geospatial science. The next sections expand on this background with examples of how methods of confronting uncertainty can be integrated into geospatial practice.

2 Uncertainty Fundamentals

Understanding the error in spatial databases is necessary to ensure appropriate application of GIS and to ensure progress in theory and tool development. (Chrisman 1991, p. 167)

The largest contributing factor to spatial data uncertainty is error. Error is defined as the departure of a measure from its true value. Our lack of knowledge of the extent and expression of errors in spatial datasets, and their propagation through analyses, results in uncertainty. In applying scientific principles, we have been taught that errors are “bad” and with enough practice and attention to detail, error can be eliminated. However, error abounds in spatial data. No matter how sophisticated mapping technologies become, spatial representations of map features are still generalizations from limited samples of reality. Positions of features represented on a map are subject to displacement and associated attributes are subject to inconsistencies in description and ontological representation. Even at our present level of technological capacity, distortions are an inherent component of maps and their digital representation, and a concomitant component of spatial data. All geospatial data are likely to contain errors, which leads to uncertainty in measurement and in visualization of analytic results (Chrisman 1991; Goodchild and Gopal 1989).

2.1 Error As a Consequence of Scale

In what ways are errors introduced into geospatial data? Let us begin by recognizing the major sources of geospatial data. Geographic Information Systems (GISs) enable exploration of patterns in spatial data in order to understand natural, physical, and anthropogenic processes. These phenomena all vary in their spatial and temporal scale. However, the spatial data used to represent them are generated at fixed scales in space and time. This mismatch between the varying scales of real-world processes and the fixed scales of the data that represent them is a major source of error in GIS data. Because we are able to zoom in and out of data layers, GISs are often assumed to be scale independent; this assumption is untrue and problematic. All map features are only representations of reality and are thus distorted through the cartographic process. Projections distort area, shape, distance, and direction, while datums determine where features are placed. Additionally, all map features are generalized based on the spatial scale of representation. For example, the twists and turns of a stream represented cartographically at a scale of 1:24,000 will be more detailed than the same stream represented at a scale of 1:100,000. In thematic maps, boundaries of features are represented as discrete and mutually exclusive (Woodcock and Gopal 2000). However, due to positional errors and spatial scale, boundaries are imprecise and can be indeterminate. Furthermore, spatial analyses that include soil boundaries may misrepresent results when Boolean logical operations are used. For example, soil polygons are represented with discrete boundaries. In a query that only includes binary selections of discrete objects, such as “is” soil_type_A and “is not” soil_type_A, areas that contain only some components of soil_type_A will be missed.
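The Boolean-selection problem above can be sketched with a toy example (the soil labels and component fractions below are hypothetical, not drawn from a real soil survey): a crisp query keeps only map units labeled soil_type_A and silently drops units that are partly composed of it.

```python
# Hypothetical soil map units; "components" gives the fraction of each
# soil type inside the mapped polygon (real soil polygons are rarely pure).
units = [
    {"id": 1, "label": "soil_type_A", "components": {"soil_type_A": 1.0}},
    {"id": 2, "label": "soil_type_B", "components": {"soil_type_B": 0.7,
                                                     "soil_type_A": 0.3}},
    {"id": 3, "label": "soil_type_C", "components": {"soil_type_C": 1.0}},
]

# Crisp Boolean query: "is soil_type_A" matches the label only.
crisp = [u["id"] for u in units if u["label"] == "soil_type_A"]

# Component-aware query: any unit containing some fraction of soil_type_A.
partial = [u["id"] for u in units if u["components"].get("soil_type_A", 0) > 0]

print(crisp)    # unit 2, which is 30% soil_type_A, is missed
print(partial)  # unit 2 is recovered
```

The crisp query returns only unit 1; the component-aware query also recovers unit 2, whose mapped label hides a 30% share of soil_type_A.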

It is clear that scale is a problematic issue in many sciences, notably those that study phenomena embedded in space and time. (Goodchild 2011, p. 5)

Error due to spatial scale can be reduced through improved precision of measurement. The terms precision and accuracy cannot be used interchangeably. Precision refers to the exactness of a measurement. In spatial data, precision can be attained, for example, using high-quality global positioning system (GPS) receivers that enable more exact measures of horizontal and vertical positions. In a raster GIS, precision is achieved through more closely spaced measurements, as represented by higher resolution grid cells. Accuracy is defined as the closeness of a measure to its “true” value (Chrisman 1991). Accuracy is quantified by comparing measurements in a dataset with co-located values deemed to be of “higher accuracy.”

While there are many versions of accuracy statistics (Chow and Kar 2017), the accuracy of geospatial data is most commonly quantified using the Root Mean Square Error (RMSE, Eq. 16.1). This global statistic requires access to a higher accuracy measurement, which is not always attainable. As a parametric measure, it assumes that errors are both normally distributed and representative of the dataset, neither of which is always the case. Nonetheless, the RMSE is the most standard accuracy statistic and has been integrated into common practice as an accepted measure of data quality. The RMSE is expressed as

$$ \mathrm{RMSE} = \sqrt{\frac{\sum\nolimits_{i = 1}^{N} \left( y_{i} - yt_{i} \right)^{2}}{N - 1}} $$
(16.1)

where yi refers to the ith map value, yti refers to the ith known, measured, or higher accuracy sample value, and N is the number of sample points.
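Equation 16.1 translates directly into code. Note that the N − 1 denominator below follows the chapter's formulation (many implementations divide by N instead); the elevation check points are hypothetical.

```python
import math

def rmse(map_values, reference_values):
    """Root Mean Square Error per Eq. 16.1 (N - 1 in the denominator)."""
    n = len(map_values)
    ss = sum((y - yt) ** 2 for y, yt in zip(map_values, reference_values))
    return math.sqrt(ss / (n - 1))

# Hypothetical elevations (m) at five co-located check points:
mapped    = [751.0, 748.5, 750.2, 752.1, 749.0]  # values from the dataset
reference = [750.0, 749.0, 750.0, 751.0, 750.0]  # higher accuracy survey
print(round(rmse(mapped, reference), 2))
```

The result is a single global statistic, which is precisely its limitation: one number summarizes error across the whole dataset regardless of how that error varies in space.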

A consequence of imprecision in spatial data representation is that sampling may occur at a spacing that does not match the underlying frequency of the process we are trying to represent, thus missing the natural scale of that process; the data representation misses the complexity of the natural environment. This is referred to as aliasing and results in bias in representation. Landscape studies are particularly susceptible to this mismatch between spatial and temporal scales. For example, satellites such as MODIS collect fire signatures on the landscape at a high temporal resolution, but on 1 km² grid cells. These signatures are difficult to match to other satellite data such as Landsat, where land cover is represented at a high spatial resolution (30 m grid cells) but a comparably coarse temporal resolution. Matching fire patterns with landscape patterns is therefore problematic (Laris et al. 2016).

Another example of uncertainty contributed by error occurs when the spatial scales and boundaries imposed by data collection do not match the processes we are trying to understand. This is known as the modifiable areal unit problem (MAUP) (Openshaw 1984). Gerrymandering is an example of an outcome of MAUP in practice: where we place census boundaries influences the decisions that are made about communities and can be based on different spatial representations of underlying demographic attributes. Political outcomes can be manipulated by the spatial scales that are employed. Aggregating from census block group to census tract to zip code changes the outcome of demographic spatial analyses at each step. Similarly, the extent to which geographic boundaries deviate from behavioral context, for example, within and between neighborhoods, has recently been differentiated from MAUP and referred to as the Uncertain Geographic Context Problem (UGCoP) (Kwan 2012). Another offshoot of MAUP is the Modifiable Conceptual Unit Problem (MCUP), whereby the ways we conceptualize spatial processes through the models we develop to represent them may lead to different model predictions (Miller 2016). MAUP and its derivative concepts are unavoidable.
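The effect of zone choice can be shown with a minimal, hypothetical sketch: the same point-level incomes, aggregated under two different zonings, yield different zonal means, so any analysis built on the zonal values changes with the partition.

```python
# Eight hypothetical household incomes (thousands) along a street, west to east.
# Low- and high-income households alternate in pairs.
incomes = [20, 22, 80, 82, 21, 23, 81, 83]

def zone_means(values, zone_size):
    """Aggregate consecutive values into zones of the given size."""
    return [sum(values[i:i + zone_size]) / zone_size
            for i in range(0, len(values), zone_size)]

print(zone_means(incomes, 2))  # four small zones: strong low/high contrast
print(zone_means(incomes, 4))  # two large zones: contrast almost disappears
```

With small zones the means alternate sharply (about 21 vs. 81); with large zones the two means are nearly identical (51 and 52), and the underlying segregation pattern vanishes from the aggregated data.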

So far we have discussed how spatial and temporal scalings result in uncertainty. In the next section, we explore uncertainty that is due to the lack of precision of language and the choices of words we use to describe spatial phenomena.

2.2 Semantic Uncertainty

“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean — neither more nor less.”…“The question is,” said Alice, “whether you can make words mean so many different things. (Carroll 2002, p. 185)

As Humpty Dumpty asserts, words should be used consistently, without altering their underlying definition (Chisholm 2012). Often, however, descriptive information about map features is inconsistent and carries multiple definitions. The resulting uncertainty is referred to as semantic uncertainty and arises from “vague” and “ambiguous” definitions of geospatial representations. Vagueness refers to a lack of unique distinction between objects and classes, such as the boundary of a mountain that is continuous and fuzzy rather than crisp. Ambiguity occurs when there are two or more different definitions for a term (Fisher 1999).

Spatial data are developed by people and, as such, are affected by variations in human judgment (Shi et al. 2003). For example, it is not uncommon for two soil experts to disagree on the soil type classified at a particular position. Meanings are attached to geographic features using categories, tags, or descriptive text that are usually loosely defined. For example, two land cover classification systems—the National Land Cover Dataset (NLCD) 1992 and NLCD 2001—provide different definitions for the same land cover classes (Ahlqvist 2008). Similarly, various land cover products can have different thematic accuracies. A land use term such as “area not taken up by agricultural activities” may overlap with a range of land cover types (Devos and Milenov 2015). All of these discrepancies in the meanings applied to spatial data contribute to semantic uncertainty.

2.3 Semantic Uncertainty in Web-Based Geographic Data

Increasingly, GIS services reside in the cloud and are transmitted over the Internet. Collaboration between distant GISs using the Internet requires interoperability among the GISs so that databases and applications can be shared. However, semantic uncertainty can occur between web-based GISs. For example, if a user of a web-based GIS “A” requests data that is only provided by GIS “B”, the request and the response between “A” and “B” should be understood correctly for successful transfer of the data. This communication between different GISs requires a variety of protocols related to the data request such as visualization of the requested data, format of data, and queries used to request and extract the data. Successful interoperability requires structuring the GIS in terms of the data model and applying agreed upon semantics, using sets of rules and constraints of object class definitions in the GIS (Bishr 1998).
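Interoperability of this kind is typically achieved through agreed-upon protocols such as the OGC Web Map Service (WMS). As a hedged sketch, the snippet below assembles a standard WMS 1.3.0 GetMap query string (the host URL and layer name are hypothetical); because both GISs implement the same specification, GIS “A” can formulate a request that GIS “B” will interpret correctly without the two sharing internal data models.

```python
from urllib.parse import urlencode

# Standard WMS 1.3.0 GetMap parameters; the host and layer are hypothetical.
params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": "soils",                  # layer published by GIS "B"
    "CRS": "EPSG:4326",                 # agreed coordinate reference system
    "BBOX": "33.7,-118.7,34.8,-117.6",  # extent (WMS 1.3.0 lat/lon axis order)
    "WIDTH": "800",
    "HEIGHT": "600",
    "FORMAT": "image/png",              # agreed response format
}
url = "https://example.com/wms?" + urlencode(params)
print(url)
```

Every element of the request (the parameter names, the CRS identifier, the axis order of the bounding box, the response format) is fixed by the shared specification, which is what removes semantic ambiguity from the exchange.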

2.4 Semantic Uncertainty in Crowdsourced Geographic Data

In the previous sections, we discussed different types of uncertainty and various ways in which they are introduced into geographic information. Numerous methods have been created to quantify and visualize geographic data uncertainty, many of which are based on comparing a dataset with a mathematical model or with a dataset of higher accuracy. These techniques are very helpful for understanding and quantifying uncertainty, but they may not be appropriate for a new and promising type of data that has increasingly become an important and complementary source of geographic information: crowdsourced geographic data, or volunteered geographic information (VGI). Crowdsourcing refers to the process of soliciting information, ideas, or services from crowds of largely untrained volunteers. Many projects have collected data from citizens in this way, including the Audubon Christmas Bird Count and OpenStreetMap (OSM). This popular type of data source has raised new and unique challenges for assessing, measuring, and communicating data uncertainty (Goodchild 2007).

Semantic uncertainty may be hard to measure and characterize in crowdsourced geographic data. While mapping professionals may disagree on what class to assign to a particular pixel, the information that may be attached to a piece of crowdsourced data is wide open; people may disagree about everything associated with a crowdsourced feature. For instance, in a road dataset, attributes for a road feature may include the road name, highway type, length, and other tags, but the specific information assigned to each of these attributes is provided solely at the discretion of a volunteer. Comparisons between crowdsourced datasets and non-crowdsourced (i.e., authoritative) data reveal that crowdsourced data are inconsistent, with different classification systems, loose definitions, and inconsistent naming.

In a comparison between authoritative and crowdsourced bike path data, for example, the underlying meaning of the attributes must first be determined and agreed upon to establish semantic correspondences between the two datasets (Li and Valdovinos 2018). In the crowdsourced OpenStreetMap (OSM) dataset, volunteers may choose different tags to attach to the same type of feature due to a lack of strict definitions: a bikeway may carry a tag of “track,” “lane,” “designated bicycle,” etc. in various attribute columns. Another example of semantic uncertainty can be observed in the categorized messages on the Ushahidi platform during the 2010 Haiti Earthquake. Camponovo and Freundschuh (2014) found that 50% of the messages were mis-categorized by volunteers at the major category level, and 73% were misinterpreted at the subcategory level. This level of uncertainty raises serious concerns over the legitimacy of using crowdsourced data for emergency response and other similarly critical situations. In another study, expert volunteers were better at identifying land cover types than nonexpert volunteers using data from the Geo-Wiki crowdsourcing tool (See et al. 2013). This argues for the potential usefulness of training volunteers and the necessity of establishing standards for data gathering and observation as part of the VGI process.
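One practical response to such tag variation is to map the observed variants onto a single canonical class before comparing datasets. The sketch below uses tag variants drawn from common OSM-style bikeway tagging; the canonical mapping itself is our own assumption for illustration, not an OSM standard.

```python
# Common ways volunteers tag a bikeway as OSM-style key=value pairs.
# The canonical class on the right-hand side is an assumption for this sketch.
CANONICAL = {
    ("highway", "cycleway"): "bikeway",
    ("cycleway", "track"): "bikeway",
    ("cycleway", "lane"): "bikeway",
    ("bicycle", "designated"): "bikeway",
}

def classify(tags):
    """Return a canonical class if any tag variant matches, else None."""
    for kv in tags.items():
        if kv in CANONICAL:
            return CANONICAL[kv]
    return None

print(classify({"highway": "cycleway"}))                        # bikeway
print(classify({"highway": "residential", "cycleway": "lane"})) # bikeway
print(classify({"highway": "residential"}))                     # None
```

Three differently tagged features collapse to the same class, which is the precondition for any meaningful comparison against an authoritative bike path layer.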

In authoritative (i.e., non-crowdsourced) datasets, it is the responsibility of data producers to provide data quality metadata. According to the Content Standard for Digital Geospatial Metadata created by the United States Federal Geographic Data Committee (FGDC 1998), data quality information is an essential part of metadata in geographic datasets and the FGDC has set forth basic standards for data quality. For example, metadata should include a general assessment of attribute accuracy, positional accuracy, logical consistency, completeness, and lineage.

Traditional geographic datasets are typically created using consistent and verified data collection methods (e.g., surveying, remote sensing imagery) and evaluated using agreed upon quality control standards such as the RMSE. Traditional quality control procedures usually involve a selection of a sample from a dataset and measurement of average uncertainty associated with the selected features. Such results are recorded as a general assessment of data quality.

Lack of quality control and data quality documentation is common in crowdsourced data. Quality information is absent from most crowdsourced datasets for a variety of reasons:

  1. Crowdsourced data are contributed by volunteers with a wide range of data collection skills, mapping experience, and familiarity with geospatial technologies.

  2. Volunteers may not have the skills or guidance required to systematically measure and communicate uncertainty.

  3. A representative sample for quality evaluation cannot be selected from crowdsourced data.

  4. Definitions of the objects under investigation are inconsistent.

  5. Inherent observer bias is introduced into the dataset.

  6. Uncertainty may not be consistent even within the same dataset, which makes it challenging for experts to evaluate the quality of a VGI dataset.

The result of such absence of standards is that there are various levels of uncertainty even within the same dataset. This makes it impossible to use a single measurement to describe the uncertainty associated with a whole geographic dataset.

An example of how positional uncertainty and completeness vary significantly from area to area is a comparison between crowdsourced OSM data and the authoritative dataset provided by the Ordnance Survey of Great Britain (Haklay 2010). Similarly, a study of geotagged Flickr photos of places in France showed that the positional uncertainty associated with the photos varied from a few meters to 78,200 m (Li and Goodchild 2012). This suggests that feature-level accuracy is required if uncertainty is to be represented accurately in VGI. However, a major value of crowdsourcing is its rapid production of geographic data that are not available from traditional sources. Comparing every piece of VGI-generated geographic information against a “gold standard” for data accuracy is impractical (Goodchild and Li 2012). New uncertainty evaluation methods need to be developed for crowdsourced data that are acceptable for their intended use. For example, a higher standard may be required for datasets used to provide information to emergency responders in search and rescue operations.

2.5 Sampling Bias in Crowdsourced Data

Crowdsourced geographic datasets can be massive, containing millions of geographic features. For example, Wikimapia, an open-content mapping project contributed entirely by volunteers, had just under 28,000,000 objects as of January 2018 (Wikimapia 2018). The large size of such datasets, known as big geodata, does not guarantee that they are complete or representative. The volume and types of geographic features in a particular area depend largely on the interests of volunteers. Because people tend to contribute data in areas familiar to them, population density plays a large role in data availability. Populous metropolitan areas are likely to have a more complete dataset than rural areas (Estima et al. 2014). Omission of important features may be difficult to detect. This further confirms that the quality of a crowdsourced dataset may be very uneven even within the same source, due to different samples, different methods, and poorly specified mixed contributor populations.

People are optimistic about the ever-growing field of big data because it allows us to record large volumes of objects and events nearly in real time using advanced geospatial technologies. However, large quantities of data do not equal greater quality, and we need always to be aware of the limitations of the accuracy of crowdsourced data. Twitter, for example, has been a popular data source in many applications across different fields because it is convenient to obtain. Yang and Mu (2015) studied geographic patterns of psychological depression using georeferenced tweets. Widener and Li (2014) used geolocated tweets to describe the distribution of healthy and unhealthy food across the contiguous United States. In these studies, although the total number of investigated tweets is impressive, the limitations are also obvious. Twitter users, and particularly the small percentage of heavy users, cannot be representative of specific populations. As demonstrated by Li et al. (2013), the spatial and temporal distribution of tweets is uneven from place to place, and the variation is often correlated with socioeconomic characteristics of places. Looking further, a long tail effect is present in any Twitter dataset: a large percentage of tweets is contributed by a small percentage of users. Sociology places high value on carefully selected samples that effectively represent a larger population. However, when a data source like Twitter is used, identifying a specific population is uncertain. Addressing error due to the limits of the sampled population and its inherent biases continues to pose a challenge in working with this rich geospatial data source. It is incumbent upon data analysts to qualify the data based on the way it was gathered and on any evidence of the nature of the bias.
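The long tail effect can be quantified with a simple concentration measure: the share of all tweets produced by the most active users. The per-user counts below are synthetic, chosen only to exhibit a heavy-tailed shape.

```python
# Synthetic tweet counts per user; a few heavy users dominate the total.
tweets_per_user = [500, 300, 120, 40, 10, 8, 6, 5, 4, 3,
                   2, 2, 1, 1, 1, 1, 1, 1, 1, 1]

def top_share(counts, fraction):
    """Share of all items contributed by the top `fraction` of contributors."""
    ranked = sorted(counts, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

share = top_share(tweets_per_user, 0.10)  # top 10% of users
print(round(share, 2))
```

Here the top 10% of users (2 of 20) account for roughly 79% of all tweets, so any population inference drawn from the raw tweet stream is dominated by a handful of contributors.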

2.6 Using Fuzzy Sets to Visualize Semantic Uncertainty

Errors due to semantic uncertainty result from incomplete knowledge of spatial phenomena. Traditionally, in a cartographic map, boundaries are treated as discrete and mutually exclusive (Woodcock and Gopal 2000), and positional errors are not incorporated in the representations of map features. The true location of a boundary is unknowable and subject to positional and semantic error. Researchers have explored the use of fuzzy set theory (Zadeh 1965) to visualize semantic uncertainty (Ban and Ahlqvist 2009; Bishr 1998; Li et al. 2018; McBratney 1992). Fuzzy set theory assumes all boundaries are fuzzy. Fuzzy logic specifically addresses situations where the boundaries between classes are not clear. Unlike crisp sets, fuzzy logic is not a matter of being “in” or “out” of a class; it defines the degree to which a phenomenon is a member of a set.

Specific mathematical membership functions are used to quantify the degree of membership in a class (Zadeh 1978). For example, an area that is definitely classified as being a specific soil type is given a membership value of 1. An area that is definitely not part of that soil type is given a membership value of 0. Anything in between is assigned a value ranging from 0 to 1, based on a membership curve. Various membership curves have been developed to represent the nature of the fuzzy relationship. These membership functions can be used to derive semantic similarity metrics for evaluating different classifications—or definitions—of the same concept (Ahlqvist 2008). Semantic uncertainty can be reduced by developing and applying these types of measures of semantic membership.

2.7 Summary of Variables and Concepts of Data Uncertainty

Figure 16.1 provides various representations of spatial data uncertainty, building on the framework of Foote and Huebner (2000) and Chrisman's (1991) taxonomy of error. These ideas are further expanded to include the issues of vagueness and ambiguity that contribute to semantic uncertainty. The various sources of error that contribute to spatial data uncertainty can be encapsulated under the framework of spatial, temporal, and observational scales. Our understanding of processes, achieved through spatial datasets and geospatial analyses, is derived through this lens.

Fig. 16.1

Components of spatial data uncertainty

3 Approaches to Error and Confronting Uncertainty

Once a model of error is developed, what good does it do? (Chrisman 1991, p. 173)

The previous section broadly contextualizes sources of error that contribute to uncertainty in spatial data. In this section, we discuss how our understanding of these errors can be used to address uncertainty, with two examples of how error can be modeled. The first addresses uncertainty in a raster elevation dataset. The second demonstrates an approach to visualizing semantic uncertainty in census tract data using fuzzy set theory.

3.1 An Example: Modeling Error to Address Uncertainty Using Monte Carlo Simulation

Measures of error, such as the RMSE, provide mechanisms for quantifying uncertainty. Assuming that the estimator of elevation is unbiased, the RMSE provides an estimate of the standard deviation of the error. The RMSE can be used as a springboard for generating random error fields. For example, in a digital elevation model (DEM), elevations are represented as continuous fields using a raster grid cell structure. Assuming that error can be approximated by the RMSE, each elevation value has the possibility of being the stated value, or any other value in a normal distribution with a mean of 0 and a standard deviation equal to the RMSE.

Numerous studies have applied Monte Carlo simulation techniques to evaluate uncertainty, specifically in digital elevation models (DEM) where elevations are represented as continuous fields using a raster grid cell structure (Ehlschlaeger et al. 1997; Hunter and Goodchild 1997; Wechsler 2000, 2006, 2007). Random error fields can be generated where each cell represents a value within the distribution of the RMSE. When added to the original elevation dataset, a possible realization of the elevation, under uncertainty, is achieved. Hundreds or thousands of these equally viable realizations of the elevation surface can be generated. When overlaid upon each other, and using a Monte Carlo simulation approach, statistical distributions of uncertainty, on a per-cell-basis, can be generated.
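A minimal numpy sketch of this procedure follows (the DEM values, RMSE, and threshold are hypothetical): each realization adds a random error field drawn from N(0, RMSE) to the DEM; a 3 × 3 mean filter crudely imposes spatial autocorrelation on the error field; counting the realizations that exceed a threshold yields a per-cell probability surface.

```python
import numpy as np

rng = np.random.default_rng(42)

dem = rng.uniform(700, 800, size=(50, 50))  # hypothetical 50 x 50 DEM (m)
RMSE = 4.0                                   # reported vertical RMSE (m)
THRESHOLD = 750.0                            # elevation of interest (m)
N_RUNS = 500                                 # number of Monte Carlo realizations

def smooth3x3(field):
    """Crude 3 x 3 neighborhood mean filter to induce spatial autocorrelation."""
    h, w = field.shape
    padded = np.pad(field, 1, mode="edge")
    return sum(padded[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0

exceed = np.zeros_like(dem)
for _ in range(N_RUNS):
    error = rng.normal(0.0, RMSE, size=dem.shape)  # random error field
    realization = dem + smooth3x3(error)            # one equally viable surface
    exceed += realization >= THRESHOLD              # tally threshold exceedances

prob = exceed / N_RUNS  # per-cell probability that elevation >= 750 m
print(prob.min(), prob.max())
```

Cells well above or below 750 m receive probabilities near 1 or 0; cells near the threshold receive intermediate values, which is exactly the per-cell uncertainty information a single deterministic DEM cannot provide. A production analysis would replace the 3 × 3 filter with autocorrelation fitted from a semivariogram, as in the studies cited above.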

In analyzing data to determine error fields, the goal is to spatially represent the underlying potential nature of error, which, as previously established, is unknowable. However, in accordance with Tobler's First Law of Geography (Tobler 1970), adapted from Neprash (1934), map features, including error, are more similar the closer they are in proximity. Thus, errors are spatially autocorrelated. Varying the spatial autocorrelation of random fields allows the data user to visualize and quantify error present in their data. Figure 16.2 provides an example of four random error fields with varied representations of spatial autocorrelation. When added to surfaces, each error field represents the surface under uncertain conditions. Using Monte Carlo simulation techniques, these grids are analyzed to provide statistical estimations of uncertainty (Fig. 16.2). Such surface realizations can be used to derive per-cell uncertainty estimators and probability surfaces that can better inform decision-makers about the variability of outcomes (Fig. 16.3). This increased understanding of the data can enable decision-makers to consider how much error, and what types of error, can be tolerated in predicting events.

Fig. 16.2

Examples of random error fields with various levels of spatial autocorrelation, based on an RMSE of 4 m. Autocorrelation is derived from a semivariogram of the distribution of randomly selected (N = 1000) elevation values. The range of the semivariogram represents the distance of spatial dependence (510 m). Each error field, when added to a DEM, provides a realization of the elevation surface. a U = unfiltered random field with no spatial autocorrelation, b N = random field filtered using a 3 × 3 neighborhood, c W = random values filtered based on the distance of spatial dependence with less weight given to cells farther away, d S = random values filtered based on the distance of spatial dependence

Fig. 16.3

Results of a Monte Carlo simulation demonstrating the probability that an elevation value will be >= 750 m: a elevation values >= 750 m in a 10–30 m2 DEM, b probability of elevation >= 750 m using uncorrelated random fields, c probability of elevation >= 750 m using spatially autocorrelated random fields

3.2 An Example: Semantic Uncertainty of the Exurbanization Concept

Exurbanization refers to low-density development beyond the urban and suburban fringe. Here, we provide an example in which fuzzy set theory is applied to address the semantic uncertainty of the exurbanization concept. The exurban boundary has been a subject of research in over eighteen studies; however, these studies do not agree on a definition of exurbanization, making the concept an excellent example of semantic uncertainty (Berube et al. 2006). For instance, exurbanization can be defined by various characteristics of a region such as population density, distance, and household lot size (Ban and Ahlqvist 2009). The same exurban area, therefore, may be represented differently depending on the definition used. For example, Daniels (1999) defined exurban areas as “10–50 miles away from a major urban center of at least 500,000 people,” “5–30 miles from a city of at least 50,000 people,” “generally within 25-min commuting distance,” and/or areas where “the population density is generally less than 500 people per square mile” (Daniels 1999). These four definitions differ from one another, resulting in semantic uncertainty.

A fuzzy set approach can be used to visualize the boundaries of the exurban areas given semantic uncertainty. In this example, we use population density for the exurban definition given by Daniels (1999) where exurban areas include “less than 500 people/mi2.” The fuzzy set membership function assigns each spatial object a membership value ranging between zero—e.g., not exurbanized at all—and one—e.g., entirely exurbanized. In addition, a membership value of 0.5 is the breakpoint of the definition—e.g., either exurban or non-exurban (Li et al. 2018). Using this function within a GIS, boundaries can be represented by their likelihood of membership and given fuzzy boundaries.

Figures 16.4 and 16.5 demonstrate how fuzzy set membership functions can be developed for a chosen exurban definition. Following the numerical expression of the attribute, the population density value of 500 people/mi2 serves as the breakpoint of the membership function, determining whether a location is more exurban than not.

Fig. 16.4

Example fuzzy set membership functions for one of the exurbanization definitions in Daniels (1999), focusing on a population attribute. Redrawn from Table 1 of Ban and Ahlqvist (2009), Representing and negotiating uncertain geospatial concepts—where are the exurban areas? Computers, Environment and Urban Systems, 33(4), 233–246

Fig. 16.5

Visualization of the degree of exurbanization of Los Angeles County, CA, U.S.A. based on the a crisp, non-fuzzy membership and b the fuzzy set membership using empirical GIS data (Census 2010)

Areas with population densities between 0 people/mi2 and 500 people/mi2 satisfy the exurban concept to a high degree; these areas are assigned membership values between 1 and 0.5 in the fuzzy set membership function. A membership value of 0.5 is assigned at the breakpoint of exactly 500 people/mi2. A full membership value of 1 is assigned to areas with a population density of 0 people/mi2, since these areas are definitely exurban. A membership value of 0 is assigned to all areas with population densities greater than 1000 people/mi2, since these areas definitely fall outside our conceptualization of exurban (Fig. 16.4).

Following Ban and Ahlqvist (2009), a set of membership functions for population density using the membership function (mf) expressed in Fig. 16.4 can be developed as follows in Eq. 16.2:

$$ {\text{mf}}(X) = \begin{cases} -\dfrac{1}{1000}\,X + 1 & \text{for } X \le 1000 \\ 0 & \text{for } X > 1000 \end{cases} $$
(16.2)

where mf refers to the membership function and X = population density.
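Equation 16.2 can be sketched in code as follows; this is a minimal illustration, and the function name and sample densities are ours, not from the chapter:

```python
def exurban_membership(density):
    """Fuzzy membership in 'exurban' for a population density X
    (people/mi^2), following Eq. 16.2:
    mf = -X/1000 + 1 for X <= 1000, and mf = 0 for X > 1000."""
    if density > 1000:
        return 0.0
    return -density / 1000 + 1

# Fully exurban at 0 people/mi^2, breakpoint at 500, non-exurban above 1000
print(exurban_membership(0))     # 1.0
print(exurban_membership(500))   # 0.5
print(exurban_membership(1200))  # 0.0
```

The linear form makes the breakpoint explicit: a density of exactly 500 people/mi2 yields the 0.5 value that separates "more exurban" from "more non-exurban."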

When Eq. 16.2 is applied to census block data, the results can be visualized cartographically (Fig. 16.5). Figure 16.5b illustrates the uncertain spatial boundaries of the Daniels (1999) exurban definition, computed with Eq. 16.2 from population density for Los Angeles County, California, USA. The continuous fuzzy values in Fig. 16.5b capture heterogeneous degrees of exurbanization in the study area that the crisp, Boolean-style boundaries in Fig. 16.5a miss. Darker blue colors represent higher degrees of exurbanization.
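The contrast between the crisp classification of Fig. 16.5a and the fuzzy one of Fig. 16.5b can be sketched as below. The block names and density values are hypothetical, invented for illustration; actual values would come from Census 2010 block data:

```python
# Hypothetical census-block population densities (people/mi^2)
block_densities = {"block_a": 120.0, "block_b": 480.0,
                   "block_c": 650.0, "block_d": 2300.0}

def membership(density):
    # Eq. 16.2: linear decline from 1 at 0 people/mi^2 to 0 at 1000
    return 0.0 if density > 1000 else -density / 1000 + 1

for block, d in block_densities.items():
    crisp = "exurban" if d < 500 else "non-exurban"  # Boolean cut at the breakpoint (Fig. 16.5a)
    fuzzy = membership(d)                            # graded degree of exurbanization (Fig. 16.5b)
    print(f"{block}: crisp={crisp}, fuzzy membership={fuzzy:.2f}")
```

Note that block_c is crisply classified as non-exurban yet retains a membership of 0.35; the fuzzy map preserves exactly this gradation, which a Boolean boundary discards.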

The case of the exurbanization concept demonstrates how a fuzzy set approach can address semantic uncertainty. The approach usefully represents uncertainty in spatial boundaries and serves as an example of how such information can be applied to visually represent datasets with semantic inconsistencies.

4 Decision-Making Under Uncertainty

Error is an inevitable component of all scientific endeavors. Error in geospatial datasets is introduced into and propagated in all spatial analyses. As described in this chapter, approaches to addressing and communicating the level of error and associated uncertainty are unique and specific to each spatial analysis and associated datasets.

The decision-maker is obligated to consider all aspects of spatial analyses, including error. Utilization of the analyses is at the discretion of the decision-maker, who weighs the context and nature of the risks and benefits of each unique application. It is the responsibility of the geospatial analyst to provide the decision-maker with the information necessary to determine the level of uncertainty and associated risk that is acceptable and appropriate. Indeed, geospatial practitioners who become certified geospatial professionals (GISPs) must adhere to a GIS Code of Ethics and Rules of Conduct, pledging to "…do the best work possible…provide full, clear, and accurate information…" (GISCI Code of Ethics 2018) and to "…acknowledge…errors and…not distort or alter the facts…" (GISCI Rules of Conduct 2018).

Academics and researchers must continue to develop accessible approaches for quantifying and visualizing error propagation, and prepare students to understand and adopt them. Software vendors must integrate such measures seamlessly so that they are available to practitioners, who can then, at a minimum, acknowledge and communicate the limitations of the datasets incorporated into their analyses. How practitioners act on error information is a separate and complex area of study that merits extended examination and review.

5 Conclusions

Uncertainty in geospatial data results from error. We have identified various sources of error, as summarized in Fig. 16.1: (1) technological limitations affect the precision of measurements; (2) the imposition of scale in spatial and temporal contexts inevitably results in discrepancies between measurement and ground truth; (3) vague concepts, ambiguous boundaries, and contentious classifications introduce uncertainty into geographic representation; and (4) unknown and unknowable samples and sampling procedures are inherent sources of error in crowdsourced data.

Why should we care about geospatial uncertainty and why is this a persistent challenge of geospatial practice? The goal of geospatial information is to provide a deeper understanding of the world. Deriving understanding from information is essential for knowledge building. Common geospatial practice does not consistently include representations of uncertainty as an essential and expected component of results. The intent of this chapter was to provide a general review of “uncertainty basics.” An improved understanding and acceptance of uncertainty will lead to more effective and responsible geospatial practice.

We assert that all spatial analyses should include appropriate statements of qualification related to error, sufficient to enable a user to take error into consideration in decision-making. Addressing uncertainty requires a recalibration of our perspective on error, moving from aversion to acceptance to integration. Managing uncertainty requires acknowledging how errors are introduced into spatial data and how they are measured. Error should be minimized to the best of our ability and then quantified to the extent possible. Uncertainty measures should be developed and applied routinely as part of spatial analyses and the communication of results. These tools are essential to capture the complexity of geospatial analyses and can in turn be used for improved application and knowledge building.