Introduction

Geographic Information Systems (GIS) offer a rich toolbox of methods and technologies that extend far beyond the mere production of maps (cartography). They enable the spatial contextualisation of health and disease data, e.g., in relation to physical, biological, environmental, economic, demographic, ethnic, social, and cultural factors.

The traditional classification of what GIS can do in cancer and cancer care comprises two broad types of GIS applications, namely geography of disease (cancer epidemiology and outcomes in populations) and geography of healthcare systems (cancer prevention, screening and treatment/care services delivery) [13].

The former (geography of disease) encompasses exploration, description, and modelling activities which can include the analysis of the spatiotemporal incidence of cancer and related environmental and other place-associated phenomena, the detection and analysis of disease clusters and patterns, causality analysis, and the generation of new hypotheses. The latter (geography of healthcare systems) deals with the planning, management, and delivery of suitable cancer prevention, screening, treatment, and care services, ensuring among other things adequate and equitable patient access. This paper provides an eight-year snapshot of geospatial cancer research in peer-reviewed literature (2002–2009), presenting the clinico-epidemiological and methodological findings and trends in the covered corpus.

Methodology

Paper search and selection strategy

We searched PubMed (http://pubmed.gov/) using the following query:

(“2002/01/01”[PDat] : “2009/12/31”[PDat]) AND (“Health Place”[Journal] OR “Int J Health Geogr”[Journal] OR “Geospat Health”[Journal]) AND (neoplasms[MeSH Terms] OR cancer[All Fields] OR cancers[All Fields] OR neoplasm[All Fields] OR neoplasms[All Fields] OR oncology[All Fields] OR oncology[All Fields] OR oncologists[All Fields] OR oncological[All Fields] OR tumor[All Fields] OR tumors[All Fields] OR tumour[All Fields] OR tumours[All Fields] OR lesion[All Fields] OR lesions[All Fields] OR carcinogen[All Fields] OR carcinogens[All Fields] OR carcinogenic[All Fields] OR chronic[All Fields])

We limited our search to PubMed-indexed papers published between 1 January 2002 and 31 December 2009 in three journals that are currently fully dedicated to geospatial research in health and healthcare: the International Journal of Health Geographics, Geospatial Health, and Health and Place.

Some relevant studies are also published in other journals not specialising in this area of interest, e.g., [47], but these were not included in our sample of geospatial cancer research.

Our query retrieved 128 papers in total, which we scanned manually to determine their relevance for inclusion. Some articles (n = 45) applied one or more techniques directly to an oncological dataset, either for a specific cancer or for an entire cancer dataset. Other articles (n = 41) focused on a method or technique in the context of a generic oncological reference (e.g., distance to cancer care centres), examined a number of conditions of which at least one was oncological, or used a non-oncological dataset but made clear and relevant generalisations or references to applications in cancer. We included both classes of articles in our study (n = 86).

We also came across articles (n = 10) that referenced oncology only vaguely and in passing, in their background/introduction or conclusions/discussion sections, and a considerable number of papers (n = 32) that contained no explicit reference to oncology at all. Among these latter two types of articles (n = 42), we identified seven papers that were relevant and of interest to our study, bringing the total number of articles included in our snapshot of geospatial cancer research to 93 papers [8–100].

Of all included studies, about 66% were conducted in the United States (this does not necessarily reflect the author’s or authors’ affiliation or geographic location).

Results

Clinico-epidemiological findings and implications

A number of cancer sites and types were covered in our selected corpus of 93 papers.

Table 1 presents a sample of the main clinico-epidemiological findings and implications from the surveyed papers, organised by cancer site. Additional relevant references are cited in Table 1 to support, contrast, discuss, or contextualise the findings from our reviewed corpus of papers.

Table 1 Summary of main clinico-epidemiological findings and implications in the surveyed papers

As shown in Table 1, several important clinico-epidemiological phenomena were elucidated. Geographical disparity and diversity in cancer mortality were examined for several cancer sites [8, 13, 35]. The association between cancer risk and exposure to environmental and lifestyle risk factors was examined using different techniques [11, 27, 30, 31, 33, 39, 48, 54]. Two studies examined racial differences in cancer stage at diagnosis and cancer mortality [40, 41]. Relationships between socioeconomic status and self-reported health, cancer incidence, cancer stage, and cancer mortality were assessed [15, 16, 19, 24, 28, 36, 42, 47, 51]. Various GIS methods were also used to explore cancer patterns and outcomes among immigrant populations [18, 34]. Several studies explored spatial and/or temporal clustering of incidence and mortality for some cancer sites [20, 21, 23, 25, 32, 45, 46, 52, 55].

Geospatial methodological findings

On data sources for geospatial cancer research

Boscoe et al. [19] provide an overview of some of the unique characteristics of spatial data, followed by an account of the major types and sources of data used in the spatial analysis of cancer, including data from cancer registries, population data, health surveys, environmental data, and remote sensing data. García-Pérez et al. [35] cover a good example of environmental data sources, namely the European Pollutant Emission Register (EPER), a public inventory of pollutant industries created by decision of the European Union. This has since been improved upon by the European Pollutant Release and Transfer Register (E-PRTR—http://ec.europa.eu/environment/air/pollutants/stationary/eper/index.htm). Such environmental data sources can prove useful for quantification of the effect of proximity to different industrial plants on cancer risk and all-cause mortality observed in nearby cities and towns. Viel et al. [96] present a good example of environmental carcinogen (dioxin) exposure modelling involving GIS.

Meteorological data can also prove very useful in cancer studies. For example, Kinoshita et al. [67] studied the geographical distribution of pancreatic cancer in relation to selected climatic factors in Japan.

On disaggregate data privacy issues

The relative rarity of cancer causes a sparse data problem for analysis, both for detecting clusters in data with high spatial variability and for communication of results without violating confidentiality and individuals’ privacy [13, 85, 101]. Confidentiality constraints often preclude the release of disaggregate data about individual cancer patients [102, 103]. Access to individually geocoded (disaggregate) data often involves lengthy and cumbersome procedures through review boards and committees for approval (and sometimes is not possible). Moreover, current data confidentiality-preserving solutions compatible with fine-level spatial analyses often lack flexibility or yield less than optimal results [102].

When data are spatially aggregated to large areas to preserve individuals’ privacy, researchers’ ability to detect disease clusters or to investigate suspected relationships between environmental exposures and disease events is affected in different ways. A further complication is that exposure assessment data are generally collected for different areal units than health and demographic data [102]. Kamel Boulos et al. [103] provide a comprehensive state-of-the-art review of data privacy issues (including privacy-preserving solutions and recommendations) in health GIS studies.
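As a minimal sketch of the aggregation idea (not of any specific solution reviewed in [103]), the Python snippet below counts geocoded cases per areal unit and withholds counts below a threshold, a common form of small-cell suppression. The area identifiers and the threshold of 5 are illustrative assumptions; actual suppression rules are set by data custodians.

```python
from collections import Counter


def aggregate_and_suppress(case_area_ids, threshold=5):
    """Count cases per areal unit and suppress small counts.

    case_area_ids: one hypothetical area identifier per case, e.g. the
    district into which a geocoded case falls.
    threshold: counts below this value are withheld (returned as None);
    the default of 5 is an illustrative assumption, not a universal rule.
    """
    counts = Counter(case_area_ids)
    return {area: (n if n >= threshold else None) for area, n in counts.items()}


# Hypothetical cases already assigned to three areas
cases = ["A1"] * 12 + ["B2"] * 3 + ["C3"] * 7
print(aggregate_and_suppress(cases))
```

Suppression protects individuals in sparsely populated units but discards exactly the fine-grained information that cluster detection relies on, which is the trade-off described above.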

On the use of social deprivation indices

Many countries have their own geography-based index or indices of deprivation that are regularly updated; for example, England has ‘The English Indices of Deprivation 2007’ (http://www.communities.gov.uk/documents/communities/pdf/733520.pdf). Examples of work (from our selected corpus of 93 papers) that used such indices include the three studies by Pearce and colleagues [82–84], as well as [88].

On geocoding and choice of appropriate geographic unit of analysis

Geocoding is the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses or postal codes. Once geographic coordinates are available, the features can be mapped and entered into GIS software and geostatistical tools for further processing and advanced visualisation. However, geocoding can be difficult. Its accuracy and completeness vary, and this can affect the findings of spatial epidemiological analyses and bias a study’s outcomes.
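The following Python sketch illustrates one simple, coarse form of geocoding, a lookup of postcode centroids, and reports the match rate as a measure of completeness. The record schema, postcodes, and coordinates are hypothetical; production geocoders typically match full street addresses and assign match-quality scores.

```python
def geocode_by_postcode(records, centroids):
    """Assign coordinates by looking up each record's postcode in a table
    of postcode centroids.

    records: list of dicts with a 'postcode' key (hypothetical schema).
    centroids: dict mapping a normalised postcode to (lat, lon).
    Returns the geocoded records and the match rate (completeness).
    """
    geocoded = []
    for rec in records:
        key = rec["postcode"].replace(" ", "").upper()  # normalise formatting
        if key in centroids:
            lat, lon = centroids[key]
            geocoded.append({**rec, "lat": lat, "lon": lon})
    match_rate = len(geocoded) / len(records) if records else 0.0
    return geocoded, match_rate


# Hypothetical centroid table and patient records
centroids = {"AB12CD": (51.50, -0.12), "EF34GH": (52.00, -1.00)}
records = [
    {"id": 1, "postcode": "ab1 2cd"},
    {"id": 2, "postcode": "EF3 4GH"},
    {"id": 3, "postcode": "ZZ9 9ZZ"},  # not in the lookup table
]
geocoded, rate = geocode_by_postcode(records, centroids)
print(len(geocoded), round(rate, 2))
```

Unmatched records are simply dropped here. If non-matches concentrate in particular places (e.g., rural addresses without standard postcodes), the geocoded subset is no longer representative, which is one route by which incomplete geocoding can bias results.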

On the importance of ‘place history’ in diseases with long latency such as cancer

Most analyses of spatial clustering of disease have been based on either residence at the time of diagnosis or current residence. An underlying assumption in these analyses is that residence can be used as a proxy for environmental exposure. However, exposures earlier in life, and not just those in the most recent period, may be significant [50]. Similarly, there is evidence that early-life socioeconomic exposures contribute to the risk of chronic diseases, including cancer, in adulthood, but extant studies investigating the impact of the neighbourhood social environment on health tend to characterise only the current social environment [87]. Most cancers develop over a period of 20–30 years and result from multiple exposures and the interplay of external factors with the individual’s genetic susceptibility. Because latencies differ by cancer type, and most likely by individual susceptibility, little guidance is available on which past exposure period should be considered [85]. In breast cancer, for example, there is accumulating evidence that early-life exposures may contribute to risk. In the study by Han et al. [50], examination of lifetime residential history provided additional information on geographic areas associated with higher risk.

Residential histories are increasingly available (despite the complexities involved in obtaining and geocoding historical addresses [87], including data access barriers related to individuals’ privacy), raising the possibility of routine surveillance that accounts for individual mobility and incorporates models of cancer latency and induction. Jacquez et al. [63, 64] developed new case-only clustering techniques that account for residential mobility, latency and induction periods, and relevant covariates. In a similar vein, Rose et al. [87] argue that in many cases it should be possible to characterise the earlier social environment with known levels of measurement error, and that such an approach should be considered in future studies.
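To make the role of residential history and latency concrete, the sketch below computes cumulative exposure over an aetiologically relevant window that ends a minimum latency before diagnosis and starts a maximum latency before it. The residence schema, exposure levels, and latency values are hypothetical assumptions; this is a simple time-weighting illustration, not the clustering methods of Jacquez et al. [63, 64].

```python
def windowed_exposure(history, diagnosis_year, min_latency, max_latency):
    """Cumulative exposure (level x years) within the window
    [diagnosis_year - max_latency, diagnosis_year - min_latency).

    history: list of (start_year, end_year, exposure_level) tuples, one per
    residence; end_year is exclusive. This schema is hypothetical.
    """
    win_start = diagnosis_year - max_latency
    win_end = diagnosis_year - min_latency
    total = 0.0
    for start, end, level in history:
        # years spent at this residence that fall inside the window
        overlap = min(end, win_end) - max(start, win_start)
        if overlap > 0:
            total += overlap * level
    return total


# Hypothetical residential history with a modelled exposure level per residence
history = [(1960, 1975, 2.0), (1975, 1990, 0.5), (1990, 2005, 1.0)]
print(windowed_exposure(history, diagnosis_year=2005, min_latency=10, max_latency=30))
```

With a window from 1975 to 1995, the example yields 0.5 × 15 + 1.0 × 5 = 12.5 exposure-years, whereas using only the residence at diagnosis would assign a single level of 1.0 and ignore the earlier, more highly exposed residence entirely only if it fell inside the window.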

Spatial and aspatial statistics in geospatial cancer research

A large number of studies in our selected corpus used spatial and aspatial statistics in their geospatial investigations of cancer, e.g., [44, 57, 70, 80]. Goovaerts et al. [41] raise a number of salient points about choosing an approach for a specific analysis.

Pearce et al. [82] used logistic regression in a study belonging to a larger project on the geography of lung cancer incidence across Scotland. The main aim of their study was to develop a technique for estimating smoking probability for different age/sex groups in small areas across the whole of Scotland using information on smoking behaviour from the Scottish Household Survey. Other studies from our selected corpus of geospatial cancer research papers that used logistic regression include [11, 22, 43, 54, 75].
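A minimal sketch of how a fitted individual-level logistic model can be applied to small areas: the predicted smoking probability for each age/sex group is multiplied by that group’s population in an area and summed. The coefficients, group labels, and populations below are hypothetical and are not estimates from Pearce et al. [82] or the Scottish Household Survey.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale; reference groups are 0.
INTERCEPT = -1.0
COEFS = {"16-34": 0.0, "35-54": 0.2, "55+": -0.5, "female": 0.0, "male": 0.3}


def smoking_probability(intercept, coefs, age_group, sex):
    """Inverse logit of the linear predictor for one age/sex group."""
    eta = intercept + coefs[age_group] + coefs[sex]
    return 1.0 / (1.0 + math.exp(-eta))


def expected_smokers(area_pop, intercept, coefs):
    """Sum over age/sex groups of population x predicted probability.

    area_pop: dict mapping (age_group, sex) to the population count of that
    group in one small area.
    """
    return sum(n * smoking_probability(intercept, coefs, age, sex)
               for (age, sex), n in area_pop.items())


# Hypothetical small-area population by age/sex group
area = {("16-34", "male"): 250, ("16-34", "female"): 240,
        ("35-54", "male"): 300, ("55+", "female"): 410}
print(round(expected_smokers(area, INTERCEPT, COEFS), 1))
```

Dividing the expected number of smokers by the area population gives a small-area smoking prevalence estimate that can then serve as a covariate, for example in an analysis of lung cancer incidence.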

Bove Jr et al. [20] used GIS in a geographically based exposure assessment to evaluate cancer risk associated with carcinogenic (or potentially carcinogenic) substances in the environment. They employed kriging, a geostatistical method that interpolates the value of a random field at an unobserved location from observations of its values at nearby locations. Generally speaking, the disadvantages of kriging include its assumptions of constant variance and normality. Goovaerts [39] provides a creative modification and application of geostatistical techniques that were largely developed for, and proven in, other fields such as population ecology. Goovaerts’ proposed approach accounts for geographic heterogeneity in the population at risk and incorporates this information into the stabilisation of the rates and into estimates of their uncertainty through P-field simulation. Capitalising on the abundant geostatistical literature devoted to the modelling of local and spatial uncertainty, plus the recent development of Poisson kriging, Goovaerts [39] presents a novel approach, and the corresponding computer code, to generate realisations of the spatial distribution of risk values, and applies it to age-adjusted breast and pancreatic cancer mortality rates recorded for white females in 295 US counties of the North-east (1970–1994).

Two important approaches exist for mapping disease rates, namely geostatistical techniques and Bayesian methods. While important theoretical work has been accomplished separately with each approach over the past decades, there has been a lack of synergy between them, and as a result they have grown into two different disciplines. The study by Goovaerts and Gebreab [42] represents the first work that compares these two approaches using a simulation study and cancer data sets. While any comparison study can be criticised (e.g., in this study the simulation method used could favour one of the methods being compared), the geostatistical approach was found to perform better than a full Bayesian approach.
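To illustrate the kriging step described for Bove Jr et al. [20], the sketch below implements ordinary kriging at a single target location with an exponential variogram. The variogram parameters are assumed rather than fitted to data, and this is plain ordinary kriging, not the Poisson kriging or P-field simulation used by Goovaerts [39].

```python
import numpy as np


def exponential_variogram(h, nugget, sill, range_):
    """gamma(h) = nugget + (sill - nugget) * (1 - exp(-3h / range)), gamma(0) = 0."""
    return np.where(h > 0, nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_)), 0.0)


def ordinary_kriging(xy, z, target, nugget=0.0, sill=1.0, range_=10.0):
    """Ordinary kriging estimate and variance at one target location.

    Solves [Gamma 1; 1^T 0] [w; mu] = [gamma_0; 1], so the weights w sum to 1.
    Variogram parameters are assumptions here; in practice they are fitted.
    """
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # pairwise distances
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exponential_variogram(d, nugget, sill, range_)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exponential_variogram(np.linalg.norm(xy - target, axis=1), nugget, sill, range_)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    estimate = w @ z
    variance = w @ b[:n] + mu  # ordinary kriging variance
    return estimate, variance


# Hypothetical observations at the corners of a unit square
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.0, 2.0, 3.0, 4.0])
# The four points are equidistant from the centre, so the weights are equal and est is about 2.5
est, var = ordinary_kriging(xy, z, np.array([0.5, 0.5]))
print(est, var)
```

Solving the system yields weights that sum to one (the unbiasedness constraint) plus a Lagrange multiplier, which together also give the kriging variance at the target. With no nugget, kriging at an observed location reproduces the observed value with zero variance.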

Bilancia and Fedespina [17] conducted a geographical analysis of lung cancer mortality data in the Province of Lecce, Italy, during the period 1992–2001, using standard statistical methods for disease mapping. By estimating the spatial pattern of excess risk in the area, their study offers useful insight into the geographical clustering of lung cancer in the Province of Lecce and contributes statistical evidence towards the generation of new hypotheses about the possible causes of the observed pattern.
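A common ingredient of standard disease-mapping methods is the standardised mortality ratio (SMR): observed deaths divided by the number expected if reference age-specific rates applied to the local population (indirect standardisation). The sketch below is a generic illustration with hypothetical rates, populations, and counts; it is not a reproduction of the Lecce analysis.

```python
def expected_counts(area_pops, reference_rates):
    """Indirect standardisation: E_i = sum over strata k of n_ik * r_k."""
    return {area: sum(n * reference_rates[k] for k, n in strata.items())
            for area, strata in area_pops.items()}


def smr(observed, expected):
    """Standardised mortality ratio O_i / E_i for each area."""
    return {area: observed[area] / expected[area] for area in observed}


# Hypothetical reference rates (deaths per person over the study period) by age stratum
reference_rates = {"<65": 0.001, "65+": 0.01}
area_pops = {"A": {"<65": 10000, "65+": 2000}, "B": {"<65": 5000, "65+": 500}}
observed = {"A": 36, "B": 8}

expected = expected_counts(area_pops, reference_rates)
print(smr(observed, expected))
```

Area A has 36 observed deaths against 10 + 20 = 30 expected, giving an SMR of 1.2 (a 20% excess), while area B gives 8 / 10 = 0.8. Mapping SMRs, usually after smoothing, is how a spatial pattern of excess risk is typically displayed.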

Goria et al. [45] conducted an ecological study in four French administrative departments and highlighted an excess risk of cancer morbidity among residents living around municipal solid waste incinerators. Their research showed the importance of advanced GIS tools and statistical techniques for better assessing weak associations between the risk of cancer and past environmental exposures. In most epidemiological studies, distance is still used as a proxy for exposure, which can lead to significant exposure misclassification. Additionally, geographical correlation studies usually do not account for non-linear relationships in the statistical analysis. In studies of weak associations, it is important to use advanced methods to better assess dose–response relationships with disease risk.

On the use of univariate vs. multi-method approaches to assess the geographic patterns of cancer

Ideally and whenever practically possible, investigators should adopt an exploratory, integrative, and multi-scalar approach when assessing geographic patterns of cancer, using a variety of techniques for geographic pattern detection at different spatial scales, as different methods will often identify different spatial patterns. Jacquez and Greiling [60, 61] demonstrated an approach employing a battery of techniques to elucidate geographic variation in cancer incidence in Long Island, New York, and to evaluate spatial association with air-borne toxics.

Complete spatial randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. However, CSR is not an appropriate null hypothesis for highly complex and organised systems, such as those encountered in the environmental and health sciences, in which underlying spatial pattern is present. Goovaerts and Jacquez [37] present a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. A study of the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties on Long Island, New York, USA (focusing on the same ZIP codes as in [60, 61]) was used to demonstrate this approach.
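For contrast with the neutral models of Goovaerts and Jacquez [37], the sketch below shows a conventional test of spatial pattern: global Moran’s I with a random-permutation (randomisation) null, in which values are shuffled across areas irrespective of any regional background. The chain of ten areas and its adjacency weights are hypothetical.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for values with a spatial weights matrix W (zero diagonal)."""
    z = values - values.mean()
    n = len(values)
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)


def permutation_test(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation under random relabelling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, weights)
    perms = np.array([morans_i(rng.permutation(values), weights) for _ in range(n_perm)])
    p_value = (1 + np.sum(perms >= observed)) / (n_perm + 1)
    return observed, p_value


# Hypothetical chain of 10 areas; neighbours share an edge (rook adjacency)
n = 10
W = np.eye(n, k=1) + np.eye(n, k=-1)
rates = np.arange(n, dtype=float)  # a smooth west-to-east trend
I, p = permutation_test(rates, W)
print(round(I, 3), p)
```

For the linear trend in the example, I = 7/9 (about 0.78) and the permutation p-value is small. A smooth regional gradient alone therefore "fails" a randomisation null; a neutral model would instead simulate spatially correlated surfaces around the regional background, so that only departures from that background are flagged.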

On measuring geographic access to cancer care and profiling service users

Measuring cancer care facility accessibility, e.g., using a metric such as travel time [14], becomes important when deciding on the siting of (i.e., choosing the best location for) new cancer screening and care centres, or on the expansion or closure of existing centres, based on the profiled needs of target communities (service users) [88]. The first step in cancer control is identifying where the cancer burden is elevated, which can suggest locations where interventions are most needed [86].

An increasing number of studies of spatial accessibility of people to a service use GIS estimates of car travel times to the health services under examination. Haynes et al. [52] compared GIS estimates based on average car speeds on different classes of road and cancer patients’ reports of the time taken to make actual car journeys to hospital and found that the two were closely related.
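A minimal sketch of the kind of GIS travel-time estimate compared by Haynes et al. [52]: each road segment’s time is its length divided by an assumed average speed for its road class, and the fastest route is found with Dijkstra’s shortest-path algorithm. The road classes, speeds, and network are hypothetical.

```python
import heapq

# Hypothetical average speeds (km/h) by road class; real studies calibrate these.
SPEEDS_KMH = {"motorway": 100.0, "a_road": 60.0, "minor": 40.0}


def travel_time_minutes(edges, source, target, speeds=SPEEDS_KMH):
    """Fastest car travel time (minutes) from source to target using Dijkstra's algorithm.

    edges: list of (node_u, node_v, length_km, road_class); roads assumed two-way.
    """
    graph = {}
    for u, v, length_km, road_class in edges:
        minutes = length_km / speeds[road_class] * 60.0
        graph.setdefault(u, []).append((v, minutes))
        graph.setdefault(v, []).append((u, minutes))
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == target:
            return t
        if t > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, minutes in graph.get(node, []):
            candidate = t + minutes
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(heap, (candidate, nxt))
    return float("inf")  # target unreachable


# Hypothetical network: (from, to, length in km, road class)
edges = [
    ("home", "junction", 4.0, "minor"),
    ("junction", "hospital", 50.0, "motorway"),
    ("home", "hospital", 30.0, "a_road"),
]
print(travel_time_minutes(edges, "home", "hospital"))
```

The direct A-road route takes 30 minutes, while the minor-road-plus-motorway route takes 6 + 30 = 36 minutes, so the function returns 30 minutes (up to floating-point rounding). Comparing such estimates with patients’ reported journey times is the kind of validation Haynes et al. performed.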

Mobley et al. [76] make an important contribution to health disparities research through the use of multi-level modelling to examine variables at the macro, community, and individual levels. Their research serves to demonstrate the complexity of historical, social, economic, and cultural factors that impact health and access. They considered and evaluated the interplay between individual, social, cultural, and physical environments, concepts that are generally overlooked in other studies or considered too difficult to collect. Their study demonstrates the importance of understanding place-specific differences in access to care (the example studied in [76] is mammography use across the USA), differences that would be “averaged out” in pooled analyses. Results such as those reported by Mobley et al. can potentially affect the way that access to health services is characterised and impact decisions about service provision to better meet the needs of the public by helping decision-makers optimise their service and resource planning and management decisions.

On the art of cartography (visual communication through maps) and the use of online interactive and animated maps

To communicate population-based cancer statistics, cancer researchers have a long tradition of presenting data in a spatial representation, or map. A comprehensive review by Bell et al. [13] focuses on designing maps to communicate effectively. The biggest challenge is to ensure that maps of health statistics inform without misinforming. For example, Gelman and Price [104] have shown that plotting rates could be very misleading when sample sizes vary by area. Advances in the sciences of cartography, statistics, and visualisation of spatial data are constantly expanding the toolkit available to mapmakers to meet this challenge and avoid such pitfalls.

Cancer mortality maps are used by public health officials to identify areas of excess and to guide surveillance and control activities. The quality of decision-making thus relies on an accurate quantification of risks from observed rates, which can be very unreliable when computed from sparsely populated geographical units or recorded for minority populations. Indeed, a major limitation of choropleth maps (thematic shaded or patterned maps) is the common, biased visual perception that larger rural and sparsely populated areas are of greater importance. Addressing this limitation, Goovaerts [38] presents a geostatistical methodology that accounts for spatially varying population sizes and spatial patterns in the processing of cancer mortality data. His approach, described in [40], allows continuous mapping of mortality risk while accounting locally for population density and areal data through the coherence constraint.
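Goovaerts’ geostatistical approach is beyond a short example, but the underlying problem of unreliable rates in small populations can be illustrated with a simpler, swapped-in technique: global empirical Bayes smoothing (the method-of-moments estimator commonly attributed to Marshall), which shrinks each raw rate towards the overall rate more strongly the smaller its population. The counts and populations are hypothetical.

```python
import numpy as np


def empirical_bayes_rates(cases, pops):
    """Global empirical Bayes (method-of-moments) smoothing of area rates."""
    cases = np.asarray(cases, dtype=float)
    pops = np.asarray(pops, dtype=float)
    raw = cases / pops
    m = cases.sum() / pops.sum()                     # overall (prior mean) rate
    s2 = np.sum(pops * (raw - m) ** 2) / pops.sum()  # population-weighted variance of raw rates
    phi = max(s2 - m / pops.mean(), 0.0)             # between-area variance, truncated at zero
    shrink = phi / (phi + m / pops)                  # weight on the raw rate: small pop, small weight
    return shrink * raw + (1.0 - shrink) * m


# Hypothetical counts: two sparsely populated areas with extreme raw rates
cases = [0, 3, 120, 95]
pops = [150, 400, 100000, 80000]
print(np.asarray(cases) / np.asarray(pops))
print(empirical_bayes_rates(cases, pops))
```

The two small areas are pulled strongly towards the overall rate, while the two large areas keep rates close to their raw values, countering the kind of small-population variability that Gelman and Price [104] warn about when raw rates are mapped.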

Interactive Web mapping technologies are also opening up access to, and participation in, GIS and geospatial databases for much wider audiences; users only need to have a standard Web browser (some solutions also require a downloadable browser plugin) and an Internet connection [105]. Interested readers are referred to the following specialised tutorial series on ‘Web GIS in practice’, published by the International Journal of Health Geographics: http://www.ij-healthgeographics.com/series/1476-072X-Gis.

Theseira [93] describes an early (2002) example of Internet GIS for the West Midlands Region in England (the Multi-Agency Internet Geographic Information Service (MAIGIS) project) that involved cancer data. She highlights the importance of data sharing between organisations (see also [2]) and also mentions cancer data confidentiality issues in her paper. Yi et al. [99] describe a Web GIS example integrating open source technologies and public health data to create EpiVue, a Web-based cancer information system that is accessible to a wide audience through the Internet (https://epivue.cphi.washington.edu/epivue/). Bhowmick et al. [16] stress the importance of user involvement in map design and the usability aspects of maps based on their experience in developing the online Pennsylvania Cancer Atlas (http://www.geovista.psu.edu/grants/CDC/).

Vieira et al. [95] used animated maps to show spatiotemporal changes. A series of static maps was used to create a movie showing how breast cancer risk in upper Cape Cod, Massachusetts, varied as historical residences changed over space and time. Individual maps were saved as image files and used to create a storyboard in Windows Movie Maker (now known as Windows Live Movie Maker http://download.live.com/moviemaker) and generate a movie (available at http://www.biomedcentral.com/content/supplementary/1476-072X-7-46-S1.wmv) in which each map plays for 0.5 s before transitioning to the next map. Animated maps are also a feature of the Cancer Atlas Viewer, a free Windows software application for exploration of the US National Atlas of Cancer Mortality (http://www.biomedware.com/software/about_Atlas.html).

Popular and less popular software tools

ESRI ArcGIS

ArcGIS (http://www.esri.com/software/arcgis/index.html) is a popular, integrated suite of GIS and mapping software and components from the Environmental Systems Research Institute (ESRI), a GIS software development and services company based in Redlands, California, USA. The US National Cancer Institute is among the many third-party entities providing plugins that extend the core functionality of ArcGIS (http://gis.cancer.gov/nci/geovisualization.html#extensions).

Paz et al. [81] used ArcGIS software to geocode the location of the patients’ homes. They also used the ‘spatial join’ tool of the ArcGIS 9 software, which merges geographically referenced information from different geographic layers based on the spatial location of individual features in these layers. Mandal et al. [72] used Hot Spot Analysis (Getis-Ord Gi* statistic) in ArcGIS 9.3 for spatial clustering analysis of breast and prostate cancers in the continental United States between 2000 and 2005 (see http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Hot_Spot_Analysis_(Getis-Ord_Gi*)_(Spatial_Statistics)). McEntee and Ogneva-Himmelberger [73] also report using ArcGIS and the Gi* statistic.
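The Getis-Ord Gi* statistic used in those studies can be written as a z-score comparing the weighted sum of values around each area (including the area itself) with what would be expected from the global mean. The sketch below follows the standard formulation of the statistic rather than ArcGIS code; the six-area chain and its weights are hypothetical.

```python
import numpy as np


def getis_ord_gi_star(values, weights):
    """Gi* z-scores; weights[i, j] > 0 for neighbours of i, including j == i."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = len(x)
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)  # global standard deviation
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Hypothetical chain of 6 areas: low incidence in the west, high in the east
values = [1, 1, 1, 10, 10, 10]
n = len(values)
W = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # self plus adjacent areas
print(np.round(getis_ord_gi_star(values, W), 2))
```

Large positive z-scores indicate clusters of high values (hot spots), and large negative z-scores indicate clusters of low values (cold spots).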

Scott et al. (2002) [89] illustrate the problems of data shortage in a developing country and report using ESRI Atlas GIS v4, a now outdated, basic Windows desktop mapping application (http://rpmconsulting.com/atlas/AtlasGIS4_0.pdf). ArcMap (ArcView 9.x) in the latest ArcGIS platform includes similar functionality and much more. Their paper provides a useful demonstration of the potential of GIS in the creation of a cancer information system in the context of a developing country (South Africa). Garb et al. [34], on the other hand, describe a novel and unconventional use of GIS (specifically, ArcGIS and its Web mapping component ArcIMS, http://www.esri.com/software/arcgis/arcims/index.html) to map lesions located in body spaces rather than in geographical spaces on the surface of the Earth (the prefix ‘geo’ means Earth). Garb et al. used GIS to examine the findings of transanal endoscopic microsurgery (TEM), a minimally invasive procedure to locate and remove both benign and cancerous lesions of the rectum. Maps of rectal topology were developed in two and three dimensions, highlighting anatomical features of the rectum and the location of lesions found on TEM. Spatial analysis demonstrated a significant relationship between the anatomical location of the lesion and procedural failure (clinical outcome). This study demonstrates the feasibility of rendering anatomical locations and clinical events in a GIS and its value in clinical research focusing on individual patients.

Other studies using ArcGIS include [25, 88, 95].

SaTScan

SaTScan™ (http://www.satscan.org/) is free software that analyses spatial, temporal, and space–time data using the spatial, temporal, or space–time scan statistics. It is designed for any of the following interrelated purposes: (i) to perform geographical surveillance of disease, detecting spatial or space–time disease clusters (e.g., of cancer cases) and assessing whether they are statistically significant; (ii) to test whether a disease is randomly distributed over space, over time, or over space and time; (iii) to evaluate the statistical significance of disease cluster alarms; and (iv) to perform repeated time-periodic disease surveillance for early detection of disease outbreaks.
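The core of the purely spatial Poisson scan statistic can be sketched in a few lines: circular windows centred on each area grow outward up to a maximum share of the population at risk, and the window with the highest log-likelihood ratio is the most likely cluster. This is a simplified illustration with hypothetical data, limited to high-rate clusters, and the 50% population limit is an assumed default; it omits the Monte Carlo replications that SaTScan uses to assess statistical significance.

```python
import math


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for a window with c observed and e expected
    cases out of `total` cases (high-rate clusters only)."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total - c > 0:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def most_likely_cluster(coords, cases, pops, max_pop_frac=0.5):
    """Return (llr, member indices) of the best circular window."""
    total_cases = sum(cases)
    total_pop = sum(pops)
    best = (0.0, None)
    for centre in coords:
        # circular windows: add areas in order of distance from this centre
        order = sorted(range(len(coords)), key=lambda j: math.dist(centre, coords[j]))
        c, p, window = 0, 0, []
        for j in order:
            if p + pops[j] > max_pop_frac * total_pop:
                break
            c += cases[j]
            p += pops[j]
            window.append(j)
            e = total_cases * p / total_pop  # expected cases under the null
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, list(window))
    return best


# Hypothetical areas along a line, equal populations, excess cases at the west end
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
cases = [10, 9, 1, 1, 1, 1]
pops = [100] * 6
llr, window = most_likely_cluster(coords, cases, pops)
print(round(llr, 2), window)
```

In SaTScan, the same maximisation is repeated on many data sets simulated under the null hypothesis, and the rank of the observed maximum log-likelihood ratio among them gives the p-value of the most likely cluster.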

A number of statistical methods for evaluating global clustering and local cluster patterns are available. Hinrichsen et al. [56] examined statistical tests for evaluating spatial clustering of disease characteristics, using prostate cancer data from the Maryland Cancer Registry, USA (1992–1997). Jackson et al. [59] compared a number of such tests using a dataset of 1950–1969 lung cancer mortality in the USA and employed SaTScan in their study. Gregorio et al. [48] used SaTScan and showed how the results of a spatial analysis can differ when the study geography (study area size) is altered. McLaughlin and Boscoe [74] examined the effects of randomisation methods on statistical inference in disease cluster detection using cancer datasets, with findings and recommendations for unbiased statistical inference that are applicable to popular software tools such as SaTScan. Sheehan et al. [90] used SaTScan to determine whether observed geographic variations in breast cancer incidence in Massachusetts (1988–1997) were random or statistically significant, whether statistically significant excesses were temporary or persistent over time, and whether they could be explained by covariates such as socioeconomic status (SES) or urban/rural status. Other studies in our corpus of 93 selected papers that used SaTScan include [25, 27, 29, 32, 33, 47, 54, 55, 68, 91, 94, 98, 100].

Goovaerts [43] improves on SaTScan by providing an approach that has the potential of detecting geographic clusters that are not otherwise detected by the conventional spatial scan statistic. Chen et al. [24] address SaTScan’s lack of cartographic support for understanding clusters in their geographic context by providing an interactive visual interface to support the interpretation of SaTScan results. The geovisual analytics approach they describe in [24] facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and visualisation methods and tools that support selection of SaTScan parameters. Boscoe et al. [18] propose another technique for the display of results of Kulldorff’s spatial scan statistic and related cluster detection methods as a map with a nested or contoured appearance that provides a greater degree of informational content. They demonstrated their technique using prostate cancer mortality data in counties within the contiguous United States during the period 1970–1994.

BUGS and WinBUGS

BUGS (http://www.mrc-bsu.cam.ac.uk/bugs/) is an acronym for Bayesian inference Using Gibbs Sampling; it is flexible software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods. Chen et al. [25] used BUGS, among other methods and software tools, to examine the association between liver cancer and immigration in Ontario, Canada. WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml) is a version of BUGS offered as a stand-alone program for Microsoft Windows with a graphical user interface. Papers that used WinBUGS include [23, 30, 31, 66, 92].
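To show what Gibbs sampling means in practice, the sketch below samples area-level relative risks from a Poisson-gamma model in which every full conditional distribution is a gamma distribution, so each sweep draws directly from them. The model, hyperparameters, and data are illustrative assumptions; BUGS and WinBUGS let users specify such models declaratively rather than coding the sampler by hand.

```python
import numpy as np


def gibbs_poisson_gamma(observed, expected, a=2.0, c=1.0, d=1.0,
                        n_iter=5000, burn_in=1000, seed=1):
    """Posterior mean relative risks under the hypothetical model
    O_i ~ Poisson(E_i * theta_i), theta_i ~ Gamma(shape=a, rate=b),
    b ~ Gamma(shape=c, rate=d), with the shape a held fixed."""
    rng = np.random.default_rng(seed)
    O = np.asarray(observed, dtype=float)
    E = np.asarray(expected, dtype=float)
    n = len(O)
    b = 1.0  # initial value for the rate hyperparameter
    total = np.zeros(n)
    kept = 0
    for it in range(n_iter):
        # theta_i | b, data ~ Gamma(a + O_i, rate = b + E_i); NumPy takes scale = 1 / rate
        theta = rng.gamma(a + O, 1.0 / (b + E))
        # b | theta ~ Gamma(c + n * a, rate = d + sum(theta))
        b = rng.gamma(c + n * a, 1.0 / (d + theta.sum()))
        if it >= burn_in:
            total += theta
            kept += 1
    return total / kept


# Hypothetical observed and expected counts for three areas
post = gibbs_poisson_gamma(observed=[5, 20, 0], expected=[5.0, 10.0, 2.0])
print(np.round(post, 2))
```

The posterior means are pulled towards a common level, most strongly for areas with small expected counts, which is the same stabilising behaviour that Bayesian disease-mapping models fitted in WinBUGS provide.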

Miscellaneous software tools

Nolasco et al. [77] used the R software (http://www.r-project.org/) to look at associations between mortality and socioeconomic inequalities in Spain. Wheeler [98] also used R, in addition to ClusterSeer (http://www.terraseer.com/products_clusterseer.php) and SaTScan, to compare cluster detection methods for childhood leukaemia incidence in Ohio, USA. Another paper using ClusterSeer is the study by Jacquez and Greiling [60], looking at breast, lung, and colorectal cancer in New York, USA. Vinnakota and Lam [97] employed a spatial data mining approach using Classification Based Association (CBA) software (http://www.comp.nus.edu.sg/~dm2/index.html) to discover associations between selected socioeconomic variables and the four leading causes of cancer mortality in the United States between 1988 and 1992 (colorectal, lung, breast, and prostate cancers). GIS technology was used to integrate these data, which were defined at different spatial resolutions, and to visualise and analyse the data mining results.

Basara and Yuan [12] investigated the relationship between environmental conditions and health outcomes in communities using the SOM-GIS (self-organising maps-GIS) method, implemented with Viscovery SOMine (http://www.viscovery.net/somine/). The self-organising map (SOM) algorithm has been applied in medical research to address the need for non-linear analytical methods to study the multifaceted aetiology of certain diseases. Kohonen developed the algorithm to search for patterns within large, multivariate, numerical datasets.
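A minimal sketch of Kohonen’s algorithm, written in plain NumPy rather than with Viscovery SOMine: for each randomly chosen input vector, the best-matching node and its grid neighbours are moved towards the input, with a learning rate and neighbourhood radius that shrink over time. The grid size, schedules, and toy data are hypothetical choices.

```python
import numpy as np


def train_som(data, grid_rows=3, grid_cols=3, n_iter=1000, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rectangular self-organising map; returns node weight vectors."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((grid_rows, grid_cols, data.shape[1]))
    # (row, col) grid coordinates of every node, used by the neighbourhood function
    grid = np.stack(np.meshgrid(np.arange(grid_rows), np.arange(grid_cols),
                                indexing="ij"), axis=-1)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.01  # shrinking neighbourhood radius
        x = data[rng.integers(len(data))]
        bmu = best_matching_unit(weights, x)
        grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
        weights += lr * h[..., None] * (x - weights)
    return weights


def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - np.asarray(x, dtype=float), axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)


# Hypothetical toy data: two well-separated groups of areas in a 2-variable space
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0.1, 0.02, size=(20, 2)),
                  rng.normal(0.9, 0.02, size=(20, 2))])
som = train_som(data)
print(best_matching_unit(som, [0.1, 0.1]), best_matching_unit(som, [0.9, 0.9]))
```

After training, each area’s environmental and health variables can be assigned to a best-matching node, and those node assignments can be joined back to the areal units in a GIS and mapped, which is one way of coupling SOM output with GIS.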

The ability to represent, quantify, and model individual exposure through time is a critical component of risk estimation, particularly for diseases with long latency periods such as cancer. In response to this need, a Space Time Intelligence System (STIS; http://www.terraseer.com/products_stis.php) has been developed by Avruskin et al. [10] to visualise and analyse objects simultaneously through space and time.

Discussion

There has long been a recognition that place matters in health, from identification of clusters of yellow fever and cholera in the nineteenth century to modern day analyses of regional and neighbourhood effects on cancer patterns using georeferenced cancer data [85]. The use of spatially referenced data in cancer studies is gaining in prominence, fuelled by the development and availability of spatial analytic and geovisualisation tools and the expansion of the linkages between geography and health [19]. The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in conducting effective cancer surveillance [59] and in developing and monitoring cancer screening programmes. Furthermore, examining geographic variation in cancer patient survival can help identify important prognostic factors that are linked by geography and generate hypotheses for further investigation into survival disparities [55]. GIS can also help in studying the geography of healthcare systems in relation to cancer and in making important service decisions to maximise resource efficiencies and ensure appropriate utilisation of services [76].

While many lessons have been learnt from the spatial analysis of cancer, several caveats apply to many, if not all, such analyses. Caveats such as the ‘ecological fallacy’ can substantially weaken a spatial analysis and, if not accounted for, can lead to erroneous conclusions [2, 62]. (The ecological fallacy is a flawed interpretation of study results, whereby inferences about the characteristics of specific individuals are based solely on aggregate statistics collected for the group and/or region to which those individuals belong.)
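A toy example with invented numbers shows how the fallacy can arise. Within each of three hypothetical regions, higher individual exposure goes with a lower outcome score. Yet the region-level averages line up perfectly in the opposite direction. The region names, values and helper function are illustrative assumptions only.

```python
import math
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

# Hypothetical individual-level data: (exposure, outcome score) per person, by region.
regions = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Ecological (aggregate) analysis: one mean exposure and one mean outcome per region.
region_x = [mean(x for x, _ in people) for people in regions.values()]
region_y = [mean(y for _, y in people) for people in regions.values()]
print("region-level r:", pearson(region_x, region_y))

# Individual-level analysis within each region.
for name, people in regions.items():
    xs, ys = zip(*people)
    print(f"within region {name} r:", pearson(xs, ys))
```

Here the region-level correlation is 1.0 while every within-region correlation is -1.0. Inferring individual-level risk from the aggregate figures would therefore reverse the relationship present in these made-up data.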

Place history is of prime importance in geospatial health research, particularly for diseases with long “incubation” or latency periods such as cancer. In such diseases, the effect of risk factors in a patient’s environment might take many years to become manifest. Many geospatial studies are limited by a lack of disaggregate data on individuals and their place (residential and work) histories. As a result, they may wrongly assume correlations and associations between a disease (or its patients) and features or factors at the place of residence and/or work at the time study data are collected. In doing so, they miss the real link between the aetiopathogenesis of the disease and previous residential and work locations.
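A simple time-weighted cumulative exposure calculation over a residential history illustrates what ignoring place history can cost. The sketch below rests on several assumptions: a hypothetical per-address exposure level, a fixed 10-year latency lag (exposure in the last 10 years before diagnosis is excluded) and invented dates. It is not the method used by STIS or by any study cited here.

```python
from dataclasses import dataclass

@dataclass
class Residence:
    start: float  # year moved in
    end: float    # year moved out (diagnosis year for the address at diagnosis)
    level: float  # hypothetical exposure level at this address, per year of residence

def cumulative_exposure(history, window_start, window_end):
    """Sum level x years of residence falling inside [window_start, window_end]."""
    total = 0.0
    for r in history:
        overlap = min(r.end, window_end) - max(r.start, window_start)
        if overlap > 0:
            total += r.level * overlap
    return total

history = [
    Residence(1960, 1985, 8.0),  # early life near a hypothetical emission source
    Residence(1985, 2000, 2.0),
    Residence(2000, 2008, 0.5),  # address at diagnosis in 2008
]
diagnosis, lag = 2008, 10  # assumed 10-year latency lag

lagged_lifetime = cumulative_exposure(history, 1960, diagnosis - lag)
# Naive approach: assign the address-at-diagnosis level to the whole period.
current_only = history[-1].level * (diagnosis - lag - 1960)
print(lagged_lifetime, current_only)
```

With these numbers the lagged lifetime exposure is 226.0 units. Assigning the level at the address at diagnosis to the whole period gives only 19.0. That address also contributes nothing at all to the lagged exposure window.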

John Snow’s illustration of his theory of the origin of cholera in London, via a map of case residences, was possible because of the large number of cases in a small geographic area with a single, precisely located exposure [1, 85]. Detecting clusters of a rare disease such as cancer requires sophisticated statistical tools that filter out the potentially confounding effects of age, spatially varying population density, and mobility/place histories. Appropriate application of these statistical methods is therefore essential [62, 85].
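Age is a classic confounder here because cancer rates rise steeply with age. One standard way to filter it out is indirect standardisation. First compute the number of cases an area would be expected to have if reference age-specific rates applied to its own age structure. Then report the standardised incidence (or mortality) ratio (SMR), observed/expected. The sketch below uses invented reference rates, age bands and populations.

```python
# Hypothetical reference (e.g., national) rates per 100,000 person-years by age band.
REFERENCE_RATES = {"0-39": 10, "40-64": 100, "65+": 1000}

def expected_cases(population_by_age, reference_rates=REFERENCE_RATES):
    """Cases expected if the reference age-specific rates applied to this population."""
    return sum(population_by_age[band] * rate
               for band, rate in reference_rates.items()) / 100_000

def smr(observed, population_by_age):
    """Standardised ratio: observed cases divided by age-adjusted expected cases."""
    return observed / expected_cases(population_by_age)

def crude_rate_per_1000(observed, population_by_age):
    """Unadjusted cases per 1,000 population, ignoring age structure."""
    return 1000 * observed / sum(population_by_age.values())

older_area = {"0-39": 1000, "40-64": 2000, "65+": 3000}
younger_area = {"0-39": 4000, "40-64": 1500, "65+": 500}
for name, pop, obs in [("older", older_area, 32), ("younger", younger_area, 14)]:
    print(name, round(crude_rate_per_1000(obs, pop), 2), round(smr(obs, pop), 2))
```

The older area has more than twice the crude rate of the younger one (about 5.33 versus 2.33 per 1,000). Its SMR is nonetheless close to 1. The younger area's SMR is about 2.03, roughly twice the number of cases expected for its age structure.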

There is an increasing need for new evidence-based methods and tools that support knowledge construction from complex geospatial public health datasets, in support of everyday practice [3]. However, most public health and cancer practitioners still find current methods and tools difficult to select and use. The results are also frequently difficult for them to interpret and prone to errors and misinterpretation [2, 3].

To be successful, the design of geospatial methods and tools must be grounded in a solid understanding of the work practices within the domain of use. Bhowmick et al. [15] developed that understanding by adopting a user-centred approach to toolset design. They investigated the work of cancer researchers and used the results as inputs to design guidelines for new geovisualisation and spatial analysis tools. They conducted key informant interviews focused on the use, or potential use, of geographic information, methods, and tools, and complemented these with a systematic analysis of published, peer-reviewed articles on geospatial cancer research. The results were used for three purposes: to characterise the typical process of analysis, to identify fundamental differences between intensive and infrequent users of geospatial methods, and to outline key stages of analysis and the tasks within those stages that methods and tools must support. Approaches and findings such as those described by Kamel Boulos [3] and Bhowmick et al. (Epub in 2007) [15] should guide future design and implementation decisions for visual and analytic tools that support cancer prevention and control research and practice.

Another problem is that the human health and cancer data required for analysis are typically scattered across many distributed sources. These data are often collected by different groups and agencies, which creates data-sharing difficulties and the need for data-sharing agreements between organisations and data custodians [2, 85]. Accumulating and validating the required data from these multiple sources might take longer than the analysis itself [85].

NAACCR and NCI (United States)

As noted above, the full support of cancer dataset custodians (the organisations that collect and oversee such datasets) is also essential to fully realising the vision described in [3, 15]. In the United States, the North American Association of Central Cancer Registries, Inc. (NAACCR) established a GIS Ad Hoc Committee to address the appropriate uses of GIS in cancer registry practice (see http://www.naaccr.org/committees/gis). The Committee has published several highly recommended practical handbooks. These include ‘Using Geographic Information Systems Technology in the Collection, Analysis, and Presentation of Cancer Registry Data: A Handbook of Best Practices’ (http://www.naaccr.org/filesystem/pdf/GIS%20handbook%206-3-03.pdf), ‘A Geocoding Best Practices Guide’ (http://www.naaccr.org/filesystem/pdf/Geocoding_Best_Practices.pdf), and a ‘Review of Cluster Analysis Software’ (http://www.naaccr.org/filesystem/pdf/Final%20Report%20Cluster%20Software%202004-09-27%20rev.pdf).

The US National Cancer Institute (NCI) has been equally active in promoting GIS use in cancer research and practice (see http://gis.cancer.gov/). NCI’s GIS-related activities span several areas. These are GIS database development (e.g., GIS for Breast Cancer Studies on Long Island, LI GIS: http://li-gis.cancer.gov/), spatial data analysis (covering environmental exposure assessment, statistical modelling, outlier detection for cancer surveillance, and cluster identification using tools such as SaTScan), geovisualisation tools development (e.g., http://gis.cancer.gov/nci/geovisualization.html#extensions), and communication of georeferenced statistics (State Cancer Profiles http://statecancerprofiles.cancer.gov and Cancer Mortality Atlas http://www3.cancer.gov/atlasplus/). Moreover, NCI offers funding for GIS grants (http://cancercontrol.cancer.gov/grants/portfolio.asp?codename=spatial/gis%20models). In 2002 it also sponsored a meeting of geospatial practitioners and statisticians to develop a series of articles describing the current best practices in the analysis of spatial data [86]. These articles were published in the International Journal of Health Geographics [13, 19, 62, 85, 86].
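SaTScan implements Kulldorff's spatial scan statistic, whose core idea can be sketched for the Poisson model. Circular windows centred on each area are grown to include neighbouring areas, up to a maximum share of the total population (50% here, an assumption). For a window with c observed and E expected cases, out of C cases in total, the log-likelihood ratio is c ln(c/E) + (C - c) ln((C - c)/(C - E)) when c > E. The window that maximises this ratio is the most likely cluster. SaTScan then assesses its significance by Monte Carlo replication, which this sketch omits. The area coordinates, case counts and populations are invented, and expected counts here assume uniform risk with no age adjustment.

```python
import math

def poisson_llr(c, e, total_cases):
    """Kulldorff's Poisson log-likelihood ratio for a window with c observed, e expected."""
    if c <= e:  # only windows with elevated risk are of interest here
        return 0.0
    inside = c * math.log(c / e)
    out_c, out_e = total_cases - c, total_cases - e
    outside = out_c * math.log(out_c / out_e) if out_c > 0 else 0.0
    return inside + outside

def most_likely_cluster(areas, max_pop_fraction=0.5):
    """areas: list of (x, y, cases, population); scan circles centred on each area."""
    total_c = sum(a[2] for a in areas)
    total_p = sum(a[3] for a in areas)
    best = (0.0, [])
    for cx, cy, _, _ in areas:
        # Grow the circle by adding areas in order of distance from its centre.
        ordered = sorted(areas, key=lambda a: (a[0] - cx) ** 2 + (a[1] - cy) ** 2)
        c = p = 0
        members = []
        for a in ordered:
            if p + a[3] > max_pop_fraction * total_p:
                break
            c += a[2]
            p += a[3]
            members.append(a)
            e = total_c * p / total_p  # expected cases under uniform risk
            llr = poisson_llr(c, e, total_c)
            if llr > best[0]:
                best = (llr, list(members))
    return best

# Hypothetical areas along a line: (x, y, cases, population).
areas = [(0, 0, 20, 1000), (1, 0, 18, 1000), (2, 0, 5, 1000),
         (3, 0, 4, 1000), (4, 0, 3, 1000)]
llr, members = most_likely_cluster(areas)
print(round(llr, 2), [m[:2] for m in members])
```

In these made-up data, the two westernmost areas (38 cases against 20 expected) form the most likely cluster, with a log-likelihood ratio of about 13.39.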

Conclusions

Understanding the relationship between location and cancer or cancer care services can play a key role in disease control and prevention, better service planning, and appropriate resource utilisation. Many barriers still hinder the wide-scale adoption of GIS and related technologies in the everyday practice of the health sector. Some are technical (e.g., ease of use and interpretation while avoiding misinformation). Others are organisational (e.g., data collection, access and sharing, and other data issues related to individuals’ privacy and mobility/residential histories). Nevertheless, the situation is gradually improving (e.g., through NAACCR and NCI activities).

The authors would very much like to see follow-on snapshot papers on specific cancer types (e.g., geospatial research in breast cancer). Such papers could also incorporate data from papers published outside our three chosen journals (e.g., [47]) and cover additional years (e.g., 1995–2001, and 2010, e.g., [146]).