1 Introduction

Recent technological advances combined with a more geographically aware population are introducing new opportunities and challenges for data collection for public health purposes. Volunteered geographic information (VGI) generally refers to data volunteered by individuals, including a geographic component, that can be later disseminated using various tools (Goodchild 2007). VGI data itself can also be thought of as any data provided by an individual that includes some geographic context and which allows for the aggregation and dissemination of data based on this information. Even though Goodchild’s is the prevailing definition throughout this book and chapter, the “volunteering” of geographic information did not suddenly start in the last few years; rather, our ability to rapidly collect, conceptualize, and use this information has grown significantly. This is important because some of the traditional methods by which individuals share geographic information can transfer knowledge about the individual quickly and in ways that were not possible or were much more difficult only a few years ago.

Today, new tools are providing ways for individuals, communities, corporations, and governments to harness volunteered geographic information without necessarily needing a fully functional geographic information system (GIS). Terms like GPS, GIS, and location-based services are more commonplace in the media and used much more frequently by the general public. The popularization of tools like Google’s My Maps, Google Earth, Microsoft’s Virtual Earth (now Bing Maps), foursquare, OpenStreetMap, and others has made spatial processes and data more accessible. Google’s My Maps provides free tools for the creation of vector data. Users can create features on an existing base map by adding attributed locations as points (an address), lines (a popular jogging path), or polygons (favorite neighborhood to hang out in; location of a popular street fair). Users can also overlay and georectify digital images in products like Google Earth. New cloud-based services like those available through ArcGIS.com are even providing free access to some of the functionality found in common GIS desktop software. Cloud-based GIS services are providing more opportunities for collaboration among GIS users, and relatively little instruction is necessary in order to leverage these tools. GPS units, once commonplace, are now being replaced by similar functionality in smartphones. As a result, many spatial processes that once required expensive GIS software, hardware, GIS datasets, and technical expertise are now accessible to a broader audience. Some of these advances, and the accompanying increase in interest, have been achieved in part through better usability and improved user interface design. These advances have all contributed to the removal of some traditional barriers to entry in GIS. Some believe this has even contributed towards the creation of two new disciplines – neogeography and neocartography.
Whether or not these tools and techniques lend themselves to entirely new disciplines remains to be seen. It is clear that the methods and access through which geographic data is collected, analyzed, and disseminated will never be the same. The use of VGI promises to do what KML and its subsequent adoption as an Open Geospatial Consortium standard (OGC 2011) did for GIS data.

Tools like InSTEDD’s GeoChat allow for groups or individuals to share, comment on, and collect location-based information, later aggregating it to create datasets capable of display in a GIS. GeoChat is designed to work essentially as a group chat; the program was originally designed to allow groups of individuals to communicate during an emergency and report information (InSTEDD 2011). The application provides an easy way to catalog location-based, temporal information using just a cell phone equipped with SMS texting capabilities. Products like OpenDataKit provide an open-source set of tools that allow users to create and disseminate surveys using smartphones (ODK 2011). Using such systems, users can easily create their own forms for data collection activities in the field. Because the cost of development of the form and systems is easily scalable, they can be adapted by small or large groups.

For the public health researcher, these tools may at first glance seem trivial, but they are providing new opportunities to gather information from individuals and communities using cheap and efficient methods. Anyone who can send a simple text message on their cellular phone can inform a study or provide geographically referenced data and thereby potentially create information that is not available in any standard dataset. The short message service (SMS) protocol allows for very short bursts of data to be shared with not only individuals but groups of individuals, parsed and aggregated to create bigger datasets and viewed on a map when geocodable information is included. This approach towards data collection and aggregation can provide much needed quantitative and qualitative data, capturing a variety of variables that would enable increased understanding of physical and/or built environments. For urban areas, these tools may aid in better understanding an individual’s perception of safety, identify or document environmental health concerns, better understand what a neighborhood means to the individual, collect data during health events or emergencies, or better understand how the food environment is viewed by the individual across various socioeconomic status (SES) measures. These new, rich data sources would provide a tapestry of information that better reflects the realities of those inhibitors that presently block or discourage access to a variety of health services, identify health behaviors, provide for a better understanding of the built environment’s impact on health, and will help to identify outreach strategies at a neighborhood level.
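As a minimal sketch of how short text messages might be parsed and aggregated into a mappable dataset, the Python below assumes a hypothetical message layout of the form `lat,lon;key=value;key=value`. This layout, the function names, and the attribute keys are invented for illustration; they are not the syntax of GeoChat or of any other tool named in this chapter.

```python
from collections import Counter


def parse_report(message):
    """Parse one hypothetical SMS report; return None if it is malformed."""
    parts = [p.strip() for p in message.split(";") if p.strip()]
    if not parts:
        return None
    try:
        # The first field is assumed to hold "latitude,longitude".
        lat_text, lon_text = parts[0].split(",")
        lat, lon = float(lat_text), float(lon_text)
    except ValueError:
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    attributes = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # keep only well-formed key=value pairs
            attributes[key.strip().lower()] = value.strip().lower()
    return {"lat": lat, "lon": lon, "attributes": attributes}


def aggregate(messages, key):
    """Count valid reports by the value of one attribute."""
    reports = [r for r in map(parse_report, messages) if r is not None]
    return Counter(r["attributes"].get(key, "unknown") for r in reports)
```

Messages that cannot be parsed or that carry impossible coordinates are dropped rather than guessed at, which keeps the aggregated counts conservative.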

2 The Potential for VGI in Public Health Practice

In 2011, the New York City Department of Health and Mental Hygiene, with student volunteer support from CUNY’s School of Public Health at Hunter College, undertook a study to evaluate the concentrations of alcohol advertisements throughout New York City (NYC DOHMH 2009). The study was partially based on earlier results gathered through a similar sampling performed that summer with support from local community groups by the department’s Bureau of Alcohol and Drug Use, Prevention, Care and Treatment. In both studies, GeoChat was used to catalog alcohol advertisements into three broad categories: type of advertisement, type of alcohol, and brand. In the fall of 2011, a strategy for gathering location-based information on the advertisements was devised in which a random sample of 30 ZIP codes across three income categories (low, medium, high) was selected, and teams were sent out to capture the location of alcohol advertisements using nothing more than a cell phone and a copy record for backup and notes. The result was the creation of a point dataset that represented the alcohol advertisements throughout the 30 selected ZIP codes.
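A stratified random sample of this kind can be sketched in a few lines of Python. The example below assumes equal allocation (10 ZIP codes from each of the three income categories); the chapter does not state how the 30 ZIP codes were divided among categories, so that allocation, the function name, and the input structure are assumptions made for illustration only.

```python
import random


def stratified_sample(zip_income, total=30, seed=None):
    """Draw an equal-sized random sample of ZIP codes from each income category.

    zip_income maps a ZIP code string to its income category label.
    Equal allocation across categories is an assumption of this sketch.
    """
    if not zip_income:
        raise ValueError("no ZIP codes supplied")
    rng = random.Random(seed)
    strata = {}
    for zip_code, category in zip_income.items():
        strata.setdefault(category, []).append(zip_code)
    per_stratum, remainder = divmod(total, len(strata))
    if remainder:
        raise ValueError("total must divide evenly across the income categories")
    sample = {}
    for category, zips in sorted(strata.items()):
        if len(zips) < per_stratum:
            raise ValueError(f"not enough ZIP codes in category {category!r}")
        # Sampling without replacement within each stratum.
        sample[category] = sorted(rng.sample(zips, per_stratum))
    return sample
```

Passing a fixed `seed` makes the draw reproducible, which may be useful when the sampling frame has to be documented for later review.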

Surveys provide another logical opportunity for the attributed collection of geographic data – either administered by a survey subject directly through an application or administered to them by field staff. In survey design, VGI provides a way to collect results that leverage geography. In New York City, the Department of Health and Mental Hygiene conducts an annual survey of approximately 10,000 New Yorkers called the Community Health Survey (NYC DOHMH 2009). In this survey, respondents are asked a series of questions – information from questions such as “Have you ever been told by a doctor that you have diabetes?” provides neighborhood-level estimates for a variety of health indicators. In all, approximately 35 of these indicators in any given year can be mapped. In all cases, ZIP codes are the only geographic indicator other than the phone prefix (which is not generally used to identify place except in cases where the provided ZIP code does not exist or cannot be properly aggregated). Because the survey needs to maintain a high level of statistical validity, ZIP code level data is rolled up into groups of two or three ZIP codes which comprise what are called United Hospital Fund (UHF) neighborhoods. These neighborhoods are good approximations for place at a subborough level (boroughs in New York City are counties) but are often criticized for not being at a fine-enough scale to identify very small compositional differences between neighborhoods. UHF neighborhoods often group disparate areas together, which makes the term “neighborhood” a bit of a misnomer. Still, the Community Health Survey provides detail-rich data which guide many public health programs. VGI holds some unique promise for improving the geographic accuracy of surveys like the Community Health Survey. For one, using VGI allows the survey participant to provide location-based information that is not dependent on their ZIP code or home address. 
Location-based services can capture a point returned by the GPS in the phone or by triangulation using cell phone towers. Alternatively, if it is determined that such information is too sensitive, participants in a survey can be aggregated to a predefined or flexible grid. In the case of a predefined grid, participants and their respective answers are grouped into grid boundaries or existing polygon neighborhood definitions, like the NYC Projection Areas created by the NYC Department of City Planning. In this way, the participant’s exact location is only necessary to assign a neighborhood definition. Alternatively, if a location is recorded, the exact location can be assigned to any administrative boundary in which the underlying population is sufficiently large to ensure statistical validity in the dataset.
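The grid-based aggregation described above can be illustrated with a short Python sketch. It snaps each point to a regular latitude/longitude grid and reports only cells that contain a minimum number of participants. The cell size of 0.01 degrees (roughly 1.1 km north to south) and the minimum count of 5 are illustrative assumptions, not thresholds taken from any survey or regulation discussed here.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_size_deg=0.01):
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))


def aggregate_to_grid(points, cell_size_deg=0.01, min_count=5):
    """Count points per grid cell and suppress cells below min_count.

    points is an iterable of (lat, lon) pairs. Only the cell index is
    retained, so exact coordinates do not appear in the output.
    """
    counts = Counter(grid_cell(lat, lon, cell_size_deg) for lat, lon in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}
```

Cells defined in degrees are not equal in area, and sparsely populated cells are simply dropped in this sketch; a real analysis might instead merge them with neighboring cells.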

In the case of epidemiological investigations, VGI may provide opportunities to crowdsource accurate spatial representations of travel patterns and behavior. For example, in a tuberculosis or food-borne illness investigation, if a contact is suspected, the user may turn on a tracking mechanism that reports all place history, times, and length of stay for a period of time. When compared to others that are identified as existing in that same cohort, relationships can be identified that were previously unnoticed. The information feed can also be passive. Location-aware individuals can behave as public health “lookouts” or sentinels. When one of these individuals becomes ill or otherwise affected by a public health concern, the entire place histories can help to identify spatial relationships that were not evident before.

Understanding the relationship between proximity to healthy food and health is something that VGI can help to address. By enlisting local community groups and volunteers, a robust dataset can be sourced that reflects locations of quality and type of food sources, as well as the opinions and insight that a community can bring to better understanding an area’s local tapestry. Existing government data sources and initiatives can be enhanced and further populated by the VGI experience. In New York City, the recent Food Retail Expansion to Support Health (FRESH) initiative identified food deserts through a Supermarket Need Index (SNI) (Smith et al. 2011). The SNI is an index that reflects a number of variables in the calculation of need, including population, access to a car, poverty, number of fresh fruits and vegetables, obesity, and diabetes. The initiative and follow-up work between the NYC Department of City Planning, NYC Department of Health and Mental Hygiene, and NYC Economic Development Corporation provide insight into how many variables contribute to the creation of a high-need area. However, there are other indicators of place that are not widely accounted for. One example is the definition of a neighborhood itself; the NYC Department of City Planning’s Projection Areas serve as better approximations of neighborhoods than ZIP codes, but it is clear that neighborhoods do not start or stop at distinct Projection Area boundaries.

3 Privacy Concerns and Health Data

Clinical and public health practitioners are well aware of the importance of protecting the confidentiality of personally identifiable information. However, the collection and disclosure of personally identifiable information in this novel geographic context are less understood by many public health researchers. Privacy is always a top issue in technology circles – major privacy breaches and a lack of transparency into how users’ data is used have recently reignited some public awareness of privacy concerns. As technology giants begin harnessing larger and larger datasets that include spatial information, new data sources provide yet further opportunities for the exploitation of personally identifiable information (Forbes 2011). Companies like Google are realizing that in order to compete in big data environments, simplifying the process by which data can be connected is important. In 2012, Google got “rid of over 60 different privacy policies” and replaced them with “one that’s a lot shorter and easier to read.” The stated goal was to create a more seamless experience for users (Google 2012).

It is not clear that these companies are fully aware of the precedent they are establishing when first releasing potentially personally identifiable information into the wild and only later restricting data access or putting protections in place once problems have been identified. A recent CNET investigation found that Microsoft had collected spatial data on laptops, cell phones, and Wi-Fi devices and released that information on the web without taking precautions that other companies with similar datasets (e.g., Google) had (CNET 2011). In public health research, removing attribute data from a person’s geographic footprint, and thereby linking records only by geographic and temporal proximity, should not be seen as a sufficient mode of protecting an individual. Reverse geocoding, a process of identifying a street address from a point on a map, provides plenty of opportunities for identifying an individual, as do time-stamped geographic data or travel paths (Brownstein et al. 2005). Therefore, it is up to the public health researcher to ensure that such requirements for privacy are met. John Snow’s legendary Broad Street map of cholera cases in 1854 would present problems in a peer-reviewed publication today. However, spot maps are still a popular way to depict emerging cases during epidemiological investigations.

One clear way to address this problem is through simple education of the public health researcher. A street address, for example, can already be considered confidential information, but the context of the information, say a spreadsheet versus an online map, does not change the confidentiality of the data. While the risk associated with the publication of a spot map identifying the location of HIV-infected patients, for example, may be very obvious to health researchers today, ethical challenges in applying GIS in a public health setting remain. Furthermore, with the democratization of geographic information, more personally identifiable datasets are created, collected, and analyzed by individuals that have no formal health education, training, or work experience. Therefore, it is quite possible that health-related applications will emerge which are created by individuals who lack either a health sciences background or an incentive to adequately protect privacy.

Ethics in geographic information science remains an underexplored topic. While GIS is heavily used in oil and gas exploration, environmental sciences, and military intelligence, the investigation of appropriate and ethical use in the literature is weak (Goodchild 2011). Recently, however, there have been initiatives, papers, and presentations on ethics in GIS, of which the most visible and rigorous exploration of these issues is a series of graduate seminars through a National Science Foundation Grant (Penn State 2010).

4 Ethical Norms in Public Health

For the public health researcher, collecting data using VGI is a novel method of data capture but can use many of the standard approaches the researcher is likely already familiar with. As discussed thus far, tools that provide for the collection of VGI allow for the opportunity to collect spatial information by a subject or group of subjects. In traditional public health data collection practice, it may be common to collect standard US Postal Service information including a person’s country, state, ZIP code, residential address, and name. Using a survey tool like OpenDataKit, a user can be prompted to enter all the same information through their smartphone. Spatial information can also be collected directly from the device. If the collection of geographic information is continuous over a period of time, it is possible to realistically ascertain a residence, workplace, and commuting pattern of an individual. Google’s Latitude does just that – by passively monitoring a user, with their permission, the application will build a personal history profile, even providing statistics on how often one stays at work, where the individual travels most frequently, and during what times of the week.

Because a person’s geographic history can be collected and retained directly in the device, there is the risk that this location-based information can be collected without the person’s awareness or consent. Even if consent is provided, the individual may not be aware of the extent of additional information that can be ascertained about him/her simply by providing a steady stream of location-based information. The recent uproar surrounding the disclosure that the “Carrier IQ software was being used by phone manufacturers and carriers to monitor performance without implicit knowledge of the individual” is one recent example and has led to the creation of draft legislation towards a Mobile Device Privacy Act (Ars Technica 2012).

By retaining detail-rich spatial and temporal data histories, the researcher can determine the place and time of many events surrounding the subject. While this information can be immensely useful for understanding patterns, links, and relationships between people and place, it can also compromise a person’s identity. As mentioned previously, it would be wrong to assume that the removal of personally identifiable information except for the subject’s latitude/longitude, travel paths or footprints, or travel history is itself sufficient for protecting patient confidentiality. It is easy to see how one could readily determine with reasonable accuracy an HIV patient’s particular habits by simply reverse geocoding the estimated point of residence and cross-checking the other place visits with facilities in the area. This might lead one to determine a particular individual visiting an HIV clinic, or a domestic violence victim visiting a shelter or support group. If such information is disclosed (the location of an Alcoholics Anonymous meeting, domestic violence shelter, STD treatment facility), the exposure of just one subject could risk exposure of others who have not provided consent. In this manner, people essentially become sentinels for behavior, and while public health uses abound, threats to privacy and security do too. If consumers were made aware of the power of their own geographic footprint, they would in turn be more likely to protect it the same way they do their home address. Today, geographic data collected by location-aware devices is almost always coupled with temporal data, which can make it exponentially more powerful. The question of where someone lives or works becomes, “Where were you last Saturday evening?”

Consumers are however prone to quickly accept these caveats in the interest of using the latest application, and marketers of these products are happy to make that process easy. A legally binding document or agreement would simply be skipped over by a quick selection of the “I Accept” button, whereby consumers would be willing to give up highly personal information in exchange for the value of a service. Furthermore, certain services might be constructed whereby they were essential, leaving the customer little opportunity to consider an alternative with which they would not sacrifice such information.

The most recent draft of the personal health record (PHR) model by the Office of the National Coordinator for Health Information Technology states that “Health information stored in a personal health record is under the control of the patient” (ONCHIT 2010a, b). This draft goes on to highlight how different vendors should use a unified method, similar to that of a nutritional label on food packaging, to alert customers to privacy and security. A similar approach or model for the use of VGI might also be appropriate, thereby putting the control of one’s personal geographic history back in the hands of the individual. It is also important that individuals retain some responsibility and understand what they are responsible for. One example is found in the HONcode of the Health on the Net Foundation. The HON Code of Conduct is referenced in Microsoft’s HealthVault Account Privacy Statement. It provides guidelines for the ethical dissemination of health information and relies on users’ “sense of responsibility” to report health-related websites that deviate from this standard (Microsoft Health Vault 2011a, b; HON 2011).

5 De-identification of Geographic Information Under the Privacy Rule and the Institutional Review Board

The Health Insurance Portability and Accountability Act (HIPAA) of 1996 established a Federal Privacy Rule facilitating more rigorous regulation of the use of protected health information (NIH 2007). Protected health information rules apply to “individually identifiable health information” but have not kept up with new data available with emerging technology (NIH 2007). The Privacy Rule presently identifies geographic subdivisions, including the street address of an individual, as data that must be removed from a record, but it does not provide explicit guidance around an individual’s given latitude/longitude or personal travel history (NIH 2007). It is clear that this information could potentially be classified as a unique identifying number (NIH 2007). Since new technologies, including those now found in smartphones, provide ample ways of identifying a person’s location, and numerous applications use this information to connect an individual with services, it can be very difficult to decouple the individual from their location. Traditionally, statistical measures have been employed on datasets, thereby aggregating individual-level data securely to a larger administrative boundary. However, data can also be aggregated to a much smaller area (a grid cell or series of grid cells, the size of which is determined by the GIS analyst) for analytical purposes. This may or may not provide ample protection for the individual, since the grid of cells still needs to be generalized enough to not give away sensitive location information but granular enough to provide better definition than other administrative boundaries.

There are, however, ongoing efforts to advocate for such privacy concerns. The Office of the National Coordinator for Health IT has established a Chief Privacy Officer. This position is responsible for providing advice on the implementation of technology within HITECH programs and advising the National Coordinator on privacy issues (ONCHIT 2011). The Health Information Technology for Economic and Clinical Health (HITECH) Act provides additional protections for health information already covered by HIPAA. The new protections are geared towards making information available to the patient and expanding patient rights to such information while also limiting disclosures of health information to insurers, business associates, and marketers without prior patient authorization (ONCHIT 2011).

Along another front, an institutional review board (IRB) can provide a systematic check that seeks to protect study subjects from unethical behavior and undue risks. Ethics standards as spelled out in the National Institutes of Health Clinical Research Training state that because human subjects are “a necessary means to the end of greater knowledge,” there is the potential for exploitation (NIH 2012). Geography tells us a lot about study subjects, and the collection of such data and the enrollment of subjects should periodically be checked, just as it would be for any other IRB-approved study. Guidelines effectively reduce the risk of this happening. Informed consent is used to help protect subjects and dictates that subjects clearly understand and agree to the study’s goals and objectives. Subjects need to have a clear sense about how information collected about them is used later – something that is clearly lacking from many privacy statements. Existing protections, as defined by and provided for by a traditional IRB review and approval process, must be able to take into account such privacy concerns from these emerging technologies. In particular, individuals sitting on IRB panels must understand the ramifications of practitioners collecting geographic data in public health studies and must provide direction for ensuring that such data collected using VGI cannot be used for something other than how it was originally intended. The expiration of VGI data or some other mechanism may be one way to help ensure that data collected for one purpose is not later repurposed for something else.
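One way to picture an expiration mechanism is to attach both a stated purpose and an expiry date to each location record at the time of collection, and to check both before any later use. The Python sketch below is purely illustrative: the record fields, the function names, and the retention period are assumptions, and it does not describe the behavior of any existing VGI tool or IRB requirement.

```python
from datetime import datetime, timedelta, timezone


def make_record(lat, lon, purpose, collected_at, retention_days):
    """Create a location record tagged with its purpose and expiry date."""
    return {
        "lat": lat,
        "lon": lon,
        "purpose": purpose,
        "collected_at": collected_at,
        "expires_at": collected_at + timedelta(days=retention_days),
    }


def usable_for(record, requested_purpose, now=None):
    """Allow use only before expiry and only for the original purpose."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < record["expires_at"] and record["purpose"] == requested_purpose


def purge_expired(records, now=None):
    """Return only the records that have not yet expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [r for r in records if now < r["expires_at"]]
```

A check of this kind only helps if it is enforced wherever the data is stored and shared; copies made outside such a system would not expire on their own.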

Researchers using VGI must question the risks of tools developed in an environment that does not require institutional review board (IRB) approval. Preferably, researchers without access to an IRB or similar institution would partner with one that is willing to review their proposed work, since IRBs have the ability to review research projects conducted outside of their organization (FDA 2011). However, it is unlikely that many researchers unaffiliated with an IRB would voluntarily seek regulation, partially because IRB review and approval is often a long process that requires significant upfront documentation and – very clearly – the informed consent of study subjects. It seems unlikely that such stringent measures would be placed on geographically centered research being performed outside of a research institution with an IRB.

Given the history of privacy concerns around public health and the protections already in place, could a board of GIS professionals serve to provide some oversight and proactive guidance on the use of such technologies? If those leveraging the power of VGI will not seek out human subject protection on their own, it may fall on the GIS professional to ensure that human subject abuses never take place. If standards are not put into place, it is likely that real problems with unrestricted collection of spatial and temporal data will not be uncovered until explored further by the courts. Use of GPS devices by police is coming under increased scrutiny, but such inspection is often late (NYT 2012). VGI, when used in public health settings, must be approached in terms of existing protections and potential for future abuse. In particular, because VGI provides a mechanism for collecting increasingly accurate spatial and temporal data about an individual, privacy must be protected above all.

6 Summary and Conclusions

As geographers and public health practitioners, it is our duty to inform the general public as to the value of geography. We must educate the general public to the inherent risks and rewards of sharing geographic information with private companies, nonprofit organizations, government, and individuals. Location should be viewed as one would view their social security number – something over which they should not lose control nor share broadly with others. When it is shared, it should be with full disclosure of the risks and – as much as possible – the unintended consequences that might remain. As it can be common for tools and even data to be used for something other than originally intended, it is vital to develop an appropriate framework of recommendations for the collection, analysis, and use of the information. Such guidelines would provide some mechanism by which the creators of such systems can be made aware of such concerns while empowering those in the public health community to identify and address such applications.

Volunteered geographic information will thrive in an open community largely because it is, fundamentally, volunteered. However, location-based information collected on large subsets of the population is largely done without adequate disclosure to individuals. Unfortunately, it may not be until we see abuses of such technology that we see a need to further regulate the collection of such data.

Inherently, one’s location belongs to the individual – not to the cell phone company, not to the government, nor to any other application provider. Until we treat identifiable information coming from the individual with the adequate level of care and ensure that individuals understand the ramifications of providing such location-based information, it is unlikely that future abuses will be curtailed. In an electronic age, it is only too easy to accidentally release personally identifiable information, and an individual’s geographic footprint only provides an additional measure by which it can readily be disseminated.

Perhaps the easiest approach is to remove the passive monitoring of one’s location, when the benefits to the individual clearly do not outweigh the risks. Location needs to be treated as privileged information owned first and foremost by the individual and no one else. Active participation in an application that requires geographic information in order to work correctly still requires receipt of consent, but to what extent should the individual be able to control and later remove one’s own data from further analytical use?

Institutional review boards clearly show one alternative to vetting the use of VGI in research, but, as pointed out earlier in this chapter, the likelihood that all applications using location for research purposes would abide by such regulation seems remote. Clearly the power must sit with the individual, and the individual must be willing and able to exercise some control over their own geographic footprints. Federal, state, and local initiatives, along with adoption of electronic health records by large employers, may make possible the successful merger of volunteered geographic information and health information technology in a way that does not sacrifice privacy. The rapidity of the use, acceptance, and development of VGI in developing countries is phenomenal. The need for efficient, low-cost, and user-friendly technology has outweighed privacy concerns and thus facilitated the adoption of VGI for a wide range of uses. Areas that are lacking the necessary infrastructure for more common public health informatics deployments can utilize their own lightweight, web-based systems across networks that are in place, the cell phone network being an example of this. In fact, some countries have better infrastructure for cell phones than sanitation (The Telegraph 2010). In these scenarios, have we sacrificed privacy and perhaps data quality for convenience?

There will surely be abuses of personal geographic information, and there will be cases where sharing one’s location or location history will unwittingly implicate or otherwise harm an individual, group of individuals, or organizations. The more aware public health researchers, geographers, and to a greater extent the general public are of the challenges that remain in securing such geographic information, the more likely it is that we as practitioners are able to avoid and mitigate future damages from such exposure. Personal geographic information should remain under the individual’s control, and mechanisms like the HIPAA Privacy Rule may assist individuals in understanding not only their rights but how important personal information is used (DHHS 2011).