Introduction

Trust between doctor and patient is fundamental to the practice of medicine. A patient must trust the physician sufficiently to share personal details that may be stressful, embarrassing, or potentially damaging. A physician must trust that a patient is sharing enough information to allow an accurate diagnosis, and that the patient is able to give informed consent to treatments that may pose significant risks. Trust in psychiatrists may be more important to patients with mental disorders than to patients with other serious illnesses [1]. An essential component of the trust between doctor and patient is privacy. Over two thousand years ago, Hippocrates emphasized the importance of privacy, and the practice of medicine has recognized and valued it ever since.

Privacy of medical data is regulated by federal and state laws, but primarily by the Health Insurance Portability and Accountability Act (HIPAA). HIPAA covers patient data collected by providers and their business associates in relation to treatment, payment, or healthcare operations. Most privacy discussions concern HIPAA-protected data, such as the relative ease of re-identifying de-identified data [2, 3]. This review focuses instead on the medical and health data that are increasingly being collected outside of HIPAA protections. Medical and health data outside of HIPAA can be volunteered by consumers directly, observed by corporations recording consumer actions, or inferred by calculated models [4]. The rapidly expanding stores of data collected outside of HIPAA are encroaching on the traditional doctor–patient relationship and eroding medical privacy.

Digital World

To understand the implications of medical and health data collected outside of HIPAA, it is necessary to review the scope and complexity of the rapidly expanding digital world. The percentage of the world’s stored information that is in digital format increased dramatically from 25 % in 2000 to more than 98 % in 2013 [5]. In the US, the amount of digital data is doubling every three years, driven by increased consumer use of smartphones, the Internet, social networks, and picture-taking; by metadata (information about information); by conversion from analog to digital (film, TV, voice); and by the growth of machine-generated data from RFID tags, sensors, and surveillance cameras [6]. Metadata for online transactions contains information such as account numbers, login IDs, passwords, phone numbers, browser types, IP addresses, date, time, email sender and recipient, search terms and results, cookies, and device fingerprints [7••].
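
As a purely illustrative sketch, the record below shows the kind of metadata that might be logged for a single online transaction, separate from the content of the message or purchase itself; all field names and values are hypothetical.

```python
# Hypothetical example of the metadata that may accompany one online
# transaction; none of these values comes from a real system.
transaction_metadata = {
    "timestamp": "2014-06-03T14:22:07Z",            # date and time
    "account_id": "A-883271",                       # account number / login ID
    "ip_address": "203.0.113.42",                   # documentation-range IP address
    "user_agent": "Mozilla/5.0 (Windows NT 6.1)",   # browser type
    "device_fingerprint": "c9f0f895fb98ab91",
    "cookies": ["session=abc123", "ad_network_id=xyz789"],
    "email_sender": "patient@example.com",
    "email_recipient": "support@pharmacy.example.com",
    "search_terms": ["sertraline side effects"],
}
```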

Eighty percent of the digital data stored in the US is consumer related, and the majority of this is data about consumers’ lives, such as metadata, medical records, and imaging, rather than data explicitly created by consumers, such as emails sent or pictures taken [6, 8]. This personal detail is valuable because it can be combined, indexed, and searched in databases, used to create individual digital dossiers, and used for predictive modeling or profiling. Indeed, digital databases about consumers’ daily activities, transactions, and movements are considered a new asset class and the primary source of competitive advantage in the twenty-first century [4].

Digital data do not stay where they were generated. Data move through, and are handled by, many corporations and devices, including Internet service providers, communications companies, mail servers, database servers, web site owners, Internet retailers, data brokers, analytics firms, and advertising networks. Every organization along this journey has the ability to copy and store the data, including in countries with different regulations. About one-fourth of all digital data are original information, while the remaining three-fourths are duplicates such as email attachments and backup copies [9].

Changing Public Perceptions of Privacy

Along with the expansion of the digital world, the public attitude toward privacy is evolving [10]. Although surveys completed outside of healthcare find that consumers still value privacy, there is a well-documented “privacy paradox” showing inconsistencies between people’s intentions and behaviors relating to disclosing personal information [11]. Most consumers are willing to pay for online services with personal information rather than money [12, 13], or to disclose personal information for monetary rewards of less than $50 [14]. Personal information is willingly and routinely disclosed in daily life to save time and money through the use of credit cards, cell phones, social media, search engines, and loyalty cards, and because the use of many digital technologies is no longer optional [15••].

The public is also exposed to relentless hype of new technologies and gadgets by the media, especially aimed at the younger generations [16]. Technology leaders, generally from Internet companies such as Facebook and Google that monetize masses of personal data, actively promote “less privacy” as the new social norm [17, 18]. Privacy is portrayed as an old-fashioned, costly value that stifles innovation, efficiency, and entrepreneurship [10, 19]. In relation to healthcare, privacy is often described as a barrier that impedes the full potential of collaboration, technology, and big data to improve outcomes and address critical problems of quality and cost [20–22]. In contrast, openness and sharing of data are described as fundamental to the public good, since the data mining of digital medical records will create future knowledge and innovation in healthcare [23–25]. Futurists in the “quantified self movement” embrace devices that can be worn on the body for self-tracking of biological and physiological data, not only for self-improvement, but to combine into massive scientific databases [26, 27].

Sources of Medical and Health Data Outside of HIPAA

Daily Sources

There are numerous daily sources of medical and health data outside of HIPAA protection. These include credit card payments for physician visit co-pays and purchases of over-the-counter (OTC) medications, home testing products, tobacco products, health foods, items related to disabilities, and visits to alternative practitioners [28, 29•, 30]. People also volunteer medical information online by searching for disease information, discussing their medical experiences in emails, blogs, chat groups, or social media sites including those dedicated to specific illnesses, or by calling toll-free numbers. Other online activities that reveal medical information include registering for coupons on pharmaceutical direct-to-consumer advertising sites, registering for free trials of OTC products or online health services, registering for disease advocacy sites or to view patient support forums, “liking” web pages about diseases, completing online health and symptom checkers, and donating to health causes [30–33]. About three-fourths of consumers who use the Internet search for health information [34], and about three-fourths of health web sites contain third-party tracking elements [35, 36]. Furthermore, one-third of US consumers use YouTube, Facebook, and Twitter for medical-related discussions, such as checking consumer reviews [37]. See Table 1 for an example of how a patient with depression may potentially disclose personal medical and health data outside of HIPAA protections.

Table 1 Examples of data that may potentially be collected outside of HIPAA protection for a patient with depression

Other medical information outside the HIPAA framework is held by gyms, fitness clubs, wellness providers, banks, medical researchers, health fairs, and transit companies [29•]. Employers who do not fall under HIPAA, including those with fewer than 50 employees, may obtain medical information, for example to determine an employee’s ability to perform required duties [41]. Additionally, state and federal governments are excluded from HIPAA requirements, which allows Medicaid records to be stored offshore [42] and allows 33 state governments to sell or share personal health data [43].

Mobile Medical Apps

A myriad of technologies are now available to monitor every aspect of daily life, including physiological measurements, physical activity, and behavior [44•]. There has been an explosion of applications for mobile devices to promote health and disease self-management. As of 2012, there were about 13,000 health apps for consumers on the Apple App Store, of which 5.8 % were related to mental health, 4.13 % to sleep, and 11.44 % to stress and relaxation [45]. A 2013 study reported 14,000 health apps, of which 558 were for mental health and behavioral disorders, with two-thirds addressing autism, anxiety, depression, and attention deficit hyperactivity disorder [46].

The vast majority of these applications are not medical devices and do not require FDA approval. The data from most apps are managed by the software vendor, are not accessible to healthcare providers, and fall outside of HIPAA regulations. Patients may mistakenly assume that mobile apps fall within the scope of HIPAA, since the same data, such as heart rate, may be collected either by an application that is accessible to their physician and covered by HIPAA or by a mobile app that is neither accessible to the physician nor covered by HIPAA [47]. Even data from a prescribed medical device may fall outside the scope of HIPAA if the data are sent directly to the device manufacturer, who in turn provides a summary report to the physician [48]. Many consumers are not aware that data from medical apps are frequently sent to the software vendor, and to third-party sites for analytics and advertising services [49].
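
As a purely illustrative sketch of this data flow, the fragment below shows a fitness app transmitting a single heart-rate reading both to its vendor and to a third-party analytics service, with no HIPAA-covered entity involved; the endpoints, field names, and identifiers are hypothetical.

```python
import json
import urllib.request

def report_heart_rate(user_id: str, bpm: int) -> None:
    """Send one heart-rate sample to the app vendor and to an analytics
    service; in this sketch, neither recipient is a HIPAA-covered entity."""
    sample = json.dumps({"user": user_id, "heart_rate_bpm": bpm}).encode()
    endpoints = (
        "https://api.fitness-vendor.example.com/v1/samples",  # app vendor (hypothetical)
        "https://collect.analytics.example.net/track",        # third-party analytics (hypothetical)
    )
    for endpoint in endpoints:
        req = urllib.request.Request(
            endpoint, data=sample, headers={"Content-Type": "application/json"}
        )
        urllib.request.urlopen(req)  # the reading leaves the device with each call

# Example invocation (would transmit the sample if the endpoints existed):
# report_heart_rate(user_id="device-48121", bpm=72)
```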

Patient Control of Digital Medical Records

Many patients are obtaining digital copies of their medical records, for example with Blue Button from the Department of Veterans Affairs. Once downloaded from a provider’s EHR system, the medical record data are outside of HIPAA protection, and the patient becomes responsible for stewardship of the data. Patients without a background in technology management may inadvertently become a major source of leaked medical records. Moreover, data posted to the Internet are effectively permanent, since data cannot be deleted with assurance given the distributed and redundant storage of Internet data [9, 50]. For example, comments from patients with multiple sclerosis containing private health information were found on YouTube health videos after the patients’ accounts had been deleted [51]. Another concern is that patients will combine data downloaded from their EHR with unprotected data in a mobile app. There are also many online sites for maintaining personal health records (PHRs), although these are rarely used today.

Data Brokers

Data brokers, also referred to as data aggregators or information resellers, form a multi-billion dollar industry that collects, analyzes, and sells data on consumers [28, 52•]. As of 2012, about 4000 data brokers held data on about 300 million Americans [53]. Data brokers collect data from every aspect of our lives, including public records such as property taxes and voter registrations, publicly available information such as phone numbers and Internet postings, and non-public information such as financial data, loyalty cards, and Internet transactions [28, 52•]. Additionally, consumers have embraced location-aware mobile devices such as smartphones, which contain multiple sensors, are typically carried at all times and left on, and thus provide continuous tracking information [54•]. Data brokers link together data from all of an individual’s online and offline accounts and devices [52•, 55], and some store data indefinitely [30]. In general, consumers do not have the right to control what personal information is collected, maintained, used, and shared by data brokers, or to correct errors [28, 30, 52•]. Furthermore, data brokers routinely purchase data from other data brokers, so a consumer cannot realistically trace the source of incorrect data [30]. Most regulations that affect data brokers pertain to the financial sector, such as the Fair Credit Reporting Act. The primary products from data brokers are used to predict consumer behavior and are sold mainly to online marketers.
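
The linkage step can be illustrated with a minimal sketch: records acquired from unrelated online and offline sources are consolidated into a single dossier by matching on a hashed email address. The data, sources, and matching key are invented; commercial linkage draws on many more signals (names, addresses, device identifiers, cookies).

```python
import hashlib
from collections import defaultdict

def match_key(email: str) -> str:
    """Illustrative matching key: a hash of the normalized email address."""
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# Hypothetical records obtained from unrelated sources.
pharmacy_loyalty = [{"email": "jane@example.com", "purchase": "nicotine patches"}]
web_tracking = [{"email": "Jane@Example.com", "page": "depression-symptom-checker"}]
location_pings = [{"email": "jane@example.com", "place": "outpatient clinic, Tuesdays"}]

# Consolidate everything known about each individual into one profile.
dossier = defaultdict(dict)
for source in (pharmacy_loyalty, web_tracking, location_pings):
    for record in source:
        k = match_key(record.pop("email"))
        dossier[k].update(record)

# One dossier now combines purchases, browsing, and movements.
print(dict(dossier))
```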

Medical and Health Products from Data Brokers

Data brokers sell a variety of products about health issues based on data collected outside of HIPAA. Consumer lists are available by diagnosis, such as depression, ADHD, or anxiety [52•, 56, 57], and by medications taken, such as antidepressants [58]. Data brokers also combine health data with data on consumer habits, assets, and demographics for use in consumer health scores, profiling, and predictive modeling [59]. Examples of scores used outside the HIPAA framework include the Brand Name Medicine Propensity Score from Acxiom [60] and the FICO Medication Adherence Score [29•, 61]. Consumer health scores may be used as variables within predictive models by life insurers or actuaries as part of an evaluation process [62, 63]. Data collected by data brokers can also be purchased and used for re-identification. This is of great concern because the more information that is available about a person, the easier it becomes to re-identify that person in the future [3].
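
How little it can take to re-identify someone is shown by the toy example below: a “de-identified” clinical extract that retains date of birth, sex, and ZIP code is matched against a purchased broker file containing names alongside the same quasi-identifiers. All records are fabricated for illustration.

```python
import pandas as pd

# Hypothetical "de-identified" clinical extract: names removed,
# but date of birth, sex, and ZIP code remain.
deidentified = pd.DataFrame([
    {"dob": "1975-04-12", "sex": "F", "zip": "02139", "diagnosis": "major depression"},
])

# Hypothetical broker file purchased separately, containing identities.
broker_file = pd.DataFrame([
    {"name": "Jane Roe", "dob": "1975-04-12", "sex": "F", "zip": "02139"},
    {"name": "John Doe", "dob": "1968-09-30", "sex": "M", "zip": "02139"},
])

# Joining on the shared quasi-identifiers re-attaches a name to the diagnosis.
reidentified = deidentified.merge(broker_file, on=["dob", "sex", "zip"])
print(reidentified[["name", "diagnosis"]])
```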

Predictive Modeling

Predictive modeling, referred to by the advertising community as behavioral targeting, is used to serve specific advertisements to online users based on their perceived interests. Behavioral targeting is about twice as effective as other forms of online advertising and is viewed as critical to the business model that provides free online content and services [64]. The data used to build behavioral targeting algorithms include detailed activity at content providers’ websites (such as search terms, search histories, and content selected), clickstreams (the route navigated across the Web), and a wide range of data purchased from data brokers. Many analysts believe that the more data that can be combined, the more precise the profile of our habits that can be generated. Acxiom offers “over 3000 propensities for nearly every U.S. consumer” [65].
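
A toy version of such a propensity model is sketched below: a logistic regression trained on a few invented clickstream features to score users on their likelihood of responding to a pharmaceutical advertisement. The features, labels, and data are fabricated, and commercial models are proprietary and vastly larger.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented clickstream features per user:
# [health-related searches this month, antidepressant pages viewed, late-night sessions]
X = np.array([
    [12, 4, 9],
    [0, 0, 1],
    [7, 2, 5],
    [1, 0, 0],
    [15, 6, 11],
    [2, 1, 2],
])
# Invented label: whether the user previously clicked a pharmaceutical ad.
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# Score a new user: the output is an advertising "propensity",
# not a clinical judgment or a directly measured value.
new_user = np.array([[9, 3, 6]])
print(model.predict_proba(new_user)[0, 1])
```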

Most algorithms used for profiling and targeted marketing are not publicly available but medical and social science researchers have identified a wide variety of individual traits and behaviors based on Internet data. Researchers have investigated patterns of activity, linguistic style, and emotional expression in the content of social media [66]. For example, personality was predicted from data in Twitter [67], personal web sites [68], and Facebook [69]. Data from Facebook were used to identify depression in college students [70], ethnicity and sexual orientation [71••], and schizotypy personality [72]. Data from Twitter were used to predict postpartum emotional changes [66].

Predictive modeling is also used to estimate health status and may have the same consequences for an individual as if the information had come from an electronic medical record (EMR). For example, when Target predicted that a customer was pregnant based on her purchasing patterns [73], it caused as much distress as if the prediction had been based on actual data from a healthcare provider [74]. The incident also highlighted that personal health information can be created by combining seemingly innocuous data, and that a predictive model outside of HIPAA protection can cause harm whether or not it is accurate. Health predictions may seriously affect a person’s life, including getting and keeping a job and the ability to obtain life insurance [74]. Although health predictions may be incorrect or may disclose information people want kept private [50], the current legal framework does not address predictive models built from data outside of HIPAA [74, 75]. Adverse consequences of health profiling may affect members of certain groups disproportionately [50], such as those with mental illness. Health profiling is already accurate enough to be used to recruit patients for clinical trials [76].

Since data outside of HIPAA are easy to obtain and subject to minimal regulation, the use of predictive models of health status as a substitute for actual individual medical data may increase [75]. Predictive health models can also be combined with traditional medical data, such as data leaked by a patient managing records downloaded from a provider’s EHR system. This could lead to a future in which data brokers hold more detailed information about a patient than the patient has directly disclosed to their physician. It is important to remember that the results of predictive models are not based on physician judgment or on directly measured values; they are calculated estimates, often produced by disciplines outside of medicine. The accuracy of commercial predictive models is not published and replicated like the results of a scientific study. Additionally, the data brokers who sell predictive health models are not involved in patient care and have no training in medical ethics.

Selling Patient Experience

One area of particular concern involves health web sites at which users create as well as read content, such as online patient support communities. These data often consist of self-reported diagnoses, medical history, symptoms, treatments, drug reactions, and patient opinions about providers. Such web sites commonly have a business model based on aggregating, mining, and selling user-generated content, often to pharmaceutical companies, device manufacturers, or researchers [77, 78]. Patient-generated data are particularly valued by marketing organizations because they reflect routine behavior rather than answers to solicited surveys [77]. Many companies behind these web sites actively encourage the sharing of data in order to build larger databases [79].

Patients may not be aware of the commercial ownership of these web sites [79] or may not realize the extent of third-party involvement [78, 80]. For example, in a study of 69 patient support sites, pharmaceutical connections to the sponsoring organizations were difficult for end users to determine [81]. People who are comfortable sharing data online for the general good may not want to do so to enrich a company [79]. Additionally, there is a growing number of web scraper companies that automatically gather data from the unstructured or semi-structured pages of target websites to amass large databases. One healthcare example is Treato, which “automatically collects the massive amount of patient-written health experiences from blogs and forums”, then processes the data and sells them to pharmaceutical marketers [82]. Finally, there are technical privacy issues unique to social networking sites, such that the data may be more difficult to anonymize than data in relational databases [2, 80].
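
The scraping approach can be sketched in a few lines; the forum URL and page structure below are hypothetical, and real operations run at far larger scale.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical patient forum page; the URL and HTML structure are invented.
url = "https://patient-forum.example.org/topics/depression"
html = requests.get(url, timeout=10).text

soup = BeautifulSoup(html, "html.parser")
posts = []
for post in soup.select("div.post"):  # assumed CSS class for one forum post
    posts.append({
        "author": post.select_one(".author").get_text(strip=True),
        "text": post.select_one(".body").get_text(strip=True),
    })

# The harvested, patient-written content could then be aggregated and resold.
print(len(posts), "posts collected")
```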

Privacy Policies for Online Activities

Internet privacy policies are not succeeding at explaining the risks of data sharing to the public and may serve more as liability disclaimers than as assurances of consumer privacy [83]. Most people do not even read online privacy policies, including at healthcare web sites, or understand that commercial organizations share, analyze, and sell data [84–86]. Moreover, many people react to privacy policies in unexpected ways. Some consumers mistakenly believe that the mere presence of a “privacy policy” means that their information will be kept private and that the web site will not share it [85]. Additionally, the perception of control over the release of information conveyed by a privacy policy may increase consumers’ willingness to disclose sensitive information, even if actual control is not increased [87]. In contrast, some people see the presence of a privacy policy as a warning of an unsafe environment and will withhold more information than when there is no mention of privacy [88].

Multiple studies of healthcare websites have found that their privacy policies are difficult to understand. Most privacy policies are written at a reading level equivalent to two years of college [89–92], although half the US adult population has completed less than one year of college [93]. One study found that the privacy policies of 185 major health institutions were about as long as a research article in JAMA [94]. A comparison of the privacy policies of nine healthcare websites before and after HIPAA legislation found that, after the legislation, the policies were more descriptive but also longer and more difficult to comprehend [90]. These readability issues may matter even more for patients with mental illnesses, who may also have impaired reading abilities [95, 96].
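
Reading level is commonly estimated with formulas such as the Flesch–Kincaid grade level; the sketch below applies that published formula, with a crude syllable heuristic, to an invented snippet of privacy-policy text.

```python
import re

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    # Crude syllable estimate: count groups of consecutive vowels in each word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    n_words = max(1, len(words))
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

policy_snippet = (
    "The undersigned hereby acknowledges that personally identifiable "
    "information may be disclosed to affiliated third-party entities for "
    "analytic, promotional, and operational purposes."
)
# A result well above 12 indicates text at a college reading level or beyond.
print(round(flesch_kincaid_grade(policy_snippet), 1))
```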

On social media web sites, privacy policies apply only to the data that the social media companies collect from users, such as through registration forms or cookies, and not to the content that users post directly [86]. Although the Federal Trade Commission (FTC) states that all mobile applications should have a privacy policy [97], a study of 43 popular mobile health and fitness apps for Apple and Android devices found that fewer than half posted a privacy policy, and fewer than half of those policies were accurate [49]. A review of the privacy policies of 24 PHR systems reported that descriptions of security and privacy measures were insufficient and that compliance with HIPAA regulations was low [98]. Many consumers lack the technical skills to control privacy online, such as changing the default privacy settings on social media sites or browsers, or using advertiser opt-out sites [99–101]. Increased consumer training in these technical skills is needed to make full use of the existing online privacy options.

Data Breaches

Disclosures of HIPAA-protected medical data are a major concern. The enforcement provisions of HIPAA were significantly strengthened by the 2009 HITECH Act, which included the first federal data breach notification requirement, instituted security audits, significantly increased fines, and authorized HIPAA enforcement by state attorneys general [102]. Yet data breaches of HIPAA-protected medical information are increasing in frequency [102]. In a 2014 survey of 91 healthcare organizations, 90 % reported at least one incident in the previous two years, while 38 % reported more than five incidents [103]. Counting only breaches involving at least 500 individuals, over 29 million patient health records have been compromised since 2009. Medical data breaches are well publicized in the press [104] and on a web site from HHS that lists all breaches affecting more than 500 patients [105].

Many people require access to medical records, including doctors, nurses, technicians, administrators, clerical workers, and employees of business associates such as insurance companies, billing, coding, and transcription companies, pharmacies, medical suppliers, care facilities, and government offices. The fragmented nature of the US healthcare system makes data breaches particularly difficult to control, since the overall risk compounds across every organization that handles the data: the chance that a record is exposed somewhere rises with each additional data holder, as illustrated below. About 20 % of recent breaches involved a business associate [106], many of which lack technical expertise [107]. Most breaches involve portable devices [106], and the most common cause is theft [102]. EMRs are a prime target for theft, since they contain financial, credit, personal, and insurance information, and medical identity theft is the fastest growing form of healthcare fraud [108].
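
The compounding of risk across data holders can be made concrete with a small calculation: if each organization holding a record has some independent annual probability of a breach, the chance that the record is exposed somewhere grows with every additional holder. The probabilities below are illustrative only.

```python
# Illustrative annual breach probabilities for each organization holding a record
# (provider, billing company, transcription service, pharmacy, insurer); values are invented.
per_org_risk = [0.02, 0.03, 0.01, 0.02, 0.03]

# Probability that at least one organization is breached:
# one minus the probability that every organization avoids a breach.
p_no_breach = 1.0
for p in per_org_risk:
    p_no_breach *= (1.0 - p)

print(round(1.0 - p_no_breach, 3))  # ~0.105, roughly a 1-in-10 chance per year
```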

It is harder to know how frequently breaches occur at data brokers, as there is currently no federal breach notification standard for them. However, large breaches have been reported, including at LexisNexis, Kroll Background America [109], Experian [110], and Acxiom [111]. PHRs that are not associated with HIPAA-covered entities are instead subject to FTC breach notification requirements [112].

Losing Trust

Although the public routinely gives away a great deal of personal information, medical privacy remains uniquely important to most people, as underscored by the very existence of HIPAA and HITECH. The use of technology in medicine is widely supported, but concern remains about the security of the medical information that is protected by HIPAA, such as that in EMRs, as summarized in Table 2. In a study of psychiatric outpatients, almost 90 % had concerns about confidentiality with the use of EMRs, such as unauthorized access within a university healthcare system, inappropriate use of information, and stigmatization [126]. There are serious consequences when patients fear their privacy is at risk. Patients may become selective about the information they provide, offering an incomplete or misleading description of their condition. In recent surveys, a substantial number of people said they would withhold data from their physician because of privacy concerns related to technology, as shown in Table 3. Patients who are worried about privacy are also less likely to seek care or return for follow-up treatment, or may seek care outside of their provider network, undermining the benefits of care coordination [126, 128].

Table 2 National survey findings of adults in the US regarding privacy concerns about HIPAA protected medical records
Table 3 National survey findings of adults in the US on withholding medical information due to privacy concerns related to technology

Much of the general public is unaware of the large amount of medical and health data being amassed outside of HIPAA confidentiality protections. As the public becomes more informed about the secondary market for health data, concern about privacy and security of all medical data is likely to increase. This, in turn, may dissuade more people from seeking help or revealing the information to physicians. This is of particular concern to psychiatry, since patients with mental disorders are more likely to withhold information from their doctors than patients with other serious illnesses [1].

Protecting Data

The first step in protecting medically related data from collection by data brokers and Internet companies outside of HIPAA protection is to recognize the scope of the problem. A detailed discussion of the actions needed to address this complex problem is outside the scope of this review; they include steps specific to individual activities, devices, and applications, as well as changes to federal and state laws.

Conclusions

Large quantities of health data are being created outside of HIPAA protection, primarily by consumers. Most of the data generated by consumers are controlled by data brokers and Internet companies that have no involvement in patient care and no training in medical ethics. Data brokers are combining health data with other consumer data to build health-related profiles, which may increasingly be used to infer individual health status. The results of these predictive profiles may have adverse impacts regardless of their accuracy. As knowledge of data brokers becomes more widespread, more patients may avoid healthcare or withhold data from physicians because of privacy concerns, which may have especially serious consequences in psychiatry. The far-reaching problems relating to the use and protection of medical and health data outside of HIPAA need to be addressed through broad collaborations of medical, legal, consumer, and technical expertise. In the interim, measures are needed to increase awareness among both clinicians and patients of the growth of medical and health data outside of HIPAA protection.