1 Introduction

Calls to revise the US system of human research protections to adapt to changes in the practice of medicine and biomedical research are not new. Pragmatic clinical trials [1], community engaged research [2], and learning health systems [3] all pose unique ethical challenges that could not have been envisioned when the regulatory system was developed in the 1970s. Similarly, the use of “big data” in medical research has received increased attention in recent years as conducting such research has become cheaper and easier due to advances in computing technology. “Big data” is a somewhat vague term and used in a variety of different ways [4,5,6,7,8]. Here we focus specifically on the use of existing patient data combined across institutions in research that is subject to the Common Rule (e.g., federally funded or conducted at research institutions that choose to uniformly apply the Common Rule), and the application of artificial intelligence (AI) to that big data, particularly in research related to mental health.

The relationship between big data and AI in medical research is, in some respects, essential. IBM defines big data in terms of the three “V”s of volume, variety, and velocity [8]. Volume is obvious, and we discuss its application to medical data below. Velocity refers to the speed at which data can be created and accessed, and is relevant to big data and AI in medical research. However, the most salient “V” for this topic is variety. Daniel O’Leary describes the multiple sources and varieties of data, including wearable electronic devices that monitor our health, social media, phones, and radio frequency identification (RFID) [9]. AI becomes essential to handling variety when we consider “unstructured data,” meaning data that are simply amassed and lack any predefined structure or organization. As O’Leary observes, “Under situations of large volumes of data, AI allows delegation of difficult pattern recognition, learning, and other tasks to computer-based approaches” [10].

This intersection of AI and big data raises ethical challenges, specifically for privacy, confidentiality, and informed consent [11,12,13]. Privacy and confidentiality concerns are only magnified when the focus of research is on mental illness or other stigmatized health conditions. Therefore, we argue, this intersection necessitates novel thinking around how to best fulfill the Belmont principle of respect for persons when conducting such research, particularly in the absence of any specific guidance from current research regulations for research using AI.

2 Big Data and AI in Biomedical Research

Opportunities abound for health researchers to aggregate and analyze data originally collected for non-research purposes. Perhaps the best example of the creation of very large, inter-institutional data sets for health research is PCORnet, an initiative funded by the Patient-Centered Outcomes Research Institute (PCORI) that seeks to combine 11 Clinical Data Research Networks (CDRNs), 18 Patient-Powered Research Networks (PPRNs), and one coordinating center to create a database of 100 million covered lives [9]. The CDRNs are health system-based networks, created by linking the clinical data warehouses of large institutions, while the PPRNs are operated and governed by patients and their partners. For example, the Mental Health Research Network (MHRN), a CDRN, combines data from 13 health systems that serve approximately 12.5 million patients across 15 states (17% of whom have a mental health condition) [14]. The MoodNetwork PPRN aims to enroll at least 50,000 patients with major depressive disorder and bipolar disorder to provide longitudinal data through medical records and surveys and potentially participate in prospective comparative effectiveness studies [15].

The data in repositories such as PCORnet exist in both structured and unstructured formats, and AI can enable researchers to create structure where it is lacking. A unique challenge for big data research on electronic medical records (EMR) lies in mining the typed “free text” notes that clinicians enter into the EMR. Notes have great research potential, but placing them in a structure that allows researchers to analyze them requires AI, as it would be impossibly complex and time-consuming for people to do this unassisted. As O’Leary notes, “Natural language, natural visual interpretation, and visual machine learning will become increasingly important forms of AI for big data” [9]. Natural language processing in particular can enable AI to comb through free-text notes in the EMR, provide structure, and ultimately enable their use in research.
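To make the idea of imposing structure on free text concrete, consider the minimal sketch below. It is purely illustrative: the note text, terminology lexicon, and field names are hypothetical, and production clinical NLP systems rely on trained language models rather than simple keyword matching.

```python
# Illustrative sketch only: a toy extractor showing how free-text EMR notes
# might be mapped to structured records. All terms and fields are hypothetical.
import re
from dataclasses import dataclass, field
from typing import List

# A tiny, hypothetical lexicon standing in for a clinical terminology.
MEDICATIONS = {"sertraline", "lithium", "fluoxetine"}
DIAGNOSES = {"major depressive disorder", "bipolar disorder"}

@dataclass
class StructuredNote:
    medications: List[str] = field(default_factory=list)
    diagnoses: List[str] = field(default_factory=list)

def structure_note(text: str) -> StructuredNote:
    """Scan a free-text note for lexicon terms and emit a structured record."""
    lowered = text.lower()
    return StructuredNote(
        medications=sorted(m for m in MEDICATIONS
                           if re.search(rf"\b{re.escape(m)}\b", lowered)),
        diagnoses=sorted(d for d in DIAGNOSES if d in lowered),
    )

note = ("Pt reports improved mood on sertraline 50 mg. "
        "History of major depressive disorder; denies SI.")
print(structure_note(note))
# StructuredNote(medications=['sertraline'],
#                diagnoses=['major depressive disorder'])
```

Real pipelines replace the lexicon lookup with statistical or neural models, but the input/output relationship is the same: free text in, structured record out, at a scale no human reader could match.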

3 Big Data Health Research and the Belmont Principles

Perhaps most obviously, PCORnet and similar research that relies on large data sets poses challenges to informed consent. Whether or not this was the original intention of its authors, the Belmont framework for ethical human subjects research and the resulting federal regulations elevate individual prospective informed consent above all other ethical considerations, prioritizing the principle of respect for persons above beneficence and justice while also making respect for persons synonymous with informed consent. It is therefore no surprise that discussions of the ethics of health research using big data tend to focus on the challenges of informed consent. Simply stated, the key ethical challenge of big data health research lies in balancing respect for persons with the potential benefits. However, when respect for persons is inappropriately and narrowly conceived as individual prospective informed consent, as many have argued [16, 17], this sets up big data research for ethical failure. Given the size of some data networks, the problem of informed consent in the context of big data seems intractable. Consent from all subjects is not merely “impracticable” (to use a regulatory term); it is impossible. In minimal risk research, when informed consent is not possible, the alternative is usually simply to waive consent and be done with it. However, we argue that a waiver, even when ethically permissible, does not demonstrate respect for persons if we interpret the spirit of that principle to require extra protections not just for those with diminished autonomy, such as persons with mental illnesses that impair their capacity to consent, but also for those who cannot give truly informed consent due to practical constraints. Here, we would like to stimulate discussion and analysis of other processes and measures capable of demonstrating respect for persons, more broadly defined, when informed consent is not possible. In order to do this, we must first discuss privacy and confidentiality in the context of health research that relies on big data, as the most likely and potentially most severe harms in such research would come from an informational breach.

4 Privacy and Confidentiality

In research, privacy is commonly understood as pertaining to information about which an individual has a reasonable expectation that access is controlled by the individual, whereas confidentiality is commonly understood as pertaining to information that an individual has entrusted to another, with an understanding that the information will only be used for certain purposes. The primary risks to subjects in research that uses large data networks are those potential harms that might result from a breach of confidential information. There is also the risk of dignitary harm as a result of a perceived invasion of privacy.

When patients discuss symptoms with their therapists, agree to take medications, or discuss mental health diagnoses, they generally believe that this information will not be shared with anyone except other health care providers and third-party payers. In reality, however, this information is frequently accessed by researchers, most commonly in chart review studies. Such research is done with Institutional Review Board (IRB) approval, in a manner consistent with the Code of Federal Regulations that govern research, and without the patient’s consent. In ethical terms, the patient provides her physician with information, often sensitive information, in confidence and with the understanding that it will be kept private, and that information (de-identified) is then shared, without her consent, with a researcher she has never even heard of. In a strict sense, the patient’s privacy has been violated and her confidentiality breached. This scenario occurs innumerable times every day across the United States, and importantly, it is allowable under the federal regulations as long as other protections are in place. It is important to recognize that we do violate privacy and we do breach confidentiality when we engage in such research, and that it is deemed ethically acceptable when appropriate steps are taken to minimize the possibility of data breaches (e.g., by not collecting and storing identifiers).
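One concrete form such protections can take is pseudonymization of record identifiers before data are shared. The following is a minimal sketch, assuming a keyed hash whose secret key is held by an honest broker rather than the researchers; the record fields shown are hypothetical.

```python
# Illustrative sketch only: keyed-hash pseudonymization, one common technical
# safeguard when clinical records are shared for research without identifiers.
import hmac
import hashlib

# Hypothetical secret key; in practice it would be generated and held by an
# honest broker and never shared with researchers.
SECRET_KEY = b"example-key-held-by-honest-broker"

def pseudonymize(mrn: str) -> str:
    """Map a medical record number to a stable, non-reversible research ID."""
    return hmac.new(SECRET_KEY, mrn.encode(), hashlib.sha256).hexdigest()[:16]

# A hypothetical record: the direct identifier is replaced before sharing.
record = {"mrn": "0012345", "dx": "bipolar disorder", "med": "lithium"}
shared = {"research_id": pseudonymize(record["mrn"]),
          "dx": record["dx"],
          "med": record["med"]}
print(shared)
```

Because the same medical record number always maps to the same research ID, records remain linkable across data sets, but the mapping cannot be reversed without the key.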

Under HIPAA, any disclosure of mental health therapy notes requires patient authorization, but certain parts of an electronic health record related to mental health are not considered therapy notes, such as prescriptions and medication monitoring, modalities of treatment, results of clinical tests, and summaries of treatment plans, symptoms, and progress. Moreover, persons with mental illness confide details of their symptoms, diagnoses, and medications to non-mental health practitioners, and these details, which may have serious legal or employment ramifications if breached, may end up in various places in the medical record, including as free-text notes that are not protected as therapy notes.

Because of the potential dignitary harms of invasions of privacy, the sensitive nature of mental illness, and the significant harms that could result from a breach of confidential mental health information, we will later propose that, as an ethical requirement, researchers should think proactively about how to engage patients and communities when conducting AI research in mental health that uses data from electronic health records.

5 Notification and Broad Consent: Ethically Insufficient

Research suggests that people generally favor the use of their data for health research [18]. This finding is used to justify both notification (informing without obtaining consent) and broad consent (obtaining consent without fully informing). While many institutions seek legal and ethical cover by notifying patients that their data might be used for research, this is ethically insufficient for two reasons. First, these notices are often buried in a HIPAA privacy notice within the clinical consent documents, and so are read by few patients. Second, such notices offer no opt-out option. Suppose, however, that a patient did read and understand such a notice in her hospital’s privacy policies and simply decided that allowing her data to be used for research is consistent with her values. In that case, there is no violation of confidentiality or privacy. She need not have full information about the research to be conducted, nor any information beyond the fact that her data may be used for future unspecified research (with standard protections). Perhaps one of her deep moral convictions is that people should help others, she sees such research as an instance of helping others, and so she agrees to allow her data to be used for research. Perhaps she is passionate about reducing the stigma associated with mental illness and, as a result, believes strongly in sharing her mental health information with researchers. The specific ethical reason is not important. Competent adults do not need full and comprehensive information to make autonomous, informed choices [19]. What is important, ethically, is whether or not the patient or subject consents to the data use.

There have been novel proposals to reevaluate the means by which we obtain consent, including blanket, broad, and dynamic consent [19,20,21,22]. However, these proposals still fall short of the traditional aims of informed consent in the context of big data, as potential research subjects cannot know at the time of consent the studies to which they are in essence consenting. Instead, they consent to vague categories of research.

6 A Broader Conceptualization of Respect for Persons and a Balance with Other Principles

No Belmont principle is absolute, and the Belmont Report exhorts us to identify the relevant ethical principles and balance them against one another. There are great potential benefits to big data health research, especially when balanced against the very small chance of harms resulting from an unintended informational breach. The ethical consideration typically becomes whether or not the benefit of such data research outweighs the affront to respect for persons. This balancing, however, is not straightforward, as respect for persons and beneficence in this case appeal to fundamentally different ethical concerns.

We seek to reframe the issue: if the risks are minimal, consent is impracticable, and appropriate confidentiality protections are in place, waiving informed consent can be ethically permissible only if researchers make a genuine effort to demonstrate respect for persons through one or more of the strategies we suggest below.

The federal regulations that govern research recognize the ethical tension between respect for persons and beneficence. 45 CFR 46.116(d) contains a provision for waiving informed consent for certain types of research, such as research on data from clinical records. To grant such a waiver, an IRB must document that four conditions are met, the second of which is that the waiver or alteration will not adversely affect the rights and welfare of the subjects. The difficulty lies in how this condition pairs rights with welfare. It makes perfect sense to consider whether waiving the requirement for informed consent will adversely affect the welfare of potential research subjects; this is a straightforward application of beneficence, a simple risk/benefit calculation. Of course, in big data research on mental illness, stigma and the unique harms that could result from a breach of confidential information, including legal and employment harms, must be taken into account, but the likelihood and magnitude of these harms can be anticipated and weighed against the potential benefits of the research. What does not make sense is asking whether the rights of a potential subject will be adversely affected. While benefit and harm are terms amenable to considerations of degree, rights are not. Rights are simply either violated or not; they cannot be “adversely affected.” If an individual is denied his right to vote, we do not say that his rights have been adversely affected, as though his right were decreased. It has been violated. Rights simply do not admit of degrees as harms and benefits do. So, attempting to balance one Belmont principle (beneficence) that admits of degrees and can be increased, decreased, balanced, or ignored against another Belmont principle (respect for persons) that, at least in this case, is simply upheld or not, is futile.

We could simply state that in evaluating such research, we recognize the great potential for benefit and the minimal potential for harm, and so are justified in waiving informed consent. However, it does not follow that our ethical responsibilities with regard to respect for persons have been discharged. This is the key to the concern noted above: thinking about respect for persons only in terms of informed consent will lead us to waive consent and be done with it. We argue that, in the era of AI and big data, we can and should conceptualize respect for persons as something broader than the right to self-determination through informed consent. Doing so develops respect for persons as something that admits of degrees, and the ethical obligation of researchers and IRBs then becomes thinking through how to better balance respect for persons with beneficence, rather than simply determining if and when consent can be waived. To do this, we should explore other potential means of demonstrating respect for persons that do not rely solely on informed consent, shifting the focus away from rights-based thinking. We can look to other models of research for means of doing this, primarily patient and other stakeholder engagement, and apply them to AI research on mental health.

7 Beyond Informed Consent

We would like to suggest ways that researchers might demonstrate respect for persons, not through individual informed consent but through patient and other stakeholder engagement. These suggestions are heavily influenced by the traditions of community engaged (CEnR) and community-based participatory research (CBPR), research conducted under the emergency exception for informed consent (EFIC) regulations, as well as some of the recent work in biobanking. Ranging from least to most “engaged,” we recommend notifying potential research participants that research using their personal health data may occur; sharing information about research results with the public; consulting with individuals who represent the interests of potential research participants; and including public members in research oversight activities at the institutional level.

CEnR and CBPR in mental health research are not new. These strategies have been used in mental health research for some time, and guidance exists on the overall ethical approach to mental health research [23, 24], as well as on utilizing such methods in specific populations [23, 25]. What presents a novel challenge is utilizing these methods in mental health research that leverages big data and AI. Identifying representative stakeholders from large data sets being used for multiple types of studies may be challenging. Additionally, when engaging stakeholders in clinical trials of medications or in community-based intervention research, the research is easier to explain and the aims more tangible than in AI research. However, as AI becomes more integrated into our daily lives, citizens are becoming more interested in its potential harms and benefits and may be more likely to want to engage in research partnerships.

Notification

There is tremendous value in letting the community of potential research participants know what research you are planning to do. Time and time again, institutional transparency has proven to have great extrinsic as well as intrinsic value. Many academic medical centers use a variety of strategies to let their patients know about the kinds of research that are going on and the fact that their medical records may be accessed, appropriately, for certain kinds of research. One common example is ResearchMatch, a registry developed and utilized by a consortium of Clinical and Translational Science Award (CTSA) institutions [26]. Such practices should be implemented more widely, as they promote public awareness of research. To be sure, the regulatory conditions for waiving consent in minimal risk research include a requirement for notification “after participation,” but they are vague about which studies require notification, and it is unclear why notification should wait until “after participation.” Notification should be considered across many more types of research, although evidence is needed regarding the most effective and respectful forms of notification.

Information

There is also tremendous value in letting people know the results of research, positive or negative. Initiatives are underway to improve the dissemination of results of federally funded research and of clinical trials that are federally funded or conducted to gather data for FDA applications (the NIH Public Access Policy, clinicaltrials.gov). However, the research community could do much better at this, including by ensuring that results are published where lay people are likely to read them. This practice is common in CBPR and CEnR and can maintain and improve community/academic partnerships [27,28,29]. For example, Dirks et al. report on how community member involvement in disseminating the results of research on a decision-support tool to aid in depression management can broaden reach and increase the acceptability of the information [30].

Consultation

Asking potential research participants “What do you think about what we’re planning to do?” is qualitatively different from simply notifying them. It not only demonstrates respect for persons, it can also improve the relevance of research questions and findings (though, of course, consultation only demonstrates respect for persons when researchers actually listen).

EFIC research invokes a specific regulatory framework (the “Final Rule,” 21 CFR 50.24) that allows researchers to conduct research that is greater than minimal risk yet waives the requirement for informed consent, as long as certain requirements of the Final Rule are met. Such research is conducted in settings in which consent is not possible (e.g., heart attack victims who are unable to provide consent due to their condition and cannot be prospectively consented, as it is unknown who will suffer a heart attack), yet the research is essential to advancing healthcare knowledge. This is important for two reasons. First, this is the only greater than minimal risk research that the regulations permit without informed consent, so the additional safeguards become key [31,32,33]. Second, it establishes a precedent for regulating other means of demonstrating respect for persons beyond prospective written informed consent. One of those safeguards is that the researchers, with approval and oversight from the IRB, must conduct community consultation for the research study. The point of community consultation is not to gain community consent or proxy consent. Rather, the point is to consult with the community and learn whether or not the community thinks that this kind of research ought to be done in their community and what changes, if any, should be made to the research plan to make it more acceptable to the community. This provides an ethical model for using data for research purposes without consent. If it can be established that, just as in EFIC research, consent is not feasible and the benefit is great, a preferable, ethical alternative to doing nothing at all would be to engage the community in conversations about the research or use of data.

Several different models of community consultation have been employed in EFIC studies. Models include querying a convenience sample, random digit telephone surveys, targeted focus groups, large community meetings/public forums, community advisory boards, or some combination of these methods [34,35,36,37,38,39,40]. Each of these approaches has distinct advantages and disadvantages, and the appropriateness of each approach will vary depending on the purpose of the research.

Even better is to obtain ongoing consultation through mechanisms like community advisory boards (CABs), which are common in community engaged research (CEnR) [41]. CEnR is an approach to research that “provides communities with a voice and role in the research process beyond providing access to research participants” and may include working with communities to identify research priorities, systematically studying the views of community members regarding research protocols prior to implementation, community advisory and review boards, hiring community members as part of the research team, and including community members as co-investigators [24]. Unlike EFIC, CEnR is not a regulatory model but rather an approach to research that follows a set of principles aimed at fulfilling ethical and process goals, such as the establishment of an equitable, sustainable partnership between academic researchers and community partners (CTSA Principles of Engagement). Funders of AI research might consider requiring consultation or other forms of engagement when individual informed consent is not practical.

In CEnR, stakeholder engagement is not meant to be a replacement for individual informed consent, but when done correctly it can demonstrate respect for persons, as well as for communities qua communities. The challenge is how best to engage key stakeholders in the research process in a manner that is not merely ad hoc or after the fact, but in the spirit of respect for persons. Developing plans and guidelines before engaging stakeholders fails to involve community members in decisions regarding data use and does not foster a sense of ownership. A successful CEnR partnership requires engaging the community early in and frequently throughout the process, eliciting input on all aspects (e.g., identifying the concerns and needs of the community, employing community members as members of the research team, forming a community advisory council), and actually incorporating community input into the research design, implementation, analysis, and dissemination.

8 Include Participant Representatives in Project Leadership

In CEnR, relationships are ideally bi-directional, that is, respect for all parties is encouraged, acknowledging that the researcher has as much to learn from the community as the community does from the researcher. In some research, particularly when individual prospective informed consent is not possible, including community partners as part of the research leadership team can be essential in understanding how to best approach the community regarding the use of their data in research.

Specific to the use of AI in big data research in mental health, CEnR can help to truly engage the public in the oversight and goals of such research. Additionally, by engaging communities in a bi-directional manner in such research, the research will be improved by addressing the research priorities of communities as well as making the research more transparent, which can help ameliorate some of the well-publicized concerns that the public has about AI research [42, 43].

Increase Public Participation in Institutional Research Oversight

This suggestion differs from the others in that it refers to including lay members of the public in research activities at the institutional level, and is therefore not necessarily something that can be implemented in a specific study. Many calls have been made over the past several decades for more public members on IRBs [44]. Such calls note that while the regulations require non-affiliated and non-scientist members to serve on the IRB, a non-scientist member might be an administrative assistant from the institution, and a non-affiliated member might be a retired physician. Such examples likely fall short of providing a community voice for the lay public. Such an effort would also broaden respect for persons to communities and not just individuals.

Many if not all of these activities can also be justified on other ethical grounds beyond respect for persons, but thinking of them in terms of what they can do to demonstrate respect for persons is helpful for big data health research studies in which individual informed consent would be impossible to obtain from every potential participant.

9 Stakeholder Engagement in AI Mental Health Research

AI research specific to mental health is beginning to grow, but most of it is still in the proof-of-concept phase. Such research uses not only EMR data but also data from patient-reported outcomes, brain imaging, novel monitoring systems such as smartphones, and social media. Much of this research has been aimed at improving diagnostic clarity, identifying mental illness earlier, personalizing treatment, or identifying patients at increased risk of suicide [45]. The extent to which patient stakeholders have been engaged in this research is unclear.

How might the lessons of community engagement be applied to AI research that uses big data in mental health? CABs are a ready example. Local and national mental health advocacy organizations could be contacted to provide representation on advisory boards for mental health research with big data. Such CABs could also be asked to weigh in on research priorities, the use of confidential patient data, and broader community engagement strategies. For example, in the MoodNetwork, patient stakeholders from a variety of different advocacy groups were involved in developing the network website, patient surveys, and recruitment materials [46].

A unique strength of the mental health community is the number of advocacy groups, many of which are already actively engaged in research. Groups like the National Alliance on Mental Illness (NAMI) and Mental Health America serve persons with mental illnesses and their loved ones daily, and could be ideal groups to engage in the guidance and oversight of this research. Additionally, there are organizations that advocate on behalf of specific diseases or populations. Dryhootch is a veteran-led organization that provides, among other things, peer counseling for veterans with mental health and substance use disorders.

Researchers could use CEnR strategies to work with Dryhootch to apply AI research to issues of veterans’ mental health. In such a scenario, veterans themselves, in conjunction with their academic partners, would identify the mental health issues of relevance and importance to them. Additionally, veterans would have a seat at the table in the design of the research and the data analysis, lending credibility to the use of AI in big data research on veterans’ mental health and building trust in the research enterprise.

10 Conclusion

The use of AI to conduct research on big data for purposes other than those for which the data were initially collected poses unique ethical challenges. Fortunately, there are examples, particularly from EFIC research and CEnR, that provide models for conducting this important work in a manner that adheres to the highest ethical standards. Engaging patient and other community stakeholders in meaningful, bi-directional, sustainable partnerships can help researchers demonstrate respect for research participants, even in the absence of direct interaction with individual participants.

In discussions of ethics of AI and big data health research, we encourage less focus on the technical aspects of informed consent and more imagination regarding ways to demonstrate respect for persons. This can be accomplished through implementation of some or all of the engagement strategies we have outlined. The engagement strategies presented here also promote other ethical aims, such as the requirement of social or scientific value, the incorporation of more and diverse public voices into the process of independent review, and the elevation of the values of good stewardship of research resources and transparency and public accountability.