Introduction

Genetic biobanking has become a vital component of research investigating the underlying genetic mechanisms of certain common diseases, such as cardiovascular disease, diabetes, and multiple sclerosis. Over the past decade, our understanding of the genetic risk factors involved in common diseases has greatly benefited from technological advancements and the increased availability of large repositories of genetic specimens. Research using these new technologies and genetic biobanks has contributed to identifying risk loci for numerous conditions, such as multiple sclerosis [1] and colorectal cancer [2]; improving treatments and diagnostics for conditions such as prostate cancer [3]; informing tacrolimus dose requirements for kidney transplant patients [4]; and developing diagnostic testing for rare conditions such as pseudoxanthoma elasticum (PXE) [5, 6]. Genetic biobanks will continue to be critical in improving the identification, diagnosis and treatment of individuals with common and rare diseases.

Currently, to understand genomic information and implement appropriate clinical genomic programs, it is critical to have large collections of clinical samples with associated health data and the ability to track health and clinical activities over time. Research using samples from large biobanks is essential in understanding genetic risks for common disease caused by gene variants with small effect sizes and uncommon environmental and other risk exposures that impact health. Furthermore, studies utilizing biobanked samples are useful in developing personalized therapeutics, targeting biomarkers in disease progression and prognosis, and implementing personalized medicine projects [7∙]. The size of the collection, long-term storage of samples, interactions with and consenting of participants, preparation of samples for high-throughput analysis, and the long-term management of expectations and research are what set biobanks apart from other large research endeavors. Managing these differences in an economical and sustainable model is critical for the survival of the biobank [8].

This paper will describe some of the recent developments and controversies associated with biobanking over the past year. We have used the most recent publications involving genetic biobanks in translational medicine studies, and will further discuss the impact this research may have on the greater genetics community and clinical care. Recent developments on the topic of sample procurement and preservation are also critical issues in the field of biobanking, but will not be reviewed here.

Genetic biobanks are usually large collections of human genetic specimens (DNA and/or RNA) that are linked to relevant health and personal information. Over the years, biobanks have evolved in response to the changing needs of technology, investigators, and regulatory pressures, resulting in the creation of a variety of biobanks. Population-wide biobanks have been established in many different countries, such as Iceland, the UK, Estonia, Canada, the United States, Finland, Australia, and South Korea, to name a few. Population biobanks have largely been involved in international efforts to harmonize data and samples, allowing for meaningful collaborations that span many countries, funding agencies, governance structures, and populations [7∙]. This type of cooperation among biobanks leads to increased sample size and statistical power, which is particularly important when studying rare diseases and gene variants with small effects [9]. Hospital-based or single-institution biobanks, which may include smaller collections of samples or samples from multiple studies with common storage and governance, may increase their power by joining together to form networks or consortia to execute research studies [10∙]. Other biobanks focus on amassing large collections of samples from persons with specific conditions such as AIDS [11∙], diabetes [12], prostate cancer [13], or psoriasis [14]. Rare disease biobanks and biobanks created through consumer websites are increasingly becoming available, particularly as disease advocacy organizations (DAO) and genetic testing companies recognize the ability of motivated organizations and individuals to accelerate translational research [15, 16].

In response to the needs of researchers to access large numbers of samples for genomic research, biobanks have implemented different models to recruit as many participants as possible. Some biobanks are created by compiling collections of samples and data from multiple research projects, while other biobanks enroll participants directly into the biobank. Both models have been fairly successful, with many biobanks having amassed thousands of participant samples with associated clinical and environmental data for genetic research purposes. Virtual biobanks have also been created to help investigators locate samples from different biobanks for testing and data mining, addressing the needs of investigators who require diverse samples or enough samples that meet specific criteria [11]. Furthermore, tools such as the Informatics for Integrating Biology and the Bedside (i2b2) platform allow biobanks connected to electronic clinical information sources to integrate and analyze large amounts of data from multiple health record systems [17].

As biobank samples are increasingly used for translational research and clinical implementation projects, each biobank must address questions about how to maintain ongoing engagement with participants, which consenting methods are best, whether to return personal results, and other policy issues. Governance structures and engagement of study participants and other consultants will help to ensure that these questions are addressed adequately.

Two critical policy issues have a far-reaching impact on the use of samples: how participants donate their genetic and health information, along with the permissions associated with the use of their data, and how participants are contacted following enrollment.

The Nature of Informed Consent in Genetic Biobanks

There is much debate surrounding the use of informed consent when enrolling participants into a biobank and, if informed consent is used, what the consenting process should look like. Currently in the United States, research involving human subjects is subject to federal regulations, including the requirement to obtain informed consent. Under these regulations, a human subject is “a living individual about whom an investigator (whether professional or student) conducting research obtains (1) Data through intervention or interaction with the individual, or (2) Identifiable private information” [18]. Whether a biobank defines research on its samples as human subjects research dictates the consenting process and how participants are enrolled into the biobank.

Biobanks that have determined that their repository does not meet the criteria for human subjects research under United States federal guidelines do not need to obtain consent from their participants for research performed on their samples. For some biobanks, such as those that allow research to be performed on dried blood spots left over from newborn screening, this policy has recently been the source of substantial controversy [19, 20∙]. Other biobanks, such as Vanderbilt University Medical Center’s BioVU biobank, have determined that they are not performing human subjects research, but have decided to give patients the choice of opting out of participating. They have designed their biobank around using residual blood samples and de-identified data, which exempts them from having to obtain consent from their participants and follow the other federal regulations for the protection of human subjects [21].

In the majority of biobanks, including some that use residual samples and those that collect samples at the time of enrollment, participants do go through some type of informed consent process to enroll [22–24]. However, there is controversy surrounding the nature of that consent. Many biobanks have opted to consent participants using a broad consent for future use of the sample, where the type of genetic research that might be performed on samples is undefined [8]. Some have criticized this consent model as too vague, arguing that participants cannot truly give informed consent for undefined research purposes [25, 26∙]. Others, using the same ethical principle of autonomy, state that giving broad consent can be informed consent [27]. A frequently raised alternative model, consenting participants prior to each use of their sample, has been deemed impractical [25, 27, 28] despite the preferences of the general population [29]. A tiered consent has also been suggested, where participants are allowed to choose from a predetermined list of options and limitations that would govern the future use of samples, but similar concerns with this model have also been raised [19]. More recently, an online research portal has been proposed as a consent model; it would give researchers ongoing engagement with biobank participants, allowing participants to consent to new types of research in “real time”. However, there is still some debate regarding the use of this model and whether it addresses the ethical concerns better than the current broad consent model [26∙, 30].
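To make the tiered consent model described above more concrete, the following is a minimal sketch of how a participant's selections might be recorded and checked before a sample is released. The class, field names, and research categories are hypothetical illustrations and are not drawn from any biobank or from the cited literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredConsent:
    """Hypothetical record of the options a participant selected at enrollment."""
    # Research categories the participant agreed to (illustrative labels only).
    permitted_categories: frozenset
    # Whether the participant agreed to be contacted after enrollment.
    allows_recontact: bool = False


def may_use_sample(consent, research_category):
    """Return True only if the proposed research falls within a selected tier."""
    return research_category in consent.permitted_categories


# Example: a participant who selected two categories from the predetermined list.
consent = TieredConsent(
    permitted_categories=frozenset({"diabetes", "cardiovascular disease"}),
    allows_recontact=True,
)
diabetes_ok = may_use_sample(consent, "diabetes")
cancer_ok = may_use_sample(consent, "prostate cancer")
```

Under this sketch, a broad consent would correspond to a single tier covering all future research, which is one way to see why the concerns raised about broad consent can carry over to the tiered model.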

Returning Results to Biobank Participants

Returning information to research participants has been the subject of much discussion and debate. Many biobanks inform their participants about research results by providing aggregate information to all participants. This can be in the form of a newsletter or other methods [31]. Personal results of diagnostic tests and biometrics conducted for participation in the biobank are often returned to participants. This may include results of baseline measurements such as blood pressure, body fat, and lung function tests to complete health evaluations [32]. However, there is considerable controversy regarding the ethical obligation of researchers to return individual genetic or genomic results to participants, whether they are incidental findings or research results. Results can be generated from the initial studies (if a biobank is created from data from several studies), studies performed by the biobank, or from secondary research studies using biobank samples.

In the United States, one factor that affects a biobank’s ability to return individual research results is whether the analysis was performed in a Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. CLIA requires that clinical tests be performed only in a CLIA-certified laboratory (http://wwwn.cdc.gov/clia/). As a result, a biobank cannot return a participant’s genomic results that might impact diagnosis, management, or the physician’s or patient’s decision-making unless the analysis was performed, or the results were validated, in a CLIA-certified laboratory.

If genomic results are to be returned to participants, it is unclear what role biobanks should play in this process. Some biobanks have policies on whether or not they return results to participants [33]. Biobanks that use de-identified samples with no method to re-identify them cannot return any individual results to participants. However, for those biobanks that have the ability to identify or re-identify their participants, there are ongoing debates regarding (1) whether there is an obligation to return individual results (whether from their own studies or secondary studies using their samples), (2) which results should be returned, if any, (3) when the results should be returned, (4) who should return the results, and (5) who should pay for all of this (the biobank, the participant, or the researcher who performed the research).

A number of papers have been published as a result of considerable deliberation and scholarly review of this topic. Most of these papers provide guidelines and recommendations as to what type of genetic research results should be returned to participants [34–36, 37∙]. If individual results are to be returned, most guidelines agree that before any results are returned, they need to be scientifically validated, and the nature of the results with regard to the risk of developing a condition, the severity of the condition, and available treatment options needs to be examined [34–36, 38]. For example, Fabsitz et al. recommend that individual genetic research results should be returned in a timely manner if they meet all of the following criteria: (a) The genetic finding has important health implications for the participant, and the associated risks are established and substantial, (b) The genetic finding is actionable, that is, there are established therapeutic or preventive interventions or other available actions that have the potential to change the clinical course of the disease, (c) The test is analytically valid and the disclosure plan complies with all applicable laws, and (d) During the informed consent process or subsequently, the study participant has opted to receive his or her individual genetic results [34–36, 37∙, 38]. Some of the recommendations go into more detail than others, but all recommend considering the risks and benefits as well as ensuring the validity of the data before returning research results. To help make policy decisions regarding returning individual genomic research results to participants, biobanks usually gather input from many sources, including community and scientific advisory committees and surveys of their target populations. A number of biobanks, such as the Coriell Personalized Medicine Collaborative, have successfully managed participant relationships in a manner that encourages ongoing interaction with participants and lays the groundwork for conversations about returning genomic results [39].
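Because the recommendation attributed to Fabsitz et al. above requires that all four criteria be met, it can be read as a simple conjunction. The sketch below illustrates that reading only; the record structure and field names are hypothetical, and in practice each field would reflect expert judgment rather than a stored flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneticFinding:
    """Hypothetical summary of a finding; field names are illustrative only."""
    important_health_implications: bool     # criterion (a), first part
    risk_established_and_substantial: bool  # criterion (a), second part
    actionable: bool                        # criterion (b)
    analytically_valid: bool                # criterion (c), first part
    disclosure_plan_lawful: bool            # criterion (c), second part
    participant_opted_in: bool              # criterion (d)


def unmet_criteria(finding):
    """Return the labels of criteria (a)-(d) that the finding does not meet."""
    checks = {
        "a": finding.important_health_implications
        and finding.risk_established_and_substantial,
        "b": finding.actionable,
        "c": finding.analytically_valid and finding.disclosure_plan_lawful,
        "d": finding.participant_opted_in,
    }
    return [label for label, met in checks.items() if not met]


def should_return(finding):
    """A result qualifies for return only when every criterion is met."""
    return not unmet_criteria(finding)
```

Reporting which criteria are unmet, rather than a bare yes or no, mirrors the point that a valid and actionable result would still be withheld if, for example, the participant had not opted to receive individual results.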

Privacy and Biobank Samples

The privacy of personal information has often been cited by participants as a major concern regarding biobank enrollment [40–42]. This is particularly true in the United States, where concern about third-party access to private information is especially acute. Recent studies have suggested that this may not be as critical an issue in non-US populations [43]. However, in the US, participants have reasonable cause for concern, as there are many examples of companies and health care systems where breaches of confidential data have occurred. Biobanks with linkages to electronic health data have developed a variety of methods for dealing with these issues, including anonymizing samples, de-identification and coding of samples and data, and un-coupling genetic and health data from identifiable information [44]. As most biobanks serve as honest brokers, in that they provide de-identified samples and data to third-party investigators for research purposes, they must have strict policies and procedures in place for managing and protecting the confidentiality of the information and samples that have been entrusted to them. The issues of privacy become even more critical as biobanks harmonize and share data, and as sharing research data becomes a requirement of funding agencies [45]. Those involved in genomic research, including funding agencies, regulatory bodies and investigators, will need to balance the interests and values of research participants while making policy decisions regarding genomic research.
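As an illustration of the coding and un-coupling approaches mentioned above, the following minimal sketch replaces direct identifiers with a study code and keeps the linkage separate from the research data. It assumes a keyed hash (HMAC-SHA256) as the coding technique, which is one possible choice and not a method described in the cited work; the field names and the list of identifying fields are hypothetical, and removing these fields alone should not be assumed to satisfy any regulatory definition of de-identification.

```python
import hashlib
import hmac

# Hypothetical list of directly identifying fields held back by the biobank.
IDENTIFYING_FIELDS = {"name", "date_of_birth", "medical_record_number"}


def study_code(participant_id, secret_key):
    """Derive a stable code from an identifier using a key only the biobank holds."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def split_record(record, secret_key):
    """Separate a record into coded research data and a linkage entry.

    The research portion can be shared with investigators; the linkage
    entry stays with the biobank acting as honest broker.
    """
    code = study_code(record["medical_record_number"], secret_key)
    research = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    research["study_code"] = code
    linkage = {code: {k: record[k] for k in IDENTIFYING_FIELDS if k in record}}
    return research, linkage
```

Because the same identifier and key always yield the same code, new clinical data can be attached to an existing coded record over time, while a biobank that discards the key and linkage table would lose the ability to re-identify participants and therefore to return individual results.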

Research Using Biobanks

The characteristics of a biobank, such as the types of samples, the population sampled, and recruitment methods, strongly influence the type of research conducted using its samples and data. Examples include: (1) Population-based biobanks with a high response rate to recruitment from a population are well-suited for research examining the incidence of genetic conditions and defining genetic factors associated with common diseases [46], and (2) Disease-focused biobanks with lower response rates to recruitment, or single-site biobanks, are good repositories for research that identifies genetic responses to treatments, molecular targeted therapies, genetic and environmental risks associated with a condition, and biomarkers that can better classify disease status and progression [4, 8, 12]. Consent methods and the ability to recontact participants also affect the downstream use of samples. Studies requiring ongoing contact or follow-up surveys and evaluations that cannot be obtained through other means must have access to a population that has agreed to be recontacted and is engaged with the biobank. Additionally, when participants have consented to the use of their samples only for research on a specific disease, the further use of those samples for other types of research is limited. As the use of biobanked samples becomes more common and access to large populations more critical for research, the focus on broad use of samples and data will become even more important.

Research Using Biobanks Linked to Electronic Health Records

Access to clinical and other data sources about participants is an essential component of biobank research studies. Biobanks in a number of countries link to national health or other health-related databases to obtain retrospective and prospective information on their participants. In the United States, the fragmented health system presents challenges to obtaining health data beyond what may be available in a single health system. Further, many have questioned the viability of using electronic health records (EHR) for research purposes. Biobanks linked to large health systems or networks may have more complete medical information than other tertiary care centers, particularly those in urban settings where patients move frequently from one health care provider to another. However, it is possible to conduct studies based on longitudinal EHR data, as demonstrated by research conducted in the eMERGE Network [47].

The electronic MEdical Records and GEnomics (eMERGE) network is an NIH-funded consortium of biobanks linked to electronic medical records that has developed methods and conducted early-stage research demonstrating the usefulness of biobanks in translational medicine research. The eMERGE network currently comprises nine different biobanks, including both adult and pediatric participants. The network has developed tools for genomic research using EHR, such as defining methods for selecting phenotypes through EHR, sharing the phenotypes across multiple institutions and EHR systems, and conducting genome-wide association studies across the network. Additionally, this consortium has helped to define some of the ethical and social issues associated with genomic research and demonstrated methods for securely sharing data and addressing the privacy of genomic and clinically derived data. Some of the issues identified by the network include whether and when to return research results, how to engage biobank participants in discussions about research on their samples, and what the critical components of the consent process for biobank participants are [10∙, 45, 48]. The network is currently applying its experience with EHR and genomic data by studying the return of genomic results to patients through implementing clinical decision support tools and working with physicians to access this information through the EHRs. Biobanks linked to EHRs, such as those in the eMERGE network, provide a unique opportunity to study new ways to interact with health care professionals and patients around genomic information through already established EHR. These studies begin to address the challenge of advancing the science by educating physicians and patients through decision-support tools in the EHR.

Research Using Disease-Focused Biobanks

In contrast to population-based or broad biobanks, whose samples can be used as both controls and cases in studies examining many different types of conditions, a number of biobanks collect samples related to specific diseases [49–51]. Some of these biobanks were established to create a resource of samples and clinical data for purposes of optimizing treatment for patients with a particular common condition and tracking outcomes. Such is the case for the Danish Center for Strategic Research in Type 2 Diabetes (DD2). This biobank has begun to collect samples (blood, DNA, plasma, and urine) from newly diagnosed Type 2 Diabetes patients throughout Denmark during a 5-year period. Clinical information will be gathered through a variety of Danish population-based registries. This nationwide biobank of newly diagnosed Type 2 Diabetes patients is the first of its kind and will provide ongoing information about the progression, treatment and interventions in Type 2 Diabetes with a focus on personalized treatment [52]. Advantages of research from these types of biobanks are the focused nature of the collections and the ability to concentrate knowledge and resources on a specific condition. The increasing emphasis on creating disease-focused biobanks and developing methods for precision medicine is evidenced by the efforts of the National Cancer Institute (NCI), which has created the Biospecimen and Biorepository Research Branch (BBRB) to develop standards and processes for obtaining high-quality samples for research. In addition to developing standards for sample maintenance and informatics systems management, the BBRB also considers guidelines for ethical, regulatory and societal issues related to biobanking [53].

Genetic DAO have also begun to develop their own biobanks to further research on rare conditions, leveraging their relationships with patients and families as well as their extensive knowledge of the rare conditions they represent. A study by Landy et al. [54] found that 45% of respondents to a survey about DAO participation in clinical research were involved with a research registry or biobank. Many DAO have made significant contributions to finding disease genes, as in the case of PXE International, and have made tangible contributions to the development of a clinically available genetic test [5, 6, 15, 55]. As DAO continue to participate in the development and, more significantly, the establishment of biobanks for rare diseases, they will shape the types of clinical research conducted by and through genetic biobanks.

Conclusion

Biobanks are redefining many aspects of research, such as allowing ongoing access to research populations, exploring methods of consent and governance, and creating new models for conducting translational research. The large size of many biobanks, coupled with the enormous potential of EHRs and other electronic health data, places this type of research at the forefront of making significant contributions to health care. While some biobanking methods have proven less productive than others, they provide many lessons regarding appropriate strategies for future research [8]. Redefining aspects of clinical genetic research will also affect the workforce and how results of research will be defined and translated into healthcare. As applications of biobanking research become more relevant to clinical care or involve implementation studies such as in eMERGE II [56], more clinical genetic specialists will be needed, such as genetic counselors who are familiar with the many issues associated with genomic research and can relay personal research results as well as develop educational materials for patients [57, 58]. New paradigms are needed for understanding and relaying research results made possible by current and future genetic technologies as they evolve. Biobank research has demonstrated that it can be instrumental in advancing genetic research and in understanding how the findings can be incorporated into clinical care.