Introduction

Breast cancer (BC) is the most commonly occurring invasive cancer among women, and approximately one million new cases are diagnosed worldwide each year. One of the strongest risk factors for the development of BC is having a close relative affected with the disease. On the basis of the increased relative risk of BC in first-degree relatives of a woman with BC and segregation studies on cases of BC in the families of affected women, it has been estimated that 5 % of these women carry a genetic risk factor transmitted according to a Mendelian dominant model. Twenty years ago, the BRCA1 and BRCA2 genes were identified by genetic linkage studies. Following the cloning of these two major BC genes, diagnostic testing for mutations in BRCA1 and BRCA2 has been routine clinical practice in many developed countries. It facilitates risk estimation and implementation of cancer prevention strategies and has the potential to influence cancer therapy [1, 2].

Since the discovery of the major predisposing genes BRCA1 and BRCA2 and the identification of the syndrome-related high-risk susceptibility genes PTEN, CDH1 and STK11, substantial progress has been made in identifying genetic causes of BC. Despite these advances, about 80 % of the familial relative risk remains unexplained. Current data suggest that a large proportion of the remaining familial aggregations of BC will be explained by several genes with a wide range of associated risks; moreover, sporadic cases may also be explained by a combination of susceptibility alleles [3]. Very simplistically, BC susceptibility genes have been classified into three groups, each associated with a different level of average BC risk and mutation prevalence in the population [4]: high-risk genes such as BRCA1 and BRCA2 in which rare monoallelic mutations convey high risk of BC and OC and explain about 25 % of the familial risk; common very modest-risk SNPs detected by genome-wide association studies (GWAS) explain about another 10 % of genetic susceptibility [5••, 6, 7], and rare, intermediate-risk variants have been identified more recently by a candidate gene approach and the resequencing of large numbers of familial BRCA1/BRCA2 mutation-negative cases and control groups (ATM, BRIP1, CHEK2, PALB2, RAD51C, RAD51D) or by an agnostic exomic sequencing approach [8, 9•, 10•] (Fig. 1).

Fig. 1
figure 1

Timeline for discovery of BC susceptibility genes, features, frequency and size effect of risk alleles

This review focuses on resources and different study designs that have been developed over the last two decades to identify new BC susceptibility genes and discusses the challenges that remain to orientate appropriate clinical management for carriers of a likely deleterious variant in one of these genes.

Hereditary Breast and Ovarian Cancer and High-Penetrance Variants

Rare, large families with multiple-cases of BC affecting several generations provided clear evidence that inherited factors are important causes of BC. The two major BC-predisposing genes BRCA1 and BRCA2 were identified by genetic linkage studies in the early 90s using microsatellite markers. Microsatellites are tandem repeats of simple sequences that occur abundantly throughout the genome. These genetic polymorphisms are stable enough that they can be used in genetic analyses. Linkage analysis statistically compares the genotypes between affected and unaffected individuals within a pedigree and looks for evidence that some alleles of the typed polymorphic markers are inherited along with the disease trait. Different categories can be assigned to putative gene carriers and non-carriers in the family, depending on their age and whether or not they have developed cancer (liability classes). When a marker and the gene harbouring a mutation responsible for the disease are close together on a chromosome, there is a higher likelihood that they will be inherited together as familial recombination events during meiosis are less likely to have interfered. The likelihood that a specific allele of a microsatellite and the studied trait will be inherited together is measured as a logarithm-of-odds (LOD) score. A high LOD score (3.0 or higher) indicates a strong chance that the gene is located near the given marker (≥1000:1 in favour), whereas a low score (lower than −2.0) means that the gene is almost certainly not near that marker locus.

Following the localization of BRCA1 and BRCA2 on chromosomes 17q21 and 13q12, respectively, the two genes were subsequently cloned [1114]. BRCA1 counts 24 exons (22 of which are coding) and BRCA2 counts 27 exons (26 of which are coding). The two genes share little structural homology but are involved in the same cell-signalling pathway. Their protein products (1863 and 3418 amino-acid long, respectively) are expressed in a wide range of tissues and are especially critical in controlling DNA damage repair, helping to prevent the accumulation of mutations in cancer-related genes [15]. Hence, BRCA1 and BRCA2 work as tumour suppressor genes, that is, a mutation in these genes causes loss or reduction in their function. Somatic loss of heterozygosity (LOH), the loss of a normal functioning allele at a heterozygous locus, is a genetic event frequently observed in the multistep process of BC tumour development and progression [16]. Carriers of a germline mutation in BRCA1 or BRCA2 are at risk of acquired loss of the wild-type allele, one of the common mechanisms of inactivation in hereditary BC [17].

Several hundreds of distinct germline mutations have been described for both BRCA1 and BRCA2, with few mutational hotspots [18]. Mutations are spread throughout the genes, and most of them are loss-of-function (LOF) mutation causing premature truncation of the protein, due to nonsense substitutions, small insertions or deletions, splice site alterations or large gene rearrangements. Pathogenic missense substitutions fall in evolutionarily conserved regions of the proteins, in some functional domains (such as RING or BRCT domains in BRCA1). Founder mutations exist in some populations (for example, in families of Jewish ancestry, in French-Canadian families from Québec or in Icelandic and Polish populations), and their presence can aid genetic testing [19].

Identification of a germline pathogenic mutation in patients with hereditary breast and ovarian cancer (HBOC) syndrome allows mutation carriers in the family to be included in cancer prevention programs, which are proven to be life saving. In addition, the absence of a BRCA1 or BRCA2 pathogenic mutation allows relatives reassurance and avoiding preventive oophorectomy. However, a clearly pathogenic sequence variant is found in only 10–20 % of subjects who undergo full-sequence BRCA1 and BRCA2 testing, and one complication resulting from genetic testing is that about the same proportion of patients screened for the BRCA1 and BRCA2 genes are found to carry of a variant of uncertain significance (VUS), i.e., a DNA sequence variant that is not easily classified as either pathogenic or neutral. Usually, VUS are either a missense substitution or a variant altering the splice junction consensus regions but outside of the canonical GT-AG dinucleotides [20]. Interpretation of VUS has become a major challenge for molecular diagnosis laboratories and genetic counsellors as these variants are problematic for the management of patients and their families who receive the ambiguous information. Over the last several years, several databases and open access tools have been created for biologists, clinical cancer geneticists and researchers to facilitate interpretation of novel genetic variations identified in at risk BC patients, such as the UMD-BRCA1/BRCA2 databases available at http://www.umd.be/BRCA1/W_BRCA1/ and http://www.umd.be/BRCA2/brca2, respectively [18], or the BRCA gene Ex-UV LOVD available at http://brca.iarc.fr/LOVD [21]. Several years ago, the Breast cancer Information Core (BIC) aimed to develop a system that could combine data from several independent types of analysis to arrive at a classification for BRCA1 and BRCA2 variants [22]. In February 2008, a Working Group on BRCA1 and BRCA2 VUS convened at the International Agency for Research on Cancer agreed on clinically applicable guidelines for VUS classification and began the diffusion of a tool for evaluation of VUS beyond the BC genetics community. The proposed Bayesian approach involves a multifactorial likelihood-ratio model based on co-segregation, personal and family history, tumour histopathology and co-occurrence of variants in trans. Integrated evaluation of a VUS includes a prior probability based on in silico analyses of the substitutions, both from the perspective of potential effects on mRNA splicing and potential effects on protein function. The classification system involves five classes of variants, each of which is associated with a given probability of being pathogenic. Testing recommendations associated with each class of variants have been proposed. The five category qualitative classifier is explicitly meant to help clinical cancer geneticists with patient counselling [23••], and over the last few years, a series of studies analysed many BRCA1 and BRCA2 variants, and now more 150 variants are classified with reasonable confidence [21, 2427]. The majority of the VUS, however, are still assigned to class 3 (0.05–0.95 probability of being pathogenic) [23••]. As BRCA1 and BRCA2 VUS are individually rare, an international and interdisciplinary consortium named ENIGMA (for evidence-based network for interpretation of germline mutant allele) was been set up in 2009 to gather enough genetic, clinical and histopathological information from a worldwide network of laboratories and hospitals in order to define clinical relevance of the problematic variants (Table 1) [28].

Table 1 Consortia aiming at deciphering the heritability of BC

Even classification of clearly pathogenic mutations themselves (“Class 5 variants” with >0.99 probability of being pathogenic) has several challenges. Family-based studies showed that mutations in BRCA1 and BRCA2 are inherited in an autosomal dominant fashion with incomplete penetrance: this means that the majority of subjects carrying a BRCA1 or BRCA2 mutation will develop cancer during their lifetime, but not all. Moreover, there is evidence that variation in penetrance estimate exists among mutation carriers [29]. Genetic epidemiology studies showed that breast and ovarian cancer risks in BRCA1 and BRCA2 mutation carrying families vary according to the location and the origin of the mutation [2935]. Additionally, it has been shown that genetic and non-genetic factors influence cancer risk for BRCA1 and BRCA2 mutation carriers—a fact that may help explain variation in risks between families [29]. It is important that these inter-individual cancer risk differences in both age of cancer onset and site of the cancer could be taken into account when choosing risk reduction strategies. Two consortia are conducting studies that aim to achieve a more reliable estimation of individual cancer risks: the International BRCA1/2 Carrier Cohort Study (IBCCS) is exploring the extent to which non-genetic BC risk factors such as radiation exposure through chest X-rays or reproductive factors can modify cancer risk in BRCA1 and BRCA2 mutation carriers [3639], while the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA) is investigating the role of low-penetrance common breast and ovarian cancer genetic risk factors, discussed hereafter, in BRCA1/2 carriers [4045] (Table 1).

Polygenic Inheritance and Low-Penetrance Variants

Overall, mutations in high-risk BC genes explain about 25 % of the familial risk, and the frequency of BRCA1 and BRCA2 mutations in BC families is highly correlated with the degree of family history. They are present at very low frequencies in the general population (minor allele frequency (MAF) <0.1 %), and explain less than 5 % of total BC predisposition [46]. Taken together, these observations have led to the hypothesis that the so-called “sporadic cases” (i.e., cases with no known affected close relatives) could be attributable in part to allelic variants present in more than 1–5 % of the population and that will confer lower risks [47, 48]. To identify those, association studies are being conducted on thousands of cases who are unselected for family history and thousands of unrelated controls. The first studies of common genetic variation and BC risk looked at candidate genes involved in specific pathways that are important in BC biology, including DNA repair, cell-cycle regulation, carcinogenesis, apoptosis, carcinogen metabolism and steroid hormone metabolism. Most of these studies were based on putative functional variants or on tag SNPs, i.e., SNPs correlated with, and therefore could serve as a proxy for, much of known remaining common variation in a region. Although numerous associations have been proposed from candidate gene analyses, few definitive susceptibility alleles were unequivocally identified through such approaches. A notable success though was the demonstration that the common missense substitution p.Asp302His in the CASP8 gene, which encodes a regulator of apoptosis, was associated with a moderate reduction in BC risk, and the association was convincingly replicated through the large multicentre Breast Cancer Association Consortium (BCAC) (p value: 1.1 × 10−7; odds ratio [OR] 0.88; 95 % CI 0.84–0.92) [49, 50]. Soon after this first success, large-scale case–control studies were greatly enhanced by the development of commercial ‘SNP chips’ or arrays that capture most, although not all, common variation in the genome. These high-throughput genotyping technologies have allowed conducting empirical GWAS with very low error rates and at very low per SNP genotyping costs. However, because associations between SNPs and BC show low ORs and given the very large number of tests that are required to obtain reliable signal, huge sample sets are needed. Different strategies have therefore been proposed to reduce genotyping costs such as procedures for improving power by inferring genotypes, by combining data across studies or by applying multiple stage designs where “best hits” from one study would be followed up in independent case–controls series [51]. As for many other complex diseases, these strategies have proven successful approaches for the identification of common low-risk loci without prior knowledge of location or function, and they have provided an opportunity to improve our understanding of the aetiology of BC. The two first GWASs of BC were conducted concurrently by the BCAC [5••] and by the Cancer Genetic Markers of Susceptibility study [6], mainly in populations of European origin. Subsequently, additional GWAS was undertaken in other populations and a recent meta-analysis of nine GWAS that included 10,052 BC cases and 12,575 controls of European, followed by a replication study on a subset of SNPs genotyped in 45,290 cases and 41,880 controls identified further genome hits leading to the identification of over 70 loci involved BC susceptibility [52]. Combined analyses now suggest that more than 1000 further loci are yet to be discovered and that part of the remaining risk result from a combination of multiple common variants, each conferring a small effect on BC risk, with ORs usually between 1.2 and 1.5 [5••, 53]. Assuming a polygenic model, many low-penetrance SNPs may have a cumulative effect on both the overall risk of disease [54] and on disease onset [55].

These case–control studies have highlighted an enrichment of genes involved in tumorigenesis in model systems, cell death and differentiation. Intriguingly, the majority of BC risk-associated SNPs fall in non-coding or intergenic regions, and the underlying biological/functional impact of these variants is yet to be elucidated. GWASs have also revealed that pattern of association is different by population (e.g., Europeans vs. Asians) [56]. In this regard, follow-up studies in different populations with different linkage disequilibrium patterns will be helpful for fine-mapping studies. Another finding is that some SNPs are associated with risk of a specific BC subtype. For instance, the association at the 6q25 locus near the oestrogen receptor gene ESR1 is found in ER-negative BC [56], and not unexpectedly, the same SNPs showed an association with BC risk in BRCA1 carriers [41]. Another lesson from GWASs has been highlighted by the large-scale collaborative genotyping experiment involving >200,000 individuals from four consortia (COGS) and the design of a custom genotyping array with about 211,000 SNPs (iCOGS chip) is the detection of pleiotropic loci associated with risk of three hormone-related cancers, namely breast, ovarian and prostate cancers. In addition, some environmental factors specifically alcohol consumption and parity seem to modify the association between some SNPs and cancer risk [57]. To pursue the dissection of the genetic architecture of BC and other cancers, a second dedicated genotyping array, the OncoArray (>600,000 SNPs), has been recently designed, and genetic analysis is on the way to further characterize the genetic and environmental bases of breast, ovarian, prostate, colorectal and lung cancers [58].

Rare Moderate-Risk Variants

Recent genetic epidemiology studies of BC have tended to focus on the third class of BC susceptibility genes. Classical linkage analyses were no longer successful as pathogenic variants in the as yet unidentified genes are either not penetrant enough or frequent enough to produce LOD score signals [59, 60], nor could these genes be identified by GWASs due to the very low frequency of these variants in the general population. Typically, the pathogenic alleles of the so-called “moderate-risk” or “intermediate-risk” genes exemplified by ATM, CHEK2, PALB2, genes of the MRN complex (MRE11, RAD50 and NBN) and RAD51 paralogs (RAD51C, RAD51D and XRCC2) [8, 9•, 46, 6264, 64••, 6570] confer ORs of 2.0–5.0, and for some of these genes, the summed frequency of the pathogenic alleles is close to 1 %. Approximately, 5 % of the familial risk is likely to be explained by mutations in such genes. Evidence for this class of BC-predisposing variants actually dates back to the late 80s when Swift demonstrated that carrier status for recessively inherited ataxia-telangiectasia (A-T) was associated with a threefold-elevated risk of female BC [7175]. Recent case–control, case-only and control-only mutation screening studies of the A-T predisposing gene ATM confirmed that protein-truncating (LOF) variants confer increased risk of BC [61, 62]. Moreover, in a pooled analysis including all published ATM case–control mutation screening studies as well as new data, Tavtigian et al. conducted a separate analysis of LOF variants and of rare missense substitutions. In addition to confirming the contribution of LOF variants, this study highlighted the importance of the most severe grades of rare ATM missense substitutions (as predicted by the align-GVGD algorithm [76]) in BC susceptibility and showed that almost all of the evidence for some of the rare missense substitutions being pathogenic could be attributed to those falling in the key FAT, protein kinase and FATC domains [61]. Likewise, the spectrum of pathogenic variants in RAD50, MRE11A and NBN includes a relatively high proportion of rare (with MAF <0.1 %) missense substitutions located in the conserved functional domains of the gene products forming the MRN complex [46]. The case of CHEK2 is particular since two founder mutations are predominant in some populations: the carrier frequency of the truncating mutation c.1100delC is 0.5–1.0 % in several northern European countries and the carrier frequency of the missense mutation p.Ile157Thr is above 1.0 % among individuals of Slavic ancestry [77, 78]. However, in more mixed populations, half or more of the susceptibility alleles consist of rarer missense substitutions [63, 79], and overall, it is estimated that CHEK2 pathogenic variants account for 2.5–3.0 % of early onset or familial BC cases [80]. Finally, the characterization of PALB2 as a BC susceptibility gene through familial and population-based studies showed that LOF mutations in PALB2 confer a higher risk of BC than do LOF mutations in ATM and CHEK2 [8, 65, 81] and may overlap with that for BRCA2 mutation carriers. The mean absolute BC risk for PALB2 female carriers by 70 years of age was 35 % (95 % CI 26–46), which is comparable to the mean absolute BC risk for BRCA2 carriers estimated by Antoniou et al.: 45 % (95 % CI 31–56) [82]. This would justify routinely offering genetic testing for PALB2 along with BRCA1 and BRCA2. Hence, PALB2 is the first example among the “new” BC susceptibility genes for which identified mutation carriers could be informed of optimal clinical screening and treatment [64••, 66].

Until very recently, rare intermediate-risk variants were identified by screening carefully selected candidate genes involved in the homologous recombination repair detection and signalling pathways involving BRCA1 and BRCA2 in large numbers of familial BC cases or early onset BC series, as well as population control groups using traditional mutation detection techniques such as Sanger sequencing, dHPLC or high-resolution melt curve analysis. Their discovery is now accelerating thanks to the introduction of massive parallel sequencing (MPS) in both research and diagnosis laboratories and studies aiming to explore entire DNA repair pathways or the exome in large study samples.

Until exome sequencing of thousands of cases and matched controls becomes available, a successful approach has been to combine a two-stage study design where whole-exome sequencing of women affected with BC from highly selected multiple-case BC families is followed by case–control mutation screening of candidate gene(s) plus additional genotyping in BC pedigrees. So far, gene prioritization for follow-up studies has been based on knowledge of biological function of the genes and for more than one highly selected family to carry a variation in the gene. Hence, among the dozen of possible candidates, the RAD51 paralog XRCC2 was picked as a plausible candidate because (i) exome sequencing of multiple affected relatives from 13 high-risk BC families identified a LOF mutation partially segregating with BC in one family and a very rare likely deleterious missense variant in a second family, and (ii) Xrcc2-knockout mice present with genetic instability due to homologous recombination deficiency [83]. The confirmation of the involvement of XRCC2 in BC susceptibility came from the large-scale validation step where the entire coding sequence of the gene was screened in 1308 cases and 1320 frequency-matched controls recruited through population-based sampling by the BC Family Registry [9•, 84]. However, the association was not replicated in an independent study involving 3548 non-BRCA1/2 familial BC cases and 1435 healthy controls, although a relative risk smaller than two could not be excluded [85]. Following the same study design, the RINT1 gene was further investigated because exome sequencing identified rare mutations in 3 out of 49 multiple-case BC families, and because this tumour suppressor gene encoding the RAD50-interacting protein 1 is essential for maintaining several aspects of the Golgi apparatus dynamics as well as the integrity of the centrosome, which coordinates mitosis [10•]. Interestingly, this study showed that RINT1 carriers in BC high-risk families were also at increased risk of developing Lynch syndrome–spectrum cancers.

Beyond the laboratory challenge imposed by large-scale of resequencing projects, such studies have also illustrated the complexity of variant annotation using various bioinformatics tools, which, although imperfect, is a necessary step to classified the variants and to pool those with a a priori similar severity grade in order to increase the power of the statistical tests. Thus, a second challenge for MPS projects is to be able to conduct a statistically powerful enough analysis of the multitude of rare sequence variants identified. Moreover, combining exome sequencing data from different studies will require effort to harmonize analytical and bioinformatics pipelines for the analysis of NGS data. To this end, the COMPLEXO consortium was formed to facilitate international collaborations, an essential step to advance the identification of additional moderate-risk BC susceptibility genes by increasing the likelihood of identifying causal variants in the same gene in multiple families [86].

Despite the unclear clinical utility of the majority of the already identified moderate-risk genes, gene panel testing has already become a reality. However, for many genes the evidence for association is weak, risk estimates for carriers are imprecise and biased (if conducted at all), and very often variants in the “new” genes do not achieve clinically actionable classification, the vast majority of them being considered as VUS. Therefore, neither individuals carrying a deleterious variant nor a VUS can benefit from knowing their mutation status because risk, penetrance and appropriate clinical management strategies have not yet been established. Risks associated with tumour subtypes or other tumour localizations are even more imprecise. Hence, as for BRCA1 and BRCA2 evidences based on co-segregation, frequency and functional studies need to be gathered for variants in each gene, and improvement of in silico evaluation is required to interpret test results. Curated data repositories for clinically relevant variants reviewed by expert panels are also needed.

Eventually, identifying causal genetic variants in novel genes associated with susceptibility to BC will make it possible to improve the medical management of women at risk and to adapt the surveillance of these women according to risk assessments based on new tests. If the risk is considered high, early and regular magnetic resonance imaging (MRI)-based screening could be offered, and prophylactic surgery could be discussed.

Conclusion

Identifying specific genes and gene variants that predispose to the development of cancer is important for a greater understanding of the biological pathways that are involved in tumourigenesis, elucidating how environmental factors may exert their effects in combination with genes, and also identifying individuals whose risk is high enough to benefit from existing risk reduction strategies. Indeed, the identification of inherited mutations in a number of cancer-predisposing genes has been a major step for genetic counselling. However, currently approximately 80–90 % of families who are tested in a clinical setting for BRCA1 and BRCA2 mutations receive a negative report. The identification of the majority of BC susceptibility genes will be a great advance for these women who could be offered additional testing. At the moment, they are counselled solely on the basis of their family history; identification of additional susceptibility genes, which would allow more precise estimation of their risk, would be of great value. With the implementation of MPS technologies in genetic testing laboratories, the selection criteria for tests for these mutations are becoming increasingly wide-ranging as clinicians come to recognize the importance of such information for improving patient management.

Hence, investigating genetic factors in phenotypically well-described high-risk populations and the possibility to conduct large-scale validation in national and international research resources will be extremely useful to further decipher the genetic architecture of BC. It will provide evidence based on which a larger number of susceptibility genes for BC will be routinely screened in the future. In turn, this may open new therapeutic avenues, as it has been the case with the use of PARP inhibitors in carriers of BRCA1 and BRCA2 mutations, and significantly progress personalised medicine in areas such as BC prevention, risk estimation, early detection, screening recommendations, treatment selection (genotype specific/targeted therapies) and improved prognosis, leading to the improved clinical management of women at genetic risk of breast cancer.