Introduction

Women with deleterious mutations in BRCA genes face an increased risk of breast cancer (BC) and ovarian cancer (OC), and often develop cancer at a young age. Estimates of the lifetime cumulative risk (penetrance) of BC associated with BRCA mutations range from about 80 % in earlier studies on high-risk families [1–5], which were subject to ascertainment bias, to around 50–60 % in population-based or prospective studies [6–13]. Antoniou et al. [14] assembled information from 22 studies on families of index cases with BRCA1 or BRCA2 mutations, unselected for family history, and estimated a cumulative BC risk (0–70 years) of 65 % for those with BRCA1 mutations and 45 % for those with BRCA2 mutations. A meta-analysis of 10 studies that corrected for ascertainment bias showed the following cumulative risks to age 70 for mutation carriers: BC risk of 57 % for BRCA1 and 49 % for BRCA2, and OC risk of 40 % for BRCA1 and 18 % for BRCA2 [15].

Several genetic risk assessment methods are available to estimate the probability of BRCA mutation in individuals, in order to select them for molecular diagnosis [16]. Empirical methods, usually based on the number of BC and OC cases in the family, age at diagnosis, and occurrence in subsequent generations, ignore data from unaffected relatives [17–20], and may therefore grossly overestimate the probability of the mutation in large families with few affected members. By contrast, methods based on genetic models consider information from all relatives, whether they are affected or not. Berry et al. [21] and Parmigiani et al. [22] developed a method (BRCAPRO software), based on Bayes’ theorem, which required data on all first and second degree relatives of the family proband, and incorporated as prior probabilities incidence rates in the US population, allele mutation frequencies and penetrances estimated from studies in families with several BC or OC cases [4, 22, 23]. Recently, BRCAPRO (available at http://bcb.dfci.harvard.edu/bayesmendel/brcapro.php) has been extended to include third degree relatives.

In the European case-only study (COS) on the role of gene-environment interactions in the development of BC in young women [24–27], we wanted to classify women developing BC before age 40 according to the probability that they carried a BRCA mutation, as estimated from family history. COS was carried out in seven countries with BC incidence ranging from 50 to 100 per 100,000 women per year (Estonia, France, Germany, Israel, Italy, Scotland and Slovenia) [28]. In such a context, the application of a Bayesian model requires country-specific assumptions of sporadic BC and OC incidence; however, BC incidence has increased over generations, both in the general population [29] and in mutation carriers [14, 30–32], and single age-specific incidence and penetrance curves do not accurately describe the disease risk in subsequent generations.

We have developed a computer program (COS software) to estimate the probability of carrying a deleterious BRCA mutation when incidence and penetrance are increasing over generations. The software is based on the same Bayesian logic as BRCAPRO. It can evaluate all third degree relatives as well as parents’ cousins and grandparents’ siblings (fourth degree), and it allows researchers to incorporate a hypothetical third BRCA gene to reduce overestimates of BRCA mutation probability due to the presence of other genes or gene combinations with similar penetrance.

Antoniou et al. [33] also developed a model (BOADICEA), available at http://ccge.medschl.cam.ac.uk/boadicea/, that takes into account a polygenic component beyond BRCA1 and BRCA2 and the increasing incidence of BC and OC over subsequent birth cohorts. BOADICEA also takes into account the incidence of other cancers, such as prostate and pancreas cancer [34].

Roudgari et al. [35] showed that the Scotland-specific COS software compared favorably with BOADICEA in terms of sensitivity and AUROC (area under the receiver operating characteristic curve), while BOADICEA showed better specificity.

We present here a new version of the COS software based on improved penetrance estimates of both BC and OC and compare its performance with BRCAPRO and BOADICEA.

Materials and methods

Study subjects and genetic testing

The present study is based on data from high-risk families attending the Medical Genetics Unit of Milan National Cancer Institute (INT). Genetic counseling and testing were offered to all eligible families using widely accepted criteria based on the number of cases and ages at diagnosis (Table 1) [36, 37].

Table 1 Eligibility criteria for genetic counseling and testing

Data from 384 BRCA1 and 229 BRCA2 mutated families were used to calculate penetrance. An independent set of 436 consecutive families (those recruited between 2004 and 2008) was used for model validation. This set included 79 BRCA1 and 27 BRCA2 families, and 330 families that tested negative for BRCA1 and BRCA2 genes. Families with variants of uncertain significance were excluded from the study.

BRCA gene mutation testing was performed by denaturing high performance liquid chromatography (DHPLC), by direct sequencing, or by a combination of both methods, examining all coding exons and corresponding splice sites of both genes. Individuals who tested negative in these analyses were investigated for the occurrence of large genomic rearrangements by multiplex ligation-dependent probe amplification (MLPA), using commercially available kits (MRC-Holland). Families were considered as BRCA1 or BRCA2 mutation positive when genetic variants fulfilling one of the following criteria were ascertained: (a) variants generating a premature stop codon (PTC), including nonsense mutations, small out-of-frame insertions/deletions, splicing mutations confirmed by in vitro functional analyses, and large genomic rearrangements, with the exception of those introducing a PTC at or downstream of BRCA2 codon 3326; (b) base pair changes, confirmed splicing mutations and genomic deletions leading to the loss of the translation start point; (c) confirmed splicing mutations and genomic deletions leading to the in-frame loss of exonic regions coding for functional protein domains; (d) variants at the nearly invariant GT and AG dinucleotides at the 5′ and 3′ intron ends, which are predicted to affect mRNA splicing, even if not experimentally verified; (e) missense mutations and small in-frame deletions classified as pathogenic by multi-factorial probability based models [38]; (f) missense mutations affecting the highly conserved cysteine residues of the RING-finger domain of the BRCA1 protein. Families were considered as BRCA mutation negative when neither any of the above genetic alterations nor variants of uncertain significance were ascertained.

All participants in this study signed an informed consent form, approved by the INT Ethics Committee, for the use of their biological samples and data for research purposes.

Mutation probability model

As shown by Parmigiani et al. [20], the probability P(M|H) of a person carrying a deleterious mutation (M) in a cancer gene, given a family history of cancer (H), is given by Bayes’ theorem:

$$ P\left( {M = 1|H} \right) = P\left( {H|M = 1} \right)f/\left[ {P\left( {H|M = 1} \right)f + P\left( {H|M = 0} \right)\left( {1 - f} \right)} \right] $$

where f is the empirically-determined mutation prevalence. The expression implies that the probability P(M|H) depends on the penetrance in women with a mutation (M = 1) and on the incidence in women without a deleterious mutation (M = 0). The model is fully described in “Appendix 1”.
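As an illustration of the expression above, the following minimal Python sketch evaluates the posterior for a single family. The function name and the numerical inputs are hypothetical and chosen only to show the arithmetic; in the real model the two likelihoods come from the pedigree computation described in “Appendix 1”.

```python
def posterior_mutation_probability(lik_carrier, lik_noncarrier, f):
    """Bayes' theorem: P(M=1|H) from P(H|M=1), P(H|M=0) and prevalence f."""
    numerator = lik_carrier * f
    denominator = numerator + lik_noncarrier * (1.0 - f)
    return numerator / denominator


# Hypothetical inputs (not taken from the study): the family history is
# 50 times more likely under M = 1 than under M = 0, and f = 0.01.
p = posterior_mutation_probability(lik_carrier=0.05, lik_noncarrier=0.001, f=0.01)
print(f"P(M=1|H) = {p:.3f}")
```

Note that only the ratio of the two likelihoods matters: multiplying both by the same constant leaves the posterior unchanged.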

Model development

Estimation of incidence in women without deleterious BRCA mutations

We estimated country-specific general population BC incidence by birth cohort from cause-specific mortality data [39] and population-based cancer survival data [40, 41], using a mathematical model of the relationship between incidence, survival, mortality, and prevalence [42]. In this model, incidence was a polynomial function, with age, diagnosis period, and birth cohort as covariates. We produced BC incidence estimates for Estonia, Slovenia and Scotland, which were checked against national cancer registry data. We also produced estimates for Italy, France, and Germany, which were checked against registry data covering only part of the national populations [41]. In the present study, we are using the figures for Italy, where the cumulative lifetime BC risk increased from about 2–3 % for the women born at the turn of the 20th century, to 8–9 % for those born in the 1940s, without any further increase subsequently [29, 43]. These estimates approximate the expected BC incidence in women carrying normal BRCA alleles. The only exception is the incidence in young women, where corrections are required, because a substantial proportion of cases in young women are attributable to BRCA mutations (see “Appendix 2”). Since male BC is rare, we used cross-sectional incidence curves estimated by pooling together 15 years of data from Italian cancer registries, without attempting any cohort-specific estimates.

As in Europe the change of OC incidence over 20th century generations has been much less dramatic than the change of BC [44], we tentatively incorporated into the software country-specific, cross-sectional age-specific incidence data from the cancer registries [28].

Incidence in women with a deleterious BRCA mutation

In order to estimate BC and OC penetrance in women with a deleterious BRCA mutation, we used pedigrees from 384 families with a BRCA1 mutation and 229 with a BRCA2 mutation. We also used 330 families which tested negative for both BRCA1 and BRCA2 to estimate the penetrance of a hypothetical third BRCA gene. For each of these sets, we simulated various age and birth cohort specific incidence and penetrance curves for BC and OC with the following model

$$ r\left( {age,gen} \right) = Ir\left( {age} \right)*G\left( {gen} \right) $$

where Ir is a 6-parameter curve given by the linear interpolation between incidence rates at ages 28, 35, 42, 49, 60, 75, and G is a 3-parameter curve given by the linear interpolation between the 1900, 1930 and 1960 cohorts. Ir is kept constant after age 75 and decreases exponentially before age 28, and G is kept constant before 1900 and after 1960.
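The shape of this model can be sketched in a few lines of Python. The knot positions are those given above; the rate values, the cohort factors and the exponential decay constant below age 28 are placeholders invented for illustration, since the fitted parameters are not reported in this section.

```python
import math

AGE_KNOTS = [28, 35, 42, 49, 60, 75]      # ages at which Ir is parameterized
COHORT_KNOTS = [1900, 1930, 1960]         # birth cohorts at which G is parameterized


def interp_clamped(x, xs, ys):
    """Piecewise-linear interpolation, held constant outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            w = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + w * (ys[i + 1] - ys[i])


def incidence_rate(age, cohort, age_rates, cohort_factors, decay=0.3):
    """r(age, gen) = Ir(age) * G(gen)."""
    ir = interp_clamped(age, AGE_KNOTS, age_rates)
    if age < AGE_KNOTS[0]:
        # Exponential decline before age 28; the decay constant is an assumption.
        ir *= math.exp(-decay * (AGE_KNOTS[0] - age))
    g = interp_clamped(cohort, COHORT_KNOTS, cohort_factors)
    return ir * g


# Placeholder parameters (6 for Ir, 3 for G), not the fitted values.
AGE_RATES = [0.002, 0.006, 0.012, 0.016, 0.018, 0.015]
COHORT_FACTORS = [0.5, 0.8, 1.0]

print(incidence_rate(31.5, 1915, AGE_RATES, COHORT_FACTORS))
```

Because the two factors multiply, the cohort curve G rescales the whole age curve without changing its shape, which is what allows nine parameters to describe every birth cohort.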

We generated four functions (BC incidence, BC penetrance, OC incidence, OC penetrance), choosing for each function the set of 9 parameters that maximize the following

$$ \mathop \prod \limits_{fam = 1}^{N} P[fam \;history|BRCA = 1] $$

which expresses the probability of observing the histories of all families given the mutation in the probands. For the computation we used the conjugate gradient maximization method [45].
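Since the logarithm is monotonic, maximizing this product is equivalent to maximizing the sum of the family-level log-likelihoods, a form that avoids numerical underflow when N is large. Writing θ for the 9 parameters of one function (a symbol introduced here for illustration, not used in the original derivation), the estimate can be expressed as

$$ \hat{\theta } = \mathop {\arg \max }\limits_{\theta } \sum\limits_{fam = 1}^{N} \log P[fam \;history|BRCA = 1;\theta ] $$

Whether the COS implementation works on the product or on its logarithm is not stated here; the two formulations have the same maximizer.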

However, such functions could overestimate the incidence and penetrance curves, because families with numerous cases tend to be selected in preference to those with few cases. To partially correct for ascertainment, we assumed that the families would not have been selected for genetic testing before the occurrence of cancer in the proband. We therefore did not take into account the probands’ cancers.

Figure 1a–d show estimates of the cumulative risks of developing BC and OC in cohorts of women with a deleterious BRCA1 or BRCA2 mutation born in 1900, 1920, 1940, and 1960. For comparison, Fig. 1a–d also show the penetrance curves estimated by the meta-analysis by Chen and Parmigiani [15]. In the women carrying a BRCA1 mutation, estimates of cumulative risk of BC to age 70 increased from 35 % in the women born in 1900 to 58 % in the women born in 1960. The corresponding penetrances for BRCA2 mutation carriers were 38 and 63 %. OC 70-year penetrance estimates also increased dramatically from 9 to 47 % for BRCA1 and from 7 to 26 % for BRCA2. These penetrances are within the literature ranges for BRCA mutations [14], and closely similar to recently estimated cumulative risks in large US population-based studies [46, 47].

Fig. 1

Breast and ovarian cancer penetrance estimates: cumulative risk for successive birth cohorts of women carrying a BRCA1 or BRCA2 mutation. The risk curve derived by Chen and Parmigiani [15] is also shown. a BRCA1, breast cancer, b BRCA2, breast cancer. c BRCA1, ovarian cancer, d BRCA2, ovarian cancer

Incidence in eligible women who tested negative for BRCA mutations

We also estimated the penetrance curves for BC and OC in the 330 eligible families which tested negative for BRCA1 and BRCA2 mutations. The method of penetrance estimation was the same as the method used for mutated families. These estimates were incorporated into the model as if these families carried a deleterious mutation in a third hypothetical BRCA gene (BRCA3). Due to eligibility criteria, in fact, these families have a higher incidence than the general population, whether due to genetic and/or environmental factors, which must be taken into account. If the BRCA1/BRCA2 negative families had the same incidence as the general population, the assumption of a BRCA3 condition would not be necessary to discriminate BRCA1 and BRCA2 families from the other families in the eligible set. We are aware that such a hypothetical BRCA3 gene is unlikely to exist, and that the excess incidence in these families is more likely due to the interaction of several low-penetrance alleles, or to moderate/high penetrance mutations in a number of distinct non-BRCA genes [48]. However, for our purposes the real nature of the trait is irrelevant.

Allele frequencies

Allele frequencies can be set manually in the COS software. The allele frequencies for the Italian population are not known. We tentatively set the default frequencies at 0.0006 for BRCA1 and 0.0002 for BRCA2, values that are widely used [49]. This BRCA1/BRCA2 ratio is consistent with the ratio of BRCA1 and BRCA2 mutations in our validation data set (79/27). However, we simulated the effect of considering other frequencies. Increasing the BRCA2/BRCA1 ratio up to 1, as suggested by recent allele frequency estimates [50, 51], did not materially change the results.

Claus et al. [52] estimated that the genotype frequency of all mutated BC genes is 0.0033, that is to say allele frequency 0.00165. We used these estimates to hypothesize a BRCA3 allele frequency of 0.00165 − 0.0008 = 0.00085.
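The arithmetic behind these figures can be checked with a short Python sketch. It assumes the usual rare-allele approximation under Hardy–Weinberg equilibrium, in which the carrier (genotype) frequency 2q(1 − q) is close to 2q; that assumption is ours and is used only to reproduce the halving stated above.

```python
def allele_freq_from_carrier_freq(genotype_freq):
    """Rare-allele approximation: carrier frequency 2q(1 - q) ~ 2q, so q ~ freq / 2."""
    return genotype_freq / 2.0


q_all = allele_freq_from_carrier_freq(0.0033)   # all mutated BC genes
q_brca1, q_brca2 = 0.0006, 0.0002               # default COS frequencies
q_brca3 = q_all - (q_brca1 + q_brca2)           # remainder assigned to BRCA3

# Exact Hardy-Weinberg carrier frequency implied by q_all, for comparison.
exact_carrier_freq = 2 * q_all * (1 - q_all)

print(f"q_all = {q_all:.5f}, q_BRCA3 = {q_brca3:.5f}")
print(f"exact carrier frequency = {exact_carrier_freq:.6f}")
```

The exact carrier frequency differs from 0.0033 only in the sixth decimal place, so the approximation is harmless at these allele frequencies.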

Software use

See “Appendix 3”.

Model evaluation

Performance of COS and comparison with other models

We examined the performance of COS and other programs in predicting mutations in a validation set of 436 Italian high-risk families. We calculated the area under the receiver operating characteristic curve (AUROC). ROC curves are generated by plotting sensitivity against 1 − specificity, considering various probability levels as separating negative from positive predictions. The closer the AUROC is to 1, the better the software is at estimating correct outcomes. We calculated 95 % confidence intervals (CIs) for AUROC according to DeLong et al. [53]. We used the best probability threshold obtained from the ROC curve to compute sensitivity and specificity. In the ROC curve, where each point is obtained by calculating the sensitivity and the specificity for a defined threshold, the best threshold is the point that minimizes the quantity (1 − specificity)² + (1 − sensitivity)² [48].
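The threshold selection rule and the AUROC can be sketched in Python as follows. This is a minimal illustration on made-up data, not the code used in the study: the AUROC is computed through its rank interpretation (the probability that a mutated family receives a higher predicted probability than a non-mutated one), and the DeLong confidence interval is not reproduced.

```python
def sens_spec(probs, outcomes, threshold):
    """Sensitivity and specificity when predictions >= threshold count as positive."""
    tp = sum(1 for p, y in zip(probs, outcomes) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, outcomes) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probs, outcomes) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probs, outcomes) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(probs, outcomes):
    """Observed probability minimizing (1 - specificity)^2 + (1 - sensitivity)^2."""
    def distance(t):
        se, sp = sens_spec(probs, outcomes, t)
        return (1 - sp) ** 2 + (1 - se) ** 2
    return min(sorted(set(probs)), key=distance)


def auroc(probs, outcomes):
    """AUROC as P(positive outranks negative), with ties counted as 1/2."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy data: predicted mutation probabilities and test results (1 = mutated).
probs = [0.05, 0.10, 0.20, 0.30, 0.60, 0.80, 0.90]
outcomes = [0, 0, 1, 0, 1, 1, 1]

t = best_threshold(probs, outcomes)
print(t, sens_spec(probs, outcomes, t), auroc(probs, outcomes))
```

Geometrically, the rule picks the ROC point closest to the top-left corner (sensitivity 1, specificity 1), which weights false negatives and false positives equally.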

We also computed the Brier score (mean of the squared difference between each probability prediction and the genetic test outcome, which can be either 1 when positive or 0 when negative). A Brier score of 0 indicates perfect prediction; the worst possible score is 1.
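For concreteness, the Brier score as defined above can be computed with the following short Python sketch; the function name and the example values are illustrative only.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Illustrative predictions for four families (1 = mutation found, 0 = not found).
example = brier_score([0.9, 0.2, 0.6, 0.1], [1, 0, 1, 0])
print(round(example, 4))
```

A model that assigns 0.5 to every family scores 0.25 regardless of the outcomes, which gives a useful reference point when reading the scores in Table 2.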

We then compared the AUROC, the Brier score, the sensitivity and the specificity obtained by the COS software with those obtained by BRCAPRO v5.1, BRCAPRO v6.0 and BOADICEA v3.0 (Fig. 2).

Fig. 2

Comparison of receiver operator characteristic (ROC) curves for COS, BRCAPRO v5.1, BRCAPRO v6.0 and BOADICEA

Each program has a specific threshold that maximizes its discriminating power. Therefore, to compare the validity of different programs it is necessary to use their specific best thresholds. Otherwise, if a single threshold is chosen, a program may appear to perform better only because its best threshold is the nearest to the chosen one. As medical genetics units usually refer to a specific threshold (e.g. 10 %) to select families for genetic testing, we also show how the sensitivity and specificity change as a function of the chosen threshold (Fig. 3).

Fig. 3

Sensitivity and specificity of COS, BRCAPRO 6.0 and BOADICEA as a function of the threshold chosen for selecting families for BRCA testing

Results and discussion

We estimated the penetrance of BRCA mutations in a large set of Italian families attending genetic counseling and eligible for BRCA testing (Fig. 1). Our results could be useful in the daily practice of genetic counseling for high-risk families, helping geneticists and women in the complex decision process about preventive options. Overall, our penetrance estimates are in line with those previously published; however, they allow a more accurate definition of personal risk for women belonging to different birth cohorts.

The validation set included 436 consecutive eligible index cases (416 females and 20 males): 320 women developed BC (median age 39 years, range 21–80 years), 28 OC (median age 50.5 years, range 27–77 years), and 28 both BC and OC (BC median age 47.5 years, range 30–72 years; OC median age 55 years, range 36–76 years); 13 men developed BC (median age 56 years, range 34–73 years). In 47 families, individuals affected by BC or OC were not available for testing, thus the analysis was carried out on their closest relatives.

Among the 436 families, 319 included BC cases only (41 with one early onset BC and negative family history), 15 OC cases only (one with early onset OC and negative family history) and 102 both BC and OC cases (10 with both BC and OC in the index patient and negative family history). Overall, this set included 1171 BC cases, 135 OC cases and 53 cases with both BC and OC out of 7,580 women; 33 BC cases were recorded among 6,789 men.

In the validation set of 436 families we found 106 deleterious mutations (79 BRCA1 and 27 BRCA2), corresponding to a frequency of mutated genotype of 24.3 % (95 % CI 19.6–28.9). The COS software average estimate of mutation probabilities was 24.7 %. The Brier score was 0.125 and the AUROC 0.845 (95 % CI 0.764–0.924). The best probability threshold to discriminate the BRCA1 and BRCA2 mutated families from the non-mutated ones was 22.9 %, which corresponds to a sensitivity of 80.3 % and a specificity of 80.2 %. Table 2 shows the corresponding values obtained with BRCAPRO v5.1, BRCAPRO v6.0 and BOADICEA. COS and BRCAPRO v6.0 had almost the same performance, superior to BRCAPRO v5.1 and BOADICEA for all performance indexes. The improvement in the performance of BRCAPRO v6.0 with respect to version 5.1 is likely to be due to the fact that version 6.0 takes into account third degree relatives. As all the methods are heavily dependent on the validity of the incidence and penetrance assumptions, it is most likely that the BOADICEA assumptions do not fit the Italian population so well. BOADICEA and BRCAPRO include information on BC phenotype (hormone receptor expression), and BOADICEA allows researchers to take into account the occurrence of other BRCA-associated cancers (prostate and pancreas). We have not taken these phenotypes into account, because in our database the information was sparse and these functions have not yet been implemented in COS.

Table 2 Comparison of performance indexes of different software in order to estimate the mutation probability on the basis of family history

Figure 3 shows the performance of COS, BRCAPRO 6.0 and BOADICEA as a function of the threshold chosen for selecting families for BRCA testing. For the commonly chosen threshold of 10 %, the performance of COS and BRCAPRO 6.0 is essentially the same (COS sensitivity 87.7 % and specificity 63.2 %; BRCAPRO 6.0 sensitivity 87.7 % and specificity 63.3 %), while BOADICEA shows a much lower sensitivity (71.5 %) and a somewhat higher specificity (70.6 %). For a threshold of 20 % the performance of COS would be better than that of BRCAPRO 6.0.

All programs predict BRCA1 and BRCA2 mutations separately. However, their ability to discriminate between the two genes is limited, because the assumed penetrance of the two mutated genes is fairly similar. Therefore we are only presenting the performance indicators for both genes combined.

The BRCAPRO software developed by Berry and Parmigiani [21, 22] substantially improved previous empirical approaches based on the number and ages of affected relatives. However, using BRCAPRO up to version 5.1, some families with several young cancer cases showed low mutation probabilities, because distant relatives with cancer could not be accommodated in the software. As already mentioned, version 6.0 dramatically improved the performance (Table 2), most likely because it allows researchers to take into account more distant relatives. Nevertheless, we would have expected a better performance of the COS software with respect to BRCAPRO, because the latter does not take into account the evolving cancer incidence and penetrance over subsequent generations. Moreover, we expected an overestimation of the mutation probability by BRCAPRO, because of the absence of the third gene in the model. Nevertheless, BRCAPRO v6.0 provides good estimates, most likely because of the following factors, which lower the estimation of the mutation probability:

(a) The single curve for incidence incorporated in BRCAPRO v6.0 overestimates the expected BC incidence in the women belonging to older generations, when BC incidence was low. This causes an underestimation of the mutation probability.

(b) BRCAPRO incorporates very low penetrance estimates, much lower than those estimated by the Chen and Parmigiani meta-analysis of studies without ascertainment bias, with BC cumulative risks at 70 years of age = 43 % for BRCA1 and 32 % for BRCA2 (vs. 57 and 49 % in the meta-analysis), and OC cumulative risks = 30 % for BRCA1 and 15 % for BRCA2 (vs. 40 and 18 %).

(c) BRCAPRO v6.0 seems to give a very low weight to contralateral BC.

All the models share the limitation that they rely on penetrance estimates, and on published allele frequency estimates, which may be inaccurate. The relatively poor performance of BOADICEA on our Italian families highlights the importance of population-specific incidence and/or penetrance estimates. Several approaches have been used to estimate the average penetrance associated with BRCA mutations. Earlier estimates applied the maximum-LOD-score method to multiple-case families collected for linkage studies for the identification of disease loci [3, 4]. This method provided dramatically overestimated lifetime cumulative risk estimates, of the order of 80 % by age 70 for BC. A few years later, however, Struewing et al. [9], in a population-based study of 5318 Ashkenazi Jews, unselected for family history, showed that the penetrance is likely to be much lower. By comparing the cancer histories of relatives of carriers and non-carriers of the three founder mutations, they estimated that the risk of BC by the age of 70 was 56 %. The major limitation of studies based on cases unselected for family history in other populations, in which mutations are spread over almost the whole length of the two genes, is their small size and, therefore, the imprecision of their estimates. Antoniou et al. [14], however, pooled 22 studies, half population-based and half selected on the basis of the young age of the proband. Mothers and sisters of the proband were assumed to be followed from age 20 years and were censored at the age of first cancer diagnosis: the resulting estimates for BC risk to the age of 70 years were 65 % for BRCA1 and 45 % for BRCA2. The estimated penetrance of BRCA1, however, was higher if the studied families were selected for the young age of the proband, suggesting some residual ascertainment bias (mutations associated with early age of cancer may confer higher lifetime cumulative risks).
The recent meta-analysis of Chen and Parmigiani [15], who pooled population-based studies and studies that corrected for ascertainment bias, gave somewhat lower estimates for BRCA1 (57 %), and similar results (49 %) for BRCA2 mutation carriers. An excellent method to estimate penetrance is to follow up healthy mutation carriers prospectively, as was done in the EMBRACE study [13], with a 10-year follow-up of 988 mutation carriers without a previous diagnosis of breast or ovarian cancer: the average BC cumulative risks by age 70 years were estimated to be 60 % for BRCA1 and 55 % for BRCA2 carriers. These results cannot be influenced by the previous familial cancer history, but may still be somewhat overestimated, because in the studied families, selected for family history, high penetrance mutations may have been overrepresented. There is increasing evidence, in fact, that different mutations may confer different risks. We estimated penetrance from families selected for young age of the proband or the presence of several cases in the family, i.e. from a series biased towards higher penetrance. We are confident, however, that the exclusion of the proband’s cancer from the computation provided a fairly good correction for ascertainment bias. As shown in Fig. 1, in fact, our estimates are quite in line with those of Chen and Parmigiani [15], and the increasing penetrance over subsequent birth cohorts is of the same order as the increasing relative risk estimated by Antoniou et al. [14].

Nevertheless, whatever the validity of the estimates, there is an intrinsic limitation in the discriminating power of the models, which depends on differences in phenotype incidence. When the difference in disease incidence between women with and without a mutation is large, the power is high. Conversely, if penetrance is low and approaches the incidence of sporadic cancer, the discriminating power is reduced. For the same reason, because BC penetrance estimates for BRCA1 and BRCA2 are so similar, reliable discrimination between the two genes is difficult.

The COS software is available free of charge on request from patrizia.pasanisi@istitutotumori.mi.it.

The COS risk prediction model demonstrated high performance indexes and can be a useful risk stratification tool in research studies. At present, however, the use of prediction models and strict probability thresholds in clinical practice protocols still presents evident limitations. To select families eligible for BRCA testing, models should not be used alone and genetic counseling should always be provided.