Introduction

Breast cancer has become the most common type of cancer in women and has been the sixth leading cause of death among Chinese women since 2006 [1]. The incidence of breast cancer in China increased from 11.7 per 100,000 in 1998 to 29.2 per 100,000 in 2008 [2]. Although the incidence rate of breast cancer is lower in China than in Western countries, the absolute number of new cases is higher because of China’s larger population (1.31 billion). Of all new cases of breast cancer diagnosed worldwide, 21.3 % are from China [3]. In addition, cancer registries in China show an annual increase in incidence of 3 to 4 %, which is higher than that in any other country [4]. This rising incidence underscores the need for early diagnosis. The only established method for reducing mortality from breast cancer is mammographic screening [5–7]. Even so, no nationwide screening program exists in China, because of the relatively low breast cancer incidence and resource constraints. In 2008, however, a project named “breast cancer screening, supported by central funds (from central government)” was initiated. Local governments in some economically developed areas have also started investing in breast cancer prevention, for example, through the “Double Ribbon” activity in Beijing, Tianjin, and some other cities, in which free cervical and breast examinations are offered to women aged 40–60 years (http://www.bjhb.gov.cn/gzfwq/zhfw/wsaqts/201206/t20120628_51162.htm). Besides health care policy and financial support, raising awareness among the population is another important component of screening. There is some evidence that individualized numerical risk estimates increase adherence to screening [8]. However, mammography is expensive and requires manpower and technical expertise, and the Breast Health Global Initiative (BHGI) guidelines recommend that breast health awareness (BHA) be promoted among all women at the basic resource level [9].
A woman’s decision to embark on a program of surveillance depends on her awareness of her medical options, her personal preferences, and, more importantly, the individualized estimate of her probability of developing breast cancer in a defined period [10]. In resource-constrained countries, a risk assessment tool may also be useful for designing screening protocols targeted at high-risk subsets of populations in which the overall incidence is rather low [11–13].

Based on our systematic review of epidemiological studies on risk factors for breast cancer in Chinese women, we developed a health risk appraisal (HRA) model to predict an individual’s risk of developing breast cancer in the next 5 years. In this article, we describe the development of the model and assess its performance using data from the first round of a breast cancer screening program.

Method

HRA model

The HRA model was developed based on a meta-analysis of the main risk factors for breast cancer. An individual’s combined risk score is obtained from the risk scores of the included risk factors. The absolute risk of the individual developing breast cancer in the next 5 years is then estimated using the risk score conversion tables and 5-year incidence probability tables [14].

Calculation of pooled odds ratios: meta-analysis

Two authors searched electronic databases (PubMed, ScienceDirect, Wiley, CNKI, WanFang, and VIP Database) for studies on risk factors for breast cancer among Chinese women published up to December 2012, supplemented by a manual search. The inclusion criteria were as follows: (1) type of study: case–control or cohort study; (2) topic of interest: risk factors for breast cancer, with complete cross-tabulated data on exposure and outcome, or an odds ratio (OR) or relative risk (RR) with its 95 % confidence interval (CI) (or data from which these could be calculated); (3) population: Chinese women or populations including Chinese women; and (4) language of publication: Chinese or English.

The exclusion criteria were as follows: (1) controls were women with benign breast disease; (2) type of study: review (including systematic review), case report, etc.; (3) a sample size of less than 100 per arm in a case–control study; and (4) incomplete data on the factors of interest, or duplicate publication.

Two review authors (Yuan Wang, Ying Gao), working in parallel but independently, screened the abstracts and obtained full-text reports for those that appeared to evaluate risk factors for breast cancer. After obtaining the full reports (either in peer-reviewed form or as in-press articles), the review authors independently assessed the eligibility of each candidate study. In the case of multiple publications of the same study or overlapping datasets, only data from the largest or most recent report were included. Any disagreement in study selection was adjudicated by a third reviewer (Wenli Lu).

After full-text review, 98 articles were included in this systematic review: 20 on age at menarche, 6 on age at first birth, 37 on history of benign breast diseases, 46 on family history of breast cancer, 41 on history of breastfeeding, and 35 on history of induced abortion (some articles addressed more than one factor). To weight each study, the standard error (SE) of each log OR was calculated, and its square was used as the estimated variance of the log OR. The ORs were pooled with the inverse variance method in Review Manager 5.0. Pooled odds ratios of the six main risk factors for breast cancer are shown in Table 1.
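The pooling step can be sketched in Python. This is a minimal, hypothetical illustration, not the Review Manager 5.0 implementation: it assumes a fixed-effect model, recovers each study's SE from its reported 95 % CI as (ln upper − ln lower)/(2 × 1.96), and weights each log OR by 1/SE². The function names and the example studies are illustrative only.

```python
import math

Z_95 = 1.959964  # two-sided 95 % standard-normal quantile


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Return (log OR, SE of log OR), assuming a CI symmetric on the log scale."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(odds_ratio), se


def pool_inverse_variance(studies):
    """Fixed-effect inverse-variance pooled OR with its 95 % CI.

    studies: iterable of (OR, lower 95 % limit, upper 95 % limit) tuples.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        log_or, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1.0 / se ** 2            # inverse of the variance of the log OR
        weighted_sum += weight * log_or
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z_95 * pooled_se),
            math.exp(pooled_log + Z_95 * pooled_se))


# Hypothetical studies, not data from the review.
print(pool_inverse_variance([(1.8, 1.2, 2.7), (2.2, 1.5, 3.2)]))
```

A random-effects model would add a between-study variance term to each weight; that variant is not shown here.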

Table 1 Pooled odds ratios of six main risk factors for breast cancer in Chinese women

Proportions of exposure to risk factors

A database was obtained from the breast cancer screening project in 53 cities in China. It contains data on each woman’s age at screening, age at menarche, age at first birth, history of benign breast diseases, family history of breast cancer, history of breastfeeding, and history of induced abortion. Data on 62,875 women were included in this database, and the proportions of the general population exposed to each risk factor were estimated from it (Table 2).
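Estimating the exposure proportions amounts to counting women per category of each risk factor. The sketch below assumes a hypothetical record layout (one dict per woman) and excludes missing values from the denominator; the source does not say how missing data were handled, so that choice is an assumption.

```python
from collections import Counter


def exposure_proportions(records, factor):
    """Proportion of screened women in each category of one risk factor.

    records: list of dicts, one per woman (hypothetical record layout).
    factor:  hypothetical field name, e.g. "family_history".
    Records with a missing value (None or absent key) are excluded (assumption).
    """
    counts = Counter(r[factor] for r in records if r.get(factor) is not None)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


records = [
    {"family_history": "yes"},
    {"family_history": "no"},
    {"family_history": "no"},
    {"family_history": "no"},
    {"family_history": None},   # missing, excluded from the denominator
]
print(exposure_proportions(records, "family_history"))  # {'yes': 0.25, 'no': 0.75}
```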

Table 2 Proportions of exposure to risk factors in the general population of Chinese women

Calculation of risk score

Risk scores for the included risk factors were calculated using the risk score conversion table (quantitative criteria for assessment) (Table 3) and Formula 1. A risk factor was included if its pooled odds ratio was greater than 1.5 or less than 0.7 and statistically significantly different from 1.0. A risk score greater than 1.00 indicates an increased risk of breast cancer. The baseline incidence ratio (BIR) was calculated as \( \mathrm{BIR}=\frac{1}{{\displaystyle \sum_{k=1}^n{\mathrm{POR}}_k\times {P}_k}} \).
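The inclusion rule can be written as a small predicate. This sketch assumes that "statistically different from 1.0" means the 95 % CI of the pooled OR does not contain 1.0; the function name and example values are hypothetical.

```python
def include_factor(pooled_or, ci_low, ci_high):
    """Inclusion rule: pooled OR > 1.5 or < 0.7, and 95 % CI excluding 1.0.

    Reading 'statistically different from 1.0' as 'the 95 % CI does not
    contain 1.0' is an assumption of this sketch.
    """
    strong_enough = pooled_or > 1.5 or pooled_or < 0.7
    significant = ci_low > 1.0 or ci_high < 1.0
    return strong_enough and significant


# Hypothetical pooled estimates, not the values in Table 1.
print(include_factor(1.9, 1.4, 2.6))   # True: strong and significant
print(include_factor(1.3, 1.1, 1.5))   # False: effect too weak
print(include_factor(0.6, 0.3, 1.2))   # False: CI includes 1.0
```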

Table 3 Risk scores of exposure factors in Chinese women
$$ {\mathrm{RS}}_i={\mathrm{POR}}_i\times \mathrm{BIR}=\frac{{\mathrm{POR}}_i}{{\displaystyle \sum_{k=1}^n{\mathrm{POR}}_k\times {P}_k}} $$
(1)
\( {\mathrm{RS}}_i \): risk score of exposure factor i
\( {\mathrm{POR}}_i \): pooled odds ratio
BIR: baseline incidence ratio
\( {P}_i \): proportion of the population exposed to risk factor i
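Formula 1 can be sketched for a single risk factor. The sketch assumes the factor's exposure categories partition the population, with the reference category assigned a pooled OR of 1.0; the function name and the numbers are hypothetical. By construction, the population-weighted mean risk score is 1.0, which is why a score above 1.00 marks above-average risk.

```python
def risk_scores(pooled_or, proportions):
    """Formula 1: RS_i = POR_i * BIR, where BIR = 1 / sum_k(POR_k * P_k).

    pooled_or:   {category: pooled OR}; the reference category has OR 1.0
    proportions: {category: share of the population}; shares sum to 1
    """
    if set(pooled_or) != set(proportions):
        raise ValueError("both mappings must cover the same categories")
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    bir = 1.0 / sum(pooled_or[c] * proportions[c] for c in pooled_or)
    return {c: pooled_or[c] * bir for c in pooled_or}


# Hypothetical factor: 10 % exposed with pooled OR 2.0, 90 % unexposed.
scores = risk_scores({"exposed": 2.0, "unexposed": 1.0},
                     {"exposed": 0.10, "unexposed": 0.90})
print(scores)   # exposed about 1.818, unexposed about 0.909
```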

Calculation of combined risk score

The combined risk score (CRS) of an individual was calculated from her risk scores using the “credit–debit” method [15]. For each risk score greater than the average (1.0), the excess over 1.0 (\( {\mathrm{RS}}_i-1 \)) was computed, and these excesses were summed (Formula 2). Risk scores less than 1.0 were multiplied together (Formula 3). The CRS is the sum of the former total (\( {\mathrm{CRS}}_1 \)) and the latter product (\( {\mathrm{CRS}}_2 \)) (Formula 4).

$$ {\mathrm{CRS}}_1=\sum_{i:\,{\mathrm{RS}}_i>1}\left({\mathrm{RS}}_i-1\right) $$
(2)
$$ {\mathrm{CRS}}_2=\prod_{i:\,{\mathrm{RS}}_i<1}{\mathrm{RS}}_i $$
(3)
$$ \mathrm{CRS}={\mathrm{CRS}}_1+{\mathrm{CRS}}_2 $$
(4)
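The credit–debit combination in Formulas 2–4 can be sketched as follows. Two conventions here are assumptions not stated in the source: a score of exactly 1.0 contributes nothing, and when no score is below 1.0 the product in Formula 3 is taken as 1.0 (the empty product, which `math.prod` returns; Python 3.8+). The function name and scores are hypothetical.

```python
import math


def combined_risk_score(scores):
    """Credit-debit combination of one woman's risk scores (Formulas 2-4)."""
    crs1 = sum(rs - 1.0 for rs in scores if rs > 1.0)   # Formula 2: summed excesses
    crs2 = math.prod(rs for rs in scores if rs < 1.0)   # Formula 3: product (1.0 if none)
    return crs1 + crs2                                   # Formula 4


# Hypothetical scores for one woman: two elevated factors and one protective factor.
print(combined_risk_score([1.8, 1.3, 0.9]))   # (0.8 + 0.3) + 0.9, approximately 2.0
```

Summing the excesses, rather than multiplying all scores, keeps several moderately elevated factors from compounding multiplicatively, while protective factors still scale the result down.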

Calculation of probability of developing breast cancer

The age-specific 5-year probability of developing breast cancer was calculated from age-specific incidence rates [16] using Formula 5 [17] (Table 4). In Formula 5, the number 2 is a constant, and the number 5 is the length of the projection period in years.

Table 4 Calculation of the probability of developing breast cancer in 5 years in the general population
$$ \text{Probability of developing breast cancer in 5 years}=\frac{2\times 5\times \text{age-specific incidence}}{2+5\times \text{age-specific incidence}} $$
(5)
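Formula 5 has the form of the standard life-table conversion of a rate into a probability, under the assumption that cases arise on average halfway through the interval; for small rates the result is close to 5 × the annual rate. The sketch assumes incidence is supplied per 100,000 woman-years; the function name and example rate are hypothetical.

```python
def n_year_probability(incidence_per_100k, years=5):
    """Formula 5: probability of developing breast cancer within `years`
    years, from an annual age-specific incidence rate.

    incidence_per_100k: incidence per 100,000 woman-years (assumed unit).
    """
    rate = incidence_per_100k / 100_000   # convert to a rate per woman-year
    return 2 * years * rate / (2 + years * rate)


# Hypothetical incidence of 50 per 100,000 woman-years.
p = n_year_probability(50)
print(f"{p * 1000:.2f} per mille")   # 2.50 per mille
```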

Calculation of individual risk of breast cancer

The 5-year risk of breast cancer for each individual was estimated from the risk score conversion tables and the 5-year incidence probability tables using Formula 6:

$$ \mathrm{FR}={P}_j\times \mathrm{FP} $$
(6)
FR: five-year risk of breast cancer
\( {P}_j \): combined risk score of individual j (Formula 4)
FP: probability of developing breast cancer in 5 years for the individual’s age group
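Putting Formulas 4 and 6 together, an individual's 5-year risk is her combined risk score multiplied by the 5-year probability for her age band. The sketch below is self-contained and repeats the credit–debit rule inline. The age-band table `FP_BY_AGE_BAND` contains made-up placeholder values, not the values of Table 4, and covers only ages 35–59; all names are hypothetical.

```python
import math

# Hypothetical 5-year probabilities (FP) by 5-year age band; NOT the values of Table 4.
FP_BY_AGE_BAND = {35: 0.0012, 40: 0.0018, 45: 0.0024, 50: 0.0030, 55: 0.0028}


def lookup_fp(age):
    """Return FP for the 5-year age band containing `age` (e.g. 52 -> band 50)."""
    return FP_BY_AGE_BAND[(age // 5) * 5]


def five_year_risk(age, scores):
    """Formula 6: FR = CRS x FP, with CRS from the credit-debit rule."""
    crs = (sum(rs - 1.0 for rs in scores if rs > 1.0)
           + math.prod(rs for rs in scores if rs < 1.0))
    return crs * lookup_fp(age)


# Hypothetical woman aged 52 with risk scores 1.8, 1.3 and 0.9 (CRS about 2.0).
print(f"{five_year_risk(52, [1.8, 1.3, 0.9]) * 1000:.1f} per mille")  # 6.0 per mille
```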

A program for calculating the HRA model was developed in PowerBuilder 9.0 and connected to Microsoft SQL Server 2012.

Validation database

The database of screening results from 53 cities was used to validate the risk assessment model. The average risk in each subgroup, classified by age group, age at menarche, age at first birth, family history of breast cancer, history of benign breast diseases, breastfeeding history, and history of induced abortion, was represented by the subgroup median. Nonparametric tests were used to compare the average risks of the subgroups. The discriminatory accuracy of the model was assessed using the receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC), obtained by applying the model to the screening cohorts in the 53 cities. The ROC curve is a plot of the true-positive rate (sensitivity) against the false-positive rate (1 − specificity) over a continuum of thresholds; breast cancer is predicted for a participant if her estimated probability of breast cancer exceeds a given threshold.
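The ROC curve and AUC described above can be sketched without external libraries. Here a woman is predicted positive when her risk strictly exceeds the threshold, and the AUC is computed as the Mann–Whitney probability that a randomly chosen case has a higher predicted risk than a randomly chosen non-case (ties count one half). The function names and data are hypothetical; with 15 cases among 62,875 women, the double loop needs roughly a million comparisons, which is manageable.

```python
def roc_points(risks, labels):
    """ROC points (FPR, TPR); a woman is predicted positive if risk > threshold.

    labels: 1 for a diagnosed breast cancer case, 0 for a cancer-free woman.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(risks), reverse=True):   # top threshold yields (0, 0)
        tp = sum(1 for r, y in zip(risks, labels) if y == 1 and r > t)
        fp = sum(1 for r, y in zip(risks, labels) if y == 0 and r > t)
        points.append((fp / n_neg, tp / n_pos))
    points.append((1.0, 1.0))                      # threshold below every risk
    return points


def auc_mann_whitney(risks, labels):
    """AUC = P(random case's risk > random non-case's risk), ties counted as 1/2."""
    cases = [r for r, y in zip(risks, labels) if y == 1]
    controls = [r for r, y in zip(risks, labels) if y == 0]
    score = 0.0
    for c in cases:
        for d in controls:
            score += 1.0 if c > d else 0.5 if c == d else 0.0
    return score / (len(cases) * len(controls))


risks = [0.10, 0.40, 0.35, 0.80]   # hypothetical predicted risks
labels = [0, 0, 1, 1]
print(roc_points(risks, labels))
print(auc_mann_whitney(risks, labels))   # 0.75
```

The trapezoidal area under the returned ROC points equals the Mann–Whitney AUC, which is why either can be reported.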

Results

Probability of developing breast cancer in 5 years

The median 5-year risk of breast cancer for the 62,875 women was 3.3 ‰ (interquartile range, 2.2–3.8 ‰). For 10 % of the women, the 5-year risk was higher than 8 ‰ (Fig. 1). The probability of developing breast cancer increased with age, from 1.2 ‰ for participants aged 35 through 40 years to 3.8 ‰ for participants aged 50 through 55 years, and decreased slightly thereafter (Fig. 2). The 5-year risks of breast cancer in subgroups defined by risk factors are shown in Fig. 3. The median 5-year risk was higher for women whose age at menarche was less than 12 years than for women whose age at menarche was more than 12 years. For the 503 women who first gave birth after 35 years of age, the 5-year risk of breast cancer (11.1 ‰) was about 3.5 times that of women who first gave birth before 35 years of age (3.2 ‰). The risk for women with a history of benign breast diseases (9.4 ‰) was about three times that for women with no such history (3.1 ‰). The risk of breast cancer was 8.2 ‰ for the 510 women who had at least one first-degree relative diagnosed with breast cancer, much higher than that for women with no such family history. For the 5,769 women who did not breastfeed, the 5-year risk of breast cancer was 6.3 ‰. For women who had more than three induced abortions, the risk was estimated at 7.2 ‰, about twice that of women who had fewer induced abortions.

Fig. 1 Distribution of 5-year risk of breast cancer in 62,875 Chinese women

Fig. 2 Comparison of estimated 5-year risk of breast cancer by current age

Fig. 3 Comparison of estimated 5-year risk of breast cancer (a) by age at menarche, (b) by age at first birth, (c) by history of benign breast diseases, (d) by family history of breast cancer, (e) by history of breastfeeding, and (f) by history of induced abortion

Uncertainty in projections

The performance of the HRA model was assessed in terms of sensitivity (the percentage of breast cancer cases who were predicted to be at high risk) and specificity (the percentage of cancer-free participants who were predicted not to be at high risk). When women whose risk exceeded the median risk were classified as high risk, the sensitivity was 60.0 % and the specificity 47.8 %; 6 of the 15 diagnosed breast cancer cases (40 %) were missed (Table 5). When women in the top 10 % of risk scores were classified as high risk, the sensitivity was lower (26.7 %) and the specificity higher (90.1 %).

Table 5 Predictive accuracy of the HRA model in the breast cancer screening project
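The two classification rules above can be sketched as follows: "high risk" means a predicted risk strictly above a cut-off, taken either as the median or as the 90th percentile (top 10 %). The function names are hypothetical, and `statistics.quantiles` uses its default "exclusive" method, which may differ slightly from the percentile definition used by the authors. As a consistency check on the reported figures, 9 of 15 cases detected gives 60.0 % and 4 of 15 gives 26.7 %.

```python
import statistics


def sensitivity_specificity(risks, labels, threshold):
    """Sensitivity and specificity when 'high risk' means risk > threshold."""
    tp = sum(1 for r, y in zip(risks, labels) if y == 1 and r > threshold)
    fn = sum(1 for r, y in zip(risks, labels) if y == 1 and r <= threshold)
    tn = sum(1 for r, y in zip(risks, labels) if y == 0 and r <= threshold)
    fp = sum(1 for r, y in zip(risks, labels) if y == 0 and r > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def median_threshold(risks):
    return statistics.median(risks)


def top_decile_threshold(risks):
    # The last of the 9 cut points returned by quantiles(n=10) is the 90th percentile.
    return statistics.quantiles(risks, n=10)[-1]


# Hypothetical data: 1 = breast cancer case, 0 = cancer-free.
risks = [1.0, 2.0, 3.0, 4.0]
labels = [0, 1, 0, 1]
print(sensitivity_specificity(risks, labels, median_threshold(risks)))  # (0.5, 0.5)
```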

The ROC curve was also computed for the HRA model; Figure 4 shows the ROC curve from the validation data. The area under the curve was 0.64 (95 % CI, 0.50–0.78). Because the lower confidence limit equals 0.50, the difference from 0.50 is only of borderline statistical significance, indicating that the discriminatory accuracy of the model is modest.
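The borderline significance of the reported AUC can be checked with a short calculation. This sketch assumes the reported CI is a symmetric Wald-type interval, AUC ± 1.96 × SE (the interval 0.50–0.78 is indeed symmetric about 0.64), backs out the SE, and computes a two-sided z-test against 0.50 using a normal approximation. The function name is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95 % standard-normal quantile


def auc_z_test(auc, ci_low, ci_high, null=0.5):
    """Back out the SE from a symmetric Wald-type 95 % CI and test AUC = null.

    Returns (SE, z, two-sided p) under a normal approximation.
    """
    se = (ci_high - ci_low) / (2 * Z_95)
    z = (auc - null) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return se, z, p


se, z, p = auc_z_test(0.64, 0.50, 0.78)
print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.3f}")   # SE = 0.0714, z = 1.96, p = 0.050
```

Because the lower limit sits exactly at 0.50, z lands at about 1.96 and p at about 0.05, the conventional boundary of significance.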

Fig. 4 The ROC curve of the breast cancer risk assessment model

Discussion

This study introduces a breast cancer risk assessment model developed on the basis of published studies on risk factors for breast cancer. Current age, age at menarche, age at first birth, history of benign breast diseases, family history of breast cancer, history of breastfeeding, and history of induced abortion were used as input data for calculating the 5-year risk of breast cancer, which was calculated for each individual. When individuals whose risk was above the median (3.3 ‰) in the validation database were classified as high risk, the sensitivity was 60.0 % and the specificity 47.8 %. The unweighted AUC was 0.64. The 5-year risk increased with age up to 50–55 years, and exposure to any of the included risk factors increased the 5-year risk.

Besides current age, six predictive factors were included in this HRA model. The probability of developing breast cancer over 5 years decreased after 60 years of age, which might reflect the lower age-specific incidence of breast cancer in that age group. Ages at menarche and at first birth are established risk factors for breast cancer. A meta-analysis of 117 studies, carried out in 35 countries, mostly in Europe and North America, confirmed that young age at menarche and old age at menopause increase breast cancer risk [18]. Late menarche showed a protective effect of about 40 % compared with menarche before 12 years of age. Women who were older at first birth had, on average, twice the risk of breast cancer of those in the youngest first-birth category (younger than 35 years) [19–21]. A meta-analysis of 12 previous reports showed a lower risk of breast cancer in women who had ever breastfed than in parous women who had never breastfed; this reduction was more pronounced in women who were premenopausal at diagnosis and in women who breastfed for a long time [22]. Breastfeeding was associated with about a 50 % reduction in breast cancer risk compared with no breastfeeding. The predictive effects of these three factors are consistent with the findings of epidemiological studies [23–25]. History of benign breast diseases, rather than “ever having previous benign breast biopsy”, was used in developing the current HRA model because biopsies are uncommon among most Asian women. Family history of breast cancer was included in this HRA model instead of the “number of relatives having breast cancer” used in the Gail model, whose predictive accuracy might be limited.
A possible link between abortion and breast cancer has been suggested because abortion is thought to interrupt the normal hormonal changes of pregnancy. Some believe that this interruption might increase a woman’s risk of developing breast cancer [26, 27]. “History of induced abortion (more than 3 times)” was included in this HRA model because its pooled odds ratio was close to 2. The pooled odds ratios suggest a somewhat different pattern of risk factors for breast cancer in the Chinese population: the effects of age at menarche, history of abortion, and breastfeeding are stronger than the effects of the factors included in the Gail model of 1987, whereas the effect of family history is weaker. Future research on environmental carcinogens, dietary agents, and endogenous hormones may contribute to a better understanding of ethnic differences in breast cancer risk factors. Several factors were not included in the present model because of their weak effects or lack of statistical significance; they should be reconsidered as additional evidence on their effects on breast cancer risk becomes available.

The current model is not perfect, and alternative models and strategies for estimating its parameters should be explored. Adding new factors or changing the parameters used in the model is straightforward. It is unlikely that any prediction model will achieve much higher discriminatory accuracy, given the low relative risks associated with well-established nonmodifiable breast cancer risk factors. In addition, the present HRA model combines the effects of risk factors as if they were independent of each other. For an outcome with multifactorial causes, however, the association between the outcome and a particular factor may be influenced by the distribution of other risk factors across exposure groups. To avoid distortion from confounding, a predictive model such as the multiple logistic function, which takes into account the simultaneous effects of multiple risk characteristics, is needed. However, the appropriateness of the multiple logistic function as a predictive model depends on how accurately interaction effects among risk factors are measured. Several procedures available for multivariate risk estimation in HRA, including the log-linear model, were evaluated. All of them require either datasets containing all the predictor variables of interest or favorable assumptions about the absence of confounding and interaction, which, in general, were not available. The superiority of these theoretically more appealing methods over the simple method employed in conventional HRA remains to be demonstrated.

The risk factors included in the current model were selected on the basis of a systematic review of epidemiological studies. This approach takes advantage of the statistical power of meta-analysis, which is considered to give a more precise estimate of the true effect size than a single study; in addition, the results of a meta-analysis can be generalized to a larger population. It should be noted, however, that a meta-analysis of several small studies does not predict the results of a single large study [28]. Only studies with large samples and high-quality designs were included in the meta-analysis used to calculate the pooled odds ratios for the current HRA model. Further, when the outcomes of a meta-analysis are used in an HRA model, the association between risk factors and disease might be overestimated because of publication bias, so an individual’s risk of breast cancer could be overestimated too. The accuracy of prediction needs to be further tested in a larger population-based cohort with long-term follow-up. As regards the reliability of the validation, the validation database was obtained only from the first round of screening of a breast cancer project, without any further follow-up. This may not be an appropriate way to test the reliability of our model in predicting the 5-year risk of breast cancer, and follow-up is required to identify women who subsequently develop breast cancer. The present validation is open to question because of the small number of cases and the lack of follow-up; validation in population-based prospective follow-up studies would help improve the current model.

The proportions of the general population exposed to each risk factor were calculated from the database of a breast cancer screening project. These proportions were probably overestimated, because women with known risk factors may be more likely to attend screening. Because the exposure proportions appear in the denominator of the BIR, overestimating them lowers the BIR, so the risk scores and predicted probabilities of developing breast cancer may be underestimated. The distribution of breast cancer risk factors among Chinese women will change with China’s economic and social development and with the increase in breast cancer incidence; when such changes occur, the current model will need to be updated.
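The direction of this bias can be illustrated with Formula 1 for a hypothetical two-category factor (pooled OR 2.0). If screening attendees overstate the exposure proportion, for example 15 % instead of a true 10 %, the BIR shrinks and both risk scores fall. The function name and numbers are hypothetical.

```python
def exposed_unexposed_scores(pooled_or, p_exposed):
    """Risk scores (exposed, unexposed) for a two-category factor via Formula 1."""
    bir = 1.0 / (pooled_or * p_exposed + 1.0 * (1.0 - p_exposed))
    return pooled_or * bir, bir


# Hypothetical factor with pooled OR 2.0; true exposure 10 %, screening-based 15 %.
true_scores = exposed_unexposed_scores(2.0, 0.10)
biased_scores = exposed_unexposed_scores(2.0, 0.15)
print(true_scores)    # about (1.818, 0.909)
print(biased_scores)  # about (1.739, 0.870)
```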

Many models with varying degrees of validation have been developed for assessing breast cancer risk [29–31]. Until recently, the two most frequently used models were the Gail and Claus models. In both, the combination of risk factors was derived from a single case–control dataset. The Gail model focuses primarily on nongenetic risk factors with limited information on family history, whereas the original Claus model does not include any nongenetic risk factors. The Gail model has been modified and validated for different racial groups, including Asian women [32–34]. However, the Gail and Claus models are not widely used in China because biopsies and genetic tests are not easily available to most Chinese women. In addition, the predictive accuracy of the current model is comparable to that of the Gail model as used in China and of a model involving single-nucleotide polymorphisms [35]. The current model aims to provide a predictive tool that is more applicable to Chinese women.

The risk prediction model reported in this article is based on a combination of risk factors and shows modest overall predictive power, but it is still weak at predicting which particular women will develop the disease. The HRA model highlights health risks but does not diagnose disease, and thus it cannot substitute for consultation with a medical or health practitioner. It is important to stress that this tool is not intended to replace breast self-examination, breast screening, or other methods of detection. Additional information on individual genetic factors could improve the discriminatory power of breast cancer risk prediction. The model may prove useful in the screening setting by increasing women’s awareness of their risk of developing breast cancer; individualized numerical risk estimates can increase adherence to screening [34, 36, 37]. This HRA model was developed to motivate women to take proper precautions and to draw attention to their expected risk. However, hospitals, institutes, health management centers, and health check-up centers that plan to use this health risk appraisal tool should take into consideration potential adverse effects, such as psychological sequelae [38–40]. The findings of Su and colleagues indicate that communicating risk status through an individual health risk appraisal service can induce psychological sequelae, especially in women at higher risk [38].

Considering the rapid increase in breast cancer incidence in China in recent decades, breast cancer risk assessment models targeting Chinese women are very much needed. Despite several limitations, the current model is more applicable to Chinese women than models developed for Western populations. Computerized assessment software will be developed based on the current model. We expect that this HRA model will help raise Chinese women’s awareness of the need for breast cancer screening.