1 INTRODUCTION

Breast cancer (BC) is one of the most prevalent malignancies among women worldwide, including in Vietnam. BC is a complex disease influenced by both hereditary and environmental factors. The heritability of BC has recently been estimated at 31%, and its common environmental component at only 16% (Möller, 2016). This suggests that variation in disease-related genes is a major focus of current BC research.

Genome-wide association studies (GWASs) have identified many BC susceptibility variants in Caucasian and Asian populations (Hunter, 2007; Gold, 2008; Stacey Simon N, 2008; Cai, 2011; Kim, 2012; Han M.-R., 2016). However, most of these variants confer lower-penetrance risk than mutations in well-established high-risk genes such as BRCA1 or BRCA2 (Bradbury, 2007). Through rare mutations, these two genes account for about 25% of familial risk and around 5% of BC incidence (Peto, 1999; Pharoah, 2004). More than 100 BC-associated single nucleotide polymorphisms (SNPs) have been discovered to date. Although most of these SNPs were discovered in predominantly Caucasian groups (Turnbull, 2010; Fletcher, 2011; Haiman, 2011), a few were discovered in Asian populations (Kim, 2012; Cai, 2014; Han M.-R., 2016). Several studies have shown that specific SNPs have ethnicity-specific effects, so findings for them may not transfer to other populations (Kim, 2012; Chen Yazhen, 2016; Han M.-R., 2016; Xu M., 2016). Fine-scale mapping of GWAS-identified regions (Glubb, 2015; Orr, 2015; Shi, 2016) and meta-analyses of previous GWASs (Lindström, 2014; Michailidou, 2015; Couch, 2016) have recently contributed to a rapidly growing number of SNPs associated with BC susceptibility in particular ethnic groups (Zheng Y., 2013).

Even when a specific SNP is associated with BC risk, the risk conferred by that single variant is small. Polygenic risk scores (PRSs) have therefore been developed to assess the cumulative effect of BC-associated SNPs (Mavaddat, 2015). A PRS combines each SNP's odds ratio (OR) with the number of risk alleles an individual carries. Most PRSs have been derived from Caucasian population data sets, whereas only a small number of studies in Asian populations have developed PRS models for BC risk from associated SNPs.

This study aimed to examine the association of known SNPs with BC risk in Vietnamese women. The SNPs associated with BC were then combined into a PRS, whose ability to predict BC risk in Vietnamese women was evaluated.

2 MATERIALS AND METHODS

2.1 Study Population

This was a population case-control study of BC. Cases (n = 240) were women aged 28–81 years with histologically confirmed first primary in situ or invasive BC, identified at the Oncology Hospital in Ho Chi Minh City, Vietnam, between 2015 and 2017. Controls (n = 271) were healthy women with no known cancer and no cancer record, recruited at the same hospital to serve as population-based healthy controls; their age distribution approximately matched that of the cases.

2.2 DNA Extraction, SNP Selection, and Genotyping

Genomic DNA was extracted from peripheral blood using the salting-out procedure and stored at –80 °C until analysis. SNPs were selected by reviewing published GWASs and candidate-gene association studies (Gapska, 2009; Qian, 2011; Li N., 2014; Han M.R., 2015; Qi, 2015; Wu, 2015; Chen Y., 2016; Hein, 2017; Zhang H., 2017; Zhu, 2017). Ten SNPs that were most significantly associated with BC in other populations and common in the Vietnamese population (minor allele frequency [MAF] > 10%) were selected: VDR (rs2228570), IGF-I (rs7965399), miR-146A (rs2910164), MRE11A (rs2155209), TOX3 (rs4784227), HSPD1 (rs2605039), LSP1 (rs3817198), FGFR2 (rs2981582), miR-196A2 (rs11614913), and miR-370 (rs12325489).

SNP genotyping was performed by high-resolution melting (HRM) analysis on a LightCycler 96 System (Roche Diagnostics, Penzberg, Germany). Comprehensive quality control (QC) procedures were followed to ensure genotyping quality, including confirmation of duplicate genotypes by Sanger sequencing, a Hardy-Weinberg equilibrium (HWE) test, and a call rate above 99%. In total, 30 QC samples (5.9%) were genotyped successfully, with 100% concordance.

2.3 Statistical Analysis

ORs and 95% confidence intervals (CIs) were computed with logistic regression to examine the association between each SNP and BC risk. HWE was tested among controls using a goodness-of-fit chi-squared test. Student's t-test and the chi-squared test were used to compare cases and controls. All statistical analyses were performed in R version 4.1.0.
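The HWE check among controls can be illustrated with a short Python sketch; the paper's analyses were run in R 4.1.0, so this is not the authors' code. It assumes genotype counts are available for the three genotypes of a biallelic SNP, and the function name `hwe_chi2` is hypothetical. The 1-df p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Goodness-of-fit chi-squared test for Hardy-Weinberg equilibrium (1 df).

    n_aa, n_ab, n_bb: genotype counts among controls
    (homozygous reference, heterozygous, homozygous alternative).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # reference allele frequency
    q = 1 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `hwe_chi2(150, 100, 21)` tests a hypothetical set of 271 control genotypes; a p-value below the chosen threshold would flag the SNP for a genotyping check.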

A PRS was constructed to estimate the polygenic contribution of BC susceptibility loci, using SNPs nominally associated with BC risk (p < 0.05) under any one of the per-allele, codominant, dominant, overdominant, or recessive logistic regression models. For SNPs in strong linkage disequilibrium on the same gene or chromosome (D' > 0.9), only the variant with the lowest p-value was retained as a candidate. A weighted PRS was then calculated for each individual as $\mathrm{PRS} = \sum_{k=1}^{n} \beta_k x_k = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$, where $\beta_k = \ln(\mathrm{OR}_k)$ is the per-allele log odds ratio for BC associated with the minor allele of SNP $k$, and $x_k$ is the number of copies of that allele (0, 1, or 2) carried by the individual.
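The weighted score above reduces to a sum of log odds ratios multiplied by allele counts. The Python sketch below assumes genotypes are already coded as 0, 1, or 2 copies of the allele whose per-allele OR is used; the SNP labels and OR values are hypothetical placeholders, not the estimates reported in Table 2.

```python
import math

# Hypothetical per-allele ORs (placeholders, NOT this study's estimates).
# beta_k = ln(OR_k); an OR below 1 gives a negative (protective) weight.
PER_ALLELE_OR = {"snp1": 1.30, "snp2": 0.80, "snp3": 1.15}


def weighted_prs(allele_counts: dict[str, int], per_allele_or: dict[str, float]) -> float:
    """PRS = sum over SNPs of ln(OR_k) * x_k, with x_k in {0, 1, 2}."""
    score = 0.0
    for snp, odds_ratio in per_allele_or.items():
        x = allele_counts[snp]
        if x not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1, or 2, got {x}")
        score += math.log(odds_ratio) * x
    return score


person = {"snp1": 2, "snp2": 1, "snp3": 0}
prs = weighted_prs(person, PER_ALLELE_OR)
```

Because protective alleles carry negative weights, the resulting scores can fall on either side of zero, as the quartile ranges in Table 3 do.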

Logistic regression was performed to investigate the association between BC and PRS, with PRS treated as a continuous variable (Mavaddat, 2015). The same model was also fitted separately within strata of menopausal status. In addition, ORs for each PRS quartile were estimated by logistic regression, with the first quartile as the reference. The area under the receiver operating characteristic curve (AUC) was used to evaluate the model's discriminative ability.
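For an unadjusted model whose only predictors are quartile indicators, the logistic-regression OR for each quartile against the first equals the cross-product ratio of the corresponding 2×2 table, and its Wald 95% CI equals Woolf's interval. The Python sketch below uses that shortcut. It assumes quartile cutpoints are taken from the control distribution (the text does not state which distribution was used), and all function names are hypothetical.

```python
import math
import statistics


def quartile_cutpoints(control_scores: list[float]) -> list[float]:
    """Three cutpoints (25th, 50th, 75th percentiles); assumes control-based quartiles."""
    return statistics.quantiles(control_scores, n=4)


def assign_quartile(score: float, cuts: list[float]) -> int:
    """Quartile 1-4; a score equal to a cutpoint falls in the lower quartile."""
    return 1 + sum(score > c for c in cuts)


def quartile_or(cases_q: int, controls_q: int,
                cases_ref: int, controls_ref: int) -> tuple[float, float, float]:
    """Unadjusted OR with Woolf (Wald) 95% CI for one quartile vs. the reference quartile."""
    log_or = math.log((cases_q * controls_ref) / (controls_q * cases_ref))
    se = math.sqrt(1 / cases_q + 1 / controls_q + 1 / cases_ref + 1 / controls_ref)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))
```

Counts per quartile come from cross-tabulating `assign_quartile` output by case status. A covariate-adjusted model or a formal trend test would still need a full logistic regression (for example, R's `glm` with `family = binomial`).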

3 RESULTS

The characteristics of BC patients and healthy controls are shown in Table 1. The mean ages of cases and controls were 51 ± 8.8 and 50 ± 8.9 years, respectively, and did not differ significantly. Owing to a lack of comprehensive information, this study used age alone as a rough proxy for menopausal status (premenopausal: age ≤ 50; postmenopausal: age > 50) (Hill, 1996). Among cases, 47% were premenopausal and 53% postmenopausal, compared with 60% and 40%, respectively, among controls. Thus, a significantly larger proportion of cases than controls were postmenopausal.
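The two comparisons above can be roughly reproduced from the reported summary figures, shown here as a Python sketch rather than the authors' R code. Age is compared with Student's pooled-variance t-test (Section 2.3), using a normal approximation for the p-value, and menopausal status with a Pearson chi-squared test. The menopausal-status counts (113/127 for cases, 163/108 for controls) are an assumption: they are back-calculated from the rounded percentages and sample sizes, not taken from Table 1.

```python
import math


def pooled_t_from_summary(m1: float, s1: float, n1: int,
                          m2: float, s2: float, n2: int) -> tuple[float, int, float]:
    """Student's two-sample t statistic (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Normal approximation to the two-sided p-value; adequate when df is in the hundreds.
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # 1-df upper-tail p-value


# Age (years): cases 51 ± 8.8 (n = 240) vs. controls 50 ± 8.9 (n = 271).
t, df, p_age = pooled_t_from_summary(51, 8.8, 240, 50, 8.9, 271)

# Menopausal status; rows = cases/controls, columns = pre/post.
# Counts are back-calculated from rounded percentages, so they are approximate.
chi2, p_meno = chi2_2x2(113, 127, 163, 108)
```

With these inputs the age comparison gives t ≈ 1.27 (p ≈ 0.20) and the menopausal-status comparison gives χ² ≈ 8.7 (p ≈ 0.003), in line with the text; the exact values in Table 1 may differ slightly because the counts are approximate.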

Table 1. Descriptive characteristics of BC patients and healthy controls

Table 2 shows the associations between the ten candidate SNPs and BC risk in the Vietnamese study population. Under the per-allele model, two SNPs, rs2605039 and rs2981582, were significantly associated with BC risk. Three SNPs (rs4784227, rs2605039, and rs11614913) were significantly associated with BC risk under the dominant model, and four SNPs (rs2155209, rs2605039, rs3817198, and rs2981582) under the recessive model. CASC22 rs12325489 was significantly associated with BC risk under both the codominant and the overdominant models. The remaining three SNPs, VDR rs2228570, IGF-I rs7965399, and miR-146A rs2910164, were not significantly associated with BC risk. The seven SNPs nominally associated with BC risk (rs2155209, rs4784227, rs2605039, rs3817198, rs2981582, rs11614913, and rs12325489) were selected to build the PRS model. Because none of these seven SNPs were in strong linkage disequilibrium (D' < 0.8), all of them were retained in the PRS model.
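The five genetic models compared above differ only in how each genotype is coded before it enters the logistic regression. The Python sketch below shows the conventional codings, assuming genotypes are stored as the count (0, 1, or 2) of the tested allele; the function `encode` is hypothetical and not part of the authors' R analysis.

```python
def encode(genotype: int, model: str) -> list[int]:
    """Map a genotype (copies of the tested allele: 0, 1, 2) to regression covariate(s)."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    if model == "per_allele":    # log-additive: 0, 1, 2 copies
        return [genotype]
    if model == "dominant":      # carriers (1 or 2 copies) vs. non-carriers
        return [int(genotype >= 1)]
    if model == "recessive":     # 2 copies vs. 0 or 1 copy
        return [int(genotype == 2)]
    if model == "overdominant":  # heterozygotes vs. both homozygotes
        return [int(genotype == 1)]
    if model == "codominant":    # two indicators; homozygous reference is the baseline
        return [int(genotype == 1), int(genotype == 2)]
    raise ValueError(f"unknown model: {model}")
```

Only the codominant model spends two degrees of freedom, which is why it can detect a heterozygote effect, as for rs12325489, that the single-covariate models summarize differently.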

Table 2. Association between the candidate SNPs and risk of BC

Compared with women in the first PRS quartile (–0.49 to –0.18), women in the second (–0.18 to –0.06), third (–0.06 to 0.07), and fourth (0.07 to 0.48) quartiles had 1.79-, 2.03-, and 2.65-fold higher BC risk, respectively, a significant increasing trend (p = 0.01). The trend was also significant when the same risk score was applied to premenopausal women alone (p = 0.007). Among postmenopausal women, those in the fourth quartile had a significant 2.38-fold increase in BC risk (p = 0.03) (Table 3).

Table 3. Association analysis between PRS and BC risk

The AUC was then calculated to evaluate the performance of the PRS model (Fig. 1). The AUC of the PRS was 71% for women of all ages, 75% for premenopausal women, and 68% for postmenopausal women (Table 3). The AUCs for all women and for premenopausal women exceeded 70%, which corresponds to an acceptable ability to discriminate between women with and without the disease (Safari, 2016).
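The AUC reported above equals the probability that a randomly chosen case has a higher PRS than a randomly chosen control, with ties counted as one half (the Mann-Whitney form of the AUC). A minimal Python sketch of that equivalence follows; it is not the authors' R code, and the function name is hypothetical.

```python
def auc_mann_whitney(case_scores: list[float], control_scores: list[float]) -> float:
    """AUC = P(case score > control score) + 0.5 * P(tie), over all case-control pairs."""
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

The pairwise loop is quadratic, which is trivial at this study's size (240 × 271 pairs); an AUC of 0.5 corresponds to the reference line in Fig. 1.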

Fig. 1.

ROC curves of the PRS. The purple line is the reference line. The red line shows the ROC curve for the whole study population (AUC = 0.71), the green line for postmenopausal women (AUC = 0.68), and the blue line for premenopausal women (AUC = 0.75).

4 DISCUSSION

Polygenic risk scores modify the risk associated with high- and moderate-risk breast cancer susceptibility genes (Kuchenbaecker, 2017). When a PRS was included in the risk prediction model, Kuchenbaecker et al. found significant differences in breast cancer risk (Kuchenbaecker, 2017). In particular, BRCA1 carriers with a PRS in the top quintile had a 56% risk of developing breast cancer by age 80, whereas others with a PRS at the 90th percentile had a 75% risk by age 80. There was also evidence of subtype-specific effects: a PRS developed for ER-negative breast cancer showed the strongest association with breast cancer risk in BRCA1 carriers (Kuchenbaecker, 2017), and this ER-negative PRS model also gave the highest discrimination for BRCA1 carriers (AUC = 0.58). However, because only a few such studies have been published, further investigation is warranted.

At present, genetic screening of high- and moderate-penetrance genes yields non-informative results for most women at high risk of breast cancer, and breast cancer prevention strategies are underused in this group (Schwartz, 2012). Novel risk prediction methods are therefore needed to inform risk management decisions for women who have received non-informative results from monogenic testing. Studies of polygenic factors have shown that a PRS predicts breast cancer risk in women with non-informative BRCA1/2 results (Dite, 2013; Dite, 2016; Li H., 2017; Lakeman, 2019).

There is currently no guideline to help clinical genetic services implement polygenic testing for breast cancer. Genetic services usually concentrate on testing monogenic risk genes (e.g., BRCA1/2) and their familial consequences. As polygenic screening becomes more widely used in clinical practice, a shift toward personalized care will be necessary. Breast cancer PRS testing is now available in clinics (Hughes, 2017; Black, 2018) and is aimed at women who have received non-informative results from monogenic testing.

This study evaluated ten BC-related SNPs to determine their possible associations with BC risk in the Vietnamese population (Table 2). Of these, seven SNPs were significantly associated with BC risk after multiple testing: MRE11A rs2155209, TOX3 rs4784227, HSPD1 rs2605039, FGFR2 rs2981582, LSP1 rs3817198, miR-196A2 rs11614913, and CASC22 rs12325489. These findings provide additional evidence for follow-up GWASs of BC, particularly in Asian populations. The seven selected SNPs were then combined into a PRS to measure their cumulative effect. The PRS model discriminated women according to BC risk with acceptable accuracy (AUC = 71%) and appeared most effective in premenopausal women (AUC = 75%).

Of the seven BC-related SNPs identified in the current association study, four were associated with increased BC risk in Vietnamese women: TOX3 rs4784227, FGFR2 rs2981582, LSP1 rs3817198, and CASC22 rs12325489. These four SNPs have previously been reported to be significantly associated with BC risk across different ethnicities, with the same direction of effect as in our findings (Long, 2010; Fernandez-Navarro, 2013; Lin, 2014; Na Li, 2014; Zhang Y., 2017; Zuo, 2020). However, several studies found no significant association between rs3817198 and BC risk in some populations, including those of Turkey (Ozgoz, 2020), China (Chen Y., 2016; Tan, 2017), Brazil (Fernandes, 2016), Tunisia (Shan J., 2012), and Germany (Campa, 2011). These discrepancies suggest that the effect of rs3817198 is population-specific, so it could serve as a specific marker for the Vietnamese population. Mechanistically, these SNPs act on distinct pathways underlying BC growth. By modulating TOX3 expression, rs4784227 could affect BRCA1, which plays a key role in maintaining genome stability and repairing DNA (Shan Jingxuan, 2013; Tajbakhsh, 2019). SNPs rs2981582 and rs3817198 are located in introns of FGFR2 (Easton, 2007) and LSP1 (Harrison, 2004), respectively; these genes are involved in cell proliferation and differentiation (Ricol, 1999; Yu, 2003; Fogarty, 2007) and in cell-cycle control and apoptosis (MacLachlan, 1995). Finally, rs12325489 affects the expression of lincRNA CASC22, a gene involved in tumorigenesis and metastasis, by creating a binding site for miR-370 (Dinger, 2008; Guttman, 2009; Huarte, 2010; Gibb, 2011).

The other significant associations identified in our study involved MRE11A rs2155209, HSPD1 rs2605039, and miR-196A2 rs11614913, all three of which were associated with a significantly decreased BC risk in the Vietnamese population. For two of them, rs2605039 (Zhu, 2017) and rs11614913 (Xu W., 2011; Wang J., 2012; Wang P.Y., 2013; Chen, 2014; Dai Z.J., 2015; Dai Z.M., 2016; Mu, 2017; Zhang H., 2017; Bastami, 2019; Choupani, 2019), the direction of effect was consistent with previous reports. For rs2155209, however, the direction of effect was opposite between Vietnamese and Chinese populations: the SNP was associated with decreased BC risk in our study but with increased BC susceptibility in China (Wu, 2015). Because this is the first study in Vietnam to examine the association between rs2155209 and BC risk, larger studies are needed to confirm our results. SNP rs2155209 is located in the 3'UTR of MRE11A, a gene involved in repairing DNA double-strand breaks, damage that can lead to BC development (Lobrich, 2007). SNP rs2605039 is an intronic variant of HSPD1, which encodes a heat shock protein that controls the expression of the anti-apoptotic proteins BCL-2 and BCL-XL (Ghosh, 2008; Pace, 2013). By altering the binding between miR-196A2 and homeobox genes, rs11614913 influences cell proliferation and DNA repair (Easton, 2007; Ma, 2007; Stacey S.N., 2007).

Individually, each SNP has only a modest effect on BC risk. However, their combined effect, captured as a PRS, has been shown to provide a level of risk discrimination that can be used to stratify individuals into distinct risk groups. Previous studies of the influence of PRS on BC risk have consistently found higher PRSs in women diagnosed with BC than in controls (Sawyer, 2012; Muranen, 2016; Evans, 2017). Overall, studies in European populations show at least a two-fold difference in BC risk between the highest and lowest PRS quartiles (Wacholder, 2010; Darabi, 2012; Allman, 2015; Vachon, 2015). Similar findings have been reported in other populations, including those of African American and Asian ancestry (Zheng W., 2010; Allman, 2015; Hsieh, 2017; Chan, 2018; Starlard-Davenport, 2018). Our PRS was likewise associated with increasing BC risk across quartiles, with Vietnamese women in the highest quartile having a 2.65-fold higher risk than those in the lowest (Table 3). Among premenopausal women, those in the third and fourth quartiles had 3.65- and 3.32-fold higher BC risk, respectively, than those in the first quartile (Table 3). These ORs were much higher than those reported in previous studies. Hsieh et al. created a six-SNP PRS and found an OR of about 2.26 for women in the highest quintile compared with those in the lowest (Hsieh, 2017). Mavaddat et al. constructed a 77-SNP PRS for BC and found a threefold increase in risk when comparing the highest and the middle quintiles (Vachon, 2015).

This study also assessed the discriminatory accuracy of the BC PRS, which is most commonly measured by the AUC. Reported AUCs for BC PRSs have been modest, ranging from 0.59 to 0.69 in European populations and from 0.57 to 0.72 in non-European populations (Table 4). The AUCs obtained in our study (0.71 overall and 0.75 in premenopausal women) were relatively high compared with previous retrospective, non-familial studies in American, European, and Asian populations (Table 4). Shieh et al. reported a similar AUC of 0.72 in a sample of 51 cases and 51 controls of Asian American ancestry (Shieh, 2016). However, Shieh et al. derived their PRS from 76 variants, whereas ours used only seven SNPs. There is no consensus on whether PRS models with fewer or more SNPs perform better. In two separate studies of Asian women with similar sample sizes, one obtained an AUC of 0.60 using only 6 SNPs (Shieh, 2017), while the other obtained an AUC of 0.57 using a 46-SNP PRS (Chan, 2018). These findings suggest that SNP selection must be tailored to the ethnicity of the population under study. In addition, sample size seems to have no noticeable impact on the discriminative ability of PRS models: two European studies obtained a similar AUC of 0.62, one with 1664 cases and 1636 controls (Mealiffe, 2010) and the other with a much larger sample of 33 673 cases and 33 381 controls (Vachon, 2015).

Table 4. Comparison of the studies on PRS for BC risk

A novel aspect of this study is that, compared with other PRS studies (Table 4), it included four SNPs (miR-196A2 rs11614913, CASC22 rs12325489, MRE11A rs2155209, and HSPD1 rs2605039) that have not been used in any previous PRS. Two of these (rs11614913 and rs12325489) are located in a miRNA gene and a long non-coding RNA gene, respectively, suggesting that these non-coding RNAs merit further study in BC. Three of the four (rs11614913, rs2155209, and rs2605039) were associated with reduced BC risk in Vietnamese women, which may have contributed to the higher discriminative ability of the PRS model in this study.

Nevertheless, several limitations should be considered when interpreting this work. First, the logistic regression analysis in this study had 80% power to detect a log-additive OR of 1.38 for a SNP with a MAF of 14%; SNPs with lower ORs and MAFs would require a larger sample size to reach the same power. Second, our analysis was limited to 10 common BC-risk variants (MAF > 10% in the Vietnamese population) identified by previous association studies with ORs above 1.6 or below 0.7. Sequence variants with larger effect sizes are likely to be uncovered in the near future, so our PRS results should be interpreted with caution. Finally, owing to a lack of data on clinicopathological characteristics, we could not perform subgroup analyses by cancer subtype. Future studies should include such subgroup analyses to assess PRS-based BC risk by subtype.

5 CONCLUSIONS

This study identified significant associations between seven of ten candidate SNPs and BC risk in a Vietnamese population. The PRS model combined these seven BC-related SNPs. The seven-SNP PRS, both alone and within strata of menopausal status, helped discriminate women at high risk of BC from those at low risk. Comprehensive evaluations of genetic risk variants in larger populations are warranted.