Introduction

Breast cancer (BC) is the most frequent cancer among women in both developing and developed regions of the world. In 2012, around 522,000 women died of BC and 1.67 million new cases were diagnosed worldwide [1, 2]. About 5–10% of women diagnosed with BC have a family history of the disease, which is a known risk factor [3]. Breast tumours can be caused by germline variants in tumour suppressor genes such as TP53, which may also be somatically mutated in sporadic tumours [4]. The International Cancer Genome Consortium (ICGC) was launched to identify somatic mutations and to determine genes that are critical in the development of human cancer. The first spectra of somatic mutations in human protein-coding genes in BC were published in 2006 and 2007 [5, 6]. Around 90% of BC tumours are caused by somatic mutations, so-called driver mutations, which initiate the carcinogenic process [7–9]. To identify possible driver genes in sporadic breast tumours, a number of studies using next-generation sequencing were published in 2012 [10–13]. However, it has not been investigated whether these driver genes also carry inherited variants that influence the development of cancer. Thus, the aim of this study was to identify germline variants in potential driver genes that influence BC risk and/or survival.

Materials and methods

Study population

The present study was performed using a population-based Swedish cohort consisting of 782 prospectively collected cases and 1,559 age- and gender-matched controls from the Västerbotten intervention project (VIP), the mammary screening project (MSP) and the Department of Oncology at the Norrlands University Hospital in Umeå. Controls were matched with cases by age at baseline (±6 months) and time of sampling (±2 months). Blood samples were collected from an ethnically homogeneous population living in Umeå (northern Sweden) and its surroundings between January 1990 and January 2001 [14]. Prospective cases were identified from the cohorts by record linkage to the regional cancer registry. Date and cause of death were collected until 30 January 2012 from the Swedish population register, while clinical data were obtained from the registry managed by the Northern Sweden Breast Cancer Group (Table 1).

Table 1 Characteristics of breast tumours at the time of diagnosis

All participants gave informed consent to the use of their samples for research purposes. The study was approved by the ethics committees of the participating institutes.

Gene/SNP selection

We focused on genes reported to carry BC driver mutations in at least two of the following publications: Banerji et al. [10], Ellis et al. [11], Shah et al. [12], Stephens et al. [13]. We were mainly interested in genes not previously reported as BC driver genes; consequently, well-known and intensively studied genes, such as BRCA1, BRCA2, TP53 and PTEN, were excluded from the study. SNPs were selected using the Ensembl browser, release 69 (http://www.ensembl.org/index.html). We emphasized regions with known functions, such as the core promoter, the 5′- and 3′-untranslated regions (UTRs), and nonsynonymous SNPs described in well-verified transcripts marked in Ensembl as HAVANA manually curated gold transcripts and consensus coding sequences (CCDSs). Haploview was used to select SNPs on the basis of linkage disequilibrium (LD) (r² ≥ 0.80) to minimize the number of SNPs to be genotyped. An additional inclusion criterion was a minor allele frequency (MAF) >10%, with the exception of ARID2 (rs7570492), MAP3K13 (rs13091808), MLL2 (rs11168827) and MLL3 (rs1323116, rs3735156). CBFB was excluded because it did not have any functional SNPs fulfilling our selection criteria. Reported driver genes with SNPs fulfilling our selection criteria are listed in Online Resource 1.
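The LD-based pruning step can be illustrated with a short sketch. It assumes that haplotype frequencies for each SNP pair are already known, computes r² from the disequilibrium coefficient D, and then applies a simplified greedy pairwise tagging rule at r² ≥ 0.80. This is not Haploview's implementation; the function names `ld_r2` and `greedy_tags` and the r² values in the example are hypothetical.

```python
"""Sketch: pairwise LD (r^2) and greedy tag-SNP selection at r^2 >= 0.80."""


def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs from haplotype frequencies.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a, p_b: population frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b                        # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0  # monomorphic SNP -> no LD


def greedy_tags(snps, r2, threshold=0.80):
    """Pick tag SNPs so that every SNP has r^2 >= threshold with at least one tag.

    snps: list of SNP ids; r2: dict mapping frozenset({id1, id2}) -> r^2.
    Each round picks the untagged SNP that captures the most untagged SNPs.
    """
    def captured(tag, pool):
        return {s for s in pool
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # sorted() makes ties resolve deterministically (first id wins)
        best = max(sorted(untagged), key=lambda s: len(captured(s, untagged)))
        tags.append(best)
        untagged -= captured(best, untagged)
    return tags


# Hypothetical r^2 values for three SNPs in one gene
pairs = {frozenset(("snp1", "snp2")): 0.92,
         frozenset(("snp1", "snp3")): 0.10,
         frozenset(("snp2", "snp3")): 0.15}
print(greedy_tags(["snp1", "snp2", "snp3"], pairs))  # ['snp1', 'snp3']
```

Here snp2 is captured by snp1 (r² = 0.92), so only two SNPs would need to be genotyped. A MAF filter would simply be applied to the candidate list before tagging.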

Genotyping

Either the KASPar SNP Genotyping System (KBioscience, Hoddesdon, Great Britain) or the TaqMan SNP Genotyping Assay (Life Technologies, Darmstadt, Germany) was used; both are based on allele-specific PCR. The master mix for the KASPar assays was prepared according to KBioscience's protocol and with its reagents, whereas 5× HOT FIREPol Probe qPCR Mix Plus from Solis BioDyne (Tartu, Estonia) was used for the TaqMan assays. If an assay could not be designed, an assay for a highly linked SNP (r² ≥ 0.80) was ordered instead. PCRs were performed in a 384-well plate format with a total reaction volume of 4 µl per well. Endpoint genotype detection was performed using the ViiA7 Real-Time PCR System (Applied Biosystems, Weiterstadt, Germany).

Statistical analysis

The χ² test was used to test the observed genotype frequencies in the controls for Hardy–Weinberg equilibrium (HWE). To estimate the associations between genotypes and BC risk, odds ratios (ORs) and 95% confidence intervals (CIs) were calculated by logistic regression (PROC LOGISTIC, SAS Version 9.2; SAS Institute, Cary, NC). The relative risk of death was estimated as a hazard ratio (HR) via Cox regression (PROC PHREG, SAS Version 9.2; SAS Institute, Cary, NC). Polymorphisms that showed significant differences in BC-specific survival in the unadjusted model were analysed further with adjustment for tumour size, lymph node metastases, histological grade, and estrogen receptor (ER) and progesterone receptor (PR) status (PROC PHREG, SAS Version 9.2; SAS Institute, Cary, NC). p values ≤0.05 were considered statistically significant. The Kaplan–Meier method (PROC LIFETEST, SAS Version 9.2; SAS Institute, Cary, NC) was used to generate survival curves, and the log-rank test (PROC LIFETEST, SAS Version 9.2; SAS Institute, Cary, NC) was used to compare the survival functions of the different genotypes. To account for multiple testing, empirical p values were generated with 10,000 permutations using the permutation option in Plink [15]. In this prospective study, we followed the REMARK recommendations for reporting of tumour marker prognostic studies [16].
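As an illustration of the first test named above, the following sketch computes the Pearson χ² statistic for HWE from control genotype counts, with 1 degree of freedom (three genotype classes, minus one, minus one estimated allele frequency). It is a generic textbook calculation, not the SAS or Plink code used in the study; `hwe_chi2` and the example counts are hypothetical.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts in the controls.
    Returns (chi-square statistic, p value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: HWE test undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(hwe_chi2(25, 50, 25))  # perfect HWE: (0.0, 1.0)
print(hwe_chi2(50, 0, 50))   # no heterozygotes: chi2 = 100, p far below 0.05
```

The 1-df upper tail is obtained from `math.erfc`, which avoids any dependency beyond the standard library.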

In silico functional analyses

To learn more about the consequences of the SNPs for protein-binding sites, chromatin structure, and promoter and enhancer strength, HaploReg (http://www.broadinstitute.org/mammals/haploreg/haploreg.php) was used; RegulomeDB (http://regulome.stanford.edu/) was used to obtain detailed information on possible effects on histone modification. All effects were checked against data from MCF7 (Michigan Cancer Foundation-7 breast cancer cell line), T-47D (epithelial cell line derived from mammary ductal carcinoma), human mammary epithelial cells (HMEC) or MCF10A-ER-SRc (breast epithelial cell line-estrogen receptor-src). Effects on transcription factor-binding sites (TFBSs) were calculated using position weight matrices (PWMs). PolyPhen/SIFT prediction was used to evaluate the possible impact of an amino acid change on protein structure and function (Ensembl release 75, http://www.ensembl.org/index.html). The influence of the 3′-UTR SNPs on microRNA binding was studied using microSNiPer (http://epicenter.ie-freiburg.mpg.de/services/microsniper/). All SNPs listed in HaploReg as linked with r² ≥ 0.80 in the European population were examined for their influence on promoter, enhancer or chromatin structure.
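To make the PWM step concrete, the sketch below scores a sequence against a position weight matrix as log-odds (bits) relative to a uniform background and reports the change in the best window score when the reference allele is replaced by the alternative allele. The GATA-like matrix, the uniform background and the forward-strand-only scan are simplifying assumptions; HaploReg uses its own motif library and scoring, which may differ. The function names are hypothetical.

```python
import math

BASES = "ACGT"


def pwm_score(site: str, pwm, background: float = 0.25) -> float:
    """Log-odds score (bits): sum over positions of log2(p_pos(base) / background)."""
    return sum(math.log2(pwm[i][BASES.index(b)] / background)
               for i, b in enumerate(site))


def best_score(seq: str, pwm) -> float:
    """Best score over all windows of motif length (forward strand only)."""
    w = len(pwm)
    return max(pwm_score(seq[i:i + w], pwm) for i in range(len(seq) - w + 1))


def snp_effect(left: str, ref: str, alt: str, right: str, pwm) -> float:
    """Score change caused by the alternative allele; negative = weaker binding site."""
    return best_score(left + alt + right, pwm) - best_score(left + ref + right, pwm)


# Hypothetical GATA-like PWM: rows = positions, columns = A, C, G, T (rows sum to 1,
# no zero entries so that log2 is always defined)
GATA_PWM = [
    [0.01, 0.01, 0.97, 0.01],  # G
    [0.97, 0.01, 0.01, 0.01],  # A
    [0.01, 0.01, 0.01, 0.97],  # T
    [0.97, 0.01, 0.01, 0.01],  # A
]
print(snp_effect("CCGA", "T", "C", "ACC", GATA_PWM))  # negative: T>C disrupts GATA
```

A full scan would also score the reverse complement and use pseudocounts derived from the motif's training data.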

Signatures of selection

To gain information about the functional consequences of the SNPs, the likelihood of mutation, conservation and recombination rate (RR) were evaluated for the genes and SNPs associated with BC risk, survival or tumour characteristics. The recombination rate of a specific region was assessed in comparison with that of the whole chromosome. The phylogenetic p value (phyloP) was used as an estimate of conservation; a region was considered conserved if its phyloP value was positive and ≥0.3. Both variables were analysed with the UCSC Browser (http://genome.ucsc.edu). Selective pressure was assessed with three statistics. First, the fixation index (FST) compares allele frequencies between two populations; values ≥0.25 indicate strong and values >0.05 moderate genetic differentiation. Second, the integrated haplotype score (iHS) measures the length of haplotypes around a SNP; |iHS| >2 is considered evidence of selection and |iHS| >1.5 an indication of selection, with a negative score referring to longer haplotypes for derived alleles and a positive score to longer haplotypes for ancestral alleles. Third, Fay and Wu's H distinguishes between a DNA sequence evolving neutrally and one evolving under positive selection; strongly negative values (≤ −40) are considered a signature of a selective sweep [17, 18]. These values were obtained from Haplotter (http://haplotter.uchicago.edu). If the genotyped SNP was not listed, the values of a highly linked SNP (r² ≥ 0.80) were used instead.
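Two of these selection statistics can be sketched for a single SNP or a set of segregating sites. The FST below is the simple heterozygosity-based form (H_T − H_S)/H_T with equally weighted populations, and H is the unnormalized Fay and Wu's statistic θπ − θH computed from derived-allele counts in a sample of n chromosomes. Haplotter may use different estimators and genomic windows, so these hypothetical functions (`fst_two_pops`, `fay_wu_h`) only illustrate how the sign and magnitude arise; they are not expected to reproduce Haplotter's values.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Simple F_ST = (H_T - H_S) / H_T for one biallelic SNP in two populations.

    p1, p2: allele frequencies; both populations weighted equally.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # heterozygosity of the pooled population
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def fay_wu_h(derived_counts, n: int) -> float:
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    derived_counts: derived-allele count i (1..n-1) at each segregating site
    n: number of sampled chromosomes
    """
    norm = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / norm
    theta_h = sum(2 * i * i for i in derived_counts) / norm  # weights high-frequency derived alleles
    return theta_pi - theta_h


print(fst_two_pops(1.0, 0.0))     # fixed difference: 1.0
print(fay_wu_h([9, 9, 8], n=10))  # excess of high-frequency derived alleles: negative H
```

Because θH weights each site by i², an excess of derived alleles at high frequency, as expected near a recent sweep, drives H negative.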

Results

Altogether 20 SNPs in 14 potential driver genes were associated with BC risk, survival and/or clinical and pathological tumour characteristics at p ≤ 0.05 level (Tables 2, 3; Online Resource 2 and 3).

Table 2 Genes with SNPs associated with BC risk and their associations with clinical characteristics (p ≤ 0.05)
Table 3 Genes with SNPs associated with BC-specific survival and their associations with clinical characteristics (p ≤ 0.05)

SNPs associated with risk

Five genes were associated with BC risk (Table 2; Online Resource 2). The genotype distributions of rs2242442 (TBX3) and rs10497520 (TTN) differed significantly between cases and controls (overall p = 0.01 and p = 0.03, respectively). The most significant association was observed for rs2242442 (TBX3): both heterozygous and homozygous carriers of the minor allele were at decreased risk of BC (OR 0.76, 95% CI 0.64–0.92; dominant model). Minor allele carriers of another TBX3 SNP, rs12366395, also had a decreased risk (OR 0.83, 95% CI 0.69–1.00; dominant model). LD between these two SNPs in the 1000 Genomes data was low (r² = 0.01). Interestingly, three of the four genotyped SNPs in TBX3 were associated with less aggressive tumour features: rs2242442 with small tumour size, and rs8853 and rs1061651 with low histological grade (Table 2, Online Resource 3).
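The dominant-model ORs above compare carriers of at least one minor allele with major allele homozygotes. For a single binary genotype term in an unadjusted logistic regression, the maximum-likelihood OR equals the 2×2 cross-product ratio and its Wald interval equals Woolf's log-based interval, so the calculation can be sketched directly from genotype counts. The function `genotype_or` and the counts in the example are hypothetical, not data from this study; zero cells would need a continuity correction, which is omitted here.

```python
import math


def genotype_or(cases, controls, model="dominant", z=1.96):
    """Odds ratio and Woolf 95% CI from genotype counts.

    cases, controls: (n_major_hom, n_het, n_minor_hom)
    model: "dominant" (het + minor hom vs major hom) or
           "recessive" (minor hom vs het + major hom)
    """
    def split(counts):
        hom_major, het, hom_minor = counts
        if model == "dominant":
            return het + hom_minor, hom_major  # (exposed, unexposed)
        if model == "recessive":
            return hom_minor, hom_major + het
        raise ValueError(f"unknown model: {model}")

    a, b = split(cases)     # cases: exposed, unexposed
    c, d = split(controls)  # controls: exposed, unexposed
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(odds_ratio) - z * se)
    hi = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lo, hi


# Hypothetical counts (major hom, het, minor hom), not data from this study
print(genotype_or((100, 80, 20), (120, 60, 20), "dominant"))   # OR = 1.5
print(genotype_or((100, 80, 20), (120, 60, 20), "recessive"))  # OR = 1.0
```

The same counts can give a clear dominant-model effect and no recessive-model effect, which is why the genetic model is reported alongside each OR.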

Among minor allele carriers of TTN rs10497520, only homozygotes were at increased risk (OR 1.96, 95% CI 1.18–3.26). Four additional SNPs in TTN were associated with less favourable tumour characteristics: large tumour size, high grade and/or negative hormone receptor status (Online Resource 3).

An increased risk was observed for homozygous carriers of two SNPs in MAP3K1 (r² = 0.25): rs702688 (OR 1.33, 95% CI 0.99–1.76) and rs72758040 (OR 1.36, 95% CI 1.01–1.83). However, none of the eight genotyped MAP3K1 SNPs was associated with clinical tumour characteristics.

One SNP in MLL2, rs11168827, was associated with risk (OR 1.31, 95% CI 1.00–1.72 for homozygotes) as well as with positive hormone receptor status and low grade. A decreased risk was observed for homozygous minor allele carriers of SF3B1 rs4685 (OR 0.73, 95% CI 0.54–0.97). This SNP was also associated with negative lymph node status and negative hormone receptor status. None of these associations remained statistically significant after the permutation test.
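The permutation correction can be sketched as label shuffling: case/control labels are permuted, the association statistic is recomputed for every SNP, and the empirical p value is (hits + 1)/(permutations + 1). A max(T) variant, which compares each observed statistic with the largest statistic across all SNPs in each permutation, gives p values corrected for testing several SNPs. The toy statistic (absolute difference in mean allele dosage) and the function names are assumptions for illustration; Plink's permutation procedures use their own test statistics and options.

```python
import random


def mean_diff(dosages, is_case):
    """Toy association statistic: |mean dosage in cases - mean dosage in controls|."""
    cases = [g for g, c in zip(dosages, is_case) if c]
    ctrls = [g for g, c in zip(dosages, is_case) if not c]
    return abs(sum(cases) / len(cases) - sum(ctrls) / len(ctrls))


def permutation_pvalues(genotypes, is_case, n_perm=10_000, seed=1):
    """Label-permutation empirical p values for several SNPs.

    genotypes: one list of 0/1/2 minor-allele dosages per SNP (same individual order)
    Returns (pointwise p values, max(T) p values corrected across SNPs).
    """
    rng = random.Random(seed)
    observed = [mean_diff(g, is_case) for g in genotypes]
    hits_point = [0] * len(genotypes)
    hits_max = [0] * len(genotypes)
    labels = list(is_case)
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        stats = [mean_diff(g, labels) for g in genotypes]
        best = max(stats)    # largest statistic across SNPs in this permutation
        for j, obs in enumerate(observed):
            hits_point[j] += int(stats[j] >= obs)
            hits_max[j] += int(best >= obs)
    point = [(h + 1) / (n_perm + 1) for h in hits_point]
    corrected = [(h + 1) / (n_perm + 1) for h in hits_max]
    return point, corrected


# Hypothetical data: 20 cases, 20 controls, two SNPs
is_case = [True] * 20 + [False] * 20
snp_a = [2] * 20 + [0] * 20  # perfectly separates cases from controls
snp_b = [0, 1] * 20          # unrelated to case status
point, corrected = permutation_pvalues([snp_a, snp_b], is_case, n_perm=999)
print(point, corrected)      # first SNP: very small empirical p value
```

The max(T) p value is never smaller than the pointwise one, which is why associations that pass p ≤ 0.05 individually can fail after correction.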

SNPs associated with BC-specific survival

SNPs in ARID1B, ATR, RUNX1 and TTN were associated with BC-specific survival (Table 3; Online Resource 2). Poor survival was observed for carriers of the minor allele of rs73013281 (ARID1B) and rs2227928 (ATR) (HR 1.58, 95% CI 1.02–2.45 and HR 1.63, 95% CI 1.00–2.64, respectively; dominant model), whereas for rs17227210 (RUNX1), rs2303838 (TTN) and rs2042996 (TTN), poor survival was observed only among homozygous carriers. These results are supported by the Kaplan–Meier plots (Fig. 1). The ARID1B and ATR SNPs were also associated with high stage at diagnosis, and the association between rs2227928 (ATR) and high stage remained statistically significant after the permutation test (p = 0.023). Consistent with this, the associations of ARID1B and ATR with survival did not remain significant after adjustment for clinical tumour characteristics (Table 4). The associations between the RUNX1 and TTN SNPs and survival remained equally strong after adjustment for tumour size, lymph node metastasis status and grade, and even became stronger after further adjustment for hormone receptor status. However, these results should be interpreted with caution because of the small numbers of minor allele homozygotes and the incomplete hormone receptor status data (Table 1). Nevertheless, as two additional RUNX1 SNPs, rs8130963 and rs7276777, were associated with stage, and four SNPs in TTN with less favourable tumour characteristics, a true direct or indirect association of the genetic variation with survival cannot be ruled out (Online Resource 3).
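For readers who want to see what underlies Fig. 1 and the log-rank comparisons, the following is a minimal sketch of the product-limit (Kaplan–Meier) estimator and a two-group log-rank test (for example, minor allele homozygotes vs. all other genotypes). It assumes event = 1 for BC death and 0 for censoring, and it is not the PROC LIFETEST implementation used in the study; `kaplan_meier` and `logrank` are hypothetical names.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 = breast-cancer death, 0 = censored.
    Returns [(t, S(t))] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve


def logrank(times, events, groups):
    """Two-group log-rank test; groups are 0/1. Returns (chi-square, p value), 1 df."""
    data = sorted(zip(times, events, groups))
    n = [groups.count(0), groups.count(1)]  # number at risk per group
    o_minus_e, var, i = 0.0, 0.0, 0
    while i < len(data):
        t, d, left = data[i][0], [0, 0], [0, 0]
        while i < len(data) and data[i][0] == t:
            _, event, g = data[i]
            d[g] += event
            left[g] += 1
            i += 1
        n_tot, d_tot = n[0] + n[1], d[0] + d[1]
        if d_tot and n_tot > 1:
            frac1 = n[1] / n_tot
            o_minus_e += d[1] - d_tot * frac1  # observed - expected deaths, group 1
            var += d_tot * frac1 * (1 - frac1) * (n_tot - d_tot) / (n_tot - 1)
        n[0] -= left[0]
        n[1] -= left[1]
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))  # S(t) ≈ 0.8, 0.6, 0.3 at t = 1, 2, 3
```

The censored observation at t = 2 lowers the number at risk for later times without changing S(t) itself, which is how incomplete follow-up enters the survival curves.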

Fig. 1 Kaplan–Meier plots of SNPs associated with breast cancer-specific survival

Table 4 Associations of SNPs with BC-specific survival without and with adjustment for established clinical markers

Functional characterization of the associated SNPs

We used HaploReg to search for SNPs in high LD (r² ≥ 0.80) with SNPs associated with BC risk, tumour characteristics or survival, and used experimental data obtained in mammary epithelial (tumour) cell lines (MCF7, T-47D, HMEC, MCF10A-ER-SRc) to assess their possible functional role in breast tumourigenesis (Online Resource 4; summarized in Table 5). All promoter and 5′-UTR SNPs, except rs12465459 (TTN), were located in an active promoter. According to the ENCODE data, all these SNPs can affect chromatin structure, histone modification, and regulatory protein and/or transcription factor binding. Of the eight 3′-UTR SNPs covered by our study, five were predicted to change the binding site for one or more miRNAs, and all of these had an impact on histone modification and transcription factor-binding sites. PolyPhen predicted a possibly or probably damaging functional effect for two of the six missense SNPs.

Table 5 Summary of the possible functional effects of genes and SNPs associated with BC risk, survival or clinical markers

Signatures of selection

TTN was the most noticeable gene in this study. Four of the six associated SNPs are conserved, and all six show signatures of strong (FST ≥ 0.25) or moderate (FST ≥ 0.05) genetic differentiation and of evolution under positive selection (Fay and Wu's H ≤ −40) (Online Resource 5). In addition, the iHS score of −1.5 for rs10497520 indicates selection acting on the derived allele. Taken together, these values strongly suggest that the genetic variation in TTN is functional. Furthermore, rs8130963 (RUNX1) is notable for three values: Fay and Wu's H = −56 indicates evolution under positive selection, which is supported by FST = 0.346 (CEU vs. YRI) and FST = 0.054 (CEU vs. ASN).

Discussion

Our aim was to identify cancer-related germline variants in novel genes classified as potential BC driver genes in four studies published in 2012. After genotyping and statistical analysis, ATR, RUNX1, TBX3 and TTN became the focus of the study. These potential driver genes carry germline variants with a statistical and functional impact and could, therefore, be predisposing genes that influence the development of BC. Whereas SNPs in TBX3 were associated with less aggressive tumour markers, SNPs in ATR, RUNX1 and TTN showed the opposite association. According to the experimental ENCODE data, all associated or highly linked SNPs (r² ≥ 0.80) could affect gene regulation. RUNX1 and TTN also showed signatures of positive selection.

Ataxia telangiectasia and Rad3-related (ATR), together with ataxia telangiectasia mutated (ATM), plays an important role in cell cycle regulation by transducing DNA damage signals [19]. ATR has therefore been studied as a target for cancer therapy [20]. In the present study, two missense SNPs, rs2227928 and rs2229032, were genotyped. These SNPs were also identified in Finnish and French breast/ovarian cancer families [21, 22], but at frequencies similar to those in the healthy control populations. Both SNPs are predicted to be benign/tolerated according to PolyPhen/SIFT. However, rs2227928 captures rs6768093 (r² = 0.99), which is located in the active promoter of ATR. Rs2227928 was associated with high stage, large tumour size and positive regional lymph node metastases, which is consistent with the poor survival observed among carriers. Several SNPs linked to rs2227928 (r² between 0.85 and 0.97) are located in the PLS1 (plastin 1) gene. The encoded actin-binding protein has been found at high levels specifically in the small intestine [23]; an association with BC is so far unknown. The significant FST value for rs2227928 (0.076, EUR vs. YRI) could indicate genetic hitchhiking, as the other selection statistics were unremarkable.

Runt-related transcription factor 1 (RUNX1) is a tumour suppressor that is highly expressed in breast epithelial cells [24]. Downregulation of RUNX1 is part of a 17-gene signature that has been suggested to predict BC metastasis [25]. RUNX1 may stimulate E-cadherin and inhibit epithelial-to-mesenchymal transition [26]. However, how RUNX1 influences BC development remains to be clarified. In the present study, three of the four genotyped SNPs were associated with high stage (rs8130963, rs7276777) or poor survival (rs17227210). All these SNPs are located in introns and were genotyped instead of the potentially functional SNPs rs72813661 (active promoter), rs13051066 (3′-UTR) and rs56045941 (5′-UTR), respectively. Rs8130963 shows a strongly negative Fay and Wu's H value (−56) and strong genetic differentiation between the European and African populations (FST = 0.346), which indicates positive selection. Rs17227210 was associated with poor survival both before (HR 3.66, 95% CI 1.48–9.08) and after adjustment for ER and PR status, tumour size (T), lymph node status (N) and grade (HR 5.07, 95% CI 1.15–22.47). All SNPs linked to rs17227210 are located in either strong or weak enhancers. One of these, rs17227231 (r² = 0.92 with rs17227210), affects binding of GATA3 (GATA binding protein 3). As GATA3 has already been classified as a high-confidence cancer driver gene and as a possible marker for metastatic breast carcinomas [27, 28], the altered GATA-binding site could explain the poor survival associated with rs17227210.

The T-box transcription factor 3 (TBX3) is expressed in mammary tissue and plays a context-dependent role in mammary gland development and tumourigenesis [29]. TBX3 interacts with several major oncogenic pathways and is overexpressed in many tumours, most commonly in BC [30]. Recently, somatic mutations in TBX3 have been classified as BC driver mutations [10–13, 31, 32]. Our study adds a further layer to the involvement of TBX3 in BC development by showing that germline SNPs in this gene are associated with less aggressive tumour traits. The two risk-associated SNPs, rs2242442 and rs12366395, are located in the active promoter and affect the binding site of the transcription factor STAT (signal transducer and activator of transcription). Mutations in STAT proteins that lead to unregulated cell proliferation have been found in myeloproliferative disorders; however, these aspects have not been well studied in BC [33]. TBX3 expression could be influenced by rs8853, rs1061651 and two linked 3′-UTR SNPs through their effects on miRNA-binding sites. An association has been described between estrogen receptor-positive BC and miR-1290 [34], whose binding is affected by rs3741698, a SNP linked to rs1061651 (r² = 0.91). Furthermore, TBX3 overexpression has been observed in primary breast tumours and BC cell lines, with higher expression in estrogen receptor-positive tumour cells [29]. However, other publications have described that estrogen-induced TBX3 overexpression results in a pool of estrogen receptor-negative cancer stem-like cells [30].

TTN (Titin) has been intensively studied as a component of the muscle contractile machinery [35, 36]. However, TTN also seems to play a role in non-muscle cells during chromosome condensation and chromosome segregation [37, 38]. Furthermore, Titin has been associated with rhabdomyosarcoma, which can, although rarely, also affect the breast [39]. Thus, TTN may play a role in oncogenesis, although the biological mechanisms remain to be evaluated. On the other hand, TTN has also been described as a false-positive driver gene because mutational heterogeneity dominates over true driver events [35, 40]. In our study, six of the nine genotyped SNPs were associated with increased risk, aggressive tumour characteristics and/or poor survival, and three of these SNPs were associated with negative hormone receptor status. Two of the five missense SNPs (rs12463674 and rs10497520) are predicted by PolyPhen to cause a probably or possibly damaging amino acid change and could thus affect the structure and function of the protein. Besides rs12463674 and rs10497520, TTN carries a large number of other missense mutations, although phyloP estimates indicate that TTN is highly conserved. For all genotyped SNPs, the FST, Fay and Wu's H and iHS values indicated strong positive selection.

Although our study provides new knowledge about genes and variants influencing BC risk, tumour characteristics and survival, certain limitations have to be considered. First, the relatively small sample size and, especially, the missing information on hormone receptor status decreased the power to detect associations with genotypes. Second, most associations did not remain statistically significant after correction for multiple testing. Third, the results need to be replicated in another population. The strengths of our study include its population-based design, prospectively collected blood samples, long follow-up and detailed clinical data. As several SNPs within one gene were associated with BC risk, tumour characteristics and/or survival, all of them with possible functional consequences and some even with signatures of positive selection, a true direct or indirect association of the studied SNPs with BC development cannot be ruled out.

Conclusion

Our study suggests that germline variants in driver genes of sporadic tumours can have an impact on BC risk, tumour characteristics and/or survival. Several SNPs in ATR, RUNX1, TBX3 and TTN showed associations with BC development and progression. In silico analyses provided evidence of possible functional consequences for the associated SNPs. For TTN and RUNX1, strong signatures of positive selection gave further insight into the functionality of the SNPs. However, further investigations are necessary to verify the results on BC survival and the role of TTN.