To the Editor,

A recent report by Tang et al. contributes to a rapidly emerging and exciting field of work examining blood-derived DNA methylation as a risk factor for breast cancer [1]. Multiple studies have identified associations with increased breast cancer risk for a genome-wide measure of hypomethylation in blood-derived DNA ([2], and others). Tang et al. carried out a considered, staged analysis that involved a genome-wide discovery phase using the Illumina Infinium HumanMethylation 450 K methylation assay and blood-derived DNA from 48 unselected breast cancer cases and 48 unaffected controls. They then measured seven marks that were suitable for analysis using MassARRAY Epityper technology in three validation cohorts that included blood-derived DNA from a proportion of familial breast cancer cases. Associations with risk were replicated for three marks. For these three replicated marks, the total number of cases included was between 565 and 568. The authors noted the importance of replicating their findings in prospective studies.

We attempted to replicate the findings of Tang et al. using a prospective design based on 423 breast cancer cases and matched unaffected controls participating in the Melbourne Collaborative Cohort Study (MCCS). MCCS participants provided a blood sample at baseline (1990–1994), and incident cases diagnosed up to 31 December 2007 were identified through linkage with the Victorian Cancer Registry. Controls were selected through incidence density sampling and individually matched to cases on year of birth, year of baseline attendance, country of origin and sample type. Case–control pairs were allocated consecutively on the same chip of the assay to minimise batch effects [2]. We used conditional logistic regression to compute odds ratios (OR) and 95% confidence intervals (CI) for the association between methylation level at each CpG (expressed as M-values, per standard deviation) and risk of breast cancer. We found no evidence that any of the three CpG sites identified and replicated in the Tang et al. study were associated with risk of breast cancer in our prospective study (Table 1). This was the case in analyses with and without adjustment for cell composition (estimated using the Houseman algorithm) and after restricting the analysis to cases diagnosed within five years of blood draw. Further, there was no evidence of an interaction with time since blood draw, fitted as a continuous variable (Table 1). Adjusting for smoking status, body mass index and alcohol drinking did not substantially change these findings. No evidence of association was observed (P ≥ 0.1) for the additional four CpGs identified in the discovery phase of the Tang et al. study that were not replicated in later phases of that study. All seven CpGs were measured with very good reliability in our study (based on 15 technical replicate pairs, intraclass correlation coefficients (ICC) were typically greater than 0.85, except for cg21932542, for which the ICC was 0.68 [3]).
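
For readers less familiar with this type of analysis, the following is a minimal sketch of how an odds ratio per standard deviation of M-value can be estimated from matched pairs. It is an illustration only and not the code used in our study: it assumes 1:1 matching and a single covariate, the SD is taken over the pooled sample (an assumption), and the function names (`beta_to_m`, `standardise`, `matched_pair_or`) and the beta values are hypothetical. For 1:1 pairs, the conditional likelihood depends only on the within-pair difference in the covariate, so it can be maximised with a few Newton-Raphson steps.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta value (0..1) to an M-value: log2(beta / (1 - beta))."""
    b = min(max(beta, eps), 1.0 - eps)  # clamp to avoid division by zero or log(0)
    return math.log2(b / (1.0 - b))


def standardise(values):
    """Scale values to mean 0 and sample SD 1, so effects are 'per standard deviation'."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def matched_pair_or(case_x, control_x, max_iter=50, tol=1e-10):
    """Conditional logistic regression for 1:1 matched pairs with one covariate.

    Each pair contributes 1 / (1 + exp(-b * d)) to the conditional likelihood,
    where d = x_case - x_control. Returns (OR, lower 95% CI, upper 95% CI).
    """
    d = [c - k for c, k in zip(case_x, control_x)]

    def score_and_info(b):
        p = [1.0 / (1.0 + math.exp(-b * di)) for di in d]
        score = sum(di * (1.0 - pi) for di, pi in zip(d, p))
        info = sum(di * di * pi * (1.0 - pi) for di, pi in zip(d, p))
        return score, info

    b = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(b)
        if info == 0.0:
            raise ValueError("no within-pair variation in the covariate")
        step = score / info  # Newton-Raphson update on the log odds ratio
        b += step
        if abs(step) < tol:
            break

    _, info = score_and_info(b)
    se = 1.0 / math.sqrt(info)
    return math.exp(b), math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)


# Illustrative beta values only (not study data): six hypothetical matched pairs.
case_beta = [0.82, 0.75, 0.80, 0.78, 0.85, 0.79]
control_beta = [0.80, 0.78, 0.79, 0.81, 0.83, 0.80]

# Standardise M-values over the pooled sample, then split back into cases and controls.
pooled = standardise([beta_to_m(v) for v in case_beta + control_beta])
n = len(case_beta)
odds_ratio, ci_low, ci_high = matched_pair_or(pooled[:n], pooled[n:])
print(f"OR per SD = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

In practice, adjustment for covariates such as cell composition requires a multivariable conditional model fitted with an established statistical package rather than this single-covariate sketch.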

Table 1 Replication analysis of the three CpG sites identified in Tang et al., Oncotarget, 2016, in the Melbourne Collaborative Cohort Study

The main difference between our study and that of Tang et al. is that our blood samples were taken from unaffected individuals, whereas Tang et al. used samples taken after breast cancer diagnosis. We found no evidence of varying effect size by time since blood draw, nor of hypomethylation in cases for blood samples taken within five years of diagnosis (although power was limited for these subgroup analyses). Blood samples drawn at the time of breast cancer diagnosis may contain circulating tumour cells and/or cell-free DNA (cfDNA), both of which can contribute substantially to the DNA yield, especially in the context of heavy tumour load. Measuring blood-borne tumour DNA methylation marks is exciting in another context, as these marks could have considerable potential as biomarkers for residual disease monitoring, but their presence complicates studies aiming to identify risk factors in unaffected women.

Another difference between the two studies is that MCCS cases and controls were matched on age and the time period of blood sample collection, which was wider and heterogeneous in the German study. We also controlled more strictly for batch effects by randomly assigning both members of each case–control pair to the same batch [2]. The methylation measurement technology used in the two studies was also different; interestingly, only five of the seven marks identified in the discovery phase of Tang et al. passed the technical replication phase using MassARRAY Epityper (on the same 94 DNA samples). Tang et al. also used a mix of sporadic and familial cases, compared with the population-based sample from the MCCS, although this does not seem to have affected their observed relative risks.

Replication of findings remains challenging for epigenome-wide association studies. However, as the costs of these assays decrease and more studies are able to make these measurements in large numbers of samples, we anticipate that more definitive replication studies will become increasingly feasible, as has been achieved for common genetic variants via genome-wide association study consortia.