Introduction

Methylation of specific gene promoter regions can be used to segregate diseased from healthy tissues in a number of diseases, most notably cancer [1]. There are now many well-documented and a few clinically proven methylated DNA markers that may be used diagnostically as shown in Table 1 and reviewed in [2–4]. Most of these markers have been found by analysing genomic DNA extracted from cancerous tissue and healthy tissue. In this context, methylation of tumour suppressor gene promoter regions was compared and it was found that diseased tissue contained a higher proportion of the methylated allele than healthy tissue.

Table 1 Genes with promoter regions that are hypermethylated in cancer. Standardised gene names and positions in the human genome can be found at www.genenames.org

Most commonly, DNA methylation analysis is based on the bisulphite conversion process that was developed in the early 1990s since this offers the potential to measure many CpG sites simultaneously [23]. It is a multi-step procedure requiring alkali denaturing of DNA, sulphonation of cytosine with bisulphite, hydrolytic deamination of cytosine, alkali desulphonation to uracil and finally isolation of converted DNA free from reagents. It is challenging to measure the quantitative yield of this process, but early HPLC measurements indicated that greater than 90 % of DNA was lost. However, commercial kit suppliers now claim recoveries in the 75 to 90 % range [24, 25].

After extraction and bisulphite conversion of DNA, the proportion of a specific target region or target CpG site that is methylated is measured. The term “methylation ratio” is commonly used to describe the ratio of output signals corresponding to methylated and total DNA target sequence mostly after bisulphite conversion. The use of output signals without corrections for recovery or yield prior to or during analysis means that in the absence of rigorous standardisation the measurement results can only be consistent with each other for the duration of a single experiment. In this case, the measurement results are defined by the procedures used to produce the results. Such measurements cannot be linked to an independent reference and consequently comparison of results between investigators and even by the same investigators at different times may be difficult since all variable parameters would need to be standardised to enable a valid comparison. After allowing for operational differences in analytical systems, the value of the methylation ratio measured is also affected by the DNA extraction method and the proportion of diseased tissue in the specimen extracted.

Despite these drawbacks, it is now well established that many tumour suppressor genes are silenced in cancer due to aberrant methylation of their promoter regions. This discovery has provided a number of potentially stable targets that could be used for diagnostic assays. However, the decision on which target or targets to use for diagnosis depends on comparison of studies made in different laboratories, using different measurement methodologies on different genes at different times. If the potential of these markers for diagnosis is to be realised, measuring systems that can be compared between laboratories and over varying time frames will be needed for validation of discoveries made with uncalibrated systems [26]. Though methylation measurements have the potential to be used for diagnosis of cancer, only very few tests are currently in use [27].

Unfortunately, a wide range of measuring technologies has been applied to these measurements, with only a few attempts to characterise the accuracy of the measuring systems. All have suffered from the absence of well-defined reference materials that could be readily used to evaluate method bias [28].

To evaluate accuracy and comparability of methylation ratio measurements between laboratories, we prepared a methylated DNA reference material and developed procedures to measure completeness of methylation, to measure the concentration of the reference materials and to calculate the methylation mole ratio of mixtures. The accuracy of the reference methylation ratios was evaluated by estimating the measurement uncertainty, a process that yields an interval that indicates measurement trueness; low uncertainty gives narrow intervals and greater confidence in the reference value [29]. The target DNA sequence was a 550 base pair (bp) region of the human CDKN2A tumour suppressor gene that is associated with complete loss of expression in many cancer types [30, 13, 8, 12]. The methylated and unmethylated materials were combined gravimetrically to give methylation ratios of 0.2, 0.5 and 0.8 calculated as amount of methylated DNA divided by the total amount of methylated and unmethylated DNA. The materials were then distributed to three centres that were highly experienced in methylation ratio measurement. All centres used bisulphite conversion followed by analysis using a commercially available measuring system and the results were returned for statistical evaluation. Synthetically derived reference materials may differ in length from genomic DNA, but may be more similar to DNA extracted from formalin-fixed, paraffin-embedded tumour sections. In any case, these materials represent the best case of DNA quality and provide a known starting point for characterising laboratory procedures.

Materials and methods

Target sequence

The target was a 550-bp sequence, shown in Fig. 1, from a region of the human CDKN2A tumour suppressor gene that is associated with complete loss of expression in many cancer types [30]. It is located in exon 1α from 21,965,080 to 21,964,531 of the reference sequence NT_008413.18 and contains 45 CpG sites.

Preparation of DNA reference materials

Details of the preparation and characterisation of the reference materials are given in Electronic Supplementary Material. In summary, a 550-bp amplicon of the CDKN2A target region was prepared using end-point PCR and a portion was methylated in vitro using SssI methyl transferase. The methylated and unmethylated DNA solutions were deproteinated, precipitated from ethanol and the resulting pellet was dissolved in TE0.1 buffer (10 mM tris(hydroxymethyl)aminomethane, 0.1 mM EDTA adjusted to pH 8.0 with 1 M HCl). These purified solutions of unmethylated and methylated CDKN2A 550 bp amplicons shall be referred to as the reference materials. The identity of the unmethylated DNA amplicon was verified by Sanger sequencing and by liquid chromatography-mass spectrometry (LC-MS) measurement of the molecular weights of MspI restriction fragments. Digestion of the 550 bp amplicon was necessary to produce smaller fragments that were amenable to LC-MS analysis. The LC-MS conditions facilitated denaturation of the double stranded fragments into single strands enabling the mass spectrum of each strand to be processed using proprietary MaxEnt deconvolution software to obtain the molecular weight.

Completeness of the in vitro methylation procedure was also confirmed by LC-MS measurement of the molecular weights of MspI restriction fragments. The relevant properties of expected MspI restriction fragments are listed in Electronic Supplementary Material Table S1. The methylated amplicon produced in this way corresponded to the fully methylated genomic sequence and thus could be used to evaluate the effects of the measurement process including bisulphite conversion.

The molar concentration ratio of the methylated reference material relative to the unmethylated reference material was measured by high pressure liquid chromatography (HPLC), on the assumption that the molar extinction coefficients for the methylated and unmethylated molecules were equivalent. Five independent ten-fold dilutions of the unmethylated reference material and five independent five-fold dilutions of the methylated reference material were prepared and each was analysed by HPLC in triplicate (Electronic Supplementary Material Table S2). Triplicate measurements were averaged and the average peak area was adjusted by the mass diluted to give peak area per unit mass of the original reference materials (Electronic Supplementary Material Table S3).
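The concentration ratio calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually coded for the study: the function names are hypothetical, the peak areas are invented placeholders (not the values in Tables S2 and S3), and the "adjustment by the mass diluted" is assumed to mean dividing the mean peak area by the mass fraction of reference material in the diluted solution.

```python
from statistics import mean

def area_per_unit_mass(triplicate_areas, mass_fraction):
    """Average a triplicate of HPLC peak areas and scale back to the
    undiluted reference material.

    mass_fraction is (mass of reference material) / (total mass of the
    diluted solution), e.g. 0.1 for a ten-fold dilution (assumed meaning).
    """
    return mean(triplicate_areas) / mass_fraction

def concentration_ratio(methylated_runs, unmethylated_runs):
    """F = methylated / unmethylated, using the mean over independent dilutions.

    Each argument is a list of (triplicate_areas, mass_fraction) pairs.
    Equal molar extinction coefficients are assumed, as in the text.
    """
    m = mean([area_per_unit_mass(a, f) for a, f in methylated_runs])
    u = mean([area_per_unit_mass(a, f) for a, f in unmethylated_runs])
    return m / u

# Placeholder data only: two five-fold and two ten-fold dilutions.
methylated_runs = [([100.0, 102.0, 98.0], 0.2), ([101.0, 99.0, 100.0], 0.2)]
unmethylated_runs = [([100.0, 100.0, 100.0], 0.1), ([99.0, 101.0, 100.0], 0.1)]
F = concentration_ratio(methylated_runs, unmethylated_runs)
```

With these placeholder numbers the methylated material gives half the response per unit mass of the unmethylated material, so F evaluates to 0.5.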

Preparation of methylation mixtures

Portions of the methylated and unmethylated DNA reference materials were diluted to ~6 ng/μL (based on absorbance at 260 nm) and gravimetrically mixed to give methylated DNA target concentration ratios of approximately 0.2, 0.5 and 0.8 (Electronic Supplementary Material Table S4); the mixtures are later denoted as M0.2, M0.5 and M0.8 and the methylated and unmethylated reference materials as M1.0 and M0, respectively. The reference methylation ratios (MR) of the mixtures were calculated using Eq. 1 that is derived in the Electronic Supplementary Material.

$$ \mathrm{MR}=\frac{P_{\mathrm{M}}}{1+\dfrac{m_{\mathrm{U}}}{F\,m_{\mathrm{M}}}} $$
(1)

where:

P_M: Proportion of CDKN2A 550 bp CpG sites that were methylated in the methylated reference material

m_M: Mass of the methylated preparation used to prepare mixture (g)

m_U: Mass of the unmethylated preparation used to prepare mixture (g)

F: Concentration ratio of methylated material to unmethylated material
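Eq. 1 can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the masses and concentration ratio in the example are made up rather than taken from Table S4. The value 0.982 for the proportion of methylated sites is the one reported later in the Results.

```python
def methylation_ratio(p_m, m_m, m_u, f):
    """Eq. 1: MR = P_M / (1 + m_U / (F * m_M)).

    p_m : proportion of CpG sites methylated in the methylated material
    m_m : mass of methylated preparation in the mixture (g)
    m_u : mass of unmethylated preparation in the mixture (g)
    f   : concentration ratio of methylated to unmethylated material
    """
    if m_m <= 0 or f <= 0:
        raise ValueError("m_m and f must be positive")
    return p_m / (1.0 + m_u / (f * m_m))

# Hypothetical mixture: equal masses of equally concentrated solutions.
mr_equal_parts = methylation_ratio(p_m=0.982, m_m=1.0, m_u=1.0, f=1.0)

# Limiting case: no unmethylated material added, so MR equals P_M.
mr_pure_methylated = methylation_ratio(p_m=0.982, m_m=1.0, m_u=0.0, f=1.0)
```

The limiting case shows why the undiluted methylated material has a reference ratio equal to the completeness of methylation rather than exactly 1.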

Calculation of uncertainty of reference methylation ratios in methylation mixtures

The measurement uncertainties of the four factors, P_M, m_M, m_U and F, in the methylation ratio measurement equation (Eq. 1) were used to estimate the uncertainty of the methylation ratio of the mixtures supplied to laboratories for measurement. Methylation ratio uncertainties were evaluated using Eq. 2.

$$ U_{\mathrm{MR}}=\mathrm{MR}\sqrt{{\left(\frac{u_{P_{\mathrm{M}}}}{P_{\mathrm{M}}}\right)}^2+{\left(\frac{u_F}{F}\right)}^2\frac{1}{{\left(1+\dfrac{F\,m_{\mathrm{M}}}{m_{\mathrm{U}}}\right)}^2}} $$
(2)
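A minimal sketch of Eq. 2 follows, with hypothetical function and argument names and made-up inputs. It treats the mass terms as exact, as Eq. 2 does, and returns an uncertainty with the same coverage (standard or expanded) as the input uncertainties. The factor 1/(1 + F m_M/m_U) is written in the algebraically identical form m_U/(m_U + F m_M) so that the pure methylated material (m_U = 0) does not cause a division by zero.

```python
import math

def methylation_ratio(p_m, m_m, m_u, f):
    """Eq. 1: MR = P_M / (1 + m_U / (F * m_M))."""
    return p_m / (1.0 + m_u / (f * m_m))

def mr_uncertainty(p_m, u_p_m, m_m, m_u, f, u_f):
    """Eq. 2: uncertainty of MR from the uncertainties of P_M and F.

    u_p_m and u_f are absolute uncertainties of P_M and F.
    """
    mr = methylation_ratio(p_m, m_m, m_u, f)
    rel_p = u_p_m / p_m
    rel_f = u_f / f
    # Same as 1 / (1 + F*m_M/m_U), but defined when m_u == 0.
    f_weight = m_u / (m_u + f * m_m)
    return mr * math.sqrt(rel_p ** 2 + (rel_f * f_weight) ** 2)

# Hypothetical inputs (not the study's values).
u_example = mr_uncertainty(p_m=0.982, u_p_m=0.01, m_m=1.0, m_u=4.0, f=1.1, u_f=0.02)
```

The weighting of the F term shrinks as the methylated fraction of the mixture grows, so the uncertainty of F matters most for mixtures with low methylation ratios and not at all for the undiluted methylated material.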

The proportion of CDKN2A 550 bp CpG sites that were methylated in the methylated preparation, P_M, was calculated by examining the deconvoluted mass spectra of restriction fragments for evidence of incomplete methylation. Evidence for incomplete methylation was found in the mass spectra of the 183 and 145 bp fragments only; the abundances of molecular ions corresponding to incomplete methylation were adjusted for methylation content then summed and expressed as a proportion of the total abundance of methylated and unmethylated molecules in a similar manner to evaluating purity of organic compounds [31]. Uncertainty of P_M was estimated based on detection of incomplete methylation and the probability of incompletely methylated materials being present below the limit of detection.

Uncertainty in the concentration ratio factor F was estimated by combining, as the square root of the sum of the squares, the relative standard uncertainties of the concentrations of the methylated and unmethylated reference materials.

DNA region measured by the three expert centres

Subsamples of the mixtures with a methylation ratio of 0.2, 0.5 and 0.8, and the methylated and unmethylated reference materials were distributed to three centres that have substantial experience in methylation ratio analysis. The same 386-bp region of the reference material, shown in Fig. 1, was analysed at each centre using a commercially available measuring system.

Fig. 1

Sequence of the 550 bp CDKN2A region synthesised for reference materials. The position of the 386 bp target region is highlighted in grey with CpG sites annotated in red. After bisulphite conversion and in vitro transcription, methylated cytosine produces guanosine and unmethylated cytosine produces adenosine at the corresponding positions in the RNA complement to the sequence shown above. Uracil specific RNase A treatment cleaves the strand of the RNA complement at every uracil site (corresponding to adenosine in the original DNA sequence) to give fragments that are analysed by MALDI-TOF

The measurement procedure at each centre consisted of many separate sample work-up steps. Briefly, bisulphite-converted DNA was amplified using PCR (Electronic Supplementary Material Table S6) prior to in vitro transcription to RNA, RNase A digestion and MALDI-TOF analysis. The PCR primers targeted a 386-bp region, containing 35 CpG sites, of the 550 bp DNA supplied in the calibrated solutions as illustrated in Fig. 1. Sixteen CpG sites were on RNA fragments that could not be analysed and the remaining 19 CpG sites were distributed between 11 RNA fragments, with some fragments having 2 or 3 CpG sites as illustrated in Fig. 2 and listed in Electronic Supplementary Material Table S7.

Fig. 2

Schematic of the 386 bp region analysed. The black upper line represents the entire 386 bp sequence with circles representing CpG sites and red oblique lines representing RNase A cleavage sites; white CpG sites were on fragments that could not be detected by MALDI-TOF. A single methylation ratio value is obtained for each RNase fragment irrespective of the number of CpG sites. The lower line represents the corresponding portion of the 550 bp sequence with diamonds showing the approximate position of MspI cleavage sites and size of the fragments underneath

Since the RNA fragments have differing numbers of CpG sites and only a single ratio is obtained for each fragment that was capable of detection, the measured fragment is termed a CpG unit here for clarity.

Statistical analysis of MALDI-TOF methylation ratio measurements

Each centre measured methylation ratios based on the replication scheme shown in Fig. 3 and provided measured methylation ratios for all replicates and CpG units in spreadsheet format.

Fig. 3

Replication scheme. For each reference mixture, this process was replicated three to five times to give a total of 15–25 analytical replicates each consisting of methylation ratios for 11 CpG units covering 19 CpG sites

Variability of measurements due to CpG site, bisulphite conversion and the combined effects of processing for and analysis by MALDI-TOF was evaluated by using least-squares regression analysis to fit the data to Eq. 3. The replication scheme combined with the five different methylation levels gave a maximum of 1,375 data points from each centre (up to 25 analytical replicates of 11 CpG units at each of 5 levels), which were evaluated simultaneously.

$$ z_{hijk}=A+M_i+B_j+R_k+S_h+G_{hi}+\varepsilon_{hijk} $$
(3)

where:

z_hijk: is the measured methylation ratio

A: is a constant term

S_h: (h = 1 to n) is the effect of CpG site h on the measured methylation ratio

M_i: (i = 1 to 5) indicates the nominal methylation ratio relative to A

B_j: (j = 1 to n) indicates the bisulphite conversion replicate

R_k: (k = 1 to n) indicates an effect specific for the analytical replicate

G_hi: measures the interaction strength between site h and methylation level i

ε_hijk: are the residual values (residual errors) remaining after the least-squares fitting.
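The structure of Eq. 3 can be illustrated with a small sketch. It is not the analysis used in the study: it assumes a complete, balanced layout in which all four factors are crossed, and it imposes sum-to-zero constraints on every effect, in which case the least-squares estimates reduce to differences of means. The function name and the synthetic, noise-free data are invented for illustration.

```python
from itertools import product
from statistics import mean

def fit_balanced_effects(z, n_h, n_i, n_j, n_k):
    """Least-squares estimates for Eq. 3 on a complete, balanced layout.

    z maps (h, i, j, k) -> measured methylation ratio, with every index
    combination present exactly once (indices start at 0).
    """
    A = mean(list(z.values()))  # constant term (grand mean)

    def main_effect(axis, size):
        # Mean of all data at one level of a factor, relative to A.
        return [mean([v for key, v in z.items() if key[axis] == level]) - A
                for level in range(size)]

    S = main_effect(0, n_h)  # CpG site (unit) effect
    M = main_effect(1, n_i)  # nominal methylation level effect
    B = main_effect(2, n_j)  # bisulphite conversion replicate effect
    R = main_effect(3, n_k)  # analytical replicate effect

    # Site-by-level interaction: cell mean minus the additive part.
    G = {}
    for h, i in product(range(n_h), range(n_i)):
        cell = mean([z[h, i, j, k] for j in range(n_j) for k in range(n_k)])
        G[h, i] = cell - A - S[h] - M[i]

    # Residuals left after subtracting every fitted term.
    resid = {(h, i, j, k): v - (A + M[i] + B[j] + R[k] + S[h] + G[h, i])
             for (h, i, j, k), v in z.items()}
    return A, S, M, B, R, G, resid

# Synthetic, noise-free data built from made-up effects (not study data).
true_S, true_M = [0.05, -0.05], [-0.30, 0.30]
true_B, true_R = [0.01, -0.01], [0.02, -0.02]
true_G = {(0, 0): 0.03, (0, 1): -0.03, (1, 0): -0.03, (1, 1): 0.03}
z = {(h, i, j, k): 0.5 + true_M[i] + true_B[j] + true_R[k] + true_S[h] + true_G[h, i]
     for h, i, j, k in product(range(2), repeat=4)}
A, S, M, B, R, G, resid = fit_balanced_effects(z, 2, 2, 2, 2)
```

Because the synthetic data contain no noise, the fitted effects reproduce the made-up inputs and all residuals are zero to within rounding. With real data, a large S_h relative to its uncertainty would indicate a CpG unit that reads consistently high or low.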

For each of the three centres, there were a few missing values that were replaced by the mean of the existing values in the same set of replicates. This preliminary operation, which produces a balanced layout of data, has no significant effect on any of the results and was followed by removal of data outliers.

The least-squares regression analysis uses the model equation (Eq. 3) to calculate the value of each parameter that results in the smallest sum of squared residuals. To determine whether a derived value is different from zero, it is compared to the standard uncertainty of the parameter, calculated as the standard deviation of the mean. For example, if B_3 is significantly different from zero, then the third bisulphite conversion replicate gives consistently different results from the others. Similarly, if R_1 is significantly different from zero, then overall the first of any set of replicate measurements is consistently different from the others in the set. A more detailed explanation of the statistical analysis is given in Electronic Supplementary Material.
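The comparison of a parameter with its standard uncertainty can be sketched as below. The function names are hypothetical and the coverage factor of 2 is an assumption, since the decision threshold is not stated here. The input is taken to be a list of deviations that contribute to one parameter, for example the deviations from the overall mean of all results in one bisulphite replicate.

```python
import math
from statistics import mean, stdev

def estimate_with_uncertainty(values):
    """Return (mean, standard deviation of the mean) for a list of values."""
    return mean(values), stdev(values) / math.sqrt(len(values))

def differs_from_zero(values, k=2.0):
    """True if |mean| exceeds k times its standard uncertainty.

    k = 2 is an assumed coverage factor, not a value from the study.
    """
    estimate, u = estimate_with_uncertainty(values)
    return abs(estimate) > k * u

# Made-up deviations for two hypothetical parameters.
consistent_offset = [0.10, 0.12, 0.11, 0.09]   # always positive
random_scatter = [0.02, -0.03, 0.01, -0.01]    # centred near zero
```

Here `differs_from_zero(consistent_offset)` is true and `differs_from_zero(random_scatter)` is false, which mirrors the distinction drawn above between a systematic effect and ordinary replicate scatter.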

Results and discussion

The procedure for preparation of the reference material mixtures resulted in the methylation ratio value being dependent on the completeness of the methylation process for the methylated material, P_M, and the complete absence of methylation for the unmethylated reference material. Both of these parameters were evaluated by LC-MS measurement of the molecular weights of MspI restriction fragments of the 550-bp sequence in the methylated and unmethylated reference materials. In the methylated reference material, the molecular weights matched the theoretical values for complete methylation and scant evidence of incomplete methylation was found (Table 2, Fig. 4) whilst in the unmethylated reference material, the molecular weights matched the theoretical values for no methylation with no evidence of contaminating methylation (Electronic Supplementary Material Table S5).

Table 2 Measured relative molar masses (M_r) of MspI fragments of methylated CDKN2A reference material. LC-MS analysis denatured the DNA yielding single stranded molecules with different relative molar masses

Fig. 4

Deconvoluted mass spectrum for methylated 116 bp digest fragment showing the molecular weights of 35,202.3 and 36,849.4 Da for the two strands. The molecular weights at 35,370.3 and 37,017.2 Da were most likely due to adducts of hexafluoro-2-propanol (M_r 168.0) from the LC-MS mobile phase

The procedure for evaluating completeness of methylation of the methylated reference material, P_M, calculated the proportion of CpG sites in the CDKN2A 550 bp sequence that were methylated, based on the assumption of equivalent molar response of methylated and unmethylated DNA. Low abundance signals were observed at some molecular weights that may have corresponded to incomplete methylation. The normalised abundances of molecular weight fragments corresponding to incomplete methylation were summed and subtracted from 100 % to give an estimate of the completeness of methylation of the methylated reference material, P_M, of 98.2 % (0.982 as a proportion) with a relative expanded uncertainty of 2.3 %.

Portions of these unmethylated and methylated reference materials were combined as shown in Electronic Supplementary Material Table S4. The reference methylation ratios of the resulting mixtures, calculated using Eq. 1, and their expanded uncertainties (k = 2), calculated using Eq. 2, are given in Table 3.

Table 3 Reference methylation ratios, MR, and expanded uncertainty, U_MR, of methylation mixtures

These methylation mixtures, together with the methylated and unmethylated reference materials, were supplied to all three centres and each centre used the same primer sequences for amplification prior to in vitro transcription, RNase digestion and MALDI-TOF analysis. This procedure was chosen as it was one of the few high throughput techniques in regular use at all the centres and, though there are multiple steps, automation and miniaturisation have improved efficiencies and standardisation compared to methylation specific PCR and sequencing based methodologies [32].

The average methylation ratios measured by the three centres are presented in Fig. 5.

Fig. 5

Methylation ratios measured at three centres compared to the reference methylation copy number ratios. The solid line represents the reference value and its width covers the confidence interval of expanded uncertainty (k = 2); error bars for each point represent twice the standard deviation

Each centre underestimated the methylation of every solution except for the unmethylated reference material (not shown), though centre 1 was consistently closer to the reference values. A well-documented cause of underestimation of methylation after bisulphite conversion is PCR bias and there are recommendations for minimising this effect [33, 34]. In one study, bias towards amplification of the bisulphite converted unmethylated sequence of CDKN2A was of the order of 20-fold [35]. The sequence studied in [35] does not overlap the one in this study, but similar factors may have contributed to the underestimation that we observed, even though the underestimation seen in the present study was less than a factor of 2 for each centre and would thus be classified as having “little” bias according to Warnecke et al. [35]. If PCR bias was present, it is not readily apparent why the bias would be different for different centres as each used the same primer sequences.

Another significant study on bias in methylation ratios was undertaken by Coolen et al. [28] who used similar technology and found similar levels of underestimation of methylation at high methylation ratios as was observed in the current work.

There is a fundamental difference between this study and previous work on PCR bias and methylation ratio bias; in this work, the methylation ratio was characterised independently of the measuring systems used to obtain methylation ratios by the three expert centres. In this way, the reference value was known with an extremely high level of confidence thus allowing valid conclusions about differences between results and expected values.

A striking feature of these results was the very high degree of variability in methylation ratio measurements of different fragments in the same sample, as shown by the error bars in Fig. 5. The standard deviations presented in these error bars were calculated from the combined results of all bisulphite and analytical replicates and all CpG sites; multiplying the standard deviation by 2 gives an interval about the mean that encompasses approximately 95 % of the measurement results, assuming they are normally distributed.

The least-squares regression analysis separated variability due to the three major analytical steps. Interestingly, neither the bisulphite conversion process nor the multi-step procedure of producing RNA fragments and measurement by MALDI-TOF was a significant contributor to the variability, since the least-squares values for these parameters were not significantly different from zero in comparison to their uncertainty. The major source of variability in measured methylation ratio was the CpG site effect. Each site gave consistently high or low values for all replicates compared to the average value. This was most clearly illustrated by the values measured for the M1.0 material, which has a reference MR of 0.982 and a relative expanded uncertainty of 2.3 %. For centre 1, the average of all replicate measurements over all CpG sites was 0.88; however, this was composed of non-random results from each site as shown in Fig. 6. At this centre, the values measured for CpG units 11–13 and 14–16 were consistent with the reference value, but those for CpG 4, CpG 17 and CpG 31 were 0.80, 0.82 and 0.73, respectively. The effect of the CpG units with consistently low values is to lower the measured methylation ratio value for the 386 bp region analysed, thus increasing the measurement bias irrespective of the number of replicates.
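The size of these deviations can be made explicit with a short calculation that uses only the values quoted above for the M1.0 material at centre 1. The function names are hypothetical, and the interval check deliberately ignores the uncertainty of the measured values themselves, so it is a simplified comparison rather than a formal test.

```python
REFERENCE_MR = 0.982      # reference methylation ratio of M1.0 (from the text)
REL_EXPANDED_U = 0.023    # relative expanded uncertainty of the reference, 2.3 %

def relative_bias(measured, reference=REFERENCE_MR):
    """Bias of a measured ratio as a fraction of the reference value."""
    return (measured - reference) / reference

def within_reference_interval(measured, reference=REFERENCE_MR,
                              rel_u=REL_EXPANDED_U):
    """True if the measured value lies inside reference +/- expanded uncertainty."""
    return abs(measured - reference) <= rel_u * reference

# Values reported for centre 1 in the paragraph above.
reported = {"mean of all CpG units": 0.88, "CpG 4": 0.80,
            "CpG 17": 0.82, "CpG 31": 0.73}

for label, value in reported.items():
    print(f"{label}: bias {relative_bias(value):+.1%}, "
          f"inside reference interval: {within_reference_interval(value)}")
```

The overall mean is about 10 % below the reference value, and the three low CpG units are roughly 16 to 26 % below it, all well outside the 2.3 % interval of the reference value.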

Fig. 6

Methylation ratios of the M1.0 material measured at centre 1 (top), centre 2 (middle) and centre 3 (bottom). Each of the bisulphite replicates (separated by vertical lines) was analysed four or five times to enable statistical evaluation of variability due to bisulphite conversion, CpG site and the multi-step measurement process

CpG site variability was recorded at all three centres and there was some consensus in the CpG sites that gave high or low results at different centres (see Fig. 6 above and Electronic Supplementary Material Table S8). For example, CpG 4 and CpG 17 had significantly lower MR values at all centres, whereas the CpG unit containing sites 11, 12 and 13 had a consistently higher MR than other CpG units and was much closer to the reference value. PCR bias could not be the cause of MR variability between sites since this would result in the same bias for each CpG site.

The reproducibility of the site effects implied that sequence context may have been a causative factor, so correlation to primary structure was attempted. Some of the factors examined were GC content of the RNA fragment, presence or absence of cytosine in the RNA fragment and GC content upstream and downstream from the RNA fragment. None of these factors could be correlated with the presence or absence of the negative bias observed. However, CpG units with multiple CpG sites had mostly positive S_h values (Electronic Supplementary Material Table S8), indicating a lower bias compared to the reference value than those with a single CpG site. This could mean that CpG sites that are close to each other may have been more resistant to erroneous bisulphite conversion than those sites that were isolated. Whatever the reason may be, it was clear that sequence context was the major determinant of variability in measured MR.

The intercomparison described here was made using only a single genomic region from the CDKN2A gene. Though many different CpG site contexts were evaluated, the variability encountered in this study has not yet been demonstrated for other target regions. If this were confirmed, there would be important implications for genomic research and for identification of regions that may serve as cancer diagnostic markers. The magnitude of underestimation of methylation together with the inherent variability between CpG sites means that many genes may have been inadvertently categorised as unmethylated or lightly methylated by some investigators whilst others may have found the same regions to be methylated depending on the measuring system used and specific CpG sites targeted.

Reference materials and mixtures with defined ratios that have low uncertainty have been used in this work to characterise a commercial DNA methylation measuring system and to enable comparison between laboratories. The results of this work show that, in the absence of reference materials and at least for this target region of CDKN2A, direct comparisons between laboratories cannot be made due to underestimation of methylation. Further, variability of measurements between CpG sites compromises the concept of measuring regional methylation by averaging the methylation ratios of many CpG sites. If the variability in measured MR between CpG sites is an artefact of the bisulphite conversion reaction, then it may be more effective to first identify those CpG sites that give the most accurate results and then design assays that target these sites only. Standardisation of the best CpG target sites will reduce the amount of work required for provision of reference materials.