Introduction

Age-prediction is an important part of forensic science. Currently, an individual's age may be predicted, with acceptable error, by measuring and analyzing osteal markers such as bones and teeth [1, 2]. However, this method is confined to cases in which skeletal remains are available. In recent years, many studies have explored the utility of biomarkers for predicting biological age. Telomere length has been regarded as a promising biomarker for age estimation [3–5]. Unfortunately, the assays currently used to determine the average telomere length of a sample are neither precise nor reproducible and are susceptible to a number of other variables that can alter the measurement [5, 6]. The measurement of the 4977 bp deletion of mitochondrial DNA has also been well studied and used for age estimation [5, 6]. However, studies did not find a direct correlation with age, owing to heterogeneity in the abundance of mitochondrial deletions and in the levels of oxidative stress among patients [3, 6, 7]. The age-dependent accumulation of advanced glycation end-products (AGEs) has been used in age-estimation studies but poses the same problems as mitochondrial deletions [6, 8]. The racemization of aspartic acid in deceased and living persons has been described as the best method currently available for age estimation, with higher correlation coefficients [6]. The most suitable target tissue for this method is tooth dentin, because of the accuracy and reproducibility of the results obtained with that tissue [6, 9]. The most recently described method of age estimation measures the accumulation of sjTRECs in T-cells [10–12]. However, this method can only be applied to peripheral blood (T-cells) and therefore does not extend to other forensically relevant tissues [11]. All of these methods require further research because of the limited types of tissue that can be tested and the unknown effects of oxidative stress on the measurements [6]. Therefore, it is imperative to find other feasible methods for individual age-prediction from forensic biological samples.

Accumulating evidence indicates that aging resembles a developmentally regulated process that is tightly controlled by specific epigenetic modifications rather than a random course [13–16]. Among epigenetic modifications, DNA methylation is the best characterized and plays an important role in cellular and organismal senescence [17–20]. 5-Methylcytosine at CpGs in genomic DNA is established during mammalian ontogeny and is replicated during cell division by maintenance DNA methyltransferases. The global DNA methylation level decreases with aging in human tissues as a consequence of a progressive loss of DNMT1a efficiency [21, 22], whereas some loci associated with specific genes become hypermethylated during aging [15, 18, 23–26]. Therefore, age-associated methylation changes cannot simply be attributed to random deterioration but reflect a developmentally regulated process during ontogenic development [27]. Recently, some age-related methylation changes were reported, which revealed the potential value of methylation for predicting biological age [28–31]. Age-associated methylation may therefore serve as a biological indicator for forensic age-prediction. In this study, three novel age-related methylation markers were isolated from human blood. These markers display a strong correlation with age and yield a quantitative model for age-prediction with high accuracy.

Materials and methods

Sample donors

Peripheral venous blood samples were collected from 105 donors (age range, 10–72 years). Ten senior donors (age range, 65–70 years) and ten junior donors (age range, 10–15 years) were selected for isolation and validation of methylation. Twenty independent samples (ten senior donors and ten junior donors) were selected for further validation of methylation. Sixty-five independent samples (39 males and 26 females; age range, 11–72 years) were selected for correlation analysis and age-prediction. Exclusions included conditions and drugs known to affect DNA methylation, such as rheumatoid arthritis, lupus, cancer, and the drugs procainamide and hydralazine. The research work described in this paper was carried out in accordance with The Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments involving humans. The study protocol was approved by the Ethics Committee of Tongji Medical College of HUST, China. All donors gave their informed consent prior to their inclusion in the study.

Isolation of age-related methylation by MS-RDA

Methylation-sensitive representational difference analysis (MS-RDA) was performed as previously reported [31].

Preparation of DNA

Genomic DNA of each donor was extracted from the blood using standard phenol and chloroform procedures, followed by ethanol precipitation. The DNA pool of the senior donors was prepared by mixing equal amounts of genomic DNA from ten senior donors (age range, 65–70 years). The DNA pool of the junior donors was prepared by mixing equal amounts of genomic DNA from ten junior donors (age range, 10–15 years).

Preparation of amplicon

Pooled DNA (3 μg) was digested with 30 units of HpaII restriction enzyme (NEB, USA) for 15 h at 37 °C. After phenol extraction and ethanol precipitation, 1 μg of the digestion product was ligated to 500 pmol of RHpa adaptor with 800 units of T4 DNA ligase (NEB, USA) at 16 °C overnight. The RHpa adaptor was prepared by annealing two oligonucleotides, RHpa24 and RHpa11, with gradual cooling from 90 °C to 10 °C over 80 min. The sequences of all six oligonucleotides used to prepare the adaptors and for PCR amplification were taken from a previous report [31] and are listed in Table 1. The ligated product was amplified by 25 cycles of PCR with the RHpa24 oligonucleotide as a primer. The RHpa adaptor was removed from the amplicon by digestion with HpaII and separation with a DNA Sequence Purification Kit (TaKaRa Co., Japan).

Table 1 Sequence of six oligonucleotides used to prepare adaptor and for PCR amplification in MS-RDA

Subtractive hybridization

In positive subtractive hybridization, the amplicon from the senior donors was selected as testers and the amplicon from the junior donors was selected as drivers. In reverse subtractive hybridization, the amplicon from the junior donors was selected as testers and the amplicon from the senior donors was selected as drivers.

For the first cycle of competitive subtractive hybridization, the J adaptor (500 pmol), prepared by annealing two oligonucleotides, JHpa24 and JHpa11, with gradual cooling from 90 °C to 10 °C over 80 min, was ligated to 1 μg of the tester amplicons with T4 DNA ligase. An appropriate amount (500 ng) of the tester DNA with the J adaptor at its ends was mixed with 20 μg of the driver DNA. The DNA mixture was purified by phenol extraction and ethanol precipitation, dissolved in 8 μl of 3 × EE buffer (3 mM EDTA/3 mM N-[2-hydroxyethyl]piperazine-N′-[3-propanesulfonic acid], pH 8.0), denatured at 96 °C for 10 min, and reannealed at 67 °C for 24–36 h in the presence of 1 M NaCl. One-tenth of the reannealed product was amplified by PCR with the JHpa24 oligonucleotide primer for 10 cycles. Tester/tester double-stranded DNA carries the J adaptor on both ends and is amplified exponentially, whereas tester/driver double-stranded DNA carries the J adaptor on one end only and is amplified linearly. The linearly amplified sequences, which exist as single-stranded DNA, were digested with 60 units of mung bean nuclease (NEB, USA), and the remaining double-stranded DNA was amplified again by PCR for 22 cycles with the JHpa24 oligonucleotide primer. A small aliquot of the amplified product was electrophoresed on a 1.0 % agarose gel to check the efficiency of the first cycle of subtractive hybridization. The remaining PCR product was purified with a DNA Sequence Purification Kit (TaKaRa Co., Japan) and used as the tester DNA for the second cycle of subtractive hybridization.
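The selective enrichment described above can be illustrated with a small calculation. The following is a minimal sketch under idealized assumptions (perfect PCR efficiency, no reagent limitation); the function name `copies_after_pcr` is hypothetical and is not part of the published protocol.

```python
def copies_after_pcr(cycles: int, adaptor_ends: int) -> int:
    """Idealized copy number per starting duplex after `cycles` of PCR.

    adaptor_ends = 2 (tester/tester): both strands carry a priming site,
        so the product doubles every cycle (exponential growth).
    adaptor_ends = 1 (tester/driver): only one strand can be copied,
        so one new strand is added per cycle (linear growth).
    adaptor_ends = 0 (driver/driver): no priming site, no amplification.
    """
    if adaptor_ends == 2:
        return 2 ** cycles
    if adaptor_ends == 1:
        return 1 + cycles
    return 1


# Ten cycles, as in the first amplification step of the text.
tester_tester = copies_after_pcr(10, adaptor_ends=2)
tester_driver = copies_after_pcr(10, adaptor_ends=1)
print(tester_tester, tester_driver, tester_tester / tester_driver)
```

Under these assumptions, ten cycles already separate the two populations by roughly two orders of magnitude, which is why the nuclease digestion followed by a second round of PCR leaves mainly tester-specific fragments.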

The second cycle of competitive hybridization was performed after switching the J adaptor used in the first cycle to a new adaptor (the N adaptor, prepared by annealing two oligonucleotides, NHpa24 and NHpa11, with gradual cooling from 90 °C to 10 °C over 80 min). An aliquot (50 ng) of the ligation solution of tester DNA was mixed with 20 μg of driver DNA. Denaturing, reannealing, and selective amplification of the self-annealed products were performed as in the first cycle.

After the second cycle of subtractive hybridization, the PCR product was electrophoresed on a 1.0 % agarose gel. The visibly separated DNA bands in the ethidium bromide-stained gel were excised, and the DNA fragments were recovered from the gel with an agarose gel recovery kit (Takara Co., Japan).

Sequencing of isolated DNA fragments

The isolated DNA fragments were cloned into pMD19-T Vector (Takara Co., Japan). Plasmid DNA was transformed into Escherichia coli JM109 competent cells (Takara Co., Japan). Bacteria were incubated in LB medium for 1 h at 37 °C, then plated onto LB plates containing ampicillin (50 μg/ml), X-gal (20 μg/cm²), and IPTG (20 μg/cm²) and incubated for 14 h at 37 °C. Five positive plasmid clones (white colonies) were selected and identified by PCR amplification of the inserts using the Hpa24 primers. Sequence analysis was carried out on a 3730 genetic analyzer (Applied Biosystems, USA). After the Hpa24 primer sequences were removed from the obtained sequences, homology analysis was performed with the Basic Local Alignment Search Tool (BLAST) available at the NCBI website (http://www.ncbi.nlm.nih.gov/BLAST/). CpG islands in the sequence of each isolated fragment were reported with the NEWCPGREPORT program available at the EMBOSS website (http://emboss.bioinformatics.nl/). Promoter regions in the sequence of each isolated fragment were predicted with the PROSCAN program available at the website of the Advanced Biosciences Computing Center (http://www-bimas.cit.nih.gov/molbio/proscan/).
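For readers unfamiliar with CpG island reporting, the quantities such programs typically evaluate can be sketched as follows. This is not the NEWCPGREPORT implementation: it computes only the GC content and the observed/expected CpG ratio for a whole sequence, and the thresholds in `looks_cpg_rich` (GC above 50 %, ratio above 0.6) are commonly used criteria adopted here as assumptions. Both function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC percent, observed/expected CpG ratio) for a DNA sequence.

    Observed/expected = (number of CpG * length) / (number of C * number of G).
    """
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp


def looks_cpg_rich(seq: str, min_gc: float = 50.0, min_obs_exp: float = 0.6) -> bool:
    """Assumed criteria: GC content and observed/expected ratio above thresholds."""
    gc_percent, obs_exp = cpg_stats(seq)
    return gc_percent > min_gc and obs_exp > min_obs_exp


# Toy sequences, for illustration only.
print(cpg_stats("CGCGCGCG"))
print(looks_cpg_rich("ATATATAT"))
```

A real analysis slides a window along the fragment and also applies a minimum island length; the sketch omits both for brevity.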

Validation of methylation

In order to validate variant methylation of the isolated fragments, methylation degree of these fragments was roughly quantified by bisulfite sequencing in the same samples (ten senior donors and ten junior donors) used for MS-RDA and other independent samples (ten senior donors and ten junior donors).

Genomic DNA (2 μg) from each donor was subjected to bisulfite modification using the EZ DNA Methylation kit (Zymo Research, CA). Each isolated fragment was amplified with designed primers, cloned into pMD19-T Vector, and sequenced as described above on a 3730 genetic analyzer (Applied Biosystems, USA). For each donor, ten cloned segments of each isolated fragment were sequenced, and the methylated fraction of each CpG was averaged across the ten cloned segments. The degree of methylation of each CpG in each group was presented as the average methylated fraction of 20 donors; thus, each value represents a total of 200 determinations. Differences among individuals of the same group in the methylation of each CpG were assessed by one-way analysis of variance. Intergroup differences in the methylation of each CpG were analyzed by group t test. The CpG sites in each fragment that showed clearly differential methylation between the senior and the junior donors were selected to examine their correlation with age.
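The per-CpG averaging and the group comparison can be sketched as below. This is a minimal illustration, assuming that each clone is encoded as a list of 1 (methylated) and 0 (unmethylated) calls and that the "group t test" is a two-sample Student t test with pooled variance; the function names and the toy data are hypothetical.

```python
from statistics import mean, variance


def cpg_fractions(clones: list[list[int]]) -> list[float]:
    """Methylated fraction of each CpG, averaged across the clones of one donor.

    `clones` holds one list per sequenced clone, with one 0/1 call per CpG.
    """
    n_sites = len(clones[0])
    return [mean(clone[i] for clone in clones) for i in range(n_sites)]


def student_t(a: list[float], b: list[float]) -> float:
    """Two-sample Student t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


# Toy example: four clones, two CpG sites.
print(cpg_fractions([[1, 0], [1, 1], [0, 0], [1, 0]]))

# Toy per-donor fractions of one CpG in two groups.
print(student_t([0.8, 0.9, 0.7], [0.2, 0.3, 0.1]))
```

The t statistic would then be compared with the t distribution with na + nb − 2 degrees of freedom to obtain a P value.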

Quantitative analysis of age-associated methylation

To identify the CpG sites that show continuous age-associated hyper- or hypomethylation, Sequenom MassARRAY® analysis was used to quantify the methylation status of the isolated age-associated methylation fragments in 65 independent donors (age range, 11–72 years). Sequenom MassARRAY analysis uses mass spectrometry (MS) for the detection and quantitative analysis of DNA methylation, based on Homogeneous MassCLEAVE (hMC) base-specific cleavage and matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS [32].

Genomic DNA was isolated as described above. DNA (1 μg) was converted with sodium bisulfite using the EZ DNA methylation kit (Zymo Research, CA) as described above. The primers for the target markers were designed using epidesigner (http://www.epidesigner.com). Each reverse primer carries a T7-promoter tag for in vitro transcription (5′-cagtaatacgactcactatagggagaaggct-3′), and each forward primer is tagged with a 10mer to balance the melting temperature (5′-aggaagagag-3′). The primer pairs were designed to span the target CpG regions of the isolated fragments. PCR amplification of 1 μl bisulfite-treated DNA (>5 μg/μl) was performed using HotStar Taq Polymerase in a 5-μl reaction volume with PCR primers at a final concentration of 200 nM. After Shrimp Alkaline Phosphatase treatment, 2 μl of the PCR products were used as a template for in vitro transcription and RNase A cleavage for the T-reverse reaction, as per the manufacturer's instructions (Sequenom hMC). The samples were desalted and spotted on a 384-pad SpectroCHIP (Sequenom) using a MassARRAY nanodispenser (Sequenom), followed by spectral acquisition on a MassARRAY Analyzer Compact MALDI-TOF MS (Sequenom). Methylation calls were made with EpiTyper software v1.0 (Sequenom) to generate quantitative results for each CpG site or for an aggregate of multiple CpG sites. The experiment was performed in triplicate. Non-applicable readings and their corresponding sites were excluded from the calculation. More detailed methodology may be found in reference [33].
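The primer tagging scheme can be made concrete with a short sketch. The two tag sequences are those quoted in the text; the target-specific primer sequences in the example are invented placeholders, and the function name `tag_primers` is hypothetical.

```python
# Tags quoted in the text (written 5' to 3').
T7_TAG = "cagtaatacgactcactatagggagaaggct"  # added to each reverse primer
BALANCE_TAG = "aggaagagag"                  # 10mer added to each forward primer


def tag_primers(forward_core: str, reverse_core: str) -> tuple[str, str]:
    """Prepend the 5' tags to target-specific forward and reverse primers."""
    return BALANCE_TAG + forward_core, T7_TAG + reverse_core


# Hypothetical target-specific sequences, for illustration only.
fwd, rev = tag_primers("GGTTTTAGGGATTTGTAGTT", "CCAAACTAAAACCTCCTACC")
print(fwd)
print(rev)
```

Lower case is kept for the tags so that they remain visually distinct from the target-specific part of each primer.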

The linear correlation analysis between age and the methylation markers

The methylation fraction of each fragment in each sample was calculated and expressed as percent methylation. The Spearman test was used to determine the correlation between age and methylation fraction, with significance set at P < 0.05; r denotes the strength of the relationship between the two variables. To provide an unbiased estimate of predictive accuracy for age, a leave-one-out analysis was used, in which the multivariate regression model is fitted on all but one subject and its prediction is compared with the observed age of the left-out subject.
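The two statistical steps described above can be sketched in plain Python. This is an illustrative outline rather than the software used in the study: Spearman's r is computed as the Pearson correlation of average ranks, and the leave-one-out step refits an ordinary least-squares model (solved through the normal equations) with each subject held out in turn. All function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))


def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no check is made).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, ages):
    """Least-squares fit of age = b0 + b1*m1 + b2*m2 + ... (normal equations)."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, ages)) for i in range(p)]
    return solve(xtx, xty)


def leave_one_out(rows, ages):
    """Predict each subject's age from a model fitted on all other subjects."""
    preds = []
    for k in range(len(rows)):
        beta = fit_ols(rows[:k] + rows[k + 1:], ages[:k] + ages[k + 1:])
        preds.append(beta[0] + sum(b * v for b, v in zip(beta[1:], rows[k])))
    return preds
```

Each element of `rows` holds the methylation values of one subject (for example, the three marker averages), and `ages` holds the corresponding observed ages; comparing `leave_one_out(rows, ages)` with `ages` gives the prediction error.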

Results

Isolation of age-related methylation

After 2 cycles of positive subtractive hybridization, four differentially methylated fragments were successfully isolated from the senior and the junior (Fig. 1, P1–P4). The letter “P” means the sequence is the product of the positive subtractive hybridization. After 2 cycles of reverse subtractive hybridization, two differentially methylated fragment bands were successfully obtained from the two groups (Fig. 1, R1–R2). The letter “R” stands for the product of the reverse subtractive hybridization.

Fig. 1

Electrophoretogram of differentially methylated DNA fragments obtained by MS-RDA with HpaII digestion (1.0 % agarose gel, EB stained). Lane 1: products of the first cycle of positive competitive hybridization with the tester amplicons from the senior donors and the driver amplicons from the junior donors; lane 2: products of the second cycle of positive competitive hybridization: four clearly visible fragment bands were obtained (P1–P4); lane 3: products of the first cycle of reverse competitive hybridization with the tester amplicons from the junior donors and the driver amplicons from the senior donors; lane 4: products of the second cycle of reverse competitive hybridization: two clearly visible fragment bands were present (R1–R2); M: DNA marker

Sequence analysis of isolated fragments

Six DNA fragments were successfully obtained from the isolated bands by sequencing. The sequence information of the six fragments is shown in supplementary data 1.

All fragment sequences have high homology with human genomic sequences in GenBank. P1 has 100 % homology with a partial sequence of exon-1 of the human gene ubiquitin-specific peptidase 11 (USP11, GenBank ID: 8237) located on chromosome Xp11.23. P2 has 100 % homology with a partial sequence of intron-13 of the human gene NOP14 nucleolar protein homologue (yeast) (NOP14, GenBank ID: 8602) located on chromosome 4p16.3. P3 has 100 % homology with a partial sequence of intron-1 of the human gene Zic family member 5 (ZIC5, GenBank ID: 85416) located on chromosome 3. P4 has 100 % homology with the upstream region of exon-1 of the human gene phosphatase 1, regulatory subunit 18 (PPP1R18, GenBank ID: 170954) located on chromosome 6p21.3. Fragment R1 has 100 % homology with an uncharacterized region between the human genes ZNF557 and INSR located on chromosome 19p13.2. R2 has 100 % homology with an uncharacterized region located on chromosome 1p36.33. The location of all fragments in the human genomic sequence is shown in Fig. 2.

Fig. 2

Schematic diagram of CpG sites in the six sequences. This map shows the relative position of each CpG site in the sequence. Each horizontal bar represents a sequence. Each short vertical bar represents a CpG site in the sequence. The promoter regions of P1, P2, and P4 are labeled with a dotted line. Methylation of each CpG site in the six sequences was analyzed by bisulfite sequencing and is shown in Fig. 3

CpG islands were reported on fragments P1, P3, P4, and R1 by the program NEWCPGREPORT with the default parameters. Promoter regions were predicted on fragments P1, P2, and P4 by the program PROSCAN. The main sequence characteristics of all isolated fragments are shown in Table 2. The relative location of all CpGs and promoter regions in each fragment is shown in Fig. 2.

Table 2 Characteristics of differentially methylated DNA fragments by MS-RDA

Fig. 3

Comparison of the methylation of each CpG in the six sequences between the senior and the junior donors. In this diagram, the numbers identify the CpGs in the order in which they appear in the sequences shown in Fig. 2. Some CpGs display different degrees of methylation in the two groups. Twenty-three CpGs in fragment P1, nine CpGs in fragment P2, and five CpGs in fragment P3 show significantly different methylation between the senior donors and the junior donors (marked with the symbol “*”). The average methylation of these CpGs in each fragment was further validated by MassARRAY (see Fig. 4). In each fragment, the result represents the average methylation of each CpG from ten individuals (mean ± S.E.M. of ten donors, the senior versus the junior, P < 0.01 by group t test)

Methylation validation of isolated fragments

The fragments contain different numbers of CpGs, and each CpG shows its own methylation status with age. No significant inter-individual variation was detected within the same age group for any CpG of the isolated fragments. Age-related methylation changes were observed in fragments P1, P2, and P3: the methylation level of some CpGs in these fragments differs markedly between the senior group and the junior group (Fig. 3, mean ± S.E.M. of ten donors, the senior versus the junior, P < 0.01 by group t test).

The linear relationship between age and the methylation changes

The methylation status of the three methylation markers was quantitatively measured in 65 independent donors (age range, 11–72 years). To make the correlation analysis as accurate as possible, only the CpGs showing different methylation between the senior and the junior donors in each fragment were studied. Based on the average methylation of the selected CpGs, a significant increase or decrease in methylation was found to correlate with age for the CpGs at P1 (r = −0.88), P2 (r = −0.85), and P3 (r = 0.80) (for all correlations, P < 0.0001). The result of the correlation analysis is shown in Fig. 4.

Fig. 4

Age-related methylation analysis: average percentage methylation of CpGs in fragments P1–P3 versus age was validated in 65 donors. Linear trend lines are shown in all charts. P1 (r = −0.88), P2 (r = −0.85), P3 (r = 0.80)

Regression analysis for epigenetic age-prediction

To provide an unbiased estimate of predictive accuracy for age, a leave-one-out analysis was used, in which the multivariate regression model is fitted on all but one subject and its prediction is compared with the observed age of the left-out subject. The estimated age is: age = 103.007 − 49.205 × M_P1 − 71.875 × M_P2 + 67.645 × M_P3 (R² = 0.93), where M_P1, M_P2, and M_P3 represent the average percent methylation of the CpGs at P1, P2, and P3. The predicted values are highly correlated with the observed ages in the sample (r = 0.96, P < 0.001, n = 65, Fig. 5). The mean absolute difference between the predicted and the observed age (the error) is 4 years. The multivariate regression model differs between males and females. The estimated age of males is: age (male) = 116.927 − 49.121 × M_P1 − 86.421 × M_P2 + 47.978 × M_P3 (R² = 0.93). The estimated age of females is: age (female) = 87.495 − 48.665 × M_P1 − 55.304 × M_P2 + 86.959 × M_P3 (R² = 0.94). For the male and female models, the mean absolute difference between the predicted and the observed age (the error) is 4.7 and 2.9 years, respectively.
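As a worked illustration, the three regression equations above can be written as a small function. The coefficients are copied from the text; everything else is an assumption made for this sketch. In particular, the methylation values are taken to be fractions between 0 and 1 (the magnitude of the coefficients suggests this scale, but the text does not state it), and the names `MODELS`, `predict_age`, and `mean_absolute_error` are hypothetical.

```python
from statistics import mean

# (intercept, coefficient for P1, coefficient for P2, coefficient for P3),
# copied from the regression equations in the text.
MODELS = {
    "all": (103.007, -49.205, -71.875, 67.645),
    "male": (116.927, -49.121, -86.421, 47.978),
    "female": (87.495, -48.665, -55.304, 86.959),
}


def predict_age(m_p1: float, m_p2: float, m_p3: float, model: str = "all") -> float:
    """Estimated age in years from the average methylation of P1, P2, and P3.

    Assumption: methylation is given as a fraction (0 to 1), not as 0 to 100.
    """
    b0, b1, b2, b3 = MODELS[model]
    return b0 + b1 * m_p1 + b2 * m_p2 + b3 * m_p3


def mean_absolute_error(predicted: list[float], observed: list[float]) -> float:
    """Mean absolute difference between predicted and observed ages."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


# Invented methylation values, for illustration only.
print(round(predict_age(0.5, 0.5, 0.5), 4))
print(round(predict_age(0.5, 0.5, 0.5, model="male"), 4))
```

The signs of the coefficients mirror the correlations reported for the markers: higher methylation at P1 and P2 lowers the estimate, whereas higher methylation at P3 raises it.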

Fig. 5

Predicted versus observed age of all subjects using a multivariate regression model. The predicted age of each left-out sample (y-axis) is plotted against its observed age (x-axis). The predicted values are highly correlated with the observed ages (r = 0.96, P < 0.001, n = 65)

Discussion

Many different methods for age-prediction have been evaluated in the forensic community over recent decades [3–12]. These methods require further research because of the limited types of tissue that can be tested and the unknown effects on these measures [6]. In this paper, age-related DNA methylation change was introduced as a method to determine the age of a person. This method was able to predict age with high accuracy and a correlation coefficient of r = 0.96, which is lower than the coefficient obtained when analyzing the racemization of aspartic acid in human dentin. However, the accuracy of this method relies on evaluating all three fragments together to estimate human age.

Many techniques and methods have been developed to analyze genome-wide methylation, each suited to particular purposes and fields of application [34]. MS-RDA is a coarse but relatively effective method for screening and isolating novel differentially methylated sequences from the genome [31, 35]. This technique can serve as an efficient tool for understanding the nature of epigenetic changes and their significance for the aging process and development [31]. Compared with other screening methods, MS-RDA was a more effective and economical way for us to isolate differentially methylated targets.

In this study, three novel age-related methylation fragments were obtained from the whole genome. Each CpG of every fragment has its own variable methylation pattern. Some CpGs display a marked difference in methylation between the senior and the junior donors, which reveals age-related hyper- or hypomethylation patterns. It is important to note that no obvious individual variation in methylation was observed within the same age group. This result indicates that methylation in individuals of the same age remains at roughly the same level.

Age-associated methylation is considered to be based on gradual alteration during ontogenic development, although it is not yet known how site-specific methylation change is regulated [27]. Some age-related methylation markers with good accuracy for age-prediction have been reported [28–31]. However, more age-related methylation markers should be identified to broaden the applicability of epigenetic age-prediction before age-related methylation can become a valuable epigenetic predictor of age.

In this study, the methylation patterns of three novel methylation markers in donors across a wide age range reveal continuous age-associated hyper- or hypomethylation. The correlation between the methylation level of the three markers and age (Fig. 4) shows a clear linear hyper- or hypomethylation trend. The regression model for age-prediction is able to predict an individual's age with good overall accuracy (R² = 0.93). The predicted values are highly correlated with the observed ages in the independent sample (r = 0.96, Fig. 5) and show an average difference of 4 years between the predicted and the true age. Therefore, the model has reasonable accuracy and could be used by forensic scientists to estimate age from a biological sample. The accuracy differs slightly between males (R² = 0.93) and females (R² = 0.94), owing to sex differences.

Peripheral blood was chosen as the research sample in this study because it is frequently found at crime scenes. Peripheral blood leukocytes are a heterogeneous mix of cells, each type with a different pattern of DNA methylation reflecting its repertoire of expressed genes [36]. Therefore, the observed age-associated methylation differences could represent age-dependent shifts in cell subsets rather than methylation changes within the same cell type. Whatever the observed methylation changes represent, they reflect the overall tendency of blood methylation with aging. Moreover, every tissue has a different methylation pattern [36, 37] and could undergo a specific methylation change during aging. If age-related methylation markers are identified in multiple tissues, forensic age-prediction could be extended to samples other than whole blood. Combining multiple age-associated methylation markers may allow age to be estimated with lower error.

Conclusions

Three novel age-related methylation markers were isolated from human blood. The methylation patterns of these markers display a strong correlation with age and yield a quantitative model for age-prediction with high accuracy. Further research needs to be oriented toward finding more age-related methylation markers for age-prediction from various biological samples. If mathematical models of age-prediction can be improved to achieve a lower error, it is conceivable that biological age might one day be predicted with an accuracy approaching that of chronological age.