Introduction

DNA methylation

Epigenetics is the study of reversible, heritable changes that influence gene regulation without altering the underlying DNA sequence [1]. Perhaps the best characterised of these is DNA methylation, an epigenetic modification whereby a methyl group is covalently added to cytosine in DNA [2]. Most DNA methylation occurs in CpG dinucleotides [3]; however, methylation elsewhere (CpT, CpA and CpC) has been documented [4, 5]. The human genome harbours millions of CpGs which may exist in a methylated, semi-methylated or unmethylated state depending on the chromosomal location, allele, cell type or developmental phase [6,7,8]. Introns, 3′ untranslated regions (UTRs) and intergenic sequences are severely depleted in CpGs, whereas exons have a somewhat higher density of CpGs [9]. It is estimated that between 60 and 90% of CpGs are methylated. Unmethylated CpGs were originally thought to be grouped in regions known as CpG islands (CGIs); however, CGIs have also been reported to exist in methylated states. CGIs are often located in promoter regions and control gene expression via differential methylation levels [10]. CGIs are 300–3000 bp long, with a CpG content of above 50%. DNA methylation is essential in mammalian development and participates in numerous processes including genome stability, X-chromosome inactivation, genomic imprinting, aging and carcinogenesis [2, 3]. Aberrant DNA methylation patterns have been implicated in numerous diseases such as cancers [11], autoimmune diseases [12], diabetes [13], and neurodegenerative [14] and psychological disorders [15].

DNA methylation patterns are the collective result of methylation establishment and maintenance [extensively described in 6, 16,17,18]. Such established methylation patterns may be stored as well as stably inherited and may affect an organism's phenotype [19]. Even though DNA methylation is heritable over cell divisions, the methylome undergoes changes during development. During early embryogenesis, the oocyte and sperm initially display high methylation levels. Remodelling of methylation occurs in the first cell divisions of the zygote when both maternal and paternal genomes become demethylated. The paternal genome is actively demethylated, while the maternal genome undergoes passive loss of methylation over several rounds of DNA replication. Methylation levels are re-established during the blastocyst stage, during which cells also lose totipotency. While these dramatic fluctuations in methylation are not repeated, local changes in DNA methylation do occur in each cell lineage, thereby resulting in specific methylomes which may enable prediction of cell type [18, 20].

Furthermore, DNA methylation patterns are dynamic and may vary due to several factors. These include genetic variation such as single-nucleotide polymorphisms (SNPs) [21], diet, nutrition and alcohol [22, 23], lifestyle and stress [8, 24], smoking [25, 26], and drugs and substance abuse [27], amongst others. Several studies have also highlighted age-related changes in global DNA methylation [17, 28, 29]. Age and DNA methylation are generally thought to exhibit an inverse relationship, i.e. the aging of cells is coupled with the loss of DNA methylation [30]. This was evident in a study by Florath et al. [31], who found a negative correlation between age and DNA methylation at 43 CpG sites in six genes. A study by Day et al. [32] showed age-related DNA methylation changes to be tissue-specific. With progressing age, CpG sites outside of CGIs are reported to lose methylation, while the opposite is true for CpG sites within CGIs [30, 33].

DNA methylation has been shown to affect chromatin availability to transcription factors and binding of regulatory proteins, thereby playing a role in the regulation of gene expression. This control of gene transcription is critical for normal human development and cellular differentiation [33]. Sequences with unmethylated promoters are generally expressed, whereas methylated promoters generally hinder transcription, so the associated sequences undergo transcriptional repression [34,35,36]. Conversely, gene body methylation has been reported to show a positive correlation with gene expression. DNA methylation is more predominant within gene bodies than at promoters [37]. While most CGIs display low methylation, a small percentage does acquire methylation during development. Additionally, a significant number of CGIs are differentially methylated between healthy tissues and cell types [38, 39]. DNA methylation may also be present in CGI shores, which are regions 0–2 kb from CGIs; in CGI shelves, which are regions 2–4 kb from CGIs; and in open sea regions, which are CpG sites scattered across the remainder of the genome. Flanking regions of a CGI have been found to exhibit stable methylation which may also play a role in gene regulation [40,41,42].
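The distance-based categories described above can be illustrated with a short sketch. It assumes that the distance of a CpG from the nearest CGI boundary is already known, and it treats a distance of 0 as lying inside the island, which is a convention adopted here for illustration only. The function name `classify_cpg_context` is hypothetical.

```python
def classify_cpg_context(distance_to_cgi_bp: int) -> str:
    """Classify a CpG by its distance (in bp) from the nearest CGI boundary.

    Assumption of this sketch: a distance of 0 means the CpG lies inside
    the island. Shores (up to 2 kb) and shelves (2-4 kb) follow the
    definitions in the text; everything further away is "open sea".
    """
    if distance_to_cgi_bp < 0:
        raise ValueError("distance must be non-negative")
    if distance_to_cgi_bp == 0:
        return "island"
    if distance_to_cgi_bp <= 2000:
        return "shore"
    if distance_to_cgi_bp <= 4000:
        return "shelf"
    return "open sea"


# Illustrative distances only; these are not taken from any cited study.
for distance in (0, 1500, 3000, 25000):
    print(distance, classify_cpg_context(distance))
```

How the boundary values of exactly 2 kb and 4 kb are assigned is a choice of this sketch; array annotations may handle them differently.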

Epigenome-wide association studies (EWAS) have found numerous individual CpG sites and genomic regions that show different methylation patterns between human tissues and body fluids. These are known as differentially methylated sites (DMSs) and tissue-specific differentially methylated regions (tDMRs), respectively [43, 44]. tDMRs are mainly found at the margins of CpG islands, and both their CpG and G/C content are lower than those of surrounding regions. tDMRs are thought to provide cells with an epigenetic memory by generating cell-type specific hypo- and hypermethylation patterns [35, 45]. While the distinct active regulatory role of tDMRs is not fully established, these differentially methylated regions are thought to function by either attracting or preventing the binding of specific factors in a methyl-dependent manner. Major links between gene silencing and tDMRs have been identified [46, 47]. Based on differential methylation profiles, DMSs and tDMRs enable distinction between tissues and fluids [44, 48, 49]. It is this property, along with the potential to estimate age, that is now exploited in forensic science.

DNA methylation-based markers in forensic sciences

The recovery and subsequent analysis of human body tissues and fluids from crime scenes is perhaps one of the most crucial preliminary steps of a forensic investigation [50, 51]. The transfer of skin, blood, sweat, urine and saliva from human to object may indicate and help decipher events of a physical confrontation. Similarly, semen, vaginal fluid and menstrual blood assist in the reconstruction of sexual assault events. Body fluids can aid in the reconstruction of crime scenes and ease the process of identifying individuals who were present and involved in the crime [35, 52,53,54]. Identifying body fluids, and linking donors to those fluids, are not trivial tasks: the fluids may be localised on materials or surfaces, mixed with other fluids or contaminants, or present in minimal quantities and in sub-optimal quality [41, 55], all of which constrain exact identification.

Studies show that tDMR-based markers present salient features for the identification of forensically relevant body fluids and in estimating individual age [56, 57]. However, the focus of the present review is on the use of DNA methylation-based markers for body fluid identification.

Forensic samples are not always present in high quantity and quality, and it is imperative to restrict consumption and degradation of valuable evidence. DNA methylation-based assays are compatible with existing short tandem repeat (STR) typing protocols and allow multiplexing, which permits identification of several body fluids at the same time [58,59,60,61]. Mixture analysis has also been reported by several studies [62, 63]. However, even though DNA itself is stable, some methylation assays require bisulphite conversion, which may degrade DNA. In addition, low-input protocols can be challenging to work with [64, 65].

Frequently employed methods for the identification and quantification of DNA methylation levels in forensics are discussed in Supplementary Information. The most recent studies on current DNA methylation-based markers, verification of marker stability and results of mixture analysis are critically discussed in the succeeding pages.

The current status of DNA methylation-based markers for forensic application

Frumkin et al. [58] published the first forensic study to report differentially methylated genomic loci between venous blood, saliva, semen, skin epidermis, vaginal fluid, menstrual blood and urine using methylation-sensitive restriction enzyme-PCR (MSRE-PCR). Fifteen loci which contained the recognition sequence of the restriction enzyme HhaI (GCGC) were selected by an in-house developed software program for the tissue identification assay. However, only four proved promising. Varying ratios of methylation were observed in the target tissues and body fluids. The methylation ratio L91762/L68346 was lower in semen than in all other fluids, whereas high methylation ratios of L91762/L68346 and L76138/L26688 were characteristic of skin epidermis samples. The results thus distinguished semen from the other fluids, and skin epidermis from the other fluids. However, a study by Gomes et al. [66] failed to reproduce these findings, since their results showed that high methylation ratios of L91762/L68346 were characteristic of not only skin, but also saliva samples, rendering the marker non-specific.
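The two ingredients of such an assay, locating HhaI recognition sites and comparing the signal of two loci as a ratio, can be sketched as follows. This is a minimal illustration and not the in-house software used by Frumkin et al. [58]. The function names, the example sequence and the signal values are hypothetical, and the signals are assumed to be arbitrary peak-height units.

```python
import re
from typing import List


def find_hhai_sites(sequence: str) -> List[int]:
    """Return 0-based start positions of every GCGC motif.

    A lookahead pattern is used so that overlapping motifs
    (for example in "GCGCGC") are all reported.
    """
    return [match.start() for match in re.finditer(r"(?=GCGC)", sequence.upper())]


def methylation_ratio(signal_locus_a: float, signal_locus_b: float) -> float:
    """Ratio of the amplification signals of two loci after digestion.

    HhaI cleaves only unmethylated GCGC sites, so a locus that is
    methylated survives digestion and yields a stronger PCR signal.
    """
    if signal_locus_b <= 0:
        raise ValueError("reference signal must be positive")
    return signal_locus_a / signal_locus_b


# Hypothetical sequence and peak heights, for illustration only.
print(find_hhai_sites("AAGCGCTTGCGCGC"))
print(methylation_ratio(300.0, 1200.0))
```

Using a ratio between two loci, rather than an absolute signal, makes the comparison less dependent on the amount of input DNA.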

Madi et al. [67] employed bisulphite modification and pyrosequencing to investigate differentially methylated regions that were previously reported by Eckhardt and colleagues [44]. The group found several CpG sites in the ZC3H12D, FGF7, C20orf117 and BCAS4 genes which displayed differential methylation profiles between saliva, blood, semen and skin tissue. Five CpG sites in ZC3H12D exhibited semen specificity. FGF7 also enabled identification of semen, which was hypermethylated relative to blood, saliva and skin. Blood was positively identified by hypermethylation of a locus within C20orf117. Accurate differentiation from skin was inconclusive in the experiment. Therefore, this marker requires more rigorous testing for identification of blood or skin. Eckhardt et al. [44] also showed high levels of methylation at C20orf117 in CD4+ and CD8+ lymphocytes relative to skin and sperm. Eckhardt et al. [44] found the highest methylation of a region 5′ upstream of BCAS4 in semen; however, the same tDMR showed the highest methylation at five CpG sites in saliva when compared to the other fluids tested by Madi et al. [67]. Despite this discrepancy between Eckhardt et al. [44] and Madi et al. [67], in a follow-up study by Antunes et al. [68], the same markers (ZC3H12D and FGF7, C20orf117 and BCAS4) proved useful for semen, blood and saliva identification, respectively.

Testing aged samples by Antunes et al. [68]

The ZC3H12D, FGF7, C20orf117 and BCAS4 markers were tested to determine the mean percent methylation of 9-year-old blood samples and of 20-year-old blood and semen samples. Interestingly, methylation patterns were stable over such long periods of time; percent methylation was the same as in recently collected samples.

Lee et al. [69] selected five previously reported tDMRs to differentiate between saliva, blood, semen, vaginal fluid and menstrual blood. tDMRs in the USP49 and DACT1 genes (ubiquitin-specific peptidase 49 and Dapper 1 isoform II, respectively) were selected as semen-specific markers, as they were identified to be testes-specific by Kitamura et al. [70]. tDMRs for the PFN3 (profilin III), PRMT2 (protein arginine N-methyltransferase II) and HOXA4 (homeobox A4) genes were chosen as blood-specific markers, as different methylation patterns in blood, spleen and brain tissues were observed by Illingworth et al. [71]. Bisulphite treatment, PCR and sequencing showed that DACT1 and USP49 were unmethylated in over 90% of clones from semen and hypermethylated in almost all blood, saliva, vaginal fluid and menstrual blood clones. The PFN3 marker displayed moderate methylation (65%) in vaginal fluid, whereas high methylation (80%) was observed in other tissues and fluids. PRMT2 was hypermethylated in vaginal fluid and menstrual blood and demonstrated great differences between semen/vaginal fluid and semen/menstrual blood. Similar results were reported by An et al. [55] and Choi et al. [56]. The HOXA4 tDMR displayed high degrees of methylation in blood and female saliva and was hypomethylated in vaginal fluid and menstrual blood. The latter two markers did not show specificity for a single body fluid; however, the authors suggested that low methylation of HOXA4 and high methylation of PRMT2, USP49 and DACT1 could be used to confirm the presence of vaginal fluid and menstrual blood.

Sensitivity and forensic simulation by Choi et al. [56]

Using a multiplexed MSRE-PCR and a complementary bacterial DNA-based assay, DNA methylation patterns could be generated for saliva and semen with just 500 pg (0.5 ng) or more of starting DNA, while 250 pg (0.25 ng) of starting DNA was sufficient for vaginal fluid, showing that minimal amounts of DNA could be used for analysis. Mixtures of saliva and semen, and of semen and vaginal fluid, were clearly distinguished by amplification of L81528 (semen-specific) in this study. A single post-coital penile sample and three post-coital vaginal samples were tested as a simulated sexual assault case, and the results showed a mixed sample profile, as low peaks were observed for PFN3 (vaginal fluid-specific) and the semen-specific L81528 marker.

Park and colleagues [72] used the HumanMethylation 450 K bead array to identify eight CpG sites which were differentially methylated between saliva, blood, semen and vaginal fluid. All chromosomal regions and details of the markers are shown in Table 1. Overall, each marker displayed hypermethylation in its target body fluid with significantly reduced methylation in the others. Both saliva and blood markers showed methylation levels above 50% in their respective target body fluids, while less than 10% methylation was observed in other fluids. Semen and vaginal fluid markers showed above 90% and 65% methylation, respectively, in target body fluids, whereas below 16% methylation was observed in other fluids [72]. Sensitivity tests by Park et al. [72] showed that at least 10 ng of starting DNA was required for precise distinction of fluids.

Table 1 Summary of previously reported body fluid-specific DMSs, chromosomal locations, sample sizes and methods of analysis

Lee et al. [59] identified 64 potential differentially methylated CpGs between saliva, blood, semen, vaginal fluid and menstrual blood using the HumanMethylation 450 K bead array. The 64 sites were selected as they exhibited more than 30% differences in methylation values between saliva, blood and vaginal fluid, whereas a 50% threshold was used for potential semen-specific candidate CpGs. The group examined methylation of the 64 candidate CpGs in 151 more body fluids using bisulphite sequencing and Methylation SNaPshot (Methylation Sensitive Single Nucleotide Primer Extension, MS-SNuPE) and identified a subset of CpG sites that showed the greatest body fluid specificity. Only one CpG site (cg17621389) showed semen-specific hypomethylation, whereas seven others showed target body fluid-specific hypermethylation. These included one saliva-specific marker that was 2 bp downstream of cg09652652, i.e. cg09652652-2d, two blood-specific markers (cg06379435 and cg01543184), two semen-specific markers (cg26763284 and cg17610929) and two vaginal fluid-specific markers (cg09765089-231d and cg26079753-7d). Two markers of this panel, cg06379435 (blood-specific) and cg17610929 (semen-specific), were also included in the study by Park and colleagues [72]. Some semen samples showed almost complete methylation not only at semen-specific sites (cg26763284 and cg17610929), but also at one blood-specific marker (cg01543184), with hypomethylation at other markers. Four semen samples showed low methylation signals at one blood-specific marker (cg06379435) and at the vaginal fluid-specific markers (cg09765089-231d and cg26079753-7d). Such sporadic methylation profiles exhibited by semen could have been due to a small number of white blood cells that may have been present.
Additionally, the authors [59] acknowledged difficulty with menstrual blood, as 2 out of 11 samples exhibited methylation profiles similar to vaginal fluid, and 9 out of 11 menstrual blood samples also showed methylation signals at the blood-specific markers (cg06379435 and cg01543184). This was attributed to the time of collection during the menstrual cycle not being taken into account, and to the small sample size (11 menstrual blood samples out of a total of 151 samples). The group also exposed samples to the environment by placing them in shaded areas for 75 days to test the efficacy of the multiplex SNaPshot (MS-SNuPE) assay. Similar to their previous research [55], all fluids except saliva were successfully analysed. When testing the sensitivity of the method, successful DNA methylation profiling results were achieved with approximately 0.5 ng or more of bisulphite-converted DNA [59].
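A threshold-based screen of array data, of the kind described for the 30% and 50% cut-offs, might look like the sketch below. It is not the selection procedure of Lee et al. [59]. It assumes that methylation is expressed as a beta value between 0 and 1 and that the target fluid must differ from every other fluid by more than the threshold, which is one possible reading of the criterion. The CpG identifiers and values are hypothetical.

```python
from typing import Dict, List


def select_candidates(
    beta: Dict[str, Dict[str, float]], target: str, min_difference: float
) -> List[str]:
    """Return CpG IDs whose beta value in `target` differs from the value
    in every other fluid by more than `min_difference`."""
    selected = []
    for cpg_id, by_fluid in beta.items():
        others = [value for fluid, value in by_fluid.items() if fluid != target]
        if others and all(abs(by_fluid[target] - v) > min_difference for v in others):
            selected.append(cpg_id)
    return selected


# Hypothetical beta values; the identifiers are placeholders, not array probes.
beta_values = {
    "cpg_A": {"semen": 0.95, "blood": 0.10, "saliva": 0.05},
    "cpg_B": {"semen": 0.60, "blood": 0.20, "saliva": 0.15},
}
print(select_candidates(beta_values, "semen", 0.5))
print(select_candidates(beta_values, "blood", 0.3))
```

Because the absolute difference is used, the same screen picks up both hypermethylated and hypomethylated candidates for the target fluid.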

Since the markers studied by Lee et al. [59] required extra interrogation for positive identification of body fluids, the research was repeated by the same group [60] by profiling DNA methylation levels in a total of 70 samples using the HumanMethylation 450 K bead array. The authors found two sites in the SLC26A10 gene (cg09696411 and cg18069290) which showed potential for menstrual blood specificity. The specificity of the CpGs was again tested in 125 vaginal fluid and 201 menstrual blood samples, and both showed menstrual blood-specific hypermethylation. The authors found that methylation levels were highest when sample collection was performed on the second day of menstrual bleeding, whereas specificity of the markers was reduced on the fourth and fifth days. The multiplex SNaPshot (MS-SNuPE) reaction from their preceding research [59] was also modified. In the new assay performed on 229 body fluids, the semen hypomethylation marker cg17621389 was removed, the blood marker cg01543184 (which also showed semen-specificity) was replaced with cg08792630 (which was also studied by Park et al. [72]), and the two new menstrual blood-specific markers were added. This resulted in all markers successfully exhibiting body fluid-specific hypermethylation.

Another study based on tDMRs identified by Eckhardt et al. [44] examined five regions in the BIK, CYTH4, GAS2L1, MDFI and OSM genes [74]. The group used bisulphite pyrosequencing and found that all CpG sites showed differential methylation patterns between saliva, blood, semen, vaginal fluid and menstrual blood. Specifically, the fourth CpG site in CYTH4 showed blood-specific hypomethylation, three CpGs in GAS2L1 showed blood-specific hypermethylation and four sites in MDFI showed menstrual blood-specific hypomethylation. Regions examined in the BIK and OSM genes did not show specificity towards a single body fluid [74].

Antunes et al. [77] developed a high-resolution melt analysis (HRMA) assay to confirm the use of a 91-bp region located in an intron of ZC3H12D as a semen-specific marker. In previous research using pyrosequencing [67], an intron in the ZC3H12D gene showed hypermethylation in saliva and blood, but hypomethylation in semen. Indeed, the same results were observed by Antunes et al. [77]. During sensitivity analysis, 1 ng of starting genomic DNA was found to be the minimum amount necessary for amplification using the developed HRM assay. Generally, in forensic casework, degraded and contaminated samples are recovered from crime scenes. Such contaminants may contain inhibitors which, when co-isolated with DNA, could decrease amplification efficiency. Inhibitors may act by either binding DNA or by diminishing the activity of the Taq polymerase enzyme [77]. A simple clean-up step might not be adequate to remove inhibitors which bind to DNA [77]. One substance which is known to bind DNA and hence negatively affect PCR is humic acid [78, 79]. Thus, to determine if the clean-up steps in bisulphite conversion were sufficient to remove humic acid, it was added before bisulphite conversion. Addition of humic acid did not inhibit amplification, indicating that the clean-up steps after bisulphite conversion were sufficient to remove any humic acid that may be co-extracted with genomic DNA. To determine if the presence of humic acid affects PCR, humic acid was also added after conversion. While PCR efficiency declined, the melting curve was not affected: regardless of the humic acid, semen still showed a lower melting temperature and lower methylation levels than saliva and blood. The study was performed to demonstrate the development, sensitivity, accuracy and efficiency of the assay itself; however, the authors also demonstrated the use of the assay by validating a previously reported semen-specific marker [77].

Two markers, one specific to saliva (BCAS4) and one for semen (ZC3H12D) from Madi et al. [67], and one marker specific to blood (cg06379435 and four adjacent CpG sites) from Park et al. [72] were analysed by Silva et al. [63]. In this extensive report, bisulphite pyrosequencing was used to perform several developmental validations to determine the efficacy of the markers. BCAS4 showed saliva-specific hypermethylation, cg06379435 showed blood-specific hypermethylation and ZC3H12D showed semen-specific hypomethylation. Similar results were obtained even when vaginal fluid, menstrual blood and nasal secretion were included in the reaction; however, a CpG site in the BCAS4 locus (saliva-specific) showed similar methylation levels when compared to vaginal fluid, and one CpG site in the cg06379435 locus (blood-specific) showed similar methylation values in menstrual blood. Similar to their previous work [77], the group tested the effects of inhibitors such as hematin and humic acid which could affect PCR amplification [63, 78]. When added before bisulphite modification followed by amplification, these inhibitors had no discernible effects on methylation levels. This was possibly due to the clean-up step prior to amplification, because when both inhibitors were added after bisulphite modification, the amplification process failed, and results were distorted. There were no significant differences in methylation levels for all three markers even when samples were exposed to heat for 10–25 minutes.

Mixture analysis, forensic simulation and human specificity of markers by Silva et al. [63]

In a mixture analysis, methylation levels showed intermediate percentages when compared to pure samples, and hence, ZC3H12D (semen-specific), BCAS4 (saliva-specific) and cg06379435 (blood-specific) could not accurately distinguish the target body fluids in mixtures. Analysis was also performed for a saliva swab obtained from a coffee drink and for blood and semen that were retrieved from cotton fabrics. Methylation levels of the saliva and blood markers were still higher in target body fluids when compared to others, and the semen marker still showed low methylation in semen with high methylation in other fluids. Two novel features of this study were the analysis of the same markers and body fluids in another laboratory, wherein all results were reproduced, and a species-specificity test. Methylation of the markers was profiled in non-human samples such as mouse, dog, cat, chicken, bovine, equine, pig, chimpanzee, orangutan, gorilla, Escherichia coli, Staphylococcus aureus, Enterococcus faecalis and Pseudomonas aeruginosa. Only the non-human primates showed results, which was expected given their close evolutionary relation to humans [80]; however, cg06379435 (blood-specific) was not recognized in all primates, and ZC3H12D (semen-specific) was not recognized in orangutan samples. No results were observed for any of the other non-human species; thus, the authors declared the markers suitable for forensic casework [64]. However, further mixture analysis is necessary, as it was not successful for differentiation of body fluids in the study.
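The intermediate percentages seen in mixtures follow from simple arithmetic: the measured methylation of a locus is approximately the average of the contributing fluids, weighted by the share of DNA each contributes. The sketch below illustrates this under the simplifying assumptions that each fluid contributes DNA in proportion to its mixing fraction and that amplification is unbiased. The function name and all percentages are hypothetical.

```python
from typing import Dict


def expected_mixture_methylation(
    fractions: Dict[str, float], methylation_percent: Dict[str, float]
) -> float:
    """Weighted mean methylation (%) of a locus in a mixture of fluids.

    `fractions` holds the relative DNA contribution of each fluid and
    `methylation_percent` the methylation of the locus in each pure fluid.
    """
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("fractions must sum to a positive value")
    weighted = sum(fractions[f] * methylation_percent[f] for f in fractions)
    return weighted / total


# Hypothetical semen-hypomethylated marker: 5% in semen, 90% in blood.
pure = {"semen": 5.0, "blood": 90.0}
print(expected_mixture_methylation({"semen": 1, "blood": 1}, pure))
print(expected_mixture_methylation({"semen": 1, "blood": 3}, pure))
```

Under these assumptions a 1:1 mixture gives a value midway between the two pure fluids, which a single marker cannot tell apart from a pure fluid with genuinely intermediate methylation.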

Antunes et al. [81] examined a sub-region of the PFN3 (vaginal fluid-specific) marker found by Lee et al. [69]. The marker was previously shown to display vaginal fluid-specific intermediate hypomethylation. In this study, 10 CpGs out of the previously reported 42 CpGs located in the 5′ upstream sequence were examined by pyrosequencing. Similar to the previous reports, the marker did not show complete hypomethylation in vaginal fluid, but rather intermediate methylation (25–55%), compared with saliva and blood, which showed > 60% methylation, and semen, which showed less than 10% methylation. The study found that 5 ng of starting genomic DNA was sufficient to obtain the above results.

Mixture analysis and human specificity of markers by Antunes et al. [81]

During the mixture analysis by Antunes et al. [81], saliva, blood and semen were mixed in varying concentrations. The results showed that the mixtures displayed intermediate methylation levels and accurate discrimination was not possible; hence, it was suggested that when deconvoluting mixtures, more than one marker specific to each tested body fluid should be analysed. Similar to their previous work [63], a unique aspect of the study was a test of the influence of non-human DNA on the specificity of the marker. Only chimpanzee DNA showed results similar to human DNA. The cat, cow, orangutan and gorilla DNA yielded non-specific results which the pyrosequencing software declared as unknown, as they did not match the human reference sequences, while none of the other non-human DNA yielded any results.

Watanabe et al. [75] developed a quantitative real-time PCR-based assay to examine the specificity of the potential blood-specific CpG site cg06379435 as well as CpGs located 7 bp and 14 bp downstream. These 3 CpG sites were analysed in saliva, blood, semen and vaginal fluid samples. The marker was indeed shown to be specific for blood as methylation levels were highest in blood compared to other fluids. Since mixtures of body fluids are usually found at crime scenes, the group resorted to analysing rs7359943, which is the SNP adjacent to cg06379435 to identify semen and blood in a mixture. For this, a DNA sample mixed with 50% of semen DNA with the rs7359943 AA genotype and 50% of blood DNA with the rs7359943 GG genotype was used. The methylation levels of all three CpG sites were lower in AA allele clones (semen) and higher in the GG allele clones (blood). While the method was successful and depicts the advantages of using genetic information along with epigenetic markers, it is only applicable when the identity of the mixed body fluids is known, and alleles from one individual can be distinguished from the genotype of other individuals. Furthermore, while the study was informative, menstrual blood was not included in this research. Blood is a component of menstrual blood, and confusion between the body fluids is common. It is essential to consider menstrual blood when interrogating a blood-specific marker, especially since many researchers do have difficulty in differentiating between these two body fluids [59, 62].
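The allele-resolved comparison described above can be sketched as a simple grouping step: each sequenced clone is assigned to a donor by its SNP allele, and methylation is then summarised per allele. This is an illustration only and not the analysis pipeline of Watanabe et al. [75]. The function name is hypothetical, and the clone data are invented, with 1 denoting a methylated CpG and 0 an unmethylated CpG.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def methylation_by_allele(
    clones: Iterable[Tuple[str, List[int]]]
) -> Dict[str, float]:
    """Percent methylation per SNP allele.

    Each clone is a pair (allele, calls), where `calls` lists the
    methylation state (1 or 0) of the CpG sites covered by that clone.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for allele, calls in clones:
        methylated[allele] += sum(calls)
        total[allele] += len(calls)
    return {a: 100.0 * methylated[a] / total[a] for a in total if total[a] > 0}


# Invented clones: "A" stands for the semen donor, "G" for the blood donor.
example_clones = [
    ("A", [0, 0, 0]),
    ("A", [0, 1, 0]),
    ("G", [1, 1, 1]),
    ("G", [1, 0, 1]),
]
print(methylation_by_allele(example_clones))
```

As noted in the text, this grouping is only informative when the donors carry different homozygous genotypes at the SNP, so that each allele can be traced to a single fluid.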

Age and sensitivity testing by Watanabe et al. [75]

The cg06379435 marker also proved to be effective, showing low methylation levels in saliva and semen samples which had been stored at room temperature for between 4 months and 29 years, whereas high methylation levels were observed for 29-year-old blood stains. To test sensitivity, various amounts (0.1–10 ng) of pooled blood DNA were evaluated. The group found that just 1 ng of starting DNA was sufficient for blood identification. The accuracy of the marker was tested by mixing various concentrations of blood with semen; the results showed that blood was identified only when blood/semen was 100:80% and 80:20%.

Two previously reported markers (cg17610929 for semen and cg09765089 for vaginal fluid) along with six novel body fluid-specific markers were studied by Lin et al. [62] for forensic body fluid identification. The authors selected the markers from public datasets of the HumanMethylation 450 K, namely the GPL13534 platform in NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo). The markers were selected using the criterion of a high methylation value in target fluids and nearly zero values in other fluids (Table 1). The group developed a 10-plex Methylation-Specific PCR assay which included cg09107912 and cg16732616 for saliva, cg24124443 and cg01607849 for blood, cg17610929 and cg05261336 for semen, cg09765089 and cg25416153 for vaginal fluid, along with a bisulphite conversion control (beta-actin) and a gender-specific marker (Amelogenin). The markers were tested against 65 samples of saliva, blood, semen, vaginal fluid and menstrual blood. An MS-SNuPE reaction was used to analyse the multiplex products, and each marker showed high methylation in target body fluids while low methylation levels were found in other fluids. However, menstrual blood samples also showed methylation peaks at both blood and vaginal fluid markers, indicating that menstrual blood was composed of those fluids.

Forensic simulation and mixture analysis by Lin et al. [62]

A novel aspect of this study was analysis of actual forensic casework samples (fabrics cut from underwear, swabs, tissues) and thereafter comparing the results obtained from conventional catalytic, immunological and microscopic tests. While the conventional tests gave ambiguous results, the 10-plex DNA methylation-based assay enabled accurate identification of all samples present on fabric from undergarments, swabs and tissue papers. Upon mixture analysis, when mixed in equal quantities, each marker enabled accurate identification of respective target fluids. In fact, semen was even detected when present at only 1.89% concentration in mixtures, reinforcing the reliability of the technique.

Forat and colleagues [73] performed two extensive genome-wide methylation profiling experiments using the HumanMethylation 27 K and 450 K arrays. The group reported differentially methylated sites specific to saliva, blood, semen, vaginal fluid and menstrual blood. Each marker and its methylation status are described in Table 1. These researchers used both bisulphite sequencing and MS-SNuPE to analyse methylation levels of the markers. cg21597595-4d and cg15227982-137d were both hypermethylated in saliva; cg26285698-14d and cg26285698-39d were hypomethylated in blood, whereas cg03363565-59d was hypermethylated in blood; cg22407458-288d and cg05656364-283d were hypomethylated and hypermethylated in semen, respectively; cg14991487-85d and cg03874199-212d were both hypermethylated in vaginal fluid; and cg09696411-25d and cg09696411-40d were both hypermethylated in menstrual blood. Blood and menstrual blood exhibited similar methylation profiles for the cg03363565-59d marker, and methylation differences were rather small between the menstrual blood markers and vaginal fluid markers for the target fluids. Evidently, venous blood, menstrual blood and vaginal fluid are not always well discriminated [59, 62]. Like Lee et al. [59], the authors attributed this to incorrect sampling on tampons and to the time of collection: samples were collected on days 1–5, whereas most women regard day 2 as the day of strongest bleeding and day 5 is usually disregarded. A step-wise analysis of body fluids in question was recommended, beginning with cg09696411 for menstrual blood and then using other markers to confirm the presence of menstrual blood. Methylation levels found by the MS-SNuPE assay were compared to standard bisulphite sequencing, Roche 454 sequencing and Illumina MiSeq sequencing [73]. The group found that all the next-generation sequencing (NGS) platforms yielded the same results as the MS-SNuPE assay.

Forensic simulation and disease analysis by Forat et al. [73]

During a forensic simulation, a total of 75 body fluid samples were stored under dry (room temperature), humid (in an exsiccator) and outdoor (on the ground) conditions. The methylation levels of the body fluid-specific CpGs were determined several times over a 6-month period. All methylation levels were found to be unaffected in dry conditions, whereas humid conditions in the exsiccator seemed to affect methylation levels the most. Still, humidity-related changes did not affect the discriminatory power of the markers in target body fluids. The markers also facilitated detection of target body fluids when these comprised at least 20% of a mixture.
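
The mixture thresholds reported above can be reasoned about with a simple linear mixing model. The following is a minimal sketch, assuming that each fluid contributes DNA in proportion to its share of the mixture (an idealisation, since fluids differ in DNA content). The function name `expected_mixture_methylation` and the methylation values are illustrative assumptions and are not taken from [73].

```python
def expected_mixture_methylation(fractions, methylation):
    """Expected methylation (0-1) at one CpG in a mixture of fluids.

    fractions:   dict fluid -> share of total DNA (shares must sum to 1)
    methylation: dict fluid -> methylation level of that fluid at the CpG (0-1)
    Assumes each fluid's DNA contribution equals its stated share.
    """
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Weighted average of the per-fluid methylation levels.
    return sum(share * methylation[fluid] for fluid, share in fractions.items())


# Illustrative values only: a marker hypermethylated in the target fluid.
levels = {"target": 0.90, "other": 0.05}
signal = expected_mixture_methylation({"target": 0.20, "other": 0.80}, levels)
print(round(signal, 3))
```

Under these assumptions, the signal of a minor contributor is diluted towards the background level, which is one way to see why a marker with a small methylation difference between fluids loses discriminatory power in mixtures.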

A previous study had shown that methylation levels of a tumour cell are decreased to 40–60% of those of a healthy cell [82]; thus, the potential effect of some common tumours on methylation levels of the loci was tested [73]. Some of the markers indeed showed variation; in cervix carcinoma samples, both vaginal fluid markers showed decreased methylation values in vaginal fluid when compared to healthy controls. This decreased methylation resulted in overlapping of methylation levels with the menstrual blood marker (cg09696411). Aberrant methylation due to disease must be considered when using this marker, as false negatives may greatly compromise the test. Additionally, methylation of the semen markers also fluctuated in blood obtained from patients with chronic lymphocytic leukaemia (CLL), chronic myelocytic leukaemia and myeloproliferative syndrome, and one CLL sample showed a slightly increased methylation signal with the menstrual blood marker (cg09696411). Zhang et al. [83] previously reported that methylation levels of loci may be controlled or altered by genetic variants; thus, an NGS experiment was performed to determine whether genetic factors would affect the methylation levels of the candidate CpGs [73]. Briefly, library preparation was completed using the NEBNext™ Ultra DNA Library Prep Kit for Illumina. Five PCRs were performed using equal amounts of pooled bisulphite-converted DNA from 10 participants. Following PCR, the amplicons were purified and used as input for the library preparation. NGS analysis was performed on the MiSeq instrument (Illumina) using 2 × 250-bp paired-end sequencing [73]. A mutation del-91 (Chr 12:57619697, hg38) that was found in only 2.6% of menstrual blood reads was shown to decrease the methylation level of the menstrual blood marker (cg09696411). Additionally, a sequence variant G>A 167 (Chr 10:102776279, hg38) found in 14.7% of reads decreased methylation of cg15227982-137d (saliva-specific). Such reductions in methylation levels due to genetic variants would greatly reduce the sensitivity of the test. Finally, a sequence variant T>C 38 (Chr 2:5366146, hg38), which occurred in 4% of reads, increased the methylation level of cg21597595-4d (saliva-specific); however, since this was a hypermethylation marker anyway, the results would not be distorted. This comprehensive research study [73] highlighted the importance of considering the effects of external factors, disease, genetic factors and different methods when researching differential methylation for body fluid identification.

Vidaki and colleagues [76] obtained methylation data for buccal cells, blood and semen from two previous studies [48, 84] and selected 11 potential body fluid-specific differentially methylated CpG sites. These sites were analysed against saliva, blood, semen, vaginal fluid, menstrual blood, buccal cells, skin and urine. The study included four markers for buccal cells, three markers for blood and four markers for semen (Table 1). Markers for buccal cells were inconclusive; however, upon testing for blood specificity, the cg13763232 site showed potential as a hypermethylation marker. The site was methylated only in blood, while partial or no methylation was observed in other fluids. In fact, three adjacent CpGs showed blood-specific hypermethylation. One caveat was that skin samples also showed a wide range of methylation (0.13–0.94); thus, the locus cannot be used confidently. More rigorous testing of the cg13763232 marker with a larger sample size is necessary. For identification of semen, only hypomethylation markers were found to be highly semen-specific; in fact, nine sites around cg04382920 and four sites around cg11768416 showed semen-specific hypomethylation.

Sensitivity and analysis of aged samples by Vidaki et al. [76]

Two markers (cg04382920 and cg11768416) underwent extensive validation using a sensitivity analysis and testing against aged samples. The markers were shown to provide stable methylation results even when 0.05 ng of starting DNA was used in a mixture with blood. Additionally, no false negatives, outliers or significant differences in methylation were detected when the markers were tested on semen samples that were stored at −20 °C for almost 16 years, as well as blood stains on fabric that was kept at ambient temperature for more than 20 years [76].

Holtkotter and colleagues [61] employed a unique approach: the group first performed a literature search to identify the most promising methylation-based markers for saliva, blood, semen and menstrual blood. A total of 13 markers were chosen from Lee et al. [69], An et al. [55], Park et al. [72] and Lee et al. [59]. Only four markers, cg09652652-2d (saliva-specific), cg06379435 (blood-specific), cg26763284-138d (semen-specific) and cg26079753-7d (menstrual blood-specific), were subsequently used in a multiplex SNaPshot (MS-SNuPE) assay, wherein each marker successfully distinguished the respective target body fluid. However, for the blood marker (cg06379435), semen also showed a partial methylation profile. It was therefore suggested to use the blood marker in combination with the semen marker (cg26763284-138d), since semen would display much higher methylation levels.

Mixture analysis and forensic simulation by Holtkotter et al. [61]

During mixture analysis, each marker was able to distinguish the target fluid. The saliva-specific marker cg09652652-2d showed similar methylation profiles in saliva and buccal cells; thus, differentiation between those fluids was not possible. The multiplex assay was also applied to mock crime scene samples, which were created by mixing various ratios of fluids and then applying them to cellulose swabs. Once again, the markers successfully discriminated each respective body fluid [61]. It is indeed advantageous that methylation of several CpG sites in each region was analysed in the bisulphite PCR reaction, as this is more reliable than relying on a single CpG site. However, the SNaPshot primers are designed to terminate one base pair upstream of the target cytosine, which means that the method facilitates analysis of only one target CpG site [85]. It is not ideal to rely solely on one CpG site. Furthermore, while these four markers were shown to be robust and reliable in a multiplex assay, the location of the markers was not given much consideration. Both the saliva- and semen-specific markers were located in exons, and these regions are said to be more prone to mutations [86]. The blood and menstrual blood-specific markers were located in 5′ and 3′ UTRs, respectively. Untranslated regions play an essential role in health and disease. These regions house upstream open reading frames and internal ribosome entry sites, are GC rich and influence the rate of translation. Mutations in these regions are fairly common, and pivotal in the onset of diseases such as Alzheimer’s disease, breast cancer, bipolar affective disorder and congenital heart disease, among others [87, 88]. Whether sequence alterations in the UTRs of the blood and menstrual blood-specific markers occur frequently and/or result in aberrant protein expression requires additional study; however, these factors must be considered prior to validating the markers.

The location of these tDMRs in the human genome, related CGIs and genes as well as the number of samples used have been summarised in Table 2.

Table 2 Summary of previously reported body fluid-specific tDMRs, chromosomal locations, sample sizes and methods of analysis

Considerations for development of DNA methylation-based markers for application in forensic science

Assessing the stability of DNA methylation markers

For a marker to be deemed reliable and confidently applied in routine forensic casework, rigorous testing is crucial. While it is true that some researchers do attempt to validate the markers using simulated forensic conditions (discussed below), it remains to be seen if any marker will fulfil such high requirements.

The chromosomal location of a potential marker is the first and foremost factor. Ideally, a body fluid-specific CpG site or tDMR should not be located in a region with a high mutation frequency, aptly termed a ‘mutation hotspot’ [89]. Mutation rates at any given region reflect the stability and sensitivity of nucleotides to mutagenic agents, and the fidelity and efficiency of the DNA replication and repair machinery [90, 91]. Studies show that mutation/substitution rates at A/T base pairs are between 25 and 85% lower than at G/C base pairs (excluding CpG sites) [92, 93]. Of particular interest, a cytosine followed by guanine is approximately 10 times more mutable than a cytosine followed by adenine or thymine, although this applies primarily in single-stranded DNA. This hypermutability is due to spontaneous deamination of methyl-cytosines into thymine [91, 94]. Thus, the rate of CpG mutation is a function of the rate of DNA melting, which in turn is affected by the local base composition, wherein G/C base pairs are stronger than A/T pairs [95, 96]. There have been numerous surveys of the human genome to determine which regions undergo mutation and thereby facilitate evolutionary divergence. These studies show that synonymous sites in exons and protein-coding genes produce considerably higher estimates of evolutionary divergence when compared to introns and pseudogenes [86, 97,98,99,100]. Additionally, with the exception of regulatory sites, intergenic DNA also shows lower evolutionary divergence than that seen in coding sequences [86, 101,102,103]. Currently, most body fluid-specific markers for forensic use are located near or within exons (Tables 1 and 2). Exons are known to harbour numerous polymorphisms [104]; in fact, Ng et al. [105] examined a single human exome (exons in the genome) and found approximately 12,500 coding variants that can affect gene expression and protein function.

Considering the above, it would be beneficial for a DNA methylation-based body fluid-specific marker to be located in a region containing more A/T pairs than G/C pairs, in an intron, pseudogene or intergenic region of DNA, but preferably not in an exon. With cytosine itself being more mutable than any other base [86, 91], it is essential to investigate whether the regions within and neighbouring the tDMR/marker are prone to mutation.
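
As a first screening step, the base composition of a candidate region can be summarised directly from its sequence. The sketch below is an illustration only: the function name `composition_summary` and the example sequence are hypothetical, and the observed/expected CpG ratio is the commonly used (CpG count × length) / (C count × G count) definition. Composition is a proxy and does not replace consulting variant databases for the region.

```python
def composition_summary(sequence):
    """Return GC fraction, CpG count and CpG observed/expected ratio."""
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("empty sequence")
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_fraction = (c_count + g_count) / length
    # Observed/expected CpG ratio; defined as 0 when C or G is absent.
    if c_count and g_count:
        obs_exp = (cpg_count * length) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return {
        "gc_fraction": gc_fraction,
        "cpg_count": cpg_count,
        "cpg_obs_exp": obs_exp,
    }


# Hypothetical 12-bp sequence, used purely to show the calculation.
summary = composition_summary("ATCGATTACGTA")
print(summary)
```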

Preferably, a marker should facilitate absolute, unambiguous identification of the target body fluid based on the differential methylation levels between the fluids. Since body fluids are generally recovered in mixtures, it is ideal that a marker shows a complete on/off methylation status. This means that the marker should ideally exhibit complete methylation (>90%) in the target fluid and no methylation (<10%) in non-target fluids, or vice versa [62, 76]. This should hold true even if the DNA recovered from body fluids is degraded or contaminated, or is present in varying concentrations and mixtures. The ability of the marker to identify the body fluid should not falter when different methods are used to analyse the methylation levels, or in the presence of factors known to affect DNA methylation, such as genetic variation [106], diet, lifestyle, environment [24], age [21, 107], disease [16, 108, 109], ethnicity [6, 107, 110, 111] and smoking [25, 112], to name a few. However, to the best of our knowledge, no DNA methylation-based marker for forensic use has undergone such extensive research considering all the above stipulations.
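
The on/off criterion described above can be written as a simple check. This is a minimal sketch using the >90% and <10% thresholds from the text; the function name `is_on_off_marker` and the example methylation values are hypothetical.

```python
def is_on_off_marker(target_level, non_target_levels, high=0.90, low=0.10):
    """Check the ideal on/off pattern for one candidate marker.

    target_level:      methylation (0-1) in the target fluid
    non_target_levels: methylation (0-1) in each non-target fluid
    Returns True if the target is >high and every non-target is <low,
    or the reverse (target <low and every non-target >high).
    """
    levels = list(non_target_levels)
    if not levels:
        raise ValueError("at least one non-target fluid is required")
    hyper = target_level > high and all(v < low for v in levels)
    hypo = target_level < low and all(v > high for v in levels)
    return hyper or hypo


# Hypothetical values for illustration.
print(is_on_off_marker(0.95, [0.02, 0.05]))  # hypermethylated in target
print(is_on_off_marker(0.04, [0.93, 0.97]))  # hypomethylated in target
print(is_on_off_marker(0.95, [0.02, 0.40]))  # one non-target is intermediate
```

A marker that fails this check may still be informative, but intermediate values in non-target fluids make mixtures harder to interpret.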

Reproducibility of methylation patterns using various DNA methylation-based assays

Exposure of samples to the environment for a certain number of days to determine effects of time and age, mixture analysis, and sensitivity tests to determine the minimum concentration of DNA required are standard aspects of the above studies [55, 56, 59, 72, 75,76,77]. However, another essential aspect is to test whether methylation results are reproducible using different DNA methylation-based assays. Reed et al. [113] compared methylation levels generated by bisulphite PCR and sequencing (BSP) with methylation levels obtained from pyrosequencing. In this report, higher variation of methylation levels was obtained from BSP, possibly due to the bacterial cloning step, which is not required for pyrosequencing. Each method presents its own advantages and limitations, and it is to be expected that different methods of measuring DNA methylation will not give exactly the same results [114]. As an example, the vaginal fluid-specific marker cg09765089 was examined by Lee et al. [59] on 11 samples using bisulphite sequencing and multiplex MS-SNuPE. This study found approximately 35% methylation in vaginal fluid and 37% methylation in menstrual blood. Lin et al. [62] studied the same marker on 10 samples using methylation-specific PCR combined with multiplex MS-SNuPE and found approximately 39.6% methylation in vaginal fluids. However, the latter study did not test menstrual blood samples.
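
One simple way to express agreement between assays or studies is the spread of their methylation estimates for the same marker and fluid. The sketch below uses the approximate vaginal fluid values quoted above for cg09765089; the function name `assay_spread` and the 5 percentage point tolerance are hypothetical choices for illustration, not an established acceptance criterion.

```python
def assay_spread(estimates):
    """Largest difference (percentage points) between methylation estimates.

    estimates: dict mapping an assay or study label to methylation in percent.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two estimates to compare")
    values = list(estimates.values())
    return max(values) - min(values)


# Approximate vaginal fluid methylation of cg09765089 as quoted in the text.
reported = {"Lee et al. [59]": 35.0, "Lin et al. [62]": 39.6}
spread = assay_spread(reported)

TOLERANCE = 5.0  # hypothetical acceptance limit, in percentage points
print(f"spread = {spread:.1f} points, within tolerance: {spread <= TOLERANCE}")
```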

Silva et al. [63] repeated their analysis of body fluid-specific markers in a second laboratory to assess whether similar results could be produced between laboratories. Their results were indeed reproduced in the other facility for the cg06379435 (blood-specific), BCAS4 (saliva-specific) and ZC3H12D (semen-specific) markers. Depending on the extent of the variation in methylation levels between methods, the use of equal sample types and numbers, data normalisation, base alignments, corrections and quality controls might alleviate these discrepancies to a certain degree [57, 115].

Additionally, while some markers have been repeatedly studied, they have been investigated by the same groups of researchers using the same methods. These include cg09652652-2d (saliva-specific), cg06379435 (blood-specific) and cg17610929 (semen-specific), which were studied by Lee et al. [59] and Lee et al. [60] using targeted bisulphite sequencing and methylation SNaPshot, as well as the ZC3H12D and FGF7 (semen-specific), C20orf117 (blood-specific) and BCAS4 (saliva-specific) markers, which have been repeatedly studied by Madi et al. [67], Silva et al. [63] and Antunes et al. [68] by bisulphite pyrosequencing. It is imperative that more than one method be used in a single study to show that, regardless of the method of analysis, the methylation results of a particular marker in a specific body fluid are reproducible and that the marker therefore does not waver in its accuracy in identifying the target fluid. Of the above studies, only Forat et al. [73] reproduced the results of their MS-SNuPE assay using bisulphite sequencing, pyrosequencing and the MiSeq platform by Illumina.

Selecting regions of DNA that house multiple CpG sites

Numerous studies using the Illumina HumanMethylation arrays find potential CpG sites at which methylation may be specific to a body fluid (Table 1). Ideally, however, a marker would comprise more than one CpG site exhibiting body fluid-specific methylation patterns. In fact, upon experimentation, Lee et al. [59], Lee et al. [60] and Forat et al. [73] found that the body fluid-specific CpG sites were not those identified by the Illumina array, but rather sites a few bases downstream (for example, cg09652652-2d, which is saliva-specific, and cg09765089-231d and cg26079753-7d, which are vaginal fluid-specific). Instead of only the single CpG sites, perhaps the entire region (a few hundred base pairs) may facilitate body fluid identification. Additionally, mixture analysis by Silva et al. [63] and Watanabe et al. [75] was not successful using single CpG sites. Placing confidence in a region/locus as opposed to a single CpG site would surely be more desirable in forensic casework. Only a few studies have considered this; Park et al. [72] assessed between two and four CpGs around the eight markers that were analysed and found them to also be hypermethylated in target fluids. Vidaki et al. [76] also identified four and nine neighbouring CpGs in two semen-specific hypomethylated markers (cg11768416 and cg04382920, respectively) and showed that the regions around the sites were also specific. However, despite examining regions containing five CpG sites, Antunes et al. [81] also struggled with deconvoluting mixtures.
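
A region-level call can be sketched by requiring that all neighbouring CpGs agree in direction, rather than trusting a single site. The example below is an illustration under assumptions: the function name `region_call`, the CpG labels and the methylation values are hypothetical, and the 90%/10% thresholds are borrowed from the on/off criterion discussed earlier in this review.

```python
from statistics import mean


def region_call(cpg_levels, high=0.90, low=0.10):
    """Summarise methylation across neighbouring CpGs in one region.

    cpg_levels: dict mapping a CpG label to its methylation level (0-1).
    Returns (mean_level, call), where call is 'hypermethylated' if every
    CpG is >high, 'hypomethylated' if every CpG is <low, else 'inconsistent'.
    """
    if not cpg_levels:
        raise ValueError("no CpG sites supplied")
    levels = list(cpg_levels.values())
    if all(v > high for v in levels):
        call = "hypermethylated"
    elif all(v < low for v in levels):
        call = "hypomethylated"
    else:
        call = "inconsistent"
    return mean(levels), call


# Hypothetical per-CpG values for a region in its target fluid.
region = {"cpg_1": 0.03, "cpg_2": 0.05, "cpg_3": 0.04}
average, call = region_call(region)
print(round(average, 3), call)
```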

Assessing environmental influences and disease-related changes of methylation patterns in tDMRs

Forat and colleagues [73] examined environmental effects such as dried, humid and wet conditions on DNA methylation levels of markers; these researchers showed that humidity did indeed influence methylation levels of their menstrual blood marker (cg09696411), vaginal fluid marker (cg14991487) and saliva marker (cg21597595). The effects of inhibitors such as humic acid, which is found freely in soil, and the influence of non-human DNA were tested in only two studies by the same group [63, 81]. Disease is known to cause aberrant methylation patterns in humans [15, 108, 109]; yet, only Forat and colleagues [73] have compared methylation levels between healthy individuals and individuals with cancer, wherein cervix carcinoma was shown to affect methylation levels in vaginal fluid. No other study in forensics has considered that individuals with disease may exhibit different methylation levels when compared to healthy individuals, which would drastically reduce the reliability of body fluid-specific markers.

Genetic variation

Genetic variation such as single-nucleotide polymorphisms (SNPs) is known to influence methylation levels to some degree [83, 116, 117], but this impact is seldom investigated in forensic-based research. Watanabe et al. [75] failed to distinguish between body fluids in mixtures, and therefore resorted to searching for nearby SNPs. However, the group did not test the effects of polymorphisms on methylation, but merely used them as a complementary method. One study, Forat et al. [73], indeed found that a mutation (del-91, Chr 12:57619697, hg38) reduced methylation levels of a menstrual blood marker, and a sequence variant (T>C 38, Chr 2:5366146, hg38) increased methylation levels of a saliva-specific marker. Such findings show that sequence variants do affect methylation and are certainly a factor to be reckoned with. Even though Forat et al. [73] stated that no reported SNP has been confirmed to alter methylation of CpG sites found by the Illumina array, if a potential marker is located in a region of a gene that is highly variable or undergoes mutation frequently, the marker cannot be relied on. However, it must be noted that some SNPs can also influence the expression of remote genes located at a distance, instead of the expression of the genes that actually harbour them [118, 119]. Furthermore, SNPs affect heritability of methylation sites, which is also a factor that affects the stability of DNA methylation-based markers, as elaborated next.

Heritability

Heritability of DNA methylation is the proportion of variance explained by additive genetic factors. In humans, low ‘epigenetic heritability’ was reported by Gervin et al. [120]. These researchers obtained blood from 49 monozygotic (MZ) twin pairs and 40 dizygotic (DZ) twin pairs to investigate inter-individual variation and heritable patterns of DNA methylation. CpG islands (CGIs) and 5′ regions exhibited low methylation, while conserved noncoding regions had intermediate methylation and randomly selected CpG sites within the major histocompatibility complex (but outside the CGIs, 5′ and noncoding regions) showed high methylation levels. Lower within-pair differences in DNA methylation were observed for MZ twins than for DZ twins in the conserved noncoding regions and randomly selected CpG sites, compared to CGIs and 5′ regions. The authors stated that low heritability levels of between 2 and 16% were observed across the four distinct regions [120]. Another twin-based study by Bell and colleagues [121] examined blood samples from 172 female twins (43 DZ pairs, 33 MZ pairs and 20 singletons) at 26,690 promoter CpG sites and found a mean genome-wide heritability of 18%.
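
To illustrate how twin data translate into a heritability estimate, the sketch below applies Falconer's formula, h² = 2(r_MZ − r_DZ), a common textbook estimator based on within-pair correlations. It is an assumption that this estimator is adequate here; the cited studies [120, 121] may have used different models. The function name `falconer_heritability` and the correlation values are hypothetical.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's estimate h^2 = 2 * (r_MZ - r_DZ), clipped to [0, 1].

    r_mz, r_dz: within-pair correlations of methylation at one CpG site
                for monozygotic and dizygotic twin pairs, respectively.
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie between -1 and 1")
    h2 = 2.0 * (r_mz - r_dz)
    # Sampling noise can push the raw estimate outside the valid range.
    return min(1.0, max(0.0, h2))


# Hypothetical correlations, not taken from any cited study.
estimate = falconer_heritability(r_mz=0.5, r_dz=0.4)
print(round(estimate, 2))
```

With small twin cohorts, the correlations themselves are uncertain, so point estimates of this kind should be read alongside confidence intervals.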

For forensic use of a tDMR-based marker, the heritability of methylation at that marker must be thoroughly investigated and confirmed. This was certainly not the case in two studies investigating a CpG site in the promoter of AXL receptor tyrosine kinase by Boks et al. [122] and Breton et al. [123]. Boks and colleagues [122] examined whole blood obtained from 23 MZ twin pairs and 23 DZ twin pairs to investigate heritability. Twenty-three percent of sites analysed showed significant heritability, with the most significant heritability (0.94) observed at sites in AXL, amongst other genes. However, contradictory results were obtained when the same site was assessed by Breton and colleagues [123]. This study examined a different sample type, buccal cells, which were obtained from 16 MZ twins and 20 DZ twins. Such large discrepancies in heritability were attributed to the different body fluids assessed, underlying genetic distributions between individuals, and polymorphic imprinting, wherein AXL may be mono-allelically expressed in some individuals and biallelically expressed in others [122, 123]. Even though these studies were not based on forensic applications of DNA methylation, they highlight important considerations for avoiding discrepancies when investigating heritability of particular CpG sites.

Several studies have shown that cis-acting (closely located) genetic variation such as SNPs can account for a large proportion of variation in DNA methylation [83, 116, 124]. The true extent of this influence is not known, but is dependent on the individual, tissue, location and functional genomic context of the CpG site. Quon and colleagues [124] assessed heritability of methylation levels of 21,000 CpG sites in four regions of the brain (the cerebellum, frontal cortex, caudal pons and temporal cortex) from 150 unrelated individuals. This research identified 636, 654, 600 and 812 heritable DNA methylation sites/loci in the cerebellum, frontal cortex, caudal pons and temporal cortex, respectively. These heritable sites were found to be enriched in open chromatin regions and known binding sites of transcription factors, which implies functional roles for some of the sites. Heritable methylation sites were found to have a higher number of SNPs within a 50-kb window than non-heritable methylation loci, suggesting that the higher the number of SNPs in the region, the more likely it was to find heritable methylation loci. The estimated heritability of all methylation loci that were thought to be heritable across all brain regions was nearly 30%, whereas heritability across loci including those which were not thought to be heritable was less than 3%. Across all four regions, 181 loci were heritable, whereas 207 loci were heritable across at least three of the brain regions [125].

Rowlatt et al. [126] assessed over 196,000 CpG sites in healthy colorectal tissue samples obtained from Colombian participants to examine phenotypic profiles, genetic effects and regional genomic heritability. This research found that CpG sites located in regions of low CpG content exhibited greater variation and higher methylation, and were more likely to be heritable, than CpG sites located in CpG-rich regions. CpG sites located in intergenic regions displayed higher methylation levels and were more likely to be heritable than those in transcription start sites or intragenic regions. The group also found that genetic variants in genomic risk regions for colorectal cancer can affect methylation levels in healthy tissues. For example, methylation levels of cg15193198 and cg24112000 were affected by SNP rs4925386. Both of these methylation sites were found to be heritable; however, heritability decreased considerably due to the presence of the SNP [126].

Thus, upon investigating a DNA methylation-based marker for body fluid identification, thorough investigation must be performed to ensure that methylation of the marker is heritable across generations. Alternatively, it might be wise to select a marker based in a region which is epigenetically stable, even throughout environmental and nutritional fluctuations.

Finally, other vital yet overlooked aspects regarding DNA methylation-based markers are the impacts of diet, lifestyle and ethnicity on the stability of the markers. No study of DNA methylation-based markers for forensic body fluid identification has considered the impact of varying diets, socio-economic conditions, cultures and geographic locations, smoking, drugs, alcohol or ethnic backgrounds on the methylation status of the markers. These factors have repeatedly been shown to alter methylation levels [6, 22, 25, 107, 112].

Future outlook

An all-encompassing study (preferably a collaboration between groups in various countries) would select a panel of potential body fluid-specific markers and perform methylation analysis using several different techniques to demonstrate the reproducibility of the markers. The tests should be performed on a large number of samples of different fluids, obtained from individuals of diverse age groups, ethnicities and geographic locations, including healthy individuals as well as those afflicted with various diseases, medicated and non-medicated individuals, smokers and non-smokers, alcohol drinkers and non-drinkers, and drug users and non-users. These samples must be corrected for cell type heterogeneity if need be. SNP mapping and determining whether the methylation levels of the particular sites are heritable are also crucial steps. Heritability of DNA methylation patterns at certain regions requires longitudinal studies, since this is a measure of methylation over several generations. The standard mixture, sensitivity and specificity tests must certainly not be neglected. Finally, once the markers have been confirmed to identify body fluids, they should be used in age prediction models to determine their efficacy in age estimation, as is currently being widely explored [127,128,129,130].

Such large-scale study designs and concomitant rigorous interrogation of the markers would be essential to standardise markers, methods and techniques, as is presently done for SNP and STR markers. Once the currently identified DNA methylation-based markers have undergone such investigation, a dedicated forensic database for DNA methylation-based markers could conceivably be developed. Referring to this database would in turn enable us to place as much confidence in differential DNA methylation-based body fluid identification as is currently placed in databases such as the Y-STR database and the Combined DNA Index System (CODIS). Following such intensive research, we see much potential for routine application of differential DNA methylation-based body fluid identification in forensic casework.