1 Introduction

Over the past decades, the number of genetic studies using saliva samples has been steadily increasing (for a comprehensive review see Sun & Reichenberger, 2014). Since all normal somatic cells in our body contain the same genetic sequence, analyzing genetic markers from any kind of cell other than gametes should provide the same information. There are some exceptions to this general rule, for example in genetic mosaicism, in the case of foreign human cells present in some individuals (such as donor cells after transplantation or in twins exposed to fused circulation), and in precancerous cells where the original genomic DNA (gDNA) sequence is already altered. Epigeneticists have also started to use salivary specimens, based on the assumption that saliva can be as good an alternative source of DNA as blood in epigenetic association studies of non-blood-based diseases. However, there are still a few technical issues (due to tissue specificity and reversibility of epigenetic signals) to be considered in this area of research (see details in the last sections).

2 Using Saliva as a Minimally Invasive Genetic Sample

In hospital-based genetic laboratories, blood samples are the preferred DNA source for several reasons: (1) the DNA yield is reliable, with high quantity and quality, (2) more experience has been accumulated, (3) blood sampling can be easily combined with other (routine clinical) blood tests, and (4) automatic sample processing is often available at hospital centers (reducing workload). However, in field studies where sample storage and processing cannot be easily achieved, or at participants’ homes, where medical assistance for blood drawing is difficult to manage, saliva can serve as an ideal surrogate tissue for genetic analyses. Noninvasive sampling methods of collecting cells from the mouth are also preferred in research involving children, the elderly, and healthy nonclinical participants, especially in large-scale studies where biological samples can be mailed to the laboratory at ambient temperature. Table 6.1 summarizes the fields where saliva sampling is preferred. Since 1 ml of blood or saliva contains a similar number of cells (~half a million), comparable amounts of DNA can be extracted from both (Sun & Reichenberger, 2014). However, blood samples have a greater volume (typically ~8 ml) than saliva samples (maximum 4 ml, usually 1–2 ml, see Chap. 3). Therefore, the total DNA yield from blood samples is usually higher.

Table 6.1 Types of genetic analyses when saliva samples are preferred

Saliva is an easily accessible source of cells, containing both epithelial cells exfoliating from the oral mucosa and leukocytes migrating from blood vessels, as has been shown by microscopic observations (see the schematic representation in the upper part of Fig. 6.1). The approximate ratio of the different cells was demonstrated by genetic analyses of oral samples from patients after allogeneic blood stem cell transplantation using informative DNA markers to differentiate between the host’s epithelial cells and donor’s blood cells (Thiede, Prange-Krex, Freiberg-Richter, Bornhauser, & Ehninger, 2000). Nevertheless, a major drawback of oral biospecimens is that bacterial, fungal, or viral contamination cannot be avoided, which can cause problems if bacterial deoxyribonuclease (DNase) enzymes degrade the human DNA. However, this issue can be largely resolved with a proper study design. For example, rinsing the mouth and avoiding eating for at least half an hour prior to saliva collection reduces contamination. If the sample has to remain at room temperature for a longer time, a preservative can be added. There are several commercially available kits providing convenient protocols for saliva collection, in which the stabilizing solution is released into the sample after closing the device. This solution is a lysis buffer containing the ionic detergent sodium dodecyl sulfate at a pH and salt concentration that are optimal for DNA storage (i.e., removing divalent cations, which are required for DNase activity). Since all kinds of cells, either human or microbial, are disrupted by the lysis buffer, it also renders the sample nonhazardous, which might be important for complying with biosafety regulations.

Fig. 6.1

Cell types and their possible epigenetic modifications in saliva. Top: Structure of the nonkeratinized buccal epithelium. Cells are shown in purple as they appear in histological hematoxylin & eosin staining. The underlying collagen fibers are indicated by the pink stripes, and the capillary branches are represented in purple with yellow symbolizing plasma. A magnified section of a blood vessel demonstrates the different cell types in blood (besides the most abundant red blood cells and tiny thrombocytes, nucleus-containing leukocytes are shown). To achieve their immunological functions, leukocytes can exit blood vessels (red arrows) and traverse the epithelial cell layers. Hence the cell composition of saliva is quite heterogeneous, which is shown on the right side. Based on microscopic observations and DNA methylation data of salivary samples, the two major cell types are buccal epithelial cells (large, pink cells) and granulocytes (smaller, purple cells with segmented nuclei). The different types of bacteria are represented by light blue dots, dark blue rods, and orange conglomerates, whereas fungi are illustrated by green filaments. These microorganisms live in the mucinous layer (shown in gray) and are often attached to the exfoliating epithelial cells. Therefore, genetic samples obtained from saliva contain both human and foreign DNA. Bottom: Epithelial cells (first panel) and the three main white blood cell types (granulocyte, lymphocyte, and monocyte), which can be present in saliva. The nuclei (shown in dark purple) are the source of gDNA. Epigenetic variations are shown at the bottom: The two parallel green ribbons represent the double-stranded DNA with the sense sequence in 5′–3′ direction. Bases are: A, adenine; C, cytosine; G, guanine; and T, thymine. Hydrogen bonds are denoted as dashed lines. Although the genetic sequence is the same in every somatic cell type of a healthy individual, epigenetic marks such as DNA methylation can show cell type-specific patterns, as illustrated by the yellow circles on the C bases.

3 Quantity and Quality of Salivary DNA Samples

If saliva is collected in a laboratory setting, the passive drooling method is advised (see Chap. 3), and bacterial growth can be avoided without adding any chemical by storing the sample at −20 °C or below. In this way, the salivary sample can be used for multiple assays (e.g., using the supernatant for hormone or other saliva-biomarker analyses, while using the cell pellet fraction for genetic analyses). Since DNA is a stable nucleic acid, repeated freeze-thaw cycles have a negligible effect on the DNA yield (Nemoda et al., 2011). Although several freeze-thaw cycles can fragment the gDNA, this fragmentation rarely results in a non-amplifiable sample (digestion by bacterial DNase enzymes poses a bigger risk if the sample is not stored properly). In addition, storage for extended time periods reduces the DNA yield but does not substantially affect the usability of gDNA in genotyping (Durdiaková, Kamodyová, Ostatníková, Vlková, & Celec, 2012). It should be noted that routine DNA concentration measurements (using UV absorbance at 260 nm) are based only on the chemical properties of the nucleic acids and therefore do not reveal fragmentation. Thus, the integrity of high-molecular weight gDNA should be checked by other methods (e.g., gel electrophoresis, see Rethmeyer, Tan, Manzardo, Schroeder, & Butler, 2013) before the costly and time-consuming genetic analyses.

High-quantity and high-quality DNA can be obtained not only from whole saliva samples but also from the collection devices, which usually absorb cells present in saliva. Therefore, it is advisable to keep the collection device after the centrifugation of saliva if genetic analyses are also planned in addition to hormone measurements. In this way, the biological sample collection procedure can be simplified, which can be a crucial point in studies involving children. In terms of collection medium, if saliva is obtained via a cotton or hydrocellulose absorbent device, most of the nucleic acid content can be recovered from the device after incubating it in cell lysis buffer (similar to DNA isolation starting from a cell pellet). However, if a synthetic swab is used, equal amounts of DNA can be isolated from the saliva filtrate and the collection device (Nemoda et al., 2011). Thus, preliminary analyses are advised to check the approximate DNA yield when a new protocol is planned with a collection device, especially when collecting saliva from infants, since a much lower saliva volume can be obtained from them.

It is important to emphasize that the total amount of DNA obtained from saliva samples is highly variable. In a review paper by Sun and Reichenberger (2014), the cited research groups report on average 20–40 μg DNA yield per ml saliva (ranging from 1 to 160 μg/ml). Larger DNA yield was observed when rubbing the tongue against the inside of the mouth before saliva collection (Nunes et al., 2012). Therefore, even though 0.1 ml of saliva is sufficient for genetic analyses (yielding approximately 1–2 μg DNA), researchers are advised to collect at least 1 ml saliva from children and adults, so that a larger number of analyses can be conducted (one genotyping assay usually requires 10–20 ng of DNA). A higher quality and amount of DNA (hence a larger volume of saliva) are required for genome-wide association studies (GWAS), where thousands of gene variants are measured in parallel on microarrays (for a detailed review on GWAS see Stranger, Stahl, & Raj, 2011). Although the current methodologies can work with as little as 0.5 μg gDNA in high-throughput, large-scale measurements, using DNA samples with a concentration below 30–50 ng/μl is not recommended. Thus, the optimal protocol for saliva collection has to be selected according to the aims of each study, so that enough good-quality gDNA can be obtained for the genetic analyses even at the lower end of the DNA yield range.
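The yield figures quoted above can be turned into a rough planning calculation. The following is a minimal sketch, not a validated protocol: the function name `assays_supported` and the example inputs are hypothetical, and the numbers (a yield range of 1 to 160 μg/ml and 20 ng of DNA per genotyping assay) are simply taken from the text.

```python
# Illustrative arithmetic only; the helper name and the inputs are assumptions.

def assays_supported(saliva_ml, yield_ug_per_ml, ng_per_assay):
    """Estimate how many genotyping assays the extracted DNA could support."""
    if saliva_ml <= 0 or yield_ug_per_ml <= 0 or ng_per_assay <= 0:
        raise ValueError("all inputs must be positive")
    total_ng = saliva_ml * yield_ug_per_ml * 1000  # 1 ug = 1000 ng
    return int(total_ng // ng_per_assay)

# Yield range quoted in the text: 1-160 ug/ml, with reported means of 20-40 ug/ml.
scenarios = [("low end", 1), ("typical", 20), ("high end", 160)]
for label, yield_ug_per_ml in scenarios:
    n = assays_supported(saliva_ml=1, yield_ug_per_ml=yield_ug_per_ml, ng_per_assay=20)
    print(f"{label:>8}: about {n} assays from 1 ml of saliva")
```

Planning for the "low end" scenario rather than the mean reflects the advice to consider the lower end of the yield range when choosing a collection protocol.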

The quality of salivary DNA can also vary substantially depending on the DNA isolation technique (e.g., classical phenol-chloroform extraction versus silica membrane-based purification kit, Durdiaková et al., 2012). Interestingly, the ratio of human and microbial DNA can also be affected by the extraction method (Vesty, Biswas, Taylor, Gear, & Douglas, 2017). The pros and cons of the most frequently used DNA isolation methods are presented in Table 6.2. It should be noted that the preparatory step of cell lysis (to release DNA molecules in the sample) can also affect the quality of the DNA specimen. For example, using preloaded lysis buffer at saliva collection can be disadvantageous if the expected volume of saliva is not achieved (e.g., only 0.5 ml saliva is provided instead of the recommended 2 ml with an Oragene self-collection kit), because it can affect the efficiency of the molecular analyses (Pulford, Mosteller, Briley, Johansson, & Nelsen, 2013). Residual chemicals, such as sodium dodecyl sulfate, phenol, or ethanol, can inhibit or degrade the enzyme amplifying the DNA template in the polymerase chain reaction (PCR) (Rossen et al., 1992). While this problem can be easily detected in samples where the reaction is completely inhibited (i.e., not yielding sequence-specific amplicons), more subtle differences in amplification efficiencies can bias quantitative measurements, such as DNA methylation analyses (Soriano-Tárraga et al., 2013).

Table 6.2 Techniques for genomic DNA isolation from saliva

The remaining organic compounds can also affect DNA concentration measurements using UV absorbance on a spectrophotometer, potentially resulting in overestimated (or confounded) DNA quantity and quality. Therefore, manufacturers recommend using concentration measurements based on colorimetric reactions that can estimate the double-stranded gDNA in salivary samples. However, this technique still does not provide precise information about the amount of useful human gDNA in a sample, since saliva always contains a portion of foreign DNA due to its microbial content (even in healthy individuals). Importantly, the human/microbial DNA ratio can be estimated by real-time PCR using human-specific primers (for more details see methods by Nishita et al., 2009). Remarkably, varying portions of human DNA were reported by different research groups: mean percentages of amplifiable human DNA varied between 40 and 80% in saliva samples, ranging from about 10 to 100% in most of the studies (see references by Sun & Reichenberger, 2014). Still, the human DNA yield can be kept at the higher end of this range by thoroughly rinsing the mouth with water 5–30 min before saliva collection. Notably, Hu et al. (2012) showed that salivary samples with at least 31% human-specific amplifiable DNA performed as well as blood-derived DNA samples.
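As a simple illustration of how the human-specific portion might be used in sample quality control, the sketch below divides a human-specific concentration by the total double-stranded DNA concentration and compares the result with the 31% level reported by Hu et al. (2012). This is an assumption-laden example: the function names, the idea of using a fixed cut-off, and the input values are hypothetical and do not reproduce the method of any cited study.

```python
# Hypothetical QC helper; names, inputs, and the use of a fixed cut-off are assumptions.

def human_dna_fraction(human_ng_per_ul, total_ng_per_ul):
    """Fraction of total dsDNA that is amplifiable human DNA (capped at 1.0)."""
    if total_ng_per_ul <= 0:
        raise ValueError("total concentration must be positive")
    if human_ng_per_ul < 0:
        raise ValueError("human-specific concentration cannot be negative")
    return min(human_ng_per_ul / total_ng_per_ul, 1.0)

def passes_human_fraction(fraction, threshold=0.31):
    """True if the human fraction reaches the level reported by Hu et al. (2012)."""
    return fraction >= threshold

# Example values (invented): human-specific qPCR estimate vs. total dsDNA estimate.
fraction = human_dna_fraction(human_ng_per_ul=20.0, total_ng_per_ul=50.0)
print(f"human fraction: {fraction:.0%}, passes: {passes_human_fraction(fraction)}")
```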

In summary, oral cells can be used for various genetic analyses (see Table 6.1). Saliva can be easily collected in a broad age range, increasing the consent rate for providing a biological sample, especially among healthy participants, which is a crucial point in large-scale epidemiological studies (e.g., in follow-up analyses of specific birth cohorts). The processing of this noninvasively obtained biological specimen is similar to that of blood-based samples, yielding comparably good-quality DNA samples for a wide range of genetic analyses. Until recently, the majority of genetic and epigenetic studies used blood as a source of gDNA, hence saliva is often referred to as a surrogate tissue in these studies. Although salivary DNA is a mix of human, bacterial, and fungal DNA (Fig. 6.1), it can be readily used for genetic analyses owing to the species-specific PCR amplification step in the genotyping procedures (for a visualized experiment see Lorenz, 2012). In the following sections, applications of salivary DNA samples in human genetic studies are discussed. These studies assess the sequence of human DNA which is present in every normal somatic cell (hence the source of cells does not matter in these genetic analyses). Measurements of malignant cells and DNA adducts used in oral cancer diagnostics are presented in Chap. 19, whereas Chap. 8 describes studies assessing salivary cell-free DNA (called liquid biopsy in cancer diagnostics, see review by Siravegna, Marsoni, Siena, & Bardelli, 2017). For the usage of microbial DNA in salivary samples, see Chaps. 7 and 13.

4 Applications of Saliva in Genetic Analyses

There are two main types of genetic analyses where saliva samples are used: genetic tests on the individual level and genetic association studies that compare groups (see Table 6.1). Genetic tests aim to detect inherited risk factors for specific diseases (e.g., sickle cell disease, cystic fibrosis), aiding diagnosis, whereas the current association studies try to reveal genetic susceptibility for developing common diseases (e.g., diabetes mellitus, hypertension, and Alzheimer’s disease) by linking certain genetic variants to disease state or associated medical, physical, and psychological characteristics (e.g., blood sugar level, blood pressure, memory functions, respectively). In these studies, common gene variants (the so-called polymorphisms, with allele frequencies higher than 5%) are most often analyzed (for more information on human genetic topics, check the NIH website: https://ghr.nlm.nih.gov/primer). Since individual genetic factors usually explain only a small portion of the heritability in complex (multifactorial) diseases, large numbers of study participants are needed to detect their modest effect (for an educational review, see Craig, 2008). GWAS in particular require exceptionally large sample sizes, as they analyze thousands of polymorphisms in order to identify new genes which could be linked to the phenotypes of interest, without a priori hypotheses. In these large-scale epidemiological studies salivary samples have become popular, since the consent rate for providing saliva is higher than for blood (Hansen, Simonsen, Nielsen, & Hundrup, 2007; Randell et al., 2016).

Genetic studies targeting children and the elderly, or with nonpersonal recruitment procedures (i.e., via mail or Internet), particularly benefit from the use of saliva. However, there are a few technical issues to consider when planning to recruit participants providing this noninvasive biospecimen for a genetic study. For example, higher consent and return rates were reported from patients with pediatric Crohn’s disease compared to controls in a pilot study by Kappelman et al. (2018): 75% of the contacted adolescent patients gave consent to their participation in the genetic study and returned a saliva sample by mail, while only 44% of the sex- and age-matched controls gave their consent and a saliva sample (expecting a gift card after successful study enrollment). Another study investigated the effect of a monetary incentive on donating a biospecimen for a genetic study: 43% of adult patients with inflammatory bowel disease participating in an internet-based survey gave salivary samples when $20 was offered, and only 26% mailed back the saliva collection kit when no compensation was offered (Randell et al., 2016). The age of the targeted population also matters, because older adults are more willing to donate saliva and send the home-collected kit by mail. In a UK study, an 84% consent rate was reported among older individuals with a chronic disease compared to 59% of the contacted families with a sick child (Bhutta et al., 2013). Importantly, the prospective or retrospective nature of the recruitment procedure (i.e., calling families before or after the doctor’s visit) can also impact the consent rate. Where parental consent is necessary for the genetic study, contacting families before the doctor’s visit in order to provide detailed information about the aims of the study is advised, and collecting the saliva in person at the clinic would result in a higher rate of sample donation (Bhutta et al., 2013).
In sum, adult patients can be easily recruited for a genetic study via mail, especially if a telephone call is made by a specialist physician providing detailed information before the sample collection at home. In the recruitment procedure of healthy adults, a follow-up telephone call is also advised after sending the information via mail.

Nowadays, biobanks all over the world store various kinds of biological samples from patients with specific diseases, and often DNA samples (derived either from blood or saliva) of participants from the general population (see https://biobanking.org/). Besides the classical clinical case-control studies, geneticists investigate population-based cohorts at an increasing rate, since well-characterized subjects with data on thousands of genetic markers, as in the UK Biobank, are a valuable research resource for association studies (see publications at http://www.ukbiobank.ac.uk/genetic-publications/). Large registries of patients and healthy individuals are being built up in almost every country, where, for the sake of non-biased inclusion, saliva samples are also accepted for genetic analyses (see the All of Us Program at https://allofus.nih.gov/).

5 Comparison of Salivary and Blood Samples in Genetic Analyses

Following the spread of easily accessible and affordable genotyping methods, many research groups have been able to try out saliva collection methods and compare the resulting DNA samples to the “gold standard” blood DNA. The simplest type of genetic variation is the Single Nucleotide Polymorphism (SNP) with only two types of alleles. It is also the most common type of genetic polymorphism (1000 Genomes Project Consortium et al., 2015). For more details on the biological background of genetic variations see the NIH Biological Sciences Curriculum Study (2007). Since usually small fragments (100–200 base pairs) are amplified from the human gDNA during SNP genotyping (see Table 6.3), even degraded DNA samples can give reliable results, which is an important issue in forensic applications (reviewed by Sobrino, Brion, & Carracedo, 2005). As most of the genotyping methods include an amplification step with a sequence-specific primer pair, mixed-origin, low-concentration salivary DNA samples with a tiny amount (picograms) of human DNA can be used. Of course, the DNA input requirements are higher for high-throughput, multiplex PCR methods, where unbiased amplification should be achieved for multiple primer pairs.

Table 6.3 Types of genetic variants in the human genome and their analytic methods

With recent genome-wide analyses, precise estimates were gained for the accuracy of genetic analyses performed with salivary DNA samples. Approximately 99% (or higher) concordance rates have been reported with matched saliva and blood-derived DNA samples on high-density SNP-microarrays (Abraham et al., 2012; Bahlo et al., 2010; Gudiseva et al., 2016; Hu et al., 2012). Using similar, hybridization-based genotyping assays for the detection of larger Copy Number Variation (CNV), paired blood and saliva specimens were compared on chromosomal microarrays obtained from three different companies. Importantly, the bacterial content (ranging from 3 to 21%) of salivary DNA did not affect the genotyping quality of any platform used (Reiner et al., 2017), proving that saliva is a reliable alternative DNA source for genetic testing.
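A concordance rate of this kind is, in essence, the share of markers with identical genotype calls in the paired samples. The sketch below shows one plausible way to compute it; it is an illustration under stated assumptions and not the pipeline of any cited study. The function name, the marker labels, the two-letter genotype encoding, and the choice to skip markers with a missing call in either sample are all hypothetical.

```python
# Minimal sketch; genotype encoding and missing-call handling are assumptions.

def concordance_rate(blood_calls, saliva_calls):
    """Share of markers with identical genotypes in paired blood and saliva DNA.

    Genotypes are two-letter strings such as "AG"; None marks a failed call.
    Markers with a missing call in either sample are excluded from the comparison.
    """
    compared = 0
    matched = 0
    for marker, blood_gt in blood_calls.items():
        saliva_gt = saliva_calls.get(marker)
        if blood_gt is None or saliva_gt is None:
            continue
        compared += 1
        # Allele order carries no meaning here, so "AG" and "GA" count as equal.
        if sorted(blood_gt) == sorted(saliva_gt):
            matched += 1
    if compared == 0:
        raise ValueError("no markers with calls in both samples")
    return matched / compared

# Invented example data for one participant.
blood = {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": None}
saliva = {"snp1": "GA", "snp2": "CC", "snp3": "CT", "snp4": "AA"}
rate = concordance_rate(blood, saliva)
print(f"concordance: {rate:.1%}")
```

On a high-density array the same calculation would run over hundreds of thousands of markers, which is why even a small number of discordant calls still leaves the rate close to 99%.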

As for the genetic methodology, applying two types of SNP genotyping (SNP microarray and Taqman assays), Abraham et al. (2012) showed a high concordance rate (>99%) between paired blood and saliva samples in the genotype results. In addition, high-quality Sanger sequencing could be produced from most saliva samples (Gudiseva et al., 2016). Failed genotyping was reported only for samples with a DNA concentration below 10 ng/μl, highlighting the need for concentrating samples with low DNA yield. Fewer studies have been published on saliva collection issues in connection with Short Tandem Repeats (STR) or Variable Number of Tandem Repeats (VNTR), where the length of the targeted genomic region can range from 100 to 1000 base pairs (Table 6.3). Genotyping performance of VNTRs was not influenced by either saliva or DNA sample characteristics (Nemoda et al., 2011; Nishita et al., 2009), but degradation of DNA samples can affect long-range PCR amplification (reviewed by Alaeddini, Walsh, & Abbas, 2010). In conclusion, saliva is as good a source of human cells and gDNA as blood, performing similarly in a wide range of genetic analyses. Although the human-specific portion is lower, and the risks of impurity and DNA degradation are higher in salivary samples, most of the problematic samples can be recognized and excluded from the analyses with a careful quality control step.

6 Applications of Saliva in Epigenetic Analyses

Following the disappointing results of the first wave of GWAS, the pursuit of the “missing heritability” prompted researchers to measure epigenetic variants in order to study the underlying biological mechanisms of gene–environment interactions (Manolio et al., 2009). Until recently, epigenetic analyses have been restricted to the affected, disease-relevant tissues, since a substantial portion of the epigenetic marks is tissue specific. However, using the appropriate tissue for epigenetic association studies is often not feasible (e.g., having liver samples for metabolic diseases or brain samples for neurological disorders). Therefore, researchers started to use surrogate, easily accessible peripheral tissues, such as blood, saliva, or buccal cells. Previously, most of the epigenome-wide association studies (EWAS) used blood-derived DNA samples, although buccal cells could potentially serve as a better surrogate tissue in non-blood-based diseases (Lowe et al., 2013). Importantly, saliva contains both buccal and blood cells (see Fig. 6.1); hence, it can serve as a good alternative source of gDNA in several EWAS. Although it is still an open question which peripheral tissue is most relevant for studying certain non-blood-based diseases or traits, the answers will hopefully be revealed by current bioinformatic analyses of epigenomic and transcriptomic datasets of various tissues, which are publicly available for researchers (e.g., Gene Expression Omnibus, GEO, https://www.ncbi.nlm.nih.gov/geo/).

Epigenetic marks add a layer of information above the genetic sequence (epi in Greek means on, above, over), which governs gene expression in multiple ways. These mechanisms are responsible for long-term regulation, switching on exclusively those genes which an individual cell requires (Almouzni & Cedar, 2016). Once the cell (and tissue) identity is established, epigenetic marks are transferred from the mother cell to the daughter cells during somatic cell divisions (contributing to the cellular memory). These marks include covalent modifications of the gDNA and the chromatin-associated histone proteins (such as acetylation, methylation, phosphorylation, and ubiquitination), controlling the accessibility of the chromatin structure. Importantly, there is reciprocal cross talk between these processes; therefore, many studies measure only one type of epigenetic mark. Due to the stability of DNA, studying the chemical modifications of gDNA is one of the most popular epigenetic analyses, and it will be discussed in this section. Analyses of other types of epigenetic mechanisms (such as chromatin structure and histone modifications) are more sensitive (requiring freshly frozen or processed samples), technically laborious, and expensive. Hence, mostly disease-specific tissues are studied by these detailed methods in cancer and infectious disease research; for these types of epigenetic analyses using saliva and oral tissues see Chaps. 9 and 19.

7 An Overview of Epigenetic Modifications on the DNA Molecule

The most frequent and most widely studied covalent modification in the human genome is the methylation of the cytosine base of a CpG dinucleotide (the letter “p” represents the phosphodiester bond between cytosine and guanine); CpG dinucleotides make up about 1% of the genome. The 5-methylcytosine (5mC) is often called the fifth base of the DNA, accounting for 0.6–0.8% of all bases in the human genome, depending on the developmental stage and tissue type. The majority of methylated CpG sites are located in repetitive sequences, where methylation represses transposable elements (the so-called “junk” DNA). When DNA methylation occurs at crucial regulatory regions of protein-coding genes, such as promoters and enhancers, it usually correlates with gene silencing, especially at the so-called CpG islands (CpG-rich regions with high G & C base content). This transcriptional repression can be achieved by either directly inhibiting transcription factor binding or recruiting chromatin-modifying proteins (see review by Deaton & Bird, 2011). However, methylation at CpG islands located in gene bodies might result in the opposite effect by preventing aberrant transcription initiation events in order to guarantee the correct mRNA transcription (Neri et al., 2017). Therefore, it is important to note that an increase in DNA methylation level can either repress or enhance gene expression depending on the genomic position of the methylation change. Since the majority of previous analyses focused on promoter regions, the repressive feature of DNA methylation will be assumed in the subsequent sections when discussing the role of this epigenetic process. The level of DNA methylation is usually expressed in percentages, although it is a binary signal at the level of a single DNA molecule (i.e., a certain CpG-site can be either methylated or non-methylated on the chromosome). This is because the DNA content of a biological sample comes from a pool of cells whose DNA methylation patterns might differ substantially (see examples in the lower part of Fig. 6.1). Therefore, the percentage of DNA methylation represents the proportion of cells in which certain CpG-sites have been methylated in order to shut down the transcription of the respective gene (according to the simplified model using the repressive feature of DNA methylation).
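The relationship between the binary signal in single cells and the percentage measured in a sample can be illustrated with a toy calculation. The sketch below is a simplified model under stated assumptions: one CpG site, one copy per cell, and invented cell-type proportions and methylation levels; the function names are hypothetical.

```python
# Toy model; cell counts, proportions, and methylation levels are invented.

def percent_methylated(cell_states):
    """Percentage of cells in which the CpG site is methylated (1) rather than not (0)."""
    if not cell_states:
        raise ValueError("at least one cell is required")
    return 100.0 * sum(cell_states) / len(cell_states)

def mixture_methylation(levels_by_cell_type, proportions):
    """Expected sample-level methylation (%) for a mix of cell types.

    levels_by_cell_type: methylation (%) of the site within each cell type
    proportions: fraction of each cell type in the sample (should sum to 1)
    """
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type proportions must sum to 1")
    return sum(levels_by_cell_type[t] * proportions[t] for t in proportions)

# A pool of 100 cells: the site is methylated in 40 cells and unmethylated in 60.
pool = [1] * 40 + [0] * 60
print(percent_methylated(pool))

# The same readout can arise from a cell type-specific mark in a mixed sample:
# fully methylated in leukocytes, unmethylated in epithelial cells.
levels = {"epithelial": 0.0, "leukocyte": 100.0}
mix = {"epithelial": 0.6, "leukocyte": 0.4}
print(round(mixture_methylation(levels, mix), 1))
```

The second example hints at why the cell composition of saliva matters for epigenetic studies: a shift in the proportion of cell types changes the measured percentage even if no cell has altered its methylation state.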

The 5mC can be further modified by hydroxylation catalyzed by a family of oxidases, the Ten Eleven Translocation (TET) enzymes (for a review, see Huang & Rao, 2014). Relatively high 5-hydroxymethylcytosine (5hmC) levels are reported in the brain, while other organs have varying amounts of this epigenetic modification (on average ∼0.1% of all bases in the human genome, for details see Nestor et al., 2012). Growing evidence supports the hypothesis that 5hmC is an intermediate in active DNA demethylation processes, and it might also be involved in gene expression regulation (reviewed by Wu & Zhang, 2017). Additionally, 5hmC can be further oxidized to formyl- and carboxylcytosine, but these covalent cytosine modifications are observed at much lower rates in the genome of mature cells compared to 5mC and 5hmC (1000 and 10,000 times less, respectively). Hence, the formyl and carboxyl modifications are not measured routinely. It is noteworthy that stem cells can have a substantial level of these oxidized forms during organogenesis, as they are involved in DNA demethylation, a process leading to the activation of genes, which is needed for acquiring new cell type-specific features.

It is important to mention that, until recently, the most commonly used techniques in epigenetic analyses were based on the classical bisulfite conversion of the gDNA, which cannot distinguish between 5mC and 5hmC (see Fig. 6.2c). Therefore, the more accurate terminology for 5mC and its oxidized forms together is “modified cytosine” when referring to previous DNA methylation array or pyrosequencing data where bisulfite-converted samples were used. In newly developed methods, 5mC and 5hmC signals can be differentiated with additional chemical reactions (reviewed by Nestor, Reddington, Benson, & Meehan, 2014). In Fig. 6.2, green check marks indicate techniques which are specific enough to be used for accurate DNA methylation and hydroxymethylation analyses. However, most of the accumulated EWAS results obtained from blood samples can still be regarded as valid DNA methylation data, because normal white blood cells have negligible 5hmC levels (around 0.02% of all bases, less than 5% of modified cytosines); hence, one can argue that there is no need to distinguish it from the 5mC mark. There are fewer data on the different DNA modifications in oral mucosa, which can affect salivary 5mC and 5hmC levels (see the different cell types in saliva in Fig. 6.1). One study reported a higher global 5hmC level in saliva compared to blood (0.036 vs. 0.027%, Godderis et al., 2015), but this is still in the range of <5% of DNA modifications. Although further studies with larger sample sizes are needed for more accurate estimates of salivary 5hmC levels, it seems that the proportion of 5hmC in saliva is similar to that in blood. In conclusion, the distinction between 5mC and 5hmC can be biologically important for certain cell types such as neurons and stem cells, but only marginally important for others like leukocytes and buccal cells. Therefore, in the following section, only DNA methylation studies of saliva compared to blood samples will be discussed in detail.

Fig. 6.2

Techniques for detection of epigenetic modifications in human DNA. The non-modified cytosine base (C) can be methylated (m) in a CpG dinucleotide sequence (underlined), which can be further modified (hydroxymethylation: h). For details on the biochemical processes creating 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) see the overview of epigenetic modifications. (a) In the affinity enrichment methods, specific antibodies are applied in order to differentiate 5mC and 5hmC and also distinguish them from formyl- and carboxylcytosine. (b) Specific restriction enzymes can also differentiate covalently modified C from non-modified C. The scissors show when the CCGG sequence can be cut (∇ indicates cleavage site), whereas the red X indicates when an enzyme cannot cut the DNA. Both HpaII and MspI enzymes cleave the CCGG sequence with a non-modified CpG-site (left panel), but HpaII is blocked by any kind of chemical modification (other two panels). MspI can cut the CCGG sequence with 5mC or 5hmC in the middle, but its cleavage is blocked by glycosylated 5hmC (right panel). Therefore, the classical enzyme pair HpaII and MspI cleavage ratio gives information about the modified C (5mC & 5hmC together)/non-modified C level, but with a prior glycosylation (catalyzed by β-glucosyltransferase using UDP-glucose) the 5hmC can be differentiated from 5mC. (c) With the classical bisulfite treatment—used as a first step in bisulfite cloning, pyrosequencing, and at Infinium Human Methylation BeadChip arrays—the non-methylated C is converted to uracil (U is replaced by T in the PCR), whereas both 5mC and 5hmC stay as C. This way, the C/T ratio readout shows the proportion of modified C (5mC & 5hmC together)/non-modified C, which is an inaccurate measure for DNA methylation in certain tissues (indicated by a red exclamation mark). 
Note that after bisulfite conversion, the two parallel DNA strands are no longer complementary; therefore, one strand has to be selected for amplification and subsequent bisulfite sequencing (BS-seq). Additional chemical reactions can separate 5mC and 5hmC signals. For example, oxidative pretreatment with potassium perruthenate (KRuO4) changes 5hmC to 5-formylcytosine, which is converted to U at the subsequent bisulfite conversion step and then to T in the PCR amplification (just as non-methylated C is converted to U, then to T). This technique is called oxidative bisulfite sequencing (oxBS-Seq, developed by Booth et al., 2013); it provides a direct readout of 5mC, whereas 5hmC detection requires comparison of the classical BS-seq and oxBS-Seq information. An alternative method is TET-assisted bisulfite sequencing (TAB-seq, Yu et al., 2012), which involves β-glucosyltransferase-mediated protection of 5hmC (the bulky glucose moiety protects this modified C from further chemical reactions) and subsequent oxidation with recombinant TET enzyme, which turns 5mC into 5-carboxylcytosine (5caC). The following bisulfite treatment and PCR amplification change both originally non-methylated C and 5caC (derived from 5mC) into T, whereas 5hmC is read as C.
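
The comparison of BS-seq and oxBS-Seq readouts described above can be illustrated with a minimal Python sketch. It assumes only what the caption states: BS-seq reads 5mC and 5hmC together as C, whereas oxBS-Seq reads 5mC alone as C. The function name and the read counts are hypothetical and are not taken from any cited study.

```python
def modification_levels(bs_c, bs_t, oxbs_c, oxbs_t):
    """Estimate 5mC and 5hmC fractions at one CpG-site from read counts.

    bs_c / bs_t:     reads called C / T after classical bisulfite conversion
    oxbs_c / oxbs_t: reads called C / T after oxidative bisulfite conversion
    """
    if bs_c + bs_t == 0 or oxbs_c + oxbs_t == 0:
        raise ValueError("each assay needs at least one read")
    modified = bs_c / (bs_c + bs_t)      # 5mC and 5hmC together
    mc = oxbs_c / (oxbs_c + oxbs_t)      # 5mC only
    # Sampling noise can make the difference slightly negative; clip at zero.
    hmc = max(modified - mc, 0.0)
    return {"5mC": mc, "5hmC": hmc, "unmodified": 1.0 - modified}


# Hypothetical read counts for a single CpG-site
levels = modification_levels(bs_c=80, bs_t=20, oxbs_c=70, oxbs_t=30)
print(levels)
```

In practice both assays carry sampling error, so the subtraction is noisy at low coverage; the clipping step above is one simple, assumed way of handling negative estimates.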

8 Epigenetic Studies Measuring Environmental Effects in Salivary Samples

Besides its crucial role in embryogenesis, where innate developmental signals elicit epigenetic changes in a highly predictable pattern, DNA methylation is also responsive to external environmental cues (Szyf & Bick, 2013). Growing evidence shows interindividual variation in every stage of life, even after birth in monozygotic twins. These differences between monozygotic twins potentially reflect the cumulative effects of environmental exposure (Tan, Christiansen, von Bornemann Hjelmborg, & Christensen, 2015). Analyzing monozygotic twin pairs is an important model in epigenetic studies, because it enables us to rule out the effects of genetic polymorphisms. Although DNA methylation is a dynamic and reversible process, it is the most stable epigenetic mark because it is part of the covalent structure of the DNA itself; therefore, it has the potential to serve as a biomarker. The overall stability of DNA methylation patterns has been shown in longitudinal study samples (Forest et al., 2018), although at certain CpG-sites there could be significant age-related changes (Horvath et al., 2016). Consequently, the use of DNA methylation analyses in association studies has been increasing over the last decade (for brief summaries of recent epigenetic publications visit https://www.whatisepigenetics.com/). These epigenetic studies are often conducted on surrogate peripheral tissues, such as blood or saliva. Since saliva is easily accessible and its sampling is less invasive than blood sampling, a relevant direction in epigenetic research is to establish whether the changes in DNA methylation patterns of saliva samples are comparable to those of the target tissue (e.g., liver or brain).

Although many studies reported high correlations in methylation levels between blood-derived and salivary DNA samples (reviewed by Langie et al., 2017), one should not forget that the majority of CpG-sites are located within repetitive sequences with high methylation levels and at CpG islands of housekeeping genes with low methylation levels, resulting in little variability within or between individuals. Hence, these sequences are not ideal for association studies measuring correlations between DNA methylation level and environmental exposure. A recent epigenome sequencing analysis demonstrated that only about 10% of the human CpG-sites showed interindividual variability, representing 2 million out of the 26.8 million autosomal CpG-sites (Hachiya et al., 2017). However, less than 2% of the total CpG-sites have been analyzed with previous array-based techniques, which measured approximately 27,000–450,000 sites. The presently available DNA methylation array (EPIC BeadChip, analyzing more than 850,000 sites) has an increased number of CpG-sites, but it is still oriented toward cancer research: it covers most human genes but does not focus on regions that are variable between healthy individuals and could be informative for association studies.

When using surrogate tissue for their analyses, researchers are warned that only a minority of the variable CpG-sites show correlations between the DNA methylation levels of different tissues, as it was shown for example in paired blood and brain samples (Hannon, Lunnon, Schalkwyk, & Mill, 2015). Interestingly, DNA methylation profiles of saliva samples were more similar to publicly available data of brain samples, compared to that of whole blood samples (Smith et al., 2015). Comparative DNA methylation analyses of matched brain, blood, saliva, and buccal samples showed high overall correlation between brain and peripheral tissue (r = 0.90 for saliva-brain, r = 0.86 for blood-brain, r = 0.85 for buccal-brain) when assessing the average methylation level at each CpG-site in a group of 21 patients undergoing brain resection (Braun et al., 2019). However, the proportion of CpG-sites showing significantly similar DNA methylation levels between the target and surrogate tissue was the highest in blood samples (20.8% compared to 17.4% in buccal and 15.1% in saliva samples). The main conclusion of this study is that the similarity of DNA methylation patterns of different tissues highly depends on the actual chromosomal region. Researchers can check the degree of cross-tissue correlation of the analyzed CpG-sites on the study website (Iowa Methylation Array Graphing for Experimental Comparison of Peripheral tissue & Gray matter, IMAGE-CpG, at https://han-lab.org/methylation/default/imageCpG#). As for blood-saliva comparisons, epigenome-wide array data of paired samples showed that 2–4% of the assayed CpG-sites were differentially methylated (Langie et al., 2017). 
Thus, selection of informative CpG-sites is highly recommended for EWAS to: (1) reduce the number of analyzed loci, which would be crucial for the detection of moderate-small effects given the available statistical methods; (2) focus only on those gene regions which are responsive to environmental stimuli (i.e., the variable CpG-sites); (3) select CpG-sites with good reported correlations of DNA methylation level between the surrogate and target tissue.
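
Criteria (2) and (3), which together achieve the reduction described in (1), can be sketched as a simple filter. This is an illustration only: the site identifiers, methylation values, correlation values, and both thresholds are hypothetical placeholders, not values recommended by the cited studies.

```python
from statistics import stdev


def select_informative_sites(betas, cross_tissue_r, min_sd=0.05, min_r=0.5):
    """Return IDs of CpG-sites that are variable and cross-tissue correlated.

    betas:          site ID -> methylation levels (0-1), one value per person
    cross_tissue_r: site ID -> reported surrogate-target tissue correlation
    min_sd, min_r:  assumed cut-offs, chosen here only for illustration
    """
    selected = []
    for site, values in betas.items():
        variable = stdev(values) >= min_sd                    # criterion (2)
        correlated = cross_tissue_r.get(site, 0.0) >= min_r   # criterion (3)
        if variable and correlated:
            selected.append(site)
    return selected


# Hypothetical data for three CpG-sites measured in four individuals
betas = {
    "site_A": [0.95, 0.96, 0.95, 0.96],  # highly methylated, non-variable
    "site_B": [0.20, 0.45, 0.70, 0.35],  # variable and well correlated
    "site_C": [0.10, 0.40, 0.65, 0.30],  # variable but poorly correlated
}
cross_tissue_r = {"site_A": 0.9, "site_B": 0.7, "site_C": 0.1}

informative = select_informative_sites(betas, cross_tissue_r)
print(informative)
```

With real data, the cross-tissue correlations would come from a published resource such as the IMAGE-CpG website mentioned above rather than from a hand-written dictionary.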

Based on animal and human epigenetic studies, DNA methylation mechanisms are proposed to be involved in recording early life experiences, thus influencing gene expression in order to fine-tune the activity of physiological systems. In particular, the prenatal environment, where the majority of epigenetic modifications are established, can have long-lasting effects on DNA methylation patterns. This has been shown in relation to both physical and psychosocial environmental exposure (see reviews by Marsit, 2015; Nemoda & Szyf, 2017). Based on previous epigenome-wide and targeted DNA methylation analyses, it is hypothesized that epigenetic changes involved in life-long responses to the intrauterine and early life environment are system-wide; hence, potentially detectable in multiple tissues. For example, after the pioneering animal studies, psychosocial stress evoked DNA methylation changes have been reported at the glucocorticoid receptor gene promoter in human studies using different tissues (reviewed by Turecki & Meaney, 2016). Increased methylation at the 1F promoter region of the glucocorticoid receptor gene was associated with childhood adversity in brain hippocampal samples of deceased adults (McGowan et al., 2009). It was also associated with prenatal exposure to maternal stress in newborns’ cord blood, and in infants’ salivary samples, although the affected CpG-sites varied (see meta-analysis by Palma-Gudiel, Cordova-Palomera, Eixarch, Deuschle, & Fananas, 2015). Therefore, saliva could be a suitable surrogate tissue in DNA methylation analyses, enabling measurement from an early age, even from early infancy.

However, other studies using blood or saliva to assess epigenetic changes caused by differences in the intrauterine environment (e.g., in birth weight discordant monozygotic twins) did not show significant differences in DNA methylation patterns of adult twin pairs (Souren et al., 2013; Tan et al., 2014). It has to be emphasized that these studies analyzed approximately 450,000 sites (using Illumina’s Infinium HumanMethylation450 BeadChip array) without restricting their statistical analyses to informative CpG-sites, as was later suggested by Edgar, Jones, Robinson, and Kobor (2017) in their data reduction method, which lists more than 100,000 non-variable CpG-sites in both blood and buccal epithelial cells. Considering the limitations of current genome-wide studies assessing thousands of sites with potentially small individual effects, it is not surprising that none of the associations reached statistical significance. In addition, epigenetic changes triggered by early life adversity could be overshadowed by later environmental exposure (of note, 34- and 63-year-old adults were analyzed in the mentioned twin EWAS yielding no significant associations). In a study of a younger age group that used a reduced number of CpG-sites, birth weight discordance was associated with within-pair differences of salivary DNA methylation at genes involved in neurodevelopment, as well as with differences in brain shape and size of the adolescent MZ twins (Casey et al., 2017). Using another approach to reveal biological processes, Zaghlool et al. (2018) analyzed intermediate molecular phenotypes, including blood, urinary, and salivary metabolite levels, and reported associations with DNA methylation levels at selected CpG-sites previously linked to diabetes mellitus, obesity, and smoking. Salivary tyramine metabolite, for instance, was associated with CpG-sites linked to smoking.
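
To make the multiple-testing burden concrete, the following sketch computes the per-site significance threshold for different numbers of analyzed CpG-sites. Bonferroni correction is used here only as a familiar example and is an assumption, not necessarily the method applied in the cited twin studies. The first two site counts are the approximate figures mentioned above; the size of the candidate panel is hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


site_counts = {
    "full 450K array": 450_000,
    "after removing ~100,000 non-variable sites": 350_000,
    "hypothetical candidate panel": 20_000,  # assumed size, for illustration
}

thresholds = {label: bonferroni_threshold(n) for label, n in site_counts.items()}
for label, p in thresholds.items():
    print(f"{label}: p < {p:.2e}")
```

Under this simple correction, removing non-variable sites relaxes the threshold only modestly, whereas restricting the analysis to a small candidate set relaxes it by more than an order of magnitude.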

Finally, there are still many technical issues that must be considered when using surrogate peripheral tissues for DNA methylation analyses. For example, although the type of somatic cells used for genetic analysis is irrelevant, tissue type variation and intraindividual differences in cell composition of non-sorted biological samples can hide authentic epigenetic differences. In addition, DNA methylation levels in blood and saliva samples can be affected by age, sex, and ethnicity (Horvath et al., 2016), and also by genetic variants (i.e., methylation quantitative trait loci; for details, see Do et al., 2017). Therefore, proper data processing is necessary to control for heterogeneity in samples, even when a cohort is homogeneous and the biological sample type is the same throughout an epigenetic study, because different cell composition ratios can still substantially affect DNA methylation patterns. When using epigenome-wide arrays, the differing proportions of leukocytes (mostly granulocytes) and buccal epithelial cells in salivary samples can be adjusted for by reference-based or reference-free statistical methods (Langie et al., 2017). Cell ratios can also be assessed by measuring specific markers selected from cell type-specific CpG-lists in candidate gene analyses (Eipel et al., 2016). Lastly, caution should be taken when interpreting differences in DNA methylation levels, since current laboratory methods measuring epigenetic marks can be biased. This can be a common issue when using bisulfite-converted templates due to the different chemical properties of C- and T-rich DNA strands. However, this technical problem can be easily detected with internal controls and solved by suitable correction methods (see Moskalev et al., 2011). In conclusion, with careful design and appropriate additional analytical steps, salivary DNA samples can be successfully applied in epigenetic association studies.
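
The idea behind reference-based adjustment can be sketched for the simplest case of two cell types. The sketch assumes that the methylation level observed at each cell type-specific CpG-site is a linear mixture of leukocyte and buccal epithelial reference levels, and it estimates the leukocyte fraction by least squares. The function name and reference values are hypothetical; published methods rely on larger reference panels and more elaborate models.

```python
def estimate_leukocyte_fraction(observed, leukocyte_ref, epithelial_ref):
    """Least-squares estimate of the leukocyte fraction p in a saliva sample.

    Assumed model per CpG-site:
        observed = p * leukocyte_ref + (1 - p) * epithelial_ref
    """
    diffs = [leu - epi for leu, epi in zip(leukocyte_ref, epithelial_ref)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("reference profiles are identical")
    numer = sum((obs - epi) * d
                for obs, epi, d in zip(observed, epithelial_ref, diffs))
    # Clip to the valid range of a proportion.
    return min(max(numer / denom, 0.0), 1.0)


# Hypothetical reference methylation levels at three marker CpG-sites
leukocyte_ref = [0.90, 0.10, 0.80]
epithelial_ref = [0.20, 0.70, 0.30]

# Synthetic sample constructed as 60% leukocytes and 40% epithelial cells
observed = [0.6 * leu + 0.4 * epi
            for leu, epi in zip(leukocyte_ref, epithelial_ref)]

p = estimate_leukocyte_fraction(observed, leukocyte_ref, epithelial_ref)
print(f"estimated leukocyte fraction: {p:.2f}")
```

The estimated fraction could then be included as a covariate in the association model, which is one common way of adjusting for cell composition.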

9 Future Directions and Opportunities

Based on recent (often negative) findings of GWAS, it seems that individual genetic factors linked to complex diseases or traits explain only a small proportion of the inherited component of phenotypic variance. To improve the ability to detect moderate effects, researchers in medical genetic fields are aiming at: (1) increasing sample size in specific GWAS cohorts and pooling samples in international consortia, which would allow meta- and mega-analyses to identify genetic variants with small effects; (2) applying more precise intermediate phenotypes or endophenotypes, which are influenced by fewer genetic variants (Blanco-Gómez et al., 2016); (3) studying gene–gene and gene–environment interactions in order to reveal the “missing heritability” (Manolio et al., 2009). The use of saliva as a biospecimen seems valuable for these goals. Genetic studies over the last decade have shown that saliva is a reliable source for studying inherited genetic variants present in every somatic cell of an individual. Moreover, because sequencing the coding gene regions (i.e., the exome) or the whole genome in large patient cohorts is now a reality, studying rare genetic variants in the pathomechanisms of diseases with complex inheritance has become possible, thus supplementing present GWAS that measure common genetic polymorphisms. Importantly, the bioinformatic and statistical tools dealing with this enormous amount of data have to be constantly updated in research laboratories. Fortunately, the research community is providing free program packages which can be easily applied (e.g., the R Project for Statistical Computing at https://www.r-project.org/).

In the wake of technical developments, the number of studies using saliva is likely to increase exponentially in both genetic and epigenetic analyses. However, caution is needed in DNA methylation studies due to numerous technical issues (Langie et al., 2017). In order to overcome the various biological and statistical challenges, continuous improvement of bioinformatic analyses is needed in this area of research. Fortunately, open access to publications and research resources has been increasing. These resources include databases of analytical procedures (e.g., European Bioinformatics Institute, https://www.ebi.ac.uk/) and genome-wide datasets (such as dbGAP, https://www.ncbi.nlm.nih.gov/gap), which are helping the research community to achieve scientific goals (see Complex Disease Epigenetics Group at https://www.epigenomicslab.com/). As in genetic studies, the spread of sequencing methods could also widen the repertoire of analyzed CpG-sites in epigenetic association studies. Likewise, pooling different samples with the help of international consortia could facilitate the generalization of EWAS findings (Flanagan, 2015). Once the technical issues are controlled, epigenetic studies should offer great possibilities in disease prevention and management, as proposed by the developmental origins of health and disease (DOHaD) concept (Rosenfeld, 2015). This could be achieved, for instance, via longitudinal studies using epigenetic analyses at multiple time points starting with the in utero environment (assessed at birth), then in infancy and childhood, thus focusing on the periods most sensitive to adverse environmental effects. The potential outcomes of these studies would help intervention programs concentrate on specific time points. Furthermore, epigenetic changes are dynamic and could be modified by the reverse enzymatic processes later in life, potentially even in adulthood, hypothetically allowing epigenetic treatments. 
According to present theories of disease development, the early life environment can alter the genetically determined program to prepare the individual for the anticipated environment later in life (e.g., poor nutrition during early life would predict life-long undernutrition), prompting functional epigenetic changes. However, the adaptive responses may become maladaptive when there is an inconsistency between the anticipated and the real environments later in life, resulting in metabolic, cardiovascular, or mental health problems (Gluckman, Hanson, Cooper, & Thornburg, 2008). Finally, although there are still lots of technical obstacles in clinical epigenetics (Aslibekyan, Claas, & Arnett, 2015), linking specific epigenetic alterations to disease-specific gene expression changes in the background of common diseases would pave the way for the development of targeted epigenetic treatments (Szyf, 2015).