Introduction

The neurotransmitter serotonin (5-hydroxytryptamine, 5-HT) and its transport in the serotoninergic system have been implicated in a wide range of neurobiological conditions, often pathological and mainly psychiatric, with a major impact in Western countries. It has been estimated that psychiatric conditions cost about $57.5B in the USA in 2006, equivalent to the cost of cancer care. The World Health Organization (WHO) has reported that mental illnesses are the leading causes of disability-adjusted life years worldwide, accounting for 37 % of healthy years lost from non-communicable diseases. Depression alone accounts for one third of this disability. The most recent WHO report estimates the global cost of mental illness at nearly $2.5T (two thirds in indirect costs) in 2010, with a projected increase to over $6T by 2030 (source: National Institutes of Health, National Institute of Mental Health, available at URL http://www.nimh.nih.gov).

The 5-HT transporter (5-HTT) is the “main mover” of serotonin across the serotoninergic system. 5-HTT is an integral membrane protein of 630 amino acids (for review, see [1, 2]) that contains 12 transmembrane domains and functions as a highly selective sodium- and chloride-dependent transporter. After release from presynaptic axons, 5-HT action is terminated by the presynaptic 5-HTT, which mediates serotonin reuptake from the intersynaptic space, ensuring its recycling into new cytoplasmic vesicles [3, 4]. The duration and magnitude of 5-HT biological actions largely rely on 5-HTT, which thus acts as a master regulator of the fine-tuning of 5-HT signaling. Dysfunction in this signaling pathway has been implicated in a host of psychiatric disorders and traits including affective disorders, schizophrenia, anxiety, autism, depression, suicide, obsessive-compulsive disorders, and addiction [5]. Accordingly, 5-HTT is the major target of the selective serotonin reuptake inhibitors (SSRIs) [6], a set of medications widely used to treat depression, anxiety, and other emotional and behavioral disorders [7–14].

5-HTT is encoded by the human solute carrier family 6 (neurotransmitter transporter), member 4 (SLC6A4) gene (GenBank accession number NG_011747.2) at locus 17q11.2 [15, 16]. The gene, composed of 15 exons spanning ∼40 kb, is expressed in the central nervous system (CNS) [17], blood platelets [18], lymphoblasts [19, 20], and enterochromaffin cells of the gastrointestinal tract [21]. To date, the SLC6A4 gene has been studied more than any other target gene in the field of neurobiology. A large number of studies have been conducted to determine whether genetic variation at the SLC6A4 locus contributes to variation in 5-HT reuptake. Major findings from these studies indicated that combinations of non-coding and rare coding variants could change serotonin transport as much as 40-fold in vitro [22]. A crucial role for non-coding variants in altering 5-HTT messenger RNA (mRNA) levels, and thus in regulating 5-HT reuptake, has been demonstrated. In this regard, the 5-HTT gene-linked polymorphic region (5-HTTLPR), located in the promoter region of the SLC6A4 gene, is the most extensively studied and best characterized of these non-coding variants.

The 5-HTTLPR (rs4795541) is a variable number tandem repeat (VNTR) composed of 20 to 24 bp imperfect repeat units, with a total of 28 currently known units. The most common alleles were classified by Heils et al. [23] into two main allelic groups: the short (S) allele (14 repeats) and the long (L) allele (16 repeats). A number of less common extra-long (17–24 repeats) and extra-short (11–13 repeats) alleles have been grouped into the minor XL and XS allele variants, respectively.
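
The length-based grouping described in this paragraph can be written as a small lookup. The Python sketch below is illustrative only: the function name `classify_5httlpr_allele` is hypothetical, and the boundaries are simply the repeat ranges quoted here (14 for S, 16 for L, 17–24 for extra-long, 11–13 for extra-short). Repeat counts outside those ranges are returned as unclassified rather than assigned to a group.

```python
def classify_5httlpr_allele(n_repeats: int) -> str:
    """Map a 5-HTTLPR repeat count to the size class used in the text.

    Hypothetical helper: the boundaries follow the ranges quoted in the
    paragraph above (S = 14, L = 16, XL = 17-24, XS = 11-13).
    """
    if n_repeats == 14:
        return "S"
    if n_repeats == 16:
        return "L"
    if 17 <= n_repeats <= 24:
        return "XL"
    if 11 <= n_repeats <= 13:
        return "XS"
    # Anything else (e.g. 15 repeats) is left unassigned by this sketch.
    return "unclassified"


if __name__ == "__main__":
    for n in (11, 14, 15, 16, 20):
        print(n, classify_5httlpr_allele(n))
```

Returning an explicit "unclassified" label keeps the sketch from silently forcing intermediate repeat counts into one of the named groups.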

The S allele is present in 42 % of Caucasians and in 79 % of Asians, whereas the L allele is much more frequent in Western than in Asian populations. The S/S genotype is present in 22 % of Caucasians and in 60 % of Asians, whereas the L/L genotype is present in 29–43 % of Caucasians but in only 1–13 % of East Asians [24, 25]. The overall incidence of rare XS and XL allelic variants is highest in individuals of African or African-American ancestry and in East Asian populations [26–34], although individuals heterozygous for rare size variants were also identified in European-Caucasian and American populations [31, 35–38]. Traditionally, the S allele is associated with low expression, which negatively affects the serotonin reuptake rate (Fig. 1). Notably, although a large part of the neurogenetics literature concerns only the 5-HTTLPR, two single nucleotide polymorphisms (SNPs), rs25531 [40] and rs25532 [41], are located within the 5-HTTLPR; they introduce additional allele frequency variation and alter functionality.

Fig. 1

Representation of the SLC6A4 gene encoding the serotonin (5-hydroxytryptamine, 5-HT) transporter (5-HTT), an integral membrane protein which mediates the reuptake of serotonin from synaptic spaces into presynaptic neurons. 5-HT can undergo either enzymatic degradation by monoamine oxidase A (MAO-A) or recycling into synaptic vesicles. A repeat length polymorphism (5-HTTLPR) in the promoter gives rise to two main alleles, the short (S) and long (L) variants, which differ in their functional influence on SLC6A4 transcription. Traditionally, the S variant is associated with a low-expressing phenotype, which negatively affects 5-HT recycling from the synaptic cleft. Adapted from Canli and Lesch [39], with permission from Nature Publishing Group

To date, myriad genetic association studies have examined the role of 5-HTTLPR in mediating vulnerability to neuropsychiatric illness [42–45] or as a predictor of antidepressant response [46, 47]. Since 1996, when the first functional study was described, the data demonstrating the role of the 5-HTTLPR in influencing SLC6A4 promoter efficiency have remained under debate.

The aim of this review is to discuss updated findings from neurogenetics and neurobiology papers dealing with the molecular influence of the 5-HTTLPR promoter variants on SLC6A4 gene expression. To this end, we detail the genetic architecture and arrangement of repeat elements for the known 5-HTTLPR alleles. Mechanisms at the molecular level that might explain the variation in the 5-HTTLPR are also considered.

We also provide a description of the uncommon allelic variants reported in literature. We critically review in vitro functional studies investigating the role of 5-HTTLPR on SLC6A4 promoter efficiency.

A deeper knowledge of the “5-HTTLPR universe” will be useful to better understand the molecular basis of serotonin homeostasis and the pathological basis underlying serotonin-related neuropsychiatric conditions and traits.

The Genetic Architecture of the 5-HTTLPR: Investigating the Variation of VNTR Structures

Gelernter and colleagues [48] were the first to attempt to classify the SLC6A4 polymorphisms: a VNTR in intron 2 [49], a PstI restriction fragment-length polymorphism detected with a 3′-UTR probe [50], and a second imperfect VNTR in the promoter region (i.e., the 5-HTTLPR) [23]. These polymorphisms were referred to as the “A”, “B,” and “C” polymorphisms, by order of publication. Accordingly, the S and L alleles of the 5-HTTLPR were designated C14 and C16, respectively, to denote the classification system and the number of repeat units.

This nomenclature, however, was rapidly modified following the discovery of several S and L allele variants by Nakamura et al. [32], who introduced a new nomenclature of the alleles (the number of repeats followed by a letter, 14-A etc. and 16-A etc.; DDBJ/EMBL/GenBank nucleotide sequence database accession numbers AB031247-AB031259) and adopted a different alignment of the repetitive elements compared to that used by Heils et al. [23].

Nakamura et al. provided evidence that the 5-HTTLPR contains 20 different repeat units, named by Greek letters and arranged to generate several different allelic variants [32]. A schematic representation of the 5-HTTLPR belonging to the L and S alleles is shown in Fig. 2. The nomenclature introduced by Nakamura et al. was further adapted by Murdoch et al., who converted the letter to a corresponding number (e.g., 14-A to 14-1, 16-A to 16-1, and so on) [31]. Recent evidence indicates that the 5-HTTLPR contains at least 28 different repetitive units, one of which is a novel repeat unit first reported by the authors of this study (additional data are given in Supplementary Material). The nucleotide sequence of all repeat units described so far is shown in Table 1. Overall, 39.29 % of units (n = 11) are composed of 22 bases, 25.00 % (n = 7) of 23 bases, 17.86 % (n = 5) of 20 bases, 14.29 % (n = 4) of 21 bases, and 3.57 % (n = 1) of 24 bases. The initial sequence TGCA is shared by 92.9 % of the units (26 of 28). The repetitive units display a GC content above 59 %; a cytosine is present in conserved positions in all repeat elements. The different units are arranged to generate several allelic variants consisting of 11 to 24 repeat units, including variants longer (17–24 repeats) and shorter (11–15 repeats) than the L (16 repeats) allele. The genetic architecture and arrangement of repeat elements for each allelic variant are summarized in Table 2. To avoid confusion about the nomenclatures assigned to 5-HTTLPR alleles, we take the canonical 14-A or 14-1 as the S allele and the canonical 16-A or 16-1 as the L allele throughout the text.
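
The length distribution quoted above follows directly from the unit counts. As a minimal sketch, the Python snippet below recomputes the percentages from the counts given in this paragraph; the dictionary and function names are illustrative and not part of any published tool.

```python
# Number of repeat units of each length (in bases), as reported in the text.
UNITS_BY_LENGTH = {22: 11, 23: 7, 20: 5, 21: 4, 24: 1}


def length_distribution(counts):
    """Return {length_in_bases: percentage of all units}, rounded to 2 decimals."""
    total = sum(counts.values())
    return {length: round(100 * n / total, 2) for length, n in counts.items()}


if __name__ == "__main__":
    print("total units:", sum(UNITS_BY_LENGTH.values()))
    for length, pct in sorted(length_distribution(UNITS_BY_LENGTH).items()):
        print(f"{length} bases: {pct:.2f} %")
```

The counts sum to the 28 known units, which is a quick consistency check on the reported percentages.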

Fig. 2

Schematic representation of the SLC6A4 L and S allelic variants. The genetic architecture of the 5-HTTLPR, which is located in the upstream regulatory region, is depicted in detail. Individual repeat units of the 5-HTTLPR are identified by Greek letters according to Nakamura nomenclature [32]. The SNP rs25531 converts the ζ element into the μ and the SNP rs25532 converts the μ element into the ä

Table 1 Nucleotide sequence of the variable number of tandem repeat (VNTR) units of the 5-HTT gene-linked polymorphic region (5-HTTLPR)
Table 2 Alleles of the human 5-HTTLPR promoter: genetic architecture and arrangement of repeat units

Allelic variation in repeat architecture and structure plays an important role in the genetics of human diseases such as diabetes and epilepsy [55, 56].

Forty different allelic variants of the 5-HTTLPR VNTR are currently known. Interestingly, in the range of 11 to 24 repeats, alleles with 12 and 23 repeats are missing. The most common composition of the first five positions (39 out of 40, 97.5 % of the VNTRs) is α-β-γ-δ-ε, which extends to α-β-γ-δ-ε-ζ in 70 % of the structures. Overall, 67.5 % of the structures (27 out of 40) end with the unit sequence η-τ-ι-κ-λ-μ-γ-ξ. The genetic architecture of the VNTR shows a pattern of conserved sections of repetitive elements interspersed with varying or expanded regions in the case of longer allelic variants. The array of repeat units inserted in place of the ζ element (the sixth repeat) of the S allele accounts for most of the differences in the polymorphic region [32]. This suggests that the sequence around the ζ element may represent a recombination “hot spot”. The S and L alleles contain several hot spots for deletion mutagenesis (TGCAGCC) [23, 26, 57]. This sequence appears repeatedly at the beginning of the inserted elements in various alleles, so the same sequence may also account for the duplication observed in the allelic variants longer than the L allele [26, 32]. The guanine-cytosine (GC) content of the repeat units (GC% > 59 %, see also Table 1) may enhance the formation of four-stranded DNA structures, which facilitate chromosome alignment during homolog pairing [58]. Furthermore, the presence of hot spot sequences and GC richness in the VNTR may explain the extreme variation in the 5-HTTLPR, as minisatellite regions are known to act as hot spots for homologous recombination [59, 60]. Similar variation in GC-rich VNTRs is present, and directly associated with recombination events, both in the 5′-flanking region of the human insulin gene and in the major histocompatibility locus [61–63]. Deletions/duplications of the repeat units could also result from somatic instability occurring by DNA replication slippage [32, 65, 66].
Owing to the overall homology between the repeat elements, different chromosomal crossovers may result in duplication or deletion events leading to allelic variants longer or shorter than the L allele. This also means that the true origins of duplications or deletions cannot be confidently determined, which makes the definition of the boundaries somewhat arbitrary [23, 31, 32].
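
Two of the sequence properties discussed above, GC content and occurrences of the TGCAGCC hot-spot motif, are easy to compute for any candidate sequence. The following Python sketch shows one way to do so. The helper names are hypothetical, and the example string is an artificial 22-base sequence made up for illustration; it is not one of the repeat units listed in Table 1.

```python
HOT_SPOT = "TGCAGCC"  # deletion hot-spot motif quoted in the text


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100 * sum(base in "GC" for base in seq) / len(seq)


def count_motif(seq: str, motif: str = HOT_SPOT) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    seq, motif = seq.upper(), motif.upper()
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


# Artificial example sequence (assumption for illustration, NOT a real repeat unit).
EXAMPLE = "TGCAGCCGGCCTTGCAGCCAAC"

if __name__ == "__main__":
    print(f"GC content: {gc_percent(EXAMPLE):.1f} %")
    print("hot-spot motifs:", count_motif(EXAMPLE))
```

Applied to the actual unit sequences in Table 1, the same two functions would allow the GC% > 59 % threshold and the motif positions to be checked unit by unit.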

Does the 5-HTTLPR Affect Transcriptional Activity of the Full-Length Promoter? Lessons from In Vitro Functional Studies

The SLC6A4 full-length promoter region spans ∼2 kb and is defined by a TATA-like motif and several putative transcription factor binding sites including a cAMP response element (CRE)-like motif, AP1, AP2, MZF1, NF-κB/Elk1, SOX5, and CTCF [52, 67]. These binding sites may play important roles in endogenous SLC6A4 expression and in its sensitivity to allelic status at the 5-HTTLPR [52]. The initially reported size of the promoter region between the 5-HTTLPR and the transcription initiation start site [49, 68] was later revised with the identification of a somatic deletion/insertion (Mortensen et al. GenBank AF1265506 [54], Flattem et al. GenBank AF117826 [52], Lesch et al. GenBank X76753.2 [57]). The smaller size of the early amplified promoter region [49, 68] may have arisen from its in vivo instability. Because a rearrangement at the SLC6A4 locus may have occurred during genomic library production or clone isolation and amplification, Lesch and coworkers identified a form of the 5-HTT gene lacking the 380 bp promoter region later identified by others [52, 54].

The promoter VNTR (the 5-HTTLPR), which is located in the proximal part of the 5′-flanking regulatory region, 1.4 kb upstream of the transcription start of 5-HTT (see Fig. 2), has been suggested to affect the transcriptional activity of the SLC6A4 gene promoter. Noteworthy findings of in vitro promoter functional studies are discussed below and summarized in Table 3. The overview highlights how the results differ depending on the assays and cell lines employed.

Table 3 Overview of SLC6A4 promoter functional studies: cell-specific expression of the canonical L and S variants

Earlier studies on the human 5-HTT promoter were conducted by Heils and coworkers [68]. Deletion mapping of the human SLC6A4 promoter demonstrated that the first 300 bp contained a silencer region and that information contained within ∼1.4 kb of the 5′-flanking sequence conferred cell-specific expression [68].

Promoter activity was observed in the human placental choriocarcinoma serotonergic JAR cell line [72], whereas no activity was detected in either the 5-HTT-deficient human neuroblastoma SK-N-SH or HeLa cells, transfected with promoterless reporter gene expression vector [68]. Further, functional studies of the native L and S allelic variants in the human JAR cell line revealed that transcriptional activity is modulated by the 5-HTTLPR. This VNTR is likely to form a complex tetrastrand-like secondary structure and probably interacts with other promoter regulatory regions [23].

Comparison of the polymorphic alleles revealed that the basal transcription activity of the L variant was about 3-fold higher than that of the S variant. Forskolin and phorbol ester induced transcription, demonstrating that cyclic AMP (cAMP)- and protein kinase C (PKC)-dependent mechanisms stimulated induction of both variants, although the dose-dependent increase was proportionally smaller in the S promoter variant [23]. L and S 5-HTT promoter constructs transiently transfected by Lesch et al. [69] into human lymphoblasts carrying different 5-HTT alleles reproduced the reporter expression data obtained by Heils et al. [68]. Furthermore, lymphoblasts homozygous for the 5-HTT promoter L variant showed mRNA concentrations 1.4 to 1.7 times those of cells containing one (genotype L/S) or two copies of the S allele (genotype S/S). Likewise, the number of binding sites for radiolabeled 5-HTT ligands and 5-HT uptake in cell membrane preparations showed consistent genotype-dependent differences that persisted proportionally when transcription was induced through activation of cAMP- or PKC-dependent pathways [69]. The S variant has therefore been functionally associated with lower basal and induced transcriptional efficiency of the SLC6A4 promoter, resulting in decreased gene expression, a lower amount of the serotonin transporter, and lower serotonin reuptake activity when compared with the L variant.

Since these pioneering studies, the influence of the 5′ promoter variants on SLC6A4 expression has been investigated in several reporter gene assays [35, 37, 38, 40, 41, 54, 71]. The L and S variants of the SLC6A4 promoter displayed different promoter activities when in vitro transcriptional efficiencies were evaluated in different cellular environments by Mortensen et al. [54]. To accurately determine the regions of the full-length promoter involved in cell-specific activity, a series of nested 5′ deletions was assayed by transient transfection of gene reporter constructs in the kidney fibroblast-like COS-1, human placental JAR, and rat raphe-derived neuronal RN46A cell lines [54]. The full-length 16-repeat promoter was active in all three cell lines, although the COS-1 cells, which do not express endogenous 5-HTT, showed only a general constitutive activity for all constructs. In the RN46A cell line, the L promoter variant exhibited a higher activity than the S variant, indicating either that the 16-repeat region contains a positive transcriptional element in the 5′ regulatory region or that the insertion of two extra repeats changes the transcription factor (TF) binding profile by altering the physical distance. Accordingly, the 5′ end-deletion of the VNTR also lacking the 43-bp ins/del (missing in the S promoter) showed the same activity as the S promoter in the RN46A cells, but not in the COS-1 and JAR cell lines, confirming the presence of an activating element in the 5′ regulatory region. This result reproduced previous studies that employed corresponding promoter constructs [23, 69] or that investigated the expression of the native SLC6A4 in lymphoblastoid cell lines with homozygous (S/S and L/L) and heterozygous (S/L) genotypes [69] (Table 3). Removal of the 3′ region of the VNTR resulted in an increase of the basal transcriptional activity, likely revealing the presence of a silencer element.
Downstream deletions indicated the presence of cis-acting, cell-specific activating elements in the RN46A cell line, as well as one common regulatory element able to promote transcription in both RN46A and JAR cells. A reporter construct with the same deletion (−736), lacking both the VNTR and some positive regulatory elements, reproduced an almost identical detrimental effect on transcriptional activity in the 5-HTT-expressing human colon carcinoma SW480 cell line [71]. Overall, the functional analysis performed by Mortensen et al. on the complete promoter, whose sequence revealed an internal 379 bp fragment not reported in previous publications, confirmed neither the silencing effect of the 5′-end of the VNTR nor the higher activity of the 16-R promoter previously demonstrated in the serotonergic JAR cells [23, 68].

Sakai et al. also investigated the modulating effect of the 5-HTTLPR allelic variants on the transcription of the 5-HTT by evaluating their putative enhancer/silencer activities in RN46A, COS-7, and PC-12 (a pheochromocytoma of rat adrenal medulla) cellular contexts [70]. Each 5-HTTLPR promoter variant, ligated upstream of the SV40 promoter-reporter gene, impaired the transcriptional activity (30–80 % of the control promoter vector) only in the rat neuronal serotonergic RN46A cell line. However, no significant differences between the canonical S and L alleles were observed (Table 3).

This result was not in agreement with the previous work of Heils et al., who reported an allele-dependent repression of transcriptional activity of heterologous constructs in a different cellular context, the human placental JAR cell line [23]. In that report, the long 5-HTTLPR showed a strong silencer activity able to repress reporter expression to about 20 % of the promoter control, whereas the short form of the VNTR was less efficacious (about 54 %, a value similar to that measured by Sakai and coworkers). However, since the transcription of most genes depends on the cellular environment, the activity of the 5-HTT promoter in a cell line such as JAR may not represent that in the serotonergic efferent neurons, where 5-HTT transcription could undergo a more stringent regulation [73]. Overall, the silencing effect shown by 5-HTTLPR allelic variants in the raphe-derived RN46A cells was concordant with the report of Mortensen et al. [54]. Nevertheless, their findings demonstrated a significant difference between transcription levels driven by the L and S promoter variants in RN46A cells but not in the other cell lines tested (JAR and COS-1 cells). The different reporter constructs (the whole promoter region in Mortensen’s reporter construct and the 5-HTTLPR alone in Sakai’s heterologous constructs) could explain this discrepancy. The interaction between the 5-HTTLPR and the downstream region may be required to differentiate the promoter activity of long and short allelic variants [54]. The differential silencer and/or enhancer activities of the 5-HTTLPR reported by Heils et al. could also be ascribed to complex interactions with other 5-HTT promoter regulatory elements [23].

Over the last decade, the potential effect of the 5-HTTLPR alleles on basal SLC6A4 transcription was demonstrated by measuring reporter activity after transient transfection with constructs carrying only the S or the L polymorphic VNTR. A statistically significant 2.8-fold increase in transcriptional efficiency produced by the L allele construct was observed when compared with the S allele in the RN46A neuronal cell line [40] (Table 3). Ehli and coworkers were able to replicate this result in the same recipient cell line, although with a smaller effect (1.8-fold vs 2.8-fold) [35] (Table 3). Likewise, the allelic difference resulted in 2-fold higher transcriptional efficiency of the L promoter construct as compared to the S in the human colon carcinoma SW480 cell line [71]. Recently, we assayed the functional strength of L and S 5-HTTLPR variants using β-lactamase reporter constructs in the serotonergic JAR cell line [37, 38] (additional data are given in the Supplementary Material). An increase in the L 5-HTTLPR-driven reporter transcription in comparison to the S variant was observed (see Table 3), although the difference in the β-lactamase expression level in cell lysates was not statistically significant. Further studies performed in native lymphoblasts [74–77] have highlighted the impact of these variants on SLC6A4 expression [74, 77]. The influence of the 5′ promoter variants is also reflected in assays with whole blood [78] and serotonin uptake assays in platelets [79–81]. Conversely, the imaging literature and expression data reported contradictory results regarding the actual functional effect of 5-HTTLPR genotype on brain transporter availability [76, 82–95]. Almost all data provided no evidence for regulation of brain SERT availability by the SERT promoter polymorphism, in either healthy subjects or neuropsychiatric patients.

Overall, results from cellular studies are not completely consistent with the original landmark reports [23, 69]. Nevertheless, taking into consideration differences in experimental strategies investigating SLC6A4 expression in vitro, the majority of these studies have largely confirmed the higher transcriptional activity of the promoter containing a 16-repeat VNTR [35, 37, 38, 40, 51, 71, 74, 75, 77, 78, 80, 81].

Two Single Nucleotide Polymorphisms Close to 5-HTTLPR: What Difference a SNP Makes

Hu et al. reported an A/G SNP within the 5-HTTLPR insertion giving rise to the A and the G variants of the L allele (LA, LG) [40], while Kraft and coworkers identified an A/G polymorphism (designated as rs25531), lying just upstream of the 43 bp insertion/deletion, which differentiates between the L and S alleles [96]. Although these two SNPs were initially assigned to two physically distinct loci, a detailed alignment of the repeat elements showed that both refer to the same nucleotide. According to the repeat architecture designated by Nakamura, rs25531 can occur in the context of either L or S 5-HTTLPR alleles, within the conserved sixth repeat unit [96]. In other words, the ζ element (the sixth repeat) (Fig. 2) carries the single nucleotide polymorphism rs25531, which leads to the appearance of the allelic subtypes SA, SG, LA, and LG (see Table 2), whereas the segment missing in the S variant occurs within the seventh through ninth repeats of the most common configuration of the 5-HTTLPR. The LA alleles contain the canonical ζ element whereas the minor LG alleles are characterized by the μ repeat [32, 96] (see repeat unit sequences in Table 1 and repeat unit structure in Table 2). Recently, Perroud et al. reported the SG genotype and confirmed that rs25531 lies 18 bp 5′ to the ins/del [97]. This SNP should therefore be considered to constitute four haplotypes instead of a single triallelic locus as proposed by Hu et al. [40]. The base substitution in rs25531 creates a consensus binding sequence for the activator protein 2 (AP-2) transcription factor, one of the many nuclear factors that function both as transcriptional activators and repressors.
The role of this SNP as a potential modulator of transcription factor binding (TFB) was functionally characterized by in vitro binding assays, where oligonucleotides containing rs25531 and flanking sequences showed greater binding to nuclear extracts when compared with the major LA allele [96].

The work of Hu and coworkers demonstrated that 5-HTTLPR predicts SLC6A4 expression in lymphoblasts by real-time PCR, as mRNA expression varied across the 5-HTTLPR genotypes [40]. Specifically, by virtue of the contribution of the interacting SNP rs25531, a significant fraction (10–25 % depending on ethnicity) of L alleles is low-expressing. Homozygous S/S and LA/LA genotypes were the lowest expressing and the highest expressing genotypes, respectively, and the LA/LA genotype differed significantly from each of the other genotypes (S/S, S/LG, LG/LG, LA/LG) other than S/LA (see Table 4). The two heterozygous genotypes carrying one copy of the S allele did not exhibit the lower expression predicted by the S-dominant model. These results were not in agreement with the originally proposed model [69], thus replacing the S-dominant (L-recessive) effect with a codominant one. A significant effect of 5-HTTLPR genotype on mRNA expression was also demonstrated in lymphoblast cell lines by Bradley et al. [74]. Consistent with the demonstration of Hu and coworkers, the values obtained for the S/L heterozygotes did not significantly differ from those obtained from the S/S homozygotes. These findings also indicated that an additive, three-class mechanism appeared to mediate the effect of the 5-HTTLPR variants on 5-HTT mRNA transcription [74]. In the work of Hu et al., LG had reduced basal transcriptional efficiency relative to LA, while LG and S showed nearly equivalent basal expression in transfected rat neuronal RN46A cells [40] (see Table 4). The functional equivalence of LG and S reporter constructs carrying only the 5-HTTLPR strengthened the mRNA expression results obtained in lymphoblasts by the same authors.
Furthermore, evidence from AP2 LG-based “decoy” DNA depletion assays in cells transfected with the S-, LA-, or LG-HTTLPR reporter plasmids demonstrated that rs25531 creates a binding site for the transcriptional regulator AP2, which exerts a repressive role on SLC6A4 promoter activity. Thus, treatment with oligo-G decoy DNA equalized LG and LA reporter expression. However, owing to the presence of other AP2 sites within the 5-HTTLPR, the S and LA constructs also responded to the oligo-G decoy, although the increase in transcription levels was smaller [40]. More recently, no significant difference in expression between S and LG alleles was observed after transfection of the same RN46A cell line with similar reporter plasmids carrying the 5-HTTLPR from S and LG allelic variants [35] (see Table 4). In contrast to the results of Hu et al., comparison of the silencer activities of the LG and LA alleles in Sakai’s report showed no significant difference [70]. However, it needs to be considered that placing the 5-HTTLPR upstream of a heterologous promoter, as in Sakai’s constructs, might influence reporter gene transcription differently in the same cellular environment. Likewise, functional analysis of the SNP rs25531 performed on the S allele background (SA and SG) by Sakai and coworkers led to similar transcription levels [70] (Table 4).
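
The haplotype view described above (5-HTTLPR length combined with the rs25531 base) can be made concrete by enumerating the possible genotypes and counting the high-expressing alleles in each. The Python sketch below is a deliberate simplification for illustration. It assumes, following the reporter data of Hu et al. discussed here, that only LA is high-expressing and that S and LG behave as low-expressing; treating SG like SA is a further assumption based on the similar transcription levels reported by Sakai and coworkers. All names are hypothetical, and this is not a validated scoring scheme.

```python
from itertools import combinations_with_replacement

# Four haplotypes: 5-HTTLPR length (S or L) combined with the rs25531 base (A or G).
HAPLOTYPES = ("SA", "SG", "LA", "LG")

# Assumption for illustration: only LA is treated as high-expressing.
HIGH_EXPRESSING = {"LA"}


def genotypes():
    """All unordered diploid genotypes that the four haplotypes can form."""
    return list(combinations_with_replacement(HAPLOTYPES, 2))


def high_expressing_copies(genotype):
    """Number of high-expressing alleles (0, 1, or 2) in a genotype."""
    return sum(allele in HIGH_EXPRESSING for allele in genotype)


if __name__ == "__main__":
    for g in genotypes():
        print("/".join(g), high_expressing_copies(g))
```

Counting copies rather than assigning a single label leaves open whether the underlying model is dominant, codominant, or additive, which is the point still debated in the studies summarized above.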

Table 4 Overview of the SLC6A4 promoter functional studies: the combined effects of 5-HTTLPR and SNPs rs25531 and rs25532

The effect of the LG allele variant described by Hu et al. was not reproduced by other studies assessing the influence of the SLC6A4 polymorphic promoter on its own expression in native lymphoblastoid cell lines [77, 98]. Conversely, the first study investigating the relationship between the LA/LG polymorphism and a 5-HTT density index in healthy humans, using 3-(11)C-amino-4-(2-dimethylaminomethylphenylsulfanyl)benzonitrile (DASB) positron emission tomography (PET), demonstrated a significant effect of the LA/LG polymorphism on 5-HTT binding potential in the putamen. Higher 5-HTT binding was associated with carriers of the high-expressing genotype LA/LA; the effect was most significant in the subsample of subjects with Caucasian ancestry [90]. Reimold et al. showed a similar effect in the midbrain of 19 healthy volunteers with the LA/LA genotype [91]. These data were not replicated in a cross-sectional PET study involving a large group of healthy European Caucasian volunteers. That study showed that polymorphic variations in 5-HTTLPR did not affect 5-HTT gene expression as measured by DASB binding potential in the living human brain [94].

In 2008, Wendland et al. described rs25532, a C → T polymorphism located less than 150 nucleotides from rs25531 in the repetitive element designated μ (Fig. 2). According to the Nakamura nomenclature, the rs25532 SNP gives rise to the 16-G allele (LAT, A at rs25531, T at rs25532; GenBank accession number EU035981) and the 14-E allele (SAT, A at rs25531, T at rs25532; GenBank accession number EU035982) [41] (see Table 2). The mutated repeat unit was recently described as the ä repeat unit in the study of worldwide population variation at the SLC6A4 locus by Murdoch and coworkers [31] (see Table 1). Both L and S alleles with the rs25532 SNP were always in phase with A at rs25531 [41]. The functional impact of both L and S rs25532 T alleles was demonstrated by attenuation of reporter gene transcription in RN46A, PC12, and JAR cell lines, with the 5-HTTLPR cloned into a basic vector with no promoter (see Table 4). The effect size of the C → T substitution differed between S and L alleles and between cell lines. The mean luciferase expression was reduced by 15–30 % in S constructs (SAT relative to SAC) and by 25–80 % in L constructs (LAT relative to LAC). The most significant expression differences were observed between L allelic variants in the rat neuronal raphe-derived RN46A, rat adrenal medulla pheochromocytoma PC12, and human placental choriocarcinoma JAR cell lines, with an 80 % decrease of expression in the RN46A neuronal cellular context. Although significant, the difference between SAC- and SAT-dependent transcription levels is much more modest [41]. Expression results from Wendland et al. were in agreement with other studies of this promoter region showing that basal transcriptional activity of the LA allele was higher than that of the SA variant (from 1.8- to about 3-fold) [23, 35, 37, 38, 40, 71].
Overall, functional data from transcriptional assays involving the LG (rs25531) and LT (rs25532) alleles showed that both the minor T (rs25532) and G (rs25531) alleles markedly attenuated the gain-of-function of the L allele relative to the S allele [40, 41].

Global haplotype inference analysis at the SLC6A4 locus revealed that the rs25531 G allele almost always occurred on the 16-repeat background. Conversely, the T allele of rs25532 almost always occurred on the 14-repeat background. The extremely low frequencies of the SG and LT allelic variants suggested that each allele arose on a separate background; the G allele of rs25531 and the T allele of rs25532 never coincided on the same haplotype [31].

Uncommon Allelic Variants

Recently, SLC6A4 gene expression was measured in lymphoblast-derived cell lines from 134 African American females from the Family and Community Health Care Takers Study (FACHS) and its relationship with 5-HTTLPR genotypes analyzed [33]. Genotyping revealed 86 cell lines with the L/L genotype, 26 with the S/S genotype, 13 with the L/XL genotype, 8 with the S/XL genotype, and 1 with the XL/XL genotype.

Ordinal regression analysis demonstrated a significant relationship between genotype and gene expression, supporting a dose-effect model in which increased length of the 5-HTTLPR is associated with increased gene expression (S < L < XL) [33]. Unfortunately, these authors compared 26 cell lines with the S/S genotype, 86 cell lines with the L/L genotype, and an erroneous count of 24 XL that matched neither the number of XL carriers (n = 22) nor the overall number of XL alleles (n = 23).
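The carrier and allele totals quoted above follow directly from the genotype counts reported for the FACHS cell lines [33]. As a minimal sketch of that arithmetic (the function name `tally` and the data layout are illustrative choices made here, not part of the cited study), the counts can be recomputed as follows:

```python
from collections import Counter

# Genotype counts as reported for the 134 FACHS cell lines [33].
genotype_counts = {
    ("L", "L"): 86,
    ("S", "S"): 26,
    ("L", "XL"): 13,
    ("S", "XL"): 8,
    ("XL", "XL"): 1,
}


def tally(genotypes):
    """Return (number of cell lines, allele counts, carrier counts).

    An allele count adds every copy (a homozygote contributes two),
    whereas a carrier count adds each cell line once per distinct allele.
    """
    n_lines = sum(genotypes.values())
    alleles = Counter()
    carriers = Counter()
    for pair, n in genotypes.items():
        for allele in pair:        # both chromosomes
            alleles[allele] += n
        for allele in set(pair):   # each distinct allele once
            carriers[allele] += n
    return n_lines, alleles, carriers


n_lines, allele_counts, carrier_counts = tally(genotype_counts)
print("cell lines:", n_lines)
print("XL carriers:", carrier_counts["XL"])
print("XL alleles:", allele_counts["XL"])
```

Under these counts, the single XL/XL homozygote accounts for the difference between XL carriers and XL alleles, and neither total equals 24.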

As it has been speculated that longer variants act as “super long” alleles with respect to transcriptional activity, this pattern is consistent with the hypothesis that the XL variant will be associated with increased transcriptional efficiency. The XL allele reported in the FACHS is presumably the same as the XL20 allele (about 81 bp longer than the L allele) observed in African American, Japanese, and Chinese populations by Gelernter et al. [28, 48]. Alleles of similar size (XL alleles composed of 18, 19, and 20 repeat units, shown in Table 2) have been reported by others [26, 29, 31, 32, 37, 38, 53, 99]. Avula et al. detected other allelic XL variants (alleles 17-B, 18-A, 18-C, 20-B, 24-A in Table 2) in the heterozygous state with the S allele [51].

Sakai et al. investigated the modulating effect of some allelic variants (official names 15-A, 19-A, 20-A, 22-A in Table 2) on the transcription of the luciferase reporter gene. In the RN46A cell line, insertion of 5-HTTLPRs of these alleles into the SV40 promoter-luc+ transcriptional unit silenced the promoter activity to about 30–35 % of the luciferase activity of the empty vector. Furthermore, all these variants showed a significant decrease of the promoter activity in comparison to canonical S and L alleles [70] (Table 5).

Table 5 Rare allelic 5-HTTLPR promoter variants: in vitro studies

The novel rare alleles XL17 (allele 17-C in Table 2), XL18 (allele 19-C or XL19 in Table 2), and XS11 (see Table 2) were identified in the heterozygous state in members of families (the Vermont Family Study) with “White/non-Latino” ethnicity [35]. Genotyping results demonstrated identity by descent (IBD) for the XL18 and XS11 alleles and showed that the two rare alleles were not the result of a de novo rearrangement or an example of somatic mosaicism. Setting the boundaries of the repeat units according to the Nakamura approach, the XL17 contained one more κ repeat, while the allele reported as XL18 by Ehli et al. contained three ϕ and one ο elements after the first 6 repeat units, effectively resulting in a 19-repeat extra-long allele (allele XL19 in Table 2). It is noteworthy that the XL17 carries the extra repeat downstream of the usual recombination/duplication site.

Both alleles contained the A at rs25531, resulting in XLA17 and XLA19 alleles if using the triallelic nomenclature by Hu et al. (see Table 2). These novel XL alleles had only one AP2 TFBS as a result. The XS11, when compared with the canonical L allele (allele 16-A in Table 2), did not contain the β, γ, δ, ε, and ζ repeats. By virtue of the ζ element deletion, the XS11 allele also contained only one AP2 TFBS. The functional strength of each 5-HTTLPR allele was characterized using luciferase reporter constructs transfected into the rat RN46A cell line. The XL17 exhibited a slight but significant decrease in reporter gene expression in comparison with the L allele, even though the two alleles differed only by an additional κ repeat present in the XL17 allele (Table 5). In contrast to Vijayendran’s report, it has been hypothesized that the increase in promoter length or differential binding of transcription factors most likely gives rise to attenuation in expression, although this effect was not observed with the XL19 carrying two more elements [35]. The XL19 5-HTTLPR was able to drive basal transcription at the same level as the L allele. Furthermore, the drastic difference in genetic architecture and repeat length, generating the loss of 18 TFBS observed in the XS11 allele, did not significantly change the 5-HTTLPR promoter activity compared with the canonical S allele in the RN46A cellular context [35] (Table 5).

We identified two 5-HTTLPR alleles longer than the common L allele together with an extra-short variant allele [37, 38] (additional data are given in Supplementary Material). The extra-long alleles named XL1 and XL2 were both composed of 18 repeat units. As shown in Table 2, the XL1 allele (GenBank accession number KM054528) corresponded to the LJ allele reported by Michaelovsky et al. [30] and to the XL18 allele (official name 18-C in Table 2) described by Avula et al. [51]. Their inserted repeat elements were the exact duplication of the seventh (ο) and eighth (ζ) repeats. In contrast, the XL2 allele (GenBank accession number KM054529) differed both from the European-Caucasian 18-repeat variant (allele 18 EC in Table 2) detected at a low frequency in a sample from Denmark (GenBank accession number EF179203 [36]) and from the 18-1 allele reported by Murdoch et al. [31]. The seventh repeat unit (the η′ repeat) contained one more cytosine in comparison to these 18-repeat alleles (see Table 1). The extra-short allele XS1 (GenBank accession number KM054527) was 23 bp shorter than the S allele; it lacked the κ repeat, contained 13 repeat units, and corresponded to the S* allele already observed by Frisch et al. in a clinical report of a suicidal Libyan Jewish patient [27] (see Table 2). Notably, the XL1, XL2, and XS1 rare alleles occurred in a heterozygous state with the L (XL1, XL2) and S (XS1) alleles, and all of them contained the A at rs25531 and the C at rs25532. To our knowledge, only Ehli et al. [35] and Murdoch et al. [31] described extra-short alleles smaller than the common 14-repeat S allele (i.e., the XS11 and the 13-2 alleles shown in Table 2). The allele-specific regulatory effect on basal transcription was determined after transient transfection of the human JAR cell line with constructs carrying each 5-HTTLPR upstream of the reporter gene (additional data are given in Supplementary Material).
Expression analysis revealed a slight decrease in expression of the XS1 construct in comparison with the S construct, whereas no significant difference in the transcriptional activity of either the XL1 or the XL2 allele versus L was observed [37, 38] (Table 5). The functional results obtained with the XL1 and XL2 alleles reproduced the in vitro findings obtained with the XL19 allele [35], which contained a duplication of the same VNTR region.

Concluding Remarks

In genetic association studies, the SLC6A4 gene has probably been studied more than any other target gene in neurobiology. In the pursuit of specific genes implicated as risk factors in one or more mental illnesses, many genetic association studies examining the 5-HTTLPR genetic variants have been undertaken. The bulk of investigations on 5-HTTLPR were prompted by pioneering studies showing that the L variant of the SLC6A4 promoter is more active than the S allele variant. At present, however, in spite of robust overall meta-analytic evidence of some associations that have been recently confirmed [42–45], results have not been consistent across studies, with some findings remaining controversial even in meta-analyses confirming the overall association. In a concise but notable viewpoint, Murphy et al. pointed out substantial oversights and omissions in scientific publications that reported data from genotyping only the 5-HTTLPR, neglecting the two nearby SNPs or the ethnicity of the subjects genotyped [100]. The exclusion of these factors, which alter 5-HTTLPR allele frequency and functionality, might have influenced study results and, most importantly, altered association studies of drug treatment responses and drug side effects.

In this article, we reviewed in vitro functional studies stemming from neurogenetics/neurobiology papers dealing with the molecular influence of the promoter allelic variants on SLC6A4 expression. The contribution of the SNPs located in the 5-HTTLPR region, which alter the functional effects of the L versus S variant, as well as intriguing results from the rare allelic variants shorter and longer than S and L, have been considered. Functional studies allow the effects of regulatory elements to be explored separately in a more restricted system such as cell lines. The regulatory difference between the two common L and S alleles seems to be tissue-specific, as evidenced by the disparity among functional in vitro studies in neural and non-neural cells. The 5-HTTLPR polymorphism may play a more complex role in neural tissue than in serotoninergic non-neural or heterologous cell systems. Functional promoter analyses could theoretically yield some incorrect results because the expression of the reporter gene is not subject to closed-loop control. Reporter gene assays lack negative feedback, as the reaction product does not activate a feedback loop limiting its own transcription in the same way the 5-HTT protein would. Notably, in the functional assays performed so far, the reporter gene constructs incorporated either the complete or the partial VNTR sequence of the SLC6A4 promoter, resulting in data that are not always comparable. Furthermore, the use of plasmid constructs, which are generally not affected by DNA methylation, could explain why larger effect sizes were observed in such communications. The omission of the two nearby SNPs could be partially responsible for inconsistent results.

It has been speculated that longer genetic variants could act as “extra-long” alleles with respect to transcriptional activity [33]. This pattern is consistent with the hypothesis that the XL variants of the 5-HTTLPR will be associated with increased transcriptional efficiency. The XL allele-specific regulatory effects on basal transcription reported so far, however, did not confirm the paradigm “extra-long allele, higher transcriptional activity.” These transcriptional activity measures were obtained using reporter constructs carrying solely the 5-HTTLPR sequences. Further in vitro functional characterization of XL full-length promoter versions is required to investigate the hypothesized dose effect of the 5-HTTLPR repeats on SLC6A4 expression.

Recent data suggest that the 5-HTTLPR polymorphism does not affect human brain function by changing the availability of the neurotransmitter but by other mechanisms [86]. The complexity of the findings indicates that regulation of 5-HTT may operate at different levels. In vitro promoter activity may rely only on factors involved in gene regulation, while measures of phenotypic effects of the 5-HTTLPR may also depend on receptor binding and neurofunctional associations to 5-HTTLPR that are linked to brain function and behavior. It can be argued that other factors may affect 5-HTT regulation and expression. Additional untranslated regions have been identified as functional elements of SLC6A4 gene expression. At present, from a molecular viewpoint, the functional variations in SLC6A4 expression can no longer be attributed to the classic contribution of S and L as low- and high-expressing alleles but need to be integrated with the contribution of modulating polymorphisms such as rs25531 and rs25532, intronic and 3′-UTR variability, and epigenetic regulatory mechanisms [100]. These elements are generating increasing interest in this research field, as they are likely to have a marked impact on SLC6A4 expression. In our opinion, this research field deserves further consideration.

The main advantage of understanding the genetic variations within the SLC6A4 region lies in the potential consequences for research design, methods, and interpretation. The combination of several methods at structural, functional, and system levels, such as neuroimaging techniques, is now helping to reveal the interaction between molecular genetic mechanisms, environmental factors, brain function, and behavior, as recently demonstrated [101]. Therefore, candidate gene association studies need to take genetic and molecular variations into account to better translate the achievements from studies on neurobiological markers into individual therapeutic treatments.