Introduction

As vestiges of ancient retroviruses, human endogenous retroviruses (HERVs) have been integrated into the human genome and account for ~8% of it. HERVs are scattered across all chromosomes as a result of germline infection by ancient exogenous retroviruses, and they have various functional roles. Some of these roles are specific to disease states; thus, HERV-derived transcripts can sometimes be used as biomarkers [1, 2]. HERVs are classified according to their primer-binding site (PBS), which binds the host tRNA that primes reverse transcription during replication. To date, many HERV families have been identified and their roles characterized [3].

A full-length HERV sequence consists of two long terminal repeats (LTRs), the PBS, and four viral genes (gag, prt, pol, and env). The gag gene encodes structural proteins, prt encodes a protease, pol encodes viral enzymes, and env encodes the surface envelope proteins [4, 5]. HERV elements can influence the host cell in two ways: (1) through the expression of HERV-derived viral genes [6, 7] and (2) through HERV LTRs acting as transcriptional regulatory signals [8]. During evolution, most HERV elements have been truncated or have accumulated insertions, deletions, and substitutions [3]. However, some HERV families have retained their structure in the host genome and still express their viral genes. It is therefore important to classify HERV families by their PBS, identify full-length HERV LTRs and viral genes, and confirm the expression of HERV genes in host cells.
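The canonical full-length layout described above can be represented as an ordered list of annotated features. The sketch below is a hypothetical illustration, not part of the authors' pipeline: the `Feature` class, the feature labels, and the coordinates are assumptions. It checks only that each expected feature occurs exactly once and in 5′→3′ order; it deliberately does not require gag, prt, and pol to be non-overlapping, because retroviral gag, prt, and pol reading frames commonly overlap.

```python
from dataclasses import dataclass

# Canonical 5'->3' order of a full-length provirus, as described in the text.
CANONICAL_ORDER = ("5'LTR", "PBS", "gag", "prt", "pol", "env", "3'LTR")


@dataclass(frozen=True)
class Feature:
    """A hypothetical annotated region of a candidate HERV locus."""
    name: str
    start: int  # 0-based start on the element's sense strand
    end: int    # exclusive end


def has_full_length_layout(features: list[Feature]) -> bool:
    """Return True if every canonical feature occurs exactly once, in 5'->3' order.

    Overlaps are allowed because gag, prt, and pol reading frames can overlap.
    """
    ordered = sorted(features, key=lambda f: (f.start, f.end))
    return tuple(f.name for f in ordered) == CANONICAL_ORDER


if __name__ == "__main__":
    # Made-up coordinates for illustration only (not a real HERV-Y element).
    element = [
        Feature("5'LTR", 0, 450),
        Feature("PBS", 452, 470),
        Feature("gag", 600, 2100),
        Feature("prt", 2050, 2600),
        Feature("pol", 2550, 5900),
        Feature("env", 5850, 7800),
        Feature("3'LTR", 8000, 8450),
    ]
    print(has_full_length_layout(element))
    print(has_full_length_layout(element[:-2]))  # env and 3'LTR missing
```

Sorting by start coordinate means the input list can be in any order; only the genomic order of the features matters.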

Full-length HERVs contain long open reading frames, and their transcripts and proteins are expressed in various cell lines and tissues [7, 9, 10]. Some HERV transcripts or proteins are expressed at higher levels in tumors than in normal tissues [7, 9], and some are more highly expressed in the placenta than in other tissues [11, 12]. In addition, several studies have indicated that HERV LTR sequences can regulate gene expression by acting as alternative promoters [13, 14], enhancers [15, 16], silencers [17], hormone-responsive elements [18], and polyadenylation signals. In this study, we identified HERV-Y in the human genome, characterized its structure, and assessed its expression in 20 human tissues, 11 human brain regions, and a brain sample from a patient with Alzheimer’s disease.

Materials and methods

RNA samples

Total RNA from normal human tissues was purchased from Clontech (Mountain View, CA, USA); each sample contained 500 ng of RNA. Reverse transcription was performed with reverse transcriptase (RT) at 42 °C for 90 min in the presence of an RNase inhibitor (Promega, Madison, WI, USA), as described previously [9].

In silico analysis and primer design

The RetroTector program was used to identify HERV loci in the human genome (GRCh37/hg19). The LTR sequences of each element were aligned using ClustalW in MEGA6 [19]. To avoid amplifying redundant sequences, primers designed against the HERV-Y pol regions (Table S1) were checked by in silico PCR with the UCSC In-Silico PCR tool, and primer pairs that produced multiple products in the human genome were excluded from the study. To distinguish the three loci by RT-PCR and real-time RT-PCR, primers were placed in regions specific to each of the three HERV-Y elements (Figure S1).
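The exclusion rule above (discard primer pairs that yield more than one product) can be illustrated with a toy exact-match model of in silico PCR. This is a simplified stand-in, not the UCSC In-Silico PCR algorithm: it requires perfect primer matches, scans both strands of each template, and counts any forward/reverse site pair within a hypothetical `max_size` as a product. The function names, primers, and templates are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, sub: str) -> list[int]:
    """0-based start positions of every exact, possibly overlapping, match."""
    hits, pos = [], seq.find(sub)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(sub, pos + 1)
    return hits


def predicted_products(fwd: str, rev: str, templates: dict[str, str],
                       max_size: int = 4000) -> list[tuple[str, str, int, int]]:
    """Return (template, strand, start, end) for every predicted amplicon.

    On each strand, the forward primer must match exactly as written and the
    reverse primer must match as its reverse complement, downstream of the
    forward site and within max_size bases (a hypothetical cutoff).
    """
    fwd, rev = fwd.upper(), rev.upper()
    rev_site = reverse_complement(rev)
    products = []
    for name, seq in templates.items():
        seq = seq.upper()
        for strand, s in (("+", seq), ("-", reverse_complement(seq))):
            for start in find_all(s, fwd):
                for site in find_all(s, rev_site):
                    end = site + len(rev_site)
                    if site >= start + len(fwd) and end - start <= max_size:
                        products.append((name, strand, start, end))
    return products


def is_specific(fwd: str, rev: str, templates: dict[str, str],
                max_size: int = 4000) -> bool:
    """Keep a primer pair only if it predicts exactly one product."""
    return len(predicted_products(fwd, rev, templates, max_size)) == 1


if __name__ == "__main__":
    # Toy templates; the primers and sequences are made up for illustration.
    fwd, rev = "GATTACAGAT", "CTCCTTGTAG"
    locus = "TTTTT" + fwd + "GGGGGGGGGG" + reverse_complement(rev) + "TTTTT"
    print(is_specific(fwd, rev, {"locus1": locus}))
    print(is_specific(fwd, rev, {"locus1": locus, "locus2": locus}))
```

When the same amplicon occurs at two loci, the pair is rejected, which mirrors the reason for designing locus-specific primers for each HERV-Y element.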

Phylogenetic analysis

Neighbor-joining trees were generated using MEGA6 [19] with 100 bootstrap replicates, and the percentage of bootstrap replicates supporting each branch is indicated at the corresponding node. Sequences for HERV-I on chromosome 7 (AC007276), HERV-T on chromosome 19 (AC078899), HERV-E on chromosome 19 (AC010329), HERV-F on chromosome 11 (AC123788), HERV-W on chromosome 7 (AC007566), HERV-H on chromosome 2 (AC020550), HERV-FRD on chromosome 6 (AL136139), HERV-S on chromosome X (AC233296), HERV-L on chromosome 16 (AC003003), and HERV-M on chromosome 7 (AC004614) were retrieved from GenBank. HERV-R on chromosome 7, known as ERV3-1, was obtained from the UCSC Genome Browser. HERV-K on chromosome 6 was represented by the human-specific HERV-K109 sequence described previously [20]. The RetroTector program was used to predict the HERV structure from these sequences, and the pol RT region was identified by a BLASTx search [21].
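For readers unfamiliar with the method, the core of the neighbor-joining algorithm can be sketched in Python. This is a textbook version operating on a precomputed distance matrix, not MEGA6's implementation: it omits the distance model used to compute pairwise distances from the pol RT alignment, and the five-taxon matrix is a toy example rather than HERV data. The function name and the Newick output format are illustrative choices.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    names: taxon labels (at least three).
    dist:  dict mapping frozenset({a, b}) -> pairwise distance for every pair.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    d = dict(dist)  # copy; merged nodes are added as the tree is built

    def D(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        # r_i: total distance from node i to all other current nodes.
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(p[0], p[1]) - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * D(i, j) + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"  # the Newick subtree doubles as u's label
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (D(i, k) + D(j, k) - D(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Resolve the final three nodes as a trifurcation, giving an unrooted tree.
    a, b, c = nodes
    la = 0.5 * (D(a, b) + D(a, c) - D(b, c))
    lb = 0.5 * (D(a, b) + D(b, c) - D(a, c))
    lc = 0.5 * (D(a, c) + D(b, c) - D(a, b))
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


if __name__ == "__main__":
    # Toy distances for five hypothetical taxa (not HERV pol RT data).
    pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
             ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
             ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
    dist = {frozenset(p): v for p, v in pairs.items()}
    print(neighbor_joining("abcde", dist))
```

With this toy matrix the recovered leaf branch lengths are a = 2, b = 3, c = 4, d = 2, and e = 1. Bootstrap support of the kind reported here is obtained by resampling alignment columns with replacement, recomputing distances, rebuilding the tree, and counting how often each clade recurs across replicates; that outer loop is not shown.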

Quantitative real-time RT-PCR

HERV-Y transcripts were detected by quantitative real-time RT-PCR using the primers listed in Table S1 on a Rotor-Gene Q instrument (QIAGEN, Hilden, Germany) with a QuantiTect SYBR Green PCR Kit (QIAGEN). For normalization, the ACTB gene (human β-actin; NP_001092.1) was amplified. Real-time RT-PCR amplification of the HERV-Y elements and the ACTB gene was conducted for 45 cycles of 95 °C for 10 s, 58 °C for 15 s, and 72 °C for 15 s, followed by melting curve analysis at 55–99 °C for 30 s. In each reaction, each primer set yielded a single, sharp melting peak, indicating amplification of a single specific product [9]; no amplification was observed in no-template controls, and primer dimers were not detected. All samples were amplified in triplicate to ensure reproducibility.
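The text states that expression was normalized to ACTB but does not specify the calculation. A common choice is the comparative Ct (2^−ΔCt) method, sketched below under two assumptions: triplicate Ct values are averaged, and both the HERV-Y and ACTB amplicons amplify with close to 100% efficiency. The function names and Ct values are hypothetical and are not data from this study.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """ΔCt: mean Ct of the target (e.g. a HERV-Y pol amplicon) minus mean Ct of ACTB."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(target_cts, reference_cts):
    """2^-ΔCt, assuming ~100% amplification efficiency for both amplicons."""
    return 2.0 ** -delta_ct(target_cts, reference_cts)


def fold_change(sample, calibrator):
    """2^-ΔΔCt of one sample versus a calibrator sample.

    Each argument is a (target_cts, reference_cts) pair of triplicate Ct lists.
    """
    ddct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** -ddct


if __name__ == "__main__":
    # Hypothetical triplicate Ct values, for illustration only.
    tissue_1 = ([24.1, 24.3, 24.2], [17.0, 17.1, 16.9])
    tissue_2 = ([27.9, 28.1, 28.0], [17.2, 17.0, 17.1])
    print(relative_expression(*tissue_1))
    print(fold_change(tissue_1, tissue_2))
```

Because each cycle of difference corresponds to a twofold difference only at full efficiency, efficiency-corrected models are preferable when primer efficiencies differ markedly between the target and reference amplicons.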

Results

Structure analysis of three HERV-Y elements

RetroTector was used to screen the human genome for HERVs, and full-length HERVs with a PBS for tRNA-Tyr were selected (Figure 1). This PBS is homologous to that of the equine endogenous retrovirus (EqERV), which uses tRNA-Tyr to prime complementary DNA synthesis [22]. A HERV PBS that binds tRNA-Tyr had not previously been described in the human genome; we therefore provisionally named this new HERV family HERV-Y.

Fig. 1

LTR sequences of three HERV-Y elements and the reverse complements of their PBS sequences. The 5′ and 3′ LTRs of each element were identified and aligned against each other; dashes represent missing residues. The reverse complements of the PBS sequences were aligned with tRNA-Tyr sequences. Putative promoter and polyadenylation sites are boxed.

We identified the loci and structure of three full-length HERV-Y elements (Figure 2). In each element, the PBS follows the 5′ LTR, lying between the 5′ LTR and the gag gene, and is highly complementary to tRNA-Tyr: each PBS matches at least 15 of the 18 nucleotides at the 3′ end of the human tyrosine tRNA. The elements were numbered according to chromosome number and position, and the full-length HERV-Y elements are located on chromosomes 8 and 13 (Table 1). All three HERV-Y elements have a 5′ LTR-gag-prt-pol-env-3′ LTR structure. The full-length HERV-Y elements are 8.4–9.6 kb long, and their LTRs range in size from 0.2 to 0.5 kb (Table 1). In HERV-Y101, one potential promoter site (ATAAAT) was identified, and the PBS of this element matches 16 of the 18 bases at the 3′ end of the human tyrosine tRNA. Both a putative promoter (ATAAAT) and a polyadenylation site (AATAAA) were detected in the HERV-Y102 and HERV-Y103 LTR sequences. These LTRs could therefore regulate transcription of adjacent genes [8, 11, 23]; the roles of HERV-Y LTR sequences and viral genes remain to be elucidated in further studies.
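The two sequence checks described above, counting how many of the 18 3′-terminal tRNA-Tyr nucleotides the PBS matches and locating ATAAAT and AATAAA motifs in an LTR, can be sketched as follows. This is a hypothetical illustration rather than the RetroTector procedure: it assumes an ungapped, position-by-position comparison of the reverse-complemented PBS with the tRNA 3′ end (as in the Fig. 1 alignment), scans only the strand it is given for motifs, and uses made-up placeholder sequences instead of real HERV-Y or tRNA-Tyr sequences.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """DNA reverse complement; U (RNA) is treated like T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pbs_match_count(pbs: str, trna: str, window: int = 18) -> int:
    """Count matches between the reverse-complemented PBS and the tRNA 3' end.

    Ungapped comparison over the last `window` bases of the tRNA.
    """
    target = trna.upper().replace("U", "T")[-window:]
    probe = reverse_complement(pbs)
    if len(probe) != len(target):
        raise ValueError("PBS length must equal the compared tRNA window")
    return sum(p == t for p, t in zip(probe, target))


def motif_positions(seq: str, motif: str) -> list[int]:
    """0-based starts of every (possibly overlapping) motif occurrence."""
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1)
            if seq[i:i + len(motif)] == motif]


if __name__ == "__main__":
    # Placeholder sequences only; NOT the real tRNA-Tyr or HERV-Y sequences.
    trna_tail = "ACGTACGTACGTACGCCA"      # hypothetical 18-nt tRNA 3' end
    pbs = reverse_complement(trna_tail)   # perfectly complementary PBS
    pbs = "A" + pbs[1:5] + "C" + pbs[6:]  # introduce two mismatches
    matches = pbs_match_count(pbs, trna_tail)
    print(matches, matches >= 15)         # the >= 15 of 18 criterion in the text
    ltr = "GGCATAAATGCCTTAATAAAGC"         # hypothetical LTR fragment
    print(motif_positions(ltr, "ATAAAT"), motif_positions(ltr, "AATAAA"))
```

An exact-match scan like this only flags candidate motifs; whether an ATAAAT or AATAAA site is functional would still need to be tested experimentally.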

Fig. 2

Structural analysis of full-length HERV-Y elements at specific loci on human chromosomes. The depicted structures were analyzed with the RetroTector10 program using HERV-Y sequences from the human genome hg19 assembly. These HERVs harbor a 5′ LTR-gag-prt-pol-env-3′ LTR structure. Subdomains detected by RetroTector10 are shown in three reading frames in the upper diagrams. The two LTRs, four viral genes, and the PBS are indicated below each diagram.

Table 1 Novel HERV-Y elements identified and characterized in this study

Phylogenetic analysis of three HERV-Y elements

Using phylogenetic analysis, we compared HERV-Y with other HERV families (Figure 3). The phylogenetic tree was constructed by the neighbor-joining method using HERV family sequences obtained from the GenBank database. The three HERV-Y elements grouped with HERV-I, -T, -E, and -R, consistent with prior reports [24, 25]; these families are classified as gammaretrovirus-like [26]. HERV-L and HERV-S, which are classified as spumavirus-like [4, 26], formed a closely related outgroup to the cluster containing HERV-Y [3, 27]. HERV-K (also known as HML) is classified as betaretrovirus-like and formed an outgroup in the HERV phylogeny [3, 26, 28]. HERV-K and HERV-M were also classified independently. Overall, the dendrogram illustrates the phylogenetic relationships between the HERV-Y elements and other HERV families.

Fig. 3

Phylogenetic tree based on the pol RT sequences of HERV-Y and other HERV families. The tree was constructed by the neighbor-joining method, and bootstrap evaluation of the branching patterns was performed with 100 replicates. Branch lengths correspond to genetic distance.

Expression analysis of HERV-Y

To investigate the possible role of HERV-Y, we examined the expression of the HERV-Y pol genes. In the HERV genome, the pol gene encodes RT and integrase (IN), which are responsible for synthesizing viral DNA from viral RNA and integrating that DNA into the host genome, respectively [29, 30]; pol is therefore essential for viral activity. HERV-Y101 pol was expressed in all examined tissues, and HERV-Y102 pol and HERV-Y103 pol were predominantly expressed in the brain (Figure 4). Whereas the HERV-W element shows a placenta-specific expression pattern [12], the HERV-Y elements were expressed predominantly in the brain. HERV-Y101 pol was expressed ubiquitously, with the highest levels in the cerebellum and whole brain (Figure 4a, left). HERV-Y102 pol was predominantly expressed in the cerebellum, fetal brain, and prostate (Figure 4b, left). HERV-Y103 pol was also expressed ubiquitously, with the highest levels in the adrenal gland and cerebellum (Figure 4c, left). The expression of HERV-Y pol genes in various tissues suggests that they may have biological roles in the host. Because HERV-Y pol genes were most highly expressed in the brain, we next assessed their expression pattern within the brain. Among the 11 brain regions and the Alzheimer’s disease brain sample analyzed, HERV-Y pol genes were expressed at higher levels in the pons than in other brain regions (Figure 4, right).

Fig. 4

Quantitative real-time RT-PCR analysis of HERV-Y pol genes ((a) HERV-Y101, (b) HERV-Y102, (c) HERV-Y103) in 20 normal human tissues (left) and in 11 human brain regions plus an Alzheimer’s disease brain sample (right). Relative expression levels of each HERV-Y gene were normalized to the expression level of the human ACTB gene.

Discussion

Here, we report the identification and expression analysis of HERV-Y, a new multicopy HERV family in the human genome. In other mammalian genomes, ERVs predicted to use tRNA-Tyr have been detected by in silico analysis in cow and horse: bovine endogenous retrovirus (BoERV) and EqERV, respectively [22, 31]. When identifying retroviral sequences in a host genome, locating the LTRs is important because they play a key role in initiating transcription. A specific host-derived tRNA binds to the complementary PBS region, located between the 5′ LTR and the gag region. We used RetroTector to identify ERVs that use tRNA-Tyr and then predicted the genomic sequence and structure of these elements in the human genome. Viral LTR sequences have promoter activity that drives viral RNA synthesis, and ERV LTRs can initiate viral gene expression within the host cell as well as expression of adjacent genes [15, 23]. We identified the LTR sequences, aligned the 5′ and 3′ LTRs, and then identified putative promoter and polyadenylation signals. The promoter activity of HERV-K [32], HERV-H [33], and HERV-W [17] LTRs has been confirmed in in vitro reporter systems, and we plan to investigate the promoter activity of HERV-Y LTRs in future studies.

In this study, we identified three HERV-Y elements in the human genome: HERV-Y101 and HERV-Y102 on chromosome 8 and HERV-Y103 on chromosome 13. These elements have the typical retroviral structure at the genomic level, and their mRNA expression patterns were determined by real-time RT-PCR. The three HERV-Y elements have well-conserved genomic elements of similar length, and they grouped with HERV-I, -T, -E, and -R in the phylogenetic tree. This phylogeny indicates that the HERV-Y elements should be classified as gammaretrovirus-like class I HERVs [4, 26]. Class I HERVs share similar expression patterns and are preferentially expressed in the brain and reproductive tissues [4]. On this basis, we speculate that the three HERV-Y elements could play roles in the host similar to those of other class I HERVs.

The locations and activity of HERVs can be detected by various in silico techniques. HERV activity can also be measured and characterized experimentally through RT or env gene activity [34–36]. An RT-PCR-based method was first introduced by Silver et al. [37], and subsequent methods have used real-time RT-PCR [9, 34] to detect RT or env gene activity. In this study, we used in silico analysis to identify HERV-Y elements in the human genome and then assessed the expression of their pol genes, which showed tissue-specific expression patterns.

Many HERV elements have been identified, and their roles in the host have been characterized. Some HERV elements may play pathological roles in cancer or other diseases and could be useful as biomarkers. Among class I HERVs, members of the HERV-E family are not expressed in normal lung and liver tissue but are expressed in the HepG2 and A549 cell lines [38]. HERV-R is expressed in placental tissue as well as in some cancer tissues and cancer cell lines [39]; it is highly expressed in normal human placenta and in liver and lung cancer tissues [9]. HERV-F transcripts have been detected in human leukemia cell lines [40] and are predominantly expressed in lung cancer [34]. HERV-H has also been detected in leukemia cells [40] and in the plasma of people with multiple sclerosis [41]. Such tissue-specific HERV expression suggests that HERVs might have biological functions. For example, HERV-W, which encodes syncytin, is preferentially expressed in the human placenta, where it mediates cytotrophoblast fusion during placental development [12].

HERV-K, one of the best-studied HERV families, is highly active within the host genome and is classified as a betaretrovirus-like class II HERV. HERV-K activity has been detected in cancer tissues, and HERV-K is currently used as a cancer biomarker [42]. HERV-S is closely related to ERV-L and is classified as a spuma-like class III HERV [4, 26, 43]; it is located on the X chromosome [3]. The HERV-S pol gene is expressed in the brain and thymus, as well as in some cancer cell lines. The HERV-S pol gene is well conserved in non-prosimian primates [27]. HERV-V, a recently identified HERV on chromosome 19, includes simian-specific elements and has undergone purifying selection during evolution [44]. Over the past two decades, many HERV families and elements have been identified and classified, and their functions characterized. The HERV-Y elements are full-length and are expressed in various human tissues; their expression in cancer and their role in the brain should be investigated in future studies.

HERV expression has been associated with brain-related diseases such as schizophrenia [45, 46], bipolar disorder [47], and multiple sclerosis [48]. Our results suggest that HERV-Y elements tend to be highly expressed in the brain, which motivated us to assess HERV-Y expression in 11 brain regions and an Alzheimer’s disease brain sample. HERV-Y elements were highly expressed in the pons and thalamus (Figure 4, right), which are known to be crucial relay nodes for neuronal signals [49]. This expression pattern resembles that reported for the HERV-W element [50]. The pons is part of the brainstem and controls basic vital functions such as heartbeat, breathing, and blood pressure. Importantly, the pons carries signals from the cerebrum to the cerebellum and medulla and passes sensory signals on to the thalamus [49, 51]. These data suggest a possible role for HERV-Y elements in the transmission of sensory signals.

To our knowledge, this study is the first to report the HERV-Y family in the human genome. We determined the locations and genomic structure of these elements and, using real-time RT-PCR, confirmed HERV-Y expression in 20 tissues, 11 brain regions, and a brain tissue sample from a patient with Alzheimer’s disease. In summary, we identified three new HERV-Y elements (HERV-Y101, -Y102, and -Y103) on chromosomes 8 and 13. They were expressed ubiquitously, with particularly high expression in the pons. These results provide insight into the biological functions of HERV-Y in these tissues.