Introduction

Today’s mammalian sex chromosomes evolved from an ordinary autosomal pair, which became differentiated and specialized as the Y chromosome progressively degraded, losing practically all of its active genes except those with a selectable male-specific function, such as sex determination and spermatogenesis [1]. Sex is determined by a master switch, SRY (sex-determining region on the Y chromosome), which was isolated in 1990 [2] and was found to be necessary and sufficient for testis development [3]. The identity of SRY as the human and mouse testis-determining factor was confirmed by mutation analysis and transgenesis. Mutation or deletion of the SRY gene results in male-to-female sex reversal [4–6], whereas its acquisition by X–Y exchange causes female-to-male sex reversal. Moreover, a transgene with the human SRY open reading frame inserted into the mouse regulatory sequence caused XX sex reversal in mice [7].

The human SRY gene, which encodes 204 amino acids, is intronless and has no recognizable features other than an 80-amino-acid DNA-binding and DNA-bending domain, called the high-mobility group (HMG) box because of its homology with the high-mobility group proteins. Alignment of the SRY 5′ flanking sequence across 10 mammalian species reveals overall poor conservation, except for an element near the transcription start point [8]. However, different reports suggest that SRY action is conserved across species, despite its poor conservation outside the HMG box [1, 9]. Because the SRY sequence is poorly conserved between species outside the HMG box, it has been strongly suggested that the SRY gene evolved rapidly [9–11], but there is still no consensus on the mechanism of its evolution or on the regulation of its expression. To address this issue, we reanalyzed the human SRY gene by searching the database for homologous sequences and investigated SRY gene evolution and expression.

In this article, we describe how the SRY gene emerged as a hybrid between a portion of the first exon of the DiGeorge syndrome critical region gene 8 (DGCR8) and the SRY box-3 (SOX3) gene. We also describe the identification of a regulatory sequence in the SRY promoter, a binding motif for transcription factor CP2 (TFCP2), and explain why it is a good candidate promoter element. TFCP2 is a component of the stage selector protein complex involved in the preferential expression of the gamma-globin genes of hemoglobin in fetal erythroid cells [12, 13].

Furthermore, we demonstrate that TFCP2 binds directly to the SRY gene and regulates its expression in a dose-dependent manner.

Materials and methods

Plasmid construction

The entire open reading frame of human TFCP2 was amplified by PCR using cDNA from total RNA isolated from NT2/D1 cells and cloned into pcDNA3.1 (Invitrogen, NV Leek, Netherlands). RNA interference (RNAi) plasmids were constructed using pSilencer 1.0-U6 (Ambion, Austin, TX, USA). The RNAi targeting sequence for TFCP2 was selected using Ambion web-based software.

The target sequence (5′-GCAAGAAGAGTCGAGTTTG-3′) corresponded to nucleotides 150–168 from the start codon of human TFCP2 mRNA. Human SRY promoter fragments were generated by PCR amplification. PCR primers containing KpnI and XhoI sites were used to insert the SRY promoter fragment into pGL3 Basic (Promega, Madison, WI, USA). Putative TFCP2 binding sites in the SRY promoter were mutated using QuikChange II site-directed mutagenesis (Stratagene, La Jolla, CA, USA).

Quantitative real-time PCR

NT2/D1 cells were transfected with 5 μg of TFCP2-pcDNA3.1 or TFCP2-RNAi using FuGENE6 transfection reagent (Roche, Indianapolis, IN, USA). Empty pcDNA3.1 or pSilencer 1.0-U6 was used as a control. Forty-eight hours after transfection, total RNA was isolated with TRIzol Reagent (Invitrogen). cDNA was synthesized from 2 μg of total RNA using SuperScript III Reverse Transcriptase (Invitrogen). Real-time PCR was performed using SYBR Premix Ex Taq (Takara, Shiga, Japan) on the AB 7500 real-time PCR system (Applied Biosystems, Foster City, CA, USA), and expression was normalized to that of glyceraldehyde-3-phosphate dehydrogenase (GAPDH). The primers used in the real-time PCR were as follows: SRY sense 5′-GCCGAAGAATTGCAGTTTGC-3′ and antisense 5′-GTTGATGGGCGGTAAGTGGC-3′; TFCP2 sense 5′-ATAGCATGAGTGATGTCCTT-3′ and antisense 5′-TAGGGTTTCATCATGGAGTT-3′.
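
Normalization to a reference gene such as GAPDH is commonly summarized with the comparative Ct (2^−ΔΔCt) calculation. The sketch below assumes that approach and roughly 100% primer efficiency; the text does not state which quantification method was used, and the function name and Ct values are hypothetical, chosen only for illustration.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the comparative Ct method.

    Assumes (not stated in the text) that each PCR cycle doubles the product.
    """
    # Normalize the target gene to the reference gene in each sample.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control sample.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold one cycle later
# after knockdown, while the reference gene is unchanged.
fold = relative_expression(26.0, 18.0, 25.0, 18.0)
print(f"relative expression: {fold:.2f}")
```

A value below 1 indicates lower expression in the treated sample than in the control; a value above 1 indicates higher expression.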

Luciferase assay

NT2/D1 cells were transfected with 100 ng of SRY promoter reporter constructs along with 100 ng of either TFCP2-pcDNA3.1 or empty vector. As an internal control, 0.4 ng of pRL-SV40 vector (Promega) was cotransfected. After 48 h, cells were lysed and luciferase activity was assessed using the Dual-Luciferase Reporter Assay System (Promega).
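
In a dual-luciferase design of this kind, the reporter (firefly) signal is usually divided by the internal-control (Renilla, from pRL-SV40) signal in each well, and the result is expressed as fold activity over the empty-vector control. The following is a minimal sketch of that arithmetic; the function name and all readings are hypothetical, and the exact calculation used by the authors is not given in the text.

```python
from statistics import mean


def fold_activity(firefly, renilla, firefly_ctrl, renilla_ctrl):
    """Mean normalized reporter activity relative to the control wells."""
    if len(firefly) != len(renilla) or len(firefly_ctrl) != len(renilla_ctrl):
        raise ValueError("each firefly reading needs a matching Renilla reading")
    # Per-well normalization to the cotransfected internal control.
    sample_ratios = [f / r for f, r in zip(firefly, renilla)]
    control_ratios = [f / r for f, r in zip(firefly_ctrl, renilla_ctrl)]
    return mean(sample_ratios) / mean(control_ratios)


# Hypothetical luminometer readings (arbitrary units), two wells per condition.
print(fold_activity([300.0, 330.0], [100.0, 110.0],
                    [100.0, 120.0], [100.0, 120.0]))
```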

Electrophoretic mobility shift assay

The TFCP2 in vitro translation expression vector, tagged with c-Myc, was produced by inserting the entire open reading frame into the pGBKT7 vector (Clontech, Palo Alto, CA, USA). Translation was performed in a reticulocyte lysate-based in vitro translation system (TNT® Quick Coupled Transcription/Translation Systems, Promega). Single-stranded complementary oligonucleotides were annealed and end-labeled with [γ-32P] ATP using T4 polynucleotide kinase. Labeled oligonucleotide and 3 μl of in vitro translated TFCP2 were incubated for 45 min at room temperature with 20 μl of binding buffer: 10 mM HEPES (pH 7), 50 ng/μl poly(dI-dC), 10% glycerol, 100 mM KCl, 0.6 mM MgCl2, 0.25 mM DTT, and 0.1 mM EDTA. For competition or supershift assays, unlabeled oligonucleotide competitor (250-fold excess) or 3 μl of c-Myc antibody (no dilution, C3956, Sigma, St Louis, MO, USA) was added 45 min before the labeled oligonucleotide. Samples were electrophoresed on a 5% (w/v) polyacrylamide gel for 1–2 h at 100 V, the gels were dried, and autoradiography was performed using a Mac BAS1500 (Fuji Film, Tokyo, Japan). The sequences of the oligonucleotides were as follows (only forward strands shown): CP2-2 5′-GTAACAAAGAATCTGGTAGAAGTGAGTTTTGG-3′, CP2-3 5′-GTTTCGAACTCTGGCACCTTTCAATTTTGTCG-3′ (bold underline shows LBP-1CS motif), CP2-2M 5′-GTAACAAAGAACGCCCTAGAAGTGAGTTTTGG-3′ (bold underline shows mutation of LBP-1CS motif).
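
Because the bold underlining of the probe sequences does not survive in plain text, the motif and the mutated bases can be recovered by comparing the probes directly. The sketch below uses the three forward-strand sequences listed above and the TCTGG core given in the Results; the helper names are illustrative only.

```python
PROBES = {
    "CP2-2": "GTAACAAAGAATCTGGTAGAAGTGAGTTTTGG",
    "CP2-3": "GTTTCGAACTCTGGCACCTTTCAATTTTGTCG",
    "CP2-2M": "GTAACAAAGAACGCCCTAGAAGTGAGTTTTGG",
}
MOTIF = "TCTGG"  # core motif as written in the Results


def has_motif(seq, motif=MOTIF):
    """True if the forward strand of the probe contains the motif."""
    return motif in seq


def mismatches(wild_type, mutant):
    """List (0-based index, wild-type base, mutant base) for each difference."""
    if len(wild_type) != len(mutant):
        raise ValueError("probes must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]


for name, seq in PROBES.items():
    print(name, len(seq), "nt, motif present:", has_motif(seq))

# Bases changed in the mutant probe relative to CP2-2.
print(mismatches(PROBES["CP2-2"], PROBES["CP2-2M"]))
```

The comparison shows that CP2-2M differs from CP2-2 only at the five motif positions, where TCTGG is replaced by CGCCC.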

Chromatin immunoprecipitation assay

NT2/D1 cells were plated in 90-mm dishes, cultured at 37°C, and then transfected with 5 μg of Flag-tagged TFCP2 plasmid using FuGENE6 transfection reagent (Roche). After 48 h, the transfected cells were split into dishes and selected with G418 for two weeks. Single-cell clones were selected and cultured. Transfected NT2/D1 cells (5 × 10^7) were treated with 1% formaldehyde for 15 min at 37°C, washed with PBS buffer containing protease inhibitors, collected by centrifugation at 4°C, resuspended in an SDS lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris–HCl) containing protease inhibitors, incubated on ice for 10 min, and sonicated. A 60-μl sample of sheared chromatin was treated with 2 μl of Flag antibody (no dilution, F1804, Sigma). The chromatin immunoprecipitation (ChIP) assay was carried out using the One Day ChIP Kit (Diagenode, Sart-Tilman Liege, Belgium) according to the manufacturer’s instructions. PCR amplification of the SRY promoter yielded a 117-bp amplicon corresponding to nucleotides −130 to −14 of the SRY promoter; the primers were sense 5′-GCGGAGAAATGCAAGTTTCA-3′ and antisense 5′-GAGAGTGCGACAAAATTGAA-3′. PCR was performed under the following conditions: 95°C for 3 min, followed by 30 cycles of 95°C for 30 s, 60°C for 30 s, and 72°C for 1 min, with a final extension at 72°C for 10 min.
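
The stated amplicon size can be checked against the stated promoter coordinates, and the antisense primer can be converted to its top-strand orientation for comparison with the promoter sequence. This is a small sketch, assuming the coordinates are inclusive and both lie upstream of the reference point; the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def amplicon_length(start, end):
    """Length of an amplicon with inclusive coordinates on one side of the origin."""
    if (start < 0) != (end < 0):
        raise ValueError("coordinates must not span the reference point")
    return abs(end - start) + 1


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


print(amplicon_length(-130, -14))                  # size implied by the coordinates
print(reverse_complement("GAGAGTGCGACAAAATTGAA"))  # antisense primer, top-strand view
```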

Results

Homology search of human SRY

Previous reports established that the HMG box of SRY is closely conserved with the HMG box of SOX3 [14] and that SOX3 shows good identity with SRY inside the HMG box but not outside it (Fig. 1b); thus, the SRY gene may have evolved from the ancestral SOX3 gene. In this study, we investigated how human SRY emerged by searching the database for homologous sequences, using a query that combined the SRY open reading frame with the amino acids encoded by the non-coding sequence upstream of its start codon. Using NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) to search with the amino acid sequences of SRY from human and other mammalian species, we found that only DGCR8, known for its role in microRNA biogenesis [15], has homology to the region extending from the N-terminal side of the start codon to the N-terminal side of the HMG box of SRY (Fig. 1a). In particular, the Pro-Phe-Asn-Phe residues on the N-terminal side and the Thr-Glu-Ser-Cys-Ser-Lys residues on the C-terminal side, together with Ser and Leu residues, were highly conserved. Furthermore, we found that the nucleotide sequence of DGCR8 has homology to the human (41%) and mouse (23%) SRY genes (Fig. 1c), and that its first exon is GC-rich, suggesting that it is a good candidate for a promoter. Therefore, we hypothesized that the SRY gene is a hybrid gene generated by the insertion of part of the DGCR8 gene upstream of the HMG box of the SOX3 gene. Accordingly, this inserted sequence constituted both a controlling element and a coding sequence for the newly emerged hybrid SRY gene.
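
Percentages such as those quoted for the DGCR8 and SRY nucleotide sequences are typically computed as the fraction of identical positions in an alignment. The sketch below shows one common definition, which ignores columns containing a gap; the text does not say how the 41% and 23% values were calculated, so this is an assumption, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions over the ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy pre-aligned sequences, for illustration only.
print(percent_identity("ACGT-A", "ACCTTA"))
```

Other conventions divide by the full alignment length or by the shorter sequence, which gives lower values for gapped alignments, so the definition should be stated alongside any reported percentage.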

Fig. 1

Homology search for human SRY. a The amino acid sequence of the SRY flanking region is conserved in DGCR8. b The HMG box is highly conserved in SRY and SOX3. Gray boxes show conserved amino acids. c A comparison of nucleotide sequences between DGCR8 and SRY in human and mouse. Boxes indicate nucleotides having homology

Search for a motif common to mRNA of DGCR8 and SRY genes

To test our hypothesis that a new SRY promoter region was created by the insertion of a part of DGCR8 into the sequence upstream of the HMG box of the SOX3 gene, we used GENETYX-SV/RC Ver. 7.08 to search for a transcription factor binding motif common to DGCR8 mRNA and the upstream regions of human SRY and mouse Sry. Motif analysis revealed that a binding motif (LBP-1CS; A/TCTGG) for leader-binding protein 1 (LBP-1), also known as TFCP2, is present in the upstream sequence of both the human and mouse SRY genes. In human SRY, two putative binding sites (TCTGG) were located at −3 and −45 bp from the sequence encoding Pro-Phe-Asn-Phe, the amino acids conserved between human SRY and DGCR8 (Fig. 2). In mouse Sry and human DGCR8, the CP2 binding site was located at −20 bp from the transcription initiation site (ATG start codon) [16] and at −35 bp from Pro-Phe-Asn-Phe, respectively (Fig. 2). Previous reports identified four transcription initiation sites [16–19], one of which lies immediately next to the TCTGG site. Therefore, we surmise that TFCP2 might be a transcription factor for the SRY gene.
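
A motif search of this kind can be illustrated with a short scan of both strands that reports each hit relative to a reference position such as a start codon. This sketch is not the GENETYX procedure; it assumes that "A/TCTGG" means A or T followed by CTGG, and the example sequence is a hypothetical toy, not a real promoter.

```python
import re

# Lookahead so that overlapping occurrences are also reported.
MOTIF = re.compile(r"(?=([AT]CTGG))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(seq, anchor):
    """Return sorted (offset, strand, match) tuples.

    offset is the forward-strand index of the leftmost base of the hit minus
    the anchor index, so negative values lie upstream of the anchor.
    """
    seq = seq.upper()
    hits = []
    for m in MOTIF.finditer(seq):
        hits.append((m.start() - anchor, "+", m.group(1)))
    for m in MOTIF.finditer(reverse_complement(seq)):
        # Convert the reverse-strand position back to forward-strand coordinates.
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append((start - anchor, "-", m.group(1)))
    return sorted(hits)


toy = "GGTCTGGAAACCAGTATG"          # hypothetical sequence ending in ATG
for hit in find_motifs(toy, toy.index("ATG")):
    print(hit)
```

Reporting offsets relative to a fixed anchor makes hits from different sequences directly comparable, which is the kind of comparison drawn between human SRY, mouse Sry, and DGCR8 in Fig. 2.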

Fig. 2

Search for a motif common to mRNA of DGCR8 and SRY genes. The upstream regions of human SRY and mouse Sry and the human DGCR8 mRNA share the same LBP1-CS motif, shown in boxes. Amino acids in bold are conserved between human SRY and DGCR8

Expression of SRY is regulated by TFCP2

In order to investigate the relationship between TFCP2 regulation and SRY expression, we started by examining the effect of TFCP2 overexpression on SRY. Figure 3a shows that SRY mRNA levels are slightly increased by overexpression of TFCP2 in NT2/D1 cells, indicating that SRY might be upregulated by TFCP2 overexpression. In contrast, suppression of TFCP2 by RNAi led to a significant reduction (P < 0.001) of SRY mRNA expression (Fig. 3b).

Fig. 3

TFCP2 regulates SRY expression. The levels of mRNA expression of TFCP2 and SRY were measured by quantitative real-time PCR. Signals were normalized against GAPDH signal. Data are shown as the mean ± standard deviations. n = 6; * P < 0.001; Student’s t test. a SRY mRNA expression was increased after transient transfection of TFCP2. b SRY mRNA expression was reduced after transfection of TFCP2 RNAi. c TFCP2 increases SRY promoter activity in a dose-dependent manner. Fold activity of SRY promoter activity was compared with that of control (without TFCP2 expression vector). Experiments were performed in at least triplicate. Data are shown as the mean ± standard deviations. n = 6; * P < 0.001; Student’s t test

We then investigated whether TFCP2 acts directly on the SRY promoter. We generated an SRY luciferase reporter construct by cloning the 902-bp SRY proximal promoter fragment, which contains three CP2 binding motifs (TCTGG), into the pGL3 Basic vector, as shown at the bottom of Fig. 4a. The luciferase assay confirmed that overexpression of TFCP2 by transient transfection in NT2/D1 cells led to a dose-dependent increase in SRY promoter activity (Fig. 3c).

Fig. 4

Effect of TFCP2 on SRY promoter activity. a Luciferase reporter constructs for SRY promoter deletion analyses are indicated on the left. Filled circles are LBP1-CS motifs. Data show the luciferase activity of SRY promoter fragments of different lengths in TFCP2-cotransfected cells relative to control (pcDNA3.1)-transfected cells. TFCP2-dependent induction was recognized between −67 and −130 bp from the ATG start codon. b Sequences of the mutant reporter constructs and their luciferase activities. M2, M3, and M2+3 showed remarkable (62%), weak (30%), and 44% reductions, respectively, compared to the wild-type construct. Mutation of CP2-2 abolished TFCP2-dependent promoter activity. All experiments were performed in at least triplicate. Data are shown as the mean ± standard deviations

Identification of TFCP2 response elements in the SRY promoter

We characterized the SRY promoter region by deletion analysis. This analysis revealed that the region from −130 to −67 within the SRY proximal promoter is crucial for TFCP2-dependent induction (Fig. 4a). A TFCP2 binding motif (TCTGG) lies within this region, and another lies nearby (−43 to −39). Since these sites may constitute a critical promoter region for TFCP2, we produced mutant reporter constructs to study their contribution to promoter activity (Fig. 4b, left). These mutant reporter constructs were co-transfected with the TFCP2 expression vector, and promoter activity was measured (Fig. 4b, right). Compared with the wild type, mutated CP2-2 (M2) showed a 62% reduction, mutated CP2-3 (M3) showed a weak reduction (30%), and the double mutant (M2+3) showed a 44% reduction. From these results alone, it was not possible to decide whether the TFCP2 binding site is CP2-2, CP2-3, or both. To examine whether TFCP2 binds directly to the CP2-2 and/or CP2-3 sites, we performed an electrophoretic mobility shift assay (EMSA) using c-Myc-tagged TFCP2 (Fig. 5a) and probes containing either the CP2-2 or the CP2-3 site. The CP2-2 probe was shifted by the c-Myc-tagged TFCP2 protein, but the CP2-3 probe was not (Fig. 5b). Additionally, the supershift of the CP2-2 probe by the c-Myc antibody could be inhibited by a cold competitor (Fig. 5c). Furthermore, a mutated probe of the CP2-2 site (CP2-2M) was not shifted by the c-Myc-tagged TFCP2 protein (Fig. 5d). These findings show that TFCP2 binds specifically to the CP2-2 site of the SRY promoter and support TFCP2 as an important transcription factor for SRY.

Fig. 5

TFCP2 binding to CP2-2 site in the SRY promoter. a Western blot analysis of c-Myc-tagged TFCP2 and control (pGBKT7 vector not containing TFCP2) translated in vitro using reticulocyte lysate with monoclonal antibody against c-Myc. Lane 1 control; Lane 2 c-Myc-tagged TFCP2. b EMSA was performed using radio-labeled CP2-2 and CP2-3 oligonucleotides. C indicates control and T indicates c-Myc-TFCP2. c EMSA was performed using the radio-labeled CP2-2 oligonucleotide. A supershifted band was recognized with the c-Myc-antibody. Competition was performed using 250-fold excess cold oligonucleotide. d Mutated oligonucleotide of CP2-2 (CP2-2M) was not bound with c-Myc-tagged TFCP2 protein

TFCP2 binds specifically to the SRY promoter

Using ChIP, we confirmed that TFCP2 binds to the SRY promoter. The TFCP2-DNA complex was immunoprecipitated with Flag antibody from NT2/D1 cells transfected with Flag-tagged TFCP2. The DNA was then purified, and PCR with primers specific to the SRY promoter yielded the target amplicon (Fig. 6). These data indicate that TFCP2 acts as a regulator of the SRY gene by binding directly to its promoter.

Fig. 6

Chromatin immunoprecipitation assays were performed with Flag antibody (Ab+) on NT2/D1 cells transfected with Flag-tagged TFCP2. Samples without Flag antibody (Ab−), with normal IgG, and without DNA (DNA(−)) were used as negative controls

Discussion

Mammalian sex is determined by the presence or absence of the Y chromosome, which bears the male-dominant sex-determining gene SRY that switches the differentiation of the gonads toward testes. The molecular signaling mechanism that turns on this switch, however, has remained unclear since the identification of SRY.

In this study, we described how the SRY gene emerged. By analyzing amino acid homology, we inferred that SRY is a hybrid gene between a portion of the first exon of DGCR8 and the SOX3 gene. Furthermore, we identified the regulatory sequence in the SRY promoter region by searching for a common motif shared with DGCR8 mRNA. We then demonstrated that TFCP2 binds directly to the SRY sequence and regulates its expression.

Transcription factor CP2 is a component of the stage selector protein complex involved in the preferential expression of the gamma-globin genes of hemoglobin in fetal erythroid cells [12, 13]. It was initially shown to bind to the promoter region of the murine alpha-globin gene [20–22]. Alpha and beta globin evolved approximately 450 million years ago, during the evolution of fish. Later, during the evolution of mammals, novel embryonic and gamma globins arose by duplication and mutation of beta globin. It is well known that the hemoglobin molecule consists of two alpha chains and two beta chains; however, the switch from gamma to beta chains occurs only just before birth. In humans, the first globin form produced during development is embryonic globin, which is synthesized in the yolk sac from the third to the eighth week of gestation. At about the fifth week of gestation, the first switch from embryonic to gamma globin occurs, and the major site of hematopoiesis begins to move from the yolk sac to the fetal liver. At just this time, TFCP2 may start to regulate not only gamma globin but also SRY expression in the gonadal ridge.

The α-thalassemia, mental retardation, X-linked protein (ATRX) regulates globin expression, and mutations in the ATRX gene have been reported to cause genital abnormalities; therefore, ATRX is considered one of the genes involved in sex determination/differentiation [23]. Since both TFCP2 and ATRX contribute to the regulation of hematopoiesis, it is strongly suggested that their functions share several similarities, including their effects on sex determination and/or differentiation.

Our findings add to the currently available knowledge about the evolution of sex chromosomes and the SRY gene, and about their role in sex determination. It is strongly suggested that during evolution, SRY evolved from an ancestral SOX3 gene on proto-sex chromosomes [1]. The SOX3 gene is the gene most closely related to SRY within the HMG box. In mice, the expression of Sox3 is very low, but it is expressed in the urogenital ridge, overlapping with Sry expression [24]. This conserved expression at a critical time in sex determination implies that it retains some ancestral properties during development of the sex system of animals that originated from their requirement to proliferate and fit their progeny into the altered environment.

These results led us to hypothesize that SRY is a hybrid gene that evolved from ancestral SOX3 by insertion of a portion of the DGCR8 gene upstream of SOX3. This would explain why SOX3 shows good identity to the SRY gene only within the HMG box, whereas the inserted DGCR8 sequence shows homology with the SRY sequence upstream of the HMG box. Clinical observations support this concept: point mutations that result in male-to-female sex reversal are concentrated in the HMG box [19, 25], and other point mutations reported outside the HMG box map to the amino acids conserved with DGCR8 [26, 27]. To date, our knowledge of the evolution of sex chromosomes is limited. In many species, it is difficult to identify the chromosome that bears the sex-determining gene. This indicates that the sex-determining gene emerged without a significant chromosomal aberration. Once chromosomal aberrations are excluded, the changes that may have given rise to a novel sex-determining gene are limited to four mechanisms, namely, duplication, mutation, deletion, and insertion. The Y-linked sex-determination gene DMY in medaka emerged by duplication and mutation of the autosomal gene DMRT1 (dsx- and mab-3-related transcription factor 1) [28–30]. In our search, however, we could not find any traces of tandem duplication. There was also no evidence of deletion, as we could not detect any similarity between the upstream regions of SOX3 and SRY. This indicates that, although sex-determining genes can arise in different ways, the mammalian SRY emerged by insertion.

There are four possible effects of mutation on protein function, namely, loss of function, gain of function, novel property, and ectopic or heterochronic expression. The evolution and specialization of SRY as a sex-determining gene were accomplished by a gain of function that may have arisen through changes in its target sites and binding affinity. Regulation of SRY expression by TFCP2 was likewise acquired through the hybrid nature of SRY, in which SOX3 was joined to the inserted DGCR8 sequence containing the promoter region and the TFCP2 binding motif.

In conclusion, the mammalian SRY gene emerged by a gain-of-function mechanism: a portion of the DGCR8 gene was inserted upstream of an ancestral SOX3 gene on the proto-sex chromosomes, creating a new promoter region, so that a truncated SOX3 gene came to be expressed under the control of TFCP2, which functioned initially as a transcription factor for the α-globin gene (Fig. 7). To our knowledge, this is the first report to provide evidence that SRY evolved as a hybrid of two genes, SOX3 and DGCR8, and that SRY expression is regulated by TFCP2.

Fig. 7

Model of evolution of the human SRY gene. SRY is a hybrid gene of DGCR8 and SOX3. TFCP2 binding to the SRY promoter region is required for SRY expression