Introduction

Gene expression analysis is a fundamental step in biological research. Common experimental techniques, including northern blot hybridization, microarray and quantitative real-time reverse transcription PCR (qRT-PCR), are employed to evaluate gene expression levels. qRT-PCR is the most widely used method for quantifying differential expression. It is a powerful technique due to its high sensitivity, specificity, accuracy and broad quantification range (Bustin 2005; Bustin et al. 2005). Nevertheless, many factors should be taken into account for reliable analysis, including the amount of starting material (i.e., RNA concentration), cDNA quality, primer specificity, and accurate normalization. Therefore, it is vital to select stable reference genes to control for biological and technical variations for each sample to obtain accurate results (Mahoney et al. 2004).

The selection of reference genes is critical for analyzing the expression of genes of interest. Housekeeping genes are considered to be constitutively and stably expressed regardless of growth conditions or developmental stage, and they were therefore long regarded as the best reference genes. Popular housekeeping genes used for qRT-PCR analysis in plants and animals include elongation factor-1A (EF-1a), polyubiquitin (UBQ), actin (ACT), α-tubulin and β-tubulin (TUA and TUB, respectively), glyceraldehyde-3-phosphate dehydrogenase C-2 (GAPC2), polypyrimidine tract-binding protein homolog (PTB), the catalytic subunit of protein phosphatase 2A (PP2A), and stearoyl-ACP desaturase (SAD) (Andersen et al. 2004; Brunner et al. 2004; Artico et al. 2010; Kim et al. 2003; Goidin et al. 2001). Nevertheless, the idea of universal housekeeping reference genes has been disproved, because their expression is affected by different experimental conditions (Ding et al. 2004; Nicot et al. 2005; Czechowski et al. 2005). To identify reliable reference genes, several computational programs have been developed, including geNorm (Vandesompele et al. 2002), NormFinder (Andersen et al. 2004), BestKeeper (Pfaffl et al. 2004), RefFinder (Xie et al. 2012) and the ΔCt method (Silver et al. 2006), all of which are based on different statistical algorithms.

Reliable reference genes have been identified in different plants, including Arabidopsis thaliana (Remans et al. 2008), wheat (Paolacci et al. 2009), barley (Faccioli et al. 2007), rice (Jain et al. 2006), soybean (Jian et al. 2008), grape (Reid et al. 2006), tomato (Exposito-Rodriguez et al. 2008), potato (Nicot et al. 2005) and peach (Tong et al. 2009). However, few studies have focused on reference genes in cotton (Wang et al. 2013; Artico et al. 2010). Cotton is one of the most important crop plants; it provides high-quality natural fiber and edible oil and is of great economic value. Several groups, including our own, are interested in understanding the genetic pathways of fiber development (Xie et al. 2015) and oil biosynthesis, for which gene expression analysis is indispensable. Gossypium hirsutum and G. barbadense share a common ancestor, G. raimondii, whose genome has been sequenced (Paterson et al. 2012). G. raimondii has become a cotton model species for genetic and functional investigations; however, there have been no reports on suitable reference genes for it. Additionally, a single housekeeping gene is insufficient to normalize the expression level of genes under different experimental conditions (Gutierrez et al. 2008b; Schmittgen and Zakrajsek 2000). Therefore, in this study, we evaluated 12 housekeeping genes in four different tissues of G. raimondii for qRT-PCR data normalization. We found that the expression of SAD and TUA11 is relatively stable across the different tissues and among all of the samples.

Materials and methods

Plant material and growth conditions

Gossypium raimondii was grown in the greenhouse at East Carolina University as described in our previous report (Ma et al. 2015). Flower buds approximately 1 cm in diameter were collected, and the sepals were separated from the buds. At the same time, the first outstretched leaves and the shoots at the top of the plant were collected. The four tissues were frozen in liquid nitrogen immediately and stored at −80 °C for further analysis.

Selecting housekeeping genes and primer design

To identify suitable reference genes, the first critical step is to choose a series of housekeeping genes. In this study, 12 housekeeping genes previously used as reference genes in cotton (G. hirsutum) or other model plants were selected: TUA10 (α-tubulin 10), TUA11 (α-tubulin 11), TUB6 (tubulin 6), MZA (clathrin adaptor complexes medium subunit family protein), GAPC2 (glyceraldehyde-3-phosphate dehydrogenase C-2), ACT (actin), EF1a (elongation factor 1A), PTB (polypyrimidine tract-binding protein homolog), UBQ (ubiquitin extension protein), PP2A (catalytic subunit of protein phosphatase 2A), SAD (stearoyl-ACP desaturase) and FBX (F-box family protein) (Artico et al. 2010). Primers were designed with Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) based on the cDNA sequences obtained from the Joint Genome Institute (JGI) Phytozome v9.1: Gossypium raimondii v2.1 website (http://www.phytozome.net/cotton.php).

RNA extraction and quantitative real-time PCR

Following our previous report (Zhang and Pan 2009), 0.03–0.1 g of tissue was prepared for RNA extraction. Total RNA was extracted from the four tissues using the mirVana miRNA Isolation Kit (Ambion, Austin, TX) according to the manufacturer's instructions. The quantity and quality of total RNA were determined with a NanoDrop ND-1000 (NanoDrop Technologies, Wilmington, DE, USA), based on the A260/280 and A260/230 ratios. The TaqMan MicroRNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA, USA) was used to generate first-strand cDNA for each individual gene. For each sample, 800 ng RNA was used for reverse transcription. The reaction volume was 15 µl, including 0.15 µl dNTPs, 1.5 µl reverse transcription buffer, 1 µl MultiScribe Reverse Transcriptase, 0.19 µl RNase inhibitor and 800 ng total RNA. Reverse transcription started with 30 min at 16 °C for primer binding to the RNA, followed by 30 min at 42 °C for elongation and 5 min at 85 °C for denaturation, and was finally held at 4 °C. Before qRT-PCR, the products were diluted with 80 µl DNase/RNase-free water and stored at −20 °C.

qRT-PCR was performed to determine the expression levels of the 12 housekeeping genes in four different G. raimondii tissues. Gene-specific forward and reverse primers were used for each target gene (Supplementary Table 1). All reactions were performed in a 384-well plate using the Applied Biosystems ViiA 7 Real Time PCR System (Foster City, CA, USA) under the following program: 10 min at 95 °C, followed by 45 cycles of 15 s at 95 °C for denaturation and 60 s at 60 °C for annealing and extension. Each reaction was performed in triplicate (technical replicates) on three individual samples (biological replicates). SPSS (version 20) was used to generate boxplots from the Ct values of the 12 housekeeping genes. Differences in gene expression were assessed using one-way analysis of variance (ANOVA). Relative quantities were calculated in Excel before being imported into the gene expression stability programs.
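The conversion of raw Ct values into relative quantities can be sketched as follows. The exact formula used in Excel is not stated here, so this is only an illustration that assumes the common convention Q = E^(min Ct − Ct) with an amplification efficiency E of 2; the function name and the Ct values are hypothetical.

```python
def relative_quantities(ct_values, efficiency=2.0):
    """Convert the raw Ct values of one gene into relative quantities.

    Assumption: Q = efficiency ** (min_ct - ct), so the sample with the
    lowest Ct (highest expression) gets a relative quantity of 1.
    """
    min_ct = min(ct_values)
    return [efficiency ** (min_ct - ct) for ct in ct_values]


# Hypothetical Ct values for one gene in four samples (not data from this study).
example_ct = [20.0, 21.0, 22.0, 20.5]
quantities = relative_quantities(example_ct)
print(quantities)
```

With an efficiency of 2, each additional cycle halves the relative quantity, which is why a one-cycle difference in Ct corresponds to a twofold difference in template.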

Determination of gene stability

Five common statistical approaches (geNorm, the ΔCt method, NormFinder, BestKeeper and RefFinder) were used to evaluate the stability of each reference gene.

geNorm (Vandesompele et al. 2002) determines gene stability by pairwise comparisons among all candidate genes. An expression stability value (M value) is calculated for each gene, and the gene with the highest M value is eliminated. The process is then repeated until only two genes are left. The gene with the lowest M value is recommended as the optimum reference gene (Vandesompele et al. 2002).
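The stepwise exclusion described above can be illustrated with a short sketch. It is not the geNorm applet itself: it assumes that M is the mean, over all other candidates, of the standard deviation of the log2-transformed expression ratios, and the gene names and relative quantities are hypothetical.

```python
import math
import statistics


def m_values(quantities):
    """Return an M value per gene.

    quantities maps gene name -> list of relative quantities (one per sample).
    M is the mean, over all other genes, of the SD of the log2 ratios.
    """
    result = {}
    for gene in quantities:
        sds = []
        for other in quantities:
            if other == gene:
                continue
            log_ratios = [math.log2(a / b)
                          for a, b in zip(quantities[gene], quantities[other])]
            sds.append(statistics.stdev(log_ratios))
        result[gene] = statistics.mean(sds)
    return result


def stepwise_exclusion(quantities):
    """Repeatedly drop the gene with the highest M until two genes remain."""
    remaining = dict(quantities)
    eliminated = []
    while len(remaining) > 2:
        m = m_values(remaining)
        worst = max(m, key=m.get)
        eliminated.append(worst)
        del remaining[worst]
    # The last two genes cannot be ranked against each other.
    return list(remaining), eliminated[::-1]


# Hypothetical relative quantities in four samples (not data from this study).
example = {
    "geneA": [1.0, 0.5, 0.25, 0.5],
    "geneB": [2.0, 1.0, 0.5, 1.0],
    "geneC": [1.0, 1.0, 0.5, 0.25],
    "geneD": [1.0, 0.1, 4.0, 0.3],
}
best_pair, others = stepwise_exclusion(example)
print(best_pair, others)
```

The function returns the final pair of genes first and then the excluded genes ordered from more to less stable, which mirrors the way geNorm rankings are usually reported.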

The comparative ΔCt method is the only approach that requires just an Excel sheet and no other applet. It is similar to geNorm in that it also depends on pairwise comparisons (Silver et al. 2006). Briefly, the mean and SD of the ΔCt values are calculated for each pair of candidate genes, the SDs of all pairs involving a given gene are averaged, and the gene with the lowest average SD is considered the most reliable reference gene. Boxplots were generated using SPSS, with different colors representing different gene pairs.
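A minimal sketch of this calculation is given below, assuming that each gene's stability score is the mean of the sample SDs of its ΔCt values against every other candidate. The gene names and Ct values are hypothetical.

```python
import statistics


def delta_ct_stability(ct):
    """Return gene -> mean SD of its delta-Ct against every other gene.

    ct maps gene name -> list of Ct values, with samples in the same order.
    """
    result = {}
    for gene in ct:
        sds = []
        for other in ct:
            if other == gene:
                continue
            deltas = [a - b for a, b in zip(ct[gene], ct[other])]
            sds.append(statistics.stdev(deltas))
        result[gene] = statistics.mean(sds)
    return result


# Hypothetical Ct values in four samples (not data from this study).
example_ct = {
    "geneA": [20.0, 21.0, 22.0, 21.0],
    "geneB": [25.0, 26.0, 27.0, 26.0],
    "geneC": [30.0, 30.5, 33.0, 29.0],
}
scores = delta_ct_stability(example_ct)
ranking = sorted(scores, key=scores.get)  # lowest mean SD first
print(scores, ranking)
```

In this toy data set geneA and geneB differ by a constant 5 cycles in every sample, so their pairwise SD is zero, whereas geneC varies independently and receives the highest score.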

NormFinder uses an ANOVA-based model (Andersen et al. 2004). The method calculates a stability value for all candidate genes tested that is based on the combined estimate of intra- and inter-group expression variations of the genes.

BestKeeper is another Excel-based application for evaluating gene expression stability (Pfaffl et al. 2004). The expression level of each candidate gene is described by three variables: the standard deviation (SD; Goetz et al. 2006), the coefficient of correlation (r) and the coefficient of variation (CV). For each sample, the geometric mean of the Ct values of all candidate genes is calculated to form the BestKeeper index. Subsequently, each candidate gene is compared with the index in a pairwise manner using the Pearson correlation coefficient, the coefficient of determination (r2) and the P value. The result is a ranking of the genes by stability, in which the highest-ranked gene is the most stable. Any candidate gene with an SD value higher than 1 is considered unreliable and should be excluded.
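The quantities named above can be sketched in a few lines. This is only an approximation of the BestKeeper workflow: it assumes the sample standard deviation (the Excel tool may compute the deviation differently), omits the P value, and uses hypothetical gene names and Ct values.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bestkeeper_summary(ct):
    """Return the per-sample index and per-gene SD, CV (%) and r."""
    genes = list(ct)
    n_samples = len(ct[genes[0]])
    # Index: geometric mean of the Ct values of all candidates in each sample.
    index = [statistics.geometric_mean([ct[g][i] for g in genes])
             for i in range(n_samples)]
    summary = {}
    for g in genes:
        sd = statistics.stdev(ct[g])
        mean = statistics.mean(ct[g])
        summary[g] = {
            "sd": sd,
            "cv_percent": 100 * sd / mean,
            "r": pearson_r(ct[g], index),
        }
    return index, summary


# Hypothetical Ct values in four samples (not data from this study).
example_ct = {
    "geneA": [20.0, 21.0, 22.0, 21.0],
    "geneB": [25.0, 26.0, 27.0, 26.0],
    "geneC": [30.0, 30.5, 33.0, 29.0],
}
index, summary = bestkeeper_summary(example_ct)
unstable = [g for g, s in summary.items() if s["sd"] > 1]  # SD > 1 rule
print(index, summary, unstable)
```

Applying the SD > 1 rule to the toy data flags only geneC, whose Ct values spread over four cycles.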

RefFinder (http://www.leonxie.com/referencegene.php) applies four computational programs, NormFinder, BestKeeper, geNorm and the comparative ΔCt method, to produce a comprehensive stability value for each gene (Xie et al. 2012). The Ct values of each gene were input directly. RefFinder then calculates the geometric mean of the rankings that each gene receives from the four programs to reach its overall final ranking. A lower geometric mean of the ranking values indicates more stable expression.
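The final aggregation step can be illustrated as the geometric mean of the ranks a gene receives from each method. The sketch below assumes equal weighting of the four methods and no tied ranks; the method keys, gene names and rankings are hypothetical, and the web tool may handle details differently.

```python
import math


def comprehensive_ranking(rankings):
    """Combine per-method rankings into one ordering.

    rankings maps method name -> list of genes ordered from most to least
    stable. Returns (gene, geometric mean of ranks) pairs, most stable first.
    """
    genes = next(iter(rankings.values()))
    scores = {}
    for gene in genes:
        ranks = [order.index(gene) + 1 for order in rankings.values()]
        scores[gene] = math.prod(ranks) ** (1 / len(ranks))
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical rankings from four methods (not results from this study).
example_rankings = {
    "delta_ct": ["geneA", "geneB", "geneC"],
    "bestkeeper": ["geneB", "geneA", "geneC"],
    "normfinder": ["geneA", "geneB", "geneC"],
    "genorm": ["geneA", "geneB", "geneC"],
}
combined = comprehensive_ranking(example_rankings)
print(combined)
```

Because the geometric mean is used, a gene that is ranked first by most methods keeps a low overall score even if one method ranks it slightly lower.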

Results

Descriptive statistics for the expression levels of 12 reference gene candidates

RNA transcript levels of the 12 candidate reference genes were assessed in four different tissues: young leaves, shoots, buds and sepals. All tested genes were expressed except FBX, which was therefore removed from the subsequent analyses (Fig. 1). Based on our results, SAD and TUA11 were the most stable reference genes both in the individual organs and at the overall level, whereas PTB, UBQ and PP2A showed higher variation in their expression levels. Among the tested candidate reference genes, SAD mRNA was the most abundant whereas UBQ mRNA was the least abundant in all four tissues. The median Ct values of the candidate housekeeping genes ranged from 20.67 to 44.3 (Table 1). This wide range of Ct values implies that the candidate genes had different expression levels in the four tissues examined. SAD, TUA11, TUA10 and MZA had the lowest Ct values, with medians of 20.67, 22.37, 31.53 and 31.66, respectively. In contrast, PTB, UBQ and GAPC2 showed the largest Ct values, with medians of 42.01, 44.3 and 40.85, respectively. Genes with SD values exceeding 1 were considered unstable. The least variable reference genes were TUA11, TUB6 and SAD, with SD values of 0.74, 0.96 and 0.8, respectively. Conversely, PTB, UBQ and PP2A were the most variable genes (SD values of 3.59, 3.71 and 3.49, respectively). Although some of these genes showed low expression variation between the different tissue samples, further analyses were still necessary to identify the most suitable combination of candidate genes for normalizing gene expression.

Fig. 1

Boxplots showing the threshold cycles (Ct; Liang et al. 2006) of 11 reference genes in four different tissues as well as the average of the four organs. The box contains 50 % of the values. The median is indicated by a black line in the center of the box, and the 75th and 25th percentiles are represented by the upper and lower edges of the box, respectively. The y-axis represents the ∆Ct values of three biological replicates of each tissue, while the x-axis shows the distribution of the eleven reference genes in leaf, shoot, bud, sepal and the average of the four organs

Table 1 Descriptive statistics of the raw Ct values of each gene among all samples

geNorm analysis of candidate reference gene stability

The transformed data for the reference gene candidates were analyzed using geNorm, which calculates an M value for each putative reference gene as the average pairwise variation (the SD of the log-transformed expression ratios) between that gene and all other candidates. The stability of each reference gene was ranked by its M value, and genes with the lowest M values were considered the most stably expressed. M < 1.5 was used as the criterion for stable gene expression (Zhang et al. 2012). The geNorm results showed that the M values of most reference genes were below 1.5; the exceptions were UBQ, PTB and PP2A, which were treated as the least stable genes (Fig. 2). The most stable genes were TUA11 and SAD, with the same M value of 0.47. Overall, TUA11, SAD, TUB6, GAPC2 and EF1a were good reference genes. The M-based ranking of the reference genes from the most stable (lowest M value) to the least stable (highest M value) was: SAD, TUA11, TUB6, GAPC2, EF1a, ACT, MZA, TUA10, PP2A, PTB and UBQ.

Fig. 2

The stability of reference genes as analysed by geNorm amongst all tissues. Mean expression stability (M) was computed following stepwise exclusion of the least stable gene across all tested tissues. The most stable genes are on the right and the least stable genes on the left

We also used geNorm to calculate the pairwise variation Vn/n+1 for the 11 reference candidates, which is used to determine the optimal number of reference genes. Interestingly, the V-value for all the gene pairs was higher than 0.15 (Fig. 3). This suggests that the use of multiple housekeeping genes as reference genes is not required for reliable normalization.
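The pairwise variation can be sketched as follows, assuming that the normalization factor NFn is the per-sample geometric mean of the relative quantities of the n most stable genes and that Vn/n+1 is the standard deviation of log2(NFn/NFn+1) across samples. The gene names, their ordering and the quantities are hypothetical.

```python
import math
import statistics


def normalization_factors(quantities, genes):
    """Per-sample geometric mean of the relative quantities of the given genes."""
    n_samples = len(quantities[genes[0]])
    return [statistics.geometric_mean([quantities[g][i] for g in genes])
            for i in range(n_samples)]


def pairwise_variation(quantities, ranked_genes):
    """Return {n: V(n/n+1)} for n = 2 .. len(ranked_genes) - 1.

    ranked_genes must be ordered from most to least stable.
    """
    result = {}
    for n in range(2, len(ranked_genes)):
        nf_n = normalization_factors(quantities, ranked_genes[:n])
        nf_next = normalization_factors(quantities, ranked_genes[:n + 1])
        log_ratios = [math.log2(a / b) for a, b in zip(nf_n, nf_next)]
        result[n] = statistics.stdev(log_ratios)
    return result


# Hypothetical relative quantities in four samples (not data from this study).
example = {
    "geneA": [1.0, 0.5, 0.25, 0.5],
    "geneB": [2.0, 1.0, 0.5, 1.0],
    "geneC": [1.0, 1.0, 0.5, 0.25],
    "geneD": [1.0, 0.1, 4.0, 0.3],
}
v = pairwise_variation(example, ["geneA", "geneB", "geneC", "geneD"])
print(v)  # keys 2 and 3 correspond to V2/3 and V3/4
```

Each V value measures how much the normalization factor changes when the next most stable gene is added, so it is compared against a cutoff to decide whether that extra gene is worth including.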

Fig. 3

The reference ranking based on the pairwise variation value (V-value) as calculated by geNorm. The y-axis represents the pairwise variation (Vn/n+1), which was calculated between the normalization factors NFn and NFn+1 to determine the optimal number of reference genes

Reference gene ranking based on comparative ∆Ct method

The ∆Ct method compares the relative expression of ‘gene pairs’. If the ∆Ct value of two genes remains constant across different samples, then both genes are stably coexpressed (Cassan-Wang et al. 2012). Our results showed that SAD, TUB6 and TUA11 were better reference genes than the others (Table 2), followed by GAPC2, EF1a, ACT, MZA, TUA10, PP2A, PTB and UBQ, in that order (Table 2, Fig. 4). This was similar to the result generated by geNorm.

Table 2 A summary of the pairwise comparison of means and SDs for the reference genes using the ∆Ct method; the last column is the average SD of each gene across its gene pairs
Fig. 4

A boxplot representing the gene expression values of the 11 reference gene candidates. The ΔCt values between each gene pair/group are shown on the y-axis, while the x-axis represents the 11 reference candidates. Different “gene pairs” are shown in different colors

Reference gene ranking based on NormFinder

The geNorm and ∆Ct methods evaluate the optimal reference genes between tissues. NormFinder is the only applet that ranks the candidate reference genes according to their minimal combined inter- and intra-tissue expression variation, based on a normalization factor (NF) calculation. Expression stability was therefore re-analyzed using the NormFinder algorithm. The most reliable candidate reference genes were SAD, TUA11, EF1a, TUB6, GAPC2 and ACT, with stability values of 0.19, 0.28, 0.38, 0.44, 0.45 and 0.98, respectively; these may serve as the top six reliable reference genes. The ranking of the five least stable reference candidates (MZA, TUA10, PP2A, PTB and UBQ) was the same across geNorm, the ∆Ct method and NormFinder. In addition, UBQ was consistently ranked as the least stable gene by all statistical methods.

Reference gene ranking based on BestKeeper

This applet ranked the candidate reference genes based on the Ct values of each gene, the standard deviation (SD; Goetz et al. 2006) and the coefficient of variation (CV). SD and CV were used as the two criteria for identifying stable reference gene candidates, and genes with an SD greater than 1 were considered unstable. The results of the BestKeeper analysis are shown in Table 3. In our study, TUA11, SAD, ACT and TUB6 showed SD values lower than 1 (0.62, 0.63, 0.86 and 0.87, respectively), indicating that these genes were suitable as reference genes. Furthermore, pairwise correlation and regression analyses were performed to assess the inter-gene relations. A strong correlation was seen for EF1a (r = 0.948). However, EF1a had a high SD value in comparison with the other candidates (SD = 1.436) and was therefore excluded from the list of candidate reference genes. The results of BestKeeper agreed with those of the ∆Ct method, geNorm and NormFinder, although the rank order was slightly altered. When evaluated across all experimental samples, TUA11 was ranked as the best reference gene.

Table 3 Ranking of reference gene based on BestKeeper

Comprehensive ranking

RefFinder integrates the major computational programs currently available (geNorm, NormFinder, BestKeeper and the comparative ∆Ct method) to compare and rank the candidate reference genes. RefFinder confirmed that SAD (1.49) and TUA11 (1.41) were the most reliable reference genes among different tissues in G. raimondii (Table 4). The candidate reference genes were ranked by RefFinder (from the highest to the lowest stability) as TUA11>SAD>TUB6>GAPC2>EF1a>MZA>TUA10>PP2A>PTB>UBQ. According to the results of all programs, TUA11 and SAD were considered the most stable reference genes, and UBQ was the least stable one.

Table 4 A summary of comprehensive ranking and other four-method ranking of 11 reference genes in G. raimondii

The expression stability of commonly used housekeeping genes was tissue-specific (Table 5). In leaves, TUA11 and TUB6 were considered the most reliable reference genes. MZA was among the most reliable reference genes in shoot, bud and sepal, while EF1a and UBQ were also among the most stable candidates in shoot and sepal, respectively. The overall ranking showed that TUA11 and SAD were the most reliable reference genes across all tissues.

Table 5 Comprehensive stability ranking in the different tissues as analyzed by RefFinder

Discussion

Gene expression analysis is a common way of analyzing gene function in different tissues or under different experimental conditions. Several experimental methods have been used to measure gene expression, including qRT-PCR. However, the choice of a reference gene is critical for reliable analyses and conclusions. Housekeeping genes have commonly been used as reference genes because of their conservation and abundant expression in different organisms. Nevertheless, no single housekeeping gene is consistently expressed in different tissues under different conditions (Artico et al. 2010), so it is advisable to identify stable reference genes for normalizing target gene expression. In past years, numerous reports have been published that evaluate reliable reference genes in different plants under various experimental conditions (Cassan-Wang et al. 2012; Chaouachi et al. 2013; Scholtz and Visser 2013; Serra et al. 2012; Zhu et al. 2013; Zhu et al. 2012). It has been stated that the stability of a reference gene must be validated for each experimental setup (Hruz et al. 2011). Several genes, including ACTIN, GAPDH, TUA and UBQ, are used as single control genes in more than 90 % of the published expression studies (de Jonge et al. 2007). However, recent studies have indicated that these common reference genes are not always stably expressed when tested in other species or under a wider range of experimental treatments (Artico et al. 2010; Chen et al. 2011; Gutierrez et al. 2008a; Zhu et al. 2012). Here, we aimed to validate a collection of reference genes for qRT-PCR studies across tissues in G. raimondii. Our results indicated that UBQ was the worst candidate for normalization in G. raimondii.

To our knowledge, this is the first systematic study to validate a set of candidate reference genes for qRT-PCR in G. raimondii. In this study, we selected a series of candidate reference genes for which sequence information was available in the G. raimondii genome database. After determination of primer amplification efficiencies, 12 candidates were selected for evaluation of their normalization potential across a group of samples from different tissues. Our results suggested that TUA11 and SAD were the best reference genes according to the average expression stability (M) or the stability values obtained with the ∆Ct method, geNorm, BestKeeper, NormFinder and RefFinder. When the outcomes of the five algorithms were compared, only slight differences in ranking were observed. In individual tissues, however, the results differed: TUA11, TUB6 and SAD were identified as reliable reference genes in leaves; EF1a, MZA and GAPC2 were the least variable reference genes in shoots; MZA, GAPC2, SAD and TUA11 were the best reference genes in buds; and UBQ and MZA were considered the most stable reference genes in sepals. These results suggest that combining the reference genes that were most stable at the overall level might resolve these tissue-specific differences. In addition, using multiple reference genes might improve gene expression analysis in G. raimondii, although it is not necessary for reliable results.