Introduction

Gene expression analysis is increasingly important in biological and biomedical research. Whatever method is employed, reference genes are used to verify that the same amount of total gene product (mRNA or protein) is loaded in each sample. The expression level of an optimal reference gene should be consistent under all conditions. Because of their function, many housekeeping genes, such as those encoding actins, cyclophilins, elongation factor-1α (EF-1α), glyceraldehyde-3-phosphate dehydrogenase (GAPD), microglobulins, ribosomal RNA subunits (18S or 28S rRNA), ubiquitin (UBQ), and tubulins, have frequently been selected as reference genes (Pfaffl et al. 2004). These genes are typically needed to maintain cell structure and/or function, and their expression levels are relatively stable compared with those of tissue-specific genes. However, recent studies have demonstrated that the transcription levels of housekeeping genes also vary among cell types and developmental stages and are affected by experimental conditions, including different treatments (Ahn et al. 2008; Jordan and Wilson 2004; Yu et al. 2020). This suggests that no perfect reference gene exists in living cells. Thus, it is necessary to evaluate and select the best reference gene(s) for an experiment before performing gene expression analysis.

To evaluate and screen reference genes for gene expression analysis, several computational programs have been developed in the past 15 years. These programs include geNorm (Vandesompele et al. 2002), NormFinder (Andersen et al. 2004), BestKeeper (Pfaffl et al. 2004), and the comparative ΔCt method (Silver et al. 2006). Each program ranks the stability of housekeeping genes using its own statistical endpoints and algorithm, so the stability scores and rankings vary among programs. As a result, the programs may rank the same candidates differently, and it is hard to determine which ranking is the best (Ahn et al. 2008). A web-based tool for comparing and analyzing the stability of housekeeping genes will significantly enhance reference gene-related studies and, in turn, gene expression profiling and functional studies that use quantitative real-time PCR (qRT-PCR) and/or Northern blotting.

Currently, RefFinder is the only online resource for comparing and evaluating housekeeping genes as candidate reference genes. In this study, we describe this user-friendly tool for evaluating and screening reference genes from extensive experimental datasets. The tool integrates the four major currently available computational programs (geNorm, NormFinder, BestKeeper, and the comparative ΔCt method) to compare and rank the tested candidate reference genes. Based on the rankings from each program, we assigned an appropriate weight to each gene and calculated the geometric mean of its weights for the overall final ranking.

Methods and program design

Following the algorithms described for the four computational programs (geNorm (Vandesompele et al. 2002), NormFinder (Andersen et al. 2004), BestKeeper (Pfaffl et al. 2004), and the comparative ΔCt method (Silver et al. 2006)), we reimplemented each algorithm in PHP and integrated them into a web application. For convenience, only the original Ct values from quantitative real-time PCR (qRT-PCR) are required as input on the web page (Fig. 1). Users only need to click the “submit” button, and the results from all four algorithms are generated (Fig. 2). Finally, based on the rankings from each of the four algorithms, we developed a simple algorithm to produce an overall ranking of the tested reference genes. First, each candidate gene is assigned a number by each of the four computational programs according to the stability determined by that program. The number ranges from 1 to N (where N is the total number of tested genes) and corresponds to the rank of the gene in that program: 1 is the most stable gene and N is the least stable gene. Then, RefFinder automatically calculates the geometric mean of the weights of each gene across the four algorithms and re-ranks the candidate reference genes by this geometric mean. The gene with the lowest geometric mean is considered the most stable reference gene, and genes with higher geometric means are considered less stable.
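The overall ranking step described above can be illustrated with a minimal Python sketch. It is not the PHP code used by RefFinder; the function name `overall_ranking`, the gene names, and the rank values are hypothetical and serve only to show the geometric-mean calculation, and the tie-breaking rule is an arbitrary assumption.

```python
from math import prod


def overall_ranking(ranks_by_program):
    """Rank genes by the geometric mean of their per-program ranks.

    ranks_by_program maps a program name to a dict of gene -> rank,
    where rank 1 is the most stable gene and rank N the least stable.
    """
    programs = list(ranks_by_program)
    genes = list(ranks_by_program[programs[0]])
    geomean = {}
    for gene in genes:
        ranks = [ranks_by_program[p][gene] for p in programs]
        # Geometric mean: n-th root of the product of the n ranks.
        geomean[gene] = prod(ranks) ** (1.0 / len(ranks))
    # Lowest geometric mean first; ties are broken by gene name
    # (an arbitrary choice made for this sketch).
    return sorted(geomean.items(), key=lambda item: (item[1], item[0]))


# Hypothetical ranks for three candidate genes (illustration only).
example_ranks = {
    "geNorm": {"ACTB": 1, "GAPDH": 2, "U6": 3},
    "NormFinder": {"ACTB": 2, "GAPDH": 1, "U6": 3},
    "BestKeeper": {"ACTB": 1, "GAPDH": 3, "U6": 2},
    "DeltaCt": {"ACTB": 1, "GAPDH": 2, "U6": 3},
}

for gene, score in overall_ranking(example_ranks):
    print(f"{gene}\t{score:.3f}")
```

In this made-up example, the gene ranked first by three of the four programs receives the lowest geometric mean and is therefore placed first overall.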

Fig. 1

The interface of the integrated tool for analyzing reference gene expression

Fig. 2

Results of analyzing reference gene expression by using the integrated tool

Results

Data input and gene rank

The integrated tool for analyzing reference gene expression offers users an easy-to-use interface. Users only need to copy their original Ct values from an Excel file (which can be generated and exported directly from the qRT-PCR instrument) into the input box and then submit the data. Users can also enter data manually, following the data format required by the tool. A comprehensive overall ranking, together with the individual rankings from each of the four algorithms, is generated immediately (Fig. 2). To help users understand and run this web-based tool, we also included a real dataset for a test run; this dataset was obtained from a previous study (Chen et al. 2011), in which 10 housekeeping genes (five protein-coding genes and five noncoding genes) were selected to test their stability under different treatments in the human breast cancer cell line MCF-7. Users just need to click “try example” to see the raw data and the results of the analysis.
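As an illustration of how Ct values pasted from a spreadsheet might be read, the following Python sketch parses tab-separated text. The layout is an assumption made for this example (one header row of gene names, then one row of Ct values per sample); the exact format required by the tool is not specified here. The function name `parse_ct_table` and the values are hypothetical.

```python
def parse_ct_table(text):
    """Parse tab-separated Ct values into a dict of gene -> list of floats.

    Assumed layout (hypothetical): the first row holds gene names and
    each following row holds the Ct values of one sample.
    """
    rows = [line.split("\t") for line in text.strip().splitlines() if line.strip()]
    genes = [name.strip() for name in rows[0]]
    ct = {gene: [] for gene in genes}
    for row_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(genes):
            raise ValueError(
                f"row {row_number} has {len(row)} values, expected {len(genes)}"
            )
        for gene, cell in zip(genes, row):
            ct[gene].append(float(cell))
    return ct


# Made-up Ct values, as they would appear when copied from a spreadsheet.
pasted = "ACTB\tGAPDH\n18.2\t20.1\n18.4\t20.5\n"
print(parse_ct_table(pasted))
```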

Features of the four tools

The Excel-based tool geNorm determines the most stable reference genes from a set of candidate reference genes in a given cDNA sample panel (Vandesompele et al. 2002). By calculating the average pairwise variation of a particular gene with all other genes, it ranks all candidate reference genes by their average expression stability value (M value), from the most stable to the most variable. The least stable gene, i.e. the one with the highest M value, is excluded stepwise until only the two most stable genes remain, which cannot be ranked further (Vandesompele et al. 2002). NormFinder uses a model-based strategy to identify stably expressed genes among a set of candidate normalization genes (Andersen et al. 2004). Based on a mathematical model of gene expression, NormFinder estimates not only the overall variation of the candidate reference genes but also the variation between subgroups of samples. Because NormFinder provides a direct measure of the estimated expression variation, the systematic error introduced by using a given gene can also be evaluated (Andersen et al. 2004). Unlike geNorm, NormFinder assesses the expression stability of each candidate independently. BestKeeper is also an Excel-based tool; it determines the best-suited reference genes among the candidates using pairwise correlation analysis (Pfaffl et al. 2004). BestKeeper first estimates the correlations between the expression levels of all candidate genes and then combines the highly correlated ones into an index. Finally, three indicators, namely the standard deviation, the coefficient of variation (in percent), and the power of the candidates, are calculated to help users determine the best reference genes (Pfaffl et al. 2004). The comparative ΔCt method avoids the need to accurately quantify input RNA and instead uses ΔCt comparisons between genes (Silver et al. 2006). Using an algorithm similar to that of geNorm, the comparative ΔCt method identifies the most stable reference genes by comparing the relative expression of “pairs of genes” within each tissue sample or treatment. For each gene, the mean of the standard deviations of its ΔCt values against every other candidate is calculated as the stability indicator. The lower this arithmetic mean, the more stable the gene (Silver et al. 2006).
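The stability indicator of the comparative ΔCt method can be sketched in a few lines of Python. This is a minimal illustration, not the code used by RefFinder or by Silver et al. (2006); the function name `delta_ct_stability`, the gene labels, and the Ct values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def delta_ct_stability(ct):
    """Return genes sorted by the mean SD of their pairwise delta-Ct values.

    ct maps each gene to a list of Ct values, one per sample, with the
    samples in the same order for every gene.
    """
    scores = {}
    for gene, values in ct.items():
        pairwise_sds = []
        for other, other_values in ct.items():
            if other == gene:
                continue
            # Delta-Ct between the two genes within each sample.
            deltas = [a - b for a, b in zip(values, other_values)]
            # Sample standard deviation across samples (assumption).
            pairwise_sds.append(stdev(deltas))
        # Arithmetic mean of the SDs against all other candidates.
        scores[gene] = mean(pairwise_sds)
    # Lower mean SD indicates a more stable gene.
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up Ct values for three candidate genes in three samples.
example_ct = {
    "geneA": [20.0, 21.0, 22.0],
    "geneB": [18.0, 19.0, 20.0],
    "geneC": [25.0, 24.0, 27.0],
}

for gene, score in delta_ct_stability(example_ct):
    print(f"{gene}\t{score:.3f}")
```

In this made-up dataset, geneA and geneB change in parallel across samples, so their ΔCt is constant and both obtain a lower mean standard deviation than geneC. If an amplification efficiency of 2 is assumed, the standard deviation of ΔCt equals the standard deviation of the log2 expression ratio, so the same quantity corresponds to the geNorm M value before any stepwise exclusion.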

Although these four tools are widely used for selecting reference genes, to the best of our knowledge, no comparison of the four tools has been reported (Xie et al. 2012). It is hard to determine which one is the best because each uses its own algorithm. By assigning a weight to each reference gene according to its rank in each of the four programs, we calculated the geometric mean of these weights and produced a comprehensive overall ranking. More importantly, we provide a web-based tool for public use, which will significantly enhance studies of gene expression and functional analysis.

Discussion

Gene expression analysis is one of the most popular experiments and is widely used in fundamental and applied biological and biomedical research, including genetics and toxicology studies, particularly in the post-genome sequencing era (Liu and Zhang 2022). In almost all gene expression studies, an internal standard, termed a reference gene, is needed to normalize mRNA levels between samples. Measured mRNA levels can be altered by several factors, including pipetting errors and RNA quality. Whereas differences in PCR efficiency cause the fold changes to differ, selecting the wrong reference gene may lead to a wrong conclusion. Thus, a good reference gene should be expressed at a consistent level that is not altered across developmental and environmental conditions. To meet this criterion, an appropriate reference gene is generally involved in basic cell function (such as a transcription elongation factor) and/or in maintaining cell structure (such as actin); these genes are commonly referred to as housekeeping genes. Unfortunately, there is no perfect reference gene because the expression profiles of all genes are affected by developmental or environmental changes; the only difference is that some changes are large and some are small. Thus, screening for the most reliable reference genes has become an important step when performing a gene expression analysis. The common methods and experimental steps for screening and selecting an appropriate reference gene are (1) selecting a pool of housekeeping genes (usually 10–16 genes) and designing primers according to standard primer design criteria, such as avoiding runs of four identical nucleotides and avoiding primer dimers and hairpin structures; (2) treating the samples under different environmental and developmental conditions; (3) running RT-PCR and qRT-PCR, taking care to load amounts of mRNA that are as nearly equal as possible because no reference gene is yet available as an internal standard; and (4) comparing each candidate gene across conditions and selecting the most stable gene as the reference gene for the subsequent study. Among these steps, comparing the stability of each gene may be the most difficult because qRT-PCR generates a large amount of raw data. To solve this problem, several laboratories have developed computational programs to compare candidate genes. Among these programs, geNorm (Vandesompele et al. 2002), NormFinder (Andersen et al. 2004), BestKeeper (Pfaffl et al. 2004), and the comparative ΔCt method (Silver et al. 2006) are four frequently used programs for studying reference genes. These four programs rank reference genes using different statistical endpoints and algorithms. Thus, each has different advantages, and their rankings of the reference genes may differ slightly. To address this and allow researchers to better rank each individual reference gene, we developed a new computational program called RefFinder. This is the first program to integrate the four most commonly used computational programs to rank a set of reference genes; because it combines four independent rankings, the resulting rank is more reliable. More importantly, RefFinder is designed as a web platform and is easy to use for people without any bioinformatics skills. Currently, RefFinder has been widely used and validated by hundreds of research laboratories around the world for studying, comparing, and screening the most reliable reference genes for various research purposes (Bansal et al. 2015; Hazarika et al. 2023; Kochhar et al. 2022; Taki et al. 2014; Taki and Zhang 2013; Wang et al. 2013; Zhang et al. 2012, 2023).