1 Quantitative PCR

In 1984, Kary Mullis developed the Polymerase Chain Reaction (PCR). This technique amplifies a specific segment of DNA to obtain hundreds of millions of copies in a few hours [1]. Initially it was used for qualitative studies, and it was not until 1992 that Higuchi et al. developed the quantitative Polymerase Chain Reaction (qPCR), which applies the PCR technique to gene expression studies.

To do this, they used the same material as for conventional PCR: a pair of specific oligonucleotides as primers; deoxynucleotide triphosphates (dNTPs); a reaction buffer; and a thermostable DNA polymerase; and they added a fluorochrome that fluoresces when excited [2]. Additionally, they designed a system capable of detecting the accumulated PCR products in real time. The system uses a camera that detects the increase in fluorescence that occurs when ethidium bromide intercalates into the new DNA strands formed in each cycle [3, 4]. Therefore, in real-time PCR or qPCR, the processes of amplification and detection occur simultaneously in the same vial.

The PCR reaction consists of a series of cyclical temperature changes. Each cycle is divided into three stages [5]:

  • Denaturation: separation of double-stranded DNA when subjected to 95 °C.

  • Hybridization of primers: annealing of the primers to the DNA template at temperatures around 50–60 °C.

  • Elongation or polymerization: incorporation of the corresponding dNTPs into the elongating DNA chain at temperatures around 68–72 °C.

Theoretically, if the reaction efficiency is 100 %, the number of DNA molecules doubles with each cycle. In reality, the efficiency under optimal conditions will be slightly less than 100 % [6].
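
To make the arithmetic concrete, here is a minimal sketch in Python (with an illustrative starting copy number and efficiency) of the expected product accumulation, N_n = N_0 · (1 + E)^n, where E = 1 corresponds to perfect doubling:

    # Sketch: expected amplicon count per cycle, N_n = N0 * (1 + E)**n.
    # n0 (starting copies) and efficiency are illustrative values.
    def amplicons(n_cycles, n0=100, efficiency=0.95):
        """Return the expected copy number after each cycle."""
        return [n0 * (1 + efficiency) ** n for n in range(n_cycles + 1)]

    counts = amplicons(30)
    print(f"After 30 cycles: {counts[-1]:.2e} copies")  # ~5e10 with E = 0.95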

1.1 Methods of Detection

The qPCR technique allows the quantification of starting material (DNA, cDNA, or RNA) by using fluorophores. The fluorescence is measured in each cycle and is proportional to the amount of PCR product.

On the one hand, fluorophores can be fluorescent dyes that bind nonspecifically to double-stranded DNA and produce a fluorescent signal that correlates with the DNA copy number. In this group we find dyes such as SYBR Green [7]. On the other hand, fluorochromes attached to probes can be used, which hybridize specifically to the amplified DNA strands. Thus, the reaction is more specific, and the signal is only generated when the probe hybridizes with its complementary region. In this group we find hydrolysis ("TaqMan") probes, FRET probes, Beacon probes, and Scorpion probes [8].

1.1.1 Nonspecific Fluorescent Dyes

SYBR Green I is an organic compound with the chemical formula C32H37N4S [9]. It is associated with the DNA molecule by interacting with the minor groove. It is used in DNA staining for electrophoresis analysis of PCR products, or as a means of direct visualization of the PCR products in real time [9].

Melting curve analysis is necessary to detect problems of nonspecific binding of SYBR Green to DNA. Comparison of the melting curves allows interpretation of the products amplified in the PCR. Different products have different melting temperatures, because the melting temperature depends on the size of the amplicon, the GC content, and the secondary and tertiary structures. At the end of the PCR reaction, a gradient from 50 to 95 °C is applied to denature the double-stranded DNA. As the double-stranded DNA becomes single-stranded, the decrease in fluorescence can be observed as peaks when the derivative of the fluorescence is plotted. If the reaction is not specific, the PCR products show different melting curve peaks [10]. Other fluorescent dyes, such as LC Green, ResoLight, EvaGreen, Chromofy, SYTO, and BEBO, have been available for more than a decade [11].
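
As a minimal sketch of the analysis described above (with simulated fluorescence data standing in for instrument output), the melting peak can be located on the derivative of the melting curve; here the common −dF/dT representation is used:

    import numpy as np

    # Hypothetical melting-curve data: a 50-95 C gradient and the
    # fluorescence decay of a single PCR product melting near 84 C.
    temperature = np.arange(50.0, 95.5, 0.5)
    fluorescence = 1.0 / (1.0 + np.exp((temperature - 84.0) / 0.8))

    # Melting peak: maximum of the negative first derivative -dF/dT.
    neg_dF_dT = -np.gradient(fluorescence, temperature)
    tm = temperature[np.argmax(neg_dF_dT)]
    print(f"Estimated melting temperature: {tm:.1f} C")
    # A specific reaction shows a single peak; additional peaks suggest
    # nonspecific products or primer dimers.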

1.1.2 Sequence Specific Probes

  • TaqMan® probe: belongs to the group of hydrolysis probes and is based on the 5′–3′ exonuclease activity of Taq polymerase. The TaqMan probe is a sequence complementary to the PCR product that is not part of the primers. The probe is labeled with a fluorophore covalently bonded to its 5′ end, called the "donor," and a quencher covalently bonded to its 3′ end, also called the "acceptor," whose function is to quench the fluorescence emitted by the fluorophore when it is excited by light. The 5′ exonuclease activity degrades the probe and releases the fluorophore, which emits fluorescence once it is no longer close to the quencher. The amount of liberated fluorophore is proportional to the amount of DNA amplified in the PCR. The probe should be close to a primer, and the amplicon size should not be greater than 200 base pairs. Another drawback is that these probes do not allow melting curve analysis, because the hydrolysis of the probe prevents its reuse [12, 13].

  • Molecular Beacons: they have a fluorophore covalently linked to the 5′ end, called the "donor," and a quencher covalently linked to the 3′ end, called the "acceptor"; they also have a hairpin-like secondary structure that brings the fluorophore close to the quencher. When the probe hybridizes specifically to the DNA, the hairpin opens and the fluorophore is separated from the quencher, allowing the signal to be recorded during the annealing phase. The difficulty lies in probe design, because the probe–amplicon hybrid should be more stable than the hairpin structure formed by the molecular beacon [12–14].

  • FRET probes: a two-probe system in which one oligonucleotide contains the donor fluorophore and the acceptor fluorophore is on another oligonucleotide. When both probes bind, the fluorophores come close together and energy is transferred from the donor to the acceptor, which emits fluorescence. A good design of the two primers and the two probes is critical to obtaining good results [12–14].

  • Scorpion probe: it acts both as a probe and as a PCR primer and has a hairpin structure. The molecule has a fluorophore and a quencher, similar to molecular beacon probes, and binds to the amplicon by the same principle. The reaction leading to the fluorescent signal is immediate, because the probe is attached to the primer and does not have to collide with the target region; therefore, the signals are stronger [14].

  • MGB probe (Minor Groove Binder): it is considered a variant of the TaqMan probe, but it is not hydrolyzed and has modified bases. The probe is labeled with a fluorophore at the 3′ end and a quencher at the 5′ end. The Minor Groove Binder is a part of the probe that binds to the minor groove of DNA and protects the probe from the 5′–3′ exonuclease activity of the polymerase. When the probe hybridizes, the fluorophore is separated from the quencher and a fluorescent signal is generated, which is directly proportional to the DNA quantity. An advantage of these probes is that they are shorter and more stable. They include "superbases," which are modified bases that increase the melting temperature [12, 13].

1.2 Quantitative PCR Characteristics

1.2.1 Advantages of the Technique

The main advantages of this technique are as follows [15]:

  • Speed: the testing time is approximately 1 h.

  • Simplicity: the assay requires a pair of primers, an enzyme, dNTPs, and optionally a probe.

  • Convenience: it does not require postamplification processing.

  • Sensitivity: samples with very few copies of messenger RNA (mRNA) can be quantified.

  • Specificity: a well-designed assay is specific for a single target.

  • Robustness: a well-designed assay will give results over a wide range of reaction conditions.

  • High throughput: thousands of reactions can be carried out in a single experiment.

  • Familiarity: PCR is well known; its advantages and disadvantages are well understood.

  • Cost: the price of reagents is affordable, and with small reaction volumes the costs decrease.

1.2.2 Basic Terms of qPCR

All systems quantify and record the fluorescent signal in each cycle of the PCR in a similar way. The reaction kinetics is represented as a sigmoid curve. The graphical representation is defined by the background, baseline, threshold, and threshold cycle [13] (see Fig. 1).

Fig. 1: Basic terms of qPCR

  • Background: fluorescence that is not specific to the reaction; it is removed mathematically.

  • Baseline: the noise level in the early cycles of PCR, where no increase in PCR-product fluorescence is detected; it determines the basal fluorescence.

  • Threshold: a value set just above the baseline level, where the exponential amplification begins. The threshold level can be determined automatically or manually.

  • Threshold cycle: the cycle at which the fluorescence exceeds the threshold. This cycle is used for relative quantification of gene expression.
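
To make these terms concrete, here is a minimal sketch (with simulated fluorescence readings and an arbitrary threshold convention) of how the threshold cycle can be located as the first cycle whose fluorescence exceeds the threshold:

    import numpy as np

    # Hypothetical per-cycle fluorescence of one qPCR well (sigmoid kinetics).
    cycles = np.arange(1, 41)
    fluorescence = 10.0 / (1.0 + np.exp(-(cycles - 24) / 1.5))

    # Baseline: signal of the early cycles (here cycles 3-15).
    early = fluorescence[2:15]
    # Threshold: set just above the baseline noise (10 SD is one convention).
    threshold = early.mean() + 10 * early.std()

    # Threshold cycle (Ct): first cycle whose signal exceeds the threshold.
    ct = cycles[np.argmax(fluorescence > threshold)]
    print(f"Ct = {ct}")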

1.2.3 Phases of qPCR

The qPCR reaction involves three phases that may be represented as a sigmoidal curve [16]. The three phases are as follows [12] (Fig. 2):

Fig. 2: Phases of qPCR

  • Exponential phase: the amount of product is small, and the PCR product is generated exponentially because the enzyme and reagents are not limiting, so the reaction can reach its maximum efficiency. This exponential growth is difficult to detect because the quantity of product is insufficient. The amount of product at this stage is proportional to the amount of initial sample.

  • Linear phase: the amount of product increases linearly because the enzyme and reagents begin to become limiting, so the reaction efficiency decreases.

  • Plateau phase: the reaction slows until the dNTPs and primers needed for new synthesis are depleted.

The cycle at which the fluorescence begins to exceed the background level is called the threshold cycle (Ct) and marks the beginning of the logarithmic phase. Therefore, the Ct is inversely correlated with the amount of sample: the lower the Ct, the greater the amount of cDNA.

1.2.4 Quantification

To quantify the expression of the gene under study, there are two methods, called absolute quantification and relative quantification. In the first case, expression is reported as the absolute number of copies obtained. In the second, quantification is based on a calibrator, whose value serves as the benchmark for all others and is assigned a value of 1 [17].

To perform absolute quantification, it is necessary to know the number of copies of the target gene in a standard sample. Generally, this standard sample is a plasmid DNA or complementary DNA (cDNA) whose concentration is measured with a spectrophotometer. In the assay, serial dilutions of the standard sample, in which the target gene will be amplified, should be prepared.

The computer registers the threshold cycle (Ct) for each sample. The dilutions of the standard sample yield a standard curve, from which the copy number of the samples can be quantified by interpolation of their Ct values [12, 17].
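
As a minimal sketch of this calculation (with hypothetical Ct values for tenfold dilutions of the standard), the copy number of an unknown sample can be interpolated from a linear fit of Ct against the logarithm of the copy number:

    import numpy as np

    # Hypothetical standard curve: tenfold dilutions of a quantified standard.
    copies = np.array([1e7, 1e6, 1e5, 1e4, 1e3])
    ct_std = np.array([14.2, 17.6, 21.0, 24.4, 27.8])  # measured Ct values

    # Linear fit of Ct versus log10(copy number).
    slope, intercept = np.polyfit(np.log10(copies), ct_std, 1)

    # Interpolate the Ct of an unknown sample to estimate its copy number.
    ct_unknown = 19.3
    copies_unknown = 10 ** ((ct_unknown - intercept) / slope)
    print(f"slope = {slope:.2f}, estimated copies = {copies_unknown:.2e}")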

There are three methods for relative quantification; the most widely used is the 2^−ΔΔCt method [12]. This method assumes that the procedure doubles the DNA content in each cycle, that the reaction efficiency is 100 %, and that a reference gene is expressed at a constant level across all samples [18]. This reference gene is a constitutive gene that is used as an endogenous control to correct intra- and inter-assay variability.

The expression of the reference gene is constant across the different samples of the assay, as it is a gene whose function is related to cell maintenance; therefore, it is also called a constitutive gene. The following formula expresses the ratio obtained from the relationship between the Ct values of the sample and the Ct values of the calibrator:

Ratio = 2^−(ΔCt sample − ΔCt calibrator)

Ratio = 2^−ΔΔCt

ΔCt sample is the difference between the Ct of the gene under study in the sample and the Ct of the reference gene in the sample; ΔCt calibrator is the difference between the Ct of the gene under study in the calibrator and the Ct of the reference gene in the calibrator.
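
A minimal sketch of the 2^−ΔΔCt calculation with hypothetical Ct values (the variable names are illustrative only):

    # Hypothetical Ct values: target and reference gene, sample vs. calibrator.
    ct_target_sample, ct_ref_sample = 23.1, 18.0
    ct_target_calib, ct_ref_calib = 25.6, 18.2

    delta_ct_sample = ct_target_sample - ct_ref_sample      # 5.1
    delta_ct_calibrator = ct_target_calib - ct_ref_calib    # 7.4
    ddct = delta_ct_sample - delta_ct_calibrator            # -2.3

    ratio = 2 ** (-ddct)
    print(f"Fold change = {ratio:.2f}")  # ~4.9-fold relative to the calibrator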

Another model for relative quantification was proposed by Pfaffl [19, 20]. In this model, different PCR efficiencies of the genes under study are taken into account, as shown in the following equation:

$$ \mathrm{Ratio\;(fold)} = \frac{\left(E_{\mathrm{target\;gene}}\right)^{\Delta Ct_{\mathrm{target}}\,(\mathrm{calibrator}-\mathrm{sample})}}{\left(E_{\mathrm{reference\;gene}}\right)^{\Delta Ct_{\mathrm{reference}}\,(\mathrm{calibrator}-\mathrm{sample})}} $$

In this model, it is necessary to know the efficiency of each pair of primers for each gene. The efficiencies are obtained from the slopes of standard curves generated from serial dilutions, according to the formula [16]:

E = 10^(−1/slope) − 1

The third model is called the standard curve method, or E-method, and analyzes the efficiency of the gene under study and of the reference gene using standards, for which serial dilutions of a single sample are made. The standard curve method calculates the efficiency for each pair of primers of each gene [19, 21].
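
A minimal sketch combining the two formulas above (all numbers are hypothetical): the efficiency is derived from the standard-curve slope, and the Pfaffl ratio uses a gene-specific amplification base of 1 + E for each gene:

    def efficiency_from_slope(slope):
        """E = 10**(-1/slope) - 1; E = 1 corresponds to perfect doubling."""
        return 10 ** (-1.0 / slope) - 1.0

    # Hypothetical standard-curve slopes for the target and reference genes.
    e_target = 1.0 + efficiency_from_slope(-3.45)     # amplification base ~1.95
    e_reference = 1.0 + efficiency_from_slope(-3.32)  # amplification base ~2.00

    # Pfaffl model: ratio of efficiency-corrected fold changes.
    dct_target = 25.6 - 23.1     # Ct of target gene: calibrator - sample
    dct_reference = 18.2 - 18.0  # Ct of reference gene: calibrator - sample
    ratio = (e_target ** dct_target) / (e_reference ** dct_reference)
    print(f"Expression ratio = {ratio:.2f}")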

1.2.5 Selection of Reference Genes

Selection of the proper reference gene is a critical step for correctly assessing the data obtained by quantitative PCR. Most authors agree that the use of reference genes is the most effective and simple method for correcting errors and artifacts such as [8, 22]:

  • Problems in the process of extraction, purification, or storage of RNA

  • Poor performance of reverse transcription during cDNA synthesis

  • Errors in pipetting or in transferring material

  • Polymerase inhibitors

  • Poorly designed primers

  • Inappropriate statistical analysis

Many qPCR experiments have been poorly designed and are difficult to reproduce because of low data quality [8]. For this reason, in recent years the normalization against reference genes has become a recurring problem addressed by scientists, which has led to the development of a variety of protocols and methodologies. Moreover, related publications showed large differences in the information reported on how the research was conducted. This is being addressed by guidelines for all publications associated with this methodology, published as the MIQE guidelines (Minimum Information for Publication of Quantitative Real-Time PCR Experiments) [23].

Most publications agree that constitutive genes should show minimal variability among the tissues, cells, physiological conditions, or treatments under investigation [8, 24]. Therefore, it is necessary to validate the reference genes for each tissue and treatment analyzed. Furthermore, it is desirable that the constitutive gene presents a threshold cycle (Ct) as close as possible to that of the gene of interest; this is not always possible, so general recommendations should be followed. These recommendations advise against choosing a constitutive gene with low expression (Ct > 30) or with very high expression (Ct < 15) [8].

It is always recommended to use at least two reference genes, since the use of only one can result in relatively large errors [8, 23]. Although the use of a single reference gene is acceptable if it was previously tested in an experiment with similar conditions and properly validated, it should be avoided to minimize possible bias [8].

Several programs and models for selecting reference genes have been proposed, with different statistical approaches and algorithms. Among the best-known applications for normalization are NormFinder, geNorm, and BestKeeper. There are also approaches such as the comparative ΔCt method or the classic ANOVA model, which compare the stability of reference gene expression under the different study conditions by comparing average Cts [25].
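
As a minimal sketch of the underlying idea (not the geNorm, NormFinder, or BestKeeper algorithms themselves), candidate reference genes can be ranked by the variability of their Ct values across all samples; the gene names and Ct values here are hypothetical:

    import statistics

    # Hypothetical Ct values of three candidate reference genes across samples.
    candidates = {
        "GAPDH": [18.1, 18.4, 19.9, 18.2, 20.1],
        "ACTB":  [17.5, 17.6, 17.4, 17.7, 17.5],
        "B2M":   [21.0, 21.3, 20.8, 21.2, 21.1],
    }

    # Rank candidates by Ct standard deviation (lower = more stable expression).
    for gene, cts in sorted(candidates.items(),
                            key=lambda kv: statistics.stdev(kv[1])):
        print(f"{gene}: SD = {statistics.stdev(cts):.2f}")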

It should also be mentioned that many algorithms exist to study the variability of constitutive gene expression; therefore, the ranking of candidate reference genes may vary depending on the software or statistical technique used [26–29]. Finally, when selecting reference genes, it is advisable to choose genes with different functions that do not share a common regulation that may affect their expression [8].

2 Microarrays

The microarray technique is based on the complementarity between nucleic acid strands, allowing the detection of specific sequences through what is called hybridization [30]. In situ hybridization was first used independently by Gall and Pardue, and by John et al., in the same year (1969). It was not used to detect mRNA until 1986, by Coghlan [31]. In the late 1980s, Ekins et al. at University College London developed the first array for immunoassay studies; this array began to be manufactured and marketed in 1991 [32].

In the field of gene expression, macroarrays could study between 500 and 18,000 cDNAs. These human cDNAs were obtained from bacterial libraries and were generally radiolabeled with 33P-dNTPs. The most important limitations of macroarrays were the low density of probes on the nylon support, the large sample volumes necessary for hybridization, and the fact that the bacterial libraries might not be composed of pure colonies [33].

Technological efforts focused on the miniaturization of arrays to overcome the limitations of the existing technology. In 1991, a group formed by Stephen Fodor, Leighton Read, Michael Pirrung, and Lubert Stryer fabricated a microarray. In 1993, Stephen Fodor created a spin-off dedicated to microarray development, and in 1994 the manufacture and sale of microarrays began [34, 35].

Later, Patrick Brown used a glass support that allowed a higher density of probes. Miniaturization decreased the amount of sample required for the study of gene expression; in addition, radioactively labeled nucleotides were replaced by fluorophores [36]. In 1996, the design, tools, and knowledge were disclosed so that other research groups could make their own microarrays in their laboratories. This information boosted the use of DNA microarrays [33].

The microarray, also known as a DNA chip or biochip, is a solid support of glass, plastic, or nylon to which oligonucleotides are attached whose sequences correspond to regions across the genome. Manufacturers used the same technology as for semiconductor chips, vertically placing millions of DNA strands on the support. Each oligonucleotide is placed in a specific area called a "probe cell," where billions of copies of the oligonucleotide are found. These oligonucleotides are synthesized either before bonding to the support or directly on the support. In the second case, there are many different methods: photolithography, phosphoramidite injection, or activation of precursors by an electric field [37, 38].

2.1 Types of Microarrays

Two types of microarrays are distinguished: expression microarrays, where specific RNA sequences are detected, and genotyping microarrays, for detecting specific DNA sequences [38]. By comparing the results of both arrays, the relationship between polymorphisms and gene expression can be established, which, among other things, could explain the different treatment responses observed among patients with different genetic backgrounds.

2.1.1 Oligonucleotide Microarrays

Specific fragments of 25 base pairs are chemically synthesized and immobilized. Because of their homogeneity, reproducibility, robustness, and high density, they are the most widely used, since they can study up to 20,000 genes at once [38]. The main limitation is the cost of selecting and synthesizing specific oligonucleotides; it is more than three times more expensive than a cDNA microarray. The most important advantage is that photolithography can be used to direct the synthesis of oligonucleotides, which allows a high density of probes. Another advantage is that targeted sequence synthesis prevents cross-hybridization between sequences of related genes. Furthermore, all the oligonucleotides of the microarray are of the same size, the same melting temperature, and the same concentration; therefore, the experimental variation is decreased and the statistical power is increased [39].

2.1.2 cDNA Microarrays

cDNA probes of 600–2000 base pairs are immobilized on a glass, nylon, or nitrocellulose support [38]. The main advantage is that these microarrays are cheaper than oligonucleotide microarrays. Another advantage is that they can be created from cDNA libraries, many of which are in the public domain.

The most important limitation is the data treatment, owing to the possibility of cross-hybridization between related genes. Another important limitation is that variations in probe sizes and melting temperatures can diminish the statistical power [39].

In addition, these microarrays can be used in two-color co-hybridization experiments. Such experiments allow direct comparison of mRNA abundance between two populations, although this approach yields comparative ratios rather than absolute levels of expression. On the other hand, the use of ratios reduces the inter-assay variation [39].
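
A minimal sketch of the two-color comparison (with invented, background-corrected spot intensities): for each spot, the abundance ratio between the two labeled populations is usually expressed as a log2 ratio:

    import math

    # Hypothetical background-corrected spot intensities for three genes,
    # as (Cy5 = treated population, Cy3 = control population).
    spots = {"geneA": (5200.0, 1300.0),
             "geneB": (800.0, 790.0),
             "geneC": (300.0, 1250.0)}

    for gene, (cy5, cy3) in spots.items():
        log_ratio = math.log2(cy5 / cy3)
        print(f"{gene}: log2(Cy5/Cy3) = {log_ratio:+.2f}")
    # +2.00 means ~4-fold more abundant in the treated population; note that
    # this design yields comparative ratios, not absolute expression levels.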

2.2 Methodology in Microarrays

The methodology of this technique includes the following steps [38, 40]:

  • Extraction and preparation of RNA: the RNA is extracted from specific tissues, trying to obtain RNA of the highest purity and quality possible, thus avoiding a major source of variability.

  • RNA amplification: the RNA is amplified by PCR to facilitate hybridization because the amount of mRNA in cells may not be sufficient.

  • Reverse transcription: to convert mRNA into cDNA.

  • Labeling the probes: the cDNA is fragmented and labeled with biotin. Afterward, the fluorescent molecule that binds to biotin is added.

  • Hybridization of the probes: the time required for complete hybridization is directly proportional to the sample concentration. At the end of this process, the hybridized microarray is rinsed to remove unbound strands.

  • Scanning of the microarray: detection of fluorescent light indicates that hybridization has occurred at a specific spot with a specific sequence. Reading is performed with a laser, and the fluorescence is recorded by scanning. The fluorescence intensity is proportional to the amount of sample bound to each probe.

2.3 Limitations of Microarrays

Expression microarrays have produced a great deal of information, but they have limitations. Some of the main limitations of microarrays are as follows [41, 42]:

  • Limited dynamic range of detection: the detection of expression levels is limited to two or three orders of magnitude due to the background and the signal saturation. Thus, they are not suitable for studies of genes with low or very high expression.

  • Reliance on existing knowledge: only known genome sequences can be studied, and errors in the databases can worsen outcomes.

  • Difficult comparison between experiments: they require complicated normalization methods to compare expression levels from different experiments.

  • Cross-hybridization: occurs between genes of similar sequence; it generates background and reduces the dynamic range.

  • New transcripts cannot be detected: new transcripts produced by alternative splicing cannot be detected if the sequence is unknown.

  • Necessity of validation: microarray expression data need to be validated by qPCR to be considered reliable. To do this, appropriate reference genes must be selected and validated [43].

3 RNA Sequencing (RNAseq)

In 1964, Holley performed the first complete sequencing of a gene. In the 1970s, Maxam and Gilbert developed a DNA sequencing technology based on chemical modification of DNA and subsequent cleavage at specific bases, while Sanger developed a DNA sequencing method based on chain termination. Sanger sequencing prevailed as first-generation sequencing because of its high efficiency and low radioactivity [44, 45].

The first automatic sequencer using the Sanger method appeared in 1987 and adopted capillary electrophoresis as a more accurate and faster sequencing method. These sequencers have evolved from the 500 kilobases sequenced per day by the first model on the market to the 2.88 megabases of current models. These sequencers allowed the human genome project to be completed in 2001 [45].

In 2005, the great revolution in the field of DNA sequencing occurred when the first high-throughput sequencers appeared on the market. Several companies are now responsible for the development of high-throughput sequencers [45].

3.1 Transcriptome and RNA Sequencing

The high-throughput sequencers allow investigation of the transcriptome. The transcriptome is the set of ribonucleic acids in the cell, including messenger ribonucleic acid (mRNA), transfer ribonucleic acid (tRNA), ribosomal ribonucleic acid (rRNA), small nuclear ribonucleic acid (snRNA), noncoding ribonucleic acids (ncRNA), and others. These RNAs are differentially expressed according to the tissue, the physiological condition, or the stage of development [46].

Interpreting the complexity of the transcriptome is a crucial objective for understanding the functional elements of the genome and thus how diseases function and progress. In this sense, it has recently been shown that the amount of noncoding DNA increases with the complexity of the organism: 0.25 % in prokaryotic genomes and 98.8 % in the human genome.

The existing level of complexity, together with the discoveries of endogenous small interfering RNA (siRNA), long intergenic noncoding RNA (lincRNA), transcription initiation RNA (tiRNA), microRNAs (miRNAs), and transcription start site-associated RNA (TSSa-RNA), among others, represents parts of the transcription puzzle that we must decipher to understand how the genome works [45].

For this, the new RNA sequencing technology must start by cataloging all RNAs, from mRNAs through noncoding RNAs to small RNAs; determining transcription start sites; and quantifying changes in gene expression levels during development and under different conditions [41].

RNA sequencing is a technique in which fragments of 25–400 base pairs are sequenced, depending on the technology. To this end, a population of RNAs is fractionated and transformed into a population of cDNAs. These cDNAs are joined to adapters at one or both ends. Each cDNA molecule is then sequenced to obtain short reads from one or both ends [41].

3.2 RNAseq Methodology

The design of a transcriptome study follows these steps [46]:

  1. Selection of the tissue of interest in which the RNAs are to be studied.

  2. Construction of the cDNA libraries.

  3. Use of a massive sequencing system.

  4. Analysis of the data using bioinformatics tools. Millions of short reads are obtained, which are mapped to a genome or transcriptome. The reads must be aligned to summarize all the data. Finally, the data are normalized and statistical tests are applied to study differential gene expression, resulting in a ranking of genes with their expression levels, p-values, and fold changes [47].

There are several methods and programs for processing RNA sequencing data, which aim not only to estimate the differences in expression levels between samples but also to detect alternative splicing, analyze RNA editing, calculate the abundance of a transcript, etc. [47].
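
As a minimal sketch of the normalization and fold-change step only (full pipelines add dispersion estimation and statistical testing to obtain p-values), raw read counts can be scaled to counts per million (CPM) to correct for sequencing depth before conditions are compared; the counts are hypothetical:

    import math

    # Hypothetical raw read counts per gene for two samples (conditions).
    counts = {"gene1": (150, 600), "gene2": (2000, 2100), "gene3": (30, 4)}
    lib_sizes = [sum(c[i] for c in counts.values()) for i in (0, 1)]

    for gene, (a, b) in counts.items():
        # Counts per million: correct each count for library size.
        cpm_a = 1e6 * a / lib_sizes[0]
        cpm_b = 1e6 * b / lib_sizes[1]
        fold = math.log2((cpm_b + 1.0) / (cpm_a + 1.0))  # +1 pseudocount
        print(f"{gene}: log2 fold change = {fold:+.2f}")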

3.3 Advantages and Limitations of RNAseq

The RNAseq technology presents several advantages over other methodologies that aim to study gene expression:

  • Single-base resolution: it can detect single-nucleotide changes (SNPs) in transcripts, as well as microsatellites, isoforms, and allelic variants.

  • Wide dynamic range: there is no upper limit for quantification, which correlates with the number of sequences obtained; consequently, it has a wide dynamic range of expression levels, estimated at four to five orders of magnitude. It can therefore analyze genes expressed at very low or very high levels that are not detected by other techniques. The low background signal helps to improve the dynamic range.

  • Short reads: reads from 30 bp upward allow accurate information about how two exons are connected.

  • High accuracy: this has been shown by validation studies with quantitative PCR.

  • High levels of reproducibility

  • Lower RNA sample requirement: because there are no cloning steps.

  • No need for a reference genome: sequencing can be performed without a reference genome, and the transcriptome can be assembled de novo. This is an advantage for species whose genomes have not yet been sequenced [41, 45, 46].

However, this technique also presents some limitations, such as its high cost: it requires many resources compared with other techniques. The data sets are large and complex; the large amount of generated data makes interpretation difficult, and usually a bioinformatics adviser is needed [46].