
1 DNA Microarrays: A Revolution in Cell Biology

The last decade of the twentieth century witnessed the emergence of two revolutionary experimental techniques in molecular and cell biology: the single-molecule mechanics techniques discussed in Chap. 11 (Xie 2001; Ishii and Yanagida 2007; Deniz et al. 2008) and the DNA microarray technique to be discussed in this chapter.

The advent of the microarray technique in molecular biology in the mid-1990s (Pease et al. 1994; Schena et al. 1995; Eisen et al. 1998; Holter et al. 2000; Watson and Akil 1999; Alon et al. 1999; White et al. 1999) marks an important turning point in the history of cell biology, perhaps comparable to the discovery in 1953 of the DNA double helix in molecular biology. Although many challenging problems remain, both methodological and theoretical (Weinstein 2008; Ji and Yoo 2005; Ji et al. 2009a), this novel technology has great potential to make fundamental contributions to our knowledge of the basic workings of the living cell, which will lead to practical applications in medicine, biotechnology, and the pharmaceutical industry (Chaps. 18, 19).

The advancement of the microarray technique, which was critically dependent on the molecular biology of DNA, has initiated a paradigm shift away from DNA toward a system-based biology by allowing biologists to study the cell as an organized system of biopolymers, in contrast to the earlier studies centered on individual biopolymers (DNA, RNA, and proteins). In other words, DNA not only opened the era of molecular biology in the mid-twentieth century but also ushered in its own eclipse as the prima donna of biology in the last decade of the same century, by giving birth to the microarray technique that led to the emergence of systems biology. It is becoming increasingly clear that the genome-wide expression data revealed by the DNA array technique can no longer be rationally accounted for solely on the basis of the principles and knowledge gained from the molecular biology of individual biopolymers (Ji 2004a; Ji and Yoo 2005; Bechtel 2010) and that new approaches and perspectives are needed that are deep and powerful enough to enable cell biologists to correctly analyze and interpret the avalanche of DNA array data that is accumulating on the Internet.

2 The DNA Microarray Technique

cDNA fragments can be fabricated into either microarrays or macroarrays (Garcia-Martinez et al. 2004), but the term “microarrays” frequently refers to both micro- and macroarrays. A microarray consists of a microscope slide (or its equivalent), about 2 × 2 cm in dimension, divided into, typically, 10,000 squares, each of which covalently binds hundreds of copies of a fragment of DNA (i.e., cDNA or oligonucleotides) that is complementary to a stretch of the genome encoding an RNA molecule (Watson and Akil 1999). Thus, using one microarray, it is possible to measure simultaneously the levels of 10,000 different RNA molecules in a biological sample. Before the development of the microarray technique, it was possible to study only a small number of RNA molecules at a time. The experimental procedures involved in DNA microarray measurements are as follows (Watson and Akil 1999):

  1. Isolate RNA from broken cells.

  2. Synthesize fluorescently or radioactively labeled cDNA from RNA using reverse transcriptase and fluorescently or radioactively labeled nucleotides. When fluorescently labeled nucleotides are employed, the red fluorophore Cy5 is used to label the DNA synthesized from the RNA isolated from query samples and the green fluorophore Cy3 is used to label the cDNA synthesized from the RNA isolated from reference samples.

  3. Prepare a microarray either with EST (expressed sequence tag, i.e., DNA sequences, several hundred nucleotides long that are complementary to the stretches of the genome encoding RNAs) or oligonucleotides (synthesized right on the microarray surface).

  4. Pour the labeled cDNA preparation over the microarray surface to effect hybridization and wash off excess debris.

  5. Measure the light intensity or the radioactivity of the labeled cDNA bound to individual squares (or spots) on a microarray surface using a computer-assisted microscope.

  6. Display the final result as a table of numbers, each registering the signal intensity of a square on the microarray which is proportional to the concentration of cDNA (and ultimately RNAs in the cell before breaking the cell membrane) located at row x and column y. Row x indicates the identity of the genes encoding RNAs being measured, and column y indicates the time of measurement or the conditions under which the RNA levels are measured.
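The table of numbers produced in the last step can be sketched as a small data structure. The following Python fragment is illustrative only: the gene names and signal values are invented, and `average_trajectory` is a hypothetical helper (not part of any published pipeline) that averages the signal over all rows at each sampling time, in the spirit of a genome-wide average trajectory.

```python
# Hypothetical signal table: rows are genes, columns are sampling times (min).
# All numbers are invented for illustration; they are not measured data.
TIME_POINTS = [0, 5, 120, 360, 450, 850]

signal_table = {
    "gene_A": [40.0, 20.0, 10.0, 8.0, 12.0, 16.0],
    "gene_B": [4.0, 2.0, 1.0, 1.0, 2.0, 2.0],
    "gene_C": [1.0, 0.5, 0.25, 0.25, 0.5, 0.75],
}


def average_trajectory(table):
    """Return the mean signal at each time point, averaged over all genes."""
    rows = list(table.values())
    n_genes = len(rows)
    n_times = len(rows[0])
    return [sum(row[j] for row in rows) / n_genes for j in range(n_times)]


avg = average_trajectory(signal_table)
```

Each entry of `avg` corresponds to one column of the table, so the result has one value per sampling time.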

Typical results of microarray measurements of RNA levels are shown in Figs. 9.1, 12.1, and 12.2. These data were measured by Garcia-Martinez et al. (2004) from budding yeast Saccharomyces cerevisiae undergoing glucose–galactose shift at 6 time points: 0, 5, 120, 360, 450, and 850 min after the nutritional shift. With the exception of two trajectories in Fig. 12.1b (brown and yellow) and one in Fig. 12.1c (yellow), the 15 transcripts show kinetic behaviors that are similar to the genome-wide average kinetic behavior shown in Fig. 12.1d, despite the fact that the scale of the y coordinates varies over two orders of magnitude. The overall quality of the kinetic data, as evident in the smooth and coherent trajectory exhibited by each gene, increases our confidence in the microarray experimental method. The three unusual trajectories seen in Figs. 12.1b, c are most likely not artifacts of measurement but genuine biological responses of the associated genes to the nutritional stress.

Fig. 12.1

Typical examples of the time-dependent RNA levels measured in budding yeast with DNA microarrays after switching glucose to galactose in the growth medium at t = 0 (Garcia-Martinez et al. 2004). Each data point is the average of three measurements. Panels a, b, and c show the RNA level trajectories of 15 genes chosen randomly out of over 6,000 genes in three different ranges on the y-axes, namely, between 0 and 60 RNA molecules per cell in Panel a, between 0 and 10 in Panel b, and between 0 and 1 in Panel c. Panel d depicts the average kinetics of the RNA molecules encoded by 5,836 genes. See Ji et al. (2009a) for the details involved in calibrating microarray signals in absolute units of RNA molecules per cell. A similar set of graphs is shown in Fig. 9.1

3 Simultaneous Measurements of Transcript Levels (TL) and Transcription Rates (TR) in Budding Yeast

Both the transcript level (TL) and transcription rate (TR) of individual genes were measured with DNA arrays at the same time by Garcia-Martinez et al. (2004). The S. cerevisiae yeast strain BQS252 was grown overnight at 28°C in YPD medium (2% glucose, 2% peptone, 1% yeast extract) to exponential growth phase. Cells were recovered by centrifugation, resuspended in YPGal medium (2% galactose, 2% peptone, 1% yeast extract), and allowed to grow in YPGal medium for up to about 14 h after the glucose–galactose shift. Cell samples were taken at 0, 5, 120, 360, 450, and 850 min after the glucose–galactose shift. Two different aliquots were taken from the cell culture at each sampling time. One aliquot was processed to measure transcription rates (TR) using the genomic run-on protocol, a scaled-up version of the usual nuclear run-on method (Hirayoshi and Lis 1999), and the other was processed to measure transcript levels (TL) using the same DNA macroarrays recovered from the associated TR measurements. Two examples of the TL and TR measurements are shown in Fig. 12.2.

Fig. 12.2

The average time courses of the transcript levels (TL) and transcription rates (TR) of 14 each of the glycolytic and respiratory (also called oxidative phosphorylation, or oxphos) genes (Garcia-Martinez et al. 2004; Ji et al. 2009a)

One striking feature of the curves shown in Fig. 12.2 is the contrasting kinetic behaviors of the glycolytic and oxidative phosphorylation (or oxphos) transcripts in Panel a and the close similarity of the transcription-rate profiles in Panel b of these two metabolic pathways. It is important to keep in mind that at least a part of the reason for this difference is attributable to the fact that Panel a deals with RNA concentrations whereas Panel b is concerned with the rates of RNA synthesis.

4 RNA Trajectories as Intracellular Dissipative Structures (IDSs) or RNA Dissipatons

The temporal trajectories of individual transcripts (i.e., RNA molecules) shown in Fig. 12.1 are dynamic structures whose existence depends on dissipating free energy and hence can be referred to as dissipative structures of Prigogine (1977, 1980) (see Sect. 3.1). These RNA trajectories (or waves) are also frequently referred to as “gene expression profiles,” which is an inaccurate term because RNA trajectories reflect not only gene expression (understood here as transcription) but also transcript degradation (see Step 2 in Fig. 12.4 below). The general decline in the average RNA levels shown in Fig. 12.1d during the first 360 min after switching glucose to galactose is primarily due to the depletion of intracellular ATP caused by the removal of glucose, the preferred energy source of budding yeast. The subsequent rise in RNA levels beginning around 360 min is most likely due to galactose-induced expression of the genes coding for the Leloir enzymes (Berg et al. 2002) needed to metabolize the new substrate, galactose, to generate ATP (Ji et al. 2009a). This interpretation is supported by the finding that the Leloir transcripts began to increase at about 120 min after the nutritional shift (see Fig. 12.3).

Fig. 12.3

The RNA trajectories and time-dependent rates of transcription of the Leloir genes. (a) The kinetics of the transcript levels of the Leloir genes. (b) The average kinetics of the Leloir transcripts. (c) The average time course of the transcription rates of the Leloir genes. (d) The average of (c)

Fig. 12.4

A simplified version of the Bhopalator model of the cell (see Fig. 2.11). Gradients in this figure can be equated with “Dissipative Structure of Prigogine” in the Bhopalator or Intracellular Dissipative Structures (IDSs) (Ji 1985, 2002). It is important to note that what DNA arrays commonly measure is not the rates of transcription or gene expression (Step 1), as is widely believed, but the intracellular levels of RNA molecules (the balance between Steps 1 and 2) at any given time. In other words, the latter is determined by both the rate of synthesis (Step 1) and the rate of transcript degradation (Step 2). Ignoring this simple fact has led to erroneous interpretations of DNA microarray data since the beginning of the DNA microarray era in the mid-1990s (see Sect. 12.6) (Ji et al. 2009a)

The qualitative features of the temporal behaviors of TL and TR displayed in Fig. 12.2a, b are summarized in Table 12.1. In order to understand the molecular mechanisms underlying these dynamic behaviors of TL and TR, it is necessary to have a simplified model of the cell as shown in Fig. 12.4. As indicated in the first two rows in Table 12.1, the total observational period of 850 min can be divided into five phases, labeled I–V. During Phase I, the transcript levels of both glycolytic and respiratory genes decrease precipitously although the corresponding transcription rates increase, most likely because the stress induced by glucose–galactose shift increases transcript degradation rates (see Step 2 in Fig. 12.4) more than what can be compensated for by increased transcription. During Phases II and III, the glycolytic transcript levels decrease by twofold, whereas the oxidative phosphorylation (oxphos) transcript levels increase by fourfold. Since the corresponding transcription rates of both the glycolytic and respiratory genes decline rapidly followed by a plateau, the increased respiratory (also called oxphos) mRNA levels cannot be accounted for in terms of transcriptional control alone but must implicate degradational control as well. That is, just as the removal of glucose “de-induces” glycolytic mRNA molecules (leading to the declining TL and TR trajectories for glycolysis seen in Fig. 12.2a, b), so it might repress (or de-induce) the degradation of respiratory mRNA molecules, leading to a rise in respiratory mRNA levels as seen in Fig. 12.2a between 5 and 360 min. This hypothetical phenomenon may be referred to as glucose de-induction in analogy to glucose induction (Winderickx et al. 2002; Ronne 1995). During Phase IV, both TL and TR for glycolytic and respiratory genes increase, and this may be attributed to galactose induction of the Leloir transcription (Fu et al. 1995; Leuther and Johnston 1992). 
In support of this interpretation, it was found that glucose–galactose shift induced an increase in both TL and TR of the Leloir genes (GAL 1, 2, 3, 7, and 10) between 120 and 450 min by more than tenfold (see Fig. 12.3). The Leloir genes code for the enzymes and transport proteins that are involved in converting extracellular galactose to intracellular glucose-1-phosphate (Berg et al. 2002), which is then metabolized via the glycolytic and respiratory pathways. Finally, during Phase V, the glycolytic mRNA levels remain constant while the respiratory mRNA levels decline slightly, the latter likely due to galactose repression (in analogy to glucose repression [Winderickx et al. 2002]) of respiration following the formation of glucose-1-phosphate via the Leloir pathway (Berg et al. 2002). The transcription rate of glycolytic genes continues to increase during Phase V, probably due to galactose induction (Fu et al. 1995; Leuther and Johnston 1992), although the corresponding transcript levels remain unchanged, which may indicate degradational control of glycolytic mRNA molecules. That is, budding yeast seems capable of keeping glycolytic TL constant in the face of increasing TR by increasing the transcript degradation rates (TD). The TR trajectory of respiratory genes also continues to increase during Phase V despite the fact that their TL trajectory declines, which can be best explained in terms of the hypothesis that respiratory mRNA levels are controlled by transcript degradation during this time period. It is evident that the TL and TR data presented in Fig. 12.2a, b cannot be adequately accounted for in terms of TR alone but require taking into account both TR and TD (transcript degradation) on an equal footing (Ji et al. 2009a).
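The argument that TL reflects both TR and TD can be made concrete with a simple mass balance. The sketch below assumes d(TL)/dt = TR − TD, which is one plain reading of Steps 1 and 2 in Fig. 12.4 and not necessarily the procedure used in the cited studies. The function name `degradation_rates`, the interval-averaging of TR, and all numbers are illustrative assumptions.

```python
def degradation_rates(times, tl, tr):
    """Estimate the average transcript degradation rate (TD) in each interval.

    Assumes the mass balance d(TL)/dt = TR - TD, with TR taken as the mean of
    its two endpoint values in the interval (an illustrative assumption).
    """
    td = []
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        dtl_dt = (tl[k + 1] - tl[k]) / dt          # finite-difference slope of TL
        mean_tr = (tr[k] + tr[k + 1]) / 2.0        # average synthesis rate
        td.append(mean_tr - dtl_dt)                # TD = TR - d(TL)/dt
    return td


# Invented numbers: TL in molecules per cell, TR in molecules per cell per min.
times = [0, 5, 120]
tl = [10.0, 5.0, 5.0]
tr = [1.0, 2.0, 1.0]
td = degradation_rates(times, tl, tr)
```

In the first interval of this invented example TL falls while TR rises, so the estimated TD exceeds TR, which mirrors the qualitative pattern described for Phase I.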

Table 12.1 A summary of the kinetics of the TL (transcript level) and TR (transcription rate) depicted in Fig. 12.2a, b. The upward and downward arrows indicate an increase and a decrease, respectively

5 The IDS-Cell Function Identity Hypothesis: Experimental Evidence

Since RNA levels reflect the dynamic metabolic states of cells (or cell states) resulting from the interaction between two opposing processes – transcription and transcript degradation (see Fig. 12.4) – their maintenance requires free energy dissipation. Consequently, the time-dependent patterns of the changes in RNA levels (i.e., RNA trajectories or RNA waves) qualify as species (or tokens) of intracellular dissipative structures (IDSs) (Ji 1985a, b, 2002b) as already indicated.

S. cerevisiae has the capacity to metabolize both glucose and galactose but prefers the former as the carbon and energy source over the latter. In the presence of glucose, the organism turns on those genes coding for the enzymes needed to convert glucose to ethanol, the phenomenon known as glucose induction, and turns off those genes needed for galactose metabolism, which is known as glucose repression (DeRisi et al. 1997; Johnston 1999; Ashe et al. 2000; Jona et al. 2000; Kuhn et al. 2001). The detailed molecular mechanisms underlying these phenomena are incompletely understood at present (Gasch and Werner-Washburne 2002; Winderickx et al. 2002). When glucose is depleted, S. cerevisiae increases its rate of metabolism of ethanol to produce ATP via the Krebs cycle and mitochondrial respiration (Ronne 1995; Gasch 2002; Winderickx et al. 2002). This metabolic control is exerted by reversing the glucose repression of the genes encoding the enzymes required for respiration (or oxidative phosphorylation). This process is referred to as glucose de-repression (Gasch 2002).

The fact that the trajectory of the average glycolytic RNA molecules decreases (presumably because these transcripts are no longer needed in the absence of glucose as the substrate) while that of the average respiratory RNA molecules increases (presumably because these transcripts are needed to produce the corresponding enzymes to metabolize ethanol left over from previous glucose fermentation and the new substrate, galactose) during the first 3 h in Fig. 12.2 provides strong experimental support for the notion that the intracellular dissipative structures (e.g., the RNA gradients in the time dimension under discussion) are correlated with cell functions, thus providing some of the first experimental evidence for Step 10 in Fig. 12.4 (or Step 20 in Fig. 2.11). Thus, IDSs reflect metabolic functions (see the opposite changes in the glycolytic and respiratory RNA trajectories in Fig. 12.2a), leading to the hypothesis that IDSs can be employed as reliable molecular signs (or signatures) for the metabolic and functional states of living cells. This makes RNA trajectories or waves measured with microarrays convenient biomarkers for monitoring the functional state of metabolic pathways in whole cells whose de-regulation can lead to various diseases, including cancer (Watters and Roberts 2006). These observations have motivated me to formulate the IDS-Cell Function Identity Hypothesis as follows:

Fig. 12.5

The key steps involved in measuring intracellular RNA levels (RC) with microarrays. DC = DNA inside the cell; RC = RNA inside the cell; RT = RNA isolated in a test tube; DT = DNA reverse transcribed from RT inside a test tube; DM = DNA hybridized to the probe DNA on the surface of a microarray; S = the fluorescence signal measured from DM. 1 = transcription; 2 = transcript degradation; 3 = isolation of RNA from cell (C) into a test tube (T); 4 = DNA synthesis from RNA catalyzed by reverse transcriptase in test tube; 5 = hybridization of DT to DNA probes covalently attached to the microarray (M) surface; 6 = measuring the fluorescence signal from the target DNA hybridized to the probe DNA on the microarray surface; 7 = an RNA molecule affecting the transcription of its own or other genes, either directly (via microRNA, for example) or indirectly through protein synthesis; 8 = an RNA molecule influencing its own degradation either directly as RNA or through protein synthesis; 9 = measurement of cellular DNA after isolation into a test tube

Fig. 12.6

The “TL-TR phase diagram”: The plots of the transcription rates (TR) against transcript levels (TL) (in arbitrary units) measured in budding yeast at six time points (1 = 0 min, 2 = 5 min, 3 = 120 min, 4 = 360 min, 5 = 450 min, and 6 = 850 min) after the glucose–galactose shift. Each point represents the average of triplicate measurements (Garcia-Martinez et al. 2004). The plotted genes were randomly chosen out of the 5,725 genes showing no missing values in their triplicate measurements of TL and TR. The notation given on the top of each figure is the name of the open reading frame (ORF) whose transcript was measured

Fig. 12.7

The triadic definition of the sign “gene expression”

Fig. 12.8

The “unit circle” whose x-axis indicates the changes in TL values (ΔTL) and the y-axis indicates the changes in TR values (ΔTR) of a trajectory in the TL-TR plot (see Fig. 12.6). The direction of the radial arrow coincides with the direction of the component vector (or segment) of the trajectories in the TL-TR plot. The angle of the radial arrow is divided into eight ranges defined as follows: 1 = 357 ~ 3; 2 = 4 ~ 86; 3 = 87 ~ 93; 4 = 94 ~ 176; 5 = 177 ~ 183; 6 = 184 ~ 266; 7 = 267 ~ 273; 8 = 274 ~ 356. Range 9 indicates the situation where neither TL nor TR underwent any measurable changes (Reproduced from Ji et al. 2009a)
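The angular ranges listed in this caption can be turned into a small classifier. The sketch below is a minimal illustration under stated assumptions: the angles are taken to be in degrees, measured counterclockwise from the positive ΔTL axis; non-integer angles are rounded to the nearest degree so that they fall into one of the listed integer ranges; and the function name `module_of_segment` and the tolerance `eps` for "no measurable change" are hypothetical.

```python
import math


def module_of_segment(d_tl, d_tr, eps=0.0):
    """Map one TL-TR trajectory segment (dTL, dTR) to a range number 1-9."""
    if abs(d_tl) <= eps and abs(d_tr) <= eps:
        return 9  # neither TL nor TR changed measurably
    # Angle of the segment in whole degrees, folded into 0..359.
    angle = round(math.degrees(math.atan2(d_tr, d_tl))) % 360
    if angle >= 357 or angle <= 3:
        return 1  # range 1 wraps around 0 degrees
    bounds = [(4, 86, 2), (87, 93, 3), (94, 176, 4), (177, 183, 5),
              (184, 266, 6), (267, 273, 7), (274, 356, 8)]
    for low, high, module in bounds:
        if low <= angle <= high:
            return module
    raise ValueError("angle outside 0-359")  # not reachable for finite inputs
```

For example, a segment in which TL and TR both increase by the same amount points at 45 degrees and falls into range 2, whereas a segment in which only TR increases points at 90 degrees and falls into range 3.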

Fig. 12.9

The frequency distributions of the RNA metabolic modules in budding yeast undergoing the glucose–galactose shift. This histogram is a visual representation of the data shown in Table 12.3. The y-axis records the frequency of the occurrence of individual modules defined by the ranges of the angles of the component segments/vectors of the TL-TR trajectories in Fig. 12.6 and the x-axis lists the nine mechanisms or modules of RNA metabolism (Reproduced from Ji et al. 2009a)

Fig. 12.10

Visualization of the patterns of distributions of RNA dissipatons or ribons with ViDaExpert with three different grid resolutions. Three functional groups of transcripts were analyzed – the 46 transcripts involved in oxidative phosphorylation (left column), the 22 transcripts related to glycolysis (the middle column), and the 157 transcripts associated with protein synthesis (right column). The grid resolution increases from 4 to 25 to 64 from the bottom to the top rows

Fig. 12.11

The ViDaExpert-enabled visualization of the RNA trajectories (also called RNA waves, RNA dissipatons or ribons) belonging to eight different metabolic pathways of budding yeast undergoing glucose–galactose shift. These patterns of distributions of ribons are functions of at least five parameters as shown in Eq. 12.20

Fig. 12.12

The RNA spectra (or ribonic spectra). The patterns of the distributions of ribons such as shown in Fig. 12.1 are represented as what is here referred to as the “RNA spectra” (or “ribonic spectra”), which can be obtained by linearizing the two-dimensional (street-avenue) addresses of the RNAs on the three-dimensional plots (e.g., Fig. 12.11) as node numbers on the x-axis and plotting the frequency on the y-axis

Fig. 12.13

A comparison between a molecular spectrum and an “RNA spectrum” (or “ribonic spectrum”). (a) The Raman spectrum of PCl4VOCl4 (tetrachlorophosphonium oxotetrachlorovanadate) indicates the probability of the vibrational transitions within the molecule as a function of the vibrational energies expressed in the units of wavenumbers (Reproduced from Roldán et al. 2004). (b) An example of the “ribonic spectrum” indicating the probability of observing various RNA trajectories (ribons) represented by node numbers

Fig. 12.14

The duality of the structure–function relation in biology or the cyclic relation between structure and function

Fig. 12.15

Molecular mechanisms underlying genotype–phenotype coupling. 1 = transcription; 2 = translation; 3 = enzymic catalysis; 4 = Output mechanisms (e.g., secretion, chemotaxis, cell shape changes). OPCPs are synonymous with dissipatons (Sect. 3.1), hyperstructures (Sect. 2.4.4), and SOWAWN machines (Sect. 2.4)

Fig. 12.16

A molecular model of the genotype–phenotype coupling based on the concepts of dissipative structures and conformons, the key elements of the Bhopalator model of the cell (Ji 1985, 2002b) (see Fig. 2.11). This scheme is consistent with the multilevel representation of the cell depicted in Fig. 9.2

Fig. 12.17

The genotypic similarity vs. phenotypic distance (GSvPD) plots of various metabolic pathways. The bottom two panels involve the cytoskeleton RNAs. The 56 RNA molecules in the left bottom panel were unfiltered original data. The 26 RNA molecules in the right bottom panel were selected because their coefficients of variation, defined as (standard deviation/mean) × 100, are less than 50%. Evidently the filtering had little effect on the distribution pattern. To find the diagonal line objectively, five points with the greatest phenotypic distances (i.e., y coordinates) and five points with the greatest genotypic similarity values (i.e., x coordinates) were selected. From these two sets of points, 25 (= 5 × 5) candidate diagonal lines were generated by connecting all possible pairs of x and y coordinates. Then the rest of the points were run through a distance formula to find their distance from each of the 25 diagonals. The 10–15% of the points that are closest to each diagonal are selected and a line of regression through these points is found. The median of the resulting lines of regression is chosen as the “true” candidate diagonal that contains 80–90% of the points below it (I thank Mr. Kenneth So for developing this algorithm)
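The filtering criterion mentioned in this caption, a coefficient of variation below 50%, can be sketched as follows. This is only an illustration of the stated definition, (standard deviation/mean) × 100. It is an assumption here that the coefficient is computed over replicate measurements of each RNA and that the sample standard deviation is used; the caption does not specify either. The names `coefficient_of_variation` and `filter_by_cv` and all numbers are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return (standard deviation / mean) x 100 for one set of measurements."""
    # statistics.stdev is the sample standard deviation (an assumption here).
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def filter_by_cv(replicates, max_cv=50.0):
    """Keep only the entries whose coefficient of variation is below max_cv."""
    return {name: vals for name, vals in replicates.items()
            if coefficient_of_variation(vals) < max_cv}


# Invented replicate values for two hypothetical RNAs.
replicates = {
    "rna_1": [10.0, 11.0, 12.0],   # low scatter, CV of roughly 9%
    "rna_2": [1.0, 5.0, 12.0],     # high scatter, CV well above 50%
}
kept = filter_by_cv(replicates)
```

Only `rna_1` survives the filter in this invented example, since the scatter of `rna_2` is large relative to its mean.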

Fig. 12.18

The plot of the phenotypic distances between pairs of RNA against the genotypic similarities between the corresponding RNA pairs. These RNA pairs belong to four metabolic pathways as indicated in the box (The RNA level data from Garcia-Martinez et al. 2004)

Fig. 12.19

The three possible mechanisms giving rise to the difference between two points, X and Y, in the GSvPD plot. Mechanism 1 = the distance, ΔXY = [(G2 − G1)² + (P2 − P1)²]^(1/2), is determined by the genotypic difference only between the RNA pair, A and B. Mechanism 2 = the distance is determined by the phenotypic difference only between the RNA pair. Mechanism 3 = the distance is determined by both the genotypic and the phenotypic differences between the RNA pair
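The distance formula and the three mechanisms in this caption can be illustrated with a short sketch. Each point is assumed to be a (G, P) pair of genotypic similarity and phenotypic distance; the function names and the tolerance `tol` used to decide whether a coordinate "differs" are hypothetical choices made for illustration.

```python
import math


def gsvpd_distance(point_x, point_y):
    """Euclidean distance between two (G, P) points on the GSvPD plot."""
    (g1, p1), (g2, p2) = point_x, point_y
    return math.hypot(g2 - g1, p2 - p1)


def mechanism(point_x, point_y, tol=1e-9):
    """Return 1, 2, or 3 according to which coordinate(s) differ."""
    (g1, p1), (g2, p2) = point_x, point_y
    g_differs = abs(g2 - g1) > tol
    p_differs = abs(p2 - p1) > tol
    if g_differs and not p_differs:
        return 1  # genotypic difference only
    if p_differs and not g_differs:
        return 2  # phenotypic difference only
    if g_differs and p_differs:
        return 3  # both differences contribute
    return None   # identical points: no difference to explain
```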

Fig. 12.20

A classification of the points on the genotypic similarity vs. phenotypic distance (GSvPD) plot

Fig. 12.21

The genotypic similarity vs. phenotypic distance (GSvPD) plots of the RNA pairs belonging to different metabolic groups in budding yeast, divided into the energy-poor early phase (blue) and the energy-rich late phase (red). The energy-poor phase obtains during the first 120 min after switching glucose to galactose, when the energy source glucose is absent, and the energy-rich phase obtains between 360 and 850 min, when galactose is metabolized to provide energy. The slopes decrease as the budding yeast cells undergo the cell-state transition from the energy-poor early phase to the energy-rich late phase, except for the heme biosynthesis and oxphos pathways, where the slopes either increase or remain unchanged within experimental error

There is a one-to-one correlation between IDSs and cell functions because IDSs are the immediate driving forces for all cell functions. (12.1)

The question as to what regulates the intracellular levels of RNA is not simple to answer because of the complex interactions taking place among the myriad components of the cell (see Fig. 12.27 for a further discussion). It may well turn out that what ultimately regulates the intracellular concentrations of any metabolites, including RNA molecules, is the living cell itself in interaction with its environment (see Step 9 in Fig. 12.4 or Step 19 in Fig. 2.11) and not any component processes of the cell metabolism such as transcription or translation steps individually, and this conclusion may be viewed as a corollary of the postulate that the cell is the smallest DNA-based molecular computer (Ji 1999a), or the computon (see Row 9 of Table 6.3 in Sect. 6.1.2).

6 The Transcription-Transcript Conflation

The DNA microarray technique can be used to measure either DNA or RNA inside the cell (Fig. 12.5). Measuring DNA is relatively simple, since all that needs to be done in this case is to break the cell membrane, transfer the cellular DNA (see DC in Fig. 12.5) into a test tube (see Step 9), and hybridize it with the DNA (called the probe DNA) covalently attached to the surface of a microarray (see Step 5). But measuring RNA is much more complex, involving at least eight key steps or processes (see Steps 1–8 explained in the legend to Fig. 12.5). In other words, the signal S measured with a microarray from an RNA sample isolated from a cell population is a function of at least eight parameters, as shown in Fig. 12.5:

$$ S \propto [R_{\rm C}] $$
(12.2)
$$ S = K[R_{\rm C}] $$
(12.3)

where ∝ symbolizes a proportionality and K is a function, denoted as f in Eq. 12.4, of at least eight parameters, denoted as Pi, each reflecting the characteristics of one of the eight arrows, or Steps 1–8, in Fig. 12.5:

$$ K = f(P_{\rm i}) $$
(12.4)

where the index i runs from 1 to 8.

Combining Eqs. 12.3 and 12.4 leads to:

$$ S = f(P_{\rm i})[R_{\rm C}] $$
(12.5)

Equation 12.5 indicates that the microarray signal S would be proportional to the intracellular RNA levels if and only if all the Pi values remain constant, where the index i runs from 1 to 8. As evident in Fig. 12.5, the eight parameters that connect RC to S can be divided into two groups: (1) the biological parameters, P1, P2, P3, P7, and P8, and (2) what may be called the measurement parameters, P4, P5, and P6. The reproducibility of the measurement parameters from one experiment to another can be readily gauged by repeating a measurement three or more times using the same biological samples. The accuracy and reproducibility of the DNA microarray technique have been improving since its invention in the mid-1990s so that, under ideal conditions, the measurement parameters can be kept constant within 30–50% (as exemplified by the microarray data reported by Garcia-Martinez et al. 2004). When signal S varies by more than 30–50% under such experimental conditions, then (and only then) can the signal variations observed be attributed to biological changes occurring in the cell under investigation.
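The criterion stated above, that only signal changes larger than the measurement variability should be read as biological, can be sketched as a simple check. The function name `exceeds_measurement_noise`, the choice of expressing the change relative to the earlier signal, and the default threshold of 0.5 (the upper end of the 30–50% range quoted in the text) are assumptions made for illustration.

```python
def exceeds_measurement_noise(s_before, s_after, noise_fraction=0.5):
    """Return True if the relative change in S is larger than noise_fraction.

    The change is expressed relative to s_before (an illustrative convention);
    noise_fraction = 0.5 corresponds to the upper end of the 30-50% range.
    """
    relative_change = abs(s_after - s_before) / s_before
    return relative_change > noise_fraction


# Invented signals in arbitrary units.
small_change = exceeds_measurement_noise(100.0, 120.0)   # 20% change
large_drop = exceeds_measurement_noise(100.0, 40.0)      # 60% change
large_rise = exceeds_measurement_noise(100.0, 160.0)     # 60% change
```

Under this convention a 20% change is treated as indistinguishable from measurement variability, whereas a 60% change in either direction is not.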

Interpreting the results of the measurement of intracellular RNA levels using the DNA microarray technique even under ideal settings is not simple because it involves at least four biological steps, i.e., Steps 1, 2, 7, and 8, in Fig. 12.5. (See also Fig. 12.22.) It is generally safe to assume that there exists a 1-to-1 correlation between S and RC: When S increases, so does RC, and whenever S decreases, so does RC. But the common error committed by many users of the DNA microarray technique has been to assume that a 1-to-1 correlation exists between RC and the rate of Step 1, namely, the transcription (or gene expression) step. Such an interpretation of the RC level is invalid because RC levels are determined not by Step 1 alone but also by Step 2 or the transcript degradation (Ji et al. 2009a).

Fig. 12.22

The hypothesis that a DNA molecule itself is a gene consisting of two subtypes – (1) the RNA-coding structural genes (see 2) and (2) the regulatory genes including promoters, enhancers and silencers (see 1). Structural genes act as templates for transcription (Step 4) catalyzed by RNA polymerase or for replication catalyzed by DNA polymerase (not shown). Regulatory genes, in conjunction with various transcription factors and ATP-driven molecular motors (including DNA gyrase and topoisomerases), are postulated to control the timing of the turning on or off of the expression of structural genes (Ji 1991), thereby contributing to the control of the observed intracellular levels of RNA (indicated by the square brackets) (Step 3) which result from the balance between transcription (4) and transcript degradation (5). It is convenient to distinguish between “visible” genes encoding the “visible” RNA being measured or observed and “invisible” or “hidden” genes encoding the “invisible” RNA molecules that affect the levels of “visible” RNA molecules via regulating the transcription (see Step 7) and the degradation (Step 6) of visible RNAs. It should be noted that visible RNAs can also regulate the levels of invisible RNAs (not shown) thereby indirectly regulating themselves via feedback (not shown) (The figure was drawn with the assistance of my undergraduate student Julie Bianchini in 2009)

Since the mid-1990s when the era of the DNA microarray technology began, cell biologists have been interpreting changes in the RNA levels measured with microarrays almost invariably in terms of transcriptional activation, or more generally called “expression,” of corresponding genes, i.e., increased rate of Step 1 in Fig. 12.5 (Alon et al. 1999; Troester et al. 2004; Rhodes and Chinnaiyan 2004; Tu et al. 2005), ignoring transcript degradation, i.e., Step 2 in Fig. 12.5. This is surprising since it has been known for a long time that RNA molecules are unstable toward degradation (Shapiro et al. 1986; Hargrove and Schmidt 1989; Wang et al. 2002; Yang et al. 2003).

The direct experimental evidence for the critical role that the transcript degradation step plays in determining transcript level (TL) came to light when two groups – Fan et al. (2002) and Garcia-Martinez et al. (2004) – measured both TL (transcript level) and TR (transcription rates) simultaneously using DNA microarrays. Examples of the TL and TR data obtained by the latter group from budding yeast subjected to the glucose–galactose shift are presented in Fig. 12.6, which may be referred to as the “TL-TR phase diagram.”

It is clear that TL and TR do not always change in parallel as most workers in the field have been assuming. The following quotations are typical of over 50 randomly selected papers that have been examined:

Microarrays prepared by high-speed robotic printing of complementary DNAs on glass were used for quantitative expression measurements of the corresponding genes… (Schena et al. 1995). (12.6)

Oligonucleotide arrays can provide a broad picture of the state of the cell, by monitoring the expression level of thousands of genes at the same time … (Alon et al. 1999). (12.7)

DNA microarrays, permits the simultaneous monitoring of thousands of genes … (White et al. 1999). (12.8)

DNA microarrays are used to monitor changes in gene expression levels … expression is estimated by comparing the relative amount of mRNA in two distinct cell populations …. The mRNA levels in control and test samples are … compared and the differential expression data given as a ratio or fold change. (Okamoto 2005) (12.9)

These statements would be correct if the term “genes” (in bold) were replaced by “RNA levels” or “transcripts.” This is simply because:

DNA microarrays, as normally used, measure RNA levels but not the rates of transcription. (12.10)

The phrase “gene expression” in the context of usual microarray experiments necessarily signifies “transcription” or Step 1 in Fig. 12.5. In other contexts, “gene expression” can mean transcription rate, protein, metabolism, morphology, or any other phenotypes being measured (see Table 12.2). In particular, if (and only if) microarrays are used in conjunction with the genomic run-on protocol (Garcia-Martinez et al. 2004), does “gene expression” mean “transcription” (see Row 2 in Table 12.2).

Table 12.2 The multiple meanings of “gene expression” depending on context

We commonly read Statement 12.11 in journal articles and advertisements, or hear it in seminars and lectures, all over the world:

Microarrays measure gene expression. (12.11)

Statement 12.11 can be true but not always so, since the phrase “gene expression” can mean any of the five different objects listed in the first column of Table 12.2, depending on the context of the experiment performed, leading to the following conclusion:

The meaning of “gene expression” cannot be determined without knowing the experimental protocol employed in microarray experiments. (12.12)

The critical dependence of the meaning of gene expression on protocol as embodied in Statement 12.12 can be represented diagrammatically utilizing the triadic definition of a sign given by Peirce (see Sect. 6.2.1) (Fig. 12.7):

Combining Statements 12.11 and 12.12 leads to:

Microarrays can measure any one of at least five different observables, including transcript levels and transcription rates, depending on the experimental protocol employed. (12.13)

A corollary of Statement 12.13 is:

Microarrays cannot measure both transcript levels (TL) and transcription rates (TR) without using two independent experimental protocols. (12.14)

To many investigators who utilize microarrays, Statements 12.10 and 12.14 may come as a surprise because microarrays have been widely advertised in journals and commercial media as the revolutionary tool for “measuring gene expression,” interpreting “gene expression” arbitrarily as “transcription.” To emphasize its importance, Statements 12.10 or 12.14 may be referred to as the First Law of Microarray Data Interpretation (FLMDI).

A survey of the literature indicates that most investigators employing microarrays routinely violate FLMDI, since:

Most biologists conflate the terms “gene expression” and “mRNA levels,” or “transcription” and “transcript levels”. (12.15)

Statement 12.15 will be referred to as the “transcription-transcript level conflation (TTLC).” The data obtained by Garcia-Martinez et al. (2004) and by Fan et al. (2002) clearly demonstrate that the mixing of these two terms can lead to false positive (Type I) and false negative (Type II) errors in interpreting microarray data (Ji et al. 2009a; Ji and Yoo 2005).

A gene, commonly defined as a DNA segment encoding proteins, is an equilibrium structure that is static (see Sect. 11.2). On the other hand, the level of an RNA molecule transcribed from its gene is a dissipative structure, because its maintenance requires dissipation of free energy. Many studies employing DNA arrays have made what may be referred to as “the gene-to-transcript misinterpretation,” which would be equivalent to compressing both transcriptomics (the study of the whole set of the transcripts of a genome) and genomics (the study of the whole set of the genes of a genome) onto the same plane. As alluded to above, two types of errors have resulted from the misuse of the phrase “gene expression.” The Type I error (also called the false positive error) is committed when it is claimed that there is something when there actually is nothing; the Type II error (or the false negative error) is committed when it is claimed that there is nothing when there actually is something. To help users of microarrays avoid making Type I and Type II errors, I recommend the following rules:

It is impossible to identify a gene as a possible cause of a disease based only on the finding that its mRNA level changed in the diseased state relative to the control state without eliminating the possibility that the change in the mRNA level arose from the changes in transcript degradation rates rather than from changes in the transcription rates. (12.16)

Similarly:

It is impossible to exclude a gene as a possible cause of a disease based only on the finding that its mRNA level did not change in the diseased state as compared to control without eliminating the possibility that the lack of changes in the mRNA level arose from the coincidence of an increased transcription rate and a similarly increased transcript degradation rate or from the coincidence of a decreased transcription rate and a similarly decreased transcript degradation rate. (12.17)

Again, to emphasize the importance of Statements 12.16 and 12.17, they may be referred to as the Second Law of Microarray Data Interpretation (SLMDI) and the Third Law of Microarray Data Interpretation (TLMDI), respectively. Violating FLMDI leads to a false positive error (or Type 1 or α error), and violating SLMDI or TLMDI results in a false negative error (or Type 2 or β error).
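The logic behind Statements 12.16 and 12.17 can be illustrated with a deliberately simple kinetic sketch. It assumes, purely for illustration, that a transcript is synthesized at a constant rate TR and degraded with first-order kinetics (rate constant `k_deg`), so that dTL/dt = TR − k_deg·TL and the steady-state level is TL = TR/k_deg. Neither this rate law nor the numbers below are taken from the data discussed in this chapter, and the function name is hypothetical.

```python
def steady_state_tl(tr, k_deg):
    """Steady-state transcript level for the ASSUMED model dTL/dt = TR - k_deg * TL."""
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    return tr / k_deg

# Control state (arbitrary illustrative units, not measured values).
control = steady_state_tl(tr=10.0, k_deg=0.5)

# Case A: TL doubles although transcription is unchanged,
# because degradation has slowed down.
case_a = steady_state_tl(tr=10.0, k_deg=0.25)

# Case B: TL is unchanged although transcription has doubled,
# because degradation has doubled as well.
case_b = steady_state_tl(tr=20.0, k_deg=1.0)

print(control, case_a, case_b)
```

Under these assumptions, Case A shows how a change in mRNA level can arise with no change in transcription rate (the situation addressed by Statement 12.16), and Case B shows how an unchanged mRNA level can hide a change in transcription rate (the situation addressed by Statement 12.17).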

In summary, I have formulated three laws of microarray data interpretation (MDI) in this section, i.e., Statements 12.10, 12.16, and 12.17. It is truly surprising to find that, almost a decade and a half after the invention of one of the most revolutionary experimental techniques in biology, namely, DNA microarrays, many workers in the field are still committing Type I and Type II errors in interpreting the data measured with this technique. In December 2009, I had opportunities to attend two meetings, the Regulatory Genomics, Systems Biology and DREAM 2009 meeting held at the Broad Institute in Cambridge and the 102nd Statistical Mechanics Conference held at Rutgers, Piscataway. Having observed that several prominent participants in these meetings violated one or more of the laws of MDI described above, I was prompted to write several emails. I am taking the liberty of attaching two of these emails and related documents as Appendices C–F at the end of this book in the hope of stimulating worldwide discussions on ways to avoid misinterpreting microarray data, since such misinterpretation can have far-reaching consequences in both basic and applied research in cell biology, affecting drug discovery efforts in the pharmaceutical industry and personalized medicine (Chaps. 18, 19).

7 The Mechanistic Modules of RNA Metabolism

Each plot or trajectory in Fig. 12.6 can be divided into five segments or “component vectors,” each bounded by two consecutive time points among the six labeled 1 through 6: the first segment between 0 and 5 min after the glucose–galactose shift, the second between 5 and 120 min, the third between 120 and 360 min, the fourth between 360 and 450 min, and the fifth between 450 and 850 min. Each segment can be characterized in terms of its angle measured counterclockwise from the positive x-axis (see Fig. 12.8). For example, the angle of the segment between time points 5 and 6 in Fig. 12.6a is approximately 45°, and that of the segment between time points 1 and 2 in Fig. 12.6b is approximately 225°. These angles are conveniently divided into eight groups as explained in the legend to Fig. 12.8. Each group is associated with a distinct mechanism of RNA metabolism. For example, Group 2 (angles between 4° and 86°) is associated with the mechanism in which both TL and TR increase (see the radial arrow in Fig. 12.8). In contrast, Group 8 (angles between 274° and 356°) is associated with the mechanism in which TL increases even though TR decreases. Thus, it is logical to equate these groups with their underlying mechanisms (or modules) of RNA metabolism. We may refer to such modules as “mechanistic modules of RNA metabolism,” “mechanisms of RNA metabolism,” or “modules of RNA metabolism.” These terms are related to “ribonomics,” defined by Keene (2006) as the genome-wide study of RNA metabolism in cells.

The five angles characterizing the trajectory of each of the 5,725 genes were calculated from their TL and TR values as follows. The angle Θ determining the direction of the segment from the ith time point to the (i + 1)th time point in a TL vs. TR plot with coordinates (x_i, y_i) and (x_{i+1}, y_{i+1}), respectively, was calculated from the relation

$$ \Theta = \tan^{-1}\left[\frac{y_{i+1} - y_{i}}{x_{i+1} - x_{i}}\right] + \alpha $$

where α = 0° if both the numerator and the denominator are positive, α = 180° if either the numerator is positive and the denominator is negative or both the numerator and the denominator are negative, and α = 360° if the numerator is negative but the denominator is positive. From the set of 5,184 pairs of TR and TL data measured at six time points, we calculated a total of 5 × 5,184 = 25,920 angles distributed over the eight modules, each over five different time segments. The results are given in Table 12.3 and Fig. 12.9 (Ji et al. 2009a).
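The angle calculation and the assignment of an angle to one of the eight groups can be sketched as follows. This is a minimal illustration, not the code used in the original analysis. It uses `math.atan2`, which selects the quadrant from the signs of the numerator and the denominator and therefore plays the role of the additive constant α. The x-coordinate is taken to be TL and the y-coordinate TR, as implied by the description of Groups 2 and 8. Only the ranges of Groups 2 and 8 are quoted in the text; the boundaries of the other groups (odd-numbered groups within 3° of an axis, even-numbered groups filling the quadrants) are an assumption made here by symmetry, and the function names are hypothetical.

```python
import math

def segment_angle(x0, y0, x1, y1):
    """Direction of the segment (x0, y0) -> (x1, y1), in degrees in [0, 360)."""
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:
        raise ValueError("zero-length segment has no direction")
    # atan2 picks the quadrant from the signs of dy and dx, which is what
    # the additive constant alpha does in the arctangent formula.
    return math.degrees(math.atan2(dy, dx)) % 360.0

def mechanism(angle, half_width=3.0):
    """Map an angle to a group number 1..8 (group boundaries partly ASSUMED)."""
    a = angle % 360.0
    nearest_axis = round(a / 90.0)           # 0..4, where 4 stands for 360 = 0
    if abs(a - 90.0 * nearest_axis) <= half_width:
        return 2 * (nearest_axis % 4) + 1    # groups 1, 3, 5, 7 (near an axis)
    return 2 * int(a // 90.0) + 2            # groups 2, 4, 6, 8 (quadrants)

print(mechanism(segment_angle(0, 0, 1, 1)))    # TL and TR both increase
print(mechanism(segment_angle(0, 0, 1, -1)))   # TL increases, TR decreases
```

Unlike the quotient form, `atan2` also handles a segment with zero change along the x-axis, for which the ratio in the formula is undefined.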

Table 12.3 The frequency distributions of the eight modules or mechanisms of RNA metabolism as functions of the five time periods following the glucose–galactose shift. If the angles are homogeneously distributed over 360°, the expected distributions can be calculated as (6/360) × 100 = 1.67% for Mechanisms 1, 3, 5, and 7 and (84/360) × 100 = 23.3% for Mechanisms 2, 4, 6, and 8 (see the seventh row). The p-values for the difference between the observed and the expected distributions are given in the last row. The differences are all significant, except for Mechanism or Module 5 (Reproduced from Ji et al. 2009a)

The mechanisms of interaction between TL and TR that are associated with the nine modules appearing on the x-axis of Fig. 12.9 are listed in Table 12.4. Since there are three logical possibilities for the changes in TL and TR, namely, increase (+), no change (0) or decrease (−), there are a total of nine possible changes that can be assigned to the combined system of TL and TR, and these possibilities are listed in the fourth column in Table 12.4. The third column in Table 12.4 indicates the range of changes for transcript degradation rates, TD, that are inferred from the associated changes in TL and TR.

Table 12.4 The nine mechanisms of interactions among transcript level (TL), transcription rate (TR), and transcript degradation rate (TD). ΔTR is + when the transcription rate is increased and – when the transcription rate is decreased. ΔTD is + when the transcript degradation rate is increased and – when the transcript degradation rate is decreased. The numerical labels of the Mechanisms in the fourth column are defined in Fig. 12.8

The first three mechanisms reflect the case where the transcript level is increasing. Mechanism 2 is activated (or realized) when TL increases due to TR increasing more than TD. Mechanism 1 is activated when TL increases due to decreasing TD with no change in TR. Mechanism 8 is activated when TL increases due to TD decreasing more than TR. Similar explanations can be provided for the remaining six cases. Please note that there is a 180° rotational symmetry in the arrangement of the relational signs in the third column in the table, i.e., > > >, = = =, < < <, which indicates that the mechanistic explanations given in the table are logically coherent.
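The sign logic described above can be written out as a small lookup, shown here as a sketch under stated assumptions. The relation between ΔTR and ΔTD follows the reasoning in the text (TL rises when ΔTR exceeds ΔTD, stays constant when they are equal, and falls when ΔTR is smaller). Mechanisms 1, 2, and 8 are numbered as in the text; the numbers for Mechanisms 3 to 7 are inferred from the angle groups with TL on the x-axis and TR on the y-axis, and the label 9 for the "no change in either" case is an assumption. All names are hypothetical.

```python
def sign(x, tol=0.0):
    """Return '+', '-' or '0' for a change x, treating |x| <= tol as no change."""
    if x > tol:
        return "+"
    if x < -tol:
        return "-"
    return "0"

# (sign of dTL, sign of dTR) -> mechanism number.
# 1, 2 and 8 follow the text; 3-7 are inferred from the angle groups;
# 9 (no change in either quantity) is an ASSUMED label.
MECHANISM = {
    ("+", "+"): 2, ("+", "0"): 1, ("+", "-"): 8,
    ("0", "+"): 3, ("0", "0"): 9, ("0", "-"): 7,
    ("-", "+"): 4, ("-", "0"): 5, ("-", "-"): 6,
}

# The sign of dTL fixes the relation between dTR and dTD.
RELATION = {"+": "dTR > dTD", "0": "dTR = dTD", "-": "dTR < dTD"}

def classify(d_tl, d_tr, tol=0.0):
    """Return (mechanism number, inferred relation between dTR and dTD)."""
    s_tl, s_tr = sign(d_tl, tol), sign(d_tr, tol)
    return MECHANISM[(s_tl, s_tr)], RELATION[s_tl]

print(classify(1.0, 1.0))    # TL up, TR up
print(classify(1.0, -1.0))   # TL up, TR down
```

In practice a nonzero `tol` would be needed to decide when a measured change counts as "no change"; the appropriate value depends on measurement noise and is not specified here.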

As is evident in the second and third columns of Table 12.4, each of the nine possible mechanisms entails a unique relation between the variations in transcription rates, ΔTR, and those in transcript degradation rates, ΔTD. Such relations cannot arise from random interactions between these two processes, thus leading to the following conclusion:

There exist mechanisms in living cells that control the interaction between transcription and transcript degradation rates. (12.18)

The enzyme system catalyzing transcription is known as the transcriptosome (Halle and Meisterernst 1996), and that catalyzing transcript degradation is referred to in bacteriology as the degradosome, a term that is here extended to the transcript degradation machinery of all cell types. Thus, the second column in Table 12.4 expresses the direction of the absolute changes in the activity of the transcriptosome, while the third column indicates the relative changes in the activities of the transcriptosome and the degradosome. The time-dependent patterns of the changes in TL (i.e., the RNA trajectories or waves) that result from controlled interactions between the transcriptosome and the degradosome will be referred to as ribons. Thus, Statement 12.18 can be interpreted as the prediction of the existence of ribons both as the RNA trajectories in the living cell and as the cooperative or coordinated system of transcriptosome and degradosome underlying such trajectories:

$$ \mathbf{Ribons} = \mathbf{Transcriptosome} + \mathbf{Degradosome} $$
(12.19)

Defined in this manner, ribons are examples of dissipative structures or dissipatons, since ribons cannot exist without yeast cells dissipating free energy. Ribons are also examples of SOWAWN machines (Sect. 2.4.4) composed of the transcriptosome and the degradosome. Unlike the transcriptosome and the degradosome, which can be isolated and purified, ribons cannot be isolated, just as the flame of a candle cannot be isolated and studied. Interestingly, the term ribons tends to emphasize the shapes of RNA trajectories (i.e., kinematics), whereas the term dissipatons highlights the free energy cost of maintaining such trajectories (i.e., dynamics). Thus, ribons embody two complementary aspects – kinematics and dynamics (see Sect. 2.3.5).

Since there are obviously many different dissipatons (RNA trajectories, protein trajectories, metabolite trajectories, ion gradients, cytoskeletal stress gradients, cell migratory path, etc.) (i.e., dynamics) (see Sect. 2.3.5), it would be necessary to have a means to differentially represent them as different species of dissipatons. One such method would be to attach a prefix to different kinds of dissipatons. For example, ribons may be referred to as “RNA-dissipatons” and the ion gradients across the cell membrane as “ionic dissipatons,” etc.

8 Visualizing and Analyzing RNA-Dissipatons

8.1 ViDaExpert

The concept of dissipative structures (or dissipatons) has been around in the scientific literature for more than three decades (Prigogine 1977, 1980; Kondepudi and Prigogine 1998; Kondepudi 2008), but the experimental methods for studying them in the context of cell biology had been limited until ViDaExpert became available about a decade ago (Zinovyev 2001; Gorban and Zinovyev 2004, 2005). My experience in analyzing the budding yeast transcriptome (i.e., the genome-wide RNA metabolism) with this method (see below) leads me to speculate that ViDaExpert (and equivalent computer programs) may well turn out to be for cell biology in the twenty-first century what X-ray crystallography was for molecular biology in the twentieth century. This speculation is based on one assumption – cell biology is (and should be) mainly concerned with the study of dissipative structures (dissipatons), in contrast to molecular biology, which has mainly been the study of equilibrium structures (equilibrons).

ViDaExpert is a stand-alone software package that is freely available online at http://bioinfo-out.curie.fr/projects/vidaexpert/. It is a unique software tool for visualizing multidimensional datasets that was developed by A. Zinovyev in 2001 as part of his Ph.D. thesis under the supervision of the mathematician A.N. Gorban, then at the Institute of Computational Modeling at the Siberian Branch of the Russian Academy of Sciences at Krasnoyarsk, Russia. The following description of ViDaExpert is largely based on the lecture that Zinovyev gave at Rutgers in 2006 and on the lecture slides that he generously made available to me.

ViDaExpert analyzes a finite set of objects in a multidimensional space endowed with some way of defining the distance (a metric) among the objects. ViDaExpert utilizes a form of principal component analysis. One of the simplest objects that can be embedded in data space is a line aligned in the direction of maximal dispersion of the data. Such a line is referred to as the first principal component or axis. The second principal component is the line passing through the middle of the first principal axis at a 90° angle to it, in the direction of the largest remaining dispersion; the third principal component is the line going through the intersection of the first and second principal axes at 90° to both, and so on for higher principal components (Zinovyev 2006).
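The idea of a first principal axis can be made concrete with a short sketch. The code below is not ViDaExpert and makes no claim about how ViDaExpert is implemented; it is a generic textbook procedure (power iteration on the sample covariance matrix) written in plain Python, with hypothetical function names and a toy dataset invented for illustration. It needs at least two data points.

```python
import math

def first_principal_axis(points, iters=200):
    """Return (mean, unit vector) for the direction of maximal dispersion."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d); requires n >= 2.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication by cov converges to the
    # leading eigenvector, provided the start vector is not orthogonal to it.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data have no dispersion along the start vector")
        v = [x / norm for x in w]
    return mean, v

def project(point, mean, axis):
    """Signed coordinate of a point along the axis, measured from the mean."""
    return sum((x - m) * a for x, m, a in zip(point, mean, axis))

# Toy data lying exactly on the line y = 2x (illustrative only).
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, axis = first_principal_axis(points)
coords = [project(p, mean, axis) for p in points]
print(mean, axis, coords)
```

Projecting each point onto the axis replaces its two coordinates with a single number, which is the simplest case of the reduction in dimensionality described in the text.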

The principal component can be viewed as a generalization of the concept of the mean. The concept of the mean can be expressed in terms of a point, a set of points, or even an object with an arbitrary topology. The mean, denoted ⟨X⟩, is defined as the sum of all the values X_i, from i = 1 to m, divided by m, the number of points or objects in the set. As a generalization of the mean value, we can define the mean point as the point that minimizes a functional, namely the sum of the squared distances between the data points and the mean point. This definition is very general. Instead of the points used in K-means clustering, we can use any object, or several objects, positioned so as to minimize the sum of the squared distances from the data points to the object; such an object is the principal object. After finding the principal object, we can project data points onto the surface of the object. When data points are so projected, we are in fact making a transition between two spaces – from data points in a high-dimensional space to a lower-dimensional space of the principal object (Zinovyev 2006).
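The statement that the mean is the point minimizing the sum of squared distances can be verified in two lines. The symbols y (a candidate point) and F (the functional) are introduced here for illustration only. The mean is

$$ \langle X \rangle = \frac{1}{m} \sum_{i=1}^{m} X_i $$

and the functional to be minimized over y is

$$ F(y) = \sum_{i=1}^{m} \lVert X_i - y \rVert^{2}, \qquad \nabla_y F = -2 \sum_{i=1}^{m} (X_i - y) = 0 \;\Rightarrow\; y = \frac{1}{m} \sum_{i=1}^{m} X_i = \langle X \rangle $$

Since F is a convex quadratic in y, this stationary point is its unique minimum. Replacing the single point y by a line, a grid, or another object, and the distance to y by the distance to that object, gives the generalization described above.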

The principal object (also called principal manifold or principal grid) is rather rigid. But ViDaExpert constructs a flexible principal object. To accomplish this goal, Zinovyev and Gorban employed the elastic net (Zinovyev 2006).

For simplicity, it is usually assumed that the stretching and bending coefficients are equal for all edges and ribs. This leaves only two parameters to be manipulated in constructing the principal manifold using ViDaExpert. The first parameter restricts the total length, or area, or the volume of the principal manifold. The second parameter tends to smooth out the topology of the manifold. One important point is that the energy functionals are all quadratic which means that they can be optimized in one step, solving a system of linear equations. And this makes ViDaExpert fast, in fact, one of the fastest methods now available to construct optimal principal manifolds (Zinovyev 2006).
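For orientation, the energy functional of an elastic net is usually written in the elastic map literature (Gorban and Zinovyev 2004, 2005) in a form like the one below. The notation is given here as a sketch and is an assumption with respect to this chapter: y_k are the positions of the grid nodes, K_k is the set of data points closest to node k, N is the number of data points, E_i(0) and E_i(1) are the two end nodes of edge i, and R_j(0), R_j(1), R_j(2) are the central and the two outer nodes of rib j.

$$ U = U^{(Y)} + U^{(E)} + U^{(R)} $$

$$ U^{(Y)} = \frac{1}{N} \sum_{k} \sum_{x \in K_k} \lVert x - y_k \rVert^{2}, \qquad U^{(E)} = \lambda \sum_{i} \lVert E_i(1) - E_i(0) \rVert^{2}, \qquad U^{(R)} = \mu \sum_{j} \lVert R_j(1) + R_j(2) - 2 R_j(0) \rVert^{2} $$

With equal coefficients for all edges and ribs, λ penalizes stretching and μ penalizes bending, which are the two parameters referred to above. For a fixed assignment of data points to their closest nodes, every term is quadratic in the node positions, so setting the derivatives of U with respect to the y_k to zero gives a system of linear equations.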

There are two ways of projecting data points to the principal grid (i.e., the grid approximating the topology of the principal manifold calculated by ViDaExpert).

One way is to project data points onto the nodes that are closest to them. The other is to project data points perpendicularly onto the closest surface of the principal manifold. In analyzing the TL data from budding yeast using ViDaExpert, the “closest node” method of approximating the principal manifold was employed.
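The “closest node” projection can be sketched in a few lines. This is an illustration of the general idea only, not ViDaExpert's code: the node positions would normally come from the fitted principal grid, whereas here both the nodes and the data points are invented toy values, and the function names are hypothetical. Squared Euclidean distance is assumed as the metric.

```python
from collections import Counter

def closest_node(point, nodes):
    """Index of the node nearest to point (squared Euclidean distance)."""
    return min(
        range(len(nodes)),
        key=lambda k: sum((p - q) ** 2 for p, q in zip(point, nodes[k])),
    )

def node_counts(points, nodes):
    """Number of data points assigned to each node, in node order."""
    counts = Counter(closest_node(p, nodes) for p in points)
    return [counts.get(k, 0) for k in range(len(nodes))]

# Toy 2 x 2 grid of nodes and five toy data points (illustrative values).
nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
points = [(0.1, 0.2), (0.9, 0.1), (0.8, -0.1), (0.2, 0.9), (1.1, 1.0)]
print(node_counts(points, nodes))  # [1, 2, 1, 1]
```

The resulting list is a tally of how many data points fall on each node, so data points with similar coordinates end up counted at the same node.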

8.2 Ribonoscopy: Looking at RNA

In the following discussions, I will distinguish between RNA molecules which are equilibrium structures (i.e., equilibrons) and RNA trajectories or waves which are dissipative structures (i.e., dissipatons) by using two different stems – ribo- referring to the former and ribono- to the latter. Thus, “ribonoscopy” will denote the study of the time-dependent RNA concentrations in the cell (i.e., RNA trajectories, ribons or RNA waves) using DNA microarrays and computer-assisted analysis of microarray data, i.e., RNA waves.

About 1,000 genes were selected out of a total of over 6,000 genes whose transcripts were measured with DNA arrays in budding yeast after switching the nutrient glucose to galactose (Garcia-Martinez et al. 2004). These genes were selected for analysis because their transcript levels showed pronounced changes induced by the nutritional shift. The data set under consideration consists of a table with ~1,000 rows (each labeled with the name of the gene encoding the transcript involved) and six columns representing the time points of measurements, i.e., 0, 5, 120, 360, 450, and 850 min after the nutritional shift. Thus, one gene is associated with a set of six numbers, each representing the average of the triplicate measurements of the transcript level of the gene measured at one of the six time points. We can represent these data points in an abstract six-dimensional mathematical space (to be called the RNA concentration space), each axis representing one of the six time points of measurements. In this six-dimensional space, one point is equivalent to six numbers, which can be represented as a vector emanating from the origin of the six-dimensional concentration space and ending at the point whose coordinate is specified by the six numbers. The position of a point in the concentration space represents an RNA trajectory in the concentration-time graph (see Fig. 12.1 as an example) and hence encodes the shape information of such a trajectory. In other words, differently shaped RNA trajectories will occupy different positions in the RNA concentration space.

When we input our six-dimensional data (typically 10^4 numbers) into the ViDaExpert program, we obtained a table of numbers indicating the frequencies (or probabilities) of individual RNA molecules (their names appearing at the beginning of each row) exhibiting characteristic kinetic behaviors (or node numbers) appearing at the top of each column. This result of the ViDaExpert analysis can be graphically represented in two ways – as a three-dimensional plot (Figs. 12.10, 12.11) or a two-dimensional plot (Fig. 12.12).

Focusing on the uppermost row of the three-dimensional plot in Fig. 12.10, what ViDaExpert has done is to project the original data points in the six-dimensional RNA concentration space to the two-dimensional principal grid, i.e., the plane with x and y coordinates. The columns with different heights standing on the xy-plane indicate the numbers of the transcripts clustering at different nodes. The addresses of the columns on the grid reflect the different shapes of the RNA trajectories (i.e., different kinetic behaviors of individual transcripts observed over the 850-min time period), since the different shapes of the average RNA trajectories of the glycolytic and oxidative phosphorylation genes shown in Fig. 12.2a are transformed by ViDaExpert into different distributions of the columns on the principal grid. (Compare the left and the middle panels on the uppermost row in Fig. 12.10.)

Turning to one of the three columns, say, the left column, in Fig. 12.11, we can see the effects of changing the grid number from 4 to 25 to 64 (or n = 2, 5, and 8) on the pattern of distributions of transcripts on the principal grid. As one increases the grid number, the columns tend to get fragmented into smaller ones, but their characteristic pattern of clustering seems to be retained. In general, increasing the grid number (which is equivalent to increasing the number of clusters in K-means clustering) is akin to increasing the resolving power of a microscope with which the clusters of the transcript trajectories in the six-dimensional space are viewed.

Figure 12.11 shows the RNA trajectories (i.e., ribons) belonging to a set of eight different metabolic pathways or functions – glycolysis (22 genes), protein degradation (15 genes), protein folding (11 genes), protein synthesis (156 genes), secretion (26 genes), sterol metabolism (11 genes), transcription (15 genes), and RNAs with unknown functions (294). The number of nodes in the principal grid (i.e., grid resolution) is fixed at 64. Visual inspection of these plots clearly demonstrates that the RNA trajectories (i.e., RNA dissipatons or ribons) belonging to different metabolic pathways/functions are distributed in distinct ways on the principal grid. It has been found that these patterns of distribution of ribons are sensitive to small variations (typically from 0.01 to 0.1) of the elastic coefficients, λ and μ. Thus, we can express the Pattern of the Distribution of Ribons (PDR) associated with a metabolic function, MF, on the principal grid with n nodes, stretching coefficient λ, and bending coefficient μ as in Eq. 12.20:

$$ \mathrm{PDR} = f(n, \lambda, \mu, \mathrm{MF}, \mathrm{EC}) $$
(12.20)

where f is a function or a set of rules, and EC stands for the “experimental or environmental conditions” under which observations are made such as the glucose–galactose shift or normal vs. tumor tissues, etc.

Equation 12.20 can be interpreted as the ViDaExpert-enabled visualization of the metabolic pathways in cells in terms of ribons under a given observational condition. The PDR defined by Eq. 12.20 may provide a useful method for analyzing microarray data with the goal of identifying pathway-dependent or pathway-specific biomarkers that are the focus of intense current attention among workers in the field of DNA array technology, because they possess the potential for facilitating the discovery of drug targets for various diseases and for providing diagnostic and pharmacotherapeutic tools for personalized medicine (Clarke et al. 2004; Burczynski et al. 2005; Watters and Roberts 2006; Boyer et al. 2006; Sears and Armstrong 2007; Dobbe et al. 2008) (see Chaps. 18 and 19).

Examples of the two-dimensional displays of the ViDaExpert results are shown in Fig. 12.12. In general, the kinetic patterns of the RNAs belonging to different metabolic pathways appear more clearly distinguishable in the two-dimensional counterparts (Fig. 12.12) than in the three-dimensional plots (Fig. 12.11). For example, Panels c and d or e and g in Fig. 12.11 are less easily distinguished than their two-dimensional plots in Fig. 12.12. Figure 12.12 clearly demonstrates that each metabolic pathway exhibits a unique RNA spectrum. When the node number is increased to 100 (as compared to 64 in Fig. 12.12), the pathway-specific features have been found to become even more readily distinguishable (data not shown). Also, the shapes of the RNA spectra have been found to change dramatically when RNA trajectories are measured in budding yeast with microarrays under different experimental conditions, for example, under nitrogen-deficient condition during alcoholic fermentation (Mendes-Ferreira et al. 2007) (data not shown). In other words, the RNA spectra (or ribonic spectra) described in this book for the first time (Fig. 12.12) are both pathway-specific and cell state-specific, thus suggesting the possibility that RNA spectroscopy (or ribonoscopy) may be employed as a sensitive experimental tool for characterizing living cells in normal and diseased states (see Chaps. 18 and 19 for further discussions).

The two-dimensional plots in Fig. 12.12 are strikingly similar to molecular spectra, an example of which is given in Panel a in Fig. 12.13 along with a ribonic spectrum in Panel b. The molecular spectrum in Panel a (Roldán et al. 2004) depicts the probability of exciting certain vibrational motions in the inorganic compound tetrachlorophosphonium oxotetrachlorovanadate as a function of excitation energy expressed in wavenumbers. This similarity motivated the term “RNA spectra” introduced in Fig. 12.12.

The concept of “RNA spectra” (or “ribonic spectra”) is compared with that of molecular spectra in Table 12.5. One interesting difference between them is that molecular spectroscopy studies the equilibrium structure of molecules, whereas “RNA spectroscopy” (or “ribonoscopy”) studies the dissipative structures comprised of the time-dependent RNA concentrations or RNA waves (i.e., ribons) participating in a common metabolic function (or pathway) (see Row 2, Table 12.5). Molecular spectroscopy allows investigators to probe the internal energy levels of a molecule available for electronic, vibrational, and rotational excitations. Analogously, it is here postulated that ribonoscopy allows cell biologists to investigate the internal structures of the cell consisting of the functional connections (encoded in the genome) among individual RNA molecules (1), RNA pairs (2), and systems of RNA molecules (>2) participating in various metabolic pathways. By “functional connections among individual RNA molecules,” I mean, for example, the connection between an identical RNA molecule at two time points, i.e., a temporal autocorrelation.

Table 12.5 A comparison between molecular spectra and ribonic spectra

It is interesting to note that the x-axis of molecular spectra encodes energy levels expressed in terms of wavenumbers, whereas the x-axis of ribonic spectra encodes information specifying the shape of RNA trajectories, waves, or ribons expressed in terms of node numbers: Just as wavenumbers imply the energy of molecular motions, so node numbers carry the information about RNA trajectories (see Row 4, Table 12.5). The y-axis of molecular spectrum measures the probability of the transitions among the electronic, vibrational, and/or rotational levels within molecules. In contrast, the y-axis of RNA spectrum measures the number of RNA molecules whose trajectories have similar shapes, regardless of the underlying causes. In other words, molecular spectroscopy deals with dynamics, and RNA spectroscopy (or “ribonoscopy”) is concerned with kinematics (see Row 8, Table 12.5), the two subfields of mechanics that are complementary to each other (Sect. 2.3.5) (Murdoch 1987; Plotnitsky 2006).

The methodology of molecular spectroscopy is based on spectrophotometers that can produce molecular spectra in most cases without having to rely on computers or mathematical analysis (Row 6, Table 12.5). The methodology of ribonoscopy, however, depends not only on DNA microarrays invented in the mid-1990s (Sect. 12.1) but also on computer-based visualization techniques that reduce high-dimensional microarray data (e.g., six in the case of the data displayed in Figs. 12.1, 12.11) to low dimensions (e.g., to three in Fig. 12.11 and two in Fig. 12.12) utilizing the mathematical procedure of principal component analysis (Gorban and Zinovyev 2004). Therefore, just as the invention of spectrophotometers (i.e., devices measuring the absorption or emission of photons by molecules as functions of wavenumbers or wavelengths) in the nineteenth century led to the emergence of a vast field of “molecular spectroscopy,” so I am here predicting that the combination of DNA microarrays and computer software implementing the mathematics of principal component analysis such as ViDaExpert (Sect. 12.8.1) will give rise to what is here called “RNA spectroscopy” or more briefly “ribonoscopy” as indicated in Eq. 12.21 and exemplified in Fig. 12.12. Finally, just as the interpretation of molecular spectra requires applying the concepts, laws, and principles of quantum mechanics, so the correct interpretation of ribonic spectra is predicted to require applying a comprehensive molecular theory of the living cell, such as the one developed in this book:

$$ \textbf{Ribonoscopy} = \textbf{DNA Microarrays} + \textbf{Principal Component Analysis}\ (\text{implemented by, e.g., ViDaExpert}) $$
(12.21)
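The dimension-reduction step in Eq. 12.21 can be illustrated with a minimal sketch. It applies ordinary linear principal component analysis (computed with NumPy's singular value decomposition) to a synthetic matrix of six-time-point RNA trajectories. It is not the ViDaExpert procedure, which additionally fits an elastic principal grid to the data. The function name `pca_project`, the random data, and the array shapes are assumptions made here for illustration only.

```python
import numpy as np

def pca_project(trajectories, n_components):
    """Project rows of `trajectories` onto their leading principal components."""
    # Center each time point (column) so that PCA captures variance, not the mean.
    centered = trajectories - trajectories.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    # Fraction of the total variance carried by each retained component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]

# Synthetic stand-in for microarray data: 100 RNAs measured at 6 time points.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 6))
scores_3d, explained_3d = pca_project(data, 3)  # six dimensions -> three
scores_2d, explained_2d = pca_project(data, 2)  # six dimensions -> two
print(scores_3d.shape, scores_2d.shape)
```

With real data, each row would hold the measured levels of one RNA at the six time points, and the two- or three-column score matrix would supply the coordinates used for visualization.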

The practical applications of ribonoscopy in pharmaceutical science and medicine are discussed in Chaps. 18 and 19.

8.3 Ribonics: The Study of Ribons with Ribonoscopy

The ribonic spectra of metabolic pathways, some examples of which are shown in Fig. 12.12, can be analyzed in a tabular form, to be referred to as the “ribonic matrix” (see Table 12.6). The rows in the table represent m different RNA molecules encoded by genes, fewer than m in number due to alternative splicing (Myer and Vilardell 2009; Will and Lührmann 2006), and the columns represent n metabolic pathways (n numbering around 200 in the budding yeast cell). The interior of the matrix contains the node numbers, Ni, where an RNA molecule is located in a given ribonic spectrum, where Ni is the ith node in the r × r principal grid (see Sect. 12.8.1) with r ranging from 5 to 20. In Table 12.6, r = 10 and N = 100.

Table 12.6 The “ribonic matrix” for characterizing cell state. Pi = the ith pathway, where i runs from 1 to n, where n is ~200 in budding yeast; Rj = the jth RNA molecule, where j runs from 1 to m, where m is ~6,300 in budding yeast; Ni = the ith node number defined in Fig. 12.12, where i runs from 1 to r², where r is the linear size of the principal grid; α/m = the average number of RNA molecules participating in a metabolic pathway, where α is the sum of the numbers in the last row and m is the number of the RNA molecules with known functions, and β is the sum of the numbers appearing in last column, and β/m is the average number of the metabolic functions carried out by an RNA molecule in budding yeast

The following features of the “ribonic matrix” are noteworthy:

  1. If a given RNA molecule participates in more than one metabolic pathway, the number appearing in the last column of the ribonic table will be greater than 1.

  2. The number of different RNA molecules (i.e., with different ORF’s) participating in a given metabolic pathway appears in the last row of the table.

  3. Of the total of more than 6,000 RNA molecules, about 4,000 RNA molecules have known functions which number about 200. Hence the average number of RNA molecules supporting one function or one pathway is α = 4,000/200 = 20.

  4. Although the entities on the horizontal and the vertical margins of the table are independent of experimental perturbations Y, the node numbers in the interior (yellow shading) of the table are sensitively dependent on Y, which would make the ribonic matrix a useful tool for characterizing cell states, both normal and diseased (Chaps. 18, 19).

  5. Based on the frequency of occurrence of each node in the interior of the ribonic matrix, a histogram can be generated by plotting the frequency of a node occurrence (which is equal to the number of RNA molecules occupying that node) as a function of node numbers. Such a histogram will be referred to as the “total ribonic spectrum” (TRS) of known RNAs. The “total ribonic spectrum” of unknown RNAs is given in Panel h in Fig. 12.12. By comparing these two kinds of “total” ribonic spectra generated with a variety of different ViDaExpert parameters (e.g., different stretching and bending coefficients and principal grid sizes), it may be possible to identify the biological functions of unknown RNAs (see Chaps. 18 and 19).
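The bookkeeping behind the ribonic matrix and the total ribonic spectrum described in the list above can be sketched as follows. The toy matrix, the RNA and pathway labels, and the helper names are hypothetical. Only the counting rules (last column, last row, and frequency of each node number in the interior of the matrix) are taken from the text.

```python
from collections import Counter

# Hypothetical ribonic matrix: RNA label -> {pathway label: node number}.
# A missing pathway means the RNA does not participate in that pathway.
ribonic_matrix = {
    "R1": {"P1": 12, "P2": 47},
    "R2": {"P1": 12},
    "R3": {"P2": 47, "P3": 3},
    "R4": {"P3": 88},
}

def pathways_per_rna(matrix):
    """Last column of the ribonic matrix: number of pathways each RNA joins."""
    return {rna: len(nodes) for rna, nodes in matrix.items()}

def rnas_per_pathway(matrix):
    """Last row of the ribonic matrix: number of RNAs in each pathway."""
    counts = Counter()
    for nodes in matrix.values():
        counts.update(nodes.keys())
    return dict(counts)

def total_ribonic_spectrum(matrix):
    """Histogram of node occupancy over the interior of the matrix."""
    spectrum = Counter()
    for nodes in matrix.values():
        spectrum.update(nodes.values())
    return dict(spectrum)

last_column = pathways_per_rna(ribonic_matrix)
last_row = rnas_per_pathway(ribonic_matrix)
spectrum = total_ribonic_spectrum(ribonic_matrix)
print(last_column, last_row, spectrum)
```

In this toy case the sums of the last row and of the last column are equal, since both count the filled cells of the same matrix.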

9 Structural Genes as Regulators of Their Own Transcripts

Completing the sequencing of the human genome in 2003 was not the end (as many might have thought) but only the beginning of our long journey toward understanding the functioning of the genome and hence the living cell on the molecular level. From a nonequilibrium thermodynamics perspective (Prigogine 1977, 1980; Kondepudi and Prigogine 1998; Kondepudi 2008), we can readily identify the nucleotide sequences of the human genome as equilibrium structures or equilibrons and their biological functions as dissipative structures or dissipatons (see Sect. 3.1). Functions are dissipatons because functions imply processes and processes entail the dissipation of free energy (see Sect. 6.2.11). So to completely understand how the human genome functions, it is necessary to elucidate how free energy derived from chemical reactions (e.g., oxidation of glucose to CO2 and water, ATP hydrolysis) is combined with information encoded in DNA to effectuate various biological functions of the cell. In this view, the next major step in the Human Genome Project must include a complete elucidation of the molecular mechanisms underlying genotype–phenotype coupling.

It is a truism to state that structures determine functions over the ontogenetic time scale (seconds to years), but functions select structures over the phylogenetic time scale (decades to billions of years). We may refer to this fact as the duality of structure–function relations in biology, or the cyclic relation between structure and function in biology which may be represented schematically as shown in Fig. 12.14:

How functions select structures seems well understood in terms of the current evolutionary theories rooted in the environment-initiated selection of the fittest reproducing systems among varieties of organisms made available by mutations and other novelty-generating mechanisms. However, the molecular mechanisms underlying the causal relation between structure and function on the molecular level, i.e., the problem of genotype–phenotype coupling, or the question as to how genes control cell functions, are as yet poorly understood.

During the past century, we have learned a great deal about how genes control cell functions, which may be summarized as shown in Fig. 12.15:

The overall processes of the genotype–phenotype coupling can be divided into four distinct subprocesses (Fig. 12.15). Of these four, the first three processes (transcription, translation, and enzymic catalysis) are relatively well understood, but the fourth process connecting OPCPs to cell functions is not yet well known, because it is difficult to study OPCPs due to the paucity of appropriate experimental techniques. That OPCPs do occur inside the living cell is now beyond doubt. One of the first clear demonstrations of OPCPs was published by D. Sawyer et al. in 1985 as already mentioned in Sect. 3.1.2. Unlike the intracellular calcium ion gradients in human neutrophils measured by Sawyer et al. (1985), which are chemical concentration gradients in the three-dimensional Euclidean space (requiring x, y, and z coordinates for specification), the time-dependent intracellular RNA levels such as those measured by Garcia-Martinez et al. (2004) in budding yeast undergoing glucose–galactose shift exemplify a chemical concentration gradient in the time dimension. That is, the intracellular concentrations of RNA molecules can change with time – rising or falling within minutes to hours, depending on the functions of RNA molecules involved. What was most significant was that these time or temporal gradients of RNA levels are associated with activation or inhibition of select cell functions (e.g., glycolysis and oxidative phosphorylation, see Panel a in Fig. 12.2), thereby linking OPCPs to cell functions. This observation has led to the formulation of the IDS-Cell Function Identity Hypothesis given in Statement 12.1.

So we have two examples – one involving “spatial gradients” of chemical concentrations and the other “temporal gradients” – that demonstrate the causal relation between OPCPs and cell functions. These observations provide the empirical basis for the postulate that what drives the cell functions are OPCPs. It is important to keep in mind that OPCPs in Fig. 12.15 cannot exist without continuous dissipation of free energy and hence are examples of dissipative structures or dissipatons. Thus, we can replace the operationally defined OPCPs in Fig. 12.15 with the thermodynamics-based concept of “intracellular dissipative structures” or IDSs (Sect. 3.1) as shown in Fig. 12.16.

Processes 1, 2, and 3 are catalyzed by enzymes and Process 4 is suggested to be an identity relation (see Statement 12.1 in Sect. 12.5). Since functions of enzymes are postulated to be driven by internal mechanical stresses localized in sequence-specific sites referred to as conformons (Chap. 8 and Sect. 11.4), it can be concluded that Processes 1–3 and 5–7 are all driven by conformons generated within enzymes catalyzing exergonic chemical reactions. Thus, it may be concluded that the Bhopalator model of the cell provides molecularly realistic mechanisms for effectuating the genotype–phenotype coupling.

The RNA trajectories measured by Garcia-Martinez et al. (2004) provide indirect experimental evidence that structural genes can contribute to regulating the intracellular levels of their own transcripts. This novel idea is presented below.

Each of the intracellular RNA trajectories (i.e., ribons) such as shown in Fig. 11.6 carries two kinds of information – (1) the name of the gene (or the open reading frame, ORF) encoding the RNA molecule whose concentration is being measured, and (2) the time-dependent change in the intracellular concentration of the RNA (i.e., the ribons). The former can be represented in the N-dimensional sequence (or genotype) space, where a point represents an N nucleotide-long RNA molecule, and the latter in the six-dimensional concentration (phenotype) space, wherein a point denotes the ribon or the kinetic trajectory of an RNA molecule measured over the six time points. Thus, for any pair of RNA molecules, it is possible to calculate (1) the genotypic similarity as the degree of the overlap between the pair of nucleotide sequences (using the ClustalW2 program on line (Chenna et al. 2003)), and (2) the phenotypic distance as the Euclidean distance between the corresponding two points in the six-dimensional concentration space. When the phenotypic distances of a set of all possible RNA pairs (numbering n(n − 1)/2 where n is the number of RNA molecules belonging to a given metabolic pathway such as glycolysis and oxidative phosphorylation) were plotted against the associated genotypic similarities, the results shown in Fig. 12.17 were obtained. To facilitate comparisons, several functional groups of RNA molecules are plotted in one graph in Fig. 12.18.
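The phenotypic half of this calculation can be sketched in a few lines. The example below computes the Euclidean distance between six-point trajectories for all n(n − 1)/2 unordered RNA pairs. The RNA labels and concentration values are hypothetical, and the genotypic similarity (obtained in the text from ClustalW2 alignments) is not reproduced here.

```python
import math
from itertools import combinations

def phenotypic_distance(traj_a, traj_b):
    """Euclidean distance between two RNA trajectories of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(traj_a, traj_b)))

def pairwise_distances(trajectories):
    """Distances for all n(n - 1)/2 unordered RNA pairs."""
    return {
        (name_a, name_b): phenotypic_distance(trajectories[name_a], trajectories[name_b])
        for name_a, name_b in combinations(sorted(trajectories), 2)
    }

# Hypothetical RNA levels at six time points (arbitrary units).
trajectories = {
    "RNA_a": [10, 20, 30, 40, 50, 60],
    "RNA_b": [10, 20, 30, 40, 50, 63],
    "RNA_c": [13, 24, 30, 40, 50, 60],
    "RNA_d": [10, 20, 30, 40, 50, 60],
}
distances = pairwise_distances(trajectories)
print(len(distances))  # 4 RNAs give 4 * 3 / 2 = 6 pairs
```

Each distance would then be paired with the corresponding sequence similarity to give one point in a plot of the kind described above.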

One of the most unexpected observations to be made in these plots is that most, if not all, of the points belonging to a given function or metabolic pathway lie below a line with a characteristic negative slope (see Table 12.7). We will refer to this phenomenon as the “triangular distribution of the genotype similarity vs. phenotype distance (GSvPD) plots.” This triangular distribution indicates that structural genes have an effect on the intracellular levels of their own transcripts (but it is impossible to predict the phenotype based on genotype, see below), because, if structural genes had no effect at all on their transcript levels inside the cell (as currently widely believed by most molecular biologists), the distribution of the points on the GSvPD plot should be random and hence cannot account for the triangular distributions observed. On the other hand, if structural genes had a complete control over their intracellular transcript levels, all the points should lie along the diagonal line, but only a very small fraction of the points actually lie close to it. More than 95% of the points in Figs. 12.17 and 12.18 are contained in the region below the diagonal line. So the relation between genotype and phenotype as revealed in the GSvPD plots contains some regularities but these regularities are unpredictable, leading to the conclusion that the genotype–phenotype relation is stochastic or quasi-deterministic (see Glossary) (Ji et al. 2009b).

Table 12.7 The varying degrees of the efficiency of self-regulation by structural genes. n = the number of RNA molecules or ORFs

A greater absolute slope in the GSvPD plot indicates a greater variation in (or smaller control on) phenotypes for a given genotypic variation (see Figs. 12.17, 12.18). Thus, a greater absolute slope of a GSvPD plot can be interpreted as an indication of a smaller effect of structural genes on the intracellular concentrations of their transcripts, leading to the suggestion that the inverse of the absolute value of the slope of a GSvPD plot may be employed as a quantitative measure of the self-regulatory power of structural genes (SRPSG):

$$ \text{SRPSG} = \left| \text{Slope of GSvPD plot} \right|^{-1} $$
(12.22)
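Equation 12.22 is simple enough to state directly in code. The sketch below uses two hypothetical slopes (not values from Table 12.7) to show that a steeper boundary line yields a smaller self-regulatory power. The function name is an assumption introduced here.

```python
def self_regulatory_power(slope):
    """Eq. 12.22: SRPSG is the inverse of the absolute slope of a GSvPD plot."""
    if slope == 0:
        raise ValueError("SRPSG is undefined for a zero slope")
    return 1.0 / abs(slope)

# Hypothetical slopes, chosen only to illustrate the ordering.
steep = self_regulatory_power(-8.0)
shallow = self_regulatory_power(-2.0)
print(steep, shallow)
```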

On the basis of Eq. 12.22 and the data given in Fig. 12.17 and Table 12.18, it may be concluded that the structural genes of the glycolytic pathway have a lesser self-regulatory power than those of the oxidative phosphorylation pathway. Alternatively, it may be stated that glycolytic genes are more “other-regulated” than oxphos genes, “other-regulated” meaning the opposite of “self-regulated,” i.e., regulation by DNA regions other than structural genes such as promoters, enhancers, and silencers.

It is important to keep in mind that the points in the GSvPD plot such as Figs. 12.17 and 12.18 represent differences between two sets of numbers, ΔGAB and ΔPAB, where ΔGAB is the genotypic difference between RNA molecules A and B, and ΔPAB is the phenotypic difference between the same RNA pair. So when we compute the distance, ΔXY, between two points, X (G1, P1) and Y (G2, P2), in the GSvPD plot using the Pythagorean formula, there are three distinct mechanisms by which the difference can arise as explained in Fig. 12.19.

However, in order to simplify the argument, it will be assumed in this book (e.g., Fig. 12.17) that the mechanism underlying the metric (i.e., distance measurement) in the GSvPD plot is due to Mechanism 3 only. If the three mechanisms defined in Fig. 12.19 all have an equal probability of being realized, any conclusion made on the basis of the simplifying assumption would have a probability of approximately 33% of being correct. When more information about the mechanism of interactions among RNA pairs is available and taken into account, this probability could be increased toward unity.

10 Rule-Governed Creativity (RGC) in Transcriptomics: Microarray Evidence

The points in the GSvPD plots, e.g., Figs. 12.17 and 12.18, can be divided into four groups as explained in Fig. 12.20. More than 95% of the RNA pairs belong to the self-regulatory group, and fewer than 5% belong to the “other-regulatory” group. In other words, during the glucose–galactose shift, most of the structural genes of the budding yeast cells contribute to regulating their own transcript levels, and this self-regulatory fraction of structural genes may vary depending on the environmental conditions under which RNA levels are measured.

Group A comprises the RNA pairs whose coordinates lie above the diagonal line, and thus their intracellular concentrations are controlled by factors other than their structural genes. These RNA pairs belong to the group of what will be referred to as the “other-regulatory” or “other-regulated” genes, meaning that these genes are regulated by other genes or regulatory DNA regions including promoters, enhancers, and silencers (see Fig. 12.17). Group B represents the RNA pairs whose coordinates lie below the diagonal line. The intracellular concentrations of these RNA molecules are under the control of their structural genes so that none of their differences lie above the diagonal. These RNA pairs belong to the group of “self-regulatory” or “self-regulated” genes. Group C consists of the RNA pairs whose coordinates lie along horizontal lines, indicating that their intracellular concentrations are similar despite the fact that their structural genes are different. We will refer to this behavior as the genotypic freedom with phenotypic constraint, which may be viewed as the molecular counterpart of (or as ultimately responsible for) the phenomenon of convergent evolution (see Glossary) on the macroscopic scale. Group D contains the RNA pairs whose coordinates lie along vertical lines, indicating that their intracellular concentrations can vary over a wide range despite the fact that their structural genes are similar. This behavior may be referred to as the phenotypic freedom with genotypic constraints, which may be analogous to (or ultimately responsible for) the phenomenon of divergent evolution (see Glossary) on the macroscopic scale.
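The division of points into Groups A and B can be sketched as a comparison against the boundary line. In the example below the slope, the intercept, and the data points are all hypothetical, and the way the diagonal line is fitted to a real plot is not specified here. Points lying exactly on the line are counted as Group B by assumption, and Groups C and D are not classified.

```python
def classify_pair(similarity, distance, slope, intercept):
    """Return 'A' if the point lies above the boundary line, otherwise 'B'."""
    boundary = intercept + slope * similarity
    return "A" if distance > boundary else "B"

def group_fractions(points, slope, intercept):
    """Fraction of RNA pairs falling in Group A and in Group B."""
    labels = [classify_pair(s, d, slope, intercept) for s, d in points]
    total = len(labels)
    return {group: labels.count(group) / total for group in ("A", "B")}

# Hypothetical (genotypic similarity, phenotypic distance) points.
points = [(10, 100), (20, 300), (50, 200), (80, 50), (90, 150)]
fractions = group_fractions(points, slope=-5.0, intercept=500.0)
print(fractions)
```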

Group B RNA pairs may seem paradoxical in the sense that they contain both Groups C and D that exhibit no correlation between genotypes and phenotypes, thus embodying the phenomena of the genotypic freedom with phenotypic constraints and the phenotypic freedom with genotypic constraints, respectively. One possible explanation for these seemingly paradoxical observations is that the intracellular levels of Group B RNA pairs are controlled not only by their own structural genes but also by other genes such as those encoding transcription factors, enhancers, and silencers. To the extent that intracellular RNA levels are controlled by genes other than their own structural genes, to that extent the points in the genotypic similarity vs. phenotypic distance (GSvPD) plots would deviate from the associated diagonal lines.

The triangular distribution of points in the GSvPD plots (Figs. 12.17, 12.18) embodies both determinism and nondeterminism, reminiscent of deterministic chaos in dynamical systems theories (Scott 2005). The determinism is reflected in the fact that almost all the points in the GSvPD plot lie below the diagonal line, whereas the nondeterminism is exhibited by the fact that Group D RNA pairs (Fig. 12.20) show unpredictable phenotypic behaviors despite their genotypic similarities. The term nondeterminism is interpreted here as synonymous with “unpredictability” in physics and creativity in linguistics. Therefore, it appears reasonable to conclude that the triangular distribution of the points in the GSvPD plot is experimental evidence for the operation of the principle of “rule-governed creativity (RGC)” in the genome-wide metabolism of RNAs, i.e., transcriptomics. (The principle of RGC is discussed in detail in Sect. 6.1.4.) This conclusion is consistent with the postulate that living cells use a language, cellese, which is isomorphic with the human language, humanese, characterized by RGC (see Sects. 6.1.2 and 6.1.3).

11 Genes as Molecular Machines

The experimental data presented in Sect. 12.9 indicate that most structural genes of budding yeast co-regulate their own transcript levels in the cell in cooperation with other genes under the experimental condition of glucose–galactose shift. The extent of such co-regulation may vary from one experimental condition to another. This goes against the commonly held view that structural genes simply act as passive templates for transcription and replication with their rates controlled by other regions of DNA such as promoters, enhancers, and silencers (Fig. 12.22). The idea that structural genes possess the capacity to regulate the intracellular concentration of their own transcripts (as proposed in Sect. 12.9) is novel to the best of my knowledge (Ji et al. 2009c).

If structural genes are to regulate their own transcript levels (through mechanisms discussed in Table 12.9), they must dissipate free energy, since no control of any kind is possible without dissipating free energy (Hess 1975). This means that structural genes must be able to store energy as well as control information. Structural genes, being DNA segments, can store mechanical energy in the form of conformational strains as exemplified by DNA supercoils (Benham 1996a), which are examples of conformons (Sect. 8.3). Since any material entity possessing both the control information and the energy to execute such information can be defined as a machine (Ji 1991), and since structural genes possess (1) genetic information encoded in their nucleotide sequences and (2) mechanical energy stored in their conformational strains, structural genes satisfy the necessary and sufficient condition for being molecular machines. Extending this argument further, it is here suggested that:

Table 12.8 The energy-dependent self-regulatory powers of structural genes of budding yeast observed during glucose–galactose shift. The slopes were read off from the diagonal lines of the first six plots in Fig. 12.21. The self-regulatory powers were calculated from the slopes using Eq. 12.22

Not only structural genes but also any DNA segment or the DNA molecule itself can be viewed as molecular machines since they all participate in controlling one or more phenotypes. (12.23)

Statement 12.23 will be referred to as the “Genes as Molecular Machines Postulate” (GAMMP).

One piece of indirect evidence for the GAMM Postulate is provided by the energy-dependency of the self-regulatory powers of structural genes that can be estimated from the GSvPD plots such as shown in Fig. 12.21. As summarized in Table 12.8, the slopes of the GSvPD plots of most metabolic pathways decrease (except for the heme biosynthesis and oxphos pathways) as the budding yeast cell undergoes cell-state transition from the energy-poor early to the energy-rich late phases. The corresponding self-regulatory powers of structural genes increase by 53–162% (see the last column in Table 12.8), indicating that the regulatory activity of structural genes is generally enhanced by the availability of metabolic energy.
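Because Eq. 12.22 defines SRPSG as the inverse of the absolute slope, the percentage increases quoted above translate directly into ratios of slopes. The short derivation below relies on that definition only. The symbols s_early and s_late, denoting the slopes in the two phases, are labels introduced here for convenience.

$$ \frac{\text{SRPSG}_{\text{late}} - \text{SRPSG}_{\text{early}}}{\text{SRPSG}_{\text{early}}} = \frac{|s_{\text{late}}|^{-1} - |s_{\text{early}}|^{-1}}{|s_{\text{early}}|^{-1}} = \frac{|s_{\text{early}}|}{|s_{\text{late}}|} - 1 $$

Under this relation, an increase of 53% corresponds to |s_late| ≈ |s_early|/1.53 ≈ 0.65 |s_early|, and an increase of 162% corresponds to |s_late| ≈ |s_early|/2.62 ≈ 0.38 |s_early|.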

A structural gene can regulate its transcript level inside the cell through the mechanisms summarized in Table 12.9.

Mechanisms A, C, E, and G are trans-mechanisms, i.e., genes exert their control power through other molecules such as RNA, whereas Mechanisms B, D, F, and H represent both trans- and cis-mechanisms, the cis-mechanisms implicating genes regulating other genes directly (e.g., via transmitting mechanical or conformational strains), without being mediated by other molecules such as RNAs or proteins. The cis-regulatory mechanisms of structural genes postulated here appear to be new but are consistent with the ideas already discussed earlier – (1) the Bianchini cone in Fig. 9.2 which represents the set of the mechanisms by which DNA itself regulates transcription without being mediated by RNA or proteins, and (2) the concept of d-genes (see Fig. 11.8), i.e., the notion that the DNA molecule as a whole acts as a gene, for example, in self-replication where DNA acts as its own template.

One way to characterize the mechanism underlying the self-regulatory power of a structural gene is to represent it as a vector in an eight-dimensional “mechanisms space” defined by eight orthogonal axes, each encoding the extent (with numerical values ranging from 0 to 1) of the contribution of one of the eight mechanisms, A through H, shown in Table 12.9, to the overall mechanism of self-regulation, M i :

$$ \overrightarrow{\mathbf{M}_{i}} = M_{i}\,(c_{i1}, c_{i2}, c_{i3}, \ldots, c_{i8}) $$
(12.24)

where the subscript, i, refers to the ith RNA under consideration, and ci1, ci2, ci3, …, ci8 are the coordinates of the head of the vector, \( \overrightarrow{\mathbf{M}_{i}} \), whose base is located at the origin of the eight-dimensional mechanisms space.
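As a minimal sketch of Eq. 12.24, the mechanism vector of one RNA can be held as a list of eight coordinates, one per mechanism A through H, each restricted to the range 0 to 1. The function name and the example contributions are hypothetical. Unlisted mechanisms are assumed here to contribute 0.

```python
MECHANISMS = "ABCDEFGH"  # the eight mechanisms of Table 12.9

def mechanism_vector(contributions):
    """Build the eight coordinates (c_i1, ..., c_i8) from a {mechanism: extent} map."""
    unknown = set(contributions) - set(MECHANISMS)
    if unknown:
        raise ValueError(f"unknown mechanism labels: {sorted(unknown)}")
    vector = []
    for label in MECHANISMS:
        extent = contributions.get(label, 0.0)  # assumption: unlisted means 0
        if not 0.0 <= extent <= 1.0:
            raise ValueError(f"contribution of {label} must lie between 0 and 1")
        vector.append(extent)
    return vector

# Hypothetical RNA whose self-regulation draws on Mechanisms A and D only.
m_i = mechanism_vector({"A": 0.6, "D": 0.8})
print(m_i)
```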

Table 12.9 Possible mechanisms of structural genes self-regulating the intracellular levels of their own transcripts. An arrow can be read as “regulates.” The structural genes correspond to “visible genes” in Fig. 12.22, i.e., those genes whose transcripts are being measured, and those genes whose transcripts are not measured directly but are assumed to affect the “visible genes” in one way or another are referred to as “invisible genes” in Fig. 12.22

In order for a d-gene (i.e., a gene acting as template for DNA replication) to be able to regulate the activities of other genes through cis-mechanisms, it is necessary for the d-gene to exert a mechanical force on its target gene(s) in order to meet the energy requirement for control (Hess 1975) and not to violate the laws of thermodynamics. One possible source of the energy needed to generate such forces is the conformational energy that ATP-dependent enzymes such as DNA gyrases introduce into DNA duplexes; such stored conformational strains are variously called SIDDs (Benham 1992, 1996a, b) and conformons (Ji 1974b, 2000) (see also Sect. 8.4). Unlike protein molecular machines that can transduce chemical energy into the mechanical energy or conformons by their catalytic actions, d-genes cannot directly utilize chemical energy to generate their mechanical forces due to lack of enzymic activity and hence must rely on energy transfer from force-generating or conformon-generating protein machines as exemplified by supercoiled DNA. This leads me to suggest that molecular machines be divided into three groups as shown in Fig. 12.23, in analogy to the division of transport processes into a similar scheme.

Fig. 12.23

Three kinds of molecular machines. Active molecular machines (e.g., ATP-dependent ion pumps) are autonomous in that they can generate mechanical forces directly from exergonic chemical reactions that they catalyze. Passive molecular machines (e.g., passive ion channels) cannot perform any active processes such as ion movements against their concentration gradients. Active molecular machines divide into two groups – the primary and the secondary. The primary active molecular machines (PAMM) can generate mechanical forces or conformons directly from the chemical reactions catalyzed by them (e.g., myosin). The secondary active machines (e.g., Ca++-ion driven Na+ -ion channel, DNA supercoils) cannot generate mechanical forces or conformons from any chemical reactions but depend on the energy or conformon transfer from primary active molecular machines

Utilizing the machine classification scheme shown in Fig. 12.23, the following generalization may be proposed:

d-Genes cannot act as primary active molecular machines but can act only as secondary active molecular machines (SAMM) or passive molecular machines (PMM). (12.25)

Statement 12.25 may be referred to as the “d-Gene as Molecular Machines (DGAMM)” hypothesis. Since structural genes are members of the drp-gene family (Sect. 11.2.4), the hypothesis that structural genes are molecular machines would follow as a corollary from the DGAMM hypothesis.

12 The Isomorphism Between Blackbody Radiation and Whole-Cell Metabolism: The Universal Law of Thermal Excitations (ULTE)

The genotypic similarity vs. phenotypic distance (GSvsPD) plots such as shown in Fig. 12.17 were useful in gauging the overall behaviors of the kinetic differences between all possible RNA pairs within a metabolic pathway but did not reveal any clear patterns of distribution of the points within the main body of the plots. However, when the data points in a GSvsPD plot are graphed in the form of what is called the phenotypic distance vs. frequency (PDvsF) plot by displaying the number of points found within an arbitrary interval (or a bin) of the phenotypic distance against the phenotypic distance class, unexpected patterns or regularities in frequency distribution emerged as shown in Fig. 12.24. This contrasts with the seemingly random distributions found in the corresponding GSvsPD plots shown in the bottom two panels in Fig. 12.21. The following observations can be made:

Fig. 12.24

Phenotypic distance vs. frequency (PDvF) plots of oxphos and glycolytic pathways in the energy-poor early phase and the energy-rich late phase. The x-axis represents the phenotypic distance divided into bins of 50 units and the y-axis records the number of points located within each bin

  1. The PDvsF plots are energy-dependent. When budding yeast cells undergo a state transition from the energy-poor early phase to the energy-rich late phase, the mean of the PDvsF plot of the oxphos pathway remains unchanged while its variance increases (see the table in the bottom panel of Fig. 12.24). In contrast, the mean and the variance of the glycolytic pathway both decrease during the same cell-state transition. These changes are most likely the results of the metabolic transitions from the respiratory to the glycolytic mode induced by the glucose–galactose shift (Ronne 1995; Winderickx et al. 2002).

  2. A decrease in the variance of a pathway-specific PDvsF plot indicates more coherent behavior of RNA trajectories in the yeast cell secondary to the activation of the pathway involved (Ji et al. 2009, unpublished observation). An increase in the variance would indicate the opposite, namely, the deactivation of the metabolic pathway. This interpretation is consistent with the fact that upon removal of glucose, yeast cells (1) activate the oxphos pathway in order to generate ATP from respiration converting ethanol (presumably left over from the glycolysis before glucose was removed) to carbon dioxide and water and (2) subsequently deactivate oxphos and activate glycolysis when the LeLoir enzymes are induced (see Fig. 12.3) which convert galactose to glucose-1-phosphate, the substrate for the glycolytic pathway (Winderickx et al. 2002).

  3. The shapes of the frequency distributions in PDvsF plots are neither random nor normal but, surprisingly, resemble the blackbody spectrum (see the upper right-hand panel of Fig. 11.24). As evident in Fig. 12.25, this visual impression is validated by the quantitative agreement found between the experimental data points of PDvsF plots and the theoretical predictions based on the following equation (referred to as the blackbody radiation-like equation, or BRE) that has the same form as Planck’s radiation law (see Eq. 11.26 in Sect. 11.3.3):

    Fig. 12.25

    The blackbody radiation law-like equation, Eq. 12.26, also referred to as the universal principle of thermal excitations, has been found to be obeyed by single-molecule enzymology (Panel e), whole-cell metabolism (Panels a–d), and protein stability (Panel f) with the following numerical values for the parameters, a, b, A and B: (a) a = 1.3 × 10⁸, b = 41, A = 2, B = 1.5; (b) a = 10⁹, b = 50, A = 2, B = 0; (c) a = 1.6 × 10⁹, b = 72, A = 2, B = 3; (d) a = 10⁹, b = 47.5, A = 2, B = 2; (e) a = 3.5 × 10⁵, b = 200, A = 1, B = 0; (f) a = 1.8 × 10¹⁰, b = 300, A = 14, B = 18

    $$ y = \frac{a\,(Ax + B)^{-5}}{e^{b/(Ax + B)} - 1} $$
    (12.26)

    where y is frequency, x is the phenotypic distance, and a, b, A, and B are constants (Ji and So 2009d).

  4.

    The Planck radiation law, Eq. 11.26, which successfully explained the blackbody radiation data in 1900 (Nave 2009; Kragh 2000), is of the form:

    $$ y = \frac{a/x^{5}}{e^{b/x} - 1} $$
    (12.27)

    where a and b have the numerical values given in Table 12.10. The concept of the quantum of action introduced by this equation gave birth to quantum mechanics which revolutionized physics in the first three decades of the twentieth century. In the last decade of the same century, two experimental techniques known as the single-molecule enzymological method (Sect. 11.3) and DNA microarrays (Sects. 12.1 and 12.2) were invented that have been revolutionizing experimental biology ever since. Equation 12.27 generalized as Eq. 12.26 (by replacing x with Ax + B) has been found to fit not only the single-molecule enzymological data of cholesterol oxidase (as shown in Sect. 11.3.3) but also the whole-cell RNA metabolic transitions as shown in Fig. 12.25. In addition, Fig. 12.25 includes protein stability data (see Panel f), because they are found to obey the Universal Principle of Thermal Excitations (UPTE), i.e., Eq. 12.26, as indicated by the solid line. The x-coordinates of the experimental points (open circles) were the negatives of the ΔG values read off from Fig. 12.26, which was reproduced from Zeldovich et al. (2007b). The solid curve is predicted by the equation:

    $$ P(\Delta G) = A \exp\left( \frac{hE}{h^{2} + D} \right) \sin\left( \pi \left( E - E_{\min} \right) \right) / \left( E_{\max} - E_{\min} \right) $$
    (12.28)

    derived in Zeldovich et al. (2007b), where h and D are, respectively, the mean and the mean square change of protein stability induced by point mutation. E is the energy of the native state of a protein, and Emax and Emin are the maximum and minimum energies that a protein can store upon folding. It is assumed that E can be replaced by G, which is tantamount to assuming that the volume and entropy changes accompanying protein folding are relatively insignificant.

    Fig. 12.26
    figure 26

    The distribution of single-domain proteins reproduced from Zeldovich et al. 2007b. The x-axis records the Gibbs free energy change, ΔG, accompanying protein folding processes, and the y-axis registers the probability, P(ΔG), of observing free energies of protein folding

    Fig. 12.26a
    figure 26a

    The double logarithmic plot of the average values of a and b of the BREs that best fit the distance data of the TL (RNA level) and TR (transcription rate) trajectories (see Table 12.10A) of 17–18 metabolic pathways in budding yeast undergoing glucose-galactose shift, including (1) ATP synthesis, (2) fatty acid synthesis, (3) glycolysis, (4) oxidative phosphorylation, (5) respiration, (6) cell cycle, (7) cell wall biogenesis, (8) DNA repair, (9) DNA replication, (10) meiosis, (11) mitosis, (12) mRNA splicing, (13) nuclear protein targeting, (14) protein degradation, (15) protein folding, (16) protein glycosylation, (17) protein processing, (18) signaling, (19) cytoskeleton, (20) protein folding, (21) protein synthesis, (22) secretion, (23) transcription, and (24) transport. The means and standard deviations of a and b are given in Table 12.10A. 1 = blackbody radiation; 2 = single-molecule enzyme kinetics; 3 = distances between transcription rate trajectories, 4 = distances between transcript level trajectories, and 5 = protein stability

    The fact that protein stability data fit Eq. 12.26 indicates that thermal excitations or transitions are implicated in protein stability as well. One possible rationale for this inference is that protein stability data are quantitatively identical with the activation free energies of protein denaturation. If this interpretation is correct, the same mechanism of single-molecule enzymic catalysis proposed in Fig. 11.28 would apply to protein denaturation, except that the common transition state, C, now corresponds to the denatured (or unfolded) state of a protein. Based on these findings, it is here suggested that Eq. 12.26 can be viewed as a universal law applicable to blackbody radiation, single-molecule enzymology, protein stability, and whole-cell metabolism, three of which are summarized in Table 12.10 with extensive commentaries and footnotes. The common mechanisms underlying all three phenomena listed in Table 12.10 are postulated to be the thermal excitations or activations of molecular motions (or Brownian motions), including bond vibrations, rotations, and translational motions of molecules (see Row 9, Table 12.10). It is for these reasons that Eq. 12.26 is thought to deserve to be referred to as the Universal Law of Thermal Excitations (ULTE).

  5.

    To the best of my knowledge, Eq. 12.26 represents the only mathematical equation known so far that applies to thermal motions of material objects, from atoms to molecules to metabolic pathways and to living cells, having volumes that differ maximally by a factor of about 10¹⁵. Thus, Eq. 12.26 provides the experimental evidence for the hypothesis that thermal motions are one of the necessary conditions of life, leading to the following generalization:

    No thermal motions, no life. (12.29)

    We may refer to Statement 12.29 as the Heat Principle of Life (HPL) or the Principle of the Thermal Requirement of Life (PTRL). HPL is consistent with many mechanisms of the origin of life that invoke thermal cycling, including the conformon model of the origin of life (Sect. 13.2) (Ji 1991) and the RNA-based model of the origin of life proposed by Anderson (1983, 1987) (Sect. 13.1).

  6.

    The first six rows of Table 12.10 demonstrate the quantitative fitting to the Universal Law of Thermal Excitations (ULTE), Eq. 12.26, of experimental data obtained from very different kinds of phenomena, namely, black-body radiation, single-molecule enzymology, and whole-cell metabolism. Rows 7–10 (with extensive footnotes) attempt to provide a possible mechanistic rationale for the universal application of ULTE to these three phenomena. Finally, Rows 11–13 list the new concepts, principles, and theories that have been suggested by the mathematical fitting of the experimental data from single-molecule enzymology and whole-cell metabolism to the same type of equation that fits the black-body radiation data.

  7.

    It is interesting to note that the last two columns of Table 12.10 summarize the key results of my theoretical research in molecular and cell biology that spans a period of almost four decades – from 1972 through 2010.
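As a computational aside to the items above, the blackbody radiation-like equation, Eq. 12.26, and its Planck-type special case, Eq. 12.27, can be evaluated numerically. The following Python sketch is illustrative only and is not the fitting procedure used to produce Fig. 12.25: the function names `bre` and `planck_form` are hypothetical, the parameter values are those listed for Panel (a) in the legend to Fig. 12.25 (with a read as 1.3 × 10⁸, which is an assumption about the typeset exponent), and the grid of x values is arbitrary.

```python
import math


def bre(x, a, b, A=1.0, B=0.0):
    """Eq. 12.26: y = a * (A*x + B)**-5 / (exp(b / (A*x + B)) - 1)."""
    u = A * x + B
    if u <= 0:
        raise ValueError("A*x + B must be positive")
    # expm1(t) computes exp(t) - 1 accurately, also for small t
    return a * u ** -5 / math.expm1(b / u)


def planck_form(x, a, b):
    """Eq. 12.27: y = (a / x**5) / (exp(b / x) - 1)."""
    return (a / x ** 5) / math.expm1(b / x)


# Panel (a) parameters as read from the Fig. 12.25 legend (exponent of a assumed)
params = {"a": 1.3e8, "b": 41.0, "A": 2.0, "B": 1.5}

# Arbitrary illustrative grid of phenotypic distances: 0.5, 1.0, ..., 20.0
xs = [0.5 * k for k in range(1, 41)]
ys = [bre(x, **params) for x in xs]

# Grid point at which the predicted frequency is largest
x_peak = xs[ys.index(max(ys))]
print("grid point with the largest predicted frequency:", x_peak)
```

With A = 1 and B = 0 the function `bre` reduces to `planck_form`, which mirrors the statement in the text that Eq. 12.26 is obtained from Eq. 12.27 by replacing x with Ax + B.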

    Table 12.10 The Universal Law of Thermal Transitions, y = (a/(Ax + B)⁵)/(e^(b/(Ax + B)) − 1), as applied to black-body radiation, single-molecule enzymology, and whole-cell metabolism
    Table 12.10A Planck’s radiation law–like equation (BRE) is obeyed by (1) blackbody radiation; (2) single-molecule enzymic activity of cholesterol oxidase; whole-cell RNA metabolism measured as (3) distances between transcription rate trajectories^a and (4) distances between transcript level trajectories^b; and (5) protein stability data. The numerical values of the BRE parameters for Processes 3 and 4 are the averages of 17–18 metabolic pathways (listed in the legend to Fig. 12.26a) with standard deviations as indicated

^a When matter is heated above some threshold temperature, electrons in their ground states are excited and promoted to higher energy levels (see Fig. 11.28). When these electrons return to their ground states, light is emitted with varying frequencies (or colors), giving rise to the so-called blackbody spectrum (see the upper right-hand panel in Fig. 11.24).

^b When biopolymers are heated to physiological temperatures, all the degrees of freedom of motions of atoms and groups of atoms constituting them are excited to higher energy levels, including vibration, rotation, and bending motions and rarely electronic motions as in blackbody radiation, which usually requires heating beyond the physiological temperature range.

^c The cell is densely packed (or “crowded”) with m different types of biopolymers, each type being represented by n copies, where m can be maximally 6,300 in budding yeast, the size of the yeast genome, and n can range from 1 to over 10³. Out of the almost infinite number, mⁿ, of the systems of biopolymers that can form inside a single cell through various combinations of the biopolymers in different combining ratios, only a small fraction is thought to be metabolically active at any given time to meet the metabolic demand of the cell under a given environmental condition. It is assumed here that these metabolically active biopolymeric complexes (acting as a SOWAWN machine; Sect. 2.4) constitute a set of cell states, analogous to the electronic states in atoms or quantum dots (see Sect. 4.15). Therefore, just as heating matter leads to alterations in the electronic configurations of atoms, so it is postulated that heating causes rearrangements of component biopolymers leading to alterations in the number and kinds of metabolically active biopolymeric complexes (MABCs, or SOWAWN machines) formed, each catalyzing a specific metabolic pathway or its component processes during their lifetime. Ribons discussed earlier can be thought to represent the activities of MABCs.

^d Blackbody radiation implicates heating at high temperatures, typically from 3,500 to 5,500 K. At these temperatures, electrons can undergo transitions from one energy level to another.

^e Heating biopolymers to physiological temperatures (280–320 K) usually does not affect electronic energy levels of atoms (except at active sites) but causes transitions between vibrational, rotational, and bending energy levels of groups of atoms within biopolymers as well as alterations in the translational (or diffusional) motions (speeds) of biopolymer molecules as motional units.

^f Most, if not all, of the energy absorbed by matter during heating is re-emitted as light during blackbody radiation.

^g The heat absorbed by an enzyme from its environment is re-emitted as heat after the residence time τ′, where τ′ is much shorter than the turnover time, τ, of an enzyme, the time required for an enzyme to catalyze one cycle of a chemical reaction: τ′ < τ. But what is measured in a single-molecule enzymological experiment (see Sect. 11.3) is not the heat re-emitted by an enzyme (as in blackbody radiation) but the consequence of heating, i.e., the catalysis (e.g., the disappearance of the fluorescence of FAD, the coenzyme of cholesterol oxidase, due to reduction) proceeding in times shorter than τ′.

^h During whole-cell metabolism two or more metabolic pathways (or SOWAWN machines) are coupled (e.g., transcription and transcript degradation pathways; see Fig. 12.22) to maintain a certain cell state (e.g., RNA levels) and meet the metabolic demand of the cell. It is postulated here that thermal (also called Brownian) motions of biopolymers are essential for the cell to explore and access the right biopolymeric complexes among a large repertoire of the biopolymeric complexes available to it. In this view, what is measured in whole-cell metabolic experiments (e.g., the genome-wide RNA measurements in budding yeast undergoing glucose–galactose shift) is not the activities of individual enzymes but the balance of all the activities of the coupled metabolic pathways, i.e., the system properties of a group of dozens or more enzymes and other biopolymers.

^i The discrete units of light, a member of the family of quantum objects (Plotnitsky 2006) that includes all microscopic entities such as electrons, protons, and neutrons. According to quantum mechanics (Morrison 1990; Plotnitsky 2006), light can be viewed as streams of particles (i.e., photons) or as waves.

^j The discrete units of mechanical energy stored as sequence-specific conformational strains of biopolymers (see Chap. 8).

^k The dynamic and transient systems of biopolymers (e.g., SOWAWN machines; Sect. 2.4) and associated small molecules that are coupled together to perform elementary metabolic functions inside the cell such as glycolysis, transcription, and RNA degradation. There are many different kinds of dissipatons just as there are many different kinds of molecules.

^l The fitting of the blackbody radiation data into Planck’s radiation law (see Eq. 11.27) established the concept that the physical quantity known as action, defined as the product of energy and time, is not continuous but quantized, ushering in the era of quantum revolution in physics beginning in 1900 (Nave 2009). One consequence of the quantization of action is the establishment of the energy levels in an atom between which electrons are constrained to undergo transitions. Thus, quantization of action and the electronic energy levels within an atom are the two sides of the same coin.

^m The idea that the conformational energy of biopolymers plays essential roles in catalysis (Lumry 1974, 2009), gene expression (Benham 1992) and molecular motions (Ji 1974b; Astumian 2001) is gaining general acceptance among biologists (Frauenfelder 1987; Frauenfelder et al. 2001; Ji 2000). But the idea that conformational energy levels of biopolymers may be quantized just as the electronic energy levels are in atoms is novel and suggested here (see Sect. 11.3.3) for the first time on the basis of the finding that the single-molecule enzymic data of Lu et al. (1998) fit the Planck radiation law-like equation, Eq. 11.26, which, when applied to atoms, leads to the “quantization” of the electronic energy levels in atoms. The term “quantization” here means that the energy levels are not continuous but are separated into discrete states.

^n Just as electrons have their energy levels within an atom (which can be depicted diagrammatically as shown in the left panel in Fig. 11.28), it is postulated here that biopolymers or SOWAWN machines (Sect. 2.4) possess their unique Gibbs free energy levels (or more accurately “partial molar free energies,” also known as “chemical potentials” [Wall 1958, p. 192]) within a living cell. The partial molar free energy of the ith chemical, μi, including biopolymers, can be calculated as:

$$ \mu_{i} = \left( \frac{\partial G}{\partial n_{i}} \right)_{T,\,P,\,n_{1},\,n_{2},\,\ldots} $$
(12.30)

which states that the chemical potential of the ith chemical in a system consisting of components labeled as 1, 2, 3, …, is equal to the partial derivative of the Gibbs free energy of the system with respect to the amount of the ith component, with the temperature, pressure, and the concentrations of all the other components held constant.

If the ith chemical, say, A, interacts with at least one other chemical, B, to produce two products, C and D, i.e., A + B ↔ C + D, the Gibbs free energy change, ΔG, experienced by the system under consideration is given by:

$$ \Delta G = G_{\rm Final} - G_{\rm Initial} = \Delta G^{\circ} + RT \ln \left( \frac{[C][D]}{[A][B]} \right) $$
(12.31)

where GFinal and GInitial are the Gibbs free energy content of the system in the final (or product) and initial (or reactant) state, respectively, R is the universal gas constant, ln is the natural logarithm, and ΔG˚ is the change in the standard Gibbs free energy of the system, namely, the Gibbs free energy change per mole of the system at the standard state characterized by the standard T and P, and the unit concentrations (or more accurately activities) of all the chemicals in the system. At equilibrium nothing can change and hence ΔG = 0, and the quotient, ([C][D]/[A][B]) assumes a unique numerical value known as the equilibrium constant denoted as K, leading to the following equation:

$$ \Delta G^{\circ} = -RT \ln K $$
(12.32)

$$ K = e^{-\Delta G^{\circ}/RT} $$
(12.33)

As Eqs. 12.32 and 12.33 indicate, the standard Gibbs free energy change, ΔG˚, can be determined by measuring the equilibrium constant, K, of the chemical reaction system under the standard condition.
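The relations among ΔG, ΔG˚, and K given in the equations numbered (12.31) through (12.33) can be illustrated with a short numerical sketch. The Python code below is a minimal example under stated assumptions: the function names are hypothetical, the temperature of 298.15 K and the equilibrium constant K = 10 are arbitrary illustrative choices, and concentrations are used in place of activities.

```python
import math

R = 8.314462618  # universal gas constant in J/(mol K)


def delta_g_standard(K, T=298.15):
    """Standard Gibbs free energy change from the equilibrium constant: -RT ln K."""
    return -R * T * math.log(K)


def equilibrium_constant(dG0, T=298.15):
    """Equilibrium constant from the standard Gibbs free energy change: exp(-dG0/RT)."""
    return math.exp(-dG0 / (R * T))


def delta_g(dG0, conc_A, conc_B, conc_C, conc_D, T=298.15):
    """Gibbs free energy change for A + B <-> C + D at the given concentrations."""
    quotient = (conc_C * conc_D) / (conc_A * conc_B)
    return dG0 + R * T * math.log(quotient)


# Hypothetical reaction with K = 10 at 298.15 K
dG0 = delta_g_standard(10.0)
print("standard Gibbs free energy change (J/mol):", dG0)

# Recover K from the standard Gibbs free energy change
print("recovered K:", equilibrium_constant(dG0))

# When the quotient [C][D]/[A][B] equals K, the Gibbs free energy change vanishes
print("change at equilibrium (J/mol):", delta_g(dG0, 1.0, 1.0, 10.0, 1.0))
```

The last line illustrates the statement in the text that ΔG = 0 when the concentration quotient assumes the value K.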

Gibbs free energy has the interesting property that it is minimized when spontaneous processes occur under the environmental conditions of constant temperature (T) and pressure (P) (Callen 1985; Kondepudi and Prigogine 1998; Kondepudi 2008). In other words, Gibbs free energy is a quantitative measure of the tendency of a physical system to change spontaneously, for whatever reasons, given the right environmental conditions to overcome kinetic barriers. Under constant T and P, all spontaneous processes occur with a net decrease in Gibbs free energy, indicating the special relevance of Gibbs free energy (among many other forms of energy including the Helmholtz free energy, enthalpy, etc.) to the biochemical reactions proceeding in homeothermic organisms. Thus, for all spontaneous processes occurring under homeothermic and constant pressure conditions, the accompanying Gibbs free energy change must be negative:

$$ \Delta G = \Delta E + P\Delta V - T\Delta S < 0 $$
(12.34)

where E is the internal energy of the thermodynamic system under consideration, V is its volume, and S is its entropy content (see Eq. 2.1).
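The sign criterion in the inequality numbered (12.34) can be checked numerically. The following Python sketch is only an illustration: the function names are hypothetical, and the values of ΔE, ΔV, and ΔS are made-up numbers chosen to show how the sign of ΔG can change with temperature, not measurements from any system discussed here.

```python
def gibbs_change(dE, P, dV, T, dS):
    """Gibbs free energy change at constant T and P: dE + P*dV - T*dS.

    Units assumed here: dE in J/mol, P in Pa, dV in m^3/mol,
    T in K, dS in J/(mol K).
    """
    return dE + P * dV - T * dS


def is_spontaneous(dE, P, dV, T, dS):
    """True when the Gibbs free energy change is negative."""
    return gibbs_change(dE, P, dV, T, dS) < 0


# Hypothetical process: energetically favorable but entropically unfavorable
dE, P, dV, dS = -40_000.0, 101_325.0, 1e-6, -100.0

for T in (310.0, 450.0):
    dG = gibbs_change(dE, P, dV, T, dS)
    print(f"T = {T} K: dG = {dG:.1f} J/mol, spontaneous = {is_spontaneous(dE, P, dV, T, dS)}")
```

In this made-up example the PΔV term is about 0.1 J/mol, negligible next to the other two terms, so the sign of ΔG is set by the competition between ΔE and TΔS.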

Just as photons are related to the electronic energy levels in atoms and conformons are associated with the mechanical energy levels in biopolymers, so it is here postulated that dissipatons are related to the Gibbs free energy levels or chemical potentials (Wall 1958, pp. 193–195; Moore 1963, p. 98) of biopolymers inside the cell that associate themselves transiently to form a functional unit (i.e., a SOWAWN machine) for the purpose of catalyzing a specific metabolic process or pathway. In other words, just as atoms contain a set of electronic energy levels (see the left panel of Fig. 11.28) and enzymes contain a set of mechanical energy levels (see the right panel of Fig. 11.28), so it is hypothesized that:

Cells contain a set of Gibbs free energy levels, some of which are associated with or occupied by biopolymers constituting a dissipaton (also called a SOWAWN machine or hyperstructure) that catalyzes a metabolic function. (12.35)

For convenience, Statement 12.35 will be referred to as the Postulate of the Quantization of Intracellular Gibbs Free Energy Levels (the QIGFEL postulate). The QIGFEL postulate may be viewed as addressing the energetic aspect of cell metabolism, whereas the theories of SOWAWN machines (Sect. 2.4.3) and hyperstructures (Sect. 2.4.4) focus on the structural and informational aspect of cell metabolism. In other words, the QIGFEL postulate and the theories of SOWAWN machines and hyperstructures may be viewed as complementary aspects of the phenomenon of cell metabolism driven by gnergy (Sect. 4.11).

^o In physics, the term “field” is defined as a region of space at every point of which a physical property, such as gravitational or electromagnetic force or fluid pressure, has a characteristic value. Electrons in an atom can be said to exist in an electromagnetic field at every point of which electrons possess unique values for their potential and kinetic energies.

^p The interior of a biopolymer may be viewed as a field at every point of which a mechanical stress can be defined that arises from physical properties such as electrical, mechanical, and van der Waals interactions among the monomeric units of biopolymers.

^q Unlike the mechanical stress field confined within a biopolymer described in Footnote p, which is “intramolecular” and “node-dependent,” the concentration field postulated to exist inside the cell is an “extramolecular,” “intermolecular,” and “network-dependent” property. Consequently, Eqs. 12.26 and 12.27 or their equivalents may apply to the concentration field inside the cell but not to the mechanical stress field within a biopolymer.

^r The discovery of the quantization of action by Planck in 1900 (Kuhn 1978) led to the development of quantum mechanics by the mid-1920s which has revolutionized physics and the philosophy of science (Murdoch 1987; Plotnitsky 2006; Bacciagaluppi and Valenti 2009).

^s Prigogine’s dissipative structures (also called dissipatons in this book) can be divided into local and global dissipatons based on the dichotomization of kinematics into local and global branches (see Sect. 3.1.6). The conformon theory of molecular machines developed between 1972 and 1991 (Ji 2000) represents a theory of local dissipatons (LDs), whereas the theories of SOWAWN machines (Sect. 2.4.3), hyperstructures (Sect. 2.4.4), and metabolic spacetime (Welch and Smith 1990; Smith and Welch 1991; Welch and Keleti 1981) belong to the family of the theories of global dissipatons (GDs).

The cell can be viewed as a dynamic system of molecules (biochemicals, proteins, nucleic acids, etc.) that is organized in space and time to form LDs (e.g., enzyme turnovers driven by conformons) as well as GDs (e.g., cell migration powered by conformons, cell cycle coordinated by dissipatons). Since all organizations in the cell are ultimately driven by the Gibbs free energy supplied by chemical reactions catalyzed by enzymes which in turn are driven by conformons (Chap. 8), it follows that all GDs in the cell are ultimately driven by LDs or that LDs are the necessary condition for GDs. This is reminiscent of the replacement of the Newtonian action-at-a-distance (i.e., the “gravitational force”) with the local curvature of spacetime induced by mass at the location (Wheeler 1990, pp. 12–15). That is, it appears that:

Both in physics and biology, there is no action-at-a-distance but only local actions. (12.36)

We may refer to Statement 12.36 as the “Universal Principle of Local Actions (UPLA).”

^t According to the UPLA formulated in Footnote s, all global dissipatons (GDs) must derive from local dissipatons (LDs). What is the possible mechanism by which a GD can be produced from a set of LDs? In other words, how can a set of LDs give rise to a GD? The pre-fit hypothesis, which was formulated on the basis of the Principle of Slow and Fast Processes or the Generalized Franck–Condon Principle described in Sect. 7.1.3, suggests the following plausible mechanism for coupling the formation of a GD from a set of LDs (or a set of two LDs as the simplest case):

  1.

    \( \mathrm{LD}_{1} + {\sim} + \mathrm{LD}_{2} \leftrightarrow \mathrm{LD}_{1} {\sim} \mathrm{LD}_{2} \)

  2.

    \( \mathrm{LD}_{1} {\sim} \mathrm{LD}_{2} \leftrightarrow (\mathrm{LD}_{1} -\!\!\!-\!\!\!- \mathrm{LD}_{2}) \)

  3.

    \( (\mathrm{LD}_{1} -\!\!\!-\!\!\!- \mathrm{LD}_{2}) \to \mathrm{LD}_{1}^{\prime} -\!\!\!-\!\!\!- \mathrm{LD}_{2}^{\prime} \ \text{or}\ \mathrm{GD} \)

In Step 1, two LDs, i.e., LD1 and LD2, and a structural element denoted as ~ (such as a microtubule spanning the cytosolic space between the nucleus and the cell membrane) are thermally fluctuating, occasionally forming a tripartite complex shown on the right-hand side of the double-headed arrow. In Step 2, the loose complex, LD1 ~ LD2, is in equilibrium with its tight but transient complex denoted as (LD1 –– LD2). Finally, in Step 3, the unstable, transient complex (LD1 –– LD2) is stabilized to form LD1′ –– LD2′ through a synchronized dissipation of a part of the free energies of LD1 and LD2, which form their lower free energy states, LD1′ and LD2′, that are now coupled by a rigid connector symbolized by ––, which is equivalent to a GD.

As evident in Steps 1–3 above, the coupling between LD1 and LD2 across an arbitrary distance symbolized by the bar, ––, does not depend on any action at a distance but only on (1) local actions of thermal fluctuations of LD1 and LD2 and (2) the synchrony of the relaxations of LD1 and LD2 to LD1′ and LD2′ and the process of the rigidification of the connector, namely, ~ ➜ ––. If this mechanism proves to be correct, the main principles underlying the theory of global dissipatons as applied to cell metabolism may turn out to be the Principle of Slow and Fast Processes or the generalized Franck–Condon principle (Sect. 2.3.3) (Ji 1991) and the Principle of Enzymic Catalysis based on enzymes acting as coincidence detectors (Sect. 7.2.2).

Based on the fact that both the single-molecule enzymic data (reflecting the activities of conformons, a member of LDs) and the genome-wide RNA trajectories of budding yeast undergoing glucose–galactose shift (reflecting GDs) obey the blackbody radiation law-like equation, Eq. 12.26, it may be asserted that (see Row 11, Table 12.10):

Conformons and intracellular dissipatons are to biology what photons and quantum objects are to physics. (12.37)

Statement (12.37) is consistent with the atom-cell isomorphism postulate (ACIP) described in Fig. 10.4.

13 The Cell Force: Microarray Evidence

The concept of the cell force was invoked in Ji (1991, pp. 8, 95–118) to account for the functional stability of the living cell, just as physicists were led to invoke the concept of the strong force to account for the structural stability of the nucleus of the atom:

…‘Cell force’ is a hypothetical force thought to act within the living cell to hold together biopolymers in functional states. The cell force is postulated to be mediated by a combination of conformons and IDSs called ‘cytons’... , just as the strong force is mediated by gluons.… (Ji 1991, p. 8; see also Appendix K) (12.37a)

…It is postulated that there exists a new kind of force in nature called the cell force that ‘holds’ together h-particles (i.e., biopolymers; h = heavy, my addition) and l-particles (i.e., small-molecular weight chemicals; l = light, my addition) of the cell together in the living state against environmental perturbations, just as the strong force holds nucleons together in atomic nuclei against electrostatic repulsions. (12.37b)

The purpose of this section is to present the first experimental evidence supporting the cell force concept. The evidence was derived from the observation that the whole-cell RNA metabolic data measured with microarrays from budding yeast (Garcia-Martinez et al. 2004) fit the blackbody radiation–like equation (BRE) discussed in Sect. 12.12. When I formulated the cell force concept over two decades ago (Ji 1991 and Appendix K), I did not know that one day I might be analyzing experimental data shedding light on the existence of the cell force in the living cell. In retrospect, it is not surprising that it took over two decades for the cell force concept to be tested against experimental data, because the relevant whole-cell metabolic kinetic data did not become available until 2004, when Garcia-Martinez et al. (2004) measured the time-dependent changes in the genome-wide RNA levels and transcription rates in budding yeast (Sect. 12.2), and it was not until 2009 that the fitting of the whole-cell RNA kinetic data to BRE was discovered (Ji and So 2009). The connection between the fitting of the genome-wide RNA kinetic data to BRE and the cell force concept invoked two decades earlier was not immediately obvious; recognizing it took another couple of years and came about as a result of writing this book. The recognition of the cell force-BRE connection was made possible by a qualitative application to cell biology of the renormalization group theory described in Huang (2007) (see below).

In addition to the yeast RNA kinetic data, BRE was found to fit (after renormalizing the four parameters, a, b, A, and B) the data from single-molecule enzymic catalysis and protein folding, as summarized in Table 12.10A.

It is interesting to note that the numerical values of the four parameters of BRE, i.e., a, b, A, and B, increase in the following order as indicated by the a/b ratios:

Blackbody radiation < Single-molecule enzymology < Transcription < RNA level control < Protein stability (12.37a)

The order revealed in Inequality (12.37a) appears to coincide with the energy scales (and hence distance scales) characterizing each process, since the interaction energies are expected to decrease in the same order, i.e., from electronic energy levels involved in blackbody radiation (i.e., covalent bond energies of ~100 kcal/mole) to non-covalent energies involved in protein folding (1–5 kcal/mole).

When the logarithms of the a and b values of the five processes are plotted in what may be called the “BRE parameter space,” a continuous nonlinear trajectory is obtained that passes through Processes 1, 2, and 5, with Processes 3 and 4 deviating from it (Fig. 12.26a).
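The ordering by a/b ratio and the coordinates in the “BRE parameter space” can be sketched with the individual fits reported in the legend to Fig. 12.25. The Python code below is a rough illustration under several assumptions: the exponents of a are read as powers of ten (e.g., 1.3 × 10⁸), the values are single-panel fits rather than the pathway averages of Table 12.10A that underlie Fig. 12.26a, the blackbody radiation values are omitted because they are not given in that legend, and the names `fits` and `parameter_space_point` are hypothetical.

```python
import math

# (label, a, b) as read from the Fig. 12.25 legend; exponents of a are assumed
fits = [
    ("whole-cell metabolism, panel a", 1.3e8, 41.0),
    ("whole-cell metabolism, panel b", 1.0e9, 50.0),
    ("whole-cell metabolism, panel c", 1.6e9, 72.0),
    ("whole-cell metabolism, panel d", 1.0e9, 47.5),
    ("single-molecule enzymology, panel e", 3.5e5, 200.0),
    ("protein stability, panel f", 1.8e10, 300.0),
]


def parameter_space_point(a, b):
    """Coordinates of one fit in the double logarithmic (log10 a, log10 b) plane."""
    return math.log10(a), math.log10(b)


# Rank the fits by their a/b ratio, smallest first
ranked = sorted(fits, key=lambda fit: fit[1] / fit[2])

for label, a, b in ranked:
    log_a, log_b = parameter_space_point(a, b)
    print(f"{label}: a/b = {a / b:.3g}, log10(a) = {log_a:.2f}, log10(b) = {log_b:.2f}")
```

Under these assumptions the single-molecule fit has the smallest a/b ratio and the protein stability fit the largest, with the four whole-cell fits in between, in line with the order stated for these three kinds of processes.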

The nonlinear trajectory shown in Fig. 12.26a is reminiscent of the renormalization group (RG) trajectory discussed in physics, i.e., in quantum electrodynamics (QED) and quantum chromodynamics (QCD) (Huang 2007). The concept of renormalization was discussed in Sect. 2.4 in connection with bionetworks (e.g., metabolic pathways) viewed as renormalizable networks. A more detailed characterization of the concept of renormalization is given by Huang (2007, pp. 217–225):

…Renormalizability is not just a property of QED, but of all successful theories in physics. The important point is that a renormalizable theory describes phenomena at a particular length scale (e.g., nuclear, atomic, molecular, cellular, etc.; my addition), in terms of parameters that can be measured at that scale. … For example, we can explain the everyday world using thermodynamics, without invoking atoms. Properties such as specific heat and thermal conductivity, which really originate from atomic structure, can be treated as empirical parameters. At a smaller length scale atoms appear, and they can be described by treating the nucleus as a point. Similarly, at the scale of nuclear structure we do not need quarks (i.e., protons and neutrons are sufficient; my addition). Renormalizability is a closure property that makes physics possible. We would not be able to understand the world, if we had to understand every minute detail all at once. (12.37b)

…In general a renormalizable theory is characterized by an RG trajectory in a space spanned by a definite and fixed number of parameters (i.e., in the ‘parameter space’; my addition). (12.37c)

On pages 223 and 224 of his book cited above, Huang discusses RG (renormalization group) trajectories representing QCD (quantum chromodynamics, the theory of the strong force) and QED (quantum electrodynamics, the theory of the electromagnetic force). The trajectory shown in Fig. 12.26a more closely resembles the RG trajectory of QED than that of QCD. This is most likely because the predominant force operating in the systems listed in Table 12.10A is the electromagnetic force, which is manifested in two extreme forms – the covalent interactions (i.e., those processes involving electronic transitions; see Fig. 11.28) and the non-covalent interactions (see Sect. 3.2), located graphically at the beginning and the end, respectively, of the trajectory shown in Fig. 12.26a.

As already indicated, Points 3 and 4 deviate from the trajectory that passes through Points 1, 2, and 5. If these deviations are not due to experimental error or noise but reflect reality as I assume here, this may indicate that Points 3 and 4 represent processes that implicate not only the electromagnetic force but also a new force that is intrinsic to these processes which happen to occur inside the living cell in contrast to those processes represented by Points 1, 2, and 5 that occur outside of living cells. It may be justified to refer to this new force as the cell force, since its action is postulated to be confined to the interior of the living cell just as the strong force is confined within the atomic nuclei (Han 1999; Huang 2007). If the proposed explanation for the trajectory in Fig. 12.26a turns out to be true, we may have here the first experimental evidence supporting the concept of the cell force that was invoked in Ji (1991; see Appendix K) based on a qualitative application of the Yang-Mills gauge field theory to cell biology in part inspired by the field theory of cell metabolism described in Smith and Welch (1991).

Therefore, it may be reasonable to refer to the trajectory in Fig. 12.26a as the “blackbody radiation trajectory” (BRT) or, more speculatively, as the “blackbody radiation RG trajectory,” suggesting that the so-called bare theory (Huang 2007, pp. 219–225) behind the BRT is QED and that single-molecule enzymology (Point 2) and the theory of protein folding (Point 5) represent renormalized versions of QED.

14 The Quantization of the Gibbs Free Energy Levels of Enzymes and the Living Cell

The finding that the rate constant (or waiting time) data of a single molecule of cholesterol oxidase fit the blackbody radiation-like equation (BRE) (see Fig. 11.24) led me to postulate that the Gibbs free energy of the cholesterol oxidase molecule is quantized (or associated with discrete Gibbs free energy levels) as visualized in Fig. 11.28.

This postulate is supported by the fact that the Gibbs free energy changes accompanying protein denaturation also fit BRE (see Panel f in Fig. 12.25), suggesting that both enzymic catalysis and protein denaturation require thermal excitations of proteins from their ground-state free energy levels (depicted as C1 through Cn in Fig. 11.28) to their excited/activated states (depicted as C in the same figure) leading to catalysis or denaturation.

Based on these observations, I postulate that the Gibbs free energy levels of enzyme molecules in the living cell are quantized. The biochemical and cell biological consequences of this cannot be fully gauged without taking into account the molecular environment of the cell in which enzymes function. Figure 12.27 depicts simplified biochemical pathways involved in determining the intracellular concentrations of mRNA molecules. The cross-hatched lines in Fig. 12.27 symbolize the cytoskeleton to which most biopolymers (including transcriptosomes and degradosomes) are probably bound most of the time, exhibiting the phenomenon of intracellular “crowding” (Goodsell 1991; Minton 2001; McGuffee and Elcock 2010). A more realistic model of the cytoplasm of the living cell is shown in Fig. 12.28, which was computationally constructed by McGuffee and Elcock (2010) utilizing quantitative proteomic data (Link et al. 1997) and atomic-level structural data of protein molecules ranging in size from 7,000 to 1,350,000 Da (Berman et al. 2000). The McGuffee and Elcock model of the cytoplasm shown in Fig. 12.28 incorporates steric (i.e., molecular shape), electrostatic, and short-range attractive hydrophobic interactions but does not yet include water, hydrodynamic interactions, or protein flexibility (and hence the role of conformons; Chap. 8). The maximum average distance moved by each molecule type during 15 μs of simulation decreases nonlinearly with molecular weight, ranging from 12 molecular diameters for 5,000 Da molecules to 1 molecular diameter for 1,000,000 Da molecules. The average number of neighbors possessed by a molecule type increases nonlinearly with molecular weight. Thus, a GFP molecule (26,000 Da), for example, has only five immediate neighbors at any instant, whereas the 50S ribosomal subunit (1,355 kDa) has more than 25.
It is clear that the Gibbs free energy levels of enzymes will critically depend on their microenvironment inside the cell.

Fig. 12.27

A simplified diagram indicating the interactions between transcriptosome (T) and degradosome (D) that together determine the trajectory of an RNA molecule, X or Y. Steps 1 and 4 = transcription; Steps 2 and 5 = transcript degradation; Steps 3 and 6 = translation; Steps 7 and 8 = functional coupling between the transcriptosome and degradosome associated with mRNAX and mRNAY; Step 9 = the functional coupling between the two transcriptosome/degradosome complexes, (TXDX) and (TYDY), that determines the trajectory of the difference ([mRNA]X – [mRNA]Y). [RNA]X and [RNA]Y refer to the intracellular concentrations of mRNAX and mRNAY, respectively

Fig. 12.28

The computer model of the cytoplasm of E. coli that contains over 80% of the 50 different types of the most abundant proteins in the organism (McGuffee and Elcock 2010). RNA molecules are colored green and yellow (Reproduced by permission of Adrian Elcock)

The kinetic trajectories of mRNA molecules of budding yeast undergoing glucose–galactose shift are not random but exhibit regularities as amply demonstrated by Figs. 9.1, 12.2a and 12.3, especially the mRNA trajectories shown in the last two figures that are function-related. This indicates that the transcriptosome (T) and the degradosome (D) responsible for the nonrandom behavior of an mRNA trajectory must be coupled (see Steps 7 and 8 in Fig. 12.27, where the wiggly lines indicate the functional coupling between T and D), although the molecular mechanisms underlying such a coupling are not obvious. If these two enzyme systems are not coupled, the associated mRNA trajectory is expected to behave randomly and unpredictably, contrary to our observations.

Since the rate constant of an enzyme is the exponential function of its ground-state Gibbs free energy level (see the right panel of Fig. 11.28 and Eq. 12.42), it would follow that the rate of the synthesis of an mRNA molecule depends on the ground-state Gibbs free energy level of the associated transcriptosome (see Steps 1 and 4 in Fig. 12.27), and the rate of its degradation depends on the Gibbs free energy level of the associated degradosome (see Steps 2 and 5 in Fig. 12.27). Thus, the kinetics of the Xth RNA trajectory “catalyzed” by TX and DX, either acting as separate entities or as components of a functional unit (to be denoted as (TXDX); see Step 7 in Fig. 12.27), is determined by the Gibbs free energy levels (or the quantum states) of TX and DX.

The analysis of the TL vs. TR plots such as Fig. 12.6 indicates that transcriptosomes and degradosomes can exhibit at least five distinct turnover rates, i.e., (1) slow decrease, (2) rapid decrease, (3) no change, (4) slow increase, and (5) rapid increase. For example, in Panel a of Fig. 12.6, during the first phase (i.e., 0–5 min), TL decreases despite the fact that TR increases. This can be accounted for only if we assume that, during this phase, the rate of transcript degradation (TD) increases more than TR does. If each enzyme system has five conformational (or quantum) states with free energy levels arbitrarily labeled as 1–5 (cf. Fig. 12.30b, where the same five states are labeled −2 to 2), there are 25 possible conformational (or quantum) states for the (TX + DX) system, each associated with a rate of change in TL given in parentheses in Table 12.11. These 25 entries group into nine classes (to be denoted as 1, 2, 3, 4, 5, 6, 7, 8, and 9) as indicated by the dotted lines, and these dotted lines are associated with the relative frequencies of 1, 2, 3, 4, 5, 4, 3, 2, and 1 for the classes 1–9, respectively. For example, the relative frequencies of the occurrence of the (dTL/dt) classes n, d1, d2, d3, and d4 (or n, u1, u2, u3, and u4) are 5, 4, 3, 2, and 1 (counting the number of cells along the dotted lines). Thus, if all the possible couplings between the quantum states of T and D have an equal probability of occurrence (as assumed in Table 12.11), RNA trajectories are five times more likely to remain unchanged, i.e., dTL/dt = 0 (or n), than to decrease (or increase) rapidly with dTL/dt = d4 (or u4). The experimentally observed data (see Series 9 in Fig. 12.29a), however, deviate from the theoretically predicted behavior depicted as a red triangle. Nevertheless, two interesting features emerge: (1) different metabolic pathways tend to show peak frequencies located at different rate classes (see Panel a in Fig. 12.29), and (2) different phases within a given metabolic pathway tend to show peak frequencies at different rate classes (see Panels b and c in Fig. 12.29). Thus, the possibility suggests itself that two metabolic pathways that overlap in a two-dimensional frequency-rate class (FR) plot such as Fig. 12.29a may be distinguishable in three-dimensional frequency-phase-rate class (FPR) plots such as Fig. 12.29b, c, and this may make the FPR plots a sensitive tool for monitoring cell states in drug discovery research and personalized medicine (see Chaps. 18 and 19).
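
The counting argument in the preceding paragraph can be checked with a short script. This is only a sketch under stated assumptions: the five states of T and of D are labeled 1–5, every (T, D) pairing is taken as equally probable, and the rate class is identified with the difference between the two labels. The names `STATES` and `class_frequencies` are illustrative and do not come from the original analysis.

```python
from collections import Counter
from itertools import product

# Assumption: five quantum states each for T and D, arbitrarily labeled 1-5.
STATES = range(1, 6)


def class_frequencies(states=STATES):
    """Count how many (n_T, n_D) pairs share each difference n_T - n_D."""
    return Counter(n_t - n_d for n_t, n_d in product(states, repeat=2))


freq = class_frequencies()
total = sum(freq.values())
for delta_n in sorted(freq):
    # Each line reports one of the nine dTL/dt classes and its multiplicity.
    print(f"delta_n = {delta_n:+d}: {freq[delta_n]} of {total} states")
```

Under these assumptions the nine classes come out with multiplicities 1, 2, 3, 4, 5, 4, 3, 2, and 1, so the "no change" class is five times as frequent as either extreme class.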

Fig. 12.29

The distributions of the slopes (i.e., dTL/dt) of RNA trajectories of budding yeast undergoing glucose–galactose shift. The y-axis represents the frequency (f) and the x-axis the rate (or slope) classes divided into nine based on Table 12.11. (a) Series 1 = chromatin structure (38 RNAs); Series 2 = DNA repair (21); Series 3 = glycolysis (30); Series 4 = meiosis (17); Series 5 = mitochondrial protein targeting (19); Series 6 = nuclear protein targeting (32); Series 7 = oxidative phosphorylation (16); Series 8 = protein folding (28); Series 9 = sum of series (201 RNAs). The red dotted lines indicate the slope distribution predicted by Table 12.11. (b) The phase-dependent slope distributions of the 18 RNA trajectories belonging to the mitochondrial protein targeting metabolic pathway. In Phases II and IV, the fifth slope or rate class is dominant with no contributions from the fourth and sixth slopes or rate classes. In contrast, the fourth and sixth classes dominate during Phases III and V, respectively. (c) The phase-dependent slope distributions of the 16 RNA trajectories belonging to the oxidative phosphorylation metabolic pathway. Again, in Phases II and IV, the fifth slope or rate class is dominant with little or no contribution from the fourth and sixth slopes or rate classes. In contrast, the sixth and fourth classes dominate during Phases III and V, respectively, a pattern opposite to that observed in (b). Phase I slopes are not included because they are likely to contain large errors due to the short time of observation, 5 min. For the definition of the phases, see Fig. 12.30

Table 12.11 The relative rate constants of the transcriptosome–degradosome (TD) complexes predicted on the basis of the conformational (or quantum) states (1, 2, 3, 4, and 5) of their T and D components. These numbers can also be viewed as the “quantum numbers” of transcriptosome (nT) and degradosome (nD). The following symbols are used: n = no change; d = down regulation or decrease; u = up regulation or increase in the rate of change in the level of RNA molecules x and y. The subscripts indicate the relative magnitudes of the rates of changes in TL, i.e., u1 < u2 < u3 < u4, and d1 < d2 < d3 < d4. The numbers above these symbols are the differences between the quantum numbers of the transcriptosome and degradosome, i.e., Δn = nT − nD, that are associated with the changes in TL levels, i.e., dTL/dt, indicated in parentheses

Fig. 12.30

The interpretation of the RNA trajectories in terms of the quantum states of the underlying transcriptosomes and degradosomes. (a) The five phases of the typical RNA trajectory (or the kinetics of TL, transcript level) of budding yeast after replacing glucose with galactose at t = 0. (b) The five quantum states each postulated for the transcriptosome (T) and degradosome (D) constituting a TD complex that determines the shape of an RNA trajectory. (c) All possible quantum states for a TD complex and their effects on the shape of an RNA trajectory, i.e., on dTL/dt

As indicated above and according to Eq. 12.42 and Statement 12.43, the rate constant of an enzyme or an enzyme complex is an exponential function of the Gibbs free energy level of the enzyme. Hence it should be possible to infer the changes in the Gibbs free energy levels of transcriptosomes (T) and degradosomes (D) from the kinetic patterns of TL or the RNA trajectories which reflect the enzymic activities of T and D (see Steps 1 and 2, and 4 and 5 in Fig. 12.27). A prototypical RNA trajectory of budding yeast undergoing glucose–galactose shift is schematically depicted in Panel a in Fig. 12.30. During each of the five phases, TL exhibits one of the nine rate behaviors, i.e., dTL/dt classes (denoted as −4, −3, −2, −1, 0, 1, 2, 3, and 4) (see the interior of the table in Panel c, Fig. 12.30), that result from the interactions between T and D, each of which is postulated to exist in one of the five quantum states (denoted as −2, −1, 0, 1, and 2 in Panel b, Fig. 12.30). If we identify the value of dTL/dt during Phase I in Fig. 12.30 with the most rapid decrease, i.e., −4, the quantum states of the underlying T and D would be predicted to be −2 and +2, respectively (see the dotted line labeled I in Panel b, Fig. 12.30). During Phases II and V, TL is not decreasing as rapidly as during Phase I, so we may infer the quantum states of T and D during these phases to be −2 and 1 for Phase II and −2 and 0 for Phase V, as indicated by the dotted lines in Panel b, Fig. 12.30. Phase IV is characterized by a relatively rapid increase in TL, and hence the quantum states of T and D may be inferred to be 2 and −2. During Phase III, TL remains more or less unchanged, and hence the quantum states (and the Gibbs free energy levels) of T and D are likely to be the same, leading to the equality of their rate constants, according to Eq. 12.42.
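
The phase-by-phase assignments in the preceding paragraph can be tabulated as follows. This is a minimal sketch, assuming that the rate class equals the T state minus the D state. The text only says that T and D occupy the same state during Phase III, so the pair (0, 0) used here is a placeholder; `PHASE_STATES` and `rate_class` are illustrative names.

```python
# Quantum states (n_T, n_D) inferred in the text for each phase of Fig. 12.30a.
# Phase III: the text only states n_T == n_D; (0, 0) is an assumed placeholder.
PHASE_STATES = {
    "I": (-2, 2),
    "II": (-2, 1),
    "III": (0, 0),
    "IV": (2, -2),
    "V": (-2, 0),
}


def rate_class(n_t, n_d):
    """Return the dTL/dt class, assumed here to be the difference n_T - n_D."""
    return n_t - n_d


for phase, (n_t, n_d) in PHASE_STATES.items():
    print(f"Phase {phase}: n_T = {n_t:+d}, n_D = {n_d:+d}, "
          f"dTL/dt class = {rate_class(n_t, n_d):+d}")
```

With these inputs the classes for Phases I through V are −4, −3, 0, +4, and −2, matching the qualitative description of a rapid decrease, a slower decrease, a plateau, a rapid increase, and a moderate decrease.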

The quantization of the Gibbs free energy levels of enzymes in cells provides a possible explanation for the pathway-dependent correlations found among some of the RNA trajectories in budding yeast during glucose–galactose shift (see Table 12.12). This conclusion is based on the following reasoning.

Table 12.12 The fraction of the RNA pairs showing linear or nonlinear correlations among intra-pathway and inter-pathway RNA pairs. Both members of an intra-pathway RNA pair belong to one metabolic pathway, whereas an inter-pathway RNA pair involves two pathways, each contributing one of the two members of a pair. The extent (or fraction) of the linear correlations (numbers in bold) was determined by counting the RNA pairs whose Pearson correlation coefficients were greater than 0.7 or less than −0.7, and the fraction of the nonlinear correlations (numbers in italics) was determined by counting the RNA pairs whose phenotypic distances fit the blackbody radiation-like equation (BRE), Eq. 12.26. The number of RNA molecules per pathway ranged from 12 to 50. P1 = protein folding, P2 = cytoskeleton, P3 = protein glycosylation, P4 = oxidative phosphorylation, P5 = respiration, P6 = glycolysis, P7 = nuclear protein targeting, P8 = DNA repair, P9 = protein degradation, P10 = meiosis
  1. 1.

    RNA trajectories can be correlated in two ways – (1) linearly and (2) nonlinearly. In the former case, when the concentration of the ith RNA molecule is increased by a factor of x and that of the jth is increased or decreased by the same factor, the ith and jth RNA concentrations are said to exhibit a positive or a negative linear correlation, respectively. However, when the concentrations of the ith and jth RNA molecules are increased (or decreased) by two different factors, say, x and x^n, respectively, where the absolute value of n is greater than or less than 1, we are dealing with nonlinear correlations.

  2. 2.

    It is evident that correlated changes in the intracellular concentrations of any pair of RNA molecules are impossible if the two enzyme systems, each supporting its associated RNA trajectory, are not coupled. That is, a linear correlation between [mRNA]X and [mRNA]Y in Fig. 12.27 would be impossible without the functional coupling between the (TXDX) complex and the (TYDY) complex, where T is transcriptosome, D is degradosome, and X and Y are the two different RNA molecules whose trajectories are being compared (see Step 9 in Fig. 12.27). We can represent this idea symbolically as follows:

    $$ \underset{a}{(T_{\rm X} D_{\rm X})} + \underset{b}{(T_{\rm Y} D_{\rm Y})} \;\rightleftarrows\; \underset{c}{(T_{\rm X} D_{\rm X})(T_{\rm Y} D_{\rm Y})} \qquad \Delta G $$
    (12.38)
  3. 3.

    In principle, there are two distinct mechanisms for coupling two enzymes or enzyme complexes – (1) the cis-mechanism whereby two enzyme systems form a higher-order structure or complex through direct physical binding interactions and (2) the trans-mechanism whereby two enzyme systems are coupled indirectly by sharing diffusible substrates (e.g., mRNA) or regulators (e.g., ATP, ADP, glucose). Similar cis- and trans-mechanisms operate in the control of gene expression. Since no spontaneous interaction can occur without an associated decrease in Gibbs free energy (under the conditions of constant temperature and pressure), Eq. 12.39 holds, where G is the Gibbs free energy and the subscripts indicate the reactants and the products appearing in Reaction (12.38):

    $$ \Delta G = G_c - (G_a + G_b) < 0 $$
    (12.39)
  4. 4.

    The Gibbs free energy levels of (TXDX), (TYDY), and (TXDX)(TYDY) in Reaction (12.38) are represented diagrammatically as Gi, Gj, and Gij in Fig. 12.31. It is assumed that the transition state of all these enzyme systems is common, as indicated by the red dotted line labeled G‡. Once the higher-order complex (TXDX)(TYDY) is formed spontaneously, it may act as a functional unit (or better as a SOWAWN machine; see Sect. 2.4.4), catalyzing the net formation of mRNAX and mRNAY in a coordinated manner so that their trajectories are correlated either linearly or nonlinearly as shown in Table 12.12. Just as the function-related behavior of an mRNA trajectory (or ribon) signals the underlying coupling between its transcriptosome and degradosome (see Steps 7 and 8 in Fig. 12.27), the nonrandom correlations found between two mRNA trajectories (or ribons) (see [mRNA]X and [mRNA]Y in Fig. 12.27) signal the underlying control mechanism that is thermally activated and functions during the observational period of 850 min (see Step 9 in Fig. 12.27). Therefore, it seems logical to invoke at least two levels of metabolic control in budding yeast – the first-order control catalyzed by the individual (TD) complexes and the second-order control exerted by the pathway-wide (TXDX)(TYDY) complexes. By “pathway-wide,” I mean the system of transcriptosomes and degradosomes belonging to a given metabolic pathway. Since each metabolic pathway usually involves 10–30 enzymes, the number of all possible pairs of enzymes, X and Y, that belong to a pathway would range from about 50 to 500 (based on the equation n(n − 1)/2, where n is the number of different enzymes belonging to a pathway; this gives 45 pairs for n = 10 and 435 pairs for n = 30). If all these possible pairs of enzymes function coherently, as evidenced by the correlations found among the intra-pathway mRNA trajectory pairs shown in Table 12.12, it may be reasonable to assume that any pathway-wide enzyme complexes such as (TXDX)(TYDY) complexes work, at least transiently, as a functional unit (i.e., as a SOWAWN machine). Furthermore, it seems possible that the biochemical and control behaviors of pathway-wide enzyme complexes (TXDX)(TYDY) differ from the behaviors of individual enzymes, T and D, just as quantum dots behave quite differently from their constituent atoms (see Table 4.7). If this is true, the Gibbs free energy levels of the (TXDX)(TYDY) complex may contain elements not predictable from the Gibbs free energy levels of its component enzymes alone.

    Fig. 12.31

    A diagrammatic representation of the Gibbs free energy levels of the TXDX and TYDY complexes (see Gi and Gj) and their higher-order complex (TXDX)(TYDY) (see Gij). The Gibbs free energy level of the common excited state is denoted by G‡. ΔGi‡ is the Gibbs free energy difference between the excited and ground states of the ith enzyme system, i.e., ΔGi‡ = G‡ – Gi. It is assumed that TXDX and TYDY complexes have five energy levels and the (TXDX)(TYDY) complex has nine free energy levels (see Fig. 12.29 for related discussions). T = transcriptosome, D = degradosome, and X and Y are two different RNA trajectories

  5. 5.

    The (TXDX)(TYDY) complex in Fig. 12.31 can coordinately catalyze the formation of mRNAX and mRNAY (1) if it possesses a set of different conformational states (or different Gibbs free energy levels or quantum states) and (2) if it can be thermally excited from any one of these ground states with free energy level Gij to the common excited state with free energy level G‡, during the lifetime of which the complex can catalyze the coordinated enzymic process (through Step 9 in Fig. 12.27) with the rate constant determined by ΔGij‡ (see also Row 8 in Table 11.9):

    $$ k_{ij} = A\,e^{-\Delta G_{ij}^{\ddag}/RT} $$
    (12.40)
    $$ \text{where}\;\Delta G_{ij}^{\ddag} = G^{\ddag} - G_{ij} $$
    (12.41)

    and kij is the rate constant with which two RNA trajectories i and j change in coordination, R is the gas constant (which is equal to Nk, where N is Avogadro’s number and k is the Boltzmann constant), and T is the absolute temperature. Inserting Eq. 12.41 into Eq. 12.40 and dropping the subscripts for brevity leads to:

    $$ k = A\,e^{-(G^{\ddag} - G)/RT} = A^{\prime} e^{G/RT} $$
    (12.42)

    where A′ = Ae^(−G‡/RT), which is a constant, since G‡ is assumed to be constant for all enzymes and enzyme complexes as indicated by the red dotted line in Fig. 12.31. Equation 12.42 is identical with the equation derived in Table 11.9 (see Row 8). The significance of Eq. 12.42 is that:

    The rate constant of an enzyme is the exponential function of its ground-state Gibbs free energy level, G, and not the Gibbs free energy level of the activated state, G‡. (12.43)

    The justification of Statement 12.43 derives from the fitting of the single-molecule enzymic data of cholesterol oxidase to the blackbody radiation-like equation, Eq. 11.27 (see Table 11.9).

  6. 6.

    The interiors of Table 12.11 and Fig. 12.30c deal with the rates of changes in RNA levels, denoted as dTL/dt, where TL stands for transcript level, while the margins of this table and figure, together with Fig. 12.31, address the Gibbs free energy levels of enzymes and hence the rate constant, k (see Eq. 12.42). Rates (usually denoted as v, from velocity) and rate constants (denoted as k) are not the same but are related to each other through Eq. 12.44 under the first-order kinetics conditions, i.e., in the presence of enzymes in excess of their substrates:

    $$ v = d[S]/dt = k[S] $$
    (12.44)

    where [S] is the concentration of the substrate for an enzyme. As is evident from Eq. 12.44, k can be defined as the v measured under the condition where [S] is kept at unit concentration, i.e., [S] = 1. The key idea behind Eq. 12.44 is that v depends on substrate concentration but k does not, and that, at a constant substrate concentration, v is proportional to k (and numerically equal to it when [S] = 1).

  7. 7.

    The kinetics of the Xth RNA trajectory catalyzed by the (TXDX) complex is determined by the Gibbs free energy levels (or the quantum states) of TX and DX. The analysis of the TL vs. TR plots such as Fig. 12.6 indicates that transcriptosomes and degradosomes can exhibit at least five distinct turnover rates, i.e., (1) slow decrease, (2) rapid decrease, (3) no change, (4) slow increase, and (5) rapid increase. For example, in Panel a of Fig. 12.6, during the first phase (i.e., 0–5 min), TL decreases despite the fact that TR increases. This can be accounted for only if we assume that, during this phase, the rate of transcript degradation (TD) increases more than TR does. If each enzyme system has five conformational (or quantum) states with free energy levels labeled as −2, −1, 0, 1, and 2 as in Fig. 12.30b, there are 25 possible conformational (or quantum) states for the (TXDX) complex, each associated with a rate of change in TL given in parentheses in Table 12.11. These 25 entries group into nine classes as indicated by the nine dotted diagonal lines, and these lines are associated with the relative frequencies of 1, 2, 3, 4, 5, 4, 3, 2, and 1. For example, the relative frequencies of the occurrence of the (dTL/dt) classes n, d1, d2, d3, and d4 (or n, u1, u2, u3, and u4) are 5, 4, 3, 2, and 1 (counting the number of cells along the dotted lines). Thus, if all the possible couplings between the quantum states of T and D have an equal probability of occurrence (as assumed in Table 12.11), RNA trajectories are five times more likely to remain unchanged, i.e., dTL/dt = 0 (or n), than to decrease (or increase) rapidly with dTL/dt = d4 (or u4). The experimentally observed data (see Series 9 in Fig. 12.29a), however, deviate from the theoretically predicted behavior depicted as a red triangle. Nevertheless, two interesting features emerge: (1) different metabolic pathways tend to show peak frequencies located at different rate classes (see Panel a in Fig. 12.29), and (2) different phases within a given metabolic pathway tend to show peak frequencies at different rate classes (see Panels b and c in Fig. 12.29). Thus, the possibility suggests itself that two metabolic pathways that overlap in a two-dimensional frequency-rate class (FR) plot such as Fig. 12.29a may be distinguishable in three-dimensional frequency-phase-rate class (FPR) plots such as Fig. 12.29b, c, and this may make the FPR plots a sensitive tool for monitoring cell states in drug discovery research and personalized medicine (see Chaps. 18 and 19).
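
Three of the quantitative relations used in the reasoning above (Eq. 12.42, Eq. 12.44, and the pair count n(n − 1)/2) can be illustrated numerically. The sketch below is not part of the original analysis: the temperature, the function names, and the choice of SI units (J/mol) are assumptions made only for illustration.

```python
import math

R = 8.314  # gas constant in J/(mol K)


def rate_constant_ratio(delta_g_ground, temperature=298.15):
    """Ratio k2/k1 implied by Eq. 12.42 for two ground-state levels.

    delta_g_ground is G2 - G1 in J/mol; the prefactor A' cancels in the ratio.
    The default temperature is an illustrative assumption.
    """
    return math.exp(delta_g_ground / (R * temperature))


def rate(k, substrate_conc):
    """First-order rate v = k[S] (Eq. 12.44)."""
    return k * substrate_conc


def enzyme_pairs(n):
    """Number of distinct enzyme pairs in a pathway of n enzymes, n(n - 1)/2."""
    return n * (n - 1) // 2


print(enzyme_pairs(10), enzyme_pairs(30))
print(rate_constant_ratio(0.0))       # equal ground-state levels
print(rate_constant_ratio(5000.0))    # ground state higher by 5 kJ/mol
print(rate(k=2.0, substrate_conc=1.0))
```

For pathways of 10 and 30 enzymes the pair counts are 45 and 435. Equal ground-state levels give a ratio of 1, and raising the ground-state level raises the rate constant, which is the content of Statement 12.43.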

15 Time-Dependent Gibbs Free Energy Landscape (TGFEL): A Model of Whole-Cell Metabolism

When two RNA trajectories exhibit a linear Pearson correlation coefficient greater than 0.7 or smaller than −0.7, these trajectories are positively or negatively linearly correlated, respectively. The degree of such linear correlations differs from one metabolic pathway to another, ranging from 39% to 98% (see the bold numbers in Table 12.12). The degree of nonlinear correlation among RNA pairs was determined by calculating the percentage of the intra-pathway or inter-pathway RNA pairs, the Euclidean distances between whose trajectories fit the blackbody radiation-like equation (BRE) (see the numbers in parentheses in Table 12.12).
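
The thresholding rule for linear correlation described above can be sketched as follows. The trajectories in the example are made-up numbers, not microarray data, and `pearson`, `classify`, and `fraction_linearly_correlated` are hypothetical helper names. The test for nonlinear correlation (fitting trajectory distances to BRE) is not attempted here.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def classify(r, threshold=0.7):
    """Label a coefficient using the cutoffs quoted in the text."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "none"


def fraction_linearly_correlated(trajectories, threshold=0.7):
    """Fraction of all trajectory pairs with |r| above the threshold."""
    pairs = list(combinations(trajectories, 2))
    hits = sum(1 for a, b in pairs if abs(pearson(a, b)) > threshold)
    return hits / len(pairs)


# Made-up transcript-level trajectories, for illustration only.
a = [1, 2, 3, 4, 5, 6]
b = [2, 4, 6, 8, 10, 12]
c = [6, 5, 4, 3, 2, 1]
d = [1, -1, 1, -1, 1, -1]

print(classify(pearson(a, b)), classify(pearson(a, c)), classify(pearson(a, d)))
print(fraction_linearly_correlated([a, b, c, d]))
```

In this toy set, three of the six pairs pass the cutoff, so the fraction of linearly correlated pairs is 0.5. Applied to all RNA pairs within or between pathways, the same count would give percentages of the kind reported in Table 12.12.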

To account for these observations based on the molecular mechanisms postulated to underlie the coupling between transcriptosomes and degradosomes (see Sect. 12.14), I have been led to invoke what is here called the time-dependent Gibbs free energy landscape (TGFEL) model of whole-cell metabolism as depicted in Fig. 12.32 and explained below:

Fig. 12.32

The time-dependent Gibbs free energy landscape (TGFEL) model as applied to the whole-cell metabolism of genome-wide RNA molecules. A = precursor of RNA; B = RNA; C = ribonucleotides. ki = the rate constant for the ith quantum state of an enzyme, i.e., T or D; ΔGi‡ = the Gibbs free energy of activation for the ith quantum state of an enzyme; Gi = the ith ground-state Gibbs free energy level of an enzyme; R = gas constant; T = temperature; K, K′ = proportionality constants; 1 the rate law of RNA synthesis; 2 the rate constant as the exponential function of the ith Gibbs free energy of activation; 3 the ith Gibbs free energy of activation as the function of the ith ground-state Gibbs free energy level; 4 the ith rate constant as the exponential function of the ith ground-state Gibbs free energy level, and 5 the rate of change in TL, i.e., dTL/dt, as the balance between the transcription rate and the transcript degradation rate

  1. 1.

    Just as the atom is associated with a set of atomic orbitals representing the energy levels of electrons, the living cell is thought to be associated with a set of space- and time-dependent Gibbs free energy levels of biopolymers to be called the time-dependent Gibbs free energy landscape (TGFEL) of the cell.

  2. 2.

    The TGFEL model is formulated in terms of Gibbs free energy (G) and not just energy (E) as in the “energy landscape” concept frequently employed in chemistry and biology (see Eq. 2.1 for the difference between E and G), in order to emphasize the potential role of entropy in determining the ground state of enzymes. Thus two enzymes, both at their lowest energy levels, can still be at two different Gibbs free energy levels if their entropy contributions are different. The five different Gibbs free energy levels of the TD complexes shown in Figs. 12.30 and 12.31 may all be associated with the same internal energy levels due to the same environmental temperature but still can exist at different ground-state Gibbs free energy levels mainly due to different entropy (or negentropy) contents of the complexes (Ji 1974a).

  3. 3.

    The TGFEL model can be represented as a five-dimensional surface since it takes five numbers to specify the position of a biopolymer in it – the three dimensions of the Euclidean space, time, and the Gibbs free energy level of a biopolymer. In Fig. 12.32, the three Euclidean dimensions are collapsed to (or compactified into) one dimension on the x-axis, the Gibbs free energy levels are encoded on the y-axis, and the time dimension is represented by the z-axis.

  4. 4.

    TGFEL is composed of valleys and a set of sub-valleys within a valley (see Valleys 1 and 2, and sub-valleys labeled 1–5 for T and 1′–5′ for D). The transcriptosomes (T) and degradosomes (D) catalyzing the transcription and degradation of a set of RNA molecules belonging to a metabolic pathway are postulated to occupy the same valley. Thus, Valley 1 may be occupied mostly by the glycolysis RNA molecules and Valley 2 mostly by protein glycosylation RNA molecules, for example. RNA molecules appear to be able to “cross over” from one valley to another just as they can cross over from one sub-valley to another within a valley (see the curved arrows).

  5. 5.

    The transcriptosomes and their conjugate degradosomes are thought to be coupled mostly through trans-mechanisms, i.e., through sharing common diffusible molecules (e.g., RNA precursors, ATP, nucleotides) which may accumulate within a valley, in contrast to the cis-mechanism of interactions assumed for the (TXDX)(TYDY) complex in Fig. 12.31. In the absence of further evidence, we cannot exclude the possibility that both trans- and cis-mechanisms operate in budding yeast, with their relative contributions determined by environmental factors.

  6. 6.

    The Gibbs free energy levels of transcriptosomes can fluctuate randomly among levels 1–5 and those of degradosomes among levels 1′–5′, and the probability of coupling between them may be different under different microenvironmental conditions as evidenced by the pathway-dependent and phase-dependent distributions of dTL/dt values shown in Fig. 12.29.

  7. 7.

    The rate constants of transcriptosomes and degradosomes are determined by their ground-state Gibbs free energy levels, obeying an equation similar to Eq. 12.42, because the Gibbs free energy of activation of a given enzyme is a function of its ground-state Gibbs free energy level (see the upward arrows in Fig. 12.29 and (4) in Fig. 12.32).

  8. 8.

    TGFEL consists of two orthogonal planes denoted as the xy- and the yz-planes in Fig. 12.32. The former will be referred to as the synchronic plane and the latter as the diachronic plane to reflect the assumption that the processes occurring on the synchronic plane (see the arrows connecting the sub-valleys on trajectory 5) are much faster than those occurring on the diachronic plane (see the green trajectories labeled t1–t4) by a factor of at least 10², so that the generalized Franck–Condon principle or the Principle of Slow and Fast Processes (Sect. 2.2.3) applies to TGFEL.

  9. 9.

    The topology of TGFEL is postulated to evolve in time (as indicated by the green trajectories labeled t1–t4). The shapes of these trajectories are postulated to be selected by biological evolution and hence record the evolutionary history of living cells (e.g., S. cerevisiae).

  10.

    Since the height of the landscape where an enzyme (e.g., T and D) or an enzyme complex (e.g., TD) is located is directly related to the rate constant of the enzyme according to Eq. 12.42, we can quantitatively equate the curves on the yz-plane with RNA trajectories under the first-order kinetics conditions mentioned in connection with Eq. 12.44. In other words:

    Equation 12.42 transforms the curves on the diachronic plane of TGFEL into RNA trajectories under the first-order kinetic conditions. (12.45)

  11.

    Not all the RNA trajectories located in a valley are correlated (e.g., see trajectories t3 and t4), thus accounting for the variable degrees of the linear and nonlinear correlations found in budding yeast undergoing glucose–galactose shift (see Table 12.12).

  12.

    Two RNA trajectories belonging to two different metabolic pathways can be correlated (e.g., see trajectories t1 and t4), thus accounting for the high degrees of linear correlations found between some inter-pathway RNA pairs (see P1-P7 and P3-P7 entries in Table 12.12).

16 The Common Regularities (Isomorphisms) Found in Physics, Biology, and Linguistics: The Role of Gnergy

So far I have described two kinds of regularities: the quantitative regularity in the form of the blackbody radiation-like equation (BRE), Eq. 11.27 (without the additive term), which has been found to apply to blackbody radiation, single-molecule enzymology (Sect. 11.3.3), protein stability, and whole-cell metabolism (Sect. 12.12), and the qualitative regularity in the form of the linguistic rules and concepts found in natural (or human) language and cell language (Sect. 6.1.2). These regularities and their fields of application are summarized in Rows 2 and 3 in Table 12.13. The first row of this table also exhibits another quantitative regularity, i.e., y = ax log x, which can be viewed as a generalization of both the Shannon entropy equation, Eq. 4.2, and the Boltzmann entropy equation, Eq. 4.23. It may be asserted that BRE and the Boltzmann entropy-like equation (BEE), y = ax log x, represent two of the very few mathematical equations that have been found to apply to both physics and biology.

Table 12.13 The regularities common to physics, biology, and linguistics as revealed by the “table analysis” (Ji 1991). Experience is assumed to possess two complementary aspects – quantitative and qualitative. Only the quantitative aspect of experience is subject to dimensional analysis (Stahl 1961). The following dimensionalities are assumed to be fundamental: M = mass; L = length; T = time; Q = electrical charge; Θ = temperature; and N = number of moles of chemicals. Three universal properties are suggested in this table: Iqt = quantitative information (Sect. 4.3); E = energy, including free energy (Sect. 2.1.2); and G = gnergy, postulated to be the universal driving force for all organizations in the Universe including communication (Sect. 2.3.2)

It should be pointed out that BRE and BEE can be viewed as the “nondimensionalized” versions of Planck’s radiation formula, Eq. 11.26, and the Boltzmann equation, Eq. 4.23, respectively. A “nondimensionalized” equation is an equation with nondimensional parameters (i.e., numbers without any measuring units) that can be derived from its original physically meaningful equation based on the Buckingham π theorem. According to this theorem, if a physically meaningful equation contains a certain number, n, of physical variables that can be expressed in terms of k independent fundamental physical quantities (e.g., mass, length, charge, temperature, etc.), the original expression can be converted into an equation involving a set of p = n – k dimensionless parameters constructed from the original variables (http://en.wikipedia.org/wiki/Buckingham_%CF%80_theorem).
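As a worked illustration of the counting rule p = n – k, the sketch below applies it to the standard frequency form of Planck's law, which is assumed here to correspond to Eq. 11.26. The variable list, the dimension exponents, and the helper name `matrix_rank` are assumptions made for this example and are not taken from the text.

```python
from fractions import Fraction

# Dimension exponents over (M, L, T, Theta) for each variable in the
# frequency form of Planck's law (an assumed reading of Eq. 11.26).
DIMENSIONS = {
    "u":   (1, -1, -1, 0),   # spectral energy density per unit frequency
    "nu":  (0, 0, -1, 0),    # frequency
    "T":   (0, 0, 0, 1),     # absolute temperature
    "h":   (1, 2, -1, 0),    # Planck constant
    "c":   (0, 1, -1, 0),    # speed of light
    "k_B": (1, 2, -2, -1),   # Boltzmann constant
}

def matrix_rank(rows):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

n = len(DIMENSIONS)                          # number of physical variables
k = matrix_rank(list(DIMENSIONS.values()))   # independent fundamental dimensions
p = n - k                                    # number of dimensionless groups
print(n, k, p)
```

Under these assumptions the two dimensionless groups can be taken as x = hν/kT and y = uc³/(hν³), so that Planck's law reduces to y = 8π/(eˣ − 1), an equation containing only pure numbers.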

The first two rows numbered 1 and 2 exhibit the quantitative regularities common to six different fields (see 1a, 1b, and 2a–2d in the second column) and the last row numbered 3 lists the qualitative regularities found in two fields (3a and 3b). Thus, the topics analyzed in this table using the “table method” of analysis (Ji 1991, pp. 8–13) cover the widest possible range of sciences, unlike, say, the dimensional analysis (http://en.wikipedia.org/wiki/Dimensional_analysis) which is limited to analyzing the quantitative aspect of reality (Stahl 1961). As evident in Table 12.13, the objects appearing in the first six categories either have some dimensions or are dimensionless, while the objects in the last two categories represent qualitative entities without any quantitative dimensions.

The regularities appearing in the first column of Table 12.13, whether quantitative or qualitative, can be viewed as systems, machines, functions, or structure-preserving maps that convert an input (denoted as x on the top row) into an output (denoted as y). In category theory (defined in Sect. 12.17, Eqs. 12.55 and 12.56), such regularities are referred to as morphisms, and x and y are referred to as the source object and the target object, respectively. A category is a very abstract mathematical construction characterized by a set of objects that can be transformed into one another according to a set of rules called morphisms. Hence each of the eight rows in Table 12.13 numbered 1a–3b can be viewed as a category. The eight categories in Table 12.13 are grouped into three higher-order categories numbered 1, 2, and 3 based on the common properties (or universal properties) given in the last column. The universal property is here simply defined as the property or properties common to two or more categories. The formal definition of the universal property (http://en.wikipedia.org/wiki/Universal_property) is complex and beyond the scope of this book. The universal properties common to a set of categories may require more than one term to be adequately expressed, as evidenced by the appearance of multiple terms in each of the major categories in the last column of Table 12.13.

The universal property common to statistical mechanics and information theory (see Rows 1a and 1b) is suggested to be the quantitative aspect of “information,” denoted as Iqt. Information (I) is thought to have two complementary aspects – quantitative (Iqt) and qualitative (Iql). The Shannon equation applies only to Iqt and is blind to Iql. We may represent this idea thus:

$$ {{I}} = {{{I}}_{\rm{qt}}}^{ \wedge }{{{I}}_{\rm{ql}}} $$
(12.46)

where the symbol “A = B^C” represents the statement that “A is the complementary union of B and C ” or, equivalently, that “B and C are the complementary aspects of A” (Sect. 2.3.1). In addition, it is suggested here that I may be related to Firstness of Peirce (see Sect. 6.2.2) (hence denoted as 1-I), Iqt to Secondness (hence denoted as 2-I), and Iql to Thirdness (hence denoted as 3-I), although it is not possible to prove the legitimacy of this assignment.

The universal property spanning the four categories 2a–2d is suggested to be the quantization of energies (E), including free energies. It should be recalled that free energies are functions of both energy (E) and entropy (S) (see Eq. 2.1). The process of quantization may be more general than has been thought in quantum mechanics. Quantization occurs not only in blackbody radiation (see Row 2a) but also in protein folding (Row 2b), single-molecule enzymology (Row 2c), and whole-cell RNA metabolism (Row 2d), as evidenced by the fact that some aspects of these processes all obey the same mathematical equation, the blackbody radiation-like equation (BRE) (see the second row and the first column of Table 12.13). Not only energy (or more accurately “action”) but also entropy (and hence free energy) may be quantized. According to Gilson and McPherson (2011), the Boltzmann constant k is quantized and hence so is entropy (S), since k and S have the same dimensionality as evident in Eq. 4.23. The fact that protein stability, single-molecule enzyme activity, and whole-cell RNA kinetic data fit BRE may be experimental evidence supporting the postulate of the quantization of the Gibbs free energy in the living cell. Just as the fitting of the blackbody radiation data to Planck’s formula indicated the organization of the energy levels of electrons within the atom, so perhaps the fitting of the above biological data to BRE indicates that the Gibbs free energy levels of enzymes are organized inside the living cell. Again:

Just as the transitions of electrons between the energy levels in atoms are responsible for the absorption or emission of photons, so the transitions of enzymes between their Gibbs free energy levels inside the cell may be responsible for the rise and fall of the concentrations of intracellular biochemicals (including RNA) that determine cell functions. (12.47)

We may refer to Statement 12.47 as the Postulate of the Quantization of the Gibbs Free Energy of the Living Cell, or more briefly the Cell Free Energy Quantization Postulate (CFEQP).

In 1991, I postulated that all the molecular machines inside the cell are driven by conformons (i.e., the conformational strains of biopolymers harboring mechanical energy at sequence-specific sites) and that the minimum energy content of the conformon is kT or ~4 × 10⁻¹⁴ ergs, which is about 10 orders of magnitude greater than Planck’s quantum, hν (where ν is the wave number) (Ji 1991; Table 1.9). This postulate may be expressed alternatively as:

The Boltzmann constant k is to biology what the Planck constant h is to quantum mechanics. (12.48)

Statement 12.48 may be referred to as the k-h isomorphism postulate.
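The value quoted above for kT, about 4 × 10⁻¹⁴ ergs, can be checked with a short calculation. The sketch assumes a temperature of 298 K, which the text does not specify; the constant and function names are chosen only for this example.

```python
# Thermal energy kT expressed in ergs.
BOLTZMANN_J_PER_K = 1.380649e-23   # SI value of the Boltzmann constant
ERG_PER_JOULE = 1.0e7              # 1 J = 10^7 erg
TEMPERATURE_K = 298.0              # assumption: the text does not state T

def thermal_energy_erg(temperature_k: float) -> float:
    """Return kT in ergs for a temperature given in kelvin."""
    return BOLTZMANN_J_PER_K * temperature_k * ERG_PER_JOULE

kT_erg = thermal_energy_erg(TEMPERATURE_K)
print(f"kT at {TEMPERATURE_K} K = {kT_erg:.2e} erg")
```

At the assumed temperature the result is close to 4.1 × 10⁻¹⁴ erg; at 310 K it rises only slightly, so the rounded figure is insensitive to the exact choice of physiological temperature.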

The universal property inherent in the last two categories of Table 12.13 is suggested to be communication or information exchange, for which both information (I) to be transferred and the energy (E) to drive the transfer process are absolutely required according to Shannon’s channel capacity equation, Eq. 4.29. Since the complementary union of E and I is referred to as gnergy (G) (Sect. 2.3.2), it would follow that communication absolutely requires (or is synonymous with) gnergy.

The selection of I (not Iqt), E, and G as the universal properties of the associated categories in Table 12.13 is motivated by the information-energy complementarity postulate described in Sect. 2.3.2. In other words, the three universal properties, I, E, and G, are postulated to form a higher-order category, to be designated as C2 (from “CC” or the “cosmological category”), which is a 9-tuple consisting of four objects (denoted with capital letters) and five morphisms (denoted with lower-case letters) as follows:

$$ CC = (E, I, G, U, f, g, h, j, k) $$
(12.49)

Equation 12.49 can be represented graphically using the “commutative diagram” as shown in Fig. 12.33. A commutative diagram can be viewed as a network wherein the vertices are objects and directed edges (or arrows) are morphisms with the characteristics that all directed paths in the network with the same endpoints lead to the same result by the operation called “composition” usually denoted by the symbol ◦. Thus if the diagram in Fig. 12.33 commutes, it follows that g◦f = k◦h. I find it necessary to introduce another kind of composition to be denoted by the symbol ^ such that A^B = C indicates that A and B are the complementary aspects of C. The commutativity diagram in Fig. 12.33 embodies two major principles discussed in this book, namely, complementarity and supplementarity, first introduced into physics by N. Bohr (1958) and discussed in detail in Sect. 2.3.1. These principles are represented as two different compositions of morphisms as shown in Eqs. 12.50 and 12.51.

Fig. 12.33 The commutative diagram for the cosmological category (CC or C2) consisting of four objects, namely energy/matter (E), information (I), gnergy (G), and the universe (U), and five morphisms, tentatively identified as f (physical interactions), g (quantization), h (biological evolution), j (cosmogenesis), and k (communication), embodying, reflecting, or organized by the two universal principles of complementarity and supplementarity discussed in Sect. 2.3.1

Hence we can characterize the commutativity diagram in Fig. 12.33 by the following statement:

The C2 category is complementarity/supplementarity dual. (12.52)

Based on Statement 12.52, we may refer to the C2 category as the complementarity/supplementarity dual category. The physical interpretations of the five morphisms appearing in the C2 commutativity diagram are indicated in the legend to Fig. 12.33. These interpretations may be subject to improvements as our knowledge progresses.

17 Signal Transduction

Living cells constantly communicate with their environment using molecules as information carriers. The molecules carrying environmental information are called the primary messengers, and most primary messengers cannot enter the cell interior due to the impermeable cell membrane, except steroids that are lipid-soluble and hence can penetrate the hydrophobic barrier provided by the cell membrane. Thus, in order for the extracellular information to be transmitted to the interior of the cell, the information carried by primary messengers must be transferred to, or transduced into, secondary messengers in a process catalyzed by receptors embedded in the cell membrane (see Table 12.14). This phenomenon is known as signal transduction.

Table 12.14 Five mechanisms of transmembrane signaling in cells. The table was reproduced from Fig. 2.8, p. 18 in Katzung (2001)

Barbieri (2003, p. 108) recognizes three distinct mechanisms of signal transduction across the cell membrane as explained in the legend to Fig. 12.34. What is common to all these mechanisms of transmembrane signal transduction is the role played by membrane receptors which act as molecular machines that “translate” first messengers to second messengers (see Table 12.14). As can be seen in the second and the fifth columns of Table 12.14, there is no structural relation (or similarity) between first and second messengers (compare, for example, acetylcholine and cAMP, the first and second messengers for G-protein coupled receptors). In other words, the relation between first and second messengers is arbitrary from the point of view of chemistry and physics but fixed and constant (or absolute) from the point of view of semiotics or communications theory in that the information carried by first messengers is reliably transmitted across the cell membrane to second messengers. This phenomenon is reminiscent of the arbitrariness of signs in linguistics (see Sect. 6.1.3), and this principle evidently applies to signal transduction in the cell as postulated by the cell language theory (Ji 1997a, b).

Fig. 12.34 Three mechanisms of transmembrane signal transduction according to Barbieri (2003). (a) The one-to-one signal transduction, where one primary messenger produces a corresponding secondary messenger. (b) The one-to-many signal transduction, where one primary messenger produces more than one secondary messenger. (c) The many-to-one signal transduction, where many different primary messengers produce an identical secondary messenger
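The three mechanisms in Fig. 12.34 can be phrased as properties of a mapping from primary to secondary messengers. The following sketch classifies such a mapping; the messenger labels (`P1`, `S1`, and so on) and the function name `classify_transduction` are hypothetical placeholders, not molecules or terms taken from Barbieri (2003).

```python
from collections import defaultdict

def classify_transduction(pairs):
    """Classify (primary, secondary) messenger pairs as in Fig. 12.34.

    Returns a set of labels because one mapping may contain
    more than one of the three patterns.
    """
    by_primary = defaultdict(set)
    by_secondary = defaultdict(set)
    for primary, secondary in pairs:
        by_primary[primary].add(secondary)
        by_secondary[secondary].add(primary)

    labels = set()
    for primary, secondaries in by_primary.items():
        if len(secondaries) > 1:
            labels.add("one-to-many")
        elif len(by_secondary[next(iter(secondaries))]) == 1:
            labels.add("one-to-one")
    if any(len(primaries) > 1 for primaries in by_secondary.values()):
        labels.add("many-to-one")
    return labels

# Hypothetical messenger labels, used only for illustration.
print(classify_transduction([("P1", "S1")]))
print(classify_transduction([("P1", "S1"), ("P1", "S2")]))
print(classify_transduction([("P1", "S1"), ("P2", "S1")]))
```

The function inspects only which messengers are paired, not their chemical structures, which is in keeping with the point that the pairing is arbitrary from the standpoint of chemistry.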

There are five well-established mechanisms effectuating transmembrane signaling in living cells as summarized in Table 12.14. All except Mechanism 1 are mediated by receptors embedded in the cell membrane. Membrane receptors can be viewed as molecular machines that perform three major biological functions:

  1.

    The specific recognition of the cognate primary messengers

  2.

    The coupling between first messengers and their intracellular counterparts, i.e., second messengers and

  3.

    The amplification of the second messenger signals either directly by acting as a kinase or an ion channel (see Mechanisms 2 and 4 in Table 12.14) or indirectly via another protein acting as a kinase (see Mechanisms 3 and 5)

We may refer to these functions as the triadic functions of membrane receptors. Since all molecular machines must work in more than one cycle in a given direction, their operation cannot be driven by random thermal fluctuations or Brownian motions alone but must ultimately be driven by free energy dissipation. There are two free energy sources for membrane receptors – (1) ATP or GTP for Mechanisms 2, 3, and 5 (see Table 12.14), and (2) transmembrane ion gradients for Mechanism 4.

The signal transduction machinery (STM) is more complex than membrane receptors and their interactions with ligands. The STM of the cell may be viewed as composed of at least eight distinct components as depicted in Fig. 12.35. Common examples of the first three components of STM are listed in Table 12.14.

Fig. 12.35 The major components of the signal transduction pathways in the living cell. Red = extracellular space; Green = cell membrane; Purple = intracellular space; Blue = nucleus. The receptor in this figure can be either membrane embedded or located in the cytosol

Components 4, 5, 6, and 7 in Fig. 12.35 cooperate inside the cell to produce intracellular dissipative structures (IDSs or dissipatons) that are thought to determine (or are identical with) cell functions. The principle of their operation can be described in two complementary ways, based on either (1) the energetic/structural perspective or (2) the semiotic/linguistic perspective. The energetic/structural principle underlying the workings of Components 4–7 is generally known as “protein–protein interactions” where structural complementarity and free energy of binding play fundamental roles. In contrast, the semiotic/linguistic principles underlying the operation of Components 4–7 are rarely discussed in the current literature on signal transduction, one exception being the cell language theory (Ji 1997a, b). According to the cell language theory (Sect. 6.1.2), the linear sequence of Components 4–7 in Fig. 12.35 can be viewed as molecular sentences and hence the principles of (1) double (or triple) articulations and (2) syntagmatic and paradigmatic relations should apply to them.

The triple articulation is defined in the second column and Rows 4, 5, and 6 in Table 12.15, where the arrow symbol can be read as “form” or “produce.” I formulated the notion of third articulation in 2003. Although linguists apparently have not widely discussed third articulation, there is no reason why the number of articulations in human language should stop at two. Hence it was proposed that human language exhibits the phenomenon of “third articulation,” defined as a sequential arrangement of sentences to form texts. If the isomorphism thesis between cell and human language is valid, there should exist the first, second, and third articulations in cell language as well (see the third column). The third articulation in cell language is suggested to be space- and time-dependent changes in the concentrations of diffusible molecules or in mechanical strains (known as conformons) inside the cell. Such dynamic structures were referred to as IDSs (intracellular dissipative structures or dissipatons) in the Bhopalator model of the living cell (Fig. 2.11), and IDSs in turn may be viewed as related to the concept of metabolic spacetime proposed by Welch and his colleagues (Welch and Smith 1990; Smith and Welch 1991; Welch and Keleti 1981).

Table 12.15 Triple articulations in cell and human languages

The concepts of paradigmatic and syntagmatic relations are useful in understanding the structure of signal transduction pathways in general and the mode of operation of Components 4–7 in Fig. 12.35 in particular. These concepts are explained in Table 12.16 using familiar examples. The syntagmatic relation refers to the relation among the components of a sentence such as the subject, verb, and object as shown in the first row of Table 12.16. Thus, in the English language, the subject of a sentence precedes the verb which in turn precedes the object: “He” precedes “loves” which precedes “her,” etc. This is analogous to Component 4 preceding Component 5 which precedes Component 6 which precedes Component 7 in STM shown in Fig. 12.35 or to MAPKKK preceding MAPKK which precedes MAPK which precedes transcription factors in Fig. 12.36. This figure summarizes the MAP kinase signaling cascade of vertebrates as reviewed by Seger and Krebs in 1995. An analogous signaling pathway was found to operate in the unicellular organism S. cerevisiae (Seger and Krebs 1995).

Table 12.16 The syntagmatic (row) and paradigmatic (column) relations in human language

Fig. 12.36 The MAPK Signal Transduction Pathways (Cascades), adapted from Seger and Krebs (1995). Three distinct signal transduction pathways are shown. All of these pathways are composed of six functional elements (numbered 2–7), which, when activated by primary messengers or stimuli (numbered 1), lead to specific functions or responses (numbered 8). The kinematics (concerned with the question as to which of the possible pathways are actually activated under a given environmental condition) and the dynamics (concerned with the question of how long and how fast a given pathway is activated) are determined by the primary messenger (or primary perturbation) and the state of the cell involved. To completely describe a signal transduction event, it is necessary to elucidate not only the kinematics but also the dynamics of signal transduction since these are the complementary aspects of signal transduction (Sect. 2.3.5). MAPK mitogen-activated protein kinase, MAPKK or MAP2K MAPK kinase, MAPKKK or MAP3K MAPKK kinase, MAP4K MAP3K kinase, RAS rat sarcoma, PKC protein kinase C, MEKK MEK kinase, MEK mitogen-activated, ERK-activating kinase, MEK1/2 MEK 1 and 2, ERK1/2 ERK 1 and 2, ERK extracellular regulated kinase, RSK ribosomal S6 protein kinase, MAPKAPK2 MAP kinase-activated protein kinase 2. This figure exemplifies the distinction between types and tokens discussed in Sect. 6.3.9. Seger and Krebs refer to types and tokens as generic and specific names, respectively

The paradigmatic relation holds between two or more words that can occupy the same syntagmatic position in a sentence. Thus all the words appearing within a column in Table 12.16 are related paradigmatically. Some examples of the paradigmatic relation as applied to the MAP kinase signal transduction pathway are given in the columns in Table 12.17, along with examples showing the syntagmatic relations given in the rows of the same table.

Table 12.17 The paradigmatic (row) and syntagmatic relations (column) in cell language. The question mark indicates the protein predicted to exist in vertebrate cells in analogy to the protein found in S. cerevisiae (Seger and Krebs 1995). See Fig. 12.36 for a more complete description of the MAPK signaling cascade

It is possible that, depending on the environmental conditions, an extracellular signal (or a first messenger) can trigger more than one series of protein–protein interactions (or molecular sentences) thereby activating a molecular text, which is an example of a third articulation. The postulated biological functions of the first, second, and third articulations are discussed in Table 12.18 in terms of the protein molecular language.

Table 12.18 Structural and functional comparison of protein language (proteinese) and human language (humanese)

Although proposed more than a decade before the cell language theory (Ji 1997a), which has proven to provide a sound theoretical foundation for signal transduction, the Bhopalator model of the cell (Ji 1985a, b, 2002b) appears to be capable of acting as the signal transduction machinery described in Fig. 12.36. The 20 steps constituting the Bhopalator can be roughly grouped into 6 steps of the signal transduction machinery as shown in the lower right-hand corner of Fig. 12.37.

Fig. 12.37 The living cell as a signal-transducing machinery. The overall signal transduction process carried out by the cell can be decomposed into six major steps. Step 1 = transmembrane signaling; Step 2 = intracellular signal processing (also called molecular computing); Step 3 = activation or inhibition of protein factors that bind to DNA; Step 4 = interaction between processed transcription factors and target DNA regions, including promoters, enhancers and silencers; Step 5 = feedback interactions between the genome and membrane receptors; and Step 6 = cell outputs, including secreted proteins and small molecules, and mechanical processes such as cell shape changes and cell migration

It is a truism to state that no communication is possible without a language. Since communication is essential for cells to survive and function, cells must possess languages of their own, and such a hypothetical language was referred to as the cell language in Ji (1997a, b) (see Sect. 6.1.2). Just as the computer language has many layers (i.e., digital logic, microarchitecture, instruction set architecture, operating system machine, assembly, and problem-oriented languages (Tanenbaum 2003)), so the cell language appears to have multiple layers:

  1.

    DNA language = DNese

  2.

    RNA language = RNese

  3.

    Protein language = proteinese

  4.

    Metabolite language = metabolese (e.g., ATP, ADP, glucose, H+, metal ions)

  5.

    Intercellular language = intercellese (e.g., hormones, cytokines, PGs, ion gradients)

Of these cell sub-languages, proteinese is unique because it is the only autonomous (or active) language in the sense that only proteins acting as enzymes (except for some RNAs acting as ribozymes) can utilize the chemical free energy locked up in small molecules such as glucose, NADH, and ATP. Therefore, we can state that proteinese is the primary engine of the cell language and the other sub-languages are secondary and passive. So, to understand how the cell language works, it would be essential to understand how proteinese is constructed and works.

Proteinese and the human language (or briefly humanese) are compared at five structural levels in Table 12.18.

The relation between Columns B and C is well established in linguistics (Hockett 1960; Culler 1991). The relation between Columns A and B is suggested by the cell language theory. Therefore, Columns A and C must be related, as the syllogism in Fig. 12.38 demonstrates.

Based on the inference presented in Fig. 12.38, it appears reasonable to suggest that semantic biology of Barbieri (2003, 2008a, b) emerges logically from the combination of linguistics and molecular biology, i.e., the cell language theory (Ji 1997a, b).

Fig. 12.38 Predicting the biological functions of the components of proteinese based on the cell language theory (Ji 1997a, b). 1 = Major premise; 2 = minor premise; 3 = conclusion. A, B and C refer to the columns so labeled in Table 12.18

18 Computing with Numbers, Words, and Molecules

The concept of computing is widely discussed not only in computer science and engineering but also in mathematics (Wolfram 2002), physics (Lloyd 2006), brain/mind research, and biology (Adleman 1994; Ji 1999a). This is most likely because computing is a general concept that can be defined as follows:

Computing is a series of the input-induced state transitions of a material system, artificial or natural, obeying a set of axioms, rules, and/or laws, leading to observable outputs. (12.53)

Thus defined, the concept of computing can be applied even to the Universe (Lloyd 2006, 2009). In Table 12.18 in Sect. 12.16, it was suggested that protein networks of the cell are the units of reasoning or computing. That is, the cell computes. In this section, the following items are discussed:

  1.

    Three classes of computing

  2.

    Computing as a category

  3.

    The “conformon-P machine” as a formal model of the living cell

  4.

    The “Turing/Zadeh complementarity” model of computing

  5.

    The Bhopalator, a molecular model of the living cell, and its implications for computational theories of mind

(1) We can recognize three classes of computing: numerical, lexical, and molecular. They are distinguished by the nature of the signs being manipulated to accomplish computing. The first class of computing is too well known to be commented on. The concept of “computing with words (CW)” was developed by Lotfi Zadeh in the mid-1990s by “fuzzifying” traditional crisp numerical variables into what he called “linguistic variables” (Zadeh 1996a, b, 2002). Adleman (1994) solved an instance of the directed Hamiltonian path problem by manipulating DNA fragments in test tubes. Living cells can be considered to be the smallest molecular computers in nature, since cells have evolved to manipulate molecular signals or messages based on genetic instructions or rules encoded in DNA, leading to desired outputs (Ji 1999a). In 1997, I reviewed some of the vast amount of experimental data available in the literature concerning the phenomenon of “apoptosis” (also called “programmed cell death”) and was led to conclude that cells have evolved to obey the following type of instructions (Ji 1997b):

If you are in cell state X and receive signal Y, then do Z. (12.54)

The conditional instruction, Statement 12.54, is very similar to (or is an example of) the “if-then” rule in fuzzy computing or computing with words. In this sense, the living cell and its molecular model, the Bhopalator (Ji 1985a, b), can be viewed as the natural “fuzzy computer.” The Turing machine is not a fuzzy computer in that its hardware is constructed on the basis of crisp binary logic, not fuzzy logic. But the Turing machine can be made to perform computations using fuzzy logic. The salient features of the above three classes of computing are summarized in Table 12.19.
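Statement 12.54 can be read as a lookup from a (state, signal) pair to an action. The sketch below uses crisp matching only, so it illustrates the form of the rule rather than fuzzy inference; the state, signal, and action names are hypothetical placeholders and are not taken from Ji (1997b).

```python
# Each rule has the form of Statement 12.54:
# "If you are in cell state X and receive signal Y, then do Z."
# All names below are hypothetical placeholders.
RULES = {
    ("state_X1", "signal_Y1"): "action_Z1",
    ("state_X1", "signal_Y2"): "action_Z2",
    ("state_X2", "signal_Y1"): "action_Z3",
}

def respond(state: str, signal: str, default: str = "no_action") -> str:
    """Return the action Z prescribed for cell state X and signal Y."""
    return RULES.get((state, signal), default)

print(respond("state_X1", "signal_Y1"))
print(respond("state_X2", "signal_Y1"))
print(respond("state_X2", "signal_Y2"))
```

In this toy table the same signal (`signal_Y1`) leads to different actions in different states, which is the feature of Statement 12.54 that makes the response depend on the state of the cell as well as on the incoming signal.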

Table 12.19 Three classes of computing

(2) Each of the three classes of computing shown in Table 12.19 may be viewed as a category in the mathematical sense. According to Herrlich and Strecker (1973):

A category is a triple:

$$ C = \left( O, U, \hom \right) $$
(12.55)

where C is the category; O is a class (or a collection) whose members are called C-objects; U: O → u is a set-valued function, where for each C-object A, U(A) is called the underlying set of A; and hom: O × O → u is a set-valued function, where for each pair (A, B) of C-objects, hom(A, B) is called the set of all C-morphisms with domain A and codomain B.

It should be noted that u is the class of all sets. Classes differ from sets in that they are immune to the logical contradiction known as Russell’s paradox: Let R be the set of all sets that are not members of themselves. If R is a member of itself, then by definition, it must not be a member of itself. If R is not a member of itself, again by definition, it must be a member of itself. Thus R is both X and not–X at the same time, violating the principle of crisp logic, the law of excluded middle.

Baez (http://math.ucr.edu/home/baez/week73.html) provides another useful definition:

A category is something just as abstract as a set, but a bit more structured. It is not a mere collection of objects; there are also morphisms between objects, in this case the functions between sets…. A category consists of a collection of “objects” and a collection of “morphisms.” Every morphism f has a “source” object and a “target” object. If the source of f is X and its target is Y, we write f: \( X \to Y \). In addition, we have:

  (a)

    Given a morphism f: \( X \to Y \) and a morphism g: \( Y \to Z \), there is a morphism fg: \( X \to Z \), which we call the “composition” of f and g

  (b)

    Composition is associative: (fg)h = f(gh)

  (c)

    For each object X there is a morphism \( 1_X \): \( X \to X \), called the “identity” of X. For any f: \( X \to Y \) we have \( 1_X f = f\, 1_Y = f \). (12.56)

The first two rows of Table 12.19 may correspond to the O (i.e., objects) and hom (i.e., morphism) components of a category defined in Statement 12.56.

(3) In analogy to the Turing machine, it may be convenient to refer to computing machines based on words or “linguistic variables” as “Zadeh machines.” A linguistic variable consists of a “linguistic term” and a positive real “number” between 0 and 1 indicating the degree of membership in a fuzzy set (for a definition of fuzzy set, see Sect. 5.2.5) (Kosko 1993; Zadeh 1996c). Frisco and Ji (2002, 2003) applied the biological concepts of the conformon (Ji 1974b, 2000) and the cell membrane to modeling computability. It was shown that this so-called conformons-P system belongs to the universality class of the Turing machine. The suffix P stands for the P-system, a biological membrane-inspired computational model developed by G. Paun and his school in the 1990s (Paun 2002; Paun et al. 2002). The conformons-P system can be viewed as a formal model of the computing aspect of the living cell, in contrast to the Bhopalator (Ji 1985a, b), which is a molecular model of the living cell as a whole. We can alternatively refer to the conformons-P system as the conformon-P machine, whenever convenient.

The conformon of the conformons-P system is a pair, [X, x], where X is the “name” and x the “value” of the conformon. X and x thus defined are the abstractions of their biological counterparts, information and energy, that constitute the complementary pair of the original conformon. Conformon [X, x] is formally identical with Zadeh’s linguistic variable, if X and x are equated with the linguistic term and the degree of membership to a fuzzy set, respectively. Consequently, the conformon-P machine can be reduced to the Zadeh machine, as it can be reduced (or related) to the Turing machine (Fig. 12.39).
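The formal identity between the pair [X, x] and a linguistic variable can be illustrated with two record types. This is a minimal sketch: the class names `Conformon` and `LinguisticVariable` and the function `as_linguistic_variable` are hypothetical, and the sketch assumes that the value x already lies between 0 and 1, since the text does not say how other values would be rescaled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conformon:
    """The pair [X, x] (hypothetical representation for illustration)."""
    name: str     # X, the abstraction of "information"
    value: float  # x, the abstraction of "energy"


@dataclass(frozen=True)
class LinguisticVariable:
    """A linguistic term paired with a degree of membership in a fuzzy set."""
    term: str
    membership: float  # expected to lie in [0, 1]


def as_linguistic_variable(c: Conformon) -> LinguisticVariable:
    """Reinterpret [X, x] as (term, membership).

    Assumption: x is already in [0, 1]; no rescaling rule is given in the
    text, so values outside that range are rejected rather than normalized.
    """
    if not 0.0 <= c.value <= 1.0:
        raise ValueError("value must lie in [0, 1] to serve as a membership degree")
    return LinguisticVariable(term=c.name, membership=c.value)
```

For example, `as_linguistic_variable(Conformon("warm", 0.7))` yields a linguistic variable with term `"warm"` and membership `0.7`. The mapping only relabels the two fields, which is the sense in which the two objects are formally identical.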

Fig. 12.39 The Turing and the Zadeh machines as the complementary aspects of the conformon-P machine, or the formal model of the Bhopalator

(4) As pointed out by fuzzy theorists (Zadeh 1996a, b, c; Kosko 1993; Yen and Langari 1999), the Turing machine is based on crisp sets, while the Zadeh machine is rooted in fuzzy sets. I here postulate that the set of molecules underlying the conformon-P machine is both crisp (e.g., nucleotide sequences of DNA) and fuzzy (e.g., ensemble of conformations belonging to a given amino acid sequence of a protein) (Ji 2004a). Because of its “A AND not-A” nature (Kosko 1993), it may be asserted that the Bhopalator can act as the ultimate source or ground for both the Turing and Zadeh machines. According to Zadeh, probability theory (which is based on crisp sets) and fuzzy logic are complementary rather than competitive (Zadeh 1995). Therefore, it may be reasonable to suggest that the Turing and Zadeh machines are complementary. If so, there must exist a third term or entity of which these two machines represent complementary aspects, and it seems logical to conclude that the Bhopalator (and its formal model, the conformon-P machine) can qualify as the third entity.
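The contrast between crisp and fuzzy sets, and the “A AND not-A” property in particular, can be made concrete with membership degrees. The sketch below assumes the standard min and 1 − μ operators commonly used for fuzzy conjunction and complement; the function names are hypothetical, and other operator choices exist.

```python
def complement(mu: float) -> float:
    """Degree of membership in not-A, given membership mu in A."""
    return 1.0 - mu


def conjunction(mu_a: float, mu_b: float) -> float:
    """Degree of membership in (A AND B), using the min operator."""
    return min(mu_a, mu_b)


def a_and_not_a(mu: float) -> float:
    """Degree to which an element belongs to both A and not-A."""
    return conjunction(mu, complement(mu))


# Crisp memberships are restricted to 0 or 1, so "A AND not-A" is always 0.
crisp_results = [a_and_not_a(mu) for mu in (0.0, 1.0)]

# Fuzzy memberships may take intermediate values, so "A AND not-A" can be
# nonzero, reaching its maximum of 0.5 at mu = 0.5.
fuzzy_results = [a_and_not_a(mu) for mu in (0.25, 0.5, 0.75)]
```

Here `crisp_results` is `[0.0, 0.0]` and `fuzzy_results` is `[0.25, 0.5, 0.25]`: under these operators an element can belong partly to a set and partly to its complement, which is excluded when membership is restricted to 0 or 1.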

(5) If the Turing/Zadeh complementarity model of computing turns out to be true in principle, it may have important applications in cognitive sciences. The computational theories of mind described in Putnam (1961) and Fodor (1975) appear to assume that the Turing machine is the best theoretical framework now available to model computing (Ji 1991, pp. 205–209). If the content of Figs. 12.39 and 12.40 is correct, the Turing machine may at best capture the crisp aspect of the human mind and miss its fuzzy aspect. In addition, the Turing machine, being formal and macroscopic, may completely miss the molecular energetic grounds for the working mechanisms of the human mind. Hence, it may be reasonable to suggest that the Bhopalator provides a sound starting point for modeling the human mind (Ji 2003a). This suggestion may gain support from the cell language theory, according to which living cells use a molecular language that shares with human language a common set of semiotic principles (Ji 1997a, b) (see Sect. 6.1.2).

Fig. 12.40 The commutative diagram relating the molecular model of the cell, the Bhopalator, to the computer models of Turing and Zadeh. This diagram is consistent with Fig. 12.38 wherein the Turing and Zadeh machines are viewed as the complementary aspects of the Bhopalator