Introduction

The spate of “systems biology” initiatives that in the past few years has produced new institutions, departments, and research or educational programs suggests that “systems biology” is a reality, here to stay, and not merely a buzzword. But what is systems biology? While there is currently no consensus on a definition, life scientists have adopted this new term into their vocabulary surprisingly quickly because, on the surface, it has conveniently allowed them to communicate the dawn of a new era of interdisciplinary, all-encompassing biology in academia and the biotechnology industry. The deeper reason for the rapid and unanimous acceptance of the term despite the lack of a tangible definition is perhaps that it reflects an increasing awareness of the limitations of “reductionism” in molecular biology and of the necessity to overcome them.

The vast diversity of justifications for systems biology initiatives mirrors the similarly wide spectrum of what individual life scientists regard as “reductionism.” Some biologists are disturbed by the overt emphasis on genetic determinism (Strohman 1997; Morange 2001; Rose 2003). Others are more concerned about the neglect of quantitative analysis (Endy and Brent 2001), formalization, and abstraction, or the disregard of complexity and context (Goodwin 2001; Lewontin 2001). Because of this plurality of firmly anchored perceptions of molecular reductionism of the past decade and of opinions about what to do next, an attempt at an epistemologically precise definition of systems biology would be doomed to fail.

Certainly, systems biology is more than the return to traditional organismal physiology, as a few cynics have suggested. Molecular biology's revelation of the inner secrets of life has opened a new vista on living systems that distinguishes today's system-wide approach from that of the “innocent” days of classical physiology. Systems biology also is more than the logistic integration of bioinformatics databases and of computational modeling with experimentation. If the whole is more than the sum of its parts, as Aristotle taught us (Metaphysica), then the rapidly increasing knowledge of the (molecular) parts, stimulated by the “omics revolution” (Evans 2000; Ge et al. 2003), begs for an explanation of how the whole (organism) arises from its parts—other than by linearly adding them up. Therefore, one overarching goal of systems biology could be: the analysis of entirety rather than the entireness of analysis. This could be an operational definition were it not too philosophical to be useful. More concretely then, if systems biology is the migration away from the analysis of the individual molecular parts in isolation toward embracing the bigger picture of the components functioning as integrated parts of a whole, then its very nature will depend on which facet of reductionism is being evaded, as well as on which aspect of complexity of the organism as a “whole” is being addressed. For this reason it is useful to dissect the fundamental nature of the complexity of living systems into naturally identifiable domains. Here we propose five principal aspects of living systems that not only represent fundamental, immanent properties of complex systems, but also roughly incorporate the operational goals of typical systems biology approaches. The proposed scheme provides a guiding reference system along five axes to help organize the diversity of existing initiatives. Depending on the relative emphasis, any research effort labeled “systems biology” can be projected onto the “hyperspace” spanned by these five dimensions (see Table 1): (1) molecular complexity; (2) structural complexity; (3) temporal complexity; (4) abstraction and emergence; and (5) algorithmic complexity. These five component aspects are not mutually exclusive but might overlap, so that as formal dimensions they are not perfectly orthogonal.

Table 1 The five dimensions that span the complexity of living systems and the challenges of systems biology

Dimension 1: molecular complexity

After decades of genetic analysis, the most obvious aspect of complexity in living organisms appears to be the large but finite number of genes and proteins in the genome of an organism, which stimulates the ambition to “know them all.” Thus, one of the earliest departures from molecular reductionism was driven by the notion that understanding life processes will first require knowledge of all the parts of an organism, not just individual molecules in isolation. This unquestioned idea was fostered by the arrival of genomics and proteomics, which have quickly triggered other “omics” sciences. Technologies for massively parallel and high-throughput analysis, such as automated DNA sequencing and nucleotide and protein microarrays, as well as advances in bioinformatics and database integration, brought the ambitious goal of a systematic—if possible, exhaustive—identification, categorization, and characterization of all genes and proteins of a genome within reach. This was to be followed by the study of the functional aspects of individual genes and their encoded proteins, such as the interaction partners, biochemical activities, and biological role.

However, such “functional genomics” approaches, as they were initially called to distinguish them from traditional sequence-centered genomics, nonetheless often appeared as “brute-force reductionism,” in which technology was used merely to accelerate the comprehensive characterization of pathways. In such massively parallel reductionism, entireness of analysis still trumps understanding of entirety. The significance of molecular complexity arises not only from the variety of individual molecules themselves, but also from how they function in concert. It is becoming obvious, as genomes of model organisms are sequenced, that functions of living systems are determined not only by individual genes and their encoded proteins, but also by how evolution led to an ever-increasing complexity of protein–protein and other intra- and intercellular interactions. The “organizational complexity” of these interactions (Strohman 2000), rather than isolated differences in the genome sequence, distinguishes higher organisms from lower ones. This is most prosaically manifested in the surprising (perhaps disappointing) finding that the number of genes in humans (around 25,000 in the latest count) is of the same order of magnitude as in disproportionately more “primitive” organisms, such as the plant Arabidopsis thaliana or the worm Caenorhabditis elegans (Southan 2004). Even if we consider differential splicing and posttranscriptional and translational modifications, as has been argued in an effort to restore the molecular superiority of humans (Graveley 2001), it is not at all clear whether we can directly map genomic complexity into phenotype complexity. Instead, it is obvious that the regulatory interactions between genes, proteins, and metabolites contribute, through combinatorics and constraints, to the enormous richness of this dimension of molecular complexity, and hence, ultimately can provide an answer for why the whole is more than the sum of the parts. In fact, the key operation is multiplication rather than addition—the whole is the product of high-order combinatorial multiplication, not a simple linear summation.
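
As a purely illustrative back-of-the-envelope sketch (ours, not part of the cited work), the contrast between addition and multiplication can be made concrete by treating each gene as a simple binary ON/OFF switch: an additive “parts list” grows only linearly with gene number, whereas the combinatorial space of possible expression states grows exponentially.

    # Illustrative calculation only; the binary-gene assumption is a deliberate
    # caricature used to show the scale of combinatorial state spaces.
    from math import log10

    N = 25_000                      # approximate human gene count cited above
    parts_list_size = N             # the "sum of the parts"
    log10_states = N * log10(2)     # log10 of the 2**N possible ON/OFF patterns
    print(f"parts: {parts_list_size}, combinatorial states: ~10^{log10_states:.0f}")
    # -> parts: 25000, combinatorial states: ~10^7526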

Consequently, one of the first explicit goals of “post-genomic biology” is to establish, piece by piece, the genome-wide “network map” of all the specific, regulatory interactions between the molecules—the wiring diagram, so to speak—of an organism (Davidson et al. 2002). Such endeavors brought the notion of “complex networks” (Marcotte 2001; Strogatz 2001; Barabasi and Oltvai 2004) into molecular biology. The biology of networks was a first step toward conceptualization beyond the ad hoc models of modern biology, in which molecular pathways represent chains of causation and hence provide an explanation of macroscopic processes. Some experimentalists now even equate systems biology with “network biology.” Unraveling the molecular wiring diagram can be achieved in essentially two ways: the painstaking demonstration of individual interactions through experiments, or the use of a formalism to reverse engineer the network structure of a system based on system-wide measurements of the coordinated dynamics of molecular activities that are governed by the network interactions (D'haeseleer et al. 2000; Gardner et al. 2003). The study of the architecture of genome-wide interaction networks has stimulated an avalanche of theoretical work—not only in biology but also in physics (discussed in Sect. 5) (Barabasi and Albert 1999). However, the identification of the topological features of a static network is only one of the first steps on the path to understanding organismal complexity. The network architecture imposes a constraint on how individual elements at the network nodes, the genes and proteins, can behave. Hence the next step is necessarily the study of the dynamics of the collective behavior of the molecules in the network. This will be discussed in Sects. 4 and 5 (Dimensions 3 and 4).
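
To make the idea of reverse engineering tangible, the following minimal sketch fits a linear rate model to simulated time-series data and reads the interaction signs off the fitted coefficients. The three-gene network, rate constants, and noise level are invented for illustration; the published methods cited above are considerably more sophisticated.

    # Minimal illustration of network inference from dynamics (linear model,
    # least-squares fit); the network and parameters are hypothetical.
    import numpy as np

    rng = np.random.default_rng(0)

    # "True" interaction matrix of a small three-gene network (hidden from the method).
    A_true = np.array([[-1.0,  0.0,  0.8],
                       [ 0.9, -1.0,  0.0],
                       [ 0.0, -0.7, -1.0]])

    # Simulate a noisy expression time series: x(t+dt) = x(t) + dt*A*x(t) + noise.
    dt, steps = 0.05, 300
    x = np.zeros((steps, 3))
    x[0] = rng.normal(size=3)
    for t in range(steps - 1):
        x[t + 1] = x[t] + dt * (A_true @ x[t]) + 0.005 * rng.normal(size=3)

    # Reverse engineering: regress the observed rates of change onto the states.
    dxdt = np.diff(x, axis=0) / dt
    A_est, *_ = np.linalg.lstsq(x[:-1], dxdt, rcond=None)
    print(np.round(A_est.T, 2))     # sign pattern approximates the hidden wiring diagram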

The need for the computational capacity to analyze and manage the vast amount of sequence, protein structure, functional, and expression profile data has brought bioinformatics to the center stage of genomics, necessitating the integration of wet lab work with computational science. However, although this cross-disciplinary integration of methodology is most apparent in this dimension of molecular complexity, computation in biology is more than a mere tool to handle large amounts of data. Computing is also a tool for modeling and simulation, as is discussed in Sects. 3, 4, and 5 (on Dimensions 2, 3, and 4). But beyond that, in a more profound sense, computing is itself an elementary process of complex systems, including the living organism, and hence is an object of, rather than a tool for, analysis, as we will encounter in Dimension 5 of systems biology.

But in this first dimension, the integration of computation serves the default goal of the “omics” sciences, which centers on the identification, description, characterization, and classification of molecules, and hence appears to some critical observers as the molecular equivalent of the “stamp collecting” of old-time botany and zoology (to paraphrase Rutherford and Watson) (Blackett 1963; Wilson 1995). Even if the study of molecular networks now embraces quantitative analysis following targeted perturbations (Ideker et al. 2001), the systematic quest for the missing parts of a finite genomic jigsaw puzzle is governed by the same kind of mind-set that guided the great explorers of the past to the discovery of new islands or rivers on our finite planet. This mentality is reflected in the original perception of systems biology as a “discovery” science as opposed to the traditional “hypothesis-driven” sciences (Aebersold et al. 2000; Brent 2000; Ideker et al. 2001). The underlying motivation for the brute-force characterization of all the molecular parts and connections is the subconscious but widely held belief that knowing all the component parts of a system and their wiring diagram is equivalent to understanding that system. The problem is that this constructionist approach, thought to be a reversal of reductionism, preserves the fallacies of reductionist thinking (Anderson 1972).

Modern-day molecular stamp collecting, however, is warranted and necessary. The set of stamps is large, but finite, and it is not unrealistic to hope to collect them all within the next few decades. The comprehensive characterization of all quantitative and qualitative details is important because of one essential aspect of the complexity of living systems: the immense heterogeneity of the individual components and the astronomic combinatorial possibilities in which these parts (genes, peptides, cell types, organisms) can interact to generate function (Elsasser 1998). This unfathomably large “space” that arises from the combinatorics makes evolution a (quasi) non-ergodic process, so that not all features of organisms may be the result of functional optimization by adaptation (Dawkins 1996; Alon 2003), as an engineer would like to see it (Mangan and Alon 2003). Instead, organisms are replete with features that may represent frozen historical accidents (Gell-Mann 1995; Gould 2002) and local optima fixed by evolution (Kauffman 1993). Moreover, natural selection had to face the constraints imposed by laws of physics, rules of architectural design, and the principle of evolvability (Gould and Lewontin 1979; Webster and Goodwin 1984; Autumn et al. 2002). Conversely, it can sometimes get a free ride from self-organizing processes that can generate more complex ordered patterns without external instruction (as discussed in Sect. 5) (Kauffman 1993), which in turn can be selected for. All this reduces the role of the optimizing (adaptive) force of natural selection and produces idiosyncrasies that are often difficult to capture by universal principles. The end result is a living organism which runs on software (the genome) that consists of multiple layers of “clever hacks,” invented ad hoc but kept forever because they work, rather than on software that has been systematically designed de novo by disciplined programmers and optimized for algorithmic elegance or efficiency. The “organized heterogeneity” (Elsasser 1998) of an immense variety of subparts is a fundamental characteristic of living organisms that is absent in physical systems, and the dichotomy between idiosyncrasy and universality poses a challenge to the formalization of multicomponent living systems in the life sciences, in contrast to the description of homogeneous multicomponent nonliving matter in statistical physics. Thus, a systematic, exhaustive analysis of the individual parts, their interaction modes and quantitative parameters, as summarized in this molecular dimension of biological complexity, remains an important task in future biology. Indeed, although to early advocates of integrative biology the analysis of gene sequences was the poster-child of reductionist thinking, modern bioinformatics of (whole) genome sequence comparison between species has become one of the most important and exciting tools for the study of evolution (Medina 2005). More recently, such “brute-force,” large-scale comparisons have been extended to gene expression profiles, thus adding a functional aspect (Stuart et al. 2003). Holistic minds in biology will contend, however, that the availability of an entire list of parts and their functional specifications is but the first stage in the journey toward understanding a given living organism as an entity, a system. Therefore, other dimensions of complexity need to be considered.

Dimension 2: structural complexity

While the genomic perspective of complex systems emphasizes the information-encoding and processing aspect of interacting biomolecules, i.e., the basic software, the complexity of a living organism is also manifested in a characteristic fashion in the structure of its hardware. To understand how the information in molecular networks materializes into the mechano-electro-chemical machinery of organisms, we need at least to understand the actual underlying “device physics.” This aspect of a living system has been transparent to the genomic sciences, notably bioinformatics, but was the domain of traditional biophysics. So what new perspective in the study of the physics of biological devices then warrants the label “systems biology”?

A characteristic aspect of complexity of living systems is that they exhibit feature richness at multiple size scales across a wide range; i.e., they have a deep “scale space” (Bar-Yam 2004). Genes determine proteins (at the nanometer scale), which organize the chemistry so as to assemble organelles (micrometer) and cells (~10 µm). Groups of cells assemble into tissues with characteristic patterns (millimeter). Tissues form organs (centimeter), and organs cooperate within the systemic physiology of the organism (meter). In other words, when looking at a living organism through an imaginary universal magnifying device, one could continuously zoom in and out without encountering the abrupt jumps in feature richness that are often found in the inanimate materials of man-made systems. (Note that a feature boundary such as the nuclear or mitochondrial membrane represents a spatial boundary but not necessarily a jump in feature richness.) The smooth variation in feature size in biological systems, spanning nanometers to meters, is a fundamental property that distinguishes living systems from man-made machines.

It is important to note that this hierarchical, multiscale complexity in the structure of higher organisms is a more encompassing principle than fractals, which in addition to feature richness at all size scales also exhibit self-similarity between the scales (Mandelbrot 1982; Bassingthwaighte et al. 1994; Cross 1997). Fractals are found in some subsystems within a complex organism, such as in bronchial or blood vessel branching systems, but these physical fractals extend over a finite range of size scales. Moreover, the defining property of fractals, the scale-invariance (self-similarity at different scales) (Mandelbrot 1982), is distinct from the multiscale feature richness of complex organisms discussed here. The latter is characterized by scale-specific features that vary between different levels of organization (molecules, cells, tissues) and are not necessarily repeated at other size scales. If the complexity discussed in Dimension 1 epitomizes the proposal of organized heterogeneity as a characteristic feature of living systems (Elsasser 1998), then the structural complexity discussed here may be summarized as the principle of (heterogeneous) hierarchy (Pattee 1973), which provides a second characteristic organizing principle.

Importantly, in this nested hierarchy each level exhibits system properties not obvious from the properties of its component parts (e.g., proteins) and harbors its own set of rules that determine new interaction modalities (e.g., protein–protein interaction, cell–cell communication) not evident at the smaller scale. Or, as Phil Anderson put it, psychology is not applied biology, biology is not applied chemistry, and chemistry is not applied physics (Anderson 1972). This apparent “irreducibility” of higher-level features is sometimes, without rigorous definition, referred to as an “emergent property.” Hence, multiscale structural complexity of higher organisms is tightly linked to emergence and abstraction, which represent our Dimension 4 of complexity.

Given the scale-transcending, hierarchical property, the study of structural complexity entails taking a multiscale approach, rather than studying each level separately—as was done in traditional biophysics. The depth of scale space required to fully describe physiological phenomena represents a key manifestation of structural complexity. In the heart, for example, the stability of the cardiac rhythm is determined by how wave fronts propagate over many centimeters, but this in turn is determined by features as small as ion channels, gap junctions, and intercalated disks in the nanometer range (Wilders and Jongsma 1993). In an individual cell, the micron-scale spatial localization of a signaling event (Simon and Llinas 1985) would suggest the need for submicron spatial descriptions of cellular activity, a point amply illustrated by high-resolution images of intracellular heterogeneity (Marsh et al. 2001). Structural complexity is thus already evident at this lowest molecular level: So-called molecular crowding (Hall and Minton 2003) of the cytoplasm leads to different scales of diffusion rates, precluding the use of classical kinetic formalism derived for ideal, well-stirred, single-scale homogeneous solutions. This necessitates the development of more explicit multi-algorithm modeling (Takahashi et al. 2005). From this perspective, a description of the state of a macroscopic organism in terms of the state of each independent submicron-to-micron feature would require the impossible specification of what could be a “mole of parameters”—the true measure of structural complexity.

If large-scale genomics (Dimension 1) was the horizontal integration of traditional molecular biology approaches, then transcending the various hierarchical levels of structures within one research project can be viewed as a vertical integration. What are the tools for this approach? In the molecular genetics of Dimension 1, the typical experiment consists of a genetic or pharmacological perturbation of a given system, followed by the (genome-wide) analysis of its response using established molecular biology and biochemistry. When studying structural complexity and associated “emergent features” (discussed in Dimension 4), perturbation translates into actuation and control, and the equivalent of molecular analysis is sensing, not only of changes in activity of molecular species as carriers of information, such as protein phosphorylation state, but also of physical variables, such as spatiotemporal distributions of pH, temperature, photons, currents, and voltage, as well as shapes, motion, mechanical forces, material properties, etc., at the various levels of organization. In the horizontal dimension of molecular complexity, broad-scale and high-throughput tools have long been available, but the toolbox for climbing up the ladder of structural integration toward larger scales is almost empty. A systematic effort to develop standard devices for actuation, control, and sensing that would allow the manipulation of more complex experimental systems (cells, tissues, organoids) and the measurement of higher-order system properties at each scale will be paramount for studying this aspect of complexity. As we will see in the next section, the ability to control biological processes at all spatial scales will also be critical for understanding the dynamics that result from the third dimension—temporal complexity.

Dimension 3: temporal complexity

The multiscale property of complexity applies not only to space, as discussed above, but also to time, the dimension of dynamic behavior. Chemical and physical processes exhibit characteristic temporal structures. While electron transfer reactions can occur in femtoseconds, protein conformational changes associated with cellular signaling are slower, taking place in nanoseconds. The nature of rapid ion channel gating events (microseconds), such as those that determine the depolarization of the heart (milliseconds), in turn determines the stability of the cardiac cycle (1 s), and can directly affect the longevity of the organism (gigaseconds), for a total span in time of a factor of 10^18. In comparison, the most complex astrophysical calculations to date, which describe the gravitational accretion that led to large-scale structure in the universe, may span only a factor of 1,300 in time and involve only one force (gravity) and six degrees of freedom for each node (position and velocity) (Springel et al. 2005).

The most elementary complex temporal “pattern” (in the widest sense of the word) is fluctuations of variables due to stochastic processes, such as the local intracellular level of a protein that fluctuates in time because of the effect of finite numbers of molecules in the cell (e.g., two gene copies, thousands of transcription factor copies) (Kaern et al. 2005). The study of “gene expression noise,” although its cell-biological manifestations in higher organisms have long been recognized (Enver et al. 1998; Hume 2000), has gained momentum in the era of systems biology, perhaps because its probabilistic nature appealed in a particular way to those who sought to escape genetic determinism.
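
The origin of such fluctuations can be illustrated with a minimal stochastic (Gillespie-type) simulation of constitutive gene expression, sketched below with arbitrary rate constants; it is not a model of any particular gene, only a demonstration that small molecule numbers inevitably produce relative fluctuations on the order of one over the square root of the copy number.

    # Birth-death model of gene expression: synthesis at rate k, degradation at
    # rate g*n, simulated with the Gillespie algorithm. Parameters are illustrative.
    import random

    random.seed(0)
    k, g = 10.0, 0.1             # synthesis (molecules per time) and degradation rate
    n, t, t_end = 0, 0.0, 200.0
    events = 0

    while t < t_end:
        birth, death = k, g * n              # propensities of the two reactions
        total = birth + death
        t += random.expovariate(total)       # waiting time to the next reaction
        if random.random() < birth / total:
            n += 1                           # a synthesis event
        else:
            n -= 1                           # a degradation event
        events += 1

    print(f"events: {events}, final copy number: {n}, deterministic mean: {k / g:.0f}")
    # Fluctuations around the mean of 100 scale as sqrt(100), i.e., about 10%;
    # with only tens of molecules the relative noise would be far larger.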

But the more prosaic manifestations of temporal complexity are patterns in a narrower sense of the word, namely those that the human brain intuitively perceives as “regular,” for example, oscillations. Such temporal patterns of behavior “emerge” from nonlinear regulatory interactions (stimulus-response relationships), including thresholding, saturation, and “biphasic” (non-monotonic) effects (Tyson et al. 2003), which connect genes, proteins, and metabolites to establish circuits and networks. A central goal in studying the generation of the temporal patterns in the behavior of a system is to understand the formal relationship between the network architecture on the one hand—which can implement a breadth of feedback, modulation, and control schemes well known to system engineers—and the observed dynamics on the other hand. Major research agendas include either reverse engineering to infer the network structure based on observed dynamics (as mentioned in Dimension 1) or the modeling and simulation of a given network structure to learn about the dynamic behavior the network governs.
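
As a minimal sketch of how such temporal patterns emerge from nonlinear interactions, the following script simulates three “anonymous” genes that repress one another in a ring (a repressilator-like negative-feedback motif). The Hill-type nonlinearity and all parameter values are illustrative; the point is only that sustained oscillation is a property of the circuit, not of any single component.

    # Three-gene repression ring integrated with forward Euler; illustrative parameters.
    import numpy as np

    def hill_repression(x, a=20.0, n=3):
        """Production rate of a gene repressed by x (Hill-type nonlinearity)."""
        return a / (1.0 + x**n)

    dt, steps = 0.01, 20_000
    x = np.array([1.0, 1.5, 2.5])             # initial concentrations of genes 1-3
    history = np.empty((steps, 3))

    for t in range(steps):
        history[t] = x
        dx = np.array([hill_repression(x[2]) - x[0],   # gene 1 repressed by gene 3
                       hill_repression(x[0]) - x[1],   # gene 2 repressed by gene 1
                       hill_repression(x[1]) - x[2]])  # gene 3 repressed by gene 2
        x = x + dt * dx

    late = history[steps // 2:, 0]            # gene 1 after the initial transient
    print(f"gene 1 keeps swinging between {late.min():.2f} and {late.max():.2f}")
    # Sustained, phase-shifted oscillations of all three genes: a temporal
    # pattern that none of the parts exhibits in isolation.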

Temporal oscillations in gene expression, the most basic temporal pattern, are best known from the cell division cycle, but also occur in a variety of other cellular processes, such as in the change of redox state or in the cellular response to DNA damage (Klevecz et al. 2004; Lahav et al. 2004). Periodic dynamics also are observed for higher-level physiological variables, including transmembrane potentials, cytosolic calcium, secretion of humoral mediators, neural activities, and muscular contractions (Glass 2001). Oscillations can also be local manifestations of large-scale spatiotemporal traveling waves, such as waves of excitation in the brain or heart (Gray et al. 1995) or of cellular behavior as in Dictyostelium (Palsson and Cox 1996), animal swarms, flocks, and schools (Okubo 1986) or even nationwide infectious outbreaks (Cummings et al. 2004). Such spatiotemporal patterns at various scales of magnitude are elementary but particularly impressive examples of “emergence” as discussed in Dimension 4.

Temporal patterns that arise from deterministic interactions and constraints of a network, however, need not be periodic but can also appear irregular, as in the case of deterministic chaos, which must be distinguished from the stochastic fluctuations discussed above (Mackey and Glass 1977; Gleick 1988). The discovery and study of chaotic behavior—as well as its spatial equivalent, the fractal structures—in life sciences (Goldberger et al. 2002) was perhaps one of the earliest and starkest demonstrations of the limits of “linear thinking” in mainstream molecular genetics. The “dynamic explanation” of disease defied the traditional reduction to genetic pathways. Chaotic behavior, e.g., in heartbeat frequency, has been associated with the healthy state, whereas loss of this type of temporal complexity is observed in disease states (Goldberger et al. 2002). Today, one might readily conclude that chaos and fractals, which can occur at a variety of size and time scales, are properties of a rather small set of subsystems within the complex system of the living organism. However, this conclusion may be based upon inadequate data: while the regulatory dynamics of the cardiac rhythm has been well studied because the heart rate is easily measured, the fractal nature of heart rhythm has only recently been appreciated (Ivanov et al. 1999; Goldberger et al. 2002), in part because of technical challenges in the numerical analysis of long-term time-series data. We should be prepared to uncover new dynamics as we begin to explore single-cell signaling, as with the recently discovered pulsatile expression of p53 following DNA damage to single cells (Lahav et al. 2004). The observation of multifractality in heartbeat dynamics raises the intriguing possibility that the associated nonlinear control mechanisms involve coupled cascades of feedback loops in a system operating far from equilibrium (Ivanov et al. 1999; Goldberger et al. 2002). Were this to prove true for larger classes of biological regulatory systems, one might encounter unexpected difficulties in creating models based on reverse engineering of cellular signaling and regulation.
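
The distinction between deterministic chaos and stochastic noise can be illustrated with the delay equation of Mackey and Glass (1977) cited above; the simple Euler integration and the standard “chaotic” parameter choice below are for illustration only. Two runs that start from nearly identical histories, with no randomness anywhere in the equations, nevertheless diverge to completely different trajectories.

    # Mackey-Glass delay equation: dx/dt = beta*x(t-tau)/(1 + x(t-tau)**n) - gamma*x(t)
    import numpy as np

    beta, gamma, n, tau = 0.2, 0.1, 10, 17.0   # tau = 17 lies in the chaotic regime
    dt = 0.1
    delay = int(tau / dt)
    steps = 20_000

    def simulate(x0):
        x = np.full(steps + delay, x0)          # constant initial history
        for t in range(delay, steps + delay - 1):
            x_tau = x[t - delay]
            x[t + 1] = x[t] + dt * (beta * x_tau / (1 + x_tau**n) - gamma * x[t])
        return x[delay:]

    a = simulate(1.2)
    b = simulate(1.2 + 1e-5)                    # perturbed by one part in ~100,000
    gap = np.abs(a[-1000:] - b[-1000:]).max()
    print(f"late-time separation of the two runs: {gap:.2f}")
    # The separation grows to the size of the attractor itself: sensitive
    # dependence on initial conditions, the hallmark of deterministic chaos.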

Temporal complexity is not only manifested in the measured temporal pattern of a variable (e.g., spikes vs sinusoid vs chaotic), but also in the stimulus-response characteristics: in the simplest case of linear sensors, a temporal shape of a stimulus (e.g., a “rectangular stimulus”) may be mapped directly into the response shape. But the living system may also just “feel” the change rather than the actual intensity of a signal, i.e., sense its first time derivative (e.g., one spike in response to a step-shape change). Such differential sensors are important for adaptation (habituation, wherein we don't feel our clothes) and alerting the organism to sudden changes in the environment. There is also the need for a system to completely suppress the response to a change in environment (perfect adaptation, i.e., not even a spike in response to a step-shape change), a manifestation of robustness (Alon et al. 1999; Yi et al. 2000) (see Sect. 5). Another important facet of temporal response patterns is whether the response returns at all when the stimulus disappears (reversibility vs irreversibility). A characteristic of nonlinearity of the underlying mechanism is hysteresis, where the response strength to a given stimulus strength may depend on the history (previous state) of a system (Laurent and Kellershohn 1999; Ninfa and Mayo 2004). Hysteresis can be viewed as a primitive memory effect and is related to multistability and nongenetic persistence discussed in Sect. 5. Other aspects of temporal response characteristics include time delay, integration, and more complex signal transforms.
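
A minimal sketch of such a change-sensing (adaptive) response is given below, using a generic integral-feedback loop, one mechanism known to produce adaptation (cf. Yi et al. 2000); the variables and parameters are abstract and purely illustrative. A sustained step in the stimulus elicits only a transient output that then settles back to baseline, so the system reports the change rather than the absolute level.

    # Generic integral-feedback adaptation: output y is driven by stimulus s minus
    # a feedback variable z that slowly integrates y. Illustrative parameters.
    dt, steps, k = 0.01, 4_000, 0.5
    y, z = 0.0, 0.0                  # output and internal "memory" variable
    peak = 0.0

    for t in range(steps):
        s = 1.0 if t * dt >= 10.0 else 0.0     # step stimulus switched on at t = 10
        dy = s - z - y                         # output follows stimulus minus feedback
        dz = k * y                             # feedback integrates the output
        y, z = y + dt * dy, z + dt * dz
        peak = max(peak, y)

    print(f"peak transient response: {peak:.2f}, response long after the step: {y:.4f}")
    # -> a clear transient (about 0.6) but a final value back at ~0, however large
    #    the sustained stimulus: the memory variable z has absorbed the new level.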

An important issue to note when analyzing temporal patterns is their interplay with the vertical integration of structural complexity (Dimension 2): since tissues of multicellular organisms are composed of cells as functional units, tissue-level variables represent ensemble averages. The traditional monitoring of the change of expression of a gene in a tissue (e.g., as measured by RT-PCR, microarrays, or immunoblots of tissue lysates) produces an aggregate variable, averaged over the expression changes in all the individual cells in that tissue (Huang 2005). Unless the individual cells are synchronized with respect to the measured variables, such population measurements incur loss of information. As more and more (dynamic) measurements can now be made at the level of single cells, oscillations or other temporal patterns within individual cells that are masked by multicell ensemble-averaging will become evident (Xiong and Ferrell 2003; Levsky and Singer 2003; Lahav et al. 2004; Sachs et al. 2005). Tissue-level oscillations can be observed in the presence of either experimental or natural synchronization of the behavior of the individual cells (Whitfield et al. 2002; Klevecz et al. 2004; Yamamoto et al. 2004). Thus, the heterogeneity of a population of genetically identical cells, either due to random fluctuations of gene expression (Kaern et al. 2005) or persistent epigenetic individuality (Spudich and Koshland 1976; Rubin 1990; Ferrell and Machleder 1998; Balaban et al. 2004), must be considered. Complex organisms either exploit stochastic cell population heterogeneity to achieve a smooth, macroscopically continuous tissue behavior (e.g., in the graduated control of skeletal muscle contraction strength by varying numbers of activated fibers) or suppress it by coordinating cell behavior through structural or chemical cell–cell coupling (e.g., in generating the macroscopic pumping function in the heart). Such cellular ensemble behavior creates an “emergent temporal pattern” at a larger time and space scale, represented by another set of variables at a hierarchically higher level of structural organization. One challenge can be to identify the messengers that mediate the regional synchronization as well as its function, as in the case of calcium oscillations in a single pancreatic islet (Rocheleau et al. 2004).
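
A small numerical illustration (ours, not from the cited studies) makes the masking effect of ensemble averaging explicit: if many cells oscillate with identical amplitude and period but random phases, every individual cell shows a large oscillation while the population mean is nearly flat.

    # Unsynchronized single-cell oscillators versus their population average.
    import numpy as np

    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 10.0, 500)                      # arbitrary time axis
    phases = rng.uniform(0.0, 2 * np.pi, size=1000)      # one random phase per cell
    cells = np.sin(2 * np.pi * t[None, :] + phases[:, None])

    single = cells[0].max() - cells[0].min()             # peak-to-peak of one cell
    average = cells.mean(axis=0)
    population = average.max() - average.min()           # peak-to-peak of the mean
    print(f"single cell: {single:.2f}, population average: {population:.2f}")
    # -> about 2.0 versus well under 0.1: the tissue-level (averaged) signal hides
    #    the oscillation unless the cells are synchronized.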

The notion of multivariate dynamic behavior in individual cells within a cell population adds another layer of complexity to deterministic genomics and poses new challenges for the acquisition and analysis of data. First and foremost, the study of temporal patterns will require the capacity for precise, real-time measurements of variables at differing time and size scales. In particular, systems will have to be studied when they are far from equilibrium: e.g., analysis of the dynamics of the metabolic response of cells to chemical perturbations (Eklund et al. 2004) will extend metabolic modeling that heretofore has concentrated on the steady-state analyses of metabolic fluxes. Passive acquisition of data alone is insufficient. It is well recognized that it is difficult to obtain a detailed understanding of a closed-loop, feedback system without opening the feedback loop—hence the intuitive emphasis in traditional biology on genetic perturbations, such as gene knockouts and RNA interference. However, it is not sufficient to measure dynamic variables following perturbations in such systems; it is also necessary to exert external control over the system to explore the gain and stability of the feedback mechanisms (Wikswo et al. 2006). It is not yet widely recognized that the advances provided by the availability of more and more sensing capabilities for analyzing a single cell will be limited by the need for corresponding cellular actuators. At present, natural proteins that act as molecular machines by changing conformation or binding affinity in response to a stimulus can be thought of as actuators—albeit of limited controllability. But there is a growing need for externally fine-tunable subcellular actuators that can accomplish similar tasks. Thus, as is the case for the study of structural complexity, this will entail the development of micro- and nano-devices (Freitas 2002; Whitesides 2003) that act as intracellular switches, sensors (Lu and Rosenzweig 2000), or valves (Nguyen et al. 2005; Kocer et al. 2005) in combination with the latest achievements in microscopy and digital imaging techniques.

To integrate the analysis of the first three dimensions of systems biology covering the molecular, structural, and temporal complexity in living systems, our capabilities in gene- and protein-specific measurements (Dimension 1) need to be advanced and combined with measurements of the other two dimensions. Concretely, one of the ultimate goals would be the development of tools for high-resolution, real-time (Dimension 3), multiplex determination of individual molecular species (Dimension 1) within the context of the physical structure (Dimension 2), at various levels: from single cells to tissues to organs. To date, multiplexed measurements (e.g., of thousand-dimensional gene expression vectors by DNA microarrays) require the destruction (“lysis”) of biological structure, while real-time measurements of gene and protein states in the intact, ordered structure remain low-dimensional. But the first, small steps to overcome this challenge have recently been taken, such as expression profiling in single cells (Levsky et al. 2002; Le et al. 2005).

Dimension 4: abstraction and emergence

Many phenomena associated with complex systems are better understood in terms of general, abstract principles without making reference to particular material constituents. To illustrate this, let us use the example of circadian rhythm. Its oscillatory nature per se is what we think is of interest from a systems perspective, and hence needs to be studied as such, in a generic and abstract sense (Barkai and Leibler 2000). In contrast, traditional molecular biology by default focuses instead on identifying the tangible, material substrate that underlies the observed phenomena, in this case, the particular genes “that are involved” in circadian rhythm (Lowrey and Takahashi 2004). The identification of genes shown to be critical for the periodicity may provide a mechanistic, qualitative ad hoc explanation for the existence of the rhythm. But such findings, while necessary, are not sufficient to understand how the more abstract property of oscillation arises from the organization and interplay of the individual components. More generally speaking, molecular biologists are mostly satisfied with a concrete and specific, so-called proximal, explanation (“The car moves because the wheels are turning”) (Tinbergen 1952; Dretske 2000). This epistemological habit is manifest in the ubiquitous hunt for the molecular pathway that causes a particular behavior or disease. Were living organisms Rube Goldberg machines (Wolfe and Goldberg 2000), no further explanation would be needed. In contrast, physicists and mathematicians, and to a lesser extent, engineers, prefer to study such phenomena from a general perspective not closely tied to the underlying particulars (“The car moves because of the thermodynamic laws that the combustion engine has to obey”). The challenge for systems biology will be to unite these epistemologically disparate efforts and the associated communities.

In the by now traditional complex systems sciences (Waldrop 1992; Gell-Mann 1995; Goodwin 2001), the more abstract “system features,” as exemplified above by the oscillatory behavior, are at the center of interest and have come to be known as “collective behavior” or, more poetically, “emergent properties,” as encountered earlier. The latter term is not rigorously defined mathematically (Corning 2002) and some scientists question its necessity. Nevertheless, emergent properties have become something of a defining characteristic of complex systems (Bar-Yam 1997). The term is intuitively appealing to many and is operationally useful for communicating the idea of system-level features that cannot be understood by studying the dissected parts in isolation. Here, given its widespread and loose usage, we propose that the term “emergent” refer to any abstract property of a system that is not obviously manifested as a property of an individual material subcomponent of the system. Thus, oscillations, switch-like behaviors, and striped or spiral wave patterns in an otherwise continuous system are examples of simple emergent features (Turing 1952; Murray 1993; Meinhardt 1996).

It is important to note that an emergent behavior in this definition is not equivalent to any notion of “irreducibility” in a stricter sense, be it the material, epistemological, or computational irreducibility (discussed in Sect. 10), nor is there anything romantic or magical about emergence, as is perhaps implied by the notion that the whole is more than the sum of its parts. In contrast, an emergent system property, in the loose definition used here, can in fact often be “reduced” to its underlying parts, i.e., understood as a consequence of the interplay of the parts. But to do so entails simplification, abstraction, conceptualization, and formalization. As philosophers of science and pedagogy put it, there is no understanding without abstraction, i.e., leaving out irrelevant details (Picht 1969).

The tool for abstraction in the widest sense is mathematical modeling, which comes in many forms and flavors. Mathematical modeling is in essence a shortcut that allows us to describe a system so as to predict its general or long-term behavior, given its structure and initial conditions, without having explicitly to “reenact” its behavior step by step from the knowledge of the specific parts and rules of their interaction. The best-studied emergent properties are features of dynamic behavior, and the tools used to describe such behavior in formal terms, typically differential equations, are well established through the discipline of system dynamics and control theory. Herein, the simplest emergent properties in living systems are dynamic features, such as temporal oscillations or stable or unstable steady-states, in particular, multistability. The last implies the existence of multiple alternative oscillatory or stable states, or “attractors” within one system (see below). On the other hand, as we have seen above, emergent properties can also be manifested as the more tangible geometric patterns in physical space. Generally speaking, we can say that emergence of both abstract behaviors and concrete structural patterns typically involves the counterintuitive transformation of gradual, quantitative differences into discrete, qualitative differences (Anderson 1972).

Emergent dynamic behaviors

In fact, the abstract dynamic behavior (temporal patterns) that emerges as a distinct quality can be represented geometrically. To do so we need to introduce the concept of an abstract space, the so-called state space, which contains all possible states that a dynamic system can occupy, e.g., all gene expression patterns within a cell, if we model a gene network. The state space offers a more intuitive grasp of the behavioral repertoire of a system. The mathematical concept of the state space is central for the understanding of the potential behaviors of a multicomponent system, such as a molecular network. For instance, the presence of multiple stable states in a gene regulatory network would lead to the compartmentalization of the state space into “basins of attraction,” in that all the unstable system states in the neighborhood of a stable “attractor” state would be attracted to it. Thus, attractor states illustrate how the very same gene network generates a variety of stable gene expression states representing the cell types in metazoans (Kauffman 1993; Huang 2005). The intuitive character of the state space representation is nicely illustrated in Waddington's metaphor of the “epigenetic landscape” (Waddington 1956; Reik and Dean 2002) in which watersheds, hills, and valleys compartmentalize a cell's state space and in doing so, represent a cell's fate decisions and development into stable cell types (the attractors, or valleys). It is important to realize that the most compact representation of the key features of a state space can, in some cases, be in the form of effective variables that have no exact physiological correlate, but provide a remarkable analytical and mechanistic explanation of the phenomena, as with the “activation” and “inactivation” variables in a simple two-variable model of the nerve action potential (Fitzhugh 1961). Similarly, an explanation of trajectories in state space using the first few principal components can be more informative than the thousands of variables that are combined to produce these components (Huang et al. 2005).
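
The following minimal sketch of a mutually repressing two-gene circuit (a generic “toggle switch” with invented parameters) shows how the very same wiring diagram supports two distinct attractors: different initial states flow into different self-stabilizing gene expression patterns, the formal picture behind Waddington's valleys.

    # Two genes, X and Y, each repressing the other; integrated with forward Euler.
    import numpy as np

    def settle(x0, y0, a=4.0, n=3, dt=0.01, steps=5_000):
        """Follow the state from (x0, y0) until it settles into an attractor."""
        x, y = x0, y0
        for _ in range(steps):
            dx = a / (1.0 + y**n) - x          # gene X, repressed by Y
            dy = a / (1.0 + x**n) - y          # gene Y, repressed by X
            x, y = x + dt * dx, y + dt * dy
        return round(x, 2), round(y, 2)

    print("start (2.0, 0.1) ->", settle(2.0, 0.1))   # X-high / Y-low attractor
    print("start (0.1, 2.0) ->", settle(0.1, 2.0))   # Y-high / X-low attractor
    # One network, two stable expression states, separated by a watershed in
    # state space (the boundary between the basins of attraction).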

Few molecular biologists have fully embraced the idea of a state space to conceptualize dynamic behavior. Instead, they mostly operate in the domain of network architectures (topologies) as evidenced by the preoccupation with pathways and network charts as explanatory schemes. Multistability, which is beyond such proximal explanations, establishes a system behavior with a memory that can epigenetically (without involving changes in the genome sequence) store and remember environmental inputs (Xiong and Ferrell 2003; Ozbudak et al. 2004), and offers a formal underpinning for the long-recognized phenomena of nongenetic inheritance, enduring modifications, and other epigenetically persistent states (Spudich and Koshland 1976; Rubin 1990; Balaban et al. 2004).

Pattern formation

In contrast to these rather abstract emergent properties, what has long attracted interest among organismal biologists, and more recently, complex system scientists, is the spontaneity or inevitability (given a certain system structure and initial condition) of how spatiotemporal patterns, such as stripes, spirals, and waves, as briefly mentioned in Sects. 3 and 4 (Dimensions 2 and 3), arise (Meinhardt 1996). At the core of spontaneous pattern formation are processes that resist the pull toward “indifferent randomness” or symmetry (thermodynamic equilibrium), and hence are also called symmetry-breaking events in a wider sense (Kirschner et al. 2000; Sohrmann and Peter 2003). Spontaneous pattern formation is also referred to as “self-organization” (Ball 2001)—a term as poetic, overused, and poorly defined as “emergence.” The attempt to understand the spontaneous generation of order as a process that occurs far from thermodynamic equilibrium led to the (still controversial) idea of “dissipative structures” (Nicolis and Prigogine 1989), i.e., quasi-stationary, ordered structures that take up (nonthermal) energy and produce entropy to maintain order. Patterns (e.g., tissue structures, coat patterns) at a higher level are created by the particular (typically nonlinear) behaviors and interactions of the components at the lower level (e.g., molecules, cells) in time and space, and have characteristic lengths that are usually orders of magnitude larger than those of their constituent components (Turing 1952; Murray 1993; Meinhardt 1996); thus, pattern formation can be viewed as one of the natural processes that create the depth of scale space and hierarchies in Dimension 2 of complexity.
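
The mathematical core of Turing's mechanism can be sketched in a few lines (illustrative numbers, not a model of any specific tissue): a two-species activator–inhibitor system that is stable when well mixed becomes unstable to a finite band of spatial wavelengths once the inhibitor diffuses much faster than the activator, so that molecular fluctuations are amplified into a pattern with a characteristic length.

    # Linear (Turing) instability analysis of a generic activator-inhibitor pair.
    import numpy as np

    # Linearized reaction terms around the uniform steady state: the activator
    # enhances itself, the inhibitor suppresses it, and the well-mixed system is stable.
    J = np.array([[ 1.0, -2.0],
                  [ 3.0, -4.0]])
    D_act, D_inh = 1.0, 40.0                 # inhibitor diffuses 40 times faster

    def growth_rate(k):
        """Largest real eigenvalue of the linearized system at spatial wavenumber k."""
        A = J - np.diag([D_act, D_inh]) * k**2
        return np.max(np.linalg.eigvals(A).real)

    ks = np.linspace(0.01, 1.5, 300)
    rates = np.array([growth_rate(k) for k in ks])
    k_star = ks[rates.argmax()]

    print(f"well-mixed (k = 0) state stable?  {growth_rate(0.0) < 0}")
    print(f"fastest-growing wavelength: ~{2 * np.pi / k_star:.1f} (arbitrary units)")
    # A band of wavenumbers with positive growth rate means that noise is amplified
    # into a periodic pattern whose length scale is set by kinetics and diffusion,
    # without any external template.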

Other forms of emergence in molecular biology

The idea of emergence of abstract entities at higher levels of organization in living organisms is not limited to classical dynamic systems and pattern formation. The analysis of genome-wide gene regulatory or protein interaction networks, for instance, has benefited from “abstracting away” the particulars of molecular details (Dimension 1) to reveal another kind of emergent property. This has opened the possibility of using graph theory for the analysis of network architectures, which has led to the discovery of particular “emergent” structures in the global architecture of the wiring diagram (Barabasi and Albert 1999; Barabasi and Oltvai 2004) or of local network motifs (Milo et al. 2002) in biomolecular networks. However, the functional and evolutionary significance of these graph theoretical properties remains to be studied carefully in the context of dynamics.
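
As a toy illustration of this kind of abstraction, the following snippet treats a regulatory network purely as a directed graph of anonymous nodes (the edge list is invented) and enumerates one of the local motifs highlighted by Milo et al. (2002), the feed-forward loop.

    # Counting feed-forward loops (x -> y, y -> z, and x -> z) in a toy directed graph.
    from itertools import permutations

    edges = {("A", "B"), ("B", "C"), ("A", "C"),     # A -> B -> C with shortcut A -> C
             ("C", "D"), ("D", "E"), ("B", "E"),
             ("A", "E")}                             # A -> E closes a second loop via B
    nodes = {n for edge in edges for n in edge}

    def feed_forward_loops(edges, nodes):
        """All ordered triples (x, y, z) with edges x->y, y->z, and x->z."""
        return [(x, y, z)
                for x, y, z in permutations(nodes, 3)
                if (x, y) in edges and (y, z) in edges and (x, z) in edges]

    for loop in feed_forward_loops(edges, nodes):
        print("feed-forward loop:", " -> ".join(loop))
    # Such purely topological regularities can be detected without knowing what the
    # nodes are chemically; their functional meaning must then be examined in the
    # context of the network's dynamics, as noted above.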

A central aspect of emergence in living systems is homeostasis, which is related to the more abstract property of the aforementioned robustness (to perturbations) (Yi et al. 2000; Stelling et al. 2004) that arises from the particular structure of the underlying wiring diagram between the component parts. The intricate interplay between robustness (stability), flexibility (adaptability), fragility (vulnerability), and complexity lies at the core of living systems, behind their apparent temporal behavior (Goodwin et al. 1993; Carlson and Doyle 2002; Kitano et al. 2004), yet these properties are not much studied in the current “mainstream” complex systems approaches. [Robustness of a dynamic system should not be confused with the use of “robustness” in a more obvious and concrete way in the context of graph properties and the analysis of network topology discussed in Sect. 2 (Dimension 1). There, robustness is not the ability to maintain a system state in response to perturbations, but the ability of a graph (network topology diagram) to maintain some global structural connectivity property in response to the destruction of network nodes or links (Albert et al. 2000).]

Another fundamental and truly concrete emergent property in biological systems pertains to the question of how the soluble biochemistry of genes and proteins studied in Dimension 1 produces the tangible, macroscopic living systems (which exhibit the spatial structures of Dimension 2 and the temporal ones analyzed in terms of Dimension 3) that are subjected to the laws of macroscopic mechanics. How does solution chemistry create form and physicality of macroorganisms? Again, high-throughput characterization of all physical parts and their relationship alone will not suffice to answer this question. One approach is the formalization of physicality in the tensegrity (=tensional integrity) model for studying the emergence of macro-mechanical properties in living organisms (Ingber 1998; Ingber 2003). The concept of tensegrity explains how systems produce “macroscopic” mechanical stiffness and shape-stability through the particular spatial organization of “microscopic” tensile and compression resistant elements in a physical network, arranged so as to produce global force balance. Such a design principle has been suggested to be embodied in the interactions between amino acid residues within folded proteins, between the molecular filaments of the cytoskeleton within the cell, or between muscles, tendons, and bones in the musculoskeletal system (Ingber 1998; Ingber 2003). It allows a system to unite both mechanical stability and flexibility.

Spectrum of models of varying abstraction

It would go beyond the scope of this article to review the various types of modeling and the levels of abstraction that help formalize “emergence” in biological systems. But for this discussion of systems biology initiatives, it suffices to point to the broad spectrum of abstraction in the various biological modeling approaches, with extremes at the opposite ends that are complementary to each other (Huang 2004). At one end are the specific, detailed models that use stochastic equations that incorporate the behavior of single molecules, or with some minimal abstraction, deterministic equations that describe the kinetics and flows of molecular species (for modeling cellular processes) or cell types as entities (for modeling tissue processes). At the other end are models of such considerable abstraction as to be almost inconceivable to experimental biologists, such as cellular automata, graph theory, or even game theory.

Systems biologists with molecular biology backgrounds and engineers engaged in biology typically populate the “specific, detailed modeling” end of this spectrum and seek to design models as detailed and realistic as possible in order to “predict the dynamic behavior” of a given instance. Here the underlying conceptualizations, such as formal reaction kinetics and the law of mass action, are not questioned but taken for granted, and all that is needed, so the idea goes, are precise experimental measurements of the quantitative parameters that can then be plugged into the models to validate them. A historical example of this detailed modeling based on precise quantitative data is the Hodgkin–Huxley model of the action potential. Such models based on quantitative data and an established formalism may have potential utility in the drug discovery industry if experimentation can be guided or even replaced by model-based simulations. Projects aimed at reenacting biological systems in silico, such as E-cell, The Virtual Cell, CyberCell, etc. (Normile 1999), combine the urge toward comprehensive characterization of Dimension 1 and detailed modeling. Practical and epistemological aspects of such modeling will be discussed under Dimension 5, in the context of the algorithmic complexity of living organisms.

In contrast, the other end of the spectrum, which represents higher abstraction and relies on qualitative observations rather than quantitative data, is predominantly occupied by theoretical physicists and mathematicians who appreciate the necessity of simplification. Their operational goal is to find a minimal model that captures some essential generic aspect of a system's behavior. The long-term goal is to identify and understand generalizable, even universal principles rather than predict details of behavior of a particular instance. Therefore, their models typically contain “anonymous” genes and cells. They are interested in the potential behavioral repertoire rather than in predicting the actual behavior of a particular case. Historical examples include the Turing model of spatiotemporal patterns that can be produced by a set of partial differential equations (Turing 1952; Murray 1993; Meinhardt 1996), cellular automata-based models of cellular interaction (Ermentrout and Edelstein-Keshet 1993), or random Boolean genetic networks (Kauffman 1993). The distance between the two extremes of modeling approaches reflects the deep “scale space” discussed under Dimensions 2 and 3, and creates the need for middle-ground approaches. The practical necessity is obvious given the large number (tens of thousands) of different proteins active in cellular homeostasis and the desire to specify quantitatively the dynamic behavior of all the metabolic and signaling pathways. One example of such a compromise is the use of effective variables such as the “flux” in a specific metabolic pathway (Stephanopoulos et al. 1998), which allows precise descriptions at a fine scale to be converted into abstractions that are readily used at larger scales, much as statistical mechanics reduces to thermodynamics with ensemble averaging. Another abstraction is to focus on the stoichiometric or network constraints of biochemical reactions or on the structure of signaling pathways, while leaving out the physical kinetics (Papin et al. 2003; Ma'ayan et al. 2005). Without some “coarse-graining” it will be impossible to model the spatiotemporal distribution of all active proteins throughout a cell.
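
A compact sketch at this abstract end of the spectrum is a small random Boolean network in the spirit of Kauffman (1993): anonymous genes, random wiring, and random update rules, with the network size and connectivity below chosen arbitrarily. Exhaustively iterating the finite state space reveals the attractors, the abstract counterpart of the stable cell types discussed above.

    # Random Boolean network with N genes, each regulated by K inputs (synchronous updates).
    import random

    random.seed(3)
    N, K = 8, 2
    inputs = [random.sample(range(N), K) for _ in range(N)]                   # wiring
    tables = [[random.randint(0, 1) for _ in range(2**K)] for _ in range(N)]  # rules

    def step(state):
        """Update every gene from its K inputs via its Boolean rule table."""
        return tuple(tables[i][2 * state[inputs[i][0]] + state[inputs[i][1]]]
                     for i in range(N))

    attractors = set()
    for s in range(2**N):                        # start from each of the 256 states
        state = tuple((s >> i) & 1 for i in range(N))
        seen = {}
        while state not in seen:                 # follow the trajectory until it cycles
            seen[state] = len(seen)
            state = step(state)
        cycle_start = seen[state]
        attractors.add(frozenset(st for st, i in seen.items() if i >= cycle_start))

    print(f"{len(attractors)} attractor(s), cycle lengths {sorted(len(a) for a in attractors)}")
    # Each attractor is a self-perpetuating expression pattern of one and the same network.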

Not all mathematical approaches in biology serve the modeling and prediction of spatiotemporal behavior. Adjacent to the abstract end of the modeling spectrum are several branches of computational and theoretical biology, far beyond the “material world” of traditional systems biology, which after all is rooted in molecular biology. These areas deal with even more abstract but fundamental problems. They include, to mention only the most salient ones, the formal explanation of scaling laws in evolution, ecology, and physiology (Schmidt-Nielsen 1984; Brown et al. 2002); computational experimentation with simulated “artificial life” (Langton 1997); and the vast field of behavioral and cognitive biology, including models based on game theory (Imhof et al. 2005). If we view these endeavors as aiming at understanding the most nonmaterial, abstract, and emergent features of living systems, then their inclusion in systems biology may be justified by assigning them to this dimension of abstraction and emergence.

We conclude this section by noting that beyond technical and methodological issues, there is a fundamental limit to modeling, i.e., a useful description of a system by an abstract formalism, much as there is a limit to the compression of electronic files. This has to do with the concept of “computational irreducibility” discussed in the next section. For some aspects of sophistication of a system, there may be no theory to explain emergence, and no shortcut is possible, so that the detail-driven, exhaustive models aiming at a one-to-one reenacting of a specific instance of a real system would be the only way to predict behavior. One challenge, however, for such in silico step-by-step replay of the irreducible reality of a system is, in addition to the necessity to know all the details, the formidable computational cost, which we address in the last dimension of systems biology.

Dimension 5: “algorithmic” complexity

A central feature of complexity in a living organism is that it encodes and processes information. Technically, the term “algorithmic complexity” denotes a measure of the computational resources needed to solve a computational problem. In a broader sense, we use it here to refer both to (1) the complexity of the “problems” that organisms compute and the physical medium they use for doing so, and (2) the complexity of the theoretical models required to simulate the organisms' computations. Viewing the organism as a computing machine opens many new perspectives: a living system is a physical system that computes its own development and homeostasis in response to external perturbations (Wolpert 1994). This is not an ad hoc analogy but has deeper implications that will become apparent, for it allows us to use concepts from computer science, and hence is distinct from the mathematical modeling used to address complexity.

One interesting question is the relationship between the physical implementation and the computation process. In von Neumann computers (which comprise most types of existing computer architectures) this is well understood; it is roughly captured in the dichotomy between hardware (transistors) and software (code). But how does biological computation exploit the physics of the living system? Computation by the organism is obviously a hybrid of digital and analog computing (Sarpeshkar 1998). While the DNA sequence (with the genetic code and transcriptional regulation as its programming language) is the most obvious embodiment of digital computing, the biochemistry of physiological and neuropsychological functions represents information processing that has analog components (molecule concentrations, membrane potentials). Digital and analog computing are intertwined. Gene regulatory networks in cells process information by transforming a continuous developmental signal, encoded by a hormone concentration as the input, into a gene expression pattern and associated discrete cellular phenotype as the output. By maintaining their distinct cell type identities despite harboring the same set of available instructions stored in the genome, cells also have an acquired memory of past events. At a higher level of organization, cell–cell communities form systems that also undertake computations, such as functional histological units, endocrine organs, or the immune system in which individual cells may act as discrete switches. The most prosaic system of multicellular computation is, of course, the brain and the peripheral sensory and neuromuscular system.

With our increasing knowledge of the physical medium that the living system uses for computing, one would then be interested in determining the computational efficiency, i.e., the use of energy, time, and space resources for computing. Analog computation costs less than digital computing chiefly because of the direct mapping of the device physics to operations and the lower communication overhead; however, the use of continuous variables leads to noise accumulation due to thermal fluctuations of molecular quantities (Sarpeshkar 1998; Kaern et al. 2005). Thus, a specific question of interest is: How does a living system use hybrid analog/digital computation to get the best of both and optimize computational efficiency, given the resource constraints and the challenge of performance degradation due to noise? In the hybrid architecture the digital components provide sets of discrete “attractor states” to which the analog signals can be reset in order to suppress noise accumulation (Cauwenberghs 1995). Such digital restoring states need not be embodied by an explicitly digital, molecular substrate, such as the nucleotide sequence. In fact, as discussed under Dimensions 3 and 4, the body is replete with nonlinear feedback circuits in molecular regulatory networks, electrochemical circuits, or humoral inter-cell and inter-organ communications that produce stable oscillatory or stationary attractor states and all-or-none events, as found in biochemical switches (Ferrell and Machleder 1998; Tyson et al. 2003), action potentials, cell cycle progression, and cell fate determination (Huang 2005), which may act as the digital resetting attractor states (Barkai and Leibler 2000; Hasty et al. 2000). This parallels the use of transistors in digital computers: combinations of transistors, themselves essentially analog devices, form logical (digital) AND, OR, and NOT gates and, from these, flip-flops.
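To make the idea of noise-suppressing restoration concrete, the sketch below simulates a noisy positive-feedback gene circuit (a toy Langevin model; the rate law, parameter values, and noise amplitude are illustrative assumptions, not measurements of any particular circuit). Cells started near the low and high attractors keep their distinct expression states despite continual stochastic perturbation, the analog signal being repeatedly “reset” toward the nearest attractor.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x0, t_end=50.0, dt=0.01, a=2.0, K=1.0, n=4, b=1.0, sigma=0.1):
    # Euler-Maruyama integration of dx = [a*x^n/(K^n + x^n) - b*x] dt + sigma dW.
    # The drift has two stable fixed points (low and high expression) separated by an
    # unstable one; noise perturbs x, and the flow restores it toward an attractor.
    x = x0
    for _ in range(int(t_end / dt)):
        drift = a * x**n / (K**n + x**n) - b * x
        x += drift * dt + sigma * np.sqrt(dt) * rng.normal()
        x = max(x, 0.0)  # concentrations cannot become negative
    return x

# Cells initialized in the low and high states retain their identities despite noise.
print("started low :", [round(simulate(0.1), 2) for _ in range(3)])
print("started high:", [round(simulate(2.0), 2) for _ in range(3)])
```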

It is important to note that the aspect of complexity in Dimension 5 is not easily captured by the formalisms of traditional mathematical modeling that aim at predicting long-term dynamic behavior. Such modeling reduces the complexity by using a “shortcut” (in the form of a mathematical abstraction, such as a set of differential equations) to predict behavior. However, regarding the organism as a computing system leads us to the fundamental problem of “computational irreducibility” (Wolfram 2002): a system can be so sophisticated that the system used to describe and compute it (our brain, our mathematics, and our computers) is not sophisticated enough to “outrun” it. This may be the case when a living organism (a human) attempts to model a living organism: “If the [human] brain were so simple we could understand it, we would be so simple we couldn't” (Pugh 1977).

A consequence, then, is that mathematical modeling and general predictions may be impossible for some yet-to-be-identified aspects of organismal complexity. Instead, explicit simulation of every step, bit by bit, based on knowledge of all the parts and a set of local interaction rules, would theoretically be the only way to compute such complex system behaviors at some reasonable resolution. Epistemologically, however, such a one-to-one in silico replicate is not a model but the thing itself, and it will therefore be as difficult to interpret as reality. As explained above, there is no understanding without simplifying (Picht 1969). Thus, there may in theory be an upper limit of sophistication below which, and only below which, features of the living system are amenable to abstraction and mathematical modeling. The entities (subsystems, emergent properties) of a living system that are accessible to abstract modeling are represented by Dimension 4.

Computational irreducibility (Wolfram 2002) was discovered in the study of one of the most abstract types of models that biologists can fathom (or not): the cellular automaton (Ermentrout and Edelstein-Keshet 1993; Wolfram 2002). Thus, it is somewhat ironic that precisely the researchers pursuing the first dimension of complexity, who often dismiss abstract models intuitively or unknowingly, subliminally honor the principle of irreducibility by stressing the importance of exhaustive characterization of all possible molecular details. These biologists, habituated to “proximal explanations” and devoted to entireness of description rather than the study of entirety, in fact have always dreamed of an in silico bit-by-bit simulation of complex organisms using some “supercomputer” (Evans 2000). It was thought that with knowledge of all the parts involved (“the words”), the rules of interaction (“the grammar”) (Bray 1997; Aebersold 2005), and the aid of computers, we could overcome both the ontological failure (not knowing all the parts) and the epistemological failure (inability of the human brain to grasp the sophisticated interactions) that reductionism suffers from (Bray 1997). However, the idea of computational irreducibility touches upon a more subtle aspect of epistemology and implies that the sophistication of a complex system may reach into a regime where no abstraction into rules, no mathematical shortcut, is possible: there may be no grammar. Perhaps this is why “complex” is not simply “complicated.”
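Computational irreducibility is easiest to appreciate in exactly this setting. The sketch below (our illustrative choice of rule 30 and grid size, not tied to any biological system) runs an elementary cellular automaton whose update rule is trivial to state, yet for which, in general, no known shortcut predicts the state at step t other than performing all t updates.

```python
import numpy as np

def step(cells, rule=30):
    # One update of an elementary cellular automaton: each cell's next state is looked
    # up from the rule table using its left neighbor, itself, and its right neighbor.
    table = [(rule >> i) & 1 for i in range(8)]
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    neighborhoods = 4 * left + 2 * cells + right
    return np.array([table[n] for n in neighborhoods], dtype=int)

# Rule 30 from a single "on" cell: the rule is fully known, yet predicting the pattern
# after t steps in general requires explicitly simulating every one of the t steps.
cells = np.zeros(81, dtype=int)
cells[40] = 1
for t in range(40):
    cells = step(cells)
print("".join("#" if c else "." for c in cells))
```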

In any case, the dream of exhaustively realistic models for bit-by-bit simulations exists among biologists, be it as an effort of burgeoning in silico biology in the spirit of our Dimension 1 of systems biology or (less likely) one stimulated by the awareness of computational irreducibility. Such an endeavor faces practical challenges. For instance, even with the help of mathematical formalism as a shortcut, the electrical activity of the heart during ten seconds of fibrillation could easily require solving 10^18 coupled differential equations (Cherry et al. 2000). (N.B., Avogadro's number of differential equations may be defined as one Leibnitz, so 10 s of fibrillation corresponds to a micro-Leibnitz problem.) Multiprocessor supercomputers running for a month can execute a micromole of floating point operations, but in the cardiac case such computers may run several orders of magnitude slower than real time, such that modeling 10 s of fibrillation might require 1 exaFLOP/s × year. (N.B., the Turing number may be defined as the ratio of the time required for a computation to the time interval being simulated. A computer and program that could pass the Turing test for artificial intelligence might have a Turing number approaching unity.) Hence, it is easy to imagine bit-by-bit simulations of biological phenomena that are Leibnitz- and Turing-class problems and would take multiple lifetimes on the largest supercomputer imaginable. Thus, not without irony, here the loop closes: If shortcuts and coarse-graining in the spirit of mathematical modeling, as discussed under Dimension 4, are not satisfactory, and von Neumann computers are not powerful enough for a bit-by-bit reenactment of reality, then experimentation, again, becomes the method of choice for studying the complexity of life. It is in this context that we recognize organisms as massively parallel digital/analog computers. For instance, despite formidable efforts to create in silico hearts (Noble 2002), rabbit hearts will, for a while to come, still be used to “compute” the response of a drug-loaded heart to a stimulation or defibrillation protocol. DNA base-pairing is being explored for use in a new type of nanocomputer (Braich et al. 2002). Here, computational biology ends and biological computing begins. If an advanced computer and program could pass a full-fledged Turing test or fully model the details of a biological system, then the computation might be too complicated to understand. Now that may be beyond systems biology.
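For orientation, the back-of-the-envelope arithmetic behind these units can be spelled out as follows (the sustained throughput and run time below are illustrative assumptions, not benchmarks of any specific machine).

```python
AVOGADRO = 6.022e23

# Problem size in "Leibnitz" units (one Leibnitz = Avogadro's number of coupled ODEs).
equations = 1e18          # coupled ODEs for ~10 s of fibrillation (Cherry et al. 2000)
print(f"problem size: {equations / AVOGADRO:.1e} Leibnitz (about a micro-Leibnitz)")

# A machine sustaining ~2e11 floating-point operations per second (an assumed figure)
# executes roughly a micromole of operations in one month:
flops_per_second = 2e11
seconds_per_month = 30 * 24 * 3600
ops_per_month = flops_per_second * seconds_per_month
print(f"operations per month: {ops_per_month:.1e} ({ops_per_month / AVOGADRO:.1e} mole)")

# Turing number: wall-clock time of the computation divided by the simulated interval.
simulated_interval = 10.0                 # seconds of fibrillation
wall_clock = 3 * 365 * 24 * 3600          # assume the run takes about three years
print(f"Turing number: {wall_clock / simulated_interval:.1e}")
```

On these assumed figures the Turing number is of order 10^7, i.e., the simulation runs some seven orders of magnitude slower than real time.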

Conclusion

We have not attempted to define systems biology here, nor do we claim that this review is comprehensive. We have focused on biology as an analytic science that seeks to understand the living organism as a complex entity, while leaving out discussions of areas of biology that represent synthetic science aimed at building useful systems, such as genetic, cell, and tissue engineering. (Of course, a synthetic approach can also contribute to understanding a class of systems.) A rigorous definition of systems biology would also have to identify what is really new and different from classical biophysics and mathematical or computational biology, and in particular, what distinguishes it from the previous waves of “anti-reductionism” in the past century that explicitly addressed the phenomenon of a “system” as such, including cybernetics (Ashby 1964; Wiener 1965), general systems theory (von Bertalanffy 1969), and the more recent complex systems sciences (Waldrop 1992; Bar-Yam 1997), whose decades-old manifestos sometimes read like the websites of modern initiatives in systems biology. Perhaps its novelty lies in the fact that systems biology is the first anti-reductionist movement that is broadly endorsed by mainstream experimental biologists and that emphasizes active experimentation in iteration with theory, rather than relying upon thinking about experimental observations made by others. There were times in biology when we had “theories but no data”; then molecular biology brought an era of “data but no theories”; perhaps we are now entering the promising age of “theories with data” (and vice versa).

Yet caution is warranted. Because of its apparent claim to encompass everything, systems biology may appear to be a biological discipline of everything. As we all know, a discipline of everything needs no name, and a theory that explains everything explains nothing. But we also need to avoid too narrow a view. Many initiatives in systems biology tend to describe their effort in terms of logistic novelty, namely, the multidisciplinary approach that integrates quantitative and computational methods with experimental biology, without embracing the concepts and addressing the challenges of complexity. Understanding “complexity” may not require a fundamentally novel theory at all (Horgan 1995), but it will certainly entail adopting a more encompassing and pluralistic mind-set than the one that sufficed for the characterization of proteins and pathways over the past decades. Since we think that the term “systems biology” is justified and useful, we have here tried to organize this new discipline into five dimensions, which reflect both the various approaches embodied by existing academic disciplines and the various aspects of complexity of the organism. The five dimensions should serve as an aid to systematize the immense diversity of approaches and research questions. They should not be understood as five mutually exclusive directions or sub-disciplines of systems biology, but as a characterization of approaches that we think represent the most natural components of both systems biology as a scientific endeavor and living organisms as complex systems. Thus, while not independent, the five idealized axes can be treated as quasi-orthogonal, so that any given research effort will, in reality, be a combination of contributions from all five dimensions. The true challenge for systems biology will be to bridge the diverse cultural and mental habits of scientists working along these different axes within one system, one project, and one question. It is in this context that we can view systems biology as an emergent discipline of biology, in every sense of that word.