Introduction

Cellular metabolism, being the sum of all chemical processes that support growth and reproduction, is a fundamental property of all living organisms. These processes are extensively interconnected by way of a complex system of chemical reactions that are often modeled as a network, where nodes symbolize metabolites and links represent metabolic enzymes (Jeong et al. 2000). Systems biologists and network theorists work under the assumption that analyzing metabolic networks—in particular, examining the connection between network shape and function—will advance our understanding of microevolution, disease, and biologic complexity (Alves et al. 2002; Oltvai and Barabasi 2002; Becker et al. 2006). Already, metabolic network topology has provided a novel context for mapping transcription regulation in Saccharomyces cerevisiae (Patil and Nielsen 2005) and predicting viability of mutant strains in Escherichia coli and S. cerevisiae (Wunderlich and Mirny 2006). There is clear functional relevance in network topology; however, it is difficult to generalize without knowledge of the extent and, equally important, the ecologic basis of interspecific variation.

Optimization of network performance results, at least in part, from a trade-off between two specific qualities: efficiency and robustness (Stelling et al. 2002). Efficient networks are most effective at carrying out a function with minimal input of resources; in this case “resources” refers to “metabolic enzymes.” Robust networks are less efficient but able to maintain functionality despite system perturbations. Although there is ample evidence that a trade-off between these qualities mediates optimal performance in human-engineered networks (Konsynski and Tiwana 2004; Meepetchdee and Shah 2007) and in networks constructed in silico (Venkatasubramanian et al. 2006; Meepetchdee and Shah 2007), whether these constraints are an important feature of biologic networks is a much more difficult question to answer. If we use a cartographic view of a biologic network, we can take a variety of topologic measurements that are theoretically related to efficiency and robustness. In this context, a maximally efficient network has lower average node degree, shorter average path length, and lower ratio of hub-to-nonhub nodes; a maximally robust network displays converse topologic features (Venkatasubramanian et al. 2006). Average node clustering coefficient, which indirectly measures link redundancy, and modularity have also been shown to have a strong positive relation with network robustness (Hartwell et al. 1999; Variano et al. 2004; Zhao et al. 2006; Holmgren 2006).

A recent large-scale survey of modularity showed that bacterial metabolic networks were more modular for bacteria from more variable environments (Parter et al. 2007). The results from Parter et al. (2007) are important for two reasons. First, they demonstrate the existence of substantial interspecific variation in modularity, an important feature of network cartography. Second, they illustrate the potential ecologic relevance of such variation. However, modularity is just one aspect of metabolic network cartography and is better viewed in conjunction with other measures of network topology to provide a complete picture of network efficiency and robustness. Furthermore, Parter et al. (2007) did not explicitly accommodate the influence of phylogeny on their statistical analyses, which is a critical component of any cross-species comparative analysis. Here, we report an expanded analysis based on a suite of different measures of network cartography (hereafter referred to as “network shape profile”) relevant to the trade-off between network efficiency and robustness. Furthermore, we employ a hierarchical mixed-effect statistical model to explicitly accommodate the influence of phylogenetic nonindependence among species in our data set.

Network Theory

Topologic measurement in network biology is dominated by a small set of shape indices: (1) average path length, (2) average node degree, (3) degree distribution (λ value), (4) clustering coefficient, and (5) modularity. Each of these indices conveys critical information about efficiency and robustness in network function. Average path length (APL) is measured as the average shortest distance between all possible pairs of nodes in a network, where the distance between two nodes is the number of edges on the shortest path between them. In a random network, i.e., a network that is produced by adding edges to randomly chosen pairs of nodes,

$$ APL_{\rm rand} \cong \frac{\ln n}{{\ln \overline{k} }}, $$
(1)

where n refers to the number of nodes in the network, and \( \bar{k} \) is the average node degree (Fronczak et al. 2004). Node degree is the number of connections (k) between a given node and other nodes in the network. In metabolic networks, smaller APL indicates that metabolic end products can be produced in a shorter average number of steps and, thus, with lower energetic input. Similarly, a low average node degree suggests that fewer cellular resources are dedicated to each network node (i.e., metabolite).
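As an illustration of Eq. 1, the following minimal sketch computes APL by breadth-first search and compares it with the random-network expectation. It assumes an undirected network stored as an adjacency dictionary and skips unreachable node pairs; both choices are assumptions made for this example and are not taken from the analysis reported here. The function names and the toy graph are hypothetical.

```python
import math
from collections import deque


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs.

    adj maps each node to an iterable of its neighbours (undirected graph).
    Unreachable pairs are ignored (an assumption of this sketch).
    """
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else float("nan")


def random_apl(n, mean_degree):
    """Expected APL of a random network, Eq. 1: ln(n) / ln(mean degree)."""
    return math.log(n) / math.log(mean_degree)


# Toy example: a four-node path a-b-c-d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
apl = average_path_length(path)
```

For the toy path graph, the six node pairs have distances 1, 2, 3, 1, 2, and 1, so the APL is 10/6.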

The degree distribution of a network is based on the pattern of variation in node degree (Barabasi and Albert 1999) and is defined as the probability that a randomly selected node will have k connections within the network. This distribution has been shown in a wide range of network types to decay as a power law \( P(k) \sim k^{-\lambda} \), where λ represents the slope of the decay. High values of λ indicate a steeper slope, i.e., a smaller proportion of high-degree nodes. Networks with high λ are more centered on a small number of hub nodes and are more likely to be fractioned if nodes are removed (Venkatasubramanian et al. 2006).

Clustering coefficient is measured as the average fraction of pairs of a node’s neighbours that are themselves connected (Watts and Strogatz 1998). Equation 2 defines clustering coefficient for a single node,

$$ CC_{i} = \frac{{2|E\left( {G\left( {v_{i} } \right)} \right)|}}{{k_{i} \left( {k}_{i} - 1 \right)}}, $$
(2)

where \( |E(G(v_{i}))| \) denotes the number of links among all neighbours of node i, and \( k_{i} \) is the degree of node i. If node i has a clustering coefficient of 1, all of its neighbour nodes are directly connected to one another. Node i therefore increases redundancy in the network by providing an alternate path between each of its neighbours. Average clustering coefficient for the network, \( \overline{CC} \), is defined by Eq. 3,

$$ \overline{CC} = \frac{1}{n}\sum\limits_{i = 1}^{n} {CC_{i} } $$
(3)

Only in a globally coupled network (where every node connects with every other node) would \( \overline{CC} \) = 1. In a random network,

$$ \overline{CC}_{\rm rand} \cong \frac{{\overline{k} - 1}}{n}, $$
(4)

where \( \overline{k} \) is the average node degree (Light and Kraulis 2004).
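Equations 2 through 4 can be sketched in a few lines. The example below assumes an undirected network stored as a dictionary of neighbour sets, and it assigns a clustering coefficient of 0 to nodes with fewer than two neighbours; that convention is an assumption of this sketch, because Eq. 2 is undefined in that case. Function names and the toy graph are hypothetical.

```python
def clustering_coefficient(adj, node):
    """Eq. 2: fraction of neighbour pairs of `node` that are themselves linked."""
    neighbours = list(adj[node])
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention; Eq. 2 is undefined for k < 2
    links = sum(
        1
        for i, u in enumerate(neighbours)
        for v in neighbours[i + 1:]
        if v in adj[u]
    )
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Eq. 3: mean of the per-node clustering coefficients."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def random_clustering(n, mean_degree):
    """Eq. 4: expected average clustering coefficient of a random network."""
    return (mean_degree - 1) / n


# Toy example: triangle a-b-c with a pendant node d attached to c.
g = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cc_c = clustering_coefficient(g, "c")
cc_mean = average_clustering(g)
```

In the toy graph, node c has three neighbours with one link among them, so its coefficient is 1/3; the network average is (1 + 1 + 1/3 + 0)/4 = 7/12.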

The last measure, modularity, is unlike the other network indices because it is an optimality criterion that is maximized for a given network. Here, a module is defined as a semiautonomous group of nodes wherein the number of connections within the module outnumbers connections to other modules (Guimera and Nunes-Amaral 2005), and the degree to which a network can be subdivided in this way defines its modularity. For a given partition of a network, modularity = M,

$$ M \equiv \sum\limits_{s = 1}^{{N_{M} }} {\left[ {\frac{{l_{s} }}{L} - \left( {\frac{{d_{s} }}{2L}} \right)^{2} } \right]} , $$
(5)

where \( N_{M} \) is the number of modules, L is the number of links in the network, \( l_{s} \) is the number of links between nodes in module s, and \( d_{s} \) is the sum of the degrees of the nodes in module s (Raff and Raff 2000; Variano et al. 2004). The goal of module-determining algorithms is to uncover the configuration that maximizes modularity in the network. Networks with high modularity have well-defined modules with a low proportion of between-module links. In such networks, any “damage” incurred through random node failure is more likely to occur within a module (rather than between modules) and to be contained within that module rather than fractioning the entire network (Raff and Raff 2000; Variano et al. 2004; Griswold 2006).
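To make Eq. 5 concrete, the sketch below evaluates M for one fixed partition of a small undirected network. It only scores a given partition; it does not search for the partition that maximizes M (the simulated-annealing step is not reproduced here). The edge-list representation, the function name, and the toy network are assumptions of this example.

```python
def modularity(edges, module_of):
    """Eq. 5 for a fixed partition.

    edges      list of (u, v) pairs, one entry per undirected link
    module_of  dict mapping each node to its module label
    """
    L = len(edges)
    within = {}      # l_s: links with both ends in module s
    degree_sum = {}  # d_s: summed degree of the nodes in module s
    for u, v in edges:
        mu, mv = module_of[u], module_of[v]
        degree_sum[mu] = degree_sum.get(mu, 0) + 1
        degree_sum[mv] = degree_sum.get(mv, 0) + 1
        if mu == mv:
            within[mu] = within.get(mu, 0) + 1
    return sum(
        within.get(s, 0) / L - (d / (2 * L)) ** 2
        for s, d in degree_sum.items()
    )


# Toy example: two triangles joined by a single link c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_modules = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_module = {node: 1 for node in two_modules}
m_two = modularity(edges, two_modules)
m_one = modularity(edges, one_module)
```

With L = 7, each triangle has three internal links and a summed degree of 7, so the two-module partition gives M = 2(3/7 − 1/4) = 5/14, whereas placing all nodes in a single module gives M = 0.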

The relation between network shape and function is supported in simulated networks and applied in human-engineered networks (Konsynski and Tiwana 2004; Meepetchdee and Shah 2007). Highly efficient networks tend to have short APL, small average node degree and degree distribution (low ratio of hubs to nonhubs), and low level of clustering coefficient and modularity. Finding balance between efficiency and robustness is a matter of practical concern in man-made networks because highly efficient networks are more cost-effective but exhibit decreased error tolerance. In this article we address the question of whether real metabolic networks exhibit such trade-offs, and if they are associated with the ecology of the organism.

Materials and Methods

Metabolomic Data

Metabolic connectivity lists were extracted from the metabolomic data compiled by Ma and Zeng (2003). This database contains manually curated metabolic data from the Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/). Metabolic-reaction lists for 105 species of bacteria were extracted from the Ma and Zeng (2003) database and visualized by using Pajek network analysis software (Batagelj and Mrvar 2003).

Ecologic Data

Ecologic characteristics for each species were defined in terms of five indices: niche breadth, obligate endosymbiosis, host association, host restriction, and pathogenicity. Status of host association was obtained from the National Center for Biotechnology Information (NCBI) microbial genome database (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). The rest were obtained from the Genome Bank database (http://www.genomics.ceh.ac.uk/cgi-bin/gmine/gminemenu.cgi). Niche breadth is a composite characteristic that can be viewed as a proxy for both the diversity and fluctuation of metabolites under which an organism’s metabolism must function. The index ranges from 1 to 5; a species that inhabits a narrow and stable environment has a niche breadth score of 1, whereas a score of 5 is ascribed to one found in a highly complex and dynamic environment (Supplementary Table S1) (http://www.genomics.ceh.ac.uk/gmine/genomebankbacterialinfo.html; see Web site entry on “environmental breadth”). For comparison, we also employ the number of input metabolites for a given metabolic network as an alternative measure of diversity and fluctuation of metabolic substrates that a species is capable of using. This is a quantitative rather than ordinal measure of niche breadth, and it is computed by determining the number of metabolite nodes consumed (i.e., as enzyme substrates) but not produced (as metabolic reaction products). The remaining ecologic characteristics are categoric variables, with each species having a score of 0 or 1. Note that host association, host restriction, and obligate endosymbiosis have logical associations with niche breadth because each is related to some aspect of habitat complexity.
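The count of input metabolites described above amounts to a set difference. The following minimal sketch assumes each reaction is given as a pair of substrate and product lists and treats every reaction as irreversible; the handling of reversible reactions is not specified here, so that simplification is an assumption. The function name and the toy reaction list are hypothetical.

```python
def input_metabolites(reactions):
    """Metabolites that are consumed by some reaction but produced by none.

    reactions is a list of (substrates, products) pairs, each an iterable
    of metabolite names. Reactions are treated as irreversible (assumed).
    """
    consumed, produced = set(), set()
    for substrates, products in reactions:
        consumed.update(substrates)
        produced.update(products)
    return consumed - produced


# Toy reaction list (illustrative only, not taken from the data set).
toy_reactions = [
    (["glucose", "atp"], ["g6p", "adp"]),
    (["g6p"], ["f6p"]),
    (["f6p", "atp"], ["fbp", "adp"]),
]
inputs = input_metabolites(toy_reactions)
n_inputs = len(inputs)
```

In the toy list, glucose and atp are consumed but never produced, so the number of input metabolites is 2.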

Construction and Topologic Assessment of Metabolic Networks

Metabolic networks are reconstructed from the metabolite connectivity lists extracted from the Ma and Zeng (2003) database. We compute the APL, clustering coefficient, degree distribution, and average node degree for all 105 networks by using the Pajek (Batagelj and Mrvar 2003) software package. Modularity is determined using a program provided by Guimera et al. that employs simulated annealing as a heuristic algorithm to determine the network conformation that maximizes modularity (Guimera et al. 2004; Guimera and Nunes Amaral 2005). We note that simulated annealing is a stochastic optimization method (Kirkpatrick et al. 1983) that appears to be most successful in maximizing modular structure for a biologic network (Guimera and Nunes Amaral 2005). Using this method, it has also been shown that modularity increases with increasing average node degree (Guimera et al. 2004). Therefore, when comparing modularity between networks, we calculate modularity for multiple randomizations of a network and normalize modularity by averaging these values.

Hierarchical Mixed-Effect Regression Model to Accommodate the Influence of Phylogenetic Nonindependence

We use a regression framework to explore how ecologic characteristics of a lineage of bacteria are related to the shape of its metabolic network. However, because the bacteria are related by evolutionary history, their traits cannot be treated as independent observations. Statistical accommodation of character correlations caused by shared evolutionary history is traditionally handled by using a set of techniques that are collectively referred to as “phylogenetic comparative methods” (PCMs). Under these models, it is assumed that longer branch length between a pair of species implies greater phylogenetic and thus greater statistical independence. Such species are given greater weight in a PCM, whereas closely related species are downweighted in the model (see Supplementary Material for further discussion of these methods). The problem with applying a typical PCM to the present data set is that there is a degree of correlation between phylogenetic branch length and niche breadth. This correlation is thought to arise from two sources. First, species with narrow habitats tend to have smaller effective population sizes, thereby increasing the rate of fixation of mutations (Woolfit and Bromham 2003). Second, prokaryotic species in narrow environments tend to undergo genome reduction and lose genes that code for DNA repair enzymes, which further accelerates rate of nucleotide substitution (Dale et al. 2003). This correlation between branch length and niche breadth introduces a systematic bias wherein species with narrow niche breadth are consistently upweighted in the PCM, which can lead to inflated type I errors. However, we cannot ignore phylogeny either because treating lineages as completely independent also can lead to inflated errors.

Here, we propose a new PCM for cases where phylogenetic branch length and model covariates are not expected to be independent. The new method is based on linear mixed-effect (LME) regression models (Pinheiro and Bates 2000), which capture the hierarchical structure of phylogenetic history in a set of nested random effects but do not require use of a highly resolved phylogeny with branch length information. This random component of our LME model allows random variation in intercept between the nested phylogenetic groups, and correlation structure for errors within these groupings. The random effects structure of the LME is determined by the data in hand by way of a hierarchical set of likelihood ratio tests (LRTs). The LRT testing follows a backward-elimination procedure based on the topology of the original phylogeny.

We employed a phylogenetic tree that represented a majority-rule consensus topology from the literature and then calculated branch lengths with a maximum-likelihood–based analysis of 4 highly conserved ribosomal protein sequences (Yang 1997; Mollet et al. 1998; Daubin et al. 2002; Wolf et al. 2002, 2004; Lerat et al. 2003; Brown and Volker 2004; Canback et al. 2004; Santos and Ochman 2004; Belda et al. 2005; Bern and Goldberg 2005; Henz et al. 2005; Kunin et al. 2005; Zhao et al. 2005; Chan et al. 2006; Ciccarelli et al. 2006; Fitzpatrick et al. 2006). Random-effects structure was determined for each network index by dividing this tree into 21 phylogenetic groups and then performing a backward-elimination procedure to collapse all groups that did not have significantly distinct random intercepts. For each group, likelihood was calculated for an LME model with and without the group included, and a likelihood ratio test was carried out (on a 50:50 mixture of \( \chi_{0}^{2} \) and \( \chi_{1}^{2} \)) to determine if the grouping significantly improved the model fit (α = 0.05) (Self and Liang 1989). Once a random-effects structure was defined for each network statistic, ecologic features were assessed individually for significant associations with each measure of network shape. Additional details of the procedure, as well as a graphic representation of the phylogenetic groups, are provided as Supplementary Material (Supplementary Fig. S1; Supplementary Notes).
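The p value for a likelihood ratio statistic referred to a 50:50 mixture of \( \chi_{0}^{2} \) and \( \chi_{1}^{2} \) can be sketched as follows. This is an illustration of the mixture calculation only, not the fitting code used in this study: the two log-likelihoods are assumed to come from LME fits with and without the grouping, and the function name is hypothetical. The \( \chi_{1}^{2} \) tail probability is obtained from the identity \( P(\chi_{1}^{2} > x) = \mathrm{erfc}(\sqrt{x/2}) \).

```python
import math


def mixture_lrt_pvalue(loglik_full, loglik_reduced):
    """P value of an LRT under a 50:50 mixture of chi2_0 and chi2_1.

    loglik_full     log-likelihood of the model that includes the grouping
    loglik_reduced  log-likelihood of the model without it
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    if stat == 0.0:
        return 1.0  # the chi2_0 component is a point mass at zero
    chi2_1_tail = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    return 0.5 * chi2_1_tail  # chi2_0 contributes nothing for stat > 0


# Hypothetical log-likelihoods, chosen so the statistic is about 2.7055.
p_value = mixture_lrt_pvalue(0.0, -1.3527717)
```

Halving the \( \chi_{1}^{2} \) tail means the 5% critical value of the statistic is approximately 2.71 rather than 3.84, so a grouping is retained more readily than under a naive \( \chi_{1}^{2} \) reference distribution.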

For comparison, we also analyze the same data under a classical PCM using a generalized estimating equation (GEE) framework, which is among the most commonly used statistical approaches in phylogenetic comparative modelling (Paradis and Claude 2002). As with typical PCMs, the phylogenetic tree structure is used to define a variance–covariance matrix, which is then applied to determine the weight given to each observation (i.e., species). Further explanation of this analytic framework is presented in the Supplementary Material.

Results and Discussion

Cartography of Metabolic Networks Reflects a Trade-Off Between Efficiency and Robustness

Using publicly available metabolomic data, we reconstructed metabolic networks of 105 bacterial lineages from 8 distinct phyla and a wide range of ecologic lifestyles. Based on these networks, we created a network shape profile for each lineage that includes measurements of APL, average node degree, exponent of power-law distribution of degrees (λ; approximately measures the ratio of hub-to-nonhub nodes), clustering coefficient, and modularity. Figure 1 shows the substantial natural diversity of each network shape index. Note that the indices in Fig. 1 are standardized to aid comparison (the original scores are presented in the Supplementary Material). Variation in normalised APL (mean 0.9292; SD 0.2738), node degree (mean 1.851; SD 0.129), and normalised clustering coefficient (mean 19.987; SD 7.291) is particularly dramatic. For example, clustering coefficient ranges from a maximum of 34.363 in E. coli (strain O157 EDL933), to a minimum of 1.243 in Ureaplasma urealyticum. Because these values have been normalized for network size, they indicate that metabolic network redundancy (in terms of alternate paths between neighbouring nodes) can be quite variable between species.

Fig. 1

Observed variation in network index values for 105 bacterial lineages. For full species names see Supplementary Table S2. Colours of the branches indicate niche breadth score for each lineage: red = 1; green = 2; aqua = 3; fuchsia = 4; and dark blue = 5. For each lineage, the score for the five different measures of network shape are shown directly below its branch of the tree; scores are standardized to 1 for comparative purposes. Open circle average path length; inverted triangle average node degree, square λ value (exponent of degree distribution); solid circle clustering coefficient, triangle normalized modularity. Values are standardized by the maximum value of each index

To illustrate this variation on a system level, Fig. 2a and b show the striking topologic differences in the metabolic networks of two bacterial species, Buchnera aphidicola and Pseudomonas aeruginosa. Based on their shape profiles, the P. aeruginosa network should have higher robustness and lower efficiency compared with B. aphidicola. With short APL and low node degree, the B. aphidicola network has a more efficient topology. However, the low redundancy of links and rarity of hubs yields a network expected to be more subject to fragmentation in the face of system perturbations, i.e., it will be less robust. P. aeruginosa has a more densely connected network with higher average node degree, longer APL, and a higher ratio of hubs to nonhubs; also, this network is more clustered and modular (see Supplementary Table S2). Both are species of γ-proteobacteria; however, B. aphidicola is a nonpathogenic obligate endosymbiont, whereas P. aeruginosa is a pathogen that can be found in a variety of terrestrial and aquatic habitats. Although these examples are consistent with an association between niche breadth and network topology, a more systematic approach is required to determine if such an association holds over our broad sample of bacterial diversity.

Fig. 2

Assessment and illustration of the trade-off between efficiency and robustness in bacterial metabolic networks. a and b Representations of metabolic networks were obtained by using Pajek software, with nodes symbolizing metabolites and links representing metabolic enzymes. Blue nodes represent the largest connected cluster of nodes, known as the “giant strong component” (GSC). Red nodes are not part of the GSC and are arranged around the periphery of the GSC for convenience. The metabolic network of B. aphidicola (a) contains 306 metabolites and 262 links. This network cartography favours efficiency over robustness, with metabolic end products of the GSC being produced in a small number of steps. The network of P. aeruginosa (b) contains 750 metabolites and 751 links. This network cartography favours robustness over efficiency, with a high amount of edge redundancy within the GSC. c The upper right off-diagonal shows all pairwise comparisons among 105 bacterial species for 5 different indices of network shape. The diagonal indicates, for each individual network index, its expected relation to both efficiency and robustness. nAPL normalized average path length, \( \overline{k} \) average node degree, −λ exponent of degree distribution, nCC normalized clustering coefficient, nMod normalized modularity. Figures in the lower left off-diagonal represent \( R^{2} \) and p to illustrate the strength of correlation between each pair of network indices

To explore the relation between efficiency and robustness of metabolic networks, we plot all possible comparisons of the network statistics included in our shape profile (Fig. 2c). The data for APL, clustering coefficient, and modularity are normalized by the expected values for a randomly connected network with the same number and average degree of nodes. This normalization is important because these indices are, to an extent, mathematically dependent on network density; i.e., if the number of edges in a given network was doubled without changing the number of nodes, APL and modularity would decrease, whereas average clustering coefficient would increase. The schematic plots along the diagonal of Fig. 2c show the relation between each network statistic, the properties of efficiency and robustness, and the Pearson correlation coefficient for each pair. Figure 2c shows that after normalization for network size and node degree, all measures of metabolic network shape exhibit evidence of positive correlation. We note, however, that the measure of degree distribution (λ) exhibits a generally weaker correlation with the rest of the indices. This indicates that the proportion of hubs to nonhubs does not strongly covary with other measures of network shape. Nonetheless, there is a positive correlation across all measures, and this is expected if the network topographies tend to reflect a consistent trade-off between efficiency and robustness (Venkatasubramanian et al. 2006). Indeed, for many of the network statistics the correlation is strong (i.e., >0.8 correlation coefficient). Based on these results, it appears that bacterial metabolic networks with short APL also tend to have small average node degree and low clustering and modularity, all of which are expected to promote efficiency over robustness. Although it is theoretically possible for a network to be highly efficient in terms of some indices and highly robust for others (Albert and Barabasi 2002), this is not the case in the bacterial metabolic networks examined here.
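The normalization of APL and clustering coefficient can be sketched with the random-network expectations of Eqs. 1 and 4. The example assumes that "normalized by" means division of the observed value by the expectation, which is an interpretation made for this sketch; the function names and the numbers used are hypothetical and are not taken from the data set.

```python
import math


def normalized_apl(apl, n, mean_degree):
    """Observed APL divided by the random expectation of Eq. 1 (assumed ratio)."""
    return apl / (math.log(n) / math.log(mean_degree))


def normalized_cc(cc, n, mean_degree):
    """Observed average clustering divided by the expectation of Eq. 4."""
    return cc / ((mean_degree - 1) / n)


# Hypothetical network: 500 nodes, mean degree 2.0,
# observed APL 8.0 and observed average clustering coefficient 0.04.
n_apl = normalized_apl(8.0, n=500, mean_degree=2.0)
n_cc = normalized_cc(0.04, n=500, mean_degree=2.0)
```

Under this reading, a normalized value near 1 indicates a network that is indistinguishable from a random network of the same size and density for that index, and the hypothetical network above would be 20 times more clustered than its random counterpart.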

Biologic Conclusions Are Sensitive to the Statistical Framework of a Phylogenetic Comparative Method

Simulation-based in silico studies of the origin of nonrandom structure in networks suggest that trade-offs between efficiency and robustness can reflect environmental factors (Venkatasubramanian et al. 2006), e.g., networks that must function in the face of frequent environmental perturbation, or over a wider variety of conditions, will tend to have a more robust structure. We use PCMs to explore the potential for such a relation in biologic networks by assessing whether ecologic characteristics are strong predictors of metabolic network topography. Under the LME framework, we incorporate the underlying phylogenetic structure of the data into our model by way of the random component. Under the GEE framework, the topology and branch lengths of the phylogeny are used to determine a correlation matrix for the regression model. The fixed component of each model comprises the number of input metabolites for each network as well as the five ecologic characteristics for each of the sampled lineages in our data set: niche breadth, host association, host restriction, obligate endosymbiosis, and pathogenicity.

The LME regressions indicate that certain ecologic characteristics of an organism are predictors of the shape of its metabolic network. We observe highly significant relations between network shape and niche breadth, number of metabolite inputs, obligate endosymbiosis, host restriction, and host association but not pathogenicity (Table 1). The GEE-based regression results, however, indicate different relations. GEE results suggest a significant relation between network shape and pathogenicity and significant correlation between some measures of network shape and obligate endosymbiosis, host restriction, and host association. In addition, for some regressions between network shape indices and niche breadth and number of metabolite inputs, GEE results indicate significant relations in the opposite direction of those indicated by the LME results (Table 1). Using APL and niche breadth as an example, the LME regression indicates a positive relation (p < 1e−16), whereas GEE regression indicates a negative relation (p < 1e−14; Fig. 3). Because both results are significant, the difference cannot be attributed to sampling errors; at least one case is an analytic artifact. Regardless of the origin of the discrepancy, biologic conclusions for these data are sensitive to the assumptions of the PCM.

Table 1 Fixed-effect coefficients for linear mixed effect and generalized estimating equation regressions
Fig. 3

Relation between APL and niche breadth and distribution of weights in the GEE model. Colours indicate the weight of each observation in the GEE fit. Data points in blue receive higher weight (i.e., have greater influence on estimation of regression coefficients), whereas those in yellow are downweighted. Solid line GEE fit, dashed line LME fit, nAPL normalized average path length, NB niche breadth

The GEE regression uses a Brownian motion model whereby branch lengths determine the degree of independence among observations (Martins and Hansen 1997). Specifically, branch lengths determine the values of a variance–covariance matrix that are used to weight the observations from each species. A relation between branch lengths and the response variables would lead to an inappropriately formulated matrix and thereby introduce a systematic error into the weighting scheme. Figure 3 illustrates that the difference between the LME and GEE regression is indeed a consequence of the weighting scheme; a subset of observations are strongly weighted (in blue), and it appears that the result of the GEE regression is strongly influenced by these observations. We conducted a comprehensive sensitivity analysis of the two regression methods and found that the GEE results were sensitive to the weighting scheme of the variance–covariance matrix (see Supplementary Material; Supplementary Fig. S2). Given the sensitivity of the results under the GEE approach to the structure of the variance–covariance matrix, and a clear gap between the assumptions of the Brownian motion model and the branch length data, we conclude that the GEE-based regression results are unreliable in this setting.

The LME regression model accommodates phylogenetic structure as a set of nested random effects. Moreover, because it does not rely on branch length data, it is appropriate for modeling data with correlation between model covariates and branch length and also in cases where a phylogenetic tree cannot be confidently resolved. This is particularly important in studies of prokaryotes, where frequent lateral gene transfer (LGT) causes significant heterogeneity in phylogenetic signal among genes. The LME approach only requires that the structure of the nested random effects is correct, whereas the GEE approach requires that a fully resolved organism history, without errors, is known a priori. The LME regression is clearly more appropriate for these data and indicates that the shape of a metabolic network, as it relates to the trade-off between efficiency and robustness, is associated with habitat complexity.

Trade-offs Between Efficiency and Robustness in Bacterial Metabolic Networks Are Associated with Niche Breadth

We have shown that the structure of a metabolic network varies greatly among prokaryotes and that it is associated with several measures of habitat complexity. To investigate the robustness of this finding, we reanalyzed network cartography with respect to the number of input metabolites. This provides an alternative measure of the way an organism uses the metabolic complexity of its environment because it is a function of the diversity of metabolic substrates that a species is capable of metabolizing. An LME regression using number of input metabolites (rather than niche breadth) yielded the same qualitative results: a highly significant positive relation between network shape and number of input metabolites (Table 1). Thus, the relation is robust to these measurements of habitat complexity. We propose that niche breadth is the most useful predictor of network shape because it provides a simple yet encompassing measure of environmental complexity.

Figure 4 illustrates how network cartography changes as a function of niche breadth. In species with low niche breadth, topologic features conferring efficiency are prominent, whereas those that impart robustness are weaker. Short APL in such networks indicates that metabolic end products can generally be produced in a smaller number of steps and, thus, with less energetic input (i.e., cellular resources). This has been previously reported for genomes of intracellular species that have lost genes associated with processing metabolites that can be consistently acquired from the host (Batagelj and Mrvar 2003). Estimates of λ indicate that these networks tend to be centered on just a small number of hub nodes. Furthermore, lower clustering coefficients indicate lower redundancy within the network. Although highly efficient, such networks are susceptible to fragmentation if a hub node (i.e., metabolite) becomes unavailable. Network robustness, by way of redundancy, is more costly to maintain and may be an energetic liability in habitats with a narrow and stable collection of metabolites, such as the intracellular or host-restricted environment.

Fig. 4

Relation between the cartography of a metabolic network and niche breadth. All five indices of network shape are significantly associated with niche breadth (see Table 1). Box-and-whisker plots are used to show the distribution of each shape statistic according to niche-breadth score. nAPL normalized average path length, \( \overline{k} \) average node degree, −λ exponent of degree distribution, nCC normalized clustering coefficient, nMod normalized modularity

Species inhabiting highly complex and dynamic environments, i.e., large niche breadth, have larger average node degree, indicating greater energetic input into cellular metabolism (see Fig. 4 and the Supplementary Material). Although less efficient, the networks of such species are more robust through greater edge redundancy, a feature that is considered critical for survival in conditions with fluctuating metabolite availability. Indeed, species with wide environmental breadth have been shown to respond to environmental changes in metabolite availability through activation of alternate metabolic pathways (Almaas et al. 2005). Even in cases where alternative pathways are not available, changes in metabolite availability will have less impact on overall metabolism, compared with species with low niche breadth, because their networks tend to be more modular, i.e., they tend to have greater independence between clusters of nodes.

Variation in network shape is greatest in niche breadth scores 1 through 3 and more subtle in scores 3 through 5. This may reflect more dramatic lifestyle variation between species with narrow habitats, or it may indicate that the definitions of niche breadth require finer clarification. Future efforts to characterize habitat complexity may benefit from using a semiquantitative approach, by incorporating both ordinal and quantitative indices. The index describing number of metabolite inputs provides a useful new perspective on species ecology because it describes metabolic plasticity in quantitative terms.

Despite recent developments, science is a long way from a full understanding of the origins of biologic complexity. The tools of network analysis seem to offer great promise for advancing such studies; however, in the rush to apply these tools to questions of complexity, there has been little critical assessment of the scope of network shape diversity or its potential ecologic relevance (Lynch 2007). With this work we have taken some first steps in that direction. We have established that the natural diversity of metabolic cartography is extensive and covaries with features of ecology that have clear metabolic significance. The positive association between network robustness and niche breadth suggests that network cartography can be viewed as a complex phenotype with potentially adaptive qualities. The next challenge is to develop a rigid null model for the evolution of metabolic structure by neutral processes. Such a model could serve as the basis for an explicit test of the role of natural selection in the origins of network robustness and efficiency.