Introduction

Growth is a fundamental property of microbial communities that underpins myriad ecosystem functions1,2. For example, increased microbial growth comes at the cost of microbial resource acquisition (e.g., exoenzyme production)3, thereby influencing decomposition rates of soil carbon (C) sources. Similarly, microbial partitioning of assimilated C resources into growth vs. other processes (e.g., respiration) is a key driver of soil organic matter formation4,5. These tradeoffs highlight key feedbacks among microbial communities, soil C stocks, and global climate, as the decomposition and eventual fate of soil C sources (i.e., storage vs. respiration as CO2) is ultimately dependent upon microbial growth dynamics6. Therefore, to better understand and predict variation in ecosystem C cycling and storage, an improved understanding of the patterns and drivers of microbial growth potential is needed.

Microbial growth is frequently measured in ecological studies using isotope tracing methods, often to quantify microbial C use efficiency4 or to elucidate growth rates of specific microbial groups7,8. Studies employing these methods in soil have demonstrated that microbial growth rates vary among bacterial taxa9, with soil depth10, among ecosystem types11, and in response to many anthropogenic environmental changes, including land use change12, drought13, warming13,14, and fertilization15. Isotopic methods have the advantage of providing direct measurements of actual in situ microbial growth yield, though the addition of water or exogenous substrates biases isotopic growth estimates4. In addition, no study (likely due to low throughput) has applied these methods at the spatial scale required to elucidate macroecological patterns in microbial growth.

Another approach to estimating microbial growth rates involves the use of genomic or metagenomic sequence data16,17,18. Metagenomic approaches to estimating microbial growth potential, along with other community-aggregated traits19,20, are promising given the increasing prevalence of publicly available sequence data collected from different ecosystem types across the globe21. An additional advantage of genomic/metagenomic growth estimates is that they do not suffer from biases introduced by substrate or water addition. Estimates of maximum bacterial growth rates from genomes and metagenomes are based on the observation that bacterial growth rates are best predicted by the degree of codon usage bias present within a particular organism or community16. This is because rapidly-growing bacterial taxa have undergone selection for highly biased codon usage in their highly-expressed genes (e.g., ribosomal protein genes) such that the codons match their cellular tRNA pool, thus enabling rapid production of cellular components required for growth (e.g., ribosomes)16. Estimates of maximum bacterial growth rates from codon usage statistics have been shown to be highly accurate for both environmental and host-associated bacterial communities17,18,22. Surprisingly, however, these metagenomic growth potential estimates have only very rarely been used for soil bacteria, though one prior study did show that maximum growth rates were higher for soil bacteria in vegetated soils compared with bare soils in an arid ecosystem23. At the same time, another found no differences in growth rates among Chinese forest soils with different pH24.

Though estimating microbial growth rates is possible using metagenomic methods, we know of no studies that rigorously assess global macroecological patterns in soil bacterial growth potential. This contrasts with other aspects of soil bacterial communities, e.g., taxonomic diversity and functional gene composition, for which some macroecological studies have emerged in recent years21,25,26. These studies have demonstrated that soil bacteria do not exhibit the traditional latitudinal diversity gradients that are typical of macro-organisms (e.g., plants)21,25 and that global patterns in the functional attributes of bacterial communities (e.g., antibiotic resistance gene relative abundances and C acquisition gene relative abundances) are strongly associated with ecosystem aridity21,26,27. Similarly, this study aimed to explore macroecological patterns and environmental drivers of soil bacterial growth potential. We also sought to identify global-scale relationships between growth potential and key bacterial C cycle functions. Assessing these relationships is critical because they represent potential mechanistic links between microbial life history and soil C cycling and because the current prevailing theoretical frameworks in microbial ecology predict them. For example, the Y-A-S (i.e., growth yield vs. resource acquisition vs. stress tolerance) framework predicts that the dominant life strategies of microorganisms are subject to tradeoffs. Thus, optimization of growth potential should come at the cost of C resource acquisition potential28. Empirical evidence of these tradeoffs exists for some ecosystems3, though whether or not these tradeoffs manifest at macroecological scales is unknown.

In this study, we use a global dataset of soil metagenomes21 to estimate maximum growth rates for each community using a recently developed growth model based on codon usage statistics18. As an additional metric of growth potential, we estimate the community-averaged 16S rRNA operon number for each metagenome, which has also been shown to be a strong indicator of growth potential in bacteria29. We hypothesize that the global patterns in bacterial growth potential will reflect ecosystem productivity, as highly productive ecosystems should exhibit soil conditions that promote bacterial growth, e.g., high soil moisture and organic matter content. In addition, in accordance with the Y-A-S framework, we predict that bacterial growth potential is negatively correlated with the relative abundances of C cycle resource acquisition genes on a global scale. In this work, we show that maximum growth rates of soil bacterial communities exhibit clear relationships with several environmental variables, including positive associations with forest cover and ecosystem net primary productivity. We also show that bacterial growth potential is negatively associated with C acquisition gene relative abundances, which is in line with the prevailing conceptual frameworks in microbial ecology. Overall, our work demonstrates that bacterial growth is a key physiological mechanism that drives ecosystem C cycling at a global scale.

Results and discussion

Growth potential of soil bacteria across biomes

We used codon usage statistics to estimate bacterial growth potential in 176 soil metagenomes sampled from 11 terrestrial biomes across six continents (Supplementary Fig. 1). Estimated maximum growth rates exhibited significant variation across biomes, with latitude explaining ~15% of the variation in maximum growth rates (Fig. 1A). The best-supported regression model for maximum growth rates as a function of latitude was quadratic, with higher maximum rates observed at tropical and temperate/boreal latitudes and lower maximum rates in more arid subtropical latitudes (Fig. 1A). Interestingly, a prior analysis of these same metagenomes showed that bacterial taxonomic and functional diversity exhibited opposite latitudinal patterns compared to the patterns for maximum growth rates, i.e., taxonomic and functional diversity were higher in mid-latitudes21. This suggests that environmental selection for rapid microbial growth results in a less diverse set of highly competitive taxa that have a more streamlined set of functions. These global patterns are distinct from those observed in plant communities; unlike plants, high growth potential in bacteria is not associated with higher taxonomic diversity. This complements prior studies demonstrating that macroecological patterns in microbial community characteristics are different from the traditional latitudinal gradients seen in macro-organisms21,25.

Fig. 1: Global patterns in the growth potential of soil bacterial communities.

Panel (A) shows the best-supported polynomial regression function (quadratic) for growth rates as a function of latitude (F = 16.1, p < 0.001). Differences in maximum growth rates between forested and non-forested biomes are shown in (B), with asterisks indicating significantly higher maximum growth rates (F = 11.2, p < 0.001). The box plots in (B) show group medians and interquartile ranges with diamond symbols indicating group means and whiskers representing the upper and lower quartiles plus or minus 1.5 times the interquartile range. The analyses shown in (A) and (B) were one-tailed (F-tests). Also shown are Pearson correlations between maximum growth rates and average 16S rRNA gene copy number (C) and average genome size (D). The correlation analyses in (C) and (D) were two-tailed. The shaded areas in (A), (B), and (C) represent the 95% confidence interval around the regression line. For all the statistical analyses shown, n = 176. Source data are available as a Source Data file and on figshare: https://doi.org/10.6084/m9.figshare.24498484.v3.
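For readers who want to see what a quadratic regression of growth rate on latitude involves, the following is a minimal sketch in plain Python. It assumes an ordinary least-squares fit of y = b0 + b1*x + b2*x^2 solved through the normal equations; the function names (`fit_quadratic`, `r_squared`) are hypothetical and this is not the code used in the study, which is available in the figshare repository.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # Power sums of x (orders 0..4) and cross sums with y (orders 0..2).
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    # Back substitution for the coefficients b0, b1, b2.
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(a[i][j] * b[j] for j in range(i + 1, 3))
        b[i] = (a[i][3] - rest) / a[i][i]
    return b


def r_squared(x, y, b):
    """Fraction of the variance in y explained by the fitted quadratic."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum(
        (yi - (b[0] + b[1] * xi + b[2] * xi ** 2)) ** 2 for xi, yi in zip(x, y)
    )
    return 1.0 - ss_res / ss_tot
```

In this sketch, `x` would hold the sample latitudes and `y` the estimated maximum growth rates; the returned R² plays the role of the share of variation explained by latitude.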

Based on the latitudinal patterns, it appeared that maximum growth rates were generally higher in forested biomes. To test this, we categorized the biomes as forested vs. non-forested and indeed observed ~12% higher maximum growth rates in forests (Fig. 1B). We observed the same latitudinal patterns and the same effects of forest cover in every iteration of the growth models we ran: in predictions from assembled contigs (Fig. 1), from gene fragments (Supplementary Fig. 2), and in a GC-corrected version of the model (Supplementary Fig. 3) (see “Methods” section). However, this dataset has a limited number of samples from non-forested environments from which to draw firm conclusions. Therefore, we repeated the same growth rate analysis in metagenomes from 42 National Ecological Observatory Network (NEON) sites located across the United States. In the NEON dataset, the effect of forest cover was even more pronounced, with ~31% higher predicted maximum growth rates in forested sites (Supplementary Fig. 4). These results allow us to place global soil bacterial communities within the copiotroph vs. oligotroph life history framework17, where rapidly-growing communities from forested environments can be considered more copiotrophic while slower-growing communities from non-forests are more oligotrophic.

For additional validation, we also assessed community-averaged 16S rRNA gene copy numbers, as organisms adapted to rapid growth are known to have elevated rRNA gene copy numbers in order to boost cellular ribosome numbers needed for replication29. As expected, average 16S rRNA gene copy numbers were significantly positively correlated with maximum growth rates (Fig. 1C). Maximum growth rates were also weakly positively correlated with average genome sizes (Fig. 1D). This contrasts with multiple prior studies that showed no relationship between growth and genome size16,30. This suggests that while higher maximum growth rates and larger genomes tend to co-occur in some soils (i.e., in forested biomes), they may be independently selected for in those environments30. To rigorously assess the relationship between maximum growth rates and metagenome GC content, we used a GC-corrected version of the growth model, as this ensures that the growth model itself has no dependency on GC content. We found a significant negative correlation between maximum growth rates and GC content (Supplementary Fig. 5), which has also been previously shown in soils from arid environments23. Elevated GC content is a known adaptation to increased thermotolerance in bacteria31. Therefore, our results suggest that there is a fundamental trade-off between stress tolerance and growth potential in soil bacteria. This tradeoff is a central component of the growth yield vs. resource acquisition vs. stress tolerance (Y-A-S) microbial life history concept (i.e., ‘yield’ vs. ‘stress tolerance’)28, as well as the older competitor-stress tolerator-ruderal (C-S-R) concept upon which Y-A-S is based32. Our results confirm the existence of this tradeoff and identify the global environmental factors (e.g., forest cover) that influence the balance between stress tolerance and growth potential traits in soil bacteria.

While the ecological patterns in bacterial growth potential we observed appear to be robust, more deeply sequenced metagenomes would provide improved estimates. Whereas the assemblies in this study were not ideal (see “Methods” section), greater sequencing depth would facilitate the assembly of full genomes from the metagenomes33, which, in turn, would yield more accurate growth estimates17. In addition to improved growth prediction accuracy, the estimation of growth rates of individual members of communities would also allow for studies of within-community variation of growth potential across environmental gradients. Given the continually increasing size and availability of sequence datasets, future studies should be able to address these questions. We also note that this study’s sampling scheme was unsuitable for assessing within-biome patterns in bacterial growth potential. Thus, future studies should also assess the scales at which metagenomic life history tradeoffs manifest, i.e., whether these tradeoffs exist within all biomes, only particular biomes, or only manifest at macroecological scales.

Environmental drivers of bacterial growth potential

Predicted maximum growth rates and associated genomic traits were significantly correlated with several climatic and soil variables (Supplementary Fig. 6). Growth metrics were positively correlated with indicators of productivity – for example, maximum growth rates and 16S rRNA gene copy number were both positively correlated with ecosystem net primary productivity (NPP) (Supplementary Fig. 6). Growth rates were also positively correlated with soil moisture, soil organic C content, and C:N ratios, and negatively correlated with soil pH (Supplementary Fig. 6), all of which are soil properties that vary across productivity gradients. Similar correlations were also present in the metagenomes from the 42 NEON sites, where maximum growth rates were strongly negatively correlated with soil pH (Supplementary Fig. 4). To identify the strongest statistical predictors of bacterial growth potential, we used random forest regression. The random forest regression model explained 43% of the variation in maximum growth rates, and the model results corroborated the correlation analyses, identifying soil pH, soil C:N, and distance to the equator as the strongest predictors of maximum growth rates (Fig. 2). For 16S rRNA gene copy numbers, ecosystem NPP and soil moisture were the strongest predictors (Supplementary Fig. 7).

Fig. 2: Random forest regression model showing the strongest environmental predictors of bacterial community-averaged maximum growth rates.

The ‘importance’ of each variable was quantified as the increase in model error (mean squared error, ‘MSE’) when each respective variable was randomly shuffled across the dataset. For the random forest model, n = 176 samples. Source data are available as a Source Data file and on figshare: https://doi.org/10.6084/m9.figshare.24498484.v3.
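The importance measure described in this caption can be illustrated with a short, hedged sketch. It assumes any fitted model exposed as a `predict` callable (hypothetical here; the study used a random forest) and computes the mean increase in MSE after shuffling one predictor column. Random forest implementations typically evaluate this on out-of-bag samples, which this simplified version does not do.

```python
import random


def mse(predict, rows, y):
    """Mean squared error of a prediction function over a dataset."""
    return sum((predict(r) - yi) ** 2 for r, yi in zip(rows, y)) / len(y)


def permutation_importance(predict, rows, y, col, n_repeats=20, seed=0):
    """Mean increase in MSE when predictor column `col` is shuffled.

    `predict` is a stand-in for a fitted model (an assumption of this
    sketch); `rows` is a list of lists holding the predictor values.
    """
    rng = random.Random(seed)
    baseline = mse(predict, rows, y)
    increases = []
    for _ in range(n_repeats):
        # Shuffle only the chosen column, leaving other predictors intact.
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        increases.append(mse(predict, permuted, y) - baseline)
    return sum(increases) / n_repeats
```

A predictor the model does not rely on yields an importance near zero, whereas shuffling a predictor the model depends on inflates the error.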

Given that acidic soils are generally understood to be stressful environments for soil microorganisms, it seems unlikely that increasingly acidic soil pH in itself would promote rapid bacterial growth, as is suggested by our random forest model (Fig. 2). Indeed, prior work has shown that within specific biomes, lower soil pH tends to inhibit microbial growth (grasslands)3 or does not influence growth rates (forests)24. Therefore, we suggest that the strong association of low soil pH with high growth potential we observed on a global scale is probably due to low soil pH simply being a highly consistent indicator of high productivity rather than any direct positive influence of low pH on growth. Soil pH was indeed significantly correlated with nearly all indicators of ecosystem productivity (e.g., NPP, MAP, distance to equator) (Supplementary Fig. 8). Highly productive ecosystems then have other soil characteristics that are more likely to directly promote bacterial growth potential, e.g., high moisture and organic carbon content, both of which were positively correlated with growth potential (Supplementary Fig. 6). To support this conclusion, we ran an alternative random forest model for maximum growth rates that excluded soil pH as a candidate environmental predictor. The alternative model still explained 38% of the variation in maximum growth rates (compared with 43% for the model that includes soil pH) and identified soil C:N, distance to equator, and soil moisture as the most important predictors (Supplementary Fig. 9). Thus, while we cannot rule out the possibility that soil pH plays a direct role in driving bacterial growth potential, our results do show that indicators of productivity other than soil pH perform similarly in accounting for variation in maximum bacterial growth rates.

Overall, these results support our first hypothesis, demonstrating that the productivity of soil bacterial communities mirrors the productivity of the ecosystems that they inhabit. This finding demonstrates that through a chain of energy and nutrient transfer, the productivity of an ecosystem drives the productivity of all organisms within that ecosystem, including microorganisms. While there are surprisingly few studies investigating patterns and environmental drivers of bacterial growth across biomes8,11, this conclusion is in agreement with experimental work that has demonstrated positive links between plant productivity and microbial growth34,35. Our results are also in agreement with a previous study from an arid ecosystem that showed higher bacterial growth rates in more productive vegetated soils than in bare soils23 as well as a study on marine bacteria that showed higher maximum growth rates at the ocean surface than in the less productive subsurface18. Prior work has also demonstrated positive links between ecosystem productivity and key soil health indicators36, which suggests that bacterial growth potential may also be positively associated with overall soil health.

Overall, our results suggest that ecosystem productivity is a key global driver of bacterial growth potential, which, in turn, was highest in both tropical forests and temperate/boreal forests. This result seems inconsistent with the prevailing paradigm in ecology that ecosystem productivity is much higher in tropical latitudes than in temperate/boreal latitudes37. However, in contrast to the historical paradigm, some recent studies have shown that temperate/boreal forests may actually have productivity similar to that of tropical forests37,38, at least during the growing season39. Therefore, ecosystem productivity and associated soil properties are plausible global drivers of bacterial growth potential. It is also possible that other characteristics of forest ecosystems, potentially related to ecosystem productivity but not productivity per se (e.g., high plant biomass accumulation), play a role in driving high bacterial growth potential, which should be investigated in future studies.

Relationships between growth potential and functional genes

Our second hypothesis was that bacterial growth metrics would be associated with the potential C cycle functions of the bacterial communities, with the expectation that higher growth potential would come at the cost of reduced potential for C resource acquisition. Here, we focus on the relative abundances of genes encoding enzymes involved in carbohydrate metabolism and transport (eggNOG COG ‘G’) and energy production and conversion (eggNOG COG ‘C’), as these genes are key drivers of microbial C cycling in soil. We found that maximum growth rates were positively correlated with the relative abundances of energy production and conversion genes (Fig. 3A) and negatively correlated with relative abundances of carbohydrate metabolism and transport genes (Fig. 3B). We observed the same correlations between 16S rRNA gene copy numbers and the two gene categories (Supplementary Fig. 10). In addition, similar to the results for eggNOG carbohydrate metabolism/transport genes, relative abundances of genes encoding carbohydrate-active enzymes (CAZymes) were negatively correlated with both maximum growth rates and 16S rRNA gene copy numbers (Supplementary Fig. 11). These results support our second hypothesis and are similar to prior studies that also found enrichment of carbohydrate metabolism genes in more oligotrophic (i.e., slower growing) bacterial communities23 as well as prior work that has shown rapidly growing communities to have reduced investment into resource acquisition (i.e., extracellular enzymes)3. This tradeoff between bacterial growth potential and resource acquisition potential is another fundamental aspect of the Y-A-S microbial life history framework (i.e., ‘yield’ vs. ‘acquisition’)28. Again, our results confirm the existence of the tradeoff and identify ecosystem productivity as the key environmental gradient along which the tradeoff operates.

Fig. 3: Pearson correlations between maximum growth rates of bacterial communities and bacterial C cycle functional gene relative abundances.

Shown are correlations between growth rates and the relative abundances of energy production and conversion genes (eggNOG COG category ‘C’) (A) and between growth rates and the relative abundances of carbohydrate transport and metabolism genes (eggNOG COG category ‘G’) (B). The correlation analyses in both (A) and (B) were two-tailed and n = 176. The shaded areas in (A) and (B) represent the 95% confidence interval around the regression lines. Functional gene relative abundances are expressed as the percentage of annotated reads assigned to the respective categories. Correlations between the two categories and 16S rRNA gene copy numbers are shown in Supplementary Fig. 10. Source data are available as a Source Data file and on figshare: https://doi.org/10.6084/m9.figshare.24498484.v3.
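As a reminder of what the Pearson correlations in this figure measure, the following is a minimal sketch of the coefficient computed from paired observations. The function name `pearson_r` is hypothetical; the inputs would be per-sample maximum growth rates and functional gene relative abundances.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 corresponds to the positive association seen for energy production and conversion genes, and a value near -1 to a negative association of the kind reported for carbohydrate transport and metabolism genes.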

In addition, the positive association between bacterial growth potential and energy production/conversion genes we observed suggests that high growth potential is associated with the conversion of assimilated C resources to energy as opposed to biomass accrual. Another recent study in soils also associated copiotrophic bacterial communities with greater energy metabolism24, and prior culture-based studies have similarly demonstrated a negative relationship between maximum growth rates and bacterial C use efficiency29. It is possible that rapid bacterial growth and turnover will, in turn, promote soil organic matter formation via increased production of microbial residues5. Alternatively, the reduction in C use efficiency that likely accompanies increased growth rates could reduce rates of soil organic matter formation due to increased respiratory loss of C40. Future research should attempt to disentangle these potential C cycle pathways in relation to bacterial growth potential. It should also be noted that some of the relationships between metagenomic traits we observed are at odds with prior work on marine bacteria, which showed faster-growing organisms to be enriched in carbohydrate metabolism genes and slower-growing communities to be enriched in energy metabolism genes17, the opposite of what we observed. The reason for the discrepancy is not clear, though it is possible that marine vs. soil environments impose different selective pressures on bacteria that lead to different associations between metagenomic traits. Given this discrepancy, it is not clear whether Y-A-S tradeoffs will be universally observed across all microbial habitats. Identifying the similarities and differences in microbial tradeoffs among different microbial habitats represents another important area of future research.

The results described above are corroborated and summarized by our structural equation model (SEM), which illustrates the links among climate, soil properties, bacterial growth potential, and C cycle functions (Fig. 4). The SEM reflects our hypothesis that high bacterial growth potential is unlikely to be mechanistically linked to low soil pH, i.e., pH was not included in the a priori SEM. The SEM also reflects our hypothesis that productivity is an important global control over bacterial growth potential and thus includes NPP as a candidate variable. Though NPP was not among the strongest predictors of maximum growth rates in the random forest model (Fig. 2), it is reasonable to include NPP in the SEM given our hypotheses and given that NPP was significantly correlated with maximum growth rates (Supplementary Fig. 6) and was also the strongest predictor of 16S rRNA gene copy number (Supplementary Fig. 7). The SEM supports our hypothesis that bacterial growth potential is driven primarily by soil properties such as moisture and C availability, which, in turn, are controlled by productivity (i.e., NPP, distance to equator) (Fig. 4). Growth potential is then significantly associated with C cycle functions, with a negative effect of maximum growth rates on carbohydrate metabolism genes and a positive effect of maximum growth rates on energy production/conversion genes (Fig. 4). Because we cannot rule out the possibility that soil pH plays a direct role in driving growth potential, we ran an alternative SEM that includes pH, which produced very similar results (Supplementary Fig. 12). Overall, our results identify the environmental conditions that select for high bacterial growth potential and suggest that variation in bacterial growth potential among biomes will drive varying rates of decomposition and C storage in soils from different environments.

These results complement a prior study using the same metagenomes, which described life history trait dimensions on the basis of functional gene relative abundances and genomic traits26. That study found that arid biomes were associated with bacterial communities with small genomes and increased relative abundance of C acquisition genes26. Other studies have similarly found that global patterns in the taxonomic composition of bacterial communities are strongly associated with ecosystem aridity vs. productivity27. Our results demonstrate that those macroecological patterns in the functional and taxonomic characteristics of bacterial communities are fundamentally linked to the growth potential of those communities.

Fig. 4: Structural equation model demonstrating links among climate variables, soil variables, bacterial community growth potential, and relative abundances of bacterial C cycle functional genes.

The path diagram shows standardized path coefficients, and only statistically significant paths are shown. The SEM explains 43% of the variation in energy production/conversion genes, 21% of the variation in carbohydrate transport/metabolism genes, and 21% of the variation in maximum growth rates. All fit indices indicated adequate fit of the model (CFI > 0.99, SRMR and RMSEA both < 0.05). For the structural equation model, n = 176 samples. Source data are available as a Source Data file and on figshare: https://doi.org/10.6084/m9.figshare.24498484.v3.

Concluding remarks

Growth potential is a fundamental property of microbial communities that controls rates of C cycling and storage in terrestrial ecosystems. We demonstrate that the growth potential of soil bacterial communities varies among terrestrial biomes, with high bacterial growth potential associated with highly productive forest ecosystems. Further, variations in growth metrics were associated with variations in microbial C cycle gene relative abundances, with an apparent tradeoff between growth potential and resource acquisition potential. Importantly, the metagenomic approach we used only estimates maximum growth rates and not actual in situ growth, which can be substantially slower than maximum rates11. Thus, future studies should use empirical methods (e.g., isotope tracing) to validate the patterns we observed. Future studies should also validate the results observed here using more deeply sequenced metagenomes, which would allow for a more accurate estimation of growth potential in bacterial genomes assembled from the metagenomes17. Regardless, the macroecological patterns in bacterial growth potential we identify are a substantial contribution to bacterial life history theory, confirming the existence of hypothesized bacterial life history tradeoffs (i.e., growth vs. stress tolerance and growth vs. resource acquisition) as well as identifying the environmental gradients along which those tradeoffs manifest. These tradeoffs and their associated environmental drivers, in turn, determine the relative dominance of different soil C cycle pathways. Thus, our work establishes a microbial physiological framework with which to better understand the decomposition and storage of soil C at a global scale.

Methods

Sample collection

For this study, we analyzed a dataset of 176 metagenomes generated from soils from 11 terrestrial biomes across six continents (Supplementary Fig. 1). All soil samples were collected and shipped in accordance with local and international laws, as described previously41. The soil samples were analyzed for chemical properties, including pH, total N, organic C, total phosphorus (P), calcium (Ca), potassium (K), and magnesium (Mg) contents21. Temperature and precipitation data for each soil sampling location were obtained from the WorldClim database, while estimates of potential evapotranspiration (PET) and net primary productivity (NPP) were obtained from the Atlas of the Biosphere21.

Metagenomic analyses

DNA was extracted from 2 g of each soil sample using the PowerMax soil DNA isolation kit (MoBio). DNA was prepped for shotgun metagenomic sequencing using the Illumina TruSeq Nano HT kit and sequenced on the Illumina HiSeq 2500 platform (2 × 250 bp)21. We downloaded the raw metagenomic sequence reads from NCBI accession number PRJEB18701. We quality-filtered the raw reads using trimmomatic (version 0.39)42 and assigned taxonomy to the quality-filtered reads using kraken2 (version 2.1.3)43 with the RefSeq genomes for bacteria, archaea, viruses, fungi, and protists as reference databases. To avoid biases from varying relative abundances of eukaryotic sequences among ecosystems (Supplementary Fig. 13), we conducted all downstream analyses using only the metagenomic reads identified as bacterial in origin.

We estimated the average minimum doubling times of the bacterial communities using gRodon217, which works by quantifying codon usage bias in highly expressed ribosomal protein genes while also accounting for the relative coverages of genes in the community. The method has previously been shown to be most accurate when genes are predicted from assembled contigs18. Therefore, we assembled contigs using megahit (version 1.1.3)44 and predicted coding regions in the contigs using prodigal (version 2.6.3)45. We then identified ribosomal protein genes from the coding regions by using blastn (version 2.9.0)46 with a previously constructed database of ribosomal protein gene sequences as the reference database16. Relative coverages of the genes were determined by aligning the sequence reads to the predicted coding regions using bwa mem (version 0.7.17)47 and quantifying gene coverages using samtools (version 1.9)48.

For the gRodon models, we used the ‘metagenome_V2’ (or ‘MMv2’) version of the software. MMv2 is an updated version of the original gRodon model (MMv1), which toggles between MMv1 and a GC bias-corrected version of the model (MMBC) depending on the consistency of codon usage in the ribosomal protein genes in a metagenome. Further information on the mechanics of the model can be found in the original gRodon studies17,18. For our samples, consistency values were always below the toggle threshold, so the original MMv1 was applied. We also analyzed our samples with MMBC to ensure that the patterns we observed were not simply reflecting any GC dependency in the model (mode = ‘meta_testing’, bg = ‘individual’). MMBC also allowed us to rigorously assess relationships between growth rates and metagenome GC content. However, we focus on the results from the recommended MMv2 (really MMv1) results, as the tradeoff between GC bias in MMv1 vs. the higher variance in the MMBC model favors the MMv1 predictions for our samples18. Because the assemblies of these metagenomes were poor (mean contig N50 = 508 bp), we also repeated the above growth rate analysis using gene fragments predicted directly from the sequence reads using FragGeneScanRs49. The predicted growth rates from all the gRodon model variations we ran were well correlated and exhibited the same global patterns (Supplementary Fig. 2, 3), though in the main text, we present the MMv2 model results using genes predicted from contigs, as recommended by the gRodon developers18. The model outputs community-averaged minimum doubling times (in hours), which we converted to maximum growth rates (i.e., cell doublings h−1) by taking the reciprocal of the doubling times. For all growth rate analyses, we used the default minimum gene length of 240 bp and did not include the optional correction for optimal growth temperature. 
For additional validation of the ecological patterns in growth rates we observed, we repeated the growth rate analysis using metagenomes from 42 National Ecological Observatory Network (NEON) sites across the United States. Sample information for the NEON metagenomes is provided in our figshare repository: https://doi.org/10.6084/m9.figshare.24498484.v3.
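
The conversion from minimum doubling times to maximum growth rates described above is a simple reciprocal. As a minimal illustration (not the code used in the study), the sketch below applies it to a few hypothetical sample names and doubling times; the function name `max_growth_rate` is an assumption made for this example.

```python
def max_growth_rate(min_doubling_time_h: float) -> float:
    """Convert a minimum doubling time (hours) to a maximum
    growth rate (cell doublings per hour) via the reciprocal."""
    if min_doubling_time_h <= 0:
        raise ValueError("doubling time must be positive")
    return 1.0 / min_doubling_time_h


# Hypothetical community-averaged minimum doubling times (hours);
# these values are illustrative and are not results from the study.
doubling_times_h = {"sample_A": 2.0, "sample_B": 5.0, "sample_C": 8.0}

# Maximum growth rate for each sample (doublings per hour).
growth_rates = {
    sample: max_growth_rate(d) for sample, d in doubling_times_h.items()
}
```

Because the transformation is nonlinear, a given difference in doubling time corresponds to a larger difference in growth rate among fast-growing communities than among slow-growing ones.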

To estimate the average genome size and the effective number of genomes within each metagenome, we used MicrobeCensus (version 1.1.1)50. To estimate the average 16S rRNA gene copy number in each community, we first curated a custom bacterial 16S database using RESCRIPt (version 2023.5.0)51. Our custom database is based on the SILVA SSU database (version 138.1)52 but with all sequences belonging to eukaryotes, archaea, chloroplasts, mitochondria, or unclassified organisms removed. We also removed any sequences containing ambiguous (‘n’) bases and any sequences shorter than 1200 bp. We then dereplicated the sequences at a 99% identity threshold, resulting in a 16S database containing 323,218 high-quality sequences. We then identified bacterial 16S genes within the metagenomes by aligning the sequence reads to the curated 16S database using bwa mem47. This approach is similar to that of riboPicker53 but with an updated 16S database and an updated alignment algorithm. We then quantified the number of aligned base pairs in each metagenome using samtools48 and quantified the effective number of 16S regions in each metagenome by dividing the number of aligned base pairs by 1542 (the length of the 16S rRNA gene in E. coli). The average 16S gene copy number for each metagenome was then calculated as the number of 16S regions divided by the effective number of genomes present54. We assigned functional gene categories to the sequence reads using eggNOG-mapper (version 2.1.3, ‘DIAMOND’ mode) with a reference database of bacterial eggNOG COGs55. Functional gene relative abundances were calculated as the percentage of annotated reads (as opposed to the percentage of total reads) to account for biases resulting from systematic variation in annotation success among samples and environments26. Full details on all metagenomic analyses, including bioinformatics scripts, are provided in the figshare repository: https://doi.org/10.6084/m9.figshare.24498484.v3.
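
The two normalizations described above (average 16S gene copy number and functional gene relative abundance) reduce to short arithmetic steps. The sketch below is illustrative only: the function names and all input values are hypothetical, and it assumes that the aligned base pair count (from samtools), the effective number of genomes (from MicrobeCensus), and the read counts (from eggNOG-mapper) have already been obtained.

```python
# Length of the 16S rRNA gene in E. coli (bp), as stated in the text.
E_COLI_16S_LENGTH_BP = 1542


def average_16s_copy_number(aligned_bp: float, effective_genomes: float) -> float:
    """Effective number of 16S regions divided by the effective
    number of genomes in the metagenome."""
    effective_16s_regions = aligned_bp / E_COLI_16S_LENGTH_BP
    return effective_16s_regions / effective_genomes


def relative_abundance_percent(category_reads: int, annotated_reads: int) -> float:
    """Reads assigned to one functional category as a percentage of
    annotated reads (not of total reads)."""
    return 100.0 * category_reads / annotated_reads


# Hypothetical inputs for one metagenome (not values from the study).
copy_number = average_16s_copy_number(aligned_bp=1_542_000, effective_genomes=400)
category_pct = relative_abundance_percent(category_reads=5_000, annotated_reads=200_000)
```

Dividing by annotated reads rather than total reads keeps the percentages comparable between samples whose annotation success differs.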

Statistical analyses

All statistical analyses were conducted in R56. We analyzed latitudinal patterns in growth rates using polynomial regression, with the best-supported regression model selected using AICc. We assessed the effects of forest cover on growth rates using linear models or generalized linear models (Gamma distribution, log-link function) when linear models did not meet assumptions of normality of residuals. Polynomial regression and linear models were analyzed using F statistics (or χ2 statistics in the case of generalized linear models). We determined relationships among variables using Pearson correlation. To identify environmental drivers of growth rates, we used random forest regression in the randomForest R package57. We quantified the ‘importance’ of the environmental variables in accounting for variation in the growth rates by determining the increase in model error (mean squared error, ‘MSE’) when each respective variable was randomly shuffled across the dataset. We identified putative mechanistic links between environmental variables, bacterial growth metrics, and functional genes using structural equation modeling (SEM) in the lavaan R package58. We assessed SEM fit by bootstrapping estimates of the comparative fit index (CFI), standardized root mean square residual (SRMR), and root mean square error of approximation (RMSEA). All of these metrics indicated adequate fit for our model: CFI was > 0.99 while SRMR and RMSEA were both < 0.05.
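
The variable importance measure described above (the increase in MSE when a predictor is randomly shuffled) can be illustrated with a small, self-contained sketch. This is not the randomForest implementation, which computes the measure on out-of-bag samples for each tree; here the same idea is applied to a generic prediction function. The toy data, the predictor names (`temperature`, `noise`), and the stand-in `predict` function are all hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, column, n_repeats=20, seed=0):
    """Mean increase in MSE after randomly shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only the shuffled predictor.
        permuted = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
        increases.append(mse(y, [predict(r) for r in permuted]) - baseline)
    return sum(increases) / n_repeats


# Hypothetical toy data: the response depends on temperature only.
rng = random.Random(42)
rows = [
    {"temperature": rng.uniform(0, 30), "noise": rng.uniform(0, 1)}
    for _ in range(200)
]
y = [0.01 * r["temperature"] + rng.gauss(0, 0.01) for r in rows]


def predict(row):
    # Stand-in for a fitted model (assumption); it uses temperature only.
    return 0.01 * row["temperature"]


imp_temperature = permutation_importance(predict, rows, y, "temperature")
imp_noise = permutation_importance(predict, rows, y, "noise")
```

In this toy case, shuffling the informative predictor raises the error, whereas shuffling a predictor that the model ignores leaves the error unchanged.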

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.