Introduction

The primary goal of Earth System Models (ESMs) is to improve understanding and projection of future global change. Since global change is driven principally by the increase in atmospheric CO2 concentration ([CO2]), projection of future [CO2] is a primary product of the models. Therefore, uncertainty in the estimation of the large CO2 fluxes associated with the global C cycle will have a marked impact on projected global change. Photosynthetic CO2 uptake by the terrestrial biosphere is the largest of these fluxes and the entry point of C into the terrestrial C sink that currently subsidizes anthropogenic fossil fuel use. It is therefore critical that ESMs accurately model photosynthesis (Canadell et al. 2007; Beer et al. 2010). However, the terrestrial C cycle, and its response to rising [CO2] and temperature, are among the greatest uncertainties in ESMs in terms of both scientific understanding and model representation (Knorr 2000; Friedlingstein et al. 2006; Gregory et al. 2009; Smith and Dukes 2012).

Photosynthetic CO2 uptake is well-described by the Farquhar, von Caemmerer and Berry (FvCB) model of photosynthesis (Farquhar et al. 1980) and many ESMs use a derivation of this model to estimate gross primary production (GPP). One of the key parameters required by the FvCB model is an estimate of the maximum rate of carboxylation by the enzyme Rubisco (EC number 4.1.1.39), commonly termed V c,max. Sensitivity analysis of ESMs has shown that projections of net primary production (NPP) are particularly sensitive to fixed parameters associated with estimating V c,max from leaf N content (Friend 2010), and model uncertainty over the parameter V c,max has been shown to account for a c. 30 Pg C year−1 variation in model estimation of GPP (Bonan et al. 2011). Given that anthropogenic CO2 sources add c. 9 Pg C year−1 to the atmosphere (Boden et al. 2012), the uncertainty associated with estimation of V c,max, and the need to constrain this estimate, are obvious. In many models V c,max is not only the key parameter for estimating photosynthesis, but also, through a simple multiplier, autotrophic respiration (Knorr 2000; Kucharik et al. 2000; Sitch et al. 2003; Biome-BGC 2010). In other words, accurate representation of V c,max is critical not only for modeling global GPP but also for NPP. Simply stated, V c,max is one of the most critical parameters for the successful projection of future global change. The aim of this study was to examine the different ways in which V c,max is derived for use in ESMs, examine some of the key assumptions, and suggest some opportunities for improving the representation of V c,max in these models.

Models investigated

The key land model components of ESMs were identified from the fourth and fifth phases of the Coupled Climate Carbon Cycle Model Intercomparison Project, commonly known as CMIP4 (Friedlingstein et al. 2006) and CMIP5 (Arora et al. 2013), and from a recent review on respiration and photosynthesis in global scale models (Smith and Dukes 2012). Since the focus of this study was to examine the use of V c,max in ESMs, the scope was restricted to models where V c,max was used to simulate photosynthesis using the FvCB model. As a result several models that use alternative approaches, such as CASA, GTEC, SEIB-DGVM, SLAVE, Sheffield-DGVM, TEM, and VEGAS were not considered (McGuire et al. 1992; Potter et al. 1993; Friedlingstein et al. 1995; King et al. 1997; Woodward and Lomas 2004; Zeng et al. 2004; Sato et al. 2007).

All the models investigated used plant functional types (PFTs) to describe the landscape. Grouping species into PFTs allows the complexity of diverse communities to be reduced to a few key PFTs. Each PFT can then be parameterized with relevant traits. When coupled with estimation of community composition this allows models to link plant physiology and ecosystem processes, and when sufficient PFTs are used, provides a higher resolution than classifying vegetation by biomes alone. In most models, gas exchange and energy balance calculations happen separately for each PFT; as a result the number of calculations, and the required computer power, scales linearly with PFT number. With a finite resource, increasing the resolution of a model by adding PFTs is a tradeoff against other kinds of resolution, e.g., time steps, vertical resolution, and the accuracy of iteratively solved processes. Typically the models include many plant traits in their PFT definitions and that information covers a range of ecosystem processes, of which V c,max is one. The number of PFTs used by the models varied from 5 to 16 with the richer models dividing broad PFT definitions into additional categories.

Eleven models were investigated, and four main approaches for estimating V c,max emerged: (1) an empirical relationship between V c,max and leaf N content, (2) a mechanistic relationship between V c,max and leaf N content, (3) an estimation based on the theory that a leaf will optimize the tradeoff between photosynthesis and respiration (Haxeltine and Prentice 1996b), and (4) calibration of fixed V c,max values to obtain a target model output.

Unless specified, all discussion of V c,max assumes normalization to 25 °C. Temperature corrections associated with CO2 assimilation in ESMs have been covered recently and are not discussed here (Smith and Dukes 2012). The V c,max values presented here refer to upper canopy sunlit foliage. Detailed comparison of the different approaches used to attenuate V c,max with canopy depth is beyond the scope of this study, but typically V c,max is decreased with canopy depth through a gradient in leaf N content that is specified through leaf area index, specific leaf area, or by the use of a nitrogen profile coefficient (Friend and Kiang 2005; Biome-BGC 2010; Oleson et al. 2010; Zaehle and Friend 2010; Clark et al. 2011; Bonan et al. 2012).

Empirical relationship

BETHY & JSBACH

The Biosphere Energy Transfer Hydrology Scheme (Knorr 2000; Ziehn et al. 2011) estimates V c,max (μmol m−2 leaf area s−1) from the linear relationship with leaf N content (Kattge et al. 2009).

$$ V_{{{\text{c}},\hbox{max} }} =i_v + s_v\times N_a $$
(1)

where the intercept (i v) and slope (s v) for V c,max as a function of leaf N content expressed on an area basis (N a, g m−2 leaf area) are derived for each PFT. BETHY uses an extensive data base of V c,max values (723 data points), and V c,max values determined by standardized model inversions of the maximum photosynthetic rate (A max, 776 data points) to determine the PFT-specific values for i v and s v. Then, a larger data base of N a is used to provide additional data (1966 total data points) for the determination of PFT-specific V c,max values (Kattge et al. 2009). Representation of photosynthesis and estimation of V c,max in 13 PFTs within the Joint Scheme for Biosphere Atmosphere Coupling in Hamburg (JSBACH) is based on BETHY (Raddatz et al. 2007).
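To make Eq. 1 concrete, the following is a minimal sketch of how a PFT-specific intercept and slope could be obtained from paired observations and then applied. It uses an ordinary least squares fit, which is an assumption made for illustration and not necessarily the fitting procedure of Kattge et al. (2009). The function names and the observation values are hypothetical.

```python
def fit_vcmax_line(n_a, vcmax):
    """Ordinary least squares fit of Vcmax = i_v + s_v * N_a (form of Eq. 1).

    n_a   : leaf N per unit area (g m-2), one value per observation
    vcmax : Vcmax at 25 C (umol m-2 s-1), paired with n_a
    Returns (i_v, s_v). OLS is an illustrative assumption.
    """
    n = len(n_a)
    mean_x = sum(n_a) / n
    mean_y = sum(vcmax) / n
    sxx = sum((x - mean_x) ** 2 for x in n_a)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(n_a, vcmax))
    s_v = sxy / sxx
    i_v = mean_y - s_v * mean_x
    return i_v, s_v


def vcmax_linear(n_a, i_v, s_v):
    """Apply Eq. 1 for a single leaf N value."""
    return i_v + s_v * n_a


# Made-up observations for one hypothetical PFT (not data from any model).
obs_n_a = [1.0, 1.5, 2.0, 2.5]
obs_vcmax = [35.0, 50.0, 65.0, 80.0]
i_v, s_v = fit_vcmax_line(obs_n_a, obs_vcmax)
```

With these made-up points the fit recovers the line that generated them, so `vcmax_linear` can be checked by hand for any N a.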

Hybrid and HyLand

Hybrid 6.5 (Friend 2010) obtains V c,max by multiplying N a by a proportionality coefficient n 2 (Kull and Kruijt 1998). The coefficient n 2 (0.23 μmol CO2 mmol N−1 s−1) was derived from 45 measurements of V c,max and leaf N made on two tree (Populus tremula, Corylus avellana) and one shrub (Tilia cordata) species (Kull and Niinemets 1998). To account for variation in n 2 between PFTs an additional coefficient, n f (photosynthetic N factor or relative photosynthetic capacity parameter), is used to adjust n 2 by −10 to +50 % (Friend and Kiang 2005; Friend 2010).

$$ V_{{{\text{c}},\hbox{max} }} = N_{\text{a}} \times n_{2} \times n_{f} \times \frac{1000}{{M_{N} }} $$
(2)

where M N  = molecular mass of N (14 g mol−1). Calibration against eddy flux data from mixed rain forest, evergreen forest, deciduous forest, and wheat (Wofsy et al. 1993; Goulden et al. 1996; Malhi et al. 1998; Berbigier et al. 2001; Hanan et al. 2002) provided values for n f for some PFTs, but others were estimated based on similarity to the PFTs that were dominant in the flux data calibration. For example, the n f for Arctic tundra is estimated as 0.75n f deciduous forest +0.25n f rain forest (Friend and Kiang 2005). HyLand is a simplified version of the Hybrid model where the representation of photosynthesis, and the derivation of V c,max are unaltered (Levy et al. 2004).
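As a worked illustration of Eq. 2 and of the weighted estimate of n f for PFTs without flux calibration, a short sketch follows. The constants n 2 and M N are taken from the text; the n f values passed in are hypothetical placeholders chosen inside the stated −10 to +50 % adjustment range, and the function names are invented here.

```python
M_N = 14.0   # molecular mass of N (g mol-1), from the text
N2 = 0.23    # umol CO2 mmol-1 N s-1 (Kull and Kruijt 1998), from the text


def vcmax_hybrid(n_a, n_f):
    """Eq. 2: Vcmax (umol m-2 s-1) from leaf N (g m-2) and the PFT factor n_f."""
    # 1000 / M_N converts g N m-2 to mmol N m-2.
    return n_a * N2 * n_f * 1000.0 / M_N


def blended_n_f(parts):
    """Weighted sum of n_f values, e.g. the Arctic tundra estimate.

    parts : iterable of (weight, n_f) pairs
    """
    return sum(weight * factor for weight, factor in parts)


# Hypothetical n_f values for the two calibrated PFTs (not from the source).
n_f_deciduous = 1.2
n_f_rain_forest = 0.9
n_f_tundra = blended_n_f([(0.75, n_f_deciduous), (0.25, n_f_rain_forest)])
```

For example, a leaf with N a = 1.4 g m−2 and n f = 1 holds 100 mmol N m−2, which Eq. 2 converts to a V c,max of 23 μmol m−2 s−1.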

O–CN

The O–CN model (Zaehle and Friend 2010) is an extension of ORCHIDEE (described below) that includes key N cycle processes. Unlike ORCHIDEE, O–CN contains an explicit link between V c,max and leaf N content. In O–CN, V c,max is estimated using the canopy photosynthesis and conductance model employed by Hybrid (Friend and Kiang 2005), where the leaf N content for a given canopy layer is used to estimate the V c,max in that layer,

$$ V_{{{\text{c}},\hbox{max} }} = n_{{f{\text{O}} - {\text{CN}}}}\frac{{f_{{N{\text{p}}}} }}{{f_{{N{\text{p}},ave}} }}\times \frac{{N_{\text{a}}\times 1000}}{{M_{N} }} $$
(3)

where n fO–CN (μmol CO2 mmol N−1s−1) is a PFT-specific parameter linking N a to photosynthetic potential at an average observed leaf N content (N a,ave) for a specific PFT. Based on the linear relationship describing the amount of N invested in non-photosynthetic processes (Friend et al. 1997) the fraction of leaf N in the photosynthetic apparatus (f Np, which includes N partitioned to apparatus associated with both the light and dark reactions of photosynthesis) for a given N a is,

$$ f_{{N{\text{p}}}} = a_{{N{\text{p}}}} + b_{{N{\text{p}}}} \times N_{\text{a}} $$
(4)

where a Np is the minimum fraction of leaf N associated with photosynthesis; a Np is set at 0.33 and 0.17 for broad leaf and needle leaf PFTs, respectively (Evans 1989). The slope of the relationship between N a and f Np (b Np) is set at 0.0714 (Evans 1989; Zaehle and Friend 2010). The f Np,ave is the f Np for a given PFT where N a = N a,ave (Zaehle and Friend 2010). The N a,ave is calculated from PFT-specific values of SLA (m2 leaf area g−1 C), and average leaf N content expressed on a dry mass basis (N m, %), assuming a C content (C m) of 48 % dry mass (Reich et al. 1997; White et al. 2000; Zaehle and Friend 2010).

$$ N_{{{\text{a}},{\text{ave}}}} = \frac{{\frac{{N_{{{\text{m,ave}}}} }}{{C_{{\text{m}}} }}}}{{{\text{SLA}}}} $$
(5)

O–CN constrains possible N a by imposing limitations on the minimum and maximum N m values possible for each PFT. The data set used to parameterize the 12 PFTs for N m,ave (White et al. 2000) is the same source that was used to parameterize CLM and Biome-BGC (see below). The data listed by White et al. (2000) cover a wide range of species with over 150 data entries but there are some important gaps and under-represented PFTs e.g., C4 grasses and shrubs.
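Equations 3 to 5 chain together, so a small sketch may help show how they interact. The values of a Np, b Np, C m and M N are those quoted above; the SLA, N m,ave and n fO–CN values used in the example are hypothetical, the function names are invented here, and b Np is assumed to be expressed per g N m−2.

```python
M_N = 14.0      # molecular mass of N (g mol-1)
B_NP = 0.0714   # slope of f_Np against N_a (assumed per g N m-2)


def n_a_average(n_m_ave_pct, sla, c_m_pct=48.0):
    """Eq. 5: average leaf N per area (g m-2).

    n_m_ave_pct : average leaf N, % dry mass
    sla         : specific leaf area, m2 leaf area per g C
    c_m_pct     : leaf C content, % dry mass (48 % as stated in the text)
    """
    return (n_m_ave_pct / c_m_pct) / sla


def f_np(n_a, a_np):
    """Eq. 4: fraction of leaf N in the photosynthetic apparatus."""
    return a_np + B_NP * n_a


def vcmax_ocn(n_a, n_f_ocn, a_np, n_a_ave):
    """Eq. 3: Vcmax (umol m-2 s-1) for a canopy layer with leaf N n_a."""
    ratio = f_np(n_a, a_np) / f_np(n_a_ave, a_np)
    return n_f_ocn * ratio * n_a * 1000.0 / M_N


# Hypothetical broad leaf PFT: 2 % N by dry mass, SLA of 0.03 m2 g-1 C.
n_a_ave = n_a_average(2.0, 0.03)
```

At N a = N a,ave the ratio f Np / f Np,ave equals one, so Eq. 3 reduces to a simple proportionality; above the average, V c,max rises faster than N a because f Np also increases.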

JULES

The Joint UK Land Environment Simulator (JULES) model (version 2.2) is coupled to the MOSES 2 land-surface scheme and the TRIFFID dynamic global vegetation model. JULES estimates V c,max from leaf N content for five PFTs (Schulze et al. 1994; Cox 2001; Clark et al. 2011). V c,max is assumed to be linearly related to leaf nitrogen concentration.

$$ V_{{{\text{c}},\hbox{max} }} = n_{\text{e}} \times n_{\text{l}} $$
(6)

where n l is the leaf N content (kg N kg C−1) and n e is a constant relating leaf N to Rubisco carboxylation capacity that is 0.0008 and 0.0004 mol CO2 m−2 s−1 kg C kgN−1 for C3 and C4 plants, respectively (Cox 2001; Clark et al. 2011). The values for n e were derived from assumptions and regressions in Schulze et al. (1994) and are elucidated below,

$$n_{{\text{e}}} = C_{{\text{m}}} \times S_{{R1}} \times S_{{R2}} \times S_{{R3}} \times S_{{A2}} \times 0.001$$
(7)

where the fraction of leaf dry matter in the form of C (C m) is 0.4. The regression slopes S R1, S R2, and S R3 (0.3012, 2.996, and 1.048, respectively) empirically link observations of leaf N concentration to maximum stomatal conductance, maximum stomatal conductance to maximum surface conductance, and maximum surface conductance to maximum surface assimilation rate (Fig. 3 in Schulze et al. 1994). The data sets used to generate the regressions described above are extensive and were compiled from over 50 studies and included data from over 200 species covering all major biomes. The theory behind the link between leaf, canopy, and ecosystem fluxes is described in detail by Schulze et al. (1994). The final assumption, S A2, is that the maximum surface assimilation rate is 0.5 V c,max for C3 plants and equal to V c,max for C4 plants.
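Eq. 7 does not spell out how the final assumption enters numerically. The sketch below assumes that S A2 is applied as the factor converting maximum surface assimilation to V c,max, that is 1/0.5 = 2 for C3 plants and 1 for C4 plants. This reading is an assumption made here, not something confirmed by the model descriptions; under it, the product comes out close to the rounded n e constants quoted for JULES. The function name is hypothetical.

```python
C_M = 0.4        # fraction of leaf dry matter that is C
S_R1 = 0.3012    # regression slopes quoted from Schulze et al. (1994)
S_R2 = 2.996
S_R3 = 1.048


def n_e(assim_to_vcmax):
    """Eq. 7 with S_A2 supplied as an assimilation-to-Vcmax multiplier.

    assim_to_vcmax : assumed form of S_A2 (2 for C3, 1 for C4); this
                     interpretation is an assumption, see the lead-in.
    """
    return C_M * S_R1 * S_R2 * S_R3 * assim_to_vcmax * 0.001


n_e_c3 = n_e(1.0 / 0.5)   # assimilation = 0.5 * Vcmax for C3 plants
n_e_c4 = n_e(1.0)         # assimilation = Vcmax for C4 plants
```

Rounded to one significant figure these give 0.0008 and 0.0004, which matches the constants cited from Cox (2001) and Clark et al. (2011).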

In the Hadley Centre technical note that describes TRIFFID (Cox 2001), the values for n l are derived from Schulze et al. (1994) as described above. In the more recent description of JULES (Clark et al. 2011), the values for n l from Schulze et al. (1994) have been updated. This was the result of a model calibration exercise where JULES was coupled to the atmospheric model as part of HadGEM2-ES (the Earth System version of the Met Office Hadley Centre HadGEM2 model) and various parameter values were changed to match observed vegetation distribution, carbon stores, and fluxes (Clark, personal communication). The resulting updated values for V c,max in JULES are smaller for 4 of the 5 PFTs when compared to the values in the previous model description (Cox 2001); this is most marked for the shrub PFT, where V c,max is 50 % smaller.

Mechanistic relationship

Biome-BGC and the Community Land Model

Biome-BGC version 4.2 and the Community Land Model (CLM) version 4.0 (Biome-BGC 2010; Oleson et al. 2010) estimate V c,max using a more mechanistic approach that is constrained by constants associated with the structure, function, and amount of Rubisco in the leaf. CLM estimates V c,max from PFT-specific parameters of N a and the fraction of that N invested in Rubisco (F LNR, g N in Rubisco g−1 N). F NR is the mass ratio of total Rubisco molecular mass to N in Rubisco (g Rubisco g−1 N in Rubisco). The specific activity of Rubisco at 25 °C (α R25, μmol CO2 g−1 Rubisco s−1) is set at 60 μmol CO2 g−1 Rubisco s−1 (Woodrow and Berry 1988). Recently, canopy processes in CLM were updated but the derivation of V c,max as described by Eq. 8 and the values for the fixed parameters in that equation are unaltered in the revised model (Bonan et al. 2011, 2012).

$$ V_{{{\text{c}},\hbox{max} }} = N_{\text{a}} \times F_{\text{LNR}} \times F_{\text{NR}} \times \alpha_{\text{R25}} $$
(8)

The value for N a is derived from PFT-specific variables for the leaf C:N ratio (CN L, gC g−1N) and the SLA.

$$ N_{\text{a}} = \frac{1}{{CN_{\text{L}} \times {\text{SLA}}}} $$
(9)
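Equations 8 and 9 can be combined into a short calculation. The sketch below uses the specific activity quoted above (60 μmol CO2 g−1 Rubisco s−1) and takes F NR as 7.16, the mass ratio that Biome-BGC uses for the equivalent term; the leaf C:N ratio, SLA and F LNR in the example are hypothetical, as are the function names.

```python
F_NR = 7.16        # g Rubisco per g N in Rubisco
ALPHA_R25 = 60.0   # umol CO2 g-1 Rubisco s-1 at 25 C


def leaf_n_area(cn_l, sla):
    """Eq. 9: leaf N per area (g N m-2) from C:N (g C g-1 N) and SLA (m2 g-1 C)."""
    return 1.0 / (cn_l * sla)


def vcmax_clm(n_a, f_lnr):
    """Eq. 8: Vcmax (umol m-2 s-1) from leaf N and the fraction of N in Rubisco."""
    return n_a * f_lnr * F_NR * ALPHA_R25


# Hypothetical PFT: C:N of 25, SLA of 0.03 m2 g-1 C, 10 % of leaf N in Rubisco.
n_a = leaf_n_area(25.0, 0.03)
vcmax = vcmax_clm(n_a, 0.10)
```

Because every term enters as a simple product, a given percentage error in F LNR, F NR or the specific activity carries through unchanged to V c,max.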

The parameters used for the different PFTs in CLM are provided in a technical note (Oleson et al. 2010) and derived from the parameterization used in Biome-BGC (White et al. 2000). Due to insufficient data from which to parameterize F LNR, White et al. (2000) calculated F LNR from a review of V c,max values in 109 species (Wullschleger 1993), and their own data sets for SLA and CN L following Eq. 8. Shrub F LNR was based on hot shrubs only and set to the value for evergreen needle leaf forest. The F LNR for deciduous needle leaf forest was set to match that used for deciduous broad leaf forest, and the F LNR for evergreen needle leaf forests was increased by one standard deviation because the measurement temperatures were generally lower than for other biomes (PFT nomenclature follows White et al. 2000).

Biome-BGC (v4.2) calculates V c,max using a mathematically identical method to CLM (Thornton et al. 2002; Biome-BGC 2010). BIOME-BGC is typically run with six PFTs (Wang et al. 2011) parameterized as described by White et al. (2000).

$$ V_{{{\text{c,max}}}} = N_{{\text{a}}} \times F_{{{\text{LNR}}}} \times 7.16 \times {\text{ACT}} $$
(10)

where 7.16 is the mass ratio of total Rubisco molecular mass to N in Rubisco (equivalent to F NR in CLM) and ACT is the specific activity of Rubisco, equivalent to α R25 in CLM and also derived from the same kinetic constants (Woodrow and Berry 1988). Leaf N a content is a function of the user-defined ratio of C:N and SLA as described above for CLM (Eq. 9).

The CLM includes a full prognostic N cycle, referred to as CLM–CN which introduces a down regulation of canopy photosynthesis based on the availability of mineral N to support new growth which impacts V c,max through changes in CN L. A prescribed PFT-specific N availability factor, f(N), was derived so that simulated photosynthetic rate was comparable to the realized rate when the CN module was active. This allows CLM to run with the biogeochemistry module (CN) turned off, but still represent the impact of N availability on photosynthesis (Oleson et al. 2010). Adjustment of standard CLM V c,max values with f(N) decreases V c,max by 17–40 %.

$$ V_{{{\text{c}},\hbox{max} - CN}} = V_{{{\text{c}},\hbox{max} }} \times f(N) $$
(11)

Optimization of resources

LPJ

The Lund-Potsdam-Jena model (Sitch et al. 2003) calculates GPP for 10 PFTs following the approach used in BIOME3 (a predecessor to Biome-BGC that uses a different approach to estimate V c,max), where GPP is calculated as a function of absorbed photosynthetically active radiation (APAR) based on an alternative version of the FvCB model (Haxeltine and Prentice 1996a). A fixed V c,max is not prescribed for each PFT but is calculated using an optimization algorithm which is described in detail elsewhere (Haxeltine and Prentice 1996b). Succinctly, V c,max is calculated to obtain the highest net CO2 uptake based on the tradeoff between the advantage of having a high Rubisco activity and the respiratory cost of maintaining it. A high net photosynthetic rate at a high APAR can be achieved by having a high V c,max, but the respiratory cost is relatively high at low APAR and low CO2 uptake. Thus for any APAR there is an optimal investment in Rubisco that produces the highest net photosynthetic rate. The algorithm also accounts for photoperiod, and leaf age in conifers (Haxeltine and Prentice 1996a, b; Sitch et al. 2003). LPJ also includes a global dynamic N model with full interaction with C, N and water cycles (Xu and Prentice 2008).
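The tradeoff described above can be illustrated with a toy calculation. This is not the Haxeltine and Prentice (1996b) algorithm: the smooth co-limitation curve, the parameter values (`theta`, `light_use`, `resp_frac`) and the grid search are all assumptions chosen only to show that an optimum V c,max exists and rises with APAR.

```python
import math


def net_uptake(vcmax, apar, theta=0.7, light_use=1.0, resp_frac=0.015):
    """Toy net CO2 uptake: co-limited gross uptake minus a respiratory cost.

    All parameters are hypothetical. Gross uptake is a smooth minimum of a
    light-limited rate (light_use * apar) and a Rubisco-limited rate (vcmax);
    respiration is taken as a fixed fraction of vcmax.
    """
    j_e = light_use * apar
    j_c = vcmax
    total = j_e + j_c
    gross = (total - math.sqrt(total ** 2 - 4.0 * theta * j_e * j_c)) / (2.0 * theta)
    return gross - resp_frac * vcmax


def optimal_vcmax(apar, step=0.1, upper=500.0):
    """Grid search for the vcmax that maximizes toy net uptake at a given APAR."""
    candidates = [i * step for i in range(int(upper / step) + 1)]
    return max(candidates, key=lambda v: net_uptake(v, apar))


opt_low_light = optimal_vcmax(10.0)
opt_high_light = optimal_vcmax(20.0)
```

In this toy setup the optimum scales in proportion to APAR, which mirrors the qualitative statement that each light level has its own optimal investment in Rubisco.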

IBIS

The Integrated Biosphere Simulator (IBIS) version 2 (Foley et al. 1996; Kucharik et al. 2000) prescribes constant values of V c,max at 15 °C for 12–15 PFTs. In the original IBIS, V c,max was estimated by predicting the maximum V c,max (without water stress or N limitation) possible that maintains an optimal balance between gross and maintenance respiration as described above for LPJ (Haxeltine and Prentice 1996b). This dynamic prediction of V c,max was dropped from IBIS-2 for the sake of simplicity. The values used in IBIS-2 are for upper canopy sunlit foliage. The source of these values is unclear. Because IBIS-2 did not use the optimization procedure implemented in IBIS (Haxeltine and Prentice 1996b) a new approach was adopted where the photosynthetic rate of upper canopy foliage was scaled in proportion to the APAR within it (Kucharik et al. 2000). In this study, the values of V c,max at 15 °C were adjusted to 25 °C for model comparison (Bernacchi et al. 2001).

ORCHIDEE

The Organizing Carbon and Hydrology in Dynamic Ecosystems model (Krinner et al. 2005) represents photosynthesis in a sub model, STOMATE (Saclay Toulouse Orsay Model for the Analysis of Terrestrial Ecosystems). ORCHIDEE has 12 PFTs which are parameterized following LPJ (Sitch et al. 2003), but the approach for parameterizing V c,max is different. Each PFT has a prescribed optimum photosynthesis temperature (T opt) and a corresponding V c,maxopt (unstressed V c,max measured at optimum temperature). The T opt for C 3 grasses and C 3 crops is calculated as a function of multiannual mean temperature for C 3 grasses to account for the presence of these PFTs in a large range of ecosystems. V c,maxopt is adjusted to account for changes in carboxylation capacity with leaf age and canopy position; neither adjustment is explicitly linked to leaf N content (Johnson and Thornley 1984; Ishida et al. 1999). The source of the values for V c,maxopt is unclear. In this study values for V c,maxopt were adjusted to 25 °C to facilitate comparison with other models (Bernacchi et al. 2001).

Model calibration

AVIM

The Atmosphere-Vegetation Interaction Model (AVIM) is the land carbon cycle component of the Beijing Climate Center model. The AVIM model calculates a temperature corrected V c,max, as a function of nitrogen concentration and soil moisture (Lu and Ji 2006).

$$ V_{{{\text{c,max}} - N_{{w_{s} }} }} = V_{{{\text{c,max}}}} \times f({\text{N}}) \times f(w_{s} ) $$
(12)

Calculation of V c,max in AVIM is based on CLM3, where V c,max-Nws is the value of V c,max following correction for N and water availability. Both f(N) and f(w s) are heuristic functions ranging from 0 to 1, where 1 is no limitation on V c,max due to nitrogen limitation or soil moisture availability, respectively (Ji 1995; Bonan 1996; Sellers et al. 1996; Lu and Ji 2006). In CLM3, V c,max values were obtained from published estimates (Wullschleger 1993; Kucharik et al. 2000; Oleson et al. 2004). Currently V c,max-Nws is not corrected for N availability, i.e., f(N) = 1, because estimates of V c,max already account for potential N limitation (Bonan 1996). The function adjusting V c,max for water availability, f(w s), is also set to 1, i.e., no limitation (Bonan 1996). Therefore both f(N) and f(w s) are essentially unused, and parameterization of AVIM is purely through a user-defined parameter. Although AVIM uses the same PFT definitions as CLM3 (Oleson et al. 2004) the values for V c,max were calibrated with remotely sensed estimates of GPP and through the Ecosystem Model-Data Intercomparison project, estimates of NPP (Ji, personal communication). This calibration adjusted CLM3 V c,max values by −15 to +353 %.

CTEM

The photosynthesis sub-module used in the Canadian Terrestrial Ecosystem Model (CTEM 1.0) is implemented as in the Simple Biosphere Model 2 (SiB2), where V c,max is an input parameter that is varied for different model applications (Sellers et al. 1996; Arora 2003; Arora and Boer 2010). When used in the Canadian Center for Climate Modeling and Analysis Earth System Model, CTEM is parameterized with nine PFTs. For CTEM 1.0 values for V c,max were derived from unspecified sources and then tuned to reproduce observed global spatial patterns in GPP. An updated parameterization (CTEM 1.1, Arora, personal communication) is based on published PFT-specific values of V c,max (Scholze et al. 2007; Kattge et al. 2009) which are subsequently calibrated to match site level or global estimates of productivity and NPP/GPP ratios (Luyssaert et al. 2007; Zhang et al. 2009; Beer et al. 2010). In most cases, calibration occurs within the standard deviations of the estimates for V c,max listed by Kattge et al. (2009).

Representation of V c,max by PFTs

Due to lack of clarity in model descriptions or the approach taken to model V c,max, PFT-specific values for V c,max were not readily available for all models, so where possible this information was obtained directly from the modeling groups. Ten PFT data sets were identified for V c,max (Table 1). Of these, a range (5–16) of PFTs was used to represent the terrestrial biosphere. However, in several models the same V c,max is used to represent a number of PFTs so the range of V c,max values is typically smaller. Figure 1 shows the range of V c,max values from the models in Table 1 conflated into 16 common PFT definitions. There is considerable variation between the values of V c,max used to represent a given PFT. Across all PFTs the average range of V c,max values was −46 to +77 % of the PFT mean. The range of V c,max values used to parameterize evergreen and deciduous trees in the tropics was particularly large, and only 4 models included a PFT for Arctic vegetation.

Table 1 Values of V c,max at 25 °C for the plant functional types (PFTs) listed in the models investigated
Fig. 1

The maximum rate of carboxylation by Rubisco (V c,max) for 16 plant functional types (PFTs) derived from the models described in Table 1. For a given PFT a value for V c,max was assigned from each model based on the similarity to the original PFT description. Where a PFT is delineated to a greater extent than in the original model the most appropriate V c,max value is repeated for that PFT division. Where no V c,max value was assigned to a PFT, no data were included from that model. BETHY classified tropical trees based on soil type (Table 1). The mean V c,max for oxisols and non-oxisols from BETHY was used to represent both tropical PFTs in this figure. Values for CLM were for CLM–CN and values for CTEM were for CTEM 1.0. PFT abbreviations; N needle leaf, B broad leaf, E evergreen, D deciduous, T tree, S shrub

Model parameters and assumptions

The models outlined above have both PFT-specific parameters and fixed parameters, in the algorithms used to estimate V c,max. Some also include PFT-specific coefficients that are used to adjust fixed parameters. The aim of the next section is to examine the assumptions underlying the values that are used for fixed parameters, and where possible, offer suggestions for improvements.

Leaf traits

Many of the models include leaf traits in their estimation of V c,max. These traits have traditionally been obtained from the literature or from private data bases. The TRY database now offers a central portal through which to access global plant trait data from many sources (Kattge et al. 2011). As awareness of, confidence in, and data submission to TRY grow, it is hoped that ESMs can use the large amount of trait data to constrain input parameters, or at least agree to use the same numbers. Improved parameterization of a leaf level model using trait databases has been shown to constrain estimates of photosynthesis, which suggests that this approach could also be used effectively to parameterize ESMs (Ziehn et al. 2011). In addition, well-established correlations among leaf traits, including SLA and N a, can be used to markedly reduce model uncertainty when multitrait covariance is incorporated into the models (Wang et al. 2012).

Currently different sources of trait data are used, resulting in variation among model input. For example, leaf C content (C m, %) is a fixed parameter in some models but varies from 40 to 50 % (JULES, O–CN, Biome-BGC). The variation in N a is also substantial. The six models reporting sufficient information to calculate N a for two commonly defined PFTs show that for temperate broadleaf deciduous trees the PFT defined N a ranged from 1.03 to 1.73 g m−2 and in C 3 grasses ranged from 0.82 to 1.74 g m−2 (BETHY, Biome-BGC, CLM, Hybrid, JULES, O–CN).

Metadata associated with traits will be important for accurate scaling and modeling. For example N a may be derived from an entire leaf, a lamina, or a lamina section. When linking N a to a photosynthetic parameter the most appropriate measurement of N a will come from the section of leaf enclosed by the gas exchange cuvette, which may not include large veins. Scaling this relationship to the leaf in the most accurate way would require estimates of N a based on the entire lamina, whereas modeling construction costs of foliage would require the petiole to be included in the estimate. The parameter V c,max is subject to even more confounding variables, including photoperiod length, time of day, day of year, the measurement protocol, subsequent modeling approaches, and temperature corrections (Bernacchi et al. 2001; Wilson et al. 2000; Long and Bernacchi 2003; Xu and Baldocchi 2003; Ethier and Livingston 2004; Gu et al. 2010; Bauerle et al. 2012; Smith and Dukes 2012). This poses a real challenge to databases. A possible solution to the problem of post measurement variation in modeling V c,max was recently offered by Gu et al. (2010). They established a service-in-exchange-for-data-sharing website where they provide analysis of leaf gas exchange measurements and in return the gas exchange data used to compute V c,max are stored and made freely available to the community. In time this project may offer the chance to re-compute large data sets using common protocols.

The fraction of leaf N invested in Rubisco (F LNR)

Of central importance to modeling photosynthesis and linking that estimation to the N cycle is the link between V c,max and leaf N content. In CLM and Biome-BGC, the PFT-specific parameter F LNR (Eq. 8 and 10) sets potential rates of carboxylation as a function of leaf N content, and is a dominant control on photosynthesis. A recent sensitivity analysis of 80 CLM input parameters identified F LNR as the second most important parameter influencing model output; the most important parameter was CN L (Eq. 9), also a key input for the estimation of V c,max (Sargsyan et al. 2013). The disparity between the model representation and observation of this parameter is significant. The combination of the very low turnover rate (k cat) of Rubisco and the wasteful oxygenation reaction means that plants must make a major investment in this inefficient enzyme (Ainsworth and Rogers 2007). Values for F LNR estimated from C 3 crops are c. 20 % (Evans and Seemann 1984; Mitchell et al. 2000; Leakey et al. 2009). However, the values for F LNR used in ESMs are typically less than half the observed value (Fig. 2). This suggests that the lower estimates of V c,max (Fig. 1) for CLM and Biome-BGC may be driven by the low values for PFT-specific F LNR (Fig. 2). For some PFTs the estimate used for F LNR is arbitrary (Biome-BGC, CLM). Clearly, more data are required to improve confidence in prescribed F LNR values. Calculation of F LNR also revealed that the value used by BETHY to represent C 3 crops closely matches the widely reported value of c. 20 % (Fig. 3). This suggests that application of PFT-specific values of F LNR derived from BETHY in models that estimate V c,max mechanistically from N a (Biome-BGC, CLM) would markedly increase V c,max in these models, and may offer a better source of parameterization than current sources.

Fig. 2

The fraction of leaf N invested in Rubisco (F LNR, %) for 16 Plant Functional Types (PFTs). F LNR was either provided in the model description (Biome-BGC, CLM) or was calculated from V c,max and N a following Eq. 8 and 13 assuming FNR% = 16.07 and αR25 = 47.34 µmol CO2 g−1 Rubisco s−1. For a given PFT a value for F LNR was assigned from each model based on the similarity to the original PFT description. Where a PFT is delineated to a greater extent than in the original model the most appropriate F LNR value is repeated for that PFT division. Where no F LNR value was assigned to a PFT, no data were included from that model. BETHY classified tropical trees based on soil type (Table 1). The mean of the F LNR for oxisols and non-oxisols calculated for BETHY was used to represent both tropical PFTs in this figure. PFT abbreviations; N needle leaf, B broad leaf, E evergreen, D deciduous, T tree, S shrub

Fig. 3

The fraction of leaf N invested in Rubisco (F LNR, %) as a function of leaf N content (N a, g m−2) for the Plant Functional Type–Temperate broad leaf deciduous trees. It was possible to calculate the response of F LNR to N a for six models based on V c,max and N a following Eq. 8 and 13 assuming FNR% = 16.07 and αR25 = 47.34 µmol CO2 g−1 Rubisco s−1. Biome-BGC and CLM had fixed prescribed parameters for F LNR; Hybrid and JULES did not prescribe F LNR, but F LNR did not vary with N a

Biome-BGC, CLM, Hybrid, and JULES currently assume that the amount of N invested in Rubisco does not change with N a and fix that parameter for each PFT, either directly or through PFT-specific N a and SLA values. However, there is considerable evidence that this is not the case when variation in F LNR with N a is examined in a single species (Wong 1979; Evans 1989; Makino et al. 1992; Theobald et al. 1998). For example, Evans (1989) showed that in spinach the investment in Rubisco increased from 10 to 19 % as N a increased from 1.05 to 2.80 g m−2. The response of F LNR to N a in temperate broad leaf deciduous trees is shown for the models where it was possible to calculate this response (Fig. 3). O–CN uses the work of Evans (1989) to adjust F LNR as a function of N a (Eq. 3 and 4). Because the linear relationships used in BETHY (Ziehn et al. 2011) have positive i v values (Eq. 1) BETHY decreases F LNR as N a rises (Fig. 3). Experiments designed to elucidate PFT-specific relationships between F LNR and N a would provide valuable data for estimating V c,max, particularly at low N a, where the estimation of V c,max from N a is markedly impacted by the intercept of these relationships. One approach used to determine F LNR requires estimates of V c,max from leaf level gas exchange and determination of N a in the same tissue. Fixed parameters are then used to calculate F LNR (Eq. 8) as described previously (Leakey et al. 2009). Alternatively, Rubisco can be extracted from leaf tissue and the activity or amount determined biochemically and compared with total leaf protein (Evans 1989; Makino et al. 1992).
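The reason a positive intercept makes F LNR fall as N a rises follows from inverting Eq. 8 for a linear V c,max relationship: F LNR = (i v / N a + s v) / (F NR × α R25). A minimal sketch of that inversion is given below, using the F NR% and α R25 values quoted in the captions of Figs. 2 and 3. The intercept and slope in the example are hypothetical, not BETHY parameters, and the function names are invented here.

```python
F_NR = 100.0 / 16.07   # g Rubisco per g N in Rubisco (Eq. 13, F_NR% = 16.07)
ALPHA_R25 = 47.34      # umol CO2 g-1 Rubisco s-1, as used for Figs. 2 and 3


def f_lnr_from_vcmax(vcmax, n_a):
    """Invert Eq. 8: fraction of leaf N in Rubisco from Vcmax and N_a."""
    return vcmax / (n_a * F_NR * ALPHA_R25)


def f_lnr_linear_model(n_a, i_v, s_v):
    """Implied F_LNR when Vcmax follows the linear form of Eq. 1."""
    return f_lnr_from_vcmax(i_v + s_v * n_a, n_a)


# Hypothetical intercept and slope; a positive i_v gives a declining F_LNR.
implied = [f_lnr_linear_model(n_a, 5.0, 30.0) for n_a in (1.0, 2.0, 3.0)]
```

Setting the intercept to zero in the same function gives a constant F LNR, which corresponds to the behaviour of models that scale V c,max in direct proportion to N a.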

The fraction of N in the Rubisco holoenzyme (F NR)

The parameter F NR is more commonly expressed as the percentage of N in the Rubisco molecule relative to the total molecular mass of the holoenzyme, termed F NR% here. In CLM and Biome-BGC, F NR% is 13.96 % (Kuehn and McFadden 1969; Thornton and Zimmermann 2007).

$$ F_{{{\text{NR}}\% }} = \frac{100}{{F_{\text{NR}} }} $$
(13)

Given that Rubisco is highly conserved, there is a surprising range (13.96–16.36 %) of reported values for F NR% (Steer et al. 1968; Evans and Seemann 1984; Niinemets and Tenhunen 1997; Thornton and Zimmermann 2007). Estimation of F NR% is possible based on the amino acid composition of form I (L8S8) of the mature holoenzyme and well-documented post-translational modification that includes methylation and N-terminal processing (Spreitzer and Salvucci 2002; Houtz and Portis 2003). The amino acid sequences of the large (accession NP_054944.1, NCBI) and small (P00870, UniProtKB/Swiss-Prot) subunits of Spinacia oleracea (Martin 1979; Schmitz-Linneweber et al. 2001) were used here to calculate an F NR% of 16.07 %. Substituting 16.07 for 13.96 in Eq. 8 and 10 would decrease the estimated V c,max in CLM and Biome-BGC by c. 13 %. It is possible that the difference between the estimate provided here and that used in CLM and Biome-BGC is due to subtraction of the mass associated with the formation of peptide bonds, which appears to have been omitted in the CLM and Biome-BGC estimate.
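The c. 13 % figure follows directly from Eq. 13. The short sketch below reproduces it under the assumption that V c,max is directly proportional to F NR in Eq. 8 and 10 and that all other parameters are held constant; the function names are illustrative only.

```python
def f_nr(f_nr_percent):
    """Eq. 13 rearranged: g Rubisco per g N in Rubisco."""
    return 100.0 / f_nr_percent


def vcmax_change_from_fnr(old_percent, new_percent):
    """Fractional change in V c,max when only F NR% changes.

    Assumes V c,max scales linearly with F NR.
    """
    return f_nr(new_percent) / f_nr(old_percent) - 1.0


change = vcmax_change_from_fnr(13.96, 16.07)
print(f"Change in Vcmax: {100 * change:.1f} %")  # about -13 %
```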

Specific activity

Although CLM (α R25) and Biome-BGC (ACT) use a fixed value for specific activity, it can be calculated from the k cat, the number of available active sites in Rubisco (S R), and the molecular mass of the holoenzyme (M R, g mol−1). Implicit in this calculation is the assumption that Rubisco is fully activated, i.e., S R = 8 active sites, but see below.

$$ \alpha_{\text{R25}} = \frac{{k_{\text{cat}} \times S_{\text{R}} }}{{M_{\text{R}} }} $$
(14)

The specific activity cited in CLM and Biome-BGC is 60 μmol CO2 g−1 Rubisco s−1. The molecular mass and activation state of Rubisco are not quoted in the technical descriptions (Biome-BGC 2010; Oleson et al. 2010), the supporting reference (Thornton and Zimmermann 2007) or the available proximal reference (Woodrow and Berry 1988). However, the k cat is listed as 3.3 s−1 site−1 (Woodrow and Berry 1988). Based on this k cat, and the assumption that the specific activity was estimated with a saturating CO2 concentration, a fully activated Rubisco with a molecular mass of 575 kDa (see below) would result in a specific activity of only 46 μmol CO2 g−1 Rubisco s−1. Holding all other variables constant, and assuming full activation of Rubisco, substituting this value into Eq. 8 and 10 would result in an estimate of V c,max that is 23 % lower than the value used by CLM and Biome-BGC.
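The arithmetic behind these two numbers can be checked with Eq. 14. In the sketch below the factor of 10^6 converts mol to µmol, and the comparison with the CLM and Biome-BGC value assumes that V c,max scales linearly with specific activity.

```python
def specific_activity(k_cat, active_sites, molar_mass):
    """Eq. 14 in umol CO2 g-1 Rubisco s-1.

    k_cat        : s-1 site-1
    active_sites : sites per holoenzyme (8 if fully activated)
    molar_mass   : g mol-1
    """
    return k_cat * active_sites / molar_mass * 1e6


alpha = specific_activity(k_cat=3.3, active_sites=8, molar_mass=575_000)
print(f"alpha_R25 = {alpha:.0f} umol CO2 g-1 Rubisco s-1")   # 46
print(f"Relative to 60: {100 * (alpha / 60.0 - 1):.0f} %")    # -23 %
```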

Molecular mass of the Rubisco holoenzyme (M R )

There is considerable variation in the molecular mass of the holoenzyme that results from variation in reported values for both the large subunit (Lsu) and small subunit (Ssu) of Rubisco (Andersson and Backlund 2008). The mass of the Lsu (56,629 Da) and Ssu (15,193 Da) calculated from the Spinacia oleracea amino acid sequence (above) results in a holoenzyme that is 574,575 Da, which is just under 3 % higher than the commonly cited 560 kDa estimate for land plants and green algae (Spreitzer and Salvucci 2002). The mass estimated from the sequence is probably within the margin of error (6 or 7 residues per subunit) when estimating the molecular mass of the Lsu and Ssu using SDS-PAGE approaches. Equations 8 and 14 show that projected V c,max is inversely proportional to M R, so a difference in M R of less than 3 % has only a small impact on V c,max.
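As a quick check, the holoenzyme mass can be rebuilt from the subunit masses for the L8S8 form. The rounded subunit masses quoted above sum to within 1 Da of the reported 574,575 Da; the small difference is presumably rounding. The last line assumes only the inverse proportionality between V c,max and M R stated above.

```python
LSU_DA = 56_629   # large subunit, Da
SSU_DA = 15_193   # small subunit, Da


def holoenzyme_mass(lsu=LSU_DA, ssu=SSU_DA, copies=8):
    """Mass of the form I (L8S8) holoenzyme in Da."""
    return copies * (lsu + ssu)


m_r = holoenzyme_mass()
print(f"M_R = {m_r} Da")
print(f"Above 560 kDa by {100 * (m_r / 560_000 - 1):.1f} %")
# Vcmax is proportional to 1 / M_R, so the effect on Vcmax is:
print(f"Change in Vcmax: {100 * (560_000 / m_r - 1):.1f} %")
```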

k cat

The k cat of Rubisco from higher plants has been reported to range from 2.5 to 5.4 s−1 (Tcherkez et al. 2006). Values differ between species, habitats, techniques, and laboratories (Makino et al. 1988; von Caemmerer et al. 1994; Sage 2002; Tcherkez et al. 2006; Kubien et al. 2008; Cousins et al. 2010). In the absence of robust PFT-specific parameterization of k cat, ESMs need a well-constrained estimate for broad application. Recent in vitro measurements have begun to converge (Kubien et al. 2008; Whitney et al. 2009; Cousins et al. 2010) and closely match earlier estimates made in vivo using antisense-Ssu transgenic tobacco (von Caemmerer et al. 1994). While PFT-specific k cat values are not yet available, it is clear that C 3 and C 4 specific values for k cat should be considered for implementation in ESMs, although uncertainty surrounding the C 4 k cat values is greater than that for C 3 species (Sage 2002; Kubien and Sage 2004; Kubien et al. 2008). Current recommended values for the k cat in C 3 and C 4 species are 3.4 and 4.4 s−1, respectively (von Caemmerer et al. 1994; Kubien et al. 2008; Whitney et al. 2009; Cousins et al. 2010). Estimation of PFT-specific k cat, and other kinetic constants, is a much-needed area of active research, and attention should be paid to this emerging literature. Equations 8 and 14 show that estimated V c,max is directly proportional to k cat. Unlike purely empirical models, the approach taken by CLM and Biome-BGC allows new PFT-specific data on enzyme kinetics to be readily incorporated into the models as they become available.

Activation (S R)

Rubisco must be activated through reversible carbamylation of a lysine residue and binding of Mg2+. Rubisco is usually fully active and carbamylated at current [CO2], steady-state saturating light, and optimum temperatures (von Caemmerer and Quick 2000; Portis 2003). Current ESMs assume constant and high activation states. However, for many PFTs, especially those with a high leaf area index in biomes where rising temperatures will push operating temperatures above thermal optima, this assumption may not be valid. The activation state of Rubisco is not incorporated into current ESMs, but could be, as described previously (Sage et al. 2008). PFT-specific estimates of activation for a given set of environmental conditions can be readily determined following rapid extraction of the enzyme and comparison of initial and fully activated enzyme activity (Rogers et al. 2001). Activation is important to represent in models where V c,max is mechanistically linked to N acquisition because the impact on N availability could be significant. For example, if Rubisco were 80 % activated, it would require a 25 % increase in N a to match the V c,max obtained if Rubisco were fully activated (Eq. 8 and 14).
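The 25 % figure is the reciprocal of the activation state. The sketch below assumes that V c,max is proportional to the product of N a and the fraction of active sites (S R / 8), with F LNR and all other parameters unchanged; the function name is illustrative.

```python
def extra_leaf_n_required(activation):
    """Fractional increase in N a needed to match the V c,max of
    fully activated Rubisco, given an activation state between 0 and 1.

    Assumes V c,max is proportional to N a * activation.
    """
    if not 0.0 < activation <= 1.0:
        raise ValueError("activation must be in (0, 1]")
    return 1.0 / activation - 1.0


print(f"{100 * extra_leaf_n_required(0.8):.0f} % more leaf N")  # 25 %
```

The relationship is non-linear: at 50 % activation, under the same assumption, the required N a would double.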

Model calibration

Model calibration, or to use the more provocative term, tuning, is an approach that modeling groups use to match model outputs with site-specific or remotely sensed measurements of NPP or GPP (AVIM, CTEM). This makes sense because the models need to be able to reproduce current C stocks and fluxes if we are to have any confidence in their ability to project future responses. V c,max is an excellent parameter to use for tuning because the impact on GPP and NPP is direct and tightly coupled to model output (Sargsyan et al. 2013). Therefore, relatively simple tuning of V c,max to match observed NPP or GPP can be used to compensate for deficiencies in other areas of the model (Bonan et al. 2011). Some models set V c,max directly through tuning exercises (AVIM, CTEM) and others adjust best estimates of V c,max through subsequent tuning of V c,max (CTEM 1.1) or the coefficients used to adjust it (Hybrid, n 2; JULES, n l). A major problem of tuning the models using V c,max is that the response of photosynthesis to rising CO2 concentration is determined largely by the investment plants make in Rubisco. Plants with a larger investment in Rubisco, and a large V c,max, typically have a higher photosynthetic rate but are less responsive to rising CO2 concentration because photosynthesis becomes limited by the capacity to regenerate ribulose-1,5-bisphosphate at a lower [CO2] than plants with a smaller V c,max. This is why trees typically show a greater percent stimulation in photosynthesis at elevated [CO2] when compared with C 3 crops (Ainsworth and Rogers 2007). Since a principal goal of ESMs is the projection of future [CO2], tuning models with a parameter that directly impacts the CO2 responsiveness of the future terrestrial biosphere needs careful consideration. Of course, it is possible that current plant trait data and model representation of plant physiology are not sufficient to provide accurate model outputs without tuning.

Conclusion

The ESMs surveyed in this study seek to represent the CO2 uptake of the same biomes using similarly, and in some cases identically, defined PFTs, yet the variation in V c,max between the different models is substantial. This is unacceptable given the currently available resources and the critical role that V c,max plays in determining global C flux. ESMs need to take greater advantage of plant trait databases, e.g., TRY (Kattge et al. 2011), to constrain the PFT-specific estimates used for key parameters such as SLA, CN L, N a, C m, F LNR, and, where relevant, V c,max directly, and to do that using parameterization approaches that minimize the uncertainty in model outputs (Kattge et al. 2009; Ziehn et al. 2011; Alton 2011; Wang et al. 2012). We also need to continue to expand plant trait databases to reduce uncertainty in existing PFT parameterization, which can be large. This is particularly important for PFTs in under-represented biomes, biomes that dominate global C fluxes, or those that are particularly vulnerable to global change, such as PFTs in Arctic and Tropical ecosystems. In addition to identifying the approaches used to estimate V c,max, and the wide range of resulting values, this study also identified some alternatives to the fixed parameters used by CLM and Biome-BGC. If the suggested changes in k cat, F NR and M R were implemented, they would collectively reduce estimated V c,max in these models by 31 % for C 3 species and 11 % for C 4 species. These reductions in V c,max may be offset if potential increases in PFT-specific F LNR are also implemented.
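The combined 31 % and 11 % reductions can be reproduced from the individual parameter changes. The sketch below is a check on the arithmetic rather than model code: it assumes that V c,max in CLM and Biome-BGC is proportional to F NR × α R25 (Eq. 8 and 10), that Rubisco is fully activated (S R = 8), and that N a and F LNR are unchanged.

```python
CLM_ALPHA_R25 = 60.0     # umol CO2 g-1 Rubisco s-1 (CLM, Biome-BGC)
CLM_FNR_PERCENT = 13.96  # % N in Rubisco (CLM, Biome-BGC)

NEW_FNR_PERCENT = 16.07  # this study
NEW_M_R = 574_575.0      # g mol-1, this study
ACTIVE_SITES = 8         # assumes full activation
NEW_K_CAT = {"C3": 3.4, "C4": 4.4}  # s-1, recommended values


def combined_vcmax_change(k_cat):
    """Fractional change in V c,max after updating k cat, F NR and M R."""
    alpha_new = k_cat * ACTIVE_SITES / NEW_M_R * 1e6       # Eq. 14
    alpha_ratio = alpha_new / CLM_ALPHA_R25
    fnr_ratio = CLM_FNR_PERCENT / NEW_FNR_PERCENT          # from Eq. 13
    return alpha_ratio * fnr_ratio - 1.0


for pathway, k_cat in NEW_K_CAT.items():
    print(f"{pathway}: {100 * combined_vcmax_change(k_cat):.0f} %")
```

For C 4 species the higher k cat slightly raises specific activity relative to the CLM value, so the net reduction is driven almost entirely by the change in F NR.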

The range of PFTs and their physiological characterization should also be expanded to enable more accurate and dynamic representation of plant communities and their response to global change. In particular, variation in V c,max due to long-term acclimation to growth at rising temperature and [CO2] is currently absent from most ESMs (Smith and Dukes 2012). Because the movement of CO2 from the atmosphere to the chloroplast plays a major role in determining CO2 responsiveness, it also will be critical to improve understanding and model representation of the limitation on photosynthesis imposed by stomatal and mesophyll conductance. In short, as increasing computing power allows it, we need to expand the representation of plant physiology in ESMs.