1.1 Introduction

Predicting the behaviour of a given genotype under contrasting environments, or the phenotypes of contrasting genotypes under one environment, is difficult for several reasons. First, small genome variations may have large impacts on the phenotype through cascade effects on the processes underlying the observed traits. Second, strong genotype-by-environment (G × E) interactions occur for most plant traits. During the last decades, many process-based models have been developed to predict crop yield or quality under fluctuating environments (Martre et al. 2011). These models usually describe the temporal variations of the main processes involved in final traits, as well as their interactions and their responses to environmental variations or cultural practices. Yet most of the processes involved in crop yield or quality also depend on the genetic makeup, with strong G × E interactions (e.g., Kromdijk et al. 2014; Prudent et al. 2010). As a result, model parameters are usually specific to one genotype, which restricts the validity range of the model itself. To overcome this limitation, several authors have tried to account for genetic control in plant models. This requires identifying the genotype-dependent model parameters and quantifying them as functions of combinations of Quantitative Trait Loci (QTL; QTL-based models) or of alleles or genes (gene-based models) involved in the modelled process. The complexity of the genetic control accounted for in models is usually inversely related to the complexity of the modelled system: integrating the effects of many genes is possible at the cell or organ level, more rarely at the plant level. However, methods developed at the lower levels of organization may teach us how to proceed through higher levels. Plant complexity cannot be described in one big multi-scale model. Instead we should now focus on (i) tracing main hubs at the lower levels of organization; (ii) quantifying their effects at the higher levels of organization; (iii) refining the plant models so that model parameters can be linked to physiological components. This will be a real challenge for the future, owing to the polygenic control of most of the variables and processes involved in crop yield and product quality and to the strong genotype-by-environment interactions. Success will rely on advances in the understanding of the genetic control of the target traits and on our ability to phenotype large populations under contrasting environments at the process level.

The overall objective of this chapter is to provide an overview of the integration of genetic control in plant and crop models at different levels of organization, from gene networks to cell, organ and plant.

The chapter is organized as follows: the fundamentals of quantitative genetics of complex traits are first introduced, with special attention to methods for QTL cartography and QTL genetic modelling. The integration of genetic control within ecophysiological models is then discussed. Classical approaches are introduced first; they specify model parameters as a function of QTL or gene effects using simple empirical relations. The advantage of these approaches lies in their flexibility and in the possibility of describing experimental data even in the absence of a clear understanding of the underlying biological mechanisms.

The last part of the chapter is devoted to emerging multi-scale models, which explicitly integrate the description of molecular processes with a broader view of plant physiology and development. In this perspective, Sect. 1.4 reviews a few selected methods from systems biology that can be used to describe the behaviour of cellular networks. Model simplification and coupling among different organizational scales is the topic of Sect. 1.5.

1.2 Fundamentals in Quantitative Genetics

Here is a list of definitions that may help the reader to follow the subsequent paragraphs.

Allele

Functional form of a gene

Locus

Particular site of a gene or DNA sequence on a chromosome

Homozygosity

In diploid plants, the two alleles at a locus are identical and therefore have the same phenotypic effect

Heterozygosity

The two alleles at a locus differ, each having a different phenotypic effect

Additive allele effect (a)

In a population, a is half of the difference between the trait value (Y) of the mean homozygous genotypes for one parental allele (mm) and for the other parental allele (MM):

$$ a=\left(Y_{mm}-Y_{MM}\right)/2 $$

Dominance effect

Accounts for the interaction between alleles at each locus. This term is non-null when heterozygotes (Mm) are not exact midpoints of the homozygous parents. It is the deviation between the mean of the heterozygotes and the half-sum of the homozygotes

Epistasis

The alleles at one locus change the phenotypic effects of genetic variation at another locus

Plasticity

A single genotype gives rise to a diversity of phenotypes, depending on specific environmental conditions or lifetime

Polygeny

Many genes contribute to a particular phenotypic character

Pleiotropy

Different phenotypic characters are affected by a single genetic variation

Transgressive individuals

An individual has a phenotype more extreme than the phenotype displayed by the two parents
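
As a worked illustration of the additive and dominance entries above, both effects can be computed from the mean trait values of the three genotype classes at a locus. The symbol d for the dominance effect, the notation Mm for the heterozygote class and the numerical values are assumptions introduced here for illustration only.

$$ d={Y}_{Mm}-\frac{Y_{mm}+Y_{MM}}{2} $$

For hypothetical class means $Y_{mm}=12$, $Y_{Mm}=11$ and $Y_{MM}=8$, the additive effect is $a=(12-8)/2=2$ and the dominance effect is $d=11-(12+8)/2=1$. Because $d$ is non-null, the heterozygotes do not sit at the midpoint of the two homozygous classes.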

1.2.1 Quantitative Traits Controlled by Numerous Genes

One of the fundamental ideas of quantitative genetics, as defined by Fisher in 1918, is that the phenotypic value P of an individual is the sum of that individual’s genotypic value G and its environmental value E: P = G + E. The genotypic value is the combined effect of all the genetic effects, including nuclear and mitochondrial genes, and of the interactions between the genes. The average phenotypic outcome may be affected by dominance and by how genes interact with genes at other loci.

In the case of clones, G corresponds to the average phenotypic value over all replicates of a clone, whereas the environmental value E is inferred from the difference between the phenotype of each replicate and G. However, G and E can also be estimated for other types of related individuals. This is the common case, in which G values differ between individuals and depend on their relatedness. To account for this, the genotypic value is decomposed as:

$$ G=A+D+I $$

where A represents the contribution to the character from the effects of individual alleles, D is the contribution from the interaction between these alleles, and I represents the contribution from interactions between different loci.
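
For the clone case mentioned above, the decomposition P = G + E can be sketched in a few lines. This is a minimal illustration, assuming that replicated measurements of each clone are available; the function name `decompose_clones` and the data are hypothetical.

```python
from statistics import mean


def decompose_clones(phenotypes_by_clone):
    """Split replicated clone phenotypes into G (clone mean) and E (deviations).

    phenotypes_by_clone maps a clone name to the list of phenotypic values P
    measured on its replicates. For each clone, G is estimated as the mean of
    the replicates and E as the deviation of each replicate from G.
    """
    result = {}
    for clone, values in phenotypes_by_clone.items():
        g = mean(values)
        result[clone] = (g, [p - g for p in values])
    return result


# Hypothetical data: three replicates of two clones.
data = {"clone_1": [10.0, 12.0, 14.0], "clone_2": [20.0, 21.0, 22.0]}
estimates = decompose_clones(data)
```

By construction the environmental deviations of each clone sum to zero, so this sketch cannot separate G from any environmental effect shared by all replicates of a clone.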

In summary, the genetic architecture of complex traits involves the action of genes at single loci, but also inter-locus interactions and gene × environment interactions.

Quantitative genetics developed first from the analysis of characters with discrete variations. The determinism of such characters often proved to be monogenic, characterized by a wild-type phenotype and a mutant phenotype. In most cases, those phenotypes were caused by a major alteration of a single gene. However, such alterations are quite rare, since they are counter-selected because of their large phenotypic effect.

Many characters instead show continuous variations in populations. Those characters are under polygenic control: many loci, called QTL (Quantitative Trait Loci), are responsible for the observed variations. Their nature may not differ from that of the loci responsible for discrete variations; the main difference lies in the moderate effect of each locus. Most of the traits of interest are controlled by multiple interacting genes. As a result, the huge progress in gene discovery has so far only weakly aided genetic selection (Miflin 2000; Sinclair et al. 2004). For instance, in tomato (Solanum lycopersicum L.) fruit, more than 100 genes located in 16 regions of the genome are associated with fruit composition, mainly sugar and acid contents (Causse et al. 2004; Bermudez et al. 2008). Consequently, each QTL for a given trait usually explains only a small proportion of the observed trait variation. Moreover, most of these QTL depend on the environment (QTL × E) and on the genetic background (QTL × QTL) (Börner et al. 1993; Blanco et al. 2002; Chaïb et al. 2006; Causse et al. 2007; Dudley et al. 2007).

Current technological progress and recent advances in genetic analyses make it possible to estimate more and more precisely the individual effect of each detected locus, its location on the genome and its potential interactions with other loci. This information is valuable for building a model that predicts the value of an individual carrying a given combination of alleles.

1.2.2 Principles and Methods of QTL Cartography

Mapping QTL is based on the systematic search for associations between marker loci and the quantitative traits (Kearsey 1998). The prerequisites for QTL mapping are:

  • create a segregating progeny. Best efficiency is reached in case of crosses between inbred lines.

  • for each individual of the progeny, get the genotype of a set of marker loci distributed along the genome, so as to build a genetic map of the progeny.

  • get the value of the studied trait for each individual of the progeny.

  • apply biometric methods to detect an association between the marker genotype scores and the value of the measured trait, and estimate the genetic parameters of the detected QTL.

In the last decades, the tremendous advances in molecular genetics have greatly facilitated the genetic analysis of quantitative traits. More recently, the use of markers based on single nucleotide polymorphisms (SNPs) has rapidly increased in plant genetics, owing to their abundance in genomes and to the possibility of high-throughput detection (Mammadov et al. 2012). It is now fairly routine to locate highly polymorphic marker loci that span the genome. Consequently, the major challenge is now the phenotypic analysis of the genetic variability (Houle et al. 2010), which requires simultaneous analyses of hundreds to thousands of plants. To address this difficulty, phenotyping platforms allowing fine environmental control (Tisné et al. 2013; Granier et al. 2006) or field characterization (Andrade-Sanchez et al. 2013) are emerging.

The simplest method to detect QTL is to consider the molecular markers independently: a difference in the mean trait value between marker genotypes is sought. However, if recombination can occur between the marker and the QTL, the strength of the marker-trait association decreases as the recombination fraction increases. Thus, a weak association can result either from tight linkage to a QTL of small effect or from loose linkage to a QTL of major effect. To distinguish between these cases, different statistical approaches have been used that estimate both QTL effects and their map positions.
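
The confounding between QTL effect and marker-QTL distance described above can be illustrated with a short sketch. It assumes a purely additive QTL in a population with two homozygous marker classes, for which the expected apparent effect at the marker is a(1 − 2r), with r the recombination fraction between marker and QTL. The function names and the data are hypothetical.

```python
from statistics import mean


def marker_contrast(trait, marker):
    """Single-marker estimate of the additive effect.

    Returns half the difference between the mean trait values of the two
    homozygous marker classes, following a = (Y_mm - Y_MM) / 2.
    """
    y_mm = [y for y, g in zip(trait, marker) if g == "mm"]
    y_MM = [y for y, g in zip(trait, marker) if g == "MM"]
    return (mean(y_mm) - mean(y_MM)) / 2


def apparent_effect(a, r):
    """Expected effect seen at a marker located at recombination fraction r
    from an additive QTL of true effect a (assumption: additive action only)."""
    return a * (1 - 2 * r)


# Hypothetical progeny: four individuals scored at one marker.
estimate = marker_contrast([10.0, 12.0, 20.0, 22.0], ["MM", "MM", "mm", "mm"])

# A small QTL tightly linked to the marker and a large QTL loosely linked
# to it give the same expected marker contrast.
tight_small = apparent_effect(a=1.0, r=0.05)
loose_large = apparent_effect(a=3.0, r=0.35)
```

Both calls to `apparent_effect` return approximately 0.9, which is why a single marker contrast cannot separate the size of a QTL from its distance to the marker.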

QTL identification is based on a likelihood ratio: the likelihood of the data assuming genetic linkage between a marker and a QTL, divided by the likelihood of the data assuming no linkage. The base-10 logarithm of this ratio is called the LOD (logarithm of the odds) score. A LOD score of 3 or greater is usually considered as statistically significant evidence for linkage between a marker and a QTL. However, different methods are available to calculate a genome-wide significance threshold, from permutation tests to Bayesian approaches.
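
The LOD score and a permutation-based threshold can be sketched as follows. This is a minimal illustration, assuming a single-marker regression with normally distributed residuals, in which case the likelihood ratio reduces to a ratio of residual sums of squares and LOD = (n/2) log10(RSS0/RSS1). The function names, the number of permutations and the data are hypothetical.

```python
import math
import random


def rss(values, groups=None):
    """Residual sum of squares around the grand mean (groups=None)
    or around the mean of each marker class."""
    if groups is None:
        groups = [0] * len(values)
    total = 0.0
    for g in set(groups):
        sub = [v for v, k in zip(values, groups) if k == g]
        m = sum(sub) / len(sub)
        total += sum((v - m) ** 2 for v in sub)
    return total


def lod_score(trait, marker):
    """LOD of the 'marker effect' model against the 'no effect' model."""
    n = len(trait)
    return (n / 2) * math.log10(rss(trait) / rss(trait, marker))


def permutation_threshold(trait, markers, n_perm=200, alpha=0.05, seed=1):
    """Genome-wide threshold: upper quantile of the maximum LOD obtained
    after shuffling trait values among individuals."""
    rng = random.Random(seed)
    shuffled = list(trait)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(lod_score(shuffled, m) for m in markers))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]


# Hypothetical data: ten individuals, two markers coded 0/1.
TRAIT = [10.1, 9.8, 10.5, 9.9, 10.2, 12.0, 12.4, 11.8, 12.1, 12.3]
LINKED = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
UNLINKED = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
threshold = permutation_threshold(TRAIT, [LINKED, UNLINKED])
```

Shuffling the trait values breaks any marker-trait association while keeping the marker structure intact, so the distribution of the maximum LOD over markers approximates what would be expected by chance across the whole map.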

A number of statistical methods have been developed for mapping QTL, from marker-by-marker analysis (variance analysis, Student test) to multi-environment mapping. Based on maximum likelihood algorithms, Lander and Botstein (1989) proposed what is now called interval mapping (IM) to scan the genome for evidence of QTL. IM can also be performed by regression (Knapp et al. 1990; Haley and Knott 1992). Subsequently, composite interval mapping (CIM) was developed. The method described by Zeng (1993) is based on multiple regression and separates individual QTL effects from genetic variation in other regions of the genome. The aim is to reduce the background “noise” that can affect QTL detection by incorporating into the model a set of markers significantly associated with the trait. These ‘cofactors’ may be located anywhere in the genome. Jansen and Stam (1994) also proposed the ‘multiple QTL model’ (MQM), a method similar to CIM. Compared with IM, both CIM and MQM can significantly improve mapping precision and the estimation of QTL effects, because more QTL are detected.

Advanced statistical methods, e.g., to perform multi-environment and/or multi-trait QTL mapping, have emerged and have been reviewed by van Eeuwijk et al. (2010). These authors state that the mixed-model QTL methodology is suitable for many types of populations and allows predictive modelling of QTL-by-environment interactions.

In parallel, other statistical methods have been developed to detect QTL involved in response curves (‘functional mapping’). For example, Ma et al. (2002) combined logistic growth curves and QTL mapping within a mixture model approach. This method proved to be powerful and to produce accurate estimates of QTL effects and positions (Wu et al. 2002, 2003). Using a similar framework, Malosetti et al. (2006) proposed a non-linear extension of classical mixed models.

1.2.3 QTL Genetic Parameters

The biometric methods cited above not only detect QTL (map location) but also estimate a number of genetic parameters of these QTL. Among them, the QTL effect is the difference of effect between alleles, usually expressed as the additive effect (a) or as the effect of a double substitution (2a). In addition to the magnitude, the sign of the effect is also of particular interest, since favorable and unfavorable alleles may come from either parent. The QTL effect can also refer to the part of the phenotypic variation explained by each QTL or by all QTL controlling a trait. This part (R²) is quantified as the difference between the residual sum of squares (RSS) of the reduced model (without the QTL) and that of the full model, expressed as a percentage of the reduced-model RSS.

Based on the QTL analysis results, a quantitative genetic model can be defined that relates the genotypic value of an individual to the alleles at the loci that contribute to the variation in a population in terms of additive, dominance, and epistatic effects. For example, Podlich and Cooper (1998) developed a platform for quantitative analysis of genetic models, QU-GENE. The definition of the genetic model includes the following components:

  1. Number of genes (loci).

  2. Intra-locus gene action (additive, dominance).

  3. Inter-locus gene action (epistasis).

  4. Pleiotropy.

  5. Number of alleles.

  6. Gene frequency (allele frequency).

  7. Mutation.

  8. Ploidy.

  9. Linkage and chromosomal arrangements.

  10. Genotype-by-environment interaction.

From this genetic model, the genotypic value of any individual genotype, carrying any combination of alleles from this population, may be inferred. Reymond et al. (2003) used a QTL model including additive and epistatic effects. They then estimated, for each individual, the allelic probability at the QTL positions given the information at flanking markers, and finally used these probabilities in the QTL model.
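
The two steps described above (allelic probability at a QTL from its flanking markers, then prediction from an additive and epistatic QTL model) can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited authors: it assumes a backcross- or doubled-haploid-type population with two parental alleles 'A' and 'B', no crossover interference, and unlinked QTL so that expected allele codes can be multiplied for the epistatic term. All names and numbers are hypothetical.

```python
def qtl_allele_probability(left, right, r_left, r_right):
    """P(QTL carries allele 'A') given the alleles at the two flanking markers.

    r_left and r_right are the recombination fractions between the QTL and
    the left and right marker, respectively.
    """
    # Recombination fraction between the two markers without interference.
    r_flank = r_left + r_right - 2 * r_left * r_right
    p_left = (1 - r_left) if left == "A" else r_left
    p_right = (1 - r_right) if right == "A" else r_right
    norm = (1 - r_flank) if left == right else r_flank
    return p_left * p_right / norm


def genotypic_value(mu, additive, epistatic, p_allele):
    """Predicted genotypic value from a QTL model.

    additive:  {qtl: a}               additive effect of allele 'A'
    epistatic: {(qtl1, qtl2): e}      additive x additive interaction
    p_allele:  {qtl: P(allele 'A')}   from flanking-marker information
    """
    # Expected allele code in [-1, 1]: +1 for 'A', -1 for 'B'.
    x = {q: 2 * p - 1 for q, p in p_allele.items()}
    value = mu + sum(a * x[q] for q, a in additive.items())
    value += sum(e * x[q1] * x[q2] for (q1, q2), e in epistatic.items())
    return value


# Hypothetical individual with two QTL.
p = {
    "q1": qtl_allele_probability("A", "A", 0.1, 0.1),
    "q2": qtl_allele_probability("A", "B", 0.1, 0.1),
}
g = genotypic_value(10.0, {"q1": 2.0, "q2": -1.0}, {("q1", "q2"): 0.5}, p)
```

When the flanking markers carry different parental alleles and the QTL lies midway between them, the probability is 0.5 and that QTL contributes nothing to the predicted value, which reflects the absence of information at that position.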

1.3 Integration of Genetic Control in Crop Models

1.3.1 Levels of Integration

White and Hoogenboom (2003) reviewed the issues related to incorporating gene action into crop models. They proposed a classification of models based on the level of genetic details they included. Six levels were proposed that are still relevant:

  1. Generic model with no reference to species

  2. Species-specific model with no reference to cultivars

  3. Genetic differences represented by cultivar specific parameters

  4. Genetic differences represented by gene actions modeled through their effects on model parameters

  5. Genetic differences represented by genotypes, with gene action explicitly simulated based on knowledge of regulation of gene expression and effects of gene products

  6. Genetic differences represented by genotypes, with the gene action simulated at the level of interactions of regulators, gene products, and other metabolites.

Historically, ecophysiological and crop models were of Levels 1 and 2. They have progressively included genetic information, and most current models are of Level 3. Level 4 is the one currently under development; it is widely used to include information from quantitative genetics (see Sect. 1.3.2). Level 5 is still rarely encountered. It is restricted to model species for which the understanding of gene action in particular physiological processes is advanced. In general, too few genes are known to feed gene-based models. However, this level may also be tested with virtual genes (e.g., the phenology module in GeneGro Version 2; Hoogenboom and White 2003). Lastly, Level 6 corresponds to models simulating gene action based on interactions of regulators, gene products, and other metabolites. It has only been achieved for unicellular organisms (Tomita et al. 1999).

White and Hoogenboom (1996) and Yin et al. (2000a) were pioneers in the integration of gene action and QTL effects, respectively, in process-based models (models of Levels 4 and 5). These works were promising proofs of concept and aroused keen interest in the scientific community. The principle used in these works is to define genotypes by a set of model parameter values. These values depend both on the allelic combination carried by the genotype and on the genetic model defined from genes or QTL controlling the parameters of the model. Then the model can simulate these different genotypes.

1.3.2 QTL-based Modelling

In the absence of information on specific genes or loci, QTL analyses can be performed on model parameters. Indeed, these parameters often display quantitative and continuous variations in populations, in the same way as classically observed variables (e.g., plant height, yield, biomass). This approach was pioneered by Yin et al. (1999; 2000b), who recalculated the values of 10 genotypic parameters of the SYP-BL simulation model for barley crop growth. The major weakness of this approach was the inability of the original model to simulate the observed variations. The authors suggested that the level of integration considered was not appropriate and concluded that further physiological processes might be incorporated in the model to improve the performance of the coupling. Since then, promising results have been obtained using physiological components of different traits, such as leaf elongation (Reymond et al. 2003), plant development (Yin et al. 2005; Messina et al. 2006), phenology (Nakagawa et al. 2005), early plant growth (Brunel et al. 2009), nitrogen adaptation (Laperche et al. 2006) and fruit quality (Quilot et al. 2005b). In each of these studies, QTL associated with the considered traits or processes were identified. Tests of the models under independent conditions (new genotypes and new environmental conditions) also gave promising results (e.g., Reymond et al. 2003).

The latter authors focused their work on the analysis of the genetic variability of leaf elongation rate in maize in response to temperature and soil water deficit (Reymond et al. 2003, 2004). In these studies, a simple static model based on response curves of leaf elongation rate to temperature, vapour pressure deficit and soil water potential was used. Thirteen maize lines grown under six contrasting environments were used to validate the model, which accounted for 74 % of the genetic and environmental variations of leaf elongation rate (Reymond et al. 2004).
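
A static response-curve model of this kind can be sketched as below. The functional form (a thermal term multiplied by a linear response to vapour pressure deficit and soil water potential) is an assumption chosen for illustration, since the text does not give the equations; parameter names, values and units are hypothetical. The point of the sketch is that genotype-specific parameters are enough to generate genotype-by-environment interactions.

```python
def leaf_elongation_rate(temp, vpd, psi, t0, a, b, c):
    """Hypothetical static model of leaf elongation rate (arbitrary units).

    temp: temperature; vpd: vapour pressure deficit;
    psi: soil water potential (zero when wet, negative under deficit).
    t0, a, b, c are genotypic parameters of the response curves.
    """
    thermal = max(0.0, temp - t0)                # no elongation below t0
    demand = max(0.0, a + b * vpd + c * psi)     # linear responses, floored at 0
    return thermal * demand


# Two hypothetical genotypes differing only in their sensitivity to
# soil water potential (parameter c).
GENOTYPES = {
    "tolerant": dict(t0=10.0, a=5.0, b=-1.0, c=2.0),
    "sensitive": dict(t0=10.0, a=5.0, b=-1.0, c=4.0),
}
ENVIRONMENTS = {
    "well_watered": dict(temp=25.0, vpd=1.0, psi=0.0),
    "water_deficit": dict(temp=25.0, vpd=1.0, psi=-0.5),
}

rates = {
    (g, e): leaf_elongation_rate(**env, **params)
    for g, params in GENOTYPES.items()
    for e, env in ENVIRONMENTS.items()
}
```

In this toy setting the two genotypes cannot be distinguished under well-watered conditions and differ only under water deficit, which is the kind of interaction that a QTL analysis of the parameter c, rather than of the final trait, is meant to capture.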

The QTL associated with the model parameters do not systematically co-localize with the QTL for the more integrated variables, which highlights the complexity of the system. For instance, no co-localization was found in maize between QTL for final leaf length under water deficit and QTL for the parameters describing the response of leaf expansion to water deficit (Reymond et al. 2004).

However, co-localizations of QTL for different traits or parameters were observed in other studies. For instance, co-localizations between QTL for leaf elongation and QTL for anthesis-silking interval in maize suggest that these traits might be regulated by the same process (e.g., tissue elongation for either the leaves or the silks; Welcker et al. 2007). Similarly, the study performed by Quilot et al. (2005b) on peach gave some insight into the processes that control quality traits. Ten genotypic parameters of the virtual peach fruit model that strongly affect fruit growth and sugar accumulation (Quilot et al. 2005a) were selected among the 40 parameters of the model for a QTL analysis (Quilot et al. 2005b). These genotypic parameters were replaced in the simulation model by the sum of the QTL effects. The model was then able to account for a large part of the genetic and environmental variations in fruit size (observed and predicted values of fruit dry mass showed a correlation coefficient of 0.55). In this example, the QTL for the genotypic parameters co-localized on the genetic map with QTL for fruit size and sugar content, which suggests putative physiological interpretations of the functions of the genes underlying these QTL. Moreover, such results can help in understanding the processes involved and thus assist the improvement of the process-based model. More details on the approach can be found in reviews (e.g., Hammer et al. 2006; Yin and Struik 2008; Messina et al. 2009; Bertin et al. 2010).

The number of parameters in most process-based models (from tens to hundreds) appears relatively low compared with the large number of genes of a plant (tens of thousands). Nevertheless, this simplification of the genetic architecture remains promising. Indeed, genes very often act in coordination, and it is the action of the gene group, rather than the action of each gene, that can be represented by model parameters. The set of interconnected processes controlled by such a group of genes was defined by Tardieu (2003) as a “meta-mechanism”. The key is to pick out the right level of organization (cell, organ or plant) at which the consequences of the genetic variability in the mechanism, and the response curves to environmental factors, can still be represented so as to explain the observed variations of the trait of interest.

1.3.3 Gene-based Modelling

When information on specific genes is available, a gene-based model can be attempted, directly relating genotypic parameters to the expression of a few key genes. One of the earliest works was carried out by White and Hoogenboom (1996) and Hoogenboom and White (2003) using the BEANGRO simulation model for common bean (Phaseolus vulgaris L.). They assumed that seven genes controlled some of the genotypic parameters of the model and replaced these parameters with linear functions describing the effects of the genes. A simple theoretical genetic model was considered, with two alleles for each gene, one dominant and the other recessive. The genotypes of 30 common bean cultivars were determined for these seven genes and included in the model in place of the genotypic parameters. The new model (of Level 5), GeneGro, simulated growth and development and could even simulate new G × E interactions. This approach has more recently been included in the soybean simulation model CROPGRO-soybean to characterize the effect of six loci on growth and development, using a set of isogenic lines (Messina et al. 2006).
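
The principle of replacing a genotypic parameter with a linear function of gene effects can be sketched as follows. This is a generic illustration under a two-allele, dominant/recessive genetic model; the gene names, the coefficients and the parameter being computed are hypothetical and are not taken from BEANGRO or GeneGro.

```python
def gene_score(alleles):
    """Return 1 if at least one dominant allele is present, else 0.

    Alleles are written as single letters: upper case for the dominant
    allele, lower case for the recessive one.
    """
    return int(any(allele.isupper() for allele in alleles))


def parameter_from_genes(genotype, intercept, coefficients):
    """Genotypic parameter expressed as a linear function of gene scores.

    genotype:     {gene: (allele_1, allele_2)}
    coefficients: {gene: effect of carrying the dominant allele}
    """
    return intercept + sum(
        coef * gene_score(genotype[gene]) for gene, coef in coefficients.items()
    )


# Hypothetical cultivars described by two genes.
COEFFICIENTS = {"gene1": 5.0, "gene2": -3.0}
cultivar_1 = {"gene1": ("A", "a"), "gene2": ("b", "b")}
cultivar_2 = {"gene1": ("a", "a"), "gene2": ("B", "B")}

param_1 = parameter_from_genes(cultivar_1, 30.0, COEFFICIENTS)
param_2 = parameter_from_genes(cultivar_2, 30.0, COEFFICIENTS)
```

Once the coefficients are estimated on a set of characterized cultivars, a parameter value can be computed for any allelic combination, including combinations that were never observed, and passed to the crop model in place of the cultivar-specific value.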

When a trait is controlled by a small number of major genes, modelling of the gene network can also be attempted. Such an approach has been successfully used to model flowering time (Welch et al. 2003; Welch et al. 2004) and cell cycle and expansion in leaves (Beemster et al. 2006) in Arabidopsis (Arabidopsis thaliana (L.) Heynh.). More recently, Coen and co-workers investigated the mechanisms by which genes can control the emergence of complex shapes in Snapdragon flowers (Green et al. 2010) and Arabidopsis leaves (Kuchen et al. 2012). In their model, a set of experimentally defined rules fixes the values of two key quantities, the direction of growth (the tissue polarity) and the local growth rates, as a function of the expression of a few genes, thus providing an implicit coupling between the cellular and the tissue levels. Petal tissue is described as a continuous sheet that grows and bends under the effect of a growth field, as specified by the tissue polarity and the local growth rates, according to elasticity theory (Kennaway et al. 2011). At the cellular level, the interactions among genes are described by a gene regulatory network that captures the evolution of gene expression levels during organ development. A distinctive feature of this model is that developmental changes are taken into account by explicitly modifying both the genetic control of the growth field and the regulatory interactions among genes, according to the experimental data.

Currently, the strongest limitation to the development of gene-based models for complex traits is the lack of knowledge and characterization of the specific genes or loci controlling these traits, including epistatic interactions and pleiotropic effects, which are needed to define the phenotypic fingerprint of cultivars for genotypic parameters. Moreover, detailed studies to quantify the environmental effects on gene expression and gene action are also required.

1.4 Modelling Cellular Networks

This section draws attention to a panel of approaches, developed in the context of systems biology, that can be used to analyze, simplify and model the behaviour of cellular networks. Several formalisms are available, with a permanent trade-off between predictive power and the information needed. Here we restrict ourselves to techniques that require a limited amount of information, as these are more likely to be appropriate when dealing with large and under-characterized systems such as plants. For simplicity, we separate the analysis of gene regulatory networks from that of metabolic networks, as they historically evolved in distinct domains and specific tools are available for each.

1.4.1 Modelling and Analyzing Gene Regulatory Networks

1.4.1.1 Qualitative Models of Gene Regulatory Networks

A number of methods exist to model the dynamics of a gene regulatory network with increasing accuracy, including discrete and continuous approaches, deterministic or stochastic (see de Jong (2002) and Schlitt and Brazma (2007) for reviews). Here we focus on qualitative approaches (see Appendix 1) as handy methods to capture the logic of gene control without the need for precise parameter values, which are rarely available for most biological systems. The simplest formalism for gene regulation is a Boolean model, in which each gene is represented as a binary switch that can be either on (value 1) or off (value 0). At any given time, the state of the network is represented by the n-tuple of zeros and ones describing which genes are active or inactive. Transitions from one state to another are determined by the mutual regulation of genes, expressed by means of logical rules. The dynamics of a logical network is thus represented by a sequence of states, describing all possible activation/inactivation pathways compatible with a given regulatory logic. Any trajectory naturally leads to a steady state, i.e., an expression pattern that is maintained indefinitely by the system (it can be either a fixed point or a cycle). The existence of multiple steady states is often associated with the existence of distinct developmental outcomes.
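
The Boolean formalism described above can be sketched on a toy network. The three-gene network below (mutual inhibition between A and B, and C activated by A) is hypothetical, and the synchronous updating scheme, in which all genes are updated at once, is an assumption made for simplicity.

```python
from itertools import product

# Logical rules of a hypothetical three-gene network.
RULES = {
    "A": lambda s: not s["B"],   # B represses A
    "B": lambda s: not s["A"],   # A represses B
    "C": lambda s: s["A"],       # A activates C
}
GENES = sorted(RULES)


def step(state):
    """Synchronous update: every gene applies its rule to the current state."""
    s = dict(zip(GENES, state))
    return tuple(int(RULES[g](s)) for g in GENES)


def attractor_from(state):
    """Follow the trajectory from a state until it revisits a state, and
    return the resulting fixed point or cycle in a canonical rotation."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    cycle = seen[seen.index(state):]
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])


def all_attractors():
    """Attractors reached from the 2**n possible initial states."""
    return {attractor_from(s) for s in product((0, 1), repeat=len(GENES))}


attractors = all_attractors()
```

With states written as (A, B, C), this network has two fixed points, (1, 0, 1) and (0, 1, 0), which correspond to the two alternative outcomes of the mutual inhibition, plus a cycle between (0, 0, 1) and (1, 1, 0) that arises from the synchronous updating.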

The disadvantage of Boolean models lies in their high level of abstraction, which makes it difficult to integrate data on promoter activity and sequential gene regulation when such data are available. Indeed, the same transcription factor can regulate the expression of several genes depending on its concentration. In this perspective, an extension of the Boolean formalism is provided by logical models, in which the simple binary nature of gene activation/inactivation is replaced by a variable that can assume p discrete values (0, 1, 2, …, p). In this way, a level-dependent action of each transcription factor can be included in the logical rules (see Fig. 1.1 for an example). As an alternative to logical models, piece-wise linear models offer a formalism that is somewhat closer to a continuous description, combining the qualitative approach with a time-continuous description of gene regulation (see Appendix 1 for more information). We refer to Snoussi (1989) and Wittmann et al. (2009) for a discussion of the relation between logical and piece-wise linear models. For both types of models, software is available for the construction and analysis of qualitative models of gene regulatory networks (de Jong et al. 2003; Gonzalez et al. 2006).
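
A piece-wise linear model can be sketched for a toy circuit of two mutually repressing genes. In this formalism each concentration follows dx/dt = κ·s(regulator) − γ·x, where s is a step function of the regulator concentration relative to a threshold θ. The circuit, the parameter values and the simple Euler integration scheme are assumptions chosen for illustration.

```python
def step_minus(x, theta):
    """Step function for repression: 1 while the repressor is below its
    threshold theta, 0 once it is above."""
    return 1.0 if x < theta else 0.0


def simulate_toggle(xa, xb, kappa=2.0, gamma=1.0, theta=1.0,
                    dt=0.01, t_end=20.0):
    """Euler integration of two genes a and b repressing each other.

    dxa/dt = kappa * s(xb) - gamma * xa
    dxb/dt = kappa * s(xa) - gamma * xb
    Returns the concentrations reached at t_end.
    """
    for _ in range(int(t_end / dt)):
        dxa = kappa * step_minus(xb, theta) - gamma * xa
        dxb = kappa * step_minus(xa, theta) - gamma * xb
        xa, xb = xa + dt * dxa, xb + dt * dxb
    return xa, xb


# Two initial conditions on either side of the thresholds.
a_wins = simulate_toggle(1.5, 0.2)
b_wins = simulate_toggle(0.2, 1.5)
```

Between thresholds the equations are linear, so each concentration relaxes exponentially towards either κ/γ or 0. The two runs settle in different steady states, close to (2, 0) and (0, 2), which is the continuous-time counterpart of the two fixed points of a Boolean mutual-inhibition circuit.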

Thanks to their simplicity, qualitative approaches, in combination with formal verification tools (Monteiro et al. 2007), can be used to test the coherence of experimental data, possibly pointing out missing regulators or interactions between components.

In this perspective, a promising approach is the one proposed by Li et al. (2006) to investigate the functional basis of abscisic acid (ABA) signaling. Starting from sparse literature data, including protein-protein interactions, knockout experiments and pharmacological tests, the authors developed a heuristic qualitative reasoning to (a) assemble a consistent signal transduction network of ABA-induced stomatal closure and (b) build a (Boolean) dynamical model of the system. Interestingly, the lack of quantitative information on process kinetics is circumvented by randomly sampling all possible updating orders (Chaves et al. 2006) and computing an average probability of stomatal closure over 10,000 initial conditions. The model is then used to predict essential components of the system, by evaluating the effects of single and multiple node disruptions on the resulting responsiveness of stomatal closure. Note that, owing to its qualitative nature, the model gives predictions that are valid independently of the specific kinetic properties of the system. Nevertheless, future quantitative data on biochemical mechanisms could easily be accounted for by replacing the corresponding Boolean rule with a stochastic or continuous description, within a hybrid modelling approach (Chaouiya 2007; Chaves et al. 2006).
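
The strategy of random updating orders, averaging over initial conditions and node disruption can be sketched on a toy signalling network. The network below (two redundant intermediates K and M, a downstream node E and an output called closure) is hypothetical and is not the ABA network of the cited study; the number of initial conditions and rounds are also arbitrary.

```python
import random

# Hypothetical signalling network; the input node "signal" is held at 1.
RULES = {
    "K": lambda s: s["signal"],
    "M": lambda s: s["signal"],
    "E": lambda s: s["K"] or s["M"],   # K and M are redundant activators
    "closure": lambda s: s["E"],
}


def simulate(knockouts=(), rounds=5, rng=None):
    """One trajectory with a random initial state and, in each round,
    a random order of node updates. Knocked-out nodes stay at 0."""
    if rng is None:
        rng = random.Random()
    state = {node: rng.randint(0, 1) for node in RULES}
    state["signal"] = 1
    for node in knockouts:
        state[node] = 0
    trace = []
    for _ in range(rounds):
        order = [node for node in RULES if node not in knockouts]
        rng.shuffle(order)
        for node in order:
            state[node] = int(RULES[node](state))
        trace.append(state["closure"])
    return trace


def closure_probability(knockouts=(), n_init=1000, rounds=5, seed=0):
    """Fraction of trajectories with closure = 1 after each round."""
    rng = random.Random(seed)
    totals = [0] * rounds
    for _ in range(n_init):
        for t, closed in enumerate(simulate(knockouts, rounds, rng)):
            totals[t] += closed
    return [count / n_init for count in totals]


wild_type = closure_probability()
single_ko = closure_probability(knockouts=("K",))
double_ko = closure_probability(knockouts=("K", "M"))
essential_ko = closure_probability(knockouts=("E",))
```

In this toy network the response is lost only when the essential node E or both redundant nodes K and M are disrupted, whereas a single disruption of K is compensated by M. The probabilities in the early rounds depend on the sampled updating orders, while the final values do not.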

1.4.2 Modelling and Analyzing Metabolic Networks

1.4.2.1 Steady-state Models of Metabolic Networks

The so-called Constraint-Based (CB) approaches use the stoichiometry and the thermodynamics of biochemical reactions as mathematical constraints to progressively reduce the space of possible steady-state solutions of the metabolic system of equations (see Appendix 2). In this framework, Metabolic Pathway Analysis aims at describing all possible steady-state behaviours of the system that are compatible with the mass balance and thermodynamic constraints (Papin et al. 2003; Schilling et al. 2000; Schuster et al. 2000). The aim here is to provide a general characterization of network capabilities, pointing at specific design features that may provide insights into the functional organization of the system (Rios-Estepa and Lange 2007; Stelling et al. 2002).

Among all possible behaviours predicted by pathway analysis, only a few are actually realized by a given organism, depending on environmental conditions. The hypothesis behind this observation is that external selective pressure acts as an additional constraint that favors a few specific flux distributions over others. Flux Balance Analysis (FBA) is a method developed by Palsson and collaborators that aims to predict such reasonable (i.e., likely to be realized) flux distributions by assuming that they maximize a given objective function (Orth et al. 2010). Recently, a number of improvements to FBA have been developed, with the aim of relaxing the notion of optimality. The impact of alternative optimal states (i.e., alternative solutions that share the same optimal score) has been analyzed (Mahadevan and Schilling 2003), as well as the existence of non-optimal solutions, i.e., solutions that score near the optimal value but not exactly at it (Mahadevan and Schilling 2003; Segrè et al. 2002). These solutions may indeed be more appropriate to describe the behaviour of genetically engineered organisms (e.g., a knockout mutant) or to compare with experimental data, for which optimal growth conditions are not guaranteed.
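
The steady-state constraint, the flux bounds and the maximization of an objective can be sketched on a toy network. The network, the bounds and the objective below are hypothetical. Real FBA solves a linear programme; exhaustive enumeration over integer flux values is swapped in here only to keep the sketch free of external dependencies.

```python
from itertools import product

# Toy network (hypothetical). Rows: metabolites A and B. Columns: reactions
# R1: -> A (uptake), R2: A -> B, R3: B -> biomass, R4: A -> by-product.
S = [
    [1, -1, 0, -1],   # mass balance of A
    [0, 1, -1, 0],    # mass balance of B
]
BOUNDS = [(0, 10), (0, 6), (0, 10), (0, 10)]   # capacity of each reaction
OBJECTIVE = [0, 0, 1, 0]                       # maximize the biomass flux R3


def is_steady_state(v):
    """True when S.v = 0, i.e. no metabolite accumulates or is depleted."""
    return all(sum(s * x for s, x in zip(row, v)) == 0 for row in S)


def feasible_fluxes():
    """All integer flux vectors within bounds that satisfy the steady state."""
    ranges = [range(lo, hi + 1) for lo, hi in BOUNDS]
    return [v for v in product(*ranges) if is_steady_state(v)]


def flux_balance_analysis():
    """Return the optimal objective value and every flux vector reaching it."""
    candidates = feasible_fluxes()

    def score(v):
        return sum(c * x for c, x in zip(OBJECTIVE, v))

    best = max(score(v) for v in candidates)
    return best, [v for v in candidates if score(v) == best]


best, optima = flux_balance_analysis()
```

Here the biomass flux is limited to 6 by the capacity of R2, and several flux distributions reach this optimum because the by-product flux R4 can take any value from 0 to 4. This is a small instance of the alternative optimal states mentioned above.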

In plants, a number of applications of pathway analysis and FBA are starting to appear, as reviewed by Rios-Estepa and Lange (2007) and Sweetlove and Ratcliffe (2011). In spite of these successes, two main problems currently limit the application of CB methods to plants. The first is the presence of sub-cellular compartments, which can modify the predicted flux distribution when they are not included a priori in the metabolic model. To date, assigning a reaction to a specific compartment is far from trivial. Experimental data are still scarce and information on metabolite transport mechanisms between different organelles is often lacking, due to technical difficulties (Allen et al. 2009). Further advances in Nuclear Magnetic Resonance spectroscopy and fluorescence methods are, however, expected to overcome these problems, contributing to the development of more realistic CB models. The second issue, specific to FBA, is more fundamental and concerns the choice of an appropriate objective function to maximize in plants (Sweetlove and Ratcliffe 2011). Historically, FBA was developed for microbes, for which good objective functions can be established (generally optimal biomass production) and whose rapid adaptive evolution guarantees near-optimal functioning soon after a perturbation. In the case of plants, adaptation is generally too slow to guarantee optimality within a reasonable experimental time, and the choice of the function to be maximized can be called into question. In this context, alternative methods using sub-optimal solutions may be more appropriate. The method of minimization of metabolic adjustment (Segrè et al. 2002) in particular has been suggested as especially suitable, as it allows the effect of a perturbation to be investigated starting from knowledge of the initial flux distribution only. Experimentally determined flux distributions (via steady-state isotope labeling) may therefore be used, thus avoiding the problem of defining the objective function.
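The idea behind minimization of metabolic adjustment can be sketched as a distance minimization: after a knockout, one looks for the steady-state flux vector closest (in the Euclidean sense) to the reference distribution. The example below is a deliberately simplified sketch. The four-reaction network and the reference fluxes are hypothetical, and flux bounds are ignored, so that the problem reduces to an orthogonal projection. With active bounds (e.g., irreversibility) a quadratic programming solver would be needed instead.

```python
import numpy as np

# Hypothetical toy network. Rows are metabolites A, B; columns are reactions:
#   v0: -> A   v1: A -> B   v2: A -> B (alternative route)   v3: B ->
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  1, -1],   # B
], dtype=float)

# Assumed reference (e.g., measured wild-type) flux distribution; S @ v_ref = 0.
v_ref = np.array([10.0, 7.0, 3.0, 10.0])

# Knock out reaction v1 by adding the linear constraint v1 = 0.
knocked_out = 1
ko_row = np.zeros((1, S.shape[1]))
ko_row[0, knocked_out] = 1.0
A = np.vstack([S, ko_row])

# Closest point to v_ref in {v : A v = 0}: remove the row-space component.
v_moma = v_ref - np.linalg.pinv(A) @ (A @ v_ref)
print("flux distribution after knockout:", v_moma)
```

Here the flux is rerouted through the alternative reaction v2, and the total throughput drops relative to the reference state. No objective function is required, only the reference flux distribution.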

1.4.2.2 Dynamical Models of Metabolic Networks

Plants, more than other systems, are continuously subject to fluctuations of environmental factors that can induce rapid rearrangements in metabolite levels or in metabolic fluxes. In this situation, the steady-state assumption behind constraint-based methods can become a limitation. Further insights into the dynamics of metabolism and into “what-if” scenarios can be obtained by building a kinetic model of the system, describing how metabolite concentrations change over time as a result of interactions with other molecules. Traditionally, the dynamics of a metabolic system is described as a set of nonlinear ordinary differential equations (ODE system), assigning to each reaction a rate law (Michaelis-Menten kinetics being a classical choice) which describes how its rate depends on the concentrations of other molecules (metabolites, enzymes). Choosing an appropriate rate law and defining it completely is extremely costly and requires a good knowledge of all biochemical steps. For this reason, most available kinetic models describe only small metabolic networks (about a dozen variables), for which kinetic information and parameter values have been derived from literature data (Uys et al. 2007; Nägele et al. 2010) or from dedicated experiments (Curien et al. 2009; Beauvoit et al. 2014).
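As a minimal sketch of such a kinetic model, the code below integrates a two-step pathway (constant influx, then A → B → out) in which both steps follow irreversible Michaelis-Menten kinetics. The pathway and all parameter values are hypothetical and chosen only so that the steady state is easy to check by hand.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters (arbitrary units).
V_IN = 1.0              # constant influx into metabolite A
VMAX1, KM1 = 2.0, 0.5   # step A -> B
VMAX2, KM2 = 3.0, 1.0   # step B -> out

def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten rate law."""
    return vmax * s / (km + s)

def rhs(t, x):
    """Mass balances for the two metabolite concentrations."""
    a, b = x
    v1 = michaelis_menten(a, VMAX1, KM1)
    v2 = michaelis_menten(b, VMAX2, KM2)
    return [V_IN - v1, v1 - v2]

sol = solve_ivp(rhs, (0.0, 50.0), [0.0, 0.0], rtol=1e-8, atol=1e-10)
a_end, b_end = sol.y[:, -1]

# Analytical steady state: each rate equals V_IN, so s* = V_IN * Km / (Vmax - V_IN).
a_star = V_IN * KM1 / (VMAX1 - V_IN)
b_star = V_IN * KM2 / (VMAX2 - V_IN)
print("simulated:", a_end, b_end, " analytical:", a_star, b_star)
```

Even this small example requires four kinetic parameters for two reactions, which illustrates why parameterizing large networks in this way quickly becomes costly.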

For larger networks, such information is generally not available and approximated expressions for reaction rates have to be used.

A first strategy is based on the use of simplified kinetics that are valid near a reference state, usually chosen to be the steady state of the system (Heijnen 2005; Stitt et al. 2010). Smallbone et al. (2007) propose a method based on Flux Balance Analysis. The idea is to let fluxes vary according to lin-log kinetics (Visser and Heijnen 2003; Heijnen 2005) around their steady-state value (as determined by FBA) and then predict the evolution of metabolite concentrations according to the ODE system. The main advantage of this approach resides in the limited information required; indeed, the model can be defined using the information contained in the stoichiometric matrix only, even in the absence of experimental data for kinetic parameters. Of course, the limitations already discussed for FBA naturally apply to this approach too, especially as concerns the assumption of optimality for complex systems like plants.
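For orientation, lin-log rate laws are commonly written in the following form. The notation is ours and is not taken from the works cited above: the superscript 0 denotes the reference steady state, J_i^0 is the reference flux through reaction i (here, the FBA solution), e_i the enzyme level, x_j the metabolite concentrations and ε_ij^0 the elasticities evaluated at the reference state.

$$ v_i = J_i^0 \, \frac{e_i}{e_i^0} \left( 1 + \sum_j \varepsilon_{ij}^0 \ln \frac{x_j}{x_j^0} \right) $$

At the reference state (e_i = e_i^0, x_j = x_j^0) the logarithms vanish and v_i = J_i^0, so the approximation reproduces the reference flux distribution exactly and deviates from the true kinetics only as the system moves away from it.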

Going one step further in the simplification of reaction rates, qualitative methods offer a way to investigate essential dynamical properties of the network that are invariant over a range of reaction mechanisms and parameter values. Among all available formalisms for qualitative modelling, Petri Nets (see Appendix 1) are particularly suitable for metabolism because their structure agrees well with the idea of conversion embodied in biochemical reactions (Chaouiya 2007). In particular, Petri Nets allow reaction stoichiometry and differences in reaction rates (“delays”) to be taken into account, when this information is available.

In the case of well-studied systems, Palsson and coworkers proposed an alternative approach for modelling large metabolic networks that relies on the increasing availability of high-throughput data (Jamshidi and Palsson 2008b, 2010). To this aim, a stoichiometric model of the network is combined with a mass-action description of rate laws. Data on flux distributions, metabolite concentrations and equilibrium constants are used to estimate kinetic parameters and simulate the dynamics of the system around a particular steady state for which data are available. Regulatory effects (enzyme binding, allosteric regulation) can also be taken into account by directly modelling the regulator-substrate binding reaction, provided that information on enzyme concentrations and binding rates is available. The advantages of this method reside in its scalability, as demonstrated by its application to red blood cell metabolism (Jamshidi and Palsson 2008b; Kauffman et al. 2002), and in the possibility of automatically refining the model as new omics data are collected. The need for a rich and reliable set of experimental data, however, currently limits its application to plants.
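The parameter estimation step can be sketched for a single reversible reaction A ⇌ B with mass-action rate v = k_f·[A] − k_r·[B] and equilibrium constant K_eq = k_f/k_r. Given a steady-state flux, the two concentrations and K_eq, both rate constants follow algebraically. The function name and the numerical values below are hypothetical and serve only as an illustration of the principle.

```python
def mass_action_constants(flux, substrate, product, k_eq):
    """Rate constants of A <-> B from steady-state data.

    Assumes v = k_f * substrate - k_r * product and k_eq = k_f / k_r,
    hence v = k_f * (substrate - product / k_eq).
    """
    driving_force = substrate - product / k_eq
    if driving_force == 0:
        raise ValueError("reaction at equilibrium: the flux does not constrain k_f")
    k_f = flux / driving_force
    k_r = k_f / k_eq
    return k_f, k_r

# Hypothetical measurements (arbitrary units).
k_f, k_r = mass_action_constants(flux=2.0, substrate=1.0, product=0.5, k_eq=2.0)
print("k_f =", k_f, " k_r =", k_r)
```

The estimate is only as reliable as the data: close to equilibrium the driving force tends to zero and small measurement errors are strongly amplified, which is one reason why rich and accurate data sets are required.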

1.5 Integrating Cellular Networks into Plant Models

When biological information is available, a mechanistic description of cellular networks can be attempted and integrated into plant models, in the perspective of ‘crop systems biology’ (Yin and Struik 2010). Such integration is relevant in two ways. From an ecophysiological point of view, the integration of cellular and molecular levels can help to refine plant models, shedding light onto the complex interplay between different spatial and temporal scales in the emerging system response. In particular, the presence of explicit molecular variables can help to identify those molecular mechanisms that may confer interesting agronomic properties on current crop varieties. From the point of view of molecular biology, the existence of an integrated model could offer a useful framework for interpreting omics data in relation to environmental factors and agricultural practices.

The ambition of the so-called multi-scale models is to explicitly integrate mechanisms that take place on different temporal or spatial scales, while keeping the computational cost low (Baldazzi et al. 2012; Southern et al. 2008). Two main issues characterize these models: (1) the (simplified) description of processes on a common scale, and (2) the way different scales are connected, i.e., how the information is passed among organizational levels.

1.5.1 Model Simplification

In order to be coupled, models on a single scale must be reasonably simple to avoid computational problems but, at the same time, sufficiently complex to correctly represent the expected biological behaviour. The claim is that, for most ecophysiological questions, there is no need for detailed modelling: at the scale of plant development and adaptation, only a few molecular mechanisms and variables are likely to significantly affect the behaviour of the system and need to be explicitly accounted for (Hammer et al. 2006; Génard et al. 2007). Here we review a few methods that can be used to analyze cellular networks and obtain a simplified representation of cellular functioning, in terms of both the number of variables and the mathematical expressions.

1.5.1.1 Structural Analysis

As a preliminary step, the inspection of network topology by means of statistical and graph analysis methods can provide useful insights into the regulatory architecture of the system (see Barabási and Oltvai (2004) for a review), pointing at a few nodes that “naturally” emerge as key variables of the system. The analysis of transcriptional regulatory networks in unicellular systems (Barabási and Oltvai 2004; Ma et al. 2004a), but also in eukaryotic systems (Carrera et al. 2009), for instance, has uncovered a typical hierarchical structure, with a few genes (hubs) having a huge number of outgoing connections (i.e., regulating a large number of genes). These genes thus represent a sort of ‘master’ regulators of the network, able to control most biological functions (Martínez-Antonio and Collado-Vides 2003; Seshasayee et al. 2009) and their adaptation to environmental changes (Görke and Stülke 2008; Hengge-Aronis 1999).
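The simplest of these topological measures, the out-degree, can be computed directly from a list of regulatory interactions. The sketch below uses a made-up edge list (the node names are placeholders, not real genes) and an arbitrary threshold to flag hubs. In practice the threshold would be chosen from the observed degree distribution.

```python
from collections import Counter

# Hypothetical regulatory network: (regulator, target) pairs.
edges = [
    ("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"), ("TF1", "TF2"),
    ("TF2", "g4"), ("TF2", "g5"),
    ("g1", "g2"),
]

def find_hubs(edges, min_out_degree):
    """Return the nodes that regulate at least `min_out_degree` targets."""
    out_degree = Counter(regulator for regulator, _ in edges)
    return sorted(node for node, degree in out_degree.items()
                  if degree >= min_out_degree)

hubs = find_hubs(edges, min_out_degree=3)
print("hubs:", hubs)
```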

Another important aspect of structural analysis is the search for functional modules, i.e., sub-networks able to work (almost) independently of the rest of the network (Wagner et al. 2007). Several methods try to identify modules in an automated way (Wang et al. 2008). Most of them make use of connectivity properties (Ma et al. 2004b; Schuster et al. 2002; Tanay et al. 2004), whereas others combine topology and experimental data to increase their interpretability in terms of biological function (Mao et al. 2009; Sridharan et al. 2012). In plants, modular organization has recently been highlighted in Arabidopsis (Mao et al. 2009): about a hundred modules have been identified and assigned to the main biological processes. Some of these modules are large (>1000 genes), like the one related to photosynthesis, but smaller modules have also been identified for specific processes, such as the one related to starch metabolism, which includes only 10 genes and thus provides a reasonable starting point for further characterization.

1.5.1.2 Time Scale Analysis

When integrating processes of different nature, an analysis of typical time-scales involved can prove extremely useful for model reduction.

Based on time-scale separation, the original model can usually be rewritten as (at least) two distinct subsystems, corresponding to slow processes and fast processes:

$$ \begin{array}{l} \dfrac{d{x}_s}{dt}={f}_s\left({x}_s,{x}_f,p\right) \\ \dfrac{d{x}_f}{dt}={f}_f\left({x}_s,{x}_f,p\right) \end{array} $$

where x_s and x_f are the slow and fast variables, respectively, and p are parameters.

If the time-scales of the processes differ by at least an order of magnitude, a few approximations are likely to be possible. Variables that change on a time-scale much slower than the one of interest can simply be assumed constant, averaging out (small) variations in the time-window of interest (Radulescu et al. 2008). Variables that evolve on a time-scale much faster than the reference time-scale are commonly assumed to be in a quasi-steady state, i.e., to adapt instantly to changes occurring on the reference time-scale. Mathematically speaking, this means that the dimension of the model can be reduced by setting the time derivative of the fast subsystem to zero (Heinrich and Schuster 1996), so that the fast subsystem reduces to a simple set of algebraic equations:

$$ \begin{array}{l} \dfrac{d{x}_s}{dt}={f}_s\left({x}_s,{x}_f,p\right) \\ \dfrac{d{x}_f}{dt}={f}_f\left({x}_s,{x}_f,p\right)=0 \end{array} $$
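The quality of the quasi-steady-state approximation can be checked numerically on a minimal slow-fast system. In the sketch below, which is purely illustrative, the slow variable decays with rate K_DECAY and the fast variable relaxes towards the slow one with a characteristic time EPS. Both equations and both parameter values are assumptions made here. Setting f_f = 0 gives the algebraic relation x_f = x_s, so that the reduced model contains a single differential equation.

```python
import numpy as np
from scipy.integrate import solve_ivp

K_DECAY = 1.0   # rate of the slow process (hypothetical)
EPS = 0.01      # relaxation time of the fast process (hypothetical)

def full_rhs(t, x):
    """Full model: f_s = -K_DECAY * x_s, f_f = (x_s - x_f) / EPS."""
    x_s, x_f = x
    return [-K_DECAY * x_s, (x_s - x_f) / EPS]

def reduced_rhs(t, x):
    """Reduced model: only the slow variable is integrated."""
    return [-K_DECAY * x[0]]

def fast_quasi_steady_state(x_s):
    """Solution of f_f(x_s, x_f) = 0 for x_f."""
    return x_s

t_end = 1.0
# The full model is stiff (two very different time-scales), hence an implicit solver.
full = solve_ivp(full_rhs, (0.0, t_end), [1.0, 0.0],
                 method="Radau", rtol=1e-8, atol=1e-10)
reduced = solve_ivp(reduced_rhs, (0.0, t_end), [1.0], rtol=1e-8, atol=1e-10)

x_f_full = full.y[1, -1]
x_f_qss = fast_quasi_steady_state(reduced.y[0, -1])
print("x_f (full model):", x_f_full, " x_f (quasi-steady state):", x_f_qss)
```

After the initial transient, whose duration is of the order of EPS, the two values differ by a relative error of the order of EPS·K_DECAY. The approximation therefore deteriorates as the two time-scales become comparable.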

If a kinetic model of the system is available, the analysis of time-scales can be rigorously performed by means of modal analysis (Jamshidi and Palsson 2008a). The application of this method naturally leads to pooling of variables into groups of species that evolve in a coordinated fashion above a specific time-scale. This means that the model size can be effectively reduced by considering the dynamics of pools as representative of the dynamics of their constitutive species. Differences in the typical time-scales of metabolism (of the order of seconds) and gene expression (minutes to hours) have been recently exploited to investigate the coupling between metabolic and genetic networks in bacteria (Baldazzi et al. 2010; Covert et al. 2008; Shlomi et al. 2007).
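The principle of this kind of analysis can be sketched on a linear(ized) system dx/dt = J·x: the eigenvalues of the Jacobian J give the characteristic time-scales, and the corresponding left eigenvectors indicate which combinations of variables (pools) move on each time-scale. The two-metabolite Jacobian below is hypothetical: x1 and x2 interconvert rapidly (rate constant 100) while x2 is slowly drained (rate constant 1).

```python
import numpy as np

# Hypothetical Jacobian:
#   dx1/dt = -100*x1 + 100*x2
#   dx2/dt =  100*x1 - 100*x2 - 1*x2
J = np.array([[-100.0,  100.0],
              [ 100.0, -101.0]])

# Left eigenvectors of J are the (right) eigenvectors of its transpose.
eigvals, left_vecs = np.linalg.eig(J.T)

# Order the modes from slowest (least negative eigenvalue) to fastest.
order = np.argsort(-eigvals.real)
eigvals = eigvals[order]
left_vecs = left_vecs[:, order]

time_scales = -1.0 / eigvals.real
# Weights of x1 and x2 in the slow mode, normalized to the weight of x1.
slow_pool = left_vecs[:, 0].real / left_vecs[0, 0].real

print("time-scales (slow, fast):", time_scales)
print("slow pool weights:", slow_pool)
```

In this example the slow mode weights x1 and x2 almost equally: on time-scales longer than the fast interconversion, the two metabolites behave as a single pool x1 + x2, which is the kind of aggregation that allows the model size to be reduced.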

1.5.1.3 Metabolic Control Analysis

When dealing with metabolic networks, the stoichiometry of the biochemical reactions imposes a rigid constraint on the dynamics of the system: any change in a flux (or metabolite concentration) must be compensated by other changes in the network, thus linking local kinetic properties to the global system behaviour.

Metabolic control analysis (MCA) is a strategy to analyse how the control of a metabolic pathway is shared among the different reactions (Heinrich and Rapoport 1974; Kacser and Burns 1973). The degree of control of a reaction is quantified in terms of control coefficients, defined as the fractional change of the system property (flux (ν) or metabolite concentration (x), at steady state) in response to a change in enzyme activity E.

$$ {C_{ij}}^v=\frac{\partial {v}_i}{\partial {E}_j}\cdot \frac{E_j}{v_i}=\frac{\partial \ln {v}_i}{\partial \ln {E}_j} $$
$$ {C_{ij}}^x=\frac{\partial {x}_i}{\partial {E}_j}\cdot \frac{E_j}{x_i}=\frac{\partial \ln {x}_i}{\partial \ln {E}_j} $$

A zero control coefficient means that the system variable does not change when enzyme activity is modified. A flux control coefficient of 1 means that the reaction catalysed by the enzyme completely determines the flux value. Under these conditions, no other reaction can affect the flux and the enzyme is said to be rate-limiting. In nature, rate-limiting reactions are very rare. System control is generally shared among multiple steps, and MCA allows their importance to be ranked once the target variable has been defined.
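Flux control coefficients can be estimated numerically from any model that returns the steady-state flux as a function of enzyme levels. The sketch below uses a hypothetical two-step pathway X0 → S → X1, with a reversible first step, v1 = e1·K1·(X0 − s/K_EQ), and an irreversible second step, v2 = e2·K2·s. All names and values are assumptions. The logarithmic derivative defined above is approximated by a central finite difference.

```python
import math

# Hypothetical pathway parameters (arbitrary units).
X0 = 1.0      # fixed concentration of the external substrate
K_EQ = 4.0    # equilibrium constant of step 1
K1 = 1.0      # catalytic constant of step 1
K2 = 1.0      # catalytic constant of step 2

def steady_state_flux(e1, e2):
    """Steady-state pathway flux for enzyme levels e1 and e2.

    The intermediate s is obtained by solving v1 = v2 analytically.
    """
    s = e1 * K1 * X0 / (e1 * K1 / K_EQ + e2 * K2)
    return e2 * K2 * s

def flux_control_coefficient(enzyme_index, enzymes=(1.0, 1.0), h=1e-4):
    """Central-difference estimate of d ln(J) / d ln(E_j)."""
    up, down = list(enzymes), list(enzymes)
    up[enzyme_index] *= math.exp(h)
    down[enzyme_index] *= math.exp(-h)
    return (math.log(steady_state_flux(*up))
            - math.log(steady_state_flux(*down))) / (2.0 * h)

c1 = flux_control_coefficient(0)
c2 = flux_control_coefficient(1)
print("C1 =", c1, " C2 =", c2, " sum =", c1 + c2)
```

With these values control is shared unequally between the two steps (about 0.8 and 0.2), and the coefficients sum to 1, as expected from the summation theorem for flux control coefficients. Neither enzyme is strictly rate-limiting.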

Within the context of model reduction, MCA can be used to identify those enzymes that most affect a target process and that should therefore be retained in the model. Geigenberger et al. (2004), for instance, used MCA to investigate the biosynthetic pathway of starch in potato tubers, showing that starch accumulation is mostly controlled at the level of ATP transport between cytosol and amyloplast, with a minor role for starch synthesis enzymes.

1.5.2 Coupling Among Scales

Once the description of processes on a common scale has been defined, models have to be connected together. Scaling up from cell to tissue or organ properties implies understanding the way cells communicate and coordinate with each other. Recent studies have shown that cell-to-cell coupling may involve different but intertwined mechanisms that include biochemical signalling as well as physical processes, as in the case of relaxation of mechanical stresses (Howard et al. 2011). A full multi-scale approach requires the identification of the predominant mechanisms and of those molecular variables that effectively act as “hubs”, connecting different organizational levels (Keurentjes et al. 2011).

This is the case of calcium ions in heart models, probably the most advanced example of a multi-scale approach (Hunter and Borg 2003; Noble 2002). The intracellular calcium concentration indeed affects the kinetics of actin filaments, providing the desired link between cellular metabolism and local mechanical properties of the heart. At the cellular scale, a set of ODEs describes the temporal change in the local concentration of ions (mainly calcium and sodium) due to the presence of ion channels, ion pumps, exchangers, etc. The stretching of actin fibres following Ca2+ binding is modelled using first-order kinetics, and the fibre tension is computed as a function of the intracellular calcium concentration. A diffusion equation, solved by means of a finite element method, then describes the propagation of the mechanical wave on the scale of the whole organ. Interestingly, the organ level can also exert feedback on the cellular level. In the model by Hunter et al. (1998), mechanical perturbation of the heart can alter the release of calcium ions from specific regulatory proteins, thus affecting the intracellular Ca2+ concentration.

In plants, multi-scale approaches are often employed to investigate organ emergence and morphogenesis, in both vegetative and non-vegetative organs (Band et al. 2012a; Prusinkiewicz and Runions 2012). A multi-scale model has recently been proposed to explain the dynamics of cell elongation in Arabidopsis thaliana roots and its control by the hormone gibberellin (Band et al. 2012b). To this aim, the root elongation zone is described as a single cell file along which gibberellin can diffuse. The movement of gibberellin is described in detail, both within and across cells, as is cell expansion along the elongation zone. At the subcellular level, a complex signalling network links the concentration of gibberellin to the distribution of DELLA proteins, which are known growth repressors. Following vacuole expansion, gibberellin is rapidly diluted in the cell, creating a significant concentration gradient of both the hormone and DELLA proteins along the elongation zone. The model indeed predicts a progressive accumulation of DELLA towards the end of the elongation zone, thus explaining the observed reduction in cell elongation.

The above examples illustrate the potential of multi-scale modelling in elucidating the emergence of complex phenotypes. Such examples, however, are still rare. Our knowledge of biological systems, and of plants in particular, remains limited, and further efforts are needed to understand the interplay among different organizational levels and to identify potential hubs. In the long term, interactions with the environment should also be taken into account in order to develop a full multi-scale ecophysiological model.

1.6 Conclusions, Open Issues and Perspectives

Recent advances in high-throughput methods have led many authors to hope for the advent of crop systems biology, combining information from molecular biology and physiology with a broader view of plant development and growth, in relation to environmental factors and agricultural practices. This is a great challenge of integrative biology: its success relies on close collaboration among scientists from different fields (including biology and mathematics), with several iterative cycles between experiments and models.

From a modelling point of view, this calls for the development of new methods to account for gene control. Classical approaches by parameter specification have already proven useful in several respects. In particular, their integration into crop models is direct and does not require building new specific models, and quantitative genetic control can easily be taken into account. The drawback, of course, is their low explanatory power regarding the fine mechanisms; model predictions may be valid only in well-defined experimental conditions, making this kind of model less amenable to exploratory in silico research. In parallel, approaches have been developed to model the behaviour of cellular networks. At this scale of organization, knowledge is more complete and the specific genes or loci controlling the traits are better known and described. A range of approaches is available, but in this chapter we focused on gene regulatory networks and metabolic networks that could then be linked to crop models. The last part of the chapter was devoted to the integration of different temporal and spatial scales (cellular, tissue, organ) within a single model, in the perspective of multi-scale modelling.

Plant and crop ecophysiological models formalize traits as the result of genotypic and environmental effects and of the relations among traits. In this sense they provide a platform for virtual profiling (i) for the integrative analysis of the impact of a combination of traits on whole-plant and crop phenotype (e.g., Bertin et al. 2010; Hammer et al. 2009) and (ii) for quantifying the impact of traits, individually or in interaction with other traits in a trait network, within a range of agro-climatic conditions. This opens the door to new opportunities for virtual breeding of ideotypes, such as developing genotypes specifically adapted to a set of conditions of particular interest (e.g., non-optimal pedoclimatic scenarios, new cultivation techniques, future climates). However, going back to experiments will be crucial to validate simulations against experimental evidence (Andrivon et al. 2013).

Despite recent advances in knowledge and development of tools, several scientific and technical challenges still need to be overcome. Considerable efforts are still needed to make the links between levels of organization and to integrate different types of information but we believe that this framework will soon become essential to further decipher plant complexity.