Keywords

1 Introduction

Among the different cell-free protein synthesis systems, the PUREsystem™ (Protein synthesis Using Recombinant Elements; PS for short hereafter), developed in 2001 by Ueda and coworkers [1] and commercially available, is one of the most known and employed, in particular in synthetic biology studies. Despite the relatively small number of chemical species contained in it (mainly DNA, ribosomes, a tRNA mixture, 36 purified E. coli enzymes, plus several small metabolites, for a total of about 80 different macromolecular species), it shows excellent performance in producing proteins starting from coding DNA [2]. Its description and preparation meet the standardization requirements of synthetic biology, it is listed in the Registry of Standard Biological Parts [3].

The studies of biochemical pathways in vitro proceeded in parallel to those performed in vivo, since in vitro approaches allow to test or measure quantities, kinetic parameters and other variables, that are not easily accessible in vivo. Together with the experiments, the necessity of a model able to describe the principal pathways lead several Authors to propose different strategies, focused on different pathways and using different detail level and different formal approaches. From early works where cell-free synthesis has been used to synthetize different isoforms of a protein [4], to more recent models where protein synthesis is studied in vivo or in vitro, by either “standard” differential-equation based models [5] or stochastic approaches to understand the role of the noise in the main biochemical patways [6, 7], trying to investigate emergent features due to the biochemical network complexity [8].

Apart from these general approaches, the PS has been successfully applied in lipo Footnote 1, allowing the possibility to monitor and study gene expression in small volumes.

For instance, an interesting study used the PS to study membrane proteins and pore-complex assembly using a novel approach called liposome display, where different DNA constructs are used as a template for in vitro protein production and assembly on the liposome membrane [9]. Other numerous applications of the compartmentalized PS were focused on understanding gene expression inside liposomes, such as in the presence of different membrane lipid composition [10], with different preparation methods [11], and in different-sized vesicles [12].

Understanding gene expression in small compartments is of crucial importance for defining the so-called minimal cell [10, 13, 14] a minimal entity able to display the fundamental properties of living system; the PS is the most used system for the construction of semi-synthetic minimal cells [13, 14].

Indeed, semi-synthetic minimal cells often consist of a liposome enclosing defined biochemical pathways, specifically pathways capable to synthesize proteins that could, in principle, close the circle towards a complete self-sustainability of the minimal cell: in other words, the different components of the minimal cells can cooperate to restore themselves. As of today, only a few papers have been published dealing with theoretical simulations, at different detail level, of PS behavior and time course [7, 1517], and only three deal with PS entrapped in liposomes [1820]. This last condition is very interesting since the smaller the liposome, the lower the probability that a large amount of molecules of each constituent chemical species be entrapped in the volume of the liposome. This implies that a standard simulation of PS based on deterministic ODE formalism is no longer suitable to describe a biochemical system beyond the deterministic limit. If even a single chemical species is present with only a few molecules, the behavior of all the PS is turned to stochastic.

To describe a stochastic chemical system, the most often used approach is to employ Gillespie algorithm [21]: this is an algorithm derived from collision theory that operates by selecting, by means of two suitably generated random numbers, which reaction r to execute at the current step, and the waiting time\(\tau \) for the next reaction. The probability distribution of r is uniform over the propensity of the reactions, while the \(\tau \) distribution is an exponential decay (i.e., the reaction system is described as a pure Markovian process). Gillespies algorithm has been proved to be correct: for large numbers of molecules and for long times, the time courses described by the algorithm are identical to those described by the Chemical Master Equation for the continuous case. Because of this characteristic, Gillespies algorithm (with several variants and improvements) has been applied in a very large number of stochastic simulations of chemical, and even biochemical, systems [22].

2 Describing the PS in a Stochastic Simulator

There are several problems to address in order to have a detailed description of the PS in a Gillespies algorithm-based simulator. These problems arise from the not negligible differences between a common chemical system as hypothesized by Gillespie, and a cell-free system, which hosts several complex multi-step processes. For instance, two major problems are represented by the description of transcription and translation: as they imply the presence of a nucleic acid (either DNA or RNA) with an extensive reaction machinery (either polymerases or ribosomes) bound to it. These complexes are spaced from one another by a certain minimum distance, and slide on the nucleic acid molecules moving one step only when the complex ahead has itself moved.

Even this description is very simplified; at any rate, it is incorrect to represent this as a standard stoichiometric system. The concept of sequentially ordered movements is not contemplated in Gillespies approach, and there is no natural way to describe it. A possible solution to bypass these obstacles is to use some dummy chemical species, which do not exist in reality, but which can be used to the required effect (the term dummy is introduced by analogy with computer programming, where dummy variables indicate placeholders variables). Therefore, we simulated the attachment and movement of polymerases and ribosomes through a series of dummy chemical species simulating binding sites and movement from a nucleic acid segment to the next.

Adding to the complexity of this scenario, we have to note that translation and transcription are simultaneous events in cell-free PS, and translation starts immediately after the transcription of the first RNA segment, while polymerases are still working on the DNA molecule. This situation is close to that of prokaryotic cells, where translation and transcription take place both in the cytoplasm. As a consequence, the dummy species and the real reactions for both transcription and translation should be arranged to work together.

In the following, we would like to explain in detail the structure of our stochastic description of liposome-entrapped PS.

3 Characteristics and Properties of a Stochastically Simulated PS

3.1 Principles and General Remarks About the Simulator Used (QDC)

In our case study, the PS is used to synthesize Green Fluorescent Protein (GFP): the DNA input molecules code for the standard GFP sequence. Accordingly with Gillespies approach, all the described chemical species can interact independently of each other, and the solutes entrapped inside the liposome, even in the case of dummy species, are supposed to be well-stirred.

We simulated the system by using QDC, a lab-made simulator, whose core is based on Gillespies direct method, with extensions allowing the user to simulate a metabolic experiment, rather than an isolated metabolic system. A detailed survey of QDC is presented in the literature [23]. Here we briefly recall only the most important concepts. QDC can simulate a metabolic experiment since it implements control functions that allow the user to exert some control on the metabolic system during the simulation. In particular, QDC allows: to simulate the addition of molecules at a given time; to simulate continuous feeding or leaking of some species at a set time rate; to specify the so-called “immediate” reactions (reactions with theoretically infinite propensity, that take place immediately after their stoichiometric conditions are met). In our experiments, we have extensively used immediate reactions to describe the advancement of polymerases and ribosomes on their respective nucleic acids.

QDC outputs three main result files: the time courses of all the chemical species present in the system; the propensity time course for each reaction; the number of times that each reaction has been executed. By analyzing and comparing all these three outputs one can extract information about the status of the system and detect possible biases that might have perturbed the simulation results (e.g., a possible stiffness of the system).

3.2 Transcription and Translation Processes

The GFP DNA sequence was divided (according to its length) into 80-bp-long segments, each of which is treated as a distinct chemical species, and named, for instance, DNA1, DNA2, DNA3,.... As a consequence, the polymerization process is represented by several reactions: their core module includes a second-order reaction for nucleotide binding, and a first-order reaction for nucleotide incorporation, which returns the polymerase molecule (which, in turn, can bind to another nucleotide) and a dummy product designed to track the number of nucleotides incorporated in the RNA molecule (these will be named RNA1, RNA2, RNA3, ...

Some immediate reactions determine the transition to the next step, ensuring the following conditions are met: (a) an adjacent DNA site is available, (b) the correct number of nucleotides has been added to the RNA sequence, (c) the corresponding RNA sequence is produced, (d) the previously occupied DNA site is released. Here is an example for transcription at the fourth step (GTP and ATP are considered):

figure a

This reaction routine was repeated different times ensuring the correct succession of molecular states; when only one DNA molecule is available, the polymerases advance one by one, separated by at least one DNA site between each other (elongating RNA polymerases are separated from each other by at least 80 bp [24]). The same strategy was used to describe the translation reactions.

An elongating ribosome (eR2) binds the complex carrying the aminoacid (EFaRGTP), after which it moves to the next codon, aided by the elongation factor EFg charged with GTP (EFgGTP); this translocation reaction yields an additional product used to regulate the progression to the next state.

figure b

After a fixed number of translocation steps (the minimal space between two elongating ribosomes is, as with polymerases, \(80\ \text {nt} \sim 20\ \text {codons}\) [25]) an immediate reaction occurs in a similar fashion as seen for transcription:

  1. 1.

    the correct amount of aminoacids are incorporated, and thus consumed;

  2. 2.

    the next free RNA site is occupied, and

  3. 3.

    the previous one is therefore freed;

  4. 4.

    an entity named PEPT is also produced, tracking the length of the peptide sequence produced so far:

For example, if 4 species named PEPT3 are present in a certain time of the simulation, this means that there are 4 peptides, still bound to the ribosomes, with a length spanning from \(27\times 3= 81\) to \((27 \times 4) -1 = 107\) aminoacids. This “segmental” partition of DNA and RNA, with the discrete and stepwise occupation by the transcription and translation molecular machineries respectively, is schematically represented in Fig. 1.

Fig. 1.
figure 1

A schematic representation of the stepwise transcription and co-translation processes (redrawn from [20]).

3.3 The Overall PS

As this system focuses on the synthesis of GFP starting from its DNA and other basic molecules (nucleotides, aminoacids, etc.), it hosts only a few reactions other than translation and transcription, namely the initialization of both processes and the eventual ribosome inactivation (which takes place 3 h after the start of protein synthesis, see [17]).

4 Results: Simulating in-lipo PS at Nanoscale Range

4.1 Poisson vs. Power-Law: Which Distribution for Solute Entrapment in Nano-Vesicles?

The process of solute entrapment during the formation of liposomes is of special interest, as it affects the distribution of molecules inside them - a relevant issue for studies on the origin of life. Theoretically, when no interactions are supposed to exist between the chemical species to be entrapped, or between these and the nascent lipid bilayer, a standard Poisson process well describes the entrapment mechanism. Recent experimental findings, however, show that, for small liposomes (100 nm diameter), the distribution of entrapped molecules is best described by a power-law function [26], where supercrowded liposomes (i.e., liposomes showing a very high inner solute concentration) are present in a very small but not negligible number. This is a matter of great consequence, as the two random processes generate two completely different scenarios.

We used QDC to simulate a GFP-synthesizing PS encapsulated into liposomes, with the solute partition inside the vesicles obtained by the two entrapment models: a pure Poisson process or a power law. The protein synthesis in lipo was studied in both cases to highlight experimental observables that could be measured to test which model best fits the real entrapment process.

The time course of protein synthesis within each kind of vesicle was then simulated, as a function of vesicle size. Our study can predict translation yield in a population of small liposomes down to the attoliter (\(10^{-18}\) L) range. Our results show that the efficiency of protein synthesis peaks at approximately \(3\cdot 10^{-16}\) L (840 nm diam.) with a Poisson distribution of solutes, while a relative optimum is found at around \(10^{-17}\) L (275 nm diam.) for the power-law statistics. Figure 2 shows the time course of GFP synthesis in a vesicle of \(10^{-17}\) L volume: the protein copies accumulate over time, until the depletion of inner reagents stops the generation of new copies.

Fig. 2.
figure 2

Time course of the GFP synthesis in a \(10^{-17}\) L vesicle, populated accordingly to a power-law distribution. In the abscissa the time (in s), in the ordinate the number of GFP molecules synthesized (redrawn from [19]).

Our simulation clearly shows that the wet-lab measurement of an effective protein synthesis at smaller volumes than \(10^{-17}\) L would rule out, according to our models, a Poisson distribution of solutes, and thus would indirectly support that a supercrowding effect took place. This suggestion fits well with a discussion found in [27].

4.2 How Is Intra-vesicle Solute Composition Driving Gene Expression from DNA to Protein?

The next step was to determine whether all chemical constituents of the PS are entrapped in liposomes according to a power law (then, giving rise to a supercrowding effect) independently of each other. Independence (or lack thereof) in solute partition is a foundational feature which we hypothesized would have a significant effect on the final state of the system (e.g. efficiency of gene expression). Finding key end-state parameters dependent on the original entrapment dynamics would also offer an obvious pathway to empirical testability. In this study, the simulations were firstly performed for large liposomes (2.67 \(\mu \)m diameter) entrapping the PS to synthesize GFP. By varying the initial concentrations of the three main classes of molecules involved in the PS (DNA, enzymes, consumables), we were able to stochastically simulate the time-course of GFP production. A mathematical fitting of the simulated GFP production curves allowed us to extract quantitative parameters describing the protein production kinetics; as expected, these parameters resulted significantly dependent on the initial inner solute compositions. Then we extended this study to small-volume liposomes (575 nm diameter), where intra-vesicle composition is harder to infer due to the expected anomalous entrapment phenomena.

In our in silico perturbation of the PS composition, we observed how the competition between DNA transcription and translation for energy resources (here represented by the sum of ATP and GTP available molecules) shapes the protein production dynamics, in terms of total protein yield and its production kinetics. Figure 3 offers a global survey of these results: it shows, for each class of initial DNA inner concentration, the resulting global production in GFP and the corresponding use of energy molecules by the different biochemical processes.

Fig. 3.
figure 3

The graph shows the final GFP production obtained (ordinate) according to the initial DNA inner concentration (abscissa). The pie charts placed in correspondence of the abscissa summarize the global energy molecules use for all the case studied (redrawn from [20]).

A mathematical formalization of the PS reaction network allowed us to focus on different aspects of in lipo protein production, and a discussion about our computational strategy together with its scientific relevance is presented in the following sections.

5 On the Relevance of a Detailed Stochastic PS in Silico Model

Different in silico approaches have been tackled for the description of the complex system of reactions encompassing the PS. Despite the large number of biochemical species, the interactions between the single molecules are well-known. Moreover, higher-order interactions between different molecular classes seem to have a modest impact on the overall protein production [29]. Therefore, it is reasonable to approach a comprehensive description of the PS by modeling every reactant as a single entity.

A deterministic formulation of transcription and translation processes can be a useful representation of the PS [17] under the assumption of a continuous, homogeneous distribution of solutes. When dealing with compartmentalized biochemical networks, the intrinsic stochasticity of molecular reactions plays an important role in the observed protein production. A very recent study [30] highlighted the importance of stochasticity in cell-sized lipid compartments, showing how fluctuations in the number of DNA molecules is one of the main responsible of protein production noise using the PS. As volume decreases, well-described anomalous entrapment phenomena [26] create a drastic vesicle-to-vesicle variability in terms of protein production. Moreover, the extremely low number of molecules that eventually can govern a cellular process poses a great challenge in modeling the PS biochemical network. A stochastic representation of the PS deals with those limitations, allowing researchers to focus on single components of the network, with the possibility to focus on nanoscale-sized compartments. This creates the ability to further test different assumptions of the encapsulation efficiencies for different groups of reactant, which, by differentially interacting with the lipidic bilayer, can differentially drive protein production in a size-dependent manner.

Moreover, quantifying the extent to which the intrinsic vesicle-to-vesicle variability influences the protein production in small compartments is of great relevance to discriminate between different components of variability in nano-scaled compartmentalized gene expression: one coming from the random collision between molecules, and the other coming from different internal vesicles composition.

6 Conclusions and Future Directions

Cell-free technologies like the in lipo-PS allow us to monitor and study gene expression kinetics under controlled, known conditions, whereas this is not possible in vivo as well as for whole cell-extracts (e.g. E. coli extracts) also used in similar synthetic biology approaches. Such information is of great value for the most diverse applications, in both applied and basic research [30, 31].

Our main focus is the quantitative understanding of compartmentalized gene expression with respect to the internal solute distribution. By testing different entrapment mechanisms and their effect on protein production in the different PS processes, it is possible to create a link between an observable feature (protein production) and the internal liposomal solute distribution. The ability to infer the precise internal vesicle content is of course of great interest for understanding the different kinetics in different size compartments. This means that the ideal experimental approaches are those able to perform single vesicle measurements, as population measurements could be unable to characterize the high complexity of vesicles behavior.

The results coming from our stochastic modeling are geared to aid experimental design, and experimental validation of our predictions will be needed to further explore and improve our understanding of the compartmentalized PS. A better description of liposome formation kinetics will also help the community to test different encapsulation efficiencies for different molecules, which will in turn affect the internal gene expression kinetics.

As more data about PS use in compartmentalized systems become available, our interpretation of in silico experiments will also improve, allowing us to further tackle the problem of understanding nano-scaled protein-producing liposomes.