Keywords

1 Introduction

The emerging science of synthetic biology is providing novel tools for the design, realization, and control of biological processes through the programming of cells’ genetic code [21]. These tools are allowing engineers to study and access the basis of molecular information processing, and to develop software with specific functionality that can be engineered into living organisms [33]. Synthetically engineered biological organisms (SEBOs) are leading to novel applications that include the enhancement of soil quality [6], the creation of novel sources for biofuels [36], the development of engineered biological tissue [26, 31], and the synthesis of biocompatible intelligent drug delivery systems [24].

SEBOs are created by manipulating and combining genetic components in the laboratory to modify existing organisms’ traits or to generate entirely new functionality, including computational components (both analog and digital [32]). This will open the road to the development of biological computers [5]. From the identification of sequences of DNA genetic code with precise biological functions, to the design of components into more complex programs with information processing and logical control capabilities, software principles are used in every SEBO. As a consequence, SEBOs are truly software-intensive systems [17].

While synthetic biology is opening doors to the programming of living organisms, questions should be posed about the interaction of these programs with human life and the environment. Even if we ignore the possible malicious uses of this technology [22], or the bioethical aspects [2], faults and failures in biological programs can pose serious threats, such as by the inadvertent production of compounds toxic to humans, or by polluting the environment, or colonizing ecological niches. If SEBOs are used for drug delivery or to protect us from harm, then we must also rely on the intended behavior occurring as expected. However, SEBOs are living and therefore their programs are dynamic in nature. They are subject to mutations, and regular evolution, and this has made it difficult to build a framework for their certification, and ultimately for governmental regulation and guidance [1, 25].

To date, there has been little research that provides systematic techniques to verify biological systems in general [8]. In this paper we argue that it is possible to build assurance cases (more specifically safety cases) for reasoning about the behavior of SEBOs. We first show how they are similar to software systems, and as such we can build arguments and evidence for their safety. However, we also identify some key challenges due to the their mutation and evolution over time. We propose a new dimension to the assurance case, the assurance timeline, which leverages recent work on dynamic assurance cases [10], and at the same time addresses a new problem, random changes in the system to be assured. Since the software code itself mutates as it evolves, it can change functionality over time. The assurance timeline captures this new dimension through the identification of intervals that may require either new evidence or a new set of claims and arguments. We present an example of an SEBO assurance case and discuss its potential evolution, based on a real SEBO project from the literature.

The contributions of this paper are:

  • The proposed use of assurance cases for SEBOs;

  • A new kind of assurance case - the assurance timeline; and

  • An illustrative example demonstrating how these might be applied to a real-world SEBO.

The rest of the paper is organized as follows. In the next section we present some background on synthetic biology and motivation for the use of assurance cases. We follow this with an example of applying the assurance case and an assurance timeline in Sect. 3. We then present related work in Sect. 4 and finally conclude in Sect. 5.

2 Background and Motivation

Consider a strain of high yield soybeans that thrives on a specific type of protein. If that protein is only produced by bacteria that also produce a toxic protein which can seep into the water supply, the cultivation of these soybeans would be an environmental hazard. A synthetic biologist might recognize an opportunity and solve this problem by (i) modifying the soybeans to utilize a different protein while retaining a high yield, or by (ii) designing a program with protein inhibitors and repressors that will prevent the production of the toxic protein in the pre-existing bacteria, or by iii) developing a new version of the bacteria that only produces the beneficial protein and does not output the toxin at all. In the rest of this section we describe how this type of programming is achieved in SEBOs and then discuss the need for building assurance cases.

2.1 Synthetic Biology

In synthetic biology, engineers design plasmids and insert them into living cells. A plasmid is a DNA sequence that is not part of a cell’s chromosome but that can trigger the ribosomes to synthesize new proteins. Ribosomes are responsible for creating proteins, which underlie most cellular functions [4]. Engineers have already implemented various logic gates within SEBOs including AND, NOR, and NOT gates [4]. They have also realized more complicated systems, including feedback loops, intercellular signaling, biological on/off switches, oscillators, and counters, leading also to the idea of cellular memory [4].

A language called the Synthetic Biology Open Language (SBOL) [14] has been developed to represent programs for SEBOs (or SEBO parts). Figure 1(a) shows a small (abstract) program in SBOL. At the top left is a promoter. A promoter is a short DNA sequence that causes transcription to occur (transcription is how DNA copies itself). It is followed by a ribosome binding site that allows ribosomes to bind to the transcribed mRNA to start its translation into a protein. This is followed by a coding sequence which contains the information to synthesize a protein. Finally, it ends with a transcription terminator which indicates where to stop the transcription of DNA into mRNA. The circular line back to the promoter indicates a repetition operator in this process. A concrete version of this program would indicate specific instances of promoters, terminators, ribosome binding sites, etc. If we consider SBOL as a programming language, we can view the DNA sequences as our byte code and the transcription and translation of the DNA into proteins as compilation, after which point we have proteins which perform as binary code, yet are not humanly readable.

The International Genetically Engineered Machine (iGEM) Competition, hosted by MIT since 2004, is an exemplar of the widespread use of SEBO programming. It is a competition for students ranging from high school through to graduate school. Teams are given a kit of biological parts and develop novel functionality. The parts are added to biological chassis such as E. coli or other common bacteria. iGEM encourages modularity, reusability, and interoperability of parts. As such there are over 20,000 parts, or BioBricks, with known functionality in the iGEM parts registry [19]. An example of a Reporter part from the registry, which causes an organism to fluoresce under certain conditions, is shown in Fig. 1(b). This part can be obtained from iGEM and added to a team’s own program in combination with a multitude of other parts.

Fig. 1.
figure 1

An example of (a) an SBOL program and (b) concrete instance of a reporter part that creates fluorescence from [18]

2.2 An Argument for SEBO Assurance Cases

Given the ability to mix and match thousands of promoters, ribosome binding sites, sequences, terminators, etc. and to define concrete logic for their combination, we can view this in a similar fashion to designing a program. We have an abstract model and should be able to predict behavior under a variety of conditions (inputs). While the programs and parts developed by iGEM are by default restricted to the laboratory environment, students are required to present informal safety arguments about their systems (if they were to be released) and the long term goal of synthetic biology is to make variants of organisms that can actually co-exist in the environment. Given that these new programs will interact with humans and their environment, we need confidence that the devices work safely as expected. In this direction, it is possible that synthetic biological systems can be designed with kill switches, which force the bacteria to die under specific conditions, but these too must be reliable. Assurance safety cases seem to be the natural way to provide the necessary evidence for these systems.

While we believe that safety cases are a good starting point for SEBOs, we also argue that they have novel characteristics which may add challenges to their formal definition. Given that they are living systems, once the code is compiled, we cannot assume (as in most software systems) that the software will remain unchanged. In fact, we expect that random mutations will occur during each generation of the organisms lifecycle and that new (previously unknown) functionality may appear over time. While a single mutation is unlikely to dramatically change an SEBO’s function, the cumulative effect over many generations can lead to strains that do not adhere to the original safety case. At the same time, SEBOs are subject to evolution, where natural selection will tend to promote mutated strains that best fit the environmental conditions, by growing and reproducing at a faster rate than others. Hence, while SEBOs share characteristics with other dynamic systems and may benefit from a dynamic safety case, the peculiarity of their mutation and evolution processes differentiates them from other systems studied in previous literature, such as unmanned aircraft systems (UASs)[10]. However, we also believe that the seemingly overwhelming complex nature of SEBO mutations and evolution can be successfully captured at regular time intervals, which can allow identifying when new evidence (or new safety cases) are needed. This will lead to what we call the assurance timeline, presented in the next section.

3 Assuring SEBOs: An Illustrative Example

To illustrate our ideas we have studied a project from the 2012 iGEM competition and used their safety documents as the basis for our example. This project, called the Food Warden, was created by the Groningen team [16]. The goal of Food Warden is to engineer Bacillus subtilis to change color when it detects spoiled meat in a refrigerator. Spoiled meat releases ammonia, and ammonia contains nitrogen. The team uses the PsboA promoter that causes a color change in the bacteria Bacillus subtilis (or B. subtilis for short) to alert in the presence of nitrogen. The design includes a semi-permeable “sticker” with an outer membrane made from an FDA-approved material, Polymethylpentene, which has nanopores large enough to allow passage of gases, but small enough to prevent the bacteria from escaping. The sticker relies on sensing nitrogen molecules from the rotten meat through a process known as quorum sensing. When a threshold number of signaling molecules is detected by a sufficient number of bacteria, it will trigger a response. The bacteria will produce yellow fluorescent proteins to visually indicate meat spoilage. The sticker also features a breakable inner packet of Luria broth as a growing medium, which is activated when ready to use. If the bacteria escape the packet, the team claims that the bacteria cannot live without the food source in the sticker, and that the original bacteria are known to be non-harmful to humans or to the environment.

The team identified five separate threads of safety requirements. The first two, general safety of synthetically engineered organisms, and safety in the lab, are out of the scope of our assurance example presented here. The other three, we discuss briefly. First, the team identifies the safety of their sticker design which contains the engineered organisms. They consider this a public safety issue, since B. subtilis needs to remain isolated from the consumer and the meat. Second, they highlight food safety, which includes protecting the consumer from the effects of eating spoiled meat, i.e. the color must be visible and reliable. Last they highlight environmental safety, in the case the bacteria should accidentally be released into the environment (all iGEM projects are laboratory based, however if this project was to move out of the laboratory, it would become an issue).

Fig. 2.
figure 2

Food Warden (partial) SEBO assurance case using GSN

3.1 Initial Assurance Case

We have built an initial (partial) safety assurance case using the goal structuring notation (GSN), to demonstrate its feasibility for SEBOs. We show this in Fig. 2. The top level Goal (G0) is that Food Warden is safe for humans. Using arguments over identified hazards as the strategy (S0) there are four subgoals for this case (G1-G4). G4 has an open diamond showing that it is not complete. G1 is the goal that the sticker will keep the bacteria isolated under normal conditions. This goal has a claim that the sticker membrane will not allow leakage of the bacteria. G2 says that the sticker is not harmful to humans. Both use a strategy (S1) that identifies known material properties, which connects to three subclaims (C3-C5), namely, the membrane material is smaller than 0.5 \({\mu }\)m in diameter, it is made of a known material with strong properties, and, finally, it is safe (C4) for humans. Evidence for these claims includes testing the diameter of B. subtilis, confirming the strength of the material, and the existence of FDA approval of human safety of the material.

Subgoal G3 reasons about the safety of the system’s logic – it will correctly identify meat that is unsafe for humans. It states that spoiled meat will result in observable fluorescence of the bacteria. This has a subclaim that the sticker works for 30 days once the inner packet is broken. For these goals the strategy used is that of existing food standards. Two assumptions are included. A0 states that spoiled meat will emit ammonia consistent with the total aerobic microbial count (TAMC) which is used to measure the degree of spoilage as determined by FDA acceptable levels. Since the bacteria sense the ammonia which then cause them to fluoresce, this assumption must hold. The second assumption (A1) states that the inner packet contains Luria broth with a 30-day supply of glucose (required for growth). Claims 6, 7 and 8 argue that the bacteria fluoresce once the TAMC threshold is reached, that this is visible under a range of light conditions and that there is sufficient nutrients for 30 days.

Finally, Goal 4 requires that release into the environment will not be harmful to humans. We have only partially expanded this goal (the team identified several other subgoals). Claim 2 says that the bacteria cannot live outside of the sticker, with evidence (E0) that shows via laboratory tests that B. subtilis will die without the glucose in the broth.

3.2 Evolution in SEBOs

While we show that a safety case can be built for a system such as Food Warden, this case is static. It does not account for evolution. Denney et al. [10] describe the need for dynamic assurance cases for systems such as unmanned aircraft systems. We argue that SEBOs are also dynamic. Not only do their environmental conditions change, so too does their “software” which is subject to random changes at regular intervals via evolution and natural selection. Since B. subtilis are living, there will be mutations to the organism’s DNA. However, we have models of the organisms and know their evolution frequency and magnitude, therefore we should be able to reason about the likelihood of a behavior change at a point in time. Assuming that we have this information, we can allow the safety case to hold for a specified regular time interval before re-assurance.

Fig. 3.
figure 3

Two types of SEBO assurance case evolution. A. shows a change required in the evidence and arguments, while B. shows a change in the assurance case structure.

We show how the safety case can evolve in two ways (see Fig. 3). First it may require a change in evidence. Second, it is possible that the entire structure of a branch of the assurance case will change. We discuss each of these next. Assume the bacteria are engineered to only digest glucose in the Luria broth and that there is only enough glucose to keep the bacteria alive for 30 days. If the bacteria evolve or mutate to digest an alternate sugar such as lactose as well, then we would need new evidence for assumption 1 (A1) and claim 8 (C8), because the system has changed [20]. We can either provide evidence that there is no lactose in the Luria broth or the meat, or that there is an insufficient combination of lactose in the Luria broth to keep the bacteria alive for more than 30 days. We depict this evolution in Fig. 3A where E0 has been updated to E0’ at time interval 1 and A1 has been updated to A1’. This type of evolution does not change the assurance case - it simply requires that new evidence and assumptions are provided.

A second possibility for evolution, is that the bacteria develop the ability to digest the gases from the spoiled meat, and they no longer fluoresce to warn consumers since the gas has been consumed. It is possible, that an inhibitor protein may need to be added to the bacteria to prevent this behavior from happening. This would require a change in the structure of the safety case itself. We would need a new claim that provides evidence for the functionality of the inhibitor. We show this in Fig. 3B.

3.3 The Assurance Timeline

We now present the assurance timeline, an orthogonal assurance case that reasons about our confidence in the stability of the current assurance case. It argues that within specific times slices, evolution is unlikely to impact the existing case.

Fig. 4.
figure 4

Assurance timeline informs times intervals for re-assurance. Confidence decay occurs between timeslices. Evolution envelope contains set of possible changes at that interval. Selected changes are shown.

While biological mutations are random, if we use known models of organisms, we can infer the possible trajectories of evolution at a given point in time and group the potential changes into an evolution envelope. The envelope then contains a set of possible behavior changes which are reflected as changes to our current assurance case. We illustrate this idea in Fig. 4. Based on the set of possibilities within an envelope, we can analyze the evolved system to select the new assurance case. In Fig. 4 we see that at time T\(_2\), a change in evidence occurs, while at T\(_3\) a change in structure occurs. Within each time slice, there is a decay in our confidence of the current assurance case.

The assurance timeline itself can be viewed as an assurance case. Figure 5 shows this view (note that this is generic and may not be accurate for a specific organism such as B. subtilis). The top goal states that the system behavior is stable for 10 days (our time interval for this assurance case). The subgoals and claims are based on the known evolution of the organism. The evidence comes from empirical data or in silico computational models. The timeline can be combined with other elements of a dynamic assurance case such as increased traceability for a through-life assurance as proposed by Denney et al. [10].

Fig. 5.
figure 5

Assurance timeline as an assurance case.

4 Related Work

Safety assurance cases have been applied to a wide variety of domains. The list includes offshore drilling operations, railroads, nuclear power plants, avionics, national defense, automobiles, and medical devices [11, 23]. There has been research on how to build safety cases, such as showing evidence traceability [30], automating the collection of evidence [28], creating a controlled and structured textual language [3], arguing the need for a hierarchical structure [12], and automating safety case generation from tabular requirements specifications [9]. Although the trend has been to build more structure, formalism, and consistency, there have been arguments against formalism because safety cases require natural language which is open to multiple interpretations [15]. This work differs in that we are applying an assurance case to a biological domain.

The most closely related area to synthetic biology is the medical and medical devices domain. Within the medical domain there has been a push towards using assurance cases [7, 27, 35], however there have also been arguments against this idea [34]. Sujan et al. argue that since the healthcare industry is one where the “level of maturity of safety management systems is arguably still lower than in traditional safety-critical industries” it might be best to use safety cases only for internal purposes, rather than to have regulators mandate them. SEBOs may suffer similar issues.

There has been research on formal methods for biological organisms. David et al. [8] proposed use model checking and Petri nets for runtime verification of a biological oscillator, while Ellis et al. and Lutz et al. applied automated software requirement analysis and probabilistic model checking to DNA self-assembly [13, 29]. Both of these threads focus on a single biological function rather than the entire system within their environment, including mutations and evolution, and neither propose the use of formal assurance cases.

Finally, Denney et al. [10] propose dynamic safety cases which consider through-life assurance. In this type of safety case, environmental conditions change and emergent behaviors appear and the notion of continuous assurance over the life of the system is needed. We view our assurance timeline as a variation a dynamic safety case, where not only the environment, but the software itself is subject to random changes.

5 Conclusions and Future Work

In this paper we have shown that the emerging field of synthetic biology is a novel direction for building safety assurance cases. We have illustrated how we can build a safety case for an existing project from the iGEM competition. However, we also differentiate an SEBO assurance case in that there is an expected evolution over time given the living nature of these systems. We show how the assurance case can evolve in two ways. One simply requires new evidence, however the other may change the structure. We argue that the evolution itself can be reasoned about and propose the use of an assurance timeline, an assurance case for the evolution itself. This will provide arguments and evidence for the necessary time intervals at which the assurance case will change. As future work we will build larger, more complete assurance cases for other projects and work with biological engineers to develop evidence. We will also explore the use of regression testing techniques to reason about impactful change.