Abstract
Biomolecular computation is the scientific field focusing on the theory and practice of encoding combinatorial problems in ordinary DNA strands and applying standard biology lab operations such as cleansing and complementary sequence generation to them in order to compute an exact solution. The primary advantage offered by this computational paradigm is massive parallelism as the solution space is simultaneously searched. On the other hand, factors that need to addressed under this model are the DNA volume growth and computational errors attributed to inexact DNA matching. Biomolecular computation additionally paves the way for two- and three-dimensional self assemblying biological tiles which are closely linked at a theoretical level to a Turing machine, establishing thus its computational power. Applications include medium sized instances of TSP and the evaluation of the output of bounded fan-out Boolean circuits.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
- Biomolecular computation
- DNA computation
- Computing paradigm
- Computational media
- Parallel computing
- CREW PRAM
- TSP
- Boolean circuits
- Nondeterministic Turing machine
1 Introduction
The seminal paper [19] is widely known for essentially establishing the field of quantum computing. However, a lesser known offshoot is biomolecular computing, alternatively known as DNA computing. The founding notions of this computational paradigm were presented in [2], where it was proposed that regular DNA strands can represent combinatorial inputs instead of the functions of a living organism. Then standard lab operations can be applied to these strands in order to extract strands containing a solution. As a concrete application, the TSP was solved exactly through a brute force methodology in a medium sized graph \(G \,=\, \left( {V,E}\right) \) in \({\mathrm {O}}\left( {\left| {V}\right| + \left| {E}\right| }\right) \) elementary operations and \({\mathrm {O}}\left( {\log \left| {V}\right| }\right) \) DNA strands, indicating the potential of massive parallelism. This was repeated in [32] with a different algorithmic approach though in the sequence of lab operations, establishing the fact that novel and efficient algorithms are also necessary in this computational paradigm.
The primary objective of this survey is to concisely summarize the principles and notions of the paradigm of biomolecular computing with an emphasis on the potential for massively parallel computations. The latter may well serve in the dawn of the big data and 5 V era as an unconventional inspiration for the designers of parallel algorithms or distributed systems.
The remaining of this survery is structured as follows. Section 2 summarizes the principal concepts of biomolecular computing, the connections to known computational models, and describes computational applications not examined elsewhere in this survey. The elementary operations, advantages, and disadvantages of the biomolecular paradigm are explained in Sect. 3. The most important application, namely TSP, is described in Sect. 4. The parallelism potential is explored in Sect. 5, while factors working against the computation scale up are investigated in Sects. 6 and 7. The main points are summarized in Sect. 8 and certain conclusions are drawn. Finally, Table 1 summarizes the survey notation.
2 Previous Work
As stated earlier, the groundwork for the paradigm of biomolecular computing was laid in [2] followed by [32]. Soon, notions and applications emerged as noted in the early surveys [40, 45, 47]. The computational power of the paradigm is explored in [6, 9] and its limits in [18]. The enormous potential for parallelism is highlighted in [21,22,23,24, 43]. The notion of self assembly was applied to this paradigm in [1, 31, 48]. According to this principle, which is reminiscent of non-supervised learning, DNA tiles in a test tube given proper mobility conditions and time can attach themselves to tiles or strands containing fully or partial complementary sequences without human intervention in two [50, 51] or three dimensions [28]. Issues pertaining to complexity of the biomolecular paradigm are examined from various viewpoints in [4, 29, 46].
Another way to examine the potential and the complexity of biomolecular computation is through the simulation of the operation of sequential, bounded fan-in Boolean circuits with DNA strands as first shown in [3, 36] and described in detail in the follow up work [39]. Questions regarding the complexity of constructing and evaluating the output of such circuits are addressed in [34, 35, 37], whereas self assembling circuits are investigated in [38].
Among the algorithmic applications of biomolecular computation are the brute force parallel solution of k-SAT in [13] and in [5], dynamic programming on the Cell Matrix architecture [49], and splicing systems [11]. Shortest path algorithms implemented in biomolecular elementary operations are presented in [33] and in [42]. Length bounded computing with DNA strands and its connections to space complexity are explored in [20]. An evolutionary algorithm also expressed these operations is described in [12].
Finally, steps regarding the implementation of a DNA computer are given in [25, 26, 41, 44], although these proposals vary. DNA operations can be also simulated over Neo4j with properties in edges corresponding to physical or chemical DNA properties in an approach similar to the one presented in [30] for implementing persistent data structures. Concerning software, a fully functional graphical computing environment for biomolecular computation is described in [10]. It also includes DNA C, a C variant which is a combination of the C constructs pertaining to integers with an extension implementing the fundamental operations of biomolecular computing. The exclusion of floating point arithmetic should not come as a surprise, since biomolecular computation is discrete in nature and has been so far applied only in combinatorial problems. Nonetheless, should the need arise, floating point numbers can be approximated fairly well by rationals or by continued fractions. For instance, the golden ratio \(\varphi \) is represented by the infinite fractionFootnote 1
3 Paradigm Notions
3.1 Definition
Knowledge transfer and inspiration between computer science and biology has been fruitful. This relationship has already resulted in bioinformatics, connectomics, and computational biology. One of them, with direct reference to the everyday laboratory handling of DNA strands, is the paradigm of biomolecular computation which can be formally defined as
Definition 1
Biomolecular computing is the art and science of using DNA strands as computational medium to appropriately encode candidate solutions to (possibly intractable for a Turing machine) combinatorial problems and using standard biological laboratory techniques in order to select an exact solution.
From Definition 1 follows that any DNA encoding corresponds strictly to candidate solutions of a combinatorial problem and not to the design and functions of a living being. Also, the difference from the fields of bioinformatics and computational biology should be clear. Although the inspiration, terminology, and implementation are biological, the paradigm is definitiely computational as the objective is to codify and efficiently solve instances of intractable, at least under the conventional Turing machine model, combinatorial problems.
3.2 Abstract DNA Operations
Under the biomolecular computation paradigm the elementary operations applied to the DNA strands in the test tube are the following.
-
Initialize (\(T_0\)): Crate a test tube \(T_0\) containing each admissible candidate solution for a combinatorial problem according to a probability distribution, usually the uniform one.
-
Copy(\(T_0\), \(T_1\)): Copy the contents of \(T_0\) to the tube of \(T_1\).
-
Merge(\(T_0\), \(T_1\)): Mix the contents of \(T_0\) and \(T_1\) to \(T_0\).
-
Detect(\(T_0\)): Examine whether \(T_0\) is empty.
-
Select(\(T_0\), \(s_0\)): Extract \(s_0\), if present, from \(T_0\).
-
Extract(\(\ell \)): Extract strands of length \(\ell \).
-
Cleavage(\(s_0\), \(\sigma _0\)): Slice \(s_0\) according to the shorter template strand \(\sigma _0\).
-
Anneal(\(s_0\)): Create double DNRA strands from a single one.
-
Denature(\(s_0\)): Create single DNA strands from a double one.
-
Ligate(\(T_0\)): Create bonds between double strands inside tube \(T_0\).
The basic storage unit in this paradigm is the test tube which may contain a fixed volume of DNA strands. The latter are not necessarily of the same type. In fact, the opposite is quite common as the tube contains the results of a sequence of lab operations applied to a set of candidate solutions. It is only after the end of these operations where the solution, if any, to the instance at hand is extracted and cultivated that a test tube may contain only copies of a single strand.
4 TSP
Perhaps the most well known biomolecular algorithm is the one shown in algorithm 1 for solving TSP for a graph \(G = \left( {V,E}\right) \). Notice this is a brute force method which can be used among others to discover community structure in graphs as in [15, 27]. However, this approach could not be a basis for heuristic techniques such as those in [16, 17]. The primary characteristic of algorithm 1 is that all paths of length j are created in j steps. Assume that each vertex is encoded with \(b_v\) and each edge with \(b_e\) bases. Each vertex \(v_k\) has a unique coding \(s_k\) in DNA bases, while the edge \(\left( {v_i, v_j}\right) \) is encoded as the concatenation of \(\bar{s_i}\) and \(\bar{s_j}\).
5 Parallelism
Theoretically, the biomolecular computation paradigm can be reduced to simulating a CREW PRAM, a version of RAM with \(P > 1\) processors and M memory locations where multiple processors can read the same memory address but only one can write any given time at a given memory address. The CREW PRAM operation set LOAD, READ, WRITE as well as the memory configuration of this machine can be simulated by a sequence of biomolecular operations as shown in [46] through a sequence of successive configurations of length \({\mathrm {O}}\left( {\log \left( {PM}\right) }\right) \) each. Starting from a random configuration, mixing a sequence of admissible configurations, at the expense of a multiplicatively growing volume, finally yields one superconfiguration of concatenated configurations representing a desired sequence of computation, assuming that one can be found. By repeatedly applying cleavage operations, the individual configurations are extracted.
Another way to quantify the actual parallelism offered by the biomolecular paradigm is to use Amdahl’s law as a benchmark
where \(\varGamma \) is the actual speedup, \(\gamma _0\) is the parallelizable part of the task, and s is the speedup of that part. Thus, the size of the parallelizable part plays a crucial role and essentially defines how much can be actually sped up as for infinite s
6 Error Free Computations
6.1 Overview
Errors in the biomolecular computation can be classified into two broad categories. The first source of errors lies in the encoding of candidate solutions as there can be partial DNA matches which may eventually create false final solutions, in other words creating false positives. Moreover, the elementary operations described above may not be executed with absolute sucess due to a number of factors. This might create depending on the nature of the operations involved either false positives or false negatives.
6.2 Encoding
Concerning the solution encoding, an obvious approach might be to choose an encoding with controllable redundancy \(\beta _0\). Namely, each bit of information is expressed with \(\alpha _0 = 1+\beta _0\) bits in total and on average. Thus, the ratio \(\rho _0\) of original to redundant information is
For instance, in a graph problem with \(\left| {V}\right| \) if each vertex normally requires \(\left\lceil \log \left| {V}\right| \right\rceil \) bits to encode, then it would require \(\alpha _0\) times as much.
Regarding the number of false negatives or false positives, it can be argued that it is proportional to the partial matches of DNA strings. While there is a case where an intended match of two DNA strands \(s_1\) and \(s_2\) fails, strand mismatches denoted by \({s_1}\not \parallel {s_2}\) can be considered accurate for the most part. On the contrary, partial matches either from left or from right are with high probability indicators of failed operations, especially if the two strands overlap in only a few bases. For a single biomolecular step in the same test tube \(T_0\), let
denote respectively the number of DNA perfect matches, right and left matches in k bases, middle matches and no matches. Thus, in a single biomolecular step the ratios of true matches \(Q^+\) and true mismatches \(Q^-\) are
Another way to assess the reliability of a sequence of \(n_0\) biomolecular operations is the following. Let \(\pi _k\) be the success probability of each operation. The success probability \(\pi ^+\) by the product rule, assuming operation independence, is
Thus, if \(\mu _1 \,\le \, \pi _k \,\le \, \mu _2\), then the following bounds can be derived
Perhaps a more accurate way to assess the average value of \(\pi ^+\) is to consider the geometric mean of the sequence
6.3 Trials
Due to the nature of the actual biological operations, they are not always executed with abolute correctness. This can be attributed to a number of reasons including the age, technology, and condition of lab equipment, the chemistry and quantity of DNA strands themselves, the nature of bonds between strands, and lab conditions such as radiation and electomagnetic pulses of various frequences. As a result, a strand encoding an invalid solution can emerge from the test tubes or a strand containing the solution can be missed in them [7, 8].
The geometric distribution models sequences of test outcomes where the probability of success is \(p_0\). Its probability mass function is defined as
The interpretation of X is that it models the number of failed attempts of an experiment before the single successful outcome of that experiment. The mean value of this distribution is readily calculated in equation (11).
The \({{\mathrm{logit}}}\left( {\cdot }\right) \) function is the inverse of the sigmoid function extensively used to train deep recurrent and tensor stack networks [14] expresses the logarithm of the odds of a single Bernoulli trial. Also is a special case of a link function to the generalized linear model and leads to the logistic regression, commonly found in deep learning applications.
The variance of X is its mean value scaled by the success probabilty \(p_0\). This is coherent with intuition, since the larger \(p_0\), the fewer attempts are required to achieve success and the number of failed attempts will be close to zero.
7 Volume Considerations
The major practical limitation of biomolecular computation is the volume necessary to represent each candidate solution. This is limited not only by the volume of the test tube, but also by the encoding which, in turn, implies a redundancy rate to ensure higher operation success probabilites. If \(V_0\) is the hard limit for the test tube volume, \(\alpha _0\) the encoding redundancy factor, n is the number of candidate solutions, and \(\ell _0\) the solution length, then
where \(\epsilon _0\) is a constant which depends on a number of diverse factors such as lab temperature and tube technology. If a biomolecular computation requires \(J_0\) steps to complete, then the required volume grows exponentially. Then, the above limit should be modified as
8 Conclusions
This survey explores the foundations, applications, and limits of biomolecular computation which represents an alternative computational paradigm. The latter is based on the handling of potentially very long strands of ordinary DNA using standard biology lab operations such as annealing, generating a complementary strand, selecting a strand, or concatenating two strands. These strands do not codify the inner workings of any living organism. Instead, they contain a suitably selected representation of a computational problem. By selecting an appropriate representation of an input instance, a plethora of various output instances are created by a series of biological operations. Although a fraction of the abovementioned output instances may contain incomplete operations and must be removed by cleansing operations, their majority will contain with high probability a solution.
Notes
- 1.
OEIS sequence A000012.
References
Adleman, L., Cheng, Q., Goel, A., Huang, M.D., Kempe, D., De Espanes, P.M., Rothemund, P.W.K.: Combinatorial optimization problems in self-assembly. In: STOC, pp. 23–32 (2002)
Adleman, L.M.: Molecular computation of solutions to combinatorial problems. Nature 369 (1994)
Amos, M., Dunne, P.E.: DNA simulation of Boolean circuits. In: Proceedings of 3rd Annual Genetic Programming Conference, pp. 679–683 (1997)
Amos, M., Gibbons, A., Dunne, P.E.: The complexity and viability of DNA. In: Biocomputing and emergent computation: Proceedings of BCEC97 (1997)
Baum, E.B., Boneh, D.: Running dynamic programming algorithms on a DNA computer. DNA Based Comput. II(44), 77–80 (1999)
Beigel, R., Fu, B.: Solving intractable problems with DNA computing. In: IEEE Conference on Computational Complexity, p. 154 (1998)
Bijlani, R., Cheng, Y., Pearce, D.A., Brooks, A.I., Ogihara, M.: Prediction of biologically significant components from microarray data: independently consistent expression discriminator (ICED). Bioinformatics 19(1), 62–70 (2003)
Boneh, D., Dunworth, C., Lipton, R.J., Sgall, J.: Making DNA computers error resistant. DNA Based Comput. II(44), 163–170 (1996)
Boneh, D., Dunworth, C., Lipton, R.J., Sgall, J.: On the computational power of DNA. Discrete Appl. Math. 71(1–3), 79–94 (1996)
Carroll, S.: A complete programming environment for DNA computation. In: Workshop Non-Silicon Comp. NSC-1, pp. 46–53 (2002)
Dassen, J.: Molecular computation and splicing systems. Master’s thesis, Leiden University (1996)
Deaton, R., Murphy, R.C., Rose, J.A., Garzon, M., Franceschetti, D.R., Stevens, S.: A DNA based implementation of an evolutionary search for good encodings for DNA computation. In: International Conference on Evolutionary Computation, pp. 267–271. IEEE (1997)
Díaz, S., Esteban, J.L., Ogihara, M.: A DNA-based random walk method for solving k-SAT. In: Condon, A., Rozenberg, G. (eds.) DNA 2000. LNCS, vol. 2054, pp. 209–220. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44992-2_14
Drakopoulos, G.: Knowledge mining with tensor algebra. Tech. rep., Ionian University October 2017. https://doi.org/10.13140/RG.2.2.25548.92803
Drakopoulos, G., Kanavos, A., Karydis, I., Sioutas, S., Vrahatis, A.G.: Tensor-based semantically-aware topic clustering of biomedical documents. Computation 5(3), 34 (2017)
Drakopoulos, G., Kanavos, A., Tsakalidis, A.: A Neo4j implementation of fuzzy random walkers. In: SETN May 2016
Drakopoulos, G., Kanavos, A., Tsakalidis, K.: Fuzzy random walkers with second order bounds: an asymmetric analysis. Algorithms 10(2), 40 (2017)
Dunne, P.E., Amos, M., Gibbons, A.: Boolean transitive closure in DNA (1998)
Feynman, R.P.: There is plenty of room at the bottom. Eng. Sci. 23(5), 22–36 (1960)
Fu, B., Beigel, R.: Length bounded molecular computing. BioSyst. 52(1), 155–163 (1999)
Gehani, A., Reif, J.: Micro flow bio-molecular computation. Biosyst. 52(1), 197–216 (1999)
Gorban, A.N., Gorbunova, K.O., Wunsch, D.C.: Liquid brain: the proof of algorithmic universality of quasichemical model of fine-grained parallelism. Neural Netw. World 11(4), 391–412 (2001)
Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Henaut, A., Contamine, D.: Computation with DNA. Tech. rep., Rapport de recherche - Institut national de recherche en informatique et en automatique (1996)
Hinze, T., Sturm, M.: Towards an in-vitro implementation of a universal distributed splicing model for DNA computation. In: Proceedings of Theorietag pp. 185–189 (2000)
Hoheisel, J.D., Vingron, M.: DNA chip technology. Biospektrum 4, 17–20 (1998)
Kanavos, A., Drakopoulos, G., Tsakalidis, A.: Graph community discovery algorithms in Neo4j with a regularization-based evaluation metric. In: WEBIST, April 2017
Ming-Yang, K., Ramachandran, V.: DNA self-assembly for constructing 3D boxes. In: Eades, P., Takaoka, T. (eds.) ISAAC 2001. LNCS, vol. 2223, pp. 429–441. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45678-3_37
Karp, R.M., Kenyon, C., Waarts, O.: Error resilient DNA computation. In: SODA, pp. 458–467 (1996)
Kontopoulos, S., Drakopoulos, G.: A space efficient scheme for graph representation. In: ICTAI. IEEE, November 2014
LaBean, T.H., Yan, H., Kopatsch, J., Liu, F., Winfree, E., Reif, J.H., Seeman, N.C.: Construction, analysis, ligation, and self-assembly of DNA triple crossover complexes. J. Am. Chem. Soc. 122(9), 1848–1860 (2000)
Lipton, R.J.: Speeding up computations via molecular biology. DNA Based Comput. 27, 67–74 (1995)
Narayanan, A., Zorbalas, S.: DNA algorithms for computing shortest paths. In: Proceedings of Genetic Programming, vol. 718, p. 723 (1998)
Ogihara, M.: Relating the minimum model for DNA computation and Boolean circuits. In: Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation, vol. 2, pp. 1817–1821. Morgan-Kaufmann Publishers Inc. (1999)
Ogihara, M., Ray, A.: Circuit evaluation: Thoughts on a killer application in DNA computing. Comput. Bio-Molecules. Theor. Exp. 111–126 (1998)
Ogihara, M., Ray, A.: DNA-based self-propagating algorithm for solving bounded-fan-in Boolean circuits. Genet. Program. 98, 725–730 (1998)
Ogihara, M., Ray, A.: The minimum DNA computation model and its computational power. In: Unconventional Models of Computation, pp. 309–322 (1998)
Ogihara, M., Ray, A.: Executing parallel logical operations with DNA. In: Proceedings of the 1999 Congress on Evolutionary Computation, vol. 2, pp. 972–979. IEEE (1999)
Ogihara, M., Ray, A.: Simulating Boolean circuits on a DNA computer. Algorithmica 25(2–3), 239–250 (1999)
Pisanti, N.: DNA computing: a survey. Bull. EATCS 64, 188–216 (1998)
Qiu, Z.F., Lu, M.: Take advantage of the computing power of DNA computers. In: Rolim, J. (ed.) IPDPS 2000. LNCS, vol. 1800, pp. 570–577. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-45591-4_78
Reif, J.H.: Paradigms for biomolecular computation. In: First International Conference on Unconventional Models of Computation, pp. 72–93 (1998)
Reif, J.H.: Parallel biomolecular computation: models and simulations. Algorithmica 25(2), 142–175 (1999)
Reif, J.H.: Successes and failures. Science 296, 478–479 (2002)
Reif, J.H.: The emergence of the discipline of biomolecular computation in the US (2002). http://citeseer.nj.nec.com/reif02emergence.html
Reif, J.H.: The design of autonomous DNA nanomechanical devices: walking and rolling DNA. In: Hagiya, M., Ohuchi, A. (eds.) DNA 2002. LNCS, vol. 2568, pp. 22–37. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36440-4_3
Rooß, D.: Recent developments in DNA computing. In: Proceedings of the 27th International Symposium on Multiple Valued Logic, pp. 3–9 (1997)
Rothemund, P.W., Winfree, E.: The program-size complexity of self-assembled squares. In: STOC, pp. 459–468. ACM (2000)
Wang, B.: Implementation of a dynamic programming algorithm for DNA Sequence alignment on the Cell Matrix architecture. Master’s thesis, Utah State University, Department of Computer Science (2002)
Wąsiewicz, P., Borsuk, P., Mulawka, J.J., Węgleński, P.: Implementation of data flow logical operations via self-assembly of DNA. In: Rolim, J., et al. (eds.) IPPS 1999. LNCS, vol. 1586, pp. 174–182. Springer, Heidelberg (1999). https://doi.org/10.1007/BFb0097898
Yan, H., LaBean, T.H., Feng, L., Reif, J.H.: Directed nucleation assembly of DNA tile complexes for barcode-patterned lattices. Proc. Natl. Acad. Sci. 100(14), 8103–8108 (2003)
Acknowledgments
The financial support by the European Union and Greece (Partnership Agreement for the Development Framework 2014–2020) under the Regional Operational Programme Ionian Islands 2014–2020 for the project “Smart vine variety selection and management using ICT - EYOINOS” is gratefully acknowledged.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 IFIP International Federation for Information Processing
About this paper
Cite this paper
Drakopoulos, G., Tsolis, D., Stefani, A., Mylonas, P. (2018). The Biomolecular Computation Paradigm: A Survey in Massive Biological Computation. In: Iliadis, L., Maglogiannis, I., Plagianakos, V. (eds) Artificial Intelligence Applications and Innovations. AIAI 2018. IFIP Advances in Information and Communication Technology, vol 520. Springer, Cham. https://doi.org/10.1007/978-3-319-92016-0_15
Download citation
DOI: https://doi.org/10.1007/978-3-319-92016-0_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92015-3
Online ISBN: 978-3-319-92016-0
eBook Packages: Computer ScienceComputer Science (R0)