
1 Introduction to DNA

In 1953, Rosalind Franklin and her student R. G. Gosling published their crystal structure of DNA upon which our current models of DNA are based. As discussed in her 1953 paper [36], “DNA is a helical structure” with “two co-axial molecules.” The “co-axial molecules,” shown as blue ribbons in Fig. 1a, refer to a chain of phosphate groups connected via sugar groups. The sugar groups are shown in red in Fig. 2. Each sugar group is connected to a base: A, T, G, or C. These bases spell out our genetic information, i.e., they form our DNA sequence. Using Franklin’s data without her knowledge, Watson and Crick came to similar conclusions regarding the structure of DNA and published their model in the same issue of Nature [90].

Fig. 1

(a) Structure of DNA. Per Franklin and Gosling, the “period is 34 Å” and “one repeating unit contains ten nucleotides on each of two co-axial molecules.” They conclude “The phosphate groups lie on the outside of the structural unit, on a helix of diameter about 20 Å” and “the sugar and base groups must accordingly be turned inwards towards the helical axis” [36]. The four bases are called adenine (A), thymine (T), cytosine (C), and guanine (G). Figure courtesy of the National Library of Medicine (NLM). (b) Computer model of a DNA molecule

Fig. 2

Chemical structure of DNA. The sugars are shown in red. Note that the pairing of backbone strands is antiparallel (Figure from [50])

In most cases, the base A pairs with the base T while the base G pairs with the base C [90]. Thus, knowing the sequence of one strand of double-stranded DNA (dsDNA) means knowing the sequence of both strands. However, the two strands making up dsDNA are read in opposite directions. Rosalind Franklin noted that the two sugar–phosphate backbones are antiparallel [35]. A sugar residue contains five carbons that are numbered from 1′ to 5′, as shown in Fig. 2 for the sugar in the upper left corner, which is connected to the base A. The 5′ carbon of this sugar is circled in green. Its 3′ carbon is connected via a phosphate bond to the 5′ carbon of the next sugar (which is connected to the base G). Hence the chemistry of this connection can be used to assign an orientation to a DNA strand. Sequences are read from 5′ to 3′. The strand on the left is therefore read from top to bottom, and its sequence is AGCTC. The strand on the right runs from bottom to top, so its sequence is read GAGCT. Thus both AGCTC and GAGCT refer to exactly the same double-stranded DNA sequence.
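The relationship between the two readings can be checked with a short sketch. It assumes only the standard A–T and G–C pairing described above; the function name `reverse_complement` is an illustrative choice, not something taken from this chapter.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the partner strand of `seq`, read 5' to 3'.

    The partner strand is antiparallel, so we complement each base
    and reverse the order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    # The left strand in Fig. 2 reads AGCTC; the right strand reads GAGCT.
    print(reverse_complement("AGCTC"))
```

Applying the function twice returns the original sequence, which reflects the fact that either strand determines the other.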

In the laboratory, molecular biologists often work with linear DNA that has sticky ends. An end of the DNA is sticky if the portion at the end is single-stranded, as shown in Fig. 3a; that is, it contains unpaired bases. If a sufficiently long linear DNA molecule has two sticky ends with complementary sequences, it will circularize to form nicked circular DNA, as shown in Fig. 3b. The DNA is called nicked because the phosphate backbone is not closed. A protein called ligase is needed to create a phosphodiester bond to close the nicks to form closed circular DNA (Fig. 3c). This closed DNA can be modeled by an annulus. Since the phosphate backbones are antiparallel, DNA cannot form a Möbius band (under normal circumstances). DNA has a preferred twist of about 10.5 base pairs per turn [36, 68, 85]. Closed circular DNA is called relaxed if it is as close as possible to its preferred twist. In nature, DNA is usually underwound, and hence it supercoils negatively (Fig. 3d). Since the DNA is underwound, the two strands are easier to pull apart for replication or transcription. For an elementary introduction to DNA, see [12]. For more on DNA topology, see [4].

Fig. 3

(a) Linear DNA with sticky ends. (b) Nicked circular DNA. (c) Closed circular DNA. (d) Atomic-force microscopy image of supercoiled DNA (Unpublished data courtesy of Dr. Alexandre Vetcher)

2 DNA Knots and Topoisomerase

There are many beautiful knot tables in the literature and online. For an excellent introduction to knot theory, see [1]. Knots were first tabulated by Tait in the late 1800s [81]. The knot/link table shown in Fig. 4 was created by KnotPlot [72] based on data provided by Dale Rolfsen. The knot \(n_{k}\) refers to the kth knot in the list of knots containing n crossings in their minimal crossing diagram. The superscript in the link table refers to the number of components. Mathematicians sometimes use the term “link” to include knots (and in rare cases, a link may be referred to as a knot). Molecular biologists normally use the term catenane when referring to links with at least two components. Most tables, including the one in Fig. 4, only contain prime knots. These are knots that cannot be subdivided into two or more simpler nontrivial knots. Knots that are not prime are called composite. Composite knots correspond to the operation of tying two separate knots sequentially in a piece of rope and closing the ends, as shown in Fig. 5a, where the individual prime knots are colored differently.

Fig. 4

Knot/link table containing prime knots up to seven crossings and two- and three-component prime links up to six crossings. Twist knots include \(3_{1}\) (the trefoil knot), \(4_{1}\) (the figure-eight knot), \(5_{2}\), \(6_{1}\), and \(7_{2}\). Torus knots include \(3_{1}\), \(5_{1}\), and \(7_{1}\). The unknot, \(0_{1}\), can also be considered to be a twist knot and a torus knot

Fig. 5

(a) The composite knot \(3_{1}\#4_{1}\). (b) Chiral pair of trefoil knots, \(3_{1}\) and its mirror image, one shown in gold and the other in blue

A knot is called chiral if it cannot be smoothly deformed into its own mirror image. The mirror image of a knot K is denoted K . The simplest example of a chiral knot is \(3_{1}\); it is shown together with its mirror image in Fig. 5b. Most knot tables like the one in Fig. 4 list only one enantiomer of a chiral pair. Knots that are not chiral are achiral. The simplest three achiral knots are \(0_{1}\), \(4_{1}\), and \(6_{3}\).

One of the most beautiful knot tables is the one created by topoisomerase I. This protein, acting on nicked circular DNA, was able to create all different types of knots up to six crossings, 10 of the 16 possible seven-crossing knots, and a few eight- and nine-crossing knots [30]. There are two main types of topoisomerase. Type I topoisomerases break and reconnect one strand of DNA, while type II topoisomerases break and reconnect both strands of dsDNA. Thus type I topoisomerases can knot circular single-stranded DNA (ssDNA) as well as nicked DNA. Type II topoisomerases can knot dsDNA [89]. The normal function of topoisomerases is to keep DNA unknotted, unlinked, and properly supercoiled. For more on topoisomerase, see [87, 88].

3 Recombinases

The genome of any organism must possess two key characteristics. It must be stable enough to pass accurate information through inheritance, yet remain sufficiently dynamic to respond to selective environmental pressures. These requirements create a tension between genome integrity and flexibility. In all organisms, recombination systems are the principal mechanism that regulates genome stability. Chromosomal breakages and mutations stemming from problems in DNA replication or environmental stress can be repaired by recombination. Most organisms have multiple recombination pathways by which damage can be repaired, underscoring the importance of this process.

We focus here on several examples of site-specific recombination mechanisms. These processes involve interactions among specialized DNA-sequence elements that also contain specific binding sites for recombination proteins. The requirements for sequence specificity and specialized proteins distinguish site-specific recombination from general or homologous recombination, which can occur with arbitrary DNA sequences that share very high levels of sequence identity (see [58, 92] for reviews). Site-specific-recombination target sequences form the point of genetic exchange and usually are present in few copies in the genome. Often these sites are present in pairs; in the case of the bacteriophage-lambda integration site, only one copy is present in the E. coli genome. This extraordinary degree of specificity leads to precisely defined genetic rearrangements. In the examples considered here, the rearrangements that occur are essentially uniquely defined.

Another important attribute of a site-specific recombination locus is the polarity of the recombination site. These loci are frequently nonpalindromic and therefore have an intrinsic polarity (Fig. 6). Recombination normally occurs only when a pair of recombination sites has been juxtaposed in a particular spatial alignment, thereby imparting both positional and orientational specificity to these systems [37, 73]. This specificity has important biological consequences; moreover, the site-orientation specificity leads to the formation of specific DNA topologies in the recombination products. If two recombination sites are oriented in opposite polarities on a circular DNA molecule as shown in Fig. 6a, then the sites are said to be inversely repeated. Recombination on inversely repeated sites is called an inversion because it results in the inversion of one of the DNA segments between the two recombination sites with respect to the other DNA segment. In Fig. 6b, the two recombination loci are oriented in the same direction and are thus called directly repeated. Recombination on directly repeated sites results in the deletion (also called excision) of a DNA segment, changing the number of components of the substrate. The reverse reaction is called integration.
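The genetic outcome of these two site arrangements can be illustrated with a small symbolic sketch. It is a simplification made for illustration only: the circular molecule is a list of `(name, orientation)` pairs, the function name `recombine` and the segment labels are hypothetical, and only the genetic rearrangement is modeled, not the topology of the products.

```python
def recombine(circle, i, j):
    """Recombine a circular molecule at the sites with indices i < j.

    `circle` is a list of (name, orientation) pairs read around the circle,
    with orientation +1 or -1. Returns the reaction type and the list of
    product circles.
    """
    if not 0 <= i < j < len(circle):
        raise ValueError("need site indices with 0 <= i < j < len(circle)")
    (_, o_i), (_, o_j) = circle[i], circle[j]
    between = circle[i + 1:j]
    if o_i == o_j:
        # Directly repeated sites: the intervening segment is excised,
        # so one circle becomes two, each carrying one site.
        remaining = circle[:i + 1] + circle[j + 1:]
        excised = between + [circle[j]]
        return "deletion", [remaining, excised]
    # Inversely repeated sites: the intervening segment is reversed and
    # each piece changes orientation; the number of circles is unchanged.
    flipped = [(name, -o) for name, o in reversed(between)]
    return "inversion", [circle[:i + 1] + flipped + circle[j:]]


if __name__ == "__main__":
    direct = [("site", 1), ("x", 1), ("y", 1), ("site", 1), ("z", 1)]
    inverted = [("site", 1), ("x", 1), ("y", 1), ("site", -1), ("z", 1)]
    print(recombine(direct, 0, 3))
    print(recombine(inverted, 0, 3))
```

In this toy model, integration would be the reverse of the deletion branch: two circles, each carrying one site, are fused into a single circle with directly repeated sites.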

When supercoiled DNA substrates are used in reconstituted in-vitro (i.e., in the test tube) recombination reactions, it is possible to examine the topological changes that take place during recombination. For intramolecular recombination reactions, supercoiled plasmid substrates bearing inversely oriented sites generate knotted recombination products, whereas supercoiled substrates containing directly repeated sites generate topologically linked circles called catenanes (Fig. 6). The knots and catenanes that are formed during recombination are never random; instead, recombination generates a highly restricted subset of all the possible knotted or catenated structures that can be formed. For example, all of the knots with up to 13 crossings are known – there are over 12,000 topologically distinct knots. Integrative recombination on a circular substrate with inverted sites yields only seven of the possible knots containing up to 13 crossings, each containing an odd number of crossings. Among all possible recombination mechanisms that can lead to the formation of a knotted DNA product, the formation of this particular set of observed products can be ascribed uniquely to a particular mechanism. The topological specificity of site-specific recombination systems has been exploited to great effect in unraveling the mechanisms of many site-specific recombinases (see, e.g., [10, 21, 23, 34, 41, 79, 80, 82–84, 95]).

Fig. 6

(a) Recombination on inversely repeated sites on a circular DNA molecule can result in knotted DNA. (b) Recombination on directly repeated sites changes the number of components of the DNA substrate

The complementarity of DNA strands normally plays a very limited role in site-specific recombination, more as a feature of specific recombinase–DNA interactions than a necessity for homologous pairing or strand exchange. Unlike other modes of recombination, site-specific recombination is conservative in that no DNA is gained or lost during the recombination reaction. This aspect of site-specific recombination applies both at the level of genetic information (recombination products are merely permutations of the original parental DNA) and at the level of actual DNA nucleotides (no DNA synthesis or nucleolytic degradation is involved).

All site-specific recombination systems that have been investigated to date fall into two superfamilies: the lambda-integrase and resolvase/invertase families. Particular examples from the lambda-integrase family are discussed below. The products of the lambda-integrase recombination reaction depend on the orientation and disposition of recombination sites; this variability permits systems such as lambda-integrative recombination to carry out excisive as well as integrative recombination in a highly regulated fashion. The two superfamilies are also distinct in terms of the intermediate structure of the DNA segments undergoing recombination; whereas lambda-integrase-type mechanisms proceed through a four-stranded DNA intermediate called a Holliday junction, the resolvase/invertase mechanisms do not. Site-specific recombination systems participate in a wide range of biological processes in both prokaryotes and eukaryotes: viral integration, antigenic variation, gene duplication and copy-number control, and the integration of antibiotic resistance cassettes. For a very nice review of site-specific recombination, including resolvase/invertase recombination, see [42].

3.1 Holliday Junctions

In 1964, Robin Holliday proposed that recombination could be mediated by a hypothetical DNA structure consisting of four polynucleotide strands associated by a single-stranded crossover (Fig. 7) [48]. This structure, later to be named the Holliday junction, has played a central conceptual role in models of recombination. A large body of evidence has accumulated in the intervening decades that substantiates the role of these junctions in both general recombination mechanisms [46, 93, 94] and those belonging to the lambda-integrase superfamily of site-specific recombinases [3, 21].

Fig. 7

Molecular model of a Holliday-like four-way DNA junction. The DNA sequences of the four strands in this structure lack the symmetry of a true Holliday junction, thereby inhibiting migration of the junction’s branch point

Although the existence of this intermediate structure is no longer questioned, the details of Holliday junction geometry remain controversial. Since the early 1990s, a wide range of biochemical and biophysical tools have been used to characterize the conformation of these recombination intermediates, both as complexes with recombination proteins [6, 15, 39, 43, 49, 84] and as free DNA molecules [13, 14, 16, 18–20]. Many studies of protein-free junctions were focused on the structure and dynamics of immobile four-way DNA junctions, in which four synthetic DNA strands designed with specific patterns of homology have been annealed together (Fig. 7) [51]. The limited homology among the DNA strands fixes the branch point of the junction; thus, such structures lack the ability to undergo branch migration, an essential isomerization step in general recombination. The extent to which the behavior of such immobile analogs actually mimics that of mobile junctions is an interesting issue that has remained largely unaddressed. However, it is clear that even immobile four-way junctions are conformationally quite flexible, a feature that is likely to be, if anything, more pronounced in fully mobile junctions [77]. Several groups have succeeded in obtaining high-resolution X-ray structures of four-way junctions [31, 61, 62]. These high-resolution structures exemplify many features that are consistent with those of immobile junctions based on studies in solution.

3.2 λ-Int: Integration and Excision of Phage Genomes

The λ-integrase (λ-Int) system is vital to the lysogenic stage of the life cycle of bacteriophage λ and is one of the most intensively studied site-specific recombination systems. A notable feature of this system is the nonsymmetrical nature of the integrative and excisive recombination reactions: although the strand exchange activities are identical for both integration and excision of the phage-λ genome, each reaction has distinct requirements for specific DNA sequences at the recombining loci and the subsets of protein cofactors involved in recombination.

Integration of phage λ occurs at a unique 25-bp site, termed attB, on the 4.6-Mbp E. coli chromosome. The catalytic activity for strand exchange resides in the λ-encoded integrase protein (Int), which functions in concert with a number of DNA-binding accessory proteins: the integration host factor (IHF) and factor for inversion stimulation (Fis) proteins of E. coli, and the λ-excisionase (Xis), which is phage-encoded. In contrast to the attB site, which by itself has negligible affinity for the recombination proteins, the recombination locus on the phage genome, attP, is about 250 bp in size and has multiple binding sites for Int and the accessory factors. Integrative recombination most likely involves assembly of Int and IHF proteins to form an organized nucleoprotein structure called the intosome [5], which subsequently captures a protein-free attB site during synapsis [67]. The products of the integrative recombination reaction are a functionally distinct pair of new recombination sites called attL and attR that are no longer competent to participate in subsequent rounds of integrative recombination. Instead, these sites are substrates for excisive recombination, a reaction that requires Fis and Xis in addition to Int and IHF. By coupling recombination to the intracellular levels of specific protein factors, tight regulation of the phage-λ life cycle can be achieved in vivo. The topology of λ-Int recombination is discussed in [21].

3.3 Cre and XerC/D: Excision and Resolution of DNA Dimers

The genome of bacteriophage P1 is a 90-kbp circular molecule; as with all circular genomes, daughter molecules must be decatenated after replication [91]. This process is facilitated by a protein called Cre recombinase, a phage-encoded member of the λ-Int superfamily. The Cre mechanism acts on specific sites, denoted loxP, in a multistep reaction scheme that involves fusion followed by resolution (Fig. 8a). A common feature of the λ-Int superfamily is phosphoryl transfer based on a catalytic tyrosine residue. The enzymatic reaction progresses in two distinct stages (Fig. 8b): an initial round of strand cleavage followed by DNA strand exchange to form a stable recombinase-bound Holliday junction. The junction is resolved by a second set of tyrosine-catalyzed cleavage and strand-exchange steps that lead to recombinant products.

Fig. 8

(a) Unlinking by Cre recombinase: Cre activity initially generates circular concatamers from linked circular substrates. This fused intermediate is subsequently unlinked via an excision reaction that is also mediated by Cre. (b) Mechanism of Cre acting on a pair of loxP target sequences. The loxP site consists of two inversely repeated, 13-bp Cre-binding sequence elements (cyan) flanking an 8-bp spacer region (yellow). Recombination takes place via ordered and reversible strand cleavage, exchange, and resolution reactions. The central intermediate is a Holliday junction, shown in an open, square planar conformation similar to that in Fig. 7

The wild-type loxP target site for Cre is a 34-bp DNA sequence that consists of two 13-bp inverted repeats flanking an asymmetric 8-bp core region [38]. The core sequence confers an overall directionality on the loxP site. Recombination of directly repeated loxP sites leads to the exclusive formation of deletion products, whereas recombination of inversely repeated loxP sites results in an inversion of the intervening DNA sequence with respect to the parental substrate [84].
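The 13 + 8 + 13 layout of the site, and the way the core alone carries its directionality, can be checked with a minimal sketch. The sequence used below is the commonly cited wild-type loxP sequence; it is not given in this chapter and is included here as an assumption. The helper names `reverse_complement` and `split_site` are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumption: commonly cited wild-type loxP sequence (not quoted in the text),
# written as left arm + core + right arm.
LOXP = "ATAACTTCGTATA" "GCATACAT" "TATACGAAGTTAT"


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the order."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def split_site(site: str, arm_len: int, core_len: int):
    """Split a site into (left arm, core, right arm)."""
    if len(site) != 2 * arm_len + core_len:
        raise ValueError("site length does not match the arm/core layout")
    left = site[:arm_len]
    core = site[arm_len:arm_len + core_len]
    right = site[arm_len + core_len:]
    return left, core, right


if __name__ == "__main__":
    left, core, right = split_site(LOXP, arm_len=13, core_len=8)
    # Inverted repeats: the right arm is the reverse complement of the left.
    print("arms are inverted repeats:", right == reverse_complement(left))
    # An asymmetric core differs from its own reverse complement, so the
    # site as a whole has a direction.
    print("core is asymmetric:", core != reverse_complement(core))
```

If the core were its own reverse complement, the whole site would read identically on both strands and directly and inversely repeated arrangements could not be distinguished.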

Normal replication of the E. coli chromosome yields intermediate forms consisting of multiply linked circular DNA molecules. The linked intermediates are resolved to unlinked monomers by the action of type II topoisomerases, most notably topo IV [78]. However, homologous recombination during replication generates concatenated dimers at a significant frequency; such structures cannot be resolved by topoisomerases. These dimers are instead resolved by the XerC/D system, which also belongs to the λ-Int superfamily. The activity of Xer is tightly coupled to that of FtsK, a molecular machine that controls the transport of DNA across the intercellular septum during cell division [66]. Cells lacking functional XerC or XerD genes cannot properly segregate daughter chromosomes. These cells develop an anomalous filamentous-growth phenotype, in which cells elongate without dividing [59].

The target site for Xer recombination on the E. coli chromosome is a 28-bp sequence called dif. This sequence is located opposite the chromosomal replication origin and consists of a pair of 11-bp inverted repeats that flank a central 6-bp spacer region. Unlike the repeats of the loxP site, which are both bound by Cre, the two dif repeats are targeted by different proteins, the XerC and XerD subunits. In other respects, however, the similarities between Xer and Cre are more striking than their differences. Like the Cre–loxP mechanism, Xer recombination proceeds via a Holliday-junction intermediate [7].

Plasmids in E. coli can also become dimerized during replication. Resolution of these concatameric forms occurs via Xer activity at plasmid sequences such as cer and psi. These sequences contain the 28-bp core sequence from the chromosomal dif element in addition to flanking sequences that bind the accessory proteins PepA and either ArgR (in cer) [74] or ArcA (in psi) [8]. These proteins are required for recombination and play a role in organizing the active synaptic complex of proteins and DNA sequences needed for site pairing and strand exchange. This organization also ensures that recombination occurs exclusively via an intramolecular pathway involving directly repeated target sites. A similar mode of synaptic-complex organization occurs in gamma/delta resolvase recombination (based on a serine recombinase) and accounts for exclusivity of deletion in that system [71].

4 The Tangle Model for Protein Action

In this section, we start by giving some mathematical background on tangles and then describe how tangles are applied to study protein action. An N-string tangle is a collection of N disjoint arcs properly embedded in a three-dimensional ball (3-ball) that have their endpoints fixed on the 2-sphere boundary of the 3-ball. Examples of 2-string tangles are shown in Fig. 9. A tangle is rational if it can be formed from a zero-crossing tangle by moving the endpoints in an arbitrary fashion, with the constraint that the endpoints remain on the boundary 2-sphere and the arcs stay within the 3-ball. It is common practice and sufficient to consider only 180° rotations about the horizontal and vertical axes.

Conway introduced rational tangles in a paper that was concerned with enumerating prime knots and links [17]. He discovered a curious relationship between rational tangles and the set of extended rational numbers, \(\mathbb{Q} \cup \{\infty\}\) (hence the name rational tangle). He showed that two rational tangles are equivalent (in the sense that one may be converted into the other, keeping the boundary of the 3-ball fixed) if and only if the continued fractions corresponding to the individual tangles are equal to the same extended rational number. The details of this proof are beyond the scope of this chapter; however, a few examples will be illustrative. First of all, let us consider how to associate a tangle with a continued fraction. We denote a tangle by a list of integers \((c_{1},c_{2},\ldots,c_{n})\), n odd, and create this tangle by starting with the 0 tangle and rotating about the horizontal axis by \(c_{1} \times 180^{\circ}\), followed by a rotation about the vertical axis by \(c_{2} \times 180^{\circ}\), alternating the axes in this way until we reach the end of the list. Since n is odd, we always end in horizontal twists. For the tangle \((c_{1},c_{2},\ldots,c_{n})\), we assign the rational number

$$\displaystyle{r = c_{n} + \frac{1} {c_{n-1} + \frac{1} {\ldots + \frac{1} {c_{1}} } }.}$$

Figure 10 shows several rational tangles and one nonrational tangle. We can see that tangle (2, 3, 4) and tangle \((-1,-1,-4,1,3)\) have the same rational number 30∕7, and therefore must be equivalent tangles from Conway’s theorem.
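The two values of 30∕7 can be verified by evaluating the continued fraction from the inside out. The sketch below is one way to do this; the function name `tangle_fraction` and the convention of returning a pair `(p, q)` with `q == 0` standing for ∞ are assumptions made for this illustration.

```python
def tangle_fraction(twists):
    """Evaluate r = c_n + 1/(c_{n-1} + 1/(... + 1/c_1)) for (c_1, ..., c_n).

    Returns (p, q) with r = p/q and q >= 0; q == 0 represents infinity.
    """
    if not twists:
        raise ValueError("need at least one integer")
    # Start with the innermost term c_1 = c_1 / 1.
    p, q = twists[0], 1
    for c in twists[1:]:
        # c + 1/(p/q) = (c*p + q) / p. The pair stays in lowest terms
        # because gcd(c*p + q, p) = gcd(q, p).
        p, q = c * p + q, p
    # Normalize the sign so that the denominator is nonnegative.
    if q < 0 or (q == 0 and p < 0):
        p, q = -p, -q
    return p, q


if __name__ == "__main__":
    print(tangle_fraction([2, 3, 4]))           # tangle (2, 3, 4)
    print(tangle_fraction([-1, -1, -4, 1, 3]))  # tangle (-1, -1, -4, 1, 3)
    print(tangle_fraction([2, 3, 0]))           # tangle (2, 3, 0)
```

Working with an integer pair rather than a floating-point value keeps the comparison exact and lets a zero denominator stand for the infinity tangle without a special case in the loop.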

Fig. 9

Various tangles. (a) The 0 tangle. (b) The infinity tangle. (c) The +1 tangle. (d) The −1 tangle. (e) The \(\frac{1}{2} + \left(-\frac{1}{3}\right)\) tangle

Fig. 10

Four rational tangles with their associated rational numbers and one nonrational tangle. (a) The (2) tangle. (b) The (2, 3, 0) tangle = \(\frac{2} {7}\) tangle. (c) The (2, 3, 4) tangle = \(\frac{30} {7}\) tangle. (d) The \((-1,-1,-4,1,3)\) tangle = \(\frac{30} {7}\) tangle. Note that the signs of the integers in \((-1,-1,-4,1,3)\) determine the handedness of the crossings. (This tangle can be created in KnotPlot by using the command: tangle 1z1z4z13o.) (e) A nonrational tangle

Tangles that cannot be formed from the operations described above are known as nonrational tangles. Informally, these are tangles whose construction would require one of the endpoints of an arc to leave the 2-sphere, and pass through the 3-ball and around one of the arcs inside. An example is shown in Fig. 10e. Nonrational tangles are not uncommon in DNA; however, their analysis is considerably more complicated than the rational case, and we will not discuss them further in this chapter.

Ernst and Sumners [34] were the first to apply tangles to DNA biology. In their model, a protein complex binding N segments of DNA is represented by a tangle ball, and the DNA itself by the disjoint arcs. Of course, this is a highly simplified model of the binding of proteins with DNA. A sphere is a very rough approximation to a protein complex, and the DNA is likely to exist in more complicated conformations than the arcs seen in the above illustrations. For example, the DNA likely winds around the tangle ball rather than being embedded within the 3-ball.

If at least two DNA segments are bound in a protein–DNA complex, then this complex is referred to as a synaptic complex. The protein complex together with the segments of DNA bound by protein is called a synaptosome. In many cases it is possible to prove that a tangle modeling a synaptosome is rational (e.g., [22, 23, 32–34, 47]), but there are also several biological reasons why rational tangles are the most likely models of synaptosomes. Although an upper bound for the number of DNA crossings that can be bound in a synaptosome has not yet been determined, it is believed that synaptosomes cannot be overly complicated. The simplest nonrational two-string tangles are the five-crossing tangles shown in Figs. 9e and 10e. Moreover, as protein complexes bind supercoiled DNA (Fig. 3d), rational tangles are likely models of synaptosomes, since such tangles are formed by adding twists. Lastly, a tangle is rational if and only if one can push the strings to lie on the boundary of the 3-ball so that the strings do not cross themselves on the 3-ball. Thus if DNA wraps around a protein complex without crossing itself and if the protein complex can be modeled by a topological sphere, the tangle modeling the synaptosome must be rational. Since DNA is negatively charged, it is unlikely to cross itself on the boundary of a protein. There are, however, protein complexes that can be modeled by higher-genus objects such as tori; but in all known cases, the protein–DNA complex can still be modeled by a spherical tangle.

Let K represent knotted circular DNA. For those who like to think of this circular DNA as living in \(S^{3}\), the tangle model of Ernst and Sumners [34] divides \(S^{3}\) into two tangles (Fig. 11a). One tangle, B, models the synaptosome (i.e., the protein complex together with the portion of DNA bound by protein), whereas the unbound DNA is in the complementary tangle, \(U_{f}\). For simplicity, this is written as the tangle equation \(N(U_{f} + B) = K\). Tangle addition, \(U_{f} + B\), corresponds to the operation shown in Fig. 11b. The numerator closure operation is shown in Fig. 11c. The numerator closure of a rational tangle is a rational knot/link (Fig. 11d). By rotating the vertical crossings in a rational tangle so that they appear horizontal, it is easy to see that rational knots/links are equivalent to 2-bridge knots/links (also known as 4-plats); see the example in Fig. 11e. If \(ac \geq 0\), then the rational knots/links \(N(a/b)\) and \(N(c/d)\) are equivalent if \(a = c\) and \(bd^{-1} \equiv 1 \pmod{a}\).
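The closing condition can be tried on the example of Fig. 11d, \(N(\frac{2}{7}) = N(2)\). The sketch below checks only the sufficient condition stated in the text, under the assumption that d is invertible modulo a, in which case \(bd^{-1} \equiv 1\) is the same as \(b \equiv d\) modulo a. The function name `meets_stated_condition` is hypothetical, and a result of `False` means only that this particular condition is not met, not that the two knots/links are distinct.

```python
from math import gcd


def meets_stated_condition(a: int, b: int, c: int, d: int) -> bool:
    """Check a = c and b * d^(-1) = 1 (mod a) for N(a/b) and N(c/d)."""
    if a * c < 0:
        raise ValueError("the condition is stated only for a*c >= 0")
    if a != c:
        return False
    n = abs(a)
    if n == 0:
        raise ValueError("arithmetic modulo 0 is not defined")
    if gcd(d, n) != 1:
        raise ValueError("d must be invertible modulo a")
    # With d invertible, b * d^(-1) = 1 (mod a) holds exactly when
    # b and d are congruent modulo a.
    return (b - d) % n == 0


if __name__ == "__main__":
    # Fig. 11d: N(2/7) and N(2) = N(2/1).
    print(meets_stated_condition(2, 7, 2, 1))
```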

Fig. 11

(a) \(N(U_{f} + B)\). (b) Tangle addition, \(U_{f} + B\). (c) Numerator closure of a tangle, \(N(B)\). (d) \(N(\frac{2}{7}) = N(2)\). (e) Same link as in (d), but shown in 2-bridge = 4-plat form

Let the tangle B represent the synaptosome before protein action, and let the tangle E represent it after protein action. Recall that the tangle \(U_{f}\) represents the DNA not bound by protein. A protein action that changes the knot \(K_{1}\) into the knot \(K_{2}\) is represented by the system of two tangle equations (Fig. 12):

$$N(U_{f} + B) = K_{1}, \tag{1}$$
$$N(U_{f} + E) = K_{2}. \tag{2}$$

The starting conformation of the DNA, \(K_{1}\), is called the substrate, while \(K_{2}\) is called the product. Different recombinases have different topological mechanisms. A tangle model of Cre recombination is shown in Fig. 12b, and a tangle model of Xer acting on psi sites is shown in Fig. 12c. For a tangle model of Xer acting with FtsK, see [76]. The software packages TangleSolve [70], TopoICE-R [25], and TopoICE-X [27] can be used to solve certain types of tangle equations and visualize the solutions.

Fig. 12

(a) Protein action is represented by the tangle equations \(N(U_{f} + B) = K_{1}\), \(N(U_{f} + E) = K_{2}\). (b) A tangle model for Cre recombination. (c) Tangle model for Xer acting on psi sites

In the original tangle model, the tangle B is divided into a sum of two tangles \(B = U_{b} + P\), where the tangle P represents the local action of the protein, and the tangle \(U_{b}\) represents protein-bound DNA whose conformation is unchanged by protein action. Often proteins act on very short segments of DNA, and thus one can often assume that P is a zero-crossing tangle. Protein action is represented by replacing the tangle P by the tangle R, as shown in Fig. 13a and modeled by the equations \(N(U_{f} + U_{b} + P) = K_{1}\), \(N(U_{f} + U_{b} + R) = K_{2}\). However, this model assumes that the local protein action can be separated from the remaining protein-bound DNA by a disk that intersects the strings of tangle B in exactly two points, an assumption that eliminates potentially biologically relevant models. A more general tangle model is shown in Fig. 13b [23]. Note that the tangle B is transformed into the tangle E via replacing the subtangle P with the subtangle R. For more on rational subtangle replacement, see [9].

Fig. 13

(a) The Ernst and Sumners model of protein action is represented by the tangle equations \(N(U_{f} + U_{b} + P) = K_{1}\), \(N(U_{f} + U_{b} + R) = K_{2}\). (b) A more general tangle model

5 Concluding Remarks

We have only scratched the surface regarding the topological modeling of protein–DNA complexes. In addition to modeling the action of proteins that can knot circular DNA, tangles can also be applied to probe the structure of multiple DNA segments in any stable protein–DNA complex regardless of the action (or inaction) of the protein via an experimental technique called difference topology [2, 26, 28, 40, 45, 52–57, 63–65]. This technique uses a recombinase or topoisomerase to trap crossings bound by the protein under study. This requires knowledge regarding how the recombinase or topoisomerase acts. However, there are still unsolved problems regarding how these proteins act. With respect to recombination, we can only determine all solutions to the tangle equations modeling these reactions in special cases, even when the substrate and product are both rational knots/links (e.g., [10, 24, 29, 47]). Although we can solve all tangle equations modeling topoisomerase action when the substrate and product are rational knots/links [24], there are still questions regarding preferred pathways [11, 44, 60, 69, 75, 86].