Introduction

Topoisomerases are essential enzymes that maintain cell viability and chromosome topology. They cut, shuffle and reconnect DNA strands, thereby adding or removing DNA supercoils and disentangling DNA segments (Champoux 2001; Wang 1996). Type IIA topoisomerases transiently cleave both strands of a DNA duplex and include bacterial and archaeal gyrase, bacterial topoisomerase IV and eukaryotic topoisomerase II. Some viruses also encode type II topoisomerases, such as some Nucleocytoplasmic Large DNA Viruses (NCLDV) and T4-like bacterioviruses (Forterre et al. 2007; Gadelle et al. 2003; Schoeffler and Berger 2008). Type II topoisomerases have been identified in early diverging lineages of eukaryotes, such as kinetoplastid protozoans, Giardia lamblia and Plasmodium falciparum (Chakraborty and Majumder 1987; Cheesman et al. 1994; De et al. 2005; Strauss and Wang 1990). Previous studies have shown that most eukaryotes have a single type IIA topoisomerase (TOPIIA), with the notable exception of vertebrates, which have two paralogues, topoisomerase IIα (TOP2A) and topoisomerase IIβ (TOP2B) (Austin and Fisher 1990; Drake et al. 1987). In humans, TOP2A is encoded by the TOP2A gene on chromosome 17q21-22 (Tsai-Pflugfelder et al. 1988) and TOP2B by the TOP2B gene on chromosome 3p24 (Austin et al. 1993; Jenkins et al. 1992; Tan et al. 1992). Both proteins display similar structures (Fig. 1) and biochemical activities but have different biological roles (Cornarotti et al. 1996; Drake et al. 1989; Leontiou et al. 2003; Marsh et al. 1996; Wang 2002).

Fig. 1

TOPIIA structural organization. A Arrangement of human TOP2A with domains and other protein regions labelled and coloured. CTD, C-terminal domain. B Cartoon representation of the human TOP2A and TOP2B structures with major domains highlighted (Color figure online)

TOPIIA activity relies on a mechanism that involves the controlled association and dissociation of three subunit-dimerization interfaces, or ‘gates’, termed the N-gate, DNA-gate and C-gate, which guide the physical movement of one DNA duplex through another (Berger et al. 1996; Cabral et al. 1997; Roca et al. 1996; Roca and Wang 1992, 1994; Wigley et al. 1991). TOPIIA can also be described as comprising three structural domains (Fig. 1): an N-terminal ATPase domain (NTD), a central catalytic DNA-binding/cleavage domain (CD) and a C-terminal domain (CTD). The N-terminal gate (ATPase domain) is composed of two elements: an ATP binding domain of the GHKL superfamily of proteins, including the Bergerat fold (Dutta and Inouye 2000), and an adjacent domain called the transducer (Classen et al. 2003; Corbett and Berger 2003; Wigley et al. 1991) that is thought to transmit signals (Corbett and Berger 2003; Kingma et al. 2000) from the N-gate to the DNA-gate (where DNA is bound and cleaved). The DNA-gate is formed by a divalent metal-binding TOPRIM domain and a winged helix domain (WHD) similar to that typified by the catabolite activation protein (Aravind et al. 1998; Berger et al. 1996; Gajiwala and Burley 2000; McKay and Steitz 1981). The WHD contains the catalytic tyrosine and cooperates with the TOPRIM fold to cleave DNA, generating a pair of 5′ cuts staggered 4 bp apart on opposite strands (Liu et al. 1983; Morrison and Cozzarelli 1979; Sander and Hsieh 1983). Adjacent to the WHD is a fold termed the “shoulder” or “tower” (Cabral et al. 1997), which also participates in DNA binding (Dong and Berger 2007). The third dimerization interface, the C-gate, is a coiled-coil element that extends from the tower and is capped at its distal end with a small globular domain. This interface is well conserved and serves as the primary dimer interface of the protein (Berger et al. 1996; Cabral et al. 1997; Corbett et al. 2005; Fass et al. 1999; Laponogov et al. 2007).

TOP2A and TOP2B have very similar structures resulting from a high degree of sequence similarity (~70 to 80%) (Austin et al. 2018; Austin et al. 1993; Jenkins et al. 1992). The main differences reside in the C-terminal region, which is responsible for their different biological roles (Kozuki et al. 2017; Linka et al. 2007). The complete CTD crystal structure has not been determined, but secondary structure prediction suggests that it is structurally disordered (Broeck et al. 2021). The TOP2A CTD seems to act in the preferential relaxation of positive supercoils, whereas the equivalent region of TOP2B does not reveal a supercoil preference (McClendon et al. 2005). The C-terminal region contains nuclear localization signals and undergoes extensive post-translational modifications (Lane et al. 2013; Lotz and Lamour 2020).

TOP2A is highly expressed during mitosis, is essential in proliferating cells and assists in chromosome segregation and replication (Akimitsu et al. 2003; Ali and Abd Hamid 2016; Cuvier and Hirano 2003; Grue et al. 1998; Niimi et al. 2001; Ye et al. 2010). On the other hand, TOP2B appears dispensable for cell proliferation, but regulates gene expression and is associated with developmental and differentiation events, in particular nerve growth and brain development (Austin et al. 2018; Bollimpelli et al. 2017; Ju et al. 2006; Lyu et al. 2006; Lyu and Wang 2003; McNamara et al. 2008; Tiwari et al. 2012; Yang et al. 2000). Mutations in TOP2B have been associated with B-cell immunodeficiency (Broderick et al. 2019; Papapietro et al. 2020), hearing loss (Xia et al. 2019) and intellectual disability (Lam et al. 2017). Consistent with these distinct roles, mice lacking TOP2A fail to develop beyond the 4–8-cell stage, whilst those without TOP2B die perinatally due to defects in neuronal development (Akimitsu et al. 2003; Lyu and Wang 2003; Yang et al. 2000).

In addition to their vital cellular functions, TOPIIA enzymes are targets for some of the most active anticancer agents (Nitiss 2009). TOP2A is responsible for the anticancer effects of TOP2 inhibitors due to its high activity in proliferating cells. TOP2 poisons (e.g. etoposide, doxorubicin) increase the levels of TOP2–DNA covalent complexes, resulting in double-strand DNA breaks that can cause cell death. Because of its structural similarity to TOP2A, TOP2B is also targeted by these drugs and is believed to be responsible for most of the secondary malignancies and cardiotoxicity that they cause (Azarova et al. 2007; Chen et al. 2012; Haffner et al. 2010). Moreover, several mutations in TOP2A confer resistance to anticancer drugs (Nitiss and Beck 1996; Wu et al. 2011).

TOPIIA structure and function have been well studied in model organisms and humans, but often without including a comprehensive evolutionary analysis. Here, we address this limitation by providing a detailed evolutionary study of TOPIIA in animals. We provide a comprehensive phylogeny of TOPIIA in animals, including a detailed view of the duplication event that originated the TOP2A and TOP2B paralogues. We also identified the most conserved protein domains of functional relevance and assessed selective pressures governing the evolution of these important topoisomerases.

Materials and Methods

TOPIIA Sequences

We obtained TOPIIA protein sequences from OrthoDB v10 (https://www.orthodb.org), a comprehensive catalogue of putative orthologues from more than 400 metazoan species (Kriventseva et al. 2019). We also used protein–protein BLAST (blastp) to retrieve sequences from phyla that were not found in OrthoDB, using TOPIIA sequences from close phylogenetic groups as queries. We excluded repeated sequences from the same species that showed 100% identity, which most likely represented different entries of the same sequence in the databases. We also removed from further analyses sequences shorter than half of the average TOPIIA length, as they can represent partial protein sequences derived from poorly assembled genomes. In some cases, contigs do not cover the complete genomic region where the protein is encoded, resulting in partial protein sequences. For this reason, some species lack one of the paralogues in our dataset, although it may be present in their genomes.
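The two exclusion rules described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline actually used: the function name `filter_sequences`, the `(species, sequence)` tuple layout and the choice to compute the average length after removing duplicates are all hypothetical.

```python
from statistics import mean


def filter_sequences(records):
    """Apply the two exclusion rules to a list of (species, sequence) tuples.

    Assumption: the average length is computed after duplicate removal.
    """
    # Rule 1: drop repeated entries from the same species with 100% identity.
    seen = set()
    unique = []
    for species, seq in records:
        key = (species, seq)
        if key not in seen:
            seen.add(key)
            unique.append((species, seq))
    if not unique:
        return []

    # Rule 2: drop sequences shorter than half of the average length.
    threshold = mean(len(seq) for _, seq in unique) / 2
    return [(species, seq) for species, seq in unique if len(seq) >= threshold]


# Toy records (hypothetical species labels and sequences).
records = [
    ("sp_a", "M" * 100),
    ("sp_a", "M" * 100),  # identical duplicate, removed
    ("sp_b", "M" * 100),
    ("sp_c", "M" * 20),   # shorter than half the average, removed
]
kept = filter_sequences(records)
```

Sequences that are identical but come from different species are deliberately retained, since the text only excludes repeats within a species.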

Denisovan and Neanderthal TOP2A and TOP2B sequences were downloaded from the UCSC Genome Browser (http://genome.ucsc.edu/) (Kent et al. 2002). We downloaded all BAM reads for tracks Denisova and Neanderthal Cntgs matching the Human Mar. 2006 (NCBI36/hg18) chr17:35,798,322-35,827,695 (TOP2A) and chr3:25,614,479-25,680,835 (TOP2B). The BAM reads from each track were then reassembled against the human TOP2A (NC_000017.11) and TOP2B (NC_000003.12) reference sequences using Geneious v2020.2.4 (http://www.geneious.com). We only considered a polymorphic position in archaic hominids when: (1) at least two reads overlap in that position; (2) the polymorphism represents more than 75% of all the reads and (3) the polymorphism is not at the end of a read.
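The three criteria for accepting a polymorphic position can be expressed as a small check. The sketch below is a hypothetical illustration rather than the procedure run in Geneious: reads are assumed to be ungapped `(start, sequence)` pairs in 0-based reference coordinates, and criterion (3) is interpreted as rejecting a variant that sits at the first or last base of every read carrying it, which is only one possible reading of the rule.

```python
def is_supported_variant(reads, position, variant_base,
                         min_reads=2, min_fraction=0.75):
    """Return True if a variant passes the three read-support criteria.

    reads: list of (start, sequence) tuples, start being the 0-based
    reference coordinate of the first base (assumed ungapped alignment).
    """
    # Collect (offset within read, read sequence) for reads covering the site.
    covering = []
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):
            covering.append((offset, seq))

    # Criterion 1: at least two reads overlap the position.
    if len(covering) < min_reads:
        return False

    # Criterion 2: the variant represents more than 75% of the reads.
    carrying = [(o, s) for o, s in covering if s[o] == variant_base]
    if len(carrying) / len(covering) <= min_fraction:
        return False

    # Criterion 3 (assumed interpretation): reject the variant if it lies
    # at a terminal base in every read that carries it.
    if all(o in (0, len(s) - 1) for o, s in carrying):
        return False
    return True


# Toy reads (hypothetical): three reads covering reference position 4.
reads = [(0, "ACGTA"), (2, "GTACC"), (3, "TAGG")]
supported = is_supported_variant(reads, 4, "A")
```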

TOPIIA Multiple Sequence Alignments

We aligned the TOPIIA protein sequences with MAFFT version 7 (Katoh et al. 2019). The following three multiple alignments of protein sequences were used in subsequent analyses: Metazoa (n = 389), Chordata TOP2A (n = 105) and Chordata TOP2B (n = 125). The conservation across the alignments was estimated using the percentage of pairwise identity (PI) calculated in the Geneious program. Note that PI is the average percent identity obtained by comparing all pairs of residues at every alignment position.
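To make the PI statistic concrete, the following minimal sketch computes an average pairwise identity over all sequence pairs at every column of an alignment. It is not the Geneious implementation; in particular, the gap handling (gap–gap pairs ignored, gap–residue pairs counted as mismatches) is an assumption made for illustration, and the function name is hypothetical.

```python
from itertools import combinations


def pairwise_identity(alignment, gap="-"):
    """Average percent identity over all sequence pairs and columns.

    alignment: list of equal-length aligned sequences (strings).
    Assumption: gap-gap pairs are skipped; gap-residue pairs are mismatches.
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    identical = 0
    compared = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == gap and b == gap:
                continue
            compared += 1
            if a == b:
                identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment of three short sequences (hypothetical).
pi = pairwise_identity(["MKT", "MKS", "M-T"])
```

Applying the same function to a single column gives a per-site value, and applying it to the columns of one domain gives a per-domain value.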

TOPIIA coding domain sequences (CDS) from chordates were obtained from the Ensembl Genome Server (Hunt et al. 2018), as multiple sequence alignments of TOP2A (ENSG00000131747) and TOP2B (ENSG00000077097) human orthologues. The orthologues were organized in four different alignments for either TOP2A or TOP2B: Chordata (n = 159), Actinopteri (n = 51), Aves (n = 11) and Mammalia (n = 86). For coherence, we included the same species in the alignments of TOP2A and TOP2B.

The Ensembl server includes two long transcripts for TOP2B: TOP2B-201 (ENST00000264331.9) with 5814 nucleotides and 1626 amino acids and TOP2B-204 (ENST00000435706.6) with 5389 nucleotides and 1621 amino acids. Here we used the longest transcript (TOP2B-201) and resulting protein sequence unless stated otherwise.

The sequence alignments are available at Mendeley Data (https://data.mendeley.com//datasets/h2xfj5fsxw/1).

Phylogenetic Analyses

The phylogenetic tree with all metazoan species (n = 389), considering Arabidopsis thaliana as outgroup, was obtained from the corresponding multiple alignment of protein sequences. We reconstructed a maximum likelihood (ML) phylogenetic tree with PhyML 3.0 (Guindon et al. 2010), implemented in the ATGC bioinformatics platform (http://www.atgc-montpellier.fr). The JTT+G+I substitution model of protein evolution was selected with the Smart Model Selection (SMS) v1.8.4 method implemented in PhyML (Lefort et al. 2017), under the Akaike Information Criterion (AIC). The branch support was evaluated with the aBayes method (Anisimova et al. 2011). We analysed the duplication events in chordates with a phylogenetic tree based on the alignment of 34 CDS from Cephalochordata, Tunicata and Vertebrata species, considering Acanthaster planci (Echinodermata) as outgroup. Again, the ML tree was reconstructed with the ATGC bioinformatics platform, under the GTR+G+I substitution model, with branch support estimated from 100 bootstrap replicates. The resulting phylogenetic trees were edited with FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree) and TreeViewer v1.2.2 (https://treeviewer.org).
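Model selection under the AIC amounts to choosing the model that minimizes AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. The sketch below illustrates the criterion only; it does not reproduce SMS, and the log-likelihoods and parameter counts are invented placeholder values, not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the name of the model with the lowest AIC.

    fits: dict mapping model name -> (log-likelihood, number of parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Placeholder values for illustration only (not from the analysis).
fits = {
    "JTT": (-1000.0, 10),
    "JTT+G": (-980.0, 11),
    "JTT+G+I": (-975.0, 12),
}
selected = best_model(fits)
```

A richer model is preferred only when its gain in log-likelihood outweighs the penalty of two AIC units per additional parameter.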

Evaluation of Selection

We evaluated molecular adaptation in TOP2A and TOP2B protein-coding sequence alignments with the nonsynonymous/synonymous substitution rate ratio (dN/dS) (Del Amparo et al. 2021; Jeffares et al. 2015). We started by identifying the best-fitting substitution model of DNA evolution with jModelTest2 (Darriba et al. 2012) and reconstructed a ML phylogenetic tree under the selected substitution model. Next, we estimated dN/dS under a ML method, considering the reconstructed phylogenetic tree, with the well-established evolutionary framework HyPhy (Kosakovsky Pond and Frost 2005; Kosakovsky Pond et al. 2020). In particular, we applied the single-likelihood ancestor counting (SLAC) method, which provides dN/dS estimation with accuracy similar to that obtained with other likelihood-based methods (Kosakovsky Pond and Frost 2005). We identified global (entire sequence) genetic signatures of selection and positively selected sites (PSSs). The difference dN − dS was also used to evaluate selection at the codon site level.
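The interpretation of the two statistics can be summarized in a short sketch. It only encodes the conventional reading of the sign of dN − dS (and of dN/dS relative to 1); it is not the SLAC algorithm, it performs no significance test, and the function names and example rates are hypothetical.

```python
def omega(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


def selection_mode(dn, ds):
    """Classify by the sign of dN - dS (no statistical test applied)."""
    diff = dn - ds
    if diff > 0:
        return "positive"   # dN > dS, i.e. dN/dS > 1
    if diff < 0:
        return "purifying"  # dN < dS, i.e. dN/dS < 1
    return "neutral"        # dN == dS, i.e. dN/dS == 1


# Hypothetical rates giving a ratio well below 1.
ratio = omega(0.05, 0.25)
mode = selection_mode(0.05, 0.25)
```

The difference dN − dS remains defined at sites where dS is zero, which is one practical reason to report it per site alongside the ratio.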

TOPIIA Protein Structure

We used the TOP2A domains and other regions previously described (Broeck et al. 2021) and we considered them also in TOP2B by aligning the human reference sequences of both proteins. The TOP2A protein structure with PDB (Protein Data Bank) (Berman et al. 2000) code 6ZY7 (Broeck et al. 2021) and TOP2B structure with code 5ZAD (Sun et al. 2018) were obtained with Mol* (Sehnal et al. 2021) and RCSB PDB. The structures were coloured according to the protein domains and main regions.

Results and Discussion

TOPIIA Proteins are Conserved Across Metazoa

The phylogenetic tree built with TOPIIA protein sequences placed Cnidaria at the root of Metazoa, which was expected considering that it was the only phylum in our dataset not belonging to Bilateria (Fig. 2, Supplementary Fig. S1). In addition to Cnidaria, our dataset included eight Protostomia and 13 Deuterostomia phyla. Our phylogeny supports the split of Protostomia into Ecdysozoa (those exhibiting moulting) and Spiralia or Lophotrochozoa (those having lophophores and trochophore larvae), although only Spiralia was retrieved as a monophyletic group. Within Ecdysozoa, our results did not support the existence of Cycloneuralia, a clade including Scalidophora (represented here by Priapulida) and Nematoida (represented here by Nematoda) (Dunn et al. 2008). Indeed, our phylogenetic tree placed Nematoida closer to Panarthropoda (represented here by Tardigrada and Arthropoda) than to Scalidophora (Priapulida) (Campbell et al. 2011; Pisani et al. 2013). Spiralia formed a well-supported monophyletic group including Annelida, Mollusca, Brachiopoda and Platyhelminthes. The relationships within Spiralia are poorly resolved and often a matter of debate (Dunn et al. 2014). The clustering of Brachiopoda with Platyhelminthes supports the hypothesis that Lophophorata, organisms with a rake-like feeding structure (represented here by Brachiopoda), forms a separate clade from Trochozoa, a group defined by trochophore larvae, including at least Annelida and Mollusca (Dunn et al. 2014; Nesnidal et al. 2013) (Fig. 2).

Fig. 2

Phylogenetic analysis of TOPIIA in Metazoa. Maximum likelihood (ML) phylogenetic tree built with an alignment of 389 TOPIIA protein sequences from metazoans and considering Arabidopsis thaliana as outgroup. The branch support was estimated with aBayes, shown on the internal nodes. The scale bar indicates substitutions per site. Major monophyletic clades are collapsed for visualization purposes. The complete tree can be accessed in the supplementary material

The Deuterostomia clade includes a well-supported monophyletic group with Hemichordata and Echinodermata, which together form Ambulacraria. Within the Chordata, cephalochordates and tunicates diverged first and vertebrates formed a monophyletic group with the two paralogues TOP2A and TOP2B in different branches. To obtain a better resolution within Chordata, we built a phylogeny with all available TOPIIA CDS from basal chordates (Fig. 3A). In the CDS tree, cephalochordates diverged first, with urochordates and vertebrates forming a sister group known as Olfactores, an arrangement that has now been widely accepted (Delsuc et al. 2006; Putnam et al. 2008; Satoh et al. 2014). We found particularly long branches in tunicates, which suggest a high rate of molecular evolution, as previously noted for other genes (Delsuc et al. 2006; Tsagkogeorga et al. 2010). In particular, the extremely long branch of Oikopleura dioica (Fig. 3A) supports the claim that it is the fastest evolving metazoan recorded so far (Berna and Alvarez-Valin 2014; Denoeud et al. 2010).

Fig. 3

Evolutionary history of TOPIIA in basal chordates. A Maximum likelihood (ML) phylogenetic tree built with an alignment of 34 TOPIIA coding domain sequences (CDS) from Cephalochordata, Tunicata and Vertebrata, considering Acanthaster planci (Echinodermata) as outgroup. The branch support was estimated with 100 bootstrap cycles, shown on the internal nodes. The scale bar indicates substitutions per site. The putative occurrence of two rounds of tetraploidization (1R and 2R) is indicated. B Pairwise identity between Petromyzon marinus and Homo sapiens paralogues. C Hypothetical succession of events to explain the presence of duplicated genes in Cyclostomata and Gnathostomata. The Cyclostomata paralogues originated during the 1R event, the Gnathostomata lost one of the duplicated genes from 1R and the TOP2A and TOP2B paralogues originated in the 2R event

TOPIIA Genes Duplicated Independently in Cyclostomata and Gnathostomata

We extensively searched for TOPIIA sequences in all available chordate genomes, and only found cases of TOPIIA paralogue genes within vertebrates. It is therefore evident that the formation of TOPIIA paralogues is related to vertebrate-specific genome duplication events. There is now convincing evidence that early vertebrate evolution is characterized by two rounds of tetraploidization (known as 1R and 2R), whose timing is still a topic of debate (Ohno 2013; Smith and Keinath 2015; Van de Peer et al. 2009). Irrespective of when the tetraploidization events occurred, our analyses suggest that paralogues formed independently in jawless vertebrates (Cyclostomata) and in jawed vertebrates (Gnathostomata). Although with weak bootstrap values (~60%), our analyses always placed Cyclostomata paralogues in a different group from Gnathostomata TOP2A and TOP2B paralogues (Fig. 3A). In fact, Cyclostomata TOPIIA paralogues cannot even be classified as TOP2A or TOP2B, as they are equally divergent from both Gnathostomata paralogues, as shown by the similar pairwise sequence identities between Petromyzon marinus and Homo sapiens paralogues (Fig. 3B).

There is a growing consensus that all vertebrates share the first tetraploidization event (1R), but only jawed vertebrates had a second whole genome duplication (2R) (Aase-Remedios and Ferrier 2021; Nakatani et al. 2021; Simakov et al. 2020), following previous studies (Escriva et al. 2002; Stadler et al. 2004). If that is the case, the most parsimonious succession of events to explain TOPIIA paralogues is as follows: (a) Cyclostomata paralogues originated during the 1R event; (b) Gnathostomata lost one of the duplicated genes from 1R and (c) Gnathostomata TOP2A and TOP2B originated in the 2R event (Fig. 3C). This scenario assumes only a single gene loss to explain the observed phylogeny. Other scenarios are less parsimonious because they require two or more gene loss/duplication events (Supplementary Fig. S2). For example, if 1R duplicated genes were retained in Gnathostomata at the time of 2R, two of the four resulting copies had to be subsequently lost. Nonetheless, further studies on early vertebrates will better elucidate these evolutionary events.

Non-synonymous Changes in TOP2A and TOP2B Amongst Modern and Archaic Humans

Neanderthals and Denisovans are the closest relatives of modern humans, having diverged from the modern human lineage early in the Middle Pleistocene (Green et al. 2010; Reich et al. 2010). The recent shared ancestry explains the low genetic divergence amongst modern and archaic humans. We identified four polymorphisms amongst human, Denisovan and Neanderthal TOPIIA coding sequences (Table 1). The polymorphic positions were observed in the Transducer region of the ATPase domain (n = 1) and CTD (n = 3). The mutation in the TOP2A Transducer (position 267) occurred in the Denisovan lineage (A>G) without replacing the amino acid, perhaps the only possible type of mutation considering that this site is invariable in the 389 species analysed here. Two missense mutations were found in the TOP2A CTD. A T>C mutation in the human lineage replaced aspartate with glycine at position 1386. A C>A mutation in the Neanderthal lineage replaced alanine with serine at position 1515 (Table 1). In both cases, the replacement was between amino acids with different charges or polarities. Such replacements may not affect the protein structure as the CTD is believed to be disordered. However, they may affect the interactions of the CTD with other cellular components, raising the possibility that they contributed to the evolution of present-day humans. For example, position 1515 is within the chromatin tether domain (ChT), shown to be essential for TOP2A to interact robustly with chromosomes in mitosis (Lane et al. 2013). The only polymorphism detected in TOP2B also occurs in the CTD, in the human lineage, without replacing the amino acid. We found that none of the polymorphic positions in the CTD is conserved (< 50% pairwise identity), as expected considering the high variability of this protein region.

Table 1 Polymorphisms identified in Denisovan (Denis.) and Homo neanderthalensis (Neand.) TOP2A and TOP2B coding sequences

Very few amino acid changes became fixed in modern humans when compared with archaic humans. For example, only 78 such substitutions were described in the original publication of the Neanderthal genome (Green et al. 2010). A recent survey, using data from different Neanderthal and Denisovan genomes, identified 571 genes with human-specific amino acid changes (Kuhlwilm and Boeckx 2019). In that survey, human TOP2A was identified as the protein with the largest number of interactions with other proteins (n = 53), suggesting that it may operate as an interaction hub in modifications of the cell division complex (Kuhlwilm and Boeckx 2019). Experimental studies exploring the influence of these changes on the cell cycle machinery could evaluate this intriguing hypothesis.

TOP2A and TOP2B Evolved Under Strong Purifying Selection But a Few Sites were Positively Selected

We tested for the strength and mode of selection acting on TOP2A and TOP2B using dN/dS in chordates (Table 2). We found molecular signatures of purifying selection in both genes (dN/dS < 0.3), indicative of a strong selective pressure to conserve both TOP2A and TOP2B. It has been suggested that paralogues are subject to weaker purifying selection than single-copy genes (Kondrashov et al. 2002; Scannell and Wolfe 2008). In this regard, dN/dS values for TOP2A and TOP2B are higher than those estimated for Topoisomerase III Beta (TOP3B) in chordates (dN/dS < 0.1) (Moreira et al. 2021), which suggests that functional constraints were relaxed during the functional divergence of TOP2A and TOP2B. Moreover, the presence of (at least partially) redundant gene copies may have permitted the accumulation of previously forbidden deleterious mutations, which can explain the higher dN/dS values.

Table 2 Nucleotide diversity, substitution rates and selection pressure in TOP2A and TOP2B

Paralogues may exhibit asymmetric rates of sequence evolution (Conant and Wagner 2003; Scannell and Wolfe 2008; Van de Peer et al. 2001). Purifying selection was stronger in TOP2B (e.g. dN/dS = 0.156 in Chordata) than in TOP2A (e.g. dN/dS = 0.238 in Chordata), which suggests that TOP2B is under stronger functional constraints. Moreover, TOP2A displayed a higher nucleotide diversity and substitution rate than TOP2B (Table 2), which is also reflected in its longer branches in the reconstructed phylogenetic trees (Figs. 2, 3A). The specific activity of TOP2B in nerve growth and brain development (Lyu and Wang 2003; Yang et al. 2000) could impose relevant constraints on molecular evolution through the need to interact with different partners and chemical environments. For example, it has been suggested that the role of TOP2B in organism development involves the activation and repression of specific developmental genes (e.g. Myt1l, Cacna2d1, Syt1, Kcnd2) in association with diverse proteins (Lyu et al. 2006; Lyu and Wang 2003; Tiwari et al. 2012).

We detected signatures of positive selection in three TOP2A and two TOP2B sites in mammals (Table 2). Three of these positively selected sites (PSSs) are located in the CTD, which is believed to confer specificity to the different activities of TOP2A and TOP2B (Kozuki et al. 2017; Linka et al. 2007). The TOP2A position 928 in the Tower domain was also found under positive selection (dN − dS = 4.832). The highest dN − dS values were obtained for the TOP2B position 28 in the ATPase domain in both the Chordata and Mammalia datasets (Table 2). The ATPase domain binds ATP and forms a nucleotide-actuated protein dimerization gate through which DNA duplexes are passed. Considering the domains where the PSSs are located, we believe that these sites accumulated nonsynonymous substitutions over time to improve TOP2A and TOP2B interactions and cellular functions, although this hypothesis should be evaluated with experimental studies.

Low Conservation in Putative Regulatory Regions of the C-terminal Domain (CTD)

Human TOP2A and TOP2B proteins had an overall pairwise identity of 66.6% (Fig. 4A, B). The DNA Binding/Cleavage domain was slightly more conserved (80.2%) than the ATPase domain (77.3%), whilst the paralogues mainly diverged in the CTD (28.8%). Likewise, the alignment of TOPIIA sequences of metazoans clearly showed the contrast between the poorly conserved CTD and the other well-conserved protein domains (Fig. 4C). The most conserved regions were the WHD (84.8%) and TOPRIM (76.9%) domains (Fig. 4D), where conservation can be explained by their critical role in interacting with DNA (Aravind et al. 1998; Gajiwala and Burley 2000; McKay and Steitz 1981; Roca et al. 1996). The same domains stand out as the most conserved when comparing TOP2A with TOP2B (Fig. 4D). All the other TOPIIA domains displayed a pairwise identity of around 50–70%, except the CTD linker (32.9%) and CTD (21.9%), which are almost impossible to align when using all metazoans. The CTD regulates nuclear localization and protein–protein interactions, which could have evolved differently amongst species. Moreover, the CTD is believed to be a disordered region, and disordered regions are known to evolve faster than well-structured regions (Brown et al. 2011). Overall, we found that TOP2B domains are more conserved than TOP2A domains (Fig. 4D), in agreement with the stronger purifying selection observed in TOP2B (Table 2). We found poor conservation in the last 30 amino acids of the CTD (positions 1502–1531), which constitute the ChT domain (Lane et al. 2013). The ChT domain had a pairwise identity of 24.8% in Metazoa and 55.7% in Chordata TOP2A. Similarly, the bipartite nuclear localisation signal (NLS) near the TOP2A CTD end (positions 1454–1497) (Mirski et al. 1997) was poorly conserved in Metazoa (24.9%) and Chordata TOP2A (49.9%).

Fig. 4

TOPIIA diversity and structural organization. A Identity plot for the alignments of human TOP2A and TOP2B reference sequences. The identical positions are shown in green bars and the different positions are shown in yellow bars. Highlighted are the positions polymorphic in Denisovan or Neanderthal (green bars), associated with resistance to anticancer drugs (red bars) and positively selected (grey bars). TOP2B mutations resulting in disease are indicated by black bars. B Percentage of pairwise identity between human TOP2A and TOP2B protein complete sequence and domains. C Identity plot for the alignment of 389 TOPIIA protein sequences from metazoan species. The most conserved positions are indicated with brown bars, the less conserved with red bars. The main protein domains are included. D TOPIIA conservation across Metazoa. The percentage of pairwise identity was calculated for the full TOPIIA protein and domains in three different alignments, TOP2A (chordates), TOP2B (chordates) and all metazoans (Color figure online)

High Conservation of the Linker Connecting the ATPase and the TOPRIM Domains

We found that the linker connecting the ATPase and the TOPRIM domains (Fig. 5A) is well conserved in both TOP2A (92.7%) and TOP2B (96.4%), but more variable (65.4%) when all metazoans are compared (Fig. 4D). Two insertions of several amino acids are observed in Trichuris suis and Habropoda laboriosa. This linker forms an alpha helix of 29 amino acids connecting the N-gate to the DNA-gate (Figs. 1, 5B). Broeck et al. (2021) identified four highly conserved residues (positions 414, 417, 418 and 425) in this linker and tested different mutants to assess their contribution to the allosteric regulation of human TOP2A. Our dataset of metazoan TOPIIA sequences confirmed that positions 414, 417 and 418 were highly conserved (> 93%), but position 425 showed only a moderate level of conservation (70.8%; Fig. 5C). In addition, positions 431 (99%), 409 (92.5%), 419 (84.5%) and 407 (82.4%) were also conserved, suggesting that they may play an important role in the connection between the ATPase and TOPRIM domains. These residues could be tested in future experiments on the allosteric regulation of TOP2A.

Fig. 5

Structure and diversity of TOPIIA linker regions. A Identity plot and sequence logo for the linker joining the N-gate to DNA-gate. The most conserved positions are indicated by brown bars and the less conserved by red bars. B Cartoon representation of the linker region in the human TOP2A structure. C Percentage of pairwise identity per site for the linker region obtained from the alignment of 389 animal species. From D to F, we show the same information as from A to C but for the C-terminal domain (CTD) linker (Color figure online)

It has been suggested that the CTD linker (Fig. 5D, E) can structurally favour the curvature of the G-segment, stimulating DNA cleavage and facilitating strand passage (Broeck et al. 2021). Our analyses show that the CTD linker is poorly conserved in Metazoa, with several insertions and deletions (Fig. 4C, D). Conservation was higher in Chordata TOP2A and TOP2B, but still much lower than in other domains. In the Metazoa alignment, only the CTD linker region near the CTD is relatively conserved, with a few sites showing a pairwise identity above 75% (Fig. 5F). A few lysines (K) stand out as relatively conserved (Fig. 5F). The TOP2A position 1213 is phosphorylated during mitosis and contributes to localization of the protein to the centromere (Ishida et al. 2001). This position is moderately conserved in Metazoa (57.1%) and Chordata TOP2A (68.5%), but completely conserved in TOP2B (100%), which suggests that it may be important for the localization of TOP2B.

TOP2B Mutations Associated with Disease Occur in Conserved Sites and Replace Amino Acids with Different Physicochemical Properties

Alterations in topoisomerases have been associated with neurodegenerative and immune disorders and cancer (Pommier et al. 2016). TOP2A is essential for life; therefore, mutations that significantly affect its activity in relaxing topological stress are expected to be lethal. In fact, human disorders caused by mutations in TOP2A are rare. For example, the Human Gene Mutation Database (http://www.hgmd.cf.ac.uk/) only reports a gross deletion associated with congenital heart disease (Glessner et al. 2014). On the other hand, the loss of TOP2B is not embryonic lethal, and the enzyme acts particularly in postmitotic cells. Perhaps because of this, a few cases of inherited TOP2B mutations associated with diseases have been reported (Broderick et al. 2019; Erdős et al. 2021; Lam et al. 2017; Papapietro et al. 2020; Xia et al. 2019). We identified six TOP2B mutations in the literature associated with impaired B-cell development and function, hearing loss or neurodevelopmental disease (Table 3). A mutation at position 63 replaced a histidine with a tyrosine in the Bergerat fold of the ATP binding domain, at a site that is invariable across chordates. The mutation replaces a positively charged amino acid (histidine) with a neutral one (tyrosine). A total of four mutations were observed in the TOPRIM domain, all of them in highly conserved residues (PI > 95.2%). The heterozygous mutations affecting the TOPRIM domain were shown to be partially dominant loss-of-function mutations (Broderick et al. 2019) and affect sites essential for the catalytic activity of TOP2B. For example, the alanine to proline replacement at position 490 is predicted to destabilize an alpha helix within the TOPRIM domain (Papapietro et al. 2020). The Ser>Leu and Gly>Ser replacements exchange polar and nonpolar amino acids.

Table 3 TOP2B mutations associated with human disorders

A single mutation (position 1618) was described in the TOP2B CTD (Table 3). The residue is poorly conserved (PI of 54.9%). The mutation occurs near the end of the TOP2B sequence, in a region that is homologous to the TOP2A ChT domain (Lane et al. 2013), which facilitates stable binding to chromatin. Still, it remains to be determined whether TOP2B has a similar domain and whether the identified mutation could affect its activity. Overall, TOP2B replacement mutations resulting in disease are found mostly at conserved amino acid positions, a pattern also observed in other genes (Miller and Kumar 2001). The high conservation of several TOP2B residues suggests that other, as yet undetected, mutations could cause similar diseases.

Six Residues Conferring Resistance to TOP2 Poisons Differ Between TOP2A and TOP2B, and Can Be Used to Develop Paralogue-Specific Drugs

TOPIIA is a molecular target of several important classes of anticancer drugs, whose efficiency can be affected by mutations in critical protein sites (Delgado et al. 2018; Nitiss 2009). We analysed 27 amino acid replacements previously shown to confer resistance to anticancer drugs (Beck et al. 1993; Gilroy et al. 2006; Leontiou et al. 2006, 2007; Vassetzky et al. 1995), all of them located in the DNA Binding/Cleavage domain (Table 4). Six of the 27 sites have different amino acids in the TOP2A and TOP2B sequences (positions 450, 480, 762, 763, 908 and 909 in TOP2A). These six residues are amongst the most variable sites in Chordata TOP2A. However, we did not find the same pattern in TOP2B, with only position 929 being variable amongst chordates. TOP2B is believed to be responsible for undesirable side effects of anticancer chemotherapy by leading to therapy-related leukaemia (Azarova et al. 2007; Cowell et al. 2012). Therefore, we believe that future work could explore these six variable sites to design TOP2A-specific anticancer drugs with fewer of the undesirable side effects caused by interfering with TOP2B (Wu et al. 2013). With the exceptions mentioned above, positions conferring resistance to chemotherapy were well conserved (Table 4). This pattern is expected because TOP2 poisons should target functionally relevant protein sites, which are therefore under strong negative selection. Nevertheless, such sites can vary in cases of resistance to TOP2 poisons and still allow functional topoisomerases, which is perhaps only possible in the specific environment of cancer cells, under different selective pressures.
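The comparison between paralogues described above amounts to checking, for a list of alignment columns, whether two aligned sequences carry the same residue. The minimal Python sketch below illustrates this step; the function name `differing_sites` and the toy sequences are hypothetical and are not TOP2A or TOP2B data. With real sequences, alignment columns would additionally have to be mapped to the residue numbering of each protein.

```python
def differing_sites(seq_a, seq_b, columns):
    """Return (column, residue_a, residue_b) for the listed 1-based alignment
    columns at which the two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return [
        (col, seq_a[col - 1], seq_b[col - 1])
        for col in columns
        if seq_a[col - 1] != seq_b[col - 1]
    ]


# Toy aligned sequences (made up for illustration only).
toy_paralogue_a = "MEVSPLQ"
toy_paralogue_b = "MEISPMQ"
# Columns of interest, e.g. sites at which resistance replacements were reported.
sites_of_interest = [2, 3, 6]
paralogue_differences = differing_sites(
    toy_paralogue_a, toy_paralogue_b, sites_of_interest
)
```

In this toy case, columns 3 and 6 are reported as differing between the two sequences, whereas column 2 is shared.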

Table 4 TOPIIA amino acid replacements known to affect the efficiency of anticancer drugs

Conclusion

Our results suggest that the long-term evolution of TOPIIA is primarily driven by strong purifying selection, which also explains the high levels of sequence conservation. TOP2B is under stronger selective constraints than TOP2A, which may be explained by the specialized role of TOP2B in the genetic programming of postmitotic cells, which imposes additional constraints on its evolution. The TOPIIA phylogeny leads us to conclude that Cyclostomata TOPIIA paralogues have evolved independently of those of jawed vertebrates. Therefore, jawless vertebrates are a good model to uncover the role of additional TOPIIA genes in vertebrate evolution. Our study also identified two missense mutations in the TOP2A CTD when comparing modern and archaic humans that may have contributed to the evolution of human-specific features. We found that almost all mutations associated with resistance to chemotherapy or causing diseases occur in conserved sites of the TOP2B ATPase and DNA Binding/Cleavage domains, including their linker region. Therefore, we recommend that these domains be included in the screening of undiagnosed diseases, particularly considering the multiple roles of TOP2B in cells. Similarly, we provide a list of residues that could be good targets for the design of TOP2A-specific anticancer drugs that would avoid the undesirable side effects caused by interfering with TOP2B. Overall, our study provides important insights into the evolution of TOPIIA in animals and represents a valuable resource for future functional studies of topoisomerases.