1 Systematic Classification of Peroxidases

It is well established that peroxidases are ubiquitous and abundant enzymes in all forms of life. Even in strictly anaerobic bacteria [1, 2], hydroperoxidases play an essential role, mainly in signaling and in maintaining oxygen tolerance. Primordial blue-green algae (i.e., cyanobacterial predecessors) that appeared about 3.2 billion years ago must have been among the first organisms to elaborate mechanisms for the detoxification of partially reduced, reactive oxygen species as a consequence of their oxygen-evolving photosystem [3]. A significant increase in the level of atmospheric oxygen at the beginning of the Proterozoic (i.e., between 2.45 and 2.32 billion years ago) [4] necessarily led to further sophisticated evolution of both prokaryotic and eukaryotic antioxidative stress responses, with peroxidases and catalases being among the most prominent enzymatic factors [5]. The genome sequencing projects of the last two decades have shed light on the presence of numerous orthologs and paralogs of heme peroxidase gene families in almost all known genomes. The full-length coding sequences (both DNA and amino acid) are the basis for any comprehensive higher-level phylogenetic reconstruction, and with the increasing number and reliability of sequence data, our evolutionary views are improving. We have attempted to collect all available peroxidase sequences in PeroxiBase [6] (http://peroxidase.isb-sib.ch). Of the 6,861 known peroxidase sequences collected in PeroxiBase (January 2010), more than 73% code for heme-containing peroxidases. In the majority of cases, heme b is the prosthetic group, and its evolutionarily highly conserved amino acid surroundings influence its reactivity. All currently known heme-containing peroxidases can be divided into two main superfamilies and three families.
Contrary to some recently published opinions [7], in-depth phylogenetic analysis clearly reveals that almost all of these heme peroxidase coding genes have very ancient prokaryotic origins, probably dating back to the rise of atmospheric oxygen. Each of the (super)families described below represents its own peculiar mode of evolution in maintaining and regulating heme reactivity. The classical division into plant and animal heme peroxidases now appears obsolete and inappropriate, since there are numerous examples of interkingdom distribution for members of most of the gene families described here. It is therefore more suitable to name the groupings according to the characteristic reaction specificity of each (super)family. As a special case of hydroperoxidases, heme-containing catalases represent a very abundant gene family [8, 9], but they will not be analyzed in this contribution because they predominantly exhibit catalatic activity and their peroxidatic reaction mode is only of marginal importance. Here, we first present an overview of the two ubiquitous heme peroxidase superfamilies, followed by a description of the phylogeny of three less abundant but physiologically and biotechnologically important heme peroxidase families. A brief overview of all types of heme peroxidases addressed in this chapter is presented in Fig. 2.1.

Fig. 2.1
Overview of all heme peroxidase (super)families

2 The Peroxidase–Cyclooxygenase Superfamily

The peroxidase–cyclooxygenase superfamily, named after its most typical activity profiles [10], represents one of the two main evolutionary streams of heme peroxidase development in the living world. Currently, 371 peroxidase sequences belonging to this superfamily are collected in PeroxiBase (January 2010), but many further sequences are expected to be added from recent sequencing projects. In the corresponding database at http://www.ebi.ac.uk/interpro/, there are already over 820 entries for PF03098 (Pfam) or IPR002007 (InterPro), which describe this superfamily. The PROSITE profile “peroxidase_3” matches up to 807 protein sequences. The importance of the peroxidase–cyclooxygenase superfamily is underlined by the fact that numerous representatives are involved in the innate immune system (e.g., [11]). This holds not only for the mammalian peroxidases, which have undergone the most complex evolution: even several peroxidases from the bacterial predecessor clades [12] are supposed to be involved in unspecific defense mechanisms. The overall phylogeny is outlined in Fig. 2.2. Seven main clades representing distinct subfamilies are well segregated in the reconstructed unrooted phylogenetic tree (with 160 full sequences involved in the analysis). From the occurrence of multiple paralogs of otherwise conserved peroxidase genes, in addition to the rare occurrence of pseudogenes, it was concluded that this superfamily follows the birth-and-death model of multigene family evolution [10, 14]. Details for each of the seven subfamilies, with the exception of dual oxidases, are presented in the following subchapters. In dual oxidases, the peroxidase domain has lost its functional residues, and its physiological role remains elusive [15].
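The birth-and-death model mentioned above can be illustrated with a toy simulation. The sketch below is not the analysis performed in [10, 14]; it only assumes that, in each time step, every functional gene copy may be duplicated (birth) or inactivated into a pseudogene (death). The function name, the per-step probabilities, and the simplification that pseudogenes are never deleted are all hypothetical choices made for illustration.

```python
import random


def simulate_birth_death(steps, p_dup, p_loss, start_copies=1, seed=0):
    """Toy birth-and-death process for a multigene family.

    Hypothetical parameters (not taken from the chapter):
      p_dup  - per-step probability that a functional copy is duplicated
      p_loss - per-step probability that a functional copy becomes a pseudogene
    Returns a list of (functional_copies, pseudogenes) tuples, one per time point.
    """
    if not (0.0 <= p_dup and 0.0 <= p_loss and p_dup + p_loss <= 1.0):
        raise ValueError("p_dup and p_loss must be non-negative and sum to at most 1")
    rng = random.Random(seed)
    functional, pseudogenes = start_copies, 0
    history = [(functional, pseudogenes)]
    for _ in range(steps):
        births = deaths = 0
        for _ in range(functional):
            r = rng.random()
            if r < p_dup:
                births += 1          # duplication adds one functional paralog
            elif r < p_dup + p_loss:
                deaths += 1          # inactivation turns a copy into a pseudogene
        functional += births - deaths
        pseudogenes += deaths        # simplification: pseudogenes are never deleted
        history.append((functional, pseudogenes))
    return history


if __name__ == "__main__":
    for step, (genes, pseudo) in enumerate(simulate_birth_death(10, 0.2, 0.1, seed=1)):
        print(step, genes, pseudo)
```

In such a process, families with many paralogs and only occasional pseudogenes arise when duplication outpaces inactivation, which is the qualitative pattern the text refers to.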

Fig. 2.2
Phylogenetic relationships within the peroxidase–cyclooxygenase superfamily. This circle tree was obtained in MEGA package [13] after compressing all resolved branches from 160 full sequences of this superfamily to main clades. Seven subfamilies [10] are clearly discernible

2.1 Bacterial Members: Peroxicins, Peroxidockerins, Primordial Peroxidases

Bacterial members represent the most ancient forms of the peroxidase–cyclooxygenase superfamily and probably remained without significant changes during its long evolution [3]. There are currently 30 complete sequences entered in PeroxiBase. Open reading frames of primordial peroxidases, containing only the peroxidase domain, can be found mainly in cyanobacteria and, to some extent, in proteobacteria. Peroxidockerins represent more complicated gene structures with an additional N-terminal dockerin domain containing dockerin type I repeats with predicted transmembrane helices [10]. Peroxicins are multidomain peroxidases that possess a short N-terminal peroxidase motif in addition to the normal-length peroxidase domain, as well as C-terminal hemolysin-like calcium-binding repeats [12]. These very long C-terminal domains might be involved in defense mechanisms against competitor bacteria. Figure 2.3 outlines the reconstructed phylogeny of peroxicins, peroxidockerins, and the ancestral peroxidases related to them. The single-domain cyanobacterial genes might represent a molecular fossil of the ancestral peroxidase gene of this superfamily [3].

Fig. 2.3
Phylogeny of bacterial representatives of the peroxidase–cyclooxygenase superfamily. The reconstructed tree obtained from the NJ-method of the MEGA package [13] with JTT matrix and 1,000 bootstrap replications is presented. A similar tree was obtained also with ProML-method of the PHYLIP package [16] with 100 bootstraps. Numbers in the nodes indicate bootstrap values for NJ and ProML method, respectively. Abbreviations of protein names correspond to PeroxiBase
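The bootstrap values quoted in these captions come from resampling alignment columns with replacement and rebuilding the tree for each replicate. The following minimal sketch shows only the resampling idea. As a loudly simplified stand-in for a full NJ or ProML reconstruction, it scores how often a given pair of sequences remains the closest pair under a plain p-distance; it does not use the JTT matrix, MEGA, or PHYLIP. All function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns sampled with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def p_distance(a, b):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_pair(alignment):
    """Names of the two sequences with the smallest p-distance (ties: name order)."""
    names = sorted(alignment)
    best = min(
        (p_distance(alignment[a], alignment[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    return frozenset(best[1:])


def bootstrap_support(alignment, grouping, replicates=1000, seed=0):
    """Percentage of replicates in which `grouping` is recovered as the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_alignment(alignment, rng)) == frozenset(grouping)
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


# Hypothetical toy alignment, not real peroxidase sequences.
toy = {
    "seqA": "MKTLLAGHWE",
    "seqB": "MKTLLAGHWD",
    "seqC": "MRSLVAGQYE",
    "seqD": "MRSIVSGQYD",
}

if __name__ == "__main__":
    print(bootstrap_support(toy, {"seqA", "seqB"}))
```

In a real analysis, each replicate would be passed to the tree-building method and the support of every node counted across the resulting trees; the pairwise shortcut here is only meant to make the resampling step concrete.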

2.2 Bacterial, Fungal, and Animal Cyclooxygenases

Cyclooxygenases diverged rather early from the remaining peroxidase genes of the peroxidase–cyclooxygenase superfamily [10]. This is underlined by the fact that corresponding genes can be found in various organisms ranging from bacteria and fungi to mammals. There are currently 74 sequences entered in PeroxiBase. The eukaryotic genes exhibit a peculiar structure consisting of an N-terminal signal peptide followed by an epidermal growth factor domain, a membrane-binding domain, and the conserved globular peroxidase domain. Two main clades of this subfamily are discernible (Fig. 2.4). In the first clade, an evolutionary connection from bacterial cyclooxygenases to human prostaglandin synthases can be followed. The fungal linoleate diol synthases participating in linoleic acid metabolism are also located in this clade as a separate branch. In the second main clade, further cyclooxygenase paralogs and alpha-dioxygenases are located. Corresponding genes are abundant mainly among fungal and plant genomes, but there are also predecessors among bacterial genes. The clade of typical plant alpha-dioxygenase genes within this subfamily clearly demonstrates that it is not appropriate to call this superfamily the “animal peroxidase” superfamily, although that nomenclature is still present in some databases.

Fig. 2.4
Phylogeny of cyclooxygenases. The reconstructed tree obtained from the NJ-method [13] with JTT matrix and 1,000 bootstrap replications is presented. A similar tree was obtained using the ProML-method [16] and 100 bootstraps. Numbers in the nodes indicate bootstrap values obtained for NJ and ProML method, respectively. Abbreviations of protein names correspond to PeroxiBase

2.3 Ecdysozoan and Echinozoan Peroxinectins

The very abundant subfamily of peroxinectins is mainly spread among Ecdysozoa and Echinozoa, i.e., among various arthropods and nematodes. Up to 40 sequences can be found in PeroxiBase. Originally, peroxinectin from crayfish was described as a cell adhesion protein with a peroxidase domain and an integrin-binding motif [17]. Two main clades are visible within the phylogenetic distribution of this subfamily (Fig. 2.5). In the first clade, nematode and squid peroxinectins dominate, whereas the second main clade contains various gene duplicates of insect and crustacean peroxinectins. The fact that no vertebrate peroxinectin gene has been found so far can be understood as an evolutionary dead end of this subfamily within the whole superfamily [10]. The restricted distribution of gene variants further supports the proposed evolutionary scheme based on the birth-and-death model of gene family evolution [14].

Fig. 2.5
Phylogeny of peroxinectins. The reconstructed tree obtained from the NJ-method [13] with JTT matrix and 1,000 bootstrap replications is presented. A nearly identical tree was obtained also with the ProML method [16] with 100 bootstraps. Numbers in the nodes indicate bootstrap values for NJ and ProML method, respectively. Abbreviations of protein names correspond to PeroxiBase

2.4 Ecdysozoan and Deuterostomian Peroxidasins

Peroxidasins (in humans also designated vascular peroxidases [18]) represent peculiar multidomain peroxidases. During later steps of evolution, the peroxidase domain was fused with immunoglobulin domains, suggesting an essential role of this subfamily in the innate immune system. Peroxidasins are found both in invertebrates (even in nematodes) and in vertebrates, including mammals. The first peroxidasin was described in Drosophila as a multidomain protein combining the peroxidase domain with extracellular matrix motifs [19]. In humans, these proteins were found to be most abundant in the heart and vascular wall [18]. In PeroxiBase, 37 sequences are deposited (January 2010). A typical peroxidasin gene encodes an N-terminal signal peptide followed by leucine-rich repeats (LRR), immunoglobulin domains, the peroxidase domain, and finally a von Willebrand factor C type (VWC) domain [18]. The LRR regions frequently mediate protein–protein and protein–lipid interactions [20]. The repeated immunoglobulin domains are of the C-2 type. They are closely related to extracellular domains in Class I and Class II major histocompatibility complexes and are supposed to be involved in vascular cell adhesion by binding to integrins [18]. The proposed physiological role of the VWC domain, which is involved in specific protein–protein interactions, lies mainly in embryonic development and tissue specification [21], but for peroxidasins, this function has not yet been proven experimentally. In the presented part of the evolutionary tree (Fig. 2.6), all analyzed peroxidasin subfamily members form a single clade with a frequent occurrence of gene duplicates within ecdysozoan and deuterostomian genomes. In mammals, there are apparently one or several peroxidase paralogs (cf. Sect. 2.2.5) participating in innate immunity. The phylogenetic position of Branchiostoma belcheri (invertebrate) peroxidasin is very interesting.
It is located at the root of this subfamily and is also the closest homolog to thyroid peroxidases.

Fig. 2.6
Phylogeny of peroxidasins. The reconstructed tree obtained from the NJ-method of the MEGA package [13] is presented. A nearly identical tree was obtained also from ProML method of the PHYLIP package [16]. Numbers in the nodes indicate bootstrap values for NJ and ML methods, respectively. Abbreviations of protein names correspond to PeroxiBase

2.5 Chordata Peroxidases

The sequences coding for the well-known and intensively investigated mammalian representatives, myeloperoxidase (MPO), eosinophil peroxidase (EPO), lactoperoxidase (LPO), and thyroid peroxidase (TPO), were recently analyzed phylogenetically in detail [10, 22]. The reconstructed unrooted tree, which focuses on all available chordate sequences, is presented in Fig. 2.7. The statistically well-supported tree shows a division into two gene-duplicated twin clades of EPO and MPO, with a closely related LPO clade and a more distantly related TPO clade; the sites of positive Darwinian selection were also elucidated [22]. The phylogenetic distribution of these four monophyletic clades is in accordance with the proposed physiological function: whereas MPO, EPO, and LPO members play key roles in antimicrobial and innate immune responses of mammals [23, 24], TPO is essential in thyroid hormone biosynthesis [25]. The amino acid positions under positive selection, detected separately for each clade within the highly evolved peroxidase domain, have physiological implications that most likely contributed to the functional diversity within this subfamily. Variants in most of the observed positions are associated with diseases as diverse as asthma, Alzheimer's disease, and inflammatory vascular disease. It has to be noted that even in the more complex evolutionary tree of this superfamily [10], nonmammalian members are distributed between the well-resolved clades of mammalian peroxidases (cf. Figs. 2.2 and 2.7). For example, between the MPO–EPO and LPO clades, there are minor clades of bird and amphibian peroxidases of so far unknown physiological role. Very interestingly, XlPOX1 (a frog peroxidase) has been reported to possess ribonuclease activity [26] and thus to be involved in the destabilization of albumin mRNA. Furthermore, between the LPO and TPO clades, a minor clade of fish peroxidases is located (cf. Fig. 2.2), also with so far unknown physiological role.
In some databases, the corresponding (yet putative) proteins are classified as myeloperoxidases (e.g., the zebrafish sequence), but sequence analysis reveals that they diverge from mammalian myeloperoxidases in critical amino acid residues [10]. All presented members of this subfamily demonstrate a high level of gene variability and speciation up to modern-day mammals. Future research focusing on the nonmammalian members of this subfamily is therefore of essential importance, not only for basic science but also for biotechnological applications.

Fig. 2.7
Detail of the reconstructed phylogenetic tree showing the subfamily of vertebrate peroxidases including the mammalian enzymes myeloperoxidase (MPO), eosinophil peroxidase (EPO), lactoperoxidase (LPO), and thyroid peroxidase (TPO). Reproduced from [10] with the permission of John Wiley and Sons (License Nr. 2326000554179)

3 The Peroxidase–Catalase Superfamily

This is currently the best known and most intensively studied superfamily of heme peroxidases. Its representatives, among them some of the oldest peroxidases known (e.g., horseradish peroxidase), have been systematically studied since the 1930s [27]. In the 1940s, yeast cytochrome c peroxidase was discovered, but the corresponding sequence became available only in 1982 [28]. The 1980s saw the advent of intensive studies on fungal lignin and manganese peroxidases, with plenty of newly available sequences. In parallel, research focused on catalase–peroxidases and ascorbate peroxidases also brought numerous new representatives. Currently, up to 4,020 protein sequences match the criteria for this superfamily in IPR002016 (PF00141). The first systematic classification of the accumulated sequences and structures was performed by Welinder, who divided all members of this superfamily into three distinct classes [29]. Class I was identified as containing mainly yeast cytochrome c peroxidase, ascorbate peroxidases, and bacterial catalase–peroxidases. Class II is dominated by fungal lignin and manganese peroxidases. Class III is represented by secretory plant peroxidases related to horseradish peroxidase. The whole superfamily was thus first named, according to the origin of its members, the superfamily of plant, fungal, and bacterial heme peroxidases. It was shown recently that a few representatives also occur in the genomes of Hydra viridis [30] and related species belonging to the phylum Cnidaria, which is phylogenetically located at the root of the animal kingdom [31]. Therefore, it would be more appropriate to name this superfamily the peroxidase–catalase superfamily, i.e., according to the main enzymatic activities performed by its members. Such a name is similar to the nomenclature used for other families and superfamilies (cf. Sect. 2.2 or Sects. 2.4–2.6). In PeroxiBase, more than 4,100 sequences of its members can be found (January 2010).
The molecular phylogeny has already been investigated from various aspects. In general, it was proposed that both Class II and Class III evolved from a common ancestor [32] (Fig. 2.8). As there is no prokaryotic sequence in the neighborhood of the corresponding phylogenetic node [32], it is very probable that Class II and Class III diverged in an already formed ancestral eukaryotic genome. Such a predecessor gene may have diverged after an earlier, very distant gene duplication from an ancestral bacterial Class I representative [33]. Details of this event still need to be resolved. Nevertheless, Class II secretory fungal peroxidases and Class III secretory plant peroxidases evolved in parallel for a long time after their early division, leading to two highly evolved groups of genes within this superfamily.

Fig. 2.8
Phylogenetic relationships among heme peroxidases belonging to classes I, II, and III of the peroxidase–catalase superfamily. This circle tree was obtained in MEGA package [13] after compressing all resolved branches from 123 full sequences of this superfamily to main clades. All three classes originally defined in [29] are clearly discernible

3.1 Class I: Peroxidases

3.1.1 Catalase–Peroxidases (KatGs)

It is reasonable to suppose that bifunctionality was at the root of the peroxidase–catalase superfamily, since it has been argued that catalytic promiscuity frequently occurred in the evolution of protein superfamilies [34]. Catalase–peroxidases are physiologically active primarily as catalases, and their peroxidase functionality in vivo is still under discussion [3]. Currently, 366 peroxidases deposited in PeroxiBase match the criteria for designation as catalase–peroxidase. They are phylogenetically distributed among archaea (8), eubacteria (305, including facultative anaerobes), fungi (42), and protists (11). KatG genes reveal a peculiar gene structure [35, 36]. In all sequenced katG genes, a tandem gene duplication with a different function for each of the duplicated domains is obvious. This gene structure is unique not only within the peroxidase–catalase superfamily but also among all known heme peroxidase families. In the first attempt to reconstruct the evolution of this family within the whole of Class I, the tandem gene duplication of katG was taken into account [37]. However, the N- and C-terminal domains revealed slight differences in their phylogenetic distribution, possibly due to differences in mutational rates. It was proposed that the tandem duplication occurred after the earlier segregation between the KatG branch and the cytochrome c peroxidase and ascorbate peroxidase branches. A further phylogenetic analysis with 152 full sequences, reported recently [38], gave a more detailed insight, although the distribution among microorganisms is influenced by the fact that katG genes are not essential and their role can also be taken over by alternative loci (e.g., katE encoding a typical catalase, or genes encoding nonheme peroxidases). The most interesting aspect of the evolution of this family is the occurrence of several lateral gene transfers (LGT).
The distribution of known katG sequences in an unrooted tree, with the most apparent LGT from Bacteroidetes towards sac fungi, is evident from Fig. 2.9. Two basal paralog clades were segregated at the beginning of katG gene evolution. Among bacteria, the presence of both paralogs is rare; the minor clade 2 contains, besides proteobacterial representatives, most (but not all) archaeal ones. The main clade 1 contains katG genes from almost all bacterial phyla. Numerous closely related orthologs are present, mainly among proteobacteria. Cyanobacterial catalase–peroxidases are located at the root of this clade, and the fungal branch has its origin in later steps of the evolution of this main clade. The phylogenetic distribution clearly suggests that fungal catalase–peroxidases were segregated from a Bacteroidetes ancestor via an LGT towards ancient fungi [39]. Further support for this rare LGT event came from differences in G+C content (gene vs. whole genome) and from the rare occurrence of introns within fungal katG genes. Future research on KatGs with potential biotechnological applications will probably focus mainly on elucidating the structure and function of eukaryotic catalase–peroxidases. Besides the detection of KatG as a virulence factor in human fungal pathogens such as Penicillium marneffei [40], the elucidation of the potential role of catalase–peroxidases in phytopathogenic fungi might be significant for the protection of cultivated crops, particularly against attacks of Gibberella sp. [41] and Magnaporthe grisea [42], two important phytopathogens.
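The G+C comparison used above as supporting evidence for LGT can be sketched in a few lines. The snippet below is only an illustration of the comparison itself, assuming plain nucleotide strings as input; the function names are hypothetical, the example sequences are made up, and no threshold for calling a transfer is implied by the source.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous nucleotides")
    return sum(b in "GC" for b in bases) / len(bases)


def gc_deviation(gene_seq, genome_seq):
    """Difference between the G+C content of a gene and that of its host genome.

    A markedly non-zero value is only a hint of foreign origin; it is not a test
    in itself and should be read together with the phylogeny and intron data.
    """
    return gc_content(gene_seq) - gc_content(genome_seq)


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    gene = "GGCCGGCC"
    genome = "ATATATGC"
    print(gc_deviation(gene, genome))
```

In practice the gene would be a katG coding sequence and the reference the whole genome (or all coding sequences) of the same organism.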

Fig. 2.9
Phylogeny of selected 66 KatGs for the demonstration of the presence of two paralog clades separated very early as well as a lateral gene transfer (LGT) from bacteria to fungi within this gene family. The reconstructed tree obtained with the NJ-method [13] is presented. Nearly identical trees were obtained using the ProML- [16] and the MP-methods [13]. Numbers in the nodes indicate bootstrap values for NJ/MP/ML methods, respectively. Arrow indicates the occurrence of LGT from Bacteroidetes towards sac fungi. Abbreviations of protein names correspond to PeroxiBase

3.1.2 Ascorbate Peroxidases, Cytochrome c Peroxidases, and Their Putative Hybrid Types

Ascorbate peroxidases (APx) and cytochrome c peroxidases (CcP, i.e., single-heme cytochrome c peroxidases) segregated very early from katG genes [33, 37], but the details of this event and the process of gene speciation towards different substrate specificities remain unclear. Important intermediate sequences and corresponding proteins, e.g., from primitive fungi and protists, that could clarify this problem are still missing. PeroxiBase already includes 409 ascorbate peroxidases (January 2010), but approximately one third of them are only partial sequences and thus not suitable for higher-level phylogenetic analysis. These partial sequences originate from EST databases and indicate expression profiles in various growth and developmental phases. Soon after its emergence, the clade of APxs segregated from that of CcPs. All known ascorbate peroxidases are divided into three types according to their cellular location [33]. Chloroplastic APxs diverged in an earlier phase, whereas cytosolic and peroxisomal variants evolved together and were separated only at a later stage of evolution. Of particular interest is the sequence analysis of ascorbate peroxidases in chloroplast-containing protists, which acquired chloroplasts by endosymbiosis [43].

A detailed phylogeny of fungal cytochrome c peroxidases, including novel, previously unanalyzed sequences mainly from recently finished sequencing projects, was recently presented [44]. Figure 2.10 depicts an update with 88 full-length sequences (out of 108 known), including also nonfungal protist sequences. There are three distinct subfamilies that differ in subcellular location. In subfamily I, no signal sequence was detected, suggesting that these are cytosolic CcPs. Subfamily II is the largest one, with a signal sequence targeting its members to mitochondria. The well-known S. cerevisiae CcP belongs to this branch. Enzymes from subfamily III, probably the most ancient one, can even possess targeting signals that are not solely mitochondrial. According to recent data mining (M. Zámocký, unpublished work), there exists a minor but important group of hybrid-type APx–CcP sequences in fungal and protistan genomes (Fig. 2.8). In PeroxiBase, 36 sequences of this type are entered (January 2010). These so far uncharacterized peroxidases might represent the missing proteins for the complex phylogenetic reconstruction of this family. Investigation of hybrid-type APx–CcPs can add novel aspects to our understanding of the substrate specificity of Class I heme peroxidases, with a biotechnological impact that is as yet hard to predict.

Fig. 2.10
Phylogeny of 88 full sequences coding for fungal and protistan cytochrome c peroxidases (CcP). The reconstructed tree obtained from the NJ-method of the MEGA package [13] is presented. A nearly identical tree was obtained with the ProML method of the PHYLIP package [16]. Numbers in the nodes indicate bootstrap values for NJ and ML methods, respectively. Abbreviations of protein names correspond to PeroxiBase

3.2 Class II: Manganese, Lignin Peroxidases, and Versatile Peroxidases

Phylogenetic relationships of Class II peroxidases have already been reconstructed in a comprehensive way [32]. Extracellular Class II heme peroxidases are currently known only in the kingdom of fungi (see PeroxiBase for details) and are essentially involved, through various mechanisms, in lignin degradation [45]. There are three main evolutionary groups (i.e., subfamilies) secreted by white rot basidiomycetes: manganese peroxidases (MnP), lignin peroxidases (LiP), and versatile peroxidases (VP). These subfamilies apparently represent a gene speciation within one fungal paralog clade. This paralog clade underwent frequent gene duplications, giving rise to multiple gene variants (e.g., up to eight in P. chrysosporium) of both lignin and manganese peroxidases [46]. The same is true for versatile peroxidases, but the duplication frequency appears to be lower than for LiP [47]. Phylogenetic analysis of 90 Class II peroxidase sequences revealed that this class is a monophyletic group [32] (Fig. 2.11). It was suggested, with good statistical support, that LiPs, VPs, and the classical MnPs from Agaricomycetes are derived from ancient peroxidases with a manganese-dependent activity. Besides LiPs, MnPs, and VPs, there are evolutionarily very interesting types of Class II peroxidases that do not fall into any of these distinct lignin-degrading groupings. These so-called “basal peroxidases” [32] also include sequences from Coprinopsis cinerea and Antrodia cinnamomea, both fungi that do not produce white rot of wood. Such “basal peroxidases,” surprisingly closely related to ascomycetous Class II peroxidase representatives, may have retained some properties of the ancestral Class II forms and are thus the best candidates for applications such as protein engineering via directed evolution.
It will be intriguing to compare the parallel evolutionary history of lignin-forming plants with that of lignin-degrading fungi, including their immediate predecessors, to obtain valuable hints about plant–fungus interactions and the role of Class II heme peroxidases in this process.

Fig. 2.11
Phylogeny of Class II of the peroxidase–catalase superfamily. Sequences coding for secretory fungal peroxidases: lignin peroxidase (LiP), manganese peroxidase (MnP), and versatile peroxidase (VP) were used for this reconstruction. One of nine equally parsimonious trees is presented. Bootstrap values are indicated before slash, and Bayesian posterior probability values are indicated after the slash. With kind permission from Springer Science & Business Media: Morgenstern et al. [32], Fig. 2

3.3 Class III: Plant Secretory Peroxidases

Heme peroxidases of Class III are, in principle, plant-secreted glycoproteins involved in cell elongation, cell wall construction, and differentiation, as well as in the defense against various plant pathogens. They are encoded by a surprisingly large number of paralogous genes; in most angiosperm genomes, the peroxidase genes of this superfamily (i.e., Class III but also Class I) have additionally undergone numerous gene duplications [48] in a relatively recent evolutionary phase. For example, in Arabidopsis thaliana, which has long served as a “model plant,” up to 73 distinct Class III peroxidase genes were detected [49]; for Medicago truncatula, up to 101 Class III genes and for Oryza sativa, up to 138 Class III sequences were entered; and in Zea mays, this number reaches a maximum of 151 in the complete genome. The total number of Class III genes entered in PeroxiBase already exceeds 3,000 (January 2010), thus representing over 73% of all entered superfamily members. Monocotyledon peroxidases differ slightly in their sequence fingerprints from their eudicotyledon counterparts [50], but the majority of the prx genes are highly conserved throughout the whole of Class III. The observed overall sequence similarities led to the classification of all available Class III peroxidases into eight distinct groups [50], the most abundant in rice being groups I and IV. The best known representative of Class III secretory peroxidases is horseradish peroxidase (HRP) from Armoracia rusticana. Isozyme 1C, originally extracted from roots, is not only among the best-known peroxidases but has also been among the most widely represented plant enzymes in the scientific and patent literature [51] for over 50 years. Interestingly, sequences for five other Class III peroxidases encoded in the genome of A. rusticana are entered in PeroxiBase, but none of them has gained as much biotechnological attention as isozyme 1C.
Recently, one putative gene encoding a Class III peroxidase in green algae [52] and even four genes in red algae were detected. The conclusion [48] that the Class III gene family appeared only with the colonization of land by plants therefore has to be reconsidered. Moreover, the complex high-level evolutionary analysis of peroxidase genes from all three classes surprisingly places three unknown peroxidase genes from the fungal phytopathogen M. grisea at the root of the Class III peroxidase branch (Fig. 2.12) [32]. Additional peroxidase genes from unicellular eukaryotes need to be included to resolve the root of Class III more precisely. Again, mainly algal Class III genes can be promising future targets for protein engineering and directed evolution, given that they represent direct predecessors of the biotechnologically well-exploited horseradish peroxidase.

Fig. 2.12
Phylogeny of Class III of the peroxidase–catalase superfamily. Sequences coding for secretory peroxidases from Viridiplantae were used for this reconstruction. One of nine equally parsimonious trees is presented. Bootstrap values are indicated before slash, and Bayesian posterior probability values are indicated after the slash. With kind permission from Springer Science & Business Media: Morgenstern et al. [32], Fig. 2

4 Di-Heme Peroxidase Family

This average-sized peroxidase family (IPR004852 or PF03150) is present predominantly among various bacteria, but a few archaeal members have also been found recently. So far, no eukaryotic representatives have been found. It is probable that the corresponding genes evolved very early but, compared with other lineages, did not achieve universal distribution. This family is unique in containing two heme groups in one protein moiety, thus allowing studies of intramolecular electron transfer [53]. Di-heme cytochrome c peroxidases (DiHCcP) reduce hydrogen peroxide to water using cytochrome c or cupredoxin as the electron donor. All investigated representatives of this family contain heme c prosthetic groups (unlike fungal CcPs) that are covalently linked to the polypeptide chain. DiHCcPs comprise two distinct domains: the electron-transferring (E) heme domain and the peroxidatic (P) heme domain, with a calcium-binding site at the domain interface [54]. This gene family also includes eubacterial methylamine utilization proteins (MauG), whose significant similarity to DiHCcP was detected earlier [55]. In the case of Paracoccus denitrificans, the purified heme protein shows only marginal peroxidase activity. It appears that MauGs are, in principle, heme-oxygenases that play an important role in the synthesis of the tryptophan tryptophylquinone cofactor of methylamine dehydrogenase [56]. Thus, it is reasonable to conclude that a massive gene conversion must have occurred during evolution in this particular clade. Selected DiHCcP representatives were studied from various physiological aspects and were shown to be inducible under low-oxygen conditions under the control of the FNR protein [57]. PeroxiBase currently covers 110 bacterial DiHCcP sequences, whereas in PFAM over 900 mostly putative sequences are registered. The detailed phylogeny of the PeroxiBase entries was reconstructed and is presented in Fig. 2.13.
From this output, it is obvious that di-heme cytochrome c peroxidases are predominantly distributed among all classes of proteobacteria, and in most of them, various orthologs exist. Only a minor clade exists within Chlorobia genomes, and the two archaeal representatives probably originated via lateral gene transfer from proteobacterial genomes. The phylogenetic position of the sole cyanobacterial DiHCcP gene is not highly supported, but it apparently shares a common evolutionary history with the Spirochaetes representatives. The origin of this peroxidase family appears to lie among ancient proteobacteria.

Fig. 2.13

Phylogeny of bacterial di-heme peroxidases. The reconstructed tree obtained with the NJ method of the MEGA package [13] is presented. A very similar tree was also obtained with the ProML method of the PHYLIP package [16]. Numbers in the nodes indicate bootstrap values for NJ and ML methods, respectively. Abbreviations of protein names correspond to PeroxiBase
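
The NJ (neighbor-joining) method named in the caption above builds a tree by repeatedly joining the pair of nodes that minimises a corrected distance criterion. The following is a minimal textbook-style sketch of that procedure in Python, not the MEGA implementation. The function name `neighbor_joining` and the five-taxon distance matrix are hypothetical illustrations; the distances are toy values and are not derived from peroxidase sequences.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Generic neighbor joining (illustrative sketch, not MEGA's code).

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance
    Returns (joins, last_edge). Each join is
    (node_i, node_j, new_node, branch_i, branch_j);
    last_edge is (node_a, node_b, length) for the final two nodes.
    """
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node: sum of its distances to all others
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to the new internal node
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i},{j})"
        # Distances from the new node to every remaining node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, u, li, lj))
    a, b = nodes
    return joins, (a, b, d[frozenset((a, b))])


# Hypothetical toy distances between five taxa (not peroxidase data)
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(k): v for k, v in pairs.items()}
joins, last = neighbor_joining(["a", "b", "c", "d", "e"], dist)
for step in joins:
    print(step)
print(last)
```

In a real analysis, the distance matrix would be computed from a multiple sequence alignment, and the bootstrap values shown at the nodes would come from repeating the reconstruction on resampled alignment columns. Neither step is shown in this sketch.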

5 Dyp-Type Heme Peroxidase Family

Originally named “dye-decolorizing peroxidases,” this protein family is spread among bacteria and fungi, with ∼800 sequences deposited so far [7]. No archaeal representative could be unequivocally detected so far, although protein sequences from archaea were recently reported to belong to the same PFAM family [58], classified as IPR006314 or PF04261. It remains unclear whether the corresponding putative proteins possess all peroxidase domain features. Only one putative eukaryotic gene of nonfungal origin was found, in the genome of a slime mold. Thus, the distribution of the dyp-type peroxidase family is apparently not as universal as that demonstrated for the two heme peroxidase superfamilies presented above. In biotechnology, mainly the basidiomycete Dyp members became popular for their ability to degrade various synthetic dispersive dyes, but their physiological role, and in particular their physiological electron donors, remain obscure and were not addressed by the authors investigating them. It was apparent after their first purifications, even without sequence analysis, that they differ significantly in their substrate specificity from known Class II peroxidases, which are frequently present in the same club fungi (Sect. 2.3.2) [59] and may interfere in the screening of crude samples. In PeroxiBase, 106 sequences of the DyP peroxidase family are already registered and annotated (January 2010), and more are likely to follow from newly sequenced genomes. Possibly in this case, a heme peroxidase has evolved to its highest versatility: besides peroxidase activity with, e.g., anthraquinone derivatives serving as electron donors, dyp-type enzymes also seem to have hydrolase and oxygenase activities [7]. This resembles the evolution of di-heme peroxidase towards the MauG protein (Sect. 2.4), but for dyp-type peroxidases, this phenomenon needs more comprehensive analysis.
Nevertheless, whether a heme peroxidase has the intrinsic capacity to evolve towards an oxygenase is a striking aspect to consider in future directed evolution experiments. A dendrogram of a few members of this family has already been presented [60], in which peroxidases from other families were linked together with the Dyp family. This approach is rather problematic, as only a very limited number of dyp-type sequences was included, and the sequence similarity of dyp-type peroxidases to peroxidases from the other families described above is too low for a higher-level phylogenetic analysis. All annotated and complete dyp peroxidase sequences from PeroxiBase were collected for the calculation of the unrooted phylogenetic tree presented in Fig. 2.14. Four distinct subfamilies can be clearly defined. According to the clade distribution, it is highly probable that subfamilies A and B first diverged from their C and D counterparts; in later steps, A separated from B and C from D, and the genes then underwent speciation within each of the four subfamilies. There are already 26 solely bacterial subfamily A members. Genes from E. coli and Shigella sp. are particularly abundant orthologs in this subfamily clade, suggesting a potential role in pathogenicity. It has been proposed that the YcdB protein from E. coli, which belongs to subfamily A, can function as a periplasmic peroxidase [61]. Twenty-four bacterial members and one protozoan member make up subfamily B. Two bacterial representatives of this clade are already known at the structural level [58], thus supporting this phylogenetic overview. The only eukaryotic variant of this subfamily is at the moment a putative protein that needs to be further characterized. Twenty-two solely bacterial subfamily C members are distributed among proteobacteria, cyanobacteria, and actinobacteria.
The corresponding proteins need to be investigated for their substrate specificity, as there are only putative sequences within this subfamily. Currently, 27 solely fungal members form subfamily D. The Pleurotus ostreatus [62] and Thanatephorus cucumeris [60] peroxidases are typical basidiomycete examples of this subfamily, the latter also with a known 3D structure. Apparently, there are also several ascomycete representatives in this subfamily (Fig. 2.14). Their physiological function and substrate specificity with potential biotechnological applications remain completely unknown at the moment.
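
As a simple consistency check, the subfamily sizes quoted in this section can be tallied against the number of DyP sequences registered in PeroxiBase. The short Python sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Member counts per subfamily as quoted in the text
# (subfamily B: 24 bacterial members plus 1 protozoan member)
subfamily_members = {"A": 26, "B": 24 + 1, "C": 22, "D": 27}

# DyP sequences registered in PeroxiBase (January 2010), as quoted in the text
registered = 106

in_subfamilies = sum(subfamily_members.values())
print(in_subfamilies, registered - in_subfamilies)
```

The four subfamilies together account for 100 sequences, six fewer than the 106 registered entries. This would be consistent with the tree being calculated only from annotated and complete sequences, although the text does not state how the remaining entries were treated.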

Fig. 2.14

Reconstructed phylogeny of four types of dyp peroxidases. The unrooted tree obtained from the NJ method of the MEGA package [13] is presented. A very similar tree was obtained using ProML of the PHYLIP package [16]. Numbers in the nodes indicate bootstrap values for NJ and ML methods, respectively. Abbreviations of protein names correspond to PeroxiBase

6 Haloperoxidase Family

Haloperoxidases are abundant mainly among fungi, but a few very similar genes were recently also detected among oomycetes (water molds), which, although similar to fungi in several respects, do not belong to the monophyletic kingdom of fungi. As a newly defined class within stramenopiles, they are more closely related to plants than to animals, and it will be very interesting to screen for haloperoxidase genes in the genomes of other stramenopiles. Generally, we have to distinguish between haloperoxidases with a prosthetic heme group and nonheme haloperoxidases, which are phylogenetically unrelated to this family and can form two different gene families. Heme-thiolate haloperoxidases contain protoporphyrin IX as the prosthetic group [63] and can catalyze the oxidative transformation of halides and halophenols. In most databases, they are still described as “chloroperoxidases,” e.g., in IPR000028 or PF01328, which count only around 140 distinct protein sequences (January 2010). The most intensively investigated member is chloroperoxidase from the ascomycete Caldariomyces fumago (CCPO) [64]. The evolution of this small peroxidase family resulted, in analogy with other peroxidase families (e.g., Sects. 2.4 and 2.5), in considerable multifunctionality, as it possesses, besides dehaloperoxidase activity, also remarkable catalase [65] and peroxygenase activities [66]. According to recent opinions on enzyme evolution [34], this haloperoxidase multifunctionality can be understood as another example of ancient peroxidase promiscuity. Moreover, it was mentioned that CCPO (but very probably also other phylogenetic neighbors) is significantly more robust and functions in harsher conditions when compared with all other heme peroxidases [64]. No prokaryotic heme haloperoxidases are annotated so far, suggesting conversion from another gene type, not necessarily a highly specialized peroxidase.
The reconstructed tree of 63 haloperoxidase members is presented in Fig. 2.15 and demonstrates that heme haloperoxidases form a monophyletic group with frequent gene duplication events. There is a segregation of haloperoxidase representatives between ascomycete (sac) fungi and basidiomycete (club) fungi, with gene speciation within particular genomes. However, in some clades, the genes are mixed between these phyla, but this phenomenon can also be observed in other peroxidase families (e.g., Sect. 2.3.1.1). The oomycete (nonfungal) representative appears to have diverged from related ascomycete genes; however, a more comprehensive analysis with more closely related representatives is needed. Among ascomycetes, basidiomycetes, and oomycetes, putative heme haloperoxidase genes are abundant mainly in the genomes of phytopathogens. The best example is the presence of 13 haloperoxidase genes in the genome of the ascomycete Phaeosphaeria nodorum, a major pathogen of wheat causing leaf diseases and glume blotch. Such an unusually frequent occurrence of gene duplicates of several gene family paralogs within one genome raises interesting questions about their physiological role and possible involvement in host/pathogen interaction.

Fig. 2.15

Phylogeny of heme-containing haloperoxidases. The reconstructed tree from the ME-method of the MEGA package [13] is presented. A very similar tree was obtained with the ProML method of the PHYLIP package [16]. Numbers in the nodes indicate bootstrap values for ME and ML, respectively. Abbreviations of protein names correspond to PeroxiBase

7 Conclusions

The phylogenetic analysis of the structural and functional diversity of heme peroxidases can help in the search for new candidates for various biotechnological applications covering all important areas, from red through green to white biotechnologies. The phylogenetic output can also be a good starting point for the planning of directed molecular evolution experiments that continue and simulate the natural evolution of peroxidases in the laboratory. This methodology can even lead to de novo peroxidase design (inspired by natural evolution) with a desired structure and tailored reaction specificity.