Introduction

The mammalian complement system is composed of more than 30 components present mainly in serum and on cell membranes, and plays essential roles in innate immunity [1]. The proteolytic activation of the central component C3 (the third component of complement) by the C3 convertases is the pivotal step in complement activation, and most major physiological functions of the complement system are induced by the two activation fragments of C3, the larger C3b and the smaller C3a. Upon activation, C3b forms a covalent bond with the surface molecules of microbes using its intrachain thioester bond, and bound C3b functions as a molecular tag for foreign particles, enhancing phagocytosis by macrophages and neutrophils. C3b also forms a covalent bond with C3b or C4b (the activation fragment of the fourth component of complement, equivalent to C3b) of the C3 convertases (alternative pathway C3bBb and classical pathway C4bC2a), switching their specificity to that of the C5 convertases. Proteolytic activation of C5 (the fifth component of complement) initiates assembly of the late components, C6 (the sixth component of complement) to C9 (the ninth component of complement), leading to the formation of membrane attack complexes (MACs), which disrupt the integrity of the cell membranes of microbes. The smaller C3a fragment is also known as an anaphylatoxin and induces inflammation. Most key components of the human complement system, including C3, possess unique domain structures and are classified into five protein families [2]: the C3 family (C3, C4 and C5), the Bf (Factor B) family (Bf and C2), the MASP (mannan-binding protein-associated serine protease) family (MASP-1, MASP-2, MASP-3, C1r and C1s), the C6 family (C6, C7, C8A, C8B and C9), and the If (Factor I) family (If only). The origin and evolution of the complement system have been studied by identifying the complement genes possessing these unique domain structures in various eumetazoan species.

Evolution of the TEP Superfamily

The C3 family belongs to the TEP (thioester-bond containing protein) superfamily, whose members are characterized by the unique intrachain thioester bond. Seven members of the TEP superfamily are encoded in the human genome: C3, C4, C5, A2M (α2-macroglobulin), pregnancy zone protein (PZP) [3], CD109 [4] and complement 3 and PZP-like A2M domain-containing 8 (CPAMD8) [5]. Phylogenetic analysis of many TEP superfamily genes from various eumetazoa indicated the presence of two distinct families, C3 and A2M (Fig. 3.1), and this classification was supported by the presence of the ANA (anaphylatoxin) and C345C domains in all members of the C3 family, but never in the A2M family members [6]. The C3 family comprises human C3, C4, C5 and their orthologs in various eumetazoa, whereas the A2M family comprises human A2M, PZP, CD109, CPAMD8 and their orthologs. Accumulation of sequence information on TEPs from various eumetazoa indicated that the iTEP (insect TEP), once considered unique to insects or arthropods, is in fact orthologous to CD109 [6]. A2M, PZP and CPAMD8 are closely related to each other, and the A2M family is therefore further subdivided into the A2M subfamily and the CD109 subfamily. The basic domain structure of the TEP superfamily members is an eightfold repeat of the macroglobulin (MG) domain. In addition to these eight MG domains, a CUB domain (C1r, C1s, uEGF and bone morphogenetic protein) holding the TED (thioester domain) in the middle is inserted between the seventh and eighth MG domains [7]. Moreover, other specific domains are present in individual TEP members, such as the ANA and C345C (C-terminal of C3, C4 and C5) domains of the C3 family and the bait domain of A2M.

Fig. 3.1

Molecular phylogenetic analysis of TEP family genes by the maximum likelihood method. The evolutionary history was inferred by using the maximum likelihood method based on the Whelan and Goldman + Freq. model. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 40 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 873 positions in the final dataset. Bootstrap percentages from 500 replicates are given. The sequences used, with the scientific names of the animals and accession numbers, are: human (Homo sapiens) C3, C4A, C5, A2M, CD109, CPAMD8 and PZP (NP_000055, P0C0L4, AAA51925, P01023, NP_598000, NP_056507 and CAA38255), squid (Euprymna scolopes) C3 (ACF04700), sea urchin (Strongylocentrotus purpuratus) C3 and CPAMD8 (NP_999686 and XP_785018), sea cucumber (Apostichopus japonicus) C3 (ADN97000), sea squirt (Ciona intestinalis) C3 and CPAMD8 (NP_001027684 and XP_002124325), lamprey (Lethenteron japonicum) C3 and A2M (Q00685 and BAA02762), hagfish (Eptatretus burgeri) CD109 (BAD12264), horseshoe crab (Tachypleus tridentatus) C3 and A2M (BAH02276 and BAA19844), amphioxus (Branchiostoma floridae) C3 and CPAMD8 (XP_002248496 and XP_002239366), coral (Swiftia exserta) C3 (AAN86548), sea anemone (Haliplanella lineata) C3, A2M and CD109 (AB481383, AB481385 and AB481386), fruit fly (Drosophila melanogaster) CD109 (DrmeTEP1) (NP_523578), mosquito (Anopheles gambiae) CD109 (AngaTEP1) (AAG00600), clam (Hyriopsis cumingii) A2M (ABJ89824), clam (Venerupis decussatus) C3 (FJ392025), tick (Ornithodoros moubata) A2M (AAN10129), tick (Ixodes scapularis) CD109 (XP_002409560), snail (Euphaedusa tau) CD109 (BAE44110), spider (Hasarius adansoni) C3, A2M and CD109 (AB622468, AB622470 and AB622471), shrimp (Fenneropenaeus chinensis) A2M (ABP97431), crab (Eriocheir sinensis) A2M (ADD71943), honey bee (Apis mellifera) A2M (XP_392454) and CD109 (XP_001122599), flour beetle (Tribolium castaneum) A2M (EFA07508) and CD109 (XP_972838).
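The caption's statement that all positions containing gaps and missing data were eliminated corresponds to what is often called complete deletion of alignment columns. As a minimal sketch only, and not the software actually used for Fig. 3.1, the step could look like the following in Python. The function name `drop_gapped_columns`, the choice of `-` and `?` as gap and missing-data symbols, and the toy sequences are all assumptions made for illustration.

```python
def drop_gapped_columns(alignment: dict[str, str], missing: str = "-?") -> dict[str, str]:
    """Return the alignment with every column removed in which at least one
    sequence has a gap or missing-data symbol (complete deletion).

    `alignment` maps a sequence name to its aligned sequence. The symbols
    treated as gap/missing are an assumption and can be changed via `missing`.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Keep only the column indices where no sequence carries a gap symbol.
    keep = [i for i in range(length) if all(s[i] not in missing for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment (made-up sequences, not data from the figure).
toy = {"seqA": "MK-LV", "seqB": "MKALV", "seqC": "M?ALV"}
trimmed = drop_gapped_columns(toy)
```

In this toy case the second and third columns are dropped because one sequence has `?` or `-` there, so three positions remain for every sequence. The number of retained columns plays the role of the 873 positions reported in the caption.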

The evolutionary origin of the TEP superfamily remains to be clarified. However, no TEP gene is present in the published genome information for a sponge, Amphimedon queenslandica [8], or a choanoflagellate, Monosiga brevicollis [9], suggesting that this gene family arose in the eumetazoan lineage. The three types of TEP genes, C3, A2M and CD109, were identified in two cnidarian sea anemones, Haliplanella lineata [10] and Nematostella vectensis [11, 12], indicating that differentiation of the TEP genes into C3, A2M and CD109 had been completed before the divergence of cnidaria and bilateria. All deuterostome species analyzed thus far, such as various vertebrates, the urochordate sea squirt [13], the cephalochordate lancelet and the echinoderm sea urchin [2], possess both C3 and A2M family members, whereas many protostome genomes deciphered thus far, such as those of fly [14], mosquito [15], honeybee [16], parasitoid wasp [17], aphid [18], flour beetle [19] and Caenorhabditis elegans [20], contain only A2M family members. Among protostomes, only horseshoe crab [21, 22], spider [6], tick [23], clam [24] and squid [25] have been reported to possess the C3 family, indicating that loss of the C3 family has occurred multiple times during protostome evolution [6]. Loss of the C3 family also occurred at least once in cnidaria, since hydra has only the A2M family gene [26]. The reason why the C3 family genes were lost so many times during the evolution of cnidarians and protostomes remains to be clarified.
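The presence and absence pattern described in this paragraph can be written down as a small lookup table. The Python sketch below only restates what the text reports for cnidarians and protostomes. The names `C3_FAMILY_REPORTED` and `split_by_presence` are hypothetical, and the table deliberately holds no species tree, so it cannot by itself count independent loss events.

```python
# True = a C3 family gene has been reported; False = only A2M family genes
# were found. Entries follow the paragraph above; nothing else is implied.
C3_FAMILY_REPORTED = {
    "cnidaria": {
        "sea anemone": True,
        "hydra": False,
    },
    "protostomes": {
        "fly": False,
        "mosquito": False,
        "honeybee": False,
        "parasitoid wasp": False,
        "aphid": False,
        "flour beetle": False,
        "Caenorhabditis elegans": False,
        "horseshoe crab": True,
        "spider": True,
        "tick": True,
        "clam": True,
        "squid": True,
    },
}


def split_by_presence(group: str) -> tuple[list[str], list[str]]:
    """Return (taxa with a reported C3 family gene, taxa without) for a group."""
    taxa = C3_FAMILY_REPORTED[group]
    with_c3 = sorted(name for name, present in taxa.items() if present)
    without_c3 = sorted(name for name, present in taxa.items() if not present)
    return with_c3, without_c3


protostomes_with, protostomes_without = split_by_presence("protostomes")
```

Inferring how many times the C3 family was lost would require mapping these states onto an accepted phylogeny of the species concerned, which is outside what this sketch encodes.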

Evolution of the Bf, MASP and If Families

The Bf, MASP and If family members are serine proteases possessing a serine protease domain at their C-termini [27]. The human genome contains two Bf family genes, Bf and C2; four MASP genes, MASP-1, MASP-2, C1r and C1s; and only one If family gene. The gene duplication events that played an important role in establishing a modern complement system like the mammalian one will be discussed below. In the following, the invertebrate genes corresponding to the common ancestors of C3, C4 and C5, of Bf and C2, and of MASP-1, MASP-2, C1r and C1s are simply termed C3, Bf and MASP, respectively. Whereas Bf and MASP are involved in the complement activation pathways, If has a regulatory function, repressing complement activation by degrading activated C3 fragments in the presence of cofactor proteins. The domain architectures of these families based on the SMART database (http://smart.embl-heidelberg.de/) are as follows: for the Bf family, CCP (complement control protein) × 3, VWA (von Willebrand factor type A), Tryp_SPc (trypsin-like serine protease); for the MASP family, CUB (C1r, C1s, uEGF and bone morphogenetic protein), EGFCA (calcium-binding EGF-like), CUB, CCP × 2, Tryp_SPc; and for the If family, FIMAC (factor I membrane attack complex), SR (scavenger receptor Cys-rich), LDLa (low-density lipoprotein receptor domain class a) × 2, Tryp_SPc. The Bf family genes have been identified in most deuterostome species analyzed thus far [2], whereas only a few cnidarian [11] and protostome species, the latter belonging to arthropods [22] and mollusks [24], are reported to possess a Bf family gene. Indeed, the Bf family gene is missing from the genome sequences of the cnidarian hydra, C. elegans and several insect species. Although the only lophotrochozoan species reported to possess Bf thus far is a clam, the clam Bf lacks the catalytic Ser residue, suggesting that it does not activate C3 in spite of the conservation of the basic domain structure of Bf [24].
It is therefore suggested that, although the common ancestor of cnidaria and bilateria had the Bf family gene, it was lost multiple times in the cnidarian and protostome lineages. Moreover, a Bf-independent complement activation pathway has been well characterized in horseshoe crab, an arthropod [21], although it is suggested that horseshoe crab also has a Bf-dependent activation pathway [28]. At present, information on the Bf gene in protostomes is very limited compared to that on the C3 gene, and it is of interest to find out whether most protostome species possessing the C3 gene have also retained the Bf gene. In contrast to the basic conservation of the domain structure throughout evolution, the primary structure of the serine protease domain of Bf shows a curious evolutionary variability. Compared to other serine proteases, the serine protease domain of mammalian Bf and C2 has a number of structurally unique features, in particular that the negative charge at the bottom of the S1 pocket is provided by Asp226 instead of the usual Asp189 [29]. All jawed vertebrate Bf and C2 analyzed so far have this unique structure, whereas Bf from lamprey [30], ascidian [31], lancelet [32], sea urchin [33], horseshoe crab [22] and sea anemone [11] has Asp189, not Asp226, like all other serine proteases with trypsin-type specificity. Thus, the structural specialization of the serine protease domain of Bf/C2 seems to have occurred in the common ancestor of jawed vertebrates, simultaneously with the appearance of the adaptive immune system. The functional consequences, if any, of this structural specialization remain to be clarified.

The MASP family genes have been reported from cnidarians and deuterostomes, except for echinoderms [2]. No MASP family gene has been identified in the deciphered genome sequences of protostomes. However, no comprehensive search for the MASP family gene has been performed in the protostome species known to possess the C3 gene, and it is still an open question whether the MASP family gene is present in some protostome species. The serine protease domains of the human MASP family members are classified into two groups: the usual one, termed TCN-type, in which the active site serine is encoded by a TCN codon, and a unique one, termed AGY-type, which is characterized by an AGY codon for the active site serine, the absence of the disulfide bond termed the histidine loop, and the absence of introns in the genomic region coding for the serine protease domain [34]. Human MASP-1 belongs to the former group, and human MASP-2, C1r and C1s belong to the latter group. The human MASP-1/-3 gene, which encodes both MASP-1 and MASP-3 by differential usage of its dual serine protease regions, has a usual serine protease region at its 3′ end and a second, intron-less serine protease region just upstream of the usual one [35]. Therefore, the AGY-type MASP is considered to have been generated by insertion of the intron-less serine protease region into the TCN-type gene [36]. Whereas only the TCN-type MASP gene has been identified in sea anemone [11] and sea squirt [37], the AGY-type MASP is present in lancelet [38] and vertebrates. Thus, the insertion of the intron-less serine protease-encoding region is considered to have occurred before the divergence of cephalochordates and vertebrates, although the entire evolutionary story is still to be clarified.
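The codon-based part of the TCN/AGY distinction can be illustrated with a few lines of Python. In standard nucleotide notation N stands for any base and Y for a pyrimidine (C or T), so TCN covers TCA, TCC, TCG and TCT, and AGY covers AGC and AGT. The helper name `classify_serine_codon` is hypothetical, and the sketch checks only the codon. The other AGY-type features named above (absence of the histidine-loop disulfide bond and of introns) are not examined.

```python
# Serine codons grouped by the two classes used in the text.
TCN_CODONS = {"TCA", "TCC", "TCG", "TCT"}  # N = any nucleotide
AGY_CODONS = {"AGC", "AGT"}                # Y = pyrimidine (C or T)


def classify_serine_codon(codon: str) -> str:
    """Return "TCN" or "AGY" for a codon encoding serine.

    Accepts DNA or RNA spelling in either case (U is treated as T).
    Raises ValueError if the codon does not encode serine.
    """
    normalized = codon.upper().replace("U", "T")
    if normalized in TCN_CODONS:
        return "TCN"
    if normalized in AGY_CODONS:
        return "AGY"
    raise ValueError(f"{codon!r} is not a serine codon")


example_tcn = classify_serine_codon("tcu")
example_agy = classify_serine_codon("AGC")
```

Because moving between the TCN and AGY codon classes requires more than one nucleotide change, the codon type serves as a convenient marker for grouping serine protease domains.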

The If family gene has been identified in all major groups of vertebrates: cyclostomes [39], cartilaginous fish [40], teleosts [41], amphibians [42], reptiles [43], birds [44] and mammals. No If family gene has been reported from protostomes or invertebrate deuterostomes, indicating that the If gene was established in the common ancestor of vertebrates. Thus, the If-dependent regulatory mechanism of complement activation seems to be a vertebrate innovation. Since unrestricted activation of the complement system could be harmful to host cells and could rapidly deplete the complement components, some regulatory mechanism seems to be essential even for the invertebrate complement system.

Evolution of Terminal Complement Component Genes

The domain structures of the five human terminal complement component (TCC) genes defined at the SMART site (http://smart.embl-heidelberg.de/) are as follows: C6, TSP1 (thrombospondin type 1 repeat)-TSP1-LDLa-MACPF (membrane attack complex/perforin)-TSP1-CCP (complement control protein)-CCP-FIMAC-FIMAC; C7, TSP1-LDLa-MACPF-TSP1-CCP-CCP-FIMAC-FIMAC; C8A, TSP1-LDLa-MACPF-TSP1; C8B, TSP1-LDLa-MACPF-TSP1; C9, TSP1-LDLa-MACPF. Thus, the TSP1-LDLa-MACPF domain structure is conserved in all five TCC genes, suggesting that they originated from a common ancestor by gene duplication and subsequent modification of the domain structure. However, it is still not clear whether the common ancestor of the human TCC genes had a simple domain structure like C9 and new genes were generated by adding extra domains, or whether it had a complex domain structure like C6 and new genes were generated by losing some domains [45]. Genes possessing exactly the same or very similar domain structures as the human TCC genes have been identified in all classes of extant jawed vertebrates, including cartilaginous fish [46, 47] and teleosts [48]. In contrast, no such gene has been identified in lamprey. Previous functional and biochemical analyses of lamprey serum identified an opsonic complement activity but no cytolytic complement activity. A natural hemolytic activity is indeed present in lamprey serum, but the only molecule responsible for it seems to be a 25 kDa protein without any connection to the complement system [49]. All these results indicate that the cytolytic activity of the complement system was established in the common ancestor of jawed vertebrates. Many C9-like genes have been identified in the genomes of the urochordate Ciona intestinalis [13] and the cephalochordate Branchiostoma floridae [50]. Their typical domain structure is TSP1-LDLa-MACPF, indicating a close evolutionary relationship with the human TCC genes.
It is highly probable that they are involved in some cytolytic process. However, it is unlikely that they are activated by the complement system of these animals. Activation of the human TCCs is initiated by the binding of C6 to activated C5, C5b. The interaction between the C345C domain of C5 and the two FIMAC domains of C6 has been demonstrated to play an essential role in this binding by biochemical [51] and 3D structure [52] analyses. Since none of the C9-like molecules of urochordates and cephalochordates possesses the FIMAC domain, they seem to function as cytolytic factors independently of the complement system of these animals, which is reported to have opsonic and proinflammatory functions. The distribution of the complement genes in various animal groups is summarized in Table 3.1.
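The domain-based reasoning about the terminal components can be made concrete with a short Python sketch. It encodes the five human TCC domain orders as listed in the text, checks that each contains the contiguous TSP1-LDLa-MACPF core, and checks which architectures carry a FIMAC domain. The names `TCC_DOMAINS`, `C9_LIKE_TYPICAL`, `contains_run` and `has_fimac` are hypothetical, and the C9-like entry represents only the typical architecture mentioned for the urochordate and cephalochordate genes, not any specific gene.

```python
# Domain orders (N- to C-terminus) of the human TCC genes as given in the text.
TCC_DOMAINS = {
    "C6": ["TSP1", "TSP1", "LDLa", "MACPF", "TSP1", "CCP", "CCP", "FIMAC", "FIMAC"],
    "C7": ["TSP1", "LDLa", "MACPF", "TSP1", "CCP", "CCP", "FIMAC", "FIMAC"],
    "C8A": ["TSP1", "LDLa", "MACPF", "TSP1"],
    "C8B": ["TSP1", "LDLa", "MACPF", "TSP1"],
    "C9": ["TSP1", "LDLa", "MACPF"],
}

# Typical architecture reported for invertebrate chordate C9-like genes.
C9_LIKE_TYPICAL = ["TSP1", "LDLa", "MACPF"]

CORE = ["TSP1", "LDLa", "MACPF"]


def contains_run(domains: list[str], run: list[str]) -> bool:
    """True if `run` occurs as a contiguous stretch within `domains`."""
    n = len(run)
    return any(domains[i:i + n] == run for i in range(len(domains) - n + 1))


def has_fimac(domains: list[str]) -> bool:
    """True if the architecture includes at least one FIMAC domain."""
    return "FIMAC" in domains


all_share_core = all(contains_run(d, CORE) for d in TCC_DOMAINS.values())
fimac_carriers = sorted(name for name, d in TCC_DOMAINS.items() if has_fimac(d))
c9_like_could_bind_c5b_like_c6 = has_fimac(C9_LIKE_TYPICAL)
```

Under these definitions every human TCC contains the core, only C6 and C7 carry FIMAC domains, and the typical C9-like architecture lacks them, which mirrors the argument that the C9-like molecules are unlikely to be recruited through a C6-type interaction with C5b.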

Table 3.1 Distribution of the complement component genes in various animal groups

Evolutionary Scenario of the Complement System

The tentative evolutionary process of the complement system is schematically shown in Fig. 3.2. At present, the traceable origin of the complement system is the primitive complement system composed of C3, Bf and MASP in the common ancestor of eumetazoa. Conservation between cnidaria and humans of the amino acid sequences critical for the basic functions of these components suggests that the basic activation mechanism and physiological functions of the primitive complement system were not very different from those of the human complement system. Thus, MASP appears to be the first protease to be activated, and it in turn activates Bf. Bf then cleaves C3 into the C3a and C3b fragments. C3b covalently tags microbes and enhances phagocytosis, whereas C3a induces inflammation as an anaphylatoxin.

Fig. 3.2

Evolution of the complement system. The bold horizontal arrow represents evolution of the complement system. Light gray indicates a primitive complement system composed of C3, Bf and MASP. Medium gray indicates an intermediate complement system with the If-dependent regulation mechanism. Dark gray indicates a modern complement system, found in jawed vertebrates, possessing the lytic and classical pathways. Vertical arrows indicate recruitment or innovation of additional components, whereas circular arrows indicate gene duplication events.

In deuterostomes, this basic complement system is retained by all groups analyzed thus far: echinoderms, cephalochordates, urochordates and vertebrates. The marked development of the complement system seems to have occurred in the common ancestor of jawed vertebrates. Gene duplication and subsequent structural and functional specialization seem to have played an important role in this process. First, C4, C2 and C1r/C1s, most probably generated by gene duplication from C3, Bf and MASP, respectively, contributed to establishing the classical pathway. The canonical adaptive immunity based on IgSF receptors and MHC is also believed to have been established in the common ancestor of jawed vertebrates, and the establishment of the classical pathway was epoch-making in that it connected the complement system with adaptive immunity. Second, the gene duplication generating C5 from C3 on the one hand, and the recruitment of the FIMAC domain into the C9-like genes on the other, connected the preexisting complement system with the non-complement cytolytic system.

In contrast to deuterostomes, which have basically retained the complement system composed of C3, Bf and MASP, protostome lineages lost the complement system multiple times, and several protostome genome sequences deciphered thus far do not contain any complement genes. Although the reason why the complement system was lost so many times in protostomes is still not clear, in some insects CD109, a paralogue of C3, is reported to have multiplied and to play an opsonic role like C3 [53]. It is therefore possible that some protostomes, such as insects, developed unique immune mechanisms that made it unnecessary to retain the complement system. Even in horseshoe crab, which has retained the complement system, unique specialization is reported compared to the deuterostome complement system. Specifically, factor C, originally identified as an LPS-sensitive initiator of hemolymph coagulation stored within hemocytes, is used as an initiating protease of the horseshoe crab complement system [21]. In this activation pathway, factor C directly activates C3 without the intervention of Bf. Factor C first evolved in the horseshoe crab lineage, and this alteration in the initiating mechanism of the complement system therefore occurred recently on an evolutionary time scale. As far as the complement system is concerned, the innovations of protostomes thus seem to be more revolutionary than those of deuterostomes.