Amino Acid Patterns Within Short Consensus Repeats Define Conserved Duplicons Shared by Genes of the RCA Complex

McLure, Craig A.; Dawkins, Roger L.; Williamson, Joseph F.; Davies, Richard A.; Berry, Jemma; Longman-Jacobsen, Natalie; Laird, Rebecca; Gaudieri, Silvana

doi:10.1007/s00239-004-2609-8


Published: August 2004

Volume 59, pages 143–157 (2004)

Journal of Molecular Evolution


Abstract

Complement control proteins (CCPs) contain repeated protein domains, short consensus repeats (SCRs), which must be relevant to diverse functions such as complement activation, coagulation, viral binding, fetal implantation, and self–nonself recognition. Although SCRs share some discontinuous and imperfect motifs, there are many variable positions and indels, making classification into subfamilies extremely difficult. Using domain-by-domain phylogenetic analysis, we have found that most domains can be classified into only 11 subfamilies, designated a, b, c, d, e, f, g, h, i, j, or k and identified by critical residues. Each particular CCP is characterized by the order of representatives of the subfamilies. Human complement receptor 1 (CR1) has ajefbkd repeated four times, followed by ch. The classification crosses CCPs and indicates that a particular CCP is a function of its mix of SCRs. The aje set is a feature of several CCPs, including human CR1 and DAF and murine Crry, and appears to be associated with the success or failure of implantation, inter alia. This approach facilitates genomic analysis of available sequences and suggests a framework for the evolution of CCPs. Units of duplication range from single SCRs, to septamers such as efbkdaj, to extensive segments such as MCP-CR1L. Imperfections of duplication with subsequent deletion have contributed to diversification.


Introduction

The complement control protein (CCP) family has been conserved from invertebrates onward, providing the regulation required by complement and other cascades and enabling self–nonself discrimination. The CCPs have evolved in parallel with the complement cascade and can be regarded as the major agents of its regulation. During vertebrate evolution the family has expanded such that there are numerous copies clustered throughout the vertebrate, and indeed the human, genome. Functions have apparently diverged and now include acting as viral receptors, regulating adaptive immunity, and maintaining fetomaternal tolerance (Bell 2000; Xu et al. 2000).

The CCP family is defined by the presence of reiterated protein domains known as short consensus repeats (SCRs). An SCR typically comprises 56–70 amino acids, including four cysteines and a tryptophan which are thought to be essential in holding the domain in its rigid triple-loop structure (Schwarzenbacher et al. 1999) which permits the formation of complex proteins and multiple binding affinities. Each CCP has a specific number of SCR units at the genomic level, e.g., human (Homo sapiens) CR1 has 30 and Homo sapiens decay accelerating factor (DAF) has four SCRs. Numerous SCR-containing proteins are encoded within the genomic region known as the regulators of complement activation (RCA) complex on 1q32.

It remains to be determined how such domains could be responsible for such diverse functions, how each CCP achieves a remarkable degree of ligand specificity, and how self–nonself discrimination is mediated. Indeed, it is still unclear how both the similarities and the differences in function can be explained when component SCRs are compared.

Structure–function studies have used several different approaches, but their results can be difficult to interpret. For example, the X-ray structure of two SCRs of CR2 in complex with C3d highlights some of the problems, including sequence similarities between SCRs and the resulting cross-reactivity of antibodies and inhibitory peptides (Szakonyi et al. 2001).

Valuable insights have been provided through the use of in situ mutagenesis. For example, the Atkinson group (Krych et al. 1998; Krych-Goldberg et al. 1999) has shown that some residues are important in the binding of C4b to CR1. Interestingly, some of the effects of mutagenesis are unexpected in that the critical mutation is remote from the putative binding site. Furthermore, a major conclusion is that combinations of SCRs are more important than single domains (Kirkitadze et al. 1999), suggesting that the evolutionary unit is a specific set of SCRs. There is a need for alternative approaches to define the sequence characteristics of these sets. Here we illustrate the value of phylogenetic analysis.

From an evolutionary perspective, the CCP family is a very useful model. Undoubtedly, duplication has been the driver of expansion and diversification but the extent of the region duplicated must have ranged from a single domain or less through multiple contiguous domains as fixed sets to genomic segments containing large genes. All of these processes are suggested from a comparison of cDNA sequences and apparent to various degrees from genomic analysis of the data available for the RCA complex (Heine-Suner et al. 1997; Rodriguez de Cordoba et al. 1999; http://www.ncbi.nlm.nih.gov).

With the dual aims of elucidating structure–function relationships and defining mechanisms of genomic duplication, we have analyzed genes within the RCA complex and compared the results with those obtained after earlier analyses of duplication within the major histocompatibility complex (MHC) (Dawkins et al. 1999).

We show here that SCRs can be classified into a limited number of groups and the arrangement of these groups relates to function. By analogy, each SCR is a letter, the order of the letters defines sets (or words), and the number and content of the words create CCPs (or sentences).

We also show that, as in the MHC, it is the imperfection of the duplication processes that contributes to diversity.

Methods

Primate and Rodent RCA Sequences

The relationships between individual SCRs were investigated using the following amino acid and nucleotide sequences: Homo sapiens CR1 (P17927 and Y00816), Homo sapiens CR1L (positions 1278188–1355188 on gi:22041066), Homo sapiens complement receptor 2 (CR2) (PL0009), Homo sapiens DAF (P08174), Homo sapiens membrane cofactor protein (MCP) (P15529), Homo sapiens MCPL (positions 1263188–1273188 on gi:22041066), Homo sapiens C2 and factor B (Bf) (AF019413), chimpanzee (Pan troglodytes) CR1 (AAA51438), Pan troglodytes DAF (AAF73177), baboon (Papio hamadryas) CR1 (AAA62170), Papio hamadryas DAF (AAF73178), Papio hamadryas MCP (AAB66815), house mouse (Mus musculus) CR1 (AAA78271), Mus musculus complement regulatory protein (Crry) (AAA37468), Mus musculus DAF (AAB00092), Mus musculus MCP (BAA31859), Mus musculus CR2 (A43526), and Norway rat (Rattus norvegicus) Crry (AAA91821).

Strategy

Domain-by-Domain Analysis

Initially, we identified SCRs contained within RCA proteins and other CCPs using Pfam (http://pfam.wustl.edu/) and database annotation. Alternative definitions were also used, but in essence the defining motif was simply C..C..C..W..C (Hourcade et al. 1989; Reid et al. 1986).
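As a rough illustration of the C..C..C..W..C definition, the Python sketch below scans a protein string for candidate SCR motifs. It assumes that each ".." stands for a variable-length run of residues; the upper gap bound of 30 residues (`MAX_GAP`) and the function names are hypothetical choices, not values from this study. A regular expression of this kind only flags candidates; the domain boundaries used here came from Pfam and database annotation.

```python
import re

# Hypothetical gap bound: the motif is written C..C..C..W..C without gap
# lengths, so MAX_GAP is an illustrative assumption, not a published value.
MAX_GAP = 30


def scr_motif_pattern(max_gap: int = MAX_GAP):
    """Compile C..C..C..W..C, treating each '..' as 1..max_gap residues."""
    gap = f"[A-Z]{{1,{max_gap}}}?"  # lazy, so the shortest spacing is tried first
    return re.compile(f"C{gap}C{gap}C{gap}W{gap}C")


def find_scr_candidates(protein: str, max_gap: int = MAX_GAP):
    """Return (start, end, matched_text) for non-overlapping motif hits."""
    pattern = scr_motif_pattern(max_gap)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "AACGGGCGGGCGGGWGGGCAA"  # toy string, not a real SCR
    print(find_scr_candidates(toy))  # [(2, 19, 'CGGGCGGGCGGGWGGGC')]
```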

Next, all domains were extracted and annotated before being added to a SCR database. The individual sequences could be analyzed without regard to their presence within a particular CCP. The intention was to compare and classify SCRs independently without bias.

Third, various phylogenetic analyses were used to establish relationships between individual SCR sequences. Alignments were created with ClustalW v1.8 and adjusted according to the purpose of the alignment. For example, the clusters were found to differ by indels as well as residues, and it was sometimes necessary to realign on the basis of the C..C..C..W..C motif. Given the divergence between and within duplicons of CCPs, evolutionary distances were estimated using the gamma distance model, which accounts for multiple amino acid substitutions at the same site and for variation of substitution rate among sites. A gamma shape parameter of 0.93 was used, based on previous studies of SCRs (Krushkal et al. 2000). Phylogenetic trees were constructed by neighbor-joining and parsimony methods. Each tree was evaluated with 1000 bootstrap replicates and left unrooted (Molecular Evolutionary Genetics Analysis version 2.1; Pennsylvania State University 1993).
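The distance step can be sketched as follows. This assumes the gamma-corrected Poisson form of the amino acid distance, d = a[(1 − p)^(−1/a) − 1], where p is the proportion of differing sites and a is the gamma shape parameter (0.93 here); the text does not state which gamma variant MEGA applied, so treat this as an assumption. Gapped columns are skipped pairwise, and `bootstrap_alignment` shows how one bootstrap replicate resamples alignment columns with replacement. Tree building itself (neighbor-joining or parsimony) is left to a dedicated package such as MEGA.

```python
import random
from typing import List


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing residues, ignoring gapped columns (pairwise deletion)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def gamma_distance(seq1: str, seq2: str, alpha: float = 0.93) -> float:
    """Gamma-corrected Poisson distance: d = alpha * ((1 - p) ** (-1 / alpha) - 1)."""
    p = p_distance(seq1, seq2)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


if __name__ == "__main__":
    aln = ["CKTPEWG", "CRSPEWG", "CKTPDFG"]  # toy aligned fragments
    print(gamma_distance(aln[0], aln[1]))
    print(bootstrap_alignment(aln, random.Random(1)))
```

Repeating the resampling 1000 times and rebuilding a tree from each replicate gives the bootstrap support values described above.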

Peptide Searches

Using the amino acid alignment shown in Fig. 1, selected amino acid motifs from the group consensus sequences were searched against the databases SwissProt, Prosite, and GenBank/DDBJ/EMBL using BLAST (http://www.ncbi.nlm.nih.gov/Blast) and Motif 2 (http://motif.genome.ad.jp/MOTIF2.html).

Genomic Analysis of the MCP and CR1 Duplicons

The genomic sequence containing the genes CR1, MCP, CR1L, and MCPL at 1q32 was taken from the NCBI database (http://www.ncbi.nlm.nih.gov/) (positions 1103188–1453188 on contig NT_021877.12 [gi:22041066]; accession numbers AL691452.7, AL137789.11, AL365178.8, and AL035209.1) and compared against itself using the dot matrix program, Dotter (Sonnhammer and Durbin 1995). SCR and septamer locations were identified by aligning the CR1 mRNA (Y00816) with the genomic sequence of CR1 and CR1L using Blast 2 (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html).
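Dotter scores sliding windows rather than exact words, so the sketch below is a simplified stand-in, not Dotter's algorithm: it records a dot wherever two exact k-mers match and then counts dots per diagonal offset. In a self-comparison, a duplicated segment appears as a run of dots on an off-diagonal line whose offset equals the spacing between the copies. The word size `k` and the function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def kmer_dots(seq_a: str, seq_b: str, k: int = 11) -> Set[Tuple[int, int]]:
    """Dots (i, j) where seq_a[i:i+k] equals seq_b[j:j+k] exactly."""
    index: Dict[str, List[int]] = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    return {
        (i, j)
        for i in range(len(seq_a) - k + 1)
        for j in index.get(seq_a[i:i + k], [])
    }


def self_dots(seq: str, k: int = 11) -> Set[Tuple[int, int]]:
    """Self-comparison: keep the upper triangle and drop the trivial main diagonal."""
    return {(i, j) for i, j in kmer_dots(seq, seq, k) if i < j}


def diagonal_support(dots: Set[Tuple[int, int]]) -> Dict[int, int]:
    """Count dots per offset j - i; a long run on one offset suggests a duplication."""
    counts: Dict[int, int] = defaultdict(int)
    for i, j in dots:
        counts[j - i] += 1
    return dict(counts)


if __name__ == "__main__":
    toy = "GATTACA" + "CC" + "GATTACA"  # toy tandem duplicate, not real sequence
    print(diagonal_support(self_dots(toy, k=4)))  # {9: 4}
```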

Genomic Analysis of the CR1 Revised Sequence

The genomic sequence containing the gene CR1 at 1q32 has recently been revised. The new sequence was taken from the NCBI database (http://www.ncbi.nlm.nih.gov/) (positions 1129691–1479691 on contig NT_021877.15 [gi:29789880]; accession numbers AL691452.10 and AL137789.11) and compared against the older CR1 genomic sequence (positions 1128988–1248188 on contig NT_021877.12 [gi:22041066]; accession numbers AL691452.7 and AL137789.11) using the dot matrix program, Dotter (Sonnhammer and Durbin 1995). SCR and septamer locations were identified by aligning the CR1 mRNA (Y00816) with the genomic sequences of CR1 using Blast 2 (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html).

Synonymous and Nonsynonymous Rates Within CCPs

Based on the amino acid alignment described above, a sliding window analysis was performed on the nucleotide sequences of CR1 SCRs to determine synonymous and nonsynonymous rates using the program WINA v0.3 (Endo et al. 1996). Results were plotted using Microsoft Excel 98.
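WINA's internal method is not described here, so the following Python sketch substitutes a common textbook approach rather than reimplementing WINA: Nei–Gojobori counting of synonymous and nonsynonymous sites and differences within each window of codons, followed by a Jukes–Cantor correction. It assumes two aligned, gap-free coding sequences without stop codons. Stop-codon mutants are counted as nonsynonymous when sites are tallied, and mutational pathways through stop codons are skipped; implementations differ on both conventions. The window and step sizes are hypothetical defaults.

```python
import itertools
import math
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon: str) -> float:
    """Nei-Gojobori synonymous sites: synonymous single-base changes / 3."""
    syn = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                syn += CODE[mutant] == CODE[codon]  # stop mutants count as nonsynonymous
    return syn / 3.0


def codon_differences(c1: str, c2: str) -> Tuple[float, float]:
    """(synonymous, nonsynonymous) differences averaged over mutational pathways."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    if not diff:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    paths = 0
    for order in itertools.permutations(diff):
        current, syn, non = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # skip pathways that pass through a stop codon
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # runs only when the pathway was not abandoned
            syn_total += syn
            non_total += non
            paths += 1
    if paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / paths, non_total / paths


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences for multiple hits; inf when saturated."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log1p(-4.0 * p / 3.0)


def sliding_dn_ds(seq1: str, seq2: str, window: int = 10,
                  step: int = 1) -> List[Tuple[int, float, float]]:
    """Return (first codon index, dN, dS) for each window of `window` codons."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("expected aligned coding sequences of equal length")
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    for codon in codons1 + codons2:
        if CODE.get(codon, "*") == "*":
            raise ValueError(f"gap, ambiguity or stop codon: {codon}")
    results = []
    for start in range(0, len(codons1) - window + 1, step):
        s_sites = n_sites = s_diff = n_diff = 0.0
        for a, b in zip(codons1[start:start + window], codons2[start:start + window]):
            s = (synonymous_sites(a) + synonymous_sites(b)) / 2.0
            s_sites += s
            n_sites += 3.0 - s
            sd, nd = codon_differences(a, b)
            s_diff += sd
            n_diff += nd
        d_n = jukes_cantor(n_diff / n_sites) if n_sites else math.nan
        d_s = jukes_cantor(s_diff / s_sites) if s_sites else math.nan
        results.append((start, d_n, d_s))
    return results
```

Plotting `dN` and `dS` against the window start gives the kind of profile summarized in Fig. 6; windows whose differences are saturated return infinity rather than a finite estimate.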

Results

Phylogenetic Analysis Suggests a Universal Classification of SCRs into Subfamilies Shared by Many CCPs

To classify the SCRs and define relationships within and between CCPs, we undertook several phylogenetic analyses of individual domains. All revealed unexpected but remarkable similarities between some SCRs. As an approach to classification, we selected the SCRs from the largest CCP. Figure 2 shows that the 30 SCRs of Homo sapiens CR1 yield seven distinct clusters, each containing four SCRs. Each cluster contains every seventh SCR, reflecting the four septamers (otherwise known as long homologous repeats; LHR) which account for the first 28 SCRs. The seven clusters were designated a (1, 8, 15, 22), b (5, 12, 19, 26), d (7, 14, 21, 28), e (3, 10, 17, 24), f (4, 11, 18, 25), j (2, 9, 16, 23), and k (6, 13, 20, 27) so as to reflect phylogenetic proximity. Although the relationships between clusters are only provisional, owing to the long branch lengths, and require further analysis, the intention was to develop a series of hypotheses to describe the likelihood of common origins.
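Because the four septamers repeat the same order, the subfamily of a CR1 SCR follows from its position alone. The sketch below expands the compact notation (writing the subscript ₄ as a plain 4, which is our assumption) and lists the positions of each subfamily; it reproduces the seven clusters given above and places SCRs 29 and 30 in c and h. The helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# '(...)N' with lowercase subfamily letters inside; N is a plain-digit repeat count.
REPEAT = re.compile(r"\(([a-z]+)\)(\d+)")


def expand(notation: str) -> str:
    """Expand e.g. '(ajefbkd)4ch' into one letter per SCR, innermost groups first."""
    while True:
        expanded = REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), notation)
        if expanded == notation:
            return expanded
        notation = expanded


def clusters(notation: str) -> Dict[str, List[int]]:
    """Map each subfamily letter to its 1-based SCR positions."""
    members: Dict[str, List[int]] = defaultdict(list)
    for position, letter in enumerate(expand(notation), start=1):
        members[letter].append(position)
    return dict(members)


if __name__ == "__main__":
    for letter, positions in clusters("(ajefbkd)4ch").items():
        print(letter, positions)  # first line: a [1, 8, 15, 22]
```

For example, `expand("(ajefbkd)4ch")[7:10]` is `"aje"`, the set occupying SCRs 8–10, and `expand("a(jefbk)2d")` spells out the CR1L structure as twelve letters.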

To define further clusters, we then pooled the SCRs of CR1 and DAF; the SCRs fell within the a, j, and e clusters, such that CR1 and DAF can be written (ajefbkd)₄xx and aaje, respectively (where x denotes an SCR not yet classified), indicating a common set of aje (Table 1).

Table 1 Conservation of the complement regulatory SCR set aje across vertebrates


When SCRs of CR2 are included, clusters c, g, h, and i appear, giving a total of 11 subfamilies. Interestingly, the two unclassified (x) terminal SCRs of CR1 (29 and 30) cluster with corresponding SCRs of CR2, i.e., c contains CR1 SCR 29 and CR2 SCR 15, and h contains CR1 SCR 30 and CR2 SCR 16; CR1 and CR2 therefore share a set consisting of ch. Thus, as illustrated in Table 1, Homo sapiens CR1 can be written (ajefbkd)₄ch. CR2 is jfbkdgaidgbidfch, which can be rewritten jf(a/bi/kdf/g)₃ch, where a/b, i/k, and f/g denote alternatives. This yields three tetramers, each consisting of a or b, i or k, d, and f or g, preceded by jf and followed, as in the case of CR1, by ch. Alternatively, CR2 could be written j(f/ga/bi/kd)₃fch, where the tetramer is similar in content although different in order.
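The slash notation behaves like a character class, so the two descriptions of CR2 can be checked mechanically. The sketch below translates a notation string into a Python regular expression under our own hypothetical parsing rules ("x/y" is one SCR that is either x or y; "(...)N" repeats the unit N times, with the subscript written as a plain digit) and confirms that both rewritings match jfbkdgaidgbidfch.

```python
import re


def notation_to_regex(notation: str):
    """Translate e.g. 'jf(a/bi/kdf/g)3ch' into a compiled regular expression.

    Hypothetical rules: 'x/y' is one SCR that is x or y; '(...)N' repeats the
    bracketed unit N times (a missing N is treated as 1).
    """
    out = []
    i = 0
    while i < len(notation):
        ch = notation[i]
        if ch.isalpha() and i + 2 < len(notation) and notation[i + 1] == "/":
            out.append(f"[{ch}{notation[i + 2]}]")  # alternative at one position
            i += 3
        elif ch == "(":
            out.append("(?:")
            i += 1
        elif ch == ")":
            j = i + 1
            while j < len(notation) and notation[j].isdigit():
                j += 1
            count = notation[i + 1:j] or "1"
            out.append(f"){{{count}}}")
            i = j
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out))


if __name__ == "__main__":
    cr2 = "jfbkdgaidgbidfch"
    for form in ("jf(a/bi/kdf/g)3ch", "j(f/ga/bi/kd)3fch"):
        regex = notation_to_regex(form)
        print(form, regex.pattern, bool(regex.fullmatch(cr2)))
```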

As shown in Figs. 1 and 2, this analysis suggests that SCR subfamilies have diverged to greater or lesser degrees since the original sets were duplicated. For example, the original tetramer of CR2 may have contained the progenitors of a + b, i + j + k, d + e, and f + g, which is now present in CR1 as ajef and in CR2 as a/bi/kdf/g. Within CR2, we deduce that the d subfamily has been conserved relative to the other, more divergent components of the tetramer. Interestingly, the terminal ch set has been retained by both CR1 and CR2.

SCR Subfamilies and Their Sets Are Shared by Different RCA Proteins

With rare exceptions, all SCRs in our database could be classified into these 11 subfamilies (a to k) or their putative progenitors, and all CCPs examined could be described according to the schema used for CR1 and CR2 (Table 1). For example, MCP is aj/kdx, suggesting a common origin with CR1 and CR2, followed by divergence or shuffling but relative retention of the proximal aj set. CR1L is particularly interesting. When the available genomic sequence is examined (Fig. 3), we identify extra SCRs and a structure of a(jefbk)₂d, indicating a close but imperfect relationship with CR1, viz. (ajefbkd)₄ch. Other sequences may well reveal further heterogeneity, but on the basis of the current data it appears either that the internal jefbk set was duplicated as a pentameric segment or that deletions occurred after duplication of the ajefbkd septamer. Divergence is apparently ongoing since, for example, the introns between j1 and j2 differ in length and, especially, in the content of insertions indicated by gaps (Figs. 4A and 4B). We therefore asked whether there were other ways of defining the subfamilies.

Substitutions and Indels Define SCR Subfamilies

Once the major clusters were identified by phylogenetic analysis, it became obvious that some residues are characteristic of, or even restricted to, a single cluster. Many of these residues had previously been regarded as alternatives within all SCRs, but we favor the interpretation that some sites define specific subfamilies. Further evidence for this interpretation is shown in Table 2, where it can be seen that substitutions correlate with coding indels and therefore with the number of residues, i.e., the length of the SCR.

Table 2 Characteristic amino acid patterns of nine SCR groups


Conservation of SCR Subfamilies and Their Sets in Other Vertebrates

The validity of the phylogenetic approach was evaluated by examining non-human primates and other vertebrates. If the clusters are meaningful, they should be apparent throughout at least recent evolution. In terms of the order and position of subfamilies, DAF is the same in primates and Mus musculus. Pan troglodytes and Papio hamadryas CR1 are identical to Homo sapiens CR1 with the exception that the first Papio hamadryas SCR a has been deleted at the protein level (Table 1).

More interesting is the comparison with the mouse. Mus musculus CR1 has been contrasted with Homo sapiens CR1 (Holers et al. 1992; Kalli and Fearon 1994; Molina et al. 1994; Prechl and Erdei 2000), but here we show that, in spite of all the differences, the sequence designated MCR1 contains ajefbk and therefore matches, subfamily for subfamily, the first six SCRs of Homo sapiens CR1 and CR1L. The remaining sequence of MCR1 is similar to CR2, as expected from earlier observations, suggesting that MCR1 is in fact a composite of CR1 and CR2 or, in view of the present findings, of CR1L and CR2. A comparison of Homo sapiens and Mus musculus CR2 reveals the expected three tetramers (a/b, k/i, d, f/g) followed by ch, with the exception that the third tetramer in Mus musculus has lost the first SCR. Interestingly, as with Homo sapiens, d is less divergent than the other SCRs.

The differences between primate and rodent CR1 suggest that the true orthologue of Homo sapiens CR1 and CR1L may be Crry rather than Mus musculus CR1. Using our classification, the sequence designated Mus musculus Crry is ajefh, suggesting that four external domains (ajef) may subserve at least some CR1 or CR1L functions. The Rattus norvegicus Crry sequence is designated ajefkhh and therefore shares ajef, although the terminus may be different.

Human CR1 is Closer to Mus musculus Crry than Mus musculus CR1

To investigate the evolutionary relationships between Homo sapiens CR1 and Mus musculus CR1 and Crry, we included all component SCRs in the same phylogenetic analysis (Fig. 5). Clearly, the a, j, e, and f SCRs of Crry are very similar to the corresponding SCRs of the first, second, and third septamers of human CR1, whereas murine CR1 is more remote.

SCR Subfamilies and Function

Previous studies have suggested that the first three and particularly the second and third SCRs are involved in binding to C3 and C4. From Table 1, it can be deduced that the aje set encodes the region of relevance. Other sets of subfamilies clearly have a different significance. For example, the ch set relates to the terminus.

Peptide Patterns Marking Functional Subfamilies

If, as implied above, the classification of SCRs relates to function, it should be possible to determine which amino acid residues are specific and critical to the function of subfamilies, at least within orthologous proteins in related species. From Fig. 1 and Table 2 it can be seen that the C..C..C..W..C residues define SCRs generally, but that most of the other patterns are actually characteristic of specific subfamilies, at least within primates. For example, a contains an A which, with some minor exceptions, is shared only with the b subfamily, possibly indicating the proximity of a and b. The proximal I/L/V marker, used in earlier studies (see patterns at the base of Fig. 1), is I in d, e, and c but M in h and P in j. Similarly, the D/E marker position is occupied by T in a, Q in b, M in c, K in d, S in j, and A in k. These residues are preceded by a G which is not present in f, g, or i. The more terminal proline is found in b, d, e, f, g, j, and k but not in a, and not necessarily in h, c, or i. With minor exceptions, most of the so-called motifs can be used to define subfamilies of the SCRs of CR1 and Crry (Table 2).

Indels and Extensive Continuous Sequences

A striking feature of these analyses is that the different groups of SCRs have specific insertions and deletions (indels) and therefore specific lengths (Table 2). Furthermore, as a consequence, there are many more or less specific continuous sequences which might not have been expected from previous analyses.

Evolution of SCRs and CCPs

From the analyses of protein sequence, it is apparent that RCA proteins have evolved through a series of processes involving duplication and divergence of SCR domains. It is also clear that individual SCRs and particularly sets of SCRs have become specialized in terms of position, proximity, and undoubtedly function. Thus, for example, aje and ch occur on different proteins but in the same relative position. Duplications, deletions, and other processes of divergence have contributed to this process of specialization or selection.

Units of Duplication and Deletion

Having demonstrated that there are specific subfamilies of SCRs, it becomes clear that there must have been many diverse units of duplication and deletion. For example, a alone has been duplicated in DAF. The set ajefbkd has been duplicated as a septamer in the evolution of CR1. This septamer may have arisen by insertion and addition of d to ajefbk, which is found in CR1L and MCR1, or by a more complex process of shuffling. Based on similarities between CR1 and CR2, we expect that there was a much earlier tetramer corresponding to ajef and a trimer corresponding to bkd in CR1 and it appears that these may have been key elements in the formation of many CCPs. Note also that aje and bkd may have arisen from an ancestral form of a trimer designated a+b, j+k, d+e. That is, progenitors of a+b, of j+k, and of d+e may have given rise to aje and also bkd.

Conclusions from genomic sequence must be qualified by the difficulty of assembling duplicated sequences, but the data appear to indicate extreme complexity. Large units have also been duplicated and/or deleted; shuffling of segments, whole genes, or their fragments has been superimposed on the addition or loss of smaller units. Thus the primordial CR1 and MCP sequences were duplicated as a segment of at least 100 kb, which has since been extended through further internal duplication and insertion.

It is apparent that duplication may have been imperfect, at least to the extent that some SCRs have diverged substantially and independently while maintaining their position within a duplicon. For example, as shown in Fig. 2, in Homo sapiens CR1, a of septamer 1 (a1) has diverged from a2 and a3 (which are identical), whereas f1, f2, and f3 are conserved. In terms of divergence and conservation within CR1, there are similarities between a and j, between e and f, and among b, d, and k. Interestingly, these groupings are similar to the sets identified above, suggesting that there may be constraints imposed by function and selection.

Relevance to CCPs Encoded Within the MHC

C2 and Bf contain C..C..C..W..C but are encoded within the MHC rather than the RCA complex. We therefore asked whether SCR subfamilies can be identified in these genes. By phylogenetic analysis (Martinez et al. 2001), the h subfamily is present within C2-SCR2 and Bf-SCR2, suggesting that the subfamily existed before the separation of the MHC and RCA clusters. In both genes, SCR3 has some features of a and b, as might be expected if a and b had a common origin, viz., a + b. In both genes, SCR1 is truncated but possibly related to a.

A remarkable feature of these analyses is the similarity between C2 and Bf (see Fig. 1). Thus although the two forms are quite different in length and intron structure, they have retained almost identical SCRs. The differences between C2 and Bf may relate to differences in their functional roles in the complement cascades but the residual similarities suggest that there is local conservation of sequence.

Nucleotide Analysis

Phylogenetic analysis of the nucleotide sequence (synonymous sites) of CR1 revealed a very similar result to the amino acid sequence as shown in Fig. 2. Not only were the clusters the same but so too were the different patterns of divergence. This finding emphasizes the apparent paradox: although it appears that CR1 has been formed by successive duplications of a septamer, some of the components of each septamer appear to have a different evolutionary history.

To search for indicators of selection, we used a window analysis to plot the synonymous (dS) and nonsynonymous (dN) changes throughout the length of the CR1 SCRs. The differences within a subfamily (see Fig. 2) can be predominantly synonymous (e.g., a1 [CR1-1] versus a2 [CR1-8] and a3 [CR1-15]) or predominantly nonsynonymous (e.g., j2 [CR1-9] and j3 [CR1-16] versus j4 [CR1-23]) (Fig. 6), suggesting that there has been some divergence of function since the septamers were duplicated. In some comparisons (e.g., j and f), the dN/dS ratio reverses as the window progresses along the sequence (Fig. 6). The d and b comparisons reveal limited synonymous and nonsynonymous changes. These findings emphasize the complexity of duplication and suggest that there have been further indels and shuffling of the SCRs following duplication of the septamers.

Discussion

Units of Duplication

This work arose from our interest in comparing the MHC and the RCA complex, especially from the point of view of duplication. Within the MHC, the duplication process generally involves segments of heterogeneous genes, fragments, and indels, and there is emerging evidence to suggest that the RCA complex may contain similar segments containing fragments of CR1 and MCP (Hourcade et al. 2000). However, in contrast to the MHC, the dominant process within the RCA complex appears to have involved sequential duplication of one or more relatively homogeneous domains (SCRs). To define the latter process, we classified SCR sequences excised from several different CCPs. Remarkably, most SCRs examined could be grouped into only 11 subfamilies, designated a to k. As shown above, Homo sapiens CR1 can be written as (ajefbkd)₄ch, and DAF as aaje. From such examples, it can be concluded that the subfamilies have been conserved but also shuffled to create specific sets which have themselves been conserved. Duplication has operated at multiple levels of complexity, including individual SCRs, specific sets of SCRs such as aje and even ajefbkd, whole genes, and even segments with more than one gene.

After examining CCPs of different species, we conclude that this process of duplication can be imperfect or at least modified by deletion and insertion. As shown in Table 1, variants such as ajefbk, ajef and jefbkd occur, although possibly only as additional copies. Functional diversification and species differences may be direct consequences of imperfect duplication.

Overlap Between CCPs

Historically, distinct CCPs have been identified and contrasted with those already known. Preferred ligands have been defined and differences in expression have been described. The number of SCRs in each CCP has been used as a means of classification, and equivalents in different species have been sought, often with confusing and controversial conclusions.

The present study has revealed far more structural overlap than could have been expected from earlier observations and from the existing nomenclature. Because single domains from multiple CCPs were examined, unexpected relationships were revealed. These can be considered under three headings: within a gene of a species, between genes of a species, and between species.

Within-Gene Relationships

For more than a decade, it has been appreciated that the amino acid sequence of CR1 contains four long homologous repeats (LHR), or septamers, followed by two different carboxy-terminal SCRs. So-called allotypes with three to six septamers have been described. Functional studies by the Atkinson (Uhrinova et al. 2003) and Barlow (Kirkitadze and Barlow 2001) groups have revealed similarities between septamers and the importance of particular sets of SCRs. For example, in CR1, functional Site 1, which binds C4b and has decay accelerating activity (DAA), resides in SCRs 1–3, 8–10, and 15–17, and all three sets are aje in our nomenclature. Interestingly, within DAF, it has been shown that DAA resides in SCRs 2–4 (aje) and that SCR1 (the a preceding aje) is redundant. Thus, our approach has confirmed, extended, and simplified earlier observations (Krushkal et al. 2000). Furthermore, phylogenetic analysis alone could have predicted these functional relationships while suggesting a useful nomenclature.

Although there are insufficient data at present, it can be predicted that there will be considerable diversity when the same set is compared within a gene (isotypic) or between subjects (polymorphism).

Between-Gene Relationships

As shown in Table 1, there are many similarities between different CCPs within a species, and, as indicated above, these correlate with shared functional sites; e.g., DAA is encoded within aje. Even more remarkable is the fact that another transmembrane CCP, CR2, shares the carboxy-terminal ch. The comparison of CR1 and CR2 suggests that an early and functional set is that represented by fbkd in CR1. These three sets (aje, ch, and fbkd) can be regarded as the putative building blocks of CCPs, and their functions as relevant to the evolution of innate and acquired immunity.

The degree of overlap between CCPs has been underestimated in the past. Attempts to define each CCP as a distinct protein must be reviewed. We expect considerable although not complete functional and immunological overlap between CR1, CR1L, and DAF inter alia. On the other hand, we also expect that there will be polymorphism.

Between-Species Relationships

It is often said that CCPs are very different between species but the present analysis yields a different conclusion. Within primates, subfamily classification of SCRs reveals striking similarities and the order is consistent allowing for occasional deletion as in the case of the first a in Papio hamadryas CR1.

When primates are compared with rodents, some differences are apparent, but these can be explained simply by single indels or by different degrees of duplication and alternative shuffling of conserved sets of SCRs. Nomenclature and annotation may give false impressions as to true and false orthologues. For example, so-called MCR1 is a fusion of Homo sapiens CR1/CR1L and Homo sapiens CR2 (with a single deletion). The human equivalent of Mus musculus Crry has been sought because of its likely importance in successful pregnancy. Admittedly, the nomenclature is necessarily historic and arbitrary, but there could be a case for revision. In the meantime, we argue that the human equivalent of the CCP which protects the murine fetus is a mixture of CR1 and all those human CCPs which contain at least aje, or more of the ajefbkd CR1 septamer, and therefore the conformational unit implied by the collocation of up to seven SCRs on the trophoblast surface. It will be important to consider these possibilities (i.e., aje irrespective of the CCP) when defining the immunopathology and designing antibodies or peptides which might block or initiate the process of abortion.

Evolution of SCRs

While the data are still limited and the process obviously very complex, we propose a framework for the evolution of SCRs and CCPs. Early SCRs were beneficial because of their inherent binding affinities, especially when occurring together. Any high-affinity binding has the potential to contribute to innate immunity by way of localization and immobilization of invading organisms. The same binding may also play a role in self–nonself discrimination in that there could be selection for receptors which are unoccupied and therefore free to bind foreign (rather than host) ligands. Selection would have favored closely linked (duplicated) gene clusters since increased copy number would result in increased concentrations of the receptor while facilitating the assembly of the multicomponent products and the conservation of specific sets.

As individual SCR domains diverged, functional diversity increased especially when sets were imperfectly duplicated. Some sets, however, were conserved particularly when crucial to the regulation of coevolving cascades such as the complement system. Duplication of genes such as the C3+4+5 precursors favored the imperfect but segmental duplication of specific sets of SCRs. Imperfections included insertion and deletion of SCRs as well as errors in linear sequence. Some coding polymorphism arose but was not especially favored because it was necessary to retain specificity for particular ligands. Noncoding insertions may have been important in relation to achieving differential but precise regulation of the expression of CCPs. During primate evolution these processes intensified due to the recruitment of human endogenous retroviruses (HERVs) which facilitated further divergence (Gaudieri et al. 2000).

References

Bell E (2000) Murine embryonic survival depends on regulation of complement. Immunol Today 21:109
Dawkins R, Leelayuwat C, Gaudieri S, Tay G, Hui J, Cattley S, Martinez P, Kulski J (1999) Genomics of the major histocompatibility complex: Haplotypes, duplication, retroviruses and disease. Immunol Rev 167:275–304
Endo T, Ikeo K, Gojobori T (1996) Large-scale search for genes on which positive selection may operate. Mol Biol Evol 13:685–690
Gaudieri S, Dawkins RL, Habar K, Kulski JK, Gojobori T (2000) Nucleotide diversity within the human major histocompatibility complex: Function of hitch-hiking effect, duplications, indels and recombination. In: Kasahara M (ed) The major histocompatibility complex: Evolution, structure, and function. Springer-Verlag, Tokyo, pp 186–200
Heine-Suner D, Diaz-Guillen M, de Villena F, Robledo M, Benitez J, Rodriguez de Cordoba S (1997) A high-resolution map of the regulator of the complement activation gene cluster on 1q32 that integrates new genes and markers. Immunogenetics 45:422–427
Holers V, Kinoshita T, Molina H (1992) The evolution of mouse and human complement C3-binding proteins: Divergence of form but conservation of function. Immunol Today 13:231–236
Hourcade D, Holers VM, Atkinson JP (1989) The regulators of complement activation (RCA) gene cluster. Adv Immunol 45:381–416
Hourcade D, Liszewski M, Krych-Goldberg M, Atkinson J (2000) Functional domains, structural variations and pathogen interactions of MCP, DAF and CR1. Immunopharmacology 49:103–116
Kalli K, Fearon D (1994) Binding of C3b and C4b by the CR1-like site in murine CR1. J Immunol 152:2899–2903
Kirkitadze M, Barlow P (2001) Structure and flexibility of the multiple domain proteins that regulate complement activation. Immunol Rev 180:146–161
Kirkitadze M, Dryden D, Kelly S, Price N, Wang X, Krych M, Atkinson J, Barlow P (1999) Co-operativity between modules within a C3b-binding site of complement receptor type 1. FEBS Lett 459:133–138
Krushkal J, Bat O, Gigli I (2000) Evolutionary relationships among proteins encoded by the regulator of complement activation gene cluster. Mol Biol Evol 17:1718–1730
Krych M, Hauhart R, Atkinson J (1998) Structure-function analysis of the active sites of complement receptor type 1. J Biol Chem 273:8623–8629
Krych-Goldberg M, Hauhart R, Subramanian V, Yurcisin B, Crimmins D, Hourcade D, Atkinson J (1999) Decay accelerating activity of complement receptor type 1 (CD35). J Biol Chem 274:31160–31168
Martinez OP, Longman-Jacobsen N, Davies R, Chung EK, Yang Y, Gaudieri S, Dawkins RL, Yu CY (2001) Genetics of human complement component C4 and evolution of the central MHC. Frontiers Biosci 6:D904–D913
Molina H, Kinoshita T, Webster C, Holers V (1994) Analysis of C3b/C3d binding sites and factor I cofactor regions within mouse complement receptors 1 and 2. J Immunol 153:789–795
Prechl J, Erdei A (2000) Immunomodulatory functions of murine CR1/2. Immunopharmacology 49:117–124
Reid K, Bentley D, Campbell R, Chung L, Sim R, Kristensen T, Tack B (1986) Complement system proteins which interact with C3b or C4b. A superfamily of structurally related proteins. Immunol Today 7:230–234
Rodriguez de Cordoba S, Diaz-Guillen MA, Heine-Suner D (1999) An integrated map of the human regulator of complement activation (RCA) gene cluster on 1q32. Mol Immunol 36:803–808
Schwarzenbacher R, Zeth K, Diederichs K, Gries A, Kostner G, Laggner P, Prassl R (1999) Crystal structure of human β2-glycoprotein I: Implications for phospholipid binding and the antiphospholipid syndrome. EMBO J 18:6228–6239
Sonnhammer E, Durbin R (1995) A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene 167:GC1–10
Szakonyi G, Guthridge J, Li D, Young K, Holers V, Chen X (2001) Structure of complement receptor 2 in complex with its C3d ligand. Science 292:1725–1728
Uhrinova S, Lin F, Ball G, Bromek K, Uhrin D, Medof M, Barlow P (2003) Solution structure of functionally active fragment of decay accelerating factor. Proc Natl Acad Sci USA 100:4718–4723
Xu C, Mao D, Holers M, Palanca B, Cheng A, Molina H (2000) A critical role for murine complement regulator Crry in fetomaternal tolerance. Science 287:498–501. doi:10.1126/science.287.5452.498

Web sites

http://pfam.wustl.edu, PFAM protein Family Database home page
http://www.ncbi.nlm.nih.gov, NCBI GenBank
http://www.ncbi.nlm.nih.gov/Blast, NCBI Blast Web page
http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html, NCBI Blast 2 sequences
http://motif.genome.ad.jp/MOTIF2.html, MOTIF home page
http://ftp.genome.washington.edu/cgi-bin/RepeatMasker, RepeatMasker


Acknowledgments

The authors gratefully acknowledge the support of the Immunogenetics Research Foundation, the National Health and Medical Research Council of Australia, the Australian Research Council, and Equitech Pty Ltd. S.G. is supported by a Howard Florey Fellowship; J.B. and C.M. are supported by APA scholarships.

Author information

Richard A. Davies
Present address: Department of Biochemistry, University of Western Australia, Nedlands, 6907, Western Australia
Silvana Gaudieri
Present address: Department of Clinical Immunology, Royal Perth Hospital, Perth, 6000, Western Australia

Authors and Affiliations

Centre for Molecular Immunology and Instrumentation, University of Western Australia, Nedlands, 6907, Western Australia
Craig A. McLure, Roger L. Dawkins, Joseph F. Williamson, Richard A. Davies, Jemma Berry, Natalie Longman-Jacobsen, Rebecca Laird & Silvana Gaudieri


Corresponding author

Correspondence to Craig A. McLure.

Additional information

Manuscript number 0014 of the Centre for Molecular Immunology and Instrumentation of the University of Western Australia


About this article

Cite this article

McLure, C.A., Dawkins, R.L., Williamson, J.F. et al. Amino Acid Patterns Within Short Consensus Repeats Define Conserved Duplicons Shared by Genes of the RCA Complex. J Mol Evol 59, 143–157 (2004). https://doi.org/10.1007/s00239-004-2609-8


Received: 08 September 2003
Accepted: 28 January 2004
Issue Date: August 2004
DOI: https://doi.org/10.1007/s00239-004-2609-8
