A nearly equal number of introns and exons, as well as the sizes of those exons, attest to the common origin very early in evolution of the genes coding for calmodulin (CaM), centrin (Cetn), troponin C (Tn C) and parvalbumin (PV). While introns vary considerably in nucleotide sequence and overall size (i.e. they exhibit many nucleotide mutations, additions and deletions), changes in the size of the exons coding for these protein subfamilies are minor and rare. Usually, but not always, each EF hand is encoded in one separate exon (supporting Gilbert’s maxim that ancient genes were assembled by exon shuffling involving compact modules [1]). In the ancient precursor to these subfamilies, four identical modules were fused, possibly in two steps, forming the precursor molecule. The four EF hand consensus motifs (DXDXXGXI/VXXXE), i.e. 12-amino-acid loops, are usually separated by 24, 25 and 24 amino acids (each a helix–loop–helix) positioned between motifs 1 and 2, 2 and 3, and 3 and 4, respectively, in members representing two subfamilies of these molecules (CaM and Cetn 2) (Tables 1 and 2), thus reinforcing the evidence for a common origin of the two subfamilies. In Tn C, the third subfamily, the 24-25-24 rule appears slightly modified to 24-28-24, and in PV, the fourth subfamily, EF hand motif 1 is completely lost and motif 2 is defective, so that PV evolved as a molecule with only two functional EF hands, which are separated by 27 amino acids. Table 2 depicts the distribution of loop and helix structures in one representative molecule from each of the four subfamilies. This distribution pattern has been highly conserved in spite of a high rate of amino acid substitutions, i.e. many of the substitutions retained the physicochemical character of the replaced amino acid. In all mammals, three separate genes code for an identical CaM. In all vertebrates, the protein sequence for CaM is 100% identical [2]. In mammals, up to four distinct genes encoding Cetn isoforms exist.
However, while the consensus motif given above precisely represents the four motifs contained in the three mammalian CaM proteins, it differs slightly for the multiple mammalian Cetn proteins, i.e. here the amino acid sequences that compose the motifs are slightly modified. For the Cetn proteins, overall conservation is high but not complete. Consequently, in the various Cetn isoforms, calcium binding and hence protein function might be affected. (Note: In CaM, EF hand residues 1, 3, and 5 donate monodentate ligands, i.e. side-chain carboxylates; residue 7 coordinates directly to the calcium ion; and residue 12 provides a bidentate ligand that coordinates the calcium through two oxygen atoms of a side-chain carboxylate [3].)
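The consensus motif and the 24-25-24 spacing rule described above can be checked mechanically. The following is a minimal sketch, not the procedure used to build Tables 1 and 2: it assumes the consensus DXDXXGXI/VXXXE can be read as a simple regular expression, and it runs on a synthetic toy sequence (not a real protein). The function names are hypothetical.

```python
import re

# EF-hand loop consensus from the text: D-X-D-X-X-G-X-[I/V]-X-X-X-E (12 residues).
EF_LOOP = re.compile(r"D.D..G.[IV]...E")


def find_ef_loops(protein):
    """Return (start, end) spans (0-based, end exclusive) of consensus matches."""
    return [m.span() for m in EF_LOOP.finditer(protein)]


def spacer_lengths(spans):
    """Count residues between the end of one loop and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]


# Synthetic toy sequence, built only to illustrate the 24-25-24 rule.
# It is NOT a real CaM or Cetn sequence.
LOOP = "DKDGDGTITTKE"
toy = ("M" + "A" * 19 + LOOP + "A" * 24 + LOOP
       + "A" * 25 + LOOP + "A" * 24 + LOOP + "A" * 8)

spans = find_ef_loops(toy)
print(len(spans), spacer_lengths(spans))  # prints: 4 [24, 25, 24]
```

Applied to real sequences, a strict pattern of this kind would be expected to miss slightly modified motifs such as those mentioned for the Cetn isoforms, so it is best treated as a first-pass filter.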

Table 1 EF-hand amino acid distribution within exons of CaM 3 and of Cetn 2
Table 2 Distribution of loop and helix structure in representative molecules from each of the four human protein subfamilies

A major difference between the two protein subfamilies (CaM and Cetn) is the presence (usually) in Cetn of a protruding amino end approximately 23 amino acids long (sometimes longer), a leader sequence possibly containing a nuclear localization signal (NLS): (KK, RKR) (i.e. several adjacent basic amino acids, in an overall acidic molecule, directing this protein to the nucleus). Another function for this extended amino end of Cetn might be envisioned, however, since human recombinant Cetn 2 lacking its first 25 residues loses its capacity for self-assembly [4]. Disregarding this amino end sequence, a comparison of the amino acid sequences of CaM and Cetn 2 yields a 52% homology between these two proteins (Table 1). Functional differences between them, as well as between the various Cetn proteins, might be attributed to the difference at the amino end in addition to variation in the amino acids of the calcium-binding EF hands.
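The text does not state how the percent homology was computed. One common reading is percent identity over aligned positions, which can be sketched as follows, assuming the two sequences have already been aligned and use "-" for gaps. The function name and the toy strings are hypothetical and are not taken from Table 1.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments (illustrative only, not real CaM or Cetn 2 data).
print(percent_identity("DKDG", "DADG"))  # prints: 75.0
```

Other conventions (for example, counting gapped columns in the denominator, or scoring conservative substitutions as matches) would give different percentages for the same pair of sequences.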

Cetn 1: The Cetn 1 (caltractin 2) gene is located on chr 18 in humans, rats and mice, on chr 7 in dogs and on chr 24 in cows. Remarkably, in none of these species does the Cetn 1 gene contain introns, and all of these genes code for a 172-amino-acid protein that is highly conserved (excepting the aforementioned leader sequence, wherein a few mutations occurred at its 5′ end), thus attesting to the evolution of these genes from a common intronless ancestral gene (Fig. 1 (tree) and Table 3). Table 4 shows the presence of 3′ UTR markers (only a small region adjacent to the coding terminus is given in the table) allowing the identification of mRNA transcribed from these orthologs in various species. As observed by Hart et al. [5] for murine Cetn 1, the retroposon-derived ancestor gene in turn arose from the X-linked Cetn 2 gene within germ cells. Such an argument is supported by the percent homology seen when comparing the coding and 3′ UTR regions between the mouse Cetn 1, Cetn 2 and Cetn 3 mRNAs (Fig. 2) and applies also to the retroposon-derived Cetn 1 and the Cetn 2 mRNAs in rats, dogs, cows and humans (Table 5a). (Table 5b is included to show that homology in the coding region is higher for orthologs than for paralogs.) Cetn 1 and Cetn 2 are close to each other and also to the Cetn from green algae. When comparing the 3′ UTRs of Cetn 1 versus Cetn 2, however, homologous sequences are short and rare (Tables 4 and 7). Thus, retroposition occurred early in the existence of Cetn 2, and from this original Cetn 2 paralog all Cetn 1 orthologs evolved.

Fig. 1

Tree for Cetn 1 in mammals. For identification of species see footnote to Table 3; in addition, XP_590442: cow

Table 3 Amino acid sequence: Cetn 1
Table 4 Partial 3′ UTR non-coding sequence: Cetn 1
Fig. 2

Tree for mouse Cetn 1, 2, 3, 4 and frog Cetn 2 and 3. P41209: Mouse Cetn 1; Q9R1K9: Mouse Cetn 2; AAH54948: Frog Cetn 2; XP_484840: Mouse Cetn 4; AAG30507: Frog Cetn 3; NP_031710: Mouse Cetn 3

Table 5 (a) Coding sequence homology displayed by mouse Cetns. AAH48488: Cetn 1, Q9R1K9: Cetn 2, O35648: Cetn 3, XP_484840: Cetn 4. (b) Coding sequence homology displayed by mouse versus frog Cetn 2 and 3. Q9R1K9: Mouse 2, NP_031710: Mouse 3, AAA79194: Frog 2, NP_001016387: Frog 3

Cetn 2: For the five mammalian species considered here, Cetn 2 (caltractin 1) is always encoded on chr X. While in humans, mice and cows this protein contains 172 amino acids, in rats the coding region begins with an additional 163 amino acids placed in front of the 172, and in dogs an additional 48 amino acids precede 171 amino acids. There is no detectable homology between the 163 additional amino acids in rats and the 48 additional amino acids in dogs (Table 6), suggesting that these additions occurred after the two species diverged. In all five species, the gene contains five exons, where exons 2, 3, and 4 code for 53, 43 and 46 amino acids, respectively, and exon 5 encodes 29 amino acids up to the beginning of the 3′ UTR (Table 6). (Sizes for exons 1 and 5 are not compared here because these exons include the 5′ and 3′ UTRs, respectively, which might differ in length.) The introns vary considerably in composition and length.

Table 6 Amino acid sequence: Cetn 2

CaMs in all vertebrates exhibit a 100% identical amino acid sequence, which is translated from six (rather than five) exons (exon 1 in CaM provides only the initial M of the protein molecule), where exons 3, 4, and 5 code for 48, 36 and 45 amino acids, respectively, and exon 6 encodes 9 amino acids up to the beginning of the 3′ UTR. The CaM exons are so arranged in size that only EF hand motif 1 is positioned toward the middle of its exon (exon 3), while the other three motifs are located at the ends of their respective exons (motif 2 begins in exon 3 but resides mostly in exon 4, motif 3 begins in exon 4 but resides mostly in exon 5, and motif 4, present in exon 5, edges slightly into exon 6), so that in three out of four cases a couple of amino acids within the motifs are encoded in adjacent exons (Table 1). It might be surmised that Cetn 2 kept the original placement of the motifs present in the precursor, while CaM, which arose from the same precursor, underwent a slight redistribution of its exon boundaries. As mentioned above, the numbers of amino acids separating motifs 1 and 2, 2 and 3, and 3 and 4 in both Cetn 2 and CaM exhibit an identical pattern (24-25-24 amino acids). For CaM, the comparison of the region containing motifs 1 and 2 with that containing motifs 3 and 4 has been interpreted as proof of the formation of one gene by the fusion of two genes following a duplication process [6]. This could have been the modus operandi responsible for the creation of a precursor that in turn gave rise to a CaM gene and also to a precursor Cetn gene (possibly Cetn 2). A comparison of the mRNA 5′ and 3′ UTRs of Cetn 2 from the five species indicates the existence of multiple short homologous segments in these regions, sufficient to identify a specific Cetn paralog (Table 7).
In dogs, an additional gene on chr 13 encodes a shorter Cetn 2 containing 159 amino acids; it consists of two exons separated by 219 bases, the first coding for 59 and the other for 100 amino acids, and is devoid of other introns (i.e. a retroposon-derived gene?) (cDNA: XM_539129).

Table 7 Partial 3′ non-coding sequence: Cetn 2

Cetn 3: Cetn 3 is encoded in the genome of the rat (chr 2), mouse (chr 13), human (chr 5), dog (chr 3) and cattle (chr Un; chr Un contains contigs that cannot be confidently placed at present). In all of these species the gene codes for a protein containing 167 amino acids. In all cases there are 5 exons, where exon 2 codes for 46, exon 3 for 38, exon 4 for 65, and exon 5 (up to the 3′ UTR) for 13 amino acids. In all Cetn 3 proteins, EF motifs 1 and 2 reside in exons 2 and 3, respectively. EF motif 3 resides in exon 4, and motif 4 begins in exon 4 but ends at the beginning of exon 5. The 24-25-24 amino acid pattern between EF hands holds for the Cetn 3 proteins even though exon size has been slightly altered (Table 8). A comparison of the 3′ UTRs of the Cetn 3 orthologs in the various mammalian species reveals multiple short homologous sequences that allow reliable identification (Table 9). Cetn 3 is more distant in sequence from Cetn 2 than is Cetn 1 (Table 5a) and closer to yeast CDC31 than is Cetn 1. Cetn 2 and Cetn 3 (but not Cetn 1) are transcribed in NIH3T3 cells and are localized primarily in the distal lumen of centrioles and in the procentriole bud [7]. They are involved in centriole duplication. Not all Cetn 2 and Cetn 3 molecules, however, are associated with the centrosome: they are also present in nuclei and cytoplasm.

Table 8 Amino acid sequence: Cetn 3
Table 9 Partial 3′ UTR non-coding sequence: Cetn 3

Cetn 4: In human DNA, no Cetn 4 could be located. In the mouse, Cetn 4, a protein containing 168 amino acids, is encoded on chr 3 by an intron-containing gene (and by an additional, intronless gene on chr 1). For this mouse Cetn 4 gene (chr 3), exon 2 codes for 49, exon 3 for 43, exon 4 for 46, and exon 5 (up to the beginning of the 3′ UTR) for 29 amino acids (i.e. the numbers of amino acids encoded in exons 3, 4 and 5 are identical to those in the corresponding exons of mouse Cetn 2; see Tables 6 and 10). In the dog, the Cetn 4 gene is present on chr 19 (173 amino acids) and the numbers of amino acids in exons 2, 3, 4 and 5 are identical to those in the mouse. In cattle, the gene (on chr 17) codes for 166 amino acids and exhibits the identical distribution of amino acids in its exons, except for exon 2, which spans 47 rather than 49 amino acids. In all instances, the separation of motifs 1 and 2, 2 and 3, and 3 and 4 remains constant (24-25-24 amino acids), i.e. it is identical to the numbers mentioned above for CaM, Cetn 1, Cetn 2 and Cetn 3; see Table 11. For the rat Cetn 4 (chr 4), containing 233 amino acids, however, exon 3 (and hence motif 2) is completely missing. (It would be of interest to examine the functional capacity of the product of this gene (XP_342235).) The numbers of amino acids encoded by exon 4 and exon 5 of the rat gene remain unchanged. A comparison of the 3′ UTRs of the Cetn 4 orthologs from the four species examined also provides short homologies for appropriate attribution (Table 11).
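The exon-size comparisons made in this and the preceding sections can be tabulated in a few lines. The sketch below uses only the per-exon amino acid counts quoted in the text for exons 2 to 5. The dictionary layout and the helper name are illustrative assumptions, and the rat genes are left out because their exon structure deviates as described.

```python
# Amino acids encoded by exons 2, 3, 4 and 5, as quoted in the text.
EXON_AA = {
    "Cetn 2": (53, 43, 46, 29),
    "Cetn 3": (46, 38, 65, 13),
    "Cetn 4 (mouse, dog)": (49, 43, 46, 29),
    "Cetn 4 (cattle)": (47, 43, 46, 29),
}


def shared_exons(a, b, first_exon=2):
    """Return the exon numbers whose encoded amino-acid counts are equal in a and b."""
    return [first_exon + i for i, (x, y) in enumerate(zip(a, b)) if x == y]


print(shared_exons(EXON_AA["Cetn 2"], EXON_AA["Cetn 4 (mouse, dog)"]))  # prints: [3, 4, 5]
print(shared_exons(EXON_AA["Cetn 2"], EXON_AA["Cetn 3"]))               # prints: []
```

The output restates the observation made above: Cetn 4 shares the sizes of exons 3, 4 and 5 with Cetn 2, whereas the Cetn 3 exon sizes differ throughout.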

Table 10 Amino acid sequence: Cetn 4
Table 11 Partial 3′ UTR non-coding sequence: Cetn 4

Centrin function: The name “centrin” implies a role for centrins in the centriole. Cetn 4 is expressed in only a few tissues, such as ovary, lung, kidney and brain. In brain, it is transcribed in ependymal and choroidal ciliated cells involved in the assembly of basal bodies [8]. Thus, the function of Cetn is not restricted to the centriole in the centrosome. It is also apparent in the formation of cilia or flagella, in which the modified centriole is termed the “basal body”. Ciliary and flagellar basal bodies and centrioles share the same architecture. Cilia possess a scaffold of tubulin along which motor proteins move cargo (in a way similar to what happens in centrosomes). The basal bodies control the direction of the movement of the cilia. An evolutionary relationship for these structures seems likely: it might have had its beginning in the earliest eukaryotes as the “karyomastigont” present in flagellated protists and sperm, i.e. the undulipodium (i.e. basal body) connected in the cytoplasm by a rhizoplast (striated roots) to the nucleus during mitosis. The divergence of Giardia lamblia is estimated to be more than twice as ancient as the common ancestor of yeast and mouse. Giardia is likely among the earliest organisms that branched from the eukaryotic line of descent. This protozoan parasite (a protist) is an amitochondrial but bi-nucleated eukaryote in which Cetn already plays a role in the motility induced by primary cilia [9], in addition to its involvement in centrosome formation.

Within the human nucleus, Cetn 2 interacts directly with the xeroderma pigmentosum group C protein (XPC), a component of the nucleotide excision repair pathway [10]. XPC protein possesses a high affinity binding site: a typical 1-5-8-14 motif [11] for binding the ubiquitous CaM as well as for the binding of the non-ubiquitous Cetn [12].

Light-activated rhodopsin interacts with the heterotrimeric G protein transducin. High-affinity binding to Gxβγ and the subcellular localization of Cetn 1, Cetn 2 and Cetn 3 in the connecting cilium suggest that, in mammalian retinal photoreceptor cells, these isoforms could act as regulators of transducin (the cargo) during photon-induced translocation. Cetn binds with high affinity and specificity to transducin. Cetn 4 is exclusively localized in the basal body at the base of the connecting cilium [13]. So far, there is no evidence that CaM is also involved here.

Defective cilia in epithelial kidney tubules are present in the genetic disease nephronophthisis, which is induced by a mutation in the NPHP-5 (IQCB) gene. NPHP-5 codes for nephrocystin-5. An IQ motif might suggest the presence of a CaM binding site within the expressed protein. There are at least two such sites: one represents the cardiac myosin light chain type and the other the 1-5-8-14 type [14]. All patients possessing the mutated gene also have retinitis pigmentosa. It was shown using coimmunoprecipitation on retinal extracts that NPHP-5, CaM and retinitis pigmentosa GTPase regulator (RPGR) all localize to connecting cilia of photoreceptors and to primary cilia of renal epithelial cells. The contribution of centrin was not examined. In this case, are centrin and CaM selectively or randomly utilized? The same question might be asked when considering the action of the Asp protein, the product of the abnormal spindle (asp) gene. Removing the function of this protein from Drosophila embryo extracts by mutation prevents the holding together of the microtubule-nucleating gamma-tubulin ring complexes that organize the mitotic centrosome [15]. In centrosome duplication within human tissue culture cells, gamma-tubulin, Cetn 2 and kendrin colocalize with the centrosome [16]. A recent study of the distribution of Asp mRNA in zebrafish brain suggests its presence in high concentration in the ependymal cells surrounding the ventricles and in the periventricular zone, where motile cilia direct the course of cerebrospinal fluid (Sydnor and Friedberg, unpublished). Kendrin possesses two IQ motifs. The Asp protein displays multiple IQ motifs. Is it CaM or Cetn that is bound? If it is Cetn, one might investigate the possibility that the Cetn molecules are arranged in a lengthy chain when exposed to multiple binding motifs. Both CaM and Cetn are among the genes that relate directly to the mitotic spindle.