Introduction

Molluscan shells are composed of calcium carbonates and organic molecules. Amino acid composition analyses suggested that among the organic molecules, acidic proteins are the major components (Weiner 1979; Albeck et al. 1993). Since acidic proteins have Ca2+ binding capacity owing to their negative charge under physiological conditions, they are likely to have important functions in calcium manipulation during shell formation. Complete or partial amino acid sequences of acidic shell matrix proteins have been reported from several pterioid bivalves. For example, MSP-1, Asprich, Caspartin, and MPP1 were identified from Patinopecten yessoensis, Atrina rigida, Pinna nobilis, and Crassostrea nippona, respectively (Sarashina and Endo 1998; Sarashina and Endo 2001; Gotliv et al. 2005; Marin et al. 2005; Samata et al. 2008). These proteins have a low theoretical isoelectric point because they contain a high percentage of the acidic aspartic acid residues.

Aspein is one of the acidic shell matrix proteins. It was first identified from the mantle tissue of the pearl oyster Pinctada fucata (Tsukamoto et al. 2004). Aspein has a signal peptide sequence (19 amino acids in length), which is significantly similar to that of Asprich (63 % identity), the acidic shell matrix protein identified from A. rigida. Aspartic acid composes 60.4 % of the mature Aspein, causing its predicted isoelectric point to be as low as 1.45. These characteristics make Aspein the most acidic of all known shell proteins. Most of the aspartic acids are located adjacently and form the D-domain (Fig. 1). The Stains-All staining of the SDS-PAGE gels indicated that the D-domain is likely to have cation binding capacity (Takeuchi et al. 2008).

Fig. 1

a Alignment of the amino acid sequences of Aspein. Aspartic acids encoded by GAT (blue) are distinguished from aspartic acids encoded by GAC (red). Signal peptide, the SEP motif, and DA repeat motif are boxed. b Schematic representation of the primary structure of Aspein homologs

Further experiments have also provided evidence for Aspein’s putative role in the calcitic shell layer. Gene expression studies showed that the Aspein gene is expressed only at the outer edge of the mantle, corresponding to the calcitic prismatic layer of the pearl oysters P. fucata and P. margaritifera (Tsukamoto et al. 2004; Takeuchi and Endo 2006; Joubert et al. 2010). The expression level of Aspein during larval and juvenile stages increased at the onset of calcite formation, although it was also weakly expressed when the shell was composed only of amorphous calcium carbonate (Miyazaki et al. 2010). Meanwhile, an in vitro experiment showed that Aspein can induce calcite formation from Mg2+-rich solutions (Takeuchi et al. 2008). These results strongly suggest that Aspein is involved in the calcite precipitation process, even under conditions that favor the formation of aragonite (high Mg2+/Ca2+ ratio; Kitano 1962; Berner 1975; Davis et al. 2000), conditions similar to those of the extrapallial fluid in living marine mollusks (Wada and Fujinuki 1976).

Despite its possible importance for shell formation in bivalves, Aspein has been characterized from only one species, the pearl oyster P. fucata. Partial sequences of Aspein homologs were obtained from P. margaritifera and P. maxima by EST analysis, but the full sequences have not yet been published (Joubert et al. 2010; Jackson et al. 2010). Without full-length homologs to compare, it is difficult to perform evolutionary analyses to infer conserved regions and functionally important domains. Here, we report the identification and characterization of Aspein homologs from three pterioid species closely related to P. fucata: P. maxima, Isognomon perna, and Pteria penguin. We characterized the evolutionarily conserved regions and inferred the presence of putatively functionally important domains in the Aspein sequences from the three species studied. We also discuss the possible evolutionary processes that might have led to the sequence diversity of Aspein, despite the presence of conserved domains and the putative conservation of its function.

Materials and Methods

Aspein Sequence Isolation

Live individuals of P. maxima and P. penguin were kindly provided by Tasaki and Co., Ltd. (Kobe, Japan). I. perna was collected at the shore of Amami Oshima Island, southern Japan. Total RNA was extracted from the outer edge of the mantle tissues from a single individual using Isogen (Nippon Gene, Tokyo, Japan). 3′RACE-ready cDNA was prepared using ReverTra Ace (Toyobo, Osaka, Japan) with oligo-dT primer Hybrid-VN1 (detailed information of the primers used in this study is given in Table 1). 5′RACE-ready cDNA was synthesized using SMART RACE cDNA Amplification Kit (Clontech, CA, USA). Partial sequences of PmAspein were amplified using the degenerate primer pair of degAspS-1 and degAspA-1. We amplified partial sequences of IpAspein and PpAspein using the degenerate primer pair of degAspS-2 and degAspA-2. We used several different degenerate primers because the conserved regions of these Aspeins are short. We designed these primers based on the conserved regions between PfAspein and Asprich. DegAspS-1 and 2 were designed on the signal peptide sequence. DegAspA-1 and 2 were designed from the sequence near the N-terminus and D-domain, respectively. 3′RACE was performed on the cDNA of P. maxima, I. perna, and P. penguin using the gene-specific sense primers, PmAspS-1, IpAspS-1, and PpAspS-1, respectively, and the adaptor primer RACE-TT1. 5′RACE was performed on the cDNA of P. maxima, I. perna, and P. penguin using the gene-specific antisense primers, PmAspA-1, IpAspA-1, and PpAspA-1, respectively, and the adaptor primer Universal primer A mix (SMART RACE cDNA Amplification Kit, Clontech). Full-length cDNAs of PmAspein, IpAspein, and PpAspein were amplified using the primer pairs of PmAspS-2 and PmAspA-2, IpAspS-2 and IpAspA-2, and PpAspS-2 and PpAspA-2, respectively. The PCR products were subcloned into the pGEM-T Easy Vector (Promega, WI, USA). 
The inserts of the vectors were sequenced with BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, CA, USA) and using the T7 and SP6 primers. The sequences were run on an ABI PRISM 3100-Avant Genetic Analyzer (Applied Biosystems, CA, USA).

Table 1 List of primers

Sequence Analyses

We performed BLAST similarity searches on the DDBJ website (http://www.ddbj.nig.ac.jp) to confirm whether PmAspein, IpAspein, and PpAspein are homologs of PfAspein. Signal peptide cleavage sites and isoelectric points were predicted using Genetyx version 6 (Genetyx, Tokyo, Japan).

To identify evolutionarily conserved domains, we first aligned the D-domains. For this purpose, we distinguished the two codons encoding Asp, dividing the aspartic acids in the D-domain into two classes, GAT-encoded Asp and GAC-encoded Asp, and then performed amino acid sequence alignment using Clustal W. Afterward, we aligned the rest of the sequences manually.
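The codon-aware recoding step described above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the placeholder symbol `B` for GAC-encoded Asp, the function name `translate_codon_aware`, and the toy sequence are all assumptions made for the example. The text does not say how the two Asp classes were represented before being passed to Clustal W.

```python
# Sketch: translate a coding sequence while keeping the two Asp codons distinct.
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical placeholder: GAT-encoded Asp stays "D", GAC-encoded Asp becomes "B".
GAC_ASP_SYMBOL = "B"


def translate_codon_aware(cds: str) -> str:
    """Translate codon by codon, marking GAC-encoded Asp with a separate symbol."""
    cds = cds.upper().replace("U", "T")
    residues = []
    # Ignore any trailing incomplete codon.
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[pos:pos + 3]
        if codon == "GAC":
            residues.append(GAC_ASP_SYMBOL)
        else:
            residues.append(CODON_TABLE.get(codon, "X"))  # "X" for unknown codons
    return "".join(residues)


# Made-up fragment for illustration only; not a real Aspein sequence.
toy_cds = "GATGACGGTGATGATTCTGAC"
print(translate_codon_aware(toy_cds))  # DBGDDSB
```

Keeping the two Asp classes as separate symbols gives the aligner positional information that a plain run of D residues lacks. How the aligner scores the placeholder symbol would need to be decided separately.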

Enzyme-Linked Immunosorbent Assay (ELISA)

Rabbit antiserum was prepared against a recombinant peptide corresponding to the amino acid position 20 to 131 of PfAspein, comprising 112 amino acid residues. The prismatic layer of P. fucata, P. maxima, I. perna, and P. penguin was mechanically isolated and crushed into powder. The nacreous layer of P. fucata, P. maxima, I. perna, and P. penguin was also isolated and crushed. The crushed shells were incubated overnight in a 30 % (v/v) aqueous solution of bleach with gentle shaking at room temperature to destroy surface contaminants. After thorough washing with ultrapure water, the shells were dried. Organic materials were extracted by dissolution of 0.1 g of the fragments in 0.75 ml of 0.5 M ethylenediaminetetraacetate (EDTA) at pH 8.0 with shaking at room temperature. In order to remove insoluble material, the solution was filtered using Millex-GV sterilizing filter unit (Millipore, MA, USA). An aliquot (100 μl) of the sample solution was incubated at 37 °C for 90 min in each well on a multiwell plate. After the wells were emptied, they were washed with 0.05 % (v/v) Tween 20 in TBS (TBS/Tween) (TBS; 0.9 % [w/v] NaCl, 10 mM Tris, pH 7.5) three times. The wells were blocked by incubation with 100 μl of 1 % (w/v) gelatin in TBS at 37 °C for 30 min. After the wells were emptied, 100 μl of recombinant Aspein-injected rabbit antisera diluted appropriately (1/30–1/590,490) by 0.1 % (w/v) gelatin in TBS/Tween (gelatin/TBS/Tween) were added to each well and incubated at 37 °C for 90 min, followed by the TBS/Tween wash as above to remove unbound antibodies. Then 100 μl of 0.05 % (v/v) Anti-rabbit IgG alkaline phosphatase conjugate (Sigma, MO, USA) in gelatin/TBS/Tween was added to each well and incubated at 37 °C for 90 min. 
After the TBS/Tween wash to remove unbound secondary antibodies, 100 μl of 1 % (w/v) 4-nitrophenyl phosphate (pNPP) disodium salt hexahydrate (Sigma) in 1 M diethanolamine, pH 9.8, with 0.5 M MgCl2, was added to each well and incubated in the dark at 37 °C for 4 h. The color intensity was measured spectrophotometrically at 405 nm using a microplate reader, MPR-A4i (Tosoh, Tokyo, Japan). All assays were carried out in duplicate, and the results were averaged.
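The antiserum dilution range quoted above (1/30 to 1/590,490) is consistent with a threefold serial dilution over ten steps, since 30 × 3^9 = 590,490. The text does not state the dilution factor, so the threefold scheme is an inference. A short sketch of the arithmetic, with the function name `dilution_series` chosen only for illustration:

```python
from fractions import Fraction


def dilution_series(first_denominator: int, fold: int, steps: int) -> list:
    """Return the dilutions 1/d, 1/(d*fold), ... for the given number of steps."""
    return [Fraction(1, first_denominator * fold ** k) for k in range(steps)]


# Assumed scheme: start at 1/30 and dilute threefold ten times in total.
series = dilution_series(30, 3, 10)
print([f"1/{d.denominator}" for d in series])
# The last entry is 1/590490, matching the upper bound quoted in the text.
```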

Results

First, cDNA fragments of 105, 346, and 205 bp were amplified from P. maxima, I. perna, and P. penguin, respectively, using degenerate primers. We then performed 5′ and 3′RACE to obtain full-length cDNA sequences of P. maxima (PmAspein; 2,124 bp, accession number; AB685319), I. perna (IpAspein; 1,489 bp, accession number; AB685320), and P. penguin (PpAspein; 1,508 bp, accession number; AB685321) (Figs. 1, 2). BLASTN results showed that the sequences have the highest similarities to P. fucata (PfAspein) with low e-values, suggesting their homologous relationship.

Fig. 2

a Comparison of the amino acid sequences of Aspein homologs. The amino acids in D-domain are highlighted in different colors. The phylogenetic tree shows evolutionary relationships of the species examined (Tëmkin 2010). b Numbers of the main amino acids in D-domain

We then determined conserved regions and domains in the Aspein sequences. All Aspeins have a predicted signal peptide comprising 19 amino acid residues (Fig. 1), and the signal peptides show high sequence similarity to one another. All four Aspeins contain a Ser-Glu-Pro (SEP) motif and Asp-Ala (DA) repeats near the N-terminus (Fig. 1). In all Aspein homologs, the aspartic acid-rich D-domain occupies most of the sequence (up to 84 % of the full length) (Fig. 1). In PfAspein, PmAspein, IpAspein, and PpAspein, the proportions of Asp in the D-domains are approximately 67 %, 86 %, 70 %, and 69 %, respectively. Although all D-domains have similar amino acid compositions, they showed relatively low sequence similarity (Figs. 1a, 2). The predicted isoelectric points of mature PmAspein, IpAspein, and PpAspein, excluding the putative signal peptide, were 1.30, 2.11, and 1.96, respectively.
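To illustrate why an Asp-rich sequence yields such a low predicted isoelectric point, the sketch below estimates pI by finding the pH at which the Henderson-Hasselbalch net charge crosses zero. The pKa values are illustrative textbook-style approximations and are an assumption of this example. The parameters used by Genetyx are not given in the text, so the numbers produced here are not expected to reproduce the reported values.

```python
# Illustrative pKa values (assumed; not the parameter set used by Genetyx).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "Nterm": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "Cterm": 2.3}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch model)."""
    charge = 0.0
    for group, pka in POSITIVE_PKA.items():
        n = 1 if group == "Nterm" else seq.count(group)
        charge += n / (1.0 + 10 ** (ph - pka))
    for group, pka in NEGATIVE_PKA.items():
        n = 1 if group == "Cterm" else seq.count(group)
        charge -= n / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisection on pH; net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy sequences only; not real Aspein sequences.
print(round(isoelectric_point("D" * 200), 2))
print(round(isoelectric_point("D" * 20 + "G" * 10), 2))
```

In this model, adding more Asp residues pushes the zero-charge point to lower pH, which is the qualitative trend behind the very low values reported for the Aspein homologs.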

In the ELISA immunodetection experiment, our specific antiserum reacted positively with the extracts from the prismatic layer of the shells of P. fucata, P. maxima, I. perna, and P. penguin (Fig. 3). In contrast, the strength of the reaction with the shell extracts of the nacreous layer of all four species examined was as low as the EDTA negative control (Fig. 3). These results showed that Aspein exists only in the calcitic prismatic shell layer.

Fig. 3

Immunological binding curves for the shell extracts. Error bars represent standard errors. Symbols without error bars indicate that standard errors are within the size of the symbols

Discussion

Characteristics of Aspein Homologs and Their Localization Within the Shell

Aspein is an exceptionally acidic shell matrix protein originally identified from a bivalve species, the pearl oyster P. fucata. In this study, we identified, characterized, and performed evolutionary analyses of Aspein homologs from three other bivalves, namely P. maxima, I. perna, and P. penguin. With relatively low isoelectric points, these proteins are unusually acidic, just like PfAspein. The results of the immunoassay on the two shell layers of all four species studied, including P. fucata, indicated that they are all shell matrix proteins that exist in the calcitic prismatic shell layer. Although the exact specificity of the antibody is yet to be tested by western blot, the antibody showed no reaction with the shell extracts from the snail Euhadra brandtii and the cephalopod Nautilus pompilius (data not shown), or with the extracts from the nacreous shell layer of P. fucata. These results are consistent with the hypothesis that Aspein plays a role in the formation of the calcitic prismatic layer.

Evolutionarily Conserved Regions of Aspein Indicate the Existence of Functionally Important Domains

Sequence comparisons and domain predictions showed the existence of conserved domains among all Aspein sequences. All Aspein sequences determined in this study have a signal peptide, the SEP motif, the DA repeat motif, and the aspartic acid-rich D-domain. The signal peptide is essential for the proteins to be secreted out of the cells. This is consistent with the results of ELISA, which showed that Aspein proteins exist extracellularly. The SEP motif was conserved among all Aspeins, suggesting that this motif is functionally important. This motif is known to be present in mammalian type-1 gonadotropin-releasing hormone (GnRH) receptors, where it helps the receptor to interact selectively with the mammalian GnRH-1 ligand (Wang et al. 2004; Song et al. 2006). This suggests that the motif might play a role in protein–protein interactions. The SEP motif possibly functions by helping Aspein to interact with other shell matrix proteins. Meanwhile, the DA repeat motif is likely to have a Ca2+ binding capability. This is indicated by previous work by Takeuchi et al. (2008), which showed that Stains-All stained the recombinant peptide containing the DA repeat blue.

Conserved Positions of Asp are Possibly Unimportant for the D-domain’s Function

The D-domain, which is also believed to have a Ca2+ binding capacity (Takeuchi et al. 2008), was conserved among these Aspein sequences. The D-domains are composed of a high proportion of Asp (Fig. 2). The D-domains of PfAspein, IpAspein, and PpAspein have more than 200 Asp residues, and that of PmAspein has 345 (Fig. 2). The D-domain also contains approximately 50 glycines (Gly), dispersed among the Asp residues (Fig. 2). However, there are few similarities in the repeat patterns of poly(Asp) and the location of Gly (Figs. 1a, 2a). These observations indicate that only the high proportion of Asp and Gly is important for the function of the D-domain, whereas their specific arrangement is probably not important.

Regularly repeating sequences in acidic shell matrix proteins were thought to work as a template for nucleating crystals in shells (Weiner and Hood 1975). In this model, acidic shell matrix proteins are predicted to have (Asp-Y)n repeats, where Y represents serine or glycine. If this polypeptide adopts the β-sheet conformation, the distance from one aspartic acid residue to the next is nearly consistent with the Ca2+–Ca2+ distance in calcium carbonate. Ca2+ ions are arranged regularly by binding to each aspartic acid, and the calcium carbonate crystal then starts to grow from the arranged Ca2+. However, this repeating sequence is not conserved among the acidic shell matrix proteins determined to date. In addition, the repeating patterns of Asp among Aspein homologs differ significantly even among the closely related species used in this study, suggesting that no specific sequence acting as a template for nucleation is required for the function of these acidic shell matrix proteins.
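One simple way to look for the (Asp-Y)n pattern predicted by the template model is a regular-expression scan, sketched below. The minimum of two consecutive repeats, the function name `template_runs`, and the toy sequence are assumptions made for illustration. This is not an analysis performed in the study.

```python
import re

# (Asp-Y)n with Y = Ser or Gly; requiring n >= 2 is an arbitrary threshold.
TEMPLATE_RUN = re.compile(r"(?:D[SG]){2,}")


def template_runs(protein: str) -> list:
    """Return (start index, matched run) for each (Asp-Y)n stretch found."""
    return [(m.start(), m.group()) for m in TEMPLATE_RUN.finditer(protein)]


# Made-up sequence for illustration; not a real D-domain.
toy = "DDDDGDSDGDSDDDDDGDDDD"
print(template_runs(toy))  # [(3, 'DGDSDGDS')]
```

A D-domain dominated by uninterrupted poly(Asp) stretches returns few or no such runs, which is the kind of observation that argues against a strict (Asp-Y)n template.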

The High Proportions of Asp and Gly in the D-domain

The frequencies of other amino acids in the D-domains of these species are also significantly different. For example, the D-domains of PfAspein and PpAspein are rich in serine (S), but those of PmAspein and IpAspein are poor in serine (Fig. 2). PmAspein, IpAspein, and PpAspein have about ten asparagines (N), but PfAspein has only one asparagine residue (Fig. 2). Our results also showed that the D-domains of the congeneric species P. fucata and P. maxima vary considerably in the number of amino acids and their arrangement (Fig. 2). Such a high level of sequence variation between closely related congeneric species hints that Aspein evolves relatively rapidly. These facts suggest that only the high proportion of Asp and Gly is important for the function of the D-domain, whereas the presence of other amino acids is not crucial.

Evolutionary Processes of Aspein

Comparison of the D-domain sequences clearly shows differences in amino acid composition. Point mutations are thought to have occurred frequently in the D-domains (Fig. 4a). One point mutation occurring in the Asp codon GAT or GAC can convert it to Gly (GGT or GGC), Asn (AAT or AAC), Ala (GCT or GCC), Glu (GAA or GAG), or Val (GTT or GTC), all of which are amino acids contained in the D-domains. These point mutations may have caused the variations in the amino acid composition and the repeat pattern of poly(Asp) in the D-domains.
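The single-step substitutions listed above can be checked by enumerating every codon that differs from GAT or GAC at exactly one position and translating it with the standard genetic code. The sketch below does this. The function name `single_step_neighbors` is chosen only for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_step_neighbors(codon: str) -> dict:
    """Map each codon one point mutation away to the amino acid it encodes."""
    result = {}
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                result[mutant] = CODON_TABLE[mutant]
    return result


for codon in ("GAT", "GAC"):
    neighbors = single_step_neighbors(codon)
    print(codon, sorted(set(neighbors.values())))
# Both codons give ['A', 'D', 'E', 'G', 'H', 'N', 'V', 'Y'].
```

Besides Gly, Asn, Ala, Glu, and Val, the enumeration also returns His (CAT or CAC), Tyr (TAT or TAC), and the synonymous change between GAT and GAC, which the text does not list.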

Fig. 4

Models of motif evolution in the D-domain. a Point mutations. The boxed amino acid residues evolved by single nucleotide changes (green letters). b The boxed amino acid sequence was inserted by replication slippage. c The boxed amino acid sequences were exchanged with each other by unequal crossover. For clarity, the evolutionary processes are shown at the amino acid level in b and c, but they actually occur at the nucleotide level

Interestingly, in the D-domains of PfAspein and PpAspein, serine often occurs in di- or tri-amino acid repeats with other amino acids. For example, many of the serines and glycines in the D-domain of PfAspein occur as SG motifs (Fig. 2a). In addition, the D-domain of PpAspein has several SNG and SN motifs (Fig. 2a). It is therefore unlikely that these repeat motif expansions occurred by point mutation alone. The presence of characteristic repeat motifs in the D-domain suggests that the domain underwent several partial expansions and contractions (Fig. 4b, c). There are examples of tandem repeat sequences explained by replication slippage and unequal crossover (e.g. McKnight et al. 2008; McKnight and Fisher 2009). This might also have been the case with Aspein. Besides point mutations, replication slippage and unequal crossover were probably responsible for the evolution of the repeating patterns in the D-domain.

Conclusion

Aspein is one of the acidic shell matrix proteins, with potentially important functions in calcitic shell formation. However, since Aspein had been characterized from only a single species of pearl oyster (P. fucata), it was difficult to perform further evolutionary analyses. In this study, we identified Aspein homologs from three species closely related to P. fucata, and characterized the evolutionarily conserved regions (signal peptide, SEP motif, DA repeat motif, and D-domain). We found that although a large number of Asp residues in the D-domain was conserved among the species examined, their arrangement was not conserved. This high level of variation may be due to expansions and contractions of the D-domain by replication slippage and unequal crossing over. The variation also strongly suggests that a specific arrangement of Asp is not important for the function of the D-domain. Therefore, at least in Aspeins, our results do not support the template theory, which holds that a specific arrangement of Asp in acidic shell matrix proteins is crucial as a template for nucleating crystals in shells. To our knowledge, our study is the first to report evolutionary analyses and functional domain identification of Aspein, or of any highly acidic shell matrix protein.