Background

The explosive radiation of metazoan taxa during the Proterozoic–Cambrian transition was shaped by the appearance of several innovations, including the ability to form biomineralized skeletons. Among them, the calcium carbonate shell, which protects the mollusc soft tissues, constitutes an excellent model for studying the process of biomineral formation and its evolution. The wide morphological diversity of shell-bearing molluscs (bivalves, gastropods, cephalopods, monoplacophorans and scaphopods; more than 100,000 species (Ponder and Lindberg 2008)) also extends to a tremendous diversity of shell micro-textures, including “prismatic”, “nacreous”, “foliated”, “cross-lamellar”, “granular”, “composite-prismatic” and “homogeneous” structures (Bøggild 1930; Carter 1990; Chateigner et al. 2000). Despite this diversity, all molluscan shell layered structures are extracellular and synthesised according to the same physiological pathway: they result from the secretory activity of an evolutionarily homologous organ known as the mantle. In short, the mantle epithelium extrudes the ionic precursors of the shell minerals (Bielefeld et al. 1992), together with an extracellular ‘cell-free’ organic matrix that is incorporated into and surrounds nascent CaCO3 crystals during shell growth. Although the organic shell matrix represents only a small part of the CaCO3 shell weight (between 0.1 and 5% w/w, depending on the species and microstructure), it is well known to be essential for the control of biomineral formation (Mann 1988). It is, in particular, involved in the arrangement of the organic framework (Sudo et al. 1997), in the regulation of CaCO3 precipitation (Wheeler et al. 1981) and in the control of the crystal polymorph, aragonite and/or calcite (Falini et al. 1996).
The biochemical characteristics of the organic matrix, usually purified and studied following decalcification of the shell, indicate that it comprises a heterogeneous set of macromolecules including chitin, hydrophobic ‘framework’ proteins, soluble proteins and glycoproteins (Crenshaw 1972; Weiner and Traub, 1984; Lowenstam and Weiner 1989; Keith et al. 1993; Levi-Kalisman et al. 2001; Bédouet et al. 2001; Marie et al. 2007, 2009a).

Because of its exceptional toughness (Jackson et al. 1988, Berthelat 2010), its commercial value and its remarkable biocompatibility when implanted in vivo (Atlan et al. 1997; Westbroek and Marin 1998), nacre is among the most studied mineralized biomaterials (Mann 2001). Many authors consider it the reference model for understanding, at the micro- and nano-scales, how molluscs control the regular deposition of calcium carbonate crystals (Rousseau et al. 2005; Lin and Meyers 2005; Checa et al. 2009a; Gilbert et al. 2008). Nacre, also called mother-of-pearl, is the calcified internal layer of several mollusc shells. The mature nacreous layer consists of superimposed aragonitic tablets around 0.5 μm thick, embedded in a thin peripheral organic matrix (Nakahara 1991; Addadi et al. 2006; Nudelman et al. 2008; Weiss 2010). The prisms, which are almost always associated with nacre, are composed of calcitic or aragonitic needle-like structures of various lengths and diameters; they always constitute the external calcified layer of shells, and grow inward by accretion of crystal units on the inner surface of the periostracum (Marin et al. 2007; Checa et al. 2005). The individual prisms are packed together in an insoluble and hydrophobic organic sheath, which forms a honeycomb-like structure. They also contain a small intracrystalline organic fraction (Marin et al. 2005).

The nacro-prismatic shell microstructure assemblage appeared in the Early Cambrian (Carter 1990; Feng and Sun 2003; Vendrasco et al. 2010). Since then, it has apparently remained almost unchanged. Nowadays, nacro-prismatic microstructures are represented in at least three mollusc classes: bivalves, gastropods and cephalopods. In monoplacophorans, true nacreous layers are potentially observed in only one extant genus (Checa et al. 2009b). One key question is whether these microstructures are constructed from similar matrix protein assemblages, i.e. whether they share a common origin. If so, one can wonder whether the shell matrix proteins (SMPs) are conserved across the different taxa.

Answering these questions is laborious, but will shed light on the process of recruitment of SMPs in the Cambrian, and on the evolutionary constraints exerted on these proteins during the Phanerozoic. Since the elucidation of the primary structure of Nacrein (Miyamoto et al. 1996), the first-described nacre protein, the number of SMP sequences has increased, but remains limited (Marin et al. 2008). Currently, pursuing the discovery of mollusc SMPs is particularly promising because genomic and transcriptomic resources are expanding rapidly for many phylogenetically diverse species that are also experimentally tractable (Weiss and Schönitzer 2006; Jackson et al. 2007a; Suzuki et al. 2009, Auzoux-Bordenave et al. 2010; Mamangkey and Southgate 2009; Inoue et al. 2010). Given the high proportion of novel genes being reported from non-model EST datasets and the flood of sequence data from next-generation technologies, proteomic approaches are important for the validation and annotation of coding sequences. This is especially relevant in the field of molluscan biomineralization, where the characterised biomineral-associated proteins have no known homologues in any model species (Marin et al. 2008). There is already good precedent for such work in global investigations of mollusc shell proteins based on transcriptomics (Jackson et al. 2006, 2007a, 2010; Wang et al. 2010), proteomics (Marie et al. 2009b, 2010a, b), or both (Joubert et al. 2010). It is important to note that these works mostly concern two mollusc genera, the pearl oyster Pinctada spp. and the abalone Haliotis spp., which account for more than 90% of the molecular data for mollusc SMPs (Marin et al. 2008). Puzzlingly, despite the microstructural resemblance between the nacro-prismatic shell layers of these two genera, no sequence homology has been reported so far between their SMPs.
Moreover, a recent comparative EST approach performed on Haliotis and Pinctada calcifying tissues has highlighted clear differences in the gene sets devoted to the control of shell formation in the two genera (Jackson et al. 2010). Interestingly, Joubert et al. (2010) have shown that some biomineral-related proteins from gastropods share sequence similarity with proteins deduced from pearl oyster mantle ESTs, but until now no direct evidence of their involvement in the shell formation process has been produced. On the other hand, although most of the protein domains of mollusc SMPs do not exhibit sequence similarity with other known proteins, a few of them present striking domain homology with known extracellular matrix (ECM) proteins from vertebrates, suggesting a deep Precambrian origin (545+ Ma). For example, N66/Nacrein presents two carbonic anhydrase domains (Miyamoto et al. 1996), Perlustrin shows similarities with insulin-like growth factor binding proteins (Weiss et al. 2001), Pif-177 presents a von Willebrand A domain (Suzuki et al. 2009), Perlucin exhibits a C-type lectin domain (Mann et al. 2000), and Perlwapin possesses whey acidic protein (WAP) domains (Treccani et al. 2006).

In the present study, we have investigated the SMPs of the edible mussel Mytilus, in order to compare them with those of the above-mentioned taxa. Despite the recent increasing interest in mytilid shells in the context of ocean acidification (Miller et al. 2009; Gazeau et al. 2010), very few works focus on the calcifying shell organic matrix of representatives of this group (Weiner et al. 1977; Weiner 1983; Keith et al. 1993). To date, only one extrapallial fluid protein has been described (Hattan et al. 2001; Yin et al. 2005) and almost no data exist on Mytilus SMPs (Weiner 1983; Keith et al. 1993). By combining a proteomic approach based on the parallel investigation of the SMPs of three closely related species of edible mussel (M. edulis, M. galloprovincialis and M. californianus) with the interrogation of the EST datasets recently published for this genus (Tanguy et al. 2008; Vernier et al. 2009; Craft et al. 2010), we report here for the first time the primary structure of nine SMPs associated with the nacreous and the prismatic shell microstructures of Mytilus. These include three novel proteins (one of them probably corresponding to the N-terminus of P21, which was partially characterised by Keith et al. (1993)), four proteins homologous to Pinctada SMPs, and two proteins homologous to Haliotis SMPs. These results constitute the first report of conserved SMPs between Bivalvia and Gastropoda. We discuss the function of such proteins in the calcifying matrix, the molecular evolution of SMP genes and the origin of mollusc nacro-prismatic SMPs.

Materials and Methods

Shell Matrix Extraction

Fresh adult M. edulis, M. galloprovincialis and M. californianus shells (6–12 cm in length) were collected from the Brittany coast (France), the Adriatic coast (Croatia) and the Californian coast (USA), respectively. Superficial organic contaminants as well as the periostracum were removed by incubating intact shells in NaOCl (1%, v/v) for 24 h. Shell calcified layers (nacreous + prismatic layers) were then thoroughly rinsed with deionised water, dried and then crushed into a fine powder (>200 μm). All subsequent extractions were performed at 4°C as previously described (Marin et al. 2005), with some modifications. Shell powder samples were decalcified overnight in cold dilute acetic acid (5%, v/v), which was slowly added by an automated titrator (Titronic Universal, Schott, Mainz, Germany) at a flow rate of 100 μL every 5 s. The solutions (final pH around 4.2) were centrifuged at 3,900×g (30 min). The resulting pellets, corresponding to the acid-insoluble matrices (AIMs), were rinsed 6 times with MilliQ water, freeze-dried and weighed. The supernatants, comprising the acid-soluble matrices (ASMs), were filtered (5 μm) before being concentrated with an Amicon ultrafiltration system on a Millipore® membrane (YM10; 10 kDa cut-off). The concentrated solutions (about 5–10 mL) were extensively dialysed against 1 L of MilliQ water (3 days, several water changes) before being freeze-dried and weighed.

Protein Cleavage

The trypsin digestion of the AIMs from the calcified shell layers (nacre + prisms) of M. edulis, M. galloprovincialis and M. californianus was performed in solution (Marie et al. 2008, 2009a). The samples (0.1 mg) were reduced with 25 μL of 10 mM dithiothreitol in 50 mM NH4HCO3 for 30 min at 50°C. Alkylation was performed with 50 μL of 50 mM iodoacetamide in 50 mM NH4HCO3 for 30 min at room temperature in the dark. Then the solution was treated with 1 μg of trypsin (Sequence grade, Promega, USA) in 10 μL 50 mM NH4HCO3 overnight at 37°C. The sample was dried in a vacuum concentrator and re-suspended in 30 μL of 0.1% trifluoroacetic acid and 4% CH3CN.
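The peptides expected from such a digestion can be predicted in silico. The following is a minimal sketch, not the authors' pipeline, assuming the usual trypsin rule (cleavage C-terminal to Lys or Arg, except before Pro); the function name and the missed-cleavage parameter are illustrative. It is applied to CAAVTVNKK, one of the Mgal-Perlwapin peptides shown in Fig. 5, to illustrate how the two observed peptides CAAVTVNK and CAAVTVNKK can differ by a single missed cleavage.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0):
    """Return tryptic peptides, allowing up to `missed_cleavages` skipped sites.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    # Zero-width split points located after K/R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        last = min(start + missed_cleavages, len(fragments) - 1)
        for end in range(start, last + 1):
            # Joining consecutive fragments models missed cleavage sites.
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("CAAVTVNKK"))
    print(tryptic_digest("CAAVTVNKK", missed_cleavages=1))
```

Allowing one missed cleavage returns both the fully cleaved peptide and the longer Lys-Lys form, which is why search engines are usually run with a small non-zero missed-cleavage setting.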

Mass Spectrometry Analysis

Mass spectrometry was performed using a Q-Star XL nanospray quadrupole/time-of-flight tandem mass spectrometer, nanospray-qQ-TOF–MS/MS (Applied Biosystems, France), coupled to an online nano liquid chromatography system (Ultimate Famos Switchos from Dionex, The Netherlands). One microlitre of sample was loaded onto a trap column (PepMap100 C18; 5 μm; 100 Å; 300 μm × 5 mm, Dionex), washed for 3 min at 25 μL min−1 with 0.05% trifluoroacetic acid/2% acetonitrile, then eluted onto a C18 reverse phase column (PepMap100 C18; 3 μm; 100 Å; 75 μm × 150 mm, Dionex). Peptides were separated at a flow rate of 0.300 μL min−1 with a linear gradient of 5–80% acetonitrile in 0.1% formic acid over 120 min. MS data were acquired automatically using Analyst QS 1.1 software (Applied Biosystems). Following an MS survey scan over m/z 400–1600, MS/MS spectra were sequentially and dynamically acquired for the three most intense peptide molecular ions over m/z 65–2000. The collision energy was set by the software according to the charge and mass of the precursor ion. The MS and MS/MS data were recalibrated using internal reference ions from a trypsin autolysis peptide at m/z 842.51 [M + H]+ and m/z 421.76 [M + 2H]2+.
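The two calibration ions quoted above are the same autolysis peptide observed at two charge states, which can be verified with the relation m/z = (M + z × m_proton) / z. The short sketch below is only an illustration of that arithmetic; the function names are ours, and the proton mass is the standard value.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass M from an observed m/z and charge state."""
    return mz * charge - charge * PROTON


def mz_at_charge(mass, charge):
    """m/z expected for a neutral mass M carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    m = neutral_mass(842.51, 1)            # from the [M + H]+ calibrant
    print(round(mz_at_charge(m, 2), 2))    # expected [M + 2H]2+ position
```

Starting from the singly charged ion at m/z 842.51, the doubly charged ion is expected at about m/z 421.76, in agreement with the second reference value.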

MS Data Analysis

Protein identification was performed using the MASCOT search engine (Matrix Science, London, UK; version 2.1) against a database comprising the approximately 70,000 nucleotide sequences derived from the EST libraries of Mytilus spp. (mainly represented by about 5,000, 19,000 and 42,300 sequences from M. edulis, M. galloprovincialis and M. californianus, respectively), downloaded (March 2010) from the NCBI server (http://www.ncbi.nlm.nih.gov). LC–MS/MS data were searched using carbamidomethylation as a fixed modification, and methionine oxidation as a variable modification. The peptide mass and fragment ion tolerances were set to 0.5 Da. The peptide hits were manually confirmed by the interpretation of the raw LC–MS/MS spectra with Analyst QS software (Version 1.1). Quality criteria were the peptide MS value, the assignment of major peaks to uninterrupted y- and b-ion series of at least 3–4 consecutive amino acids and the match with the de novo interpretations proposed by the software.
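The effect of these search parameters can be illustrated on the three Mgal-Perlwapin peptides whose precursor m/z values are given in the legend of Fig. 5. The sketch below is not MASCOT's algorithm: it only computes theoretical monoisotopic masses with carbamidomethylated cysteine and tests them against a 0.5 Da window. It assumes that the precursors are doubly charged (the legend does not state the charge) and that the tolerance applies to the neutral peptide mass; all names are illustrative.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # fixed modification on every Cys
OXIDATION = 15.994915        # variable modification on Met


def peptide_mass(seq, oxidised_met=0):
    """Neutral monoisotopic mass with all Cys carbamidomethylated."""
    mass = sum(MONO[aa] for aa in seq) + WATER
    mass += CARBAMIDOMETHYL * seq.count("C")
    mass += OXIDATION * oxidised_met
    return mass


def mz(mass, charge):
    return (mass + charge * PROTON) / charge


def matches(observed_mz, seq, charge, tol_da=0.5, oxidised_met=0):
    """True if the observed precursor fits the peptide within tol_da (on mass)."""
    observed_mass = observed_mz * charge - charge * PROTON
    return abs(observed_mass - peptide_mass(seq, oxidised_met)) <= tol_da


if __name__ == "__main__":
    # Observed m/z values from the Fig. 5 legend; charge 2+ is an assumption.
    for seq, observed in [("CAAVTVNK", 431.74), ("FNCLFQK", 478.74),
                          ("CAAVTVNKK", 495.78)]:
        print(seq, round(mz(peptide_mass(seq), 2), 3), matches(observed, seq, 2))
```

Under these assumptions the three theoretical values fall within about 0.02 m/z units of the observed ones, well inside the 0.5 Da window.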

Sequence Analysis

Protein sequence identification was attempted using BLASTp and tBLASTn analysis performed against Swiss-Prot, GenBank’s nrdb and dbEST using the online tool provided by UniProt (www.uniprot.org) and NCBI (http://blast.ncbi.nlm.nih.gov/blast.cgi) servers. Signal peptides were predicted using SignalP 3.0 (http://www.cbs.dtu.dk/services/SignalP/), and conserved domains were predicted using SMART (http://smart.embl-heidelberg.de/) and InterProScan (http://www.ebi.ac.uk/Tools/InterProScan/). The sequence alignments were performed with Clustal-W or hierarchical-clustering algorithms using UniProt (www.uniprot.org) or the MULTALIN (http://bioinfo.genotoul.fr/multalin/multalin.html) online tools, using default parameters.

Phylogenetic Analysis

Representative complete sequences of the major non-vertebrate metazoan CAs were selected from the results of BLAST searches performed with Mcal-CA, using UniProt and NCBI online tools, against Swiss-Prot, GenBank’s nrdb and dbEST, or the specific BLAST tool available from the Lottia gigantea genome web site (http://genome.jgi-psf.org/pages/blast.jsf?db=Lotgi1). These selected sequences were compared with the molluscan mantle-secreted and non-secreted CAs and the M. californianus sequence detected in the current data set. The multiple alignment was created using T-Coffee (Notredame et al. 2000) set to standard parameters, and a phylogenetic reconstruction was then performed using the maximum likelihood method PhyML (Guindon and Gascuel 2003) from the www.phylogeny.fr server (Dereeper et al. 2008). Accession numbers of the sequences used are: Elysia timida HP152215; Plakobranchus ocellatus HP204215; Strongylocentrotus purpuratus XP001179236; Capitella teleta EY522037; Lottia gigantea Lotgi1|238082|, Lotgi1|239188| from genome assembly (http://genome.jgi-psf.org/Lotgi1/Lotgi1.download.ftp.html); Crassostrea gigas CU996533; Pinctada maxima EZ420150, Q9NL38; Pinctada fucata Q27908; Mytilus californianus P86856; Haliotis gigantea BAH58349, BAH58350; Turbo marmoratus Q8N0R6; Callinectes sapidus A3FFY1; Penaeus monodon A9XTM5; Culex quinquefasciatus B0W447; Drosophila simulans Q3YMV3; Riftia pachyptila Q8MPH8; Nematostella vectensis A6QR76; Amphimedon queenslandica A6QR75, A6QR76, A6QR77; Ectocarpus siliculosus D8LB10; Chlamydomonas reinhardtii P20507.

Results

The Mytilus Shell

As in other Mytilidae, the outer wall of the shell of Mytilus exhibits a multi-layered organo-mineral structure (Fig. 1). While the thin external layer, called the periostracum, is mostly organic and gives the shell its dark brown colour (Fig. 1a), the rest of the shell is highly calcified and composed of an outer prismatic and an inner nacreous layer (Mutvei 1980; Feng et al. 2000; Dalbeck et al. 2008). Prisms are calcitic micro-needles, oblique to the external shell surface, that are enveloped by an organic sheath (Fig. 1b). The nacre consists of the brick-wall-like superimposition of aragonitic tablets around 0.5 μm thick, embedded in a thin peripheral organic matrix, together forming a cohesive framework (Fig. 1c). We carefully removed the periostracum by sodium hypochlorite treatment. After decalcification of the shell powder with cold 5% acetic acid (4°C), we extracted the matrix exclusively associated with the whole calcified layers of M. edulis, M. galloprovincialis and M. californianus. The AIMs represent around 1% by weight of the shell powder, while the ASMs represent only 0.05–0.1% by weight of the shell powder, and were not further investigated here.

Fig. 1

The shell layers of the mussel Mytilus galloprovincialis. a General external view of the shell. b Scanning electron micrograph of texture detail of the external prismatic layer. c Scanning electron micrograph of texture detail of the internal nacreous layer

Proteomic Analysis

In order to investigate the largest part, if not all, of the protein set of Mytilus shell calcified layers, the non-fractionated AIM materials derived from the nacre + prism samples of M. edulis, M. galloprovincialis and M. californianus (representing around 95% of the total AIM + ASM amount) were analysed in the same way by LC–MS/MS, after trypsin digestion. For all samples, the peak list generated from the MS/MS spectra was directly interrogated against the Mytilus EST database using MASCOT software (Version 2.1). Using this approach, we were able to identify nine proteins of the Mytilus shell matrices (Table 1). The three novel proteins that presented no homology with already known shell proteins, or with putatively biomineral-related EST sequences, were called MUSP-1, MUSP-2 and MUSP-3, for Mytilus Uncharacterised Shell Proteins. Some of these proteins were detected in the shells of all three Mytilus species (MUSP-3 and chitin-binding-like), in two shell matrices (MUSP-1) or in only one of the three species. The MS/MS spectra corresponding to matching peptides were individually checked to confirm their peptide sequences. No additional peptides were identified by including phosphorylation as a variable modification during the MASCOT searches, indicating that specific enrichment and LC–MS procedures are needed to analyse these post-translational modifications. We also found that all conceptually translated EST sequences that match our MS/MS peptides and cover the N-terminus possess a signal peptide. This indicates that these bioinformatically predicted proteins are likely to include the entire N-terminus, and are genuinely secreted by the mantle epithelium.

Table 1 Identification of the shell matrix proteins of Mytilus by MS/MS analysis

Novel Mollusc SMPs

Three of the nine proteins that we have identified here (MUSP-1, MUSP-2 and MUSP-3) do not exhibit sequence similarity with other already described SMPs. MUSP-1 [P86853] was detected in M. galloprovincialis shell matrix. Its EST (FL490251) encodes a C-terminally incomplete sequence of a protein at least 181 AA long, presenting a 22-AA long signal peptide (Fig. 2a). Interestingly, two MUSP-1 tryptic peptides were also detected in M. edulis shell matrix, indicating that a homologous protein is also present in other Mytilus species, or at least in M. edulis. The two M. californianus EST sequences, GE755963 and GE750813, encode similar incomplete protein sequences that can be aligned into a single contig (Supplementary data S1), giving one complete sequence of MUSP-2 [P86858], a 341-AA long protein containing a 24-AA long signal peptide (Fig. 2b) that was detected in M. californianus shell matrix. When the signal peptide is removed from the Mcal-MUSP-2 sequence, the resulting protein exhibits a theoretical molecular mass of 36 kDa and a calculated pI around 11. No significant hit could be found for MUSP-1 and MUSP-2 when sequence similarities and protein domains were searched using BLASTp and tBLASTn, and the SMART domain tool, respectively. These data suggest that MUSP-1 and MUSP-2 are entirely novel proteins associated with nacro-prismatic structures that very likely have no known homologous protein in other metazoan taxa, including pteriomorphid bivalves.

Fig. 2

Sequence analysis of novel shell proteins: MUSP-1, MUSP-2 and MUSP-3. a The AA sequence of Mgal-MUSP-1 [P86853] is deduced from the translation of the EST entry FL490251. b The AA sequence of Mcal-MUSP-2 [P86858] is deduced from the alignment of the translated sequences of the entries GE755963 and GE750813 in a unique contig (Supplementary data S1). c The AA sequence of Mcal-MUSP-3 [P86859] is deduced from the translation of the EST entry GE749275. The predicted signal peptides are underlined. The peptides identified by MS/MS are indicated in red/grey. The asterisks mark the stop codons. Missing sequence information is indicated by “?”. d Alignment of the AA sequence of MUSP-3 [P86859] with the sequence deduced from the EST PmaxCL82Contig1 (EZ420213) of Pinctada maxima. The predicted signal peptides are underlined. Conserved AA positions are shaded in blue (Color figure online)

On the other hand, MUSP-3 [P86859] was detected in the shell matrix of the three Mytilus species. Its EST (GE749275) encodes the complete sequence of a 174-AA long protein with a 16-AA long signal peptide (Fig. 2c). When the signal peptide is removed from the Mcal-MUSP-3 sequence, the resulting protein exhibits a theoretical molecular mass of 17 kDa and a calculated pI around 10. Although no significant hit was obtained with BLASTp or protein domain searches, we noticed that the N-terminal sequence of MUSP-3 presents a remarkable 82% sequence similarity (but with an insignificant E-value score) with the 30-AA long N-terminus of the P21 protein (Q9TWS3), previously described from the soluble matrix of the M. edulis shell (Keith et al. 1993). Additionally, a tBLASTn search with MUSP-3 against all metazoan EST sequences indicates remarkable sequence identity (Fig. 2d) with the EST PmaxCL82Contig1 (EZ420213), which encodes a putative protein predicted to be secreted by the mantle of the pearl oyster Pinctada maxima (Jackson et al. 2010), and which has not, to date, been detected in calcified shell layers using a similar proteomic approach (B. Marie, unpublished data). These observations suggest that MUSP-3-related proteins constitute a novel family of conserved pteriomorphid proteins.
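As a rough plausibility check on the theoretical masses quoted for Mcal-MUSP-2 and Mcal-MUSP-3, the mature chain length (total length minus signal peptide) can be multiplied by an average residue mass. The sketch below assumes the common rule of thumb of about 110 Da per residue; it is not the calculation used in this study, which would rely on the exact amino acid composition, and the names are illustrative.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)

# (total length in AA, signal peptide length in AA, reported mass in kDa),
# as stated in the text for the two complete novel proteins.
REPORTED = {
    "Mcal-MUSP-2": (341, 24, 36.0),
    "Mcal-MUSP-3": (174, 16, 17.0),
}


def mature_length(total_aa, signal_aa):
    """Number of residues left after signal peptide removal."""
    return total_aa - signal_aa


def approx_mass_kda(n_residues):
    """Approximate molecular mass from chain length alone."""
    return n_residues * AVG_RESIDUE_DA / 1000.0


if __name__ == "__main__":
    for name, (total, signal, reported) in REPORTED.items():
        n = mature_length(total, signal)
        print(name, n, round(approx_mass_kda(n), 1), reported)
```

The estimates (about 35 and 17 kDa for mature chains of 317 and 158 residues) are within a few percent of the reported values, as expected for an approximation that ignores composition.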

Bivalve-Conserved SMPs

We have identified four Mytilus SMPs that exhibit high sequence homology with SMPs extracted from the nacro-prismatic shells of Pinctada bivalves (carbonic anhydrase, MSI60, chitin-binding and fibronectin). Peptides matching Mcal-carbonic anhydrase (Mcal-CA) [P86856] were detected in the M. californianus shell samples (Fig. 3a). The putative ORF for this protein was deduced from the alignment of the ESTs GE751262 and GE749008 in a unique contig (Supplementary data S2). The conceptually derived protein sequence of Mcal-CA is incomplete at the N-terminus and is 321 AA long. The Mcal-CA sequence exhibits a characteristic CA domain with high sequence identity at the AA positions that are conserved in metazoan CAs and involved in the catalytic activity of the enzymatic domain, suggesting that Mcal-CA is an active CA (Fig. 3b). Figure 3c shows the phylogenetic reconstruction of the relationships between different mollusc and metazoan CAs. This analysis clearly indicates that Mcal-CA belongs to a group of molluscan mantle-secreted CAs, likely to be included in the shell or involved in shell deposition, that is distinct from other molluscan non-secreted CAs.

Fig. 3

Sequence analysis of Mytilus shell carbonic anhydrase (CA). a The protein sequence of Mcal-CA [P86856] is deduced from the alignment of the translated sequences of the entries GE751262 and GE749008 in a unique contig (Supplementary data S3). The peptides identified by MS/MS are indicated in red/grey. Missing sequence information is indicated by “?” and the asterisk marks the stop codon. b Sequence alignment of a partial sequence of Mcal-CA with representative CAs from various metazoan taxa. The enzymatic domains (shaded in purple/grey) are conserved in all represented CA forms. c Phylogram of CAs from various metazoan taxa. Included in the phylogenetic analysis are representative metazoan CAs, the molluscan mantle-secreted and molluscan shell-extracted CAs involved in biocalcification, other molluscan CAs, and two outgroup sequences (brown and green algae). Numerals at each node show local likelihood ratio values estimated by PhyML. The scale bar indicates an evolutionary distance of 0.8 AA substitution per position in the sequences. The Mytilus shell-extracted CA [P86856] is shown by a black arrow. (SP) and (-) indicate that the protein presents a characteristic signal peptide sequence or not, respectively. Stars indicate that proteins were extracted from calcified shell layers (Color figure online)

Interestingly, our results confirm the presence of CA in other pteriomorph bivalve shells, but also indicate that the Mytilus shell CA does not present the GN-repeat sequences characteristic of Pinctada Nacrein proteins (Smith-Keune and Jerry 2009). However, in spite of the absence of GN repeats in Mytilus shell CA, the best sequence alignment was observed with the enzymatic CA domain of Nacrein, the shell-specific CA from Pinctada fucata (Fig. 4a). The specific role of CA in the extracellular calcifying matrix of bivalve nacre is still puzzling and may be related to the fine regulation of ionic balance in the vicinity of the forming biomineral structure.

Fig. 4

The bivalve-conserved shell proteins detected in Mytilus shell matrices and their homologous proteins from Pinctada shells. a Sequence alignment of Mcal-CA (alignment of GE751262 and GE749008 in a unique contig, Supplementary data S1) with Nacrein from Pinctada fucata [Q27908]. b Sequence alignment of Mcal-MSI60 [P86857] (GE749643) with MSI60 from Pinctada fucata [O02402]. c Sequence alignment of Mcal-Chitin-binding [P86860] (alignment of ES393395 and ES393550 in a unique contig, Supplementary data S1) with chitin-binding from Pinctada maxima (PmaxCL21Contig1 EZ420121). C-bind = chitin-binding. d Sequence alignment of Mcal-Fibronectin [P86861] (GE759315) with Fibronectin from Pinctada maxima (PmaxCL366Contig1 EZ420486). FN3 = Fibronectin of type 3. The conserved AA positions are shaded in blue/grey. The asterisks mark the stop codons (Color figure online)

Four different peptides corresponding to the partial sequence of Mcal-MSI60 [P86857] were detected in the shell matrix of M. californianus. The EST GE749643 encodes the 189-AA long C-terminal sequence of a poly-Ala protein that presents high sequence similarity with MSI60 (Fig. 4b), previously described from the shell of Pinctada fucata (Sudo et al. 1997). MSI60 is a nacre-specific insoluble framework protein that exhibits 11 poly-Ala blocks and 39 poly-Gly blocks dispersed throughout the sequence. The poly-Ala blocks confer on MSI60 a structural similarity with silk fibroins.

The ES393395 and ES393550 EST sequences can be aligned in a unique contig sequence (Supplementary data S3), and the resulting sequence encodes the C-terminus of a 294-AA long incomplete sequence of Mcal-Chitin-binding [P86860], which was detected in the calcified shell layers of M. edulis, M. galloprovincialis and M. californianus. A SMART search for protein domains indicates that it contains a Peritrophin-A chitin-binding domain (Pfam:CBM_14). Interestingly, the result of the tBLASTn search indicates a high sequence similarity (if not a true homology) with a putative protein (Fig. 4c) encoded by the EST PmaxCL21Contig1 EZ420121 (Jackson et al. 2010), which was detected in parallel by a similar proteomic analysis of the nacreous layer of the pearl oyster Pinctada maxima (Marie B., unpublished data).

Additionally, one peptide corresponding to the partial putative sequence of Mcal-Fibronectin [P86861], conceptually deduced from the EST GE759315, was detected in the shell layer of M. californianus. This EST encodes a 224-AA long N-terminal sequence presenting a 17-AA long signal peptide, for which a SMART search indicates the presence of a fibronectin type 3 domain (FN3). The tBLASTn search indicates a high sequence similarity with a putative fibronectin-containing protein (Fig. 4d), encoded by the EST PmaxCL366Contig1 EZ420486 (Jackson et al. 2010), which was also detected by proteomic analysis of the prismatic layer of the pearl oyster Pinctada maxima (Marie B., unpublished data).

Bivalve/Gastropod-Conserved SMPs

One of the most interesting results of our study was the detection in Mytilus shell matrix of two proteins that present high sequence homology with SMPs extracted from the nacro-prismatic shells of Haliotis gastropods, constituting the first report of conserved SMPs between the two taxa. Three and two different peptides corresponding to the sequences of Mgal-Perlwapin [P86855] and Mgal-Perlucin [P86854], respectively, were detected in the shell matrix of M. galloprovincialis. Figure 5 illustrates the de novo sequencing of the three peptides observed for Mgal-Perlwapin. Following signal sequence removal, Mgal-Perlwapin and Mgal-Perlucin are characterised by theoretical pIs of 10 and 6, and theoretical molecular weights of 14 and 16 kDa, respectively. The Mgal-Perlwapin EST (FL494664) encodes a 141-AA long protein, presenting a 19-AA long signal peptide and two consecutive whey acidic protein domains (WAP). Mgal-Perlwapin presents high sequence similarity with Perlwapin proteins. Interestingly, Perlwapin proteins have been previously described from the shell of the gastropods H. laevigata (Treccani et al. 2006) and H. asinina (Marie et al. 2010b). These shell-extracted Perlwapins exhibit three successive WAP domains, and their sequence alignment with Mgal-Perlwapin (Fig. 6a) indicates a good conservation of the Cys residues of the WAP domains that are potentially involved in protease inhibitor function. On the other hand, the Mgal-Perlucin EST (AJ624413) encodes a 156-AA long protein, presenting a 20-AA long signal peptide and a characteristic C-type lectin domain (CTL). Mgal-Perlucin presents high sequence similarity with the C-type lectin domain-containing proteins from various metazoans and especially with the Perlucin protein (Fig. 6b) that has been previously described from the shell of the gastropod H. laevigata (Mann et al. 2000). The alignment of these two shell-extracted Perlucin sequences (Fig. 6b) shows a good conservation of the Cys- and Trp-rich regions that are involved in the Ca2+-dependent carbohydrate recognition ability of C-type lectins.

Fig. 5

Example of de novo sequencing for the three peptides matching with Mgal-Perlwapin sequence. a MS/MS spectrum of the CAAVTVNK peptide (m/z 431.74). b MS/MS spectrum of the FNCLFQK peptide (m/z 478.74). c MS/MS spectrum of the CAAVTVNKK peptide (m/z 495.78). The de novo sequencing was performed by considering precise mass differences between adjacent b and y ion series
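The principle stated in this legend, reading residues from the mass differences between adjacent fragment ions, can be sketched as follows. This is a simplified illustration rather than the software's algorithm: it generates the theoretical singly charged b and y ions of CAAVTVNK (with carbamidomethylated Cys, as in the searches) and then assigns each gap between consecutive y ions to a residue. The function names and the 0.01 Da matching window are illustrative assumptions.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # added to every Cys


def residue_masses(seq):
    return [MONO[aa] + (CARBAMIDOMETHYL if aa == "C" else 0.0) for aa in seq]


def b_ions(seq):
    """Singly charged b1..b(n-1): N-terminal prefixes plus one proton."""
    ions, total = [], PROTON
    for mass in residue_masses(seq)[:-1]:
        total += mass
        ions.append(total)
    return ions


def y_ions(seq):
    """Singly charged y1..y(n-1): C-terminal suffixes plus water and a proton."""
    ions, total = [], WATER + PROTON
    for mass in reversed(residue_masses(seq)[1:]):
        total += mass
        ions.append(total)
    return ions


def residues_for_gap(delta, tol=0.01):
    """Unmodified residues whose mass matches a gap between adjacent ions."""
    return sorted(aa for aa, mass in MONO.items() if abs(mass - delta) <= tol)


if __name__ == "__main__":
    y = y_ions("CAAVTVNK")
    for low, high in zip(y, y[1:]):
        print(round(high - low, 4), residues_for_gap(high - low))
```

Read from y1 upwards, the gaps spell the sequence from the C-terminus (N, V, T, V, A, A after the C-terminal Lys). Leu and Ile share the same mass and cannot be distinguished this way, and Lys and Gln differ by only about 0.036 Da, so their assignment depends on the mass accuracy of the instrument.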

Fig. 6

The bivalve/gastropod conserved shell proteins detected in Mytilus shell matrices and their homologous proteins from Haliotis shells. a Sequence alignment of Mgal-Perlwapin [P86855] (FL494664) with Perlwapin from H. laevigata and H. asinina, [P84811] and [P86730], respectively. b Sequence alignment of Mgal-Perlucin [P86854] (AJ624413) with Perlucin from Haliotis laevigata [P82596]. The conserved AA positions are shaded in green/grey. The asterisks mark the stop codons (Color figure online)

Discussion

Mollusc Nacro-Prismatic Shells

Nacre and prisms appear to be evolutionarily conserved microstructures that have been observed in the shells of numerous molluscs from the early Cambrian to the present day. At first sight, these microstructures are described by simple terms, ‘prism’ on the one hand and ‘nacre’ on the other. However, these terms are misleading. For example, while gastropod and cephalopod nacres are described as “columnar”, bivalve nacre is described as “sheet nacre”, with a characteristic arrangement of nacre tablets in a ‘brick-wall’ manner (Nakahara 1991). Furthermore, the arrangements of the three axes that characterise their crystal orientations differ between the mollusc classes (Chateigner et al. 2000, 2010). Indeed, both gastropod and bivalve nacres orient the c axis perpendicular to the shell surface; on the other hand, the b axis is oriented in the direction of shell growth in bivalves (Chateigner et al., 2000), whereas, in gastropod nacre, the b and c axes are gradually co-oriented from the prismatic boundary (Gilbert et al. 2008). These crystallographic differences suggest that, at the molecular level, the modes of deposition of aragonite platelets could differ between the mollusc clades.

Different Sets of Nacro-Prismatic SMPs

The organic matrices extracted from the nacro-prismatic shells of several gastropods, cephalopods and bivalves have been the subject of many investigations, since they were believed to control the deposition of the calcified shell layers (Crenshaw 1972; Mann 1988; Lowenstam and Weiner 1989). As amino acid analyses of the nacro-prismatic shells of different molluscs showed similar compositions, with characteristically high contents of Gly, Ala and Asx residues (Keith et al. 1993), it was postulated that the molecular mechanism controlling the formation of these different shell layers is identical from clade to clade. However, the discovery of an increasing number of SMPs has revealed an unexpected diversity of nacro-prismatic associated proteins among the different taxa (Marin et al. 2008), showing this idea to be an oversimplification. Furthermore, a preliminary comparative proteomic approach on four nacreous molluscs has suggested that the nacre protein contents of these four genera differ (Marie et al. 2009b). More recently, Jackson et al. (2010) have demonstrated, by using a specific EST approach, drastic differences in the respective shell-building gene sets of the bivalve Pinctada and of the gastropod Haliotis. As underlined by these authors, the data suggest that “the Bivalvia and the Gastropoda have either independently evolved the ability to deposit nacre or that subsequent to the genesis of the ability in a common ancestor, bivalves or gastropods have significantly modified the molecular mechanism that guide this process” (Jackson et al. 2010). We generally agree with this statement, which is based solely on two genera, but feel that it should be balanced by the new data on Mytilus, especially those that show unexpected similarities between mussel and abalone Perlwapins and Perlucins.

Figure 7a summarises the approximately sixty SMPs that are now known for the three main models of nacro-prismatic molluscs, the bivalves Mytilus and Pinctada, and the gastropod Haliotis. Besides the four SMPs common to Pinctada and Mytilus (CA, MSI60, Chitin-binding and Fibronectin) and the two SMPs common to Pinctada and Crassostrea (Fibronectin and EGF-like, described in Marie et al. 2011), we notice that the distribution of these SMPs follows a “mosaic pattern”, which means that some SMPs are absent from at least one of the studied models (Fig. 7b). The first example is that of N14/N16/Pearlin, which is present only in the pearl oyster shell matrix. N14/N16/Pearlin represents one of the main components of the Pinctada SMP set, and it is believed to be essential in the deposition of the nacreous layer (Samata et al. 1999; Kono et al. 2000). We searched for N14/N16/Pearlin homologues in the Mytilus EST database (and also in the Crassostrea and Haliotis EST databases), using both BLASTn and tBLASTn, and no hit was observed. Taken together, these data suggest that no detectable homologue of N14/N16/Pearlin is present in Mytilus, highlighting significant differences with Pinctada in the molecular mechanisms of nacre deposition. Similarly, the homologues of other Pinctada SMPs (e.g. Pif-177, Shematrin, Tyrosinase and KRMP) were not detected by our proteomic approach in Mytilus shell. However, we cannot exclude the possibility that the Mytilus EST dataset does not represent the whole shell-forming transcriptome. Indeed, the efficiency of such a proteomic approach relies largely on the completeness of the EST dataset. In this study, we exploited a pool of around 70,000 different ESTs from various tissues of M. edulis, M. galloprovincialis and M. californianus (Tanguy et al. 2008; Vernier et al. 2009; Craft et al. 2010). We noticed that, although homologous proteins were detected in different Mytilus shells (e.g. MUSP-1, MUSP-3 and Chitin-binding), the corresponding mRNAs appeared in the ESTs of only one of these species (Table 1), attesting to important qualitative differences between their respective EST datasets. Indeed, important variations in biomineralising gene expression are likely to occur between individuals according to their developmental stage (Jackson et al. 2007a), their physiological condition or even the time of day (Miyazaki et al. 2008). This point should be carefully considered, especially when sampling calcifying tissues for transcriptomic analysis. Moreover, we are aware that the EST datasets used in this study are not exhaustive, and future efforts will likely reveal additional SMPs. For example, a recent proteomic analysis of the calcified skeleton of the sea urchin Strongylocentrotus purpuratus revealed an unexpected diversity of matrix proteins, owing to the availability of a large dataset from the Spur_v2.1 draft genome (Mann et al. 2008a, b).
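
The “mosaic pattern” described above can be made concrete with a small presence/absence sketch in Python. It is an illustration only: the sets contain just the proteins explicitly named in this section (not the roughly sixty SMPs of Fig. 7a), and a protein missing from a set means "not reported for that genus in this section", which is an assumption of the sketch and not evidence of true absence. The helper names are hypothetical.

```python
# ILLUSTRATIVE SUBSET: only SMPs named in the text of this section.
# Absence from a set means "not reported here", not demonstrated absence.
SMP_SETS = {
    "Mytilus": {"CA", "MSI60", "Chitin-binding", "Fibronectin",
                "Perlwapin", "Perlucin"},
    "Pinctada": {"CA", "MSI60", "Chitin-binding", "Fibronectin",
                 "N14/N16/Pearlin", "Pif-177", "Shematrin",
                 "Tyrosinase", "KRMP"},
    "Haliotis": {"Perlwapin", "Perlucin"},
}


def shared(genus_a, genus_b):
    """SMPs reported for both genera."""
    return SMP_SETS[genus_a] & SMP_SETS[genus_b]


def mosaic():
    """SMPs not reported for at least one of the models."""
    everything = set().union(*SMP_SETS.values())
    return {protein for protein in everything
            if any(protein not in members for members in SMP_SETS.values())}


def presence_table():
    """Rows of (protein, [present in each genus]) in a stable order."""
    genera = sorted(SMP_SETS)
    everything = sorted(set().union(*SMP_SETS.values()))
    return genera, [(p, [p in SMP_SETS[g] for g in genera])
                    for p in everything]


if __name__ == "__main__":
    genera, rows = presence_table()
    print("protein".ljust(18), *genera)
    for protein, flags in rows:
        marks = ["+".center(len(g)) if f else "-".center(len(g))
                 for g, f in zip(genera, flags)]
        print(protein.ljust(18), *marks)
    print("Mytilus/Pinctada:", sorted(shared("Mytilus", "Pinctada")))
    print("Mytilus/Haliotis:", sorted(shared("Mytilus", "Haliotis")))
```

Within this reduced set, every protein falls in the mosaic category, which is the pattern that Fig. 7b displays for the full SMP list.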

Fig. 7

Evolution of the composition of the calcifying matrix in molluscs. a Comparison of shell matrix composition between the nacro-prismatic shell models Pinctada, Haliotis and Mytilus. Black and grey boxes indicate when the proteins were isolated from the prismatic or the nacreous layer, respectively. b Presence/absence mosaic pattern of mollusc SMPs set against the phylogenetic relationships of the main models. “?” indicates that although CA was observed in the mantle epithelial cells, no CA was directly observed in the shell of Haliotis spp. (Le Roy et al., unpublished). Fibronect = Fibronectin; Chit-bind = chitin-binding; HUSP = Haliotis Uncharacterised Shell Protein

CA is a ubiquitous metalloenzyme, essential in calcification processes (Wilbur and Jodrey 1955; Medakovic 2000), which catalyses the production of bicarbonate ions that subsequently react with calcium ions to form calcium carbonate. This enzyme has been observed in an increasing number of calcifying epithelia and has also been extracted from calcified biominerals of a wide range of metazoan species (Rahman et al. 2005; Tambutté et al. 2007; Jackson et al. 2007b; Mann et al. 2008). In molluscs, a shell-specific form of CA, Nacrein, containing both a CA active domain and long GN-repeats, has been isolated from the shell matrix of the pearl oyster Pinctada (Miyamoto et al. 1996; Kono et al. 2000). Similarly, a Nacrein-related protein sequence has been described from the analysis of the mantle mRNA of the gastropod Turbo marmoratus (Miyamoto et al. 2003), but to date no CA has been directly detected in the shell nacre of this gastropod or in cephalopod nacre (Marie et al. 2009a). On the other hand, the proteomic investigation of the SMPs of the limpet Lottia gigantea revealed the presence of two different CAs in the calcifying matrix of this gastropod. As shown in Fig. 3c, the phylogenetic reconstruction of non-vertebrate metazoan CAs clearly distinguishes two groups of mollusc CAs, the mantle-secreted CAs and the non-secreted CAs, suggesting a common origin in bivalves and gastropods for the recruitment of a specific CA for the process of shell deposition. Surprisingly, and in spite of numerous studies, no shell CA has ever been detected in Haliotis shell matrices (for review see Marin et al. 2008; Jackson et al. 2010; Marie et al. 2010b; Le Roy et al., unpublished data). However, we cannot exclude the possibility that one or more mantle-secreted CAs are specifically involved in the shell deposition process but remain absent from the shell-integrated matrix protein set, as suggested by molecular biology data (Le Roy et al., unpublished data). This would indicate that major calcifying matrix proteins, although always in contact with the mineralisation front, may ultimately not be integrated within the shell matrix during the deposition of the calcified shell layers.

Interestingly, our study emphasises the apparent absence of EP (extrapallial protein, Q6UQ16) from the Mytilus SMP set, in spite of the presence of EP mRNA in the Mytilus EST dataset used for our proteomic investigation. EP is a His-rich Ca2+-binding glycoprotein that was clearly detected in the extrapallial fluid of Mytilus edulis by both MALDI and ESI MS/MS techniques, and that was previously supposed to be part of the SMP set (Hattan et al. 2001; Yin et al. 2005). Our data suggest that this protein is not incorporated within the shell, although it is the main component of the extrapallial fluid and appears to interact strongly with calcium ions.

Evolution and Origin of Mollusc SMPs

The fact that Perlwapin and Perlucin homologues have been observed in the nacro-prismatic shell matrix of both Mytilus and Haliotis may suggest that these proteins were present in the calcifying matrix of the last common ancestor of conchiferan molluscs. Alternatively, we cannot exclude the possibility that these proteins were recruited independently in each class. As the sequence similarities of these proteins are restricted to the active site residues (which is what allows them to be assigned to their respective families), it is still difficult to determine whether they are orthologues or paralogues, and more work at the gene level should be performed to give an unambiguous answer.

Interestingly, the similarities detected between the bivalve and gastropod Perlwapins and Perlucins echo previous findings on putatively conserved shell protein domains, such as the Kunitz-like domains detected in both Pinctada and Haliotis (Liu et al. 2007; Marie et al. 2010b). Here again, such domains may be inherited from the last common ancestor of bivalves and gastropods, or may result from independent recruitments. So far, we cannot draw definitive conclusions on the evolutionary scenario for the calcifying matrix proteins of the different conchiferan molluscs.

SMP Description by Interrogating EST Dataset with a Proteomic Approach

Our observations demonstrate the value of EST libraries when used in conjunction with a shotgun proteomic approach for the investigation of calcifying matrix proteins. By establishing that several of the proteins predicted from the EST dataset are actually components of the shell, we are able to make hypotheses about their direct contribution to shell construction and about the implications of their evolution among calcifying shell matrices. Without such proteomic confirmation, the EST dataset is simply a list of sequences that can be associated with putatively secreted proteins, for which functional assumptions can only be made from sequence similarity with already described proteins; on its own, it is of little value for the description of novel proteins.

The main contribution of this article is to take the current push towards sequencing huge numbers of ESTs to the next step, namely to extract functional information from these genes with respect to biomineral formation. The challenge that now faces the field is to characterise the function of novel biomineral-associated proteins, using in vivo or in vitro techniques.

Conclusion

Aside from the significant differences in the molecular mechanisms used by the bivalve Pinctada and the gastropod Haliotis for nacro-prismatic shell deposition (reviewed in Marin et al. 2008, 2010b; Jackson et al. 2010), we observed that the shell protein set of the nacro-prismatic bivalve Mytilus is partly similar to that of other bivalves, but also shares a few similarities with that of the gastropod Haliotis. The evolutionary picture that emerges is, for the moment, patchy. We suggest that the mollusc SMP sets may follow a mosaic phylogenetic pattern, which implies that the integration of mantle-secreted proteins into the shell may be a complex phenomenon that does not simply follow the taxonomic position of the species considered. We believe, furthermore, that important molecular functions for shell calcification may not be represented in the shell once it is formed.

The origin and evolution of molluscan SMPs appear to be complex. Understanding them will require large-scale comparisons across the whole phylum Mollusca, that is, accurate, systematic and wide sampling of mollusc species, in order to decipher the whole protein set (mantle-secreted proteins, extrapallial fluid proteins and shell-incorporated matrix proteins) involved in shell calcification.