Introduction

Presently, members of two major lophotrochozoan phyla, Mollusca (the mollusks) and Brachiopoda (the brachiopods), form a heavily biomineralized (calcified) external tissue called the shell (Crenshaw 1989; Cowen 2013; Wernström et al. 2022). For the animals that still retain this structure, the shell is important because it provides structural support and protects the soft body parts (Lowenstam and Weiner 1989; Simkiss and Wilbur 1989). The presence of the shell might have given the mollusks a survival advantage, contributing to their biodiversity (ca. 85,000–200,000 known extant species) and to their successful adaptation to various environments, from the deep sea to shallow seas and freshwater, and from terrestrial to subterranean habitats (Lindberg 2001; Ponder et al. 2019; Prié 2019). As the group’s major synapomorphy (Kocot et al. 2017), the shell evolved early in molluscan evolution, in the Early Cambrian (ca. 525 million years ago) (Budd 2003; Jackson et al. 2010; Vinther 2015). Of the seven presently recognized classes of extant mollusks (Aplacophora, Polyplacophora, Gastropoda, Bivalvia, Scaphopoda, Monoplacophora, and Cephalopoda) (Kocot et al. 2020), six (i.e., all except Aplacophora) are shelled (Sigwart and Sutton 2007). Interestingly, however, the spicules on the skin of Aplacophora are thought to form through the precipitation of calcium carbonate on proteins secreted by the skin/mantle tissue (Woodland 1907; Beedham and Trueman 1968), suggesting that they may be homologous to the shell (e.g., Scheltema 1993; Todt and Wanninger 2010; McDougall and Degnan 2018; Wanninger and Wollesen 2019), but were probably extremely reduced during evolution (Kocot et al. 2019). Meanwhile, some members of the six shelled classes have reduced or lost their shells, such as the teredinid shipworms (Bivalvia) and the nudibranch sea slugs (Gastropoda). The non-shell-forming mollusks, however, apparently retain their shell matrix protein-coding genes in their genomes (Setiamarga et al. 2021; Yoshida et al. 2022).

Although the exact details of the mechanisms of shell formation and biomineralization, and of their evolution in mollusks, are still unclear (Furuhashi et al. 2009; Pomar 2020), recent studies have provided some key insights. First, the molluscan shell is composed mainly of calcium carbonate crystals, such as aragonite and calcite, arranged in several superimposed layers in different arrangements, with bioorganic molecules mixed in (Spann et al. 2010; Frenzel and Harper 2011; Marin et al. 2012). This is because the molluscan shell is formed through an organic matrix-mediated process (Weiner et al. 1983), in which cells and tissues secrete biomolecules, such as proteins, that form organic layers which later act as a framework for inorganic crystal nucleation and mineral salt precipitation (Lowenstam 1981; Isowa et al. 2012). The involvement of proteins in biomineralization and shell formation, including morphogenesis at the micro, macro, and molecular levels, suggests that these processes are genetically controlled (Carter 1990). Although present only in trace amounts inside the shell, these proteins (called shell matrix proteins; SMPs) apparently play essential roles in shell formation and structural maintenance (Addadi et al. 2006; Marin et al. 2013). For example, Pif, a major SMP, was shown to be crucially involved in nacre formation (Suzuki et al. 2009), while the highly acidic protein Aspein (Isowa et al. 2012) promotes calcite precipitation (Takeuchi et al. 2008). Recent technological developments in the large-scale analysis of biomolecules using multiomics approaches, such as genome sequencing, transcriptomics, and proteomics, have allowed for the identification of relatively comprehensive lists of SMPs in various species of mollusks (e.g., Marie et al. 2011, 2012, 2017; Mann and Edsinger 2014; Liu et al. 2015; Feng et al. 2017; Liao et al. 2015; Zhang et al. 2018; Zhao et al. 2018; Ishikawa et al. 2020).

Except for the early branching Nautiloidea (the nautiloids), all members of extant Cephalopoda (the cephalopods; e.g., vampire squids, octopuses, squids, and cuttlefishes) have evolutionarily internalized, degenerated, or completely lost their shells (Kröger et al. 2011; Setiamarga et al. 2021; Setiamarga 2021) (Fig. 1A). Phylogenetically, the cephalopods are included in the subphylum Conchifera (the conchiferans), a monophyletic group composed of five of the seven molluscan classes, which form calcified external shells (i.e., Gastropoda, Bivalvia, Scaphopoda, Monoplacophora, and Cephalopoda) (Kocot et al. 2011, 2020; Smith et al. 2011). Many extinct cephalopods, such as the ammonites (Fig. 1B) and some extinct nautiloids (Fig. 1C), also had prominent biomineralized shells (Kröger et al. 2011; Ponder et al. 2019; Setiamarga 2021). Thus, to obtain a more complete general picture of the SMPs and their evolution in mollusks, information and insights from diverse cephalopods are important. Moreover, because extant cephalopods show various degrees of shell degeneration, a comparative study would allow for the elucidation of gene losses and recruitments during shell evolution, and the identification of key genes in molluscan shell formation.

Fig. 1

A Cladogram of shelled mollusks. B Reconstruction of an ammonite. C Reconstruction of Orthoceras, an extinct nautiloid. D Scanning electron micrographs of N. pompilius shell microstructure

The nautiloids (Nautiloidea sensu stricto) are an early branching clade of the cephalopods (Saunders 1981). There are eight presently recognized extant nautiloid species in a single family (Nautilidae): Nautilus pompilius, N. belauensis, N. macromphalus, N. stenomphalus, N. vitiensis, N. samoaensis, N. vanuatuensis, and Allonautilus scrobiculatus (Wray et al. 1995; Barord et al. 2023). Extant nautiloids are considered living fossils because they retain many ancestral traits of the cephalopods, including the external, chambered shell (Fig. 1D) (Ward et al. 1984; Fortey 2011; Kröger et al. 2011; Setiamarga 2021). Previously, we conducted a multiomics study on the SMPs of one of the eight surviving nautiloid species, N. pompilius, combining transcriptomics of the mantle tissue with proteomics of the shell to identify its SMPs (Setiamarga et al. 2021). By comparing the N. pompilius SMPs obtained with those of other conchiferans, we found that most protein domains were conserved and present in the putative ancestral conchiferans. At around the time our study was concluded, two draft genomes of N. pompilius were reported (Zhang et al. 2021; Huang et al. 2022), with the former study also identifying 78 SMPs. Because the two N. pompilius SMP studies used different methods, the sets of SMPs they identified overlap only partially (Fig. 2A). Therefore, the two SMP data sets must be compiled and curated to obtain a more complete view of the proteins, their evolution, and their functions during shell formation. Comparisons with the SMPs of other conchiferans, including other cephalopods, will help to shed light on the evolution of biomineralized shells in mollusks at the molecular and genetic levels, and on how such an important morphological character was lost during the evolution of the cephalopods.

Fig. 2

Newly compiled data of N. pompilius shell matrix proteins (SMPs). A Comparison between the number of identified SMPs in Setiamarga et al. (2021) and Zhang et al. (2021). The newly compiled data re-identified a total of 85 proteins. B Schematic view of the domain structures in the 85 re-identified SMPs of N. pompilius. C Sequences of structural proteins with repeat motifs rich in Gly/Ala. Gly, Ala, and Asp are shown in red, blue, and green letters, respectively. Signal peptides are underlined

Materials and methods

Nautilus pompilius sequence data acquisition and compilation

The shell matrix proteins (SMPs) of N. pompilius were data-mined from two previously published data sets (Setiamarga et al. 2021; Zhang et al. 2021). We re-identified the full sequences of the SMPs from both studies by mapping them onto the draft genome sequence reported by Huang et al. (2022) using BLASTp searches, and then compared the two data sets. Afterward, the domain structures of the identified sequences were predicted using the online versions of SMART (Letunic et al. 2021; http://smart.embl-heidelberg.de/smart/change_mode.pl; accessed in November 2022) and InterProScan (Jones et al. 2014; https://www.ebi.ac.uk/interpro/search/sequence/; accessed in November 2022). Signal peptides were predicted using SignalP 5.0 (Nielsen et al. 2019; Almagro Armenteros et al. 2019; https://services.healthtech.dtu.dk/service.php?SignalP-5.0; accessed in November 2022).

Comparative analysis of shell matrix proteins

Sequence data of shell matrix proteins (SMPs) of 13 mollusks were collected from previously published data, including those of two cephalopods, the ram's horn squid Spirula spirula (Oudot et al. 2020) and the pharaoh cuttlefish Sepia pharaonis (Liu et al. 2023). The complete list of the molluscan species compared in this study is shown in Table S1. Reciprocal local BLASTp searches were conducted to identify proteins shared among these species. A genomic search using local BLASTp on validated gene models was conducted to look for homologs of molluscan SMP-coding genes in the genome of N. pompilius. Reciprocal local BLASTp searches were also conducted among the three cephalopod species compared in this study. All searches were conducted with an E-value threshold of < 1e-20. Homologous proteins among species were visualized using Circos v 0.69 (Krzywinski et al. 2009). It should be noted that the sequence data of some of the proteins and/or genes identified by Oudot et al. (2020) have not been published. For such sequences, we based our analyses only on the reported names of the sequences, comparing them with the names assigned to our sequences by the annotation.
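The reciprocal search with an E-value cutoff can be illustrated with a minimal Python sketch. It assumes that BLASTp results were saved in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, E-value in column 11) and that "reciprocal" means reciprocal best hits; the text does not state the exact criterion used, so that part is an assumption. The function names and the toy sequence IDs are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-20  # threshold stated in the text


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Map each query to its lowest E-value subject, ignoring hits at or above the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_pairs(forward_text, reverse_text, cutoff=EVALUE_CUTOFF):
    """Return (query, subject) pairs that are each other's best hit in both directions."""
    fwd = best_hits(forward_text, cutoff)
    rev = best_hits(reverse_text, cutoff)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


def make_row(query, subject, evalue):
    # Only columns 1, 2, and 11 are used above; the other values are placeholders.
    return "\t".join([query, subject, "50.0", "100", "0", "0",
                      "1", "100", "1", "100", str(evalue), "200"])


# Toy data with hypothetical IDs (not taken from the study).
forward = "\n".join([
    make_row("npo_A", "lgi_X", 1e-50),
    make_row("npo_A", "lgi_Y", 1e-30),
    make_row("npo_B", "lgi_Z", 1e-5),   # fails the cutoff
])
reverse = "\n".join([
    make_row("lgi_X", "npo_A", 1e-45),
    make_row("lgi_Y", "npo_A", 1e-28),
    make_row("lgi_Z", "npo_B", 1e-40),
])
pairs = reciprocal_pairs(forward, reverse)
print(pairs)
```

In this toy case only the npo_A/lgi_X pair is retained: npo_B has no forward hit below the cutoff, and lgi_Y is not the best forward hit of npo_A.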

Phylogenetic analyses

Phylogenetic analyses were conducted on selected major SMPs for which homologs are present in multiple mollusks compared in this study (SOUL Domain-containing protein, CD109 antigen, Chitinase, Tyrosinase, EGF-ZP Domain-containing protein). To do so, amino acid sequences were collected from Ensembl Metazoa (https://metazoa.ensembl.org/index.html; accessed in November 2022), MolluscDB (Liu et al. 2021; http://mgbase.qnlm.ac/home; accessed in November 2022), published SMP data collected from various repositories (Table S2), and GenBank (NCBI; https://www.ncbi.nlm.nih.gov/; accessed in November 2022), using HMM (HMMER v3.3.2; Eddy 2009) and BLASTp searches. Multiple sequence alignments were performed on the MAFFT v7 server (Katoh and Standley 2013; https://mafft.cbrc.jp/alignment/server/; accessed in November 2022). Trimming to exclude poorly aligned regions was conducted in trimAl v1.3 (Capella-Gutiérrez et al. 2009), as implemented in Phylemon 2 (Sánchez et al. 2011; http://phylemon.bioinfo.cipf.es; accessed in November 2022). Amino acid substitution models were selected in MEGA X (Kumar et al. 2018) using the corrected Akaike information criterion (AICc) (Stecher et al. 2020). Maximum-likelihood (ML) phylogenetic trees were inferred using RAxML v.8.2.12 with 1,000 bootstrap replicates (Stamatakis 2014).
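As an illustration of AICc-based model selection, the following minimal Python sketch computes AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1) and picks the model with the lowest score. It is not the MEGA X implementation; treating n as the number of alignment sites is an assumption, and the log-likelihoods and parameter counts in the example are hypothetical values, not results from this study.

```python
def aicc(log_likelihood, n_params, n_sites):
    """Corrected Akaike information criterion for one candidate model."""
    if n_sites - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_sites <= n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (n_sites - n_params - 1)
    return aic + correction


def select_model(candidates, n_sites):
    """candidates maps model name -> (log-likelihood, number of free parameters)."""
    return min(candidates, key=lambda name: aicc(*candidates[name], n_sites))


# Hypothetical values for illustration only.
candidates = {
    "LG+G+I": (-10500.0, 42),
    "WAG+G+I": (-10530.0, 42),
    "JTT": (-10700.0, 40),
}
best_model = select_model(candidates, n_sites=300)
print(best_model)
```

The correction term grows as the number of parameters approaches the number of sites, so for short domain-only alignments AICc penalizes parameter-rich models more strongly than the uncorrected AIC does.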

Whenever possible, full amino acid sequences were utilized in the phylogenetic inferences. However, when impossible, sequences of one or more representative domains of the proteins/genes were used. For example, due to the diversity of domain structures of Chitinases, a reliable alignment using full sequences was not possible. Therefore, for phylogenetic analysis, only the GH18 domains were used in both alignments and phylogenetic inferences.
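The domain-only approach can be sketched as follows. This minimal Python example assumes that domain predictions are available as (name, start, end) tuples with 1-based inclusive coordinates, which is how domain search tools typically report them; the function name, the toy sequence, and the coordinates are hypothetical.

```python
def extract_domains(sequence, domains, name):
    """Return the subsequence of every predicted domain with the given name.

    Coordinates are assumed to be 1-based and inclusive.
    """
    return [sequence[start - 1:end] for dom, start, end in domains if dom == name]


# Toy protein: positions 6-10 and 15-19 stand in for two GH18 domains,
# positions 20-23 for a chitin-binding domain (ChBD).
sequence = "MAAAAGGGGGSSSSWWWWWCCCC"
domains = [("GH18", 6, 10), ("GH18", 15, 19), ("ChBD", 20, 23)]

gh18_regions = extract_domains(sequence, domains, "GH18")
print(gh18_regions)
```

Each extracted region would then be aligned as a separate entry, so that proteins with one or several GH18 domains can be compared within the same alignment.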

Microstructure observations of Nautilus pompilius shell

The shell microstructure of N. pompilius was observed using a VE-8800 scanning electron microscope (SEM) (Keyence, Osaka, Japan). Samples were prepared as follows. The shell was fragmented into small pieces (ca. 1 cm²) using a TS-45 precision cutoff machine (Maruto Testing Machine Co., Tokyo, Japan). The observation surface was polished using an FG-18 polisher (Ryobi Ltd., Hiroshima, Japan), then treated with 1 M NaOH for 10 min and etched with Mutvei’s solution for 5 min (Schöne et al. 2005). After ultrasonic cleaning using a US Cleaner USD-2R (AS ONE Corp., Osaka, Japan) for 5 min, the samples were left to air-dry overnight and then coated with osmium using an HPC-1SW osmium coater (Vacuum Device Inc., Ibaraki, Japan). SEM images were acquired at an accelerating voltage of 10–12 kV.

Results

Nautilus pompilius sequence data acquisition and compilation

Previously, we identified 61 unique shell matrix protein (SMP) sequences in the shell of N. pompilius through multiomics analysis (mantle tissue transcriptomics vs. shell proteomics) (Setiamarga et al. 2021). However, because of the lack of sequencing depth, 14 out of 61 sequences showed frameshifts most likely caused by sequencing errors, while of the 47 in-frame sequences 41 were fragmented. Meanwhile, in their report about the full draft genome of N. pompilius, Zhang et al. (2021) identified 78 SMPs. When data of the two studies were compared, only 29 sequences overlapped (Fig. 2A).

In this study, we mapped the 61 unique sequences of Setiamarga et al. (2021) to the high-quality draft genome sequence of N. pompilius (Huang et al. 2022). The sequences were identified as 46 SMP-coding genes, and their full sequences were obtained. The data set was then merged with the SMP data set of Zhang et al. (2021). As a result, we obtained a final set consisting of the full sequences of 85 SMPs (Fig. 2A, B). Among them, more than half (45 SMPs) had a signal peptide. Nine SMPs lacked any functional domain. The amino acid sequences of some SMPs (MSTRG.749, MSTRG.5336, MSTRG.8937, and MSTRG.11770) were found to be enriched in Gly/Ala (Fig. 2C). The annotation results for the 85 N. pompilius SMP sequences of this study are shown in Table S3.
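The compilation step, in which several fragmented sequences can collapse onto the same gene model before the two data sets are merged, can be sketched in Python as below. This is only an illustration of the bookkeeping, assuming that each sequence has already been assigned to a gene model; all sequence and gene IDs are hypothetical.

```python
def collapse_to_genes(seq_to_gene):
    """Collapse sequences (possibly fragments) to the set of gene models they map to."""
    return set(seq_to_gene.values())


def merge_gene_sets(genes_a, genes_b):
    """Partition two gene sets into unique and shared members, plus their union."""
    a, b = set(genes_a), set(genes_b)
    return {"only_a": a - b, "shared": a & b, "only_b": b - a, "union": a | b}


# Hypothetical example: four transcript fragments map to three gene models.
seq_to_gene = {"frag1": "geneA", "frag2": "geneA", "frag3": "geneB", "frag4": "geneC"}
other_study = {"geneB", "geneC", "geneD"}

genes = collapse_to_genes(seq_to_gene)
merged = merge_gene_sets(genes, other_study)
print(len(genes), len(merged["shared"]), len(merged["union"]))
```

Because fragments of one gene are counted once after collapsing, the size of the merged set follows from inclusion-exclusion on gene models rather than on the original sequence counts.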

Comparative analysis of conchiferan and cephalopod shell matrix proteins

In the present study, we compared the more exhaustive N. pompilius SMP data compiled here (Fig. 2A, B; Table S3) with the SMP data sets of several conchiferans to examine the evolution of molluscan biomineralization toolkit proteins (Fig. 3A; Tables S1, S2). The result indicated that homology between N. pompilius and other conchiferan SMPs was limited to around 10% or less of the proteins (Fig. 3B). These common SMPs comprise 11 types of proteins, i.e., Pif/Pif-like, Peroxidase, BPTI/Kunitz family of Serine Protease Inhibitors, Tyrosinase, Chitinase, Collagen-like, L-Amino-acid Oxidase, CD109 antigen, EGF-ZP Domain-containing protein (EGF-ZP), Carbonic Anhydrase, and Chitin Binding Domain-containing protein (ChBD) (Fig. 3C; Fig. S1). This result confirmed our previous result (Setiamarga et al. 2021), but added more shared proteins between N. pompilius and the other conchiferans. Some of these common SMPs are conserved as multiple unigenes in the N. pompilius genome, i.e., unigenes with differing domain structures that were identified as similar proteins based on the similarity of their main functional domain(s). For example, Pif/Pif-like protein comprises two types of unigenes (Pif and Pif-like), Chitinase and Tyrosinase comprise two types of unigenes each, and ChBD-containing protein comprises three types of unigenes. If these are counted as individual genes/proteins, the number of genes shared among the conchiferans, including N. pompilius, becomes 27.

Fig. 3

Common shell matrix proteins (SMPs) conserved among the conchiferans. A The list of published SMP data sets mapped on the cladogram of the organisms. A corresponding list is also presented in Table S1. B A Circos diagram depicting sequence homologies between the SMPs of N. pompilius and those of other conchiferans (E value < 1.0e−20). C Common SMPs among N. pompilius and the conchiferans. D Common SMPs among the cephalopods. Protein names written in red indicate SMPs matched in both Venn diagrams. Because the sequence data set of S. spirula (Oudot et al. 2020) is incomplete, some sequences were identified based only on their annotated names (shown in parentheses)

Two SMPs (Chitinase and Tyrosinase) were shared between two cephalopods, N. pompilius and S. pharaonis, and the conchiferans. These proteins are well known as major SMPs in the conchiferans (Fig. 3B–D; Fig. S1; e.g., Arivalagan et al. 2016; Feng et al. 2017; Shimizu et al. 2022). Meanwhile, N. pompilius and S. spirula share Actin and Peptidyl-prolyl Cis–trans Isomerase B. These proteins were also detected in the SMPs of the gastropods (Marie et al. 2013; Mann et al. 2018; Shimizu et al. 2019). Two types of proteins are shared among the three cephalopod species studied here (N. pompilius, S. pharaonis, and S. spirula) and the other conchiferans: CD109 antigen and VWA Domain-containing proteins (Pif/Pif-like/Collagen-like) (Fig. 3D; Fig. 4; Figs. S1, S2).

Fig. 4

Domain structures of Pif and Pif-like proteins

The Pif, Pif-like, and Collagen-like proteins contain von Willebrand Factor Type A (VWA) domain(s), but in different arrangements and numbers (Fig. 4). The VWA Domain-containing protein of N. pompilius, whose sequence is complete, could be recognized as Pif/Pif-like based on the structure of the protein (such as the presence, numbers, and arrangements of the VWA and ChBD domains). In contrast, the proteins of S. pharaonis and S. spirula were annotated as Collagen-like on NCBI, because both sequences have only the VWA domain(s) as their identified functional domain(s) (two for S. pharaonis and one for S. spirula). We classified Pif/Pif-like proteins and Collagen-like proteins together as VWA Domain-containing proteins (Pif/Pif-like/Collagen-like), because we cannot rule out the possibility that the registered sequences of S. pharaonis and S. spirula were incomplete and/or fragmented.

While all VWA Domain-containing proteins and CD109 antigen are also found in the conchiferans, the SOUL Domain-containing protein (SOUL) was discovered in all cephalopods, but not in the other conchiferans (Fig. 3; Fig. 5; Fig. S2). Meanwhile, four SMPs are shared only between the decapodiforms S. pharaonis and S. spirula: Hexosaminidase, Neurofilament, Calmodulin, and Arginine Kinase.

Fig. 5

Molecular phylogeny of SOUL Domain-containing proteins (SOUL). A Domain structure of the SOUL Domain-containing protein (SOUL) in the SMPs. B Number of SOUL proteins with SOUL Domain (PF04832), including both SMP and non-SMP types, conserved in published metazoan genomes and SMPs. Length of bars indicates the number of proteins in each genome. C Maximum likelihood tree of the amino acid sequences, inferred using the WAG + Γ + I model with 1000 bootstrap replicates. Bootstrap values of major clades are shown (‘-’ means less than 30%). Abbreviations: Nam: Necator americanus, Cel: Caenorhabditis elegans, Hro: Helobdella robusta, Cte: Capitella teleta, Lan: Lingula anatina, Npo: Nautilus pompilius, Sph: Sepia pharaonis, Ssp: Spirula spirula, Aar: Architeuthis dux, Aar: Argonauta argo, Omi: Octopus minor, Obi: Octopus bimaculoides, Pca: Pomacea canaliculata, Bgl: Biomphalaria glabrata, Aca: Aplysia californica, Lgi: Lottia gigantea, Gae: Gigantopelta aegis, Hla: Haliotis laevigata, Pma: Pinctada fucata martensii, Pye: Patinopecten yessoensis, Mph: Modiolus philippinarum, Bpl: Bathymodiolus platifrons, Cgi: Crassostrea gigas, Pfu: Pinctada fucata

Molecular phylogenetics of some N. pompilius SMPs

In this study, we selected five major molluscan SMPs (SOUL Domain-containing protein, CD109 antigen, Chitinase, Tyrosinase, and EGF-ZP), and conducted molecular phylogenetic analyses to understand their evolutionary history.

SOUL domain-containing protein

We found a copy of the SOUL Domain-containing protein (SOUL) in the shell matrices of the three cephalopods compared in this study, namely, N. pompilius, S. pharaonis, and S. spirula (Fig. 5A), but not in those of other mollusks (Fig. 5B). Phylogenetic and copy number analyses suggest the presence of multiple copies of SOUL-coding genes in the genomes of the animals surveyed in this study, including the brachiopod Lingula anatina (Fig. 5B, C). The SMP-type SOUL proteins found in the cephalopods form a monophyletic clade together with some of the non-SMP type, with high support (BS = 100%). The sequences were also found in the genome of N. pompilius (Fig. 5B).

CD109 antigen

Although various studies have identified a Thioester-containing protein (TEP), the CD109 antigen (Fig. 6A), as a major SMP, the actual type of this TEP has not been addressed and is thus unclear. Annotations have been based mainly on sequence similarities of the domains. In this study, we performed a phylogenetic analysis to classify 205 TEPs of conchiferan SMPs (Fig. 6B). The resulting tree supports the monophyly of all TEP types reported previously (e.g., Sekiguchi et al. 2012; Duval et al. 2020; Marquez et al. 2022). The topology recovered the two subfamilies, C3/C4/C5 and A2M, with full statistical support (bootstrap value (BS) = 100%). The A2M subfamily further diverged into six monophyletic clades: the CPAMD8 (BS = 100%), A2M (BS = 100%), MCR (BS = 100%), iTEP (BS = 100%), CD109 (BS = 50%), and molluscan TEP (mTEP) (BS = 56%).

Fig. 6

Molecular phylogeny of Thioester-containing proteins (TEPs), including CD109 antigen. A Domain structures of the Thioester-containing protein (TEP) family. B Maximum likelihood tree of the amino acid sequences, inferred using the LG + Γ + I model with 1000 bootstrap replicates. Bootstrap values of major clades are shown. Background colors of the topology correspond to the protein family names, all of which are monophyletic in this tree. Outgroup sequences were collected from Sekiguchi et al. (2012), Duval et al. (2020), and Marquez et al. (2022). The domain structures of SMP-type TEPs are shown in Fig. S3

Non-SMP molluscan (bivalve, gastropod, and cephalopod) TEPs formed five monophyletic clades (the C3/C4/C5, A2M, CD109, mTEP, and MCR clades). In addition, arthropod-specific CD109s formed the monophyletic iTEP clade (Sekiguchi et al. 2012), with the monophyletic CD109 clade, which contains several animal TEPs (e.g., those of mouse, human, and the bobtail squid), as its sister clade. Sister to the two clades was the monophyletic mTEP clade (BS = 56%), which was composed exclusively of molluscan TEPs. The SMPs of all non-cephalopod conchiferans studied here are included in this clade. Among the cephalopod TEPs, only the N. pompilius SMP TEP (which was annotated as CD109 antigen in Setiamarga et al. 2021) and the non-SMP ones were also included in the mTEP clade. The TEPs of the other cephalopods studied here, S. pharaonis and S. spirula, were included in the A2M clade.

Chitinase

Molluscan Chitinases are known to have complex and highly diverse domain structures, with one or multiple GH18 domains (PF00704) present in all Chitinases, while in some, one or several Chitin-binding domains (ChBD) are also present besides the GH18 (Table S4). Such domain structure patterns are also seen in those reported as SMPs (Fig. 7A). We found both types of Chitinases (MSTRG20089 and MSTRG20090) in the SMPs of N. pompilius, with MSTRG20090 having two GH18 domains, a signal peptide, and a ChBD domain, while the former had only a single GH18 domain (Fig. 7A).

Fig. 7

Molecular phylogeny of Chitinases. A Domain structure of Chitinase of molluscan SMPs. B Maximum likelihood tree of the amino acid sequences, inferred using the LG + Γ + I model with 1000 bootstrap replicates. Bootstrap values of major clades are shown (‘-’ means less than 30%). Outgroup sequences were obtained from Nematoda, Annelida, and Brachiopoda (Table S2). Clades with high bootstrap support (Chitinase without ChBDs groups I–VI) are highlighted in gray

The phylogenetic tree inferred using the amino acid sequences of the GH18 domains of Chitinases collected from 536 lophotrochozoans suggested that the enzymes did not split into monophyletic groups corresponding to the two types of domain structures mentioned above (Fig. 7B). Six monophyletic clades of some non-SMP Chitinases with only GH18 from all molluscan lineages (bivalve, gastropod, cephalopod; denoted as I–VI in Fig. 7B) were found (BS = 49–100%), but none of the Chitinases reported as SMPs are included in the six clades. Our phylogenetic analysis also recovered a clade containing some of the pteriomorph bivalve SMPs together with other seemingly non-SMP sequences (BS = 100%), as previously reported (Shimizu et al. 2020). Other molluscan SMP Chitinases (including those of the bivalves) are not monophyletic, but apparently formed taxon-specific clades (bivalve, gastropod, and cephalopod clades) with the non-SMP ones, albeit with very weak statistical support. This is also the case for the Chitinases of N. pompilius (MSTRG20089, containing only the GH18 domain; MSTRG20090, containing both GH18 and ChBD domains), which were found to be included in a clade together with other cephalopod Chitinases.

Tyrosinase

Tyrosinase, an oxidoreductase metalloenzyme containing a single Tyrosinase domain (PF00264), was found in all of the 22 Lophotrochozoan genomes included in this study (Fig. 8A). In mollusks, the number of Tyrosinase gene copies present in the genomes showed considerable variability, with more copies observed in bivalves than in other mollusks (Bivalvia = 24–53; Gastropoda = 1–12; Cephalopoda = 4–15). Despite the limited number of species analyzed for SMPs, the number of Tyrosinases identified as SMPs was also variable and probably included isoforms. Although not found in all of the 21 Tyrosinases identified as conchiferan SMPs, a signal peptide was apparently conserved in addition to the single Tyrosinase domain (Fig. 8B). Two copies of the Tyrosinase gene were discovered in all cephalopods studied here (Fig. 8B).

Fig. 8

Molecular phylogeny of Tyrosinases. A Number of proteins with a single Tyrosinase domain (PF00264) conserved in the genomes of animals studied here, plotted on a phylogenetic tree of the organisms compared. Length of bars indicates the number of proteins in each genome. B Domain structures of the SMP-type Tyrosinases in mollusks. C Maximum likelihood tree of the amino acid sequences, inferred using the LG + Γ + I model with 1000 bootstrap replicates. Bootstrap values of major clades are shown (‘-’ means less than 30%). Representative clades with gene number expansions in the genome are highlighted in gray. Outgroup sequences were obtained from Nematoda, Annelida, and Brachiopoda (Table S2). Nam: Necator americanus, Cel: Caenorhabditis elegans, Hro: Helobdella robusta, Cte: Capitella teleta, Lan: Lingula anatina, Npo: Nautilus pompilius, Aar: Architeuthis dux, Aar: Argonauta argo, Omi: Octopus minor, Obi: Octopus bimaculoides, Pca: Pomacea canaliculata, Bgl: Biomphalaria glabrata, Aca: Aplysia californica, Lgi: Lottia gigantea, Gae: Gigantopelta aegis, Hla: Haliotis laevigata, Pma: Pinctada fucata martensii, Pye: Patinopecten yessoensis, Mph: Modiolus philippinarum, Bpl: Bathymodiolus platifrons, Cgi: Crassostrea gigas, Pfu: Pinctada fucata. Mytilidae consist of B. platifrons and M. philippinarum. Pteriidae consist of P. fucata and P. fucata martensii

Our phylogenetic tree did not reveal a monophyletic SMP Tyrosinase, which is consistent with our previous report (Setiamarga et al. 2021) (Fig. 8C). Only a certain degree of phylogenetic clustering of the 344 Tyrosinase protein sequences analyzed in this study was observed in the tree, with almost no taxon-specificity, and in most cases, with relatively low statistical support for the nodes. For example, some of the cephalopod Tyrosinases formed two monophyletic clades, each containing both SMP and non-SMP copies of the gene (BS = 100% and 68%). Meanwhile, the heavily expanded copies of bivalve Tyrosinases formed multiple clades, with some containing the SMP and non-SMP copies of the gene.
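What it means for SMP copies not to form a monophyletic group can be made concrete with a small Python sketch. It represents a rooted tree as nested tuples and asks whether any node has exactly the given tips as its descendants. This is an illustration only: the toy topology and tip names are hypothetical and are not taken from Fig. 8C, and treating the tree as rooted is an assumption.

```python
def tips(node):
    """Return the set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Yield the tip set of every node (tip or internal) in the tree."""
    yield tips(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some node's descendants are exactly the tips in `group`."""
    target = frozenset(group)
    return any(clade == target for clade in clades(tree))


# Hypothetical rooted topology: each cephalopod SMP copy groups with a non-SMP copy.
tree = (
    (("ceph_SMP1", "ceph_nonSMP1"), ("ceph_SMP2", "ceph_nonSMP2")),
    (("biv_SMP1", "biv_nonSMP1"), "gas_nonSMP1"),
)

smp_only = is_monophyletic(tree, ["ceph_SMP1", "ceph_SMP2"])
mixed_pair = is_monophyletic(tree, ["ceph_SMP1", "ceph_nonSMP1"])
all_ceph = is_monophyletic(
    tree, ["ceph_SMP1", "ceph_nonSMP1", "ceph_SMP2", "ceph_nonSMP2"]
)
print(smp_only, mixed_pair, all_ceph)
```

In the toy tree, the two SMP copies alone are not monophyletic, whereas each SMP copy forms a clade with a non-SMP copy, which mirrors the pattern of mixed SMP and non-SMP clades described in the text.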

EGF-ZP domain-containing protein

A copy of the EGF-ZP Domain-containing protein (EGF-ZP) (Fig. 9A) was identified in N. pompilius SMPs (Setiamarga et al. 2021). However, neither EGF-ZP nor other proteins thought to form complexes with EGF-ZP in biomineralization (e.g., Shematrin, Pif, and Serine Protease Inhibitor) were detected in the SMPs of the decapodiforms S. pharaonis and S. spirula. Similar to Tyrosinase, our phylogenetic tree, which includes only the 63 protein sequences annotated as EGF-ZP from mollusks (with one annelid sequence as the outgroup), also indicated that the EGF-ZP proteins employed as SMPs do not form a monophyletic clade, although there is possible taxon-specific clustering (Fig. 9B). For example, the SMP and non-SMP EGF-ZP proteins in the cephalopods formed a monophyletic clade (BS = 100%), while those of the bivalves formed two monophyletic clades (BS = 100% and 95%).

Fig. 9

Phylogenetic tree of EGF-ZP Domain-containing proteins (EGF-ZP). A Domain structures of EGF-ZP Domain-containing proteins (EGF-ZP) in mollusks. B Maximum likelihood tree of EGF-ZP amino acid sequences, inferred using the LG + Γ + I model with 1000 bootstrap replicates. Bootstrap values of major clades are shown (‘-’ means less than 30%). An annelid sequence was used as the outgroup

Discussion

Features and evolution of the shell matrix proteins in mollusks

Since Pieter Harting, one of the pioneers in the study of biomineralization, identified the organic matrix as an essential calcareous component, numerous researchers have been examining its roles in biomineralized structures (Harting 1872; Sommerdijk and Cölfen 2010; Dauphin 2023). The nautiloids have fascinated researchers for being the only extant taxon of the cephalopods still retaining its calcified external shell with distinct structural traits like the presence of a nacreous layer, a characteristic shared with the shells of other mollusks. Because the nautiloids are early-branching cephalopods, examining their shells would allow scientists to reconstruct presumed ancestral features in the cephalopods and study their evolution. Macrostructurally, the equiangular spiral shell of the nautiloids has several distinct parts: the wall, septum, callus, septal neck, and siphuncle. Meanwhile, the main microstructures of the shell are composed of the mineralized layers of granular, prismatic, and nacreous aragonite (Lowenstam et al. 1984; Saunders and Landman 2009; Velázquez-Castillo et al. 2006; Fig. 1D) and organic layers composed of Beta Chitin and other SMPs (Weiner and Traub 1980).

Recent technological and analytical method developments in genomics, transcriptomics, and proteomics have allowed for relatively exhaustive identification of the various matrix proteins embedded in the organic layers of the shell, called shell matrix proteins, in mollusks (Fig. 3; Fig. S1; Table S1). One of the main aims of analyzing SMPs is to identify the genetic tool kits and the key players in molluscan biomineralization, which will allow for the elucidation of the actual mechanisms of biomineralization and shell formation, and of their evolution at the molecular level, besides opening up possibilities for the genetic manipulation of the identified and characterized proteins for applied purposes (such as pearl production). These studies have thus far revealed that the number of SMPs shared among different conchiferans is relatively low, especially at higher taxonomic levels, such as between Bivalvia and Gastropoda (Marie et al. 2012; Mann and Jackson 2014; Feng et al. 2017; Song et al. 2019). Those studies also suggested that the degree of homology among shared SMPs is arguably low even at the inter-species level within the two molluscan classes (Bivalvia and Gastropoda). This high level of diversity shown by the SMPs was thought to be caused by rapid evolution of the genes, probably because of differing selective pressures to adapt to the particular niches in the living environment of each species. In addition, because SMPs are close to the actual expressed phenotypes and thus most likely downstream in the gene expression networks, the selective pressures on them are relatively low; that is, as long as they maintain the spatial structures needed to perform their function, they do not have to strictly retain their primary structures. This would also allow organisms to co-opt non-homologous proteins as SMPs, as long as they have the necessary spatial and functional configurations and structures (e.g., Nakashima et al. 2019), which could probably be caused by structural and functional convergence (Pearson and Sierk 2005). Intriguingly, such differences in recruited SMPs apparently correspond to crystallographic differences, such as those in the microstructures (e.g., Jackson et al. 2010; Kocot et al. 2016). Deep homology, that is, when core sets of homologous proteins get recruited to form morphologically homologous structures in different species (Zaquin et al. 2021), might also explain why some representative proteins thought to be crucial for biomineralization, such as Nacrein, Pif, and EGF-ZP, have been identified in various bivalves and gastropods (Kocot et al. 2016). Sometimes, key proteins are so important that they get re-recruited to form similar but morphologically non-homologous structures, such as the shell-like calcified eggcase of the argonauts (e.g., Yoshida et al. 2022).

The shell matrix proteins of N. pompilius and the cephalopods provide insights into shell evolution in mollusks at the molecular level

To understand the evolution and diverse mechanisms of biomineralization in mollusks, information from the cephalopods is crucial, because the lineage has evolved the structure, but has extant members with the biomineralized shells internalized (e.g., the cuttlefish), degenerated (e.g., the squids), lost (e.g., the octopuses), and even reinvented (e.g., the argonauts). Therefore, studying the SMPs in the nautiloids is a crucial first step in elucidating the evolution of mollusk shells.

Previously, we showed that eight out of 47 N. pompilius SMPs are shared among the conchiferans (Pif/BMSP-like, CD109 antigen, Tyrosinase, EGF-like domain-containing protein, Chitinase, Peroxidase, BPTI/Kunitz domain-containing protein, Uncharacterized LOTGI_169029; Setiamarga et al. 2021). From the updated N. pompilius SMPs in the present study, we found an additional four (Carbonic Anhydrase (CA), L-amino-acid Oxidase, Collagen-like, and Peroxidase), bringing the total number of homologous SMPs in the conchiferans to 11 out of 85. We did not find one protein identified in our previous study, Uncharacterized LOTGI_169029, probably because the protein was an artifact caused by sequencing and/or assembly errors and was no longer detected when mollusk genome sequences were updated. The presence of this gene repertoire, which was probably acquired in the common ancestor of the cephalopods, gastropods, and bivalves, confirmed the presence of a core set of shell matrix proteins conserved among the three conchiferan classes. Five polypeptide fragments were identified as SMPs in N. macromphalus: Mucoperlin-like (which exhibited some characteristics of Mucin; Marin et al. 2000), MSI31/MSI60-like (which is enriched in Gly/Ala; Sudo et al. 1997), N14/N16/Pearlin-like (which contains a Heparin-binding motif, acidic regions, and a Gly/Asn-rich region; Samata et al. 1999; Montagnani et al. 2011), Nacrein-like (also known as Carbonic Anhydrase; Miyamoto et al. 2003), and Tyrosinase-like. Comparison between the two Nautilus species showed that N. pompilius and N. macromphalus share four out of the five proteins (except for N14/N16/Pearlin-like). This indicates that these major SMPs are probably involved in the nacreous shell formation in the two Nautilus species, or even possibly, in all nautiloids. Meanwhile, at present, we cannot decisively determine whether the lack of N14/N16/Pearlin-like in N. pompilius is because the protein is specific to N. macromphalus or because of an artifact.

We also conducted a comparison of the SMPs of three cephalopods with biomineralized shells, namely, N. pompilius, S. pharaonis, and S. spirula. While the shells of S. pharaonis and S. spirula are internalized and degenerated, they are considered to be true (morphological) homologs of the shell of the conchiferans, and retain some of the morphological structures seen in both the nautiloid and ammonite shells, such as the phragmocone (Kröger et al. 2011). Our findings suggest that some core matrix proteins (CD109 antigen, Tyrosinase, and Chitinase) are conserved among the cephalopods, although Tyrosinase and Chitinase were not found in the S. spirula SMPs. It should be noted that the three cephalopods studied here have different shell microstructures (Dauphin et al. 2020; Checa et al. 2022). As previously mentioned, this could suggest that the differences in the utilized SMPs among the three species are reflected in the differing shell microstructures (Jackson et al. 2010; Kocot et al. 2016). In addition, we also found that four proteins (Hexosaminidase, Neurofilament, Calmodulin, and Arginine Kinase) were shared only by the decapodiforms S. pharaonis and S. spirula. No studies on conchiferan SMPs have reported their presence thus far, suggesting that these proteins were probably recruited by the ancestral lineage leading to decapodiforms during or after the internalization of the calcified shell. Further research is still necessary to verify this hypothesis.
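The presence/absence pattern described above can be summarized with simple set operations. The following is only a minimal sketch: the sets contain just the proteins named in this paragraph (not the full SMP repertoire of each species), the variable names are hypothetical, and proteins are matched by annotated name rather than by sequence.

```python
# Partial SMP sets built only from the proteins named in the paragraph above.
# Assumption: matching is by annotated name, not by sequence similarity.
smps = {
    "N. pompilius": {"CD109 antigen", "Tyrosinase", "Chitinase"},
    "S. pharaonis": {
        "CD109 antigen", "Tyrosinase", "Chitinase",
        "Hexosaminidase", "Neurofilament", "Calmodulin", "Arginine Kinase",
    },
    "S. spirula": {
        "CD109 antigen",
        "Hexosaminidase", "Neurofilament", "Calmodulin", "Arginine Kinase",
    },
}

# Proteins detected in the shell matrices of all three cephalopods.
shared_by_all = set.intersection(*smps.values())

# Proteins shared by the two decapodiforms but not found in N. pompilius.
decapodiform_only = (smps["S. pharaonis"] & smps["S. spirula"]) - smps["N. pompilius"]

# Core proteins found in N. pompilius and S. pharaonis but not in S. spirula.
missing_in_spirula = (smps["N. pompilius"] & smps["S. pharaonis"]) - smps["S. spirula"]

print(sorted(shared_by_all))
print(sorted(decapodiform_only))
print(sorted(missing_in_spirula))
```

Under these assumptions, only CD109 antigen is recovered in all three species, which is why the absence of Tyrosinase and Chitinase from the S. spirula data set deserves the caution expressed in the text.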

Nevertheless, it must be noted that not only were the S. spirula sequences probably fragmented (not full sequences), but one previously published data set used in this study (Oudot et al. 2020) was also incomplete, because not all sequences were published. This prompted us to analyze some proteins by their reported annotated names rather than by sequence-based analytical methods, which may have biased the results. Therefore, a re-examination involving a more complete data set of S. spirula SMP sequences must be conducted in the future.

Cephalopod shell matrix proteins with repeat motifs

Studies on the amino acid compositions of N. pompilius and its congener, N. belauensis, have shown that the organic matrix is extremely abundant in the residues of Gly/Ala/Asx (Asp or Asn), and Ser (Weiner and Hood 1975; Weiner and Traub 1980; Lowenstam et al. 1984; Keith et al. 1993). This property was consistent with other conchiferan shells despite the different microstructures, and it has been suggested that these amino acid sequences probably play roles in shell formation (e.g., Mercenaria mercenaria, Hare and Abelson 1965; Crassostrea virginica, Meenakshi et al. 1975; Mytilus californianus, Keith et al. 1993). Weiner and Hood (1975) assumed that these proteins most likely consisted of a series of (Asp-Y)n (Y: Serine or Glycine) forming an antiparallel β-sheet conformation, and suggested that the negatively charged Asp has a Ca2+ binding capacity (Weiner 1979). Subsequently, sequences with properties similar to what was predicted were identified from several bivalve species (e.g., Patinopecten yessoensis (MSP-1; Sarashina and Endo 1998), Pinctada fucata (Aspein; Tsukamoto et al. 2004), Atrina rigida (Asprich; Gotliv et al. 2005), Pinna nobilis (Caspartin; Marin et al. 2005), Crassostrea nippona (MPP1; Samata et al. 2008)). Marie et al. (2009) also reported a possible presence of similar Gly/Ala/Asx (Asp or Asn) rich proteins in the nacreous layer of the congener of N. pompilius, N. macromphalus. The identification of MSI31/MSI60-like (Sudo et al. 1997) and N14/N16/Pearlin-like (Samata et al. 1999; Montagnani et al. 2011) in the same species explains the amino acid compositional bias.
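As an illustration of the two properties discussed above, compositional bias and (Asp-Y)n repeats with Y being Ser or Gly, the sketch below computes residue fractions and scans for runs of Asp-Ser/Asp-Gly units in a one-letter amino acid sequence. It is not the method used in the cited studies: the function names, the minimum run length of three units, and the toy sequence are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# (Asp-Y)n with Y = Ser or Gly, written in one-letter code as (D[SG])n.
# Assumption: a run of at least 3 consecutive units counts as a repeat.
ASP_Y_REPEAT = re.compile(r"(?:D[SG]){3,}")


def residue_fractions(seq):
    """Return the fraction of each residue in a one-letter protein sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def asp_y_runs(seq):
    """Return all non-overlapping (Asp-Y)n runs found in the sequence."""
    return [m.group(0) for m in ASP_Y_REPEAT.finditer(seq.upper())]


# Hypothetical toy sequence, not a real SMP.
toy = "MKTDSDGDSDGAALDGDSK"

fractions = residue_fractions(toy)
runs = asp_y_runs(toy)
print(round(fractions["D"], 3), runs)
```

In the toy sequence, the first stretch of four Asp-Y units is reported, while the later stretch of only two units falls below the assumed threshold and is ignored.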

Nautilin-63, which was characterized as an acidic phosphorylated protein of 63 kDa band extracted from the matrix protein of the nacreous layer in N. macromphalus (Marie et al. 2011), is also enriched in Gly/Asx/Thr residues and predicted to have a similar Ca2+ binding capacity (Marie et al. 2011). However, previous multiomics/phenomics studies on N. pompilius SMPs failed to find any protein with similar characteristics (Setiamarga et al. 2021; Zhang et al. 2021). This was probably because either Nautilin-63 is a species-specific protein, or because it is difficult to obtain a complete gene and/or protein sequence with highly repetitive motifs/domains. In this study, we succeeded in identifying possible full sequences with similar features, when we mapped some fragmented sequences on the genome. MSTRG.8937 was identified from the SMPs of N. pompilius, and is composed of repeat motifs rich in Gly/Ala/Asp residues. Previous studies have shown that the sequence similarities among possible orthologs of highly acidic proteins with highly repetitive acidic amino acid residues are very low, even among sequences from congeners (Isowa et al. 2012; McDougall et al. 2013). Nautilin-63 and MSTRG.8937 may be homologous, although this homology cannot be confirmed because of low similarities between the two sequences.

Three Gly/Ala-rich proteins (MSTRG.749, MSTRG.5336, and MSTRG.11770) were identified from the SMPs and genome of N. pompilius. However, the three proteins do not contain acidic amino acids. Although reciprocal BLAST searches failed to find any homolog of these proteins, similar structural proteins have been previously identified and characterized as Silk Fibroin-like protein in the nacreous layer of Pinctada fucata (Sudo et al. 1997). It should be noted that the shell protein was considered a Silk Fibroin-like protein based only on its Ala-rich content and function as a structural protein. These unique repeat sequences are thought to structurally strengthen the shell in mollusks (Yano et al. 2006), because their primary structure resembles that of the spider silk protein (which is thought to underlie the strength of the spider web), and thus they most likely function similarly (Gatesy et al. 2001; Numata 2020). Another similar structural protein in plants, the Glycine-rich protein (GRP), is also thought to contribute to the mechanical strength of plant cell walls (Lei and Wu 1991). Interestingly, MSTRG.5336 is the only N. pompilius SMP whose repeat motif consists of not only Gly/Ala, but also Gln/Gly/Arg.

In this study, we succeeded in confirming the presence of several structural proteins containing orderly repeat motifs in N. pompilius, suggesting the presence of possibly a few types of structural proteins in the shell matrix, probably involved in shell biomineralization. Our observation here also agrees with previous suggestions on the non-conservation of proteins with repeat motifs. Previous studies have shown that the repeat-motif proteins in mollusk SMPs are not conserved at the sequence level (Weiner and Hood 1975), but the abundance of the same residue(s) is probably critical for them to function as structural proteins (Sarashina and Endo 1998; Isowa et al. 2012).

Remarks on the evolution of some N. pompilius SMPs

Pif/Pif-like

Pif was discovered from the SMPs of P. fucata, and its structure consists of one VWA domain, two Chitin-binding domains (ChBDs), and a LamininG-like motif at the C-terminus (Suzuki et al. 2009, 2013; Fig. 4). Later, two types of orthologous proteins with different domain arrangements, BMSP and LamininG3 (sometimes grouped together as Pif-like), were also reported. The former is apparently specific to bivalve SMPs, and the latter was found in gastropods and cephalopods (Suzuki et al. 2011; Ishikawa et al. 2020; Setiamarga et al. 2021; Yoshida et al. 2022). Pif and Pif-like proteins apparently play a role in forming a thin organic film layer of the nacreous layers and in controlling the growth direction of aragonite crystals in the shell of Pinctada fucata (Suzuki et al. 2009). Other than the three types of Pif orthologs, recent studies have revealed the presence of multiple SMPs containing one or several VWA and ChBD domains, and a low complexity region (which is a signature of Pif/Pif-like) in various configurations, but with low sequence similarities to Pif (Zhao et al. 2020; Ishikawa et al. 2020). The low complexity region is located in, and probably derived from, the LamG3 domain, which remains intact in the LamininG3 protein (Suzuki et al. 2013; Setiamarga et al. 2021). Thus, in this study, an unidentified protein containing ChBD domain(s) and a low complexity region has been broadly classified as a “Pif-like” protein.

We found five Pif homologs in the genome and SMPs of N. pompilius: one Pif, one LamininG3, and three Pif-like proteins (Fig. 4). Pif possibly plays similar functions in biomineralization as its paralogs in P. fucata, including but not limited to, the formation of the nacreous layer. Meanwhile, LamininG3 is usually conserved in the cephalopod genomes, despite being found only in the genomes of a limited number of conchiferan species (e.g., Albertin et al. 2015; Kim et al. 2018; Da Fonseca et al. 2020; Huang et al. 2022; Yoshida et al. 2022). While both Pif and LamininG3 show their standard domain configurations (Fig. 4), Pif-like has a distinct configuration with two VWA, a single ChBD, and a low complexity region. This is similar to what was previously found in P. fucata (Zhao et al. 2020).

SOUL domain-containing protein

SOUL Domain-containing protein (SOUL) belongs to the Heme-binding Protein (HBP)/SOUL family of proteins with the SOUL Domain (PF04832) as the main functional domain (Fig. 5A). SOUL represents a group of evolutionarily conserved proteins with members in animals, plants, and even bacteria (Goodfellow et al. 2021). Their recognized representative function is to bind heme to its prosthetic group (Taketani et al. 1998). In mammals, two paralogs, HEBP1 and HEBP2, have been functionally analyzed. HEBP1 binds heme to iron, while HEBP2 does not bind to iron but has other multiple functions related to heme, such as binding to free heme and heme transport (Fortunato et al. 2016). Heme consists of a small cyclic tetrapyrrole with a centrally chelated Fe atom (Layer et al. 2010).

The phylogenetic analysis and copy number survey suggest that the presence of multiple copies of the SOUL-coding genes in metazoan genomes may have been caused by gene duplications and losses, followed by specialization of paralog functions, along with further copy number expansions in the molluscan lineage (Fortunato et al. 2016) (Fig. 5B, C). SOUL has not previously been reported as an SMP in mollusks. Nevertheless, this study identified SOUL proteins in the shell matrices of the three cephalopods examined, forming a monophyletic clade with high support (Fig. 5C). This finding suggests that they were likely recruited and co-opted in the lineage leading to cephalopods for biomineralized shell formation. A high percentage of sequence similarity at the C-terminus of the SMP-type cephalopod sequences may also indicate selection pressure to maintain protein structure.

Oudot et al. (2020) suggested that SOUL may be involved in biomineralization by storing and transporting iron. Ferritin, which is conserved in the SMPs of bivalves and gastropods, also has the ability to bind iron. It is possible that cephalopods have co-opted SOUL to perform a similar function instead of Ferritin. However, the relationship between iron and CaCO3-based biomineralization remains unclear, and thus the potential functional similarity between SOUL and Ferritin requires further investigation. In addition, SOUL may participate in binding calcium ions or other biomineralization-related prosthetic groups or in calcium ion-dependent interactions between proteins (Mikasa et al. 2018).

CD109 antigen

CD109 antigen belongs to the Thioester-containing proteins (TEPs) superfamily (Nonaka and Kimura 2006). TEPs are classified into two subfamilies: Complement Factors (C3/C4/C5) and Alpha-2 Macroglobulins (A2Ms). The A2M subfamily comprises several closely related molecules: Alpha-2 Macroglobulins (A2M), CD109 antigen (CD109), Macroglobulin Complement-related protein (MCR), Pregnancy Zone protein (PZP), C3 And PZP Like Alpha-2-Macroglobulin Domain-containing protein 8 (CPAMD8), and insect TEP (iTEP). These proteins share a set of domains as their basic signature: the signal peptide, A2M-N, A2M-N2, A2M, Alpha-Macroglobulin Thiol-ester Bond-forming Region, A2M Complement, and A2M Receptor domains. Several members of the TEP superfamily can be further distinguished by additional domains added to the basic common domains. For example, C3/C4/C5 have the NTR (C345C) domain at the C-terminus, CPAMD8 has the KAZAL domain at the C-terminus, and MCR has the LDLa domain between its A2M-N2 and A2M domains (Fig. 6A). Some functional analyses have suggested that the protein must undergo proteolytic cleavage, causing most proteins identified through proteomics to be incomplete (Portet et al. 2018).

We found the non-SMP molluscan (bivalves, gastropods, and cephalopods) TEPs classified into five monophyletic clades (C3/C4/C5, A2M, CD109, mTEP, and MCR) in the genomes of the respective mollusks. These TEPs were probably conserved in the most recent common ancestor of mollusks. The iTEP clade, which consists solely of CD109 specific to arthropods (Sekiguchi et al. 2012), probably expanded from a CD109 included in the animal TEP clade (the CD109 clade), in a lineage that eventually gave rise to arthropods. It is also worth noting that only the SMP and non-SMP TEPs of N. pompilius were included in the mTEP clade, whereas those of S. pharaonis and S. spirula belonged to the A2M clade. This suggests that the expansion of mTEP probably occurred in the molluscan lineage after the split between mTEP and the CD109+iTEP clades, which might have happened at the same time as the Ecdysozoa-Lophotrochozoa split, an event that could have contributed to the development of the ability to form a calcified shell. Furthermore, the switch to A2M in the two decapodiforms possibly coincided with the internalization of their mineralized shells.

Chitinase

Chitinase is a representative conchiferan SMP, found in all studies on conchiferan shell biomineralization. The enzyme belongs to the Glycoside Hydrolase family 18 (GH18), whose members are characterized by the presence of the GH18 domain (PF00704). In shell formation, it is thought that the enzyme works by catalyzing the degradation of the Chitin framework during CaCO3 biomineralization. SMP Chitinases show various domain structures (Fig. 7A), but can be artificially grouped into two types: those consisting of only the GH18 domain, and those with both the Chitin-binding (ChBD) and GH18 domains. The ChBDs can be located either before (on the N terminus side) or after (on the C terminus side) the GH18 domain, but apparently never both, and their number is not fixed (one or two ChBDs are usually observed). Some mollusk Chitinases have complex domain structures, where several ChBD and GH18 domains are present (Table S4), indicating that both ChBD and GH18 domains seem to have undergone frequent tandem duplication and possibly concerted evolution.
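The grouping described above can be expressed as a small rule-based classifier. This is a minimal sketch under stated assumptions, not the procedure used in this study: the function name, the input format (a list of domain labels ordered from the N- to the C-terminus), and the returned category labels are hypothetical, and any label other than "GH18" or "ChBD" (for example a signal peptide) is simply ignored.

```python
def classify_chitinase(domains):
    """Classify a Chitinase by its domain order (N- to C-terminus).

    `domains` is a list of labels such as ["SP", "ChBD", "GH18"].
    Only "GH18" and "ChBD" are considered; other labels are ignored.
    """
    core = [d for d in domains if d in ("GH18", "ChBD")]
    n_gh18 = core.count("GH18")
    n_chbd = core.count("ChBD")

    if n_gh18 == 0:
        return "no GH18 domain"
    if n_gh18 > 1:
        # Several GH18 (and possibly ChBD) domains: the complex structures.
        return "complex (multiple GH18)"
    if n_chbd == 0:
        return "GH18 only"

    gh18_index = core.index("GH18")
    chbd_before = gh18_index                  # ChBDs on the N terminus side
    chbd_after = len(core) - gh18_index - 1   # ChBDs on the C terminus side
    if chbd_before and chbd_after:
        # Described in the text as apparently never observed.
        return "ChBD on both sides of GH18"
    if chbd_before:
        return "ChBD N-terminal to GH18"
    return "ChBD C-terminal to GH18"


print(classify_chitinase(["SP", "GH18"]))
print(classify_chitinase(["ChBD", "ChBD", "GH18"]))
print(classify_chitinase(["GH18", "ChBD", "GH18", "ChBD"]))
```

Because missing domains may reflect incomplete sequences rather than true absence, a category assigned this way should be read as a description of the available sequence, not of the intact protein.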

The six monophyletic clades (I-VI; Fig. 7B) of Chitinases that consist only of the GH18 domain are noteworthy, because they include non-SMP Chitinases from all molluscan lineages (bivalve, gastropod, cephalopod), indicating possible functional constraints at the primary sequence level. However, Chitinases with the ChBD, including those found in SMPs, were not monophyletic. These proteins seem to have diversified in each taxon by acquiring or losing domains through domain shuffling and tandem duplication. For instance, our phylogenetic tree indicates that bivalves had probably doubled their number of SMPs (8–15) in at least the common ancestor of Pteriomorphia.

We also acknowledge that the variety of domain structures shown by some of the Chitinase sequences included in this study, such as the lack of certain domains, could reflect experimental artifacts caused by, for example, sequencing errors or other technical difficulties that prevented the recovery of intact sequences. Therefore, the absence of domains in the sequences analyzed in this study may not indicate their actual absence. For example, the signal peptide domain was not always present in the SMP Chitinases observed in this study.

Tyrosinase

Tyrosinase, which is a member of the Type-3 Copper-containing Metalloprotein superfamily, plays a critical role in physiological processes including oxygen transport, pigmentation and innate immunity as the key rate-limiting enzyme in melanin synthesis (Cerenius et al. 2008; Cieslak et al. 2011). In mollusks, Tyrosinase is also involved in shell coloration through melanin synthesis (Yu et al. 2018; Zhu et al. 2021). SMP Tyrosinases contain only a single Tyrosinase domain (PF00264). Regardless of its involvement in the shell biomineralization process, the signal peptide is not necessarily included in a Tyrosinase. A survey of Tyrosinases with a single Tyrosinase domain in various mollusk draft genomes showed that the copy number of the gene varies significantly among different molluscan species. For example, the bivalves apparently underwent multiple tandem duplications of their Tyrosinases (Cephalopoda: 4–15; Gastropoda: 3–12; Bivalvia: 24–53) (Fig. 8A) (Aguilera et al. 2014).

Since the general support values of our trees are low, we cannot discuss the phylogenetic evolution of this gene based on them with certainty. However, the topology of the tree suggests a possibility that after clade-specific tandem duplications, different copies of Tyrosinase were recruited in each clade for shell formation (Setiamarga 2021). Similar phenomena have also been observed for several SMPs, such as Chitinase and EGF-ZP, also surveyed and discussed in this study.

EGF-ZP domain-containing protein

EGF-ZP Domain-containing protein (EGF-ZP) is a type of protein composed of one or two EGF domain(s) and one ZP domain (Fig. 7A). In studies on bivalves and gastropods, EGF-ZP proteins were often identified as SMPs. Meanwhile, a copy of the protein was also identified in N. pompilius SMPs (Setiamarga et al. 2021). The EGF domain functions by helping crystals to aggregate, forming multiple columnar prisms inside the crystals (Iwamoto et al. 2020). The ZP domain functions by assisting protein–protein interaction, forming complexes with other proteins whose functions are biomineralization-related, such as Shematrin, Pif, and Serine Protease Inhibitor (Jain et al. 2018; Shimizu et al. 2022). Pif and Serine Protease Inhibitor are also present in the N. pompilius SMPs, suggesting that the formation of similar complexes and crystal aggregates probably also occurs during the shell formation of N. pompilius. Meanwhile, we could not detect the presence of EGF-ZP, Pif, and Serine Protease Inhibitor in the SMPs of the decapodiforms S. pharaonis and S. spirula. Although we cannot rule out experimental artifacts causing these proteins to be undetected in these species, there is also a possibility that the absence of these proteins indicates that the two decapodiform species do not employ the shell formation mechanism involving these protein complexes.

Our EGF-ZP phylogenetic tree also suggested that the proteins employed as SMPs do not form a monophyletic clade (Fig. 7B). It seems that after an expansion of the copy number of the protein-coding gene in the genomes of mollusks, each clade recruited different copies of the gene for shell formation. This result is also in agreement with the result of a previous study by Shimizu et al. (2022).
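The notion of monophyly used here can be made concrete with a short check on a rooted tree: a group of sequences is monophyletic if some subtree contains exactly those sequences and nothing else. The sketch below is only an illustration. Trees are written as nested tuples, and both toy trees and all leaf names are hypothetical rather than taken from the trees in this study.

```python
def leaves(node):
    """Return the set of leaf names under a node (a str leaf or a tuple of children)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def is_monophyletic(tree, group):
    """True if some subtree of the rooted tree contains exactly the leaves in `group`."""
    target = set(group)
    if leaves(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, target) for child in tree)


# Hypothetical toy tree in which the SMP-type sequences cluster together.
clustered = ((("ceph_SMP_1", "ceph_SMP_2"), "ceph_SMP_3"), ("biv_nonSMP", "gast_nonSMP"))

# Hypothetical toy tree in which SMP-type sequences are scattered among non-SMP copies.
scattered = (("biv_SMP", "gast_nonSMP"), ("gast_SMP", "ceph_nonSMP"))

print(is_monophyletic(clustered, {"ceph_SMP_1", "ceph_SMP_2", "ceph_SMP_3"}))
print(is_monophyletic(scattered, {"biv_SMP", "gast_SMP"}))
```

The second toy tree mirrors the pattern inferred for EGF-ZP, where SMP copies are interspersed with non-SMP copies, which is what would be expected if each clade recruited a different copy of the gene.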

Presumed functions of cephalopod shell matrix proteins and the mechanism of shell formation

In this study, three of the cephalopod SMPs shared with the conchiferans were reportedly produced by the hemocytes (CD109 antigen, Chitinase, and Tyrosinase) (e.g., Lin et al. 2002; Badariotti et al. 2007; Huang et al. 2010; Terwilliger 2007). A protein shared only among the cephalopods, SOUL, is also expressed by the hemocytes (Goodfellow et al. 2021). This is intriguing, because one of the suggested mechanisms of CaCO3 crystallization in mollusks is hemocyte-mediated biomineralization (Mount et al. 2004). It is well known that pearl grafting (Schmitt et al. 2018) triggers a significant hemocyte response, causing a high level of hemocyte accumulation during pearl-sac development (Kishore and Southgate 2015). The level of the response corresponds to the degree of damage to host tissue, as a part of the immunoreaction by the host oyster to recognize foreign materials (Kishore and Southgate 2015; Shen et al. 2020). While molecular and cellular studies on immunoreaction and tissue repair in cephalopods are still limited (Imperadore et al. 2022), proteomic analysis of hemocytes obtained from protozoan parasite-infected octopods indicates an elevated immune response and increased hemocyte activity (Castellanos-Martínez et al. 2014). Furthermore, infection-derived inflammations, which might also show possible color and morphological changes of the damaged tissues, have been observed in the regenerating soft tissues of octopods (Zullo and Imperadore 2019). These observations, taken together with those from pearl-sac development and pearl grafting, might suggest a potential involvement of the hemocytes and immunoreactions in the regeneration of damaged soft tissues, although further studies are still necessary to confirm this hypothesis.

A similar increase in hemocytes has also been observed in the shell repair process in oysters (Ruddell 1971; López et al. 1997), suggesting the involvement of the hemocytes and SMPs in oyster shell formation and/or repair (Johnstone et al. 2008). Li et al. (2016) also showed evidence that circulating hemocytes can directly participate in the CaCO3 crystal formation in two oyster species (Pinctada fucata and Crassostrea virginica). Meanwhile, a recent study by Liu et al. (2023) reported the detection of a large amount of Hemocyanin in the SMPs of S. pharaonis, indicating the presence of hemocytes in the shell matrix of this species.

Interestingly, besides being consistently identified and categorized as SMPs, the three proteins shared among the conchiferans and the one shared solely among the cephalopods have been reported to play a role in the function of hemocytes related to immunity. This suggests that these proteins have multiple functions in both immunity and shell formation (A2M and CD109 antigen: Wyatt et al. 2014; Yazzie et al. 2015; Chitinase: Badariotti et al. 2007; Zhou et al. 2020; Tyrosinase: Zhang et al. 2006; Mao et al. 2018; Zhu et al. 2021; Xiong et al. 2022; SOUL: Goodfellow et al. 2021), as well as soft tissue regeneration. This could also explain the mechanism behind shell repair after damage, as soft tissue injuries may occur concurrently with damaged shells, for example, when caused by predation (Tsujino and Shigeta 2012). Therefore, we suggest that the proteins were co-opted and used in shell biomineralization-related processes in at least the ancestral lineage leading to the shelled mollusks. Future studies focusing on their actual functions in shell formation and biomineralization must be conducted to confirm this hypothesis.

Conclusions

In this study, we updated our previous findings on a set of shell matrix proteins (SMPs) of the living fossil N. pompilius, which is a member of an early branching group of extant cephalopods still retaining their external calcified shell, the nautiloids (Setiamarga et al. 2021). A set of 85 proteins was recovered through in silico analyses comparing our previously identified set of proteins and those of Zhang et al. (2021), and by mapping the sequences on the draft genome of the species (Huang et al. 2022). We succeeded in identifying several proteins with repeat motifs but with low sequence similarities to their putative homologs in other organisms, such as Nautilin-63 and a highly acidic protein. These proteins are thought to function by providing a framework for calcium carbonate crystallization. By comparing the re-identified N. pompilius SMPs with those of a wide range of mollusk taxa, we identified a set of core proteins commonly shared in the conchiferans. We also identified the SMPs shared among three cephalopods (N. pompilius, S. pharaonis, and S. spirula), and between the latter two decapodiforms. The common SMPs shared between the two decapodiforms were probably co-opted after the internalization of their shells.

Our phylogenetic analyses of five selected N. pompilius SMPs, CD109 antigen, Chitinase, Tyrosinase, EGF-ZP, and SOUL, revealed their possible evolutionary scenarios. The five proteins were selected because, while they are known to be expressed in the hemocytes and play a role in immunity, studies in pearl and shell formation indicated their involvement in biomineralization. The phylogenetic analysis of CD109 antigen (TEPs), which was found in conchiferan SMPs, indicated that different types of TEPs, mTEP and A2M, were utilized in the externally shelled N. pompilius and the internally shelled S. pharaonis and S. spirula, respectively. Their phylogeny also suggests recruitments of different orthologs to carry out a similar function, which in this case is shell formation. The recruitments were probably related to the internalization of the shell in the lineage. Meanwhile, phylogenetic analyses of Tyrosinase and Chitinase revealed that the two genes were probably recruited for shell formation independently in each molluscan lineage. Although low bootstrap values, probably caused by rapid evolution, have prevented a straightforward discussion about their evolution, it seems that extreme expansion of copy numbers through tandem duplications, together with losses of domains and genes leading to lineage-specific subfunctionalizations, probably happened to these two genes. Meanwhile, SOUL was detected as an SMP only in the cephalopods. The cephalopod SMPs were recovered as a monophyletic clade with high statistical support. This probably indicates that the protein was recruited for shell formation in the lineage leading to the cephalopods and has been under a functional constraint that keeps the sequences similar, suggesting that these proteins are probably crucial in cephalopod shell formation.