Introduction

Metallothioneins (MTs) represent a superfamily of metal-binding proteins that have been identified in virtually every organism investigated to date. MTs are best characterized by their high cysteine content (30 %), with individual cysteine residues arranged in distinctive motifs (e.g., Cys–x–Cys, Cys–x–x–Cys) that form metal-binding clusters. MTs can be expressed constitutively as part of basic physiological processes, but they are also highly inducible by a variety of metals, including both essential metals such as Zn and Cu and toxic metals such as cadmium (Cd) (Furst and Nguyen 1989; Brouwer et al. 1992; Czupryn et al. 1992; Bauman et al. 1993; Bonneton et al. 1996; Roesijadi et al. 1996; Hensbergen et al. 2001). Thus, they play fundamental roles in the homeostasis of essential metals and the detoxification of toxic trace metals.

The number of cysteine motifs distinguishes the two types of MT domains, designated α and β (Braun et al. 1986). The α-domain, usually containing eleven to twelve cysteine residues, binds four divalent metal ions and confers structure and stability on the protein (Jiang et al. 2000). In contrast, the β-domain, usually containing nine cysteine residues, binds three divalent metal cations and has been shown to participate in metal exchange reactions, via glutathione shuttling, with zinc (Zn)- and copper (Cu)-requiring apoproteins (Brouwer et al. 1993; Suzuki and Kuroda 1994; Jiang et al. 1998). Apo-metallothioneins exist as random coils without any defined secondary structure, and their tertiary folding is metal dependent (Vasak et al. 1980; Nielson et al. 1985; Nielson and Winge 1985). Comparisons of the isolated α- and β-peptides with the complete protein demonstrate that the two domains bind metals independently: the α- and β-peptides bind 4 and 3 Cd ions, respectively, and saturation of the β-domain in the complete protein does not prevent the coordination of 4 Cd ions in the α-domain (Nielson and Winge 1985). At least 18 different metals are capable of associating with MT, most of them saturating the protein at 7 ions (Nielson et al. 1985). However, in vitro studies have demonstrated that an MT protein fully saturated with monovalent Cu or silver can bind 12 ions, producing a protein that lacks the defined α- and β-domains formed by binding of 7 divalent metal ions (Nielson et al. 1985; Suzuki and Maitani 1981; Salgado and Stillman 2004).

It has been proposed that the β-domain represents the ancestral MT domain and that a duplication event gave rise to a ββ-domain MT, which eventually diverged into the αβ-structure within specific taxa owing to differences in selective pressures (Cols et al. 1999). Given the order of metal-binding preference of the α-domain (Cd > Zn > Cu) (Nielson and Winge 1983), compared with the Cu-binding preference of the β-domain (Nielson and Winge 1984), environmental variation may have led to the evolution of multi-domain MTs in which each domain carries out specific functions. Under this model, the β-domain plays a primary role in the homeostasis of essential metals, primarily Cu, whereas the α-domain plays a more prevalent role in Zn homeostasis and the sequestration of toxic metals such as Cd.

As research on invertebrate MTs progresses, it is becoming increasingly obvious that the domain characteristics of MTs are more varied than traditionally described. For example, pulmonate snails have a single structural MT consisting of two β-domains; the amino acid composition of these domains diverges among isoforms, producing differences in metal-binding specificity and separate roles in Cu homeostasis or Cd detoxification (Palacios et al. 2011). Similarly, crustaceans such as the lobster and blue crab (Homarus americanus and Callinectes sapidus, respectively) have ββ-MTs that are capable of binding Zn and Cd (Valls et al. 2001; Syring et al. 2000). Callinectes sapidus also contains a novel Cu-binding MT consisting of 21 cysteines arranged in motifs that resemble the βα-MT structure found in vertebrates (Syring et al. 2000). In contrast, the earthworm Lumbricus rubellus contains a Cd-MT consisting of 20 cysteines arranged in an αβ-MT structure, with Cd₄Cys₁₁ and Cd₃Cys₉ stoichiometries for the α- and β-domains, respectively (Ngu et al. 2006).

One of the more notable deviations in MT protein characteristics is the presence of MT proteins comprising three or more metal-binding domains in both marine and terrestrial invertebrates, such as the American and Pacific oysters (Crassostrea virginica and C. gigas) (Tanguy and Moraga 2001; Jenny et al. 2004) and the earthworm Enchytraeus buchholzi (Willuhn et al. 1994). Some of these larger MT proteins have been shown to confer greater Cd resistance when expressed ectopically in E. coli and yeast strains (Tanguy et al. 2001; Tschuschke et al. 2002). While gene amplification and tandem duplication of MT genes have been demonstrated to confer metal tolerance (Beach and Palmiter 1981; Maroni et al. 1987; Mehra et al. 1990; Stephan et al. 1994), the evolution of larger MT proteins with more than two metal-binding domains supports the hypothesis that domain duplication serves specific functions, since additional metal-binding domains may help confer metal resistance. These isoforms likely provide an advantage to benthic and terrestrial organisms, which typically experience greater exposure to metals because of their ecological niche (Tanguy and Moraga 2001; Tanguy et al. 2001; Tschuschke et al. 2002; Jenny et al. 2004).

Here we report the sequencing and annotation of an MT gene locus from the oyster Crassostrea virginica and provide a comparison with the corresponding locus from the recently published C. gigas genome (Zhang et al. 2012). In addition, we provide a detailed analysis of the MT genes from C. virginica demonstrating how a complex series of molecular events has led to the current diversity of MT isoforms present in this species. Finally, we propose an evolutionary scheme in which two distinct ancestral β-domains have given rise to the structural diversity of all molluscan MTs.

Materials and Methods

BAC Clone Sequencing

From a Crassostrea virginica BAC library originally constructed by the Clemson University Genomics Institute, 2 clones were selected on the basis of positive hybridization to cDNA probes representing CvMT-I and CvMT-IV genes. Details of library construction, probe design, and hybridization were previously published (Cunningham et al. 2006). Sanger shotgun sequencing of BAC clones was conducted at the U.S. DOE Joint Genome Institute’s Community Sequencing Program, Project 06-SE-20. The GenBank accession numbers for the assembled C. virginica BAC sequences are GU207412.1 and GU324325.1.

MT Gene Annotation of BAC Sequence

The number of MT genes in the assembled BACs and their exon/intron structure were determined by aligning the transcript sequences of known CvMT-I, -II, and -IV isoforms against the full BAC sequence. Exon/intron boundaries were further delineated based on conservation of the canonical 5′ splice donor and 3′ splice acceptor sites identified in each of the genes.
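The boundary check can be sketched as follows, assuming the standard GT-AG rule for canonical introns and exon coordinates given as 0-based, half-open intervals on the forward strand. The helper names and the toy sequence are hypothetical and are not taken from the BAC data.

```python
# Sketch of a GT-AG splice-site check; helper names and toy data are hypothetical.

def intron_spans(exons):
    """Return (start, end) of each intron between consecutive exons.

    Exons are 0-based, half-open (start, end) tuples on the forward strand.
    """
    ordered = sorted(exons)
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def check_gt_ag(genomic, exons):
    """Flag whether each intron starts with GT (5' donor) and ends with AG (3' acceptor)."""
    report = []
    for start, end in intron_spans(exons):
        intron = genomic[start:end].upper()
        report.append((start, end, intron.startswith("GT"), intron.endswith("AG")))
    return report


# Toy example: exon 1 (ATGTGC) + intron (GTAAGTCCTTTCAG) + exon 2 (TGCAAA)
genomic = "ATGTGC" + "GTAAGTCCTTTCAG" + "TGCAAA"
exons = [(0, 6), (20, 26)]
print(check_gt_ag(genomic, exons))  # [(6, 20, True, True)]
```

An intron failing either flag would call for a closer look at the transcript alignment, since non-canonical introns (e.g., GC-AG) also occur.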

Phylogenetic Analysis

Aligned MT nucleotide and protein sequences were used for Bayesian phylogenetic reconstruction with Markov chain Monte Carlo (MCMC) sampling in MRBAYES 3.1 (Ronquist and Huelsenbeck 2003). CvMT-I and -II nucleotide α- and β-domain alignments were run under the HKY + I model selected by jModelTest (Posada 2008). The α- and β-MT protein domain alignments were run under the WAG model of protein evolution with gamma-distributed rate variation across sites and a proportion of invariable sites. All other priors were kept at their defaults. MCMC chains were run for 1 × 10⁷ generations, with sampling every 100 generations. The first 25 % of samples were discarded as burn-in, well after each analysis had reached stationarity, and a consensus tree and node posterior probabilities were determined from the remaining samples. Trees were visualized and annotated with TreeView v1.6.6 (Page 1996). Species MT information and NCBI accession numbers for all sequences used in the phylogenetic analysis are included in Supplementary Table 1.
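The burn-in arithmetic, and the way node (clade) posterior probabilities are read off the retained samples, can be sketched in Python. This is a simplified illustration, not MrBayes code: each sampled tree is reduced to a set of clades, and the toy trees and helper names are hypothetical.

```python
from collections import Counter

# Run settings reported in the text
N_GENERATIONS = 10_000_000   # 1 x 10^7 MCMC generations
SAMPLE_FREQ = 100            # one tree sampled every 100 generations
BURNIN_FRACTION = 0.25       # first 25 % of samples discarded as burn-in


def retained_samples(n_generations=N_GENERATIONS, sample_freq=SAMPLE_FREQ,
                     burnin=BURNIN_FRACTION):
    """Number of post-burn-in samples (ignores any extra sample of the starting state)."""
    n_samples = n_generations // sample_freq
    return n_samples - int(n_samples * burnin)


def clade_posteriors(sampled_trees, burnin=BURNIN_FRACTION):
    """Posterior probability of each clade: fraction of post-burn-in trees containing it.

    Each tree is an iterable of clades; each clade is a frozenset of taxon labels
    (a simplification of a real tree file).
    """
    kept = sampled_trees[int(len(sampled_trees) * burnin):]
    counts = Counter(clade for tree in kept for clade in set(tree))
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(posteriors, threshold=0.5):
    """Clades present in more than `threshold` of the retained trees."""
    return {clade: p for clade, p in posteriors.items() if p > threshold}


AB, ABC, AC = frozenset("AB"), frozenset("ABC"), frozenset("AC")
toy_trees = [[AB], [AB, ABC], [AB, ABC], [AC]]  # first tree falls in the burn-in
post = clade_posteriors(toy_trees)
print(retained_samples())                        # 75000
print(sorted(map(sorted, majority_rule(post))))  # [['A', 'B'], ['A', 'B', 'C']]
```

With the reported settings, about 75,000 sampled trees contribute to the consensus; clades above 50 % are always mutually compatible, which is why a majority-rule tree can be assembled from them.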

Collection of Crassostrea virginica

Adult C. virginica were obtained from native wild populations at Perdido Pass, Orange Beach, AL; Sandy Bay, Bayou la Batre, AL; Charleston Harbor, Sullivan Island, SC; and Delaware Bay, Leipsic, DE. Oysters were acclimated in the laboratory for 96 h in tanks with artificial seawater (25 ppt salinity at 22 °C) before use, with an 80 % water change performed daily. While maintained in the laboratory, oysters were fed 2 mL of Shellfish Diet (Reed Mariculture) per oyster twice daily. After acclimation, the oysters were distributed into 4 L Nalgene beakers filled with artificial seawater (25 ppt salinity at 22 °C) containing 50 µM Cd, at a density of one oyster per liter of seawater. Cd exposures were used to induce the expression of the various MT isoforms for more efficient detection. The Cd-containing seawater was renewed after 24 h. After 48 h of Cd exposure, the oysters were dissected, and gill and digestive gland tissues were removed, flash frozen in liquid nitrogen, and stored at −80 °C until RNA isolation.
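The exposure concentration translates into a mass of Cd per beaker with a simple molar conversion. The sketch below assumes each beaker held 4 L and uses the molar mass of elemental Cd (112.41 g/mol); the Cd salt is not stated in the text, so for a salt such as anhydrous CdCl₂ (about 183.3 g/mol) its molar mass would be passed instead. The helper name is hypothetical.

```python
CD_MOLAR_MASS = 112.41  # g/mol, elemental cadmium


def dose_mass_mg(conc_uM, volume_L, molar_mass=CD_MOLAR_MASS):
    """Mass (mg) of a compound giving `conc_uM` micromolar in `volume_L` litres."""
    moles = conc_uM * 1e-6 * volume_L
    return moles * molar_mass * 1e3  # g -> mg


per_litre = dose_mass_mg(50, 1)    # Cd per litre at 50 uM
per_beaker = dose_mass_mg(50, 4)   # assumes each 4 L beaker is filled to 4 L
print(f"{per_litre:.2f} mg/L, {per_beaker:.2f} mg per beaker")
# 5.62 mg/L, 22.48 mg per beaker
```

Because the Cd-containing seawater was renewed once at 24 h, each beaker would receive this amount twice over the 48 h exposure.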

RT-PCR Screening and Cloning of CvMT Isoforms From Different Oyster Populations

Total RNA was isolated from 50 mg of digestive gland (hepatopancreas) tissue using Trizol Reagent (Invitrogen), followed by an additional cleanup on RNeasy Miniprep columns (QIAGEN Corporation). First-strand cDNA was synthesized from ~2 µg of total RNA with the qScript cDNA synthesis kit (Promega). Different primer sets were used either to amplify all MT-I and -II isoforms simultaneously or to preferentially amplify the MT-II isoforms. For concurrent amplification of both CvMT-I and -II isoforms, a forward 5′ UTR consensus primer (5′-GCCGAYTGTAYCACAGACAC-3′) and a reverse 3′ UTR consensus primer (5′-CTCTYATTRGTCGAGCGYTC-3′) were used with a standard protocol of 32 PCR cycles under the following conditions: denaturation at 94 °C for 30 s, annealing at 55 °C for 60 s, and extension at 72 °C for 60 s. For targeted amplification of CvMT-II isoforms containing a combination of linker regions #1 and #3, a forward LR3 primer (5′-ACCTCTGAAAATGCCAA-3′) and a reverse LR1 primer (5′-ATATTGGTGCCGCTGCAC-3′) were used with the same 32-cycle protocol, except that annealing was performed at 52 °C for 60 s. PCR products were separated by electrophoresis on a 2.5 % agarose gel in 1X TAE buffer.
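The consensus primers use IUPAC degeneracy codes (Y = C or T, R = A or G), so each primer is really a small pool of sequences. The sketch below (helper names hypothetical) enumerates that pool, showing that the forward UTR primer is 4-fold and the reverse UTR primer 8-fold degenerate.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they represent
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


fwd_utr = "GCCGAYTGTAYCACAGACAC"  # 5' UTR consensus primer (two Y positions)
rev_utr = "CTCTYATTRGTCGAGCGYTC"  # 3' UTR consensus primer (Y, R, Y)

for name, primer in [("forward", fwd_utr), ("reverse", rev_utr)]:
    variants = expand_degenerate(primer)
    print(name, len(variants), variants[0])
# forward 4 GCCGACTGTACCACAGACAC
# reverse 8 CTCTCATTAGTCGAGCGCTC
```

Degeneracy is the product of the number of bases at each position, which is why three two-fold positions in the reverse primer give eight variants.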

To screen for CvMT-IA and -IB isoforms, RT-PCR products from eight separate reactions (4 Perdido Pass oysters and 4 Sandy Bay oysters) using the consensus primers for full-length amplification of both the CvMT-I and -II isoforms were cloned into the pGem®-T Easy Vector according to the manufacturer’s instructions (Promega Corporation). The constructs were transformed into chemically competent 10G MRF′ E. coli cells (Lucigen Corporation). Four to six colonies from each oyster were selected for plasmid purification, and the plasmids were sent to MWG Operon for DNA sequencing using standard T7 and SP6 primers.

Results and Discussion

Diversity of Oyster Metallothionein Genes

Initial studies revealed far greater diversity in C. virginica MTs, in terms of structure and related function, than previously identified in any other organism. It should be noted that no three-dimensional structural studies (e.g., nuclear magnetic resonance) have been conducted on any molluscan MT protein, so the designation of α- and β-domains is based on the number of cysteines within each domain, as delimited by the linker regions, and should be considered putative. With this caveat, the diversity of C. virginica MTs includes the typical αβ-domain structure (CvMT-I and CvMT-IV), a ββ-domain structure (CvMT-III), and an exclusively α-domain structure consisting of one to four α-domains (CvMT-II) (Jenny et al. 2004, 2006) (Fig. 1). The CvMT-I isoforms and the paralogous group of CvMT-IV isoforms represent the traditional class of molluscan αβ-MTs with 21 conserved cysteines (Binz and Kagi 1999); however, these groups differ significantly in their non-cysteine amino acids (Jenny et al. 2004, 2006). The CvMT-IV isoforms are also distinguished by four extra cysteine residues (25 cysteines total) that result in three Cys–Cys–Cys motifs in the β-domain and one additional cysteine in the α-domain (Fig. 2) (Jenny et al. 2004, 2006).
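Because the domain designations rest on counting cysteines between linker regions, that bookkeeping can be sketched as a small function that also locates Cys–Cys–Cys runs. The peptide and the segment boundary below are hypothetical toy values, not CvMT sequences, and the function name is illustrative.

```python
import re


def cys_summary(seq, boundaries=()):
    """Count cysteines overall and per segment, and locate Cys-Cys-Cys runs.

    `boundaries` are 0-based positions where one segment (putative domain)
    ends and the next begins, e.g. at a linker region.
    """
    seq = seq.upper()
    cuts = [0, *boundaries, len(seq)]
    segments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    return {
        "n_cys": seq.count("C"),
        "pct_cys": round(100 * seq.count("C") / len(seq), 1),
        "cys_per_segment": [s.count("C") for s in segments],
        # lookahead so that overlapping runs (e.g. CCCC) are all reported
        "ccc_starts": [m.start() for m in re.finditer(r"(?=CCC)", seq)],
    }


toy = "AKCCCSGCKCCCA"  # hypothetical 13-residue peptide
print(cys_summary(toy, boundaries=[6]))
# {'n_cys': 7, 'pct_cys': 53.8, 'cys_per_segment': [3, 4], 'ccc_starts': [2, 9]}
```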

Fig. 1

Schematic of MT Isoform Structural Diversity found in C. virginica. Schematic representation of three distinct MT gene families present in C. virginica. The MT family diversity is determined by the domain structure and cysteine motifs found within the individual MT proteins

Fig. 2

Alignment of representative β-domains. a The representative β-domains from all three CvMT families are compared to the two β-domains found in a gastropod (Helix pomatia) CuMT. The alignment highlights the divergence of the C-terminal β-domain of CvMT-III isoforms from the core cysteine motif (designated by the arrowed line) found in all other molluscan (bivalve and gastropod) MTs. However, the C-terminal β-domain of CvMT-III is highly conserved with the β-domain found in vertebrate MTs (as represented by human MT-II), as well as the complete ancestral MT protein found in Neurospora crassa. b Alignment of the paralogous CvMT-I and -IV isoforms demonstrates the conservation of the cysteine motifs between the α- and β-domains in support of the domain duplication and divergence hypothesis

The CvMT-II sub-family is distinguished by the sole presence of α-domains, the result of a point mutation (AAG to TAG) that converts a lysine codon in the linker region separating the α- and β-domains into a stop codon (Jenny et al. 2004). All members of the CvMT-II sub-family contain this point mutation between the α- and β-domains. The sub-family has two related isoforms (CvMT-IIA and -IIB) that produce single α-domain peptides, whereas the remaining members (designated CvMT-IIC through CvMT-IIH) include isoforms encoding two, three, or four α-domains (Jenny et al. 2004). The primary α-domains of the CvMT-II isoforms with two or more domains are highly conserved with the α-domain of CvMT-IA (96.2 % nucleotide identity). Although the secondary α-domains are also quite conserved with the α-domain of CvMT-IA (92.0 % nucleotide identity), a few amino acid substitutions are found in every secondary α-domain of the CvMT-II isoforms; the two types of α-domain are therefore designated α1a or α1bn, where the subscript n denotes the position of each α1b domain in the protein. Based on the differences in amino acid composition and cysteine motifs found in the CvMT-IV isoforms, we designated their α- and β-domains as α1c and β1b, respectively (Fig. 1).
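The effect of the AAG to TAG substitution can be illustrated with a minimal translation sketch: a single base change turns a lysine codon into a stop codon, so translation ends before any downstream coding sequence. The reading frame below is a hypothetical toy, not a CvMT sequence, and the helper names are illustrative.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop)
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def substitute_codon(cds, codon_index, new_codon):
    """Return `cds` with codon number `codon_index` (0-based) replaced."""
    start = 3 * codon_index
    return cds[:start] + new_codon + cds[start + 3:]


# Hypothetical toy reading frame: M C C [K] C C, with lysine codon AAG at index 3
wild_type = "ATGTGCTGTAAGTGCTGCTAA"
mutant = substitute_codon(wild_type, 3, "TAG")  # AAG -> TAG
print(translate(wild_type), translate(mutant))  # MCCKCC MCC
```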

The most defining feature of the CvMT-III gene family is the presence of two β-domains with conserved cysteine motifs (Figs. 1 and 2). To date, all other bivalve MT sequences have contained a minimum of 21 cysteines arranged into the canonical αβ-structure. In contrast, the prototypical structure of gastropod MTs consists of 18 cysteines, consistent with the ββ-domain structure (Dallinger et al. 1993; Berger et al. 1995, 1997). Even though the CvMT-III isoforms also have 18 cysteines arranged in two β-domains, the cysteine motifs of the C-terminal β-domains differ between the two taxonomic groups. With the exception of CvMT-III, all molluscan (bivalve and gastropod) MTs have a core motif, C–X–C–X(3)–C–T–G–X(3)–C–X–C–X(3)–C–X–C–K, in the C-terminal β-domain (Binz and Kagi 1999). In contrast, both domains of CvMT-III contain a core motif, C–X–C–Xn–C–X–C–Xn–C–X–C, consistent with the β-domain of many vertebrate, invertebrate, and fungal species (Nemer et al. 1985) (Fig. 2).
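The motif notation above maps directly onto regular expressions, which gives a quick way to test which core motif a β-domain carries. The converter below is a hypothetical helper, similar in spirit to PROSITE pattern syntax: "X" matches any residue, "X(3)" exactly three, and "X(m,n)" a range. Since the text leaves Xn unspecified, the range of 1 to 4 residues is an assumption, and both toy sequences are constructed for illustration.

```python
import re


def motif_to_regex(motif):
    """Convert a dash-separated motif (e.g. 'C-X-C-X(3)-C') to a compiled regex.

    'X' matches any residue; 'X(n)' exactly n residues; 'X(m,n)' m to n residues.
    Other tokens are treated as literal amino acid letters.
    """
    parts = []
    for token in motif.replace("\u2013", "-").split("-"):  # accept en dashes too
        token = token.strip()
        m = re.fullmatch(r"[Xx](?:\((\d+)(?:,(\d+))?\))?", token)
        if not m:
            parts.append(re.escape(token))
        elif m.group(1) is None:
            parts.append(".")
        elif m.group(2) is None:
            parts.append(f".{{{m.group(1)}}}")
        else:
            parts.append(f".{{{m.group(1)},{m.group(2)}}}")
    return re.compile("".join(parts))


molluscan_core = motif_to_regex("C-X-C-X(3)-C-T-G-X(3)-C-X-C-X(3)-C-X-C-K")
# Xn is unspecified in the text; a range of 1 to 4 residues is an assumption
vertebrate_like = motif_to_regex("C-X-C-X(1,4)-C-X-C-X(1,4)-C-X-C")

toy_a = "CACAAACTGAAACACAAACACK"  # hypothetical, built to fit the molluscan core
toy_b = "CSCKKCTCGGCKC"           # hypothetical, built to fit the Xn pattern
for name, pattern in [("molluscan core", molluscan_core), ("C-X-C-Xn", vertebrate_like)]:
    print(name, bool(pattern.search(toy_a)), bool(pattern.search(toy_b)))
# molluscan core True False
# C-X-C-Xn False True
```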

Evolution of Molluscan Metallothionein Structure

To place the diversity of MT structure identified in C. virginica into context, we first review the general hypothesis of MT evolution. The β-domain has been proposed as the ancestral MT domain, with a primary role in the homeostasis of physiologically relevant metals, primarily Cu. Duplication of a Cu-preferring β-domain gave rise to a ββ-domain MT. In some lineages, divergence of one of the β-domains may have led to the gain of additional cysteines, giving rise to an α-domain (Cols et al. 1999) with different metal-binding capacity and selectivity. Thus, selective pressures led to the evolution of the α- and β-domains, with both domains playing important roles in metal homeostasis. However, the α-domain's order of metal-binding affinity (Cd > Zn > Cu) suggests that it typically plays a more significant role in the detoxification of certain metals (Nielson and Winge 1983).

The conservation of the cysteine motifs in the two β-domains (βNH3+ and βCOO−) of the CvMT-III isoforms (Fig. 2a) supports the domain duplication hypothesis. Additionally, alignment of the β-domains from the CvMT-I and -IV isoforms with their respective α-domains demonstrates conservation of the cysteine motifs between the two domains. Specifically, the first nine cysteines within the CvMT-IA and -IVA α-domains are conserved with all nine cysteines of their complementary β-domains (Fig. 2b). The three additional cysteine residues that follow these nine in the α-domains of the paralogous MT genes support the hypothesis of domain duplication followed by divergence.
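Comparing cysteine positions between aligned α- and β-domains (as in Fig. 2b) amounts to finding alignment columns in which both sequences carry a cysteine. A minimal sketch, assuming a gapped alignment already exists; the toy alignment and helper names are hypothetical, chosen only to mirror the pattern described (every β cysteine matched, extra cysteines in the α segment).

```python
def shared_cys_columns(aligned_a, aligned_b):
    """Alignment columns (0-based) in which both sequences carry a cysteine."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return [i for i, (a, b) in enumerate(zip(aligned_a.upper(), aligned_b.upper()))
            if a == b == "C"]


def fraction_conserved(aligned_ref, aligned_other):
    """Share of the cysteines in `aligned_ref` matched by a cysteine in `aligned_other`."""
    n_ref = aligned_ref.upper().count("C")
    return len(shared_cys_columns(aligned_ref, aligned_other)) / n_ref


# Hypothetical toy alignment of an alpha-like and a beta-like segment ('-' = gap)
alpha_aln = "CSCKTCGCSC"
beta_aln = "CTC-SCGAS-"
print(shared_cys_columns(alpha_aln, beta_aln))  # [0, 2, 5]
print(fraction_conserved(beta_aln, alpha_aln))  # 1.0
```

In this toy case all three β cysteines are matched, while only three of the five α cysteines have a β counterpart, the same asymmetry the duplication-and-divergence hypothesis predicts.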

Because of the distinct differences in the cysteine motifs of the various CvMTs, we investigated the evolutionary relationships among molluscan MTs by performing separate Bayesian phylogenetic analyses on alignments of either the putative α-domains or the β-domains from various molluscan species. Although the α-domain phylogenetic tree is not well resolved (Fig. S1), the β-domain tree shows a clear contrast: the C-terminal β-domain of gastropod MTs groups with the canonical molluscan β-domain found in all other molluscan αβ-MT isoforms (Clade 1, Fig. 3), whereas the N-terminal β-domain of gastropod MTs is closely related to both β-domains of the CvMT-III isoforms (Clade 2, Fig. 3). Upon closer comparison, the polytomy of molluscan sequences within the α-domain tree (Fig. S1) is consistent with much of the branch organization in the major clade (Clade 1) of the β-domain tree, specifically the separate clustering of mussel species (e.g., Mytilus edulis, Mytilus galloprovincialis, Bathymodiolus azoricus) from members of the genus Crassostrea. This grouping is also supported by a previously reported phylogenetic analysis of mussel MTs (Aceto et al. 2011). An additional phylogenetic analysis of molluscan MTs using full-length MT protein sequences produced two distinct clades that clearly separated gastropods and bivalves (Palacios et al. 2011). This is in direct contrast to the two clades produced by the β-domain analysis presented here, in which both major clades contain gastropod and bivalve species (Fig. 3).

Fig. 3

Bayesian phylogenetic tree of molluscan β-domains. The molluscan MT β-domains fall into two distinct clades. Clade 1 contains the C-terminal β-domain with the core cysteine motif found in most bivalve and gastropod MT proteins. Clade 2 contains the second β-domain, which has a different cysteine motif that is more closely related to the ancestral β-domain found in Neurospora crassa (Nc CuMT), as well as to the β-domain found in vertebrates (Hs, Homo sapiens; Gg, Gallus gallus; Dr, Danio rerio). The dashed box in Clade 1 and the solid box in Clade 2 represent the ββ-domain MTs found in pulmonate gastropods, while the solid gray boxes in Clade 2 represent the ββ-domain MT isoforms found in Crassostrea spp. Species abbreviations. Bivalvia: Ba, Bathymodiolus azoricus; Bt, Bathymodiolus thermophilus; Ca, Crassostrea ariakensis; Ce edu, Cerastoderma edule; Ce gla, Cerastoderma glaucum; Cf, Corbicula fluminea; Cg, Crassostrea gigas; Cv, Crassostrea virginica; Dp, Dreissena polymorpha; Hy cum, Hyriopsis cumingii; Le, Laternula elliptica; Me, Mytilus edulis; Mg, Mytilus galloprovincialis; Ml, Meretrix lusoria; Oe, Ostrea edulis; Pm, Pinctada maxima; Pv, Perna viridis; Sb, Scapharca broughtonii; Tg, Tegillarca granosa; Ut, Unio tumidus. Gastropoda: Aa, Arianta arbustorum; Bg, Biomphalaria glabrata; Ce hor, Cepaea hortensis; Ha, Helix aspersa; Hp, Helix pomatia; Me cre, Megathura crenulata; Ns, Nesiohelix samarangae. Ascomycota: Mc, Microsporum canis; Nc, Neurospora crassa. Vertebrata: Dr, Danio rerio; Gg, Gallus gallus; Hs, Homo sapiens

Within the context of the model of MT protein evolution proposed by Cols et al. (1999), the presence of two distinct clades in Fig. 3 supports the novel hypothesis that there were two ancestral β-domains (designated β1 and β2) within the molluscan phylum that may have followed separate evolutionary paths within major molluscan taxa. If we consider the currently known structures of gastropod and bivalve MTs, the two ancestral β-domains form a single structural β2β1-MT isoform in gastropods, whereas in bivalves the two β-domains appear to have diverged to produce two different structural MT isoforms, i.e., α1β1-MT and β2β2-MT (Fig. 4). The β1-domain appears to have duplicated to produce a two-domain MT that diverged and ultimately led to the evolution of the α1β1-MTs (CvMT-I and -IV isoforms) that play dual roles in homeostasis and detoxification (Jenny et al. 2004, 2006). In contrast, the CvMT-III isoforms likely formed from a single-domain duplication event and retained the β2β2-domain structure. Although the metal-binding preferences of the CvMT-III isoforms are unknown, they are primarily expressed during larval development and do not appear to be significantly induced by toxic metal exposure (Jenny et al. 2006). These non-canonical expression profiles suggest that they serve a specific role in metal homeostasis rather than metal detoxification. This evolutionary scheme differs significantly from that of the pulmonate gastropods, which have β2β1-MTs that have diverged in amino acid composition to produce specific CuMT isoforms capable of binding 12 Cu ions and CdMT isoforms capable of binding 6 Cd ions, as well as an unspecific Cu/CdMT (Dallinger et al. 1993; Berger et al. 1997; Palacios et al. 2011; Perez-Rafael et al. 2014). This β2β1-MT structure also appears to be conserved in limpets (Megathura crenulata; Me creMT), a different gastropod family (Fig. 3) (Perez-Rafael et al. 2012).
The reasons for the retention of this single structural isoform remain open to hypothesis; however, the Cu homeostatic requirements imposed by the use of hemocyanin as a respiratory pigment in these gastropods, a pigment not present in oysters, are likely a significant factor in the lack of structural divergence found in pulmonate snails and limpets (Dallinger et al. 1993; Berger et al. 1997; Palacios et al. 2011; Perez-Rafael et al. 2012, 2014).

Fig. 4

Molluscan MT evolution from two ancestral β-domains. A schematic representation of the different evolutionary paths taken by the two ancestral molluscan β-domains. The two β-domains appear to have converged to produce a single structural isoform in gastropods. In contrast, within the genus Crassostrea (Bivalvia: Ostreidae) the two β-domains followed divergent paths that led to the evolution of the three separate CvMT gene families. The divergence of the paralogous CvMT-I and CvMT-IV isoforms occurred prior to speciation within the genus Crassostrea, whereas the evolution of the MT-II isoforms within the genus is species specific

The number of cysteine residues in each MT domain plays an important role in determining the number of metal ions bound, but does not necessarily determine metal-binding specificity. MT proteins contain two types of cysteine residues, designated bridging or terminal. Bridging cysteines coordinate two metal ions, whereas terminal cysteines coordinate only one (Frey et al. 1985; Schultze et al. 1988). Mutation studies in which cysteine residues were replaced with serine demonstrated the importance of bridging cysteines for proper metal binding and for conferring Cd resistance when the proteins were expressed in yeast (Chernaik and Huang 1991). This loss of Cd resistance presumably results from a reduction in the number of ions the protein can bind, an effect less pronounced when terminal cysteines are lost. In contrast, replacement of non-cysteine residues had significant effects on metal-binding specificity, domain stability, and kinetic reactivity (Munoz et al. 2000a, b; Yamasaki et al. 1997; Kurasaki et al. 1997).
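The link between bridging and terminal cysteines and the number of bound ions can be made explicit with a short calculation. The sketch assumes each divalent ion is tetrahedrally coordinated by four cysteine thiolates and each bridging cysteine binds exactly two ions; solving b + t = n_Cys and 2b + t = 4 n_M for the Cd₄Cys₁₁ and Cd₃Cys₉ clusters then gives the split for each domain. The function name is hypothetical.

```python
def bridging_terminal(n_metal, n_cys, coordination=4):
    """Solve b + t = n_cys and 2*b + t = coordination * n_metal for (b, t).

    b = bridging cysteines (each bound to two metal ions),
    t = terminal cysteines (each bound to one metal ion).
    Assumes every metal ion is bound by exactly `coordination` cysteine thiolates.
    """
    bonds = coordination * n_metal   # total metal-sulfur bonds
    b = bonds - n_cys                # difference of the two equations
    t = n_cys - b
    if b < 0 or t < 0:
        raise ValueError("stoichiometry is inconsistent with the assumed coordination")
    return b, t


print(bridging_terminal(4, 11))  # alpha cluster, Cd4Cys11 -> (5, 6)
print(bridging_terminal(3, 9))   # beta cluster, Cd3Cys9 -> (3, 6)
```

Because each bridging cysteine supplies two metal-sulfur bonds, removing one eliminates more coordination than removing a terminal cysteine, which is consistent with the mutation results described above.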

A comparison of the amino acid composition of pulmonate snail MT proteins with representatives of the three major MT gene families in C. virginica highlights some consistent differences in certain amino acids that likely influence metal-binding specificity (Table 1). Aspartic acid, glutamic acid, and histidine have functional side chains known to participate in metal binding in other proteins; in MTs, although the cysteines are responsible for direct metal binding, these residues likely play a major role in influencing metal specificity. Pulmonate snail CuMTs are rich in aspartic acid and histidine but completely lack glutamic acid residues, whereas the CdMTs contain glutamic acid, are enriched in lysine, and lack histidine. Interestingly, in addition to the unique cysteine motifs, the amino acid composition of each of the three MT families in C. virginica is distinct (Table 1), providing excellent targets for future site-directed mutagenesis studies of the role these non-cysteine amino acids play in metal specificity. The presence of a single structural MT isoform in pulmonate gastropods highlights the uniqueness of C. virginica MTs; however, with approximately 10,000 bivalve species and more than 60,000 gastropod species occupying unique ecological niches in terrestrial, freshwater, and marine ecosystems, the full potential diversity of molluscan MTs has not been sufficiently explored. Future sequencing efforts and functional studies may shed further light on the evolution of MT proteins within this diverse phylum.
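Compositional comparisons of the kind summarized in Table 1 reduce to counting residues. A minimal sketch, restricted by default to cysteine and to the Asp, Glu, His, and Lys residues discussed here; the peptide is a hypothetical placeholder, not a real MT, and the function name is illustrative.

```python
from collections import Counter


def composition(seq, residues="CDEHK"):
    """Count and percentage of selected residues in a protein sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: (counts[aa], round(100 * counts[aa] / len(seq), 1)) for aa in residues}


toy_mt = "MSCDHCKCEKC"  # hypothetical 11-residue peptide, not a real MT
for aa, (n, pct) in composition(toy_mt).items():
    print(aa, n, pct)
# C 4 36.4
# D 1 9.1
# E 1 9.1
# H 1 9.1
# K 2 18.2
```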

Table 1 Amino acid composition of representative molluscan MT proteins

Crassostrea Metallothionein Gene Structure and Evolution

The sequencing of several partially spliced CvMT-IA transcripts permitted the deduction of the exon/intron structure of the CvMT-IA gene (Jenny et al. 2004), which was confirmed by sequencing two overlapping BAC clones (GenBank accession numbers GU207412.1 and GU324325.1) containing a single locus composed of one CvMT-IV gene, four CvMT-II genes, and a single CvMT-IA gene (Fig. 5). This represents a substantial increase in MT gene copy number and structural diversity compared with the C. gigas MT-I, -II, and -IV locus identified in the recently sequenced genome (Zhang et al. 2012). The C. gigas genome has two MT loci, one containing a single CgMT-III gene and the other containing a single CgMT-IV gene and two CgMT-II genes (Fig. S2). Although the genome appears to lack a CgMT-I gene, it should be noted that the C. gigas genome was sequenced from a single inbred oyster derived from four generations of full-sibling mating and is not representative of wild populations, which contain the CgMT-I isoform (Tanguy et al. 2001; Tschuschke et al. 2002). Of further interest, the CgMT-II genes identified in the genome contain four coding exons, which differs from the previously published CgMT-II gene that contains only three coding exons (Tanguy and Moraga 2001). This discrepancy reflects the absence of the last intron from the previously published gene, in which exons 4 and 5 of the C. gigas genome sequence are fused into a single exon.

Fig. 5

Annotated CvMT-I, -II, and -IV Locus. A schematic representation of the complete 40 kb CvMT locus consisting of one CvMT-I gene, four CvMT-II genes, and one CvMT-IV gene. Exons are designated as gray boxes. The CvMT-I and -IV genes each consist of four exons. Boxes containing two numbers separated by a dashed line represent new exons within the CvMT-II genes that arose by fusion of two separate exons after partial duplication of an ancestral CvMT-I gene

Annotation of the CvMT-I, -II, and -IV locus (Fig. 5) permits the deduction of a series of events that most likely led to the generation of the genes encoding CvMT-II family members. The CvMT-IA gene consists of a non-coding first exon followed by three coding exons: exons 2 and 3 encode the entire α-domain, the linker region, and the first 6 codons of the β-domain, and the fourth exon encodes the remainder of the β-domain. This overall gene structure is conserved in the paralogous CvMT-IV gene. The structure of the CvMT-II genes, however, is more complex and likely required a progression of duplication and recombination events to produce the current structure of the locus. We postulate that an initial gene duplication event produced multiple CvMT-I genes in tandem on the chromosome. A subsequent mutation converting a lysine codon to a stop codon in the linker region occurred in the third exon of one of the CvMT-I genes (Jenny et al. 2004). A recombination event then fused the third exon of a CvMT-I gene lacking the mutation with the second exon of the mutated CvMT-I gene, generating a five-exon CvMT-II gene that encodes an isoform consisting of two α-domains followed by a non-coding β-domain (predicted gene structure of CvMT-IIC, Fig. 5). Additional duplication and recombination events led to the evolution of CvMT-II isoforms consisting of 3 or 4 α-domains. For each CvMT-II gene that gained an additional α-domain, the gene structure appears to include an additional exon formed by an exon 3/exon 2 fusion (Fig. 5). The novel, extended linker regions characteristic of the CvMT-II isoforms with two or more α-domains reflect the incorporation of 5′ UTR sequence from the original exon 2 into the open reading frame as a result of exon fusion (Fig. S3A).
The nomenclature of each CvMT-II gene is based on the number of α-domains and the specific arrangement of the unique linker regions between the α-domains (Fig. S3B).
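The exon bookkeeping implied by this scenario can be sketched in a few lines: starting from the four-exon CvMT-I layout, each additional α-domain adds one fused exon 3/exon 2, so a gene encoding n α-domains would have n + 3 exons (five for CvMT-IIC). The exon labels and function name are hypothetical shorthand, and the model ignores details shown in Fig. 5 such as exon lengths and the exact source of each duplicated unit.

```python
def cvmt_ii_exon_model(n_alpha):
    """Hypothetical exon layout for a CvMT-II gene encoding `n_alpha` alpha-domains.

    E1: non-coding first exon; E2 + E3 together encode one alpha-domain plus linker;
    'E3/E2': fused exon joining one unit's exon 3 to the next unit's exon 2;
    'E3*': exon 3 carrying the linker stop codon; E4: untranslated beta-domain exon.
    """
    if n_alpha < 1:
        raise ValueError("at least one alpha-domain is required")
    return ["E1", "E2"] + ["E3/E2"] * (n_alpha - 1) + ["E3*", "E4"]


for n in range(1, 5):
    exons = cvmt_ii_exon_model(n)
    print(n, len(exons), "-".join(exons))
# 1 4 E1-E2-E3*-E4
# 2 5 E1-E2-E3/E2-E3*-E4
# 3 6 E1-E2-E3/E2-E3/E2-E3*-E4
# 4 7 E1-E2-E3/E2-E3/E2-E3/E2-E3*-E4
```

The single-domain case reproduces the CvMT-I layout apart from the stop codon, in line with the single point mutation that separates CvMT-IIA and -IIB from CvMT-I genes.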

Introducing a stop codon into the middle of a gene's coding region would typically produce a dysfunctional protein and eventually lead to a pseudogene. However, previous studies on mammalian MTs indicate that, unlike the β-domain, the α-domain is stable as a single peptide and exhibits an even higher affinity for toxic metals such as Cd than the αβ-domain structure (Xiong et al. 1998; Cols et al. 1999). The apparent stability of the CvMT-IIA and -IIB proteins, as evidenced by the single-α-domain Cd-binding MT proteins expressed in oyster tissues (Jenny et al. 2004), may reflect the selective forces that led to the domain duplications. In fact, recombinant αα-MT constructs show greater stability and greater metal-binding capacity than αβ-domain MTs (Xiong et al. 1999). Furthermore, extended linker regions do not appear to alter the metal-binding activity of individual domains (Rhee et al. 1990). Thus, the multi-α-domain structure of the CvMT-II isoforms is likely to be highly stable and efficient at binding metals.

Phylogenetic analysis of the nucleotide sequence of each individual α-domain was used to provide insight into the evolution of the CvMT-II isoforms. As stated previously, the primary α-domain (α1a; Fig. 1) of the CvMT-II isoforms is highly conserved with the α-domain of CvMT-IA. However, the remaining α-domains (α1bn; Fig. 1) carry prominent amino acid substitutions that are found in every secondary α-domain of the CvMT-II isoforms. Bayesian phylogenetic analysis produced a tree with two major clades, one consisting of the primary α-domains and the other of the secondary α-domains (Fig. 6). The only notable exception to this grouping was that the α-domains of the related CvMT-IB isoforms (CvMT-IB1 and -IB2α) fall within the clade containing the secondary α-domains, suggesting that the multi-domain CvMT-II isoforms were formed by recombination between CvMT-IA and CvMT-IB genes. The most likely scenario is that one allele of the CvMT-IB gene acquired the point mutation in the linker region and that the last three exons of that gene recombined with the first three exons of a CvMT-IA gene, producing the five-exon gene encoding an αα-domain MT (CvMT-IIC) (Jenny et al. 2004). At some point, a homologous copy of CvMT-IB lacking the point mutation served as the source of the duplicated exons 2 and 3 responsible for producing the CvMT-II isoforms with three or four α-domains. The observation that the single-α-domain CvMT-II isoforms (CvMT-IIA and -IIB) are more closely related to CvMT-IA (Fig. 6) supports the hypothesis that the point mutation within the linker region occurred independently multiple times.

Fig. 6

Phylogenetic analysis of CvMT-I and -II α-domains. A Bayesian analysis of the DNA sequences of the coding regions of the individual α-domains of all CvMT-I and -II isoforms produced a tree differentiated into two major clades: one containing the primary α-domains and the other containing all of the secondary α-domains. The α-domain of the CvMT-IA isoform falls in a clade with all of the primary α-domains of the CvMT-II isoforms. The α-domains of the CvMT-IB isoforms group with all of the secondary α-domains of the CvMT-II isoforms, supporting the hypothesis that the multi-domain CvMT-II isoforms arose from duplication and recombination events involving an ancestral CvMT-IB isoform. The α-domain from the C. gigas MT-I isoform was used as the outgroup for rooting the tree

An additional Bayesian phylogenetic analysis using the nucleotide sequences encoding the individual β-domains of each CvMT-I and -II isoform produced a tree with three minor clades and a single branch (Fig. 7). We presume that the tree structure is influenced by the repeated duplication and recombination events required to produce the diversity of CvMT-II isoforms. Considering that a single point mutation is all that distinguishes the CvMT-IIA and -IIB isoforms from CvMT-I genes, the positioning of CvMT-IIB in both the α-domain and β-domain trees (Figs. 6, 7) seems to support the possibility of a locus containing at least three ancestral CvMT-I genes (Fig. 5). However, a more in-depth sequencing effort targeting oysters from different regional populations is required to confirm the ancestral locus.

Fig. 7

Phylogenetic analysis of CvMT-I and -II β-domains. A Bayesian analysis of the DNA sequences of the individual β-domains from all of the CvMT-I and -II isoforms. The β-domain from the C. gigas MT-I isoform was used as the outgroup for rooting the tree

Prevalence of CvMT-I and -II Isoforms

Early studies with C. virginica demonstrated the presence of two Cd-rich protein fractions of differing molecular weight, 10 and 24 kDa, based on their elution characteristics on a Sephadex gel column (Ridlington and Fowler 1979; Fowler et al. 1986). Additional early studies suggested that the occurrence of both Cd-rich proteins, later confirmed to be CvMT-I and CvMT-II isoforms (Jenny et al. 2004), is restricted to oysters south of Beaufort, NC and along the Gulf of Mexico coast (Engel 1999). During the initial characterization of CvMT isoforms (Jenny et al. 2004), over 360 individual MT clones were sequenced. All of these clones were isolated either from cDNA libraries made from pooled gametes of four female and three male oysters or by direct PCR cloning from several individual oysters from the ACE Basin, SC; of these >360 clones, only one CvMT-IB transcript was identified. The apparent scarcity of CvMT-IB transcripts in the southeastern U.S. oyster population could be explained by the proposed genetic recombination and selection events that led to the formation of the CvMT-II isoforms. Given the high occurrence of the CvMT-II isoforms, an ancestral allele containing the original CvMT-IB may be quite rare in this population.

Early data suggesting that the CvMT-II isoforms are absent from northeastern U.S. populations (Engel 1999), coupled with the observed diversity of CvMT-II isoforms in southeastern U.S. oyster populations (Jenny et al. 2004), raise questions regarding the origin of the CvMT-II isoforms. Because genetic discontinuity among populations of estuarine organisms within the Gulf of Mexico has been demonstrated (Williams et al. 2008), we compared the prevalence and diversity of CvMT-I and -II isoforms in native, wild oysters collected from Perdido Pass (east of Mobile Bay) and Sandy Bay (west of Mobile Bay). Furthermore, because of the potential barrier to gene flow near Cape Hatteras, NC (Baker et al. 2008; McCartney et al. 2013; Boehm et al. 2015), we also investigated the prevalence of CvMT-I and -II isoforms in southeastern U.S. oysters collected from Charleston Harbor near Sullivan’s Island, SC and northeastern U.S. oysters collected from Delaware Bay near Leipsic, DE. Oysters from each site were acclimated to laboratory conditions before a 48-h exposure to 50 µM Cd to induce CvMT-I and -II gene expression. Standard RT-PCR and DNA gel electrophoresis were used to screen for CvMT diversity with two primer sets: a consensus set that simultaneously amplifies all CvMT-I and -II isoforms, and a second set that amplifies only the CvMT-II isoforms with an LR3-LR1 linker configuration (CvMT-IIE and CvMT-IIH). Unfortunately, because of the duplication events and high sequence similarity, we were unable to design primers specific to the LR1-LR1 or LR1-LR2 linker configurations.

Using the consensus primers, we amplified PCR products representing the CvMT-I isoforms from all ten oysters in each of the four populations (Fig. 8a). However, Charleston Harbor was the only population in which all ten oysters also showed strong amplification of multiple CvMT-II isoforms. We did observe very faint, larger PCR bands, consistent with the presence of CvMT-II isoforms with multiple α-domains, in a limited number of oysters from Perdido Pass (1 oyster), Sandy Bay (2 oysters), and Delaware Bay (4 oysters) (Fig. 8a). Using primers that specifically amplify a fragment of the CvMT-IIE and -IIH transcripts (LR3-LR1 linker configuration), we again detected strong PCR products in all ten Charleston Harbor oysters, whereas only seven of the Delaware Bay oysters produced a PCR product. Furthermore, only two oysters from Perdido Pass tested positive for the CvMT-IIE and -IIH isoforms, while six oysters from Sandy Bay tested positive, although most produced weak PCR bands (Fig. 8b). The currently assembled CvMT gene locus (Fig. 5) supports the presence of diverse CvMT-II genes with respect to the linker regions used to differentiate isoforms, including CvMT-II genes with the LR3-LR1 configuration. Thus, it is not unreasonable to expect the LR3 forward/LR1 reverse primer set to produce a PCR band in most of the oysters screened. However, unlike the Charleston, SC population, the other three populations did not show detectable expression of these isoforms in every oyster (Fig. 8b). Although CvMT-II isoforms are present in all four oyster populations, the stronger expression and multiple PCR products observed in the Charleston population could be due to a higher gene copy number and greater diversity of CvMT-II genes within individuals.
Although our sampling was not geographically comprehensive, the observed diversity of expressed CvMT-II isoforms supports the hypothesis that regional environmental pressures may have led to an adaptive expansion of the CvMT locus in southeastern U.S. oyster populations.

Fig. 8

PCR-based analysis of CvMT-I and -II diversity in four populations of C. virginica. Reverse-transcriptase PCR was performed on Cd-treated oysters collected from two Gulf of Mexico populations and two Atlantic populations. Oysters were exposed to 50 µM Cd for 48 h to optimize expression of the CvMT-I and -II isoforms. Total RNA was isolated from digestive gland for RT-PCR analysis. a Consensus primers were used to simultaneously amplify all CvMT-I and -II isoforms from 10 individual oysters from each of four populations. Although all four populations displayed expression data consistent with the presence of CvMT-II isoforms, only the Charleston, SC oysters displayed evidence of extensive CvMT-II diversity. b Specific primers were used to amplify the cDNA fragment between linker regions 3 and 1 of the CvMT-IIE and -IIH isoforms from 10 individual oysters from each of four populations. Only the Charleston, SC population had detectable expression of either isoform in all 10 oysters. Overall, the Atlantic oyster populations had greater diversity and abundance of CvMT-II isoforms

Classic phylogeographic studies have identified a stable shift in the allele frequencies of mitochondrial DNA and nuclear genes in C. virginica populations along the Atlantic U.S. coast, suggesting pronounced genetic isolation between populations in the Gulf of Mexico and the Atlantic Ocean (Reeb and Avise 1990; Hare and Avise 1996). Possible explanations for this cline include neutral processes, such as historical or contemporary dispersal barriers or strong variance in reproductive success (Hedgecock 1994; Hedgecock et al. 2007), and environmentally mediated selection (Murray and Hare 2006; Rose et al. 2006) arising from differential sensitivity to stressors such as temperature, salinity, and parasite infection (Rose 1984; King et al. 1994; Bushek and Allen 1996; Pernet et al. 2008). Furthermore, southeastern U.S. oyster populations are predominantly intertidal, unlike the northeastern U.S. and Gulf of Mexico populations, which are predominantly subtidal. Finally, previous studies have demonstrated that southeastern U.S. oyster populations are exposed to naturally high levels of arsenic that spatially coincide with natural phosphate deposits; as a result, the arsenic body burden in southeastern U.S. oysters is more than twice the national average (Scott et al. 1994, 1998; Valette-Silver et al. 1999). Because oysters live in challenging environments that require significant adaptation to a variety of environmental factors (Zhang et al. 2012), a key avenue for future research is to characterize the potential adaptive diversification of the MT genes under divergent environmental pressures across the C. virginica range, within the context of the species’ complex biogeographic history.
Broader geographical sampling across the Atlantic and Gulf of Mexico and more powerful population genomic techniques to characterize allelic variation in CvMT with respect to genome-wide variation will ultimately be needed to fully assess any differences in CvMT diversity and their relationship to C. virginica population structure.

Given the lower prevalence of CvMT-II genes in the Gulf of Mexico populations, we hypothesized that a significant number of these oysters may carry an ancestral CvMT-I and -IV locus (Fig. 5) with an intact CvMT-IB isoform. To investigate the presence of CvMT-IB isoforms in the Gulf of Mexico populations, we cloned and sequenced PCR products from multiple oysters from both Perdido Pass and Sandy Bay, and isolated multiple CvMT-IB isoforms from both populations. Interestingly, we isolated two novel CvMT-IB-like isoforms (CvMT-IB2, NCBI Acc# KU557446 and CvMT-IB2α, NCBI Acc# KU557447) from a Perdido Pass oyster that are closely related to the secondary α-domains found in the CvMT-II isoforms. To differentiate these new isoforms, we have redesignated the original CvMT-IB isoform as CvMT-IB1. CvMT-IB2 differs from CvMT-IB1 by only two amino acid substitutions; notably, the residue at the third position is aspartic acid (D), which is consistent with all of the secondary α-domains found in the CvMT-II isoforms.

However, the most important observation is the identification of the CvMT-IB2α isoform, which has a mutation in the linker region that converts the lysine codon to a stop codon, as in all of the CvMT-II isoforms (Fig. 9). Thus, CvMT-IB2α may represent the ancestral gene that we hypothesize eventually gave rise to the terminal secondary α-domain found in all of the CvMT-II isoforms. Additionally, we randomly sequenced multiple PCR products from Sandy Bay oysters and identified only CvMT-IA and CvMT-IB2 isoforms. This raises the possibility that a significant percentage of Gulf of Mexico oysters may carry an ancestral CvMT locus consisting solely of CvMT-I and CvMT-IV genes (Fig. 5). The PCR screening results (Fig. 8) suggest that some of the Delaware Bay oysters may also carry the ancestral CvMT-I and -IV locus.

Fig. 9

Alignment of the deduced amino acid sequences of CvMT-IB isoforms with secondary α-domains from representative CvMT-II isoforms. Two new CvMT-I isoforms (designated CvMT-IB2 and -IB2α) most closely related to the original CvMT-IB (redesignated CvMT-IB1) were identified in Gulf of Mexico oyster populations. The CvMT-IB2α isoform contains the linker-region point mutation found in all of the CvMT-II isoforms

Our data support the hypothesis that the CvMT-II isoforms originated in a Gulf of Mexico oyster population. A combination of coastal currents and the Gulf Stream could have supported gene flow from the Gulf of Mexico to southeastern U.S. populations (ACE Basin), where local selective factors may have driven the duplication events that produced the substantial diversity of CvMT-II isoforms. C. virginica is a broadcast spawner with a planktonic larval stage (Galtsoff 1964), a life history that should limit demographic isolation over large geographic scales; nonetheless, the species can exhibit pronounced local or regional population structure (Rose et al. 2006; Anderson et al. 2014). It will thus be interesting to investigate the genetic diversity of C. virginica MTs to reveal the balance between the contributions of historical biogeographic events and past environments to genetic divergence and those of ongoing gene flow and selection under contemporary conditions.

Conclusions

The American oyster, Crassostrea virginica, has the greatest diversity of metallothionein isoforms identified in any metazoan species to date. The structural diversity of CvMTs presents new and exciting opportunities for future studies of the evolutionary, structural, and functional diversification of MT proteins. Future studies that incorporate broad geographical and environmental sampling from both Atlantic and Gulf of Mexico populations, and that assess genome-wide genetic variation and population differences in response to physiological stress, could reveal important roles of abiotic factors and ecological niche in structuring the genetic diversity of the CvMT genes, as well as many other stress-response genes.