Introduction

Spiders can make up to six different proteinaceous silks that display a variety of remarkable physicochemical properties (Altman et al. 2003; Gosline et al. 1986). Spider silks, particularly dragline silk, have attracted much attention for their potential exploitation in the development of new materials (Allmeling et al. 2008; Hedhammar et al. 2010; Schacht and Scheibel 2011). The combination of strength and elasticity in dragline fibers suggests that materials based on these fibers could outperform currently available man-made materials such as steel, nylon and Kevlar (Gosline et al. 1999). Spider silks are also biocompatible and biodegradable, which makes them well suited to a variety of biomedical applications (Hedhammar et al. 2010). However, to take advantage of these remarkable qualities, sufficient amounts of the constituent proteins will need to be readily available. This cannot be accomplished by harvesting the spinning dope from spiders, because a single spider produces <1 mg per day (Seidel et al. 1998). In addition, farming spiders is impractical because of their territorial and cannibalistic behavior.

The spider silk with the greatest mechanical toughness is major ampullate or dragline silk. It is made of two large, repetitive proteins, major ampullate spidroins 1 and 2 (MaSp1 and MaSp2; Hinman and Lewis 1992; Xu and Lewis 1990). The full-length sequences that encode these proteins have been determined (Ayoub et al. 2007). The two proteins share a general architecture (Fig. 1): both have large, central glycine- and alanine-rich repeat domains that consist of approximately 100 copies of shorter, imperfect repeat blocks (Bini et al. 2004; Ayoub et al. 2007). MaSp1 and MaSp2 repeat blocks differ in length (~30 and ~40 amino acids, respectively) and in the nature of their repeat motifs. While each repeat block in MaSp1 and MaSp2 contains a poly-alanine tract, the amino acid triplet GGX is common in MaSp1 repeat blocks, whereas MaSp2 repeat blocks contain proline-rich motifs (GPGGY and GPGQQ) (Bini et al. 2004). The repeat domain of MaSp1 is virtually devoid of proline (Hinman and Lewis 1992). The repeat domains of other silks (e.g. tubuliform and flagelliform silks), while conserved within and across species, differ dramatically from those of the MaSps (Huang et al. 2006). These differences in repeat domains have been shown to be directly associated with differences in the tensile properties of various spider silks (Xia et al. 2010).

Fig. 1

Diagrammatic representation of recombinant mini-spidroin protein domain architecture and purification strategy. a Mini-spidroin structure. Each construct contains NTD and CTD flanking repeat domain multimers (R8 shown) followed by the intein-CBD domain used for protein purification. b, c Purification strategy. rMaSp-intein-CBD protein pre-intein activation (b) and post-intein activation (c)

The repeat domains are flanked on both the N- and C-terminal sides by short non-repetitive sequences of approximately 150 amino acids each (the N- and C-terminal domains; NTD and CTD). The NTDs and CTDs are conserved not only between MaSp1 and MaSp2 but also among other silk types within a species and across species (Beckwitt and Arcidiacono 1994; Garb et al. 2010; Motriuk-Smith et al. 2005). This suggests that these non-repetitive domains have a common function in silk biology and the self-assembly process.

Recent studies on the functional role(s) of the NTD and CTD strongly suggest they are involved in regulating the self-assembly process. The CTD forms homodimers through one disulfide bond and two salt bridges (Ittah et al. 2007). Changes at the dimer interface and salt bridge region are critical for the transition from soluble protein to solid fiber. The NTD also forms homodimers through salt bridges (Hagn et al. 2010). The strength of these salt bridges depends on the ionic strength (NaCl concentration) of the solution, as shown by a computer simulation study (Gronau et al. 2013). Lowering the pH induces a local structural change that helps to stabilize NTD dimerization (Gaines et al. 2010) and initiates the secondary structure transition from coil to β-sheet (Dicko et al. 2004). Inclusion of the NTD in recombinant protein constructs may therefore provide a valuable control point in the assembly process, one that is lacking in fiber production from repeat domains alone. In addition, one or both terminal domains may function as solubility enhancers, allowing the high protein concentrations found in the lumen of the spider silk gland (Huemmerich et al. 2004a, b). While it has been demonstrated that fibers can be formed from repeat-only domains, inclusion of the NTD and CTD may help to stabilize proteins expressed in heterologous systems and may also allow the development of novel spinning technologies (Ittah et al. 2006).

Due to the limitations inherent in harvesting silks directly from spiders, there has been widespread use of genetic engineering for the production of recombinant spidroin-like proteins. The expression systems employed thus far include prokaryotes (E. coli and Salmonella: Arcidiacono et al. 1998; Lewis et al. 1996; Widmaier et al. 2009; Xia et al. 2010), lower eukaryotes (yeasts: Fahnestock and Bedzyk 1997; Gaines and Marcotte 2011), transgenic plants (Arabidopsis, potato, and tobacco: Scheller et al. 2001; Yang et al. 2005; Hauptmann et al. 2013a, b, 2015; Weichert et al. 2014; Menassa et al. 2004; Patel et al. 2007), mammalian cell lines (Lazaris et al. 2002), insects (silkworm: Miao et al. 2006; Teulé et al. 2011) and animals (mice and goats: Service 2002; Xu et al. 2007). Each of these approaches has met with some success, although none has yet surmounted the largest problem: scalable expression and purification of recombinant spidroin-like proteins for materials development.

The use of a high-biomass plant such as tobacco offers advantages over other approaches for recombinant spidroin-like protein production. Given the demonstration that these proteins can be purified from crude plant extracts (see below), optimization of the processes involved could make the approach highly scalable.

Here, we present the assembly of mini-spidroin genes that encode native MaSp1 and MaSp2 NTD and CTD sequences flanking an abbreviated number of the respective consensus repeat domains. Downstream of the mini-spidroin CTD is a self-cleavable modified minimal intein from the bacterium Mycobacterium xenopi (Mxe, Telenti et al. 1997) fused to a chitin binding domain (CBD; Evans et al. 1999). The Mxe intein variant used here (N198A) is disabled for cleavage at the C-terminus of the intein but retains the ability to cleave at the intein N-terminus when activated by a reducing agent. The CBD is derived from the C-terminal domain of chitinase A1 from Bacillus circulans WL-12 (Watanabe et al. 1994). We demonstrate that, upon introduction into Nicotiana tabacum, the genes are expressed and the mini-spidroin proteins can be purified by chitin affinity chromatography and intein activation. Purified mini-spidroins were dialyzed and concentrated by freeze-drying, yielding viscous, gelatin-like fluids.

Materials and methods

Mini-spidroin gene construction and generation of transgenic plants

A synthetic multi-cloning site (MCS), generated by annealing and extension of overlapping oligonucleotides, was used to replace the MCS in pT7T3α-19. The synthetic MCS was used to assemble the mini-spidroin genes and contained the following restriction enzyme sites: HindIII-AscI-NcoI-XbaI-XmaI/SmaI-AgeI-NgoMIV-BglII-BamHI-NotI-EcoRI. Isolation of native NTD sequences for MaSp1A and MaSp2 has been described previously (Gaines and Marcotte 2008). Shorter fragments containing the native NTD sequences, lacking the predicted signal peptide, were generated using primers (Table 1) such that each NTD contained a SmaI/XmaI site at one end and an AgeI site at the other. Similarly, shorter fragments containing the native CTD sequences were generated using primers (Table 1) such that each CTD contained an NgoMIV site at one end and a BglII site at the other.

Table 1 Oligonucleotide primers

Alignment of published MaSp1 and MaSp2 repeat domains provided a consensus amino acid sequence for each:

  • NH2-GGAGQGGYGGLGGQGAGRGGQGAGAAAAAA-COOH for MaSp1

  • NH2-GPGQQGPGGYGPGQQGPGGYGPGQQGPSGPGSAAAAAAAA-COOH for MaSp2

Reverse translation provided a nucleotide sequence optimized for tobacco codon usage. Consensus nucleotide repeats were created by annealing and extension of overlapping oligonucleotides (Table 1) such that the repeat domains contained an AgeI site at one end and an NgoMIV site at the other.
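Reverse translation assigns a codon to each residue of a consensus repeat; a practical check on the result is that the DNA must not contain the AgeI or NgoMIV sites reserved for the repeat ends. The sketch below illustrates both steps. The codon choices are hypothetical placeholders (one synonymous codon per residue), not the tobacco-optimized codons actually used, which are defined by the oligonucleotides in Table 1.

```python
# Consensus repeats as given in the text (one-letter amino acid codes).
MASP1_REPEAT = "GGAGQGGYGGLGGQGAGRGGQGAGAAAAAA"
MASP2_REPEAT = "GPGQQGPGGYGPGQQGPGGYGPGQQGPSGPGSAAAAAAAA"

# HYPOTHETICAL one-codon-per-residue choices, for illustration only.
# A real design would draw on a Nicotiana tabacum codon-usage table.
CODON_CHOICE = {
    "G": "GGT", "A": "GCT", "Q": "CAA", "Y": "TAT",
    "L": "CTT", "R": "AGA", "P": "CCA", "S": "TCT",
}

# Sites used for the repeat's cloning ends. Both are palindromic,
# so scanning the top strand is sufficient.
SITES = {"AgeI": "ACCGGT", "NgoMIV": "GCCGGC"}


def reverse_translate(protein: str) -> str:
    """Map each residue to its chosen codon."""
    return "".join(CODON_CHOICE[aa] for aa in protein)


def internal_sites(dna: str) -> list[str]:
    """Names of cloning-site enzymes whose recognition sequence occurs in dna."""
    return [name for name, site in SITES.items() if site in dna]


for name, repeat in [("MaSp1", MASP1_REPEAT), ("MaSp2", MASP2_REPEAT)]:
    dna = reverse_translate(repeat)
    found = internal_sites(dna)
    print(name, len(repeat), "aa ->", len(dna), "nt; internal sites:", found or "none")
```

With these placeholder codons neither repeat contains an internal AgeI or NgoMIV site; a different codon table would need the same check.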

The Mxe gyrA intein and adjacent CBD were amplified from plasmid pTWIN1 (New England Biolabs, Cat. No. N6951S) using primers (Table 1) to produce a fragment with a BglII site at one end and a BamHI site at the other. After amplification and cloning, the single NgoMIV and AgeI sites within the intein-CBD coding region were removed by mutagenesis to facilitate later cloning. These changes resulted in a single amino acid substitution in the intein-CBD region.

The nucleotide fragments described above were assembled into the synthetic MCS. The translational start and stop codons were provided by the NcoI site in the MCS and by the pTWIN1-derived intein-CBD fragment, respectively. To generate multimers of the repeat domains, monomeric copies of the repeat sequence (with AgeI and NgoMIV ends) were gel-purified, self-ligated and re-digested with AgeI and NgoMIV. Tetramers of the repeat domain were then gel-purified and ligated between the NTD and CTD coding regions. Higher-order multimers (8 and 16 tandem copies of the repeat domain) were built by successively combining (pyramiding) shorter multimers.
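The text does not spell out why self-ligation followed by re-digestion with AgeI and NgoMIV yields tandem multimers. One plausible reading, sketched below under the assumption that each monomer ends in intact AgeI (A^CCGGT) and NgoMIV (G^CCGGC) sites, is standard compatible-cohesive-end logic: both enzymes leave 5'-CCGG overhangs, so fragments ligate in any orientation, but only head-to-tail joins form a hybrid junction that neither enzyme recognizes. The `core` sequence is a hypothetical placeholder, not the repeat sequence from Table 1.

```python
AGEI = "ACCGGT"    # AgeI cuts A^CCGGT, leaving a 5'-CCGG overhang
NGOMIV = "GCCGGC"  # NgoMIV cuts G^CCGGC, leaving the same 5'-CCGG overhang

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def digested_monomer(core: str) -> tuple[str, str]:
    """Top strand of the AgeI/NgoMIV-digested monomer in each orientation.

    Both enzymes cut after the first base of their site, so the forward
    fragment starts with 'CCGGT' and keeps the 'G' of the NgoMIV site at its end.
    """
    full = AGEI + core + NGOMIV
    forward = full[1 : len(full) - len(NGOMIV) + 1]
    reverse = revcomp(full)[1 : len(full) - len(AGEI) + 1]
    return forward, reverse


def junction_is_recut(left: str, right: str) -> bool:
    """Join two fragments; True if the 6-bp junction is an AgeI or NgoMIV site."""
    junction = (left + right)[len(left) - 1 : len(left) + 5]
    return junction in (AGEI, NGOMIV)


core = "GGTGCTGGTCAAGGT"  # HYPOTHETICAL placeholder for the repeat coding sequence
fwd, rev = digested_monomer(core)
joins = {
    "head-to-tail (NgoMIV end to AgeI end)": (fwd, fwd),
    "head-to-head (NgoMIV end to NgoMIV end)": (fwd, rev),
    "tail-to-tail (AgeI end to AgeI end)": (rev, fwd),
}
for label, (a, b) in joins.items():
    print(f"{label}: re-cut = {junction_is_recut(a, b)}")
```

Only the head-to-tail junction (GCCGGT) escapes re-cutting, which is consistent with re-digestion breaking inverted joins while leaving tandem arrays with AgeI and NgoMIV ends available for the next round of assembly.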

Plant expression plasmids pKM12 (Dey and Maiti 1999) and pKLP36 (Maiti and Shepherd 1998) were obtained from Dr. Indu Maiti (Kentucky Tobacco Research and Development Center). Both contain a caulimovirus promoter region (from Mirabilis mosaic virus and Peanut chlorotic streak virus, respectively), an MCS, an rbcS terminator and a kanamycin cassette for selection of transgenic plants. The MCS of each expression plasmid was replaced with the synthetic MCS described above to generate pKM12′ and pKLP36′. Assembled mini-spidroin constructs were inserted into pKM12′ and pKLP36′ as AscI/NotI fragments. Constructs were named for the mini-spidroin and repeat number (e.g. pKM12′-Sp1R8 contains the MaSp1 NTD, eight copies of the Sp1 repeat domain and the MaSp1 CTD, followed by the intein-CBD domain). Plasmids pKM12′ and pKLP36′ containing either MaSp1 (with eight or 16 copies of the repeat domain) or MaSp2 (with eight, 16 or 32 copies of the repeat domain) were generated and introduced into Agrobacterium. Agrobacterium-mediated transformation of Nicotiana tabacum and selection were performed essentially as described (Fisher and Guiltinan 1995). Primary transformants (T0) were selected on kanamycin-containing medium, transferred to soil and grown to maturity. Seeds from T0 plants were germinated on kanamycin-containing medium and the seedlings (T1) were transferred to soil. At least 12 T1 plants for each construct were evaluated for the presence of the transgenes by PCR for both the NTD and intein-CBD regions, and all were positive. Between two and six T1 plants per construct were tested for protein expression by immunoblot (see “Immunodetection” below and the Results). For each construct, at least two protein expression-positive lines were confirmed, except for MaSp2R8, for which only a single line was identified. The protein-positive T1 plants were grown to maturity, and T2 plants were used for all subsequent analyses.

Protein extraction and purification

Leaf tissue (routinely in multiples of 50 g) was ground either manually or in a blender in Tris-buffered saline (1X TBS; 20 mM Tris–HCl pH 7.4, 140 mM NaCl) supplemented with 0.1 % Tween 20 (1X TBST) and the protease inhibitors PMSF (1 mM) and TLCK (0.1 mM). The ratio of tissue (T, in g) to extraction buffer volume (E, in ml) was 1:10 (T:E) unless noted otherwise. For small-scale extractions and screening, four 0.5 cm leaf punches were manually ground in 250 µl of extraction buffer. Tissue debris was removed from the crude extracts either by centrifugation (leaf punches) or by filtration through Miracloth (EMD Millipore) followed by 3MM filter paper (larger-scale extractions). Chitin resin (New England BioLabs Inc. #S6651L) was washed with 10 resin volumes of nanopure water followed by five resin volumes of 1X TBS, pH 7.4, before being added to the crude extracts (1–2 ml resin/100 ml crude extract) and incubated at 4 °C overnight with mixing. The following day, the chitin beads were collected and washed with ≥20 volumes of 1X TBST until most or all of the visible chlorophyll had been removed. Mini-spidroins were released from the fusion proteins by incubating the chitin resin with three resin volumes of intein activation buffer (20 mM Tris–HCl, pH 8, supplemented with 500 mM NaCl, 30 mM DTT and 0.1 % Tween 20) at 16 °C for 40 h with mixing. The eluate was collected and the chitin beads were washed twice with one resin volume each of 1X TBST. The washes were pooled with the eluate, and the mini-spidroins were concentrated ten-fold using Pellicon XL 50 Ultrafiltration Cassettes (EMD Millipore, Cat. No. PXC010C50). The concentrated mini-spidroins were dialyzed against 5 mM ammonium bicarbonate at 4 °C overnight. The dialysis tubing was then moved into fresh buffer, and dialysis was continued for an additional 8 h with a buffer change every 4 h. The mini-spidroins were recovered from the tubing and frozen at −80 °C.
Freeze-drying was performed using a Labconco lyophilizer (FreeZone 2.5) at −50 °C and 0.08 mBar until all visible ice crystals were gone. The resulting material was a viscous, gelatin-like fluid.
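The buffer and resin volumes in this protocol scale linearly with the amount of leaf tissue. The sketch below is a hypothetical planning helper, not software used in this work, that tabulates them from the stated ratios. It assumes (i) the crude extract volume roughly equals the buffer volume, (ii) the "≥20 volumes" TBST wash refers to resin volumes, and (iii) the pooled fraction is the three-volume eluate plus the two one-volume washes. The DTT molar mass (154.25 g/mol) is added only to show how much reagent the 30 mM activation buffer needs.

```python
from dataclasses import dataclass

DTT_G_PER_MOL = 154.25  # molar mass of dithiothreitol


@dataclass
class PurificationPlan:
    extraction_buffer_ml: float
    chitin_resin_ml: float
    water_prewash_ml: float
    tbs_prewash_ml: float
    min_tbst_wash_ml: float
    activation_buffer_ml: float
    dtt_mg: float
    pooled_eluate_ml: float
    concentrated_ml: float


def plan_purification(tissue_g: float, buffer_ml_per_g: float = 10.0,
                      resin_ml_per_100ml: float = 1.0) -> PurificationPlan:
    """Scale the chitin/intein purification volumes to a given leaf mass.

    ASSUMPTION: crude extract volume ~ extraction buffer volume.
    """
    extract_ml = tissue_g * buffer_ml_per_g           # T:E of 1:10 by default
    resin_ml = extract_ml / 100.0 * resin_ml_per_100ml  # 1-2 ml resin per 100 ml
    activation_ml = 3 * resin_ml                      # three resin volumes
    pooled_ml = activation_ml + 2 * resin_ml          # eluate + two 1-volume washes
    # 30 mM DTT: mol/L * L * g/mol, converted from g to mg
    dtt_mg = 0.030 * (activation_ml / 1000.0) * DTT_G_PER_MOL * 1000.0
    return PurificationPlan(
        extraction_buffer_ml=extract_ml,
        chitin_resin_ml=resin_ml,
        water_prewash_ml=10 * resin_ml,
        tbs_prewash_ml=5 * resin_ml,
        min_tbst_wash_ml=20 * resin_ml,  # ASSUMPTION: resin volumes
        activation_buffer_ml=activation_ml,
        dtt_mg=dtt_mg,
        pooled_eluate_ml=pooled_ml,
        concentrated_ml=pooled_ml / 10.0,  # ten-fold ultrafiltration
    )


print(plan_purification(50))
```

For a 50 g extraction at the default 1:10 ratio and 1 ml resin per 100 ml, this gives 500 ml of buffer, 5 ml of resin, 15 ml of activation buffer (about 69 mg of DTT) and about 2.5 ml after concentration. Passing `buffer_ml_per_g=5` models the 1:5 ratio compared in the Results.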

Immunodetection

Samples were separated on 8 % SDS–polyacrylamide gels and electro-transferred to PVDF membranes. Recombinant MaSp1A-NTD was used to generate a custom polyclonal antibody (Rockland Immunochemicals, Inc.). The crude antibody was affinity-purified using Affi-10-immobilized MaSp1A-NTD and then cross-adsorbed against total protein isolated from non-transgenic tobacco (immobilized on Affi-10 and Affi-15). The resulting antibody recognizes both MaSp1 and MaSp2 NTDs and was used as the primary antibody. The secondary antibody was alkaline phosphatase-conjugated goat anti-rabbit. Alkaline phosphatase was detected with Lumi-Phos WB (Pierce, Cat. No. 34150), and membranes were imaged using a Fujifilm LAS-1000plus imager.

Results

Mini-spidroin coding regions were assembled as described in the “Materials and methods” section. The encoded mini-spidroins consisted of MaSp1 or MaSp2 NTD and CTD sequences flanking 8, 16 or 32 copies of the respective repeat domain, followed by the intein-CBD. Attempts to generate larger repeat domains were unsuccessful. Because the constructs differ only in the number and type of repeat domains, the encoded proteins are hereafter referred to as rMaSp1R8, rMaSp1R16, rMaSp2R8, rMaSp2R16 and rMaSp2R32. All cloning junctions were confirmed by sequencing. Each assembled mini-spidroin gene was transferred into the plant expression plasmids pKM12′ and pKLP36′ and introduced into Nicotiana tabacum.

Purified rMaSp1A-NTD protein was used to generate and affinity-purify an anti-NTD antibody. The purified antibody detects rGST-MaSp1A NTD (Fig. A1, Lane 1) but does not cross-react with any protein in wild-type tobacco leaf crude extract (Fig. A1, Lane 2). Figure 2 and Fig. A1 (Lanes 3 and 4) show that the antibody readily detects both rMaSp1 and rMaSp2 proteins containing 8 and 16 copies of the repeat domains in crude extracts from transgenic plants. The expected molecular weights (MWs) of the full-length rMaSp1- and rMaSp2-intein-CBD fusions are given in the legend, and the corresponding bands are marked on the image with asterisks. Interestingly, the intein domain in the rMaSp2R8 fusion protein appears to activate spontaneously to some extent. This is evidenced by a MaSp2-NTD-containing polypeptide whose molecular weight is consistent with removal of the 28 kDa intein-CBD domain (Fig. 2, Lane 2; Fig. 4b). A lower molecular weight band is also seen in Fig. 2, Lane 4 (rMaSp2R16), but its size is not consistent with simple removal of the intein-CBD domain. The identity of this band is not clear, but it may reflect anomalous migration of rMaSp2 proteins with larger numbers of repeat domains. Although the rMaSp1R8 and rMaSp1R16 samples in Fig. 2 do not display this cleavage, it has been observed in some rMaSp1 samples (Fig. A2). It is not clear whether this cleavage occurs in planta; if it did, one might expect all (or at least most) of the recombinant protein to be cleaved. However, abundant full-length protein is clearly present in all crude extracts. This suggests either that intein activation occurs at some point during extraction and purification, or that the rMaSp fusions can, under some circumstances, adopt a particular structural arrangement that leads to intein activation and cleavage.
Some bands at higher molecular weights are also visible, particularly in the rMaSp2 samples; these likely correspond to multimers of full-length fusion proteins, of mini-spidroins lacking the intein-CBD, or of combinations thereof.

Fig. 2

Immunodetection of recombinant mini-spidroins in crude leaf extracts. Leaf punches were hand-ground in microfuge tubes in the presence of 250 µl of extraction buffer. Each lane represents 25 µl of crude extract (~20 µg TSP). Full length recombinant proteins are marked by asterisks. Lane 1, rMaSp1R8 (73 kDa); Lane 2, rMaSp2R8 (81 kDa); Lane 3, rMaSp1R16 (92 kDa); Lane 4, rMaSp2R16 (108 kDa). Primary antibody was affinity-purified polyclonal rabbit anti-rMaSp1A NTD. Molecular weight marker sizes (in kDa) are shown to left

Figure 3 shows immunodetection of rMaSp2 proteins containing 32 copies of the repeat domain in crude extracts from both pKM12′ and pKLP36′ transgenic lines. The expected MW of the full-length rMaSp2R32-intein-CBD fusion is given in the legend. Spontaneous intein activation appears to be much lower for rMaSp2R32 than for rMaSp2R8 and rMaSp2R16. The signal strengths of the rMaSp2R32 bands from the two individual pKLP36′ lines (Fig. 3, Lanes 2 and 3) indicate that the amount of recombinant protein varies between transgenic lines, even under control of the same promoter.

Fig. 3

Immunodetection of recombinant mini-spidroin rMaSp2R32 (162 and 134 kDa, with and without the intein-CBD, respectively) in crude leaf extracts. Leaf punches were hand-ground in microfuge tubes in the presence of 250 µl of extraction buffer. Each lane represents 25 µl of crude extract (~20 µg TSP). Full length recombinant proteins are marked by asterisks. Lane 1, pKM12′-Sp2R32 (Line 1.2.1); Lane 2, pKLP36′-Sp2R32 (Line 2.3); Lane 3, pKLP36′-Sp2R32 (Line 3.5). Primary antibody was affinity-purified polyclonal rabbit anti-rMaSp1A-NTD. Molecular weight marker sizes (in kDa) are shown to left
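As a rough consistency check, the MW differences between constructs in the Fig. 2 and Fig. 3 legends should match the added repeat copies multiplied by the mass of one consensus repeat. The sketch below computes repeat masses from standard average residue masses. It ignores any junction (cloning-scar) residues between repeats, an assumption, and is not presented as the way the legend values were calculated.

```python
# Average residue masses (Da) for the residues present in the consensus repeats.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "L": 113.1594, "Q": 128.1307, "R": 156.1875, "Y": 163.1760,
}

MASP1_REPEAT = "GGAGQGGYGGLGGQGAGRGGQGAGAAAAAA"
MASP2_REPEAT = "GPGQQGPGGYGPGQQGPGGYGPGQQGPSGPGSAAAAAAAA"


def repeat_kda(repeat: str) -> float:
    """Mass of one repeat within a chain (residue masses only, no terminal water)."""
    return sum(RESIDUE_MASS[aa] for aa in repeat) / 1000.0


# Expected full-length fusion sizes (kDa) from the Fig. 2 and Fig. 3 legends.
LEGEND_KDA = {
    ("MaSp1", 8): 73, ("MaSp1", 16): 92,
    ("MaSp2", 8): 81, ("MaSp2", 16): 108, ("MaSp2", 32): 162,
}

comparisons = [
    ("MaSp1", MASP1_REPEAT, 8, 16),
    ("MaSp2", MASP2_REPEAT, 8, 16),
    ("MaSp2", MASP2_REPEAT, 16, 32),
]
for name, repeat, low, high in comparisons:
    predicted = (high - low) * repeat_kda(repeat)
    observed = LEGEND_KDA[(name, high)] - LEGEND_KDA[(name, low)]
    print(f"{name} R{low}->R{high}: predicted +{predicted:.1f} kDa, legend +{observed} kDa")
```

One MaSp1 repeat comes to about 2.3 kDa and one MaSp2 repeat to about 3.4 kDa, so the predicted increments (about 18.5, 27.0 and 54.0 kDa) agree closely with the legend differences of 19, 27 and 54 kDa.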

As part of developing our purification protocol, we compared hand and blender grinding of leaf tissue at two different tissue:extraction buffer ratios (T:E, g:ml). Based on immunodetection (Fig. A3), hand grinding at a T:E of 1:5 was slightly better than the other conditions tested, and there was little difference in the laddering observed. The higher tissue-to-buffer ratio (1:5) also reduces the volume of extraction buffer required, which would be an advantage in scaled-up production. Apart from multimerization and spontaneous intein cleavage, little protein degradation is observed with the current process.

We also assessed our ability to affinity-purify the recombinant mini-spidroins rMaSp1R8 and rMaSp2R8, and subsequently to remove the CBD tag by intein activation, by monitoring several points in the purification process (Fig. 4). After chitin bead binding, the crude extract was essentially depleted of full-length fusion protein (Fig. 4a, b, Lanes 2 and 3). The full-length fusion proteins were also efficiently bound to the chitin beads (Fig. 4a, b, Lanes 4; 15 µl of a 50 % slurry of the beads). We routinely observe cleaved fusion protein in the chitin bead fraction for the rMaSp2 proteins, suggesting that additional spontaneous intein activation occurs after binding but before incubation with intein activation buffer. After intein activation and concentration of the pooled eluate and wash fractions, most of the protein is present as monomeric rMaSp protein (Fig. 4a, b, Lanes 5). Some protein remained bound to the beads after cleavage (compare Lanes 4 and 6; 15 µl of a 50 % slurry of the beads each), but the amount was minimal. Variable amounts of laddering are common in different extractions of the rMaSp proteins; this is particularly true, and much more easily seen, in the rMaSp2 samples. It is not yet clear whether this variability reflects differences between individual extractions from a given plant or differences among siblings within a particular line.

Fig. 4

Purification of recombinant mini-spidroins from crude leaf extracts. a rMaSp1R8; b rMaSp2R8. Lanes 1, positive control rGST-MaSp1A NTD fusion protein (43 kDa), 12 ng; Lanes 2, 25 µl of crude extracts (~20 µg TSP). Full length recombinant proteins are marked by asterisks; Lanes 3, crude extract post chitin binding (25 µl, ~20 µg TSP); Lanes 4, chitin beads post binding (15 µl of 50 % slurry); Lanes 5, pooled intein cleavage buffer and wash post concentration (5 µl, ~15 µg protein); Lanes 6, chitin beads post intein activation and wash (15 µl of 50 % slurry). Primary antibody was affinity-purified polyclonal rabbit anti-rMaSp1A NTD. Molecular weight marker sizes (in kDa) are shown to left

Mini-spidroin yields, expressed as a percentage of total soluble protein (TSP), were estimated from the extraction and purification process shown in Fig. 4. Yields of untargeted (cytoplasmic) mini-spidroins in tobacco leaves are estimated to be 0.7 and 1.9 % TSP for rMaSp1R8 and rMaSp2R8, respectively (Table 2). Purified mini-spidroins were dialyzed against 5 mM ammonium bicarbonate and lyophilized. After extensive freeze-drying, the mini-spidroins formed viscous, gelatin-like fluids.
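The % TSP figure is a simple ratio, and inverting it gives a feel for how much mini-spidroin a given amount of extract represents. The sketch below uses the ~20 µg TSP per 25 µl lane loading from the figure legends. It is an order-of-magnitude illustration only: the reported yields refer to purified mini-spidroin, whereas crude-extract lanes contain the heavier intein-CBD fusion.

```python
def percent_tsp(target_ug: float, tsp_ug: float) -> float:
    """Target protein as a percentage of total soluble protein (TSP)."""
    if tsp_ug <= 0:
        raise ValueError("total soluble protein must be positive")
    return 100.0 * target_ug / tsp_ug


def target_ug_from_percent(pct: float, tsp_ug: float) -> float:
    """Inverse of percent_tsp: mass of target protein implied by a % TSP value."""
    return pct / 100.0 * tsp_ug


LANE_TSP_UG = 20.0  # ~20 ug TSP per 25 ul of crude extract (figure legends)
YIELD_PCT = {"rMaSp1R8": 0.7, "rMaSp2R8": 1.9}  # estimates reported in the text

for construct, pct in YIELD_PCT.items():
    ug = target_ug_from_percent(pct, LANE_TSP_UG)
    print(f"{construct}: {pct} % TSP ~ {ug:.2f} ug per {LANE_TSP_UG:.0f} ug TSP")
```

At these yields, a 20 µg TSP aliquot would correspond to roughly 0.14 µg (rMaSp1R8) or 0.38 µg (rMaSp2R8) of mini-spidroin.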

Table 2 Mini-spidroin yield

Discussion

Protein production in transgenic plants

Plants are attractive hosts for the production of recombinant proteins for a variety of reasons including increased likelihood of appropriate post-translational modifications compared to prokaryotic expression systems and elimination of the risk of contaminating human/mammalian pathogens, a serious concern in many eukaryotic expression systems. Plants expressing significant levels of recombinant proteins also have the potential to provide a cost-effective way to scale up production.

However, target protein production levels can vary greatly across different transgenic lines (Fig. 3) and even among individuals within a single transgenic line. In some cases, recombinant protein is no longer detectable after a few generations, even when the seedlings are germinated under selection. This may be due to gene silencing, a common obstacle to transgene expression in plants, particularly when the transgene contains highly repetitive sequences (Stam et al. 1997). In addition, as has been noted recently, the cost and efficiency of downstream processing in “molecular farming” can present significant challenges. For example, in a study of an elastin-like peptide (ELP)-MaSp1 repeat fusion protein, transglutaminase was required to crosslink lysine and glutamine residues in vitro to produce a multimerized, nearly native-sized ELP-silk-like protein (Weichert et al. 2014).

Despite these concerns, plants continue to be a frequent choice for heterologous gene expression. Not surprisingly, targeting recombinant protein production to various locations/organelles in plants has been shown to affect expression levels. Expression of a spider silk-like protein (tandem MaSp1 repeat domains) in either the apoplast or ER lumen of Arabidopsis leaves resulted in 8.5 and 6.7 % total soluble protein (TSP), respectively (Yang et al. 2005). The same study found that expression levels in seeds harboring the ER-targeting construct can be even higher (18 % TSP). Our yields in tobacco leaves are estimated to be 0.7 and 1.9 % TSP for rMaSp1R8 and rMaSp2R8, respectively. These are substantially lower than the levels cited above for Arabidopsis, but comparable to the 2 % TSP yield of an ER-targeted MaSp1 repeat domain construct produced in tobacco leaves (Scheller et al. 2001).

We also see rMaSp-NTD-positive bands at higher molecular weights in the crude leaf extracts and at some steps in the purification process that likely correspond to self-assembled multimers. This type of aggregation, which may also appear as laddering, has been reported by others (Menassa et al. 2004; Sponner et al. 2004, 2005; Guehrs et al. 2008); these bands could represent associations of full-length fusion proteins, of recombinant proteins lacking the intein-CBD, or of combinations thereof.

The rMaSp proteins expressed here are qualitatively different from those in many other studies in that we have incorporated native spider silk NTD and CTD domains (in addition to the intein-CBD self-cleavable affinity tag). Spider silk-like proteins with only the repeat domain frequently display solubility problems due to the high hydrophobicity of the repeat domains or to premature aggregation (Rammensee et al. 2008; Zhang et al. 2008), whereas our mini-spidroins with flanking NTD and CTD appear to be readily soluble. Notably, the highly concentrated, purified proteins appear to retain water and remain a gelatin-like fluid rather than forming a powder after lyophilization. The NTD and CTD have been proposed to enhance the solubility of spidroin proteins (Huemmerich et al. 2004a, b; Gao et al. 2013), and our observations are consistent with that hypothesis.

The availability of both MaSp1 and MaSp2 mini-spidroins with variable copies of the repeat domains enables us to evaluate their assembly into fibers as individual proteins and in various combinations. Thus, we can examine the distinct features of MaSp1 and MaSp2 proteins and the effect of repeat domain numbers on the mechanical properties of a broad range of biomaterials including fibers, hydrogels, coating materials, etc. These materials are likely to have many applications in medical and engineering related fields.