Introduction

Nonribosomal peptides are produced by large multifunctional enzymes [20]. Nonribosomal peptide synthesis requires at least three functional domains: the adenylation (A) domain, which selects and activates the cognate amino acid; the peptidyl carrier protein (PCP) domain, which transports the activated intermediate; and the condensation (C) domain, which catalyzes peptide bond formation, or the cyclization (Cy) domain, which catalyzes heterocycle formation [15]. The A domain is regarded as a “gatekeeper” because of its specific substrate selectivity [15].

Bacillamides, which are nonribosomal peptides, can be categorized as bacillamides A, B, C, D, and E (Fig. S1) [1, 9, 10, 16, 22]. The tryptamide thiazole (bacillamide A–D) or tryptamide thiazoline (bacillamide E) motif, a typical characteristic of the bacillamides, is a building block present in many bioactive cyclic peptides [23]. Bloudoff et al. [1] established a bacillamide synthesis assay by expressing the bacillamide synthetase, a one-protein, six-domain, 265-kDa NRPS.

In our previous study, bacillamide C was isolated from Bacillus atrophaeus C89 associated with the sponge Dysidea avara, and a biosynthetic pathway for bacillamide C was proposed [25]. A nonribosomal peptide synthetase (NRPS) gene cluster is predicted to participate in the biosynthesis of bacillamide C (Fig. 1) [26].

Fig. 1

Proposed biosynthetic pathway of bacillamide C. Each circle represents an enzymatic domain of the NRPS (EIM09914.1, bacitracin synthetase): Ox, oxidase domain (EIM09913.1); A, adenylation domain; Cy, cyclization domain; C, condensation domain; PCP, peptidyl carrier protein domain; AADC, aromatic l-amino acid decarboxylase domain. The C domain catalyzes peptide bond formation between the thiazole derivative produced by the A1-PCP1 module and Cy-A2-PCP2 module, and l-tryptamine is decarboxylated by the AADC enzyme to produce a bacillamide precursor; subsequently, bacillamide is synthesized after acetyl modification at the N-terminus of the bacillamide precursor

Using sequence comparisons and homology modeling, Eppelmann et al. [7] deciphered the selectivity-conferring code of NRPSs; site-directed mutagenesis guided by this code represents a powerful alternative for the genetic manipulation of NRPS biosynthetic templates and the rational design of novel peptide antibiotics. The crystal structure of the A domain (PheA) from gramicidin synthetase bound with l-phenylalanine and adenosine monophosphate was determined [5]. Eppelmann et al. [7] and Challis et al. [4] predicted the binding specificities of different A domains, which depend on remarkably conserved recognition templates that are retained in the substrate-binding pockets. The online tool NRPSpredictor 2 can predict NRPS A-domain specificity in several seconds [17]. Predicting the substrate preference of NRPSs will help reveal their specific biological functions [11].

Here, we provide in vitro evidence supporting the hypothesized substrate selection of the two adenylation domains in the NRPS of B. atrophaeus C89, using bioinformatics and in vitro recombinant enzyme kinetics analyses. Our results pave the way for the design of more effective analogs.

Materials and methods

Bacterial strains, plasmids, and culture conditions

Bacillus atrophaeus C89 (CCTCC AB 2016282) was isolated from the sponge D. avara in the South China Sea [14]. The plasmid pET28a (TransGen, Beijing, China) was used for expressing the A1 domain. The plasmid pEASY™-E1 (TransGen, Beijing, China) was used for expressing the A2 domain. Escherichia coli Trans1-T1 (TransGen, Beijing, China) was used for propagating the plasmids, and E. coli BL21 (DE3) (TransGen, Beijing, China) was used as the host for expressing the A1 and A2 domains. B. atrophaeus C89 was incubated at 28 °C in a liquid medium containing 0.5% beef extract and 1% peptone. E. coli Trans1-T1 and BL21 were grown in Luria–Bertani (LB) medium at 37 °C.

DNA isolation and PCR amplification of the target genes

The genomic DNA of B. atrophaeus C89 was extracted by the modified Marmur method (Marmur 1961). The strain was grown in a liquid medium containing 0.5% beef extract and 1% peptone at 28 °C for 24 h and then collected by centrifugation at 12,000×g for 5 min. The cells were washed twice with 1 mL of TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0) and centrifuged at 12,000×g for 5 min. The cells were then suspended in 500 μL of TE buffer and treated with lysozyme (2 mg/mL) at 37 °C for 1 h. Then, the cells were treated with proteinase K (20 mg/mL) and 20% sodium dodecyl sulfate (SDS) at 55 °C for 1 h. The supernatant was collected by centrifugation after lysis. The lysate was extracted twice with an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1, v/v/v) and washed with an equal volume of chloroform/isoamyl alcohol (24:1, v/v). DNA was precipitated by adding an equal volume of isopropanol at 4 °C and then harvested by centrifugation at 12,000×g for 15 min. The pellets were washed with 500 μL of 70% ethanol, resuspended in 30 μL of TE buffer, and stored at −20 °C. Primers A1F (5′-CGGGATCCATCATTTCGGAAGAAGA-3′)/A1R (5′-GGAATTCTCAGGCGGCATATGGGAT-3′) and A2F (5′-CTGTTGGAACAATTGGTGAAAC-3′)/A2R (5′-TCAAGTTTTTGGAGCAATATATACC-3′) were designed to amplify the A1 and A2 domain genes, respectively, according to the sequence of the NRPS gene cluster (7011 bp) in the genome of B. atrophaeus C89 (GenBank No. JQ687535) [13]. Trans Taq™-T DNA Polymerase (TransGen, China) was used for PCR amplification under the following conditions: initial denaturation at 94 °C for 5 min; 30 cycles of 94 °C for 30 s, 53 °C for 30 s, and 72 °C for 90 s; and a final extension at 72 °C for 10 min. PCR products were purified using the Cycle Pure kit and Gel Extraction kit (Axygen, USA). The purified PCR product of the A1 domain gene (1578 bp BamHI/EcoRI fragment) was ligated into pET28a digested with the same restriction enzymes to generate plasmid A1. The purified PCR product of the A2 domain gene was ligated into the pEASY™-E1 expression vector (TransGen, China) to generate plasmid A2. Competent E. coli Trans1-T1 was used as the host for both plasmids. The target genes were sequenced using the T7 promoter and T7 terminator primers.
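The A1 fragment is cloned as a BamHI/EcoRI fragment, so the A1 primers should carry the corresponding recognition sequences (GGATCC for BamHI, GAATTC for EcoRI). The following minimal sketch checks this for the two A1 primers quoted above; the function names are hypothetical, and the check is only an illustration, not part of the original protocol.

```python
# Recognition sequences of the two restriction enzymes named in the cloning step.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}

def clean(primer):
    """Uppercase a primer and drop any whitespace left over from typesetting."""
    return "".join(primer.split()).upper()

def find_sites(primer, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site found in the primer."""
    seq = clean(primer)
    hits = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq[i:i + len(motif)] == motif]
        if positions:
            hits[enzyme] = positions
    return hits

# A1 primer sequences as given in the text (5' to 3').
A1F = "CGGGATCCATCATTTCGGAAGAAGA"
A1R = "GGAATTCTCAGGCGGCATATGGGAT"

print("A1F:", find_sites(A1F))
print("A1R:", find_sites(A1R))
```

The forward primer carries the BamHI site and the reverse primer the EcoRI site, each preceded by one or two extra 5′ bases, which matches the directional BamHI/EcoRI cloning into pET28a described above.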

Substrate prediction of A1 and A2 domains

The amino acid sequences of the A1 and A2 domains of the NRPSs from B. atrophaeus C89 were analyzed using NRPSpredictor 2, which can predict bacterial and fungal NRPS A-domain specificity [4]. Amino acid sequences of the A1 and A2 domains and the homologous proteins were analyzed using Discovery Studio (Version 3.5). The homologous proteins 1AMU, 3FCE, 3L8C, 3E7W, and 3ITE were identified in the Protein Data Bank (PDB). Structural models of the A1 and A2 domains were constructed on the basis of the crystal structures of the homologous proteins [18] using the homology modeling protocol of Discovery Studio. The predicted structural models of the A1 and A2 domains were evaluated using Ramachandran plots.

Structural superposition showed that the structural models of the A1 and A2 domains share high similarity with 1AMU. To obtain more accurate results, AMP was added to the binding pockets of the structural models of the A1 and A2 domains according to the position of the AMP-binding pocket in 1AMU. The composite structural models were energy-minimized using the “Smart Minimizer” method with the CHARMM force field (Chemistry at HARvard Macromolecular Mechanics, http://www.charmm.org/) in Discovery Studio [2, 3]. Molecular docking between different amino acids and the structural models of the A1 and A2 domains was performed using the CDOCKER protocol [24]. CDOCKER is a semiflexible docking program based on CHARMM in which ligands are docked to their receptors using soft-core potentials and an optional grid representation.

Expression and purification of recombinant A1 and A2

A single positive colony was inoculated into 5 mL of LB medium containing kanamycin (50 μg/mL) or ampicillin (100 μg/mL) and grown at 37 °C for 12 h. The overnight culture was used to inoculate LB medium containing antibiotics and incubated at 37 °C with vigorous shaking until the OD600 reached 0.6. Isopropyl-β-d-1-thiogalactopyranoside (IPTG) was then added to a final concentration of 0.5 mM, and the culture was incubated at 16 °C for 15 h. The cells were harvested by centrifugation, resuspended in a binding buffer (300 mM NaCl, 20 mM Tris–HCl, pH 8.0), and lysed by sonication. The debris was removed by centrifugation at 12,000×g for 20 min. The supernatant was filtered through a 0.45-μm Millipore filter and loaded onto an equilibrated Ni–NTA agarose resin column (Qiagen Co., Hilden, Germany). After the residual proteins were removed with a wash buffer (300 mM NaCl, 20 mM Tris–HCl, 50 mM imidazole, pH 8.0), the target protein was eluted with an elution buffer (300 mM NaCl, 20 mM Tris–HCl, 200 mM imidazole, pH 8.0) using gravity flow. The purified protein was analyzed using SDS-PAGE in a 10.0% (w/v) polyacrylamide gel. The target protein in the elution buffer was then transferred to a solution buffer (50 mM NaCl, 20 mM Tris–HCl, pH 8.0) using Millipore 10-kDa-MWCO ultrafilters. The purified protein concentration was determined using the Bradford assay (Sangon, Shanghai, China), and the protein was stored at −80 °C.

Activity and dynamics analyses of A1 and A2 domains

The activities of the A1 and A2 domains were measured by continuously monitoring the release of inorganic pyrophosphate (PPi) at 360 nm for 30 min using a Multiskan Spectrum (Perkin Elmer, Connecticut, USA). The EnzChek pyrophosphate assay kit (MicroProbes) was used to estimate the concentration of PPi. The standard curve of the pyrophosphate assay was generated using standard pyrophosphate. To determine the initial velocity with 19 standard amino acids as substrates, 10 µL of suitably diluted A1 or A2 was added to each 100-µL reaction system, together with 4 mM substrate and 2 mM ATP. The velocity was calculated on the basis of consecutive increases in the absorbance at 360 nm. The reactions were conducted in triplicate, with boiled A1 or A2 as the control.
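The conversion from absorbance readings to a PPi release rate can be sketched as two linear fits: one for the pyrophosphate standard curve and one for the reaction time course. The code below is an illustrative sketch only; the function names and all numerical values are hypothetical, and it assumes that absorbance at 360 nm is linear in PPi concentration over the range used.

```python
def linear_fit(x, y):
    """Ordinary least-squares line through (x, y); returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def ppi_velocity(time_min, a360, std_ppi_um, std_a360):
    """Convert an A360 time course into a PPi release rate in µM per min."""
    a360_per_um, _ = linear_fit(std_ppi_um, std_a360)  # standard curve slope
    a360_per_min, _ = linear_fit(time_min, a360)       # reaction progress slope
    return a360_per_min / a360_per_um

# Hypothetical standard curve and time courses (not data from this study).
std_ppi_um = [0.0, 10.0, 20.0, 40.0]
std_a360 = [0.05, 0.15, 0.25, 0.45]
time_min = [0.0, 5.0, 10.0, 15.0]
sample_a360 = [0.10, 0.12, 0.14, 0.16]
boiled_a360 = [0.10, 0.10, 0.10, 0.10]  # heat-denatured enzyme control

v_sample = ppi_velocity(time_min, sample_a360, std_ppi_um, std_a360)
v_control = ppi_velocity(time_min, boiled_a360, std_ppi_um, std_a360)
print("net velocity (µM/min):", v_sample - v_control)
```

Subtracting the rate obtained with the boiled enzyme removes any background signal that does not depend on enzyme activity; whether the original analysis applied the control in exactly this way is an assumption.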

Enzyme kinetics of cysteine, alanine, and glycine for the A1 domain and of cysteine and serine for the A2 domain were compared by varying the concentration of each substrate (0.01–0.5 mM alanine, 1–4 mM glycine, and 1–7 mM cysteine for the A1 domain; 0.05–1 mM cysteine and 0.1–2.5 mM serine for the A2 domain). To each 100-µL reaction system, 20 μL of the suitably diluted enzyme was added, with ATP at a final concentration of 2 mM. All reactions were conducted in triplicate. The velocity was calculated on the basis of consecutive increments in the absorbance at 360 nm for 30 min. The Michaelis–Menten equation was fitted to the PPi release velocity as a function of substrate concentration to calculate the values of Km and kcat using GraphPad Prism 5.
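The fitting itself was done in GraphPad Prism 5. As an illustration of what such a fit involves, the sketch below performs a nonlinear least-squares fit of v = Vmax·[S]/(Km + [S]) in plain Python. It is not the Prism algorithm: for each trial Km the best Vmax is obtained in closed form, and Km is then found by a one-dimensional search. All names and data are hypothetical, and kcat is taken as Vmax divided by the total enzyme concentration.

```python
import math

def mm_rate(s, vmax, km):
    """Michaelis-Menten rate for substrate concentration s."""
    return vmax * s / (km + s)

def best_vmax(km, conc, rate):
    """For a fixed Km the model is linear in Vmax, so solve it in closed form."""
    x = [s / (km + s) for s in conc]
    return sum(v * xi for v, xi in zip(rate, x)) / sum(xi * xi for xi in x)

def sse(km, conc, rate):
    """Sum of squared residuals at this Km, using the optimal Vmax."""
    vmax = best_vmax(km, conc, rate)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(conc, rate))

def fit_michaelis_menten(conc, rate, n_grid=200, n_refine=60):
    """Return (Vmax, Km) minimizing the squared error; conc values must be > 0."""
    # Coarse scan of log(Km) over a wide range around the tested concentrations.
    lo, hi = math.log(min(conc) / 100.0), math.log(max(conc) * 100.0)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    best = min(range(n_grid), key=lambda i: sse(math.exp(grid[i]), conc, rate))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n_grid - 1)]
    # Golden-section refinement inside the bracket found by the scan.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(n_refine):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if sse(math.exp(c), conc, rate) < sse(math.exp(d), conc, rate):
            b = d
        else:
            a = c
    km = math.exp((a + b) / 2.0)
    return best_vmax(km, conc, rate), km

def kcat_from_vmax(vmax, enzyme_conc):
    """Turnover number; vmax and enzyme_conc must share concentration units."""
    return vmax / enzyme_conc

# Hypothetical noise-free data generated with Vmax = 2.0 and Km = 0.3 mM.
conc_mm = [0.05, 0.1, 0.25, 0.5, 1.0]
rates = [mm_rate(s, 2.0, 0.3) for s in conc_mm]
vmax_fit, km_fit = fit_michaelis_menten(conc_mm, rates)
print(round(vmax_fit, 3), round(km_fit, 3))
```

Fitting the untransformed equation, rather than a double-reciprocal plot, avoids giving disproportionate weight to the low-concentration points, which is one reason nonlinear fitting is generally preferred for this kind of data.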

Results

Substrate prediction and structural modeling in silico of A1 and A2 domains

Figure S2 shows the specificities of the A1 and A2 domains predicted using NRPSpredictor 2 (http://nrps.informatik.uni-tuebingen.de). Signatures (active sites) of the A1 and A2 domains were extracted automatically from the full-length NRPS sequences (FIMAFDISLLEIESLLAGELNIYGPTETTLCAAL/DILELAILCK for the A1 domain, and TDISFDLSVYDGNTLHSGDISLGGATEASIWSIY/DLYNLSLIWK for the A2 domain). The specificity predictors that provide positive predictions for the signatures are listed in Fig. S2. The large-cluster prediction for the A1 domain was Gly = Ala = Val = Leu = Ile = Abu = Iva, whereas cysteine was predicted for the A2 domain with a score of 100% (Fig. S2). Figure 2 shows the structural models of the A1 and A2 domains built by homology modeling based on 1AMU (Table 1). The conformational plausibility of the residues was assessed using Ramachandran plots. In Fig. S3, points located inside the blue and red circles are reasonable for the structural model, whereas points located outside the red circle are not. For the structural model of the A1 domain, 97.3% of the points were located within the reasonable area (Fig. S3a); for the A2 domain, the value was 97.4% (Fig. S3b). Both values exceed the minimum threshold of 90%. The structural models of the A1 and A2 domains constructed on the basis of homologous proteins were therefore used for subsequent analyses.
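The two signature strings quoted above can be compared directly. The sketch below assumes that the part before the slash is the longer active-site signature and the part after it is the 10-residue specificity-conferring code; the function names are hypothetical, and position-wise identity is used here only as a simple illustration, not as the scoring used by NRPSpredictor 2.

```python
def parse_signature(text):
    """Split 'SIGNATURE/CODE' into its two parts, ignoring any whitespace."""
    signature, code = "".join(text.split()).split("/")
    return signature, code

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Signature strings as reported for the A1 and A2 domains.
A1 = "FIMAFDISLLEIESLLAGELNIYGPTETTLCAAL/DILELAILCK"
A2 = "TDISFDLSVYDGNTLHSGDISLGGATEASIWSIY/DLYNLSLIWK"

a1_signature, a1_code = parse_signature(A1)
a2_signature, a2_code = parse_signature(A2)

print(len(a1_signature), len(a1_code))
print("code identity A1 vs A2:", identity(a1_code, a2_code))
```

The two 10-residue codes agree at only three positions (the first Asp, one Leu, and the final Lys), consistent with the two domains being predicted to select different substrates.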

Fig. 2

Protein structure models of two A domains constructed using homology modeling. a A1 domain; b A2 domain. The α-helix, β-sheet, β-loop, and random coil are represented in red, blue, green, and white, respectively

Table 1 Homology analysis between template proteins, A1 domain, and A2 domain

Active site prediction of A1 and A2 domains

Structural superimposition showed that the A1 and A2 domains shared high sequence and tertiary-structure similarity with 1AMU (Fig. 3a). According to the structural models (Fig. 3b), AMP interacted with Thr172, Thr312, Lys499, Asp396, Gly286, and Ile308 of the A1 domain through hydrogen bonding (Fig. 3b1). Similarly, AMP interacted with Thr158, Thr300, Lys490, Asp386, Gly273, and Trp275 of the A2 domain through hydrogen bonding (Fig. 3b2). These results were highly similar to those for the AMP-binding pocket in 1AMU. The optimized structural models of the A1 and A2 domains bound to AMP were used as the receptors for molecular docking. The positions of the substrate-binding pockets in the A1 and A2 domain structure models were identified by reference to the 1AMU crystal structure. The substrate-binding pocket of the A1 domain model comprised Asp215, Leu316, Ile216, and Ala285, and that of the A2 domain model comprised Asp203, Ile304, Ser272, and Lys490. Accordingly, alanine, threonine, glycine, serine, cysteine, valine, leucine, asparagine, methionine, glutamine, proline, and isoleucine were chosen as ligands, whereas the other amino acids were considered too large to fit in the substrate-binding pockets.

Fig. 3

a Superimposition of A-domain structure model and 1AMU. (1) A1 domain; (2) A2 domain. The structural models of the A1 and A2 domains are represented in yellow; the structural model of 1AMU is shown in green. b Interaction between AMP and the A-domain structure model. (1) A1 domain; (2) A2 domain. The structural models of the A1 and A2 domains are represented as green cartoons. The AMP molecule and residues connected directly with AMP are represented as stick models. Carbon, oxygen, nitrogen, and phosphorus atoms are shown in green, red, blue, and orange, respectively. Dashes indicate hydrogen bonds. c Interaction of amino acids with the A–AMP composite. (1) Alanine with the A1–AMP composite; (2) cysteine with the A2–AMP composite. The structural models of the A1 and A2 domains are represented as green cartoons. The substrates, AMP molecule, and residues connected directly with the substrates are represented as stick models. Carbon, oxygen, nitrogen, phosphorus, and sulfur atoms are shown in green, red, blue, orange, and yellow, respectively. Dashes indicate hydrogen bonds. The figure was generated using PyMOL

The methyl side chain of alanine was located in a hydrophobic zone formed by Ile216, Ala285, and Leu316 (Fig. 3c1). Although alanine is nonpolar, its side chain is very small, indicating that not all the residues lining the specificity pocket may be hydrophobic. During molecular docking, some polar amino acids, such as threonine and cysteine, appeared compatible with the A1 domain. The amino, carboxyl, and sulfhydryl groups of cysteine were bound to the side chains of Asp203 and Ile304, Lys490, and Ser272, respectively, by hydrogen bonds (Fig. 3c2). The binding energy of serine was the closest to that of cysteine among all the amino acids, possibly because of the high structural similarity between cysteine and serine. In addition to cysteine and serine, glycine and alanine were predicted as potential substrates of the A2 domain by the molecular docking simulation. Glycine and alanine may easily enter the binding pocket of the A2 domain during the simulation, because glycine has no side chain and alanine has a small side chain. Threonine seems to be representative of the substrates incompatible with the A2 domain.

The molecular docking simulation was performed on the basis of virtual structural models rather than the real crystal structures of the A1 and A2 domains. Thus, more accurate results of substrate specificities of the two A domains could be obtained through in vitro analyses of recombinant enzyme kinetics.

Heterologous expression and characterization of A1 domain and A2 domain

The 58.8-kDa N-terminally His6-tagged A1 domain and the 59.5-kDa N-terminally His6-tagged A2 domain were expressed in E. coli BL21 (DE3) and purified using affinity chromatography (Fig. 4a, b). The protein purity was determined using sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) analysis.

Fig. 4

SDS-PAGE of purified recombinant A domains. a A1 domain; b A2 domain; M: Protein marker; Lane 1: unpurified sample; Lane 2: purified protein

The amount of inorganic pyrophosphate (PPi) released from the enzyme reaction systems with various substrates was calculated on the basis of consecutive increases in absorbance at 360 nm for 30 min. The results showed that cysteine and glycine were activated with higher reaction velocities than alanine (Fig. 5a1). Although the turnover numbers (kcat) for cysteine and glycine were higher than that for alanine, the catalytic efficiency (kcat/Km) of the A1 domain for alanine (6.972), as revealed by kinetic analyses, was much higher than that for the other two substrates (Fig. 5b (1) and Table 2). This result indicated that alanine was the optimal substrate for the A1 domain.

Fig. 5

a Activation of A1 domain (1) and A2 domain (2) by various amino acids (4 mM). C, G, A, P, S, V, T, L, R, N, F, M, K, Q, I, D, W, H, and E represent cysteine, glycine, alanine, proline, serine, valine, threonine, leucine, arginine, asparagine, phenylalanine, methionine, lysine, glutamine, isoleucine, aspartic acid, tryptophan, histidine, and glutamic acid, respectively. The heat-denatured A1 domain or A2 domain with cysteine as substrate served as the control (C1). Pyrophosphate (PPi) release was detected using a continuous spectrophotometric assay with a 30-min reaction. Values are presented as mean ± standard deviation. b (1) Michaelis–Menten curves of cysteine, alanine, and glycine for the A1 domain; (2) Michaelis–Menten curves of cysteine and serine for the A2 domain

Table 2 Kinetic parameters for the activation of Cys, Gly, and Ala for A1 domain

The initial velocities of the A2 domain for 19 amino acids were calculated on the basis of the release of PPi. The initial reaction velocities for cysteine and serine were substantially higher than those for the other amino acids (Fig. 5a2). To test whether cysteine is the optimal substrate for the A2 domain, kinetic analyses of cysteine and serine were performed (Fig. 5b (2)). The results indicated that the turnover number (kcat) and catalytic efficiency (kcat/Km) for cysteine were 2.3- and 6.74-fold higher than those for serine, respectively (Table 3). The initial velocity and kinetics analyses together indicated that cysteine was the optimal substrate for the A2 domain and could be essential for synthesizing bacillamide C. In contrast to the A1 domain, the A2 domain appears to have little capacity to accept alternative substrates; that is, the A2 domain is more selective than the A1 domain.
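The two fold-changes reported for the A2 domain also fix the ratio of the Michaelis constants. As a short worked step, assuming that "fold higher" denotes a simple ratio and that both reported values are rounded:

```latex
\[
\frac{(k_{cat}/K_m)_{\mathrm{Cys}}}{(k_{cat}/K_m)_{\mathrm{Ser}}}
= \frac{k_{cat,\mathrm{Cys}}}{k_{cat,\mathrm{Ser}}}\cdot
  \frac{K_{m,\mathrm{Ser}}}{K_{m,\mathrm{Cys}}}
\quad\Longrightarrow\quad
\frac{K_{m,\mathrm{Ser}}}{K_{m,\mathrm{Cys}}}
\approx \frac{6.74}{2.3} \approx 2.9
\]
```

Under these assumptions, the Km for serine would be roughly three times that for cysteine, so the preference for cysteine reflects both faster turnover and tighter apparent binding; the measured values are those given in Table 3.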

Table 3 Kinetic parameters for the activation of Cys and Ser for A2 domain

Discussion

Predicting the substrate specificity of an A domain can be seen as deciphering the nonribosomal code of NRPSs for oligopeptide biosynthesis. Conti et al. [5] determined the crystal structure of PheA with bound l-phenylalanine and adenosine monophosphate. Stachelhaus et al. [21] deciphered the specificity-conferring codes of NRPS A domains by comparing the amino acid sequence of PheA with those of 160 other A domains and identified the positions of 10 residues that form the substrate-binding pocket; the similarities of the signature sequences activating alanine and cysteine were 55 and 88%, respectively [21]. In the present study, the prediction of NRPS A-domain substrate specificity showed that the large cluster of the A1 domain was Gly = Ala = Val = Leu = Ile = Abu = Iva, and the prediction was 60% for hydroxy-phenyl-glycine (Hpg) for the A1 domain. Furthermore, the prediction was 100% for cysteine for the A2 domain (Fig. S2). The specificity-conferring codes of the A1 and A2 domains were identified, and the results are consistent with our homology modeling results (Fig. 2).

The present homology-based research method used for identifying the specificity-conferring codes of the target A domains was more accurate than previous methods. Similarly, Schneider et al. [19] discovered several residues that contact the substrate directly in the binding pocket of a CoA ligase using homology modeling based on the crystal structure of PheA and demonstrated their importance for substrate specificity using site-directed mutagenesis. In the present study, homology modeling showed the positions of the substrate-binding pockets of the A1 and A2 domains: the amino acid residues Asp215, Leu316, Ile216, and Ala285 contacted the substrates directly in the structural model of the A1 domain, and Asp203, Ile304, Ser272, and Lys490 did so in the structural model of the A2 domain (Fig. 3). Other unknown factors may influence the substrate selection of the A1 and A2 domains. Moreover, the compatibility of other, larger amino acids with the two A domains remains unknown. Although cysteine and glycine had higher kcat values than alanine, enzyme kinetics experiments showed that alanine had the highest kcat/Km value and the lowest Km value for the A1 domain (Fig. 5 and Table 2). For the A2 domain, cysteine had both the highest kcat value and the lowest Km value (Fig. 5 and Table 3). Therefore, by in silico analysis and in vitro enzyme assays, we showed that alanine and cysteine are the optimal substrates for the A1 and A2 domains, respectively, in the synthesis of bacillamide C. This may explain why bacillamide C is the main bacillamide produced by B. atrophaeus C89 [25].

On the other hand, the molecular docking results (based on homology modeling) suggested that the A1 and A2 domains can also activate amino acids other than alanine and cysteine. Similarly, the A domains of other NRPS modules have shown a broad substrate-activating capacity. For example, Dieckmann et al. [6] discovered a Phe-activating A domain that also activated other amino acids with D- or L-conformations, such as alanine and tyrosine. Huang et al. [8] found a 3-hydroxypicolinic-acid-activating A domain that activated a substantial number of analogs of its natural substrate, some of which had higher kcat values than the natural substrate. However, not all A domains can activate a broad range of substrates. For instance, a hydroxamate-amino-acid-activating A domain was found to activate its natural substrate specifically [12]. Zhang et al. [27] reported a yeast cell surface display method for engineering the substrate specificity of A domains by screening millions of A-domain mutants.

Ultimately, the broad substrate-activating capacity of the two A domains of the NRPSs in B. atrophaeus C89 may be exploited for the synthesis of bacillamide analogs using different amino acid substrates. Furthermore, point mutations of target residues in the binding pockets could also produce novel bacillamide derivatives.