Introduction

Amino acids are the central building blocks of proteins, peptides, nonribosomal peptides (NRPs), and hybrid nonribosomal peptide–polyketides (NRP–PKs). In the biosynthesis of these biomolecules, the carboxy group of an amino acid is generally activated as an aminoacyl adenylate or aminoacyl phosphate with adenosine triphosphate (ATP). The mixed acid anhydrides generated between the carboxylic acids and phosphoric acids are excellent electrophiles, and thus efficiently react with nucleophiles, such as thiols, to form covalent linkages. In the biosynthesis of NRPs and hybrid NRP–PKs, an adenylating enzyme, which is generally composed of 500–600 amino acid residues, activates the carboxy group of an amino acid as an aminoacyl adenylate and ligates the aminoacyl group with the thiol group of the phosphopantetheine arm in the holo form of the carrier protein (CP), which is generally composed of 90–100 amino acid residues, to give an aminoacyl thioester (Scheme 1). Thus, the adenylating enzyme catalyzes two nucleophilic substitutions, adenylate formation and thioester formation, which release inorganic diphosphate (PPi) and adenosine monophosphate (AMP), respectively. The generated aminoacyl-CP thioester functions as an electrophile for amide formation in the nonribosomal peptide synthetase (NRPS) system (Scheme 2a) and for transesterification with the cysteine thiol group in the β-ketosynthase domain leading to subsequent C–C bond formation in the hybrid NRPS–polyketide synthase (PKS) system (Scheme 2b). In the NRPS system, an aminoacyl-CP thioester undergoes nucleophilic attack by the amino group of another aminoacyl intermediate in the condensation enzyme to form an amide bond, giving a dipeptidyl-CP thioester in a stepwise manner. Notably, the reaction order is strictly controlled by these two types of enzymes so that the natural product structures encoded in the genome of the producer strain are assembled as programmed.
In general, the adenylation enzymes accept the holo form of a CP thiol, but do not accept the aminoacyl-CP thioester, to prevent undesired amide bond formation (Scheme 2c).

Scheme 1

Two-step nucleophilic substitution in the adenylation enzymatic reaction

Scheme 2

Condensation reaction in the condensation domain of NRPS and ketosynthase domain in hybrid NRPS–PKS

The amino acid substrate specificity of the adenylation enzymes makes a large contribution to determining the chemical structure of natural products. Information on the substrate specificity of the adenylation enzymes is thus important for rational genetic engineering of NRPS assembly lines to generate novel peptides for drug discovery [5, 6, 12]. Structural information on adenylation enzymes also helps rational engineering of adenylation domains of NRPSs to introduce unnatural amino acids into the peptide structures [80]. Therefore, the mechanism of substrate recognition and the conformational dynamics of adenylation enzymes have been extensively studied [36]. The recognition of an amino acid substrate and the nucleophilic substitution with ATP to afford the aminoacyl adenylate and PPi occur at the active site of the adenylation enzyme. Thus, the positions of ATP and the carboxy group of the substrate amino acid must be precisely fixed to make the appropriate Bürgi–Dunitz angle for nucleophilic attack of the carboxylic acid on the α-phosphorus atom of ATP. To form the required in-line arrangement, in addition to the carboxy group, the α-amino group of an l-α-amino acid and the side-chain at the α position must be properly accommodated at the active site of the adenylation enzyme. Structural analysis of adenylating enzymes and detailed primary sequence comparisons using hidden Markov models have provided important information to allow the prediction of the substrate specificity of functionally uncharacterized adenylation enzymes [15, 50, 95]. A bioinformatic prediction tool for the amino acid substrate is available with high accuracy for the bacterial adenylation enzymes selective for the 20 proteinogenic l-α-amino acids [8, 50], although the tool does not apply reliably to fungal adenylation enzymes because of significant sequence differences.
The tool was established based on the structural analysis of the adenylation domain PheA in GrsA, which activates l-α-phenylalanine in gramicidin biosynthesis (Fig. 1) [19]. The crucial ten amino acid residues (A1-position—A10-position), the so-called nonribosomal code [108] or specificity-conferring code [95], are believed to be involved in the substrate recognition, based on information from the PheA structure, and thus are used to predict the amino acid substrates for adenylation enzymes. Many excellent reviews concerning the nonribosomal code for proteinogenic l-α-amino acids with structural information on the adenylation enzymes have been published [14, 62, 96, 98]. However, information regarding adenylation enzymes that recognize nonproteinogenic amino acids is limited, even though nonproteinogenic amino acids make a substantial contribution to the structural diversity of NRPs and hybrid NRP–PKs, resulting in the important biological activities of these compounds [109] (Fig. 2). For example, the well-known immunosuppressant FK506 (tacrolimus) contains pipecolic acid; the lipopeptide antibiotic daptomycin contains l-ornithine, (2S,3R)-3-methylglutamic acid, and kynurenine; the antibiotic rifamycin belonging to the ansamycin family contains 3-amino-5-hydroxybenzoic acid as the starter unit of the polyketide skeleton; and the anticancer drug bleomycin contains β-Ala in its unique hybrid NRP–PK structure. Although some of the proteinogenic amino acids can be modified by enzymes, including epimerases, N-methyltransferases, O-methyltransferases, and heterocyclization and oxidation domains, in the NRPS assembly line, many nonproteinogenic amino acids that are found in NRPs and hybrid NRP–PKs are independently biosynthesized and incorporated into the main NRPS and hybrid NRPS–PKS assembly lines (see the daptomycin structure in Fig. 2) [98, 109]. 
Thus, the selective activation and thioesterification of nonproteinogenic amino acids by adenylating enzymes is naturally occurring. Structural analysis of NRPSs and PKSs has provided valuable information to better understand the substrate recognition mechanism in the megasynthase [14, 62, 96, 98]. In this review, we describe the structural basis of nonproteinogenic amino acid selective adenylation enzymes in NRPSs and hybrid NRPS–PKSs and discuss the nonribosomal code for nonproteinogenic amino acids. Although the associated proteins, such as condensation enzymes and CPs, affect the enzymatic activity by interacting with the C-terminal Asub domain of adenylating enzymes [58, 68], this review will focus on the active site of the adenylating enzymes, the N-terminal Acore domain, which must accommodate the correct substrate amino acid.

Fig. 1

Active site of l-Phe selective adenylation enzyme PheA. Magenta color represents substrate amino acid. Red color represents acidic residues. Blue color represents basic residues. Green color represents polar residues

Fig. 2

Representatives of nonproteinogenic amino acid containing natural products. Color indicates nonproteinogenic amino acids. Red color nonproteinogenic amino acids are recognized by adenylation enzymes. Magenta color nonproteinogenic amino acids are d-amino acids, which are constructed in the NRPS assembly line

Structural analysis of amino acid adenylation enzymes from 1997 to 2018

Accumulation of information on the genome sequence and the functional characterization of adenylating enzymes have accelerated accurate bioinformatics predictions. However, structural information regarding nonproteinogenic amino acid selective adenylation enzymes is still limited. The amino acid adenylation enzymes that have been structurally characterized to date are summarized in Table 1. In this review, we focus on amino acid adenylation enzymes and do not include enzymes that recognize hydroxybenzoic acid derivatives, such as salicylic acid and 2,3-dihydroxybenzoic acid. The active sites of the representative adenylation enzymes PheA [19] and VinN [73], the latter of which incorporates (2S,3S)-3-methyl-l-aspartic acid (MeAsp), a β-amino acid, in vicenistatin biosynthesis, are shown in Fig. 3.

Table 1 Structural analysis of amino acid adenylation enzymes
Fig. 3

The active sites of the crystal structures of PheA a [19] and VinN b [73]. a PheA contains l-Phe (yellow stick) and AMP (blue stick) at the active site. Nonribosomal-code amino acids are shown as green sticks. b VinN contains (2S,3S)-MeAsp (yellow stick) at the active site. Nonribosomal code amino acids are shown as green sticks. The A10 Lys residue is not shown

Common features of amino acid adenylation enzymes

The Asp235 (A1-position) and Lys517 (A10-position) residues of PheA (PheA residue numbering and the corresponding nonribosomal code positions are used throughout) are highly conserved in amino acid adenylating enzymes and recognize the substrate amino acids, including d-α-amino acids and β-amino acids (Table 2). The C-terminal Lys517 (A10-position) in the Asub domain is involved in the recognition of the carboxy group of an amino acid and the α-phosphate group of ATP, which react to afford the acyl adenylate intermediate, and is completely conserved in the ANL (Acyl-CoA synthetase, NRPS adenylation domain, and Luciferase) family of enzymes. The Lys residue at the A10-position activates the α-phosphate of ATP and promotes nucleophilic substitution by the carboxy group, displacing inorganic diphosphate. Along with the release of the inorganic diphosphate, the Asub domain rotates by 140° and the Lys517 (A10-position) residue is thus oriented away from the active site, where the acyl adenylate intermediate awaits the nucleophile CP [86]. The acceptor CP then binds to the thioesterification conformation of the adenylating enzyme, which catalyzes the second nucleophilic substitution of the aminoacyl adenylate with the thiol of the holo CP to afford the aminoacyl thioester, releasing AMP. An aromatic amino acid residue (Trp, Phe, or His in typical NRPS adenylating enzymes) that is placed before the highly conserved Asp235 (A1-position) is critically involved in this conformational movement of the C-terminal Asub domain.

Table 2 Nonribosomal codes of adenylation enzymes in figures in this manuscript

The highly conserved Asp235 (A1-position) in amino acid adenylating enzymes is involved in the recognition of the amino group of the substrate amino acids (Table 2). The adenylation enzyme IdnL1 that is selective for the β-amino acid (S)-3-aminobutyric acid in incednine biosynthesis has an Asp residue at the A1-position that appears to interact with the β-amino group [17]. The d-glutamic acid selective adenylation enzyme McyE in microcystin biosynthesis has an Asp residue at the A1-position that appears to interact with the amino group of d-glutamic acid despite the fact that the γ-carboxylic acid is activated [69]. The l-α-aminoadipic acid selective adenylation enzyme in α-aminoadipate reductase also has an Asp residue at the A1-position that appears to interact with the amino group despite the δ-carboxylic acid being activated [47]. In contrast, aryl acid selective adenylation enzymes do not have an acidic residue at the A1-position and are able to accommodate the aromatic ring of aryl acids [24, 65, 83, 98]. The acidic amino acid residue that is the first amino acid residue of the nonribosomal code (A1-position) appears to interact with the amino group, even of β-amino, γ-amino, and δ-amino acids. The carboxy and the amino groups of the substrate amino acids appear to form two salt bridge interactions with the enzyme Lys and Asp (occasionally Glu in the 2-aminoadipic acid selective enzyme) residues [50], respectively. The long main chains of nonproteinogenic amino acids (in the case of β-amino, γ-amino, and δ-amino acids) seem to be somehow folded, and the corresponding adenylation enzymes form an appropriate space to accommodate the long main chains and the unique side-chains. The other eight amino acid residues of the nonribosomal code (A2-position—A9-position) are located at the active site and seem to be involved in the recognition of the side-chains of the amino acids. 
In this review, we focus on the eight amino acid residues of the nonribosomal code (A2-position—A9-position), which are believed to be key determinants for particular amino acid substrates, including nonproteinogenic amino acids. Based on the structures of some nonproteinogenic amino acid selective adenylation enzymes, we discuss the current knowledge of the substrate recognition mechanisms of nonproteinogenic amino acids.
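The conserved A1 and A10 features described above can be written as a simple check. The following is a minimal sketch, assuming a nonribosomal code is given as a ten-letter string in A1 to A10 order; the function name, the returned keys, and the Val-containing variant are hypothetical, and the PheA code used here (DAWTIAAICK) is the commonly quoted one and should be verified against Table 2.

```python
ACIDIC = {"D", "E"}  # Asp, or occasionally Glu, at the A1-position


def check_anchor_residues(code: str) -> dict:
    """Report whether A1 and A10 follow the conserved Asp/Lys pattern."""
    if len(code) != 10:
        raise ValueError("a nonribosomal code has ten positions (A1 to A10)")
    code = code.upper()
    a1, a10 = code[0], code[9]
    return {
        "A1": a1,
        "A10": a10,
        "a1_acidic": a1 in ACIDIC,         # expected for amino acid substrates
        "a10_lys": a10 == "K",             # conserved Lys in the Asub domain
        "side_chain_positions": code[1:9]  # A2 to A9
    }


if __name__ == "__main__":
    # Commonly quoted PheA code (assumption; verify against Table 2)
    print(check_anchor_residues("DAWTIAAICK"))
    # Hypothetical code with a non-acidic A1 residue
    print(check_anchor_residues("VAWTIAAICK"))
```

A non-acidic A1 residue is only a hint that the substrate may lack a free amino group; it is not a prediction of the substrate.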

d-α-Amino acid selective adenylation enzyme, DltA (a structurally elucidated enzyme)

DltA selectively recognizes d-Ala over l-Ala in the d-alanylation of lipoteichoic acid in the cell wall of Gram-positive bacteria (Fig. 4a) [25, 26, 114]. DltA has a Cys residue (Cys269) at the A6-position, which can only accommodate the small hydrogen atom and would clash with the methyl group of l-Ala. The C269A mutant showed a similar affinity for both d-Ala and l-Ala, suggesting that this Cys residue acts as a gatekeeper that admits d-Ala only [25, 114]. In l-α-amino acid selective adenylation enzymes, the amino acid residue at the A6-position is often Gly or Ala, which are likely small enough to accommodate the side-chain of l-α-amino acids (Table 2). The small Ala residue at the A9-position of DltA seems to indirectly expand the active site of DltA, with the Val residue at the A8-position and the Leu residue at the A2-position, to accommodate only the methyl group of d-Ala. This nonribosomal code is conserved in LnmQ, which selectively recognizes d-Ala in leinamycin biosynthesis [100].

Fig. 4

Active sites of structurally elucidated enzymes. a Active site of d-Ala selective adenylation enzyme DltA. b Active site of MeAsp selective adenylation enzyme SlgN1. c Active site of cis-AMHO selective adenylation enzyme SidNA3. Orange color codes are possibly involved in the recognition of the side chain of substrate amino acid

Modified α-amino acid selective adenylation enzymes (structurally elucidated enzymes)

SlgN1 is an adenylating enzyme that has an MbtH-like domain at the N-terminus and recognizes (2S,3S)-MeAsp, as an α-amino acid, in streptolydigin biosynthesis (Fig. 4b) [39]. SlgN1 has a Gly residue at the A6-position, suggesting that it is able to accommodate the side-chain of an l-α-amino acid. The nonribosomal code of SlgN1 is different from that of l-Asp selective enzymes [50]. SlgN1 has a Gln residue at the A4-position, which seems to interact with the β-carboxy group. In contrast, l-Asp selective adenylating enzymes have a Lys residue at the A4-position that interacts with the β-carboxy group of l-Asp. The weak interaction between the Gln residue at the A4-position of SlgN1 and the β-carboxy group of the substrate amino acid might provide the flexibility to accommodate the β-methyl group of (2S,3S)-MeAsp. Because no biochemical data are available, it is difficult to speculate further on the recognition mechanism of SlgN1.

SidNA3 (the third adenylation domain of NRPS SidN) recognizes the large Nδ-cis-anhydromevalonyl-Nδ-hydroxy-l-ornithine (cis-AMHO) residue in fungal siderophore biosynthesis in Neotyphodium lolii Lp19 (Fig. 4c) [56]. Docking analysis of the SidNA3 structure with cis-AMHO suggests that the main-chain carbonyl oxygen of Ile (A8-position) is involved in the recognition of the Nδ-hydroxy group. SidNA3 has a Gly residue at the A6-position suggesting that it is able to accommodate the long side-chain of an l-α-amino acid in a similar manner to a typical l-α-amino acid adenylating enzyme. Because of the large size of the amino acid, the additional amino acid residues that are involved in the recognition of the side chain of cis-AMHO are proposed to be residues lining the binding pocket. Accumulation of structural information on fungal adenylating enzymes with ligand and biochemical characterizations will enhance our knowledge of the substrate recognition mechanisms of nonproteinogenic amino acid adenylation enzymes selective for large substrates.

Modified α-amino acid selective adenylation enzymes (functionally characterized, but with no protein structures)

Nowadays, the genome sequence analysis of microorganisms is easily accessible, and thus, the target biosynthetic gene clusters of natural products can be rapidly identified based on the chemical structures [10, 22, 67, 93, 94]. The target nonproteinogenic amino acid selective adenylation domains can be predicted based on the domain structures of NRPS and hybrid NRPS–PKS enzymes without functional analysis. Based on bioinformatics analysis, many adenylating enzymes have been characterized in vitro that are selective for nonproteinogenic amino acids in NRPs and hybrid NRP–PKs. In most reports of the functional analysis of adenylation domains, the nonribosomal codes for l-α-amino acid selective enzymes are used to discuss the substrate amino acid recognition mechanisms. The similarity in nonribosomal codes among structurally similar amino acid selective enzymes suggests there may be a plausible substrate amino acid recognition mechanism to allow prediction of the substrates. However, in most cases, the precise role of the nonribosomal codes is unclear, and thus, structural analysis of the enzymes is required to understand the recognition mechanism for accurate amino acid substrate prediction.

CmaA is a standalone adenylation enzyme that recognizes l-allo-isoleucine in coronatine biosynthesis [20]. Kinetic analysis of CmaA with several substrates showed that the kcat and Km values varied depending on the type of the fused FLAG-Tag or His-Tag. l-Ile selective adenylation enzymes usually have a Phe residue at the A4-position and a Leu residue at the A5-position, while CmaA has a Leu residue at the A4-position and a Tyr residue at the A5-position (Fig. 5a). These residues may determine the substrate specificity. The adenylation domain Alb04A2* recognizes l-cyanoalanine in albicidin biosynthesis [18]. Alb04A2* has an l-Asn type nonribosomal code and also recognizes l-Asn (Fig. 5b). It is unclear how Alb04A2* distinguishes between l-cyanoalanine and l-Asn in the producer strain. The adenylation domain of the hybrid NRPS–PKS SylD recognizes 3,4-dehydro-l-lysine in syringolin biosynthesis [112]. l-Lys is also activated with approximately 50% efficiency. The nonribosomal code of the SylD adenylation domain is different from that of any proteinogenic l-α-amino acid adenylation enzyme. The two Glu residues at the A4-position and at the A9-position might be involved in the recognition of the ε-amino group of 3,4-dehydro-l-lysine (Fig. 5c). In sulfazecin biosynthesis, d-Glu and l-2,3-diaminopropionic acid are recognized by the adenylation domain of SulI (SulI A1-domain) and the second adenylation domain of SulM (SulM A3-domain), respectively [58]. The nonribosomal code of SulI suggests that the His residue at the A4-position may be involved in the recognition of the γ-carboxy group of d-Glu (Fig. 5d). The nonribosomal code of SulM A3-domain suggests that three acidic residues, Glu at the A4-position, Asp at the A8-position, and Asp at the A9-position, may be involved in the recognition of the β-amino group of l-2,3-diaminopropionic acid (Fig. 5e). 
The amino acid residue at the A4-position seems one of the main contributors to the recognition of the terminal side-chain functional groups of the amino acid substrates (Fig. 5).
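The position-by-position comparisons made in this section can be sketched as a small helper. This is a minimal sketch, assuming each code is stored as a dictionary that maps a position number (1 to 10) to a one-letter residue; the function name is hypothetical, and only the positions stated in the text are filled in, so unknown positions are simply left out.

```python
def differing_positions(code_a: dict, code_b: dict) -> dict:
    """Return {position: (residue_a, residue_b)} for shared positions that differ."""
    shared = sorted(set(code_a) & set(code_b))
    return {p: (code_a[p], code_b[p]) for p in shared if code_a[p] != code_b[p]}


# Partial codes taken from the text: typical l-Ile selective enzymes have
# Phe at A4 and Leu at A5, whereas CmaA has Leu at A4 and Tyr at A5.
ILE_TYPICAL = {4: "F", 5: "L"}
CMAA = {4: "L", 5: "Y"}

if __name__ == "__main__":
    print(differing_positions(ILE_TYPICAL, CMAA))
    # {4: ('F', 'L'), 5: ('L', 'Y')}
```

Using partial dictionaries rather than full ten-letter strings avoids guessing residues that the text does not report.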

Fig. 5

Model structures of modified α-amino acid selective adenylation enzymes. a Model active site of l-allo-Ile selective adenylation enzyme CmaA. b Model active site of l-cyano-Ala selective adenylation enzyme Alb04A2*. c Model active site of 3,4-dehydro-l-Lys selective adenylation enzyme SylD A1. d Active site model of d-glutamic acid selective adenylation enzyme SulI. e Active site model of l-2,3-diaminopropionic acid selective adenylation enzyme SulM A3. Orange color codes are possibly involved in the recognition of the side chain of substrate amino acid. Blue color codes are the conserved residues in the related proteinogenic α-amino acids selective adenylation enzymes

Proline-related amino acids, including pipecolic acid and 4-alkylproline derivatives, are also found in NRPs and hybrid NRP–PKs. The adenylation domain of the NRPS module FkbP selectively recognizes l-pipecolic acid in FK520 biosynthesis [35]. Homologous adenylation domains in rapamycin and FK506 biosynthesis are responsible for the activation of l-pipecolic acid. The FkbP adenylation domain has a different nonribosomal code at the A2-, A4-, and A5-positions from that of l-Pro selective adenylation enzymes (A2, Val; A4, Phe; and A5, Ile) (Fig. 6a). GetE, which is a standalone adenylation enzyme in GE81112 biosynthesis, recognizes l-pipecolic acid and also recognizes l-Pro [9]. GetE has a similar nonribosomal code to l-Pro selective adenylation enzymes, except for the A4- and A8-positions (Fig. 6b). 4-Alkylproline derivatives are found not only in the NRP griselimycin, but also in lincosamides and pyrrolobenzodiazepines [44]. Griselimycins contain two or three (2S,4R)-4-methyl-l-Pro units, which are recognized selectively by the corresponding adenylation domains in module two and module five of GriA, and module eight of GriB [59]. The nonribosomal code of the adenylation domain in module two of GriA (GriA2) shows a subtle difference from that of l-Pro selective adenylation enzymes (Fig. 6c). The small Ala residue at the A8-position may be important to accommodate the 4-methyl group of 4-methyl-l-Pro. LmbC is a standalone adenylation enzyme that selectively recognizes (2S,4R)-4-propyl-l-proline in lincomycin biosynthesis [45]. A homologous adenylation enzyme, CcbC, in celesticetin biosynthesis recognizes l-proline selectively [45]. Because LmbC and CcbC have quite different nonribosomal codes from that of l-Pro selective adenylation enzymes, the substrate recognition mechanism is likely to be different (Fig. 6d, e).
Although extensive protein engineering of LmbC has been investigated, it has proved difficult to change the substrate specificity without structural information [107]. The size of the amino acid residues at the A4-position might be important to accommodate the side chains of 4-alkyl-l-proline derivatives.

Fig. 6

Model structures of proline-related amino acids including pipecolic acid and 4-alkylproline derivatives. a Active site model of l-pipecolic acid selective adenylation domain of FkbP. b Active site model of l-pipecolic acid selective adenylation enzyme GetE. c Active site model of 4-methyl-l-proline selective adenylation domain GriA2. d Active site model of 4-propyl-l-proline selective adenylation enzyme LmbC. e Active site model of l-proline selective adenylation enzyme CcbC

Pristinamycin IA contains several nonproteinogenic amino acids, including 3-hydroxypicolinic acid, l-2-aminobutyric acid, l-4-dimethylaminophenylalanine, 4-oxo-l-pipecolic acid, and l-phenylglycine [64, 103]. Intact native NRPSs have been purified from the producer Streptomyces pristinaespiralis SP92 and the adenylation activities were examined to show the expected substrate specificity of SnbC and SnbDE. SnbC, consisting of two NRPS modules, recognizes l-Thr and l-2-aminobutyric acid. SnbDE, consisting of four NRPS modules, recognizes l-Pro, l-4-dimethylaminophenylalanine, l-pipecolic acid, and l-phenylglycine. The nonribosomal codes of SnbC-A1-domain and A2-domain are predicted to be l-Thr and l-Val selective codes, respectively. The nonribosomal codes of SnbDE-A1-, A2-, A3-, and A4-domains are predicted to be selective for l-Pro, l-Phe, l-pipecolic acid, and an unknown amino acid, respectively. The adenylation domain of SnbA is predicted to recognize 3-hydroxypicolinic acid. SnbDE-A2-domain has an Ala residue at the A4-position instead of the Thr residue found in PheA, which appears to expand the space available to accommodate the 4-dimethylamino moiety of l-4-dimethylaminophenylalanine (Fig. 7a). Pyridomycin contains 3-(3-pyridyl)-l-alanine and 3-hydroxypicolinic acid units in a hybrid NRP–PK structure [41]. The adenylation domain of PyrA, the loading module of the NRPS, recognizes 3-hydroxypicolinic acid as the starter unit. The adenylation domain of PyrA shows 63% identity/71% similarity to the adenylation domain of SnbA. In congocidine biosynthesis, 4-acetamidopyrrole-2-carboxylic acid and guanidinoacetic acid are recognized by the standalone adenylation enzyme Cgc3* and the adenylation domain of Cgc18, respectively [2]. Cgc3* recognizes pyrrole-2-carboxylic acid derivatives, but does not activate l-proline or picolinic acid.
The PKS/NRPS analysis (http://nrps.igs.umaryland.edu) [4] of the target adenylation enzymes that recognize nonproteinogenic amino acids often affords “no hit” for the nonribosomal codes. Other web-based NRPS substrate prediction tools, such as SEQL–NRPS (https://services.birc.au.dk/seql-nrps/) [51] and NRPSsp (http://www.nrpssp.com) [82], predict incorrect amino acids, presumably because of the limited structural information available. In fact, because of the insertion or truncation of sequences in the multiple sequence alignment with the PheA sequence, it is difficult to predict the nonribosomal codes for l-phenylglycine, 3-hydroxypicolinic acid, pyrrole-2-carboxylic acid derivatives, and guanidinoacetic acid selective adenylation enzymes.
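The alignment problem described above can be made concrete with a short sketch. It assumes a pairwise alignment of a reference sequence (PheA in practice) and a query, both as gapped strings of equal length, and reads off the query residues aligned to chosen reference positions. The function name and the toy sequences are hypothetical; for a real analysis the reference positions would be the ten PheA residue numbers of the nonribosomal code.

```python
from typing import Sequence


def extract_code(aligned_ref: str, aligned_query: str,
                 ref_positions: Sequence[int]) -> str:
    """Return the query residues aligned to 1-based reference positions.

    A '-' in the result means the query has a gap (truncation) at that
    reference position, so the code residue cannot be assigned.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    wanted = set(ref_positions)
    picked = {}
    ref_index = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char == "-":
            continue  # insertion in the query: no reference residue here
        ref_index += 1
        if ref_index in wanted:
            picked[ref_index] = query_char
    missing = wanted - set(picked)
    if missing:
        raise ValueError(f"reference positions not in alignment: {sorted(missing)}")
    return "".join(picked[p] for p in ref_positions)


if __name__ == "__main__":
    # Toy alignment: the query has one insertion and one deletion
    print(extract_code("MDA-WTK", "MEAGW-K", [2, 3, 4, 5, 6]))  # EAW-K
```

A gap character in the extracted code marks a position where insertions or truncations prevent a residue from being assigned, which is the situation described for the l-phenylglycine and 3-hydroxypicolinic acid selective enzymes.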

Fig. 7

Model structures of highly modified α-amino acid selective adenylation enzymes. a Active site of l-4-dimethylamino-Phe selective adenylation enzyme SnbDE A2. b Active site model of N6-hydroxy-l-Lys selective adenylation domain of MbtF. c Active site model of N6-decanoyl-N6-hydroxy-l-Lys selective adenylation domain of MbtE. d Active site model of capreomycidine selective adenylation domain of VioG. e Active site model of l-3-(trans-2-nitrocyclopropyl)Ala selective adenylation domain HrmO1A. f Active site model of β-methyl-Phe selective adenylation domain HrmO3A

In mycobactin biosynthesis, N6-hydroxy-l-Lys and N6-decanoyl-N6-hydroxy-l-Lys are recognized by MbtF and MbtE, respectively [66]. The N6-hydroxy group is critical for the recognition. In addition, it is noteworthy that the associated MbtH protein is necessary for the adenylation activity of MbtE and MbtF. From examining the nonribosomal codes for MbtF and MbtE, it is difficult to find common features that contribute to the recognition of the N6-hydroxy group of substrates (Fig. 7b, c).

The adenylation domain of the NRPS module VioG strictly recognizes capreomycidine in viomycin biosynthesis [30, 31]. The nonribosomal code of the adenylation domain of VioG suggests that the Asp residue at the A4-position might be responsible for recognition of the guanidine moiety of capreomycidine (Fig. 7d). The adenylation domains of VioF and CmnF are predicted to recognize 2,3-diaminopropionic acid in viomycin and capreomycin biosynthesis, respectively, although the nonribosomal codes are different from that of the SulM A3-domain in sulfazecin biosynthesis [31].

HrmO1A (also HrmO4A), HrmO3A (also HrmP1A), and HrmP3A recognize l-3-(trans-2-nitrocyclopropyl)Ala, l-β-methyl-Phe, and l-4-(Z)-propenylproline, respectively, in hormaomycin biosynthesis [21, 40]. The MbtH homolog HrmR is necessary for the adenylation activity [21]. HrmO1A has a unique nonribosomal code and thus it is difficult to speculate on the recognition mechanism (Fig. 7e). HrmO3A has a similar nonribosomal code to the l-Phe-selective PheA, except for the A7-, A8-, and A9-positions (Fig. 7f), which may account for the different substrate recognition. HrmP3A has a similar nonribosomal code to GriA2, which recognizes (2S,4R)-4-methyl-l-Pro. Hormaomycin also contains 5-chloro-1-hydroxypyrrole-2-carboxylic acid at the starter position.

JBIR-34 is a 4-methyloxazoline-containing NRP with 6-chloro-4-hydroxy-indole-3-carboxylic acid, d-Ala, and l-Ser amino acid units [78]. The A-CP didomain enzyme FmoA1 recognizes 6-chloro-l-Trp in the biosynthesis of 6-chloro-4-hydroxy-indole-3-carboxylic acid. The nonribosomal code of FmoA1 is different from that of l-Trp selective adenylation enzymes indicating that the 6-Cl group on 6-chloro-l-Trp greatly affects the substrate recognition mechanism (Fig. 8a). 4-Methyloxazoline is constructed from d-Ala via α-methyl-l-Ser, which is recognized by the adenylation domain of FmoA3. The nonribosomal code of FmoA3 contains hydrophilic amino acids (Fig. 8b), so it is difficult to speculate how the α-methyl group is accommodated at the active site. d-Ala and l-Ser are recognized by the adenylation domains of FmoA4 and FmoA5, respectively. The nonribosomal code of FmoA4 (Fig. 8c) is different from that of DltA, which also recognizes d-Ala (Fig. 4a). It was difficult to predict the nonribosomal code for the adenylation domain of FmoA2, which recognizes 6-chloro-4-hydroxy-indole-3-carboxylic acid, because the sequence of the adenylation domain of FmoA2 is largely different from that of PheA.

Fig. 8

Model structures of unique amino acid selective adenylation enzymes in JBIR-34, thiocoraline, echinomycin, and sparsomycin biosynthesis. a Active site model of 6-chloro-l-Trp selective adenylation domain FmoA1. b Active site model of α-methyl-l-Ser selective adenylation domain FmoA3. c Active site model of d-Ala selective adenylation domain FmoA4. d Active site model of 3-hydroxyquinaldic acid selective adenylation domain of TioJ. e Active site model of quinoxaline-2-carboxylic acid selective adenylation domain of Ecm1. f Active site model of l-methylthiomethylcysteine selective adenylation domain of SpsQ

TioJ in thiocoraline biosynthesis recognizes 3-hydroxyquinaldic acid, while Ecm1 in echinomycin biosynthesis recognizes 2-quinoxalinecarboxylic acid [76]. The nonribosomal codes of TioJ and Ecm1 are the same, except for the A1-position, which is Asn in TioJ and Gln in Ecm1 (Fig. 8d, e). Thus, this subtle difference might affect the substrate recognition mechanism and/or other structural differences may contribute to distinguishing between these amino acid substrates. It should be noted that the ring nitrogen atom in quinaldic acid and quinoxalinecarboxylic acid has different chemical properties from that of a typical amino acid amino group, and thus, the Asp residue at the A1-position may be replaced by polar amino acids, such as Asn and Gln.

In sparsomycin biosynthesis, 6-methyl-uracil acrylic acid and l-methylthiomethylcysteine (MTM-Cys) are recognized by the first adenylation domain of SpsR and the A-CP didomain enzyme SpsQ, respectively [89]. The nonribosomal code of SpsQ does not provide enough information to speculate on the substrate recognition mechanism of MTM-Cys (Fig. 8f).

In the biosynthesis of phosphinothricyl-alanyl-alanine and phosphinothricyl-alanyl-leucine, the adenylation enzyme PhsA recognizes N-acetyldemethylphosphinothricin or N-acetylphosphinothricin [11, 55, 92]. The amino acid residue at the A1-position of PhsA is Val instead of Asp, presumably because the amino group of the amino acid substrate is N-acetylated [11]. Therefore, the substrate recognition mechanism of PhsA is quite different from that of typical amino acid selective adenylation enzymes.

β-Amino acid selective adenylation enzymes (Figs. 9 and 10)

The protein structures of three naturally occurring β-amino acid selective adenylating enzymes, VinN [73] in vicenistatin biosynthesis, CmiS6 [17] in cremimycin biosynthesis, and IdnL1 [17] in incednine biosynthesis, have been reported to date (Table 1) [75]. Vicenistatin, cremimycin, and incednine belong to a family of macrolactam antibiotics with a unique β-amino acid starter unit in the polyketide skeleton [54]. VinN selectively recognizes (2S,3S)-MeAsp and ligates it to a discrete CP, VinL. CmiS6 selectively recognizes 3-aminononanoic acid (3-ANA; the stereochemistry is unknown) and ligates it to a discrete CP, CmiS4. IdnL1 selectively recognizes (S)-3-aminobutyric acid (3-ABA) and ligates it to a discrete CP, IdnL6. These adenylating enzymes share a common β-amino acid selective cavity, which prevents the binding of proteinogenic l-α-amino acids (Fig. 9) [17, 73]. The first feature of the VinN/IdnL1/CmiS6 family of enzymes is a Phe or Tyr residue at the A2-position (Table 2). This aromatic residue seems to clash with the side chain of l-α-amino acids. In contrast, the C3 methyl group of (2S,3S)-MeAsp seems to be well accommodated by VinN, because the Km value of VinN for (2S,3S)-MeAsp is 35 times lower than that for l-Asp. The C3 methyl group of (2S,3S)-MeAsp seems to be recognized by the Phe residue at the A2-position through CH–π and van der Waals interactions. The second feature of this enzyme family is a Ser or Thr residue at the A6-position, which would clash with the side chain of l-α-amino acids. These two features at the A2- and A6-positions of the VinN/IdnL1/CmiS6 family of enzymes appear to strictly prohibit these enzymes from accommodating l-α-amino acids. The third feature is that “one amino acid is skipped” just before the A8/A9 position according to the protein structural analysis, although this “skipped sequence motif” is difficult to identify by sequence alignment alone [73].
The pentapeptide sequence motif (Ser or Thr/Glu/Cys or Ser or Thr/A8/A9), in which the Glu residue is highly conserved, helps to widen the active site so that it can accommodate the side chain at the β-position of β-amino acids. This structural motif was introduced into the l-α-Phe selective adenylating enzyme TycA to alter its substrate specificity, resulting in a mutant enzyme that could accommodate (S)-β-Phe [80]. VinN has a Lys residue at the A8-position and an Arg residue at the A9-position that interact with the C3 carboxy group of (2S,3S)-MeAsp (Fig. 9a). IdnL1 and CmiS6 have a Val or Met residue at the A8-position and an Ala residue at the A9-position to accommodate the alkyl side chain at the β-position, the methyl and hexyl groups, respectively (Fig. 9b, c). Leu220/Cys313 (A3/A7) in IdnL1 and Gly220/Leu312 (A3/A7) in CmiS6 seem to indirectly create a hydrophobic environment suited to the length of the alkyl side chain at the C3 position. The CmiS6 G220L variant can accommodate 3-ABA as a substrate but has lost its affinity for 3-ANA, suggesting that the A3-position is involved in the selection of the alkyl side chain of 3-amino fatty acids [17]. The stereochemistry at C3 might affect the selective recognition by these enzymes, although only one enantiomer of β-amino acids is likely to be biosynthesized in the producer strains. In summary, the VinN/IdnL1/CmiS6 family of enzymes shares a common nonribosomal code (A1, A2, A6, A10, and pre A8/A9 positions). The combinations of A3/A7 and A8/A9 seem to determine the selectivity for the substituents at the C3 position of β-amino acids [75]. FlvN in fluvirucin B2 biosynthesis is known to recognize l-Asp and has the same nonribosomal code as VinN, except for the A2- and A6-positions (Tyr at the A2-position and Thr at the A6-position) [7, 74]. These subtle differences may underlie the different recognition mechanisms for l-Asp and MeAsp.
HitB in hitachimycin biosynthesis is believed to recognize (S)-β-Phe in a similar manner to the VinN/IdnL1/CmiS6 family of enzymes [53, 75] (Fig. 9d).

Fig. 9

Active sites of the VinN/IdnL1/CmiS6 family of β-amino acid selective adenylation enzymes. a (2S,3S)-MeAsp selective adenylation enzyme VinN (structurally elucidated), b 3-aminononanoic acid selective adenylation enzyme CmiS6 (structurally elucidated), c 3-aminobutyric acid selective adenylation enzyme IdnL1 (structurally elucidated), d β-Phe selective adenylation enzyme HitB (model structure)

Fig. 10

Model structures of β-amino acid selective adenylation enzymes. a Active site model of (2R,3S)-β-methyl-d-aspartic acid selective adenylation domain McyB-A2. b Active site model of (S)-β-Lys selective adenylation enzyme Orf5 (StnE). c Active site model of β-Ala selective adenylation domain BlmIV A1. d Active site model of β-Phe selective adenylation enzyme AdmJ. e Active site model of β-Tyr selective adenylation enzyme SgcC1. f Active site model of β-Tyr selective adenylation domain CmdD-A7

Several other β-amino acid selective adenylation enzymes have also been characterized, although their protein structures have not yet been elucidated. The adenylation domain McyB-A2 in microcystin biosynthesis is known to recognize the β-amino acid (2R,3S)-MeAsp [50, 69]. The nonribosomal code of McyB-A2 is different from that of the VinN/IdnL1/CmiS6 family of enzymes, because McyB-A2 does not have an aromatic residue at the A2-position or Ser/Thr at the A6-position (Fig. 10a and Table 2). It is also difficult to find the third feature of the VinN/IdnL1/CmiS6 family of enzymes, in which one amino acid is skipped just before the A8/A9-position (see Fig. 9 and the accompanying description of the VinN structure) [50, 69]. McyB-A2 has an Arg at the A3-position and a His at the A4-position, both of which seem to be involved in the recognition of the carboxy group of MeAsp.

The standalone adenylation enzymes Orf5 (StnE) and Orf19 (StnS) in streptothricin biosynthesis are known to recognize β-Lys [50, 57, 63]. The nonribosomal codes of Orf5 (StnE) and Orf19 (StnS) are different from that of the VinN/IdnL1/CmiS6 family of enzymes, because Orf5 (StnE) and Orf19 (StnS) do not have an aromatic residue at the A2-position or Ser/Thr at the A6-position [73] (Fig. 10b). The Glu residue at the A3-position of Orf5 (StnE) and Orf19 (StnS) may be involved in the recognition of the terminal amino group of β-Lys. The first adenylation domain of BlmIV (BlmIV A1-domain) in bleomycin biosynthesis is predicted to recognize β-Ala [34, 50]. The amino acid at the A1-position of the BlmIV A1-domain is not Asp, but the amino acid at the A2-position is Asp, which is presumed to be involved in the recognition of the amino group of β-Ala (Fig. 10c). The A6-position of the BlmIV A1-domain is a Ser residue, which would clash with the side chain of l-α-amino acids, as seen with the VinN/IdnL1/CmiS6 family of enzymes. The Asp residue at the A9-position might also be involved in the recognition of the β-amino group of β-Ala.

Many β-Phe derivatives are known to be involved in NRP and hybrid NRP–PK biosynthesis. AdmJ in andrimid biosynthesis recognizes (S)-β-Phe [61]. SgcC1 recognizes (S)-β-Tyr in the biosynthesis of the enediyne antibiotic C-1027 [106]. The CmdD-A7-domain in the biosynthesis of chondramides recognizes (R)-β-Tyr [85]. It is difficult to find common features among these adenylating domains selective for β-Phe derivatives, except for the A1- and A10-positions, because their nonribosomal codes differ from those of the VinN/IdnL1/CmiS6 family of enzymes, including HitB, which recognizes (S)-β-Phe in hitachimycin biosynthesis (Figs. 9, 10d–f) [53, 73]. The A6-positions of SgcC1 and CmdD-A7 are Met and Thr, respectively, which may clash with the side chain of l-α-amino acids, as in the VinN/IdnL1/CmiS6 family of enzymes. In contrast, AdmJ has a Gly residue at the A6-position, as is typical for l-α-amino acid selective adenylation enzymes. The Pro residue (P571) of SgcC1 at the A2-position seems to affect the catalytic reaction mechanism, because the kcat/Km value of the SgcC1 P571A variant is reduced 142-fold [106]. The successful protein engineering of TycA (engineered TycA, PDB ID 5N82), which was able to recognize (S)-β-Phe after introduction of the skipped sequence motif at preA8/A9 based on the VinN structure [80], indicates that a wide active site is one of the key properties needed to accommodate β-amino acids. Several different recognition mechanisms for β-amino acids appear to be used depending on the adenylation enzyme. Structural analysis of these unique β-amino acid selective adenylating enzymes will help to better understand the substrate recognition mechanisms.
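The A2/A6 comparison used throughout this section can be written down as a small check. The following Python sketch is only an illustration under stated assumptions: the function name, the dictionary layout, and the one-letter residue encoding are hypothetical conveniences, and the residues are limited to those stated in the text. It tests for the two residue features of the VinN/IdnL1/CmiS6 family and does not predict substrate specificity.

```python
# Minimal sketch (hypothetical helper, not a published tool).
# A nonribosomal code is represented as a dict that maps a position label
# such as "A2" to a one-letter amino acid code.

AROMATIC_A2 = {"F", "Y"}  # Phe or Tyr at the A2-position
SER_THR_A6 = {"S", "T"}   # Ser or Thr at the A6-position


def has_vinn_family_features(code):
    """Return True if A2 is Phe/Tyr and A6 is Ser/Thr.

    Positions missing from the dict are treated as not matching.
    """
    return code.get("A2") in AROMATIC_A2 and code.get("A6") in SER_THR_A6


# Residues taken from the text; positions that the text does not state are omitted.
examples = {
    "FlvN": {"A2": "Y", "A6": "T"},      # Tyr at A2, Thr at A6
    "BlmIV A1": {"A2": "D", "A6": "S"},  # Asp at A2, Ser at A6
    "SgcC1": {"A2": "P", "A6": "M"},     # Pro at A2, Met at A6
    "AdmJ": {"A6": "G"},                 # Gly at A6; A2 not stated in the text
}

for name, code in examples.items():
    print(name, has_vinn_family_features(code))
```

Of these examples, only FlvN carries both features, which is consistent with the statement that the other β-amino acid selective enzymes use nonribosomal codes different from that of the VinN/IdnL1/CmiS6 family.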

γ-Amino acids, δ-amino acids, aminobenzoic acids, and others (no structural analysis to date)

The adenylating enzymes selective for γ-amino acids, such as d-glutamic acid in microcystin biosynthesis [69], δ-amino acids, such as l-α-aminoadipic acid in penicillin biosynthesis [49, 91], and other unusual amino acids have not been structurally elucidated. McyE in microcystin biosynthesis recognizes d-Glu as a γ-amino acid and has a similar nonribosomal code to the McyB-A2-domain, which recognizes the β-amino acid (2R,3S)-MeAsp (Fig. 11a) [69]. The conserved Arg residue at the A3-position and His residue at the A4-position might be involved in the recognition of the α-carboxy group [69]. The adenylation domain of α-aminoadipate reductase Nps3 recognizes l-2-aminoadipic acid as a δ-amino acid in lysine biosynthesis of the basidiomycete Ceriporiopsis subvermispora [46, 47]. The A6-position of Nps3 is Val, which would clash with the side chain of l-α-amino acids (Fig. 11b). Two Arg residues at the A3- and A8-positions and the His residue at the A4-position might be involved in the recognition of the α-carboxy group, as for McyB-A2 and McyE. The α-carboxy groups of d-glutamic acid and l-2-aminoadipic acid appear to be held distant from the adenylation site by interactions with several basic residues, so that only the terminal carboxy group of the substrates can react with ATP. Because this discussion rests only on a comparison of nonribosomal codes, structural analysis of these unique adenylation enzymes will be required to understand the amino acid substrate recognition mechanisms.

Fig. 11

Model structures of a γ-amino acid selective adenylation enzyme and a δ-amino acid selective adenylation enzyme. a Active site model of the adenylation domain of McyE, selective for d-Glu as a γ-amino acid. b Active site model of the adenylation domain of Nps3, selective for l-2-aminoadipic acid as a δ-amino acid

Aminobenzoic acid derivatives are also unique components of NRPs and hybrid NRP–PKs. The loading adenylation domain of RifA recognizes 3-amino-5-hydroxybenzoic acid (AHBA) in rifamycin biosynthesis [1]. The predicted nonribosomal code for AHBA in the adenylation domain of RifA indicates a hydrophobic active site (Fig. 12a). In albicidin biosynthesis, four p-aminobenzoic acids (PABAs) are recognized by Alb01A1-domain, A3-domain, Alb09A4-domain, and A5-domain [18]. The nonribosomal code for PABA in Alb adenylation enzymes indicates that the Asp residue at the A8-position may interact with the amino group of the aniline moiety (Fig. 12b). The first adenylation domain of the fungal NRPS AnaPS recognizes anthranilic acid in acetylaszonalenin biosynthesis [3]. The nonribosomal code for anthranilic acid in AnaPS indicates a hydrophobic active site (Fig. 12c). Interestingly, in aurachin biosynthesis, the standalone adenylation enzyme AuaEII recognizes anthranilic acid and ligates it with coenzyme A to afford anthraniloyl-CoA [43, 81]. Then, another ANL family enzyme, AuaE, which lacks the Lys residue at the A10-position, transfers the anthranilate onto the CP AuaB. The nonribosomal code of the anthranilic acid selective adenylation enzyme AuaEII is different from that of AnaPS (Figs. 12c, d). Based on the crystal structure of AuaEII with anthranilic acid adenylate, the Thr residue at the A1-position and the His residue at the A8-position appear to interact with the amino group of anthranilic acid [43]. Because the amino acid residue at the A1-position is not Asp in the adenylation enzymes for these aminobenzoic acid derivatives, the amino group of aminobenzoic acid derivatives seems to be recognized in a different manner from that of amino acids.

Fig. 12

Active sites of adenylation enzymes of aminobenzoic acids. a Proposed active site model of 3-amino-5-hydroxybenzoic acid selective adenylation domain of RifA. b Proposed active site model of PABA selective adenylation domain Alb01 A3. c Proposed active site model of anthranilic acid selective adenylation domain of AnaPS. d Active site of anthranilic acid selective adenylation enzyme AuaEII

Mining of nonproteinogenic amino acid selective adenylation enzymes

Many adenylating enzymes for nonproteinogenic amino acids have been predicted based on the NRPS and hybrid NRPS–PKS assembly lines in the target biosynthetic gene cluster. Norcoronamic acid [(1S,2S)-1-amino-2-methylcyclopropane-1-carboxylic acid] is known to be involved in the biosynthesis of the quinomycin family of nonribosomal peptides, SW-163s [110]. The results of feeding norcoronamic acid into the producer strain indicated that the adenylation domain in Swb17 is responsible for the recognition of norcoronamic acid. Polymyxins are cyclic nonribosomal lipopeptides that contain four to five 2,4-diaminobutyric acid residues [16, 33]. The highly conserved Glu residue at the A4-position of the NRPS adenylation domains is proposed to be involved in the recognition of the terminal amino group. Enduracidin contains enduracididine, citrulline, and six l-hydroxyphenylglycine derivatives [113]. The adenylation domains that recognize these nonproteinogenic amino acids can be predicted based on the domain organization. Daptomycin contains kynurenine as a nonproteinogenic amino acid unit. The second adenylation domain in the NRPS module DptD2 is responsible for the recognition of kynurenine in daptomycin biosynthesis [70, 88]. However, the adenylation activity of these enzymes has not yet been investigated.

The biosynthetic gene clusters of several NRPs and hybrid NRP–PKs containing piperazic acid units [28, 79], including sanglifehrin A [84], himastatin [60], aurantimycin [115], polyoxypeptin A [27], and kutznerides [32], have been identified. This family of compounds contains many nonproteinogenic amino acids. For example, kutznerides contain norcoronamic acid, O-methyl-l-Ser, 3-hydroxy-d-Glu, (2S,3aR,8aS)-6,7-dichloro-3a-hydroxy-hexahydropyrrolo[2,3-b]indole-2-carboxylic acid, and chlorinated piperazic acid. The adenylation domain responsible for recognizing piperazic acid can be predicted based on the domain organization. However, the adenylation activity of adenylation enzymes selective for piperazic acid has not yet been investigated. Because of the commercial unavailability of nonproteinogenic amino acids, functional analysis of nonproteinogenic amino acid selective adenylating enzymes, including protein structural analysis, is limited. Efficient synthetic methodology for nonproteinogenic amino acids and mimics is required to overcome this situation. Further functional and structural analyses will pave the way to a better understanding of the substrate recognition mechanisms.

Engineering of adenylation enzymes

Engineering of the substrate specificity of adenylation enzymes is an attractive method in synthetic biology for creating new molecules [5, 6, 29, 111]. The adenylation enzyme CdaPS3 selective for l-Glu (also 3-methyl-l-Glu) in the biosynthesis of calcium-dependent antibiotic (CDA) has been successfully engineered to recognize 3-methyl-l-Gln [104]. Double mutation of CdaPS3, Q236E at the A2-position and K278Q at the A4-position, resulted in the production of CDA derivatives containing l-Gln or 3-methyl-l-Gln. The l-Phe selective adenylation enzyme PheA (GrsAA) in gramicidin S biosynthesis has been successfully engineered to recognize O-propargyl-l-Tyr by introduction of a single mutation W239S at the A3-position [52]. Furthermore, a successful attempt to alter the substrate specificity from an α-amino acid to a β-amino acid has been reported recently, based on structural analysis of the β-amino acid selective adenylation enzyme VinN [80]. Replacement of the tetrapeptide sequence Thr328–Ser329–Ile330(A8)–Cys331(A9) with randomized tripeptides and a random mutation at A236 (A2-position) in the W239S variant of TycA in tyrocidine biosynthesis resulted in an α/β switch. Therefore, structural elucidation of nonproteinogenic amino acid selective adenylation enzymes will have a significant impact on rational protein engineering to change the substrate specificity, leading to the engineered biosynthesis of designed NRPs and hybrid NRP–PKs.
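The variants named above follow the usual point-mutation shorthand: the one-letter code of the original residue, the residue number, and the one-letter code of the replacement. As a minimal sketch, the following Python snippet parses such labels and pairs them with the nonribosomal-code positions stated in the text. The names `Substitution`, `parse_substitution`, and `reported_variants` are hypothetical and introduced only for illustration.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """One point mutation, e.g. W239S = Trp at residue 239 replaced by Ser."""
    wild_type: str
    position: int
    variant: str


# One uppercase letter, the residue number, one uppercase letter.
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Parse a label such as 'Q236E' into a Substitution."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


# (enzyme, mutation label, nonribosomal-code position) as given in the text.
reported_variants = [
    ("CdaPS3", "Q236E", "A2"),
    ("CdaPS3", "K278Q", "A4"),
    ("PheA (GrsAA)", "W239S", "A3"),
]

for enzyme, label, code_position in reported_variants:
    sub = parse_substitution(label)
    print(f"{enzyme}: {sub.wild_type}{sub.position} -> {sub.variant} "
          f"at the {code_position}-position")
```

Keeping the residue number separate from the nonribosomal-code position makes it explicit that the same code position can correspond to different residue numbers in different enzymes.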

Protein–protein interactions between adenylation enzymes and associated proteins

In addition to selectively recognizing amino acids through the nonribosomal code, adenylation enzymes must recognize their acceptor CPs. In fact, associated proteins, such as condensation enzymes and CPs, affect the adenylation activity by interacting with the Asub domain of adenylation enzymes [68]. Successful combinatorial biosynthetic approaches that manipulate NRPS assembly lines indicate that the C–A di-domain and C–A–CP tri-domain may have coevolved while maintaining appropriate protein–protein interactions [5, 6]. However, it is still unknown exactly how the acceptor CP interacts with the partner adenylation enzyme and the condensation enzymes [71, 102]. It is anticipated that further structures of adenylation enzyme–CP complexes will be elucidated in the future to better understand this protein–protein interaction. The crystal structure of the EntE (dihydroxybenzoic acid adenylation enzyme)–EntB (CP) fusion protein in complex with a vinyl sulfamoyl adenosine derivative, a molecular probe that forms a covalent bond with the thiol of EntB, shows the interface between the adenylation enzyme and the CP [97]. The recent crystal structures of several conformations of full NRPS modules show that the CP domain is delivered to each catalytic domain depending on the reaction stage [23, 37, 87]. Accumulation of structural data on the protein–protein interactions between adenylation enzymes and CPs will enhance our knowledge and help to define the CP targeting code of the adenylation enzyme, which is important for the rational engineering of NRPSs and hybrid NRPS–PKSs [13, 42].

Conclusion

In this review, we have summarized the current status of structural knowledge on nonproteinogenic amino acid selective adenylation enzymes. Currently, only limited structural information on these enzymes is available. As we have pointed out, protein structural information helps the rational engineering of adenylation enzymes to introduce unique amino acids, such as nonproteinogenic amino acids, into NRPS assembly lines. For the future engineered biosynthesis of NRPs and hybrid NRP–PKs for drug discovery to succeed, we emphasize that more extensive efforts in protein structural analysis are necessary to better understand the substrate recognition mechanisms of nonproteinogenic amino acid selective adenylation enzymes.