Introduction

Ubiquitination is one of the most important post-translational modifications in eukaryotes [1]. Besides its best-known function of regulating protein turnover inside the cell [2, 3], ubiquitination can also induce conformational changes that alter the biological function of the modified protein. Furthermore, ubiquitin modifications of different types, lengths, connectivities, and anchoring sites play distinct roles in the cell and regulate diverse areas of biology [4]. Like phosphorylation, ubiquitination is reversible and can occur on one or several amino acid residues of the same protein. Thus, the ubiquitin system could be a prime source of new drug targets following the kinase superfamily [5].

Ubiquitination of a protein target is tightly regulated through a cascade of enzyme activities (E1 → E2 → E3), which links the C-terminal Gly residue of ubiquitin (Ub) to a Lys side chain of the target protein through an isopeptide bond [6, 7]. The transfer of ubiquitin from the E2 enzyme to the substrate, mediated by the E3 ubiquitin ligase, is the key step that confers specificity to ubiquitination. In addition, E3 ubiquitin ligases are encoded by over 600 human genes, surpassing the 518 protein kinase genes [8]. Targeting E3 ubiquitin ligases is therefore a more attractive way to modulate the ubiquitin system than general proteasome inhibition [5, 8, 9]. However, the development of E3 ligase inhibitors has proven challenging, in part because they must disrupt protein–protein interactions (PPI) [10]. Finding active small-molecule compounds that disrupt the interface between two proteins can be intrinsically more difficult than searching for small molecules that block catalytic activity [11]. To date, only a few E3 ligase inhibitors have been reported, targeting p53–MDM2 [12], inhibitors of apoptosis proteins (IAPs) [13], Skp1–Cullin–F-box (SCF)Cdc40 [14], SCFMet30 [15], and von Hippel–Lindau (VHL) [16, 17]. This field therefore remains largely unexplored. No general approach for identifying inhibitors of E3 ubiquitin ligases has yet been developed, but the interactome of E3 ligases offers new hope. Each ligase degrades multiple proteins, and many other proteins have been found to inhibit ubiquitination by disrupting the PPI between an E3 ligase and its substrate.

These natural binding partners not only enrich the regulatory network of the E3 ligase but can also benefit the discovery of E3 ligase inhibitors. Herein, we present a novel strategy that uses multiple substrates to elucidate the sub-pocket-activity and per-residue-activity relationships of an E3 ligase. Molecular dynamics (MD) simulation, molecular mechanics-generalized Born surface area (MM-GBSA) binding energy calculation and an energy decomposition scheme were combined to quantify the contributions of each sub-pocket and each residue to binding. Using different substrates of the E3 ligase as input structures, detailed sub-pocket-activity and per-residue-activity relationships can be obtained. This quantitative description of E3 ligase–substrate binding provides direct information about the binding determinants of the E3 ligase, which can guide the discovery of novel substrates and of E3 ligase inhibitors. For example, a key substrate residue identified by the quantitative analysis can supply a structural fragment for inhibitor design.

In the present work, Kelch-like ECH-associated protein-1 (Keap1), the substrate adaptor component of a Cullin–RING ubiquitin ligase (CRL) complex, was investigated. In this complex, Cul3 (Cullin 3) and Rbx1 (Ring box 1) form the catalytic component and interact with an E2 ubiquitin-conjugating enzyme to transfer ubiquitin to the substrate, while Keap1 is the adaptor responsible for recognizing the ubiquitination substrate. The best-known substrate of the Keap1–Cul3–RING E3 ubiquitin ligase complex is nuclear factor erythroid-2-related factor 2 (Nrf2) [18], which is an attractive target for the prevention and treatment of oxidative stress-related diseases and conditions, including cancer and neurodegenerative, cardiovascular, metabolic and inflammatory diseases [19]. Keap1 also mediates the ubiquitination of several other important proteins, including IκB kinase β (IKKβ) [20, 21] and Bcl-2 [22]. Moreover, many other proteins have been found to inhibit Keap1-mediated ubiquitination by disrupting the PPI between Keap1 and its substrate. For example, the selective autophagy substrate p62 activates the stress-responsive transcription factor Nrf2 by disrupting the Keap1–Nrf2 PPI [23, 24]. The pathological process associated with p62 accumulation can result in hyperactivation of Nrf2, revealing unexpected roles of selective autophagy in controlling the transcription of cellular defence enzyme genes [24]. Thus, understanding the intermolecular recognition mechanism between Keap1 and its substrates is important for the study of the Keap1-related interactome.

Previously, we investigated the Keap1–Nrf2 ETGE motif interaction using MD simulations [25]. Key electrostatic interactions between arginines of Keap1 and glutamates of Nrf2 play an important role in the Keap1–Nrf2 ETGE motif PPI. However, a single substrate cannot give a complete picture of the substrate recognition mechanism. To better understand the recognition mode and the crucial interactions between Keap1 and its substrates, four different substrates in complex with the Keap1 DC domain were subjected to MD simulations combined with MM-GBSA free energy calculations. The total energy was decomposed into per-residue contributions, and hydrogen bond occupancies were monitored throughout the simulations. The results provide detailed information on the sub-pocket-activity relationship of Keap1 and the per-residue-activity relationship of the substrates, which can directly guide the discovery of Keap1 substrates. Moreover, these insights into the binding determinants of the Keap1 DC domain can facilitate the design of peptide inhibitors [26] and, further, of small-molecule inhibitors of Keap1, an approach that has already proven successful [25, 27].

Materials and methods

General procedure and system preparations

The currently known structures of Keap1–substrate complexes were obtained from the Protein Data Bank (PDB). Detailed information on the crystal structures is listed in Table 1. All structures obtained from the PDB were corrected using the Clean Protein tool in the Discovery Studio (DS) 3.0 package (Accelrys Inc., San Diego, CA, USA). All calculations were conducted on a Dawning TC2600 cluster. Unless otherwise mentioned, parameters were set to their default values.

Table 1 The currently known crystal structures of Keap1–substrate complex

Preparation of MD starting structures

Four known substrates with reported crystal structures were downloaded from PDB for further analysis. These peptides were LDEETGEFL (the Nrf2 ETGE motif, PDB ID 1X2R), WRQDLD (the Nrf2 DLG motif, PDB ID 2DYH), QNEENGEQE (a nuclear oncoprotein, prothymosin α, PDB ID 2Z32) and VDPSTGEL (the selective autophagy substrate p62, PDB ID 3ADE). To avoid terminal charges on the peptides, the N-terminal residues were capped with acetyl (ACE). All crystallographic water molecules were removed from the coordinate set.

Molecular dynamics simulations

MD simulations of the Keap1 DC domain bound to the four peptide substrates were performed using the PMEMD module of AMBER 12 with the ff99SB modifications [31, 32] of the Cornell et al. [33] force field. The complexes were solvated with TIP3P water molecules extending at least 12 Å from the protein. Counterions were added to keep each system neutral (twelve Na+ for 1X2R, nine Na+ for 2DYH, twelve Na+ for 2Z32 and ten Na+ for 3ADE). The geometry of each system was minimized in two steps before MD simulation. First, the water molecules were refined through 2,500 steps of steepest descent followed by 2,500 steps of conjugate gradient minimization, with the protein held in place by a restraint of 2.0 kcal mol−1 Å−2. Second, the whole complex was relaxed by 10,000 cycles of minimization (5,000 cycles of steepest descent and 5,000 cycles of conjugate gradient). During the simulations, the particle mesh Ewald method [34] was employed to calculate the long-range electrostatic interactions, while the SHAKE method [35] was applied to constrain all covalent bonds involving hydrogen atoms, allowing a time step of 2 fs. A 10 Å cutoff was used for the nonbonded interactions. Each system was heated from 0 to 300 K over 50 ps of MD with position restraints at constant volume. Subsequent isothermal–isobaric (NPT) MD was performed for 500 ps to adjust the solvent density, with a time constant of 1.0 ps for pressure relaxation; harmonic restraints with force constants of 2 kcal mol−1 Å−2 were applied to all receptor and peptide atoms in this step. An additional 500 ps of unrestrained NPT-MD at 300 K, with a time constant of 2.0 ps for pressure relaxation, was performed to relax the system. Production dynamics were then run at constant pressure for 40 ns, and snapshots saved at 20 ps intervals were used for further analysis.
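As a quick consistency check on the protocol above, the stage lengths can be converted into integration step counts for the 2 fs time step. The following is a minimal Python sketch; the stage names and the helper function are illustrative assumptions and are not taken from the original AMBER input files.

```python
# Convert the stage lengths quoted in the protocol into step counts.
# Stage names and this helper are hypothetical; only the durations,
# the 2 fs time step and the 20 ps save interval come from the text.

TIME_STEP_FS = 2  # made possible by SHAKE on bonds involving hydrogen


def steps_for(duration_ps: int, time_step_fs: int = TIME_STEP_FS) -> int:
    """Return the number of MD steps needed to cover duration_ps."""
    duration_fs = duration_ps * 1000
    if duration_fs % time_step_fs:
        raise ValueError("duration is not a multiple of the time step")
    return duration_fs // time_step_fs


STAGES_PS = {
    "heating_nvt": 50,
    "restrained_npt": 500,
    "unrestrained_npt": 500,
    "production_npt": 40_000,  # 40 ns
}
SAVE_INTERVAL_PS = 20
ANALYSIS_WINDOW_PS = 20_000  # last 20 ns, used for MM-GBSA

step_counts = {name: steps_for(ps) for name, ps in STAGES_PS.items()}
save_every_steps = steps_for(SAVE_INTERVAL_PS)
production_frames = STAGES_PS["production_npt"] // SAVE_INTERVAL_PS
analysis_frames = ANALYSIS_WINDOW_PS // SAVE_INTERVAL_PS

for name, n_steps in step_counts.items():
    print(f"{name}: {n_steps} steps")
print(f"save every {save_every_steps} steps")
print(f"{production_frames} production frames, {analysis_frames} analysed")
```

Working in integer femtoseconds avoids floating-point rounding when dividing durations by the time step.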

Analysis of MD trajectories and binding energy calculation

To explore the stability of the complexes during the simulations and to verify the sampling, the ‘ptraj’ tool in AMBER 12 was used to analyze the time dependence of the RMSD of the backbone atoms. Hydrogen bonds were defined by a distance cutoff of 3.2 Å and an angle cutoff of 120°, and were only considered if their occupancy (the percentage of simulation time in which the hydrogen bond is formed) exceeded 20 %.
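The geometric criterion and the occupancy filter can be illustrated with a short, self-contained Python sketch. It is not the ptraj implementation: it assumes that the distance is measured between the donor and acceptor heavy atoms and that the angle is the donor–H···acceptor angle, and the coordinates are synthetic.

```python
import math

DIST_CUTOFF = 3.2     # Å, cutoff from the text (assumed donor-acceptor distance)
ANGLE_CUTOFF = 120.0  # degrees, cutoff from the text (assumed donor-H...acceptor angle)


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_hbond(donor, hydrogen, acceptor):
    """Geometric test for one frame; each argument is an (x, y, z) tuple in Å."""
    if _norm(_sub(acceptor, donor)) > DIST_CUTOFF:
        return False
    v1 = _sub(donor, hydrogen)
    v2 = _sub(acceptor, hydrogen)
    cos_angle = sum(a * b for a, b in zip(v1, v2)) / (_norm(v1) * _norm(v2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle >= ANGLE_CUTOFF


def occupancy_percent(frames):
    """frames: sequence of (donor, hydrogen, acceptor) coordinate triplets."""
    flags = [is_hbond(d, h, a) for d, h, a in frames]
    return 100.0 * sum(flags) / len(flags)


# Synthetic four-frame example (made-up coordinates, for illustration only).
frames = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)),  # linear, 2.9 Å apart
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.5, 0.0, 0.0)),  # beyond the distance cutoff
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)),  # 90 degree angle
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.8, 0.3, 0.0)),  # nearly linear
]
occupancy = occupancy_percent(frames)
print(f"occupancy = {occupancy:.0f} %, retained = {occupancy > 20.0}")
```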

The MM-GBSA method [36, 37], as implemented in the AMBER program, was used to calculate binding free energies and to investigate the energetic contribution of each residue to binding. The basic idea of the MM-GBSA approach is that the free energy of binding can be obtained by calculating only the end points of the thermodynamic cycle of ligand binding. The binding free energy, ΔGbind, can be calculated as:

$$ \Delta {\text{G}}_{\text{bind}} = {\text{G}}_{\text{complex}} - ({\text{G}}_{\text{protein}} + {\text{ G}}_{\text{ligand}} ) $$

Each term can be estimated as follows:

$$ \Delta {\text{G}} =\Delta {\text{G}}_{\text{MM}} +\Delta {\text{G}}_{\text{sol}} - {\text{T}}\Delta {\text{S}} $$

where ΔGMM is the molecular mechanics free energy, ΔGsol is the solvation free energy, and TΔS represents the entropy term. The molecular mechanics energy was calculated by the electrostatic and van der Waals interactions, while the solvation free energy was composed of the polar and the nonpolar contributions:

$$ \begin{aligned}\Delta {\text{G}}_{\text{MM}} & =\Delta {\text{G}}_{\text{ele}} +\Delta {\text{G}}_{\text{vdW}} \\\Delta {\text{G}}_{\text{sol}} & =\Delta {\text{G}}_{{{\text{ele}},{\text{sol}}}} +\Delta {\text{G}}_{{{\text{nonpol}},{\text{sol}}}} \\ \end{aligned} $$

The polar solvation energy, ΔGele,sol, is calculated with the generalized Born (GB) implicit solvent model, whereas the nonpolar part of the solvation energy, ΔGnonpol,sol, depends on the solvent accessible surface area (SA). TΔS is the contribution arising from changes in the degrees of freedom of the solute molecules, which was not considered here. Thus, the values reported for the MM-GBSA calculations should be considered “effective energies” (ΔGEff) rather than free energies [38].

For MM-GBSA analysis, snapshots at 20 ps intervals were extracted from the last 20 ns of the MD trajectory, and the binding free energies were averaged over the ensemble of conformers produced. In an attempt to detect the hot spot residues, the effective binding energies were decomposed into contributions of individual residues using the MM-GBSA energy decomposition scheme introduced by Gohlke et al. [39].
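The end-point bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the AMBER implementation: the `Energy` class, the function names and all numerical values are hypothetical, and the entropy term is omitted as in the text.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass(frozen=True)
class Energy:
    """Energy terms of one species in one snapshot (kcal/mol)."""
    ele: float         # gas-phase electrostatics
    vdw: float         # van der Waals
    ele_sol: float     # polar solvation (GB)
    nonpol_sol: float  # nonpolar solvation (SA)

    @property
    def total(self) -> float:
        # G = G_MM + G_sol; the -T*dS term is not included
        return self.ele + self.vdw + self.ele_sol + self.nonpol_sol


def delta_g_eff(complex_e: Energy, protein_e: Energy, ligand_e: Energy) -> float:
    """Effective binding energy of one snapshot: G_complex - (G_protein + G_ligand)."""
    return complex_e.total - (protein_e.total + ligand_e.total)


def average_delta_g_eff(snapshots):
    """Mean and standard error over (complex, protein, ligand) snapshots."""
    values = [delta_g_eff(c, p, l) for c, p, l in snapshots]
    sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return mean(values), sem


# Two made-up snapshots, for illustration only.
protein = Energy(-60.0, -20.0, 50.0, -5.0)
ligand = Energy(-10.0, -5.0, 8.0, -1.0)
snapshots = [
    (Energy(-100.0, -50.0, 80.0, -10.0), protein, ligand),
    (Energy(-102.0, -52.0, 83.0, -10.0), protein, ligand),
]
avg, sem = average_delta_g_eff(snapshots)
print(f"dG_eff = {avg:.1f} +/- {sem:.1f} kcal/mol")
```

In a single-trajectory setup the protein and ligand terms would be evaluated on coordinates extracted from each complex snapshot; they are held fixed here only to keep the example short.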

Results

To fully understand the binding mode and key interactions of the Keap1–peptide substrate complexes, MD simulations of Keap1 bound to four known peptide substrates were carried out in combination with free energy calculations. These peptides were LDEETGEFL (the Nrf2 ETGE motif, PDB ID 1X2R), WRQDLD (the Nrf2 DLG motif, PDB ID 2DYH), QNEENGEQE (a nuclear oncoprotein, prothymosin α, PDB ID 2Z32) and VDPSTGEL (the selective autophagy substrate p62, PDB ID 3ADE). The Nrf2 ETGE motif and the Nrf2 DLG motif together function according to a “hinge and latch” model that senses oxidative/electrophilic stress [40–42]. Both prothymosin α and p62 have been found to participate in dissociating Nrf2 from the Keap1–Nrf2 complex through competitive binding to the Keap1 DC domain. Thus, a deep understanding of the binding mode and key interactions between Keap1 and its known substrates can contribute to the rational design of PPI inhibitors of Keap1.

Static structure analysis of binding mode

The binding interface in the X-ray crystal structures of the Keap1–peptide complexes has been widely reported in the literature. In brief, the binding cavity of Keap1 contains five sub pockets, P1, P2, P3, P4 and P5 (shown in Fig. 1). The polar sub pockets, P1 and P2, have proven to be of great importance in substrate binding. P1, formed by residues Ser508, Phe478, Ile461, Arg483, Arg415 and Gly462, is highly positively charged, and the electrostatic interactions with Arg483 and Arg415 are significant for binding. In the four peptide substrates, the residues occupying this sub pocket are Glu, Gln, Glu and Asp, respectively. The Gln in the Nrf2 DLG motif lacks the key carboxyl group, which reduces the binding affinity by an order of magnitude. The P2 sub pocket, consisting of Ser363, Arg380, Asn382 and Asn414, is also positively charged. Moreover, the residues that occupy this sub pocket are all glutamates, indicating its conservation and important role in substrate recognition. Notably, Arg415 can take part in P1 and P2 simultaneously. The P3 sub pocket, formed by Gly509, Ala556, Ser555, Ser602, Gly603 and Gly571, is occupied by the peptide backbone. These small residues make this pocket sensitive to steric hindrance. For example, mutating the Gly in the Nrf2 ETGE motif to Ala nearly abolishes binding to Keap1. Unlike P1, P2 and P3, the hydrophobic sites P4 and P5 have been poorly investigated. In the Nrf2 ETGE motif, the side chains of Leu76, Glu78 and Phe83 cover both sites, whereas for the Nrf2 DLG motif the two sites are nearly vacant. This may be one of the reasons why the ETGE motif and the DLG motif show distinct binding affinities for Keap1. Although both P62 and Prothymosin α occupy the two hydrophobic sub-pockets, their binding modes differ. P62 places hydrophobic residues, including Val348, Pro350 and Leu355, in P4 and P5. In contrast, Prothymosin α uses polar residues, including Gln40, Glu42 and Gln47, to occupy the two hydrophobic sub-pockets.

Fig. 1 Static structural analysis of the Keap1–peptide interface. a Keap1–Nrf2 ETGE motif, PDB code: 1X2R; b Keap1–Nrf2 DLG motif, PDB code: 2DYH; c Keap1–prothymosin α, PDB code: 2Z32; d Keap1–P62, PDB code: 3ADE. The surface of Keap1 is colored by partial charge. The ligands are shown as sticks

System stability examination

The convergence and stability of the simulations were monitored by examining the root-mean-square deviation (RMSD) of the backbone atoms with respect to the structures obtained at the end of the equilibration procedure. The detailed results can be found in Fig. 2. As the plots show, the β-propeller structure of Keap1 is structurally stable in all cases. To ensure that the systems were stable and well equilibrated, only the last 20 ns of each MD trajectory were taken for MM-GBSA analysis.

Fig. 2 Stability examination for MD simulations. Time series of RMSD values of Keap1–peptide complexes are shown with respect to structures obtained at the end of the equilibration procedure

Hydrogen bond interactions

To highlight the key binding determinants of the Keap1–substrate complexes, we investigated the hydrogen bonds formed between Keap1 and the simulated peptide substrates along the MD trajectories. Hydrogen bonds with occupancies above 20 % are shown in Fig. 3, colored red (occupancy >80 %), purple (occupancy of 60–80 %), blue (occupancy of 40–60 %) or green (occupancy of 20–40 %). The detailed information can be found in Supporting Information Table S4.
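The occupancy classes used for coloring can be written as a small lookup. The sketch below is illustrative only; the function name is hypothetical, and the treatment of values that fall exactly on 40 % or 60 % (assigned to the lower class here) is an assumption, since the text does not specify it.

```python
from typing import Optional


def occupancy_class(occupancy_percent: float) -> Optional[str]:
    """Map a hydrogen-bond occupancy (%) to its color class.

    Returns None for bonds at or below the 20 % reporting threshold.
    Boundary values at 40 % and 60 % are assigned to the lower class (assumption).
    """
    if occupancy_percent > 80:
        return "red"
    if occupancy_percent > 60:
        return "purple"
    if occupancy_percent > 40:
        return "blue"
    if occupancy_percent > 20:
        return "green"
    return None


for value in (95.0, 72.5, 41.0, 20.0):
    print(value, occupancy_class(value))
```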

Fig. 3 Hydrogen bonds formed between Keap1 and peptide substrates along the MD trajectories. Only hydrogen bonds with occupancies above 20 % are shown. The hydrogen bonds are colored red (occupancy >80 %), purple (occupancy of 60–80 %), blue (occupancy of 40–60 %) and green (occupancy of 20–40 %)

Hydrogen bonds formed via the peptide backbone fix the conformation of the peptide

Two strong canonical hydrogen bonds are formed in the P3 sub-pocket, which is occupied by the peptide backbone. Both are formed between the hydroxyl hydrogen of a serine, namely Ser555 or Ser602, and a carbonyl oxygen of the peptide backbone. Strikingly, the two hydrogen bonds are located between the two polar cavities, P1 and P2. These conserved hydrogen bonds can stabilize the binding conformation of the peptide. In the Nrf2 DLG motif, however, the canonical hydrogen bond formed by Ser602 is absent, partly owing to the lack of an acidic amino acid in the P1 cavity. In the other three complexes, the residues following the carbonyl group that interacts with Ser602 are glycine and glutamate, indicating structural conservation.

Another strong canonical hydrogen bond is formed between Gln530 (side chain NH2) and a carbonyl oxygen of the peptide backbone, with an occupancy above 80 % in all cases (for P62, the occupancy is the sum over the two amide hydrogens of the side chain). Gln530, in the P4 sub-pocket, is located on the edge of the binding cavity, where it can restrain the N-terminus of the peptide. Together, the three conserved canonical hydrogen bonds determine the appropriate conformation of the peptide and drive its acidic residues into the polar cavities. In addition, in the Nrf2 ETGE motif, the C-terminal carbonyl oxygen of the peptide backbone can also interact with Asn382 (side chain NH2), which further stabilizes the binding conformation.

Hydrogen bonds formed via the side chain of the peptide mainly interact with the polar sub pockets

The polar sub-pockets, P1 and P2, both form strong polar interactions with the peptide side chains. In the P1 sub-pocket, a hydrogen bond involving Arg415 is found in three of the complexes, the exception being the Nrf2 DLG motif, which lacks an acidic residue at this site. The Nrf2 ETGE motif interacts with all three polar residues in P1, namely Arg415, Arg483 and Ser508, indicating strong binding at this site. In contrast, P62 forms a stable hydrogen bond only with Arg415, suggesting weak binding of Keap1–P62 in P1. In P2, the hydrogen bonds are more conserved than those in P1. Hydrogen bonds between the acidic residues of the peptide and Arg380, Asn382 and Ser508 of Keap1 are found in all cases except the Nrf2 DLG motif, which does not interact with Ser508. The occupancies for the Nrf2 DLG motif are also the smallest among the four complexes. Thus, the binding between the Nrf2 DLG motif and Keap1 in the P2 cavity may be the weakest of the four cases. Notably, Prothymosin α forms a stable hydrogen bond with Tyr525 (side chain OH) through a side-chain carboxyl group, and this strong hydrogen bond is not found in the other cases.

Binding energy calculation

A previous study [43] showed that MM-PBSA performed better in calculating absolute binding free energies, whereas MM-GBSA performed better in calculating relative free energies. Herein, we focus on the origin of the binding energy of the peptide ligands and the main driving force for their binding through comparison of four different peptide substrates; thus, the MM-GBSA method was chosen to calculate the binding energy. Binding energies calculated by MM-PBSA are also available in Table S1 for comparison.

Overall, the calculated effective binding energies of the four complexes are all negative, indicating favorable PPIs in all cases. The effective binding energies of the Nrf2 ETGE and DLG motifs differ substantially, consistent with the reported two-orders-of-magnitude difference between the binding constants of the two motifs. However, the calculated values overestimate the binding free energy, which can be partly explained by two missing contributions: the entropic contribution, which is expected to be unfavorable for flexible peptides; and the energetic contribution of conformational changes, which was not considered here because the single-trajectory approach was used [38].

As shown in Table 2, decomposition of the binding free energy showed that the nonpolar contribution was the major part of the binding energy, but it changed little among the four ligands. In contrast, the electrostatic contribution varied widely among the four substrates and is the primary origin of the differences in binding energy. The gas-phase electrostatic contribution was favorable in all cases, indicating strong polar interactions in Keap1–substrate binding. However, the desolvation penalties associated with binding were also large, especially for the Nrf2 DLG motif. The total electrostatic contribution to binding was favorable only for the Keap1–Nrf2 ETGE motif complex.
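The comparison above relies on grouping the four energy terms into a total electrostatic part and a total nonpolar part. A minimal Python sketch of that grouping follows, assuming the usual convention that the total electrostatic term is the gas-phase electrostatic energy plus the polar solvation energy and the total nonpolar term is the van der Waals energy plus the nonpolar solvation energy. The function name and the input numbers are hypothetical and are not the values of Table 2.

```python
def group_contributions(ele: float, vdw: float, ele_sol: float, nonpol_sol: float) -> dict:
    """Group MM-GBSA terms (kcal/mol) into electrostatic and nonpolar totals."""
    total_electrostatic = ele + ele_sol  # gas-phase attraction vs. desolvation penalty
    total_nonpolar = vdw + nonpol_sol
    return {
        "electrostatic": total_electrostatic,
        "nonpolar": total_nonpolar,
        "effective": total_electrostatic + total_nonpolar,
    }


# Made-up numbers: a strongly favorable gas-phase term that is
# almost cancelled by the desolvation penalty.
example = group_contributions(ele=-300.0, vdw=-45.0, ele_sol=290.0, nonpol_sol=-7.0)
print(example)
```

The example shows how a large favorable gas-phase electrostatic term can leave only a small net electrostatic contribution once the polar solvation term is added, while the nonpolar total stays dominant.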

Table 2 Binding free energies and their components for the Keap1–peptide complexes

Per-residue contributions

Next, we examined the energetic contributions of individual residues to identify those that dominate Keap1–substrate binding.

Per-residue contributions of the peptides

Free energy decomposition of the peptide ligand gives the energetic contribution of each of its residues. This information can be used to evaluate the importance of individual residues quantitatively and subsequently to guide the identification of small-molecule inhibitors that mimic the binding determinants of protein–protein complexes. The per-residue contributions to the effective binding energy of the peptide ligands are shown in Fig. 4. The most favorable interactions are formed by Glu79 in the Nrf2 ETGE motif, whose contribution to ΔGeff is as low as −11.7 kcal/mol.

Fig. 4 Per-residue contribution to the binding effective energy of peptide ligand. Per-residue contributions were calculated by the MM-GBSA decomposition method

This agrees with an experimental mutation study on the residues in the Nrf2 ETGE motif [29]. The large contribution of Glu79 to the binding free energy is attributed to both the total nonpolar contribution and the total electrostatic contribution. The favorable electrostatic interaction calculated by the molecular mechanics force field (ΔGELE) offsets the unfavorable electrostatic contribution due to desolvation (ΔGele,sol). The total electrostatic contribution is more than half of the effective binding energy (Supporting Information Table S2). In addition, the van der Waals contribution and the nonpolar part of the solvation free energy are both favorable for binding. The large contribution of Glu79 in the Nrf2 ETGE motif indicates that proper occupation of the P1 sub-pocket by an aliphatic carboxylic acid group should be addressed in the rational design of Keap1 PPI inhibitors. Glu43 of Prothymosin α gave a similar result. Asp349 of P62, which forms a strong canonical hydrogen bond only with Arg415 in the P1 cavity, contributes much less to the effective binding energy. The residue occupying the P2 cavity shows a moderate effective binding energy, which is attributed to the van der Waals contribution and the nonpolar desolvation term. It is noteworthy that some nonpolar residues strongly influence the binding energy, such as Phe83 in the Nrf2 ETGE motif, Ile28 in the Nrf2 DLG motif and Leu355 in P62. The contributions of these nonpolar residues come mainly from favorable van der Waals interactions. This partly explains why the nonpolar contribution dominates the effective binding energies of the Keap1–ligand complexes.

Sub-pocket contribution of Keap1 binding cavity

To find the hot-spot residues in the Keap1 substrate binding cavity, we also evaluated the per-residue contributions of the residues on the binding surface. The contribution of each sub-pocket was calculated as the sum over its component residues and is shown in Fig. 5 (detailed per-residue effective energies can be found in the Supporting Information). The most favorable interactions are formed in the P1 sub pocket of the Keap1–Nrf2 ETGE motif complex, whose contribution to binding is as low as −17.3 kcal/mol. This is consistent with the fact that the most favorable peptide residue, Glu79 of the Nrf2 ETGE motif, occupies the P1 sub pocket. Mutations of the key residues Arg415 and Arg483 of Keap1, which are responsible for the electrostatic interactions with the substrate, have also been found in human cancer cells [44, 45]. As shown in Fig. 5, the Nrf2 ETGE motif and Prothymosin α make use of all four sub pockets that account for most of the effective binding energy, and these two peptides are also more potent than the others.

Fig. 5 Per-residue contribution to the binding effective energy of the Keap1 bound to different peptide substrates. The per-residue contributions were calculated by applying the MM-GBSA decomposition approach to MD trajectories of Keap1 in complex with a Nrf2 ETGE motif, b Nrf2 DLG motif, c Prothymosin α peptide, and d P62 peptide. The per-residue contributions are mapped onto the starting structures of the simulation using a color code with a linear scale. Residues whose contributions to the effective free energy ΔGEff ≤ −1 kcal/mol (Table S3 in the Supporting Information) are labeled. e Per-pocket contribution to the effective binding energy. Keap1–Nrf2 ETGE motif was colored as blue, Keap1–Nrf2 DLG motif was colored as red, Keap1–Prothymosin α was colored as green and the Keap1–P62 was colored as purple

The polar sub pockets, P1 and P2, are the main source of the effective binding energy. The P3 pocket, which is occupied by the peptide backbone, is important for binding but contributes little to the effective binding energy. Notably, the hydrophobic sub pockets are also important for the binding energy, especially the P4 sub pocket, which is used by all four peptide ligands.
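The per-pocket sums discussed in this section can be sketched as follows. The residue lists for P1, P2 and P3 are those given in the static structure analysis; P4 and P5 are omitted because their composition is not listed in the text. Arg415 is counted in P1 only, which is an assumption, since it can participate in both P1 and P2. The per-residue energies are made-up placeholders, not values from the Supporting Information.

```python
SUB_POCKETS = {
    "P1": ("Ser508", "Phe478", "Ile461", "Arg483", "Arg415", "Gly462"),
    "P2": ("Ser363", "Arg380", "Asn382", "Asn414"),
    "P3": ("Gly509", "Ala556", "Ser555", "Ser602", "Gly603", "Gly571"),
}


def pocket_contributions(per_residue, pockets=SUB_POCKETS):
    """Sum per-residue effective energies (kcal/mol) over each sub-pocket.

    Residues missing from per_residue are treated as contributing zero.
    """
    return {
        name: sum(per_residue.get(residue, 0.0) for residue in residues)
        for name, residues in pockets.items()
    }


# Hypothetical per-residue energies for one complex (illustration only).
per_residue = {
    "Arg415": -4.0,
    "Arg483": -3.5,
    "Ser508": -2.0,
    "Arg380": -3.0,
    "Asn382": -1.5,
    "Ser602": -0.5,
    "Tyr525": -1.0,  # not part of P1-P3, so it is ignored here
}
totals = pocket_contributions(per_residue)
print(totals)
```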

Discussions and conclusions

E3 ubiquitin ligases have long been regarded as attractive candidate drug targets [5]. However, successful cases are limited. The biggest obstacle may be that modulating E3 ligase activity requires targeting PPIs, which are highly challenging drug targets. Unlike other kinds of PPI, the E3 ligase interactome has some promising characteristics. Each ligase degrades multiple proteins, and many other proteins have been found to inhibit ubiquitination by disrupting the PPI between the E3 ligase and its substrate. These natural binding partners of the E3 ligase can not only guide the discovery of novel substrates but also benefit the discovery of E3 ligase inhibitors. Herein, we present a novel strategy that uses multiple substrates to elucidate the sub-pocket-activity and per-residue-activity relationships of an E3 ligase and its substrates. MD simulation, MM-GBSA binding energy calculation and an energy decomposition scheme were combined to quantify the contributions of each sub-pocket and each residue to binding. With results from multiple substrates, the hot spots of the E3 ligase can be described quantitatively and the key residues of the substrate can be identified to guide inhibitor design directly. The method was used to elucidate the intermolecular recognition between the E3 ligase Keap1 and its substrates, which can directly guide the discovery of novel Keap1 substrates. Several determinants for modulator design were identified for the first time, and a strong binding motif of Keap1 was proposed for the first time for substrate discovery.

Nonpolar interactions are the main source of the binding energy

Previous research has shown that the electrostatic interactions between the two acidic glutamates of the ETGE motif and the conserved arginines (Arg380, Arg415, and Arg483) at the entrance of the central cavity on the bottom side of the β-propeller structure of Keap1-DC are critical for binding [41]. In this paper, however, we showed that the effective binding energy comes mainly from the nonpolar contribution, especially the van der Waals interactions. Not only the hydrophobic amino acids but also the polar residues with carbon chains contribute to the total nonpolar part of the binding energy. Both hydrophobic sub pockets, P4 and P5, can have a positive and statistically significant impact on binding. The residues located in the P4 and P5 sub pockets, such as Leu and Phe, indicate that these hydrophobic sites favor large hydrophobic groups.

Electrostatic interactions are the driving force for Keap1 binding

Although the electrostatic contribution to the effective binding energy was not dominant, it varied widely among the four substrates and played an important role in differentiating their binding affinities. The total electrostatic contribution varied from −13.7 to 28.9 kcal/mol, while the total nonpolar contribution varied only from −56.7 to −49.7 kcal/mol. Notably, the contribution of the polar P1 sub pocket differs greatly between substrates, from −17.3 kcal/mol in Keap1–Nrf2 ETGE motif to −4.2 kcal/mol in P62, indicating that only effective occupation of the P1 sub pocket yields a favorable change in binding energy. In the case of P62, although Asp349 could interact with Arg415 of Keap1, this interaction was not strong enough to compensate for the desolvation penalty associated with binding. Thus, to gain an energetic advantage in the polar sub pockets, the carboxyl group should be well positioned and properly linked by the carbon chain. Recently, a further study [46] of P62–Keap1 showed that Ser351 phosphorylation of the autophagy-adaptor protein P62 markedly increases P62’s binding affinity for Keap1. Phosphorylated Ser351 formed extra hydrogen bonds with Arg483 and Ser508 of Keap1 and showed a binding mode similar to that of Glu79 in the ETGE motif of Nrf2. ITC experiments also showed that the binding affinity of phosphorylated P62 for Keap1-DC is approximately 30-fold higher than that of nonphosphorylated P62 but fivefold lower than that of ETGE. Our results, combined with these experimental findings, further confirm that the electrostatic interactions involving the key arginines, Arg380, Arg415 and Arg483, are the driving force for Keap1 binding.

Key Glu residue and its mimic are the core residues of the Keap1 binding motif

Previous studies have shown, based on experimental alanine mutation results, that the two Glu residues of the ETGE motif are important for binding. Herein, we explained why a properly positioned glutamate is so important for Keap1 binding in the dynamic process. The carboxyl group of the glutamic acid can form multiple stable electrostatic interactions with the key arginines throughout the MD simulation. In addition, the hydrophobic carbon chain is also important for binding: it not only contributes to the binding energy through favorable vdW interactions but also keeps the carboxyl group in the appropriate place. In contrast, in the Nrf2 DLG peptide, the Asp residue did not contribute as much to binding as the Glu residue in the Nrf2 ETGE peptide, which further indicates the core role of the Glu residue. The study of phosphorylated and nonphosphorylated P62 offers an alternative way to investigate Keap1 substrates. Phosphorylated serine can mimic the Glu residue and form multiple hydrogen bonds with the polar residues of Keap1. Phosphorylated P62 has a binding affinity for Keap1 similar to that of the Nrf2 ETGE motif. Our results also showed that the contribution of the Glu residue to binding originates mainly from the electrostatic interactions formed by the carboxyl group and the nonpolar interactions involving the carbon chain. It is therefore reasonable that a phosphorylated Ser residue can replace the Glu in the ETGE motif. The well-known binding motif of Keap1 is (D/N)XETGE, which has been validated by several substrates, such as IKKβ, PGAM5 and PALB2. Based on these results, however, any protein containing the motif (D/N)X(E/S)TG(E/S) could be a Keap1 substrate. This motif can be used to find potential Keap1 substrates and to complete the Keap1-related interactome.
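A sequence scan for the proposed motif can be sketched with a regular expression. The following Python example is illustrative only: the pattern and function names are hypothetical, X is taken to be any of the standard one-letter residue codes, and a plain sequence cannot tell whether a matched serine is actually phosphorylated, so hits are candidates rather than confirmed substrates. The test sequences are the four peptides simulated in this work.

```python
import re

# (D/N)X(E/S)TG(E/S), with X matching any uppercase one-letter code
EXTENDED_MOTIF = re.compile(r"[DN][A-Z][ES]TG[ES]")
# (D/N)XETGE, the previously known motif
CLASSIC_MOTIF = re.compile(r"[DN][A-Z]ETGE")


def find_motifs(sequence, pattern=EXTENDED_MOTIF):
    """Return (1-based start position, matched segment) for every hit.

    A lookahead is used so that overlapping hits are also reported.
    """
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence.upper())]


peptides = {
    "Nrf2 ETGE": "LDEETGEFL",
    "Nrf2 DLG": "WRQDLD",
    "prothymosin alpha": "QNEENGEQE",
    "p62": "VDPSTGEL",
}
for name, seq in peptides.items():
    print(name, find_motifs(seq), find_motifs(seq, CLASSIC_MOTIF))
```

In this scan the p62 peptide is matched only by the extended pattern, while the prothymosin α peptide, which carries ENGE in place of ETGE, is matched by neither pattern as written.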

Keap1 inhibitor design based on the substrate recognition mechanism

Enhancing Nrf2 activity by modulating Keap1-mediated Nrf2 inhibition has been identified as an important approach for the prevention of cancer and other chronic diseases to which oxidative and inflammatory stress contribute [47]. However, most known Nrf2 activators are electrophilic species or are metabolically activated to become electrophilic [48]. These activators act by binding covalently to cysteine thiols, a mechanism that offers little selectivity or specificity given the ubiquity of cysteines in cells. Thus, currently known Nrf2 activators could be promiscuous in their targets, which may lead to intolerable side effects. Intriguingly, the Keap1–Nrf2 PPI has emerged as a direct molecular target for ARE activation, opening up a new direction for the design of reversible ARE inducers with high specificity and potency [19]. The results reported in this paper can contribute to the rational design of Keap1–Nrf2 PPI inhibitors, and they have already been used in Keap1–Nrf2 PPI inhibitor discovery [25]. Several guidelines can be suggested for the future design of PPI inhibitors of Keap1.

  1. The main driving force for substrate binding is hydrophobic interactions. The total nonpolar part, especially the van der Waals contribution, is the main source of the binding energy. Thus, fully occupying the five sub pockets is important for the design of potent PPI inhibitors.

  2. Effectively occupying the polar sub pockets can improve the binding affinity. Stable, strong polar interactions with the polar pockets do not necessarily translate into favorable binding energy changes. Only a carboxyl group in P1 or P2 that is well positioned and appropriately linked to the scaffold can contribute to the total binding energy. Glu79 of the Nrf2 ETGE motif in the P1 pocket is an excellent example.

  3. The scaffold of the inhibitor should occupy the P3 sub pocket to stabilize the binding conformation. Although the P3 sub pocket does not contribute to the total binding energy, it is important for fixing the conformation of the ligand. The scaffold of a PPI inhibitor should occupy this cavity so that the side chains can adopt the appropriate conformation.

  4. Hydrophobic fragments located in the P4 and P5 sub pockets can provide additional binding energy. The combined contribution of P4 and P5 is more than one-third of the total binding energy in all cases, so potent PPI inhibitors should make full use of these two pockets. The Leu, Phe and Ile residues used by the peptide ligands indicate that these hydrophobic sites favor large hydrophobic groups.