Introduction

Nattokinase was first discovered by Sumi and later purified and characterized by Fujita from the vegetable cheese natto, a fermented soybean food from Japan [1, 2]. Nattokinase, a serine protease of 275 amino acids encoded by the aprN gene, is produced by Bacillus subtilis subsp. natto, and it was crystallized by Yanagisawa [3, 4]. The protein exhibits antioxidant activity and fibrinolytic activity that degrades fibrin. Nattokinase also inhibits the angiotensin I-converting enzyme, thereby reducing blood pressure, and acts through various pathways, highlighting the importance of this enzyme for treating cardiovascular diseases [5,6,7,8]. Other antithrombotic agents, such as urokinase, streptokinase, staphylokinase, and plasminogen activator, function only by converting plasminogen to plasmin, have a short half-life, and display uncontrollable adverse effects, such as accelerated fibrinolysis and hemorrhaging [8, 9]. No adverse effects of oral nattokinase treatment (2000 FU/capsule) have been found in either animals or humans [10, 11]. Thus, this protein is considered a potential natural supplement to prevent heart and cardiovascular diseases [12].

Oral consumption of nattokinase has some drawbacks: clear pharmacokinetic data, namely on the absorption and intactness of the protein in the bloodstream, are lacking, and the required dose is considered high [8]. An injectable formulation offers one solution to these drawbacks, especially for treating acute cardiovascular diseases. Unfortunately, nattokinase is also immunogenic [13]. Moreover, the NSK-SD® product is considered to carry allergy risks similar to those of other soy-derived products [14].

One solution is to use mutagenesis to develop a new type of nattokinase with low antigenicity. Computational analysis in a preliminary study can determine which amino acid residues are immunogenic and can tolerate mutation. Such strategies have been well demonstrated for arginine deaminase by Zarei et al. [15] and for peroxidase by Fattahian et al. [16]. Specifically, both the continuous (linear) and discontinuous (conformational) B-cell epitopes of nattokinase can be determined with various tools [17,18,19,20,21,22]. In addition, protein engineering of nattokinase guided by computational analysis has been conducted to increase the activity and maintain the stability of the protein, further emphasizing the usefulness of in silico analysis prior to experiment [23]. A less immunogenic nattokinase would also represent a new type of nattokinase, thus broadening the current nattokinase market.

In the current study, both continuous and discontinuous B-cell epitopes of nattokinase were examined. The selected residues were subjected to mutagenesis by rational design, taking into account residue conservation as well as the 3D structure and stability of the protein. In addition, codon optimization for the production of mutated nattokinase in Escherichia coli as host was carried out.

Materials and methods

Materials

The protein sequence of nattokinase was retrieved from UniProtKB (accession no. Q93L66). It comprises 275 amino acids and was saved in FASTA format. The 3D structure of nattokinase, determined by Yanagisawa et al. [4] by X-ray crystallography, was retrieved from RCSB PDB under the code 4dww. The HLA-DM 3D structure was retrieved from RCSB PDB (code: 2bc4) as a receptor for molecular docking with nattokinase as the ligand.

Antigenicity preliminary prediction of nattokinase

The probability of nattokinase being an antigen was predicted with VaxiJen (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) [24]. This web server estimates the probability using the auto cross covariance method and is therefore alignment-independent. The default cut-off for considering a bacterial protein an antigen is 0.4.

Determination of B-cell epitopes

Various tools were used to predict both continuous and discontinuous B-cell epitopes. For the continuous B-cell epitopes, BepiPred 2.0 (http://www.cbs.dtu.dk/services/BepiPred/) was employed with a threshold of 0.5, a specificity of 0.57158, and a sensitivity of 0.58564. BepiPred 2.0 is based on a random forest algorithm trained on epitope and non-epitope amino acids derived from antibody-antigen crystal structures [17]. ABCpred (http://crdd.osdd.net/raghava/abcpred/) was used to determine B-cell epitopes with a threshold of 0.8 and a length of 16-mers [21]. SVMTriP (http://sysbio.unl.edu/SVMTriP/) was also used to predict continuous 20-mer B-cell epitopes, for which it achieved a sensitivity of 80.1% and a precision of 55.2% [25]. Flagged sequences were selected as epitopes. Another tool used to predict continuous B-cell epitopes was the Kolaskar and Tongaonkar antigenicity scale [22], accessible at the Immune Epitope Database (IEDB) (http://tools.iedb.org/bcell/). The IEDB stores epitope data obtained from various experiments.

The CBTOPE (conformational B-cell epitope prediction) server (http://crdd.osdd.net/raghava/cbtope/) was used to predict the discontinuous B-cell epitopes with an accuracy of more than 85%. By default, the threshold was −0.3, and residues scoring above 5 were regarded as epitopes. CBTOPE uses amino acid composition as the input for an SVM algorithm [26]. BCEPred (http://crdd.osdd.net/raghava/bcepred/) was also employed to predict the discontinuous B-cell epitopes by evaluating a combination of various physicochemical properties of the protein. Its accuracy was 58.7% at the default threshold of 2.38 [27]. Emini surface accessibility, accessed at the IEDB Analysis Resource [28], was used to analyze the exposed surface of the protein accessible to antibodies. The threshold for the probability score was 1.

Determination of the mutated residues

The conservation of each nattokinase residue had to be evaluated before the mutational analysis. Amino acid conservation was analyzed with Swiss-Model ExPASy using the entropy method [29,30,31]. A residue with an entropy score below 2 was regarded as conserved and was therefore excluded from the mutational analysis so as not to disturb the structure and stability of the protein. Residues that were not located on the protein surface, as predicted by Emini surface accessibility, were also excluded. Residues allowed for mutation had to conform to three of the predicted continuous epitopes and one of the predicted discontinuous epitopes.
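
The selection rule described above can be sketched in Python as a simple consensus filter. This is only an illustration under stated assumptions: the function name, the predictor labels, and the residue numbers are hypothetical toy values, not output from the servers used in this study.

```python
from typing import Dict, List, Set


def select_epitope_residues(
    continuous: Dict[str, Set[int]],
    discontinuous: Dict[str, Set[int]],
    surface_exposed: Set[int],
    conserved: Set[int],
    min_continuous: int = 3,
    min_discontinuous: int = 1,
) -> List[int]:
    """Return residue numbers that satisfy the consensus rule.

    A residue is kept when it is flagged by at least `min_continuous`
    continuous predictors and `min_discontinuous` discontinuous predictors,
    is surface exposed, and is not conserved.
    """
    candidates = set().union(*continuous.values(), *discontinuous.values())
    selected = []
    for res in sorted(candidates):
        n_cont = sum(res in hits for hits in continuous.values())
        n_disc = sum(res in hits for hits in discontinuous.values())
        if (
            n_cont >= min_continuous
            and n_disc >= min_discontinuous
            and res in surface_exposed
            and res not in conserved
        ):
            selected.append(res)
    return selected


# Hypothetical toy predictions (residue numbers), NOT results from the paper.
toy_continuous = {
    "tool_a": {5, 6, 7},
    "tool_b": {5, 6, 8},
    "tool_c": {5, 7, 8},
    "tool_d": {6, 8},
}
toy_discontinuous = {"tool_e": {5, 7}, "tool_f": {6, 8}}
toy_surface = {5, 6, 7}   # residue 8 is buried in this toy example
toy_conserved = {6}       # residue 6 is conserved in this toy example

toy_selected = select_epitope_residues(
    toy_continuous, toy_discontinuous, toy_surface, toy_conserved
)
print(toy_selected)
```

In the toy data, residue 7 fails the continuous-predictor count, residue 8 is not exposed, and residue 6 is conserved, so only residue 5 passes all four conditions.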

In silico mutagenesis of nattokinase

I-Mutant 2.0 (http://folding.biofold.org/i-mutant/i-mutant2.0.html) was used to perform the mutagenesis by substitution mutation. The 3D protein structure (4dww) was subjected to the mutagenesis analysis. This tool predicts the change in protein stability resulting from a single point mutation in terms of the Gibbs free energy difference [32]. The formula applied was ΔG (mutein) – ΔG (wild-type protein), and a positive free energy difference indicated that the mutation was permissible. The conditions for mutation were set to correspond to human physiological conditions, with a pH of 7.4 and a temperature of 37 °C. For every acceptable single mutation, the antigenicity score was independently reevaluated with the VaxiJen web server. Then, the suggested mutations at various residues were re-analyzed with VaxiJen in combination to decide whether multiple mutations gave a lower antigenicity probability score than a single mutation. In addition, dDFIRE was used to evaluate the stability of the mutant compared to the native structure [33].
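
As a rough sketch of the two-criterion choice described above, the following Python snippet keeps substitutions with a positive free energy difference and picks the one with the lowest antigenicity score. The class, function name, candidate labels, and candidate values are hypothetical placeholders; only the wild-type score of 0.7391 comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Substitution:
    label: str            # mutation label (toy labels are used below)
    ddg: float            # ΔG(mutein) - ΔG(wild type), in kcal/mol
    antigen_score: float  # antigenicity score of the full mutated sequence


def pick_substitution(
    candidates: List[Substitution], wild_type_score: float
) -> Optional[Substitution]:
    """Keep candidates with a positive free energy difference and a lower
    antigenicity score than the wild type, then return the candidate with
    the lowest antigenicity score (None if nothing qualifies)."""
    allowed = [
        c for c in candidates
        if c.ddg > 0 and c.antigen_score < wild_type_score
    ]
    if not allowed:
        return None
    return min(allowed, key=lambda c: c.antigen_score)


WILD_TYPE_SCORE = 0.7391  # VaxiJen score of native nattokinase (from the text)

# Hypothetical placeholder values, NOT the values reported in Table 1.
toy_candidates = [
    Substitution("toy_A", ddg=-0.40, antigen_score=0.70),  # destabilizing
    Substitution("toy_B", ddg=0.30, antigen_score=0.72),
    Substitution("toy_C", ddg=0.10, antigen_score=0.71),
]

best = pick_substitution(toy_candidates, WILD_TYPE_SCORE)
print(best.label if best else "no permissible substitution")
```

Filtering on stability first and ranking on antigenicity second mirrors the order of the criteria in the text; a weighted combination of both scores would be a different design.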

3D modeling and validation of the mutein

The selected mutein was then modeled in 3D with the PHYRE2 protein fold recognition server (http://www.sbg.bio.ic.ac.uk/phyre2) [34] and with I-TASSER for comparison. The best 3D mutein model from each server was compared with the native nattokinase and validated with MolProbity (http://molprobity.biochem.duke.edu), which generates a Ramachandran plot showing the favored and permissible regions for the backbone dihedral angles of the amino acid residues [35, 36]. ProQ3/ProQ3D (http://proq3.bioinfo.se/pred/) was used to assess the quality of the 3D protein structure by the single-model method. ProQ3 is based on the combination of ProQ2, ProQRosCen (centroid model), and ProQCenFA (full atom model). ProQ3D is the deep-learning version of ProQ3, which gives a better correlation in the estimation of protein quality [37]. Moreover, the protein model quality was also analyzed with VERIFY3D [38, 39] and ERRAT [40] from the SAVES 5.0 web servers (http://servicesn.mbi.ucla.edu/SAVES/).

The physicochemical properties and stability of the native and mutated nattokinase, such as pI, molecular weight, instability index, aliphatic index, hydrophobicity by GRAVY, and estimated half-life in mammalian reticulocytes, were analyzed with ProtParam. The secondary structure of the protein was confirmed with SOPMA secondary structure prediction (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html). Catalytic site and binding pocket predictions of native and mutated nattokinase were compared using the tools in PHYRE2 based on the catalytic site atlas [41] and the fpocket2 program [42].
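
Two of the properties listed above have simple closed forms, so a minimal Python sketch may help show what they measure. It assumes the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)); the function names and test sequences are illustrative, and this is not the ProtParam implementation itself.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    seq = sequence.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Toy peptides for illustration only (not nattokinase sequences).
print(gravy("AVIL"), aliphatic_index("AVIL"))
print(gravy("RKDE"), aliphatic_index("RKDE"))
```

A more positive GRAVY value means a more hydrophobic sequence overall, and a higher aliphatic index reflects a larger relative volume of aliphatic side chains.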

Molecular dynamics simulation

MD simulation was used to predict the fluctuation and stability of the amino acid residues in the 3D protein structure. Native nattokinase and the 3D model of the mutein generated by PHYRE2 were subjected to MD simulation. The protonation states of native and mutated nattokinase at pH 7.4 were assigned with PROPKA 3.1 [43]. Both structures were solvated with TIP3P water [44], and both systems were neutralized with counterions; the dimensions of both systems were 7.39 nm × 7.392 nm × 7.39 nm. Gromacs 2018 with the OPLS force field [45, 46] was employed. Minimization was done for 50 ps, followed by NVT and NPT equilibration for 100 ps. The simulation temperature was kept at 310 K using the V-rescale thermostat [47]. The MD simulation time was 50 ns with a time step of 2 fs and a cutoff of 10 Å. The VMD program was used to analyze the simulation trajectory [48]. Several parameters were computed, such as RMSD, RMSF of all residues, Rg, and SASA.
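
To make the computed quantities concrete, here is a minimal pure-Python sketch of RMSD, RMSF, and Rg on plain coordinate lists. It assumes the frames are already superimposed on a common reference and uses unweighted (equal-mass) atoms; the function names and toy coordinates are illustrative and do not reproduce the Gromacs or VMD routines used in the study.

```python
import math

# 50 ns at a 2 fs time step corresponds to this many integration steps.
N_STEPS = 50 * 1_000_000 // 2  # 50 ns = 50,000,000 fs


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords):
    """Unweighted radius of gyration around the geometric center."""
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    total = sum(
        sum((c[i] - center[i]) ** 2 for i in range(3)) for c in coords
    )
    return math.sqrt(total / n)


def rmsf(frames):
    """Per-atom root-mean-square fluctuation around the time-averaged
    position; `frames` is a list of coordinate lists, one per frame."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for j in range(n_atoms):
        mean = [sum(f[j][i] for f in frames) / n_frames for i in range(3)]
        msd = sum(
            sum((f[j][i] - mean[i]) ** 2 for i in range(3)) for f in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


# Toy coordinates for illustration only.
frame_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_2 = [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(N_STEPS, rmsd(frame_1, frame_2), rmsf([frame_1, frame_2]))
```

In practice the units follow the trajectory file (nm in Gromacs output), and mass weighting is usually applied to Rg.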

Codon optimization for in silico cloning

The Reverse Translate web server tool (http://www.bioinformatics.org/sms2/rev_trans.html) was employed to predict and optimize the DNA sequence of the mutated nattokinase gene using E. coli as the expression host. The suitability of the codons for E. coli expression was evaluated with GenScript rare codon analysis (https://www.genscript.com/tools/rare-codon-analysis) using the codon adaptation index (CAI) with an ideal value of 0.8–1, GC content (ideal at 30–70%), codon frequency distribution (CFD) (ideal if below 30%), and detection of negative regulatory elements and repeat sequences. Restriction sites in the optimized linear DNA sequence were predicted (http://nc2.neb.com/NEBcutter2) to indicate which restriction enzymes should not be used if the gene is cloned into an expression vector, or to consider changing the DNA sequence according to the wobble hypothesis to remove the recognition sites from the gene.
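
The GC content check and the restriction site scan can be illustrated with a short Python sketch. It assumes the standard recognition sequences of five common enzymes, all of which are palindromic, so scanning one strand is enough; the function names and the toy DNA string are hypothetical, and this is not a substitute for the web tools named above.

```python
# Recognition sequences of common restriction endonucleases (5' to 3').
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
    "HindIII": "AAGCTT",
    "SacI": "GAGCTC",
}


def gc_content(dna: str) -> float:
    """Percentage of G and C bases in the sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def find_restriction_sites(dna: str, sites=RESTRICTION_SITES) -> dict:
    """Map each enzyme with at least one site to its 1-based start positions.
    Overlapping occurrences are reported as well."""
    dna = dna.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start + 1)
            start = dna.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


# Toy sequence for illustration only (not the optimized nattokinase gene).
toy_dna = "ATGGCGGAATTCAAAGGATCCTAA"
print(round(gc_content(toy_dna), 1), find_restriction_sites(toy_dna))
```

An enzyme that appears in the result would cut inside the gene and should therefore be avoided for the cloning adaptors, or its site removed with a synonymous codon change.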

Results and discussion

Nattokinase antigenicity and B-cell epitopes determination

Determining nattokinase antigenicity with the VaxiJen web server served as a preliminary step to decide whether mutagenesis should be done to reduce its immunogenicity. Indeed, the VaxiJen antigen probability score of the nattokinase sequence (Q93L66) was 0.7391, far above the 0.4 threshold. This underlined the importance of developing a less immunogenic nattokinase by mutagenesis. In silico analysis has aided various researchers in determining both continuous and discontinuous B-cell epitopes [15, 16, 49,50,51,52,53]. Here, four tools were used to predict continuous B-cell epitopes, two web servers were used to predict discontinuous B-cell epitopes, and Emini surface accessibility was assessed. The continuous B-cell epitopes were overlaid with the conformational B-cell epitopes. Epitopes were selected on the requirement that residues comply with three out of the four continuous B-cell epitope predictions and one out of the two discontinuous B-cell epitope predictions. In addition, surface accessibility was compulsory [15]. Exposed residues usually act as functional epitopes, as was well demonstrated for bauA from Acinetobacter baumannii [54]. The selected epitopes are shown schematically in Fig. 1. Five residues (residue no. 18, 19, 59, 242, and 245) complied with five out of the seven web server tools. However, residue number 59 was excluded since it conformed to only two out of the four continuous B-cell prediction tools. Linear B-cell epitope prediction tools generate better accuracy than discontinuous B-cell epitope prediction tools. Several studies have even used only linear B-cell tools for epitope prediction [46, 55, 56].

Fig. 1

Illustration of the determination of the continuous and discontinuous B-cell epitopes together with the surface accessibility analysis to reveal which residues should be mutated

The entropy method (ExPASy) was one way to determine the conserved residues. Lower entropy indicated higher conservation. In addition, conserved residues usually contribute to the stability of the protein [15]. These residues must be excluded so as not to reduce the stability of the protein. However, none of the four putative B-cell epitope residues was predicted to be conserved (Supplementary file 1). Based on all these requirements, S18, Q19, T242, and Q245 were selected for the mutagenesis analysis.
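
The idea that lower entropy means higher conservation can be shown with a small Python sketch of Shannon entropy for one alignment column. This is an assumption-laden illustration: the function name and toy columns are hypothetical, gaps are simply ignored, and the scale and threshold used by the server in this study may differ from the bit scale used here.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column, ignoring gaps.

    0.0 means every sequence has the same residue at this position; the
    maximum for 20 amino acids is log2(20), about 4.32 bits.
    """
    residues = [aa for aa in column.upper() if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return sum((k / n) * math.log2(n / k) for k in counts.values())


# Toy alignment columns for illustration only.
print(column_entropy("SSSSSSSS"))   # fully conserved position
print(column_entropy("SSTTAAQQ"))   # variable position
```

A residue sitting in a low-entropy column would be left untouched, while one in a high-entropy column is a safer candidate for substitution.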

In silico mutagenesis of nattokinase

In silico mutagenesis was done with the help of I-Mutant 2.0. With this tool, each selected residue could be substituted by any of the other 19 amino acids, and the stability of the resulting protein was predicted from the difference in Gibbs energy [32]. The choice of substituted amino acids was based on two criteria: an increase in the Gibbs energy difference and the largest decrease in VaxiJen score, which indicated a reduction in antigenicity. An increase in the Gibbs energy difference indicated a rise in stability. Thus, only substitutions giving a positive Gibbs energy difference were used. The antigenicity of the protein was then reevaluated with VaxiJen after each substitution. Table 1 presents the mutation results in terms of Gibbs energy difference and VaxiJen antigenicity score. Accordingly, the residues to be mutated were chosen by combining a positive Gibbs energy difference with the largest decrease in VaxiJen score, which gave a preliminary prediction of retained or increased stability and decreased immunogenicity in the mutated protein. For single amino acid substitutions, the selected mutations were S18D, Q19I, T242Y, and Q245W. All four original residues were polar. Surface antigenic residues are likely to be hydrophilic or aromatic amino acids [15]. With the exception of S18D, all the residues were changed into non-polar amino acids. Such changes reduced both protein surface accessibility and immunogenicity, as shown by Ramya and Pulicherla [57] in the in silico mutation of asparaginase for deimmunization.

Table 1 In silico mutagenesis and stability analysis of nattokinase by single substitution mutation technique. The mutation condition was set at 37 °C for the temperature and at the pH of 7.4

Although Zarei et al. [15] showed that mutation of single amino acid residues generated lower immunogenicity scores than multiple mutations, some studies indicated that combinations of multiple mutations in epitopes dramatically reduced immunogenicity and IgE-binding activity [16, 58]. Thus, a series of combined mutations with their antigenicity scores is shown in Table 2. The combination of all four substitution mutations (S18D, Q19I, T242Y, and Q245W) in one protein generated the lowest antigenicity. The protein conformational free energy calculation confirmed that the mutation of nattokinase did not cause structural destabilization. The dDFIRE energy scores were −588.11 and −584.99 for mutated and native nattokinase, respectively.
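
The combination screen can be sketched in Python by enumerating every subset of the four substitutions and applying each subset to a sequence before it is rescored. The helper names are hypothetical, and the sequence below is a placeholder that only carries the correct wild-type residues at the four mutated positions; it is not the Q93L66 sequence.

```python
import re
from itertools import combinations

MUTATIONS = ["S18D", "Q19I", "T242Y", "Q245W"]


def apply_mutations(sequence: str, mutations) -> str:
    """Apply point mutations written as <wild type><position><new residue>,
    with 1-based positions, checking the wild-type residue first."""
    seq = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"Cannot parse mutation {mutation!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if seq[position - 1] != wild_type:
            raise ValueError(
                f"{mutation}: expected {wild_type} at position {position}, "
                f"found {seq[position - 1]}"
            )
        seq[position - 1] = new
    return "".join(seq)


def all_combinations(mutations):
    """Every non-empty subset of the mutations, from singles to the full set."""
    return [
        combo
        for size in range(1, len(mutations) + 1)
        for combo in combinations(mutations, size)
    ]


# Placeholder 275-residue sequence (glycine filler), NOT the real protein.
toy = ["G"] * 275
for pos, aa in ((18, "S"), (19, "Q"), (242, "T"), (245, "Q")):
    toy[pos - 1] = aa
toy_sequence = "".join(toy)

variants = {combo: apply_mutations(toy_sequence, combo)
            for combo in all_combinations(MUTATIONS)}
print(len(variants))
```

Each variant sequence would then be submitted for antigenicity scoring, and the subset with the lowest score retained.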

Table 2 Determination of the lowest antigenicity score by multiple mutagenesis of nattokinase. Italicized entry indicates that the mutein gave lowest VaxiJen score and was subjected for further analyses

In addition to the antigenicity score of the mutated protein, the antigenicity scores of each surface-exposed region determined by the Emini surface accessibility assessment were also evaluated. The exposed regions spanning residues 237–242 and 244–254 were merged because the separate regions did not meet the requirement of an epitope. The mutations dramatically reduced the antigenicity score of the exposed residues (Table 3).

Table 3 Antigenicity scores of the surface-exposed epitopes before and after mutation

Mutein validation, physicochemical, and stability analysis

The recommended mutein was then modeled in 3D with two web server tools, PHYRE2 and I-TASSER. Validation analyses were employed to compare which web server generated the better geometric quality of the 3D-modeled protein. Figure 2 shows the Ramachandran plots for the general amino acid residues of the native nattokinase and the muteins generated by both tools. The complete Ramachandran plots can be seen in Supplementary file 2. The 3D structure of the mutein generated by PHYRE2 had better geometry, in terms of the torsional angles between adjacent residues, than the mutein generated by I-TASSER and the native nattokinase. There were no outlier residues in the mutein generated by PHYRE2, compared to 5 outlier residues in the 3D mutein modeled by I-TASSER (Table 4). Moreover, the ProQ3D score of the mutein generated by PHYRE2 was higher than that of the one generated by I-TASSER. Relative to the native structure, a higher ProQ score indicated better overall quality of the model [37].

Fig. 2

Ramachandran plot for general amino acid residues of nattokinase (a), mutated nattokinase generated by PHYRE2 (b), and mutated nattokinase generated by I-TASSER (c)

Table 4 Quality assessment and validation of the mutated nattokinase 3D model

VERIFY3D can be used to analyze the compatibility of a 3D model with its own amino acid sequence (1D) and its environment [59]. More than 90% of the residues in all three protein models met the 3D-1D score standard of ≥ 0.2 (Table 4). The overall quality factor for non-bonded interactions among three types of atoms, carbon (C), nitrogen (N), and oxygen (O), was analyzed using ERRAT. The error function was plotted against the position of every nine-residue sliding window [40]. Both mutein ERRAT scores were lower than that of the native protein, which is related to the resolution of the model [60]. However, both mutein ERRAT scores were above the general threshold of 50 [61]. Moreover, the ERRAT score of the mutein model generated by PHYRE2 was above 90% and higher than the ERRAT score of the mutein model generated by I-TASSER. An ERRAT score above 90% is typical of protein models with a resolution of up to 3 Å [40]. Overall, these validation results indicated that the 3D model of the mutein built by PHYRE2 was more suitable. The mutein built by PHYRE2 was therefore subjected to the stability analyses and the analysis of physicochemical properties.

According to the instability index, the mutein was considered stable, although its index was slightly higher than that of the native protein. The instability index algorithm was derived from empirical data on protein stability and dipeptide composition [62]. A higher aliphatic index signifies higher thermostability due to the hydrophobic interactions among the non-polar aliphatic side chains of amino acids [63], and the aliphatic index of the mutein was higher than that of the native protein. According to hydropathy analysis by GRAVY, the mutein was more hydrophobic than native nattokinase. A lower GRAVY value indicates that more residues are exposed on the surface of the protein [53]. Hydrophilicity is one property that correlates with the antigenicity of a protein. Regarding the secondary structure, there was a reduction in beta-turn conformation in the mutein compared to the native protein. Beta-turn conformation is well correlated with antigenicity because the residues that compose beta-turns are usually hydrophilic, flexible, and accessible [64]. The reduction in beta-turn conformation in the mutein therefore also indicated a reduction in immunogenicity (Table 5).

Table 5 Physicochemical properties, stability, and hydrophilicity of the native and mutated 3D model nattokinase

The functionality of the enzyme was also evaluated by comparing the catalytic site and binding pocket of mutated and native nattokinase. As indicated in Fig. 3, the catalytic site (D32, H64, N155, and S221) and the binding pocket (S125-P130; A152-E156; S163-Y167, A169, S191, T220, and S221) of the mutated protein were identical to those of the native nattokinase. This was further supported by structure superimposition after 50 ns of molecular dynamics simulation, which showed no major difference after the simulation (Supp file 3). Thus, the mutein was predicted to have a mode of action similar to that of the native nattokinase.

Fig. 3

3D modeling of native nattokinase (a) and mutated nattokinase (b); catalytic site prediction of native nattokinase (c) and mutated nattokinase (d); and binding pocket of native nattokinase (e) and mutated nattokinase (f)

Molecular dynamics simulation

Molecular dynamics simulations were employed to investigate and compare the behavior, conformational stability, and flexibility of the protein and the mutein. Several dynamic properties, such as RMSD, RMSF of all residues, Rg, SASA, and secondary structure projection along the trajectory, were studied. The RMSD of all backbone atoms from the initial structure indicated the convergence of the protein [65]. Native and mutated nattokinase showed similar fluctuations throughout the simulation, and all RMSD values were below 2.5 Å. After 40 ns of simulation, both protein and mutein showed only a small increase in RMSD. This small magnitude of fluctuation indicated that the simulation had reached stable trajectories (Fig. 4). The superimposed structures resulting from the simulation also revealed an RMSD of 1.45 Å between native and mutated nattokinase, which indicated that there was no significant difference in structure (Supp file 3).

Fig. 4

Time evolution of the RMSD for native and mutated nattokinases

Rg and SASA were employed to analyze the geometrical behavior of the native and mutated nattokinases. Native and mutated nattokinase did not exhibit a significant difference in Rg, as depicted in Fig. 5a. The Rg remained relatively constant from the beginning until the end of the simulation. The implication of similar compactness between native and mutated nattokinase was further supported by the SASA analysis, in which the SASA values of both proteins did not differ significantly over the simulation time (Fig. 5b). RMSF was employed to measure the flexibility of each residue in the protein. The RMSF profiles of native and mutated nattokinase were similar (Fig. 6). However, there was an increase in flexibility at residues 150–160 due to the mutation. This gain in flexibility did not disturb the functional behavior of the protein [66].

Fig. 5

Rg plot (a) and SASA (b) as a function of time for native and mutated nattokinase

Fig. 6

RMSF analysis of the native nattokinase and mutated nattokinase

Based on the RMSD, structure superimposition, Rg, SASA, and RMSF, it can be concluded that the mutation did not disrupt protein conformation and behavior. Thus, the proposed mutations for a less immunogenic nattokinase were acceptable.

Codon optimization and in silico transcription

Reverse translation was employed to determine suitable codons and the DNA sequence of the mutated nattokinase. The sequence was optimized for E. coli as the host for protein production (Supp file 4). The optimized DNA sequence was analyzed for its codon adaptation index, which relates to codon usage bias [67], its %GC content, and its codon frequency distribution (CFD), which relates to rare codons that hinder the translation process. All parameters indicated that the generated gene sequence was within the threshold range or even reached the maximum value for CAI and CFD (Table 6). The gene could therefore be well expressed in E. coli. Moreover, there were no negative cis elements or repeat elements that could negatively regulate the expression of the gene or abort the transcription process. The restriction sites of this in silico synthetic gene are shown in Fig. 7. These predictions guide the addition of adaptors at the cis and trans regions of the gene. The adaptors should contain restriction sites that match the selected plasmid but are not present in the gene itself. Restriction sites of common restriction endonucleases, such as EcoRI, BamHI, NotI, HindIII, and SacI, were not present in the optimized sequence of the mutated nattokinase synthetic gene. Thus, there was no need to exploit the wobble hypothesis to remove any restriction sites from the suggested codons. This could ease the cloning of the mutated nattokinase gene.
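
For readers unfamiliar with the codon adaptation index, the following Python sketch shows how it is commonly defined: each codon receives a relative adaptiveness (its usage divided by the usage of the most frequent synonymous codon), and the CAI is the geometric mean of these values over the gene. The function names and the codon counts below are hypothetical placeholders, not a real E. coli usage table and not the GenScript implementation.

```python
import math


def relative_adaptiveness(usage_by_amino_acid: dict) -> dict:
    """Convert {amino_acid: {codon: count}} into {codon: weight}, where the
    most used synonymous codon of each amino acid gets weight 1.0."""
    weights = {}
    for codon_counts in usage_by_amino_acid.values():
        top = max(codon_counts.values())
        for codon, count in codon_counts.items():
            weights[codon] = count / top
    return weights


def codon_adaptation_index(dna: str, weights: dict) -> float:
    """Geometric mean of the weights of all scored codons in the gene.
    Codons missing from `weights` are skipped; zero weights are not handled."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    scored = [weights[c] for c in codons if c in weights]
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


# Hypothetical codon counts for two amino acids (placeholder values only).
toy_usage = {
    "K": {"AAA": 75, "AAG": 25},
    "E": {"GAA": 70, "GAG": 30},
}
toy_weights = relative_adaptiveness(toy_usage)

print(codon_adaptation_index("AAAGAA", toy_weights))  # only preferred codons
print(codon_adaptation_index("AAGGAG", toy_weights))  # only rare codons
```

A gene built entirely from the preferred codon of each amino acid reaches the maximum CAI of 1, which is the situation the optimized sequence approaches.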

Table 6 Codon optimization parameters of the mutated nattokinase

Fig. 7

Restriction sites prediction of the mutated nattokinase in silico synthetic gene

Conclusions

In this study, a strategy was proposed to determine nattokinase B-cell epitopes and to generate a mutated nattokinase with lower immunogenicity but similar stability and functionality. Four mutations, S18D, Q19I, T242Y, and Q245W, combined in one protein, generated the lowest antigenicity score. The similar stability and conformation of the mutated nattokinase were indicated by the stability analysis, RMSD, RMSF, Rg, SASA, and structure superimposition. This suggests that the mutated nattokinase may be more suitable for injectable formulations because of its better predicted safety and an efficacy similar to that of the native protein. These results could help reduce the cost and time of wet-lab mutagenesis experiments.