Introduction

Angiotensin-I-converting enzyme (ACE) is by definition associated with the renin-angiotensin system, which regulates peripheral blood pressure. The enzyme can raise blood pressure by converting angiotensin I into the potent vasoconstrictor angiotensin II and by catalysing the degradation of bradykinin and enkephalins. Inhibition of ACE may therefore exert an antihypertensive effect, and potent synthetic ACE inhibitors are used extensively in the treatment of hypertension in humans [1]. Several peptides released from milk proteins by enzymatic cleavage have also been found to be potent ACE inhibitors [2, 3]. Foods enriched with such inhibitory peptides could thus be targeted at consumers as functional foods containing milk-derived nutraceuticals that lower blood pressure. To obtain the claimed health effects, the peptide composition of such foods must provide high inhibitory potency. Given the large number of theoretically possible peptides, i.e. 400 dipeptides, 8,000 tripeptides, 160,000 tetrapeptides, 3.2 × 10⁶ pentapeptides, etc., examining all possible peptides to find highly efficient inhibitors would be a daunting task. At present, the main strategy has been to identify and characterise inhibitory peptides isolated from enzymatic digests of proteins, but methods describing the relationship between peptide structure and ACE inhibition are needed to predict the theoretical inhibitory potential of food protein hydrolysates.
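The peptide counts quoted above follow from 20ⁿ for linear peptides of length n, assuming each position can be any of the 20 standard amino acids (modified residues ignored). A short Python sketch of the arithmetic:

```python
# Number of theoretically possible linear peptides of a given length, assuming
# every position can independently be any of the 20 standard amino acids.
def n_possible_peptides(length: int, alphabet_size: int = 20) -> int:
    if length < 1:
        raise ValueError("length must be a positive integer")
    return alphabet_size ** length


for n in range(2, 6):
    print(f"{n}-mers: {n_possible_peptides(n):,}")
```

Running the loop lists 400, 8,000, 160,000 and 3,200,000 possible di- to pentapeptides, matching the figures in the text.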

One approach is to develop statistical models that relate structure (e.g. amino acid sequence) to activity (e.g. ACE inhibition), an approach referred to as quantitative structure-activity relationship (QSAR) modelling. Besides contributing to a biochemical understanding of why certain peptides are active, QSAR provides a tool for predicting which amino acid sequences would give potent inhibition. When physico-chemical variables are used to describe the chemical structure of the active compounds, data analysis can reveal the relationship between activity and structure. Besides a suitable data set of active compounds, an optimal set of descriptor variables is critical in QSAR modelling; hydrophobicity/hydrophilicity, molecular mass and shape, charge and electronic properties of the individual amino acids are good candidates. Developments within chemometrics have introduced powerful regression techniques such as partial least squares regression (PLSR) for finding relationships between variables [4], making it possible to relate variables describing the chemical structure of peptides to their activity.

A recent development in peptide QSAR is the use of amino acid "z-scores" obtained by principal component analysis (PCA) of property data [5, 6]. The z-scores have been interpreted as relating to hydrophilicity (z1-score), side-chain bulk (z2-score) and electronic properties (z3-score) of the amino acids. They have proven useful for modelling a number of biological effects of small peptides, among them inhibition of ACE by dipeptides [7]. Since QSAR modelling of ACE inhibition by dipeptides has proven successful and provided valuable information, it seems warranted to extend it to larger peptides and to variables more directly linked to physico-chemical properties than the z-scores.

The objective of this study was to use hydrophobicity/hydrophilicity, molecular size and charge properties as descriptors of the amino acids in QSAR modelling of ACE-inhibiting peptides derived from milk proteins, and to compare the results with models based on z-scores. The resulting models were expected to allow a biochemical interpretation of the relationship between structure and activity.

Materials and methods

The primary structures of ACE-inhibitory peptides derived from milk proteins and their inhibitory activities (Table 1), expressed as the peptide concentration (μM) required to inhibit ACE by 50% (IC50%), were obtained from a review by Fitzgerald and Meisel [2]. Peptides of up to eight amino acids with a specific IC50% value in μM were used in the modelling. Five continuous and three categorical variables describing the physico-chemical properties of amino acids were used (Table 2), so each amino acid at a given position in the peptides was described by eight variables. Particular focus was given to the two outermost amino acids at the N- or C-terminus, hereafter called the N- or C-terminal dipeptide, respectively. The N- or C-terminal dipeptide could then be described by a 17-term model (Eq. 1).

Table 1 Peptide fragments from milk proteins and their ACE inhibition expressed as log IC50% (μmol/l) (from literature review by Fitzgerald and Meisel [2])
Table 2 Physico-chemical properties of amino acids used as descriptor variables
$$y = b_{0} + \sum_{j=1}^{2} \sum_{i=1}^{8} b_{ij} x_{ij}, \tag{1}$$

where \(y\) is the modelled property (ACE inhibition), \(j = 1, 2\) is the position of the amino acid in the N- or C-terminal dipeptide, \(i = 1, \ldots, 8\) indexes the descriptor variables (Table 2), and consequently \(x_{ij}\) is descriptor variable \(i\) at position \(j\).
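To make the indexing of Eq. (1) concrete, the sketch below builds the 16 predictors \(x_{ij}\) for one peptide. Two assumptions are labelled in the code: position 1 is taken as the outermost residue (consistent with the c1/c2 naming used for the z-score model later in the paper), and the `descriptors` mapping is a hypothetical stand-in for the Table 2 values, which are not reproduced here.

```python
from typing import Dict, List, Sequence


def terminal_residues(sequence: str, terminus: str = "C") -> List[str]:
    """Return [position 1, position 2], counted from the chosen terminus inward.

    ASSUMPTION: position 1 is the outermost residue of the terminal dipeptide.
    """
    if len(sequence) < 2:
        raise ValueError("peptide must contain at least two residues")
    if terminus == "C":
        return [sequence[-1], sequence[-2]]
    if terminus == "N":
        return [sequence[0], sequence[1]]
    raise ValueError("terminus must be 'N' or 'C'")


def dipeptide_features(sequence: str,
                       descriptors: Dict[str, Sequence[float]],
                       terminus: str = "C") -> List[float]:
    """Concatenate x_i1 (i = 1..8) and x_i2 (i = 1..8): the 16 predictors of Eq. (1).

    `descriptors` is a HYPOTHETICAL mapping from one-letter code to the eight
    Table 2 variables; with the intercept b0 this gives the 17-term model.
    """
    features: List[float] = []
    for residue in terminal_residues(sequence, terminus):
        features.extend(descriptors[residue])
    return features
```

Together with the intercept \(b_0\), the 16 features give the 17 terms of Eq. (1).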

The amino acid z-scores used were those reported by Jonsson et al. [5]. With three z-scores for each amino acid, the properties of a dipeptide region could be described by a six-term model, where \(z_{11}\), \(z_{21}\) and \(z_{31}\) are the scores for the amino acid in position 1 and \(z_{12}\), \(z_{22}\) and \(z_{32}\) are the scores for the amino acid in position 2, giving the following model:

$$y = b_{0} + b_{11} z_{11} + b_{21} z_{21} + b_{31} z_{31} + b_{12} z_{12} + b_{22} z_{22} + b_{32} z_{32} \tag{2}$$

ACE inhibition was modelled from either the physico-chemical descriptor variables or the amino acid z-scores by partial least squares regression (PLSR) with full cross-validation. The predictive accuracy of a model is described by the multivariate correlation coefficient (R) after full cross-validation and by the root mean square error of prediction (RMSEP), which is defined for a response variable \(y_i\) as

$$\text{RMSEP} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_{i} - y_{i} \right)^{2}} \tag{3}$$

where \(y_i\) is the measured value for sample number \(i\), \(\hat{y}_i\) is the value predicted by full cross-validation, and \(N\) is the number of objects used in the prediction. To assess the influence of measurement uncertainty on the predictive ability of the models, the relative ability of prediction (RAP) is used [8]. RAP was originally developed for sensory analysis and takes into account the level of experimental error in the reference data. It is defined as

$$\text{RAP} = \frac{\text{SD}_{\text{tot}}^{2} - \text{RMSEP}^{2}}{\text{SD}_{\text{tot}}^{2} - \text{SD}_{\text{ref}}^{2}} \tag{4}$$

where \(\text{SD}_{\text{tot}}\) is the standard deviation of the data set, \(\text{SD}_{\text{ref}}\) is a standard error indicating the uncertainty of the analysis, and RMSEP is the root mean square error of prediction (after full cross-validation). Setting \(\text{SD}_{\text{ref}} = 0\) reduces the RAP expression to \(1 - \text{RMSEP}^{2}/\text{SD}_{\text{tot}}^{2}\), i.e. the squared multivariate correlation coefficient [9].
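Eqs. (3) and (4) translate directly into code. A minimal Python sketch (function names are illustrative, not from any particular package); `predicted` is assumed to hold cross-validated predictions:

```python
import math
from typing import Sequence


def rmsep(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (3): root mean square error of (cross-validated) prediction."""
    if len(measured) == 0 or len(measured) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)


def rap(sd_tot: float, rmsep_value: float, sd_ref: float = 0.0) -> float:
    """Eq. (4): relative ability of prediction.

    With sd_ref = 0 this reduces to 1 - RMSEP^2 / SD_tot^2; with
    sd_ref == rmsep_value it equals 1.
    """
    return (sd_tot ** 2 - rmsep_value ** 2) / (sd_tot ** 2 - sd_ref ** 2)
```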

Partial least squares regression was performed with the Unscrambler software, version 8.0 (Camo Prosess AS, Oslo, Norway), and statistical tests of regression coefficients with the Minitab Statistical Software, release 13.1 (Minitab Inc., State College, PA, USA).
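The internals of Unscrambler are not described here. As an illustration of what "PLSR with full cross-validation" computes, the following is a minimal NumPy sketch of single-response PLS (PLS1, NIPALS algorithm) with leave-one-out cross-validation. Autoscaling the predictors is an assumption (the results refer to "weighted variables"), and all function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 by NIPALS on autoscaled X; returns (x_mean, x_std, y_mean, b)."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    E = (X - x_mean) / x_std           # autoscaled predictors (assumed weighting)
    f = y - y_mean                     # centred response
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)      # weight vector
        t = E @ w                      # scores
        tt = t @ t
        p = E.T @ t / tt               # X loadings
        q = (f @ t) / tt               # y loading
        E = E - np.outer(t, p)         # deflate X
        f = f - q * t                  # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    b = W @ np.linalg.solve(P.T @ W, Q)  # regression vector (scaled units)
    return x_mean, x_std, y_mean, b


def pls1_predict(model, X):
    x_mean, x_std, y_mean, b = model
    return y_mean + ((X - x_mean) / x_std) @ b


def loo_predictions(X, y, n_components):
    """Full (leave-one-out) cross-validation: predict each object from a model fitted without it."""
    n = len(y)
    y_hat = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        y_hat[i] = pls1_predict(model, X[i:i + 1])[0]
    return y_hat


# Usage on SYNTHETIC data (not the Table 1 peptides):
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 1.0 + X @ np.array([0.5, -0.3, 0.2]) + 0.05 * rng.normal(size=30)
y_hat = loo_predictions(X, y, n_components=2)
R = np.corrcoef(y, y_hat)[0, 1]               # correlation after cross-validation
RMSEP = np.sqrt(np.mean((y_hat - y) ** 2))    # Eq. (3)
```

When the number of components equals the number of (linearly independent) predictors, PLS1 reproduces the ordinary least-squares fit; using fewer components is what makes PLSR robust to collinear descriptors.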

Results and discussion

Fitzgerald and Meisel [2] reviewed the properties of milk protein hydrolysates and bioactive peptides and compiled a set of ACE-inhibitory peptides derived from milk proteins (Table 1). A set of variables describing the physico-chemical properties of amino acids (Table 2) was chosen. Previous research on structure-activity relationships of ACE-inhibitory peptides has pointed out that hydrophobic amino acid residues and charged side groups influence inhibitory potential. To cover different aspects of hydrophobicity, three descriptor variables were included in the QSAR modelling. The van der Waals volume describes molecular volume and thereby the steric effects of amino acid side chains. The molecular weight of the amino acids was initially also included in the models, but since van der Waals volume and molecular mass of amino acids are highly correlated (r = 0.96), this descriptor was later dropped. Three categorical variables describing aromatic (Trp, Tyr, Phe), positively charged (Lys, Arg, His) and negatively charged (Asp, Glu) side chains were also included, coded 1 if the amino acid belongs to the given class and 0 otherwise.
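The 0/1 coding of the three categorical descriptors can be sketched as follows, using exactly the residue classes listed above (one-letter codes assumed as input):

```python
from typing import Tuple

# Residue classes as listed in the text (one-letter codes)
AROMATIC = frozenset("WYF")   # Trp, Tyr, Phe
POSITIVE = frozenset("KRH")   # Lys, Arg, His
NEGATIVE = frozenset("DE")    # Asp, Glu


def categorical_descriptors(residue: str) -> Tuple[int, int, int]:
    """Return (aromatic, positively charged, negatively charged) indicators."""
    if len(residue) != 1:
        raise ValueError("expected a single one-letter amino acid code")
    r = residue.upper()
    return (int(r in AROMATIC), int(r in POSITIVE), int(r in NEGATIVE))
```

For example, Lys gives (0, 1, 0) and Ala, which belongs to none of the classes, gives (0, 0, 0).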

Modelling revealed that a logarithmic transformation of the IC50% values improved the models. This agrees with previous QSAR modelling of a set of ACE-inhibiting dipeptides using amino acid z-scores [7], where ACE inhibition was also expressed as log-transformed IC50% values. The data set of ACE-inhibitory peptides derived from milk proteins was compiled from different studies over a 15-year period; ACE inhibition was measured with different assays, and several of the studies do not report measurement uncertainty. It was therefore difficult to derive a precise measure of uncertainty from these data. However, a recent study in our laboratory on ACE inhibition by ethanol-soluble peptides from fish used an extract of rabbit lung acetone powder as the source of ACE and furanacryloyl-Phe-Gly-Gly (FAPGG) as substrate [10]. The standard deviation between replicates for log IC50% in that study was 0.06. At such a low measurement uncertainty, the difference between RAP and the squared multivariate correlation coefficient was negligible. RAP equals one when the measurement uncertainty equals RMSEP.
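Why is the difference negligible? From Eq. (4), RAP exceeds \(1 - \text{RMSEP}^{2}/\text{SD}_{\text{tot}}^{2}\) by the factor \(\text{SD}_{\text{tot}}^{2}/(\text{SD}_{\text{tot}}^{2} - \text{SD}_{\text{ref}}^{2})\). A small sketch evaluates that factor for \(\text{SD}_{\text{ref}} = 0.06\); the \(\text{SD}_{\text{tot}}\) values scanned are HYPOTHETICAL, since the total standard deviation of log IC50% is not reported here.

```python
SD_REF = 0.06  # replicate SD of log IC50% from the FAPGG assay study [10]


def rap_inflation(sd_tot: float, sd_ref: float) -> float:
    """Ratio RAP / (1 - RMSEP^2 / SD_tot^2) = SD_tot^2 / (SD_tot^2 - SD_ref^2)."""
    return sd_tot ** 2 / (sd_tot ** 2 - sd_ref ** 2)


# HYPOTHETICAL total standard deviations of log IC50%
for sd_tot in (0.75, 1.0, 1.5):
    print(f"SD_tot = {sd_tot}: factor = {rap_inflation(sd_tot, SD_REF):.4f}")
```

For any \(\text{SD}_{\text{tot}}\) of this order the factor stays below about 1.01, so RAP and the squared correlation coefficient differ by under one percent.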

QSAR models on the N- or C-terminal dipeptides were calculated by PLSR for subsets of ACE-inhibiting peptides of increasing length, and their predictive ability was assessed by the multivariate correlation coefficient (R) (Table 3). A highly significant correlation (p < 0.001) was found for the QSAR model based on the C-terminal dipeptide for peptides of up to six amino acids. The low R for the subset comprising only di- and tripeptides is likely a result of the small number of samples in the model; when longer peptides were added, the predictive ability improved. The lower R when peptides with seven and eight amino acids were included might reflect that the ACE-inhibitory potential of smaller peptides results primarily from the C-terminal structure, whereas as length increases, steric effects not expressed by this QSAR model begin to interfere. In contrast to the C-terminus, where the composition of the two outermost amino acids clearly influenced ACE-inhibitory potency, the two amino acids at the N-terminus had no apparent influence on inhibition.
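The subsets behind Tables 3 and 4 are nested: each one adds the next peptide length to the previous subset. A small helper that builds such cumulative subsets (name hypothetical; the usage below uses toy sequences, not the Table 1 data):

```python
from typing import Dict, Iterable, List


def cumulative_length_subsets(peptides: Iterable[str],
                              max_lengths: Iterable[int]) -> Dict[int, List[str]]:
    """For each k in max_lengths, collect the peptides with at most k residues."""
    peptides = list(peptides)
    return {k: [p for p in peptides if len(p) <= k] for k in max_lengths}


# Toy sequences only
subsets = cumulative_length_subsets(["VY", "IPP", "AAAAAA", "AAAAAAAA"], [3, 6, 8])
```

Each subset would then be encoded (C- or N-terminal dipeptide) and modelled separately, giving one R value per row of the table.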

Each of the two terminal amino acids was described by eight variables, giving a regression model with 17 terms (Eq. 1). The model was then optimised using fewer terms. Martens and Martens [11] developed a modified jack-knife method to identify collinear and noisy variables that can be left out to improve the model. An improved prediction (RMSEP = 0.60, R = 0.73, p < 0.001) was obtained with a three-term model using side-chain hydrophobicity (\(x_1\)) and positive side-chain charge (\(x_2\)) of the C-terminal amino acid and the van der Waals volume (\(x_3\)) of the amino acid next to the C-terminal position. Peptide LAHKAL was so poorly predicted by this model that it was considered an outlier. The QSAR model is given in Eq. 5, and a plot of predicted versus measured values is given in Fig. 1.
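The idea behind jack-knife variable selection is to estimate the uncertainty of each regression coefficient from its variation across cross-validation segments and to flag terms whose coefficient is small relative to that uncertainty. The sketch below is a simplified illustration, assuming leave-one-out segments and ordinary least squares as a stand-in for the PLSR model (the method of Martens and Martens works on PLSR coefficients); function names are hypothetical.

```python
import numpy as np


def _ls_coefficients(X, y):
    """Least-squares slopes with an intercept column (intercept dropped)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]


def jackknife_t(X, y):
    """Jack-knife t-ratios b_k / s(b_k) from leave-one-out refits on autoscaled X.

    SIMPLIFICATION: OLS replaces PLSR; one object is left out per segment.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale once
    n = len(y)
    b_full = _ls_coefficients(Xs, y)
    b_loo = np.array([
        _ls_coefficients(np.delete(Xs, i, axis=0), np.delete(y, i))
        for i in range(n)
    ])
    # Variance estimate: (n - 1)/n * sum of squared deviations from the full-model b
    s = np.sqrt((n - 1) / n * ((b_loo - b_full) ** 2).sum(axis=0))
    return b_full / s
```

Descriptors with small \(|t|\) are candidates for removal; refitting without them and checking that RMSEP does not deteriorate mirrors the reduction from 17 to three terms described above.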

Fig. 1 Predicted versus measured values for inhibition of angiotensin-I-converting enzyme by milk-protein-derived peptides of up to six amino acid residues, using the QSAR model (Eq. 5) based on hydrophobicity, van der Waals volume and positively charged side chains of the two C-terminal amino acids (RMSEP = 0.60, R = 0.73, p < 0.001). Numbers correspond to peptides in Table 1

$$\log \text{IC}_{50\%} = 1.46 - 9.29 \cdot 10^{-5} x_{1} + 0.52 x_{2} + 3.21 \cdot 10^{-2} x_{3} \tag{5}$$

Based on the regression coefficients of the weighted variables, the three terms are of roughly similar importance for predicting ACE inhibition by peptides. A biochemical interpretation of this QSAR model is that higher side-chain hydrophobicity and the absence of a positive charge at the C-terminal position enhance ACE-inhibitory potency (lower log IC50%), whereas a larger side chain at the position next to the C-terminus decreases it. This interpretation is supported by previous structure-activity studies of ACE-inhibitory peptides showing that the C-terminal tripeptide region of the substrate strongly influences binding to ACE and that peptides containing hydrophobic amino acid residues in the C-terminal region display high inhibitory potency [12, 13]. Some studies have suggested that amino acids with a positively charged side group contribute significantly to ACE-inhibitory potency [2], in contrast to the positive coefficient of \(x_2\) in Eq. (5); the effect of positively charged side groups therefore needs further investigation. Omitting that variable from the model in Eq. (5) slightly reduced the predictive ability (RMSEP = 0.61, R = 0.71, p < 0.001) without altering the relative effects or the biochemical interpretation of the two remaining variables.
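Eq. (5) can be evaluated directly. In this sketch the inputs must be in the units of Table 2 (not reproduced here), the back-transformation assumes a base-10 logarithm with IC50% in μmol/l, and the function names are hypothetical:

```python
def predict_log_ic50_eq5(x1: float, x2: int, x3: float) -> float:
    """Eq. (5).

    x1: side-chain hydrophobicity of the C-terminal residue (Table 2 units)
    x2: 1 if the C-terminal residue is positively charged, else 0
    x3: van der Waals volume of the residue next to the C-terminus (Table 2 units)
    """
    return 1.46 - 9.29e-5 * x1 + 0.52 * x2 + 3.21e-2 * x3


def ic50_from_log(log_ic50: float) -> float:
    """Back-transform, ASSUMING log base 10 and IC50% in umol/l."""
    return 10 ** log_ic50
```

The signs reproduce the interpretation above: raising `x1` lowers the predicted log IC50% (more potent), while `x2 = 1` or a larger `x3` raises it (less potent).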

As an alternative to QSAR modelling with specific physico-chemical descriptor variables for the amino acids, principal component analysis (PCA) of a large set of structural and property data for amino acids has shown that three principal components account for over 80% of the variance in the data [5, 6]. The first three PCs represent properties closely aligned with the chemical concepts of hydrophilicity (z1), molecular size (z2) and electronic properties (z3). Using this approach, ACE inhibition by peptides derived from milk proteins (Table 1) was examined for structure-activity relationships. The same QSAR modelling approach as in Table 3 was applied, but with z-scores [5] instead of physico-chemical variables (Table 4). As with the physico-chemical variables, highly significant predictive ability was found for peptides of up to six amino acids based on the properties of the C-terminal dipeptide region. Again, no correlation was found between N-terminal structure and ACE inhibition, except for peptides of up to five amino acid residues at the 5% level; given the lack of significant correlation for the other subsets, this is likely a coincidence. These findings support the view that the ACE-inhibitory potential of smaller peptides is linked to the composition of the C-terminal region, but not the N-terminal region, and that as peptide length increases, other steric effects unrelated to the C-terminal structure begin to interfere.
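The z-scores are principal-component scores of an amino acid × property matrix. A minimal NumPy sketch of that kind of decomposition, assuming autoscaled properties; it does not reproduce the property set or values of Jonsson et al. [5], and the function name is hypothetical:

```python
import numpy as np


def pca_scores(data, n_components=3):
    """PCA by SVD of the autoscaled matrix (rows = amino acids, columns = properties).

    Returns the first n_components score columns (z-like values) and the
    fraction of total variance each explains. The sign of each PC is arbitrary.
    """
    Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = U[:, :n_components] * s[:n_components]  # equals Z @ Vt[:k].T
    return scores, explained[:n_components]
```

Summing `explained` over the first three components gives the proportion of variance retained, the quantity reported as over 80% for the amino acid property data.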

Table 3 The multivariate correlation coefficient (R) for QSAR models on the two outermost amino acids (aa) in the N- or C-terminal position using physico-chemical descriptor variables (Table 2) for subsets (n = number of peptides in subset) including peptides of increasing length
Table 4 The multivariate correlation coefficient (R) for QSAR models on the two outermost amino acids (aa) in the N- or C-terminal region using z-scores (from reference [5]) for subsets (n = number of peptides in subset) including peptides of increasing length

Although the QSAR models based on z-scores and on physico-chemical properties of amino acids had similar predictive ability, a disadvantage of the z-scores is that they make biochemical interpretation of the structure-activity relationship more difficult. For the data set containing peptides of two to six amino acids, PLSR gave the following model:

$$\log \text{IC}_{50\%} = 2.21 + 0.03 z_{1c1} - 0.1 z_{2c1} - 0.21 z_{3c1} - 0.07 z_{1c2} + 0.12 z_{2c2} - 0.17 z_{3c2}, \tag{6}$$

where \(z_{1c1}\), \(z_{2c1}\) and \(z_{3c1}\) are the z1, z2 and z3 scores, respectively, of the C-terminal amino acid, and \(z_{1c2}\), \(z_{2c2}\) and \(z_{3c2}\) are the corresponding scores of the amino acid next to the C-terminal position. A plot of predicted versus measured values is given in Fig. 2. The modified jack-knife method of Martens and Martens [11] for identifying important terms in PLSR models selected \(z_{3c1}\), \(z_{2c2}\) and \(z_{3c2}\). As with the physico-chemical variables, peptide LAHKAL was poorly predicted after full cross-validation, supporting the view that it can be regarded as an outlier. Hellberg et al. [7] modelled ACE inhibition by dipeptides using z-scores and found that hydrophobicity (z1) and size (z2) at the C-terminal position, followed by electronic effects (z3) and hydrophobicity (z1) at the N-terminal position, were the most important factors in their QSAR model. They did not present results for peptides longer than dipeptides.
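For comparison with Eq. (5), Eq. (6) can be evaluated from the (z1, z2, z3) scores of the two C-terminal residues. The per-residue z-score values of Jonsson et al. [5] are not reproduced here and must be supplied by the caller; the constant and function names are hypothetical.

```python
from typing import Sequence

# Coefficients of Eq. (6)
EQ6_INTERCEPT = 2.21
EQ6_C1 = (0.03, -0.1, -0.21)    # C-terminal residue: z1c1, z2c1, z3c1
EQ6_C2 = (-0.07, 0.12, -0.17)   # residue next to the C-terminus: z1c2, z2c2, z3c2


def predict_log_ic50_eq6(z_c1: Sequence[float], z_c2: Sequence[float]) -> float:
    """Evaluate Eq. (6); z_c1 and z_c2 are (z1, z2, z3) tuples from reference [5]."""
    if len(z_c1) != 3 or len(z_c2) != 3:
        raise ValueError("each residue needs exactly three z-scores")
    return (EQ6_INTERCEPT
            + sum(b * z for b, z in zip(EQ6_C1, z_c1))
            + sum(b * z for b, z in zip(EQ6_C2, z_c2)))
```

Unlike Eq. (5), each coefficient here mixes several physico-chemical properties through the PCA, which is why its biochemical interpretation is less direct.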

Fig. 2 Predicted versus measured values for inhibition of angiotensin-I-converting enzyme by milk-protein-derived peptides of up to six amino acid residues, using the QSAR model (Eq. 6) based on z-scores of the two C-terminal amino acids (RMSEP = 0.65, R = 0.67, p < 0.001). Numbers correspond to peptides in Table 1

QSAR modelling of ACE-inhibitory peptides derived from milk proteins revealed a relationship between the composition of the C-terminal region and inhibitory potency; no such relationship was apparent for the N-terminal region. For peptides of up to six amino acid residues, ACE inhibition correlated with structural properties related to hydrophobicity, positive charge and molecular volume of the two C-terminal amino acids. For longer peptides, steric effects not accounted for in the QSAR model may be important. The findings of the QSAR model should be further validated by measuring ACE inhibition of peptides predicted to have different inhibitory potencies. Predicting the bioactivity of peptides can help to identify food proteins containing encrypted peptides with potential for use in functional foods.