Introduction

Antimicrobial peptides (AMPs) isolated from many different species can be considered peptide antibiotics as their cytotoxic activity is significantly lower against host cells than against pathogens (Zasloff 2002). This high selectivity, a mechanism of action which is predominantly non-stereospecific, and their activity against multidrug-resistant pathogens make them promising lead compounds for drug development (Hancock and Lehrer 1998; Glukhov et al. 2005; Marr et al. 2006). Consequently, many laboratories have been engaged in research related to this new class of antibiotics, little known to clinicians, over the past two decades (Hancock and Sahl 2006; Bommarius and Kalman 2009; Zhang and Falla 2010). To date, no AMP has been approved by the FDA due to a combination of factors including high production costs, low bioavailability, moderate (micromolar) activity against microbial cells and a relatively high toxicity in comparison with conventional antibiotics (Marr et al. 2006). Yet, a strong interest persists in finding novel peptide antibiotics (Bommarius and Kalman 2009).

One serious shortcoming of all antibiotics is that they drive Darwinian evolution of bacterial populations toward resistant varieties. Their heavy use has led to a surge in multidrug-resistant strains, a serious problem fostering the search for novel antibiotic classes that can overcome it (Woodword 1998; Siegal 2008; Chen et al. 2009). AMPs, unlike antibiotics derived from secondary metabolites, are gene encoded and have maintained the ability to effectively counter infections in producer species over evolutionary time, despite the capacity of some targeted microorganisms to develop resistance against them (Zasloff 2002; Perron et al. 2006; Yeaman and Yount 2007). It has been proposed that cationic AMPs and AMP-directed resistance mechanisms have co-evolved, leading to a host-pathogen balance that has shaped the existing AMP repertoire (Peschel and Sahl 2006). Furthermore, although specific intracellular targets for AMPs cannot be excluded (Kragol et al. 2001; Brogden 2005), the bacterial cytoplasmic membrane is most often their principal target (Matsuzaki 1998; Fernandez et al. 2009; Nicolas 2009). They have evolved to exploit major differences between bacterial and eukaryotic membranes, such as the absence of cholesterol, greater abundance of anionic lipids and a stronger, inward directed electric field in the former (Yeaman and Yount 2003). It is more difficult for bacteria to alter these characteristics than the more circumscribed molecular targets of conventional antibiotics (Steinberg et al. 1997). Research in AMPs as potential anti-infective agents is thus justified, especially at a time when the pharmaceutical industry seems to be abandoning research in development of novel antimicrobials (Norrby et al. 2005).

Most organisms use a wide panoply of peptide antibiotics, further decreasing the likelihood of resistance development. The dictum “Resistance is futile” is ensured by attacking bacteria with a cocktail of AMPs, often acting in synergy (Juretić 1990; Strandberg et al. 2009a), a strategy that has ensured their long-term persistence as the major source of natural antibiotics in multicellular organisms, including humans (Zasloff 2002). The fact that AMPs can have multiple other roles, including a tight collaboration with immune cells in fighting pathogenic microorganisms (Hancock 2001; Bowdish et al. 2005; Lai and Gallo 2009), further increases their potential as anti-infective agents.

Anurans are a particularly abundant source of AMPs. Unfortunately, due to the accelerated rate of species extinction (Vanhoye et al. 2003; Stuart et al. 2004; Rollins-Smith 2009; Rockström et al. 2009) we risk losing a significant part of this source of natural antibiotics. The spectrum of potential novel antibiotic classes can be significantly increased by basic research profiting from the vast and ever increasing amount of information saved in genomic and proteomic databases, as well as in published structure–activity data. This data-mining process, however, requires methods both to assess the potential of candidate peptides as antimicrobials, and to predict their selectivity with respect to host cells. Methicillin-resistant S. aureus (MRSA) and other resistant bacterial strains can be killed quite easily with the appropriate choice of such AMPs (Pál et al. 2006; Conlon et al. 2009), either alone or in combination with other antimicrobials (Desbois et al. 2010), although toxicity to human cells remains a concern (Matsuzaki 2009). One approach to overcome this flaw is to use the structure–activity information deposited in published papers and biological databases to learn a priori what makes an AMP both active and selective.

In this paper we shall discuss different methods for using available evolutionary and structure–activity information to find novel peptide antibiotics with potentially high selectivity and low toxicity. While the development of rational methods for achieving these aims is amply covered in the published literature, efforts in this direction have been rather conservative, with some notable exceptions (Hawrani et al. 2008), and a closer look exposes some limitations (Hancock and Sahl 2006; Matsuzaki 2009). Typically, encouraging results are not easily transferable to other non-homologous lead compounds.

We have been working on the development of a data-mining and peptide antibiotic design method capable of extracting physical characteristics from natural AMPs that correlate with high selectivity based on published structure–activity data, and using these to generate a large number of potential peptide antibiotics not homologous to any existing natural or synthetic AMPs (Juretić et al. 2009). We are combining this with a method for identifying new potential lead compounds based on the surprisingly high conservation of signal and pro-sequences in some families of AMP, for in-silico searches in large un-annotated genomic databases (e.g. EST databases). Conserved evolutionary information is used in both cases to propose novel selective putative AMPs. These in-silico methods must necessarily be followed by dedicated experiments for defining activity, selectivity, toxicity and mechanism of action of chemically synthesized versions of the identified AMPs.

The following sections will illustrate examples of the connection between in-silico design methods for synthetic AMP and identification methods for novel natural peptide antibiotics and the experimental testing of their activity-selectivity. They are based on the construction of algorithms with inbuilt expert rules for predicting high activity and selectivity (toxicity decrease) of AMP sequences extracted from dedicated sequence/activity databases and can also be used to evaluate variations in these parameters after introduction of specific point mutations into lead AMPs, as suggested by the algorithm.

Design of selective AMPs

Use of a therapeutic index to guide AMP design

The therapeutic index (TI) is often used as a parameter for evaluating AMPs. A common definition of TI is the HC50/MIC ratio, i.e. the peptide concentration causing 50% haemolysis of red blood cells (HC50) to the minimal concentration inhibiting overnight growth of bacteria in liquid assays (MIC). Compiling accurate TI values from the literature presents some problems, as it often requires comparing HC50 and MIC results from different laboratories using different protocols (Chen et al. 2006; Matsuzaki 2009). Furthermore, this definition is misleading, as it does not address the therapeutic potential. High values can be obtained for non-haemolytic peptides even though they are relatively inefficient antibacterial agents, and similarly for moderately haemolytic ones if the MIC is quite low.
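The ambiguity of this definition can be made concrete with a minimal sketch. The function name and the numerical values below are purely illustrative assumptions (they are not measured data); they only show that a weakly active, non-haemolytic peptide and a potent, moderately haemolytic one can share the same TI.

```python
def therapeutic_index(hc50: float, mic: float) -> float:
    """Return the dimensionless ratio HC50/MIC (both in the same units, e.g. uM)."""
    if mic <= 0:
        raise ValueError("MIC must be positive")
    return hc50 / mic


# Hypothetical values, chosen only to illustrate the ambiguity of the ratio:
# a non-haemolytic but weakly active peptide ...
weak_but_safe = therapeutic_index(hc50=1000.0, mic=50.0)
# ... and a potent but moderately haemolytic one give the same TI.
potent_but_toxic = therapeutic_index(hc50=40.0, mic=2.0)
print(weak_but_safe, potent_but_toxic)
```

For this reason the ratio is best reported together with the underlying MIC and HC50 values rather than on its own.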

The TI is simply a dimensionless selectivity parameter, comparing activity on erythrocytes to that against a chosen strain of microbial cells. Even this can be misleading as haemoglobin is released from cells only if the peptide creates large channels or rents in the membrane bilayer structure, and more subtle damage is missed. On the other hand, for bacteriostatic action, small, short-lived pores or lesions in the bacterial cytoplasmic membrane are sufficient (Matsuzaki et al. 1995) so that protons and cations can equilibrate across the membrane, destroying the strong bacterial electric field in the process (Bolintineanu et al. 2010). When the bacterial membrane potential drops significantly below its minimal value of around −130 mV, among other effects, its dissipation will halt ATP synthesis (Yeaman and Yount 2003). MICs are therefore a more sensitive measure of damage. Many workers thus use only the MIC, 1/MIC, or some related antibacterial activity parameter to correlate theoretical activity predictions with experimental results (Pathak et al. 1995; Ostberg and Kaznessis 2005; Fjell et al. 2009). In addition, one can collect 10 times more MIC than HC50 data, so the data-mining procedure is severely restricted when using TI values.

Obtaining an AMP with high antimicrobial but low haemolytic activities is not easy and requires the right choice of lead compounds (Kondejewski et al. 1999; Dathe et al. 2001; Jiang et al. 2008). Some limited success in increasing selectivity has been achieved by introducing different point mutations in either highly active AMPs (i.e. with a low MIC) to reduce haemolytic activity (Pérez-Payá et al. 1994; Pandey et al. 2010), or in non-cytotoxic natural lead compounds to increase antimicrobial activity without increasing toxicity (Bessalle et al. 1992; Maloy and Kari 1995). Such a strategy almost led to approval with pexiganan, an analogue of magainin 2 with much increased activity against a wide panel of bacterial species (Gottler and Ramamoorthy 2009). It was, however, significantly more toxic than its progenitor magainin 2 (HC50 = 45 and 1,000 μM respectively, see Juretić et al. 2009). As a rule, increasing antimicrobial activity through point mutation also results in higher cytotoxicity, with few notable exceptions (e.g. insertion of Trp at position five in magainin 2) (Tachi et al. 2002; Imura et al. 2008). Another strategy has been to fuse a fragment from a generally cytotoxic peptide with one from a moderately active but selective AMP in the hope that the right fragment combination will increase antibacterial activity while retaining selectivity (Boman et al. 1989; Wade et al. 1992; Maloy and Kari 1995; Sun et al. 2005; Ferre et al. 2009).

A third possibility is to design peptides de novo to have both high antimicrobial activity and high selectivity, resulting in a high TI, by taking into account structural information stored in non-homologous antimicrobial peptides during their long and eventful evolution. As aptly observed by Hancock and Sahl (2006), a net cationic charge combined with the presence of hydrophobic residues is not sufficient, in the vast sequence space of molecules showing antimicrobial activity under chosen laboratory conditions, to earn the definition “AMP”. The long co-evolution of natural host defence peptides with microbes is a better guarantee of selective antimicrobial activity. What it is that these non-homologous natural AMPs have in common, and how to extract this evolutionary information, will be the subject of the next section.

QSAR studies of AMPs

For all quantitative structure–activity (QSAR) studies, including those on AMPs, the choice of descriptors is a crucial step for connecting structure with activity (Bhonsle et al. 2007). Most QSAR studies of AMPs have concentrated on defined groups of related, easily alignable peptides differing from one another at several sequence positions (Taboureau 2010). Using a variable number of peptide properties as descriptors (most often hydrophobicity and amphipathicity) (Hilpert et al. 2008), statistically significant correlations could be determined with measured activities in linear models (Lejon et al. 2004; Frecer et al. 2004; Taboureau et al. 2006; Langham et al. 2008), often multivariate linear regressions and principal component analysis (Yang et al. 2002). In more complex analyses, twenty five global mean peptide properties of protegrin homologs were calculated and used for development of several regression QSAR models (Ostberg and Kaznessis 2005). Tachi et al. (2002) stressed the necessity of taking into account position-dependent physicochemical properties. A recently developed nonlinear QSAR methodology for AMPs used hidden Markov models and neural networks, which were trained on large data sets of diverse peptides and then were able to identify novel short AMPs with high activity against several multiresistant bacterial strains (Fjell et al. 2007, 2009; Taboureau 2010). 3D-QSAR analysis of similar peptides acting against a particular bacterial strain identified specific physicochemical properties responsible for peptide activity and selectivity (Bhonsle et al. 2007). NMR is the best technique for extracting structural 3D information about peptide structure in a membrane environment (Haney et al. 2009) and one can reasonably expect that the rapid increase of NMR data for AMPs will lead to improved 3D-QSAR descriptors based on solved NMR structures (Mason et al. 2007; Ramamoorthy 2009).

Until this occurs, QSAR models are limited to using mean peptide properties extracted from related peptide sequences. However, these are not necessarily transferable from one AMP family to another. For a given structural class (e.g. helical AMPs) the positioning of amino acids in the sequence can be important (Tossi et al. 1997), but this positional information is lost when mean peptide properties are calculated. Varying amino acid attribute profiles, on going from the N to the C-terminus, can drastically change peptide activity and selectivity, with little or no change in mean peptide properties (Tachi et al. 2002).

Use of sequence moments to determine selectivity descriptors

Most natural helical AMPs exhibit lengthwise asymmetry of physicochemical peptide properties. This can be easily seen for the non-homologous anuran peptide antibiotics magainin 2, PGLa, dermaseptin 3 and ascaphin 1, using the on-line SPLIT algorithm (http://split.pmfst.hr/split) to create sequence-profiles (Fig. 1). The profiles for each peptide are quite different no matter which of 88 available amino acid attribute scales is used to calculate the preference for membrane buried helix (red line) or the preference for amphipathic α-helical structure (grey line) (attribute scales and references are available on the web site). The SPLIT algorithm (Juretić et al. 2002) has been described in the literature as one of the three best bioinformatics tools for finding sequence position and orientation of transmembrane helices in integral membrane proteins (Cuthbertson et al. 2005). It is also very useful for examining finer details of predicted membrane associated secondary structure, with the additional advantage that in one version, a manual choice of amino acid attributes is possible.

Fig. 1

SPLIT 3.5 profiles of amino acid attributes for melittin and anuran AMPs. The preference for membrane buried helix (red line), the preference for alpha helical amphipathic structure (grey line) and the preference for beta-strands (blue line) for non-selective (haemolytic) and selective peptides (toxins and antibiotics respectively). The bold straight line just under the x-axis is the algorithm’s prediction for the sequence location of the membrane spanning alpha-helix

Visual inspection of profiles generated by the SPLIT algorithm can help distinguish haemolytic peptides, as they are generally predicted to adopt a transmembrane helical conformation (TMH) (Fig. 1, upper panel: high preference for membrane buried helix), while non-haemolytic AMPs are not (Fig. 1, lower panel). Although it is possible to find haemolytic AMPs that are not TMH according to the SPLIT algorithm, none of the tested AMPs with high TI values with respect to E. coli were predicted to be TMH. Due to relatively high hydrophobic moments (grey line) and moderate hydrophobicity, monomers of these AMPs prefer to remain at the membrane surface (Bechinger et al. 1998; Grage et al. 2010).

Since the definition of hydrophobic moment by Eisenberg et al. (1982), many different modifications have been proposed. For sequence profiles of hydrophobic moments in the SPLIT algorithm, we use the Juretić and Lučin modification (1998) giving the INDA index (index of amphipathicity calculated at each sequence position for each twist angle so that an amphipathic α-helix conformation with 100 degrees twist angle gets the highest index value) using the Eisenberg hydrophobicity scale as input (Eisenberg et al. 1982). This locates sequence segments with optimal hydrophobic moments given a helical conformation (Juretić et al. 1999). A convenient measure of peptide α-amphipathicity is a continuous length of INDA values higher than 3.0, denoted as the INDA-length. This can be used as an alternative to the more commonly used global amphipathicity of the peptide (mean per residue hydrophobic moment or μH) in evaluating the potential of an AMP.
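As a minimal sketch of the INDA-length measure, the function below takes an already computed profile of INDA values (one per sequence position) and returns the longest uninterrupted stretch above the 3.0 threshold. The INDA calculation itself is not reproduced here, the function name is hypothetical, and taking the longest run (rather than, say, the first one) is our assumption about how a "continuous length" is read from a profile.

```python
def inda_length(inda_profile, threshold=3.0):
    """Longest run of consecutive positions whose INDA value exceeds the threshold."""
    best = run = 0
    for value in inda_profile:
        # Extend the current run, or reset it when the value drops to the threshold or below.
        run = run + 1 if value > threshold else 0
        best = max(best, run)
    return best


# Hypothetical profile (not taken from any real peptide):
profile = [2.0, 3.5, 3.1, 2.9, 4.0, 4.2, 3.3, 1.0]
print(inda_length(profile))
```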

Note that μH values for a given peptide can vary markedly depending on the hydrophobicity index scale used to calculate it, so they should not be used in absolute terms. To partly overcome this problem, a relative hydrophobic moment (μHrel) can be used (Zelezetsky and Tossi 2006). This is the ratio of μH for a given sequence to that of a theoretical perfectly amphipathic helix (μHmax). μH, μHrel and mean hydrophobicity (Ĥ) of a peptide sequence can be obtained with the on-line tool HydroMCalc on the web server http://www.bbcm.univ.trieste.it/~tossi/HydroCalc/HydroMCalc.html. They can be evaluated using either the Eisenberg consensus hydrophobicity scale, the Kyte and Doolittle scale (1982), or the combined consensus hydrophobicity scale (CCS) of Tossi et al. (2002). While μHrel values are generally comparable with the different scales, μH values are not.
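The three global quantities can be sketched as follows, using the standard Eisenberg formulation of the mean per-residue hydrophobic moment for an ideal α-helix (100° per residue). This is not the HydroMCalc implementation: the function names are hypothetical, the hydrophobicity scale is passed in as a dictionary, and μHmax is left as an input because its value depends on the scale and on the reference helix chosen.

```python
import math


def mean_hydrophobicity(seq, scale):
    """Mean per-residue hydrophobicity for a given scale (dict: residue -> value)."""
    return sum(scale[r] for r in seq) / len(seq)


def hydrophobic_moment(seq, scale, twist_deg=100.0):
    """Mean per-residue hydrophobic moment; 100 degrees per residue models an alpha-helix."""
    delta = math.radians(twist_deg)
    sx = sum(scale[r] * math.cos(i * delta) for i, r in enumerate(seq))
    sy = sum(scale[r] * math.sin(i * delta) for i, r in enumerate(seq))
    return math.hypot(sx, sy) / len(seq)


def relative_moment(seq, scale, mu_max, twist_deg=100.0):
    """Ratio of the moment to that of a perfectly amphipathic reference helix (mu_max)."""
    return hydrophobic_moment(seq, scale, twist_deg) / mu_max


# Toy two-letter scale, for illustration only (not a published scale).
toy_scale = {"L": 1.0, "K": -1.0}
print(hydrophobic_moment("LKKLLKKL", toy_scale))
```

Because the scale is an explicit argument, the same sequence can be evaluated with several scales to see how strongly μH depends on that choice.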

An advantage of using a profile rather than a global property is that one can take into consideration asymmetry of structural preferences and physicochemical properties along the peptide sequence. A practical consideration is how to convert profiles of smoothed amino acid attributes into QSAR descriptors in such a way that lengthwise asymmetry information is not disregarded. We have solved this problem by bending the peptide sequence into an arc (Juretić et al. 2009). A vector is then associated with each amino acid so that all vectors have the same origin (that of the coordinate system, see Fig. 2), with their direction depending on the chosen arc and amino acid position in the arc sequence, while the vector’s length depends on the chosen amino acid attribute and the smoothing process used for creating the sequence profiles. Vector summation of all such vectors for all amino acids in the sequence produces the sequence moment, whose direction (the angle with respect to the x-axis) preserves the lengthwise asymmetry information for a chosen attribute (Juretić et al. 2009).

Fig. 2

Sequence moments for kassinatuerin-1 (left panel) and PGLa (right panel). For each residue, small red vectors are calculated by using Janin’s amino acid index scale (Janin 1979), while small blue vectors are calculated by using Guy’s amino acid index scale (Guy 1985). The D-descriptor is the cosine of the angle δ between sequence moments that are vector sums (large bold arrows) of vectors for individual residues from a chosen peptide

This information can be very useful, as the separation of the sequence moment vectors obtained using two different amino acid attribute scales can be very different for mediocre and highly selective peptide antibiotics. An exhaustive examination of many possibilities ended with a final empirical choice of one of the simplest cases, the descriptor being the cosine of the angle between the two sequence moments for a peptide bending arc of π/2, using just two different amino acid attributes. These were the hydrophobicity indices from the scales of Janin (1979) and Guy (1985). Named the D-descriptor, it gives the best correlation between measured and predicted TI values for 36 non-homologous peptides in the training data set of anuran helical AMPs (Juretić et al. 2009), following the linear relationship: TI = 50.126 − 44.803D. All TI predictions in this paper use this model if not specified otherwise. The discriminating capacity of the D-descriptor is illustrated in Fig. 2 for the anuran AMPs PGLa and kassinatuerin-1 (Lohner and Prossnigg 2009; Mattute et al. 2000). For PGLa the sequence moments are highly separated and it has both a very high measured and predicted selectivity in our data set (Table 1: measured TI = 105, predicted TI = 95), while for kassinatuerin-1 the sequence moments are poorly separated and both measured and predicted selectivity are low (Table 1: measured TI = 7.5, predicted TI = 7).
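The geometry of the D-descriptor can be sketched in a few lines. This is an illustration under stated assumptions rather than the published implementation: residues are assumed to be spaced evenly along the arc from 0 to π/2, the two input profiles are assumed to be already smoothed attribute values (the Janin and Guy scale values are not reproduced here), and all function names are hypothetical. Only the linear relationship TI = 50.126 − 44.803D is taken from the text.

```python
import math


def sequence_moment(profile, arc=math.pi / 2):
    """Vector sum of per-residue vectors placed along an arc (assumed evenly spaced)."""
    n = len(profile)
    step = arc / (n - 1) if n > 1 else 0.0
    mx = sum(v * math.cos(i * step) for i, v in enumerate(profile))
    my = sum(v * math.sin(i * step) for i, v in enumerate(profile))
    return mx, my


def d_descriptor(profile_a, profile_b, arc=math.pi / 2):
    """Cosine of the angle between the sequence moments of two attribute profiles."""
    ax, ay = sequence_moment(profile_a, arc)
    bx, by = sequence_moment(profile_b, arc)
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0.0:
        raise ValueError("a sequence moment has zero length; the angle is undefined")
    return (ax * bx + ay * by) / norm


def predicted_ti(d):
    """Linear model quoted in the text: TI = 50.126 - 44.803 * D."""
    return 50.126 - 44.803 * d


# Toy profiles (not real scale values): one weighted toward the N-terminus,
# the other toward the C-terminus, so the two moments are nearly perpendicular.
n_heavy = [1.0, 0.5, 0.0, 0.0]
c_heavy = [0.0, 0.0, 0.5, 1.0]
print(predicted_ti(d_descriptor(n_heavy, c_heavy)))
```

Since D is a cosine, the model output is bounded: D = +1 (parallel moments) gives a predicted TI of about 5.3, and D = −1 (antiparallel moments) gives about 94.9.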

Table 1 Measured and predicted selectivity for some anuran AMPs and their analogues

It is interesting that Janin’s and Guy’s hydrophobicity scales differ principally in values for Gly, Ala and His residues, which are well represented in the anuran AMPs. Both scales are based on an evaluation of predominantly buried to predominantly solvent exposed residues in proteins, but come to different conclusions as to these particular residues, so they have opposite signs assigned to them in the two scales (Juretić et al. 2009 Supp. Info.). For the purpose of creating sequence profiles, sequence environments are calculated for each amino acid position, using the mean value of attributes of closest neighbours excluding the central amino acid, a smoothing choice that produced the best descriptors (Juretić et al. 2009). The resulting smoothed positional attributes (small blue and red vectors on the peptide arc in Fig. 2) have distinctly different behaviours, particularly in the N-terminal part of the two AMP sequences. A similar behaviour was also observed for other anuran AMPs such as magainin 2 and pseudin 2 (Juretić et al. 2009). We surmise that the two hydrophobicity scales assess amino acid attributes differently in such a way as to capture subtle differences between highly selective and mediocre frog-type AMPs through the sequence moments description of peptide lengthwise asymmetry. In this respect, it is significant that when used in the SPLIT algorithm, Janin’s scale consistently predicts higher preferences for membrane buried helix conformation than Guy’s scale, and this difference is more prominent for selective AMPs.
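The smoothing step described above (mean of the closest neighbours, excluding the central residue) can be sketched as follows. The window half-width and the handling of the sequence ends are our assumptions, since neither is specified here, and the function name is hypothetical.

```python
def neighbour_mean(values, half_window=2):
    """Replace each value by the mean of its neighbours, excluding the position itself.

    half_window is an assumed parameter: the number of neighbours taken on each side.
    Near the termini the window is simply truncated.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        neighbours = [values[j] for j in range(lo, hi) if j != i]
        smoothed.append(sum(neighbours) / len(neighbours) if neighbours else 0.0)
    return smoothed


print(neighbour_mean([1, 2, 3, 4, 5], half_window=1))
```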

Applicability of the D-descriptor model for calculating TI

The D-descriptor predicts high selectivity even when other characteristics of the peptide do not. For example, PGLa does not have a higher content of amino acids that are over represented in good peptide antibiotics as identified by statistical analysis of residue frequencies (E,D,Q,H,G,M,V,N,K,T, Juretić et al. 2009). It actually has a lower amphipathicity (μHrel = 0.37, INDA-length = 5) than pseudin 2 (μHrel = 0.58, INDA-length = 17) or kassinatuerin-1 (μHrel = 0.61, INDA-length = 8). The mean hydrophobicity, as calculated using the CCS hydrophobicity scale, is similar for PGLa and pseudin 2 (Ĥ = −0.63 and −0.52 respectively), while considerably higher for kassinatuerin-1 (1.51). However, PGLa has a high content of the small amino acids singled out with the Janin-Guy pair of hydrophobicity scales, and these are arranged in several so-called “small motifs” [GAS]XXX[GAS] and [GAS]XXXXXX[GAS] indicated as being relevant for the interaction of membrane helices (Senes et al. 2000; Schneider and Engelman 2004; Walters and De Grado 2006), which may have a bearing on antimicrobial activity. Pseudin 2 and kassinatuerin-1 have fewer of these residues and are devoid of small motifs.
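The two small-motif patterns quoted above can be located with ordinary regular expressions. The sketch below is a minimal illustration with hypothetical names; it uses look-ahead assertions so that overlapping motifs are all reported, which matters for Gly-rich sequences.

```python
import re

# [GAS]XXX[GAS]: small residues at positions i and i+4;
# [GAS]XXXXXX[GAS]: small residues at positions i and i+7.
SMALL_MOTIFS = {
    "[GAS]XXX[GAS]": re.compile(r"(?=([GAS].{3}[GAS]))"),
    "[GAS]XXXXXX[GAS]": re.compile(r"(?=([GAS].{6}[GAS]))"),
}


def find_small_motifs(seq):
    """Return (pattern name, 0-based start, matched segment) for every occurrence."""
    hits = []
    for name, pattern in SMALL_MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Toy sequence, for illustration only.
for hit in find_small_motifs("GLLLGKKKKKKA"):
    print(hit)
```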

Experimental TI values are compared in Table 1 with D-descriptor predicted values obtained using the on-line tool at http://split.pmfst.hr/split/dserv1/, and the correlation seems quite acceptable. Note that some sequences in this table have low pairwise sequence identity, while others differ by only one or two point mutations from their wild type progenitors, so that the D-descriptor prediction method works for both homologous and non-homologous peptides.

The D-descriptor model, as it stands, predicts TI for helical peptides of anuran origin or derived from them, and should not be used loosely outside this framework. The calculated TI values are predictive for potentially highly selective peptides acting specifically on Gram-negative bacterial strains. The correlation between predicted and measured TI values in the training set of 36 non-homologous peptides (r² = 0.83, Juretić et al. 2009) is sufficiently good that a high calculated TI value confidently predicts a high selectivity, as haemolytic peptides and poorly selective AMPs both tend to have a smaller angle between sequence moments and thus a lower predicted TI.

The linear relationship used (see "Use of sequence moments to determine selectivity descriptors") means that as the D-descriptor ranges from +1 to −1 (cos 0° to cos 180°) the predicted TI values range between ≈5 and ≈95. This is one factor that limits the r² value, as a predicted value of 95 can actually correspond to a real TI that could be considerably greater than this, and conversely, one of 5 to a real TI considerably less than this. In any case, a calculated TI value close to 95 predicts high selectivity, but does not guarantee a high antimicrobial activity. Furthermore, spuriously low or high TI values are obtained for degenerate peptides composed solely of some types of amino acids (especially Ala, Gly, or His). One of the goals in future research is the development of a descriptor that would amend these shortcomings, but these problems can also be solved by providing appropriate filters that use incorporated expert knowledge about anuran peptide antibiotics in evaluating a sequence (see next section).

Design of novel selective AMPs, the adepantins

The Designer algorithm incorporates TI prediction via the D-descriptor model with expert knowledge about frog-type linear peptides having a preference for forming an amphipathic helix in a membrane environment (Juretić et al. 2009). Its output is the primary structure of de novo peptides, which are predicted to be highly selective towards Gram-negative bacteria, named adepantins, an abbreviation for automatically designed peptide antibiotics.

The Designer algorithm uses an objective construction procedure, based on collected experimental data from anuran AMPs having a high therapeutic index (TI > 20) as calculated from published MIC values against E. coli and the HC50 for human red blood cells. The module incorporating expert knowledge depends on the data set used for training the algorithm, but allows freedom in choosing certain conditions, for example the sequence length and net positive charge. We chose a net charge of +4 or +5, a percentage of residues with a high selectivity index (E,D,Q,H,G, Juretić et al. 2009) of at least 35%, and a length of 16–23 residues. Furthermore, the two C-terminal residues must already exist as a motif in at least one of 26 best anuran natural antibiotics in our database, the peptide must conform to a motif regularity index of less than 2.5 (see below) and the predicted TI must be over 85. The output consisted of peptides with a relatively high Gly content (21–32%). A first set of seven 23 residue adepantins suggested by the Designer algorithm had very limited similarity to any other natural or synthetic AMPs, with at most 50% identity to plasticins (El Amri and Nicolas 2008) and bombinins (Gibson et al. 1991).
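Some of the selection criteria listed above are simple enough to be expressed as a filter. The sketch below is not the Designer code: it covers only length, net charge, the percentage of E,D,Q,H,G residues and the predicted TI threshold, and it omits the C-terminal dipeptide rule and the motif regularity index, whose definitions are not given here. Counting charge from Lys/Arg (+1) and Asp/Glu (−1) only, with His and the termini ignored, is our assumption, and the predicted TI is supplied by the caller.

```python
SELECTIVE_RESIDUES = set("EDQHG")  # residues with a high selectivity index (from the text)


def net_charge(seq):
    """Assumed convention: K, R = +1; D, E = -1; His and termini not counted."""
    return sum(seq.count(r) for r in "KR") - sum(seq.count(r) for r in "DE")


def selective_fraction(seq):
    """Fraction of residues belonging to the E, D, Q, H, G group."""
    return sum(r in SELECTIVE_RESIDUES for r in seq) / len(seq)


def passes_basic_filters(seq, predicted_ti, min_ti=85.0):
    """Length 16-23, net charge +4 or +5, at least 35% E,D,Q,H,G, predicted TI above min_ti."""
    return (
        16 <= len(seq) <= 23
        and net_charge(seq) in (4, 5)
        and selective_fraction(seq) >= 0.35
        and predicted_ti > min_ti
    )


# Adepantin 1 sequence as given in this paper; the TI value passed in is a placeholder.
adepantin_1 = "GIGKHVGKALKGLKGLLKGLGES"
print(net_charge(adepantin_1), round(selective_fraction(adepantin_1), 3))
print(passes_basic_filters(adepantin_1, predicted_ti=90.0))
```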

Three adepantins have been selected for synthesis and validation. Adepantins 1 and 2 (GIGKHVGKALKGLKGLLKGLGE[S/C]) are identical apart from the C-terminal residues while adepantin 3 (GLKGLLGKALKGIGKHIGKAQGC) shows only about 30% identity to these. Small motifs are underlined. The Cys residue in adepantins 2 and 3 was acetamidated when testing the peptides in monomeric form. All three adepantins exhibited strong and specific antibacterial activity against E. coli (MIC values from 1 to 4 μM) and low haemolytic activity (HC50 > 150), leading to a TI in the range 150–400 (Juretić et al. 2009; Ilić et al. manuscript in preparation).

Adepantin 1 has several small motifs, both GXXXG and GXXGXXXG, known to promote helix-helix interaction in a membrane environment (Melnyk et al. 2004; Walters and De Grado 2006). It is possible that their presence promotes helix aggregation in this environment, favouring penetration into the bacterial cytoplasmic membrane, the obligatory first step toward antibacterial activity for many AMPs (Matsuzaki et al. 1995; Huang 2000; Fernandez et al. 2009; Mihajlovic and Lazarides 2010). In any case, for Designer produced peptides ranging from 16 to 23 residues, examples can be found both of peptides without any small motifs and of peptides so rich in them that they cover the entire sequence.

Plasticins and bombinins (El Amri et al. 2007; Nicolas and El Amri 2009; Simmaco et al. 2009) also have a high percentage of Gly residues and present small motifs over most of their sequence. Plasticin B1 (GLVTSLIKGAGKLLGGLFGSVTGGQS), for example, has 85% of its primary structure composed of them, and its identity to the adepantins is about 40% (as determined by the LALIGN tool). These peptides are also quite selective and active against Gram-negative bacteria, so it is possible that the presence of these motifs somehow correlates with these attributes. They are not, however, essential, as there are examples of selective and quite active anuran AMPs without small motifs (ranatuerin-1 for example), as well as examples of quite haemolytic AMPs containing them (e.g. Dermaseptin-5). It may be that it is not just their presence but a particular arrangement that correlates with peptide activity and selectivity. When larger data sets connecting structure and function become available, it may become possible to identify those combinations of small motifs that promote both peptide activity and selectivity.

The choice of 23 residues is obviously arbitrary. The Designer module can generate primary structures down to 14 amino acid residues that are still predicted to be both active and selective. This considerably reduces the expense of peptide synthesis and increases the potential for conversion of adepantins into viable lead compounds for anti-infective agents. However, 16 residues is the limit to maintain the chosen parameters (G,D,E,Q,H > 35% and charge +4 to +5).

The fixing of the last two amino acids is responsible for the frequent presence of Cys residues at the C-terminus of adepantins. They derive from the so-called “rana box” present in many natural anuran peptides (Tossi et al. 2000), a cystine-bridged cyclic structure that results in a deviation from the linear helical structure, but concerns only a small percentage of peptide residues, and its role in peptide activity and selectivity is not clear (Simmaco et al. 1998). This can be rectified in the Designer algorithm by choosing another closely related peptide (e.g. ADP1 with regard to ADP2), or the Cys residue can be kept as a useful anchoring site for fluorescent labelling or covalent dimer formation.

The motif regularity index (Juretić et al. 2009), which measures how well the designed peptide incorporates motifs that are the most common in the structure of the best peptide antibiotics, is also an important design restriction. The further this index falls below its adopted upper limit of 2.5, the greater the probability that the most common amino acid motifs from natural AMPs will be incorporated in the designed peptide.

To give an example of what happens when parameters are altered, for a predicted TI > 70; regularity index < 2.5; % G,D,E,Q,H > 25%; Ĥ from −1.5 to +0.5; μHrel > 0.35 (CCS scale); charge > +2; strict nonpolar versus polar residue separation on a helical wheel projection (A,L,M,V,I,F,W vs. E,D,Q,N,G,K,R); length = 16 residues, the result is a total of 95 potential AMPs.

Increasing the selectivity through suggested point mutations

While the Designer module generates potentially selective AMP sequences, another module, Mutator, has been implemented to suggest whether point mutations in a peptide with known TI can improve that TI, by determining the effect of such mutations in the D-descriptor model. The bottleneck again comes from the experimental tests required to establish how much confidence we can have in the present version of Mutator for suggesting just one or two point mutations expected to increase peptide selectivity.

Experimental results available to date confirm Mutator's predictions, and we intend to provide free access to it through a dedicated web site, so that other research groups interested in a rational approach for improving similar lead antimicrobial peptides can synthesize and test peptides with point mutations suggested by the algorithm. Point mutations were initially tested by synthesizing pseudin 2 and its F9A point mutant, as well as ascaphin 1 and its F2I point mutant (Juretić et al. 2009), as both mutants were predicted to show significant increases in the therapeutic index. All peptides were amidated at the C-terminus, which increased their net positive charge by one unit. Haemolysis was tested by using a 0.5% RBC concentration instead of the somewhat higher values used by other workers (e.g. Conlon et al. 2004; Pál et al. 2005), so we obtained lower measured TI values for the wild-type peptides than reported by these authors. Pseudin was predicted by the Mutator module to have a TI of 6 against a measured TI of 7, while ascaphin was predicted to have a TI of 40 against a measured TI of 50. Both suggested point mutations (F9A or F2I) were predicted to increase the TI (to 89 and 78 for pseudin and ascaphin respectively) and in fact did so, as the experimentally estimated TI values were, respectively, >30 and >60. The D-descriptor model in Mutator thus makes it quite sensitive to the effects of point mutations.
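The point-mutant notation used here (F9A: Phe at position 9 replaced by Ala) can be made concrete with a short Python sketch. It only constructs mutant sequences; ranking them requires the D-descriptor model, which is not reproduced here. The function names are hypothetical and are not part of Mutator.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter code


def apply_point_mutation(seq: str, mutation: str) -> str:
    """Apply a mutation written as <wild type><1-based position><new residue>, e.g. 'F9A'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation: {mutation!r}")
    wild, pos, new = match.group(1), int(match.group(2)), match.group(3)
    # Guard against a label that does not match the sequence it is applied to.
    if not 1 <= pos <= len(seq) or seq[pos - 1] != wild:
        raise ValueError(f"{mutation}: residue {pos} of the sequence is not {wild}")
    return seq[:pos - 1] + new + seq[pos:]


def single_point_mutants(seq: str):
    """Yield (label, mutant sequence) for every single substitution of seq."""
    for i, wild in enumerate(seq, start=1):
        for new in AMINO_ACIDS:
            if new != wild:
                yield f"{wild}{i}{new}", seq[:i - 1] + new + seq[i:]
```

A search in the spirit of Mutator would score each of the 19 × L candidates produced by `single_point_mutants` with a TI predictor and report the one or two substitutions with the largest predicted gain.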

Other examples of point mutations suggested by the Mutator algorithm as likely to increase the therapeutic index are listed in Table 2 for magainin analogues. Testing improved pexiganan analogues is interesting due to the high estimated revenues for this antibiotic (Islam and Hawser 1998; Gottler and Ramamoorthy 2009), which were never realized. The Mutator algorithm predicts a substantial increase of pexiganan selectivity (increased TI) after only one or two point mutations. While some of these decrease hydrophobicity and might therefore also result in a decreased antimicrobial potency, others actually increase Ĥ, and so may not reduce potency. For example, the K18L mutation brings the predicted TI to the maximal value of 95.

Table 2 Point mutations suggested by the mutator

Searching for novel natural AMPs by using conserved signal peptides

In-silico searches of EST databases to find novel natural AMPs

Searching the UNIPROT database using an AMP sequence as query can result in many hits corresponding to putative or reported AMP sequences, more so than searching dedicated AMP databases (Giangaspero et al. 2001; Wang et al. 2009; Thomas et al. 2010), which may not consistently follow up published AMP sequences or those deposited in UNIPROT. Out of 90 anuran peptides extracted from published papers to form and test the Designer algorithm, only a minority could be found by searching the AMP databases. With respect to anuran peptides, using the keyword “Amphibian Defense Peptide” we extracted 1,676 sequences from the UNIPROT database, of which 1,108 were precursor AMP sequences (pAMP) containing a signal peptide and an acidic propeptide as well as the mature AMP (this tripartite arrangement is characteristic of anuran AMP propeptides). This number is significantly larger than the number of anuran peptides contained in the dedicated databases (from about 200 to 600).

A simple visual examination of the collected anuran precursor AMP sequences was sufficient to confirm earlier observations for several classes of toxins (Conticello et al. 2001) and AMPs (Zanetti 2004; Nicolas and El Amri 2009; Konig and Bininda-Emonds 2011) that signal sequences are better conserved than the propeptide region, and that both of these are much better conserved than the mature AMPs. This may seem surprising, given that secretory protein sequences are generally at least three times better conserved than their associated signal sequences, which are mostly cut off and discarded (Li et al. 2009). While the biological reasons for the conservation of AMP signal sequences certainly deserve closer study, one can immediately appreciate that indirectly hunting for novel AMPs through their associated signal peptides can be a powerful tool for finding apparently non-homologous natural AMPs. One example of how evolutionary pressure has resulted in high variation of mature anuran AMPs and conservation of associated signal sequences is shown in Table 3.

Table 3 Precursor AMP sequences from anurans

By using the XPF-associated signal peptide from Xenopus laevis, MYKGIFLCVLLAVICANSLA, we have found three apparently novel antimicrobial peptides in the Xenopus tropicalis EST database. We used the TBLASTN tool (http://blast.ncbi.nlm.nih.gov/) to search the translated nucleotide database and then examined all hits for the telltale presence of the characteristic tripartite propeptide (Table 4). All three putative AMPs are predicted to have a high TI on the D-server and have little similarity to known AMP sequences (BLASTP E-values always > 1). Their pairwise identity is 44–61%, but none is more than 30% identical to known AMPs. The two peptides initiating with Gly were also found by using the sequence MFKGLFLCVLLAVLSAQSMA (XT-6-like precursor signal peptide from Xenopus tropicalis) as query (Roelants et al. 2010).
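The two Xenopus signal peptides quoted above have the same length, so their conservation can be checked with a simple position-by-position comparison. The Python sketch below does only that; it is not a substitute for TBLASTN, which handles translation, gaps and scoring statistics. The function and constant names are hypothetical.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences compared position by position."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


XPF_SIGNAL = "MYKGIFLCVLLAVICANSLA"  # XPF-associated signal peptide, Xenopus laevis
XT6_SIGNAL = "MFKGLFLCVLLAVLSAQSMA"  # XT-6-like signal peptide, Xenopus tropicalis

print(ungapped_identity(XPF_SIGNAL, XT6_SIGNAL))  # 70.0 (14 of 20 positions match)
```

The 70% identity between the two queries is well above the 44–61% pairwise identity reported for the mature peptides they retrieved, which is consistent with signal sequences being the better conserved region.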

Table 4 Novel AMPs found in anuran EST database

Using conserved signal sequences from other AMP families

Conservation of specific signal peptides has also been observed with a different tripartite arrangement of signal, acidic propiece and mature AMP sequences. This is the case with teleost (bony fish) host defence propeptides, where the AMP lies between the N-terminal signal sequence and the C-terminal acidic propiece. By using the signal sequence of moronecidin (MKCATLFLVLSMVVLMAEPGDA) from striped bass (Morone saxatilis; Lauth et al. 2002), we have identified eight previously unrecognized putative AMPs in six different fish species with low similarity to sequences in the UNIPROT database, as shown by E-values ranging from 0.3 to 40 after a BLASTP search (Table 5). A novel putative AMP with the curiously apt primary sequence FISHIIGGIIHAGKAIHEAIQRHRR, discovered among EST sequences from Burton’s mouthbrooder (Haplochromis burtoni), has the highest, albeit limited, similarity to known teleost AMPs. It is 58% identical with a piscidin-like peptide from brown-marbled grouper (Epinephelus fuscoguttatus, sequence identifier ADE06665). Furthermore, pairwise identity among the identified peptides is no more than 70%. Note that the assumed AMP sequences in Tables 4 and 5 are based on imperfect alignments with known peptides.

Table 5 Putative fish AMPs

The sequences of many novel potential peptide antibiotics that already exist in the EST database can thus be easily identified by using this indirect strategy based on conserved signal peptides, and the combined use of the D-descriptor further helps identify those most likely to result in active and selective AMP leads. Note that the D-descriptor is currently based on an anuran peptide database. We are therefore confident in expectations of high TI for the identified frog peptides, but the descriptor still needs to be validated for other classes of AMPs, such as the putative fish AMPs.

Future prospects

Other descriptors able to better distinguish haemolytic from both poorly selective and highly selective AMPs may be extracted from SPLIT profiles for transmembrane/membrane buried helix (TMH) preference versus surface-bound amphipathic helix (SAH) preference (Fig. 1, red and gray profiles respectively), and can be used to construct alternative one-parameter linear models for TI prediction. The C1-descriptor, for example, takes into account that (a) peaks corresponding to SAH and TMH are more widely separated for antibiotic than for haemolytic peptides, and (b) TMH peaks are more prominent in sequence profiles of haemolytic peptides with low measured TI. A C1-based TI estimate is obtained by using the SPLIT 3.5 AMP link, which appears after the peptide sequence has been submitted for analysis (http://split.pmfst.hr/split/). The log–log correlation between TI(measured) and TI(predicted) for our data set of anuran AMPs is then considerably improved. Preliminary results indicate that the combination of the D and C1 parameters is better for the whole range of TI values than either of these parameters alone, when we restrict ourselves to anuran peptides and consider MIC towards Escherichia coli (Lučić et al. to be submitted).
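For readers who wish to evaluate a TI predictor in the same way, the following Python sketch computes a log–log correlation between measured and predicted TI values. It assumes a Pearson correlation on base-10 logarithms; the text does not state which correlation coefficient was used, and the function name is hypothetical. No data from the study are reproduced here.

```python
import math


def log_log_pearson(measured, predicted) -> float:
    """Pearson correlation between log10(measured) and log10(predicted)."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length series with at least two values")
    # math.log10 raises ValueError for zero or negative inputs, so TI must be > 0.
    x = [math.log10(v) for v in measured]
    y = [math.log10(v) for v in predicted]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sxy / math.sqrt(sxx * syy)
```

Working on logarithms keeps a few peptides with very high TI from dominating the coefficient, which matters when TI values span more than an order of magnitude.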

Adepantin 1 exhibited an unusually high selectivity for E. coli with respect to S. aureus, with MIC (S. aureus)/MIC (E. coli) of at least 32 (Juretić et al. 2009), while other peptide antibiotics designed against Gram-negative bacteria, such as MSI-103 (Table 1), achieved only about half of this selectivity ratio (Epand et al. 2010). Adepantins may thus be novel tools for exploring selectivity mechanisms with respect to different types of bacterial membranes as well, such as the effects of induced phase separation and clustering of anionic phospholipids away from zwitterionic ones, as proposed by Epand et al. (2010). This might also help us better understand the mechanism responsible for the high selectivity with respect to erythrocytes.

Moving beyond anuran peptides and E. coli as the test organism is limited by data availability. In the open literature, mostly due to the efforts of Conlon and his collaborators, TI data are most abundant for anuran peptides and for E. coli as the test organism (Juretić et al. 2009). S. aureus has often been used as the reference Gram-positive test organism, a choice often spurred by the high mortality associated with MRSA strains (Klein et al. 2007), and natural peptide antibiotics with preferential activity against S. aureus have also been reported (Castro et al. 2006; Fernandez et al. 2009). These may be good lead compounds in the design of peptides targeting Gram-positive bacterial strains, which are known to have a cytoplasmic membrane lipid composition quite different from that of Gram-negative bacteria (Yeaman and Yount 2003). Collecting a sufficiently representative database of such peptides, however, is proving to be more arduous. Having open access to experimentally determined TI values for AMPs, not only from anurans but also from insects, fish and mammals, would be very beneficial. At present the scarcity of openly available structure–activity data somewhat impedes QSAR efforts.

Gaining a deeper insight into which structural features are important for distinguishing highly selective and active AMPs (such as PGLa and the adepantins) from mediocre ones (such as kassinatuerin 1 and pseudin 2) will become possible when a larger number of structures becomes available. Subtle differences among these AMPs may be revealed by their three-dimensional, membrane-associated structure and dynamics, either determined directly from NMR data (Tremouilhac et al. 2006) or through molecular-dynamics simulation techniques (La Rocca et al. 1999; Sengupta et al. 2008). Physicochemical surface properties of target cell membranes, peptide mobility, lateral diffusion, dimerization and oligomerization, and induced membrane thinning (Chen et al. 2003; Jang et al. 2006; Bhonsle et al. 2007; Strandberg et al. 2009b; Chekmenov et al. 2010) are also likely to influence antimicrobial potency and selectivity.

Statistical information from peptide activity databases can also be used to improve AMP design. For example, the observation that a Trp residue in position 2 is quite frequent among known AMPs (Maloy and Kari 1995) was confirmed in the putative AMP sequences extracted from the EST databases (Tables 4 and 5), and suggests using this as a design criterion in short adepantins. In addition to improving activity, Trp residues are valuable natural fluorescent probes for measuring physical properties of the peptide’s microenvironment. Due to the observed position-dependent effects of Trp substitution (Jin et al. 2003), it is preferable to insert this residue at specific sequence positions as guided by the Mutator module, rather than doing so without a prediction as to the effect on activity/selectivity, or attaching a bulky fluorescent probe to the peptide.

Given the conservation of signal sequences, the processing of immature host defence peptides also deserves further study, since it may lead to the discovery of a highly conserved subset of signal recognition particles and peptidases specific for such peptides. We have shown that EST database searches for subsets of signal peptides or propeptides associated with host defence peptides can produce a plethora of potential AMPs not annotated as such in the UNIPROT database (Petrov et al. to be submitted). Furthermore, genome analysis for conserved AMP-associated signal sequences has the potential to resurrect AMPs from extinct species now present in museums only as dried specimens and tissues.